Thursday, July 30, 2020

Azure Synapse Analytics: What is it?

Data Lakes and Data Warehouses are key for better analytics and business intelligence, but until recently, it’s been hard to make the two work together.

Azure Synapse Analytics is a new Microsoft data service that aims to bring the management of these two essential components - and plenty more capabilities - together into one toolset, helping companies overhaul their data analytics practices for better decision-making with a more intuitive, end-to-end solution.

Built as the next evolution of Azure SQL Data Warehouse, it offers limitless analytics and serverless on-demand or provisioned resources at scale to meet different business needs for ingesting, preparing, managing and serving data for business intelligence and machine learning. In short, its capabilities extend well beyond those of SQL Data Warehouse.

But what does this actually mean?

Essentially, it provides an easier and more powerful way for businesses to bring together the processes of consolidating, storing, curating and analysing their data to generate reliable, actionable insights.

First announced at Microsoft Ignite 2019, Synapse Analytics is a fascinating and highly relevant new offering from Azure, given the moves by all three major public cloud vendors to simplify and consolidate their increasing number of data analytics, AI and machine learning services.

So, who is it for, exactly?

In this blog, we cover how exactly Azure Synapse Analytics aims to help your analytics efforts, its potential use cases for your unique data analytics initiatives, and how it can offer your team a simpler data management platform by unifying BI, ML, AI and warehousing processes into one.

 

How Azure Synapse Analytics Helps Your Business

Many companies typically use several tools to consolidate, store and analyse their critical workloads.

Getting these siloed systems to work together cohesively is usually a complex and expensive task: they often can't integrate with each other, leaving organisations at the mercy of the vendor or forced to use them all separately - a time-consuming inefficiency.

With Azure Synapse Analytics, large businesses can centralise management of their data lakes and data warehouses into a single unified experience - and leverage native integration with the BI and ML capabilities of other Microsoft tools such as Power BI and Azure Machine Learning.

You can use your data faster and more securely, as the tool combines insights from all your data sources - be they data warehouses, BI tools or data lakes - into one specialised workspace called Azure Synapse Studio.

Here, staff can handle data preparation, data warehousing, general data management, and integration with business intelligence tools (data visualisation, reporting) and AI efforts like machine learning with more simplicity and organisation than ad-hoc and traditional on-premises analytics solutions.

Here are some key benefits it provides for businesses looking to overhaul their analytics:

  • Better BI and Data Visualisation: With easy and native integration with Power BI, reporting and analysis of key metrics is far more engaging, easier to use and easier to share with relevant stakeholders across the business. Fewer data silos, more visibility.

  • Less Code: Decision-makers can stay on top of what their data specialists and engineers are doing across warehousing, analytics, BI and AI initiatives thanks to a code-free environment and visual-based UI for managing data pipelines that’s far more accessible for the average person.

  • Limitless scale: As a cloud-based service, you can view, organise and query relational and non-relational data much faster (at petabyte scale) than traditional on-premises tools, all while using familiar SQL language - there’s no new and complicated skills roadblock here. You can also optimise performance for critical workloads using workload isolation and limitless concurrency.

  • In-built Security: Synapse Analytics provides advanced protection of all business data with the broader, proven security and identity benefits of the cloud. You get the latest authentication, compliance, encryption, governance, policy and threat detection controls for all assets, plus enhanced privacy and protection with dynamic data masking and row-level and column-level security (a short sketch follows this list). Most on-premises data management tools can’t compare - and this is in-built.
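To make that last bullet more concrete, here is a minimal T-SQL sketch of dynamic data masking and row-level security. The table, column and function names (dbo.Customer, Email, dbo.fn_SalesRepFilter, dbo.Sales) are hypothetical, and the exact steps will vary per environment:

-- Mask the Email column for non-privileged users (hypothetical table/column)
ALTER TABLE dbo.Customer
ALTER COLUMN Email ADD MASKED WITH (FUNCTION = 'email()');

-- Row-level security: each sales rep sees only their own rows (hypothetical predicate)
CREATE FUNCTION dbo.fn_SalesRepFilter(@SalesRep AS nvarchar(128))
RETURNS TABLE WITH SCHEMABINDING
AS RETURN SELECT 1 AS allowed WHERE @SalesRep = USER_NAME();

CREATE SECURITY POLICY dbo.SalesFilter
ADD FILTER PREDICATE dbo.fn_SalesRepFilter(SalesRep) ON dbo.Sales
WITH (STATE = ON);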

At its heart, however, Azure Synapse Analytics is a tool that requires a fair amount of technical and analytics knowledge to get the most out of. Thankfully, developers and data scientists get their own fair share of clear benefits from the service.

 

How Azure Synapse Analytics Helps Your Team


With Azure Synapse Analytics, data professionals and engineers get a powerful but unified data platform for all their enterprise data warehousing and analytics needs. Aside from the high-level, general benefits listed above, the teams on the ground can leverage several benefits.

  • Access to Azure Machine Learning: Analysing data in Azure Synapse comes with native integration with Azure Machine Learning (AML) and its machine learning capabilities - meaning you’re able to score ML models and generate predictions directly within your data warehouse without having to move large amounts of data. You can also convert existing models trained in AML into an internal format representation within Synapse Analytics without having to start over from scratch, saving time and money.

  • Faster Sharing: Native integration with Azure Data Services means your team can easily share data internally and externally via Azure Data Share integration, for both data lake and data warehouse data. No more copying terabytes of big data between various enterprise storage systems while somehow avoiding the complexity that inherently accompanies such a task - it’s all in one platform, and seamless.

  • Support for data streaming ingestion: You can execute analytic queries over streaming data with Azure Synapse Analytics, directly in your data warehousing environment, and leverage integration with Event Hubs and IoT Hub.

  • Workload Isolation: New with Synapse Analytics, workload isolation allows you to reserve resources exclusively for a workload group, giving increased flexibility and control over data warehouse resources (see the sketch after this list).
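As a rough illustration of workload isolation, the T-SQL sketch below reserves resources for a hypothetical wgDataLoads group and routes a hypothetical loaduser login into it; the exact percentages you can set depend on your service level:

-- Reserve 30% of resources exclusively for data loads, capped at 60%
CREATE WORKLOAD GROUP wgDataLoads
WITH (
    MIN_PERCENTAGE_RESOURCE = 30,
    CAP_PERCENTAGE_RESOURCE = 60,
    REQUEST_MIN_RESOURCE_GRANT_PERCENT = 10
);

-- Route requests from the loading login into the group
CREATE WORKLOAD CLASSIFIER wcDataLoads
WITH (WORKLOAD_GROUP = 'wgDataLoads', MEMBERNAME = 'loaduser');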

Given its integration with other powerful Azure Data Services solutions like Azure Data Factory (for ingestion), Power BI (for data visualisation and reporting) and Azure Databricks (for analysis via Spark), this is a solution that should be on your radar.

 

Conclusion: Why use Azure Synapse?

Azure Synapse Analytics is an evolving service, having launched only in the past year as the evolution of SQL Data Warehouse. If you or your business need help understanding how Azure Synapse Analytics could fit your business case, visit our expertise section.

Is Azure SQL Data Warehouse a Good Fit? (Updated)


Over time at BlueGranite, we have observed some customer confusion around when Azure SQL Data Warehouse is most appropriate to use. This blog post and the accompanying decision tree below are meant to help you answer the question: Is Azure SQL Data Warehouse the best technology choice for your implementation?

Azure SQL Data Warehouse (SQL DW) is a cloud-based Platform-as-a-Service (PaaS) offering from Microsoft. It is a large-scale, distributed, MPP (massively parallel processing) relational database technology in the same class of competitors as Amazon Redshift or Snowflake. Azure SQL DW is an important component of the Modern Data Warehouse multi-platform architecture.

Because Azure SQL DW is an MPP system with a shared-nothing architecture across distributions, it is meant for large-scale analytical workloads which can take advantage of parallelism. The distributed nature of Azure SQL DW allows for storage and compute to be decoupled, which in turn offers independent billing and scalability. Azure SQL DW is considered an elastic data warehouse because its level of compute power can be scaled up, down, or even paused, to reserve (and pay for) the amount of compute resources necessary to support the workload.

Azure SQL DW is part of Microsoft’s ‘SQL Server family’ of products, which also includes Azure SQL Database and SQL Server (both of which are SMP, symmetric multiprocessing, architectures). This commonality means that knowledge and experience will translate well to Azure SQL DW, with one notable exception: MPP architecture is very different from the SMP architecture of Azure SQL Database and SQL Server, thus requiring specific design techniques to take full advantage of it.

The remainder of this post discusses some of the most important things to consider when making a decision to use Azure SQL DW.

Picture: Decision tree - Is Azure SQL Data Warehouse a good fit for your implementation?

The following comments highlight each of the items in the decision tree above:

Q1: Have you justified that a relational data warehouse solution is supported by business requirements and needs?

The most common justifications for a data warehouse implementation include:

Consolidate and relate multiple disparate data sources. Data is inherently more valuable once it has been integrated together from multiple sources. A common example cited is the 360-degree view of a customer which could align customer master data, sales, open receivables, and support requests so they can be analyzed together.

Centralize analytical data for user data access. A data warehouse is most often thought of as supporting corporate BI efforts (typically standardized corporate reports and dashboards). It can also play a big role in self-service BI efforts by providing consistent, cleansed, governed data. Realistically, a "single version of the truth" can never be 100% achieved, but effective governance and master data management can increase the odds that the data warehouse provides consistent and accurate data for all types of analytics throughout the organization.

Historical analysis. The data warehouse supports historical reporting and analysis via techniques such as periodic snapshots and slowly changing dimensions. A common scenario: a customer's sales representative changed this quarter, or a department now rolls up to a different division. The flexibility to report on either "the way it was" or "the way it is" can offer significant value - and is rarely available from standard source systems.

User-friendly data structure. It is valuable to structure the data into a user-friendly dimensional model which really helps the largest portion of the user base. Other techniques such as friendly table and column names, derived attributes, and helpful measures (such as MTD, QTD, and YTD), contribute significantly to ease of use. Time investments here should encourage data analysts to utilize the data warehouse, leading to consistent results and, in turn, saving time and effort.

Minimize silos. When a business-driven analytical solution (often referred to as shadow IT) becomes critical to running the business, that is a signal that it's time to promote the solution up to a centralized system so that it can be supported more fully, integrated with other data, and made available to a larger user base. The data warehouse can take advantage of business user efforts and continue gaining in maturity and value, as well as minimize silos and “one-off” solutions.

Multi-platform architecture which takes advantage of existing investment. If your existing data warehouse does bring value for certain use cases, it is not economically feasible to retire it or migrate everything to another architecture (ex: Hadoop or data lake). Instead, we recommend a multi-platform architecture in which the data warehouse is one, albeit important, component. For instance, using a data lake for data ingestion, exploratory analysis, staging for a data warehouse, and/or archival from the data warehouse, are all complementary to the data warehouse which can handle serving much of the curated, cleansed data.

Tip: A data warehouse is most advantageous when it is deployed alongside other services, such as a data lake, so that each type of service can do what it does best.

--------------------------------------------------------------------------------------

Q2: Are you comfortable with a cloud-based Platform-as-a-Service solution?

Azure SQL DW is a service offering for the public cloud and the national (sovereign) clouds. It is a PaaS (Platform-as-a-Service) solution in which the customer has no responsibility for, or visibility to, the underlying server architecture. The storage and compute are decoupled, which is a very big advantage of Azure SQL DW. Costs for processing power (compute) are based on a consumption model, which is controlled by data warehouse units (DWUs for Gen1, and cDWUs for Gen2) that can be scaled to meet demanding data loads and peak user volumes. The persisted data is required to be stored on Azure premium storage, which performs better than standard storage and thus is more expensive.

Tip: As Azure SQL DW is a PaaS service, Microsoft handles system updates. Customers may specify a preferred primary and secondary day/time range for system maintenance to occur.

--------------------------------------------------------------------------------------

Q3: What kind of workload do you have?

Azure SQL DW is most appropriate for analytical workloads: batch-oriented, set-based read and write operations. Workloads which are transactional in nature (i.e., many small read and write operations with many row-by-row operations) are not suitable.

Tip: Although ‘data warehouse’ is part of the product name, it is possible to use Azure SQL Database for a smaller-scale data warehousing workload if Azure SQL DW is not justifiable. Keep in mind that Azure SQL DW is part of the SQL Server family; there are some limitations and feature differences between Azure SQL DW, Azure SQL DB, and SQL Server.

--------------------------------------------------------------------------------------

Q4: How large is your database?

It is difficult to pinpoint an exact number for the absolute minimum size recommended for Azure SQL DW. Many data professionals in the industry see the minimum “practical” data size for Azure SQL DW in the 1-4 TB range. Since Azure SQL DW is an MPP (massively parallel processing) system, you experience a significant performance penalty with small data sizes because of the overhead incurred to distribute and consolidate across the nodes (which are distributions in a “shared nothing” architecture). We recommend Azure SQL DW for a data warehouse which is starting to approach 1 TB and expected to continue growing.

Tip: It’s important to factor in realistic future growth when deciding whether to use Azure SQL DW. Since the data load patterns are different for Azure SQL DW (to utilize PolyBase and techniques such as CTAS which maximize MPP performance) versus Azure SQL DB or SQL Server, it may be a wise decision to begin using Azure SQL DW to avoid a future migration and future time redesigning data load processes. Do keep in mind though that it is a myth that you can provision the smallest size Azure SQL DW and expect it to perform just like Azure SQL DB.

--------------------------------------------------------------------------------------

Q5: Do you have firm RPO, RTO, or backup requirements?

Being a PaaS offering, Azure SQL DW handles snapshots and backups each day. The service automatically creates restore points throughout each day and supports an 8-hour recovery point objective (RPO) over the previous 7 days. Once a day the service also automatically generates a geo-redundant backup, with its recovery point objective being 24 hours. Customers also have the capability of creating a user-defined restore point as of a specific point in time. The retention period for a user-defined restore point is still 7 days, after which it is automatically deleted.

Tip: Backups are not taken when the compute resources for Azure SQL DW are in a paused state.

--------------------------------------------------------------------------------------

Q6: Do you plan to deliver a multi-tenant data warehouse?

A multi-tenancy database design pattern is typically discouraged with Azure SQL DW.

Tip: Although features which can be important to multi-tenancy (such as row-level security and column-level security) are available, you may instead want to evaluate using elastic pools in conjunction with Azure SQL Database for multi-tenant scenarios.

--------------------------------------------------------------------------------------

Q7: What kind of data model represents your data warehouse?

A highly normalized data warehouse structure does not completely preclude you from using Azure SQL DW. However, since Azure SQL DW takes significant advantage of clustered columnstore indexes (which utilize columnar compression techniques), Azure SQL DW performs substantially better with denormalized data structures. For that reason, following sound dimensional design principles is strongly advised.

Tip: Modern reporting tools are more forgiving of a substandard data model, which leads some data warehouse developers to be less strict with dimensional design. This can be a mistake, particularly if there are many users issuing self-service queries, because a well-formed star schema aids significantly in usability.

--------------------------------------------------------------------------------------

Q8: How is your data dispersed across tables in the database?

Even if you have a large database (1-4 TB+), table distribution is another consideration. An MPP system such as Azure SQL DW performs better with fewer, larger tables (1 billion+ rows) versus many small to medium-size tables (less than 100 million rows).

Tip: As a rule of thumb, a table does not benefit from being defined as a clustered columnstore index until it has more than 60 million rows (60 distributions x 1 million rows each). In Azure SQL DW, we want to make the effort to use clustered columnstore indexes (CCIs) as effectively as possible. There are several reasons for this, but one key reason is that CCI data is cached on local SSDs, and retrieving data from cache improves performance significantly (applicable to Azure SQL DW Gen2 only).

--------------------------------------------------------------------------------------

Q9: Do you understand your data loading and data consumption patterns extremely well?

Being a relational database, Azure SQL DW is considered “schema on write.” Since Azure SQL DW is also a distributed system, distribution keys inform the system how it should allocate data across the nodes. The selection of a good distribution key is critical for performance of large tables. In addition to distributing data, partitioning strategies are different in Azure SQL DW versus standard SQL Server. It is extremely important for Azure SQL DW developers to deeply understand data load patterns and query patterns in order to maximize parallelization, avoid data skew, and to minimize data movement operations and shuffling within the MPP platform.
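For illustration, here is a minimal sketch of a hash-distributed fact table (all names hypothetical); the distribution column should spread rows evenly and align with common join keys to minimize data movement:

CREATE TABLE dbo.FactSales
(
    SaleId      bigint        NOT NULL,
    CustomerKey int           NOT NULL,
    Amount      decimal(18,2) NOT NULL,
    SaleDate    date          NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(CustomerKey),  -- spreads rows across the 60 distributions
    CLUSTERED COLUMNSTORE INDEX        -- columnar storage suited to large analytical scans
);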

Tip: PolyBase can be used in one of two ways: (1) for loading data into Azure SQL DW (in fact, it’s recommended), or (2) for querying remote data stored outside of Azure SQL DW. PolyBase is the recommended method for loading data into Azure SQL DW because it can natively take advantage of the parallelization of the compute nodes, whereas other loading techniques do not perform as well because they go through the control node. Usage of PolyBase for querying remote data should be done very carefully (see Q15 below).

--------------------------------------------------------------------------------------

Q10: Are you comfortable with ELT vs. ETL data load patterns, and with designing data load operations to specifically take advantage of distributed, parallel processing capabilities?

The principles and patterns for loading a distributed MPP system are very different from a traditional SMP (Symmetric Multi-Processing) system. To utilize parallelization across the compute nodes, PolyBase and ELT (extract > load > transform) techniques are recommended for Azure SQL DW data load processes. This means that migration to Azure SQL DW often involves redesigning existing ETL operations to maximize performance, minimize logging, and/or utilize supported features (for example, merge statements are not currently supported in SQL DW; there are also limitations with respect to how insert and delete operations may be written; using CTAS techniques is recommended to minimize logging). New tables added to Azure SQL DW often involve an iterative effort to find the best distribution method.
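As a sketch of the CTAS pattern mentioned above (table and schema names hypothetical), a transformation is written as a minimally-logged CREATE TABLE AS SELECT followed by renames, rather than updating a large table in place:

-- Build the transformed table in one parallel, minimally-logged operation
CREATE TABLE dbo.FactSales_New
WITH (DISTRIBUTION = HASH(CustomerKey), CLUSTERED COLUMNSTORE INDEX)
AS
SELECT s.SaleId, s.CustomerKey, s.Amount, s.SaleDate
FROM stage.Sales AS s;

-- Swap the new table in
RENAME OBJECT dbo.FactSales TO FactSales_Old;
RENAME OBJECT dbo.FactSales_New TO FactSales;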

Tip: Although PolyBase does significantly improve the performance of data loads because of parallelization, PolyBase can be very challenging to work with depending on the data source format and data contents (for instance, when commas and line breaks appear within the data itself). Be sure to include adequate time in your project plan for development and testing of the new data load design patterns.

--------------------------------------------------------------------------------------

Q11: Do you have staff to manage, monitor, and tune the MPP environment?

Although Azure SQL DW is a PaaS platform, it should not be thought of as a hands-free environment. It requires monitoring of data loads and query demands to determine if distribution keys, partitions, indexes, and statistics are configured well.

Tip: Azure SQL DW does have some emerging features offering recommendations. Integration with Azure Advisor and Azure Monitor is continually evolving, which makes it easier for an administrator to identify issues. Specifically, Azure SQL DW Gen2 utilizes “automatic intelligent insights” within Azure Advisor to display whether issues exist related to data skew, missing statistics, or outdated statistics.

--------------------------------------------------------------------------------------

Q12: Do you have a low number of concurrent query users?

Queries are queued up if the maximum concurrency threshold is exceeded, at which time queries are resolved on a first-in-first-out basis. Because of concurrent user considerations, Azure SQL DW frequently has complementary solutions in a multi-platform architecture for handling different types of query demands. We often see Azure Analysis Services and/or Azure SQL Database as the spokes in a hub-and-spoke design.

Tip: The number of concurrent queries executing at the same time can be as high as 128, depending on the service tier (i.e., the pricing level) and resource class usage (because assigning more resources to a specific user reduces the overall number of concurrency slots available).
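For context, resource classes are assigned through database role membership; a minimal sketch (login name hypothetical) that trades concurrency slots for more memory per query:

-- Give loaduser a larger resource class (more memory, fewer concurrency slots)
EXEC sp_addrolemember 'largerc', 'loaduser';

-- Revert to the default smaller resource class
EXEC sp_droprolemember 'largerc', 'loaduser';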

--------------------------------------------------------------------------------------

Q13: Do you have a lot of self-service BI users sending unpredictable queries?

A data warehouse is primarily intended to serve data for large queries. Although a tool such as Power BI supports direct query with Azure SQL DW, this should be done with some measure of caution. Specifically, dashboards can be troublesome because a dashboard page refresh can issue many, many queries all at once to the data warehouse. As noted in Q12, for production use of Power BI we often recommend using a semantic layer, such as Azure Analysis Services, as part of a hub-and-spoke strategy. The objectives of introducing a semantic layer are to (a) reduce some of the query demand on the MPP system (reducing data movement when unpredictable queries come in), (b) reduce concurrent queries executed on the MPP system, and (c) include user-friendly calculations and measures which can dynamically respond as a user interacts with a report.

Tip: It is certainly possible to use Azure SQL DW directly with an analytical tool like Power BI (i.e., with Power BI operating in DirectQuery mode rather than import mode). However, usage of DirectQuery mode should be tested thoroughly, especially if the user base expects sub-second speed when slicing and dicing. Conversely, if the expected usage is more data exploration where query response time is more flexible, then direct querying of Azure SQL DW from a tool such as Power BI may well work. The Gen2 tier of Azure SQL DW introduced adaptive caching for tables which are defined as a clustered columnstore index (CCI); adaptive caching increases the possibility that self-service user queries can be satisfied from the cached data in Azure SQL DW, which improves performance significantly. Another option to potentially consider: Power BI Premium now also has aggregations, which can cache data in Power BI’s in-memory model for the first level of queries, requiring a drillthrough to Azure SQL DW only when the user gets to a lower level or less commonly used data.

--------------------------------------------------------------------------------------

Q14: Do you have near-real-time data ingestion and reporting requirements?

Distributed systems like Azure SQL DW are most commonly associated with batch-oriented data load processes. However, capabilities continue to emerge which support near real-time streaming data ingestion into Azure SQL DW. This works in conjunction with Azure Databricks streaming dataframes, which opens some interesting new scenarios for analysis of lower latency data (such as data generated from IoT devices or the web).

Tip: When using Azure Databricks for streaming data, it acts as the front end for the ingestion stream before the output is written to Azure SQL DW in mini-batches via PolyBase—meaning this can be classified as a near-real-time system, but you should expect some latency. Also, keep in mind that Azure Databricks can also be effectively utilized as a data engineering tool for processing and loading data to Azure SQL DW in batch mode (the JDBC Azure SQL DW connector from Azure Databricks does take advantage of PolyBase).

--------------------------------------------------------------------------------------

Q15: Do you have requirements for data virtualization in addition to or in lieu of data integration?

PolyBase in Azure SQL DW currently supports Azure Storage (blobs) and Azure Data Lake Storage (Gen1 or Gen2), which can be used for very selective data virtualization and data federation needs. Data virtualization refers to querying the data where it lives (thus saving work to do data integration to relocate the data elsewhere).

Tip: When using PolyBase for data virtualization (i.e., querying remote data stored in Azure Storage or Azure Data Lake Storage), there is no pushdown computation to improve query performance. This means that Azure SQL DW needs to read the entire file into TempDB to satisfy the query. For queries which are issued rarely (such as a quarter-end analysis, or data to supply to auditors), it’s possible to use virtualized queries effectively in very specific situations where expectations for query response speed are not of utmost importance.
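For very selective virtualization scenarios, a minimal PolyBase sketch might look like the following (the storage location, schema, and column list are hypothetical, and a database-scoped credential may be required depending on how authentication is set up):

-- Point at files in Azure Storage (hypothetical account/container)
CREATE EXTERNAL DATA SOURCE LakeSource
WITH (TYPE = HADOOP,
      LOCATION = 'wasbs://data@mystorageaccount.blob.core.windows.net');

CREATE EXTERNAL FILE FORMAT ParquetFormat
WITH (FORMAT_TYPE = PARQUET);

-- External table over the remote files
CREATE EXTERNAL TABLE ext.Sales
(
    SaleId bigint,
    Amount decimal(18,2)
)
WITH (LOCATION = '/raw/sales/',
      DATA_SOURCE = LakeSource,
      FILE_FORMAT = ParquetFormat);

-- The whole file set is read at query time (no pushdown - see the tip above)
SELECT COUNT(*) FROM ext.Sales;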

--------------------------------------------------------------------------------------

Q16: Do you anticipate the need to integrate with multi-structured data sources?

It is very common for a data warehouse to be complementary to a data lake which contains multi-structured data from sources such as web logs, social media, IoT devices, and so forth. Although Azure SQL DW does not support data types such as JSON, XML, spatial or image, it can work in conjunction with Azure Storage and/or Azure Data Lake Storage (Gen1 or Gen2) which might provide additional flexibility for data integration and/or data virtualization scenarios.

Tip: When connecting to external data, PolyBase currently supports reading Parquet, Hive ORC, Hive RCFile, or delimited text (such as CSV) formats. Parquet has emerged as one of the leading candidates for a data lake storage format.

--------------------------------------------------------------------------------------

Q17: Do you intend to scale processing power up, down, and/or pause to meet varying data load and/or query demands?

One of the best features of a cloud offering like Azure SQL DW is its elasticity of compute power. For instance, you could scale up on a schedule to support a demanding data load, then scale back down to the normal level when the load is complete. Azure SQL DW can even be paused during times when no queries are being sent to the data warehouse; the stored data remains safe, yet there are no compute charges at all (because storage and compute are decoupled). Utilizing scale up/down/pause techniques can prevent over-provisioning of resources, which is an excellent cost optimization technique.

Tip: When an operation to scale up or down is initiated, all open sessions are terminated, and open insert/update/delete transactions are rolled back. This behavior is to ensure the Azure SQL DW is in a stable state when the change occurs. The short downtime may not be acceptable for a production system and/or may only be acceptable during specific business hours. Also, keep in mind that the adaptive cache (available with the Gen2 tier) is cleared when a scale or pause occurs, requiring the cache to be re-warmed to achieve optimal performance.
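As an illustration, scaling compute is a single T-SQL statement run against the master database (the database name and service objectives below are hypothetical); pausing and resuming are done through the portal, PowerShell, or the REST API rather than T-SQL:

-- Scale up ahead of a demanding data load...
ALTER DATABASE MySqlDw MODIFY (SERVICE_OBJECTIVE = 'DW1000c');

-- ...and back down once the load completes
ALTER DATABASE MySqlDw MODIFY (SERVICE_OBJECTIVE = 'DW300c');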

Azure Data Warehouse becomes Azure Synapse Analytics - What's new?


Azure announced the rebranding of Azure Data Warehouse into Azure Synapse Analytics. Given it's always a bit unclear if a software vendor just changed the name or actually added features, we did your homework and summarized the key findings. Enjoy the read!

An introduction to Azure SQL Data Warehouse


Before going into Azure Synapse, this is what you need to know about the older product Azure SQL Data Warehouse.

Microsoft released Gen1 in 2016 and followed in 2018 with Azure SQL Data Warehouse Gen2, a top-notch cloud-native OLAP data warehouse. The core difference between SQL Data Warehouse and a SQL Database is that the former - like its competitors - splits compute and storage as it should be: data is filed in a filesystem and compute is used only when needed, with massive parallel scaling opportunities. Azure DW was Microsoft's answer to competitors like Google BigQuery, Amazon Redshift, Presto or Snowflake.

Azure SQL Data Warehouse becomes Azure Synapse Analytics

At Ignite 2019, Azure announced the rebranding of Azure Data Warehouse into Azure Synapse Analytics: a solution aimed at further simplifying the set-up and use of Modern Data Platforms, including their development, their use by analysts and their management and monitoring.


 

What new capabilities Azure Synapse Analytics brings to the table

With Azure Synapse Analytics, Microsoft makes up for some functionality that was missing in Azure DW, and in the Azure cloud generally. Synapse is thus more than a pure rebranding.

On-demand queries

With Synapse we can finally run on-demand SQL or Spark queries. Rather than spinning up a Spark service (e.g. Databricks) or resuming a Data Warehouse to run a query, we can now write our SQL or PySpark code and pay per query. On-demand queries make it so much easier for analysts to take a quick look at a .parquet file (just opening it in Synapse - see example below) or to explore the Data Lake for some interesting data (using the integrated Data Catalog).
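For example, an on-demand SQL query over parquet files in the Data Lake might look like this sketch (the storage account and path are hypothetical):

-- Query parquet files in place, paying per query rather than per cluster
SELECT TOP 100 *
FROM OPENROWSET(
    BULK 'https://mystorageaccount.dfs.core.windows.net/data/raw/sales/*.parquet',
    FORMAT = 'PARQUET'
) AS sales;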

 


Picture: Opening a Data Lake parquet file directly in a Notebook

 

Integration of Storage Explorer

Data in your Data Lake isn't always easy to browse through, certainly not for a business user or analyst. Within Synapse, Azure has now integrated its Storage Explorer interface: a way to easily browse through the Data Lake and access all folders.

With Storage Explorer integrated, an analyst can - in one interface - see and access all the data in the Data Lake and Data Warehouse (which he/she has access to): no further connection strings to be created and shared, and no need for local tools such as SQL Server Management Studio (for accessing the Data Warehouse) and Azure Storage Explorer (for Data Lake browsing).


Picture: Browse through your Data Lake and Database datasets in 1 single interface

Notebooks and SQL workbench

Up to now, analysts and data scientists had to work with local notebook tools (Jupyter), Databricks and/or local SQL tools to access the different data from the Data Lake or Data Warehouse. Both Azure Data Warehouse and Data Lake Store had a data explorer in preview, but the functionalities were limited.

Within Synapse, Microsoft created an end-to-end analysis workbench accessible through the portal. One can write SQL queries and run them on the Data Warehouse compute or an on-demand SQL or Spark compute. We of course all hope we can access our Databricks cluster right from this interface.

SQL Analytics on Data Lake

Parquet is a great, highly-compressed format commonly used in Data Lakes. It's great for storage but a bit more cumbersome to read and analyse: you can't simply open a parquet file in Windows; you'll need a tool which can read parquet (Spark, or a parquet tool like ParquetViewer).

Within Synapse, Microsoft integrated SQL Analytics functionalities on Data Lake formats: you can now run SQL scripts on parquet files directly in your Data Lake, e.g. by right-clicking the files and choosing 'Open in SQL Script'.


Picture: Analyse your parquet files using SQL

What Azure Synapse Analytics offers that was already there before

In addition to bringing new features with Azure Synapse, Azure also tries to further simplify the set-up and use of Modern Data Platforms. Rather than having multiple tools in multiple interfaces, Azure delivers one interface in which a user can:

  • develop orchestration of data ingestion (powered by Azure Data Factory)
  • analyse data (using SQL or Python Notebooks) on SQL (powered by Azure Data Warehouse) or Spark (powered by Apache Spark)
  • build and visualize reports in self-service mode (using Power BI)
  • and manage an enterprise-grade modern Data Warehouse allowing you to build enterprise data models, facts and dimensions (powered by the Azure Data Warehouse technology)

The interface reflects this integration well and, under the hood, refers to the other services. Rather than having to spin up Data Factory and a Data Warehouse separately, you'll thus only need to spin up one resource: Azure Synapse.


 

Let's get started

Are you keen on getting started with a Modern Data Platform or Synapse in general? Get in touch & let's get you started!

 

Keen to know more?

  • Continue reading on how Microsoft & element61 look towards a Modern Data Platform built for end-to-end analytics, including real-time, AI analytics and big data integration.
  • Visit our showcase site to try out some Data & AI applications which run on a Modern Data Platform architecture

A beginner’s guide to Microsoft’s Azure Data Warehouse

Your business data is extremely POWERFUL - but only if you are able to use it properly to generate valuable and actionable insights. It is therefore imperative to organize and analyze it well. A recent report says that less than 0.5% of business data is actually stored and analyzed in the right way; as a result, enterprises lose over $600 billion a year.

Today, the power of computing and cloud storage of business data has lifted demand for data warehousing solutions among businesses of all sizes. It is no longer a large capital expenditure; instead, it has become a one-time investment in the implementation of a data warehousing system and can be deployed in no time. This allows any business to access their structured data sources and thus collect, query and discover insights from them. Microsoft's Azure SQL Data Warehouse has arrived as a permanent and effective product in the data platform ecosystem.

Microsoft's Azure SQL Data Warehouse is a highly elastic and scalable cloud service. It is compatible with several other Azure offerings, for instance Data Factory and Machine Learning, as well as various SQL Server tools and Microsoft products. Azure's SQL-based data warehouse can process huge amounts of data through parallel processing. Being a distributed database management system, it overcomes most of the shortcomings of traditional data warehousing systems.

Before handling the logic involved in data queries, Azure SQL Data Warehouse spreads data across multiple shared storage and processing units. This makes it suitable for batch loading, transformation, and serving data in bulk. As an integrated Azure feature, it has the same scalability and consistency as other Azure services, such as high-performance computing.

Traditional data warehouses consist of Symmetric Multiprocessing (SMP) machines with two or more identical processors. The processors have complete access to all I/O devices, as they are connected to a single shared memory, and a single operating system controls and treats them equally. With growing business demand in recent years, the need for higher scalability has arisen.

Read our whitepaper on advantages of cloud data warehouse

How Azure Data Warehousing overcomes these drawbacks

Azure SQL Data Warehouse caters to all these demands through its shared-nothing architecture. Storing data in multiple locations enables it to process large volumes of data in parallel. If you are new to Azure Data Warehouse and want to understand it completely, you can take Azure training from experts; you will get to know about virtual networks, Azure machines and more during your training.

Features of Azure Data Warehouse

  • It is a combination of SQL Server relational database and Azure cloud scale-out capabilities;
  • It keeps computing separated from storage;
  • It can scale up, scale down, pause and resume computations;
  • Azure is an integrated platform;
  • It includes the use of familiar tools and T-SQL (Transact-SQL).

From legal to business security requirements, it offers complete compliance.

Benefits of Azure Data Warehouse

  1. Elasticity: Azure Data Warehouse possesses great elasticity thanks to the separation of its computing and storage components. Computing can be scaled independently, and resources can be added or removed even while a query is running.
  2. Security-oriented: Azure SQL has various security components (row-level security, data masking, encryption, auditing, etc.). Considering the cyber threats to cloud data security, components of Azure data warehouse are secure enough to keep your data safe.
  3. V12 portability: Now, you can easily upgrade from SQL Server to Azure SQL and vice-versa with the tools that Microsoft provides.
  4. High scalability: Scalability is high in Azure. Azure data warehouse scales up and down quickly according to the requirements.
  5. PolyBase: Users can query across non-relational sources through PolyBase.


Different components of Azure Data Warehousing and their functions

  1. Control node: All connections and applications communicate with the front end of the system - the control node. From data movement to computation, the control node coordinates everything required to run parallel queries: each query is transformed to run in parallel across the various compute nodes.
  2. Compute node: The compute nodes receive, store and process the query in parallel. As soon as processing completes, the results are passed back to the control node, where they are collected and the final result is returned.
  3. Storage: Azure Blob storage can store large amounts of unstructured data. Compute nodes read and write directly from Blob storage to interact with data. Azure storage expands transparently, is fault-tolerant, and provides strong backup and fast restore.
  4. DMS: The Data Movement Service is a Windows service that runs alongside the SQL databases on all nodes and moves data between them. It forms a core part of the whole process, as it plays an important role in data movement for parallel processing.

Azure Data Warehouse structure and functions

  • Being a distributed database system, it uses a shared-nothing architecture.
  • The data is distributed across multiple shared storage and processing units.
  • Data storage in Azure Data Warehouse is a premium locally redundant storage layer.
  • Compute nodes on top of this layer execute queries.
  • The control node receives multiple requests, optimizes each query and distributes it across the compute nodes to work in parallel.

When you need massively parallel processing (MPP), Azure SQL Data Warehouse is the ultimate solution. Unlike its on-premises equivalent, Azure SQL Data Warehouse is easily accessible to anyone with a workload, using the familiar T-SQL language.

If you are looking to harness this data warehousing solution for your business, a Microsoft Partner like CloudMoyo can help. From the evaluation, requirements and assessment phase to data warehouse platform selection, architecture, integration, data management and further support, CloudMoyo brings to the table expertise and flexibility, along with a long-term commitment to excellence.

Saturday, July 25, 2020

Create an ASP.NET Core 3.0 Angular SPA project with Docker support -QUICK TUTORIAL


With great jubilation we can now create an ASP.NET Core web application with an Angular template. However, much to my dismay, I found that the little Enable Docker Support checkbox is grayed out…

https://medium.com/swlh/create-an-asp-net-core-3-0-angular-spa-web-application-with-docker-support-86e8c15796aa

Not to fear, a little poking is all that’s required here ☝

To get started, let’s right-click the project and select Add > Docker Support


That will create the following Dockerfile

It’s created in the project folder, BUT the paths to the resources are one level back, i.e.

COPY ["netcore-angular-docker/netcore-angular-docker.csproj", "netcore-angular-docker/"]

So just cut the Dockerfile one folder back, next to its .sln and .vs friends 😊


If we give that Dockerfile a build (I use Docker Desktop for Windows and the Docker CLI in Kitematic)

> docker build . -t netcore-angular-docker

we will run happily along until we hit an unfortunate non-zero code

Step 1/17 :  ✔
Step 2/17 : ✔
Step 3/17 : ✔
Step 4/17 : ✔
Step 5/17 : ✔
Step 6/17 : ✔
Step 7/17 : ✔
Step 8/17 : ✔
Step 9/17 : ✔
Step 10/17 : ✔
Step 11/17 : ✔
Step 12/17 : ✔
Step 13/17 : RUN dotnet publish “netcore-angular-docker.csproj” -c Release -o /app/publish
...
..
.

npm: not found

What’s this!? Somebody call the FBI.

Indeed, npm is not found in the aspnet:3.0-buster-slim build environment.

The npm command is executed from the production publish node of the csproj:

netcore-angular-docker.csproj

That’s great because that’s all included in the ASP.NET Core application. Now all we need to do is install npm (it feels great to write that the other way round for once) in the Docker build environment. Update the Dockerfile with the following lines and try again.

# Install Node.js (which includes npm) into the build image
RUN apt-get update -yq \
&& apt-get install curl gnupg -yq \
&& curl -sL https://deb.nodesource.com/setup_10.x | bash \
&& apt-get install nodejs -yq

Let’s try another docker build

C:\netcore-angular-docker> docker build . -t netcore-angular-docker
Step 1/19 : ✔
Step 2/19 : ✔
Step 3/19 : ✔
Step 4/19 : ✔
Step 5/19 : ✔
Step 6/19 : ✔
Step 7/19 : ✔
Step 8/19 : ✔
Step 9/19 : ✔
Step 10/19 : ✔
Step 11/19 : ✔
Step 12/19 : ✔
Step 13/19 : ✔✔
Step 14/19 : ✔✔✔
Step 15/19 : ✔✔✔✔
Step 16/19 : ✔✔✔✔✔
Step 17/19 : ✔✔✔✔✔✔
Step 18/19 : ✔✔✔✔✔✔✔
Step 19/19 : ✔✔✔✔✔✔✔✔
Successfully built d42ca7e6964e
Successfully tagged netcore-angular-docker:latest

Yeah Buddy

Okay, but now it’s time for the moment of truth. Does the container serve the app? Try to run it:

docker run -d -p 8090:80 netcore-angular-docker

Indeed, we can see that our Angular app, backed by ASP.NET Core, is serving happily from the Docker container:



And the API is running as well.
