Architecture recommendations and data-processing techniques with Azure Synapse Analytics. This article presents two architecture recommendations and shows how they can be implemented and how the data are made available for visualization.
In addition to data ingestion, data processing in the Industrial Internet of Things (IIoT) remains a major challenge for many companies. How companies successfully implement IoT projects, and what a successful six-point plan looks like, can be read here. That article also shows how IoT Central can be used to read an industrial robot's data from an OPC-UA server and store the data in Azure Blob Storage.
Industrial demand for cloud-computing platforms such as Microsoft Azure is growing steadily. The resulting scalability and the available IoT stack allow rapid ingestion, processing and analysis of industrial data from sources such as SCADA systems, as well as connection of different ERP and MES systems.
Azure Synapse Analytics
Microsoft Azure offers numerous services for processing IoT data. The architecture recommendations in the second section are based on Azure Synapse Analytics, a central analytics platform which combines data ingestion, processing, storage and visualization. In addition to (near) real-time data processing using Spark Pools and Synapse Notebooks, batch processing is possible by means of Synapse Pipelines. Further advantages are the integration with Azure Data Lake Storage and the ability to save data in delta format. Processed data can subsequently be visualized directly in Power BI thanks to the built-in integration.
Further Azure services are also suitable for IoT data processing; these are discussed in the second article of this series, where we describe data processing with Azure Stream Analytics and a serverless variant with Azure Functions.
Batch processing with Azure Synapse Analytics
Data saved in Blob Storage are loaded, transformed, and written to Azure Data Lake Storage Gen2 via Synapse Pipelines. The pipeline created for this purpose performs two central functions: first, the columns must be converted to the correct data types; second, the JSON string column must be parsed. By means of the provided parseJson function, nested columns are extracted and inserted into the data set as individual columns.
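The same two steps can also be sketched in a Synapse notebook with PySpark. This is a minimal sketch, not the pipeline's actual data flow: the column names (body, enqueuedTime), the payload schema, and the storage paths are assumptions for illustration.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.getOrCreate()

# Assumed schema of the nested JSON payload -- adapt to the actual telemetry.
payload_schema = StructType([
    StructField("deviceId", StringType()),
    StructField("temperature", DoubleType()),
    StructField("axisPosition", DoubleType()),
])

# Load the raw files landed in Blob Storage (placeholder path).
raw = spark.read.json("wasbs://telemetry@<storage-account>.blob.core.windows.net/raw/")

# Step 1: convert columns to the correct data types.
typed = raw.withColumn("enqueuedTime", col("enqueuedTime").cast("timestamp"))

# Step 2: parse the JSON string column and promote nested fields to top-level columns.
parsed = (typed
          .withColumn("payload", from_json(col("body"), payload_schema))
          .select("enqueuedTime", "payload.*"))

# Parquet is used for the target so that the Dedicated SQL Pool can later read
# the files via an external table (it cannot read delta; see below).
parsed.write.mode("append").parquet(
    "abfss://datalake@<storage-account>.dfs.core.windows.net/export/telemetry")
```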
The transformed data are stored in Azure Data Lake Storage and made available for visualization using Synapse.
Finally, Azure Dedicated SQL Pool makes it possible to create a Power BI data set via a view over the data lake and to update reports continuously. A pipeline trigger can be selected in the management section of Synapse Studio; a time schedule must be set there individually for each batch-processing case.
(Near-) real-time processing with Azure Synapse Analytics
As an alternative to batch processing, data in this use case are forwarded to Synapse via Azure Event Hubs. Data processing takes place with the help of Spark Streaming on Azure Spark Pools and is divided into various stages. A central service here is Azure Data Lake Storage Gen2, which implements the write-once, read-often analytics pattern in Azure. The storage format employed is delta, which offers higher reliability and performance for all data stored in ADLS and is therefore well suited to IoT data processing.
In ADLS, data are divided into different layers (a sketch of the corresponding paths follows the list):
Raw: Raw data are stored in delta format, and neither transformed nor enriched.
Standardized: Data are stored in a standardized format with a clear structure.
Curated: Data are enriched by means of further information.
Export: Data are prepared for export and further processing.
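How such a layer structure might be laid out in ADLS Gen2 is sketched below; the account, container, and path names are assumptions and will differ per project. The later sketches reuse these LAYERS paths.

```python
# Assumed ADLS Gen2 layer layout -- account and container names are placeholders.
BASE = "abfss://datalake@<storage-account>.dfs.core.windows.net"

LAYERS = {
    "raw":          f"{BASE}/raw/telemetry",           # untransformed delta ingest
    "standardized": f"{BASE}/standardized/telemetry",  # typed, flattened columns
    "curated":      f"{BASE}/curated/telemetry",       # enriched with KPIs
    "export":       f"{BASE}/export/telemetry",        # parquet for the Dedicated SQL Pool
}
```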
The final export layer is required due to a still-existing limitation: external tables in Azure Synapse Dedicated SQL Pools cannot read the delta format (see the Azure documentation).
The connection between Event Hubs and ADLS Gen2 is established using Spark Streaming. A prerequisite for this is that an access token is stored in Azure Key Vault and retrieved using mssparkutils.
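A minimal sketch of this connection, assuming the azure-event-hubs-spark connector is installed on the Spark pool and the Event Hubs connection string is stored as a Key Vault secret; vault and secret names are placeholders, and spark and sc are predefined in Synapse notebooks.

```python
from notebookutils import mssparkutils  # available on Synapse Spark pools

# Retrieve the Event Hubs connection string from Azure Key Vault
# (vault and secret names are placeholders).
conn_str = mssparkutils.credentials.getSecret("<key-vault-name>", "<eh-connection-secret>")

# The connector expects the connection string in encrypted form.
eh_conf = {
    "eventhubs.connectionString":
        sc._jvm.org.apache.spark.eventhubs.EventHubsUtils.encrypt(conn_str)
}

# Open the stream; each event arrives with a binary 'body' column.
events = (spark.readStream
               .format("eventhubs")
               .options(**eh_conf)
               .load())

# Persist the unmodified events into the raw layer in delta format.
(events.writeStream
       .format("delta")
       .option("checkpointLocation", LAYERS["raw"] + "/_checkpoint")
       .start(LAYERS["raw"]))
```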
Further transformation and enrichment of the data are likewise carried out with Spark Streaming. For this purpose, the JSON string column in the event-hub data is first extracted and split into individual data columns. After this standardization, further KPIs are calculated and the final data set is saved.
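A sketch of this standardization and enrichment step, reading from the raw layer defined above; the payload schema and the derived KPI are illustrative assumptions, not the actual robot telemetry model.

```python
from pyspark.sql.functions import col, from_json, when
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

# Assumed payload schema of the robot telemetry.
payload_schema = StructType([
    StructField("deviceId", StringType()),
    StructField("timestamp", TimestampType()),
    StructField("axisSpeed", DoubleType()),
    StructField("powerDraw", DoubleType()),
])

raw_stream = spark.readStream.format("delta").load(LAYERS["raw"])

# Standardization: decode the body, parse the JSON string, flatten nested fields.
standardized = (raw_stream
                .withColumn("payload", from_json(col("body").cast("string"), payload_schema))
                .select("payload.*"))

# Enrichment: derive an illustrative KPI from the standardized columns.
curated = standardized.withColumn(
    "energyPerMove",
    when(col("axisSpeed") != 0, col("powerDraw") / col("axisSpeed")))

# Save the final data set: delta for the curated layer, parquet for the export
# layer so that the Dedicated SQL Pool can read it.
(curated.writeStream.format("delta")
        .option("checkpointLocation", LAYERS["curated"] + "/_checkpoint")
        .start(LAYERS["curated"]))
(curated.writeStream.format("parquet")
        .option("checkpointLocation", LAYERS["export"] + "/_checkpoint")
        .start(LAYERS["export"]))
```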
The enriched data are queried using a Synapse Dedicated SQL Pool and made available to Power BI in an external table. The table is updated in (near) real-time and enables insights into the industrial robot's current data.
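One way such an external table could be created is sketched below via pyodbc, to keep one language throughout; server, database, credential, data source, and file format names are placeholders, and the external data source and file format are assumed to exist already in the Dedicated SQL Pool.

```python
import pyodbc

# Connection details for the Dedicated SQL Pool -- all placeholders.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=<workspace>.sql.azuresynapse.net;"
    "DATABASE=<dedicated-pool>;UID=<user>;PWD=<password>"
)

# External table over the parquet files in the export layer.
create_table = """
CREATE EXTERNAL TABLE dbo.RobotTelemetry (
    deviceId      NVARCHAR(64),
    [timestamp]   DATETIME2,
    axisSpeed     FLOAT,
    powerDraw     FLOAT,
    energyPerMove FLOAT
)
WITH (
    LOCATION    = '/export/telemetry/',
    DATA_SOURCE = DataLakeSource,
    FILE_FORMAT = ParquetFormat
);
"""

with conn.cursor() as cur:
    cur.execute(create_table)
    conn.commit()
```

Power BI can then query dbo.RobotTelemetry like any other table, while the underlying files are continuously appended by the streaming job.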
Outlook
In the next article, we will discuss two more architectures related to IoT data processing. We will show how to implement these using Azure Stream Analytics and Azure Functions. Afterward, we will also take a closer look at the Power BI dashboard for data visualization, and present the result of end-to-end data processing with the recommended Azure architectures.
Who is b.telligent?
Do you want to replace the IoT core with a multi-cloud solution and utilise the benefits of other IoT services from Azure or Amazon Web Services? Then get in touch with us and we will support you in the implementation with our expertise and the b.telligent partner network.