How can data sources secured via private endpoints be integrated into Fabric? How do you deal with an Azure Data Lake behind a firewall? This blog post shows the options Fabric offers natively.
Securing Incoming Data Traffic
Considerations of network security need to cover incoming as well as outgoing data transfer.
Inbound data traffic refers to access to Fabric itself (e.g. invoking app.fabric.microsoft.com). Fabric can be integrated into a network via private links, making it available exclusively via the internal network; access from the public internet is blocked.
Figure 1: Inbound access to Fabric - own illustration
Two parameters are used in the admin portal for this purpose: Azure Private Link and Block Public Internet Access.
Private links enable services to be delivered across private virtual networks (VNets) without having to connect them via peering.
"Block Public Internet Access" deactivates all traffic via the Internet. It is important to ensure that a private link is configured correctly before the setting is enabled. Otherwise there is a risk of locking oneself out of Fabric.
Figure 2: Screenshot from the Fabric Admin Portal
The following restrictions apply per subject area once Private Link is activated and, where indicated, the Block Public Internet Access setting is additionally enabled:

Governance
- The OneLake regional endpoint is not supported (it is used to comply with data residency).
- Microsoft Purview Information Protection is not supported: the Sensitivity button in Power BI is greyed out, label information is unavailable, and decryption of .pbix files fails.

Migration
- Workspaces cannot be migrated to capacities in another region.
- Tenant migrations are not supported.
- Fabric trials are not supported.

Data Engineering
- To use Spark jobs, the tenant must be in a home region where Fabric data engineering is supported (regardless of the capacity region).
- Visual queries in the warehouse: loading data into the lakehouse via pipeline is possible; for a data warehouse, no support is currently available.

Data Engineering IoT Focus
- Shortcuts for eventhouses are not possible.
- The Event Stream feature is not usable.
- Eventhouses cannot be invoked from data pipelines.
- Reading data via queuing, as well as data connectors relying on queuing, are not supported.
- Querying an eventhouse via T-SQL is not supported.

Data Analysis
- The "Publish to Web" function is not available.
- A Power BI semantic model, datamart, or Dataflow Gen1 cannot connect to a Power BI semantic model or dataflow as a data source.
- Reports cannot be exported to PDF or PowerPoint.
- E-mail subscriptions for dashboards and reports are no longer possible (these regularly send brief component overviews to recipients).
- The on-premises data gateway does not work; a workaround is necessary here.
- If both parameters are enabled, updates to the database for the modern usage metrics report fail.

User Experience
- Themes and external images for customizing the Fabric portal cannot be deployed.

Service Limits
- A maximum of 450 capacities is supported.
Outbound Traffic to Destinations in Azure With Network Security
For outbound data traffic, access from Fabric to external data sources is of central importance. Different scenarios are possible here:
Scenario 1: Azure Storage / Data Lake Gen 2 behind firewall
Figure 3: Trusted Workspace access to Azure Data Lake Gen 2 - own illustration
It is possible to whitelist either all Fabric workspaces of the tenant or individual ones. Whitelisting all of the tenant's workspaces is not recommended, however, because this option might be discontinued in future.
Individual workspaces can also be specified via the ARM template of the storage account / data lake:
Here, only "b2788a72-eef5-4258-a609-9b1c3e454624" needs to be replaced with your own workspace ID. It can be found in the Fabric portal as part of the URL when opening the workspace.
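As a sketch, the relevant fragment of the storage account's ARM template adds a resource access rule under `properties.networkAcls`. The tenant ID is a placeholder to be replaced with your own; the fixed subscription and resource-group segments follow the resource-ID format Microsoft documents for Fabric workspaces:

```json
{
  "networkAcls": {
    "bypass": "AzureServices",
    "defaultAction": "Deny",
    "resourceAccessRules": [
      {
        "tenantId": "<your-tenant-id>",
        "resourceId": "/subscriptions/00000000-0000-0000-0000-000000000000/resourcegroups/Fabric/providers/Microsoft.Fabric/workspaces/b2788a72-eef5-4258-a609-9b1c3e454624"
      }
    ]
  }
}
```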
In addition to the network, identity is a further security parameter. Therefore, the appropriate permissions still have to be set. Two access models are available here: firstly, access-control lists, and secondly, role-based access control (RBAC) assignments. For RBAC assignments, it is important to keep in mind that there are dedicated roles for data lakes and storage accounts (e.g. Storage Blob Data Reader).
Restrictions at a glance:
- Currently only Azure Data Lake Gen 2 is supported.
- Whitelisting at the level of individual workspaces requires ARM template know-how.
- An F capacity is needed.
Note that the permission requirement described above does not apply if anonymous blob access is activated (recommended only for few use cases, as authentication is thereby bypassed).
Scenario 2: Azure PaaS resources integrated into the network via private endpoints are to be accessed by Fabric.
Figure 4: Vnet integration of a data lake via private endpoints. Source: https://learn.microsoft.com/en-us/fabric/security/security-managed-vnets-fabric-overview
Here, it is possible to work from Fabric via managed private endpoints and managed VNets.
Figure 6: Dialog for creating managed private endpoints - screenshot from workspace settings
This must then be approved in the Azure portal:
Figure 7: Networking tab of a data lake - screenshot from the Azure portal
After that, the data sources can be accessed via the private endpoints from notebooks and Spark job definitions. Access via pipelines is currently not possible, however. Spark notebooks require compute power, for which a cluster of virtual machines is created in the background. By default, these machines share a standard network; in this scenario, an isolated, managed network is created during the first run instead.
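Once the managed private endpoint is approved, a notebook addresses the data lake directly via its ABFSS URI. The following sketch uses hypothetical account, container, and path names; the pre-initialized `spark` session exists only inside a Fabric notebook, so the actual read call is shown as a comment:

```python
def abfss_uri(container: str, account: str, path: str) -> str:
    """Build the ABFSS URI under which ADLS Gen2 data is addressed from Spark."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path}"

# Hypothetical names -- replace with your own container, storage account, and path.
uri = abfss_uri("raw", "mydatalake", "sales/2024/orders.parquet")
print(uri)  # abfss://raw@mydatalake.dfs.core.windows.net/sales/2024/orders.parquet

# Inside a Fabric notebook, the Spark session can then read via the private endpoint:
# df = spark.read.parquet(uri)
# display(df.limit(10))
```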
Restrictions at a glance:
- Spark jobs run on on-demand clusters, which start up more slowly than the default pools.
- Fabric data-engineering workloads must be supported in the tenant and capacity region; this may result in restrictions for Switzerland West.
- An F capacity is needed.
- OneLake shortcuts are not supported if they point to connections to a data lake with private endpoints.
Another scenario mentioned here for the sake of completeness is connectivity to on-premises data sources. The on-premises data gateway can be used to access these data sources directly.
Ultimately, the private-endpoint feature that is currently much discussed is only relevant if a data source residing in Azure is to be connected. In this case, be aware that the data source needs to be loaded via notebooks or Spark job definitions; access via pipelines is not possible.
Fabric itself or the capacity cannot be integrated into a network via a simple private endpoint. A more complex setup is needed for this purpose. A private link must be used, and the tenant settings for this link and/or the Block Public Internet Access setting must be activated.
However, companies that want to avoid network integration can also employ other security measures, as described at the beginning of the article. In the end, Fabric security means more than the use of private endpoints.