Azure Synapse and Microsoft Purview – a marriage made in Redmond
A quick look at the ever deeper integration between Azure Synapse and Microsoft Purview - from data discovery and governance, to the use of data
In the article Azure Purview, unified data governance, filling this blind spot for Microsoft? (makingmeaning.info), I describe Microsoft Purview (previously called Azure Purview) and its role in data cataloguing, discoverability and ultimately governance.
When a Consumer uses Purview to discover data assets (raw data, transformed data, modelled data and/or data visualisations) and confirms that an asset suits their data requirements, it is logical that they would want to open that asset directly from within Purview, without having to switch to another application.
Such a seamless transition from discovery to use is of course already possible if the data asset is a Power BI dataset or report. The Consumer, who in the case of Power BI is likely a Business Modeller or a Report Creator, has the option to open the dataset or report in Power BI (subject to permissions), as shown in the image below.
BUT, up to now, this was not possible if the Consumer was a Data Engineer wanting to access actual raw or transformed data. The Data Engineer would have had to discover and understand the data assets in Purview, and then switch over to Azure Synapse SQL, or another application, to access the data asset (subject to permissions) and continue the profiling and engineering tasks. That is pretty frustrating…
Fortunately, with even deeper integration between Microsoft Purview and Azure Synapse, the gap between the governance and discoverability of data assets and their profiling and use has now closed even further.
It is now possible, after connecting your Purview account to your Synapse workspace, to search and discover your data assets right from within Synapse Studio. It takes only a few simple steps:
1) Connect the Purview account to the Synapse workspace:
To connect a Microsoft Purview account to a Synapse workspace, you need the Contributor role on the Synapse workspace (assigned through IAM in the Azure portal), and you need access to that Microsoft Purview account.
Go to https://web.azuresynapse.net and sign in to your Synapse workspace.
Go to Manage -> Microsoft Purview, then select Connect to a Microsoft Purview account.
Select the applicable Purview account.
Once connected, you can see the name of the Microsoft Purview account on the Microsoft Purview account tab.
(Note that at the time of publishing this article, the label Azure Purview may still appear, as the name only recently changed to Microsoft Purview.)
2) Discover data assets and understand the metadata from within Synapse Studio, as you would in Purview:
Once connected, Purview becomes available in the search bar in the data pane of Synapse Studio.
The full Purview discoverability is now available from within Synapse. In the example below, all data assets matching the search term “Address” are displayed.
In this scenario, the Data Engineer scrutinises the available metadata and confirms that the “userdata” data asset (which contains address details) is appropriate for their purposes. In this case the data is a Parquet file in the data lake, but the search returns all data assets matching the term “Address”, including SQL, Data Lake, Power BI and so on, across the data estate, be it Azure, Power BI or well beyond those.
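For readers who prefer to script the same keyword search rather than use the search bar, the sketch below builds (but does not send) a request for the Purview catalog search REST endpoint. The account name "example-purview" is hypothetical, and the endpoint path and api-version are assumptions that should be checked against the current Purview REST reference before use. The request would also need to be sent as a POST with an Azure AD bearer token, which is left out here.

```python
import json


def build_purview_search_request(account_name, keywords, limit=10,
                                 api_version="2022-03-01-preview"):
    """Return (url, body) for a Purview catalog keyword search.

    Assumption: the account exposes the catalog search endpoint at
    /catalog/api/search/query with the given api-version. Verify both
    against the Purview REST documentation for your account.
    """
    url = (
        f"https://{account_name}.purview.azure.com"
        f"/catalog/api/search/query?api-version={api_version}"
    )
    # "keywords" is the free-text search term, "limit" caps the result count.
    body = {"keywords": keywords, "limit": limit}
    return url, body


# Hypothetical account name, same search term as in the article.
url, body = build_purview_search_request("example-purview", "Address")
print(url)
print(json.dumps(body, indent=2))
```

Building the request separately from sending it keeps the sketch free of credentials and makes the payload easy to inspect.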
The Data Engineer now simply uses the metadata information on the left to create a linked service right from within Synapse (if not already created) to access the data asset.
3) From Discoverability to Understanding to Use:
The Data Engineer can now start their query and data transformation journey. In the example below, the Data Engineer runs a simple SELECT TOP 100 query against the userdata data asset, and then tailors and extends the query logic.
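As a rough illustration of what such a starter query looks like, the sketch below builds a serverless SQL pool statement that reads the first rows of a Parquet file through OPENROWSET. The storage account, container and path in USERDATA_URL are hypothetical placeholders, not the ones from the example above, and the helper function name is made up for this sketch.

```python
def build_top_n_parquet_query(file_url: str, n: int = 100) -> str:
    """Build a T-SQL query that previews the first n rows of a Parquet file."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    # Double any single quotes so the URL stays a valid T-SQL string literal.
    escaped_url = file_url.replace("'", "''")
    return (
        f"SELECT TOP {n} *\n"
        "FROM OPENROWSET(\n"
        f"    BULK '{escaped_url}',\n"
        "    FORMAT = 'PARQUET'\n"
        ") AS [result]"
    )


# Hypothetical data lake location of the userdata Parquet file.
USERDATA_URL = "https://examplelake.dfs.core.windows.net/raw/userdata/userdata.parquet"

query = build_top_n_parquet_query(USERDATA_URL)
print(query)
```

From here the Data Engineer would typically replace the `*` with named columns and add filters or joins as the query logic grows.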
A great step forward, and I am looking forward to even closer integration between the core analytical and data services in Azure.