Orchestras rehearse so that every musician plays in sync. For business teams, however, using data in sync requires several steps to make that data useful. One challenge is that teams often use different platforms and applications that generate data in varying formats and types. It takes time and effort to parse those data types and connect the data so it becomes valuable information.
One of the newest solutions aimed at easing these orchestration woes is Microsoft Azure Purview, a cloud platform-as-a-service (PaaS) that provides a unified view of data sources across an organization. With that unified view, teams have a better sense of the data they must manage and can better coordinate data management issues such as data governance.
An Introduction to Microsoft Azure Purview
Microsoft launched a beta version of Azure Purview in December 2020. At its Ignite 2021 conference in March, the company announced a series of updates.
First, Microsoft expanded the sources customers can scan and classify in Azure Purview. When Azure Purview debuted, it supported Teradata, on-premises SQL Server, Azure data services and Power BI. Now users can also select Amazon S3, on-premises Oracle DB and SAP ERP instances.
Second, users can now use Azure Purview to scan Azure Synapse workspaces across serverless and dedicated SQL pools. As I reported in an earlier post, Synapse acts as a bridge that brings enterprise data sources into analytics and machine learning efforts. So while Synapse focuses on analytics, Purview focuses on organizing the data sources involved in that data movement. With the Purview integration, users can discover local data through a Purview-powered search from within their Synapse workspaces.
To begin, an analyst uses an Azure account to set up the Purview console interface. This interface links to connectors, an abstraction that copies metadata from the data sources. The connectors form an ongoing link to those sources without moving any data from where it resides. If the data schema changes or a new table is added, the connectors automatically account for the change.
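To make the connector idea more concrete, here is a minimal Python sketch of how a schema change might be detected by comparing two metadata snapshots taken on successive scans. It does not use any Purview API: the snapshot format, the `diff_snapshots` function and the table names are all hypothetical. In keeping with how the connectors are described, only metadata is compared; the underlying data never moves.

```python
# Hypothetical model of the metadata a connector copies: each table name maps
# to its column names. Real Purview scans capture far richer metadata.
Snapshot = dict[str, tuple[str, ...]]


def diff_snapshots(old: Snapshot, new: Snapshot) -> dict[str, list[str]]:
    """Compare two metadata snapshots and report what changed between scans."""
    added = sorted(new.keys() - old.keys())      # tables that appeared
    removed = sorted(old.keys() - new.keys())    # tables that disappeared
    changed = sorted(                            # tables whose columns differ
        table for table in old.keys() & new.keys() if old[table] != new[table]
    )
    return {"added": added, "removed": removed, "schema_changed": changed}


# Two scans of the same made-up source: a column and a table were added.
old = {"sales.orders": ("id", "amount"), "sales.customers": ("id", "email")}
new = {
    "sales.orders": ("id", "amount", "currency"),
    "sales.customers": ("id", "email"),
    "sales.returns": ("id", "order_id"),
}
print(diff_snapshots(old, new))
# {'added': ['sales.returns'], 'removed': [], 'schema_changed': ['sales.orders']}
```

Comparing snapshots rather than copying data is what lets a catalog stay current cheaply; a production system would also track column types, ownership and timestamps.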
Analysts can then use the search text window to scan the connectors. The query scans across the metadata from the connectors, then classifies that metadata using 100 classifiers defined through machine learning. Analysts can also create custom classifiers for metadata they anticipate needing. The query ultimately returns a data map, with results organized in a hierarchy of tables, databases and server locations. The hierarchy serves as a display of data lineage, a diagram showing where data is sourced and how data at those sources is systematically consumed.
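The classification and data map steps can be illustrated with a simplified Python sketch. For illustration only, it assumes classifiers are regular expressions tested against column names; Purview's actual classifiers work differently, and the `classify`, `add_custom_classifier` and `build_data_map` helpers here are hypothetical, not part of the product. The result nests classified columns by server, database and table, mirroring the hierarchy the query returns.

```python
import re
from collections import defaultdict

# Hypothetical classifiers: label -> regex tested against a column name.
# Illustrative only; these are not Purview's built-in rules or API.
CLASSIFIERS = {
    "Email Address": re.compile(r"e-?mail", re.IGNORECASE),
    "Phone Number": re.compile(r"phone|mobile", re.IGNORECASE),
}


def add_custom_classifier(label: str, pattern: str) -> None:
    """Register an analyst-defined classifier (hypothetical helper)."""
    CLASSIFIERS[label] = re.compile(pattern, re.IGNORECASE)


def classify(column: str) -> list[str]:
    """Return every label whose pattern appears in the column name."""
    return [label for label, rx in CLASSIFIERS.items() if rx.search(column)]


def build_data_map(columns: list[tuple[str, str, str, str]]) -> dict:
    """Nest (server, database, table, column) metadata as server > database > table."""
    data_map: dict = defaultdict(lambda: defaultdict(dict))
    for server, database, table, column in columns:
        data_map[server][database].setdefault(table, {})[column] = classify(column)
    return data_map


add_custom_classifier("Employee ID", r"emp_?id")
metadata = [
    ("sql-prod-01", "hr", "employees", "emp_id"),
    ("sql-prod-01", "hr", "employees", "work_email"),
    ("sql-prod-01", "sales", "customers", "mobile"),
]
data_map = build_data_map(metadata)
print(data_map["sql-prod-01"]["hr"]["employees"])
# {'emp_id': ['Employee ID'], 'work_email': ['Email Address']}
```

Keeping custom classifiers in the same registry as the built-in ones means every later scan applies them automatically, which is the practical benefit of defining them ahead of time.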
A data catalog presents all of the available data sources. It is accessible to the business teams who use the data, so everyone can stay up to date on what kinds of data assets are available, from images to reports. Users can also create a glossary of business terms to introduce organizational vocabulary into the data catalog. This can help express data governance needs in language familiar to the affected business teams.
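A glossary becomes useful once business terms point at catalog assets. The Python sketch below is a toy model of that relationship, not Purview's glossary API: the `Catalog` and `GlossaryTerm` classes, the asset IDs and the sample definition are all hypothetical. Business users search by a familiar term and get back every asset tagged with it, whatever its type.

```python
from dataclasses import dataclass, field


@dataclass
class GlossaryTerm:
    """A business term with a plain-language definition (hypothetical model)."""
    name: str
    definition: str
    assets: set[str] = field(default_factory=set)  # catalog asset IDs tagged with this term


class Catalog:
    """A toy catalog: registered assets plus a business glossary pointing at them."""

    def __init__(self) -> None:
        self.assets: dict[str, str] = {}              # asset ID -> type (table, report, ...)
        self.glossary: dict[str, GlossaryTerm] = {}   # lowercased term -> term

    def register_asset(self, asset_id: str, asset_type: str) -> None:
        self.assets[asset_id] = asset_type

    def define_term(self, name: str, definition: str) -> None:
        self.glossary[name.lower()] = GlossaryTerm(name, definition)

    def tag(self, term: str, asset_id: str) -> None:
        """Link a glossary term to a registered asset; unknown assets are rejected."""
        if asset_id not in self.assets:
            raise KeyError(f"unknown asset: {asset_id}")
        self.glossary[term.lower()].assets.add(asset_id)

    def find(self, term: str) -> list[str]:
        """Return the assets a business user finds by searching a term (case-insensitive)."""
        entry = self.glossary.get(term.lower())
        return sorted(entry.assets) if entry else []


catalog = Catalog()
catalog.register_asset("sql-prod-01/sales/customers", "table")
catalog.register_asset("powerbi/Churn Dashboard", "report")
catalog.define_term("Active Customer", "Hypothetical: a customer with a purchase in the last 90 days")
catalog.tag("Active Customer", "sql-prod-01/sales/customers")
catalog.tag("active customer", "powerbi/Churn Dashboard")
print(catalog.find("ACTIVE CUSTOMER"))
# ['powerbi/Churn Dashboard', 'sql-prod-01/sales/customers']
```

Matching terms case-insensitively is a small design choice that keeps the glossary forgiving for nontechnical users who type a term however they remember it.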
Azure Purview, Off to a Good Start
Microsoft has seen high adoption of Azure Purview since its December debut. Calling the reception a "roaring success" on its blog, the company said customers have used Azure Purview to automatically scan and classify more than 14.5 billion data assets.
Business teams will always face a complex array of data management choices that can challenge the comprehensiveness of governance tasks. Azure Purview gives business analysts and managers the means to break down those operational challenges quickly so that everyone can hit the right note.