Data blending may as well be a dance craze, given how popular the capability is among analytics solutions. Solution providers are refining their data import features to give analysts more creative options for their data models.
Over the last several months, Google has been refining Data Studio to provide better blending capability and more flexibility in combining data. In February, the data blend function was expanded, building on updates to the data source selector made last fall. Together, these changes make data joins in the platform more intuitive to use.
Start With Connectors to Know Where Data Is Being Sourced
Understanding the value of data blending starts with appreciating connectors, plug-in extensions that allow solutions to share data.
Google has a library of connectors that users can select. Twenty-two connectors feature Google's other cloud services, such as Google Ads, Firebase, Google Analytics and BigQuery. There are also connectors for other platforms such as Amazon Redshift and Microsoft SQL Server, both of which were released from beta this year. Third-party vendors like Supermetrics have developed connectors for other data platforms where an API is available. As a result, there is a wide variety of data sources available for import, ranging from Twitter ads to Reddit.
When you are logged in to a Data Studio account, you can select a connector and then choose tables through it using the data blend option in the data selector.
You can view which connectors have been added to your Data Studio report. The view also indicates who owns each connector and the date when it was last accessed.
Where Joins Can Join In
The updated data source selector introduced more configuration options for joins. A join connects tables through fields they have in common. A join condition is a field or set of fields that can be found in each table and used to link the records of those tables together.
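To make the idea of a join condition concrete, here is a minimal Python sketch. The tables, field names and values are hypothetical, and the code only illustrates the concept of matching records on shared fields; it does not show how Data Studio works internally.

```python
# Two small hypothetical tables, each a list of records (dicts).
sessions = [
    {"date": "2022-03-01", "campaign": "spring", "sessions": 120},
    {"date": "2022-03-01", "campaign": "brand", "sessions": 80},
    {"date": "2022-03-02", "campaign": "spring", "sessions": 95},
]
ad_costs = [
    {"date": "2022-03-01", "campaign": "spring", "cost": 40.0},
    {"date": "2022-03-02", "campaign": "spring", "cost": 35.5},
]

# The join condition: fields that exist in both tables.
JOIN_FIELDS = ("date", "campaign")


def key_of(record, fields):
    """Return the tuple of values used to match this record."""
    return tuple(record[f] for f in fields)


# Index the second table by its join-condition values.
# Assumption for this sketch: each key appears at most once in ad_costs.
cost_by_key = {key_of(r, JOIN_FIELDS): r for r in ad_costs}

# Link each session record to the cost record with the same key, if any.
linked = [
    (s, cost_by_key[key_of(s, JOIN_FIELDS)])
    for s in sessions
    if key_of(s, JOIN_FIELDS) in cost_by_key
]
```

Here two of the three session records find a partner, because only the "spring" campaign has cost records on matching dates.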
The join operators are the same types of joins you would see in SQL, Python or any language used to query data. Originally only a left join was available. Now the data blend function has five choices: a cross join, an inner join and three outer join operators. An inner join returns only the rows that match in both tables, and a cross join pairs every row of one table with every row of the other. The three outer joins are the previously available left outer join, a right outer join and a full outer join. They keep all rows from the left table, the right table, or both tables, respectively, adding matching rows from the other table where the join condition is met.
Google's support documents refer to the resulting table, which contains the combined fields from a data blend, as a blend. The blend represents the output of a data join. By default, data blending is a left outer join in which the primary source, on the left, is joined to a secondary source, on the right. Together they make a table of blended data.
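The five join types can be compared with a short, plain-Python sketch. The `join_rows` function and the sample tables are hypothetical and written only for illustration; they follow the standard definitions of these joins, not Data Studio's implementation.

```python
from itertools import product


def join_rows(left, right, key, how="left"):
    """Pair up rows from two tables (lists of dicts).

    Returns a list of (left_row, right_row) tuples; a side with no
    match is represented by None. `key` is ignored for a cross join.
    """
    if how == "cross":
        # Every left row paired with every right row; no join condition.
        return list(product(left, right))
    if how not in ("inner", "left", "right", "full"):
        raise ValueError(f"unknown join type: {how}")

    pairs = []
    matched_right = set()  # indexes of right rows that found a partner
    for l_row in left:
        found = False
        for i, r_row in enumerate(right):
            if l_row[key] == r_row[key]:
                pairs.append((l_row, r_row))
                matched_right.add(i)
                found = True
        # Left and full outer joins keep unmatched left rows.
        if not found and how in ("left", "full"):
            pairs.append((l_row, None))

    # Right and full outer joins keep unmatched right rows.
    if how in ("right", "full"):
        for i, r_row in enumerate(right):
            if i not in matched_right:
                pairs.append((None, r_row))
    return pairs


# Hypothetical sample data: customer 1 has no order, order 4 has no customer.
customers = [
    {"customer_id": 1, "name": "Ana"},
    {"customer_id": 2, "name": "Ben"},
    {"customer_id": 3, "name": "Cy"},
]
orders = [
    {"customer_id": 2, "total": 30},
    {"customer_id": 3, "total": 45},
    {"customer_id": 4, "total": 10},
]

counts = {
    how: len(join_rows(customers, orders, "customer_id", how))
    for how in ("inner", "left", "right", "full", "cross")
}
```

With this sample data, `counts` holds 2 rows for the inner join, 3 for the left and right outer joins, 4 for the full outer join and 9 for the cross join.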
Reliable Centralized Platform
Another advantage of Google Data Studio is its simple handling of join keys.
Usually in query languages such as SQL, a join key must be written into the query to pull data from separate tables together. Google Data Studio does not require you to write a key into a query once the data is imported through a connector; the fields to match on are chosen in the blend configuration.
One drawback of the blend feature in Google Data Studio is that no query syntax is easily visible to verify a request. In SQL you can see how tables are joined on rows and columns within the query syntax. Because the joins are abstracted, it can be hard to inspect whether the right data is being pulled together. It is also hard to see whether missing data is treated as NULL or N/A once the tables are combined; many programming languages and platforms treat missing data differently in a result. Many repositories, like data.world, let you preview SQL syntax that represents a potential combination of data. Blending data can be complex when several tables and different field combinations are needed.
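For comparison, here is a small sketch of what that visibility looks like in SQL, run through Python's built-in `sqlite3` module. The table and column names are hypothetical. The join type and the key both appear in the query text, and the unmatched row shows how this particular stack represents missing data: SQL NULL arrives in Python as `None`.

```python
import sqlite3

# In-memory database with two hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sessions (campaign TEXT, sessions INTEGER);
    CREATE TABLE ad_costs (campaign TEXT, cost REAL);
    INSERT INTO sessions VALUES ('spring', 120), ('brand', 80);
    INSERT INTO ad_costs VALUES ('spring', 40.0);
    """
)

# The join type (LEFT JOIN) and the join key (campaign) are both
# readable directly in the query text.
QUERY = """
    SELECT s.campaign, s.sessions, c.cost
    FROM sessions AS s
    LEFT JOIN ad_costs AS c ON s.campaign = c.campaign
    ORDER BY s.campaign
"""
rows = conn.execute(QUERY).fetchall()
conn.close()

# 'brand' has no cost record, so its cost column is NULL in SQL,
# which sqlite3 returns to Python as None.
for row in rows:
    print(row)
```

Other tools may render the same gap as N/A, NaN or an empty string, which is why it helps to check how a blended result handles unmatched rows.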
Despite some complexity, the latest data blending features in Google Data Studio will help analysts quickly identify how data can best be used for reports and decisions. The platform provides a reliable, centralized place to develop solid visualizations and analysis from a variety of data.