Cloud data warehousing company Snowflake’s record IPO last month left the company valued at more than $66 billion. Why should those who manage customer data in systems such as customer data platforms (CDPs) care about this particular event on the public markets?
Well, for starters, cloud data warehouses like Snowflake enable direct SQL queries against stored data while also accepting non-structured data, according to David Raab, founder of the CDP Institute. “It can scale pretty much infinitely,” he added, “and this happens automatically, so no painstaking database design is needed to maintain good performance. It can also create virtual versions of the underlying data in different formats, so different systems can access the data via Snowflake without having to make separate copies. All this flexibility means it’s easier to deploy a CDP and to make changes when new data is added.”
These features are directly relevant to customer data, which often requires adding new sources, accepting new data elements within existing sources and sharing the resulting customer profiles with different systems, according to Raab. This should greatly reduce the labor required to build and maintain a customer database.
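Raab's two points, loading records whose shape isn't fixed in advance, then exposing them to other systems through virtual views rather than copies, can be sketched in miniature. The example below is only a conceptual stand-in: it uses Python's built-in sqlite3 with JSON text columns to play the role that Snowflake's semi-structured column types and views play at warehouse scale; the table and field names are invented for illustration.

```python
import json
import sqlite3

# Stand-in for a cloud warehouse: SQLite with JSON stored as text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (payload TEXT)")

# Load records with different shapes -- no schema change is needed
# when a new element ("loyalty_tier") appears in the source feed.
records = [
    {"customer": "a1", "channel": "web"},
    {"customer": "b2", "channel": "email", "loyalty_tier": "gold"},
]
conn.executemany(
    "INSERT INTO events VALUES (?)", [(json.dumps(r),) for r in records]
)

# A view presents the semi-structured data in a flat, relational shape,
# so downstream systems can query it without making a separate copy.
conn.execute("""
    CREATE VIEW customer_events AS
    SELECT json_extract(payload, '$.customer') AS customer,
           json_extract(payload, '$.channel')  AS channel,
           json_extract(payload, '$.loyalty_tier') AS loyalty_tier
    FROM events
""")
rows = conn.execute("SELECT * FROM customer_events").fetchall()
print(rows)  # [('a1', 'web', None), ('b2', 'email', 'gold')]
```

The record without a loyalty tier simply surfaces a NULL, which is the flexibility Raab describes: new elements arrive without an expert redesigning the schema first.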
Snowflake Capabilities and CDPs
Several customer data platforms use Snowflake, according to Raab, who cited Simon Data in particular. Snowflake, he said, is well-suited to CDPs because it’s flexible in terms of the kinds of data it can store and, being “natively” cloud-based, highly extensible in terms of volume.
“Being well-suited,” Raab said, “it reduces the work that a CDP developer needs to do, which makes it easier to create a CDP product and to deploy CDPs at individual clients. Other cloud databases on Google Cloud, AWS, and Azure have similar advantages.”
Standard relational databases like Oracle or SQL Server are easy to query but require each data element to be defined in advance, according to Raab. They must be carefully designed to give good performance, especially when volumes are high. “This means adding even a single data element can require an expert database designer, and that unexpected data can’t be added until the database is modified to accommodate it,” Raab said. “The main alternative to relational databases is to use a distributed file structure like Hadoop, which stores massive volumes of data distributed over many network nodes. These can easily add new data elements, since there’s no predetermined structure the data must squeeze into.”
However, Raab said, querying those data stores requires scanning the entire data set, which often isn’t practical. Users can get around this by extracting selected elements into a more structured format, using databases like Apache HBase. “That works but now you’re again limited to what has already been anticipated,” he said.
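The workaround Raab describes, pulling selected elements out of a raw store into a structured, queryable format, carries exactly the limitation he names. A minimal sketch of that projection step, with invented field names standing in for an HBase-style extract:

```python
# Hypothetical raw event feed (names are illustrative, not from any real system).
raw_events = [
    {"customer": "a1", "channel": "web", "page": "/pricing"},
    {"customer": "b2", "channel": "email", "campaign": "fall-promo"},
]

# Fields anticipated at design time -- the structured extract keeps only these.
ANTICIPATED = ("customer", "channel")

def extract(events):
    """Project each raw record onto the anticipated columns."""
    return [{k: e.get(k) for k in ANTICIPATED} for e in events]

table = extract(raw_events)
print(table)
# The "page" and "campaign" elements never reach the extract, so any
# query against it is limited to what was anticipated up front.
```

This is the trade-off in the quote: the extract is fast to query, but asking a new question about "campaign" means going back and rebuilding it.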
Related Article: Why US Clouds Are Creating Data Problems for Europeans
Finding a Home in the Cloud
What led to Snowflake recording the largest software IPO to date? The cloud has increasingly become the home for customer data and personally identifiable information (PII) used in analytics and machine learning, according to Rick Farnell, CEO of data security company Protegrity. “Snowflake correctly read the tea leaves of cloud data innovation... With a strong focus on SQL as an engine, Snowflake’s capabilities are available to the world’s most popular languages for interacting with customer data.” This approach, he added, reduces the back-end issues that routinely slow innovative analytics projects.
Snowflake’s IPO will fuel innovation for all supporting players in cloud data warehousing, Farnell said, sparking a “snowball” effect that drives new investments and technological breakthroughs. His own company is aligning with the future of data stores like Snowflake, he added, and the same effect will play out in the entrepreneurial and venture capital cycle, bringing new money into cloud data technologies.
Organizations must still be vigilant about managing enterprise data strategies, including data security and privacy policies, even as they move sensitive customer data to the cloud, he added.
Why Amazon, Snowflake Entered the Cloud Game
Russell P. Reeder, CEO of disaster-recovery-as-a-service provider Infrascale, said managing and analyzing data has always been an opportunity and a challenge. From the initial creation of the relational database management system (RDBMS) to NoSQL to object-oriented databases, IT professionals are on a constant search for where to store and analyze their data.
“The invention of multi-dimensional business intelligence systems and data warehousing solutions provides the right data storage and analytical tools, but still leaves much of the programming and capital expense to the company,” he added.
Amazon in 2013 launched its cloud-based data warehousing solution, Redshift. Now, Snowflake and many other cloud-based data warehousing solutions take advantage of cost-effective security and scalability that the public cloud enables. “Snowflake makes it easier for IT professionals to quickly move their data warehousing to the cloud and leverage pre-built analysis tools,” Reeder said.
Related Article: Snowflake Sets Record IPO
The big data and analytics space had incumbents with solutions similar to Snowflake, but their products had become victims of their own success and couldn’t adapt to support new business models, according to Oskari Saarenmaa, CEO of open-source cloud data platform provider Aiven.
“Snowflake certainly found a great market and went on to build a great product for it,” Saarenmaa said. “Cloud-native solutions and related flexible business models open new opportunities. A majority of the market is still based on old-fashioned on-premise products, but it is moving rapidly to the cloud. This is where the growth happens now.”
Snowflake succeeded by going full speed in creating a scalable cloud-native solution, rather than trying to "patch old ways of working" with complex on-premise or hybrid solutions, Saarenmaa said. “Still,” he said, “it maintained the ability to use existing tools and skills in enterprises with its SQL compatibility.”
Snowflake also introduced a new flexible pricing model, separating computing and storage, which allows companies to start cost-effectively and grow. It also provided a 360-degree data analytics stack “as-a-service,” and not "yet-another-hard-to-implement-database" product, according to Saarenmaa.
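The pricing model Saarenmaa highlights is easy to see in a back-of-envelope calculation: when storage and compute are metered separately, an idle warehouse costs only its storage. The rates below are made-up placeholders for illustration, not Snowflake's actual prices.

```python
# Illustrative model of separated storage and compute billing.
STORAGE_PER_TB_MONTH = 25.0   # hypothetical $/TB/month
CREDIT_PRICE = 3.0            # hypothetical $ per compute credit

def monthly_cost(storage_tb: float, credits_used: float) -> float:
    """Storage and compute are metered independently."""
    return storage_tb * STORAGE_PER_TB_MONTH + credits_used * CREDIT_PRICE

# A warehouse that sits idle burns no compute credits, so a team can
# keep a large history cheaply and pay for queries only when they run.
idle_month = monthly_cost(storage_tb=10, credits_used=0)
busy_month = monthly_cost(storage_tb=10, credits_used=200)
print(idle_month, busy_month)  # 250.0 850.0
```

Under a coupled model, that ten-terabyte history would carry its compute cost even in the idle month, which is why separation lets companies "start cost-effectively and grow."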
“Snowflake's extensive ecosystem of integrations with related products and tools makes it easy for customers to build automated workflows for their data,” he said. “Being a cloud-native software-as-a-service platform means that their solution is easier, more flexible and more scalable than traditional on-premise solutions. The Snowflake solution is also available in the cloud that each customer wants, with a flexible pricing model. It doesn't force customers to move to a specific cloud and lock them there.”
Architecting the Right Data Cloud Program
Snowflake’s record IPO is certainly intriguing, and moving data and applications to the cloud has many advantages. However, Reeder noted, it also has many downsides if you do not architect your solution correctly.
“Just because you move your data to a cloud-based data warehousing solution like Snowflake, it does not mean that your data cannot become corrupted,” he said. “And if you do not have a backup of your data, you could easily lose days or weeks of work. IT professionals also have to be cautious when using the various applications and pre-built tools to analyze data. Using these tools brings speed and cost advantages, but if not designed correctly, they could lead to vendor lock-in that limits the ability to move to another provider in the future.”
Designing for the End Result
Before jumping in head first and uploading data to a cloud-based data warehouse, you must first ask yourself what type of analysis you are looking to perform, Reeder said, and what answers you are looking to get from your data.
“When choosing a cloud-based data warehousing solution like Snowflake, you also need to understand from where you will pull your data and what other data sources you need to access to help make better sense of it,” Reeder added. Making sure that you can easily import your data and have access to other third-party data sources is critical to getting the right answers. “As with most projects,” Reeder added, “if you do not first define what success looks like, your project will never be successful.”