3 Overlooked Fundamentals That Plague Big Data Projects

By Joanna Schloss

Earlier this month (yes, they were still playing baseball in November!), the Kansas City Royals won the World Series for the first time in 30 years. And they did so largely on the strength of fundamentals. 

At a time when most organizations in baseball have become infatuated with sluggers who can hit the ball a mile and hard-throwing pitchers who can light up the radar gun, the Royals stood out by excelling at the basics of the game. They threw strikes, fielded their positions and made smart decisions on the bases. Though not splashy, these fundamental skills, deployed consistently, repeatedly and reliably, added up to a world championship for Kansas City.

At the same time the World Series was going on, many notable IT vendors were staging their marquee annual events. And if there was a common theme present throughout this fall’s IT event season, it was that seemingly everyone is all-in on the mega-trends, with big investments being made in exciting new big data, data analytics and IoT technologies. 

As with home runs and 100-mph fastballs in baseball, these splashy new technologies can pay huge dividends for organizations large and small. But they can likewise be quickly and decisively undermined by poor data management fundamentals.

Forgotten Fundamentals

Even with today’s increasingly sophisticated big data technologies and the continued maturation of the overall big data ecosystem, a large portion of big data projects still fail. And when they do, it’s more often than not on account of flawed data management fundamentals. 

Unquestionably, many data management fundamentals are core to the success of any big data initiative. In talking with customers, however, three in particular stand out as the most often overlooked — and, as a consequence, the most often responsible for undermining big data projects: the ability to ensure the accuracy and reliability of data and databases, the ability to gain a complete view of all data, and the ability to analyze data without impacting core business processes.

Now, you’re probably thinking that each of these fundamental data management capabilities seems obvious, and in turn, wondering how it is they can be so often overlooked. But that line of thinking — understandable as it is — is precisely what makes fundamentals so easy to miss. 

Fundamentals, by nature, are things we assume will be done and done right. Otherwise they wouldn’t be fundamentals. So just as a baseball manager assumes his players know how to bunt and how to properly run the bases, a CMO or CIO who green-lights a big data project assumes that the data used will be accurate and complete, and that running the project won’t adversely impact a core business process.

Unfortunately, such assumptions are often incorrect, and projects are quickly undermined as a result. With that in mind, let’s take a closer look at each of these core data management fundamentals, and examine what businesses can do to solidify them.

Ensure data and database accuracy

In their rush to analyze data, organizations too often fail to ensure the accuracy of their data and databases. For any big data project to be successful, you need to ensure the accuracy of both your source data (original, unmodified data that flows into your organization) and your target data (structures, such as data marts or data warehouses, created by transforming that source data).

Again, data accuracy is not something a C-level or executive project sponsor is likely to question. So it’s the job of IT to ensure it. The best way to do so is to have a reliable data curation process in place that is extensible across the entire organization. That requires investing in core data management technology that enables DBAs to ensure that the databases they manage are properly administered, and that the data they deliver to analysts is properly reconciled.
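As a concrete illustration of what "properly reconciled" can mean in practice, here is a minimal sketch of a source-to-target reconciliation check. The table and column names (`orders_src`, `orders_dw`, `amount`) are hypothetical, and an in-memory SQLite database stands in for what would normally be two separate systems; real reconciliation jobs typically compare far more than row counts and a single column sum.

```python
import sqlite3

def reconcile(conn, source_table, target_table, amount_col):
    """Compare row counts and a column checksum between source and target.

    In production the source and target usually live in different
    databases; a single connection is used here only to keep the
    sketch self-contained.
    """
    counts, sums = {}, {}
    for table in (source_table, target_table):
        cur = conn.execute(
            f"SELECT COUNT(*), COALESCE(SUM({amount_col}), 0) FROM {table}"
        )
        counts[table], sums[table] = cur.fetchone()
    return {
        "row_count_match": counts[source_table] == counts[target_table],
        "checksum_match": sums[source_table] == sums[target_table],
    }

# Demo data: a source table and the warehouse table derived from it.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders_src (id INTEGER, amount REAL);
    CREATE TABLE orders_dw  (id INTEGER, amount REAL);
    INSERT INTO orders_src VALUES (1, 10.0), (2, 25.5);
    INSERT INTO orders_dw  VALUES (1, 10.0), (2, 25.5);
""")
print(reconcile(conn, "orders_src", "orders_dw", "amount"))
```

A check like this, run automatically after every load, is what lets a DBA assert to analysts that the data marts they query actually agree with the systems of record.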

Provide a complete view of ALL data

The hype surrounding big data has obscured the fundamental need for companies to connect to and analyze all data. Large or small, structured or unstructured, on-premises or off — all data has value and all data is needed to ensure a complete view of the trends impacting your company and its customers. Failing to deliver a complete view across all of your data environments undermines the credibility and findings of any big data project.

The ability to reconcile data from both SQL and NoSQL environments is thus paramount. Organizations should consider increasing their investment in multi-lingual DBAs and in data and database technologies that are truly platform agnostic. A setup in which your organization has a person and tool for Oracle, a person and tool for SQL Server, and a person and tool for Hadoop generally won’t work. 

You need people and database tools capable of working simultaneously across all of your environments, and you need to have consistent and meaningful collaboration between IT and lines of business. Only then can you achieve the complete view that is fundamental to the success of any so-called big data project.
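To make the idea of a "complete view" concrete, here is a minimal sketch of joining records from a relational store with documents from a NoSQL-style store into one result set. The entities (customers and support tickets) and field names are invented for illustration, and a list of dicts stands in for a document database.

```python
import sqlite3

# Relational side: customers in a SQL database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "Acme"), (2, "Globex")])

# Document side: support tickets from a NoSQL store, modeled as dicts.
tickets = [
    {"customer_id": 1, "status": "open"},
    {"customer_id": 1, "status": "closed"},
    {"customer_id": 2, "status": "open"},
]

def unified_view(conn, tickets):
    """Join SQL customers with document-store tickets into one record set."""
    counts = {}
    for t in tickets:
        counts[t["customer_id"]] = counts.get(t["customer_id"], 0) + 1
    return [
        {"id": cid, "name": name, "tickets": counts.get(cid, 0)}
        for cid, name in conn.execute("SELECT id, name FROM customers")
    ]

print(unified_view(conn, tickets))
```

The point isn’t the ten lines of glue code; it’s that someone — a multi-lingual DBA or a platform-agnostic tool — has to own that glue across every environment, or the "complete view" quietly develops gaps.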

Delivering analytics without impacting business systems

A seemingly obvious and yet far-too-often-overlooked fundamental of any big data initiative is to ensure that analytics are delivered without adversely impacting business systems, such as online transaction processing (OLTP) systems. Running customer analytics that slow your point of sale to a crawl and impede your customers’ ability to transact with you is the very definition of counterproductive. And yet, many companies still run analytic processes against their transactional systems.

This fundamental rule should never be broken. To ensure that it isn’t, consider leveraging replication to create analytic sandboxes where you can run analytics without impacting production systems. In keeping with the spirit of the two aforementioned fundamentals, choose a platform-agnostic replication tool with real-time change data capture capabilities. You can then integrate data from across your various database environments while ensuring that the data in your sandbox remains up to date and accurate.
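The mechanics of change data capture can be sketched in a few lines. This is a toy model, not a real replication tool: plain dicts stand in for the production OLTP store and the analytic sandbox, and an in-process change log stands in for what a CDC product would read from the database’s transaction log.

```python
# Toy model of change data capture: every production write is captured
# in a change log, and the log is drained into a sandbox replica.
# Analytics query the replica and never touch the production store.

production = {}   # stands in for the OLTP system
sandbox = {}      # analytic replica
change_log = []   # captured changes, oldest first

def write(key, value):
    """Production write path: apply the change and capture it."""
    production[key] = value
    change_log.append(("upsert", key, value))

def replicate():
    """Drain captured changes into the sandbox, in commit order."""
    while change_log:
        op, key, value = change_log.pop(0)
        if op == "upsert":
            sandbox[key] = value

write("order-1", {"amount": 10.0})
write("order-2", {"amount": 25.5})
replicate()
print(sandbox)
```

Because the sandbox is fed from the change stream rather than queried live, heavy analytic workloads add no load to the transactional system, yet the replica stays current to within replication lag.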

Fundamentals for Success

In big data — as in baseball — fundamentals are often what separate the most successful organizations from the rest of the pack. So, while there’s nothing wrong with investing in and getting excited about the latest and greatest trends and technologies, make sure you never lose sight of your core data management fundamentals. Focusing on fundamentals helped the Royals win the World Series. Imagine what it can help your business do.

Title image: Creative Commons Attribution-Share Alike 2.0 Generic license by nivs

About the author

Joanna Schloss

Joanna Schloss is Senior VP of Product Marketing at SmartBear and has more than 20 years of experience successfully transforming and evolving both global 500 companies and startups. She has extensive knowledge in big data analytics and business intelligence and has launched a variety of tools and applications for various companies, including Confluent, IBM, and Oracle, among others.