Injecting Magic Into the Everyday

Cloud computing is something of a magic trick. And to borrow from the movie "The Prestige," every magic trick consists of three parts.

First, a magician shows you something ordinary — a bird, a coin, a deck of cards. Second, he makes this ordinary something do something extraordinary — for example, he makes it disappear. This is good, but not enough. To amaze, the magician has to do more than make the object vanish: he has to return it to where it was before.

In the same vein, cloud-magicians take something ordinary from your data center — an app, a server, a database — and make it disappear. But unlike a deck of cards, you need that thing. So, when it reappears in the cloud, it doesn’t feel like magic. It's what you expect.

In the cloud, it’s not enough just to bring the object back. A cloud app, server or database has to do more than it did before it left your data center.

Cloud apps are easier to use, scale and modify than apps on-premises. Cloud infrastructure is elastic, effectively infinite and globally available, which a corporate data center could never be. But what about the database? What new abilities should a database have when it reappears in the cloud?

In the cloud, the database should become something else — a data utility. A data utility absorbs and delivers data of any size and shape for any application, analytic or algorithm. Far more than just a database, a data utility is a service that makes it easy for developers to innovate, for operations teams to fulfill service-level agreements (SLAs), and for data scientists and analysts to easily put datasets to new uses. And all of this must be true of data in any format at any scale.


Data Management as a Service: The Data Utility

A cloud data utility, running on an effectively infinite computer and supporting planet-scale digital services, is very different from traditional data management. To the people who use it, such as developers, operations teams and data scientists, the utility looks like a set of common APIs that allow data to flow between different applications. But under the hood, extraordinary engineering translates simple API calls into sophisticated coordination among many services.

When a developer sends a quick call to the data utility requesting a place to hold some data so she can go back to working on her app, a whole lot of machinery kicks into gear behind the scenes. The utility spins up a database, mirrors it in multiple locations to protect against hardware failures, and ensures it’s fully secure and compliant with corporate policies. This all happens in mere seconds.
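To make that concrete, here is a minimal sketch of what such a call might look like from the developer's side. The DataUtility client, its method names and its parameters are all invented for this illustration; the point is only that a one-line request stands in for provisioning, replication and policy enforcement.

```python
# Hypothetical client for an imagined data utility API. None of these
# names correspond to a real product; this illustrates the idea that a
# single call hides provisioning, replication and policy enforcement.

from dataclasses import dataclass, field


@dataclass
class DataStore:
    """What the developer gets back: a handle, not infrastructure."""
    name: str
    endpoint: str
    replicas: list = field(default_factory=list)


class DataUtility:
    """Stand-in for the service doing the behind-the-scenes work."""

    def provision(self, name: str, policy: str = "corporate-default") -> DataStore:
        # In a real utility each step below would be a call to a
        # separate internal service; here they are stubbed out.
        store = DataStore(name=name,
                          endpoint=f"https://data.example.com/{name}")
        store.replicas = self._mirror(store, ["us-east", "eu-west", "ap-south"])
        self._apply_policy(store, policy)  # encryption, access, compliance
        return store

    def _mirror(self, store: DataStore, regions: list) -> list:
        # Mirror the store in multiple locations against hardware failure.
        return [f"{store.name}@{region}" for region in regions]

    def _apply_policy(self, store: DataStore, policy: str) -> None:
        pass  # placeholder for security and compliance enforcement


# The developer's entire interaction: one call, then back to her app.
store = DataUtility().provision("viral-app-events")
print(store.endpoint, store.replicas)
```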

But that’s not all. The developer is working on an app she hopes will go viral, but she can’t be sure. Fortunately, the data utility scales the necessary compute and storage resources up and down at will — and independently of each other — so the infrastructure will grow with demand if the app is a hit and stay small if it’s not. The utility also constantly monitors its every action, anticipating hardware problems and rerouting workloads before failures occur, and diagnosing and resolving performance bottlenecks automatically whenever possible. That makes life much easier for the operations team.
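One way to picture that independent, demand-driven scaling is a simple policy evaluated against observed utilization, with compute and storage each carrying their own bounds. The function and thresholds below are invented for illustration, not drawn from any real service.

```python
# Illustrative sketch: compute and storage scale independently, each
# against its own signal. All names and thresholds here are invented.

def scale(current: int, utilization: float, lo: int, hi: int) -> int:
    """Grow when hot, shrink when idle, stay within bounds."""
    if utilization > 0.80:
        return min(current * 2, hi)   # app is a hit: double, up to the cap
    if utilization < 0.20:
        return max(current // 2, lo)  # app is quiet: halve, down to the floor
    return current                    # steady state: leave it alone


compute_nodes = scale(current=4, utilization=0.91, lo=2, hi=256)    # -> 8
storage_shards = scale(current=8, utilization=0.12, lo=4, hi=1024)  # -> 4
print(compute_nodes, storage_shards)
```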


Data Out, Data In

The data this app produces isn’t just a record of what happened; it’s also a necessary input into new analytics and algorithms. Some of those analytics have to be calculated in real time and presented in the app, others on a batch basis for reports, and still others according to detailed regulatory requirements.

The data science team wants to do something completely different. They’re interested in combining the app’s data with other datasets to uncover possible cross-product and cross-customer correlations. All of these different analyses require distinct arrangements of the original data.

The data utility makes it easier to deliver those original observations to many endpoints by automatically provisioning and populating the data warehouses, lakes and streams that analysts and data scientists request. It simplifies access for all manner of data science tools, including digital notebooks, statistical programming languages and query services, by examining the structure of the data and generating new APIs on its own.
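As a sketch of that provisioning flow, the catalog below accepts requests for derived endpoints and records which dataset each one is built from. The EndpointCatalog API is hypothetical; a real utility would also inspect the source schema and generate matching query APIs, which the comments only gesture at.

```python
# Hypothetical request for derived analytics endpoints. The utility,
# not the analyst, decides how to populate and refresh them. Invented API.

from dataclasses import dataclass


@dataclass
class Endpoint:
    kind: str    # "warehouse", "lake" or "stream"
    source: str  # the dataset it is derived from
    url: str


class EndpointCatalog:
    def __init__(self):
        self._endpoints = []

    def request(self, kind: str, source: str) -> Endpoint:
        # A real utility would examine the source's structure here and
        # generate query APIs to match; this sketch just records intent.
        ep = Endpoint(kind=kind, source=source,
                      url=f"https://analytics.example.com/{kind}/{source}")
        self._endpoints.append(ep)
        return ep


catalog = EndpointCatalog()
warehouse = catalog.request("warehouse", "viral-app-events")  # batch reports
stream = catalog.request("stream", "viral-app-events")        # real-time views
print(warehouse.url, stream.url)
```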

To make sure all analytics and algorithms constantly have the latest data, the data utility keeps track of endpoints as they’re provisioned and retired, the flows of data between them, and the transformations that take place along the way. This data provenance is always available and fully transparent.
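A provenance record of that kind can be pictured as an append-only ledger of data movements, from which any endpoint's lineage can be reconstructed on demand. This is a toy sketch with invented names, not a real lineage system.

```python
# Minimal sketch of a provenance ledger: every hop a dataset makes is
# recorded, so any endpoint can be traced back to its origins.

from datetime import datetime, timezone

ledger = []  # append-only record of data flows and transformations

def record_flow(source: str, target: str, transform: str) -> None:
    ledger.append({
        "source": source,
        "target": target,
        "transform": transform,
        "at": datetime.now(timezone.utc).isoformat(),
    })

def lineage(endpoint: str) -> list:
    """Walk the ledger backwards from an endpoint to its origins."""
    steps = [e for e in ledger if e["target"] == endpoint]
    return [s for e in steps for s in lineage(e["source"])] + steps

record_flow("viral-app-events", "events-cleaned", "drop malformed rows")
record_flow("events-cleaned", "weekly-report", "aggregate by week")

for step in lineage("weekly-report"):
    print(step["source"], "->", step["target"], ":", step["transform"])
```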

Simple automation is not enough to deliver this kind of data utility. Machine learning embedded in every aspect of data management is the key. Every decision made by the utility, from determining good time windows for applying upgrades to generating data indexes that speed up query performance, must be based on observations of actual system performance and fed back into algorithms to tailor their behavior to the work patterns of each activity.
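Here is a toy version of that feedback loop: the utility observes which columns keep appearing in slow queries and nominates them as index candidates. The thresholds and names are invented, and a real system would draw on far richer signals.

```python
# Toy illustration of the feedback loop described above: observe actual
# query behavior, then adapt. Columns that keep appearing in slow
# queries become index candidates. Thresholds and names are invented.

from collections import Counter

slow_query_columns = Counter()  # observations of actual system behavior

def observe(columns: list, elapsed_ms: float, slow_ms: float = 200.0) -> None:
    """Feed each slow query's filter columns back into the tally."""
    if elapsed_ms > slow_ms:
        slow_query_columns.update(columns)

def index_candidates(min_hits: int = 3) -> list:
    """Columns hot enough that an index would likely pay for itself."""
    return [c for c, n in slow_query_columns.most_common() if n >= min_hits]

# Simulated observations from the utility's own monitoring:
observe(["customer_id"], elapsed_ms=450)
observe(["customer_id", "region"], elapsed_ms=310)
observe(["customer_id"], elapsed_ms=275)
observe(["order_date"], elapsed_ms=120)  # fast: not evidence for an index

print(index_candidates())  # -> ['customer_id'], tuned to the observed workload
```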

This is what it means to reinvent data management for the cloud. It means creating an elastic, infinite, globally available data utility using artificial intelligence that constantly tunes the utility’s behavior to the shifting performance demands of intertwined apps, analytics and algorithms.

It looks like magic, but really it’s just technology sufficiently advanced to appear that way.