Feature

Building Seamless Data Pipelines in a Hybrid Cloud Environment

By Pat Brans
Discover how automation tools not only orchestrate data pipelines on-premises but also enhance data sharing in hybrid cloud environments.

The Gist

  • Hybrid solutions. Automation tools bridge data sharing gaps in hybrid cloud environments.
  • Efficiency boost. Orchestration platforms streamline complex data pipelines for better management.
  • Privacy compliance. Automated workflows support stringent data privacy and compliance in hybrid clouds.

One of the biggest challenges in a hybrid cloud environment is sharing data across applications. Some organizations are discovering that the same automation tools they use to orchestrate data pipelines on-premises can also improve data sharing in a hybrid environment. Let’s take a look at hybrid cloud management and the use of automation tools.

A hybrid cloud environment is most commonly defined as an on-premises data center combined with one or more public clouds, usually AWS, Google or Microsoft Azure. Organizations typically adopt a hybrid environment for two primary reasons. The first is a limitation that prevents them from operating entirely on-premises or fully in the cloud. For instance, they may need cloud-native functions, such as the ability to scale up or down quickly, while also maintaining a legacy system that can only run on-premises. Heightened concerns about the privacy of certain data can also discourage an enterprise from hosting everything in the cloud.

Hybrid Cloud Management: Strategy Missteps and Costs

The second common reason is that companies end up with a hybrid cloud inadvertently. Through acquisitions, a large company may find itself with multiple computing environments to manage. Alternatively, an organization might transition to the cloud without a strategic plan, moving applications in a "lift-and-shift" manner that does not take advantage of cloud benefits. Or a company might partially migrate, then discover that it has underestimated the costs involved. These scenarios can cause a loss of momentum and lead the company to abandon further cloud migration plans.

Hybrid Clouds: Bridging Data Location Gaps

No matter how an organization ends up with a hybrid cloud environment, it is likely to face challenges when systems that require data are not co-located with the data's source. "If you're running an analytics platform in the cloud, and you need to query a data source on-premises or on another cloud, you have to reach from the cloud into that other place," says David Shannon, head of hyperautomation at SAS UK & Ireland.

Manual Data Moves Risk Errors, Delays

Organizations may occasionally rely on manual processes to transfer information. Someone extracts data and uploads it to the cloud for use by an analytics platform. These manual methods are susceptible to human error and cause delays. Additionally, when data is prepared in a data center and moved to the cloud, two versions of the same information are created, leading to potential discrepancies if one changes and the other does not.
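The duplication risk described above can be made concrete with a small check. The following is a minimal sketch, not a description of any particular product: it assumes both copies can be read as lists of records, and the function names and sample data are hypothetical. It fingerprints each copy so that drift between the on-premises version and the cloud version is detected rather than discovered later.

```python
import hashlib
import json


def fingerprint(rows):
    """Return a SHA-256 digest of the rows, independent of row order."""
    canonical = sorted(json.dumps(row, sort_keys=True) for row in rows)
    return hashlib.sha256("\n".join(canonical).encode("utf-8")).hexdigest()


def copies_match(on_prem_rows, cloud_rows):
    """True when both copies hold exactly the same records."""
    return fingerprint(on_prem_rows) == fingerprint(cloud_rows)


# Hypothetical sample data: the on-premises price changed after the upload.
on_prem = [{"id": 1, "price": 10.0}, {"id": 2, "price": 12.5}]
cloud = [{"id": 2, "price": 12.5}, {"id": 1, "price": 9.5}]

print(copies_match(on_prem, on_prem[::-1]))  # same records, different order
print(copies_match(on_prem, cloud))          # one record has drifted
```

An automated pipeline could run a check like this after every transfer, which a manual upload process rarely does.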

Another set of challenges in hybrid cloud management is that you have multiple environments to manage, and the platforms require different skills. "Sometimes administration becomes ad hoc, with the different teams not communicating," says Martin Hulbert, CTO at Ignite Technology.

Cloud spending continues to grow at a very high rate. Gartner recently forecast that enterprise cloud investment will grow at a rate of 19.5% over the next five years.

"Many organizations find themselves in a situation where they cannot live without the cloud, and they also need to maintain assets on-premises," says Hulbert. "Now they want to get the different environments to work better together." 

Automation Platforms Connect Different Environments

"A good solution is to set up a system that enables administrators to initiate a query at another location," says Shannon. "Software agents that sit on-prem or in the data center react to the request. Those agents are responsible for accessing the right system to fulfill the query, which means they need to be given credentials beforehand to be granted authorization when they need to make the queries. The main challenges with this arrangement are security and the practical mechanics of joining data sources together automatically."

Automation Streamlines Complex Data Pipelines

Automation is well established in the management of large data pipelines, particularly those that draw information from several sources, both internal and external to the organization. These sources could include financial information from internal systems, pricing from vendors, and schedules from contractors. Automation platforms help manage data privacy and compliance by providing visibility into each part of the workflow, allowing for more efficient scheduling and reducing unnecessary delays.

Automation: Orchestrating Data Like a Spider

An automation platform works like a spider. The orchestration engine, which is the body, is centrally located. It's controlled by a set of rules and logic that is configured by a human engineer in advance. Then agents, which are the legs, attach to all the remote pieces where they perform actions on things like web services, ERPs, databases, or simply file drop-off points.
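To illustrate the spider analogy, here is a minimal sketch of a central engine dispatching preconfigured steps to agents. It is an illustration only, not the design of any real automation platform; the class names, agent names, and actions are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Agent:
    """A 'leg': runs named actions against one remote system."""
    location: str
    actions: Dict[str, Callable[[], str]] = field(default_factory=dict)

    def run(self, action: str) -> str:
        return self.actions[action]()


@dataclass
class Step:
    agent: str   # which agent performs the step
    action: str  # which action that agent runs


class Orchestrator:
    """The 'body': holds the preconfigured workflow and dispatches to agents."""

    def __init__(self, agents: Dict[str, Agent], workflow: List[Step]):
        self.agents = agents
        self.workflow = workflow

    def execute(self) -> List[str]:
        log = []
        for step in self.workflow:
            result = self.agents[step.agent].run(step.action)
            log.append(f"{step.agent}.{step.action}: {result}")
        return log


# Hypothetical agents: one next to an on-premises ERP, one at a cloud file drop.
agents = {
    "erp": Agent("on-premises", {"export_invoices": lambda: "42 invoices exported"}),
    "file_drop": Agent("cloud", {"upload": lambda: "file uploaded"}),
}
workflow = [Step("erp", "export_invoices"), Step("file_drop", "upload")]

run_log = Orchestrator(agents, workflow).execute()
for line in run_log:
    print(line)
```

The rules live in one place (the workflow list), while the work happens wherever each agent sits, which is the property that makes the same approach usable across on-premises and cloud systems.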

Automation Navigates Cloud Ingress-Egress Costs

"When you use automation in a cloud environment, you have to be mindful of ingress and egress, particularly if data's flowing back out from the cloud to an on-prem source," says Shannon. "Some egress costs are incurred. You want the repeatability of a well-formed automated process because then you're able to better predict what volumes of data are going to be moving back and forth. You can also better control the quality of the data. Analytics are only as trustworthy and reliable as the quality of the data you put in."
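The predictability Shannon describes comes down to simple arithmetic once a job runs on a fixed schedule. The sketch below assumes a flat per-gigabyte egress price; the figure used is a placeholder for illustration, not any provider's actual rate, and the function name is hypothetical.

```python
def estimate_monthly_egress_cost(runs_per_day, gb_out_per_run, price_per_gb, days=30):
    """Estimate monthly egress volume (GB) and cost for a repeatable job."""
    monthly_gb = runs_per_day * gb_out_per_run * days
    return monthly_gb, monthly_gb * price_per_gb


# Placeholder price: an assumption for illustration, not a real provider's rate.
PLACEHOLDER_PRICE_PER_GB = 0.09

# Hypothetical job: runs hourly and sends 0.5 GB back on-premises each time.
monthly_gb, monthly_cost = estimate_monthly_egress_cost(
    runs_per_day=24, gb_out_per_run=0.5, price_per_gb=PLACEHOLDER_PRICE_PER_GB
)
print(f"{monthly_gb:.0f} GB out per month, about ${monthly_cost:.2f}")
```

Real pricing is usually tiered and varies by provider and region, so an estimate like this is only a starting point.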

Optimize Analytics by Data Proximity

Analytics should be conducted as close to the data source as possible. When an analytics platform is in the cloud and a legacy data system is on-premises, transferring entire tables or the results of basic SQL queries to the cloud for processing is inefficient and raises the risk of issues due to data duplication. "We encourage users to push queries to the source and only return the results of the queries," says Shannon. 
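The difference between moving a table and pushing a query to the source can be sketched in a few lines. In this illustration, an in-memory SQLite database stands in for the on-premises system; the table, columns, and data are hypothetical.

```python
import sqlite3

# An in-memory SQLite database stands in for the on-premises legacy system.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (region TEXT, amount REAL)")
source.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("north", 100.0), ("north", 50.0), ("south", 75.0), ("south", 25.0)],
)

# Inefficient: pull every row to the cloud, then aggregate there.
all_rows = source.execute("SELECT region, amount FROM orders").fetchall()
totals_in_cloud = {}
for region, amount in all_rows:
    totals_in_cloud[region] = totals_in_cloud.get(region, 0.0) + amount

# Pushed down: the source aggregates and only the result crosses the network.
pushed_down = source.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()

print(len(all_rows), "rows moved without pushdown")
print(len(pushed_down), "rows moved with pushdown")
```

Both approaches produce the same totals, but the pushed-down query moves one row per region instead of one row per order, and no second copy of the table is created in the cloud.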

"You don't need an orchestration tool for something very small," says Hulbert. "But you do need it if you're bringing multiple data sources together or pulling data from various data warehouses and putting it into a data lake for data analysis."

The Same Use Cases for Automation Apply to Hybrid Cloud

In finance, automation helps guarantee compliance. You can be sure information is stored at the right time and erased at the right time. You can also be sure the right privacy measures are imposed. You know when each hop along the workflow is done, because you have visibility of each part: what it's doing, when it completes, and when it moves to the next hop, be that on-premises or in the cloud.

Retail Automation Streamlines Orders and Shipping

In retail, automation is used to ensure the website is up to date, that orders are taken and moved around correctly, and that orders are shipped out to customers on time. "A lot of data moves around between the various components, because you generally have SAP running all the suppliers and Oracle Retail running all of the orders off the website," says Hulbert.

Automation Boosts Data Transparency and Compliance

"Observability and traceability are important to the CDO, the CIO, and business leaders," says Shannon. "By automating data flows across the different parts of a hybrid environment, the CDO can account for what they're doing with the data. IT can know who did what, where, when to that data source. And the business can understand how data is flowing, what it means and how it impacts the decisions an organization makes. Having that complete transparency is really important."

"It's also essential for compliance," says Shannon. "You need to drill right down to an individual row level or subject level in a source to be able to say this is what happened and why it happened."
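A row-level record of "who did what, where, when" could look something like the sketch below. It is a minimal in-memory illustration with hypothetical class names, field names, and sample events; a production system would write to durable, tamper-evident storage.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class AuditEvent:
    row_id: str     # the individual row or data subject affected
    actor: str      # who (or which agent) acted
    action: str     # what was done
    location: str   # where it happened
    reason: str     # why it happened
    at: datetime    # when it happened (UTC)


class AuditTrail:
    """Append-only list of events, queryable per row."""

    def __init__(self) -> None:
        self._events: List[AuditEvent] = []

    def record(self, row_id: str, actor: str, action: str,
               location: str, reason: str) -> None:
        self._events.append(
            AuditEvent(row_id, actor, action, location, reason,
                       datetime.now(timezone.utc))
        )

    def history(self, row_id: str) -> List[AuditEvent]:
        return [e for e in self._events if e.row_id == row_id]


# Hypothetical events recorded by automated workflow steps.
trail = AuditTrail()
trail.record("customer-17", "etl-agent", "copied",
             "on-premises to cloud", "nightly analytics load")
trail.record("customer-18", "etl-agent", "copied",
             "on-premises to cloud", "nightly analytics load")
trail.record("customer-17", "retention-job", "erased",
             "cloud", "retention period ended")

for event in trail.history("customer-17"):
    print(event.at.isoformat(), event.actor, event.action, event.reason)
```

Because every automated hop writes its own event, the history for a single row can answer both what happened and why without reconstructing it from scattered logs.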

About the Author
Pat Brans

Pat Brans is an esteemed freelance technology journalist and author with over 15 years of experience. Prior to writing, he had a 22-year-long career in high-tech, working for industry leaders like CSC, HP, and Sybase. He leverages this expertise to communicate complex technological topics, from AI to quantum computing, in a clear and compelling manner. Brans, who holds a Master's Degree in Computer Science from Johns Hopkins University, also lectures at Grenoble École de Management and is the author of two significant books.
