The days of data silos are numbered. Businesses that organize around products or business functions rather than data will disappear.
The development and adoption of service-oriented architecture (SOA) and cloud computing created a steady stream of incentives to abandon data silos. However, data inertia -- the difficulty of granting access to data or moving it -- and organizational structures have been significant stumbling blocks, both to exploiting these technologies fully and to organizing businesses around data.
A Turning Point
We are at a turning point with big data. SOA and cloud computing were about efficiencies, which a business could afford to ignore. Big data is a strategic shift, and it will transform businesses in such a way that adopters will supersede the stragglers.
Businesses, especially large ones, have long been trying to break down the silos of data storage and computing. It is a truism that replicating data storage and processing across an organization is inefficient and obstructs insight. SOA and cloud computing emerged as potential solutions to address these problems by migrating data and processes from departments to shared infrastructures.
SOA removes duplication of business logic and encourages reuse through business process management and orchestration. In 2001, SAP imagined a global service marketplace where companies exchange and orchestrate services, interleaving internal and third-party ones and swapping out those that prove uncompetitive or unreliable. This has not come true in the many years that SOA has been promoted by the likes of SAP, IBM and Microsoft, and marketplace efforts have largely been abandoned. In reality, large corporations struggle with service discovery, standardization and, as a result, reuse -- even internally. Interest in SOA has waned accordingly.
Cloud computing and virtualization have met with more success. They commodify and optimize the utilization of data processing and storage. Private, hybrid and public cloud patterns target different use cases, balancing concerns such as privacy, compliance, and base-load versus burst-load situations. The offerings range from infrastructure (e.g., virtualized hardware) to platform (e.g., database systems) and software services (e.g., email). More than 80 percent of companies now use a cloud service. Adoption varies, though: start-ups often embrace the cloud for their entire infrastructure and services, iterating fast and cheaply in exchange for higher operational costs.
Established businesses with legacy computing infrastructure and compliance requirements, such as banks, have been much slower to adopt the cloud. Hybrid cloud solutions and special secure cloud offerings target these businesses, for example to offload or speed up the processing of non-sensitive data. Notably, this does not require a unified approach to data management, which would mean abandoning data silos and refocusing the business around data.
Unsurprisingly, many companies struggle to extract value from even their own data, let alone combine it with external data. Jenna Danko, product marketing manager at Oracle Financial Services, stated:
“The [financial] sector is one of the most data-driven industries, and analysts estimate that somewhere between 80 percent and 90 percent of the data that exists within a bank’s data centre is not analyzed -- data from call logs, weblogs, emails and documents.”
This highlights the untapped potential these businesses have, and the threat they face if competitors exploit such data before they do.
The Big Data Shift
Businesses that don't adopt SOA or cloud computing can usually continue to compete in the marketplace. Both are in effect cost-saving measures, which can make a business more competitive but are rarely a deciding factor. Moreover, even data-driven industries have yet to fully exploit their own data and correlate it extensively with relevant external sources to achieve a holistic data view.
Big data changes this. The focus on collecting, sharing and generating insight from large volumes of data from within and beyond a business promises to improve existing products and processes. Importantly, it permits new products and processes, which will set adopters apart from the competition.
Big data requires breaking down organizational barriers and data silos and overcoming technical challenges in order to combine all the relevant data -- internal and external -- for products, analytics and insight. The technical challenges have been solved by numerous tools, most of which have reached a high level of maturity in recent years: Hadoop, Hive, Pig, Flume, Sqoop, HBase, Oozie and Storm, to name a few prominent open source ones. Together they provide an ecosystem that can extract, transform and load (ETL) data of any size and from any source easily and inexpensively.
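As a rough illustration of the extract, transform and load idea, the following minimal Python sketch combines two hypothetical departmental silos, a sales export and a support log, into one customer view. The record layouts, the field names and the helpers `normalize_id` and `build_customer_view` are assumptions made for this example; a real pipeline would rely on tools such as those named above rather than in-memory lists.

```python
from collections import defaultdict

# Extract: hypothetical exports from two silos; all fields are illustrative.
SALES = [
    {"customer": "C1 ", "amount": "120.50"},
    {"customer": "c2", "amount": "80.00"},
    {"customer": "C1", "amount": "30.00"},
]
SUPPORT = [
    {"cust_id": "c1", "tickets": 2},
    {"cust_id": "C3", "tickets": 1},
]


def normalize_id(raw):
    """Transform: bring customer ids into one canonical form."""
    return raw.strip().upper()


def build_customer_view(sales, support):
    """Load: merge both sources into a single record per customer."""
    view = defaultdict(lambda: {"revenue": 0.0, "tickets": 0})
    for row in sales:
        view[normalize_id(row["customer"])]["revenue"] += float(row["amount"])
    for row in support:
        view[normalize_id(row["cust_id"])]["tickets"] += row["tickets"]
    return dict(view)


customer_view = build_customer_view(SALES, SUPPORT)
```

Even in this toy form, most of the work lies in reconciling identifiers across silos, which is an organizational problem as much as a technical one.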
Traditional analytics can still be applied to the outcome, but now with a holistic view of the business and the environment it operates in, based on more complete data sets and the ability to include third-party data, e.g., from social networks or data mining operations. The result is more accurate and more timely insight into the business and market.
The final step with big data commonly involves data science experts -- data scientists -- employing machine-learning algorithms to go beyond existing analytics, reports and key performance indicators. Data science can apply supervised learning, which learns from past examples, to identify and scale up the handling of problems such as fraud detection. As with reporting, it can draw on multiple data sources to improve the speed and accuracy of detection.
Data science can also apply unsupervised learning, e.g., to detect hidden or emerging patterns across data sources and so identify sales trends or high-value customer cohorts. The paradigm shift achievable today with big data is the virtually limitless scale of data on which these algorithms can operate with tools like Mahout.
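To make the idea of unsupervised learning concrete, here is a minimal sketch that separates customers into two spending cohorts without any labels, using a basic one-dimensional k-means loop. The spending figures and the `two_means` helper are invented for illustration; at big data scale one would turn to a distributed library rather than plain Python.

```python
def two_means(values, iterations=20):
    """Split 1-D values into two clusters with a basic k-means loop."""
    low, high = min(values), max(values)  # deterministic starting centers
    if low == high:
        return list(values), []  # nothing to separate
    for _ in range(iterations):
        # Assign every value to the nearer of the two centers.
        low_group = [v for v in values if abs(v - low) <= abs(v - high)]
        high_group = [v for v in values if abs(v - low) > abs(v - high)]
        # Move each center to the mean of its group.
        low = sum(low_group) / len(low_group)
        high = sum(high_group) / len(high_group)
    return low_group, high_group


# Hypothetical annual spend per customer, in arbitrary currency units.
annual_spend = [120, 95, 140, 110, 2300, 2550, 130, 2100]
regular, high_value = two_means(annual_spend)
```

No one told the algorithm what a high-value customer is; the cohort emerges from the data alone.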
Mobile Leads the Way
Crowdsourced sensor networks are a great example of how big data transforms products and gives competitive advantages. They have significant economic value, as Google’s takeover of Waze, a crowdsourced navigation and mapping service, for $1 billion demonstrated, and they enable distinguishing features like real-time traffic updates in Google Maps.
New businesses in the mobile sector are at the forefront of this trend, since mobile devices offer an abundance of sensor data that can lead to surprising discoveries and products. OpenSignal published one such example. Its phone applications collect, aggregate, map and illustrate data about mobile networks and data usage, along with all kinds of other data from mobile phones.
OpenSignal recently identified a strong correlation between the battery temperature of mobile phones and the outdoor temperature. Combining geographic locations, battery temperatures and weather data revealed that the outside temperature could be predicted. Each data point by itself was not telling because of strong fluctuations -- is the user inside or outside, does she carry the phone in a pocket or is she holding it? However, combining many data points made it possible to derive an algorithm that predicts temperatures.
This led OpenSignal to integrate more mobile phone sensor data, about air pressure and light for example, and to create a new application and website, WeatherSignal, for crowdsourced real-time weather data. This new product would have been unthinkable without big data: the combination and exploitation of various data sources and a holistic approach that puts data at the center of the business.
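The statistical idea here, that individually noisy readings become informative once many of them are averaged, can be sketched in a few lines of Python. This is not OpenSignal's actual method; the simulated linear relationship, the noise level and all names below are assumptions chosen purely for illustration.

```python
import random
from statistics import mean

random.seed(42)  # fixed seed so the sketch is reproducible

# Assumed toy model: battery temperature tracks outdoor temperature
# linearly, plus heavy per-phone noise (pocket, hand, indoors, ...).
TRUE_SLOPE, TRUE_OFFSET, NOISE_SD = 0.6, 25.0, 4.0


def simulate_readings(outdoor_temp, phones):
    """Return noisy battery temperatures for `phones` devices."""
    return [
        TRUE_SLOPE * outdoor_temp + TRUE_OFFSET + random.gauss(0, NOISE_SD)
        for _ in range(phones)
    ]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + offset."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return slope, y_bar - slope * x_bar


# One averaged battery temperature per known outdoor temperature.
outdoor = [0.0, 5.0, 10.0, 15.0, 20.0, 25.0, 30.0]
avg_battery = [mean(simulate_readings(t, phones=2000)) for t in outdoor]

slope, offset = fit_line(outdoor, avg_battery)


def predict_outdoor(avg_battery_temp):
    """Invert the fitted line to estimate the outdoor temperature."""
    return (avg_battery_temp - offset) / slope


# Estimate an unseen outdoor temperature (18 degrees in the simulation).
estimate = predict_outdoor(mean(simulate_readings(18.0, phones=2000)))
```

A single simulated phone can be off by several degrees, but the average over thousands of phones lands close to the true value, which is why scale matters more here than the precision of any one sensor.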
For the future we can expect even more dramatic changes. Traditionally, domain knowledge, human judgment and experiments inform the design of algorithms, which are then applied to curated, purpose-built data sets. This will be replaced by vast data sources that combine internal and external data -- big data -- and by self-learning solutions.
Human judgment will still be used to train algorithms, but the focus changes from designing specific heuristic rules, algorithms or features for machine learning to only judging the outcome. Deep learning, for example, promises to shift our attention from data curation and algorithm design to designing questions and teaching answers. The new generation of algorithms applies these to patterns and relationships learned from billions, trillions or more data points to arrive at the optimal solution.
Disconcertingly, with this development we are increasingly losing the ability to explain why a certain solution or answer is given: the latent relationships are so numerous and subtle that they escape human comprehension. That is exactly why this will be such a paradigm shift, one that manual measures or current technology cannot replace.
Deep learning and similar approaches are in their early stages and out of reach of all but the most technologically advanced businesses. It is crucial, though, to understand their impact and prepare businesses for this emerging change.
In summary, big data has arrived and enables wider-reaching, faster analytics as well as intelligent insight and data-driven products that non-adopters cannot match with small, traditional data sets and processing tools. The future of big data lies with intelligent algorithms, which will require a unified big data platform to be leveraged. Consequently, traditional data-silo businesses, and businesses that do not exploit their own and third-party data, face a choice: transform or disappear in the next decade.
Title image courtesy of Maria Arts (Shutterstock)