Over the past few years, major enterprises have shown interest in combining semantic web technology with big data for added value. Let's take a look at what enterprises are seeking and why they think semantic web can make big data smarter.

3 Key Benefits

Provides end-users increased ability to self-manage data from varied sources

Users need to be able to search, access, aggregate, curate, filter, visualize, analyze, collaborate and create reports. They need to combine extracted or analyzed data from big data stores with data from documents, emails, spreadsheets, the web and other databases to get further insights.

With self-help in place, IT is no longer the bottleneck to business analysis and action. However, IT still needs to manage access, security, data lineage, backup and other much-desired enterprise IT support and governance functions. Smart data layers and smart data solutions that use unified information based on semantic technology can address users' self-help needs while still providing these IT support and governance functions.

Addresses varying user needs and changing business environments

In traditional big data IT solutions, the data model and the IT solutions are designed to address specific business needs and to handle specific data types and data sources. As the business needs and data sources change, the IT solutions no longer work and new data marts and new solutions must be built.

Semantic-based solutions have data models that can evolve at run time. This allows the solutions to evolve with user customization requirements and changing business environments. When building a solution with semantic technology, a user can start with something quick and then evolve the solution, adding new datasets as needed, which saves significant support time and expense.
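As a minimal sketch of what a data model that evolves at run time can look like, the Python snippet below stores facts as subject-predicate-object triples, so a new attribute arrives as more triples rather than as a schema change. The `TripleStore` class, the `ex:` identifiers and the sample facts are hypothetical and chosen only for illustration; a real solution would sit on an RDF store rather than an in-memory set.

```python
from collections import defaultdict


class TripleStore:
    """Toy in-memory store of (subject, predicate, object) triples."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        """Record one fact; no schema has to be declared beforehand."""
        self._triples.add((subject, predicate, obj))

    def predicates(self):
        """Return the predicates currently in use (the de facto schema)."""
        return {p for _, p, _ in self._triples}

    def describe(self, subject):
        """Return a dict mapping each predicate of a subject to its values."""
        result = defaultdict(set)
        for s, p, o in self._triples:
            if s == subject:
                result[p].add(o)
        return dict(result)


store = TripleStore()
store.add("ex:order42", "ex:customer", "ex:acme")
store.add("ex:order42", "ex:amount", 1200)
schema_before = store.predicates()

# A new data source brings a field nobody planned for: just add the triple.
store.add("ex:order42", "ex:shippingRegion", "EMEA")
schema_after = store.predicates()

print(sorted(schema_after - schema_before))
```

The point of the design is that the set of predicates is discovered from the data instead of being fixed up front, which is why adding a dataset does not force a rebuild of the model.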

Manages terminology, concepts and relationships while connecting diverse data from varied data sources

Different data sources can define the same entity, concept or term differently. For example, IBM may be called Big Blue or International Business Machines. It is not enough to keep a glossary of terms and entities; the relationships between different data and metadata must also be managed so that search, data lineage and other actions can be performed.

Moreover, as data leaves its application, metadata must travel with the data so that the data does not lose its meaning. Semantic technology addresses these and other data relationship and metadata management needs. If the smart data layer is placed over a big data store and other existing data stores, it can manage relationships across all these varied sources.
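The IBM example above can be sketched in a few lines of Python: a glossary maps every known label to one canonical identifier, and records from different sources are then grouped under that identifier while keeping their source field for lineage. The `GLOSSARY` contents, the `ex:IBM` identifier, the record fields and the two sample sources are all assumptions made for illustration, not part of any particular product.

```python
# Hypothetical glossary: each canonical entity ID maps to its known labels.
GLOSSARY = {
    "ex:IBM": {"IBM", "Big Blue", "International Business Machines"},
}

# Reverse index from a case-folded label to the canonical entity ID.
LABEL_INDEX = {
    label.casefold(): entity_id
    for entity_id, labels in GLOSSARY.items()
    for label in labels
}


def resolve(label):
    """Return the canonical ID for a label, or None if it is unknown."""
    return LABEL_INDEX.get(label.strip().casefold())


def link(records):
    """Group records by canonical entity; unknown names go to 'unresolved'."""
    linked = {}
    for record in records:
        key = resolve(record["company"]) or "unresolved"
        linked.setdefault(key, []).append(record)
    return linked


# Two made-up sources that name the same company differently.
crm_records = [{"source": "crm", "company": "Big Blue", "deal_size": 50000}]
news_records = [
    {"source": "news", "company": "International Business Machines",
     "headline": "Quarterly results announced"},
    {"source": "news", "company": "Unknown Corp", "headline": "New office"},
]

linked = link(crm_records + news_records)
for entity_id, records in linked.items():
    print(entity_id, [r["source"] for r in records])
```

Because each record keeps its `source` field, a search on the canonical entity still shows where every fact came from, which is the lineage requirement described above.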

Industry Group Adoption

Leading industry groups such as OMG, EDM Council, CDISC and HL7 understand that big data and semantic web technology are ideal complements and have been building industry standard data models based on semantic technology that can be used with big data. Many of these groups are working with regulatory bodies to use these standards for government compliance and risk management. These standards will drive enterprise adoption.

Making Big Data Smarter

As you create a semantic layer over your big data initiative, be sure to include the following elements:

  1. Flexible, universal data model based on industry standards: Using standard industry models with a semantic platform allows big data solution developers to quickly create industry- or company-specific solutions that work with big data stores and can evolve as data needs evolve.
  2. Use of semantic RDF standards to make the data “self-describing”: With RDF (Resource Description Framework) standards, instance data and its meaning (metadata) travel together so that both humans and machines can understand and use the data. Building on standards-based platforms also means the resulting solution will be interoperable with other technologies that use the same standards.
  3. Graph representation and management of data: Big data stores often hold data as large collections of key/value pairs, with few if any explicit relationships between them. With a graph representation, big data is contextualized with entities and relationships that can be used for search and analysis. To understand the value, look at what Facebook’s Open Graph provides to the Facebook social media solution.
  4. Service-Oriented Architecture (SOA) infrastructure: An SOA infrastructure over big data and existing data stores allows data to be brought into the big data store at run time as needed. It can also be used to extract data at run time to create sandbox data marts that combine data from varied sources for user manipulation.
  5. Post-ingestion data characterization: Big data is all about collecting data without worrying about schemas and data descriptions. The problem is that the data usually never gets any sort of description, so it stays “dumb” and of limited utility. As you use and understand the data, the semantic layer should automatically classify the data, associate relationships and find new relationships. This is done by using OWL, the Web Ontology Language, in the semantic layer.
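Elements 2, 3 and 5 can be illustrated together with a rough Python sketch. The data is held as a graph of triples, the class hierarchy (the meaning) is stored alongside the instance data, and a small loop derives the classifications implied by that hierarchy. This is only a stand-in: the hand-written `infer_types` function covers subclass inference alone, whereas an OWL reasoner supports far richer axioms, and every `ex:` name below is hypothetical.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

triples = {
    # Meaning (a tiny class hierarchy) is stored next to the data itself.
    ("ex:Manager", SUBCLASS_OF, "ex:Employee"),
    ("ex:Employee", SUBCLASS_OF, "ex:Person"),
    # Instance data, including a relationship between entities.
    ("ex:alice", RDF_TYPE, "ex:Manager"),
    ("ex:bob", RDF_TYPE, "ex:Employee"),
    ("ex:alice", "ex:worksFor", "ex:acme"),
    ("ex:bob", "ex:worksFor", "ex:acme"),
}


def infer_types(graph):
    """Return the graph plus every rdf:type triple implied by subclassing."""
    inferred = set(graph)
    changed = True
    while changed:  # repeat until a full pass adds nothing new
        changed = False
        subclass_pairs = [(s, o) for s, p, o in inferred if p == SUBCLASS_OF]
        for s, p, o in list(inferred):
            if p != RDF_TYPE:
                continue
            for child, parent in subclass_pairs:
                if o == child and (s, RDF_TYPE, parent) not in inferred:
                    inferred.add((s, RDF_TYPE, parent))
                    changed = True
    return inferred


def instances_of(graph, cls):
    """Return all subjects typed as the given class."""
    return {s for s, p, o in graph if p == RDF_TYPE and o == cls}


closed = infer_types(triples)
people = instances_of(closed, "ex:Person")
acme_staff = {s for s, p, o in closed if p == "ex:worksFor" and o == "ex:acme"}

print(sorted(people))
print(sorted(acme_staff))
```

Nothing in the input states that `ex:alice` is an `ex:Person`; that classification appears only after inference, which is the kind of post-ingestion characterization described in element 5.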

Title image by Vilma (Flickr) via a CC BY-NC-SA 2.0 license.