In my first article on The Semantic Web and the Modern Enterprise, I introduced the vision of the Semantic Web. I also discussed how the progress made while working towards that vision provides a strong foundation to help enterprises better deal with their information management challenges. In this article, we’ll take a high-level look at what the core Semantic Web technologies are, why they’re different from conventional technology approaches and how they deliver tangible benefits for enterprise information management.
Flexible Data Integration with RDF
Semantic Web data is represented using a technology standard called Resource Description Framework (RDF). RDF is a graph (web-like) structure that links data elements together in a self-describing way. It is perhaps best understood in contrast to relational databases, the way enterprise information has traditionally been represented for the past 30 years.
What does data mean?
Relational data is stored in tables, and the meaning of a particular piece of data is dependent on the particular table and column it lives in. Details about the table and column, such as its name, its definition and acceptable values, are kept in a dark corner of the database known as the catalog. This information is difficult to access and is not directly linked to the data itself. In practice, this means that software that accesses relational data needs to have the meaning of the data hard-coded into it.
The meaning of RDF data, in contrast, is part of the data itself. This means that wherever data goes, details about it (i.e., metadata) are always immediately available. Software doesn’t need prior, hard-coded knowledge of what the data means, nor does it need a completely separate mechanism to interrogate that meaning. In a very real way, RDF data is self-describing.
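To make "self-describing" concrete, here is a minimal sketch in Python that models an RDF graph as a plain set of (subject, predicate, object) triples. The `http://example.com/ns#` namespace, the `user42` record and the helper name `label_of` are assumptions made up for illustration; only the `rdfs:label` and `rdfs:comment` URIs come from the RDF Schema standard. A real system would use an RDF library or triple store rather than a Python set.

```python
# Hypothetical namespace and identifiers, used only for illustration.
EX = "http://example.com/ns#"
# Standard RDF Schema terms for human-readable metadata.
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
RDFS_COMMENT = "http://www.w3.org/2000/01/rdf-schema#comment"

# An RDF graph modeled as a set of (subject, predicate, object) triples.
graph = {
    # The data itself.
    (EX + "user42", EX + "email", "alice@example.com"),
    # Metadata about the property, stored in the same graph as the data.
    (EX + "email", RDFS_LABEL, "Email address"),
    (EX + "email", RDFS_COMMENT, "Primary contact address for a user."),
}


def label_of(graph, term):
    """Look up a term's human-readable label, falling back to its URI."""
    for s, p, o in graph:
        if s == term and p == RDFS_LABEL:
            return o
    return term


# Display a record without hard-coding what any property means.
for s, p, o in sorted(graph):
    if s == EX + "user42":
        print(f"{label_of(graph, p)}: {o}")
```

The display loop never mentions "email" by name. It asks the graph what each property is called, which is the contrast with a relational catalog that the paragraph above describes.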
How is the data stored?
Relational databases store data in tables. These tables act as both physical structures that hold data on a disk and also as a way to represent a logical model of entities and attributes. The tables contain actual data (e.g. a user’s email address) and also artifacts of the data’s physical storage (things like IDs, keys, indexes and join tables).
Because the logical model and the physical storage are so tightly intertwined, even small changes to the logical model can require significant changes to the physical model, which in turn have a ripple effect for all software components that depend on this data.
With RDF, the logical model is decoupled from how data is physically stored on disk. Attributes and relationships can be added and removed as necessary without impacting deployed applications. Details of how information is physically stored are completely isolated from the logical data model, which also makes RDF well suited to ad-hoc data integration and exploration.
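The point about adding attributes without disturbing existing software can be illustrated with a small Python sketch. It again treats a graph as a set of triples; the namespace, the property names (`email`, `phone`, `manages`) and the helper `values` are hypothetical. The sketch assumes each subject has at most one value per property, which RDF itself does not require.

```python
EX = "http://example.com/ns#"  # hypothetical namespace

graph = {
    (EX + "user42", EX + "email", "alice@example.com"),
    (EX + "user43", EX + "email", "bob@example.com"),
}


def values(graph, predicate):
    """Return {subject: object} for every triple using the given predicate.

    Assumes one value per subject, which is enough for this illustration.
    """
    return {s: o for s, p, o in graph if p == predicate}


# An "existing application" query, run before the model changes.
emails_before = values(graph, EX + "email")

# A new attribute and a new relationship are just additional triples;
# there is no table to alter and no join table to create.
graph.add((EX + "user42", EX + "phone", "+1-555-0100"))
graph.add((EX + "user42", EX + "manages", EX + "user43"))

# The same query returns the same answer after the model has grown.
emails_after = values(graph, EX + "email")
```

Because the email query only looks at triples with the `email` predicate, the extra triples are invisible to it. New code that cares about `phone` or `manages` can start using them immediately.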
How is the data identified?
In the relational database world, identity is local to a database. Each table uses its own scheme for identifying data and telling one row apart from another. Foreign keys can link rows across tables within a single database, but there’s no standard way to refer to data in another database altogether.
RDF, on the other hand, names everything using URIs. (URIs are just like the URLs that we use for web pages, except that they’re being used to identify data elements.) Because URIs are globally scoped, if two databases from different organizations across the world use the same identifier, they are referring to the same thing.
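As a rough sketch of what shared identifiers buy you, the Python below merges triples from two sources that happen to use the same URI for a customer. All of the URIs, the two "systems" (`crm`, `billing`) and the helper `properties_by_subject` are invented for the example.

```python
from collections import defaultdict

# Hypothetical URIs for a shared identifier scheme and two vocabularies.
ACME = "http://acme.example/id/"
CRM = "http://crm.example/ns#"
BILLING = "http://billing.example/ns#"

# Two independently maintained data sets describing the same customer.
crm_data = {(ACME + "customer/7", CRM + "name", "Globex Corp")}
billing_data = {(ACME + "customer/7", BILLING + "balance", 1250)}

# Merging two RDF graphs is a set union; no key mapping is needed
# because both sources already use the same URI for the customer.
merged = crm_data | billing_data


def properties_by_subject(graph):
    """Group a graph's triples into {subject: {predicate: object}}."""
    result = defaultdict(dict)
    for s, p, o in graph:
        result[s][p] = o
    return dict(result)


customers = properties_by_subject(merged)
```

After the merge, `customers` holds a single entry for `customer/7` carrying both the name from one source and the balance from the other. With locally scoped row IDs, the same integration would first need a mapping between the two systems' keys.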
Together, these characteristics mean that it is far easier and cheaper to deal with change and to combine data represented in RDF than it has been with other approaches. You can begin to allow owners of diverse but related data to collaborate on each other’s data without requiring months of upfront coordination. Your software can evolve to incorporate new sources of data and new business requirements more quickly than before.
Speaking to All Audiences
RDF Schema and OWL are two core Semantic Web technologies that describe the concepts and relationships within data, both for people and for software.
For people, these technologies use the same language that subject-matter experts in a domain would use to talk about their data. They provide labels and descriptions intended for people, and they’re not obfuscated with irrelevant IDs, codes, or abbreviations. Often, software user interfaces can be driven directly from the human-friendly descriptions of the data in RDF Schema and OWL.
For software, these technologies are a formal and expressive way to define the semantics of concepts and the relationships between them. Software components known as reasoners can use these rich descriptions to automatically classify data in meaningful ways and discover new relationships within existing data.
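To give a feel for what a reasoner does, here is a deliberately tiny Python sketch that applies two RDFS-style rules to a set of triples: subclass relationships are transitive, and an instance of a subclass is also an instance of its superclasses. The class names (`Manager`, `Employee`, `Person`), the `alice` resource and the function `infer` are hypothetical; the `rdf:type` and `rdfs:subClassOf` URIs are the standard ones. Production reasoners support far richer OWL semantics and are much more efficient than this loop.

```python
# Standard RDF and RDF Schema terms.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
EX = "http://example.com/ns#"  # hypothetical namespace

graph = {
    (EX + "Manager", RDFS_SUBCLASS_OF, EX + "Employee"),
    (EX + "Employee", RDFS_SUBCLASS_OF, EX + "Person"),
    (EX + "alice", RDF_TYPE, EX + "Manager"),
}


def infer(graph):
    """Apply two RDFS-style rules repeatedly until nothing new is derived."""
    closed = set(graph)
    while True:
        subclass_pairs = [(s, o) for s, p, o in closed if p == RDFS_SUBCLASS_OF]
        new = set()
        for sub, sup in subclass_pairs:
            for s, p, o in closed:
                # Rule 1: subClassOf is transitive.
                if p == RDFS_SUBCLASS_OF and s == sup:
                    new.add((sub, RDFS_SUBCLASS_OF, o))
                # Rule 2: an instance of a subclass is an instance
                # of the superclass.
                if p == RDF_TYPE and o == sub:
                    new.add((s, RDF_TYPE, sup))
        if new <= closed:
            return closed
        closed |= new


inferred = infer(graph)
```

Nothing in the original data says that `alice` is a `Person`, yet the inferred graph contains that triple. This is the sense in which a reasoner classifies data and discovers relationships that were only implicit.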
This means that unlike in the relational world, the schema/model is not just used to define how software stores the data. And unlike in the UML world, the schema/model is not just used to document the concepts and relationships for people. Instead, the Semantic Web models expressed in RDF Schema and OWL directly drive user interfaces, ensure valid and consistent data in complex domains, and give end users a better understanding of their data.