In my first article on The Semantic Web and the Modern Enterprise, I introduced the vision of the Semantic Web. I also discussed how the progress made while working towards that vision provides a strong foundation to help enterprises better deal with their information management challenges. In this article, we’ll take a high-level look at what the core Semantic Web technologies are, why they’re different from conventional technology approaches and how they deliver tangible benefits for enterprise information management.

Flexible Data Integration with RDF

Semantic Web data is represented using a technology standard called Resource Description Framework (RDF). RDF is a graph (web-like) structure that links data elements together in a self-describing way. RDF is perhaps best understood in contrast to how enterprise information has traditionally been represented for the past 30 years: in relational databases.

What does data mean?

Relational data is stored in tables, and the meaning of a particular piece of data depends on the table and column it lives in. Details about each table and column, such as their names, definitions and acceptable values, are kept in a dark corner of the database known as the catalog. This information is awkward to access and is not directly linked to the data itself. In practice, this means that software that accesses relational data needs to have the meaning of the data hard-coded into it.
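To make the point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `users` table and its columns are hypothetical, and SQLite is only a convenient stand-in; other databases expose their catalogs through different mechanisms. The rows come back as bare tuples, and the column details have to be fetched with a separate catalog query.

```python
import sqlite3

# Hypothetical schema, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("INSERT INTO users (email) VALUES ('alice@example.org')")

# The data itself is a bare tuple; nothing in it says which value is the email.
row = conn.execute("SELECT * FROM users").fetchone()
print(row)  # (1, 'alice@example.org')

# Column details live in the catalog and need a separate, SQLite-specific query.
# Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
columns = [info[1] for info in conn.execute("PRAGMA table_info(users)")]
print(columns)  # ['id', 'email']

# Application code ends up hard-coding the meaning: "position 1 is the email".
email = row[1]
```

The last line is the hard-coding the paragraph above describes: the application, not the data, knows what the second value means.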

The meaning of RDF data, in contrast, is part of the data itself. This means that wherever data goes, details about it (i.e., metadata) are always immediately available. You don’t need prior knowledge of what the data means, nor do you need a completely separate mechanism to interrogate its meaning. In a very real way, RDF data is self-describing.
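A rough illustration of what self-describing means, using plain Python tuples as (subject, predicate, object) statements rather than a real RDF library. The `http://example.org/` names are hypothetical; the two `rdf-schema#` identifiers are the standard RDFS label and comment properties. The description of the `email` property is stored as ordinary statements alongside the data that uses it.

```python
# Hypothetical namespace; the resources below are illustrative only.
EX = "http://example.org/"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
RDFS_COMMENT = "http://www.w3.org/2000/01/rdf-schema#comment"

triples = {
    # A fact about Alice.
    (EX + "alice", EX + "email", "alice@example.org"),
    # Facts about the property itself, stored in the same structure.
    (EX + "email", RDFS_LABEL, "email address"),
    (EX + "email", RDFS_COMMENT, "Primary address used to contact a person."),
}

def objects(subject, predicate):
    """Return every value stored for (subject, predicate), sorted."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)

def describe(subject):
    """Pair each value of a subject with its property's label, found in the same data."""
    result = []
    for s, p, o in sorted(triples):
        if s == subject:
            labels = objects(p, RDFS_LABEL)
            result.append((labels[0] if labels else p, o))
    return result

print(describe(EX + "alice"))  # [('email address', 'alice@example.org')]
```

The same lookup function answers questions about the data and questions about what the data means, which is the contrast with the separate catalog query needed in the relational case.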

How is the data stored?

Relational databases store data in tables. These tables act as both physical structures that hold data on a disk and also as a way to represent a logical model of entities and attributes. The tables contain actual data (e.g. a user’s email address) and also artifacts of the data’s physical storage (things like IDs, keys, indexes and join tables).

Because the logical model and the physical storage are so tightly intertwined, even small changes to the logical model can require significant changes to the physical model, which in turn ripple through every software component that depends on this data.

With RDF, the logical model is decoupled from how data is physically stored on disk. Attributes and relationships can be added and removed as necessary without impacting deployed applications. Because the details of physical storage are completely isolated from the logical data model, RDF is also well-suited to ad-hoc data integration and exploration.
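The following sketch shows the idea with an in-memory set of statements standing in for an RDF store. The names (`graph`, `add`, `emails`, and the `http://example.org/` identifiers) are assumptions made for illustration, not part of any RDF library. A new attribute and a new relationship are added without any schema change, and an existing query keeps returning the same answer.

```python
# Hypothetical namespace and helper names, for illustration only.
EX = "http://example.org/"

graph = set()  # each entry is a (subject, predicate, object) statement

def add(subject, predicate, obj):
    """Record one statement; no table or column has to exist beforehand."""
    graph.add((subject, predicate, obj))

def emails():
    """An 'existing application' query: every email value in the graph."""
    return sorted(o for s, p, o in graph if p == EX + "email")

add(EX + "alice", EX + "email", "alice@example.org")
before = emails()

# Later, the model grows: a new attribute and a new relationship.
add(EX + "alice", EX + "phone", "+1-555-0100")
add(EX + "alice", EX + "worksFor", EX + "acme")
after = emails()

print(before == after)  # True
```

In a relational design, the same change would typically mean altering a table or adding a join table before the new data could be stored at all.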

How is the data identified?

In the relational database world, identity is local to a database. Each table uses its own scheme for identifying data and telling one row apart from another. Foreign keys can link rows across tables within one database, but there’s no standard way to refer to data in another database altogether.