In the march toward creating the semantic web, web content management systems such as Drupal and many proprietary vendors struggle with the goal of emitting structured information that other sites and tools can usefully consume. There's a balance to be struck between human and machine utility, not to mention simplicity of instrumentation.

With RDFa (see W3C proposal), software and web developers have the specification they need to structure data so that it carries meaning for both machines and humans, all in a single file. And from what we've seen recently, the Drupal community is making the best of it.

Introducing RDFa

RDFa is a set of XHTML attributes meant to augment visual data with machine-readable hints. In layman's terms, RDFa was created to help machines understand what humans intuitively get while browsing around the web. The hints will look familiar to anyone who knows microformats or the rel="nofollow" convention.

Actually, RDFa goes beyond this, providing somewhat of a circular benefit. While this standard helps machines understand what humans see, it also has applications for providing metadata to augment content. Machines then display the augmented content and humans suddenly understand even better the context of what they're seeing.


Okay, too many acronyms you say? It can easily happen -- especially while skipping around W3C specifications. Let's untangle just a bit.

RDFa is based upon the principles of RDF. RDF stands for Resource Description Framework. It's a W3C Recommendation, most commonly written in an XML syntax (RDF/XML), and is meant to be a language for representing information about stuff found on the web (things that have URIs).

If you need to assign importance to this RDF thing, consider this: RDF is at the heart of what we call the semantic web.

The Heart of RDF

The most important RDF concept to understand is that of the RDF Triple. A triple has, as one might guess, three parts: the Subject, the Predicate and the Object (you will also see Subject, Property and Value). An expression in RDF (also referred to as an RDF Graph) is then a collection of these triples.

Resource Description Framework (RDF) Triple -- Subject, Predicate (or Property) and Object (or Value)

The meaning (or semantic value) of an RDF expression is that some relationship -- defined by the RDF Predicate/Property -- exists between the RDF Subject and the RDF Object/Value. In the end, that is as simple as the semantic web gets.
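
For the programmers in the audience, a triple maps naturally onto a three-field record, and a graph onto a collection of such records. Here is a minimal Python sketch of that idea. The Triple class, the objects_of helper and the example.org URI are our own illustrative names and are not part of any RDF library.

```python
from typing import NamedTuple

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace


class Triple(NamedTuple):
    """One RDF statement: the subject is related to the object by the predicate."""
    subject: str
    predicate: str
    obj: str  # called "obj" to avoid shadowing Python's built-in "object"


# An RDF graph is just a collection of triples; a set works well here.
# The example.org URI is hypothetical and used only for illustration.
graph = {
    Triple("http://example.org/doc", DC + "title", "All About RDFa"),
    Triple("http://example.org/doc", DC + "creator", "CMSWire Staff Writer"),
}


def objects_of(graph, subject, predicate):
    """Return every object linked to `subject` by `predicate`, sorted."""
    return sorted(
        t.obj for t in graph
        if t.subject == subject and t.predicate == predicate
    )


print(objects_of(graph, "http://example.org/doc", DC + "title"))
```

Asking the graph a question amounts to filtering triples by the parts you already know, which is the basic idea behind RDF query tools.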

RDF in Action

The idea behind RDF is to give us geeks a simple way to make statements about things on the web and have machines understand us. Let's look at an example.

Let's say I want to express that the website found at http://www.google.com was created by Larry Page. In this case we have the following 3 things that comprise our assertion:

  1. The subject: http://www.google.com
  2. The predicate or property: creator
  3. The object or value: Larry Page

Now to put this into RDF syntax, we do the following:

 <?xml version="1.0"?>
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
          xmlns:dc="http://purl.org/dc/elements/1.1/">
   <rdf:Description rdf:about="http://www.google.com">
     <dc:creator>Larry Page</dc:creator>
   </rdf:Description>
 </rdf:RDF>
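
To see how a machine might read such a statement back out as a triple, here is a minimal Python sketch using only the standard library. It assumes a complete, well-formed version of the Larry Page statement (both namespaces declared, all tags closed) and handles only this flat shape. The function name simple_triples is our own, and this is not a general RDF/XML parser.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# A complete, well-formed version of the statement "Larry Page created google.com".
RDF_XML = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://www.google.com">
    <dc:creator>Larry Page</dc:creator>
  </rdf:Description>
</rdf:RDF>"""


def simple_triples(rdf_xml):
    """Extract (subject, predicate, object) from flat rdf:Description blocks.

    Only literal values directly inside rdf:Description are handled.
    """
    root = ET.fromstring(rdf_xml)
    triples = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # ElementTree stores namespaced tags as "{namespace}local";
            # joining the two parts gives the full predicate URI.
            namespace, local = prop.tag[1:].split("}")
            triples.append((subject, namespace + local, (prop.text or "").strip()))
    return triples


print(simple_triples(RDF_XML))
```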

RDFa is Much Simpler Than RDF

What we have above is Greek to most of us (unless you're Greek and then it's something else). Fortunately, RDFa significantly simplifies the implementation of RDF.

It provides much of the semantic power of RDF but in a practical and accessible manner and, most importantly, within the context of existing XHTML usage patterns. In short, we don't have to learn much to begin using RDFa. Ah, yes...simplicity.

RDFa in Action

For a simple example, consider links. Humans understand what a link is pointing to typically by reading the linked text. Machines, however, have no idea. They just know that it's a link.

RDFa uses XHTML's existing "rel" attribute to help bridge this gap. You add this attribute like you would any other. Instead of using the following to point to the W3C's RDFa primer:

 <a href="http://www.w3.org/TR/xhtml-rdfa-primer/"
   >The W3C's RDFa Primer</a>

You might use:

 <a href="http://www.w3.org/TR/xhtml-rdfa-primer/" rel="cite" >The W3C's RDFa Primer</a>

...to designate that you're citing the standard's official primer page.
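
Once the relationship is in the markup, a program can pick it up with very little effort. The following is a minimal Python sketch using the standard library's html.parser module. The class name RelLinkCollector is our own invention, and a real RDFa processor does considerably more than this.

```python
from html.parser import HTMLParser


class RelLinkCollector(HTMLParser):
    """Collect (rel, href) pairs from every <a> element that has both."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        if attributes.get("rel") and attributes.get("href"):
            self.links.append((attributes["rel"], attributes["href"]))


collector = RelLinkCollector()
collector.feed(
    '<a href="http://www.w3.org/TR/xhtml-rdfa-primer/" rel="cite" >'
    "The W3C's RDFa Primer</a>"
)
print(collector.links)
```

Where a plain link gives the machine only a destination, the rel value gives it the nature of the relationship as well.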

If you can't find the metadata types or terms you need already defined, before making up your own, look to see if an additional vocabulary has already been created. Two popular vocabularies for RDFa include the Dublin Core Metadata Initiative (DCMI) and the Friend of a Friend Project (FOAF).

The Dublin Core Metadata Initiative offers, among other things, a collection of metadata terms built around documents, such as "creator" for the original author, "type" for the type of document, and "description" for a short description of the document. To use a term from an external vocabulary, you need to first import the vocabulary (see our RDF tidbit above).

Expanding the example from earlier, perhaps the link was created to go into a larger document that starts something like this:

  <h1>All About RDFa</h1>
  <h3>CMSWire Staff Writer</h3>
  <h4>All you ever wanted to know about RDFa but 
      were terrified to ask.</h4>
  <a href="http://www.w3.org/TR/xhtml-rdfa-primer/" rel="cite"
       >The W3C's RDFa Primer</a> is the definitive resource ...

Now, to tell computers that this section uses vocabulary from DCMI, wrap it in a div carrying an XML namespace declaration that points to the machine-readable vocabulary definition:

 <div xmlns:dcmi="http://purl.org/dc/elements/1.1/">

The dcmi prefix tells computers that when they see dcmi: in front of a vocabulary term, they should look to the DCMI document to know how to handle it. Now that the vocabulary is imported for this section, the title, the creator and the description can all be marked with metadata, allowing computers to decide how to format and present the information:

 <div xmlns:dcmi="http://purl.org/dc/elements/1.1/">
  <h1 property="dcmi:title">All About RDFa</h1>
  <h3 property="dcmi:creator">CMSWire Staff Writer</h3>
  <h4 property="dcmi:description">All you ever wanted to know about RDFa 
      but were terrified to ask.</h4>
  <a href="http://www.w3.org/TR/xhtml-rdfa-primer/" rel="cite"
       >The W3C's RDFa Primer</a> is the definitive resource ...

Using "property" tells computers that the element carries a vocabulary term, so they had better pay attention: the attribute names the relationship, and the element's text supplies the value.
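
To make the prefix mechanics concrete, here is a minimal Python sketch of how a program might expand dcmi:title into a full vocabulary URI and pair it with the element's text. The class name PropertyExtractor is our own, and the sketch makes simplifying assumptions: prefixes are treated as document-wide, and elements carrying "property" do not nest. Note the trailing slash on the namespace URI; without it the expanded term would not match the Dublin Core definition.

```python
from html.parser import HTMLParser


class PropertyExtractor(HTMLParser):
    """Collect (predicate URI, text) pairs from property="prefix:term" markup."""

    def __init__(self):
        super().__init__()
        self.prefixes = {}    # prefix -> namespace URI, from xmlns:prefix="..."
        self.pairs = []       # finished (predicate URI, text) pairs
        self._current = None  # (tag, predicate URI) while inside a marked element
        self._text = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        for name, value in attributes.items():
            if name.startswith("xmlns:"):
                self.prefixes[name[len("xmlns:"):]] = value
        curie = attributes.get("property")
        if curie and ":" in curie:
            prefix, term = curie.split(":", 1)
            if prefix in self.prefixes:
                # Expand the short form into the full vocabulary URI.
                self._current = (tag, self.prefixes[prefix] + term)
                self._text = []

    def handle_data(self, data):
        if self._current:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._current and tag == self._current[0]:
            # Collapse line breaks and indentation inside the element.
            text = " ".join("".join(self._text).split())
            self.pairs.append((self._current[1], text))
            self._current = None


XHTML = """<div xmlns:dcmi="http://purl.org/dc/elements/1.1/">
  <h1 property="dcmi:title">All About RDFa</h1>
  <h3 property="dcmi:creator">CMSWire Staff Writer</h3>
  <h4 property="dcmi:description">All you ever wanted to know about RDFa
      but were terrified to ask.</h4>
</div>"""

extractor = PropertyExtractor()
extractor.feed(XHTML)
for predicate, text in extractor.pairs:
    print(predicate, "->", text)
```

Each pair, together with the URL of the page it came from as the subject, forms an RDF triple.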

As you might imagine, using a technology like RDFa can make creating a document a bit more work, but it also means that machines can now understand the context of the content they're serving. A program on one site, looking at document after document marked up using the methods discussed above, might be programmed to display all documents sortable by title or author.

A program on another site might be more interested in the titles and descriptions, de-emphasizing the author information in the listing so people can focus on what the documents are about.

One search engine might know to ignore the document because it's only interested in product information, while others focus on helping researchers and so would definitely index this content, presenting search results in a way that helps the user quickly find what they need.
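
As a rough illustration of what such consuming programs could do once the metadata has been extracted, here is a short Python sketch. The records, the second article and the helper names are all hypothetical, invented only to show two sites treating the same metadata differently.

```python
# Hypothetical records, as an RDFa-aware crawler might produce them after
# reading the title, creator and description of each page.
documents = [
    {
        "title": "All About RDFa",
        "creator": "CMSWire Staff Writer",
        "description": "All you ever wanted to know about RDFa but were terrified to ask.",
    },
    {
        "title": "Drupal and the Semantic Web",   # invented entry
        "creator": "Another Staff Writer",        # invented entry
        "description": "A placeholder entry for illustration.",
    },
]


def sorted_by(documents, term):
    """One site: list documents sorted by any metadata term, ignoring case."""
    return sorted(documents, key=lambda doc: doc[term].lower())


def summary_listing(documents):
    """Another site: show titles and descriptions, leaving the author out."""
    return [f'{doc["title"]}: {doc["description"]}' for doc in documents]


print([doc["title"] for doc in sorted_by(documents, "title")])
print([doc["title"] for doc in sorted_by(documents, "creator")])
print(summary_listing(documents))
```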

Giving machines context about what they're processing adds a whole new depth to what they can accomplish.

Drupal and RDFa

A web content management system such as Drupal can contain gigabytes of structured data. However, that structure remains safely tucked away in the database, contributing nothing to the context of the data -- nothing for machines, nothing for humans.

The first step toward giving this structured data machine-readable context was bringing useful fields into Drupal's core. Each field can then be used as an attribute, and its value mapped to that attribute, embedding metadata into Drupal's XHTML output.
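
The general idea of mapping fields to attributes can be sketched in a few lines of Python. To be clear, this is not Drupal code and not Drupal's API (Drupal itself is written in PHP). The field names, the FIELD_MAPPING table and the render_with_rdfa function are all hypothetical, meant only to show a field-to-vocabulary mapping driving the markup.

```python
from html import escape

DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical mapping from CMS field names to (XHTML element, vocabulary term).
FIELD_MAPPING = {
    "title": ("h1", "title"),
    "author": ("h3", "creator"),
    "summary": ("h4", "description"),
}


def render_with_rdfa(fields, prefix="dcmi", namespace=DC):
    """Render stored field values as XHTML annotated with property attributes."""
    lines = [f'<div xmlns:{prefix}="{namespace}">']
    for field, (element, term) in FIELD_MAPPING.items():
        if field in fields:
            value = escape(fields[field])  # keep the output well-formed
            lines.append(
                f'  <{element} property="{prefix}:{term}">{value}</{element}>'
            )
    lines.append("</div>")
    return "\n".join(lines)


print(render_with_rdfa({
    "title": "All About RDFa",
    "author": "CMSWire Staff Writer",
    "summary": "All you ever wanted to know about RDFa but were terrified to ask.",
}))
```

The point is that the author fills in ordinary fields, and the system adds the machine-readable context on the way out.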

Now that fields are in Drupal core, RDFa support is in the process of being added. The implications for search, social networking, storefront and just about every other web application are still hard to fathom, but a few people are trying to put them into words. And into video.

Geeks, You are Not Alone

We geeks often like to be alone, safe, with our code and data. But we also know the power of community. Being a semantic web nerd is the opposite of isolation, both in principle and in terms of social reality.

Linking Open Data Cloud by Richard Cyganiak.

The project of making the web more machine comprehensible has been underway for some time. The Linking Open Data Project is an active and growing W3C community. The W3C reported last October that the project already included over 100 billion known triples and more than 100 million links between public data sets.

Projects like Silk are advancing methodologies for connecting instrumented data sets. And there are tutorials available for publishing these data sets in useful ways.

The term of the hour is Linked Data, meaning data sets that are structured using RDF syntax, with RDF triples providing meaning and URIs connecting the data sets to one another. The term was coined by a little ole someone named Tim Berners-Lee. We think it just might catch on.
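
What "URIs connecting data sets" means in practice can be shown with a tiny Python sketch. The two data sets and every example.org URI in them are invented for illustration, as is the links_between helper. The FOAF and Dublin Core namespaces are the real ones.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
DC = "http://purl.org/dc/elements/1.1/"

# Two hypothetical data sets, each a set of (subject, predicate, object) triples.
people = {
    ("http://example.org/people/alice", FOAF + "name", "Alice"),
}
articles = {
    ("http://example.org/articles/1", DC + "title", "All About RDFa"),
    # The object here is a URI that the "people" data set describes:
    ("http://example.org/articles/1", DC + "creator", "http://example.org/people/alice"),
}


def links_between(source, target):
    """Return triples in `source` whose object is a subject in `target`."""
    target_subjects = {subject for subject, _, _ in target}
    return sorted(t for t in source if t[2] in target_subjects)


for subject, predicate, obj in links_between(articles, people):
    print(subject, "links to", obj)
```

Because both data sets use the same URI for the same thing, a program can follow the link from one to the other without either publisher having coordinated beyond choosing that identifier.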

Going Further

Manifestations of the RDFa possibilities are still few, but they are arguably endless -- and the idea factories are just now cranking up.

To spark your imagination further, check out the W3C's RDFa Primer. If you're a Drupal geek, tune into the Drupal Semantic Web Group and check out the RDFa (video) session from the last DrupalCon in DC.

Maybe it's high time your Web CMS got smarter.