The Dublin Core Metadata Initiative offers, among other things, a collection of metadata terms built around documents, such as "creator" for the original author, "type" for the type of document, and "description" for a short description of the document. To use a term from an external vocabulary, you need to first import the vocabulary (see our RDF tidbit above).

Expanding the example from earlier, perhaps the link was created to go into a larger document that starts something like this:

<div>
  <h1>All About RDFa</h1>
  <h3>CMSWire Staff Writer</h3>
  <h4>All you ever wanted to know about RDFa but 
      were terrified to ask.</h4>
  <a 
     href="http://www.w3.org/TR/xhtml-rdfa-primer/"
     rel="cite"
       >The W3C's RDFa Primer</a> is the definitive resource ...
</div>

To tell computers that this section imports vocabulary from DCMI, add an XML namespace declaration to the div. The declaration links to the machine-readable vocabulary definition document:

<div xmlns:dcmi="http://purl.org/dc/elements/1.1/">

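The dcmi prefix tells computers that when they see dcmi: in front of a vocabulary term, they should look to the DCMI document to know how to handle it. The prefix is simply shorthand: a term such as dcmi:title is read as the namespace address followed by "title", which is why the address needs its trailing slash. Now that the vocabulary is imported for this section, the title, the creator and the description can all be marked with metadata, allowing computers to decide how to format and present the information: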

<div xmlns:dcmi="http://purl.org/dc/elements/1.1/">
  <h1 property="dcmi:title">All About RDFa</h1>
  <h3 property="dcmi:creator">CMSWire Staff Writer</h3>
  <h4 property="dcmi:description">All you ever wanted to know about RDFa 
      but were terrified to ask.</h4>
  <a  
     href="http://www.w3.org/TR/xhtml-rdfa-primer/"
     rel="cite"       
       >The W3C's RDFa Primer</a> is the definitive resource ...
</div>

The "property" attribute tells computers that the element's text is the value of the named vocabulary term, so they know to treat it as metadata rather than as ordinary content.
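To make the machine's side of this concrete, here is a minimal sketch in Python of how a program might read the markup above. It is not a conforming RDFa parser: it only handles xmlns: prefixes and simple, non-nested "property" elements, and the names PropertyExtractor and extract_properties are invented for this illustration. It also assumes the namespace address ends in a slash, so that prefix and term join into a valid address.

```python
from html.parser import HTMLParser


class PropertyExtractor(HTMLParser):
    """Collects (expanded term, text) pairs from simple RDFa-style markup."""

    def __init__(self):
        super().__init__()
        self.prefixes = {}        # prefix -> namespace address, from xmlns:* attributes
        self.statements = []      # (expanded property address, element text) pairs
        self._open_tag = None     # tag name of the property element being read
        self._open_property = None
        self._text = []

    def expand(self, term):
        # "dcmi:title" becomes the namespace address plus "title".
        prefix, sep, name = term.partition(":")
        if sep and prefix in self.prefixes:
            return self.prefixes[prefix] + name
        return term  # unknown prefix: leave the term as written

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        for name, value in attrs.items():
            if name.startswith("xmlns:") and value:
                self.prefixes[name[len("xmlns:"):]] = value
        # Simplification: ignore property elements nested inside another one.
        if attrs.get("property") and self._open_tag is None:
            self._open_tag = tag
            self._open_property = self.expand(attrs["property"])
            self._text = []

    def handle_data(self, data):
        if self._open_tag is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == self._open_tag:
            # Collapse line breaks and indentation into single spaces.
            text = " ".join("".join(self._text).split())
            self.statements.append((self._open_property, text))
            self._open_tag = None


def extract_properties(markup):
    parser = PropertyExtractor()
    parser.feed(markup)
    parser.close()
    return parser.statements


SAMPLE = """
<div xmlns:dcmi="http://purl.org/dc/elements/1.1/">
  <h1 property="dcmi:title">All About RDFa</h1>
  <h3 property="dcmi:creator">CMSWire Staff Writer</h3>
  <h4 property="dcmi:description">All you ever wanted to know about RDFa
      but were terrified to ask.</h4>
</div>
"""

for term, text in extract_properties(SAMPLE):
    print(term, "=", text)
```

Each printed line pairs a full vocabulary address with the text a human reader sees on the page, which is the "context" the rest of this article keeps coming back to.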

As you might imagine, using a technology like RDFa can make creating a document a bit more work, but it also means that machines can now understand the context of the content they're serving. A program on one site, looking at document after document marked up using the methods discussed above, might be programmed to display all documents sortable by title or author.

A program on another site might be more interested in the titles and descriptions, de-emphasizing the author information in the listing so people can focus on what the documents are about.

One search engine might know to ignore the document because it is only interested in product information, while others focus on helping researchers and so would definitely index this content, presenting search results in a way that helps users quickly find what they need.
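The first two scenarios can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the metadata is assumed to have already been extracted into dictionaries keyed by full Dublin Core addresses, the second document is made up for the example, and the function names sorted_by and summary_listing are hypothetical.

```python
DC = "http://purl.org/dc/elements/1.1/"

# The first record mirrors the example above; the second is invented for illustration.
documents = [
    {
        DC + "title": "All About RDFa",
        DC + "creator": "CMSWire Staff Writer",
        DC + "description": "All you ever wanted to know about RDFa but were terrified to ask.",
    },
    {
        DC + "title": "A Field Guide to Metadata",
        DC + "creator": "Zoe Example",
        DC + "description": "A hypothetical second document.",
    },
]


def sorted_by(docs, term):
    """One site's view: order documents by a Dublin Core term such as 'title' or 'creator'."""
    return sorted(docs, key=lambda doc: doc.get(DC + term, "").lower())


def summary_listing(docs):
    """Another site's view: titles and descriptions only, with authors left out."""
    return [
        f"{doc.get(DC + 'title', '(untitled)')}: {doc.get(DC + 'description', '')}"
        for doc in docs
    ]


print([doc[DC + "title"] for doc in sorted_by(documents, "title")])
print([doc[DC + "title"] for doc in sorted_by(documents, "creator")])
print("\n".join(summary_listing(documents)))
```

Both views work from the same markup. Neither site had to agree with the other, or with the author, about anything beyond the shared vocabulary.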

Giving machines context about what they're processing adds a whole new depth to what they can accomplish.

Drupal and RDFa

A web content management system such as Drupal can contain gigabytes of structured data. However, that structure remains safely tucked away in the database, contributing nothing to the context of the data -- nothing for machines, nothing for humans.

The first step toward giving this structured data machine-readable context was bringing useful fields into Drupal's core. Each field can then be mapped to a vocabulary term, and its value marked up with that term as an attribute, embedding metadata into Drupal's XHTML output.
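The general idea of mapping fields to vocabulary terms can be sketched as follows. To be clear, this is not Drupal's code or API (Drupal is written in PHP); it is a Python illustration only, and the field names, the FIELD_MAPPING table and the render_fields function are all assumptions made for this example.

```python
from html import escape

# Hypothetical mapping: CMS field name -> (HTML tag, vocabulary term).
FIELD_MAPPING = {
    "title": ("h1", "dcmi:title"),
    "author": ("h3", "dcmi:creator"),
    "summary": ("h4", "dcmi:description"),
}


def render_fields(fields, mapping=FIELD_MAPPING):
    """Render stored field values as markup, adding a property attribute where a mapping exists."""
    lines = ['<div xmlns:dcmi="http://purl.org/dc/elements/1.1/">']
    for name, value in fields.items():
        if name in mapping:
            tag, term = mapping[name]
            lines.append(f'  <{tag} property="{term}">{escape(value)}</{tag}>')
        else:
            # Unmapped fields are still shown to humans, just without metadata.
            lines.append(f"  <p>{escape(value)}</p>")
    lines.append("</div>")
    return "\n".join(lines)


print(render_fields({
    "title": "All About RDFa",
    "author": "CMSWire Staff Writer",
    "summary": "All you ever wanted to know about RDFa but were terrified to ask.",
}))
```

The point of the sketch is that the author never types an attribute by hand. Once a field is mapped, every page built from it carries the metadata automatically.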

Now that fields are in Drupal core, RDFa support is in the process of being added. The implications for search, social networking, storefront and just about every other web application are still hard to fathom, but a few people are trying to put them into words. And into video.

Geeks, You are Not Alone

We geeks often like to be alone, safe, with our code and data. But we also know the power of community. Being a semantic web nerd is the opposite of isolation, both in principle and in terms of social reality.

Figure: Linking Open Data Cloud by Richard Cyganiak.

The project of making the web more machine comprehensible has been underway for some time. The Linking Open Data Project is an active and growing W3C community. The W3C reported last October that the project already encompassed over 100 billion known triples and more than 100 million links between public data sets.

Projects like Silk are advancing methodologies for connecting instrumented data sets. And there are tutorials available for publishing these data sets in useful ways.

The term of the hour is Linked Data, meaning data sets that are structured as RDF triples (subject, property, value), with the triples providing meaning and URIs connecting one data set to another. The term was coined by a little ole someone named Tim Berners-Lee. We think it just might catch on.
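A toy model in Python shows why shared URIs matter. This is a sketch, not how any real triple store works: the article URI on example.org, the contents of both data sets, and the function names about and linked_facts are all made up for illustration. Only the Dublin Core namespace and the Primer's address come from the examples earlier in this article.

```python
from collections import namedtuple

Triple = namedtuple("Triple", ["subject", "predicate", "object"])

DC = "http://purl.org/dc/elements/1.1/"
PRIMER = "http://www.w3.org/TR/xhtml-rdfa-primer/"
ARTICLE = "http://example.org/articles/all-about-rdfa"  # hypothetical URI

# Data set published by one site (illustrative).
site_data = [
    Triple(ARTICLE, DC + "title", "All About RDFa"),
    Triple(ARTICLE, DC + "creator", "CMSWire Staff Writer"),
    Triple(ARTICLE, DC + "relation", PRIMER),  # a link into another data set
]

# Data set published somewhere else entirely (illustrative).
other_data = [
    Triple(PRIMER, DC + "title", "RDFa Primer"),
    Triple(PRIMER, DC + "publisher", "W3C"),
]


def about(uri, *datasets):
    """All triples, from any data set, whose subject is the given URI."""
    return [t for data in datasets for t in data if t.subject == uri]


def linked_facts(uri, *datasets):
    """Follow each value of the given subject and collect what is known about it."""
    facts = []
    for triple in about(uri, *datasets):
        facts.extend(about(triple.object, *datasets))
    return facts


for fact in linked_facts(ARTICLE, site_data, other_data):
    print(fact.predicate, "=", fact.object)
```

Because both data sets use the same URI for the Primer, a program can hop from the article to facts that its publisher never wrote down.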

Going Further

Manifestations of RDFa's possibilities are still few, but they are arguably endless, and the idea factories are just now cranking up.

To spark your imagination further, check out the W3C's RDFa Primer. If you're a Drupal geek, tune into the Drupal Semantic Web Group and check out the RDFa (video) session from the last DrupalCon in DC.

Maybe it's high time your Web CMS got smarter.