If you think RDFa and the semantic web are only for geeks, it's time to take a second look. The World Wide Web Consortium (W3C) is advancing the standards for tomorrow's Internet, and web content management vendors are getting on board. The result is going to be a smarter, more findable Web.

The W3C recently took another significant step forward with its semantic web project -- the publication of the first public working draft (FPWD) of the RDFa API. This document from the RDFa Working Group has long been expected, and it matters because it lets developers begin using RDFa in their applications. The RDFa API document details the mechanism by which software can extract facts from RDFa mark-up inside web pages.

Blah blah. What is this RDFa Geekery?

If you don't follow the topic closely, you might think that RDFa is just another standard in the already cluttered world of Web standards. Well, this isn't exactly so.

First, RDFa is still in its draft stages and is not a standard yet. As we have already reported, the first working draft of the HTML + RDFa specification was published in October 2009 and was updated in March 2010. The latest draft of the RDFa API is another significant step in the progression. We're not quite at standard status yet, though.

Second, RDFa stands a chance of revolutionizing what we now call the Web -- broadly enhancing its usefulness and significantly changing what we know as "web browsing".

What's So Great About RDFa?

The RDFa project defines the rules for publishing structured information that is both human- and machine-readable.

Traditionally, human-readable web content is in a format machines can't semantically understand (i.e., your browser can render the content, but it can't tell you what it means), and machine-readable data is incomprehensible to most humans. The RDFa and RDFa API specifications are meant to bridge this gap.

Web documents contain plenty of data that is meaningful to both humans and machines. When this data is marked up accordingly and there is an easy way to extract it from a document in structured form, it can be used in many ways. The purpose of the RDFa API is exactly this -- to allow web developers to programmatically extract and use structured information from normal, human-readable web pages.

Put another way, once people start publishing their web content with RDFa mark-up, the Internet will increasingly become one big web database, which means that you and I will be able to ask more sophisticated questions and get much richer responses. It also means that the robots which index the web will be able to make many, many more connections between data sources.

Once you sit back and imagine the possibilities, the semantic web starts to get exciting. More than that, those who are not participating are going to be left behind in the next wave of information findability and connectivity.

How the RDFa API Works

The RDFa API uses so-called PropertyGroups, in which properties of elements in HTML and XML documents are gathered together to assert facts. These properties store data or facts about items such as individuals, companies and pieces of information, and this is what we need to extract in both machine- and human-readable form. The extraction is accomplished with the help of interfaces defined by the RDFa API.

Usually RDFa documents have two data layers -- a Document Object Model (DOM) level, where information about the hierarchy and data values is contained, and an embedded metadata level, where RDFa data is stored.

Metadata can be accessed via existing DOM interfaces, but this is less convenient than having both kinds of information available in one collection (i.e., a PropertyGroup). This is where the RDFa API comes into play, making it easier and faster for developers to access data from both levels.
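To make the idea of gathering properties into one collection more concrete, here is a minimal sketch in Python. It is not the W3C RDFa API and not a conforming RDFa processor: the class PropertyGroupCollector, the function extract_property_groups and the sample page are all made up for illustration, and the code handles only the about, typeof, property and content attributes while ignoring prefix declarations entirely. It walks the mark-up once and returns, for each subject, a single dictionary of the facts stated about it.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Made-up sample page: ordinary HTML with RDFa-style attributes mixed in.
SAMPLE_PAGE = """
<div about="#post-1" typeof="sioc:Post">
  <h2 property="dc:title">Hello RDFa</h2>
  Published <span property="dc:date" content="2010-01-01">January 1, 2010</span>
</div>
<div about="#post-2" typeof="sioc:Post">
  <h2 property="dc:title">Second post</h2>
</div>
"""


class PropertyGroupCollector(HTMLParser):
    """Hypothetical helper: groups RDFa-style properties by subject."""

    def __init__(self):
        super().__init__()
        self.groups = defaultdict(dict)  # subject -> {property: value}
        self._stack = []                 # (tag, subject in scope) per open element
        self._pending = None             # (subject, property) awaiting text content

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # An element inherits its parent's subject unless it sets "about".
        inherited = self._stack[-1][1] if self._stack else ""
        subject = attrs.get("about", inherited)
        self._stack.append((tag, subject))
        if "typeof" in attrs:
            self.groups[subject]["rdf:type"] = attrs["typeof"]
        prop = attrs.get("property")
        if prop:
            if "content" in attrs:
                # The machine-readable value overrides the visible text.
                self.groups[subject][prop] = attrs["content"]
            else:
                self._pending = (subject, prop)

    def handle_data(self, data):
        # Use the element's visible text as the property value.
        if self._pending and data.strip():
            subject, prop = self._pending
            self.groups[subject][prop] = data.strip()
            self._pending = None

    def handle_endtag(self, tag):
        # Pop back to the matching open tag (tolerates unclosed void elements).
        while self._stack:
            open_tag, _ = self._stack.pop()
            if open_tag == tag:
                break
        self._pending = None


def extract_property_groups(html):
    """Return {subject: {property: value}} for the given mark-up."""
    collector = PropertyGroupCollector()
    collector.feed(html)
    collector.close()
    return dict(collector.groups)


groups = extract_property_groups(SAMPLE_PAGE)
for subject, properties in groups.items():
    print(subject, properties)
```

Each entry in groups plays roughly the role the article attributes to a PropertyGroup: everything known about one item sits in one place, so the caller does not have to walk the document tree and piece the metadata together by hand.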

[Note: For details on RDFa syntax, see the W3C's RDFa Core working draft. For clear examples of RDFa in action, see our article RDFa, Drupal and a Practical Semantic Web.]

RDFa is used to process web pages, extracting facts that machines understand.

Web CMS Vendors Already on the RDFa Bandwagon

The RDFa specifications are still developing, but cutting-edge Web CMS projects such as Drupal are already integrating RDFa functionality. Beta 1 of Drupal 7 was recently released, and one of its major new features is content type extensibility (Fields) with RDFa support.

Drupal 7's core includes customizable content field definitions whose data is stored and can be rendered to the web in a machine-readable format, while still being presented normally in your browser.

Drupal is one of the first Web CMS packages to include support for RDFa, but the bar for participation is low, so we expect broad support for the standard as it progresses towards maturity.