The Semantic Web is Here? XML, Calais and SearchMonkey

When we talk about the Semantic Web, we mean richer meta-information embedded in the page code but derived from the content itself, so that Web services and search engines know exactly what's there without having to guess from keywords and tags. XML is one format that can structure content to carry more classification material. RDF is the preferred data model: it describes content as entities and the relationships between them, and it is most often serialized as XML.

Paul Wlodarczyk of JustSystems believes the Semantic Web is closer than we think. He's written a great post about it, focusing particularly on how XML can take ambiguity out of search and enable other semantic advantages. He likens the 'old' Web to one where the wisdom of crowds prevails (more backlinks equals better content), and the RDF-structured Web to one where the 'wisdom of authors' wins; '…who can let the crowds know, in no uncertain terms, what their content means.' So a post on a New York sports team involved in a trade with an L.A. sports team (using Paul's example) is unambiguously about, say, the Knicks and the Lakers.

More importantly, with RDF/XML the relationships between those entities can be formally identified. So a player trade between those two teams can be tagged as such. Making relationships explicit in Web content opens up a whole new world of potential Web usage, not just in search but in content mashups, Web monitoring and intelligence, social marketing, targeted advertising and more.
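
To make the entity-and-relationship idea concrete, here is a minimal sketch in Python of how the trade example might be written down as RDF triples and serialized as RDF/XML. The `example.org` URIs and the `mentions` and `tradedWith` properties are made up for illustration; they are not from Wlodarczyk's post or from any published vocabulary.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX_NS = "http://example.org/sports#"  # hypothetical vocabulary, for illustration only

ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("ex", EX_NS)

# Each triple is (subject, predicate, object). All URIs are invented examples.
TRIPLES = [
    ("http://example.org/post/1", "mentions", "http://example.org/team/knicks"),
    ("http://example.org/post/1", "mentions", "http://example.org/team/lakers"),
    ("http://example.org/team/knicks", "tradedWith", "http://example.org/team/lakers"),
]


def triples_to_rdfxml(triples):
    """Serialize (subject, predicate, object) URI triples as an RDF/XML string."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    descriptions = {}  # one rdf:Description element per distinct subject
    for subject, predicate, obj in triples:
        desc = descriptions.get(subject)
        if desc is None:
            desc = ET.SubElement(
                root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": subject}
            )
            descriptions[subject] = desc
        # The relationship is a child element that points at the object resource.
        ET.SubElement(desc, f"{{{EX_NS}}}{predicate}", {f"{{{RDF_NS}}}resource": obj})
    return ET.tostring(root, encoding="unicode")


rdf_xml = triples_to_rdfxml(TRIPLES)
print(rdf_xml)
```

The point of the sketch is that the trade is no longer a keyword match: it is an explicit, machine-readable link from one identified team to another, which is what a search engine or mashup could query.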

There are three ingredients needed to enable and popularize the semantic Web, so let's look at them, with a little help from Wlodarczyk, and see if we really are on the cusp of a content revolution.

The Recipe for the Semantic Web


1: Markup Technology

XML and RDF: Check. XML has been around forever; it is mature, familiar to developers, optimized for the Web and easy to learn. The rapidly evolving XBRL will hopefully bring classification and order to financial and business data. The W3C's RDF framework doesn't have to be serialized as XML, but RDF/XML looks like the winning ticket.

2: Transforming Content and Legacy Content to Rich Formats

Check. The key here is technologies like Calais, which auto-tag content and render it as XML. You can test this for yourself: go to ClearForest's tag generator and XML rendering service (it has an open API). You can paste text or import a file, see how the content has been tagged, and save the output as an XML file. Interestingly, ClearForest was bought by Reuters last year.
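
As a rough illustration of the "text in, tagged XML out" idea, here is a toy sketch in Python. It is not the Calais API and does not reflect how Calais works internally: the `KNOWN_ENTITIES` lookup table, the `tag_text` function and the output element names are all hypothetical, and a real service would rely on natural-language processing rather than a fixed list.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical lookup table standing in for real entity extraction.
KNOWN_ENTITIES = {
    "Reuters": "Company",
    "ClearForest": "Company",
    "New York": "City",
}


def tag_text(text):
    """Return an XML string listing the known entities found in the text."""
    doc = ET.Element("document")
    ET.SubElement(doc, "text").text = text
    entities = ET.SubElement(doc, "entities")
    for name, kind in KNOWN_ENTITIES.items():
        # Match whole words only, and record where each mention starts.
        for match in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            entity = ET.SubElement(
                entities, "entity", {"type": kind, "offset": str(match.start())}
            )
            entity.text = name
    return ET.tostring(doc, encoding="unicode")


tagged = tag_text("Reuters bought ClearForest.")
print(tagged)
```

Even in this simplified form, the output separates the original text from a typed list of the things it mentions, which is the raw material for the RDF-style relationships discussed earlier.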

The next step here may be unleashing an army of crawlers that trawl through your content, rendering it as XML and perhaps mirroring it. Perhaps a shadow Web might emerge, with legacy content living both as HTML and, in a new location, as XML, the latter optimized for the new search technologies and services?

Flights of fancy aside, the last time we talked with Acquia (the commercial Drupal distribution company co-founded by Drupal founder Dries Buytaert), they were talking about bundling Open Calais with their first commercial Drupal package, and there's already a Drupal Calais module.
