If you believe (as I do) that semantics is the key to smart content — to content enriched and structured to promote findability, reuse, and task-focused knowledge extraction — and if you believe (as I do) in open source, then the IKS Semantic Project may be for you.
IKS stands for Interactive Knowledge Stack (see previous coverage). It provides a framework for semanticizing managed content. Why is that important? Because "current [content management systems] lack the capability for semantic web enabled, intelligent content, and therefore lack the capacity for users to interact with the content at the user’s knowledge level," according to the August 2009 project documentation.
IKS is open source, designed to integrate with open-source web and enterprise content management systems. The project is in early stages, the work of a consortium that consists of seven academic research groups and six "industrial partners," companies active in the content management space. It is funded by European Union research program grants and was accepted into the Apache incubator program, as Apache Stanbol, in November 2010.
I expect to learn more about the project, the technology, and adoption at the upcoming IKS Project Workshop, July 5-6 in Paris, where I am slated to speak on Smart Content. For now, I’ll summarize what I’ve learned and concluded to date, and I’ll relate impressions I solicited from a number of industry observers.
Start with Technology
IKS is a framework. It implements a Reference Infrastructure for Content and Knowledge (RICK), which "provides two main functionalities for a content management system," namely "an infrastructure to manage referenced sites" and an Entity Hub, "RESTful services to work with entities used by a CMS."
Entities, in the text analytics/semantics worlds, are named persons, organizations, geographic areas, and the like. The term is sometimes extended to pattern-based elements such as e-mail addresses and Social Security Numbers and to abstractions such as "the economy" and "automobile manufacturers." RESTful refers to stateless Web services, typically invoked via a URL from a Web browser or application program.
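To make the pattern-based case concrete, here is a minimal Python sketch that pulls e-mail addresses and Social Security Numbers out of text with regular expressions. The patterns and the function name find_pattern_entities are my own illustrations, not part of IKS, and they are deliberately simplistic; production-grade recognizers handle many more edge cases.

```python
import re

# Illustrative patterns only; real-world e-mail and SSN recognition is more involved.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def find_pattern_entities(text):
    """Return (entity_type, matched_text) pairs in order of appearance."""
    found = []
    for entity_type, pattern in (("email", EMAIL_PATTERN), ("ssn", SSN_PATTERN)):
        for match in pattern.finditer(text):
            # Keep the start offset so results can be sorted by position.
            found.append((match.start(), entity_type, match.group()))
    found.sort()
    return [(entity_type, value) for _, entity_type, value in found]


sample = "Contact jane.doe@example.com regarding record 123-45-6789."
entities = find_pattern_entities(sample)
```

Named entities such as persons and organizations cannot be captured this way; they call for statistical or dictionary-based NLP, which is where an engine such as the one described next comes in.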
IKS resolves entities via FISE, an open-source, RESTful semantic engine. A blog post published by IKS participant Nuxeo describes FISE in some detail. FISE is designed for private installation and uses Apache OpenNLP, a set of natural language processing (NLP) tools, for named-entity detection and the Apache Lucene full-text search library for indexing. It enriches content via links to DBpedia, essentially a database that replicates information harvested from Wikipedia data structures.
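As a rough illustration of what invoking a RESTful engine looks like from an application program, the sketch below builds a stateless HTTP POST request that carries plain text to be annotated. The endpoint URL, the helper name build_enhancement_request, and the requested response format are all assumptions of mine for illustration; check the FISE documentation for the actual service paths and media types.

```python
from urllib.request import Request

# Assumption: a privately installed engine listening at this hypothetical URL.
ENGINE_URL = "http://localhost:8080/engines"


def build_enhancement_request(text, url=ENGINE_URL):
    """Build (but do not send) a stateless POST request carrying plain text."""
    return Request(
        url,
        data=text.encode("utf-8"),
        headers={
            "Content-Type": "text/plain; charset=utf-8",
            "Accept": "application/rdf+xml",  # assumed response format
        },
        method="POST",
    )


request = build_enhancement_request("Paris is the capital of France.")

# To send it against a running server, something like:
#   from urllib.request import urlopen
#   with urlopen(request) as response:
#       annotations = response.read().decode("utf-8")
```

Because each request carries everything the server needs, no session state has to be kept between calls, which is the point of the RESTful style.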
OpenNLP has Apache incubation status, a sort of provisional acceptance by the Apache Software Foundation. It provides only basic NLP functions: it doesn't support extraction of facts, events, relationships, or sentiment, nor of pattern-based information such as telephone numbers and e-mail addresses. Many NLP engines are more advanced and capable than OpenNLP. It appears, however, that FISE should be able to accommodate other annotators, whether installed or invoked via calls to entity-resolution Web services.
One IKS annotation feature, found in some but not all competing annotation engines, is the ability to resolve identified entities to Semantic Web uniform resource identifiers (URIs), IDs that uniquely designate things. URIs enable the DBpedia enrichment links. Beyond DBpedia, URIs are the key linking technology for the aspirational "Web of data."
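A small Python sketch shows the idea of attaching a URI to a detected entity. It relies on the DBpedia convention of forming resource URIs from Wikipedia article titles, with underscores in place of spaces. The function name candidate_dbpedia_uri is hypothetical, and the result is only a candidate: real entity resolution also has to disambiguate (is "Paris" the city or a person?), which this sketch does not attempt.

```python
from urllib.parse import quote

DBPEDIA_RESOURCE_PREFIX = "http://dbpedia.org/resource/"


def candidate_dbpedia_uri(label):
    """Turn an entity label into a candidate DBpedia resource URI."""
    # Collapse whitespace and join words with underscores, as in Wikipedia titles.
    title = "_".join(label.strip().split())
    # Percent-encode characters that are not allowed to appear raw in a URI.
    return DBPEDIA_RESOURCE_PREFIX + quote(title, safe="_(),")


mentions = ["Paris", "Apache Software Foundation"]
links = {mention: candidate_dbpedia_uri(mention) for mention in mentions}
```

Once two documents both point at the same URI, software can tell they mention the same thing, whatever wording each document uses.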
IKS has additional components, beyond NLP. They include KRES, Knowledge Representation and Reasoning, which uses ontologies and Semantic Web technologies (OWL, RDF).
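To give a flavor of what ontology-based reasoning means, here is a toy Python sketch, not KRES code and not a real RDF library. It stores RDF-style subject-predicate-object triples and infers an entity's broader types by following rdfs:subClassOf links. The "ex:" names and the two helper functions are invented for illustration.

```python
# Triples are (subject, predicate, object); the "ex:" names are made up.
TRIPLES = {
    ("ex:Nuxeo", "rdf:type", "ex:SoftwareCompany"),
    ("ex:SoftwareCompany", "rdfs:subClassOf", "ex:Company"),
    ("ex:Company", "rdfs:subClassOf", "ex:Organization"),
}


def superclasses(cls, triples):
    """Collect every class reachable from cls via rdfs:subClassOf."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for subject, predicate, obj in triples:
            if subject == current and predicate == "rdfs:subClassOf" and obj not in seen:
                seen.add(obj)
                stack.append(obj)
    return seen


def inferred_types(entity, triples):
    """Return the entity's asserted types plus all of their superclasses."""
    direct = {o for s, p, o in triples if s == entity and p == "rdf:type"}
    inferred = set(direct)
    for cls in direct:
        inferred |= superclasses(cls, triples)
    return inferred


types = inferred_types("ex:Nuxeo", TRIPLES)
```

Only one type is asserted directly; the other two follow from the class hierarchy, so a search for organizations could surface content that never uses that word.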