If you believe (as I do) that semantics is the key to smart content -- to content enriched and structured to promote findability, reuse, and task-focused knowledge extraction -- and if you believe (as I do) in open source, then the IKS Semantic Project may be for you.
IKS stands for Interactive Knowledge Stack (see previous coverage). It provides a framework for semanticizing managed content. Why is that important? Because "current [content management systems] lack the capability for semantic web enabled, intelligent content, and therefore lack the capacity for users to interact with the content at the user’s knowledge level," according to the August 2009 project documentation.
IKS is open source, designed to integrate with open-source web and enterprise content management systems. The project is in early stages, the work of a consortium that consists of seven academic research groups and six "industrial partners," companies active in the content management space. It is funded by European Union research program grants and was accepted into the Apache incubator program, as Apache Stanbol, in November 2010.
I will learn more about the project, the technology, and adoption at the upcoming IKS Project Workshop, July 5-6 in Paris, where I am slated to speak on Smart Content. For now, I’ll summarize what I’ve learned and concluded to date, and I’ll also relate impressions I solicited from a number of industry observers.
Start with Technology
IKS is a framework. It implements a Reference Infrastructure for Content and Knowledge (RICK), which "provides two main functionalities for a content management system," namely "an infrastructure to manage referenced sites" and an Entity Hub, "RESTful services to work with entities used by a CMS."
Entities, in the text analytics/semantics worlds, are named persons, organizations, geographic areas, and the like. The term is sometimes extended to pattern-based elements such as e-mail addresses and Social Security Numbers and to abstractions such as "the economy" and "automobile manufacturers." RESTful refers to stateless Web services, typically invoked via a URL from a Web browser or application program.
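To make the "pattern-based elements" idea concrete, here is a minimal sketch of pattern-based extraction using regular expressions. The patterns and the function name are my own illustrative assumptions, deliberately simplified, and are not taken from IKS or any of its components.

```python
import re

# Simplified, illustrative patterns (assumptions, not production-grade rules).
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def find_pattern_entities(text):
    """Return pattern-based 'entities' found in text, grouped by type."""
    return {
        "email": EMAIL.findall(text),
        "ssn": SSN.findall(text),
    }


sample = "Contact jane@example.org about record 123-45-6789."
found = find_pattern_entities(sample)
print(found)
```

Named entities such as persons and organizations cannot be captured this way; they typically call for statistical or dictionary-based recognition, which is the job of the NLP tooling discussed next.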
IKS resolves entities via FISE, an open-source, RESTful semantic engine. A blog post published by IKS participant Nuxeo describes FISE in some detail. FISE is designed for private installation and uses Apache OpenNLP, a set of natural language processing (NLP) tools, for named-entity detection and the Apache Lucene full-text search library for indexing. It enriches content via links to DBpedia, which is essentially a database replica that captures information harvested from Wikipedia data structures.
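For readers who want a feel for what calling a RESTful semantic engine looks like, the sketch below builds (but does not send) a stateless HTTP request that submits plain text for analysis. The endpoint URL, the `Accept` media type, and the function name are assumptions for illustration only; I have not verified them against the FISE documentation, so check the interface of your own installation.

```python
from urllib.request import Request

# Hypothetical endpoint of a privately installed engine (an assumption).
ENGINE_URL = "http://localhost:8080/engines"


def build_enhancement_request(text, url=ENGINE_URL):
    """Build a POST request carrying plain text to be analyzed.

    The request is self-contained: everything the server needs travels
    with it, which is what "stateless" means in the RESTful sense.
    """
    return Request(
        url,
        data=text.encode("utf-8"),
        headers={
            "Content-Type": "text/plain; charset=utf-8",
            # Assumed response format; an RDF serialization seems likely.
            "Accept": "application/rdf+xml",
        },
        method="POST",
    )


req = build_enhancement_request("Paris is the capital of France.")
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would actually send it; that step is left out here so the sketch runs without a server.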
OpenNLP has Apache incubation status, which is a sort of provisional acceptance by the Apache Software Foundation. It provides only basic NLP functions. It doesn't support extraction of facts, events, or relationships, nor of sentiment or pattern-based information such as telephone numbers and e-mail addresses. It appears, however, that FISE should be able to accommodate other annotators, whether installed or invoked via calls to entity-resolution Web services. There are many NLP engines that are more advanced and capable than OpenNLP.
One IKS annotation feature, found in some but not all competing annotation engines, is the ability to resolve identified entities to Semantic Web uniform resource identifiers (URIs), IDs that uniquely designate things. URIs enable the DBpedia enrichment links. Beyond DBpedia, URIs are the key linking technology for the aspirational "Web of data."
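As a rough illustration of what "resolving to a URI" produces, the sketch below maps an entity label to a DBpedia-style resource URI, following DBpedia's convention of naming resources after Wikipedia article titles with underscores in place of spaces. The function is hypothetical and naive: it does no disambiguation, whereas a real engine must decide which "Paris" or which "Apache" a text actually means.

```python
from urllib.parse import quote

DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"


def dbpedia_uri(label):
    """Naively map an entity label to a DBpedia-style resource URI.

    Assumes the label already matches a Wikipedia article title.
    """
    title = label.strip().replace(" ", "_")
    # Percent-encode anything that is not safe in a URI path segment.
    return DBPEDIA_RESOURCE + quote(title)


uri = dbpedia_uri("New York City")
print(uri)
```

The value of the URI is that two systems that annotate different documents with the same identifier are asserting that both documents mention the same thing, which is what makes linking across data sets possible.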
IKS has additional components, beyond NLP. They include KRES, Knowledge Representation and Reasoning, which uses ontologies and Semantic Web technologies (OWL, RDF).
Architecture and Community
IKS is being developed and supported by a community of cooperating-competing project participants and is designed for adoption by those participants and by a broader set of content-management and search providers and users. The converging-diverging needs of a diverse community are best served with a strong technical and business architecture that allows for coordinated development of components and capabilities, and IKS has one.
I won’t attempt to describe the technical elements of the IKS architecture. Instead, I will cite the technical functions it aims to support. Those capabilities are expressed as seven "industrial benchmarks" per early product documentation:
- Semantic search
- Content creation and presentation service (intelligent authoring)
- Workflow service (business processes and content)
- Multi-channel publishing (customizing content)
- Product configuration service (complex content aggregation)
- Event distribution service (spatio-temporal, semantic content, "making events visible in ambient environments")
- Customer relationship management, personalization service (with community semantics and "BI about the customer base")
IKS is in an "early adopters" phase. Who, in the CMS and search realms, is using it or otherwise participating?
- Jahia (news, site) open-source "Web Content Integration Software... combining Enterprise Web Content Management with Document Management and Portal features."
- Midgard (news, site) open-source "content management framework" for PHP and other Web languages.
- Nuxeo (news, site) open-source document management. I looked through the semantics demo and the annotations demo. There's automated content enrichment via FISE's reach into DBpedia, but beyond that, it appears that the semantic hooks (annotation processes) are manual, not automated.
- TXT Solutions (multimedia)
- Alfresco partner Zaizi is building FISE into the Alfresco Enterprise CMS
Tying the participants list back to the architecture point: Note that work done by one partner in a larger open-source project is not necessarily attractive to other users of the software or other project contributors. The work will languish if the partner doesn't continue it. Architecture is important to insulate and protect the whole from the parts.
A Frank Assessment
It is clear that the project is in early stages, also that it has generated a degree of enthusiasm: Good elements, necessary although not sufficient for success. The project is ambitious, long on vision, but also duplicative or even redundant of other efforts without necessarily being better. It is likely of interest to only a small portion of the market -- open source typically appeals to systems integrators and developers, and IKS is explicitly pitched to those audiences -- and sustainability is highly questionable.
Sustainability is a significant consideration. As mentioned, the IKS technology was very recently accepted into the Apache incubator program, in November 2010. A robust, mature product could gain full Apache acceptance. Apache cross-pollination could help. I note that the Stanbol Apache incubator proposal page states, "members of the Clerezza community have contributed some key pieces, and ties between [the Clerezza and Stanbol] communities are strong."
Clerezza is another Apache Incubator semantics-related project: "Clerezza is a service platform based on OSGi (Open Services Gateway initiative) which provides a set of functionality for management of semantically linked data accessible through RESTful Web Services and in a secured way.
Furthermore, Clerezza allows to easily develop semantic web applications by providing tools to manipulate RDF data, create RESTful Web Services and Renderlets using ScalaServerPages."
Apache interoperation -- Stanbol-Clerezza-OpenNLP -- is good, and promotion out of the incubator, at the appropriate time, would be even better, an implicit endorsement of the project’s viability. But Apache presence is not enough, especially if the project is to survive beyond the point where European Union (EU) and European-government funding (in the IKS/Stanbol case) dries up. There is significant risk that the nascent community would dissolve without that funding. The research groups (7 for IKS) could move on to other funded projects and the weaker or less committed industrial users (6 total for IKS) could abandon it.
I think the key to success is ongoing involvement by commercial project supporters such as Adobe, enterprise content management provider Nuxeo, and the Jahia community-applications platform, along with adoption, in the next year or two, by other significant ECM/CMS/applications providers.
Adobe is, far and away, IKS’s most prominent supporter -- via its acquisition of Day Software, a Web Content Management/Digital Asset Management vendor. Given a recent presentation by Day Software/Adobe developer Bertrand Delacrétaz, it appears that Adobe is continuing to support the effort for the time being.
Even then, the IKS technologies will likely remain only a low-end/mid-market solution -- which is not a bad thing, but it also won’t cause Autonomy, IBM, OpenText or Oracle to lose any sleep. But if IKS can become a preferred semantics add-on for open-source CMS platforms such as Drupal and Alfresco, so much the better.
Enterprise CMS-world Reactions, and Beyond
I asked a couple of enterprise content management authorities about IKS.
An executive with a large software/services firm replied, "Open source content management systems haven't really caught on ... the total market revenue is about $20 million across all of the vendors (Nuxeo, Alfresco, etc). While I agree with the idea, and the need, it's hard to see this community having that much ability to influence any time soon even if it has some EU funding."
John Blossom of Shore Communications remarked to me a few months back, "Interesting initiative, looking at the layers that could be defined I am somewhat skeptical as to how much of this can be translated from an open source specification into actual open source code."
Is there market space for IKS/Stanbol -- for open-source semantic services for mid-market content-management platforms? I’ll learn more myself at the upcoming community workshop. Readers who can’t make it to Paris next week (quel dommage) or who won’t read this article until after the workshop -- the vast majority of you -- check out IKS on the Web at iks-project.eu, on Twitter at @iks_project, and at the Apache Incubator site.
(Disclosures: My travel expenses will be paid by the IKS Paris conference. Further, this article is adapted from research I did for a consulting client, Enterprise CMS vendor OpenText, which is not involved in the project or the conference.)