WCM Field Notes is a regular column written in collaboration with Jon Marks (@McBoof), head of development at LBi. This first issue looks at evolving standards in the content management space and how they might influence you when selecting and implementing a web content management system.

The web is humming with talk of standards at the moment -- mainly because the Content Management Interoperability Services (CMIS) specification, version 1.0, is nearing its final state and is open for public review.

To celebrate this, I recently drew a picture:

2009-12-JMarks-JCRCMISOverview_v1-4.jpg

CMIS, JCR and OSGi for Idiots by Jon Marks (full version here)

Now while this picture is undoubtedly the best thing you've ever seen, it may very well make zero sense to you. To address that problem, in this article I shed some light on what the JCR, CMIS and OSGi standards are, and why you should care about them.

Java Content Repository (JCR) Alive and Well

Unlike standards such as WebDAV and the other JCR (the Johnson County Republicans), the Java Content Repository is still in active development.

For the purposes of this article, let's define the JCR as an infrastructure specification for interacting with general purpose content repositories using a Java API. If JCR is a new topic for you, I strongly recommend having a look at JCR In 15 Minutes by Bertrand Delacretaz.

For the rest of us, let's remember that a content repository is a multi-purpose service that combines the best of relational databases and file systems, with a number of useful facilities like versioning and observation. The following diagram -- presented by David Nuescheler (@davidnuescheler), JCR Spec Lead, Official JCR/CMIS Liaison and CTO of Day Software, at the recent JBoye conference in Aarhus -- provides us with a quick reference for typical content repository services.

2009-12-JCR-Slide-01.jpg

Content Repository Concepts by David Nuescheler

The Java Content Repository specification has evolved over a number of years and with the participation of many different software organizations. Three months ago we saw the final release of version 2.0 of the spec. The two versions of the Java Content Repository -- JCR 1.0 and JCR 2.0 -- are defined by two Java Specification Requests, JSR-170 and JSR-283 respectively.

The following diagram provides a high level view of JCR and the domains of v1.0 versus the v2.0 specification.

2009-12-JMarks-JCR.jpg

JCR from 10,000 Feet -- v1.0 versus v2.0

The v2.0 iteration deprecated XPath support and introduced a number of new features, with my personal favorite being Shareable Nodes. The Shareable Nodes feature gives you the ability to implement a content graph, not just a tree. In other words, with v2.0 a content item in the repository can have more than one parent.
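
To make the tree-versus-graph distinction concrete, here is a minimal sketch in plain Java. It is a toy model and not the javax.jcr API: the ContentNode class and its methods are hypothetical names invented for this illustration, and a real repository handles shareable nodes with far more care.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy model only: ContentNode is a hypothetical class, not part of javax.jcr.
class ContentNode {
    private final String name;
    private final List<ContentNode> parents = new ArrayList<>();
    private final List<ContentNode> children = new ArrayList<>();

    ContentNode(String name) {
        this.name = name;
    }

    // Links child under this node. A strict tree would refuse a child that
    // already has a parent; a graph simply records one more parent.
    void addChild(ContentNode child) {
        children.add(child);
        child.parents.add(this);
    }

    String getName() {
        return name;
    }

    List<ContentNode> getParents() {
        return Collections.unmodifiableList(parents);
    }

    List<ContentNode> getChildren() {
        return Collections.unmodifiableList(children);
    }
}

class ShareableNodesDemo {
    public static void main(String[] args) {
        ContentNode news = new ContentNode("news");
        ContentNode products = new ContentNode("products");
        ContentNode pressRelease = new ContentNode("press-release");

        // The same content item is filed under two sections, with no copy made.
        news.addChild(pressRelease);
        products.addChild(pressRelease);

        System.out.println(pressRelease.getName() + " has "
                + pressRelease.getParents().size() + " parents");
    }
}
```

The point of the sketch is that both sections hold the very same item, so an edit made via one parent is visible via the other.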

Does JCR Need to be in Your RFP?

In the field, I still see the JCR specs mentioned in many content management RFPs. Then again, we know that buzzwords are often included in CMS RFPs just for the sake of it. To this point Matt Hamilton (@HammerToe), Technical Director of Netsight, tweeted this recently:

Got govt tender doc mandating JCR-* standards, which effectively mandates Java. Can a govt body legally mandate a technology in a tender?

Matt raises an interesting question. I don't think we'll ever know if this was put into the RFP to intentionally mandate Java, or was simply cut-and-paste from some other buzzword-rich requirements document.  

I often see strange stuff in CMS RFPs. Piero Tintori (@pierotintori), CEO at TERMINALFOUR, has some funny examples of questions he recently saw. My favorites include "What size are the shipping pallets used for your product?" and "Is your product radioactive?"

You need to understand that mandating JSR-170 and/or JSR-283 in your RFP implies a Java-based CMS. If you include such requirements, know precisely why, have a clear business case supporting your thinking, and don't waste the time of the non-Java vendors by sending the RFP to them.

The New Kid on the Block: CMIS

Content Management Interoperability Services (CMIS) is an OASIS project started in 2008 and driven by a number of medium and large content management vendors (including Alfresco, Day, EMC, Fatwire, IBM, Microsoft and Open Text).

At this point I'm assuming that most readers have basic familiarity with CMIS. But for those that don't, have a look at the informative JCR loves CMIS presentation by Nuescheler and review the CMSWire coverage here. If you're feeling ambitious, you can take the deep dive on the OASIS project website.

For the purposes of this article, let's define CMIS as an interoperability specification for interacting with document-centric content repositories via HTTP-based protocols.

To keep ourselves on the straight and narrow, focus on the oft overlooked word interoperability in the CMIS acronym. Keep in mind that the aim of CMIS is to allow diverse systems to interoperate.

Further, CMIS is by definition a lowest common denominator specification -- it only provides core functionality. And by being simple, it is meant to be easy for vendors to implement. In terms of its place in the standards world, CMIS is intended to complement JCR, not compete with it -- a JCR repository typically serves as a system's internal repository, whereas repositories accessed via CMIS-compliant interfaces are typically supplementary.

2009-12-JMarks-CMIS.jpg

CMIS from 10,000 Feet

Should it be called DMIS?

There is broad agreement that CMIS 1.0 focuses on document-centric use cases -- and it's for this reason that I raise the (rhetorical) DMIS question. This is also the reason that the WCM Field Notes on CMIS make for easy reading -- there are none, nor will there be for a while.

CMIS is not a Web Content Management (WCM) specification, nor is it an Enterprise Content Management (ECM) specification. In fact, let's keep in mind that it isn't a content management specification at all. It is a content repository interoperability specification.

If you are managing composite HTML pages (that's WCM), CMIS isn't ready for you. Please don't go adding CMIS as a requirement in your WCM selection RFP document quite yet. It's not big and it's not clever. Feel free to include it in your Document Management RFP as recently suggested by Alan Pelz-Sharpe of CMS Watch.

If CMIS is Not for WCM...Then What? 

As usual, Laurence Hart (The Pie) explains it best, so I'm going to paraphrase him. His first example is Repository-to-Repository (R2R) interaction.

R2R interactions are common in existing Enterprise CMS suites, usually in proprietary ways. A fairly typical example is the journey of a piece of content from the collaboration tool (authoring environment) to the content management tool (to publish it) to the records management tool (to keep it compliant).

Each component in this journey could speak CMIS and pass the content directly to the next. However, because CMIS does not have an event system to let an external workflow engine know about changes, this kind of integration can't easily be replaced by an external application without hacking.

I prefer Laurence's second use case -- Application-to-Repository (A2R) interaction. Here we have an application that uses CMIS to talk to any compliant repository. This is the "SQL for Content Repository" situation and the possibilities are endless. The "application" could be lightweight JavaScript running in a browser for a bit of CMIS mashup fun.

His third use case, Federated Repositories, is A2R on steroids -- a single user interface presents the information from multiple repositories. This is clearly good news for a cross-repository search but, then again, search engines have always been good at indexing disparate sources and aggregating the results, so we probably don't need CMIS for that. But if we want to edit the results and save them back to the repository, we need a lot more than a search engine.
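
The A2R and federated cases can be sketched together in a few lines of Java. Everything here is an assumption made for illustration: DocumentRepository is a hypothetical interface standing in for a CMIS client binding, and InMemoryRepository fakes what a real repository would answer over HTTP. What the sketch tries to show is that the application only knows the shared contract, so adding another repository is a one-line change.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical repository-neutral contract; a stand-in for a CMIS client binding.
interface DocumentRepository {
    String id();
    List<String> findByName(String fragment);
}

// In-memory fake. A real implementation would talk to a repository over HTTP.
class InMemoryRepository implements DocumentRepository {
    private final String id;
    private final List<String> documentNames;

    InMemoryRepository(String id, List<String> documentNames) {
        this.id = id;
        this.documentNames = new ArrayList<>(documentNames);
    }

    public String id() {
        return id;
    }

    // Case-insensitive substring match on document names.
    public List<String> findByName(String fragment) {
        String needle = fragment.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (String name : documentNames) {
            if (name.toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(name);
            }
        }
        return hits;
    }
}

// The "application": it depends only on the interface, never on a vendor.
class FederatedSearch {
    private final List<DocumentRepository> repositories;

    FederatedSearch(List<DocumentRepository> repositories) {
        this.repositories = new ArrayList<>(repositories);
    }

    // Returns hits as "repositoryId:documentName", in repository order.
    List<String> search(String fragment) {
        List<String> results = new ArrayList<>();
        for (DocumentRepository repo : repositories) {
            for (String name : repo.findByName(fragment)) {
                results.add(repo.id() + ":" + name);
            }
        }
        return results;
    }
}

class FederatedSearchDemo {
    public static void main(String[] args) {
        DocumentRepository docs = new InMemoryRepository(
                "docs", Arrays.asList("Annual Report.pdf", "Invoice.doc"));
        DocumentRepository records = new InMemoryRepository(
                "records", Arrays.asList("Report Archive.zip"));

        FederatedSearch search = new FederatedSearch(Arrays.asList(docs, records));
        for (String hit : search.search("report")) {
            System.out.println(hit);
        }
    }
}
```

Note that this only covers the read side. Editing a result and saving it back would need write operations on the same contract, which is exactly where a search engine alone runs out of road.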

People have suggested that CMIS might be useful for content migrations. This makes some sense, but only if you're in a hurry to decommission a legacy system. Otherwise, just leave the content where it is and use one of the methods above to get at it.

It will be interesting to see if pure-play content migration vendors such as Vamosa and Kapow take an interest in it. My guess is no.

[Editor's Note: For more on this topic, see our series on content migration tools.]

A CMIS vs. WSRP Diversion

For clarification purposes, it's worth mentioning two other Java Specification Requests common in the CMS field. JSR-168 and JSR-286 define different versions of the Java Portlet Specification.

I think that the relationship between CMIS and the JCR is quite similar to the relationship between Web Services for Remote Portlets (WSRP) and the Java Portlet Specs:

  • One provides a local API, the other is meant to be used remotely over HTTP
  • One is Java based (although the JCR has been ported), while the other is programming language independent
  • Java Portlets and JCR repositories can be exposed via WSRP and CMIS using Apache WSRP4J and Apache Chemistry respectively
  • One is a Java Community Process specification, the other is an OASIS specification
  • They are complementary, not competing technologies

With that said, it appears that WSRP is losing the standards war to widely adopted, simpler things like Google's OpenSocial and other widget/gadget platforms. Let's hope the same isn't in store for CMIS.

OSGi - the Dynamic Module System for Java

You may not have heard of OSGi, and the acronym certainly doesn't give away any secrets. It used to stand for the Open Services Gateway initiative, but it doesn't stand for much of anything these days. This is not to say that OSGi is passé or dead. Au contraire.

To start, it's much more helpful to use the tagline: The Dynamic Module System for Java (tm). It doesn't have much to do with the JCR or CMIS, except that it is an important part of Day Software's JCR repository product, and is included on my most excellent diagram. More seriously, I feel it deserves a mention and that content management people should be aware of it.

Enterprise Java No Place for Dilettantes

Let's face it, Java Enterprise Edition can be a beast -- the world is trying to keep things simple and Java EE certainly isn't. This is precisely why Java application frameworks like Spring have taken a big bite of the market. OSGi's main selling points are that, compared to a traditional Java enterprise application deployment, an OSGi-based app is more modular and much simpler.

OSGi provides a framework which sits on top of a Java Virtual Machine. Developers create bundles which live inside the framework. The framework manages the lifecycle of these bundles (installing, starting, stopping and uninstalling them), resolves the dependencies between them (saving you from Dependency Hell) and controls access to them. The framework also provides a Compendium of Services (much like Java EE) to make a developer's life easier. These include logging, configuration management, monitoring, caching, deployment and provisioning.
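
For the curious, the lifecycle and dependency ideas can be sketched as a toy state machine. This is not the org.osgi.framework API: ToyFramework, ToyBundle and BundleState are hypothetical names, the transient STARTING and STOPPING states of real OSGi are left out, and a failed resolution is reported with a plain IllegalStateException as a simplification.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified states; real OSGi also has transient STARTING and STOPPING states.
enum BundleState { INSTALLED, RESOLVED, ACTIVE, UNINSTALLED }

// Toy bundle: a name plus the package names it exports and imports.
class ToyBundle {
    final String name;
    final Set<String> exports;
    final Set<String> imports;
    BundleState state = BundleState.INSTALLED;

    ToyBundle(String name, Set<String> exports, Set<String> imports) {
        this.name = name;
        this.exports = new HashSet<>(exports);
        this.imports = new HashSet<>(imports);
    }
}

// Toy framework: owns the bundles and drives their lifecycle.
class ToyFramework {
    private final List<ToyBundle> bundles = new ArrayList<>();

    ToyBundle install(String name, Set<String> exports, Set<String> imports) {
        ToyBundle bundle = new ToyBundle(name, exports, imports);
        bundles.add(bundle);
        return bundle;
    }

    // Starts a bundle only if every package it imports is exported
    // by some bundle that is currently installed in the framework.
    void start(ToyBundle bundle) {
        requireNotUninstalled(bundle);
        Set<String> available = new HashSet<>();
        for (ToyBundle b : bundles) {
            available.addAll(b.exports);
        }
        if (!available.containsAll(bundle.imports)) {
            throw new IllegalStateException("Unresolved imports for " + bundle.name);
        }
        bundle.state = BundleState.ACTIVE;
    }

    // Stopping keeps the bundle resolved so it can be started again.
    void stop(ToyBundle bundle) {
        requireNotUninstalled(bundle);
        if (bundle.state == BundleState.ACTIVE) {
            bundle.state = BundleState.RESOLVED;
        }
    }

    void uninstall(ToyBundle bundle) {
        requireNotUninstalled(bundle);
        bundles.remove(bundle);
        bundle.state = BundleState.UNINSTALLED;
    }

    private void requireNotUninstalled(ToyBundle bundle) {
        if (bundle.state == BundleState.UNINSTALLED) {
            throw new IllegalStateException(bundle.name + " is uninstalled");
        }
    }
}

class ToyFrameworkDemo {
    public static void main(String[] args) {
        ToyFramework framework = new ToyFramework();
        ToyBundle web = framework.install("web",
                Collections.<String>emptySet(), Collections.singleton("com.example.log"));
        try {
            framework.start(web); // fails: nobody exports com.example.log yet
        } catch (IllegalStateException e) {
            System.out.println("Cannot start yet: " + e.getMessage());
        }
        framework.install("log",
                Collections.singleton("com.example.log"), Collections.<String>emptySet());
        framework.start(web); // succeeds now that the import can be satisfied
        System.out.println(web.name + " is " + web.state);
    }
}
```

The design choice worth noticing is that the framework, not the bundle, decides whether a start is allowed. That is the part that keeps Dependency Hell at bay.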

The following diagram provides a high-level view of bundles and OSGi framework services. For a deep dive on why one might fall in love with OSGi, you can scratch around in the OSGi Alliance website.

2009-12-JMarks-OSGi.png

OSGi and Bundles From 10,000 Feet

Why You Should Care

Content management people with a penchant for Java should have an eye on OSGi. Why? Because with OSGi one can do many neat things in a sane manner. For example, engineers might use OSGi as a basis for a product's auto-update feature. Or they can use it to deploy code and content bundles between different environments. Or they could run two versions of a product side-by-side. All the while, their code will be neater and more modular, and change management will be more rational.

For now, let's just settle for two OSGi takeaways. Firstly, writing truly modular software is good, and OSGi provides a framework for doing this. Secondly, you don't just deploy applications. You deploy everything. When I say everything, this includes code, content, configuration or combinations of these things together.

If you find these concepts appealing, I urge you to read more about OSGi and consider how its use might impact your life and/or the sanity of the platform you're about to invest in. But think carefully about including OSGi as a CMS RFP requirement -- doing so will dramatically limit your options.

Some good OSGi starting points:

In summary

The success of a standard is, in my humble opinion, measured entirely by how widely it is adopted. When you're out hunting in the Web CMS field, you've got to keep your eye out for the important ones. Long-established standards are easy to spot, but a standard on the verge of fame is easy to miss. There is nothing more depressing than hacking together your own proprietary mess only to discover that something already exists that will do a far better job than you ever did. So, when putting together your solution architecture remember:

  • If you're using a Java API to access a content repository, make sure you think about the JCR as an alternative
  • If you're accessing remote documents over the web, consider using CMIS
  • If you're scared of building a tightly coupled Java monolith, read more about OSGi

The standards are out there. Stand on the shoulders of giants. Use them.

And now we'll close with the wisdom of Bob Dylan:

Oh, what did you see, my blue-eyed son?
Oh, what did you see, my darling young one?
I saw a newborn baby with wild wolves all around it,
I saw a highway of diamonds with nobody on it,
- A HARD RAIN'S A-GONNA FALL