One of the final sessions at Gilbane SF yesterday covered content standards: CMIS, JSR-170 and JSR-283.

Many realize there are several challenges with CMIS in particular, and with efficiently working with content from disparate content repositories in general.

The session aimed to shed light on some of these challenges and on possible solutions in the standards space.

Too Much Content in Too Many Content Repositories

Chances are, if you’re in the enterprise content management space and you have an ECM system, it still doesn’t solve all your ECM problems. There are also document management and digital asset management systems, for example, that you need to be able to “talk to.” Users of one ECM system often need to access and store documents in an entirely different content repository.

Existing content standards are conceptually simple but challenging to implement. At the same time, many realize that content integration standards hold some promise for developing an enterprise-wide content infrastructure.

During the Gilbane SF session on content standards, we looked at challenges and prospects for existing and future standards like JSR-170, CMIS and JSR-283.

Moderated by Larry Hawes, the session featured two implementers who tried to address content integration problems.

CMIS, or Can Content Management Software Keep Pace?

Dick Weisinger, vice president and chief technologist at Formtek, kicked off his presentation with a set of interesting statistics. Weisinger pointed out a fact we’re all very well aware of: content in the digital universe is exploding.

According to one study, the amount of worldwide content was estimated as follows:

  • 2003 -- 20 exabytes
  • 2008 -- 486 exabytes
  • 2010 -- 988 exabytes
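To put those figures in perspective, the implied yearly growth can be worked out in a few lines of Python. This is only a back-of-the-envelope sketch: the function name annual_growth is our own, and the inputs are simply the three estimates listed above.

```python
def annual_growth(start, end, years):
    """Compound annual growth rate between two estimates, as a fraction."""
    return (end / start) ** (1 / years) - 1


# Estimates quoted in the session, in exabytes.
estimates = {2003: 20, 2008: 486, 2010: 988}

# Growth between each pair of consecutive estimates.
years = sorted(estimates)
for first, last in zip(years, years[1:]):
    rate = annual_growth(estimates[first], estimates[last], last - first)
    print(f"{first}-{last}: about {rate:.0%} per year")

# Growth across the whole period.
overall = annual_growth(estimates[2003], estimates[2010], 2010 - 2003)
print(f"2003-2010: about {overall:.0%} per year")
```

On these numbers, content volume grows roughly 49-fold in seven years.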

At the same time, hardware costs are shrinking. The question posed by Weisinger was whether software can keep pace in this ever-changing industry landscape.

Moving from content management to related areas, the speaker discussed how search algorithms have evolved, yet search is still not very good. What makes search even more challenging is that it is hard to model enterprise usage patterns.

Nevertheless, the enterprise search market is still growing, with many companies doing different things with more or less success. However, current stats from AIIM say that 49% of business users consider finding data a difficult task.

Scattered data repositories only add to the challenge. The majority of companies have an assortment of repositories, be it ERP, PLM, PDM, BI, KM, WCM, or DM systems. The problems we run into with multiple repositories are compliance, eDiscovery and business intelligence.

Add to that the fact that 80% of data is unstructured, and the enterprise world looks very gloomy. Search gets harder as data sets grow: it takes longer to index and, thus, longer to search.

But the goal remains: extract knowledge and distill data.

How do you do that?

  1. Structure your data (using XML, for example)
  2. Centralize multiple repositories and manage efficiencies of scale

Weisinger referred to CMIS as SQL for document management. We should point out, though, that CMIS is perhaps better compared to HTTP for DM, since it’s a protocol, as Eric Barroca, CEO of Nuxeo, who was in the audience, rightly noted.

In the end, the challenge still centers on the massive growth of content and the need for content intelligence.
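To make the SQL analogy concrete: the query language standardized in CMIS 1.0 is modeled on SQL SELECT statements run against types such as cmis:document. The Python sketch below only assembles such a statement as text. The helper name cmis_name_query and the search term are hypothetical, and no repository is contacted.

```python
def cmis_name_query(term):
    """Build a CMIS-style query for documents whose name contains `term`.

    Hypothetical helper for illustration; it returns text and does not
    talk to any repository.
    """
    # Simplification: refuse quotes, backslashes and LIKE wildcards
    # rather than guess at escaping rules.
    if any(ch in term for ch in "'\\%_"):
        raise ValueError("term contains characters this sketch does not escape")
    return (
        "SELECT cmis:objectId, cmis:name "
        "FROM cmis:document "
        f"WHERE cmis:name LIKE '%{term}%'"
    )


print(cmis_name_query("invoice"))
```

The statement reads like SQL, which is the analogy Weisinger drew. How it travels between client and repository is defined by the CMIS protocol bindings, which is the side Barroca emphasized.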

ECM: CMIS or JSR-170/283?

Naresh Devnani, managing director at Lean Management Group, gave us a peek into real-life scenarios and impressions of implementing a wrapper for a standard, from the time when he worked for Vignette PS.

Devnani talked about implementing JSR-170 Level 1 functionality (the standard was led by Day Software) for an RDBMS-based web content management system.

The reality is that most customers have more than one repository. CMIS should really focus on helping customers, not vendors. JCR is independent of the repository logic, while CMIS targets one or more content repositories in order to allow for communication between them.

The conclusion was that JSR-170, with its nodes and properties mapping to folders and files, is not the best fit for WCM. In Devnani’s opinion, it fits document management better.

One of the challenges in implementing JSR-170 is the lack of information. The TCK has not been updated for quite some time now, even after bugs were reported. Jackrabbit is the only active community around the JSR-170 standard.
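The model Devnani described can be pictured with a toy example. In JSR-170, content lives in a tree of nodes, and each node carries named properties. The Python sketch below is not the javax.jcr API: the Node class, its methods and the sample names are assumptions made up for illustration.

```python
class Node:
    """Toy stand-in for a repository node: a name, properties, children."""

    def __init__(self, name):
        self.name = name
        self.properties = {}  # property name -> value
        self.children = {}    # child name -> Node

    def add_node(self, name):
        """Create a child node under this one and return it."""
        child = Node(name)
        self.children[name] = child
        return child

    def get_node(self, path):
        """Resolve a relative path such as 'press/release-1' step by step."""
        node = self
        for part in path.split("/"):
            node = node.children[part]
        return node


# A folder-like node holding a document-like node whose content sits in properties.
root = Node("")
press = root.add_node("press")
release = press.add_node("release-1")
release.properties["title"] = "Sample press release"
release.properties["body"] = "Text of the release goes here."

print(root.get_node("press/release-1").properties["title"])
```

Everything here is hierarchy plus key-value pairs, which suits folders and documents and is arguably why the model feels closer to document management than to WCM.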

Speaking of lessons learned, Devnani mentioned:

  • Lack of ease in implementing a contained 1-n parent-child relationship
  • Inefficient reference model in certain cases
  • Node types not useful for WCM object wrapper
  • No multidimensional view of repositories
  • Big ramp-up

Awaiting JSR-283

It’s been close to three years since the JSR-283 early draft. Since then, a new set of APIs has been added, but the basic architecture is still the same, and it looks to be little more than a fine-tuned version of the older standard.

High Hopes for CMIS

Devnani said: “When CMIS first was announced I was excited because of the vendor names associated with it.” CMIS should focus on collaboration between content authors, mashups and portals.

One example from the session was rather surprising: according to Devnani, some customers think of interoperability as content migration, that is, moving things from one repository to another.

In the end, Devnani reminded us all of the huge impact of social media on the industry. Personalization is again at the forefront. Applications are becoming more complex, while content standards may not necessarily be keeping up. Some vendors, however, do manage to keep up. While every enterprise player is looking into federated search nowadays, CMIS is complementary to search, as search alone has limited value.