In last month's article (The Semantics of Content Management: What We Mean and How We Say It), I discussed how we often trip over our collective tongues with our use of language and terminology when discussing content management and related technologies within our enterprises. This month, I thought I would address one of those terms that could possibly cause some confusion: content analytics.

Content analytics is not really a new term. I recall attending a very interesting content analytics session at a business intelligence conference in the UK back in 2007. At this year's Info360 conference in Washington D.C., there were a number of sessions addressing this area. So while content analytics might not be new, it is perhaps increasingly fashionable. I suppose the first question must be -- what is it?

Content Analytics Defined

I will provide you with that age-old consultant answer to any question: "Well, it depends..." That, of course, is only marginally better than responding with "What do you want it to be?"

Content analytics can be a broad church, with many different types of believers. It can span a broad panoply of content management related technologies; indeed, last year's AIIM Industry Watch report "Content Analytics -- research tools for unstructured content and rich media" mentions all of the following:

  • Web analytics
  • Digital asset management
  • Faceted search tools
  • e-Discovery tools
  • Content de-duplication tools
  • Content assessment
  • Metadata tagging
  • Text analytics
  • Social media monitoring
  • Digital forensics
  • Sentiment analysis

A widely flung net indeed!

As you can tell from the title of the AIIM report, it examines content analytics from the perspective of tools that can analyze your content from a research perspective. It is often suggested that content analytics is bringing together content management, business intelligence (BI) and search technologies; but to achieve what end?

What Content Do You Want to Analyze -- And How?

The aim of BI is normally seen as the discovery of trends: both historic trends and, with suitable analysis and extrapolation, the future options and possibilities those trends suggest. It is seen as providing actionable information to inform decision making.

BI is firmly embedded in the world of structured data, relational database systems and data warehouses. So we might extrapolate from this that content analytics is about doing the same for our massive and ever-increasing stores of unstructured data: analyzing the "internals" of our content items, for example by using sophisticated text analytics on our burgeoning document stores, in order to discover new insights.

This to me is the research-oriented view, and I think such efforts are going to be difficult to achieve, and their outcomes difficult to measure. In this context, at an Info360 keynote, IBM noted that their particular content analytics solution includes advanced natural language processing technology developed for Watson, the game-show-winning supercomputer! If you fancy a rest from reading text, check out this IBM video on YouTube.

Analyzing the Use of Content

Another view of content analytics is one that is more akin in my mind to web analytics. It's about analyzing how people are using or interacting with content. As Alan Pelz-Sharpe of the Real Story Group mentioned in his content analytics session at Info360, this is already achieved to some extent in the marriage of BI "reporting dashboard" technologies with business process management or workflow technologies.

Managers can consult a dashboard, updated in real time, showing how many items are at each stage of their workflows, how many exceptions there have been, what the statuses of those exceptions are, and so on. Is this, in effect, a Management Information System (MIS) for content management?

It is widely accepted that the amount of unstructured content generated in our businesses is growing very rapidly. Should information management professionals be looking for real time or near real time dashboards, as well as the non-real time, periodic reporting that we often associate with data warehouses, in order to develop their own insights into the trends developing for content creation and use within their organizations?

Most intranet managers would be able to tell you how many hits the corporate homepage has over a month. But in how many organizations can information managers show the CIO how many new documents were created and/or uploaded into the document management system; how many content items were put under retention; how many records were disposed of; how many new workspaces were created; and how many blog comments, forum postings or microblog status updates were created in the last month (and how many people actually subscribed to or read them)?

Most CMSs generate large amounts of metadata, such as file name, content item type, creator name and creation date, without the audit logs being turned on. If you want to see how many people actually accessed a document, then you may need to configure additional audit logging, which will use up more disk space. Most CMSs also use an RDBMS to hold this metadata, so can we not re-purpose the BI tools we already have licenses for to undertake this task?
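
To make that idea a little more concrete, here is a minimal sketch of the kind of question a BI tool could ask of CMS metadata held in a relational table. Everything in it is an assumption for illustration: the `content_items` table, its column names and the sample rows are hypothetical, and an in-memory SQLite database stands in for whatever RDBMS your CMS actually uses. Real schemas will differ from product to product.

```python
import sqlite3

# Hypothetical schema: real CMS metadata tables will be named and shaped differently.
SCHEMA = """
CREATE TABLE content_items (
    name TEXT, content_type TEXT, creator TEXT, created_at TEXT
)
"""

# Made-up sample rows, purely for illustration.
SAMPLE_ROWS = [
    ("plan.docx", "document", "alice", "2011-03-02"),
    ("budget.xlsx", "spreadsheet", "bob", "2011-03-15"),
    ("notes.docx", "document", "carol", "2011-03-20"),
    ("review.docx", "document", "alice", "2011-04-01"),
]


def build_demo_db():
    """Create an in-memory database holding the sample metadata."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO content_items VALUES (?, ?, ?, ?)", SAMPLE_ROWS)
    return conn


def monthly_counts(conn):
    """Return (month, content_type, count) rows, ordered by month then type."""
    query = """
        SELECT substr(created_at, 1, 7) AS month, content_type, COUNT(*) AS n
        FROM content_items
        GROUP BY month, content_type
        ORDER BY month, content_type
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in monthly_counts(build_demo_db()):
        print(row)
```

The point is less the code than the shape of the query: a simple count grouped by month and content type is exactly the sort of aggregate a reporting dashboard is built to display.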


Of course this all depends on your business and what the potential positive outcomes might be. It may be that your organization is far more worried about the ERP reports that show how many widgets its factories churn out. On the other hand, it might be very useful to know that across the globe, 33 Word documents -- using the same template, with exactly the same title, and all linked to the same (or similar) projects -- were created in the same week, and that only two people have read one of them since.
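
Spotting that kind of cluster need not be complicated. The sketch below groups content items by title, template and ISO week of creation, and reports any group above a chosen size. The dictionary keys (`title`, `template`, `created`) and the function name are hypothetical, and the project links and read counts mentioned above are left out; those would need audit or relationship data that varies by system.

```python
from collections import defaultdict
from datetime import date


def find_duplicate_clusters(items, min_size=2):
    """Group items by (title, template, ISO year, ISO week).

    Returns only the groups containing at least `min_size` items.
    Each item is assumed to be a dict with hypothetical keys
    'title', 'template' and 'created' (a datetime.date).
    """
    groups = defaultdict(list)
    for item in items:
        iso = item["created"].isocalendar()  # (ISO year, ISO week, weekday)
        key = (item["title"], item["template"], iso[0], iso[1])
        groups[key].append(item)
    return {key: found for key, found in groups.items() if len(found) >= min_size}


if __name__ == "__main__":
    # Made-up sample data, purely for illustration.
    sample = [
        {"title": "Project Plan", "template": "plan.dotx", "created": date(2011, 5, 2)},
        {"title": "Project Plan", "template": "plan.dotx", "created": date(2011, 5, 4)},
        {"title": "Project Plan", "template": "plan.dotx", "created": date(2011, 5, 9)},
        {"title": "Budget", "template": "plan.dotx", "created": date(2011, 5, 3)},
    ]
    for key, found in find_duplicate_clusters(sample).items():
        print(key, len(found))
```

Raising `min_size` filters out coincidental pairs and leaves only the clusters large enough to be worth someone's attention.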

I will conclude by throwing this idea out there for some enterprising software designer to run with: If most content management systems are embracing the CMIS standard, perhaps a suite of lightweight, open source, "content usage analytics" tools might be a good idea? Just asking...

Let us know using the comments feature below what type of content analytics you currently indulge in, or what interests you.