Inside the Crater

Deep content is enabling the next wave of digital transformation for core business processes. It is about storing and managing business information in a way that is both human-readable and computer-ready. It is the foundation for a whole range of new applications and improved business processes in the organization.

A fundamental change is required for the enterprise content management (ECM) industry to truly achieve deep content. We need to rethink content modeling – the language used to construct the core model, to define and manipulate the applications holding your content. This language directly limits what you can express and store and therefore limits your ability to get the most from your content.

Today's content, mainly stored in legacy ECM systems, is typically modeled with a "file-centric" approach: Files are at the center of the design, then metadata is added, and finally relationships (especially containment) are placed on top of that. This strongly orients the content around a fixed model and limits the expressiveness and readiness of the content, not only for humans but, more importantly, for software. 

Deep content is about reversing that, putting (meta)data first, then crafting your content model based on your business domain. This gives you content objects that speak your business language and puts files where they belong – inside these content objects.

Most of the technology world has become data-first, and now it's time for the content industry to join in. But first we have a few hurdles to overcome.

Core Issues With Legacy Content Modeling 

The file-centric approach, inherited from file systems, carries three fatal flaws:

  • One file and only one file per content object means you can't manage files together when they really belong to the same business object. For example, if a marketer wants to test variants of a banner ad, it makes more sense to have one banner ad object with multiple files, one per variation, than to have multiple banner ad objects, one for each file.
  • An object can't be both a file and a folder (i.e., content objects can't contain other content objects). Traditionally, an object is either a "file" or a "folder." Besides seriously limiting the expressiveness of the hierarchy, this also makes containment the most important relationship.
  • Properties (metadata) are flat (not nested). For example, you can't easily model a list of addresses as part of an object, or define an address data type once and reuse it across your objects.
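
To make the three limitations concrete, here is a minimal Python sketch of a file-centric model. The class names (LegacyFile, LegacyFolder), the field names, and the sample values are hypothetical illustrations, not the schema of any particular ECM product.

```python
from dataclasses import dataclass, field


@dataclass
class LegacyFile:
    """Hypothetical file-centric object: exactly one file, flat metadata."""
    name: str
    blob: bytes  # one and only one file per object
    metadata: dict[str, str] = field(default_factory=dict)  # flat strings only


@dataclass
class LegacyFolder:
    """Hypothetical folder: it contains objects but carries no file itself."""
    name: str
    children: list["LegacyFile | LegacyFolder"] = field(default_factory=list)


# Testing three banner variants forces three separate objects, grouped
# only by the folder that happens to contain them.
campaign = LegacyFolder("spring-banner")
for variant in ("a", "b", "c"):
    campaign.children.append(
        LegacyFile(
            name=f"banner-{variant}.png",
            blob=b"",  # placeholder for the image bytes
            # A structured address has to be flattened into ad hoc keys.
            metadata={"advertiser_address_1_city": "Paris"},
        )
    )
```

In this sketch the business object (the banner ad) never appears as such: it exists only implicitly, as a folder name plus a naming convention on the files.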

Of course, there is a reason many long-established systems still have these limitations: Removing them is hard to implement at scale.

As a side note, you might have noticed that these three issues are also present in the CMIS (Content Management Interoperability Services) protocol. That underlines the point: CMIS carries these three limitations because most of the leading legacy ECM systems have them, and CMIS is attempting to support that legacy base.

Moving Into the Data-First World

Ubiquitous web and mobile apps enable the user to create and structure data easily and naturally. At the same time, big data systems are generating a ton of insights into processes. Those insights become structured information that the user needs to use and exploit for it to be valuable. Metadata makes the content usable by other systems and devices. It is now at the core of the information; it has become content. Files now become assets that support content.

Our industry can only enable proper application building by putting data first. We need to start from the business domain, model objects after it, make information our first priority, and consider a file to be a simple data type. First you must identify the business objects you use and manipulate.

Next, identify your data types (addresses, opening times, IP rights, etc.), and factor them so that you can reuse them on your objects. Having advanced, custom data types, where the data you manipulate every day is inherently structured, is key to advancing your business goals. 

Now identify and define the relationships among your objects, including but not limited to containment, dependencies, etc. As a sanity check, verify that access control and versioning work well with your model. Those are usually the two main factors in important modeling decisions, especially the choice between putting an advanced data type in a property and creating a full content object.
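
The three steps above (business objects, reusable data types, relationships) can be sketched in Python as follows. This is one possible shape under stated assumptions, not a definitive implementation: every name here (FileAsset, Address, Relationship, ContentObject, Advertiser, BannerAd) and every sample value is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FileAsset:
    """A file treated as a simple data type (hypothetical name)."""
    filename: str
    data: bytes


@dataclass(frozen=True)
class Address:
    """Reusable structured data type, defined once and shared by objects."""
    street: str
    city: str
    country: str


@dataclass
class Relationship:
    """Typed link between objects; containment is only one possible kind."""
    kind: str  # e.g. "contains", "depends_on"
    target: "ContentObject"


@dataclass
class ContentObject:
    """Base business object: data first, files and relationships attached."""
    title: str
    files: dict[str, FileAsset] = field(default_factory=dict)
    relationships: list[Relationship] = field(default_factory=list)

    def related(self, kind: str) -> list["ContentObject"]:
        """Return the targets of all relationships of the given kind."""
        return [r.target for r in self.relationships if r.kind == kind]


@dataclass
class Advertiser(ContentObject):
    addresses: list[Address] = field(default_factory=list)  # nested property


@dataclass
class BannerAd(ContentObject):
    pass


acme = Advertiser(title="Acme", addresses=[Address("1 Main St", "Paris", "FR")])
banner = BannerAd(title="Spring banner")
banner.files["variant-a"] = FileAsset("a.png", b"")  # several files, one object
banner.files["variant-b"] = FileAsset("b.png", b"")
banner.relationships.append(Relationship("depends_on", acme))
```

The sketch also illustrates the property-versus-object decision. Address is modeled as a property on the assumption that it never needs its own permissions or version history, so it simply follows its owner. Advertiser is modeled as a full content object on the assumption that it does need its own access control and versioning.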

Using this approach will let you surface the important data, use files where they belong, and manipulate your content as information artifacts (which it is and should be) not as files (which it is not). Seeing the content this way enables focus on the value of the data around this content. 

A Wide Range of Possibilities

In a sense, it's simply a matter of applying the traditional application design pattern to ECM, modeling your content based on your business domain. 

The information expressed with your content model, following the deep content approach, is probably already lying somewhere in your information system, maybe even in several places. Putting this data back with the actual content, modeled into an information artifact, enables content to become a core component of the value of the enterprise, at the core of the information system — be it as a knowledge store, master data repository, brand asset or custom documentation. 

In a sense, that’s the next evolution for big data: Connect human knowledge with massive analysis of data and leverage the outcome for tracking noncompliance and fraud or for gathering insights into trends. Most business applications that deal with content and business processes will benefit from this approach.

Content was always meant to be structured. In fact, many previous attempts to apply new content models (XML databases, visual XML editors, etc.) are here as testimony. The reason it works today is that we have mobile and web apps that enable us to create content in a structured, manageable way. We have firmly entered the data-first world, and ECM, as the core system managing information for workers, must adapt.

Title image: Creative Commons Attribution 2.0 Generic License, by subarcticmike