XML and the content imperative

Where do most automation efforts begin, or get to almost immediately? With the technology, sometimes even choosing the products before they start. Should they?

Probably not, but they do anyway.

Projects that start with technology often end up with technology that is unmatched to their needs, costs more than they can really afford, and often fails to improve things or makes them worse.

Is there a better way to approach automation? Experience suggests that there is: following the content imperative.

What is the Content Imperative?

For most applications, content is the only enduring value created in the application lifecycle. While not always the case -- process-centric credit card authorization transactions are an exception -- the Internet and Web are making content the major value-added in automation. Even in workflow, it’s the content that flows, not the work which is performed on it during the application lifecycle.

From that realization grows an assumption that content is the key value component, and if it is right, a wide range of technology can be successful. If it is not, no amount of technology will generate a truly workable environment.

Content as the Repository of Value

To find and follow the content imperative, you must understand what content means to your organization and efforts. Though not always the case, the majority of content in today’s world is logically complex, intended for diverse audiences, and usable at multiple levels via components nested within it.

This is not, unfortunately, how most organizations view content and is certainly not how most software vendors would like you to view it. Their view: dumb content means smarter and more expensive technology.

So the first part of the content imperative is to understand what you have or know that others want, demand and will pay for. Only then can you begin the process of figuring out the best way to create it.

Notation ain't Design

Once you have identified your intellectual value, you must select a notation to record it: my suggestion is XML and its related protocols. XML, if not the latest gee-whiz protocol, is by far the most mature, flexible and easily processed data recording approach available, its subtle disparagement by the database community notwithstanding. People with a long history in XML and its precursor SGML -- including me, I admit -- tend to view the decision to adopt XML as a given, but it is more difficult than it may seem, and this view has not always been conventional wisdom.

Many organizations begin with their final deliverable on the web, see the HTML with its <angle bracket notation> and congratulate themselves on using XML. Not quite, for a number of reasons.

Look at a file of content in HTML (or XHTML: essentially XML syntax-compliant HTML) and you see a well-formed structure with < > tags, names and attributes. What could be more useful than that? The answer is “a lot.” The HTML tag set, even with the addition of “cascading style sheets” and other ancillary metadata, is designed to record how a browser should display raw content on a screen, caring little about the underlying information value or other delivery modes.

While valuable in a browser setting, this form of notation ignores the majority of value in content, and by ignoring it makes it unavailable. If HTML or XHTML is your guide, you will end up leaving a great deal of value on the table.
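
To make the difference concrete, here is a minimal sketch in Python using only the standard library. The sample sentence, the element names (`task`, `warning`, `part`, `interval`) and the helper functions are all invented for illustration; they do not come from any of the standards discussed in this column.

```python
import xml.etree.ElementTree as ET

# Presentational markup: records how to draw the text, not what it means.
XHTML = """<div>
  <p><b>Warning:</b> Replace filter <i>FX-200</i> every 500 hours.</p>
</div>"""

# Hypothetical semantic markup for the same sentence (element names invented).
SEMANTIC = """<task>
  <warning>
    Replace <part number="FX-200">filter</part>
    every <interval unit="hours">500</interval>.
  </warning>
</task>"""


def part_numbers(xml_text):
    """Return every part number that the markup explicitly records."""
    root = ET.fromstring(xml_text)
    return [part.get("number") for part in root.iter("part")]


def intervals(xml_text):
    """Return (value, unit) pairs for every recorded service interval."""
    root = ET.fromstring(xml_text)
    return [(int(i.text), i.get("unit")) for i in root.iter("interval")]


print(part_numbers(XHTML))     # nothing to find: the italics carry no meaning
print(part_numbers(SEMANTIC))  # the part number is addressable data
print(intervals(SEMANTIC))
```

Both files are well-formed, but only the second lets software ask "which part, how often?" without guessing from bold and italics.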

Designing your Intellectual Value

So the first thing you must do as you adopt the content imperative is to look at your intellectual property with an eye toward what you have that can and should be captured in content and, once captured, all of the contexts and uses to which it should be put.

There are plenty of XML content design examples: Congress’s recently issued USLM for legislative content, DocBook for technical documentation, S1000D and ATA for maintenance, DITA’s six DTDs, NLM for medical and scientific documents and a host of others, each designed to capture a high percentage of intellectual value and make it available for use in all of the contexts for which it is or may be intended.

If you don’t have someone on staff who can do this, contract for the skills. It isn't highly technical but it ain't beanbag either, and it’s no game for amateurs ... or software vendors with an agenda.

It’s likely that you will find one or more standard sets of XML schemas and DTDs, all or a portion of which may fit your needs. As you move through this phase, remember that, starting with well-designed XML, you can transform your content into just about any form or product you wish, usually with free or low-cost tools. This reality frees you from dependence on software vendors who claim that only their tools can solve your problems and from the fear that you must deal with a single vendor to guarantee effective flow of content through your organization.
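
As a rough illustration of that "one source, many products" claim, the sketch below turns a single small XML document into both an HTML fragment and a plain-text rendition. In practice this job is usually given to XSLT; Python's standard library stands in here only to keep the example self-contained. The `article`, `title` and `para` element names and both functions are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# A hypothetical source document; the element names are invented.
SOURCE = """<article>
  <title>Filter maintenance</title>
  <para>Replace the filter every 500 hours.</para>
  <para>Record the change in the log.</para>
</article>"""


def to_html(root):
    """Render the article as an HTML body fragment."""
    body = ET.Element("body")
    ET.SubElement(body, "h1").text = root.findtext("title")
    for para in root.iter("para"):
        ET.SubElement(body, "p").text = para.text
    return ET.tostring(body, encoding="unicode")


def to_text(root):
    """Render the same article as plain text with an upper-case title."""
    lines = [root.findtext("title").upper(), ""]
    lines += [para.text for para in root.iter("para")]
    return "\n".join(lines)


article = ET.fromstring(SOURCE)
print(to_html(article))
print(to_text(article))
```

Neither output format is baked into the source, so adding a third product later means adding a function, not re-keying the content.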

With XML, all you need are vendors and tools that support the standard, and you can mix and match.

Look Closely at your Content Sources

When you have made a tentative decision about what content design best fits your information needs -- XML preferably -- you will need to look at how you get the content you want with the resources you have available. Unfortunately, content creation has been focused on word processing for so long that old habits, especially when favored by Microsoft and its ilk, die hard. You have, however, a number of avenues to get what you want, detailed in an earlier column, and while you may not fully get to native creation of the forms you select, you can get close enough to do the job.

Your best avenue will probably be “discipline and convert.” If your raw content is in a word processing form (Word, WordPerfect, etc.), you may not be immediately able to convince your authoring community to create XML from the outset. You can, however, work with your creators to jointly impose a higher degree of discipline on what they do, using WP templates, enhanced training, even WP applets running during capture to help with consistency. This takes some work, but it can be done, to both the authors’ and your benefit in the long run.

In a major defense intelligence facility, for example, the authoring community -- under the extreme pressures of a post-9/11 world -- balked at use of native XML authoring tools but accepted discipline plus templates plus training plus applets for complex forms. The resulting WP files are then converted to the DIA-mandated XML forms via software.
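
The conversion half of "discipline and convert" can be pictured with a small sketch. It assumes the word-processing file has already been read into (style name, text) pairs, which is the part that depends on your WP format and is not shown. The style names, element names and the `convert` function are hypothetical, and this is not a description of the software used at the facility mentioned above.

```python
import xml.etree.ElementTree as ET

# Hypothetical template style names mapped to hypothetical XML element names.
STYLE_TO_ELEMENT = {
    "Report Title": "title",
    "Summary": "summary",
    "Body Text": "para",
}


def convert(paragraphs):
    """Turn (style, text) pairs into XML and collect paragraphs with unknown styles."""
    report = ET.Element("report")
    rejected = []
    for style, text in paragraphs:
        element_name = STYLE_TO_ELEMENT.get(style)
        if element_name is None:
            # Off-template styling: send it back to the author instead of guessing.
            rejected.append((style, text))
            continue
        ET.SubElement(report, element_name).text = text
    return ET.tostring(report, encoding="unicode"), rejected


xml_out, rejected = convert([
    ("Report Title", "Weekly Status"),
    ("Body Text", "All systems nominal."),
    ("Normal", "Ad hoc note"),
])
print(xml_out)
print(rejected)
```

The point of the rejected list is the "discipline" part: the more consistently authors stay inside the template, the less the converter has to refuse.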

Likewise, the commercial aircraft and airline industries, facing stiff worldwide accident liability issues, began a move toward standardized maintenance, test and operation content decades ago, adopting SGML and XML as each became available.

The message here is that creating richly designed XML content, while no simple task, is achievable in the real world we all inhabit.

Manage What You have Created

Once you have engineered the creation of XML content, you must manage it for use. There is no shortage of vendors ready to sell you their stuff (you could develop it yourself, but no one much does that any more).

As you contemplate how to proceed, consider that content management systems, the software that keeps track of your content, people and tasks, tend to fall into three categories, with the proviso that for some environments, an acceptable degree of management can be achieved without a formal CMS:

  1. The Database Content Management Systems that exclusively use an underlying RDBMS to keep track of things. There are a number of these, some quite good, but they all suffer from the fact that hierarchical content (XML) doesn't work well with relational tables and fields. These systems work around this problem, enabling them to handle XML, but at the cost of more complex software and somewhat less capable operation with richly tagged content.
  2. The native XML Database Systems (NXD) that handle the hierarchical nature of XML out of the box, making its entire logical structure available without fragmentation. There are only a few of these: MarkLogic, BaseX, eXist and Sedna, for example, but they deserve a close look if you are dealing with logically complex content and the need for a wide range of output products and uses.
  3. Hybrid Database/native XML Systems that blend the features of 1 and 2 above. There are a few of these; EMC Documentum and SDL Contenta come to mind, offering some of the advantages of each type. They typically, however, manage the core content in their relational database, with limited additional functionality available via their native XML components. They are probably the best choice for organizations that already use one of the relational systems.
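
The mismatch between hierarchical content and relational tables is easier to see in miniature. The sketch below is a toy under stated assumptions, not how any product named above works. It "shreds" a small invented document into flat rows, then answers the same question twice: once from the rows and once from the tree. The document, the row layout and the function names are all hypothetical.

```python
import xml.etree.ElementTree as ET

# A hypothetical manual; element names and ids are invented.
DOC = """<manual>
  <chapter id="c1">
    <title>Engine</title>
    <task id="t1"><title>Replace filter</title></task>
  </chapter>
</manual>"""


def shred(root):
    """Flatten a tree into (row_id, parent_id, tag, text) rows, relational style."""
    rows = []

    def visit(element, parent_id):
        row_id = len(rows) + 1
        text = (element.text or "").strip()
        rows.append((row_id, parent_id, element.tag, text))
        for child in element:
            visit(child, row_id)

    visit(root, None)
    return rows


def task_titles_from_rows(rows):
    """Find the title of each task from flat rows (the equivalent of a self-join)."""
    task_ids = {row_id for row_id, _, tag, _ in rows if tag == "task"}
    return [text for _, parent_id, tag, text in rows
            if tag == "title" and parent_id in task_ids]


root = ET.fromstring(DOC)
rows = shred(root)
print(rows)
print(task_titles_from_rows(rows))                        # from the rows
print([t.text for t in root.findall(".//task/title")])   # from the tree
```

Both routes give the same answer, but the row version has to rebuild the parent-child relationship that the tree version simply walks. That extra work grows with every level of nesting.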

In a world constantly offering technology as savior, keeping your mind on your content can require a near Odyssean lashing to the mast. But you can do it, and those who do, if they persevere, more likely than not find a path to success.

The Defense Intelligence Agency (DIA) took that critical first step when its director, Vice Admiral Lowell Jacoby, addressing Congress in 2002, said "We must move toward a common data framework and set of standards that will allow interoperability at the data, not system, level." That decision to focus on content, unfettered by mandates about technology or systems architecture, allowed DIA to find and successfully follow its content imperative, as the Library of National Intelligence, opened in 2007, will attest.

If an organization of the DIA's size and complexity and the entire air transport industry can do it, chances are you can too.

Title image courtesy of Sergey Nivens (Shutterstock)

Editor's Note: Be sure to read Barry's previous article on XML, The Battle for Data Supremacy: The Cost of Ignoring XML