The information world is clamoring for better access to information of all types. What comes out of the delivery pipeline depends on every function across the content life cycle -- design, authoring, identification, management and delivery -- working efficiently, so a significant shortfall in any one function is cause for concern.

Authoring from a Content-Centric Perspective

If there is a single word for where problems usually begin and must be addressed, it is “authoring,” the constellation of people, activities and tools responsible for capturing the rational thoughts of subject matter experts for the growing array of target forms and venues.

From the delivery perspective, everything will work if we can just get -- and afford -- richly tagged content using standard, widely supported schemes with embedded semantic and other types of finding aid data, in an easily searched content store ready for users’ queries.

If your primary interest is the content generated by authoring, it makes sense to concentrate on authoring functions that directly impact the data generated: the human author, the format in which the content is to be captured and the tools used to perform that capture.

First: the human author

Authors may be news reporters, engineers, scientists, historians or what have you, working individually or in groups. They run the gamut of society’s disciplines, bringing with them the particular approaches to their work common in their communities. Except for a minority using WordPerfect and a smattering of other tools, they overwhelmingly create their content using Microsoft Word.

Authors like word processing’s ease of use and flexibility, but most of all they like the absence of discipline on their work: you can write the entire document in the default “Normal” style and still make it look OK. Indeed, getting them to capture their intellectual product in any consistent manner is a challenge from the outset.

In some areas, technical documentation for example, authors are hired to create content in the form desired, using the tools provided and upgrading their skills where necessary to become and stay productive. A challenge in itself, this is orders of magnitude easier than working with subject matter experts who view their role as thinking and writing, not mastering new capture technologies.

So we are well advised to remember that our intended participants won’t necessarily be cheering us on to change their working lives.

Second: the format in which the content is to be captured

Today, the most important element of our subject is the content moving from creator to consumer. Change the people; change the tools; change the vendors or delivery medium: it still works if somehow the desired content can be created and moved through the system. But change the content in a major way and things may grind to a halt until the downstream functions can be reworked to handle the new formats.

XML to the Rescue?

Today’s odds-on favorite for life cycle effectiveness is content marked up in XML and its various associated protocols. Properly deployed, XML content can carry a wealth of information about the subject, its source, its logical organization and how it can best be located and delivered. No other lexicon is likely to fully meet this set of requirements, objections from various technical communities notwithstanding.
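As a rough illustration of what “carrying a wealth of information” can mean in practice, the sketch below parses a small XML fragment with Python’s standard library and pulls out its source, subject, finding-aid keywords and logical structure. The element and attribute names (article, metadata, subject, keyword, source) are invented for this example and are not drawn from any particular industry standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical article markup: names are illustrative, not from a real schema.
article_xml = """
<article id="a-001" source="field-service-group">
  <metadata>
    <subject>pump maintenance</subject>
    <keyword>impeller</keyword>
    <keyword>seal replacement</keyword>
  </metadata>
  <section>
    <title>Replacing the seal</title>
    <para>Drain the housing before removing the cover.</para>
  </section>
</article>
""".strip()

article = ET.fromstring(article_xml)

# Where the content came from (an attribute on the root element).
source = article.get("source")

# What the content is about, plus keywords that serve as finding aids.
subject = article.findtext("metadata/subject")
keywords = [k.text for k in article.findall("metadata/keyword")]

# The logical organization: every title, at any depth, in document order.
titles = [t.text for t in article.iter("title")]

print(source, subject, keywords, titles)
```

Because the subject, source and structure are all recorded in the markup itself, a delivery system can answer each of these questions with a simple query rather than by guessing from formatting.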

So, we might ask: “In a Microsoft Word-based world, how can we get the XML content we need?” XML authoring works best when the industry involved develops and validates content standards, which industry vendors then support, giving adopters a wide range of available tools and services. This has worked in a number of major industries, and, even with only partial implementation, it can work in today’s information delivery world.

It will, however, require imposition of an increased level of discipline on authors and on the authoring process.

Finally: the tools

If authors, especially subject matter experts, typically do their work in MS Word, might we look to Word for some sense of the level of content creation available to us and what we must do to produce final deliverables?

Microsoft and MS Word: Boon or Bane?

Word, despite its high level of flexibility and functionality for its users, uses a tightly controlled internal data model based on linear objects it calls “paragraphs,” defined as content between hard returns -- tables of course are a different matter and are handled with unique formats.

This is important because much of today’s content destined for web delivery is hierarchical in nature: elements are nested within each other so that each acts as the child of its parent and recognizes siblings with the same parentage. While XML is designed to record and leverage this hierarchical nature, Word’s underlying data model neither records it easily nor imposes rules for what may or must be nested within what.
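The contrast can be sketched in a few lines of Python. The first structure below imitates a word processor’s view of a document: a flat run of paragraphs, each carrying only a style name. The second is the same content as nested XML, together with a small rule table stating what may be nested within what. The style names, element names and rules are all assumptions made for the illustration; a real project would typically express such rules in a DTD or schema rather than a Python dictionary.

```python
import xml.etree.ElementTree as ET

# Word-like view: a flat sequence of paragraphs tagged only with a style name.
# Nothing here says that "Wiring" belongs inside "Installation".
flat_paragraphs = [
    ("Heading 1", "Installation"),
    ("Normal", "Unpack the unit."),
    ("Heading 2", "Wiring"),
    ("Normal", "Connect the ground lead first."),
]

# XML view: the same content with parent/child relationships recorded.
xml_source = """
<chapter>
  <title>Installation</title>
  <para>Unpack the unit.</para>
  <section>
    <title>Wiring</title>
    <para>Connect the ground lead first.</para>
  </section>
</chapter>
""".strip()

# Hypothetical nesting rules: which child elements each parent may contain.
ALLOWED_CHILDREN = {
    "chapter": {"title", "para", "section"},
    "section": {"title", "para", "section"},
    "title": set(),
    "para": set(),
}

def nesting_errors(element):
    """Return 'parent/child' strings for every nesting that breaks the rules."""
    errors = []
    for child in element:
        if child.tag not in ALLOWED_CHILDREN.get(element.tag, set()):
            errors.append(f"{element.tag}/{child.tag}")
        errors.extend(nesting_errors(child))
    return errors

good = ET.fromstring(xml_source)
bad = ET.fromstring("<chapter><para><section/></para></chapter>")

print(nesting_errors(good))  # the sample above follows the rules
print(nesting_errors(bad))   # a section inside a para is reported
```

The flat list can be styled to look exactly like the nested version on the page, but only the nested version gives software something to check.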

Word and XML: A Tortured Relationship

Accordingly, as the web delivery world demands increasingly nested content, systems based on Word have difficulty mapping between initial input files and needed deliverable formats. Indeed, an entire sub-industry has grown up to provide tools and services to make the necessary translations, mostly from Word to XML. The development and adoption of the Office Open XML standard -- interestingly, by Microsoft -- has made the intersection between word processing and XML somewhat more transparent, but it hasn't done much for hierarchical content.
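To give a sense of what such a translation involves, here is a minimal sketch that infers nesting from heading styles, assuming authors have applied “Heading 1”, “Heading 2” and so on consistently. The input format (a list of style/text pairs), the function name and the output element names are all hypothetical; commercial conversion tools handle far more cases, and this is not a description of how any of them work.

```python
import xml.etree.ElementTree as ET

# Assumed mapping from paragraph style names to heading depth.
HEADING_LEVELS = {"Heading 1": 1, "Heading 2": 2, "Heading 3": 3}

def paragraphs_to_xml(paragraphs):
    """Build a nested <document> tree from flat (style, text) paragraphs."""
    root = ET.Element("document")
    stack = [(0, root)]  # open containers as (heading level, element)
    for style, text in paragraphs:
        level = HEADING_LEVELS.get(style)
        if level is None:
            # Body text goes into whichever section is currently open.
            ET.SubElement(stack[-1][1], "para").text = text
            continue
        # Close any open sections at the same or a deeper level.
        while stack[-1][0] >= level:
            stack.pop()
        section = ET.SubElement(stack[-1][1], "section")
        ET.SubElement(section, "title").text = text
        stack.append((level, section))
    return root

sample = [
    ("Heading 1", "Installation"),
    ("Normal", "Unpack the unit."),
    ("Heading 2", "Wiring"),
    ("Normal", "Connect the ground lead first."),
    ("Heading 1", "Operation"),
    ("Normal", "Press start."),
]

tree = paragraphs_to_xml(sample)
print(ET.tostring(tree, encoding="unicode"))
```

The sketch only works because the sample uses heading styles consistently. A document written entirely in the default style gives a converter nothing to infer structure from, which is why the discipline imposed on authors matters as much as the tool.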