The information world is clamoring for better access to content of all types. What comes out of the delivery pipeline depends on how efficiently every function across the content life cycle (design, authoring, identification, management and delivery) does its work, so a significant shortfall in any one of them is cause for concern.
Authoring from a Content-Centric Perspective
If there is a single word for where problems usually begin and must be addressed, it is “authoring”: the constellation of people, activities and tools responsible for capturing the thinking of subject matter experts in forms suited to a growing array of target formats and venues.
From the delivery perspective, everything will work if we can just get (and afford) richly tagged content that uses standard, widely supported schemes, carries embedded semantic data and other finding aids, and sits in an easily searched content store ready for users’ queries.
If your primary interest is the content generated by authoring, it makes sense to concentrate on authoring functions that directly impact the data generated: the human author, the format in which the content is to be captured and the tools used to perform that capture.
First: The Human Author
Authors may be news reporters, engineers, scientists, historians or what have you, working individually or in groups. They run the gamut of society’s disciplines and bring with them the approaches to work common in their communities. Apart from a minority using WordPerfect and a smattering of other tools, they overwhelmingly create their content in Microsoft Word.
Authors like word processing’s ease of use and flexibility, but most of all they like the absence of discipline it imposes on their work: you can do the entire document in the default “Normal” style and still make it look OK. Indeed, getting authors to capture their intellectual product in any consistent manner is a challenge from the outset.
In some areas, such as technical documentation, authors are hired to create content in the required form, using the tools provided and upgrading their skills where necessary to become and stay productive. That is a challenge in itself, but it is orders of magnitude easier than working with subject matter experts who see their role as thinking and writing, not mastering new capture technologies.
So we would do well to remember that our intended participants won’t necessarily cheer us on as we change their working lives.
Second: The Format in Which the Content Is to Be Captured
Today, the most important element of our subject is the content moving from creator to consumer. Change the people, the tools, the vendors or the delivery medium, and the system still works as long as the desired content can somehow be created and moved through it. But change the content in a major way, and things may grind to a halt until the downstream functions can be reworked to handle the new formats.
XML to the Rescue?
Today’s odds-on favorite for life cycle effectiveness is content marked up in XML and its various associated standards. Properly deployed, XML content can carry a wealth of information about its subject, its source, its logical organization and how it can best be located and delivered. No other markup approach is likely to meet this full set of requirements, objections from various technical communities notwithstanding.
So we might ask: “In a Microsoft Word-based world, how can we get the XML content we need?” XML authoring works best when the industry involved develops and validates content standards that vendors then support, giving adopters a wide range of tools and support to choose from. This approach has worked in a number of major industries and, even if only partially implemented, it can work in today’s information delivery world.
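To make that claim concrete, here is a minimal Python sketch using only the standard library’s xml.etree.ElementTree. The element names (article, meta, subject, source, keyword, section) and the sample text are hypothetical, chosen for illustration rather than taken from any industry schema. The point is that the same markup that records a document’s logical organization also carries finding-aid data that a delivery system can query directly.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: element names and content are illustrative only,
# not drawn from any real industry content standard.
SAMPLE = """\
<article id="a1">
  <meta>
    <subject>Corrosion control</subject>
    <source>Field engineering group</source>
    <keyword>coatings</keyword>
    <keyword>inspection</keyword>
  </meta>
  <section id="s1">
    <title>Surface preparation</title>
    <para>Remove scale before applying primer.</para>
  </section>
  <section id="s2">
    <title>Inspection intervals</title>
    <para>Inspect coated joints annually.</para>
  </section>
</article>
"""


def describe(xml_text: str) -> dict:
    """Pull subject, source, keywords and section titles out of tagged content."""
    root = ET.fromstring(xml_text)
    return {
        "subject": root.findtext("meta/subject"),
        "source": root.findtext("meta/source"),
        "keywords": [k.text for k in root.findall("meta/keyword")],
        # Logical organization survives as explicit structure, not just formatting.
        "sections": [s.findtext("title") for s in root.findall("section")],
    }


def matches_keyword(xml_text: str, term: str) -> bool:
    """A toy finding aid: does the document declare this keyword?"""
    return term in describe(xml_text)["keywords"]


if __name__ == "__main__":
    info = describe(SAMPLE)
    print(info["subject"])
    print(info["sections"])
    print(matches_keyword(SAMPLE, "inspection"))
```

A real deployment would validate such markup against an agreed schema and index it in a content store. This sketch only shows that the information is there to be found without guessing from visual formatting.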
It will, however, require imposition of an increased level of discipline on authors and on the authoring process.
Finally: The Tools
If authors, especially subject matter experts, typically work in MS Word, might we look to Word itself for a sense of how much structured content we can expect to capture, and of what we must do to produce final deliverables?
Microsoft and MS Word: Boon or Bane?
Despite the flexibility and functionality it offers users, Word uses a tightly controlled internal data model built on linear objects it calls “paragraphs,” defined as the content between hard returns. Tables, of course, are a different matter and are handled with their own formats.
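In the .docx format, Word serializes this model as WordprocessingML: the body of word/document.xml is essentially a flat sequence of w:p (paragraph) elements, each optionally naming a style in w:pPr/w:pStyle, with tables appearing as separate w:tbl elements. The Python sketch below, using only the standard library, walks a small hand-written and heavily simplified body fragment and reports each top-level block with its style. The fragment, its text and the function name are hypothetical; a real document.xml carries much more (section properties, run formatting and so on). A paragraph with no w:pStyle takes the document’s default paragraph style, normally “Normal.”

```python
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by .docx files.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Hypothetical, heavily simplified fragment of a word/document.xml body.
BODY = f"""\
<w:document xmlns:w="{W}">
  <w:body>
    <w:p><w:pPr><w:pStyle w:val="Heading1"/></w:pPr><w:r><w:t>Pump Overview</w:t></w:r></w:p>
    <w:p><w:r><w:t>The pump moves coolant.</w:t></w:r></w:p>
    <w:tbl><w:tr><w:tc><w:p><w:r><w:t>Flow</w:t></w:r></w:p></w:tc></w:tr></w:tbl>
    <w:p><w:r><w:t>Check seals </w:t></w:r><w:r><w:t>monthly.</w:t></w:r></w:p>
  </w:body>
</w:document>
"""


def blocks(xml_text: str) -> list[tuple[str, str, str]]:
    """Return (kind, style, text) for each top-level block in the body."""
    body = ET.fromstring(xml_text).find("w:body", NS)
    result = []
    for child in body:
        # Concatenate every text run inside the block, in document order.
        text = "".join(t.text or "" for t in child.iter(f"{{{W}}}t"))
        if child.tag == f"{{{W}}}p":
            style_el = child.find("w:pPr/w:pStyle", NS)
            # No explicit style means the default paragraph style applies.
            style = style_el.get(f"{{{W}}}val") if style_el is not None else "(default)"
            result.append(("paragraph", style, text))
        elif child.tag == f"{{{W}}}tbl":
            result.append(("table", "-", text))
    return result


if __name__ == "__main__":
    for block in blocks(BODY):
        print(block)
```

Even in this tiny sample, only a style name distinguishes the heading from body text, and a single sentence can be split across several runs. If an author formats everything in the default style, the linear paragraph sequence gives an XML conversion almost no structure to work with.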