The 6th annual Gilbane Boston conference kicked off today with a program so full of interesting sessions that the crowd was trying to find a way to be in two places at once (aside from following the #gilbaneboston Twitter feed). As a precursor to the conference, yesterday was a day full of workshops.
One of them, “Content Migration, the Dirty Little Secret of Content Management,” aimed to investigate the real stumbling blocks of content migrations and shed light on techniques for avoiding them.
Organized by CMPros, the session brought together practitioners and consultants who presented case studies and then discussed content migration issues in a panel. The speakers included:
- Andy Striemer and Chad Miller of Robert Half International, Inc.
- Brad Kain, Quoin on Tactical Planning
- Gerrit Berkouwer, Dutch Ministry of General Affairs/Government Information Service
- Linda Berman
- Andrew Wilcox, CMPros
- Joan Lasselle, Lasselle-Ramsay Inc.
Content Migration Landscape
Dealing with structured and unstructured content can be challenging: “it’s a bit of a beast.” In earlier days, you could probably lock several people in a room and slide pizza under the door to keep them working on a mostly manual migration. Today, tools are becoming more sophisticated, and there are a number of providers that specialize in content migration.
You could still consider hiring temps to do your migration manually but, in many cases, automation can be a better option, especially when you deal with large volumes of data such as 56,000+ HTML pages. Some of the presenters were looking for content migration products that are:
- Quick and easy
- Flexible and easily leveraged
- Backed by reliable customer support
And each seemed to have found an option that worked for their scenarios and requirements.
Extending the CMS
An interesting discussion revolved around organizations that wanted to extend their content management systems to do what they needed without having to buy additional modules. Other needs went beyond the migration itself and included tasks like:
- Bulk upload
- Bulk publishing
- Cache clearing
- Reporting and statistics
When discussing the benefits of using a content migration tool, the presenters mentioned the following pluses:
- Save time (the tools don’t sleep or take weekends off)
- Minimize redundant tasks
- Avoid errors
Tactical Aspects of Content Migration
Depending on the project, you may have either no CMS or 2-3+ CMSs to deal with. In some scenarios, there are large amounts of content to work with. If you’re doing a relatively small migration, it may make sense to hire skilled manual labor, which can be available for US$3-5 per hour. One good piece of advice was to treat migration as an incremental process, with the following steps:
- Understand the source
- Repeat the process (this is the bulk of the work)
- Gradually build on that to fit in your new structure
- Validate each time you go through the process
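To make the incremental idea concrete, here is a minimal sketch of a single migration pass in Python. The names `run_pass`, `transforms` and `is_valid` are hypothetical and were not presented at the workshop; the assumption is only that each page is a string, each cleanup rule is a function from string to string, and validation is a yes/no check.

```python
def run_pass(pages, transforms, is_valid):
    """Apply every transform to every page, then validate the result.

    pages:      dict mapping a source path to its raw content (assumed shape)
    transforms: list of functions str -> str, applied in order
    is_valid:   function str -> bool deciding whether a page is ready
    Returns (migrated, failures): the pages that passed validation and
    the paths of the pages that still need work.
    """
    migrated, failures = {}, []
    for path, body in pages.items():
        for transform in transforms:
            body = transform(body)
        if is_valid(body):
            migrated[path] = body
        else:
            failures.append(path)
    return migrated, failures
```

Each iteration would then look at `failures`, add or adjust a transform, and run the pass again until the list is short enough to fix by hand.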
At the start of a content migration, you’ll be lucky if you have well-formed XHTML. When extracting existing content and processing it into an intermediate format like XML, well-formed initial data will be a huge help.
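A quick way to find out how lucky you are is to run the source pages through an XML parser. The sketch below uses Python’s standard `xml.etree.ElementTree`; the helper name `is_well_formed` is made up for illustration, and the check covers well-formedness only, not validity against an XHTML DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True
```

One caveat: this parser also rejects named HTML entities such as `&nbsp;` unless they are declared, so pages that use them would be reported as not well-formed and may need a cleanup step first.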
Content Migration Challenges
The presenters all seemed to agree that content migration is hard work: not extraordinarily challenging, but full of details, which makes these projects very front-loaded. Another challenge is the internal links that already exist in your current CMS, a devilish mix of relative and absolute links that almost always need processing. On top of that, the target of a link may not have an identifier until its content is imported.
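One common way to handle both problems is a two-pass approach: first reduce every internal link to a canonical path, then rewrite the links once the import has assigned identifiers. The following Python sketch illustrates the idea and is not something the presenters showed. The function names, the `site_host` parameter and the shape of the `new_ids` mapping are all assumptions.

```python
from urllib.parse import urljoin, urlsplit


def canonical_internal_path(page_url, href, site_host):
    """Resolve href against the page it appears on.

    Returns the absolute path for links that stay on site_host,
    or None for external and non-HTTP links (mailto: and so on).
    """
    parts = urlsplit(urljoin(page_url, href))
    if parts.scheme not in ("http", "https") or parts.hostname != site_host:
        return None
    return parts.path or "/"


def rewrite_links(links_by_page, new_ids):
    """Second pass, run after import, when target identifiers exist.

    links_by_page: dict mapping a page path to its canonical link paths
    new_ids:       dict mapping an old path to the ID assigned on import
    Returns (resolved, unresolved) so broken targets can be reviewed.
    """
    resolved, unresolved = {}, []
    for page, paths in links_by_page.items():
        resolved[page] = []
        for path in paths:
            if path in new_ids:
                resolved[page].append(new_ids[path])
            else:
                unresolved.append((page, path))
    return resolved, unresolved
```

Keeping the unresolved list separate, instead of silently dropping those links, gives you a report of targets that were never migrated.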
Content Migration Tips
The presenters shared a number of tips from real-life scenarios, including:
- You really have to know your content and your information types (e.g., editorial content, formal documents, binaries)
- Know where you’re coming from and where you’re going to
- A content migration always takes longer than you think
- It is almost always harder than you think
- You should expect “stuff” to happen
- Go iteratively in Agile fashion
- A certain degree of content cleansing needs to happen, and it is advisable to do it upfront
If you don’t know your own content, you may as well outsource the project. Outsourcing is a viable option for projects that involve fewer than 10,000 pages. With a lot of questions about how to come up with accurate time and effort estimates, scope was identified as a crucial factor. One suggestion on scope was to go up one unit after you’ve doubled three times.