Deciding that you need a content management system is a no-brainer for online publishers. And given the veritable financial bloodbath happening in the publishing industry -- thanks to the drop in advertising revenue -- it shouldn't come as a surprise that the idea of a free, open source CMS holds a great deal of appeal. However, in the case of The Economist, it didn't start out this way.

There's Legacy and there's Legacy

When the publication was founded in 1843 there were no computers, let alone a World Wide Web. Until recently, its web presence lived on a custom-built CMS running on a proprietary, Microsoft-based stack that included ColdFusion and Oracle. Bolted on to it were additional applications such as Movable Type and Pluck.

Unfortunately, their entire process at that point was broken. Rob Purdie, Scrum Practice Leader for The Economist, described it as waterfall in style yet plagued by frequent firefighting. They needed to become:

  • More responsive to change
  • Able to deliver business value sooner
  • More sustainable

To achieve these goals, they needed more than better technology (though they needed that too). They also needed improved processes, organizational structure and culture.

The Plan

The decision was made to proceed both iteratively and incrementally. Rather than one huge move, they'd take the site over piece by piece, improving as they went, so they could quickly add value to the business. The philosophy "perfect is the enemy of better" was the name of the game.

Their work began with a survey of the existing Web CMS space. What were the leading platforms? What were other newspapers using? The main options appeared to be building a new custom platform, purchasing a proprietary system, or going free and open source.

It turned out that for their needs (community and content publishing), Drupal was a perfect fit. It offered a robust development framework built on a language (PHP) with a large developer community. Java solutions in particular were avoided because they felt Java developers would be too expensive.

Getting Geared Up

First they had to sell the idea internally. As Purdie put it, "There is no suit-wearing Drupal salesforce." Instead, they'd have to make the internal case on their own.

They began by attending DrupalCon Boston 2008, networking with the community and learning more about the platform. Then Purdie arranged Drupal workshops and training with Lullabot.

Rather than trying to do all of this on their own, Purdie brought on Moshe Weitzman of Cyrve. Weitzman is a long-time fixture in the Drupal community, having been a Drupal core developer since before there was a Drupal.org. Together they built a proof of concept: an article page in Drupal that used CCK for a rich article content type, plus mocked-up channel pages for site sections. (CCK, the Content Construction Kit, is an add-on module for Drupal that has been largely integrated into Drupal 7 core.)

This mix of old and new sites would also require a creative approach to hosting. The Economist identified two ways to bridge the two systems until the migration was complete: funnel everything through a proxy, or use subdomains.

In the beginning, everything would go through the proxy. As features matured, some could be moved to subdomains. Once the migration was complete, both arrangements could be retired.
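To make the two hosting options concrete, here is a minimal Python sketch of the routing decision a front end might make during such a migration. Everything in it is an assumption for illustration: the backend URLs, the example.com hostnames and the list of migrated path prefixes are hypothetical, and a real deployment would express this in proxy configuration rather than application code.

```python
from urllib.parse import urlsplit

# Hypothetical backends; the real hostnames are not given in the article.
LEGACY_BACKEND = "http://legacy-cms.internal"
DRUPAL_BACKEND = "http://drupal.internal"

# Proxy method (assumed prefixes): paths for features already moved to Drupal.
MIGRATED_PREFIXES = ("/comments", "/recommendations", "/user")

# Subdomain method (assumed host): a whole host served directly by Drupal.
DRUPAL_SUBDOMAINS = {"community.example.com"}


def choose_backend(url: str) -> str:
    """Return the backend that should serve this URL during the migration."""
    parts = urlsplit(url)
    # Subdomain method: the entire host belongs to one system.
    if parts.hostname in DRUPAL_SUBDOMAINS:
        return DRUPAL_BACKEND
    # Proxy method: a single public host, routed by whole path segments.
    path = parts.path or "/"
    for prefix in MIGRATED_PREFIXES:
        if path == prefix or path.startswith(prefix + "/"):
            return DRUPAL_BACKEND
    # Anything not yet migrated still goes to the legacy CMS.
    return LEGACY_BACKEND
```

In this sketch, moving a feature to the subdomain method means adding its host to `DRUPAL_SUBDOMAINS`; matching on whole path segments keeps a path such as `/username` from being captured by the `/user` prefix.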

Getting Started

Once they had approval to continue, they chose the comments and recommendations subsystem as the feature to migrate first. Along with moving over the data, they wanted to extend the functionality with capabilities such as threaded comments.

Rather than getting terribly fancy, they decided to use native Drupal comments. Doing so required a Drupal node for every item of legacy CMS content that might be commented on. Rather than creating these nodes in one large batch, they create them on the fly as ColdFusion requests come in. Content and user data are synchronized between the sites every 5 minutes to prevent data drift.
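The on-demand node creation and periodic sync described above can be sketched as follows. This is a hypothetical in-memory model in Python, not the Drupal or ColdFusion code The Economist used: `NodeStore`, `get_or_create_node` and `sync_from_legacy` are invented names, and a real system would persist the legacy-ID-to-node mapping in the database and run the sync from a scheduler such as cron.

```python
import itertools
from dataclasses import dataclass

# The article gives a 5-minute sync; an external scheduler (assumed) would
# call sync_from_legacy at this interval.
SYNC_INTERVAL_SECONDS = 5 * 60


@dataclass
class Node:
    nid: int
    legacy_id: str
    title: str


class NodeStore:
    """Hypothetical in-memory stand-in for Drupal's node storage."""

    def __init__(self) -> None:
        self.nodes: dict[int, Node] = {}
        self.nid_by_legacy_id: dict[str, int] = {}
        self._next_nid = itertools.count(1)

    def get_or_create_node(self, legacy_id: str, title: str) -> Node:
        """Return the node for a legacy item, creating it on first request."""
        nid = self.nid_by_legacy_id.get(legacy_id)
        if nid is not None:
            return self.nodes[nid]
        node = Node(next(self._next_nid), legacy_id, title)
        self.nodes[node.nid] = node
        self.nid_by_legacy_id[legacy_id] = node.nid
        return node

    def sync_from_legacy(self, legacy_rows: list[dict], since: float) -> int:
        """Copy legacy changes newer than `since` onto existing nodes; return how many."""
        updated = 0
        for row in legacy_rows:
            if row["modified"] <= since:
                continue  # unchanged since the last sync run
            nid = self.nid_by_legacy_id.get(row["id"])
            if nid is None:
                continue  # never requested yet, so there is no node to refresh
            self.nodes[nid].title = row["title"]
            updated += 1
        return updated
```

In this sketch, nodes exist only for content someone has actually requested, so each sync run touches that subset rather than the whole archive.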

Other items chosen for early migration or enhancement included recommendations, abuse reports, and user profiles, all of them features driven by business value.

The Tools

Weitzman and Cyrve specialize in migrating data into Drupal. Given that The Economist is a weekly newspaper that has been online since 1997, that's a lot of content to move.

There are two open source, GPL'd modules that Weitzman counts on for such migrations:

  • Table Wizard: Takes MySQL tables and automatically generates Views integration for them
  • Migrate: Builds on Table Wizard, taking views (or subsets of views) of the legacy data and importing them into Drupal as nodes, comments, taxonomy terms, etc.
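The idea behind Table Wizard, reading an existing table's schema so its columns can be exposed without hand-written integration code, can be illustrated with Python's built-in sqlite3 module. This is an analogy, not Table Wizard's code: it uses SQLite's `PRAGMA table_info` instead of MySQL, and the dictionary it returns is a simplified, hypothetical stand-in for the table and field metadata that Views normally receives from a module's `hook_views_data()`.

```python
import sqlite3


def describe_table(conn: sqlite3.Connection, table: str) -> list[dict]:
    """Build a simple field description for every column of `table`.

    `table` is interpolated into the PRAGMA, so it must come from a trusted source.
    """
    # Each row is (cid, name, declared_type, notnull, default_value, pk).
    rows = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    return [
        {
            "field": name,
            "type": declared_type or "TEXT",  # SQLite columns may lack a declared type
            "required": bool(notnull),
            "primary_key": pk > 0,  # pk is the 1-based position in the key, else 0
        }
        for _cid, name, declared_type, notnull, _default, pk in rows
    ]
```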

The folks at The Economist identified the important legacy data. From there, Weitzman can, for example, have Migrate build a view of all articles from a single December two years back, inspect the results to see what went wrong in the migration, and delete any mangled migrated data without having to do all of the data bookkeeping by hand.
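The bookkeeping that Migrate takes care of can be sketched with a map table that records which destination node came from which source row, so a bad batch can be rolled back precisely. The Python below, using an in-memory SQLite database, is a hypothetical illustration of that idea rather than the Migrate module's API; the table names `legacy_articles`, `node` and `migrate_map` and the month-based filter are assumptions.

```python
import sqlite3

SCHEMA = """
CREATE TABLE legacy_articles (id INTEGER PRIMARY KEY, headline TEXT, published TEXT);
CREATE TABLE node (nid INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE migrate_map (source_id INTEGER PRIMARY KEY, nid INTEGER NOT NULL);
"""


def migrate_month(conn: sqlite3.Connection, year_month: str) -> int:
    """Import not-yet-migrated legacy articles published in `year_month` ('YYYY-MM')."""
    rows = conn.execute(
        "SELECT id, headline FROM legacy_articles "
        "WHERE published LIKE ? AND id NOT IN (SELECT source_id FROM migrate_map)",
        (year_month + "-%",),
    ).fetchall()
    for source_id, headline in rows:
        cur = conn.execute("INSERT INTO node (title) VALUES (?)", (headline,))
        # Record the source -> destination pairing so the import can be undone later.
        conn.execute(
            "INSERT INTO migrate_map (source_id, nid) VALUES (?, ?)",
            (source_id, cur.lastrowid),
        )
    conn.commit()
    return len(rows)


def rollback(conn: sqlite3.Connection) -> int:
    """Delete only the nodes this migration created, then clear the map."""
    count = conn.execute("SELECT COUNT(*) FROM migrate_map").fetchone()[0]
    conn.execute("DELETE FROM node WHERE nid IN (SELECT nid FROM migrate_map)")
    conn.execute("DELETE FROM migrate_map")
    conn.commit()
    return count
```

Because `rollback` works from the map rather than from, say, a date range, a node created directly in Drupal in the meantime survives, which is the kind of bookkeeping nobody then has to do by hand.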

For a publishing site, Weitzman also uses Pressflow (https://launchpad.net/pressflow) from Four Kitchens, a high-performance Drupal distribution that combines Drupal 6 with patches from Drupal 7. Given that The Economist sees 20 to 30 million page views and 3 to 4 million unique visitors per month, performance was a major concern.
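To put those monthly figures in perspective, spreading them evenly over an assumed 30-day month gives a rough average rate. Real traffic is bursty, so peak rates would be some multiple of this, a multiple the article does not give:

```latex
\frac{20 \times 10^{6}\ \text{views}}{30 \times 86\,400\ \text{s}} \approx 7.7\ \text{views/s},
\qquad
\frac{30 \times 10^{6}\ \text{views}}{30 \times 86\,400\ \text{s}} \approx 11.6\ \text{views/s}
```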

This distribution is fully compatible with the Drupal API and is designed to take advantage of Varnish's reverse proxy capabilities, taking much of the load off both Drupal and MySQL.
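The benefit of a caching reverse proxy like Varnish can be illustrated with a toy model: repeated requests for the same page are answered from the cache, so Drupal (and MySQL behind it) sees only the first one until the entry expires. This Python class is a hypothetical sketch, not Varnish, which is configured in its own VCL language. The TTL and the rule of bypassing the cache for logged-in users are assumptions, the latter reflecting the common practice of caching only anonymous traffic.

```python
import time
from typing import Callable, Dict, Tuple


class CachingProxy:
    """Toy reverse-proxy cache in front of a backend page renderer."""

    def __init__(self, backend: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.backend = backend
        self.ttl = ttl_seconds
        self.clock = clock
        self.cache: Dict[str, Tuple[float, str]] = {}  # path -> (expires_at, body)
        self.backend_calls = 0

    def get(self, path: str, logged_in: bool = False) -> str:
        # Assumed policy: personalised pages for logged-in users are never cached.
        if logged_in:
            return self._fetch(path)
        now = self.clock()
        entry = self.cache.get(path)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: the backend is not touched
        body = self._fetch(path)
        self.cache[path] = (now + self.ttl, body)
        return body

    def _fetch(self, path: str) -> str:
        self.backend_calls += 1
        return self.backend(path)
```

Injecting `clock` keeps the sketch testable; in the example below, a hundred anonymous requests within the TTL cost a single backend call.

```python
fake_now = [0.0]
proxy = CachingProxy(lambda p: "page " + p, ttl_seconds=60, clock=lambda: fake_now[0])
for _ in range(100):
    proxy.get("/node/1")
print(proxy.backend_calls)  # 1
```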

Another new addition to the site is a grid-based theme. Weitzman says that themes built on a 960-pixel grid are popular in Drupal because they simplify site layout work and make it easier to work with designers on quick theme changes.
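The arithmetic behind a 960-pixel grid is simple, which is part of its appeal for quick layout changes. The sketch below assumes the conventions of the widely used 960 Grid System: a 960 px container, 12 or 16 columns, and a 20 px gutter made of 10 px margins on each side of every column. The function name is hypothetical, and the theme The Economist actually used may differ.

```python
def grid_width(span: int, columns: int = 12, container: int = 960, gutter: int = 20) -> int:
    """Pixel width of an element spanning `span` columns of a 960 px grid."""
    if columns not in (12, 16):
        raise ValueError("this sketch only models the 12- and 16-column variants")
    if not 1 <= span <= columns:
        raise ValueError("span must be between 1 and the number of columns")
    # One column "unit" is a column plus one gutter: 80 px (12 cols) or 60 px (16 cols).
    unit = container // columns
    # An element spanning N units contains its N-1 inner gutters; the remaining
    # gutter becomes its 10 px margins on either side.
    return span * unit - gutter
```

For example, on the 12-column grid `grid_width(8)` is 620 px and `grid_width(4)` is 300 px; with their margins (20 px each) the two fill the 960 px container exactly, giving a main column and sidebar.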

Other important tools include:

  • A new focus on testing: unit tests with SimpleTest, browser tests with Selenium, and Hudson to run the automated tests every time a change is committed to Subversion
  • Switching to Apache Solr (built on Lucene) for search, using the hosted Acquia Search service
  • Unfuddle for ticket tracking

Much Farther to Go

Ultimately, Purdie says, they're far less concerned with finishing the full migration than with rolling out new features to engage readers. This decision has led to many "heated discussions", because the current hybrid approach adds complexity.

That decision likely won't change until completing the migration would provide more business value than shipping another new feature. Just where that tipping point lies, however, can be hard to spot.