Migrating existing content into a SharePoint system, and specifically into a SharePoint Page Library -- the foundation of a SharePoint Publishing website -- is neither simple nor straightforward.
To migrate content into SharePoint, we evaluated a number of tools, both commercial products and custom-built ones paired with a data aggregation framework. Here is a review of our experience, a summary of our final approach, and some postmortem thoughts.
Content Migration Scenarios
Scenario 1: Content in a number of locations
In this scenario, the customer had content spread over static sites, an unfamiliar-to-us Web CMS and some custom content databases. All of this content needed to be moved onto the SharePoint 2007 platform.
The core goal was to move all the content into the page library of the SharePoint Publishing Portal. This required many transformations of the existing content. We also had to pre-check the content for UTF compliance and strip any HTML markup embedded in it, keeping only links and image URLs.
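The clean-up step described above can be sketched roughly as follows. This is a minimal illustration, not the framework we used: it assumes "UTF compliance" means valid UTF-8, and the names LinkPreservingStripper, is_valid_utf8 and clean_html are hypothetical. It uses only Python's standard html.parser module.

```python
from html import escape
from html.parser import HTMLParser

# For each tag we keep, the one attribute that survives (assumption: href and src only).
KEPT_ATTRIBUTE = {"a": "href", "img": "src"}
# Tags whose text content is dropped entirely.
SKIPPED_CONTENT = {"script", "style"}


class LinkPreservingStripper(HTMLParser):
    """Drops all markup except <a href="..."> and <img src="...">."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.open_anchors = []  # True for each open <a> that was written out
        self.skip_depth = 0     # greater than zero while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_CONTENT:
            self.skip_depth += 1
            return
        attribute = KEPT_ATTRIBUTE.get(tag)
        if attribute is None:
            return
        value = dict(attrs).get(attribute)
        if tag == "a":
            self.open_anchors.append(value is not None)
        if value is not None:
            closing = " />" if tag == "img" else ">"
            self.parts.append(f'<{tag} {attribute}="{escape(value)}"{closing}')

    def handle_endtag(self, tag):
        if tag in SKIPPED_CONTENT and self.skip_depth:
            self.skip_depth -= 1
        elif tag == "a" and self.open_anchors and self.open_anchors.pop():
            self.parts.append("</a>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(escape(data, quote=False))


def is_valid_utf8(raw: bytes) -> bool:
    """Pre-check: does the raw source decode cleanly as UTF-8?"""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def clean_html(markup: str) -> str:
    """Return the text of `markup` with only links and images kept as tags."""
    parser = LinkPreservingStripper()
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts)
```

A real migration would likely need a longer list of preserved tags (paragraphs and headings, for instance); the point here is only that the whitelist is explicit and everything else is discarded.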
Scenario 2: Migration from Vignette to SharePoint
This scenario, like the first, was a migration to the SharePoint Publishing Portal, this time from the client's current Vignette content management system. Here too we had to map the various Vignette content types to SharePoint content types before performing the migration.
Both scenarios shared the same goals:
- Move old content entities to SharePoint Page Libraries
- Move old images to SharePoint Picture Libraries and link them correctly in the content entities
- Move old documents and other assets to SharePoint Document Libraries and establish the correct links
- Move other items like Events, Links, etc. to corresponding SharePoint Lists
- Replicate the site structure inside SharePoint
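To make the "link them correctly" goals concrete, here is a small sketch of how an old asset URL might be translated to its new library location. The function name target_url, the extension sets, the site root and the library paths (PublishingImages, Documents) are all assumptions chosen for illustration; a real migration would use whatever libraries the target site actually defines.

```python
from posixpath import basename, splitext
from urllib.parse import urlsplit

# Illustrative extension sets; extend them to match the real source content.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif"}
DOCUMENT_EXTENSIONS = {".pdf", ".doc", ".xls", ".ppt"}


def target_url(old_url, site_root="/sites/portal"):
    """Map an old asset URL to a library URL, or return it unchanged."""
    path = urlsplit(old_url).path          # drops scheme, host and query string
    extension = splitext(path)[1].lower()
    name = basename(path)
    if extension in IMAGE_EXTENSIONS:
        return f"{site_root}/PublishingImages/{name}"
    if extension in DOCUMENT_EXTENSIONS:
        return f"{site_root}/Documents/{name}"
    return old_url                         # not an asset we migrate; leave as-is
```

Note that this flattens the old folder structure, so two source files with the same name would collide; a fuller version would have to detect and resolve such clashes.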
Vendor Migration Tools
While there are many vendors with tools that can migrate items to SharePoint Lists / Document Libraries, only a couple of vendors have tools that can migrate content to SharePoint Page Libraries.
This proved to be the first challenge, as these vendors are not well advertised or listed on the popular blogs and tool evaluation sites.
Once we found a candidate vendor tool, we moved into the evaluation phase with a sample content migration operation. Our evaluation of the tools showed us the following:
- A well-defined wizard to crawl websites and map the relevant content blocks to the SharePoint Page Layout fields.
- The ability to migrate related images and documents to the relevant image / document library and update the links.
- Limited reporting on the migration process: the tools simply stopped the migration when an error occurred.
- The tools worked well with simple Page Layouts that used the standard columns in the page library, but started to fail with a large number of fields.
During testing we found several problems. For example, we realized that the wizards typically did not allow us to map the content the way we needed to. They forced us to map the source content blocks to the target columns based on the Page Layout. This approach did not take into consideration the Page Library's Content Type columns.
What actually needed to be done was to map directly to the Page Library's Content Type columns, and then treat the Page Layout as metadata prior to moving the content.
We are fairly sure that, with this approach, the existing migration tools would have worked for many kinds of scenarios.
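The mapping we have in mind can be sketched as a small data structure. This is an illustration under assumptions, not any vendor's or Microsoft's API: PageMapping and build_page_fields are hypothetical names, and the column and layout names in the usage note are examples only.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class PageMapping:
    """Maps source content blocks to Content Type columns."""
    content_type: str            # e.g. the Article Page content type
    page_layout: str             # carried along as metadata, not as the mapping target
    column_map: Dict[str, str]   # source block name -> Content Type column name


def build_page_fields(source_blocks: Dict[str, str], mapping: PageMapping) -> Dict[str, str]:
    """Place each mapped source block into its Content Type column."""
    fields = {
        column: source_blocks[block]
        for block, column in mapping.column_map.items()
        if block in source_blocks
    }
    # The layout is just one more field on the page, applied at render time.
    fields["PageLayout"] = mapping.page_layout
    return fields
```

For example, a mapping with column_map={"headline": "Title", "body": "PageContent"} and page_layout="ArticleLeft.aspx" would turn a source item with "headline" and "body" blocks into a set of column values plus a single layout entry, independent of how any particular layout arranges those columns.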
Custom Built Aggregation Framework
Once we ran into a wall with the existing tools, we decided to try and use an internal content aggregation framework that we had been using extensively for years. This framework had both aggregation and transformation capabilities.
List Item Migration
Migrating items to a SharePoint List is relatively straightforward. List items in SharePoint are roughly analogous to database records, and one can insert items into Lists via Web Service calls or the SharePoint object model API.
We took the first approach, using Web Services and making SOAP calls to the exposed Web Service methods. The SOAP approach allowed us to log errors on a per-record basis, giving us the ability to track the migration at a very detailed level.
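As a rough sketch of what such a call looks like, the code below builds a request for the UpdateListItems method of the SharePoint Lists web service and extracts per-record failures from the response. It is not our framework's code: the function names are hypothetical, the field names are whatever the target list defines, and sending the request (an authenticated HTTP POST to the site's /_vti_bin/Lists.asmx endpoint with the SOAPAction header http://schemas.microsoft.com/sharepoint/soap/UpdateListItems) is left out.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
SP_NS = "http://schemas.microsoft.com/sharepoint/soap/"
SUCCESS = "0x00000000"


def build_update_request(list_name, records):
    """Build an UpdateListItems envelope that adds one list item per record."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{SP_NS}}}UpdateListItems")
    ET.SubElement(call, f"{{{SP_NS}}}listName").text = list_name
    updates = ET.SubElement(call, f"{{{SP_NS}}}updates")
    # OnError="Continue" keeps the batch going, so each record reports its own result.
    batch = ET.SubElement(updates, f"{{{SP_NS}}}Batch", OnError="Continue")
    for method_id, record in enumerate(records, start=1):
        method = ET.SubElement(batch, f"{{{SP_NS}}}Method", ID=str(method_id), Cmd="New")
        for field_name, value in record.items():
            ET.SubElement(method, f"{{{SP_NS}}}Field", Name=field_name).text = value
    return ET.tostring(envelope, encoding="unicode")


def failed_methods(response_xml):
    """Return (method_id, error_code, error_text) for each record that failed."""
    failures = []
    for result in ET.fromstring(response_xml).iter(f"{{{SP_NS}}}Result"):
        code = result.findtext(f"{{{SP_NS}}}ErrorCode", default="")
        if code != SUCCESS:
            method_id = result.get("ID", "").split(",")[0]  # ID looks like "2,New"
            text = result.findtext(f"{{{SP_NS}}}ErrorText", default="")
            failures.append((method_id, code, text))
    return failures
```

Because every Method carries its own ID and the response returns one Result per Method, a failure can be traced back to the exact source record, which is what makes per-record logging possible.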
Page Library Migration
SharePoint Page Libraries are more complex than SharePoint Lists. When migrating content into a Page Library, the content must be transformed into a SharePoint Page -- an entity more complex than a simple List item.
To accomplish this, the original content mark-up must be parsed, unwanted tags must be stripped, and the remaining links and image references must be mapped to the correct resources in SharePoint. Once the source content is parsed, cleaned and deconstructed, it can be stored in the respective native SharePoint fields based on the relevant Content Type (in our scenario, the Article Page).
After this it can be mapped to the relevant Page Layout. Page Layouts determine the look and feel of the rendered content, in accordance with the principle of separating content and presentation.
We could, in principle, also map to other Custom Content Types without an issue using this approach.
Page Libraries expose these pages as XML, with each column / field represented as an element. One of the elements also names the Page Layout that needs to be used to present the Page.
This made our job easier. We made our framework generate the Pages in this format and import them into the Page Library, applying the right Page Layout for presentation as listed in one of the elements in the XML.
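The general shape of such a page document can be sketched as follows. The element names used here (Page, PageLayout and the column names) are placeholders, not the actual SharePoint export schema, and page_to_xml is a hypothetical helper; the sketch only shows the idea of one element per column plus one element naming the layout.

```python
import xml.etree.ElementTree as ET


def page_to_xml(fields, page_layout):
    """Serialize column values as one XML element per column, plus the layout."""
    page = ET.Element("Page")
    for column, value in fields.items():
        # Column names must be valid XML element names (no spaces, for example).
        ET.SubElement(page, column).text = value
    ET.SubElement(page, "PageLayout").text = page_layout
    return ET.tostring(page, encoding="unicode")
```

Calling page_to_xml({"Title": "About us", "PageContent": "<p>Welcome</p>"}, "ArticleLeft.aspx") yields a single Page element whose children are Title, PageContent and PageLayout, with the HTML body escaped as text.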
In addition to this, we also faced the challenge of creating sub-sites to replicate the content structure in SharePoint. It's important to note that this task added a lot of overhead to the migration project.
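One way to plan that work is to derive the list of sub-sites from the folder structure of the source URLs before migrating any pages. The sketch below assumes the source hierarchy is expressed in URL paths, which may not hold for every CMS; subsites_to_create is a hypothetical helper and does not create anything in SharePoint itself.

```python
from posixpath import dirname
from urllib.parse import urlsplit


def subsites_to_create(page_urls):
    """Return every folder path implied by the URLs, parents before children."""
    sites = set()
    for url in page_urls:
        folder = dirname(urlsplit(url).path).strip("/")
        parts = folder.split("/") if folder else []
        # Register each ancestor as well, so "news" exists before "news/2008".
        for depth in range(1, len(parts) + 1):
            sites.add("/".join(parts[:depth]))
    # Sort by depth first, then alphabetically, to get a safe creation order.
    return sorted(sites, key=lambda site: (site.count("/"), site))
```

Given pages under /news/2008/ and /about/, this produces "about", "news" and then "news/2008", so each sub-site can be created only after its parent exists.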
In a Page Library, every piece of content -- other than the Web Part content -- gets mapped to a corresponding Content Type column in the Page Library. The Page Layout itself is one piece of that metadata, which the system uses to apply the correct layout during content rendering.
Following from these relationships, we concluded that any content migration tool that claims to support SharePoint must be aware of and support mapping source content blocks to the Page Library Content Type columns. We did not find this to be the case.
In addition, we think that it is time that Microsoft re-examined its Page Library concept. We feel that SharePoint, like other content management systems, should support the creation of hierarchical directories within Page Libraries. This would allow for easier categorization of content and remove the need to create separate sub-sites to represent hierarchy.