Last week we discussed some of the on-site factors that affect search engine rankings within a CMS implementation. This week, we’ll revisit these factors and present guidance for addressing and automating these strategies during your CMS implementation.
Enforce W3C Compliant Code
Most mature CMSs allow you to define and “lock down” the code used in templates. By validating these templates prior to deployment, you can ensure that most site content will be W3C compliant.
Some solutions, such as RedDot XCMS, incorporate a code validation utility. Automated third-party validation is also available, most notably from Watchfire.com. Alternatively, pages can be validated manually using the W3C validation tools.
Create Generic Site Maps
Virtually all content management solutions allow automated generation and updating of site maps. Best practices suggest restricting the number of links on any one page to fewer than 100. On larger sites, this may require a series of hierarchical site maps, with a top-level map linking to lower-level maps, so that spiders have quick access to all site content.
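As a rough illustration of the hierarchical approach, the sketch below splits a list of URLs into site map pages that each stay under 100 links, adding index levels until the top page fits too. The function name, the 99-link cap and the `sitemap-…` file names are assumptions made for this example, not features of any particular CMS.

```python
from typing import Dict, List

MAX_LINKS = 99  # assumption: "fewer than 100" links per page


def build_site_map_pages(urls: List[str], max_links: int = MAX_LINKS,
                         prefix: str = "sitemap") -> Dict[str, List[str]]:
    """Return {page name: links on that page} for a hierarchy of site maps.

    The top-level page is always "<prefix>.html". Page names are hypothetical.
    """
    if max_links < 2:
        raise ValueError("max_links must be at least 2")
    pages: Dict[str, List[str]] = {}
    current = list(urls)
    level = 0
    # Keep grouping until the remaining links fit on a single page.
    while len(current) > max_links:
        names = []
        for n, start in enumerate(range(0, len(current), max_links), start=1):
            name = f"{prefix}-{level}-{n}.html"
            pages[name] = current[start:start + max_links]
            names.append(name)
        current = names  # the next level up links to the pages just created
        level += 1
    pages[f"{prefix}.html"] = current
    return pages
```

With 250 URLs, this yields three lower-level pages and one top-level page that links to them. A site with fewer than 100 URLs gets a single page.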
Deploy Google and Yahoo! Site Maps
To the best of our knowledge, no mature content management solution is available that creates Google and Yahoo! site maps “out of the box.” A reasonably skilled developer should be able to extend any CMS with an open API to produce these maps in the required format.
Alternatively, the non-linear creations team has developed plug-ins to generate Google and Yahoo! site maps for a number of leading mid-tier content management solutions.
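For a sense of what such an extension involves, here is a minimal sketch that writes an XML site map from a list of pages. It assumes the sitemaps.org 0.9 schema and assumes the CMS API can supply each page's URL and last-modified date. The function name and input shape are hypothetical, and the required format should be checked against each search engine's current documentation.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Optional, Tuple

# Assumption: the sitemaps.org 0.9 namespace is the target format.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_xml_sitemap(pages: Iterable[Tuple[str, Optional[str]]]) -> str:
    """Build an XML site map from (url, last_modified) pairs.

    last_modified is a date string such as "2006-05-01", or None to omit it.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc  # special characters are escaped on output
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")
```

In practice the page list would come from the CMS API at publish time, so the map is regenerated whenever content changes.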
Mandate Search Engine Friendly URLs
There are two major models for content management solution architectures:
- Static Publication. In this model, activities associated with content management are separated from content delivery. CMS activities (authoring, editing, workflow) take place on one server, usually inside the firewall. The finished pages are transferred as static HTML files to a separate web server, which does nothing but serve these pages to visitors. These systems almost always generate URLs with no dynamic variables.
- Dynamic Publication. Dynamic systems assemble content “on the fly” to create a page as it is requested by a visitor. Content management and delivery activities take place within the same system. These systems usually generate long URLs littered with dynamic variables. Some of them provide a work-around by allowing URL aliases to be created; an alias can be a standard, search engine friendly URL that maps to the underlying dynamic one.
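The alias idea can be sketched as a simple lookup table that maps a readable path to the dynamic URL the CMS actually serves. The helper names and sample URLs below are hypothetical; a real CMS would expose its own alias mechanism.

```python
import re
import unicodedata
from typing import Dict, Iterable, Tuple


def slugify(title: str) -> str:
    """Reduce a page title to lowercase words joined by hyphens."""
    ascii_title = (unicodedata.normalize("NFKD", title)
                   .encode("ascii", "ignore").decode("ascii"))
    return re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")


def build_alias_table(pages: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map a friendly alias to each (title, dynamic_url) pair."""
    aliases: Dict[str, str] = {}
    for title, dynamic_url in pages:
        base = slugify(title) or "page"
        alias, n = f"/{base}", 2
        while alias in aliases:  # keep aliases unique when titles repeat
            alias, n = f"/{base}-{n}", n + 1
        aliases[alias] = dynamic_url
    return aliases
```

For example, a page titled "About Us & Our Team!" served at a hypothetical `/index.php?id=42&lang=en` would be reachable at `/about-us-our-team`.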
Publish to a Flat Directory
Many mature web content management solutions, particularly those that publish content statically, allow you to organize content hierarchically within the CMS independent of its physical location on the web server. With these systems you can maintain a much more complex system of organization within the CMS than is apparent in the directory structure of the live web site.
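To make the separation concrete, the sketch below maps hierarchical CMS paths to file names in a single published directory and refuses to publish when two items would collide. The function name, the path format and the `.html` naming rule are assumptions for illustration only.

```python
from typing import Dict, Iterable


def flat_publish_paths(cms_paths: Iterable[str]) -> Dict[str, str]:
    """Map each hierarchical CMS path to a file in one flat web directory."""
    published: Dict[str, str] = {}
    owners: Dict[str, str] = {}  # file name -> CMS path that claimed it
    for cms_path in cms_paths:
        segments = [s for s in cms_path.split("/") if s]
        if not segments:
            raise ValueError(f"empty CMS path: {cms_path!r}")
        # Only the last segment survives; the CMS hierarchy is not published.
        filename = segments[-1].strip().lower().replace(" ", "-") + ".html"
        if filename in owners:
            raise ValueError(
                f"{cms_path!r} and {owners[filename]!r} both map to {filename}")
        owners[filename] = cms_path
        published[cms_path] = "/" + filename
    return published
```

Here `Products/Widgets/Blue Widget` inside the CMS would be published as `/blue-widget.html`. The collision check matters because a flat directory gives up the uniqueness that folders otherwise provide.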
Eliminate Broken Links
Most content management solutions will not publish content that contains invalid links to other content maintained by the system. That is, they prohibit the publication of broken internal links on the site. Few of them, RedDot CMS being one exception, are able to validate links to external sites. A number of utilities are available for monitoring the validity of links. These include:
- Watchfire WebXM (www.watchfire.com) – suitable for enterprise class sites
- LinkcheckerPro (www.linkcheckerpro.com)
- Xenu Link Sleuth (home.snafu.de/tilman/xenulink.html)
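The internal check that most CMSs perform can be approximated in a few lines. The sketch below collects the links on a page and reports those that point within the same site but are not in a set of known published URLs. The function and class names are hypothetical, and external links are deliberately skipped, since validating them requires live HTTP requests of the kind the utilities above perform.

```python
from html.parser import HTMLParser
from typing import List, Set
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def find_broken_internal_links(page_url: str, html: str,
                               known_urls: Set[str]) -> List[str]:
    """Return same-site links on the page that are not in known_urls."""
    collector = LinkCollector()
    collector.feed(html)
    site = urlparse(page_url).netloc
    broken = []
    for href in collector.links:
        # Resolve relative links and drop any #fragment before comparing.
        absolute = urldefrag(urljoin(page_url, href)).url
        if urlparse(absolute).netloc == site and absolute not in known_urls:
            broken.append(absolute)
    return broken
```

The set of known URLs would normally come from the CMS itself, for instance the list of pages scheduled for publication.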
Use Robots.txt Appropriately
Several CMS solutions (most notably HotBanana) allow end users to control robots.txt rules on a page-by-page basis, but most leave the file's definition to the site developer.
A robots.txt file is not strictly mandatory. Its absence is most noticeable in the 404 errors it generates in server logs, which can skew web log analysis tools. If you choose to implement the file, however, it is critical that it be valid and that it accurately define access to site content. A mistake in implementation can prevent the major search engines from indexing any of your site content.
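One inexpensive safeguard is to test a robots.txt file against a list of pages that must remain crawlable before it goes live. The sketch below uses Python's standard `urllib.robotparser` module; the function name and the sample URLs are assumptions, and the standard parser may not interpret every rule exactly as each search engine does.

```python
from typing import Iterable, List
from urllib.robotparser import RobotFileParser


def urls_blocked_by_robots(robots_txt: str, must_allow: Iterable[str],
                           user_agent: str = "*") -> List[str]:
    """Return the URLs in must_allow that the given robots.txt would block."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in must_allow if not parser.can_fetch(user_agent, url)]


# A single stray "Disallow: /" is the classic mistake that hides a whole site.
risky = "User-agent: *\nDisallow: /\n"
intended = "User-agent: *\nDisallow: /admin/\n"
important_pages = [
    "http://www.example.com/",
    "http://www.example.com/products.html",
]
print(urls_blocked_by_robots(risky, important_pages))
print(urls_blocked_by_robots(intended, important_pages))
```

The first call reports both pages as blocked, while the second returns an empty list. Running a check like this as part of deployment catches the error before a spider does.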
Address Canonical Issues
You want your visitors to find your site whether they type in http://www.url.com, http://url.com or http://www.url.com/index.html. But you certainly don’t want the search engines to see these as separate web sites with duplicate content.
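A common way to handle this is to pick one preferred form of each address and permanently redirect the variants to it. The sketch below shows only the normalization step, under the assumption that the `www` host is preferred and that `index.html` and `index.htm` are the default documents. The function name is hypothetical, and the redirect itself would be configured on the web server or in the CMS.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: these file names are served as the directory default.
DEFAULT_DOCUMENTS = ("index.html", "index.htm")


def canonical_url(url: str, preferred_host: str) -> str:
    """Collapse host and default-document variants into one preferred URL."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    bare = preferred_host[4:] if preferred_host.startswith("www.") else preferred_host
    if host in (bare, "www." + bare):
        host = preferred_host  # fold the www and non-www forms together
    path = parts.path or "/"
    for doc in DEFAULT_DOCUMENTS:
        if path.endswith("/" + doc):
            path = path[:-len(doc)]  # keep the trailing slash, drop the file name
            break
    return urlunsplit((parts.scheme.lower(), host, path, parts.query, ""))


variants = [
    "http://www.url.com",
    "http://url.com",
    "http://www.url.com/index.html",
]
print({canonical_url(v, "www.url.com") for v in variants})
```

All three variants reduce to `http://www.url.com/`, so the printed set contains a single address. A request whose URL differs from its canonical form is a candidate for a permanent (301) redirect.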