Last week we discussed some of the on-site factors that affect search engine rankings within a CMS implementation. This week, we’ll revisit these factors and present guidance for addressing and automating these strategies during your CMS implementation.
Enforce W3C Compliant Code
Most mature CMSs allow you to define and “lock down” the code used in templates. By validating these templates prior to deployment, you can ensure that most site content will be W3C compliant.
Some solutions – such as RedDot XCMS – incorporate a code validation utility. Automated third-party validation is also available, most notably from Watchfire.com. Alternatively, pages can be validated manually using the W3C validation tools.
Create Generic Site Maps
Virtually all content management solutions allow automated generation and updating of site maps. Best practices suggest restricting the number of links on any one page to fewer than 100. This may require the creation of a series of hierarchical site maps to provide spiders with quick access to all site content.
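As a rough illustration of the "fewer than 100 links" guideline, the sketch below splits a flat list of URLs into child site map pages plus one index page that links to them. The page names (sitemap.html, sitemap-1.html and so on) and the two-level layout are assumptions made for the example, not features of any particular CMS.

```python
from typing import Dict, List

MAX_LINKS = 99  # "fewer than 100" links on any one page


def build_site_map_pages(urls: List[str], max_links: int = MAX_LINKS) -> Dict[str, List[str]]:
    """Split a flat URL list into site map pages that each stay within the limit.

    Returns a dict mapping a hypothetical site map page name to the links
    that page should carry. Large sites get an index page ("sitemap.html")
    that links to numbered child pages.
    """
    if len(urls) <= max_links:
        return {"sitemap.html": list(urls)}

    pages: Dict[str, List[str]] = {}
    child_names: List[str] = []
    for start in range(0, len(urls), max_links):
        name = f"sitemap-{len(child_names) + 1}.html"
        pages[name] = urls[start:start + max_links]
        child_names.append(name)

    # This sketch stops at two levels; a bigger site would need a third.
    if len(child_names) > max_links:
        raise ValueError("too many URLs for a two-level site map")

    pages["sitemap.html"] = child_names
    return pages
```

With 250 URLs this yields three child pages and one index page, each carrying fewer than 100 links.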
Deploy Google and Yahoo Site Maps
To the best of our knowledge, no mature content management solution is available that creates Google and Yahoo! site maps “out of the box.” A reasonably skilled developer should be able to extend any CMS with an open API to produce these maps in the required format.
Alternatively, the non-linear creations team has developed plug-ins to generate Google and Yahoo! site maps for a number of leading mid-tier content management solutions.
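To give a sense of what such an extension has to produce, here is a minimal sketch that writes an XML site map in the sitemaps.org format. It assumes the CMS API can supply each page's URL and last-modified date; the function name and the input shape are hypothetical, and the article does not specify which format a given plug-in emits.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_xml_sitemap(pages: Iterable[Tuple[str, str]]) -> str:
    """Return an XML site map for (url, last_modified) pairs.

    last_modified is expected as a W3C date string such as "2024-01-31".
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc  # ElementTree escapes & and <
        ET.SubElement(url, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

In practice the list of pages would come from the CMS itself, so the file can be regenerated on every publish.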
Mandate Search Engine Friendly URLs
There are two major models for content management solution architectures:
- Static Publication. In this model, activities associated with content management are separated from content delivery. CMS activities – authoring, editing, workflow – take place on one server (usually inside the firewall). The finished pages are then transferred as static HTML files to a separate web server, which does nothing but serve them to visitors. These systems almost always generate URLs with no dynamic variables.
- Dynamic Publication. Dynamic CMSs assemble content “on the fly” to create a page as it is requested by a visitor. Content management and delivery activities take place within the same system. These systems usually generate long URLs littered with dynamic variables. Some of them provide workarounds by allowing URL aliases to be created; these aliases can be standard, search-engine-friendly URLs.
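The alias idea can be sketched as a lookup table from a clean URL to the dynamic URL that actually serves the page. The helper names, the ".html" suffix, and the sample dynamic URL are all assumptions for illustration; how a real CMS stores and resolves aliases will differ.

```python
import re
from typing import Dict


def slugify(title: str) -> str:
    """Lower-case a page title and reduce it to hyphen-separated words."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")


def build_alias_table(pages: Dict[str, str]) -> Dict[str, str]:
    """Map a friendly alias path to the dynamic URL behind it.

    pages maps a page title to its dynamic URL.
    """
    aliases: Dict[str, str] = {}
    for title, dynamic_url in pages.items():
        alias = f"/{slugify(title)}.html"
        if alias in aliases:
            # Two titles collapsing to one alias would hide a page.
            raise ValueError(f"duplicate alias: {alias}")
        aliases[alias] = dynamic_url
    return aliases
```

For example, a page titled "Contact Us" served from /index.php?id=42&lang=en would be published under the alias /contact-us.html.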
Publish to a Flat Directory
Many mature web content management solutions – particularly those that publish content statically – allow you to organize content hierarchically within the CMS independent of its physical location on the web server. With these systems you can maintain a much more complex system of organization within the CMS than is apparent in the directory structure of the live web site.
Eliminate Broken Links
Most content management solutions will not publish content that contains invalid links to other content maintained by the system. That is, they prohibit the publication of broken internal links on the site. With the exception of RedDot CMS, few are able to validate links to external sites. A number of utilities are available for monitoring the validity of links. These include:
- Watchfire WebXM (www.watchfire.com) – suitable for enterprise-class sites
- LinkcheckerPro (www.linkcheckerpro.com)
- Xenu Link Sleuth (home.snafu.de/tilman/xenulink.html)
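For a sense of what an internal link check involves, the sketch below scans a set of published pages held in memory and reports links that point to pages that do not exist. It is an illustration only: it assumes pages are keyed by their site-relative path, it skips external links entirely (checking those requires network requests, which is what the utilities above handle), and it ignores query strings and fragments.

```python
from html.parser import HTMLParser
from typing import Dict, List, Tuple
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def find_broken_internal_links(pages: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (source_path, href) pairs whose target is not a known page.

    pages maps a site-relative path such as "/about.html" to its HTML.
    """
    broken: List[Tuple[str, str]] = []
    for path, html in pages.items():
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.links:
            if urlparse(href).scheme or href.startswith("//"):
                continue  # external link or mailto:, not checked here
            # Resolve relative links against the page that contains them.
            target = urlparse(urljoin(path, href)).path
            if target not in pages:
                broken.append((path, href))
    return broken
```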
Use Robots.txt Appropriately
Several CMS solutions allow end users to control the robots.txt on a page-by-page basis (most notably, HotBanana), but most leave its definition to the site developer.
Having a robots.txt file is not absolutely mandatory – its absence is most notable for generating 404 errors in server logs and messing up web log analysis tools. If you choose to implement the file, however, it is critical that it is valid and accurately defines access to site content. A mistake in implementation can prevent the major search engines from indexing any of your site content.
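One way to guard against that kind of mistake is to test the file before it goes live. The sketch below uses Python's standard urllib.robotparser module to list any pages that a draft robots.txt would block; the function name and the list of "must be crawlable" pages are assumptions for the example.

```python
from typing import Iterable, List
from urllib.robotparser import RobotFileParser


def blocked_pages(robots_txt: str, must_be_crawlable: Iterable[str],
                  user_agent: str = "*") -> List[str]:
    """Return the pages the given robots.txt text would block for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in must_be_crawlable
            if not parser.can_fetch(user_agent, url)]
```

Run against a short list of key pages (the home page, top-level sections, a few deep pages), an empty result means the file is not shutting spiders out of content you want indexed. A stray "Disallow: /" would cause every page to be reported.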
Address Canonical Issues
You want your visitors to find your site whether they type in http://www.url.com, http://url.com or http://www.url.com/index.html. But you certainly don’t want the search engines to see these as separate web sites with duplicate content.
Fortunately, you can take a few simple steps to overcome this challenge. Before you go live with your newly content-managed site, select one of these URLs as the site URL. Then set up permanent (301) redirects from the other URLs to it. For example, if you selected www.url.com as your primary URL, you would set up 301 redirects from url.com and www.url.com/index.html.
The major search engines do not penalize permanent redirects. This simple method overcomes most canonical issues.
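The redirect rule itself is small. The sketch below expresses it as a function that, given a requested URL, returns either nothing (the URL is already canonical) or a 301 status with the canonical location. It assumes www.url.com was chosen as the primary URL and that index.html is the only default document name; in practice this logic usually lives in the web server configuration rather than in application code.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "www.url.com"     # the primary URL chosen before go-live
INDEX_DOCUMENTS = ("index.html",)  # assumed default document names


def canonical_redirect(url: str) -> Optional[Tuple[int, str]]:
    """Return (301, canonical_url) if url needs redirecting, else None.

    Assumes every URL passed in belongs to this site.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    for name in INDEX_DOCUMENTS:
        if path.endswith("/" + name):
            path = path[: -len(name)]  # keep the trailing slash
    canonical = urlunsplit(
        (parts.scheme, CANONICAL_HOST, path, parts.query, parts.fragment)
    )
    if canonical == url:
        return None
    return 301, canonical
```

Both http://url.com/ and http://www.url.com/index.html resolve to a 301 pointing at http://www.url.com/, while http://www.url.com/ itself is served directly.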
Avoid Session Variables
Session variables in URL GET requests are most frequently employed as an alternative to cookies. They “hold state,” allowing the system to track one visitor throughout a visit. If your CMS is assigning session variables, you have two safe options and one potentially risky approach:
- Investigate replacing session variables with cookies as a means of holding state. Many systems provide this as a configurable option.
- Determine whether the use of session variables can be restricted to those parts of the site that absolutely require holding state. For example, an e-commerce element of your site may require the use of session variables, but you may not require their use throughout the site. If this is the case, you may be able to improve the likelihood of pages ranking in search engines by eliminating the variable.
- In the risky approach, you custom configure the system to identify inbound search spiders (by their IP address or user agent) and consistently assign the same session variable. This overcomes the session variable challenge by ensuring the search engines recognize the persistence of a page over time. It is risky because any time you treat a search engine spider differently from human visitors you open yourself to accusations of “cloaking,” a decidedly black hat SEO tactic. The penalties applied by search engines when they detect cloaking are harsh and include the possibility of a permanent ban from listings.
It’s a risk that needs to be weighed against the potential gain. If your site is already ranking well, then don’t risk it. If your pages are not being indexed at all, then there is little downside to a ban. Your site is probably in between and you’ll need to make a judgment call.
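To make the second safe option concrete, the sketch below shows what "eliminating the variable" means at the URL level: it removes session parameters from a link so that pages which do not need state are published with one stable URL for all visitors. The parameter names in SESSION_PARAMS are assumptions; check which name your own system uses.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed session parameter names, compared case-insensitively.
SESSION_PARAMS = {"phpsessid", "sid", "sessionid"}


def strip_session_params(url: str) -> str:
    """Return url with any session parameters removed from its query string."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

Because the same clean URL is produced for every visitor, spiders included, this does not raise the cloaking concern described above.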
Reduce Code Clutter
Increasing the clarity and prominence of the text on a page is one of the simplest, most effective SEO tactics.
Most mature content management solutions provide the choice of using cascading style sheets (CSS) to control the format of a page. Making use of a CSS-based design – and then prohibiting the modification of this design – will eliminate much of the HTML code that would otherwise be required.
Create Site Navigation as Descriptive Text
When creating the design templates for your site, ensure that the navigation elements (those links that structure access to the site) are not images or Macromedia Flash objects, but simple text links. Enforce this navigation through the use of the content management solution’s template functions.
Many content management solutions allow you to automatically generate “breadcrumbs” that indicate to visitors where they are in the site. Use this feature to generate a consistent, keyword-rich text element on each page.
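As an illustration of the kind of output such a feature produces, the sketch below renders a breadcrumb trail as plain text links. It assumes the CMS can supply the page's ancestors as an ordered list of (label, URL) pairs; the function name and the " > " separator are choices made for the example.

```python
from html import escape
from typing import List, Tuple


def render_breadcrumbs(trail: List[Tuple[str, str]]) -> str:
    """Render (label, url) pairs as text links; the last item is the current page.

    Ancestors become links, and the current page is shown as plain text.
    """
    parts = []
    for index, (label, url) in enumerate(trail):
        if index == len(trail) - 1:
            parts.append(escape(label))
        else:
            parts.append(f'<a href="{escape(url, quote=True)}">{escape(label)}</a>')
    return " &gt; ".join(parts)
```

Because the labels come from the section and page titles held in the CMS, descriptive titles translate directly into descriptive link text on every page.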
The Bottom Line
Embedding SEO best practices during the deployment of a content management solution can pay large dividends. If you are just about to proceed with your CMS implementation, this is definitely worth considering. And if you’re ready to expand your SEM strategy, review this Link Building Guide to explore link building tactics that help improve your search engine visibility.
About the Author
Randy Woods is a co-founder of non-linear creations. With his breadth of knowledge and experience in online strategy, content management and search marketing, Randy shares his lessons learned through the non-linear creations Leadership Series and a number of published whitepapers, including Best Practices in CMS Governance, SEO and CMS: Best Practices, and the NLC Performance Framework.