Last week we discussed some of the on-site factors that affect search engine rankings within a CMS implementation. This week, we’ll revisit these factors and present guidance for addressing and automating these strategies during your CMS implementation.
System-Wide Considerations
Enforce W3C Compliant Code
Most mature CMS allow you to define and “lock down” the code used in templates. By validating these templates prior to deployment you can ensure most site content will be W3C compliant.
Some solutions – such as RedDot XCMS – incorporate a code validation utility. Automated third-party validation is also available, most notably from Watchfire.com. Alternatively, pages can be validated manually using the W3C validation tools.
Create Generic Site Maps
Virtually all content management solutions allow automated generation and updating of site maps. Best practices suggest restricting the number of links on any one page to fewer than 100. This may require the creation of a series of hierarchical Site Maps to provide spiders with quick access to all site content.
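As a rough illustration of the hierarchical approach, the Python sketch below splits a flat list of URLs into child site map pages of fewer than 100 links each, plus an index page that links to the children. The page names (sitemap-index.html, sitemap-1.html and so on) and the function name are hypothetical; a real CMS would plug its own page inventory and templates into this logic.

```python
from typing import Dict, List

MAX_LINKS_PER_PAGE = 99  # "fewer than 100 links", per the guideline above


def paginate_site_map(urls: List[str],
                      max_links: int = MAX_LINKS_PER_PAGE) -> Dict[str, List[str]]:
    """Map each site map page name to the list of links it should carry.

    Page names are hypothetical. This two-level sketch assumes the number
    of child pages itself stays within max_links.
    """
    if len(urls) <= max_links:
        # Everything fits on one page, so no hierarchy is needed.
        return {"sitemap-index.html": list(urls)}

    children: Dict[str, List[str]] = {}
    for start in range(0, len(urls), max_links):
        page_name = f"sitemap-{start // max_links + 1}.html"
        children[page_name] = urls[start:start + max_links]

    # The index page links only to the child site map pages.
    pages: Dict[str, List[str]] = {"sitemap-index.html": list(children)}
    pages.update(children)
    return pages
```

With 250 URLs, for example, this produces an index page with three links and three child pages, none of which exceeds the limit.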
Deploy Google and Yahoo Site Maps
To the best of our knowledge, no mature content management solution is available that creates Google and Yahoo! site maps “out of the box.” A reasonably skilled developer should be able to extend any CMS with an open API to produce these maps in the required format.
Alternatively, the non-linear creations team has developed plug-ins to generate Google and Yahoo! site maps for a number of leading mid-tier content management solutions.
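For a sense of what such an extension involves, here is a minimal Python sketch that writes a list of page URLs in the XML format defined by the sitemaps.org protocol. It assumes the CMS API can hand you the list of published URLs; the function name is hypothetical and this is not the plug-in code mentioned above.

```python
import xml.etree.ElementTree as ET
from typing import Iterable

# Namespace defined by the sitemaps.org protocol (version 0.9).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_xml_sitemap(urls: Iterable[str]) -> str:
    """Return an XML site map listing each URL in a <url><loc> entry."""
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" when serializing.
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

The resulting string would typically be saved as sitemap.xml in the site root and regenerated whenever content is published.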
Mandate Search Engine Friendly URLs
There are two major models for content management solution architectures:
- Static Publication. In this model, the activities associated with content management are separated from content delivery. CMS activities – authoring, editing, workflow – take place on one server (usually inside the firewall). The finished pages are then transferred as static HTML to a separate web server, which does nothing but serve them to visitors. These systems almost always generate URLs with no dynamic variables.
- Dynamic Publication. Dynamic CMS systems assemble content “on the fly” to create a page as it is requested by a visitor. Content management and delivery activities take place within the same system. These systems usually generate long URLs littered with dynamic variables. Some of these systems provide work-arounds for this challenge by allowing URL aliases to be created. These aliases can be standard URLs.
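The alias idea can be sketched in a few lines of Python. The example below builds a readable alias from a section name and page title and keeps a lookup table back to the underlying dynamic URL. The function names, the alias pattern and the sample dynamic URL are all assumptions made for illustration; each CMS that supports aliases has its own mechanism.

```python
import re
from typing import Dict, Optional


def slugify(text: str) -> str:
    """Lower-case the text and replace runs of other characters with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def register_alias(aliases: Dict[str, str], section: str, title: str,
                   dynamic_url: str) -> str:
    """Store a standard-looking alias for a dynamic URL and return the alias."""
    alias = f"/{slugify(section)}/{slugify(title)}.html"
    aliases[alias] = dynamic_url
    return alias


def resolve_alias(aliases: Dict[str, str], path: str) -> Optional[str]:
    """Return the dynamic URL behind an alias, or None if the path is unknown."""
    return aliases.get(path)
```

Registering the section "Products" and the title "HD Televisions" yields /products/hd-televisions.html, which the delivery layer can map back to the dynamic address when the page is requested.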
Publish to a Flat Directory
Many mature web content management solutions – particularly those that publish content statically – allow you to organize content hierarchically within the CMS independent of its physical location on the web server. With these systems you can maintain a much more complex system of organization within the CMS than is apparent in the directory structure of the live web site.
Eliminate Broken Links
Most content management solutions will not publish content that contains invalid links to other content maintained by the system. That is, they prohibit the publication of broken internal links on the site. With the exception of RedDot CMS, few are able to validate links to external sites. A number of utilities are available for monitoring the validity of links. These include:
- Watchfire WebXM (www.watchfire.com) – suitable for enterprise class sites
- LinkcheckerPro (www.linkcheckerpro.com)
- Xenu Link Sleuth (home.snafu.de/tilman/xenulink.html)
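To show the basic principle behind internal link validation, the Python sketch below collects the links in a page and reports any site-relative link that does not match a known published path. It is a simplified illustration, not how any of the tools above work: the function and class names are hypothetical, and external links are deliberately ignored because checking them requires live HTTP requests.

```python
from html.parser import HTMLParser
from typing import List, Set


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def find_broken_internal_links(html: str, published_paths: Set[str]) -> List[str]:
    """Return site-relative links (starting with "/") that are not published."""
    collector = LinkCollector()
    collector.feed(html)
    return [
        link for link in collector.links
        # Skip external and protocol-relative ("//host/...") links.
        if link.startswith("/") and not link.startswith("//")
        and link not in published_paths
    ]
```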
Use Robots.txt Appropriately
Several CMS solutions allow end users to control the robots.txt on a page-by-page basis (most notably, HotBanana), but most leave its definition to the site developer.
Having a robots.txt file is not absolutely mandatory – its absence is most notable for generating 404 errors in server logs and messing up web log analysis tools. If you choose to implement the file, however, it is critical that it is valid and accurately defines access to site content. A mistake in implementation can prevent the major search engines from indexing any of your site content.
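One way to catch such a mistake before going live is to test the file against a few URLs you expect spiders to reach. The Python sketch below uses the standard library's robots.txt parser; the helper name and the sample rules are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Report whether the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A single stray slash blocks the whole site for every spider.
BLOCKS_EVERYTHING = "User-agent: *\nDisallow: /"

# The likely intent: keep spiders out of one directory only.
BLOCKS_ADMIN_ONLY = "User-agent: *\nDisallow: /admin/"
```

Checking http://www.url.com/index.html against the first set of rules returns False, while the second set returns True for the same page and False only for URLs under /admin/.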
Address Canonical Issues
You want your visitors to find your site whether they type in http://www.url.com, http://url.com or http://www.url.com/index.html. But you certainly don’t want the search engines to see these as separate web sites with duplicate content.
Fortunately, you can take a few simple steps to overcome this challenge. Before you go live with your newly content-managed site, select one of these URLs as the site URL. Then set up permanent (301) redirects for the other URLs. For example, if you selected www.url.com as your primary URL, you would set up a 301 redirect for url.com and www.url.com/index.html.
The major search engines do not penalize permanent redirects. This simple method overcomes most canonical issues.
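The redirect rule itself is simple enough to sketch in Python. The function below decides whether a requested URL should be answered with a 301 and, if so, where it should point. It assumes www.url.com is the chosen primary host and index.html is the only default document; in practice this logic usually lives in the web server configuration rather than in application code.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "www.url.com"        # the URL selected as the site URL
DEFAULT_DOCUMENTS = ("index.html",)   # assumption: the only default document


def canonical_redirect(url: str) -> Optional[Tuple[int, str]]:
    """Return (301, target) if url is a non-canonical variant, else None.

    Assumes every request reaching this function belongs to the one site.
    """
    parts = urlsplit(url)
    requested_path = parts.path or "/"

    # Drop a trailing default document so /index.html collapses to /.
    canonical_path = requested_path
    for doc in DEFAULT_DOCUMENTS:
        if canonical_path.endswith("/" + doc):
            canonical_path = canonical_path[: -len(doc)]

    requested = urlunsplit((parts.scheme, parts.netloc.lower(), requested_path,
                            parts.query, parts.fragment))
    canonical = urlunsplit((parts.scheme, CANONICAL_HOST, canonical_path,
                            parts.query, parts.fragment))
    return None if requested == canonical else (301, canonical)
```

Requests for http://url.com and http://www.url.com/index.html both come back as a 301 to http://www.url.com/, while a request for the primary URL needs no redirect.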
Avoid Session Variables
Session variables in URL GET requests are most frequently employed as an alternative to cookies. They “hold state,” allowing the system to track one visitor throughout a visit. If your CMS is assigning session variables, you have two safe options and one potentially risky approach:
- Investigate replacing session variables with cookies as a means of holding state. Many systems provide this as a configurable option.
Be aware that a considerable and growing percentage of visitors refuse cookies. As a result, this approach may not be viable for systems that rely heavily on holding state (such as some shopping cart systems).
- Determine whether the use of session variables can be restricted to those parts of the site that absolutely require holding state. For example, an e-commerce element of your site may require the use of session variables, but you may not require their use throughout the site. If this is the case, you may be able to improve the likelihood of pages ranking in search engines by eliminating the variable.
- In the risky approach, you custom configure the system to identify inbound search spiders (by their IP address or user agent) and consistently assign them the same session variable. This overcomes the session variable challenge by ensuring the search engines recognize the persistence of a page over time. It is risky because any time you treat a search engine spider differently from human visitors you open yourself to accusations of “cloaking,” a decidedly black hat SEO tactic. The penalties applied by search engines when they detect cloaking are harsh and include the possibility of a permanent ban from listings.
It’s a risk that needs to be weighed against the potential gain. If your site is already ranking well, then don’t risk it. If your pages are not being indexed at all, then there is little downside to a ban. Your site is probably in between and you’ll need to make a judgment call.
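Before choosing among these options, it helps to know which parts of the site are actually emitting session variables. The Python sketch below flags URLs whose query string carries a session parameter. The parameter names are assumptions based on common defaults and should be replaced with whatever your CMS uses; session IDs embedded in the path rather than the query string are not detected.

```python
from typing import Iterable, List
from urllib.parse import parse_qsl, urlsplit

# Assumption: common session parameter names, compared in lower case.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid", "jsessionid"}


def find_session_urls(urls: Iterable[str]) -> List[str]:
    """Return the URLs whose query string includes a session parameter."""
    flagged = []
    for url in urls:
        query = urlsplit(url).query
        names = {name.lower() for name, _ in parse_qsl(query, keep_blank_values=True)}
        if names & SESSION_PARAMS:
            flagged.append(url)
    return flagged
```

Running this over a crawl of the site shows whether session variables are confined to areas such as the shopping cart or are leaking into ordinary content pages.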
Template-Level Considerations
Reduce Code Clutter
Increasing the clarity and prominence of the text on a page is one of the simplest, most effective SEO tactics.
Most mature content management solutions provide the choice of using cascading style sheets (CSS) to control the format of a page. Making use of a CSS-based design – and then prohibiting the modification of this design – will eliminate much of the HTML code that would otherwise be required.
If you choose to use JavaScript, ensure that the code is incorporated into the site as an “include” file rather than in the body of the page. By defining this at the template level, you can ensure the tactic is deployed throughout the site.
Create Site Navigation as Descriptive Text
When creating the design templates for your site, ensure that the navigation elements (those links that structure access to the site) are not images or Macromedia Flash objects, but simple text links. Enforce this navigation through the use of the content management solution’s template functions.
Many content management solutions allow you to automatically generate “bread crumbs” that indicate to visitors where they are in the site. Use this feature to generate a consistent keyword-rich text element on each page.
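For systems without a built-in breadcrumb feature, the idea is easy to reproduce at the template level. The Python sketch below renders a trail of (label, URL) pairs as plain text links, leaving the current page unlinked. The function name, the separator and the sample trail are assumptions; a real template would pull the trail from the CMS page hierarchy.

```python
from html import escape
from typing import List, Tuple


def breadcrumb_html(trail: List[Tuple[str, str]], separator: str = " &gt; ") -> str:
    """Render (label, url) pairs as text links; the last item is the current page."""
    parts = []
    last = len(trail) - 1
    for position, (label, url) in enumerate(trail):
        if position == last:
            # The current page is shown as plain text, not as a link.
            parts.append(escape(label))
        else:
            parts.append(f'<a href="{escape(url)}">{escape(label)}</a>')
    return separator.join(parts)
```

A trail of Home, Products and HD Televisions produces two descriptive text links followed by the page name, all of which spiders can read.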
The Bottom Line
Embedding SEO best practices during the deployment of a content management solution can pay large dividends. If you are just about to proceed with your CMS implementation, this is definitely worth considering. And if you’re ready to expand your SEM strategy, review this Link Building Guide to explore link building tactics that help improve your search engine visibility.
About the Author
Randy Woods is a co-founder of non-linear creations. With his breadth of knowledge and experience in online strategy, content management and search marketing, Randy shares his lessons learned through the non-linear creations Leadership Series, a number of published whitepapers including Best Practices in CMS Governance, SEO and CMS: Best Practices, and the NLC Performance Framework.




Great work Randy.
I'd like to sideway into SEO linking from your observation of the static CMS publishing model, but before that, I would say that dynamic sites can be created in that model through the addition of variables into static HTML when the publishing server is set up to parse content for code.
Now, consequences of production/publishing segregation span the ability to
1. / remove complexity, and improve uptime. No production web site downtime will ever be due to the CMS itself.
2. / create efficiencies in managing a large number of websites within universes (think mini-sites, satellites, or organizations with multiple vertical or geographical markets), publishing them closest to their markets, to web servers using different class C, and obviously SEO cross-linking when appropriate.
3. / open a new world of business opportunities, something we are doing at http://www.seosamba.com and that I'd love to talk about with any SEO professionals and in-house SEO marketing managers who'd care to listen to my ramblings :-)...
Thanks Randy, for an excellent and valuable article. I think it is worth commenting on my blog how Magnolia solves the issues you have described, but for now I just want to let you know that automatic Google-Sitemap Generation has been part of Magnolia's SEO friendly offering for quite some time.
It even takes into account what we call virtual URI mappings, which is our technology that allows us to solve several of the issues stated in your article through one unified approach. For instance, it allows us to create "shortcuts" or marketing URLs in addition to our already "virtually static" URLs. E.g. http://www.magnolia-cms.com/4-0 is a page that only virtually resides at that URI, but that fact is indistinguishable for search engines. So you can create a flat hierarchy if you want.
Another use is that we can convert virtually static URLs into dynamic ones, e.g. mysite.com/product/tv/hdtv/xyz.html *looks* like a static URL, but internally can be mapped such that everything after /product is simply passed as a parameter. In other words, you can build a universe of virtual static pages with a single "real page".