Some people say that system X is better than system Y for search engine optimization (SEO) purposes. This may be true to some degree. But I've never seen a serious Web CMS project that expected to use the software without any configuration or modifications. This simply never happens.

What we see with Drupal is that, out of the box, it's not great in terms of SEO. However, with the addition of a few contributed modules and some simple configuration, Drupal will stand alongside or even in front of the majority of web content management systems. Here's what you need to know to achieve this.

The practice of Search Engine Optimization is the means that website owners have for exercising control over how search engines such as Google, Yahoo! and Bing access the content of their websites.

This article covers the basics of how to search engine optimize a Drupal 6 installation.

1. Activate Drupal's Clean URLs Feature

Drupal’s default URLs look something like this:

 http://www.example.com/index.php?q=node/1 

This is not optimal for search engines, nor for humans. Fortunately, the system has a native feature called Clean URLs. This functionality relies upon the web server to perform URL rewriting on inbound requests. Once it is enabled, Drupal will generate internal URLs using this cleaner format.

The above URL rendered in a clean format looks like this:

 http://www.example.com/node/1

This is an improvement because it no longer includes the '?' delimiter or the 'q=' parameter name in front of the path.
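To make the relationship between the two formats concrete, here is a minimal Python sketch of the translation that a rewrite rule performs on an inbound request. It is an illustration only, not Drupal or web server code, and the function name to_internal_url is hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def to_internal_url(clean_url: str) -> str:
    """Map a clean URL to the index.php?q=... form shown earlier."""
    parts = urlsplit(clean_url)
    # The path, minus its leading slash, becomes the value of "q".
    query = [("q", parts.path.lstrip("/"))]
    # Keep any query arguments that were already on the request.
    query += parse_qsl(parts.query, keep_blank_values=True)
    return urlunsplit(
        (parts.scheme, parts.netloc, "/index.php", urlencode(query, safe="/"), "")
    )


print(to_internal_url("http://www.example.com/node/1"))
# http://www.example.com/index.php?q=node/1
```

Visitors and search engines only ever see the clean form; the translation happens inside the web server.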

Enabling Clean URLs with Apache

In most cases, enabling this functionality with Apache is a no-brainer. You may even be able to turn it on during the Drupal install process. As long as Apache's mod_rewrite module is loaded, you are in good shape.

Once you or a systems administrator have verified that mod_rewrite is enabled, simply copy the .htaccess file from your Drupal software archive into the root of your Drupal installation.

Once you have verified that the file exists and matches what came with your Drupal archive, access the Drupal admin system as an administrator, navigate to the Administer > Site configuration > Clean URLs area and enable the feature.

After you click to save the configuration changes, you are done with this process. Clean URLs should now be enabled for your website.

If the feature is still not working for you, you will need to perform advanced debugging. The best place to start is with the official Drupal documentation and related discussions.

Enabling Clean URLs with IIS

Out of the box, Drupal 6 does not give you much help with IIS. The main issue is that URL rewriting is enabled in different ways depending on which version of IIS you are working with.

We're going to cheat here and only talk about IIS 7. The nice thing with IIS 7 is that Microsoft has released a free URL Rewrite add-on that plugs right into IIS. If you don't already have this installed, you can quickly install it by using the Microsoft Web Platform Installer.

IIS 7 with Microsoft's free URL Rewrite Add-on

After you've installed the URL Rewrite module, or verified that it is enabled on your IIS server, you can add the required rewrite rules to your Drupal server instance. To do this you must create or edit the web.config file in the root of your Drupal installation.

Place or merge the following rules into your web.config file:

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <system.webServer>
    <rewrite>
      <rules>
        <!-- Send requests that do not match an existing file or directory to index.php -->
        <rule name="CleanURLs" stopProcessing="true">
          <match url="^(.*)$" />
          <conditions>
            <add input="{REQUEST_FILENAME}" matchType="IsFile" negate="true" />
            <add input="{REQUEST_FILENAME}" matchType="IsDirectory" negate="true" />
          </conditions>
          <!-- {R:1} is the requested path captured by the match pattern -->
          <action type="Rewrite" url="index.php?q={R:1}" />
        </rule>
      </rules>
    </rewrite>
  </system.webServer>
</configuration>

After saving and exiting your web.config file, the critical rewrite rules will be in place and you should now be able to enable Clean URLs via the Drupal admin system.

Access the Drupal admin system as an administrator, navigate to the Administer > Site configuration > Clean URLs area and enable the feature.

After you click to save the configuration changes, you are done with this process. Clean URLs should now be enabled for your website.

If the feature is still not working for you, you will need to perform advanced debugging. The best place to start is with the official Drupal documentation and related discussions.

Drupal 6 -- Enabling the Clean URLs Feature
 

2. Enable the Path Module

The Path module is a core module with Drupal 6, which means that it is part of the normal software release.

This module takes you beyond clean URLs by providing the ability to create arbitrary URLs for any Drupal content item. With the Path module enabled you are free to optimize these URLs both for humans and for search engines.

To take advantage of this module you just need to enable it via the Drupal admin system. Log into Drupal as an administrator and navigate to the Administer > Site building > Modules section.

Here you will see a list of installed modules. Scroll down to locate the Path module, check the box to enable it and then click to save the configuration changes at the bottom of the page.

After you have enabled the Path module, Drupal's content editing screens will have a new section where you can enter an alias for the content item. Keep in mind that content aliases are very flexible in terms of format, but that they must be unique in the system.

Drupal 6 -- Entering a Content Alias (URL)

3. Install and Enable the Pathauto Module

The Pathauto module builds upon the aliasing capabilities of the Path module, but goes a step further by enabling the automatic creation of aliases.

Pathauto has extensive configuration options (beyond the scope of this article) which allow you to use various keywords or data -- called tokens in the Drupal context -- to construct URLs for different content types.
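The token idea can be sketched in a few lines of Python. This is a simplified illustration, not Pathauto's actual algorithm: the pattern and token names, the function names, the cleanup rules (lowercasing, hyphens in place of anything that is not a letter or digit) and the numeric suffix used to keep aliases unique are all assumptions chosen for the example.

```python
import re


def clean_token(value: str) -> str:
    """Lowercase a value and collapse runs of other characters into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", value.lower()).strip("-")


def build_alias(pattern: str, tokens: dict) -> str:
    """Replace each [name] placeholder in the pattern with its cleaned value."""
    return re.sub(
        r"\[([a-z_-]+)\]",
        lambda match: clean_token(tokens[match.group(1)]),
        pattern,
    )


def make_unique(alias: str, existing: set) -> str:
    """Append a numeric suffix until the alias is not already in use."""
    candidate, n = alias, 1
    while candidate in existing:
        candidate = f"{alias}-{n}"
        n += 1
    return candidate


alias = build_alias(
    "articles/[type]/[title]",
    {"type": "News", "title": "Drupal 6 & SEO: The Basics!"},
)
print(alias)                        # articles/news/drupal-6-seo-the-basics
print(make_unique(alias, {alias}))  # articles/news/drupal-6-seo-the-basics-1
```

The point to take away is that one pattern per content type yields consistent, readable URLs without anyone typing an alias by hand.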

The first step is to download, install and enable the Pathauto module. Pathauto relies upon both the Path and Token modules, so you must install and enable both of them before you can enable Pathauto.

Note:
If you're unfamiliar with installing Drupal modules, it's actually quite easy. You navigate to the module's homepage, download the latest "Released" build for your version of Drupal, decompress the archive and copy the module's folder into the "sites/all/modules" directory of your Drupal install (create the directory if it does not exist). The "modules" directory at the root of the install is meant for core modules, and keeping contributed modules out of it makes upgrades easier. Once you've put the new directory and files in place, as an administrator navigate to Administer > Site building > Modules in the Drupal admin system. You will see a list of installed modules. Scroll down to locate the recently installed module, check the box to enable it and then click to save the configuration changes at the bottom of the page.

After you have installed and enabled the Path, Token and Pathauto modules, your system will automatically generate aliases for all new or changed content items.

The Pathauto module has a large set of configuration options. At first blush the configuration page can be daunting. Fear not: there are some reasonable default settings, and with a bit of patience the options should be easy to understand.

As an administrator, navigate to the configuration page found at Administer > Site building > URL aliases > Automated alias settings. Here you will see five configuration sections.

Drupal 6 -- Pathauto Configuration Sections

For the moment, leave the General settings and Punctuation settings in their default state. Depending on what Drupal functionality you are presently using, review the other sections and make changes as you see fit. All site administrators should review the Node path settings area, as the settings here will affect the URLs for all basic content types.

Hint: Take your time with these settings. Review other sites similar to yours and map out how you want the URLs to look for each content type you will publish.

Note:
The Pathauto module depends on the Token module. Both of these modules are "contributed modules" in Drupal 6. This picture is changing with Drupal 7 though -- the Token module is being moved into the system's core. Pathauto may also move into core, but this was not clear at the time this article was written.

4. Install and Enable the Global Redirect Module

The Global Redirect module is a contributed module that must be downloaded, installed and enabled. It primarily addresses one important SEO concern: canonicalization of URLs.

Canonicalization Issues

Canonicalization issues are a fancy way of saying that it's an SEO problem if more than one URL leads to a given piece of content on your website. I like to bring up the movie Highlander and the "there can be only one" line that Christopher Lambert made famous.

Canonicalization works along the same lines. The problem is that there can easily be more than one URL for the same content -- especially when you start implementing fancy aliasing like we've done above.

The search engines, however, will not treat several URLs as equally valid addresses for one item in your site, so they have to decide which is the correct one. Generally speaking, it's better if you, the site owner, take control of this decision. That's where canonicalization control comes in.

The Global Redirect module helps quite a lot in this area. It performs the following tasks:

  • Removes trailing slashes from URLs (“/”) when the slash is not part of the canonical URL.
  • Permanently redirects any requests that refer to the homepage, but use something other than the canonical URL for the homepage address.
  • Permanently redirects any requests for content using the non-clean URL format (when the Clean URLs feature is enabled).
  • Removes unnecessary trailing zeros (“0”) when URLs address content in a taxonomy hierarchy.
  • Permanently redirects any requests for content where the case of the requested URL does not match the case of the canonical URL.
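The checks in this list amount to computing one canonical path for each request and redirecting whenever the requested path differs from it. The Python sketch below illustrates that idea for the trailing slash, non-clean format and letter case checks. It is not Global Redirect's code: the ALIASES table and the function name are hypothetical, and the step that redirects an internal path such as node/1 to its alias is an assumption made for this sketch.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Hypothetical alias table: internal path -> canonical alias.
ALIASES = {"node/1": "about-us", "node/2": "Products/Widgets"}


def redirect_target(requested_url: str) -> Optional[str]:
    """Return the canonical path to 301-redirect to, or None if none is needed."""
    parts = urlsplit(requested_url)
    path = parts.path.lstrip("/")
    used_q = False

    # Non-clean format: take the path from the q argument instead.
    q_values = parse_qs(parts.query).get("q")
    if path in ("", "index.php") and q_values:
        path = q_values[0]
        used_q = True

    # A trailing slash is not part of the canonical path.
    path = path.rstrip("/")

    # Assumption for this sketch: an internal path that has an alias
    # redirects to that alias.
    path = ALIASES.get(path, path)

    # Case mismatch: use the stored spelling of a known alias.
    by_lower = {alias.lower(): alias for alias in ALIASES.values()}
    path = by_lower.get(path.lower(), path)

    canonical = "/" + path
    if not used_q and canonical == parts.path:
        return None
    return canonical


print(redirect_target("http://www.example.com/about-us/"))           # /about-us
print(redirect_target("http://www.example.com/index.php?q=node/1"))  # /about-us
print(redirect_target("http://www.example.com/products/widgets"))    # /Products/Widgets
print(redirect_target("http://www.example.com/about-us"))            # None
```

Whatever form the request arrives in, every variant collapses to a single address, which is exactly what the search engines want to see.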

To enable this automatic functionality, download, install and enable the latest released version of the Global Redirect module. See our instructions above if you need guidance here. Once it is installed and enabled you are done with this configuration item.

Note - 1:
To handle homepage canonicalization issues (which you should), you will need to create a rewrite rule for your web server. The decision must be made whether or not you want the host name -- typically 'www' -- to be part of your canonical homepage URL.

For example, we had to choose either http://www.cmswire.com/ or http://cmswire.com/ as our homepage URL. We have chosen http://www.cmswire.com/, and if you try the other one you will be permanently redirected to the URL with 'www' in it.

Tip: When implementing the canonical homepage rewrite rule, the rule should be placed above your other rewrite rules. This redirect should happen first.

In Apache environments, simply uncomment the appropriate lines in the .htaccess file that ships with Drupal. In IIS environments, use the URL Rewrite add-on if you are on IIS 7, or another rewrite tool if you are not.

If you are using IIS 7 and the URL Rewrite add-on, you can use a rule like the one below, placed inside the rules element of your web.config file, to achieve a canonical homepage URL with the 'www':

<rule name="CanonicalHomepageURL_1" enabled="true" stopProcessing="true">
    <match url="^(.*)$" />
    <conditions>
        <!-- Match only when the host does not already start with www. -->
        <add input="{HTTP_HOST}" negate="true" pattern="^www\.(.*)$" />
    </conditions>
    <action type="Redirect" url="http://www.{HTTP_HOST}/{R:1}" redirectType="Permanent" />
</rule>
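If rewrite syntax is unfamiliar, the decision this kind of rule makes can be read more easily in a few lines of Python. This is only a sketch of the logic, with a hypothetical function name; the real work is done by the web server.

```python
from typing import Optional


def www_redirect(host: str, path: str) -> Optional[str]:
    """Return the URL to 301-redirect to, or None if the host already starts with www."""
    if host.lower().startswith("www."):
        return None
    # Prepend www. to the host and keep the requested path unchanged.
    return f"http://www.{host}/{path.lstrip('/')}"


print(www_redirect("cmswire.com", "/news"))      # http://www.cmswire.com/news
print(www_redirect("www.cmswire.com", "/news"))  # None
```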

Note - 2:
In February 2009 Google, Yahoo! and Microsoft agreed on a standard for specifying the canonical link for a webpage. You can now specify the correct URL for a page by using a small bit of HTML code, placed in the head section of an HTML document. An example of this is as follows:

 <link rel="canonical" href="http://www.example.com/" /> 

5. Install and Enable the Path Redirect Module

The Path Redirect module is a contributed module that must be downloaded, installed and enabled. For our purposes here, it primarily addresses one important SEO concern: URL change management. It is also a handy utility that allows you to create redirects from any alias to any other alias.

URL Change Management

This particular problem should not need much explaining. For various reasons (e.g., if a title is changed and the title is a token used in the URL) a URL may change over time. The ideal situation is that once a URL changes, the system knows about this and is able to gracefully redirect requests for the old URL to the new URL. And more specifically, the ideal case is that the system uses a 301 Redirect to accomplish this task.

The Path Redirect module handles just this task, but only if it is combined with the Pathauto module.

Before we go further, download, install and enable the latest released version of the Path Redirect module. See our previous instructions if you need guidance here.

After the module is installed, make sure it has been enabled in the Modules screen. Then as an administrator browse to the Pathauto configuration screen at Administer > Site building > URL aliases > Automated alias settings.

Once you see this screen, expand the General settings section and scroll down to the item labeled Update action. In this section, choose the option entitled "Create a new alias. Redirect from old alias."

If you do not see this option, it means that the Path Redirect module was either not installed successfully or is not enabled.

 

Drupal 6 -- Path Redirect and Pathauto Module Integration

Once you have selected the above option and clicked to save the configuration changes, you are done with this configuration item. When content item aliases are changed, the system now automatically generates a 301 redirect from the old URL to the new URL.
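The bookkeeping behind this behavior is easy to picture. The following Python sketch shows one way a system could remember old aliases and point them at the current one. It is an illustration under assumed names (AliasStore, set_alias, resolve), not the code used by Path Redirect or Pathauto.

```python
from typing import Dict, Optional


class AliasStore:
    """Tracks the current alias per content item plus redirects from old aliases."""

    def __init__(self) -> None:
        self.alias_for: Dict[str, str] = {}  # internal path -> current alias
        self.redirects: Dict[str, str] = {}  # old alias -> newer alias

    def set_alias(self, internal_path: str, new_alias: str) -> None:
        old_alias = self.alias_for.get(internal_path)
        if old_alias and old_alias != new_alias:
            # Keep the old URL alive by pointing it at the new one.
            self.redirects[old_alias] = new_alias
        # An alias that is in use again must not redirect away from itself.
        self.redirects.pop(new_alias, None)
        self.alias_for[internal_path] = new_alias

    def resolve(self, alias: str) -> Optional[str]:
        """Follow redirects to the current alias; None if no redirect exists."""
        seen = set()
        target = alias
        while target in self.redirects and target not in seen:
            seen.add(target)
            target = self.redirects[target]
        return target if target != alias else None


store = AliasStore()
store.set_alias("node/7", "news/old-title")
store.set_alias("node/7", "news/new-title")
store.set_alias("node/7", "news/final-title")
print(store.resolve("news/old-title"))  # news/final-title
```

Following the chain to its end means a visitor arriving on the oldest URL can be sent to the current one in a single 301 rather than hopping through every intermediate alias.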

A Word about Templates

Most of this article focuses on URLs. That is only one part of SEO, albeit an important part. Another aspect that you should spend time on is the structure of your HTML templates -- how you render your wonderful content in the browser.

There are a few basic rules that you should seek to understand and follow:

  1. Use semantic tags (e.g., H1, H2, H3) as they were intended. These tags have semantic value because they carry an implication. For example, the content in an H1 tag should say more about the page than the content in an H2 tag. Also, not using semantically meaningful tags in your content is a huge SEO mistake. Take some time to understand this concept and build your templates according to this rule.
  2. Keep it simple, stupid (KISS). This rule applies to web content and equally to your webpage templates. Use modern HTML. Keep it as light as possible. Keep HTML, CSS and content separate. Your pages will render faster and will be more accessible to both search engines and humans.
  3. Validate your HTML. If your rendered pages do not pass HTML and CSS validation tests, your website is not optimized for humans or machines. Check this regularly. Fix the problems you find.
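Part of the first rule can be checked automatically. The Python sketch below scans a rendered page for heading tags and reports two common problems. The two checks (exactly one H1, no skipped levels) are conventions assumed for this example rather than requirements stated by any search engine, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class HeadingOutline(HTMLParser):
    """Collects the levels of h1-h6 tags in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag names, so "H1" arrives here as "h1".
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def heading_problems(html: str) -> list:
    """Return human-readable problems found in the heading structure."""
    parser = HeadingOutline()
    parser.feed(html)
    problems = []
    h1_count = parser.levels.count(1)
    if h1_count != 1:
        problems.append(f"expected exactly one h1, found {h1_count}")
    for previous, current in zip(parser.levels, parser.levels[1:]):
        if current > previous + 1:
            problems.append(f"h{previous} is followed by h{current}")
    return problems


print(heading_problems("<h1>Title</h1><h3>Details</h3>"))
# ['h1 is followed by h3']
```

A check like this complements, but does not replace, full HTML and CSS validation.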

A Word about Content

Content is king. No matter how much SEO you know and apply, this rule remains true. So don't over-prioritize SEO to the detriment of your content.

SEO serves a single purpose: to make your content accessible to search engines. The quality of your content serves many other important purposes. SEO and content quality are both factors in achieving your website's business goals. Balance your priorities accordingly.