Drupal SEO in 2016: URLs

SEO is a moving target. For blackhat SEOs, it's a constant cat-and-mouse game with Google, and the penalties Google can impose on sites that use blackhat tactics should be enough to encourage you, as a site owner, to focus your efforts on whitehat SEO. Drupal has a lot of tools that help site builders format their content so that search engines can easily understand and categorize it. One of the signals Google uses is the page's URL.

So it's important to pay attention to a few things related to URLs in your Drupal config.

  1. Turn on clean URLs (also called pretty URLs). This is the minimum. Out of the box, Drupal ships a working .htaccess with the mod_rewrite rules that clean URLs need once they are enabled.

  2. Pathauto. You don't want all of your content to have URLs like /node/1337, at least not as the URLs that people and search engines see. I would recommend not worrying too much about the exact URL structure for each content type, such as including categories many levels deep. The important thing is to have the content title in the URL instead of the node ID. Put '/content/' or the content type before the title if you want (using the title alone runs the risk of a post title conflicting with some other Drupal URL). Set this up under Admin > Configuration > Search and metadata > URL aliases in the 'Patterns' tab. If you have old content that doesn't have the new URLs yet, go to the 'Bulk update' tab to apply the new pattern settings to it.

    Note for foreign languages: If you have content whose titles use non-English characters (letters with accents or other diacritics), consider using the Transliteration module. Transliteration converts accented letters into unaccented ones. For non-Latin scripts (Greek, Russian, Chinese, etc.) it replaces the characters with phonetic Latin-alphabet equivalents. If you leave the original characters in the URL, you may notice when you copy and paste it that they turn into % codes; this is percent-encoding of each character's UTF-8 bytes. In theory, Google is smart enough to read and recognize those characters, so keeping them can be an advantage with people searching in that language.

  3. Next, fix any duplicate content caused by multiple URLs pointing to the same node. This happens when you change Pathauto patterns, or simply when you change the title of a post, which gives it a new URL based on the new title while leaving the old URL in place. On top of that, /node/1337 remains a valid URL for the content if a search engine finds it. The Drupal module that solves the general problem of multiple valid URLs for a single node is Global Redirect. When a user clicks a link that isn't the main URL for a node, the module automatically redirects them to the main one. The same goes for Google. All link juice then passes to one URL, which makes your content rank more strongly and also fixes some of the duplicate content problem.

    Note for foreign language websites: By default, Pathauto removes common short English words (articles, prepositions, conjunctions) such as "a, but, by, in, the, to" from URL aliases. In your language some of those strings may be significant words (nouns, verbs), and you may want to filter out other words instead. Configure this in the 'Settings' tab under "Strings to remove".

  4. Now check your URLs for problems. The best way is Google Webmaster Tools (GWT), which Google renamed Search Console in 2015. Verify ownership of your site in GWT, make sure Google is crawling and indexing your URLs, and check for any crawl errors. For a new website, it will take a few days before you see your first results.

  5. Back on the Drupal modules page, install the XML sitemap module (xmlsitemap) and enable the content (xmlsitemap_node) and possibly taxonomy (xmlsitemap_taxonomy) submodules. The module does nothing useful until you create an initial sitemap profile, so create one, generate the sitemap and check it, then tell Google Webmaster Tools about your new sitemap. This lets Google know about new pages to crawl, which helps new content get indexed faster.

    Note for foreign language websites: You can have multiple sitemaps configured here, such as one for each language, and all can be submitted to Google.

  6. If GWT reports 404 errors, either fix the pages that contain the broken links or, if you don't control those pages, set up 301 redirects. You can do this by manually adding entries to your .htaccess file or by installing the Redirect module. The Redirect UI shows a list of 404 errors and how many times each has been requested, which helps you prioritize what to fix.
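
If you go the manual .htaccess route from step 6, the redirect entries can be generated from a list of old and new paths rather than typed by hand. The Python sketch below is only an illustration under a few assumptions: the paths, the hit counts, the example.com domain and the function name are all made up, and it assumes your server has Apache's mod_alias available so that the `Redirect` directive works. It puts the most-requested broken paths first.

```python
# Hypothetical data: broken path -> (number of 404 hits, path it should point to).
BROKEN_PATHS = {
    "/old-blog/seo-tips": (7, "/content/seo-tips"),
    "/about-us.html": (42, "/content/about-us"),
    "/content/drupal-seo-url": (19, "/content/drupal-seo-urls"),
}


def htaccess_redirects(broken_paths, base_url="http://example.com"):
    """Return mod_alias 'Redirect 301' lines, most-requested 404s first."""
    # Sort by hit count, highest first, so the worst offenders lead the list.
    ordered = sorted(
        broken_paths.items(), key=lambda item: item[1][0], reverse=True
    )
    return [
        "Redirect 301 {old} {base}{new}".format(old=old, base=base_url, new=new)
        for old, (_hits, new) in ordered
    ]


if __name__ == "__main__":
    for line in htaccess_redirects(BROKEN_PATHS):
        print(line)
```

The printed lines can be pasted into .htaccess. If you would rather manage redirects from the Drupal UI, the Redirect module covers the same ground without editing files.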

There are a few other things you can do for on-site SEO with other Drupal modules and configuration. But the above should be the bulk of optimizing your Drupal website for search engines.
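
To make step 2 and its notes more concrete, here is a rough Python sketch of what happens to a title on its way to becoming a URL alias. It is not Pathauto's or Transliteration's actual code (both are PHP modules). The function names, the 'content' prefix and the short word list are assumptions for illustration, and the accent stripping only handles diacritics; it does not romanize non-Latin scripts the way Transliteration does. The last line shows the % codes you get when an accented character is left in the URL.

```python
import re
import unicodedata
from urllib.parse import quote

# Hypothetical subset of a "Strings to remove" list.
STRINGS_TO_REMOVE = {"a", "but", "by", "in", "the", "to"}


def strip_accents(text):
    """Drop diacritics, e.g. 'é' becomes 'e'. Non-Latin scripts are not romanized."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def make_alias(title, prefix="content", strings_to_remove=STRINGS_TO_REMOVE):
    """Build an alias such as 'content/some-title' from a node title."""
    # Lowercase, strip accents, and keep only runs of ASCII letters and digits.
    words = re.findall(r"[a-z0-9]+", strip_accents(title).lower())
    # Filter out the configured filler words.
    kept = [word for word in words if word not in strings_to_remove]
    return prefix + "/" + "-".join(kept)


if __name__ == "__main__":
    print(make_alias("The Café in Zürich"))  # content/cafe-zurich
    # Without transliteration, 'é' is sent as its two UTF-8 bytes, percent-encoded.
    print(quote("content/café"))  # content/caf%C3%A9
```

Swapping the word list for one suited to your language is the equivalent of editing "Strings to remove" in the Pathauto settings.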