Prevent Duplicate Content using the Canonical Url Tag

I was recently doing an SEO audit of a small ecommerce website. One of the first things I did was run a ‘site:www.domain.com’ search in Google. Amazingly, the site in question had approximately 8200 pages indexed in Google. This was quite surprising given that the store sold fewer than 1500 unique products. The site used a horrible ecommerce module bolted onto phpnuke and had a horrible url structure, appending lots of unnecessary querystring data onto the url.

Whilst looking through the Google results for this site, I found that the majority of pages looked like this:

/index.php?tab=123&txtSearch=ALL&List=oasc&Sort=PName%2CPName&CreatedUserID=1&pageindex=40&Language=en-GB

The site has an advanced search page, whereby you can sort products using a variety of options such as ascending order, descending order, size, price etc. Letting search engines index every one of these combinations is bad for a number of reasons, but mainly because it creates duplicate content (not to mention lower SERP rankings, traffic loss and decreased page relevancy). A page of results in ascending order and the same page in descending order are essentially the same page, simply a different view of your data. You can help search engines work this out by using the relatively new canonical url link tag.

To illustrate, I’ll use an example of a typical category page, whereby you can sort a list of products in ascending and descending order, leaving you with several urls for the same category, such as:

http://www.shop.com/category.php?catName=Shirts&sortOrder=ASC

In this example the part of the querystring creating the duplicate content is the sortOrder parameter. The catName parameter should stay, as you would want your separate categories indexed.

The solution is quite simple. In the head section of your page add the following:

<link rel="canonical" href="http://www.shop.com/category.php?catName=Shirts" />

By adding this to your category page you are telling search engines (currently Google, Yahoo, Ask and Bing use this tag) that this page is a copy of http://www.shop.com/category.php?catName=Shirts, whichever sort order was requested. Indicators such as Google PageRank are also transferred to your preferred url.
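As a rough illustration of the idea (not the code from the site I audited), here is a minimal Python sketch that builds the tag for any variation of the category url. It assumes that catName is the only parameter that selects different content and that everything else, such as sortOrder, only changes the view. The function names and the CONTENT_PARAMS set are my own, made up for this example.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption for this example: only these parameters change what the page is about.
CONTENT_PARAMS = {"catName"}


def canonical_url(url, content_params=CONTENT_PARAMS):
    """Return the url with every querystring parameter dropped except the content ones."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key in content_params
    ]
    # Rebuild the url without the view-only parameters and without any #fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_link_tag(url):
    """Return the link tag to place in the head section of the page."""
    return '<link rel="canonical" href="%s" />' % escape(canonical_url(url), quote=True)


print(canonical_link_tag("http://www.shop.com/category.php?catName=Shirts&sortOrder=ASC"))
```

Both the ASC and DESC versions of the page end up with the same tag, which is exactly the point: however the visitor sorted the list, the search engine is pointed at one url.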

The canonical url tag has many uses and can be used to help with the following issues:

  • Pages that contain session IDs appended to the querystring
  • Search results pages that append search data to the querystring
  • Print versions of a page
  • Duplicate content for www. and non-www. pages (in your canonical tag you would include your preferred url)
  • Same content contained in multiple categories, e.g. a product contained in multiple categories on an online store
  • Removing affiliate ids in the url
  • Preventing multiple pages of a discussion topic with comments from being indexed, e.g. shop.com/post.php?id=123&page=1
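For the session ID and affiliate ID cases in the list above, the preferred url can be worked out by removing a known set of "noise" parameters and keeping everything else. The sketch below is only an illustration: the parameter names in NOISE_PARAMS and the product.php url are assumptions for the example, so swap in whatever your own site actually appends.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names: replace with the ones your site really uses.
NOISE_PARAMS = {"phpsessid", "sid", "affid"}


def strip_noise(url, noise_params=NOISE_PARAMS):
    """Return the url without session or affiliate parameters (names compared case-insensitively)."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in noise_params
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


# A made-up product url carrying a session ID and an affiliate ID.
print(strip_noise("http://www.shop.com/product.php?id=42&PHPSESSID=abc123&affid=77"))
```

The result is the url you would put in the href of the canonical tag. A list of parameters to remove suits this case because new, meaningful parameters keep working without any changes.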

You can read more about the canonical tag at the official Google Webmaster Blog. Matt Cutts also has a 20 minute video explaining the canonical tag in more depth.

The main point to consider is that the canonical tag is simply a hint and not a directive. It is another method to give search engines help in indexing your content. This is very useful when working on existing sites already indexed by Google. However, on new sites a bit more planning can help. For instance, in a previous article I covered 301 redirects for SEO using htaccess (how to set a preferred version of your site via htaccess). On an ecommerce store you could avoid appending search data to the querystring in the first place.
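The htaccess rules themselves are in the earlier article, so I won't repeat them here. To show just the decision a www. redirect makes, here is a small Python sketch of the logic. It assumes www.shop.com is the preferred host and that every request reaching the function belongs to your own site. The names are made up for the example.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption for this example: the www. version is the preferred one.
PREFERRED_HOST = "www.shop.com"


def redirect_for(url, preferred_host=PREFERRED_HOST):
    """Return (301, target url) when the request is on a non-preferred host, otherwise None."""
    parts = urlsplit(url)
    if parts.netloc.lower() == preferred_host:
        return None  # already on the preferred host, serve the page as normal
    target = urlunsplit(
        (parts.scheme, preferred_host, parts.path, parts.query, parts.fragment)
    )
    return 301, target


print(redirect_for("http://shop.com/category.php?catName=Shirts"))
print(redirect_for("http://www.shop.com/category.php?catName=Shirts"))
```

Unlike the canonical tag, a 301 is not a hint: the visitor and the search engine are both sent to the preferred url, which is why the two methods work well alongside each other.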

EDIT: WordPress and the All in One SEO plugin generate canonical link tags for blog posts. For example, comments are separated into multiple pages, e.g.


https://www.web-design-talk.co.uk/157/how-to-deal-with-difficult-clients-using-split-testing/comment-page-1/#comment-344

With the actual content being at:


https://www.web-design-talk.co.uk/157/how-to-deal-with-difficult-clients-using-split-testing

If you have a quick look at the source code to the comments page you’ll see the following has been added:

<link rel="canonical" href="https://www.web-design-talk.co.uk/157/how-to-deal-with-difficult-clients-using-split-testing/" />
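I haven't looked at how WordPress or the plugin works this out internally, so the following is only a sketch of the mapping itself: take a paged comment url, drop the trailing /comment-page-N/ segment and the #comment anchor, and you are left with the url of the post. The function name and the regular expression are my own assumptions based on the url above.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Matches a trailing "/comment-page-<number>/" segment, as seen in the url above.
COMMENT_PAGE = re.compile(r"/comment-page-\d+/?$")


def post_url(comment_page_url):
    """Return the url of the post that a paged comment url belongs to."""
    parts = urlsplit(comment_page_url)
    path = COMMENT_PAGE.sub("/", parts.path)
    # Drop the querystring and the #comment-... anchor as well.
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


print(post_url(
    "https://www.web-design-talk.co.uk/157/"
    "how-to-deal-with-difficult-clients-using-split-testing/comment-page-1/#comment-344"
))
```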

Published by

Rob Allport

Web Developer based in Stoke-on-Trent, Staffordshire

10 thoughts on “Prevent Duplicate Content using the Canonical Url Tag”

  1. Hi,

    Thanks for the comment. The canonical tag (like 301 redirects and the meta you mention) is meant as another hint to search engines to index the correct, unique content. It’s another sign you can send to search engines saying where the correct content is located (I’d advise doing it alongside another method such as 301 redirects or the method I described in a previous post on www. and non www. versions of your site). For the effort required to implement this into something like WordPress or your own website, I think it’s definitely worth it.

    It’s also worth noting that overall you’re not only trying to help users, but also consolidate your page rankings from multiple urls to a single one, thereby avoiding duplicate content penalties.

    I personally believe the more methods we have to prevent duplicate content, the better. For example, to manually integrate the canonical tag into your WordPress install the process is quite pain free. There’s an excellent simple article here on how to edit WordPress to prevent duplicate content (you only have to edit your header.php file, simple!)

    There, I wrote way too much again for a comment reply 🙂

  2. Possibly good for other major search engines, as Google no longer penalizes duplicate content (and hasn’t for some time) but rather just filters it out.
