I was recently doing an SEO audit of a small ecommerce website. One of the first things I did was run a ‘site:www.domain.com’ query in Google. Amazingly, the site in question had approximately 8,200 pages indexed in Google. This was quite surprising, as the store sold fewer than 1,500 unique products. The site used a horrible ecommerce module bolted onto PHP-Nuke and had a horrible URL structure, appending lots of unnecessary querystring data onto the URL.
Looking through the Google results for this site, I found that the majority of indexed pages were as follows:
The site has an advanced search page where you can sort products using a variety of options such as ascending order, descending order, size, price etc. This is bad for a number of reasons, but mainly because it creates duplicate content (not to mention lower SERP rankings, traffic loss and decreased page relevancy). A page of results in ascending order and the same results in descending order are essentially the same page, simply a different view of your data. You can help search engines deal with this by using the relatively new canonical URL link tag.
To illustrate, I’ll use an example of a typical category page where you can sort a list of products in ascending and descending order, leaving you with a number of URLs as follows:
In this example the part of the querystring creating the duplicate content is the sortOrder parameter. The catName parameter should stay, as you would want your separate categories indexed.
The solution is quite simple. In your head tag add the following:
<link rel="canonical" href="http://www.shop.com/category.php?catName=Shirts" />
By adding this to your category page you are telling search engines (currently Google, Yahoo, Ask and Bing use this tag) that this page is a copy of http://www.shop.com/category.php?catName=Shirts. Indicators such as Google PageRank are also transferred to your preferred URL.
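If your category pages are generated by a script, the tag can be built automatically rather than typed by hand. The following is a minimal Python sketch of the idea, not code from the site I audited: the function name `canonical_link_tag` and the `sortOrder=asc` value in the sample URL are assumptions made for illustration. It drops the sortOrder parameter, keeps everything else, and wraps the result in the link tag shown above.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonical_link_tag(url, ignored_params=("sortOrder",)):
    """Return a canonical <link> tag for url with view-only parameters removed."""
    parts = urlsplit(url)
    # Keep every query parameter except the ones that only change the view.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in ignored_params
    ]
    # Rebuild the URL without the ignored parameters and without any fragment.
    canonical = urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), "")
    )
    return '<link rel="canonical" href="%s" />' % escape(canonical, quote=True)


# Hypothetical sorted view of the Shirts category.
print(canonical_link_tag(
    "http://www.shop.com/category.php?catName=Shirts&sortOrder=asc"
))
# <link rel="canonical" href="http://www.shop.com/category.php?catName=Shirts" />
```

Both the ascending and descending views end up with the same tag, which is the whole point: every sorted view points back to the one URL you want indexed.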
The canonical url tag has many uses and can be used to help with the following issues:
- Pages that contain session IDs appended to the querystring
- Search results pages that append search data to the querystring
- Print versions of a page
- Duplicate content for www. and non-www. pages: in your canonical tag you would include your preferred URL
- Same content contained in multiple categories, e.g. a product contained in multiple categories on an online store
- Removing affiliate IDs in the URL
- Preventing multiple pages of a discussion topic with comments from being indexed, e.g. shop.com/post.php?id=123&page=1
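Several of the cases in this list come down to the same operation: pick a preferred host and throw away the querystring parameters that do not change the content. Here is a rough Python sketch under stated assumptions. The parameter names `PHPSESSID`, `affid` and `print`, the `product.php` URLs and the choice of www.shop.com as the preferred host are all hypothetical, so substitute whatever your own site uses.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed names for session, affiliate and print-view parameters.
NOISE_PARAMS = {"PHPSESSID", "affid", "print"}
# Assumed preferred host, plus the alternate host that should fold into it.
PREFERRED_HOST = "www.shop.com"
ALTERNATE_HOSTS = {"shop.com"}


def canonical_url(url):
    """Return the preferred URL for a page, ignoring noise parameters."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    # Fold the non-www. host into the preferred www. host.
    if host in ALTERNATE_HOSTS:
        host = PREFERRED_HOST
    # Drop session IDs, affiliate IDs and the print flag; keep the rest.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in NOISE_PARAMS
    ]
    return urlunsplit((parts.scheme, host, parts.path, urlencode(kept), ""))


# Hypothetical variants of one product page.
variants = [
    "http://www.shop.com/product.php?id=42",
    "http://shop.com/product.php?id=42&affid=abc",
    "http://www.shop.com/product.php?id=42&PHPSESSID=123&print=1",
]
# All three variants collapse to a single preferred URL.
print({canonical_url(u) for u in variants})
```

Running a list of indexed URLs through a function like this is also a quick way to estimate how many of them are really duplicates of each other.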
You can read more about the canonical tag at the official Google Webmaster Blog. Matt Cutts also has a 20-minute video explaining the canonical tag in more depth.
The main point to consider is that the canonical tag is simply a hint and not a directive. It is another way to help search engines index your content. This is very useful when working on existing sites already indexed by Google. However, on new sites a bit more planning can help. For instance, in a previous article I covered 301 redirects for SEO using htaccess, which shows how to set a preferred version of your site via htaccess. On an ecommerce store you could avoid appending search data to the querystring.
EDIT: WordPress and the All in One SEO plugin generate canonical link tags for blog posts. For example, comments are separated into multiple pages, e.g.
With the actual content being at:
If you have a quick look at the source code of the comments page you’ll see the following has been added:
<link rel="canonical" href="http://www.web-design-talk.co.uk/157/how-to-deal-with-difficult-clients-using-split-testing/" />
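If you would rather not read through the page source by hand, the check can be scripted. The sketch below uses Python's standard `html.parser` module to pull the canonical URL out of a page's HTML. The class name `CanonicalFinder` and the helper `find_canonical` are my own illustrative names, and the sample page is a stripped-down stand-in for the real comments page, not its actual markup.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Record the href of the first <link rel="canonical"> tag seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # Only the first canonical link matters; ignore everything else.
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values, in any letter case.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attributes.get("href")


def find_canonical(html_source):
    """Return the canonical URL declared in html_source, or None."""
    finder = CanonicalFinder()
    finder.feed(html_source)
    return finder.canonical


# Simplified stand-in for the source of a paginated comments page.
page = (
    '<html><head><link rel="canonical" '
    'href="http://www.web-design-talk.co.uk/157/'
    'how-to-deal-with-difficult-clients-using-split-testing/" />'
    "</head><body></body></html>"
)
print(find_canonical(page))
```

A page with no canonical tag returns None, which makes it easy to spot templates where the tag has been forgotten.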