Canonical Formats and Query Strings – IIS SEO Toolkit

Today somebody was running the IIS SEO Toolkit, and the Site Analysis feature flagged a lot of violations of “The page contains multiple canonical formats.” The reason, apparently, is that he uses query string parameters to pass contextual or other information between pages. This of course raises the question: does that mean query strings in general are bad news SEO-wise?

Well, the answer is not necessarily.

I will start by clarifying what this violation in Site Analysis means: our algorithm detected that two or more URLs appear to serve the same content. Note that we make no assumptions based on the URL itself (including query string parameters). This kind of situation is bad for a couple of reasons:

  1. Because they look like the same page, search engines will probably choose one of them to index as the real content and discard the other. The problem is that you are leaving this decision to the search engines, which means some might choose the wrong version and end up using the one with query string parameters instead of the clean one (not likely, though). Or, even worse, they might end up indexing both of them as if they were different pages.
  2. When other websites look at your content and link to it, some of them might end up using the URL with query string parameters and some might not. This means the organic linking will not give you the full benefit it otherwise would. Remember that search engines give you “extra” points when somebody external references your page, but now you’ll be splitting those earnings between “two pages” instead of a single canonical form.

Query strings by themselves do not pose a terrible threat to SEO; most modern search engines deal with them fine. It is the diluted organic linking and the potential abuse of query strings that can give you headaches.

Remember, search engines make no special allowance for the fact that a single “page” serves tons of content through one absolute path plus query string variations. This is typical in many setups, such as sites built around index.php, where pretty much every page on the site is served by the same resource, with only the query string or path information varying.


So what should I do?

Well, there are several things you could do, but probably one of the easiest is to tell search engines (more specifically, crawlers or bots) not to index the URLs whose query string variations exist only to pass application state, not to specify different content. You can do this with the Robots Exclusion Protocol, using wildcard matching to disallow any URL that contains a ‘?’. Make sure you are not blocking URLs that actually are supposed to be indexed. To verify this, run the Site Analysis feature again and it will flag an informational message for each URL that is not visited due to the robots exclusion file.

User-agent: *
Disallow: /*?
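
To see which URLs a rule like the one above actually matches, here is a minimal sketch (my own illustration, not IIS SEO Toolkit or crawler code) of the common wildcard extension to the Robots Exclusion Protocol, where `*` matches any run of characters and a trailing `$` anchors the end of the URL:

```python
import re

def robots_rule_matches(disallow_pattern: str, url_path: str) -> bool:
    """Return True if a Disallow pattern (with the common '*' and '$'
    wildcard extensions) matches a URL path plus its query string."""
    # Escape regex metacharacters, then restore '*' as "match anything".
    regex = re.escape(disallow_pattern).replace(r"\*", ".*")
    # A trailing '$' anchors the pattern at the end of the URL;
    # otherwise the match is a prefix match from the start of the path.
    if regex.endswith(r"\$"):
        regex = regex[:-2] + "$"
    return re.match(regex, url_path) is not None

# "Disallow: /*?" blocks every URL that contains a query string.
print(robots_rule_matches("/*?", "/product.php?id=1"))  # True
print(robots_rule_matches("/*?", "/product.php"))       # False
```

Because the pattern only needs a ‘?’ anywhere after the leading slash, this one rule covers every query string variation on the site.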


In summary, try to keep the canonical forms yourself; don’t leave guesses to search engines, because some of them might get it wrong. There is also a newer way of specifying the canonical form in your markup, but it is “very recent” (as in 2009) and some search engines do not support it yet (I believe the top three do, though): the new rel="canonical" link tag:

<link rel="canonical" href="" />
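
For example (the URL is purely illustrative), every query string variation of a product page could point search engines at the one URL you want indexed:

```html
<link rel="canonical" href="http://example.com/product.php?id=1" />
```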

In the Beta 2 version of IIS SEO Toolkit we will support this tag and have better detection of these canonical issues. So stay tuned.

Another way to solve this is to use URL Rewrite, which lets you easily redirect or rewrite your URLs to get rid of the query strings and serve more SEO-friendly URLs.
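
As a sketch of that approach (the rule name and URL pattern are made up for illustration), a URL Rewrite rule in web.config could serve a clean URL from the query string version internally:

```xml
<configuration>
  <system.webServer>
    <rewrite>
      <rules>
        <!-- Hypothetical rule: serve /product/1 from product.php?id=1 -->
        <rule name="Clean product URLs" stopProcessing="true">
          <match url="^product/([0-9]+)$" />
          <action type="Rewrite" url="product.php?id={R:1}" />
        </rule>
      </rules>
    </rewrite>
  </system.webServer>
</configuration>
```

Pair the rewrite with links that only ever use the clean form, so external sites never see the query string version in the first place.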

Comments (4)

  1. ebaysniper says:

    Thanks for posting the quick helpful fix.  I made the change you recommended to my robots.txt file and hope for the best next time a search engine crawls my site.

  2. CarlosAg says:

    Make sure to run Site Analysis afterwards to clearly see which URLs will be blocked, to ensure you are not removing content that you did not intend to.

  3. Stellar Post Carlos…

    Even though I am an Apache and PHP type personally, you have piqued my interest to dig into the new SEO Toolkit.

    MSN is on the move on multiple fronts, first the Bing / Kumo / Live makeover and now this.

    Thanks for the great tips on selecting the preferred parameters by exclusion using robots.txt.

  4. OneTwib says:

    Just for information

    Disallow: /*?  will block every URL on your website that uses a query string.

    You can block only some of them, like this:


    A product page could be /product.php?id=1

    Let's say you use a query string to manage the layout like that:

    • /product.php?id=1&display=1
    • /product.php?id=1&display=2

    These two URLs have the same content, so block the display variations:

    Disallow: /product.php?id=*&display=

    In that case,

    /product.php?id=1 IS NOT blocked

    /product.php?id=1&display=1 IS blocked