Tracking URLs indexed by Google, Yahoo! and MSN search engines – the proliferation of duplicate content. - Web Experience & SEO -

Like many other enterprise companies, HP uses a fairly common tracking parameter across the site, and URLs that include these tracking codes have historically been indexed by Google, Yahoo! and MSN. We've seen several of our pages get indexed with this tracking string on the URL and served in results for popular queries. Within the last two weeks, however, Google seems to have figured out some of these tracking URLs and cleaned them out of its index. We're still ranking similarly for the pages we've been monitoring, but the tracking URL no longer shows in the search results.

Here is an example:

For the search term "color inkjet printers", the following was the result on April 15. Note the tracking parameter starting with ?jumpid= at the end of the URL.

Color Inkjet Printers At a glance - HP Small & Medium Business ...

Summary of all HP Color Inkjet Printers currently available for purchase and recommended for Small & Medium Business. Includes links to compare products, ...
h10010.www1.hp.com/wwpc/us/en/sm/WF02a/18972-236251-236261.html?jumpid=re_R163/GW/color_inkjet – 142k - Oct 17, 2006 - Cached - Similar pages

On April 25 the following result appeared in the same position. Note that no tracking parameters appear at the end of the URL.

Color Inkjet Printers At a glance - HP Small & Medium Business ...

Summary of all HP Color Inkjet Printers currently available for purchase and recommended for Small & Medium Business. Includes links to compare products, ...
h10010.www1.hp.com/wwpc/us/en/sm/WF02a/18972-18972-236251.html - 150k - Cached - Similar pages - Note this - Filter

We've seen this shift for several, but not all, pages that rank highly for popular search queries, and in Google only: MSN and Yahoo still seem to be indexing and serving the tracking URL in lieu of the primary URL of the page. While this is great progress toward removing duplicate content for HP and for Google, I do have questions:

(1) Is Google removing the URL with the tracking code from the index entirely – thus truly reducing duplicate content?

(2) If so, is the link popularity from that URL being transferred to the base URL (sans tracking code)?

OR -

(1) Is Google simply stripping off the tracking parameter from the URL and not consolidating link popularity to the base URL?

(2) If so, is there a way we can help Google understand that they should ignore not entire URLs with jumpids but just the jumpid itself, thus salvaging the link popularity for the base URL?

I wish the engines were more transparent and would provide some guidance on how to deal with this issue. Here are some ideas for how it might be addressed:

(1) Use wildcards in the robots.txt file to block spiders from crawling pages appended with a designated tracking code. I'm not sure whether Google respects the wildcard, but Yahoo's Priyank Garg suggests this very solution for excluding tracking URLs from their index. I don't believe this will actually help: I think it will tell the crawlers to simply ignore URLs with jumpids, which would reduce overall link popularity and could remove the URL we have with the most inbound links from search results altogether.

(2) Create a script that redirects only search crawlers (via a 301) from any URL with a tracking parameter to the base URL.  Yahoo's Sean Suchter suggested this at the duplicate content session at SES NY but I haven’t been able to find documentation on how exactly to implement this.  Also, could this solution potentially be seen as cloaking?

(3) Create a script for every single tracking URL that drops a cookie and resolves to the base URL instead of the URL with the tracking parameter. 

(4) Can we tell Google in some other fashion, through Google Webmaster Tools or some other way, which tracking parameters we use and when to ignore that portion of the URL? This seems like the cleanest/easiest solution.
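
To make ideas (2) and (4) a little more concrete, here is a minimal Python sketch of the logic involved. It is not the script Sean Suchter referred to, which I have not seen. It assumes jumpid is the only tracking parameter and that crawlers can be recognized by substrings in the User-Agent header; the function names and the crawler list are mine and purely illustrative.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumption: "jumpid" is the only tracking parameter; extend the set as needed.
TRACKING_PARAMS = {"jumpid"}

# Assumption: substrings that identify the major crawlers' User-Agent headers.
CRAWLER_TOKENS = ("googlebot", "slurp", "msnbot")


def strip_tracking(url):
    """Return url without its tracking parameters; other parameters are kept."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    kept = [(k, v) for k, v in pairs if k.lower() not in TRACKING_PARAMS]
    if len(kept) == len(pairs):
        return url  # nothing to strip, so leave the URL exactly as it was
    return urlunsplit(parts._replace(query=urlencode(kept)))


def is_crawler(user_agent):
    """Rough check for a search crawler based on the User-Agent header."""
    ua = user_agent.lower()
    return any(token in ua for token in CRAWLER_TOKENS)


def redirect_for(url, user_agent):
    """Return (301, base_url) when a crawler requests a tracking URL, else None."""
    base = strip_tracking(url)
    if base != url and is_crawler(user_agent):
        return 301, base
    return None
```

With the April 15 URL above (prefixed with http://), strip_tracking returns the same address without the ?jumpid= portion, and redirect_for answers a Googlebot user agent with a 301 to that address while leaving ordinary visitors alone. The is_crawler check is exactly the part that raises the cloaking question; dropping it would redirect every visitor, which is closer to idea (3) without the cookie.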

My objectives are to:

Reduce duplicate content in search indices by having only the base URL indexed

Retain and consolidate link popularity to the base URL

Improve the accuracy of campaign metrics by having only the base URLs appear in natural search results
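
The first two objectives can be pictured with a small Python sketch: if inbound links are counted per indexed URL, collapsing every tracking variant onto its base URL leaves one entry that carries the combined count. This only illustrates the idea and says nothing about how any engine actually computes link popularity. It assumes the query string holds nothing but tracking data, and the function names are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit, urlunsplit


def base_url(url):
    """Drop the query string and fragment.

    Assumption: the query string carries only tracking data such as jumpid.
    """
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def consolidate(link_counts):
    """Sum per-URL inbound link counts under each base URL."""
    totals = Counter()
    for url, count in link_counts.items():
        totals[base_url(url)] += count
    return dict(totals)
```

Fed a mapping in which two jumpid variants and the bare URL of one page each have a few inbound links, consolidate returns a single entry for the bare URL whose count is the sum of the three.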

I've posed the questions above to Matt Cutts (who is apparently preparing for a month-long vacation) and Vanessa Fox at Google but have not heard back from them yet. I've also followed up with Sean Suchter to try to find out more about the redirect script he referred to at SES NY.

It seems the search engines have the same objectives of eliminating duplicate content created by tracking URLs and consolidating link popularity to the base URL for better, more relevant results.  I’d like to see them look to the web site owners for more direction on how to navigate and index the pages within these sites.  


Posted 04-27-2007 6:55 PM by BlogArchive

Comments

Laura Dansbury wrote Re: Tracking URLs indexed by Google, Yahoo!
on 04-30-2007 9:47 PM
#4, Developing a method to tell the engines what tracking parameters we use and when to ignore that portion of the URL seems to be the best solution. It follows the existing methodology of using sitemaps to tell the spiders which directories to exclude from indexing and where the existing sitemaps are. This method provides an industry standard and gives publishers a simple way to communicate to the spiders without changing existing processes established for very legitimate reasons just to accommodate search.
franciscocheng wrote Re: Tracking URLs indexed by Google, Yahoo!
on 05-29-2007 2:04 PM
Hey, my name is Francisco Cheng and I am the SEO Manager for thestreet.com. We constantly struggle with this duplicate content issue as well. We are currently working on a solution like the solution #2 you mentioned, which is to 301 redirect all links with tracking parameters to the original URL. I don't think that would be considered cloaking. Francisco franciscocheng@gmail.com
Tanya Rietze wrote Re: Tracking URLs indexed by Google, Yahoo!
on 05-31-2007 2:58 PM
Thanks for the comment, Francisco. I agree that a 301 is a good solution and acceptable to the engines. Whether it's feasible for all companies is another concern. I know there are some areas where we have challenges implementing a 301 for all these types of URLs, whether the obstacle is systematic or procedural. I'd like to see the engines give us some more alternatives to help them clean up their indexes. Tanya
nathangorg wrote Re: Tracking URLs indexed by Google, Yahoo!
on 06-26-2007 3:06 AM
There is another way to deal with this: URL patterns can be encoded in what is called 'savantic codes'. Not many people know this but it can be done.
Tanya Rietze wrote Re: Tracking URLs indexed by Google, Yahoo!
on 06-26-2007 2:53 PM
Wow, thanks for the tip on 'savantic codes'. I'll be looking into this one ASAP!
