Like many other enterprise companies, HP uses a fairly common tracking parameter across the
site, and URLs including these tracking codes have historically been indexed by
Google, Yahoo! and MSN. We've seen several of our pages get indexed with this
tracking string on the URL and served in results for popular queries. Within the last two weeks,
however, Google seems to have recognized some of these tracking URLs and
cleaned them out of its index. We're still ranking similarly for the
pages we've been monitoring, but the tracking URL no longer shows in the
search results.
Here is an example:
For the search term "color inkjet printers", the following was the result on April 15.
Note the tracking parameter beginning with ?jumpid= at the end of the URL.
Color Inkjet Printers At a glance - HP Small &
Medium Business ...
Summary of all HP Color Inkjet Printers currently available
for purchase and recommended for Small & Medium Business. Includes links
to compare products, ...
h10010.www1.hp.com/wwpc/us/en/sm/WF02a/18972-236251-236261.html?jumpid=re_R163/GW/color_inkjet – 142k - Oct 17, 2006 - Cached
- Similar
pages
On April 25 the following result appeared in the same position.
Note that no tracking parameters appear at the end of the URL.
Color Inkjet Printers At a glance - HP Small &
Medium Business ...
Summary of all HP Color
Inkjet Printers currently available for purchase and recommended for
Small & Medium Business. Includes links to compare products, ...
h10010.www1.hp.com/wwpc/us/en/sm/WF02a/18972-18972-236251.html
- 150k - Cached
- Similar
pages - Note
this - Filter
We've seen this shift for several, but not all, pages that ranked highly for popular search
queries, and in Google only. MSN and Yahoo still seem to be indexing and serving
the tracking URL in lieu of the primary URL of the page. While this is great progress toward
removing duplicate content, for both HP and Google, I do have questions:
(1) Is Google
removing the URL with the tracking code from the index entirely – thus truly reducing
duplicate content?
(2) If so, is the
link popularity from that URL being transferred to the base URL (sans tracking
code)?
OR -
(1) Is Google
simply stripping off the tracking parameter from the URL and not consolidating
link popularity to the base URL?
(2) If so, is there a way we can help Google understand that it should ignore
just the jumpid itself, not entire URLs with jumpids, thus salvaging the link
popularity for the base URL?
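To make the difference between the two scenarios concrete, here is a minimal Python sketch of what "ignoring just the jumpid" would mean. It is only an illustration, not a description of how any search engine actually works. The function names `base_url` and `consolidate` are hypothetical, the link counts are made-up numbers, and it assumes `jumpid` is the only tracking parameter in use.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: jumpid is the only tracking parameter that needs to be ignored.
TRACKING_PARAMS = {"jumpid"}


def base_url(url):
    """Return the URL with tracking parameters dropped from the query string."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
    ]
    # Note: urlencode re-encodes the remaining parameters, so their
    # escaping may differ slightly from the original URL.
    return urlunsplit(parts._replace(query=urlencode(kept)))


def consolidate(link_counts):
    """Sum hypothetical inbound-link counts over URLs that share a base URL."""
    totals = Counter()
    for url, count in link_counts.items():
        totals[base_url(url)] += count
    return dict(totals)
```

In the first scenario the links pointing at the jumpid URL would be credited to the base URL, as `consolidate` does here. In the second, only the displayed URL changes and the two URLs keep separate link counts.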
I wish the engines were more transparent on this issue and would provide some guidance on
how to deal with it. Here are some ideas for how it might be addressed:
(1) Use wildcards in the robots.txt file to block spiders from crawling pages appended with
a designated tracking code. I'm not sure if Google respects the wildcard, but Yahoo's Priyank
Garg suggests using this very solution for excluding tracking URLs from
their index. I believe this won't actually help, as I think it will
tell the crawlers to simply ignore URLs with jumpids. That would reduce overall
link popularity and could remove the most popular (from a link standpoint) URL
we have from search results.
(2) Create a
script that redirects only search crawlers (via a 301) from any URL with a
tracking parameter to the base URL. Yahoo's Sean Suchter suggested this
at the duplicate content session at SES
NY but I haven’t been able to find
documentation on how exactly to implement this. Also, could this solution
potentially be seen as cloaking?
(3) Create a
script for every single tracking URL that drops a cookie and resolves to the
base URL instead of the URL with the tracking parameter.
(4) Can we tell Google in some other fashion, through Google Webmaster Tools or
otherwise, which tracking parameters we use and when to ignore that portion
of the URL? This seems like the cleanest and easiest solution.
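For idea (2), the redirect decision could look something like the Python sketch below. This is not the script Sean Suchter referred to, which remains undocumented as far as I can tell. It is a guess at the general shape, under stated assumptions: the function names are hypothetical, crawlers are detected by a substring match on the User-Agent header (which can be spoofed), and `jumpid` is the only tracking parameter.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: jumpid is the only tracking parameter in use.
TRACKING_PARAMS = {"jumpid"}

# Assumption: crawlers are recognized by these User-Agent substrings.
CRAWLER_TOKENS = ("googlebot", "slurp", "msnbot")


def strip_tracking(url):
    """Return the URL with tracking parameters dropped from the query string."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def crawler_redirect(url, user_agent):
    """Return (301, base URL) for a crawler requesting a tracking URL, else None."""
    is_crawler = any(token in user_agent.lower() for token in CRAWLER_TOKENS)
    target = strip_tracking(url)
    if is_crawler and target != url:
        return 301, target
    # Regular visitors keep the tracking URL so campaign metrics still work.
    return None
```

Because crawlers and visitors are treated differently here, this is exactly the behavior that raises the cloaking question above, even though both end up seeing the same page content.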
My objectives are to:
Reduce duplicate content in search indices by having only the base URL indexed
Retain and consolidate link popularity to the base URL
Improve the accuracy of campaign metrics by having only the base URLs appear in natural search results
I've posed the questions above to Matt Cutts (who is apparently preparing for a
month-long vacation) and Vanessa Fox at Google but have not heard back from
them yet. I've also followed up with Sean Suchter to try to find out more about
the redirect script he referred to at SES NY.
It seems the search engines share these objectives: eliminating duplicate
content created by tracking URLs and consolidating link popularity to the base
URL for better, more relevant results.
I'd like to see them look to web site owners for more direction on
how to navigate and index the pages within these sites.