What is Google Binary Search and Should We Fear It? - Michael Sutton's Blog
What is Google Binary Search and Should We Fear It?

Background

The so-called Google Binary Search (GBS) gained a fair bit of press attention in July 2006, when PC World published an article entitled 'Google's Binary Search Helps Identify Malware'. In the article, Websense revealed that they had used an undocumented Google search feature to identify malicious code. At the time, Websense Senior Director of Security, Dan Hubbard, indicated that he planned to privately share the code that they were using among fellow security researchers, but would not be making it public.

GBS was back in the news a couple of weeks later when PC World published a follow-up article. This time around, the article discussed HD Moore's Malware Search project, which had recently been made public. Moore downplayed the threat of GBS being used to obtain malcode and argued that it was more useful for identifying sites that distribute malware.

To the best of my knowledge, GBS is not publicly documented or mentioned by Google. This in and of itself is interesting for a company that typically provides early insight into research projects via Google Labs. After a fair bit of searching, I was unable to find much beyond Moore's Ruby source code to provide insight into how GBS works. If you don't speak Ruby, this blog is for you.

How it works

Google's crawlers appear to be downloading a sampling of executable files in addition to the web pages, documents, etc. that the search engine is typically used to find. With executable files, rather than indexing human-readable strings within the file, Google instead parses the header of the executable and indexes information from it. Executable files carry a header near the start of the file, which contains basic information such as the type of executable in question, where various segments exist, etc. This information is needed by the operating system that is launching the file. Windows executables, object code, and DLLs adhere to a file format known as the Portable Executable (PE) format. Therefore, all such files contain a PE header, and this is the information indexed by Google for Windows executables.
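
To make the header layout concrete, here is a minimal Python sketch (not Google's or HD Moore's code) that locates the PE header in a file's bytes and reads two basic values from it: the signature and the machine type. It assumes the standard PE layout, in which the 4-byte offset of the PE header is stored at position 0x3C of the leading MS-DOS stub. The function name read_pe_identity and the small MACHINE_NAMES table are purely illustrative.

```python
import struct

# 0x014C is the PE/COFF machine code for Intel 386; other codes fall back to hex.
MACHINE_NAMES = {0x014C: "Intel 386"}


def read_pe_identity(data: bytes) -> dict:
    """Return the PE signature and machine type found in raw file bytes."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ/PE file")
    # The offset of the PE header is a little-endian uint32 stored at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    # The PE header starts with a 4-byte signature followed by a 2-byte machine code.
    signature, machine = struct.unpack_from("<IH", data, pe_offset)
    if signature != 0x00004550:  # the bytes "PE\0\0" read as little-endian
        raise ValueError("PE signature not found")
    return {
        "Signature": f"{signature:08X}",
        "Machine": MACHINE_NAMES.get(machine, f"{machine:04X}"),
    }
```

Called on the bytes of a 32-bit x86 Windows executable, this should report a signature of 00004550 and a machine type of Intel 386, which are the two phrases used in the example search that follows.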

To see an example of this, do the following:

  1. Conduct a Google search for: "Signature: 00004550"+"Machine: Intel 386"
  2. Click on any of the "View as HTML" links in the search results

The page that you are viewing is simply a neatly formatted report containing PE header information for one of the executables that Google has indexed. Google treats this page, which it has created, just like any other web page. You can search for any of the unique phrases within the page and obtain results for executable files.

What it can be used for

Now that we know what GBS is and how it works, the next logical question is "how can I use it for something useful?" Good question. As mentioned, GBS can be used to search for any executable file, but you first need a means of identifying unique information within the PE header. In order to explain how this is done, we'll download a copy of the popular Telnet/SSH client Putty. Putty was selected because we need an executable file that is likely to be hosted on websites that Google would index. A popular self-extracting installation file is a solid choice, as a *.zip file would not be indexed by GBS.

Next, we need a means of viewing the PE header information. There are various freeware tools for doing this such as LordPE. Once your PE editor is installed, open the target executable. In LordPE, click on the 'PE Editor' button, browse to the Putty executable and open it. Once this is done, you will see the basic PE header as shown in the image below. Most of the search data that we require will come from this screen.

LordPE - Basic PE Header Information

Next, click on the 'Sections' button to view the section table. The section table provides details on the location of various components of the executable file, such as the code and data segments. Below is a screenshot of the section table.

LordPE - Section Table

Now, we need to select unique values to search for. Following HD's lead, we'll use the following fields:

Google Field      LordPE Field     LordPE Screen
----------------  ---------------  ----------------------------
Time Date Stamp   TimeDateStamp    Basic PE Header Information
Size of Image     SizeOfImage      Basic PE Header Information
Entry Point       EntryPoint       Basic PE Header Information
Size of Code      .text-->RSize    Section Table

The table above lists the field names used by Google, the corresponding field names in LordPE, and the LordPE screen where each value can be found. Now that we have our search data, we can construct our query.
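
For readers who would rather script this step than read the values off the LordPE screens, the following Python sketch pulls the same four fields out of a file's bytes. It is an illustration under stated assumptions, not HD Moore's implementation: the offsets follow the standard PE layout, "Size of Code" is taken from the raw size of the .text section to match the table above, and the function name gbs_fields is made up for this example.

```python
import struct


def gbs_fields(data: bytes) -> dict:
    """Return the four search values as 8-digit uppercase hex strings."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ/PE file")
    (pe,) = struct.unpack_from("<I", data, 0x3C)    # offset of the PE header
    if data[pe:pe + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")

    coff = pe + 4                                   # COFF file header (20 bytes)
    n_sections, timestamp = struct.unpack_from("<HI", data, coff + 2)
    (opt_size,) = struct.unpack_from("<H", data, coff + 16)

    opt = coff + 20                                 # optional header
    (entry_point,) = struct.unpack_from("<I", data, opt + 16)
    (size_of_image,) = struct.unpack_from("<I", data, opt + 56)

    # The section table follows the optional header. Each entry is 40 bytes,
    # with the raw size (LordPE's RSize) stored at offset 16 of the entry.
    table = opt + opt_size
    for i in range(n_sections):
        entry = table + 40 * i
        if data[entry:entry + 8].rstrip(b"\0") == b".text":
            (text_rsize,) = struct.unpack_from("<I", data, entry + 16)
            break
    else:
        raise ValueError("no .text section found")

    values = {
        "Time Date Stamp": timestamp,
        "Size of Image": size_of_image,
        "Entry Point": entry_point,
        "Size of Code": text_rsize,
    }
    return {name: f"{value:08X}" for name, value in values.items()}
```

Reading a downloaded executable in binary mode and passing its bytes to gbs_fields gives a dictionary keyed by the Google field names, ready to be turned into search phrases like the ones that follow.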

"Time Date Stamp: 4252EA65"
"Size of Image: 0006D000"
"Entry Point: 0004265F"
"Size of Code: 0004A000"

Concatenating the individual search phrases into a single query leaves us with the following:

"Time Date Stamp: 4252EA65"+"Size of Image: 0006D000"+"Entry Point: 0004265F"+"Size of Code: 0004A000"

Assuming that the Google index hasn't changed and that the version of Putty initially downloaded for analysis is consistent, you should receive one search result that is indeed a link to download an identical copy of putty.exe at a location other than our original download site. What does this tell us? It tells us that Google is only indexing a small fraction of the executables that it locates. We know this because Putty is a popular program available for download at many sites and we only found one. In fact, we didn't even find the initial download site, which we know exists.
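
The concatenation step used to build the Putty query is simple enough to script. The sketch below is only an illustration: the function names build_gbs_query and search_url are made up, and the search URL is the ordinary public Google search endpoint rather than anything specific to GBS. Note that the '+' separators and quotes have to be URL-encoded when the query is placed in a link.

```python
from urllib.parse import urlencode


def build_gbs_query(fields: dict) -> str:
    """Join 'Name: VALUE' phrases into one quoted, '+'-separated query."""
    return "+".join(f'"{name}: {value}"' for name, value in fields.items())


def search_url(query: str) -> str:
    """Wrap the query in a search URL (assumed endpoint, for illustration)."""
    # urlencode escapes the quotes, colons, spaces and '+' characters.
    return "https://www.google.com/search?" + urlencode({"q": query})


# Values taken from the Putty example in this post.
putty = {
    "Time Date Stamp": "4252EA65",
    "Size of Image": "0006D000",
    "Entry Point": "0004265F",
    "Size of Code": "0004A000",
}

query = build_gbs_query(putty)
print(query)
print(search_url(query))
```

The first line printed is the same query string shown earlier; the second is a link form of it that can be pasted into a browser.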

What about Malcode?

The initial PC World story expressed concern that GBS could be used to identify malcode samples. While it's true that it could be used for such a purpose, it's questionable just how useful that approach would be. For those who are curious, HD has saved you the trouble of needing to generate your own search terms by publishing a signature database for common malcode samples. The format of the database is 'Descriptive Name:Time Date Stamp:Size of Image:Entry Point:Size of Code'. If you test the signatures in this database, you will find that for the most part you receive surprisingly few results given the prevalence of malcode. Once again, this can be attributed to the fact that the Google index contains only a sample of executable files. Beyond that, if you run the executables obtained through an AV scanner, you'll see that many are false positives. Why? The signatures created are far from perfect. It's very possible to create two completely different executables with the same signatures given the fields that we've chosen to search for. Naturally, the false positives could be reduced by creating more precise signatures using additional fields. If you're really ambitious, you could create your own signatures by obtaining samples of malcode from one of the many public repositories, but if you've done that, you don't really need GBS to get malcode, do you?
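
As a rough illustration of how one entry in that signature format maps onto a search, here is a short Python sketch. It assumes one colon-separated entry per line, exactly as described above, and splits from the right so that a descriptive name containing colons is not broken apart. The function name signature_to_query and the sample entry are hypothetical and are not taken from HD's database.

```python
# Field order as given in the signature format described above.
FIELD_NAMES = ("Time Date Stamp", "Size of Image", "Entry Point", "Size of Code")


def signature_to_query(line: str) -> tuple:
    """Turn one 'Name:Stamp:Image:Entry:Code' entry into (name, query)."""
    # Split on the last four colons only, leaving the name intact.
    name, *values = line.strip().rsplit(":", 4)
    if len(values) != 4:
        raise ValueError(f"expected 5 colon-separated fields: {line!r}")
    query = "+".join(
        f'"{field}: {value}"' for field, value in zip(FIELD_NAMES, values)
    )
    return name, query


# Hypothetical entry, reusing the Putty values from earlier in this post.
name, query = signature_to_query("Example.Sample:4252EA65:0006D000:0004265F:0004A000")
print(name)
print(query)
```

Because only four header fields are compared, two unrelated files that happen to share them will produce the same query, which is one way to see where the false positives come from.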

Conclusion

I agree with HD Moore. Given the number of binary files being indexed at this point, GBS is not particularly useful for obtaining malcode samples. It is somewhat useful for identifying sites that may be hosting malcode, but even then, the results tend to reveal binary attachments in email messages sent to mailing lists. It's hardly surprising that malcode would be found in such a location. Moreover, if you're looking for malcode, there is no shortage of places to find it, with or without GBS. That's not to say, however, that GBS is not a useful tool. I have no doubt that over time, as the index grows, the results will continue to be more useful to whitehats and blackhats alike.

- michael

Posted 09-14-2006 2:46 PM by erik.peterson

Comments

Ron Jennings wrote re: What is Google Binary Search and Should We Fear It?
on 09-14-2006 4:35 PM

Hello Michael,

     I read your blog with a lot of interest. I really like the way you took the time to explain the MSFT issues.  I learned a lot in a small amount of time.

   I like the way you write and I hope to get to hear you speak sometime.

  Thanks for the link to 3 Com. I had not seen it before.  

erik.peterson wrote re: What is Google Binary Search and Should We Fear It?
on 09-14-2006 6:49 PM

Thanks Ron, I appreciate that. Fortunately, my new position at SPI Dynamics will allow me to hit the road and do more public speaking. Keep an eye on the speaking schedules for the Secure Software Forum workshops and executive dinners as I'll be speaking at a number of them going forward.

/pd wrote re: What is Google Binary Search and Should We Fear It?
on 09-15-2006 11:29 AM

Mike, nice post- very informative blog.. have sub'ed :)-

During this analysis did you take into consideration the blacklist ?? SERP are running against this list too :)-

http://sb.google.com/safebrowsing/update?version=goog-black-url:1:-1

erik.peterson wrote re: What is Google Binary Search and Should We Fear It?
on 09-15-2006 12:00 PM

Thanks /pd. No, I hadn't thought about the blacklist while doing the research but I now wonder if GBS is used to assist Google in identifying sites to be blacklisted. Cross referencing GBS results to the Google blacklist while looking for malware samples might also be a way to reduce false positives.

- michael

/pd wrote re: What is Google Binary Search and Should We Fear It?
on 09-15-2006 5:51 PM

no Mike , I think its the other way around !!

GBS is seeded with auto malware codes and run against the bigtable index. Whenever there is a match against the seed value --the vectors are pulled in for secondary analysis /verification and then added to the BlackList, when confirmed positives are obtained.

In this manner, whenever a term is used (e.g  "putty" ) the engine retrieves all result URLs, parses them against the RBL and eliminates pages that are deemed to be positive AV sigs/malware/spam sources. The remaining list is then pushed thru for SERP display.  

I think there is a 3 tier process, which is being implemented to reduce malware results in the SERP !!

this is just my thoughts.. I could certainly be wrong  

RandallM wrote re: What is Google Binary Search and Should We Fear It?
on 09-16-2006 12:29 PM

Great piece.

I didn't see anywhere mentioning other search engines. Dogpile pulled from msn search which not only gave three sites but included your blog. that's a little better than google. which may show or support RBL.

erik.peterson wrote re: What is Google Binary Search and Should We Fear It?
on 09-17-2006 9:17 AM

RandallM - Thanks for the heads up, you're absolutely correct. Microsoft's search engine (http://live.com) does indeed appear to have identical binary search functionality. I ran a few of the signatures from HD's malware project through it and while I wouldn't say that I received better results, in some cases they were different. Run the search for Putty shown above and you will receive different results.

Thanks again RandallM.

Daniel Cuthbert wrote re: What is Google Binary Search and Should We Fear It?
on 10-05-2006 11:19 AM

Nice post Michael.

This actually got me thinking about the next generation of OS design (im currently reviewing OS X Leopard).

What developers could do is integrate this functionality into the OS. Taking Leopard as an example, ichat still has an issue when you create malware and rename it to an easily viewable extension, such as britney_naked.jpg. iChat and OS X happily try to render the jpg in the ichat window, but in turn it actually opens up terminal and executes my code.

How i see this working is a bit like web application input validation. you have a GBS style function which is called first to check the header of the code in question and then queries a online db like HDM's. if all good its passed onto the next app for execution and if not its blocked

gags wrote re: What is Google Binary Search and Should We Fear It?
on 10-10-2006 8:02 AM

Its really a wonderfull blog.. actully i was searching for new google product code search but... some how got this  GBS thing and wanted to knw more abut and the way u explained it is best..thanks a ton..

erik.peterson wrote re: What is Google Binary Search and Should We Fear It?
on 10-10-2006 8:25 AM

My pleasure gags. Thanks for the feedback.

Anan wrote re: What is Google Binary Search and Should We Fear It?
on 12-22-2006 4:25 AM

great info thanks