Fun With Google Code Search - Michael Sutton's Blog
Fun With Google Code Search

Yesterday, Google Labs launched a search tool that has many developers salivating. It's called Google Code Search (GCS), and it lets developers search the source code of other projects to find code for reuse. Its indexing is impressive: it covers not only raw text but also the contents of *.zip files. Beyond searching for plain strings within code, it supports the following advanced search capabilities:

  • Regular Expressions - Google Code Search supports POSIX extended regular expression syntax, which can be used to match complex strings and allows for very specific search queries.
  • file: - Restricts searches to specific files or directories. File and directory names can be specified as regular expressions.
  • package: - Restricts searches to specific URLs or CVS servers. Package names can be specified as regular expressions.
  • lang: - Restricts searches to specific programming languages.
  • license: - Restricts searches to specific licenses (e.g. BSD, GNU).
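
To make the way these operators combine a little more concrete, here is a minimal Python sketch that assembles a query string from a regular expression plus operator restrictions. The helper name build_query and the operator values used below ("c", "bsd") are assumptions for illustration; only the operator names come from the list above.

```python
# Operator names taken from the list above; everything else is illustrative.
ALLOWED_OPERATORS = ("file", "package", "lang", "license")


def build_query(pattern, **operators):
    """Join a regex and any operator restrictions into one query string."""
    parts = [pattern]
    for name, value in operators.items():
        if name not in ALLOWED_OPERATORS:
            raise ValueError(f"unknown operator: {name}")
        parts.append(f"{name}:{value}")
    return " ".join(parts)


# Hypothetical example: calls to strcpy in BSD-licensed C code.
print(build_query(r"strcpy\(", lang="c", license="bsd"))
# prints: strcpy\( lang:c license:bsd
```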

Google was not the first to come up with the idea of a source code search engine. Krugle has been around for a while, as has the Bugle Project. Krugle is a commercial entity devoted to search capabilities for developers, while Bugle is more of a research project that leverages Google for its search capabilities. Unlike Krugle, Bugle is devoted specifically to Google searches that target vulnerable source code. While Google Code Search wasn't the first, it is the most powerful in terms of its advanced search features.

As with any search tool, it can be used for plenty of mischief, well beyond what it may have initially been designed for. Even though it's only a day old, there has already been plenty of chatter about what it can be used for, including the following:

Find Vulnerable Source Code

Given the powerful regex capabilities of GCS and the fact that searches can be restricted to a specific population of source code, GCS is an excellent tool for identifying potentially vulnerable code snippets in open source projects.

Buffer Overflows

Many C library functions are prone to buffer overflows because they do not perform bounds checking. GCS can be used to identify code segments that use these potentially vulnerable functions.
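
The same idea can be tried locally. The following is a rough Python sketch, not a query from this post: the function list, the regex, and the sample C snippet are all assumptions chosen for illustration, and Python's re syntax (for example \b) differs slightly from the POSIX extended syntax that GCS uses. A hit only means the call deserves a closer look, not that it is exploitable.

```python
import re

# C library functions that write to a buffer without taking its size.
# This list is illustrative, not exhaustive.
UNBOUNDED_CALLS = re.compile(r"\b(strcpy|strcat|sprintf|vsprintf|gets)\s*\(")


def find_unbounded_calls(source):
    """Return (line_number, function_name) for each suspicious call."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in UNBOUNDED_CALLS.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits


# Hypothetical C snippet used only to exercise the scanner.
SAMPLE_C = """\
void greet(const char *name) {
    char buf[16];
    strcpy(buf, name);
    strncpy(buf, name, sizeof buf - 1);
    fgets(buf, sizeof buf, stdin);
}
"""

print(find_unbounded_calls(SAMPLE_C))
# prints: [(3, 'strcpy')]
```

The word boundary keeps bounded variants such as strncpy and fgets from being flagged.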

SQL Injection

See "How Prevalent Are SQL Injection Vulnerabilities", a previous blog post, to better understand the dangers of SQL injection.
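
One pattern worth hunting for is a database call whose argument is built directly from request input. Below is a rough Python sketch of that idea, run locally against PHP source. The list of query functions, the regex, and the sample snippet are assumptions for illustration rather than queries from this post, and a match is only a candidate for manual review.

```python
import re

# A PHP query function whose argument list mentions a request superglobal.
SQLI_CANDIDATE = re.compile(
    r"(mysql_query|pg_query|mssql_query)\s*\(.*\$_(GET|POST|REQUEST|COOKIE)\["
)


def find_sqli_candidates(source):
    """Return line numbers where request input reaches a query call directly."""
    return [
        lineno
        for lineno, line in enumerate(source.splitlines(), start=1)
        if SQLI_CANDIDATE.search(line)
    ]


# Hypothetical PHP snippet used only to exercise the scanner.
SAMPLE_PHP = """\
$r = mysql_query("SELECT * FROM users WHERE id = " . $_GET['id']);
$safe = intval($_GET['id']);
$r = mysql_query("SELECT * FROM users WHERE id = " . $safe);
"""

print(find_sqli_candidates(SAMPLE_PHP))
# prints: [1]
```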

Cross Site Scripting
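
A common cross site scripting pattern is request input echoed straight back into a page without escaping. The Python sketch below looks for that in PHP source. The regexes, the choice of escaping functions, and the sample snippet are assumptions for illustration, not queries from this post, and the check is a heuristic: it will miss input that passes through an intermediate variable.

```python
import re

# echo/print statements that mention a request superglobal.
ECHOED_INPUT = re.compile(r"\b(echo|print)\b[^;]*\$_(GET|POST|REQUEST|COOKIE)\[")
# Calls that usually indicate the output is being HTML-escaped.
ESCAPED = re.compile(r"\b(htmlspecialchars|htmlentities)\s*\(")


def find_xss_candidates(source):
    """Return line numbers that echo request input with no visible escaping."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if ECHOED_INPUT.search(line) and not ESCAPED.search(line):
            hits.append(lineno)
    return hits


# Hypothetical PHP snippet used only to exercise the scanner.
SAMPLE_PHP = """\
<?php echo "Hello, " . $_GET['name']; ?>
<?php echo "Hello, " . htmlspecialchars($_GET['name']); ?>
<?php print $_POST['comment']; ?>
"""

print(find_xss_candidates(SAMPLE_PHP))
# prints: [1, 3]
```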

Note: Take a look at Bugle as a great starting point for developing further GCS regular expressions for identifying potentially vulnerable source code.

Identify Comments Suggesting Problems
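
Developers often flag their own doubts in comments, so searching for warning markers next to security-related words can be revealing. Here is a small Python sketch of that idea. The marker list, the keyword list, and the sample text are assumptions for illustration, not queries from this post.

```python
import re

# A comment opener, then a warning marker, then a security-related word.
WORRIED_COMMENT = re.compile(
    r"(//|/\*|#).*\b(FIXME|XXX|TODO|HACK)\b.*"
    r"\b(security|overflow|vulnerab\w*|unsafe)\b",
    re.IGNORECASE,
)


def find_worried_comments(source):
    """Return line numbers of comments that hint at a security problem."""
    return [
        lineno
        for lineno, line in enumerate(source.splitlines(), start=1)
        if WORRIED_COMMENT.search(line)
    ]


# Hypothetical source lines used only to exercise the scanner.
SAMPLE = """\
/* FIXME: possible buffer overflow here */
// TODO: rename this variable
int overflow = 0;
# XXX this is unsafe, input is never validated
"""

print(find_worried_comments(SAMPLE))
# prints: [1, 4]
```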

Find Out Where Your Source Code is Being Used

Note: Don't mistake this for a comprehensive list by any stretch of the imagination. Use it as a starting point and feel free to post comments with your own revealing searches.

- michael

Posted 10-06-2006 10:53 AM by erik.peterson

Comments

Philipp Lenssen wrote re: Fun With Google Code Search
on 10-06-2006 1:56 PM

Also something for HTML injections:

"response.write request.querystring" lang:asp

newsoft wrote re: Fun With Google Code Search
on 10-07-2006 4:45 AM

Syslog format strings:

syslog\((\w+,\w+)\); lang:c

Anonymous wrote re: Fun With Google Code Search
on 10-12-2006 8:32 AM

PHP RFI vulns: ^include\(\$(.*)\);$ lang:php

Michael Sutton's Blog wrote Top 10 Signs You Have an Insecure Web App
on 11-01-2006 12:48 AM

I often surf the web and see blatant design errors that make me shake my head. Without even investigating

Michael Sutton's Blog wrote Good Intentions Equal Bad Security
on 12-06-2006 4:19 PM

Earlier this week, yet another rapidly spreading MySpace worm reminded me of a frequent dilemma in computer

Michael Sutton's Blog wrote New Year’s Resolutions
on 01-08-2007 6:45 PM

With Santa Claus on his way and another year coming to a close, it's time to start thinking about

Practicing Software Engineering in the Field wrote Google Code Search - Different Perspective
on 03-05-2007 3:57 PM

Google launches a special treat just for developers ... I'd like to present it from some different perspective.