Yesterday, Google Labs launched a search tool that has many developers salivating. It's called Google Code Search (GCS), and it lets developers search source code from other projects to find code for reuse. Its functionality is impressive: it indexes not only raw text but also the contents of *.zip files. Beyond searching for strings within code, it offers the following advanced search capabilities:
- Regular Expressions - Google Code Search supports POSIX extended regular expression syntax which can be used to search for complex string matches. This allows for very specific search queries.
- file: - Restricts searches to specific files or directories. File and directory names can be specified using regular expressions.
- package: - Restricts searches to specific URLs or CVS servers. Package names can be specified using regular expressions.
- lang: - Restricts searches to specific programming languages.
- license: - Restricts searches to specific licenses (e.g. BSD, GNU, etc.).
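To see how these operators combine, here is a minimal Python sketch that assembles a query string from a regular expression plus optional restrictions. The helper name build_query is hypothetical, and the values passed for lang: and license: ("c" and "bsd") are assumptions about what GCS accepts, not confirmed spellings.

```python
def build_query(pattern, lang=None, file=None, package=None, license=None):
    """Join a regex pattern with optional operator:value restrictions.

    build_query is a hypothetical helper for illustration only; it just
    formats text and does not contact any search service.
    """
    parts = [pattern]
    restrictions = (
        ("lang", lang),
        ("file", file),
        ("package", package),
        ("license", license),
    )
    for name, value in restrictions:
        if value:
            parts.append(f"{name}:{value}")
    return " ".join(parts)


# Assumed example: calls to strcpy in BSD-licensed C source files.
query = build_query(r"strcpy\(", lang="c", file=r"\.c$", license="bsd")
print(query)
```

The result is plain text of the kind you would paste into the search box; each restriction simply narrows the population of code that the regular expression runs against.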
Google was not the first to come up with the idea of a source code search engine. Krugle has been around for a while, as has the Bugle Project. Krugle is a commercial entity devoted to search capabilities for developers, while Bugle is more of a research project that leverages Google for its search capabilities. Unlike Krugle, Bugle is specifically devoted to Google searches that target vulnerable source code. While Google Code Search wasn't the first, it is the most powerful in terms of its advanced search features.
As with any search tool, it can be used for plenty of mischief, well beyond what it may have initially been designed for. Even though it's only a day old, there has been plenty of chatter about what it can be used for, including the following:
Find Vulnerable Source Code
Given the powerful regex capabilities of GCS and the fact that it can be restricted to target a specific population of source code, GCS is an excellent tool for identifying potentially vulnerable code snippets in open source projects.
Buffer Overflows
Many C library functions are prone to buffer overflow attacks because they do not perform bounds checking. GCS can be used to identify code segments that use these potentially vulnerable functions.
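As a rough illustration, the sketch below tries one such pattern locally with Python's re module before you would use it as a search query. The function shortlist (strcpy, strcat, sprintf, gets) is an assumed sample rather than a complete list, the helper find_unsafe_calls is hypothetical, and the pattern sticks to constructs that POSIX extended syntax and Python share.

```python
import re

# Assumed shortlist of C functions that do no bounds checking.
# The leading group requires a non-identifier character (or start of line)
# so that fgets() or my_strcpy() is not reported.
PATTERN = r"(^|[^A-Za-z0-9_])(strcpy|strcat|sprintf|gets) *\("


def find_unsafe_calls(source):
    """Return (line number, function name) pairs for each suspicious call."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in re.finditer(PATTERN, line):
            hits.append((lineno, match.group(2)))
    return hits


SAMPLE = """\
char buf[16];
strcpy(buf, user_input);
fgets(buf, sizeof buf, stdin);
sprintf(buf, "%s", name);
"""

print(find_unsafe_calls(SAMPLE))
```

A match only flags a call worth reviewing; whether it is actually exploitable depends on how the destination buffer and the input are handled.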
SQL Injection
To better understand the dangers of SQL injection, browse "How Prevalent Are SQL Injection Vulnerabilities", a previous blog posting.
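One way to hunt for candidates is to look for queries assembled directly from request parameters. The sketch below checks such a pattern locally in Python; the choice of PHP's mysql_query together with the $_GET, $_POST, and $_REQUEST arrays is an assumed example target, and looks_injectable is a hypothetical helper name.

```python
import re

# Assumed example pattern: a PHP mysql_query() call whose argument
# mentions a request array on the same line.
PATTERN = r"mysql_query\(.*\$_(GET|POST|REQUEST)"


def looks_injectable(line):
    """Return True when the line passes request data into mysql_query()."""
    return re.search(PATTERN, line) is not None


direct_input = 'mysql_query("SELECT * FROM users WHERE id = " . $_GET["id"]);'
no_direct_input = 'mysql_query("SELECT * FROM users WHERE id = " . intval($id));'

print(looks_injectable(direct_input))
print(looks_injectable(no_direct_input))
```

Treat a hit as a lead rather than proof: the pattern cannot tell whether the value was sanitized earlier, and it misses queries built across several lines.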
Cross Site Scripting
Note: Take a look at Bugle as a great starting point for developing further GCS regular expressions for identifying potentially vulnerable source code.
Identify Comments Suggesting Problems
Find Out Where Your Source Code is Being Used
Note: Don't mistake this for a comprehensive list by any stretch of the imagination. Use it as a starting point and feel free to post comments with your own revealing searches.
- michael
Posted 10-06-2006 10:53 AM by erik.peterson