Decoding the Google Blacklist - Michael Sutton's Blog
Decoding the Google Blacklist
After publishing last week's post, 'A Tour of the Google Blacklist', I received a few queries about Google's encoded/hashed blacklist (enchash). This blacklist is separate from the unencoded blacklist that was the focus of the previous post. It is also much larger, currently holding 14,000+ entries compared with the 1,000+ entries in the unencoded blacklist. Beyond that, it takes a more functional approach by providing regular expressions to match phishing URLs, as opposed to exact string matches.

Structure

As with all of the Google safe browsing lists, the enchash list can be pulled from a standard URL as noted below:

http://sb.google.com/safebrowsing/update?version=goog-black-enchash:1:1

The final two integers in the URL represent major:minor version numbers, allowing you to pull specific versions of the list. When requesting the enchash list you will see the following structure:

[goog-black-enchash 1.16026]
+000063A6E10172D71383F41E62D518A4   ZFhjcVk2R1mwTTbpCYVT5twpRd6hypeo4...
+0000E099D1DD9B0CA2A834A20A20C7AF   cFhWWGd6NGVz/L8ye10PpA6dgRqtTftTu...
+00011C8D5B3C6B7E58EFE31EBD4DBE04   bFNvcGRvNmq62yRf0TeY3Lwdn7Z+y61S2...
+000351FD5CF55A398FF6360DA108ED03   UUxza1hyQzTbScPPx/MpphX/iQmMbYKET...

The first row simply identifies the name and version of the enchash list being displayed. Each record that follows starts with a '+' and holds two columns: the first is an MD5 hash of a database salt (see below) + hostname, and the second is a Base64-encoded, encrypted array of regular expressions.
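To make the two-column layout concrete, here is a minimal sketch in Python (used instead of Perl only so that it runs with the standard library alone). The function name parse_enchash is hypothetical, and splitting the columns on any run of whitespace is an assumption based on the sample above. The sample records in this post are truncated with '...', so the values stored here are truncated too.

```python
def parse_enchash(text):
    """Return (list_name, version, {md5_hex: encoded_data}) from a list response."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    # The header row looks like: [goog-black-enchash 1.16026]
    name, version = lines[0].strip("[]").split()
    table = {}
    for ln in lines[1:]:
        if not ln.startswith("+"):
            continue  # only '+' records appear in the sample; skip anything else
        # Assumption: the two columns are separated by whitespace.
        md5_hex, encoded = ln[1:].split(None, 1)
        table[md5_hex] = encoded
    return name, version, table


sample = """[goog-black-enchash 1.16026]
+000063A6E10172D71383F41E62D518A4   ZFhjcVk2R1mwTTbpCYVT5twpRd6hypeo4...
+0000E099D1DD9B0CA2A834A20A20C7AF   cFhWWGd6NGVz/L8ye10PpA6dgRqtTftTu...
"""
name, version, table = parse_enchash(sample)
print(name, version, len(table))
```

Keying the table by the first column means that checking a hostname later is a single dictionary lookup.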

Security

The enchash list isn't designed to be secure per se. Information on its structure and how to decrypt the regular expressions is publicly available. It is designed so that an individual URL can be checked against the list to determine if it is a phishing site, while preventing the entire list from being decrypted at once. This is accomplished by including the hostname in the decryption key: you must start with a hostname that is in the list in order to decrypt the corresponding regular expressions. Therefore, to decrypt the entire list, you would also need to know all of the hostnames represented in the first column. This is likely done simply to prevent competitors from acquiring the full list.

Decryption

In order to understand how to decrypt the regular expressions, we'll walk through a sample record.

1. Hostname

  • As mentioned, the list is designed so that a known URL can be checked against it to determine if a match exists. To do this, we'll begin with a hostname taken from the unencoded blacklist, namely 210.212.141.146.
  • A canonical hostname should be broken down into sub-hostnames, with each one checked against the list separately. For example, with mail.yahoo.com, both mail.yahoo.com and yahoo.com should be checked. Since our hostname is an IP address, this is not required.
  • Next, compute an MD5 hash of the 'database salt' followed by the hostname. The database salt is the constant 'oU3q.72p'. The MD5 hash of 'oU3q.72p210.212.141.146' is '74AC98F531F37D2DA9C221148F2F35C2'.
  • Make sure the hex digits of the MD5 hash are uppercase, then compare the result against the data in the first column. If a match is found, proceed to the subsequent steps. In our case, the MD5 checksum does produce a match.
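The hostname step can be sketched as follows, in standard-library Python rather than Perl. The helper names lookup_hash and candidate_hostnames are hypothetical, and stopping the sub-hostname breakdown at two labels is an assumption drawn from the mail.yahoo.com example; the post does not say how far the breakdown goes.

```python
import hashlib
import ipaddress

DATABASE_SALT = "oU3q.72p"  # constant given in the post


def lookup_hash(hostname):
    """Uppercase hex MD5 of database salt + hostname, as used in the first column."""
    data = (DATABASE_SALT + hostname).encode("ascii")
    return hashlib.md5(data).hexdigest().upper()


def candidate_hostnames(hostname):
    """Hostname plus its parent domains, stopping at two labels (an assumption)."""
    try:
        ipaddress.ip_address(hostname)
        return [hostname]  # IP addresses are checked as-is
    except ValueError:
        pass
    labels = hostname.split(".")
    return [".".join(labels[i:]) for i in range(max(1, len(labels) - 1))]


for host in candidate_hostnames("mail.yahoo.com"):
    print(host, lookup_hash(host))
print(lookup_hash("210.212.141.146"))
```

Each printed hash is what you would look for in the first column of the list; for 210.212.141.146 the post reports a match.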

2. Key

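  • Once a match is found, it's time to produce the key that will be used to decrypt the data.
  • Base64 decode the data in the second column of the matching record.
  • Take the first 8 bytes of the decoded data. This is the 'random salt', which in our case is 'Qjnv90jM'. The remaining bytes are the encrypted regular expressions.
  • Compute an MD5 hash of the database salt, the random salt and the hostname, concatenated in that order with no separator (this is the order used by the script in the Code section). This produces a 128-bit encryption key.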

3. Decryption

  • Use the key generated above with the RC4 algorithm to decrypt the data that remains after the random salt.
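As an illustration of the key and decryption steps, here is a self-contained sketch in Python (rather than Perl, so that no RC4 module is needed). The function names rc4 and decrypt_record are hypothetical. The key material is concatenated in the order used by the Perl script in the Code section (database salt, random salt, hostname), and the encoded string is the one from that script.

```python
import base64
import hashlib

DATABASE_SALT = b"oU3q.72p"  # constant given in the post


def rc4(key, data):
    """Plain RC4: key scheduling, then XOR of the data with the keystream."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)


def decrypt_record(encoded, hostname):
    """Base64 decode, split off the 8-byte random salt, derive the key, decrypt."""
    raw = base64.b64decode(encoded)
    random_salt, ciphertext = raw[:8], raw[8:]
    # Order follows the Perl script: database salt, random salt, hostname.
    key = hashlib.md5(DATABASE_SALT + random_salt + hostname.encode("ascii")).digest()
    return rc4(key, ciphertext)


encoded = ("UWpudjkwak0iUBMO+xnGplKuo+fiEw1BVFQSuoi21jQ7DE2nTuO6esC6"
           "7q88bcsM8TBVHQaEK29wmwzStc7SHQut")
print(decrypt_record(encoded, "210.212.141.146"))
```

Because RC4 is a stream cipher, the decrypted output has the same length as the ciphertext, which gives a quick sanity check against the regular expression you expect.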

4. Result

  • ^http\:\/\/210\.212\.141\.146\:84\/\.confirm\/index\.php\?
  • Our example produced a single regular expression, but it is possible for the data to contain multiple regular expressions separated by '\t' (tab) characters.
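Once a record is decrypted, checking a URL comes down to splitting on tabs and trying each pattern. The sketch below, in Python, assumes the patterns are compatible with Python's re module (the post does not say which regular expression dialect the list uses). The function name matches_any and both test URLs are hypothetical.

```python
import re


def matches_any(decrypted, url):
    """True if the URL matches any tab-separated pattern in the decrypted data."""
    patterns = decrypted.split("\t")
    return any(re.search(p, url) for p in patterns if p)


# The regular expression from the Result step above.
decrypted = r"^http\:\/\/210\.212\.141\.146\:84\/\.confirm\/index\.php\?"

print(matches_any(decrypted, "http://210.212.141.146:84/.confirm/index.php?id=1"))  # True
print(matches_any(decrypted, "http://example.com/"))  # False
```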

Code

The following code was provided by Stephan Chenette and Alex Rice from WebSense and will automate the aforementioned decryption procedure. I'd like to thank them both for their invaluable collaboration as they pointed out a key fault in my logic and ultimately saved me from chasing my tail.

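#!/usr/bin/perl -w
use strict;
use Crypt::RC4;
use MIME::Base64;
use Digest::MD5 qw(md5);

# Constant salt shared by every entry in the enchash list.
my $database_salt = 'oU3q.72p';
# Hostname being checked; it is also part of the decryption key.
my $hostname = '210.212.141.146';
# Second column of the matching record, Base64 decoded.
my $enc_string = decode_base64('UWpudjkwak0iUBMO+xnGplKuo+fiEw1BVFQSuoi21jQ7DE2nTuO6esC67q88bcsM8TBVHQaEK29wmwzStc7SHQut');
# The first 8 bytes are the random salt; the rest is the RC4 ciphertext.
my $random_salt = substr($enc_string, 0, 8);
my $enc_data = substr($enc_string, 8);
# Key: raw 16-byte MD5 of database salt . random salt . hostname.
my $key = md5($database_salt . $random_salt . $hostname);
my $rc4 = Crypt::RC4->new($key);

# Decrypt and print the tab-separated regular expression(s).
printf "Regular exp(s): %s\n", $rc4->RC4($enc_data);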

Enjoy!

- michael


Posted 01-10-2007 4:07 PM by erik.peterson
