July 02, 2009

Analysis of Googlebot's frugal cipher suite list

Two weeks ago, I announced SSL Labs and my technique for passive SSL cipher suite analysis. It won’t surprise you to learn that I've been carefully observing the cipher suites used in the requests that came to the web site since. (In fact, I announced the site slightly earlier than I had planned because I wanted to get my hands on some real-life data.) One client’s SSL fingerprint immediately caught my attention, because it supported only 4 cipher suites. It was Googlebot.

There were 115 visits from Googlebot in the two-week period, using 5 different User-Agent strings (although Googlebot will sometimes send a request without User-Agent set):

  • Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
  • SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)
  • DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)
  • Googlebot-Image/1.0
  • Feedfetcher-Google; (+http://www.google.com/feedfetcher.html; feed-id=9430846974815548184)

The first one is by far the most common, although the others appear on a regular basis. I used reverse DNS to verify that the IP addresses belong to Google, with the exception of one Feedfetcher request, for which I had to fall back on an ARIN lookup.
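The reverse DNS check can be sketched in Python roughly as follows. This is a minimal sketch, not the exact procedure used for this post: the hostname suffixes are an assumption, the forward lookup that confirms the hostname resolves back to the same address is an extra step added here, and `is_google_ip` and its lookup parameters are hypothetical names. The lookups are injectable so the logic can be exercised without network access.

```python
import socket

# Assumption: hostname suffixes treated as belonging to Google's crawlers.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_google_ip(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if ip reverse-resolves to a Google hostname that
    resolves back to the same address (IPv4 only in this sketch)."""
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]
    try:
        hostname = reverse_lookup(ip).lower().rstrip(".")
        if not hostname.endswith(GOOGLE_SUFFIXES):
            return False
        # Forward-confirm: the claimed hostname must map back to the IP.
        return ip in forward_lookup(hostname)
    except OSError:
        # Covers socket.herror and socket.gaierror (no record, lookup failure).
        return False


# Demonstration with stubbed lookups (made-up hostname and address).
demo = is_google_ip(
    "192.0.2.10",
    reverse_lookup=lambda addr: "crawl-192-0-2-10.googlebot.com",
    forward_lookup=lambda host: ["192.0.2.10"],
)
print(demo)
```

Called with only the IP address, the function performs real DNS lookups. An address with no usable reverse record simply returns `False`, which is the case where a registry lookup such as ARIN is needed instead.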

As I’ve already mentioned, Googlebot's SSL fingerprint is quite short:

h2,03.01,010080,04,05,0a

The first token indicates the version of the SSL handshake used. In this case it’s h2, which is the code for the SSL v2 handshake. The second token indicates the highest protocol version the client is willing to support. Googlebot’s choice, 03.01, indicates that it is willing to go as far as TLS v1.0. Modern browsers do not support SSL v2.0, so it's generally rare to see a browser use an SSL v2 handshake. Search engines don’t care about security but they do care about accessing as many servers as possible: they’ll compromise and support the weaker protocols.
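To make the token layout concrete, here is a small Python sketch that splits a fingerprint into its parts. It assumes the comma-separated layout described above; `parse_fingerprint` and the `Fingerprint` type are hypothetical names, and only the handshake code mentioned in this post (h2) is recognised. The version table uses the standard wire encodings written in the fingerprint's dotted style.

```python
from typing import List, NamedTuple

# Only the handshake code discussed in this post is listed here.
HANDSHAKE_NAMES = {"h2": "SSL v2 handshake"}

# Protocol versions by wire encoding (major.minor).
VERSION_NAMES = {
    "03.00": "SSL v3.0",
    "03.01": "TLS v1.0",
    "03.02": "TLS v1.1",
    "03.03": "TLS v1.2",
}


class Fingerprint(NamedTuple):
    handshake: str
    max_version: str
    suite_codes: List[str]


def parse_fingerprint(text):
    """Split 'handshake,version,suite,suite,...' into labelled parts."""
    tokens = [token.strip() for token in text.split(",")]
    if len(tokens) < 2:
        raise ValueError("expected at least a handshake and a version token")
    handshake, version, *suites = tokens
    return Fingerprint(
        HANDSHAKE_NAMES.get(handshake, f"unknown ({handshake})"),
        VERSION_NAMES.get(version, f"unknown ({version})"),
        suites,
    )


fp = parse_fingerprint("h2,03.01,010080,04,05,0a")
print(fp.handshake, "|", fp.max_version, "|", fp.suite_codes)
```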

What follows is the most interesting part: the codes for only 4 cipher suites. They are:

  • SSL_CK_RC4_128_WITH_MD5 (0x010080)
  • SSL_RSA_WITH_RC4_128_MD5 (0x04)
  • SSL_RSA_WITH_RC4_128_SHA (0x05)
  • SSL_RSA_WITH_3DES_EDE_CBC_SHA (0x0a)

The first suite is only valid with SSL v2.0, while the three remaining ones work in SSL v3.0 or TLS v1.0. It's obvious that, unlike with most other SSL clients, the cipher suites on this list were hand-picked. If I had to guess, I would say that the motivation was to save on bandwidth and increase performance. It’s likely that all SSL v2.0 servers support the one SSL v2.0 cipher suite, while 3 suites are needed to support the rest of the Internet.
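The split between the SSL v2.0 suite and the other three can be illustrated with a short Python sketch. It relies on one assumption about the fingerprint format: that codes are printed without leading zero bytes, so a code wider than two bytes is an SSL v2.0 cipher kind. The helper names are hypothetical. The byte count reflects the fact that each cipher spec occupies three bytes in an SSL v2 compatible ClientHello.

```python
# Suite names exactly as listed in this post, keyed by fingerprint code.
SUITE_NAMES = {
    "010080": "SSL_CK_RC4_128_WITH_MD5",
    "04": "SSL_RSA_WITH_RC4_128_MD5",
    "05": "SSL_RSA_WITH_RC4_128_SHA",
    "0a": "SSL_RSA_WITH_3DES_EDE_CBC_SHA",
}


def describe_suite(code):
    """Return (code, name, protocol family) for one fingerprint code."""
    value = int(code, 16)
    # Assumption: anything above two bytes is an SSL v2.0 cipher kind;
    # SSL v3.0 / TLS suites fit in two bytes.
    protocol = "SSL v2.0" if value > 0xFFFF else "SSL v3.0 / TLS v1.0"
    return code, SUITE_NAMES.get(code.lower(), "unknown"), protocol


def v2_hello_cipher_bytes(codes):
    """Bytes used by the cipher list in an SSL v2 compatible ClientHello."""
    return 3 * len(codes)


codes = ["010080", "04", "05", "0a"]
for row in map(describe_suite, codes):
    print(*row)
print(v2_hello_cipher_bytes(codes), "bytes of cipher specs")
```

With four suites the cipher list takes 12 bytes of the handshake, which gives a sense of how little there is left to trim.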

Assuming the reason for such a short list of cipher suites is frugality, I am surprised it doesn’t contain suites with weaker ciphers. A search bot doesn’t really care about security, so it could afford to negotiate a weaker cipher and perhaps save some CPU cycles. Similarly, 3DES is significantly slower than, for example, RC4, so it would be my first candidate for removal if I were concerned with performance. Thus, I am guessing 3DES is there for interoperability.

It would be interesting to get someone from Google to comment.

Interestingly, my net caught one search engine imposter, a client that claimed to be Googlebot but wasn't. While I could have also used a reverse DNS lookup to determine what the imposter wasn’t, in this case I was also able to identify what it was: someone browsing the Internet using a Firefox 2.x browser with an altered User-Agent field. Nice!
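The imposter check amounts to comparing what a client claims in User-Agent with the fingerprint it actually presents. A minimal Python sketch, assuming the Googlebot fingerprint shown earlier is the only one Googlebot uses (the two-week sample supports this, but it is not guaranteed); `classify_request` is a hypothetical name and the second fingerprint in the demonstration is a placeholder, not real data:

```python
# Fingerprint observed for Googlebot in this post.
GOOGLEBOT_FINGERPRINT = "h2,03.01,010080,04,05,0a"


def classify_request(user_agent, fingerprint):
    """Compare the User-Agent claim with the observed SSL fingerprint."""
    # Requests without a User-Agent header are treated as making no claim.
    claims_googlebot = "googlebot" in (user_agent or "").lower()
    matches = fingerprint == GOOGLEBOT_FINGERPRINT
    if claims_googlebot and matches:
        return "consistent with Googlebot"
    if claims_googlebot:
        return "suspected imposter"
    return "not claiming Googlebot"


genuine_ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
print(classify_request(genuine_ua, GOOGLEBOT_FINGERPRINT))
print(classify_request(genuine_ua, "placeholder-fingerprint"))
```

A fingerprint match alone does not prove a client is Googlebot, since anyone can offer the same four suites, so this works best alongside the reverse DNS check.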

Comments

I would guess that Google are reasonably paranoid about attacks against the Googlebot, so exposing fewer cipher suites means they can audit a smaller subset of an SSL library.

ABOUT ME

Ivan Ristić is an open source advocate, entrepreneur, writer, programmer and web security specialist. He is the principal author of ModSecurity, the open source web application firewall, and the author of Apache Security, a concise yet comprehensive web security guide for the Apache web server.