Analysis of Googlebot's frugal cipher suite list

July 02, 2009

Two weeks ago, I announced SSL Labs and my technique for passive SSL cipher suite analysis. It won't surprise you to learn that I've been carefully observing the cipher suites used in the requests that have come to the web site since. (In fact, I announced the site slightly earlier than I had planned because I wanted to get my hands on some real-life data.) One client's SSL fingerprint immediately caught my attention, because it supported only 4 cipher suites. It was Googlebot.

There were 115 visits from Googlebot in the two-week period, using 5 different user agent strings (although Googlebot will sometimes send a request without a User-Agent header set):

  • Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
  • SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)
  • DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)
  • Googlebot-Image/1.0
  • Feedfetcher-Google; (+http://www.google.com/feedfetcher.html; feed-id=9430846974815548184)

The first one is by far the most common, although the others appear on a regular basis. I used reverse DNS to verify that the IP addresses belong to Google, with the exception of one Feedfetcher request, for which I had to use ARIN.
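
If you wanted to replicate that verification, a minimal sketch of the usual forward-confirmed reverse DNS check might look like this in Python (my illustration, not part of my toolset; the IP address is just an example):

    import socket

    def is_google_crawler(ip):
        # Reverse lookup: IP -> hostname (e.g. crawl-66-249-66-1.googlebot.com).
        try:
            host = socket.gethostbyaddr(ip)[0]
        except socket.herror:
            return False
        if not host.endswith((".googlebot.com", ".google.com")):
            return False
        # Forward-confirm: the claimed hostname must resolve back to the same IP.
        try:
            return ip in socket.gethostbyname_ex(host)[2]
        except socket.gaierror:
            return False

    print(is_google_crawler("66.249.66.1"))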

As I’ve already mentioned, Googlebot's SSL fingerprint is quite short:

h2,03.01,010080,04,05,0a

The first token indicates the version of the SSL handshake used. In this case it's h2, which is the code for the SSL v2 handshake. The second token indicates the highest SSL version a client is willing to support. Googlebot's choice, 03.01, indicates that it is willing to go as far as TLS v1.0. Modern browsers do not support SSL v2.0, so it's generally rare to see a browser use an SSL v2 handshake. Search engines don't care about security, but they do care about accessing as many servers as possible: they'll compromise and support the weaker protocols.
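
To make the format concrete, here's a small sketch that splits a fingerprint into its tokens. The field names are my own, and only the values that appear in this post are mapped to friendly names:

    VERSIONS = {"02.00": "SSL v2.0", "03.00": "SSL v3.0", "03.01": "TLS v1.0"}

    def parse_fingerprint(fp):
        tokens = fp.split(",")
        return {
            "handshake": "SSL v2" if tokens[0] == "h2" else tokens[0],
            "max_version": VERSIONS.get(tokens[1], tokens[1]),
            "suite_codes": tokens[2:],
        }

    print(parse_fingerprint("h2,03.01,010080,04,05,0a"))
    # {'handshake': 'SSL v2', 'max_version': 'TLS v1.0',
    #  'suite_codes': ['010080', '04', '05', '0a']}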

What follows is the most interesting part: the codes for only 4 cipher suites. They are:

  • SSL_CK_RC4_128_WITH_MD5 (0x010080)
  • SSL_RSA_WITH_RC4_128_MD5 (0x04)
  • SSL_RSA_WITH_RC4_128_SHA (0x05)
  • SSL_RSA_WITH_3DES_EDE_CBC_SHA (0x0a)

The first suite is only valid with SSL v2.0, while the three remaining ones work in SSL v3.0 or TLS v1.0. It's obvious that, unlike with most other SSL clients, the cipher suites on this list were hand-picked. If I had to guess, I would say that the motivation was to save on bandwidth and increase performance. It's likely that all SSL v2.0 servers support the one SSL v2.0 cipher suite, while the three remaining suites are needed to support the rest of the Internet.
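
To see why three suites can cover the rest of the Internet, recall that (ignoring protocol-version and certificate details) a handshake succeeds as long as the server supports at least one suite from the client's list. A toy illustration, with hypothetical server configurations:

    GOOGLEBOT_SUITES = {"010080", "04", "05", "0a"}

    def handshake_succeeds(server_suites):
        # The server picks any suite both sides have in common.
        return bool(GOOGLEBOT_SUITES & set(server_suites))

    # A server offering only AES-based TLS suites (0x2f, 0x35) would fail:
    print(handshake_succeeds(["2f", "35"]))          # False
    # One that also offers 3DES (0x0a) succeeds:
    print(handshake_succeeds(["2f", "35", "0a"]))    # True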

Assuming the reason for such a short list of cipher suites is frugality, I am surprised it doesn't contain suites with weaker ciphers. A search bot doesn't really care about security, so it could afford to negotiate a weaker cipher and perhaps save some CPU cycles. Similarly, 3DES is significantly slower than, for example, RC4, so it would be my first candidate for removal if I were concerned with performance. Thus, I am guessing 3DES is there for interoperability.
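
To put rough numbers on that claim, here's an illustrative micro-benchmark of bulk encryption throughput, assuming the pyca/cryptography package (which postdates this post and deprecates both ciphers in recent versions):

    import os, time
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    data = os.urandom(1024 * 1024)  # 1 MB of random plaintext

    def throughput(cipher, label):
        enc = cipher.encryptor()
        start = time.perf_counter()
        for _ in range(16):              # encrypt 16 MB in total
            enc.update(data)
        elapsed = time.perf_counter() - start
        print("%s: %.0f MB/s" % (label, 16 / elapsed))

    throughput(Cipher(algorithms.ARC4(os.urandom(16)), mode=None), "RC4-128")
    throughput(Cipher(algorithms.TripleDES(os.urandom(24)), modes.CBC(os.urandom(8))),
               "3DES-EDE-CBC")

On typical hardware, RC4 should come out several times faster than 3DES.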

It would be interesting to get someone from Google to comment.

Interestingly, my net caught one search engine imposter, who claimed to be Googlebot but wasn't. While I could have used a reverse DNS lookup to determine what the imposter wasn't, in this case I was also able to identify what it was: someone browsing the Internet using a Firefox 2.x browser with an altered User-Agent field. Nice!
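
The check itself is trivial once you collect fingerprints: compare what a client's User-Agent claims against the fingerprint you expect for that client. A sketch, using the Googlebot fingerprint from this post (the table of known fingerprints is otherwise hypothetical):

    KNOWN_FINGERPRINTS = {
        "Googlebot/2.1": "h2,03.01,010080,04,05,0a",
    }

    def looks_like_imposter(user_agent, observed_fp):
        for claimed, expected_fp in KNOWN_FINGERPRINTS.items():
            if claimed in user_agent:
                return observed_fp != expected_fp
        return False  # no baseline for this client, so nothing to compare

    ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    print(looks_like_imposter(ua, "h2,03.01,010080,04,05,0a"))   # False: genuine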