
SSL Server Survey: So what's with the 22M invalid certificates claim?

July 02, 2010

After I talked about my research in a Black Hat Preview webcast last week, a number of people rushed to tell me that there can't be 22M invalid certificates in the world. Some got in touch directly, some just talked about it, and, in one instance, a company managed to issue a press release (!) to urge me to clarify my statements.

"Surely I was mistaken", they said, because "there are only about 2M certificates sold by commercial certificate authorities. Something does not add up." Of course it doesn't.

The short answer is that they've generally focused on the wrong aspect of the study and that, in fact, I made no such sensational claims. So what is the real story?

The goal of the SSL Survey is to understand how SSL is used in real life. We generally know the right way to configure and deploy SSL, but are the best practices followed, and by how many sites? This sort of analysis is difficult to pull off because you have to know what to test. (The testing itself is not so difficult for us, because we already have a robust assessment engine running on SSL Labs.)

Here are the possible approaches to discover publicly-available SSL servers:

  • Scan the entire IPv4 address space. This is only possible because most SSL deployments still suffer from the deficiency that each one requires a dedicated IP address to operate.
  • Crawl the Internet to discover the SSL servers that appear in all the hyperlinks in the world.
  • Analyse the domain name space. One option is to obtain the lists of all domain names registered under a specific TLD, which is slow and cumbersome, but possible. Another option is to use a list such as Alexa's top 1M most popular sites, which is a much more palatable approach.
  • Use a browser toolbar (or a similar service) to collect the hostnames the users are visiting. This is the only approach that will reveal internal SSL servers, although I doubt that would be very useful.

For my study I opted to analyse the domain name space. The other approaches were either not feasible, would take years, or simply cost too much. We'd love to collaborate with the organisations that have the data that we need, but that hasn't happened yet.

From VeriSign's reports we know that there are about 193M registered domain names in the world, and I managed to obtain a list of about 119M to start with (all .com, .org, .net, .biz, .us, and .info domain names). The idea was to first perform a series of lightweight tests (there are so many domain names!) to give us an idea of which domain names are worth looking at in depth. Here are the results:

  • 92M domains are active on port 80 or port 443
  • 33M domains have port 443 open
  • 22.65M domain names run SSL on port 443
  • On 0.72M domain names, the certificate matches the domain name

Now it becomes clear where the 22M invalid certificates number comes from. If you take the number of domain names that run SSL on port 443 (22.65M) and subtract the number of certificates that matched the domain name on which they reside (0.72M), you get about 21.93M domain names that do not have valid certificates on port 443. This number is not very important and certainly not ground-breaking. It's a by-product of a methodology whose end goal is to find something else.
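For anyone who wants to check the arithmetic, here is a minimal sketch in Python. The variable names are mine, and the inputs are the rounded figures from the list above, so the outputs are approximations too.

```python
# Rounded figures from the survey results above (domain-name counts).
ssl_on_443 = 22_650_000      # domain names that run SSL on port 443
matching_certs = 720_000     # domain names whose certificate matches the name

# The "22M invalid certificates" figure is simply the difference.
non_matching = ssl_on_443 - matching_certs

# Share of SSL-enabled domain names with a matching certificate.
matching_share = matching_certs / ssl_on_443

print(f"{non_matching:,}")       # 21,930,000
print(f"{matching_share:.1%}")   # 3.2%
```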

The important number is the 720,000 certificates whose names do match the domain names on which they reside. For each of those, someone made an effort to match the names, and those are the servers that are worth investigating further.
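To make "the names match" a little more concrete, here is a simplified sketch of such a check in Python. It is an illustration only, not the code used in the survey: the function names are hypothetical, and I am assuming the usual convention that a wildcard covers exactly one left-most label.

```python
def name_matches(pattern: str, hostname: str) -> bool:
    """Return True if one certificate name covers the hostname.

    Simplified rules (an assumption, not the survey's actual logic):
    case-insensitive exact match, or a leading "*." wildcard that
    stands for exactly one label.
    """
    pattern = pattern.lower().rstrip(".")
    hostname = hostname.lower().rstrip(".")
    if pattern == hostname:
        return True
    if pattern.startswith("*."):
        suffix = pattern[1:]  # e.g. ".example.com"
        if hostname.endswith(suffix):
            left = hostname[: -len(suffix)]
            # The wildcard must cover one non-empty label, no dots.
            return left != "" and "." not in left
    return False


def certificate_matches(cert_names: list[str], hostname: str) -> bool:
    """Return True if any name in the certificate covers the hostname."""
    return any(name_matches(name, hostname) for name in cert_names)


# Hypothetical certificate names, for illustration.
names = ["example.com", "*.example.com"]
print(certificate_matches(names, "www.example.com"))  # True
print(certificate_matches(names, "example.org"))      # False
```

Here `cert_names` stands for whatever names the certificate carries (the common name and any alternative names); how those are extracted from the certificate is left out.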

Sadly, some people chose to focus on the numbers that make an interesting headline but aren't very interesting from a research point of view. The reason so many domain names lack proper SSL certificates is that most of them are not _intended_ to have them. Multiple domain names will point to the same IP address and, thus, to the same SSL server. (Remember, virtual SSL hosting is not yet mainstream.) The gap in the numbers comes from the widespread use of virtual web hosting, which is available for non-SSL sites, but not yet for SSL sites. You can host a million plain-text web sites on a single IP address, but if you want a million secure sites, you'd need a million IP addresses.