Canoe: XSS prevention via context-aware output encoding

September 24, 2010

The only way to avoid having vulnerabilities is to make it impossible for programmers to make security mistakes. It's that simple. Canoe is my context-aware HTML output encoder that encodes every piece of data using the encoding that is required by the current position on the page.

At the core of Canoe is a lightweight HTML parser that tracks output in real time and detects transitions from one context to another. For example, when it sees a <script> tag it knows that what follows needs to be treated as code, not HTML markup. And that's all there is to it. As a programmer you no longer need to think about using the correct encoding function for every little piece of data that you write to HTML pages. The encoding process is transparent and foolproof.
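To make the idea concrete, here is a minimal sketch in Java of what context tracking could look like. It is not Canoe's code, which has not been released. The class `ContextTrackingSketch` and its `markup` and `data` methods are hypothetical names invented for this illustration. It assumes only two contexts (HTML text and script), assumes that data written inside a script lands in a quoted JavaScript string, and assumes that a tag is never split across two `markup` calls. A real parser would have to handle attributes, comments, URLs and many browser quirks.

```java
import java.util.Locale;

/** Hypothetical sketch of context-aware output encoding; not Canoe's API. */
class ContextTrackingSketch {
    enum Context { HTML_TEXT, SCRIPT }

    private final StringBuilder out = new StringBuilder();
    private Context context = Context.HTML_TEXT;

    /** Trusted template markup: written verbatim and scanned for context changes. */
    ContextTrackingSketch markup(String literal) {
        out.append(literal);
        String lower = literal.toLowerCase(Locale.ROOT);
        int open = lower.lastIndexOf("<script");
        int close = lower.lastIndexOf("</script");
        // Whichever tag appears last in this chunk decides the new context.
        if (open > close) {
            context = Context.SCRIPT;
        } else if (close > open) {
            context = Context.HTML_TEXT;
        }
        return this;
    }

    /** Untrusted data: the encoder is chosen by the tracked context, not by the caller. */
    ContextTrackingSketch data(String untrusted) {
        out.append(context == Context.SCRIPT
                ? encodeForJsString(untrusted)
                : encodeForHtml(untrusted));
        return this;
    }

    /** Replaces the five characters that are significant in HTML text and attributes. */
    static String encodeForHtml(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&#39;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Keeps ASCII letters and digits; everything else becomes a 4-digit hex escape. */
    static String encodeForJsString(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean alnum = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9');
            if (alnum) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    @Override
    public String toString() {
        return out.toString();
    }
}
```

The point of the design is that the template author only ever calls `data(...)`. The same input is HTML-encoded when it appears in page text and hex-escaped when it appears inside a script block, with no per-call decision by the programmer.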

Of course, I don't think Canoe is foolproof at the moment, but I expect it will get there in time. I've taken it as far as I could on my own. Now I need your help to find flaws in it so that we can eventually fix all of them.

To that end, I have uploaded a demo of Canoe at http://canoe.webkreator.com. You are cordially invited to give it a shot. To help you break Canoe I am giving you access to more stuff than an attacker would have. In the demo you control both the template (which normally only developers control) and the attack payload. You are thus able to simulate any weird template situation you can think of.

I am aware that getting the HTML parser right might not be easy. After all, people have been breaking browser parsers for years now. There are probably many quirks that need to be taken into account. However, Canoe's job is easier than a browser's because it can simply stop processing very weird templates. In addition, it can refuse to output _any_ data into dangerous contexts. For example, the current implementation refuses to output anything into a CSS context.
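The refusal behaviour described above can be sketched as a fail-closed check. Again, this is only an illustration under stated assumptions, not Canoe's implementation. The names `RefusingEncoderSketch` and `UnsafeContextException` are hypothetical, the CSS context is detected here only through `<style>` elements (not `style` attributes), and a tag is assumed never to be split across two `markup` calls.

```java
import java.util.Locale;

/** Hypothetical sketch of refusing output into a dangerous context; not Canoe's API. */
class RefusingEncoderSketch {
    enum Context { HTML_TEXT, CSS }

    /** Thrown instead of guessing at an encoding for a context deemed unsafe. */
    static class UnsafeContextException extends RuntimeException {
        UnsafeContextException(String message) {
            super(message);
        }
    }

    private final StringBuilder out = new StringBuilder();
    private Context context = Context.HTML_TEXT;

    /** Trusted template markup: written verbatim and scanned for style blocks. */
    RefusingEncoderSketch markup(String literal) {
        out.append(literal);
        String lower = literal.toLowerCase(Locale.ROOT);
        int open = lower.lastIndexOf("<style");
        int close = lower.lastIndexOf("</style");
        // Whichever tag appears last in this chunk decides the new context.
        if (open > close) {
            context = Context.CSS;
        } else if (close > open) {
            context = Context.HTML_TEXT;
        }
        return this;
    }

    /** Untrusted data: encoded in HTML text, rejected outright in CSS. */
    RefusingEncoderSketch data(String untrusted) {
        if (context == Context.CSS) {
            // Fail closed: nothing is written, so no partial payload reaches the page.
            throw new UnsafeContextException("refusing to output data into a CSS context");
        }
        out.append(untrusted
                .replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&#39;"));
        return this;
    }

    @Override
    public String toString() {
        return out.toString();
    }
}
```

Throwing rather than encoding means that an unsupported context shows up as a visible failure during development instead of a silent hole in production.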

Project goals:

  1. Determine if context-aware output encoding is possible
  2. Provide a Java implementation
  3. Document all the implementation details

Final notes:

  • There are some known issues (you'll find the list at the bottom of the demo page), which I will address later. I will keep the list up to date as new issues are discovered.
  • I am not going to release the source code right now, but, once the tool gets closer to completion, it will be released under a BSD or Apache software licence.
  • The code is generic, so it should be easy to integrate with any templating language. I also have an Apache Velocity implementation.