UTF-8 Encoding

Hello there!

My name is Andreas Fuchsberger, and I am a developer in the CISG team, based in Germany. I joined CISG after a short stint with the Assessment, Consulting and Engineering (ACE) Team, part of InfoSec in Microsoft IT. I am relatively new to Microsoft, having joined only 6 months ago from academia, where I was a full-time academic in the internationally renowned Information Security Group at Royal Holloway, University of London. In fact, I still teach there on the excellent Masters (MSc) degree programme in Information Security, where I teach the optional modules on Software Security.

The Software Security module was developed in response to the industry's need to develop more secure software and is strongly based on Michael Howard’s must-read book Writing Secure Code, 2nd Edition, and its update, Writing Secure Code for Windows Vista®. It received part-funding from Microsoft Research, and the syllabus was constructed in consultation with Fabien Petitcolas, who headed the then university relations programme, and Dieter Gollmann, who was also at Microsoft Research at the time. Since designing and teaching the course I have become quite passionate about secure coding and the need to educate all kinds of software developers to code securely from early on in their careers. I am a true believer that security is not just a bolt-on that can be added at the end of a project. Expect to see more on this, one of my favorite topics, in the future.

Speaking of secure coding, I note from a recent entry on Michael’s blog that Apache Tomcat has a UTF-8 encoding security bug, and it’s related to the implementation of a standard (RFC 3629). Security standards are another of my favourite topics, as I actively participate in a number of SC27 working groups (home of the ISO 27000 series) covering IT Security Techniques for the International Organisation for Standardisation (ISO) and the International Electrotechnical Commission (IEC). I will be posting updates on the happenings of the working groups in the future.

Just in case you are interested, the Tomcat vulnerability comes about from using an invalid but decodable UTF-8 encoding of the ‘.’ character; this kind of bug is often called an “overlong UTF-8 escape”. BTW, the definitive places for UTF-8 encoding are Section 2.5 “Encoding Forms” and Section 3.9 “Unicode Encoding Forms” in the Unicode Standard, a great read if you are interested in typesetting, character sets, encodings and similar. This particular problem comes about through the desire to create a solution that provides some form of compatibility for legacy systems, a source of many security problems. It also goes to show that using blacklists is not the safest way to check for invalid input. Hackers always seem to be able to find new ways around blacklists that the original designers could not envisage. This is one of the many things we are currently contemplating in the design of our new Anti-XSS library. Watch this space for an announcement.
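
For the curious, here is a small Python sketch of how an overlong encoding can slip past a blacklist. It is not Tomcat's code: the lenient decoder, the function names and the sample path are all made up for illustration. The two bytes 0xC0 0xAE are an overlong (and therefore invalid) two-byte form of ‘.’ (0x2E), which a decoder that skips the shortest-form check will happily accept.

```python
# Illustrative sketch only; all names here are hypothetical, not from Tomcat.

def lenient_decode(data: bytes) -> str:
    """Decode UTF-8 WITHOUT rejecting overlong forms (deliberately unsafe)."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:                 # 0xxxxxxx: single byte
            cp, n = b, 1
        elif b >> 5 == 0b110:        # 110xxxxx: two-byte sequence
            cp, n = b & 0x1F, 2
        elif b >> 4 == 0b1110:       # 1110xxxx: three-byte sequence
            cp, n = b & 0x0F, 3
        elif b >> 3 == 0b11110:      # 11110xxx: four-byte sequence
            cp, n = b & 0x07, 4
        else:
            raise ValueError("invalid lead byte")
        if i + n > len(data):
            raise ValueError("truncated sequence")
        for c in data[i + 1:i + n]:
            if c >> 6 != 0b10:       # continuation bytes must be 10xxxxxx
                raise ValueError("invalid continuation byte")
            cp = (cp << 6) | (c & 0x3F)
        # The missing step: a correct decoder would reject cp here if it
        # could have been encoded in fewer than n bytes.
        out.append(chr(cp))
        i += n
    return "".join(out)


def blacklist_allows(raw: bytes) -> bool:
    """Naive blacklist: look for the literal bytes '../' before decoding."""
    return b"../" not in raw


def strict_decode_ok(raw: bytes) -> bool:
    """True if Python's built-in strict UTF-8 decoder accepts the bytes."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Overlong '.' twice, followed by a made-up path.
PAYLOAD = b"\xc0\xae\xc0\xae/secret"
```

With these definitions, `blacklist_allows(PAYLOAD)` is true because the raw bytes never contain `../`, yet `lenient_decode(PAYLOAD)` yields `../secret`. Python's own strict decoder refuses the payload, so `strict_decode_ok(PAYLOAD)` is false. The safer order is to decode strictly first and then validate the result against what you expect to see, rather than hunting for known-bad byte patterns.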