Implementation vs. Design Defects

I got a comment on my last post that's worth following up on:

Can you comment on what percentage of defects you all are finding are implementation vs. design defects?

It's pretty clear that older code that doesn't have buffer overflows isn't suddenly going to have one. At the same time, older "well-written" code is more likely to have a design flaw, or be subject to a new class of attack, than newer code designed to mitigate said attack.

When you find those design issues they can be especially tricky to fix, particularly if the flaw is part of an externally facing API/interface rather than just an internal one.

I'm not asking for hard numbers, just curious anecdotally whether you can comment on the rate of occurrence.

I'm not sure that the sheer count is a very useful number. Design defects are much, much harder to correct than implementation defects, so a few of them can easily add up to more work than dozens of implementation errors. It also varies a lot from one app to another. This is an oversimplification, but the threat model for the Word document parser fairly well boils down to "Don't have implementation errors". There _is_ more to it than that, but in comparison to, say, the redirector, which does all the SMB share and administrative functions, the threat model is fairly boring. May as well just get to fuzzing and fixing bugs.
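To make "just get to fuzzing" concrete, here's a minimal sketch of a dumb mutation fuzzer for a file parser. Everything here is hypothetical for illustration – `ParseDoc` is a stand-in for the real entry point, and a real harness would run the target out of process and record the crashing inputs rather than calling it inline.

```cpp
// Minimal dumb mutation fuzzer sketch (hypothetical names throughout).
#include <cstdint>
#include <fstream>
#include <iterator>
#include <random>
#include <vector>

// Stand-in for the parser under test -- replace with the real entry point.
bool ParseDoc(const std::vector<uint8_t>& bytes) { return !bytes.empty(); }

int main() {
    // Start from a known-good seed document.
    std::ifstream in("seed.doc", std::ios::binary);
    std::vector<uint8_t> seed((std::istreambuf_iterator<char>(in)),
                              std::istreambuf_iterator<char>());
    if (seed.empty()) return 1; // need a valid seed file

    std::mt19937 rng(42);
    for (int iter = 0; iter < 100000; ++iter) {
        std::vector<uint8_t> mutated = seed;
        // Corrupt a handful of random bytes each iteration.
        for (int i = 0; i < 8; ++i)
            mutated[rng() % mutated.size()] = static_cast<uint8_t>(rng());
        ParseDoc(mutated); // a crash or assert here is an implementation bug
    }
}
```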

Our experience in Office is that we get a lot more traction focusing on implementation issues, though this takes a little elaboration. For example, the design of the .doc format is much harder to implement properly, and hence to secure, than the design of .docx. However, we can't very well just stop reading .doc files, and we can't just throw out the .doc parser and start over – not only do we have files generated by several versions of our own code, we also have files generated by other apps. We also can't leave customers insecure. So the right thing is to introduce something with a better design AND set about making substantial improvements to the existing code without breaking it.

Older code is more likely to be subject to new classes of attack. For example, integer overflows are much better known now than they used to be. However, we don't have to toss the code out to tidy up int overflow problems, and a threat model isn't the right tool for finding them. Code review, fuzzing, SafeInt, etc. are the right tools. It still depends on what that code does – if the code is something having to do with a bunch of web browser gunk, then that's a rapidly changing area, and you need to stay current. If the code checks an ACL against a process token, that doesn't change very much.
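As a sketch of what tidying up an int overflow looks like in practice, here's the classic allocation-size overflow and a SafeInt-based fix. The function and struct names are made up for illustration; SafeInt's checked operators throw a SafeIntException on overflow instead of silently wrapping.

```cpp
// Hypothetical allocation helpers illustrating the bug and the fix.
#include <cstddef>
#include <cstdlib>
#include "SafeInt.hpp" // the SafeInt library

struct Record { char data[64]; };

// Vulnerable: count * sizeof(Record) can wrap around, so a huge count
// yields a small allocation followed by out-of-bounds writes.
Record* AllocRecordsUnsafe(size_t count) {
    return static_cast<Record*>(malloc(count * sizeof(Record)));
}

// Fixed: SafeInt performs a checked multiply and throws SafeIntException
// on overflow instead of wrapping.
Record* AllocRecordsSafe(size_t count) {
    SafeInt<size_t> bytes(count);
    bytes *= sizeof(Record); // throws on overflow
    return static_cast<Record*>(malloc(bytes));
}
```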

You're right that changing external interfaces, especially network-level interfaces, is just a nightmare. For example, NTLMv2 fixed a bunch of problems with NTLMv1, and I think we're just now getting to the point where it's a reasonable default – about 10 years after the original fixes went into NT 4.0 SP4. That's where threat modeling becomes really, really important – the consequences of getting something like that wrong are ghastly. A good number of the problems we have with web browsers and web servers come from the fact that neither HTML nor HTTP really considered security, and we as an industry implemented something that wasn't thoroughly baked. If we could either go back and do it right, or come up with a replacement, we'd need to do a lot of examination to get the security right from the start.
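As one concrete data point on that slow migration: on Windows, the LM/NTLM/NTLMv2 behavior is governed by the LmCompatibilityLevel policy, where level 5 means send NTLMv2 only and refuse LM and NTLM. Here's a small sketch that reads the raw registry value (the path and value name are the documented ones; if the value is absent, the OS-version default applies, and error handling is abbreviated):

```cpp
// Windows-only sketch; link with advapi32.lib.
#include <windows.h>
#include <cstdio>

int main() {
    DWORD level = 0, size = sizeof(level);
    LSTATUS rc = RegGetValueW(
        HKEY_LOCAL_MACHINE,
        L"SYSTEM\\CurrentControlSet\\Control\\Lsa",
        L"LmCompatibilityLevel",
        RRF_RT_REG_DWORD, nullptr, &level, &size);
    if (rc == ERROR_SUCCESS)
        printf("LmCompatibilityLevel = %lu (5 = NTLMv2 only)\n", level);
    else
        printf("Value not set; the OS default applies.\n");
}
```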