Software Contracts, Part 5: Hold on a second, why do we care about this stuff anyway?

I'm more discombobulated than usual on this series: I totally missed the third article when I should have gotten to it (this, btw, is why Raymond writes his stuff 8 months in advance; it lets him fix stuff like this).

So consider this post the 3rd in the series (the first is "Software Contracts", the second is "there are two sides to every contract", this is the 3rd, the 4th is "Sometimes contracts are subtle", etc.).

Why do we care about software contracts?

 

Well, for the exact same reason we care about real-world contracts.  Contracts define the expectations between two parties.  Without fully understanding the contract for a function, you don't know how to correctly call it.

Take the ReadFile example I mentioned the other day.  The ReadFile contract tells you which parameters to the function MUST be provided (hFile, lpBuffer, nNumberOfBytesToRead) and which MAY be provided (lpNumberOfBytesRead and lpOverlapped).  It also tells you how to determine whether the function succeeded (or whether the function has success/failure semantics at all).

We're so used to interpreting software contracts that they become ingrained.  Normally, we don't even bother to think about them, and for the most part, you don't need to wonder about them (just like with real-world contracts).

However, the instant you step outside the simplest case, understanding the contract for an API becomes critical.

As a simple example, consider one small aspect of the contract for the standard C++ library (from "Thread Safety in the Standard C++ Library"):

A single object is thread safe for reading from multiple threads. For example, given an object A, it is safe to read A from thread 1 and from thread 2 simultaneously.

If a single object is being written to by one thread, then all reads and writes to that object on the same or other threads must be protected. For example, given an object A, if thread 1 is writing to A, then thread 2 must be prevented from reading from or writing to A.

It is safe to read and write to one instance of a type even if another thread is reading or writing to a different instance of the same type. For example, given objects A and B of the same type, it is safe if A is being written in thread 1 and B is being read in thread 2.

These rules concisely lay out the threading guarantees for C++ library functions (to be honest, I really like this version of the text; usually I just see it written as "An object is thread safe for reading from multiple threads or writing from a single thread").

You know from this part of the contract that you can have multiple readers of an object, but the instant you have a single writer, you need to add some kind of a lock to isolate the readers from the writers.  On the other hand, you don't need a lock if all you're doing is reading the data.  (This text applies to the Microsoft implementation of the container classes; I don't know if it's in the standard, since I don't have a copy of the standard.)
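To make that concrete, here's a minimal sketch of what honoring this part of the contract might look like. The SharedLog class and run_demo function are names I made up for illustration, and I'm assuming a C++17 compiler so that std::shared_mutex is available. Readers take the lock in shared mode (so they don't block each other), while the writer takes it exclusively. This is one way to do it, not the only one; a plain std::mutex would also satisfy the contract.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <shared_mutex>
#include <thread>
#include <vector>

// Hypothetical wrapper (not from any library): a vector guarded by a
// reader/writer lock so that it follows the rules quoted above.
class SharedLog {
public:
    // Writer: holds the lock exclusively, so no reader or other writer
    // can touch the vector while push_back runs.
    void append(int value) {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        values_.push_back(value);
    }

    // Reader: holds the lock in shared mode. Several readers may run at
    // once, but none of them overlaps with a writer.
    long long sum() const {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        long long total = 0;
        for (int v : values_) total += v;
        return total;
    }

    // Reader: same shared locking as sum().
    std::size_t size() const {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        return values_.size();
    }

private:
    mutable std::shared_mutex mutex_;
    std::vector<int> values_;
};

// Runs one writer and two readers against the same object, then returns
// the sum of everything the writer appended (1 + 2 + ... + count).
long long run_demo(int count) {
    SharedLog log;
    std::thread writer([&] {
        for (int i = 1; i <= count; ++i) log.append(i);
    });
    auto read_repeatedly = [&] {
        for (int i = 0; i < 1000; ++i) (void)log.sum();
    };
    std::thread reader1(read_repeatedly);
    std::thread reader2(read_repeatedly);

    writer.join();
    reader1.join();
    reader2.join();

    assert(log.size() == static_cast<std::size_t>(count));
    return log.sum();
}
```

Calling run_demo(100) returns 5050 once all three threads have been joined. If you delete the locks, the program may still appear to work most of the time, which is exactly why this part of the contract is so easy to violate without noticing.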

Without this text being a part of the contract, you MUST assume that it's not possible to call the container classes in the C++ library from multiple threads (because the contract doesn't say that you can).

 

A failure to appreciate software contracts can result in a myriad of different bugs, including various and sundry security bugs.  In my experience, most subtle, hard-to-diagnose bugs ultimately turn out to be caused by a misunderstanding about the contract associated with a function.

For example, it's long been known that strcpy is a haven for security bugs.  One of the naive suggestions for fixing strcpy bugs is to simply replace the calls to strcpy with calls to strncpy.  Unfortunately, in many ways the strncpy API is just as bad as strcpy: when the source is shorter than the length provided, it pads the rest of the destination with null characters, and when the source is at least as long as the length provided, it doesn't null-terminate the destination at all. But most people looking for a "safe" replacement for strcpy will ignore that part of the contract and thus introduce different security bugs while trying to fix existing problems (according to Michael Howard, this mistake has happened more than once in the wild).
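Here's a small sketch of the two parts of the strncpy contract that tend to get overlooked. The helper names (is_terminated, naive_copy_is_terminated, short_copy_is_padded, copy_terminated) are mine, made up for illustration; only std::strncpy, std::memchr, std::memset and std::strlen come from the standard library. The last helper shows one possible way to get a terminated result, under the assumption that silently truncating the string is acceptable to the caller (which is its own contract question).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Returns true if the first `size` bytes of `buf` contain a '\0'.
bool is_terminated(const char* buf, std::size_t size) {
    return std::memchr(buf, '\0', size) != nullptr;
}

// Source longer than the buffer: strncpy copies exactly sizeof dest
// characters and never writes a terminator.
bool naive_copy_is_terminated() {
    char dest[4];
    std::strncpy(dest, "abcdef", sizeof dest);  // dest holds 'a','b','c','d'
    return is_terminated(dest, sizeof dest);
}

// Source shorter than the buffer: strncpy fills every remaining byte
// with '\0', not just the one after the last character.
bool short_copy_is_padded() {
    char dest[8];
    std::memset(dest, 'x', sizeof dest);
    std::strncpy(dest, "ab", sizeof dest);
    for (std::size_t i = 2; i < sizeof dest; ++i) {
        if (dest[i] != '\0') return false;
    }
    return true;
}

// Hypothetical helper: a bounded copy that always terminates the
// destination. Returns false if the source had to be truncated (or if
// there was no room for even a terminator).
bool copy_terminated(char* dest, std::size_t dest_size, const char* src) {
    assert(dest != nullptr && src != nullptr);
    if (dest_size == 0) return false;
    std::strncpy(dest, src, dest_size - 1);
    dest[dest_size - 1] = '\0';
    return std::strlen(src) < dest_size;
}
```

With these definitions, naive_copy_is_terminated() returns false and short_copy_is_padded() returns true. Any code that later treats the 4-byte buffer from the first case as a C string will read past the end of it, which is the "different security bug" the naive fix introduces.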

If the people recommending replacing strcpy with strncpy fully understood the contract for strncpy, it's likely that they wouldn't have made that mistake (or would have added more caveats).