Software Contracts, Part 4: Documentation

 The other day, I started talking about software contracts.  Today I'd like to start talking about how contracts are embodied.

The primary place people go to look for a function's explicit contract is the documentation for that function, either in the published documentation or in the header for the function (that's why the routine header I mentioned way back here is so important), and why it's so critical to make sure that the header accurately reflects the function.

So how are software contracts realized?  They can take several forms - contracts are either explict or implicit.  Explicit contracts are spelled out in the function's declaration or documentation.

So let's take a look at an example of an embodiment of a software contract - the documentation for the Win32 ReadFile API.  According to the documentation, the function reads from a file starting at the current file pointer.  It also indicates that the function has synchronous and asynchronous modes of operation.

There are 6 parameters to ReadFile: hFile, lpBuffer, nNumberOfBytesRead, lpNumberOfBytesRead, and lpOverlapped.  I'm doing to discuss the contracts for the three of them.

The documentation for hFile says (in part):

[in] A handle to the file to be read.

The file handle must be created with the GENERIC_READ access right. For more information, see File Security and Access Rights.

For asynchronous read operations, hFile can be any handle that is opened with the FILE_FLAG_OVERLAPPED flag by the CreateFile function, or a socket handle returned by the socket or accept function.

The documentation for lpBuffer says:

[out] A pointer to the buffer that receives the data read from a file

The documentation for lpNumberOfBytesRead says:

[out] A pointer to the variable that receives the number of bytes read.

ReadFile sets this value to 0 (zero) before doing any work or error checking. If this parameter is 0 (zero) when ReadFile returns TRUE on a named pipe, the other end of the message mode pipe calls the WriteFile function with nNumberOfBytesToWrite set to 0 (zero).

If lpOverlapped is NULL, lpNumberOfBytesRead cannot be NULL.

If lpOverlapped is not NULL, lpNumberOfBytesRead can be NULL.

If this is an overlapped read operation, you can get the number of bytes read by calling GetOverlappedResult.

If hFile is associated with an I/O completion port, you can get the number of bytes read by calling GetQueuedCompletionStatus.

If I/O completion ports are used and you are using a callback routine to free the memory that is allocated to the OVERLAPPED structure pointed to by the lpOverlapped parameter, specify NULL as the value of this parameter to avoid a memory corruption problem during the deallocation. This memory corruption problem causes an invalid number of bytes to be returned in this parameter.

So let's look at what the contract for ReadFile says about these three parameters.

First the explicit statements:

  1. The caller is responsible for providing a HANDLE input value that points to a opened Win32 handle.  The object to which the handle points has to have been opened for GENERIC_READ[1].
  2. The caller is responsible for providing a pointer to valid memory for the lpBuffer parameter.  That buffer MUST be of length at least nNumberOfBytesToRead in length.
  3. The caller is responsible for providing EITHER a valid lpOverlapped parameter (which points to valid memory of length sizeof(OVERLAPPED)) or a a valid lpNumberOfBytes (which points to valid memory of length sizeof(DWORD)) or both (that comes from the 2nd "If lpOverlapped..." clause.
  4. ReadFile guarantees that the value of lpNumberOfBytesRead is either 0 (on failure) or the number of bytes read (on success).  Under all circumstances, the lpNumberOfBytesRead parameter will contain the number of bytes read.

Now the implicit parts of the contract:

  1. The HANDLE, lpBuffer, lpNumberOfBytesRead (and lpOverlapped) values need to remain valid for the lifetime of the API call (more on that in a smidge).
  2. The ReadFile API will write at most nNumberOfBytesToRead to lpBuffer.  In other words, lpBuffer[nNumberOfBytesToRead] will remain unchanged by the API.

 

What things can you infer about the implementation as represented in the ReadFile contract:

  1. It's probably a bad idea to specify both an lpOverlapped and a lpNumberOfBytesRead parameter if you're using I/O completion ports - this comes from the 6th comment which discusses I/O completion ports and indicates that there's some mechanism that could cause memory corruption after completion.  There must be something that happens when I/O associated with a completion port runs.

 

What is NOT included in the ReadFile contract?

  • Performance guarantees - there is no indication in the contract for ReadFile about how long the operation should take, programs can make no assumptions about this.
  • If the function fails, there are no guarantee about any side effects. For ReadFile, that's not a big deal, but for WriteFile it's more relevant - for WriteFile, the API doesn't include any guarantees of the state of the region of the file being written.  In addition, if ReadFile completes with an error, nothing can be assumed about the contents of the memory - it might have been updated with some data from the file, it might not[2].
  • There is no indication of the state of the contents of the buffer while the call to ReadFile is in progress - the only guarantee is that at the time the API call completes successfully, the contents of the buffer will contain the relevant data from the file.
  • What happens when multiple threads call the synchronous file APIs simultaneously.  This was explicitly left out of the contract because the results are undefined when you do it.

 

One interesting discussion that came up w.r.t. the PlaySound API is that it's often difficult to know the lifetime of the API call.  ReadFile has the same issues - there's no way of knowing how long the call to ReadFile might take (it could literally be hours).  If the call to ReadFile is async, it means that the memory pointed to by the lpBuffer parameter must remain valid until the read completes.

The good news is that for ReadFile it IS possible to know if the API has completed - for instance, you can use the HasOverlappedIoCompleted API to determine if an async read has completed (there are multiple other mechanisms that are well documented).  However it's critical that the memory pointed to by the lpOverlapped, lpNumberOfBytesRead and lpBuffer  parameters all remain valid until the time that the API call completes.

Next: Annotations as a form of software contract.

 

[1] Note: this is actually wrong.  ReadFile requires the file handle be open for FILE_READ_DATA for disk files, GENERIC_READ grants more rights than is necessarily required.

[2] Note: For some kinds of errors, there actually IS a guarantee explicitly included in the contract - when reading from a named pipe in message mode, the ReadFile API can return a distinguished error (ERROR_MORE_DATA) which indicates that some data in the buffer may be valid.

 

Edit: Added comment about simultaneous operations not being in the contract.  Also fixed a really stupid typo.