Software Contracts, Part 6: Annotations


In short, an "annotation" is an addition to the source code for a program that allows an external translator to enforce the program's contract.

Way back in the beginning, the ONLY way to discover a function's contract was by its documentation.  A function's contract looked like this (from the MS-DOS 2.0 reference manual):

Get Time (Function 2CH)
    Call:
    AH = 2CH

    Return:
    CH
        Hour (0-23)
    CL
        Minutes (0-59)
    DH
        Seconds (0-59)
    DL
        Hundredths (0-99)

Function 2CH returns the current time set in the operating system as binary numbers in CX and DX [...] Depending on how your hardware keeps time, some of these fields may be irrelevant.  As an example, many CMOS clock chips do not resolve more than seconds.  In such a case the value in DL will probably always be 0.

There was no other mechanism to let you know what a function expected (OK, your program would crash if you passed in invalid input, but that isn't particularly useful).

That's the problem with documentation-only contracts - they're extraordinarily fragile.  As a result, starting way back then (in the 1950s and 1960s) language designers started adding annotations to computer languages to help ensure that the contract for functions was met.

You've all seen these annotations - for instance, the first nontrivial program in my copy of "A Practical Introduction to Pascal", published in 1978, has:

PROCEDURE DRAWALINE(LENGTH : INTEGER);
   VAR I : INTEGER;
   BEGIN
   FOR I := 1 TO LENGTH DO
      WRITE('-');
   WRITELN
   END;

The annotation is, of course, the declaration of "length" as an integer.  When languages added type information to function prototypes, that type information functioned as an annotation that allowed a software translator (the compiler) to enforce the contract.  It turns out that the compiler was in an ideal position to do so - by simply refusing to convert your high level program to machine code, it could ensure that you didn't violate the function's contract (too much).

Now the enforcement of the "type" contract varied from language to language.  For example the Pascal compiler was quite strict about its enforcement of annotations - if the parameters provided to a function didn't strictly match those of the function it refused to compile the code.  The "C" language compiler, on the other hand, was rather lax about enforcing parameters - for instance, the caller of a function didn't actually HAVE to match the number of parameters in the declaration of the function.  Unfortunately this rather lackadaisical attitude meant that it was easy to write some rather awkward code.  In addition, it turns out that it was somewhat difficult to translate C's semantics to some computer architectures (RISC machines, in particular).  As a result, in the 1990's as a part of the standardization of the C language, it also was changed to be strongly typed.

Type safety isn't a requirement - many languages are essentially type-neutral (most scripting languages, for example, appear to be typeless or have relatively weak type systems) - but that just means that it is harder to enforce strong contracts.

Next: Language Annotations: Beyond Simple Types

Comments (13)

  1. Anonymous says:

    > Way back in the beginning, the ONLY way to discover a

    > function’s contract was by it’s documentation.

    Way back in the beginning, the only desirable way was by its documentation.  Funny how that hasn’t changed.

    Way back in the beginning, the only really accurate ways were by reading the source code and/or disassembly.  Funny how that hasn’t changed.

    > […] published in 1978 […]

    > The annotation is, of course the declaration of "length" as an

    > integer.

    Actually that’s part of the language itself, not an add-on.  C was the same since around 1970 though its original syntax was less readable.  Fortran was the same since around 1958 though the original method of type declaration was abysmal (the first letter of the identifier) and a later syntax since around 1966 was less readable than C’s version.

    > The "C" language compiler, on the other hand, was rather lax

    > about enforcing parameters – for instance, the caller of a

    > function didn’t actually HAVE to match the number of

    > parameters in the declaration of the function.

    That was true, though most compilers were friendly enough to give a warning if the caller and callee were in the same source file.  If it was a call to an extern then the compiler usually couldn’t do the check — just as it usually can’t today.  The first time I saw a linker do a good start towards this kind of checking was around 1980.

  2. Anonymous says:

    "Way back in the beginning, the ONLY way to discover a function’s contract was by it’s documentation."

    I’d say this is still the only way to discover a function’s contract, in nearly all programming languages. (Extensions, preprocessors & assorted libraries excepted).

    Prototypes (or equivalent) can provide information about *part* of the contract for a function, but they almost never describe the whole contract. To take an example from the standard C library:

    double sqrt(double);

    OK, this tells you that you have a function that takes a double-precision float, and returns another one. You can also probably figure out that it does a square root from the name (or because you’ve been looking for it).

    But what do you get if you pass in -1.0?

    a) Program termination (e.g. via SIGFPE, or some other mechanism)

    b) NaN, with errno = EDOM

    c) Implementation-defined behaviour

    d) Undefined behaviour

    e) Other

    The prototype is unable to tell you about that part of the contract. Yes, there are some languages that might be able to do so, but for most, the only place the contract is properly defined (and therefore the only place you can discover it) is in the documentation.

  3. Anonymous says:

    Static type checking is the easy part of the contract. It’s like the paragraph of definitions at the top of every contract that identifies the party of the first part. The computer linguists and compiler writers have that part nailed, at least when they want it to be nailed.

  4. Anonymous says:

    Having done a bit of 3D programming, I’ve reached the conclusion we (the programming world) have a lack in type safety when it comes to limiting (floating point) numbers to 0.0-1.0 (inclusive). In OpenGL it’s done by argument name or documentation. In DX… I don’t know if it’s even done. Anyway, there is a real and easily seen need for a type only allowing 0.0-1.0 (that still could freely be converted to a float or double).

    I’m sure Larry would also have wanted this doing the Vista audio stuff.

  5. Anonymous says:

    >But what do you get if you pass in -1.0?

    If you’re lucky? i 🙂

  6. Anonymous says:

    "As a result, in the 1990’s as a part of the standardization of the C language, it also was changed to be strongly typed."

    April is still a few months away Larry. C standardisation was done under the ANSI umbrella in the 1980s and largely consisted of formalising improvements that were already becoming widespread in vendor compilers. ANSI function prototypes are just an optional, though very useful, feature of the C standard.

    C remains a weakly typed language today, you can pass a character as an argument to a function which takes an integer and it will be silently promoted. If you omit the ANSI prototype you can pass three integers as the parameters of a function that’s supposed to take a string and two floats.

  7. Anonymous says:

    Well, from the strict mathematical point of view, the square root is not even a function! It’s just a mapping. Sqrt(4) maps to +2 and -2.

    What is then sqrt API function going to return? I don’t know. There’s no explicit contract.

    But wait. There is "the least surprise" theorem.

    So hopefully everybody makes the right assumption:  sqrt(x) >= 0 for all x >=0;

    Oh, it was easy now.

    But sometimes it needn’t be so obvious.

  8. Anonymous says:

    Pavel > There *is* an explicit contract. You just *have to read the documentation*.

    According to the C standard[0], section 7.12.7.5, sqrt() on a negative value returns a "domain error". According to section 7.12.1, this is indicated with an implementation-defined value, and either a) errno is set to EDOM, or b) the invalid floating point exception is raised.

    POSIX says roughly the same thing[1], but requires that *if* the implementation supports NaN then that be the returned value.

    [0] http://www.open-std.org/JTC1/SC22/WG14/www/docs/n1124.pdf

    [1] http://www.opengroup.org/onlinepubs/009695399/functions/sqrt.html

  9. Anonymous says:

    Saturday, January 20, 2007 8:26 AM by Nick Lamb

    > C remains a weakly typed language today, you can pass a

    > character as an argument to a function which takes an

    > integer and it will be silently promoted.

    Three parts of your sentence have nothing to do with each other.

    Nearly any use of a char in nearly any expression will cause the char to be promoted to int (or on obscure architectures unsigned int) because the standard says so.  This has nothing to do with weak typing and only coincidence with function calls.

    You can’t pass a char as an argument to a function.  You can pass an int (or an unsigned int) whether or not the thing came from promoting a char.  In a function’s prototype you can declare a parameter to be a char, in which case the passed value will be demoted before being assigned to the local char variable.  Of course most uses of that char variable will just get promoted again, though if the down-casting destroyed the value then the promotion won’t recover the original value.

    > If you omit the ANSI prototype you can pass three integers

    > as the parameters of a function that’s supposed to take a

    > string and two floats.

    Your use of "can" is a word game.  Your coding will evade a compile-time error in order to get a run-time crash (if your customer is lucky) or run-time misbehaviour (if your customer is unlucky).  Sure you "can" defer the error, but no you "can" not get correct results out of it.

  10. Anonymous says:

    Adam > What documentation do you mean?

    I read the MSDN online documentation. Twice.

    The Remark section says:

    "C++ allows overloading, so users can call overloads of sqrt that take float or long double types. In a C program, sqrt always takes and returns double."

    The Return Values section says:

    "The sqrt function returns the square-root of x. If x is negative, sqrt returns an indefinite, by default."

    Neither of these two sections says whether the return values for x > 0 are always positive or are always negative or are sometimes positive, sometimes negative.

    Yes. There is an Example and its Output section which says:

    "The square root of 45.35 is 6.73"

    Does this result imply that sqrt(x)>0 for all x>0 ?

    Of course not! (Don’t forget, I’m still in "strict mathematical point of view")

    My favorite question for such cases is:

    "Number 2 is a prime. Does it mean that all even numbers are primes?"

    P.

  11. Anonymous says:

    My last post on contracts introduced the idea that a languages type system can be used as a mechanism

  12. Anonymous says:

    Pavel > Sorry, I misread your post.

    It is in the documentation – you just need to read the correct documentation 🙂

    From the C standard, previously linked:

    "The sqrt functions compute the nonnegative square root of x."

  13. Anonymous says:

    Ok, it’s taken 7 other posts, but we’ve finally gotten close to where I wanted to be when I started this
