Why is CLIPFORMAT defined to be a WORD rather than a UINT?


Commenter Ivo wants to know if the Register­Clipboard­Format function returns a UINT, why is the CLIP­FORMAT data type defined to be a WORD? Since a WORD is smaller than a UINT, you have to stick in a cast every time you assign the result of Register­Clipboard­Format to a CLIP­FORMAT.

Rewind to 16-bit Windows. Back in those days, a UINT and a WORD were the same size, namely, 16 bits. As a result, people got lazy about the distinction. Six of one, a half dozen of the other. (People are lazy about this sort of distinction even today, assuming for example that UINT and DWORD are the same size, and in turn forcing UINT to remain a 32-bit integer type even on 64-bit Windows.) The Register­Clipboard­Format function came first, and when the OLE folks wanted to define a friendly name for the data type to hold a clipboard format, they said, "Well, a clipboard format is a 16-bit integer, so let me use a 16-bit integer." A WORD is a 16-bit integer, so there you go.

This mismatch had no effect in 16-bit code, but once Win32 showed up, you had a problem since 32-bit Windows expanded the UINT type to 32 bits. Not only does keeping a CLIP­FORMAT in a WORD create the need for all this casting, it also leaves two bytes of padding in the FORMAT­ETC structure. Strike two.

Yeah, basically, it sucks.

Comments (22)
  1. BOFH says:

    Post facto snarky comment:

    "It" being Microsoft.

    I can't believe you forgot to pre-empt that…

  2. James Curran says:

    So, which was the "real" problem? Should Register­Clipboard­Format have been defined as returning a WORD, or should CLIP­FORMAT have been defined as a UINT? I'd lean towards the former. (Do we really need more than 65,000 clipboard formats? Will Klingon be one of them?)

    And what would be the downside of… well… fixing it? I'm assuming Register­Clipboard­Format really is now returning 32 bits, even if only 16 of them are significant, so we'd have to change CLIP­FORMAT to a UINT. Other than forcing people to remove a bunch of casts they didn't want to write in the first place, what's the problem?

  3. barbie says:

    @James Curran: I would say the "fault" lies with the one not respecting the initial contract (i.e., first come, first served). Here, the OLE folks did not respect the return type of RegisterClipboardFormat.

    And it's not that easy to fix after the fact. CLIPFORMAT is 2 bytes. Changing it would break any interface that uses the type.

  4. Ivo says:

    I think the solution is to have RegisterClipboardFormat, EnumClipboardFormats, etc. return CLIPFORMAT (regardless of the number of bits). The Windows API is usually better about type safety. For example, GlobalAddAtom returns ATOM and not WORD.

    [Well, for one thing that would be a layering violation. (Lower-level interface dependent on a type defined by a higher-level interface.) And for another thing, changing the return type of a function is a breaking change, so you have to worry about the compatibility consequences. For example, old code which calls the revised Register­Clipboard­Format function will now receive garbage in the upper 16 bits. -Raymond]
  5. Adam Rosenfield says:

    I could envisage a situation where a computer needed more than 65k clipboard formats. Imagine, if you will, a poorly-designed piece of software that registers a dozen or so different clipboard formats for its purposes. Suppose further that to avoid potential compatibility issues, it uses different clipboard format names in each version (say, because V2 added a new field to the binary clipboard format, so you don't want it to crash if users copy and paste between V1 and V2 in either direction).

    Now imagine that the developers of this software have gone through thousands of versions of it (several builds per day over years), with each build adding new clipboard formats but not deleting the old ones (either due to a bug, or because the developers are lazy and don't fully uninstall each build). At this rate, they could potentially exceed 65k clipboard formats after a reasonable period of time.

    Granted, that's an extremely contrived and unlikely scenario — in that case, running out of clipboard formats could be a signal to the developers that maybe they're doing something not quite right. A better solution would be to put the versioning into the clipboard data itself instead of the format name.

  6. JeroeN Mostert says:

    @Adam: Why is it that every time someone starts off with "imagine, if you will…" they proceed by describing a scenario best left to the Twilight Zone? :-)

  7. alegr1 says:

    @Adam R:

    Clipboard format registration is desktop-local and ephemeral. It's not persistent across a reboot.

  8. Zan Lynx says:

    @JeroeN: Whenever you imagine a scenario so awful, so hideous, so twisted that you believe it could not actually happen — plan for it happening. It will happen. Someone will write it.

  9. Finn says:

    Why would anyone think C's types were a good idea anyway? Is there some advantage to "int" being 32-bit on one platform and 48-bit on another? Were they really that naïve, to think that you could just compile the source on a different platform, and have it magically work? The mind boggles.

    Every day I find new reasons to praise the interpreted languages.

  10. Joshua says:

    @Finn: It largely worked and the user-mode code of various UNIX derivatives was very nearly source portable to wildly different architectures.

  11. Gabe says:

    Finn: Why would you want a 32-bit "int" on a 16-bit 8088, an 18-bit PDP-7, a 60-bit CDC 6600, or a Burroughs 7700 that has 39 bits for an integer plus a sign bit?

  12. Jeroen Mostert says:

    "Is there some advantage to "int" being 32-bit on one platform and 48-bit on another? Were they really that naïve, to think that you could just compile the source on a different platform, and have it magically work?"

    They were naive indeed, for thinking programmers could ever care enough to write their code with sufficient care to make this "magic" happen. But it's a failing of the programmers nevertheless, not of the language designers. Yes, Virginia, you can write algorithms, realistic ones even, that do not assume a fixed size for integral types.

    In some worlds every int is 32-bit. These are not the worlds of C. Those who scorn the wisdom of C in this are not worthy to receive the blessings of the same. Verily.

  13. Simon Buchan says:

    The comment about UINT remaining 32 bit seems strange, did you mean ULONG? Otherwise you would have no 32-bit integer! (On AMD64 at least, 64-bit integers are still (slightly?) slower)

  14. pdp says:

    Why would you want 2^18 clipboard formats if Windows were ported to the PDP-7?

    This looks like premature optimization indeed.

  15. Silly says:

    The documentation for RegisterClipboardFormat says it returns values only in the range C000-FFFF, so there are only 16K custom formats available per running instance of Windows. Oh dear! But at least I only need to use 14 bits in the foo bitfield of the #pragma packed Foo structure that gets persisted out to disk in MyCoolApp (because some other members do need to be persisted and this member is just slammed into the thing for convenience).

  16. alegr1 says:

    @Silly:

    If you ever want to do that in real life, remember that it doesn't make sense to persist the clipboard format tag, because it's not persistent.

  17. Gabe says:

    Silly: Are you sure it's 16k formats per running instance of Windows, and not per window station? I haven't tried it myself, but in theory there's no reason that the clipboard formats in your Terminal Server session would interfere with those in my session.

  18. alegr1 says:

    @Gabe:

    It's not just per window station, it's per desktop. A window station can contain multiple desktops. USER objects (windows, clipboard, hooks, menus) are isolated inside a desktop.

  19. Gabe says:

    alegr1: From msdn.microsoft.com/…/ms687096(v=VS.85).aspx

    "A window station contains a clipboard, an atom table, and one or more desktop objects."

    Are you suggesting that MSDN is wrong here? You can actually test it yourself. Use a program (like Sysinternals' Desktops.exe) to create multiple desktops. Then copy something to the clipboard, switch to a different desktop, and observe that you can paste what you copied from the previous desktop. You will see that your desktops all share a single clipboard.

  20. Wondering says:

    @Gabe: "Finn: Why would you want a 32-bit "int" on a 16-bit 8088, an 18-bit PDP-7, a 60-bit CDC 6600, or a Burroughs 7700 that has 39 bits for an integer plus a sign bit?"

    Why? Because this eliminates entire classes of errors (overflows, bit-logic errors) caused by different sizes of variables — especially in C, where you "need" to cast variables now and then.

    It makes much more sense to work with INT32 even on an 8088 (if it's included in the list of target systems) than to constantly keep in mind that your int variables can shrink to 16 bits on one specific platform.

    Anyway, who is developing software for such ancient or esoteric architectures today? Any architecture that does not handle data at a granularity of power-of-two bytes is so marginal that it cannot justify any thought/work/discussion anymore. Using such CPUs even for embedded systems is just silly.

  21. Gabe says:

    Wondering: If avoiding integer overflow is more important to you than performance, I highly suggest finding a language other than C. The whole point of C is that it's sort of a high-level assembly language. If you can't access your machine's native integer size, what's the point?

  22. Wondering says:

    "If you can't access your machine's native integer size, what's the point?"

    That does not matter. For example, you write your word processor using INT32 for coordinates, counters, etc. If the target machine has 64 bits and you recompile your project for 64 bits, nothing bad happens. The display will not magically have more pixels than before. You don't need to load different graphics formats than before (PNG is still PNG, JPG is still JPG), you need to import the same foreign document formats as before, and so on and so on….

    "Wondering: If avoiding integer overflow is more important to you than performance, I highly suggest finding a language other than C."

    Using C is not (always) about squeezing out the last bit of performance, as you suggest. It is just a programming language that lets you build complex and abstract things like any other C-style language. Coding and testing are expensive. If you plan to port the code of a general-purpose application (office suite, web server, web browser, …) between platforms and you are working for money (that is, you have limited time), you'd better use integer types that do not change arbitrarily between platforms.

Comments are closed.