On memory allocations larger than 64KB on 16-bit Windows


Allocating memory blocks larger than 64KB was tricky in 16-bit Windows because the nature of 16-bit segment:offset addressing meant that you could access the memory only 64KB at a time. Global memory allocations returned you a segment (or selector, if running protected mode Windows), and the memory started at offset zero in that selector. Things got complicated once you needed to read the byte that comes after offset 0xFFFF.

For the purpose of discussion, let's say that the value returned from GlobalAlloc was 0x1234. The first 64KB of the allocated memory are accessible as 1234:0000 through 1234:FFFF.

In real mode, linear addresses are calculated by taking the segment number, multiplying it by 16, and adding the offset. This means that 1234:0000 refers to linear byte 0x12340, and 1234:FFFF refers to linear byte 0x12340 + 0xFFFF = 0x2233F. The next linear byte is 0x22340, which you could access as 2234:0000.

Conclusion: When the offset wraps around, you add 0x1000 to the segment.
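
If you want to see that arithmetic spelled out, here is a tiny model in plain C. It only crunches the numbers (it doesn't touch any real memory), and the helper name is mine, not anything from Windows.

```c
#include <stdio.h>
#include <stdint.h>

/* Real-mode address arithmetic: linear = segment * 16 + offset. */
static uint32_t linear(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;
}

int main(void)
{
    printf("%05lX\n", (unsigned long)linear(0x1234, 0x0000)); /* 12340 */
    printf("%05lX\n", (unsigned long)linear(0x1234, 0xFFFF)); /* 2233F */

    /* The next linear byte, 0x22340: add 0x1000 to the segment and
       wrap the offset back to zero. */
    printf("%05lX\n", (unsigned long)linear(0x1234 + 0x1000, 0x0000)); /* 22340 */
    return 0;
}
```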

In standard mode, linear addresses are calculated by looking up the base address of the selector in the descriptor table and adding the offset. When Windows allocated a block larger than 64KB, it allocated a block of consecutive selectors, so that the first selector pointed to the first 64KB of the allocated memory, the second selector pointed to the second 64KB of the allocated memory, and so on.

Consecutive selectors do not have consecutive values, however. On the 80286, the bottom three bits of the selector are used for other purposes, so the numeric difference between consecutive selectors is actually 8. The first 64KB of the allocated memory are accessible as 1234:0000 through 1234:FFFF, and the next byte after that is available as 123C:0000.

This makes for a bit of trouble if you're writing a program that needs to run in both real mode and protected mode. When you reach the end of the first 64KB block, how much do you increment the segment/selector by to reach the next 64KB block?

Enter the __AHINCR variable.

The __AHINCR variable is a variable exported from KERNEL. In real mode Windows, the value is 0x1000. In protected mode Windows, the value is 0x0008. When your program reaches the end of a 64KB block, it uses the __AHINCR value to decide how much to increment the segment/selector by in order to reach the next 64KB block.
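
In code, that boils down to something like the following sketch, with the segment/selector and offset modeled as plain integers. (The FARPTR struct and the huge_add helper are made-up names for illustration only.)

```c
#include <stdint.h>

typedef struct {
    uint16_t seg;   /* segment (real mode) or selector (protected mode) */
    uint16_t off;   /* offset within the current 64KB window */
} FARPTR;

/* Advance a pointer into a >64KB allocation by 'bytes' bytes.
   ahincr is the value of __AHINCR: 0x1000 in real mode Windows,
   0x0008 in protected mode Windows. */
static FARPTR huge_add(FARPTR p, uint32_t bytes, uint16_t ahincr)
{
    uint32_t off = (uint32_t)p.off + bytes;
    p.seg += (uint16_t)((off >> 16) * ahincr); /* one step per 64KB boundary crossed */
    p.off  = (uint16_t)off;                    /* offset wraps within the window */
    return p;
}
```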

Most programmers never saw this variable. It was hidden inside the code generated by the compiler.

With the introduction of enhanced mode Windows, the memory manager did a little more. Enhanced mode Windows used the 80386, "Now with 32-bit registers!✨" This means that the offset portion of a selector:offset address can be a 32-bit value.

The Windows memory manager assigned the selectors to the different 64KB chunks of data in the same way that the standard mode memory manager did, but instead of setting the selector limit to 0xFFFF, it set the selector limit to extend to the entire remainder of the block. The first selector's limit was the entire memory block. The second selector's limit was the memory block minus 64KB. The third selector's limit was the memory block minus 128KB. And so on until all the selectors were exhausted.

This arrangement meant that if you could convince your compiler to do it (or if you wrote code in assembly language directly), you could leave the selector alone and operate solely on the offset portion.
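
To see why leaving the selector alone works, here is a small arithmetic model of the tiled layout. It simulates the selector bases with plain integers instead of real descriptor-table lookups; the numbers and names are purely illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Selector i of the run allocated for the block has base
   block_base + i * 65536, and its limit extends to the end of the
   block, so the same byte can be named two different ways. */
static uint32_t linear_from(uint32_t block_base, unsigned tile, uint32_t offset)
{
    return block_base + tile * 0x10000UL + offset;
}

int main(void)
{
    uint32_t block_base = 0x00200000UL;  /* arbitrary example base address */
    uint32_t n = 0x0002ABCDUL;           /* byte index into a >64KB block */

    /* Standard-mode style: hop to the right selector, 16-bit offset. */
    uint32_t a = linear_from(block_base, n >> 16, n & 0xFFFF);

    /* Enhanced-mode shortcut: stay on the first selector, 32-bit offset. */
    uint32_t b = linear_from(block_base, 0, n);

    assert(a == b);   /* both name linear address block_base + n */
    return 0;
}
```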

Windows 95 took advantage of this. The languages team produced a special version of the compiler that, with proper coaxing and appeasement, could be convinced to access memory using 32-bit offsets from a 16-bit selector, provided you declared the selector and the pointer in just the right way.

No lesson today. Just some reminiscing.

Comments (20)

  1. Brian_EE says:

    The linked story in the last paragraph is even worse. The computer shown looks to have been in the UK (judging from the outlet and plug), so the “bad” linker would also have had a round-trip network delay from west-coast US to the UK.

    1. yukkuri says:

      It is highly unlikely that’s the actual computer in question.

  2. dead1ne says:

    Thinking back on this stuff makes me wonder if something will ever come along that makes page tables seem archaic and convoluted.

    1. Joshua says:

      It’s called JIT.

  3. Piotr says:

    Is there a way to figure out the multiplier when analyzing a memory dump from a different (unknown) machine in WinDBG?

    1. Um, 16-bit Windows 3.1 can’t run WinDBG.

      1. Yuhong Bao says:

        Or even crash dumps at all I think. The best that was common was Dr. Watson.

      2. Piotr says:

        oh, I assumed it was dependent on the processor, not the OS

  4. mikeb says:

    MS-DOS C compilers had an option to support a “huge” memory model or a declaration specifier that could be used to declare particular pointers as dealing with a “huge” memory allocation potentially larger than 64KB. However, my recollection is that those real mode compilers handled pointer increments/addition a little differently than Raymond described. Instead of adjusting the segment portion of the address when the offset would roll over at 0xffff, these compilers would keep the offset within 0x0000-0x000f and would increment the segment portion when the offset reached 0x0010. However, my memory could be wrong and I can’t find anything on the web that backs this up.
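
    In other words, something like this (a sketch from memory, so take the details with a grain of salt; the struct and function names are mine, not the compiler runtime’s):

    ```c
    #include <stdint.h>

    typedef struct { uint16_t seg, off; } FARPTR;

    /* Keep the offset in 0x0000..0x000F and fold everything else into
       the segment (each segment step covers 16 linear bytes). */
    static FARPTR huge_normalize(FARPTR p)
    {
        p.seg += p.off >> 4;
        p.off &= 0x000F;
        return p;
    }
    ```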

    However, when looking for some support for this memory I was able to solve a mystery that Raymond’s article brought up – why is the name of the variable that holds the segment increment “__AHINCR”? The “INCR” part is obvious enough, but what does “AH” have to do with incrementing segment registers? My web search reminded me that the MS-DOS compilers had options to configure various memory addressing models so you could optimize how much (or how little) the compiler had to worry about the segment registers:

    /AS – small model (both code and data in one segment and DS==SS)
    /AC – compact model (single code segment, multiple data segments)
    /AM – medium model (multiple code segments, single data segment)
    /AL – large model (multiple code and data segments)
    /AH – huge model (multiple code segments, huge data segments) <– source of "AHINCR" name

    1. Lars says:

      That’s some nice digging there. Good work!

    2. ErikF says:

      If memory serves, huge pointers were normalized in the way you described so that pointer arithmetic was easier (you didn’t get strange oddities like 1800:8000 == 2000:0000!). This normalization of course didn’t work at all with protected-mode Windows IIRC because segments != selectors (and I’m fairly sure they also messed up real-mode swapping schemes because the memory managers relied on the exact representation of a memory address for their allocators).

      Just a small nitpick: the tiny model was the one where CS=DS=SS (aka .COM format); small model had separate code and data/stack segments.

      1. ErikF says:

        @mikeb: I think the nitpick wasn’t warranted; I just parsed your description of the small model incorrectly. Sorry about that.

        1. mikeb says:

          I can see that what I wrote for small model isn’t exactly crystal clear. What I remember is that later DOS compilers added a “/AT” option for tiny model where all segments were the same. I think the tiny model created .com binaries directly instead of having to use the exe2bin utility to do the conversion. Or something like that.

    3. Anon says:

      Nice. I always wondered what the AH in _AHINCR stood for.

  5. Neil says:

    As I recall, in real mode Windows you also had to GlobalLock your handle before you could access it; only in protected mode was the selector related to the handle (on my copy of Windows 95, 16-bit global handles happen to end in 6 or E, while the selector from the return value of GlobalLock happens to end in 7 or F).

    1. Antonio Rodríguez says:

      Real-mode Windows did segment-based virtual memory in software (classic Mac OS did the same, as did DOS applications that used overlays). When you allocated memory, you got a handle. When you needed to access it, you locked the handle (getting a pointer to the actual data), accessed the memory, and then unlocked the handle. The OS kept a usage count on every segment, so it knew which ones could be discarded (swapped to disk) if it needed to free memory, or moved around if it needed to compact the memory. Also, if you locked the handle of a discarded segment, the OS could recall it from disk (possibly discarding other segments, or compacting memory in the process) before returning you the pointer. Note that there weren’t pages, so whole segments were swapped. But at the time, with typical mass storage access times measured in hundreds of milliseconds, maybe it was better.
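
      For concreteness, the allocate/lock/use/unlock cycle looked roughly like this in Win16 C (just a sketch with minimal error handling; the function name is made up):

      ```c
      #include <windows.h>

      void UseBigBlock(void)
      {
          HGLOBAL hmem;
          char FAR *p;

          /* Ask for a moveable block larger than 64KB; what comes back
             is a handle, not a pointer. */
          hmem = GlobalAlloc(GMEM_MOVEABLE, 100000L);
          if (hmem == NULL)
              return;

          p = (char FAR *)GlobalLock(hmem);   /* pin the block, get a pointer */
          if (p != NULL) {
              p[0] = 42;                      /* ... access the memory ... */
              GlobalUnlock(hmem);             /* let it move around again */
          }

          GlobalFree(hmem);                   /* done with the allocation entirely */
      }
      ```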

      All the function import list patching Raymond has talked about in the past was there to allow you to call a function in a discarded segment. These complex mechanisms allowed Windows to trap the call, recall the offline segment, and then go on with the call as if nothing had happened.

      1. Anon says:

        I never used Macs before OS X, but amusingly it seems like Apple’s engineers made some very questionable design choices in their scheme:

        https://en.wikipedia.org/wiki/Mac_OS_memory_management#Fragmentation

        Palm OS and 16-bit Windows use a similar scheme for memory management, but the Palm and Windows versions make programmer error more difficult. For instance, in Mac OS, to convert a handle to a pointer, a program just de-references the handle directly, but if the handle is not locked, the pointer can become invalid quickly. Calls to lock and unlock handles are not balanced; ten calls to HLock are undone by a single call to HUnlock.[6] In Palm OS and Windows, handles are an opaque type and must be de-referenced with MemHandleLock on Palm OS or Global/LocalLock on Windows. When a Palm or Windows application is finished with a handle, it calls MemHandleUnlock or Global/LocalUnlock. Palm OS and Windows keep a lock count for blocks; after three calls to MemHandleLock, a block will only become unlocked after three calls to MemHandleUnlock.

        This use of high bits in addresses seems like a really bad idea too:

        https://en.wikipedia.org/wiki/Mac_OS_memory_management#32-bit_clean

        Because memory was a scarce resource, the authors of the Mac OS decided to take advantage of the unused byte in each address. The original Memory Manager (up until the advent of System 7) placed flags in the high 8 bits of each 32-bit pointer and handle. Each address contained flags such as “locked”, “purgeable”, or “resource”, which were stored in the master pointer table. When used as an actual address, these flags were masked off and ignored by the CPU.[4]
        While a good use of very limited RAM space, this design caused problems when Apple introduced the Macintosh II, which used the 32-bit Motorola 68020 CPU. The 68020 had 32 physical address lines which could address up to 4 GB (2^32 bytes) of memory. The flags that the Memory Manager stored in the high byte of each pointer and handle were significant now, and could lead to addressing errors.
        In theory, the architects of the Macintosh system software were free to change the “flags in the high byte” scheme to avoid this problem, and they did. For example, on the Macintosh IIci and later machines, HLock() and other APIs were rewritten to implement handle locking in a way other than flagging the high bits of handles. But many Macintosh application programmers and a great deal of the Macintosh system software code itself accessed the flags directly rather than using the APIs, such as HLock(), which had been provided to manipulate them. By doing this they rendered their applications incompatible with true 32-bit addressing, and this became known as not being “32-bit clean”.

        Compared to that, Win16’s scheme doesn’t seem all that bad – it’s only as nasty as it needs to be given the constraint that it needs to work in real mode or 286 protected mode. Then again, I never wrote much Win16 code.

        Still, you can see why both Apple and Microsoft decided on a clean break for 32-bit code – once you have a flat 32-bit address space and an MMU, you can hide the fact that blocks need to move or leave physical memory from the virtual-only view applications see. And even better, it doesn’t really matter if the physical address space fragments – the magic of virtual memory hides it. And of course applications have their own virtual address space and can’t overwrite anything outside it.

        I worked on a few vxWorks systems with no virtual memory support and they were very prone to dying due to memory fragmentation. Or dying due to one process overwriting another.

  6. yukkuri says:

    Laughing at that sparkle emoji!

  7. keal says:

    “so the numeric difference between consecutive selectors is actually 8. The first 64KB of the allocated memory are accessible as 1234:0000 through 1234:FFFF, and the next byte after that is available as 123C:0000”

    Would you ever have a selector 1234 if they come in steps of 8?

    1. They come in steps of 8, but it doesn’t start with 0.
