Uninitialized garbage on ia64 can be deadly

On Friday, we talked about some of the bad things that can happen if you call a function with the wrong signature. The ia64 introduces yet another possible bad consequence of a mismatched function signature which you may have thought was harmless.

The CreateThread function accepts a LPTHREAD_START_ROUTINE, which has the function signature

DWORD CALLBACK ThreadProc(LPVOID lpParameter);

One thing that people seem to like to do is to take a function that returns void and cast it to a LPTHREAD_START_ROUTINE. The theory is, "I don't care what the return value is, so I may as well use a function that doesn't have a return value. The caller will get garbage, but that's okay; garbage is fine." Here's one web page that contains this mistake:

void MyCritSectProc(LPVOID /*nParameter*/)  
{ ... }

hMyThread = CreateThread(NULL, 0,
             (LPTHREAD_START_ROUTINE) MyCritSectProc,  
             NULL, 0, &MyThreadID);  

This is hardly the only web page that supplies buggy sample code. Here's sample code from a course at Old Dominion University that makes the same mistake, and sample code from Long Island University. It's like shooting fish in a barrel. Just google for CreateThread LPTHREAD_START_ROUTINE and pretty much all the hits are people calling CreateThread incorrectly. Even sample code in MSDN gets this wrong. Here's a whitepaper that misdeclares both the return value and the input parameter in a manner that will crash on Win64.

And it's all fun until somebody gets hurt.

On the ia64, each 64-bit register is actually 65 bits. The extra bit is called "NaT" which stands for "not a thing". The bit is set when the register does not contain a valid value. Think of it as the integer version of the floating point NaN.

The NaT bit gets set most commonly from speculative execution. There is a special form of load instruction on the ia64 which attempts to load the value from memory, but if the load fails (because the memory is paged out or the address is invalid), then instead of raising a page fault, all that happens is that the NaT bit gets set, and execution continues.

All mathematical operations on NaT just produce NaT again.

The load is called "speculative" because it is intended for speculative execution. For example, consider the following imaginary function:

void SomeClass::Sample(int *p)
{
  if (m_ready) DoSomething(*p);
}

The assembly for this function might go like this:

      alloc r35=ar.pfs, 2, 2, 1 // 2 input, 2 locals, 1 output
      mov r34=rp                // save return address
      ld4 r32=[r32]             // fetch m_ready
      ld4.s r36=[r33];;         // speculative load of *p
      cmp.eq p14, p15=r0, r32   // m_ready == 0?
(p15) chk.s r36=[r33]           // if not, validate r36
(p15) br.call rp=DoSomething    //         call
      mov rp=r34;;              // restore return address
      mov.i ar.pfs=r35          // clean registers
      br.ret rp;;               // return

I suspect most of you haven't seen ia64 assembly before. Since this isn't an article on ia64 assembly, I'll gloss over the details that aren't relevant to my point.

After setting up the register frame and saving the return address, we load the value of m_ready and also perform a speculative load of *p into the r36 register. Notice that we are starting to execute the "true" branch of the "if" statement before we even know whether the condition is true! That's why this is known as speculative execution.

(Why do this? Because memory access is slow. It is best to issue memory accesses as far in advance of their use as possible, so you don't sit around stalled on RAM.)

We then check the value of m_ready, and if it is nonzero, we execute the two lines marked with (p15). The first is a "chk.s" instruction which says, "If the r36 register is NaT, then perform a nonspeculative load from [r33]; otherwise, do nothing."

So if the speculative load of *p had failed, the chk.s instruction will attempt to load it for real, raising the page fault and allowing the memory manager to page the memory back in (or to let the exception dispatcher raise the STATUS_ACCESS_VIOLATION).

Once the value of the r36 register has been settled once and for all, we call DoSomething. (Since we have two input registers [r32, r33] and two local registers [r34, r35], the output register is r36.)

After the call returns, we clean up and return to our own caller.

Notice that if it turns out that m_ready was FALSE, and the access of *p had failed for whatever reason, then the r36 register would have been left in a NaT state.
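If the assembly is hard to follow, a toy model in C++ may help. This is only a sketch under loud assumptions: Reg, Page, SpeculativeLoad, RealLoad and Check are names made up for illustration, not anything the processor or compiler provides, and a "failed" load is reduced to "null pointer or page not resident." The Sample function mirrors the assembly above and returns whatever is left in r36.

```cpp
#include <cstdint>
#include <stdexcept>

// Toy model of an ia64 general register: the value plus the NaT bit.
struct Reg { std::int64_t value; bool nat; };

// Hypothetical stand-in for memory that may or may not be paged in.
struct Page { int value; bool resident; };

// Models ld4.s: a load that cannot complete sets NaT instead of faulting.
Reg SpeculativeLoad(const Page* p) {
    if (p == nullptr || !p->resident) return Reg{0, true};
    return Reg{p->value, false};
}

// Models an ordinary load: pages the memory in, or faults on a bad address.
int RealLoad(Page* p) {
    if (p == nullptr) throw std::runtime_error("access violation");
    p->resident = true;  // assumption: the page fault handler succeeds
    return p->value;
}

// Models chk.s: if the register is NaT, redo the load for real.
void Check(Reg& r, Page* p) {
    if (r.nat) r = Reg{RealLoad(p), false};
}

// Models SomeClass::Sample and returns the final state of "r36".
Reg Sample(bool ready, Page* p) {
    Reg r36 = SpeculativeLoad(p);  // issued before m_ready is tested
    if (ready) {
        Check(r36, p);             // (p15) chk.s
        // (p15) br.call DoSomething would consume r36 here
    }
    return r36;                    // may still be NaT if ready was false
}
```

In this model, Sample(false, p) with a paged-out p hands back a register whose NaT bit is still set, while Sample(true, p) repairs it before the call.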

And that's where the danger lies.

For you see, if you have a register whose value is NaT and you so much as breathe on it the wrong way (for example, try to save its value to memory), the processor will raise a STATUS_REG_NAT_CONSUMPTION exception.

(There do exist some instructions that do not raise an exception when handed a NaT register. For example, all arithmetic operations support NaT; they just produce another NaT as the result. And there is a special "store to memory, even if it is NaT" instruction, which is handy when dealing with variadic functions.)
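The rules in the last two paragraphs can be written down as another small C++ model. Again this is a sketch, not the hardware: Reg, NatConsumption, Add, Store and Spill are invented names, the exception class merely stands in for STATUS_REG_NAT_CONSUMPTION, and Spill is my rough picture of the "store even if NaT" instruction, which keeps the NaT bit off to the side rather than faulting.

```cpp
#include <cstdint>
#include <stdexcept>

// Toy register: 64 value bits plus the NaT bit.
struct Reg { std::uint64_t value; bool nat; };

// Hypothetical stand-in for STATUS_REG_NAT_CONSUMPTION.
struct NatConsumption : std::runtime_error {
    NatConsumption() : std::runtime_error("NaT consumption") {}
};

// Arithmetic tolerates NaT: the result is NaT if either input is.
Reg Add(Reg a, Reg b) {
    if (a.nat || b.nat) return Reg{0, true};
    return Reg{a.value + b.value, false};
}

// An ordinary store refuses to write a NaT register to memory.
void Store(std::uint64_t* dest, Reg r) {
    if (r.nat) throw NatConsumption();
    *dest = r.value;
}

// The special spill form saves the value and remembers the NaT bit
// separately, so it never faults.
struct Spilled { std::uint64_t value; bool nat; };
Spilled Spill(Reg r) { return Spilled{r.value, r.nat}; }
```

The point of the model is that NaT can flow quietly through Add for as long as you like; it is the first ordinary Store that blows up, possibly far from where the NaT was created.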

Okay, maybe you can see where I'm going with this. (It sure is taking me a long time.)

Suppose you're one of the people who take a function returning void and cast it to a LPTHREAD_START_ROUTINE. Suppose that function happens to leave the r8 register as NaT, because it ended with a speculative load that didn't pan out. You now return back to kernel32's thread dispatcher with NaT as the return value. Kernel32 then tries to save this value as the thread exit code and raises a STATUS_REG_NAT_CONSUMPTION exception.

Your program dies deep inside kernel32 and you have no idea why. Good luck debugging this one!
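The way out is to give the thread procedure the signature CreateThread actually asks for and return a definite value. Here is a minimal sketch. The typedefs in the #else branch are stand-ins of my own so that the fragment compiles away from Windows; MyCritSectWork and kStartRoutine are hypothetical names, and treating the parameter as a pointer to an int counter is an assumption made only so the effect can be observed.

```cpp
#ifdef _WIN32
#include <windows.h>
#else
// Stand-ins (assumptions) so the sketch builds without <windows.h>.
typedef unsigned long DWORD;
typedef void* LPVOID;
#define CALLBACK
typedef DWORD (CALLBACK *LPTHREAD_START_ROUTINE)(LPVOID);
#endif

// The work the thread wants to do. It has nothing to return.
// Hypothetical: the parameter, if present, points to an int call counter.
void MyCritSectWork(LPVOID lpParameter)
{
    if (lpParameter) ++*static_cast<int*>(lpParameter);
}

// A thread procedure with exactly the expected signature.
DWORD CALLBACK MyCritSectProc(LPVOID lpParameter)
{
    MyCritSectWork(lpParameter);
    return 0;  // a real exit code, so the return register is never garbage
}

// No cast: if this line compiles, the signature matches.
static LPTHREAD_START_ROUTINE const kStartRoutine = MyCritSectProc;
```

With this in place, the call becomes CreateThread(NULL, 0, MyCritSectProc, NULL, 0, &MyThreadID) with no cast at all, and the compiler will complain if the signature ever drifts.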

There's an analogous problem with passing too few parameters to a function. If you pass too few parameters to a function, the extra parameters might be NaT. And the great thing is, even if the function is careful not to access that parameter until some other conditions are met, the compiler may find that it needs to spill the parameter, thereby raising the STATUS_REG_NAT_CONSUMPTION exception.

I've actually seen it happen. Trust me, you don't want to get tagged to debug it.

The ia64 is a very demanding architecture. In tomorrow's entry, I'll talk about some other ways the ia64 will make you pay the penalty when you take shortcuts in your code and manage to skate by on the comparatively error-forgiving i386.

Comments (23)
  1. Anonymous says:

    excellent post!

  2. Anonymous says:

    How much is Intel paying Microsoft to work around all of these issues, or is Microsoft paying itself? I assume that someone is going to pay, because the customers are going to be yelling at Microsoft when the programs don’t work on their new machine with Microsoft OS/whatever and Intel CPU/whichever.

    I’m assuming here that Intel has a desire to be seen to be backwards compatible, even when its 64-bit CPUs aren’t, judging from your description.

    Or, I suppose, the customers could just do what I did when I found I’d been sold a machine which corrupted the BIOS data area, which killed unpatched Brief, and return their new computers as not PC compatible?

  3. Anonymous says:

    It all stems from the olden C days and the main function.

    In C, it didn’t really matter what the return type of your function was, whether you had one at all, or what type the parameters were. It was very common to just pass void pointers around, while the function itself would be declared as taking a struct pointer.

    A special case was the main function, where you could have 0, 2 or 3 parameters of your choice, and return an int or void. This is still the case today: You can just have void main().

    If that’s how it is for main, everyone assumes it must be the same for thread procs.

    A solution might be that the compiler treats a function without a return value as a function that returns an int with value 0. This is what’s intended almost always anyway, and it would solve this problem.

  4. Anonymous says:

    I assume, nothing.

    Basically, it’s a hazard of porting to IA-64; it’s much stricter about incorrect programs (or more accurately, programs which rely on undefined behaviour). Undefined behaviour in C is allowed to do anything, including corrupting memory, formatting your hard drive, or making demons fly out of your nose. [Historical link: http://groups.google.com/groups?hl=en&selm=10195%40ksr.com]

    The 64-bit CPUs will and do run 32-bit x86 code, using the same lax attitude as a ‘genuine’ x86 processor (not that there are any of those any more!) However, recompile that code as native IA-64 code and it will bite you. Not because of any flaw in the processor, but because your code is wrong and should be fixed.

    No doubt MS will introduce compatibility fixes in the Longhorn generation for those incorrect programs released in the current XP/Server 2003 time-frame, but for the moment, it’s probably quite clean in there. Is your pops-too-many-arguments-from-the-stack WindowProc fix still in there, Raymond?

  5. Anonymous says:

    You can just have void main().

    Henk, I dare you to say that on comp.lang.c :). Within 5 minutes, 50 C language lawyers will have flamed you to a crisp.

  6. Anonymous says:

    "A solution might be that the compiler treats a function without a return value as a function that returns an int with value 0."

    I find it somewhat ironic that you’re suggesting that the compiler make every program slower just to compensate for badly-written programs, while at the same time everybody seems to be upset that the OS does exactly that.

    People want their compiler to optimize more, not less. Notice all the hinting micro-optimizations like declspec(noreturn) or declspec(novtable).

  7. Anonymous says:


    You are absolutely right, according to the C standard.

    But if you type "c language main" in google, you will find many tutorials on C which will tell you to use void.

    For example: http://www.dummies.com/WileyCDA/DummiesArticle/id-1062.html.

    If you search for "c language hello world" you will find this page:

    http://www2.latech.edu/~acm/helloworld/c.html. Implicit int is returned, but no return statement. Wrong again.

    In fact, it will be extremely difficult to find a correct example on google.

    This page lists all the possible forms:

    http://www.csee.umbc.edu/courses/undergraduate/CMSC104/spring02/burt/C_summary.html

    I would say the de facto standard is that all versions are allowed. That’s how most people do it, and compilers allow it.

  8. Anonymous says:

    You seem to be of the general belief that "Any rule that is not enforced today must never be enforced tomorrow." Which has as a consequence that practically nothing can be changed any more. (For example, no internal data structures can change because nobody was preventing you from using ReadProcessMemory to suck them out.)

    Feel free to argue this position at the next standards meeting. I suspect the people on the standards committee have a different viewpoint on the matter.

  9. Anonymous says:

    As a developer I’m of the personal belief that I should follow standards and rules with care.

    When I think of customer impact and costs I become a lot more liberal and start to want the OS to do a really good job of insulating applications from changes in the underlying hardware.

  10. Anonymous says:

    I think the only possibility to change this would be having a compiler option that enforces this, and is on by default, but can be turned off.

    The quick look I have taken shows that most tutorials, courses etc. get it wrong. Specifically, most of them contain the old Hello World sample that is wrong (but, I assume, used to be right). As long as this is the case, it will be very hard to enforce this standard.

    I think the confusion is again based on the old versions of C, where void didn’t exist, and the main was usually written just as main(). When void was introduced, many programmers got it wrong and thought void was the same as putting nothing, whereas putting nothing is the same as int. The habit also used to be to have no return statement when there was no explicit return type, just as was later the case with void.

    One more remark:

    Usually it’s even plain wrong to use CreateThread at all. When using any crt functions, you should use _beginthreadex instead, which is very similar to CreateThread. I wonder how many programmers get that right…

  11. Anonymous says:

    "Usually it’s even plain wrong to use CreateThread at all. When using any crt functions, you should use _beginthreadex instead, which is very similar to CreateThread. I wonder how many programmers get that right… "

    Only if you’re linking to the C run time statically. If you’re linking dynamically, you can do it either way.

  12. Anonymous says:

    You have to wonder why MSDN doesn’t say "don’t use this" in big red flashing letters then, don’t you?

    I suspect MSDN is as much to blame for many of Raymond’s gripes as any of the developers out there. It’s a bit much to expect that we’ve all read MSJ articles that are over 4 years old.

  13. Anonymous says:

    Take care not to confuse MSJ with the Platform SDK; they are unrelated publishing organizations.

  14. Anonymous says:

    I just thought I’d take a look at how Microsoft declares main functions.

    I went to google and typed "microsoft.com c language reference main".

    The first hit is C#, the second has void main().

    Looking a bit further, although the definition of the main function in the C/C++ language reference is correct, most examples in the C/C++ language reference are wrong. For example http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vccelng/htm/tions_44.asp.

    The only correct one I found so far in the C/C++ language reference is http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vccelng/htm/basic_29.asp, which demonstrates the use of return in the main function and thus has to return int to compile.

  15. Anonymous says:

    Raymond, I take it casting function pointers on the ia64 copies the value blindly and doesn’t do something like call a thunked function or point to another "gp" register, whatever that is?

  16. Anonymous says:

    The gp register is another name for r1. By convention, the r1 register is used as a base pointer for accesses into a module’s static data – there can be up to 4MB of this accessed directly (due to an architectural limitation – there are only 22 bits in the instruction bundle for indirect addressing via offset). If you need more than that, I assume the compiler has to store a relocatable address in this global data, requiring two dereferences to access the actual data.

    This reduces the number of relocations. There are a huge number of them for x86 DLLs, because x86 doesn’t offer much support for position-independent code. This is one reason that most texts for Windows DLL development (I can think of at least two) suggest you set the base address for your DLLs and bind the import table – it speeds up loading, and your debug symbols have a greater chance of being correct…

    More information on gp can be found at http://msdn.microsoft.com/msdnmag/issues/1100/hood/

    Casting a function pointer typically generates no code at all – it’s a simple copy. If you changed gp, the function would be looking for its static data in a different module…

  17. Anonymous says:

    This strictness could seriously work against attempts to raise IA64 market share. When big companies try their crufty code on this new processor, they’ll blame IT for the resulting problems. (just like Raymond’s examples of Microsoft getting blamed for breaking poorly-written Windows programs).

    As a programmer I know it’s always good to test one’s code in a strict environment. But really what gains are to be had after one puts in all the effort to root out bugs like these? I haven’t seen convincing evidence that the IA64 platform is any faster than modern x86 except on certain micro-benchmarks. (and even if it matches the absolute speed, only then do you start worrying about price/performance)

  18. Anonymous says:

    The problem isn’t the application code or the chip — it’s the language. The fact is: those two function signatures are not equivalent, and the compiler shouldn’t allow you to cast one as the other even though you’re working with pointers. The compiler should be smart enough to detect that they point to different data types and prevent you from passing one for the other.

    Other language compilers (Pascal and Modula 2 come to mind) prevent this type of mistake by having the callee rather than the caller decide whether an argument is passed by reference. So, rather than pass LP_foo you just pass foo, and the signature of the called function defines whether it is passed by value or reference. Even if the function declaration specifies that foo is passed by reference, you can’t substitute "bar" for foo — the compiler won’t allow it.

    Of course, there are obvious downsides to having the callee alone make this determination. I mention it here as an example of one way a compiler can prevent the types of mistake you describe.

    The fundamental problem is that C has a long history of passing things around in indistinguishable pointers, effectively defeating any type checking the compiler might want to do.

  19. Anonymous says:

    A couple of additional comments on this…

    I think C# actually gets this right from a language/compiler standpoint. It addresses this issue in two ways:

    1. You don’t have pointers, per se, you have references, which are, in this respect, like next-generation pointers. When you pass around a reference to something, it’s not merely a memory address, but includes the other information necessary to decode what exactly is referenced. Because of this, the compiler is able to prevent you from casting a function pointer to one type of function to another.

    2. In C#, both the caller and the callee have to specify that a parameter is passed by reference. So, unlike Pascal and Modula 2, the callee alone does not make this determination. One downside of having the callee alone decide whether a param is passed by value or address is that the callee can change the param passed in — changes the caller isn’t expecting — possibly introducing subtle bugs that the caller will have a tough time running down.

    And, one more thing: in VC++, you can turn on compiler warning 4191 to warn you of invalid casts like this. See http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vccore/html/vcrefCompilerWarning(level3)C4191.asp for details.

  20. Anonymous says:


    I am working on an ia64 project. I have encountered an opcode CHKS_MOV_PR.

    Could anyone help me understand what this means?

    Thanks in advance for help.


  21. Anonymous says:

    The Itanium has two stacks, so don’t assume that there’s only one.
