Our code needs to run on multiple platforms with different rules, so we follow none of them!


A customer was encountering sporadic crashes in their 64-bit application, and upon investigation, the problem was traced to a misaligned RSP register. We saw some time ago that the Windows x64 calling convention requires the RSP register to be 16-byte aligned.
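
As a rough illustration of what "16-byte aligned" means for a stack pointer value, here is a minimal C sketch. The helper names `is_aligned_16` and `align_down_16` are made up for this example and do not come from any Windows header. The sketch only shows the arithmetic; real code would perform the adjustment in assembly or leave it to the compiler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; not part of any Windows API. */
#define STACK_ALIGNMENT ((uintptr_t)16)

/* Returns true if the given stack pointer value is a multiple of 16. */
static bool is_aligned_16(uintptr_t rsp)
{
    return (rsp & (STACK_ALIGNMENT - 1)) == 0;
}

/* Rounds a stack pointer value down to the nearest multiple of 16.
   The stack grows toward lower addresses, so rounding down only
   reserves a little extra space and never exposes memory in use. */
static uintptr_t align_down_16(uintptr_t rsp)
{
    uintptr_t aligned = rsp & ~(STACK_ALIGNMENT - 1);
    assert(aligned <= rsp && rsp - aligned < STACK_ALIGNMENT);
    return aligned;
}
```

For example, a value such as 0x1008 fails the check, and rounding it down yields 0x1000.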

The customer traced the source of the misalignment to a third-party library they were using. They contacted the vendor, who acknowledged that they were not following the Windows x64 calling conventions, but explained that their code needs to run on multiple x64 operating systems, and since each operating system has different calling conventions, they adhere to none of them!

I was kind of boggled by this remark. Yes, it's frustrating that different operating systems have different calling conventions, but that doesn't mean that you are welcome to ignore them. Every region of the world has different laws regarding the operation of motorized vehicles, but that doesn't mean "My truck driver has to drive through all these different jurisdictions with different rules, so he follows none of them!"

Comments (27)
  1. Luigi Bruno says:

    The simplest solution! :-(

  2. Anonymous says:

    Lucky that they had not invented their own convention! :-)

  3. Anonymous says:

    "Lucky that they had not invented their own convention! :-)"

    Custom conventions suck, but at least they're better than NO conventions.

  4. Anonymous says:

    I wrote such x64 code that didn't keep the stack aligned internally. Upon any exit from its own code, including any API calls, it realigned the stack however.

    BTW, the internal calling convention was a register calling convention.

    [Don't forget that exceptions count as "exit from its own code", so if you take an exception while the stack is misaligned, you will crash the process (if you're lucky). -Raymond]
  5. Yuhong Bao says:

    [Don't forget that exceptions count as "exit from its own code", so if you take an exception while the stack is misaligned, you will crash the process (if you're lucky). -Raymond]

    Well, generally between instructions the stack is aligned to 8 bytes but not necessarily 16 bytes.

    [The Windows ABI for x64 specifies the conditions under which the stack may be temporarily unaligned. If you misalign the stack outside those special cases, then you have violated the ABI and the behavior is undefined. -Raymond]
  6. Anonymous says:

    I don't see why you're boggled by the remark. It's annoying for this customer that he'll have to write an aligning wrapper, but (depending on programming language(s) / compiler(s) involved) it may be annoying for the library developer to do something about it too. Basically they're saying ‘different platforms use different aligning and that sucks, so they get what they asked for: we'll ignore them; not our problem any more.’

    It may not make good commercial sense if you try to sell your library, but don't say it isn't understandable. When confronted with a ‘manufactured’ problem usually your first instinct is to ignore it. It allows you to get work done and is easier than solving something pointless that you didn't ask for or pushing back against the problem (which in this case wouldn't have had any effect since Microsoft won't/can't change the ABI).

    [In which case the vendor needs to stop advertising their library as Windows-compatible, because it isn't. -Raymond]
  7. Anonymous says:

    [… so if you take an exception…. -Raymond]

    I will not. All conditions that can fail are checked beforehand, including guard page.

    Of course, if someone can inject an exception into another thread it will crash, but I don't think that's possible.

    [Hardware interrupt? Intrusive profiling? -Raymond]
  8. @Joshua: Most SEH exceptions are raised by programming errors. I doubt that there is any way for you to check every possible exception handling case beforehand. Obviously things like EXCEPTION_BREAKPOINT aren't something you can test for.

  9. Anonymous says:

    [In which case the vendor needs to stop advertising their library as Windows-compatible, because it isn't. -Raymond]

    Wasn't mentioned in the story, and because of the no-brand-names policy there was no way to know. Even so, if it can be made to work using a wrapper (even if only in theory) it will be difficult to prove fraud in court. I think the customer should either switch to a different library or suck it up and start coding that wrapper. Or maybe send the cutest employee they can find out to plead with the library developer to do something about it.

    In any case, your bogglement still boggles me.

  10. [Hardware interrupt? Intrusive profiling? -Raymond]

    My understanding is that 16-byte alignment is only required for structured exception handling of non-leaf functions. Any intermediate RSP must be aligned. Hardware interrupts are not structured exceptions. Debugger breakpoints intercepted by an external debugger are not. Normal page faults are not, either. Only those page faults that are reflected back to user mode as a C0000005 exception will invoke the stack unwind procedure.

  11. Anonymous says:

    If I were the customer of that library, I would start thinking to myself, "If they can't be bothered to get this right, what other problems am I going to find with this library?" This would lead to other questions like, "Are there any other libraries that do this properly?"

    If this were occurring because the customer was using an oddball language/platform, I wouldn't expect them to support it, but this is a major OS (and the customer is probably using a C derivative.) It's not a great idea for a vendor to basically tell the customer that they're too lazy to do things properly.

  12. Anonymous says:

    @Anonymous Coward

    If the vendor didn't claim that the library was Windows-compatible, then their response would have been "We do not follow the Windows x64 calling convention because we do not support Windows".

  13. Anonymous says:

    "Our code needs to be compatible with many x64 platforms.  Therefore, it isn't!"

  14. DWalker59 says:

    Certain requirements, like 16-byte alignment, don't cause problems when they are followed even if they don't have to be. If they have to be followed but aren't, that's a real problem. In other words, if there's a set of calling conventions that the library could follow, that would be great. It sounds like they didn't care enough to figure it out.

  15. MikeCaron says:

    Isn't this the sort of thing that you let the compiler figure out? Of course, if the problem exists in their hand-crafted assembly code, then having a well-behaving compiler won't help; you would need to get a well-behaving programmer too.

    @Anonymous Coward: This isn't about fraud, it's about how computers have rules you need to follow when writing programs for them. If there does not exist a super-set of rules that satisfies each platform you wish to support, then you can either drop the offending platform, or create two versions.

    (I am a .NET programmer, so I have no idea if there exists such a superset.)

  16. Anonymous says:

    So the question is who actually compiled the library, the 3rd party vendor or the user? Did the user choose to use a source code library and compile it themselves on Win64 even though the vendor never specifically claimed compatibility?

  17. Anonymous says:

    The simile has an obvious workaround: drive the truck in international waters. Problem solved!

  18. Anonymous says:

    @alegr1

    You are assuming here that the library vendor was only letting the stack get unaligned in leaf functions. Given the topic in general, and how humans are capable of amazing levels of stupidity, it is likely that a non-leaf function was unaligned. At some point, an exception was thrown in a leaf function and caused the whole mess to die because the vendor wasn't following the Windows x64 ABI.

    If it was a leaf function that was unaligned then there wouldn't be an entry about a certain vendor not following the ABI because that is explicitly allowed, so it had to be something much more stupid.

    [As I recall, they had a non-aligned stack in a non-leaf function, and then a leaf function took an alignment fault. -Raymond]
  19. cheong00 says:

    @Larry Hosken: Even if you drive in international waters, you're still subject to the laws of the country where your "truck" is registered.

    For all others talking about replacing the library, note that not all libraries have alternatives. Even libraries providing similar functions may have subtle differences that are not obvious at the beginning, which makes it really painful to move to a competitor.

  20. Anonymous says:

    They probably have bigger problems than just alignment if they're writing x64 ASM code. You also have to use the correct prologue directives and epilogue forms if you want your code to unwind correctly in the event of a stackwalk or exception.

  21. Anonymous says:

    Neil's comment was just perfect.  I laughed out loud.

  22. Anonymous says:

    It would be interesting to know why this "invention" was introduced in 64-bit Windows when no other OS requires it. It isn't a hardware restriction either.

    [Um, x86 is the weirdo. All non-x86 processors use table-based structured exception handling dispatch. It was introduced in 1992 for Alpha AXP, MIPS, and PowerPC. -Raymond]
  23. Anonymous says:

    @640k

    The exception handling they chose for x64? While the setup is more awkward for it, it is easier to do at runtime and less likely to be broken by FPO or by buffer overruns on the stack. It essentially has a list of the steps taken to prepare for executing the function, so if it reverses the list, then the stack and registers should be in the exact same state when exiting the function via an exception as they were when it entered the function. (This assumes of course that nothing moronic was done in the function itself.)

  24. Anonymous says:

    So does this third party library work on Windows x64 at all?  Did the third party vendor test it on Windows x64, and did it work "normally" most of the time?  Does it only crash under "special" conditions?

  25. Anonymous says:

    @Neil SM – "Our code needs to be compatible with many x64 platforms.  Therefore, it isn't!"

    Lovely summary!

    @Anonymous Coward

    "When confronted with a ‘manufactured’ problem usually your first instinct is to ignore it. It allows you to get work done and is easier than solving something pointless that you didn't ask for…"

    You've got some boggling statements yourself! If your first instinct when you run into a problem is to ignore it, then I sincerely hope you're not a programmer. Or a builder. Or an engineer. Or architect, child-carer, miner, accountant… heck, I'm hard-pressed to think of a single profession where your attitude doesn't scare the hell out of me. And do you honestly think it's acceptable for a vendor writing a code library to knowingly write out-of-spec code because it's just all too hard?

    "…it may be annoying for the library developer to do something about it too."

    Programming is full of annoying problems. So are most jobs in fact… that's why people pay you for it.

    "…if it can be made to work using a wrapper (even if only in theory) it will be difficult to prove fraud in court."

    Ignoring the fact that no-one is talking about fraud here, I find that an interesting point of view. If I buy a heater that doesn't meet the appropriate standards because it likes to randomly catch alight, are you saying that if I can (even in theory) rewire it to fix the problem that the maker of the heater is not liable for the problem? What about if it was a car instead of a heater? A plane? Do I, as a passenger on the flight, have the responsibility in your eyes to check over the engine to make sure it is safe before I board? Of course not… the responsibility lies with the manufacturer, just as it does in this code library case.

  26. Anonymous says:

    I am pretty sure there are many truck drivers who do exactly that.
