Advantages of knowing your x86 machine code


Next time you find yourself debugging in assembly language (which for some of us is the only way we debug), here are some machine code tricks you may wish to try out:

90
This is the single-byte NOP opcode. If you want to patch out code and don't want to think about it, just whack some 90's over it. To undo it, you have to patch the original code bytes back in, of course.
CC
This is the single-byte INT 3 opcode, which breaks into the debugger.
74/75
These are the opcodes for the short forms of JZ and JNZ. If you want to reverse the sense of a test, you can swap one for the other. Other useful pairs are 72/73 (JB/JNB), 76/77 (JBE/JA), 7C/7D (JL/JGE), and 7E/7F (JLE/JG). You don't have to memorize any of these values; all you have to recognize is that toggling the bottom bit reverses the sense of the test. To undo this, just flip the bit a second time.
EB
This is the unconditional short jump instruction. If you want to convert a conditional jump to an unconditional one, change the 74 (say) to EB. To undo this, you have to remember what the original byte was.
00
On the other hand, if you want to convert a conditional short jump to a never-taken jump, you can patch the second byte to zero. For example, "74 1C" becomes "74 00". The jump is still there; it just jumps to the next instruction and therefore has no effect. To undo this, you have to remember the original jump offset.
B8/E8
These are the opcodes for the "MOV EAX,immed32" and the "CALL" instructions. I use them to patch out calls. If there's a call to a function that I don't like, instead of wiping it all to 90's, I just change the E8 to a B8. Both instructions are five bytes long, so the call's four-byte displacement simply becomes the immediate value loaded into EAX. To undo it, change the B8 back to an E8.

It has been pointed out that this works only for functions that take zero stack parameters; otherwise, your stack gets corrupted. More generally, you can use 83 C4 XX 90 90 (ADD ESP, XX; NOP; NOP) where XX is the number of bytes you need to pop. Personally, I don't remember the machine code for these instructions, so I tend to rewrite the CALL instruction so it calls the "RETD" at the end of the function.
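The ADD ESP variant works because the replacement is exactly as long as the five-byte CALL rel32 it overwrites. Here is a minimal Python sketch of building it. The helper name is hypothetical, and it assumes a callee that would have popped its own arguments (for example a __stdcall function). If the caller cleans up after the call instead, the caller's own ADD ESP already balances the stack.

```python
def pop_args_patch(arg_bytes: int) -> bytes:
    """Five bytes to write over a CALL rel32: ADD ESP, imm8 ; NOP ; NOP."""
    # 83 C4 ib encodes ADD ESP, imm8. The immediate is sign-extended,
    # so only 0 through 127 (0x7F) bytes can be popped this way.
    if not 0 <= arg_bytes <= 0x7F:
        raise ValueError("ADD ESP, imm8 can only encode 0..127")
    patch = bytes([0x83, 0xC4, arg_bytes, 0x90, 0x90])
    assert len(patch) == 5  # same length as E8 xx xx xx xx
    return patch
```

For a function that takes two DWORD parameters, XX is 08 and the patch is 83 C4 08 90 90.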

I prefer these single-byte patches to wholesale erasure with 90 because they are easier to undo if you realize that you want to restore the code to the way it was before you messed with it.
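The patches above come down to a handful of byte edits. Here is a minimal Python sketch. It assumes a bytearray stands in for the target's code bytes; in a real session the debugger does the writing. The helper names are hypothetical. Each helper that destroys information returns what you would need to remember to undo it, which shows why the single-byte patches are nicer: invert_jcc needs nothing saved, the others need one byte, and nop_out needs everything it overwrote. Only 32-bit short (rel8) jumps and CALL rel32 are handled.

```python
NOP, JMP_SHORT, CALL_REL32, MOV_EAX_IMM32 = 0x90, 0xEB, 0xE8, 0xB8


def is_short_jcc(opcode: int) -> bool:
    # 70 through 7F are the two-byte conditional jumps (Jcc rel8).
    return 0x70 <= opcode <= 0x7F


def nop_out(code: bytearray, offset: int, length: int) -> bytes:
    """Overwrite `length` bytes with 90; the returned bytes are needed to undo."""
    saved = bytes(code[offset:offset + length])
    code[offset:offset + length] = bytes([NOP]) * length
    return saved


def restore(code: bytearray, offset: int, saved: bytes) -> None:
    """Write previously saved bytes back."""
    code[offset:offset + len(saved)] = saved


def invert_jcc(code: bytearray, offset: int) -> None:
    """JZ <-> JNZ, JB <-> JNB, ...: toggle bit 0. Calling it again undoes it."""
    if not is_short_jcc(code[offset]):
        raise ValueError("not a short conditional jump")
    code[offset] ^= 0x01


def force_jump(code: bytearray, offset: int) -> int:
    """Jcc rel8 -> JMP rel8 (EB). Returns the old opcode, which undo needs."""
    if not is_short_jcc(code[offset]):
        raise ValueError("not a short conditional jump")
    old = code[offset]
    code[offset] = JMP_SHORT
    return old


def never_jump(code: bytearray, offset: int) -> int:
    """Zero the rel8 displacement so the jump lands on the next instruction.
    Returns the old displacement, which undo needs."""
    if not is_short_jcc(code[offset]):
        raise ValueError("not a short conditional jump")
    old = code[offset + 1]
    code[offset + 1] = 0x00
    return old


def short_jump_target(address: int, rel8: int) -> int:
    """Target of a 2-byte short jump whose opcode sits at `address`."""
    # rel8 is signed and measured from the end of the 2-byte instruction.
    disp = rel8 - 0x100 if rel8 >= 0x80 else rel8
    return address + 2 + disp


def skip_call(code: bytearray, offset: int) -> None:
    """CALL rel32 (E8) -> MOV EAX, imm32 (B8). Both are 5 bytes, so the
    displacement becomes the immediate. Undo by writing E8 back."""
    if code[offset] != CALL_REL32:
        raise ValueError("not a CALL rel32")
    code[offset] = MOV_EAX_IMM32
```

Applied to the 75 07 at 01005963 in chadbee's Notepad listing in the comments, short_jump_target(0x01005963, 0x07) gives 0x0100596C. After never_jump, the target becomes 0x01005965, which is simply the next instruction.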

Comments (74)
  1. Mike Dunn says:

    Another handy tip is that in C++ method calls, ‘this’ is stored in the ECX register (at least, MSVC does it that way). If you’re tracking down a method call that blows up due to a bad ‘this’ pointer, look at the assembly and watch how ECX gets set.

    Also when you’re debugging in VC (with source), hit Ctrl+F11 to switch between source and source+assembly views.

    Add a watch for "@ERR,hr" to see the value of GetLastError(). Add a watch for "@EAX,hr" to have a permanent place to look for function return values. The ",hr" part means "show as HRESULT", which also gives you a text description of the number if one is available.
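    For reference, the field-splitting part of what ",hr" does is simple bit slicing. Bit 31 is the severity (failure) bit, bits 16 through 26 are the facility, and the low 16 bits are the code. The following minimal Python sketch uses a made-up function name. It only splits the fields and does not look up the text description.

```python
def decode_hresult(hr: int) -> tuple:
    """Split a 32-bit HRESULT into (severity, facility, code)."""
    hr &= 0xFFFFFFFF               # also accept negative values such as -2147024891
    severity = hr >> 31            # 1 means failure
    facility = (hr >> 16) & 0x7FF  # 11-bit facility field
    code = hr & 0xFFFF             # e.g. a Win32 error code when facility is 7
    return severity, facility, code
```

    decode_hresult(0x80070005) gives (1, 7, 5). That is a failure with facility 7 (FACILITY_WIN32) wrapping Win32 error 5 (ERROR_ACCESS_DENIED), better known as E_ACCESSDENIED.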

  2. Raymond Chen says:

    Pah, source-level debugging! Once the optimizer’s done with your code, source-level debugging is useless.

    You youngsters have it so easy… don’t even bother learning your HRESULTs, make the debugger decode them for you… function keys do all the typing for you… lazy good-for-nothings…

  3. Gene Hamilton says:

    I have noticed in debug builds that the compiler will pad the area around the stack with CCs. Is that there to help people detect bad code early on, or for some other reason?

  4. Carlos says:

    For me, 0xEA is forever NOP.

    I first programmed assembly on a 6502, without such luxuries as a disassembler or single-stepping debugger. You’ve got it easy.

  5. nerd-style says:

    Raymond – do you have a good recommendation for an x86 programming book? At the U we did all of our assembly programming on 68K chips (mostly for embedded microprocessor classes), and I’d love to reacquaint myself with the finer points of x86 machine assembly.

    Of course, a book that is more Windows-centric would be preferred.

  6. Gene Hamilton says:

    Funny you mention the 6502.

    I was just going through all my old 6502 manuals and asm programs about an hour ago.

  7. Anon says:

    @nerd-style:

    If you want to learn x86 assembly, a good start is viewing the assembler generated by the compiler for a given construct. Also, download the IA32 instruction set references (both A-M and N-Z) and the other docs from Intel:

    http://www.intel.com/design/pentium4/specupdt/252046.htm

  8. Cooney says:

    Or you could install comet cursors on a test box and figure out the myriad ways it breaks things in the shell. I hear that’s good for hours of fun.

  9. Raymond Chen says:

    No, I didn’t. I haven’t done source-level debugging since my college days.

  10. Tim Smith says:

    Raymond, you forgot to say "Back in my day…"

  11. Ben Combee says:

    The 0xCC fill done in debug code is to help identify uninitialized variables. Since 0xCC is an unlikely value, seeing a variable with that byte in the debugger means it’s likely that it hasn’t been changed by the program. Also, the address 0xCCCCCCCC isn’t likely to be a valid pointer, whereas having an uninitialized stack means it’s possible your uninitialized pointer might just overlap with a previous pointer value from another call.

  12. Ben Allison says:

    If you read the post, you’ll notice that 0xCC is the opcode for breaking into the debugger. Not only are the values easy to notice, but if you ever run into a situation where you execute instructions on the stack, you’ll automatically kick into the debugger.

  13. I’m sure you have a good reason for only debugging this way, but I simply can’t think of one. Do tell.

  14. Alexey Kats says:

    I can’t speak for Raymond’s reasons, but in my experience there are at least two.

    1) I do not always trust the compiler’s optimizer: sometimes it does dirty things and throws some calculations off (I had occasional problems with rounding and mixed single-precision and double-precision calculations, for example). And trying to match optimized code to the source code is anything but fun; sometimes you are better off ignoring the source code and working with the machine code only.

    2) Sometimes you just HAVE to debug someone else’s code which you do not have the source code for, but you have to figure out how to change YOUR code to make it work no matter what (that might be your job, and it’s unfortunately not always an option to stop using the buggy third-party code).

  15. Raymond Chen says:

      1. Once the optimizer has messed with your code, source-level debugging falls apart.

      2. Most debugging is done remotely. When you have to debug a customer’s machine 5000 miles away over a 56k modem, you can’t tell them, "First, I want you to install Visual Studio on your domain controller…"

      3. Installing a GUI debugger on the test machine changes the system configuration and therefore influences the test itself. Imagine if Windows XP had some horrific bug that goes away when you install Visual Studio. If all test machines had Visual Studio installed on them, then this bug would never be found!

      4. Just today I had to debug a problem that occurred only immediately after installing the OS. No chance to install VS even if you wanted to.

      5. If you’re debugging the OS itself (say the window manager), then you can’t use a GUI debugger since it needs the window manager to draw its UI!

      Conclusion: Since so much debugging is done in situations where GUI debugging is not possible, you are quickly forced to become an expert at command line debugging. At which point the incremental benefit of a fancy debugger is rather small.

  16. chadbee says:

    The Windows debugger includes a command, "a", which eliminates the need to memorize these opcodes.

    Just type "a <address>" where <address> is where you want to insert the new assembly opcode, and you’ll then get prompted for what you want to put there.

    For example, here I picked a random jump instruction in Notepad and replaced it with NOPs:

    0:001> u notepad!Search+0x11
    notepad!Search+0x11:
    0100595d 895de8 mov [ebp-0x18],ebx
    01005960 895dec mov [ebp-0x14],ebx
    01005963 7507 jnz notepad!Search+0x20 (0100596c)
    01005965 33c0 xor eax,eax
    01005967 e9cb010000 jmp notepad!Search+0x1eb (01005b37)
    0100596c 56 push esi
    0100596d 8b3540120001 mov esi,[notepad!_imp__SendMessageW (01001240)]
    01005973 57 push edi

    0:001> a 01005963
    01005963 nop
    nop
    01005964 nop
    nop
    01005965

    0:001> u notepad!Search+0x11
    notepad!Search+0x11:
    0100595d 895de8 mov [ebp-0x18],ebx
    01005960 895dec mov [ebp-0x14],ebx
    01005963 90 nop
    01005964 90 nop
    01005965 33c0 xor eax,eax
    01005967 e9cb010000 jmp notepad!Search+0x1eb (01005b37)
    0100596c 56 push esi
    0100596d 8b3540120001 mov esi,[notepad!_imp__SendMessageW (01001240)]

  17. Pavel Lebedinsky says:

    > Most debugging is done remotely. When you have to debug a customer’s machine 5000 miles away over a 56k modem, you can’t tell them, "First, I want you to install Visual Studio on your domain controller…"

    If you use windbg you can do remote source level debugging without installing anything on the remote machine. All you need to do is copy the files from the debugger directory and start dbgsrv.exe. Then you start windbg on your local machine and connect to the server. Since the actual debugger runs locally, there’s no need to copy symbols or source files to the remote machine.

  18. Anonymous Coward says:

    ‘this’ is only stored in ECX for one particular calling convention (__thiscall, admittedly the most common for conventional VC C++ code). What is way more fun is dealing with programs that dynamically load zillions of DLLs, which load other DLLs, have layers of service providers, and are written in C and C++ with liberal amounts of COM and RPC thrown in. And have many threads, with large numbers of mutexes and other forms of lock. The optimising compiler then makes it just that little bit more interesting by interleaving the simple instruction stream according to register and data dependencies so that the CPU can do register renaming and get a higher ILP.

  19. jeffdav says:

    You can make ntsd show you the source for whatever line you are stepping through… that doesn’t count as source level?

  20. Skywing says:

    chadbee: You still need to know the lengths of the instructions if you are patching some existing code, so you might as well just write it in machine code in the first place.

    Pavel Lebedinsky: Neat. Some time ago, I ended up writing a WinDbg extension that (among other things) had some commands to allocate debuggee VM for that sort of stuff. I suppose it’s becoming partially obsolete now that .dvalloc is built-in. Oh well.
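    Skywing's point shows up in chadbee's listing above: 75 07 is two bytes, so exactly two NOPs replace it and the following 33 c0 stays intact. Below is a toy Python length table covering only the 32-bit encodings discussed in this post. The helper is hypothetical; a real disassembler also has to handle prefixes, ModRM/SIB bytes, and hundreds of other opcodes.

```python
# Byte lengths of a few fixed-size 32-bit encodings used in this post.
FIXED = {
    0x90: 1,  # NOP
    0xCC: 1,  # INT 3
    0xEB: 2,  # JMP rel8
    0xB8: 5,  # MOV EAX, imm32
    0xE8: 5,  # CALL rel32
    0xE9: 5,  # JMP rel32
}


def instruction_length(code: bytes, offset: int) -> int:
    op = code[offset]
    if 0x70 <= op <= 0x7F:
        return 2  # Jcc rel8
    if op == 0x83 and code[offset + 1] == 0xC4:
        return 3  # ADD ESP, imm8
    if op in FIXED:
        return FIXED[op]
    raise NotImplementedError(f"opcode {op:#04x} is not in this toy table")


def nops_for(code: bytes, offset: int) -> bytes:
    """NOP bytes that exactly cover the instruction at `offset`."""
    return bytes([0x90]) * instruction_length(code, offset)
```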

  21. n7yap says:

    A couple of 32-bit x86 assembly language books. These are used for college courses and are good for learning assembly language.

    "Assembly Language for Intel-Based Computers", Fourth Edition, by Irvine

    "80x86 Assembly Language and Computer Architecture", by Detmer

    The Intel Pentium Instruction Set manuals are a necessity.

    Check out http://www.masm32.com

  22. Fa Eb Fe says:

    I was going to make some pithy remark like "if machine code isn’t the only way you program, it shouldn’t be the only way you debug" before I realised what a broad and variable meaning ‘debugging’ has.

    Is debugging by examining output instead of instructions legitimate? I’d argue it’s the only legitimate way: 1) code in interrupts this, 2) kernel that, 3) processors with uploadable microcode the other, etc, etc. (This argument is as unuseful as yours.)

    I’m not sure how you reconcile your earlier comment’s points 4 and 5 (can’t always install a debugger) with point 2 (debugging is done remotely). Graphical debuggers can be used remotely. I’m also very wary of point 3, because any debugging can create a Heisenberg effect.

    There’s a place for debugging at source level. This blog has exceptionally high standards, but the occasional mistake as well. No matter how readily you can interpret disassembly or machine code (on a given architecture), you must ultimately map back to the source code in the majority of cases, so why create the extra work?

    I bet it took you ages before you could debug on IA64, while all your Visual Studio-using chums were productive sooner. Low level debugging has a place and is an admired skill, but high level debugging and choosing the best tools for a job are also skills.

  23. Raymond Chen says:

    4 vs 5: Debugging is done remotely, which means that you have to convince the guy at the other end to install the remote part of the debugger. This is not always feasible. Better to use something that is already installed (ntsd.exe).

    I have never used Visual Studio. I gave up when I hit "next line" in CodeView and the cursor jumped upwards ten lines, and then you ask to look at a local variable and you get garbage. How anybody source-level debugs optimized code I have no idea.

    (I was debugging ia64 code in 2000. When did Visual Studio support for ia64 show up?)

    Do you use the Qwerty keyboard? Why don’t you switch to Dvorak? It’s much better.

  24. byron says:

    On a similar note, here’s a list of my favourite assembler interrupts:

    int 10h service 4: read light pen position
    int 13h service 19h: park heads (PS/2 only)
    int 15h: cassette I/O
    int 18h: resident BASIC

  25. Btw, I’m 100% with Raymond here. Source level debugging is a crutch.

    I do use source level debugging (with windbg), because it’s convenient, but all of my debugging of my service is done with NTSD (which means assembly language).

    The optimizer totally screws up any hope of debugging code, and for my code, the optimizer’s always on.

  26. Anonymous Coward says:

    The thing I would be happiest with is a debugger that could run code backwards (ie it would keep saving state and allow you to unwind to see what happened).

  27. Johan Thelin says:

    Here are two links for those of you who’d like to try out Win32 from assembler:

    http://personal5.iddeo.es/ret007ow/articles.html

    http://win32asm.cjb.net/

    Beware, these pages are not wonders of layout and style, but kind of fun to read :).

  28. I’ll go one further. Debugging with a user mode debugger is a real pain in the ass. You often can’t get symbols etc. etc. It turns out that if you’re careful you can almost always just do what you want with kd. Debugging stress problems with kd is tons easier than the equivalent with a source level debugger.

    You just have to remember that when a piece of VA is shared across processes, tricks like the ones Raymond suggests in the base post affect all processes. My team harangued the debugger team enough during XP that there are a lot of options in KD to restrict breakpoints set from the kernel debugger to certain processes or threads.

    I think that being able to get the source line from the debugger so I can look for it is helpful but I’m mostly with Raymond. When I am toying around with an idea I build it without optimizations and use visual studio (having worked on it some years ago…). But in the end, when you’re in a crunch, you’re going to find a time when you need to debug without symbols so if you’re hung up on source level debugging and user mode debuggers you’re just putting hurdles in your own way.

    Things are much better than when Raymond first tried though. The last major PDB format change fixed a bunch of the scoping problems for local variable definitions so examining locals (dv) is an order of magnitude more reliable now. The windbg UI is enough better that even if you’re not interested in source level debugging, having windows for the current call stack, registers, memory, etc is nice.

    But at the end of the day, if there’s an important break and all you have is a kernel debugger without symbols, you’ve got to just roll up your sleeves and figure it out.

  29. Oh and if you need to break in, "ba e1 <addr>" sets a bp that doesn’t change memory.

  30. Pavel Lebedinsky says:

    I use cdb 99% of the time but it’s not because I don’t believe in source level debugging. It just so happens that for many types of problems command line UI works just fine. Things like crashes, deadlocks, resource leaks and CPU hogging can in most cases be debugged without having to step through source code.

    But there definitely are cases when VS or windbg are really helpful, even if your local variables are all messed up and statements are executed out of order.

  31. Pavel Lebedinsky says:

    By the way, in the recent debuggers you can use .dvalloc command to allocate some scratch space in the target address space and copy the code you’re going to be messing with there. Then if you want to revert your changes, you simply copy it back:

    0:000> .dvalloc /b 40000000 1000
    Allocated 1000 bytes starting at 40000000
    0:000> * copy first 32 bytes of CreateMutexW to 0x40000000
    0:000> m kernel32!CreateMutexW L20 40000000
    0:000> * Mess with CreateMutexW
    0:000> * restore original code
    0:000> m 40000000 L20 kernel32!CreateMutexW

  32. Btw, once you get close to shipping, your only opportunity to find most bugs is debugging a crashed machine. If your front end system crashes every three days, but only after having been running for those three days, the vice president isn’t going to say "Ok, we’ll install visual studio and hope and pray it reproduces, never mind that we’re losing millions of dollars for every minute our mission critical system is down". They’re going to say "I want a fix for this problem TOMORROW".

    User mode debuggers are a crutch, every developer should be able to debug any crashed machine from assembly language.

    Btw, for AC (who wanted to go backwards in time), actually that’s a large part of how you debug in assembly – you track the flow of data through the stack and registers – from the state of the various registers and flags, you can infer a huge amount of information about what went wrong.

  33. Chris Becke says:

    When reverse engineering some code, I find it quite useful to take the assembler listing the debugger generates and annotate it with the equivalent C++.

    A tool I’d really like would rebuild a fake pdb from that source file so I could get "source" level debugging going, when I have no source.

  34. Serge Wautier says:

    Larry,

    I feel and share your pain. I’m in this exact situation right now. Even more frustrating: I have a copy of VS installed on a test machine where the problem shows up every week or so. Last time it showed up, I started VS and attached it to the process: Bang! Both crashed! Go figure…

    And my problem is even worse: not only have we never reproduced it with debug builds, but the program doesn’t crash; it’s just that, from some random moment on, operations that are scheduled internally are no longer scheduled. I’d love a good and nice crash so much more!!!

    – Serge, forever fighting…

  35. Very nice techniques!

    I’ve used some of them, but must admit I use source-level debugging most of the time, with the occasional low-level machine code debugging when the going gets tough.

    Added a reference here:

    http://hallvards.blogspot.com/2004/11/machine-code-hacking.html

  36. Aaargh! says:

    "The 0xCC fill done in debug code is to help identify uninitialized variables."

    What’s wrong with good old 0xDEADBEEF ?

  37. Chris Becke says:

    Multibyte magic values are harder to spot if the structure has unaligned and or byte or word sized members.
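    Chris's point can be checked by reading a structure with a byte-sized member out of memory filled each way. The following is a minimal Python sketch. The packed layout { uint8_t flag; uint32_t value; } is a hypothetical example chosen so that the DWORD is misaligned relative to the fill pattern.

```python
import struct


def fill(size: int, pattern: bytes) -> bytes:
    """Repeat `pattern` to cover `size` bytes, the way a debug fill would."""
    reps = -(-size // len(pattern))  # ceiling division
    return (pattern * reps)[:size]


def view_struct(pattern: bytes) -> tuple:
    """Read the hypothetical packed struct out of pattern-filled memory."""
    layout = "<BI"  # little-endian, no padding: 1-byte flag, then 4-byte value
    return struct.unpack(layout, fill(struct.calcsize(layout), pattern))


print(view_struct(b"\xCC"))
print(view_struct(struct.pack("<I", 0xDEADBEEF)))
```

    With the 0xCC fill, the byte reads 0xCC and the DWORD reads 0xCCCCCCCC. With the 0xDEADBEEF fill, they read 0xEF and 0xEFDEADBE, and neither value looks like the magic number.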

  38. Tim Dawson says:

    "User mode debuggers are a crutch, every developer should be able to debug any crashed machine from assembly language"

    Are you saying every developer should know assembly language? I find that an absurd idea.

  39. Mike Raiford says:

    "Are you saying every developer should know assembly language? I find that an absurd idea."

    Not at all absurd. Just like a lot of the examples here, there are many times where you CANNOT debug source-level. This means assembly is a must.

    Granted, most of my stuff is user mode, and I can get away with using source mode to check on some of the obvious things, But that’s more just working out simple, easy to find logic errors.

    The tougher one is when a program crashes for some unknown reason on an optimised build (I’ve had it happen: it worked fine on the debug build but failed miserably on the release build). It turned out to be some subtlety with VC6’s optimizer that was fighting with my code. I wound up having to use the Intel compiler instead.

    Knowing assembly language is very useful. Optimized builds tend to do weird things like store values in registers, so unless you’re watching the asm, you won’t know where to look for the value of a particular variable. It’s also invaluable for seeing why a 3rd party library is choking when you’re sure the input is right.

  40. Chris Becke says:

    Ha. That I’d agree with. Unless you’re obsessed with the word "junior" in your job title, familiarity with your platform’s assembler is mandatory.

    I disagree with the assertion that kernel mode debugging is a necessity. I’m quite happy debugging assembler in user mode, thanks; I’ve yet to encounter a bug that’s exceeded my ability to track down with dev studio :P

  41. Tom Seddon says:

    Well, at least every C or C++ programmer should know the assembly language for their target platform. It’s kind of hard to get far without it, particularly when debugging optimized code.

    Do all these kernel debugger users who eschew line numbers and information about local variables make use of any symbol information at all? If not, how do you manage to do anything useful like that? Surely you need function names at the very least…

  42. Henk Devos says:

    I really enjoyed reading all these comments.

    I have used a few debuggers in my life myself.

    At uni, when doing my thesis and working on parallel processing, all I had was a post-mortem debugger that could tell me the state of all processors. Then I could start trying to figure out why they had deadlocked.

    I have also used debuggers like CodeView.

    At one time, when I had to figure out why an interrupt handler was crashing, the only way of debugging I could think of was putting a CC in the code so that I could see if I got that far, and by moving this CC around I could figure out which instruction was causing the crash. This worked because a CC inside the interrupt handler was giving me a different error than the real crash.

    Source level debugging is a great luxury that I want to have whenever possible. But then the debugger still has to be good. For example, on Macintosh I have to use CodeWarrior, and this is a very poor debugger.

    I have to say the Visual Studio debugger is by far the best I have ever used. Not only for source level debugging but also for assembler level debugging.

    With Visual Studio I will use source level debugging whenever possible. But of course most of the time it is not possible.

    One of the reasons why some people don’t understand this is that debugging has a different meaning for different people.

    For some people debugging just means finding the logical errors that you made yourself. But this is usually a trivial job. Usually what you do is just start up the program and see on which line it crashes, then you realize which variable you forgot to initialize.

    For other people it is mostly related to finding out why the software stopped functioning after several people made uncoordinated changes to the source code. The job is basically the same as the previous case.

    But the kind of debugging we are talking about here, which requires assembler level debugging, is a totally different thing.

    In Raymond’s case, I assume he often has to figure out why a certain 3rd party application no longer works on the new Windows version. There is simply no source available in such cases.

    It’s similar when you have to figure out how Windows works, why a certain Windows call doesn’t work, and so on. That is the inverse of Raymond’s situation, but it is often the case for me. There is also no source code available.

    Raymond, I would really advise you to try the Visual Studio debugger for such cases. Once you figure out the tricks you can do amazing things with it.

    About the optimizer problem: I think it is very rare that the problems are caused by the optimizer. It used to be the case much more in the past, but such problems have been almost eliminated.

    Other things that can be nightmares to debug are everything related to caching: When the code runs in L1 cache it doesn’t work anymore, that kind of stuff.

  43. Tom says:

    One thing about source level debugging is that it’s possible to waste time if the debugger is subtly wrong, e.g. wrong values for locals, or (worse) a bogus stack frame. The advantage of binary+symbols is that you know the limited information you have is at least accurate.

    And as the wonderful "Undocumented DOS" put it, the advantage about binary+symbols is that "it’s almost as good as source, the only thing missing is comments which are probably misleading anyway"

  44. Mike Raiford says:

    "About the optimizer problem: I think it is very rare that the problems are caused by the optimizer."

    True, I’ve only run into this once…

    Actually, I think I misstated the problem. It wasn’t a crash, per se, but rather missing information in an ActiveX control I created. It turned out that due to a certain sequence of statements, the optimizer "optimized out" a critical assignment…

    The intel compiler created working code, I imagine because of different heuristics. I probably could have gotten away with compiling that one file with Optimize for size rather than Optimize for speed, but it was drawing code, so faster is better.

  45. John Drake says:

    Some interesting posts here and some nice tips by Raymond.

    However, I get the impression that not everyone moves in the same circles. Although I sometimes drop down to assembly language while debugging, it is fairly rare.

    I have yet to find a bug introduced by the compiler – mostly because the shops I’ve worked for disallow using the optimizer.

    As an application level developer, all my bugs could be found through use of the source level debugging (although, I sometimes needed to drop into assembly to see the forest instead of the trees).

    John

  46. Alexey Kats says:

    "About the optimizer problem: I think it is very rare that the problems are caused by the optimizer."

    Rare, yes, but if you have to suspect it more than twice, and changing the compiler is not an option, you kind of start minding the possibility of such things, too.

    Two real-life examples:

    I was writing a function which used a floating-point constant, let’s say 123.4567. If you store it in single-precision format you get one set of bits, and double precision gives you another, which makes an extremely subtle difference. But combined with several hundred multiply- add- round- divide- add- multiply- round- divide type of operations (no, I am not kidding), it finally produced the number 10.5, and right before the final rounding it was 10.4999998 if double precision was used, and 10.5000003 if single precision was used. (The numbers are for demonstration purposes only; the real numbers were different.) First, I had to figure out why exactly the rounding of the final number was different for two builds of our application, and we are talking about a subtle difference in one step of several hundred similar iterations. Second, I had to figure out under what circumstances the compiler was preferring a single-precision constant to a double-precision one. Third, I had to GUARANTEE that no matter how you compile it, the calculations are stable given the same original data (otherwise I’d be forced to check for this bug forever with each build). It was not directly related to any bug in the optimizer, but it was the optimization options that essentially forced the compiler to choose between different data types, so debugging it was a nightmare: try matching floating-point calculations in optimized and non-optimized code.

    On another occasion our application suddenly started to crash. No special user action, no magic keys, no singing and dancing around: it was simply disappearing as a process once in a while. No logs, no messages… It turned out to be a stack overflow, which only happened in a build with full optimization and no debug symbols, and only with a specific type of corruption in user data. I ended up writing a tool that scanned the compiled machine code, found all the places where memory was touched on the stack every 4 KB to commit virtual memory for the stack segment, and deduced the amount of stack each one was trying to allocate, just to catch all POSSIBLE places where it could fail. Then I had to figure out which functions in the source code (it’s a project of two-thirds of a million lines of code, not counting statically linked libraries) were responsible for these allocations. I simply had to do it, to guarantee that the problem was indeed fixed.

    So, in both cases the problem was not CAUSED by the optimizer, but SEVERED by it. The first one because of a subtle difference in the produced code (it was not a bug, but I had to figure out what was going on nevertheless), and the second one because I essentially had to reverse-engineer our own application, and for that task the source code was pretty much useless.

    Sorry for the misprints – I am tired.
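    Alexey's 123.4567 example is easy to reproduce. The same literal stored in single and in double precision gives two different values, and a difference that small is what can tip a later rounding step one way or the other. The following Python sketch round-trips a double through IEEE single precision with struct. The helper name is illustrative.

```python
import struct


def as_float32(x: float) -> float:
    """Round a Python float (an IEEE double) to single precision and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


constant64 = 123.4567
constant32 = as_float32(constant64)
print(f"double: {constant64!r}")
print(f"single: {constant32!r}")
print(f"difference: {constant32 - constant64:.3e}")
```

    Near 123, a single-precision value is only accurate to a few millionths, which is why the two builds could drift apart over hundreds of dependent operations.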

  47. Alexey Kats says:

    "multiply- add- round- divide- add- multiply- round- divide"

    Oops, that was supposed to be

    "multiply – multiply – round – divide – add – multiply – round – divide"

    Sorry.

  48. Gorgosyntheton says:

    Source level debugging is not a crutch, it is yet another tool in the toolbox. I recognize the need for asm level debugging, but there are also situations in which source debugging makes things easier. If I have a logic error that reproduces in a debug build, why would I traipse through loads of asm when I can easily debug the code I wrote?

    I don’t work on an OS or a server program, maybe that is the difference.

  49. David Pritchard says:

    Erm, horses for courses, people. Developers are different, requirements are different. Your essential skill is my optional extra, etc.

  50. Keith’s right: at one point, writing assembly language was a requirement for writing fast code. These days, the rules for making efficient code are so complicated that the compiler almost always does a better job of it than most people (there are probably half a dozen people I’d trust to write efficient assembly language code).

  51. Why is this such an either/or thing? I want strong source-level debugging *and* strong assembly-level debugging. I want the debugger to offload the mundane work wherever possible *and* still allow me to go spelunking with the power of cdb and kd. I don’t want to have to install anything on the machine being debugged *and* I want a great GUI on my end that leverages color, multiple windows, data visualization, etc. so that I’m not confined by the linear flow of a command line debugger.

    For example, compare OllyDbg and cdb below; both are debugging notepad.exe with no symbols or source. Notice how in OllyDbg you can always see the registers and a useful view of the stack. In the assembly window it cracks parameters and window messages, resolves callees, shows the direction of jumps, looks up constants, etc. wherever possible.

    http://www.tonyschr.net/images/ollydbg.png

    http://www.tonyschr.net/images/cdb.png

    For now, WinDbg is probably the best balance between power and UI: http://www.tonyschr.net/images/windbg.png

  52. Keith Moore [exmsft] says:

    "Are you saying every developer should know assembly language? I find that an absurd idea."

    There are several levels of "knowing assembly language".

    Knowing how to *read* assembly language well enough to debug compiler-generated code is an absolute must. As many people have already pointed out, there are situations in which you have no other choice. When some routine buried in a library spews chunks over your stack and your EIP ends up in the Twilight Zone, you’ll need to know enough assembly language to "pick up the pieces" and track down the guilty culprit.

    Knowing how to *write* assembly language is a much less useful skill, IMHO.

  53. bw says:

    ida rocks you damn ms nerds

  54. Michael Silk says:

    For the person who wanted a good Windows assembly reference, look at the forums here: http://win32asmboard.cjb.net/.

  55. Mike Hearn says:

    printf debugging is what I normally rely on: I have to assume that checked Windows builds have some kind of logging framework built in. Obviously logging can *also* affect the bug if it’s timing related, but there are lots of bugs where it’s not really possible to use assembly level debugging because the effects are quite abstract, and the best way to figure it out is to examine a debug trace.

  56. mschaef says:

    "So, in both cases the problem was not CAUSED by optimizer, but SEVERED by it."

    I have a recent story about one of these myself. I had an application that, only when compiled for release, was periodically failing to update the window during resize.

    For context, the normal way the window in my app gets redrawn is as follows:

    1) open the back buffer

    2) draw into the back buffer

    3) copy the back buffer into the window

    4) close the back buffer

    Basically, step 1 was periodically failing with "buffer already open" errors. Therefore, step 3 was periodically failing to run on subsequent redraws. Step 4 was enclosed in the equivalent of a "finally" block, which, in my little interpreter, is implemented as a pointer from the C stack to a function object that gets called when the stack gets unwound. After some investigative work, it turned out that when step 4 failed to run, this pointer on the stack was null, rather than pointing to the expected function to close the back buffer. A "break on write" revealed that the function was being nulled out by my garbage collector, which was a surprise, since the GC traversed the stack looking for object pointers, and my buffer close function should therefore have been considered referenced.

    The problem stemmed from the way I determined the beginning of the stack. Basically, in WinMain, I grabbed a pointer to a local variable, considered that a close enough guess at the beginning of the stack, and used it during garbage collection to define the range of addresses occupied by the stack. However, in release builds, the compiler took one of my initialization functions (with a couple of huge local variables) and inlined it into WinMain, thereby enlarging the local variable space in WinMain. Due to the way local variables were allocated, this changed my guess of the stack base by a couple thousand bytes. During window redraws, the pointer to my buffer close function happened to be in the area of the stack that was not traversed. So, when the GC was invoked while the back buffer was open, the close-the-back-buffer function was considered unreferenced and garbage collected, and therefore never ran. Subsequent calls to open-the-back-buffer would then fail.

    The root cause was my mediocre attempt to compute a stack base address, but the optimizer made it a lot worse in an unexpected way.
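    To make the failure mode concrete, here is a toy simulation in C. It is not mschaef's interpreter: an array of words stands in for the stack, and the hypothetical `is_referenced` scans a range of it the way a conservative collector scans for roots. The point it illustrates is simply that a root lying outside the scanned range is never seen, so whatever it points to looks like garbage.

```c
#include <assert.h>
#include <stdint.h>

/* Conservatively scan the words in [lo, hi) for a value equal to obj's
   address. A match counts as a reference that keeps obj alive; a word
   outside the range is never examined, however live it really is. */
static int is_referenced(const uintptr_t *lo, const uintptr_t *hi,
                         const void *obj)
{
    assert(lo <= hi);
    for (const uintptr_t *p = lo; p < hi; ++p) {
        if (*p == (uintptr_t)obj)
            return 1;
    }
    return 0;
}
```

    For example, if a hypothetical `uintptr_t fake_stack[16]` holds a pointer to a hypothetical `close_fn_object` in slot 12, scanning from `fake_stack` to `fake_stack + 16` finds it, while scanning only to `fake_stack + 8` (a mis-guessed stack base) does not, so a collector using that range would free the object.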

    BTW, anybody have a good way to reliably compute the base address of the runtime stack of the main thread under windows? ;-)

  57. Florian says:

    If only x86 assembly wasn’t so weird. Not the instructions, but you have only like 8 (?) registers which have funny names. So far I haven’t got used to that. Good thing my target platform is a PPC. That assembly looks less threatening to me.

  58. Michael Silk says:

    8 32-bit ones, I believe there are lots of other ones when you go into the MMX and other acronyms … (SSE, FPU, …)

    I think the names have reasonable definitions when you find them … ESI Extended Stack Something ? EAX – Extended Accumulator ? Hopefully someone can chime in with the real definitions :)
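    For reference, the standard expansions: the E prefix marks the 32-bit "extended" form of the original 16-bit registers (AX, CX, and so on), and ESI is the source index, not anything stack-related. The sketch below lists the eight 32-bit general-purpose registers in the order the CPU numbers them inside instruction encodings, and shows how that numbering surfaces in the B8 ("MOV EAX,immed32") opcode from the article: MOV r32, imm32 is B8 plus the register number. The enum and function names are illustrative, not from any library.

```c
#include <assert.h>

/* The eight 32-bit general-purpose registers, in the order the CPU numbers
   them (0 through 7) inside instruction encodings. */
enum reg32 { REG_EAX, REG_ECX, REG_EDX, REG_EBX, REG_ESP, REG_EBP, REG_ESI, REG_EDI };

static const char *const reg32_names[8] = {
    "EAX", /* accumulator */
    "ECX", /* count: LOOP and REP counts */
    "EDX", /* data: high half of MUL results and DIV dividends */
    "EBX", /* base */
    "ESP", /* stack pointer */
    "EBP", /* base pointer: usually the current stack frame */
    "ESI", /* source index for string instructions */
    "EDI", /* destination index for string instructions */
};

static const char *reg32_to_string(enum reg32 r)
{
    assert((unsigned)r < 8);
    return reg32_names[r];
}

/* "MOV r32, imm32" keeps the register number in the low three bits of its
   one-byte opcode: B8 for EAX, B9 for ECX, ..., BF for EDI. */
static unsigned char mov_r32_imm32_opcode(enum reg32 r)
{
    assert((unsigned)r < 8);
    return (unsigned char)(0xB8 + r);
}
```

    So `mov_r32_imm32_opcode(REG_ESI)` yields 0xBE, and `reg32_to_string(REG_ESI)` yields "ESI".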

  59. Justin Cobb says:

    I have seen MSVC use 0xbadf00d for dodgy pointers a few times. hehe.

  60. Johan Johansson says:

    Wow, what a bunch of posers. I’m sure there are scenarios where the only choice is to debug the machine code and even more where it is preferable, but claiming that the only way to debug something is to look at the disassembly with kd and watch the binary representations of your data types "flow" through the registers? That’s about as clever as reading a book by examining its molecular structure.

    As for debugging optimized code, that’s a moot point since MSVC’s optimizations can’t be trusted to begin with. At least that has been the story so far – VS2005 seems to be a giant leap in every other respect, so I haven’t given up on it yet. But currently I wouldn’t go above inlining in a shipping product.

  61. Steven C. says:

    For those seeking backwards-in-time debugging:

    http://www.ghs.com/products/timemachine.html

    Sadly, unavailable (at present) for Windows debugging.

    N.B.: Debugging optimized code is hard, but there’s no reason a debugger can’t do it "sufficiently" well, giving the developer the advantage of source code and low level information. You just have to want that. :)

  62. Norman Diamond says:

    11/16/2004 12:06 PM Steven C.

    > For those seeking backwards-in-time debugging:

    Oh, I thought that was a joke, but it’s a real product, just with a joke name. Of course it doesn’t look all that different from the kind of traceback that ICE can provide, but of course it’s often[*] advantageous to get it at source level and store a longer history than ICE used to do.

    [* often != 0 && often != 1]

  63. Gary Wheeler says:

    I can’t believe you ‘optimizer always on’, machine-code debugger guys are serious. You can’t possibly debug any significant size project in this fashion.

    And before anybody starts making cracks about me being a source debugging sissy, I’m maintaining an OS/2 device driver written in 18,000 lines of assembly language. No frickin’ debugger at all.

  64. Raymond Chen says:

    "You can’t possibly debug any significant size project in this fashion."

    Shhh, don’t tell the Windows team. Not all debugging is done at asm-level, but a significant chunk is. They’d be pretty disheartened to learn that what they’re doing is impossible.

  65. Matt Pietrek published two useful articles in MSJ several years ago: Just Enough Assembly Language to Get By, Parts I and II.

    http://www.microsoft.com/msj/0298/hood0298.aspx

    http://www.microsoft.com/msj/0698/hood0698.aspx

  66. Jason Geffner says:

    > do you have a good recommendation for an x86 programming book?

    The Art of Assembly Language can be downloaded for free in PDF format: http://webster.cs.ucr.edu/AoA/DOS/pdf/0_AoAPDF.html

    Alternatively, you can look at it in HTML format (which I prefer for searching): http://webster.cs.ucr.edu/AoA/DOS/AoADosIndex.html

    > ida rocks you damn ms nerds

    Yes, it does, but not for debugging :)

    (at least not yet)

  67. Ben Hutchings says:

    mschaef: Use something like this:

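    #include <windows.h>

    /* Returns the top (highest address) of the committed region that holds
       a local variable. The stack grows downward, so for the calling thread
       that end of the region is normally the stack base. */
    void *get_stack_base(void) {
        MEMORY_BASIC_INFORMATION mem_info;

        /* mem_info is itself on the stack, so query the region containing it. */
        VirtualQuery(&mem_info, &mem_info, sizeof(mem_info));

        /* BaseAddress is the start of the page holding mem_info; adding
           RegionSize skips over every following page with the same state. */
        return (char *)mem_info.BaseAddress + mem_info.RegionSize;
    }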

  68. ade says:

    Pls, can somebody introduce me to writing machine code and assembly code? Thanks.

  69. Tom Canham says:

    That’s what I love about sweeping generalizations — they’re all wrong.

    *wink*

    Does every programmer need to know assembly? Of course not. There are plenty of productive programmers out there who don’t have the faintest clue what assembly language their code creates, and aren’t interested in learning. Are they *bad* programmers? "Bad" is a religious question, and I stay out of religious debates.

    Remember when C++ was for wimps, and true coders wrote in C?

    Remember when C was for wimps, and true coders wrote in asm?

    Remember when asm was for wimps, and true coders wrote in machine code?

    Remember when machine code was for wimps, and true coders "coded" by flipping toggle switches?

    Remember when toggle switches were for wimps, and true coders "coded" by swapping out carefully carved gears?

    Okay, I’m getting ridiculous, but the point is that that debate is silly — it’s like the definition of a "real man" — every man wants to be included in that definition, and to exclude as many others as possible. It’s just a pissing match.

    Now, back to reality. My *opinion* is that truly clever people use whatever tool is *best suited to the job*.

    I have done a *lot* of kernel debugging. I spent three years in PSS doing top-tier blue screen analysis from WinNT 3.5 through Win2k. Kernel debugging is the *ultimate* in power — every value of every variable anywhere in the system can be accessed, somehow, through kd.

    It’s also a *pain* — in ntsd I can just look at ebp-whatever to see a local. In kd I have to do that, then map the VM to PM, read the PM, etc. Does that make it bad? Of course not. But it DOES mean that unless you’ve got a problem that’s nigh-insoluble in kernel mode, using kd just for some macho factor is, well, stupid.

    Source level debugging used to be horrible, horrible, horrible. When I worked in PSS, we all laughed at Visual Studio and its silly attempts to track locals through optimized code. Yes, you could source level debug, but 90% of the time (or more), the info you saw was completely wrong.

    Guess what? Times have changed. With new PDB formats, new debugger support, and compiler tweaks to help the debugger out, source level debugging is pretty damned good. It’s not perfect. True optimization is lossy; and when information is lost, it’s LOST. But the good news is that at least now the debuggers have a good idea of the *lifetime* of information’s validity, so you can at least see "unknown" or something rather than some wrong (and possibly misleading) value. If you haven’t source level debugged in a while, I’d encourage it.

    And source level isn’t just for VS weenies either — Windbg and (I think) ntsd and (maybe?) kd do it now too. The functionality is *finally* starting to coagulate into libraries and dll’s shared across teams at Microsoft. So *finally* we’re seeing a coherent debugging picture, and thank god for that!

    So in conclusion for this massive post — I think that coders tend to be a tad insecure. We all want to think we’re "l33t" super coders, and everyone else is just a wannabe. But remember that there are many solutions to most problems, and it’s not a matter of finding the One True Path, but rather trying to figure out what the most efficient solution to a given problem will be, for you.

    Hell, I’m even using C# more and more these days — I actually *like* .Net. Times change!
