What did Windows 3.1 do when you hit Ctrl+Alt+Del?


This is the end of Ctrl+Alt+Del week, a week that sort of happened around me and I had to catch up with.

The Windows 3.1 virtual machine manager had a clever solution for avoiding deadlocks: There was only one synchronization object in the entire kernel. It was called "the critical section", with the definite article because there was only one. The nice thing about a system where the only available synchronization object is a single critical section is that deadlocks are impossible: The thread with the critical section will always be able to make progress because the only thing that could cause it to stop would be blocking on a synchronization object. But there is only one synchronization object (the critical section), and it already owns that.
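The deadlock-freedom argument fits in a few lines. This is a minimal Python sketch, not the VMM's code. The class `TheCriticalSection` and its `enter`/`leave` methods are hypothetical, and the sketch assumes the owner may re-enter the lock it already holds. A deadlock requires a cycle of threads, each waiting on a lock held by another. The holder of the only lock never has anything to wait for, so no such cycle can form.

```python
class TheCriticalSection:
    """Hypothetical model of a kernel whose only synchronization object is one lock."""

    def __init__(self):
        self.owner = None   # thread currently holding the lock, or None
        self.count = 0      # recursion count (assumption: re-entry by the owner is allowed)

    def enter(self, thread) -> bool:
        """Return True if `thread` now owns the lock, False if it would have to wait."""
        if self.owner is None or self.owner == thread:
            self.owner = thread
            self.count += 1
            return True
        return False  # only non-owners ever wait

    def leave(self, thread) -> None:
        assert self.owner == thread, "only the owner may leave"
        self.count -= 1
        if self.count == 0:
            self.owner = None


cs = TheCriticalSection()
assert cs.enter("A")        # A takes the lock
assert cs.enter("A")        # A re-enters: it never blocks on itself
assert not cs.enter("B")    # B waits, but A can always make progress
cs.leave("A")
cs.leave("A")
assert cs.enter("B")        # once A leaves, B gets in
```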

When you hit Ctrl+Alt+Del in Windows 3.1, a bunch of crazy stuff happened. All this work was in a separate driver, known as the virtual reboot device. By convention, all drivers in Windows 3.1 were called the virtual something device because their main job was to virtualize some hardware or other functionality. That's where the funny name VxD came from. It was short for virtual x device.

First, the virtual reboot device driver checked which virtual machine had focus. If you were using an MS-DOS program, then it told all the device drivers to clean up whatever they were doing for that virtual machine, and then it terminated the virtual machine. This was the easy case.

Otherwise, the focus was on a Windows application. Now things got messy.

When the 16-bit Windows kernel started up, it gave the virtual reboot device the addresses of a few magic things. One of those magic things was a special byte that was set to 1 every time the 16-bit Windows scheduler regained control. When you hit Ctrl+Alt+Del, the virtual reboot device set the byte to 0, and it also registered a callback with the virtual machine manager to say "Call me back once the critical section becomes available." The callback didn't do anything aside from remember the fact that it was called at all. And then the code waited for ¾ seconds. (Why ¾ seconds? I have no idea.)
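The special byte works like the heartbeat flag in a watchdog: clear it, wait, and see whether anyone set it again. The sketch below models that bookkeeping in Python under stated assumptions. `RebootProbe` and its method names are hypothetical. The events that would really come from the 16-bit scheduler and the virtual machine manager are modeled as plain method calls.

```python
class RebootProbe:
    """Hypothetical model of the state the virtual reboot device sets up on Ctrl+Alt+Del."""

    def __init__(self):
        self.scheduler_byte = 1          # set to 1 by the 16-bit scheduler whenever it regains control
        self.cs_callback_fired = False   # set by the "call me back" callback

    def arm(self) -> None:
        """On Ctrl+Alt+Del: clear the heartbeat and (conceptually) register the callback."""
        self.scheduler_byte = 0
        self.cs_callback_fired = False

    def scheduler_regained_control(self) -> None:
        """Hook run by the modeled 16-bit scheduler."""
        self.scheduler_byte = 1

    def critical_section_available(self) -> None:
        """The callback: it does nothing except record that it was called."""
        self.cs_callback_fired = True

    def snapshot(self) -> tuple:
        """The state inspected after the 3/4-second wait."""
        return self.cs_callback_fired, self.scheduler_byte


# A hung Windows app: the critical section frees up, but the scheduler never runs.
probe = RebootProbe()
probe.arm()
probe.critical_section_available()
print(probe.snapshot())  # (True, 0)
```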

After ¾ seconds, the virtual reboot device looked to see what the state of the machine was.

If the "call me back once the critical section becomes available" callback was never called, then the problem was that a device driver was stuck in the critical section. Maybe the device driver had put an Abort, Retry, Ignore message on the screen that the user needed to respond to. The user saw this message:


 Procomm 


This background non-Windows application is not responding.

*  Press any key to activate the non-Windows application.
*  Press CTRL+ALT+DEL again to restart your computer. You will
   lose any unsaved information.


  Press any key to continue _

After the user pressed a key, focus was placed on the virtual machine that held the critical section so that the user could address the problem. A user who was still stuck could hit Ctrl+Alt+Del again to restart the whole process, and this time, execution would go into the "If you were using an MS-DOS program" case, and the code would shut down the stuck virtual machine.

If the critical section was not the problem, then the virtual reboot device checked whether the 16-bit kernel scheduler had set the byte to 1 in the meantime. If so, then that meant no applications were hung, and you got the message


 Windows 


Although you can use CTRL+ALT+DEL to quit an application that has stopped responding to the system, there is no application in this state.

To quit an application, use the application's quit or exit command, or choose the Close command from the Control menu.

*  Press any key to return to Windows.
*  Press CTRL+ALT+DEL again to restart your computer. You will
   lose any unsaved information in all applications.


  Press any key to continue _

(Anachronism alert: The System menu was called the Control menu back then.)
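The checks made after the ¾-second wait form a short decision tree, and the order matters. The critical section is examined first, and the scheduler byte only afterward. Here is a hedged Python sketch of that ordering. The function name and the `Verdict` names are made up for illustration.

```python
from enum import Enum


class Verdict(Enum):
    DRIVER_HOLDS_CRITICAL_SECTION = 1   # "This background non-Windows application is not responding."
    NO_APPLICATION_HUNG = 2             # "...there is no application in this state."
    WINDOWS_APP_NOT_YIELDING = 3        # go on to the virtual-interrupt wait and single-stepping


def classify_after_wait(cs_callback_fired: bool, scheduler_byte: int) -> Verdict:
    """Hypothetical sketch of the first decision made after the 3/4-second wait."""
    if not cs_callback_fired:
        # Something (e.g. a driver showing Abort, Retry, Ignore) still owns the critical section.
        return Verdict.DRIVER_HOLDS_CRITICAL_SECTION
    if scheduler_byte == 1:
        # The 16-bit scheduler ran during the wait, so no Windows application is hogging the CPU.
        return Verdict.NO_APPLICATION_HUNG
    # The byte is still 0: a 16-bit Windows application never gave control back.
    return Verdict.WINDOWS_APP_NOT_YIELDING


print(classify_after_wait(cs_callback_fired=True, scheduler_byte=0))
```

Because the critical-section check comes first, a stuck driver is reported even if the scheduler byte happens to be 1.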

Otherwise, the special byte was still 0, which meant that the 16-bit scheduler never got control, which in turn meant that a 16-bit Windows application was not releasing control back to the kernel. The virtual reboot device then waited for the virtual machine to finish processing any pending virtual interrupts. (This allowed any pending MS-DOS emulation or 16-bit MS-DOS device drivers to finish up their work.) If the virtual machine did not reach this sane state (no pending virtual interrupts) within 3¼ seconds, then you got this screen:


 Windows 


The system is either busy or has become unstable. You can wait and see if the system becomes available again and continue working or you can restart your computer.

*  Press any key to return to Windows and wait.
*  Press CTRL+ALT+DEL again to restart your computer. You will
   lose any unsaved information in all applications.


  Press any key to continue _
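The 3¼-second step is a wait with a timeout. The article doesn't say how the virtual reboot device actually waited, so this Python sketch simply polls a hypothetical `pending_at(elapsed_ms)` function on a simulated clock. The function name and the 50 ms poll interval are assumptions. Integer milliseconds avoid floating-point drift at the 3250 ms boundary.

```python
from typing import Callable


def wait_for_no_pending_interrupts(pending_at: Callable[[int], int],
                                   timeout_ms: int = 3250,
                                   poll_ms: int = 50) -> bool:
    """Return True if pending virtual interrupts drain to zero within timeout_ms.

    pending_at(elapsed_ms) is a hypothetical probe that reports how many virtual
    interrupts are still pending at that point on a simulated clock.
    """
    elapsed_ms = 0
    while elapsed_ms <= timeout_ms:
        if pending_at(elapsed_ms) == 0:
            return True       # sane state reached: go on to single-stepping
        elapsed_ms += poll_ms
    return False              # show "The system is either busy or has become unstable."


# Interrupts drain after one simulated second, well within the limit.
print(wait_for_no_pending_interrupts(lambda t: 0 if t >= 1000 else 2))  # True
# Interrupts never drain.
print(wait_for_no_pending_interrupts(lambda t: 1))                      # False
```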

Otherwise, the system had returned to a state with no active virtual interrupts. The 16-bit kernel then single-stepped the processor, if necessary, until the instruction pointer was no longer in the kernel, or until it had single-stepped for 5000 instructions and the instruction pointer was not in the heap manager. (The heap manager was allowed to run for more than 5000 instructions.) The point of the single-stepping was to get execution back into the application, so that the kernel could pretend that the application had called exit.
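The stopping rule for single-stepping is easy to misread, so here it is as a loop. This is a Python model, not the real int 1 handler. The instruction pointer is an integer, and `step`, `in_kernel`, and `in_heap_manager` are hypothetical stand-ins: one executes a single traced instruction, and the other two test which code the pointer is in. Stepping continues while the pointer is inside the kernel. It gives up after 5000 instructions unless the pointer is inside the heap manager, which is allowed to keep running.

```python
from typing import Callable, Tuple


def single_step_out_of_kernel(ip: int,
                              step: Callable[[int], int],
                              in_kernel: Callable[[int], bool],
                              in_heap_manager: Callable[[int], bool],
                              limit: int = 5000) -> Tuple[int, int]:
    """Return (final instruction pointer, number of instructions stepped)."""
    steps = 0
    while in_kernel(ip):                      # already outside the kernel: no stepping needed
        if steps >= limit and not in_heap_manager(ip):
            break                             # ran out of budget and not in the heap manager
        ip = step(ip)                         # execute one traced instruction
        steps += 1
    return ip, steps


KERNEL = range(0x1000, 0x2000)        # hypothetical address ranges
HEAP_MANAGER = range(0x1800, 0x1900)

# Kernel code that falls back into the application after 10 instructions.
print(single_step_out_of_kernel(0x1FF6, lambda ip: ip + 1,
                                lambda ip: ip in KERNEL,
                                lambda ip: ip in HEAP_MANAGER))  # (8192, 10)
```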

At this point, you got the screen that Steve Ballmer wrote.



Contoso Deluxe Music Composer


  This Windows application has stopped responding to the system.

  *  Press ESC to cancel and return to Windows.
  *  Press ENTER to close this application that is not responding.
     You will lose any unsaved information in this application.
  *  Press CTRL+ALT+DEL again to restart your computer. You will
     lose any unsaved information in all applications.



If you hit Enter, then the 16-bit kernel terminated the application by doing mov ax, 4c00h followed by int 21h, which was the system call that applications used to exit normally. This time, though, the kernel was making the exit call on behalf of the stuck application, so to everything else it looked as if the application had simply decided to exit normally.
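The mov ax, 4c00h instruction packs two values into one register. The high byte, AH, selects the MS-DOS int 21h function (4Ch, terminate with return code). The low byte, AL, holds the exit code (0). Here is a small Python sketch of that split. `split_ax` is just an illustrative helper name.

```python
def split_ax(ax: int) -> tuple:
    """Split the 16-bit AX register into (AH, AL)."""
    ah = (ax >> 8) & 0xFF   # high byte: int 21h function number
    al = ax & 0xFF          # low byte: argument (here, the exit code)
    return ah, al


ah, al = split_ax(0x4C00)
assert ah == 0x4C   # function 4Ch: terminate with return code
assert al == 0x00   # exit code 0
print(hex(ah), al)  # 0x4c 0
```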

The stuck application exits, the kernel regains control, and hopefully, things return to normal.

I should point out that I didn't write any of this code. "It was like that when I got here."

Bonus chatter: There were various configuration settings to tweak all of the above behavior. For example, you could say that Ctrl+Alt+Del always restarted the computer rather than terminating the current application. Or you could skip the check of whether the 16-bit kernel scheduler had set the byte to 1, so that you could use Ctrl+Alt+Del to terminate an application even if it wasn't hung.¹ There was also a setting to restart the computer upon receipt of an NMI, the intention being that the signal would be triggered either by a dedicated add-on switch or by poking a ball-point pen in just the right spot. (This was safer than just pushing the reset button because the restart would flush disk caches and shut down devices in an orderly manner.)

¹ This setting was intended for developers to assist in debugging their programs, because if you went for this option, the program that got terminated was whichever one happened to have control of the CPU at the time you hit Ctrl+Alt+Del. This was, in theory, random, but in practice it often guessed right. That's because the problem was usually that a program got wedged in an infinite message loop, so most of the CPU time was being spent in the stuck application anyway.

Comments (29)
  1. Darren says:

    Fun fact: Lots of old mainframe operating systems called the OS the "System Monitor," not because it monitored the system but because a "monitor" is a synchronization object you can suspend yourself in and pick up again when it gets signaled. I.e., it would be like naming the kernel "the system critical section".

  2. IanBoyd says:

    Intended or not, I liked this trip down the Ctrl+Alt+Delete hole.

    It was also a trip down memory lane. Not only for the blue screens, but for the unintentional mention of Procomm. That, and Telix, were used a lot when BBSs existed.

  3. Gabe says:

    I love how the kernel single-stepped the processor! I wonder how often this technique has been used outside of debuggers.

  4. Muzer_ says:

    I like this, very clever – though I got a bit lost at "If things did not return to this sane state within 3¼ seconds, then you got this screen:" and "The kernel single-stepped the processor if necessary until the instruction pointer was no longer in the kernel, or until it had single-stepped for 5000 instructions and the instruction pointer was not in the heap manager."

  5. Muzer_ says:

    To expand on that, which sane state is it talking about, and what is the kernel trying to achieve by single-stepping the processor (and which instruction pointer)?

    [The sane state is explained in the previous sentence, once you ignore the parenthetical: "… waited for the virtual machine to finish processing any pending virtual interrupts." And the purpose of single-stepping the processor is to get back to the application so we can pretend the application called exit. -Raymond]
  6. Azarien says:

    @Muzer: Perhaps to get the name of the "current" executable, to be displayed on the blue screen.

  7. N I says:

    This has been a fascinating series of articles. I love how clever 16-bit Windows had to be, in contrast to NT which was able to favor correctness up front.

  8. Scott Brickey says:

    So THAT'S where VxD came from. Finally answered, after decades of not knowing (not that I was investigating the whole time).

  9. D V says:

    It's absolutely stunning that Windows, in its pre-NT incarnation, was actually a somewhat-functioning piece of software. I'm in a state of awe after reading this article. Favorite quote – "Why 3/4 seconds? I have no idea". Just utter chaos, held together by bubble gum and match sticks. Truly an amazing piece of software.

    [You had to pull a lot of tricks to squeeze three operating systems into 2MB of memory. -Raymond]
  10. Virtual troll Device says:

    All my eyes could focus on those screens is "You will lose any unsaved information". The "of Death" part in BSOD is not unfounded.

  11. Yuhong Bao says:

    I hope you will write about how Ctrl-Alt-Del changed in Win9x. Win32 apps are probably relatively easy, but it would be interesting to read about how it handled Win16 apps.

  12. foo any says:

    I was very confused about this:

    "The kernel single-stepped the processor if necessary until the instruction pointer was no longer in the kernel, or until it had single-stepped for 5000 instructions and the instruction pointer was not in the heap manager. (The heap manager was allowed to run for more than 5000 instructions.)"

    Then I remembered that there are actually two kernels involved here, the 32bit virtual machine manager and the 16bit Windows kernel. So the former single stepped the latter, right?

    [The 16-bit kernel single-stepped itself. The 32-bit kernel simulated an int 1 into the 16-bit kernel, and the int 1 handler took over from there. -Raymond]
  13. Eddie says:

    Ctrl-Alt-Del week has been absolutely entertaining and informative.

    Raymond, after all those tech news sites erroneously reported that Ballmer created the BSOD did any coworkers stop by your office and say "look what you done?"

    [Yeah, like all of them. (Exaggerating.) -Raymond]
  14. foo any says:

    Ah, thanks Raymond, the simulated interrupt was the missing link. I was wondering how the 16bit kernel single stepping itself would make any sense if the virtual machine manager just resumed it.

    By the way, you mentioned NMIs. What did happen on Win 3.1 when an NMI took place? Did the VMM just ignore it? Pass it into the 16bit kernel? Die?

    [Actually, it wasn't precisely like that. The 32-bit kernel simulated a call to a magic function (part of the magic information provided at startup) as if it were the int 1 handler. That function would then decide whether further tracing was necessary; if so, it would install a true int 1 handler, set the TF flag, and iret. Oh, did I mention that all of the code throughout this entire endeavor (32-bit kernel and 16-bit kernel) was written in assembly? (And the virtual reboot device by default rebooted the system when an NMI occurred.) -Raymond]
  15. Muzer_ says:

    Oh, I see. I had somehow misunderstood the term "in the kernel", for some reason interpreting it as meaning in the working memory of the kernel (as in, some pointer on the stack somewhere had to disappear so it was no longer "in the kernel"), as opposed to the instruction pointer (which I now realise is synonymous with the program counter) having to stop pointing to memory belonging to the kernel. Thanks for correcting my stupid misunderstanding!

  16. Yuhong Bao says:

    @Muzer_: To be more precise, I think it is referring to KRNL386.

  17. user says:

    The whole week was awesome Raymond! What can you tell us about "Abort, Retry, Fail" message?

  18. 12BitSlab says:

    This past week is yet another reason why this is my favorite blog.  Thanks, Raymond!

  19. smf says:

    "The whole week was awesome Raymond! What can you tell us about "Abort, Retry, Fail" message?"

    @user

    I'm old school. I remember "Abort, Retry, Ignore", IIRC it changed around dos 3.3

  20. nobugz says:

    Ah, I see Ballmer's hand in rewriting *that* one.

  21. Innocent Bystander says:

    There is something particularly enchanting about these stories from the Windows 3.1 days. I think the hardware constraints force the generation of "nifty code" that is fun to read (and one assumes write) about. Win NT was perhaps a much more stable OS, but it was at least an order of magnitude slower at the time! I get the same enjoyment reading the original Quake sources (when can you open source Win 3.1, Raymond?). Sadly, modern hardware means we mostly have different priorities nowadays.

  22. ErikF says:

    I'm betting that the 3/4 second delay was to let VCACHE flush its buffers. I remember disk activity happening when I pressed Ctrl+Alt+Del (and also remember that hard disks were *not* fast back then!) Thanks again for the great series! There's something to be said for "nifty code", but the downside to it is that such code often becomes unmaintainable and a possible source of problems later on.

    [Well, there was no VCACHE in Windows 3.1, but I get your point. To let the disk cache flush. -Raymond]
  23. user says:

    @smf

    I guess I'm old

  24. user says:

    Off topic: Raymond, I guess the folks on the Windows Phone team badly need your guidance. My phone was updating. It took more than an hour to reach 93%, then error 801882d2 happened, and then, instead of resuming the download, it started all over again. A quick search showed that error 801882d2 is probably a server time-out error. Does it really have to restart the download? Can't it just resume it? Guys, seriously?

    [Try hitting Ctrl+Alt+Del. -Raymond]
  25. cheong00 says:

    Yes, I think Win3.1 was still using SMARTDRV.EXE for caching.

    Btw, regarding the "unstable" state in Win3.1, I remember wondering why the mouse cursor could sometimes move in the "unstable" state and sometimes couldn't (the Ctrl-Alt-Del keys were still functional in both cases). We used this to deduce how badly the system state had gone at that time.

  26. Mc says:

    Anyone remember the "crash"  where moving the mouse made the machine make beeping noises at you. Because the message queue was full? Much more entertaining than a blue screen.

    [Oh, you mean ticking death. -Raymond]
  27. DWalker says:

    That KB article 83325 is quite interesting!  I probably read it when it first came out.  Windows, and all operating systems, have come a long way since then (only 23 years).  
