Scritch, scratch

Steve Ball posted an article about some "glitching" issues in Vista. I can't resist adding my two cents.

For me, Vista definitely glitches a LOT more than previous versions of Windows. As a fairly experienced developer, I think I understand the reasons pretty well, so I can explain it away. But as a user, when my laptop audio is glitchy, I want to find the developer responsible and (censored for mild descriptions of hypothetical violence).

I've read a lot of comments raising various theories, some that call into question the sanity of the Windows developers. I can't say I blame them. There is definitely some room for improvement in the way things work. However, in the interest of fairness and progress, the attention should be focused where it will do the most good. That means we shouldn't simply blame the Windows developers unless there is really something they can do about the problem. And that means that before we start placing blame, we probably ought to figure out where the problem really lies.

The first complaints always mention something about the "lame" and "brain-dead" Windows NT scheduler. However, I'm pretty convinced that this is not the problem. In fact, my audio sounds BETTER when my system is under load (more on this later). I agree with the statement that audio glitches under CPU load are usually the fault of the OS and the scheduler, but I've seen very little correlation between CPU load and audio glitching. I haven't seen any evidence that the CPU scheduler is at fault.

Hard disk load is occasionally an issue, but that is generally fairly obvious and easy to fix at the application level. The application simply didn't buffer enough sound samples and ran out of music to play while waiting for the hard drive to load the next bit of music. Either tweak the buffering algorithm of the application or get a faster hard drive (or network).
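To make the buffering argument concrete, here is a minimal sketch in C. It assumes a simplified player that issues disk reads back to back, with each completed read adding a fixed amount of audio to the queue. The function name, the parameters, and the numbers are hypothetical and only serve as illustration; this is not how any particular player works.

```c
#include <stddef.h>

/* Returns the total milliseconds of silence (underrun) during playback.
 *
 * buffered_ms : audio already queued when playback starts
 * read_ms[i]  : how long the i-th disk read takes (hypothetical numbers)
 * reads       : number of entries in read_ms
 * chunk_ms    : audio added to the queue by each completed read
 */
int count_underrun_ms(int buffered_ms, const int *read_ms, size_t reads,
                      int chunk_ms)
{
    int underrun_ms = 0;
    size_t i;

    for (i = 0; i < reads; i++) {
        /* Playback keeps draining the queue while the read is in flight. */
        if (read_ms[i] > buffered_ms) {
            /* The queue ran dry before the read finished: audible gap. */
            underrun_ms += read_ms[i] - buffered_ms;
            buffered_ms = 0;
        } else {
            buffered_ms -= read_ms[i];
        }
        /* The read completed, so its audio is now available. */
        buffered_ms += chunk_ms;
    }
    return underrun_ms;
}
```

With reads that take 20, 150, and 30 ms and deliver 50 ms of audio each, starting with 100 ms queued leaves a gap during the slow read, while starting with 200 ms queued rides it out. That is the whole fix: buffer at least as much audio as the slowest read you expect.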

Memory can also be an issue. Some buffer or code needed to play the music might be paged out because you're running low on free memory, and it didn't get paged in quickly enough once the application tried to access it. This could possibly be blamed on the OS if the OS is too aggressive in trimming the working set. If an app pre-loads 60 seconds of audio, that means it won't touch the last page of the buffer for 60 seconds, which might be long enough for the OS to page out the buffer. Here, you would probably get better results by buffering only 5 seconds' worth of music. In any case, I haven't had any significant trouble with this on Vista (except once in a while when I let Firefox run too long, it eats up 1.5 GB of RAM, and my system goes into memory panic mode).

Drivers are a much more significant issue. Traditional OSes (XP, Vista, Linux) can't really schedule a driver's activity. Once the driver starts doing something it thinks is important, it can only be interrupted by a higher-priority driver. On a single-CPU system, if a driver takes over, no new audio can be buffered until the driver returns. Many drivers written for XP or earlier systems (where the audio buffer was somewhat more forgiving -- more on this later) cause trouble by doing too much processing at once. In testing (under XP), the driver's latency wasn't a problem, but on Vista, the latency requirements are much less lenient. I've seen significant Vista audio glitch issues go away after upgrading from an XP-era driver to a newly released Vista-compliant driver.

Even if the driver only does 1 millisecond of work at a time (or whatever the Vista latency recommendations are), if it has to do this 1000 times a second, it will still use up all available CPU time. Drivers have priority over all applications, so on a single-CPU system, this leaves no CPU for the audio application and mixer. On a multi-CPU system, this can still be a problem if the driver holds certain locks that are needed for audio processing. This is why Vista throttles network activity when the audio channel is open -- network packet bursts can easily use enough CPU to cause audio glitches. Probably a good idea overall, though it seems that the throttling algorithm is a bit too aggressive and has some room for improvement.
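The arithmetic behind that claim can be sketched in a few lines of C. The function names are made up for illustration, and the model assumes a single CPU on which driver work always preempts the audio application.

```c
/* Fraction of one CPU (0.0 to 1.0) consumed by a driver that performs
 * work_ms milliseconds of uninterruptible work, calls_per_sec times
 * every second. */
double driver_cpu_fraction(double work_ms, double calls_per_sec)
{
    double fraction = (work_ms * calls_per_sec) / 1000.0;
    return fraction > 1.0 ? 1.0 : fraction;
}

/* Whatever the driver does not use is all that remains for the audio
 * application and the mixer (single-CPU assumption). */
double cpu_left_for_audio(double work_ms, double calls_per_sec)
{
    return 1.0 - driver_cpu_fraction(work_ms, calls_per_sec);
}
```

At 1 ms of work 1000 times a second, driver_cpu_fraction returns 1.0 and nothing is left for audio. Each individual burst looks harmless; it is the product of burst length and burst rate that matters.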

Another issue is power management. This turned out to be the major problem on my laptop. My laptop's motherboard (CPU and chipset) goes into sleep mode whenever it detects that it is "idle". That's actually a pretty good thing because it means I can get 2 or 3 hours of use out of the battery instead of 20 minutes. This happens hundreds of times per second -- it sleeps for 2 milliseconds, wakes up to handle a keystroke, sleeps for another 2 milliseconds, wakes up to handle the calculations for an animation, sleeps a bit more, wakes to fill an audio buffer, etc. But it sometimes doesn't wake up quickly enough to buffer the next bit of audio. If it ever takes longer than 9 ms to wake up, there will be a glitch. This was a real problem when my laptop was new. Recent drivers have improved this a lot, but there's still a bit of scritch-scratch during some games or media.

As an experiment, I wrote a very simple application to prevent the motherboard from going to sleep. It starts a low-priority thread that spins in a simple busy-wait loop. A Sleep(1) loop didn't help -- it gave the motherboard a chance to go to sleep. While the busy wait makes my laptop get very hot, it also completely stops the glitching.

#include <windows.h>
#include <stdio.h>

// This probably doesn't really do anything. At such a low priority, the
// process usually terminates before the thread exits. But this is an easy
// way to avoid certain compiler warnings. Without it, some compilers warn
// that the "return 0" below is unreachable. If I remove the "return 0",
// other compilers warn that I don't return a value from DoNothingQuickly.
volatile BOOL g_stopNow = FALSE;

DWORD WINAPI DoNothingQuickly(LPVOID /* unused */)
{
    while (!g_stopNow)
    {
        // Sleep(1) didn't fix the glitching. Sleep(0) just spends all
        // the time context switching, which is probably as bad or
        // worse than a busy wait in terms of impact on the rest of
        // the system. So I'll just do a busy wait.
    }
    return 0; // Usually never reached.
}

int main()
{
    int returnCode;
    DWORD dwThreadId;
    HANDLE hThread;

    hThread = CreateThread(
        NULL,
        0,
        DoNothingQuickly,
        NULL,
        CREATE_SUSPENDED,
        &dwThreadId);

    if (hThread != NULL)
    {
        // We want to keep one CPU wide-awake and leave any other
        // CPU(s) idle.
        SetThreadAffinityMask(hThread, 1);

        // We don't want to get in the way of any useful work.
        SetThreadPriority(hThread, THREAD_BASE_PRIORITY_IDLE);

        ResumeThread(hThread);
        CloseHandle(hThread);
        printf("Caffeine: Now running. Press <Enter> to quit...");

        getchar();
        g_stopNow = TRUE; // Probably useless, but might as well...
        returnCode = 0;
    }
    else
    {
        // Save the error code first so that printf can't overwrite it.
        returnCode = GetLastError();
        printf("Caffeine: Unable to create thread. Exiting.\n");
    }

    return returnCode;
}

Drivers seem to be a big part of the problem here -- they either spend too much time working, or they take too long to wake up after going into a sleep state. Hopefully this means that audio problems will go away as the drivers improve. Computer retailers like Dell and HP will probably ensure that their new hardware meets the Vista latency requirements before putting it on the market. Unfortunately, owners of older hardware might be out of luck.

Hindsight is 20/20. I've seen how these kinds of issues come up, and I've been involved in some mistakes myself, so I don't want to sound like I'm smarter than anybody on the Windows audio team. However, there is certainly some room for improvement in the way this issue has played out. While the changes in the audio stack are technically admirable and the problems can generally be blamed on drivers, that's little comfort to those enduring static on their speakers. Things work in XP and don't work in Vista. That sounds like a regression, not an improvement. It's getting better, but there really shouldn't have been a problem in the first place.

What's wrong? Well, Vista aimed for a technically superior audio experience. Latency has been significantly reduced in Vista -- when you fire your machine gun in your favorite game, you'll hear the sound effect a bit more quickly. For gamers and audio professionals, this reduction in latency can make a big difference. For people trying to listen to their MP3s, this probably doesn't matter much. The downside to reducing latency is that it reduces the margin for error. Vista cannot tolerate any delays longer than 5 or 10 milliseconds without glitching, while XP could usually tolerate a much longer delay with no problem. Assuming all of the drivers do their part, modern hardware actually has no problem meeting the deadlines. But if anything goes wrong on Vista, you can hear it.
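The trade-off between latency and margin for error comes down to how much audio is sitting in the buffer. Here is a small sketch in C. The frame counts and sample rates in the example are illustrative assumptions, not the actual buffer sizes used by XP or Vista.

```c
/* Duration, in milliseconds, of a buffer holding `frames` sample frames
 * at the given sample rate. This is both the added latency and the
 * longest delay the system can absorb before the buffer runs dry. */
double buffer_duration_ms(unsigned frames, unsigned sample_rate_hz)
{
    return (double)frames * 1000.0 / (double)sample_rate_hz;
}

/* Returns 1 if a scheduling or wake-up delay of delay_ms would empty
 * the buffer (an audible glitch), 0 if the buffer covers it. */
int delay_causes_glitch(double delay_ms, unsigned frames,
                        unsigned sample_rate_hz)
{
    return delay_ms > buffer_duration_ms(frames, sample_rate_hz);
}
```

For example, 441 frames at 44100 Hz is a 10 ms buffer, so a 12 ms stall glitches. The same 12 ms stall against a 4410-frame (100 ms) buffer goes unnoticed, at the cost of the sound effect arriving that much later.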

What was the mistake? The core of the issue is that a change was made that was detrimental to some customers. In the long term, the change is probably a step (or two) in the right direction, but for many people, the change causes trouble and offers no immediate benefit. Obviously the severity of the problem was underestimated (contrary to popular belief, Windows developers do care about their customers and would never have done this had they known the outcome). I've never been a big fan of taking away without giving something in return.

What would have prevented the problem? I don't know how hard this would have been to implement, but I would love to have some kind of adjustment knob to control the amount of latency I want on my system. That would have sidestepped the whole issue by allowing the customer to pick their priorities, i.e. lower latency on my desktop where there aren't power issues and where I want my games to sound great, higher latency on my laptop where I want additional power savings, the drivers aren't as good (and are also optimized for power savings), and where I'm never using the audio stack for anything other than music or videos anyway. This would also have been a great mitigation for driver issues during the transition period.
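As a purely hypothetical sketch of what such a knob might boil down to, the C below maps a user-selected profile to a target latency and then to a buffer size. None of these names or millisecond values come from Windows; they are assumptions chosen only to show the shape of the idea.

```c
/* Hypothetical user-facing latency settings. */
typedef enum {
    LATENCY_LOW,       /* desktop: games and audio work */
    LATENCY_BALANCED,  /* middle ground */
    LATENCY_RELAXED    /* laptop on battery: music and video only */
} LatencyProfile;

/* Target latency for each profile, in milliseconds (made-up values). */
unsigned target_latency_ms(LatencyProfile profile)
{
    switch (profile) {
    case LATENCY_LOW:     return 10;
    case LATENCY_RELAXED: return 100;
    case LATENCY_BALANCED:
    default:              return 30;
    }
}

/* Number of sample frames the audio buffer needs in order to hold the
 * profile's target latency at the given sample rate. */
unsigned buffer_frames_for(LatencyProfile profile, unsigned sample_rate_hz)
{
    unsigned long long frames =
        (unsigned long long)target_latency_ms(profile) * sample_rate_hz
        / 1000ULL;
    return (unsigned)frames;
}
```

At 48000 Hz, the low-latency profile would ask for a 480-frame buffer and the relaxed profile for 4800 frames. A bigger buffer gives sleepy hardware and slow drivers ten times as long to catch up, which is exactly the slack a laptop needs.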

This is probably a good lesson for developers in general -- be sure to consider the transition period between the old system and the new system when designing the new system.

In the meantime, I've finally gotten my audio problems worked out. After the latest motherboard chipset upgrade, I no longer have to run caffeine.exe, and I only run into minor static when running a few specific programs. Hopefully things are improving for everybody else, too.