Volume in Windows Vista, part 1: What is "volume"?


I’ve avoided writing about this because it’s “complicated”, but people are starting to ask questions that suggest they’re confused, so here goes.  It’s going to take several posts to cover this, so please bear with me.



So what IS volume, anyway?


Simply stated, volume is a measurement of the “loudness” of a sound (to a physicist or audio engineer, the answer is MUCH more complicated than that). There are lots of ways to measure volume; one common unit is the decibel (dB), which is used, among other things, to express the sound pressure level (SPL) produced by a speaker.

In general, when discussing volume, two terms are typically used – attenuation and gain. Attenuation is a reduction in the amplitude of an audio signal; gain is an amplification of that signal. If you look at a pro audio receiver, you’ll notice that it displays its loudness as a negative number of decibels (-20dB, for example). This indicates that the receiver is attenuating the input signal by 20dB. By convention, attenuation is expressed in negative decibels and amplification in positive decibels.
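
To make the decibel figures concrete, here is a minimal Python sketch of the usual conversion between a dB value and a linear amplitude multiplier. It assumes the amplitude convention (20 times the base-10 logarithm of the ratio); the function names are invented for this illustration.

```python
import math

def db_to_gain(db):
    """Convert a level in decibels to a linear amplitude multiplier."""
    return 10.0 ** (db / 20.0)

def gain_to_db(gain):
    """Convert a linear amplitude multiplier (must be > 0) to decibels."""
    return 20.0 * math.log10(gain)

# Negative values attenuate, positive values amplify, 0 dB leaves the signal alone.
for db in (-20.0, -6.0, 0.0, 6.0):
    print(f"{db:+.0f} dB -> multiply amplitude by {db_to_gain(db):.3f}")
```

Under this convention, the receiver's -20dB setting multiplies the signal amplitude by 0.1, and -6dB multiplies it by roughly 0.5.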

Audio signals flow through an electrical path, and at different points in that path there are opportunities to either amplify or attenuate the signal. The locations at which amplification or attenuation occurs are called “gain stages”, and they can occur in either the analog or the digital domain (an amplifier is a gain stage, as is a potentiometer).

How does volume relate to digital audio?


When converting an analog signal to digital, the analog signal is sampled – the system measures the amplitude of the signal at a fairly high rate (44,100 samples per second for CD audio), then converts each measurement to a digital value (a 16-bit integer for CD, or potentially a 32-bit floating point value). In both cases, there is a reference range of legal sample values – for this example, let’s assume that values range from -1.0 to 1.0 (it makes things easier).
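
As a rough illustration of sampling, the Python sketch below generates a sine tone at the CD sample rate as floats in the -1.0 to 1.0 range, then maps each one to a 16-bit integer. The helper names and the choice to scale by 32767 are assumptions made for this example; real converters differ in how they scale and round.

```python
import math

SAMPLE_RATE = 44100  # samples per second, as for CD audio

def sample_sine(freq_hz, duration_s, amplitude=1.0):
    """Return float samples of a sine tone, each in [-amplitude, amplitude]."""
    count = int(SAMPLE_RATE * duration_s)
    return [amplitude * math.sin(2.0 * math.pi * freq_hz * n / SAMPLE_RATE)
            for n in range(count)]

def float_to_int16(sample):
    """Map a float in [-1.0, 1.0] to a 16-bit integer sample (assumed scaling)."""
    clamped = max(-1.0, min(1.0, sample))  # keep the value in the legal range
    return int(round(clamped * 32767))

samples = sample_sine(440.0, 0.01)          # 10 ms of a 440 Hz tone
pcm = [float_to_int16(s) for s in samples]  # the same tone as 16-bit integers
print(len(pcm), min(pcm), max(pcm))
```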

Consider the following waveform:



 


When the waveform is digitized, it is converted to individual samples like the ones below:



Attenuation simply reduces the amplitude of the digital samples, and amplification simply increases it.

For the reference waveform, if you attenuate the signal by 50%, you get something like this:



Note that the waveform hasn’t changed shape, it’s just smaller.

If, on the other hand, you amplify the same signal by 50%, you get:



Note that the samples that went beyond the +1 and -1 range were “clipped” – those values can’t be represented digitally, so they were cut off. Clipping is very bad: it causes significant audio distortion, because the new waveform no longer reflects the shape of the original.
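
The scaling and clipping described above can be sketched in a few lines of Python. The sample values and the `apply_gain` helper are made up for this illustration; the only assumption carried over from the post is the -1.0 to 1.0 legal range.

```python
def apply_gain(samples, gain):
    """Scale each sample by a linear gain, clipping the result to [-1.0, 1.0]."""
    return [max(-1.0, min(1.0, s * gain)) for s in samples]

# One cycle of a hypothetical waveform that peaks at 0.9.
reference = [0.0, 0.5, 0.9, 0.5, 0.0, -0.5, -0.9, -0.5]

quieter = apply_gain(reference, 0.5)  # attenuate by 50%: same shape, smaller
louder = apply_gain(reference, 1.5)   # amplify by 50%: the peaks hit the limit

print(quieter)
print(louder)
```

In `louder`, the 0.9 peaks would need to become 1.35, which is outside the legal range, so they are flattened to 1.0 and -1.0 while the smaller samples scale normally.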

In addition, if a fixed point digital signal is attenuated and later digitally amplified, the signal’s resolution is degraded. If, for example, you apply a -6dB attenuation (which roughly halves the amplitude), each sample is divided by two (32767 becomes 16383). If you later amplify the signal digitally by 6dB, you get 32766 – and thus you’ve lost some of the original signal information.
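
Here is the same integer round trip as a small Python sketch. It treats -6dB as an exact halving (it is really about -6.02dB) and assumes the division truncates toward zero; both are simplifications for illustration, and the function names are hypothetical.

```python
def attenuate_half(sample):
    """Halve a 16-bit integer sample, truncating toward zero."""
    return int(sample / 2)

def amplify_double(sample):
    """Double a 16-bit integer sample."""
    return sample * 2

original = 32767
attenuated = attenuate_half(original)  # the low-order bit is discarded here
restored = amplify_double(attenuated)  # doubling cannot bring that bit back

print(original, attenuated, restored)
```

Every odd sample value comes back off by one after the round trip, while even values survive intact.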

If, on the other hand, you’re using a floating point digital signal, you can attenuate and amplify with less worry – if you apply the same -6dB attenuation and amplification to a floating point sample, the division and multiplication cancel out (0.75 becomes 0.375, which becomes 0.75 again).
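
For comparison, here is the floating point round trip as a Python sketch. Python floats are 64-bit, so the hypothetical `to_float32` helper rounds each result through 32-bit storage to approximate a 32-bit float sample; this only illustrates the arithmetic and says nothing about how the Vista audio engine is actually implemented.

```python
import struct

def to_float32(value):
    """Round a Python float to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", value))[0]

def scale_float32(sample, factor):
    """Multiply a sample by a factor, keeping the result in 32-bit precision."""
    return to_float32(to_float32(sample) * factor)

original = 0.75
attenuated = scale_float32(original, 0.5)  # halve (roughly -6dB)
restored = scale_float32(attenuated, 2.0)  # double (roughly +6dB)

print(original, attenuated, restored)
```

Halving and doubling only adjust the exponent of a floating point number, so this particular round trip is exact. Gains that are not powers of two can still introduce tiny rounding errors.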

This is a large part of the reason the audio pipeline was converted to floating point in Vista – a floating point pipeline allows significantly more resolution and higher accuracy when manipulating the digital samples.

Btw, this loss of fidelity doesn’t happen when amplifying analog signals. That’s why it’s important that any amplification be done using analog signals, not digital signals – digital amplification is always lossy.

For audio hardware in Windows, the audio driver specification requires that, for all hardware volume controls on the system, 0dB represent a full fidelity pass-through of audio samples – audio hardware can support either amplification or attenuation (or neither), but 0dB always represents “don’t change the samples”.

Please note that some audio hardware on the market does NOT follow this recommendation. We’ve seen audio devices that support a volume range of +0dB to +96dB. We’ve also seen devices that support volume ranges of +10dB to +60dB (mostly these are microphones).

 


Ok, so much for the basics on “volume”. Tomorrow I’ll start discussing how volume works in the audio engine on Vista.

Comments (23)

  1. Casey says:

    Very interesting! Yes, more on audio please:)

  2. Chris says:

    Yes, I would like to hear much more about this subject.

    Cheers!

  3. Erwin Alva says:

    Cool plots.  Not simplistic, not too complicated.  I’m betting it’s not Excel. :-)

  4. LarryOsterman says:

    The graphs were built by a contractor working for one of the DSP architects, they were done using Octave (http://www.gnu.org/software/octave/).

  5. Andrew says:

    Isn’t -3dB equivalent to 50% volume reduction:-

    10 * log 0.5 = -3.01

    You use -6dB in your examples. Am I missing something?

  6. LarryOsterman says:

    3dB is a 50% POWER reduction, not a 50% volume reduction.  6dB is a 50% volume reduction.

  7. foxyshadis says:

    Recorded level difference is 20 log x, total sound pressure is 10 log x. That’s because total sound pressure is measured from the positive to the negative peak, whereas recording level is measured from the 0 to peak.

    Question, why choose float instead of 32-bit audio? Either would be as good as the other, so the slightly more common was chosen? Or is it 64-bit float?

  8. TimLovell-Smith says:

    Interesting. :)

    Is that 0DB meaning +0DB amplification? i.e. it wouldn’t correspond to muting the sound, which is what first jumped into my head?

  9. LarryOsterman says:

    0dB means no attenuation and no amplification, or 100% volume.

    Practically, -96dB == silence.

  10. Sarath says:

    Really cool! I went back to my old days in polytechnic :)

    A picture’s worth a thousand words. The graphs helped for better understanding. Keep the drift on with full tank "Audio Diesel".

  11. Phaeron says:

    To expand on foxyshadis’ point, the problems attributed to fixed-point representations are not really problems with fixed point so much as simply insufficient range, precision, and care in rounding intermediate results. Use 22.10 fixed point with proper rounding, and your 16-bit audio will come through fine. On the other hand, introduce a temporary DC bias of 2^20 in your floating-point mixing, and you’ll start running into precision problems again, because floating point trades off precision bits in the mantissa for range bits in the exponent, and that trade-off isn’t always in your favor.

    Now, this isn’t to say that using hardware floating point is a bad idea, because you get graceful degradation around range/precision problems as well as proper rounding for a lot faster than you could do otherwise in fixed point. On DSP hardware with extra-wide accumulators and free rounding, though, you’d find fixed point arithmetic very competitive.

    BTW, loss of fidelity doesn’t happen when amplifying analog signals? What about noise?

  12. Hulk says:

    >> why choose float instead of 32-bit audio

    It all depends on your standard. x-bit integers in audio and graphics are usually used as a fixed point standard ranging from 0 to 1 (or -1 to 1, depending on the context). You have a uniform distribution of values in the range.

    In floating point, instead, you have separated mantissa and exponent (and sign, but that is not exactly a factor). This allows for a uniform distribution (using somewhat fewer bits, of course) in the desired range (-1 to 1). But it also allows the numbers to grow unpredictably (and exponentially) beyond the expected range and come back (and vice versa).

    If you want to "see" it in effect, take any last generation game and see the difference in colors when float point surfaces are enabled and when not. Even with both at 16bit (integer or floating point) the floating point result is much better because it handles with almost no loss all the lighting calculations (which are mostly muls and divs).

  13. RyanBemrose says:

    @foxyshadis: Question, why choose float instead of 32-bit audio? Either would be as good as the other, so the slightly more common was chosen? Or is it 64-bit float?

    32-bit is just a bit overkill (http://blogs.msdn.com/audiofool/archive/2006/08/23/715608.aspx).

    For any reasonable audio system, you really don’t need more than about 20 bits.  The IEEE float32 format offers 24 bit audio (23 mantissa bits + 1 sign bit) with an added 8-bit sliding gain (exponent) that is mainly unused.

  14. Yesterday , I talked about volume in general, today I want to drill into volume more detail. In Vista,

  15. Daniel Garlans says:

    What kind of ramifications does having a floating point audio path have when using a Pentium 4?

    I know from various DSP forums (link below) that the Pentium 4’s, older ones at least, experience extremely significant slowdowns when it comes to denormalized floats, ie, really really small ones, to the point where the solution is generally to either cut to zero below a certain point, or add a constant noise or dc bias to the signal to prevent it reaching the denormalized levels.

    So, does Vista use SSE/SSE2 to avoid this? Do you add noise/bias/offset? Do you hope it never happens? Or is it classified :D

    Here’s a relevant link, which references the popular MusicDSP group, Intel, and several others:

    http://phonophunk.com/articles/pentium4-denormalization.php?pg=3

  16. Igor says:

    Larry, please check out this discussion and tell me what you think:

    http://svconline.com/mag/avinstall_dsp_debate/

    IMO, either double precision floating point or fixed point is required for internal audio paths if you want good quality sound. Single precision (32-bit) floating point just doesn’t cut it although it is easiest to work with.

  17. LarryOsterman says:

    Igor/Daniel, we’re talking about this issue right now with the various FP and DSP experts in-house.

  18. Daniel Garlans says:

    Nice, I’m excited to hear what you guys come up with :)

    I was actually thinking about it more; the specific instances which might cause these kind of denormalized numbers are probably fairly rare in normal day-to-day audio life; it’s only when you start dealing with IIR filters that they can start happening. BUT- that opens a possible DOS vulnerability too; an application could open lots of streams, throw an impulse through the IIR, and wait for the numbers to get really, really small…

    Then, not only would the attacker hit the slowdown in its own math, the entire Vista audio path would start having to process numbers that small too, at least until the first gain calculation that might increase/decrease the numbers enough to escape the problem.

    I’m sure there’s some huge holes in that idea (like, the interface between the application and the audio system maybe using integers rather than floats, and all the conversion happening internally), but still… I can’t be THAT far off, can I? :D

  19. Igor says:

    Just to clear something up:

    It is not denormals and over/underflows which are costly — it is floating point exceptions.

    From my personal experience in code optimization, they are affecting Intel CPUs more than AMD ones. Normally, FP exceptions are disabled but even though they are disabled, they are a penalty and a considerable one.

    In some cases I have measured execution times on Pentium 4 that were more than double the time it takes on a comparable Athlon 64 to perform the same task and I first thought it was because of Athlon’s faster FPU.

    Then I wrote an FP exception handler and realized that I was getting exceptions. Of course, the Athlon 64 had them too, but their impact was vastly smaller than on the Pentium 4 CPU. After I fixed the code so that the calculations do not generate exceptions, the time difference between the two dropped considerably.

    As for the audio, personally I would prefer 48-bit (or even 64-bit?) fixed point although it would be harder to implement.

    That is because CPUs of today (Core 2 Duo) and tomorrow (Penryn) are exceptionally good at working with SIMD integers and the cost of that implementation performance-wise would be pretty small. Naturally you will dislike it because you would need at least two code paths to develop and maintain, one for the older machines and one for the modern ones.

    Let me know if I could be of any help.

  20. Igor says:

    I really hope my lengthy comment isn’t lost.

  21. Donnie Hale says:

    Larry,

    I appreciate the posts. Here’s a question I’ve had for a long time about the digital representation of sound. I’ve looked at some textbook treatises but not gotten through them (too much of a thought / time commitment).

    I often see graphs like your first one – a digital waveform of amplitude plotted against time. My question is, what is this the amplitude of? In my mind, there might be sound at 20 different frequencies at any point in time (different notes on a piano, a drum cymbal, vocals, etc.). The information for each of those frequencies has to be captured at that point in time, and I would think several characteristics of each of those frequencies would have to be captured, including amplitude.

    If it’s possible, how is this information represented in a digital sense? And how does that relate to the amplitude vs. time graph?

    Thanks for bearing with me,

    Donnie

  22. Our friend in the multimedia group and prolific blogger Larry Osterman is writing a series of articles

  23. Nils Arthur asked in another post: While we are talking volume controls. Could you explain why it’s only