What's in an audio volume?

I've been talking about audio controls - volumes and mutes and the like, but one of the more confusing things I've run into here at work is the concept of "volume".

First off, what IS volume?

Well, roughly speaking (and I know the audiophiles out there will get on my case about this), there are actually several concepts when you talk about "volume".  The first (and most common) is that volume is a representation of "loudness".  But it turns out that in practice, volume is a representation of "intensity".

The difference between "loudness" and "intensity" is that "loudness" is perceptual - how do you perceive a sound.  But "intensity" is actually what's measured - as SPL (Sound Pressure Level), which is a representation of energy in the sound space.

Typically volumes are measured in decibels - the decibel is a logarithmic unit (each 10dB increase is a 10x increase in sound power; a 10x increase in sound pressure works out to 20dB).  20dB is about the volume of a whisper, 140dB is that of a jet airplane taking off next door. 
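If the logarithm business is fuzzy, here's a minimal sketch of the arithmetic in Python.  The function names are mine (they don't come from any audio API); the only thing to take away is the 10*log10 rule for power ratios and the 20*log10 rule for pressure (or sample amplitude) ratios.

```python
import math

def power_ratio_to_db(ratio):
    """Decibels for a ratio of two powers (or intensities)."""
    return 10.0 * math.log10(ratio)

def pressure_ratio_to_db(ratio):
    """Decibels for a ratio of two pressures (or sample amplitudes).

    Power goes as the square of pressure, hence 20 instead of 10.
    """
    return 20.0 * math.log10(ratio)

# A 10x jump in power is 10 dB; a 10x jump in pressure is 20 dB.
print(power_ratio_to_db(10.0))
print(pressure_ratio_to_db(10.0))
```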

Now when you deal with volumes in pro audio equipment, volume is measured by two factors - attenuation and amplification.  0 means that the sound is playing at its native level, negative numbers are reductions in volume from that native level, and positive numbers indicate amplification. 

For most computer hardware, volume is measured as attenuations - negative numbers running from 0 (max volume) to -infinity (0 volume).  In practice, the number runs from 0 to -96dB.  Typically computers don't ever amplify signals, just attenuate them.  If you think about how digital audio works this makes sense.  Since an audio sample at full volume is at 0dB, it's easy to attenuate the samples (just scale them down appropriately).  On the other hand, it's not easy to amplify them - they're already AT 100% - any amplification would have to come AFTER the DAC.  So digital volumes ultimately measure attenuation.
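To make "just scale them down" concrete, here's a rough sketch of attenuating a buffer of signed 16-bit samples by some number of decibels.  The helper names are hypothetical, and a real mixer would worry about dithering and rounding policy, which I'm ignoring here.

```python
def db_to_gain(db):
    """Convert a decibel value to a linear amplitude multiplier."""
    return 10.0 ** (db / 20.0)

def attenuate(samples, db):
    """Scale integer PCM samples down by `db` decibels (db must be <= 0)."""
    if db > 0:
        # Amplifying full-scale samples would clip, so this sketch refuses.
        raise ValueError("this sketch only attenuates; db must be <= 0")
    gain = db_to_gain(db)
    return [int(round(s * gain)) for s in samples]

# 0 dB leaves the samples alone; -20 dB divides the amplitude by 10.
print(attenuate([32767, -32768, 0], 0.0))
print(attenuate([10000, -10000], -20.0))
```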

But audio volumes AREN'T in decibels (because that would be easy).  Instead, the audio volume is represented in a number of different sets of units, depending on your API.

And that's where it gets really, really ugly.  There are at least five different sets of APIs in the system that measure audio volume, and they use totally different units.

For example, the wave APIs (waveOutSetVolume, waveOutGetVolume) represent volume as a number between 0x0000 and 0xffff, where 0 represents silence and 0xffff represents full volume.  The wave APIs assume that all audio outputs are stereo, and they pack the left and right channels into a single DWORD.  Of course if your audio system has more than two channels, that's a problem, but the reality is that almost nobody ever wants to adjust the balance as a normal activity (it's typically done once during system setup and then ignored).
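The packing is simple enough to sketch.  The left channel lives in the low-order word of the DWORD and the right channel in the high-order word; the function names below are made up for illustration and this is plain Python arithmetic, not a call into the wave APIs.

```python
def pack_wave_volume(left, right):
    """Pack two 16-bit channel volumes into one 32-bit value.

    Low word = left channel, high word = right channel.
    """
    if not (0 <= left <= 0xFFFF and 0 <= right <= 0xFFFF):
        raise ValueError("channel volumes must be in 0x0000..0xFFFF")
    return (right << 16) | left

def unpack_wave_volume(dword):
    """Split a packed 32-bit volume back into (left, right)."""
    return dword & 0xFFFF, (dword >> 16) & 0xFFFF

# Left at full volume, right at half scale.
packed = pack_wave_volume(0xFFFF, 0x8000)
print(hex(packed))
print(unpack_wave_volume(packed))
```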

The mixer APIs on the other hand set their volumes with the mixerSetControlDetails API.  That API takes an integer between a low bound and a high bound, determined from the dwMinimum and dwMaximum fields of the relevant MIXERCONTROL.  The MIXERCONTROL structure also defines the number of steps between the low and the high value.  For most audio adapters, this is a number between 0 and 0xffff, with 0xffff steps, but this is not guaranteed - I've seen audio adapters with discrete volumes - 256 steps, for example.
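Here's one way you might map a 0.0 to 1.0 slider position onto a mixer control's range.  This is a sketch under a big assumption: I'm treating the step count as dividing the range between the low and high bounds into equal-sized steps, which you should check against your actual hardware.  The function and parameter names are hypothetical; in real code the three numbers would come from the MIXERCONTROL structure.

```python
def fraction_to_mixer_value(fraction, minimum, maximum, steps):
    """Map a 0.0..1.0 position to a control value snapped to a step.

    Assumes (not guaranteed by any API) that `steps` equal-sized steps
    span the range from `minimum` to `maximum`.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in 0.0..1.0")
    if steps < 1 or maximum <= minimum:
        raise ValueError("need steps >= 1 and maximum > minimum")
    step_index = round(fraction * steps)
    # Integer math keeps the result inside [minimum, maximum].
    return minimum + (step_index * (maximum - minimum)) // steps

# A typical adapter (0xffff steps) and one with only 256 discrete steps.
print(fraction_to_mixer_value(1.0, 0, 0xFFFF, 0xFFFF))
print(fraction_to_mixer_value(0.5, 0, 0xFFFF, 256))
```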

And then there's DirectSound.  DirectSound sets volume on individual DSound buffers - you set the volume with the IDirectSoundBuffer8::SetVolume API.  The DSound set volume API takes the volume as a LONG measured in hundredths of a dB, ranging from 0 (no attenuation) to -10,000 (-100dB).
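Converting between DSound-style hundredths of a decibel and a plain linear gain looks roughly like this.  The two constants mirror the range described above; the function names are my own, and this is just the arithmetic, not a wrapper around DirectSound.

```python
import math

DSBVOLUME_MAX = 0        # no attenuation
DSBVOLUME_MIN = -10000   # -100 dB, effectively silence

def dsound_volume_to_gain(volume):
    """Hundredths of a dB (0..-10000) to a linear amplitude multiplier."""
    if not DSBVOLUME_MIN <= volume <= DSBVOLUME_MAX:
        raise ValueError("volume must be in -10000..0")
    return 10.0 ** (volume / 100.0 / 20.0)

def gain_to_dsound_volume(gain):
    """Linear amplitude multiplier (0.0..1.0) to hundredths of a dB."""
    if gain <= 0.0:
        return DSBVOLUME_MIN
    db = 20.0 * math.log10(min(gain, 1.0))  # values above 1.0 are clamped
    return max(DSBVOLUME_MIN, int(round(db * 100.0)))

# Half amplitude is about -6 dB, i.e. roughly -602 in DSound units.
print(gain_to_dsound_volume(0.5))
print(dsound_volume_to_gain(-2000))
```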

Oh, and I can't forget the audio CD playback volume.  The IOCTL_CDROM_GET_VOLUME IOCTL and its IOCTL_CDROM_SET_VOLUME counterpart (which are used to read and control the volume of CD playback when you're playing an audio CD over the analog connector to your sound card) specify volumes in numbers between 0 and 255.

And of course, the audio device driver that's actually used to render all these different volume levels takes a fifth type of volume.  The KSPROPERTY_AUDIO_VOLUMELEVEL property takes a number from -2147483648 to +2147483647, where -2147483648 is silence (-32768 dB), 0 is full volume (no attenuation), and 2147483647 is just under +32768 decibels (gain).  The units for the sysaudio volume are 1/65536th of a decibel, which is nice since the high 16 bits represent the decibel value and the low 16 bits represent the fractional portion of the volume (typically 0).
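The 1/65536th dB units are just 16.16 fixed point, so the conversion is a multiply or a divide.  A quick sketch, with function names that are mine and not part of any kernel streaming header:

```python
KS_MIN = -2147483648  # -32768 dB
KS_MAX = 2147483647   # just under +32768 dB

def db_to_ks_volume(db):
    """Decibels to 1/65536 dB units, clamped to a signed 32-bit range."""
    value = int(round(db * 65536))
    return max(KS_MIN, min(KS_MAX, value))

def ks_volume_to_db(value):
    """1/65536 dB units back to decibels."""
    return value / 65536.0

def split_ks_volume(value):
    """Return (high 16 bits as signed whole dB, low 16 bits as fraction).

    For negative values the high part is the floor, so -1.5 dB splits
    into (-2, 0x8000), i.e. -2 + 0.5.
    """
    return value >> 16, value & 0xFFFF

print(db_to_ks_volume(-6.0))
print(ks_volume_to_db(KS_MIN))
print(split_ks_volume(db_to_ks_volume(-1.5)))
```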

Sigh.