Basic audio volume theory

Last time we talked about the different Windows Audio Session APIs for setting volume. Let's talk a little about what volume means.

For purposes of illustration, let's take our signal to be a full-scale square wave:

Incidentally, the answer to the exercise

completely characterize the set of digital signals that achieve 0 dB FS intensity. If you have, say, 1000 samples of mono 16-bit
integer audio to play with, how many distinct signals achieve 0 dB FS intensity?

is "all signals with N/2 samples having value 1 and N/2 samples having value -1". There are

C(N, N/2) = N! / ((N/2)!)^2

such signals. For N = 1000 this comes out to about 2.7e299.
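As a quick sanity check on that count, here is a small sketch that counts the ways to pick which N/2 of the N samples get the value 1 (the rest get -1). The function name is just for illustration, and it assumes N is even.

```python
import math


def count_full_scale_signals(n: int) -> int:
    """Count length-n signals with n/2 samples at +1 and n/2 samples at -1.

    Assumes n is even; this is the binomial coefficient "n choose n/2".
    """
    if n < 0 or n % 2 != 0:
        raise ValueError("n must be a non-negative even number")
    return math.comb(n, n // 2)


count = count_full_scale_signals(1000)
# The exact integer has 300 digits; print it in scientific notation instead.
print(f"{count:.1e}")
```

The result lands at roughly 2.7e299, in line with the figure quoted above.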


Linear in amplitude

If we multiply the signal by a number between 0 and 1, we reduce the volume. That's the first natural volume scale - the "linear in
amplitude" scale. 0 is silence, 1 is an unchanged signal, and 1/2 is a signal with all the amplitudes divided by 2.
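To make that concrete, here is a minimal sketch of a linear-in-amplitude volume control, assuming the signal is a plain list of floats normalized to the range -1 to 1. The name scale_amplitude is made up for this example.

```python
def scale_amplitude(samples, volume):
    """Apply a "linear in amplitude" volume: multiply every sample by volume.

    volume is assumed to lie in [0, 1]; 0 gives silence, 1 leaves the
    signal unchanged.
    """
    if not 0.0 <= volume <= 1.0:
        raise ValueError("volume must be between 0 and 1")
    return [s * volume for s in samples]


# Two periods of a full-scale square wave.
square = [1.0, 1.0, -1.0, -1.0] * 2

half = scale_amplitude(square, 0.5)
print(half)  # every sample is now +0.5 or -0.5
```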

Linear in power

Recalling the formula for the intensity of a signal, we realize that the power (intensity squared) of a signal is proportional to the square of its amplitude. This leads us to the second natural volume scale - the "linear in power" scale. 0 is still silence, and 1 is still an unchanged signal, but now 1/2 maps to a signal with half the power of the original signal. In particular, a full-scale square wave with power 1 would drop to a square wave with power 1/2 - this has amplitude sqrt(1/2), or about 0.707, which is significantly more than 1/2.
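Here is a sketch of the same idea in code, again assuming a list of floats normalized to the range -1 to 1. The helper names mean_power and scale_power are illustrative only. Since power goes as the square of the amplitude, a power factor p turns into an amplitude gain of sqrt(p).

```python
import math


def mean_power(samples):
    """Mean of the squared samples (1.0 for a full-scale square wave)."""
    return sum(s * s for s in samples) / len(samples)


def scale_power(samples, volume):
    """Apply a "linear in power" volume: the output has volume times the power.

    volume is assumed to lie in [0, 1]. Each sample is multiplied by
    sqrt(volume), because power is proportional to amplitude squared.
    """
    if not 0.0 <= volume <= 1.0:
        raise ValueError("volume must be between 0 and 1")
    gain = math.sqrt(volume)
    return [s * gain for s in samples]


square = [1.0, 1.0, -1.0, -1.0] * 2
half_power = scale_power(square, 0.5)

print(mean_power(half_power))  # close to 0.5
print(max(half_power))         # close to sqrt(1/2), about 0.707
```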

Linear in dB

So far so good. However, sound is perceived in a relative fashion. Humans are not very good at determining absolute volume, but are very
good at determining relative volume. If I play a sound and ask you "what is the dB SPL of the sound I just played", you would have trouble
answering; but if I played two sounds and asked "which sound was louder", you would be able to tell me. You would even be able to give me an
estimate such as "it's about twice as loud".

So a natural scale for volume is based on relative power - that is, a logarithmic scale dB = 10 * log10( PA / P0 ) where PA is the power of the attenuated signal and P0 is the power of the original. That looks something like this:

Note that the relative power scale has no bottom! As PA approaches 0, the dB value heads off to negative infinity. It's turtles all the way down.
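The dB formula is easy to play with in code. This is a rough sketch with made-up function names; treating a power ratio of exactly 0 as negative infinity is a choice made here for illustration.

```python
import math


def power_ratio_to_db(ratio):
    """Convert a power ratio PA / P0 to decibels: 10 * log10(ratio)."""
    if ratio < 0:
        raise ValueError("a power ratio cannot be negative")
    if ratio == 0:
        return -math.inf  # silence: the scale has no bottom
    return 10 * math.log10(ratio)


def db_to_amplitude_gain(db):
    """Convert a dB change to the amplitude multiplier that produces it.

    Power goes as amplitude squared, so the divisor is 20 rather than 10.
    """
    return 10 ** (db / 20)


# Each halving of power costs the same number of dB (about 3.01).
for ratio in (1.0, 0.5, 0.25, 0.125, 0.0):
    print(ratio, power_ratio_to_db(ratio))
```

Equal steps in dB correspond to equal power ratios, which is the property the slider question below is about.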

So how do you map an infinite volume scale onto a finite slider, while preserving the nice property that equal distances map to equal power ratios?
