The evolution of a data structure - the WAVEFORMAT.

In the beginning, there was a need to be able to describe the format contained in a stream of audio data.

And thus the WAVEFORMAT structure was born in Windows 3.1.

typedef struct WAVEFORMAT {        WORD    wFormatTag;        WORD    nChannels;        DWORD   nSamplesPerSec;        DWORD   nAvgBytesPerSec;        WORD    nBlockAlign;} WAVEFORMAT;

The problem with the WAVEFORMAT is that it was ok at expressing audio streams that contained samples whose size was a power of 2, but there was no way of representing audio streams that contained samples whose size was something other than that (like 24bit samples).

So the PCMWAVEFORMAT was born. 

typedef struct PCMWAVEFORMAT {
        WAVEFORMAT      wf;
        WORD            wBitsPerSample;
} PCMWAVEFORMAT;

If the application passed in a WAVEFORMAT with a wFormatTag of WAVE_FORMAT_PCM, it was required to actually pass in a PCMWAVEFORMAT so that the audio infrastructure could determine the number of bits per sample.

That worked fine and solved that problem, but the powers that be quickly realized that relying on the format tag for extensibility was going to be a problem in the future. 

So once again, the structure was extended, and for Windows NT 3.5 and Windows 95, we got the WAVEFORMATEX that we know and love:

typedef struct tWAVEFORMATEX
{
    WORD        wFormatTag;         /* format type */
    WORD        nChannels;          /* number of channels (i.e. mono, stereo...) */
    DWORD       nSamplesPerSec;     /* sample rate */
    DWORD       nAvgBytesPerSec;    /* for buffer estimation */
    WORD        nBlockAlign;        /* block size of data */
    WORD        wBitsPerSample;     /* number of bits per sample of mono data */
    WORD        cbSize;             /* the count in bytes of the size of */
                                    /* extra information (after cbSize) */
} WAVEFORMATEX, *PWAVEFORMATEX, NEAR *NPWAVEFORMATEX, FAR *LPWAVEFORMATEX;

This solved the problem somewhat.  But there was a problem -  while all the APIs were changed to express a WAVEFORMATEX, there were still applications that passed in a WAVEFORMAT to the API (and there were WAV files that had been authored with WAVEFORMAT structures).  The root of the issue is that there was no way of distinguishing between a WAVEFORMAT (which didn't have a cbSize field) and a WAVEFORMATEX (which did).  To resolve this, for WAVEFORMAT structures kept in files, the file metadata provided the size of the structure, so we could use the size of the structure to distinguish the various forms.

When the structure was passed in as a parameter to a function, there was still a problem.  For that, the code that parses WAVEFORMATEX structure must rely on the fact that if the wFormatTag field in the WAVEFORAMAT structure was WAVE_FORMAT_PCM, then the WAVEFORMAT structure is actually a PCMWAVEFORMAT, which is the same as a WAVEFORMATEX with a cbSize field set to 0.  For all other formats, the code simply assumes that the caller is passing in a WAVEFORMATEX structure.

 

Unfortunately, the introduction of the WAVEFORMATEX wasn't quite enough.  When you're dealing with two channel audio streams, it's easy to simply say that channel 0 is left and channel 1 is right (or whatever).  But when you're dealing with a multichannel audio stream, it's not possible to determine which channel goes with which speaker.  In addition, with a WAVEFORMATEX, there's still a problem with non power-of-2 formats.  This time, the problem happens when you take a 24bit waveformat and try to pack it into 32bit samples - doing this can dramatically speed up any manipulation that needs to be done on the samples, so it's highly desirable.

So one final enhancement was made to the WAVEFORMAT structure, the WAVEFORMATEXTENSIBLE (introduced in Windows 2000):

typedef struct {
    WAVEFORMATEX    Format;
    union {
        WORD wValidBitsPerSample;       /* bits of precision  */
        WORD wSamplesPerBlock;          /* valid if wBitsPerSample==0 */
        WORD wReserved;                 /* If neither applies, set to zero. */
    } Samples;
    DWORD           dwChannelMask;      /* which channels are */
                                        /* present in stream  */
    GUID            SubFormat;
} WAVEFORMATEXTENSIBLE, *PWAVEFORMATEXTENSIBLE;

In the WAVEFORMATEXTENSIBLE, we have the old WAVEFORMATEX, and adds a couple of fields that allow the caller to specify packing of the samples, and to allow the caller to describe which channels in the stream should be redirected to which speaker.  For example, if the dwChannelMask is SPEAKER_FRONT_LEFT | SPEAKER_FRONT_RIGHT | SPEAKER_LOW_FREQUENCY | SPEAKER_TOP_FRONT_LEFT, then channel 0 is the front left channel, channel 1 is the front right channel, channel 2 is the subwoofer, and channel 3 is the top front left speaker.  The way you identify a WAVEFORMATEXTENSIBLE is that the Format.wFormatTag field is set to WAVE_FORMAT_EXTENSIBLE and the Format.cbSize field is always set to 0x16.

 That's where things live for now - who knows if there will be another revision in the future.