This is the first installment of a series of blogs on creating Windows Media Photo files. This series will cover the tools that are currently available, an explanation of the various encoding options, and recommendations and guidelines on how to achieve the best results for a wide variety of different scenarios.
Part 1 starts out with an overview of the tools currently available. The bulk of this blog provides a detailed discussion of the encoder provided as part of the Device Porting Kit (DPK), including how to download it if you don’t have the DPK and a detailed description of the numerous encoder parameters.
In upcoming installments, we’ll discuss an encoder utility based on the Windows Imaging Components (WIC) interfaces that we developed for our own internal testing. We’ll also cover metadata, color profiles, more on alpha channels, and encoding best practices for different scenarios including RAW acquisition, JPEG transcoding, editing, web or email delivery, mobile devices, printing, archiving, and a few more I’m sure we’ll think up along the way. As always, if you have recommendations or specific suggestions for topics you’d like to see covered, send an email or post a comment.
Now, on with the discussion of Windows Media Photo encoding...
When you want to create a JPEG, TIFF or BMP file, either it comes from your camera or scanner in that format, or you load it into your image editing application of choice and save it in that format. At present, there aren’t any commercial applications available that save images as Windows Media Photo files. In the interim, we’ve developed two different conversion utilities as part of creating Windows Media Photo, and we’re actively working on new tools that will be available in the near future.
Our current encoder tools include a choice between the sample applications that are part of the Windows Media Photo Device Porting Kit (DPK) or an encoder utility based on the Windows Imaging Components (WIC) interfaces that we developed internally for our own testing purposes. Both tools are command line utilities, and each has its own specific strengths. For the remainder of this discussion, we’ll refer to these two different encoders (and their associated decoder) as the DPK Tools and the WIC Tools.
The DPK Tools are very limited in the uncompressed formats they can convert to and from, and have no ability to convert among pixel formats when encoding or decoding. However, they do provide a few special tricks, discussed below. Also, because the DPK Tools are built from the DPK sample application source code, they can be easily customized for any special-purpose requirement. The WIC Tools rely on the installed WIC codecs for source and destination image file formats, and can take advantage of all the facilities provided by WIC, including pixel format conversion. That restricts the WIC Tools to the file formats for which a WIC codec is installed, but the list will grow as new codecs are added, including the ability to encode directly from RAW file formats.
The Windows Media Photo Device Porting Kit (DPK) includes sample application source code for a Windows Media Photo encoder application (WMPEncApp.exe) and a Windows Media Photo decoder application (WMPDecApp.exe). These are command line utilities that convert between Windows Media Photo and an equivalent uncompressed image format.
Any licensee of the DPK can build these tools from the source code included in the DPK. If you don’t have a copy of the DPK, I’ve provided the compiled DPK 1.0 RC Windows compatible version of these utilities here. These should run on just about any version of Windows (though we’ve never tested on anything prior to Windows XP.) Using the DPK, you can also make versions of these utilities for other platforms and operating systems.
The DPK Tools are based on the DPK reference source code. They are completely stand-alone and make no calls to external libraries other than basic operating system support. The DPK Tools contain no platform-specific optimization.
This command line utility converts certain uncompressed file formats into equivalent Windows Media Photo files. It provides a complete set of command line options to control all supported Windows Media Photo Encoder options. Here is a summary of the usage of WMPEncApp and the various command line options. All of these options will be discussed in detail in later sections.
-i input.bmp/tif/hdr          Input image file name
                                bmp: <=8bpc, BGR
                                tif: >=8bpc, RGB
                                hdr: 24bppRGBE only
-o output.wdp                 Output Windows Media Photo file name
-q quality [1 - 255]          Default = 1, lossless
-c format                     Required to define uncompressed
                                source pixel format
-d chroma sub-sampling        0: Y-only
                              1: YCoCg 4:2:0
                              2: YCoCg 4:2:2
                              3: YCoCg 4:4:4 (default)
-l overlapping                0: No overlapping
                              1: One level overlapping (default)
                              2: Two level overlapping
-f                            Frequency order bit stream
                                (default is spatial)
-t                            Display timing information
-v                            Display verbose encoder information
-V tile_wd0 tile_wd1 ...      Macro block columns per tile
-H tile_ht0 tile_ht1 ...      Macro block rows per tile
-U num_h_tiles num_v_tiles    Horizontal & vertical tile count for
                                uniform tiling
-b Black/White                Applies to 1bpp black/white images
                              0: 0 = black (default)
                              1: 0 = white
-a alpha channel format       Required for any pixel format
                                with an alpha channel
                              2: Planar alpha
                              3: Interleaved alpha
                              Other: Reserved, do not use
-F trimmed flexbits [0 - 15]  0: no trimming (default)
                              15: trim all
-s skip subbands              0: All subbands included (default)
                              1: Skip flexbits
                              2: Skip highpass
                              3: Skip highpass & lowpass (DC only)
So, for example, to create a Windows Media Photo file from a typical 24-bit .bmp using reasonably high quality lossy compression, the command line would be:
wmpencapp -i input.bmp -o output.wdp -q 10
This scenario uses the default settings for most of the encoder options. Obviously, we’d like to take full control of these options to choose exactly how the Windows Media Photo file is created. The following sections describe these options in detail.
NOTE: The -F and -s command line options control compressed domain transformation features that go beyond the scope of this documentation. We’ll save a discussion of compressed domain operations for another day.
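If you end up scripting a lot of conversions, it can help to assemble the command line programmatically. Here is a minimal Python sketch that builds the argument list for the example above. The function name build_wmpencapp_args and its parameter names are my own invention for illustration; only the flags themselves (-i, -o, -q, -d, -l, -f) and their value ranges come from the usage summary.

```python
def build_wmpencapp_args(input_path, output_path, quality=None,
                         chroma=None, overlap=None, frequency_order=False):
    """Assemble a WMPEncApp argument list (hypothetical helper).

    Only flags listed in the usage summary are emitted; anything left
    as None is omitted so the encoder falls back to its own default.
    """
    args = ["wmpencapp", "-i", input_path, "-o", output_path]
    if quality is not None:
        if not 1 <= quality <= 255:
            raise ValueError("quality must be in the range 1-255")
        args += ["-q", str(quality)]
    if chroma is not None:
        if chroma not in (0, 1, 2, 3):
            raise ValueError("chroma sub-sampling must be 0, 1, 2 or 3")
        args += ["-d", str(chroma)]
    if overlap is not None:
        if overlap not in (0, 1, 2):
            raise ValueError("overlap must be 0, 1 or 2")
        args += ["-l", str(overlap)]
    if frequency_order:
        args.append("-f")
    return args


# Reproduces the example command line shown above.
example = build_wmpencapp_args("input.bmp", "output.wdp", quality=10)
print(" ".join(example))
```

The resulting list could be handed to something like Python's subprocess.run, assuming the executable is on your path.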
Pixel Format: -c Option & the Uncompressed Source File Format
The DPK Tools are certainly not general purpose file format conversion utilities. They provide the absolute minimum support for uncompressed source and destination file formats; the specific file formats supported are TIFF, BMP and HDR. Only certain variations of these formats are supported and they are, for the most part, tied to the pixel format being converted to or from. The DPK Tools do not perform any pixel format conversion. Therefore the source uncompressed image must be in the desired pixel format for the encoded Windows Media Photo file. Only minimal data validation is performed; if the source pixel format is incorrect, you will most likely create a bad Windows Media Photo file.
Here are the uncompressed source file formats supported by the DPK Tools, and the specifics for each pixel format supported. Each of the file formats and specific image formats listed below corresponds to a mode that can be created using Adobe Photoshop CS2.
TIFF The image should be flattened; it should not contain any layers. If it does contain layers, the “Discard Layers and Save a Copy” option should be selected under Layer Compression. Image Compression must be set to “None”; any compressed TIFF format will cause an error or convert to a bad Windows Media Photo file. TIFF images must always be stored in “Interleaved” pixel order; “Per Channel” pixel order is not supported. Byte order should be set to “IBM PC.” “Save Image Pyramid” should be unchecked.
The specific pixel format created depends on the correct combination of Image Mode in Photoshop, the specific TIFF save options specified, and the encoder pixel format option specified for WMPEncApp.exe. The following table lists all the possible combinations.
RGB/16 w/ alpha
16 bit (Half)
RGB/32 w/ alpha
16 bit (Half)
RGB/32 w/ alpha
32 bit (Float)
Fill alpha black
RGB/32 w/ alpha
32 bit (Float)
CMYK/8 w/ alpha
CMYK/16 w/ alpha
16 bit (Half)
32 bit (Float)
For 1bppBlackWhite encoding, the -b option allows you to specify how to interpret black vs. white values. When encoding from TIFF files, the default will generate the correct results and this option should not be required.
BMP This file format is only used for 8 bit per channel (bpc) or smaller bit depths. It differs from the equivalent TIFF format because RGB data is stored in the uncompressed bit stream in BGR rather than RGB channel order. When creating the uncompressed image in Photoshop, only RGB/8 mode should be used. Under BMP Options, the File Format should always be set to “Windows”. “Compress (RLE)” and “Flip row order” should never be checked. The Basic or Advanced Modes under the BMP Save options should be set in combination with the appropriate encoder options to achieve the desired pixel format according to the following table.
Basic: 24 bit
RGB/8 w/ alpha
Basic: 32 bit
Fill alpha black
RGB/8 w/ alpha
Basic: 32 bit
Adv: X1 R5 G5 B5
Adv: R5 G6 B5
HDR This file format is only used for encoding to the Windows Media Photo 24bppRGBE pixel format. This special floating point pixel format uses a shared exponent and three independent mantissas to encode the entire pixel. Unfortunately, while Photoshop CS2 supports saving in .HDR file mode, it only saves using the compressed option. WMPEncApp.exe only supports uncompressed .HDR files. So, we’ve provided a simple command line utility called HDR2HDR that only does one thing: it reads and decompresses a compressed .HDR file and saves it as an uncompressed .HDR file. HDR2HDR is included in the DPK. If you don’t have the DPK, it’s also available here.
PhotoShop Save As…
Fixed Point Pixel Formats
The one significant omission from the pixel formats listed in the tables above is the set of fixed point formats. As discussed in our previous blog on high dynamic range, wide gamut pixel formats, the fixed point pixel formats provide an excellent solution for retaining the full content of an image source while still providing efficient storage and processing.
However, these fixed point formats are one of the innovations introduced with Windows Media Photo and the new graphics infrastructure in Windows Vista and .NET Framework 3.0. No other file format supports fixed point encoding. Because WMPEncApp.exe has no built-in capability for pixel format conversion, there is no way we can encode fixed point Windows Media Photo files from standard TIFF or BMP files.
In reality, if we write a little software, we could convert floating point image data to fixed point and still store it in a TIFF file. While the resulting TIFF file containing fixed point image data would not display correctly, it could be converted to a fixed point Windows Media Photo file using WMPEncApp.exe. That’s why there are -c option values defined for the various fixed point pixel formats. But for this to work, you’re on your own to first convert the uncompressed image data to fixed point. I do have a utility that does this, but trust me, it’s not ready for any public use!
If you do choose to write your own software to generate uncompressed fixed point data, you might as well simply write a WIC or Windows Presentation Foundation (WPF) application and directly call the Windows Media Photo converter. This makes a lot more sense than writing out fixed point data in a non-standard TIFF file and then using WMPEncApp.exe.
If you can hang in there for Part 2, we will discuss how you can use the WIC Tools to create fixed point Windows Media Photo files. Since the WIC Tools have full access to all the capabilities of WIC, the encoder can call the appropriate pixel format converter and create a Windows Media Photo file in any pixel format, independent of the uncompressed source image format.
Unsupported Pixel Formats
There are several other pixel formats that can’t easily be encoded using WMPEncApp.exe:
It is possible to encode images in any of the pre-multiplied alpha pixel formats, but the uncompressed image will have to first be pre-processed to multiply the RGB channels by the alpha channel value. This can be done using the appropriate blending operations in Photoshop. If there is any interest, I’ll describe that process in another installment (and provide an action to offer some automation.) Like fixed point pixel formats, a TIFF file in a pre-multiplied pixel format will not display correctly, but can be converted to an equivalent Windows Media Photo file using WMPEncApp.exe. But also like fixed point pixel formats, if you wait for Part 2, we’ll discuss how this can be done much more easily using the WIC Tools.
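To make the pre-processing step concrete, here is a small Python sketch of pre-multiplying a single 8 bit per channel pixel. It is just the generic arithmetic, not code from the DPK; the function name and the round-to-nearest choice are my own assumptions.

```python
def premultiply(r, g, b, a, max_value=255):
    """Scale each color channel by alpha / max_value (hypothetical helper).

    Uses integer arithmetic with round-to-nearest. The alpha value itself
    is passed through unchanged.
    """
    def scale(channel):
        return (channel * a + max_value // 2) // max_value

    return scale(r), scale(g), scale(b), a


# A fully opaque pixel is unchanged; a fully transparent one goes to black.
print(premultiply(200, 100, 50, 255))
print(premultiply(200, 100, 50, 0))
print(premultiply(200, 100, 50, 128))
```

For 16 bit per channel data the same function would be called with max_value=65535.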
While WMPEncApp.exe includes a -c option value for the 32bppRGB101010 pixel format, there is no way to create this pixel format in an uncompressed file using Photoshop. This is another pixel format that we’ll handle using the WIC Tools, which we will discuss in Part 2.
WMPEncApp.exe does not support encoding any of the n-Channel pixel formats. Creating uncompressed n-Channel data will be a whole topic in itself, which we will address in a future installment of this blog series. Once we understand how to create this n-Channel data, we’ll be able to use the WIC Tools to encode it in Windows Media Photo file format.
Compression Choices: -q, -d & -l Options
There are three parameters that control the tradeoffs between image quality and compressed file size. In addition to their individual effect, it’s also important to understand how these three parameters interact with each other.
-q Quantization (1-255)
One of the principal ways lossy compression is achieved is to “quantize” a set of continuous values into a smaller set of representative values. In this way, “loss” is achieved by mapping values that are close together to the same value. Only the remaining set of values needs to be coded and saved, reducing the amount of storage required. The greater the degree of quantization, the more the content can be compressed, but in doing so, more of the small differences among similar values are lost.
The quantization level, specified by the -q option, basically defines the amount of similarity that can be discarded. It is an arbitrary range from 1-255; the quantization value does not correspond directly to any specific difference amount. Quantization determines the desired image quality rather than the desired compression ratio. The actual compression ratio is a function of both the quantization and the image content; images with less complex content will have fewer differences among values and will achieve better compression without the need for greater quantization. Additionally, the quantization is also highly dependent on the specific pixel format, most importantly the bit depth. Higher values will be required for greater bit depths to achieve a comparable compression ratio, since larger bit depths provide a greater range of possible values and therefore will need more quantization.
Windows Media Photo provides the unique capability of preserving all data values during quantization, effectively providing mathematically lossless compression. When the quantization is set to 1, no values are discarded and all encoded pixel values will be returned with absolutely no loss. This is the default setting if no value for the -q option is specified.
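The general idea of quantization can be shown with a few lines of Python. This is a generic uniform quantizer, not the Windows Media Photo algorithm: as noted above, the -q value does not correspond directly to any specific difference amount, so treating q as a step size here is purely an assumption for illustration.

```python
def quantize(value, step):
    """Map an integer to a quantization index (generic uniform quantizer).

    Rounds to the nearest multiple of step, symmetrically around zero.
    """
    sign = -1 if value < 0 else 1
    return sign * ((abs(value) + step // 2) // step)


def dequantize(index, step):
    """Reconstruct the representative value for a quantization index."""
    return index * step


samples = [0, 3, 7, 12, 100, -45]

# With a step of 1 every value survives the round trip (lossless).
lossless = [dequantize(quantize(v, 1), 1) for v in samples]

# With a step of 10, nearby values collapse onto the same representative.
lossy = [dequantize(quantize(v, 10), 10) for v in samples]

print(lossless)  # [0, 3, 7, 12, 100, -45]
print(lossy)     # [0, 0, 10, 10, 100, -50]
```

Notice how 0 and 3 become indistinguishable after the lossy round trip, as do 7 and 12. Fewer distinct values remain to be coded, which is where the savings come from.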
-d Chroma Sub-sampling (0, 1, 2 or 3)
We can choose to reduce the resolution of the chrominance of an image prior to the quantization process. Reducing the chrominance resolution, or chroma sub-sampling, has long been understood as an effective way to reduce image content with very little perceptible degradation. In fact, virtually all television or video you watch, whether analog or digital, takes advantage of chroma sub-sampling to reduce the required bandwidth. The JPEG compression format typically uses chroma sub-sampling as well. In fact, the unique capability of Windows Media Photo is not that we provide chroma sub-sampling, but that we provide a mechanism for you to reduce or eliminate this technique to improve image quality. Of course, this only applies to RGB color images.
An image is first reorganized from RGB into a channel for luminance and two channels to describe the color information (or chrominance.) If all chrominance is discarded, what’s left is a monochrome image. Typically, we don’t want to go that far!
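The usage summary names the color space as YCoCg, but this post doesn't spell out the exact transform Windows Media Photo uses. As a stand-in, here is a Python sketch of the well-known reversible YCoCg-R lifting transform, which splits RGB into one luminance-like value and two chrominance values using only integer adds and shifts. Treat it as an illustration of the idea rather than a description of the codec's internals.

```python
def rgb_to_ycocg_r(r, g, b):
    """Forward YCoCg-R lifting transform on integer samples."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, co, cg


def ycocg_r_to_rgb(y, co, cg):
    """Inverse YCoCg-R transform; exactly undoes rgb_to_ycocg_r."""
    t = y - (cg >> 1)
    g = cg + t
    b = t - (co >> 1)
    r = b + co
    return r, g, b


# A neutral gray pixel carries no chrominance at all.
print(rgb_to_ycocg_r(128, 128, 128))  # (128, 0, 0)

# The round trip is exact, even for saturated colors.
print(ycocg_r_to_rgb(*rgb_to_ycocg_r(255, 0, 0)))  # (255, 0, 0)
```

Setting both chrominance values to zero before the inverse transform leaves a monochrome pixel, which is the extreme case described above.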
Many video systems, as well as the JPEG compression format (or at least the most common variant of it that we all use), discard 75% of the chrominance information. The resolution of the color information is reduced by a factor of two in both dimensions. So every four pixels in an image are represented by four luminance values but only two (one for each chroma channel) chrominance values. What started out as 12 values (four pixels with three channels each) has been cut in half; only 6 values (four luminance values and two chrominance values) have to be saved.
In the world of digital imaging, this is referred to as 4:2:0 chroma sub-sampling, or more simply as 4:2:0. When all chrominance information is retained (no values are discarded), this is referred to as 4:4:4. Another popular approach, particularly for professional video applications, is to only discard 50% of the chroma values; two values for each chroma channel, or four values in total are retained. This is referred to as 4:2:2. Finally, if we discard all color information, retaining only the luminance, this is described as 4:0:0. Windows Media Photo supports all these modes.
-d 3 (4:4:4) All color information is retained, assuring full resolution of the chrominance information. This is the default and is the recommended setting to achieve the best overall image quality. Whenever an image is stored as an intermediate format and further editing is anticipated, it is highly recommended to use 4:4:4.
-d 2 (4:2:2) The color information is encoded at ½ the resolution of the luminance information. For each set of four pixels, four luminance values are used and the eight chrominance values are reduced down to four (two for each chroma channel.) This provides perceptually lossless color encoding for the final delivery of an image. However, if further editing of the image is anticipated, it’s recommended that any chroma sub-sampling be avoided.
-d 1 (4:2:0) The color information is encoded at ¼ the resolution of the luminance information. For each set of four pixels, four luminance values are used and the eight chrominance values are reduced down to two (one for each chroma channel.) This is the same sub-sampling used by JPEG. When converting a JPEG file to Windows Media Photo, there is no need to specify a higher chroma sub-sampling mode than 4:2:0.
-d 0 (4:0:0) All color information is discarded and only the luminance information is retained, effectively creating a monochrome image. For performance reasons, Windows Media Photo uses a non-traditional method to calculate luminance. Therefore, the resulting monochrome image will not appear identical to a monochrome version of the image created using other tools. Additionally, although all color information is discarded, the pixel format is not changed, so the image is still stored using an RGB pixel format. It is strongly recommended that if you want to create a monochrome image, the image should first be converted to monochrome using an appropriate image editing application to achieve the desired result, and then this monochrome image should be encoded using the appropriate Gray pixel format.
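The arithmetic behind these four modes is easy to tabulate. The following Python sketch counts how many values have to be stored for each group of four pixels under each chroma sub-sampling setting, using the numbers from the discussion above. The dictionary and function names are made up for this example.

```python
# Chrominance values kept per group of four pixels, keyed by the
# chroma sub-sampling setting from the usage summary.
CHROMA_VALUES_PER_FOUR_PIXELS = {
    3: 8,  # 4:4:4, all chrominance retained
    2: 4,  # 4:2:2, two values per chroma channel
    1: 2,  # 4:2:0, one value per chroma channel
    0: 0,  # 4:0:0, luminance only
}

MODE_NAMES = {3: "4:4:4", 2: "4:2:2", 1: "4:2:0", 0: "4:0:0"}


def values_per_four_pixels(mode):
    """Total stored values (4 luminance + chrominance) per four pixels."""
    return 4 + CHROMA_VALUES_PER_FOUR_PIXELS[mode]


for mode in (3, 2, 1, 0):
    total = values_per_four_pixels(mode)
    print(f"{MODE_NAMES[mode]}: {total} values, "
          f"{total / 12:.0%} of the original 12")
```

Keep in mind that these are counts before quantization and entropy coding, so they only hint at the final file size difference.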
-l Overlap Processing (0, 1, 2)
Windows Media Photo uses an advanced version of a macro block based compression scheme. To achieve the best performance and minimize the amount of memory required to encode or decode an image, the overall image is subdivided into a set of 16x16 pixel macro blocks. Each macro block is further divided into sixteen 4x4 pixel blocks. All image encoding and decoding operations are performed on these blocks and macro blocks. As a result, for high quantization values (when we are discarding a higher amount of similar pixel values), the steps between blocks and macro blocks may become visible as artifacts in the compressed image. This is very common with JPEG (which also uses macro blocks) and significantly reduces the amount of compression that can be used without creating these visible artifacts.
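As a quick sanity check on the block arithmetic, here is a Python sketch that works out the macro block grid for an image. A 16x16 macro block divides evenly into (16 / 4) squared, or 16, blocks of 4x4 pixels. The assumption that a partial macro block at the right or bottom edge is rounded up to a full one is mine; this post doesn't describe how edges are handled.

```python
MACRO_BLOCK_SIZE = 16  # pixels per macro block side
BLOCK_SIZE = 4         # pixels per block side


def macro_block_grid(width, height):
    """Return (columns, rows) of macro blocks needed to cover an image.

    Assumes partial macro blocks at the edges are rounded up.
    """
    cols = -(-width // MACRO_BLOCK_SIZE)   # ceiling division
    rows = -(-height // MACRO_BLOCK_SIZE)
    return cols, rows


def blocks_per_macro_block():
    """Number of 4x4 blocks inside one 16x16 macro block."""
    return (MACRO_BLOCK_SIZE // BLOCK_SIZE) ** 2


print(macro_block_grid(640, 480))   # (40, 30)
print(macro_block_grid(100, 50))    # (7, 4)
print(blocks_per_macro_block())     # 16
```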
Windows Media Photo addresses this problem through a combination of better quantization and an additional step of overlap processing. This overlap processing takes into account the values of pixels in neighboring blocks and macro blocks when choosing the quantization values that represent similar adjacent pixels. By doing so, the visible differences among adjacent blocks and macro blocks are dramatically reduced.
Two levels of optional overlap processing can be specified via the -l parameter. Single level overlap processing (-l 1) is performed at the 4x4 block level. For all pixels in the block, bordering pixels in adjacent blocks are also evaluated when choosing the quantization values for that block. Double level overlap processing (-l 2) also analyzes neighboring adjacent pixels when choosing quantization values at the 16x16 macro block level.
The default value for the -l parameter is 1, and single level overlap processing should be used for most typical encoding scenarios. For very high quantization levels, double level overlap processing may be appropriate, but it reduces the potential for macro block artifacts at the cost of some image detail. Setting the -l parameter to 0 suppresses any overlap processing. This can speed performance, but is only recommended for very low quantization values. Specific quantization thresholds for choosing the appropriate level of overlap processing are highly dependent on the image content and cannot be predicted. Trial and error will be your best guide. But when in doubt, stick with the default.
Image Organization Choices: -f, -U, -V, -H & -a Options
In addition to controlling the quality vs. the size of the image, Windows Media Photo provides a number of choices on exactly how the image information is structured or organized within the file. This includes the alpha channel structure, image tiling, and the overall data order of frequency vs. spatial.
-a Alpha Channel Structure
Obviously, this option only applies to images with alpha channels. The -a option is required for any uncompressed source image that contains an alpha channel and it will be ignored if the image does not include an alpha channel.
Windows Media Photo supports both interleaved and planar alpha channels. An interleaved alpha channel is stored in sequence with the channels that describe the image contents (RGB or CMYK). It simply adds an additional channel to each pixel. A planar alpha channel is stored as a completely separate image within the Windows Media Photo file container. The alpha channel is encoded separately from the image RGB or CMYK data. The decoder can decode both and re-interleave the channels to deliver a bitmap with alpha channel. Or if only one element (the image content or the alpha channel) is required, a decoder can return just that portion with no need for all the additional processing required for the other portion.
Setting the -a parameter to 3 specifies that the image be encoded with an interleaved alpha channel. Conversely, setting this option to 2 will encode an image with a planar alpha channel. Any other value is illegal and will generate an error.
At present, WMPEncApp does not allow you to specify different encoding parameters (specifically the quantization value) for a planar alpha channel vs. the image content. We will consider adding this feature in the future.
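The difference between the two alpha layouts is easiest to see with raw sample data. This Python sketch splits an interleaved RGBA sample list into a separate color plane and alpha plane, and then puts them back together. It only models the uncompressed sample ordering; how the encoder actually stores the planar alpha image inside the file container is not shown here, and the function names are hypothetical.

```python
def split_planar(interleaved, channels=4):
    """Split [R, G, B, A, R, G, B, A, ...] into (color, alpha) lists.

    The last channel of every pixel is treated as alpha.
    """
    if len(interleaved) % channels:
        raise ValueError("sample count is not a multiple of channel count")
    color, alpha = [], []
    for i in range(0, len(interleaved), channels):
        pixel = interleaved[i:i + channels]
        color.extend(pixel[:-1])
        alpha.append(pixel[-1])
    return color, alpha


def interleave(color, alpha, channels=4):
    """Rebuild interleaved samples from separate color and alpha lists."""
    per_pixel = channels - 1
    out = []
    for i, a in enumerate(alpha):
        out.extend(color[i * per_pixel:(i + 1) * per_pixel])
        out.append(a)
    return out


pixels = [10, 20, 30, 255, 40, 50, 60, 128]  # two RGBA pixels
color, alpha = split_planar(pixels)
print(color)  # [10, 20, 30, 40, 50, 60]
print(alpha)  # [255, 128]
```

With the planar layout, a consumer that only needs the color data (or only the alpha) can ignore the other list entirely, which mirrors the decoding benefit described above.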
-f Frequency Order vs. Spatial Order
Windows Media Photo makes it possible to organize the compressed image data sequentially in either spatial or frequency order. Spatial order is the typical choice for encoding by a device. The sequential compressed data stream represents the image in macro block rows starting at the upper left corner, from left to right and from top to bottom. It allows the image to be encoded sequentially in rows of pixels, minimizing the total memory required. Frequency order groups the data in three different frequencies and places it sequentially in the file starting with the low frequency information, followed by the middle frequency, and finally by the high frequency details. Frequency order makes it much more efficient to decode a low resolution version of the image, minimizing the amount of compressed image data that must be parsed to find the required low frequency content. When encoding on a typical personal computer, the performance difference between encoding in frequency order vs. spatial order is insignificant.
Including the -f option specifies that the image be encoded in frequency order. This is typically preferred because of the performance benefits when decoding the image to lower resolutions. Omitting the -f option will encode the image in spatial order. This is (obviously) the default.
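Here is a toy Python model of why frequency order helps low resolution decoding. It treats the file as a sequence of (macro block, frequency band) chunks and counts how much of that sequence has to be walked before all of the low frequency data has been seen. This is only a conceptual model built from the description above, with made-up names; the real bit stream layout is more involved.

```python
BANDS = ("low", "middle", "high")


def spatial_order(num_macro_blocks):
    """All bands for macro block 0, then all bands for macro block 1, ..."""
    return [(mb, band) for mb in range(num_macro_blocks) for band in BANDS]


def frequency_order(num_macro_blocks):
    """All low frequency chunks first, then middle, then high."""
    return [(mb, band) for band in BANDS for mb in range(num_macro_blocks)]


def chunks_walked_for_low_res(order):
    """Chunks passed over before the last low frequency chunk is reached."""
    last = max(i for i, (_, band) in enumerate(order) if band == "low")
    return last + 1


print(chunks_walked_for_low_res(spatial_order(4)))    # 10
print(chunks_walked_for_low_res(frequency_order(4)))  # 4
```

In spatial order, the low frequency chunks are scattered through almost the whole sequence. In frequency order, they sit together at the front.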
-U, -V, -H Image Tiling
Windows Media Photo allows an image to be subdivided into individual rectangular tiles. Each tile is stored in the compressed bit stream as a fully self-describing sub-picture. This makes it possible to decode a tile without ever having to process the compressed data for any other tile. The main purpose for this feature is to optimize an image for region decoding. The request to decode an arbitrary region only needs to process the tiles that represent that region.
Both uniform tiling and non-uniform tiling are supported. With uniform tiling, all tiles (with the potential exception of the right-most column and bottom-most row) share the same width and height. With non-uniform tiling, the desired width and height for each tile row and column can be specified. Tiles always have uniform height within each tile row, and uniform width within each tile column.
The -U option, followed by the column and row count, specifies uniform tiling. The image is sliced into the requested number of columns and rows, spacing them as evenly as possible. Tiles are always a multiple of macro blocks (16x16 pixels.) If the image width or height is not evenly divisible in macro block increments by the requested column and row count, the right-most column and/or bottom-most row will be re-sized accordingly. The remaining columns and rows will always be of uniform width and height. Columns cannot be less than one macro block in width and rows cannot be less than one macro block in height. If the requested column or row count results in tiles smaller than this, the appropriate column or row count will be adjusted accordingly.
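To get a feel for how uniform tiling plays out, here is a Python sketch that computes tile sizes (in macro blocks) along one image dimension. The exact rounding the encoder uses isn't documented here, so this is one plausible reading of the rules above: clamp the tile count so no tile is smaller than one macro block, give every tile but the last the same size, and let the last tile absorb the remainder. The function name is hypothetical.

```python
def uniform_tile_sizes(pixels, requested_tiles, macro_block=16):
    """Tile sizes in macro blocks along one dimension (assumed rounding).

    All tiles except the last share one size; the last tile takes
    whatever macro blocks are left over.
    """
    total_mb = -(-pixels // macro_block)  # ceiling division
    tiles = max(1, min(requested_tiles, total_mb))
    base = total_mb // tiles
    sizes = [base] * (tiles - 1)
    sizes.append(total_mb - base * (tiles - 1))
    return sizes


# 1000 pixels is 63 macro blocks; four tiles cannot divide that evenly.
print(uniform_tile_sizes(1000, 4))  # [15, 15, 15, 18]

# Asking for more tiles than there are macro blocks reduces the count.
print(uniform_tile_sizes(32, 5))    # [1, 1]
```

The same function would be called once for the width and once for the height.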
Instead of using the -U option for uniform tiling, the -H and -V options can be used to specify a vector of non-uniform tile widths and heights. The vector of space-delimited values following each parameter expresses the tile dimension in macro blocks (multiples of 16 pixels.) If insufficient values are specified to describe the entire width or height of the image, the right-most column and/or bottom-most row will be sized to contain the remaining pixels. If the vector of macro block sizes exceeds the dimension of the image, the extra values in the vector will be ignored and the right-most column and/or bottom-most row will be resized to match the remaining pixels in that image dimension.
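The two edge cases for non-uniform tiling (too few values, or too many) can be sketched in Python as follows. One detail is my own assumption: when the vector runs short, this sketch appends one final tile to hold the remaining macro blocks, rather than stretching the last tile that was specified. The function name is also made up for illustration.

```python
def resolve_tile_vector(requested, total_mb):
    """Fit a requested vector of tile sizes (in macro blocks) to an image.

    Extra values past the image edge are ignored, the tile that crosses
    the edge is clipped, and any shortfall becomes one final tile
    (that last behavior is an assumption).
    """
    sizes = []
    remaining = total_mb
    for size in requested:
        if remaining <= 0:
            break  # vector exceeds the image; ignore the extra values
        if size <= 0:
            raise ValueError("tile sizes must be at least one macro block")
        used = min(size, remaining)  # clip the tile that crosses the edge
        sizes.append(used)
        remaining -= used
    if remaining > 0:
        sizes.append(remaining)  # final tile holds the rest of the image
    return sizes


print(resolve_tile_vector([10, 10], 30))          # [10, 10, 10]
print(resolve_tile_vector([10, 10, 10, 10], 25))  # [10, 10, 5]
print(resolve_tile_vector([8, 8], 16))            # [8, 8]
```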
In general, image tiling is not required. Its use, and the appropriate choice of tile size, is application dependent. To minimize the performance penalties associated with tiling, we recommend avoiding tiles smaller than 256x256 pixels.
Please note that the Beta 2 release of Windows
Encoder Status Reporting: -v, -t
The following are some additional options that control what the encoder reports about the encoding process itself.
-v Verbose mode
When present, this option enables the output of extended status and results information via STDOUT. This information can be redirected to a file or other destination using the standard command line conventions (> or >>). Most of the reported information is self-explanatory. I’ll save a more detailed discussion about this for another day.
-t Timing information
When present, this option enables the output of encoder timing information via STDOUT. As above, it can also be redirected. This timing information was something we included for our own testing. It does not use a very accurate method to measure performance, and while it may be informative, it should not be relied on as a precise performance indicator. Also, please remember that the DPK Tools do not include the platform optimization code that is implemented in the WIC Tools.
Wow! This blog entry turned out to be a lot longer than I expected (10 pages when printed in 10pt.) There is just so much to talk about to cover all the capabilities supported when encoding Windows Media Photo files. This is just the first installment. Stay tuned for Part 2; it will cover the WIC Tools, and depending on how long that goes, also dive into some of the more advanced topics, like metadata and color profiles.