Lost Column #2: Unsafe Image Processing

(circa November 15, 2001) 

I think this column stands up pretty well without caveat.

I should note that I wrapped the image class to provide nicer pixel-based access in a class you can find here. I suggest basing your code on that rather than what I wrote in the column.

I should also note that the grey-scale code isn't what you want. The human eye is not equally sensitive to all colors, so you should use the following to determine the intensity:

 0.299 * red + 0.587 * green + 0.114 * blue

 Finally, I'll note that the extreme speedup I get here is because of how the underlying unmanaged GDI+ code is structured.

******

Unsafe Image Processing

Last month, we talked a bit about what unsafe code was good for and worked through a few examples. If you looked at the associated source code, you may have noticed that there was an image-processing example. This month, we're going to work through that example.

Grey Scale Images

Last summer, I was writing a program to process some pictures from my digital camera and I needed to do some image processing in C#. My first task was to figure out how to do that using the .NET fFrameworks. I did a bit of exploration, and found the Image and Bitmap classes in the System.Drawing namespace. These classes are thin covers over the GDI+ classes that encapsulate the image functions.
One task I wanted to do was to walk through an image and convert it from color to grayscale. To do this, I needed to modify each pixel in the bitmap. I started by writing a BitmapGrey class that would encapsulates the bitmap, and then writing a function in that class to do the image conversion. I came up with the following class:

public class BitmapGrey
{
    Bitmap bitmap;

    public BitmapGrey(Bitmap bitmap)
    {
        this.bitmap = bitmap;
    }

    public void Dispose()
    {
        bitmap.Dispose();
    }

    public Bitmap Bitmap
    {
        get
        {
            return(bitmap);
        }
    }

    public Point PixelSize
    {
        get
        {
            GraphicsUnit unit = GraphicsUnit.Pixel;
            RectangleF bounds = bitmap.GetBounds(ref unit);

            return new Point((int) bounds.Width,
                             (int) bounds.Height);
        }
    }

    public void MakeGrey()
    {
        Point size = PixelSize;

        for (int x = 0; x < size.X; x++)
        {
            for (int y = 0; y < size.Y; y++)
            {
                Color c = bitmap.GetPixel(x, y);
                int value = (c.R + c.G + c.B) / 3;
                bitmap.SetPixel(x, y,
                                Color.FromArgb(value, value, value));
            }
        }
    }
}

The meat of the class is in the MakeGrey() method. It gets the bounds of the bitmap, and then walks through each pixel in the bitmap. For each pixel, it fetches the color and determines what the average brightness of that pixel should be. It then creates a new color value for the brightness value, and stores it back into the pixel.

This code was simple to write, easy to understand, and worked the first time I wrote it. Unfortunately, it's pretty slow. ; iIt takes about 14 seconds to process a single image. That's pretty slow if I compare it to a commercial image processing program, such as Paint Shop Pro, which can do the same operation in under a second.

Accessing the Bitmap Data Directly

The majority of the processing time is being spent in the GetPixel() and SetPixel() functions, and to speed up the program, I needed a faster way to access the pixels in the image.  There's an interesting method in the Bitmap class called LockBits(), which can be used – not surprisingly – to lock a bitmap in memory so that it doesn't move around. With the location locked, it's safe to deal with the memory directly, rather than using the GetPixel() and SetPixel() functions. When LockBits() is called, it returns a BitmapData class. The scan0 field in the class is a pointer to the first line of bitmap data. We'll access this data to do our manipulation.

First, however, we need to understand a bit more about the how the data in the image is arranged.

Bitmap Data Organization

The organization of the bitmap depends upon the type of data in the bitmap. By looking at the PixelFormat property in the BitmapData class, we can determine what data format is being used. In this case, I'm working with JPEG images, which use the Format24bppRgb (24 bits per pixel, red green blue) format.

Since we're going to be looking at these pixels directly, we'll need a way to decode a pixel. We can do that with the PixelData struct:

public struct PixelData
{
    public byte blue;
    public byte green;
    public byte red;
}

Now, we'll need a way to figure out what the offset is for a given pixel. Basically, we treat the bitmap data as an array of PixelData structures, and figure out what index we'd be looking for to reference a pixel at x, y.

In memory, a bitmap is stored as one large chunk of memory, much like an array of bytes.  We therefore need to figure out what the offset is of a given pixel from the beginning of the array. Here's what a 4x4  bitmap would look like, with the index and (y, x) location of each pixel.

0: (0, 0) 1: (0, 1) 2: (0, 2) 3: (0, 3)
4: (1, 0) 5: (1, 1) 6: (1, 2) 7: (1, 3)
8: (2, 0) 9: (2, 1) 10: (2, 2) 11: (2, 3)
12: (3, 0) 13: (3, 1) 14: (3, 2) 15: (3, 3)

For each line, the offset is simply the width of the line in pixels times the y value. This gives the index of the first element of the line, and the x value is simply added to that value to determine the actual index of an element. This is the way that multi-dimensional arrays are typically stored in memory.

So, we can figure out the offset as follows:

Offset = x + y * width;

The code to access a given pixel is something like:

PixelData *pPixel;
pPixel = pBase + x + y * width;

Where pBase is the address of the first element.

If we go off and write some code, we'll find that this works great for some bitmaps, but doesn't work at all for other bitmaps. It turns out that the number of bytes in each line must be a multiple of 4 bytes, and since the pixels themselves are 3 bytes, there are some situations where there's some unused space at the end of each line. If we use the simple version of indexing above, we'll sometimes index into the unused space.

Here's an example of what this looks like in a bitmap, with each box now corresponding to a byte rather than a pixel.

0,0 blue green red 0,1blue green red 0,2blue green red      
1,0 blue green red 1,1blue green red 1,2blue green red      

This bitmap occupies 24 bytes, with 12 bytes per line. However, it only has three entries, so each line must be padded with an extra 3 bytes.

To make our code work everywhere, we need to switch from working in terms of pixels to working in terms of bytes, at least for dealing with the line indexing. We also need to calculate the width of a line in bytes. We can write a function that handles this all of this for us:

public PixelData* PixelAt(int x, int y)
{
    return (PixelData*) (pBase + y * width + x * sizeof(PixelData));
}

Once that we've gotten that written that bit of code, we can finally write an unsafe version of our grey scale function:
public void MakeGreyUnsafe()
{
    Point size = PixelSize;
    LockBitmap();

    for (int x = 0; x < size.X; x++)
    {
        for (int y = 0; y < size.Y; y++)
        {
            PixelData* pPixel = PixelAt(x, y);

            int value = (pPixel->red + pPixel->green + pPixel->blue) / 3;
            pPixel->red = (byte) value;
            pPixel->green = (byte) value;
            pPixel->blue = (byte) value;
        }
    }
    UnlockBitmap();
}

We start by calling a function to lock the bitmap. This function also figures out the width of the bitmap and stores the base address away. We then iterate through the pixel, pretty much just as before, except that for each pixel, we get a pointer to the pixel and then manipulate it through the pointer.

When this is tested, it only takes about 1.2 seconds to do this image. That's over 10 times faster. It's not clear where all the overhead is in GetPixel() / SetPixel(), but there's obviously some overhead in the transition from managed to unmanaged code, and when you have to do the transition 3.3 million times per image, that overhead will adds up.

I originally stopped with this version, but a bit of reflection suggested another opportunity for improvement. We're calling the PixelAt() function for every pixel, even when we know that the pixels along a line are contiguous.  We'll exploit that in the final version:

public void MakeGreyUnsafeFaster()
{
    Point size = PixelSize;
    LockBitmap();

    for (int y = 0; y < size.Y; y++)
    {
        PixelData* pPixel = PixelAt(0, y);
        for (int x = 0; x < size.X; x++)
        {
            byte value =
                (byte) ((pPixel->red + pPixel->green + pPixel->blue) / 3);
            pPixel->red =  value;
            pPixel->green = value;
            pPixel->blue = value;
            pPixel++;
        }
    }
    UnlockBitmap();
}

The loop in this version does the y values in the outer loop, so we can walk through the entire x line. At the beginning of each line, we use PixelAt() to find the address of the first element, and then use pointer arithmetic (the pPixel++ statement) to advance to the next element in the line.

My test bitmap is 1,536 pixels wide. With this modification, I'm replacing a call to PixelAt() with an additional operation in all but one of those pixels.

When this version is tested, the elapsed time is 0.52 seconds, which is over twice as fast as our previous version, and nearly 28 times faster than the original version. Sometimes unsafe code can be really extremely useful, though gains of 28x are pretty rare.