Pulling Images from Google using C# Express

*** Updated Jan 23, 2006 ***

I finally got around to updating this for the final release of Visual Studio 2005. Download the updated code here.

******

 

My Channel9 video that uses Visual C# Express to pull images from images.google.com is now available. You can find the video on Channel9 and download the code here.

 

Walkthrough

I’ll use this blog post to walk through the basics of the application and point out Whidbey features along the way. You’ll notice a ToolStrip control at the top of the form with an image containing the C# Express logo, a textbox and a button. The ToolStripTextBox uses the new autocomplete features in VS ‘05 (you’ve probably run into this with Windows or Internet Explorer automatically populating entries) with a custom data source. While you can define the entries programmatically by creating an AutoCompleteStringCollection, I decided to hard code the autocomplete values at design time by changing the AutoCompleteCustomSource property and typing in strings. For example purposes, I’ll search for Steve Ballmer.

 

 

We return pictures from Google into a ListView control and when a user clicks on the ListView control, we populate a large picture box with the high-resolution image as shown below.

I also added a context menu on the high-resolution image so that you can set the selected picture as your wallpaper.

 

 

 

How it all Works

If you’re familiar with how asynchronous operations work with the .NET Framework 1.1, you’ll realize that while it is a consistent programming model, it’s still more complex (BeginInvoke, IAsyncResult, WaitHandles, etc.) than you would probably like. VS ‘05 makes async programming a lot easier with the new BackgroundWorker class, which executes a long-running task on a background thread so that the main UI thread doesn’t feel like it has hung. The BackgroundWorker class has some important properties and events:

  • WorkerReportsProgress and WorkerSupportsCancellation, which hold true/false values that let the BackgroundWorker report progress or cancel an async operation respectively.

  • The DoWork event fires when you call the BackgroundWorker.RunWorkerAsync() method. You’ll simply need to hook up an event handler to the DoWork event that will do whatever long running task you want to have run on another thread.

  • The ProgressChanged event fires when you call the ReportProgress method inside the DoWork event handler and allows you to show the current progress of the long running task.

  • The RunWorkerCompleted event fires when the asynchronous operation has completed.
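To make the three events concrete, here is a minimal console sketch of the BackgroundWorker lifecycle. It is not code from the sample: the LifecycleSketch class, the Search method and the fake "3 results" string are all made up for illustration, and Thread.Sleep stands in for the real HTTP requests. One caveat: in a Windows Forms app the ProgressChanged and RunWorkerCompleted handlers run on the UI thread, but in a console program like this one they run on thread pool threads, so the progress lines may print out of order.

```csharp
using System;
using System.ComponentModel;
using System.Threading;

class LifecycleSketch
{
    // Hypothetical helper: runs a fake "search" on a BackgroundWorker
    // and blocks the caller until RunWorkerCompleted has fired.
    public static string Search(string query)
    {
        string result = null;
        ManualResetEvent done = new ManualResetEvent(false);

        BackgroundWorker worker = new BackgroundWorker();
        worker.WorkerReportsProgress = true; // required before calling ReportProgress

        worker.DoWork += delegate(object sender, DoWorkEventArgs e)
        {
            string text = (string)e.Argument;   // the value passed to RunWorkerAsync
            for (int i = 1; i <= 3; i++)
            {
                Thread.Sleep(50);               // stand-in for one HTTP request
                worker.ReportProgress(i);       // raises ProgressChanged
            }
            e.Result = "3 results for " + text; // handed to RunWorkerCompleted
        };

        worker.ProgressChanged += delegate(object sender, ProgressChangedEventArgs e)
        {
            Console.WriteLine("progress: " + e.ProgressPercentage);
        };

        worker.RunWorkerCompleted += delegate(object sender, RunWorkerCompletedEventArgs e)
        {
            result = (string)e.Result;
            done.Set();
        };

        worker.RunWorkerAsync(query);           // raises DoWork on a background thread
        done.WaitOne();
        return result;
    }

    static void Main()
    {
        Console.WriteLine(Search("Steve Ballmer"));
    }
}
```

The blocking WaitOne call is only there so the console program doesn’t exit early. In a Windows Forms app you would simply return from the click handler and let RunWorkerCompleted update the controls.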

As shown below, the user clicks on the search button and we pass the search value to the background worker. The background worker calls Google images, parses the urls for images, and then downloads each of the images into local image objects. We return the list of urls and images back to the original thread and bind the images to the ListView control. If a user clicks on a specific image, we fetch the high-res image and load it into the picture box. If a user right clicks on the image, they can set the image as the machine’s wallpaper.

The URL for the image and the image itself are stored in a generic dictionary class with the url representing the key and the image representing the value as shown below:

internal Dictionary<string, Image> GoogleImages = new Dictionary<string, Image>();

RunWorkerAsync accepts an object type parameter so that you can pass variables/data into the background worker thread.

googleBackgroundWorker.RunWorkerAsync(txtSearch.Text);

Note: The code that runs when the DoWork event fires executes on a separate thread from the UI. It’s important to remember that code on this thread cannot change controls on the original UI thread. Because of this, we’ll download all of the images in the DoWork handler and do the actual databinding in the RunWorkerCompleted event.

Since we passed an argument (the text to search) as an object type, we can get the value out of the DoWorkEventArgs e by casting to a string. The GetImagesFromGoogle method calls Google and returns a list of image URLs by applying a regular expression to a string that contains the raw html.

WebUtilities.GetImagesFromGoogle((string)e.Argument);
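The body of GetImagesFromGoogle isn’t shown in this post, so the following is only a rough sketch of the general technique: applying a regular expression to raw HTML to collect image URLs. The ImageUrlSketch class, the ExtractImageUrls method and the pattern itself are assumptions for illustration. The real Google markup is different, changes over time, and would need its own pattern.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class ImageUrlSketch
{
    // Hypothetical pattern: captures the quoted src attribute of <img> tags.
    // It is a simplification and will not handle unquoted attribute values.
    static readonly Regex ImgSrc = new Regex(
        "<img[^>]*?\\ssrc\\s*=\\s*[\"']([^\"']+)[\"']",
        RegexOptions.IgnoreCase);

    public static List<string> ExtractImageUrls(string html)
    {
        List<string> urls = new List<string>();
        foreach (Match m in ImgSrc.Matches(html))
        {
            urls.Add(m.Groups[1].Value); // group 1 is the URL between the quotes
        }
        return urls;
    }

    static void Main()
    {
        // Made-up HTML for demonstration purposes only.
        string html =
            "<a href=\"x\"><img src=\"http://example.com/a.jpg\" width=\"100\"></a>" +
            "<IMG alt='b' SRC='http://example.com/b.png'>";

        foreach (string url in ExtractImageUrls(html))
        {
            Console.WriteLine(url);
        }
    }
}
```

A regular expression is fine for a quick screen scrape like this one, but it is brittle. Any change to the page layout means revisiting the pattern.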

With the list of image URLs parsed from the Google HTML, we can then loop through each url and load the image by creating a WebClient object, opening the image url and reading the image bits into a stream object. We then convert the bytes in the stream to an image by calling the Image.FromStream method and finally we add our url and image to our dictionary. We use a try/catch/finally statement in case the image has been taken off of Google, but we really just want to continue rather than do anything fancy here (like stop the application), and the finally block runs after each image to report progress to the UI thread. As we need to make up to 21 HTTP requests (one for the original images.google.com page and up to 20 more, one for each image in the results page), we show user progress in a progress bar as each image is loaded.

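foreach (string url in imgUrls)
{
    try
    {
        // Download the image bits and decode them into an Image.
        Stream ImageStream = new WebClient().OpenRead(url);
        Image img = Image.FromStream(ImageStream);
        UrlAndImage.Add(url, img);
    }
    catch { } // Skip images that are no longer available.
    finally
    {
        // Report progress after every attempt, successful or not.
        googleBackgroundWorker.ReportProgress(progress++);
    }
}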

The ReportProgress method receives an integer value, which we read back out of the ProgressChangedEventArgs e variable to update the progress bar as shown below:

void backgroundWorker1_ProgressChanged(object sender, ProgressChangedEventArgs e)
{
    googleProgressBar.Value = e.ProgressPercentage;
}

Note: The WorkerReportsProgress property must be set to true here to report progress.
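The sample passes a simple image counter to ReportProgress, which works as long as the progress bar’s range matches the number of images. If you would rather report a true percentage, a small helper along these lines could do the conversion. The ProgressSketch class and the ToPercent method are hypothetical names and are not part of the sample.

```csharp
using System;

static class ProgressSketch
{
    // Hypothetical helper: converts "completed out of total" to a 0-100 value
    // suitable for BackgroundWorker.ReportProgress.
    public static int ToPercent(int completed, int total)
    {
        if (total <= 0)
        {
            return 0; // nothing to do, so avoid dividing by zero
        }
        int percent = completed * 100 / total; // integer division rounds down
        return Math.Min(100, Math.Max(0, percent));
    }

    static void Main()
    {
        for (int done = 0; done <= 20; done += 5)
        {
            Console.WriteLine(done + " of 20 -> " + ToPercent(done, 20) + "%");
        }
    }
}
```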

Once we’ve pulled every image, we return our dictionary of urls and images by setting the e.Result property.

e.Result = UrlAndImage;

When our BackgroundWorker async operation is complete, the RunWorkerCompleted event is raised on the original UI thread. We can now manipulate the Windows Forms controls on this thread. At design time, I set the ListView’s View property to “LargeIcon” and its LargeImageList property to my ImageList control. I now need to loop through each entry in our dictionary and add the key and the image to the ImageList control, which we use to store the images. The key from our dictionary is important here as we’ll use it to map the right url to the right image in our ListView. The ListView.Items.Add() method overload I call expects a string of text to describe the ListView item, which I set to the url, and a string to represent the image key in the ImageList as I discussed above.

void backgroundWorker1_RunWorkerCompleted(object sender, RunWorkerCompletedEventArgs e)
{
    googleProgressBar.Visible = false;
    GoogleImages = (Dictionary<string, Image>)e.Result;

    foreach (string key in GoogleImages.Keys)
    {
        lowResImageList.Images.Add(key, GoogleImages[key]);
        googlePicturesListView.Items.Add(key, key);
    }
}

 

Debugging The Code

Visualizers

In the Channel9 video I showed how you can add a breakpoint to code that’s running on the background worker thread, and I showed the HTML visualizer. If you’ve ever written code to screen scrape a web page, you’ll definitely appreciate the string visualizers we’ve added in 2005. You can add a breakpoint to your code and get a rich view of your in-memory variables. Below are the screenshots from the HTML and Text visualizers, which show you exactly what Google returned either in a browser view or in a straight text view. Very cool.

 

HTML Visualizer for Google Request

Text Visualizer for Google Request

 

Modifications/Wish List

It’s hard to release a sample as you know what’s ugly under the covers and all the cool things you planned to do, but things like sleep get in the way. If I had some free time, here’s what I would add/change:

Small Changes

· Change the backgroundWorker method names

· The OpenRead(url) method on the WebClient class (I thought this was more readable than the request/response model) expects a string data type rather than a Uri class. Because of this, I decided not to use the Uri class and used the URL variable name to make the distinction clear. This isn’t much of a problem really, but it would be something I would consider switching in the future. The change would look like the following:

internal Dictionary<Uri, Image> GoogleImages = new Dictionary<Uri, Image>();

· Add a way to cancel the background operation. The code is trivial, it just got Pri 2’d.

· Format the ListView a bit. Rather than have the full url, simply show the file name. Add the ability to turn checkboxes on and off so that you can File…Save All images. These were pretty much Pri 3’s.

· Add a status bar and move the Progress bar into the status bar. Use the status bar to not only show progress, but remove the message box pop-ups when the Wallpaper changed and instead send that message to a label on the status bar. Pri 3.
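For the cancellation item above, here is a rough console sketch of how BackgroundWorker cancellation generally works. It is not code from the sample: the CancelSketch class and the RunAndCancel method are made up, and Thread.Sleep stands in for downloading one image. The worker has to opt in with WorkerSupportsCancellation, and the DoWork handler has to poll CancellationPending itself and set e.Cancel.

```csharp
using System;
using System.ComponentModel;
using System.Threading;

class CancelSketch
{
    // Hypothetical helper: starts a fake 20-image download, requests cancellation
    // after the given delay, and returns whether the worker reported Cancelled.
    public static bool RunAndCancel(int cancelAfterMilliseconds)
    {
        bool cancelled = false;
        ManualResetEvent done = new ManualResetEvent(false);

        BackgroundWorker worker = new BackgroundWorker();
        worker.WorkerSupportsCancellation = true; // required before calling CancelAsync

        worker.DoWork += delegate(object sender, DoWorkEventArgs e)
        {
            for (int i = 0; i < 20; i++)
            {
                if (worker.CancellationPending)
                {
                    e.Cancel = true;   // surfaces as e.Cancelled in RunWorkerCompleted
                    return;
                }
                Thread.Sleep(100);     // stand-in for downloading one image
            }
        };

        worker.RunWorkerCompleted += delegate(object sender, RunWorkerCompletedEventArgs e)
        {
            cancelled = e.Cancelled;
            done.Set();
        };

        worker.RunWorkerAsync();
        Thread.Sleep(cancelAfterMilliseconds);
        worker.CancelAsync();          // only sets CancellationPending; it does not stop the thread
        done.WaitOne();
        return cancelled;
    }

    static void Main()
    {
        Console.WriteLine(RunAndCancel(250) ? "Cancelled" : "Finished");
    }
}
```

In the real application the CancelAsync call would come from a Cancel button’s click handler, and the CancellationPending check would sit at the top of the foreach loop over the image urls.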

Larger Changes

· Add a way to see the next page of Google image results. The current model only shows results from the top page. Again, not too difficult, but Pri 2.

· Create a nice API to read/write/serialize values that the user enters into the textbox so that way we’ll have a history of those values. Pri 3.

· Add menus for File, Save All (mentioned above), Exit, Options. The Options window is the bigger work item here; it would allow you to enter network configuration information (proxy, port, etc.) and let you “test” your connection. The new System.Net classes make this pretty easy. Pri 3.

· Integrate this with our RSS Screensaver Starter Kit so that your screensaver can automatically change background images based on images in images.google.com. In general, I’d like to see the RSS screensaver use a provider model so that images can be source agnostic whether it be file system, object, database, web service, etc. Pri 3.