Pulling Images from Google using C# Express


*** Updated Jan 23, 2006 ***


I finally got around to updating this for the final release of Visual Studio 2005. Download the updated code here.


******


 


My Channel9 video that uses Visual C# Express to pull images from images.google.com is now available. You can find the video on Channel9 and download the code here.


 


 


Walkthrough


I’ll use this blog post to walk through the basics of the application and point out Whidbey features along the way. You’ll notice a ToolStrip control at the top of the form with an image containing the C# Express logo, a textbox and a button. The ToolStripTextBox uses the new autocomplete features in VS ‘05 (you’ve probably run into this with Windows or Internet Explorer automatically populating entries) with a custom data source. While you can define the entries programmatically by creating an AutoCompleteStringCollection, I decided to hard-code the autocomplete values at design time by changing the AutoCompleteCustomSource property and typing in strings. For example purposes, I’ll search for Steve Ballmer.
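For anyone who prefers the programmatic route mentioned above, here is a minimal sketch of what it could look like. The AutoCompleteSketch class and its method names are made up for illustration and are not part of the sample; the only assumption about the sample is that the search box is a ToolStripTextBox.

```csharp
using System.Windows.Forms;

// Hypothetical helper, not part of the downloadable sample.
static class AutoCompleteSketch
{
    // Builds the suggestion list in code instead of typing it into the designer.
    public static AutoCompleteStringCollection BuildSuggestions(params string[] terms)
    {
        AutoCompleteStringCollection suggestions = new AutoCompleteStringCollection();
        suggestions.AddRange(terms);
        return suggestions;
    }

    // Attaches the suggestions to a ToolStripTextBox such as the search box.
    public static void Attach(ToolStripTextBox searchBox, AutoCompleteStringCollection suggestions)
    {
        searchBox.AutoCompleteCustomSource = suggestions;
        searchBox.AutoCompleteSource = AutoCompleteSource.CustomSource;
        searchBox.AutoCompleteMode = AutoCompleteMode.SuggestAppend;
    }
}
```

From the form's constructor you would call something like AutoCompleteSketch.Attach(txtSearch, AutoCompleteSketch.BuildSuggestions("Steve Ballmer", "Bill Gates")). Building the list in code makes it easier to load the entries from a saved search history later.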


 


 


We return pictures from Google into a ListView control and when a user clicks on the ListView control, we populate a large picture box with the high-resolution image as shown below.



I also added a context menu on the high-resolution image so that you can set the selected picture as your wallpaper.


 


 


 


How it all Works


 


If you’re familiar with how asynchronous operations work with the .NET Framework 1.1, you’ll realize that while it is a consistent programming model, it’s still a little more complex (BeginInvoke, IAsyncResult, WaitHandles, etc.) than you would probably like. VS ‘05 makes async programming a lot easier with the new BackgroundWorker class, which can execute a long-running task on a background thread without the main UI thread feeling like it has hung. The BackgroundWorker class has some important properties and events:



  • WorkerReportsProgress and WorkerSupportsCancellation, which hold true/false values that let the BackgroundWorker report progress or cancel an async process respectively.
  • The DoWork event fires when you call the BackgroundWorker.RunWorkerAsync() method.  You’ll simply need to hook up an event handler to the DoWork event that will do whatever long running task you want to have run on another thread.
  • The ProgressChanged event fires when you call the ReportProgress method inside of the DoWork event and allows you to show the current progress of the long running task.
  • The RunWorkerCompleted event fires when the asynchronous operation has completed. 
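To make the list above concrete, here is a minimal console sketch that wires up all three events. It is not the sample's code: the BackgroundWorkerSketch class, the fake five-step job and the "done: " result string are placeholders, and a ManualResetEvent stands in for the UI message loop that a Windows Forms app would have.

```csharp
using System;
using System.ComponentModel;
using System.Threading;

// Hypothetical console demo, not part of the downloadable sample.
static class BackgroundWorkerSketch
{
    // Runs a fake five-step job on a background thread and returns its result.
    public static string Run(string argument)
    {
        BackgroundWorker worker = new BackgroundWorker();
        worker.WorkerReportsProgress = true;   // required before calling ReportProgress
        string result = null;
        ManualResetEvent done = new ManualResetEvent(false);

        worker.DoWork += delegate(object sender, DoWorkEventArgs e)
        {
            for (int step = 1; step <= 5; step++)
            {
                Thread.Sleep(10);                      // stand-in for slow work
                worker.ReportProgress(step * 20);      // a percentage from 0 to 100
            }
            e.Result = "done: " + (string)e.Argument;  // handed to RunWorkerCompleted
        };
        worker.ProgressChanged += delegate(object sender, ProgressChangedEventArgs e)
        {
            Console.WriteLine("progress {0}%", e.ProgressPercentage);
        };
        worker.RunWorkerCompleted += delegate(object sender, RunWorkerCompletedEventArgs e)
        {
            result = (string)e.Result;
            done.Set();
        };

        worker.RunWorkerAsync(argument);  // raises DoWork on a thread-pool thread
        done.WaitOne();                   // console stand-in for the UI message loop
        return result;
    }
}
```

In a Windows Forms app the ProgressChanged and RunWorkerCompleted handlers run on the UI thread. In a console app like this one there is no UI thread, so they run on thread-pool threads and the progress lines may not print in order.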

As shown below, the user clicks on the search button and we pass the search value to the background worker. The background worker calls Google Images and parses the urls for images, and we then retrieve all of the images into local image objects. We return the list of urls and images back to the original thread. We then bind the images to the ListView control. If a user clicks on a specific image, we download the high-res image and load it into the picture box. If a user right-clicks on the image, they can set the image as the machine’s wallpaper.



The URL for the image and the image itself are stored in a generic dictionary class with the url representing the key and the image representing the value as shown below:


 


internal Dictionary<string, Image> GoogleImages = new Dictionary<string, Image>();


 


RunWorkerAsync accepts an object type parameter so that you can pass variables/data  into the background worker thread.


googleBackgroundWorker.RunWorkerAsync(txtSearch.Text);


 


Note: The code that is executed when the DoWork event fires runs on a separate thread than the UI. It’s important to remember that code on this thread cannot change controls on the original UI thread. Because of this, we’ll get all the images and do the actual databinding in the RunWorkerCompleted event.


 


Since we passed an argument (the text to search) as an object type, we can get the value out of the DoWorkEventArgs e by casting to a string. The GetImagesFromGoogle method calls Google and returns a list of image URLs by applying a regular expression to a string that contains the raw html. 


WebUtilities.GetImagesFromGoogle((string)e.Argument);
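The regular expression inside WebUtilities isn't shown in this post, so the following is only a sketch of the general technique, not the sample's actual pattern. It assumes the image URLs appear in the src attribute of img tags; the ImageUrlSketch class and ExtractImageUrls method are names invented for this example.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not the WebUtilities class from the sample.
static class ImageUrlSketch
{
    // Assumed pattern: captures the quoted src attribute of each <img> tag.
    private static readonly Regex ImgSrc = new Regex(
        "<img[^>]+src\\s*=\\s*[\"']([^\"']+)[\"']",
        RegexOptions.IgnoreCase);

    // Returns the distinct image URLs in the order they appear in the HTML.
    public static List<string> ExtractImageUrls(string html)
    {
        List<string> urls = new List<string>();
        foreach (Match match in ImgSrc.Matches(html))
        {
            string url = match.Groups[1].Value;
            if (!urls.Contains(url))   // skip duplicates, keep page order
            {
                urls.Add(url);
            }
        }
        return urls;
    }
}
```

Screen scraping with a regex is brittle: if the markup of the results page changes, the pattern stops matching and the method simply returns an empty list.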


 


With the list of image URLs parsed from the Google HTML, we can loop through each url and load the image by creating a WebClient object, opening the image url and reading the image bits into a stream object. We then convert the bytes in the stream to an image by calling the Image.FromStream method, and finally we add our url and image to our dictionary. We use a try/catch/finally statement in case the image is no longer available; we really just want to continue rather than do anything fancy here (like stop the application), and the finally block runs after each image to report progress to the UI thread. As we need to make up to 21 HTTP requests (one for the original images.google.com page and up to 20 more, one for each image in the results page), we show progress in a progress bar as each image is loaded.


 


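foreach (string url in imgUrls)
{
     try
     {
       // Image.FromStream needs the stream to stay open for the lifetime of the Image, so it is not closed here
       Stream ImageStream = new WebClient().OpenRead(url);
       Image img = Image.FromStream(ImageStream);
       UrlAndImage.Add(url, img);
     }
     catch { } // skip any image that can no longer be downloaded
     finally
     {
       // runs after every image, whether or not it loaded
       googleBackgroundWorker.ReportProgress(progress++);
     }
}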


   


The ReportProgress method receives an integer value, which we read out of the ProgressChangedEventArgs e variable to update the progress bar as shown below.


 


void backgroundWorker1_ProgressChanged(object sender, ProgressChangedEventArgs e)
{
  googleProgressBar.Value = e.ProgressPercentage;
}


 


Note: The WorkerReportsProgress property must be set to true here to report progress.
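One more detail worth keeping in mind: the value passed to ReportProgress comes back as e.ProgressPercentage, and a ProgressBar defaults to a range of 0 to 100. The loop above passes a running image count, which works as long as the bar's Maximum covers that count. If you would rather report a true percentage, a small helper along these lines would do it. ProgressSketch and ToPercent are names invented for this sketch and are not in the sample.

```csharp
using System;

// Hypothetical helper, not part of the downloadable sample.
static class ProgressSketch
{
    // Converts "completed out of total" into a whole percentage clamped to 0..100.
    public static int ToPercent(int completed, int total)
    {
        if (total <= 0)
        {
            return 0;   // nothing to do yet, avoid dividing by zero
        }
        int percent = completed * 100 / total;   // integer division rounds down
        return Math.Min(100, Math.Max(0, percent));
    }
}
```

Inside the DoWork loop the call would then look something like googleBackgroundWorker.ReportProgress(ProgressSketch.ToPercent(++progress, imgUrls.Count)), assuming imgUrls exposes a Count.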


 


Once we’ve pulled every image, we return our dictionary of urls and images by setting the e.Result property.


e.Result = UrlAndImage;


 


When our BackgroundWorker async process is complete, it raises the RunWorkerCompleted event on the original UI thread. We can now manipulate the Windows Forms controls on this thread. At design time, I set the ListView’s View property to “LargeIcon” and its LargeImageList property to my ImageList control. I now need to loop through each entry in our dictionary and add the key and the image to the ImageList control, which we use to store the images. The key from our dictionary is important here, as we use it to map the right url to the right image in our ListView. The ListView.Items.Add() overload I call expects a string of text to describe the ListView item, which I set to the url, and a string that identifies the image key in the ImageList, as discussed above.


 


void backgroundWorker1_RunWorkerCompleted(object sender, RunWorkerCompletedEventArgs e)
{
  googleProgressBar.Visible = false;
  GoogleImages = (Dictionary<string, Image>)e.Result;
  foreach (string key in GoogleImages.Keys)
  {
    lowResImageList.Images.Add(key, GoogleImages[key]);
    googlePicturesListView.Items.Add(key, key);
  }
}


 


Debugging The Code


 


Visualizers


In the Channel9 video I showed how you can add a breakpoint to code that’s running on the background worker thread and I show the HTML visualizer. If you’ve ever written code to screen scrape a web page, you’ll definitely appreciate the string visualizers we’ve added in 2005. You can add a breakpoint to your code and get a rich view of your in-memory variables. Below are the screenshots from the HTML and Text visualizers which show you exactly what Google returned either in a browser view or in a straight text view. Very cool.


 


HTML Visualizer for Google Request



Text Visualizer for Google Request



 


Modifications/Wish List


It’s hard to release a sample as you know what’s ugly under the covers and all the cool things you planned to do, but things like sleep get in the way. If I had some free time, here’s what I would add/change:


 


Small Changes


·        Change the backgroundWorker method names


·        The OpenRead(url) method on the WebClient class (I thought this was more readable than the request/response model) expects a string data type rather than a Uri class. Because of this, I decided not to use the Uri class and used the url variable name to make the distinction clear. This isn’t much of a problem really, but it would be something I would consider switching in the future. The change would look like the following:


internal Dictionary<Uri, Image> GoogleImages = new Dictionary<Uri, Image>();


·         Add a way to cancel the background event. The code is trivial, it just got Pri 2’d.


·        Format the ListView a bit. Rather than show the full url, simply show the file name. Add the ability to turn checkboxes on and off so that you can File…Save All images. These were pretty much Pri 3’s.


·        Add a status bar and move the Progress bar into the status bar. Use the status bar to not only show progress, but remove the message box pop-ups when the Wallpaper changed and instead send that message to a label on the status bar. Pri 3.
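Since cancellation is on the list above, here is roughly what it could look like. This is a sketch under my own assumptions, not code from the sample: the CancelSketch class is hypothetical, the Sleep call stands in for one image download, and the wait handles replace the UI message loop so it can run as a console app.

```csharp
using System.ComponentModel;
using System.Threading;

// Hypothetical console demo, not part of the downloadable sample.
static class CancelSketch
{
    // Starts a long loop, cancels it, and reports whether the worker saw the cancel.
    public static bool RunAndCancel()
    {
        BackgroundWorker worker = new BackgroundWorker();
        worker.WorkerSupportsCancellation = true;   // required before calling CancelAsync
        bool cancelled = false;
        ManualResetEvent started = new ManualResetEvent(false);
        ManualResetEvent done = new ManualResetEvent(false);

        worker.DoWork += delegate(object sender, DoWorkEventArgs e)
        {
            started.Set();
            for (int i = 0; i < 1000; i++)
            {
                if (worker.CancellationPending)   // set by CancelAsync
                {
                    e.Cancel = true;              // surfaces as e.Cancelled below
                    return;
                }
                Thread.Sleep(10);                 // stand-in for one image download
            }
        };
        worker.RunWorkerCompleted += delegate(object sender, RunWorkerCompletedEventArgs e)
        {
            cancelled = e.Cancelled;
            done.Set();
        };

        worker.RunWorkerAsync();
        started.WaitOne();
        worker.CancelAsync();   // only requests cancellation; DoWork must check the flag
        done.WaitOne();
        return cancelled;
    }
}
```

In the real app the CancelAsync call would sit behind a Cancel button, and the RunWorkerCompleted handler should check e.Cancelled before reading e.Result, because reading Result after a cancel throws an exception.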


 


Larger Changes


·        Add a way to see the next page of Google image results. The current model only shows results from the top page. Again, not too difficult, but Pri 2.


·        Create a nice API to read/write/serialize values that the user enters into the textbox so that way we’ll have a history of those values. Pri 3.


·        Add Menus for File, Save All (mentioned above), Exit, Options. The Options window is the bigger work item here and it would allow you to enter network configuration information (proxy, port, etc.) and let you “test” your connection. The new System.Net classes make this pretty easy. Pri 3.


·        Integrate this with our RSS Screensaver Starter Kit so that your screensaver can automatically change background images based on images in images.google.com. In general, I’d like to see the RSS screensaver use a provider model so that images can be source agnostic, whether the source is the file system, an object, a database, a web service, etc. Pri 3.


 

Comments (38)

  1. That’s just awesome. It’s too bad Google doesn’t serve XML and that we have to resort to scraping HTML.

  2. technostan says:

    Nice work. Can’t wait to check it out.

    This should help me a lot since I’m doing something similar with the Visual Web starter kit.

  3. Bob Hawkey says:

    Interesting because I just wrote the same program in VB 6.0 using the browser and Inet controls. I was tired of seeing WMPlayer’s visualizations in music player programs and thought "When you’re playing Frank Sinatra why not look up a bunch of images and make a slide show!" Once I plunder google I put the images in a folder and then invoke a program that pop the images onscreen ala PointCast (remember Pointcast… sigh). I’d like to get a little more animation going but VB is pretty slack at that (at least at my level of expertise). Think about it for your next project. It would make a sweet WMPlayer plugin!

  4. Whenever you are working with a computer language for a while you want to get a bit creative or sazzy to say something and do new things with it, so in order to do that you need to get inspiration…

  5. Mabsterama says:

    Last night I set up an account with Google and wrote a small program to search the web using the Google…

  6. <backstory to get motivation on why I did this>It’s review time here for Microsoft employees, and…

  12. Anon says:

    A screensaver (slideshow) version available at:

    http://www.takiweb.com/~samw/gi/givb.htm

    written in VB .NET

    required .NET Framework 2.0+

    [ Google Images Screensaver Slideshow poetry fetch picture screen saver

    pull slide show vb ]

  13. Sam Weera wrote a VB mod using some of my old Google Images screenscraping demo code and created a Google…

  14. craig says:

    There is an easier way.

    WebClient().DownloadFile(url, filename) would do that in one step.  You only need the image object if you want to manipulate it 1st – which from my quick look at the sample you don’t appear to need to do.

  15. Hey Craig,

    I’m familiar with the DownloadFile() method, but it has three drawbacks – 1) saving local files, 2) custom code for filenames, 3) file deletion

    1) Saving Local files

    The downloadfile method requires that you store files on the hard drive, so if you go through, say 5 searches, you could have 5 x 25 = 125 images on your hard drive that, well, you may not want, plus the issue with resolving what location to save to (the specialfolders dir makes this easy though) and what to do if there is no file space. Which leads to #2

    2) Custom code for file names

    As you know, the DownloadFile method requires that you store files on the hard drive with a file name parameter, which has to be unique. In short, you end up saving all these files to disk and then having to write custom code to give each file a unique file name that will likely be useless to the user, when the reality is that if you want to save a picture you could do so directly rather than forcing each picture to be saved. The other advantage is that keeping images in memory would improve performance, since it doesn’t have to serialize the file bits onto disk, which is an expensive operation.

    3) File Deletion

    Now that I have 125+ images on my drive because I searched for five things, as the app developer I probably want to have some code that goes through and deletes old files and have to handle managing which files should be deleted and which shouldn’t.

    Thanks for the feedback though, but I thought it would be easier to just keep everything in memory 🙂

  16. With a bit of refactoring I was able to get this to work through a corporate proxy server (removed the Error 407 Proxy Authentication Required). I went through and replaced:

    return m_WebClient.OpenRead(url);

    with

    WebClient m_WebClient = new WebClient();

    m_WebClient.Proxy.Credentials = System.Net.CredentialCache.DefaultCredentials;

    return m_WebClient.OpenRead(url);

    The Key part is setting the DefaultCredentials. Works a treat, thanks for the code.

  17. James Andrews says:

    How would you handle turning strict filtering off?

  18. Sam Jackson says:

    Can’t get this to work at all.  Changed the Rafting Issue then I receive the following error.  Please Help!

    Unable to cast object of type ‘System.Windows.Forms.ToolStripContainer’ to type ‘System.ComponentModel.ISupportInitialize’.

    Also, what to do about ‘KeySettings’ once rafting has been changed?  Thank you for your help and code.

    SJ

  19. Hey Sam – Are you sure you are using the updated code? I updated the code here and fixed all of those issues – http://www.danfernandez.com/view/view.aspx?ID=170

    Let me know if this fixes your problem

    Thanks,

    -Dan

  20. Matt says:

    You might want to update this to use images.live.com 😛

  21. sn says:

    I debugged it and it’s just not showing any images when I search… 🙁 I don’t know what’s wrong…

  22. Roger says:

    Same problem here, no images showing.

  23. Henry says:

    now I’ve tried more then 4 versions of your app, even this last one

    first of course the problems with container-toolstrip-beta version, but now your app just keeps hanging all the way

  24. Henry says:

    no images showing indeed … no progressbar etc.

  25. toebot says:

    Hey Dan, just wanted to let you know the program doesn’t appear to do anything, but runs fairly smoothly (I can see it attempts the connection, retrieves hardly any data, then does nothing).

    So it’s my guess that one of your RegExps in the WebUtilities class needs to be updated.

    Assuming you care to…

    P.S.  Love your blog!  Tons of good stuff — and this novice programmer appreciates it.

  26. batman says:

    Hi Dan,

    I’ve got your latest code. It run well, but after that it didnt show any results. I’m using VS2005. Could u see and check? Thanks a lot

  27. Batman says:

    I debugged and found that when run to  imageBackgroundWorker.RunWorkerAsync(txtSearch.Text);

    after that it didnt go to imageBackgroundWorker_DoWork(object sender, DoWorkEventArgs e), otherwise it run to  imageBackgroundWorker_ProgressChanged  and then imageBackgroundWorker_RunWorkerCompleted

  28. NeoTopian says:

    Can possibly post the full source code to check with what i built as it keeps crashing.

  29. Richard says:

    When I try to download the updated code I receive the following error:

    Message: An error has occurred while establishing a connection to the server. When connecting to SQL Server 2005, this failure may be caused by the fact that under the default settings SQL Server does not allow remote connections. (provider: Named Pipes Provider, error: 40 – Could not open a connection to SQL Server)

    Source: .Net SqlClient Data Provider

  30. bj says:

    is it possible to get this picture???

    http://images.mt.net.mk/minimizeImage.aspx?img=jankulovska211.jpg&w=34&h=34

    i tried and from all the pictures only the ones with parameters dont download
