HttpWebRequest 404 Error

A while back I developed an application that made requests to several web sites, scraped some data, and stored it in a database. The application worked great for several months without issue. Then one day it stopped getting successful responses from the sites I was scraping. The first thing I checked was whether the publisher had updated their site and changed any critical HTML on me, but that was not the cause. So I began debugging and found that the failure occurred on the call that requests the HTML page I needed. I kept getting an exception that stated “The remote server returned an error: (404) Not Found.” The object I was using to make the request was “HttpWebRequest”, and as I said, it had been working fine until then. The code below is what I started with and had been running for months:

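' Assumes Imports System.Net, System.Text and System.Globalization at the top of the file.
Private Function GetHTMLData(ByVal source As String) As String

    Dim request As HttpWebRequest = CType(WebRequest.Create(source), HttpWebRequest)

    Dim encode As Encoding = System.Text.Encoding.UTF8

    ' Using blocks release the response and the reader even if reading throws.
    Using response As WebResponse = request.GetResponse()
        Using r As New IO.StreamReader(response.GetResponseStream(), encode)
            ' Read the whole page and upper-case it.
            Return r.ReadToEnd().ToUpper(CultureInfo.CurrentCulture)
        End Using
    End Using

End Function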

After a couple of days of searching the Internet I still had no resolution. What made matters worse was that I could take the exact same URL my code was requesting, paste it into my web browser, and see the data just fine. Same machine, same network, so what could be happening? It did not help that a “404” normally means “Page Not Found”, yet the page clearly existed and contained all the information I was expecting. I finally decided to fire up Fiddler to see what was happening with my request. After making the request from my browser and then again from my application, I compared the two and found that the only difference was the “User-Agent” header value. My code did not set this value, whereas the browser clearly does. It turns out I can set the “User-Agent” value to whatever I like; as long as it is not blank, I get data returned to me. So I inserted one line of code, which resolved the matter entirely, and now my application runs as smooth as a …. Below is what the code looks like now:

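' Assumes Imports System.Net, System.Text and System.Globalization at the top of the file.
Private Function GetHTMLData(ByVal source As String) As String

    Dim request As HttpWebRequest = CType(WebRequest.Create(source), HttpWebRequest)

    ' The fix: send a non-blank User-Agent header. Any non-blank value worked for me.
    request.UserAgent = "Fiddler"

    Dim encode As Encoding = System.Text.Encoding.UTF8

    ' Using blocks release the response and the reader even if reading throws.
    Using response As WebResponse = request.GetResponse()
        Using r As New IO.StreamReader(response.GetResponseStream(), encode)
            ' Read the whole page and upper-case it.
            Return r.ReadToEnd().ToUpper(CultureInfo.CurrentCulture)
        End Using
    End Using

End Function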
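
For anyone chasing a similar failure, the status the server sends back can also be inspected in code instead of only in Fiddler. What follows is a minimal sketch, not the code from the application above: the module and function names (FetchDiagnostics, CreateRequest, DescribeFetch) and the "MyScraper/1.0" value are hypothetical and chosen purely for illustration. It assumes the same HttpWebRequest approach and relies on the fact that, for protocol errors such as a 404, the WebException carries the server's response.

```vbnet
Imports System.IO
Imports System.Net

Module FetchDiagnostics

    ' Hypothetical helper: builds the request and sets the User-Agent header.
    ' Creating the request does not contact the server yet.
    Function CreateRequest(ByVal source As String, ByVal userAgent As String) As HttpWebRequest
        Dim request As HttpWebRequest = CType(WebRequest.Create(source), HttpWebRequest)
        request.UserAgent = userAgent
        Return request
    End Function

    ' Hypothetical helper: returns the page body on success,
    ' or a short description of the failure otherwise.
    Function DescribeFetch(ByVal source As String, ByVal userAgent As String) As String
        Dim request As HttpWebRequest = CreateRequest(source, userAgent)

        Try
            Using response As WebResponse = request.GetResponse()
                Using reader As New StreamReader(response.GetResponseStream())
                    Return reader.ReadToEnd()
                End Using
            End Using
        Catch ex As WebException
            ' For protocol errors (404 and friends) ex.Response holds the server's reply.
            Dim errorResponse As HttpWebResponse = TryCast(ex.Response, HttpWebResponse)
            If errorResponse Is Nothing Then
                ' No HTTP reply at all, e.g. a DNS failure or a timeout.
                Return "No HTTP response: " & ex.Status.ToString()
            End If

            Dim message As String = "HTTP " & CInt(errorResponse.StatusCode).ToString() &
                                    " " & errorResponse.StatusDescription
            errorResponse.Close()
            Return message
        End Try
    End Function

End Module
```

Calling DescribeFetch twice against the same URL, once with an empty user agent and once with something like "MyScraper/1.0", and comparing the two results is a quick way to confirm whether a site is rejecting requests based on that header.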