The VB version of the Blog Crawler


This is the VB.NET 2005 version of the Blog Crawler. It’s based on the FoxPro version, but it uses SQL Server Everywhere, so you can deploy it on your mobile device! It crawls a blog and stores all entries in a SQL Server Everywhere table, including blog comments and Cascading Style Sheets.


I had to wait to post this blog entry because the public release of the SQL Everywhere CTP is today (announced at Tech Ed)!


 


To run it, you only need to copy a few files from this link (1.6 megabytes) into a directory on your machine and start BlogCrawl.Exe. No registration or install of any kind is required, except for the .NET Framework 2.0 (which is installed with Visual Studio 2005, or you can download the runtime). The source code is here and can be unzipped into the same folder. The program (including SQL Everywhere) is totally isolated to the install folder, except for the My.Settings XML file, which stores your preferences in your local settings folder. It doesn’t touch your registry or install any other files.


 


When you start the program, the top part shows a grid of already crawled blog posts. The bottom part shows each post in a web control as it looked at the time of download. The links on the page are live. The first time you start it, there will be no data. If you click the Crawl button, it will start a background thread that scans the blog and downloads any entries that have not been downloaded yet. The status bar shows crawl progress.


 


It takes about 20 minutes to crawl my blog and download my 240 posts. You can stop and continue the background thread at any time by clicking the same Crawl button. The data is stored as a SQL Mobile database in the same folder in a file called <blogname>.sdf.
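
As a rough illustration of the crawl behavior (a background thread, skipping entries that are already stored, and one button that both stops and continues), here is a minimal Python sketch. It is not the VB source: the `Crawler` class, the `fetch` callable, and the dictionary standing in for the .sdf table are all hypothetical names made up for this example.

```python
import threading


class Crawler:
    """Hypothetical sketch: crawl URLs on a background thread, skipping stored ones."""

    def __init__(self, fetch, urls, store):
        self.fetch = fetch        # assumed callable: url -> page content
        self.urls = list(urls)    # links discovered on the blog
        self.store = store        # dict standing in for the database table
        self._stop = threading.Event()
        self._thread = None

    def is_running(self):
        return self._thread is not None and self._thread.is_alive()

    def toggle(self):
        """Behaves like the single Crawl button: stop if running, else start or continue."""
        if self.is_running():
            self._stop.set()
            self._thread.join()
        else:
            self._stop.clear()
            self._thread = threading.Thread(target=self._run, daemon=True)
            self._thread.start()

    def _run(self):
        for url in self.urls:
            if self._stop.is_set():
                break             # the user asked to stop; stored entries are kept
            if url in self.store:
                continue          # already downloaded on an earlier crawl
            self.store[url] = self.fetch(url)
```

Because finished entries stay in the store, continuing after a stop only fetches what is still missing.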


 


You can type a search string in the textbox and click the Search button to limit the records in the grid to those posts containing the search string.
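
A search like this can be expressed as a parameterized `LIKE` query. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in for SQL Server Everywhere, and the `posts` table with its `url`, `title`, and `html` columns is an assumption for illustration, not the program's real schema.

```python
import sqlite3


def make_demo_db():
    """Build a tiny in-memory table with made-up rows (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (url TEXT PRIMARY KEY, title TEXT, html TEXT)")
    conn.executemany(
        "INSERT INTO posts (url, title, html) VALUES (?, ?, ?)",
        [
            ("http://example.com/1", "Crawler", "<p>Uses regular expressions</p>"),
            ("http://example.com/2", "Settings", "<p>Persists the splitter distance</p>"),
        ],
    )
    return conn


def search_posts(conn, text):
    """Return (url, title) for every post whose stored HTML contains text."""
    pattern = "%" + text + "%"   # note: % and _ inside text act as wildcards
    rows = conn.execute(
        "SELECT url, title FROM posts WHERE html LIKE ? ORDER BY url",
        (pattern,),
    )
    return rows.fetchall()
```

Passing the search string as a parameter rather than concatenating it into the SQL avoids quoting problems when the text contains an apostrophe.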


 


It’s customized for blogs hosted on http://blogs.msdn.com: the code that parses out the publication date of a blog entry, and that decides which pages are blog posts and which are just intermediate pages (like the list of February posts), expects that site's layout. I haven’t tested it with all the various blog CSS styles, but the source can be modified.
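
One way to tell posts from intermediate pages is to look at the URL. The Python sketch below assumes, hypothetically, that a post URL ends in `/archive/yyyy/mm/dd/name.aspx` while a monthly listing ends in `/archive/yyyy/mm.aspx`. That pattern and the `post_date` helper are assumptions for illustration; the actual program may use different rules.

```python
import re
from datetime import date

# Assumed URL shape for a single post: .../archive/2006/06/13/some-name.aspx
POST_URL = re.compile(r"/archive/(\d{4})/(\d{2})/(\d{2})/[^/]+\.aspx$", re.IGNORECASE)


def post_date(url):
    """Return the publication date if url looks like a post, else None."""
    match = POST_URL.search(url)
    if match is None:
        return None               # an intermediate page, e.g. a monthly listing
    year, month, day = (int(part) for part in match.groups())
    try:
        return date(year, month, day)
    except ValueError:
        return None               # digits that do not form a real calendar date
```

A crawler built this way would still follow links found on intermediate pages, but would only store pages for which `post_date` returns a date.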


 


The program defaults to crawling my blog, but allows you to switch to other blogs. Click the Blog Options button to crawl your favorite blog.


 


If you change the Followed value for a particular entry to 0, the next crawl will recrawl that link, which is useful if you want to pick up the latest comments.
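
The Followed flag works like a simple work queue: a crawl visits only the links whose flag is 0. Here is a minimal Python sketch of that idea, with `sqlite3` standing in for SQL Server Everywhere. The `links` table and the helper function names are hypothetical; only the Followed column comes from the program itself.

```python
import sqlite3


def make_links_db(urls):
    """Create an in-memory links table in which every link starts unfollowed."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE links (url TEXT PRIMARY KEY, followed INTEGER NOT NULL DEFAULT 0)"
    )
    conn.executemany("INSERT INTO links (url) VALUES (?)", [(u,) for u in urls])
    return conn


def pending_links(conn):
    """Links the next crawl would visit: those with followed = 0."""
    rows = conn.execute("SELECT url FROM links WHERE followed = 0 ORDER BY url")
    return [row[0] for row in rows]


def mark_followed(conn, url):
    """Record that a link has been downloaded."""
    conn.execute("UPDATE links SET followed = 1 WHERE url = ?", (url,))


def request_recrawl(conn, url):
    """Reset the flag so the next crawl fetches this link again."""
    conn.execute("UPDATE links SET followed = 0 WHERE url = ?", (url,))
```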


 


It uses the new My.Settings feature to persist user settings, such as window position and which blog was last crawled. The new SplitContainer class lets you move the splitter bar between the grid and the web control, and the SplitterDistance is persisted in My.Settings.


 


One of my machines was playing a sound while my web crawler was crawling. The culprit was Control Panel->Sounds->Sound->Windows Explorer->Information Bar.


 


 


See also


SQL Mobile Books Online


Use Regular Expressions to get hyperlinks in blogs


 


 

Comments (19)

  1. Alan Stevens says:

    Calvin,

    I’ve been talking to Steve Lasker and other members of the SQL Everywhere team at TechEd.  They mentioned that they do not support ODBC.  What are the implications for using existing SQL Pass-Through code against SQL Everywhere?

    ++Alan

  2. michkap says:

    Cool app, Calvin!

    I noticed mine takes a bit longer than 20 minutes, should I file a  bug somewhere? 🙂

  3. I’ve had several requests that require customizing the Blog Crawler.

    The entire source code…

  4. The best validation I’ve seen for SQL Server Everywhere is when Calvin Hsia, Technical Lead from Fox…

  5. noahc says:

    Why not use SQL express?  🙂

  6. noahc says:

    It wouldn’t crawl this site.  It kept cutting off the /coad portion.  Wuzzup?

    http://msmvps.com/blogs/coad

  7. noahc says:

    Hey Calvin, thanks for all the fixes you made to support http://msmvps.com/blogs/coad!  This is a very handy tool and something I’ve been wishing for a long time.  Again, thank you!

  8. I’ve been looking for awhile for a way to back up my blog by capturing each post in a nice, MHTML…

  9. Check out this awesome little utility I found. Its not something I would use all the time, but I you…

  10. Calvin has written a blog crawler with both VFP and VB.NET versions that allows you to back up your own…

  11. My prior post (Create a .Net UserControl that calls a web service that acts as an ActiveX control to…

  12. Windows Mobile 5.0 comes with a Web Browser (v6 is due out any day now). It runs on Pocket PCs and SmartPhones.

  13. My prior post ( Create your own Test Host using XAML to run your unit tests ) shows how to create a form

  14. I received a question: Simply, is there a way of interrupting a vfp sql query once it has started short

  15. darshana says:

    Hi Calvin… Is there any way I can view the source code?

  16. tanwailoon says:

    hi calvin good work though there.. i cant seems to run the program..whenever i run it,

    this problem came out

    An attempt was made to load a program with an incorrect format. (Exception from HRESULT: 0x8007000B)