Use Regular Expressions to get hyperlinks in blogs


At the Southwest Fox conference, I presented a sample that calls a VB.NET server to do regular expression matching.


Here’s the sample I used. It gets some HTML from my blog, finds all the hyperlinks (by looking for HREF attributes), and puts them into a VFP table:


 


First, create a VB server as described in the post "A Visual Basic COM object is simple to create, call and debug from Excel".


 


The VFP code gets the blog page as HTML, then passes that HTML to the VB server, which does the regular expression matching and returns the results as XML:


 


LOCAL oVB as vbcom.ComClass1
oVB=CREATEOBJECT("VBCom.ComClass1")
LOCAL oHTTP as "winhttp.winhttprequest.5.1"
oHTTP=NEWOBJECT("winhttp.winhttprequest.5.1")
oHTTP.Open("GET","http://blogs.msdn.com/calvin_hsia",.f.)  && .f. = synchronous request
oHTTP.Send()
cHTML=oHTTP.ResponseText
cXML=oVB.RegEx(cHTML)  && the VB server returns the matches as XML
XMLTOCURSOR(cXML)  && load the XML rows into a cursor
BROWSE LAST NOWAIT
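To make the XML-to-table step more concrete, here is a rough sketch in Python (not VFP) of the mapping that the XMLTOCURSOR call relies on: each Row element becomes one record and each child element becomes a field. The function name `xml_to_rows` and the sample string are made up for illustration, and this is only an analogy, not how XMLTOCURSOR is implemented.

```python
import xml.etree.ElementTree as ET

def xml_to_rows(xml_text):
    """Turn <VFPData><Row><Field>value</Field></Row>...</VFPData> into a list of dicts."""
    root = ET.fromstring(xml_text)
    # One dict per Row; each child element name is used as the field name.
    return [{field.tag: (field.text or "") for field in row}
            for row in root.findall("Row")]

# Hypothetical sample in the same shape the VB server returns.
sample_xml = ('<VFPData>'
              '<Row><RegEx>href="a.htm"</RegEx></Row>'
              '<Row><RegEx>HREF=b.htm</RegEx></Row>'
              '</VFPData>')

for record in xml_to_rows(sample_xml):
    print(record["RegEx"])
```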


 


 


 


Add a method to the VB class (this version works in VS 2003):


 


Imports System.Text.RegularExpressions
Imports System.Xml

    Public Function RegEx(ByVal cHtml As String) As String
        ' Match href="..." (quoted) or href=... (unquoted); group 1 captures the URL
        Dim cregex As Regex = New Regex("href\s*=\s*(?:""(?<1>[^""]*)""|(?<1>\S+))", _
             RegexOptions.IgnoreCase Or RegexOptions.Compiled)
        Dim MatchCollection As MatchCollection = cregex.Matches(cHtml)
        Dim sb As New System.Text.StringBuilder
        Dim xw As XmlTextWriter = New XmlTextWriter(New System.IO.StringWriter(sb))

        xw.WriteStartElement("VFPData")
        For Each m As Match In MatchCollection
            xw.WriteStartElement("Row") ' for each Row
            xw.WriteStartElement("RegEx") ' field name
            xw.WriteString(m.Value) ' the whole match, including the href= part
            xw.WriteEndElement()
            xw.WriteEndElement()
        Next
        xw.WriteEndElement()
        xw.Flush() ' make sure everything has been written to sb
        Return sb.ToString
    End Function
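The pattern is easier to follow with a small input. The following is a Python sketch of the same idea, not the VB code itself: Python's `re` module cannot reuse group number 1 in both branches the way the .NET pattern does, so the quoted and unquoted cases get separate groups here. The names `regex_to_xml`, `href_targets` and the sample HTML are invented for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Quoted value in group 1, unquoted value in group 2 (assumption: this split
# stands in for the .NET pattern's shared group 1).
HREF_RE = re.compile(r'href\s*=\s*(?:"([^"]*)"|(\S+))', re.IGNORECASE)

def regex_to_xml(html):
    """Build <VFPData><Row><RegEx>match</RegEx></Row>...</VFPData> from the matches."""
    root = ET.Element("VFPData")
    for m in HREF_RE.finditer(html):
        row = ET.SubElement(root, "Row")
        # group(0) is the whole match, the counterpart of m.Value in the VB code.
        ET.SubElement(row, "RegEx").text = m.group(0)
    return ET.tostring(root, encoding="unicode")

def href_targets(html):
    """Return only the captured URL part of each match."""
    return [m.group(1) if m.group(1) is not None else m.group(2)
            for m in HREF_RE.finditer(html)]

sample = '<a href="http://example.com/a">A</a> <A HREF=b.htm >B</A>'
print(regex_to_xml(sample))
print(href_targets(sample))
```

Because the VB method writes the whole match, each row holds text such as `href="..."` rather than the bare URL; writing the captured group instead would store only the link target. Note also that the unquoted branch `\S+` runs up to the next whitespace, so an unquoted value followed directly by `>` picks up the trailing markup as well.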


 


 


Of course, when I did the demo, I used a newer version of VB and ran a SQL-style Select over the regular expression results. I also used XLINQ, the new XML features of LINQ:


 


        ' Copy the matches into a generic list so they can be queried
        Dim aList As New List(Of Match)
        For Each m As Match In MatchCollection
            aList.Add(m)
        Next

        ' Sort the matches by their text
        Dim res = From p In aList Order By p.ToString() Select p
        Dim xmlMain = <VFPData/>
        For Each item In res
            Dim xRow = <Row/>
            xRow.Add(<RegEx><%= item.Value %></RegEx>)
            xmlMain.Add(xRow)
        Next
        Return xmlMain.ToString
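For comparison, the sort-then-build-XML step can be sketched in a few lines of Python. This is only an illustration of the technique, not the demo code: the function name `sorted_matches_xml` and the sample HTML are made up, and Python's `sorted` compares strings by code point, so the ordering may differ from what VB's Order By produces for some inputs.

```python
import re
import xml.etree.ElementTree as ET

# Same href pattern, written without capture groups so findall returns whole matches.
HREF_RE = re.compile(r'href\s*=\s*(?:"[^"]*"|\S+)', re.IGNORECASE)

def sorted_matches_xml(html):
    """Sort the href matches as strings, then wrap each one in Row/RegEx elements."""
    root = ET.Element("VFPData")
    for text in sorted(HREF_RE.findall(html)):
        row = ET.SubElement(root, "Row")
        ET.SubElement(row, "RegEx").text = text
    return ET.tostring(root, encoding="unicode")

sample = '<a href="b.htm">x</a> <a href="a.htm">y</a>'
print(sorted_matches_xml(sample))
```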


 


 


 


 

Comments (4)

  1. davidfung says:

    This and the previous VBCOM Debugging post show that VFP and VB.NET are quite interoperable. VB.NET COM objects sounds like a good way to allow VFP to make use of .NET features down the road…

