Building an RSS reader in SQL Server 2005

People used to write "Hello, World!" applications in order to demo new languages and technologies.  It seems that all of the demos de jour are RSS applications.  Well, in order to continue the trend, I have decided to write a quick RSS reader which uses 2 major new SQL Server 2005 features: the XML Datatype and SQLCLR user defined functions.  What we are going to do is open an OPML file, retrieve the URLs to all of the RSS feeds, and then return each blog entry as its own row.  You can then view the feeds using your query editor of choice (I am using the new SQL Server Management Studio).  Oh yeah, and we are going to do it in ~50 lines of code.

First step is to build the CLR component which will read XML files from a URL.  We will need this in order to get the OPML file as well as the feeds themselves.  The code is very straightforward:

 using System;
using System.Data.SqlTypes;
using System.Xml;
using System.Xml.XPath;
using Microsoft.SqlServer.Server;

public partial class UserDefinedFunctions
{
    [Microsoft.SqlServer.Server.SqlFunction()]
    public static SqlXml GetXmlDocument(SqlString documentPath)
    {
        if (documentPath == null || documentPath.IsNull)
        {
            return null;
        }

        try
        {
            return new SqlXml(new XmlTextReader(documentPath.Value));
        }
        catch (Exception)
        {            
            return SqlXml.Null;
        }        
    }
};

Pretty simple. We simply pass in the URL, and we get back out the XML. The only gotcha here is that to actually run this code from within SQL, we need external access on the assembly.  I usually do this from within Visual Studio.  You can set the "Permission Level" property to "External" on the project which you are writing your SQLCLR UDF in. There are also some security requirements on the database end that you need to configure to allow assemblies external access. To get this to work I usually mark the database as trustworthy using ALTER DATABASE. Now, you can use "Deploy Project" from within Visual Studio to get your code running in SQL Server. From here on out it is all T-SQL and XQuery:

 select 
   feedXml.value('(/rss/channel/title/text())[1]', 'nvarchar(128)') as feedTitle,
  item.ref.value('(title/text())[1]', 'nvarchar(128)') as itemTitle,
  item.ref.value('(description/text())[1]', 'nvarchar(max)') as itemBody
from 
(
    select dbo.GetXmlDocument(
      outlines.ref.value('@xmlUrl', 'nvarchar(1024)')
     ) as feedXml
    from
    (
       select dbo.GetXmlDocument('https://blogs.msdn.com/Opml.aspx?GroupID=2') as opml
  ) xml(opml)
 cross apply opml.nodes('/descendant::outline[@xmlUrl]') outlines(ref)
) feeds(feedXml)
cross apply feedXml.nodes('/rss/channel/item') as item(ref)

So lets take it apart a bit and figure out exactly how it is working. Queries like this are usually easier to understand by reading them from the middle out, so lets start here:

  select dbo.GetXmlDocument(
      outlines.ref.value('@xmlUrl', 'nvarchar(1024)')
     ) as feedXml
    from
    (
       select dbo.GetXmlDocument('https://blogs.msdn.com/Opml.aspx?GroupID=2') as opml
  ) xml(opml)
 cross apply opml.nodes('/descendant::outline[@xmlUrl]') outlines(ref)

So this code is invoking the CLR UDF that we wrote to retrieve the OPML file, and then we are using XQuery to find all of the references to feeds. We use the XML datatype nodes() method to return all of the "outline" elements which have an @xmlUrl attribute (which should contain the URL to the actual feed), and then we pass the value of that @xmlUrl attribute into the CLR UDF again. The result of this is a rowset where each feed is contained in an XML instance in a single row. We have one more step, and that is to use the nodes() method again to get each item in each feed:

 select 
    feedXml.value('(/rss/channel/title/text())[1]', 'nvarchar(128)') as feedTitle,
  item.ref.value('(title/text())[1]', 'nvarchar(128)') as itemTitle,
  item.ref.value('(description/text())[1]', 'nvarchar(max)') as itemBody
from 
(
    -- Code that gets feeds goes here
) feeds(feedXml)
cross apply feedXml.nodes('/rss/channel/item') as item(ref)

Again, this is pretty straightforward. The inner subquery is returning all of our feeds as a single row, so we have to apply the nodes() method to pull out each item. We then use the XML value() to grab the title of the feed (notice that we aren't referencing the item table alias, but the original feeds table alias), the title of the particular entry, and the entry text.

Ok ok, so is it really practical? Not really. You don't get any pretty HTML formatting of the results or anything like that. Is it cool though? I think so.  It also provides some good background for us to do more interesting things with this data, which we will look at in some later entries.