Read an RSS feed from SSIS


The 3.5 .NET framework introduced a new SyndicationFeed class which simplifies the process of reading (and creating) RSS feeds. To try it out, I threw together a simple SSIS package which reads from an RSS feed using a Script Component.

image

image 

Things to note:

  • I have an Http Connection Manager, but I’m not actually using its connection object in my script. I use it just for its connection string (the URL to the RSS feed)
  • You need to add a reference to System.ServiceModel.Web assembly to access the SyndicationFeed class
  • The Script Task / Script Component will target the 2.0 .NET framework by default. To use the SyndicationFeed class, you’ll need to use the 3.5 .NET framework. You can change this setting by right clicking on the script component’s Project, and selecting properties…

image

The code for the script (notice I’m using C# which is only supported in SSIS 2008):

[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
    private string url = string.Empty;
    private SyndicationFeed feed = null;
    private XmlReader reader = null;

    public override void PreExecute()
    {
        base.PreExecute();

        // Get the URL from the Http Connection Manager.
        // Note, we're not actually using the connection manager's connection object,
        // just it's URL setting. This is because using the .NET connection classes
        // give us more flexibility.
        reader = XmlReader.Create(Connections.HttpConnection.ConnectionString);
        feed = SyndicationFeed.Load(reader);
    }

    public override void PostExecute()
    {
        base.PostExecute();

        reader.Close();
    }

    public override void CreateNewOutputRows()
    {
        if (feed != null)
        {
            foreach (var item in feed.Items)
            {
                Output0Buffer.AddRow();

                Output0Buffer.Title = item.Title.Text;
                Output0Buffer.PublishDate = item.PublishDate;
                Output0Buffer.LastUpdatedTime = item.LastUpdatedTime;
                Output0Buffer.Id = item.Id;
                Output0Buffer.Summary.AddBlobData(ConvertToBytes(item.Summary));

                string authorName = string.Empty;
                if (item.Authors.Count > 0)
                {
                    // take the first author
                    authorName = item.Authors[0].Name;
                }
                Output0Buffer.Author = authorName;                
            }

            Output0Buffer.SetEndOfRowset();
        }
    }

    private byte[] ConvertToBytes(TextSyndicationContent content)
    {        
        if (content != null && !string.IsNullOrEmpty(content.Text))
        {
            // convert the string buffer to UTF8 so we can store it in an NTEXT column
            var encoding = new UTF8Encoding();
            return encoding.GetBytes(content.Text);
        }

        return new byte[0];
    }
}

The package is available on my SkyDrive share:

Comments (8)

  1. Steven Kamis says:

    Fantastic! Just what I needed. Thank you.

  2. J Best says:

    Great post….  Works perfect.  Thanks very much.

  3. Etienne says:

    This DTSX works well, sort of, all my results defaults to a max of 15. Any idea why?

  4. Etienne, when you access the RSS feed URL directly (like through your web browser), do you see more than 15 results? It might just be the way the feed is configured. You should check and see if it accepts any parameters that allow you to specify the number of entries to return.

  5. John Kelly says:

    I am struggling with this simply because I have no background in c#. My issue is I note you are not using the http connection for anything other than url, however I need to connect through a proxy server. How might I do that?

  6. There are multiple ways to retrieve the XML (rss) document using .NET (C# or VB). I found an example of retrieving the document when behind a proxy on stackoverflow: stackoverflow.com/…/xmldocument-loadurl-through-a-proxy

  7. Jo says:

    How Can I retrieve the full content of the RSS

Skip to main content