The 3.5 .NET framework introduced a new SyndicationFeed class which simplifies the process of reading (and creating) RSS feeds. To try it out, I threw together a simple SSIS package which reads from an RSS feed using a Script Component.
Things to note:
- I have an Http Connection Manager, but I’m not actually using its connection object in my script. I use it just for its connection string (the URL to the RSS feed)
- You need to add a reference to System.ServiceModel.Web assembly to access the SyndicationFeed class
- The Script Task / Script Component will target the 2.0 .NET framework by default. To use the SyndicationFeed class, you’ll need to use the 3.5 .NET framework. You can change this setting by right clicking on the script component’s Project, and selecting properties…
The code for the script (notice I’m using C# which is only supported in SSIS 2008):
[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute] public class ScriptMain : UserComponent { private string url = string.Empty; private SyndicationFeed feed = null; private XmlReader reader = null; public override void PreExecute() { base.PreExecute(); // Get the URL from the Http Connection Manager. // Note, we're not actually using the connection manager's connection object, // just it's URL setting. This is because using the .NET connection classes // give us more flexibility. reader = XmlReader.Create(Connections.HttpConnection.ConnectionString); feed = SyndicationFeed.Load(reader); } public override void PostExecute() { base.PostExecute(); reader.Close(); } public override void CreateNewOutputRows() { if (feed != null) { foreach (var item in feed.Items) { Output0Buffer.AddRow(); Output0Buffer.Title = item.Title.Text; Output0Buffer.PublishDate = item.PublishDate; Output0Buffer.LastUpdatedTime = item.LastUpdatedTime; Output0Buffer.Id = item.Id; Output0Buffer.Summary.AddBlobData(ConvertToBytes(item.Summary)); string authorName = string.Empty; if (item.Authors.Count > 0) { // take the first author authorName = item.Authors[0].Name; } Output0Buffer.Author = authorName; } Output0Buffer.SetEndOfRowset(); } } private byte[] ConvertToBytes(TextSyndicationContent content) { if (content != null && !string.IsNullOrEmpty(content.Text)) { // convert the string buffer to UTF8 so we can store it in an NTEXT column var encoding = new UTF8Encoding(); return encoding.GetBytes(content.Text); } return new byte[0]; } }
The package is available on my SkyDrive share:
来自Matt Masson: http://blogs.msdn.com/mattm/archive/2009/02/19/read-an-rss-feed-from-ssis.aspx 在.NET Framework
Fantastic! Just what I needed. Thank you.
Great post…. Works perfect. Thanks very much.
This DTSX works well, sort of, all my results defaults to a max of 15. Any idea why?
Etienne, when you access the RSS feed URL directly (like through your web browser), do you see more than 15 results? It might just be the way the feed is configured. You should check and see if it accepts any parameters that allow you to specify the number of entries to return.
I am struggling with this simply because I have no background in c#. My issue is I note you are not using the http connection for anything other than url, however I need to connect through a proxy server. How might I do that?
There are multiple ways to retrieve the XML (rss) document using .NET (C# or VB). I found an example of retrieving the document when behind a proxy on stackoverflow: stackoverflow.com/…/xmldocument-loadurl-through-a-proxy
How Can I retrieve the full content of the RSS