Dupes be gone!

Duplicate items are an RSS aggregator's worst enemy, and many of the dedicated folks who are using Outlook 2007 Beta 2 know we did not do a great job in that build of handling the many ways that dupes can occur.

Since the Beta 2 build we've made numerous improvements to how the RSS architecture deals with duplicate items. These include changes to the download logic for individual feeds, to the server sync if you're in an Exchange environment, and to the delete behavior for individual items.

When you delete an individual RSS item from the feed's folder in Outlook 2007, we take it as "I'm done with this item and don't want to see it again." This means if the post continues to exist in the XML file we get from the content publisher for another few days (or however long it takes to roll off the end of the file), we will not download it again. Read status is handled the same way: mark an item as Read and it stays Read for as long as the post remains in the publisher's file.
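To make the delete model concrete, here is a minimal Python sketch. It is not Outlook's actual code; the `FeedFolder` class, its fields, and the dictionary shape of a feed entry are all hypothetical names made up for illustration. The idea is simply that the folder remembers the GUIDs of deleted items and skips them on later downloads, and that it never touches items it already holds.

```python
from dataclasses import dataclass, field


@dataclass
class FeedFolder:
    """Hypothetical in-memory model of one feed's folder (not Outlook's real store)."""
    items: dict = field(default_factory=dict)         # guid -> {"title": str, "read": bool}
    deleted_guids: set = field(default_factory=set)   # GUIDs the user has deleted

    def delete(self, guid):
        # Remember the GUID so a later sync does not bring the item back.
        if self.items.pop(guid, None) is not None:
            self.deleted_guids.add(guid)

    def mark_read(self, guid):
        self.items[guid]["read"] = True

    def sync(self, feed_entries):
        """Merge entries parsed from the publisher's XML; return the GUIDs added."""
        added = []
        for entry in feed_entries:
            guid = entry["guid"]
            if guid in self.deleted_guids or guid in self.items:
                # Deleted earlier, or already in the folder: leave it alone,
                # which also leaves its read status untouched.
                continue
            self.items[guid] = {"title": entry["title"], "read": False}
            added.append(guid)
        return added
```

With this sketch, syncing the same feed file twice adds nothing the second time, a deleted item stays gone while it is still in the file, and an entry that arrives under a new GUID is downloaded as a new item.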

If a blogger or content publisher modifies a post and wants to be sure readers see it again, they should follow the best practice of re-posting the new content. This will create a new GUID and cause Outlook (and other aggregators that follow this delete model) to see it as a new item and download it as appropriate.

Minor or non-content changes made to existing items in the feed's XML - especially random tags used by a specific aggregator or inserted automatically by the syndication engine - will not cause Outlook to treat the changed item as new and download a duplicate. We saw a large number of duplicate feed items in Beta 2 because of changes like these, and our improvements to the update logic for individual posts are designed to handle them. The specific logic for determining which fields to use for change detection in Outlook is now the same as in IE 7.
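One way to picture this kind of change detection is a fingerprint computed over a fixed set of content fields, so that anything outside that set cannot make an item look different. The Python sketch below is only an illustration: the field list in `CONTENT_FIELDS` is an assumption (the actual fields Outlook and IE 7 compare are not listed here), and the function names are hypothetical.

```python
import hashlib

# Assumed for illustration only; the real set of fields is not given in this post.
CONTENT_FIELDS = ("title", "description", "link")


def content_fingerprint(entry):
    """Hash only the fields treated as content; every other field is ignored."""
    parts = [entry.get(name, "").strip() for name in CONTENT_FIELDS]
    # Join with a separator that will not appear in normal text, so field
    # boundaries stay unambiguous.
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()


def has_changed(stored_entry, fetched_entry):
    """True only when one of the content fields differs between the two copies."""
    return content_fingerprint(stored_entry) != content_fingerprint(fetched_entry)
```

In this sketch, an entry that gains an extra aggregator-specific tag compares as unchanged, while an edit to the description compares as changed.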