Suggested Sites & Privacy


Suggested Sites, a new IE8 feature, helps you discover related sites that can give you more information about your interests. Under the hood, Suggested Sites is a service that generates suggestions from a collection of users’ visited sites. You may be wondering how Suggested Sites fits with the investments we’ve made in privacy. Respecting user privacy and giving users control over the data they provide has been part of the design philosophy of Suggested Sites from the beginning. This blog post explains the methods we use to respect user privacy.

On the client:

  • Requires user opt-in: Suggested Sites is off by default, and the user must explicitly opt in to the experience through the first-run settings wizard or through the Tools menu. This follows Microsoft’s principle that every shipping feature requires consent for data transfers (see Microsoft’s product policy guidelines).
  • Disabled during InPrivate: Suggested Sites does not record or send any browsing activity during InPrivate browsing sessions.
  • Respects history settings: Suggested Sites gathers the user’s visited sites and periodically sends them to the service. If the user deletes the browsing history, the deleted entries are not uploaded to the server, and no suggestions are displayed for them.
  • Supports only public Internet sites: Suggested Sites uploads only the supported URL types listed below and discards the others:

    Supported URLs      | Discarded URLs
    --------------------|-------------------------------------------------
    HTTP scheme         | HTTPS scheme
    Internet zone       | Intranet and local zone
    DNS and IDN host    | IP address
                        | URL syntax with username and/or password, e.g. http(s)://username:password@server/resource.ext

The full URL is used (not keystrokes), since some sites use URL parameters to navigate to portions of their web content. Using the full URL helps narrow down the user’s interest so that the suggestions are more relevant. Note that this has a side effect: search terms such as “banana bread” are included in the URL, for example http://search.microsoft.com/results.aspx?form=MSHOME&mkt=en-US&setlang=en-US&q=banana%20bread. Full URLs are also what users commonly share with each other today (e.g., by copying and pasting a URL from the address bar).

  • Generates a pattern ID to associate sessions: When the user opts in to the feature, a random, unique identifier is generated to group usage patterns on the server. This ID is not used to identify the user. When the user clears the browsing history, the pattern ID is regenerated, and there is no way to correlate the previous pattern ID with the new one.

For instance, if I browse to allrecipes.com in one session and recipezaar.com in another session with the pattern ID 123456789, the Suggested Sites service knows that allrecipes.com and recipezaar.com belong to one browsing experience. However, it is important to emphasize that the pattern ID is not linked to me personally, and the server removes any personally identifiable information such as the IP address (more on how the server handles this below).
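The client-side rules above can be sketched in Python. This is a minimal illustration under stated assumptions, not IE8’s implementation: the names `is_uploadable` and `SuggestedSitesClient` are hypothetical, and because real Internet-zone detection relies on IE’s security-zone settings, the sketch approximates it by treating dotless host names as intranet hosts.

```python
import ipaddress
import secrets
from urllib.parse import urlsplit


def is_uploadable(url):
    """Hypothetical filter mirroring the Supported/Discarded URL table.

    Assumption: a dotless host name counts as intranet, a crude stand-in
    for IE's real security-zone mapping.
    """
    parts = urlsplit(url)
    if parts.scheme.lower() != "http":
        return False  # HTTPS (and any other scheme) is discarded
    if parts.username is not None or parts.password is not None:
        return False  # URLs carrying credentials are discarded
    host = parts.hostname or ""
    try:
        ipaddress.ip_address(host)
        return False  # literal IP address hosts are discarded
    except ValueError:
        pass
    return "." in host  # rough "Internet zone" heuristic (assumption)


class SuggestedSitesClient:
    """Hypothetical client state; illustrates the rules, not IE8's code."""

    def __init__(self):
        self.opted_in = False  # the feature is off by default
        self.pattern_id = None
        self.history = []      # visited URLs waiting to be uploaded

    def opt_in(self):
        self.opted_in = True
        # Random ID, not derived from anything about the user.
        self.pattern_id = secrets.token_hex(16)

    def record_visit(self, url, in_private):
        # Nothing is recorded without opt-in, during InPrivate,
        # or for discarded URL types.
        if self.opted_in and not in_private and is_uploadable(url):
            self.history.append(url)

    def clear_history(self):
        self.history.clear()  # deleted entries are never uploaded
        if self.opted_in:
            # Fresh ID with no link to the previous one.
            self.pattern_id = secrets.token_hex(16)

    def build_upload(self):
        return {"pattern_id": self.pattern_id, "urls": list(self.history)}
```

Because the ID comes from a cryptographic random source rather than from user or machine data, regenerating it on a history clear leaves nothing that ties the old and new IDs together.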

Over the wire:

  • Uses the HTTPS protocol: All data sent over the wire is encrypted over an SSL connection, which helps protect against man-in-the-middle attacks.

On the server:

  • Removes IP and cookie information: The server strips the data of any user identification, such as the client IP address and cookies, so that it is not possible to personally identify the user. The pattern ID is available for grouping previous sessions to provide relevant results, but it is not used to identify the user.
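The server-side handling can be pictured as a small Python sketch. The request layout (`client_ip`, `cookies`, `pattern_id`, `urls`) and the `sanitize` and `PatternStore` names are hypothetical; the point is only that identifying fields are dropped before anything is stored, and that uploads are grouped solely by the random pattern ID.

```python
from collections import defaultdict


def sanitize(request):
    """Keep only the pattern ID and URLs; the IP address and cookies are dropped.

    The request keys used here are a hypothetical layout for illustration.
    """
    return {"pattern_id": request["pattern_id"], "urls": list(request["urls"])}


class PatternStore:
    """Hypothetical store that groups sanitized uploads by pattern ID."""

    def __init__(self):
        self._sites = defaultdict(set)

    def add(self, request):
        clean = sanitize(request)  # identifying fields never reach storage
        self._sites[clean["pattern_id"]].update(clean["urls"])

    def sites_for(self, pattern_id):
        return set(self._sites.get(pattern_id, set()))
```

Two sessions that share a pattern ID end up grouped together even if they arrive from different IP addresses, which matches the allrecipes.com/recipezaar.com example above.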

We use this data to make the Suggested Sites feature better and continually improve the quality of the suggestions.

We’ve designed Suggested Sites so that users stay in control of their data while enjoying a new way to explore the web. If you’d like to try it out, you can opt in or out of the feature through the Tools > Suggested Sites menu. You can read our full privacy statement online to learn more about how IE8 features handle your data.

Jane Kim
Program Manager

Comments (12)

  1. zonk says:

    How will you handle PHP sessions where the SID is appended to every URL on the site?

  2. EricLaw [MSFT] says:

    @Zonk: You’re right to note that in some cases, session identifiers are sent in URLs. Note, however, that if the site is a secure site (as you would expect for sensitive transactions), the HTTPS URLs are not sent to Microsoft. In the case of non-secure URLs, keep in mind that the communication with Microsoft servers takes place over a secure channel, so an attacker isn’t going to be able to see your session token when it’s sent to the Suggested Sites feature.

    Microsoft itself isn’t going to use your session token; how we use the URL data is covered by the privacy statement which you can find in the link above.

  3. Tino Zijdel says:

    All nice and dandy, but aren’t ‘suggested sites’ in practice sites that are paying for being ‘suggested’? In that case I’m not interested in this feature at all since it is just a marketing scheme, so basically nobody should be interested…

  4. Joseph McFarland says:

    It looks like you did a great job on this.

    Is there a way we can opt-out of suggested sites (other than deleting history)? We have some internal testing sites that we would rather not have outsiders visiting (yet they need to be publicly accessible for some reason). Perhaps paranoia about this new feature will convince people to actually protect them.

    Tino: User-submitted data should come into play (I would still be fine with the feature if it were 100% pay-to-play, but it doesn’t seem that way), as long as they kept out the ad trolls. It’s already been useful at least once.

    (Note: ad trolls being people who attempt to get search engines to index and highly rank their garbage, or who try to buy their way into polluting the Suggested Sites database.)

  5. That’s a really nice function. However, I don’t need my IP hidden.

  6. Mitch 74 says:

    Speaking for the very security-conscious (and a strange oversight): if IP and cookie data is stripped by the server, why send it? Wouldn’t it be simpler to strip it in IE directly and send the truncated version instead? If, for any reason, Microsoft decides to unilaterally change their privacy agreement and make use of this IP and cookie information, what control over that do users have? By the time someone finds out, that’ll be millions of users spied on and billions of URLs gathered: a treasure trove if ever there was one.

    Moreover, HTTPS isn’t fool-proof anymore, SSLv3 having been cracked through brute force; so there is still a risk the Suggested Sites security certificate gets cracked, and all data intercepted gathered by the cracker would be an open book. While the IP address is irrelevant (it would still be inside the packet’s header), all cookie data would be exposed.

    At least add a setting to IE (and make it default) to strip IP and cookie data before having it sent to MS servers.

    After all, if you’re not gonna use it anyway, save yourself the bandwidth and server CPU time.

  7. Warren says:

    Mitch 74:

    It’s an artifact of the TCP/IP protocol — you can’t /not/ send your IP address. Microsoft’s servers necessarily /must/ know your IP address in order to establish a TCP connection so that it can transmit the results back to you.  What Microsoft is saying here is that they discard this information once the session is completed, i.e. they aren’t writing it to logs or a database somewhere for future use.

    Cookie information is also going to be limited to the cookies that Microsoft’s own web site sends you as part of a session. That is how cookies have always worked: they’re limited to the specific domain they originated from. There’s absolutely no way that Microsoft would include other web sites’ cookies in a request to their own web site.
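Warren’s point about cookie scope can be illustrated with a simplified version of the domain-matching rule from RFC 6265 (section 5.1.3). This Python sketch ignores the host-only flag, paths, and public-suffix checks, and the cookie jar format and host names are hypothetical; it only shows why a request to a Microsoft host cannot carry another site’s cookies.

```python
import ipaddress


def domain_match(request_host, cookie_domain):
    """Simplified RFC 6265 domain matching (no host-only flag, no public suffixes)."""
    host = request_host.lower()
    domain = cookie_domain.lower().lstrip(".")  # a leading dot is ignored
    if host == domain:
        return True
    try:
        ipaddress.ip_address(host)
        return False  # IP address hosts only match exactly
    except ValueError:
        pass
    # The cookie domain must be a suffix preceded by a dot.
    return host.endswith("." + domain)


def cookies_for_request(jar, request_host):
    """Return names of cookies in a hypothetical (name, domain) jar sent to request_host."""
    return [name for name, domain in jar if domain_match(request_host, domain)]
```

Requiring the dot before the suffix is what stops a host such as evilmicrosoft.com from receiving cookies set for microsoft.com.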

  8. Bruce says:

    <<<HTTPS isn’t fool-proof anymore, SSLv3 having been cracked through brute force>>>

    Absurd.  Quote your sources.  (Hint– SSLv3 isn’t even a cipher.)

    <<<still a risk the Suggested Sites security certificate gets cracked, and all data intercepted gathered by the cracker would be an open book.>>>

    It’s pretty clear that you don’t understand how HTTPS actually works.  Further, you’re overlooking the fact that if there’s an active attacker on the line who can mess with your SSL, s/he could just as easily see the raw HTTP traffic you’re sending when you actually visit the sites whose URLs are submitted over the encrypted channel.

    <<<Is there a way we can opt-out of suggested sites>>>

    Don’t opt in to the feature to start with??

  9. KS says:

    Suggested Sites has already "suggested" sites that try to install MALWARE on the unsuspecting visitor. You need to disallow this!

  10. jane kim [MSFT] says:

    Great comments – thanks for reading!  

    @Tino – The suggestions are purely based on relevance and not advertisements or sponsored links.

    @Joseph – By default, the feature is off, and you can also control it through the Tools > Suggested Sites menu item.

  11. ashishag says:

    Please make Ctrl+L move the cursor to the address bar rather than opening a new box, just like Firefox does. It’s frustrating to have Ctrl+L open a new box when Ctrl+O already does the same.

  12. Mitch 74 says:

    @Warren: yes, I know how TCP/IP works (the packet’s origin IP address is inside the TCP/IP packet headers). I also understand that using raw port access to insert bogus IP addresses wouldn’t work, because anybody connecting to the intarweb through a router would have that replaced with their router’s IP address anyway. That part must, indeed, be stripped server-side.

    The cookie policy is another matter, as "Suggested sites" by its function implies that IE would go through the browsing history to see what suggestions would work best. What, then, is the server removing in the "cookies" part of the data transfer? "Suggested Sites"’s own? Or cookies associated with the transmitted URLs? If it’s the former, OK. If it’s the latter, not OK!

    @Bruce: HTTPS is an encrypted version of HTTP, using a different TCP/IP port. Encryption is determined through another protocol, currently SSLv3, using (quite often) RSA and AES – which are not cracked. However, many SSLv3 certificates are signed using MD5 hashes, and THAT has been cracked: http://blog.wired.com/27bstroke6/2008/12/berlin.html

    Note that papers dating back to 2004 already showed MD5 to be not very secure, and rated SHA-1 not much higher (it is better and still uncracked, though).

    In short: stripping IP addresses can in practice only be done server-side, ok (it shouldn’t be repeated inside sent data, if it was ever part of it); stripping cookies shouldn’t be necessary, because aside from "Suggested Sites"’s own, no cookies should be accessible (same origin policy, but there still are cookies that cite ‘*’ as their origin). If it’s already the case, OK (but the message isn’t clear on that).

    SSLv3 protocol, when using MD5 signatures for its security certificates signatures (and there are still many out there, thus browsers won’t yet cut them off), isn’t fool-proof. Couple that with a DNS corruption (like we got all summer/fall 2008), then suddenly "suggested sites" becomes quite dangerous.