Previous posts have covered trustworthy principles in general and some product specifics as well. Privacy is an important part of trustworthy computing. This post discusses one aspect of privacy on the web: third-party content.
When most people browse the web, they think what they see in the address bar and the site they are visiting are the same thing. However, web sites today typically incorporate content from many different web sites. For the sake of clear terminology, the site the user browses to directly (seen in the address bar) is the first-party site; the other sites that the first-party site incorporates in its site experience (but that the user hasn’t navigated to directly) are third-party sites.
When you browse to a first-party site, you know that it can collect information about how you use the site. What many users don’t realize is that technically, third-party sites can collect information about users as well. Users aren’t typically well-informed about which third-party sites are collecting what information, how the sites use this information today, or how the sites could use the information in the future.
Identifying Third-party Sites
While the address bar shows the address of the current, first-party, site, this dialog shows the addresses of all the different web sites (including third-party sites) that the current web page includes content from. The browser visits every one of these sites in order to show the current web page’s content.
The way that sites can pull content in from other sites is useful and powerful and typical on the web today. It’s part of the underlying design and structure of the web, and enables functionality (like an interactive map in the middle of a restaurant’s website, or a “share this” link in the middle a news article) that people value.
Third-Party Sites and Privacy
At the same time, bringing information together from different websites has privacy implications. A good example of this issue that most people have experienced involves email. Many email systems treat email messages that come from unknown senders in a special way, blocking images in them and displaying a warning like this one:
The message body typically has some missing images (“red X’s”) with text nearby, like “Right-click here to download pictures. To help protect your privacy, Outlook prevented automatic download of this picture from the Internet.”
Why do email systems block these external images? The sender may have programmed some information in the external image that is unique to the recipient – for example, having the image’s file name or location include the recipient’s email address. When the sender sees that a particular image was downloaded, then the sender knows which email message arrived in a valid account and was opened. By not downloading the content, the email recipient prevents his email system from disclosing information and protects his privacy from the unknown sender. Potentially, the recipient protects himself from more unsolicited email.
In general, every piece of web content that a computer requests from a website discloses information to that website. This basic technique enables a third-party site to track visitors across different first-party websites that include content from the same third-party. When several websites show content (like a syndicated photo or article) from the same third-party website, that third-party site can determine which of the websites a particular visitor has browsed to.
For example, say two totally unrelated sites, Site1.com and Site2.com, both include images from MySyndicatedPhotos.com. The user browses to both Site1.com and Site2.com, and the user’s browser calls MySyndicatedPhotos.com in order to get the images these sites include. MySyndicatedPhotos.com can figure out (by various means) that the same machine visited these two different sites.
As the user visits additional sites that show content from this same third-party site, this third-party site is in position to build a profile of the user’s activity across the different sites that include its content.
While cookies can definitely contribute here, and there’s been long-standing concern and confusion about “tracking cookies,” the fact is that any content coming from a third-party site can function like a tracking cookie. The intent of the content (a photo, article, logo, or site-specific analytics; image, text, or script) is technologically irrelevant to its potential use as a tracking mechanism. Note that even if the user had blocked all cookies, other content on third-party websites could still be used to build a profile. Third-party content isn’t inherently good or bad; it’s just technically possible to use it this way.
Actually Happening or Just Technically Possible, and Other Questions
Third-Party Sites and Trust Issues
Given that web browsing isn’t anonymous and in some ways this is “how things work” on the web, what exactly is the trust issue? For many people, trust begins with security. The security risk here is plain: visiting one website exposes the user to potentially malicious content from other websites. The user visits one site and sees content on it that seems trustworthy (it’s on the site!) but actually comes from a different source. Finding examples of this problem on the web isn’t hard; it’s happened to visitors of several top tier websites.
Trust includes privacy as well. The privacy concern involves users having a choice, and being able to exercise control about what information they share. Today, users are not in control of which websites can get information about their browsing activities. As a result, web sites that users aren’t aware that they’ve visited and don’t have a well-defined relationship with are in position to build a profile of the users’ browsing patterns.
A guiding principle for Internet Explorer (and Microsoft overall, as part of Trustworthy Computing) is that the user should be in control. Consumers have come to expect security protections from their browsers, and are starting to have higher expectations about privacy protections as well. Control here means that users have clear notice and can tell what sites they may be disclosing information to and under what terms. Control also means that users can exercise choice about what information they disclose to whom. Preventing information disclosure means blocking content; blocking content creates a possible impact to the appearance and functionality of the page.
Beyond these issues, accountability is a question here as well. When a user visits one site after another, and each one includes some third-party content, who is accountable and who takes responsibility for the information collected about the user? On today’s web, that’s not at all clear.
The privacy and trust issues around third-party content are complex and important. As discussed in this blog before, trustworthy browsing involves many industry challenges, and, like many other efforts (e.g. interoperability), requires cooperation and trade-offs. Web privacy involves more than just blocking cookies. Enabling users to be in control starts with making users aware of the issues. In another post, we’ll cover IE8 functionality that helps users stay in control of their information.