On the Internet, everyone knows you’re a dog


According to a famous "New Yorker" cartoon by Peter Steiner, "On the Internet, Nobody Knows You're a Dog".

That's no longer true. In fact, today we can tell you're a Scottish terrier, live north of London, enjoy chewing up your owner's shoes (including brand preference), crave that special dog food, and have that odd obsession with Perdita from 101 Dalmatians.

Not convinced? These days the AOL search/privacy breach is a top news story.

Earlier this month, AOL posted "anonymized" data containing 20 million search requests by some 250,000+ individual users, collected over 3 months in 2006. While this may have been done with good intentions (to foster search engine research), the result is disastrous.

The data contains a numeric ID for individual users, the original search query string, the result clicked/followed (including ranking) and a timestamp. Most users do not disable/clear cookies for search engines, so all of their queries end up under the same ID and you get a pretty revealing profile of their interests.
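To see why a stable numeric ID undoes the "anonymization", here is a minimal Python sketch that groups log records by that ID. The tuple layout, the field order, and every sample value are assumptions invented for illustration; this is not the actual AOL file format or real data.

```python
from collections import defaultdict

# Hypothetical records: (user_id, query, timestamp, rank, clicked_url).
# All values are made up for this example.
SAMPLE_LOG = [
    (1001, "scottish terrier grooming", "2006-03-01 09:12:44", 1, "http://example.com/grooming"),
    (1001, "dog food delivery north london", "2006-03-02 18:03:10", 3, "http://example.org/food"),
    (2002, "cheap flights", "2006-03-02 18:05:00", None, None),
    (1001, "101 dalmatians perdita", "2006-03-05 21:40:02", 2, "http://example.net/perdita"),
]


def build_profiles(records):
    """Collect each user's queries in chronological order, keyed by user ID."""
    profiles = defaultdict(list)
    # ISO-style timestamps sort correctly as plain strings.
    for user_id, query, _timestamp, _rank, _url in sorted(records, key=lambda r: r[2]):
        profiles[user_id].append(query)
    return dict(profiles)


profiles = build_profiles(SAMPLE_LOG)
for user_id, queries in profiles.items():
    print(user_id, queries)
```

No name appears anywhere in the input, yet the three queries collected under ID 1001 already sketch breed, location and taste in movies.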

Worse, people tend to look up their own names and related names, and search for data and web sites about themselves. Data mining these results will not only show trends, but may also expose private details. The original data set, roughly 0.4GB of compressed data, is by now available from an endless list of mirrors and will never go away. On top of that, the data can be queried online on a number of sites.

Software companies such as Microsoft have by now implemented privacy policies and published privacy statements. These will continue to evolve over the years to come and will be a crucial factor for consumer trust. User feedback is helping to shape software design; the privacy dialog in Windows Media Player is a prime example.

Not so obvious: your hardware needs a defined level of privacy as well.

Examples: My laser printer includes near-invisible forensic watermarks (including its serial number) on every page printed. Unless disabled, it transmits toner usage data to the manufacturer. Another one: My networked media center extender for audio playback queries the manufacturer's home page at every startup (using its serial number).

Most of these activities are harmless by themselves, but combined they provide an increasingly complete picture.

Not convinced? Ever shopped for books online? I do. Look at what you can do to mine Amazon wish lists (they are public by default).

The challenge: make it obvious what information is collected and how it could be used, and give users an easy way to opt in or out. Personalization is a great feature as long as you are in control.
