Your browser history as Attention data?


Have you thought of your browser history as attention data? I have some thoughts on this. Not all good.

Two companies are thinking this way.  Tailrank is one. You can go to this Import page and give the site permission to look at your browser history for blogs you’ve visited and make browsing recommendations based on that data.


The other company is Brightcove, Jeremy Allaire’s online video startup. In this post it mentions that a forthcoming release will “prebuild the recommendations from your browser history using some crazy AI kung-fu.” (Update – apparently this Brightcove post was an April Fools’ joke. It got me… but as you can see, entirely plausible.)


Your browser history as attention data?


There is good, bad and ugly in this….


The Good


In Tailrank’s case, you need to give the site explicit permission for it to trawl your browser history. That’s a good thing.


Your browsing history is valuable. Very valuable. Cool stuff can be done with it.


But giving up your browser history isn’t just a case of having a cookie placed so it can track your behavior on the cookie issuing site. We’re talking about all your browsing history (i.e. the urls you’ve visited down to individual page level) across all sites.  If the value proposition is right, this sounds like a good deal – I give you my browser history and you provide personalized content / recommendations. This may be appealing to some, but not me.


The Bad


The implicit flaw in the ‘reading the browser history’ approach (for me at least) is that a malicious site could do something you don’t want it to do – i.e. look at your entire browser history (up to when you last deleted it).  Why does this matter? Well, apart from the privacy invasion, there are real security concerns here.


Before I go further, I need to point out that you need to give explicit permission for the Tailrank site to be able to look at your browser history under the default security settings of most browsers. A site can’t look at your browser history just because you land on one of its pages. You need to click on something that gives the site permission to look. This is true for IE and Firefox.  I tested Tailrank’s ‘Auto-Configure’ implementation (uses javascript). In IE at ‘Medium’ security, after I click the ‘Auto-Configure’ button Tailrank chugs away and trawls just fine. At the ‘High’ security setting in IE, Tailrank barfs – the javascript is disabled.


The problem is that you can be fooled by malicious sites into giving this permission. You could be fooled into clicking a link or button that doesn’t do what it says it’s going to do and does something else instead. That’s bad.


And The Ugly


What’s the worst that can happen?


All Your Browsing History Are Belong To Us.


Well, apart from the privacy invasion (your search history can be determined by the urls in your browser history), there are all sorts of nasty things that can go on.
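
To make the search-history point concrete, here is a minimal sketch in Python of how search terms fall out of ordinary history URLs. The query-string keys (`q`, `query`, `p`) and the sample URLs are assumptions for illustration; real search engines vary.

```python
from urllib.parse import urlsplit, parse_qs

# Query-string keys that commonly carry search terms (an assumed, incomplete list).
SEARCH_KEYS = ("q", "query", "p")

def search_terms(urls):
    """Return (host, terms) pairs for URLs that look like search result pages."""
    found = []
    for url in urls:
        parts = urlsplit(url)
        params = parse_qs(parts.query)  # decodes '+' and %20 into spaces
        for key in SEARCH_KEYS:
            if key in params:
                found.append((parts.hostname, params[key][0]))
                break
    return found

# Hypothetical history entries, made up for this example.
history = [
    "http://www.google.com/search?q=symptoms+of+flu",
    "http://example.com/blog/2006/04/01/post.html",
    "http://search.example.org/results?query=cheap%20flights&page=2",
]

for host, terms in search_terms(history):
    print(host, "->", terms)
```

Anyone holding the URLs can run something like this; no access to the search engine itself is needed.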


One example is the fact that some sites are designed with appalling security – explicit storage of usernames and passwords in the url is not unknown. Nasty. If you happen to use the same username and password across multiple sites, then a malicious party can try out the unencrypted usernames and passwords on other, more secure sites (those listed in your browser history), potentially yielding some very bountiful results. Nasty, nasty.
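
If you want to check your own exposure, a rough audit along these lines can flag history entries that carry credentials in the URL. This is a sketch under assumptions: the list of suspicious parameter names is a guess, and the sample URLs are made up.

```python
from urllib.parse import urlsplit, parse_qs

# Parameter names that often hold secrets (assumed list; extend as needed).
SECRET_KEYS = {"password", "passwd", "pwd", "pass"}

def has_credentials(url):
    """True if the URL embeds a password in its userinfo part or query string."""
    parts = urlsplit(url)
    if parts.password is not None:  # e.g. http://user:secret@host/
        return True
    params = parse_qs(parts.query, keep_blank_values=True)
    return any(key.lower() in SECRET_KEYS for key in params)

# Hypothetical history entries.
history = [
    "http://alice:hunter2@intranet.example.com/",
    "http://shop.example.com/login?user=alice&password=hunter2",
    "http://news.example.com/story?id=42",
]

flagged = [u for u in history if has_credentials(u)]
print(flagged)
```

Anything it flags is worth a password change, especially if the same password is reused elsewhere.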


In Attention Data We Trust


The attention landscape is and will be full of privacy concerns – there is an ongoing balancing act between potential abuse of the data ‘submitted’ and the potential benefits in providing that data.


At the end of the day, it is down to individual risk assessment. As a ‘customer’ you need to assess whether you can trust the site to provide you with a more relevant experience based on your attention data and do so in a safe manner.


The browser history case is one of the more poignant examples in this regard. We’re not even talking about malicious sites here. You not only need to trust the site owners to use the collected data properly (their privacy policy and their adherence to it) but also trust their staff’s competency in securing their infrastructure – if the data store that holds your data is not secured with proper physical security and rigorous policy and processes, then you risk saying bye bye to your data (a la Mastercard).



Comments (18)

  1. BillyG says:

    Yeah, I stumbled across some TailRank pages of mine the other day when I saw in my log that they came scraping/trolling/whatever. It is spread over quite a long time but they only have 2 pages of my data, so much for "all my history" but of course this could/will be tweaked by them in the future.

    Thanks for clueing me in on BrightCove, they just look like a video upload site to me lol.

  2. orcmid says:

    Well, the first thing I did was go clear my history and drop the retention time from 20 days to 5.  I may have to clean up my act.  During the "present emergency" I have Active Scripting disabled for all but trusted sites anyhow.

    I wouldn’t mind being able to locally digest my history for my own purposes, but I don’t think I want to let anyone troll for it or know anything about it.

    Of course, the DoJ in their current dispute with Google over search terms data would have a field-day with this kind of stuff.  

    Now that I have taken that much precaution, I get to wonder whether the date of this posting impacts your credibility [;<).

  3. Kevin Burton says:

    Disclaimer.   I agree that this is a security vulnerability when used by the wrong person.  The problem is that most web technology can never really be used for evil unless there’s a browser vulnerability.

    The problem with this model is that it’s integral to the web’s design.  Visited URLs are key to the way people interact with the web.   I don’t see this being replaced anytime soon.

    In fact with some advanced techniques (which I won’t disclose here) it could be extended to be used with 100k -> 500k URLs.

    … anyway.  This comment makes no sense:

    "If the value proposition is right, this sounds like a good deal – I give you my browser history and you provide personalized content / recommendations. This may be appealing to some, but not me."

    Ha… are you joking?  So if you upload your OPML that’s ok but if you take the same content and give it to me with my autodetection mechanism that would be bad? 😛

    Aren’t you MR Memetracker tuning via OPML? 🙂

    "a malicious site could do something you don’t want it to do – i.e. look at your entire browser history (up to when you last deleted it). "

    No.  this isn’t true.  You can’t look at the browser’s entire history.  You can at best look at 5-10k URLs to see if they’ve visited them.

    "You can’t just land on a page for a site to look at your browser history."

    Yes you can.  You can land on a site and it can observe 5k-10k URLs if it wanted to and there are no browser permissions or security settings you can adjust to fix this.

    "You need to click on something that gives the site permission to look."

    No… all you need is to have Javascript enabled.  You don’t have to click on anything.

    "This is true for IE and Firefox.  I tested Tailrank’s ‘Auto-Configure’ implementation (uses javascript). In IE at ‘Medium’ security, after I click the ‘Auto-Configure’ button Tailrank chugs away and trawls just fine. At ‘High’ security setting IE, Tailrank barfs – the javascript is disabled."

    The security settings have nothing to do with it.  If you have javascript enabled that’s all I need.

    The *real* BAD part is that any site in the wild could figure out what  5-10k sites you’re visiting without your permission.

    "What’s the worst that can happen?

    All Your Browsing History Are Belong To Us."

    No… not without a brute force attack on the browser but this would require a LOT of data to be sent from the server and the user would not wait around for this to happen.

    "One example is the fact that some sites are designed with appalling security – explicit storage of usernames and passwords in the url is not unknown."

    It wouldn’t matter.  I wouldn’t be able to find a URL with a user/password combo in the URL.  I’d have to know the URL before hand.

  4. MSDNArchive says:

    Thanks for responding, Kevin.

    Re: comparing OPML with browser history:

    >>"Ha… are you joking?  So if you upload your OPML that’s ok but if you take the same content and give it to me with my autodetection mechanism that would be bad? 😛

    >>Aren’t you MR Memetracker tuning via OPML? :)"

    I don’t think these two datasets are the same, or truly comparable.

    For instance, I have multiple OPML files. One of them is on a public url that I can point services like Tailrank at to get more relevant experiences.  This OPML file is culled and edited – for a number of reasons, including privacy.  Why? That’s my choice. I know what’s in it and not in it – I can edit it – I can decide what gets used, what ‘counts’. Using browser history is very different in this respect.

    Re: javascript enablement.

    >>"You can land on a site and it can observe 5k-10k URLs if it wanted to and there are no browser permissions or security settings you can adjust to fix this."

    This has to do with whether or not you have javascript enabled in the browser. That is a security choice.

    >>"all you need is to have Javascript enabled.  You don’t have to click on anything. "

    So we’re agreed on this point in fact.  As I said, I tested with IE settings at ‘High’. This disables javascript. So any technique that relies on javascript to look at browser history doesn’t work.

    I accept your point that it doesn’t require the user to click on anything *if* they have javascript enabled. Good point.

    Re: the number of urls that can be looked at in browser history.

    >>"You can’t look at the browser’s entire history.  You can at best look at 5-10k URLs to see if they’ve visited them."

    Yet, previously you commented:

    "In fact with some advanced techniques (which I won’t disclose here) it could be extended to be used with 100k -> 500k URLs. "

    Technically you make a fair correction here in that it can’t look at *all* your browser history.

    But…you know of a technique to look at the last 100K-500K. Half a million websites constitutes most of the websites most will visit in a lifetime.  By the time the browser history is cleared, 50K is probably all of it anyhow.

    So – question for you: if Tailrank is only looking at the last 5-10 urls (with the user’s permission) how useful can that really be?

    Anyway, thanks for dropping by 🙂

  5. MSDNArchive says:

    Ignore last question Kevin. I meant 10K to 50K – I can see this is  very useful.

  6. Kevin Burton says:

    "reasons, including privacy.  Why? That’s my choice. I know what’s in it and not in it – I can edit it – I can decide what gets used, what ‘counts’. Using browser history is very different in this respect."

    Tailrank gives you the ability to view what you’re about to import… we also have plans for you to edit the list or remove entries.  

    At this point wouldn’t it be the same thing?

    ….

    "But…you know of a technique to look at the last 100K-500K. Half a million websites constitutes most of the websites most will visit in a lifetime.  By the time the browser history is cleared, 50K is probably all of it anyhow."

    Yeah… the question is the distribution of web traffic and the probability that I can hit 90% of your history.

    Another key point is that these are only the ROOT URLs and not URL permutations like cnn.com/foo/bar/cat/dog.html.  I can only probe cnn.com.  There is a near infinite URL space and there are certainly billions of URLs in the wild and I can only practically look for 5-10k right now.

    "So – question for you: if Tailrank is only looking at the last 5-10 urls (with the user’s permission) how useful can that really be? "

    No… we look for 5k blog URLs right now.

    Kevin

  7. Kevin Burton says:

    Ah….. ok.. 10k-50k… gotcha.. 🙂

  8. MSDNArchive says:

    This is cool Kevin:

    >>"Tailrank gives you the ability to view what you’re about to import… we also have plans for you to edit the list or remove entries.  

    >>At this point wouldn’t it be the same thing?"

    I didn’t get this far in the process, but this aspect of the current implementation sounds very good. Getting to the point where the list can be edited before it is submitted sounds even better.

    Now – in come the Attention recorders….

  9. Jon Galloway says:

    I’m curious as to why they aren’t integrating with Attention Trust (http://www.attentiontrust.org/). Their Attention Recorder allows you to track your clickstream but gives you a little more control over the data.

  10. Eric Elia says:

    Alex,

    The Brightcove AI blog post was an April Fool’s prank. But thanks anyway for the links. On-demand messenger delivery of preloaded video iPods *might* have provided a hint.

    EE

  11. BillyG says:

    I’m glad I came back for my weekend comment checks because "… we also have plans for you to edit the list or remove entries" is exactly what I was harping about with Dan(?) from Technorati after ranting about them not having this option.

    http://billy-girlardo.com/WP/2006/02/18/technorati-numbers-are-bs-i-can-prove-it/

    If TailRank implements this, that would definitely keep the DB clean and that is good for everybody.

  12. Brian Hayes says:

    Gee whiz!

    Page rank boosts notwithstanding, I will not believe ANY posts on April 1st; I will not believe ANY posts on April 1st; I will not believe ANY posts on April 1st…

  13. MSDNArchive says:

    (Brian – even though the Brightcove post was March 31st)…

  14. I have to agree with Steve Gillmor on this:

    "All proprietary clouds of data obtained without the users’…

  15. BillyG says:

    …nuff said

    On your "I’ve still not got the GestureBank invite.", don’t feel bad, at least I don’t now lol. Guess they’re tweaking.
