Open letter to Chris Law and Paul Martino

The creators of WSFinder, a wiki-based directory of web services and APIs that people are using to create mash-ups, are working on a new project that caught my eye - WSRelater.

WSRelater is a recommendations web service that enables a website to easily implement a "People who liked this (song, book, group) also liked the following..." function. Adding this recommendation functionality to your website can literally be done in a few hours. For example, if you had a site that sold electronics you could show someone who was buying a camera other electronics that other people who bought that camera also bought.

The innovative part of this service is that it is explicitly designed to work across multiple websites that have different kinds of data. For example, if both a community site and an events site use this web service, related events can be shown on a community forum and related communities can be shown alongside an event.

Think of it as a networked distributed affinity engine that sites can plug into and reap the benefits.
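To make the "people who liked this also liked" idea concrete, here is a minimal sketch of an item-to-item co-occurrence count. It is only an illustration: the function names, the `site:item` naming and the sample data are my own assumptions, not anything taken from WSRelater.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence(likes_by_person):
    """Count how often each pair of items is liked by the same person."""
    counts = defaultdict(lambda: defaultdict(int))
    for items in likes_by_person.values():
        for a, b in combinations(sorted(set(items)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def also_liked(counts, item, limit=3):
    """Return the items most often liked together with `item`, best first."""
    related = counts.get(item, {})
    # Sort by descending count, then by name so ties are deterministic.
    ranked = sorted(related.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:limit]]


# Hypothetical data: items from a shop, an events site and a community site.
likes_by_person = {
    "person-1": ["shop:camera", "shop:tripod", "events:photo-walk"],
    "person-2": ["shop:camera", "shop:tripod"],
    "person-3": ["shop:camera", "events:photo-walk", "community:photographers"],
}
counts = build_cooccurrence(likes_by_person)
print(also_liked(counts, "shop:camera"))
# → ['events:photo-walk', 'shop:tripod', 'community:photographers']
```

Note that the person identifiers only matter for grouping items together. The query itself goes from an item to more items, across whichever sites contributed data.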

Chris Law and Paul Martino - I like the idea, but have my concerns, specifically around the privacy and use of customer data. Let me explain.

From the WSRelater API documentation:

First:

For example, suppose a user visits a website and is given a lifetime cookie

  MySite-0AE486E1-1C5A-9B43-EA06-61996230DA2F

This visitor views an advertisement with the following URL

     www.xyz.com/ad/1097

The following relation would be added to the system

     (PERSON: MySite-0AE486E1-1C5A-9B43-EA06-61996230DA2F, ADVERTISMENT: www.xyz.com/ad/1097) 
Then:

Miscellaneous

  • The first time a user visits a website send a PERSON-to-WEBSITE relation with as short a website name as possible, like www.amazon.com. By doing so good recommendations for other websites to visit can be made.
  • Send a lifetime cookie down for all visitors to your site and when they become members submit a PERSON-to-SELF relation to the database.
  • If a user changes his or her email address on your website send a PERSON-to-SELF relation for the old and new email address.
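To show why these instructions matter, here is a rough sketch of how PERSON-to-SELF relations could tie separate identifiers together. Everything in it is an assumption for illustration: the `RelationStore` class, its methods, the second cookie and the placeholder hash are hypothetical, and I do not know how WSRelater actually stores or merges relations.

```python
class RelationStore:
    """Toy stand-in for a relation engine; not the real WSRelater API."""

    def __init__(self):
        self.relations = []  # (person_id, kind, target) tuples
        self.parent = {}     # union-find parent pointers for identifiers

    def _find(self, person_id):
        """Return the representative identifier for a person."""
        self.parent.setdefault(person_id, person_id)
        while self.parent[person_id] != person_id:
            # Path halving keeps later lookups short.
            self.parent[person_id] = self.parent[self.parent[person_id]]
            person_id = self.parent[person_id]
        return person_id

    def add_relation(self, person_id, kind, target):
        self._find(person_id)
        self.relations.append((person_id, kind, target))

    def add_self_relation(self, id_a, id_b):
        """PERSON-to-SELF: declare two identifiers to be the same person."""
        root_a, root_b = self._find(id_a), self._find(id_b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def history(self, person_id):
        """Everything recorded under any identifier linked to this person."""
        root = self._find(person_id)
        return [(kind, target) for pid, kind, target in self.relations
                if self._find(pid) == root]


store = RelationStore()
cookie_a = "MySite-0AE486E1-1C5A-9B43-EA06-61996230DA2F"  # from the quoted docs
cookie_b = "OtherSite-EXAMPLE-COOKIE"                      # hypothetical
hashed_email = "hash-of-my-email"                          # placeholder value

store.add_relation(cookie_a, "ADVERTISMENT", "www.xyz.com/ad/1097")
store.add_relation(cookie_b, "WEBSITE", "www.abc.com")
# Each site links its own cookie to the same hashed email address.
store.add_self_relation(cookie_a, hashed_email)
store.add_self_relation(cookie_b, hashed_email)

print(store.history(hashed_email))
# → [('ADVERTISMENT', 'www.xyz.com/ad/1097'), ('WEBSITE', 'www.abc.com')]
```

Under these assumptions, two cookies issued by unrelated sites end up describing one person as soon as both are linked to the same email-derived identifier.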

Should I worry? I think so. My three concerns all revolve around privacy:

1. The implication of the above API documentation is that there is a network-based cookie that could track me across multiple sites, and that these sites could be any site that has signed up and connected to the WSRelater API. According to the documentation, the email address is part of the dataset that is flung far and wide (although FOAF hashing is recommended, see below). Question: How do I know with whom you are sharing what data? Just hearing myself ask the question sounds scary.

2. Since the email address acts as an identifier (via the FOAF hash or the actual email address), my site usage data, or any other data collected by any site in the network (remember, I don't know which sites), could potentially be associated and stored with personal data about me collected on other sites. The question now becomes: How do I know with whom you are sharing my data, and what is being done with all my data? Again, that just sounds scary.

3. Contact preference management: my data, how it is stored and shared, do-not-contact flags and so on can't be managed by me, nor set in one place (e.g. unsubscribe) and propagated across the WSRelater network of sites, even if each site owner has similar (or exactly the same) customer data privacy policies. Here's what I mean: I go to site www.xyz.com. I sign up for the 'foo' service and provide my email. That data (at least the email address) is now shared as a unique identifier with site www.abc.com. As a customer of xyz, how do I know my data has been shared with abc? How can I be certain that if I unsubscribe from xyz's email list, the unsubscribe flag has made its way to abc? So now the question is: How do I know with whom you are sharing what data, what is being done with all my data, how can I be certain that I am in control of my data when I want to change it, and who do I hold accountable when things go wrong?

Now to be fair, on the point of sharing email addresses, the API documentation states:

As a best practice, email addresses should never be submitted to the relation engine. Instead use the FOAF (friend of a friend) standard technique of computing the SHA1SUM of the following

   mailto:paul@wsfinder.com

Which is

   503ab22d4f616a7cb1242b0c86c7d6786ad53788

The phrase of concern here is 'As a best practice'. Surely, at the very least, the wording should be 'email addresses cannot be used', and the relation engine's design should enforce it.
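For readers unfamiliar with the FOAF technique quoted above, here is a minimal sketch of the hashing step using Python's standard `hashlib`. The function name is my own, and I am assuming the address is hashed exactly as written, with only surrounding whitespace removed.

```python
import hashlib


def foaf_mbox_sha1sum(email):
    """SHA-1 hex digest of the mailto: URI, as in FOAF's mbox_sha1sum."""
    uri = "mailto:" + email.strip()
    return hashlib.sha1(uri.encode("utf-8")).hexdigest()


digest = foaf_mbox_sha1sum("paul@wsfinder.com")
print(digest)       # a 40-character hexadecimal string
print(len(digest))  # → 40
```

The digest hides the address itself, but the same address always produces the same digest. That stability is exactly what lets different sites refer to the same person, so the hash remains an identifier even though it is not an email address.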

I realise this all might sound shrill, and maybe I have completely got the wrong end of the stick on all this. Maybe my concerns are unfounded (I hope they are). But my interpretation of WSRelater is that this is a distributed ad network of the type that got DoubleClick in trouble a few years ago (remember this?). The fact that the use cases provided in the API documentation are affinity engine-based doesn't get away from the fact that this approach is what landed DoubleClick in trouble. It is great for advertisers and shops who want to flog stuff, but for the customer, this is not so good at all in terms of data privacy.

It boils down to who should be in control of my data - the network of sites? The 'relation engine'? Or me? I say me.

The heart of WSRelater's idea is a very good one if done correctly. This is where AttentionTrust.org is coming from. And Joshua Porter. AttentionTrust.org recognises the potential value of being able to share your data across sites, but its central premise is that you control your data. In my opinion, WSRelater seems to have missed this point completely, or at least hasn't explained to me why this shouldn't be a concern.

Chris Law and Paul Martino, tell me I'm wrong. Show me how I'm wrong. I really want to be wrong, because I like your idea and want to support you. Explain how I can trust the system, and any of the networked sites plugged into WSRelater, from a privacy standpoint. Please prove that my concerns above are unfounded.

My apologies in advance if I'm all wrong about this. I'll make it up to you, I promise :-)

Thanks,

Alex.

+++ Update +++

Update: Paul Martino: Reply to open letter from Alex Barnett.

Paul Martino also responded by way of a comment to this post (within an hour of my original posting). I have copied and pasted it here:

Alex,
I appreciate your comments on this issue. Chris and I have spent a great deal of time thinking through the privacy issues that you discuss here. One of the primary reasons that we have launched as an “Alpha” and not even a “Beta” product is that we are looking specifically for this kind of feedback. Furthermore, the solutions to many of the issues you have brought up are part of our roadmap.
Let me address our overall strategy. In order for WSRelater to work, we need to know the aggregate viewing behavior of a set of people. We do NOT need to know specifically who they are. For example, we need to know
Some person looked at item A, then item B, then item C
We do not need to know that the person above is “paul@wsfinder.com”.
In order for this to work across websites, WSRelater needs some way to know that the two people in the following example are the same person
Some person looked at a group 1 on Tribe, group 2 on Tribe
Some person looked at discussion 3 on eCademy
The solution is to use the FOAF practice of sending a one-way hash of the person’s email address. (As an FYI, if you submit an email address to the WSRelater with type PERSON it is automatically converted – we don’t store the address). By using this best practice attention can be aggregated across partners using WSRelater.
Let me address your specific issues and describe what our strategy is as we move from an Alpha to a Beta product.
(1) How do I know with whom you are sharing what data with?
The answer is simple: we are not sharing your user data with our partner sites. A partner site queries WSRelater for recommendations of items, not for information about a person.
Right now we have an Alpha API feature that lets you query the database for information about a user. This is for debugging and testing purposes (another reason that we are in Alpha). It's really hard to know if you implemented the API correctly if you cannot directly query the system. Perhaps we should only enable this API function for the site that contributed the data or only for the development instance of the database. We are looking for your feedback.
(2) What is being done with all my data?
We are using it in aggregate to make recommendations of items that a person would be interested in. It is an item-to-item filter, so a partner asks for information about an item and gets back more items.
(3) How can I be certain that I am in control?
Since the system is based on the one-way hash of an email address we have a nice way to deal with this issue. The user who controls the email address with the behavior can remove any of the behavior data about him or herself or can opt out of being used for recommendations completely.
(This is not yet implemented, but this is how it will work)
After verifying a person is the owner of an email address, he or she can use a web interface to view all WSRelater behavior entries from that address. Any or all of these can be removed at any time. Since this is a real time system, item recommendations that relied on this data will be updated the next time they are requested. By opting out completely no data will be accepted by WSRelater from this one-way hashed email address.
We would love to have this “console” interoperate with existing efforts like Attention Trust, or perhaps even BE Attention Trust. Why rebuild if it is already there?
As a follow up question: how do you feel about WSRelater keeping this opted-out data, but removing any reference to the user when it is removed? So when a user opts out of the WSRelater, instead of deleting all rows from the database, a unique random identifier (that can never be tied to you) will be applied to those rows, instead of the one-way hash of the user's email.
As a final note: please remember this is a recommendation web service, not an ad network or a personalized search engine. We are not trying to deliver ads for wedding planners because you type “engaged” into a profile. Our goal is to provide high quality recommendations of items that you might be interested in based on what other people previously liked (or didn’t like). Being that this is an item-to-item system it can work for an anonymous user who simply clicked on a first book, image, or group and want to see more things like it. We think this is a compelling proposition for the end user and leverages the collective wisdom of crowds in doing so.
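
As a rough sketch of the opt-out scheme described in the reply above (which is stated to be not yet implemented), the re-keying could look something like this. The row layout, function names and blocklist are assumptions for illustration only, not WSRelater's design.

```python
import uuid


def opt_out(rows, person_hash, blocked):
    """Re-key one person's rows to a random identifier and block future data."""
    # The random identifier is not derived from the hash, and the mapping
    # between the two is never stored.
    replacement = uuid.uuid4().hex
    blocked.add(person_hash)
    return [(replacement if pid == person_hash else pid, item)
            for pid, item in rows]


def accept(rows, blocked, person_hash, item):
    """Store a new row unless the person has opted out completely."""
    if person_hash in blocked:
        return False
    rows.append((person_hash, item))
    return True


# Hypothetical rows: (hashed email, item viewed).
rows = [("hash-a", "group 1"), ("hash-a", "group 2"), ("hash-b", "discussion 3")]
blocked = set()

rows = opt_out(rows, "hash-a", blocked)
print(any(pid == "hash-a" for pid, _ in rows))        # → False
print(accept(rows, blocked, "hash-a", "group 4"))     # → False
```

In this sketch the behaviour rows survive for aggregate recommendations, but nothing stored links them back to the hashed address any more.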

--

Many thanks for the quick response Paul, and I like it.

Love this bit:

"We would love to have this “console” interoperate with existing efforts like Attention Trust, or perhaps even BE Attention Trust. Why rebuild if it is already there?"

I was hoping you'd say something like this. You're totally on the right track here.

I look forward to seeing others' views on WSRelater and on the privacy questions around it.

Alex.

Tags: attention attentiontrust attention.xml