Scrubbing UserId in Windows Azure Mobile Services

First, many thanks to Chris Risner for the assistance on this solution!   Chris is part of the corp DPE team and has does an extensive amount of work with Windows Azure Mobile Services (WAMS) – including this session at //build, which was a great resource for getting started.

If you go through the demo of getting started with WAMS building a TodoList, the idea is that the data in the todo list is locked down to each user.   One of the nice things about WAMS is that it’s easy to enforce this via server side javascript … for example, to ensure only the current user’s rows are returned, the following read script can be used that enforces the rows returned only belong to the current user:

 function read(query, user, request) {
   query.where({ userId: user.userId });    
   request.execute();
}

If we crack open the database, we’ll see that the userId is an identifier, like the below for a Microsoft Account:

MicrosoftAccount:0123456789abcd

When the app connects to WAMS, the data returned includes the userId … for example, if we look at the JSON in fiddler:

image

The app never displays this information, and it is requested over SSL, but it’s an important consideration and here’s why.   What if we have semi-public data?   In the next version of Dark Skies, I allow users to pin favorite spots on the map.  The user has the option to make those points public or keep them private … for example, maybe they pin a great location for stargazing and want to share it with the world:

image

… Or, maybe the user pins their home locations or a private farm they have permission to use, where it might be inappropriate to show publically.

Now here comes the issue:  if a location is shared publically, that userId is included in the JSON results.  Let’s say I launch the app and see 10 public pins.  If I view the JSON in fiddler, I’ll see the userId for each one of those public pins – for example:

image

Now, the userId contains no personally identifiable information.   Is this a big deal, then?   It’s not like it is the user’s name or address, and it would only be included in spots the user is sharing publically anyway.

But, if a hacker ever finds a way to map a userId back to a specific person, this is a security issue.  Even my app doesn’t know who the users really are, it just knows the identifier.  Still, I think from a best practice/threat modeling perspective, if we can scrub that data, we should.  Note: this issue doesn’t exist with the todo list example, because the user only, and ever, sees their own data.

Ideally, what we’d like to do is return the userId if it’s the current user’s userId.  If the point belongs to another user, we should scrub that from the result set.   To do this via a read script in WAMS, we could do something like:

 function read(query, user, request) {
   
   request.execute( {
    success: function(results) {
        
         //scrub user token      
         if (results.length > 0) 
         {
            for (var i=0; i< results.length; i++)
            {
                if (results[i].UserId != user.userId)
                {
                    results[i].UserId = 'scrubbeduser';                                                                                                                                      
                }                           
            }              
          }       
           
        request.respond();
    }
  });
}

If we look at the results in fiddler, we’ll see that I’ll get my userId for any of my points, but the userId is scrubbed if it’s another user’s points that are shared publically:

image

[Note: these locations are random spots on the map for testing.]

Doing this is a good practice.  The database of course has the correct info, but the data for public points is guaranteed to be anonymous should a vulnerability ever present itself.   The downside of this approach is the extra overhead as we’re iterating the results – but, this is fairly minor given the relatively small amounts of data.

Technical point:  In my database and classes, I use Pascal case (as a matter of preference), as you can see in the above fiddler captures, such as UserId.   In the todo example and in the javascript variables, objects are conventionally camel case.   So, if you’re using any code here, just be aware that case does matter in situations like this:

  if (results[i].UserId != user.userId) // watch casing!

Be sure they match your convention.   Since Pascal case is the standard for properties in C#, and camel case is the standard in javascript, properties in .NET can be decorated with the datamember attribute to make them consistent in both locations – something I, just as a matter of preference, prefer not to do:

 [DataMember(Name = "userId")]
public string UserId { get; set; }