Optimistic Concurrency & Data Services

DB Blogs

April 22nd, 20080 0

Different applications have different requirements around consistency and how concurrent modifications are handled. I’ll oversimplify and put all these applications in two buckets: either you care about controlling concurrent changes or you don’t.

If you’re creating a REST interface to your data and don’t care about concurrency (e.g. no deep consistency rules, or nice units of change that change in whole consistent ways), then you can use the basic HTTP methods to retrieve (GET) and manipulate (POST, PUT, DELETE) resources directly without any more context than the representations of your resources. You get “last one wins” semantics on updates in this case.

On the other hand, if you do care about concurrency in your REST interface, there are more aspects to take into consideration. If your resources are atomic (no further structure that’s interesting from the concurrent changes perspective than the resource as a whole), then you can have an out of band mechanism for creating a “version number” for each resource -typically a monotonically increasing number- and use HTTP’s existing mechanism to ensure you overwrite stuff that you know about. In HTTP you can stick an “entity tag” or ETag to your responses that contain an opaque value used to denote the version or state of a resource. Later on, when you want to modify a resource, you can use that value in a “if-match” request header to make sure that your knowledge about the state of the resource you’re modifying is still current. If it’s not the resource in the sever would have an ETag that won’t match the one you provided and you’d get back a 412 “Precondition failed” status code. All that is standard HTTP 1.1 stuff described in RFC 2616. (ETags are also used for caching and conditional gets in addition to the scenario I described here).

Now, REST data services that expose structured data have to deal with various challenges beyond the basics, which I’ll go into details below. While I discuss this in the context of the ADO.NET Data Services Framework (Astoria), I’m sure some of these problems apply to a broader set of applications.

Creating ETags: concurrency tokens

The data services framework has to deal with the fact that we don’t control the data sources that we expose through the REST interface. Sometimes each entity that we turn into a resource will have a nice clean property that’s a timestamp or similar and maps perfectly to ETag semantics (e.g. whenever we change the value in a significant way the value of this property changes). However, often the schema of the underlying data is not under the control of the service developer so we have to work with what we have. What that means in practice is that you can tell the data services framework which properties of each entity type are “concurrency tokens”. Changing those values means that you chanced the version of the resource.

The way you do that in the framework is by using an [ETag(props…)] attribute in your class or an annotation in your EDM schema. For types that don’t have any concurrency tokens we won’t generate ETags for the responses for those types, and they get “last one wins” update behavior.

Once you indicated which property or properties are your concurrency tokens we can produce ETags by using the values of those properties for the particular instance we’re returning.

During update the data services runtime works with the data source to determine whether the concurrency token values that were marshaled through ETags and if-match headers still match, and if so perform the update/delete operation. If they don’t match a 412 response is sent to the client.

Including ETags in headers and/or payloads

The HTTP spec describes the ETag response header to transport the entity tag for a given resource. That works great for us for cases were we respond with a single entity (e.g. an entry in Atom terms), but it doesn’t when we return a collection of entries from a URL (e.g. an Atom feed). For the latter scenario, we include the ETag as part of the resource representation (in the entry for Atom, in the “__metadata” property for JSON), for example:

<!– rest of the entry –>

</entry>

Validation during side-effecting operations

Concurrency tokens are validated whenever you perform an operation that affects the state of an existing resource. In the data services REST interface that means HTTP PUT and DELETE methods.

As I mentioned above, validation happens during update processing by extracting the “original” values from the ETag (which was sent back through the if-match header) and comparing them with the data in the data source. If they are the same, we consider the whole resource the same and proceed with the modification.

An interesting question is whether presenting an ETag in a if-match header should be mandatory for resources that have concurrency tokens. Put another way: should the decision of whether it’s ok to potentially overwrite changes based on state knowledge be up to the client or restricted by the server? The HTTP spec defines a special value of “*” for the if-match header that effectively means “any value will match”. The behavior that we are planning for is that if an entity type has concurrency tokens then we’ll always require an “if-match” header in modification operations. The header value can be an actual ETag obtained through a GET request or “*” meaning “I know this type supports concurrency control, but I’ll overwrite it anyway”.

Almost, but not quite, a perfect match

HTTP ETags and conditional operations are almost a perfect match to what we need to handle concurrent activity in RESTful data services. There are, however, a few glitches. This is where we get into the fine-print that’s not necessarily popular knowledge. Mike brought up many of these details I wasn’t aware of.

ETags can be “strong entity tags” or “weak entity tags”. Weak ETags are very similar to what happens when we have entities for which only some properties are designated concurrency tokens. From section 13.3.3 of the HTTP spec:

“However, there might be cases when a server prefers to change the

validator only on semantically significant changes, and not when

insignificant aspects of the entity change. A validator that does not

always change when the resource changes is a “weak validator.” “

The problem is that weak ETags only apply to GET operations, they cannot be used for PUT/DELETE which is what we’re trying to do.

For cases where you own the data, the data services framework can expose a compliant interface by using constructs such as timestamps (if using a database as a data source), where any change in the entity will reflect in the ETag. You can also used a relaxed form of ETags where the entity might change but the ETag stay the same. It’s not completely HTTP compliant and may confuse intermediate systems, but it may be your only option in some scenarios.

As always, thoughts and feedback is welcome.

Pablo Castro
Software Architect
Microsoft Corporation

This post is part of the transparent design exercise in the Astoria Team. To understand how it works and how your feedback will be used please look at this post.

Optimistic Concurrency & Data Services

DB Blogs

Read next

0 comments