HTML5 does databases

The HTML5* specification has been cooking for a while, and lately the buzz around it has been growing at full speed. Just search for #HTML5 on Twitter and you'll see what I mean. Even a quick look at it makes it evident that the next version of HTML aims to go much further into the application space than earlier ones. Not only are there many eagerly awaited presentation features such as <video> and <canvas>, but there are also several APIs for things that applications do, from background work (with Web Workers) to direct communication (with Web Sockets) to offline support (with the App Cache) and databases (with Web SQL Database and/or Indexed Database API).

I've always been attracted to things that bring data and the Web together. So a while back, when I first saw browsers and databases in the same spec, I had to get involved. There are a bunch of us at Microsoft interested in HTML5 from different angles, and we now have good momentum to explore this space. So I'm now spending a good chunk of my time on the database aspects of HTML5: the API, the developer story, and so on.

(btw - no, I haven't given up on Astoria or OData, I'm still plenty busy with that...but hey, how bad could it be to add a whole subject area to the work schedule :) )

Current state of things

There are currently two proposals for databases in the browser: Web SQL Database and Indexed Database API.

Web SQL Database was the first to appear. It consists of a relatively small Javascript API that allows developers to execute SQL statements. You do most things through SQL, including schema management, querying, updates, etc. While the spec doesn't (or didn't for a while) specifically say anything about the actual dialect of SQL to be used, in practice early implementations used SQLite, and thus directly exposed the SQLite SQL dialect and other details specific to it.

The second and more recent proposal is now called Indexed Database API (formerly WebSimpleDB), and it exposes an API in terms of ordered tables of Javascript objects. You get or put Javascript objects and use a key to identify and order them. For database people, this is basically an ISAM API with Javascript objects as the record format. You can create indexes to speed up lookups or scans in particular orders. Beyond that, there is no schema (any cloneable Javascript object will do) and no query language.
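To make the ISAM comparison a bit more concrete, here is a minimal in-memory sketch of that model in Javascript. It is not the Indexed Database API: `ObjectStore`, `put`, `get`, `cursor`, `indexCursor` and `createIndex` are hypothetical names chosen for illustration, keys are assumed to be plain numbers or strings of a single type, and a JSON round-trip stands in for real structured cloning (so things like Dates would not survive). What it tries to show is the shape of the model: objects ordered by a key, point lookups, ordered range scans, and secondary indexes that keep (index value, primary key) pairs sorted.

```javascript
// Hypothetical in-memory sketch of an ISAM-style store (NOT the Indexed
// Database API): objects sorted by primary key, plus sorted secondary indexes.
// Assumes keys are numbers or strings of one type.

function compareKeys(a, b) {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Binary search for the first element that does not sort before the target;
// cmp(element) returns <0, 0 or >0 comparing that element to the target.
function lowerBound(arr, cmp) {
  let lo = 0;
  let hi = arr.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (cmp(arr[mid]) < 0) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// JSON round-trip: a rough stand-in for a real structured clone.
const clone = (v) => JSON.parse(JSON.stringify(v));

class ObjectStore {
  constructor(keyPath) {
    this.keyPath = keyPath;   // property holding the primary key
    this.records = [];        // [{ key, value }] sorted by key
    this.indexes = new Map(); // name -> { keyPath, entries: [{ indexKey, key }] }
  }

  createIndex(name, keyPath) {
    const idx = { keyPath, entries: [] };
    this.indexes.set(name, idx);
    for (const r of this.records) this._updateIndex(idx, r, true);
  }

  // Insert (add = true) or delete (add = false) one record's entry in an index.
  _updateIndex(idx, record, add) {
    const indexKey = record.value[idx.keyPath];
    if (indexKey === undefined) return; // objects lacking the property are not indexed
    const pos = lowerBound(idx.entries,
      (e) => compareKeys(e.indexKey, indexKey) || compareKeys(e.key, record.key));
    if (add) idx.entries.splice(pos, 0, { indexKey, key: record.key });
    else idx.entries.splice(pos, 1);
  }

  // Insert, or replace the object already stored under the same primary key.
  put(value) {
    const record = { key: value[this.keyPath], value: clone(value) };
    const pos = lowerBound(this.records, (x) => compareKeys(x.key, record.key));
    const old = this.records[pos];
    const replacing = old !== undefined && compareKeys(old.key, record.key) === 0;
    this.records.splice(pos, replacing ? 1 : 0, record);
    for (const idx of this.indexes.values()) {
      if (replacing) this._updateIndex(idx, old, false);
      this._updateIndex(idx, record, true);
    }
  }

  // Point lookup by primary key; returns undefined if absent.
  get(key) {
    const r = this.records[lowerBound(this.records, (x) => compareKeys(x.key, key))];
    return r !== undefined && compareKeys(r.key, key) === 0 ? clone(r.value) : undefined;
  }

  // Yield items of a sorted array whose key lies in [lower, upper]; either bound may be omitted.
  *_range(arr, keyOf, lower, upper) {
    let pos = lower === undefined ? 0 : lowerBound(arr, (x) => compareKeys(keyOf(x), lower));
    for (; pos < arr.length; pos++) {
      if (upper !== undefined && compareKeys(keyOf(arr[pos]), upper) > 0) return;
      yield arr[pos];
    }
  }

  // Ordered scan by primary key.
  *cursor(lower, upper) {
    for (const r of this._range(this.records, (x) => x.key, lower, upper)) yield clone(r.value);
  }

  // Ordered scan through a named index, by index key.
  *indexCursor(name, lower, upper) {
    const entries = this.indexes.get(name).entries;
    for (const e of this._range(entries, (x) => x.indexKey, lower, upper)) yield this.get(e.key);
  }
}

const people = new ObjectStore("id");
people.put({ id: 3, name: "Carol", city: "Seattle" });
people.put({ id: 1, name: "Alice", city: "Redmond" });
people.put({ id: 2, name: "Bob", city: "Seattle" });
people.createIndex("byCity", "city");

console.log(people.get(2).name); // Bob
console.log([...people.cursor(1, 2)].map((p) => p.name)); // [ 'Alice', 'Bob' ]
console.log([...people.indexCursor("byCity", "Seattle", "Seattle")].map((p) => p.id)); // [ 2, 3 ]
```

Note that nothing in the sketch understands a query: anything richer than a key lookup or an ordered scan over a key range is left to code running on top.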

The "right" level of abstraction

Of course, there are "sides" in this debate about the right database API. I'll just be upfront and take mine: I like the Indexed Database API better. I have two main motivations for this:

Interoperability: this thing is going to be part of HTML and has to be implementable by multiple browser vendors and still be fully interoperable. Implementing multiple SQL databases and making them fully interoperable is extremely challenging. You would have to line up not only the SQL syntax, but also catalog names, type semantics, execution behavior and perhaps even isolation model and optimization strategies if you wanted really, really similar behavior across browsers. Of course most of this is already described in specifications such as SQL-92 (or SQL:1999, or any of the newer ones); however, databases often don't follow all of it for various (good or not) reasons. The ISAM-style API imposes substantially simpler requirements on the underlying implementation, making it a good candidate for independent, interoperable implementations.

Diversity: if you look at what folks out there have done with Javascript toolkits, it's pretty amazing. From jQuery to Dojo, they've created different abstractions over the base browser APIs, giving developers who write Web applications a real diversity of choice. A simple, low-level API enables this: folks can write libraries that provide whatever abstraction they see fit, anything from simple shims to full query support, in whatever language they think is right for the browser. With a low-level API that's consistent across all browsers, and all the fancy abstractions built on top of it in Javascript, portability becomes a clearly reachable goal.
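As a hypothetical illustration of that layering, the sketch below pairs a deliberately tiny store that offers only `put` and an ordered `scan` with a small query builder written entirely on top of it. `OrderedStore`, `Query` and `from` are names made up for this example; they are not part of either spec, nor of jQuery, Dojo or any other toolkit. The point is only that `where`, `select`, `orderBy` and `take` need nothing from the store beyond an ordered scan.

```javascript
// Hypothetical low-level primitive: a store that only supports put(key, value)
// and an ordered scan by key, standing in for an ISAM-style browser API.
class OrderedStore {
  constructor() {
    this.entries = []; // [{ key, value }] kept sorted by key
  }

  put(key, value) {
    let i = this.entries.findIndex((e) => e.key >= key);
    if (i === -1) i = this.entries.length;
    if (i < this.entries.length && this.entries[i].key === key) this.entries[i] = { key, value };
    else this.entries.splice(i, 0, { key, value });
  }

  *scan() {
    for (const e of this.entries) yield e.value;
  }
}

// Hypothetical library layer: a query builder written purely in terms of
// scan(). Each step wraps the previous source in a new generator.
class Query {
  constructor(source) {
    this.source = source; // function returning an iterable of objects
  }

  where(pred) {
    const src = this.source;
    return new Query(function* () {
      for (const x of src()) if (pred(x)) yield x;
    });
  }

  select(fn) {
    const src = this.source;
    return new Query(function* () {
      for (const x of src()) yield fn(x);
    });
  }

  orderBy(keyFn) {
    const src = this.source;
    // Sorting needs the whole input, so this step materializes it.
    return new Query(() => [...src()].sort((a, b) => {
      const ka = keyFn(a);
      const kb = keyFn(b);
      return ka < kb ? -1 : ka > kb ? 1 : 0;
    }));
  }

  take(n) {
    const src = this.source;
    return new Query(function* () {
      if (n <= 0) return;
      let count = 0;
      for (const x of src()) {
        yield x;
        if (++count >= n) return; // stop pulling from the store early
      }
    });
  }

  toArray() {
    return [...this.source()];
  }
}

const from = (store) => new Query(() => store.scan());

const orders = new OrderedStore();
orders.put(101, { id: 101, customer: "Alice", total: 30 });
orders.put(103, { id: 103, customer: "Bob", total: 75 });
orders.put(102, { id: 102, customer: "Alice", total: 120 });

const aliceOrdersByTotal = from(orders)
  .where((o) => o.customer === "Alice")
  .orderBy((o) => -o.total) // largest total first
  .select((o) => o.id)
  .toArray();
console.log(aliceOrdersByTotal); // [ 102, 101 ]
```

Here `where`, `select` and `take` stream results lazily, while `orderBy` has to read everything first. Whether an operation like that deserves support from the store itself (say, a matching index) is a decision a library can make and revisit without any change to the standard.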

The guiding principle can be summarized as: build into the standard only the things that you cannot build on top (there are a few exceptions for very common idioms, of course; those are, well...exceptions).

Folks from Mozilla share this perspective, which is encouraging. Nikunj, the author of the indexed database API, is from Oracle, and supports this sort of by definition, given that he's the editor of the spec. You can see more discussion about this in the webapps working group list archive, including this post where I outlined Microsoft's take.

We're exploring

We are working on understanding the API, providing feedback to the W3C WebApps working group, and creating experimental implementations to explore the space, try it out in applications and discover good and bad things about it.

-pablo

* NOTE: I use "HTML5" loosely here to mean the "next round of HTML technology". Several specs are actually involved in addition to the core HTML5 spec itself.