HTML5 does databases


The HTML5* specification has been cooking for a while and lately the amount of buzz around it has been growing at full speed. Just search for #HTML5 on Twitter and you’ll see what I mean. Even a quick look makes it evident that the next version of HTML aims to go much further into the application space than earlier ones. Not only are there a lot of highly anticipated presentation features such as <video> and <canvas>, but there are also several APIs for things that applications do, from background work (with Web Workers) to direct communication (with Web Sockets) to offline support (with the App Cache) and databases (with Web SQL Database and/or Indexed Database API).


I’ve always been attracted to things that bring data and the Web together. So a while back, when I first saw browsers and databases in the same spec, I had to get involved. A bunch of us at Microsoft are interested in HTML5 from different angles, and we now have good momentum to explore this space. So I’m now spending a good chunk of time on the database aspects of HTML5: the API, the developer story, etc.


(btw – no, I haven’t given up on Astoria or OData, I’m still plenty busy with that…but hey, how bad could it be to add a whole subject area to the work schedule 🙂 )


Current state of things


There are currently two proposals for databases in the browser: Web SQL Database and Indexed Database API.


Web SQL Database was the first to appear. It consists of a relatively small JavaScript API that allows developers to execute SQL statements. You do most things through SQL, including schema management, querying, updates, etc. While the spec doesn’t (or didn’t for a while) specifically say anything about the actual dialect of SQL to be used, in practice early implementations used SQLite, and thus directly exposed the SQLite SQL dialect and other details specific to it.


The second and more recent proposal is now called Indexed Database API (formerly WebSimpleDB), and it exposes an API in terms of ordered tables of JavaScript objects. You get or put JavaScript objects and use a key to identify and order them. For database people, this is basically an ISAM API with JavaScript objects as the record format. You can create indexes to speed up lookups or scans in particular orders. Other than that there is no schema (any clonable JavaScript object will do) and no query language.
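
To make the ISAM idea a bit more concrete, here is a minimal in-memory sketch of an ordered store with get/put by key, an ordered scan and a simple index. It is only a toy model of the concept: the names OrderedStore, createIndex, put, get, delete, scan and lookup are made up for illustration and are not the Indexed Database API itself, keys are restricted to strings, and a shallow copy stands in for real object cloning.

```typescript
// Toy in-memory model of an ISAM-style store. Every name here is
// hypothetical; this is not the Indexed Database API itself.
type Key = string;
type Rec = { [field: string]: unknown };

class OrderedStore {
  private records = new Map<Key, Rec>();
  // indexed property -> (property value -> primary keys holding that value)
  private indexes = new Map<string, Map<unknown, Set<Key>>>();

  // Declare an index on one property; existing records are indexed too.
  createIndex(prop: string): void {
    const index = new Map<unknown, Set<Key>>();
    this.indexes.set(prop, index);
    this.records.forEach((rec, key) => this.addToIndex(index, prop, key, rec));
  }

  // Insert or overwrite the record stored under `key`. No schema is checked.
  put(key: Key, value: Rec): void {
    this.delete(key); // drop stale index entries if the key already exists
    const copy: Rec = { ...value }; // shallow copy stands in for cloning
    this.records.set(key, copy);
    this.indexes.forEach((index, prop) => this.addToIndex(index, prop, key, copy));
  }

  // Fetch one record by its primary key.
  get(key: Key): Rec | undefined {
    return this.records.get(key);
  }

  // Remove a record and its index entries; a missing key is ignored.
  delete(key: Key): void {
    const old = this.records.get(key);
    if (old === undefined) return;
    this.indexes.forEach((index, prop) => index.get(old[prop])?.delete(key));
    this.records.delete(key);
  }

  // Return every record in ascending key order (a full ordered scan).
  scan(): Rec[] {
    return Array.from(this.records.keys()).sort().map((k) => this.records.get(k) as Rec);
  }

  // Use an index to find records whose `prop` equals `value`, in key order.
  lookup(prop: string, value: unknown): Rec[] {
    const keys = this.indexes.get(prop)?.get(value);
    if (keys === undefined) return [];
    return Array.from(keys).sort().map((k) => this.records.get(k) as Rec);
  }

  private addToIndex(index: Map<unknown, Set<Key>>, prop: string, key: Key, rec: Rec): void {
    const value = rec[prop];
    let keys = index.get(value);
    if (keys === undefined) {
      keys = new Set<Key>();
      index.set(value, keys);
    }
    keys.add(key);
  }
}

// Usage: plain objects in, ordered records out, no SQL anywhere.
const books = new OrderedStore();
books.createIndex("author");
books.put("b2", { title: "Dune", author: "Herbert" });
books.put("a1", { title: "Emma", author: "Austen" });
books.put("c3", { title: "Persuasion", author: "Austen" });

const titlesInKeyOrder = books.scan().map((r) => r.title); // Emma, Dune, Persuasion
const austenTitles = books.lookup("author", "Austen").map((r) => r.title); // Emma, Persuasion
```

Note that nothing in the sketch knows what a "book" is: the store only orders records by key and maintains whatever indexes it was asked for, which is the sense in which there is no schema and no query language.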


The “right” level of abstraction


Of course there are “sides” in this debate about the right database API. I’ll just be upfront and take mine: I like the Indexed Database API better. I have two main motivations for this:


Interoperability: this thing is going to be part of HTML and has to be implementable by multiple browser vendors and still be fully interoperable. Implementing multiple SQL databases and making them fully interoperable is extremely challenging. You would have to line up not only the SQL syntax, but also catalog names, type semantics, execution behavior and perhaps even isolation model and optimization strategies if you wanted really, really similar behavior across browsers. Of course most of this is already described in specifications such as SQL-92 (or SQL:1999, or any of the newer ones); however, databases often don’t follow all of it for various (good or not) reasons. The ISAM-style API imposes substantially simpler requirements on the underlying implementation, making it a good candidate for independent, interoperable implementations.


Diversity: if you look at what folks out there have done with JavaScript toolkits, it’s pretty amazing. From jQuery to Dojo, they’ve created different abstractions over the base browser APIs, providing diversity of choice for developers writing Web applications. A simple and low level API enables this. Folks can write libraries that provide whatever abstraction they see fit. This includes anything from simple shims to full query support, in whatever language you think is right for the browser. With a low level API that’s consistent across all browsers, and all the fancy abstractions built on top in JavaScript, the goal of portability becomes clearly reachable.
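
As a rough illustration of the “build it on top” idea, here is a minimal sketch of a small query layer written purely as library code over a low level store. Everything in it is an assumption made for the example: LowLevelStore stands in for whatever the browser would expose, MemoryStore is an in-memory stand-in so the sketch runs on its own, and Query with its where and orderBy methods is a hypothetical library, not a real toolkit.

```typescript
// Hypothetical low level interface: a library only needs ordered access
// to the stored objects. This is not a real browser API.
interface LowLevelStore<T> {
  put(key: string, value: T): void;
  scan(): T[]; // all values in ascending key order
}

// In-memory stand-in for the browser-provided store.
class MemoryStore<T> implements LowLevelStore<T> {
  private data = new Map<string, T>();

  put(key: string, value: T): void {
    this.data.set(key, value);
  }

  scan(): T[] {
    return Array.from(this.data.keys()).sort().map((k) => this.data.get(k) as T);
  }
}

// Library code: a tiny fluent query layer that relies on scan() alone.
class Query<T> {
  private store: LowLevelStore<T>;
  private filters: Array<(item: T) => boolean>;

  constructor(store: LowLevelStore<T>, filters: Array<(item: T) => boolean> = []) {
    this.store = store;
    this.filters = filters;
  }

  // Add a filter; returns a new Query so queries can be reused and chained.
  where(predicate: (item: T) => boolean): Query<T> {
    return new Query(this.store, this.filters.concat([predicate]));
  }

  // Run the query and sort the matches by a numeric rank, lowest first.
  orderBy(rank: (item: T) => number): T[] {
    const matches = this.store.scan().filter((item) => this.filters.every((f) => f(item)));
    return matches.sort((a, b) => rank(a) - rank(b));
  }
}

// Usage: the "query language" here is just JavaScript functions.
type Book = { title: string; year: number };
const store = new MemoryStore<Book>();
store.put("b2", { title: "Dune", year: 1965 });
store.put("a1", { title: "Emma", year: 1815 });
store.put("c3", { title: "Neuromancer", year: 1984 });

const modernTitles = new Query(store)
  .where((b) => b.year > 1900)
  .orderBy((b) => -b.year) // newest first
  .map((b) => b.title); // Neuromancer, Dune
```

A different library could offer a SQL-like string syntax or a document-store API over the same two low level operations, which is the kind of diversity a small base API leaves room for.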


The guiding principle can be summarized as: build into the standard only the things that you cannot build on top (there are a few exceptions for very common idioms, of course; those are, well…exceptions).


Folks from Mozilla share this perspective, which is encouraging. Nikunj, the author of the Indexed Database API, is from Oracle, and supports this sort of by definition, given that he’s the editor of the spec. You can see more discussion about this in the WebApps working group list archive, including this post where I outlined Microsoft’s take.


We’re exploring


We are working on understanding the API, providing feedback to the W3C WebApps working group, and creating experimental implementations to explore the space, try it out in applications and discover good and bad things about it.


-pablo


* NOTE: I use HTML5 here somewhat loosely to mean the “next round of HTML technology”. There are actually several specs involved in addition to the core HTML5 spec itself.

Comments (5)

  1. Steven Oldner says:

    Well reasoned and I agree with your take.  Don’t spec the final product, but allow others to build on top of a base.  Hope they listen to you!

  2. Jay Godse says:

    SQLite is a public-domain database and it is very robust. It implements most of SQL92. It is the most widely deployed database engine. It has a small footprint. It uses many tricks for crash-proofing. Microsoft could just take SQLite and put it into IE, and incur no IP liability, and get a full working SQL database in the browser.

    Multiple interoperable implementations make sense when the owner of each implementation closes his code base and only opens the APIs. SQLite has no such issue. You can download the code and tests and see if it all works. Then you can embed it into your browser. Interoperability is not needed because the implementation is in the public domain. Everybody can just copy it. Furthermore, because it is in the public domain, you do not have to open the code base of your derivative work (IE in this case).

    RDBMS engines are a great way to encode data that represents solution domains with complex relationships among entities, because it enables you to structure such data for better data integrity and therefore less code and fewer bugs. Enterprise data folks have proven this for a long time. SQLite brings this value to the browser client.

    Now, if what you really want is a way to passivate your existing Javascript data structures, then perhaps IndexDB makes sense. i.e. IndexDB will become a simple way to cache your JS data structures and DOMs. However, I believe that you can do much more with local storage.

    I think that people will eventually discover that you can make perfectly good standalone web apps using SQLite, HTML, CSS, Javascript, jQuery, and perhaps jQTouch. This is not unlike the desktop stack of years ago that used Access, VB, and Microsoft's UI to build nice desktop apps. And since WebKit is gaining market share (e.g. Google Chrome, Safari, iPhone, Blackberry, Android), I think that SQLite will live as the local storage option long after a good IndexDB implementation is available. (Check out http://www.youtube.com/watch).

  3. Michael J. Ryan says:

    What I don't get is MongoDB already has a *really* nice API and JavaScript interface for database management.. why not just implement this as an in-browser model, with only access to a single assumed database (being the hostname for the site in question).?

  4. pabloc says:

    Michael: MongoDB does have a nice API. The tricky part is that it also has quite a bit of functionality. One of the goals of IndexedDB is to enable folks to build things like MongoDB on top of a base API that's minimalistic and can be implemented consistently across browsers…and since different high-level storage solutions will likely have different needs, it seemed to us that it would be best to provide the low-level infrastructure and let folks build things on top.

  5. Kyaw says:

    Interesting history about IndexedDB. I have a very thin but powerful wrapper for IndexedDB, bitbucket.org/…/ydn-db, that nicely falls back to SQLite too.