Introducing Azure DocumentDB – Microsoft’s fully managed NoSQL document database service

08/21/2014

Today is an extremely exciting day as we release Microsoft Azure DocumentDB, a fully managed, JSON document database service.

DocumentDB was built from the ground up in response to the increasing demands of applications being developed here at Microsoft and by Microsoft Azure customers. We heard from customers that they need a database that can keep pace with their rapidly evolving applications – something fast, flexible and scalable. Increasingly, NoSQL databases are becoming the tool of choice for many developers, but running and managing these databases can be costly, especially at scale. We also heard that customers wanted more of the capabilities inherent to relational database systems – rich queries and transactional processing are still important. Most data stores offer extreme choices to developers – strong or eventual consistency, schema-free with limited query capabilities or schematized with rich query capabilities, transactions or scale, and so on. The fact is that numerous real-world scenarios exist between these extremes and we want to address them.

So we considered what it would take to build a massively scalable, schema-free database with rich query and transaction processing using the most ubiquitous programming language (JavaScript), data model (JSON) and transport protocol (HTTP) – that is DocumentDB.

We decided to build a database engine which makes a deep commitment to the JSON data model and JavaScript language. This singular design choice, in turn, enabled a set of distinctive capabilities, including the ability to automatically index documents without requiring any schema or secondary indices, the ability to issue SQL-based relational and hierarchical queries over heterogeneous JSON values, the ability to integrate database transactions with JavaScript exceptions, and the ability to seamlessly operate over JSON documents. Because DocumentDB is a multi-tenant database service, we have built each component of the stack with robust resource governance to ensure tenant isolation and the elastic scale of throughput and storage. As engineers, we obsess relentlessly over site reliability, high availability, performance, and scale. Finally, we believe that databases should be blazingly fast and yet safe by default.

Meeting the promise of schema-free

We wanted DocumentDB to support SQL queries over arbitrary documents without forcing the developer to create explicit schema or secondary indices or views. We wanted to give developers the freedom to rapidly iterate on application schema while preserving the ability to execute ad hoc queries. We also felt that queries should yield consistent results even when write rates are high.

Through the deep commitment to the JSON data model, DocumentDB is able to efficiently index, query and process heterogeneous documents. We designed the DocumentDB SQL language around the JavaScript type system and expression semantics, with the ability to invoke JavaScript user-defined functions (UDFs). DocumentDB’s query grammar adds document semantics and hierarchical and relational projections through a SQL dialect that is familiar to developers. This creates an efficient and natural way for you to query over JSON documents. The .NET SDK also includes a LINQ provider and we are considering native JavaScript mapping to our SQL query language.
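To make the idea of a hierarchical and relational query over heterogeneous documents concrete, here is a minimal sketch. It does not call the service: it emulates, in plain JavaScript over an in-memory array, what a query in the style of the DocumentDB SQL dialect would return. The collection name `Families`, the property names, and the sample documents are all hypothetical, and the query text is illustrative only.

```javascript
// Hypothetical, heterogeneous sample documents: no shared schema is assumed.
const families = [
  { id: "Andersen", address: { state: "WA" }, children: [{ givenName: "Jesse" }] },
  { id: "Wakefield", address: { state: "NY" }, children: [{ givenName: "Lisa" }] },
  { id: "Smith", address: { state: "WA" } }, // no "children" property at all
  { id: "Miller", address: { state: "WA" }, children: [{ givenName: "Ann" }, { givenName: "Ben" }] },
];

// Illustrative query text (an assumption about the dialect's shape):
// filter on a nested property, then join each family with its own children.
const queryText =
  'SELECT f.id, c.givenName FROM Families f JOIN c IN f.children WHERE f.address.state = "WA"';

// Plain-JavaScript emulation of that query over the in-memory array.
function runEmulatedQuery(docs) {
  return docs
    .filter((f) => f.address && f.address.state === "WA") // WHERE clause
    .flatMap((f) =>
      // JOIN c IN f.children: documents without children contribute no rows.
      (Array.isArray(f.children) ? f.children : []).map((c) => ({
        id: f.id,
        givenName: c.givenName,
      }))
    );
}

const rows = runEmulatedQuery(families);
console.log(queryText);
console.log(rows);
```

The emulation yields three rows (one for Andersen and two for Miller); the Smith document matches the filter but has no children to join, which is how a document with a different shape can coexist in the same collection without a schema change.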

We have designed the storage and indexing subsystem to serve consistent queries in the face of sustained high volumes of writes. This is accomplished using novel log-structured storage techniques for index maintenance and indexing algorithms which fully exploit SSDs. By default, all document properties are indexed and can be queried through the DocumentDB SQL query language.
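As a rough illustration of what indexing every property without a schema can mean, the sketch below builds a toy inverted index keyed by property path and value. This is not DocumentDB's indexing subsystem (the log-structured, SSD-aware design described above is not reproduced here); the `PathIndex` class, the `terms` helper and the sample documents are hypothetical names introduced for the example.

```javascript
// Flatten any JSON value into "path=value" terms, e.g. 'address.city="Seattle"'.
// Array elements are addressed by position, e.g. "tags.0".
function terms(value, path = "") {
  if (value !== null && typeof value === "object") {
    return Object.entries(value).flatMap(([key, child]) =>
      terms(child, path ? `${path}.${key}` : key)
    );
  }
  return [`${path}=${JSON.stringify(value)}`];
}

// Toy inverted index: every term maps to the set of document ids containing it.
class PathIndex {
  constructor() {
    this.postings = new Map();
  }

  // Index every property of the document; no schema is declared up front.
  add(id, doc) {
    for (const term of terms(doc)) {
      if (!this.postings.has(term)) this.postings.set(term, new Set());
      this.postings.get(term).add(id);
    }
  }

  // Equality lookup on any property path.
  lookup(path, value) {
    const ids = this.postings.get(`${path}=${JSON.stringify(value)}`);
    return ids ? [...ids] : [];
  }
}

const index = new PathIndex();
index.add("1", { type: "family", address: { city: "Seattle" } });
index.add("2", { type: "device", address: { city: "Seattle" }, firmware: 3 });
index.add("3", { type: "family", address: { city: "NYC" } });

const inSeattle = index.lookup("address.city", "Seattle");
const onFirmware3 = index.lookup("firmware", 3);
console.log(inSeattle, onFirmware3);
```

Documents 1 and 2 have different shapes, yet both are found by the `address.city` lookup, and the `firmware` property that only one document carries is queryable without any prior declaration.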

More on DocumentDB SQL Query

Crowning JavaScript as a modern day T-SQL

For years, developers have been able to rely on relational database systems for complex, transactional processing of data. As developers adopt NoSQL systems for their simplicity, speed and scale, they are often required to give up the transactional processing capabilities offered by traditional database systems. Database support for transactions provides a performant and robust programming model for dealing with concurrent changes. This can result in faster apps that are easy to maintain. We feel that support for application code execution within the database is important, but we don’t want to invent another procedural language. We want a broad set of developers to be able to write code that runs within the database, and we also want the mapping from the procedural language to JSON to be as seamless and natural as possible. So we chose JavaScript as the de facto language of DocumentDB – supported on all platforms, easy to understand, and with intrinsic support for JSON.

DocumentDB has deeply integrated JavaScript execution directly into the database engine. All execution of application JavaScript logic is sandboxed, resource governed and fully isolated. DocumentDB lets developers write stored procedures and triggers natively in JavaScript. This allows developers to write application logic which can be shipped using HTTP POST and executed directly on the storage partition within a transaction boundary. JSON can be materialized as JavaScript objects and transactions can be aborted by throwing an exception. This approach of “JavaScript as a modern day T-SQL” frees application developers from the complexity of object-relational (OR) mapping technologies.
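The link between JavaScript exceptions and transaction aborts can be sketched with a small in-memory model. The code below is not the DocumentDB server-side API: `runInTransaction` is a hypothetical harness that stages writes and commits them only if the procedure returns normally, and `transfer` is a hypothetical stored-procedure-style function written against it.

```javascript
// Hypothetical harness (NOT the DocumentDB engine): writes are staged and
// committed to the store only if the procedure finishes without throwing.
function runInTransaction(store, procedure, ...args) {
  const staged = [];
  const collection = {
    createDocument(doc) {
      staged.push(doc);
      return doc;
    },
  };
  try {
    const result = procedure(collection, ...args);
    store.push(...staged); // commit: all staged writes become visible together
    return { committed: true, result };
  } catch (err) {
    // rollback: the staged writes are simply discarded
    return { committed: false, error: err.message };
  }
}

// Stored-procedure-style logic: record both sides of a transfer atomically.
function transfer(collection, from, to, amount) {
  collection.createDocument({ account: from, delta: -amount });
  if (amount <= 0) {
    // Throwing aborts the whole transaction, including the write above.
    throw new Error("amount must be positive");
  }
  collection.createDocument({ account: to, delta: amount });
  return "ok";
}

const store = [];
const good = runInTransaction(store, transfer, "alice", "bob", 25);
const bad = runInTransaction(store, transfer, "alice", "bob", -5);
console.log(good.committed, bad.committed, store.length); // prints: true false 2
```

The second call stages one write before throwing, yet the store still holds only the two documents from the first call, which is the all-or-nothing behavior the paragraph above describes.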

More on DocumentDB JavaScript Integration

More on DocumentDB Server Side JavaScript APIs – Stored Procedures, Triggers and UDFs

Tunable consistency and predictable performance

Eventually consistent systems can offer high availability and improved performance for applications. However, as a developer, it can be very challenging to build experiences in the face of eventually consistent data. There are no promises – data can be stale and out of order. While we are strong advocates of weaker consistency models (pun intended), we want to make sure that we provide a service that gives developers predictability, especially when it comes to data consistency. Why not give you the control to make smart and predictable tradeoffs when it comes to performance and consistency?

DocumentDB offers four distinct consistency levels for reads and queries - Strong, Bounded Staleness, Session, and Eventual. These well-defined consistency levels allow you to make sound tradeoffs between consistency, availability and latency. Bounded staleness guarantees both a total ordering of writes and a maximum staleness, which makes it useful for applications dealing with time and ordered operations. Session consistency provides read-your-own-writes guarantees and can be a good match for user-centric apps. These consistency levels are backed by predictable performance levels, ensuring you can achieve consistent results for your application.
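The difference between the four levels is easier to see in a toy model. The sketch below assumes one primary and one lagging secondary replica, each tracking a log sequence number (`lsn`); all function names are hypothetical, and this is a simplification for illustration rather than a description of how the service implements these guarantees.

```javascript
// Toy replicas: a sequence number plus the data applied so far.
function makeReplica() {
  return { lsn: 0, data: new Map() };
}
const primary = makeReplica();
const secondary = makeReplica();
const backlog = []; // writes accepted by the primary but not yet replicated

function write(session, key, value) {
  primary.lsn += 1;
  primary.data.set(key, value);
  backlog.push({ lsn: primary.lsn, key, value });
  session.token = primary.lsn; // the session remembers its latest write
}

function replicateNext() {
  const entry = backlog.shift();
  if (entry) {
    secondary.data.set(entry.key, entry.value);
    secondary.lsn = entry.lsn;
  }
}

// Strong: always observe the latest committed write.
const readStrong = (key) => primary.data.get(key);

// Eventual: read the secondary as-is, however stale it may be.
const readEventual = (key) => secondary.data.get(key);

// Bounded staleness: use the secondary only if it lags by at most maxLag writes.
function readBounded(key, maxLag) {
  const source = primary.lsn - secondary.lsn <= maxLag ? secondary : primary;
  return source.data.get(key);
}

// Session: use the secondary only if it has caught up with this session's writes.
function readSession(session, key) {
  const source = secondary.lsn >= session.token ? secondary : primary;
  return source.data.get(key);
}

const session = { token: 0 };
write(session, "status", "shipped");

const beforeReplication = {
  strong: readStrong("status"),
  bounded: readBounded("status", 0),
  session: readSession(session, "status"),
  eventual: readEventual("status"),
};
replicateNext();
const eventualAfterReplication = readEventual("status");
console.log(beforeReplication, eventualAfterReplication);
```

Before replication, the strong, bounded (with a lag budget of zero) and session reads all see `"shipped"`, while the eventual read sees nothing yet; once the secondary catches up, the eventual read returns `"shipped"` as well. Note that in this model a different session, with no writes of its own, would be free to read the stale secondary.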

More on DocumentDB Consistency Levels

Seamless scale and delivered as a service

We hear frequently from customers that they don’t want to be consumed by managing, scaling and maintaining their database infrastructure. This is true for customers using relational databases as well as NoSQL databases. We feel that part of as-a-Service delivery means that developers should get fine-grained control over how much of the service they consume and that scaling should be as simple as turning a dial. If you need more, turn the dial to increase your usage. If you need less, turn the dial back down. In either case, no downtime, no fuss, no problem. Continue to scale to as much as your application needs in both database storage and request throughput.

DocumentDB is a fully managed, multi-tenant Azure service and can be configured to scale with your user base. Database accounts can be easily created through the Azure portal with capacity to serve an application’s needs today. As these needs change, you can easily add or remove capacity. DocumentDB will allocate and reserve capacity exclusively for your application – this includes high performance database storage as well as dedicated request throughput capacity. This means that you get predictable performance with the ability to elastically scale by purchasing more capacity units.

Open and approachable

The world doesn’t necessarily need more data formats, procedural languages or protocols. The learning curve for new systems can be steep, and working with new and unfamiliar tools can slow you down. As we developed DocumentDB we firmly believed that we should resist the urge to be inventive where it didn’t deliver real value to you - the developer. Our goal with DocumentDB is to eliminate any friction associated with getting data in and using the service.

Programming against DocumentDB is simple, approachable and does not require you to buy into a specific tool-chain or require custom encodings or extensions to JSON or JavaScript. All functionality including CRUD, query and JavaScript processing is exposed over a RESTful HTTP interface. By offering simple, scalable JavaScript and JSON over HTTP, DocumentDB doesn’t invent in the area of data models, application models or protocols. DocumentDB’s uniqueness is in how it embraces these standards and offers distinctive, high value capabilities on top of them.
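To show what exposing CRUD and query over plain HTTP and JSON can look like from a client's point of view, here is a minimal sketch that builds request descriptions without sending them. The verb mapping is a conventional REST mapping assumed for illustration, not taken from the service's reference; the `buildRequest` helper and the resource path are hypothetical, and authentication headers are deliberately omitted.

```javascript
// Assumed, conventional mapping from logical operations to HTTP methods.
const METHODS = {
  create: "POST",
  read: "GET",
  replace: "PUT",
  delete: "DELETE",
  query: "POST",
};

// Build a plain description of an HTTP request; nothing is sent over the network.
function buildRequest(operation, path, body) {
  const method = METHODS[operation];
  if (!method) throw new Error(`unsupported operation: ${operation}`);
  const request = { method, path, headers: { Accept: "application/json" } };
  if (body !== undefined) {
    request.headers["Content-Type"] = "application/json";
    request.body = JSON.stringify(body); // documents travel as ordinary JSON
  }
  return request;
}

// Hypothetical resource path, used only for this example.
const docsPath = "/example-db/example-collection/docs";

const createRequest = buildRequest("create", docsPath, { id: "order-1", total: 42 });
const readRequest = buildRequest("read", `${docsPath}/order-1`);
console.log(createRequest, readRequest);
```

Because the payload is ordinary JSON and the transport is ordinary HTTP, any language with an HTTP client can produce requests of this shape without a dedicated tool-chain.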

We have validated DocumentDB with first party applications at consumer scale. Today we are delighted to make DocumentDB available to you through the Azure portal. In the coming weeks we’ll post more on both how to use DocumentDB and the technical design of the various sub-systems that make up the service.

To get started, visit the Azure DocumentDB service page.

- Azure DocumentDB Team
