Getting Acquainted with NoSQL on Windows Azure

NoSQL databases are often employed in public, massively scaled Web site scenarios, where fast fetching of relatively simple data sets matters most.

Relational databases get the nod for transactional, atomic writes, indexing of non-key columns, query optimizers, and declarative, set-oriented queries.

NoSQL databases fall into one or more of the following categories:

  • Key-value stores.
  • Document stores.
  • Wide column stores.
  • Graph databases.

This post describes the main categories of NoSQL database, provides some general guidance on when to use NoSQL, and shows how you can get started using NoSQL on Windows Azure. I’ll go in depth on how you can use MongoDB and sones GraphDB in your Azure application and explain how to get started with those technologies. I’ll also explain how two Azure offerings fit some NoSQL traits.

Key-Value Stores

Most NoSQL databases feature key-value mechanisms. A key-value pair might consist of a key like “Phone Number” that is associated with a value like “(212) 555-1212.”

Key-Value Stores can be used as collections, dictionaries, associative arrays, and caches. They work well for lists, such as product categories, individual product attributes, or shopping cart contents, and for individual values, such as color schemes, a landing page URI, or a default account number.

Values can consist of long text content, not just numeric and short string data. As such, content like comments, reviews, status messages, or even private emails can be stored in a Key-Value Store. Values can also be structured as fields, and each value can have completely different fields.
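
As a minimal sketch of these ideas, the following JavaScript uses the built-in Map as a stand-in for a Key-Value Store. The keys and values (such as cart:42 and review:7) are hypothetical and do not come from any particular product.

```javascript
// A toy key-value store backed by JavaScript's built-in Map.
// All keys and values here are hypothetical illustrations.
const store = new Map();

// A single short value, like the phone number example above.
store.set("Phone Number", "(212) 555-1212");

// A list value: shopping cart contents.
store.set("cart:42", ["SKU-1001", "SKU-2040"]);

// Values with fields; each value can have completely different fields.
store.set("review:7", { author: "pat", stars: 4, text: "Works as described." });
store.set("theme:default", { background: "#ffffff", accent: "#0078d7" });

// Lookups are always by key.
const phone = store.get("Phone Number");
const cartSize = store.get("cart:42").length;
const hasOtherCart = store.has("cart:99"); // no such key was stored

console.log(phone, cartSize, hasOtherCart);
```

Note that nothing here can answer a question like "which reviews have four stars" without scanning every value, which is the usual trade-off of key-only access.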

Document Stores

Document Stores are NoSQL databases that treat “records” or “rows” as “documents.”

Documents themselves can be addressed by unique URLs, which makes document databases automatically REST-friendly.

This means that, for example, a JavaScript function that renders HTML through its return statement could be stored in special documents called design documents. That function could be accessible via a URL. As a result, entire Web applications can be implemented in a document database: users visit a URL, code runs on the server, and content is returned via the HTTP response stream.

This HTTP and application orientation distinguishes Document Stores from Key-Value Stores.
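
The design document idea can be sketched in a few lines of JavaScript. This is a toy in-memory model, not the API of any real document database; the paths, the show field, and the handleRequest function are all hypothetical names chosen for illustration.

```javascript
// Toy document store: documents are addressed by URL-like paths.
// Paths, field names, and the "show" function are hypothetical.
const documents = new Map([
  ["/catalog/widget-1", { name: "Widget", price: 9.99 }],
  // A design document stores code as data: the function body is a string.
  ["/catalog/_design/render", {
    show: "return '<h1>' + doc.name + '</h1><p>$' + doc.price.toFixed(2) + '</p>';",
  }],
]);

// Simulates the server side: look up the document and the design document,
// run the stored function against the document, and return the HTML.
function handleRequest(docPath, designPath) {
  const doc = documents.get(docPath);
  const design = documents.get(designPath);
  if (!doc || !design) return { status: 404, body: "" };
  const render = new Function("doc", design.show);
  return { status: 200, body: render(doc) };
}

const response = handleRequest("/catalog/widget-1", "/catalog/_design/render");
const notFound = handleRequest("/catalog/no-such-item", "/catalog/_design/render");
console.log(response.status, response.body);
```

The point of the sketch is that both the data and the code that renders it live in the same store and are reached the same way, by address.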

Wide Column Stores

Wide Column Stores, also known as Column Family Stores, manage key-value pairs, but they organize their storage in a semi-schematized and hierarchical pattern.

Some of the Wide Column Store nomenclature is similar to that of RDBMS technology. For example, the keys in a Wide Column Store are referred to as columns and are stored in structures that are sometimes referred to as tables. Between the table and the column level lie various intermediate structures that vary depending on your vendor.

Although the schema within the intermediate structures can vary from row to row, tables and the intermediate structures themselves must be declared. Wide Column Stores, while they tolerate schema variation at the “leaf” column level, are not completely schema-free.

As an example, in a product catalog, we may have a collection of items, each of which has a size and a rating associated with it, and we may want to store these items together in a table.
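
The product catalog example can be sketched as follows. This is a toy model under stated assumptions: the intermediate structure is called a column family here, and the names items, details, and reviews are hypothetical. Real products differ in what they call these structures and how they declare them.

```javascript
// Toy wide column table. The table and its column families are declared up
// front; the columns inside a family can differ from row to row.
// All names (items, details, reviews) are hypothetical.
const items = {
  name: "items",
  columnFamilies: ["details", "reviews"],
  rows: new Map(),
};

// Writes one column value, rejecting column families that were never declared.
function put(table, rowKey, family, column, value) {
  if (!table.columnFamilies.includes(family)) {
    throw new Error(`Unknown column family: ${family}`);
  }
  if (!table.rows.has(rowKey)) table.rows.set(rowKey, {});
  const row = table.rows.get(rowKey);
  if (!row[family]) row[family] = {};
  row[family][column] = value;
}

put(items, "item-1", "details", "size", "M");
put(items, "item-1", "reviews", "rating", 4);
put(items, "item-2", "details", "size", "L");
put(items, "item-2", "details", "color", "red"); // extra column, only on this row

let rejected = false;
try {
  put(items, "item-2", "pricing", "list", 10); // this family was never declared
} catch (err) {
  rejected = true;
}
console.log(items.rows.get("item-2"), rejected);
```

The rejected write shows the "not completely schema-free" part: columns are flexible, but the structures that contain them are not.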

Graph Databases

Graph databases recognize entities in a business or other domain, and explicitly track the relationships between them. In the graph database world, these entities are called nodes and the relationships between them are called edges; all of these terms come from mathematical graph theory.

New edges can be added (or old ones removed) at any time, allowing one-to-many and many-to-many relationships to be expressed easily and avoiding anything like an intermediate relationship table that you might use in a relational database to accommodate many-to-many joins.

Constructs like friends, followers, degrees of separation, lists, endorsements, status messages and responses to them are very naturally accommodated in graph databases. Semantic Web data also maps quite nicely on to the graph database structure.
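
To make nodes and edges concrete, here is a small sketch that stores friendships in an adjacency map and finds friends of friends. It is a toy model rather than any graph database's API; the people and the addFriendship and friendsOfFriends functions are hypothetical.

```javascript
// Toy graph: nodes are names, edges are stored in an adjacency map.
// The people and the "friend" relationship are hypothetical.
const edges = new Map();

// Adds an undirected "friend" edge between two nodes.
function addFriendship(a, b) {
  if (!edges.has(a)) edges.set(a, new Set());
  if (!edges.has(b)) edges.set(b, new Set());
  edges.get(a).add(b);
  edges.get(b).add(a);
}

// Returns friends of friends who are not already direct friends
// (and not the starting node itself), sorted by name.
function friendsOfFriends(start) {
  const direct = edges.get(start) || new Set();
  const result = new Set();
  for (const friend of direct) {
    for (const candidate of edges.get(friend)) {
      if (candidate !== start && !direct.has(candidate)) result.add(candidate);
    }
  }
  return [...result].sort();
}

addFriendship("ann", "bob");
addFriendship("bob", "cat");
addFriendship("ann", "dan");
addFriendship("dan", "eve");
addFriendship("bob", "dan"); // a new edge can be added at any time

const suggestions = friendsOfFriends("ann");
console.log(suggestions);
```

The many-to-many relationship needs no join table: each node simply carries its own set of edges, and a two-step traversal answers the question.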

NoSQL Common Traits

Shared Legacy: MapReduce, Hadoop, BigTable and HBase. NoSQL databases often require queries to be broken up and executed across multiple repositories on different servers. At some point, the resulting segmented result sets need to be collected and unified. An approach called map-reduce acknowledges and addresses this. Specifically, the process of distributing the query across multiple agents is the Map step, and the process of coalescing the results into a single result set is the Reduce step.
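
The two steps can be sketched in plain JavaScript. In this toy version, arrays stand in for repositories on different servers, and the order data is hypothetical sample data. Real frameworks such as Hadoop add scheduling, shuffling, and fault tolerance that are not modeled here.

```javascript
// Toy map-reduce: count orders per product across several "servers".
// The shard contents are hypothetical sample data.
const shards = [
  [{ product: "widget" }, { product: "gadget" }, { product: "widget" }],
  [{ product: "gadget" }],
  [{ product: "widget" }, { product: "gizmo" }],
];

// Map step: each shard answers the query independently with a partial result.
function mapShard(orders) {
  const partial = {};
  for (const order of orders) {
    partial[order.product] = (partial[order.product] || 0) + 1;
  }
  return partial;
}

// Reduce step: coalesce the partial results into a single result set.
function reducePartials(partials) {
  const total = {};
  for (const partial of partials) {
    for (const [product, count] of Object.entries(partial)) {
      total[product] = (total[product] || 0) + count;
    }
  }
  return total;
}

const totals = reducePartials(shards.map(mapShard));
console.log(totals);
```

Because each call to mapShard touches only its own shard, the map step could run on every server at the same time; only the small partial results have to travel to wherever the reduce step runs.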

NoSQL Database Consistency: Many NoSQL databases use an “eventual consistency” model for database updates and schema changes. This means that changes made at one replica will be transmitted asynchronously to the others. That said, not all NoSQL databases use eventual consistency. Some are fully transactional. Others use an optimistic concurrency model.
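
Eventual consistency can be illustrated with a toy model of two replicas. An explicit queue stands in for the asynchronous network transfer, and all names (replicaA, replicaB, write, replicate) are hypothetical. This is a sketch of the idea, not how any particular database replicates.

```javascript
// Toy model of eventual consistency between two replicas.
// Replication is simulated with an explicit queue instead of a network.
const replicaA = new Map();
const replicaB = new Map();
const pending = []; // changes accepted by A but not yet applied to B

// A write is acknowledged as soon as replica A has it.
function write(key, value) {
  replicaA.set(key, value);
  pending.push([key, value]);
}

// Later, queued changes are shipped to replica B in order.
function replicate() {
  while (pending.length > 0) {
    const [key, value] = pending.shift();
    replicaB.set(key, value);
  }
}

write("status:1", "shipped");
const staleRead = replicaB.get("status:1"); // B has not caught up yet
replicate();
const freshRead = replicaB.get("status:1"); // the replicas have converged

console.log(staleRead, freshRead);
```

Between the write and the call to replicate, a reader that happens to hit replica B sees old data. That window is what an application built on an eventually consistent store has to tolerate.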

NoSQL Indexing: Some NoSQL databases index on little else than the keys used for rows/entities/documents and/or partitions. Others go a bit beyond this.

MongoDB

Instead of storing data in tables, as is done in a "classical" relational database, MongoDB stores data as JSON-like documents with dynamic schemas (MongoDB calls the format BSON).

MongoDB has databases, collections, and indexes, much like a traditional relational database. In some cases (databases and collections) these objects can be created implicitly; once created, however, they exist in a system catalog (db.system.namespaces, db.system.indexes).

In MongoDB you do not need to define fields (what relational databases call columns) in advance. There is no schema for fields within documents; the fields and their value datatypes can vary. In practice, you would typically store documents of the same structure within a collection.

The collection itself does not need to be defined either. The database creates a collection on the first insert. When you insert a document, MongoDB assigns it an object ID.

MongoDB on Azure

You can run a MongoDB replica set on Windows Azure. Replica set members run as Azure worker role instances. MongoDB data files are stored in an Azure Blob mounted as a cloud drive. You can use any MongoDB driver to connect to the MongoDB server instance.

Microsoft provides a tutorial on how to use MongoDB on Azure. In Node.js Web Application with Storage on MongoDB, you will learn how to:

  • Add MongoDB support to an existing Windows Azure service that was created using the Windows Azure SDK for Node.js.

  • Use npm to install the MongoDB driver for Node.js.

  • Use MongoDB within a Node.js application.

  • Run your MongoDB Node.js application locally using the Windows Azure compute emulator.

  • Publish your MongoDB Node.js application to Windows Azure.

MongoDB Queries

MongoDB queries are expressed as JSON (BSON) objects, so you can write your CRUD operations in JavaScript instead of SQL. You never have to create a database or collection explicitly. When you insert something, MongoDB creates the underlying collection and database. If you query a collection that does not exist, MongoDB treats it as an empty collection.
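
The implicit-creation behavior can be modeled with a toy in-memory class. To be clear, this is not the MongoDB driver or shell: ToyDatabase, insert, and findAll are hypothetical names, and a simple counter stands in for MongoDB's object ID. The sketch only shows the behavior described above.

```javascript
// Toy in-memory model of implicit collection creation. This is NOT the
// MongoDB driver; ToyDatabase and its methods are hypothetical.
class ToyDatabase {
  constructor() {
    this.collections = new Map();
    this.nextId = 1; // a counter stands in for MongoDB's object ID
  }

  // Inserting creates the collection on first use and assigns an _id.
  insert(collectionName, doc) {
    if (!this.collections.has(collectionName)) {
      this.collections.set(collectionName, []);
    }
    const stored = { _id: this.nextId++, ...doc };
    this.collections.get(collectionName).push(stored);
    return stored._id;
  }

  // Querying a collection that does not exist yields an empty result
  // and does not create the collection.
  findAll(collectionName) {
    return this.collections.get(collectionName) || [];
  }
}

const db = new ToyDatabase();
const countBefore = db.findAll("users").length; // nothing exists yet
const firstId = db.insert("users", { name: "Joe", a: 1 });
db.insert("users", { name: "Ann", nickname: "A" }); // different fields are fine
const countAfter = db.findAll("users").length;

console.log(countBefore, firstId, countAfter);
```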

Example Calls

Some examples:

SQL Statement                                Mongo Statement
CREATE TABLE users (a Number, b Number)      db.createCollection("users")
SELECT a,b FROM users                        db.users.find({}, {a:1,b:1})
SELECT * FROM users WHERE name LIKE '%Joe%'  db.users.find({name:/Joe/})
SELECT * FROM users WHERE a=1 AND b='q'      db.users.find({a:1,b:'q'})
SELECT COUNT(*) FROM users                   db.users.count()
CREATE INDEX myindexname ON users(name)      db.users.ensureIndex({name:1})
UPDATE users SET a=1 WHERE b='q'             db.users.update({b:'q'}, {$set:{a:1}}, false, true)
DELETE FROM users WHERE z='abc'              db.users.remove({z:'abc'})

See SQL to Mongo Mapping Chart for more examples.
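
To show how a query object selects documents, here is a toy matcher for the two simplest query shapes in the table above: exact-value fields and regular-expression fields. It is a sketch of the semantics only, not MongoDB's implementation, and the matches function and sample data are hypothetical.

```javascript
// Toy matcher for exact-value and regular-expression query fields.
// This sketches the semantics; it is not MongoDB's implementation.
function matches(doc, query) {
  return Object.entries(query).every(([field, condition]) => {
    if (condition instanceof RegExp) {
      return typeof doc[field] === "string" && condition.test(doc[field]);
    }
    return doc[field] === condition;
  });
}

// Hypothetical sample data.
const users = [
  { name: "Joe Smith", a: 1, b: "q" },
  { name: "Mary Joe", a: 2, b: "q" },
  { name: "Ann", a: 1, b: "z" },
];

// SELECT * FROM users WHERE a=1 AND b='q'
const exact = users.filter((u) => matches(u, { a: 1, b: "q" }));

// SELECT * FROM users WHERE name LIKE '%Joe%'
const like = users.filter((u) => matches(u, { name: /Joe/ }));

// SELECT * FROM users (an empty query matches every document)
const all = users.filter((u) => matches(u, {}));

console.log(exact.length, like.length, all.length);
```

Every field in the query object must match, which is why listing several fields behaves like AND in SQL.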

More Information on MongoDB

For more information on MongoDB, see MongoDB on Azure. You’ll find out more about setting it up, building an application on Azure, and deploying and running it.

sones GraphDB

According to its website, sones GraphDB is the first graph database available on Microsoft Windows Azure. Because sones GraphDB is written in C# and based on Microsoft .NET, it can run as an Azure service in its natural environment.

The sones GraphDB is an object-oriented graph data store for large amounts of highly connected, semi-structured data in a distributed environment. In contrast to classical relational databases, and also to purely object-oriented ones, its main focus is no longer the data, objects, or vertices themselves, but their (type-safe) interconnections, or edges. For example, in a large-scale social network you may be interested in the name of a user, but you are much more interested in which films that user’s friends of friends watched last summer and thought were amazing. sones says it plans to provide a large framework of graph algorithms for these problems and usage scenarios.

For more information and to get started, see sones GraphDB Wiki and Documentation.

NoSQL Using Windows Azure Storage

Andrew Brust wrote a paper for Microsoft entitled NoSQL and the Windows Azure platform -- Investigation of an Unlikely Combination, available from Microsoft Download.

In the paper, Andrew makes the case that Azure Table Storage is in fact a NoSQL database. Of the categories of NoSQL database discussed earlier, Azure Table Storage fits most snugly with Key-Value Stores.

Azure Table Storage key-value pairs are called Properties; they belong to Entities which, in turn, are organized into so-called Tables. Azure Table Storage features optimistic concurrency and, as with other NoSQL databases, is schema-free, so the properties of each entity in a table may differ.

The Windows Azure Table service is structured storage in the cloud. An application may create many tables within a storage account. A table contains a set of entities (rows). Each entity contains a set of properties. An entity can have at most 255 properties including the mandatory system properties - PartitionKey, RowKey, and Timestamp. "PartitionKey" and "RowKey" form the unique key for the entity.
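
The entity model can be sketched with a toy in-memory table. This is not the Azure SDK: ToyTable, its methods, and the sample customers are hypothetical, and the Timestamp is set locally here purely for illustration.

```javascript
// Toy model of a Table service table. PartitionKey and RowKey together
// form the unique key; other properties can differ per entity.
// This is not the Azure SDK; ToyTable and the sample data are hypothetical.
const MAX_PROPERTIES = 255; // includes PartitionKey, RowKey, and Timestamp

class ToyTable {
  constructor() {
    this.entities = new Map();
  }

  // Stores an entity under its (PartitionKey, RowKey) pair.
  insert(entity) {
    if (typeof entity.PartitionKey !== "string" || typeof entity.RowKey !== "string") {
      throw new Error("PartitionKey and RowKey are mandatory");
    }
    const stored = { ...entity, Timestamp: new Date().toISOString() };
    if (Object.keys(stored).length > MAX_PROPERTIES) {
      throw new Error("Too many properties");
    }
    const key = JSON.stringify([entity.PartitionKey, entity.RowKey]);
    if (this.entities.has(key)) throw new Error("Entity already exists");
    this.entities.set(key, stored);
  }

  get(partitionKey, rowKey) {
    return this.entities.get(JSON.stringify([partitionKey, rowKey]));
  }
}

const customers = new ToyTable();
customers.insert({ PartitionKey: "WA", RowKey: "c-001", Name: "Contoso" });
customers.insert({ PartitionKey: "WA", RowKey: "c-002", Name: "Fabrikam", Phone: "(212) 555-1212" });
customers.insert({ PartitionKey: "OR", RowKey: "c-001", Discount: 0.1 }); // same RowKey, other partition

let duplicateRejected = false;
try {
  customers.insert({ PartitionKey: "WA", RowKey: "c-001", Name: "Duplicate" });
} catch (err) {
  duplicateRejected = true;
}
console.log(customers.entities.size, duplicateRejected);
```

The third insert succeeds even though it reuses a RowKey, because uniqueness applies to the pair of keys, not to either key alone.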

For a tutorial on using Windows Azure Storage, see Windows Azure and SQL Azure Tutorials - Tutorial 1: Using Windows Azure Web Role and Windows Azure Table Service on TechNet. In the tutorial, you will learn how to:

  • Understand the process of developing a Windows Azure application
  • Understand the process of deploying an application to Windows Azure
  • Create a Web role
  • Use the Table service

Also see Windows Azure Table Storage – Not Your Father’s Database in MSDN Magazine.

For a tutorial on how Windows Azure Storage works with PHP, see Tutorial - Using Table Storage.

NoSQL Scale Using SQL Azure Federations

Cihan Biyikoglu, in his blog post The “NoSQL” Gene in SQL Azure Federations, describes how the tenets of NoSQL apply in SQL Azure Federations.

Federations in SQL Azure are a way to achieve greater scalability and performance from the database tier of your application through horizontal partitioning. One or more tables within a database are split by row and partitioned across multiple databases (Federation members). This type of horizontal partitioning is often referred to as ‘sharding’. The primary scenarios in which this is useful are those where you need to achieve scale, improve performance, or manage capacity.

A SQL Azure database can deliver scale, performance, and additional capacity through federation, and can do so dynamically with no downtime; client applications can continue accessing data during repartitioning operations with no interruption in service.
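
Horizontal partitioning by key range can be sketched as follows. This is a toy in-memory model and not the SQL Azure Federations API: the member names, the customerId key, the range boundaries, and the split function are all hypothetical, and the sketch assumes the split boundary lies inside the member's current range.

```javascript
// Toy model of horizontal partitioning (sharding) by key range.
// Member names and range boundaries are hypothetical; this is not the
// SQL Azure Federations API.
const members = [
  { name: "member-1", low: 0, high: 1000, rows: [] }, // covers [0, 1000)
  { name: "member-2", low: 1000, high: Infinity, rows: [] },
];

// Finds the member whose range [low, high) contains the key.
function memberFor(key) {
  return members.find((m) => key >= m.low && key < m.high);
}

// Routes a row to the member that owns its key.
function insertRow(row) {
  memberFor(row.customerId).rows.push(row);
}

// Splits one member at a boundary and moves the rows at or above it
// into a new member. Assumes low < boundary < high for that member.
function split(name, boundary) {
  const index = members.findIndex((m) => m.name === name);
  const old = members[index];
  const upper = {
    name: `${name}-split`,
    low: boundary,
    high: old.high,
    rows: old.rows.filter((r) => r.customerId >= boundary),
  };
  old.rows = old.rows.filter((r) => r.customerId < boundary);
  old.high = boundary;
  members.splice(index + 1, 0, upper);
}

[5, 400, 999, 1000, 2500].forEach((id) => insertRow({ customerId: id }));
split("member-1", 500);

console.log(members.map((m) => `${m.name}: ${m.rows.length} rows`));
```

Callers only ever use memberFor, so they do not need to know that a split happened; that indirection is what makes repartitioning possible without changing client code.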

Cihan submits that SQL Azure Federations embody many of the principles of NoSQL because they support the following NoSQL ideas:

Scale-out for Massive Parallelism. Federations provide the ability to take advantage of the full computational power of a cluster to parallelize processing. By federating your workload, atomic-unit focused work (a.k.a. OLTP work to many SQL-minded folks), such as “placing an order” or “shopping cart management”, gets parallelized to scale to massive concurrent user load… Little coordination is needed between nodes, so the full power of the cluster is focused on processing the user workload.

Loosened Consistency or Eventual Consistency. With federations, each federation member and atomic unit provide the familiar local consistency guarantees of ‘databases’. However, you can have different schema between federation members. That is fine in federations. Federations also push to a looser model of consistency for query results across multiple federation members.

Lightweight Local Storage Besides Reliable Storage. One NoSQL trait is arguably the ability to move processing close to the data. You can continue to use stored procedures, triggers, tables, views, indexes, and all the other objects you are used to, to take full advantage of the powerful programmability surface of SQL Azure. SQL Azure databases are not lightweight local stores, however. They are highly available, nonvolatile, replicated, and protected. And you can use tempdb, which is purely local.

Unstructured or Semi-Structured Data. SQL Azure also supports the hierarchy data type and indexing, as well as the XML data type for semi-structured data. Blob types are there for completely unstructured data.

For more information about Azure Federations, see Federations: Building Scalable, Elastic, and Multi-tenant Database Solutions with SQL Azure.

Also see TechNet and George Huey’s MSDN Magazine article Scaling Out with SQL Azure Federation.

Getting Started on Windows Azure

Start at the Windows Azure Development Center, where you can get started with .NET, Node.js, Java, PHP, and more.

Get Windows Azure

Windows Azure Training Kit includes a comprehensive set of technical content to help you learn how to use Windows Azure.

Bruce D. Kyle
ISV Architect Evangelist | Microsoft Corporation

Thanks to Andrew Brust for his paper entitled NoSQL and the Windows Azure platform -- Investigation of an Unlikely Combination.