Storing JSON documents in SQL Database

SQL Server and Azure SQL Database have native JSON functions that enable you to parse JSON documents using the standard SQL language. You can now store JSON documents in SQL Database and query JSON data as you would in any NoSQL database. In this post we will look at the options for storing JSON documents in SQL Database.

Classic tables

The simplest way to store JSON documents in SQL Database is to create a two-column table that holds the id of the document and the content of the document:

create table WebSite.Logs (
    _id bigint primary key identity,
    log nvarchar(max)
);

This structure is equivalent to the collections that you find in classic document databases. The primary key _id is an auto-incrementing value that provides a unique identifier for every document and enables fast point lookups. This structure is a good choice for classic NoSQL scenarios where you need to get a document by id or store an updated document by id.

The nvarchar(max) type enables you to store JSON documents up to 2 GB in size. If you are sure that your JSON documents cannot be larger than 8 KB, we recommend using NVARCHAR(4000) instead of NVARCHAR(max) for performance reasons.

The table above assumes that valid JSON documents are stored in the log column. If you want to be sure that only valid JSON is placed in the log column, you can add a CHECK constraint on the column:

ALTER TABLE WebSite.Logs
    ADD CONSTRAINT [Log record should be formatted as JSON]
                   CHECK (ISJSON(log)=1)

Every time someone inserts or updates a document in the table, this constraint verifies that the JSON document is properly formatted. Note that a table without the constraint is better optimized for inserts/loads, because any JSON document coming into the collection is placed directly into the column without any processing.
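As a quick sanity check, here is a sketch of how the constraint behaves (assuming the WebSite.Logs table and the constraint above; the sample property names are invented for illustration):

INSERT INTO WebSite.Logs (log)
VALUES (N'{"severity":"P4","duration":42}');   -- well-formed JSON: accepted

INSERT INTO WebSite.Logs (log)
VALUES (N'{"severity":"P4","duration":42');    -- missing closing brace: rejected
                                               -- with a CHECK constraint violation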

Once you store your JSON documents in the table, you can use the standard T-SQL language to query them, for example:

SELECT TOP 100 JSON_VALUE(log, '$.severity'), AVG( CAST( JSON_VALUE(log, '$.duration') as float))
 FROM WebSite.Logs
 WHERE CAST( JSON_VALUE(log, '$.date') as datetime) > @datetime
 GROUP BY JSON_VALUE(log, '$.severity')
 HAVING AVG( CAST( JSON_VALUE(log, '$.duration') as float) ) > 100
 ORDER BY AVG( CAST( JSON_VALUE(log, '$.duration') as float) ) DESC

One big advantage is that you can use ANY T-SQL function and query clause to query JSON documents. SQL Database does not impose any constraints on the queries that you can use to analyze JSON documents. You just extract a value from a JSON document using the JSON_VALUE function and use it in the query like any other value.

This is the key difference between SQL Database and classic NoSQL databases – in SQL Database you will find almost any function that you would ever need to process JSON data.
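Besides JSON_VALUE, T-SQL also provides the OPENJSON table-valued function, which turns a JSON document into a rowset. As a sketch (assuming each log document contains a tags array – a property invented here for illustration), you could expand the array elements into rows:

-- Expand a (hypothetical) $.tags array into one row per tag.
SELECT l._id, t.value AS tag
FROM WebSite.Logs l
CROSS APPLY OPENJSON(l.log, '$.tags') t;

This lets you join, filter, and aggregate over values nested inside documents with ordinary relational operators.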

Indexes

If you find that your queries frequently search documents by some property (e.g. the severity property in the JSON documents), you can add a classic NONCLUSTERED index on that property to speed up the queries.

You can create a computed column that exposes JSON values from the JSON column at the specified path (e.g. the path $.severity) and create a standard index on that computed column, as in the following example:

create table WebSite.Logs (
    _id bigint primary key identity,
    log nvarchar(max),

    severity AS JSON_VALUE(log, '$.severity'),
    index ix_severity (severity)
);

The computed column used in this example is a non-persisted (virtual) column that does not add additional space to the table. It is used by the ix_severity index to improve the performance of queries like:

SELECT log
FROM Website.Logs
WHERE JSON_VALUE(log, '$.severity') = 'P4'


One important characteristic of this index is that it is collation-aware. If your original NVARCHAR column has a COLLATION property (for example, case sensitivity or Japanese language rules), the index will be organized according to the language rules associated with the NVARCHAR column. The index will also apply the case sensitivity/insensitivity rules that you want.

This might be an important feature if you are developing applications for global markets that need to use custom language rules while processing JSON documents.
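As a sketch, you could declare the log column with an explicit collation so that the index on values extracted from it follows Japanese, case-insensitive rules (Japanese_CI_AS is a built-in SQL Server collation; the table layout mirrors the earlier example):

create table WebSite.Logs (
    _id bigint primary key identity,
    log nvarchar(max) collate Japanese_CI_AS,

    severity AS JSON_VALUE(log, '$.severity'),
    index ix_severity (severity)
);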

Big tables & columnstore format

If you expect a huge number of documents in your collection, we recommend putting a CLUSTERED COLUMNSTORE index on the collection:

create sequence WebSite.LogID as bigint;
go
create table WebSite.Logs (
    _id bigint default(next value for WebSite.LogID),
    log nvarchar(max),

    INDEX cci CLUSTERED COLUMNSTORE
);

A CLUSTERED COLUMNSTORE index provides high data compression (up to 25x) that can significantly reduce your storage space requirements, lower storage costs, and increase the IO performance of your workload. CLUSTERED COLUMNSTORE indexes are also optimized for table scans and analytics on your JSON documents, so this might be the best option for log analytics.
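A typical analytic query over such a collection scans the whole table, which is exactly the access pattern that the columnstore format accelerates – for example, counting log entries per severity (assuming a severity property, as in the earlier examples):

SELECT JSON_VALUE(log, '$.severity') AS severity, COUNT(*) AS cnt
FROM WebSite.Logs
GROUP BY JSON_VALUE(log, '$.severity');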

In this example, we used a sequence object to assign values to the _id column. Both sequences and identity columns are valid options for the id column.

Frequently changing documents & memory optimized tables

If you expect a lot of update, insert, and delete operations on your collections, you can put your JSON documents in memory-optimized tables. Memory-optimized JSON collections always keep data in memory, so there is no storage IO overhead. Additionally, memory-optimized JSON collections are completely lock-free – actions performed on a document will not block any other operation.

The only thing you need to do to convert a classic collection into a memory-optimized collection is to specify the with (memory_optimized=on) option after the table definition, and you will have a memory-optimized version of the JSON collection:

create table WebSite.Logs (
    _id bigint identity primary key nonclustered,
    log nvarchar(4000)
) with (memory_optimized=on)


Memory-optimized tables are the best option for highly changing documents. When you consider memory-optimized tables, you should focus on performance, which means considering the type of every column. If you can use NVARCHAR(4000) instead of NVARCHAR(max) in your memory-optimized collections, do so, because it might drastically improve your performance.
Similar to classic tables, you can add indexes on the fields that you expose via computed columns:

create table WebSite.Logs (
    _id bigint identity primary key nonclustered,
    log nvarchar(4000),

    severity AS cast(JSON_VALUE(log, '$.severity') as tinyint) persisted,
    index ix_severity (severity)
) with (memory_optimized=on)


To maximize performance, our advice is to cast the JSON value to the smallest possible type that can hold the values of the property. In this example, the tinyint type is used.

You can also put SQL queries that update JSON documents in stored procedures with native compilation:


CREATE PROCEDURE WebSite.UpdateData(@Id int, @Property nvarchar(100), @Value nvarchar(100))
WITH SCHEMABINDING, NATIVE_COMPILATION
AS BEGIN
    ATOMIC WITH (transaction isolation level = snapshot, language = N'English')

    UPDATE WebSite.Logs
    SET log = JSON_MODIFY(log, @Property, @Value)
    WHERE _id = @Id;
END


This natively compiled procedure takes the query and creates a .dll that executes it. This is the fastest approach for querying and updating data in SQL Database.
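Calling the procedure looks like any other stored-procedure call; as a sketch, this changes the (hypothetical) severity property of document 17 to 'P1':

EXEC WebSite.UpdateData @Id = 17, @Property = N'$.severity', @Value = N'P1';

JSON_MODIFY updates the value at the given path and returns the modified document, so the whole document is rewritten in place with only the one property changed.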

Conclusion

Native JSON functions in SQL Database enable you to process JSON documents stored in SQL Database as you would in NoSQL databases. Every database (relational or NoSQL) has pros and cons when you analyze the functionality it provides for JSON data processing. The key benefits of storing JSON documents in SQL Database are the full SQL language that you can use to process the data, a variety of storage options (from columnstore for high compression and fast analytics to memory-optimized tables for lock-free processing), and mature security and internationalization features that you can simply reuse in your NoSQL scenarios.

These are some of the reasons why you should consider storing some of your JSON documents in SQL Database.