Storing JSON in SQL Server

Article
11/23/2015

SQL Server offers built-in functions for processing JSON text data. In this post, we will see how you can store JSON in SQL Server database.

Problem

Various systems, service, loggers, format information in JSON format. This text should be stored in database and we need a mechanism for analyzing information stored in JSON.

Solution

New SQL Server provides functions for processing JSON text. Information formatted as JSON can be stored as text in standard SQL Server columns and SQL Server provides functions that can retrieve values from these JSON objects.

Storing JSON in text columns

JSON is textual format so in SQL Server it is stored in NVARCHAR columns. The simplest table that represents collection of JSON objects is shown in the following listing:

 DROP TABLE IF EXISTS Person
 
 CREATE TABLE Person 
 ( _id int identity constraint PK_JSON_ID primary key,
 value nvarchar(max)
 CONSTRAINT [Content should be formatted as JSON]
 CHECK ( ISJSON(value)>0 )
 )

This simple structure is similar to the standard NoSQL collection that you can create in NoSQL databases (e.g. Azure DocumentDB or MongoDB) where you just have key that represents ID and value that represents JSON.

Note that NVARCHAR is not just a plain text. SQL Server has built-in text compressions mechanism that can transparently compress data stored on disk. Compression depends on language and can go up to 50% depending on your data (see UNICODE compression ).

The key difference between SQL server and other plain NoSQL databases is that SQL Server enables you to use hybrid data model where you can store several JSON objects in the same “collection” and combine them with regular relational columns.

As an example, imagine that we know that every person in your collection will have FirstName and LastName, and that you can store general information about the person as one JSON object, and phone numbers/email addresses as separate objects. In SQL Server 2016 we can easily create this structure without any additional syntax:

 DROP TABLE IF EXISTS Person
 
CREATE TABLE Person (
 
 PersonID int IDENTITY PRIMARY KEY,
 
 FirstName nvarchar(100) NOT NULL,
 
 LastName nvarchar(100) NOT NULL,
 
 AdditionalInfo nvarchar(max) NULL,
 
 PhoneNumbers nvarchar(max) NULL,
 
 EmailAddresses nvarchar(max) NULL
 CONSTRAINT [Email addresses must be formatted as JSON array]
 CHECK ( ISJSON(EmailAddresses)>0 )
 
 )

Instead of single JSON object you can organize your data in this “collection”. If you do not want to explicitly check structure of each JSON column, you don’t need to add JSON check constraint on every column (in this example I have added CHECK constraint only on EmailAddresses column).

If you compare this structure to the standard NoSQL collection, you might notice that you will have faster access to strongly typed data (FirstName and LastName). Therefore, this solution is good choice for hybrid models where you can identify some information that are repeated across all objects, and other variable information can be stored as JSON. This way, you can combine flexibility and performance.

If you compare this structure with the schema of Person table AdventureWorks database, you might notice that we have removed many related tables.

Beside simplicity of schema, your data access operations will be simpler compared to complex relational structure. Now you can read single table instead of joining several tables. When you need to insert new person with related information (email addresses, phone numbers) you can insert a single record in one table instead of inserting one record in AdventureWorks Person table, taking identity column to find foreign key that will be used to store phones, email addresses, etc. In addition, in this model you can easily delete single person row without cascade deletes using foreign key relationships.

NoSQL databases are optimized for simple, read, insert, and delete operations – SQL Server 2016 enables you to apply the same logic in relational database.

JSON constraints

In the previous examples, we have seen how to add simple constraint that validates that text stored in the column is properly formatted. Although JSON do not have strong schema, you can also add complex constraints by combining functions that read values from JSON and standard T-SQL functions:

 ALTER TABLE Person
 ADD CONSTRAINT [Age should be number]
 CHECK ( ISNUMERIC(JSON_VALUE(value, '$.age'))>0 )
 
 ALTER TABLE Person
 ADD CONSTRAINT [Person should have skills]
 CHECK ( JSON_QUERY(value, '$.skills') IS NOT NULL)

First constraint will take the value of $.age property and check is this numeric value. Second constraint will try to find JSON object in $.skills property and verify that it exists. The following INSERT statements will fail due to the violation of constraints:

 INSERT INTO Person(value)
 VALUES ('{"age": "not a number", "skills":[]}')
 
 INSERT INTO Person(value)
 VALUES ('{"age": 35}')

Note that CHECK constraints might slow down your insert/update processes so you might avoid them if you need faster write performance.

Compressed JSON storage

If you have large JSON text you can explicitly compress JSON text using built-in COMPRESS function. In the following example compressed JSON content is stored as binary data, and we have computed column that decompress JSON as original text using DECOMPRESS function:

 CREATE TABLE Person
 
 ( _id int identity constraint PK_JSON_ID primary key,
 
 data varbinary(max),
 
 value AS CAST(DECOMPRESS(data) AS nvarchar(max))
 
 )
 
 
 
 INSERT INTO Person(data)
 
 VALUES (COMPRESS(@json))

COMPRESS and DECOMPRESS functions use standard GZip compression. If your client can handle GZip compression (e.g browser that understands gzip content), you can directly return compressed content. Note that this is performance/storage trade-off. If you frequently query compressed data you mig have slower performance because text must be decompressed each time.