Combining relational and NoSQL concepts in SQL Server

Intro

In this article we will see how you can combine NoSQL and relational concepts in SQL Server database.

We will see how you can transform tables to JSON objects and vice versa using FOR JSON and OPENJSON functions.

Relational vs. NoSQL approach

Traditional relational schema design follow strict normalization approach – every logical entity is stored as a separate table and there are foreign key relationships between related entities. Imagine products that may have many reviews – you should create one table for products, another for reviews, propagate foreign key from product into reviews table, and add a foreign key relationship between them, as shown in the following figure:

 

Although this is a proper database design, if you have many related entities you might end-up with a lot of tables and foreign key relationships, you will need to join tables every time you need to take product information, maintain indexes or foreign key columns to speed-up joins, etc. Example of query that reads products and related reviews is shown in following example:

SELECT Production.Product.ProductID AS ID, Production.Product.Name, 
Color, ListPrice, ReviewerName, ReviewDate, Rating
FROM Production.Product
JOIN Production.ProductReview
ON Production.ProductReview.ProductID = Production.Product.ProductID

 

Every time you need to retrieve related reviews you need to join tables and you would probably need to create additional index on foreign key column in Review table. Results are shown in the following table:

ID

Name

Color

ListPrice

ReviewerName

ReviewDate

Rating

709

Mountain Bike Socks, M

White

9.50

John Smith

2007-10-20 00:00:00.000

5

937

HL Mountain Pedal

Silver/Black

80.99

David

2007-12-15 00:00:00.000

4

937

HL Mountain Pedal

Silver/Black

80.99

Jill

2007-12-17 00:00:00.000

2

798

Road-550-W Yellow, 40

Yellow

1120.49

Laura Norman

2007-12-17 00:00:00.000

5

On the client-side you probably need to process results and merge two 937 product rows as a single object with two related reviews items (e.g. as an array with two elements). Instead of single query that joins two tables you can use two separate queries. First query would read product information ad second query would read related product reviews. Some ORM tools use this approach when they need to read related informaiton. This is not so bad unless if you have a lot of related entities (e.g. product images, product attachments, product categories, etc.) and you need to run several independent queries for a single product.

One of the reasons why NoSQL systems become popular is the fact that you can use composite objects where you can store attributes of primary entity (product in our example) with related records (product reviews) within the primary entity as an array or collection of sub-objects. As an example, in MongoDb or DocumentDb you will create one JSON document for Product and add related reviews as an array of JSON objects like in the following example:

Products collection
{
 “ProductID”:709,
 “Name”:”Mountain Bike Socks, M”,
 “Color”:”White”,
 “Reviews”:[
  {
   “Reviewer”:{
     “Name”:”John Smith”,
     “Email”:”john@fourthcoffee.com”
    },
   “ReviewDate”:”2007-10-20T00:00:00″,
   “Rating”:5,
   “ModifiedDate”:”2007-10-20T00:00:00″
  }
 ]
}

 {
 “ProductID”:798,
 “Name”:”Road-550-W Yellow, 40″,
 “Color”:”Yellow”,
 “Reviews”:[
  {
   “Reviewer”:{
     “Name”:”Laura Norman”,
      “Email”:”laura@treyresearch.net”
     },
    “ReviewDate”:”2007-12-17T00:00:00″,
    “Rating”:5,
    “ModifiedDate”:”2007-12-17T00:00:00″
    }
  ]
}

 {
 “ProductID”:937,
 “Name”:”HL Mountain Pedal”,
 “Color”:”Silver\/Black”,
 “Reviews”:[
  {
    “Reviewer”:{
      “Name”:”David”,
      “Email”:”david@graphicdesigninstitute.com”
    },
   “ReviewDate”:”2007-12-15T00:00:00″,
   “Rating”:4,
   “ModifiedDate”:”2007-12-15T00:00:00″
  },
  {
    “Reviewer”:{
       “Name”:”Jill”,
       “Email”:”jill@margiestravel.com”
    },
   “ReviewDate”:”2007-12-17T00:00:00″,
   “Rating”:2,
   “ModifiedDate”:”2007-12-17T00:00:00″
   }
 ]

 

You have simpler data model, no JOINs, no additional requests/queries/indexes, all data available in the same record. However, this model is also far from perfect. Although it is a good choice for smaller data models, in more complex models you might end-up with heavy objects, or you would need to organize objects in separate collections. In some cases you might need to join objects stored in different collections, and you would find that this is not possible (e.g. in DocumentDb) or you need to write complex map/reduce jobs for a simple join (e.g. in MongoDb).

How to transform relational schema into NoSQL in SQL server?

SQL Server 2016/Azure SQL Database introduce hybrid approach where you can choose between relational and NoSQL concepts. As an example, if you have products and their reviews you don’t need to create separate tables if you don’t want them. You can create additional columns in the primary table that will contain collection of related entities formatted as JSON arrays:

ALTER TABLE Production.Product
ADD Reviews NVARCHAR(MAX)
CONSTRAINT [Reviews are formatted as JSON] CHECK(ISJSON(Reviews)>0)

In this example, we are adding a simple text column with a constraint that specifies that reviews should be formatted as JSON (similar to NoSQL databases). There is no new syntax for JSON constraint – you can use standard check constraint with function ISJSON that checks is Reviews text formatted as a JSON object.

If we want to move related product reviews from a separate table into this column we can use a simple UPDATE query:

UPDATE Production.Product
SET Reviews = (
SELECT ReviewerName AS [Reviewer.Name],
EmailAddress AS [Reviewer.Email],
ReviewDate, Rating, ModifiedDate
FROM Production.ProductReview
WHERE Production.ProductReview.ProductID = Production.Product.ProductID
FOR JSON PATH)

Inner query fetches all related reviews, formats them as JSON documents using FOR JSON clause and stores them as JSON text in Reviews column. We can format properties in JSON document using dot syntax (e.g. Product.Name will be created as a Name property in Product object).

If you don’t have existing related table, you can insert new values as array of JSON objects, e.g.:

INSERT INTO Production.Products(Name, Reviews)
VALUES('HL Mountain Pedal',
'[
{"Reviewer":{"Name":"John Smith","E-mail":"john@fourthcoffee.com"},"Rating":5},
{"Reviewer":{"Name":"Mike"},"Rating":4}
]')

Values can be inserted as a plain.

Now, we can read products and related reviews with a single query:

SELECT ProductID, Name, Color, ListPrice, Reviews
FROM Production.Product
WHERE Reviews IS NOT NULL

Results are shown in the following table:

ProductID Name Color ListPrice Reviews
709 Mountain Bike Socks, M White 9.5 [{“Reviewer”:{“Name”:”John Smith”,”E-mail”:”john@fourthcoffee.com”},”ReviewDate”:”2007-10-20T00:00:00″,”Rating”:5,”ModifiedDate”:”2007-10-20T00:00:00″}]
798 Road-550-W Yellow, 40 Yellow 1120.49 [{“Reviewer”:{“Name”:”Laura Norman”,”E-mail”:”laura@treyresearch.net”},”ReviewDate”:”2007-12-17T00:00:00″,”Rating”:5,”ModifiedDate”:”2007-12-17T00:00:00″}]
937 HL Mountain Pedal Silver/Black 80.99

[
 {“Reviewer”:{“Name”:”David”,”E-mail”:”david@graphicdesigninstitute.com”},”ReviewDate”:”2007-12-15T00:00:00″,”Rating”:4,”ModifiedDate”:”2007-12-15T00:00:00″},
 {“Reviewer”:{“Name”:”Jill”,”E-mail”:”jill@margiestravel.com”},”ReviewDate”:”2007-12-17T00:00:00″,”Rating”:2,”ModifiedDate”:”2007-12-17T00:00:00″}
]

 

As you can see, reviews are returned as a collection of JSON objects. There is exactly one row per each product so you do not need any transformation on the client side. This is a perfect choice if your client already expects JSON format for related records (e.g. if you are using some Angular, Knockout or other template engines that inject JSON model into HTML view).

As an alternative if you want to “join” products with related reviews as in the query above, you can use following query:

SELECT ProductID, Name, Color, ListPrice, ReviewerName, ReviewDate, Rating
FROM Production.Product
CROSS APPLY OPENJSON(Reviews)
WITH ( ReviewerName nvarchar(30) '$.Reviewer.Name', ReviewDate datetime2, Rating int)
WHERE Reviews IS NOT NULL

OPENJSON table value function takes related reviews formatted as JSON and returns them as a table. In the WITH part you can specify schema of returned table. Column names match the names of properties in JSON object, and if you have nested property (e.g. Reviewer.Name) you can specify JSON path where this value can be found. CROSS APPLY joins parent row with table returned by OPENJSON functions. Results are shown in the following table:

ID

Name

Color

ListPrice

ReviewerName

ReviewDate

Rating

709

Mountain Bike Socks, M

White

9.50

John Smith

2007-10-20 00:00:00.000

5

937

HL Mountain Pedal

Silver/Black

80.99

David

2007-12-15 00:00:00.000

4

937

HL Mountain Pedal

Silver/Black

80.99

Jill

2007-12-17 00:00:00.000

2

798

Road-550-W Yellow, 40

Yellow

1120.49

Laura Norman

2007-12-17 00:00:00.000

5

In both cases you don’t need to scan two tables – all necessary results are taken with a single query.

OK, but how to update reviews?

This is important thing you would need to consider. Updating JSON collection of objects is not so efficient as updating collection of rows in a separate table. If you need to change objects in JSON you would need to transform it back to relational format, update rows, and format them again as JSON. In the following procedure is shown how can you append new review into the collection of product reviews:

CREATE PROCEDURE
AddProductReview(@productid int, @reviewerName nvarchar(30), @reviewerEmail nvarchar(30), @reviewDate datetime2, @rating int)
AS
BEGIN

WITH reviews AS
(
SELECT ProductReviews.*
FROM Production.Product
CROSS APPLY OPENJSON(Reviews)
WITH ([Reviewer.Name] nvarchar(30), [Reviewer.Email] nvarchar(30), ReviewDate datetime2, Rating int, ModifiedDate datetime2)
AS ProductReviews
WHERE ProductID = @productid
UNION ALL
SELECT @reviewerName AS [Reviewer.Name], @reviewerEmail AS [Reviewer.Email], @reviewDate AS ReviewDate, @rating AS Rating, getdate() AS ModifiedDate
)
UPDATE Production.Product
SET Reviews = (
SELECT * FROM reviews FOR JSON PATH
)
WHERE ProductID = @productid

END

 

In the first part of WITH clause we are opening JSON array, returning a table of product reviews and appending new product review row using parameters supplied to the stored procedure. Then, we are updating original product row and put new reviews formatted as JSON into Reviews column.

This is not efficient as a separate table, but this is another trade-off you need to consider. If related information are read-only (e.g. items of the sales order that is closed, name/surname or some person, names of towns, countries, continents that will unlikely be changed) you will probably not update this column. Also, if related rows are not frequently changed opening JSON and formatting it again might not be big performance impact. However, if you have product reviews that are frequently changed then you should use separate table. 

Summary

Although SQL Server is relational database you don’t need to use strict relational theory in your database design. With new JSON support that is coming in SQL Server 2016 and Azure SQL Database you can choose when to follow strict concepts of relational schema design, and when to format objects as in the NoSQL systems. SQL Server gives you flexibility to choose the best format for your applications and enables you to easily transform table data into JSON documents and JSON documents to table structure.