Use WITH clause in OPENJSON to improve parsing performance

OPENJSON function has a WITH clause where you can specify what fields should be extracted from input JSON. This might improve performance of your queries compared to the case where you use OPENJSON without schema and later extract information from the parsed JSON using JSON_VALUE function.

I have found some examples where people using OPENJSON to parse JSON arrays and then JSON_VALUE to extract information from the JSON object.

As an example, imagine that you have JSON array that contains information about cars:

 [
  {"year":2001,"make":"ACURA","model":"CL"},
  {"year":2001,"make":"ACURA","model":"EL"},
  {"year":2001,"make":"ACURA","model":"INTEGRA"},
  {"year":2001,"make":"ACURA","model":"MDX"},
  {"year":2001,"make":"ACURA","model":"NSX"},
  {"year":2001,"make":"ACURA","model":"RL"},
  {"year":2001,"make":"ACURA","model":"TL"}
]

You can take the sample from the following URL: https://raw.githubusercontent.com/arthurkao/vehicle-make-model-data/master/json\_data.json

Let assume that we want to find number of cars with model = 'Golf'. One approach would be to parse json text and return each object in the array, then check is the model value in JSON equal 'Golf':

 SELECT count(*)
FROM OPENJSON(@json)
WHERE JSON_VALUE(value, '$.model') = 'Golf'

The problem in this approach is the fact that SQL Server parses the same JSON text twice - once when OPENJSON breaks elements from JSON array into JSON object (returned as value column), and second time when JSON_VALUE parses value returned by OPENJSON to extract model.

Another approach would be to tell OPENJSON that it needs to return model column when input JSON is parsed and then filter results by this column:

 SELECT count(*)
FROM OPENJSON(@json) WITH(model nvarchar(20) )
WHERE model = 'Golf'

In this case, input JSON will be parsed only once.

Performance comparison

Let's see in practice what are the differences:

 SET STATISTICS TIME ON
SELECT count(*) FROM OPENJSON(@json) WHERE JSON_VALUE(value, '$.model') = 'Golf'
SELECT count(*) FROM OPENJSON(@json) WITH( model nvarchar(20) ) WHERE model = 'Golf'
SET STATISTICS TIME OFF

Here are results of the queries:

 SQL Server Execution Times:
 CPU time = 656 ms, elapsed time = 651 ms.

SQL Server Execution Times:
 CPU time = 204 ms, elapsed time = 197 ms.

As you can see, WITH clause specify that OPENJSON should immediately return properties from the JSON array without second parsing. Performance of the queries might be increased 3 times if you avoid double parsing.