HealthVault: Batching up queries

When I first started using the HealthVault SDK, I wrote some code like this, based on what I had seen before:

HealthRecordSearcher searcher = PersonInfo.SelectedRecord.CreateSearcher();
HealthRecordFilter filter = new HealthRecordFilter(Height.TypeId);
searcher.Filters.Add(filter);

// GetMatchingItems() returns one collection per filter; take the first (and only) one.
HealthRecordItemCollection items = searcher.GetMatchingItems()[0];

So, what's up with indexing into the result from GetMatchingItems()? Why isn't it simpler?

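The answer is that several queries can be batched up in a single searcher, one filter per query, so that you can execute them all at once. GetMatchingItems() returns one HealthRecordItemCollection per filter, in the order the filters were added. So, if we want to, we can write the following: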

HealthRecordSearcher searcher = PersonInfo.SelectedRecord.CreateSearcher();

HealthRecordFilter filterHeight = new HealthRecordFilter(Height.TypeId);
searcher.Filters.Add(filterHeight);

HealthRecordFilter filterWeight = new HealthRecordFilter(Weight.TypeId);
searcher.Filters.Add(filterWeight); 

// Both filters are sent together; the results come back in the order the filters were added.
ReadOnlyCollection<HealthRecordItemCollection> results = searcher.GetMatchingItems();

HealthRecordItemCollection heightItems = results[0];
HealthRecordItemCollection weightItems = results[1];

A partner question today got me curious about how much batching actually helps performance, so I wrote a short test application that fetches 32 single Height values, varying how many queries are batched together. A batch size of 1 means 32 separate requests; a batch size of 32 means all of them go out in one request.

Here's what I saw:

Batch size   Time (seconds)
1            0.98
2            0.51
4            0.28
8            0.16
16           0.10
32           0.08

This is a pretty impressive result: batching 4 queries per request instead of sending them one at a time cuts the time from 0.98 to 0.28 seconds, about 3.5 times faster. Why is the difference so big?

Well, every fetch involves the following steps:

  1. The request is created on the web server
  2. It is transmitted across the net to HealthVault servers
  3. The request is decoded, executed, and a response is created
  4. It is transmitted back to the web server
  5. The web server unpacks the response

When a filter returns a small amount of data, steps 1, 3, and 5 are pretty fast, but steps 2 and 4 involve network latency, which dominates the elapsed time. Batching pays that latency once per request rather than once per query, which removes most of those chunks of time and gives us a nice speedup.
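One way to sanity-check that explanation is a simple two-term cost model: every request pays a fixed round-trip cost for steps 2 and 4, and the work in steps 1, 3, and 5 is assumed to stay roughly constant however the queries are batched. The sketch below is only that model; it is not part of the HealthVault SDK, and the names BatchCostModel, EstimateSeconds, roundTripSeconds, and workSeconds are hypothetical. The two parameters were fitted by hand to the batch-size-1 and batch-size-32 rows of the first table (about 0.029 seconds per round trip and 0.051 seconds of other work), so treat them as rough estimates for this particular test setup.

```csharp
using System;

static class BatchCostModel
{
    // Hypothetical cost model, not part of the HealthVault SDK:
    //   time = (number of requests) * (round trip for steps 2 and 4)
    //        + (work for steps 1, 3, and 5, assumed independent of batching)
    public static double EstimateSeconds(int totalQueries, int batchSize,
                                         double roundTripSeconds, double workSeconds)
    {
        if (batchSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(batchSize));

        // Requests needed when up to batchSize filters share one request (ceiling division).
        int requests = (totalQueries + batchSize - 1) / batchSize;
        return requests * roundTripSeconds + workSeconds;
    }

    static void Main()
    {
        // Rough estimates fitted by hand to the batch-size-1 and batch-size-32 rows
        // of the first table; they only describe that particular test setup.
        const double roundTrip = 0.029;
        const double work = 0.051;

        Console.WriteLine("Batch  Predicted (s)");
        foreach (int batchSize in new[] { 1, 2, 4, 8, 16, 32 })
        {
            double t = EstimateSeconds(32, batchSize, roundTrip, work);
            Console.WriteLine($"{batchSize,5}  {t:F2}");
        }
    }
}
```

Running it prints a predicted time for each batch size, and every prediction lands within about 0.01 seconds of the measured value, which is consistent with per-request latency accounting for nearly all of the difference.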

We would therefore expect batching to become less useful as each request fetches more data. Here is some data for fetching 16 items:

Batch size   Time (seconds)
1            1.40
2            0.91
4            0.66
8            0.49
16           0.42
32           0.39

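That's pretty much what you would expect: batching still eliminates the same number of round trips, but handling the larger responses adds time that batching doesn't eliminate, so the relative gain is smaller.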
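To put numbers on "less useful", the sketch below computes each batch size's speedup relative to batch size 1 for both tables. The arrays simply hold the measured times from this post; the class and method names (BatchSpeedup, Speedups) are illustrative only.

```csharp
using System;

static class BatchSpeedup
{
    static readonly int[] BatchSizes = { 1, 2, 4, 8, 16, 32 };

    // Measured times in seconds, copied from the two tables (same batch-size order).
    static readonly double[] SingleItemTimes = { 0.98, 0.51, 0.28, 0.16, 0.10, 0.08 };
    static readonly double[] SixteenItemTimes = { 1.40, 0.91, 0.66, 0.49, 0.42, 0.39 };

    // Speedup of each entry relative to the first one (batch size 1, i.e. no batching).
    public static double[] Speedups(double[] times)
    {
        var result = new double[times.Length];
        for (int i = 0; i < times.Length; i++)
        {
            result[i] = times[0] / times[i];
        }
        return result;
    }

    static void Main()
    {
        double[] single = Speedups(SingleItemTimes);
        double[] sixteen = Speedups(SixteenItemTimes);

        Console.WriteLine("Batch  1 item  16 items");
        for (int i = 0; i < BatchSizes.Length; i++)
        {
            Console.WriteLine($"{BatchSizes[i],5}  {single[i],5:F2}x  {sixteen[i],6:F2}x");
        }

        // Absolute time saved by going from batch size 1 to batch size 32.
        int last = BatchSizes.Length - 1;
        Console.WriteLine($"Saved: {SingleItemTimes[0] - SingleItemTimes[last]:F2} s vs. " +
                          $"{SixteenItemTimes[0] - SixteenItemTimes[last]:F2} s");
    }
}
```

In the first test, batching all 32 queries is about 12 times faster than no batching; in the second it is only about 3.6 times faster, even though the absolute time saved is similar (0.90 vs. 1.01 seconds), since the same 31 round trips are removed in both cases.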