Try Not To Query All Rows

I wanted to take a look at HrQueryAllRows today. This is a popular function, especially among novice MAPI programmers. It seems ideal – you can pass your tag array, restriction, sort order all to one function and get all the rows back in one call. However, you pay a hidden price for this convenience. Let’s look at a pseudo-code implementation of the function. I’ve removed most of the error handling to make the algorithm clearer:

 STDAPI
HrQueryAllRows(LPMAPITABLE ptable,
   LPSPropTagArray ptaga,
   LPSRestriction pres,
   LPSSortOrderSet psos,
   LONG crowsMax,
   LPSRowSet FAR *pprows)
{
   HRESULT   hr;
   LPSRowSet prows = NULL;
   UINT      crows = 0;
   LPSRowSet prowsT;
   *pprows = NULL;
   if (ptaga) ptable->SetColumns(ptable, ptaga, TBL_BATCH);
   if (pres)  ptable->Restrict(ptable, pres, TBL_BATCH);
   if (psos)  ptable->SortTable(ptable, psos, TBL_BATCH);
   ptable->SeekRow(ptable, BOOKMARK_BEGINNING,0, NULL);
   if (crowsMax == 0) crowsMax = LONG_MAX;
   for (;;)
   {
      prowsT = NULL;
      // Retrieve some rows. Ask for the limit.
      hr = ptable->QueryRows(ptable, crowsMax, 0, &prowsT);
      // Add the rows just retrieved into the set we're building.
      MergeRowSets(prowsT, &prows);
      if (prowsT->cRows == 0) break;
   }
   *pprows = prows;
   return hr;
}

One thing we notice right off is that except for the magic MergeRowSets function, this code doesn’t do anything special that you couldn’t implement yourself. We’ll talk more about MergeRowSets later. Breaking down the algorithm, we see it first calls the trio of table setup functions: SetColumns, Restrict, and SortTable. Next, we seek to the beginning of the table. Then the fun starts: We read a bunch of rows from the table, then merge those rows with any rows we’ve already retrieved. We keep reading and merging QueryRows doesn’t return any

Why do it this way? Why read in batches? The answer is two part: the first part is that QueryRows doesn’t have a flag to ask for all rows. You have to ask for a fixed number of rows. The second part is that even if QueryRows did have a means to ask for all rows, the number of rows returned is actually determined by the provider. You can ask for 20 rows and the provider can decide, due to underlying constraints, to only return 5. So any routine that wants to read all rows has to operate in batches.

That’s the first pitfall of HrQueryAllRows – working in batches like this means you could end up waiting on the provider to communicate with it’s data store, possibly issuing a large number of network calls in the process.

The second pitfall is the necessary implementation of this MergeRowSets function. Each time it is called, it has to allocate larger and larger blocks of memory to hold the merged row sets. Depending on the table you’re reading, these allocations could easily exceed available memory. Even if they don’t exceed it outright, they can certainly fragment the virtual memory space of the process, lowering the size of the largest free block, which in turn increases the chance that any memory allocation could fail.

This is why the MSDN carries the following caution for HrQueryAllRows:

Tables that are typically small, such as a message store table or a provider table, usually can be safely retrieved with HrQueryAllRows. Tables at risk of being very large, such as a contents table or even a recipients table, should be traversed in subsections using the IMAPITable::QueryRows method.

We see customer’s code here all the time that uses HrQueryAllRows on tables where it should never be used, such as the contents table of an Inbox, or on the Global Address List. This code will work in test environments, where the data sets are small, but will almost certainly fail with MAPI_E_TABLE_TOO_BIG when deployed in production.

Fortunately, eliminating this function is straightforward. The tag array, restriction, and sort order parameters become calls to SetColumns, Restrict, and SortTable, which we see are easy calls to make. Looping calls to QueryRows and then looping over the results is a bit more complex. Here’s a pattern I like to use:

    for (;;)
   {
      hRes = S_OK;
      FreeProws(lpRows);
      lpRows = NULL;
      hRes = lpMAPITable->QueryRows(
         20, // arbitrary value, may be tweaked
         NULL,
         &lpRows);
      if (FAILED(hRes) || !lpRows || !lpRows->cRows) break;
      ULONG iCurRow;
      for (iCurRow = 0; iCurRow<lpRows->cRows; iCurRow++)
      {
         // Do work on lpRows->aRow[iCurRow]
      }
   }
   FreeProws(lpRows);
   lpRows = NULL;

We ask for a batch of rows, fail on any error, process them, and loop. We do NOT check that we got as many rows as we requested, only that we got something. This pattern is much gentler on your virtual memory space. Not only does it not require large memory allocations that will almost certainly fail, most of the memory that is allocated will be the same size, meaning it will most likely occupy the same addresses from one iteration to the next. Hopefully, you’ll be able to incorporate this into your own code and eliminate calls to HrQueryAllRows.