Try Not To Query All Rows


I wanted to take a look at HrQueryAllRows today. This is a popular function, especially among novice MAPI programmers. It seems ideal – you can pass your tag array, restriction, sort order all to one function and get all the rows back in one call. However, you pay a hidden price for this convenience. Let’s look at a pseudo-code implementation of the function. I’ve removed most of the error handling to make the algorithm clearer:

STDAPI
HrQueryAllRows(LPMAPITABLE ptable,
   LPSPropTagArray ptaga,
   LPSRestriction pres,
   LPSSortOrderSet psos,
   LONG crowsMax,
   LPSRowSet FAR *pprows)
{
   HRESULT   hr;
   LPSRowSet prows = NULL;
   UINT      crows = 0;
   LPSRowSet prowsT;

   *pprows = NULL;

   if (ptaga) ptable->SetColumns(ptable, ptaga, TBL_BATCH);
   if (pres)  ptable->Restrict(ptable, pres, TBL_BATCH);
   if (psos)  ptable->SortTable(ptable, psos, TBL_BATCH);

   ptable->SeekRow(ptable, BOOKMARK_BEGINNING, 0, NULL);

   if (crowsMax == 0) crowsMax = LONG_MAX;
   for (;;)
   {
      prowsT = NULL;

      // Retrieve some rows. Ask for the limit.
      hr = ptable->QueryRows(ptable, crowsMax, 0, &prowsT);

      // Add the rows just retrieved into the set we're building.
      MergeRowSets(prowsT, &prows);

      if (prowsT->cRows == 0) break;
   }

   *pprows = prows;

   return hr;
}

One thing we notice right off is that, except for the magic MergeRowSets function, this code doesn't do anything special that you couldn't implement yourself. We'll talk more about MergeRowSets later. Breaking down the algorithm, we see it first calls the trio of table setup functions: SetColumns, Restrict, and SortTable. Next, we seek to the beginning of the table. Then the fun starts: we read a bunch of rows from the table, then merge those rows with any rows we've already retrieved. We keep reading and merging until QueryRows doesn't return any more rows.


Why do it this way? Why read in batches? The answer has two parts: the first is that QueryRows doesn't have a flag to ask for all rows. You have to ask for a fixed number of rows. The second is that even if QueryRows did have a means to ask for all rows, the number of rows returned is actually determined by the provider. You can ask for 20 rows and the provider can decide, due to underlying constraints, to return only 5. So any routine that wants to read all rows has to operate in batches.


That’s the first pitfall of HrQueryAllRows – working in batches like this means you could end up waiting on the provider to communicate with its data store, possibly issuing a large number of network calls in the process.


The second pitfall is the necessary implementation of this MergeRowSets function. Each time it is called, it has to allocate larger and larger blocks of memory to hold the merged row sets. Depending on the table you’re reading, these allocations could easily exceed available memory. Even if they don’t exceed it outright, they can certainly fragment the virtual memory space of the process, lowering the size of the largest free block, which in turn increases the chance that any memory allocation could fail.


This is why the MSDN carries the following caution for HrQueryAllRows:



Tables that are typically small, such as a message store table or a provider table, usually can be safely retrieved with HrQueryAllRows. Tables at risk of being very large, such as a contents table or even a recipients table, should be traversed in subsections using the IMAPITable::QueryRows method.


We see customers’ code here all the time that uses HrQueryAllRows on tables where it should never be used, such as the contents table of an Inbox, or on the Global Address List. This code will work in test environments, where the data sets are small, but will almost certainly fail with MAPI_E_TABLE_TOO_BIG when deployed in production.


Fortunately, eliminating this function is straightforward. The tag array, restriction, and sort order parameters become calls to SetColumns, Restrict, and SortTable, which we see are easy calls to make. Looping calls to QueryRows and then looping over the results is a bit more complex. Here’s a pattern I like to use:

   LPSRowSet lpRows = NULL;
   HRESULT   hRes   = S_OK;

   for (;;)
   {
      FreeProws(lpRows);
      lpRows = NULL;
      hRes = lpMAPITable->QueryRows(
         20, // arbitrary batch size, may be tweaked
         0,  // ulFlags
         &lpRows);
      if (FAILED(hRes) || !lpRows || !lpRows->cRows) break;

      for (ULONG iCurRow = 0; iCurRow < lpRows->cRows; iCurRow++)
      {
         // Do work on lpRows->aRow[iCurRow]
      }
   }
   FreeProws(lpRows);
   lpRows = NULL;

We ask for a batch of rows, fail on any error, process them, and loop. We do NOT check that we got as many rows as we requested, only that we got something. This pattern is much gentler on your virtual memory space. Not only does it avoid the ever-growing allocations that are liable to fail, but most of the memory that is allocated will be the same size, meaning it will most likely occupy the same addresses from one iteration to the next. Hopefully, you’ll be able to incorporate this into your own code and eliminate calls to HrQueryAllRows.




Comments (7)

  1. pcunite says:

    I see you’re using MAPIFreeBuffer on lpRows. I am doing like so, am I correct?

    LPSRowSet pLPSRowSet;
    pTable->QueryRows(iCount, 0, &pLPSRowSet);
    // Do work
    if (pLPSRowSet != NULL)
    {
       FreeProws(pLPSRowSet);
       pLPSRowSet = NULL;
    }

  2. Doh! That’ll teach me to type random code in as a sample…I’ve fixed the MAPIFreeBuffer to FreeProws. 🙂

  3. So, I’ve never used HrQueryAllRows, but have had to revise my own code several times to get to more stable execution.

    One thing I don’t quite understand is the effect of using short term ids (which I understand are valid only as long as the table reference is alive) versus long term. Is there a memory/performance advantage between the two?

    Also, often I need to get just the entryid column of a table and cache those values for later use.  Do we still need to be concerned with memory load for a single column? Is it more costly to ask for PR_LONGTERM_ENTRYID_FROM_TABLE?

    Would it be better overall to leave the table reference open for a long time and move around in the table as needed versus working from a cache? How long can a table ref be left open?

  4. Short term EIDs are typically valid for your whole session. PR_LONGTERM_ENTRYID_FROM_TABLE *is* more expensive to calculate, but not by much.

    I would only keep the table open if you’re advising for notifications on it. Otherwise, get what you need and close it out when you’re done.

  5. Good to know.  Seems my experience is different regarding the short term ids.  I’ll mess with it again.

    On that topic though, will Exchange return an object faster using a short term id?  And if so, is it dramatic enough that I should change my coding to use short term ids?

    Thanks!

  6. I would expect the long term entry ID to be faster to retrieve the object, but I wouldn’t rely on that without testing.

  7. Bin Lan says:

    Hi,

    Would IMessage’s SaveChanges function cause a MAPI_E_TABLE_TOO_BIG error? I keep getting this error for messages with more than a dozen recipients, after saving about 500,000 messages into a PST.

    Regards

    Bin