Using fibers to simplify enumerators, part 1: When life is easier for the enumerator


The COM model for enumeration (enumeration objects) is biased towards making life easy for the consumer and hard for the producer. The enumeration object (producer) needs to be structured as a state machine, which can be quite onerous for complicated enumerators, for example, tree walking or composite enumeration.

On the other hand, the callback model for producer (used by most Win32 functions) is biased towards making life easy for the enumerator and hard for the consumer. This time, it is the consumer that needs to be structured as a state machine, which is more work if the consumer is doing something complicated with each callback. (And even if not, you have to create a context structure to pass state from the caller, through the enumerator, to the callback.)

For example, suppose we want to write a routine that walks a directory structure, allowing the caller to specify what to do at each decision point. Let's design this first using the callback paradigm:

#include <windows.h>
#include <shlwapi.h>
#include <stdio.h>

enum FERESULT {
 FER_CONTINUE,      // continue enumerating
                    // (if directory: recurse into it)
 FER_SKIP,          // skip this file/directory
 FER_STOP,          // stop enumerating
};

enum FEOPERATION {
 FEO_FILE,          // found a file
 FEO_DIR,           // found a directory
 FEO_LEAVEDIR,      // leaving a directory
};

typedef FERESULT (CALLBACK *FILEENUMCALLBACK)
    (FEOPERATION feo,
     LPCTSTR pszDir, LPCTSTR pszPath,
     const WIN32_FIND_DATA* pwfd,
     void *pvContext);

FERESULT EnumDirectoryTree(LPCTSTR pszDir,
    FILEENUMCALLBACK pfnCB, void* pvContext);

The design here is that the caller calls EnumDirectoryTree and provides a callback function that is informed of each file found and can decide how the enumeration should proceed.

Designing this as a callback makes life much simpler for the implementation of EnumDirectoryTree.

FERESULT EnumDirectoryTree(
    LPCTSTR pszDir,
    FILEENUMCALLBACK pfnCB, void *pvContext)
{
 FERESULT fer = FER_CONTINUE;
 TCHAR szPath[MAX_PATH];
 if (PathCombine(szPath, pszDir, TEXT("*.*"))) {
  WIN32_FIND_DATA wfd;
  HANDLE hfind = FindFirstFile(szPath, &wfd);
  if (hfind != INVALID_HANDLE_VALUE) {
   do {
    if (lstrcmp(wfd.cFileName, TEXT(".")) != 0 &&
        lstrcmp(wfd.cFileName, TEXT("..")) != 0 &&
        PathCombine(szPath, pszDir, wfd.cFileName)) {
     FEOPERATION feo = (wfd.dwFileAttributes &
                     FILE_ATTRIBUTE_DIRECTORY) ?
                     FEO_DIR : FEO_FILE;
     fer = pfnCB(feo, pszDir, szPath, &wfd, pvContext);
     if (fer == FER_CONTINUE) {
      if (feo == FEO_DIR) {
       fer = EnumDirectoryTree(szPath, pfnCB, pvContext);
       if (fer == FER_CONTINUE) {
        fer = pfnCB(FEO_LEAVEDIR, pszDir, szPath,
                    &wfd, pvContext);
       }
      }
     } else if (fer == FER_SKIP) {
      fer = FER_CONTINUE;
     }
    }
   } while (FindNextFile(hfind, &wfd));
   FindClose(hfind);
  }
 }
 return fer;
}

Note: I made no attempt to make this function at all efficient since that's not my point here. It's highly wasteful of stack space (which can cause problems when walking deep directory trees). This function also doesn't like paths deeper than MAX_PATH; fixing this is beyond the scope of this series. Nor do I worry about reparse points, which can induce infinite loops if you're not careful.

Well, that wasn't so hard to write. But that's because we made life hard for the consumer. The consumer needs to maintain state across each callback. For example, suppose you wanted to build a list of directories and their sizes (both including and excluding subdirectories).

class EnumState {
public:
 EnumState()
   : m_pdirCur(new Directory(NULL)) { }
 ~EnumState() { Dispose(); }
 FERESULT Callback(FEOPERATION feo,
    LPCTSTR pszDir, LPCTSTR pszPath,
    const WIN32_FIND_DATA* pwfd);
 void FinishDir(LPCTSTR pszDir);

private:

 struct Directory {
  Directory(Directory* pdirParent)
   : m_pdirParent(pdirParent)
   , m_ullSizeSelf(0)
   , m_ullSizeAll(0) { }
  Directory* m_pdirParent;
  ULONGLONG m_ullSizeSelf;
  ULONGLONG m_ullSizeAll;
 };
 Directory* Push();
 void Pop();
 void Dispose();

 Directory* m_pdirCur;
};

EnumState::Directory* EnumState::Push()
{
 Directory* pdir = new Directory(m_pdirCur);
 if (pdir) {
  m_pdirCur = pdir;
 }
 return pdir;
}

void EnumState::Pop()
{
 Directory* pdir = m_pdirCur->m_pdirParent;
 delete m_pdirCur;
 m_pdirCur = pdir;
}

void EnumState::Dispose()
{
 while (m_pdirCur) {
  Pop();
 }
}

void EnumState::FinishDir(LPCTSTR pszDir)
{
  m_pdirCur->m_ullSizeAll +=
    m_pdirCur->m_ullSizeSelf;
  printf("Size of %s is %I64d (%I64d)\n",
   pszDir, m_pdirCur->m_ullSizeSelf,
   m_pdirCur->m_ullSizeAll);
}

ULONGLONG FileSize(const WIN32_FIND_DATA *pwfd)
{
  return 
    ((ULONGLONG)pwfd->nFileSizeHigh << 32) +
    pwfd->nFileSizeLow;
}

FERESULT EnumState::Callback(FEOPERATION feo,
    LPCTSTR pszDir, LPCTSTR pszPath,
    const WIN32_FIND_DATA* pwfd)
{
 if (!m_pdirCur) return FER_STOP;

 switch (feo) {
 case FEO_FILE:
  m_pdirCur->m_ullSizeSelf += FileSize(pwfd);
  return FER_CONTINUE;

 case FEO_DIR:
  if (Push()) {
   return FER_CONTINUE;
  } else {
   return FER_SKIP;
  }

 case FEO_LEAVEDIR:
  FinishDir(pszPath);

 /* Propagate size into parent */
  m_pdirCur->m_pdirParent->m_ullSizeAll +=
    m_pdirCur->m_ullSizeAll;
  Pop();
  return FER_CONTINUE;

 default:
  return FER_CONTINUE;
 }
 /* notreached */
}

FERESULT CALLBACK EnumState_Callback(
    FEOPERATION feo,
    LPCTSTR pszDir, LPCTSTR pszPath,
    const WIN32_FIND_DATA* pwfd,
    void* pvContext)
{
 EnumState* pstate =
    reinterpret_cast<EnumState*>(pvContext);
 return pstate->Callback(feo, pszDir,
            pszPath, pwfd);
}

int __cdecl main(int argc, char **argv)
{
 EnumState state;
 if (EnumDirectoryTree(TEXT("."),
        EnumState_Callback,
        &state) == FER_CONTINUE) {
  state.FinishDir(TEXT("."));
 }
 return 0;
}

Boy that sure was an awful lot of typing, and what's worse, the whole structure of the program has been obscured by the explicit state management. It sure is hard to tell at a glance what this chunk of code is trying to do. Instead, you have to stare at the EnumState class and reverse-engineer what's going on.

(Yes, I could have simplified this code a little by using a built-in stack class, but as I have already noted in the context of smart pointers, I try to present these articles in "pure" C++ so people won't get into arguments about which class library is best.)

Tomorrow, we'll look at how the world would be if the function EnumDirectoryTree were spec'd out by the caller rather than the enumerator!

Comments (28)
  1. > (Yes, I could have simplified this code a little by using a built-in stack class, but as I have already noted in the context of smart pointers, I try to present these articles in "pure" C++ so people won’t get into arguments about which class library is best.)

    stack is a part of C++ Standard Library, it’s not any particular vendor’s one, thus it is "safe" to use in this context.

  2. Raymond Chen says:

    It may be the part of the standard library, but it’s still a library. Some people may like std::string, others might prefer CString, still others may prefer their own string library. (If everybody agreed on std::string then why are there so many string libraries?)

  3. John McCormick says:

    Perhaps if more people treated the standard library as a standard, more people would accept it as such. In any case, I don’t think too many people would lampoon you for using it.

    I always find this sort of mix of c-like code and c++ code a bit jarring, but then I’m not really a windows programmer. Looking forward to tomorrow’s installment!

  4. Raymond Chen says:

    True, but if I chose a specific library (even the standard one), I would lose people who weren’t familiar with (or actively disliked) that library. But everybody who programs Windows knows what TCHAR is.

  5. >If everybody agreed on std::string then why are there so many string libraries?

    So many? Besides MFC, I am not familiar with any other in widespread use.

    The only reason for these "other" libraries, including (parts of) MFC has been poor implementation (or buggy) of both C++ compiler and/or Standard Library. Things are a lot better now with VS.NET 2003 and will be even better with 2005. It’s time to start treating Standard Library as a part of C++ language, not something we might not use because we don’t like it or are not familiar with.

  6. Eric TF Bat says:

    Maybe it’s because I’ve been reading a lot of LISP code lately, but I looked at this article and my first thought was "Why is Raymond posting randomly-indented assembler code on his blog?"

    C++: Just Say AAAARRGHH OH GOD STOP AAAAARRRGGGHHH HELP PLEASE PLEASE PLEASE UUURKKK!

  7. Noob says:

    What does fibers have to do with this?

  8. Drazen,

    There are lots of reasons not to use std::string, and CString/CAtlString often provide an alternative to it.

    Just because STL’s a part of the C++ standard library doesn’t mean that it is a worthy solution for everyone – the code size bloat you get from using STL is enough to preclude its use in many scenarios (like most of the Windows core). Also the fact that STL’s localization support is mediochre (it has to be because it’s platform neutral, and localization basically requires a platform specific infrastructure (how you specify your localizable resources depends on the platform)) precludes its use in many places.

  9. Rick C says:

    Noob, this is a multi-part lecture. I’m sure Rand will get to that. Part 1 is setting a stage.

  10. MYG says:

    Noob,

    Because state is easily held on the CPU stack. A fiber is a bit like a thread in that it has its own stack but the context switching is driven by the application rather than the scheduler.

    So the idea here is to have a tasking module without the tasking overhead (or part of it, stack space is overhead): two "fibers", one a consumer, one a producer.

    Think of fibers as setjmp()/longjmp() with separate stacks.

    IIRC, there is a "fiber like" construct in later versions of VMS. I wonder if they are a Cutler decree or just a common good idea.

  11. Speaking as the owner of a technology (namely, linguistic collation) and a colleague to the owner of a technology (namely encoding / codepages) and a former owner of a technology (namely, locale based formatting) which is superior in almost every way to the correponding CRT analogue, I can tell you that there are indeed times that the CRT is not always the best solution. And there do exist many times that it is not the answer….

  12. This is OT and for that I apologise . . .

    Does anyone know where I can find information about the next standard version of C++? I remember hearing about this somewhere but have no idea what it is codenamed etcetera.

    Nice article Raymond (as per usual).

  13. Tim Smith says:

    std::auto_ptr is a perfect example of something that just because it is the standard doesn’t mean it is good.

  14. Kent:

    You can find some links about the next C++ standard here: http://www.gotw.ca/iso/

    And also check out Herb Sutter’s blog at http://www.pluralsight.com/blogs/hsutter/default.aspx

    Happy new year everyone :)

  15. >Drazen,

    >>There are lots of reasons not to use std::string, and CString/CAtlString often provide an alternative to it.

    Larry,

    you are right. But let’s put things into context here – this is supposed to be an example of a certain technique. State management is only a necessary "evil" here, it is not a meat of the post. I am simply arguing that standard stack could have been used to simplify the code sample without jeopardizing the message of the post – even better, shorter code might have improved the readibility and help push the message through.

  16. Raymond Chen says:

    Perhaps true, but when I see std:: stuff I tend to get more confused rather than less.

  17. This time, we’ll watch the enumerator lose.

  18. The std:: stuff assumes you are comfortable with exceptions. I personally believe that there are unsolvable quality problems associated with use of exceptions. Thus, even though the library seems very useful, I stay as far away from the STL libraries as possible.

    C and C++ (pre-STL) were systems programming languages which provided you with a syntax for using the underlying machine. The libraries provided were really just tokens to portability.

    With the introduction of STL, C++ is moving/has moved out of the systems programming space and is trying to seem like an application programming language like BASIC. I personally don’t see the point. If I wanted to use BASIC, with all its plusses and minuses, I’d use BASIC.

  19. asdf says:

    "an application programming language like BASIC"

    I see Microsoft doesn’t do any drug testing…

    Raymond: You need to cast the ULONGLONGs to unsigned __int64 when you pass it in to printf and you need the format to be %I64u because %I64d is the signed version.

  20. Cooney says:

    I see Microsoft doesn’t do any drug testing…

    Perhaps they do, just not the way you were thinking ;)

  21. Sig9 says:

    is part of the software industry. if you do not learn to learn (ya ya ya) you will be out of a job as fast as you can say "hello world!"

  22. A while back there was an article in MSDN magazine about wrapping up the unmaged fibers API to implement…

Comments are closed.