C++ Function Objects

C++ provides us with a number of ways we can shoot ourselves in the foot. If
you work as a developer on any sizeable software project, there is quite likely
a set of rules limiting which C++ language features you’re allowed
to use: no memory allocation in constructors, no operator overloading,
exceptions used only for error handling, etc. Penance for violating these
rules often resembles that for breaking the master build: an obligation to
perform some menial yet necessary task for the group and/or the possession of
some object of indistinction (“The Hat” or “The Hose”).

There are, however, reasons for flexibility where at least some of these
rules are concerned, and I’ll offer as an example some practical considerations
in favor of allowing the use of operator overloading for operator() (that’s read as “function-call
operator”). If you happen to be one of the lucky ones who work on projects with
only one or two other programmers, or even just by yourself, stick around.
’Cause function objects are cool, and I’m about to tell you why.

Those of us who write software for the Macintosh know that the world is really
divided into three: Windows, Macintosh and Unix. How do we know this? Because
we’re constantly having to manipulate the line endings of various text files.
Curiously enough, the standard Mac OS X install doesn’t have a neat little
command-line tool for converting line endings (or at least “apropos” with
various forms of the phrase “convert line endings” yields “nothing
appropriate”). There isn’t even one among the little more than two dozen
command line tools in Apple’s Developer Tools.

So, if you often find yourself having to flip the line endings of various
text files, you’ll either open them all up in, say, Xcode or BBEdit, and
manipulate the line endings by hand, or you’ll write a little command line tool
that will do what you need. The benefit of the latter is that it can handle
large numbers of files all at once.

Should you write one such tool, you’re quite likely to have three different
functions in your code that look something like this:

    /*----------------------------------------------------------------------------
        %%Function: ClnConvertToWin
        %%Contact: richs
    ----------------------------------------------------------------------------*/
    int ClnConvertToWin(InputFile &in, OutputFile &out)
    {
        char chPrev = chNul;
        int cln = 0;
        for (;;)
            {
            int ch = in.ChGet();
            if (in.FEof())
                break;
            switch(ch)
                {
            default:
                out.PutChar(ch);
                break;
            case chCR:
                out.PutChar(ch);
                if (chLF != in.ChPeek())
                    {
                    out.PutChar(chLF);
                    cln++;
                    }
                break;
            case chLF:
                if (chPrev != chCR)
                    {
                    cln++;
                    out.PutChar(chCR);
                    }
                out.PutChar(ch);
                break;
                }
            chPrev = ch;
            }
        return cln;
    }

This code, which converts an arbitrary input file to Windows’ line endings,
looks simple enough. It reads the input file one character at a time and
performs some specific action based on the character just read. It keeps
track of the number of lines that it’s converted, and returns
that count when it’s all done.

The two other versions of this function likely have the exact same loop
control and differ only by the structure of the switch statement that does the actual conversion of line
endings.

This is bad, because you now have three separate loops in your code that are
almost identical. Suppose, for example, that you move this code to a system
where the “read a character, then test for end-of-file” construct isn’t the
most efficient or robust way to read characters from a file. You now have three
separate loops of code to change, and three separate opportunities to create
bugs in the code.

In the old days, we might have resolved this problem by using function
pointers, but they’re clumsy. Also, function pointers give the compiler little
opportunity to optimize out the function-call semantics. Unless the compiler
can prove which function the pointer refers to, you’re going to be stuck with
a full procedure prologue and epilogue on every iteration through
that loop. For performance reasons, as well as maintenance reasons, we don’t
want to use function pointers in this particular application.

With C++, however, we can encapsulate the switch statement into a function
object, and put the control loop in a template function that takes as a
parameter a reference to an object that overloads operator().
The template that encapsulates the loop might look like:

    /*----------------------------------------------------------------------------
        %%Function: ClnConvertLines
        %%Contact: richs
    ----------------------------------------------------------------------------*/
    template <class CharConverter>
    int ClnConvertLines(InputFile &in, CharConverter &cnv)
    {
        int cln = 0;
        for (;;)
            {
            int ch = in.ChGet();
            if (in.FEof())
                break;
            cnv(ch, cln);
            }
        return cln;
    }

And the function object that converts arbitrary line endings to Windows
might look like:

    /*----------------------------------------------------------------------------
        %%Class: ToWin
        %%Contact: richs
    ----------------------------------------------------------------------------*/
    class ToWin
    {
    public:
        ToWin(InputFile &anIn, OutputFile &anOut) :
                in(anIn),
                out(anOut),
                chPrev(chNul) {};
        ~ToWin() {};
        void operator()(int ch, int &cln)
            {
            switch(ch)
                {
            default:
                out.PutChar(ch);
                break;
            case chCR:
                out.PutChar(ch);
                if (chLF != in.ChPeek())
                    {
                    out.PutChar(chLF);
                    cln++;
                    }
                break;
            case chLF:
                if (chPrev != chCR)
                    {
                    cln++;
                    out.PutChar(chCR);
                    }
                out.PutChar(ch);
                break;
                }
            chPrev = ch;
            };
    private:
        int chPrev;
        OutputFile &out;
        InputFile &in;
    };

With that, our original conversion function becomes:

    inline int ClnConvertToWin(InputFile &in, OutputFile &out)
    {
        ToWin cnv(in, out);
        return ClnConvertLines(in, cnv);
    }
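To try these pieces outside the original project, something has to stand in for InputFile, OutputFile and the character constants, none of which are defined in this article. The sketch below assumes simple in-memory versions of all three. Only the member names ChGet, ChPeek, FEof and PutChar come from the code above; their exact behavior here is my guess.

```cpp
#include <cstddef>
#include <string>

// Assumed values; the article never defines these constants.
const int chNul = 0;
const int chCR = '\r';
const int chLF = '\n';

// Hypothetical in-memory stand-in for InputFile.
class InputFile
{
public:
    explicit InputFile(const std::string &s) : text(s), pos(0), fEof(false) {}
    // Returns the next character, or -1 (setting the EOF flag) when none remain.
    int ChGet()
    {
        if (pos >= text.size()) { fEof = true; return -1; }
        return static_cast<unsigned char>(text[pos++]);
    }
    // Returns the next character without consuming it, or -1 at the end.
    int ChPeek() const
    {
        return pos < text.size() ? static_cast<unsigned char>(text[pos]) : -1;
    }
    bool FEof() const { return fEof; }
private:
    std::string text;
    std::size_t pos;
    bool fEof;
};

// Hypothetical in-memory stand-in for OutputFile.
class OutputFile
{
public:
    void PutChar(int ch) { text.push_back(static_cast<char>(ch)); }
    std::string text;
};

// The shared loop, as in the article.
template <class CharConverter>
int ClnConvertLines(InputFile &in, CharConverter &cnv)
{
    int cln = 0;
    for (;;)
    {
        int ch = in.ChGet();
        if (in.FEof())
            break;
        cnv(ch, cln);
    }
    return cln;
}

// The converter, as in the article (members declared in initializer order).
class ToWin
{
public:
    ToWin(InputFile &anIn, OutputFile &anOut) : in(anIn), out(anOut), chPrev(chNul) {}
    void operator()(int ch, int &cln)
    {
        switch (ch)
        {
        default:
            out.PutChar(ch);
            break;
        case chCR:
            out.PutChar(ch);
            if (chLF != in.ChPeek()) { out.PutChar(chLF); cln++; }
            break;
        case chLF:
            if (chPrev != chCR) { cln++; out.PutChar(chCR); }
            out.PutChar(ch);
            break;
        }
        chPrev = ch;
    }
private:
    InputFile &in;
    OutputFile &out;
    int chPrev;
};

inline int ClnConvertToWin(InputFile &in, OutputFile &out)
{
    ToWin cnv(in, out);
    return ClnConvertLines(in, cnv);
}
```

Fed the text "a", CR, "b", LF, "c", CR LF, "d", this version writes CR LF after "a", "b" and "c" and returns 2, because only the first two line endings actually needed converting.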

I should point out that there is no a priori reason for ClnConvertLines to be a
template. We could have defined a
base class, CharConverter,
that virtualized operator(),
and made ToWin a subclass
of CharConverter. In this particular case, however, the
virtualized base class approach isn’t any better than the old-style, function
pointer approach. In fact, on some
systems, it’s worse, because you have the double-dereference through an object’s
v-table instead of the single dereference of a function pointer.
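A rough sketch of that virtual alternative follows, assuming std::string for input and output so that it stands on its own. CharConverter is the base class named above; MacToWinVirtual and ClnConvertLinesVirtual are hypothetical names chosen for this example.

```cpp
#include <string>

// Abstract base class: operator() is virtual, so the loop needs no template.
class CharConverter
{
public:
    virtual ~CharConverter() {}
    virtual void operator()(int ch, int &cln) = 0;
};

// Hypothetical subclass: converts lone CRs to CR LF, appending to a string.
class MacToWinVirtual : public CharConverter
{
public:
    explicit MacToWinVirtual(std::string &anOut) : out(anOut) {}
    virtual void operator()(int ch, int &cln)
    {
        out.push_back(static_cast<char>(ch));
        if (ch == '\r')
        {
            out.push_back('\n');
            cln++;
        }
    }
private:
    std::string &out;
};

// A single, non-template loop. Every cnv(...) call is a virtual call,
// dispatched at run time through the object's v-table.
int ClnConvertLinesVirtual(const std::string &in, CharConverter &cnv)
{
    int cln = 0;
    for (std::string::size_type i = 0; i < in.size(); ++i)
        cnv(static_cast<unsigned char>(in[i]), cln);
    return cln;
}
```

There is only one copy of the loop in the object code, but the call inside it generally can't be inlined.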

The template-based solution yields more object code, because ClnConvertLines
gets instantiated for every different flavor of cnv object we give it, but it
is typically faster for our application. Because each instantiation knows the
concrete type of cnv at compile time, the compiler has an opportunity to
inline the overloaded operator() and optimize out the function-call
semantics. It’s one of those rare instances where we get to have our cake and
eat it too.

Now, if that weren’t cool enough, the fact that we’ve abstracted out the
actual conversion of line endings into a separate piece of source code leads to
a flexibility one wouldn’t want to entertain in the purely function-based
approach. For example, suppose we know that a particular input file has Macintosh line endings. Scanning the beginning of an input file
to figure out the existing line endings isn’t all that hard, and is well worth
the time if it greatly simplifies our inner loop. The implementation of the
line conversion from Macintosh to Windows line endings is almost trivial:

    /*----------------------------------------------------------------------------
        %%Class: MacToWin
        %%Contact: richs
    ----------------------------------------------------------------------------*/
    class MacToWin
    {
    public:
        MacToWin(OutputFile &anOut) :
                out(anOut) {};
        ~MacToWin() {};
        void operator()(int ch, int &cln)
            {
            out.PutChar(ch);
            if (ch == chCR)
                {
                out.PutChar(chLF);
                cln++;
                }
            };
    private:
        OutputFile &out;
    };

You wouldn’t entertain something like this in the purely function-based
approach, because the proliferation of code with the same loop semantics is
something you want to avoid. If having just three duplicates of that outer loop
is bad, having one for every possible known combination of input and output
line endings is that much more of a maintenance headache. With function objects, we can proliferate
to our heart’s content without increasing the level of maintenance required
should we decide to change the semantics of the loop control.
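As for scanning the beginning of a file to figure out its line endings, one way it might be done is sketched below. This is an illustration under assumptions, not part of the original tool: the LineEnding enum and LeSniff are hypothetical names, and the function looks only at the first line ending in a sample that the caller has already read into a string.

```cpp
#include <string>

// Hypothetical classification of line-ending styles.
enum LineEnding { leUnknown, leUnix, leMac, leWin };

// Returns the style of the first line ending found in the sample.
LineEnding LeSniff(const std::string &sample)
{
    for (std::string::size_type i = 0; i < sample.size(); ++i)
    {
        if (sample[i] == '\n')
            return leUnix;              // LF with no CR before it
        if (sample[i] == '\r')
        {
            if (i + 1 == sample.size())
                return leUnknown;       // CR is the last byte: need more data
            return sample[i + 1] == '\n' ? leWin : leMac;
        }
    }
    return leUnknown;                   // no line ending in the sample
}
```

A file with mixed line endings would fool this check, so the general-purpose ToWin converter is still the safe choice when the result is leUnknown or the input can't be trusted.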

By now, there’s at least one astute reader who’s thinking, “Gosh, Schaut,
flipping line endings isn’t all that different from iterating through one of
the Standard Template Library’s collection classes. Using function objects should be obvious. What’s all the fuss about?”

Such an astute reader would be absolutely correct: the way I’ve used
function objects here is almost exactly the way function objects are used in
the STL. In fact, we can take that
line of thought and extend it to the concept of an input iterator.
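To make the STL parallel concrete, here is a small sketch that uses std::for_each with a function object to count the LF characters in a string. LfCounter and ClnCountLf are names made up for this example.

```cpp
#include <algorithm>
#include <string>

// Function object that counts LF characters; the count lives in the object.
class LfCounter
{
public:
    LfCounter() : cln(0) {}
    void operator()(char ch)
    {
        if (ch == '\n')
            cln++;
    }
    int Cln() const { return cln; }
private:
    int cln;
};

int ClnCountLf(const std::string &text)
{
    // std::for_each takes the function object by value and returns it after
    // the last call, which is how the accumulated count gets back out.
    return std::for_each(text.begin(), text.end(), LfCounter()).Cln();
}
```

Here std::for_each plays the role that ClnConvertLines plays above: it owns the loop, and the function object owns what happens to each element.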

Think about how one might use a command-line tool to convert line
endings. Sometimes, you’ll want
to just invoke the tool on a single file.
Other times, you’ll want to invoke the tool on a whole bunch of files in
a single directory. On still other
occasions, you’ll want to use some complex find
command to generate a list of files in an entire directory tree, and pipe the
output of that command through the line converter’s standard input file.

So, you’ll have two distinct ways of getting a list of files to convert: as
an array of C-style strings provided on the command line or as a list of file
names coming in via your standard input file. The structure of the loop to convert files and report the
progress of that conversion to the user ought not change simply because we’re
getting a list of files in two distinctly separate ways. This problem screams for a solution
where input iterators are implemented as function objects.

I’ll leave the actual implementation of this as an exercise for the reader,
but there is one thought to consider.
The input iterator is in an outer loop, not an inner loop, and the
function that figures out which particular conversion loop to invoke is likely
to be complex enough that we wouldn’t want multiple copies of it in our object
code. In this case, I would avoid
a template-based approach in favor of defining a base class for our input
iterators where the operator()
is virtualized.

Hopefully, this will lead some of you to think more about using function
objects in your daily work—in particular, I’d want you to think that
function objects are useful outside something as complex as the Standard
Template Library. If function
objects can improve our implementation of something as mundanely simple as
flipping line endings in text files, they just have to be cool enough to use in
a wide variety of contexts.

 

Rick Schaut

Currently playing in iTunes: Sierra Leone by Derek Trucks Band

Update: Fixed the template definition for ClnConvertLines (convert to HTML entities).