Avoiding #defines for constant data and using enums instead


I think that the C preprocessor is a very powerful tool, but I like to limit my use of #defines. I have already touched on this when i talked about why I liked FORCEINLINE and I want to talk about it some more. I realize I can’t eliminate the use of #defines throughout all of my code for various reasons, among them being:



  • They can have unintended side affects. The classic example of this is #define MAX(a, b) (a >= b ? a : b) and invoking it with MAX(foo++, bar) such that foo++ is evaluated twice leading to a double increment.
  • Other headers use them to enable/disable certain pieces of functionality or functions.
  • They are the only way to wrap up other preprocessor macros like __FILE__ and __LINE__ to allow for easier logging without more pushing more typing onto the caller.
  • They are the only way you can generate code on the fly in C (a poor man’s C++ template if you will)
  • They are the only way you can use the quotify (#) or concat (##) functions within the preprocessor

The main reason I try to limit my usage of the preprocessor is that because it is a pure substitution, you generally don’t have type information available to you when debugging the application or driver. A great example of this is constant numeric values within your application. You could do this to define values in a bit field:

#define OPTION_ONE   (0x00000001)
#define OPTION_TWO (0x00000002)
#define OPTION_THREE (0x00000004)

but you will not be able to evaluate OPTION_ONE in the debugger because there is no OPTION_ONE symbol in the PDB. The preprocessor substituted OPTION_ONE with 0x0000001 before the linker was ever involved. To look up the value of OPTION_ONE you would have to open up a source file. Instead, I do the following when I have bit field values that I want to define and use within my app/driver:

#include <windows.h>

typedef enum _OPTION_FLAGS {
OptionOne = 0x00000001,
OptionTwo = 0x00000002,
OptionThree = 0x00000004,
} OPTION_FLAGS;

typedef union _OPTIONS {
ULONG Bits;

// NOTE: this is for debugging only, *always* use Bits in code
struct {
ULONG One : 1; // OptionOne
ULONG Two : 1; // OptionTwo
ULONG Three: 1; // OptionThree
} BitsByName;
} OPTIONS;

int _cdecl main(int argc, char *argv[])
{
OPTIONS o;

o.Bits = OptionTwo;

return 0;
}


For now, let’s ignore the fact that I can use protected or public to abstract how the values are set and read and that I am using an unnamed union. (Some folks feel that unnamed unions are unholy, but I find them very practical in this pattern.) There are 2 key concepts to what I am doing here:



  1. The enumeration remains in the symbols so you can dump it within the debugger [1]. Now you don’t have to look up the value in a source file.
  2. By creating a union of the true value (Bits) [3] and a bit field of the value (BitsByName) [4] I can see what each bit means without having to look at the enum at all! This means that you can see the value without digging into the structure at all if you don’t need to [2].
I have broken into the debugger right before returning.

[1] 0:000> dt OptionsFlags
OptionOne = 1
OptionTwo = 2
OptionThree= 4

[2] 0:000> dt o Bits
Local var @ 0x6ff78 Type Options
+0x000 Bits : 2

[3] 0:000> dt o BitsByName.
Local var @ 0x6ff78 Type Options
+0x000 BitsByName :
+0x000 One : 0y0
+0x000 Two : 0y1
+0x000 Three : 0y0

[4] 0:000> dt o
Local var @ 0x6ff78 Type Options
+0x000 Bits : 2
+0x000 BitsByName : Options::<unnamed-tag>

March 28, 2006: Updated the enum and union definitions to be C compliant and match my style guidelines.

Comments (27)

  1. strik says:

    Your solution has a problem: It uses undefined behaviour of the C compiler:

    1. You are not guaranteed that accessing the union elements under different names has any meaning at all, let alone a useful meaning. For example, the compiler could choose to implement a union as a struct, thus, accessing o.Bits and o.BitsByName might result in accessing different memory areas.

    2. The order in which a bit field is allocated is not defined by C, either. Do bit fields get allocated started with the highest  bits, or with the lowest bits? Do they even get allocated consecutively? I compiler might even choose to allocate them in a "random" manner (there have been educational compilers doing this).

    Thus, while your code might work as long as you use Microsoft compilers (things like these are used in MS headers, thus, I doubt MS will change that behaviour in the near future), you cannot generate completely portable code with it.

    To summarize: I think your suggestion with using the bitfield and the union are very bad programming habits, at least if you want to write C programs (and not "Microsoft C"), and should be avoided.

    The enum suggestion is another thing; this is something I would recommend.

    – Spiro

  2. doronh says:

    1)  yes, actually the start of each field in a union has to be overlayed with each other.  that is the whole point of a union.  See  http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vccore98/html/_langref_union.asp.  It says

    “A union is a user-defined data type that can hold values of different types at different times. It is similar to a structure except that all of its members start at the same location in memory.”

    Note that this is not under a Microsoft specific heading, this is part of the C standard.  

    2)  Yes, the bit field order is not defined by the C standard, but every production compiler I have ever encountered allows you to define the behavior or has a default which is documented.  I have seen code compiled with gcc rely on the ordering of bitfields, so Linux is just as dependent on its compiler as Windows is.  Besides, there is no harm if the bit field is wrong, you would realize it quickly.  Finally, purely portable code only allows for bit fields on (U)LONG/INT values, but almost every single compiler will let you get away with creating a bitfield on a char or a short.  Again, not purely portable if you use it, but any compiler is practical enough to let you do it so that you can create bit fields for things like one byte registers.

    To summarize, it is not a bad programming habit, it is just your opinion that it is bad.  Note that I don’t use the bitfield in production code (because the compiler can do more shifting around then necessary), it is only for use within the debugger.  If it offends you, don’t use the union.

  3. strik says:

    1) No, it does not. What you are citing is C++, not C. I searched C89/90 and C99, there is no mentioning of the above. Anyway, in my (Pre-ANSI) C++ reference manual (Stroustrup), I found that sentence.

    I searched C99, and found some interesting parts:

    [1] ISO/IEC 9899:1999 (E), p. 35, §6.2.5:

    "— A union type describes an overlapping nonempty set of member objects, each of which has an optionally specified name and possibly distinct type."

    Ok, here we find that these are overlapping. What do we gain from this?

    [2] ISO/IEC 9899:1999 (E), p. 38, §6.2.6.1:

    "7 When a value is stored in a member of an object of union type, the bytes of the object representation that do not correspond to that member but do correspond to other members take unspecified values, but the value of the union object shall not thereby become a trap representation."

    We are allowed to write each member without making the object invalid or a trap representation; nevertheless, when I write one member, the other member gets unspecified values.

    [3] ISO/IEC 9899:1999 (E), p. 103, §6.7.2.1:

    14 The size of a union is sufficient to contain the largest of its members. The value of at most one of the members can be stored in a union object at any time. A pointer to a union object, suitably converted, points to each of its members (or if a member is a bit-

    field, then to the unit in which it resides), and vice versa.

    O.k, here you got me. The programmer must have at least the vision that everything starts at the same position. Nevertheless, this does NOT tell you that the implementation has to do it that way. [2] clearly allows an implementation which handles it otherwise, despite the fact that the union appears to have all members start at the same position.

    However, notice the special case:

    [4] ISO/IEC 9899:1999 (E), p. 73, §6.5.2.3:

    5 One special guarantee is made in order to simplify the use of unions: if a union contains several structures that share a common initial sequence (see below), and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the complete type of the union is visible. Two structures share a common initial sequence if corresponding members have compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members.

    Thus, if we have a common initial sequence, this guarantee can be given. Anyway, in your example, this is not the case.

    (Additionally: Yes, I know that MS does not support C99; anyway, these citings are more or less the same in C89/C90).

    2) There is more than Windows and Linux. 😉 I never claimed the Linux community performs any better in this respect. Well, with autoconf/automake, they often do even worse.

    There are Microcontrollers, DSPs, and other things that also have C compilers. Many things allowed for common i386 compilers are not allowed or at least problematic for these targets. For example, when was the last time you got a "bus error" on an i386 machine? 😉

    These compilers tend to utilize much more of the undefined behaviour than "mainstream" compilers do, to be able to utilize the hardware in much better ways.

    Anyway, again, I was partially wrong: The bit-fields have to be consecutive (but padding is allowed); only the order (low-to-high or high-to-low) is not defined:

    [5] ISO/IEC 9899:1999 (E), p. 102, §6.7.2.1:

    10 An implementation may allocate any addressable storage unit large enough to hold a bit-field. If enough space remains, a bit-field that immediately follows another bit-field in a structure shall be packed into adjacent bits of the same unit. If insufficient space remains, whether a bit-field that does not fit is put into the next unit or overlaps adjacent units is implementation-defined. The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined. The alignment of the addressable storage unit is unspecified"

    To summarize: Yes, it IS bad programming habbit to rely on such undefined behaviour. Regardless of my own errors here, your code is problematic. I know (compliant) compilers for other processors where your code does not run on.

    It is good to try to write portable code as long as it is possible, and reduce the architecture specifics to the least number of code lines as possible. Anything else IS bad programming habit; I hope you will understand what I mean once you have to port code over to other, *very* different architectures.

    – Spiro.

  4. doronh says:

    I never said I was writing portable code across multiple compilers nor did I ever state that this was a goal for what I wrote.  I write Windows device drivers.  I write and have written drivers for *very* different architectures (x86, x64, IA64, MIPS, Alpha, etc).   Furthermore, I am blogging about Windows device drivers (and not about portable code).

    I would like to see a mainstream C/C++ compiler which did not start all fields at the same address.  You can bring up any number of edge cases and specialized compilers, but that is a completely different domain. Different practices and techniques apply when you are in such a specialized environment.  One size does not fit all.

    You are not seeing my point about bitfields. I have not advocated their use in production code, merely as a debugging tool.  If the bits are not laid out the same as my assumption, *no harm is done*.  If needed to construct a char/short/long full of bits, I would do it myself with masks and the correct shifting and not rely on a bit field with an arbitrary layout.

    I am finding it hard to figure how you are in a position to judge what I have done, what I do and how it applies to the practices I have developed over the years.  You have your opinion, I have mine.   I don’t view what I wrote as bad programming habit/practice, I view it as a very practical and useful technique to get my job done.  Let me reiterate, it gets my job done. By using the pattern I laid out, I spend less time debugging failures and more time doing the other parts of my job.  

  5. strik says:

    First of all, let me state that I never intended to offend you, and I appreciate your opinion and your blog very much. I know you know much more about many areas; this is the reason why I read your blog despite reading you in newsgroups and mailing lists.

    Having said this, I still stand by my opinion. Ok, I should have stated "in my opinion" more clearly, but the topics still remain.

    I am surprised you call other processors (Microcontrollers, DSPs) as "edge cases", although all statistics I know tell me that the market share of these embedded devices is much bigger than the market share of even the iA32.

    There has been more than one case where the habit "it works for me, I get my job done, so I do it this way, depending on some property of the current system, but I do not document it" – obviously, exactly what you are proposing – lead to severe, costly damage of real-world hardware (I am not speaking about PCs or similar things.) People read blogs like this here – again, I LIKE your blog very much! – and think they learn also about C, but do not see the problems in this.

    Have you ever tried to teach an experienced C programmer to "forget" about these things he cannot rely upon? It is very hard.

    Although you believe the opposite, I see your point about the bit-fields as being very helpful for debuging purposes. In fact, I am using similar techniques, "just to get the job done". Anyway, I am aware of the restrictions and problems. Is any reader here aware of these, too? This is the reason why I jumped in here in the first place.

    Again, I never wanted to offend you, nor judge what you have done or are doing. Anyway, I do not like to take these things "as given" and let them stand without telling where the pitfalls might be.

    Furthermore, aren’t such comment sections in blogs there for expressing your opinion? This is exactly what I am doing. You read it, think about it, and decide if you think  a) "he is writing completely silly things", b) "he has a very good opinion, and what I wrote was wrong", or c) "he has some points, I have mine. This and that my apply in my case, this might not", and the like. Choose your opinion.

    And: Yes, *in* *my* *opinion*, it is still a bad programming practice. 🙂

    – Spiro.

  6. sayler says:

    Is there a reason not to do something like GDB does: (e.g.) http://devworld.apple.com/documentation/DeveloperTools/gdb/gdb/gdb_10.html

    It’s a bit hacky, but so are most of the alternatives (as pointed out above)

  7. mattd says:

    Interesting stuff for sure.

    strik, technically the program isn’t valid C, its C++. So perhaps you should be referencing the C++ standard?

    Thanks doron, I always learn something on this blog.

  8. doronh says:

    sayler: I forgot about that option with GDB, that’s pretty cool.  macro evaluation in the debugger.  Are  -gdwarf-2 and -g3 debug settings?  If so, the one advantage my technique has over GDB is that you can still see my information in a free build.  Either way, that is a very cool and useful technique to have as well!

    d

  9. strik says:

    mattd: Why isn’t this valid C? I don’t see anything that is not valid C in it (depite the facts I talked about before).

    Even the // – style comments are allowed since C99.

    saylor: Thank you for the link. Although I have used gdb before, I was not aware that there was something like that.

    – Spiro.

  10. BryanK says:

    > Are  -gdwarf-2 and -g3 debug settings?

    They’re compiler options, used for debugging info.  -gdwarf-2 tells gcc to use DWARFv2-format debugging information, and -g3 tells it to export level-3 debugging information.  From the gdb info page:

    -g

    Produce debugging information in the operating system’s native format (stabs, COFF, XCOFF, or DWARF). <…>

    -ggdb

    Produce debugging information for use by GDB.  This means to use the most expressive format available (DWARF 2, stabs, or the native format if neither of those are supported), including GDB extensions if at all possible.

    -gdwarf-2

    Produce debugging information in DWARF version 2 format (if that is supported).  This is the format used by DBX on IRIX 6.

    -glevel

    Request debugging information and also use "level" to specify how much information.  The default level is 2.

    <…>

    Level 3 includes extra information, such as all the macro definitions present in the program.  Some debuggers support macro expansion when you use -g3.

    So I’m a bit surprised they didn’t use -ggdb (since it has support for gdb extensions), but maybe DWARFv2 isn’t the default on OSX.

  11. mattd says:

    "mattd: Why isn’t this valid C? I don’t see anything that is not valid C in it (depite the facts I talked about before). "

    Options needs to be typedef’ed or declare in main as

    union Options o;

    The code as is won’t compile as C. If it does for you then your most likely are using the /TP switch…

    Is this not correct?

  12. strik says:

    mattd: Of course, you are totally correct. I completely overlooked that .

    So, I stand corrected on this. 🙂

  13. doronh says:

    yes, the example is C++ (I was lazy I guess).  i will edit it to make sure it is C compliant 😉

  14. mattd says:

    A guy cant even write a "enums are good" post without getting nitpicked=).

    This blog rocks, always good stuff.

  15. sayler says:

    > A guy cant even write a "enums are good" post without getting nitpicked=).

    Not nitpicked, just talked about.   I think it’s all kinds of good to be able to read articles like this.

  16. divisortheory says:

    If, like many people, you package your driver code in .CPP compilation units to gain things like stronger type checking, anywhere variable declaration, etc, then you might as well take this one step further and use actual const data rather than enums.

  17. doronh says:

    const integer data has a few issues:

    1. it might actually take up a storage in the final binary if the linker cannot determine if it can safely remove the variable.  an enum value will not.  
    2. they are not self contained and groupings are not apparent. Yes you can scope it to a class and make it a static memember, but knowing which other constant values are associated with each other can quickly grow problematic.
    3. you don’t get auto value increment behavior as you do with an enum (that may represent a state vs a bit field in which you are assigning specific values).

    For non integer data like strings or complex types, const global vars are a great way to retain that info at runtime and provide type safety.

  18. Yulka says:

    While working with WinDbg I encountered the following problem: when the

    project (in C) is compiled with MS Visual Studio 6.0 C compiler enum

    symbols values are not recordered in the symbols file. The "dt" debugger

    command gives the answers like "Int4B" for an enum.

    When I tryed the same with a little "trial" C++ project it worked well,

    giving the enum’s names. I tried to change/add different debugger and

    linker options but it didn’t work. Do you have any idea regarding this

    issue?

    Thanks

  19. doronh says:

    yulka, I don’t have a clue as to why this is happening. I would hazard a few guesses though

    1)  VS6 is pretty old and the C and C++ compiler could be more different then in later versions

    2)  I know that in C, an enum is not a type, it is just a glorified #define.  Any enum type is just a signed integer.  What you are seeing is probably a strict interpretation of that.  In C++, a enum is a real type that is not the same as a signed int.  In later versions of visual studio this distinction is not made and the C code (at least symbols) treat an enum as its own type.

    d

  20. bruteforce says:

    The debugging aid of using enums instead of #defines is undoubtedly cool.

    Wouldn’t it be nice if all windows (user and kernel) error status codes where defined as enums? Then you could immediately know what GetLastError() returned, instead of having to look around in MSDN or browse WinError.h.

    I tried out the following simple user mode example and was really amused (I am easy to amuse…) to see the Win32 error codes as descriptive enum names while debugging.

    #define ENUMSTAT(a) _##a = a

    typedef enum _WINSTATUS : DWORD

    {

       ENUMSTAT(ERROR_SUCCESS),

       ENUMSTAT(ERROR_INVALID_FUNCTION),

       ENUMSTAT(ERROR_FILE_NOT_FOUND),

       ENUMSTAT(ERROR_PATH_NOT_FOUND),

       // …

    }

    WINSTATUS;

    #define _GetLastError (WINSTATUS)GetLastError

    int _tmain(int argc, _TCHAR* argv[])

    {

       WINSTATUS status;

       HANDLE hFile;

       hFile = CreateFile("X:\nonexistentdir\random.txt", GENERIC_READ, 0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);

       if (hFile == INVALID_HANDLE_VALUE)

       {

           status = _GetLastError();

           return -1;

       }

       return 0;

    }

    In fact I liked it so much, that I wrote a quick and dirty utility that parses WinError.h and produces an enumeration with all error codes defined in it.

    You can grab the generated header file from http://www.staikos-manousopoulou.net/Programming/WinStatusEnum.txt

    I tried to find a way to use the actual error code names as the enum constants (by undefining them before declaring the enumerator) but my preprocessor skills are not that good 🙂

    So instead I just prepended an underscore.

    And I will definitely update our SDKs (FireAPI) to use enums for the error codes instead of defines.

    Thanks!

  21. bruteforce says:

    Also checkout http://www.staikos-manousopoulou.net/Programming/NTStatusEnum.txt for the DDK status codes as an enum.

  22. doronh says:

    both are very nice ;)…we did the same thing when I worked on bluetooth for the test  build so that we could dump out names in DbgPrint.    you do have other options other then rolling your own parsing/strings/enum:

    1)  when printing an NTSTATUS when using WPP, you can use %!STATUS! instead of 0x%x and traceview will give you the name and hex value for free.

    2) net helpmsg <error number in decimal>.  this does not give you the name of the error though, just the description.

    3) !error <value> <flags>.  if you specify 1 for flags, the value is force to be interpreted as an NTSTATUS.  this also just gives you the description, not the name.

  23. One of the goals of KMDF was to use clear and concise types in our parameters and structures so that

  24. Scott says:

    Interesting stuff.

    The one catch I had while I was working on a similar issue was the sizeof the enum (i.e. OPTION_FLAGS).  It seems like this is compiler-dependent and not customizable from cl.  Your code gets around this issue through a level of indirection using OPTION.

    An argument against that, though, is that now you have to come up with a name for the pass-through value (called Bits).

    It's an interesting problem and its implications are all over driver software.