Thoughts on bit fields.

Article
01/11/2005

In C there is a long tradition of using bit fields to store a collection of boolean values:

enum

{

TF_KEYWORD = 0x0001,

TF_MEMBER = 0x0002,

TF_IDENTIFIER = 0x0004,

TF_STRINGLITERAL = 0x0008,

TF_CHARACTERLITERAL = 0x0010,

} TOKENFLAGS;

DWORD dwTokenFlags = TF_KEYWORD | TF_MEMBER;

This tradition has been carried over to C#, which is my area of interest. The support for bit fields in C# is a little different than C++, but my opinion of them hasn’t changed.

Upsides:

Potetential performance improvements due to reduced memory consumption. If 'bool' takes 8 bits, and a bit flag takes one, the 8:1 memory saves could concievably have a significant effect on performance. Of course, without real measurements you don't really know.

Downsides:

They're hard to define correctly. For each value, I have to write a definition that gives it a single, unique bit. If I forget that I'm writing in hex, I get a difficult to diagnose bug.

[Flags]

enum TokenFlags

{

Keyword = 0x0000,

Member = 0x0001,

Identifier = 0x0002,

StringLiteral = 0x0004,

CharacterLiteral = 0x0008,

VerbatimStringLiteral = 0x0016,

}

Did you notice that two of these items are broken?

It's hard to add new ones. If I want to insert a new value in the middle, I have to renumber all the remaining items.

In the debugger it's hard to know what the values are.

The code to manipulate them is hard to write correctly and hard to read.

// this means "if either one is set"

if (dwTokenFlags & (TF_CHARACTERLITERAL | TF_IDENTIFIER))

// this means "if both are set"

if (dwTokenFlags & (TF_CHARACTERLITERAL | TF_IDENTIFIER) == (TF_CHARACTERLITERAL | TF_IDENTIFIER))

// unset this flag

dwTokenFlags &= ~TF_CHARACTERLITERAL;

If there are rules about which flag combinations are legal, it's hard to enforce.

You can't add a constructor to initalize the flags to a default.

You can't add useful methods for checking aggregate combinations of flags.

It's tempting to add unrelated flags in the same enum, and then hard to split them out later.

It's tempting to use a single bit for multiple meanings in different contexts.

enum NODEFLAGS

{

// Only valid when the node is a MODIFIERNODE

NF_MOD_ABSTRACT = 0x0001,

NF_MOD_NEW = 0x0002,

NF_MOD_OVERRIDE = 0x0004,

NF_MOD_PRIVATE = 0x0008,

NF_MOD_PROTECTED = 0x0010,

NF_MOD_INTERNAL = 0x0020,

NF_MOD_PUBLIC = 0x0040,

// Only valid when the node is a STATEMENTNODE

NF_UNCHECKED = 0x0001,

NF_GOTO_CASE = 0x0002,

NF_TRY_CATCH = 0x0004,

NF_TRY_FINALLY = 0x0008,

NF_CONST_DECL = 0x0010,

};

If you get more than 32 of them, you’re out of luck. We have a case in our code today with 31 distinct values. We’re worried that we may need to add 2 soon…. Yikes!

It’s easy to mix up two types of flags:

// Type flags

DWORD dwFlags = TF_CLASS | TF_PUBLIC;

// Member flags

DWORD dwMember = MF_METHOD | MF_PRIVATE;

if (dwMember | TF_PUBLIC) // OOPS!

Potential for improved performance due to better memory alignment. Every time you access a bit flag, you have to execute multiple CPU instructions. 'bool' takes fewer instructions. This also means your code is smaller. Of course, without real measurements you don't really know.

A simple alternative: a class containing public 'bool' fields.

class TokenFlags

{

public bool Keyword;

public bool Member;

public bool Identifier;

public bool StringLiteral;

public bool Characterliteral;

}

TokenFlags tf = ...;

if (tf.Characterliteral || tf.Identifier)

This is easy to write and easy to understand. It solves most of the problems I listed above.

A hybrid approach: Encapsulated bit fields.

Hey, this is 2005, I should think about encapsulation. If I do some performance analysis & find that my application is too slow because of memory consumed by bools in Flags types that look like the one I showed above, it’s easy to convert:

class TokenFlags

{

enum Indexes

{

Keyword,

Member,

Identifier,

StringLiteral,

Characterliteral,

}

BitArray _bits = new BitArray(Enum.GetValues(Indexes).Length);

public bool Keyword

{

get { return this._bits[Indexes.Keyword]; }

set { this._bits[Indexes.Keyword] = value; }

}

public bool Member

{

get { return this._bits[Indexes.Member]; }

set { this._bits[Indexes.Member] = value; }

}

public bool Identifier

{

get { return this._bits[Indexes.Identifier]; }

set { this._bits[Indexes.Identifier] = value; }

}

public bool StringLiteral

{

get { return this._bits[Indexes.StringLiteral]; }

set { this._bits[Indexes.StringLiteral] = value; }

}

public bool Characterliteral

{

get { return this._bits[Indexes.Characterliteral]; }

set { this._bits[Indexes.Characterliteral] = value; }

}

Notice that this is source compatible with the ‘public bool’ approach, except if you pass one of the members with ‘ref’. If you’re worried about that, or binary compatibility, then use properties instead of public fields. But you know how I am about properties.

What about OO?

Good question. It’s taken me this far to consider OO.

It depends on the nature of the problem domain that you’re trying to work in, but there is surely an OO way, right?

For example, suppose we did:

interface IToken { }

interface KeywordToken : IToken { }

interface MemberToken : IToken { }

interface IdentifierToken : IToken { }

interface StringLiteralToken : IToken { }

interface CharacterLiteralToken : IToken { }

Well, I can write:

IToken token = ...;

if (token is KeywordToken)

F();

But now I can start taking code that was on the clients of the flags, and move it on to the tokens.

Before:

if (token is KeywordToken)

{

FooForKeyword()

}

else

{

FooForOthers()

}

After:

interface IToken

{

void Foo();

}

token.Foo();

Thoughts on bit fields.

Upsides:

Downsides:

What about OO?

Additional resources