When (Not) to Pack Structures

Posted by: Russ Keldorph

In my previous post, I talked about how structure packing works. Now I’d like to talk about when and why it’s commonly used, as well as why you may or may not want to use it. Let me start out by saying that by "structure packing" I'm referring to the use of the /Zp compiler switch or #pragma pack directive to make the packing of a structure something other than the default. For example, using #pragma pack(2) around a structure containing an int type modifies the structure's default packing of 4. Alternatively, #pragma pack(1) around a structure containing only char (1-byte) types has no effect and is (technically) harmless.

Why use packing?

People usually use structure packing for one of two reasons:

1. they want to save space in data structures, or

2. they want to format a stream of bytes into fields according to some existing specification like a network protocol.

These can both be valid reasons, but, more often than not, the implications of a decision to use packing are not fully understood, leading to unforeseen side effects that can, in some cases, have long-term negative consequences. The point of this post is to identify the costs of packing and suggest best practices around its use.

First, let’s look at a common example of how packing affects code generation. Take the following C++ code compiled for all four architectures supported by Windows Embedded CE.

// To compile: cl -c -O2 t.cpp -DPACKING=<packing size>
#pragma pack(push, PACKING)
struct S {
    char i8;
    int i32;
};
#pragma pack(pop)

int extract(S * ps) {
    return ps->i32;
}

The following table lists the sequences of code required to load the i32 member of S. Remember that when PACKING=4, padding is inserted such that the i32 member’s offset from the beginning of S is a multiple of its alignment (4). When PACKING=1, i32’s alignment becomes 1, so no padding is inserted.

 

            PACKING=4                    PACKING=1

ARM         ldr r0, [r0, #4]             ldrb lr, [r0, #1]!
                                         ldrb r3, [r0, #1]
                                         ldrb r2, [r0, #2]
                                         ldrb r1, [r0, #3]
                                         orr r3, lr, r3, lsl #8
                                         orr r3, r3, r2, lsl #16
                                         orr r0, r3, r1, lsl #24

MIPS        lw v0,4(a0)                  addiu t0,a0,1
                                         lwl v0,3(t0)
                                         lwr v0,0(t0)

SuperH      mov.l @(4,r4),r0             add #1,r4
                                         mov.b @(3,r4),r0
                                         mov r0,r3
                                         mov.b @(2,r4),r0
                                         shll8 r3
                                         extu.b r0,r2
                                         mov.b @(1,r4),r0
                                         or r3,r2
                                         extu.b r0,r1
                                         mov.b @r4,r0
                                         shll8 r2
                                         or r2,r1
                                         shll8 r1
                                         extu.b r0,r0
                                         or r1,r0

x86         mov eax,dword ptr [eax+4]    mov eax,dword ptr [eax+1]

 

Notice how much the difference packing makes depends on the architecture you’re targeting. For the RISC targets (ARM, MIPS, SH), the compiler must assume that the i32 member is misaligned and must generate special code, since normal 4-byte load instructions do not work in that case. In terms of code size, SuperH and ARM suffer the most since they have to load one byte at a time and combine the bytes with a series of shifts and logical ORs. MIPS fares quite a bit better thanks to its special “left” and “right” load instructions, and x86 isn’t affected at all since the CPU supports misaligned addresses for most memory accesses. I don’t want to speculate too much, but it’s possible that the reason structure packing is so popular is that x86 is so popular. If more people had to target SH-4, they’d think twice before packing their data types.

Oh, and one thing I should mention is that the 8-bit i8 member isn’t really necessary for this discussion. Even if it were absent, such that i32’s offset within S were zero, the generated code would be almost identical. This is because packing works by modifying the alignment of members. It’s the alignment of the member, not its offset, that determines how the compiler accesses it.
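The point about alignment versus offset can be checked at compile time. The following is a minimal sketch, assuming a 4-byte int with a natural alignment of 4 and a compiler that supports #pragma pack; the names S4, S1 and OnlyInt1 are hypothetical stand-ins for S built with PACKING=4, with PACKING=1, and without the i8 member.

```cpp
#include <cstddef>  // offsetof

#pragma pack(push, 4)
struct S4 {            // same members as S, as if compiled with PACKING=4
    char i8;
    int  i32;
};
#pragma pack(pop)

#pragma pack(push, 1)
struct S1 {            // same members as S, as if compiled with PACKING=1
    char i8;
    int  i32;
};
struct OnlyInt1 {      // no i8 at all, so i32 sits at offset 0
    int i32;
};
#pragma pack(pop)

// Assumption: int is 4 bytes wide and naturally 4-byte aligned.
static_assert(offsetof(S4, i32) == 4 && alignof(S4) == 4,
              "PACKING=4: padding keeps i32 aligned");
static_assert(offsetof(S1, i32) == 1 && alignof(S1) == 1,
              "PACKING=1: no padding, alignment capped at 1");
static_assert(offsetof(OnlyInt1, i32) == 0 && alignof(OnlyInt1) == 1,
              "offset 0, yet only 1-byte alignment is guaranteed");
```

The last assertion is the interesting one: even with i32 at offset zero, the type's alignment is 1, so the compiler has to assume an instance could start at any address.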

Saving space

Let’s now take a look at the first reason you might want to use structure packing: to save space. It’s true that the structure above with PACKING=1 is smaller than the structure with PACKING=4. The sizeof operator reports 5 bytes for the former and 8 bytes for the latter. This might lead one to believe that all data should be packed. However, if you look at the impact on code size, the benefit is not so obvious. The code required for each access to misaligned data can be much larger than for a normal access, and that cost is multiplied by the total number of accesses across the code base. In one case I know of, a colleague removed a #pragma pack(1) from the main header of his ARM DLL, reducing its size from 300 kB to 200 kB. Remember that data is often transient, i.e. it comes and goes, and space for it isn’t always allocated. However, code will usually live for the entire lifetime of a process, and can also take up space indefinitely in ROM or on disk.

In short, make sure you take into account the code size implications if you think packing will save space. Make sure you know the performance impact as well. It should come as no surprise that the ARM and SuperH sequences for misaligned accesses are slower than the aligned sequences. However, even the x86 sequence is usually slower when the data really is misaligned, because the CPU may have to access both of the enclosing (aligned) words in order to read a misaligned one.

Recommendation: Instead of packing to save space, consider reordering your data structures so that larger members always precede smaller members (or, rather, more-aligned members precede less-aligned members). That way, you will have little or no padding except possibly at the end of the structure. Padding at the end of a structure affects array allocations, but little else.
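To illustrate the recommendation, here is a small sketch that compares two orderings of the same members under default packing. The struct and member names are made up for this example, and the sizes in the assertions assume a typical ABI where 16-bit and 32-bit integers are aligned to 2 and 4 bytes respectively.

```cpp
#include <cstdint>

struct Unordered {          // members in an arbitrary order
    std::uint8_t  flag;     // offset 0, followed by 3 bytes of padding
    std::uint32_t count;    // offset 4
    std::uint8_t  kind;     // offset 8, followed by 1 byte of padding
    std::uint16_t id;       // offset 10
};

struct Reordered {          // same members, most-aligned first
    std::uint32_t count;    // offset 0
    std::uint16_t id;       // offset 4
    std::uint8_t  flag;     // offset 6
    std::uint8_t  kind;     // offset 7
};

// Assumption: natural alignment of 2 for uint16_t and 4 for uint32_t.
static_assert(sizeof(Unordered) == 12, "4 of the 12 bytes are padding");
static_assert(sizeof(Reordered) == 8,  "no padding at all");
static_assert(alignof(Reordered) == 4, "every member is still naturally aligned");
```

Reordering recovers the space that packing would have saved here, and every member can still be accessed with ordinary aligned loads and stores.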

Matching byte stream formats

The other common reason people use packing is to implement network protocols or to parse byte streams. Packing can make it more convenient to write code for certain data formats. Take this (made-up) packet format as an example:

 

Field        Width
Signature    16-bit
Size         32-bit
Protocol     16-bit
Checksum     32-bit
Payload      N-bit

 

If we were to declare the structure like this:

struct packet1 {
    unsigned short signature; // offset 0
    unsigned long size; // offset 2 or 4?
    unsigned short protocol; // offset 6 or 8?
    unsigned long checksum; // offset 8 or 10 or 12?
    unsigned char payload[1];
};

by default, the compiler will insert padding between the signature and size fields in order to maintain the latter’s alignment (4). One solution would be to use #pragma pack(2), which removes the need for padding. In some cases, this might be the right thing to do, particularly if the beginning of the packet is only guaranteed to be 2-byte aligned. But wait: as you may have noticed, the offset of the checksum member in the packet format (8) is already a multiple of its natural alignment. That means that if the beginning of the structure is 4-byte aligned, the checksum can be accessed safely with a normal 4-byte load or store. However, if we use #pragma pack(2), the alignment of every field is capped at 2 bytes, forcing the compiler to load the checksum with at least two instructions on most architectures.
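The two layouts can be compared directly with offsetof. This is a sketch rather than the original declaration: it substitutes the fixed-width types from <cstdint> for unsigned short and unsigned long so the field widths do not depend on the platform, and the struct names packet1_default and packet1_pack2 are invented for the comparison.

```cpp
#include <cstddef>  // offsetof
#include <cstdint>

struct packet1_default {        // default packing
    std::uint16_t signature;
    std::uint32_t size;
    std::uint16_t protocol;
    std::uint32_t checksum;
    std::uint8_t  payload[1];
};

#pragma pack(push, 2)
struct packet1_pack2 {          // same fields under #pragma pack(2)
    std::uint16_t signature;
    std::uint32_t size;
    std::uint16_t protocol;
    std::uint32_t checksum;
    std::uint8_t  payload[1];
};
#pragma pack(pop)

// Default packing: padding pushes every later field off the wire format.
static_assert(offsetof(packet1_default, size) == 4,      "2 bytes of padding before size");
static_assert(offsetof(packet1_default, protocol) == 8,  "protocol follows size");
static_assert(offsetof(packet1_default, checksum) == 12, "2 more bytes of padding");

// pack(2): offsets match the format, but no field is assumed 4-byte aligned.
static_assert(offsetof(packet1_pack2, size) == 2,     "matches the format");
static_assert(offsetof(packet1_pack2, protocol) == 6, "matches the format");
static_assert(offsetof(packet1_pack2, checksum) == 8, "matches the format");
static_assert(alignof(packet1_pack2) == 2, "checksum is only guaranteed 2-byte alignment");
```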

What if we can ensure that the beginning of our packet buffer will always be 4-byte aligned? Is it possible to match the packet format while still loading all fields as efficiently as possible? Yes, if you’re willing to write a little more code. One option is to replace the size field with two smaller fields with less strict alignment requirements:

struct packet2 {
    unsigned short signature; // offset 0
    unsigned short sizeLow; // offset 2
    unsigned short sizeHigh; // offset 4
    unsigned short protocol; // offset 6
    unsigned long checksum; // offset 8
    unsigned char payload[1];
};
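With the size split in two, callers need a little glue to treat it as one value again. The sketch below shows what that might look like. The helpers get_size and set_size are hypothetical, the fixed-width types replace unsigned short and unsigned long, and the code assumes the low half is stored first, as the field names suggest.

```cpp
#include <cstdint>

struct packet2 {
    std::uint16_t signature; // offset 0
    std::uint16_t sizeLow;   // offset 2
    std::uint16_t sizeHigh;  // offset 4
    std::uint16_t protocol;  // offset 6
    std::uint32_t checksum;  // offset 8
    std::uint8_t  payload[1];
};

// Hypothetical helper: stitch the two 16-bit halves into the 32-bit size.
constexpr std::uint32_t get_size(const packet2& p) {
    return static_cast<std::uint32_t>(p.sizeLow) |
           (static_cast<std::uint32_t>(p.sizeHigh) << 16);
}

// Hypothetical helper: split a 32-bit size back into its two halves.
inline void set_size(packet2& p, std::uint32_t size) {
    p.sizeLow  = static_cast<std::uint16_t>(size & 0xFFFFu);
    p.sizeHigh = static_cast<std::uint16_t>(size >> 16);
}

static_assert(get_size(packet2{0, 0x5678, 0x1234, 0, 0, {0}}) == 0x12345678u,
              "low half in the low 16 bits, high half in the high 16 bits");
```

Every reader and writer of the size field has to go through helpers like these, which is exactly the kind of overhead packing was supposed to avoid.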

Now we have what we want in terms of layout. In fact, this is (or is close to) what we would have to write if we didn’t have the ability to pack structures at all. The problem is that now we have to write extra code to get at the size member, and avoiding that is the main reason we wanted to use packing in the first place. The key to fixing this is to realize that we just need to reduce the alignment requirement of the size member. How? One option is to use #pragma pack.

#pragma pack(push,2)
struct u32_a16 {
    unsigned long u32;
};
#pragma pack(pop)

struct packet3 {
    unsigned short signature; // offset 0
    struct u32_a16 size; // offset 2
    unsigned short protocol; // offset 6
    unsigned long checksum; // offset 8
    unsigned char payload[1];
};

Note that we have to encapsulate the scalar unsigned long type in a structure because #pragma pack doesn’t affect scalars that are not members of a structure. The one drawback to this is that, in C, we have to write a little extra code to access the size member, i.e. we’d have to write p->size.u32 instead of just p->size. You could perhaps hide this overhead in an accessor function. In C++, however, you can add a little syntactic sugar to make the code look just like we want:

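#pragma pack(push,2)
struct u32_a16 {
    // Assign through the wrapper as if it were a plain unsigned long.
    inline unsigned long operator=(const unsigned long &that) {
        return this->u32 = that;
    }
    // Read the wrapper as a plain unsigned long; const so it also works on const objects.
    inline operator unsigned long() const { return u32; }
    unsigned long u32;
};
#pragma pack(pop)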

Now the compiler can generate the most efficient code for aligned fields and correct code for the misaligned ones. Remember, though, if the entire structure may not be aligned, you’re probably best off packing the whole thing since the compiler needs to generate unaligned access code for everything anyway.
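To see the pieces working together, here is a self-contained sketch that checks the resulting layout and uses the wrapper like a plain integer. It departs from the code above in a few labeled ways: fixed-width types stand in for unsigned short and unsigned long, the operators are marked constexpr (which needs C++14) so the behavior can be verified at compile time, and store_and_reload is a hypothetical helper added only for the demonstration.

```cpp
#include <cstddef>  // offsetof
#include <cstdint>

#pragma pack(push, 2)
struct u32_a16 {    // 32-bit value whose alignment is capped at 2
    constexpr std::uint32_t operator=(std::uint32_t that) { return u32 = that; }
    constexpr operator std::uint32_t() const { return u32; }
    std::uint32_t u32;
};
#pragma pack(pop)

struct packet3 {    // default packing; only size has reduced alignment
    std::uint16_t signature;
    u32_a16       size;
    std::uint16_t protocol;
    std::uint32_t checksum;
    std::uint8_t  payload[1];
};

static_assert(alignof(u32_a16) == 2,             "size may sit on a 2-byte boundary");
static_assert(offsetof(packet3, size) == 2,      "matches the format");
static_assert(offsetof(packet3, protocol) == 6,  "matches the format");
static_assert(offsetof(packet3, checksum) == 8,  "matches the format");
static_assert(alignof(packet3) == 4,             "checksum keeps its natural alignment");

// Hypothetical helper: callers read and write size like a plain integer.
constexpr std::uint32_t store_and_reload(std::uint32_t value) {
    packet3 p{};
    p.size = value;   // uses u32_a16::operator=
    return p.size;    // uses the conversion operator
}
static_assert(store_and_reload(1500u) == 1500u, "value survives the round trip");
```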

Other tips about packing and alignment

· Be careful when taking the address of a field in a packed structure. If you assign it to a “normal” pointer, the compiler will lose the fact that it is misaligned. For example:

 

struct S sample; // struct from above
int * pi = &sample.i32; // alignment information lost
*pi = 4; // DATATYPE_MISALIGNMENT exception

 

This can be particularly confusing when including an unpacked structure inside a packed structure. The compiler has a warning (C4366) to attempt to detect this practice, but it’s not completely reliable.

· Try to avoid using packing in public interfaces that have (or will have) backward compatibility requirements. Even though packing may seem beneficial now, it’s likely that it could be harmful in the future, particularly if the interface is implemented on a different architecture. It's ok to use #pragma pack in a header file to protect it from other users (see below), but the packing value should be the compiler default (8).

· If you must use #pragma pack in a header file, be careful not to let it “leak” out and affect structures you never intended. Always use the push/pop features like you see above, and try to limit the packing scopes to just around the structures you care about. The latter practice helps avoid someone unintentionally creating packed structures when adding types to your header.

· Be very wary of One Definition Rule (ODR) violations with packing. Defining the same type under different packing values in different translation units can lead to bugs that are very difficult to track down.

o Always define your types in a single header and include that wherever you need it.

o Don’t #include headers under #pragma pack.

o Use #pragma pack(push,8) at the beginning and #pragma pack(pop) at the end of your headers to protect them from /Zp switches and other people including them under #pragma pack.
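The last tip can be demonstrated in a single file. In this sketch the outer #pragma pack(push, 1) plays the part of a careless includer, and the marked region stands in for the contents of a protected header. The type names ApiRecord and LocalRecord are invented, and the sizes assume a 4-byte int with natural alignment 4.

```cpp
#include <cstddef>  // offsetof

// A careless includer: everything that follows is under pack(1).
#pragma pack(push, 1)

// ---- begin contents of a hypothetical protected header ----
#pragma pack(push, 8)   // force the default packing for our own types
struct ApiRecord {
    char tag;
    int  value;
};
#pragma pack(pop)       // hand the includer's setting back untouched
// ---- end of header ----

struct LocalRecord {    // declared by the includer, still under pack(1)
    char tag;
    int  value;
};

#pragma pack(pop)

static_assert(sizeof(ApiRecord) == 8 && offsetof(ApiRecord, value) == 4,
              "the header's type keeps its default layout");
static_assert(sizeof(LocalRecord) == 5 && offsetof(LocalRecord, value) == 1,
              "the includer's own packing is unaffected");
```

Because the header pushes and pops its own setting, ApiRecord has the same layout in every translation unit no matter what packing is in effect at the point of inclusion, which is what keeps it clear of the ODR problems described above.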

Conclusions

Packing can be a useful feature, but like many useful features it needs to be understood fully in order to avoid misuse. Always test your assumptions about packing before making a decision to use it. “Premature optimization is the root of all evil.”

As always, feel free to ask questions. I hope my next post will come sooner than this one did. :)