Subtle bugs #3


Today’s bug challenge is a design bug.


Came across some C code like this awhile back (this is pseudo code and has all “work” elements removed from it):


void AddID(char * pszObjID)


{



char szID[8];


//Pad the ID with spaces


strcpy(szID,” “);


strcat(szID,pszObjID);


strcat(szID,” “);


//Check for the ID in the list


if ((!strstr(szIDList,szID)) && ((strlen(szIDList)+8)<MAX_LIST_SIZE))



//Add it to the list


strcat(szIDList,szID);


//…


}


The point of the code is to track the items added to the collection by ID. Each item is an object of sorts and has an integer ID. If the item is already on the list, we don’t want to add it again. We need to track the IDs because we persist them to disk after all items have been added. The ID numbers start at 10,000 and range up to 30,000. Typically, the list consists of a few hundred to a few thousand entries.


The code uses a char array to store the IDs so that it can use strstr() to search the list before adding a new ID to it. Each ID is padded on the left and right with a space in order to simplify this check.


I discovered all this because this code lived in a DLL that my app used and was AVing. It turned out that a different value was used to size the ID list than was used to constrain the number of IDs that could be added to the list. In other words, there was a buffer overrun – a fairly simple, garden-variety one.


Can you identify the design bug here?  Can you think of a way to improve this design (keep your solution in C/C++)? What alternative way of storing the list can you come up with that would not only be more efficient in terms of memory, but also facilitate quick searches for IDs? What mechanism can we employ to avoid the possibility of a buffer overrun altogether?


Comments (5)

  1. lior says:

    You mean, other than the possible buffer overuns? other than the possible char without ending nulls? and other than the fact that the list can be full but the user of this method wouldn’t know it?

  2. Maciek says:

    This is probably the most inefficient way of counting the IDs. First of all memory is wasted – each ID takes 7 bytes. If we want to store 3000 IDs, it will take approximately 21 kB. What’s even worse is that strstr does a linear search every time we want to look up the ID. As the list grows it will be slower and slower.

    One solution that comes to mind is to use a bit array. We use an ID as an index into the array (converting it first from string to an integer). A bit is set if ID is already in use. Search time is constant and memory usage is 30000/8 bytes which is approximately 3.5 kB.

  3. MSDNArchive says:

    To the first poster: I said at the outset that the code was pseudo code. I stripped it down to its bare essence so we could take a look at its design. The fact that a buffer overrun is possible is actually the whole point of the exercise. What design change could be make that would make the possibility of a buffer overrun a moot point?

    To Maciek: You nailed it. I think a bitmap is indeed the best solution here. It’s what I suggested to the developers of the code in order to fix it.