What’s wrong with this code, take two…

Well, my last “What’s wrong with this code” post was such a rollicking good success, I figured I’d do it one more time (I’m SUCH a glutton for punishment).

This time, I not only bench checked the code, but I verified that it actually worked (which is more than I can say for the last one) J.

Basically this is a routine that receives a “message” from the network.  The “message” is to be time stamped and sequenced on arrival (this allows the message to be posted to worker threads while preserving ordering semantics).  A “message” consists of a short containing the length of the message and a buffer containing the actual contents of the message. 

Just for reference, there are four intentional bugs in this code.  Tomorrow, I’ll post what the bugs are.       

Assume that there is only one thread calling ReceiveSocketMessage on a particular socket at any given time – in other words, the logic involving the sequence numbers and/or timestamps isn’t the source of the bugs J.

 *     Network Message – Holds a message received from the network.
typedef struct _NetworkMessage
       long                 MessageSequence;
       FILETIME             MessageTime;
       unsigned short       MessageSize;
       char                 Message[ANYSIZE_ARRAY];
long GlobalMessageSequence = 0;   // Global variable containing the current sequence count.
 * ReceiveSocketMessage
 *     Receives a message from a socket.
 *     A “message” consists of a short message length and a buffer containing the actual message. 
 * Inputs:
 *     socket – Opened socket on which to receive the message.
 *     ppnetmsg – Output pointer that will hold a newly allocated PNETMSG structure with the MessageSequence, MessageTime, MessageSize and Message fields filled in.
 * Returns:
 *      Winsock result code if error, 0 if success.
int ReceiveSocketMessage(SOCKET socket, PNETMSG *ppnetmsg)
       int cbReceived = SOCKET_ERROR;
       short cbMessage;
       PNETMSG pnetmsg;
       assert(ppnetmsg != NULL);
       //     Receive the byte count from the socket.
       cbReceived = recv(socket, (char *)&cbMessage, sizeof(short), 0);
       if (cbReceived == SOCKET_ERROR)
              return WSAGetLastError();
       //     Allocate a buffer to hold the network message.
       pnetmsg = (PNETMSG)new BYTE[cbReceived + FIELD_OFFSET(NETMSG, Message)];
       if (pnetmsg == NULL)
              // Couldn’t allocate the buffer for the socket.
              return WSAENOBUFS;
       //     Fill in the static header values.
       pnetmsg->MessageSequence = InterlockedIncrement(&GlobalMessageSequence);
       pnetmsg->MessageSize = cbMessage;
       //     Receive the actual buffer from the server.
       cbReceived = recv(socket, (char *)&pnetmsg->Message, pnetmsg->MessageSize, 0);
       if (cbReceived == SOCKET_ERROR)
              return WSAGetLastError();
       //     If we didn’t get the amount requested, return an error.
       if (cbReceived != cbMessage)
              return WSAEMSGSIZE;
       //     Return success.
       *ppnetmsg = pnetmsg;
       return 0;

Comments (18)

  1. Anonymous says:

    how about:

    *cbMessage should be unsigned.

    *not handling the possibility of recv() returning zero to indicate disconnection.

    *if recv() returns some bytes, there’s no guarantee it returns the number of bytes you asked for.

    *InterlockedIncrement() does not return incremented values on some OS.

    *buffer not freed under some error conditions.

  2. Anonymous says:

    Good tries.

    1) cbMessage being signed or unsigned doesn’t change the bugs in the function.

    2) If recv() returned 0, I believe that the code would function correctly and return a buffer with a 0 byte payload (although I should have a check to avoid the bogus call to recv(socket, 0, buffer, 0);

    3) Yup, that accounts for two of the four bugs.

    4) Assume XP for the software, so the InterlockedIncrement isn’t an issue (I did say the bug wasn’t in the sequencing cases).

    5) You’re right, I missed freeing the buffer in the last two error cases. So two unaccounted bugs.

  3. Anonymous says:

    The buffer allocation is using cbReceived, not cbMessage to allocate the buffer, so it will allocate 2 bytes plus FIELD_OFFSET(NETMSG, Message) and the buffer will most likely overflow later on. And yes, recv can return 0 to indicate connection being closed, there’s no checks for that. I assumed that buffer releasing wasn’t included on purpose as it would increase the size of the example without adding any real value (but it should’ve been noted).

    Whether cbMessage is signed or not doesn’t really matter as long as it’s the same size as _NetworkMessage.MessageSize, it could be a char array of the right size (depends on the platform) that gets casted for all the compiler cares.

    As for recv returning less – I’m not sure about that, the docs are not really clear, it says if there’s an error the returned value will be SOCKET_ERROR, not the number of bytes recevied so far. The way I understand it it will either return SOCKET_ERROR, zero or the number of bytes you’ve asked for.

  4. Anonymous says:

    Jerry: cbReceived vs cbMessage in the allocation – you’re right, that’s a whopping big bug there.

    There’s another bug in that line of code btw.

  5. Anonymous says:

    should be FIELD_OFFSET(NETMSG, "Message")

  6. Anonymous says:

    er not, the msdn def says pchar, but the actualy impl just wants an identifier ;[

  7. Anonymous says:

    Larry, are you thinking about an integer overflow? That’s what I thought until I realized that the message size is not an int but rather a short and casting any short value to an int plus a structure offset will never overflow (unless you get a structure that is larger than 2147418112 decimal/7FFF0000 bytes). Of course, it’s compiler specific but what I said is true from MSVC on x86.

  8. Anonymous says:

    It may receice partial message in the last recv(various reason, may be socket buffer). Late gets remains. Your messages may out of synch, the next receive (short) may be the in previous message body.

    it is better to use char* than

    char Message[ANYSIZE_ARRAY];

    If this is used in DLL, need a way to release the memory.

  9. Anonymous says:

    How is the caller going to free the buffer allocated, anyway? They certainly aren’t going to expect to have to write:

    delete[] (BYTE*) pnetmsg;

    which is the correct statement, IMO. A different form of delete may work now, but may not continue to on a different compiler or C run-time.

    I’d use a LocalAlloc or a HeapAlloc against a given heap – or at least provide an explicit free function as part of the API.

    The reason I mention this: new is part of the C++ run-time library, which creates a new OS heap in its start-up code. Each instance of the CRT runs its own start-up code and therefore creates its own heap. If you try to free from a different heap than the memory was allocated from, the Windows heap code simply says "haven’t heard of that one".

    If some binaries (executable or DLL) link to the static CRT or different DLL versions of the CRT, you can’t allocate in one and free in another; in a Debug build you’ll get an assertion (if you were using crtdbg.h’s facilities) and in a Release build, you have a memory leak.

    The safest way is for the caller to provide the buffer, which is why most Windows functions work that way. It’s less convenient, though.

  10. Anonymous says:

    Mike, with TCP you’re guaranteed not to receive data out of sync. It’s the job of the TCP stack to order the data the way they were sent.

  11. Anonymous says:

    In this case, it’s a function in a bigger application, so freeing the memory isn’t an issue. You’re right, from a style standpoint it’s rocky, but…

    Mike – the char * vs char[ANYSIZE_ARRAY] is an interesting issue of programming style – char * involves a second trip through the heap allocator, which means you have an extra opportunity for missing a free call. char[ANYSIZE_ARRAY] has some added arithmatic complexity.

    Jerry – Yes, I’m thinking about arithmatic overflows – since cbMessage is signed, the compiler should sign extend it before it adds the FIELD_OFFSET constant, which can overflow.

  12. Anonymous says:

    Actually Jerry, Mike’s pointing out out the consequences of the error pointed out by B.Y. in the first comment – If you get a partial receive, the data gets "misplaced".

  13. Anonymous says:

    Yes overflow, that’s why I said cbMessage should be unsigned, althought I didn’t notice "new BYTE[…]" is using cbReceived instead of cbMessage, just assumed it’s using cbMessage.

  14. Anonymous says:

    ahhh, there’s a possibility for a byte-ordering problem reading the data from the socket.

  15. Anonymous says:

    If you can’t alloc memory, then the stream is left in an invalid state. The entire socket message should be read regardless of whether you were able to alloc the memory for the entire thing. Otherwise, if a very large message comes through and you don’t have a enough mem, you are f*@$ed and can’t process anything from there on out.

  16. Anonymous says:

    1. I’m unsure about error messages WSAENOBUFS and WSAEMSGSIZE.

    IMHO, they indicate partially complete operation or recoverable conditions by calling the same function with increased buffer (but there is no such a buffer param for function).

    Preferred can be WSA_NOT_ENOUGH_MEMORY and WSAEINVAL . Last one is at question – as after recv() changes incomplete data can be only as result of message truncation or connection closed.

    2. There is nothing about protocol used. TCP vs. UPD

    First recv will fail with WSAEMSGSIZE on datagram sockets and will discard remaining data from current datagram for comletely valid usage scenario :o(

    If function will be allowed to continue execution – next call to recv will block and get data from another completely independ datagram.

    Note: if TCP is a must for this function – let triage this as DOCBUG ;o)

  17. Anonymous says:

    Jesse – good point, it was my assumption that on any errors, the caller would tear down the connection. In general I don’t like functions like "ReceiveMessage" that affect the global state of a socket.

    AT: You’re right, I neglected to say that the socket was a TCP stream socket. And a good point on the errors, I couldn’t find better ones when I looked at MSDN. DOCBUG sounds good to me.