What Data Looks Like on an Ethernet Network

There are two ways to think about the representation of data on a standard Ethernet network. The first way is to think about the actual encoded electronic pulses that go across the wire. This turns out to be very complicated because the signaling not only has to capture the logical information of the data, but also accommodate the timing, noise, and efficiency characteristics of the adapters and wires. There have been a number of encodings developed for this purpose, but unless you're building physical hardware, you don't ever need to see this encoded form.

Let's not think about what the encoded pulses look like. Instead, let's just think about the logical data that is being sent to the adapter. Even if you never write a device driver for your network card, you still might come across logical Ethernet frames if you ever need to directly read or write network data.

An Ethernet frame has a 14 byte header, a data section, and a 4 byte trailer. The header contains the 6 byte address of the machine that should read the frame, the 6 byte address of the machine that sent the frame, and a 2 byte type for the data payload. The destination address is mostly a suggestion. There's nothing stopping other machines on the network from reading the data as well. Note that the source address is the source on this network. That machine may have originally gotten the data from somewhere else. The actual source address is hopefully buried in the data.
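To make the header layout concrete, here's a minimal sketch that pulls the three header fields out of a raw frame. The names `EthernetHeader`, `format_mac`, and `parse_header` are made up for illustration, and the sketch assumes you've already gotten the frame bytes from somewhere (a raw socket or a capture file, say).

```python
import struct
from typing import NamedTuple


class EthernetHeader(NamedTuple):
    destination: str  # address of the machine that should read the frame
    source: str       # address of the machine that sent it on this network
    ethertype: int    # type of the data payload


def format_mac(raw: bytes) -> str:
    """Render a 6 byte hardware address as colon-separated hex."""
    return ":".join(f"{b:02x}" for b in raw)


def parse_header(frame: bytes) -> EthernetHeader:
    """Split the first 14 bytes of a frame into its three fields."""
    if len(frame) < 14:
        raise ValueError("frame is shorter than the 14 byte header")
    # "!" means network (big-endian) byte order with no alignment padding.
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    return EthernetHeader(format_mac(dst), format_mac(src), ethertype)
```

Everything after those 14 bytes, up to the trailer, is the data section.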

There are hundreds of recognized formats for the data section. Your network card probably only ever sees two of those: IPv4 and ARP. If you're adventurous, maybe it gets some RARP and IPv6 data as well. Many of the registered formats haven't been used in years. The Internet Protocol won the protocol wars.
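The four formats mentioned above have well-known type values, so a lookup table covers most of what you'll see in practice. This is only a sketch. The table is deliberately tiny and `describe_ethertype` is a hypothetical helper name.

```python
# Type values for the payload formats mentioned above.
ETHERTYPES = {
    0x0800: "IPv4",
    0x0806: "ARP",
    0x8035: "RARP",
    0x86DD: "IPv6",
}


def describe_ethertype(value: int) -> str:
    """Return a readable name for a type value, or flag it as unknown."""
    return ETHERTYPES.get(value, f"unknown (0x{value:04x})")
```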

The contents of the data are an opaque blob to the Ethernet adapter. Inside that data there are typically more layers of framing and encapsulation before you get to the application data. However, the network driver may need to attach some null bytes to the end of the data to meet the minimum size requirements of an Ethernet frame. This requirement comes from those electrical characteristics that we weren't going to talk about. The length needs to be included inside the data so that the pad bytes can be ignored. Oddly, even though every driver must pad data on the way out, a network driver reading data can't assume that the padding exists. This is because going through a loopback device may have stripped out the padding and then indicated the frame to another adapter on the machine. The frame never went over the wire in that state but the driver doesn't know that.
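Here's a rough sketch of both directions. It assumes the classic 64 byte minimum frame, which leaves 46 bytes for the data once you subtract the header and trailer, and it assumes the payload is IPv4 so that the length can be read from the total length field in bytes 2 and 3 of the IPv4 header. Both function names are made up for illustration.

```python
import struct

# 64 byte minimum frame - 14 byte header - 4 byte trailer
MIN_PAYLOAD = 46


def pad_payload(payload: bytes) -> bytes:
    """Append null bytes until the data section reaches the minimum size."""
    return payload.ljust(MIN_PAYLOAD, b"\x00")


def strip_ipv4_padding(payload: bytes) -> bytes:
    """Use the length stored inside an IPv4 packet to ignore any pad bytes.

    Works whether or not padding is present, since a frame that came
    through a loopback path may arrive without it.
    """
    if len(payload) < 20:
        raise ValueError("too short to hold an IPv4 header")
    (total_length,) = struct.unpack("!H", payload[2:4])
    if total_length < 20 or total_length > len(payload):
        raise ValueError("IPv4 total length does not fit the payload")
    return payload[:total_length]
```

Note that the reader never checks for the minimum size. It trusts only the length recorded inside the data.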

The 4 byte trailer holds a CRC, a checksum that detects whether the data was corrupted during transmission. This isn't used as a security measure because anyone can recompute the checksum easily. The CRC just protects against unintentional modifications. I'll talk more about the CRC and how to compute it in a few weeks.
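In the meantime, here's a small sketch of the idea. It leans on the CRC-32 in Python's `zlib` module, which uses the same polynomial as Ethernet, and it assumes the common software convention of storing the result as the last four bytes in little-endian order. The helper names are hypothetical, and real hardware usually computes and strips this field before your code ever sees the frame.

```python
import struct
import zlib


def append_fcs(frame_without_crc: bytes) -> bytes:
    """Return the frame with a 4 byte CRC trailer attached."""
    crc = zlib.crc32(frame_without_crc)
    return frame_without_crc + struct.pack("<I", crc)


def fcs_is_valid(frame: bytes) -> bool:
    """Recompute the CRC over everything but the trailer and compare."""
    if len(frame) < 4:
        return False
    body, trailer = frame[:-4], frame[-4:]
    return struct.pack("<I", zlib.crc32(body)) == trailer
```

Since `append_fcs` needs no secret, anyone who changes the data can also produce a matching trailer, which is why this only guards against accidents.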

Next time: Versioning for Addresses, Envelopes, and Messages