Posted on behalf of Sunil Muthuswamy
The information presented in this blog post reflects the current design and is subject to change. Readers are expected to be familiar with the Windows Subsystem for Linux Overview and WSL System Calls blog posts.
In this information age of “roam-anywhere, always connected” devices, networking plays a very important role. Users expect to be able to take advantage of complex capabilities and assume that they will be present at any point in time. This places a large burden on the networking stack to allow Internet access, data exchange, and state information to be communicated between distinct processes that live on the same device or a device across the world.
This post discusses networking within WSL. It explains how WSL configures networking within the subsystem, how it keeps that information up to date as things change, and how the various Linux/BSD socket address families are implemented. The design decisions were made with the specific goal of providing full binary compatibility[1] for Linux applications and presenting Linux administrators and users with the same networking view within WSL that they are accustomed to.
This post covers a few key operating system[2] networking concepts: network interfaces, Domain Name Resolution (DNS), and sockets.
A network interface can be thought of as an identifiable gateway that interconnects two or more endpoints with an established interface for communication. Network resources on the Internet are usually identified by a domain name, such as www.microsoft.com. Although these names are more human-readable, the network itself does not understand them. Under the hood, network identities use a different address format: for INET domains, an endpoint is identified by an IP address. DNS provides the mapping from the human-readable name to the machine-understood IP address. Without proper DNS resolution, Internet traffic addressed by name would not reach its destination. Sockets are a set of APIs for communicating between two endpoints and can be thought of as a conduit for communication.
Networking in Linux
This section discusses how the networking concepts from the introduction apply to, and are made available in, Linux.
Network interfaces
The list of network interfaces available on a system can be accessed in a couple of different ways in Linux, both through syscalls. The older, but still fully supported, way is to use socket IOCTLs such as SIOCGIFNAME. The newer and recommended way, which provides the same view, is through NETLINK sockets using the NETLINK_ROUTE family. The kernel keeps all of the pertinent information about the network interfaces and makes it available through these syscalls.
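As a small illustration of querying the interface list from user mode, the sketch below uses Python's `socket.if_nameindex()`, which wraps the C library call of the same name. Which kernel interface the C library uses underneath (IOCTLs or NETLINK) is a library detail and is not something this example controls. The helper name `list_interfaces` is made up for illustration.

```python
import socket


def list_interfaces():
    """Return (index, name) pairs for the interfaces the kernel reports."""
    return socket.if_nameindex()


# Print one line per interface, for example the loopback interface "lo".
for index, name in list_interfaces():
    print(index, name)
```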
Domain Name Resolution (DNS)
DNS is supported in Linux with the help of the resolver services, which are configured using /etc/resolv.conf in combination with /etc/hosts. The hosts file contains a static map of hostnames to their corresponding IP addresses, while resolv.conf contains (amongst other things) a list of domain nameservers that are capable of resolving a given hostname to its IP address. The resolver APIs encapsulate all of this and provide DNS service to applications.
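To make the two file formats concrete, here is a minimal sketch of how the relevant lines could be read. It is not the resolver's actual implementation, it ignores most resolv.conf directives, and the function names and sample addresses (taken from the documentation-only 192.0.2.0/24 range) are assumptions for illustration.

```python
def parse_nameservers(resolv_conf_text):
    """Extract nameserver addresses from resolv.conf-style text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        line = line.strip()
        # Lines starting with '#' or ';' are comments in resolv.conf.
        if not line or line[0] in "#;":
            continue
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def parse_hosts(hosts_text):
    """Map every hostname and alias in hosts-style text to its IP address."""
    table = {}
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop trailing comments
        if len(fields) >= 2:
            ip, names = fields[0], fields[1:]
            for name in names:
                table.setdefault(name, ip)  # first entry for a name wins
    return table
```

A lookup would typically consult the static table first and fall back to the listed nameservers, although the exact order is configurable on a real system.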
Sockets
Sockets are an API that allows inter-process communication (IPC). The two endpoints that want to communicate each open a socket at their end. Once a socket is bound to an address, other sockets can discover it, depending on the scope of the address. For connection-less socket types such as datagram, this is sufficient for sending and receiving data. For connection-oriented socket types such as stream, a connection must first be established between the two peers before data can be sent or received.
Opening a socket requires the caller to provide the address family ‘AF’ (also referred to as the protocol family ‘PF’, or domain), the type (for example, datagram or stream) and the protocol. Sockets are broadly categorized by their domain; the type (and the protocol) identifies the sub-category of the socket. The domains most commonly used by *NIX applications are AF_INET, AF_UNIX (also known as AF_LOCAL) and AF_NETLINK. AF_INET provides access to the Internet protocol. AF_UNIX is used for communication between processes that live on the same system[3]. Lastly, AF_NETLINK sockets are used for communication between user mode and the kernel; they can also be used for IPC between two user-mode processes.
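The three arguments described above (domain, type, protocol) map directly onto the `socket` call. The following sketch opens one socket for each of a few common combinations using Python's standard `socket` module. The function name and dictionary labels are illustrative only. AF_NETLINK is left out here because it is Linux-specific.

```python
import socket


def open_example_sockets():
    """Open one socket per (domain, type) pair and return them by label."""
    return {
        # Internet domain, datagram type (UDP); protocol 0 means "default".
        "udp": socket.socket(socket.AF_INET, socket.SOCK_DGRAM, 0),
        # Internet domain, stream type (TCP).
        "tcp": socket.socket(socket.AF_INET, socket.SOCK_STREAM, 0),
        # Local (same-system) domain, stream type.
        "unix": socket.socket(socket.AF_UNIX, socket.SOCK_STREAM, 0),
    }


sockets = open_example_sockets()
for label, sock in sockets.items():
    print(label, sock.family, sock.type)
    sock.close()
```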
Networking in WSL
This section provides an overview of how the above Linux networking concepts are implemented within WSL to provide full binary and interface compatibility.
Network interfaces and DNS
When the first instance of bash.exe is launched, the LXSS session manager service[4] queries Windows for the list of network interfaces and DNS servers. Using the LXBus, the service passes this information to the Linux Subsystem driver. The driver caches all the information about the network interfaces locally and populates resolv.conf with the list of DNS servers. The cached network interface information is accessible through the aforementioned socket IOCTLs. During the first launch of bash.exe, the hosts file is also auto-generated[5] for the particular system.
The LXSS session manager service also registers with Windows for notifications of any updates to the network interfaces (for example, moving from wireless to wired Ethernet) or to the DNS entries. Windows notifies the service of any change to the monitored network information by calling the registered callback handler. When notified of such a change, the callback handler (part of the service) uses the same information flow described above to notify the Linux Subsystem driver, which updates resolv.conf with the new DNS entries. This way, WSL keeps the network components within the system up to date and in sync with Windows.
Sockets
Currently, WSL provides implementations for the AF_INET, AF_UNIX and AF_NETLINK address families. All of the socket implementation in WSL is provided in kernel mode, from within the Linux Subsystem driver. This is essentially because all of the BSD socket APIs map one-to-one directly to syscalls[6].
AF_INET (Internet domain)
As discussed previously, sockets created with the AF_INET domain provide access to the information and services hosted on the Internet. Within the INET domain, there are a few different supported socket types, namely DGRAM, STREAM and RAW. DGRAM sockets, more commonly referred to as UDP sockets, are connection-less sockets with no reliability guarantees. STREAM sockets, more commonly referred to as TCP sockets, are connection-oriented sockets with some reliability and ordering guarantees provided by the protocol. RAW sockets support many different protocols, such as ICMP (used by ping), as well as RAW, which allows the protocol to be implemented entirely in user space.
Win32 has a user-mode adaptation of BSD sockets called Winsock. It would seem pretty straightforward to leverage Winsock to provide the socket implementation within WSL. But, as mentioned previously, all of the WSL socket implementation is in kernel mode, in the WSL socket library (WslSocket.lib), which is part of the Linux Subsystem driver. That rules out the possibility of using Winsock. Fortunately, NT has a kernel-mode network programming interface called Winsock Kernel (more commonly, WSK).
WSK is a publicly documented API set. It is a very thin, low-level NT API set that provides fast and easy access to data from the TCP/IP driver, with little or no overhead of its own. The WSK API differs significantly from the BSD APIs, even though it uses the same underlying constructs, such as “sockets”. For example, WSK does not provide any buffering of data[7]. Another example is the difference between the operating models of WSK and BSD sockets. WSK supports both synchronous and asynchronous modes of retrieving data from the TCP/IP driver, with the asynchronous mode being more efficient. BSD sockets also support synchronous and asynchronous I/O (through epoll), but their semantics are vastly different. WSL (*not* WSK) provides support for the BSD socket APIs by translating them to the WSK APIs and, wherever needed (such as for data buffering), bridging the differences with the necessary infrastructure and implementation within WSL (see Figure 1 AF_INET sockets in WSL).
The following sections go into the details of each of the individual socket types, such as TCP and UDP, and explain how WSL implements them underneath.
Figure 1. AF_INET sockets in WSL
DGRAM or UDP socket type
As mentioned previously, UDP sockets are lightweight, connection-less sockets with no delivery guarantees. Once a socket has been created, it can be used to send data immediately. Receiving data requires other sockets to be able to identify it by its address. Once the socket binds to an address, other sockets can send data to it without any further delay. Any UDP socket can send data to any other UDP socket, as long as it is able to identify (or locate) it by its address.
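From the application's point of view, the behaviour described above can be sketched with two UDP sockets on the loopback interface: the receiver binds to an address, and the sender can then send to it immediately, with no connection step. This is ordinary BSD socket usage shown through Python's `socket` module; the function name `udp_roundtrip` is made up for this example.

```python
import socket


def udp_roundtrip(payload=b"hello"):
    """Send one datagram over loopback and return (data, sender_address)."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Binding gives the receiver an address others can send to.
        # Port 0 asks the OS to pick a free port.
        receiver.bind(("127.0.0.1", 0))
        receiver.settimeout(5.0)  # avoid blocking forever if the datagram is lost
        # The sender never binds or connects explicitly; it just sends.
        sender.sendto(payload, receiver.getsockname())
        return receiver.recvfrom(4096)
    finally:
        sender.close()
        receiver.close()


data, sender_address = udp_roundtrip(b"ping")
print(data, sender_address)
```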
Figure 2. WSL INET UDP file context
When a WSL UDP application creates a BSD socket using the ‘socket’ syscall, the ‘WSL socket library’ creates a context (see Figure 2 WSL INET UDP file context) and attaches it to the file descriptor (using the VFS file object) that is returned by the syscall. As part of the same ‘socket’ syscall, the driver also creates a WSK UDP socket and stores it in the context. On every further operation by the user-mode app on that socket, the driver extracts the associated context and the WSK socket within it. Any data sent over the BSD UDP socket is sent directly over the WSK socket.
When the user-mode application binds the BSD socket, the ‘WSL socket library’ binds the corresponding WSK socket and registers a WSK ‘receive from’ callback handler with WSK. Any time data is available to be read on the WSK socket, WSK calls the registered handler. The handler stores the data provided by WSK (from TCP/IP) in the receive buffer, along with some metadata such as the size of the data (packet) and the ‘receive from’ address. When the application calls ‘recvfrom’ on the BSD socket, the socket library is able to satisfy the request using the data from the ‘receive buffer’.
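The buffering pattern in the last paragraph can be modelled in a few lines. To be clear, this is a toy user-mode model and not the driver's code: the class and method names are hypothetical, there is no locking, and a real implementation would block or signal readiness instead of raising when the buffer is empty.

```python
from collections import deque


class ReceiveBuffer:
    """Toy stand-in for a per-socket receive buffer (hypothetical names)."""

    def __init__(self):
        self._packets = deque()

    def on_receive_from(self, data, source_address):
        # Plays the role of the 'receive from' callback: store the datagram
        # together with its metadata (size and sender address).
        self._packets.append((bytes(data), len(data), source_address))

    def recvfrom(self, bufsize):
        # Plays the role of the application's recvfrom: hand back the oldest
        # datagram, truncated to bufsize as datagram sockets do.
        if not self._packets:
            raise BlockingIOError("no datagram buffered")
        data, _size, source_address = self._packets.popleft()
        return data[:bufsize], source_address
```

The point of the model is the decoupling: the callback can run whenever data arrives, and the application's later ‘recvfrom’ is served from what has already been stored.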
STREAM or TCP socket type
As mentioned previously, TCP sockets are connection-oriented sockets with some delivery guarantees. A connection must first be established between the two sockets, usually referred to as the ‘client’ and ‘server’ sockets. The ‘server’ socket binds to a well-known address and listens for incoming connections using the ‘listen’ socket call. At this point, the ‘server’ socket is capable of receiving connections, and clients can connect to it using the ‘connect’ socket call. Once a connection is established, the server uses the ‘accept’ socket call to accept it. The accept call returns a new socket that is connected to the client socket and can be used to send and receive data to and from the client. The original server socket remains free to accept more incoming connections.
The mechanism by which the ‘WSL socket library’ creates and manages TCP BSD sockets is very similar to that of UDP BSD sockets, but is more involved. When a WSL TCP application creates a BSD TCP socket using the ‘socket’ syscall, the ‘WSL socket library’ creates a context (see Figure 3 WSL INET TCP file context) and attaches it to the file descriptor (using the VFS file object) that is returned by the syscall. As part of the same ‘socket’ syscall, the driver also creates a WSK TCP socket and stores it in the context. The one noticeable difference between how the ‘WSL socket library’ handles UDP and TCP sockets is the point at which it registers for WSK callbacks. As we saw earlier, in the case of UDP, the callback is registered during bind. For TCP sockets, the WSK callbacks are not registered until later, because the library first needs to know whether the TCP socket will be used for accepting connections or for send/receive (a TCP socket can only be one of the two). When the application calls ‘listen’, the ‘WSL socket library’ registers its ‘WSK accept’ callback handler with WSK. For a send/recv socket, that is, whenever an application calls ‘connect’ and for every accepted socket, the ‘WSL socket library’ registers the ‘WSK receive’ and ‘WSK disconnect’ callback handlers. With the right callback handlers registered with WSK, the ‘WSL socket library’ is equipped to deal with any event related to that socket.
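The listen/connect/accept sequence can be sketched over loopback as follows. This is plain BSD socket usage from Python's `socket` module, kept single-threaded for brevity: on loopback the kernel completes the handshake into the listen backlog, so ‘connect’ can return before ‘accept’ is called. The function name `tcp_roundtrip` is made up for this example.

```python
import socket


def tcp_roundtrip(payload=b"hello"):
    """Echo payload (upper-cased) through a loopback TCP connection."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        server.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
        server.listen(1)                # server can now receive connections
        server.settimeout(5.0)
        client.settimeout(5.0)
        client.connect(server.getsockname())
        # accept() returns a NEW socket for this connection; 'server' itself
        # stays available for further incoming connections.
        conn, peer = server.accept()
        with conn:
            conn.settimeout(5.0)
            client.sendall(payload)
            received = conn.recv(4096)  # small payload: one recv is enough here
            conn.sendall(received.upper())
            reply = client.recv(4096)
        return received, reply, peer == client.getsockname()
    finally:
        client.close()
        server.close()


print(tcp_roundtrip(b"ping"))
```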
Figure 3. WSL INET TCP file context
In the case of the ‘listening’ socket, whenever there is an incoming connection request on a socket, WSK calls the ‘WSK accept’ handler registered for that socket. The ‘WSL socket library’ can then decide whether to accept or reject that connection, based on the list of already accepted connections. If the connection is accepted, the accepted WSK socket is stored in the list of accepted sockets. Whenever the application calls ‘accept’, the ‘WSL socket library’ finds the next connection in the list, creates a new ‘WSL TCP file context’, stores the corresponding WSK socket within it, and returns a new socket file descriptor to the user. The new file descriptor can be used for send/recv.
For data transfer, once the connection is established, TCP sockets can send and receive data. In the case of send, when the WSL TCP application requests data to be sent over the socket, the ‘WSL socket library’ queues the data in the ‘send buffer’, logs a pending request with WSK to send the data, and returns immediately, so that the send can complete asynchronously. The case of receive is similar to that of UDP. WSK calls the registered ‘WSK receive’ callback whenever there is data available on that socket. The ‘WSL socket library’ buffers the incoming data in the internal ‘receive buffer’, at which point the data is available to the user-mode TCP application. The application can receive the data using the ‘recv/recvmsg/read’ socket calls.
RAW socket type
The case of the ‘RAW’ socket is very similar to that of the ‘UDP’ socket (see DGRAM or UDP socket type), because of the similarities in the data transfer protocol. The one major difference is that the underlying WSK socket stored in the associated ‘file context’ is of type ‘RAW’. Currently, WSL only supports RAW sockets of the ‘ICMP’ protocol.
AF_UNIX or AF_LOCAL
AF_UNIX domain sockets are used for inter-process communication between processes that live within the same domain (or system). The ‘WSL socket library’ provides and manages the implementation of AF_UNIX domain sockets purely and wholly within the subsystem, without any involvement from WSK (see Figure 4 AF_UNIX sockets in WSL).
Figure 4. AF_UNIX sockets in WSL
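For reference, this is what AF_UNIX usage looks like from an application. The sketch uses `socket.socketpair()`, which returns two already-connected AF_UNIX sockets, so no filesystem path is needed. The function name `unix_roundtrip` is made up for this example, and both ends live in one process here purely to keep it short.

```python
import socket


def unix_roundtrip(payload=b"hello"):
    """Send payload across a connected AF_UNIX socket pair and return it."""
    left, right = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    with left, right:
        left.sendall(payload)
        return right.recv(4096)


print(unix_roundtrip(b"local ipc"))
```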
AF_NETLINK
AF_NETLINK sockets can be used for communication between the kernel and user-mode processes, and also between multiple user-mode processes. AF_NETLINK supports multiple protocols for kernel and user-mode communication, including NETLINK_ROUTE, NETLINK_FIREWALL and NETLINK_KOBJECT_UEVENT. WSL has implemented support for user-mode calls to the kernel in the NETLINK_ROUTE protocol, which handles the querying and configuration of network parameters such as network interfaces, IP addresses and routing tables. The decision to prioritize NETLINK_ROUTE was based on targeted telemetry gathered from WSL users, which showed that NETLINK_ROUTE was the most commonly used Netlink protocol among the applications executed in WSL. Examples of Linux applications using NETLINK_ROUTE include ip, traceroute and whois.
Support for AF_NETLINK sockets is implemented inside the Linux Subsystem driver. Specifically, the NETLINK_ROUTE protocol is implemented by calling the Windows NETIO APIs and translating the information provided by the NETIO APIs into the format expected by the NETLINK_ROUTE messages. The following table describes which NETLINK_ROUTE message types are currently supported in WSL, as well as the equivalent NETIO API used in the Linux Subsystem driver to implement its behavior, and the typical Linux user-mode utility usage. We are actively working to expand this table to more NETLINK_ROUTE message types.
| NETLINK_ROUTE Message Type | Windows NETIO API | Linux User-mode Usage Example |
| RTM_GETADDR | GetUnicastIpAddressTable() | ip addr show |
| RTM_GETROUTE* | GetIpForwardTable2() | ip route show |
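To show what an RTM_GETADDR request looks like on the wire, the sketch below builds the message a tool such as `ip addr show` would send: a netlink header followed by an `ifaddrmsg` body, with the dump flag set. The constants are the values from the Linux netlink headers; the function names are made up for illustration. `dump_addresses_raw` only works where AF_NETLINK is available, returns the first unparsed reply, and is defined but not called here.

```python
import socket
import struct

RTM_GETADDR = 22        # from linux/rtnetlink.h
NLM_F_REQUEST = 0x001   # from linux/netlink.h
NLM_F_DUMP = 0x300      # NLM_F_ROOT | NLM_F_MATCH


def build_getaddr_request(seq=1):
    """Build a 'dump all addresses' NETLINK_ROUTE request as raw bytes."""
    # struct ifaddrmsg: family, prefixlen, flags, scope (u8 each), index (u32).
    # Family 0 is AF_UNSPEC, meaning "all address families".
    body = struct.pack("=BBBBI", 0, 0, 0, 0, 0)
    # struct nlmsghdr: length (u32), type (u16), flags (u16), seq (u32), pid (u32).
    header = struct.pack("=IHHII", 16 + len(body), RTM_GETADDR,
                         NLM_F_REQUEST | NLM_F_DUMP, seq, 0)
    return header + body


def dump_addresses_raw():
    """Send the request on a NETLINK_ROUTE socket and return the first reply."""
    with socket.socket(socket.AF_NETLINK, socket.SOCK_RAW,
                       socket.NETLINK_ROUTE) as sock:
        sock.bind((0, 0))  # (pid, groups): let the kernel assign the port id
        sock.send(build_getaddr_request())
        return sock.recv(65535)
```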
Socket options
Linux provides plenty of socket options that are available to user applications through the setsockopt/getsockopt syscalls.
AF_INET socket options
The INET socket options are layered at different levels, following the OSI networking model. This allows the application to apply a socket option at different layers, such as the TCP (or UDP) layer, the IP layer or the socket layer, giving the application a high level of control. WSL manages this by assigning clear ownership of the socket options. Some socket options, such as the send/receive buffer sizes (SO_SNDBUF/SO_RCVBUF) and the send/receive timeouts (SO_SNDTIMEO/SO_RCVTIMEO), are fully managed by the "WSL socket library". Most other socket options, including all of the TCP/IP socket options, are applied to the WSK socket using the WskControlSocket API, and the "WSL socket library" merely acts as a pass-through for those options.
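The layering is visible in the first argument to `setsockopt`, which selects the level the option applies to. The sketch below sets one option at each of the socket, IP and TCP levels on a Linux-style system and reads them back. The function name and the chosen values are arbitrary examples. Note that the value read back for SO_RCVBUF may differ from the value requested, because the kernel is free to adjust it.

```python
import socket


def apply_example_options():
    """Set one option per layer on a TCP socket and return the values read back."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Socket layer: receive buffer size.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 64 * 1024)
        # IP layer: time-to-live for outgoing packets.
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, 64)
        # TCP layer: disable Nagle's algorithm.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        return {
            "rcvbuf": sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF),
            "ttl": sock.getsockopt(socket.IPPROTO_IP, socket.IP_TTL),
            "nodelay": sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY),
        }


print(apply_example_options())
```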
AF_UNIX socket options
All of the AF_UNIX socket options are fully managed by the "WSL socket library".
Many thanks to the Windows networking team for implementing the necessary features required to support networking in WSL, and for their guidance.
[1] From a networking perspective.
[2] In reference here to the Linux and Windows OS.
[3] Or in the same network namespace.
[4] Refer to the “WSL Components” diagram in the “Windows Subsystem for Linux Overview” blog post.
[5] Can be disabled by the user.
[6] For details on how the syscall redirection works, refer to the blog post on ‘WSL System Calls’.
[7] Which should not be confused with Winsock.
Sunil Muthuswamy and Seth Juarez explore networking on WSL