Lab: networking

In this lab you will write an xv6 device driver for a network interface card (NIC), and then write the receive half of an ethernet/IP/UDP protocol processing stack.

Fetch the xv6 source for the lab and check out the net branch:

  $ git fetch
  $ git checkout net
  $ make clean

Background

Before writing code, you may find it helpful to review "Chapter 5: Interrupts and device drivers" in the xv6 book.

You'll use a network device called the E1000 to handle network communication. To xv6 (and the driver you write), the E1000 looks like a real piece of hardware connected to a real Ethernet local area network (LAN). In fact, the E1000 your driver will talk to is an emulation provided by qemu, connected to a LAN that is also emulated by qemu. On this emulated LAN, xv6 (the "guest") has an IP address of 10.0.2.15. Qemu arranges for the computer running qemu to appear on the LAN with IP address 10.0.2.2. When xv6 uses the E1000 to send a packet to 10.0.2.2, qemu delivers the packet to the appropriate application on the computer on which you're running qemu (the "host").

You will use QEMU's "user-mode network stack", which is described further in QEMU's documentation. We've updated the Makefile to enable QEMU's user-mode network stack and the E1000 network card.

The Makefile configures QEMU to record all incoming and outgoing packets to the file packets.pcap in your lab directory. It may be helpful to review these recordings to confirm that xv6 is transmitting and receiving the packets you expect. To display the recorded packets:

  $ tcpdump -XXnr packets.pcap

We've added some files to the xv6 repository for this lab. The file kernel/e1000.c contains initialization code for the E1000 as well as empty functions for transmitting and receiving packets, which you'll fill in. kernel/e1000_dev.h contains definitions for registers and flag bits defined by the E1000 and described in the Intel E1000 Software Developer's Manual. kernel/net.c and kernel/net.h contain a simple network stack that implements the IP, UDP, and ARP protocols; net.c has complete code for user processes to send UDP packets, but lacks most of the code to receive packets and deliver them to user space. Finally, kernel/pci.c contains code that searches for an E1000 card on the PCI bus when xv6 boots.

Part One: NIC

Your job is to complete e1000_transmit() and e1000_recv(), both in kernel/e1000.c, so that the driver can transmit and receive packets. You are done with this part when make grade says your solution passes the "txone" and "rxone" tests.

While writing your code, you'll find yourself referring to the E1000 Software Developer's Manual. Of particular help may be the following sections:

Browse the E1000 Software Developer's Manual. This manual covers several closely related Ethernet controllers. QEMU emulates the 82540EM. Skim Chapter 2 now to get a feel for the device. To write your driver, you'll need to be familiar with Chapters 3 and 14, as well as 4.1 (though not 4.1's subsections). You'll also need to use Chapter 13 as a reference. The other chapters mostly cover components of the E1000 that your driver won't have to interact with. Don't worry about the details at first; just get a feel for how the document is structured so you can find things later. The E1000 has many advanced features, most of which you can ignore. Only a small set of basic features is needed to complete this lab.

The e1000_init() function we provide you in e1000.c configures the E1000 to read packets to be transmitted from RAM, and to write received packets to RAM. This technique is called DMA, for direct memory access, referring to the fact that the E1000 hardware directly writes and reads packets to/from RAM.

Because bursts of packets might arrive faster than the driver can process them, e1000_init() provides the E1000 with multiple buffers into which the E1000 can write packets. The E1000 requires these buffers to be described by an array of "descriptors" in RAM; each descriptor contains an address in RAM where the E1000 can write a received packet. struct rx_desc describes the descriptor format. The array of descriptors is called the receive ring, or receive queue. It's a circular ring in the sense that when the card or driver reaches the end of the array, it wraps back to the beginning. e1000_init() allocates packet buffers with kalloc() for the E1000 to DMA into. There is also a transmit ring into which the driver should place packets it wants the E1000 to send. e1000_init() configures the two rings to have size RX_RING_SIZE and TX_RING_SIZE.
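The wrap-around behaviour of a ring can be sketched in a few lines of C. RING_SIZE and ring_next() are hypothetical names used only for illustration; the lab's real sizes are RX_RING_SIZE and TX_RING_SIZE.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical ring size for illustration; the lab's actual sizes are
 * RX_RING_SIZE and TX_RING_SIZE. */
#define RING_SIZE 16

/* Return the index that follows idx in a circular ring of RING_SIZE
 * slots: the last slot is followed by slot 0. */
static uint32_t ring_next(uint32_t idx)
{
    assert(idx < RING_SIZE);
    return (idx + 1) % RING_SIZE;
}
```

With this definition, ring_next(0) is 1 and ring_next(RING_SIZE - 1) is 0, which is the sense in which both the card and the driver "wrap back to the beginning".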

When the network stack in net.c needs to send a packet, it calls e1000_transmit() with a pointer to a buffer that holds the packet to be sent; net.c allocates this buffer with kalloc(). Your transmit code must place a pointer to the packet data in a descriptor in the TX (transmit) ring. struct tx_desc describes the descriptor format. You will need to ensure that each buffer is eventually passed to kfree(), but only after the E1000 has finished transmitting the packet (the E1000 sets the E1000_TXD_STAT_DD bit in the descriptor to indicate this).
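The buffer-ownership rule above can be illustrated with a user-space simulation. This is only a sketch under stated assumptions, not the driver itself: every sk_ name is hypothetical, struct sk_tx_desc is a simplified stand-in for struct tx_desc, the bit value given for the "descriptor done" flag is assumed, sk_tdt stands in for the E1000_TDT register, and free() stands in for kfree(). It also omits the descriptor command flags and the locking that a real driver needs.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SK_TX_RING_SIZE 4     /* hypothetical; stands in for TX_RING_SIZE */
#define SK_TXD_STAT_DD  0x01  /* assumed value; stands in for E1000_TXD_STAT_DD */

/* Simplified stand-in for struct tx_desc. */
struct sk_tx_desc {
    uint64_t addr;    /* address of the packet data */
    uint16_t length;  /* number of bytes to send */
    uint8_t  status;  /* the device sets the DD bit when it is done */
};

static struct sk_tx_desc sk_tx_ring[SK_TX_RING_SIZE];
static void *sk_tx_bufs[SK_TX_RING_SIZE]; /* buffer owned by each slot */
static uint32_t sk_tdt;                   /* simulated tail register */

/* Mark every slot as done so the first pass over the ring can use it. */
static void sk_tx_init(void)
{
    for (int i = 0; i < SK_TX_RING_SIZE; i++) {
        sk_tx_ring[i].status = SK_TXD_STAT_DD;
        sk_tx_bufs[i] = NULL;
    }
    sk_tdt = 0;
}

/* Queue buf for transmission. Returns 0 on success, or -1 if the slot at
 * the tail is still in use by the device (the ring is full). */
static int sk_transmit(void *buf, uint16_t len)
{
    assert(buf != NULL);
    uint32_t i = sk_tdt;
    struct sk_tx_desc *d = &sk_tx_ring[i];

    if (!(d->status & SK_TXD_STAT_DD))
        return -1;                 /* previous packet not yet sent */

    free(sk_tx_bufs[i]);           /* safe now: the device is finished */
    sk_tx_bufs[i] = buf;
    d->addr = (uint64_t)(uintptr_t)buf;
    d->length = len;
    d->status = 0;                 /* clear DD until the device sets it */

    sk_tdt = (i + 1) % SK_TX_RING_SIZE; /* hand the slot to the device */
    return 0;
}
```

In the simulation nothing ever sets the done bit, so after SK_TX_RING_SIZE calls the next sk_transmit() returns -1 until some slot's status is marked done again. Freeing the old buffer only when its slot is reused is one way to satisfy the "only after the E1000 has finished" requirement; other designs are possible.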

When the E1000 receives each packet from the ethernet, it DMAs the packet to the memory pointed to by addr in the next RX (receive) ring descriptor. If an E1000 interrupt is not already pending, the E1000 asks the PLIC to deliver one as soon as interrupts are enabled. Your e1000_recv() code must scan the RX ring and deliver each new packet to the network stack (in net.c) by calling net_rx(). You will then need to allocate a new buffer and place it into the descriptor, so that when the E1000 reaches that point in the RX ring again it finds a fresh buffer into which to DMA a new packet.
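The scan-and-refill loop described above can likewise be simulated in user space. Again this is a sketch under assumptions rather than the driver: all sk_ names are hypothetical, the descriptor layout is simplified, the sketch assumes a "descriptor done" status bit marks a filled slot, sk_rdt stands in for the E1000_RDT register, malloc() stands in for kalloc(), and sk_net_rx() is a stub that is assumed to take ownership of the buffer it is given.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SK_RX_RING_SIZE 4     /* hypothetical; stands in for RX_RING_SIZE */
#define SK_RXD_STAT_DD  0x01  /* assumed "descriptor done" bit */
#define SK_BUF_SIZE     2048  /* hypothetical packet buffer size */

/* Simplified stand-in for struct rx_desc. */
struct sk_rx_desc {
    uint64_t addr;    /* where the device writes the packet */
    uint16_t length;  /* bytes written by the device */
    uint8_t  status;  /* the device sets DD once a packet is present */
};

static struct sk_rx_desc sk_rx_ring[SK_RX_RING_SIZE];
static void *sk_rx_bufs[SK_RX_RING_SIZE];
static uint32_t sk_rdt;      /* simulated tail register */
static int sk_delivered;     /* packets handed to the stub stack */

/* Stub for net_rx(): counts the packet and frees the buffer it now owns. */
static void sk_net_rx(void *buf, int len)
{
    (void)len;
    sk_delivered++;
    free(buf);
}

/* Give every slot a fresh buffer; the tail starts at the last slot. */
static void sk_rx_init(void)
{
    for (int i = 0; i < SK_RX_RING_SIZE; i++) {
        sk_rx_bufs[i] = malloc(SK_BUF_SIZE);
        assert(sk_rx_bufs[i] != NULL);
        sk_rx_ring[i].addr = (uint64_t)(uintptr_t)sk_rx_bufs[i];
        sk_rx_ring[i].status = 0;
    }
    sk_rdt = SK_RX_RING_SIZE - 1;
    sk_delivered = 0;
}

/* Deliver every filled slot after the tail; return how many were found. */
static int sk_recv(void)
{
    int n = 0;
    for (;;) {
        uint32_t i = (sk_rdt + 1) % SK_RX_RING_SIZE;
        struct sk_rx_desc *d = &sk_rx_ring[i];
        if (!(d->status & SK_RXD_STAT_DD))
            break;                          /* no more new packets */

        sk_net_rx(sk_rx_bufs[i], d->length);

        sk_rx_bufs[i] = malloc(SK_BUF_SIZE); /* fresh buffer for the slot */
        assert(sk_rx_bufs[i] != NULL);
        d->addr = (uint64_t)(uintptr_t)sk_rx_bufs[i];
        d->status = 0;

        sk_rdt = i;                         /* slot is the device's again */
        n++;
    }
    return n;
}
```

The loop matters because one interrupt may cover several packets: if two slots have been filled before sk_recv() runs, a single call delivers both.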

In addition to reading and writing the descriptor rings in RAM, your driver will need to interact with the E1000 through its memory-mapped control registers, to detect when received packets are available and to inform the E1000 that the driver has filled in some TX descriptors with packets to send. The global variable regs holds a pointer to the E1000's first control register; your driver can get at the other registers by indexing regs as an array. You'll need to use indices E1000_RDT and E1000_TDT in particular.
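Indexing a register block as an array of 32-bit words might look like the following sketch. The sk_ names are hypothetical, the register block is an ordinary array rather than real device memory, and the byte offsets 0x02818 and 0x03818 for the receive and transmit tail registers are assumptions to be checked against kernel/e1000_dev.h and the manual.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed byte offsets of the tail registers, divided by 4 to give an
 * index into an array of 32-bit registers. Verify against e1000_dev.h. */
#define SK_RDT (0x02818 / 4)
#define SK_TDT (0x03818 / 4)

#define SK_NREGS (0x4000 / 4)

/* Ordinary memory standing in for the memory-mapped register block. */
static uint32_t sk_fake_regs[SK_NREGS];

/* volatile tells the compiler that each access must really happen, as is
 * needed for device registers. */
static volatile uint32_t *sk_regs = sk_fake_regs;

static void sk_write_reg(uint32_t index, uint32_t value)
{
    assert(index < SK_NREGS);
    sk_regs[index] = value;
}

static uint32_t sk_read_reg(uint32_t index)
{
    assert(index < SK_NREGS);
    return sk_regs[index];
}
```

For example, sk_write_reg(SK_TDT, 3) stores 3 in the word at byte offset 0x03818 of the block, which is the same access pattern as writing regs[E1000_TDT] in the driver.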

To test e1000_transmit() sending a single packet, run python3 nettest.py txone in one window, and in another window run make qemu and then run nettest txone in xv6, which sends a single packet. nettest.py will print txone: OK if all went well (i.e. qemu's e1000 emulator saw the packet on the DMA ring and forwarded it outside of qemu).

If transmitting worked, tcpdump -XXnr packets.pcap should produce output like this:

reading from file packets.pcap, link-type EN10MB (Ethernet)
21:27:31.688123 IP 10.0.2.15.2000 > 10.0.2.2.25603: UDP, length 5
        0x0000:  5255 0a00 0202 5254 0012 3456 0800 4500  RU....RT..4V..E.
        0x0010:  0021 0000 0000 6411 3ebc 0a00 020f 0a00  .!....d.>.......
        0x0020:  0202 07d0 6403 000d 0000 7478 6f6e 65    ....d.....txone

To test e1000_recv() receiving two packets (an ARP query, then an IP/UDP packet), run make qemu in one window, and python3 nettest.py rxone in another window. nettest.py rxone sends a single UDP packet via qemu to xv6; qemu actually first sends an ARP request to xv6, and (after xv6 returns an ARP reply) qemu forwards the UDP packet to xv6. If e1000_recv() works correctly and passes those packets to net_rx(), net.c should print:

arp_rx: received an ARP packet
ip_rx: received an IP packet

net.c already contains the code to detect qemu's ARP request and call e1000_transmit() to send its reply. This test requires that both e1000_transmit() and e1000_recv() work. In addition, if all went well, tcpdump -XXnr packets.pcap should produce output like this:

reading from file packets.pcap, link-type EN10MB (Ethernet)
21:29:16.893600 ARP, Request who-has 10.0.2.15 tell 10.0.2.2, length 28
        0x0000:  ffff ffff ffff 5255 0a00 0202 0806 0001  ......RU........
        0x0010:  0800 0604 0001 5255 0a00 0202 0a00 0202  ......RU........
        0x0020:  0000 0000 0000 0a00 020f                 ..........
21:29:16.894543 ARP, Reply 10.0.2.15 is-at 52:54:00:12:34:56, length 28
        0x0000:  5255 0a00 0202 5254 0012 3456 0806 0001  RU....RT..4V....
        0x0010:  0800 0604 0002 5254 0012 3456 0a00 020f  ......RT..4V....
        0x0020:  5255 0a00 0202 0a00 0202                 RU........
21:29:16.902656 IP 10.0.2.2.61350 > 10.0.2.15.2000: UDP, length 3
        0x0000:  5254 0012 3456 5255 0a00 0202 0800 4500  RT..4VRU......E.
        0x0010:  001f 0000 0000 4011 62be 0a00 0202 0a00  ......@.b.......
        0x0020:  020f efa6 07d0 000b fdd6 7879 7a         ..........xyz

Your output will look somewhat different, but it should contain the strings "ARP, Request", "ARP, Reply", "UDP", and "....xyz".

If both of the above tests work, then make grade should show that the first two tests pass.

e1000 hints

Start by adding print statements to e1000_transmit() and e1000_recv(), and running (in xv6) nettest txone. You should see from your print statements that nettest txone generates a call to e1000_transmit.

Some hints for implementing e1000_transmit:

Some hints for implementing e1000_recv:

You'll need locks to cope with the possibility that xv6 might use the E1000 from more than one process, or might be using the E1000 in a kernel thread when an interrupt arrives.

Part Two: UDP Receive

UDP, the User Datagram Protocol, allows user processes on different Internet hosts to exchange individual packets (datagrams). UDP is layered on top of IP. A user process indicates which host it wants to send a packet to by specifying a 32-bit IP address. Each UDP packet contains a source port number and a destination port number; processes can request to receive packets that arrive addressed to particular port numbers, and can specify the destination port number when sending. Thus two processes on different hosts can communicate with UDP if they know each other's IP addresses and the port numbers each is listening on. For example, Google operates a DNS name server on the host with IP address 8.8.8.8, listening on UDP port 53.

In this task, you'll add code to kernel/net.c to receive UDP packets, queue them, and allow user processes to read them. net.c already contains the code required for user processes to transmit UDP packets (with the exception of e1000_transmit(), which you provide).

Your job is to implement ip_rx(), sys_recv(), sys_bind(), and sys_unbind() in kernel/net.c. You are done when make grade says your solution passes all of the tests.

The system-call API specification for UDP looks like this:

You'll need to implement these calls, with the exception of sys_send(). The program user/nettest.c uses this API.

To make recv() work, you'll need to add code to ip_rx(), which net_rx() calls for each received IP packet. ip_rx() should decide if the arriving packet is UDP, and whether its destination port has been passed to bind(); if both are true, it should save the packet where recv() can find it. However, for any given port, no more than 16 packets should be saved; if 16 are already waiting for recv(), an incoming packet for that port should be dropped. The point of this rule is to prevent a fast or abusive sender from forcing xv6 to run out of memory. Furthermore, if packets are being dropped for one port because it already has 16 packets waiting, that should not affect packets arriving for other ports.
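The per-port limit can be sketched as one small bounded queue per bound port. This is an illustration under assumptions, not the required design: the sk_ names, the fixed table of SK_MAX_PORTS entries, and the return conventions are all hypothetical, and the sketch leaves out the locking and sleeping that the kernel version needs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_MAX_QUEUED 16  /* per-port limit stated in the lab text */
#define SK_MAX_PORTS  8   /* hypothetical number of bindable ports */

struct sk_port {
    int      bound;                /* nonzero once the port is bound */
    uint16_t port;                 /* port number in host byte order */
    void    *pkts[SK_MAX_QUEUED];  /* circular queue of saved packets */
    int      head;                 /* index of the oldest packet */
    int      count;                /* number of packets waiting */
};

static struct sk_port sk_ports[SK_MAX_PORTS];

/* Find the entry for a bound port, or NULL if the port is not bound. */
static struct sk_port *sk_find(uint16_t port)
{
    for (int i = 0; i < SK_MAX_PORTS; i++)
        if (sk_ports[i].bound && sk_ports[i].port == port)
            return &sk_ports[i];
    return NULL;
}

/* Bind a port. Returns 0 on success, -1 if the table is full. */
static int sk_bind(uint16_t port)
{
    if (sk_find(port) != NULL)
        return 0;
    for (int i = 0; i < SK_MAX_PORTS; i++) {
        if (!sk_ports[i].bound) {
            sk_ports[i].bound = 1;
            sk_ports[i].port = port;
            sk_ports[i].head = 0;
            sk_ports[i].count = 0;
            return 0;
        }
    }
    return -1;
}

/* Save a packet for a port. Returns 0 if queued, or -1 if the packet must
 * be dropped because the port is unbound or already holds 16 packets. */
static int sk_enqueue(uint16_t port, void *pkt)
{
    assert(pkt != NULL);
    struct sk_port *p = sk_find(port);
    if (p == NULL || p->count == SK_MAX_QUEUED)
        return -1;
    p->pkts[(p->head + p->count) % SK_MAX_QUEUED] = pkt;
    p->count++;
    return 0;
}

/* Remove and return the oldest packet for a port, or NULL if none. */
static void *sk_dequeue(uint16_t port)
{
    struct sk_port *p = sk_find(port);
    if (p == NULL || p->count == 0)
        return NULL;
    void *pkt = p->pkts[p->head];
    p->head = (p->head + 1) % SK_MAX_QUEUED;
    p->count--;
    return pkt;
}
```

Because each port has its own count, a full queue on one port causes drops only for that port; packets for other bound ports are still accepted.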

The packet buffers that ip_rx() looks at contain a 14-byte ethernet header, followed by a 20-byte IP header, followed by an 8-byte UDP header, followed by the UDP payload. You'll find C struct definitions for each of these in kernel/net.h. Wikipedia has descriptions of both the IP header and the UDP header.
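The header sizes add up to 14 + 20 + 8 = 42 bytes, so the UDP payload starts at byte offset 42 of the frame. The sketch below locates the payload by offset alone; the sk_ names are hypothetical, and real code would use the struct definitions in kernel/net.h and take the payload length from the UDP length field rather than from the frame length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_ETH_HLEN 14  /* ethernet header bytes */
#define SK_IP_HLEN  20  /* IP header bytes (no options) */
#define SK_UDP_HLEN 8   /* UDP header bytes */

/* Return a pointer to the UDP payload inside frame and store the number
 * of bytes that follow the headers in *payload_len. Returns NULL if the
 * frame is too short to hold all three headers. */
static const uint8_t *sk_udp_payload(const uint8_t *frame, size_t len,
                                     size_t *payload_len)
{
    assert(frame != NULL && payload_len != NULL);
    const size_t hdrs = SK_ETH_HLEN + SK_IP_HLEN + SK_UDP_HLEN; /* 42 */
    if (len < hdrs)
        return NULL;
    *payload_len = len - hdrs;
    return frame + hdrs;
}
```

As a check against the txone capture shown earlier, that frame is 47 bytes long, which leaves 47 - 42 = 5 payload bytes: the string "txone".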

Production IP/UDP implementations are complex, handling protocol options and validating invariants. You only need to do enough to pass make grade. Your code needs to look at ip_p and ip_src in the IP header, and dport, sport, and ulen in the UDP header.

You will have to pay attention to byte order. Ethernet, IP, and UDP header fields that contain multi-byte integers place the most significant byte first in the packet. The RISC-V CPU, when it lays out a multi-byte integer in memory, places the least-significant byte first. This means that, when code extracts a multi-byte integer from a packet, it must re-arrange the bytes. This applies to short (2-byte) and int (4-byte) fields. You can use the ntohs() and ntohl() functions for 2-byte and 4-byte fields, respectively. Look at net_rx() for an example of this when looking at the 2-byte ethernet type field.
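The byte re-arrangement can be sketched as follows. These helpers are hypothetical illustrations, not the ntohs() and ntohl() that xv6 provides: sk_swap16() shows the swap that a 2-byte conversion amounts to on a little-endian CPU, while sk_read_be16() and sk_read_be32() assemble a value from packet bytes one at a time, which gives the same answer on any CPU.

```c
#include <assert.h>
#include <stdint.h>

/* Swap the two bytes of a 16-bit value. On a little-endian CPU this is
 * what converting a 2-byte network-order field to host order does. */
static uint16_t sk_swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* Read a 2-byte big-endian (network order) integer from packet bytes. */
static uint16_t sk_read_be16(const uint8_t *p)
{
    assert(p != 0);
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Read a 4-byte big-endian (network order) integer from packet bytes. */
static uint32_t sk_read_be32(const uint8_t *p)
{
    assert(p != 0);
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}
```

In the rxone capture shown earlier, the destination port appears in the packet as the bytes 07 d0; read most significant byte first, that is 0x07d0, or 2000. Read without conversion on a little-endian CPU, the same two bytes would give 0xd007.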

You can run the same tests that make grade runs by running python3 nettest.py grade in one window, and (in another window) then running nettest grade inside xv6. If all goes well, nettest.py should print txone: OK, and you should see this in the xv6 window:

$ nettest grade
txone: sending one packet
arp_rx: received an ARP packet
ip_rx: received an IP packet
ping1: starting
ping1: OK
ping2: starting
ping2: OK
ping3: starting
ping3: OK
dns: starting
DNS arecord for pdos.csail.mit.edu. is 128.52.129.126
dns: OK

Submit the lab

Time spent

Create a new file, time.txt, and put in a single integer, the number of hours you spent on the lab. git add and git commit the file.

Answers

If this lab had questions, write up your answers in answers-*.txt. git add and git commit these files.

Submit

Assignment submissions are handled by Gradescope. You will need an MIT gradescope account. See Piazza for the entry code to join the class. Use this link if you need more help joining.

When you're ready to submit, run make zipball, which will generate lab.zip. Upload this zip file to the corresponding Gradescope assignment.

If you run make zipball and you have either uncommitted changes or untracked files, you will see output similar to the following:

 M hello.c
?? bar.c
?? foo.pyc
Untracked files will not be handed in.  Continue? [y/N]

Inspect the above lines and make sure all files that your lab solution needs are tracked, i.e., not listed in a line that begins with ??. You can cause git to track a new file that you create using git add {filename}.

Optional Challenges:

Some of the benefits of the challenge exercises below are only measurable/testable on real, high-performance hardware, which means x86-based computers.

If you pursue a challenge problem, whether it is related to networking or not, please let the course staff know!