6.1810 2024 Lecture 9: Device drivers, interrupts

Topic: device drivers
  a CPU needs attached devices: storage, communication, display, &c
  OS device drivers control these devices
  device handling can be hard:
    devices often have rigid and complex interfaces
    devices and CPU run in parallel -- concurrency
    interrupts
      hardware wants attention now! e.g., pkt arrived
      software must set aside current work and respond
      on RISC-V, use the same trap mechanism as for syscalls and exceptions
      interrupts can arrive at awkward times
  most code in production kernels is device drivers
  you will write one for a network card

Where are devices?
  [CPU, bus, RAM, disk, net, uart]

Programming devices: memory-mapped I/O
  device hardware has some control and status registers
  device registers live at a physical "memory" address
  ld/st to these addresses read/write device control registers
  platform designer decides where devices live in physical address space

example device: UART
  Universal Asynchronous Receiver Transmitter
  serial interface, input and output
  "RS232 port", e.g. qemu console
  a uart is hardware -- transistors
  qemu emulates the common 16550 uart chip
  data sheet: 16550.pdf
    link on schedule page, or web search
    data sheet details physical, electrical, and programming interfaces
  [rx wire, receive shift register, receive FIFO]
  [transmit FIFO, transmit shift register, tx wire]
  16-byte FIFOs
  memory-mapped 8-bit registers
    starting at physical address UART0=0x10000000 (page 9 of 16550.pdf):
    0: RHR / THR -- receive/transmit holding register
    1: IER -- interrupt enable register, 0x1 is receive enable, 0x2 is transmit enable
    ...
    5: LSR -- line status register, 0x1 is receive data ready

how does a kernel device driver use these registers?
  simple example: uartgetc() in kernel/uart.c
  ReadReg(RHR) turns into *(char*)(0x10000000 + 0)

why does the UART have FIFO buffers?
  device driver must cope with times when device is not ready
    read() but rx FIFO is empty
    write() but tx FIFO is full
  LSR bits: Data Ready, Transmitter Empty

how should device drivers wait?
  perhaps a "busy loop":
    while((LSR & 1) == 0)
      ;
    return RHR
  OK if waiting is unlikely -- if input nearly always available
  but too wasteful for the console!
    often no input (keystrokes) is waiting in the FIFO
  many devices are like this -- may need to wait a long time for I/O
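a concrete sketch of the register map and the busy loop, in the style of
kernel/uart.c -- uartgetc_poll() is a name made up here; xv6's real
uartgetc() returns -1 instead of spinning when no byte is ready

    #define UART0 0x10000000L              // where the platform maps the UART
    #define Reg(reg) ((volatile unsigned char *)(UART0 + (reg)))
    #define ReadReg(reg) (*(Reg(reg)))     // an ordinary load, but from a device register
    #define RHR 0                          // receive holding register
    #define LSR 5                          // line status register
    #define LSR_RX_READY 0x01              // LSR bit 0: a byte is waiting in the rx FIFO

    // busy-wait until the rx FIFO has a byte, then return it
    int
    uartgetc_poll(void)
    {
      while((ReadReg(LSR) & LSR_RX_READY) == 0)
        ;                                  // spin: wastes the CPU if input is rare
      return ReadReg(RHR);                 // reading RHR pops one byte from the rx FIFO
    }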
the solution: interrupts
  when device needs driver attention, device raises an interrupt
  UART interrupts if:
    rx FIFO goes from empty to not-empty, or
    tx FIFO goes from full to not-full

how does kernel see interrupts?
  [add PLIC to diagram, including bus]
  device -> PLIC -> CPU -> trap -> usertrap()/kerneltrap() -> devintr()
  trap.c devintr()
    scause high bit indicates the trap is from a device interrupt
    a PLIC register indicates which device interrupted
      the "IRQ" -- UART's IRQ is 10
      IRQs are defined by the platform -- qemu in this case

an interrupt is usually just a hint that device state might have changed
  the real truth is in the device's status registers
  device driver must read them to decide action, if any
  for UART, check LSR to see if rx FIFO non-empty, tx FIFO non-full
    as in uartgetc()
  one interrupt may signal multiple actions needed

Let's look at how xv6 sets up the interrupt machinery
  start() start.c:35
    w_sie(r_sie() | SIE_SEIE | SIE_STIE);
    asks for interrupts from PLIC, timer
  uartinit() uart.c:75
    WriteReg(IER, IER_TX_ENABLE | IER_RX_ENABLE);
  trap() trap.c:65 calls intr_on() riscv.h:285
    w_sstatus(r_sstatus() | SSTATUS_SIE);

Let's look at the shell reading input from the console/UART
  % make qemu-gdb
  % gdb
  (gdb) c
  (gdb) tbreak sys_read
  (gdb) c
  (gdb) tui enable
  (gdb) where
  sys_read()
  fileread()
  consoleread()
    look at cons.buf, cons.r, cons.w -- "producer/consumer buffer"
    [diagram: buf, r, w]
    (gdb) print cons
    there's nothing to read yet...
    sleep()

now let's look at uart interrupt handling
  I'm going to press return
  Q: where should I tell gdb to put a breakpoint to see the interrupt?
  (gdb) tb *kernelvec
  (gdb) c
  how did we get here?
    (gdb) where
    in kernel; no process was running; scheduler()
    UART -> PLIC -> stvec -> kernelvec
    (gdb) p/x $stvec
    (gdb) p $pc
  kernelvec.S:
    if a process had been executing in user space,
      trap would have gone to trampoline and usertrap(), which we've seen
    kernelvec like trampoline, but for traps while kernel is executing
    saves registers on current stack; which stack?
      in this case, special scheduler stack
      if executing system call in kernel, some proc's kernel stack
      if in kernel, and interrupts enabled, stack guaranteed valid
    kernelvec ends by jumping to kerneltrap() -- C code
  (gdb) tb kerneltrap
  (gdb) c
  (gdb) next
  ... into devintr()
  devintr()
    (gdb) p/x $scause
    scause high bit means it's an interrupt
      p. 96 / Table 22 in riscv privileged manual
    plic_claim() to find IRQ (which device)
    (gdb) p irq
    the PLIC generates IRQ 10 for the UART
  uartintr()
    uartgetc()
      what's in the LSR?
      (gdb) x/1bx 0x10000005
      16550.pdf page 9 says low bit is Data Ready
      if LSR says data ready, fetch from RHR
      x/1bx 0x10000005 -- note low bit no longer set
    consoleintr()
      backspace/newline/&c processing
      print cons
      x/3b cons.buf
      wakeup()
  return through devintr, plic_complete(), kerneltrap
  scheduler will now resume sh's read() system call since woken up
  let's break in sh's consoleread()
    (gdb) tb console.c:99
    (gdb) c
    (gdb) where
    consoleread()'s sleep() returns
    consoleread() sees our character in cons.buf[cons.r]
  sh's read returns, with my typed newline character
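to summarize what the walkthrough just showed, a stripped-down sketch of the
cons.buf producer/consumer buffer -- not xv6's actual console.c (the real
code holds a lock, does line editing, returns at end of line, and copies to
user space); wait_on()/notify() and the function names are made up here,
standing in for xv6's sleep()/wakeup()

    #define INPUT_BUF_SIZE 128

    void wait_on(void *chan);   // stand-in for xv6's sleep(chan, lock)
    void notify(void *chan);    // stand-in for xv6's wakeup(chan)

    struct {
      char buf[INPUT_BUF_SIZE];
      unsigned int r;           // read index: advanced by the top half
      unsigned int w;           // write index: advanced by the bottom half
    } cons;

    // bottom half: called from uartintr() for each arriving character
    void
    console_put(int c)
    {
      if(cons.w - cons.r < INPUT_BUF_SIZE){
        cons.buf[cons.w++ % INPUT_BUF_SIZE] = c;   // produce
        notify(&cons.r);                           // wake a sleeping reader, if any
      }
    }

    // top half: called from a process's read() on the console
    int
    console_get(char *dst, int n)
    {
      int i;
      for(i = 0; i < n; i++){
        while(cons.r == cons.w)                        // buffer empty?
          wait_on(&cons.r);                            // give up the CPU until notified
        dst[i] = cons.buf[cons.r++ % INPUT_BUF_SIZE];  // consume
      }
      return i;
    }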
General device-driver pattern: top-half and bottom-half
  [diagram: top-half/bottom-half]
  top half: executing a process's system call, e.g. write() or read()
    may tell the device to start output or input
    may wait for input to be ready, or output to complete
  shared information (buffer)
  bottom half: the interrupt handler
    reads input, or sends more output, from/to device hardware
    interacts with "top half" process
      put input where top half can find it
      tell top half that input has arrived
        or that more output can be sent
    does *not* run in context of top-half process
      maybe on different core
      maybe interrupting some other process
    so interactions must be arm's-length -- buffers, sleep/wakeup

What if multiple devices want to interrupt at the same time?
  The PLIC distributes interrupts among cores
  Different interrupts can be handled in parallel on different cores
  Each interrupt is claimed by the first core to call plic_claim()
  Each individual device has at most one interrupt in play
    the PLIC learns the interrupt is done via plic_complete()

What if kernel has disabled interrupts when a device asks for one?
  by clearing SIE in sstatus, with intr_off()
  PLIC/CPU remember pending interrupts
  and deliver them when the kernel re-enables interrupts

Interrupts involve several forms of concurrency
  1. Device produces new data while kernel consuming
     Or the other way around
  2. If enabled, device interrupts can occur at any time!
     E.g. while top half is executing
  3. Interrupt may run on different CPU in parallel with top half
     Locks: next lecture

Decoupling production and consumption
  Input from device:
    Can arrive at time when reader not waiting
    Can arrive faster, or slower, than reader can read
    Want to accumulate input, and read(), in batches for efficiency
  Output to device:
    If device is slow, want to buffer output so process can continue
    If device is fast, want to send in batches for efficiency
  A common solution pattern:
    producer/consumer buffer
    separate pointers for producer, consumer
    wait; notify;
  We've seen this at two levels:
    UART internal FIFOs, for device and driver -- plus interrupts
    cons.buf, for top-half and bottom-half -- plus sleep/wakeup
  We'll see this again when we look at pipes

If enabled, a device interrupt can occur between any two instructions
  Example: suppose the kernel is counting something in a global variable n
    top half: n = n + 1
    interrupt
    bottom half: n = n + 1
  the machine code for n=n+1 looks like this:
    lw a4, n
    add a4, a4, 1
    sw a4, n
  what if an interrupt occurs between lw and add?
    and interrupt handler also says n = n + 1?
  One solution: briefly disable interrupts in top half
    intr_off()
    n = n + 1
    intr_on()
    intr_off(): w_sstatus(r_sstatus() & ~SSTATUS_SIE);
  Good, but not enough: interrupt could arrive on a different CPU
  More on this when we look at locking
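the same race written out as C rather than pseudo-code -- n and the function
names are made up for illustration; intr_off()/intr_on() are the xv6 helpers
quoted above

    void intr_on(void);     // from kernel/riscv.h: set SSTATUS_SIE
    void intr_off(void);    // from kernel/riscv.h: clear SSTATUS_SIE

    int n;                  // shared by the top half and the interrupt handler

    void
    bottom_half(void)       // runs inside the device interrupt handler
    {
      n = n + 1;            // compiles to lw / add / sw
    }

    void
    top_half(void)          // runs inside a system call
    {
      intr_off();           // clear SIE: no interrupts on this CPU
      n = n + 1;            // lw / add / sw can no longer be split by an interrupt
      intr_on();            // re-enable; any pending interrupt is delivered now
    }

    // still not enough: another CPU could take the interrupt and run
    // bottom_half() at the same time -- that's what locks are for (next lecture)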
Interrupts incur overhead
  around a microsecond
    the time required for CPU trap, save registers, decide which device,
    and later restore registers, and return
  "overhead" == cost *excluding* useful device driver work
  What if interrupt rate is high?
    Example: modern ethernet can deliver millions of packets / second
    At that rate, big fraction of CPU time in interrupt *overhead*

Polling: an event notification strategy for high rates
  Tell device (or PLIC) not to generate interrupts for the device
  Top-half loops until device says it is ready
    e.g. uartputc_sync()
  Or perhaps check in some frequently executed kernel code
    e.g. scheduler()
  Then process everything accumulated since last poll
  More efficient than interrupts if device is usually ready quickly
  Perhaps switch strategies based on measured rate

DMA (direct memory access) can move data efficiently
  the xv6 uart driver reads bytes one at a time in software
    CPUs are not efficient for this: off chip, not cacheable, 8 bits at a time
    OK only for low-speed devices
  most fast devices automatically copy batches of input to RAM -- DMA
    then interrupt
    input is already in ordinary RAM
    CPU RAM operations are efficient

Interrupt evolution
  Interrupt overhead used to be a few CPU cycles
  now 1000s of cycles -- around a microsecond
    excluding actual device driver code
    due to pipelines, large register sets, cache/TLB misses, slow RAM
  So:
    old approach: simple h/w, smart s/w, lots of interrupts
    new approach: smart h/w, does lots of work for each interrupt

Interrupts and device handling are a continuing area of concern
  Special fast interrupt handler paths
  Clever spreading of work over CPUs
  Forwarding of interrupts to user space
    for page faults and user-handled devices
    h/w delivers directly to user, w/o kernel intervention?
    faster forwarding path through kernel?
  We will be seeing these topics later in the course
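as a closing illustration of polled output, a rough sketch in the spirit of
xv6's uartputc_sync() -- uartputc_poll() is a name made up here, and the real
xv6 code also disables interrupts with push_off() and checks for panic

    #define UART0 0x10000000L
    #define Reg(reg) ((volatile unsigned char *)(UART0 + (reg)))
    #define ReadReg(reg)     (*(Reg(reg)))
    #define WriteReg(reg, v) (*(Reg(reg)) = (v))
    #define THR 0                    // transmit holding register (write side of offset 0)
    #define LSR 5                    // line status register
    #define LSR_TX_IDLE (1 << 5)     // LSR bit 5: THR empty, UART can accept a byte

    void
    uartputc_poll(int c)
    {
      // spin until the UART can take another byte -- no interrupt needed
      while((ReadReg(LSR) & LSR_TX_IDLE) == 0)
        ;
      WriteReg(THR, c);              // hand the byte to the tx FIFO
    }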