6.1810 2024 Lecture 9: Device drivers, interrupts

Topic: device drivers
  a CPU needs attached devices: storage, communication, display, &c
  OS device drivers control these devices
  device handling can be hard:
    devices often have rigid and complex interfaces
    devices and CPU run in parallel -- concurrency
    interrupts
      hardware wants attention now! e.g., pkt arrived
      software must set aside current work and respond
      on RISC-V, use the same trap mechanism as for syscalls and exceptions
      interrupts can arrive at awkward times
  most code in production kernels is device drivers
  you will write one for a network card

Where are devices?
  [CPU, bus, RAM, disk, net, uart]

Programming devices: memory-mapped I/O
  device hardware has some control and status registers
  device registers live at a physical "memory" address
  ld/st to these addresses read/write device control registers
  platform designer decides where devices live in physical address space

example device: UART
  Universal Asynchronous Receiver Transmitter
  serial interface, input and output
  "RS232 port", e.g. qemu console
  a uart is hardware -- transistors
  qemu emulates the common 16550 uart chip
  data sheet: 16550.pdf
    link on schedule page, or web search
    data sheet details physical, electrical, and programming interfaces
  [rx wire, receive shift register, receive FIFO]
  [transmit FIFO, transmit shift register, tx wire]
  16-byte FIFOs
  memory-mapped 8-bit registers
    starting at physical address UART0=0x10000000 (page 9 of 16550.pdf):
    0: RHR / THR -- receive/transmit holding register
    1: IER -- interrupt enable register, 0x1 is receive enable, 0x2 is transmit enable
    ...
    5: LSR -- line status register, 0x1 is receive data ready

how does a kernel device driver use these registers?
  simple example: uartgetc() in kernel/uart.c
  ReadReg(RHR) turns into *(char*)(0x10000000 + 0)

why does the UART have FIFO buffers?
  device driver must cope with times when device is not ready
    read() but rx FIFO is empty
    write() but tx FIFO is full
  LSR bits: Data Ready, Transmitter Empty

how should device drivers wait?
  perhaps a "busy loop":
    while((LSR & 1) == 0)
      ;
    return RHR
  OK if waiting is unlikely -- if input nearly always available
  but too wasteful for the console!
    often no input (keystrokes) is waiting in the FIFO
  many devices are like this -- may need to wait a long time for I/O
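a concrete sketch of the register map and the busy loop, in the style of
kernel/uart.c -- uartgetc_poll() is a name made up here; xv6's real
uartgetc() returns -1 instead of spinning when no byte is ready

    #define UART0 0x10000000L              // where the platform maps the UART
    #define Reg(reg) ((volatile unsigned char *)(UART0 + (reg)))
    #define ReadReg(reg) (*(Reg(reg)))     // an ordinary load, but from a device register
    #define RHR 0                          // receive holding register
    #define LSR 5                          // line status register
    #define LSR_RX_READY 0x01              // LSR bit 0: a byte is waiting in the rx FIFO

    // busy-wait until the rx FIFO has a byte, then return it
    int
    uartgetc_poll(void)
    {
      while((ReadReg(LSR) & LSR_RX_READY) == 0)
        ;                                  // spin: wastes the CPU if input is rare
      return ReadReg(RHR);                 // reading RHR pops one byte from the rx FIFO
    }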
the solution: interrupts
  when device needs driver attention, device raises an interrupt
  UART interrupts if:
    rx FIFO goes from empty to not-empty, or
    tx FIFO goes from full to not-full

how does kernel see interrupts?
  [add PLIC to diagram, including bus]
  device -> PLIC -> CPU -> trap -> usertrap()/kerneltrap() -> devintr()
  trap.c devintr()
    scause high bit indicates the trap is from a device interrupt
    a PLIC register indicates which device interrupted
      the "IRQ" -- UART's IRQ is 10
      IRQs are defined by the platform -- qemu in this case

an interrupt is usually just a hint that device state might have changed
  the real truth is in the device's status registers
  device driver must read them to decide action, if any
  for UART, check LSR to see if rx FIFO non-empty, tx FIFO non-full
    as in uartgetc()
  one interrupt may signal multiple actions needed

Let's look at how xv6 sets up the interrupt machinery
  start() start.c:35
    w_sie(r_sie() | SIE_SEIE | SIE_STIE);
    asks for interrupts from PLIC, timer
  uartinit() uart.c:75
    WriteReg(IER, IER_TX_ENABLE | IER_RX_ENABLE);
  trap() trap.c:65 calls intr_on() riscv.h:285
    w_sstatus(r_sstatus() | SSTATUS_SIE);

Let's look at the shell reading input from the console/UART
  % make qemu-gdb
  % gdb
  (gdb) c
  (gdb) tbreak sys_read
  (gdb) c
  (gdb) tui enable
  (gdb) where
  sys_read()
  fileread()
  consoleread()
    look at cons.buf, cons.r, cons.w -- "producer/consumer buffer"
    [diagram: buf, r, w]
    (gdb) print cons
    there's nothing to read yet...
    sleep()

now let's look at uart interrupt handling
  I'm going to press return
  Q: where should I tell gdb to put a breakpoint to see the interrupt?
  (gdb) tb *kernelvec
  (gdb) c
  how did we get here?
    (gdb) where
    in kernel; no process was running; scheduler()
    UART -> PLIC -> stvec -> kernelvec
    (gdb) p/x $stvec
    (gdb) p $pc
  kernelvec.S:
    if a process had been executing in user space,
      trap would have gone to trampoline and usertrap(), which we've seen
    kernelvec like trampoline, but for traps while kernel is executing
    saves registers on current stack; which stack?
      in this case, special scheduler stack
      if executing system call in kernel, some proc's kernel stack
      if in kernel, and interrupts enabled, stack guaranteed valid
    kernelvec ends by jumping to kerneltrap() -- C code
  (gdb) tb kerneltrap
  (gdb) c
  (gdb) next
  ... into devintr()
  devintr()
    (gdb) p/x $scause
    scause high bit means it's an interrupt
      p. 96 / Table 22 in riscv privileged manual
    plic_claim() to find IRQ (which device)
    (gdb) p irq
    the PLIC generates IRQ 10 for the UART
  uartintr()
    uartgetc()
      what's in the LSR?
      (gdb) x/1bx 0x10000005
      16550.pdf page 9 says low bit is Data Ready
      if LSR says data ready, fetch from RHR
      x/1bx 0x10000005 -- note low bit no longer set
    consoleintr()
      backspace/newline/&c processing
      print cons
      x/3b cons.buf
      wakeup()
  return through devintr, plic_complete(), kerneltrap
  scheduler will now resume sh's read() system call since woken up
  let's break in sh's consoleread()
    (gdb) tb console.c:99
    (gdb) c
    (gdb) where
    consoleread()'s sleep() returns
    consoleread() sees our character in cons.buf[cons.r]
  sh's read returns, with my typed newline character
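to summarize what the walkthrough just showed, a stripped-down sketch of the
cons.buf producer/consumer buffer -- not xv6's actual console.c (the real
code holds a lock, does line editing, returns at end of line, and copies to
user space); wait_on()/notify() and the function names are made up here,
standing in for xv6's sleep()/wakeup()

    #define INPUT_BUF_SIZE 128

    void wait_on(void *chan);   // stand-in for xv6's sleep(chan, lock)
    void notify(void *chan);    // stand-in for xv6's wakeup(chan)

    struct {
      char buf[INPUT_BUF_SIZE];
      unsigned int r;           // read index: advanced by the top half
      unsigned int w;           // write index: advanced by the bottom half
    } cons;

    // bottom half: called from uartintr() for each arriving character
    void
    console_put(int c)
    {
      if(cons.w - cons.r < INPUT_BUF_SIZE){
        cons.buf[cons.w++ % INPUT_BUF_SIZE] = c;   // produce
        notify(&cons.r);                           // wake a sleeping reader, if any
      }
    }

    // top half: called from a process's read() on the console
    int
    console_get(char *dst, int n)
    {
      int i;
      for(i = 0; i < n; i++){
        while(cons.r == cons.w)                        // buffer empty?
          wait_on(&cons.r);                            // give up the CPU until notified
        dst[i] = cons.buf[cons.r++ % INPUT_BUF_SIZE];  // consume
      }
      return i;
    }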
General device-driver pattern: top-half and bottom-half
  [diagram: top-half/bottom-half]
  top half: executing a process's system call, e.g. write() or read()
    may tell the device to start output or input
    may wait for input to be ready, or output to complete
  shared information (buffer)
  bottom half: the interrupt handler
    reads input, or sends more output, from/to device hardware
    interacts with "top half" process
      put input where top half can find it
      tell top half that input has arrived
        or that more output can be sent
    does *not* run in context of top-half process
      maybe on different core
      maybe interrupting some other process
    so interactions must be arm's-length -- buffers, sleep/wakeup

What if multiple devices want to interrupt at the same time?
  The PLIC distributes interrupts among cores
  Different interrupts can be handled in parallel on different cores
  Each interrupt is claimed by the first core to call plic_claim()
  Each individual device has at most one interrupt in play
    the PLIC learns the interrupt is done via plic_complete()

What if kernel has disabled interrupts when a device asks for one?
  by clearing SIE in sstatus, with intr_off()
  PLIC/CPU remember pending interrupts
  and deliver them when the kernel re-enables interrupts

Interrupts involve several forms of concurrency
  1. Device produces new data while kernel consuming
     Or the other way around
  2. If enabled, device interrupts can occur at any time!
     E.g. while top half is executing
  3. Interrupt may run on different CPU in parallel with top half
     Locks: next lecture

Decoupling production and consumption
  Input from device:
    Can arrive at time when reader not waiting
    Can arrive faster, or slower, than reader can read
    Want to accumulate input, and read(), in batches for efficiency
  Output to device:
    If device is slow, want to buffer output so process can continue
    If device is fast, want to send in batches for efficiency
  A common solution pattern:
    producer/consumer buffer
    separate pointers for producer, consumer
    wait; notify;
  We've seen this at two levels:
    UART internal FIFOs, for device and driver -- plus interrupts
    cons.buf, for top-half and bottom-half -- plus sleep/wakeup
  We'll see this again when we look at pipes

If enabled, a device interrupt can occur between any two instructions
  Example: suppose the kernel is counting something in a global variable n
    top half: n = n + 1
    interrupt
    bottom half: n = n + 1
  the machine code for n=n+1 looks like this:
    lw a4, n
    add a4, a4, 1
    sw a4, n
  what if an interrupt occurs between lw and add?
    and interrupt handler also says n = n + 1?
  One solution: briefly disable interrupts in top half
    intr_off()
    n = n + 1
    intr_on()
    intr_off(): w_sstatus(r_sstatus() & ~SSTATUS_SIE);
  Good, but not enough: interrupt could arrive on a different CPU
  More on this when we look at locking
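the same race written out as C rather than pseudo-code -- n and the function
names are made up for illustration; intr_off()/intr_on() are the xv6 helpers
quoted above

    void intr_on(void);     // from kernel/riscv.h: set SSTATUS_SIE
    void intr_off(void);    // from kernel/riscv.h: clear SSTATUS_SIE

    int n;                  // shared by the top half and the interrupt handler

    void
    bottom_half(void)       // runs inside the device interrupt handler
    {
      n = n + 1;            // compiles to lw / add / sw
    }

    void
    top_half(void)          // runs inside a system call
    {
      intr_off();           // clear SIE: no interrupts on this CPU
      n = n + 1;            // lw / add / sw can no longer be split by an interrupt
      intr_on();            // re-enable; any pending interrupt is delivered now
    }

    // still not enough: another CPU could take the interrupt and run
    // bottom_half() at the same time -- that's what locks are for (next lecture)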
Interrupts incur overhead
  around a microsecond
    the time required for CPU trap, save registers, decide which device,
    and later restore registers, and return
  "overhead" == cost *excluding* useful device driver work
  What if interrupt rate is high?
    Example: modern ethernet can deliver millions of packets / second
    At that rate, big fraction of CPU time in interrupt *overhead*

Polling: an event notification strategy for high rates
  Tell device (or PLIC) not to generate interrupts for the device
  Top-half loops until device says it is ready
    e.g. uartputc_sync()
  Or perhaps check in some frequently executed kernel code
    e.g. scheduler()
  Then process everything accumulated since last poll
  More efficient than interrupts if device is usually ready quickly
  Perhaps switch strategies based on measured rate

DMA (direct memory access) can move data efficiently
  the xv6 uart driver reads bytes one at a time in software
    CPUs are not efficient for this: off chip, not cacheable, 8 bits at a time
    OK only for low-speed devices
  most fast devices automatically copy batches of input to RAM -- DMA
    then interrupt
    input is already in ordinary RAM
    CPU RAM operations are efficient

Interrupt evolution
  Interrupt overhead used to be a few CPU cycles
  now 1000s of cycles -- around a microsecond
    excluding actual device driver code
    due to pipelines, large register sets, cache/TLB misses, slow RAM
  So:
    old approach: simple h/w, smart s/w, lots of interrupts
    new approach: smart h/w, does lots of work for each interrupt

Interrupts and device handling are a continuing area of concern
  Special fast interrupt handler paths
  Clever spreading of work over CPUs
  Forwarding of interrupts to user space
    for page faults and user-handled devices
    h/w delivers directly to user, w/o kernel intervention?
    faster forwarding path through kernel?
  We will be seeing these topics later in the course
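as a closing illustration of polled output, a rough sketch in the spirit of
xv6's uartputc_sync() -- uartputc_poll() is a name made up here, and the real
xv6 code also disables interrupts with push_off() and checks for panic

    #define UART0 0x10000000L
    #define Reg(reg) ((volatile unsigned char *)(UART0 + (reg)))
    #define ReadReg(reg)     (*(Reg(reg)))
    #define WriteReg(reg, v) (*(Reg(reg)) = (v))
    #define THR 0                    // transmit holding register (write side of offset 0)
    #define LSR 5                    // line status register
    #define LSR_TX_IDLE (1 << 5)     // LSR bit 5: THR empty, UART can accept a byte

    void
    uartputc_poll(int c)
    {
      // spin until the UART can take another byte -- no interrupt needed
      while((ReadReg(LSR) & LSR_TX_IDLE) == 0)
        ;
      WriteReg(THR, c);              // hand the byte to the tx FIFO
    }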