Coordination

Required reading: Fast mutual exclusion for uniprocessors

Overview

We revisit the topic of mutual-exclusion coordination: the techniques used to protect variables that are shared among multiple threads. These techniques allow programmers to implement atomic sections, so that one thread can safely update the shared variables without having to worry that another thread will intervene.

We focus today on uniprocessors. Even on a uniprocessor one has to provide mutual-exclusion coordination, because the scheduler might switch to another thread in response to a hardware interrupt (e.g., the end of a time slice).

The technique for mutual-exclusion coordination used in v6 is disabling and enabling interrupts in the kernel. In v6, the only program with multiple threads is the kernel, so it is the only program that needs a mutual-exclusion mechanism. User-level processes don't share data directly.

As we discussed last lecture, microkernels force server programs to deal with many of the same issues as monolithic kernels, and they therefore require concurrent handling of events. This organizational requirement raises the question: how do we handle multiple events concurrently at user level? There are three options:

- Use a thread package that doesn't preempt threads. The scheduler then always resumes the thread that was running before an interrupt, unless that thread released the processor explicitly. A number of systems use this approach, but it has its own challenges: if a thread calls printf in a section of code that must be executed atomically, how do I know that printf doesn't release the processor?
- Use a thread package that provides primitives to mark critical sections (e.g., acquire and release a lock). These primitives must themselves be atomic, which can be achieved with, for example, a hardware test-and-set-lock (TSL) instruction.
- Use events. Structure the program as a single loop that processes events (e.g., network packet arrival, disk read completion). Each event is handled atomically.

To implement any of these three options, we need to avoid blocking system calls: the kernel must support asynchronous system calls or scheduler activations.
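The event-driven option can be sketched in a few lines of C. This is a minimal illustration, not code from any particular system: the event types, the handler, and the array that stands in for the event source are all hypothetical. The point is that handlers run one at a time and each runs to completion, so shared state needs no locks.

```c
/* Hypothetical event types; a real server would learn about these from the
   kernel through asynchronous system calls. */
enum event_type { EV_PACKET, EV_DISK_DONE };

struct event {
  enum event_type type;
  int arg;                 /* event-specific data (unused in this sketch) */
};

/* Shared state updated by handlers. No locking: only one handler runs at a
   time, and it is never interrupted by another handler. */
static int packets_seen = 0;
static int disk_ops_done = 0;

static void handle_event(const struct event *ev) {
  switch (ev->type) {
  case EV_PACKET:    packets_seen++;  break;
  case EV_DISK_DONE: disk_ops_done++; break;
  }
}

/* The single loop: take the next event and handle it to completion.
   Events come from an array here; a real loop would wait for the next one. */
static void event_loop(const struct event *queue, int n) {
  for (int i = 0; i < n; i++)
    handle_event(&queue[i]);
}
```

For example, feeding the loop two packet events and one disk-completion event leaves packets_seen at 2 and disk_ops_done at 1.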

Can we use a thread package that uses the v6 kernel approach? That is, can we allow a user-level thread to disable and reenable interrupts? (Answer: no, not if we care about fault isolation.)

List and insert example:


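struct List {
  int data;
  struct List *next;
};

List *list = 0;             /* shared by all threads */

void insert(int data) {
  List *l = new List;
  l->data = data;
  l->next = list;           /* read the shared head... */
  list = l;                 /* ...then write it: the pair is not atomic */
}

void fn() {
  insert(100);
}

int main() {
  thread_create(..., fn);   /* thread package call; arguments elided */
  thread_create(..., fn);
  thread_schedule();
}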
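To see why insert needs protection, the following C sketch replays by hand one bad interleaving of the two threads above: T1 is preempted by a timer interrupt after reading list but before writing it, and T2 runs its whole insert in between. It uses malloc instead of new to stay in plain C; the function names are illustrative only.

```c
#include <stdlib.h>

struct List {
  int data;
  struct List *next;
};

static struct List *list = NULL;

static int length(const struct List *l) {
  int n = 0;
  for (; l != NULL; l = l->next)
    n++;
  return n;
}

/* Both threads call insert(100); T1 is preempted mid-insert. */
static void bad_interleaving(void) {
  /* T1: allocate, fill in, and read the current head (NULL) */
  struct List *l1 = malloc(sizeof *l1);
  l1->data = 100;
  l1->next = list;

  /* timer interrupt: the scheduler runs T2, which completes insert() */
  struct List *l2 = malloc(sizeof *l2);
  l2->data = 100;
  l2->next = list;
  list = l2;

  /* T1 resumes and finishes: this overwrites T2's update, losing l2 */
  list = l1;
}
```

After bad_interleaving() runs, two inserts have completed but the list contains only one element.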

The paper

The paper addresses how to provide mutual-exclusion primitives that are implemented completely in software. The motivation is that some uniprocessors don't provide hardware atomic instructions. It turns out, however, that the approach advocated in the paper is a good one in general, because on a uniprocessor a software implementation can be cheaper than the processor's hardware atomic instructions.

The paper describes several ways to implement test-and-set in software:

Emulation. The kernel provides a TSL system call, which is implemented atomically by, for example, disabling interrupts at the beginning of the system call and reenabling them at the end.
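A minimal sketch of what the kernel side of such a TSL system call might look like. The names sys_tsl, intr_off, and intr_on are hypothetical; here the interrupt calls only flip a flag so the code can run in user space, whereas a real kernel would execute privileged instructions (cli/sti on x86) and would also have to validate the user-supplied pointer.

```c
/* Stand-ins for the kernel's interrupt control. */
static int intr_enabled = 1;

static void intr_off(void) { intr_enabled = 0; }
static void intr_on(void)  { intr_enabled = 1; }

/* Hypothetical TSL system call: return the old value of *lock and set it
   to 1. With interrupts off, no timer interrupt can cause a reschedule, so
   on a uniprocessor the read and the write happen as one atomic step. */
int sys_tsl(int *lock) {
  intr_off();
  int old = *lock;
  *lock = 1;
  intr_on();
  return old;
}
```

The drawback is cost: every lock acquisition pays for a trap into the kernel.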

Mutual exclusion algorithm. Use an algorithm that can provide TSL completely in software. These algorithms require many instructions. Here is a simpler version of the one in the paper:

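volatile bool flag[N];    /* flag[i]: thread i wants to enter; N = number of threads */

bool is_flagged(int me);

/* Test-and-set: return the old value of *L and set *L to 1 (locked).
   me is the calling thread's index, assumed to be supplied by the thread package. */
int TSL(int *L) {
  while (true) {
    flag[me] = true;            /* announce that we want in */
    if (!is_flagged(me)) {      /* no other thread is trying: we are alone */
      int old = *L;
      *L = 1;
      flag[me] = false;
      return old;
    }
    flag[me] = false;           /* contention: back off and retry */
  }
}

/* Is any thread other than me trying to enter? */
bool is_flagged(int me) {
  for (int i = 0; i < N; i++) {
    if (i != me && flag[i]) return true;
  }
  return false;
}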

Insert with TSL and busy waiting:

int insert_lock = 0;

void insert(int data) {

  /* acquire the lock: TSL returns the old value, so spin until it was 0 (free) */
  while (TSL(&insert_lock) != 0)
    ;

  /* critical section: */
  List *l = new List;
  l->data = data;
  l->next = list;
  list = l;

  /* release the lock: */
  insert_lock = 0;
}

Restartable atomic sequences. On a uniprocessor, few atomic sections will be interrupted, because sections are short and they are interrupted only when the scheduler decides to switch threads. Thus, we optimistically assume that a thread won't be interrupted in an atomic section. When a thread is rescheduled, the system checks whether it was preempted inside an atomic section; if so, the thread re-executes the whole atomic section from the beginning. We need to structure our code so that atomic sections are restartable; this may require some code reorganization.
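One way the restart check could look, assuming each sequence's address range has been registered with the kernel. The struct ras type and ras_fixup_pc function are hypothetical; a real kernel would apply this to the PC saved in the preempted thread's trap frame. The range ends just past the final, committing store, so a PC at or beyond that point means the sequence has finished.

```c
#include <stddef.h>
#include <stdint.h>

/* A registered restartable atomic sequence: [start, end) covers every
   instruction, and end is the address just past the committing store. */
struct ras {
  uintptr_t start;
  uintptr_t end;
};

/* Before resuming a preempted thread: if its saved PC lies inside a
   sequence, the commit has not happened, so roll the PC back to the start.
   Otherwise resume the thread where it stopped. */
uintptr_t ras_fixup_pc(const struct ras *seqs, size_t n, uintptr_t pc) {
  for (size_t i = 0; i < n; i++) {
    if (pc >= seqs[i].start && pc < seqs[i].end)
      return seqs[i].start;
  }
  return pc;
}
```

Because the common case (not preempted inside a sequence) executes only ordinary loads and stores, the atomic section itself costs nothing extra; all the work happens on the rare context switch.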

Paper discussion

Figure 3.
- What if context switch between lines 5 and 6?
- What if context switch between lines 6 and 7? (better not restart)
- What if context switch between lines 4 and 5? (failure?)
There's something wrong here: the kernel has to be able to decide precisely whether the sequence is done.
Figure 4.
- Is line 4 executed?
- What if interrupted before 3? (restart, re-do reads, no writes yet)
- What if interrupted between 3 and 4? (not possible)
- What if interrupted after 3/4? (already done)
The kernel can tell precisely whether the sequence has completed: has line 3 executed?
Implementation. There are two implementations: (1) register the addresses of each atomic sequence with the kernel; and (2) designated sequences, which the kernel recognizes from the code itself.
How general are RAS? The paper uses them for one particular atomic sequence. Could we use a RAS directly in insert()? (Answer: yes.)
```
void insert(int data) {
  List *l = new List;
  l->data = data;
  BEGIN_RAS
    l->next = list;   /* read the shared head (l is still private) */
    list = l;         /* single shared write: the commit point */
  END_RAS
}
```
This implementation of insert is strictly better than the one using TSL. The TSL insert is blocking: if T1 is preempted while executing insert and control passes to T2, T2 cannot execute insert without waiting for T1. The RAS version doesn't block T2. Such implementations are called wait-free.
Can we use a RAS directly for any atomic sequence? (No: what if the sequence contains two writes?) Example: the x86 provides an atomic exchange instruction; a RAS version that exchanges two memory words is:
```
void xchg_RAS (int *p1, int *p2) {
  BEGIN_RAS
    int tmp = *p1;
    *p1 = *p2;
    *p2 = tmp;
  END_RAS
}
```
Find a sequence of events such that xchg_RAS doesn't work. Assume:
```
a = 1;
b = 2;
xchg_RAS (&a, &b);
```
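One possible answer, replayed as straight-line C: suppose the thread is preempted right after the first write (*p1 = *p2) and the kernel restarts the sequence at BEGIN_RAS. The function name is hypothetical, and the two attempts are inlined by hand rather than relying on real RAS support. No second thread is even needed; the restart alone breaks the exchange, because the first write is not undone.

```c
/* xchg_RAS(&a, &b) with a = 1, b = 2, restarted after the first write. */
static void xchg_restarted_after_first_write(int *p1, int *p2) {
  /* first attempt */
  int tmp = *p1;     /* tmp = 1 */
  *p1 = *p2;         /* a = 2; this write survives the restart */

  /* preempted here; the kernel restarts the sequence from BEGIN_RAS */
  tmp = *p1;         /* tmp = 2: re-reads the already-modified a */
  *p1 = *p2;         /* a = 2 */
  *p2 = tmp;         /* b = 2: the original value 1 is lost */
}
```

The result is a = 2, b = 2 instead of a = 2, b = 1, which is why a RAS should contain at most one write to shared memory, and only at its end.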

Using TSL one cannot build a wait-free implementation of insert, but using cmpxchg (compare-and-exchange) we can:

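/* Semantics of the hardware instruction, shown for List pointers:
   atomically, if *addr == v1, set *addr = v2 and return 1; else return 0. */
int cmpxchg(List **addr, List *v1, List *v2) {
  int ret = 0;
  // stop all memory activity and ignore interrupts
  if (*addr == v1) {
    *addr = v2;
    ret = 1;
  }
  // resume other memory activity and take interrupts
  return ret;
}

void insert(int data) {
  List *n = new List;
  n->data = data;
  do {
     n->next = list;                            /* snapshot the head */
  } while (cmpxchg(&list, n->next, n) == 0);    /* head changed: retry */
}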
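For comparison, here is the same insert written against a real API. This sketch swaps in C11 `<stdatomic.h>`, whose atomic_compare_exchange_weak plays the role of the cmpxchg above, and uses malloc instead of new; it illustrates the technique rather than reproducing code from the paper.

```c
#include <stdatomic.h>
#include <stdlib.h>

struct List {
  int data;
  struct List *next;
};

/* Shared head; _Atomic so compare-and-exchange can operate on it. */
static _Atomic(struct List *) list = NULL;

/* Publish the new node only if the head is still the value we linked it
   to; otherwise another thread got in first, so try again. */
void insert(int data) {
  struct List *n = malloc(sizeof *n);
  if (n == NULL)
    abort();
  n->data = data;
  n->next = atomic_load(&list);
  /* On failure, the call stores the current head into n->next,
     so the retry needs no separate reload. */
  while (!atomic_compare_exchange_weak(&list, &n->next, n))
    ;
}
```

The weak form may fail spuriously even when the head is unchanged; the loop simply retries, which is why the weak form is the usual choice inside a retry loop.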

Can we implement cmpxchg as RAS? (Answer: yes.)
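A sketch of how cmpxchg could be written as a RAS, following the rule from the xchg example: everything before the end only reads shared memory, and the single store to shared memory comes last. BEGIN_RAS and END_RAS are hypothetical markers, defined as empty macros here only so the code compiles; in practice the sequence would be written in assembly so the compiler cannot reorder or split the store.

```c
/* Hypothetical RAS markers (no-ops here). */
#define BEGIN_RAS
#define END_RAS

/* If *addr == v1, set *addr = v2 and return 1; otherwise return 0. */
int cmpxchg_ras(int *addr, int v1, int v2) {
  int ok;
  BEGIN_RAS
    ok = (*addr == v1);   /* read-only work: safe to redo on restart */
    if (ok)
      *addr = v2;         /* the only shared-memory write: the commit point */
  END_RAS
  return ok;
}
```

Note that ok is computed inside the sequence: if it were set before BEGIN_RAS, a restart could leave a stale value from the first attempt.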

Figure 6. Should we use RAS in the lab? If so, where would it be useful?