Required reading: An emperical study of oeprating systems errors
Operating systems must obey many rules for correctness and performance. Examples rules:
In addition, there are standard software engineering rules, like use function results in consistent ways.
These rules are typically not checked by a compiler, even though they could be checked by a compiler, in principle. The goal of the meta-level compilation project is to allow system implementors to write system-specific compiler extensions that check the source code for rule violations.
The results are dramatic: many new bugs found (500-1000) in Linux alone. The paper for today studies these bugs and attempts to draw lessons from these bugs.
A system programmer writes the rule checkers in a high-level, state-machine language (metal). These checkers are dynamically linked into an extensible version of g++, xg++. Xg++ applies the rule checkers to every possible execution path of a function that is being compiled.
An example rule from the OSDI paper:
sm check_interrupts { decl { unsigned} flags; pat enable = { sti(); } | {restore_flags(flags);} ; pat disable = { cli(); }; is_enabled: disable ==> is_disabled | enable ==> { err("double enable")}; ...A more complete version found 82 errors in the Linux 2.3.99 kernel. Common mistake:
get_free_buffer ( ... ) { .... save_flags (flags); cli (); if ((bh = sh->buffer_pool) == NULL) return NULL; .... }
Some checkers produce false positives, because of limitations of both static analysis and the checkers, which mostly use local analysis.
How does the block checker work? The first pass is a rule that marks functions as potentially blocking. After processing a function, the checker emits the function's flow graph to a file (including, annotations and functions called). The second pass takes the merged flow graph of all function calls, and produces a file with all functions that have a path in the control-flow-graph to a blocking function call. For the Linux kernel this results in 3,000 functions that potentially could call sleep. Yet another checker like check_interrupts checks if a function calls any of the 3,000 functions with interrupts disabled. Etc.
The paper is primarily about the block, null, and var checker. The primary conclusions of the paper are:
Methodology
This paper is best discussed by studying every figure.