Tue Mar 30: Eliminating Receive Livelock in an Interrupt-driven Kernel, 1995, USENIX.
Suppose we have an NFS client running the following code:
while (1) {
    send NFS_READ RPC;
    wait for response;
}
Is the NFS server likely to livelock? What happens as we increase the number
of clients running this loop?
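One way to make the question concrete is a toy simulation (my own sketch, not from the paper): each client is closed-loop, so it never has more than one RPC outstanding, which bounds the server's backlog by the number of clients.

```python
# Toy closed-loop NFS clients (illustrative sketch only; names are made up).
# Each client sends one NFS_READ, then blocks until the reply arrives,
# so the server never sees more than NUM_CLIENTS pending requests.
import threading, queue

server_queue = queue.Queue()   # pending RPCs at the "server"
outstanding = 0                # requests currently in flight
peak = 0                       # high-water mark of outstanding requests
lock = threading.Lock()

def client(n_requests):
    global outstanding, peak
    for _ in range(n_requests):
        with lock:
            outstanding += 1
            peak = max(peak, outstanding)
        reply = queue.Queue(maxsize=1)
        server_queue.put(reply)        # send NFS_READ RPC
        reply.get()                    # wait for response
        with lock:
            outstanding -= 1

def server():
    while True:
        reply = server_queue.get()
        if reply is None:              # shutdown sentinel
            return
        reply.put("data")              # service the read

NUM_CLIENTS = 8
srv = threading.Thread(target=server)
srv.start()
clients = [threading.Thread(target=client, args=(50,)) for _ in range(NUM_CLIENTS)]
for c in clients:
    c.start()
for c in clients:
    c.join()
server_queue.put(None)
srv.join()
print("peak outstanding RPCs:", peak)
```

Because the clients self-pace, adding clients raises the offered load only linearly, and each client's next request is gated on the server making progress.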
Thu Apr 1: Hypervisor-based
Fault Tolerance, SOSP 1995.
Suppose the primary fails due to a power failure, the backup takes
over, and then power to the primary is restored. You would like the
primary to resume replicated execution so that the system could
tolerate a subsequent failure of the backup. Is this likely to
be easy with hypervisor-based fault-tolerance? Why or why not?
Tue Apr 6: Hive: Fault Containment for
Shared-Memory Multiprocessors, SOSP 1995.
The Hive firewall hardware on a node selectively protects that node's
physical memory from writes by other nodes. The Hive designers could
instead have built firewall hardware associated with each node CPU
that controlled which areas of memory (on other nodes) the node was
allowed to write. Why do you suppose the Hive designers chose to have
the firewall act on incoming writes to memory rather than on outgoing
writes from each CPU?
Thu Apr 8: TreadMarks: Distributed
Shared Memory on Standard Workstations and Operating Systems, USENIX 1994.
Suppose that a simplified version of TreadMarks, called Dreadmarks,
simply sent all modifications of variables between an acquire and a
release to the next processor to acquire the same lock. Dreadmarks
doesn't otherwise communicate any changes to memory. Outline a
specific simple situation in which TreadMarks would provide more
useful or intuitive memory behavior than Dreadmarks.
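As a starting point for thinking about the question, here is a toy model (my own construction, not from either paper) of the Dreadmarks rule: only the writes made inside the previous holder's critical section travel with the lock. The processor names and scenario are hypothetical.

```python
# Toy model of Dreadmarks lock-transfer semantics (illustrative only).
class Lock:
    def __init__(self):
        self.pending = {}      # writes shipped to the next acquirer

class Processor:
    def __init__(self, name):
        self.name = name
        self.mem = {}          # this processor's view of shared memory
        self.writes = {}       # writes made in the current critical section

    def acquire(self, lock):
        # Dreadmarks: apply only the previous holder's critical-section writes.
        self.mem.update(lock.pending)
        self.writes = {}

    def write(self, var, val):
        self.mem[var] = val
        self.writes[var] = val

    def release(self, lock):
        lock.pending = dict(self.writes)  # only *this* section's writes travel

L = Lock()
p1, p2, p3 = Processor("P1"), Processor("P2"), Processor("P3")

p1.acquire(L); p1.write("x", 1); p1.release(L)
p2.acquire(L); p2.release(L)     # P2 sees x=1 but writes nothing
p3.acquire(L)                    # what does P3 see?

print("P2 sees x =", p2.mem.get("x"))
print("P3 sees x =", p3.mem.get("x"))
```

Tracing what each processor observes under this rule, versus what a programmer passing a lock around would expect, is one way to approach the question.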
Tue Apr 13: Optimistic Replication Using
Vector Time Pairs.
The paper's Section 3.3 says that replicas need not keep deletion
records. Suppose that replica R1 creates file F, then synchronizes
with R2, then R1 deletes its copy of the file and discards the
deletion notice, then R2 modifies its copy of the file, and then R1
and R2 synchronize again. What reasoning will R1 use to conclude that
there is a conflict?
Thu Apr 15: Group Communication
in Amoeba and its Applications, 1993.
On page 6 the paper says that node failure events are totally
ordered along with the ordinary multicast messages. The Bullet server
probably relies on this fact: how?
Thu Apr 22: Recovery Management in QuickSilver, 1988.
The paper seems long but that's mostly because of the way it's typeset.
You can skip Section 4.3 and Section 5.
Section 4.1 says that the one-phase commit protocol is used by servers that
maintain only volatile state. What additional action(s) must servers
with non-volatile (persistent) state take that require them to
participate in two-phase commit?
Tue Apr 27: Frangipani: A Scalable Distributed File System, 1997.
Suppose a server modifies an i-node and appends the modification to
its log; then another server modifies the same i-node; then the first
server crashes, and the recovery system must decide whether to replay
the first server's log entry. Replaying it would clearly be bad, since
it would undo the second change. Can this situation arise? What does
Frangipani do to deal with it?
Tue May 4: Freenet:
A Distributed Anonymous Information Retrieval System, 2000.
The second paragraph of Section 3.5 describes a complex mechanism to
choose the key under which a new node is stored in other nodes'
routing tables. Why can't the new node just choose a random key itself
and tell the other nodes that key? What might go wrong if this were
the way that Freenet worked?
Thu May 6: Democratizing
content publication with Coral, 2004.
The paper mentions (Sections 2.3 and 4.2) that Coral's "sloppy"
DHT can store multiple different values for each key. What would
break, or not work as well, if Coral's DHT only stored one
value for each key?
Tue May 11: Untangling
the Web from DNS, 2004.
Section 3.3 says that the SFRTag is a hash of a public key and a
salt. What is the salt for? What would go wrong, or be awkward,
if the SFRTag were the hash of a public key alone?
Thu May 13: Hybrid Global-Local
Indexing for Efficient Peer-to-peer Information Retrieval, 2004.
Suppose the Web contains 3 billion pages, and the average page
contains 1000 words. Roughly how much total storage would eSearch
require to hold an index of the Web? Assuming hosts with 100 GB disks,
how many Engine nodes would eSearch need to hold this data?
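The arithmetic can be organized as a back-of-the-envelope template. The page count, words per page, and disk size come from the question; the bytes-per-term and replication-factor parameters below are assumptions for illustration, not the paper's numbers.

```python
# Back-of-the-envelope index sizing (parameter values are assumptions).
PAGES = 3_000_000_000          # 3 billion pages, from the question
WORDS_PER_PAGE = 1000          # from the question
BYTES_PER_TERM = 8             # assumed: average bytes to store one term
REPLICATION = 20               # assumed: copies kept of each page's term list
DISK_BYTES = 100 * 10**9       # 100 GB disks, from the question

total_bytes = PAGES * WORDS_PER_PAGE * BYTES_PER_TERM * REPLICATION
nodes = total_bytes / DISK_BYTES
print(f"total index: {total_bytes / 10**12:.0f} TB, nodes needed: {nodes:,.0f}")
```

Plugging in the paper's actual per-term cost and replication factor in place of the assumed values gives the answer the question is after.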