6.824 Spring 2013: Final project
In the second half of the course you'll work on a distributed systems
project of your choice in teams of two or three. The goal is to have
fun and to explore more advanced topics; you don't have to do
novel research.
We'll grade you on how much you got working, how elegant your design
is, how well you can explain it, and how interesting and creative your
solution is. Time is limited, so try to make sure
your goals are reasonable; perhaps set a minimum goal that's
definitely achievable and a more ambitious goal if things go well.
Schedule and Deliverables
- April 5: Form groups of 2 to 3. Submit a project
proposal, just a paragraph or two. You can submit this via the submissions site, with a file called proj-proposal.{pdf|txt}. Only one person from your group needs to submit, but the proposal should contain each group member's name.
- Week of April 8: first project conference with 6.824 staff --
we'll schedule a meeting with each team to discuss your proposal.
- Week of May 1: second project conference.
- May 10: Submit a two-page write-up, via the submission site, filename proj-final.{pdf|txt}
- Week of May 13: final conference with 6.824 staff, with demonstration.
- Week of May 13: short in-class demonstration.
Ideas
Here's a list of ideas to get you started thinking -- but you should
feel free to pursue your own ideas.
- Measure the performance of Lab 4, identify the bottlenecks,
and modify it to make it faster.
- Extend Lab 4 so that it performs well with large amounts of
data (e.g. hundreds of gigabytes per server), particularly
when servers temporarily or permanently fail. Have a look at
RAMCloud, Flat Datacenter Storage, Petal, and FAB for ideas.
- Make Lab 4 (or some other design) work well when the replicas are
separated by a wide-area network (e.g. hundreds or thousands of miles
apart).
- Build a fault-tolerant replicated system in the style of
Hypervisor Fault Tolerance, using a modern virtual machine.
- Build a general-purpose library for state-machine replication.
- Extend Lab 4 so that it stores data (and Paxos state)
persistently on disk, for fast recovery after reboots.
- Build a fault-tolerant file server, perhaps using FUSE.
- Build a secure distributed system using x86 Trusted Execution
Technology (TXT).
- Add atomic transactions to Lab 4, using two-phase commit
and/or snapshots.
- Build a system with asynchronous replication (like Dynamo or
Ficus or Bayou). Perhaps add stronger consistency (as in COPS
or Walter).
- Build a file synchronizer.
- Build a distributed shared memory (DSM) system, so that you can run
multi-threaded shared memory parallel programs on a cluster of
machines, using paging to give the appearance of real shared memory.
When a thread tries to access a page that's on another machine, the
page fault will give the DSM system a chance to fetch the page over
the network from whatever machine currently stores it.
- Build a distributed RAID in the style of Petal, or a voting-based
storage system like FAB. Maybe you can get standard operating systems
to talk to your network virtual disk using iSCSI.
- Build a coherent caching system for use by web sites (a bit
like memcached).
- Build a distributed cooperative web cache.
- Build a better tracker for BitTorrent.
- Build a collaborative editor like EtherPad.
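
To make the DSM idea above more concrete, here is a minimal
single-process sketch in Go of the page-fetch step. It is only an
illustration under simplifying assumptions, not a suggested design:
every name in it (DSM, Node, ensure, Faults, Holds) is hypothetical,
the "network" is a direct in-memory handoff, one global lock stands in
for a real coherence protocol, and each page has exactly one holder at
a time. A real system would catch hardware page faults and fetch pages
via RPC.

```go
package main

import "sync"

// PageSize is an assumed page size for this sketch.
const PageSize = 4096

// DSM tracks which node currently stores each page.
// A single mutex serializes all accesses; this is a simplification.
type DSM struct {
	mu    sync.Mutex
	owner map[int]*Node
}

// Node is one simulated machine with its locally resident pages.
type Node struct {
	Name   string
	Faults int // number of simulated page faults taken by this node
	dsm    *DSM
	pages  map[int][]byte
}

func NewDSM() *DSM {
	return &DSM{owner: make(map[int]*Node)}
}

func (d *DSM) NewNode(name string) *Node {
	return &Node{Name: name, dsm: d, pages: make(map[int][]byte)}
}

// ensure returns the local copy of a page, simulating a page fault
// when the page is not resident. Caller must hold dsm.mu.
func (n *Node) ensure(page int) []byte {
	if data, ok := n.pages[page]; ok {
		return data
	}
	n.Faults++
	var data []byte
	if prev, ok := n.dsm.owner[page]; ok {
		// "Fetch over the network": move the page from the node that
		// currently stores it, so only one node holds it at a time.
		data = prev.pages[page]
		delete(prev.pages, page)
	} else {
		// First touch anywhere: start from a zero-filled page.
		data = make([]byte, PageSize)
	}
	n.dsm.owner[page] = n
	n.pages[page] = data
	return data
}

// Write stores one byte at a global address.
func (n *Node) Write(addr int, b byte) {
	n.dsm.mu.Lock()
	defer n.dsm.mu.Unlock()
	n.ensure(addr / PageSize)[addr%PageSize] = b
}

// Read loads one byte from a global address.
func (n *Node) Read(addr int) byte {
	n.dsm.mu.Lock()
	defer n.dsm.mu.Unlock()
	return n.ensure(addr / PageSize)[addr%PageSize]
}

// Holds reports whether the page is currently resident on this node.
func (n *Node) Holds(page int) bool {
	n.dsm.mu.Lock()
	defer n.dsm.mu.Unlock()
	_, ok := n.pages[page]
	return ok
}
```

For example, if node a writes a byte at address 5000 and node b then
reads address 5000, b takes a simulated fault, the page moves from a
to b, and b sees the value a wrote. Moving the page rather than
copying it keeps the sketch trivially consistent; letting several
nodes hold read-only copies would require an invalidation step that
is not shown here.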