  ----------------     -------------
  |              |     |           |
  |   App   yfs--|-----|extent srvr|----- yfs on other hosts
  |    |     |   |  |  |           |  |
  |--------------|  |  -------------  |
  |    |     |   |  |  -------------  |
  |    Kernel    |  |  |           |  |
  |  FUSE module |  ---| lock srvr |---
  |              |     |           |
  ----------------     -------------
We provide you with skeleton code for both the yfs and extent server modules above. Your job in this lab is to design and implement directories in YFS; to do this, you'll need to fill out the provided modules with code that implements the CREATE/MKNOD, LOOKUP, and READDIR operations in FUSE.
The yfs module actually consists of three separate pieces:
The code for the extent server module lies in extent_server.cc, extent_server.h, and extent_smain.cc. The YFS server communicates with the extent server using the RPC protocol defined in extent_protocol.h. The extent server simply stores entire files as strings, without interpreting the contents of those strings. It also stores information about the attributes of files.
You can learn more about the FUSE file system here. In particular, it may be useful to study FUSE's lowlevel interface, which is documented almost exclusively in its header file, fuse_lowlevel.h.
% cd yfs
% wget -nc http://pdos.csail.mit.edu.ezproxy.canberra.edu.au/6.824/labs/yfs-lab2.tgz
% tar xzvf yfs-lab2.tgz
% cd l2
% cp ../l1/*{cc,h} .
% make

To use YFS in this lab, you'll need to run three separate commands. The start.sh script will run these commands for you automatically (and the stop.sh script will kill them for you), but we explain them here for clarity. First you need to start an extent server on one of the class machines. You'll need to choose a port number that other students aren't using. If, for example, you choose to run the extent server on your host on port 3772, you should type this:
% cd yfs/l2
% ./extent_server 3772 &
At this point you can start up the YFS server, called yfs_server. This process needs three parameters: a port number to listen on that needs to be unique on the machine; the port number for the extent server that you assigned in the previous command; and the port number for the lock server, which is not used in this lab. So, to start the YFS server on port 3782, you should type this:
% cd yfs/l2
% ./yfs_server 3782 3772 3762 &
Finally, you can mount the YFS file system. You do this by starting up the fuse2yfs process, giving it a unique mountpoint that other students aren't using, and the port number of the yfs_server that you assigned in the previous command. The mountpoint must be an empty directory that already exists. So, if you want to mount YFS under the current working directory as a subdirectory called "yfs", you would run the following command.
% cd yfs/l2
% mkdir yfs
% ./fuse2yfs ./yfs 3782 &
Again, you can use start.sh to do all three of these steps for you. It mounts YFS under the "yfs1" mountpoint.
% cd ~/lab-2
% ./start.sh
% ./test-lab-2.pl ./yfs1
% ./stop.sh
The skeleton code implements only the GETATTR and STATFS operations, so the file system you just mounted will not be useful to you yet. However, once you finish this lab, you should be able to run the Lab 2 tests successfully; these test creating empty files, looking up names in a directory, and listing directory contents. Note: testing this lab on the command line using commands like touch will not work until you implement the SETATTR operation, which is not required until the next lab. For now, you should do your testing via the creat/open, lookup, and readdir system calls in a language like Perl, or simply use the provided test script.
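As a rough illustration of testing through system calls rather than touch, the sketch below exercises the three operations from C++ using plain POSIX calls. The helper name create_lookup_readdir_ok is hypothetical and not part of the lab code; the provided test-lab-2.pl remains the authoritative check.

```cpp
#include <dirent.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

#include <string>

// Hypothetical helper: creates an empty file `name` inside `dir`, then
// checks that it can be found by path (a lookup) and that it shows up
// in a directory listing (readdir). Returns true only if all steps succeed.
bool create_lookup_readdir_ok(const std::string &dir, const std::string &name) {
  const std::string path = dir + "/" + name;

  // Step 1: create an empty file. Per the handout, on Linux FUSE sees
  // file creation as a MKNOD operation.
  int fd = open(path.c_str(), O_CREAT | O_WRONLY, 0644);
  if (fd < 0) return false;
  close(fd);

  // Step 2: resolving the path requires looking up `name` in `dir`.
  struct stat st;
  if (stat(path.c_str(), &st) != 0) return false;

  // Step 3: list the directory and search for the new name.
  DIR *d = opendir(dir.c_str());
  if (d == nullptr) return false;
  bool found = false;
  for (dirent *e = readdir(d); e != nullptr; e = readdir(d)) {
    if (name == e->d_name) {
      found = true;
      break;
    }
  }
  closedir(d);
  return found;
}
```

Calling create_lookup_readdir_ok("./yfs1", "file-a") against a mounted YFS should return true once the three operations work; against the unmodified skeleton the first step is expected to fail.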
When using FUSE on Linux, as in the official class programming environment (see here), files are created via the MKNOD operation. On other operating systems, FUSE uses the CREATE operation. You are encouraged, but not required, to structure your code such that either operation will work. However, we will only be testing your code on Linux, which means the MKNOD operation must work.
If your server passes the tester in the class programming environment (see below), then you are done. If you have questions about whether you have to implement specific pieces of file system functionality, then you should be guided by the tester: if you can pass the tests without implementing something, then don't bother implementing it. For example, you don't need to implement the exclusive create semantics of the CREATE/MKNOD operation.
You may modify or add any files you like, other than the tester script.
pain% ./test-lab-2.pl ./yfs1
create file-yyuvjztagkprvmxjnzrbczmvmfhtyxhwloulhggy-18674-0
create file-hcmaxnljdgbpirprwtuxobeforippbndpjtcxywf-18674-1
...
Passed all tests!

The tester creates lots of files with names like file-XXX-YYY-Z and checks that they appear in directory listings.
If test-lab-2.pl exits without printing "Passed all tests!", then it thinks something is wrong with your file server. For example, if you run test-lab-2.pl on the skeleton code we give you, you'll probably see an error message like this:
test-lab-2: cannot create /tmp/b/file-ddscdywqxzozdoabhztxexkvpaazvtmrmmvcoayp-21501-0 : Function not implemented
This error message appears because you have not yet assigned a method to handle the CREATE/MKNOD operation with FUSE. See the main() method in fuse.cc for examples on how to make this assignment.
The goal of the extent server is to provide a centralized storage location for all the data representing your distributed filesystem, much like a hard disk would. In later labs you will be serving the same file system contents on multiple hosts, each with its own YFS server. The only way they can share data is by reading and writing the extent server.
The extent server stores key/value pairs, with writes limited to a maximum size of 8MB. Both keys and values are byte arrays; the extent server should not interpret them. The values should be the entire contents of a particular file or directory. The keys can be whatever you like, though we recommend using the same key that you return to FUSE as the file's inumber. The extent server should support put(key,value), get(key), getattr(key), and remove(key) RPCs.
The extent server is also responsible for serving the attributes of each file. These consist of the file size, last modification time (mtime), change time (ctime), and last access time (atime). Tracking this data in the extent server should be straightforward in the handlers for the put(key,value) and get(key) RPCs. Wikipedia has a succinct description of when these three times should be updated.
For this lab, it is ok for the extent server to be somewhat simplistic and only store data in memory; this means that if you restart it, all the data previously stored will be lost. However, we may change this requirement for future labs.
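A minimal sketch of such an in-memory store follows, to make the put/get/getattr/remove behaviour concrete. The names toy_extent_store, extent_id, and extent_attr are hypothetical stand-ins; the real types and RPC handler signatures are defined in extent_protocol.h and extent_server.h and may differ. The choice of which timestamps each call updates is also an assumption.

```cpp
#include <ctime>
#include <map>
#include <string>

// Hypothetical stand-ins for the lab's own key and attribute types.
typedef unsigned long long extent_id;
struct extent_attr {
  unsigned int atime = 0;
  unsigned int mtime = 0;
  unsigned int ctime = 0;
  unsigned int size = 0;
};

// Toy in-memory key/value store; all contents are lost on restart.
class toy_extent_store {
  struct entry {
    std::string data;
    extent_attr attr;
  };
  std::map<extent_id, entry> table_;

  static unsigned int now() {
    return static_cast<unsigned int>(std::time(nullptr));
  }

 public:
  // Stores the whole value under `id`; assumed to update size, mtime, ctime.
  void put(extent_id id, const std::string &value) {
    entry &e = table_[id];
    e.data = value;
    e.attr.size = static_cast<unsigned int>(value.size());
    e.attr.mtime = e.attr.ctime = now();
  }

  // Copies the value into `out`; assumed to update atime. False if absent.
  bool get(extent_id id, std::string &out) {
    std::map<extent_id, entry>::iterator it = table_.find(id);
    if (it == table_.end()) return false;
    it->second.attr.atime = now();
    out = it->second.data;
    return true;
  }

  // Copies the attributes into `out`. False if absent.
  bool getattr(extent_id id, extent_attr &out) const {
    std::map<extent_id, entry>::const_iterator it = table_.find(id);
    if (it == table_.end()) return false;
    out = it->second.attr;
    return true;
  }

  // Deletes the entry. False if there was nothing to delete.
  bool remove(extent_id id) { return table_.erase(id) > 0; }
};
```

The value is kept as an opaque std::string, so the store never needs to know whether it holds a file or a directory.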
In this lab you must choose the format for file and directory meta-data. Meta-data includes per-file information (for example file length) and directory contents. In future labs you'll have to choose a format in which to store each file's contents.
FUSE requires a file system to store certain generic information for every file and directory, such as size and modification times. This information corresponds to an i-node in an on-disk UNIX file system. The easiest way for you to store this information is to store a structure in a map in the extent server, using the file handle as the key. Then when an RPC arrives with the file handle as argument it is easy to fetch the corresponding file or directory's information. There is already a data structure (extent_protocol::attr) defined in extent_protocol.h that you might find useful for this purpose.
The other meta-data that you must store in the extent server are the contents of each directory. A directory's content is a list of names, each with a file handle. Keeping this information allows you to handle CREATE/MKNOD, LOOKUP and READDIR operations: CREATE/MKNOD must add an entry to the relevant directory's list, LOOKUP must search the list, and READDIR must return each entry from the list.
Since you're storing this information in the extent server, you have to choose a key under which to store the information, and a format for the information. This format may be anything you like, as long as it can be represented as a single std::string. Your yfs_server should be able to interpret and manipulate these strings to handle requests from the yfs_client, which in turn needs to pass that information back to FUSE.
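One possible directory format, shown only as a sketch: each entry is written as the name, a '/', the inumber in decimal, and another '/'. This relies on the assumption that '/' never appears inside a file name. The names dir_entries, encode_dir, decode_dir, dir_add, and dir_lookup are hypothetical and not part of the skeleton code.

```cpp
#include <cstdint>
#include <map>
#include <string>

typedef std::map<std::string, std::uint64_t> dir_entries;  // name -> inumber

// Serializes a directory as "name/inum/name/inum/...".
std::string encode_dir(const dir_entries &entries) {
  std::string out;
  for (const auto &kv : entries) {
    out += kv.first;
    out += '/';
    out += std::to_string(kv.second);
    out += '/';
  }
  return out;
}

// Parses a string produced by encode_dir; stops at a truncated tail.
// READDIR can simply return every entry of the result.
dir_entries decode_dir(const std::string &blob) {
  dir_entries entries;
  std::string::size_type pos = 0;
  while (pos < blob.size()) {
    std::string::size_type name_end = blob.find('/', pos);
    if (name_end == std::string::npos) break;
    std::string::size_type inum_end = blob.find('/', name_end + 1);
    if (inum_end == std::string::npos) break;
    const std::string name = blob.substr(pos, name_end - pos);
    const std::string num =
        blob.substr(name_end + 1, inum_end - name_end - 1);
    entries[name] = std::stoull(num);
    pos = inum_end + 1;
  }
  return entries;
}

// CREATE/MKNOD: adds an entry; returns false if the name already exists.
bool dir_add(std::string &blob, const std::string &name, std::uint64_t inum) {
  dir_entries entries = decode_dir(blob);
  if (!entries.insert({name, inum}).second) return false;
  blob = encode_dir(entries);
  return true;
}

// LOOKUP: finds the inumber for `name`; returns false if it is absent.
bool dir_lookup(const std::string &blob, const std::string &name,
                std::uint64_t &inum) {
  const dir_entries entries = decode_dir(blob);
  dir_entries::const_iterator it = entries.find(name);
  if (it == entries.end()) return false;
  inum = it->second;
  return true;
}
```

Re-encoding the whole directory on every change is wasteful but keeps the value a single string, which is all the extent server needs to store.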
Each file and directory in the file system must have a unique identifier that FUSE can use to access it; this is called the inumber. The inumber is simply an uninterpreted 32-bit number. It is up to you to assign a unique inumber to every newly-created file in the file system; probably the easiest thing to do is just pick a number at random in your yfs_client whenever a file is created. In the RPC protocol skeletons that we provide (yfs_protocol.h and extent_protocol.h), we define inum types for sending around these identifiers in RPCs; note that although these are 64-bit types, only 32 bits will be usable for FUSE.
It is also very useful if YFS can tell whether a particular inumber references a file or a directory. To do this, you should ensure that any inumber you assign to a file has the most significant bit set (i.e., OR the new id you pick with the mask 0x80000000 before assigning it); likewise, identifiers for directories should have this bit equal to zero. The provided method yfs_client::isfile assumes this property holds for inumbers.
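The bit convention can be sketched as follows. The names inum_t, pick_inum, and looks_like_file are hypothetical (the lab defines its own inum types and yfs_client::isfile), and this sketch makes no attempt to detect collisions with inumbers already in use.

```cpp
#include <cstdint>
#include <random>

typedef std::uint64_t inum_t;  // hypothetical alias for the lab's inum type

const inum_t kFileBit = 0x80000000ULL;  // most significant of the 32 usable bits

// Picks a random inumber that fits in 32 bits. Files get the top bit set,
// directories leave it clear. The low 31 bits are drawn from [2, 2^31 - 1],
// so a directory never receives 0 or 1.
inum_t pick_inum(bool is_file, std::mt19937 &rng) {
  std::uniform_int_distribution<std::uint32_t> low31(2, 0x7fffffffu);
  inum_t n = low31(rng);
  return is_file ? (n | kFileBit) : n;
}

// Mirrors the property that yfs_client::isfile is said to rely on.
bool looks_like_file(inum_t inum) { return (inum & kFileBit) != 0; }
```

Seeding the generator differently on each host (for example from std::random_device) would reduce, though not eliminate, the chance that two clients pick the same inumber.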
For these labs, you will be interfacing with FUSE via its "lowlevel" API. We have provided you with lots of code in the main() method of fuse.cc that handles much of the lowlevel nastiness. You will, however, have to add a method handler for each new operation you'd like to support; you can do this by assigning method pointers to the appropriate fields of fuseserver_oper. We have already done this for getattr, statfs, and readdir, but you will need to add handlers for mknod and lookup. You should study fuse_lowlevel.h for what these method definitions must be, and what methods they should use to send their information back to FUSE. Study our getattr implementation to get a sense of how a full FUSE operation handler works, and how it communicates its results and errors back to FUSE.
Sending back directory information for the READDIR operation is a bit tricky, so we've provided you with much of the necessary code in the dirbuf_add, reply_buf_limited, and fuseserver_readdir methods. All that's left for you to do for READDIR in fuse.cc is to get the directory listing from your yfs_client, and add it to the b data structure using dirbuf_add.
Though you are free to choose any inumber identifier you like for newly created files, FUSE assumes that the inumber for the root directory is 0x00000001. Thus, you'll need to ensure that when YFS mounts, it is ready to export an empty directory stored under that inumber.
The start.sh script redirects the STDOUT and STDERR of the different processes to different files in the current working directory. For example, any output you make from fuse.cc will be written to fuse2yfs1.out. Thus, you should look at these files for any debug information you print out in your code.
See the Overview and the Getting Started guide for further hacking and debugging tips.
Also, you can get FUSE to print out the requests and results of operations it passes to your file system. To turn this on, add the following line of code to the main() function of fuse.cc, just before the assignment of mountpoint into fuse_argv:
fuse_argv[fuse_argc++] = "-d";
% cd ~/yfs/l2
% make clean
% cd ..
% tar czvf `whoami`-lab2.tgz l2/

That should produce a file called [your_user_name]-lab2.tgz in your yfs/ directory. Attach that file to an email and send it to the 6.824 staff address.