CSE 120 Nachos Project 2: Multiprogramming

Fall 2000

Due: November 16, 2000 at Midnight

The second phase of Nachos is to support multiprogramming. As in the first assignment, we give you some of the code you need and your job is to complete the system and enhance it.

Up to now, all the code you have written for Nachos has been part of the operating system kernel. In a real operating system the kernel not only uses its procedures internally, but also allows user-level programs to access some of its routines via "system calls". An executing user program is a process . In this lab you will modify Nachos to support multiple processes, using system calls to have processes request services from the kernel.

Since your kernel does not trust user programs to execute safely, the kernel and the (simulated) hardware will work together to protect the system from damage by malicious or buggy user programs. To this end, you will implement simple versions of key mechanisms found in real operating system kernels: virtual addressing, protected system calls and kernel exception handling, and preemptive timeslicing. Virtual addressing prevents user processes from accessing kernel data structures or the memory of other programs; your kernel will use process page tables to safely allow multiple processes to reside in memory at the same time. With protected system calls and exceptions, all entries into the kernel funnel through a single kernel routine, "ExceptionHandler"; you will ``bullet-proof'' this routine so that buggy or malicious user programs cannot cause the kernel to crash or behave inappropriately. Finally, your kernel will use preemptive scheduling to share the simulated CPU fairly among the active user processes, so that no process can take over the system. All of these protection mechanisms require cooperation between the hardware and the operating system kernel software. Your implementation will be based on "hardware" support in the Nachos MIPS simulator, which resembles a real MIPS processor.

If all processes are created by other processes, then who creates the first user process? The operating system kernel creates this process itself as part of its initialization sequence. This is bootstrapping . You can "boot" the Nachos kernel by running nachos with the -x option (x for "execute"), giving the name of an initial program to run as the initial process. The Nachos release implements the -x option by calling StartProcess in progtest.c to handcraft the initial process and execute the initial program within it. The initial process may then create other processes, which may create other processes...and so on.

The first step is to read and understand the part of the system we have written for you. The code that you will have to look at and modify is spread over several directories. There are some kernel files in "userprog", a few additional machine simulation files in "machine", and a stub file system in "filesys". The user programs are in "test", and utilities to generate a Nachos loadable executable are in "bin". Since Nachos executes MIPS instructions (and there aren't very many MIPS machines left!), we also provide you a cross-compiler. The cross-compiler runs on Linux and Solaris and compiles user programs into MIPS format.

The code we provide can run a single user-level "C" program at a time. As a test case, we've provided you with a trivial user program, "halt". All "halt" does is to turn around and ask the operating system to shut the machine down. To run the "halt" program, make and then run Nachos in the "userprog" directory:

% cd userprog
% ./nachos -x ../test/halt

Trace what happens as the user program gets loaded, runs, and invokes a system call.

As with the previous assignment, you may find Narten's Road Map to Nachos and Nachos System Call Interface helpful. Also, it is OK to change the constants in "machine.h" if it helps you design better test cases. For example, you can change the amount of "physical memory" allocated by Nachos. Note that the code for this part of the project is enabled by the USER_PROGRAM compiler macro.

The files for this assignment include:

In this assignment we are giving you a simulated CPU that models a real CPU. In fact, the simulated CPU is the same as the real CPU (a MIPS chip), but we cannot just run user programs as regular UNIX processes because we want complete control over how many instructions are executed at a time, how the address spaces work, and how interrupts and exceptions (including system calls) are handled.

Our simulator can run normal programs compiled from C -- see the Makefile in the `test' subdirectory for an example. The compiled programs must be linked with some special flags, and then converted into Nachos format, using the program ``coff2noff'' (which we supply). The only caveat is that floating-point operations are not supported.

  1. [40 pts] Implement system call handling and multiprogramming. To support multiprogramming, you must support some of the system calls defined in syscall.h: Exec, and Exit. You can also implement thread fork and yield for extra credit (see below). We have provided you an assembly-language routine, ``syscall'', to provide a way of invoking a system call from a C routine (UNIX has something similar -- try `man syscall'). Note that the routine `StartProcess' in progtest.cc may be of use to you in implementing the `exec' system call.

    First, you will need basic facilities to load processes into the memory of the simulated machine. Spend a few minutes studying the AddrSpace class, and look at how the StartProcess procedure uses the AddrSpace class methods to create a new process, initialize its memory from an executable file, and start the calling thread running user code in the new process context. The current code for AddrSpace and StartProcess works OK, but it assumes that there is only one program/process running at a time (started with StartProcess from main via the nachos -x option), and that all of the machine's memory is allocated to that process. Your first job is to generalize this code to implement the Exec system call for the general case in which multiple processes are active simultaneously.

    Implement a memory manager module to allow your kernel to allocate page frames of the simulated machine's memory for specific processes, and to keep track of which frames are free and which are in use (cf. bitmap.h). You might find an implementation of a thread-safe Table class will help you with your implementation of AddrSpace. The Table class will store a collection of untyped object pointers indexed by integers in the range [0...size-1], and support the following methods:

    /* Create a table to hold at most "size" entries. */
    Table(int size) 
    
    /* Allocate a table slot for "object", returning the "index" of the
       allocated entry; otherwise, return -1 if no free slots are available. */
    int Alloc(void *object)
    
    /* Retrieve the object from table slot at "index", or NULL if that
       slot has not been allocated. */
    void *Get(int index) 
    
    /* Free the table slot at index. */
    void Release(int index) 
    

    Modify AddrSpace to allow multiple processes to be resident in the machine memory at the same time. The default AddrSpace constructor code assumes that all of the machine memory is free, and it loads the new process contiguously starting at page frame 0. You must modify this scheme to use your memory manager to allocate page frames for the new process, and load the process code and data into those allocated page frames, which are not typically contiguous. For now it is acceptable to fail and return an error (0) from Exec if there is not enough free machine memory to load the executable file.

    Note: to cleanly handle these failures, you will need to move the AddrSpace loading code out of the constructor and into a new AddrSpace method that can return an error. It is poor programming practice to put code that can fail into a class constructor, as the Nachos designers have done in this release. If necessary, update StartProcess to use your new AddrSpace interface so that you do not break the Nachos -x option. Modify AddrSpace to call the memory manager to release the pages allocated to a process when the process is destroyed. Make sure your AddrSpace code also releases any frames allocated to the process in the case where it discovers that it does not have enough memory to load the entire process.

    [Tips on getting started]

  2. [40 pts] Next, use these new facilities to implement the Exec and Exit system calls as defined in the Nachos System Call Interface and in userprog/syscall.h. If an executing user processes requests a system call (by executing a trap instruction) the machine will transfer control to your kernel by calling ExceptionHandler in exception.cc. Your kernel code must extract the system call identifier and the arguments from the machine registers, decode them, and call internal procedures that implement the system call. You may impose a reasonable limit on the maximum size of a file name. Also, use of Machine:ReadMem and Machine:WriteMem is not forbidden as the comment in machine.h implies.

    When an Exec call returns, your kernel should have created a new process and started a new thread executing within it to run the specified program. At this point, do not concern yourself with setting up OpenFileIds. For now, you will be able to run user programs, but they will not be able to read any input or write any output. For Exec, you must copy the filename argument from user memory into kernel memory safely, so that a malicious or buggy user process cannot crash your kernel or violate security. The filename string address (char*) passed into the kernel as an argument is a process virtual address; in order for the kernel to access the filename it must locate the characters in the kernel address space (i.e., in the machine's physical "main memory" array) by examining the page table for the process. In particular, your kernel must handle the case where the filename string crosses user page boundaries and resides in noncontiguous physical memory. You must also detect an illegal string address or a string that runs off the end of the user's address space without a terminating null character. Your kernel should handle these cases by returning an error (SpaceId 0) from the Exec system call.

    Exec must return a unique process identifier (SpaceId), which can be used as an argument to Join. Your kernel will need to keep a table of active processes, and you may find the Table class described above useful here, too. You may find it convenient to implement process exit as an internal kernel procedure called by the Exit system call handler, rather than calling the lower-level procedures directly from ExceptionHandler . This will make it easier to kill a process from inside the kernel (e.g., if the process has some kind of fatal error), by calling the internal exit primitive from another kernel procedure (e.g., ExceptionHandler) in the target process context. In general, this kind of careful internal decomposition will save you from reinventing and redebugging wheels, and it is always good practice.

    [Tips on getting started]

  3. [10 pts] The current version of the "exec" system call does not provide any way for the user program to pass parameters or arguments to the newly created address space. UNIX does allow this, for instance, to pass in command line arguments to the new address space. Implement this feature.

    There are two steps to get this done. First, you need a new signature for Exec so that you can give it the arguments to be passed to the new process. Second, you need to figure out how to pass those arguments to that process.

    1. Change the signature of Exec to be the following:

    SpaceID Exec(char *name, int argc, char **argv, int willJoin);
    

    "name" is the filename as before, "argc" is the number of arguments, "argv" is the array of argument strings, and "willJoin" determines whether this process will be joined by another. The semantics of argc and argv are exactly like standard C. The semantics of the "willJoin" argument are the same as for Nachos threads -- non-zero means that a process will eventually Join on the one being Exec'ed. If you do not implement Join, you should pass in "0" for this argument.

    You will need to change the declaration of Exec in syscall.h, and you will need to change your test programs to use this new version of Exec. If a program does not pass in arguments to the new process, just pass in zero for argc, argv, and willJoin.

    2. To pass in the arguments to the created process, you need to do two things. First, you need to copy the arguments into the process' address space. Second, you need to setup the machine so that, when the process starts executing, the argc and argv arguments are passed into the main() routine of the program.

    Before you can copy in the arguments, you need to create some space for them. To do this, grow the process' virtual address space by some amount (e.g., just like the address space is grown to accomodate the user stack). You decide how much (and document this in your writeup), and make sure that you check that the arguments do not overflow this region. Don't make it too large, though, or you will consume too much physical memory. Then copy the arguments into that region of the virtual address space (this is the opposite of reading system call arguments out of the virtual address space).

    To setup the machine so that the process is started with the arguments, you can assume that the arguments to main use the same set of registers as the arguments passed in from system calls (i.e., same calling convention). The comments at the top of ExceptionHandler describe which registers to use.

    NOTE: To test this, you will need to create a test program that has arguments in its main routine:

    int
    main(int argc, char *argv[])
    {
      // e.g., check to make sure that the arguments are valid...
    }
    

  4. [10 pts] Bullet-proof your kernel. Implement the Nachos kernel code to handle user program exceptions that are not system calls. The machine (or simulator) raises an exception whenever it is unable to execute the next user instruction, e.g., because of an attempt to reference an illegal address, a privileged or illegal instruction or operand, or an arithmetic underflow or overflow condition. The kernel's role is to handle these exceptions in a reasonable way, i.e., by printing an error message and killing the process rather than crashing the whole system. Note: an ASSERT that crashes Nachos is a reasonable response to a bug within your Nachos kernel, but it not an acceptable response to a user program exception.

  5. (Extra Credit) [5 pts] Implement the "join" system call. Join takes one argument, the SpaceId of the process that the caller wants to wait for. The process to be Joined must have been Exec'ed with its willJoin argument set to non-zero. Return -65535 from Join to signify an error (e.g., the SpaceId passed to the Join system call is invalid).

  6. (Extra Credit) [10 pts] Implement the "read" and "write" system calls for console reads and writes (not for file reads and writes). To support the system calls that access the console device, you will probably find it helpful to implement a ``SynchConsole'' class that provides the abstraction of synchronous access to the console (i.e., a read on the SynchConsole blocks the thread until a character is available). ``progtest.cc'' has the beginnings of a SynchConsole implementation. You might find it useful to peek into the filesys/synchdisk.{cc,h} files to use the SynchDisk class as a further example.

  7. (Extra Credit) [5 pts] Implement interprocess communication using pipes (requires support for "read" and "write" system calls). This will redirect the output of one process to the input of another. Since you are not required to implement file reads and writes, this will be the only way to pass information to a process, aside from command line arguments. An example of a unix pipe would be "program1 | program2" In this example, program1 would normally output text to the standard output, but the pipe will redirect this output to the input of program2. program2 would normally take its input from a file. This problem will require some imagination since files do not exist, but shouldn't be too difficult to implement.

  8. (Extra Credit) [10 pts] Implement multithreaded user programs. Implement the thread fork() and yield() system calls to allow a user program to fork a thread to call a routine in the same address space, and then ping-pong between threads.  (Hint: you will need to change the kernel to allocate memory in the user address space for each thread stack.)

As with project 1, we would like a short paper write-up describing what changes you made, how well they worked, how you tested your changes, and how each group member contributed to the project. Please include this writeup in your userprog directory.

To turn in project 2, I want you to create a tar file of your entire nachos directory and email it to me. Assuming you are group 57, do the following:

% cd nachos-3.4/code
% gmake clean
% cd ..
% tar cvf group57.tar *
% gzip group57.tar
% [mail group57.tar.gz to voelker@cs.ucsd.edu]
Look on the project groups page if you do not remember your group number.


voelker@cs.ucsd.edu