Nachos Threads Project, Version Control

January 17, 2007

Administrivia

Don't forget to send a list of your project group members to the TAs. We need this info to set up your accounts for project 1.

All the lectures and discussions for this course cover the same material, so you're not tied to the lecture or discussion you signed up for on Tritonlink. Shop around - the other lecture or discussion might make more sense to you.

Nachos Survival Guide

As mentioned in lecture, Nachos is different from most class projects - instead of writing code from scratch, or filling in the blanks in skeleton code, you will be modifying and extending a large chunk of existing code. The world tends to work this way, so get used to it. :) Rarely will you find yourself with the time/money to build something big from the ground up. It is very important to learn how to understand and modify an existing system.

And just in case you haven't heard it enough already: start early. Really.

Design before you implement

Planning is critical for the success of any large project, and Nachos is no exception. You will find that the projects become easier when you plan ahead. I strongly recommend writing up a design document that lays out all the problems that need to be solved, how they will be solved, and who will implement the solutions. Design documents become especially critical when project components depend on each other - fully specify all APIs in your design documents.

Plan as much as possible before implementing anything. It is much easier to detect and correct Big Mistakes in the design stage. You do not want to realize "hey, I don't think this is going to work" when you're halfway through implementation.

Consider bringing your design document to one of the instructors or TAs during office hours. We can help you figure out if you're on the right track, or if there are special cases you should think about.

As soon as each project assignment is announced, it's a good idea to have a group meeting and start working on a design document. This is also how things work in the real world. :) When projects get big, you have to plan things out - a lot of code has to work together at the end of the day. Things usually don't end well when a bunch of programmers charge in and start making things up as they go along.

I recommend making a design document even if you are working by yourself. Spend some extra time planning things out in the beginning - it will save you a lot of time later.

Code for correctness

Figure out how to use Nachos' asserts and debug messages, and use them everywhere. The only downside to asserts and debug messages is performance, but we don't care about performance, so use them liberally.

Use assertions wherever some condition should always be true. That pointer should never be null? Assert it. The list should be empty when you exit the loop? Assert it. X must equal 5 when Y is less than 2? You get the idea. When an assertion fails, the system will tell you which one failed, what the failing condition was ("you said the pointer should never be null, but hey, it's null!"), and the filename and line number where the assertion can be found. A failed assertion is much, much easier to debug than a segfault. Don't think about whether the assertion is likely to fail, or how useful the assertion will be. Assert first, ask questions later. :)
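To make the "assert it" habit concrete, here is a minimal sketch. It uses the standard assert from <cassert> rather than Nachos' own assert macro (see utility.h for that one), and the Node type and the three functions are made up for this illustration - they are not part of Nachos.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical singly linked list node, used only for this illustration.
struct Node {
    int value;
    Node *next;
};

// Pushes a value onto the front of the list.
void PushFront(Node **head, int value) {
    assert(head != NULL);      // that pointer should never be null? Assert it.
    Node *node = new Node;
    node->value = value;
    node->next = *head;
    *head = node;
}

// Removes and returns the first value. The caller promises the list is non-empty.
int PopFront(Node **head) {
    assert(head != NULL);
    assert(*head != NULL);     // popping from an empty list is a bug in the caller
    Node *first = *head;
    int value = first->value;
    *head = first->next;
    delete first;
    return value;
}

// Empties the list and returns how many values were removed.
int Drain(Node **head) {
    assert(head != NULL);
    int removed = 0;
    while (*head != NULL) {
        PopFront(head);
        removed++;
    }
    assert(*head == NULL);     // the list should be empty when we exit the loop
    return removed;
}
```

If a caller ever pops from an empty list, the second assertion in PopFront stops the program right there and names the file and line, instead of letting a null pointer dereference blow up somewhere less obvious.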

Use debug messages to make Nachos tell you what it's doing. This is much more pleasant than figuring out what the heck Nachos is doing with gdb. For an idea of what's possible, try running "nachos -d t" to turn on thread debug messages. Nachos has a pretty good debug message system. Use it.
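If you're wondering how a flag like "t" can switch a whole category of messages on and off, here is a rough sketch of the idea. To be clear, this is NOT the Nachos code: SetDebugFlags, DebugEnabled and DebugPrint are hypothetical stand-ins I made up, so read utility.h for the real routines and their actual names.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <cstring>

// Hypothetical stand-ins for illustration; not the Nachos routines themselves.
// Holds the set of enabled flags, e.g. "t" after running with "-d t".
static const char *enabledFlags = "";

void SetDebugFlags(const char *flags) {
    enabledFlags = (flags != NULL) ? flags : "";
}

// True if messages tagged with `flag` should be printed.
bool DebugEnabled(char flag) {
    return flag != '\0' && strchr(enabledFlags, flag) != NULL;
}

// Prints a printf-style message to stderr only if `flag` is enabled.
// Returns true if the message was printed, false if it was filtered out.
bool DebugPrint(char flag, const char *format, ...) {
    assert(format != NULL);
    if (!DebugEnabled(flag)) {
        return false;
    }
    va_list args;
    va_start(args, format);
    vfprintf(stderr, format, args);
    va_end(args);
    fflush(stderr);   // make sure the message is out before any crash
    return true;
}
```

The nice property of this design is that the debug calls stay in your code forever. They cost nothing to read past, and you turn on only the category you care about when something goes wrong.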

Test, then test some more

Nachos projects build upon each other, both within assignments and across assignments. This means that, for the first project, your mailboxes are unlikely to work if your condition variables don't work. This also means that basic infrastructure, such as your locks, has to work, or you'll be in a lot of pain when project 2 rolls around.

Write lots of test programs to convince yourself that your code is working properly. Test early and test often. In particular, if you write a line of code, and you aren't sure it will always do what you want, test it immediately. The longer you wait, the more difficult it will be to figure out why your code isn't working. Test code in isolation as much as possible.

Here's what typically happens: you write a bunch of code without testing, your partner writes a bunch of code without testing, you put all the code together, put on your helmets, and fire up Nachos. This approach typically doesn't work so well. When Nachos crashes, it becomes increasingly difficult to figure out why, as you add more and more untested code.

Do not throw away your test code. Build up a library of test code, and make sure your newest changes to Nachos don't break your old tests. This is called "regression testing." As you can imagine, this is also very common in the real world.
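One way to keep a library of tests around is a simple table of test functions that you only ever add to. The sketch below is just one possible arrangement, not something Nachos provides: RegisterTest, RunAllTests and the two placeholder tests are hypothetical names, and real tests would exercise your locks, condition variables, and so on.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

// A test is a function that returns true on success.
typedef bool (*TestFunc)();

struct NamedTest {
    const char *name;
    TestFunc run;
};

// Hypothetical registry: every test you write gets appended here and stays.
static std::vector<NamedTest> allTests;

void RegisterTest(const char *name, TestFunc run) {
    assert(name != NULL && run != NULL);
    NamedTest t = { name, run };
    allTests.push_back(t);
}

// Runs every registered test, old and new, and returns the number of failures.
int RunAllTests() {
    int failures = 0;
    for (size_t i = 0; i < allTests.size(); i++) {
        bool ok = allTests[i].run();
        printf("%-24s %s\n", allTests[i].name, ok ? "ok" : "FAILED");
        if (!ok) {
            failures++;
        }
    }
    return failures;
}

// Placeholder tests, only here to show the mechanics.
bool TestAlwaysPasses() { return 1 + 1 == 2; }
bool TestAlwaysFails()  { return false; }
```

After every change, run the whole table. If RunAllTests reports a failure in a test that used to pass, you know the bug is in whatever you just touched.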

Know your tools

You'll be spending a lot of quality time with your text editor of choice this quarter, so get familiar with it, if you aren't already. If you're not currently using a text editor designed for editing code, you might want to take some time to learn one that is (emacs/vi/whatever).

Even if you are familiar with emacs/vi/whatever, it doesn't hurt to teach yourself some new tricks. I'm an emacs guy, so here are a few more advanced emacs features that should make your life easier when working with Nachos. If you're a vi/whatever guy, you can look for the equivalent in your editor:

M-x compile - Compile code in emacs. Emacs parses the compiler's error messages, and it can warp you to the source location of each error. Very, very useful. Bind M-x compile to a key.

etags - Jump to definition of procedure. Very useful when figuring out how existing code works. Think of M-. as the "show me how this works" button.

M-x gdb - When you're debugging with gdb, emacs shows you which line is currently executing, and makes it easy to place breakpoints (C-x SPC).

Version control

When you have a bunch of programmers working on a big piece of software, you can't have them all edit one central copy of the code, because bad things happen when two programmers want to edit the same file at the same time. On the other hand, if every programmer has their own copy of the code, they'll have to keep sending code updates to each other, and it will be difficult to keep everyone up to date.

Version control provides a hybrid solution - each programmer edits their own local copy of the code, but they periodically push their changes into the central code base, and periodically update their local copy of the code with other programmers' checked-in changes. You will definitely have to deal with version control in the real world.

The main feature of version control is that it allows programmers to work in parallel with less chaos. As you'd expect, two programmers can work on different files at the same time, but version control also allows two programmers to work on the same file at the same time.

So what happens if two programmers change the same part of the same file at the same time? When a programmer pushes their changes into the central code repository, we say they are committing code. Commits are atomic, so it is impossible for two changes to happen at the exact same time - one must occur before the other. The first commit runs normally. When the second commit occurs, version control detects that the second commit conflicts with the first commit, and aborts the second commit. It sends a message back to the second committer that says "In this part over here, you wrote X, but the other programmer wrote Y. Resolve this conflict, then try your commit again."

You resolve conflicts by deciding whether you want the code the way you wrote it, the way the other programmer wrote it, or both.
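Here is a toy model of the commit/conflict cycle described above. It is a sketch under heavy simplifying assumptions, not how cvs is implemented: the repository holds a single file, the whole file counts as one "part" (real tools merge changes that don't overlap), and Repository, WorkingCopy, Checkout, Commit and Resolve are all hypothetical names.

```cpp
#include <cassert>
#include <string>

// Toy central repository holding a single file. All names are hypothetical.
struct Repository {
    int revision;          // bumped on every successful commit
    std::string contents;  // latest committed text
};

// A programmer's local copy, remembering which revision it started from.
struct WorkingCopy {
    int baseRevision;
    std::string contents;
};

WorkingCopy Checkout(const Repository &repo) {
    WorkingCopy wc = { repo.revision, repo.contents };
    return wc;
}

// Tries to commit. Succeeds only if nobody else has committed since this
// working copy was checked out. On failure the repository is left untouched
// and the committer has to resolve the conflict and try again.
bool Commit(Repository &repo, WorkingCopy &wc) {
    if (wc.baseRevision != repo.revision) {
        return false;   // someone else committed first: abort this commit
    }
    repo.contents = wc.contents;
    repo.revision++;
    wc.baseRevision = repo.revision;
    return true;
}

// Records the committer's decision (their code, the other programmer's code,
// or a combination) and rebases the working copy onto the newest revision.
void Resolve(const Repository &repo, WorkingCopy &wc, const std::string &resolved) {
    wc.contents = resolved;
    wc.baseRevision = repo.revision;
}
```

If two working copies are checked out from the same revision and both are edited, the first Commit succeeds and the second returns false. The second programmer then calls Resolve with whatever text they decided on, and their next Commit goes through.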

A secondary feature of version control is that it keeps track of old versions of every file and maintains quite a lot of metadata, such as who committed what, when each commit happened, log messages, etc. This can be very useful - version control can answer questions like "what changed in synch.cc in the last two days?" Pretty much every big project uses some kind of version control system, even if there is only one developer, because of this secondary feature.

cvs demo

We will be using cvs for version control in this class. I am assuming that you already have a cvs repository set up. Hopefully the computer support people will set up repositories for you, but if you need to do it yourself, the commands to learn about are "cvs init" and "cvs import".

We'll play around with the following commands:

cvs update
Pull changes from the cvs repository to the local copy of the code. You can use "cvs update -p -r<version>" to see older versions of a file
cvs commit
Push local changes to the cvs repository
cvs status
Status report: Have we changed our local copy of a file? Do we need to pull updates?
cvs log
Read commit log
cvs diff
Compare two versions of a file. Can compare the local version to any version in the repository, or compare any two versions in the repository
cvs add
Add files to the cvs repository

Nachos

For the first project, you'll be working exclusively in the nachos-3.4/code/threads directory. You'll find the following files:

list.h, list.cc
A generic list class. Very useful. Learn the interface, but don't worry about the details.
main.cc
Contains the code that starts up Nachos. You'll need to change this if you want to add command line options to Nachos. For project 1, take a look at how testnum is set up in main for ThreadTest
scheduler.cc
Thread scheduler. You'll need to understand how the existing scheduler works to implement priority scheduling.
switch.s
Implements context switching. An interesting read, but you don't need to understand the details, and you definitely won't be making changes here
synch.h, synch.cc
Synchronization routines. You'll need to fill in the definitions at the bottom of the file (Lock::Acquire, Lock::Release, etc) to implement locks and condition variables. Take a look at the implementation of semaphores for a starting point
synchlist.h, synchlist.cc
A generic synchronized list class. Only one thread can modify a synchlist at a time. Very useful. It depends on locks and condition variables, so synchlist won't work as advertised until you implement both.
system.h, system.cc
Nachos startup and shutdown code. Also defines Nachos global variables. You probably don't need to change anything here.
thread.h, thread.cc
Thread routines. Figuring out how fork, finish, yield, and sleep work will help you figure out how to implement join.
threadtest.cc
Contains a simple test for the thread system. Extend this with your own tests
utility.h, utility.cc
Read over this to figure out how to use Nachos' debug message routines and assert.
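
If you end up adding your own tests and want to pick one from the command line, the general pattern looks something like the sketch below. This is not a copy of main.cc or threadtest.cc: ParseTestNumber, SelectTest, the "-q" flag and the test names are assumptions for illustration, so check main.cc to see how testnum is really set up before you copy anything.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Scans argv for "<flag> <number>" and returns the number,
// or `fallback` if the flag does not appear. Hypothetical helper.
int ParseTestNumber(int argc, char **argv, const char *flag, int fallback) {
    assert(argv != NULL && flag != NULL);
    for (int i = 1; i + 1 < argc; i++) {
        if (strcmp(argv[i], flag) == 0) {
            return atoi(argv[i + 1]);
        }
    }
    return fallback;
}

// Maps a test number to a test. A real version would call the test
// function; this one just returns a label so the dispatch is easy to see.
const char *SelectTest(int testnum) {
    switch (testnum) {
    case 1:
        return "simple thread test";
    case 2:
        return "lock test";
    default:
        return "no test specified";
    }
}
```

With this arrangement, adding a new test means writing the test function and adding one more case to the switch. The old cases stay put, so your earlier tests remain one command away.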

Lecture review

  1. What's an operating system
  2. Architectural support for operating systems
  3. Processes

Questions

  1. What is a page fault?
  2. x86 provides the atomic XCHG instruction. How can we use this instruction to implement a lock?
  3. One of the big selling points of Windows 95 was "preemptive multitasking". What do you think this means?
  4. What does the OS use timer interrupts for? Can we have similar functionality without timer interrupts?
  5. What are the text segment, data segment, heap and stack? Why do we need all these segments? How do I use these segments in a C program?
  6. Nachos' Thread::Fork() calls Thread::StackAllocate() to set up the new thread's stack. The comment for Thread::StackAllocate() says:
    //----------------------------------------------------------------------
    // Thread::StackAllocate
    //      Allocate and initialize an execution stack.  The stack is
    //      initialized with an initial stack frame for ThreadRoot, which:
    //              enables interrupts
    //              calls (*func)(arg)
    //              calls Thread::Finish
    //----------------------------------------------------------------------
    
    This means that when a new thread starts executing, the very first thing it does is enable interrupts. Why? Who turned off interrupts?
  7. What does the following code do?
    while(1) 
      fork();
    
    If you want to try it out, run it on your machine at home, NOT in the campus computer labs.