Getting started with Valkyrie

Changelog

Date        Description
28-Sep-06   Original posting
16-Oct-06   If you use Control/C (break) to terminate an MPI job, you may have runaway processes. Be sure to check and clean up after yourself.

Valkyrie

The hardware platform for the course is a Beowulf cluster named Valkyrie. Valkyrie is managed by the ACS and runs the Rocks software developed at the San Diego Supercomputer Center. The system consists of 16 nodes, each with two 1 GHz Pentium III CPUs and 1 GB of RAM, running Linux. A Myrinet switch provides low-latency connectivity between the nodes.

Valkyrie should only be used for parallel program development and measurement. If you need to use a Linux or UNIX system, say to run LaTeX and the like, please use your student account. You may access your account in the Advanced Programming Environment (APE) lab, located in EBU3B 2236.

The front end runs any job that is invoked without mpirun (e.g. ./a.out), so don't run computationally intensive programs there unless you invoke them with mpirun. Otherwise, you may slow down the front end for others.

Issues

  • Valkyrie is running, but with one caveat: nodes 6, 7, 8, and 14 (of 0..15) are currently down, so use a machine file to restrict your runs to the 12 functioning nodes. See below for instructions. [9/23/06, 12:59 PM]
  • Runaway processes


  • SSH

    SSH - Logging in for the first time

    If you haven't set up your environment for the secure shell, then you'll get the following message printed on your screen:

        It doesn't appear that you have set up your ssh key.
        This process will make the files:
        /home/cs260x/<your account>/.ssh/identity.pub
        /home/cs260x/<your account>/.ssh/identity
        /home/cs260x/<your account>/.ssh/authorized_keys


    Generating public/private rsa1 key pair.

    You will then be asked the three questions shown below. Be sure to hit carriage return (entering no other input) in response to each question:

        Enter file in which to save the key (/home/cs260x/<your account>/.ssh/identity):

        Created directory '/home/cs260x/<your account>/.ssh'.
        Enter passphrase (empty for no passphrase):
        Enter same passphrase again:
        Your identification has been saved in /home/cs260x/<your account>/.ssh/identity.
        Your public key has been saved in /home/cs260x/<your account>/.ssh/identity.pub.
        The key fingerprint is:
        <several 2 digit hex numbers separated by :> <your account>@valkyrie.ucsd.edu

    Environment

    We'll be using the bash shell. Modify your .bash_profile using information found in /export/home/cs260x-public/bash_profile. (From now on we'll refer to the directory /export/home/cs260x-public as $(PUB).) The provided bash_profile file sets up your path to enable you to run MPI jobs and to use a special version of the Gnu C++ compiler that incorporates the MPI libraries:

    export PATH=/opt/mpich/myrinet/gnu/bin:$PATH
    
    
    It also sets MANPATH to provide access to the MPI manual pages:
    MANPATH=/opt/mpich/gnu/man:$MANPATH

    and adds $(PUB)/bin, a public binary directory containing generally useful programs, to your path.


    Compiling and running your first program

    We've set up some code to get you started in the directory $(PUB)/examples. These examples will help acquaint you with the process of running an MPI program. Compile and run the two programs in the subdirectory called Basic. Be sure to use the Makefile that we've supplied so you'll get the correct compiler and loader flags. The Makefile includes an "arch" file that defines the appropriate command line flags for the compiler; the arch file you currently have provides the settings needed for Valkyrie. (If you want to run on other machines, let us know.)

    To compile your programs, always use the makefiles provided for you; the architecture ("arch") file they include supplies the appropriate compiler settings.

    Running a program with mpirun

    Run your program with the mpirun command. The command provides the -np flag so you can specify how many nodes to run on. There are 16 nodes, numbered 0 through 15. Be sure to run your program from a subdirectory of your home directory, as it is not possible to run jobs in $(PUB).

    To establish that your environment has been set up correctly, compile and run the parallel "hello world" program. This program prints "Hello World" from each process along with the process ID. It also reports the total number of processes in the run. The hello world program is found in $(PUB)/examples/Basic/hello. To run the program, use mpirun as follows:

    mpirun -np 2 ./hello
    

    Here is some sample output:

    # processes: 2
    Hello world from node 0
    Hello world from node 1
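
    The hello program is essentially the canonical MPI "hello world". The sketch below gives the general idea; it is not the exact source found in $(PUB)/examples/Basic/hello, which may differ in its details:

    #include <mpi.h>
    #include <iostream>

    int main(int argc, char *argv[]) {
        MPI_Init(&argc, &argv);                  // start up MPI

        int rank, nproc;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);    // this process's ID
        MPI_Comm_size(MPI_COMM_WORLD, &nproc);   // total number of processes

        if (rank == 0)
            std::cout << "# processes: " << nproc << std::endl;
        std::cout << "Hello world from node " << rank << std::endl;

        MPI_Finalize();                          // shut down MPI
        return 0;
    }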
    
    You must prefix the executable with "./" when it resides in the current directory. Note that any command line arguments come in the usual position, after the name of the executable. Thus, to run the Ring program (found in $(PUB)/examples/Ring) on 4 processes with command line arguments -t 5 and -s 1024, type:
    mpirun -np 4 $(PUB)/examples/Ring/ring -t 5 -s 1024
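
    In case you're wondering what a ring-style code does: the heart of such a program typically passes a message around a ring of processes, with each process receiving from its left neighbor and sending to its right neighbor. The following is only an illustrative sketch of that pattern; it is not the source of the Ring example and does not implement its -t and -s options:

    #include <mpi.h>
    #include <iostream>

    int main(int argc, char *argv[]) {
        MPI_Init(&argc, &argv);

        int rank, nproc;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nproc);

        if (nproc < 2) {                         // a ring needs at least 2 processes
            std::cerr << "Run with at least 2 processes" << std::endl;
            MPI_Finalize();
            return 1;
        }

        int right = (rank + 1) % nproc;          // neighbor we send to
        int left  = (rank + nproc - 1) % nproc;  // neighbor we receive from
        int token = 0;
        MPI_Status status;

        if (rank == 0) {
            // process 0 injects the token and waits for it to return
            MPI_Send(&token, 1, MPI_INT, right, 0, MPI_COMM_WORLD);
            MPI_Recv(&token, 1, MPI_INT, left, 0, MPI_COMM_WORLD, &status);
            std::cout << "Token returned to node 0 with value " << token << std::endl;
        } else {
            // everyone else receives, increments, and passes the token along
            MPI_Recv(&token, 1, MPI_INT, left, 0, MPI_COMM_WORLD, &status);
            token++;
            MPI_Send(&token, 1, MPI_INT, right, 0, MPI_COMM_WORLD);
        }

        MPI_Finalize();
        return 0;
    }
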
    Using machine files

    Sometimes you'll want to specify the particular nodes to run on. For example, in order to obtain reproducible results, you might decide to use the same nodes each time, especially if you are seeing performance anomalies. Another reason is to avoid failed nodes (and this will happen), as mpirun doesn't know how to avoid them on its own. To this end, supply a machine file listing the names of the physical nodes on the mpirun command line as follows:

       mpirun -np <# NODES> -machinefile <MACHINE_FILE> <program> [args]
    

    The machine file contains a list of physical node names, one per line. The nodes are numbered from 0 to 15, and are named compute-0-0 through compute-0-15. (Each node contains 2 CPUs, but in effect you may use only 1 CPU per node; see "Running with 2 CPUs per node" below.) Thus, to run the ring program with nodes 9, 11, 13, and 15 as logical processes 0-3, create the following file, say mfile:

     compute-0-9
     compute-0-11
     compute-0-13
     compute-0-15
    
    To run, type
    mpirun -np 4 -machinefile mfile ./ring -t 5 -s 1024
    

    We have provided a Python script to generate randomized machine files: $(PUB)/bin/randMach.py. Its command line argument specifies the number of processors to include in the machine file. For example, the command randMach.py 7 > mach was used to generate the following 7-line machine file:

    compute-0-11
    compute-0-10
    compute-0-9
    compute-0-5
    compute-0-3
    compute-0-15
    compute-0-13
    
    You may use the randMach.py script to avoid failed nodes; the script consults a list of functioning nodes contained in /export/home/cs260x-public/machines.valkyrie. (This list is not maintained dynamically and could be out of date. If you discover a failed node, post to the web board so others know, and email me so I can update the file.)

    Running with 2 CPUs per node

    If you want to run with 2 CPUs per node, you'll also need to use a machine file.

    Generate a machine file with ONE entry for each node you want to use. Do not list each machine entry twice. Then specify the number of processes you want to run along with the machine file. For example, if you want to run with 6 processes (2 per node) using this machine file, p3:

    compute-0-11
    compute-0-12
    compute-0-14
    
    you enter
    mpirun -np 6 -machinefile p3 ./a.out
    


    Runaway processes

    Sometimes runaway processes will persist after a run. This can occur if you break a run using Control/C (a flaw in the software environment). If you feel that the machine is slowing down, display the load on all of the nodes as follows:

                ganglia load_one | sort -n -k 2

    valkyrie     	0.33
    compute-0-13 	0.10
    compute-0-1  	0.08
    compute-0-3  	0.07
    compute-0-15 	0.07
    compute-0-4  	0.06
    compute-0-14 	0.06
    compute-0-10 	0.05
    compute-0-2  	0.02
    compute-0-11 	0.02
    compute-0-12 	0.01
    compute-0-9  	0.00
    compute-0-7  	0.00
    compute-0-5  	0.00
    compute-0-0  	0.00
    
    If the load on a node is more than about 0.1, then the node is probably in use. This may be fine if another user is running a job; simply modify your machine file accordingly so that you avoid interfering with one another (especially if you are collecting performance data).

    In some cases, however, there may be runaway processes. To find out, run the cluster-ps command to display all of your processes sorted by node (if any nodes are down, you'll be notified):

    	cluster-ps <username>
    

    If you find processes with a running TIME of several minutes, this is an indication that they are "runaways," and should be removed.

    USER     PID  %CPU  %MEM   VSZ   RSS TTY    STAT  START     TIME  COMMAND
    compute-0-1: 
    cs260x  2110  99.9   4.2 175036 43988 ?      Rs   10:33    58:48 /home/cs260x/Ring/ring -s 129000
    compute-0-2: 
    cs260x   859  99.7   4.3 58696  45028 ?      Rs   10:23    68:49 /home/cs260x/Ring/ring -s 12900
    cs260x   996  87.2   4.2 175404 43984 ?      Rs   10:33    51:19 /home/cs260x/Ring/ring -s 129000
    compute-0-3: 
    cs260x  1118  98.2   4.3 60212  45028 ?      Rs   10:23    67:52 /home/cs260x/Ring/ring -s 12900
    cs260x  1266  88.4   4.2 176012 43988 ?      Rs   10:33    52:04 /home/cs260x/Ring/ring -s 129000
    
    
    Note the long running times, which are displayed in minutes and seconds. One of the jobs was started at 10:23, the other at 10:33. Nodes 2 and 3 are running 2 processes each; there are 2 processors per node. To clear out these processes, use the cluster-kill command:
    cluster-kill <username>
    You may only delete your own processes. If you attempt to delete processes belonging to another user, you'll see messages of the following form, which may be ignored:
         compute-0-13:
         kill 9363: Operation not permitted
         Connection to compute-0-13 closed by remote host.
         compute-0-14:
         kill 9904: Operation not permitted
         Connection to compute-0-14 closed by remote host.
         compute-0-15:
         kill 9662: Operation not permitted
         Connection to compute-0-15 closed by remote host.
    

    Be sure to re-run the cluster-ps command to make sure all is clear, but specify the user "cs260x" in order to search all course user IDs (including your instructor!). This method will filter out extraneous commands, making it easier to locate runaway processes:

    cluster-fork 'ps aux' | egrep "cs260x|compute-" | sed -f $PUB/bin/cl.sed
    
    If you find other running processes, and the user is not logged in (you can find that out with the who command), then notify the user by email. Since email doesn't work on Valkyrie, you'll need to finger the user's account to obtain their real name (e.g. finger cs260x) and then check the UCSD database, as in finger username@ucsd.edu.

    As a matter of etiquette, be sure to run cluster-ps before logging out. If you plan to be on the machine for a long time, it is a good idea to run this command occasionally, and before you start a long series of benchmark runs.


    MPI

    MPI documentation is found at http://www-cse.ucsd.edu/users/baden/Doc/mpi.html. You may obtain man pages for the MPI calls used in the example programs described here at http://www-unix.mcs.anl.gov/mpi/www/www3/



    Copyright © 2006 Scott B. Baden. Last modified: Mon Oct 16 14:03:24 PDT 2006