A Brief Guide to DataStar

Changelog

Date       Description
29-Jan-08  Added discussion about the showq command
25-Jan-08  Added discussion about using the Express queue
01-Jan-08  Editing changes
26-Dec-07  Original posting


Introduction

DataStar is a parallel computer located at the San Diego Supercomputer Center and is manufactured by IBM. The machine is a heterogeneous collection of 8-way and 32-way SMP nodes interconnected by a high performance switch. The nodes contain IBM Power4 CPUs, but differ in the amount of memory they contain and the clock speed of the processors. See the System Configuration for more information, as well as The POWER4 Processor Introduction and Tuning Guide (IBM, one of the famed "Red Books").

The purpose of this web page is to get you started running jobs on the machine. But be sure to look over the DataStar User's Guide carefully in order to appreciate DataStar's full range of capabilities. Here is documentation on the IBM C++ and Fortran Compilers and on the BLAS libraries. NERSC also maintains a web site listing more IBM Documentation.

Environment

DataStar has a front end called dslogin.sdsc.edu. From this node you may compose and compile code, and submit batch jobs. This machine should not be used to run jobs interactively. There is a special interactive node for this purpose, called dspoe.sdsc.edu. (There is another interactive node called dsdirect.sdsc.edu with 32 CPUs and 64GB of memory. It is intended for large memory jobs and should not normally be used in class.) Dspoe is a p655 server with 8 processors and 16 GB of memory. You may not use more than 5% of the node’s memory, however.

You may also run in batch mode to obtain dedicated access to nodes, although the switch will be shared with other jobs. Use batch mode when collecting performance measurements, or whenever reproducibility is important. Instructions for batch mode are given below.

Compiling and Running on DataStar

A public directory has been set up in dslogin.sdsc.edu:~baden/cse260_wi08 containing source code you’ll use in your assignments. From now on we’ll refer to this directory as $(PUB).

To establish that your environment has been set up correctly, compile and run the provided parallel "hello world" program, which prints "Hello World" from each process along with the process ID. The code for hello world is found in $(PUB)/Examples/Basic/hello. Be sure to use the Makefile that we've supplied so you'll get the correct compiler and loader flags for DataStar.

You’ll notice that the Makefile includes a file called arch.dstar. This file configures the Makefile to use the IBM compilers that incorporate the MPI library, and sets up various compiler and loader flags. We'll use thread-safe versions of the compilers, which are distinguished with names ending in the suffix _r, e.g. mpCC_r for the thread-safe C++ compiler, and so on. (For serial code you may use xlC.) The "arch" file should not normally be changed. If you do modify it, be sure to document any changes you made in your writeups. (Contact us if you need to run on other machines.)
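As a rough sketch of what the supplied Makefile does under the hood (the actual flags live in arch.dstar and take precedence), a hand compilation with the thread-safe MPI C++ wrapper might look like the following. The optimization and architecture flags shown are illustrative, not necessarily the ones the class Makefile uses:

```shell
# Illustrative only -- use the supplied Makefile in practice.
# mpCC_r is the thread-safe MPI C++ compiler wrapper.
mpCC_r -O3 -qarch=pwr4 -qtune=pwr4 -c hello.C
mpCC_r -o hello hello.o
```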

Run interactively on dspoe.sdsc.edu using the poe command as follows:

poe hello -nodes 1 -tasks_per_node 2 -rmpool 1 -euilib us -euidevice sn_all

Here is some sample output:

# processes: 2
Hello world from node 0
Hello world from node 1

Note that any command line arguments come in the usual position, after the name of the executable, and are followed by the arguments to poe. Thus, to run the Ring program (found in $(PUB)/A2/Ring) on 16 CPUs with command line arguments -lin 0 1024 64, enter the following:

poe ring -lin 0 1024 64 -nodes 2 -tasks_per_node 8 -rmpool 1 -euilib us -euidevice sn_all

This invocation runs a job on 2 nodes with 8 processes per node. You may vary the poe parameters nodes and tasks_per_node up to the limits of the configuration. The remaining parameters should not be changed. You may also set up the configuration with environment variables:

setenv MP_NODES 2
setenv MP_TASKS_PER_NODE 8
setenv MP_RMPOOL 1
setenv MP_EUILIB us
setenv MP_EUIDEVICE sn_all
poe ring -lin 0 1024 64

For more details, see the on-line instructions.

Running Batch Jobs

When you are ready to collect measurements, make your production runs using the batch subsystem. Batch jobs are submitted with the llsubmit command:

llsubmit run_batch.sh

where run_batch.sh is a job submission script containing the appropriate environment settings and one or more runs that you wish to make. We’ve set up a script file in $(PUB)/A2/Hello. You will need to make only minimal changes to the script, as noted in the file. (LoadLeveler scripts indicate options with a leading # mark; these lines are not comments unless there are two ## in a row.)
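The provided script follows the usual LoadLeveler shape, sketched below with illustrative values. The keyword names are standard LoadLeveler, but the actual settings in the class script may differ:

```shell
#!/bin/sh
# Sketch of a LoadLeveler submission script (values are illustrative).
# Lines beginning "# @" are LoadLeveler options, not shell comments.
# @ job_name         = HELLO_WORLD
# @ class            = high
# @ node             = 1
# @ tasks_per_node   = 2
# @ wall_clock_limit = 00:05:00
# @ output           = HELLO_WORLD_$(jobid).out
# @ error            = HELLO_WORLD_$(jobid).err
# @ notification     = always
# @ queue
poe ./hello
```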

Once your job has been submitted, it will wait in the queue until the specified resources are available. You may check on the status of your job using the llq command. If you specify your user ID, as in llq -u baden, you'll see only your jobs.

You can remove jobs with the llcancel command:


ds100 [5]: llq -u baden
Id               Owner      Submitted   ST PRI Class        Running On 
---------------- ---------- ----------- -- --- ------------ -----------
ds100.404902.0   baden       1/14 07:55 I  50  high      
ds100.404906.0   baden       1/14 07:33 I  50  high     

2 job step(s) in query, 2 waiting, 0 pending, 0 running, 0 held, 0 preempted

ds100 [51]: llcancel ds100.404902.0
llcancel: Cancel command has been sent to the central manager.

Once your job has been dispatched, you'll get an email confirmation, and another when your job has completed. The provided script is set up to write all output to a file whose name contains a unique identifier, as in HELLO_WORLD_404906.out. You also get an error output file with the same prefix. Normally the error file will show a successful outcome, as in

ATTENTION: 0031-408 4 tasks allocated by LoadLeveler, continuing...

but if there are errors in submitting or running the job (e.g., executable not found, or resource limits exceeded), you'll be notified:

/var/loadl/execute/ds002.313841.0/run.sh[108]: ./ring: not found.

You may also have the output mailed to you. The .err file will contain some information about abnormal job termination. See the documentation for the details.

More on Queues

There are various batch queues; they vary according to factors such as expected job length, maximum number of nodes, and cost. For small jobs running on 1 or 2 nodes for a minute or less (such as in our programming assignments), use the high queue. You must submit your job from dsdirect.sdsc.edu rather than dspoe.sdsc.edu. The provided script specifies this queue.

There is also an express queue. Jobs destined for this queue must be submitted from dspoe. To use this queue, modify the job submission script as follows. (See the example script $(PUB)/A2/Ringrun_1x4.sh.)

In particular, you must change the queue designation as follows:

# @ class               = express

and add the following line (you can put it under the class designation):

# @ requirements        = (OpSys=="AIX53")

With these two changes, you can run from the express queue, provided you submit from dspoe rather than dslogin.

For longer running times or larger numbers of processors (for some projects), use the normal queue. Be sure to see us if you need to use the normal queue. Consult DataStar's documentation for more information.

Keep the job time limit low until you understand the performance of your application; then adjust it carefully by modifying the time limit specified in the provided scripts. Longer jobs may have to wait longer in the queue, though wait times vary with other activity on the system.
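Assuming the scripts use the standard LoadLeveler keyword for this, the time limit appears as a wall_clock_limit option; for example, a five-minute cap would read:

```shell
# @ wall_clock_limit = 00:05:00
```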

To get an idea of when your job will finish, use the showq command. It displays all running and pending jobs, along with the number of nodes, the start time, and the time limit the user specified. For example, one run of showq showed a 64-node job with 10 hours, 50 minutes, and 50 seconds remaining, a 64-node idle job waiting to run, and so on.

Performance Measurement

There are various tools for measuring single processor performance. You may also use the high resolution timer to collect sub-microsecond times. Look at the dotslow example program (in $(PUB)/Examples) to see how these are used, and consult the IBM AIX documentation on high resolution clocks, as well as the document on CPU monitoring and tuning.

Note that some CPUs run at 1.5 GHz, while others run at 1.7 GHz. Unless specified otherwise, both the normal and high queues span both 1.5 GHz and 1.7 GHz nodes. You can identify the type from the host name, which includes a node number: nodes numbered 100-299 have 1.5 GHz processors, and nodes numbered 300-399 have 1.7 GHz processors. To get reproducible, consistent results, specify the type of node you wish to use by adding the following lines to the LoadLeveler job script.

For nodes with only 1.5GHz CPUs add:

	#@ requirements = (Feature == "MEM16")

For nodes with only 1.7GHz CPUs add:

	#@ requirements = (Feature == "MEM32")

If you are combining options, use logical operators, as in

	#@ requirements = (Feature == "MEM32") && (OpSys == "AIX53")

to also satisfy the express queue's operating-system requirement.

Porting code to DataStar

On the IBM, you may need to use gmake rather than make, as the native make won't accept some conventions accepted by GNU Make.

If you run out of memory, try compiling in 64-bit mode. See DataStar's documentation for more information.
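A minimal sketch, assuming the IBM XL toolchain: the -q64 flag selects 64-bit compilation, and setting OBJECT_MODE=64 tells the AIX binutils (ar, ld) to expect 64-bit objects. The file names here are placeholders:

```shell
# Illustrative only; adapt your Makefile's flags rather than typing these.
export OBJECT_MODE=64         # AIX: make ar/ld default to 64-bit objects
mpCC_r -q64 -O3 -c hello.C    # compile in 64-bit mode
mpCC_r -q64 -o hello hello.o  # link in 64-bit mode
```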


Maintained by baden@ucsd.edu   [ Wed Jan 30 15:25:15 PST 2008 ]