|2-Nov-06||Added discussion about how to take timings using high resolution timer.|
|12-Nov-06||Added notes about porting code to DataStar.|
Here we give an overview of the environment and some tips for getting started.
Be sure to read the DataStar User's Guide carefully to appreciate DataStar's full range of capabilities.
Here is documentation on the
IBM C++ and Fortran Compilers and on the
DataStar has a front end called dslogin.sdsc.edu. From this node you may develop code and submit batch jobs, but you should never run jobs on it. There is also an interactive node, described below. In batch mode you may obtain dedicated nodes, although the switch will be shared with other jobs. Batch mode is therefore appropriate for collecting performance measurements, where reproducibility is important.
A copy of Valkyrie's public directory has been set up in
At the top level of this directory you'll find a version of the arch file set up for DataStar.
This file, called arch.dstar, configures your Makefile to use IBM's compilers, which automatically include the MPI library.
We'll use the thread-safe versions of the compilers, whose names end in the suffix _r, e.g. mpCC_r for the thread-safe C++ compiler.
(For serial code you may use xlC; as with parallel code, do not run it on the front end.)
Use the interactive node, dsdirect.sdsc.edu, for code development. This node is a p690 server with 32 processors and 64 GB of memory, and you may use all of its memory if necessary. However, use memory with care: our class account is charged for both the number of processors and the amount of memory used.
To run interactively, use the poe32 command. For example, to run the Ring program on 16 CPUs with the command line arguments -lin 0 1024 64, enter the following:
You may also set up the configuration with environment variables:
When you are ready to collect measurements, make your production runs using the batch subsystem. There are many batch queues, which differ in expected job length, maximum number of nodes, and cost. If your job requires 4 nodes (32 CPUs) or fewer and can live within 16 GB of memory per node, use the Express queue. (The maximum time limit is two hours, but in this course we'll run for far less time. See me if you need to make longer runs.) The provided script specifies the normal queue; a high-priority queue is also specified in a commented-out line of the script.
Express queue jobs must be submitted from the special front end dspoe.sdsc.edu. Larger jobs should be submitted from dslogin.sdsc.edu and use one of the other queues, which give access to up to 265 nodes (2120 processors). Among these, be sure to use the "normal" queue unless we've discussed the matter, as the higher priority queues drain our bank account more quickly. Similarly, be careful when running on more than 16 nodes (128 processors). Consult the documentation for more information.
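The queue guidelines above can be summarized as a small decision function. This is a hypothetical helper, not part of the provided scripts: the names chooseQueue and cpusForNodes are made up for illustration, the class names "express" and "normal" are assumed to match the queue names the scripts use, and the figure of 8 CPUs per node is inferred from the numbers in the text (4 nodes = 32 CPUs, 265 nodes = 2120 processors). Time limits and the higher priority queues are not modeled.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch of the queue-selection guidelines; not part of the course scripts.
// Assumption: 8 CPUs per batch node, inferred from "4 nodes (32 CPUs)" and
// "265 nodes (2120 processors)".
constexpr int kCpusPerNode = 8;

struct QueueChoice {
    std::string queue;       // queue (class) name to request; names assumed
    std::string submitHost;  // front end to submit the job from
    bool checkFirst;         // true for runs on more than 16 nodes (128 CPUs)
};

inline int cpusForNodes(int nodes) { return nodes * kCpusPerNode; }

inline QueueChoice chooseQueue(int nodes, double memPerNodeGB) {
    QueueChoice c{};
    if (nodes <= 4 && memPerNodeGB <= 16.0) {
        // Small jobs: Express queue, submitted from dspoe.
        c.queue = "express";
        c.submitHost = "dspoe.sdsc.edu";
    } else {
        // Larger jobs: the normal queue, submitted from dslogin.
        c.queue = "normal";
        c.submitHost = "dslogin.sdsc.edu";
    }
    // Be careful with runs on more than 16 nodes.
    c.checkFirst = nodes > 16;
    return c;
}
```

For example, chooseQueue(4, 16.0) selects the express queue on dspoe.sdsc.edu, while chooseQueue(8, 4.0) falls through to the normal queue on dslogin.sdsc.edu.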
Keep the job time limit low until you understand the performance of your application. Then adjust it carefully by modifying the time limit specified in the provided scripts. Jobs with longer limits may have to wait longer in the queue, though wait times also vary with other activity on the machine.
Submit batch jobs with the
where the file p4_4 contains the appropriate environment settings and the run or runs you wish to make. A copy of p4_4 is in A1/Ring_new. You will need to make some changes before using the script; these are noted in the script itself.
Once your job has been submitted, it will wait in the queue until the appropriate resources are available. You may check on the status of your job using the llq command. If you specify your user ID, as in llq -u baden, you'll see only your own jobs.
You can remove jobs with the llcancel command:
    ds100 : llq -u baden
    Id                       Owner      Submitted   ST PRI Class        Running On
    ------------------------ ---------- ----------- -- --- ------------ -----------
    ds100.240242.0           baden      11/1 22:03  C  50  express
    ds100.240243.0           baden      11/1 22:04  C  50  express
    ds100.240244.0           baden      11/1 22:07  C  50  express
    0 job step(s) in query, 0 waiting, 0 pending, 0 running, 0 held, 0 preempted
    ds100 : llcancel ds100.240242.0
    llcancel: Cancel command has been sent to the central manager.
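If you want to script around llq, for example to collect the job step IDs to pass to llcancel, each job-step line of the listing can be split on whitespace. The sketch below is hypothetical: it assumes the column layout shown in the llq listing (Id, Owner, a two-token Submitted date and time, ST, PRI, Class) and step IDs of the form host.jobnumber.stepnumber, and the names JobStep and parseLlqLine are made up for illustration. Check llq's actual output on your system before relying on it.

```cpp
#include <cassert>
#include <cctype>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// One job step as listed by llq (hypothetical type; fields follow the column headers).
struct JobStep {
    std::string id;        // e.g. "ds100.240242.0"
    std::string owner;     // e.g. "baden"
    std::string state;     // ST column, e.g. "C"
    std::string jobClass;  // Class column, e.g. "express"
};

static bool allDigits(const std::string& s) {
    if (s.empty()) return false;
    for (unsigned char ch : s)
        if (!std::isdigit(ch)) return false;
    return true;
}

// Assumed step ID format: host.jobnumber.stepnumber (the host part may contain dots).
static bool looksLikeStepId(const std::string& tok) {
    auto last = tok.rfind('.');
    if (last == std::string::npos || last == 0) return false;
    auto prev = tok.rfind('.', last - 1);
    if (prev == std::string::npos || prev == 0) return false;
    return allDigits(tok.substr(prev + 1, last - prev - 1)) &&
           allDigits(tok.substr(last + 1));
}

// Parses one line of llq output. Header, separator, and summary lines yield nullopt.
std::optional<JobStep> parseLlqLine(const std::string& line) {
    std::istringstream in(line);
    std::vector<std::string> tok;
    for (std::string t; in >> t;) tok.push_back(t);
    // Id, Owner, date, time, ST, PRI, Class (Running On may be empty).
    if (tok.size() < 7 || !looksLikeStepId(tok[0])) return std::nullopt;
    return JobStep{tok[0], tok[1], tok[4], tok[6]};
}
```

Feeding each line of llq's output through parseLlqLine and keeping the id fields gives the arguments you would hand to llcancel.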
Once your job has been dispatched, you'll receive an email confirmation, and you'll receive another when it has completed. The provided script places all output in a file whose name contains a unique identifier, as in Ring_4_4_240241.out, so you can run the job several times without overwriting earlier output. You also get an error output file with the same prefix. Normally the error file will show successful outcomes, as in
There are various tools for measuring single-processor performance.
You may also use the high resolution timer to collect sub-microsecond
times. Look at the dotslow example to see how the timer is used,
and consult the IBM AIX documentation
on high resolution clocks, as well as the AIX documentation on
CPU monitoring and tuning.
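The dotslow example and the AIX documentation describe the platform's own high resolution timer. As a portable alternative that also builds on Valkyrie, here is a minimal sketch using the C++11 std::chrono::steady_clock instead of the AIX-specific timer; it needs a C++11 compiler, and its resolution depends on the platform. The helper name minTimeSeconds is hypothetical. It repeats the measurement and keeps the fastest run, a common way to reduce the effect of interference from other activity.

```cpp
#include <cassert>
#include <chrono>
#include <limits>

// Hypothetical helper: returns the shortest of `reps` wall-clock timings of f(), in seconds.
// Uses std::chrono::steady_clock (monotonic), not the AIX high resolution timer.
template <class F>
double minTimeSeconds(F f, int reps) {
    assert(reps > 0);
    using Clock = std::chrono::steady_clock;
    double best = std::numeric_limits<double>::infinity();
    for (int r = 0; r < reps; ++r) {
        auto t0 = Clock::now();
        f();
        auto t1 = Clock::now();
        // Convert the elapsed tick count to seconds as a double.
        double secs = std::chrono::duration<double>(t1 - t0).count();
        if (secs < best) best = secs;
    }
    return best;
}
```

For example, minTimeSeconds([&] { kernel(a, n); }, 10) would time a hypothetical kernel function ten times and return the fastest run.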
Porting code to DataStar
You'll find that the IBM compilers are a bit fussier than the GNU
compilers. In some cases this is for the better; in others
it is due to differences in the version of the language they implement.
After you've ported the code to DataStar, you may find it convenient
to set things up so that you can move the code freely between
IBM's compilers and Valkyrie's, or any others you are using.
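One common way to keep a single source tree building under both IBM's and the GNU compilers is to isolate compiler-specific code behind each compiler's predefined macros. A minimal sketch follows; the function compilerName is hypothetical. IBM XL C++ defines __IBMCPP__ and __xlC__, and g++ defines __GNUC__; Clang also defines __GNUC__, so it is tested first.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: reports which compiler built this translation unit.
// The same #if pattern can guard any compiler-specific workaround.
inline std::string compilerName() {
#if defined(__IBMCPP__) || defined(__xlC__)
    return "IBM XL C++";
#elif defined(__clang__)
    return "Clang";
#elif defined(__GNUC__)
    return "GNU g++";
#else
    return "unknown";
#endif
}
```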
On DataStar, you'll need to use gmake rather than make, since the system make doesn't accept some conventions that GNU Make does.
If you run out of memory, try compiling in 64-bit mode. Look here for more information.
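With the IBM compilers, 64-bit mode is selected with the -q64 option (setting OBJECT_MODE=64 in the environment has the same effect). To confirm which mode an executable was actually built in, a program can check its pointer width at startup; the helper addressBits below is a hypothetical illustration. A 32-bit process can address at most 2^32 bytes (4 GB), and in practice less, so problems that need more memory require a 64-bit build.

```cpp
#include <cassert>
#include <climits>

// Hypothetical helper: width of a data pointer in bits
// (32 for a 32-bit build, 64 for a 64-bit build).
constexpr int addressBits() {
    return static_cast<int>(sizeof(void*) * CHAR_BIT);
}

// Fail the build if the pointer size is neither 32 nor 64 bits.
static_assert(sizeof(void*) == 4 || sizeof(void*) == 8, "unexpected pointer size");
```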
Maintained by Scott B. Baden, Last modified: Sun Nov 12 15:25:10 PST 2006