Date | Description |
06-Oct-09 | Original posting |
16-Oct-09 | Fixed discussion about the impact of the -px, -py, and -pz flags. Added a discussion of roundoff errors in the residual. Discussed a strategy for setting the convergence threshold for the accelerated computation.
In this laboratory you'll parallelize an existing serial implementation of Jacobi's Method in 3 dimensions using pthreads. The serial C code is available as the compressed archive file j3d_serial.tgz and, on Triton, in ~baden/cse260-fa09/j3d_serial. Matlab code is also available as the compressed archive jac_matlab.tgz. For this assignment, use the Triton cluster but not the PDAF nodes. You may take measurements using the interactive queue, or with the batch queue. Do not take measurements on the "head" node (front-end).
The program provides various command line options which are parsed in the source file cmdLine.C, including the following options you’ll need when parallelizing the code:
-N <integer>   domain size (the bounds of the computational mesh)
-i <integer>   number of iterations
-err           report the error (residual in the L2 norm)
-px <integer>  number of processors on the x axis
-py <integer>  number of processors on the y axis
-pz <integer>  number of processors on the z axis
Thus, the command line

Jacobi3D -N 128 -i 5 -px 1 -py 2 -pz 4

will run Jacobi3D on 8 processors (threads) over a 128^3 mesh for 5 iterations, with the processors configured in a 1 x 2 x 4 geometry. Note that the -px, -py, and -pz flags have no effect in the serial code; specifying a non-unity value for any of these flags will produce incorrect results until you have correctly parallelized the code. [10/16/09]
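As a rough sketch of what this geometry means once the code is threaded, each thread owns one block of the mesh. The function below is only an illustration, with hypothetical names (block_bounds, t, x0, x1, ...) that do not correspond to the provided sources; it computes the sub-block of an N x N x N mesh owned by thread t in a px x py x pz geometry:

    /* Sketch only: compute the sub-block of an N x N x N mesh owned by thread t
       in a px x py x pz processor geometry.  All names are illustrative; the
       provided code may organize its decomposition differently. */
    void block_bounds(int t, int N, int px, int py, int pz,
                      int *x0, int *x1, int *y0, int *y1, int *z0, int *z1)
    {
        int tx = t % px;                 /* thread coordinates within the geometry */
        int ty = (t / px) % py;
        int tz = t / (px * py);

        int bx = N / px, by = N / py, bz = N / pz;    /* nominal block sizes */

        *x0 = tx * bx;  *x1 = (tx == px - 1) ? N : *x0 + bx;  /* last thread absorbs any remainder */
        *y0 = ty * by;  *y1 = (ty == py - 1) ? N : *y0 + by;
        *z0 = tz * bz;  *z1 = (tz == pz - 1) ? N : *z0 + bz;
    }

Thread t would then update the mesh points (i, j, k) with x0 <= i < x1, y0 <= j < y1, and z0 <= k < z1.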
This is a "feature" of parallel computation. We tend to assume that things happen in the same order every time we run a program, and that we know the specific order in which they are done. A parallel algorithm gives us no such guarantees. We might not even have them in a serial program, owing to optimizations and differences in how compilers reorder floating-point operations.
Since double precision carries roughly 15 significant decimal digits, you’ll need to increase the printing precision to see the effect. Another way is to compute in single precision, where the effect of the reordering is more pronounced. To run with single precision, remove the definition of the DOUBLE macro in the Makefile. (You will see that hooks have been added to the code to enable you to switch between single and double precision.) [10/16/09]
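As a standalone illustration (not part of the provided code) of why the printed precision matters, the following complete program shows that floating-point addition is not associative, and that the effect is easier to provoke in single precision:

    #include <stdio.h>

    /* Illustration only: the order of summation changes the result, which is why
       a parallel reduction of the residual can differ slightly from the serial one. */
    int main(void)
    {
        double a = 1.0e16, b = -1.0e16, c = 1.0;
        printf("%.17g\n", (a + b) + c);   /* prints 1 */
        printf("%.17g\n", a + (b + c));   /* prints 0: the 1.0 is absorbed into 1e16 */

        float x = 1.0e8f, y = -1.0e8f, z = 1.0f;
        printf("%.9g\n", (x + y) + z);    /* single precision: same effect at a smaller magnitude */
        printf("%.9g\n", x + (y + z));
        return 0;
    }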
The solver handles cubical meshes of size N × N × N. When debugging, choose a value of N such that you can run 50 iterations in between 5 and 10 seconds on 1 processor. This means that runs will complete in a few seconds on 4 and 8 cores.
Next, do a scaling study, running a fixed problem size on 1, 2, 4, and 8 processors. On a single graph, plot the speedup as a function of P, the number of processors, for different problem sizes N. To double the computational workload, increase N by a factor of 2^(1/3) ≈ 1.26. For 4 and 8 processors, experiment with differing processor geometries. Is performance sensitive to the geometry? If so, explain your observation and report the optimal geometry that minimizes the running time (there may be more than one for a given number of processors).
The technique we'll use is to coarsen the solution and iterate several times to obtain an approximate solution, which serves as a good initial guess. This strategy contains the essential machinery for implementing multigrid, which is an algorithm for accelerating convergence.
Modify your code so that, once the RHS and initial guess have been generated, coarsened versions of the initial guess and RHS are produced and then smoothed by a Jacobi solver. This coarse solution is then projected (via interpolation) back onto the original mesh and used as an improved initial guess.
Examine the error (L2 residual) in order to demonstrate the improved running time of your two-level solver when delivering a given level of accuracy. A convenient way to do this is to set the convergence threshold to the value of a previously reported residual; that way you will get consistent results. It’s possible that you’ll run for a slightly different number of iterations, since the printout suppresses some digits of the reported residual, but the effect can be rendered harmless by running for a sufficient number of iterations. [10/16/09]
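For instance, a minimal sketch of this strategy, using hypothetical helper names (residual(), jacobi_sweep()) rather than the ones in the provided code:

    /* Sketch (hypothetical names): iterate until the L2 residual falls below a
       threshold eps recorded from an earlier, unaccelerated run, so that both
       versions are compared at the same accuracy. */
    int iters = 0;
    double resid = residual(u, rhs, N);
    while (resid > eps && iters < maxIters) {
        jacobi_sweep(unew, u, rhs, N);            /* one Jacobi sweep: u -> unew */
        double *tmp = u; u = unew; unew = tmp;    /* swap the mesh buffers */
        resid = residual(u, rhs, N);
        iters++;
    }
    printf("residual %.12e after %d iterations\n", resid, iters);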
You’ll need to iterate several times at the coarse level so that you see a significant improvement in the time needed to converge. To coarsen a mesh (multigrid restriction), take the average of the 6 nearest neighbors (Manhattan directions). Thus, if C is the coarsening factor, each coarsened mesh point (I, J, K) will be obtained from the points ((I*C)+i, (J*C)+j, (K*C)+k), where i, j, and k range from 0 to C-1. A coarsening factor of 2 should be sufficient, but larger values may help speed up the convergence rate.
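A minimal sketch of the block-average form of this restriction (averaging each C x C x C block of fine points as indicated above; the 6-neighbor average is a simple variant), assuming a flat row-major array layout that may differ from the provided code:

    /* Sketch: restrict a fine N x N x N mesh to a coarse (N/C)^3 mesh by averaging
       each C x C x C block of fine points.  Layout and names are illustrative. */
    void restrict_mesh(const double *fine, double *coarse, int N, int C)
    {
        int Nc = N / C;
        for (int I = 0; I < Nc; I++)
            for (int J = 0; J < Nc; J++)
                for (int K = 0; K < Nc; K++) {
                    double sum = 0.0;
                    for (int i = 0; i < C; i++)
                        for (int j = 0; j < C; j++)
                            for (int k = 0; k < C; k++)
                                sum += fine[((I*C + i)*N + (J*C + j))*N + (K*C + k)];
                    coarse[(I*Nc + J)*Nc + K] = sum / (double)(C*C*C);
                }
    }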
Once we have the initial guess on the coarsened mesh, we must transfer it to the original (fine) mesh. We do this by upsampling the coarse mesh via interpolation (multigrid prolongation). The details were discussed in class, and are also described in Jim Demmel's notes.
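Since the details are covered in class and in the notes, only a rough sketch is given here: trilinear interpolation, assuming coarse point (I, J, K) coincides with fine point (C*I, C*J, C*K) and a flat row-major layout (both assumptions are illustrative, not requirements of the provided code):

    /* Sketch: prolong (interpolate) a coarse (N/C)^3 mesh onto a fine N x N x N mesh
       by trilinear interpolation, clamping at the high boundary.  Illustrative only. */
    static inline double cval(const double *c, int Nc, int I, int J, int K)
    {
        return c[(I*Nc + J)*Nc + K];
    }

    void prolong_mesh(const double *coarse, double *fine, int N, int C)
    {
        int Nc = N / C;
        for (int i = 0; i < N; i++) {
            int I = i / C, I1 = (I + 1 < Nc) ? I + 1 : I;
            double wx = (double)(i - I*C) / C;        /* fractional offset in x */
            for (int j = 0; j < N; j++) {
                int J = j / C, J1 = (J + 1 < Nc) ? J + 1 : J;
                double wy = (double)(j - J*C) / C;
                for (int k = 0; k < N; k++) {
                    int K = k / C, K1 = (K + 1 < Nc) ? K + 1 : K;
                    double wz = (double)(k - K*C) / C;
                    /* blend the 8 surrounding coarse values */
                    fine[(i*N + j)*N + k] =
                        (1-wx)*(1-wy)*(1-wz)*cval(coarse,Nc,I ,J ,K ) +
                           wx *(1-wy)*(1-wz)*cval(coarse,Nc,I1,J ,K ) +
                        (1-wx)*   wy *(1-wz)*cval(coarse,Nc,I ,J1,K ) +
                           wx *   wy *(1-wz)*cval(coarse,Nc,I1,J1,K ) +
                        (1-wx)*(1-wy)*   wz *cval(coarse,Nc,I ,J ,K1) +
                           wx *(1-wy)*   wz *cval(coarse,Nc,I1,J ,K1) +
                        (1-wx)*   wy *   wz *cval(coarse,Nc,I ,J1,K1) +
                           wx *   wy *   wz *cval(coarse,Nc,I1,J1,K1);
                }
            }
        }
    }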
With the fine mesh in hand, we may then iterate on it to solve the problem. Note that our goal is to reduce the number of iterations on the fine mesh, which are C^3 times more costly than coarse mesh iterations. Vary the number of coarse iterations in order to significantly reduce the running time needed to obtain a solution with an error meeting a given threshold. I suggest that you choose a threshold roughly equal to 1/N.
Structure your code so that you use a single function to solve both the coarsened and refined problems.
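One possibility, sketched below with hypothetical names (jacobi_solve plus the restriction/prolongation routines sketched earlier), is a solve routine parameterized by the mesh size, called once per level:

    #include <stdlib.h>

    /* Hypothetical prototypes: a single Jacobi solve routine parameterized by mesh
       size n, plus the restriction/prolongation sketches shown above. */
    void jacobi_solve(double *u, const double *rhs, int n, int iters);
    void restrict_mesh(const double *fine, double *coarse, int N, int C);
    void prolong_mesh(const double *coarse, double *fine, int N, int C);

    /* Sketch of a two-level driver: the same jacobi_solve() handles both levels. */
    void two_level_solve(double *u, const double *rhs, int N, int C,
                         int coarseIters, int fineIters)
    {
        int Nc = N / C;
        double *uc   = calloc((size_t)Nc * Nc * Nc, sizeof(double));
        double *rhsc = malloc((size_t)Nc * Nc * Nc * sizeof(double));

        restrict_mesh(u,   uc,   N, C);              /* coarsen the initial guess ...      */
        restrict_mesh(rhs, rhsc, N, C);              /* ... and the right-hand side        */
        jacobi_solve(uc, rhsc, Nc, coarseIters);     /* cheap smoothing on the coarse mesh */
        prolong_mesh(uc, u, N, C);                   /* improved initial guess, fine mesh  */
        jacobi_solve(u, rhs, N, fineIters);          /* finish on the fine mesh            */

        free(uc);
        free(rhsc);
    }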
Include a self-evaluation discussing the division of labor and other aspects of how you worked together as a team. A copy of the form is available here.
Please put a hard-copy of your report in my mailbox on the 2nd floor of the CSE Building (EBU3B).
Also transmit an electronic copy by emailing a single archive file, e.g. .zip or .tgz, containing your writeup, source code, and the self-evaluation. Note: submitting the electronic copy establishes the turnin time of your assignment.
Be sure to delete all .o files and executables before creating the tar file. Email the archive file as an attachment to the following address, and put the string cse260:A2 in the Subject: line:
baden @ ucsd . edu
The name of your archive file should start with your login ID, followed by a colon (:), e.g. baden:a2.tgz. When your archive file is extracted, a new directory should be created with the same name as the archive file, excluding the filename extension, e.g. baden:a2. The directory should contain two subdirectories:
src: | All source code, including your makefile, so we can build the software.
report: | Your report, including any image files, screen shots, or other output you wish us to see.
Maintained by baden @ ucsd . edu [Fri Oct 16 18:46:34 PDT 2009]