Date | Description
06-Oct-09 | Original posting
16-Oct-09 | Fixed the discussion about the impact of the -px, -py, and -pz flags. Added a discussion of roundoff errors in the residual. Discussed a strategy for setting the convergence threshold for the accelerated computation.

In this laboratory you'll parallelize an existing
serial implementation of Jacobi's Method in 3 dimensions using pthreads.
The serial `C` code is
available as the compressed archive file j3d_serial.tgz and, on Triton, in `~baden/cse260-fa09/j3d_serial`.
Matlab code is also available as the compressed archive
jac_matlab.tgz.
For this assignment, use
the Triton cluster but not the
PDAF nodes. You may take measurements using the interactive queue,
or with the batch queue. Do not take measurements on the "head" node
(front-end).

The program provides various command line options
which are parsed in the source file
`cmdLine.C`, including the following options you’ll need when
parallelizing the code:

-N <integer>    domain size, the bounds of the computational mesh
-i <integer>    # iterations
-err            report the error (residual in the L2 norm)
-px <integer>   number of processors on the x axis
-py <integer>   number of processors on the y axis
-pz <integer>   number of processors on the z axis

Thus, the command line

Jacobi3D -N 128 -i 5 -px 1 -py 2 -pz 4

will run `Jacobi3D` on 8 processors (threads) over a `128 ^{3}` mesh for
5 iterations.
The processors will be configured in a 1 × 2 × 4 geometry: 1 along the x axis, 2 along the y axis, and 4 along the z axis.

This is a "feature" of parallel computation. We assume that things generally happen in the same order every time we run the program, and that we know the specific order in which operations are performed. But we have no such guarantees in a parallel algorithm. We might not even have them in a serial program, owing to optimizations and differences in how compilers reorder floating-point operations.

Since double precision carries roughly 15 to 16 significant decimal digits, you'll need to increase the printing precision to see the effect. Another way is to compute in single precision, where the effect of the reordering is more pronounced. To run with single precision, remove the definition of the `DOUBLE` macro in the `Makefile`. (You will see that hooks have been added to the code to enable you to switch between single and double precision.)
**[10/16/09]**
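To see the effect concretely, here is a small, self-contained illustration (not part of the assignment code) of how summation order changes a single-precision result. The function names are invented for this example:

```c
#include <stdio.h>

/* Sum an array left-to-right and right-to-left in single precision.
 * With values of very different magnitudes, the two orders can give
 * different answers, because rounding happens after every addition. */
float sum_forward(const float *a, int n) {
    float s = 0.0f;
    for (int i = 0; i < n; i++) s += a[i];
    return s;
}

float sum_reverse(const float *a, int n) {
    float s = 0.0f;
    for (int i = n - 1; i >= 0; i--) s += a[i];
    return s;
}
```

On the array `{1e8f, 1.0f, -1e8f, 1.0f}`, the forward sum absorbs the first `1.0f` into `1e8f` (it is below the rounding granularity at that magnitude) and returns 1, while the reverse order returns 0. Jacobi's residual computation reduces many such terms, so a parallel reduction order can shift the low-order digits of the reported residual.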

The solver handles cubical meshes of size N × N × N. When debugging, choose a value of N such that you can run 50 iterations in 5 to 10 seconds on 1 processor. Runs will then complete in a few seconds on 4 and 8 cores.

Next, do a scaling study, running a fixed problem size on 1, 2, 4, and 8
processors. On a single graph,
plot the speedup S(P) = T(1)/T(P) as a function of P, the number of processors,
for different problem sizes N. To double the computational workload,
increase N by a factor of 2^{1/3} ≈ 1.26 (e.g. from N = 128 to N ≈ 161).
For 4 and 8 processors, experiment with different processor geometries.
Is performance sensitive to the geometry? If so, explain your observations
and report the
optimal geometry that minimizes the running time (there may be more than one for a given
number of processors).

The technique we'll use is to coarsen the solution and iterate several times to obtain an approximate solution, which serves as a good initial guess. This strategy contains the essential machinery for implementing multigrid, which is an algorithm for accelerating convergence.

Modify your code so that, once the RHS and initial guess have been generated, coarsened versions of the initial guess and RHS are produced and then smoothed by a Jacobi solver. This coarsened solution is then projected (via interpolation) back onto the original mesh and used as an improved initial guess.
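As a concrete, simplified illustration of this flow, here is a 1D analogue: a sketch assuming a 1D Poisson problem u'' = f with zero boundary conditions. The function names and mesh sizes are invented for the example and do not come from the assignment code:

```c
#include <string.h>

#define NF 17   /* fine-mesh points, boundaries included (illustrative size) */
#define NC 9    /* coarse-mesh points: every other fine point */

/* Plain Jacobi sweeps for u'' = f on a mesh of n points with spacing h.
 * (tmp is sized NF, the larger of the two meshes, so both levels fit.) */
static void jacobi1d(double *u, const double *f, int n, double h, int iters) {
    double tmp[NF];
    for (int it = 0; it < iters; it++) {
        for (int i = 1; i < n - 1; i++)
            tmp[i] = 0.5 * (u[i-1] + u[i+1] - h * h * f[i]);
        memcpy(&u[1], &tmp[1], (n - 2) * sizeof(double));
    }
}

/* Two-level solve: restrict the RHS, smooth on the coarse mesh,
 * prolong the result as an improved initial guess, then finish on
 * the fine mesh. Fills u[NF] with the approximate solution. */
void two_level(double *u, const double *f) {
    double fc[NC], uc[NC] = {0};
    memset(u, 0, NF * sizeof(double));
    for (int i = 0; i < NC; i++) fc[i] = f[2*i];       /* restriction */
    jacobi1d(uc, fc, NC, 2.0 / (NF - 1), 50);          /* coarse smoothing */
    for (int i = 0; i < NC; i++) u[2*i] = uc[i];       /* prolongation:  */
    for (int i = 1; i < NF - 1; i += 2)                /* copy + average */
        u[i] = 0.5 * (u[i-1] + u[i+1]);
    jacobi1d(u, f, NF, 1.0 / (NF - 1), 50);            /* fine iterations */
}
```

For constant forcing f ≡ 1 the exact solution is u(x) = (x² − x)/2, so the midpoint value should land near −0.125; the coarse-level pass supplies a much better starting guess than all zeros, which is the entire point of the two-level scheme.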

Examine the error
(L2 residual) in order
to demonstrate the improved running time of your
two-level solver when delivering
a given level of accuracy.
A convenient way to do this
is to set the convergence
threshold to the value of a previously reported
residual; that way you will get consistent results.
It’s possible that you’ll run for a slightly different
number of iterations, since the printout suppresses
some digits in the reported residual, but the
effect can be rendered harmless by running for a sufficient number of
iterations. **[10/16/09]**

You’ll need to iterate several times at the coarse level so that
you see a significant improvement in the time needed to converge.
To coarsen a mesh (multigrid *restriction*),
take the average of the nearby fine-mesh points.
Thus,
if C is the coarsening factor, each coarsened mesh point `(I,J,K)` is
obtained from the fine points `((I*C)+i, (J*C)+j, (K*C)+k)`, where `i, j, k`
range from 0 to C-1.
A coarsening factor of 2 should be sufficient, but larger values
may help speed up the convergence rate.
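A sketch of the restriction step in C, assuming a row-major n×n×n array layout (the `IDX` macro and the function name are invented for this example; adapt them to the assignment's data structures):

```c
#include <stddef.h>

/* Index into a row-major n*n*n mesh. */
#define IDX(u, n, i, j, k) ((u)[(((size_t)(i) * (n) + (j)) * (n)) + (k)])

/* Coarsen an n^3 fine mesh into an (n/C)^3 coarse mesh: each coarse
 * point (I,J,K) is the average of the C*C*C fine points
 * ((I*C)+i, (J*C)+j, (K*C)+k), with i, j, k ranging over [0, C). */
void restrict_mesh(const double *fine, double *coarse, int n, int C) {
    int nc = n / C;
    for (int I = 0; I < nc; I++)
        for (int J = 0; J < nc; J++)
            for (int K = 0; K < nc; K++) {
                double sum = 0.0;
                for (int i = 0; i < C; i++)
                    for (int j = 0; j < C; j++)
                        for (int k = 0; k < C; k++)
                            sum += IDX(fine, n, I*C + i, J*C + j, K*C + k);
                IDX(coarse, nc, I, J, K) = sum / (C * C * C);
            }
}
```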

Once we have the initial guess on the coarsened mesh,
we must transfer it to the original (fine) mesh.
We do this by upsampling the coarse mesh via interpolation
(multigrid *prolongation*).
The details were discussed in class, and are
also described in
Jim Demmel's notes.
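One common way to realize prolongation is trilinear interpolation. A sketch, again assuming a row-major layout and invented names:

```c
#include <stddef.h>

#define IDX(u, n, i, j, k) ((u)[(((size_t)(i) * (n) + (j)) * (n)) + (k)])

static double lerp(double a, double b, double t) { return a + t * (b - a); }

/* Fill an n^3 fine mesh (n = nc*C) by trilinearly interpolating an
 * nc^3 coarse mesh. Each fine point maps to coarse coordinates
 * (i/C, j/C, k/C); points past the last coarse sample clamp to it. */
void prolongate(const double *coarse, double *fine, int nc, int C) {
    int n = nc * C;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            for (int k = 0; k < n; k++) {
                int I = i / C, J = j / C, K = k / C;
                int I1 = I + 1 < nc ? I + 1 : I;
                int J1 = J + 1 < nc ? J + 1 : J;
                int K1 = K + 1 < nc ? K + 1 : K;
                double tx = (double)(i % C) / C;
                double ty = (double)(j % C) / C;
                double tz = (double)(k % C) / C;
                double c00 = lerp(IDX(coarse, nc, I,  J,  K ),
                                  IDX(coarse, nc, I1, J,  K ), tx);
                double c10 = lerp(IDX(coarse, nc, I,  J1, K ),
                                  IDX(coarse, nc, I1, J1, K ), tx);
                double c01 = lerp(IDX(coarse, nc, I,  J,  K1),
                                  IDX(coarse, nc, I1, J,  K1), tx);
                double c11 = lerp(IDX(coarse, nc, I,  J1, K1),
                                  IDX(coarse, nc, I1, J1, K1), tx);
                IDX(fine, n, i, j, k) =
                    lerp(lerp(c00, c10, ty), lerp(c01, c11, ty), tz);
            }
}
```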

With the fine mesh in hand, we may then iterate on it
to solve the problem. Note that our goal is to reduce the
number of iterations on the fine mesh, which are C^{3}
times
more costly than coarse-mesh iterations (for C = 2, each fine-mesh
iteration touches 8 times as many points). Vary the number
of coarse iterations in order to significantly reduce the running time
needed to obtain a solution meeting a given
error threshold.
I suggest that you choose a threshold roughly equal to 1/N.

Structure your code so that you use a single function to solve both the coarsened and refined problems.
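For instance, a single sweep routine parameterized by the mesh extent can serve both levels. This is only a sketch of such an interface; the names and the 7-point update shown are assumptions, not the assignment's actual code:

```c
#include <stddef.h>

#define IDX(u, n, i, j, k) ((u)[(((size_t)(i) * (n) + (j)) * (n)) + (k)])

/* One Jacobi sweep for the 7-point stencil on an n^3 mesh with spacing h:
 * unew(i,j,k) = (sum of the 6 axis neighbors - h^2 * rhs(i,j,k)) / 6.
 * Because n and h are parameters, the same routine smooths both the
 * coarse and the fine mesh. */
void jacobi_sweep(const double *u, double *unew, const double *rhs,
                  int n, double h) {
    for (int i = 1; i < n - 1; i++)
        for (int j = 1; j < n - 1; j++)
            for (int k = 1; k < n - 1; k++)
                IDX(unew, n, i, j, k) =
                    (IDX(u, n, i-1, j, k) + IDX(u, n, i+1, j, k) +
                     IDX(u, n, i, j-1, k) + IDX(u, n, i, j+1, k) +
                     IDX(u, n, i, j, k-1) + IDX(u, n, i, j, k+1) -
                     h * h * IDX(rhs, n, i, j, k)) / 6.0;
}
```

The caller supplies the mesh size and spacing for whichever level it is working on, so no solver logic is duplicated between the coarse and fine passes.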

Include a self-evaluation
discussing the division of labor and other aspects of how you worked
together as a team. A copy of the form is available
**here**.

Please put a **hard-copy** of your report
in my mailbox on the 2nd floor of the CSE Building (EBU3B).

Also transmit an **electronic copy**
by emailing a
single archive file, e.g. `.zip` or `.tgz`,
containing your writeup, source code, and the self-evaluation.
**Note: submitting the electronic copy establishes the turnin
time of your assignment.**

Be sure to delete all .o files and executables before creating the archive file.
Email the archive file as an attachment to baden@ucsd.edu,
and put the string `cse260:A2` in the subject line.

The name of your archive file should start with your login ID,
followed by a colon (:),
e.g. `baden:a2.tgz`. When your archive file is extracted,
a new
directory should be created with the same name as the archive file,
excluding the filename extension, e.g. `baden:a2`.
The directory should contain two subdirectories:

src | All source code, including your makefile, so we can build the software.

report | Your report, including any image files, screen shots, or other output you wish us to see.

Copyright © 2009 Scott B. Baden

Maintained by baden@ucsd.edu [Fri Oct 16 18:46:34 PDT 2009]