MPI Implementation of an Iterative solver for Solving Poisson's Equation
| Date | Description |
| 21-Oct-08 | Original posting |
| 24-Oct-08 | Added a tarball containing need parts of public direcory |
| 28-Oct-08 | Addendum to assignment for running on Abe. |
Note: A copy of the public files need to complete the assignment, /home/cs260f/public-cs260f contained in this tarball:
Parallelize the code using MPI. Implement 1, 2, and 3-dimensional geometries, and properly handle the case when N3 is not divisible by the thread geometry. Verify that the code is correct by comparing the final error (residual) for runs performed with different numbers of processors and with different processor geometries. Print out the value of the residual to 6 decimal places. To what degree are the results independent of the number of threads and the processor geometry? Be sure to provide an explanation.
The solver handles cubical meshes of size N × N × N. Choose a value of N such that you can run 100 iterations in under 10 seconds on 1 processor.
If the number of processors Pkdoes not divide Nevenly along some axis k, you must make sure that all processors are evenly loaded to the extent limited by the in ability of the processor geometry to divide the computational box evenly. In particular, you must make sure that no processor is without a work assignment unless the mesh is too small to permit them all to receive work (It is not necessary to handle that case.)
Implement the -nocomm command line option to disable all calls to MPI send and receive; you'll need it to indirectly measure communication.
To measure time, take the difference in times sampled by two successive calls to MPI_Wtime():
double t0 = MPI_Wtime();
// some code....
double t1 = MPI_Wtime();
double time = (t1-t0);
The man-page for MPI_Wtime() is found on line. Note, however, that all times are reported in seconds, and there is no need to divide by MPI_Wtick() to convert to seconds. (You can obtain the resolution of MPI_Wtime using MPI_Wtick())
Recall that the running time of the application is that of the last processor to finish. Don't time individual iterations, but rather, time an entire run, but excluding initialization, calls to MPI_Init() and MPI_Finalize() and any other activities that are not representative of a run with many iterations.
a. How many different geometries are there on 4 processors?
b. Start with the three 1 dimensional geometries, 1 x 1 x 4, 1 x 4 x 1, and 4 x 1 x 1. Since it can be difficult to measure communication times accurately,owing to the short time durations involved, we will use an indirect measurement technique: run with communication disabled (using the -nocomm option that you were asked to implement) and subtract the resultant time from that obtained from a run that includes communication.
Compare the running times for the 3 different 1D geometries and discuss the differences in computation and communication costs.
c. Next repeat with the 2D geometries. Which is optimal?
d. If you have time, run on 8 processors, but use only the 2D and 3D geometries. Adjust the problem size so that the amount of work per processor is a constant (i.e. to double the computational work, increase N by a factor of 21/3 ~= 1.26)
Run at least 3 times for each different geometry, and report the minimum running time. The running times should be highly repeatable; if you see a lot of variation try doubling the number of runs in order to observe a consensus.
You may use the problem size you used on 8 processors of Valkyrie. If you have the time, vary the size, noting any impact on the fraction of time spent communicating.
Turn in your amended report in class on Tuesday 11/3, both electronic and hard copy. If you have already turned in A4, you may submit in addendum if you wish, or submit an updated version of your lab report.
If you are working in a team, include a self-evaluation discussing the division of labor and other aspects of how you worked together as a team team. A copy of the form is available here.
Your turnin will consist of two parts. Submit a hard copy of your report in class on Thursday 10/30. Transmit your report electronically by the 5pm deadline on 10/28. Transmit your report by email in a single archive file, e.g. .zip, .tgz, containing your writeup and source code (and the self-evaluation, if applicable). Be sure and delete all .o files and executables before creating the tar file. Email the archive file as an attachment to the following address:
| baden | +260_a4 | @ | ucsd. |
edu |
Be sure to put the string cse260:A4 in the Subject: line and to use the email address above.
The name of your archive file should start with your login ID, e.g. baden.tgz. When your archive file is extracted, a new directory should be created with the name of your login ID. The directory should contain two subdirectories
| src: | All source code including your makefile, so I can build the software. |
| report | Your report including any image files, screen shots, or other output you wish me to see. |
| Maintained by | baden | @ | ucsd. |
edu | [Tue Oct 28 20:05:58 PDT 2008] |