CSE 260 Homework Assignment #4
Parallel Connected Component Labeling

Due: Saturday 11/9/02 at 5PM

Revised Tue Nov 5 20:30:08 PST 2002

In this laboratory you'll parallelize the connected component labeling algorithm you implemented in assignment #3, and collect various statistics.
Use the parallel random number generator SPRNG, which is found in ~/../public/lib/SPRNG. SPRNG is a higher-quality random number generator than the one provided in hw #3.

The code considers points to be adjacent only if they are nearest neighbors in the Manhattan coordinate directions: left, right, up, and down. This differs from the stencil used in the book, which includes corners. If you like, experiment with the 9-point stencil, but be sure to present results for the Manhattan stencil.
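As a small illustration of the two adjacency rules, here is a sketch of a neighbor test. The function name are_adjacent and the use_corners switch are made up for this example and do not come from the hw #3 code.

```c
#include <assert.h>
#include <stdlib.h>

/* Returns 1 if grid points (r1,c1) and (r2,c2) are neighbors, 0 otherwise.
   use_corners = 0: Manhattan stencil (left, right, up, down only).
   use_corners = 1: also accept the four diagonal (corner) neighbors.
   Hypothetical helper, written only to illustrate the two stencils. */
int are_adjacent(int r1, int c1, int r2, int c2, int use_corners)
{
    int dr = abs(r1 - r2);
    int dc = abs(c1 - c2);

    assert(use_corners == 0 || use_corners == 1);
    if (dr + dc == 1)
        return 1;                              /* Manhattan distance 1 */
    return use_corners && dr == 1 && dc == 1;  /* diagonal neighbor */
}
```

With use_corners set to 0, two occupied points that touch only at a corner belong to different clusters; with it set to 1 they would be merged.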

The assignment

Parallelization

Parallelize your component labeling algorithm using MPI. Your code will label components in two phases. In the first phase, each process will label its assigned part of the domain. (You may use ghost cells to hold labels from cells in neighboring processes.) In the second phase, processes will re-label their clusters, using newly updated labels obtained from neighboring processes. This second phase will take several iterations before all labels are finalized. Try to make the code run as fast as you can. You may use a 1-dimensional decomposition, but if you have time, experiment with 2-dimensional decompositions.
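The two phases can be sketched serially, without MPI, to see why the second phase needs several iterations. The code below is only one possible scheme, not a required design. It simulates P processes that each own a strip of rows (a 1-dimensional decomposition), labels each strip independently, and then repeatedly copies neighboring boundary rows into ghost rows and adopts the smaller label. All names and sizes (N, P, label_strip, exchange_and_relabel, label_components) are invented for the sketch. In a real MPI code the ghost-row copy would be a message exchange (for example with MPI_Sendrecv), and the "did anything change" test would need a global reduction (for example with MPI_Allreduce).

```c
#include <assert.h>
#include <string.h>

#define N 8            /* grid is N x N (small, hypothetical size) */
#define P 4            /* number of simulated processes; must divide N */
#define ROWS (N / P)   /* rows owned by each simulated process */

/* Phase 1: label clusters using only rows [r0, r0+ROWS). Each local cluster
   receives the smallest 1-based global cell index it contains; 0 = empty.
   lab must be zero in this strip on entry. */
static void label_strip(int occ[N][N], int lab[N][N], int r0)
{
    int stack[ROWS * N];
    for (int r = r0; r < r0 + ROWS; r++)
        for (int c = 0; c < N; c++) {
            if (!occ[r][c] || lab[r][c]) continue;
            int id = r * N + c + 1, top = 0;
            lab[r][c] = id;
            stack[top++] = r * N + c;
            while (top > 0) {                  /* iterative flood fill */
                const int dr[4] = {-1, 1, 0, 0}, dc[4] = {0, 0, -1, 1};
                int cell = stack[--top], cr = cell / N, cc = cell % N;
                for (int k = 0; k < 4; k++) {  /* Manhattan stencil */
                    int nr = cr + dr[k], nc = cc + dc[k];
                    if (nr < r0 || nr >= r0 + ROWS || nc < 0 || nc >= N)
                        continue;              /* stay inside this strip */
                    if (occ[nr][nc] && !lab[nr][nc]) {
                        lab[nr][nc] = id;
                        stack[top++] = nr * N + nc;
                    }
                }
            }
        }
}

/* Replace every occurrence of label `from` by `to` inside one strip. */
static void relabel_strip(int lab[N][N], int r0, int from, int to)
{
    for (int r = r0; r < r0 + ROWS; r++)
        for (int c = 0; c < N; c++)
            if (lab[r][c] == from) lab[r][c] = to;
}

/* Phase 2, one sweep: returns nonzero if any label changed. */
static int exchange_and_relabel(int lab[N][N])
{
    int ghost_up[P][N], ghost_dn[P][N], changed = 0;

    /* "Communication": snapshot the neighbors' boundary rows into ghosts. */
    for (int p = 0; p < P; p++)
        for (int c = 0; c < N; c++) {
            ghost_up[p][c] = (p > 0) ? lab[p * ROWS - 1][c] : 0;
            ghost_dn[p][c] = (p < P - 1) ? lab[(p + 1) * ROWS][c] : 0;
        }

    /* "Computation": each strip adopts any smaller label seen in a ghost. */
    for (int p = 0; p < P; p++) {
        int r0 = p * ROWS, r1 = r0 + ROWS - 1;
        for (int c = 0; c < N; c++) {
            int g = ghost_up[p][c], mine = lab[r0][c];
            if (g && mine && g < mine) { relabel_strip(lab, r0, mine, g); changed = 1; }
            g = ghost_dn[p][c];
            mine = lab[r1][c];
            if (g && mine && g < mine) { relabel_strip(lab, r0, mine, g); changed = 1; }
        }
    }
    return changed;
}

/* Driver: phase 1 on every strip, then sweep until no label changes. */
void label_components(int occ[N][N], int lab[N][N])
{
    memset(lab, 0, sizeof(int) * N * N);
    for (int p = 0; p < P; p++)
        label_strip(occ, lab, p * ROWS);
    while (exchange_and_relabel(lab))
        ;
}
```

A label travels at most one strip per sweep, so a cluster that winds through many strips needs correspondingly many sweeps. Rescanning the whole strip on every relabel is simple but slow; a label-equivalence table is one alternative worth considering.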

Experimentation

Evaluate the performance of your implementation by conducting experiments that vary N, p, and the number of processors P. I suggest that you look at 3 values of the independent probability p spaced evenly over the interval from 0.0 to 1.0. Choose the problem size so that your runs last no less than about 5 seconds and no more than about 30 seconds. Conduct experiments with both fixed and scaled workloads. Plot the total time and the grind time separately, and be sure to include all data in tabular form as well. Compute the grind time as the total time divided by N^2, the number of updated points. Exclude any initialization or output from your timing, and report only on the labeling. You won't be reporting speedups.
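The grind-time arithmetic can be written down in a few lines. The sketch below assumes the total time has already been measured around the labeling only (for instance with MPI_Wtime). The helper scaled_n further assumes that a "scaled workload" means keeping N^2/P roughly constant as P grows, which is one common reading and not something this handout specifies. Both function names are invented for the example.

```c
#include <assert.h>

/* Grind time: seconds per updated point for an N x N grid. */
double grind_time(double total_seconds, long n)
{
    assert(n > 0 && total_seconds >= 0.0);
    return total_seconds / ((double)n * (double)n);
}

/* Assumed scaling rule: the smallest N such that N*N >= n1*n1*nprocs,
   i.e. each process keeps at least the work it had with n1 on one process. */
long scaled_n(long n1, int nprocs)
{
    long n = n1;

    assert(n1 > 0 && nprocs > 0);
    while (n * n < n1 * n1 * (long)nprocs)
        n++;
    return n;
}
```

For example, a 10-second run at N = 1000 gives a grind time of 10 / 1000^2 = 1e-5 seconds per point, and under the assumed rule a base size of N = 100 on one process becomes N = 200 on four.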

To determine an appropriate value of N, experiment by starting with N=100 and doubling N until the run completes in about 10 seconds at criticality. If you are using Valkyrie, observe the following protocol, which may help improve the consistency of your results. If you see that others are on the machine, try to use a node that appears unoccupied. Stick with that node, and make all your timing runs from a single script. This way, if someone sees that the node is busy, they'll stay away for the duration of time that you are performing your runs.
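The doubling search for N might look like the following sketch. The callback type timed_run_fn, the function calibrate_n, and the cap n_max are all hypothetical. The model_seconds function is only a stand-in cost model that lets the sketch run on its own; in practice the callback would run the labeling at criticality and return the measured time.

```c
#include <assert.h>

/* A timed run: labels an n x n grid and returns the elapsed seconds. */
typedef double (*timed_run_fn)(long n);

/* Start at N = 100 and double N until a run takes at least target_seconds,
   or until doubling again would exceed n_max (a safety cap). */
long calibrate_n(timed_run_fn run, double target_seconds, long n_max)
{
    long n = 100;

    assert(run != 0 && target_seconds > 0.0 && n_max >= 100);
    while (run(n) < target_seconds && n * 2 <= n_max)
        n *= 2;
    return n;
}

/* Hypothetical stand-in for a real timed run: pretends the labeling costs
   one microsecond per point. Replace with an actual measurement. */
double model_seconds(long n)
{
    return 1.0e-6 * (double)n * (double)n;
}
```

Under this made-up cost model, calibrate_n(model_seconds, 10.0, 1000000) stops at N = 3200, the first size whose modeled time (10.24 seconds) reaches the target.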

If you have time:

  • Implement the 3D algorithm. You'll need to determine the critical probability p_c anew, as its value is sensitive to the number of spatial dimensions.

  • Explore the region around criticality more finely.
Things you should turn in

    You should document your work in a well-written report of about 5 pages, not including figures, code listings, or appendices. Include two appendices. The first should contain the performance data for the clustering algorithm. Any plotted data should also be included in tabular form. In the second appendix, submit a listing of your software. (The source code should be provided only in the electronic turnin of the assignment.)

    Your writeup must provide sample output demonstrating correct operation of your code. The results should be independent of the number of processors, though the specific values of the labels may vary. Discuss the decisions you made in parallelizing the algorithm. Present a clear evaluation of performance, including bottlenecks of the implementation, and describe any special coding or tuned parameters. What factors limit performance?

    You should also turn in an electronic copy of your report and code. Make a simple web page that contains links to the report and to the source code. If your tabular data are in spreadsheets or other documents, include those as well. Be sure that the actual source code is included, so that it may be compiled and run on Valkyrie. Organize your files in directories with a separate directory for source code, figures and tables, and any sample output. At a minimum, you will have your report as a single file, plus a directory containing the source along with any input files needed.

    Using SPRNG

    SPRNG, as used here, is a 48-bit Linear Congruential Generator. You can find documentation at http://archive.ncsa.uiuc.edu/Apps/SPRNG/www/quick-start.html (see in particular the section entitled "Using SPRNG in a parallel program").

    There is also an example written in C, taken from the SPRNG distribution. Look on Valkyrie at ~/../public/examples/hw4/sprng-simple_mpi.c (more generally, you can find many examples in ~/../public/lib/SPRNG/EXAMPLES/). To build your code with SPRNG, be sure that the following compiler flags are set:

    -O3 -DLittleEndian -DSPRNG_MPI -DLINUX -I/home/cs260f/public/lib/sprng/include

    and that the loader line contains the following flags:

    -L/home/cs260f/public/lib/sprng/lib -llcg


    Copyright 2002 Scott B. Baden. 10/31/02 07:53 PM