| Date | Description |
| 6-Mar-08 | Added afternotes, reflections on the assigment |
| 3-Feb-08 | Added a reference to the "Tips for writeups" |
| 28-Jan-08 | Added discussion pertaining to teams of 3. |
| 28-Jan-08 | Removed discussion of SPRNG random number generator. | 28-Jan-08 | Original posting |
In this laboratory you'll parallelize and experiment with an existing connected
component labeling algorithm
provided to you. You should also attempt
to speed up single processor performance.
Refer to the
Leighton and Schroeder handouts for reference.
A serial implementation of the connected component labeling algorithm, ccl, is available on DataStar $(PUB)/A3. Be sure to look over the file README.txt for additional information about how to use the code.
Although ccl is a serial program, it contains some hooks to help get you started with the MPI implementation: broadcasting command line arguments and taking times with MPI_Wtime(). The hooks may be enabled by uncommenting the definition of MPI_ON within the Makefile. The code is configured to use the AIX built in random number generator, drand48. This is not ideal, as we want independent streams across processors, but it will suffice for the purpose of this assignment. If you are interested, look into the SPRNG random number generator, or the ESSL routine (on AIX systems like DataStar): pdurng()
The program provides various command line options which are parsed in the source file cmdLine.C. In particular, you may seed the random number generator with an arbitrary value via the -s flag; else you get a seed based on the time of day. The program outputs the seed so that you will be able to reproduce a given run:
Random # generator seed: 1201251902
The code considers points to be adjacent only if they are nearest neighbors on the Manhattan coordinate directions-- left, right, up, and down. This is different from the stencil used in the Schroeder excerpt, which includes corners. If you like, experiment with the 9-point stencil, but be sure to present results for the Manhattan stencil.
First determine pc, the critical value of the independent probability p, that maximizes the running time. It may be easier to pinpoint pc by looking at tabular data; sample p more closely in the neighborhood of criticality so you can determine pc to 3 decimal places. Include tabular data along with your report, and also plot the running time as a function of p.
Since the initial conditions are randomized you'll need to take steps to ensure that your timings are statistically significant. For each value of p, make several (5 to 10) runs each with a different random number seed. Plot the mean value with a curve, but also plot the maximum and minimum value for each value of p you measured. You should also repeat a few of your runs using the same random number seed, to see if your timings are consistent. To do this, use the -s flag to set the random number seed to the value reported by the reference run you wish to reproduce.
To determine an appropriate value of N, experiment by starting with N=100 and doubling N until the run completes in about 5 to 10 seconds at criticality.
Parallelize the code. Label components in two phases. In the first phase, each processor labels its assigned part of the domain. (You may use ghost cells to contain labels from cells in neighboring processes.) In the second phase, processes re-label their clusters, using newly updated labels obtained from neighboring processes. This second phase may take several iterations before all labels are finalized. Try and make the code go as fast as you can. If you are working in a team of two or more, you must implement 2D partitioning, else 1D.
Read over these
Tips for your writeup. [2/3/08]
If you are working in a team, include a self-evaluation
discussing the division of labor and other aspects of how you worked
together as a team team. A copy of the form is available
here.
Include two appendices. The first should contain performance data for
the clustering algorithm on one processor, as well as your scaling
experiments.
Any plotted data should also be included in tabular form.
In the second appendix, submit a listing of your software
(only in electronic form, please).
Your writeup should
discuss decisions you made in parallelizing the algorithm.
Present a clear evaluation of
performance, including bottlenecks of the implementation. Describe any
special coding or tuned parameters.
What factors limit performance?
Verify correctness, in particular, you must show that
results are independent of the number of processors.
To do this, you'll need to make several runs with different
random number seeds for a fixed value of N, showing that
the average number of clusters (and the average cluster size)
is independent of P. Do not attempt to look at output,
as labels may vary on different numbers of processors.
(The provided code will output results for small clusters,
but this is a convenience)
Afternotes
Reflections on the assignment.
Copyright © 2008 Scott B. Baden.
[Sun Mar 9 21:20:38 PDT 2008]