CSE 260 Assignment 3 (fall 2005)

CSE 260 Assignment 3
Due Tuesday, 10/25/2005 in class
Worth 20 points

Introduction

In this assignment you'll modify the 2D Jacobi finite difference solver you implemented in the previous assignment. You'll also collect and discuss performance data.

You may work in teams of two for this assignment. If you do so, please be sure to implement processor subsetting, as discussed below in Part 2.

The assignment

The starting point for the assignment is the 2D Jacobi code you implemented in the last assignment. You'll make two changes:
(1) introduce 2D processor geometries, and (2) implement the two level mesh strategy to improve the initial guess, which was discussed in class. You'll also take performance measurements. Although this strategy does not compete with multigrid, it provides useful components used to implement multigrid.

Part 1: 2D Processor Geometries

In class we described the notion of processor geometry in which the processors can be thought of as being arranged on the points of a regularly spaced mesh. In general there is more than one geometry for a given number of processors. For a given number of processors there is an optimal geometry (or in some cases geometries) that minimizes the running time. The optimal geometry balances the computational rate against the communication cost, and the model given in class will enable you to understand this tradeoff.

Processor geometry affects two different aspects of performance: communication overhead (surface-to-volume effects), and single processor performance. Single processor performance is heavily influenced by memory access behavior, and can include both surface-to-volume effects as well as cache conflict misses. Computation time (excluding communication) can vary significantly among different geometries.

This part contains of two experiments.

Experiment 1

For your first experiment run on 4 processors and start with a 400² square problem mesh. Run for about 20 iterations so that the runs completes in a few seconds.

a. Start with the two 1 dimensional geometries: 1 x 4 and 4 x 1. (We say that they are 1 dimensional because they partition only one axis of the mesh.) Compare the geometries with respect to overall performance, communication performance, and computation only performance, i.e. with communication disabled. Use the indirect measurement technique to measure communication performance: run with communication disabled, as in the previous assignment, and subtract the resultant time from that obtained by running with communication turned on.

b. Now repeat for the 2D geometries. Which is optimal?

c. Repeat with 8 processors. If you have time, run on 16 processors, but use only the 2D geometries. Adjust the problem size so that the amount of work per processor is a constant (i.e. to double the computational work, increase N by a factor of 2^1/2 ~= 1.41)

Experiment 2

In this experiment predict communication times using the model discussed in class. Use the α and β values obtained from the Ring program (in $PUB/examples) to estimate communication times.

a. How do the predictions differ from what you observed in practice?

b. Explore the possible causes, illustrating with code fragments from your code.

Part 2: Two level solver

In this part you'll enhance the convergence rate of the Jacobi solver by improving the initial guess. In order to demonstrate convergence, you'll need to run for many more iterations than in part 1. Since the Jacobi algorithm converges in O(M²) where M=N² points in the 2D mesh, we will only be able to run with small problem sizes. Adjust the problem size so that the run completes in 10 seconds (or make your runs on an unoccupied machine.)

Our technique is to solve a down-sampled version of the equation, and to use the down-sampled solution to generate an initial guess for the original (fine grid) problem. Although there are more sophisticated ways of enhancing convergence using multiresolution meshes, this exercise will enable you to build software components that may be used to build a practical multigrid solver, which converges in a constant number of cycles.

Your starting point is the iterative Jacobi solver from the last assignment. Modify your code so that, once the RHS and initial guess have been generated, you generate a down-sampled ("coarse") version of the initial guess and RHS, and subsequently smooth these with your Jacobi solver. Your next step is to project this solution via interpolation back onto the original mesh, such that the fine mesh gets an improved initial guess. To coarsen a mesh, use multigrid restriction, that is, simple averaging of the nearest neighbors. Thus, if C=2 is the coarsening factor each coarsened mesh point (I,J) will be obtained from points ((I*C)+i, (J*C)+j) ), where i and j range from 0 to C-1. To inject the coarse mesh onto the original (fine) mesh, use the multigrid prolongation operation- i.e. interpolation. The details of the restriction and prolongation operations are described in Jim Demmel's notes on multigrid, and also in Numerical Recipes in C, 2nd Ed., by Press et al., Cambridge University Press, which is available on-line at http://www.numerical-recipes.com/nronline_switcher.html
Solving the Discrete Poisson Equation using Multigrid. Be careful to handle the boundary values correctly, as shown in the Matlab code found in the $PUB/examples directory on Valkyrie (in particular, Improve.m).

A coarsening factor of 2 should be sufficient (If you try something else, be sure it is an integer power of 2). Iterate many times to at the coarse level and you will see a significant improvement in the time needed to reach convergence. Try and tune the number of coarse iterations to minimize the running time. There is one important detail which makes life interesting. If you coarsen the mesh enough, you can improve performance by restricting the coarse solve to a subset of the processors and then broadcasting the result to the remaining processors. If you work in a team you'll need to implement this option.

Run on a least 8 processors on the fine mesh. Be sure to allocate 2 MPI processes to a node, using the -g 2 option of mpirun.

Structure your code so that you use a single function to solve both the coarsened and refined problems, and a single function to handle ghost cell communication. Write utilities that collect a coarsened mesh distributed over processors into a designated subset. Write other utilities that redistribute the coarsened initial guess to the fine solution array.

You have leeway to decide how you want to do these things, but be sure to document your decisions along with a rationale.

Things you should turn in

Document your work in a well-written report which discusses your findings carefully. Hand in a copy of your report in class. Your report should present a clear evaluation of the design of your code, including bottlenecks of the implementation, and describe any special coding or tuned parameters. Be sure to include the plots described above, along with any plotted data in tabular form.

Your report should cite any written works or software as appropriate in the text of the report. Citations should include any software you used that was written by someone else, or that was written by you for purposes other than this class, i.e. plotting packages and the like (do NOT use someone else's MPI application code, however.)

If you use a machine other than Valkyrie, please mention this in your report.

Electronic turnin instructions

Important. Please follows these instructions carefully.

Transmit an electronic copy of your report as a compressed archive file, e.g. .zip, .tgz, .tar.gz. Organize the source code and plots into subdirectories, and include a README file explaining the contents. If you like, include a .html file called index.html. From there you can then link to the README file and to the various components of your report, including the source code and any other files you want to include.

The name of your archive file should begin with your Valkyrie login ID, followed by the underscore character '_' followed by "hw3," i.e. mine would be cs260x_hw3.tar.gz. Your archive file should create a directory with the same name as the prefix of your archive file. Be sure and delete all .object (.o) files and executables before creating the archive file. Email the file to my cse account baden (at) cs.ucsd.edu with the string CSE260:hw3 in the Subject: line.

CSE 260 Assignment 3 Due Tuesday, 10/25/2005 in class Worth 20 points