Data parallel Fortran languages such as High Performance Fortran [7,13,14] do not readily support non-uniform grid structures. In their Connection Machine Fortran implementation of a 2d adaptive mesh refinement application on the CM-2, Berger and Saltzman [3] required that all refinement regions be the same size. To ascertain the performance implications of such a restriction, we have implemented a grid generation strategy identical to that used by Berger and Saltzman.
One of the important trade-offs in a uniform refinement strategy is the selection of the appropriate patch size. The key is to find a grid size which is large enough to be computationally efficient on the target parallel architecture but yet small enough to limit over-refinement. Large patches typically refine more of the computational domain than what is needed and thus waste memory resources. Small patches may represent too little work to be efficient.
Another consideration is the ratio of ghost cells, boundary points used to locally cache data from other grids, to interior grid points. The width of this boundary region depends on the numerical kernels of the application. Our eigenvalue solver uses a ghost cell width of one; Berger and Saltzman use four. For small patches, these ghost regions can represent a significant fraction of the total memory, especially in three dimensions. For example, the boundary cells for a 16^3 patch in 3d with a ghost cell width of four (24^3 total) represent 70% of the memory used to store the patch. Furthermore, boundary cells may introduce additional computational work for some numerical methods (e.g. flux correction for hyperbolic partial differential equations [1]).
We compare the non-uniform refinement approach against four uniform grid sizes: 12^3, 16^3, 24^3, and 32^3. Each patch is augmented with a ghost cell region of width one. These four sizes bracket the range of useful patch sizes; for this particular application, 12^3 is too small and 32^3 is too large. Memory overheads are reported in Table 3. Uniform refinement with the smallest patch size requires only about 26% more memory than non-uniform refinement; the largest patch size uses almost three times more memory.
Table 3: Uniform grid patches require additional memory resources as
compared to non-uniform refinements because the grid generator does
not have the freedom in selecting the optimal grid size needed to
cover a particular region of space. Thus, uniform patches lead to
over-refinement. Small patches reduce over-refinement; however,
small grid patches introduce a large number of ghost cells relative
to the number of unknowns. In this application, the ghost cells region
is one cell thick.
Figure 5a presents the execution time for one iteration of the adaptive eigenvalue application on the Intel Paragon. Note that we cannot report the results for the 32^3 patch size on four processors; this problem would not run because of memory limitations. The 16^3 patch size gives the best performance for all numbers of processors, running between 40% and 60% slower than the non-uniform refinement method. Both memory usage and computation time are important computational resources for adaptive mesh applications. In fact, many accounting systems for parallel computers charge not only for CPU time but also for memory usage. To capture both resources, Figure 5b presents the relative space-time (i.e. megabyte-hour) cost of uniform refinement patches as compared to non-uniform patches. In this metric, uniform patches are between two and eight times more expensive.
Figure 5: These graphs illustrate the performance costs of requiring
uniform grid patches as opposed to non-uniform patches. By default,
the adaptive mesh libraries employ non-uniform patches. Results for
the 32^3 patch size on four processors are not reported; the problem
would not run due to memory limitations.
(a) A comparison of execution times for a variety of uniform patch sizes.
(b) Relative space-time (megabyte-hour) costs of uniform refinement
patches as compared to non-uniform patches.
These results clearly show that uniform refinement patches are more expensive than the identical application using non-uniform patches. To solve a problem to a specified accuracy, structured adaptive mesh methods employing uniform refinement regions will require more computational resources (flops and memory) than one using non-uniform refinement regions. Likewise, given fixed resources, non-uniform refinements will solve a particular problem to higher accuracy.