
3.3 Coarse-Grain Computation

The previous section described how a single level of the adaptive grid hierarchy (an IrregularGrid) is distributed across processors. In this section, we discuss parallel execution over such distributed structures.

Parallel numerical computation is expressed using LPARX's forall construct, a coarse-grain data parallel loop that executes each iteration as if on its own virtual processor. Each iteration executes independently of all other iterations. For each Grid, the API calls a serial numerical kernel, typically written in Fortran, which executes on one processor. Separating parallel execution from serial numerical computation has several advantages. Numerical code may be optimized to take advantage of low-level node characteristics, such as vector units or multiple physical processors, without regard to the higher-level parallelism. Existing serial code may not need to be re-implemented when parallelizing an application. Furthermore, we can leverage existing, mature sequential compiler technology.
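The separation described above can be illustrated with a minimal sketch. It is written in Python rather than with the LPARX C++ interface, and the names `forall`, `relax_kernel`, and `workers` are hypothetical; none of them is the LPARX API. The kernel stands in for a serial Fortran routine, and a thread pool stands in for the processors. The sketch shows only the structure (independent iterations, each calling a serial kernel on one grid) and makes no claim about performance.

```python
from concurrent.futures import ThreadPoolExecutor


def relax_kernel(grid):
    """Serial kernel (stand-in for a Fortran routine).

    Replaces each interior point of a 1-D grid with the average of its
    two neighbours and returns a new list; the input is not modified.
    """
    out = list(grid)
    for i in range(1, len(grid) - 1):
        out[i] = 0.5 * (grid[i - 1] + grid[i + 1])
    return out


def forall(grids, kernel, workers=4):
    """Hypothetical coarse-grain loop: one independent iteration per grid.

    Each iteration runs the serial kernel on a single grid. Because the
    iterations share no state, they may execute in any order.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(kernel, grids))


# Two grids of different sizes, as in one level of an irregular hierarchy.
grids = [[0.0, 1.0, 0.0], [2.0, 0.0, 2.0, 0.0]]
result = forall(grids, relax_kernel)
```

The kernel knows nothing about the pool, so it could be replaced by an optimized or pre-existing serial routine without touching the parallel loop.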

Figure 3 compares our model of coarse-grain parallelism with a fine-grain data parallel style [3,17]. Coarse-grain parallelism executes in parallel over the entire collection of grids. Each grid is assigned to one processor, and numerical computation on that grid is sequential. In contrast, fine-grain parallelism processes grids sequentially and expresses parallelism over the elements of a single grid.

  
Figure 3: Coarse-grain data parallelism (left) expresses parallel execution over the entire collection of grids; computation on each individual grid is serial. In contrast, fine-grain data parallelism (right) expresses parallelism over the data elements of each grid, and the grids are handled sequentially.
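The two styles in Figure 3 differ only in which loop is parallel. The following sketch makes that contrast concrete for a trivial element-wise operation. The function names `coarse_grain`, `fine_grain`, and `scale` are assumptions made for illustration, and the thread pool is merely a stand-in for a set of processors.

```python
from concurrent.futures import ThreadPoolExecutor


def scale(x):
    """Element-wise operation applied to every grid point."""
    return 2.0 * x


def coarse_grain(grids, pool):
    """Parallel over grids; the work inside each grid is a serial loop."""
    return list(pool.map(lambda g: [scale(x) for x in g], grids))


def fine_grain(grids, pool):
    """Grids are visited one after another; parallel over the elements of each."""
    return [list(pool.map(scale, g)) for g in grids]


grids = [[1.0, 2.0], [3.0, 4.0, 5.0]]
with ThreadPoolExecutor(max_workers=4) as pool:
    coarse = coarse_grain(grids, pool)
    fine = fine_grain(grids, pool)
```

Both functions compute the same values. What changes is the unit of parallel work: a whole grid in the first case, a single element in the second.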

Coarse-grain parallelism has several advantages. Because the numerical computation is serial, we may employ numerical methods on each grid which do not parallelize efficiently. For example, Gauss-Seidel relaxation works well as a smoother in multigrid, but it cannot be easily expressed in a fine-grain data parallel style. Coarse-grain parallelism also allows more asynchrony between processors and is therefore a better match to current coarse-grain message-passing architectures. To improve the efficiency of the fine-grain model, Parsons and Quinlan [22] are developing run-time methods for automatically extracting coarse-grain tasks from fine-grain data parallel loops. Another model, processor subsets [8], combines the coarse-grain and fine-grain approaches; parallelism is expressed both over grids and within each grid.
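The difficulty with Gauss-Seidel can be seen in a small sketch for the 1-D model problem -u'' = f with mesh spacing h. The problem and the function names are chosen here for illustration and are not taken from the paper. A Jacobi sweep reads only values from the previous iterate, so every point can be updated independently. A lexicographic Gauss-Seidel sweep updates in place, so point i uses the value just computed at point i-1, and that loop-carried dependence is what resists a direct element-wise parallel formulation.

```python
def jacobi_sweep(u, f, h):
    """One Jacobi sweep: every update reads only the old iterate u."""
    new = list(u)
    for i in range(1, len(u) - 1):
        new[i] = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])
    return new


def gauss_seidel_sweep(u, f, h):
    """One lexicographic Gauss-Seidel sweep on a copy of u.

    u[i - 1] has already been overwritten when u[i] is computed, so the
    iterations of this loop cannot be executed independently.
    """
    u = list(u)
    for i in range(1, len(u) - 1):
        u[i] = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])
    return u


u0 = [1.0, 0.0, 0.0, 0.0]   # boundary values at both ends, zero interior guess
f = [0.0, 0.0, 0.0, 0.0]
jac = jacobi_sweep(u0, f, 1.0)
gs = gauss_seidel_sweep(u0, f, 1.0)
```

After one sweep the boundary value has reached only the first interior point under Jacobi, but has propagated further under Gauss-Seidel. In the coarse-grain model the sweep simply runs as written, serially, on the processor that owns the grid.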




Scott R. Kohn and Scott B. Baden