Cluster Identification

The cluster identification problem is the special case of connected component labeling where the graph is a d-dimensional hyper-cubic lattice. This problem arises in Monte Carlo simulations of many physical processes, such as the Ising model. For simulations on large lattices, the cluster identification step is often the bottleneck in the computation. Since the algorithm does not vectorize on the Cray, it is ideally suited to a parallel computer. We are collaborating with Dr. Karl Jansen (DESY, in Hamburg, Germany) to investigate a scalable parallel cluster identification algorithm in simulations of the Ising model. Previous attempts at computing the Ising model on parallel computers have not scaled to large numbers of processors (100 or more), primarily because of the high frequency of non-local communication. Our clustering algorithm, based on Belkhale and Banerjee's quad algorithm for VLSI circuit extraction, scales to hundreds of processors. A difficulty with the quad algorithm is that communication and global computation time both grow as the number of processors and the problem size increase. This liability makes the quad algorithm inefficient for use with hundreds of processors. We have implemented optimizations that negate this effect when the degree of the graph is high. Preliminary results show that this optimization can reduce the global cluster identification time by as much as a factor of four for bond percolation configurations in the supercritical region on a 64-processor nCUBE processor. These performance improvements increase with the number of processors so long as the problem is scaled concomitantly. Our optimizations are crucial in scaling Belkhale and Banerjee's scheme to hundreds of processors. The significance of our investigation is that enables substantial improvements in the accuracy of Ising model simulations by expanding the size of the lattice that can be employed. This is important because the accuracy of current simulations is only partially understood. By increasing the size of the computational lattice, Karl Jansen hopes to determine whether he can approximate the infinite physical lattice that exists in nature with the finite computational one. Current lattices are felt to be too small to resolve the issue. We are currently re-implementing the percolation clustering application in LPARX. This will enable the application to run on diverse distributed memory MIMD architectures, and, enhance the dimension independence of the code. We are currently looking toward four-dimensional computations.

Recent Publications

Parallel Cluster Identification for Multidimensional Lattices, S. J. Fink, C. Huston, S. B. Baden, and K. Jansen. IEEE Transactions on Parallel and Distributed Systems, 8(11), November 1997, pp. 1089-1097.
The cluster identification problem is a variant of connected component labeling that arises in cluster algorithms for spin models in statistical physics. We present a multidimensional version of Belkhale and Banerjee's Quad algorithm for connected component labeling on distributed memory parallel computers. Our extension abstracts away extraneous spatial connectivity information in more than two dimensions, simplifying implementation for higher dimensionality. We identify two types of locality present in cluster configurations, and present optimizations to exploit locality for better performance. Performance results from 2D, 3D, and 4D Ising model simulations with Swendson-Wang dynamics show that the optimizations improve performance by 20-80%.
Cluster Identification on a Distributed Memory Multiprocessor, Stephen J. Fink, Scott B. Baden, Karl Jansen, March 1994, Proc. 1994 Scalable High Performance Computing Conference, May 23-35, 1994, Knoxville, Tennessee. (Also CSE Tech. Rept. CS94-355.)
The cluster identification step is often the bottleneck in multiprocessor simulations of spin models for statistical mechanics. We have applied a connected component labeling algorithm originally developed for VLSI circuit extraction to the cluster identification problem. The algorithm is extended to more than two dimensions, abstracting away unnecessary spatial information to simplify implementation in higher dimensions. We identify two types of spatial locality in cluster configurations, and present optimizations to exploit each type of locality. Performance results are presented from two and three-dimensional Ising model simulations.

Cluster Identification

Recent Publications

Maintained by Scott B. Baden. Tue Jan 20 09:58:52 PST 1998