a pretty picture

The Scallop Solver Library

Motivation

Scallop is a library of solvers for elliptic partial differential equations. Solutions of such equations are important to a wide variety of scientific computations. Scallop is best suited for constant coefficient problems with infinite domain boundary conditions, but it can also be used as a preconditioner for variable coefficient problems and methods are available to apply more general boundary conditions. The library is built on the KeLP programming system.

Scallop was designed for finite difference-based free-space problems. For such calculations, Scallop reduces the number of communication steps required by doing extra computation to take advantage of locality properties of elliptic PDEs. We believe that this trade-off makes sense for massively parallel computer architectures, and that our method will provide better over scalability than other options, which require more frequent communication.

Method

Scallop calculates a solution through five steps: a preliminary local solution on fine grids, an all-to-all aggregation of coarse grid data, a global coarse solution, nearest-neighbor communication of boundary data, and a final fine grid solution. Scallop uses the global coarse grid to communicate far-field solution information. Using the method of local corrections, this far-field data is combined with the preliminary local fine grid data to set local fine grid boundary conditions for the final solution step.

Current Results

We have run on up to 1024 processors. The following results were generated on NERSC's Seaborg machine. The graph below shows the time required per point as the problem size is scaled up, keeping the work per processor nearly constant. The time per point remains nearly constant as the number of nodes is increased.

graph of grind time

The graph below shows that communication time remains a small fraction of total running time.

graph of communication percentage

Before attempting to scale up to 1024 processors we switched to a faster serial solver. The graph below shows results using the FFTW-based solver. In this graph, a slowdown due to increasing overlap of the subdomains can be observed.

graph of communication percentage

Since the computation time per point is reduced in the FFTW version, the percentage of time spent communicating increases somewhat, but still remains relatively low.

graph of communication percentage

These results were presented at SC2003 in Phoenix. The paper is available here, as are the slides from our presentation.

Ongoing Work

In the current version of Scallop, performance is limited by the size of the coarse solution. For problems where the coarsening factor is too large compared to the local fine grid size, extra computational overhead is incurred. This overhead is reflected in the scaling study above: computation time per point increases as the number of nodes increases, rather than remaining flat. We believe we have a way to avoid this extra computation by limiting the overlap required between local fine domains.

Other performance improvements are also in development, including further optimization of the infinite domain boundary condition calculation, used in both the fine and coarse grid solution.

References

The Scallop library is based on previous work described in "A Finite Difference Domain Decomposition Method Using Local Corrections for the Solution of Poisson's Equation", by Gregory T. Balls and Phillip Colella, published in the Journal of Computational Physics, July 20, 2002.

Recent work is described in a paper to be presented this November at Supercomputing 2003. The paper, "SCALLOP: A Highly Scalable Parallel Poisson Solver in Three Dimensions" is available here. The presentation given at the conference is available as well.

Software Distribution (gzipped tar file, 144 KB)


Scientific Computation Group Home Page