As we have seen from our study of partial differential equations, diverse physical phenomena exhibit spatial and temporal locality. Examples include gravitational attraction, the Coulombic interaction of charged particles, the diffusion of energy, and the propagation of waves. These phenomena often exhibit fine structures that are unpredictable and irregular, and their accurate modeling is economically important. Turbulence in the wake of a Boeing 747 and the flow of oil and water in an oil field are examples of such phenomena; the shock waves that accompany a sonic boom are another.
Irregular phenomena are often highly localized, and exhibit large derivatives that are difficult to capture numerically. From our earlier study of finite difference methods we know that the error terms in a finite difference approximation are polynomials in the mesh spacing h, and the coefficients involve the derivatives of the function we are computing. Up to this point, we have implicitly assumed that the function is "nice"; that is, it has finite, bounded derivatives of all orders. However, this is not always the case. A shock wave emanating from an aircraft moving at supersonic speed exhibits discontinuities, or jumps, where, strictly speaking, the derivatives are not defined. In other cases, we may encounter singularities due to the presence of point charges; with a source term that varies as 1/r, the derivative of the solution behaves like ln r, which is unbounded as r -> 0.
Consider, for example, the 1D Poisson equation

    u''(x) = 1/|x|    on the intervals [-1.0, 0.0) and (0.0, 1.0],
    u''(0) = C,       where C is an appropriately defined constant,
    u(-1.0) = u(1.0) = K,   where K is an appropriately defined constant.

The various derivatives of u(x) are exceedingly large at x = h, the mesh spacing, and grow as h -> 0. Although our solution might formally be accurate to O(h^2), the coefficient multiplying the h^2 term is so large that we cannot obtain an accurate answer in practice.
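To make this concrete, here is a small illustrative Python sketch (the mesh sizes are hypothetical choices) that evaluates the leading truncation-error term of the centered second difference, (h^2/12)*u''''(x), at the first interior mesh point x = h. Since u''(x) = 1/|x| implies u''''(x) = 2/|x|^3, that term behaves like 1/(6h) and actually grows as the mesh is refined.

    # Leading truncation error of the centered second difference is (h^2/12) u''''(x).
    # For u''(x) = 1/|x| we have u''''(x) = 2/|x|^3, so at the first interior mesh
    # point x = h the leading term is (h^2/12)(2/h^3) = 1/(6h): it grows as h -> 0.
    for n in (16, 64, 256, 1024):              # number of mesh intervals (illustrative)
        h = 1.0 / n
        err_term = (h ** 2 / 12.0) * (2.0 / h ** 3)
        print(f"h = {h:9.6f}   leading truncation error term at x = h: {err_term:8.2f}")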
To capture functions that do not have "nice" derivatives we must reduce the spacing of the finite difference mesh. Unfortunately, we might have to refine the mesh by as much as a factor of 50 or 100 (or even more, depending on the initial data). While this requirement may not be problematic in one dimension, it almost certainly will be in two or more spatial dimensions. A factor of 100 increase in the linear mesh resolution would require that the number of unknowns in a 3-dimensional problem grow by a factor of 10^6, and for a time-dependent problem we would also have to decrease the time step by a factor of 100, resulting in a 100-million-fold increase in the computational effort!
But this is only part of the story. Clearly we are wasting our effort if the solution is smooth over large portions of the problem domain. In fact, for a given amount of resources we can improve accuracy dramatically by adjusting the mesh spacing according to the estimated error of the solution, which is often measured in terms of its derivatives. Thus, a non-uniform mesh not only reduces the cost of the computation but, often more importantly, significantly reduces memory consumption.
In this lecture we will discuss an adaptive mesh refinement algorithm that was developed by Berger and Oliger (J. Computational Physics, 53:484-512, 1984) and later refined by Berger and Colella (J. Computational Physics, 82(1):64-84, May 1989). It is based in part on Bolstad's earlier work in one dimension. This algorithm is known by various names, including the "Berger-Oliger-Colella" algorithm or "patch-based AMR." Another important type of AMR, based on quad- or oct-trees, won't be discussed here; we will, however, consider tree-like refinement structures later in this lecture, in the context of particle methods.
Here is a picture of a not-so-nice function. On close examination, the mesh
spacing varies according to an estimate of the error.
Note how the mesh points are spaced
close together in the vicinity of the cusp.
Two issues arise in implementing a non-uniform mesh:
Our solution to #1 entails two sub-steps:
Our solution to #2 entails some special fix-up procedures to ensure
that the solution seen at such transitional points is consistent
as viewed by the two neighboring points. In the following mesh, for example,

    o--------o--------o----o----o
    x-4h     x-2h     x    x+h  x+2h

we want the slope (derivative) of the solution at u(x), as viewed from u(x-2h), to match the slope as viewed from u(x+h). Otherwise, the refined region to the right of and including x will be "polluted" by the less accurate solution computed on the coarse mesh to the left of and including x.
In structured AMR we don't represent each point on the mesh with its
own value of h. Rather, we form a composite
mesh organized into levels, each succeeding level with a finer spacing
than the previous one. This is shown in the following figure
The spacing at each level is uniform, and the refinement factor R
gives us the ratio between the mesh spacings at adjacent levels.
Usually R is a constant; typical values are two or four.
( R is usually restricted to a power of 2).
In general we will have several levels. We may fix the number of levels
arbitrarily, or we may simply add levels until we run out of memory.
For the time being, we consider a 2-level hierarchy comprising a fine
grid uf and a coarse grid uc:
Now, how do we solve on the composite structure? (The procedure is usually accelerated by multigrid, which we leave out to simplify the discussion.)
In an elliptic equation it turns out that u is defined only
at the finest level appearing in the representation. Thus, we don't represent
uc at positions covered by uf.
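As a concrete illustration, here is a minimal Python sketch of such a two-level composite representation in one dimension (the class name and sizes are hypothetical); coarse nodes covered by the fine patch are simply marked as invalid rather than carrying solution data.

    import numpy as np

    # A minimal sketch of a 1D two-level composite mesh (sizes are illustrative).
    # The coarse grid covers [-1, 1]; one fine patch covers [lo, hi] with spacing
    # h_f = h_c / R.  Coarse nodes under the fine patch carry no valid solution
    # data: the solution there lives on the fine grid only.
    class TwoLevelMesh:
        def __init__(self, n_coarse=64, lo=-0.25, hi=0.25, R=4):
            self.h_c = 2.0 / n_coarse
            self.h_f = self.h_c / R
            self.x_c = np.linspace(-1.0, 1.0, n_coarse + 1)   # coarse nodes
            n_fine = int(round((hi - lo) / self.h_f))
            self.x_f = np.linspace(lo, hi, n_fine + 1)        # fine patch nodes
            self.u_c = np.zeros_like(self.x_c)                # coarse data
            self.u_f = np.zeros_like(self.x_f)                # fine data
            # coarse nodes strictly inside the patch are covered, i.e. not valid
            self.covered = (self.x_c > lo) & (self.x_c < hi)

    mesh = TwoLevelMesh()
    print(mesh.covered.sum(), "of", mesh.x_c.size, "coarse nodes are covered")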
To compute on the composite mesh we first compute on the base mesh uc. We are careful to exclude the points covered by uf. Thus, we must solve Poisson's equation on two separate regions, to the left and to the right of the overlapping refined region. Each equation takes Dirichlet boundary conditions. The "left" equation takes its left boundary condition from the physical boundary conditions. The right boundary condition is obtained from the fine mesh: we average two adjacent fine mesh cells onto the coarse mesh. This extra coarse mesh cell is like a ghost cell, except that we must also carry out numerical computation (this is identical to the multigrid restriction operation).
We next solve on uf. Generally uf will be divided into multiple disjoint regions, but in our simple example we have only one region. uf expects boundary conditions, which it obtains from uc by interpolation (multigrid prolongation). Again, we add ghost cells to hold these boundary conditions.
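Here is a minimal Python sketch of this coarse-then-fine alternation for the 1D model problem (node-centered, with illustrative parameters and function names). Because the patch edges coincide with coarse nodes in a node-centered discretization, restriction reduces to copying the interface values; a cell-centered code would instead average fine cells onto coarse cells and interpolate into fine ghost cells, as described above. The sweep is repeated a few times, standing in for the multigrid-accelerated composite solve used in practice.

    import numpy as np

    def poisson_dirichlet(x, f, ua, ub):
        """Direct solve of u'' = f on the uniformly spaced nodes x, with
        Dirichlet values ua and ub at the two end nodes."""
        h = x[1] - x[0]
        n = len(x) - 2                                  # number of interior unknowns
        A = (np.diag(-2.0 * np.ones(n)) +
             np.diag(np.ones(n - 1), 1) +
             np.diag(np.ones(n - 1), -1)) / h ** 2
        b = f(x[1:-1])
        b[0] -= ua / h ** 2
        b[-1] -= ub / h ** 2
        u = np.empty_like(x)
        u[0], u[-1] = ua, ub
        u[1:-1] = np.linalg.solve(A, b)
        return u

    # two-level setup: coarse nodes on [-1, 1], fine patch on [-0.25, 0.25], R = 4
    R = 4
    x_c = np.linspace(-1.0, 1.0, 65)
    i_lo, i_hi = 24, 40                                 # coarse indices of the patch edges
    x_f = np.linspace(x_c[i_lo], x_c[i_hi], (i_hi - i_lo) * R + 1)
    h_f = x_f[1] - x_f[0]
    u_c, u_f = np.zeros_like(x_c), np.zeros_like(x_f)
    f = lambda x: 1.0 / np.maximum(np.abs(x), 1e-3)     # smoothed 1/|x| source term

    for sweep in range(10):
        # restriction: the patch edges coincide with coarse nodes here, so averaging
        # the fine solution onto the coarse grid reduces to copying the edge values
        u_c[i_lo], u_c[i_hi] = u_f[0], u_f[-1]
        # coarse solves on the two uncovered segments (physical BCs taken to be 0)
        u_c[:i_lo + 1] = poisson_dirichlet(x_c[:i_lo + 1], f, 0.0, u_c[i_lo])
        u_c[i_hi:] = poisson_dirichlet(x_c[i_hi:], f, u_c[i_hi], 0.0)
        # prolongation: fill fine ghost nodes one h_f outside the patch by linear
        # interpolation between the bracketing coarse nodes
        g_l = u_c[i_lo - 1] / R + u_c[i_lo] * (1.0 - 1.0 / R)
        g_r = u_c[i_hi + 1] / R + u_c[i_hi] * (1.0 - 1.0 / R)
        # fine solve, including the interface nodes, with ghost values as Dirichlet data
        x_ext = np.concatenate(([x_f[0] - h_f], x_f, [x_f[-1] + h_f]))
        u_f = poisson_dirichlet(x_ext, f, g_l, g_r)[1:-1]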
When setting up boundary conditions we must take care that the slopes
match. This is nicely described in the tutorial
"Solving Poisson's Equation using Adaptive Mesh Refinement" by
Dan Martin and Keith Cartwright (see the URL
http://barkley.me.berkeley.edu/~martin/AMRPoisson.html.)
In higher dimensions, grids communicate numerical information
in a more elaborate manner, as
shown in the following figure.
In one dimension, clustering is straightforward. In two or more dimensions,
however, the procedure gets more complicated.
In the basic Berger-Oliger-Colella algorithm each level is refined
as a single logical unit. This implies that refinements are not
aware of the breakdown of a level into grid components. As a result,
a refinement can
have more than one parent as shown in the figure.
Refinement structures cannot, in general, be represented as a tree, e.g. a quad-tree or oct-tree, such as those employed in adaptive quadrature.
A 3 level refinement structure. The base mesh is pink, and is refined by
two
blue level-2 grids. The red spots mark the location of flagged points
in the second level, which are covered by green level-3 refinements.
Note that one of the level-3 refinements has more than one parent
grid component.
Refinements are generated recursively, stopping when some user-specified criterion is met. Refinements must be properly nested; that is, there must be some padding around each refinement so that it doesn't
abut the edge of the coarse mesh. This constraint bounds the number of
levels that must be spanned when interpolating coarse cells onto
fine ghost cells, and simplifies the implementation.
The Berger-Rigoutsos
algorithm is a standard technique used to handle clustering;
it is based on an edge detection algorithm from image understanding.
We'll give a brief summary here; for details, consult Scott Kohn's Ph.D. dissertation or Berger and Rigoutsos' original article (IEEE Transactions on Systems, Man and Cybernetics, 21(5):1278-1286, 1991).
The aim of clustering is to minimally cover all points
requiring refinement. That is, we want to avoid covering
unflagged points, but we are willing to cover some of these points if the
overall cost of handling the refinement structures is reduced.
This effectively boils down to a granularity tradeoff.
At one extreme, we could exactly cover only the flagged points, but we would likely end up with many small refinements with a large communicating surface area.
At the other extreme, we could have only a few refinements, but
would do considerable unnecessary work.
Somewhere in between, we do a little bit of extra work, in exchange
for lower operating overheads.
Clustering algorithms for patch-based AMR tend to be based on a few primitives: slicing the flagged points at positions where the density of points is low; wrapping a bounding box snugly around an area of interest containing flagged points; and splitting the bounding box if the fraction of unflagged points is too high.
Using these basic operations, we can express a variety of interesting
strategies.
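Here is a much-simplified Python sketch in the spirit of these primitives (the threshold and helper names are illustrative, and the real Berger-Rigoutsos algorithm also cuts at inflection points of the signatures, not only at holes): wrap a bounding box around the flagged points, accept it if the fill fraction is high enough, and otherwise cut at a hole in the row or column signature (or bisect the longer side) and recurse.

    import numpy as np

    def cluster(flags, min_fill=0.7):
        """Cover all True entries of the 2D boolean array `flags` with a list of
        boxes ((i0, j0), (i1, j1)), half-open.  Much-simplified sketch."""
        idx = np.argwhere(flags)
        if idx.size == 0:
            return []
        # wrap a bounding box snugly around the flagged points
        (i0, j0), (i1, j1) = idx.min(axis=0), idx.max(axis=0) + 1
        sub = flags[i0:i1, j0:j1]
        if sub.mean() >= min_fill:                      # efficient enough: accept the box
            return [((i0, j0), (i1, j1))]
        # otherwise slice where the density of flagged points is low: a zero in the
        # row or column "signature" is an ideal cut; failing that, bisect the longer side
        row_sig, col_sig = sub.sum(axis=1), sub.sum(axis=0)
        cut_i, cut_j = _hole(row_sig), _hole(col_sig)
        if cut_i is None and cut_j is None:
            if sub.shape[0] >= sub.shape[1]:
                cut_i = sub.shape[0] // 2
            else:
                cut_j = sub.shape[1] // 2
        if cut_i is not None:
            boxes = cluster(sub[:cut_i, :], min_fill) + \
                    _shift(cluster(sub[cut_i:, :], min_fill), cut_i, 0)
        else:
            boxes = cluster(sub[:, :cut_j], min_fill) + \
                    _shift(cluster(sub[:, cut_j:], min_fill), 0, cut_j)
        return _shift(boxes, i0, j0)

    def _hole(sig):
        """Index of an interior zero of a 1D signature, or None if there is none."""
        zeros = np.where(sig[1:-1] == 0)[0] + 1
        return int(zeros[0]) if zeros.size else None

    def _shift(boxes, di, dj):
        return [((a + di, b + dj), (c + di, d + dj)) for (a, b), (c, d) in boxes]

    # tiny example: two well-separated blobs of flagged points
    flags = np.zeros((32, 32), dtype=bool)
    flags[2:6, 3:8] = True
    flags[20:28, 18:25] = True
    print(cluster(flags))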
In some higher-dimensional variants,
refinements may be rotated with respect to one another.
This complicates boundary condition
transfers between grids at the same level. When the grids are not rotated, these intra-level transfers are handled just like ghost cell exchanges in uniform mesh methods, except that some ghost cells will get filled in by interpolating from a coarse mesh. Note that this added complexity of intra-level transfers does not arise in one-dimensional computations.
In some applications, refinement structures evolve according
to the solution we are computing. Thus, we will discard some refined points,
and add others. Often we pad refinements to permit us to re-grid the
meshes less frequently. Clearly there is a tradeoff: if we don't pad enough, then we must regrid too often, but if we pad too much in order to regrid less frequently, we increase the cost of the numerical integration.
To refine a point we interpolate it onto a finer mesh. We never
throw away refined points that are to remain refined; we always use
the best information available. Thus we copy old refined points, where they intersect the new refinement structure, into the corresponding positions. If we threw away such points, then we would
be forced to reconstruct them by interpolation, which would pollute
the fine meshes with less accurate coarse grid data.
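A small Python sketch of this policy (1D, node-centered, with hypothetical names): the new fine patch takes its data from the old fine patch wherever the two intersect, and falls back to interpolation from the coarse grid only where no old fine data exists.

    import numpy as np

    def init_new_patch(new_lo, new_hi, old_lo, old_u, x_c, u_c, h_f):
        """Fill a new fine patch [new_lo, new_hi] (node-centered, spacing h_f).
        Copy from the old fine patch (left edge old_lo, data old_u) where the two
        intersect; interpolate from the coarse grid (x_c, u_c) everywhere else."""
        n_new = int(round((new_hi - new_lo) / h_f)) + 1
        x_new = new_lo + h_f * np.arange(n_new)
        # default: (less accurate) linear interpolation from the coarse grid
        u_new = np.interp(x_new, x_c, u_c)
        # overwrite with old fine data wherever the old patch covers a new node
        old_hi = old_lo + h_f * (len(old_u) - 1)
        inside = (x_new >= old_lo - 1e-12) & (x_new <= old_hi + 1e-12)
        old_idx = np.rint((x_new[inside] - old_lo) / h_f).astype(int)
        u_new[inside] = old_u[old_idx]
        return x_new, u_new

    # example: the old patch covered [0.0, 0.5]; the regridded patch covers [0.25, 0.75]
    x_c = np.linspace(-1.0, 1.0, 33)
    u_c = x_c ** 2
    old_u = np.linspace(0.0, 0.5, 33) ** 2
    x_new, u_new = init_new_patch(0.25, 0.75, 0.0, old_u, x_c, u_c, 0.5 / 32)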
We note that the exact details of error estimation, interpolation,
coarsening, and smoothing are problem-dependent. Grid generation, however,
is largely application-neutral since it doesn't know anything about
the problem. This tells us that library support for adaptive mesh methods
is likely to be as diverse as the problems themselves. However, by
layering the library support we may be able to reuse some software components.
It should come as no surprise that the effort required to construct even
a serial adaptive code
is substantial and requires appropriate library support. After all, we are carrying out
many of the activities that would occur in a parallel implementation:
decomposition (grid generation) and inter-grid transfers. The implementation issues
introduced on a parallel computer, e.g. load balancing and inter-processor
communication, compound the difficulties
entailed in a serial implementation, but the major intellectual leap
is in dealing with the issues of a single processor implementation.
Consider, for example, grid generation. Recall that a grid patch may
have more than one parent. This is true because error estimation
is carried out on an irregular region that makes up a level. The specific
grouping of points in the parent level isn't relevant to refinement
decisions. In fact, one grouping is as good as any other as
far as the error estimator is concerned. (Of course, there are externally
imposed constraints that remove some groupings from consideration.)
To raise the level of
abstraction from grid patch to irregular grid level we need to build
an appropriate application-level interface.
Support for adaptive mesh refinement is usually
specialized.
An important abstraction
is the IrregularGrid, which hides the
exact assignment of grid components to processors. This abstraction enables us to treat an irregularly shaped region as if it were a single logical unit, as shown in the right half of the previous figure. In particular, communication of ghost cells across grid interfaces is handled in exactly the same way as for a uniform mesh method.
We use the set difference construct as before, but interpreted
over irregular regions.
Coarse/fine interfaces are handled somewhat differently. As with fine-fine interfaces, the abstraction hides the number of processors we are running on; however, we must also carry out numerical computation as part of the transfer.
Load balancing arises because the grids have different sizes. Some processors
may obtain more than one grid, and some larger grids may
have to be split over multiple processors.
This process is complicated by
the fact that communication workloads may also be imbalanced.
In the interests of managing
software complexity, we prefer to treat load balancing
as a separate post-processing step after grid generation.
This separation of load balancing from grid generation simplifies
software design by permitting application-specific and hardware-specific knowledge to be handled separately.
To simplify the debugging process we want to constrain the load
balancer so that the points in the resultant refinement structure
will be the same
regardless of the number of processors employed. In other words,
there exists a canonical refinement structure and
load balancing decisions may neither add nor remove refined points to or
from this canonical
refinement structure, but may only regroup them.
This ensures that refinement decisions are unaffected by parallelization,
which otherwise could introduce subtle debugging problems. In a correctly
operating code, answers will differ only by roundoff, which is easier to
check than in the more general case when there may be different
refinement structures.
A variety of different approaches are taken to balancing workloads.
Perhaps the simplest approach relies on the use of uniform size refinements.
Such an approach was taken by Berger and Saltzman using Connection Machine
Fortran on the CM-2. (See Saltzman's Patch
Based Methods for Adaptive Mesh Refinement Solutions of Partial Differential
Equations (Los Alamos Report Number LA-UR-97-3595).)
Note, however, that the constraint that all refinements have the same size
imposes computation and memory overheads in the form of additional
refined points that would not be present were we to permit variable size
refinements.
Kohn observed that this overhead can be substantial: as much as 40%.
Another approach is to employ the knapsack algorithm
(see Saltzman's tech report,
pp 58-59). We start by sorting the refinements in decreasing order of size, measured in number of points. We then assign each grid, in sorted order (largest first), to the processor that is most lightly loaded at the time the grid is encountered. Finally, we swap grids between the most heavily loaded processor and another processor if doing so reduces the load imbalance. This strategy is not sensitive to communication overheads and may produce a non-optimal data layout across processors.
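Here is a minimal Python sketch of the greedy assignment step (the names are illustrative, and the final swap pass is omitted):

    import heapq

    def knapsack_balance(grid_sizes, n_procs):
        """Assign grids (workloads measured in points) to processors:
        largest grid first, each to the currently most lightly loaded processor."""
        order = sorted(range(len(grid_sizes)), key=lambda g: -grid_sizes[g])
        heap = [(0, p) for p in range(n_procs)]           # (current load, processor)
        assignment = {}
        for g in order:
            load, p = heapq.heappop(heap)                 # most lightly loaded processor
            assignment[g] = p
            heapq.heappush(heap, (load + grid_sizes[g], p))
        return assignment

    # example: seven grids of uneven sizes on three processors
    sizes = [900, 512, 512, 128, 64, 64, 32]
    assignment = knapsack_balance(sizes, 3)
    loads = [sum(sizes[g] for g in assignment if assignment[g] == p) for p in range(3)]
    print(assignment, loads)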
Kohn explores an approach based on processor affinity.
First, he subdivides any grid that exceeds the mean grid size,
and then sorts the grids by decreasing size, as in the knapsack algorithm. Each patch is assigned a processor preference based on its interactions with patches already assigned to other processors; in particular, a patch has an affinity for each processor based on the size of its intersections with that processor's patches. This technique reduces inter-level
communication costs, but does not deal with intra-level communication costs.
In addition, it does not deal with communication load imbalances.
Kohn also observed that 4 processors of an IBM/SP2 (with Power 2 chips)
delivered comparable performance to a single processor
of a Cray C90 (actually the
SP2 slightly outperformed the Cray C90). On paper, both these
configurations
deliver roughly the same peak performance: about
1 Gigaflop/second.
In practice, however, the rates can be much lower. The SP2 delivers about one-sixth this rate for adaptive mesh refinement (150 Megaflops/second), and the Cray about the same. The primary difficulties on the SP2 are
communication overhead and
poor reuse of cached data.
On the Cray the
difficulty is with short vector lengths. Adaptive
mesh refinement structures tend to be fragmented to avoid unnecessary
over-refinement. In the application considered, refinements were 10 to 30 cells on a side, which results in short vector lengths. Modern superscalar processors, in contrast, are nearly insensitive to short vector lengths.
Implementation
Error estimation and grid generation
We haven't yet said how we generate the refinements. Generally the refinement structure will change according to the evolving solution. Let's consider the initial coarse mesh uc. We apply a user-supplied error estimation procedure, which tells us where the error exceeds some pre-specified threshold. (This estimation procedure generally depends on the physical problem.) We then cluster the flagged points, filling in gaps between islands, as shown, to coalesce the points into an efficient set of refinements. We try to minimize the number of filled-in gaps, noting that some fill-in will likely reduce the overhead of processing the refinements.
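As a stand-in for the user-supplied estimator, the following Python sketch flags cells whose undivided second difference exceeds a threshold; the estimator and the threshold are illustrative assumptions, not part of the algorithm itself.

    import numpy as np

    def flag_cells(u, tol):
        """Flag cells whose undivided second difference exceeds tol; a stand-in
        for whatever problem-specific error estimator the user supplies."""
        flags = np.zeros(u.shape, dtype=bool)
        flags[1:-1] = np.abs(u[2:] - 2.0 * u[1:-1] + u[:-2]) > tol
        return flags

    # example: a function with a cusp at x = 0 is flagged only near the cusp
    x = np.linspace(-1.0, 1.0, 65)
    u = np.abs(x) * np.log(np.abs(x) + 1e-12)
    print(np.where(flag_cells(u, tol=0.01))[0])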
Parallel implementation of AMR
On a serial computer, an AMR algorithm shows a strong resemblance to the parallel SPMD implementation of a single-mesh, iterative finite difference solver.
The grid patch is the basic unit of work, and each patch contains local state
which is computed in parallel with other patches at the same level.
Grid patches communicate information across their interfaces.
Ghost cells are thus employed
by the numerical algorithm proper and are not an artifact
of parallelization.
Note that unlike the single-mesh finite difference method, AMR introduces coarse-fine grid communication, which involves computation as well.
Let's consider the simple case
of
a single coarse grid Ωcoarse refined by a single fine patch Ωfine, as shown in the left side of the following
figure:
We may express the ghost cell region as the set difference
between the fine patch including the ghost region
and the fine patch excluding the ghost region.
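In box notation, here is a small Python sketch of that set difference (the helper names are illustrative): the ghost region is grow(patch, g) minus the patch itself, and intersecting it with other boxes then tells us which pieces are filled by copies from neighboring fine grids and which must be interpolated from the coarse grid.

    # Boxes are ((lo_i, lo_j), (hi_i, hi_j)), half-open in each index dimension.
    def grow(box, g):
        (li, lj), (hi, hj) = box
        return ((li - g, lj - g), (hi + g, hj + g))

    def intersect(a, b):
        (ali, alj), (ahi, ahj) = a
        (bli, blj), (bhi, bhj) = b
        lo = (max(ali, bli), max(alj, blj))
        hi = (min(ahi, bhi), min(ahj, bhj))
        return (lo, hi) if lo[0] < hi[0] and lo[1] < hi[1] else None

    def difference(a, b):
        """The set difference a minus b, returned as a list of disjoint boxes."""
        if intersect(a, b) is None:
            return [a]
        (ali, alj), (ahi, ahj) = a
        (bli, blj), (bhi, bhj) = b
        out = []
        if ali < bli:                                   # strip below b
            out.append(((ali, alj), (bli, ahj)))
        if bhi < ahi:                                   # strip above b
            out.append(((bhi, alj), (ahi, ahj)))
        mi_lo, mi_hi = max(ali, bli), min(ahi, bhi)     # rows overlapping b
        if alj < blj:                                   # strip to the left of b
            out.append(((mi_lo, alj), (mi_hi, blj)))
        if bhj < ahj:                                   # strip to the right of b
            out.append(((mi_lo, bhj), (mi_hi, ahj)))
        return out

    fine = ((8, 8), (16, 16))
    ghost_region = difference(grow(fine, 1), fine)      # four one-cell-wide strips
    print(ghost_region)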
Performance
Adaptive mesh methods can reduce space and time requirements
by as much as two orders of magnitude. Kohn observed a 160-fold
decrease in the amount of storage required in an electronic
structure computation. In particular, he employed about 800,000 unknowns
in a computation that provided the effective resolution of a uniform 512^3 mesh.
Software
A variety of software libraries have built-in support for structured AMR. These libraries have primarily been written in C++. See the web page on adaptive mesh refinement for structured grids at http://www.camk.edu.pl/~tomek/AMRA/amr.html for further information. Libraries that support multiblock meshes (which also arise in AMR) are also available.
Acknowledgment
The parallel AMR work described here was part of the Ph.D. dissertation research of Scott Kohn, carried out while he was a Ph.D. student in the Department of Computer Science and Engineering at UCSD.
© 2002 Scott B. Baden.