As we have seen from our study of partial differential equations, diverse physical phenomena exhibit spatial and temporal locality. Among them are gravitational attraction, the Coulombic interaction of charged particles, the diffusion of energy, and the propagation of waves. These phenomena often exhibit fine structures that are unpredictable and irregular, and their accurate modeling is economically important. The turbulence that occurs in the wake of a Boeing 747, the flow of oil and water in an oil field, and the shock waves that accompany a sonic boom are all examples of such phenomena.

Irregular phenomena are often highly localized, and exhibit large derivatives
that are difficult to capture numerically. From our earlier study of finite
difference methods we
know that the error terms in a finite difference approximation are polynomials
in the mesh spacing *h*, and the coefficients are the derivatives
of the function we are computing. Up to this point, we have implicitly
assumed that the function is "nice;" that is, it has finite, bounded
derivatives of all orders. However, this is not always the case. A shock
wave emanating from an aircraft moving at supersonic speeds actually
exhibits discontinuities or jumps where, strictly speaking, the derivatives
are not defined. In other cases, we may encounter singularities due to the
presence of point charges; the potential function varies as *1/r*,
and its derivative, which varies as *1/r^2*, is unbounded as *r -> 0*.

Consider, for example, the 1D Poisson equation

    u''(x) = 1 / |x|        on the intervals [-1.0, 0.0) and (0.0, 1.0]
    u''(0) = C,             where C is an appropriately defined constant
    u(-1.0) = u(1.0) = K,   where K is an appropriately defined constant

The various derivatives of u(x) are exceedingly large at x = h, the mesh spacing, and grow without bound as h -> 0.
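As a quick numerical sketch of this model problem, we can discretize it with second-order centered differences on a uniform mesh. The constants here are arbitrary illustrative choices (C = 100, K = 0), not values fixed by the text.

```python
import numpy as np

def solve_model_problem(n=201, C=100.0, K=0.0):
    # Uniform mesh on [-1, 1] constructed so that it contains x = 0 exactly.
    h = 2.0 / (n - 1)
    x = (np.arange(n) - (n - 1) // 2) * h
    f = np.empty(n)
    nz = x != 0.0
    f[nz] = 1.0 / np.abs(x[nz])     # u''(x) = 1/|x| away from the origin
    f[~nz] = C                      # u''(0) = C (an assumed constant)
    # Tridiagonal system: (u[i-1] - 2 u[i] + u[i+1]) / h^2 = f[i]
    m = n - 2
    A = (np.diag(-2.0 * np.ones(m))
         + np.diag(np.ones(m - 1), 1)
         + np.diag(np.ones(m - 1), -1))
    rhs = h * h * f[1:-1]
    rhs[0] -= K                     # fold the boundary values into the RHS
    rhs[-1] -= K
    u = np.full(n, K)
    u[1:-1] = np.linalg.solve(A, rhs)
    return x, u
```

On this mesh the computed solution is perfectly finite, but the data (and hence the higher derivatives of u) blow up as the first interior point approaches the origin, which is exactly what defeats a uniform-mesh method.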

To capture functions that do not have "nice" derivatives we must reduce
the spacing of the finite difference mesh. Unfortunately,
we might have to refine the mesh by as much as a factor of 50 or 100
(or even more, depending on the initial data). While this requirement
may not be problematic in one dimension, it almost certainly
will be in two or more spatial dimensions. A factor of 100 increase
in the linear mesh resolution would require that the number of unknowns in
a 3-dimensional equation grow by a factor of 10^{6}, and for a time-dependent
problem we would also have to decrease the time step by a factor of 100, resulting
in a 100 million-fold increase in the computational effort!
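The arithmetic behind that 100 million-fold figure is worth spelling out:

```python
refine = 100            # linear refinement factor
spatial = refine ** 3   # unknowns in 3D grow as the cube of the linear factor
temporal = refine       # the time step shrinks by the same linear factor
total = spatial * temporal
assert spatial == 10 ** 6
assert total == 10 ** 8   # a 100 million-fold increase in work
```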

But this is only part of the story. Clearly we are wasting our effort if the solution is smooth over large portions of the problem domain. In fact, for a given amount of resources we can improve accuracy dramatically by adjusting the mesh spacing according to the estimated error of the solution, which is often measured in terms of its derivatives. Thus, a non-uniform mesh not only reduces the cost of the computation but, often more importantly, reduces memory consumption significantly.

In this lecture we will discuss an adaptive mesh refinement algorithm
that was developed by Berger and Oliger (* J. Computational Physics,*
53:484-512, 1984) and later refined by Berger and
Colella (* J. Computational Physics,* 82(1):64-84, May 1989.)
It is based in part on Bolstad's earlier
work in one dimension.
This algorithm is known by various names, including
the "Berger-Oliger-Colella"
algorithm or "patch-based AMR."
Another important type of AMR, based on quad- or oct-trees,
won't be discussed here; we'll consider tree-like refinement
structures later in this lecture, in the context of particle methods.

Here is a picture of a not-so-nice function. On close examination, the mesh spacing varies according to an estimate of the error. Note how the mesh points are spaced close together in the vicinity of the cusp.

Two issues arise in implementing a non-uniform mesh

- How do we decide on the local mesh spacing?
- How do we handle a point that sees different spacings for its left and right neighboring points?

Our solution to #1 entails two sub-steps:

- Error estimation
- Grid generation

Our solution to #2 entails some special fix-up procedures to ensure that the solution seen at such transitional points is consistent as viewed by the two neighboring points. In the following mesh, for example,

    o--------o--------o----o----o
    x-4h     x-2h     x    x+h  x+2h

we want the slope (derivative) of the solution at *x* to be consistent whether it is computed from the coarse points to its left or the fine points to its right.

In structured AMR we don't represent each point on the mesh with its
own value of * h*. Rather, we form a composite
mesh organized into levels, each succeeding level with a finer spacing
than the previous one. This is shown in the following figure

The spacing at each level is uniform, and the refinement factor * R *
gives us the ratio between the mesh spacings at adjacent levels.
Usually * R * is a constant; typical values are two or four.
(* R * is usually restricted to a power of 2).

In general we will have several levels. We may fix the number of levels
arbitrarily, or we may simply add levels until we run out of memory.
For the time being, we consider a 2-level hierarchy comprising a fine
grid u_{f} and a coarse grid u_{c}:
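As a concrete (and purely hypothetical) container for such a 2-level hierarchy, consider a coarse 1D grid over [0, 1] with a fine patch covering coarse cells [lo, hi) at refinement factor R. The names and layout below are illustrative, not the Berger-Oliger data structures:

```python
import numpy as np

class TwoLevelMesh:
    # A minimal 2-level composite mesh sketch: coarse grid u_coarse over
    # [0, 1], fine patch u_fine refined by factor R over coarse cells [lo, hi).
    def __init__(self, n_coarse, lo, hi, R=2):
        self.h_coarse = 1.0 / n_coarse        # uniform coarse spacing
        self.h_fine = self.h_coarse / R       # fine spacing is R times smaller
        self.lo, self.hi, self.R = lo, hi, R
        self.u_coarse = np.zeros(n_coarse + 1)        # node-centered values
        self.u_fine = np.zeros((hi - lo) * R + 1)     # values on the patch
```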

Now, how do we solve on the composite structure? (The procedure is usually
accelerated by multigrid, which we leave out to simplify the discussion.)
For an elliptic equation it turns out that *u* is defined only
at the finest level appearing in the representation. Thus, we don't represent
u_{c} at positions covered by u_{f}.

To compute on the composite mesh we first compute on the
base mesh u_{c}. We are careful to exclude the points covered
by u_{f}. Thus, we must solve Poisson's equation for two separate right-hand
sides, to the left and to the right of the overlapping refined region.
Each equation takes Dirichlet boundary conditions. The "left" equation
takes its left boundary condition from the physical boundary conditions.
The right boundary condition is obtained from the fine mesh; we average
two adjacent fine mesh cells onto the coarse mesh. This extra coarse
mesh cell is like a ghost cell, except that we must also carry out
numerical computation (this is identical to the multigrid *restriction*
operation).
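In 1D with a refinement factor of 2, the averaging of adjacent fine cells onto the coarse mesh (the restriction just mentioned) is simply:

```python
import numpy as np

def restrict(u_fine):
    # Average adjacent pairs of fine cells onto the underlying coarse cells
    # (refinement factor R = 2); this is the multigrid restriction operation.
    u_fine = np.asarray(u_fine, dtype=float)
    assert len(u_fine) % 2 == 0, "expects an even number of fine cells"
    return 0.5 * (u_fine[0::2] + u_fine[1::2])
```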

We next solve on *u_{f}*. Generally, its boundary conditions must be interpolated from the coarse mesh.

When setting up boundary conditions we must take care that the slopes match. This is nicely described in the tutorial "Solving Poisson's Equation using Adaptive Mesh Refinement" by Dan Martin and Keith Cartwright (see the URL http://barkley.me.berkeley.edu/~martin/AMRPoisson.html).
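The simplest version of this interpolation, shown below purely as a sketch, fills a fine-mesh ghost value linearly from the two bracketing coarse values. Production AMR codes, including the scheme described in the Martin-Cartwright tutorial, use higher-order interpolation so that slopes match at the coarse-fine interface; linear is shown only for brevity.

```python
def fill_fine_ghost(uc_left, uc_right, frac):
    # Linear interpolation of two coarse values to a fine ghost location.
    # frac in [0, 1] is the ghost point's fractional position between the
    # two bracketing coarse points. (A sketch; real AMR codes interpolate
    # to higher order to match slopes across the interface.)
    return (1.0 - frac) * uc_left + frac * uc_right
```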

In higher dimensions, grids communicate numerical information
in a more elaborate manner, as
shown in the following figure.
This figure shows a coarsening * inter-level* communication
operation in which
grids communicate coarsened data to their parent. Notice how the fine
grids cast shadows on the coarse grids, and that the coarse grids
break these shadows into subregions. This bookkeeping is tedious
and must be written carefully.
Grids also communicate within a level, which is
known as *intra-level* communication. This entails
exchanging ghost cell information, as in `rb2D`. We note, however,
that owing to the presence of coarse-fine interfaces, grids must also employ
inter-level communication to fill in ghost cells.

In one dimension, clustering is straightforward.
In two and more dimensions,
however, the procedure gets more complicated.
In the basic Berger-Oliger-Colella algorithm each level is refined
as a single logical unit. This implies that refinements are not
aware of the breakdown of a level into grid components. As a result,
a refinement can
have more than one parent, as shown in the figure.
Refinement structures therefore cannot generally be represented as a tree,
e.g. a quad-tree or oct-tree, such as are employed in *adaptive quadrature*.

** A 3 level refinement structure. The base mesh is pink, and is refined by
two
blue level-2 grids. The red spots mark the location of flagged points
in the second level, which are covered by green level-3 refinements.
Note that one of the level-3 refinements has more than one parent
grid component.**

Refinements are generated recursively, stopping when some user-specified
criterion is met. Refinements must be *properly nested*; that is,
there must be some padding around each refinement so that it doesn't
abut the edge of the coarse mesh. This constraint bounds the number of
levels that must be spanned when interpolating coarse cells onto
fine ghost cells, and simplifies the implementation.

The Berger-Rigoutsos
algorithm is a standard technique used to handle clustering;
it is based on an edge detection algorithm from image understanding.
We'll give a brief summary here; for the details,
consult Scott Kohn's
Ph.D. dissertation
or Berger and Rigoutsos' original article
(*IEEE Transactions on Systems, Man, and Cybernetics,* 21(5):1278-1286, 1991).
The aim of clustering is to minimally cover all points
requiring refinement. That is, we want to avoid covering
unflagged points, but we are willing to cover some of them if the
overall cost of handling the refinement structures is reduced.
This effectively boils down to a granularity tradeoff.
At one extreme, we could exactly cover only the flagged points,
but we would likely have many small refinements, with a large
communicating surface area.
At the other extreme, we could have only a few refinements, but
would do considerable unnecessary work.
Somewhere in between, we do a little bit of extra work in exchange
for lower operating overheads.

Clustering algorithms for patch-based AMR tend to be based
on a few primitives: *slicing* the set of flagged points
at positions where the density of points is low;
*wrapping* a bounding box snugly around
an area of interest so that it fits the flagged points;
and *splitting* the bounding box
if the fraction of unflagged points is too high.
Using these basic operations, we can express a variety of interesting
strategies.
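These primitives can be sketched in 2D as follows, with flagged points held in a set of (i, j) cells. This is a simplified, hypothetical rendering: it wraps a bounding box, and if the box's fill fraction is too low, cuts at the interior minimum of the "signature" (the count of flagged cells along each coordinate) and recurses. The actual Berger-Rigoutsos algorithm also cuts at inflection points of the signature; only minimum cuts are shown here.

```python
def bounding_box(flags):
    # Wrap: smallest box (i0, j0, i1, j1) containing all flagged cells.
    xs = [i for i, _ in flags]
    ys = [j for _, j in flags]
    return min(xs), min(ys), max(xs), max(ys)

def cluster(flags, min_fill=0.7):
    # Recursively split boxes until each covers flagged cells at a
    # fraction of at least min_fill (the granularity tradeoff knob).
    flags = set(flags)
    if not flags:
        return []
    i0, j0, i1, j1 = bounding_box(flags)
    area = (i1 - i0 + 1) * (j1 - j0 + 1)
    if len(flags) / area >= min_fill or area == 1:
        return [(i0, j0, i1, j1)]          # box is full enough; accept it
    # Split the longer axis at the interior minimum of its signature.
    if i1 - i0 >= j1 - j0:
        sig = [sum(1 for (i, _) in flags if i == k) for k in range(i0, i1 + 1)]
        cut = i0 + min(range(1, len(sig)), key=lambda k: sig[k])
        left = {(i, j) for (i, j) in flags if i < cut}
    else:
        sig = [sum(1 for (_, j) in flags if j == k) for k in range(j0, j1 + 1)]
        cut = j0 + min(range(1, len(sig)), key=lambda k: sig[k])
        left = {(i, j) for (i, j) in flags if j < cut}
    return cluster(left, min_fill) + cluster(flags - left, min_fill)
```

On a flag set with two well-separated clumps, this produces one snug box per clump rather than a single sparse box around both.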

In some higher-dimensional variants,
refinements may be rotated with respect to one another.
This complicates boundary condition
transfers between grids at the same level. When the grids
are not rotated, these *intra-level* transfers are handled
just like ghost cells in uniform mesh methods, except that some ghost
cells will get filled in by interpolating from a coarse mesh. Note that
this added complexity of intra-level transfers does not arise in
one-dimensional computations.

In some applications, refinement structures evolve according to the solution we are computing. Thus, we will discard some refined points, and add others. Often we pad refinements to permit us to re-grid the meshes less frequently. Clearly there is a tradeoff: if we don't pad enough, then we must re-grid too often, but if we pad too much in order to re-grid less frequently, we increase the computational cost of numerical integration.

To refine a point we interpolate it onto a finer mesh. We never throw away refined points that are to remain refined; we always use the best information available. Thus we copy old refined points where they intersect, into corresponding positions of the new refinement structure. If we threw away such points, then we would be forced to reconstruct them by interpolation, which would pollute the fine meshes with less accurate coarse grid data.

We note that the exact details of error estimation, interpolation, coarsening, and smoothing are problem-dependent. Grid generation, however, is largely application-neutral, since it need not know anything about the problem. This tells us that library support for adaptive mesh methods is likely to be as diverse as the problems themselves. However, by layering the library support we may be able to reuse some software components.

We may express the ghost cell region as the set difference between the fine patch including the ghost region and the fine patch excluding the ghost region.
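This set difference is easy to express with patches represented as Python sets of (i, j) cell indices; the names below are illustrative:

```python
def grow(patch, nghost=1):
    # The fine patch including its ghost region: expand by nghost cells
    # in every direction.
    return {(i + di, j + dj)
            for (i, j) in patch
            for di in range(-nghost, nghost + 1)
            for dj in range(-nghost, nghost + 1)}

def ghost_region(patch, nghost=1):
    # Ghost cells = (patch including ghosts) \ (patch excluding ghosts).
    return grow(patch, nghost) - set(patch)
```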

It should come as no surprise that the effort required to construct even
a * serial * adaptive code
is substantial and requires appropriate library support. After all, we are carrying out
many of the activities that would occur in a parallel implementation:
decomposition (grid generation) and inter-grid transfers. The implementation issues
introduced on a parallel computer, e.g. load balancing and inter-processor
communication, compound the difficulties
entailed in a serial implementation, but the major intellectual leap
is in dealing with the issues of a single processor implementation.

Consider, for example, grid generation. Recall that a grid patch may have more than one parent. This is true because error estimation is carried out on the irregular region that makes up a level. The specific grouping of points in the parent level isn't relevant to refinement decisions. In fact, one grouping is as good as any other as far as the error estimator is concerned. (Of course, there are externally imposed constraints that remove some groupings from consideration.)

To raise the level of abstraction from grid patch to irregular grid level we need to build an appropriate application level interface. Support for adaptive mesh refinement is usually specialized.

An important abstraction
is the `IrregularGrid`, which hides the
exact assignment of grid components to processors. This abstraction
enables us to treat an irregularly shaped region as if it were a single
logical unit, as
shown on the right half of the previous figure. In particular, communication of ghost cells across
grid interfaces is handled
in exactly the same way as for a uniform mesh method.
We use the set difference construct as before, but interpreted
over irregular regions.
Coarse-fine interfaces are handled somewhat differently. As with
fine-fine interfaces, we don't know the number of processors we are running on;
however, we must also carry out computation.

Load balancing arises because the grids have different sizes. Some processors may obtain more than one grid, and some larger grids may have to be split over multiple processors. This process is complicated by the fact that communication workloads may also be imbalanced. In the interests of managing software complexity, we prefer to treat load balancing as a separate post-processing step after grid generation. This separation of load balancing from grid generation simplifies software design by permitting application and hardware-specific knowledge to be handled separately.

To simplify the debugging process we want to constrain the load balancer so that the points in the resultant refinement structure will be the same regardless of the number of processors employed. In other words, there exists a canonical refinement structure and load balancing decisions may neither add nor remove refined points to or from this canonical refinement structure, but may only regroup them. This ensures that refinement decisions are unaffected by parallelization, which otherwise could introduce subtle debugging problems. In a correctly operating code, answers will differ only by roundoff, which is easier to check than in the more general case when there may be different refinement structures.

A variety of different approaches are taken to balancing workloads.
Perhaps the simplest approach relies on the use of uniform size refinements.
Such an approach was taken by Berger and Saltzman using Connection Machine
Fortran on the CM-2. (See Saltzman's *Patch
Based Methods for Adaptive Mesh Refinement Solutions of Partial Differential
Equations* (Los Alamos Report Number LA-UR-97-3595).)
Note, however, that the constraint that all refinements have the same size
imposes computation and memory overheads in the form of additional
refined points that would not be present were we to permit variable-size
refinements.
Kohn observed that the cost of these extra points can be
substantial, as much as 40%.

Another approach is to employ the knapsack algorithm (see Saltzman's tech report, pp. 58-59). We start by sorting the refinements in decreasing order of size, measured in number of points. We then assign each grid, in that order, to the processor that is most lightly loaded at the time the grid is encountered. Finally, we swap grids between the most heavily loaded processor and another processor if doing so improves the load imbalance. This strategy is not sensitive to communication overheads and may produce a non-optimal data layout across processors.
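The greedy phase of this knapsack heuristic (the final swap phase is omitted here) can be sketched with a min-heap of processor loads; grid sizes are in numbers of points:

```python
import heapq

def knapsack_assign(grid_sizes, nprocs):
    # Greedy bin-packing: largest grids first, each to the currently
    # most lightly loaded processor.
    loads = [(0, p) for p in range(nprocs)]    # min-heap of (load, processor)
    heapq.heapify(loads)
    assignment = {}
    for g, size in sorted(enumerate(grid_sizes), key=lambda t: -t[1]):
        load, p = heapq.heappop(loads)
        assignment[g] = p
        heapq.heappush(loads, (load + size, p))
    return assignment
```

As the text notes, this balances computation only; it knows nothing about which grids communicate with which, so the resulting layout may incur avoidable communication cost.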

Kohn explores an approach based on *processor affinity*.
First, he subdivides any grid that exceeds the mean grid size,
and then sorts the grids in decreasing order of size, as in the knapsack
algorithm. Each patch is assigned a processor preference based on its
interactions with patches assigned to other processors. In particular,
a patch will have various affinities to other processors, based
on the size of the intersections. This technique reduces inter-level
communication costs, but does not deal with intra-level communication costs.
In addition, it does not deal with communication load imbalances.

Kohn also observed that 4 processors of an IBM/SP2 (with Power 2 chips) delivered comparable performance to a single processor of a Cray C90 (actually the SP2 slightly outperformed the Cray C90). On paper, both these configurations deliver roughly the same peak performance: about 1 Gigaflop/second. In practice, however, the rates can be much lower. The SP2 delivers about one-sixth this rate for adaptive mesh refinement (150 Megaflops/second), and the Cray about the same. The primary difficulties on the SP2 are communication overhead and poor reuse of cached data. On the Cray the difficulty is with short vector lengths: adaptive mesh refinement structures tend to be fragmented to avoid unnecessary over-refinement, and in the application considered, refinements were 10 to 30 cells on a side, resulting in short vector lengths. Modern superscalar processors are nearly insensitive to short vector lengths.

© 2002 Scott B. Baden. Last modified: 01/27/02 09:24 PM