## CSE 260 Notes for Lecture 5 (10/8/09)

### Boundary value problems

Many physical problems are modeled with differential equations, and today we'll consider a simple one-dimensional problem that is the starting point for understanding the higher dimensional cases that arise in practice. (1D problems are often used to study higher dimensional problem, because they are simpler and less expensive to solve numerically.)

A variety of methods are used to solve differential equations, and here we'll look at an iterative method used to solve the boundary value problem given by the following ordinary differential equation:

u''(x) = 8                                                 (ODE)

We will solve this equation on the unit interval [0,1] subject to boundary conditions

u(0) = u(1) = 0

We will use the method of finite differences to solve the problem; we discretize the solution u by considering the values of u only at discrete points spaced regularly along the unit interval.

We divide the unit interval into N-1 subintervals using N points positioned uniformly along the line at positions x=i×h, i=0, 1, 2, ..., N-1. Let u[i] represent the value of the approximate solution at point x = i×h, u(i×h), where h = 1/(N-1) and is known as the mesh spacing

```               i=0   i=1   i=2			     i=N-1
|-----|-----|------| ..... |-----|-----|
^	  			       ^
|				       |
x=0   x=h  x=2h		       x=1-h  x=1
```

We use a Taylor series to approximate the second derivative u''(x) on the left hand side of equation (ODE) as

u''(i*h) ~ ( u[i-1] - 2u[i] + u[i+1] )/h2

Setting this equation equal to the right hand side of equation (ODE), and rearranging terms, we come up with the following iterative algorithm, known as Gauss-Seidel method.

```    repeat {
for i = 1 to N-1
u[i]= (u[i-1]  + u[i+1] - 8*h*h ) / 2
}
until ( Error(u,uExact) < ε);
```

where uExact[ ] contains the exact solution.

The method updates each value of u[i] as a function of its nearest neighbors:

```     u[i-1]         u[i]        u[i+1]
|-----------|-----------|
x = (i-1)*h      i*h        (i+1)*h
```

It can be shown that the Gauss-Seidel method will converge an approximate solution to u. The smaller we make h the more accurate our answer will be, but at a greater cost, since N increases as h-1. If you are interested in the details, consult Numerical Recipes in C, 2nd Ed., by Press et al., Cambridge University Press, which is also available on line, or a good book on numerical analysis such as Numerical Methods, by Dahlquist and Bjorck, Prentice Hall.

The values of u[i] when i is 0 or N-1 are boundary data, and are given as part of the problem data; these are defined to be 0 in our case. The evolving solution is represented on the interior N-2-element sub-array, u[1] through u[N-2]. Beginning with an initial guess we update the solution according to the right hand side of the equation (ODE) until we are satisfied with the answer. We are finished computing when the error drops below the user-specified value ε. It can be shown that the solution will converge in O(N2) iterations. The more accurate the answer required, the greater the cost. The cost of computing the solution increases with N and with decreasing tolerance.

In general we do not know the exact solution and we must approximate the error (For simple linear ODEs like ours, the problem has a known exact solution. In our case it is u(x) = 4x(x-1). A simple way of estimating the error is to measure the change in the solution from one iteration to the next. When the change to the solution is small enough, we are done (This is not necessarily the best way to compute the error, but will do for now.)

How do we measure the change in the solution? One way is to take the absolute value of the point wise difference in the solution, and report the error as the maximum value of these absolute differences. That that is, for each point of the computed solution

• take the difference between corresponding points of the current value of the solution, and the solution computed at the previous iteration
• take the absolute value of the result, and
• report the error as the maximum value over all these absolute differences.
• Another possible metric is to compute the Euclidean distance between successive iterations. That is, we treat the old and new solution as n-element vectors an compute the Euclidean distance between them:√(∑(u[i]-uprev[i])2)

One last detail remains. What values should we choose for the initial values of the solution array u[ ]? We usually come up with some reasonable guess to the solution; the better the guess the more quickly the answer will converge, and fewer iterations will be required. Coming up with a good guess depends on the problem. In our case we will choose the guess to be a solution of all ones, except for the boundary points which will be set to zero as described above.

### Jacobi's method

The trouble with the above numerical algorithm is that it doesn't parallelize. Each value of u[i] depends on all values of u[ ] to the left and to the right:

```for  i in 1 : N-1
u[i]= (u[i-1] + u[i+1] - 8*h*h) / 2
end for
```

If we executed the above loop on a PRAM, then execution would take N-1 steps.

One way to eliminate the problem is to rename the left hand side of the update formula. We maintain both and old and new meshes, storing updates into the new mesh. This results in an algorithm known as Jacobi's method.

It can be shown the the Jacobi method will eventually reach (converge to) the exact solution:

```repeat s
for i in 1:N-2
ui(s) = (ui-1(s-1) + ui+1(s-1) )/ 4
end for
until |u(s) - u(s-1)| is small enough

```

where ui(s-1) is the value of the mesh at (i) at iteration s-1. Each value in the solution is the average of the 2 nearest neighbors in the "old" mesh. After each iteration we swap old and new meshes.

#### Red black ordering

Another approach is to divide the unknowns into two groups---red and black (sometimes called odd and even) then we can decouple the points into disjoint sets, and update them in any desired order.

```	@ -> boundary

@----o----o----o----o----o----o----o----o----o----o----@
```

The algorithm is as follows:

```   repeat
// Update red points
for i in 1:N-2, i is even
u[i] = (u[i-1] + u[i+1] - 8*h*h ) / 2
endfor

// Update black points
for i in 1:N-2, i is odd
u[i] = (u[i-1] + u[i+1] - 8*h*h ) / 2
endfor
until error is small enough.
```

With Red/Black ordering the solution may converge more quickly than with the usual ordering. However, the computation within the inner loop is a bit more expensive since it must access (read) each point in the solution twice per sweep (Why is this so?). However, the benefit of this algorithm is that it parallelizes. This technique valuable in removing data dependences in a variety of algorithms.

### SPMD Parallel implementation

We can parallelize Rb by dividing the points into groups, one to a processor. Each group consists of N/P points, assuming that P divides N evenly and P is the number of processors.

To update each point, we need the values to the immediate left and right. However, we don't have the values we need at the edge of each processor's sub interval. These are stored on off-processor. To get these values we must send and receive messages. To this end, it is customary to add one more point on each side of the local sub-interval to hold these values, which are called ghost cells Thus, each processor actually holds N/P + 2 values: the N/P values representing the solution, and the 2 neighboring values.

```@ -> boundary
G -> ghost cell

P0:      @----o----o----o----o----G
P1:                          G----o----o----o----o----G
P2:                                              G----o----o----o----@

```

When checking for convergence, each processor will compute a local error. However, all these local errors must be combined into a global error. To do this, we let each processor send  a message to processor 0, which forms the global some.

### Principles of building an effective program

To build a parallel program we carry out a number of steps on the path to a finely tuned implementation. While the details of the steps generally depend on the application all applications must deal with the following issues:

Algorithm selection

Is there a more appropriate numerical algorithm for solving the problem in parallel, which uses the resources more effectively than the existing approach?

Decomposition

How do we subdivide the data into parts?

Orchestration

How do the processors organize themselves, handling communication and synchronization?

Mapping

How do we map (assign) the data to processors?

In our application, decomposition is straightforward. We simply split the one-dimensional interval into subintervals one for each processor. Mapping is also straightforward since each processor gets just one piece of work. To handle orchestration, we pass messages. The algorithm we used, Gauss-Seidel's method with Red/Black ordering, is designed with a parallel processor in mind.

We'll consider the more challenging (and interesting) multidimensional case in class.