CSE 250A LECTURE NOTES

Wednesday, January 17, 2001
 

ANNOUNCEMENTS

I have to cancel the class for next Monday, January 22, but we will have a makeup lecture this Friday, January 19.  See the class web page for some calendar changes later in the quarter, and for when assignments will be due.
 
 

THE IMPORTANT THEOREM ABOUT A*

Theorem:
(a) Optimality: the first path to a goal found by A* is an optimal path to a goal.
(b) Completeness: if a path to a goal exists, then A* will find it.
(c) Efficiency: no algorithm that is optimal and complete and  has only g and h available as knowledge will always expand  fewer nodes than A*.
Proof of optimality:  By definition, the next node expanded by A* is always one of the remaining unexpanded nodes that has the lowest f = g+h value (maybe the equal-lowest f value).

When A* expands a goal node, it first checks and discovers that it is indeed a goal node.  Now the h value of a goal node is zero, so its f value is its g value, which is exactly the cost of the path to this node.  Call this cost c*.

So at the instant that A* finds a goal node, every other node in the fringe, say n, must have f(n) >= c*.  For each such node n, f(n) is an under-estimate of the true cost c' of reaching the cheapest goal node reachable through n.  Therefore every other goal node has cost c' >= f(n) >= c*.  This proves that A* is optimal.

Proof of completeness: Unless there are infinitely many nodes that have f value less than the optimal cost c*, A* will eventually expand all these nodes with lower f value and arrive at a solution.  There cannot be infinitely many such nodes if (a) all nodes have finite branching factor and (b) the cost of each edge is greater than some epsilon > 0.

Proof of efficiency (idea only!): Consider an algorithm that uses the same information, i.e. the g and h functions, but does not visit all nodes with sub-optimal f value.  Then we can construct an adversary who chooses the true costs so that one of the unvisited nodes is actually the optimal solution.
 
 

REMARKS ON A*

A* expands all nodes with f(n) = g(n) + h(n) < c* and some nodes with f(n) = c*, where c* is the minimum cost of any goal node.  For a given search problem, g(n) and c* are fixed.  Everything else being fixed, if h(n) is bigger then A* explores fewer nodes.  In the limit where h(n) is perfect, A* explores only nodes along optimal path(s).  (If there is more than one such path, we would like to explore only one such path, but A* would need a tie-breaker for choosing which node to remove from the fringe in order to guarantee this.)

The proof of optimality depends on the heuristic function h being admissible, but not on monotonicity.  The function f(n) is monotonic if f never decreases along any path from the root.  Combined with admissibility, this assumption says that f = g+h does not become less accurate as an estimate of the total cost as you move towards a goal.

Let's draw the state space that A* explores, and let's draw contours of equal value of f(n).  If the function f is monotonic, then the contours are properly nested.

What is called optimality here is an "offline" property, while efficiency is an "online" property.  Optimality is sometimes called admissibility, in which case efficiency is sometimes called optimality.

Like all variants of the general search algorithm, A* explores a tree of nodes, where multiple nodes may correspond to the same real-world state.  Some variants of the general search algorithm use a data structure called the closed list to remember already-visited nodes.  Each node discovered through expansion is then compared against all nodes on the closed list to detect whether the new node corresponds to an already-visited state.  Using this terminology, the fringe is called the open list.
 
 

FINDING HEURISTIC FUNCTIONS

The practical usefulness of A* depends entirely on having a good heuristic function.  Consider the 8 puzzle and two heuristic functions: Both h1 and h2 always underestimate the number of tiles that must be moved to reach a solution, so they are both admissible.  (Each move affects only one tile, so summing over each tile considered separately is legitimate for h2.)

For all states of the 8 puzzle, h1(n) <= h2(n).  Hence h2 is a better estimator than h1.  See Table 4.8 on page 103 for statistics on the effectiveness of these heuristics.

If one cannot guarantee that a heuristic function h is always an under-estimator, then A* is not guaranteed to find a shortest path.  However it can be proved that if h is "close to" an under-estimator, then A* finds a path that is "close to" being shortest.
 
 

HOW TO INVENT ADMISSIBLE HEURISTICS

Take the original problem and relax some of the demands so that it is possible to solve the easier problem without search.  Use the cost of the solution to the relaxed problem as the underestimate of the actual solution cost.

For the 8-puzzle, the rule is that we can move a tile from position x to position y if (1) x and y are adjacent, and (2) y is blank.  Heuristic h1 ignores both constraints, while heuristic h2 ignores the second constraint.

This relaxation idea above is an example of what is called meta-level knowledge.  It is knowledge about other knowledge.  Meta-level knowledge can be the most valuable type of knowledge, because it is the most general-purpose.

Many AI programs have similar architectures: they consist of a fixed reasoning algorithm based on search, which is guided by problem-specific information represented in some explicit fashion.  In this case the algorithm implementation can be called an inference engine and the problem-specific information is called a knowledge base.

Different inference engines work with different levels of knowledge.  The output of one AI program can be knowledge for use by another program.
 
 

"EFFECTIVE" BRANCHING FACTOR

We can measure the performance of a heuristic function and empirically predict the running time of a search algorithm by computing the effective branching factor of search using this algorithm/heuristic combination.

Suppose the solution to problem P is found by algorithm A at depth d, after expanding k nodes.

Definition: The effective branching factor b(P,A) is the solution for b of the equation k = 1 + b + ... + b^d = (b^(d+1) - 1)/(b - 1).  This is approximately the dth root of k since k = O(b^d).

Effective branching factor is the best way to compare a (P,A) combination.  Percentage speedup is meaningless when comparing algorithms that take exponential time.

Any algorithm that takes time polynomial in d, say O(d^m) where m is fixed, will have effective branching factor tending to 1 as d tends to infinity.  Proof: log b = log (d^m)^(1/d) = (m/d) log d = m (log d)/d.  Now (log d)/d -> zero from above as d -> infinity and m is constant`, so log b -> 0 from above as d -> infinity, and therefore b -> 1 from above.
 
 

CONSTRAINT SATISFACTION PROBLEMS (CSPs)

A CSP is a state-space search problem where each state consists of a value for each of a set of variables.  A goal state is one that satisfies a conjunction of constraints.

We will restrict attention to CSPs with a finite number of variables each with a finite (hence discrete) domain.  Some examples are:

A constraint is a legal subset of the cross-product of the domains of a certain number of variables.  The arity of a constraint is the number of variables involved.

Map coloring and the n-queens problem both involve only binary constraints, and much research has only considered binary constraints.  The fact that one version of the map-coloring problem is also NP-complete shows that handling binary constraints is not fundamentally easier.
 
 



Copyright (c) by Charles Elkan, 2001.