CSE 150 LECTURE NOTES

Wednesday October 6, 2004
 
 

ANNOUNCEMENTS 

The first assignment is due in just one week, on October 13.  You may work by yourself, or with one partner.  Be sure to watch the class web page for changes and clarifications.

Today's handout is a summary of questins and answers about last year's version of this assignment.  Important note:  This year, you must also do the missionaries-and-cannibals program using A*, but you only need one heuristic for the robot navigation problem.  You must generate random problem instances only for the eight-puzzle.

Our discussion board is up and running at http://webct.ucsd.edu/.  Let's have more traffic!


THE IMPORTANT THEOREM ABOUT A*

Theorem:
(a) Optimality: the first path to a goal found by A* is an optimal path to a goal.
(b) Completeness: if a path to a goal exists, then A* will find it.
(c) Efficiency: no algorithm that is optimal and complete and  has only g and h available as knowledge will always expand  fewer nodes than A*.
Proof of optimality:  By definition, the next node expanded by A* is always one of the remaining unexpanded nodes that has the lowest f = g+h value (maybe the equal-lowest f value).

When A* expands a goal node, it first checks and discovers that it is indeed a goal node.  Now the h value of a goal node is zero, so its f value is its g value, which is exactly the cost of the path to this node.  Call this cost c*.

So at the instant that A* finds a goal node, every other node in the fringe, say n, must have f(n) >= c*.  For each such node n, f(n) is an under-estimate of the true cost c' of reaching the cheapest goal node reachable through n.  Therefore every other goal node has cost c' >= f(n) >= c*.  This proves that A* is optimal.

Proof of completeness:  Suppose there are a finite number of nodes with f value less than the optimal cost c*.  Then A* will eventually expand all these nodes with lower f value and arrive at a solution.  Can there be infinitely many such nodes?  No, if (a) all nodes have finite branching factor and (b) the cost of each edge is greater than some epsilon > 0.

Proof of efficiency (idea only!):  Consider a different algorithm that uses the same information, i.e. the g and h functions, but does not visit all nodes with sub-optimal f value.  Then we can construct an adversary who chooses the true costs so that one of the unvisited nodes is actually the optimal solution.


REMARKS ON A* AND THE PROOF

(1) A* expands all nodes n with f(n) = g(n) + h(n) < c* and some nodes with f(n) = c*, where c* is the minimum cost of any goal node.  For a given search problem, g(n) and c* are fixed.  Everything else being fixed, if h(n) is bigger then A* explores fewer nodes.  In the limit where h(n) is perfect, A* explores only nodes along optimal path(s).  (If there is more than one such path, we would like to explore only one such path, but A* would need a tie-breaker for choosing which node to remove from the fringe in order to guarantee this.)

(2) The proof of optimality depends on the heuristic function h being admissible, but not on monotonicity.  The function f(n) is monotonic if f never decreases along any path from the root.  Combined with admissibility, this assumption says that f = g+h does not become less accurate as an estimate of the total cost as you move towards a goal.

(3) Let's draw the state space that A* explores, and let's draw contours of equal value of f(n).  If the function f is monotonic, then the contours are properly nested.

 

"EFFECTIVE" BRANCHING FACTOR

We can measure the performance of a heuristic function and empirically predict the running time of a search algorithm by computing the effective branching factor of search using this algorithm/heuristic combination.

Suppose the solution to problem P is found by algorithm A at depth d, after expanding k nodes.

Definition: The effective branching factor b(P,A) is the solution for b of the equation k = 1 + b + ... + b^d = (b^(d+1) - 1)/(b - 1).  This is approximately the dth root of k since k = O(b^d).

Effective branching factor is the best way to compare a (P,A) combination.  Percentage speedup is meaningless when comparing algorithms that take exponential time.

Any algorithm that takes time polynomial in d, say O(d^m) where m is fixed, will have effective branching factor tending to 1 as d tends to infinity.  Proof: log b = log (d^m)^(1/d) = (m/d) log d = m (log d)/d.  Now (log d)/d -> zero from above as d -> infinity and m is constant`, so log b -> 0 from above as d -> infinity, and therefore b -> 1 from above.


NOTES ON NOT EXPANDING THE SAME STATE REPEATEDLY

This issue is often called "eliminating duplicates."  We have mentioned it, but not discussed it directly.

Definition:  A duplicate is a node that contains the same state as some previously generated node.

Question:  Is it enough to look for duplicates in the "fringe" only?  Answer:  No; we also must check for duplicates among nodes already expanded, hence removed from the fringe; otherwise we miss some duplicates.  Reason:  Suppose state s1 leads to s2 leads to s1.  The second time we create a node with state s1, the first node with state s1 will no longer be in the fringe.  So if we check for duplicates inside the fringe only, we won't realize that the new node with s1 is  a duplicate.

The nodes that have already been expanded (hence removed from the fringe) are often called the "closed" list.  The fringe is then called the "open" list; see Section 3.5 of the book.  Using the "open" and "closed" lists gives a variant of the "general search algorithm" called "graph-search." 

Question:  Does Figure 4.8 use duplicate elimination?  Answer:  Yes, because the last paragraph before "Memory-bounded heuristic search" on page 101 indicates that A* means "A* with graph-search" for the authors.

Question:  When we eliminate duplicates, which g and h values should we use for the node we keep?  The duplicates may have different g and/or h values!  (This issue is discussed on page 99 to motivate the concept of monotonicity.)

Answer:  Let m and n be duplicate nodes, i.e. both containing the same state s.  We might as well take the best path from the initial state to s, so we should take whatever g-value is smaller.  Assume that h(n) is a deterministic function of states.  (This is the usual case.)  Then h(n) will be the same for all nodes representing the same state; these nodes can differ in g(n) only.

With monotonicity, a newly generated node must have f-value >= its parent's f-value.  Nodes are taken from the fringe in f-value order, so the generated node must have f-value >= f-value of every node in the "closed" list.  So it must have g-value >= g-value of each of its
duplicates in the closed list.  Hence the best g-value for any state is the g-value obtained the first time a node for the state is created.

Therefore, if an old node n with state s is in the closed list, and a new node with state s is created, the new node can be discarded.  If the old node is in the open list (i.e. the fringe) then either the old or the new node may have the best g-value.


FURTHER NOTES ON A*

(1) The book says that A* runs out of memory before it runs out of time on page 101.  Suppose the average branching factor is 1+x where x >= 1 always.  Then for every 1000 nodes removed from the fringe, 1000+1000*x are added.  This means that memory use increases by 1000*x, so the size of the fringe increases linearly with time, with no maximum.  Important note: the size of the closed list is linearly proportional to the size of the open list, so the closed list does not increase big-O memory usage.

(2) The book does not suggest any tie-breaking policy for nodes that have same f-value.  The policy "when f-values are equal, expand the node with larger g-value first" is sensible.  (Reason:  The larger g-value means that a bigger part of the f-value is known for sure, and a smaller part is an just an inaccurate estimate.)  Larger g-value means smaller h-value, so this can be implemented easily when g-values are integers under 100 (e.g. for the 8-puzzle) as f(n) = g(n) + 1.01*h(n).  Of course we still need a different tie-breaker for when g and h are both equal.