(a) Optimality: the first path to a goal found by A* is an optimal path to a goal.Proof of optimality: By definition, the next node expanded by A* is always one of the remaining unexpanded nodes that has the lowest f = g+h value (maybe the equal-lowest f value).
(b) Completeness: if a path to a goal exists, then A* will find it.
(c) Efficiency: no algorithm that is optimal and complete and has only g and h available as knowledge will always expand fewer nodes than A*.
When A* expands a goal node, it first checks and discovers that it is indeed a goal node. Now the h value of a goal node is zero, so its f value is its g value, which is exactly the cost of the path to this node. Call this cost c*.
So at the instant that A* finds a goal node, every other node in the fringe, say n, must have f(n) >= c*. For each such node n, f(n) is an under-estimate of the true cost c' of reaching the cheapest goal node reachable through n. Therefore every other goal node has cost c' >= f(n) >= c*. This proves that A* is optimal.
Proof of completeness: Unless there are infinitely many nodes that have f value less than the optimal cost c*, A* will eventually expand all these nodes with lower f value and arrive at a solution. There cannot be infinitely many such nodes if (a) all nodes have finite branching factor and (b) the cost of each edge is greater than some epsilon > 0.
Proof of efficiency (idea only!): Consider an algorithm that uses the
same information, i.e. the g and h functions, but does not visit
all nodes with sub-optimal f value. Then we can construct an adversary
who chooses the true costs so that one of the unvisited nodes is actually
the optimal solution.
The proof of optimality depends on the heuristic function h being admissible, but not on monotonicity. The function f(n) is monotonic if f never decreases along any path from the root. Combined with admissibility, this assumption says that f = g+h does not become less accurate as an estimate of the total cost as you move towards a goal.
Let's draw the state space that A* explores, and let's draw contours of equal value of f(n). If the function f is monotonic, then the contours are properly nested.
What is called optimality here is an "offline" property, while efficiency is an "online" property. Optimality is sometimes called admissibility, in which case efficiency is sometimes called optimality.
Like all variants of the general search algorithm, A* explores a tree
of nodes, where multiple nodes may correspond to the same real-world state.
Some variants of the general search algorithm use a data structure called
the closed list to remember already-visited nodes. Each node
discovered through expansion is then compared against all nodes on the
closed list to detect whether the new node corresponds to an already-visited
state. Using this terminology, the fringe is called the open list.
For all states of the 8 puzzle, h1(n) <= h2(n). Hence h2 is a better estimator than h1. See Table 4.8 on page 103 for statistics on the effectiveness of these heuristics.
If one cannot guarantee that a heuristic function h is always an under-estimator,
then A* is not guaranteed to find a shortest path. However it can
be proved that if h is "close to" an under-estimator, then A* finds a path
that is "close to" being shortest.
For the 8-puzzle, the rule is that we can move a tile from position x to position y if (1) x and y are adjacent, and (2) y is blank. Heuristic h1 ignores both constraints, while heuristic h2 ignores the second constraint.
This relaxation idea above is an example of what is called meta-level knowledge. It is knowledge about other knowledge. Meta-level knowledge can be the most valuable type of knowledge, because it is the most general-purpose.
Many AI programs have similar architectures: they consist of a fixed reasoning algorithm based on search, which is guided by problem-specific information represented in some explicit fashion. In this case the algorithm implementation can be called an inference engine and the problem-specific information is called a knowledge base.
Different inference engines work with different levels of knowledge.
The output of one AI program can be knowledge for use by another program.
Suppose the solution to problem P is found by algorithm A at depth d, after expanding k nodes.
Definition: The effective branching factor b(P,A) is the solution for b of the equation k = 1 + b + ... + b^d = (b^(d+1) - 1)/(b - 1). This is approximately the dth root of k since k = O(b^d).
Effective branching factor is the best way to compare a (P,A) combination. Percentage speedup is meaningless when comparing algorithms that take exponential time.
Any algorithm that takes time polynomial in d, say O(d^m) where m is
fixed, will have effective branching factor tending to 1 as d tends to
infinity. Proof: log b = log (d^m)^(1/d) = (m/d) log d = m (log d)/d.
Now (log d)/d -> zero from above as d -> infinity and m is constant`, so
log b -> 0 from above as d -> infinity, and therefore b -> 1 from above.
We will restrict attention to CSPs with a finite number of variables each with a finite (hence discrete) domain. Some examples are:
Map coloring and the n-queens problem both involve only binary constraints,
and much research has only considered binary constraints. The fact
that one version of the map-coloring problem is also NP-complete shows
that handling binary constraints is not fundamentally easier.