CSE 250A LECTURE NOTES
February 5, 2001
ANNOUNCEMENTS
Today's handout consists of some notes
about the reports submitted for the web crawler project. Most of
the important general issues for the second project are similar to those
for the first project. I'll discuss the notes briefly. There
are at least two additional important issues:
-
When you report an average, you should also report a standard error for
the average. This is especially important when comparing two averages.
Otherwise, you cannot say whether the difference is statistically significant,
i.e. reproducible.
-
Try to be as precise and clear as possible in all descriptions. Precision
and clarity lead you step by step to new insights. A simple list
(like this bulleted list of issues) is an excellent way to organize most
technical descriptions and analyses.
Remember that this week, Wednesday's lecture will be replaced by a makeup
lecture on Friday.
MIXING RANDOMNESS AND GREEDINESS
Forgetting about the metaphor from physics, the advantage of simulated
annealing seems to be that it combines a random walk with greedy local
search, and does not bother to evaluate all children of the current state.
Let's consider a 3SAT algorithm that does this combination directly:
choose a random initial truth assignment
while not termination condition do
randomly choose an unsatisfied clause
flip one variable of this clause as follows:
with fixed probability
p
select the variable randomly
or, with probability
1-p:
select the variable greedily, i.e. select
the variable that yields the best h value
This algorithm is called Walksat. Note that at each step, the randomly
chosen clause is satisfied, but other clauses may become unsatisfied.
The parameter p is called the "mixing probability" and determined
approximately by experiment for a given class of CNF formulas. For
random hard 3SAT problems (those with the ratio of clauses to variables
around 4.25) p = 0.5 works well. For 3SAT formulas with more
structure, as generated in many applications, slightly more greediness,
i.e. p < 0.5, is often better.
Because of its random walk component, Walksat is less likely than GSAT
to become stuck in the region around a local optimum. Therefore,
it is more likely to find a solution without any restarts and the termination
condition, which says when to restart with a new random assignment, should
be more stringent. Empirically, restarting after O(n2)
flips, where n is the number of variables, works well.
Walksat is the best known algorithm for 3SAT. It outperforms simulated
annealing and is capable of solving problems with around 2000 variables
in a few hours.
HOW TO TUNE A LOCAL SEARCH METHOD
As discussed in class, the idea is to choose settings for the parameter
of the algorithm in order to minimize the ratio m/s, where
m
is the long-term average of the h values achieved, and
s
is the standard deviation of these values.
For details see the paper Evidence
for Invariants in Local Search by David McAllester, Bart Selman,
and Henry Kautz, AAAI'97.
METAPHORICAL ALGORITHMS
Search algorithms, usually local and randomized, have been proposed based
on numerous phenomena from physics, biology, sociology, and economics.
Adapted from Andrew Moore of
CMU, here is a list:
-
thermodynamics of gases
-
strengthening of metals by growing large crystals through repeated slow
cooling
-
elementary particle physics
-
Darwinian evolution
-
Lamarckian evolution
-
co-evolution
-
capitalist markets
-
management hierarchies
-
colonies of social insects
-
mammalian immune systems
-
conditioning of animal behavior
-
networks of brain cells
Rigorous experimental or mathematical results have not been established
for any of these approaches. In general, there is no reason to expect
that whatever works best in one arena (ant colonies for example) will also
work best in another arena (human societies or computation for example).
A metaphor can be inspirational, but there is no reason to expect it to
be literally true or more than partially useful.
For example, after simulated annealing evaluates a randomly chosen neighbor,
the algorithm always moves to the neighbor if it is better, following an
intuition from physics. But Walksat does better by sometimes not
moving to a neighbor even if it is better.
There are still major opportunities for new heuristic approaches to
search and combinatorial optimization. Some important issues include:
-
how to define the neighbors of a state (moveset design)
-
how many neighbors to evaluate at each step (moveset size selection)
-
how to weight different constraint violations
-
how to reduce the number of local optima and plateaus
-
how to compute or approximate an evaluation function rapidly
-
reusing work performed to evaluate nearby states
-
balancing random walking and greediness
The fundamental computational issue seems to be how to trade-off improvement
along multiple conflicting dimensions. Research should be guided
by experimentation and logical argument more than by loose analogies.
The word "metaphorical" is not automatically pejorative. What
it suggests is the need for additional thought, because the evaluation
of metaphorical methods cannot be purely quantitative, Competing
metaphors are always available; which
are better and why?
One issue with metaphor-based methods is the balance between focus on
the nature of the method and focus on the nature of the task. What
are the fundamental properties of a task that cause certain methods to
be suitable? For example, there is a basic difference between offline
search, as done by A*, and online search, as done in the first 250A project,
where costs incurred already are sunk.
Copyright (c) by Charles Elkan, 2001.