FOA Home
Arguably the first use of RelFbk information in an adaptive IR
context came out of the SMART group led by Gerarld Salton. As discussed
in section §4.2.2 , Salton's vector
space model lends itself to a representation of documents and queries
that suggest ways of producing better matches, as shown in Figure
(figure) .
Roccio's characterization of the task is best
described as routing: we imagine that a stream of all the world's news
is available to some institution (for example, a corporation), and many
members of this institution (employees) are expected to build
long-standing queries characterizing their interests. These queries are
passed against the incoming stream of documents and those matching a
particular employee's query are ROUTED to him or her [Schutze95a] .
From this employee's
point-of-view, they will be receiving a stream of documents, all of
which are relevant to them. If they identify those documents which are
not relevant, the resulting corpus can be viewed as a training set, and
used to adjust their filter. This is a concrete application of the
binary classification technology.
Brauen in particular considered
``document vector modifications," resulting in ``dynamic" document
spaces [REF567] . Documents marked as
relevant can be moved closer to the query that successfully retrieved
them in several ways. First and most obviously, features shared by query
and document can have their weight increased. Features in the document
but not in the query can have their weight decreased, and terms in the
query but not in the document can be added to the document's
representation with small initial weights.
From the perspective of modern
machine learning, Brauen's method would be considered a type of
GRADIENT DESCENT learning algorithm: the disparity between
document and query creates a gradient along which small changes are
made. One difference, however, is that Brauen imagined a BATCH
learning scenario, with the entire set of labeled documents availably
simultaneously. ONLINE learning algorithms, on the other hand,
make small, incremental adaptive changes in immediate response to every
learning instance (document). Each individual step of learning (e.g.,
weight update in neural networks) must be quite small, in order to
guarantee convergence towards the globally optimal value. The size of
the small change is controlled by $\eta$, typically known as the
LEARNING RATE . Stochastic approximation theory suggests that
this constant should be relatively small for on-line learning, so that
the weight's are small enough to allow convergence [White89] . Online learning is generally
preferred, to avoid the time delay and data collection complications.
Idealizing
our learning task to be to produce a perfect match (i.e., the dot
product of query and document is 1.0) on relevant documents and no match
whatsoever on irrelevant documents, we can treat this behavior as the
target for our error correction learning. Let $R_{}$ stand for this
Boolean relevant/not relevant classification of each document. Then (as
discussed further in the next Section §7.4 ), we can hope to have available a
TRAINING SET $T$ of documents which have be labeled a
priori as $R_{\mathbf{d}}=0$or $R_{\mathbf{d}}=1$, specifying
whether they are or are not considered relevant. With such evidence as
to what documents we do and don't consider Relevant, we can define
precisely how well we have learned. A typical ERROR MEASURE or
LOSS FUNCTION can be defined as the squared difference between
the actual vector match and the target $R_{\mathbf{d}}$:
\mathname{Error} \equiv \sum_{\mathbf{d} \in T} (\mathbf{d} \cdot
\mathbf{q} - R_{\mathbf{d}})^{2} $ gradient runs both directions} {The
distinction between query and document vector modification runs deep
through the literature of Salton's students. Roccio did most to analyze
vector modification strategies using \RelFbk, but his intent was for
query modification purposes rather than vector modification. ROCCIO'S
ALGORITHM is now used as these algorithms have been ``recast as
linear classification by treating the query as a classifier.'' [Lewis96] }
Top of Page
Learning which documents to route
Subsections