# Learning which documents to route

Arguably the first use of RelFbk information in an adaptive IR context came out of the SMART group led by Gerarld Salton. As discussed in section §4.2.2 , Salton's vector space model lends itself to a representation of documents and queries that suggest ways of producing better matches, as shown in Figure (figure) .

Roccio's characterization of the task is best described as routing: we imagine that a stream of all the world's news is available to some institution (for example, a corporation), and many members of this institution (employees) are expected to build long-standing queries characterizing their interests. These queries are passed against the incoming stream of documents and those matching a particular employee's query are ROUTED to him or her [Schutze95a] .

From this employee's point-of-view, they will be receiving a stream of documents, all of which are relevant to them. If they identify those documents which are not relevant, the resulting corpus can be viewed as a training set, and used to adjust their filter. This is a concrete application of the binary classification technology.

Brauen in particular considered document vector modifications," resulting in dynamic" document spaces [REF567] . Documents marked as relevant can be moved closer to the query that successfully retrieved them in several ways. First and most obviously, features shared by query and document can have their weight increased. Features in the document but not in the query can have their weight decreased, and terms in the query but not in the document can be added to the document's representation with small initial weights.

From the perspective of modern machine learning, Brauen's method would be considered a type of GRADIENT DESCENT learning algorithm: the disparity between document and query creates a gradient along which small changes are made. One difference, however, is that Brauen imagined a BATCH learning scenario, with the entire set of labeled documents availably simultaneously. ONLINE learning algorithms, on the other hand, make small, incremental adaptive changes in immediate response to every learning instance (document). Each individual step of learning (e.g., weight update in neural networks) must be quite small, in order to guarantee convergence towards the globally optimal value. The size of the small change is controlled by $\eta$, typically known as the LEARNING RATE . Stochastic approximation theory suggests that this constant should be relatively small for on-line learning, so that the weight's are small enough to allow convergence [White89] . Online learning is generally preferred, to avoid the time delay and data collection complications.

Idealizing our learning task to be to produce a perfect match (i.e., the dot product of query and document is 1.0) on relevant documents and no match whatsoever on irrelevant documents, we can treat this behavior as the target for our error correction learning. Let $R_{}$ stand for this Boolean relevant/not relevant classification of each document. Then (as discussed further in the next Section §7.4 ), we can hope to have available a TRAINING SET $T$ of documents which have be labeled a priori as $R_{\mathbf{d}}=0$or $R_{\mathbf{d}}=1$, specifying whether they are or are not considered relevant. With such evidence as to what documents we do and don't consider Relevant, we can define precisely how well we have learned. A typical ERROR MEASURE or LOSS FUNCTION can be defined as the squared difference between the actual vector match and the target $R_{\mathbf{d}}$: \mathname{Error} \equiv \sum_{\mathbf{d} \in T} (\mathbf{d} \cdot \mathbf{q} - R_{\mathbf{d}})^{2} \$ gradient runs both directions} {The distinction between query and document vector modification runs deep through the literature of Salton's students. Roccio did most to analyze vector modification strategies using \RelFbk, but his intent was for query modification purposes rather than vector modification. ROCCIO'S ALGORITHM is now used as these algorithms have been recast as linear classification by treating the query as a classifier.'' [Lewis96] }

## Subsections

FOA © R. K. Belew - 00-09-21