# Exploiting linkage for context

All samples of language, including the documents indexed by Web search engines, depend heavily on {\em shared context} for comprehension. A document's author makes assumptions, often tacit, about the intended audience, and when the document appears in a ``traditional'' medium (conference proceedings, academic journal, etc.) it is likely that typical readers will understand it as intended. But one of the many things the Web changes is the huge new audience it brings to documents, many of whom will {\em not} share the author's intended context.

But because most search engines attempt to index indiscriminately across the entire WWW, the {\em global} word frequency statistics they collect can only reflect gross averages. The utility of an index term, as a discriminator of relevant from irrelevant items, can become a muddy average of its application across multiple, distinct sub-corpora within which the words have more focused meanings [REF866] [REF1097].
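This averaging effect is easy to see with an inverse-document-frequency (IDF) style weight. The toy corpora below are invented for illustration; in the hypothetical mix, a word like {\tt ROCK} that is common across every sub-corpus carries no discriminating weight globally, while a topically focused word still does within its own sub-corpus.

```python
from math import log

# Hypothetical toy sub-corpora in which "rock" has two distinct,
# focused meanings (sport vs. music).
climbing_docs = [
    "rock climbing routes and belay anchors",
    "granite rock faces and climbing harness reviews",
]
music_docs = [
    "classic rock and roll guitar riffs",
    "rock bands touring this summer",
]

def idf(term, docs):
    """Inverse document frequency of `term` over the document set `docs`."""
    df = sum(1 for d in docs if term in d.split())
    return log(len(docs) / df) if df else float("inf")

# Globally, "rock" occurs in every document, so its IDF collapses to zero:
global_idf = idf("rock", climbing_docs + music_docs)

# Within a single sub-corpus, a focused term remains a useful discriminator:
local_idf = idf("guitar", music_docs)
```

The point is not this particular weighting formula but the contrast: the same statistic computed globally and locally assigns very different utilities to the same word.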

Hypertext information environments such as the Web contain additional structural information [Chakrabarti98b]. This linkage information is typically exploited by browsing users. But LINKAGE TOPOLOGY --- the ``spatial'' structure imposed over documents by their hypertext links to one another --- can be used to generate a concrete notion of context within which each document is understood: two documents and the words they contain are imagined to be in the same context if they are close together in this space. Even in unstructured portions of the Web, authors tend to cluster documents about related topics by letting them point to one another via links. Such linkage topology is useful inasmuch as browsers have a better-than-random expectation that following links will provide them with guidance; if this were not the case, browsing would be a waste of time.
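One minimal way to make ``close together in this space'' concrete is link distance: the context of a page is its neighborhood within some number of hops in the link graph. The sketch below uses an invented graph and a breadth-first traversal; it is an illustration of the idea, not a construct taken from the text.

```python
from collections import deque

# Hypothetical link graph: page -> pages it links to.
links = {
    "climbing-home": ["granite-guide", "harness-faq"],
    "granite-guide": ["climbing-home"],
    "harness-faq": [],
    "band-home": ["tour-dates"],
    "tour-dates": ["band-home"],
}

def neighborhood(start, radius):
    """Pages reachable from `start` within `radius` link hops (BFS)."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        page, dist = frontier.popleft()
        if dist == radius:
            continue
        for nxt in links.get(page, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return seen
```

Pages inside the same small-radius neighborhood would then share a local context for interpreting word statistics, while pages in disconnected clusters would not.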

This suggests that AGENTS (a.k.a. infobots, spiders, etc.) which navigate over such structural links might be able to discover this context. For example, agents browsing through pages about {\tt ROCK CLIMBING} and {\tt ROCK 'N ROLL} should attribute different weights to the word {\tt ROCK} depending on whether the query they are trying to satisfy is about music or sports. Where an agent is situated in an ``environment'' (a neighborhood of highly interlinked documents) provides it with the {\em local context} within which to analyze word meanings --- a structured, situated approach to polysemy. The words that surround links in a document provide an agent with valuable information with which to evaluate links and thus guide its path decisions --- a statistical approach to action selection.
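The second idea --- using the words surrounding a link to decide whether to follow it --- can be sketched with a crude overlap score. The link contexts, URLs, and query below are all invented; a real agent would use a far richer estimator than set intersection.

```python
# Hypothetical query the agent is trying to satisfy (about music, not sport).
query = {"music", "guitar"}

# (words surrounding the link, target URL) pairs an agent might
# extract from the page it currently occupies.
outlinks = [
    ({"rock", "climbing", "gear"}, "http://example.org/climbing"),
    ({"rock", "music", "reviews"}, "http://example.org/music"),
]

def score(context_words, query_words):
    """Overlap between a link's surrounding words and the query."""
    return len(context_words & query_words)

# The agent follows the link whose surrounding words best match the query.
best = max(outlinks, key=lambda link: score(link[0], query))
```

The ambiguous word {\tt ROCK} appears in both link contexts and so contributes nothing; it is the surrounding words that disambiguate which path to take.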

The idea of decentralizing the index-building process is not new. Dividing the task into localized indexing, performed by a set of {\em gatherers,} and centralized searching, performed by a set of {\em brokers,} has been suggested since the early days of the Web by the Harvest project [Bowman94]. WebWatcher [Armstrong95] and Letizia [Lieberman97] are agents that learn to mimic the user by looking over his/her shoulder while browsing. Then they perform look-ahead searches and make real-time suggestions for pages that might interest the user. Fab [Balabanovic97] and Amalthaea [Moukas97] are multi-agent adaptive filtering systems inspired by genetic algorithms, artificial life, and market models. Term weighting and relevance feedback are used to adapt a matching between a set of discovery agents (typically search engine parasites) and a set of user profiles (corresponding to single- or multiple-user interests).

Here we focus on InfoSpiders, a multi-agent system developed by Filippo Menczer [REF1110] [REF1142] [REF1148] [REF1150]. In InfoSpiders an evolving population of many agents is maintained, with each agent browsing from document to document on-line, making autonomous decisions about which links to follow, and adjusting its strategy. Population-wide dynamics bias the search toward more promising areas and control the total amount of computing resources devoted to the search activity. Basic features of the algorithm are discussed below, followed by an example of how these agents perform as searchers through a hypertext version of the Encyclopedia Britannica.
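The population dynamics can be caricatured in a few lines. The sketch below is emphatically {\em not} the published InfoSpiders algorithm --- the web, pages, query, energy costs, and link estimator are all invented here --- but it shows the shared skeleton: agents spend energy to act, gain energy from relevant pages, reproduce when successful, and die when their energy is exhausted, so computing resources flow toward promising regions of the link topology.

```python
# Toy web: page -> (set of words on the page, list of outgoing links).
# All page names, words, and parameter values are hypothetical.
web = {
    "start": ({"portal"},           ["a", "b"]),
    "a":     ({"rock", "climbing"}, ["a2"]),
    "a2":    ({"climbing", "gear"}, []),
    "b":     ({"rock", "music"},    ["b2"]),
    "b2":    ({"guitar", "music"},  []),
}
query = {"climbing", "gear"}

class Agent:
    def __init__(self, page, energy=1.0):
        self.page, self.energy = page, energy

    def step(self):
        words, out = web[self.page]
        # Pay a small cost to act; gain energy for relevant pages.
        self.energy += len(words & query) - 0.5
        if out:
            # Follow the link judged most relevant (a real agent would
            # estimate this from anchor text via a learned model).
            self.page = max(out, key=lambda p: len(web[p][0] & query))

population = [Agent("start") for _ in range(4)]
for _ in range(3):
    offspring = []
    for agent in population:
        agent.step()
        if agent.energy > 2.0:      # reproduce: split energy with a clone
            agent.energy /= 2
            offspring.append(Agent(agent.page, agent.energy))
    # Agents whose energy is exhausted are removed from the population.
    population = [a for a in population + offspring if a.energy > 0]
```

After a few steps the surviving agents concentrate on the pages that reward them, illustrating how population-wide selection, rather than any central controller, allocates search effort.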

FOA © R. K. Belew - 00-09-21