Next: Electronic artifacts Up: Adaptive information retrieval Previous: Programmed and learned

IR issues

I have made three arguments for the advantages that the AIR system, originally developed as an experiment in AI, offers to the IR community [21]. First, the spreading activation search performed by connectionist networks can be considered a natural generalization of the ``first-order'' associative techniques typically used by IR systems. Recent experiments on the standard test collections typical of IR research suggest that this search provides a significant advantage [7].
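To make the contrast with ``first-order'' association concrete, here is a minimal sketch of spreading activation over a small keyword/document network. It is an illustration only, not the AIR implementation: the function name `spread_activation`, the `decay` parameter, the node labels, and the link weights are all assumptions made for this example.

```python
def spread_activation(edges, seeds, steps=2, decay=0.5):
    """Propagate activation from seed nodes along weighted links.

    edges: dict mapping node -> {neighbor: weight} (hypothetical toy format)
    seeds: dict mapping node -> initial activation (e.g. the query keywords)
    Returns the accumulated activation of every node that was reached.
    """
    activation = dict(seeds)
    frontier = dict(seeds)
    for _ in range(steps):
        next_frontier = {}
        for node, act in frontier.items():
            for neighbor, weight in edges.get(node, {}).items():
                # Each hop passes on a decayed, weighted share of activation.
                passed = decay * weight * act
                next_frontier[neighbor] = next_frontier.get(neighbor, 0.0) + passed
        for node, act in next_frontier.items():
            activation[node] = activation.get(node, 0.0) + act
        frontier = next_frontier
    return activation


# Toy network: doc:B is indexed only under "learning", not under "neural".
edges = {
    "kw:neural": {"doc:A": 1.0},
    "doc:A": {"kw:neural": 1.0, "kw:learning": 1.0},
    "kw:learning": {"doc:A": 1.0, "doc:B": 1.0},
    "doc:B": {"kw:learning": 1.0},
}
seeds = {"kw:neural": 1.0}

first_order = spread_activation(edges, seeds, steps=1)  # direct matches only
multi_step = spread_activation(edges, seeds, steps=3)   # also reaches doc:B
```

With a single step only doc:A, which is directly indexed by the query keyword, receives activation. With three steps doc:B is also reached through the shared keyword, which is the sense in which the multi-step search generalizes first-order matching.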

This localized, sparse representation also promises to provide a more tractable alternative to the traditional ``vector space'' model. This becomes especially clear when linear algebra techniques such as singular value decomposition (SVD) are applied to the ``global'' matrix of document/keyword associations. Led by Brian Bartell, we have explored a number of extensions to the basic SVD method. For example, we have shown that this basic method is, under very reasonable assumptions, equivalent to a process that relies upon multi-dimensional scaling (MDS) instead of SVD [2]. This is important because the relevance feedback information provided by browsing users is fundamentally non-metric, making MDS one of the few analytic methods available. We have also shown that closely related learning methods using the same critical relevance feedback information can be successfully applied to several other, more traditional IR methods [1][3][4].
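One standard way to see how an SVD-based method and MDS can coincide is sketched below. The notation is an assumption introduced here, and the argument in [2] may be stated differently: $X$ is taken to be the term-by-document matrix, and inter-document similarity is taken to be the inner product.

```latex
% Assumed notation: X is the t x d term-by-document matrix.
X = U \Sigma V^{\top},
\qquad
G = X^{\top} X = V \Sigma^{2} V^{\top}

% Rank-k document coordinates obtained from the truncated SVD:
Y_k = \Sigma_k V_k^{\top},
\qquad
Y_k^{\top} Y_k = V_k \Sigma_k^{2} V_k^{\top}

% Classical (metric) MDS on the inner-product similarities G:
\min_{Y \in \mathbb{R}^{k \times d}} \bigl\lVert G - Y^{\top} Y \bigr\rVert_F
\quad \Longrightarrow \quad
Y = \Sigma_k V_k^{\top} \ \text{(up to an orthogonal transformation)}
```

Because $V_k \Sigma_k^{2} V_k^{\top}$ is the best rank-$k$ approximation of $G$ in the Frobenius norm, the coordinates produced by the truncated SVD also solve this MDS problem. Non-metric MDS replaces the similarity values with order constraints, which is why it can accommodate feedback that only says one document is preferred to another.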

The second and primary motivation for a connectionist representation is that it provides an ideal substrate by which the system can learn: relevance feedback information (generated naturally by users as a consequence of their browsing) can be converted into modifications of indices. Over time, then, the IR system comes to better ``understand'' - in a fairly deep sense - what the documents are about and what the users mean by their queries.
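As a rough illustration of how relevance feedback can be turned into modifications of indices, the sketch below nudges keyword-to-document weights after each judgement. The additive rule, the `rate` parameter, the clamping to [0, 1], and the name `apply_feedback` are all assumptions chosen for simplicity; they are not the learning rule used by AIR.

```python
def apply_feedback(weights, query_terms, doc, relevant, rate=0.1):
    """Adjust keyword->document index weights from one relevance judgement.

    weights: dict mapping (term, doc) -> weight in [0, 1] (hypothetical format)
    relevant: True if the user marked `doc` as relevant to the query.
    """
    delta = rate if relevant else -rate
    for term in query_terms:
        key = (term, doc)
        updated = weights.get(key, 0.0) + delta
        # Keep weights in a fixed range so repeated feedback cannot run away.
        weights[key] = min(1.0, max(0.0, updated))
    return weights


# Toy index: doc:A is initially indexed only under "neural".
weights = {("neural", "doc:A"): 0.5}

# A user querying "neural learning" marks doc:A as relevant ...
apply_feedback(weights, ["neural", "learning"], "doc:A", relevant=True)
after_positive = dict(weights)

# ... and a later user querying "learning" marks it as not relevant.
apply_feedback(weights, ["learning"], "doc:A", relevant=False)
after_negative = dict(weights)
```

Positive feedback both strengthens the existing link and creates a new one from a query term the document was never indexed under. Negative feedback weakens it again.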

Finally, AIR and especially Rose's SCALIR system incorporate a much richer view of what interaction with an IR system should entail: queries use features other than just keywords, and retrievals include objects other than just documents. The ultimate goal is to support the retrieval process (e.g., browsing), rather than some particular product (e.g., a set of retrieved documents) [46]. Adaptation can also play a role in interface design, for example by allowing features of the interface to become tuned to the idiosyncratic preferences of each user [30]. This is just one of many ways in which AI techniques can become components of intelligent ``decision support'' tools used by people, as opposed to the autonomous agents AI typically considers [15].

With student Amy Steier, I have begun to explore the growing interaction between IR issues and computational linguistics, sometimes called ``corpus-based linguistics.'' Using the linguistic primitive of ``phrase'' as the simplest example of a syntactic construct beyond the single word tokens considered by IR, we have already found striking differences between the frequency distributions of phrases within and across topically related sub-collections of large corpora [49]. While the classification that defines topical organization is often explicit (provided, for example, by the Propædia of the Encyclopedia Britannica), we have also found evidence of significant differences in topical sublanguages associated with social organizations, as defined for example by the various universities where Ph.D. dissertations have been written [50].
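The kind of comparison described above can be sketched by counting phrases separately in each sub-collection and comparing their relative frequencies. This is a toy sketch only: adjacent word pairs (bigrams) stand in for ``phrases'', which may not match the definition used in [49], and the two tiny sub-collections are invented for illustration.

```python
from collections import Counter


def bigram_counts(texts):
    """Count adjacent word pairs, used here as a crude proxy for phrases."""
    counts = Counter()
    for text in texts:
        tokens = text.lower().split()
        counts.update(zip(tokens, tokens[1:]))
    return counts


def relative_freq(counts):
    """Convert raw counts into each phrase's share of all phrase occurrences."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {phrase: n / total for phrase, n in counts.items()}


# Invented toy sub-collections standing in for two topical areas.
physics = ["quantum field theory", "field theory of gravity"]
computing = ["neural network learning", "neural network search"]

physics_freq = relative_freq(bigram_counts(physics))
computing_freq = relative_freq(bigram_counts(computing))

# A phrase frequent in one sub-collection may be absent from the other.
phrase = ("neural", "network")
share_in_computing = computing_freq.get(phrase, 0.0)
share_in_physics = physics_freq.get(phrase, 0.0)
```

On a real corpus the same two functions would be applied per topical sub-collection and the resulting distributions compared with whatever statistic the analysis calls for.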





rik@cs.ucsd.edu