

Background

Most of the techniques described in the last chapter built on representational and inference methods originally developed within AI in the 1970s and 1980s. Today these methods are sometimes called GOOD OLD-FASHIONED AI (GOFAI), to distinguish them from more recent advances. There are many ways to characterize this change (see Russell & Norvig's text for an alternative interpretation [Russell95], and cf. Sections §6.9.1 and §7.8), but the most important is this: AI is now centrally concerned with learning the representations it uses, rather than assuming that some smart KNOWLEDGE ENGINEER has entered them manually.

To be concrete, imagine that you are to act as a librarian with respect to your own email. We have assumed at several points that you are collecting vast amounts of email, but perhaps are only now starting to think about how it should be classified for subsequent retrieval. If we hire a librarian, we can reasonably expect them to bring certain useful skills to their new job, and then to continue learning ways of doing it better. As their boss we must provide regular feedback that points out both good and bad aspects of their work. If this person were having their first annual review and they were no better at finding useful information than on the day they were hired, we would have reason for concern.

The preceding chapters have surveyed a number of techniques for supporting the FOA task, but whatever utility they offer is apparent immediately, and we do not expect it to improve with use. This chapter is concerned with ADAPTIVE techniques: those that improve their performance over time, in response to FEEDBACK they receive on prior performance. We can idealize our goal for the learning system in terms of a person: a clever, resourceful, adaptive librarian.

The figure gives an overview of how machine learning fits into the space of existing IR techniques. The horizontal axis is meant to indicate the amount of manual effort expended on improving the corpus. These activities may include constructing a controlled vocabulary, forming good lexical index terms (including phrases), building thesauri that relate the keywords to one another, etc. The vertical axis attempts to capture something like ease of use for FOAs. Such usability metrics are notoriously difficult to quantify, but one indicator might be the time needed to find a known item.

Prior to the widespread application of search engine technologies, brought on by efforts like WAIS and SMART, to search text meant to grep across textual fields. Since grep and related search methods rely on regular expressions for queries, and since regular expressions can't be conveniently composed with Boolean operators, early search systems offered only this kind of pattern matching.

But with the introduction of search engine technologies, the goal became one of building an index, much like the one a librarian might construct for a collection of books or documents. These have been the issues at the core of our FOA discussion.
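The contrast between the two styles of search can be sketched in a few lines of Python. This is only an illustrative sketch, not a description of how WAIS, SMART, or grep are implemented: the toy corpus `DOCS` and the helper names `grep_scan`, `build_index`, and `boolean_and` are all hypothetical, invented for this example.

```python
import re
from collections import defaultdict

# Hypothetical toy corpus; the ids and texts are invented for illustration.
DOCS = {
    1: "Adaptive retrieval systems learn from user feedback",
    2: "A librarian builds an index for a collection of books",
    3: "Feedback lets the index improve over time",
}

def grep_scan(pattern, docs):
    """grep-style search: test one regular expression against every document."""
    rx = re.compile(pattern, re.IGNORECASE)
    return sorted(doc_id for doc_id, text in docs.items() if rx.search(text))

def build_index(docs):
    """Inverted index: map each lowercased word to the set of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word].add(doc_id)
    return index

def boolean_and(index, *terms):
    """Boolean AND: intersect the posting sets of all query terms."""
    postings = [index.get(term.lower(), set()) for term in terms]
    return sorted(set.intersection(*postings)) if postings else []

INDEX = build_index(DOCS)
print(grep_scan(r"feedback", DOCS))             # documents matching the pattern
print(boolean_and(INDEX, "feedback", "index"))  # documents containing both terms
```

With the index, a conjunction is just a set intersection. Expressing the same query as a single regular expression requires spelling out every ordering of the terms (for instance `feedback.*index|index.*feedback`), which is one way to see why Boolean composition is inconvenient in a grep-style system.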

The figure extends this progression further. It is rare for any textual corpus to receive manual attention from a librarian or editor, and so there are very few manual indices; a very few corpora, however, have received even more extensive editorial enhancement. The Encyclopedia Britannica, Westlaw, and Medline all exemplify just how much the FOA activity can be supported by rich representations.

This becomes the goal for our machine learning techniques. They will turn out to form a natural extension of the statistical techniques underlying automatic index construction. Peter Turney maintains a useful bibliography of Machine Learning Applied to Information Retrieval references generally, as well as of Text Classification Resources in particular.

Finally, it is always a mistake to view the relationship between algorithmic (artificially intelligent) methods and the natural, humanly intelligent behaviors they mimic as one of competition. The most constructive systems we can build are ones that combine editorial capabilities with new computational tools. The editor's workbench is a good metaphor for such designs.



FOA © R. K. Belew - 00-09-21