FOA Home | UP: Adaptive Information Retrieval

Symbolic and Subsymbolic Learning

Most of the learning applications we have discussed apply to the relation between keywords and documents. But there are many other syntactic clues associated with documents from which we can also learn. Chapter 6 discussed a number of these heterogeneous data sources. But as we attempt to learn with and across both structured attributes and statistical features (recall the distinction of Section §1.6), it is important to keep several differences in mind.

The distinction between SYMBOLIC features of each document (e.g., date and place of publication, author), unambiguous facts about the world that human experts can reliably program directly, and the much larger set of SUBSYMBOLIC features from which we hope to compose our induced representation [REF672], becomes especially important as we attempt to combine manually programmed and automatically learned knowledge within the same system [REF438]. Even among the symbolic attributes, however, there is room for learning about their meaning. For example, while a scientific paper may have many nominal authors, often it is only one or two to whom most readers will attribute the significant intellectual contribution. And while papers often have extensive bibliographies, some of their citations are more significant than others, and can be considered supporting or antagonistic (see Section §6.1).
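The symbolic/subsymbolic split can be pictured as two halves of one document record: a handful of discrete attributes an expert enters directly, alongside a large vector of statistical term weights that must be induced. The following sketch is purely illustrative (the class and field names are assumptions, not anything from the FOA system):

```python
from dataclasses import dataclass, field

@dataclass
class Document:
    """Illustrative sketch of a document record combining both feature kinds."""
    # SYMBOLIC attributes: unambiguous facts, programmed directly by an expert.
    author: str
    year: int
    venue: str
    # SUBSYMBOLIC features: keyword -> statistical weight, induced from data.
    term_weights: dict = field(default_factory=dict)

doc = Document(author="A. Author", year=1999, venue="Some Conference",
               term_weights={"retrieval": 0.83, "learning": 0.65})
```

The point of keeping the two halves distinct is that they support different operations: the symbolic fields allow exact matching and reliable filtering, while the term weights support only graded, statistical comparison.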

For all these reasons, FOA is an especially ripe area for AI and machine learning. The fact that documents are composed of semantically meaningful tokens allows us to make especially strong hypotheses about how they should be classified. One fundamentally important feature of the FOA activity (unless the WWW alters our world entirely!) is that there will always be more instances of document readings than of document writings. That is, while we can imagine spending a huge effort analyzing any text, there are fundamental limits on how much we can learn about it from the features it contains alone. But each and every time a document is retrieved and read, we can potentially learn something new about the meaning of this document from this new person's perspective. Machine learning techniques are mandatory if we are to exploit the information provided by this unending stream of queries.
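One simple way to exploit the stream of readings is a Rocchio-flavored update: each time a retrieval leads a reader to judge a document relevant (or not) to their query, nudge the document's term weights toward (or away from) that query's terms. This is a minimal sketch of the general idea, not FOA's method; the function name, learning rate, and update rule are all assumptions:

```python
def feedback_update(term_weights, query_terms, relevant, lr=0.1):
    """Adjust a document's term weights after one reading event.

    term_weights: dict mapping keyword -> current weight (subsymbolic features)
    query_terms:  the terms of the query that retrieved this document
    relevant:     the reader's judgment of the document for that query
    lr:           step size of the adjustment (illustrative value)
    """
    sign = 1.0 if relevant else -1.0
    updated = dict(term_weights)          # leave the original intact
    for t in query_terms:
        updated[t] = updated.get(t, 0.0) + sign * lr
    return updated

# A reader retrieves the document with the query "learning retrieval"
# and finds it relevant, so both terms are reinforced.
w = feedback_update({"learning": 0.5}, ["learning", "retrieval"], relevant=True)
```

Because updates arrive with every reading, even a crude rule like this accumulates evidence about a document's meaning that no amount of analysis of the document's own text could supply.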

As discussed in Chapter 6, the histories of IR and AI have crossed many times in the past, generally in head-on collision rather than constructively. But as AI has moved from a concern with manually constructed knowledge representations to machine learning, and as IR has begun to consider how indexing structures can change with use, these two methodologies have increasingly overlapped.


FOA © R. K. Belew - 00-09-21