Sources of Feedback

Two distinct classes of machine learning techniques can be applied to the FOA problem. They can be distinguished on the basis of the type of training feedback given to the learning system. The most powerful and well-understood are SUPERVISED learning methods, where each and every training instance given to the learning system comes with an explicit label saying what the learning system should do. Using the Email example, we might want talk announcements to go consistently into one folder, mail from our family into another, and spam to be deleted. In terms of supervised learning, this regime requires that we first provide a TRAINING SET (cf. Section 7.4). In our case the training set is a set of Email messages together with the $C$ mail categories into which we have classified them in the past. After training on this data set, we hope that our classifier generalizes to new, previously unseen messages and classifies them correctly as well.
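
To make the supervised regime concrete, here is a minimal sketch in Python; the library (scikit-learn), classifier (Naive Bayes), and training messages are all our own illustrative assumptions, not prescriptions from the text.

    # A minimal sketch of the supervised regime, using scikit-learn (our
    # choice of library, not the text's). Every training instance is an
    # Email body paired with an explicit label naming one of the C folders.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    # Hypothetical training set: messages we have already filed by hand.
    messages = [
        "Talk announcement: colloquium on learning theory, 4pm Friday",
        "Hi honey, are we still on for dinner at Grandma's Sunday?",
        "WIN BIG $$$ click now to claim your free prize",
    ]
    folders = ["talks", "family", "spam"]  # the C = 3 categories

    classifier = make_pipeline(CountVectorizer(), MultinomialNB())
    classifier.fit(messages, folders)      # learn from the labeled instances

    # We hope the classifier generalizes to previously unseen mail.
    print(classifier.predict(["Reminder: distinguished lecture at 4pm"]))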

A second class of machine learning techniques makes weaker assumptions concerning the availability of training feedback. REINFORCEMENT learning assumes only that a positive/negative signal tells the learning system when it is doing a good/bad job. In the FOA process, for example, relevance feedback generates a reinforcement signal, saying whether it was a good or bad thing that a document was retrieved.
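
A toy sketch of this weaker regime follows; it is our own illustration rather than an algorithm from the text. The only training signal is a scalar reward per retrieved document, which nudges the weights of that document's terms.

    # A toy illustration (ours, not FOA's): the only training signal is a
    # scalar reward, +1 if the user was glad a document was retrieved and
    # -1 if not, which nudges the weights of that document's terms.
    from collections import defaultdict

    weights = defaultdict(float)   # learned term weights, initially zero
    LEARNING_RATE = 0.1            # an arbitrary illustrative constant

    def score(doc_terms):
        """Rank a document by the sum of its learned term weights."""
        return sum(weights[t] for t in doc_terms)

    def reinforce(doc_terms, reward):
        """Apply the user's +1/-1 reaction to a retrieved document."""
        for t in doc_terms:
            weights[t] += LEARNING_RATE * reward

    reinforce({"colloquium", "learning", "theory"}, +1)  # good retrieval
    reinforce({"prize", "free", "click"}, -1)            # bad retrieval
    print(score({"learning", "colloquium"}))             # now scores higher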

Note that RelFbk does not count as supervised learning: in general we do not know all of the documents which should have been retrieved with respect to a particular query. Supervised training provides more information in the sense that each and every aspect of the learner's action (retrieval) can be contrasted with corresponding features of the correct action. Reinforcement information, on the other hand, aggregates all of these features into a single measure of performance.
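
One classic way to exploit exactly this kind of aggregate, per-document signal is Rocchio's query-modification formula, sketched below. The parameter values (alpha, beta, gamma) are conventional defaults, not values from the text: the user's reactions move the query vector toward documents they were glad to see and away from the rest, without ever specifying the perfect retrieval.

    # Rocchio query modification over term-weight vectors. The parameter
    # values are conventional defaults, not values taken from the text.
    import numpy as np

    def rocchio(query, relevant, nonrelevant,
                alpha=1.0, beta=0.75, gamma=0.15):
        """relevant/nonrelevant: vectors of docs the user marked +/-."""
        q = alpha * query
        if relevant:
            q += beta * np.mean(relevant, axis=0)
        if nonrelevant:
            q -= gamma * np.mean(nonrelevant, axis=0)
        return np.maximum(q, 0.0)  # negative term weights are dropped

    q0 = np.array([1.0, 0.0, 0.0])      # initial query, 3-term vocabulary
    good = [np.array([0.5, 0.8, 0.0])]  # glad these were retrieved
    bad = [np.array([0.0, 0.0, 0.9])]   # sorry these were retrieved
    print(rocchio(q0, good, bad))       # query drifts toward the good docs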

The difference between these two kinds of learning is especially stark in the FOA context. To provide reinforcement information, the user need only react to each document and say whether they are happy or sad that it was retrieved. To provide supervised training, the user would need to identify the perfect retrieval, which would require evaluating each and every document in the corpus! Clearly, having each user evaluate every document in the corpus with respect to every query is excessive. What approximations to this notion of a ``correct'' answer might be useful?

The distinction between supervised retrieval and retrieval shaped by RelFbk highlights the need to be explicit about which kinds of feedback are hard for the user to provide and which are easier. The discussion of RAVE (§4.4) made some of our assumptions concerning cognitive overhead clear, but this is another important area for further study. What other feedback might we reliably and easily be able to elicit? Could users tell us a retrieved document is too general or too specific? Too theoretical or too applied? How could such information be exploited by a learning system? Here we continue to assume that RelFbk is easy to acquire.
