

Building hypotheses about documents

We will talk about competing "hypotheses": for example, rules that successfully divide our spam Email from our family's Email. If only very simple hypotheses are to be considered, a relatively small amount of data can be used to select between them. For example, if our hypothesis is that spam Email always contains the phrase "$$$$ BIG MONEY $$$$$", a small amount of training data is sufficient to confirm or disconfirm this rule [Sahami98]. But if we wish to consider elaborate discrimination rules (for example, ones including many keywords, date information, etc.), it takes much more data to tease apart all the various alternatives. The volume of training data available, then, provides a very real constraint on how complex the hypotheses we consider can be, and on how statistically reliable we can expect the resulting rules to be on unseen test data.
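The contrast between simple and elaborate hypotheses can be made concrete with a small sketch. Everything here is an assumption made for illustration and is not taken from [Sahami98]: the four-message training set is invented, and the helper names `phrase_rule`, `accuracy`, and `count_keyword_rules` are hypothetical. The first part checks the single-phrase rule against a handful of labeled messages. The second part counts how many "contains all of these keywords" rules become available as rules are allowed to use more keywords, which suggests why far more data is needed to choose among them.

```python
import math

# Hypothetical toy training set: (message text, is_spam) pairs invented for illustration.
TRAINING = [
    ("$$$$ BIG MONEY $$$$$ act now", True),
    ("Dinner at grandma's on Sunday?", False),
    ("Earn $$$$ BIG MONEY $$$$$ from home", True),
    ("Photos from the family trip", False),
]


def phrase_rule(phrase):
    """Build a hypothesis: a message is spam if and only if it contains `phrase`."""
    return lambda text: phrase in text


def accuracy(rule, examples):
    """Fraction of labeled examples that the rule classifies correctly."""
    correct = sum(rule(text) == label for text, label in examples)
    return correct / len(examples)


def count_keyword_rules(vocabulary_size, max_keywords):
    """Count the distinct 'contains all of these keywords' rules that use
    between 1 and `max_keywords` keywords drawn from the vocabulary."""
    return sum(math.comb(vocabulary_size, k) for k in range(1, max_keywords + 1))


if __name__ == "__main__":
    simple = phrase_rule("$$$$ BIG MONEY $$$$$")
    print("accuracy of the single-phrase rule:", accuracy(simple, TRAINING))
    for k in (1, 2, 3):
        print(f"rules using up to {k} keyword(s) from 1000 words:",
              count_keyword_rules(1000, k))
```

With a 1000-word vocabulary, allowing rules of up to three keywords already yields more than a hundred million candidates. Four messages can confirm or disconfirm one fixed phrase rule, but they cannot meaningfully separate that many alternatives.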





FOA © R. K. Belew - 00-09-21