FOA Home | UP: Other approaches to classifiation

Boolean predicates

A much different approach to the text classification task has been proposed by Cohen in his RIPPER system [Cohen96a] [Cohen96b] . The space of hypothesis considered by Ripper is a set of Boolean rules composed over a space of keywords. A simple example is showin in Figure (FOAref) , which extends the obvious definition of {\tt IRELAND} to include documents that mention violent {\tt IRA} activities as well. Like decision lists, Ripper's rule sets are easier for human experts to interpret than a large system of Bayesian probabilities.

Ripper is an example of a ``covering" learning algorithm; cf. Figure (figure) This means that it iteratively forms conjunctions of Boolean predicates which ``cover" some of the positive instances of a Boolean classification while excluding all of the negative instances. In the next iteration, the positive instances which were covered previously are removed from the training set, and a new conjunctive clause is formed which again covers some more positive instances while excluding all negative ones. Ultimately, then, the rule set will be in DISJUNCTIVE NORMAL FORM (DNF).

Ripper also includes optimizations to simplify rules by removing conditions which do not affect performance, and by picking conditions which provide the most information gain [Quinlan93] . Finally, Cohen adapted these rule learning techniques to the text domain by adding ``set valued attributes." These special attributes collapse a document's representation to be simply the set of words it contains. Ripper's rules can then include tests for sets of words, rather than having to test the presence/absence of each word individually.

Top of Page | UP: Other approaches to classifiation | ,FOA Home

FOA © R. K. Belew - 00-09-21