Automatic thesaurus construction

Before going on to consider WordNet, another elaborate thesaurus, it is useful to relate these manually constructed representations to their automatic, statistically derived analogs. Such comparisons have been part of IR research since its beginnings [Joyce58] [Dennis64] [Soergel74]. Section 5.2.5 discussed how the same information used to cluster documents can be used to cluster keywords as well. The semantics of relations based strictly on co-occurrence frequencies are not obvious [vanR77], but such frequencies seem to provide evidence for the \textbf{RT} (related term, or synonymy relation) discussed above.
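As a rough illustration of how co-occurrence frequencies might suggest related-term candidates, the following minimal sketch scores keyword pairs by how often they index the same documents. The Dice coefficient, the 0.5 threshold, the function name `related_terms`, and the toy document set are all assumptions chosen for illustration; they are not taken from the works cited above.

```python
from itertools import combinations


def related_terms(docs, threshold=0.5):
    """Return keyword pairs whose Dice co-occurrence score meets the threshold.

    docs maps a document id to the list of keywords indexing that document.
    The Dice coefficient and the threshold are illustrative assumptions.
    """
    # Build a posting set for each keyword: the documents it appears in.
    postings = {}
    for doc_id, keywords in docs.items():
        for kw in set(keywords):
            postings.setdefault(kw, set()).add(doc_id)

    pairs = {}
    for a, b in combinations(sorted(postings), 2):
        shared = len(postings[a] & postings[b])
        dice = 2 * shared / (len(postings[a]) + len(postings[b]))
        if dice >= threshold:
            pairs[(a, b)] = dice
    return pairs


# Hypothetical toy collection, for illustration only.
docs = {
    "d1": ["retrieval", "index", "query"],
    "d2": ["retrieval", "query"],
    "d3": ["index", "thesaurus"],
    "d4": ["thesaurus", "wordnet"],
}
rt_candidates = related_terms(docs)
```

Pairs that always occur together, such as "query" and "retrieval" in this toy collection, receive the maximum score. Whether a high score reflects synonymy or some looser association is exactly the semantic question the text raises.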

Using this co-occurrence information to construct hierarchic relations among keywords amounts to applying (hierarchic) clustering techniques. Thesaurus-specific techniques generally exploit the heuristic that high-frequency keywords correspond to broad, general terms while low-frequency keywords correspond to narrow, specific ones [Srinivasan92]. This heuristic can be used to organize keywords into levels of a taxonomy, with the hierarchic parent/child relation formed between keywords in adjacent levels that have similar document distributions. RelFbk can also be used to provide thesaurus structure [Guntzer89]. Whether constructed manually or automatically, thesaurus structures support many new forms of navigation [REF1077] [REF1079].
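The frequency heuristic described above can be sketched as follows. This is only one possible reading, under stated assumptions: levels are assigned by fixed document-frequency cutoffs (`level_bounds`), "similar document distributions" is approximated by Jaccard overlap of posting sets, and each keyword is attached to its most similar keyword one level up. The function name `build_taxonomy`, the cutoffs, and the toy data are hypothetical and not drawn from [Srinivasan92].

```python
def build_taxonomy(docs, level_bounds=(3, 2)):
    """Assign keywords to levels by document frequency and link parents.

    docs maps a document id to its keywords. level_bounds gives minimum
    document frequencies for levels 0, 1, ...; anything below the last
    bound falls into the deepest level. All cutoffs are assumptions.
    """
    postings = {}
    for doc_id, keywords in docs.items():
        for kw in set(keywords):
            postings.setdefault(kw, set()).add(doc_id)

    def level(kw):
        df = len(postings[kw])
        for depth, bound in enumerate(level_bounds):
            if df >= bound:
                return depth  # frequent keywords sit near the root
        return len(level_bounds)

    def jaccard(a, b):
        return len(postings[a] & postings[b]) / len(postings[a] | postings[b])

    levels = {kw: level(kw) for kw in postings}

    parent = {}
    for kw, depth in levels.items():
        # Candidate parents are the keywords exactly one level broader.
        candidates = [c for c, d in levels.items() if d == depth - 1]
        # Ties are broken by keyword name so the result is deterministic.
        best = max(candidates, key=lambda c: (jaccard(kw, c), c), default=None)
        if best is not None and jaccard(kw, best) > 0:
            parent[kw] = best
    return levels, parent


# Hypothetical toy collection, for illustration only.
docs = {
    "d1": ["science", "physics", "optics"],
    "d2": ["science", "physics"],
    "d3": ["science", "biology"],
    "d4": ["science", "biology", "genetics"],
}
levels, parent = build_taxonomy(docs)
```

In this toy collection the most frequent keyword, "science", becomes the root, while "optics" and "genetics" attach beneath "physics" and "biology" respectively. Fixed cutoffs are the simplest choice; a real collection would more plausibly derive level boundaries from the observed frequency distribution.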

