

Zipf's own explanation

To explain his empirical observations, Zipf himself proposed a theoretical model that aimed at describing the ultimate purpose of communication between authors and readers.

Zipf's theory was extraordinarily broad, addressing not only (!) patterns in text but also patterns across all human activities. According to Zipf's fundamental PRINCIPLE OF LEAST EFFORT, all activities can be viewed as interactions between {\em jobs} needing to be done and {\em tools} developed to accomplish them. In a mature society in which a variety of jobs and tools have existed for some time, a ``reciprocal economy'' forms. That is, there is a set of tools good for doing certain jobs, and there is a set of jobs requiring certain tools. The Principle of Least Effort asserts that a person attempting to apply a tool to a job does so in order to minimize the probable effort in using that tool for that particular job.

In applying this principle to texts, Zipf draws an important correspondence: words also work as tools, accomplishing jobs we need done. To simplify the situation greatly, imagine that the job an author is attempting to accomplish is simply to ``point'' to some ``referent,'' something in the world. Authors would find it most convenient to use a single word all the time, for all the jobs they are trying to accomplish. That would make their task much easier, since picking the right word would be effortless. The author thus feels a pressure towards {\em unification} of the vocabulary.

From the reader's point of view, it would be least ambiguous if a completely unique term were used for every possible function, every possible interpretation, every meaning. Readers therefore feel a pressure towards {\em diversification} of the vocabulary. The tension between these two pressures leads to the VOCABULARY BALANCE we observe in Zipf's rule. Zipf hypothesized that interplay between the forces of diversification and unification results in the use of existing words, which do not extend the vocabulary, in most situations, together with the inclusion of new words in those novel situations that demand them. The trick is to find an ``economy of language'' that best satisfies both writer and reader. Note, however, that the maintenance of this balance requires that authors receive {\em feedback} from their readers, confirming that they are both ``pointing'' to the same referent.

Blair has extended Zipf's analysis, considering Zipf's tool/job setting as it applies to our FOA task [REF704] [Blair92]. He argues that one of the primary reasons FOA systems fail is that the vocabulary balance is upset. The system of descriptors indexing the authors' works (viz., the library, or the Web) stands between the authors who are writing the books and the searchers attempting to find them, and so {\em breaks the feedback channel} that keeps their shared vocabulary in balance when author and reader are in direct contact.




FOA © R. K. Belew - 00-09-21