To explain his empirical observations, Zipf himself proposed a
theoretical model that aimed at describing the ultimate purpose of
communication between authors and readers.
Zipf's theory was
extraordinarily broad, addressing not only (!) patterns in text but also
patterns across all human activities. According to Zipf's fundamental
PRINCIPLE OF LEAST EFFORT, all activities can be viewed as
interactions between {\em jobs} needing to be done, and {\em tools}
developed to accomplish them. In a mature society in which a variety of
jobs and tools have existed for some time, a ``reciprocal economy''
forms. That is, there is a set of tools good for doing certain jobs, and
there is a set of jobs requiring certain tools. The Principle of Least
Effort asserts that a person attempting to apply a tool to a job does so
in order to minimize the probable effort in using that tool for that
particular job.
In applying this principle to texts, Zipf makes an
important correspondence: words also work as tools, accomplishing jobs
we need done. To simplify the situation greatly, imagine that the job an
author is attempting to accomplish is simply to ``point'' to some
``referent,'' something in the world. Authors would find it most
convenient to simply use one word all the time for all the jobs they are
trying to accomplish. This would make their task much easier, since
picking the right word would be effortless. Authors therefore feel a
pressure towards {\em unification} of the vocabulary.
From the reader's point of view, it would be least
ambiguous if a completely unique term were used for every possible
function, every possible interpretation, every meaning. Readers
therefore feel a pressure towards {\em diversification} of the vocabulary.
This leads to the VOCABULARY BALANCE we observe in Zipf's rule.
Zipf hypothesized that the interplay between the forces of diversification
and unification results in the use of existing words, which do not
extend the vocabulary, in most situations, together with the inclusion
of new words in those novel situations that demand them. The trick is to
find an ``economy of language'' that best satisfies both writer and
reader. Note, however, that maintaining this balance requires that
authors receive {\em feedback} from their readers, confirming that they
are both ``pointing'' to the same referent.
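The mix of reused and newly introduced words that Zipf describes can be
measured directly on any tokenized text. The following is only a minimal
sketch, assuming whitespace tokenization and a toy sentence; the function
names {\tt vocabulary\_balance} and {\tt rank\_frequency} are hypothetical
and are not part of Zipf's or Blair's analysis.

```python
from collections import Counter


def vocabulary_balance(tokens):
    """Count how many token occurrences reuse a known word and how many
    introduce a word not seen earlier in the text."""
    seen = set()
    reused = 0
    introduced = 0
    for tok in tokens:
        if tok in seen:
            reused += 1      # unification: an existing word does the job
        else:
            seen.add(tok)
            introduced += 1  # diversification: a new word enters the vocabulary
    return reused, introduced


def rank_frequency(tokens):
    """Return (rank, word, frequency) triples, most frequent word first."""
    counts = Counter(tokens)
    return [(rank, word, freq)
            for rank, (word, freq) in enumerate(counts.most_common(), start=1)]


# Toy example (an assumption for illustration, not data from the text).
tokens = "the cat saw the dog and the dog saw the cat".split()
reused, introduced = vocabulary_balance(tokens)
table = rank_frequency(tokens)
```

On a realistic corpus one would expect reuses to far outnumber first
uses, which is the pattern the vocabulary balance argument predicts.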
Blair has extended Zipf's
analysis, considering Zipf's tool/job setting as it applies to our FOA
task [REF704] [Blair92]. He argues that one of the
primary reasons FOA systems fail is that the vocabulary balance is
upset. The system of descriptors indexing the authors' works (viz., the
library or the Web) stands between the authors who are writing the
books and the searchers attempting to find them, and so {\em breaks the
feedback channel} that keeps their shared vocabulary in balance when
author and reader are in direct contact.
Zipf's own explanation