# A simple example

Imagine that we've collected data on the HEIGHT and WEIGHT of everyone in a classroom of $N$ students. If these are plotted, the result is something like Figure (figure). Notice the correlation around an axis we might call something like SIZE. Students vary most along this dimension; it captures most of the information about their distribution. Just as with our keywords, the two quantities are correlated, and so a single dimension can capture a major source of variation across the HEIGHT/WEIGHT sample.
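The SIZE axis can be found as the direction of maximal variance in the data. The following is a minimal numpy sketch with synthetic classroom data (all numbers and variable names are illustrative, not from the text): heights and weights are generated from a shared latent factor, and the dominant eigenvector of their covariance matrix recovers the SIZE axis.

```python
import numpy as np

# Hypothetical classroom data: N students' heights (cm) and weights (kg),
# both driven by a shared latent "size" factor plus independent noise.
rng = np.random.default_rng(0)
N = 30
size = rng.normal(0.0, 1.0, N)
height = 165 + 10 * size + rng.normal(0, 2, N)
weight = 60 + 8 * size + rng.normal(0, 2, N)

X = np.column_stack([height, weight])
Xc = X - X.mean(axis=0)              # center the data
cov = Xc.T @ Xc / (N - 1)            # 2x2 sample covariance matrix
evals, evecs = np.linalg.eigh(cov)   # eigenvalues in ascending order

# Fraction of total variance lying along the dominant ("SIZE") axis:
explained = evals[-1] / evals.sum()
print(round(explained, 2))
```

Because both measurements reflect the same underlying factor, the leading axis accounts for nearly all of the variance; the remaining, discarded axis carries little information.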

In this section we analyze similar statistical correlations among the keywords and documents contained in the much larger vector space model first mentioned in Section 3.4. Recall that in the vector space model, the $\mathrm{Index}$ relation places $D \equiv \mathrm{NDoc}$ vectors, corresponding to the corpus documents, within the space $\Re^{V}$, $V \equiv \mathrm{NKw}$ (for VOCABULARY SIZE), defined by its keyword vocabulary.

Here we describe this in the terms of linear algebra, where $\mathbf{J} \equiv \mathrm{Index}$ is a $D \times V$ element matrix (within this section we use $V \equiv \mathrm{NKw}$ and $D \equiv \mathrm{NDoc}$). There are several problems with such high-dimensional spaces. Some of these involve the CURSE OF DIMENSIONALITY, which makes the computational expense of many important questions grow exponentially with the number of dimensions. Attempts to reduce this large-dimensional space to something smaller are called DIMENSIONALITY REDUCTION. There are two reasons we might be interested in reducing dimensions. The first is probably the most obvious: this is a very unwieldy representation of documents' content. Individual document vectors will have many, many zeros, corresponding to the many of the $V$ words in the corpus vocabulary not present in an individual document; the vector space matrix is very SPARSE. Dimensionality reduction is a search for a representation that is denser, more compressed.
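The sparsity of the $D \times V$ matrix $\mathbf{J}$ is easy to see concretely. A minimal sketch with a toy three-document corpus (the documents and vocabulary are illustrative, not from the text):

```python
from collections import Counter

# A toy corpus standing in for the D = NDoc documents; the vocabulary of
# V = NKw keywords is built from the corpus itself.
docs = [
    "height and weight of students",
    "weight varies with height",
    "students in a classroom",
]
vocab = sorted({w for d in docs for w in d.split()})
D, V = len(docs), len(vocab)

# D x V index matrix J: J[d][v] counts occurrences of keyword v in doc d.
J = [[Counter(d.split())[w] for w in vocab] for d in docs]

# Most entries are zero -- each short document touches only a few of the
# V vocabulary terms, so the matrix is sparse.
zeros = sum(row.count(0) for row in J)
print(zeros, D * V)
```

Even in this tiny example more than half the entries are zero; for a realistic corpus, where each document uses a minute fraction of the vocabulary, the proportion of zeros is far more extreme.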

Another reason might be to exploit what has become known as LATENT SEMANTIC relationships among these keywords. When we make each term in our vocabulary a dimension, we are effectively assuming they are ORTHOGONAL to one another; we expect their effects to be independent. But many features of FOA suggest that index terms are highly dependent, highly correlated with one another. If that's the case, we can exploit that correlation by capturing only those axes of maximal variation and throwing away the rest.
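One standard way to capture only the axes of maximal variation is a truncated singular value decomposition of the term-document matrix. The sketch below uses a tiny hand-built matrix in which two pairs of keywords co-occur (the matrix entries are illustrative, not from the text); keeping only the $k$ largest singular values retains the correlated structure and discards the rest.

```python
import numpy as np

# A tiny D x V matrix J (rows = documents, columns = keywords).
# Columns 0-1 co-occur across documents, as do columns 2-3.
J = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 1, 2],
    [0, 0, 2, 1],
], dtype=float)

# Full SVD: J = U S Vt, singular values in descending order.
U, S, Vt = np.linalg.svd(J, full_matrices=False)

# Truncate to the k axes of maximal variation and reconstruct.
k = 2
J_k = U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]

# The rank-2 reconstruction preserves the two correlated keyword
# blocks while smoothing away within-block differences.
print(np.round(J_k, 2))
```

The two large singular values correspond to the two correlated keyword pairs; the two small ones, which the truncation drops, carry only the residual within-pair variation.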

FOA © R. K. Belew - 00-09-21