FOA Home | UP: Dimensionaility Reduction

Formal notions of similarity

Two features of the FOA problem can help us to focus on what is known as the Minkowski metric [Luenberger69] [Jain88] . First, the result of our calculations below will be a {\em real-valued weight} associating a keyword with a document or query, and we can assume that this is a continuous quantity. Further, we can make the somewhat more questionable assumption that these weights make ``natural'' use of zero, and so index weights also fall on what is known as a ``ratio'' scale. With these two assumption, the Minkowski metrics are defined as: \beq \mathname{Sim}(\mathbf{q,d}) = \left( \sum_{k=1}^{NKw} \mid w_{qk} - w_{dk} \mid^L \right)^{1/L} \eeq where $L \geq 1$. The most common version is the $L=2$ norm, and we use it below. The $L=1$ (``Manhattan distance") and $L=\infty$ (``sup" norm, where $\sum_{k}$ is replaced with $max$) are also seen often.

A metric is a scalar function over pairs of points in the vector space. Minkowski metrics satisfying three critical properties: \mathname{Sim}(x,y) & \ge & 0 \\ \mathname{Sim}(x,y) & = & \mathname{Sim}(y,x) \\ \mathname{Sim}(x,x) & = & \parallel x \parallel \\ & \geq & arg\max_y \mathname{Sim}(x,y)

The measure $(x,x)$ of a vector with itself is what we typically think of as the {\em length} of the vector, or more precisely, its {\em norm} $\parallel x \parallel$.

Two other important features of metric spaces follow from these axioms, and will also prove useful later: \item $\mathname{Sim}(x,y) \leq \parallel x \parallel \cdot \parallel y \parallel$ ({\em Cauchy-Schwartz Inequality}) \item $\parallel x+y \parallel \leq \parallel x \parallel + \parallel y \parallel$ ({\em Triangle Inequality}) gle Inequality}) nequality}) lity}) ) \eenum

Top of Page | UP: Dimensionaility Reduction | ,FOA Home

FOA © R. K. Belew - 00-09-21