**FOA Home**
**UP:** Preference Relations

In many kinds of retrieval interface, it is easy and natural for a user to specify that they prefer one retrieved document over another. The RelFbk discussed in Section §4.1 is our standard characterization of this information.

The literature on
**MULTIDIMENSIONAL SCALING** (MDS) was developed especially to deal
with assessments by human subjects of the similarity or difference among
multiple stimulus patterns [Borg87].
A robust and important result from this analysis is that people can much
more easily and consistently provide **NONMETRIC** assessments of
similarity than if they are forced to specify metric quantities. Rather
than an experimenter *arbitrarily* imposing dimensions they
believe important, we can ask the subjects to globally judge, without
criteria provided by the experimenter, the overall similarities of
different facial expressions. The *proximities* are then mapped
into [MDS] *distances*, and the configuration examined for
systematic characteristics of the distribution of points [Borg87].

The ``facial expression''
example is a reference to Woodworth's experiments in 1938 [Woodworth38]. Imagine a hypothesis
that facial expressions can be characterized in terms of two dimensions
of variability. We could test this by showing human subjects pairs of
pictures and asking them to judge how similar/dissimilar the two facial
expressions are, in terms of ``difference in emotional expression or
content.'' Then imagine they are given some **PROXIMITY** scale $p$
from $1, 2, \ldots, k$ along which they are to rank the picture pairs; each
pair of pictures therefore has a proximity score $p_{ij}$. If these
pictures are to be characterized in a two-dimensional plane, we can also
associate with each pair of pictures a distance $d_{ij}$ according to
their two-dimensional coordinates:

$$
\begin{aligned}
p_{ij} &\equiv \mathrm{proximity\ (evaluation)} \\
d_{ij} &\equiv \mathrm{distance\ (in\ the\ two\ dimensional\ plane)}
\end{aligned}
$$

While ``similarity'' seems a simple idea, it admits a number of
interpretations. For example, what is the opposite of similar? When we
talk about spaces, does dissimilarity mean that two things are far
apart? Rather than considering dissimilarity to be a negative quantity,
it is more conventional to think of it as the inverse of similarity:

$$
\mathrm{Sim} = \frac{1}{\mathrm{Dissim}+1}
$$

The MDS analysis
described by Borg and Lingoes (they actually prefer the term Similarity
Structure Analysis, SSA) is a key contribution. It is an algorithm for
iteratively moving vectors corresponding to the objects of evaluation
(pictures of facial expressions) within a space of arbitrary dimension
so as to minimize as much as possible the **STRESS** they experience,
relative to their pairwise proximities. We consider the pictures to have
been well placed if those with similar proximities are close together as
measured by this distance.
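The stress-minimization idea can be made concrete with a small sketch. What follows is *not* Borg and Lingoes' actual SSA algorithm: it uses a simple metric least-squares stress rather than their nonmetric formulation, and the target proximities are invented stand-ins for human similarity judgments. Points are moved in a two-dimensional plane by gradient descent until their pairwise distances approximate the target proximities.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6

# Symmetric target proximities (hypothetical data standing in for
# human similarity judgments), zero on the diagonal.
P = rng.random((n, n))
P = (P + P.T) / 2
np.fill_diagonal(P, 0)

X = rng.standard_normal((n, 2))  # initial 2-D configuration

def stress(X):
    """Raw (metric) stress: squared mismatch between configuration
    distances and target proximities."""
    diff = X[:, None, :] - X[None, :, :]
    D = np.linalg.norm(diff, axis=-1)
    return np.sum((D - P) ** 2)

s0 = stress(X)
lr = 0.002
for _ in range(2000):
    diff = X[:, None, :] - X[None, :, :]
    D = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(D, 1.0)  # avoid 0/0 on the diagonal (those terms vanish)
    # Gradient of the stress with respect to each point's coordinates.
    G = 4 * np.sum(((D - P) / D)[:, :, None] * diff, axis=1)
    X -= lr * G
```

After the loop, `stress(X)` should be well below its initial value `s0`; a production implementation would of course use a principled step size and stopping rule.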

Within any such space, we can replace the pairwise distances $d_{ij}$ with $d_{i}$ associated with each point (picture), measuring the distance of this point from the origin of the space. Proximities $p_{i}$ are defined in terms of the projections of these points onto the space's principal component vectors.

To quantify
this notion of correspondence between humans' proximity assessments and
their embedding in arbitrary spaces, Guttman has defined a measure of
**MONOTONIC CORRESPONDENCE** between two variables:

$$
\mu_{2} \equiv
\frac{\sum\limits_{i}\sum\limits_{j}\left( p_{i}-p_{j}\right) \cdot
\left( d_{i}-d_{j}\right) }{\sum\limits_{i}\sum\limits_{j}\left|
p_{i}-p_{j}\right| \cdot \left| d_{i}-d_{j}\right| }
$$

capturing the
extent to which the placement of the items within the two-dimensional
space is consistent with the proximities. This formula really is as
simple as it looks: it is approximately a correlation of proximities
with distances, but ``gated'' by the comparison of differences in the
numerator and absolute values of differences in the denominator. The
value of $\mu_{2}$ is never less than the simple linear correlation, and
the two are exactly equal when a linear relationship holds between
proximity and distance.
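As a check on this reading of the formula, here is a small computation (the proximity values are made up) showing that $\mu_{2}$ rewards any monotone relationship, where linear correlation rewards only a linear one:

```python
import numpy as np

def mu2(p, d):
    """Guttman's monotonic correspondence coefficient between
    proximities p and distances d (1-D arrays of equal length)."""
    dp = p[:, None] - p[None, :]   # all pairwise differences p_i - p_j
    dd = d[:, None] - d[None, :]   # all pairwise differences d_i - d_j
    return np.sum(dp * dd) / np.sum(np.abs(dp) * np.abs(dd))

p = np.array([1.0, 2.0, 3.0, 5.0])
d = p ** 3   # a perfectly monotone but nonlinear relation

# mu2 is exactly 1 for any monotone relation, since every pairwise
# difference in p has the same sign as the matching difference in d;
# Pearson correlation is strictly smaller because the relation is nonlinear.
print(mu2(p, d), np.corrcoef(p, d)[0, 1])
```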

In the FOA retrieval situation, the obvious measure of interest is the matching function $\mathrm{Match}(q,d)$ (cf. Section §4.3.4) score with respect to a (henceforth implicit) query $q$.

A particularly interesting use of this criterion is as part of error-correction learning §7.3. If we assume that the ranking function $R$ has certain free parameters $\Theta$, that we again have a training set of documents $T$, and that the criterion $J$ is differentiable with respect to $\Theta$, a gradient search procedure can be used to adjust $\Theta$ towards an optimal retrieval:

$$
\frac{\partial J(\mathrm{Match}_{\Theta })}{\partial \Theta }
= \sum_{\mathbf{d} \in T}
\frac{\partial J(\mathrm{Match}_{\Theta })}{\partial \mathrm{Match}_{\Theta }(\mathbf{d})}
\, \frac{\partial \mathrm{Match}_{\Theta }(\mathbf{d})}{\partial \Theta }
$$
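A minimal numeric sketch of this chain rule, with a single free parameter and a hypothetical squared-error criterion $J$; the document vectors, target scores, and learning rate below are all invented for illustration:

```python
import numpy as np

# Toy setup: four training documents and a fixed (implicit) query.
rng = np.random.default_rng(0)
D = rng.random((4, 5))                    # document vectors
q = rng.random(5)                         # query vector
targets = np.array([1.0, 0.8, 0.2, 0.0])  # desired match scores (invented)

def match(theta, d):
    """Match_Theta(d): inner product scaled by one free parameter theta."""
    return theta * (q @ d)

def J(scores):
    """Criterion to maximize: negative squared error against the targets."""
    return -np.sum((scores - targets) ** 2)

theta, lr = 0.1, 0.01
J0 = J(np.array([match(theta, d) for d in D]))
for _ in range(200):
    scores = np.array([match(theta, d) for d in D])
    # Chain rule: dJ/dtheta = sum_d  dJ/dMatch(d) * dMatch(d)/dtheta
    dJ_dscore = -2 * (scores - targets)              # outer factor
    dscore_dtheta = np.array([q @ d for d in D])     # inner factor
    theta += lr * np.sum(dJ_dscore * dscore_dtheta)  # gradient ascent on J
```

With only one parameter the fixed point can be found in closed form, but the loop mirrors the structure of the equation: an outer derivative of $J$ with respect to each document's score, times an inner derivative of each score with respect to $\Theta$.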

For example, Bartell et al. [REF1092] [REF1141] consider the problem of picking a document-query similarity measure $R$ parameterized by $\Theta$ so as to vary across a broad range of alternatives:

$$
\mathrm{Match}_{\Theta }(\mathbf{d})
= \mathrm{Sim}_{\Theta }\left( \mathbf{d},\mathbf{q} \right)
= \frac{\mathbf{q}\cdot \mathbf{d}}{\left( \sum\limits_{d_{i}\neq 0}d_{i}^{\theta_{1}}\right) ^{\theta_{2}}},
\qquad \Theta =\left\langle \theta_{1},\theta_{2}\right\rangle
$$

This characterization of the similarity score includes the standard inner product ($\theta_{2} = 0$; $\theta_{1}$ can be anything), cosine ($\theta_{1} = 2$, $\theta_{2} = 0.5$) and pseudo-cosine ($\theta_{1} = 1$, $\theta_{2} = 1$) as special cases.
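These special cases are easy to verify numerically. The sketch below implements the parameterized score (function and variable names are mine, and the sample vectors are nonnegative, as with term weights) and checks each setting against its standard counterpart:

```python
import numpy as np

def sim(q, d, theta1, theta2):
    """Parameterized similarity: (q . d) / (sum over nonzero d_i of
    d_i^theta1) ^ theta2."""
    nz = d[d != 0]
    return (q @ d) / (np.sum(nz ** theta1) ** theta2)

q = np.array([1.0, 0.0, 2.0])
d = np.array([0.5, 1.5, 0.0])

# theta2 = 0: the denominator collapses to 1, leaving the raw inner product.
# theta1 = 2, theta2 = 0.5: the denominator is |d|, giving cosine-style
#   normalization q.d / |d| (zero components contribute nothing either way).
# theta1 = 1, theta2 = 1: pseudo-cosine, q.d / sum_i d_i.
print(sim(q, d, 3, 0), sim(q, d, 2, 0.5), sim(q, d, 1, 1))
```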

In empirical testing using the CISI experimental corpus (cf. Section §4.3.9) and its binary relevance assessments for preference relations ($\mathrm{Rel} \succ \mathrm{IRel}$), Bartell et al. find optimal values of $\theta_{1} = 2.5$, $\theta_{2} = 0.3$: a curious and non-standard type of similarity measure indeed, but one functionally quite similar to standard cosine. In fact, changing the matching function to this ``optimal'' value improves performance (as measured by average precision) almost not at all.
