Finding Out About

Richard K. Belew

Last updated: 1 Sept 01

p. 75
Missing paren in Eq. 3.8

...controls the probability that the word \( w \) is relevant:

\Pr(d\ {\textstyle \underline{about}\ }\ w \mid \ k \ {\tex...
...k} + %%
(1 - p_{rel}) e^{-\lambda_w^{2}} (\lambda_w^{2})^{k}}
\end{displaymath} (1)

p. 123
Fallout defn and text permutation

Similarly, this is the probability that a document will be relevant, given that it is retrieved: $\Pr({\hbox{\it Rel}}\vert{\hbox{\it Ret}})$. These two measures, Recall and Precision, have remained the bedrock of search engine evaluation since they were first introduced by Kent in 1955 [Kent55,Saracevic75].

A closely related but less common measure is called fallout, where we (perversely!) focus on the irrelevant documents and the fraction of them retrieved:

\begin{displaymath}
{\hbox{\it Fallout}} \equiv
{{\left\vert{\hbox{\it Ret}} \cap \overline{{\hbox{\it Rel}}}\right\vert} \over
{\left\vert\overline{{\hbox{\it Rel}}}\right\vert}}
\end{displaymath} (2)

This is $\Pr({\hbox{\it Ret}}\vert\overline{\hbox{\it Rel}})$.
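As a rough illustration of how the three measures relate, the following sketch computes Precision, Recall and Fallout from plain sets of document identifiers. The function name `retrieval_measures`, the helper `_ratio`, the toy corpus, and the convention of returning 0.0 for an empty denominator are all assumptions made for this example; they are not taken from the book.

```python
def _ratio(numerator, denominator):
    # Assumed convention: an empty denominator yields 0.0 instead of an error.
    return numerator / denominator if denominator else 0.0


def retrieval_measures(retrieved, relevant, corpus):
    """Return (precision, recall, fallout) for sets of document ids."""
    retrieved, relevant, corpus = set(retrieved), set(relevant), set(corpus)
    irrelevant = corpus - relevant          # complement of Rel within the corpus
    hits = retrieved & relevant             # relevant documents that were retrieved
    false_hits = retrieved & irrelevant     # irrelevant documents that were retrieved

    precision = _ratio(len(hits), len(retrieved))         # Pr(Rel | Ret)
    recall = _ratio(len(hits), len(relevant))             # Pr(Ret | Rel)
    fallout = _ratio(len(false_hits), len(irrelevant))    # Pr(Ret | not Rel)
    return precision, recall, fallout


# Hypothetical toy corpus of ten documents, four of them relevant.
corpus = set(range(10))
relevant = {0, 1, 2, 3}
retrieved = {2, 3, 4, 5, 6}

precision, recall, fallout = retrieval_measures(retrieved, relevant, corpus)
print(precision, recall, fallout)
```

On this toy data two of the five retrieved documents are relevant (Precision 0.4), two of the four relevant documents are retrieved (Recall 0.5), and three of the six irrelevant documents are retrieved (Fallout 0.5).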

The close relationship between these three measures can be defined precisely ...

From: Peter Brusilovsky, 20 Feb 01

p. 168
add cite to Robertson77 re: PRP

There are at least two possible interpretations of precisely what a probability of relevance, $\Pr({\hbox{\it Rel}})$, means, in terms of an underlying event space [Maron77,REF178,REF318,Robertson77].

p. 189
Impact $\neq$ eminence

...are those with higher impact than their peers! In fact, Price's characterization of ``eminence'' in this passage focuses on the number of publications, not how well cited these might turn out to be. Understanding how ``impact''/``eminence'' accumulates across papers by the same author, across authors at the same institution, etc. is a critical issue for further investigation.

From: Nick Belkin, pointed out the gap between Price's focus on productivity and this discussion of impact. 27 Aug 01

p. 189
Cocitation vs. coupling

Finally, as mentioned in Section 5.2.5, references among documents can be used as the basis for interdocument similarity: cocitation reflects the degree to which two documents are both referenced by other documents' bibliographies [BarHillel57,Small73]. Conversely, bibliographic coupling refers to the amount of overlap between two documents' bibliographies [Kessler63]. Figure 1

Figure 1: Citation similarity measures

sketches the basis of the two relations, which can be stated formally:
$\displaystyle {\hbox{\it Couple}}(a,b)$ $\textstyle \propto \left\Vert \{x\vert{\hbox{\it Cites}}(a,x)\wedge {\hbox{\it Cites}}(b,x)\}\right\Vert$   (3)
$\displaystyle {\hbox{\it Cocite}}(a,b)$ $\textstyle \propto \left\Vert \{x\vert{\hbox{\it Cites}}(x,a)\wedge {\hbox{\it Cites}}(x,b)\}\right\Vert$   (4)

Normalization of these basic quantities, with respect to the total number of citations in the various documents' bibliographies, ``norms of scholarship'' (see below), etc., seems to be a matter of varying practice.
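The two relations described in the text can be sketched in a few lines, before any normalization. This is only a minimal illustration under stated assumptions: the citation graph is a hypothetical dictionary `cites` in which `cites[x]` is the set of documents appearing in x's bibliography, and the function names `coupling` and `cocitation` are made up for the example.

```python
def coupling(cites, a, b):
    """Bibliographic coupling: size of the overlap between the
    bibliographies of a and b (documents cited by both)."""
    return len(cites.get(a, set()) & cites.get(b, set()))


def cocitation(cites, a, b):
    """Cocitation: number of documents whose bibliographies
    contain both a and b (documents citing both)."""
    return sum(1 for refs in cites.values() if a in refs and b in refs)


# Hypothetical citation graph: cites[x] is the set of documents x cites.
cites = {
    "p1": {"a", "b"},
    "p2": {"a", "b", "c"},
    "p3": {"a"},
    "a": {"x", "y"},
    "b": {"y", "z"},
}

print(coupling(cites, "a", "b"))    # a and b share one reference, y
print(cocitation(cites, "a", "b"))  # p1 and p2 each cite both a and b
```

Both functions return raw counts; dividing by bibliography sizes or by total citation counts would be one way to normalize them, and which divisor to use is exactly the matter of varying practice noted above.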

From: Nick Belkin, identified my confounding of bibliographic coupling and cocitation. 27 Aug 01

p. 210
Definitions of USE, RT

The USE/USE FOR relation captures synonymy or quasi-synonymy; thus, one says that in this thesaurus (or domain) we will consider the terms text retrieval and information retrieval to be synonymous, but will USE only the term information retrieval (USED FOR text retrieval).

Related Terms are those that stand in some (any) relation to one another other than the hierarchical (BT/NT) or synonymous (USE/USE FOR). Sometimes, in thesaurus-speak, we call the BT/NT and USE/USE FOR relations ``paradigmatic'' (i.e. a priori, semantic) and the RT relation ``syntagmatic'' (i.e. a posteriori, dependent upon domain or text characteristics).

From: Nick Belkin, provided these improved definitions, 27 Aug 01

p. 246
Fig 6.26: All arcs should be labeled ``Sim'' not ``Sin''!

p. 261
entropy reduction citation missing page refs

...[Papoulis91, pp. 549-554]. In terms of keyword frequencies, then, the mutual ...
