# Binary Independence Model

Perhaps the simplest model proceeds by imagining binary, independent features; it is conventionally called (surprise!) the BINARY INDEPENDENCE MODEL (BIM) [Robertson76, vanR77]. First, the binary assumption is that all the features $x_{i}$ are binary. This is not a very restrictive assumption and is used only to simplify the derivation.

The much bigger assumption is that the documents' features occur independently of one another. We have discussed the problems with such an assumption before. Consider J. H. Williams' expression of the paradox: "The assumption of independence of words in a document is usually made as a matter of mathematical convenience. Without the assumption, many of the subsequent mathematical relations could not be expressed. With it, many of the conclusions should be accepted with extreme caution." [Williams65] The key advantage it allows is that the probability of a feature vector $\mathbf{x}$ becomes simply the product of the MARGINAL PROBABILITIES of the individual features:

$$\Pr(\mathbf{x} \mid \mathrm{Rel}) = \prod_{i} \Pr(x_i \mid \mathrm{Rel})$$

Very convenient, and very unrealistic. (Cooper has noted that we don't need quite such a restrictive notion of independence [Cooper94]. All that is required is that the ratios of "linked" ir/relevant feature probabilities $\Pr(x_i \mid \mathrm{Rel}) / \Pr(x_i \mid \overline{\mathrm{Rel}})$ be independent of one another. While slightly less restrictive, this assumption seems no more realistic.)
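The independence assumption can be made concrete with a small sketch: under it, $\Pr(\mathbf{x} \mid \mathrm{Rel})$ is just the product of per-feature marginals. The marginal values below are invented purely for illustration.

```python
# Under the independence assumption, the probability of a binary feature
# vector given relevance is the product of the per-feature marginals.

def joint_prob(x, marginals):
    """Pr(x | Rel) under independence.

    x         -- list of 0/1 feature indicators for one document
    marginals -- marginals[i] = Pr(x_i = 1 | Rel)
    """
    prob = 1.0
    for xi, pi in zip(x, marginals):
        # A present feature contributes Pr(x_i=1), an absent one Pr(x_i=0).
        prob *= pi if xi == 1 else (1.0 - pi)
    return prob

# Three binary features with assumed marginals.
p = [0.8, 0.5, 0.1]
print(joint_prob([1, 1, 0], p))  # 0.8 * 0.5 * 0.9 = 0.36
```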

Applying this decomposition to our odds calculation gives:

$$\mathrm{Odds}(\mathrm{Rel} \mid \mathbf{x}) = \mathrm{Odds}(\mathrm{Rel}) \cdot \prod_i \frac{\Pr(x_i \mid \mathrm{Rel})}{\Pr(x_i \mid \overline{\mathrm{Rel}})}$$

It will be convenient to introduce the variables $p_i$ and $q_{i}$ to capture the probabilities that feature $x_{i}$ is present, given that a document is or is not relevant:

$$p_i \equiv \Pr(x_i = 1 \mid \mathrm{Rel}) \qquad q_i \equiv \Pr(x_i = 1 \mid \overline{\mathrm{Rel}})$$

The complementary probabilities, concerning documents in which the feature is absent, can also be defined easily:

$$1 - p_i = \Pr(x_i = 0 \mid \mathrm{Rel}) \qquad 1 - q_i = \Pr(x_i = 0 \mid \overline{\mathrm{Rel}})$$

These definitions break the product into two portions, the first having to do with those features that are present in a particular document and the second with those that are not:

$$\mathrm{Odds}(\mathrm{Rel} \mid \mathbf{x}) = \mathrm{Odds}(\mathrm{Rel}) \cdot \prod_{x_i=1} \frac{p_i}{q_i} \cdot \prod_{x_i=0} \frac{1-p_i}{1-q_i}$$
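The two-product form above translates directly into code: present features contribute $p_i/q_i$ and absent features contribute $(1-p_i)/(1-q_i)$. The prior odds and feature statistics below are invented for illustration.

```python
# Odds(Rel | x) = Odds(Rel) * prod_{x_i=1} p_i/q_i * prod_{x_i=0} (1-p_i)/(1-q_i)

def odds_relevance(x, p, q, prior_odds):
    """Compute the odds of relevance for binary feature vector x.

    p[i] = Pr(x_i=1 | Rel), q[i] = Pr(x_i=1 | not Rel).
    """
    odds = prior_odds
    for xi, pi, qi in zip(x, p, q):
        if xi == 1:
            odds *= pi / qi                    # feature present
        else:
            odds *= (1.0 - pi) / (1.0 - qi)    # feature absent
    return odds

p = [0.8, 0.5]   # assumed Pr(x_i=1 | Rel)
q = [0.3, 0.5]   # assumed Pr(x_i=1 | not Rel)
print(odds_relevance([1, 0], p, q, prior_odds=0.25))  # 0.25 * (0.8/0.3) * (0.5/0.5)
```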

Recall that both queries and documents live within the same vector space defined over the features $x_{i}$. The two products of Equation (FOAref) (defined in terms of presence/absence of a feature in a document) can be further broken into four subcases, depending on whether the features do or do not occur in the query. We next make another "background" assumption concerning all of those features $x_{i}$ that do not occur in the query of current interest, viz., that the probability of these features being present is the same in relevant and irrelevant documents: $p_i = q_i$. That is, for those terms we don't care about (because they don't affect this query/document comparison) we are happy to think that the occurrence of these terms is independent of a document's ir/relevance.

Consider the sets $D$ and $Q$ shown in Figure (figure), defined in terms of those features $x_i$ present in the document and query, respectively. (We use $Q$ for the set of query features and $q_i$ for the probability of the presence of a feature in irrelevant documents, but there is no intended, direct connection between these two quantities.) Regrouping the two products of Equation (FOAref) into the four products created by the two sets $D$ and $Q$, the terms cancel except in $Q \cap D$, the intersection of the query and document (where the feature $x_{i}$ is present in both), and in $Q \backslash D$, the set difference of $Q$ take-away $D$:

$$\mathrm{Odds}(\mathrm{Rel} \mid \mathbf{x}) = \mathrm{Odds}(\mathrm{Rel}) \cdot \prod_{x_i \in Q \cap D} \frac{p_i}{q_i} \cdot \prod_{x_i \in Q \backslash D} \frac{1-p_i}{1-q_i}$$

In the retrieval situation we will exploit sparseness: it is much more efficient to keep track of where a feature does occur ($x_{i}=1$) than of all the places it does not ($x_{i}=0$). Since the second product is defined over all the features of $Q$ except those in $D$, if we are careful to "pre-multiply" each feature in their intersection by a reciprocal, we can then safely multiply everything in the query by the same ratio:

$$\mathrm{Odds}(\mathrm{Rel} \mid \mathbf{x}) = \mathrm{Odds}(\mathrm{Rel}) \cdot \prod_{x_i \in Q} \frac{1-p_i}{1-q_i} \cdot \prod_{x_i \in Q \cap D} \frac{p_i \, (1-q_i)}{q_i \, (1-p_i)}$$

The next section will show the utility of separating the last term, which depends on features of the document in question, from the first two, which do not, as part of an online retrieval calculation.

But first, it is worthwhile considering how we might attempt to estimate some of the required statistics. Fuhr [Fuhr92], for example, considers the case when we have RelFbk from a user who has evaluated each of the top $N$ documents in an initial retrieval and has found $R$ of these to be relevant (and hence the remaining $N - R$ to be irrelevant!). If a particular feature $x_{i}$ is present in $n$ of the retrieved documents, with $r$ of these relevant, then this bit of RelFbk provides reasonable estimates for $p_{i}$ and $q_{i}$:

$$\widetilde{p_{i}} = \frac{r}{R} \qquad \widetilde{q_{i}} = \frac{n-r}{N-R}$$
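These estimates are simple counting. A minimal sketch, with the judgment counts invented for illustration:

```python
# Fuhr-style estimation from relevance feedback: N judged documents,
# R of them relevant; term i occurs in n judged documents, r of those relevant.

def estimate_p_q(N, R, n, r):
    """Return (p_i, q_i) estimates: p_i = r/R, q_i = (n-r)/(N-R)."""
    return r / R, (n - r) / (N - R)

# 20 judged documents, 5 relevant; the term appears in 8, 4 of them relevant.
p_i, q_i = estimate_p_q(N=20, R=5, n=8, r=4)
print(p_i, q_i)  # 4/5 = 0.8 and 4/15
```

In practice these raw ratios are usually smoothed (e.g. by adding 0.5 to each count) to avoid zero probabilities, though that refinement is not part of the estimates above.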

FOA © R. K. Belew - 00-09-21