# Multi-variate Bernoulli

Arguably the simplest model captures only the presence/absence of words in the document. That is, the document is modeled as a composition of keywords $k$ drawn from the vocabulary as so many independent Bernoulli trials: we imagine that a document $\mathbf{d}$ is constructed by selecting one word for each of its $|\mathbf{d}|$ positions.
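The generative story above can be sketched in a few lines of Python. The vocabulary, class label, and probability values here are purely hypothetical placeholders for illustration; only the mechanism (one independent draw per position) comes from the text.

```python
import random

# Hypothetical toy vocabulary and per-class word probabilities (assumed
# values for illustration; not parameters given in the text).
vocab = ["cat", "dog", "fish"]
theta = {"pets": {"cat": 0.5, "dog": 0.4, "fish": 0.1}}

def generate_document(c, length, seed=0):
    """Fill each of `length` positions with an independent draw given class c."""
    rng = random.Random(seed)
    words = list(theta[c])
    weights = [theta[c][w] for w in words]
    return rng.choices(words, weights=weights, k=length)

doc = generate_document("pets", 5)
```

Because each position is filled independently, the same word may appear many times; the multivariate Bernoulli model then discards these counts and records only which vocabulary words occurred at all.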

A reasonable simplification is to assume that the word's position within the document does not affect its conditional probability:

$$
\forall i,j: \quad \Pr(k_{i} \mid c ; \Theta) \;=\; \Pr(k_{j} \mid c ; \Theta) \;\equiv\; \Pr(k \mid c ; \Theta)
$$

When we become interested in realistic document structures and writing conventions (e.g., abstract paragraphs, introductions and conclusions, the spiral expositions of news stories (cf. Section 6.2), etc.), this assumption must be reconsidered.

If we associate a biased coin with each keyword $k$, we can decompose the desired model into two sets of parameters:

$$
\theta_{c} \equiv \Pr(c), \qquad \theta_{ck} \equiv \Pr(k \mid c)
$$

i.e., the prior probability of each class $c$, and the probability that keyword $k$ is present given that the document containing it is in class $c$. Then the "naive Bayesian" assumption allows us to assume that the keywords occur at each position independently of one another:

$$
\Pr(\mathbf{d} \mid c) = \prod_{i=1}^{|\mathbf{d}|} \theta_{c k_{i}}
$$

where $k_{i}$ is the keyword at position $i$ of $\mathbf{d}$.
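Combining the two parameter sets, a classifier picks the class maximizing $\Pr(c)\,\Pr(\mathbf{d} \mid c)$. A minimal sketch follows, computed in log space to avoid underflow from the long product; the class names and all $\theta$ values are assumed toy numbers, not trained parameters from the text.

```python
import math

# Hypothetical parameters (assumed values for illustration):
# theta_c[c]     = Pr(c),       the class prior
# theta_ck[c][k] = Pr(k | c),   per-keyword probability given class c
theta_c = {"sports": 0.5, "politics": 0.5}
theta_ck = {
    "sports":   {"game": 0.7, "vote": 0.1, "team": 0.6},
    "politics": {"game": 0.2, "vote": 0.8, "team": 0.1},
}

def log_posterior(doc, c):
    """log Pr(c) + sum_i log theta_{c,k_i}: the naive Bayes product in log space."""
    return math.log(theta_c[c]) + sum(math.log(theta_ck[c][k]) for k in doc)

def classify(doc):
    """Return the class with the highest (unnormalized) posterior."""
    return max(theta_c, key=lambda c: log_posterior(doc, c))

classify(["game", "team"])  # -> "sports" under the toy parameters above
```

Note that summing logs rather than multiplying probabilities is the standard numerical safeguard here: for realistic document lengths the raw product $\prod_i \theta_{c k_i}$ underflows to zero in floating point.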

FOA © R. K. Belew - 00-09-21