CSE 291 LECTURE NOTES
February 17, 2005
ASYMPTOTIC LRT DISTRIBUTION
We'll prove the central result about likelihood ratio tests:
Theorem [Wilks, 1938]: Suppose (x1 ... xn) are iid with pdf
f(x|theta), where f satisfies regularity conditions. Suppose we
want to test the null hypothesis H0: theta = theta_0 versus H1: theta
=/= theta_0. Let theta hat be the MLE of theta.
Then under H0, as n tends to infinity, the distribution of
-2 log lambda(x1 ... xn), where lambda(x) = L(theta_0|x)/L(theta hat|x),
tends to the chi-squared distribution with one degree of freedom.
PROOF OF WILKS' THEOREM
Proof: Note that if H0 is true then theta hat must be close to
theta_0 for large n. We write the log likelihood function
l(theta,x) as a Taylor expansion around theta hat:
l(theta, x) = l(theta hat, x) + l'(theta hat, x)(theta - theta hat)
            + l"(theta hat, x)(theta - theta hat)^2/2! + ...
Note that l'(theta hat, x) = 0, since theta hat maximizes l, and that
-2 log lambda(x) = -2 l(theta_0, x) + 2 l(theta hat, x). Setting
theta = theta_0 in the expansion and dropping higher-order terms gives
-2 log lambda(x) approx= -2 l(theta hat, x) - 2 l"(theta hat, x)(theta_0 - theta hat)^2/2! + 2 l(theta hat, x)
= -l"(theta hat, x) (theta_0 - theta hat)^2
Note the possible error in Casella and Berger, where l"(theta hat, x)
is shown as a denominator.
Now remember that the Fisher information for a single observation is
I(theta) = var[l'(x,theta)] = E_theta[l'(x,theta)^2] = E_theta[-l"(x,theta)].
Consider -l"(theta hat, x)/n. This is the observed average
information for x. For large n it will be very close to its
expectation, which is I(theta_0).
Now consider the random variable (theta_0 - theta hat(x))^2. We know
from before that under H0, the distribution of theta hat - theta_0 tends
towards N(0, (n I(theta_0))^-1), so sqrt(n I(theta_0)) (theta hat - theta_0)
tends towards N(0, 1). Since -l"(theta hat, x)/n converges to I(theta_0),
Slutsky's theorem says that -l"(theta hat, x) (theta_0 - theta hat)^2 has
the same limiting distribution as n I(theta_0) (theta_0 - theta hat)^2,
which is the square of a standard normal, i.e. chi-squared with one
degree of freedom. QED
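To see the theorem in action, here is a small Monte Carlo sketch (my own
example, not from the notes: the Exponential(theta) model, n = 200, and
2000 replicates are arbitrary choices). It simulates -2 log lambda under
H0: theta = 1 and checks the rejection rate at the 95% point of the
chi-squared distribution with one degree of freedom, which is about 3.841.

```python
import numpy as np

rng = np.random.default_rng(0)

def wilks_stat(x, theta0=1.0):
    """-2 log lambda for H0: theta = theta0 in an Exponential(theta)
    model, where f(x|theta) = theta * exp(-theta x)."""
    n, s = len(x), x.sum()
    theta_hat = n / s                               # MLE of the rate theta
    ll_hat = n * np.log(theta_hat) - theta_hat * s  # l(theta hat, x)
    ll_0 = n * np.log(theta0) - theta0 * s          # l(theta_0, x)
    return 2.0 * (ll_hat - ll_0)

# Simulate under H0 and compare the rejection rate at the chi^2_1
# 95% point (about 3.841) with the nominal level 0.05.
stats = np.array([wilks_stat(rng.exponential(1.0, size=200))
                  for _ in range(2000)])
reject_rate = (stats > 3.841).mean()
print(round(reject_rate, 3))   # should be close to 0.05
```

Note that the statistic is always nonnegative, because theta hat
maximizes the log likelihood.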
THE MORE GENERAL RESULT
Let the null hypothesis be Omega and the alternative be Theta -
Omega. Then the distribution of 2 log lambda(x) converges to a
chi-squared distribution with k degrees of freedom, where k is the
number of dimensions of Theta that are fixed in Omega.
Usually, Theta contains an open subset in R^p and Omega contains an
open subset in R^q. Then k = p-q.
The proof of this result involves matrix algebra. See page 114
of Silvey, and http://www.math.umd.edu/~evs/s701/WilksThm.pdf
by Prof. Eric Slud.
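The k = p - q counting can also be checked by simulation. The sketch
below is my own illustration, not from Silvey or Slud: it tests
H0: mu = 0 in a N(mu, sigma^2) model with sigma^2 unknown, so Theta is
2-dimensional, Omega is 1-dimensional, and k = 2 - 1 = 1; -2 log lambda
should again look chi-squared with one degree of freedom.

```python
import numpy as np

rng = np.random.default_rng(1)

def wilks_stat(x):
    """-2 log lambda for H0: mu = 0, sigma^2 unknown, in a N(mu, sigma^2)
    model.  The two maximized log likelihoods differ only through the two
    variance estimates, so -2 log lambda = n * log(s2_0 / s2_hat)."""
    n = len(x)
    s2_hat = np.var(x)          # MLE of sigma^2 under H1 (mu free)
    s2_0 = np.mean(x ** 2)      # MLE of sigma^2 under H0 (mu = 0)
    return n * np.log(s2_0 / s2_hat)

stats = np.array([wilks_stat(rng.normal(0.0, 2.0, size=100))
                  for _ in range(2000)])
reject_rate = (stats > 3.841).mean()   # chi^2_1 95% point
print(round(reject_rate, 3))           # should be close to 0.05
```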
THE CHI-SQUARED TEST
Suppose we have n trials, each resulting in one of k alternative
outcomes. Our observed data are the counts x_1 through x_k of each
outcome. Our null hypothesis is a restricted family of multinomial
distributions: H0: pi = pi(theta) for some theta in Omega. The
alternative hypothesis is that the cell probabilities pi are
unrestricted.
The conventional test statistic is
SUM_i (o_i - e_i)^2/e_i
where o_i is the observed count in cell i, and e_i is the expected count
in cell i. "Expected" means expected if the null hypothesis is true.
The test is a version of an LRT, and if H0 is true the test statistic
approximately follows a chi-squared distribution. Doing maximum
likelihood under H1, we have k-1 free parameters to choose, since the k
cell probabilities must sum to one.
Different null hypotheses involve different numbers of free parameters.
Say this number is p. The number of degrees of freedom for the
chi-squared distribution is k-p-1, i.e. #cells - #parameters - 1.
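As a concrete instance (the die-roll counts below are made up for
illustration), testing a six-sided die for fairness fully specifies the
cell probabilities, so p = 0 and there are k - p - 1 = 6 - 0 - 1 = 5
degrees of freedom:

```python
# Hypothetical counts from 300 rolls of a die; H0 is "all six faces
# are equally likely", which has p = 0 free parameters.
observed = [43, 52, 38, 61, 55, 51]               # o_i
n = sum(observed)
expected = [n / 6.0] * 6                          # e_i = 50 under H0

chi2_stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 0 - 1                        # cells - parameters - 1
print(round(chi2_stat, 2), df)                    # -> 6.88 5
```

Since 6.88 is well below the chi-squared_5 95% point (about 11.07), H0
would not be rejected at the 5% level for these counts.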
CHI-SQUARED TESTS FOR 2D TABLES
A 2D table of counts has m rows and n columns. Let xij be the
count in row i, column j. Some useful notation: let xi. mean
SUM_j xij, let x.j mean SUM_i xij, and let x.. mean SUM_i SUM_j xij.
Given a single such table of counts, there are different chi-squared
tests for different null hypotheses. You need to think carefully
about the substance of the question being asked in order to choose the
right null hypothesis, and the right alternative hypothesis. Each
hypothesis always involves one or more
multinomial distributions, and one or more equalities involving the
parameters of these multinomial distributions.
Example 1: Let H0 be that the distribution in each row is the
same. Suppose the total count in each row is fixed, i.e.
non-random. This particular test is useful for comparing patient
outcomes under two different treatments, for example.
Here, H0 is that pij = pj for all i. H0 has n-1 free parameters,
while H1 has m(n-1) free parameters, so the test has
m(n-1) - (n-1) = (m-1)(n-1) degrees of freedom. Under H0, the ML
parameter estimate is phatj = x.j / x.. Therefore eij = phatj xi. =
(x.j/x..)xi.
Example 2: Let H0 be that rows are independent of columns.
In other words, which row an outcome falls into is independent of which
column it falls into. This test is useful, for example, for
testing whether grad school admission is independent of gender.
Here, H0 is that pij = pi qj, while H1 involves the maximum freedom,
i.e. a single unrestricted multinomial. Under H0, phati = xi./x.. and
qhatj = x.j/x.., so eij = (xi. x.j)/x.., the same expected counts as
before. The parameter counts differ from Example 1, but the degrees of
freedom come out the same: H1 has mn-1 free parameters and H0 has
(m-1)+(n-1), so k = (mn-1) - (m-1) - (n-1) = (m-1)(n-1).
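Both examples use the same expected counts eij = xi. x.j / x... Here is
a minimal sketch for a made-up 2x3 table (the counts are mine, purely
for illustration), computing eij, the test statistic, and the
(m-1)(n-1) degrees of freedom:

```python
# Hypothetical 2x3 table of counts x[i][j].
x = [[30, 20, 10],
     [20, 30, 40]]
m, n = len(x), len(x[0])
row = [sum(r) for r in x]                                  # xi.
col = [sum(x[i][j] for i in range(m)) for j in range(n)]   # x.j
total = sum(row)                                           # x..

# eij = xi. * x.j / x.. under H0 (rows independent of columns)
e = [[row[i] * col[j] / total for j in range(n)] for i in range(m)]
chi2_stat = sum((x[i][j] - e[i][j]) ** 2 / e[i][j]
                for i in range(m) for j in range(n))
df = (m - 1) * (n - 1)        # = (mn - 1) - [(m - 1) + (n - 1)]
print(round(chi2_stat, 3), df)                             # -> 16.667 2
```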
THE CHI-SQUARED TEST IS A LIKELIHOOD RATIO TEST
Now I'll show how chi-squared tests arise as an approximation of
likelihood ratio tests for multinomial distributions. This lecture is
based on Lecture 9 of a course for second-year undergraduates taught by
Prof. Richard Weber of Cambridge University. All his teaching materials
are highly recommended.
It turns out that whatever the hypotheses under consideration, the LRT
test statistic -2 log lambda always has the same form
2 SUM_i o_i log (o_i/e_i)
where o_i is the observed count in cell i and e_i is the corresponding
expected count assuming H0 is true. The conventional statistic
SUM_i (o_i - e_i)^2/e_i
is a quadratic approximation to the exact LRT statistic.
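To see how close the two statistics are numerically, here is a
comparison on made-up counts (my example, not Weber's), using a fair-die
null with e_i = 50 in every cell; the LRT form carries the factor of 2
from -2 log lambda:

```python
import math

# Hypothetical counts from 300 rolls; e_i = 50 under a fair-die H0.
observed = [43, 52, 38, 61, 55, 51]
expected = [50.0] * 6

# Exact LRT statistic: -2 log lambda = 2 * SUM o_i log(o_i / e_i)
g_stat = 2.0 * sum(o * math.log(o / e) for o, e in zip(observed, expected))
# Conventional Pearson statistic: SUM (o_i - e_i)^2 / e_i
pearson = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(g_stat, 3), round(pearson, 3))   # the two values are close
```

The agreement improves as the counts grow, since the quadratic
approximation is taken around o_i = e_i.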
Let's do the one-dimensional case.