CSE 291 LECTURE NOTES
February 17, 2005
ASYMPTOTIC LRT DISTRIBUTION
We'll prove the central result about likelihood ratio tests:
Theorem [Wilks, 1938]: Suppose (x1 ... xn) are iid with pdf
f(x|theta), where f satisfies regularity conditions. Suppose we
want to test the null hypothesis H0: theta = theta_0 versus H1: theta
=/= theta_0. Let theta hat be the MLE of theta.
Then under H0, as n tends to infinity, the distribution of
-2 log lambda(x1 ... xn), where lambda(x) = L(theta_0|x)/L(theta hat|x),
tends to the chi-squared distribution with one degree of freedom.
PROOF OF WILKS' THEOREM
Proof: Note that if H0 is true then theta hat must be close to
theta_0 for large n. We write the log likelihood function
l(theta,x) as a Taylor expansion around theta hat:
l(theta, x) = l(theta hat, x) + l'(theta hat, x)(theta - theta hat)
            + l"(theta hat, x)(theta - theta hat)^2/2! + ...
Note that l'(theta hat, x) = 0, since theta hat maximizes l, and that
-2 log lambda(x) = -2 l(theta_0, x) + 2 l(theta hat, x). Setting
theta = theta_0 in the expansion and dropping higher-order terms gives
-2 log lambda(x) approx= -2 l(theta hat, x) - 2 l"(theta hat, x)(theta_0 - theta hat)^2/2! + 2 l(theta hat, x)
= -l"(theta hat, x) (theta_0 - theta hat)^2
Note the possible error in Casella and Berger, where l"(theta hat, x)
is shown as a denominator.
Now remember that the Fisher information for a single observation is
I(theta) = var[l'(x,theta)] = E_theta[l'(x,theta)^2] = E_theta[-l"(x,theta)].
Consider -l"(theta hat, x)/n. This is the observed average
information for x. For large n it will be very close to its
expectation, which is I(theta_0).
Now consider the random variable (theta_0 - theta hat(x))^2. We know
from before that under H0, the distribution of theta hat - theta_0 tends
towards N(0, (n I(theta_0))^-1), so sqrt(n I(theta_0)) (theta hat - theta_0)
tends towards N(0, 1). Since -l"(theta hat, x)/n converges to I(theta_0),
Slutsky's theorem says that -l"(theta hat, x) (theta_0 - theta hat)^2 has
the same limiting distribution as n I(theta_0) (theta_0 - theta hat)^2,
which is the square of a standard normal, i.e. chi-squared with one
degree of freedom. QED
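To see the theorem in action, here is a small Monte Carlo sketch (my own
example, not from the notes: the Exponential(theta) model, n = 200, and
2000 replicates are arbitrary choices). It simulates -2 log lambda under
H0: theta = 1 and checks the rejection rate at the 95% point of the
chi-squared distribution with one degree of freedom, which is about 3.841.

```python
import numpy as np

rng = np.random.default_rng(0)

def wilks_stat(x, theta0=1.0):
    """-2 log lambda for H0: theta = theta0 in an Exponential(theta)
    model, where f(x|theta) = theta * exp(-theta x)."""
    n, s = len(x), x.sum()
    theta_hat = n / s                               # MLE of the rate theta
    ll_hat = n * np.log(theta_hat) - theta_hat * s  # l(theta hat, x)
    ll_0 = n * np.log(theta0) - theta0 * s          # l(theta_0, x)
    return 2.0 * (ll_hat - ll_0)

# Simulate under H0 and compare the rejection rate at the chi^2_1
# 95% point (about 3.841) with the nominal level 0.05.
stats = np.array([wilks_stat(rng.exponential(1.0, size=200))
                  for _ in range(2000)])
reject_rate = (stats > 3.841).mean()
print(round(reject_rate, 3))   # should be close to 0.05
```

Note that the statistic is always nonnegative, because theta hat
maximizes the log likelihood.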
THE MORE GENERAL RESULT
Let the null hypothesis be Omega and the alternative be Theta -
Omega. Then the distribution of 2 log lambda(x) converges to a
chi-squared distribution with k degrees of freedom, where k is the
number of dimensions of Theta that are fixed in Omega.
Usually, Theta contains an open subset in R^p and Omega contains an
open subset in R^q. Then k = p-q.
The proof of this result involves matrix algebra. See page 114
of Silvey, and http://www.math.umd.edu/~evs/s701/WilksThm.pdf
by Prof. Eric Slud.
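The k = p - q counting can also be checked by simulation. The sketch
below is my own illustration, not from Silvey or Slud: it tests
H0: mu = 0 in a N(mu, sigma^2) model with sigma^2 unknown, so Theta is
2-dimensional, Omega is 1-dimensional, and k = 2 - 1 = 1; -2 log lambda
should again look chi-squared with one degree of freedom.

```python
import numpy as np

rng = np.random.default_rng(1)

def wilks_stat(x):
    """-2 log lambda for H0: mu = 0, sigma^2 unknown, in a N(mu, sigma^2)
    model.  The two maximized log likelihoods differ only through the two
    variance estimates, so -2 log lambda = n * log(s2_0 / s2_hat)."""
    n = len(x)
    s2_hat = np.var(x)          # MLE of sigma^2 under H1 (mu free)
    s2_0 = np.mean(x ** 2)      # MLE of sigma^2 under H0 (mu = 0)
    return n * np.log(s2_0 / s2_hat)

stats = np.array([wilks_stat(rng.normal(0.0, 2.0, size=100))
                  for _ in range(2000)])
reject_rate = (stats > 3.841).mean()   # chi^2_1 95% point
print(round(reject_rate, 3))           # should be close to 0.05
```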
THE CHI-SQUARED TEST
Suppose we have n trials, each resulting in one of k alternative
outcomes. Our observed data are the counts x_1 through x_k of each
outcome. Our null hypothesis is a restricted family of multinomial
distributions: H0: pi = pi(theta) for some theta in Omega. The
alternative hypothesis is that the cell probabilities pi are
unrestricted.
The conventional test statistic is
SUM_i (o_i - e_i)^2/e_i
where o_i is the observed count in cell i, and e_i is the expected count
in cell i. "Expected" means expected if the null hypothesis is true.
The test is a version of an LRT, and if H0 is true the test statistic
approximately follows a chi-squared distribution. Doing maximum
likelihood under H1, we have k-1 free parameters to choose, since the k
cell probabilities must sum to one.
Different null hypotheses involve different numbers of free parameters.
Say this number is p. The number of degrees of freedom for the
chi-squared distribution is k-p-1, i.e. #cells - #parameters - 1.
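As a concrete instance (the die-roll counts below are made up for
illustration), testing a six-sided die for fairness fully specifies the
cell probabilities, so p = 0 and there are k - p - 1 = 6 - 0 - 1 = 5
degrees of freedom:

```python
# Hypothetical counts from 300 rolls of a die; H0 is "all six faces
# are equally likely", which has p = 0 free parameters.
observed = [43, 52, 38, 61, 55, 51]               # o_i
n = sum(observed)
expected = [n / 6.0] * 6                          # e_i = 50 under H0

chi2_stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 0 - 1                        # cells - parameters - 1
print(round(chi2_stat, 2), df)                    # -> 6.88 5
```

Since 6.88 is well below the chi-squared_5 95% point (about 11.07), H0
would not be rejected at the 5% level for these counts.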
CHI-SQUARED TESTS FOR 2D TABLES
A 2D table of counts has m rows and n columns. Let xij be the
count in row i, column j. Some useful notation: let xi. mean
SUM_j xij, let x.j mean SUM_i xij, and let x.. mean SUM_i SUM_j xij.
Given a single such table of counts, there are different chi-squared
tests for different null hypotheses. You need to think carefully
about the substance of the question being asked in order to choose the
right null hypothesis, and the right alternative hypothesis. Each
hypothesis always involves one or more
multinomial distributions, and one or more equalities involving the
parameters of these multinomial distributions.
Example 1: Let H0 be that the distribution in each row is the
same. Suppose the total count in each row is fixed, i.e.
non-random. This particular test is useful for comparing patient
outcomes under two different treatments, for example.
Here, H0 is that pij = pj for all i. H0 has n-1 free parameters,
while H1 has m(n-1) free parameters, so the test has
m(n-1) - (n-1) = (m-1)(n-1) degrees of freedom. Under H0, the ML
parameter estimate is phatj = x.j / x.. Therefore eij = phatj xi. =
(x.j/x..)xi.
Example 2: Let H0 be that rows are independent of columns.
In other words, which row an outcome falls into is independent of which
column it falls into. This test is useful, for example, for
testing whether grad school admission is independent of gender.
Here, H0 is that pij = pi qj, while H1 involves the maximum freedom,
i.e. a single unrestricted multinomial. Under H0, phati = xi./x.. and
qhatj = x.j/x.., so eij = (xi. x.j)/x.., the same expected counts as
before. The parameter counts differ from Example 1, but the degrees of
freedom come out the same: H1 has mn-1 free parameters and H0 has
(m-1)+(n-1), so k = (mn-1) - (m-1) - (n-1) = (m-1)(n-1).
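Both examples use the same expected counts eij = xi. x.j / x... Here is
a minimal sketch for a made-up 2x3 table (the counts are mine, purely
for illustration), computing eij, the test statistic, and the
(m-1)(n-1) degrees of freedom:

```python
# Hypothetical 2x3 table of counts x[i][j].
x = [[30, 20, 10],
     [20, 30, 40]]
m, n = len(x), len(x[0])
row = [sum(r) for r in x]                                  # xi.
col = [sum(x[i][j] for i in range(m)) for j in range(n)]   # x.j
total = sum(row)                                           # x..

# eij = xi. * x.j / x.. under H0 (rows independent of columns)
e = [[row[i] * col[j] / total for j in range(n)] for i in range(m)]
chi2_stat = sum((x[i][j] - e[i][j]) ** 2 / e[i][j]
                for i in range(m) for j in range(n))
df = (m - 1) * (n - 1)        # = (mn - 1) - [(m - 1) + (n - 1)]
print(round(chi2_stat, 3), df)                             # -> 16.667 2
```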
THE CHI-SQUARED TEST IS A LIKELIHOOD RATIO TEST
Now I'll show how chi-squared tests arise as an approximation of
likelihood ratio tests for multinomial distributions. This lecture is
based on Lecture 9 of a course for second-year undergraduates taught by
Prof. Richard Weber of Cambridge University. All his teaching materials
are highly recommended.
It turns out that whatever the hypotheses under consideration, the LRT
test statistic -2 log lambda always has the same form
2 SUM_i o_i log (o_i/e_i)
where o_i is the observed count in cell i and e_i is the corresponding
expected count assuming H0 is true. The conventional statistic
SUM_i (o_i - e_i)^2/e_i
is a quadratic approximation to the exact LRT statistic.
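To see how close the two statistics are numerically, here is a
comparison on made-up counts (my example, not Weber's), using a fair-die
null with e_i = 50 in every cell; the LRT form carries the factor of 2
from -2 log lambda:

```python
import math

# Hypothetical counts from 300 rolls; e_i = 50 under a fair-die H0.
observed = [43, 52, 38, 61, 55, 51]
expected = [50.0] * 6

# Exact LRT statistic: -2 log lambda = 2 * SUM o_i log(o_i / e_i)
g_stat = 2.0 * sum(o * math.log(o / e) for o, e in zip(observed, expected))
# Conventional Pearson statistic: SUM (o_i - e_i)^2 / e_i
pearson = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(g_stat, 3), round(pearson, 3))   # the two values are close
```

The agreement improves as the counts grow, since the quadratic
approximation is taken around o_i = e_i.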
Let's do the one-dimensional case.