CSE 291 LECTURE NOTES

January 29, 2004
 
 

ANNOUNCEMENTS

Thanks for handing in the current assignment today.
 
 

USING LIKELIHOODS TO TEST HYPOTHESES

Suppose we have a family P_theta of possible probability distributions, where theta is in Theta.  Let's say we have a null hypothesis that is a subset Omega of Theta.

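Idea:  Given x, find the best-guess distribution inside Theta and also inside Omega.  Each of these gives a maximum likelihood.  Look at the ratio for Theta over Omega:  lambda(x) = [ sup_theta in Theta p(x,theta) ] / [ sup_theta in Omega p(x,theta) ].  Because Omega is a subset of Theta, this ratio satisfies lambda(x) >= 1.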

We make decisions using a threshold k.  We reject the null hypothesis Omega if and only if lambda(x) > k.  We choose k so that

sup_theta in Omega P_theta(lambda(x) > k)  =  alpha
where alpha is called the significance level of the test.

Notes:

  1. Sometimes the null hypothesis Omega is a single point, e.g. mean = 0.
  2. We use sup instead of max above because the set Omega may be open, so it has a supremum but no maximum.
  3. If we have a sufficient statistic t(x), we can compute the likelihood ratio using just this, without needing x itself.
  4. Often, lambda(x) is an increasing function of t(x), so lambda(x) > k iff  t(x) > k'.
  5. If x has discrete values only, we may not be able to get exact equality.
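
As an illustration of notes 1, 3 and 4, here is a minimal sketch for a model that is an assumption of this example, not part of the lecture: iid samples from N(mu, 1) with null hypothesis mu = 0.  In that model lambda(x) = exp(n xbar^2 / 2), which is an increasing function of t(x) = |xbar|, so the threshold k on lambda(x) corresponds to a threshold k' on |xbar|.  The function name lrt_normal_mean is hypothetical.

```python
import math
from statistics import NormalDist

def lrt_normal_mean(xs, alpha=0.05):
    """LRT of H0: mu = 0 for iid N(mu, 1) data (illustrative model only)."""
    n = len(xs)
    xbar = sum(xs) / n                      # sufficient statistic
    lam = math.exp(0.5 * n * xbar ** 2)     # sup over Theta / sup over Omega
    # Under H0, sqrt(n) * xbar ~ N(0, 1); pick k' with P(|xbar| > k') = alpha.
    k_prime = NormalDist().inv_cdf(1 - alpha / 2) / math.sqrt(n)
    k = math.exp(0.5 * n * k_prime ** 2)    # matching threshold on lambda
    return lam, k, abs(xbar), k_prime

lam, k, t, k_prime = lrt_normal_mean([0.5, -0.2, 1.1, 0.9])
reject = lam > k   # the same decision as t > k_prime
```

Because the null hypothesis is a single point here, the sup over Omega in the definition of alpha is just the probability at mu = 0.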

   

T-TEST EXAMPLE

Suppose an observation x is (y,z) where y = (y1 ... yn) and z = (z1 ... zn) are iid samples from N(mu1, sigma^2) and N(mu2,sigma^2) respectively.  The null hypothesis is that mu1 = mu2.

Note:  We don't care what sigma^2 is, but different values for it give different tests.  A parameter of this sort is called a nuisance parameter.

Here, the likelihood function is

p(x,theta)  =  (2pi)^-n sigma^-2n exp [ -0.5 sigma^-2 { SUM (yi - mu1)^2 + SUM (zi - mu2)^2 } ]
The supremum of this function over the unrestricted hypothesis space is attained at mu1 = ybar, mu2 = zbar, sigma^2 = sigmahat^2:
sup_theta in Theta p(x,theta)  =  (2pi)^-n sigmahat^-2n exp [ -0.5 sigmahat^-2 { SUM (yi - ybar)^2 + SUM (zi - zbar)^2 } ]  =  (2pi)^-n sigmahat^-2n exp(-n)
where
sigmahat^2  =  1/(2n) [ SUM (yi - ybar)^2 + SUM (zi - zbar)^2 ].
sigmahat^2 is called a pooled estimate of the variance.

Over the null hypothesis space, the supremum is

(2pi)^-n sigmadot^-2n exp(-n)
where
sigmadot^2  =  1/(2n) [ SUM (yi - mudot)^2 + SUM (zi - mudot)^2 ]
mudot = 0.5 (ybar + zbar)
Hence
lambda(x) =  [ sigmadot^2 / sigmahat^2 ]^n
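Now  sigmadot^2  =  sigmahat^2 + 0.25(ybar - zbar)^2, so  lambda(x) = [ 1 + 0.25(ybar - zbar)^2 / sigmahat^2 ]^n, which is an increasing function of | ybar - zbar | / sigmahat.  Hence lambda(x) > k if and only if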
| ybar - zbar | / sigmahat > k'
Note that sigmahat appears in the denominator, not sigmadot.
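
The quantities in this example can be checked numerically.  The following is a minimal sketch; the function name two_sample_lrt and the data values are made up for illustration.

```python
import math

def two_sample_lrt(y, z):
    """Return (sigmahat^2, sigmadot^2, lambda, |ybar - zbar| / sigmahat)."""
    n = len(y)
    if len(z) != n or n < 2:
        raise ValueError("y and z must have the same length n >= 2")
    ybar, zbar = sum(y) / n, sum(z) / n
    # Pooled variance estimate under the unrestricted hypothesis space
    sigmahat2 = (sum((v - ybar) ** 2 for v in y)
                 + sum((v - zbar) ** 2 for v in z)) / (2 * n)
    # Variance estimate under the null hypothesis mu1 = mu2 = mudot
    mudot = 0.5 * (ybar + zbar)
    sigmadot2 = (sum((v - mudot) ** 2 for v in y)
                 + sum((v - mudot) ** 2 for v in z)) / (2 * n)
    lam = (sigmadot2 / sigmahat2) ** n
    stat = abs(ybar - zbar) / math.sqrt(sigmahat2)
    return sigmahat2, sigmadot2, lam, stat

y = [1.0, 2.0, 3.0]
z = [2.0, 4.0, 6.0]
sigmahat2, sigmadot2, lam, stat = two_sample_lrt(y, z)
# Identity from the notes: sigmadot^2 = sigmahat^2 + 0.25 (ybar - zbar)^2
gap = sigmadot2 - (sigmahat2 + 0.25 * (2.0 - 4.0) ** 2)
```

Here gap should be zero up to rounding, and lam should agree with (1 + stat^2 / 4)^n.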

To determine k' for a given significance level alpha, we need to find k' such that  sup_theta in Omega P_theta( | ybar - zbar | / sigmahat > k' )  =  alpha.

To do this, we need to know the distribution of  | ybar - zbar | / sigmahat.  Working this out is difficult, but fortunately we can work out what its distribution tends to, as n tends to infinity.
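
Before turning to asymptotics, the threshold can also be approximated by simulation.  The sketch below is a Monte Carlo estimate, not a method from the lecture.  It relies on the fact that, under the null hypothesis, the statistic is unchanged when a constant is added to every observation or when every observation is multiplied by a positive constant, so its distribution does not depend on the common mean or on sigma and we may simulate with mean 0 and sigma = 1.  The function names, the seed and the number of replications are arbitrary choices.

```python
import math
import random

def statistic(y, z):
    """|ybar - zbar| / sigmahat for two samples of equal size n >= 2."""
    n = len(y)
    ybar, zbar = sum(y) / n, sum(z) / n
    ss = sum((v - ybar) ** 2 for v in y) + sum((v - zbar) ** 2 for v in z)
    return abs(ybar - zbar) / math.sqrt(ss / (2 * n))

def monte_carlo_threshold(n, alpha=0.05, reps=20000, seed=0):
    """Empirical (1 - alpha) quantile of the statistic under H0."""
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        draws.append(statistic(y, z))
    draws.sort()
    return draws[math.ceil((1 - alpha) * reps) - 1]

threshold = monte_carlo_threshold(n=10, reps=5000)
```

The estimate is subject to simulation error, which shrinks as reps grows.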


ASYMPTOTIC LRT DISTRIBUTION

We'll prove the central result about likelihood ratio tests:

Theorem [Wilks, 1938]:  Suppose (x1 ... xn) are iid with pdf f(x|theta), where f satisfies regularity conditions.  Suppose we want to test the null hypothesis H0: theta = theta_0 versus H1: theta =/= theta_0.  Let theta hat be the MLE of theta.

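Then under H0, as n tends to infinity, the distribution of -2 log lambda(x1 ... xn) tends to the chi-squared distribution with one degree of freedom.  (In this theorem lambda denotes the likelihood at theta_0 divided by the likelihood at theta hat, which is the reciprocal of the ratio defined earlier, so -2 log lambda >= 0.)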

First we need a result about probability distributions:

Theorem [Slutsky]:  If X_n tends to X in distribution, and Y_n tends to a constant b in probability, then

X_n + Y_n tends to X + b in distribution, and
X_n * Y_n tends to bX in distribution.
Proof:  Omitted.

Intuitively, Slutsky's theorem says that the influence of Y_n on X_n is that of a constant, if Y_n tends to a constant.

Example:  Suppose sqrt(n)(X bar_n - mu)/sigma is asymptotically N(0,1), but the true variance sigma^2 is unknown.  Let S^2_n be our estimator of the variance, and suppose S^2_n tends to sigma^2 in probability (for example, because it is unbiased and its variance tends to zero).  Then sigma/S_n tends to 1 in probability, so the theorem says that sqrt(n)(X bar_n - mu)/S_n is also asymptotically N(0,1).
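
A small simulation can make this example concrete.  The sketch below assumes Uniform(0,1) data (mean 0.5), which is a choice made only for illustration, and takes S^2_n to be the usual sample variance.  It estimates how often the studentized mean exceeds 1.96 in absolute value.  If the N(0,1) approximation is good, the fraction should be near 0.05.

```python
import math
import random

def studentized_mean(xs, mu):
    """sqrt(n) (xbar - mu) / S_n, with S_n^2 the sample variance."""
    n = len(xs)
    xbar = sum(xs) / n
    s2 = sum((v - xbar) ** 2 for v in xs) / (n - 1)
    return math.sqrt(n) * (xbar - mu) / math.sqrt(s2)

def tail_fraction(n=200, reps=4000, seed=1):
    """Fraction of simulated samples with |studentized mean| > 1.96."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xs = [rng.random() for _ in range(n)]   # Uniform(0, 1), mean 0.5
        if abs(studentized_mean(xs, 0.5)) > 1.96:
            hits += 1
    return hits / reps

fraction = tail_fraction()
```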
 
 

PROOF OF WILKS' THEOREM

Theorem [Wilks, 1938]:  Suppose (x1 ... xn) are iid with pdf f(x|theta), where f satisfies regularity conditions.  Suppose we want to test the null hypothesis H0: theta = theta_0 versus H1: theta =/= theta_0.  Let theta hat be the MLE of theta.

Then under H0, as n tends to infinity, the distribution of -2 log lambda(x1 ... xn) tends to the chi-squared distribution with one degree of freedom.

Proof:  Note that if H0 is true then theta hat must be close to theta_0 for large n.  We write the log likelihood function l(theta,x) as a Taylor expansion around theta hat:

l(theta,x)  =  l(theta hat, x) + l'(theta hat, x)(theta - theta hat) + l"(theta hat, x) (theta - theta hat)^2/2! + ...
Note that  l'(theta hat, x) = 0  and that  -2 log lambda(x) = -2 l(theta_0, x) + 2 l(theta hat, x).  We have
-2 log lambda(x) ~=~ -2 l(theta hat, x) - 2l"(theta hat, x) (theta_0 - theta hat)^2/2!  + 2 l(theta hat, x)
                                  =  - l"(theta hat, x) (theta_0 - theta hat)^2
Note the possible error in Casella and Berger, where l"(theta hat, x) is shown as a denominator.
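
The quality of this quadratic approximation can be checked on a concrete model.  The sketch below assumes iid Poisson(theta) data, a model chosen here only for illustration, for which theta hat = xbar and l"(theta hat, x) = -n/xbar.  The function name and the data are made up.

```python
import math

def poisson_lrt_terms(xs, theta0):
    """Return (exact -2 log lambda, quadratic approximation) for Poisson data."""
    n = len(xs)
    s = sum(xs)                  # must be positive for the logs below
    theta_hat = s / n            # MLE of the Poisson mean

    def loglik(theta):
        # Log likelihood up to the constant -SUM log(x_i!), which cancels
        return s * math.log(theta) - n * theta

    exact = -2 * loglik(theta0) + 2 * loglik(theta_hat)
    second_deriv = -s / theta_hat ** 2          # l''(theta hat, x) = -n / xbar
    approx = -second_deriv * (theta0 - theta_hat) ** 2
    return exact, approx

xs = [3, 5, 4, 6, 2, 4, 5, 3, 4, 4]
exact, approx = poisson_lrt_terms(xs, theta0=4.5)
```

The two values differ by the omitted higher-order Taylor terms, which become negligible as theta hat approaches theta_0.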

Now remember that  I_theta  =  var[l'(x,theta)]  =  E_theta [ l'(x,theta)^2 ] = E_theta[- l"(x,theta)].

Consider -l"(theta hat, x)/n.  This is the observed average information for x.  For large n it will be very close to its expectation, which is I(theta_0).

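Now consider the random variable (theta_0 - theta hat(x))^2.  We know from before that under H0, the distribution of sqrt(n)(theta hat - theta_0) tends towards N(0, I(theta_0)^-1).  So the distribution of  n I(theta_0) (theta_0 - theta hat)^2  tends towards the chi-squared distribution with one degree of freedom.

Finally, write  - l"(theta hat, x) (theta_0 - theta hat)^2  =  [ -l"(theta hat, x)/n ] * [ n (theta_0 - theta hat)^2 ].  The first factor tends to the constant I(theta_0) in probability, so Slutsky's theorem says that the distribution of the product tends towards the distribution of  I(theta_0) * n (theta_0 - theta hat)^2, which is chi-squared with one degree of freedom.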
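
The theorem can be checked by simulation.  The sketch below assumes the exponential model f(x|theta) = theta exp(-theta x), chosen here only for illustration.  In that model theta hat = 1/xbar and -2 log lambda = 2n [ u - 1 - log u ] with u = theta_0 * xbar.  The critical value of the chi-squared distribution with one degree of freedom is computed as the square of a standard normal quantile.  The function names, seed and sample sizes are arbitrary.

```python
import math
import random
from statistics import NormalDist

def neg2_log_lambda(xs, theta0):
    """-2 log lambda for H0: theta = theta0 in the exponential model."""
    n = len(xs)
    u = theta0 * sum(xs) / n     # theta0 / theta_hat, since theta_hat = 1/xbar
    return 2 * n * (u - 1 - math.log(u))

def rejection_rate(theta0=2.0, n=100, reps=5000, alpha=0.05, seed=2):
    """Fraction of samples drawn under H0 with -2 log lambda above the cutoff."""
    rng = random.Random(seed)
    # chi-squared(1) is the square of N(0,1), so its upper-alpha point is z^2
    crit = NormalDist().inv_cdf(1 - alpha / 2) ** 2
    hits = 0
    for _ in range(reps):
        xs = [rng.expovariate(theta0) for _ in range(n)]
        if neg2_log_lambda(xs, theta0) > crit:
            hits += 1
    return hits / reps

rate = rejection_rate()
```

If the chi-squared approximation is good for this n, rate should be close to alpha.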
 
 

THE MORE GENERAL RESULT

Let the null hypothesis be Omega and the alternative be Theta - Omega.  Then the distribution of -2 log lambda(x) converges to a chi-squared distribution with degrees of freedom k, where k is the number of dimensions of Theta that are fixed in Omega.

Usually, Theta contains an open subset in R^p and Omega contains an open subset in R^q.  Then k = p-q.
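
As a hedged illustration, consider the t-test example again.  On one reading, Theta has p = 3 free parameters (mu1, mu2, sigma^2) and Omega has q = 2 (the common mean and sigma^2), so k = 1.  The sketch below simulates under the null hypothesis and counts how often the statistic 2n log(sigmadot^2 / sigmahat^2) exceeds the upper 5% point of the chi-squared distribution with one degree of freedom.  This statistic equals -2 log lambda when lambda is taken as null over unrestricted.  The function names, seed and sample sizes are arbitrary.

```python
import math
import random
from statistics import NormalDist

def wilks_statistic(y, z):
    """2n log(sigmadot^2 / sigmahat^2) for two samples of equal size n."""
    n = len(y)
    ybar, zbar = sum(y) / n, sum(z) / n
    sigmahat2 = (sum((v - ybar) ** 2 for v in y)
                 + sum((v - zbar) ** 2 for v in z)) / (2 * n)
    # Identity from the t-test example
    sigmadot2 = sigmahat2 + 0.25 * (ybar - zbar) ** 2
    return 2 * n * math.log(sigmadot2 / sigmahat2)

def null_rejection_rate(n=50, reps=4000, alpha=0.05, seed=3):
    """Fraction of null samples whose statistic exceeds the chi-squared(1) cutoff."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha / 2) ** 2   # chi-squared(1) cutoff
    hits = 0
    for _ in range(reps):
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if wilks_statistic(y, z) > crit:
            hits += 1
    return hits / reps

null_rate = null_rejection_rate()
```

For finite n the rate need not equal alpha exactly, since the chi-squared distribution is only the limit.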

The proof of this result involves matrix algebra.  See page 114 of Silvey, and http://www.math.umd.edu/~evs/s701/WilksThm.pdf by Prof. Eric Slud.