Idea: Given x, find the best-guess distribution inside Theta and also inside Omega. Each of these gives a maximum likelihood. Look at the ratio for Theta over Omega. We call this ratio lambda(x); since Omega is contained in Theta, lambda(x) >= 1.
We make decisions using a threshold k. We reject the null hypothesis Omega if and only if lambda(x) > k. We choose k so that
sup_theta in Omega P_theta(lambda(x) > k) = alpha
where alpha is called the significance level of the test.
Notes:
Note: We don't care what sigma^2 is, but different values for it give different tests. A parameter of this sort is called a nuisance parameter.
Here, the likelihood function is
p(x,theta) = (2pi)^-n sigma^-2n exp [ -0.5 sigma^-2 { SUM (yi - mu1)^2 + SUM (zi - mu2)^2 } ]
The supremum of this function over the unrestricted hypothesis space is
sup_theta in Theta p(x,theta) = (2pi)^-n sigmahat^-2n exp [ -0.5 sigmahat^-2 { SUM (yi - ybar)^2 + SUM (zi - zbar)^2 } ] = (2pi)^-n sigmahat^-2n exp(-n)
where
sigmahat^2 = 1/(2n) [ SUM (yi - ybar)^2 + SUM (zi - zbar)^2 ].
sigmahat^2 is called a pooled estimate of the variance.
Over the null hypothesis space, the supremum is
(2pi)^-n sigmadot^-2n exp(-n)
where
sigmadot^2 = 1/(2n) [ SUM (yi - mudot)^2 + SUM (zi - mudot)^2 ]
mudot = 0.5 (ybar + zbar)
Hence
lambda(x) = [ sigmadot^2 / sigmahat^2 ]^n
Now sigmadot^2 = sigmahat^2 + 0.25(ybar - zbar)^2, so lambda(x) > k if and only if
| ybar - zbar | / sigmahat > k'
Note that sigmahat appears in the denominator, not sigmadot.
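The algebra above can be checked numerically. The following is a minimal sketch, assuming two samples of equal size n drawn from normal distributions with a common variance; the function name, seed, sample size and parameter values are illustrative choices, not part of the notes.

```python
import random
from statistics import fmean

def two_sample_lr(y, z):
    """Return (lambda, sigmahat^2, sigmadot^2) for two samples of equal size n."""
    n = len(y)
    assert len(z) == n
    ybar, zbar = fmean(y), fmean(z)
    mudot = 0.5 * (ybar + zbar)  # common-mean estimate under the null
    # Pooled variance estimate over the unrestricted space
    sigmahat2 = (sum((v - ybar) ** 2 for v in y)
                 + sum((v - zbar) ** 2 for v in z)) / (2 * n)
    # Variance estimate over the null space (both samples centred at mudot)
    sigmadot2 = (sum((v - mudot) ** 2 for v in y)
                 + sum((v - mudot) ** 2 for v in z)) / (2 * n)
    lam = (sigmadot2 / sigmahat2) ** n
    return lam, sigmahat2, sigmadot2

rng = random.Random(0)                      # illustrative seed
y = [rng.gauss(1.0, 2.0) for _ in range(8)]
z = [rng.gauss(1.5, 2.0) for _ in range(8)]

lam, s_hat2, s_dot2 = two_sample_lr(y, z)
gap = fmean(y) - fmean(z)
identity_error = abs(s_dot2 - (s_hat2 + 0.25 * gap ** 2))

print(lam >= 1.0)             # the ratio for Theta over Omega is at least 1
print(identity_error < 1e-9)  # sigmadot^2 = sigmahat^2 + 0.25 (ybar - zbar)^2
```

Because lambda(x) depends on the data only through (ybar - zbar)^2 / sigmahat^2, and increases with it, thresholding lambda(x) at k is the same as thresholding | ybar - zbar | / sigmahat at some k'.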
To determine k we need to find k' such that
sup_theta in Omega P_theta( | ybar - zbar | / sigmahat > k' ) = alpha.
To do this, we need to know the distribution of | ybar - zbar | / sigmahat. Working this out is difficult, but fortunately we can work out what its distribution tends to, as n tends to infinity.
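For a fixed n, one alternative to working out the distribution is to approximate the threshold by simulation. The sketch below is an assumption-laden illustration, not the method of these notes: it draws both samples from N(0,1), which is enough because the statistic is unchanged when all observations are shifted or rescaled together, so its null distribution does not depend on the common mean or on sigma. The function names, n = 10, the seed and the number of replications are all illustrative.

```python
import random
from statistics import fmean

def statistic(y, z):
    """| ybar - zbar | / sigmahat for two samples of equal size n."""
    n = len(y)
    ybar, zbar = fmean(y), fmean(z)
    ss = sum((v - ybar) ** 2 for v in y) + sum((v - zbar) ** 2 for v in z)
    sigmahat = (ss / (2 * n)) ** 0.5
    return abs(ybar - zbar) / sigmahat

def simulate_threshold(n, alpha, reps=10000, seed=1):
    """Monte Carlo estimate of the (1 - alpha) quantile of the statistic under the null."""
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        draws.append(statistic(y, z))
    draws.sort()
    return draws[int((1 - alpha) * reps)]

k_prime = simulate_threshold(n=10, alpha=0.05)
print(k_prime)  # reject the null when the observed statistic exceeds this value
```

The estimate carries Monte Carlo error, so it should be read as an approximation to the threshold rather than its exact value.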
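Theorem [Wilks, 1938]: Suppose (x1 ... xn) are iid with pdf f(x|theta), where f satisfies regularity conditions. Suppose we want to test the null hypothesis H0: theta = theta_0 versus H1: theta =/= theta_0. Let theta hat be the MLE of theta, and let lambda(x1 ... xn) be the ratio of the likelihood at theta_0 to the likelihood at theta hat. (This is the reciprocal of the ratio used above, so here lambda <= 1 and -2 log lambda >= 0.)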
Then under H0, as n tends to infinity, the distribution of -2 log lambda(x1 ... xn) tends to the chi-squared distribution with one degree of freedom.
First we need a result about probability distributions:
Theorem [Slutsky]: If X_n tends to X in distribution, and Y_n tends to a constant b in probability, then
X_n + Y_n tends to X + b in distribution, and
X_n * Y_n tends to bX in distribution.
Proof: Omitted.
Intuitively, Slutsky's theorem says that the influence of Y_n on X_n is that of a constant, if Y_n tends to a constant.
Example: Suppose sqrt(n)(X bar_n - mu)/sigma is asymptotically N(0,1), but the true variance sigma^2 is unknown. Let S^2_n be our estimator of the variance. Suppose this estimator tends to sigma^2 in probability (for example, it is unbiased and its variance tends to zero). Then sigma/S_n tends to 1 in probability, so the theorem says that sqrt(n)(X bar_n - mu)/S_n is also asymptotically N(0,1).
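A small simulation can illustrate this use of Slutsky's theorem. The sketch below assumes exponential data with rate 1 (so mu = 1 and sigma = 1); that choice, the sample size n = 200, the seed and the function names are illustrative only. It replaces sigma by the sample standard deviation S_n and checks how often the resulting statistic falls within +/- 1.96, the central 95% range of N(0,1).

```python
import random

def studentized_mean(sample, mu):
    """sqrt(n) (X bar_n - mu) / S_n, with S_n^2 the usual unbiased variance estimate."""
    n = len(sample)
    xbar = sum(sample) / n
    s2 = sum((v - xbar) ** 2 for v in sample) / (n - 1)
    return (n ** 0.5) * (xbar - mu) / (s2 ** 0.5)

def fraction_within(n, reps=3000, seed=2):
    """Fraction of simulated statistics lying in [-1.96, 1.96]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # Exponential with rate 1: true mean 1 and true variance 1 (an assumed model)
        sample = [rng.expovariate(1.0) for _ in range(n)]
        if abs(studentized_mean(sample, 1.0)) <= 1.96:
            hits += 1
    return hits / reps

frac = fraction_within(n=200)
print(frac)  # expected to be near 0.95 if the N(0,1) approximation is good
```

The fraction is only expected to be close to 0.95, not equal to it: the normal approximation is asymptotic and the simulation itself is noisy.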
We now return to Wilks' theorem: under H0, as n tends to infinity, the distribution of -2 log lambda(x1 ... xn) tends to the chi-squared distribution with one degree of freedom.
Proof: Note that if H0 is true then theta hat must be close to theta_0 for large n. We write the log likelihood function l(theta,x) as a Taylor expansion around theta hat:
l(theta,x) = l(theta hat, x) + l'(theta hat, x)(theta - theta hat) + l"(theta hat, x) (theta - theta hat)^2/2! + ...
Note that l'(theta hat, x) = 0 and that -2 log lambda(x) = -2 l(theta_0, x) + 2 l(theta hat, x). We have
-2 log lambda(x) ~=~ -2 l(theta hat, x) - 2l"(theta hat, x) (theta_0 - theta hat)^2/2! + 2 l(theta hat, x)
= - l"(theta hat, x) (theta_0 - theta hat)^2
Note the possible error in Casella and Berger, where l"(theta hat, x) is shown as a denominator.
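The quality of this quadratic approximation can be checked on a concrete model. The sketch below assumes iid exponential data with rate theta, for which l(theta, x) = n log(theta) - theta SUM xi, theta hat = n / SUM xi and l"(theta, x) = -n / theta^2. The model, theta_0 = 2, n = 2000 and the seed are illustrative assumptions, not taken from the notes.

```python
import math
import random

def loglik(theta, xs):
    """Log likelihood of iid exponential data with rate theta."""
    return len(xs) * math.log(theta) - theta * sum(xs)

rng = random.Random(3)   # illustrative seed
theta0 = 2.0             # null value, also used to generate the data (H0 true)
n = 2000
xs = [rng.expovariate(theta0) for _ in range(n)]

theta_hat = n / sum(xs)  # MLE of the rate

# Exact statistic: -2 l(theta_0, x) + 2 l(theta hat, x)
exact = -2.0 * loglik(theta0, xs) + 2.0 * loglik(theta_hat, xs)

# Quadratic approximation: - l"(theta hat, x) (theta_0 - theta hat)^2
second_deriv = -n / theta_hat ** 2
quadratic = -second_deriv * (theta0 - theta_hat) ** 2

print(exact, quadratic)  # the two values should be close when H0 is true and n is large
```

The gap between the two numbers comes from the third and higher order terms of the Taylor expansion, which shrink relative to the quadratic term as theta hat approaches theta_0.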
Now remember that I_theta = var[l'(x,theta)] = E_theta [ l'(x,theta)^2 ] = E_theta[- l"(x,theta)].
Consider -l"(theta hat, x)/n. This is the observed average information for x. For large n it will be very close to its expectation, which is I(theta_0).
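Now consider the random variable (theta_0 - theta hat(x))^2. We know from before that under H0, the distribution of theta hat - theta_0 is approximately N(0, I_theta(n)^-1), where I_theta(n) = n I(theta_0) is the information in n observations. So the distribution of n I(theta_0) (theta_0 - theta hat)^2 tends towards the chi-squared distribution with one degree of freedom.
Finally, write - l"(theta hat, x) (theta_0 - theta hat)^2 as the product of -l"(theta hat, x)/(n I(theta_0)) and n I(theta_0) (theta_0 - theta hat)^2. The first factor tends to 1 in probability, so Slutsky's theorem says that the distribution of the product tends towards the distribution of the second factor, which is chi-squared with one degree of freedom.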
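The conclusion of the theorem can be illustrated by simulation. The sketch below again assumes iid exponential data with rate theta (an illustrative model, as are n = 100, theta_0 = 2, the seed and the function names). It generates data under H0, computes -2 log lambda with lambda taken as the likelihood at theta_0 over the likelihood at theta hat, and records how often the statistic exceeds 3.841, the 0.95 quantile of the chi-squared distribution with one degree of freedom.

```python
import math
import random

CHI2_1_95 = 3.841  # 0.95 quantile of chi-squared with one degree of freedom

def minus_two_log_lambda(xs, theta0):
    """-2 log lambda for H0: theta = theta0 in the exponential(rate theta) model."""
    n = len(xs)
    s = sum(xs)
    theta_hat = n / s  # MLE of the rate

    def loglik(theta):
        return n * math.log(theta) - theta * s

    return 2.0 * (loglik(theta_hat) - loglik(theta0))

def rejection_rate(n, theta0=2.0, reps=4000, seed=4):
    """Fraction of H0-generated data sets whose statistic exceeds the chi-squared cutoff."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        xs = [rng.expovariate(theta0) for _ in range(n)]
        if minus_two_log_lambda(xs, theta0) > CHI2_1_95:
            rejections += 1
    return rejections / reps

rate = rejection_rate(n=100)
print(rate)  # expected to be near 0.05 if the chi-squared approximation is good
```

A rate close to 0.05 is consistent with the theorem; it does not prove it, and for small n the agreement can be noticeably worse.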
In general the limiting chi-squared distribution has k degrees of freedom. Usually, Theta contains an open subset in R^p and Omega contains an open subset in R^q. Then k = p-q.
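For the two-sample example earlier in these notes, the unrestricted space has p = 3 free parameters (mu1, mu2, sigma^2) and the null space has q = 2 (the common mean and sigma^2), so k = 1. The sketch below checks this by simulation under assumed settings (n = 50 per sample, standard normal data, an arbitrary seed). With the Theta-over-Omega ratio lambda(x) = [ sigmadot^2 / sigmahat^2 ]^n, the statistic is 2 log lambda(x) = 2n log(sigmadot^2 / sigmahat^2), which is compared with the chi-squared cutoff for one degree of freedom.

```python
import math
import random

CHI2_1_95 = 3.841  # 0.95 quantile of chi-squared with one degree of freedom

def two_log_lambda(y, z):
    """2 log lambda for the two-sample problem, lambda being the Theta-over-Omega ratio."""
    n = len(y)
    ybar = sum(y) / n
    zbar = sum(z) / n
    sigmahat2 = (sum((v - ybar) ** 2 for v in y)
                 + sum((v - zbar) ** 2 for v in z)) / (2 * n)
    # Identity from the notes: sigmadot^2 = sigmahat^2 + 0.25 (ybar - zbar)^2
    sigmadot2 = sigmahat2 + 0.25 * (ybar - zbar) ** 2
    return 2.0 * n * math.log(sigmadot2 / sigmahat2)

def null_rejection_rate(n, reps=4000, seed=5):
    """Fraction of null-generated data sets whose statistic exceeds the cutoff."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if two_log_lambda(y, z) > CHI2_1_95:
            rejections += 1
    return rejections / reps

two_sample_rate = null_rejection_rate(n=50)
print(two_sample_rate)  # expected to be near 0.05 when k = 1 is the right choice
```

Using the cutoff for a different number of degrees of freedom would be expected to move the rejection rate away from 0.05, which is one informal way to see that k = p - q matters.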
The proof of this result involves matrix algebra. See page 114
of Silvey, and http://www.math.umd.edu/~evs/s701/WilksThm.pdf
by Prof. Eric Slud.