Example: Suppose x = (x1 ... xn) is an iid sample from a univariate
normal distribution with parameter theta = (mu, sigma^2). The
obvious estimator for mu is x bar. What is the MSE of x bar?
Answer: ...
E_theta [g hat - g(theta)]^2 <= E_theta [g bar - g(theta)]^2
This is not achievable in general. Consider the estimator g bar(x) = g(theta_0) regardless of x. Although this is a bad estimator in general, it has zero error for theta = theta_0. So g hat would have to have zero error for theta_0, and hence, since theta_0 is arbitrary, for all theta. (Analogy: A stopped clock is perfectly accurate twice a day.)
Definition: The estimator g hat is unbiased if E_theta [g hat(x)] = g(theta) for all theta.
Example continued: Suppose x = (x1 ... xn) is an iid sample from a univariate normal distribution with parameter theta = (mu, sigma^2). The obvious estimators for mu and sigma^2 are x bar and s^2 = (1/n) SUM (xi - x bar)^2.
It can be computed that x bar is unbiased, but s^2 is not: E[x bar]
= mu and E[s^2] =/= sigma^2.
Exercise: Compute E[s^2]. This is related to Question 2 on the
current assignment.
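The claim about x bar and s^2 can be checked empirically before working the exercise. The following is a minimal simulation sketch, not part of the original notes: the function name simulate_estimators and the parameter values (mu = 1, sigma^2 = 4, n = 5) are arbitrary choices made for illustration. It averages both estimators over many simulated samples.

```python
import random

def simulate_estimators(mu=1.0, sigma2=4.0, n=5, reps=50_000, seed=0):
    """Monte Carlo averages of x bar and s^2 = (1/n) SUM (xi - x bar)^2.

    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    sigma = sigma2 ** 0.5
    sum_xbar = 0.0
    sum_s2 = 0.0
    for _ in range(reps):
        # One iid normal sample of size n.
        x = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(x) / n
        s2 = sum((xi - xbar) ** 2 for xi in x) / n
        sum_xbar += xbar
        sum_s2 += s2
    return sum_xbar / reps, sum_s2 / reps

mean_xbar, mean_s2 = simulate_estimators()
print("average of x bar:", mean_xbar)  # should be close to mu
print("average of s^2:  ", mean_s2)    # compare with sigma^2
```

The average of x bar should settle near mu, while the average of s^2 should sit visibly below sigma^2. The exact size of the gap is what the exercise asks for.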
Example: Suppose x = (x1 ... xn) is the result of n independent binomial trials. Intuitively, the order of the 1s and 0s is irrelevant, and the sum SUM x_i captures all available information about the probability theta of success.
Note: We assume without question that the trials are independent. Information other than the sum would be relevant if we wanted to check this assumption!
Definition: A statistic t is
any function of the sample x. Intuitively, a statistic is a
summary of the observed data.
Intuitively, a statistic is sufficient if it preserves all information from x that is relevant for estimating theta.
The function x |-> SUM x_i is a statistic. We shall prove that this statistic is sufficient.
Suppose we cannot observe x directly, but just that x belongs to the set A. Clearly this information is relevant for estimating theta.
Now suppose we discover exactly which x in A was the outcome. This extra information does not help us refine our estimate of the value of theta.
Example continued: Suppose x = (x1 ... xn) in X = {0,1}^n. Partition X into {A0 ... An} where Ak = {x: SUM xi = k}.
Now P_theta(x|Ak) = 1/(n choose k) if x in Ak and zero if x not in Ak, for any theta.
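For small n this can be verified by brute force. The sketch below is an illustration only (the helper name conditional_probs and the values n = 4, k = 2 are assumptions, not from the notes). It enumerates {0,1}^n, conditions on Ak, and checks with exact rational arithmetic that the result is 1/(n choose k) whatever theta is.

```python
from fractions import Fraction
from itertools import product
from math import comb

def conditional_probs(n, k, theta):
    """Return {x: P_theta(x | A_k)} for every x in A_k = {x: SUM xi = k}."""
    def p(x):
        # Probability of the binary sequence x under success probability theta.
        z = sum(x)
        return theta ** z * (1 - theta) ** (n - z)

    A_k = [x for x in product((0, 1), repeat=n) if sum(x) == k]
    total = sum(p(x) for x in A_k)  # P_theta(A_k)
    return {x: p(x) / total for x in A_k}

n, k = 4, 2
for theta in (Fraction(1, 10), Fraction(1, 2), Fraction(9, 10)):
    probs = conditional_probs(n, k, theta)
    # Every sequence in A_k is equally likely, independently of theta.
    assert all(q == Fraction(1, comb(n, k)) for q in probs.values())
```

Using Fraction rather than floats keeps the equality check exact, so the assertion tests the identity itself rather than a rounded approximation.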
Definition: The partition {A} of X is sufficient for the family P_theta if, for every set A in the partition, P_theta(x|A) is the same for all theta.
The partition {A} is minimal sufficient if it is sufficient and every set of every other sufficient partition is contained in one of its sets, i.e. it is the coarsest sufficient partition.
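Lemma: The partition generated by the equivalence relation x ~ x' iff the ratio p_theta(x)/p_theta(x') does not depend on theta is minimal sufficient (under certain natural conditions).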
Example: For x being the outcome of n independent binomial trials (i.e. a binary sequence of length n), p_theta(x) = theta^z (1-theta)^(n - z) where z = sum xi.
We have p_theta(x)/p_theta(x') = theta^(z-z') (1-theta)^(-z+z'), where z' = SUM x'i. This ratio is constant in theta if and only if z = z'. Hence, by the lemma, the partition based on SUM xi is minimal sufficient.
Any statistic t generates a partition of X based on the equivalence relation x ~ x' iff t(x) = t(x').
Definition: The statistic t is (minimal) sufficient for P_theta if this partition is (minimal) sufficient.
A minimal sufficient statistic is a function of every other sufficient statistic, i.e. it retains no more information than any of them.
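Note that minimal sufficient statistics are never unique: any one-to-one function of a minimal sufficient statistic generates the same partition and is therefore also minimal sufficient.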
Note that if E_theta [g hat(x)] = g(theta) then the mean squared error E_theta [g hat(x) - g(theta)]^2 is the variance var_theta(g hat). Therefore we are looking for minimum variance unbiased estimators: MVUEs.
Theorem: Let P_theta be a family of distributions on a sample space X, where theta in Theta. Suppose g tilde: X -> R is any unbiased estimator of g: Theta -> R. Let t be a statistic that is sufficient for theta. Then g hat(t) = E_theta[ g tilde | t] is an unbiased estimator for g whose variance is no larger than that of g tilde. (Because t is sufficient, this conditional expectation does not depend on theta, so g hat can be computed from the data alone.)
Intuition: If we average over all possible observations x' that have the same value for the sufficient statistic t, then we reduce the variance of the estimator.
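The theorem can be illustrated on the binary-sequence example. The sketch below is not from the notes: it assumes the crude unbiased estimator g tilde(x) = x1 for g(theta) = theta, takes t = SUM xi as the sufficient statistic, and computes everything exactly by enumeration. The helper names (pmf, mean_and_variance, rao_blackwellise) and the values n = 4, theta = 3/10 are hypothetical choices for illustration.

```python
from fractions import Fraction
from itertools import product

def pmf(x, theta):
    """p_theta(x) = theta^z (1-theta)^(n-z) with z = SUM xi."""
    z = sum(x)
    return theta ** z * (1 - theta) ** (len(x) - z)

def mean_and_variance(estimator, n, theta):
    """Exact mean and variance of estimator(x) under P_theta, by enumeration."""
    xs = list(product((0, 1), repeat=n))
    mean = sum(pmf(x, theta) * estimator(x) for x in xs)
    var = sum(pmf(x, theta) * (estimator(x) - mean) ** 2 for x in xs)
    return mean, var

def rao_blackwellise(estimator, n, theta0=Fraction(1, 2)):
    """Return x -> E[estimator | SUM xi].

    Since SUM xi is sufficient, the conditional expectation is the same for
    every theta0 in (0, 1); theta0 is only a computational device here.
    """
    xs = list(product((0, 1), repeat=n))
    cond = {}
    for k in range(n + 1):
        A_k = [x for x in xs if sum(x) == k]
        weight = sum(pmf(x, theta0) for x in A_k)
        cond[k] = sum(pmf(x, theta0) * estimator(x) for x in A_k) / weight
    return lambda x: cond[sum(x)]

n = 4
theta = Fraction(3, 10)
g_tilde = lambda x: Fraction(x[0])      # uses only the first trial
g_hat = rao_blackwellise(g_tilde, n)    # works out to (SUM xi) / n

m1, v1 = mean_and_variance(g_tilde, n, theta)
m2, v2 = mean_and_variance(g_hat, n, theta)
print("g tilde: mean", m1, "variance", v1)
print("g hat:   mean", m2, "variance", v2)
```

Both estimators have mean theta, but conditioning on SUM xi replaces x1 by the sample proportion and reduces the variance from theta(1-theta) to theta(1-theta)/n.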