
STAT 801 Lecture 18

Reading for Today's Lecture:

Goals of Today's Lecture:

Finding Sufficient statistics

Binomial(n,p): the log likelihood $\ell(\theta)$ (the part depending on $\theta$) is a function of X alone, not of $Y_1, \ldots, Y_n$ as well.

Normal example: $\ell(\mu)$ is, ignoring terms not containing $\mu$,

\begin{displaymath}\ell(\mu) = \mu \sum X_i - n\mu^2/2 = n\mu\bar{X} -n\mu^2/2 \, .
\end{displaymath}

Examples of the Factorization Criterion:

Theorem: If the model for data X has density $f(x,\theta)$ then the statistic S(X) is sufficient if and only if the density can be factored as

\begin{displaymath}f(x,\theta) = g(s(x),\theta)h(x)
\end{displaymath}

Proof: Find statistic T(X) such that X is a one to one function of the pair S,T. Apply change of variables to the joint density of S and T. If the density factors then

\begin{displaymath}f_{S,T}(s,t) =g(s,\theta) h(x(s,t))
\end{displaymath}

so conditional density of T given S=s does not depend on $\theta$. Thus the conditional distribution of (S,T) given S does not depend on $\theta$ and finally the conditional distribution of X given S does not depend on $\theta$.

Conversely, if S is sufficient then $f_{T\vert S}$ has no $\theta$ in it, so the joint density of S,T is

\begin{displaymath}f_S(s,\theta) f_{T\vert S} (t\vert s)
\end{displaymath}

Apply change of variables formula to get

\begin{displaymath}f_X(x) = f_S(s(x),\theta) f_{T\vert S} (t(x)\vert s(x)) J(x)
\end{displaymath}

where J is the Jacobian. This factors as required, with $g(s(x),\theta) = f_S(s(x),\theta)$ and $h(x) = f_{T\vert S}(t(x)\vert s(x)) J(x)$.

Example: If $X_1,\ldots,X_n$ are iid $N(\mu,\sigma^2)$ then the joint density is
\begin{multline*}(2\pi)^{-n/2} \sigma^{-n} \times \\ \exp\{-\sum X_i^2/(2\sigma^2) +\mu\sum X_i/\sigma^2
-n\mu^2/(2\sigma^2)\}
\end{multline*}
which evidently depends on the data only through

\begin{displaymath}\sum X_i^2, \sum X_i
\end{displaymath}

This pair is a sufficient statistic, by the factorization criterion. You can write this pair as a bijective function of $(\bar{X}, \sum (X_i-\bar{X})^2)$, so that the latter pair is also sufficient.
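As a quick numerical illustration (an added sketch, not part of the notes; it assumes numpy and scipy are available), the code below builds two different samples sharing the same $\sum X_i$ and $\sum X_i^2$ and checks that their joint $N(\mu,\sigma^2)$ log densities agree at every parameter value tried:

\begin{verbatim}
# Sketch: the N(mu, sigma^2) joint density depends on the data only
# through (sum X_i, sum X_i^2).  Two different samples with the same
# sums have identical joint log densities for every (mu, sigma).
import numpy as np
from scipy.stats import norm

x = np.array([0.0, 1.0, 2.0])
# Samples with a fixed (sum, sum of squares) lie on a circle around
# (xbar, ..., xbar); pick a different point on that circle.
u = np.array([1.0, -2.0, 1.0]) / np.sqrt(6.0)  # unit vector orthogonal to (1,1,1)
y = x.mean() + np.sqrt(np.sum((x - x.mean()) ** 2)) * u

assert np.isclose(x.sum(), y.sum()) and np.isclose((x**2).sum(), (y**2).sum())

for mu, sigma in [(0.0, 1.0), (1.5, 0.7), (-2.0, 3.0)]:
    lx = norm.logpdf(x, loc=mu, scale=sigma).sum()
    ly = norm.logpdf(y, loc=mu, scale=sigma).sum()
    print(mu, sigma, np.isclose(lx, ly))   # prints True each time
\end{verbatim}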

Completeness
In the Binomial(n,p) example only one function of X is an unbiased estimate of p. Rao-Blackwell shows that the UMVUE, if it exists, will be a function of any sufficient statistic. Can there be more than one such function? Yes in general, but no for some models, like the binomial.

Definition: A statistic T is complete for a model $\{P_\theta;\theta\in\Theta\}$ if

\begin{displaymath}E_\theta(h(T)) = 0
\end{displaymath}

for all $\theta$ implies h(T)=0 (with probability 1).

We have already seen that X is complete in the Binomial(n,p) model. In the $N(\mu,1)$ model suppose

\begin{displaymath}E_\mu(h(\bar{X})) \equiv 0
\end{displaymath}

Since $\bar{X}$ has a $N(\mu,1/n)$ distribution we find, after dropping factors not involving $x$, that

\begin{displaymath}\int_{-\infty}^\infty h(x) e^{-nx^2/2} e^{n\mu x} dx \equiv 0
\end{displaymath}

This is the so-called Laplace transform of the function $h(x)e^{-nx^2/2}$, evaluated at $n\mu$. It is a theorem that a Laplace transform is 0 if and only if the function is 0 (because you can invert the transform). Hence $h\equiv 0$.
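Spelling out the step just used (added detail; it is only an expansion of the $N(\mu,1/n)$ density of $\bar{X}$):

\begin{displaymath}
E_\mu(h(\bar{X})) = \sqrt{\frac{n}{2\pi}} \int_{-\infty}^\infty h(x) e^{-n(x-\mu)^2/2}\, dx
= \sqrt{\frac{n}{2\pi}}\, e^{-n\mu^2/2} \int_{-\infty}^\infty h(x) e^{-nx^2/2} e^{n\mu x}\, dx
\end{displaymath}

and the nonzero factor in front of the integral can be discarded.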

How to Prove Completeness

There is only one general tactic. Suppose X has density

\begin{displaymath}f(x,\theta) = h(x) \exp\{\sum_1^p a_i(\theta)S_i(x)+c(\theta)\}
\end{displaymath}

If the range of the function $(a_1(\theta),\ldots,a_p(\theta))$, as $\theta$ varies over $\Theta$, contains a (hyper-)rectangle in $R^p$ then the statistic

\begin{displaymath}( S_1(X), \ldots, S_p(X))
\end{displaymath}

is complete and sufficient.

You prove the sufficiency by the factorization criterion and the completeness using the properties of Laplace transforms and the fact that the joint density of $S_1,\ldots,S_p$ has the form

\begin{displaymath}g(s_1,\ldots,s_p;\theta) = h^*(s) \exp\{\sum a_k(\theta)s_k+c^*(\theta)\}
\end{displaymath}
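In outline, the completeness half of the claim (a sketch added here, with measure-theoretic details suppressed): if $E_\theta\{h(S)\}=0$ for all $\theta$ then

\begin{displaymath}
0 \equiv E_\theta\{h(S)\} = e^{c^*(\theta)} \int h(s)\, h^*(s)
\exp\Big\{\sum_{k=1}^p a_k(\theta) s_k\Big\}\, ds
\end{displaymath}

so the (multivariate) Laplace transform of $h h^*$ vanishes on the range of $(a_1(\theta),\ldots,a_p(\theta))$, which contains an open rectangle. By the uniqueness theorem for Laplace transforms $h h^* \equiv 0$, and since $h^*$ is positive wherever $S$ has positive density, $h(S)=0$ with probability 1.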

Example: $N(\mu,\sigma^2)$ model density has form

\begin{displaymath}\frac{1}{\sqrt{2\pi}} \exp\left\{
-\frac{1}{2\sigma^2}\, x^2
+\frac{\mu}{\sigma^2}\, x
-\frac{\mu^2}{2\sigma^2} - \log\sigma \right\}
\end{displaymath}

which is an exponential family with

\begin{displaymath}h(x) = \frac{1}{\sqrt{2\pi}}
\end{displaymath}


\begin{displaymath}a_1(\theta) = -\frac{1}{2\sigma^2}
\end{displaymath}


\begin{displaymath}S_1(x) = x^2
\end{displaymath}


\begin{displaymath}a_2(\theta) = \frac{\mu}{\sigma^2}
\end{displaymath}


\begin{displaymath}S_2(x) = x
\end{displaymath}

and

\begin{displaymath}c(\theta) = -\frac{\mu^2}{2\sigma^2} - \log\sigma
\end{displaymath}

It follows that

\begin{displaymath}(\sum X_i^2, \sum X_i)
\end{displaymath}

is a complete sufficient statistic.

Remark: The statistic $(s^2, \bar{X})$ is a one to one function of $(\sum X_i^2, \sum X_i)$ so it must be complete and sufficient, too. Any function of the latter statistic can be rewritten as a function of the former and vice versa.
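The correspondence in the remark is easy to check numerically (a small added sketch, assuming numpy; here s2 stands for the sample variance $s^2$):

\begin{verbatim}
# Sketch: (sum X_i, sum X_i^2) and (xbar, s^2) determine each other.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=2.0, size=10)
n = len(x)

t1, t2 = x.sum(), (x**2).sum()        # the complete sufficient statistic
xbar = t1 / n                         # one direction of the map
s2 = (t2 - n * xbar**2) / (n - 1)     # sample variance from (t1, t2)

# the inverse map recovers (t1, t2) from (xbar, s2)
assert np.isclose(t1, n * xbar)
assert np.isclose(t2, (n - 1) * s2 + n * xbar**2)
assert np.isclose(s2, x.var(ddof=1))  # agrees with the usual formula
\end{verbatim}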

The Lehmann-Scheffé Theorem

Theorem: If S is a complete sufficient statistic for some model and h(S) is an unbiased estimate of some parameter $\phi(\theta)$ then h(S) is the UMVUE of $\phi(\theta)$.

Proof: Suppose T is another unbiased estimate of $\phi$. According to Rao-Blackwell, T is improved by E(T|S), so if h(S) is not the UMVUE then there must exist another function h*(S) (for instance E(T|S) itself) which is unbiased and whose variance is smaller than that of h(S) for some value of $\theta$. But

\begin{displaymath}E_\theta(h^*(S)-h(S)) \equiv 0
\end{displaymath}

so, by completeness, h*(S) = h(S) with probability 1, a contradiction.

Example: In the $N(\mu,\sigma^2)$ example the random variable $(n-1)s^2/\sigma^2$ has a $\chi^2_{n-1}$ distribution. It follows that

\begin{displaymath}E\left[\frac{\sqrt{n-1}\,s}{\sigma}\right] =
\frac{\int_0^\infty \sqrt{x}\left(\frac{x}{2}\right)^{(n-1)/2-1} e^{-x/2}\, dx}{2\Gamma((n-1)/2)}
\end{displaymath}

Make the substitution y=x/2 and get

\begin{displaymath}E(s) = \frac{\sigma}{\sqrt{n-1}}\frac{\sqrt{2}}{\Gamma((n-1)/2)}
\int_0^\infty y^{n/2-1} e^{-y} dy
\end{displaymath}

Hence

\begin{displaymath}E(s) = \sigma\frac{\sqrt{2}\,\Gamma(n/2)}{\sqrt{n-1}\,\Gamma((n-1)/2)}
\end{displaymath}

The UMVUE of $\sigma$ is then

\begin{displaymath}s\,\frac{\sqrt{n-1}\,\Gamma((n-1)/2)}{\sqrt{2}\,\Gamma(n/2)}
\end{displaymath}

by the Lehmann-Scheffé theorem.
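A quick Monte Carlo check of this conclusion (an added sketch, not part of the notes; it assumes numpy and scipy): with the multiplier computed via gammaln for numerical stability, the corrected statistic averages to $\sigma$ while $s$ itself does not.

\begin{verbatim}
# Sketch: check by simulation that c * s is unbiased for sigma, where
# c = sqrt((n-1)/2) * Gamma((n-1)/2) / Gamma(n/2), while s is biased.
import numpy as np
from scipy.special import gammaln

rng = np.random.default_rng(1)
n, sigma, reps = 5, 2.0, 200_000
c = np.sqrt((n - 1) / 2.0) * np.exp(gammaln((n - 1) / 2.0) - gammaln(n / 2.0))

x = rng.normal(loc=0.0, scale=sigma, size=(reps, n))
s = x.std(axis=1, ddof=1)             # sample standard deviation, each replicate

print(s.mean())        # noticeably below sigma = 2 (s is biased downward)
print((c * s).mean())  # close to sigma = 2 (the UMVUE is unbiased)
\end{verbatim}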

Criticism of Unbiasedness

Minimal Sufficiency

In any model $S(X)\equiv X$ is sufficient. In any iid model the vector $(X_{(1)}, \ldots, X_{(n)})$ of order statistics is sufficient. In the $N(\mu,1)$ model we have three sufficient statistics:

1.
$S_1 = (X_1,\ldots,X_n)$.

2.
$S_2 = (X_{(1)}, \ldots, X_{(n)})$.

3.
$S_3 = \bar{X}$.

Notice that I can calculate $S_3$ from the values of $S_1$ or $S_2$ but not vice versa, and that I can calculate $S_2$ from $S_1$ but not vice versa. It turns out that $\bar{X}$ is a minimal sufficient statistic, meaning that it is a function of any other sufficient statistic. (You can't collapse the data set any more without losing information about $\mu$.)

Recognize minimal sufficient statistics from $\ell$:

Fact: If you fix some particular $\theta^*$ then the log likelihood ratio function

\begin{displaymath}\ell(\theta)-\ell(\theta^*)
\end{displaymath}

is minimal sufficient. WARNING: the statistic is the whole function $\theta \mapsto \ell(\theta)-\ell(\theta^*)$, not its value at any single $\theta$.

Subtraction of $\ell(\theta^*)$ gets rid of irrelevant constants in $\ell$. In the $N(\mu,1)$ example:

\begin{displaymath}\ell(\mu) = -n\log(2\pi)/2 - \sum X_i^2/2 + \mu\sum X_i -n\mu^2/2
\end{displaymath}

depends on $\sum X_i^2$, which is not needed for a sufficient statistic. Take $\mu^*=0$ and get

\begin{displaymath}\ell(\mu) -\ell(\mu^*) = \mu\sum X_i -n\mu^2/2
\end{displaymath}

This function of $\mu$ is minimal sufficient. Notice: from $\sum X_i$ you can compute this minimal sufficient statistic and vice versa. Thus $\sum X_i$ is also minimal sufficient.
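Numerically (a small added sketch, assuming numpy): two samples with the same $\sum X_i$ but different $\sum X_i^2$ give exactly the same function $\mu \mapsto \ell(\mu)-\ell(0)$.

\begin{verbatim}
# Sketch: in the N(mu,1) model, l(mu) - l(0) = mu * sum(x) - n * mu^2 / 2
# depends on the data only through sum(x).
import numpy as np

def loglik_ratio(x, mu):
    # l(mu) - l(0) for an iid N(mu, 1) sample x
    return -0.5 * np.sum((x - mu)**2) + 0.5 * np.sum(x**2)

x = np.array([0.0, 1.0, 5.0])    # sum = 6, sum of squares = 26
y = np.array([2.0, 2.0, 2.0])    # sum = 6, sum of squares = 12

for mu in np.linspace(-2.0, 2.0, 9):
    assert np.isclose(loglik_ratio(x, mu), loglik_ratio(y, mu))
print("l(mu) - l(0) agrees for both samples at every mu checked")
\end{verbatim}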

FACT: A complete sufficient statistic is also minimal sufficient.


Richard Lockhart
2000-03-10