
STAT 801 Lecture 22

Reading for Today's Lecture:

Goals of Today's Lecture:

Optimal tests

Likelihood Ratio tests

For general composite hypotheses optimality theory is not usually successful in producing an optimal test. Instead we look for heuristics to guide our choices. The simplest approach is to consider the likelihood ratio

\begin{displaymath}\frac{f_{\theta_1}(X)}{f_{\theta_0}(X)}
\end{displaymath}

and choose values of $\theta_1 \in \Theta_1$ and $\theta_0 \in \Theta_0$ which are reasonable estimates of $\theta$ assuming respectively the alternative or null hypothesis is true. The simplest method is to make each $\theta_i$ a maximum likelihood estimate, but maximized only over $\Theta_i$.

Example 1: In the $N(\mu,1)$ problem suppose we want to test $\mu \le 0$ against $\mu>0$. (Remember there is a UMP test.) The log likelihood function is

\begin{displaymath}-n(\bar{X}-\mu)^2/2
\end{displaymath}

If $\bar{X} >0$ then this function has its global maximum in $\Theta_1$ at $\bar{X}$. Thus $\hat\mu_1$, which maximizes $\ell(\mu)$ subject to $\mu>0$, is $\bar{X}$ if $\bar{X} >0$. When $\bar{X} \le 0$ the maximum of $\ell(\mu)$ over $\mu>0$ is on the boundary, at $\hat\mu_1=0$. (Technically this point is in the null, but in this case $\ell(0)$ is the supremum of the values $\ell(\mu)$ for $\mu>0$.) Similarly, the estimate $\hat\mu_0$ is $\bar{X}$ if $\bar{X} \le 0$ and 0 if $\bar{X} >0$. It follows that

\begin{displaymath}\frac{f_{\theta_1}(X)}{f_{\theta_0}(X)}= \exp\{\ell(\hat\mu_1) - \ell(\hat\mu_0)\}
\end{displaymath}

which simplifies to

\begin{displaymath}\exp\{n\bar{X}\vert\bar{X}\vert/2\}
\end{displaymath}

This is a monotone increasing function of $\bar{X}$ so the rejection region will be of the form $\bar{X} > K$. To get the level right the test will have to reject if $n^{1/2} \bar{X} > z_\alpha$. Notice that the log likelihood ratio statistic

\begin{displaymath}\lambda \equiv 2\log\left(\frac{f_{\hat\mu_1}(X)}{f_{\hat\mu_0}(X)}\right) = n\bar{X}\vert\bar{X}\vert
\end{displaymath}

can be used as an equivalent but simpler test statistic.
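
As a quick numerical check (a sketch, not part of the notes proper; it assumes the numpy and scipy packages), the Python code below maximizes the log likelihood separately over $\Theta_0$ and $\Theta_1$ for one simulated sample, confirms that $\lambda$ equals $n\bar{X}\vert\bar{X}\vert$, and applies the level $\alpha$ rejection rule $n^{1/2}\bar{X} > z_\alpha$.

\begin{verbatim}
# Sketch (not from the notes): check Example 1 on one simulated sample.
# Assumes numpy and scipy are installed.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 25
x = rng.normal(loc=0.0, scale=1.0, size=n)    # data generated under mu = 0
xbar = x.mean()

# Restricted mles: mu_hat_1 maximizes over mu > 0, mu_hat_0 over mu <= 0.
mu1 = max(xbar, 0.0)
mu0 = min(xbar, 0.0)

def loglik(mu):
    # log likelihood up to a constant not depending on mu
    return -n * (xbar - mu) ** 2 / 2

lam = 2 * (loglik(mu1) - loglik(mu0))
print(lam, n * xbar * abs(xbar))              # the two values agree

# Level alpha test: reject when sqrt(n) * xbar exceeds z_alpha.
alpha = 0.05
print(np.sqrt(n) * xbar > norm.ppf(1 - alpha))
\end{verbatim}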

Example 2: In the $N(\mu,1)$ problem suppose instead that the null hypothesis is $\mu=0$. Then $\hat\mu_0$ is simply 0, while the maximum of the log-likelihood over the alternative $\mu \neq 0$ occurs at $\bar{X}$. This gives

\begin{displaymath}\lambda = n\bar{X}^2
\end{displaymath}

which, under the null hypothesis, has a $\chi^2_1$ distribution. This test leads to the rejection region $\lambda > (z_{\alpha/2})^2$, which is the usual UMPU test.
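
A tiny numerical sanity check (a sketch with hypothetical values of n and $\bar{X}$; scipy assumed) confirms that rejecting when $\lambda > (z_{\alpha/2})^2$ is the same as rejecting when $\vert n^{1/2}\bar{X}\vert > z_{\alpha/2}$:

\begin{verbatim}
# Sketch with hypothetical values: the two rejection rules in Example 2
# make the same decision, and the chi^2_1 cutoff equals z_{alpha/2}^2.
import numpy as np
from scipy.stats import norm, chi2

alpha, n, xbar = 0.05, 25, 0.45
lam = n * xbar ** 2
z = norm.ppf(1 - alpha / 2)
print(lam > z ** 2, abs(np.sqrt(n) * xbar) > z)        # identical decisions
print(np.isclose(chi2.ppf(1 - alpha, df=1), z ** 2))   # True
\end{verbatim}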

Example 3: For the $N(\mu,\sigma^2)$ problem testing $\mu=0$ against $\mu \neq 0$ we must find two estimates of $(\mu,\sigma^2)$. The maximum of the likelihood over the alternative occurs at the global mle $(\bar{X}, \hat\sigma^2)$, where $\hat\sigma^2 = \sum(X_i-\bar{X})^2/n$. We find

\begin{displaymath}\ell(\hat\mu,\hat\sigma^2) = -n/2 - n \log(\hat\sigma)
\end{displaymath}

We also need to maximize $\ell$ over the null hypothesis. Recall

\begin{displaymath}\ell(\mu,\sigma) = -\frac{1}{2\sigma^2} \sum (X_i-\mu)^2 -n\log(\sigma)
\end{displaymath}

On the null hypothesis we have $\mu=0$ and so we must find $\hat\sigma_0$ by maximizing

\begin{displaymath}\ell(0,\sigma) = -\frac{1}{2\sigma^2} \sum X_i^2 -n\log(\sigma)
\end{displaymath}

This leads to

\begin{displaymath}\hat\sigma_0^2 = \sum X_i^2/n
\end{displaymath}

and

\begin{displaymath}\ell(0,\hat\sigma_0) = -n/2 -n\log(\hat\sigma_0)
\end{displaymath}

This gives

\begin{displaymath}\lambda =-n\log(\hat\sigma^2/\hat\sigma_0^2)
\end{displaymath}

Since

\begin{displaymath}\frac{\hat\sigma^2}{\hat\sigma_0^2} = \frac{ \sum (X_i-\bar{X})^2}{
\sum (X_i-\bar{X})^2 + n\bar{X}^2}
\end{displaymath}

we can write

\begin{displaymath}\lambda = n \log(1+t^2/(n-1))
\end{displaymath}

where

\begin{displaymath}t = \frac{n^{1/2} \bar{X}}{s}
\end{displaymath}

is the usual t statistic. The likelihood ratio test thus rejects for large values of |t| which gives the usual test.
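
The identity $-n\log(\hat\sigma^2/\hat\sigma_0^2) = n\log(1+t^2/(n-1))$ is easy to verify numerically; the short sketch below (numpy assumed) does so for one simulated sample.

\begin{verbatim}
# Sketch (numpy assumed): the two expressions for lambda in Example 3
# agree on a simulated sample.
import numpy as np

rng = np.random.default_rng(1)
n = 20
x = rng.normal(size=n)

xbar = x.mean()
sigma2_hat = np.sum((x - xbar) ** 2) / n      # unrestricted mle of sigma^2
sigma2_0 = np.sum(x ** 2) / n                 # mle of sigma^2 when mu = 0
s2 = np.sum((x - xbar) ** 2) / (n - 1)        # usual sample variance
t = np.sqrt(n) * xbar / np.sqrt(s2)

lam1 = -n * np.log(sigma2_hat / sigma2_0)
lam2 = n * np.log(1 + t ** 2 / (n - 1))
print(np.isclose(lam1, lam2))                 # True
\end{verbatim}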

Notice that if n is large we have

\begin{displaymath}\lambda = n\log(1+t^2/(n-1)) \approx n[t^2/(n-1) +O(n^{-2})] \approx t^2 \, .
\end{displaymath}

Since the t statistic is approximately standard normal when n is large, we see that

\begin{displaymath}\lambda = 2[\ell(\hat\theta_1) - \ell(\hat\theta_0)]
\end{displaymath}

has nearly a $\chi^2_1$ distribution.
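
A small Monte Carlo experiment (a sketch only; numpy and scipy assumed) illustrates the approximation: under the null, the test which rejects when $\lambda$ exceeds the upper 5% point of $\chi^2_1$ has level close to 0.05.

\begin{verbatim}
# Sketch (numpy/scipy assumed): Monte Carlo check that, under the null,
# lambda from Example 3 is approximately chi-squared with 1 df.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(2)
n, reps = 50, 20000
x = rng.normal(size=(reps, n))                # samples with mu = 0
xbar = x.mean(axis=1)
s2 = x.var(axis=1, ddof=1)
t2 = n * xbar ** 2 / s2
lam = n * np.log(1 + t2 / (n - 1))
# Rejecting when lambda exceeds the chi^2_1 upper 5% point should have
# probability close to 0.05.
print(np.mean(lam > chi2.ppf(0.95, df=1)))
\end{verbatim}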

This is a general phenomenon when the null hypothesis being tested is of the form $\phi=\phi_0$. Here is the general theory. Suppose that the vector of p+q parameters $\theta$ can be partitioned into $\theta=(\phi,\gamma)$ with $\phi$ a vector of p parameters and $\gamma$ a vector of q parameters. To test $\phi=\phi_0$ we find two mles of $\theta$. First, the global mle $\hat\theta = (\hat\phi,\hat\gamma)$ also maximizes the likelihood over $\Theta_1=\{\theta:\phi\neq\phi_0\}$, because typically the probability that $\hat\phi$ is exactly $\phi_0$ is 0.

Now we maximize the likelihood over the null hypothesis, that is we find $\hat\theta_0 = (\phi_0,\hat\gamma_0)$ to maximize

\begin{displaymath}\ell(\phi_0,\gamma)
\end{displaymath}

The log-likelihood ratio statistic is

\begin{displaymath}2[\ell(\hat\theta)-\ell(\hat\theta_0)]
\end{displaymath}
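
In applications the two maximizations are often carried out numerically. The sketch below is only an illustration of the recipe, using scipy's general purpose optimizer on the $N(\mu,\sigma^2)$ model of Example 3 with $\phi=\mu$ and $\gamma=\log\sigma$.

\begin{verbatim}
# Sketch of the general recipe via numerical maximization (scipy assumed),
# applied to the N(mu, sigma^2) model of Example 3 with phi = mu and
# gamma = log(sigma).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
x = rng.normal(size=30)

def negloglik(theta):
    mu, logsig = theta
    sig = np.exp(logsig)
    return np.sum((x - mu) ** 2) / (2 * sig ** 2) + len(x) * logsig

# Global mle: maximize over (mu, log sigma).
fit1 = minimize(negloglik, x0=np.array([0.0, 0.0]))
# Null mle: fix mu at 0 and maximize over log sigma alone.
fit0 = minimize(lambda g: negloglik(np.array([0.0, g[0]])), x0=np.array([0.0]))

lam = 2 * (fit0.fun - fit1.fun)   # 2[l(theta_hat) - l(theta_hat_0)]
print(lam)
\end{verbatim}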

Now suppose that the true value of $\theta$ is $(\phi_0,\gamma_0)$ (so that the null hypothesis is true). The score function is a vector of length p+q and can be partitioned as $U=(U_\phi,U_\gamma)$. The Fisher information matrix can be partitioned as

\begin{displaymath}\left[\begin{array}{cc}
I_{\phi\phi} & I_{\phi\gamma}
\\
I_{\gamma\phi} & I_{\gamma\gamma}
\end{array}\right] \, .
\end{displaymath}

According to our large sample theory for the mle we have

\begin{displaymath}\hat\theta \approx \theta + I^{-1} U
\end{displaymath}

and

\begin{displaymath}\hat\gamma_0 \approx \gamma_0 + I_{\gamma\gamma}^{-1} U_\gamma
\end{displaymath}

If you carry out a two term Taylor expansion of both $\ell(\hat\theta)$ and $\ell(\hat\theta_0)$ around $\theta_0$ you get

\begin{displaymath}\ell(\hat\theta) \approx \ell(\theta_0) + U^t I^{-1}U + \frac{1}{2}
U^tI^{-1} V(\theta) I^{-1} U
\end{displaymath}

where V is the second derivative matrix of $\ell$. Remember that $V \approx -I$ and you get

\begin{displaymath}2[\ell(\hat\theta) - \ell(\theta_0)] \approx U^t I^{-1}U \, .
\end{displaymath}

A similar expansion for $\hat\theta_0$ gives

\begin{displaymath}2[\ell(\hat\theta_0) -\ell(\theta_0)] \approx U_\gamma^t I_{\gamma\gamma}^{-1}
U_\gamma \, .
\end{displaymath}

If you subtract these you find that

\begin{displaymath}2[\ell(\hat\theta)-\ell(\hat\theta_0)]
\end{displaymath}

can be written in the approximate form

\begin{displaymath}U^t M U
\end{displaymath}

for a suitable matrix M. It is now possible to use the general theory of the distribution of $X^t M X$, where $X$ is $MVN(0,\Sigma)$, to demonstrate that

Theorem: The log-likelihood ratio statistic

\begin{displaymath}\lambda = 2[\ell(\hat\theta) - \ell(\hat\theta_0)]
\end{displaymath}

has, under the null hypothesis, approximately a $\chi_p^2$ distribution.
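
As a numerical illustration of the theorem with $p=2$ (not taken from the examples above): in the $N(\mu,\sigma^2)$ model test $\mu=0$ and $\sigma=1$ jointly, so that there is no nuisance parameter. The sketch below (numpy and scipy assumed) checks that $\lambda$ behaves like $\chi^2_2$ under the null.

\begin{verbatim}
# Sketch (numpy/scipy assumed): check of the theorem with p = 2, testing
# mu = 0 and sigma = 1 jointly in the N(mu, sigma^2) model (no nuisance
# parameter).  Under the null lambda should be approximately chi^2_2.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(4)
n, reps = 100, 20000
x = rng.normal(size=(reps, n))                # data satisfying the null
xbar = x.mean(axis=1)
sigma2_hat = np.mean((x - xbar[:, None]) ** 2, axis=1)

# lambda = 2[l(mu_hat, sigma_hat) - l(0, 1)]
#        = sum(X_i^2) - n - n log(sigma_hat^2)
lam = np.sum(x ** 2, axis=1) - n - n * np.log(sigma2_hat)
print(np.mean(lam > chi2.ppf(0.95, df=2)))    # close to 0.05
\end{verbatim}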

Aside:

Theorem: Suppose that $X\sim MVN(0,\Sigma)$ with $\Sigma$ non-singular and M a symmetric matrix. If $\Sigma M \Sigma M \Sigma = \Sigma M \Sigma$ then $X^t M X$ has a $\chi^2$ distribution with degrees of freedom $\nu=trace(M\Sigma)$.

Proof: We have $X=AZ$ where $AA^t = \Sigma$ and Z is standard multivariate normal. So $X^t M X = Z^t A^t M A Z$. Let $Q=A^t M A$. Since $AA^t = \Sigma$ the condition in the theorem is actually

\begin{displaymath}AQQA^t = AQA^t
\end{displaymath}

Since $\Sigma$ is non-singular so is A. Multiply by $A^{-1}$ on the left and $(A^t)^{-1}$ on the right to discover $QQ=Q$.

The matrix Q is symmetric and so can be written in the form $P\Lambda P^t$ where $\Lambda$ is a diagonal matrix containing the eigenvalues of Q and P is an orthogonal matrix whose columns are the corresponding orthonormal eigenvectors. It follows that we can rewrite

\begin{displaymath}Z^t Q Z = (P^t Z)^t \Lambda (P^t Z)
\end{displaymath}

The variable $W = P^t Z$ is multivariate normal with mean 0 and variance-covariance matrix $P^t P = I$; that is, W is standard multivariate normal. Now

\begin{displaymath}W^t \Lambda W =\sum \lambda_i W_i^2
\end{displaymath}

We have established that the general distribution of any quadratic form $X^t M X$ is a linear combination of $\chi^2$ variables. Now go back to the condition QQ=Q. If $\lambda$ is an eigenvalue of Q and $v\neq 0$ is a corresponding eigenvector then $QQv = Q(\lambda v) = \lambda Qv = \lambda^2 v$ but also $QQv =Qv = \lambda v$. Thus $\lambda(1-\lambda ) v=0$. It follows that either $\lambda=0$ or $\lambda=1$. This means that the weights in the linear combination are all 1 or 0 and that $X^t M X$ has a $\chi^2$ distribution with degrees of freedom, $\nu$, equal to the number of $\lambda_i$ which are equal to 1. This is the same as the sum of the $\lambda_i$ so

\begin{displaymath}\nu = trace(\Lambda)
\end{displaymath}

But
\begin{align*}trace(M\Sigma)& = trace(MAA^t)
\\
&= trace(A^t M A)
\\
& = trace(Q)
\\
& = trace(P\Lambda P^t)
\\
& = trace(\Lambda P^t P)
\\
& = trace(\Lambda)
\end{align*}
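
The theorem is also easy to illustrate numerically. In the sketch below (numpy and scipy assumed) $\Sigma$ is a hypothetical $3\times 3$ covariance matrix and M has the form used in the application that follows, with p=1 and q=2; the code verifies the condition $\Sigma M \Sigma M \Sigma = \Sigma M \Sigma$ and that $X^t M X$ behaves like $\chi^2$ with $trace(M\Sigma)=1$ degree of freedom.

\begin{verbatim}
# Sketch (numpy/scipy assumed): numerical illustration of the theorem.
# Sigma is a hypothetical 3 x 3 covariance matrix; M has the form used
# in the application below, with p = 1 and q = 2.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(5)
A = rng.normal(size=(3, 3))
Sigma = A @ A.T                               # non-singular covariance

J = np.zeros((3, 3))
J[1:, 1:] = np.linalg.inv(Sigma[1:, 1:])      # Sigma_{gamma gamma}^{-1} block
M = np.linalg.inv(Sigma) - J

# The condition Sigma M Sigma M Sigma = Sigma M Sigma holds, and
# trace(M Sigma) = p = 1.
print(np.allclose(Sigma @ M @ Sigma @ M @ Sigma, Sigma @ M @ Sigma))
print(np.trace(M @ Sigma))

# So X^t M X has a chi^2_1 distribution; the empirical tail probability
# at the 5% point should be near 0.05.
X = rng.multivariate_normal(np.zeros(3), Sigma, size=20000)
quad = np.einsum('ij,jk,ik->i', X, M, X)
print(np.mean(quad > chi2.ppf(0.95, df=1)))
\end{verbatim}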

In the application $\Sigma$ is ${\cal I}$, the Fisher information, and $M={\cal I}^{-1} - J$ where

\begin{displaymath}J= \left[\begin{array}{cc}
0 & 0 \\ 0 & I_{\gamma\gamma}^{-1}
\end{array}\right]
\end{displaymath}

A direct computation shows that $M\Sigma$ becomes

\begin{displaymath}\left[\begin{array}{cc}
I & 0 \\ -I_{\gamma\gamma}^{-1}I_{\gamma\phi} & 0
\end{array}\right]
\end{displaymath}

where I is a $p\times p$ identity matrix. This matrix is idempotent, so $M\Sigma M\Sigma= M\Sigma$ (hence $\Sigma M\Sigma M\Sigma = \Sigma M\Sigma$) and $trace(M\Sigma) = p$.



Richard Lockhart
2000-03-21