Reading for Today's Lecture: ?
Goals of Today's Lecture: define convergence in distribution and relate it to distributional approximation; state and prove the central limit theorem; introduce Edgeworth expansions; extend convergence in distribution to random vectors; present Slutsky's theorem and the delta method.
Last time we used the Fourier inversion formula to prove the local central limit theorem:

Framework: X_1, X_2, ... are iid with mean 0, variance 1 and a bounded continuous density f, and T_n = n^{-1/2}(X_1 + ... + X_n) has density f_n.

We concluded that the characteristic function of T_n is φ_{T_n}(t) = [φ_X(t/n^{1/2})]^n, where φ_X is the characteristic function of a single X_i.

We differentiated φ_X, using φ_X'(0) = iE(X_1) = 0 and φ_X''(0) = -E(X_1²) = -1,

to obtain the expansion φ_X(s) = 1 - s²/2 + o(s²) as s → 0.

It now follows that φ_{T_n}(t) = [1 - t²/(2n) + o(1/n)]^n → e^{-t²/2} for each fixed t.

Apply the Fourier inversion formula to deduce f_n(x) → (2π)^{-1/2} e^{-x²/2}; that is, the density of T_n converges pointwise to the standard normal density.
This proof of the central limit theorem is not terribly general, since it requires T_n to have a bounded continuous density. The usual central limit theorem is a statement about cdfs, not densities, and is the subject of today's lecture.
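As an illustrative sketch, the local limit theorem can be checked by simulation; the standardized Exponential(1) distribution, n = 50 and the number of replications are arbitrary choices, not from the notes.

```python
import numpy as np
from scipy import stats

# Illustrative sketch: estimate the density of T_n = n^{-1/2} * sum(X_i) for
# standardized Exponential(1) variables (mean 0, variance 1) and compare with
# the standard normal density.  n = 50 and 100,000 replications are arbitrary.
rng = np.random.default_rng(0)
n, reps = 50, 100_000
X = rng.exponential(1.0, size=(reps, n)) - 1.0        # mean 0, variance 1
T = X.sum(axis=1) / np.sqrt(n)

hist, edges = np.histogram(T, bins=200, range=(-5, 5), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    f_hat = hist[np.argmin(np.abs(centers - x))]      # histogram estimate of f_n(x)
    print(f"x={x:5.2f}  estimated f_n(x)={f_hat:.4f}  normal density={stats.norm.pdf(x):.4f}")
```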
In undergraduate courses we often teach the central limit theorem as follows: if X_1, ..., X_n are iid from a population with mean μ and standard deviation σ, then the sample mean X̄ has approximately a N(μ, σ²/n) distribution. We also say that a Binomial(n, p) random variable has approximately a N(np, np(1-p)) distribution.
To make precise sense of these assertions we need to assign a meaning to statements like ``X and Y have approximately the same distribution''. The meaning we want to give is that X and Y have nearly the same cdf, but even here we need some care. If n is a large number, is the N(0, 1/n) distribution close to the distribution of the constant 0? Is it close to the N(1/n, 1/n) distribution? Is it close to the N(n^{-1/2}, 1/n) distribution? If X_n ≡ 2^{-n}, is the distribution of X_n close to that of the constant 0?
The answer to these questions depends in part on how close ``close'' needs to be, so it is partly a matter of definition. In practice the usual sort of approximation we want to make is to say that some random variable X, say, has nearly some continuous distribution, like N(0,1). In this case we want to calculate probabilities like P(X > x) and know that this is nearly P(N(0,1) > x). The real difficulty arises in the case of discrete random variables; in this course we will not actually need to approximate a distribution by a discrete distribution.
When mathematicians say two things are close together, they mean one of two things: either there is an upper bound on the distance between the two things, or they are talking about taking a limit. In this course we do the latter.
Definition: A sequence of random variables X_n converges in distribution to a random variable X if P(X_n ≤ x) → P(X ≤ x) at every x where the cdf F_X(x) = P(X ≤ x) is continuous.
Theorem: The following are equivalent:
1. X_n converges in distribution to X.
2. E[g(X_n)] → E[g(X)] for every bounded continuous function g.
3. The characteristic functions converge: E[e^{itX_n}] → E[e^{itX}] for every real t.
Now let's go back to the questions I asked:
Here is the message you are supposed to take away from this discussion. You do distributional approximations by showing that a sequence of random variables X_n converges in distribution to some X. The limit distribution should be non-trivial, like, say, N(0,1). We don't say X_n is approximately N(1/n, 1/n); rather, we say that n^{1/2} X_n converges to N(0,1) in distribution.
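As an illustrative sketch, the definition can be checked numerically for X_n ~ N(1/n, 1/n), which converges in distribution to the constant 0; the evaluation points below are arbitrary choices, not from the notes.

```python
import numpy as np
from scipy import stats

# Illustrative sketch: X_n ~ N(1/n, 1/n) converges in distribution to the constant 0,
# whose cdf is 0 for x < 0 and 1 for x >= 0.  Check P(X_n <= x) at continuity points.
for x in (-0.5, -0.1, 0.1, 0.5):
    limit = 0.0 if x < 0 else 1.0
    for n in (10, 100, 10_000):
        p = stats.norm.cdf(x, loc=1 / n, scale=np.sqrt(1 / n))
        print(f"x={x:5.2f}  n={n:6d}  P(X_n <= x)={p:.4f}  limiting cdf={limit}")
```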
The Central Limit Theorem
If X_1, X_2, ... are iid with mean 0 and variance 1, then n^{-1/2}(X_1 + ... + X_n) converges in distribution to N(0,1). That is,

P(n^{-1/2}(X_1 + ... + X_n) ≤ x) → P(N(0,1) ≤ x) for every real x.
Proof: As before, the characteristic function of T_n = n^{-1/2}(X_1 + ... + X_n) is [φ_X(t/n^{1/2})]^n, which converges to e^{-t²/2}, the characteristic function of the N(0,1) distribution. By the equivalence of convergence in distribution and convergence of characteristic functions in the theorem above, T_n converges in distribution to N(0,1).
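As an illustrative Monte Carlo sketch of this statement, the Uniform(-√3, √3) distribution below has mean 0 and variance 1; n = 30 and the number of replications are arbitrary choices, not from the notes.

```python
import numpy as np
from scipy import stats

# Illustrative sketch: empirical cdf of T_n = n^{-1/2} * sum(X_i) for
# Uniform(-sqrt(3), sqrt(3)) variables (mean 0, variance 1) versus the N(0,1) cdf.
rng = np.random.default_rng(1)
n, reps = 30, 100_000
T = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(reps, n)).sum(axis=1) / np.sqrt(n)
for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"x={x:5.2f}  empirical P(T_n <= x)={np.mean(T <= x):.4f}  Phi(x)={stats.norm.cdf(x):.4f}")
```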
Edgeworth expansions
Suppose that X is a random variable with mean 0, variance 1 and third moment E(X³) = γ.

If φ(t) = E(e^{itX}) is the characteristic function of X, then

φ(t) = 1 - t²/2 - iγt³/6 + o(t³) as t → 0, so that log φ(t) = -t²/2 - iγt³/6 + o(t³).

Now apply this calculation to the characteristic function of n^{1/2} X̄_n, where X̄_n is the mean of a sample of size n. Then

φ_{n^{1/2} X̄_n}(t) = [φ(t/n^{1/2})]^n = exp(-t²/2 - iγt³/(6n^{1/2}) + o(n^{-1/2})) ≈ e^{-t²/2} (1 - iγt³/(6n^{1/2})).

Inverting the correction term by the Fourier inversion formula gives the one-term Edgeworth expansion

P(n^{1/2} X̄_n ≤ x) ≈ Φ(x) - γ (x² - 1) (2π)^{-1/2} e^{-x²/2} / (6 n^{1/2}),

where Φ is the standard normal cdf.
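As an illustrative sketch of how the correction term improves the normal approximation, the example below uses centered Exponential(1) variables, for which γ = 2 and the exact cdf is available because the sum of n Exponential(1) variables is Gamma(n, 1); n = 10 is an arbitrary choice, not from the notes.

```python
import numpy as np
from scipy import stats

# Illustrative sketch: exact cdf of n^{1/2} * Xbar_n for centered Exponential(1) data
# (the sum of the uncentered variables is Gamma(n, 1)), compared with the plain normal
# approximation and with the one-term Edgeworth correction using gamma = E(X^3) = 2.
n, gamma3 = 10, 2.0
for x in (-1.5, -0.5, 0.5, 1.5):
    exact = stats.gamma.cdf(n + x * np.sqrt(n), a=n)        # P(n^{1/2} Xbar_n <= x)
    clt = stats.norm.cdf(x)
    edgeworth = clt - gamma3 * (x**2 - 1) * stats.norm.pdf(x) / (6 * np.sqrt(n))
    print(f"x={x:5.2f}  exact={exact:.4f}  normal={clt:.4f}  Edgeworth={edgeworth:.4f}")
```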
Remarks:
Multivariate convergence in distribution
Definition: A sequence X_n of random vectors in R^p converges in distribution to a random vector X if E[g(X_n)] → E[g(X)] for every bounded continuous function g: R^p → R.
This is equivalent to either of

Cramér-Wold device: a^t X_n converges in distribution to a^t X for each a ∈ R^p,

or

Convergence of characteristic functions: E[e^{i a^t X_n}] → E[e^{i a^t X}] for each a ∈ R^p.
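As an illustrative sketch of the Cramér-Wold device for the multivariate CLT, the bivariate distribution, the covariance matrix Σ and the vectors a below are arbitrary choices, not from the notes.

```python
import numpy as np
from scipy import stats

# Illustrative sketch: for iid mean-0 bivariate vectors with covariance Sigma, the
# scaled sum Z_n = n^{-1/2} * sum(X_i) is approximately N(0, Sigma), so a^t Z_n is
# approximately N(0, a^t Sigma a) for every vector a (Cramer-Wold).
rng = np.random.default_rng(2)
n, reps = 50, 50_000
Sigma = np.array([[1.0, 0.5], [0.5, 2.0]])
L = np.linalg.cholesky(Sigma)
X = rng.standard_exponential((reps, n, 2)) - 1.0      # independent mean-0, variance-1 coordinates
X = X @ L.T                                           # now each vector has covariance Sigma
Z = X.sum(axis=1) / np.sqrt(n)
for a in (np.array([1.0, 0.0]), np.array([1.0, -1.0]), np.array([0.3, 2.0])):
    sd = np.sqrt(a @ Sigma @ a)
    print(f"a={a}  empirical P(a'Z_n <= sd)={np.mean(Z @ a <= sd):.4f}  Phi(1)={stats.norm.cdf(1):.4f}")
```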
Extensions of the CLT
Slutsky's Theorem: If X_n converges in distribution to X and Y_n converges in distribution (or in probability) to c, a constant, then X_n + Y_n converges in distribution to X + c.
Warning: the hypothesis that the limit of Y_n be constant is essential. For example, if X_n = X ~ N(0,1) and Y_n = -X for every n, then X_n and Y_n each converge in distribution to N(0,1), but X_n + Y_n ≡ 0, which is not N(0,2).
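As an illustrative sketch of Slutsky's theorem, the constant c = 2, the distributions, n and the number of replications below are arbitrary choices, not from the notes.

```python
import numpy as np
from scipy import stats

# Illustrative sketch of Slutsky's theorem: X_n is approximately N(0,1) by the CLT and
# Y_n (a sample mean of Exponential(2) variables) converges in probability to c = 2,
# so X_n + Y_n should be approximately N(2, 1).
rng = np.random.default_rng(3)
n, reps, c = 100, 50_000, 2.0
X = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(reps, n)).sum(axis=1) / np.sqrt(n)
Y = rng.exponential(c, size=(reps, n)).mean(axis=1)   # converges in probability to c
for x in (0.0, 1.0, 2.0, 3.0, 4.0):
    print(f"x={x:4.1f}  empirical P(X_n + Y_n <= x)={np.mean(X + Y <= x):.4f}  "
          f"N(2,1) cdf={stats.norm.cdf(x, loc=c):.4f}")
```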
The delta method: Suppose a sequence Y_n of random variables converges to a constant y, and that if we define X_n = a_n(Y_n - y), with a_n → ∞, then X_n converges in distribution to some random variable X. Suppose that f is a function differentiable on the range of Y_n (at least at y). Then a_n(f(Y_n) - f(y)) converges in distribution to f'(y)X. If X_n is in R^p and f maps R^p to R^q, then f'(y) is the q×p matrix of first partial derivatives of the components of f, and the limit is the vector f'(y)X.
Example: Suppose X_1, ..., X_n are a sample from a population with mean μ, variance σ², and third and fourth central moments μ_3 and μ_4. Then

n^{1/2}(s² - σ²) converges in distribution to N(0, μ_4 - σ⁴),

where s² = n^{-1} Σ (X_i - X̄)².

Take Y_n = (X̄, n^{-1} Σ X_i²). Then Y_n converges (in probability) to y = (μ, σ² + μ²).

Take a_n = n^{1/2}. Then a_n(Y_n - y) converges in distribution to a bivariate normal vector with mean 0 and covariance matrix

Σ = [ σ²            μ_3 + 2μσ²                 ]
    [ μ_3 + 2μσ²    μ_4 - σ⁴ + 4μμ_3 + 4μ²σ²   ],

the covariance matrix of (X_1, X_1²). Writing s² = f(Y_n) with f(a, b) = b - a², so that f(y) = σ² and the matrix of derivatives is f'(y) = (-2μ, 1), the delta method gives n^{1/2}(s² - σ²) converging in distribution to N(0, f'(y) Σ f'(y)^t) = N(0, μ_4 - σ⁴).
Remark: In this sort of problem it is best to learn to recognize that the sample variance is unaffected by subtracting μ from each X_i. Thus there is no loss in assuming μ = 0, which simplifies Σ and f'(y).
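As an illustrative sketch, the limit in this example can be checked by simulation using Exponential(1) data, for which σ² = 1 and μ_4 = 9; n = 200 and the number of replications are arbitrary choices, not from the notes.

```python
import numpy as np
from scipy import stats

# Illustrative sketch: for Exponential(1) data, sigma^2 = 1 and mu_4 = 9, so
# n^{1/2}(s^2 - sigma^2) should be approximately N(0, mu_4 - sigma^4) = N(0, 8).
rng = np.random.default_rng(4)
n, reps = 200, 50_000
X = rng.exponential(1.0, size=(reps, n))
s2 = X.var(axis=1)                                    # divisor n, i.e. mean((X - Xbar)^2)
Z = np.sqrt(n) * (s2 - 1.0)
print(f"simulated sd of Z: {Z.std():.3f}   theoretical sd: {np.sqrt(8):.3f}")
for x in (-2.0, 0.0, 2.0):
    print(f"x={x:4.1f}  empirical P(Z <= x)={np.mean(Z <= x):.4f}  "
          f"N(0,8) cdf={stats.norm.cdf(x, scale=np.sqrt(8)):.4f}")
```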