
STAT 801 Lecture 4

Reading for Today's Lecture:

Goals of Today's Lecture:

Review of end of last time

Theorem 1   Suppose $X_1,\ldots,X_n$ are independent $N(\mu,\sigma^2)$ rvs. Then

1.
$\bar X$ and $s^2$ are independent.

2.
$n^{1/2}(\bar{X} - \mu)/\sigma \sim N(0,1)$

3.
$(n-1)s^2/\sigma^2 \sim \chi^2_{n-1}$

4.
$n^{1/2}(\bar{X} - \mu)/s \sim t_{n-1}$
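Before the proof, here is a small Monte Carlo sketch (not part of the original notes) that checks the four claims numerically. It assumes numpy and scipy are available; the choices $\mu=2$, $\sigma=3$, $n=10$ and the number of replications are arbitrary.

\begin{verbatim}
# Monte Carlo check of Theorem 1 (illustration only; parameters are arbitrary).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
mu, sigma, n, reps = 2.0, 3.0, 10, 5000

X = rng.normal(mu, sigma, size=(reps, n))
xbar = X.mean(axis=1)
s2 = X.var(axis=1, ddof=1)            # sample variance with divisor n-1

# 1. xbar and s^2 independent: their correlation should be near 0.
print("corr(xbar, s2):", np.corrcoef(xbar, s2)[0, 1])
# 2. sqrt(n)(xbar - mu)/sigma should look N(0,1).
print(stats.kstest(np.sqrt(n) * (xbar - mu) / sigma, "norm"))
# 3. (n-1)s^2/sigma^2 should look chi-squared with n-1 df.
print(stats.kstest((n - 1) * s2 / sigma ** 2, "chi2", args=(n - 1,)))
# 4. sqrt(n)(xbar - mu)/s should look t with n-1 df.
print(stats.kstest(np.sqrt(n) * (xbar - mu) / np.sqrt(s2), "t", args=(n - 1,)))
\end{verbatim}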

Proof: Reduces to $\mu=0$ and $\sigma=1$.

Step 1: Define

\begin{displaymath}Y=(\sqrt{n}\bar{Z}, Z_1-\bar{Z},\ldots,Z_{n-1}-\bar{Z}) =MZ
\end{displaymath}

for suitable M. Then
\begin{align*}f_Y(y) &= (2\pi)^{-n/2} \exp[-y^t\Sigma^{-1}y/2]/\vert\det M\vert
\\
&= \frac{(2\pi)^{-1/2}e^{-y_1^2/2}\,(2\pi)^{-(n-1)/2}\exp[-{\bf y}_2^t Q^{-1} {\bf y}_2/2]}{\vert\det M\vert}
\end{align*}
where $\Sigma = MM^t$ and ${\bf y}_2 = (y_2,\ldots,y_n)^t$.

Notice this is a function of $y_1$ times a function of $y_2, \ldots,y_n$. Thus $\sqrt{n}\bar{Z}$ is independent of $Z_1-\bar{Z},\ldots,Z_{n-1}-\bar{Z}$. Since $s_Z^2$ is a function of $Z_1-\bar{Z},\ldots,Z_{n-1}-\bar{Z}$ we see that $\sqrt{n}\bar{Z}$ and $s_Z^2$ are independent (remember that $Z_n-\bar{Z} = -\sum_1^{n-1} (Z_i-\bar{Z})$).

Also: the density of $Y_1$ is a multiple of the function of $y_1$ in the factorization above. But this is the standard normal density so $\sqrt{n}\bar{Z}\sim N(0,1)$.

First 2 parts of theorem done. Third part is homework exercise but I will outline the derivation of the $\chi^2$ density.

Suppose $Z_1,\ldots,Z_n$ are independent N(0,1). Define the $\chi^2_n$ distribution to be that of $U=Z_1^2 + \cdots + Z_n^2$. Define angles $\theta_1,\ldots,\theta_{n-1}$ by
\begin{align*}Z_1 &= U^{1/2} \cos\theta_1
\\
Z_2 &= U^{1/2} \sin\theta_1\cos\theta_2
\\
&\;\;\vdots
\\
Z_{n-1} &= U^{1/2} \sin\theta_1\cdots\sin\theta_{n-2}\cos\theta_{n-1}
\\
Z_n &= U^{1/2} \sin\theta_1\cdots \sin\theta_{n-1}
\end{align*}
(Spherical co-ordinates in n dimensions. The $\theta$ values run from 0 to $\pi$ except for the last $\theta$ whose values run from 0 to $2\pi$.) Derivative formulas:

\begin{displaymath}\frac{\partial Z_i}{\partial U} = \frac{1}{2U} Z_i
\end{displaymath}

and

\begin{displaymath}\frac{\partial Z_i}{\partial\theta_j} =
\left\{ \begin{array}{ll}
0 & j > i
\\
-Z_i\tan\theta_i & j=i
\\
Z_i\cot\theta_j & j < i
\end{array}\right.
\end{displaymath}

Fix n=3 to clarify the formulas. The matrix of partial derivatives is, denoting $R=U^{1/2}$,

\begin{displaymath}\left[\begin{array}{ccc}
\frac{\cos\theta_1}{2R} & -R \sin\theta_1 & 0
\\
\frac{\sin\theta_1\cos\theta_2}{2R} & R \cos\theta_1\cos\theta_2 & -R \sin\theta_1\sin\theta_2
\\
\frac{\sin\theta_1\sin\theta_2}{2R} & R \cos\theta_1\sin\theta_2 & R \sin\theta_1\cos\theta_2
\end{array}\right]
\end{displaymath}

The determinant of this matrix may be found by column reduction: adding suitable multiples of earlier columns to later ones (for instance $2U\tan\theta_1$ times column 1 to column 2) doesn't change the determinant and produces a lower triangular matrix with diagonal entries $U^{-1/2} \cos\theta_1 /2$, $U^{1/2}\cos\theta_2/ \cos\theta_1$ and $U^{1/2} \sin\theta_1/\cos\theta_2$. We multiply these together to get

\begin{displaymath}U^{1/2}\sin(\theta_1)/2
\end{displaymath}

which is non-negative for all $U$ and $\theta_1$. For general $n$ we see that every term in the first column contains a factor $U^{-1/2}/2$ while every other entry has a factor $U^{1/2}$. Multiplying a column in a matrix by $c$ multiplies the determinant by $c$, so the Jacobian of the transformation is $u^{(n-2)/2}/2$ times some function, say $h$, which depends only on the angles. Thus the joint density of $U,\theta_1,\ldots, \theta_{n-1}$ is

\begin{displaymath}(2\pi)^{-n/2} \exp(-u/2) u^{(n-2)/2}h(\theta_1, \cdots, \theta_{n-1}) / 2
\end{displaymath}
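As a quick sanity check on the $n=3$ Jacobian computed above (an illustration, not part of the notes, assuming numpy is available), the following sketch evaluates the matrix of partials at an arbitrary point and compares its determinant with $U^{1/2}\sin\theta_1/2$.

\begin{verbatim}
# Numeric check of the n=3 Jacobian determinant (illustration only).
import numpy as np

u, th1, th2 = 2.5, 0.7, 1.2        # arbitrary point with u > 0, 0 < th1 < pi
r = np.sqrt(u)

# Matrix of partials of (Z1, Z2, Z3) with respect to (U, theta_1, theta_2).
J = np.array([
    [np.cos(th1) / (2 * r), -r * np.sin(th1), 0.0],
    [np.sin(th1) * np.cos(th2) / (2 * r),
     r * np.cos(th1) * np.cos(th2), -r * np.sin(th1) * np.sin(th2)],
    [np.sin(th1) * np.sin(th2) / (2 * r),
     r * np.cos(th1) * np.sin(th2), r * np.sin(th1) * np.cos(th2)],
])

print(np.linalg.det(J))              # equals ...
print(np.sqrt(u) * np.sin(th1) / 2)  # ... U^(1/2) sin(theta_1)/2
\end{verbatim}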

To compute the density of U we must do an n-1 dimensional multiple integral $d\theta_{n-1}\cdots d\theta_1$. We see that the answer has the form

\begin{displaymath}cu^{(n-2)/2} \exp(-u/2)
\end{displaymath}

for some $c$ which we can evaluate by requiring

\begin{displaymath}\int f_U(u) du = c \int u^{(n-2)/2} \exp(-u/2) du =1
\end{displaymath}

Substitute y=u/2, du=2dy to see that
\begin{align*}c 2^{(n-2)/2} 2 \int y^{(n-2)/2}e^{-y} dy & = c\, 2^{n/2} \Gamma(n/2)
\\
& = 1
\end{align*}
so that the $\chi^2$ density is

\begin{displaymath}\frac{1}{2\Gamma(n/2)} \left(\frac{u}{2}\right)^{(n-2)/2} e^{-u/2}
\end{displaymath}
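A numerical check of this density (illustration only, assuming numpy is available): simulate sums of $n$ squared standard normals and compare a histogram with the formula just derived; $n=5$ and the bin count are arbitrary choices.

\begin{verbatim}
# Compare the derived chi-squared density with simulated sums of squared
# standard normals (illustration only; n and the bin count are arbitrary).
import numpy as np
from math import gamma

n, reps = 5, 200000
rng = np.random.default_rng(1)
U = (rng.standard_normal((reps, n)) ** 2).sum(axis=1)

def chi2_density(u, n):
    # density derived above: (u/2)^((n-2)/2) exp(-u/2) / (2 Gamma(n/2))
    return (u / 2) ** ((n - 2) / 2) * np.exp(-u / 2) / (2 * gamma(n / 2))

hist, edges = np.histogram(U, bins=50, density=True)
mids = 0.5 * (edges[:-1] + edges[1:])
print(np.max(np.abs(hist - chi2_density(mids, n))))  # small (Monte Carlo error)
\end{verbatim}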

Fourth part of theorem is consequence of first 3 parts of the theorem and definition of the $t_\nu$ distribution: $T\sim t_\nu$ if it has the same distribution as

\begin{displaymath}Z/\sqrt{U/\nu}
\end{displaymath}

where $Z\sim N(0,1)$, $U\sim\chi^2_\nu$ and Z and U are independent.

Derive density of T in this definition:
\begin{align*}P(T \le t) &= P( Z \le t\sqrt{U/\nu})
\\
& =
\int_0^\infty \int_{-\infty}^{t\sqrt{u/\nu}} f_Z(z)f_U(u) dz du
\end{align*}
Differentiate wrt t by differentiating inner integral:

\begin{displaymath}\frac{\partial}{\partial t}\int_{at}^{bt} f(x)dx
=
bf(bt)-af(at)
\end{displaymath}

by fundamental thm of calculus. Hence
\begin{align*}\frac{d}{dt} P(T \le t) =&
\int_0^\infty f_U(u) \sqrt{u/\nu}
\\
& \times \frac{\exp[-t^2u/(2\nu)]}{\sqrt{2\pi}} du
\, .
\end{align*}
Plug in

\begin{displaymath}f_U(u)= \frac{1}{2\Gamma(\nu/2)}(u/2)^{(\nu-2)/2} e^{-u/2}
\end{displaymath}

to get
\begin{align*}f_T(t) = & \int_0^\infty \frac{1}{2\sqrt{\pi\nu}\Gamma(\nu/2)}
\\
&\times (u/2)^{(\nu-1)/2} \exp[-u(1+t^2/\nu)/2] du
\end{align*}
Substitute $y=u(1+t^2/\nu)/2$. Then $dy=(1+t^2/\nu)du/2$ and

\begin{displaymath}(u/2)^{(\nu-1)/2}= [y/(1+t^2/\nu)]^{(\nu-1)/2}\end{displaymath}

so
\begin{align*}f_T(t) = & \frac{1}{\sqrt{\pi\nu}\Gamma(\nu/2)}(1+t^2/\nu)^{-(\nu+1)/2}
\\
& \times \int_0^\infty y^{(\nu-1)/2} e^{-y} dy
\end{align*}
or

\begin{displaymath}f_T(t)= \frac{\Gamma((\nu+1)/2)}{\sqrt{\pi\nu}\Gamma(\nu/2)(1+t^2/\nu)^{(\nu+1)/2}} \, .
\end{displaymath}
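Again a numerical check (illustration only, assuming numpy is available): simulate $Z/\sqrt{U/\nu}$ directly and compare a histogram with the density just derived; $\nu=4$ is an arbitrary choice.

\begin{verbatim}
# Compare the derived t density with simulated values of Z/sqrt(U/nu)
# (illustration only; nu = 4 is an arbitrary choice).
import numpy as np
from math import gamma, sqrt, pi

nu, reps = 4, 200000
rng = np.random.default_rng(2)
Z = rng.standard_normal(reps)
U = (rng.standard_normal((reps, nu)) ** 2).sum(axis=1)   # chi-squared, nu df
T = Z / np.sqrt(U / nu)

def t_density(t, nu):
    # density derived above
    return gamma((nu + 1) / 2) / (
        sqrt(pi * nu) * gamma(nu / 2) * (1 + t ** 2 / nu) ** ((nu + 1) / 2))

hist, edges = np.histogram(T, bins=100, range=(-6, 6), density=True)
mids = 0.5 * (edges[:-1] + edges[1:])
print(np.max(np.abs(hist - t_density(mids, nu))))  # small (Monte Carlo error)
\end{verbatim}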

Expectation, moments

Two elementary definitions of expected values:

Def'n If X has density f then

\begin{displaymath}E(g(X)) = \int g(x)f(x)\, dx \,.
\end{displaymath}

Def'n: If X has discrete density f then

\begin{displaymath}E(g(X)) = \sum_x g(x)f(x) \,.
\end{displaymath}

If Y=g(X) for a smooth g
\begin{align*}E(Y) & = \int y f_Y(y) \, dy
\\
&= \int g(x) f_Y(g(x)) g^\prime(x) \, dx
\\
&= E(g(X))
\end{align*}
by the change of variables formula for integration. This is good because otherwise we might have two different values for $E(e^X)$.
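A small numerical illustration of this consistency (not part of the notes, assuming numpy and scipy are available): take $X\sim N(0,1)$ and $g(x)=e^x$, and compute $E(e^X)$ both ways; the true value is $e^{1/2}$.

\begin{verbatim}
# E(e^X) for X ~ N(0,1) computed two ways (illustration only): directly as
# E(g(X)) and via the density of Y = e^X after the change of variables y = g(x).
import numpy as np
from scipy import integrate

f_X = lambda x: np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)  # density of X
g = np.exp                                                 # Y = g(X) = e^X
f_Y = lambda y: f_X(np.log(y)) / y                         # density of Y

E_gX, _ = integrate.quad(lambda x: g(x) * f_X(x), -np.inf, np.inf)
# integral of y f_Y(y) dy rewritten with y = g(x), dy = g'(x) dx = e^x dx:
E_Y, _ = integrate.quad(lambda x: g(x) * f_Y(g(x)) * g(x), -np.inf, np.inf)
print(E_gX, E_Y, np.exp(0.5))  # all three agree
\end{verbatim}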

In general, there are random variables which are neither absolutely continuous nor discrete. Here's how probabilists define E in general.

Def'n: RV X is simple if we can write

\begin{displaymath}X(\omega)= \sum_1^n a_i 1(\omega\in A_i)
\end{displaymath}

for some constants $a_1,\ldots,a_n$ and events $A_i$.

Def'n: For a simple rv X define

\begin{displaymath}E(X) = \sum a_i P(A_i)
\end{displaymath}

For positive random variables which are not simple we extend our definition by approximation:

Def'n: If $X \ge 0$ then

\begin{displaymath}E(X) = \sup\{E(Y): 0 \le Y \le X, Y \mbox{ simple}\}
\end{displaymath}
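A numerical sketch of this approximation (illustration only, assuming numpy is available): for $X$ exponential with mean 1, round $X$ down to multiples of $2^{-n}$ and truncate at $n$; the expected values of the resulting simple rvs increase toward $E(X)=1$.

\begin{verbatim}
# Simple-rv approximation of E(X) for X >= 0 (illustration only): X is
# exponential with mean 1, rounded down to multiples of 2^-n, truncated at n.
import numpy as np

def E_simple_approx(cdf, n):
    # E of the simple rv sum_k (k/2^n) 1(k/2^n <= X < (k+1)/2^n), k < n 2^n
    k = np.arange(0, n * 2 ** n)
    lower, upper = k / 2 ** n, (k + 1) / 2 ** n
    return np.sum(lower * (cdf(upper) - cdf(lower)))

exp_cdf = lambda x: 1 - np.exp(-x)          # Exp(1) cdf, so E(X) = 1
for n in (1, 2, 5, 10):
    print(n, E_simple_approx(exp_cdf, n))   # increases toward 1
\end{verbatim}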

Def'n: We call X integrable if

\begin{displaymath}E(\vert X\vert) < \infty \, .
\end{displaymath}

In this case we define

\begin{displaymath}E(X) = E(\max(X,0)) -E(\max(-X,0))
\end{displaymath}

Facts: E is a linear, monotone, positive operator:

1.
Linear: E(aX+bY) = aE(X)+bE(Y) provided X and Y are integrable.

2.
Positive: $P(X \ge 0) = 1$ implies $E(X) \ge 0$.

3.
Monotone: $P(X \ge Y)=1$ and X, Y integrable implies $E(X) \ge E(Y)$.

Major technical theorems:

Monotone Convergence: If $ 0 \le X_1 \le X_2 \le \cdots$ and $X= \lim X_n$ (which has to exist) then

\begin{displaymath}E(X) = \lim_{n\to \infty} E(X_n)
\end{displaymath}

Dominated Convergence: If $\vert X_n\vert \le Y_n$ and $\exists$ rv $X$ such that $X_n \to X$ (technical details of this convergence later in the course) and a random variable $Y$ such that $Y_n \to Y$ with $E(Y_n) \to E(Y) < \infty$ then

\begin{displaymath}E(X_n) \to E(X)
\end{displaymath}

Often used with all $Y_n$ the same rv $Y$.

Fatou's Lemma: If $X_n \ge 0$ then

\begin{displaymath}E(\liminf X_n) \le \liminf E(X_n)
\end{displaymath}

Theorem: With this definition of E, if X has density f(x) (even in $R^p$, say) and Y=g(X) then

\begin{displaymath}E(Y) = \int g(x) f(x) dx \, .
\end{displaymath}

(Could be a multiple integral.) If X has pmf f then

\begin{displaymath}E(Y) =\sum_x g(x) f(x) \, .
\end{displaymath}

Works, e.g., even if X has a density but Y doesn't.

Def'n: The $r^{\rm th}$ moment (about the origin) of a real rv X is $\mu_r^\prime=E(X^r)$ (provided it exists). We generally use $\mu$ for E(X). The $r^{\rm th}$ central moment is

\begin{displaymath}\mu_r = E[(X-\mu)^r]
\end{displaymath}

We call $\sigma^2 = \mu_2$ the variance.

Def'n: For an $R^p$ valued random vector X we define $\mu_X = E(X) $ to be the vector whose $i^{\rm th}$ entry is $E(X_i)$ (provided all entries exist).

Def'n: The ( $p \times p$) variance covariance matrix of X is

\begin{displaymath}Var(X) = E\left[ (X-\mu)(X-\mu)^t \right]
\end{displaymath}

which exists provided each component Xi has a finite second moment.

Moments and probabilities of rare events are closely connected as will be seen in a number of important probability theorems. Here is one version of Markov's inequality (one case is Chebyshev's inequality):
\begin{align*}P(\vert X-\mu\vert \ge t ) &= E[1(\vert X-\mu\vert \ge t)]
\\
&\le E\left[\frac{\vert X-\mu\vert^r}{t^r} 1(\vert X-\mu\vert \ge t)\right]
\\
& \le \frac{E[\vert X-\mu\vert^r]}{t^r}
\end{align*}
The intuition is that if moments are small then large deviations from average are unlikely.
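A quick numerical illustration of the inequality (not part of the notes, assuming numpy is available); the exponential distribution, $r=2$ (the Chebyshev case) and $t=3$ are arbitrary choices.

\begin{verbatim}
# Numeric illustration of P(|X - mu| >= t) <= E|X - mu|^r / t^r
# (illustration only; exponential X, r = 2 and t = 3 are arbitrary).
import numpy as np

rng = np.random.default_rng(3)
X = rng.exponential(scale=1.0, size=1000000)   # mean 1, variance 1
mu, t, r = 1.0, 3.0, 2

lhs = np.mean(np.abs(X - mu) >= t)
rhs = np.mean(np.abs(X - mu) ** r) / t ** r
print(lhs, "<=", rhs)
\end{verbatim}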

Example moments: If Z is standard normal then
\begin{align*}E(Z) & = \int_{-\infty}^\infty z e^{-z^2/2} dz /\sqrt{2\pi}
\\
&= \left.\frac{-e^{-z^2/2}}{\sqrt{2\pi}}\right\vert _{-\infty}^\infty
\\
& = 0
\end{align*}
and (integrating by parts)
\begin{align*}E(Z^r) = & \int_{-\infty}^\infty z^r e^{-z^2/2} dz /\sqrt{2\pi}
\\
= & \left.\frac{-z^{r-1}e^{-z^2/2}}{\sqrt{2\pi}}\right\vert _{-\infty}^\infty
\\
& + (r-1) \int_{-\infty}^\infty z^{r-2} e^{-z^2/2} dz /\sqrt{2\pi}
\end{align*}
so that

\begin{displaymath}\mu_r = (r-1)\mu_{r-2}
\end{displaymath}

for $r \ge 2$. Remembering that $\mu_1=0$ and

\begin{displaymath}\mu_0 = \int_{-\infty}^\infty z^0 e^{-z^2/2} dz /\sqrt{2\pi}=1
\end{displaymath}

we find that

\begin{displaymath}\mu_r = \left\{ \begin{array}{ll}
0 & \mbox{$r$ odd}
\\
(r-1)(r-3)\cdots 1 & \mbox{$r$ even}
\end{array}\right.
\end{displaymath}
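A numerical check of the recursion and the resulting formula (illustration only, assuming numpy and scipy are available): compute $\mu_r$ by quadrature and compare with $(r-1)\mu_{r-2}$.

\begin{verbatim}
# Check mu_r = (r-1) mu_{r-2} for the standard normal by quadrature
# (illustration only).
import numpy as np
from scipy import integrate

def mu(r):
    val, _ = integrate.quad(
        lambda z: z ** r * np.exp(-z ** 2 / 2) / np.sqrt(2 * np.pi),
        -np.inf, np.inf)
    return val

for r in range(2, 9):
    print(r, mu(r), (r - 1) * mu(r - 2))   # the two values agree
\end{verbatim}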

If now $X\sim N(\mu,\sigma^2)$, that is, $X\sim \sigma Z + \mu$, then $E(X) = \sigma E(Z) + \mu = \mu$ and

\begin{displaymath}\mu_r(X) = E[(X-\mu)^r] = \sigma^r E(Z^r)
\end{displaymath}

In particular, we see that our choice of notation $N(\mu,\sigma^2)$ for the distribution of $\sigma Z + \mu$ is justified; $\sigma^2$ is indeed the variance.

Moments and independence

Theorem: If $X_1,\ldots,X_p$ are independent and each $X_i$ is integrable then $X=X_1\cdots X_p$ is integrable and

\begin{displaymath}E(X_1\cdots X_p) = E(X_1) \cdots E(X_p)
\end{displaymath}

Proof: Suppose each $X_i$ is simple:

\begin{displaymath}X_i = \sum_j x_{ij} 1(X_i =x_{ij})\end{displaymath}

where the $x_{ij}$ are the possible values of $X_i$. Then
\begin{align*}E(X_1\cdots X_p) = & \sum_{j_1\ldots j_p} x_{1j_1}\cdots x_{pj_p} P(X_1=x_{1j_1},\ldots,X_p=x_{pj_p})
\\
= & \sum_{j_1} x_{1j_1} P(X_1 = x_{1j_1}) \times \cdots
\\
& \times \sum_{j_p} x_{pj_p} P(X_p = x_{pj_p})
\\
= & \prod E(X_i)
\end{align*}
The middle equality uses independence to factor the joint probabilities. General $X_i>0$: round $X_i$ down to the nearest multiple of $2^{-n}$ (to a maximum of $n$); apply the case just done and the monotone convergence theorem. For the general case write each $X_i$ as the difference of positive and negative parts:

\begin{displaymath}X_i = \max(X_i,0) -\max(-X_i,0)
\end{displaymath}
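A Monte Carlo illustration of the theorem (not part of the notes, assuming numpy is available); the three factor distributions are arbitrary choices.

\begin{verbatim}
# Monte Carlo illustration of E(X1 X2 X3) = E(X1) E(X2) E(X3) for independent
# factors (distributions chosen arbitrarily; illustration only).
import numpy as np

rng = np.random.default_rng(4)
reps = 1000000
X1 = rng.normal(2.0, 1.0, reps)        # mean 2
X2 = rng.exponential(3.0, reps)        # mean 3
X3 = rng.uniform(0.0, 1.0, reps)       # mean 1/2

print(np.mean(X1 * X2 * X3))                    # Monte Carlo E(X1 X2 X3)
print(np.mean(X1) * np.mean(X2) * np.mean(X3))  # product of the means
print(2.0 * 3.0 * 0.5)                          # exact value
\end{verbatim}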

Moment Generating Functions

Def'n: The moment generating function of a real valued X is

$M_X(t) = E(e^{tX})$

defined for those real t for which the expected value is finite.

Def'n: The moment generating function of $X\in R^p$ is

$M_X(u) = E[e^{u^t X}]$

defined for those vectors u for which the expected value is finite.

Formal connection to moments:
\begin{align*}M_X(t) & = \sum_{k=0}^\infty E[(tX)^k]/k!
\\
& = \sum_{k=0}^\infty \mu_k^\prime t^k/k!
\end{align*}
Sometimes we can find a power series expansion of $M_X$ and read off the moments of X from the coefficients of $t^k/k!$.

Theorem: If M is finite for all $t \in [-\epsilon,\epsilon]$ for some $\epsilon > 0$ then

1.
Every moment of X is finite.

2.
M is $C^\infty$ (in fact M is analytic).

3.
$\mu_k^\prime = \frac{d^k}{dt^k} M_X(0)$.

The proof, and many other facts about mgfs, rely on techniques of complex variables.
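As an illustration of part 3 (not part of the notes, assuming sympy is available), differentiate the standard normal mgf $M(t)=\exp(t^2/2)$ at $t=0$ and recover the moments $0,1,0,3,0,15$ computed earlier.

\begin{verbatim}
# Moments from mgf derivatives at t = 0, using the N(0,1) mgf exp(t^2/2)
# (illustration only; sympy does the differentiation).
import sympy as sp

t = sp.symbols('t')
M = sp.exp(t ** 2 / 2)                     # mgf of a standard normal
for k in range(1, 7):
    print(k, sp.diff(M, t, k).subs(t, 0))  # 0, 1, 0, 3, 0, 15 = E(Z^k)
\end{verbatim}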


Richard Lockhart
2000-01-13