
STAT 801 Lecture 1

Course outline

Reading for Today's Lecture: Chapter 1 of Casella and Berger.

Goals of Today's Lecture:

Course outline:

Statistics versus Probability

Standard view of scientific inference has a set of theories which make predictions about the outcomes of an experiment:

Theory   Prediction
A        1
B        2
C        3

Conduct experiment, see outcome 2: we infer that Theory B is correct (or at least that A and C are wrong).

Add Randomness

Theory   Prediction
A        Usually 1, sometimes 2, never 3
B        Usually 2, sometimes 1, never 3
C        Usually 3, sometimes 1, never 2

See outcome 2: infer that Theory B is probably correct, Theory A is probably not correct, and Theory C is wrong.

Probability Theory: construct the table; compute the likely outcomes of experiments.

Statistics: the inverse process. Use the table to draw inferences from the outcome of an experiment. How should we do it, and how wrong are our inferences likely to be?
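A minimal sketch in Python of this inverse reading of the table; the numerical probabilities standing in for "usually", "sometimes" and "never" are hypothetical:

```python
# Hypothetical outcome probabilities for each theory; each row gives the
# probabilities of observing outcomes 1, 2, 3 under that theory.
table = {
    "A": [0.7, 0.3, 0.0],  # usually 1, sometimes 2, never 3
    "B": [0.3, 0.7, 0.0],  # usually 2, sometimes 1, never 3
    "C": [0.3, 0.0, 0.7],  # usually 3, sometimes 1, never 2
}

observed = 2  # we ran the experiment and saw outcome 2

# Probability theory fills in the table; statistics reads it backwards:
# compare how likely the observed outcome is under each theory.
likelihoods = {theory: probs[observed - 1] for theory, probs in table.items()}
best = max(likelihoods, key=likelihoods.get)
print(likelihoods)
print(best)  # Theory B assigns the highest probability to outcome 2
```

Theory C assigns probability 0 to the observed outcome, so it is ruled out; A and B remain possible, with B favoured.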

Probability Definitions

Probability Space: an ordered triple $(\Omega, {\cal F}, P)$, where $\Omega$ is the sample space (the set of possible outcomes), ${\cal F}$ is a $\sigma$-field of subsets of $\Omega$ (the events), and $P$ is a probability measure on ${\cal F}$.

The axioms guarantee that we can compute probabilities by the usual rules, including approximation (passing to limits), without fear of contradiction.

Vector valued random variable: a function $X:\Omega\mapsto R^p$ with the property that, writing $X=(X_1,\ldots,X_p)$,

\begin{displaymath}P(X_1 \le x_1, \ldots , X_p \le x_p)
\end{displaymath}

makes sense for any constants $(x_1,\ldots,x_p)$. Formally the notation

\begin{displaymath}X_1 \le x_1, \ldots , X_p \le x_p
\end{displaymath}

is shorthand for a subset of $\Omega$, that is, an event:

\begin{displaymath}\left\{\omega\in\Omega: X_1(\omega) \le x_1, \ldots , X_p (\omega) \le x_p \right\}
\end{displaymath}

Remember $X$ is a function on $\Omega$, so $X_1$ is also a function on $\Omega$.

In almost all of probability and statistics the dependence of a random variable on a point in the probability space is hidden! You almost always see X not $X(\omega)$.

Now for formal definitions:

Borel $\sigma$-field in $R^p$: the smallest $\sigma$-field in $R^p$ containing every open ball.

Every common set is a Borel set, that is, in the Borel $\sigma$-field.

An Rp valued random variable is a map $X:\Omega\mapsto R^p$ such that when A is Borel then $\{\omega\in\Omega:X(\omega)\in A\} \in \cal F$.

Fact: this is equivalent to

\begin{displaymath}\left\{
\omega\in\Omega: X_1(\omega) \le x_1, \ldots , X_p (\omega) \le x_p
\right\}
\in \cal F
\end{displaymath}

for all $(x_1,\ldots,x_p)\in R^p$.

Jargon and notation: we write $P(X\in A)$ for $P(\{\omega\in\Omega:X(\omega)\in
A\})$ and define the distribution of X to be the map

\begin{displaymath}A\mapsto P(X\in A)
\end{displaymath}

which is a probability on the set Rp with the Borel $\sigma$-field rather than the original $\Omega$ and $\cal F$.

Cumulative Distribution Function (or CDF) of X: the function $F_X$ on $R^p$ defined by

\begin{displaymath}F_X(x_1,\ldots, x_p) =
P(X_1 \le x_1, \ldots , X_p \le x_p)
\end{displaymath}

Properties of $F_X$ (or just $F$ when there's only one CDF under consideration) for p=1:

1.
$0 \le F(x) \le 1$.

2.
$ x> y \Rightarrow F(x) \ge F(y)$ (F is monotone non-decreasing).

3.
$\lim_{x\to - \infty} F(x) = 0$

4.
$\lim_{x\to \infty} F(x) = 1$

5.
$\lim_{x\searrow y} F(x) = F(y)$ (F is right continuous).

6.
$\lim_{x\nearrow y} F(x) \equiv F(y-)$ exists.

7.
$F(x)-F(x-) = P(X=x)$.

8.
$F_X(t) = F_Y(t)$ for all t implies that X and Y have the same distribution, that is, $P(X\in A) = P(Y\in A)$ for any (Borel) set A.
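These properties can be checked numerically for a concrete CDF; a quick sketch using the standard exponential $F(x)=1-e^{-x}$:

```python
import math

def F(x):
    """CDF of the standard exponential distribution."""
    return 1 - math.exp(-x) if x > 0 else 0.0

xs = [i / 10 for i in range(-50, 51)]

# 1. F takes values in [0, 1].
assert all(0 <= F(x) <= 1 for x in xs)

# 2. F is monotone non-decreasing.
assert all(F(a) <= F(b) for a, b in zip(xs, xs[1:]))

# 3. and 4. Limits at minus and plus infinity.
assert F(-100) == 0.0
assert abs(F(100) - 1) < 1e-12

# 7. F(x) - F(x-) = P(X = x): the exponential has no atoms,
# so the jump at any x is (numerically) zero.
assert abs(F(1.0) - F(1.0 - 1e-9)) < 1e-6
print("all checked properties hold")
```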

The distribution of a random variable X is discrete (we also call the random variable discrete) if there is a countable set $x_1,x_2,\cdots$ such that

\begin{displaymath}P(X \in \{ x_1,x_2, \cdots\}) =1 = \sum_i P(X=x_i)
\end{displaymath}

In this case the discrete density or probability mass function of X is

\begin{displaymath}f_X(x) = P(X=x)
\end{displaymath}
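For example, a fair six-sided die is a discrete random variable; a minimal sketch of its probability mass function:

```python
from fractions import Fraction

# Probability mass function of a fair six-sided die: the countable set
# is {1, ..., 6}, with P(X = x) = 1/6 on it and 0 elsewhere.
def f_X(x):
    return Fraction(1, 6) if x in {1, 2, 3, 4, 5, 6} else Fraction(0)

# The masses sum to 1, so P(X in {x_1, x_2, ...}) = 1 as required.
total = sum(f_X(x) for x in range(1, 7))
print(total)  # 1
```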

The distribution of a random variable X is absolutely continuous if there is a function f such that

 \begin{displaymath}
P(X\in A) = \int_A f(x) dx
\end{displaymath} (1)

for any (Borel) set A. This is a p dimensional integral in general. A function f satisfying (1) is called the density of X. This condition is equivalent to

\begin{displaymath}F(x) = \int_{-\infty}^x f(y) \, dy
\end{displaymath}

For most values of x, F is then differentiable at x and

\begin{displaymath}F^\prime(x) =f(x) \, .
\end{displaymath}

Example: X is exponential.

\begin{displaymath}F(x) = \left\{ \begin{array}{ll}
1- e^{-x} & x > 0
\\
0 & x \le 0
\end{array}\right.
\end{displaymath}


\begin{displaymath}f(x) = \left\{ \begin{array}{ll}
e^{-x} & x> 0
\\
\mbox{undefined} & x= 0
\\
0 & x < 0
\end{array}\right.
\end{displaymath}
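The claim that $F'(x) = f(x)$ away from $x=0$ can be checked numerically for this example with a centred finite difference (a sketch, not part of the formal development):

```python
import math

def F(x):
    """CDF of the standard exponential."""
    return 1 - math.exp(-x) if x > 0 else 0.0

def f(x):
    """Density of the standard exponential (x = 0 left out, as in the notes)."""
    return math.exp(-x) if x > 0 else 0.0

h = 1e-6
for x in [0.5, 1.0, 2.0, -1.0]:
    # Centred difference approximates F'(x) where F is differentiable.
    deriv = (F(x + h) - F(x - h)) / (2 * h)
    assert abs(deriv - f(x)) < 1e-6
print("F' matches f away from 0")
```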

Distribution Theory

General Problem: Start with assumptions about the density or CDF of a random vector $X=(X_1,\ldots,X_p)$. Define $Y=g(X_1,\ldots,X_p)$ to be some function of X (usually some statistic of interest). How can we compute the distribution or CDF or density of Y?

Univariate Techniques

Method 1: compute the CDF by integration and differentiate to find $f_Y$.

Example: $U \sim \mbox{Uniform}[0,1]$ and $Y=-\log U$.

\begin{eqnarray*}F_Y(y) & = & P(Y \le y)
= P(-\log U \le y)
\\
& = & P(\log U \ge -y)
= P(U \ge e^{-y})
\\
& = & \left\{ \begin{array}{ll}
1- e^{-y} & y > 0
\\
0 & y \le 0
\end{array}\right.
\end{eqnarray*}


so Y has standard exponential distribution.
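This calculation is the basis of the inverse-CDF method of simulation; a sketch (with arbitrary seed and sample size) generating standard exponentials from uniforms:

```python
import math
import random

random.seed(0)  # arbitrary seed, for reproducibility
n = 100_000

# 1 - random.random() lies in (0, 1], so the logarithm is always defined.
sample = [-math.log(1.0 - random.random()) for _ in range(n)]

# The standard exponential has mean 1 and F(y) = 1 - e^{-y};
# the simulated sample should agree up to Monte Carlo error.
mean = sum(sample) / n
frac_below_1 = sum(y <= 1 for y in sample) / n
assert abs(mean - 1) < 0.02
assert abs(frac_below_1 - (1 - math.exp(-1))) < 0.01
print(mean, frac_below_1)
```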

Example: $Z \sim N(0,1)$, i.e.

\begin{displaymath}f_Z(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}
\end{displaymath}

and $Y=Z^2$. Then

\begin{displaymath}F_Y(y) = P(Z^2 \le y) =
\left\{ \begin{array}{ll}
0 & y < 0
\\
P(-\sqrt{y} \le Z \le \sqrt{y}) & y \ge 0
\end{array}\right.
\end{displaymath}

Now differentiate

\begin{displaymath}P(-\sqrt{y} \le Z \le \sqrt{y}) = F_Z(\sqrt{y}) -F_Z(-\sqrt{y})
\end{displaymath}


\begin{displaymath}f_Y(y) = \left\{ \begin{array}{ll}
0 & y < 0
\\
\frac{d}{dy}\left[ F_Z(\sqrt{y}) - F_Z(-\sqrt{y}) \right] & y > 0
\\
\mbox{undefined} & y=0
\end{array}\right.
\end{displaymath}

Then

\begin{eqnarray*}\frac{d}{dy} F_Z(\sqrt{y}) & = & f_Z(\sqrt{y}) \frac{d}{dy}\sqrt{y}
\\
& = & \frac{1}{\sqrt{2\pi}} e^{-y/2} \cdot \frac{1}{2} y^{-1/2}
\\
& = & \frac{1}{2\sqrt{2\pi y}} e^{-y/2} \,.
\end{eqnarray*}


(The term $-F_Z(-\sqrt{y})$ contributes the same amount, by the symmetry of $f_Z$.) Thus

\begin{displaymath}f_Y(y) = \left\{ \begin{array}{ll}
\frac{1}{\sqrt{2\pi y}} e^{-y/2} & y > 0
\\
0 & y < 0
\\
\mbox{undefined} & y=0
\end{array}\right.
\end{displaymath}

We will find indicator notation useful:

\begin{displaymath}1(y>0) = \left\{ \begin{array}{ll}
1 & y>0
\\
0 & y \le 0
\end{array}\right.
\end{displaymath}

which we use to write

\begin{displaymath}f_Y(y) = \frac{1}{\sqrt{2\pi y}} e^{-y/2} 1(y>0)
\end{displaymath}

(changing definition unimportantly at y=0).
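As a check on this density, one can compare $P(Y \le 1)$ computed from the standard normal CDF (via the error function) with a simulation of $Z^2$; a sketch, with arbitrary seed and sample size:

```python
import math
import random

random.seed(1)  # arbitrary seed, for reproducibility
n = 100_000

# P(Y <= 1) = P(-1 <= Z <= 1) = erf(1/sqrt(2)) for Z standard normal.
exact = math.erf(1 / math.sqrt(2))

# Simulate Y = Z^2 and estimate the same probability empirically.
sample = [random.gauss(0, 1) ** 2 for _ in range(n)]
estimate = sum(y <= 1 for y in sample) / n

assert abs(estimate - exact) < 0.01
print(exact, estimate)
```

The exact value is about 0.6827, the probability that a standard normal lies within one standard deviation of 0.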

Notice: I never evaluated $F_Y$ before differentiating it. In fact $F_Y$ and $F_Z$ are integrals I can't do in closed form, but I can differentiate them anyway. Remember the fundamental theorem of calculus:

\begin{displaymath}\frac{d}{dx} \int_a^x f(y) \, dy = f(x)
\end{displaymath}

at any x where f is continuous.





Richard Lockhart
2000-01-04