
STAT 801 Lecture 1

Course outline

Reading for Today's Lecture: Chapter 1 of Casella and Berger.

Goals of Today's Lecture:

Course outline:

Statistics versus Probability

Standard view of scientific inference has a set of theories which make predictions about the outcomes of an experiment:

Theory   Prediction
A        1
B        2
C        3

Conduct experiment, see outcome 2: we infer that Theory B is correct (or at least that A and C are wrong).

Add Randomness

Theory   Prediction
A        Usually 1, sometimes 2, never 3
B        Usually 2, sometimes 1, never 3
C        Usually 3, sometimes 1, never 2

See outcome 2: infer that Theory B is probably correct, Theory A is probably not correct, and Theory C is wrong.

Probability Theory: construct the table; compute the likely outcomes of experiments.

Statistics: the inverse process. Use the table to draw inferences from the outcome of an experiment. How should we do it, and how wrong are our inferences likely to be?
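A minimal sketch in Python of this inverse reading of the table; the numerical probabilities standing in for "usually", "sometimes" and "never" are hypothetical:

```python
# Hypothetical outcome probabilities for each theory; each row gives the
# probabilities of observing outcomes 1, 2, 3 under that theory.
table = {
    "A": [0.7, 0.3, 0.0],  # usually 1, sometimes 2, never 3
    "B": [0.3, 0.7, 0.0],  # usually 2, sometimes 1, never 3
    "C": [0.3, 0.0, 0.7],  # usually 3, sometimes 1, never 2
}

observed = 2  # we ran the experiment and saw outcome 2

# Probability theory fills in the table; statistics reads it backwards:
# compare how likely the observed outcome is under each theory.
likelihoods = {theory: probs[observed - 1] for theory, probs in table.items()}
best = max(likelihoods, key=likelihoods.get)
print(likelihoods)
print(best)  # Theory B assigns the highest probability to outcome 2
```

Theory C assigns probability 0 to the observed outcome, so it is ruled out; A and B remain possible, with B favoured.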

Probability Definitions

Probability Space: an ordered triple $(\Omega, {\cal F}, P)$, where $\Omega$ is the sample space (the set of possible outcomes), ${\cal F}$ is a $\sigma$-field of subsets of $\Omega$ (the events), and $P$ is a probability measure on ${\cal F}$.

The axioms guarantee that we can compute probabilities by the usual rules, including approximation (passing to limits), without fear of contradiction.

Vector valued random variable: a function $X:\Omega\mapsto R^p$ with the property that, writing $X=(X_1,\ldots,X_p)$,

\begin{displaymath}P(X_1 \le x_1, \ldots , X_p \le x_p)
\end{displaymath}

makes sense for any constants $(x_1,\ldots,x_p)$. Formally the notation

\begin{displaymath}X_1 \le x_1, \ldots , X_p \le x_p
\end{displaymath}

is shorthand for a subset of $\Omega$, that is, an event:

\begin{displaymath}\left\{\omega\in\Omega: X_1(\omega) \le x_1, \ldots , X_p (\omega) \le x_p \right\}
\end{displaymath}

Remember $X$ is a function on $\Omega$, so $X_1$ is also a function on $\Omega$.

In almost all of probability and statistics the dependence of a random variable on a point in the probability space is hidden! You almost always see X not $X(\omega)$.

Now for formal definitions:

Borel $\sigma$-field in $R^p$: the smallest $\sigma$-field in $R^p$ containing every open ball.

Every common set is a Borel set, that is, in the Borel $\sigma$-field.

An Rp valued random variable is a map $X:\Omega\mapsto R^p$ such that when A is Borel then $\{\omega\in\Omega:X(\omega)\in A\} \in \cal F$.

Fact: this is equivalent to

\begin{displaymath}\left\{
\omega\in\Omega: X_1(\omega) \le x_1, \ldots , X_p (\omega) \le x_p
\right\}
\in \cal F
\end{displaymath}

for all $(x_1,\ldots,x_p)\in R^p$.

Jargon and notation: we write $P(X\in A)$ for $P(\{\omega\in\Omega:X(\omega)\in
A\})$ and define the distribution of X to be the map

\begin{displaymath}A\mapsto P(X\in A)
\end{displaymath}

which is a probability on the set Rp with the Borel $\sigma$-field rather than the original $\Omega$ and $\cal F$.

Cumulative Distribution Function (or CDF) of X: the function $F_X$ on $R^p$ defined by

\begin{displaymath}F_X(x_1,\ldots, x_p) =
P(X_1 \le x_1, \ldots , X_p \le x_p)
\end{displaymath}

Properties of $F_X$ (or just $F$ when there's only one CDF under consideration) for p=1:

1.
$0 \le F(x) \le 1$.

2.
$ x> y \Rightarrow F(x) \ge F(y)$ (F is monotone non-decreasing).

3.
$\lim_{x\to - \infty} F(x) = 0$

4.
$\lim_{x\to \infty} F(x) = 1$

5.
$\lim_{x\searrow y} F(x) = F(y)$ (F is right continuous).

6.
$\lim_{x\nearrow y} F(x) \equiv F(y-)$ exists.

7.
$F(x)-F(x-) = P(X=x)$.

8.
$F_X(t) = F_Y(t)$ for all t implies that X and Y have the same distribution, that is, $P(X\in A) = P(Y\in A)$ for any (Borel) set A.
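These properties can be checked numerically for a concrete CDF; a quick sketch using the standard exponential $F(x)=1-e^{-x}$:

```python
import math

def F(x):
    """CDF of the standard exponential distribution."""
    return 1 - math.exp(-x) if x > 0 else 0.0

xs = [i / 10 for i in range(-50, 51)]

# 1. F takes values in [0, 1].
assert all(0 <= F(x) <= 1 for x in xs)

# 2. F is monotone non-decreasing.
assert all(F(a) <= F(b) for a, b in zip(xs, xs[1:]))

# 3. and 4. Limits at minus and plus infinity.
assert F(-100) == 0.0
assert abs(F(100) - 1) < 1e-12

# 7. F(x) - F(x-) = P(X = x): the exponential has no atoms,
# so the jump at any x is (numerically) zero.
assert abs(F(1.0) - F(1.0 - 1e-9)) < 1e-6
print("all checked properties hold")
```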

The distribution of a random variable X is discrete (we also call the random variable discrete) if there is a countable set $x_1,x_2,\cdots$ such that

\begin{displaymath}P(X \in \{ x_1,x_2, \cdots\}) =1 = \sum_i P(X=x_i)
\end{displaymath}

In this case the discrete density or probability mass function of X is

\begin{displaymath}f_X(x) = P(X=x)
\end{displaymath}
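For example, a fair six-sided die is a discrete random variable; a minimal sketch of its probability mass function:

```python
from fractions import Fraction

# Probability mass function of a fair six-sided die: the countable set
# is {1, ..., 6}, with P(X = x) = 1/6 on it and 0 elsewhere.
def f_X(x):
    return Fraction(1, 6) if x in {1, 2, 3, 4, 5, 6} else Fraction(0)

# The masses sum to 1, so P(X in {x_1, x_2, ...}) = 1 as required.
total = sum(f_X(x) for x in range(1, 7))
print(total)  # 1
```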

The distribution of a random variable X is absolutely continuous if there is a function f such that

 \begin{displaymath}
P(X\in A) = \int_A f(x) dx
\end{displaymath} (1)

for any (Borel) set A. This is a p dimensional integral in general. A function f satisfying (1) is called the density of X. This condition is equivalent to

\begin{displaymath}F(x) = \int_{-\infty}^x f(y) \, dy
\end{displaymath}

For most values of x, F is then differentiable at x and

\begin{displaymath}F^\prime(x) =f(x) \, .
\end{displaymath}

Example: X is exponential.

\begin{displaymath}F(x) = \left\{ \begin{array}{ll}
1- e^{-x} & x > 0
\\
0 & x \le 0
\end{array}\right.
\end{displaymath}


\begin{displaymath}f(x) = \left\{ \begin{array}{ll}
e^{-x} & x> 0
\\
\mbox{undefined} & x= 0
\\
0 & x < 0
\end{array}\right.
\end{displaymath}
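The claim that $F'(x) = f(x)$ away from $x=0$ can be checked numerically for this example with a centred finite difference (a sketch, not part of the formal development):

```python
import math

def F(x):
    """CDF of the standard exponential."""
    return 1 - math.exp(-x) if x > 0 else 0.0

def f(x):
    """Density of the standard exponential (x = 0 left out, as in the notes)."""
    return math.exp(-x) if x > 0 else 0.0

h = 1e-6
for x in [0.5, 1.0, 2.0, -1.0]:
    # Centred difference approximates F'(x) where F is differentiable.
    deriv = (F(x + h) - F(x - h)) / (2 * h)
    assert abs(deriv - f(x)) < 1e-6
print("F' matches f away from 0")
```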

Distribution Theory

General Problem: Start with assumptions about the density or CDF of a random vector $X=(X_1,\ldots,X_p)$. Define $Y=g(X_1,\ldots,X_p)$ to be some function of X (usually some statistic of interest). How can we compute the distribution or CDF or density of Y?

Univariate Techniques

Method 1: compute the CDF by integration and differentiate to find $f_Y$.

Example: $U \sim \mbox{Uniform}[0,1]$ and $Y=-\log U$.

\begin{eqnarray*}F_Y(y) & = & P(Y \le y)
= P(-\log U \le y)
\\
& = & P(\log U \ge -y)
= P(U \ge e^{-y})
\\
& = & \left\{ \begin{array}{ll}
1- e^{-y} & y > 0
\\
0 & y \le 0
\end{array}\right.
\end{eqnarray*}


so Y has standard exponential distribution.
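This calculation is the basis of the inverse-CDF method of simulation; a sketch (with arbitrary seed and sample size) generating standard exponentials from uniforms:

```python
import math
import random

random.seed(0)  # arbitrary seed, for reproducibility
n = 100_000

# 1 - random.random() lies in (0, 1], so the logarithm is always defined.
sample = [-math.log(1.0 - random.random()) for _ in range(n)]

# The standard exponential has mean 1 and F(y) = 1 - e^{-y};
# the simulated sample should agree up to Monte Carlo error.
mean = sum(sample) / n
frac_below_1 = sum(y <= 1 for y in sample) / n
assert abs(mean - 1) < 0.02
assert abs(frac_below_1 - (1 - math.exp(-1))) < 0.01
print(mean, frac_below_1)
```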

Example: $Z \sim N(0,1)$, i.e.

\begin{displaymath}f_Z(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}
\end{displaymath}

and $Y=Z^2$. Then

\begin{displaymath}F_Y(y) = P(Z^2 \le y) =
\left\{ \begin{array}{ll}
0 & y < 0
\\
P(-\sqrt{y} \le Z \le \sqrt{y}) & y \ge 0
\end{array}\right.
\end{displaymath}

Now differentiate

\begin{displaymath}P(-\sqrt{y} \le Z \le \sqrt{y}) = F_Z(\sqrt{y}) -F_Z(-\sqrt{y})
\end{displaymath}


\begin{displaymath}f_Y(y) = \left\{ \begin{array}{ll}
0 & y < 0
\\
\frac{d}{dy}\left[ F_Z(\sqrt{y}) - F_Z(-\sqrt{y}) \right] & y > 0
\\
\mbox{undefined} & y=0
\end{array}\right.
\end{displaymath}

Then

\begin{eqnarray*}\frac{d}{dy} F_Z(\sqrt{y}) & = & f_Z(\sqrt{y}) \frac{d}{dy}\sqrt{y}
\\
& = & \frac{1}{\sqrt{2\pi}} e^{-y/2} \cdot \frac{1}{2} y^{-1/2}
\\
& = & \frac{1}{2\sqrt{2\pi y}} e^{-y/2} \,.
\end{eqnarray*}


(The term $-F_Z(-\sqrt{y})$ contributes the same amount, by the symmetry of $f_Z$.) Thus

\begin{displaymath}f_Y(y) = \left\{ \begin{array}{ll}
\frac{1}{\sqrt{2\pi y}} e^{-y/2} & y > 0
\\
0 & y < 0
\\
\mbox{undefined} & y=0
\end{array}\right.
\end{displaymath}

We will find indicator notation useful:

\begin{displaymath}1(y>0) = \left\{ \begin{array}{ll}
1 & y>0
\\
0 & y \le 0
\end{array}\right.
\end{displaymath}

which we use to write

\begin{displaymath}f_Y(y) = \frac{1}{\sqrt{2\pi y}} e^{-y/2} 1(y>0)
\end{displaymath}

(changing definition unimportantly at y=0).
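As a check on this density, one can compare $P(Y \le 1)$ computed from the standard normal CDF (via the error function) with a simulation of $Z^2$; a sketch, with arbitrary seed and sample size:

```python
import math
import random

random.seed(1)  # arbitrary seed, for reproducibility
n = 100_000

# P(Y <= 1) = P(-1 <= Z <= 1) = erf(1/sqrt(2)) for Z standard normal.
exact = math.erf(1 / math.sqrt(2))

# Simulate Y = Z^2 and estimate the same probability empirically.
sample = [random.gauss(0, 1) ** 2 for _ in range(n)]
estimate = sum(y <= 1 for y in sample) / n

assert abs(estimate - exact) < 0.01
print(exact, estimate)
```

The exact value is about 0.6827, the probability that a standard normal lies within one standard deviation of 0.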

Notice: I never evaluated $F_Y$ before differentiating it. In fact $F_Y$ and $F_Z$ are integrals I can't do in closed form, but I can differentiate them anyway. Remember the fundamental theorem of calculus:

\begin{displaymath}\frac{d}{dx} \int_a^x f(y) \, dy = f(x)
\end{displaymath}

at any x where f is continuous.





Richard Lockhart
2000-01-04