Goals of Today's Lecture:
You are given random variables X1, ..., Xn whose joint density f (or distribution) is specified, and a statistic T = T(X1, ..., Xn) whose distribution you want to know. To compute something like P(T > t): generate N independent replicates of the data set, compute the value Ti of the statistic for each replicate, and estimate P(T > t) by the fraction of the Ti which exceed t.
Notice that the accuracy of this estimate is inversely proportional to the square root of N. There are a number of tricks to make the method more accurate, but they only change the constant of proportionality; the SE is still inversely proportional to the square root of the sample size.
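As an illustrative sketch of the recipe above (the particular statistic, a mean of 10 Exp(1) variables, and the sample sizes are my choices, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_tail_prob(n_rep, n=10, t=1.5):
    """Estimate P(T > t) where T is the mean of n Exp(1) variables.

    Returns the estimate and its (binomial) standard error."""
    samples = rng.exponential(size=(n_rep, n))
    T = samples.mean(axis=1)                    # one value of T per replicate
    p_hat = (T > t).mean()                      # fraction of Ti exceeding t
    se = np.sqrt(p_hat * (1 - p_hat) / n_rep)   # SE shrinks like 1/sqrt(n_rep)
    return p_hat, se

p_hat, se = mc_tail_prob(100_000)
```

Quadrupling n_rep roughly halves the reported standard error, which is the square-root law just described.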
Transformation
If X is to have cdf F and you generate U which is Uniform[0,1], then you can take X = F^{-1}(U). For example, if X is to be exponential then F(x) = 1 - e^{-x} and X = F^{-1}(U) = -log(1 - U). Compare this with the method I showed for generating exponentials.
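A minimal sketch of this inversion method for the exponential case (sample size and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
U = rng.uniform(size=100_000)
X = -np.log(1 - U)        # F^{-1}(u) = -log(1 - u) inverts F(x) = 1 - e^{-x}
# the sample mean of X should be close to the Exp(1) mean, which is 1
```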
Acceptance Rejection
If you can't easily calculate F^{-1} but you know f, you can try the acceptance-rejection method. Find a density g and a constant c such that f(x) <= c g(x) for each x, and such that G^{-1} is computable or you otherwise know how to generate observations independently from g. Generate W1. Compute f(W1) / (c g(W1)). Generate a Uniform[0,1] random variable U1 independent of all the Ws and let Y = W1 if U1 <= f(W1) / (c g(W1)). Otherwise get a new W and a new U and repeat until you find a pair with Ui <= f(Wi) / (c g(Wi)). You make Y be the last W you generated. This Y has density f.
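Here is a small sketch of acceptance-rejection; the target f is taken, for illustration, to be the Beta(2,2) density 6x(1 - x), with g the Uniform[0,1] density and c = 1.5, since f(x) <= 1.5 on [0,1]. These particular choices are mine, not from the lecture.

```python
import numpy as np

rng = np.random.default_rng(2)

def f(x):
    """Target density: Beta(2,2)."""
    return 6.0 * x * (1.0 - x)

c = 1.5                          # f(x) <= c * g(x), with g the Uniform[0,1] density

def rejection_sample(size):
    out = []
    while len(out) < size:
        w = rng.uniform()        # draw W from g
        u = rng.uniform()        # independent Uniform[0,1]
        if u <= f(w) / c:        # accept with probability f(W) / (c g(W))
            out.append(w)
    return np.array(out)

y = rejection_sample(50_000)
# y should behave like a Beta(2,2) sample: mean 0.5, variance 0.05
```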
Markov Chain Monte Carlo
In the last 10 years the following tactic has become popular, particularly for generating multivariate observations. If W1, W2, ... is an (ergodic) Markov chain with stationary transitions and the stationary initial distribution of W has density f, then you can get random variables which have the marginal density f by starting off the Markov chain and letting it run for a long time. The marginal distributions of the Wi converge to f. So you can estimate things like P(W in A) by computing the fraction of the Wi which land in A.
There are now many versions of this technique, including Gibbs sampling and the Metropolis-Hastings algorithm. (The technique was invented in the 1950s by physicists: Metropolis et al. One of the authors of the paper was Edward Teller, "father of the hydrogen bomb".)
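As a sketch of the idea (this particular random-walk Metropolis chain, its target, and its tuning are illustrative choices, not from the lecture), the following chain has the standard normal density as its stationary distribution; note that the normalizing constant of f cancels in the acceptance ratio.

```python
import numpy as np

rng = np.random.default_rng(3)

def metropolis(n_steps, step=1.0):
    """Random-walk Metropolis chain targeting the standard normal density."""
    w = 0.0
    chain = np.empty(n_steps)
    for i in range(n_steps):
        proposal = w + rng.normal(scale=step)
        # accept with probability min(1, f(proposal)/f(w));
        # for f proportional to exp(-x^2/2) the log ratio is:
        log_ratio = 0.5 * (w**2 - proposal**2)
        if np.log(rng.uniform()) < log_ratio:
            w = proposal
        chain[i] = w
    return chain

chain = metropolis(100_000)
burned = chain[10_000:]          # discard an initial burn-in period
# the fraction of the chain landing in A = (1, infinity) estimates P(W > 1)
tail = (burned > 1.0).mean()
```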
Importance Sampling
If you want to compute theta = E[h(X)] = Int h(x) f(x) dx, you can rewrite it as Int h(x) [f(x)/g(x)] g(x) dx for any density g which is positive wherever h(x) f(x) is non-zero. Generate W1, ..., WN from g and estimate theta by the average of the h(Wi) f(Wi) / g(Wi). A good choice of g puts more probability where h f is large, which can reduce the variance dramatically; this is particularly useful for rare-event probabilities.
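A small sketch with h the indicator of a rare event: estimating P(Z > 3) for standard normal Z by sampling from g = N(3,1). The choice of target and proposal here is illustrative, not from the lecture.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
w = rng.normal(loc=3.0, size=n)      # draw W1, ..., Wn from g = N(3,1)
h = (w > 3.0).astype(float)          # h(x) = 1{x > 3}
# weight f(x)/g(x) = exp(-x^2/2) / exp(-(x-3)^2/2) = exp(4.5 - 3x)
weights = np.exp(4.5 - 3.0 * w)
est = (h * weights).mean()
# true value: P(Z > 3) is about 0.00135; a plain Monte Carlo estimate of this
# size would see only ~135 successes, while here half the draws land above 3
```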
Variance reduction
Consider the problem of estimating the distribution of the sample mean for a Cauchy random variable. The Cauchy density is f(x) = 1 / (pi (1 + x^2)). Generate N samples, let Ti be the sample mean of the i-th sample, and estimate P(T > t) by the fraction of the Ti exceeding t. We can improve this estimate by remembering that -Xi also has the Cauchy distribution. Take Si = -Ti. Remember that Si has the same distribution as Ti. Then we try (for t > 0)
P(T > t) is estimated by [#{i: Ti > t} + #{i: Si > t}] / (2N).
For t > 0 the events Ti > t and Si > t cannot both occur, so the two indicators are negatively correlated and the combined estimate has smaller variance than one based on the Ti alone.
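A sketch of this antithetic scheme (sample sizes and t are arbitrary choices). One useful check: the sample mean of a Cauchy sample is itself standard Cauchy, so P(T > 1) = 1/4 exactly.

```python
import numpy as np

rng = np.random.default_rng(5)
N, n, t = 50_000, 5, 1.0
x = rng.standard_cauchy(size=(N, n))
T = x.mean(axis=1)               # sample mean of a Cauchy sample is Cauchy
S = -T                           # -X has the same Cauchy distribution as X
plain = (T > t).mean()
antithetic = 0.5 * ((T > t).mean() + (S > t).mean())
# both estimate P(T > 1) = 0.25; the antithetic version has smaller variance
```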
Regression estimates
Suppose we want to compute theta = E[T]. If we can also compute, from the same samples, a statistic C whose mean E[C] = mu is known exactly, then for any coefficient b the quantity T - b(C - mu) has mean theta. Choosing b by regressing T on C (hence the name) minimizes the variance of the resulting estimate; C is called a control variate.
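A sketch of a regression estimate under my own illustrative choices: estimating theta = E[e^U] = e - 1 for U Uniform[0,1], using the control variate C = U, whose mean 1/2 is known, with the coefficient b estimated from the sample.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 100_000
u = rng.uniform(size=n)
h = np.exp(u)                    # want theta = E[e^U] = e - 1
c = u                            # control variate with known mean 1/2
cm = np.cov(h, c)                # 2x2 sample covariance matrix
b = cm[0, 1] / cm[1, 1]          # regression slope of h on c
est = h.mean() - b * (c.mean() - 0.5)
# est is much closer to e - 1 than the plain average h.mean() typically is
```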
Definition: A model is a family {P_θ : θ ∈ Θ} of possible distributions for some random variable X. (Our data set is X, so X will generally be a big vector or matrix or an even more complicated object.)
We will assume throughout this course that the true distribution P of X is in fact P_θ₀ for some θ₀ ∈ Θ. We call θ₀ the true value of the parameter. Notice that this assumption will be wrong; we hope it is not wrong in an important way. If we are very worried that it is wrong, we enlarge our model, putting in more distributions and making Θ bigger.
Our goal is to observe the value of X and then guess θ₀ or some property of θ₀.
We will consider the following classic mathematical versions of this: point estimation, in which we guess the value of θ₀ itself; interval (or set) estimation, in which we give a set of plausible values for θ₀; and hypothesis testing, in which we decide whether or not θ₀ lies in some specified subset of Θ.
There are several schools of statistical thinking with different views on how these problems should be done. The main schools of thought may be summarized roughly as follows: the Neyman-Pearson (frequentist) school evaluates procedures by their behaviour in hypothetical repetitions of the experiment; the Bayesian school treats θ as random, putting a prior distribution on Θ and updating it with the data; and the likelihood school measures the evidence in the data directly through the likelihood function.
For the next several weeks we do only the Neyman-Pearson approach, though we use that approach to evaluate the quality of likelihood methods.
Suppose you toss a coin 6 times and get Heads twice. If p is the probability of getting H, then the probability of getting 2 heads is
(6 choose 2) p^2 (1 - p)^4 = 15 p^2 (1 - p)^4.
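This binomial probability is easy to check numerically; the sketch below evaluates it as a function of p. (The function name is mine; note the probability is largest at p = 2/6 = 1/3.)

```python
from math import comb

def likelihood(p, n=6, k=2):
    """Probability of k heads in n independent tosses when P(H) = p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

value_at_third = likelihood(1/3)   # 15 * (1/3)^2 * (2/3)^4 = 240/729
value_at_half = likelihood(0.5)    # 15 / 64
```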