Goals of Today's Lecture:
You are given random variables $X_1,\ldots,X_n$ whose joint density $f$ (or distribution) is specified and a statistic $T = T(X_1,\ldots,X_n)$ whose distribution you want to know. To compute something like $P(T > t)$: generate $X_1,\ldots,X_n$ from $f$, compute $T$, repeat the whole process independently $N$ times, and estimate $P(T > t)$ by the fraction of the $N$ simulated values of $T$ which exceed $t$.
Notice that the accuracy is inversely proportional to $\sqrt{N}$. There are a number of tricks to make the method more accurate, but they only change the constant of proportionality: the SE is still inversely proportional to the square root of the sample size.
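For concreteness, here is a minimal sketch of the basic method in Python. The statistic (the maximum of 5 standard normals) and the threshold $t = 2$ are hypothetical choices made purely for illustration.

```python
# Minimal Monte Carlo sketch: estimate P(T > t) and its standard error.
# The statistic T (max of 5 standard normals) and t = 2 are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
N = 100_000                                  # number of Monte Carlo replicates
t = 2.0

samples = rng.standard_normal((N, 5))        # N independent data sets
T = samples.max(axis=1)                      # the statistic, once per data set

p_hat = np.mean(T > t)                       # fraction of T's exceeding t
se = np.sqrt(p_hat * (1 - p_hat) / N)        # SE shrinks like 1/sqrt(N)
print(f"P(T > {t}) is about {p_hat:.4f} (SE {se:.4f})")
```

Quadrupling $N$ halves the SE, which is the $1/\sqrt{N}$ behaviour described above.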
Transformation
If $X$ is to have cdf $F$ and you generate $U$ which is Uniform[0,1], then you can take $X = F^{-1}(U)$. For example, if $X$ is to be exponential then $F(x) = 1 - e^{-x}$ and $X = F^{-1}(U) = -\log(1-U)$. Compare this with the method I showed for generating exponentials.
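A minimal sketch of this inverse-cdf recipe for the exponential case (the sample size is an arbitrary choice):

```python
# Transformation method: X = F^{-1}(U) with F(x) = 1 - exp(-x),
# so F^{-1}(u) = -log(1 - u).
import numpy as np

rng = np.random.default_rng(0)
U = rng.uniform(size=100_000)     # U ~ Uniform[0,1)
X = -np.log(1 - U)                # X has the Exponential(1) cdf F
# Since 1 - U is also Uniform[0,1], X = -np.log(U) works equally well.
print(X.mean(), X.var())          # both should be close to 1
```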
Acceptance Rejection
If you can't easily calculate $F^{-1}$ but you know $f$, you can try the acceptance-rejection method. Find a density $g$ and a constant $c$ such that $f(x) \le c\,g(x)$ for each $x$, and such that $G^{-1}$ is computable or you otherwise know how to generate observations independently from $g$. Generate $W_1$. Compute $p_1 = f(W_1)/\{c\,g(W_1)\}$. Generate a Uniform[0,1] random variable $U_1$ independent of all the $W$s and let $Y = W_1$ if $U_1 \le p_1$. Otherwise get a new $W$ and a new $U$ and repeat until you find a $U_i \le p_i$; you take $Y$ to be the last $W$ you generated. This $Y$ has density $f$.
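Here is a sketch under hypothetical choices: the target $f$ is the Beta(2,2) density $f(x) = 6x(1-x)$ on $[0,1]$, the proposal $g$ is Uniform[0,1], and $c = 1.5$ works because $f(x) \le 1.5 = c\,g(x)$ for all $x$.

```python
# Acceptance-rejection sketch with a hypothetical target and proposal:
# f = Beta(2,2) density, g = Uniform[0,1] density, c = 1.5.
import numpy as np

rng = np.random.default_rng(0)
c = 1.5

def f(x):
    return 6.0 * x * (1.0 - x)       # target density on [0, 1]

def draw_one():
    while True:
        w = rng.uniform()            # W ~ g
        u = rng.uniform()            # U ~ Uniform[0,1], independent of W
        if u <= f(w) / c:            # accept with probability f(W)/(c g(W))
            return w                 # accepted W has density f

Y = np.array([draw_one() for _ in range(10_000)])
print(Y.mean())                      # should be near 1/2, the Beta(2,2) mean
```

On average it takes $c$ proposals per accepted value, so a small $c$ (a proposal $g$ close to $f$) is better.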
Markov Chain Monte Carlo
In the last 10 years the following tactic has become popular, particularly for generating multivariate observations. If $W_1, W_2, \ldots$ is an (ergodic) Markov chain with stationary transitions and the stationary initial distribution of $W$ has density $f$, then you can get random variables which have the marginal density $f$ by starting off the Markov chain and letting it run for a long time. The marginal distributions of the $W_i$ converge to $f$. So you can estimate things like $P(W \in A)$ by computing the fraction of the $W_i$ which land in $A$.
There are now many versions of this technique, including Gibbs sampling and the Metropolis-Hastings algorithm. (The technique was invented in the 1950s by physicists: Metropolis et al. One of the authors of that paper was Edward Teller, ``father of the hydrogen bomb''.)
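As an illustration, here is a sketch of one member of this family, a random-walk Metropolis sampler; the standard normal target and the unit proposal scale are hypothetical choices.

```python
# Random-walk Metropolis sketch targeting a (hypothetical) N(0,1) density f.
# After burn-in the W_i are (dependent) draws whose marginal density is f.
import numpy as np

rng = np.random.default_rng(0)

def log_f(x):
    return -0.5 * x * x              # log of the target density, up to a constant

n_steps, burn_in = 50_000, 5_000
w = 0.0                              # arbitrary starting value
chain = np.empty(n_steps)
for i in range(n_steps):
    proposal = w + rng.normal()      # symmetric random-walk proposal
    # Accept with probability min(1, f(proposal) / f(w)).
    if np.log(rng.uniform()) < log_f(proposal) - log_f(w):
        w = proposal
    chain[i] = w

kept = chain[burn_in:]
print(np.mean(kept > 1.0))           # estimates P(W > 1); about 0.159 for N(0,1)
```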
Importance Sampling
If you want to compute $\theta = E\{h(X)\} = \int h(x) f(x)\,dx$, you can generate $W_1, \ldots, W_N$ from some other density $g$ and estimate $\theta$ by $N^{-1} \sum_i h(W_i) f(W_i)/g(W_i)$, since this average has mean $\int \{h(x) f(x)/g(x)\}\, g(x)\,dx = \theta$. A good choice of $g$ puts its mass where $|h| f$ is large and can make the variance much smaller than that of the naive estimate.
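A sketch under hypothetical choices: take $\theta = P(Z > 4)$ for $Z$ standard normal, so $h(x) = 1\{x > 4\}$, and sample from the shifted exponential density $g(x) = e^{-(x-4)}$ on $(4, \infty)$, which puts all its mass in the tail of interest.

```python
# Importance sampling sketch for a rare tail probability (hypothetical example):
# theta = P(Z > 4), Z ~ N(0,1).  Plain Monte Carlo with N = 100_000 draws would
# almost never land beyond 4; draws from g always do, and are reweighted by f/g.
import numpy as np

rng = np.random.default_rng(0)
N = 100_000

W = 4.0 + rng.exponential(size=N)                # W ~ g, supported on (4, inf)
f = np.exp(-0.5 * W**2) / np.sqrt(2.0 * np.pi)   # standard normal density at W
g = np.exp(-(W - 4.0))                           # proposal density at W
theta_hat = np.mean(f / g)           # h(W) = 1 everywhere on the support of g
print(theta_hat)                     # true value is about 3.17e-5
```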
Variance reduction
Consider the problem of estimating the distribution of the sample mean $\bar{X} = n^{-1}\sum_{i=1}^n X_i$ for a Cauchy random variable. The Cauchy density is $f(x) = 1/\{\pi(1+x^2)\}$. Generate $N$ samples of size $n$, compute the sample mean $T_i$ of the $i$th sample, and estimate $P(\bar{X} > t)$ by the fraction of the $T_i$ which exceed $t$.
We can improve this estimate by remembering that $-X_i$ also has the Cauchy distribution. Take $S_i = -T_i$ and remember that $S_i$ has the same distribution as $T_i$. Then we try (for $t > 0$) the combined estimate
$$\frac{1}{2}\left[\frac{\#\{i : T_i > t\}}{N} + \frac{\#\{i : S_i > t\}}{N}\right].$$
Since $T_i > t$ and $S_i > t$ cannot both happen when $t > 0$, the two indicators are negatively correlated, which reduces the variance of the average.
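A sketch of this antithetic trick; the sample size $n = 10$ and threshold $t = 1$ are hypothetical choices. (The sample mean of standard Cauchy variables is again standard Cauchy, so the true probability here is $P(\text{Cauchy} > 1) = 1/4$.)

```python
# Antithetic variates sketch for the Cauchy sample mean (n and t hypothetical).
import numpy as np

rng = np.random.default_rng(0)
N, n, t = 100_000, 10, 1.0

X = rng.standard_cauchy((N, n))
T = X.mean(axis=1)                   # T_i: simulated sample means
S = -T                               # S_i = -T_i, same distribution as T_i

naive = np.mean(T > t)
antithetic = 0.5 * (np.mean(T > t) + np.mean(S > t))
print(naive, antithetic)             # true value is 1/4
```

At essentially no extra simulation cost, each simulated sample is used twice.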
Regression estimates
Suppose we want to compute $\theta = E(T)$ and that we can also compute, from the same simulated data, a statistic $S$ whose mean $E(S)$ is known exactly. For any constant $c$ the quantity $T - c\{S - E(S)\}$ still has mean $\theta$, and choosing $c$ by regressing the simulated $T_i$ on the $S_i$ minimizes the variance.
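A minimal sketch, assuming a hypothetical target: $\theta = E(e^U)$ with $U$ uniform on $[0,1]$, using the control variate $S = U$, whose mean $E(S) = 1/2$ is known exactly.

```python
# Regression (control variate) estimate sketch; target and control hypothetical.
import numpy as np

rng = np.random.default_rng(0)
N = 100_000

U = rng.uniform(size=N)
T = np.exp(U)                        # T_i: terms of the naive estimate of theta
S = U                                # control variate with known mean 1/2

c = np.cov(T, S)[0, 1] / np.var(S, ddof=1)   # regression slope of T on S
theta_hat = np.mean(T - c * (S - 0.5))       # same mean, smaller variance
print(np.mean(T), theta_hat)         # true value is e - 1, about 1.71828
```

Estimating $c$ from the same simulations introduces a small bias, which is negligible for large $N$.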
Definition: A model is a family $\{P_\theta : \theta \in \Theta\}$ of possible distributions for some random variable $X$. (Our data set is $X$, so $X$ will generally be a big vector or matrix or even more complicated object.)
We will assume throughout this course that the true distribution $P$ of $X$ is in fact some $P_{\theta_0}$ for some $\theta_0 \in \Theta$. We call $\theta_0$ the true value of the parameter. Notice that this assumption will be wrong; we hope it is not wrong in an important way. If we are very worried that it is wrong, we enlarge our model, putting in more distributions and making $\Theta$ bigger.
Our goal is to observe the value of $X$ and then guess $\theta_0$ or some property of $\theta_0$. We will consider the following classic mathematical versions of this: point estimation, interval estimation (confidence sets), and hypothesis testing.
There are several schools of statistical thinking with different views on how these problems should be done. The main schools of thought may be summarized roughly as the Neyman-Pearson (frequentist) school, the Bayesian school, and the likelihood school.
For the next several weeks we do only the Neyman-Pearson approach, though we use that approach to evaluate the quality of likelihood methods.
Suppose you toss a coin 6 times and get Heads twice. If $p$ is the probability of getting H, then the probability of getting 2 heads is $\binom{6}{2} p^2 (1-p)^4 = 15\, p^2 (1-p)^4$.
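As a quick numerical check, a short script can evaluate $15\,p^2(1-p)^4$ over a grid of $p$ values (the grid is an arbitrary choice):

```python
# Evaluate the probability of 2 heads in 6 tosses as a function of p.
import numpy as np

p = np.linspace(0.0, 1.0, 101)
prob = 15 * p**2 * (1 - p)**4
print(p[np.argmax(prob)])   # largest near p = 1/3 = 2/6, the observed fraction
```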