Goals of Today's Lecture: large sample theory of the MLE; estimating equations; the method of moments.
Large Sample Theory of the MLE
Application of the law of large numbers to the likelihood function: the log likelihood ratio for $\theta$ to $\theta_0$ is
$$\ell(\theta) - \ell(\theta_0) = \sum_{i=1}^n \log\frac{f(X_i,\theta)}{f(X_i,\theta_0)},$$
where $\ell(\theta) = \sum_{i=1}^n \log f(X_i,\theta)$. By the law of large numbers,
$$\frac{1}{n}\{\ell(\theta) - \ell(\theta_0)\} \to \mu(\theta) \equiv E_{\theta_0}\!\left[\log\frac{f(X,\theta)}{f(X,\theta_0)}\right],$$
and by Jensen's inequality $\mu(\theta) \le \log E_{\theta_0}[f(X,\theta)/f(X,\theta_0)] = \log 1 = 0$, with equality only when $f(\cdot,\theta) = f(\cdot,\theta_0)$. So for large $n$ the log likelihood is larger at $\theta_0$ than at any other fixed $\theta$, and the likelihood tends to be maximized close to the true value.
Definition: A sequence $\hat\theta_n$ of estimators of $\theta$ is consistent if $\hat\theta_n$ converges weakly (or strongly) to $\theta$.
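For example, by the law of large numbers $\bar X_n$ is a consistent sequence of estimators of $\mu = E(X_1)$.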
Proto-theorem: In regular problems the mle is consistent.
Now let us study the shape of the log likelihood near the true value of $\theta_0$ under the assumption that $\hat\theta$ is a root of the likelihood equations close to $\theta_0$. We use Taylor expansion to write, for a 1-dimensional parameter $\theta$,
$$0 = U(\hat\theta) = U(\theta_0) + (\hat\theta - \theta_0)U'(\theta_0) + \frac{(\hat\theta - \theta_0)^2}{2}\,U''(\tilde\theta)$$
for some $\tilde\theta$ between $\theta_0$ and $\hat\theta$, where $U(\theta) = \partial\ell(\theta)/\partial\theta$ is the score function.
(This form of the remainder in Taylor's theorem is not valid for a multivariate $\theta$.) The derivatives of $U$ are each sums of $n$ terms and so should each be roughly proportional to $n$ in size. The second derivative term is multiplied by the square of the small quantity $\hat\theta - \theta_0$, so it should be negligible compared with the first derivative term.
If we ignore the second derivative term we get
$$\hat\theta - \theta_0 \approx -\frac{U(\theta_0)}{U'(\theta_0)}.$$
In the normal case ($X_i$ iid $N(\mu,\sigma^2)$ with $\sigma$ known) we have $U(\mu) = \sum_i (X_i - \mu)/\sigma^2$, which is linear in $\mu$; here $U'' \equiv 0$ and the approximation is exact: $\hat\mu - \mu_0 = -U(\mu_0)/U'(\mu_0) = \bar X - \mu_0$.
In general, $U(\theta_0)$ has mean 0 and approximately a normal distribution. Here is how we check the mean:
$$E_{\theta_0}\{U(\theta_0)\} = \sum_{i=1}^n E_{\theta_0}\!\left[\frac{\partial \log f(X_i,\theta_0)}{\partial\theta}\right] = n\int \frac{\partial f(x,\theta)/\partial\theta}{f(x,\theta)}\,f(x,\theta)\,dx\,\Big|_{\theta=\theta_0} = n\,\frac{\partial}{\partial\theta}\int f(x,\theta)\,dx\,\Big|_{\theta=\theta_0} = n\,\frac{\partial}{\partial\theta}\,1 = 0.$$
Notice that I have interchanged the order of differentiation and integration at one point. This step is usually justified by applying the dominated convergence theorem to the definition of the derivative. The same tactic can be applied by differentiating the identity which we just proved,
$$\int \frac{\partial \log f(x,\theta)}{\partial\theta}\,f(x,\theta)\,dx = 0.$$
Differentiating again with respect to $\theta$ (and interchanging again) gives
$$\int \frac{\partial^2 \log f(x,\theta)}{\partial\theta^2}\,f(x,\theta)\,dx + \int \left(\frac{\partial \log f(x,\theta)}{\partial\theta}\right)^{2} f(x,\theta)\,dx = 0.$$
Definition: The Fisher information (in one observation) is
$$I(\theta) = E_\theta\!\left[\left(\frac{\partial \log f(X,\theta)}{\partial\theta}\right)^{2}\right],$$
so that $\mathrm{Var}_\theta\{U(\theta)\} = nI(\theta)$. The idea is that $I$ is a measure of how curved the log likelihood tends to be at the true value of $\theta$. Big curvature means precise estimates. Our identity above is
$$I(\theta) = -E_\theta\!\left[\frac{\partial^2 \log f(X,\theta)}{\partial\theta^2}\right] = -\frac{1}{n}E_\theta\{U'(\theta)\}.$$
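For example, in the Poisson$(\lambda)$ model,
$$\log f(x,\lambda) = x\log\lambda - \lambda - \log(x!), \qquad \frac{\partial \log f}{\partial\lambda} = \frac{x}{\lambda} - 1, \qquad \frac{\partial^2 \log f}{\partial\lambda^2} = -\frac{x}{\lambda^2},$$
and both formulas give the same answer: $E_\lambda[(X/\lambda - 1)^2] = \mathrm{Var}_\lambda(X)/\lambda^2 = 1/\lambda$ and $E_\lambda[X/\lambda^2] = 1/\lambda$, so $I(\lambda) = 1/\lambda$.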
Now we return to our Taylor expansion approximation
$$\hat\theta - \theta_0 \approx -\frac{U(\theta_0)}{U'(\theta_0)}.$$
We have shown that $U(\theta_0)$ is a sum of iid mean 0 random variables, each with variance $I(\theta_0)$. The central limit theorem thus proves that
$$\frac{U(\theta_0)}{\sqrt{n}} \Rightarrow N(0, I(\theta_0)).$$
Next observe that, by the law of large numbers and the identity above,
$$-\frac{U'(\theta_0)}{n} = -\frac{1}{n}\sum_{i=1}^n \frac{\partial^2 \log f(X_i,\theta_0)}{\partial\theta^2} \to I(\theta_0).$$
Combining these two facts (via Slutsky's theorem) gives
$$\sqrt{n}\,(\hat\theta - \theta_0) \approx \frac{U(\theta_0)/\sqrt{n}}{-U'(\theta_0)/n} \Rightarrow N\!\left(0, \frac{1}{I(\theta_0)}\right).$$
Summary
In regular families:
$$\hat\theta \to \theta_0 \quad\text{and}\quad \sqrt{n}\,(\hat\theta - \theta_0) \Rightarrow N\!\left(0, \frac{1}{I(\theta_0)}\right), \quad\text{i.e.}\quad \hat\theta \approx N\!\left(\theta_0, \frac{1}{nI(\theta_0)}\right).$$
We usually simply say that the mle is consistent and asymptotically normal with an asymptotic variance which is the inverse of the Fisher information. This assertion is actually valid for vector valued $\theta$, where now $I$ is a matrix with $ij$th entry
$$I_{ij}(\theta) = E_\theta\!\left[\frac{\partial \log f(X,\theta)}{\partial\theta_i}\,\frac{\partial \log f(X,\theta)}{\partial\theta_j}\right] = -E_\theta\!\left[\frac{\partial^2 \log f(X,\theta)}{\partial\theta_i\,\partial\theta_j}\right].$$
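To see the summary in action, here is a minimal simulation sketch, assuming the Exponential$(\lambda)$ model, where $\hat\lambda = 1/\bar X$ and $I(\lambda) = 1/\lambda^2$, so the asymptotic standard deviation of $\hat\lambda$ is $\lambda/\sqrt{n}$; the numbers chosen are purely illustrative.

```python
# Simulation sketch: the MLE of the rate lambda in an Exponential(lambda)
# model should be approximately N(lambda, lambda^2 / n),
# since I(lambda) = 1 / lambda^2 per observation.
import numpy as np

rng = np.random.default_rng(0)
lam, n, reps = 2.0, 200, 5000

# Draw reps samples of size n; the MLE of the rate is 1 / sample mean.
samples = rng.exponential(scale=1.0 / lam, size=(reps, n))
mle = 1.0 / samples.mean(axis=1)

print("empirical sd of mle:", mle.std())              # observed spread
print("theoretical sd:", lam / np.sqrt(n))            # lambda / sqrt(n)
```

With $n = 200$ the two standard deviations should agree to roughly two decimal places.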
Estimating Equations
The same ideas arise whenever estimates are derived by solving some equation. Example: large sample theory for Generalized Linear Models.
Suppose that for $i = 1,\dots,n$ we have observations of the numbers of cancer cases $Y_i$ in some group of people characterized by values $x_i$ of some covariates. You are supposed to think of $x_i$ as containing variables like age, a dummy for sex, average income, and so on. A parametric regression model for the $Y_i$ might postulate that $Y_i$ has a Poisson distribution with mean $\mu_i$, where the mean depends somehow on the covariate values. Typically we might assume that $g(\mu_i) = x_i\beta$, where $g$ is a so-called link function, often $g(\mu) = \log\mu$ for this case, and $x_i\beta$ is a matrix product with $x_i$ written as a row vector and $\beta$ a column vector. This is supposed to function as a ``linear regression model with Poisson errors''. I will do the special case where $x_i$ is a scalar and $\log\mu_i = \beta x_i$.
The log likelihood is simply
$$\ell(\beta) = \sum_{i=1}^n \{Y_i \log\mu_i - \mu_i - \log(Y_i!)\},$$
and with $\mu_i = e^{\beta x_i}$ the likelihood equation is
$$U(\beta) = \sum_{i=1}^n (Y_i - e^{\beta x_i})\,x_i = 0.$$
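This equation has no closed-form root, so in practice one solves it numerically, for instance by Newton's method. A minimal sketch (the function name poisson_mle and the simulated data are illustrative only):

```python
# Solve the Poisson score equation sum (Y_i - exp(beta*x_i)) x_i = 0
# by Newton's method in the scalar parameter beta.
import numpy as np

def poisson_mle(x, y, beta=0.0, tol=1e-10, max_iter=50):
    for _ in range(max_iter):
        mu = np.exp(beta * x)
        score = np.sum((y - mu) * x)     # U(beta)
        info = np.sum(mu * x**2)         # -U'(beta), the observed information
        step = score / info              # Newton step: beta - U/U'
        beta += step
        if abs(step) < tol:
            break
    return beta

rng = np.random.default_rng(1)
x = rng.uniform(0, 2, size=100)
y = rng.poisson(np.exp(0.7 * x))
print(poisson_mle(x, y))   # should be near the true value 0.7
```

Note that $-U'(\beta) = \sum_i \mu_i x_i^2 > 0$, so the log likelihood is concave in $\beta$ and Newton's method is well behaved here.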
Other estimating equations are possible; people suggest alternatives very often. If the $w_i$ are any set of deterministic weights (even possibly depending on $x_i$) then we could define an estimate $\tilde\beta$ as the root of
$$\sum_{i=1}^n w_i\,(Y_i - e^{\beta x_i}) = 0.$$
Taking $w_i = x_i$ recovers the likelihood equation.
Method of Moments
Basic strategy: set sample moments equal to population moments and solve for the parameters.
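For example, in the Exponential$(\lambda)$ model $E(X) = 1/\lambda$, so setting $\bar X = 1/\lambda$ gives the method of moments estimate $\hat\lambda = 1/\bar X$.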
Definition: The $k$th sample moment (about the origin) is
$$\hat\mu_k' = \frac{1}{n}\sum_{i=1}^n X_i^k.$$
(The central moments are $\hat\mu_k = \frac{1}{n}\sum_{i=1}^n (X_i - \bar X)^k$.)
If we have $p$ parameters we can estimate the parameters $\theta_1,\dots,\theta_p$ by solving the system of $p$ equations:
$$\hat\mu_k' = \mu_k'(\theta_1,\dots,\theta_p), \qquad k = 1,\dots,p,$$
where $\mu_k'(\theta) = E_\theta(X^k)$ is the corresponding population moment.
Gamma Example
The Gamma$(\alpha,\beta)$ density is
$$f(x;\alpha,\beta) = \frac{1}{\beta\,\Gamma(\alpha)}\left(\frac{x}{\beta}\right)^{\alpha-1} e^{-x/\beta}, \qquad x > 0,$$
with $E(X) = \alpha\beta$ and $E(X^2) = \alpha(\alpha+1)\beta^2$.
The method of moments equations are
$$\bar X = \alpha\beta, \qquad \frac{1}{n}\sum_{i=1}^n X_i^2 = \alpha(\alpha+1)\beta^2.$$
Subtracting $\bar X^2$ from the second equation leaves the sample variance $\hat\mu_2 = \alpha\beta^2$, so
$$\hat\beta = \frac{\hat\mu_2}{\bar X}, \qquad \hat\alpha = \frac{\bar X}{\hat\beta} = \frac{\bar X^2}{\hat\mu_2}.$$
These are much easier to solve than the likelihood equations, which involve the digamma function $\psi(\alpha) = \Gamma'(\alpha)/\Gamma(\alpha)$.
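A minimal numerical sketch of these two estimates (the helper name gamma_mom and the simulated data are illustrative only):

```python
# Gamma(alpha, beta) method of moments, using
# mean = alpha * beta and variance = alpha * beta^2.
import numpy as np

def gamma_mom(x):
    xbar = x.mean()
    var = x.var()            # second central moment
    beta = var / xbar        # variance / mean = beta
    alpha = xbar / beta      # mean / beta = alpha
    return alpha, beta

rng = np.random.default_rng(2)
x = rng.gamma(shape=3.0, scale=2.0, size=10_000)
print(gamma_mom(x))   # should be near (3.0, 2.0)
```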