Stat 804
Lecture 14 Notes
Our goal in this lecture is to develop asymptotic distribution theory for the sample autocorrelation function. We let $\rho(k)$ and $\hat\rho(k)$ be the ACF and estimated ACF respectively. We begin by reducing the behaviour of $\hat\rho(k)$ to the behaviour of $\hat C(k)$, the sample autocovariance. Our approach is standard Taylor expansion.
Large sample theory for ratio estimates
Suppose you have pairs of random variables $(X_n, Y_n)$ with $\mathrm{E}(X_n) = \mu_X$ and $\mathrm{E}(Y_n) = \mu_Y$.
We study the large sample behaviour of $X_n/Y_n$ under the
assumption that $\mu_Y$ is not 0. We will see that the case $\mu_X = 0$
results in some simplifications.
Begin by writing
$$\frac{X_n}{Y_n} = \frac{X_n}{\mu_Y(1+\epsilon_n)}$$
where
$$\epsilon_n = \frac{Y_n - \mu_Y}{\mu_Y} .$$
Notice that $\epsilon_n \to 0$
in probability.
We may expand
$$\frac{1}{1+\epsilon_n} = 1 - \epsilon_n + \epsilon_n^2 - \epsilon_n^3 + \cdots$$
and then write
$$\frac{X_n}{Y_n} = \frac{X_n}{\mu_Y}\left(1 - \epsilon_n + \epsilon_n^2 - \epsilon_n^3 + \cdots\right) .$$
We want to compute the mean of this expression term by term and the
variance by using the formula for the variance of the sum and so on.
However, what we really do is truncate the infinite sum at some finite
number of terms and compute moments of the finite sum.
I want to be clear about the distinction; to do so I give an example.
Imagine that $(X_n, Y_n)$ has a bivariate normal distribution with means
$\mu_X$ and $\mu_Y$, variances $\sigma_X^2/n$ and $\sigma_Y^2/n$, and
correlation $\rho_{XY}$ between $X_n$ and $Y_n$. The quantity $X_n/Y_n$
does not have a well defined mean because
$\mathrm{E}\left(|X_n/Y_n|\right) = \infty$. Our expansion is
still valid, however. Stopping the sum at $\epsilon_n^2$ leads to the approximation
$$\frac{X_n}{Y_n} \approx \frac{X_n}{\mu_Y}\left(1 - \epsilon_n + \epsilon_n^2\right) .$$
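As a numerical illustration, here is a small Python sketch (the means, variances, correlation, and seed are arbitrary choices of mine) which draws $(X_n, Y_n)$ directly as bivariate normal with variances proportional to $1/n$ and compares the ratio with the expansion stopped at $\epsilon_n^2$; the typical error shrinks roughly like $n^{-3/2}$, the size of the first neglected term.

```python
import numpy as np

rng = np.random.default_rng(0)
mu_x, mu_y = 2.0, 5.0            # arbitrary means (mu_y well away from 0)
sd_x, sd_y, corr = 1.0, 1.0, 0.5

for n in [10, 100, 1000, 10000]:
    # draw (X_n, Y_n) as bivariate normal with variances sigma^2 / n
    cov = np.array([[sd_x**2, corr * sd_x * sd_y],
                    [corr * sd_x * sd_y, sd_y**2]]) / n
    xy = rng.multivariate_normal([mu_x, mu_y], cov, size=100_000)
    x, y = xy[:, 0], xy[:, 1]
    eps = (y - mu_y) / mu_y
    # expansion stopped at eps^2
    approx = (x / mu_y) * (1 - eps + eps**2)
    err = np.median(np.abs(x / y - approx))
    print(f"n={n:6d}  median |ratio - approximation| = {err:.2e}  "
          f"(n^(3/2) * error = {err * n**1.5:.3f})")
```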
I now want to look at these terms to decide which are big and which are
small. To do so I introduce big $O_p$ notation:
Definition: If $X_n$ is a sequence of random variables and $a_n$ a sequence of
positive constants then we write
$$X_n = O_p(a_n)$$
if, for each $\epsilon > 0$,
there is an $M$ (depending on $\epsilon$ but not
$n$) such that
$$P\left(|X_n| > M a_n\right) \le \epsilon \quad \text{for all } n .$$
The idea is that
$$X_n = O_p(a_n)$$
means that $X_n$ is
proportional in size to $a_n$ with the ``constant of
proportionality'' being a random variable which is not
likely to be too large. We also often have use for notation
indicating that $X_n$ is actually small compared to $a_n$.
Definition: We say
$$X_n = o_p(a_n)$$
if $X_n/a_n$ converges to 0
in probability:
$$\lim_{n\to\infty} P\left(\left|\frac{X_n}{a_n}\right| > \epsilon\right) = 0 \quad \text{for each } \epsilon > 0 .$$
You can manipulate $O_p$ and $o_p$
notation algebraically with a few rules:
- If $c_n$ is a sequence of constants such that $c_n = O(a_n)$ in the usual (non-random) Landau sense, with $a_n > 0$, then $c_n = O_p(a_n)$. We write $O(a_n) = O_p(a_n)$.
- If $X_n = O_p(a_n)$ and $Y_n = O_p(b_n)$ for two sequences $X_n$ and $Y_n$ then $X_n Y_n = O_p(a_n b_n)$. We express this as $O_p(a_n)\,O_p(b_n) = O_p(a_n b_n)$.
- In particular $\left[O_p(a_n)\right]^k = O_p(a_n^k)$.
- $O_p(a_n) + O_p(b_n) = O_p(\max(a_n, b_n))$.
- $o_p(a_n)\,O_p(b_n) = o_p(a_n b_n)$.
- A quantity which is $o_p(a_n)$ is automatically $O_p(a_n)$: $o_p(a_n) = O_p(a_n)$.
- In particular $o_p(a_n) + O_p(a_n) = O_p(a_n)$.
- $o_p(a_n) + o_p(b_n) = o_p(\max(a_n, b_n))$.
These notions extend Landau's $O$ and $o$ notation to
random quantities.
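As an illustration of the idea that a sample mean satisfies $\bar X_n - \mu = O_p(n^{-1/2})$, here is a minimal simulation sketch (the distribution, mean, and seed are arbitrary choices): the 95th percentile of $\sqrt{n}\,|\bar X_n - \mu|$ stays roughly constant as $n$ grows, so a single bound $M$ works for every $n$.

```python
import numpy as np

rng = np.random.default_rng(1)
mu = 3.0   # arbitrary true mean; exponential samples just for variety

for n in [10, 100, 1000, 10000]:
    xbar = rng.exponential(mu, size=(20_000, n)).mean(axis=1)
    scaled = np.sqrt(n) * np.abs(xbar - mu)
    print(f"n={n:6d}  95th percentile of sqrt(n)|Xbar - mu| = "
          f"{np.quantile(scaled, 0.95):.3f}")
```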
Example: In our ratio example we have
$$X_n - \mu_X = O_p(n^{-1/2})$$
and
$$\epsilon_n = O_p(n^{-1/2}) .$$
In our geometric expansion
$$\epsilon_n^k = O_p(n^{-k/2}) .$$
Look first at the expansion stopped at $\epsilon_n^2$. We
have
$$\frac{X_n}{Y_n} = \frac{X_n}{\mu_Y} - \frac{X_n\epsilon_n}{\mu_Y} + \frac{X_n\epsilon_n^2}{\mu_Y(1+\epsilon_n)} = O_p(1) + O_p(n^{-1/2}) + O_p(n^{-1}) .$$
(The three terms on the RHS of the first line are being
described in terms of roughly how big each is.)
If we stop at $\epsilon_n$ we get
$$\frac{X_n}{Y_n} = \frac{X_n}{\mu_Y} - \frac{X_n\epsilon_n}{\mu_Y(1+\epsilon_n)} = \frac{X_n}{\mu_Y} + O_p(n^{-1/2}) .$$
Keeping only terms of order $O_p(1)$
we find
$$\frac{X_n}{Y_n} = \frac{X_n}{\mu_Y} + O_p(n^{-1/2}) .$$
We now take expected values and discover that up to an error
of order $n^{-1/2}$
$$\mathrm{E}\left(\frac{X_n}{Y_n}\right) \approx \frac{\mu_X}{\mu_Y} .$$
BUT you are warned that what is really meant is simply that
there is a random variable, namely $X_n/\mu_Y - \mu_X/\mu_Y$, which is approximately
$X_n/Y_n - \mu_X/\mu_Y$ (neglecting
something which is probably proportional in size to $n^{-1/2}$)
and whose expected value is 0. For the normal example the remainder term
in this expansion, that is, the term
$$-\frac{X_n\epsilon_n}{\mu_Y(1+\epsilon_n)} ,$$
is probably small but
its expected value is not defined.
To keep terms up to order $n^{-1}$
we have to keep terms out to $\epsilon_n^2$.
(In general
$$\frac{X_n\epsilon_n^k}{\mu_Y} = O_p(n^{-k/2}) .$$
For $k \ge 3$ this is $o_p(n^{-1})$
but for $k = 2$ the
term is not negligible.)
If we retain terms out to $\epsilon_n^2$ then we get
$$\frac{X_n}{Y_n} = \frac{X_n}{\mu_Y}\left(1 - \epsilon_n + \epsilon_n^2\right) + O_p(n^{-3/2}) .$$
Taking expected values here we get
$$\mathrm{E}\left(\frac{X_n}{Y_n}\right) \approx \frac{\mu_X}{\mu_Y} - \frac{\mathrm{E}(X_n\epsilon_n)}{\mu_Y} + \frac{\mathrm{E}(X_n\epsilon_n^2)}{\mu_Y}$$
up to terms of order $n^{-3/2}$. In the normal case
we get
$$\mathrm{E}\left(\frac{X_n}{Y_n}\right) \approx \frac{\mu_X}{\mu_Y} - \frac{\mathrm{Cov}(X_n, Y_n)}{\mu_Y^2} + \frac{\mu_X\,\mathrm{Var}(Y_n)}{\mu_Y^3} = \frac{\mu_X}{\mu_Y} - \frac{\rho_{XY}\sigma_X\sigma_Y}{n\mu_Y^2} + \frac{\mu_X\sigma_Y^2}{n\mu_Y^3} .$$
In order to compute the approximate variance we ought to compute the second moment
of
$$\frac{X_n}{\mu_Y}\left(1 - \epsilon_n + \epsilon_n^2\right)$$
and subtract the square of the first moment.
Imagine you had a random variable of the form
$$W_n = A_0 + n^{-1/2}A_1 + n^{-1}A_2 + \cdots$$
where I assume that the $A_i$ do not depend on $n$.
The mean, taken term by term, would be of the form
$$\mathrm{E}(W_n) = a_0 + n^{-1/2}a_1 + n^{-1}a_2 + \cdots$$
and the second moment of the form
$$\mathrm{E}(W_n^2) = b_0 + n^{-1/2}b_1 + n^{-1}b_2 + \cdots .$$
This leads to a variance of the form
$$\mathrm{Var}(W_n) = (b_0 - a_0^2) + n^{-1/2}(b_1 - 2a_0a_1) + n^{-1}(b_2 - a_1^2 - 2a_0a_2) + \cdots .$$
Our expansion above gave, to order $n^{-1/2}$,
$$\frac{X_n}{Y_n} \approx \frac{X_n}{\mu_Y} - \frac{\mu_X\epsilon_n}{\mu_Y} = \frac{\mu_X}{\mu_Y} + \frac{X_n - \mu_X}{\mu_Y} - \frac{\mu_X(Y_n - \mu_Y)}{\mu_Y^2}$$
and
$$\mathrm{E}\left(\frac{X_n}{Y_n}\right) \approx \frac{\mu_X}{\mu_Y} ,$$
from which we get the approximate variance
$$\mathrm{Var}\left(\frac{X_n}{Y_n}\right) \approx \frac{\mathrm{Var}(X_n)}{\mu_Y^2} - \frac{2\mu_X\,\mathrm{Cov}(X_n, Y_n)}{\mu_Y^3} + \frac{\mu_X^2\,\mathrm{Var}(Y_n)}{\mu_Y^4} .$$
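Here is a Monte Carlo sketch of these approximations (all parameter values, and the seed, are arbitrary choices of mine). To respect the warning above about undefined moments, the mean formula is checked against the truncated expansion, while the variance formula is compared with the sample variance of the ratio itself, which behaves well in practice when $\mu_Y$ is far from 0 even though its exact moments do not exist.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
mu_x, mu_y = 2.0, 5.0
sd_x, sd_y, corr = 1.0, 1.5, 0.6
cov_xy = corr * sd_x * sd_y

cov = np.array([[sd_x**2, cov_xy], [cov_xy, sd_y**2]]) / n
xy = rng.multivariate_normal([mu_x, mu_y], cov, size=500_000)
x, y = xy[:, 0], xy[:, 1]
eps = (y - mu_y) / mu_y

# mean of the expansion truncated at eps^2, versus the formula above
trunc = (x / mu_y) * (1 - eps + eps**2)
mean_formula = mu_x / mu_y - cov_xy / (n * mu_y**2) + mu_x * sd_y**2 / (n * mu_y**3)
print("simulated mean of truncated expansion:", trunc.mean())
print("approximation formula:               ", mean_formula)

# variance of the ratio versus the delta-method formula
var_formula = (sd_x**2 / mu_y**2
               - 2 * mu_x * cov_xy / mu_y**3
               + mu_x**2 * sd_y**2 / mu_y**4) / n
print("simulated variance of the ratio:", (x / y).var())
print("approximate variance formula:   ", var_formula)
```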
Now I want to apply these ideas to estimation of $\rho(k)$. We make
$X_n$ be $\hat C(k)$ and $Y_n$ be $\hat C(0)$ (and replace $n$ by
$T$, the length of the series). Our first order approximation to
$\hat\rho(k) = \hat C(k)/\hat C(0)$ is
$$\hat\rho(k) \approx \frac{\hat C(k)}{C(0)} .$$
Our second order approximation would be
$$\hat\rho(k) \approx \frac{\hat C(k)}{C(0)}\left(1 - \frac{\hat C(0) - C(0)}{C(0)}\right) = \frac{\hat C(k)}{C(0)} - \frac{\hat C(k)\left(\hat C(0) - C(0)\right)}{C(0)^2} .$$
I now evaluate means and variances in the special case where $\hat C$ has
been calculated using a known mean of 0. That is,
$$\hat C(k) = \frac{1}{T}\sum_{t=1}^{T-k} X_t X_{t+k} .$$
Then
$$\mathrm{E}\left(\hat C(k)\right) = \frac{T-k}{T}\,C(k)$$
so
$$\mathrm{E}\left(\hat C(k)\right) = C(k) + O(1/T) .$$
To compute the variance we begin with the second moment which is
$$\mathrm{E}\left(\hat C(k)^2\right) = \frac{1}{T^2}\sum_{s=1}^{T-k}\sum_{t=1}^{T-k} \mathrm{E}\left(X_s X_{s+k} X_t X_{t+k}\right) .$$
The expectations in question involve the fourth order product
moments of $X$ and depend on the distribution of the $X$'s and
not just on $C$. However, for the interesting case of white
noise, we can compute the expected value. For $k \neq 0$ you may assume
that $s < t$ or $s = t$ since the cases $s > t$ can be figured out by swapping
$s$ and $t$ in the case $s < t$. For $s < t$ the variable $X_s$ is independent
of all 3 of $X_{s+k}$, $X_t$ and $X_{t+k}$. Thus the expectation factors
into something containing the factor
$\mathrm{E}(X_s) = 0$. For $s = t$,
we get
$$\mathrm{E}\left(X_s^2 X_{s+k}^2\right) = \sigma^4 ,$$
and so the second
moment is
$$\mathrm{E}\left(\hat C(k)^2\right) = \frac{(T-k)\sigma^4}{T^2} .$$
This is also the variance since, for $k \neq 0$ and for white noise,
$C(k) = 0$ and so $\mathrm{E}(\hat C(k)) = 0$.
For $k = 0$ and $s \neq t$ the expectation is simply $\sigma^4$
while for $s = t$ we get
$$\mathrm{E}\left(X_t^4\right) \equiv \mu_4 .$$
Thus the variance of the sample variance (when the mean is known
to be 0) is
$$\mathrm{Var}\left(\hat C(0)\right) = \frac{T\mu_4 + T(T-1)\sigma^4}{T^2} - \sigma^4 = \frac{\mu_4 - \sigma^4}{T} .$$
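A brief simulation check of these two variance formulas (my choice of shifted exponential innovations, for which $\sigma^2 = 1$ and $\mu_4 = 9$, is arbitrary; any white noise with known fourth moment would do):

```python
import numpy as np

rng = np.random.default_rng(3)
T, reps, k = 100, 50_000, 3
sigma2, mu4 = 1.0, 9.0   # variance and 4th central moment of Exp(1) - 1

# white noise with known mean 0: shifted exponential innovations
x = rng.exponential(1.0, size=(reps, T)) - 1.0

# sample autocovariances using the known mean 0
c0 = (x * x).sum(axis=1) / T
ck = (x[:, :T - k] * x[:, k:]).sum(axis=1) / T

print("Var(C_hat(k)), k != 0:", ck.var(), " theory (T-k) sigma^4 / T^2 =",
      (T - k) * sigma2**2 / T**2)
print("Var(C_hat(0)):        ", c0.var(), " theory (mu4 - sigma^4) / T =",
      (mu4 - sigma2**2) / T)
```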
For the normal distribution the fourth moment is given simply
by $\mu_4 = 3\sigma^4$.
Having computed the variance it is usual to look at the large
sample distribution theory. For $k = 0$ the usual central limit theorem
applies to
$$\hat C(0) = \frac{1}{T}\sum_{t=1}^{T} X_t^2$$
(in the case of white noise) to prove that
$$\frac{\sqrt{T}\left(\hat C(0) - \sigma^2\right)}{\sqrt{\mu_4 - \sigma^4}} \Rightarrow N(0,1) .$$
The presence of $\mu_4$ in the formula shows that the approximation is
quite sensitive to the assumption of normality.
For $k \neq 0$ the theorem needed is called the $m$-dependent central
limit theorem; it shows that
$$\frac{\sqrt{T}\,\hat C(k)}{\sigma^2} \Rightarrow N(0,1) .$$
In each of these cases the assertion is simply that the statistic
in question divided by its standard deviation has an approximate
normal distribution.
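To see the sensitivity to $\mu_4$ concretely, the following sketch (the distributions, sizes, and seed are arbitrary choices) compares the simulated standard deviation of $\sqrt{T}(\hat C(0) - \sigma^2)$ with $\sqrt{\mu_4 - \sigma^4}$ for normal and for shifted exponential white noise; the normal-theory value $\sqrt{2}\,\sigma^2$ is far off in the second case.

```python
import numpy as np

rng = np.random.default_rng(4)
T, reps = 200, 20_000

for name, draw, mu4 in [
        ("normal", lambda: rng.normal(size=(reps, T)), 3.0),
        ("exponential - 1", lambda: rng.exponential(size=(reps, T)) - 1.0, 9.0)]:
    x = draw()                     # white noise, variance 1, known mean 0
    c0 = (x * x).mean(axis=1)      # C_hat(0)
    sd = np.sqrt(T) * c0.std()
    print(f"{name:16s} sd of sqrt(T)(C_hat(0)-1) = {sd:.3f}, "
          f"sqrt(mu4 - sigma^4) = {np.sqrt(mu4 - 1.0):.3f}, "
          f"sqrt(2) = {np.sqrt(2):.3f}")
```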
The sample autocorrelation at lag $k$ is
$$\hat\rho(k) = \frac{\hat C(k)}{\hat C(0)} .$$
For $k \neq 0$ we can apply Slutsky's theorem to conclude that, for white noise,
$$\sqrt{T}\,\hat\rho(k) = \frac{\sqrt{T}\,\hat C(k)}{\hat C(0)} \Rightarrow N(0,1) .$$
This justifies drawing lines at
$$\pm\frac{1.96}{\sqrt{T}}$$
to carry
out a 5% level test of the hypothesis that the series is white
noise based on the $k$th sample autocorrelation.
It is possible to verify that subtraction of $\bar X$ from the
observations before computing the sample covariances does not
change the large sample approximations, although it does affect
the exact formulas for moments.
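In practice the recipe looks like the following sketch (series length, lag range, and seed are arbitrary choices): compute the sample ACF after subtracting $\bar X$ and count how many lags fall outside the $\pm 1.96/\sqrt{T}$ bands; for true white noise about 1 lag in 20 should, on average.

```python
import numpy as np

rng = np.random.default_rng(5)
T, max_lag = 500, 20

x = rng.normal(size=T)
xc = x - x.mean()                          # subtract the sample mean
c0 = (xc * xc).sum() / T
rho = np.array([(xc[:T - k] * xc[k:]).sum() / T / c0
                for k in range(1, max_lag + 1)])

band = 1.96 / np.sqrt(T)
print("sample ACF, lags 1..5:", np.round(rho[:5], 3))
print(f"band = +/- {band:.3f}; lags outside band:",
      int((np.abs(rho) > band).sum()), "of", max_lag)
```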
When the series is actually not white noise the situation is
more complicated. Consider as an example the model
$$X_t = \phi X_{t-1} + \epsilon_t$$
with $\epsilon$ being white noise. Substituting the representation
$$X_t = \sum_{j=0}^{\infty} \phi^j \epsilon_{t-j}$$
we find that
$$\mathrm{E}\left(X_s X_{s+k} X_t X_{t+k}\right) = \sum_{a=0}^{\infty}\sum_{b=0}^{\infty}\sum_{c=0}^{\infty}\sum_{d=0}^{\infty} \phi^{a+b+c+d}\, \mathrm{E}\left(\epsilon_{s-a}\,\epsilon_{s+k-b}\,\epsilon_{t-c}\,\epsilon_{t+k-d}\right) .$$
The expectation is 0 unless either all 4 indices on the
$\epsilon$'s are the same or the indices come in two pairs of equal
values. The first case requires $s-a = s+k-b$ and $t-c = t+k-d$ and then
$s-a = t-c$. The second case requires one of three pairs of equalities:
$$s-a = s+k-b \quad\text{and}\quad t-c = t+k-d$$
or
$$s-a = t-c \quad\text{and}\quad s+k-b = t+k-d$$
or
$$s-a = t+k-d \quad\text{and}\quad s+k-b = t-c$$
along with the restriction
that the four indices not all be equal. The actual moment is then
$\mu_4 = \mathrm{E}(\epsilon_t^4)$ when all four indices are equal and $\sigma^4$ when there
are two pairs. It is now possible to do the sum using geometric
series identities and compute the variance of
$\hat C(k)$.
It is not particularly enlightening to finish the calculation in
detail.
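Rather than finish the algebra, a quick simulation (with an arbitrary value of $\phi$ and arbitrary sizes) makes the point that the white-noise bands no longer apply: for an AR(1) series the variance of $\sqrt{T}\,\hat\rho(1)$ is noticeably different from 1.

```python
import numpy as np

rng = np.random.default_rng(6)
T, reps, phi = 200, 20_000, 0.7

# simulate AR(1): X_t = phi X_{t-1} + eps_t, discarding a burn-in
eps = rng.normal(size=(reps, T + 100))
x = np.zeros_like(eps)
for t in range(1, eps.shape[1]):
    x[:, t] = phi * x[:, t - 1] + eps[:, t]
x = x[:, 100:]

xc = x - x.mean(axis=1, keepdims=True)
c0 = (xc * xc).sum(axis=1) / T
c1 = (xc[:, :-1] * xc[:, 1:]).sum(axis=1) / T
rho1 = c1 / c0

print("Var of sqrt(T) * rho_hat(1):", T * rho1.var(),
      " (white-noise theory would give 1)")
```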
There are versions of the central limit theorem called
mixing central limit theorems which can be used for ARMA($p,q$) processes
in order to conclude that
$$\frac{\hat C(k) - \mathrm{E}\left(\hat C(k)\right)}{\sqrt{\mathrm{Var}\left(\hat C(k)\right)}}$$
has asymptotically a standard normal distribution and that the same
is true when the standard deviation in the denominator is replaced by an
estimate. To get from this to distribution theory for the
sample autocorrelation is easiest when the true autocorrelation is 0.
The general tactic is the $\delta$ method, or Taylor expansion. In this
case for each sample size $T$ you have two estimates, say $A_T$ and $B_T$,
of two parameters, say $\alpha$ and $\beta$. You want distribution theory for the ratio
$A_T/B_T$. The idea is to write
$$\frac{A_T}{B_T} = f(A_T, B_T)$$
where
$$f(x, y) = \frac{x}{y}$$
and then make use of the fact that $A_T$ and $B_T$ are
close to the parameters they are estimates of. In our case $A_T$
is the sample autocovariance at lag $k$, which is close to the
true autocovariance $C(k)$, while the denominator $B_T$ is the
sample autocovariance at lag 0, a consistent estimator of $C(0)$.
Write
$$f(A_T, B_T) = f(\alpha, \beta) + (A_T - \alpha)\, f_1(\alpha, \beta) + (B_T - \beta)\, f_2(\alpha, \beta) + \text{remainder} .$$
If we can use a central limit theorem to conclude
that
$$\sqrt{T}\left(A_T - \alpha,\; B_T - \beta\right)$$
has an approximately bivariate normal distribution
and if we can neglect the remainder term then
$$\sqrt{T}\left(f(A_T, B_T) - f(\alpha, \beta)\right) \approx \sqrt{T}\left[(A_T - \alpha)\, f_1(\alpha, \beta) + (B_T - \beta)\, f_2(\alpha, \beta)\right]$$
has approximately a normal distribution. The notation here is that
$f_i$ denotes differentiation with respect to the $i$th argument
of $f$. For
$$f(x, y) = \frac{x}{y}$$
we have
$$f_1(x, y) = \frac{1}{y}$$
and
$$f_2(x, y) = -\frac{x}{y^2} .$$
When $\alpha = C(k) = 0$ the term involving $f_2$ vanishes and we
simply get the assertion that
$$\sqrt{T}\,\hat\rho(k) = \frac{\sqrt{T}\,\hat C(k)}{\hat C(0)}$$
has the same asymptotic normal distribution as
$$\frac{\sqrt{T}\,\hat C(k)}{C(0)} .$$
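A quick simulation check of this last assertion for white noise (sample size, lag, and seed are arbitrary choices): across replicates, $\sqrt{T}\,\hat\rho(k)$ and $\sqrt{T}\,\hat C(k)/C(0)$ differ by an amount which is small compared with their common standard deviation.

```python
import numpy as np

rng = np.random.default_rng(7)
T, reps, k = 500, 20_000, 2
sigma2 = 1.0                       # C(0) for standard normal white noise

x = rng.normal(size=(reps, T))
c0 = (x * x).sum(axis=1) / T       # sample autocovariance at lag 0, known mean 0
ck = (x[:, :T - k] * x[:, k:]).sum(axis=1) / T

a = np.sqrt(T) * ck / c0           # sqrt(T) * rho_hat(k)
b = np.sqrt(T) * ck / sigma2       # sqrt(T) * C_hat(k) / C(0)
print("sd of a:", a.std(), " sd of b:", b.std(),
      " typical |a - b|:", np.abs(a - b).mean())
```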
Similar ideas can be used for the sample partial ACF.
Portmanteau tests
In order to test the hypothesis that a series is white noise using the
distribution theory just given, you have to produce a single statistic
to base your test on. Rather than pick a single value of $k$, the
suggestion has been made to consider a sum of squares or a weighted
sum of squares of the $\hat\rho(k)$.
A typical statistic is
$$Q = T\sum_{k=1}^{K} \hat\rho^2(k)$$
which, for white noise, has approximately a $\chi^2_K$ distribution.
(This fact relies on an extension of the previous computations to conclude
that
$$\sqrt{T}\left(\hat\rho(1), \ldots, \hat\rho(K)\right)$$
has approximately a standard multivariate normal distribution. This, in turn, relies
on computation of the covariance between
$\hat C(j)$ and
$\hat C(k)$.)
When the parameters in an ARMA($p,q$) model have been estimated by maximum likelihood
the degrees of freedom must be adjusted to $K - p - q$. The resulting
test is the Box-Pierce test; a refined version which takes better account
of finite sample properties is the Box-Pierce-Ljung test. S-Plus plots the
$P$-values from these tests for 1 through 10 degrees of freedom as
part of the output of arima.diag.
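The following sketch shows how such a statistic can be computed directly (the lag count $K$, the simulated series, and the use of scipy for the $\chi^2$ tail probability are arbitrary choices of mine); for residuals from a fitted ARMA($p,q$) the degrees of freedom would be taken as $K - p - q$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
T, K = 300, 10

x = rng.normal(size=T)
xc = x - x.mean()
c0 = (xc * xc).sum() / T
rho = np.array([(xc[:T - k] * xc[k:]).sum() / T / c0 for k in range(1, K + 1)])

q_bp = T * (rho**2).sum()                                        # Box-Pierce
q_lb = T * (T + 2) * (rho**2 / (T - np.arange(1, K + 1))).sum()  # Ljung-Box refinement
df = K   # use K - p - q for residuals of a fitted ARMA(p, q)
print(f"Box-Pierce Q = {q_bp:.2f}, p-value = {1 - stats.chi2.cdf(q_bp, df):.3f}")
print(f"Ljung-Box  Q = {q_lb:.2f}, p-value = {1 - stats.chi2.cdf(q_lb, df):.3f}")
```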
Richard Lockhart
2001-09-30