The ARIMA Procedure |
The ARIMA procedure primarily uses the computational methods outlined by Box and Jenkins. Marquardt's method is used for the nonlinear least-squares iterations. Numerical approximations of the derivatives of the sum-of-squares function are taken using a fixed delta (controlled by the DELTA= option).
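To illustrate the fixed-delta idea, the following sketch (in Python, and not the procedure's internal code) approximates the gradient of an arbitrary sum-of-squares function by forward differences; the function name and the default step of 0.001 are illustrative assumptions, not values taken from the documentation.

    import numpy as np

    # Illustrative sketch only; not PROC ARIMA's implementation.
    def finite_difference_gradient(sum_of_squares, params, delta=0.001):
        """Forward-difference gradient of sum_of_squares at params,
        using the same fixed step delta for every parameter."""
        base = sum_of_squares(params)
        grad = np.zeros_like(params, dtype=float)
        for i in range(len(params)):
            shifted = params.copy()
            shifted[i] += delta
            grad[i] = (sum_of_squares(shifted) - base) / delta
        return grad

    # Example with a toy quadratic objective
    grad = finite_difference_gradient(lambda p: np.sum(p ** 2), np.array([0.5, -0.2]))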
The methods do not always converge successfully for a given set of data, particularly if the starting values for the parameters are not close to the least-squares estimates.
When preliminary estimation is not performed by PROC ARIMA, the initial values of the coefficients for any given autoregressive or moving average factor are set to 0.1 if the degree of the polynomial associated with the factor is 9 or less. Otherwise, the coefficients are determined by expanding the polynomial (1 - 0.1B) to an appropriate power using a recursive algorithm.
These preliminary estimates are the starting values in an iterative algorithm to compute estimates of the parameters.
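For illustration only, the following Python sketch mirrors the starting-value scheme just described; the sign convention for the returned coefficients and the use of repeated convolution for the expansion are assumptions, not details taken from the procedure's source.

    import numpy as np

    # Illustrative sketch only; the sign convention is an assumption.
    def starting_values(degree):
        """Starting coefficients for an AR or MA factor of the given degree:
        0.1 for every lag when the degree is 9 or less, otherwise the
        coefficients of (1 - 0.1B)**degree with the constant term dropped."""
        if degree <= 9:
            return np.full(degree, 0.1)
        coef = np.array([1.0])
        for _ in range(degree):              # expand (1 - 0.1B)**degree recursively
            coef = np.convolve(coef, [1.0, -0.1])
        return -coef[1:]                     # coefficients of B, B**2, ..., B**degree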
The METHOD=ML option produces maximum likelihood estimates. The likelihood function is maximized via nonlinear least squares using Marquardt's method. Maximum likelihood estimates are more expensive to compute than the conditional least-squares estimates; however, they may be preferable in some cases (Ansley and Newbold 1980; Davidson 1981).
The maximum likelihood estimates are computed as follows. Let the univariate ARMA model be

$$W_t - \mu_t = \frac{\theta(B)}{\phi(B)} a_t$$

where $a_t$ is an independent sequence of normally distributed innovations with mean 0 and variance $\sigma^2$. Here $\mu_t$ is the mean parameter $\mu$ plus the transfer function inputs. The log likelihood function can be written as follows:

$$-\frac{1}{2\sigma^2} x' \Omega^{-1} x - \frac{1}{2} \ln|\Omega| - \frac{n}{2} \ln \sigma^2$$

In this equation, n is the number of observations, $\sigma^2 \Omega$ is the variance of $x$ as a function of the $\phi$ and $\theta$ parameters, and $|\cdot|$ denotes the determinant. The vector $x$ is the time series $W_t$ minus the structural part of the model $\mu_t$, written as a column vector, as follows:

$$x = (W_1 - \mu_1,\; W_2 - \mu_2,\; \ldots,\; W_n - \mu_n)'$$

The maximum likelihood estimate (MLE) of $\sigma^2$ is

$$s^2 = \frac{1}{n}\, x' \Omega^{-1} x$$
Note that the default estimator of the variance divides by n-r, where r is the number of parameters in the model, instead of by n. Specifying the NODF option causes a divisor of n to be used.
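As a minimal numerical sketch of the formulas above (assuming a data vector x and the scaled covariance matrix Omega are already available as numpy arrays; this is not how the procedure is implemented internally):

    import numpy as np

    # Illustrative sketch only; Omega plays the role of the matrix
    # such that sigma^2 * Omega is the variance of x.
    def arma_log_likelihood(x, Omega, sigma2):
        """Log likelihood, up to additive constants."""
        n = len(x)
        quad = x @ np.linalg.solve(Omega, x)       # x' Omega^{-1} x
        _, logdet = np.linalg.slogdet(Omega)       # ln |Omega|
        return -quad / (2.0 * sigma2) - 0.5 * logdet - 0.5 * n * np.log(sigma2)

    def variance_estimate(x, Omega, r, nodf=False):
        """Variance estimate: the default divisor is n - r; NODF uses n."""
        n = len(x)
        quad = x @ np.linalg.solve(Omega, x)
        return quad / (n if nodf else n - r)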
The log likelihood concentrated with respect to $\sigma^2$ can be taken up to additive constants as

$$-\frac{n}{2} \ln\left( x' \Omega^{-1} x \right) - \frac{1}{2} \ln|\Omega|$$
Let $H$ be the lower triangular matrix with positive elements on the diagonal such that $HH' = \Omega$. Let $e$ be the vector $H^{-1}x$. The concentrated log likelihood with respect to $\sigma^2$ can now be written as

$$-\frac{n}{2} \ln\left( e'e \right) - \ln|H|$$

or

$$-\frac{n}{2} \ln\left( |H|^{1/n}\, e'e\, |H|^{1/n} \right)$$

The MLE is produced by using a Marquardt algorithm to minimize the following sum of squares:

$$|H|^{1/n}\, e'e\, |H|^{1/n}$$
The subsequent analysis of the residuals is done using e as the vector of residuals.
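The following sketch evaluates the concentrated criterion in the same terms, again assuming Omega and x are available; using numpy's Cholesky factorization and a general solve is an implementation convenience, not a statement about the procedure's internals.

    import numpy as np

    # Illustrative sketch only.
    def concentrated_objective(x, Omega):
        """Return e = H^{-1} x and the quantity |H|^(1/n) e'e |H|^(1/n)
        that the Marquardt iterations minimize."""
        n = len(x)
        H = np.linalg.cholesky(Omega)          # lower triangular, H H' = Omega
        e = np.linalg.solve(H, x)              # e = H^{-1} x
        det_H = np.prod(np.diag(H))            # |H| of a triangular matrix
        objective = det_H ** (1.0 / n) * (e @ e) * det_H ** (1.0 / n)
        return e, objective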
The METHOD=ULS option produces unconditional least-squares estimates. The ULS method is also referred to as the exact least-squares (ELS) method. For METHOD=ULS, the estimates minimize

$$\sum_{t=1}^{n} \tilde{a}_t^2 = \sum_{t=1}^{n} \left( x_t - C_t V_t^{-1} (x_1, \ldots, x_{t-1})' \right)^2$$

where $C_t$ is the covariance matrix of $x_t$ and $(x_1, \ldots, x_{t-1})$, and $V_t$ is the variance matrix of $(x_1, \ldots, x_{t-1})$. In fact, $\sum_{t=1}^{n} \tilde{a}_t^2$ is the same as $x' \Omega^{-1} x$ and, hence, $e'e$. Therefore, the unconditional least-squares estimates are obtained by minimizing the sum of squared residuals rather than using the log likelihood as the criterion function.
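A small sketch of the quantity being summed, under the same assumption that the scaled covariance matrix Omega of x is available; C_t and V_t are then just blocks of that matrix, and the t = 1 term is taken here as x_1 itself.

    import numpy as np

    # Illustrative sketch only.
    def uls_prediction_errors(x, Omega):
        """One-step errors x_t - C_t V_t^{-1} (x_1, ..., x_{t-1})'."""
        n = len(x)
        a_tilde = np.empty(n)
        a_tilde[0] = x[0]
        for t in range(1, n):
            C_t = Omega[t, :t]                 # Cov(x_t, (x_1, ..., x_{t-1}))
            V_t = Omega[:t, :t]                # Var((x_1, ..., x_{t-1}))
            a_tilde[t] = x[t] - C_t @ np.linalg.solve(V_t, x[:t])
        return a_tilde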
The METHOD=CLS option produces conditional least-squares estimates. The CLS estimates are conditional on the assumption that the past unobserved errors are equal to 0. The series $x_t$ can be represented in terms of the previous observations, as follows:

$$x_t = a_t + \sum_{i=1}^{\infty} \pi_i x_{t-i}$$

The $\pi$ weights are computed from the ratio of the $\phi$ and $\theta$ polynomials, as follows:

$$\frac{\phi(B)}{\theta(B)} = 1 - \sum_{i=1}^{\infty} \pi_i B^i$$

The CLS method produces estimates minimizing

$$\sum_{t=1}^{n} \hat{a}_t^2 = \sum_{t=1}^{n} \left( x_t - \sum_{i=1}^{\infty} \hat{\pi}_i x_{t-i} \right)^2$$

where the unobserved past values of $x_t$ are set to 0 and $\hat{\pi}_i$ are computed from the estimates of $\phi$ and $\theta$ at each iteration.
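The sketch below computes the $\pi$ weights by the series-division recursion implied by the ratio above and then forms the conditional residuals; the truncation at the series length and the coefficient conventions $\phi(B) = 1 - \phi_1 B - \cdots$ and $\theta(B) = 1 - \theta_1 B - \cdots$ are assumptions stated here rather than details of the procedure's code.

    import numpy as np

    # Illustrative sketch only.
    def pi_weights(phi, theta, m):
        """First m pi weights from phi(B)/theta(B) = 1 - sum_i pi_i B^i,
        where phi holds phi_1..phi_p and theta holds theta_1..theta_q."""
        phi_poly = np.r_[1.0, -np.asarray(phi, dtype=float)]      # 1 - phi_1 B - ...
        theta_poly = np.r_[1.0, -np.asarray(theta, dtype=float)]  # 1 - theta_1 B - ...
        d = np.zeros(m + 1)
        d[0] = 1.0
        for k in range(1, m + 1):
            a_k = phi_poly[k] if k < len(phi_poly) else 0.0
            s = sum(theta_poly[j] * d[k - j]
                    for j in range(1, min(k, len(theta_poly) - 1) + 1))
            d[k] = a_k - s                      # theta_poly[0] is 1
        return -d[1:]                           # pi_1, ..., pi_m

    def cls_residuals(x, phi, theta):
        """Conditional residuals with unobserved past values of x set to 0."""
        x = np.asarray(x, dtype=float)
        pi = pi_weights(phi, theta, len(x))
        return np.array([x[t] - sum(pi[i - 1] * x[t - i] for i in range(1, t + 1))
                         for t in range(len(x))])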
For METHOD=ULS and METHOD=ML, initial estimates are computed using the METHOD=CLS algorithm.
The procedure also computes Akaike's information criterion (AIC) and Schwarz's Bayesian criterion (SBC) for the fitted model. The AIC is computed as

$$-2 \ln(L) + 2k$$

where L is the likelihood function and k is the number of free parameters. The SBC is computed as

$$-2 \ln(L) + \ln(n)\, k$$

where n is the number of residuals that can be computed for the time series. Sometimes Schwarz's Bayesian criterion is called the Bayesian information criterion (BIC).
If METHOD=CLS is used to do the estimation, an approximate value of L is used, where L is based on the conditional sum of squares instead of the exact sum of squares, and a Jacobian factor is left out.
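As a minimal sketch of the two criteria (taking the natural log of the likelihood value as given, however it was obtained):

    import numpy as np

    # Illustrative sketch only.
    def aic(log_likelihood, k):
        """Akaike's information criterion: -2 ln(L) + 2k."""
        return -2.0 * log_likelihood + 2.0 * k

    def sbc(log_likelihood, k, n):
        """Schwarz's Bayesian criterion: -2 ln(L) + ln(n) k."""
        return -2.0 * log_likelihood + np.log(n) * k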
The chi-square statistics used in the test for lack of fit of the model residuals are computed with the Ljung-Box formula

$$\chi^2_m = n(n+2) \sum_{k=1}^{m} \frac{r_k^2}{n-k}$$

where

$$r_k = \frac{\sum_{t=1}^{n-k} a_t a_{t+k}}{\sum_{t=1}^{n} a_t^2}$$

and $a_t$ is the residual series.
This formula has been suggested by Ljung and Box (1978) as yielding a better fit to the asymptotic chi-square distribution than the Box-Pierce Q statistic. Some simulation studies of the finite sample properties of this statistic are given by Davies, Triggs, and Newbold (1977) and by Ljung and Box (1978).
Each chi-square statistic is computed for all lags up to the indicated lag value and is not independent of the preceding chi-square values. The null hypothesis tested is that the current set of autocorrelations is white noise.
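The following sketch computes the Ljung-Box statistic for a residual series directly from the formula above; the choice of the number of lags is left to the caller and is not a value prescribed by the documentation.

    import numpy as np

    # Illustrative sketch only.
    def ljung_box(residuals, lags):
        """Ljung-Box chi-square statistic for the first `lags` autocorrelations."""
        a = np.asarray(residuals, dtype=float)
        n = len(a)
        denom = np.sum(a * a)
        r = np.array([np.sum(a[:n - k] * a[k:]) / denom for k in range(1, lags + 1)])
        return n * (n + 2.0) * np.sum(r ** 2 / (n - np.arange(1, lags + 1)))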