The AUTOREG Procedure
Ordinary regression analysis is based on several statistical assumptions. One key assumption is that the errors are independent of each other. However, with time series data, the ordinary regression residuals usually are correlated over time. It is not desirable to use ordinary regression analysis for time series data since the assumptions on which the classical linear regression model is based will usually be violated.
Violation of the independent errors assumption has three important consequences for ordinary regression. First, statistical tests of the significance of the parameters and the confidence limits for the predicted values are not correct. Second, the estimates of the regression coefficients are not as efficient as they would be if the autocorrelation were taken into account. Third, since the ordinary regression residuals are not independent, they contain information that can be used to improve the prediction of future values.
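The AUTOREG procedure solves this problem by augmenting the regression model with an autoregressive model for the random error, thereby accounting for the autocorrelation of the errors. Instead of the usual regression model, the following autoregressive error model is used:

$$
\begin{aligned}
y_t &= \mathbf{x}_t'\boldsymbol{\beta} + \nu_t \\
\nu_t &= -\varphi_1 \nu_{t-1} - \varphi_2 \nu_{t-2} - \cdots - \varphi_m \nu_{t-m} + \epsilon_t \\
\epsilon_t &\sim \mathrm{IN}(0, \sigma^2)
\end{aligned}
$$

Here $\mathbf{x}_t$ is the vector of regressors at time $t$, and $m$ is the order of the autoregressive error process. The notation $\epsilon_t \sim \mathrm{IN}(0, \sigma^2)$ indicates that each $\epsilon_t$ is normally and independently distributed with mean 0 and variance $\sigma^2$.

By simultaneously estimating the regression coefficients $\boldsymbol{\beta}$ and the autoregressive error model parameters $\varphi_i$, the AUTOREG procedure corrects the regression estimates for autocorrelation. Thus, this kind of regression analysis is often called autoregressive error correction or serial correlation correction.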
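The sign convention of the autoregressive error model is easy to get wrong, so here is a small Python sketch of it (this is not SAS code, and the function name `ar_errors` is hypothetical). It builds an error series from innovations $\epsilon_t$ with the recursion $\nu_t = \epsilon_t - \sum_{i=1}^{m} \varphi_i \nu_{t-i}$ and treats pre-sample values of $\nu$ as zero. Under this convention, an AR(2) error written in the usual form $u_t = 1.3u_{t-1} - 0.5u_{t-2} + \epsilon_t$ has $\varphi_1 = -1.3$ and $\varphi_2 = 0.5$.

```python
def ar_errors(phi, eps):
    """Return nu_t = eps_t - sum_i phi[i-1] * nu_{t-i}, with nu = 0 before the sample.

    phi follows the autoregressive error model's sign convention (minus signs
    in the recursion), so phi = [-1.3, 0.5] means nu_t = 1.3 nu_{t-1} - 0.5 nu_{t-2} + eps_t.
    """
    nu = []
    for t, e in enumerate(eps):
        value = e
        for i, p in enumerate(phi, start=1):
            if t - i >= 0:  # lags before the first observation are taken as zero
                value -= p * nu[t - i]
        nu.append(value)
    return nu


# Response of the AR(2) error to a single unit shock at time 0.
print(ar_errors([-1.3, 0.5], [1.0, 0.0, 0.0, 0.0]))
```

A single unit shock yields the values 1, 1.3, 1.19, 0.897, which shows how one innovation keeps pushing the error in the same direction for several periods.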
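The following statements generate a simulated time series Y with second-order autocorrelation:

data a;
   ul = 0; ull = 0;             /* lagged errors u(t-1) and u(t-2) */
   do time = -10 to 36;
      u = + 1.3 * ul - .5 * ull + 2*rannor(12346);
      y = 10 + .5 * time + u;
      if time > 0 then output;  /* times -10 to 0 are a burn-in period */
      ull = ul; ul = u;
   end;
run;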
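The series Y is a time trend plus a second-order autoregressive error. The model simulated is

$$
\begin{aligned}
y_t &= 10 + 0.5t + \nu_t \\
\nu_t &= 1.3\nu_{t-1} - 0.5\nu_{t-2} + \epsilon_t \\
\epsilon_t &\sim \mathrm{IN}(0, 4)
\end{aligned}
$$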
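For readers who want to experiment outside SAS, the following Python sketch mirrors the DATA step above. It is not SAS code, and it uses Python's `random.Random.gauss` in place of RANNOR, so the simulated values will not match the SAS series. The function name `simulate` is hypothetical. The innovation is 2 times a standard normal draw, so its variance is 4.

```python
import random


def simulate(seed=12346):
    """Mimic the DATA step: y_t = 10 + 0.5 t + u_t, u_t = 1.3 u_{t-1} - 0.5 u_{t-2} + 2 e_t."""
    rng = random.Random(seed)  # NOT the SAS RANNOR generator; values will differ
    u1 = u2 = 0.0  # u_{t-1} and u_{t-2}, like UL and ULL
    rows = []
    for time in range(-10, 37):  # -10 to 36 inclusive; time <= 0 is burn-in
        u = 1.3 * u1 - 0.5 * u2 + 2.0 * rng.gauss(0.0, 1.0)
        y = 10 + 0.5 * time + u
        if time > 0:
            rows.append((time, y))
        u2, u1 = u1, u  # shift the lags
    return rows


data = simulate()
print(len(data), data[0][0], data[-1][0])  # 36 observations, time 1..36
```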
The following statements plot the simulated time series Y. A linear regression trend line is shown for reference. (The regression line is produced by plotting the series a second time using the regression interpolation feature of the SYMBOL statement. Refer to SAS/GRAPH Software: Reference, Version 6, First Edition, Volume 1 for further explanation.)
title "Autocorrelated Time Series";
proc gplot data=a;
   symbol1 v=dot i=join;
   symbol2 v=none i=r;   /* i=r draws the linear regression line */
   plot y * time = 1  y * time = 2 / overlay;
run;
The plot of series Y and the regression line is shown in Figure 8.1.
Note that when the series is above (or below) the ordinary least squares (OLS) regression trend line, it tends to remain above (below) the trend for several periods. This pattern is an example of positive autocorrelation.
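The persistence can be quantified for the simulated error process $u_t = a_1u_{t-1} + a_2u_{t-2} + \epsilon_t$ with $a_1 = 1.3$ and $a_2 = -0.5$. This process is stationary because it satisfies the AR(2) conditions $a_1 + a_2 < 1$, $a_2 - a_1 < 1$, and $|a_2| < 1$. Its theoretical autocorrelations follow the Yule-Walker recursion $\rho_0 = 1$, $\rho_1 = a_1/(1 - a_2)$, $\rho_k = a_1\rho_{k-1} + a_2\rho_{k-2}$. The Python function below is a hypothetical helper, not part of SAS.

```python
def ar2_acf(a1, a2, max_lag):
    """Theoretical autocorrelations rho_0..rho_max_lag of a stationary AR(2)
    u_t = a1 u_{t-1} + a2 u_{t-2} + e_t."""
    rho = [1.0, a1 / (1.0 - a2)]
    for _ in range(2, max_lag + 1):
        rho.append(a1 * rho[-1] + a2 * rho[-2])
    return rho[: max_lag + 1]


for k, r in enumerate(ar2_acf(1.3, -0.5, 5)):
    print(k, round(r, 3))
```

The autocorrelations are about 0.867 at lag 1 and 0.627 at lag 2, and they stay positive through lag 5. That is why a deviation from the trend tends to persist for several periods.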
Time series regression usually involves independent variables other than a time trend. However, the simple time-trend model is convenient for illustrating regression with autocorrelated errors, and the series Y shown in Figure 8.1 is used in the following introductory examples.
The following statements regress Y on TIME by using ordinary least squares:

proc autoreg data=a;
   model y = time;
run;
The AUTOREG procedure output is shown in Figure 8.2.
The output first shows statistics for the model residuals. The model root mean square error (Root MSE) is 2.51, and the model R² is .82. Notice that two R² statistics are shown, one for the regression model (Reg Rsq) and one for the full model (Total Rsq) that includes the autoregressive error process, if any. In this case, an autoregressive error model is not used, so the two R² statistics are the same.
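For a one-regressor model, these statistics are easy to reproduce by hand. The Python sketch below is not SAS code, and the function name `ols_time_trend` is hypothetical. It fits $y = b_0 + b_1t$ by least squares and computes SSE, DFE $= n - 2$ (two estimated parameters), MSE $=$ SSE/DFE, Root MSE, and $R^2 = 1 - \mathrm{SSE}/\mathrm{SST}$, where SST is the corrected total sum of squares. Because no autoregressive error model is fitted, this single $R^2$ plays the role of both Reg Rsq and Total Rsq.

```python
def ols_time_trend(t, y):
    """OLS fit of y = b0 + b1 * t, with residual-based fit statistics."""
    n = len(y)
    tbar = sum(t) / n
    ybar = sum(y) / n
    stt = sum((ti - tbar) ** 2 for ti in t)
    sty = sum((ti - tbar) * (yi - ybar) for ti, yi in zip(t, y))
    b1 = sty / stt
    b0 = ybar - b1 * tbar
    resid = [yi - (b0 + b1 * ti) for ti, yi in zip(t, y)]
    sse = sum(e * e for e in resid)
    dfe = n - 2  # observations minus estimated parameters
    mse = sse / dfe
    sst = sum((yi - ybar) ** 2 for yi in y)  # corrected total sum of squares
    return {"b0": b0, "b1": b1, "SSE": sse, "DFE": dfe, "MSE": mse,
            "RootMSE": mse ** 0.5, "Rsq": 1.0 - sse / sst, "resid": resid}


fit = ols_time_trend([1, 2, 3, 4], [1, 3, 2, 4])
print(fit["b0"], fit["b1"], fit["SSE"], fit["Rsq"])
```

For this toy data (t = 1, 2, 3, 4 and y = 1, 3, 2, 4), the fit gives intercept 0.5, slope 0.8, SSE 1.8, MSE 0.9, and R² 0.64, up to floating-point rounding.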
Other statistics shown are the sum of squared errors (SSE), mean square error (MSE), error degrees of freedom (DFE, the number of observations minus the number of parameters), the information criteria SBC and AIC, and the Durbin-Watson statistic. (The Durbin-Watson statistic and the SBC and AIC criteria are discussed in the "Details" section later in this chapter.)
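The first-order Durbin-Watson statistic is computed from the OLS residuals as $d = \sum_{t=2}^{N}(e_t - e_{t-1})^2 / \sum_{t=1}^{N} e_t^2$. It is approximately $2(1 - r_1)$, where $r_1$ is the lag-1 residual autocorrelation, so values well below 2 suggest positive first-order autocorrelation and values near 2 suggest little of it. The Python sketch below is not SAS code, covers only the first-order statistic, and uses a hypothetical function name.

```python
def durbin_watson(resid):
    """First-order Durbin-Watson statistic d = sum (e_t - e_{t-1})^2 / sum e_t^2."""
    num = sum((resid[i] - resid[i - 1]) ** 2 for i in range(1, len(resid)))
    den = sum(e * e for e in resid)
    return num / den


# Residuals that stay on one side of zero for several periods give a small d.
print(durbin_watson([1, 1, 1, -1, -1, -1]))
```

The persistent residual pattern in the example gives d of about 0.67, well below 2.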
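The output then shows a table of regression coefficients, with standard errors and t-tests. The estimated model is

$$
y_t = \hat\beta_0 + \hat\beta_1 t + u_t, \qquad \widehat{\mathrm{Var}}(u_t) = 6.32
$$

where $\hat\beta_0$ and $\hat\beta_1$ are the intercept and TIME estimates listed in the parameter estimates table of Figure 8.2.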
The OLS parameter estimates are reasonably close to the true values, but the estimated error variance, 6.32, is much larger than the true value, 4.
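To account for the autocorrelation, the following statements regress Y on TIME with the errors assumed to follow a second-order autoregressive process. The NLAG=2 option specifies the order of the autoregressive error model, and the METHOD=ML option requests maximum likelihood estimation.

proc autoreg data=a;
   model y = time / nlag=2 method=ml;
run;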
The first part of the results is shown in Figure 8.3. The initial OLS results are produced first, followed by estimates of the autocorrelations computed from the OLS residuals. The autocorrelations are also displayed graphically.
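The residual autocorrelations can be sketched as follows. This assumes the common definition with divisor $N$, $c_k = \frac{1}{N}\sum_{t=k+1}^{N} e_t e_{t-k}$ and $r_k = c_k / c_0$, with no mean correction, because OLS residuals from a model with an intercept already have mean zero. It is a Python illustration, not AUTOREG's code, and the function name is hypothetical.

```python
def residual_autocorrelations(resid, max_lag):
    """Autocorrelations r_0..r_max_lag, with r_k = c_k / c_0 and
    c_k = (1/N) * sum_{t>=k} e_t * e_{t-k} (0-based indexing)."""
    n = len(resid)
    c = [sum(resid[t] * resid[t - k] for t in range(k, n)) / n
         for k in range(max_lag + 1)]
    return [ck / c[0] for ck in c]


print(residual_autocorrelations([1, 1, 1, -1, -1, -1], 2))
```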
The maximum likelihood estimates are shown in Figure 8.4. Figure 8.4 also shows the preliminary Yule-Walker estimates used as starting values for the iterative computation of the maximum likelihood estimates.
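The preliminary Yule-Walker step can be illustrated for the second-order case. Given lag-1 and lag-2 autocorrelations $r_1$ and $r_2$, the Yule-Walker equations $\begin{pmatrix}1 & r_1\\ r_1 & 1\end{pmatrix}\begin{pmatrix}a_1\\ a_2\end{pmatrix} = \begin{pmatrix}r_1\\ r_2\end{pmatrix}$ have a closed-form solution for the coefficients of $u_t = a_1u_{t-1} + a_2u_{t-2} + \epsilon_t$. The Python sketch below handles only NLAG=2, is not AUTOREG's general algorithm, and uses a hypothetical function name. It also returns the coefficients in the sign convention of the autoregressive error model, where the reported parameters are $-a_1$ and $-a_2$.

```python
def yule_walker_ar2(r1, r2):
    """Solve the AR(2) Yule-Walker equations from lag-1 and lag-2 autocorrelations.

    Returns ((a1, a2), (A1, A2)), where u_t = a1 u_{t-1} + a2 u_{t-2} + e_t and
    A1 = -a1, A2 = -a2 follow the autoregressive error model's sign convention.
    """
    det = 1.0 - r1 * r1  # determinant of the 2x2 autocorrelation matrix
    a1 = r1 * (1.0 - r2) / det
    a2 = (r2 - r1 * r1) / det
    return (a1, a2), (-a1, -a2)


# Theoretical autocorrelations of the simulated process u_t = 1.3 u_{t-1} - 0.5 u_{t-2} + e_t
r1 = 1.3 / 1.5
r2 = 1.3 * r1 - 0.5
print(yule_walker_ar2(r1, r2))
```

Using the theoretical autocorrelations of the simulated process recovers $a_1 = 1.3$ and $a_2 = -0.5$. These correspond to A(1) = −1.3 and A(2) = 0.5 in the AUTOREG sign convention. Sample autocorrelations of actual residuals will give different estimates.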
The diagnostic statistics and parameter estimates tables in Figure 8.4 have the same form as in the OLS output, but the values shown are for the autoregressive error model. The MSE for the autoregressive model is 1.71, which is much smaller than the true value of 4. In small samples, the autoregressive error model tends to underestimate $\sigma^2$, while the OLS MSE overestimates $\sigma^2$.
Notice that the total R² statistic computed from the autoregressive model residuals is .954, reflecting the improved fit from the use of past residuals to help predict the next Y value. The Reg Rsq value .728 is the R² statistic for a regression of transformed variables adjusted for the estimated autocorrelation. (This is not the R² for the estimated trend line. For details, see "R² Statistics and Other Measures of Fit" later in this chapter.)
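The parameter estimates table shows the ML estimates of the regression coefficients and includes two additional rows for the estimates of the autoregressive parameters, labeled A(1) and A(2). The estimated model is

$$
\begin{aligned}
y_t &= \hat\beta_0 + \hat\beta_1 t + \nu_t \\
\nu_t &= -\hat\varphi_1 \nu_{t-1} - \hat\varphi_2 \nu_{t-2} + \epsilon_t, \qquad \widehat{\mathrm{Var}}(\epsilon_t) = 1.71
\end{aligned}
$$

where $\hat\varphi_1$ and $\hat\varphi_2$ are the A(1) and A(2) estimates in Figure 8.4.

Note that the signs of the autoregressive parameters in this equation for $\nu_t$ are the reverse of the estimates shown in the AUTOREG procedure output. Figure 8.4 also shows the estimates of the regression coefficients with the standard errors recomputed on the assumption that the autoregressive parameter estimates equal the true values.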
Use the OUTPUT statement to store predicted values and residuals in a SAS data set and to output other values such as confidence limits and variance estimates. The P= option specifies an output variable to contain the full model predicted values. The PM= option names an output variable for the predicted mean. The R= and RM= options specify output variables for the corresponding residuals, computed as the actual value minus the predicted value.
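The difference between the P= and PM= values can be sketched for an AR(2) error model. The predicted mean (PM=) is the trend $\hat\beta_0 + \hat\beta_1 t$ alone. The full-model prediction (P=) adds the part of $\nu_t$ that is predictable from past structural residuals, $-\hat\varphi_1\nu_{t-1} - \hat\varphi_2\nu_{t-2}$, where $\nu_{t-i} = y_{t-i} - (\hat\beta_0 + \hat\beta_1(t-i))$. The Python sketch below is hypothetical: its names and signature are illustrative, and it takes the estimates as given. For simplicity it uses the trend alone for the first two observations, which is not necessarily how AUTOREG treats the start of the series.

```python
def autoreg_predictions(t, y, b0, b1, phi):
    """Structural (PM=) and full-model (P=) predictions for an AR(2) error model.

    phi = (phi1, phi2) uses the autoregressive error model's sign convention:
    nu_t = -phi1 nu_{t-1} - phi2 nu_{t-2} + eps_t.
    Simplification: the first two observations get the trend prediction only.
    """
    trend = [b0 + b1 * ti for ti in t]           # PM= values
    nu = [yi - mi for yi, mi in zip(y, trend)]   # structural residuals (RM=)
    full = []                                    # P= values
    for i, mi in enumerate(trend):
        if i < 2:
            full.append(mi)
        else:
            full.append(mi - phi[0] * nu[i - 1] - phi[1] * nu[i - 2])
    resid = [yi - pi for yi, pi in zip(y, full)]  # full-model residuals (R=)
    return trend, full, nu, resid


trend, full, nu, resid = autoreg_predictions([1, 2, 3, 4], [1, 3, 2, 4], 0.5, 0.8, (-0.5, 0.0))
print(trend)
print(full)
```

With both autoregressive parameters set to zero, the two kinds of predictions coincide. This matches the OLS case, where Reg Rsq and Total Rsq are equal.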
The following statements store both kinds of predicted values in the output data set. (The printed output is the same as previously shown in Figure 8.3 and Figure 8.4.)
proc autoreg data=a;
   model y = time / nlag=2 method=ml;
   output out=p p=yhat pm=trendhat;   /* YHAT: full model; TRENDHAT: predicted mean */
run;
The following statements plot the predicted values from the regression trend line and from the full model together with the actual values.
title "Predictions for Autocorrelation Model";
proc gplot data=p;
   symbol1 v=star i=none;
   symbol2 v=circle i=join;
   symbol3 v=none i=join;
   plot y * time = 1  yhat * time = 2  trendhat * time = 3 / overlay;
run;
The plot of predicted values is shown in Figure 8.5.
In Figure 8.5 the straight line is the autocorrelation-corrected regression line, traced out by the structural predicted values TRENDHAT. The jagged line traces the full-model predicted values YHAT. The actual values are marked by asterisks. This plot illustrates the improvement in fit that the autoregressive error process provides for highly autocorrelated data.
Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.