Example 7.5: Using Diagnostics to Identify ARIMA models
Fitting ARIMA models is as much an art as it is a science.
The ARIMA procedure has diagnostic options to help tentatively
identify the orders of both stationary and nonstationary ARIMA
processes.
Consider Series A in Box et al. (1994), which consists
of 197 concentration readings taken every two hours from a
chemical process. Let SeriesA be a data set containing these
readings in a variable named X. The following SAS statements
use the SCAN option of the IDENTIFY statement to
generate Output 7.5.1 and Output 7.5.2.
See "The SCAN Method" for details of the SCAN method.
proc arima data=SeriesA;
identify var=x scan;
run;
Output 7.5.1: Example of SCAN Tables
SERIES A: Chemical Process Concentration Readings

Squared Canonical Correlation Estimates
Lags | MA 0   | MA 1   | MA 2   | MA 3   | MA 4   | MA 5
AR 0 | 0.3263 | 0.2479 | 0.1654 | 0.1387 | 0.1183 | 0.1417
AR 1 | 0.0643 | 0.0012 | 0.0028 | <.0001 | 0.0051 | 0.0002
AR 2 | 0.0061 | 0.0027 | 0.0021 | 0.0011 | 0.0017 | 0.0079
AR 3 | 0.0072 | <.0001 | 0.0007 | 0.0005 | 0.0019 | 0.0021
AR 4 | 0.0049 | 0.0010 | 0.0014 | 0.0014 | 0.0039 | 0.0145
AR 5 | 0.0202 | 0.0009 | 0.0016 | <.0001 | 0.0126 | 0.0001

SCAN Chi-Square[1] Probability Values
Lags | MA 0   | MA 1   | MA 2   | MA 3   | MA 4   | MA 5
AR 0 | <.0001 | <.0001 | <.0001 | 0.0007 | 0.0037 | 0.0024
AR 1 | 0.0003 | 0.6649 | 0.5194 | 0.9235 | 0.3993 | 0.8528
AR 2 | 0.2754 | 0.5106 | 0.5860 | 0.7346 | 0.6782 | 0.2766
AR 3 | 0.2349 | 0.9812 | 0.7667 | 0.7861 | 0.6810 | 0.6546
AR 4 | 0.3297 | 0.7154 | 0.7113 | 0.6995 | 0.5807 | 0.2205
AR 5 | 0.0477 | 0.7254 | 0.6652 | 0.9576 | 0.2660 | 0.9168

In Output 7.5.1, there is one (maximal) rectangular region
in which all the elements are insignificant at the 5% level.
This region has a vertex at (1,1), that is, at row AR 1 and
column MA 1. Output 7.5.2 gives recommendations
based on the significance level specified by the ALPHA=siglevel
option.
Output 7.5.2: Example of SCAN Option Tentative Order Selection
ARMA(p+d,q) Tentative Order Selection Tests
SCAN
p+d | q
1   | 1

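The significance level and the range of orders that SCAN examines can also be set explicitly. The following statements are a minimal sketch, assuming the ALPHA=, P=, and Q= options of the IDENTIFY statement; the 1% level is an arbitrary illustrative value, and the P= and Q= ranges simply restate the 0-to-5 ranges that appear in Output 7.5.1.

```sas
proc arima data=SeriesA;
   /* Illustrative settings only: stricter 1% level, same 0..5 order ranges */
   identify var=x scan alpha=0.01 p=(0:5) q=(0:5);
run;
```

With a smaller ALPHA= value, more table entries count as insignificant, so the recommended orders can change.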
Another order identification diagnostic is the extended sample
autocorrelation function, or ESACF, method.
See "The ESACF Method" for details of the ESACF method.
The following statements generate Output 7.5.3 and Output 7.5.4.
proc arima data=SeriesA;
identify var=x esacf;
run;
Output 7.5.3: Example of ESACF Tables
Extended Sample Autocorrelation Function
Lags | MA 0    | MA 1    | MA 2    | MA 3    | MA 4    | MA 5
AR 0 |  0.5702 |  0.4951 |  0.3980 |  0.3557 |  0.3269 |  0.3498
AR 1 | -0.3907 |  0.0425 | -0.0605 | -0.0083 | -0.0651 | -0.0127
AR 2 | -0.2859 | -0.2699 | -0.0449 |  0.0089 | -0.0509 | -0.0140
AR 3 | -0.5030 | -0.0106 |  0.0946 | -0.0137 | -0.0148 | -0.0302
AR 4 | -0.4785 | -0.0176 |  0.0827 | -0.0244 | -0.0149 | -0.0421
AR 5 | -0.3878 | -0.4101 | -0.1651 |  0.0103 | -0.1741 | -0.0231

ESACF Probability Values
Lags | MA 0   | MA 1   | MA 2   | MA 3   | MA 4   | MA 5
AR 0 | <.0001 | <.0001 | 0.0001 | 0.0014 | 0.0053 | 0.0041
AR 1 | <.0001 | 0.5974 | 0.4622 | 0.9198 | 0.4292 | 0.8768
AR 2 | <.0001 | 0.0002 | 0.6106 | 0.9182 | 0.5683 | 0.8592
AR 3 | <.0001 | 0.9022 | 0.2400 | 0.8713 | 0.8930 | 0.7372
AR 4 | <.0001 | 0.8380 | 0.3180 | 0.7737 | 0.8913 | 0.6213
AR 5 | <.0001 | <.0001 | 0.0765 | 0.9142 | 0.1038 | 0.8103

In Output 7.5.3, there are three right-triangular regions
in which all elements are insignificant at the 5% level.
The triangles have vertices (1,1), (3,1), and (4,1).
Since the triangle at (1,1) covers more insignificant terms,
it is recommended first. Similarly, the remaining recommendations
are ordered by the number of insignificant terms contained in
the triangle. Output 7.5.4 gives recommendations based on the
significance level specified by the ALPHA=siglevel option.
Output 7.5.4: Example of ESACF Option Tentative Order Selection
ARMA(p+d,q) Tentative Order Selection Tests
ESACF
p+d | q
1   | 1
3   | 1
4   | 1

If you also specify the SCAN option in the same IDENTIFY statement,
the two recommendations are printed side by side.
proc arima data=SeriesA;
identify var=x scan esacf;
run;
Output 7.5.5: Example of SCAN and ESACF Option Combined
ARMA(p+d,q) Tentative Order Selection Tests
SCAN      | ESACF
p+d | q   | p+d | q
1   | 1   | 1   | 1
    |     | 3   | 1
    |     | 4   | 1

In Output 7.5.5, both the SCAN and ESACF tables tentatively
identify the autoregressive and moving average orders
as (p+d, q)=(1,1).
Because both methods indicate a p+d term of 1,
a unit root test should be used to determine whether this term
is a unit root (d=1) or an autoregressive term (p=1).
Because a moving average term appears
to be present, a large autoregressive order is appropriate
for the augmented Dickey-Fuller test for a unit root.
The following statements
generate Output 7.5.6.
proc arima data=SeriesA;
identify var=x stationarity=(adf=(5,6,7,8));
run;
Output 7.5.6: Example of STATIONARITY Option Output
Augmented Dickey-Fuller Unit Root Tests
Type        | Lags | Rho      | Pr < Rho | Tau   | Pr < Tau | F    | Pr > F
Zero Mean   | 5    |   0.0403 | 0.6913   |  0.42 | 0.8024   |      |
            | 6    |   0.0479 | 0.6931   |  0.63 | 0.8508   |      |
            | 7    |   0.0376 | 0.6907   |  0.49 | 0.8200   |      |
            | 8    |   0.0354 | 0.6901   |  0.48 | 0.8175   |      |
Single Mean | 5    | -18.4550 | 0.0150   | -2.67 | 0.0821   | 3.67 | 0.1367
            | 6    | -10.8939 | 0.1043   | -2.02 | 0.2767   | 2.27 | 0.4931
            | 7    | -10.9224 | 0.1035   | -1.93 | 0.3172   | 2.00 | 0.5605
            | 8    | -10.2992 | 0.1208   | -1.83 | 0.3650   | 1.81 | 0.6108
Trend       | 5    | -18.4360 | 0.0871   | -2.66 | 0.2561   | 3.54 | 0.4703
            | 6    | -10.8436 | 0.3710   | -2.01 | 0.5939   | 2.04 | 0.7694
            | 7    | -10.7427 | 0.3773   | -1.90 | 0.6519   | 1.91 | 0.7956
            | 8    | -10.0370 | 0.4236   | -1.79 | 0.7081   | 1.74 | 0.8293

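The preceding test results show that a unit root is very likely:
apart from the Single Mean Rho test with 5 lags (p=0.0150), none of
the tests rejects the unit root hypothesis at the 5% level. The series
should therefore be differenced. With the p+d term of 1 attributed to
the difference (d=1, p=0) and q=1 taken from the SCAN and ESACF results,
an ARIMA(0,1,1) would be a good choice for
a tentative model for Series A.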
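One way to cross-check the decision to difference is to repeat the unit root tests on the differenced series. The following statements are a sketch only; they combine the STATIONARITY= option used for Output 7.5.6 with the differencing notation x(1), and no output for this run is shown here.

```sas
proc arima data=SeriesA;
   /* Same ADF lag orders as before, applied to the first difference of X */
   identify var=x(1) stationarity=(adf=(5,6,7,8));
run;
```

If the differenced series is stationary, these tests would be expected to reject the unit root hypothesis.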
Using the recommendation that the series be differenced, the following statements
generate Output 7.5.7.
proc arima data=SeriesA;
identify var=x(1) minic;
run;
Output 7.5.7: Example of MINIC Table
Minimum Information Criterion
Lags | MA 0     | MA 1     | MA 2     | MA 3     | MA 4     | MA 5
AR 0 | -2.05761 | -2.3497  | -2.32358 | -2.31298 | -2.30967 | -2.28528
AR 1 | -2.23291 | -2.32345 | -2.29665 | -2.28644 | -2.28356 | -2.26011
AR 2 | -2.23947 | -2.30313 | -2.28084 | -2.26065 | -2.25685 | -2.23458
AR 3 | -2.25092 | -2.28088 | -2.25567 | -2.23455 | -2.22997 | -2.20769
AR 4 | -2.25934 | -2.2778  | -2.25363 | -2.22983 | -2.20312 | -2.19531
AR 5 | -2.2751  | -2.26805 | -2.24249 | -2.21789 | -2.19667 | -2.17426

The error series is estimated using an AR(7) model, and the
minimum of this MINIC table is BIC(0,1). This diagnostic confirms the
previous result indicating that an ARIMA(0,1,1) is a
tentative model for Series A.
If you also specify the SCAN or ESACF option in the same IDENTIFY
statement, the BIC values associated with the SCAN table
and ESACF table recommendations are listed.
proc arima data=SeriesA;
identify var=x(1) minic scan esacf;
run;
Output 7.5.8: Example of SCAN, ESACF, MINIC Options Combined
ARMA(p+d,q) Tentative Order Selection Tests
SCAN                   | ESACF
p+d | q | BIC          | p+d | q | BIC
0   | 1 | -2.3497      | 0   | 1 | -2.3497
    |   |              | 1   | 1 | -2.32345

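Once a tentative model is chosen, it can be fit with the ESTIMATE statement. The following statements are a minimal sketch of fitting the ARIMA(0,1,1) candidate; the choice of METHOD=ML and the NOPRINT option in the IDENTIFY statement are assumptions made for illustration and are not part of the example above.

```sas
proc arima data=SeriesA;
   /* First-difference X; suppress the identification output */
   identify var=x(1) noprint;
   /* One moving average term (q=1), estimated by maximum likelihood */
   estimate q=1 method=ml;
run;
```

The residual diagnostics printed by the ESTIMATE statement can then be used to judge whether the tentative model is adequate.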
Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.