Identification Stage

The ARIMA Procedure

Suppose you have a variable called SALES that you want to forecast. The following example illustrates ARIMA modeling and forecasting using a simulated data set TEST containing a time series SALES generated by an ARIMA(1,1,1) model. The output produced by this example is explained in the following sections. The simulated SALES series is shown in Figure 7.1.

Figure 7.1: Simulated ARIMA(1,1,1) Series SALES
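
The statements that created TEST are not listed in this section. The following is a minimal sketch of how a comparable ARIMA(1,1,1) series could be simulated. The coefficients (phi, theta), the mean change (mu), the starting level, the seed, and the burn-in length are all assumed values chosen for illustration, so the resulting numbers will not match the output shown in the figures.

```sas
/* Sketch only: phi, theta, mu, the seed, and the starting level are
   assumed values, not the ones used to create the TEST data set. */
data test;
   call streaminit(12345);         /* arbitrary seed for reproducibility */
   phi      = 0.8;                 /* assumed AR(1) coefficient */
   theta    = 0.4;                 /* assumed MA(1) coefficient */
   mu       = 0.5;                 /* assumed mean change per period */
   sales    = 100;                 /* assumed starting level */
   dev_prev = 0;                   /* previous deviation of the change from mu */
   a_prev   = 0;                   /* previous innovation */
   do t = -49 to 100;              /* the first 50 iterations are burn-in */
      a     = rand('normal');      /* standard normal innovation */
      dev   = phi*dev_prev + a - theta*a_prev;   /* ARMA(1,1) for the change */
      sales = sales + mu + dev;                  /* integrate once (d = 1) */
      if t > 0 then output;                      /* keep 100 observations */
      dev_prev = dev;
      a_prev   = a;
   end;
   keep t sales;
run;
```

The change in SALES follows a stationary ARMA(1,1) process, and accumulating those changes produces the nonstationary level series that the IDENTIFY statement analyzes.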

Using the IDENTIFY Statement

You first specify the input data set in the PROC ARIMA statement. Then, you use an IDENTIFY statement to read in the SALES series and plot its autocorrelation function. You do this using the following statements:

   proc arima data=test;
      identify var=sales nlag=8;
      run;

Descriptive Statistics

The IDENTIFY statement first prints descriptive statistics for the SALES series. This part of the IDENTIFY statement output is shown in Figure 7.2.

The ARIMA Procedure

Name of Variable = sales
Mean of Working Series	137.3662
Standard Deviation	17.36385
Number of Observations	100

Figure 7.2: IDENTIFY Statement Descriptive Statistics Output

Autocorrelation Function Plots

The IDENTIFY statement next prints three plots of the correlations of the series with its past values at different lags. These are the

sample autocorrelation function plot
sample partial autocorrelation function plot
sample inverse autocorrelation function plot

The sample autocorrelation function plot output of the IDENTIFY statement is shown in Figure 7.3.

The ARIMA Procedure

Autocorrelations
Lag  Covariance  Correlation  -1 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 1   Std Error
  0     301.503      1.00000  |                    |********************|          0
  1     288.454      0.95672  |                .   |******************* |   0.100000
  2     273.437      0.90691  |             .      |******************  |   0.168245
  3     256.787      0.85169  |            .       |*****************   |   0.211556
  4     238.518      0.79110  |          .         |****************    |   0.243441
  5     219.033      0.72647  |         .          |***************     |   0.267918
  6     198.617      0.65876  |         .          |*************       |   0.286941
  7     177.150      0.58755  |        .           |************        |   0.301686
  8     154.914      0.51381  |       .            |**********  .       |   0.312920

"." marks two standard errors

Figure 7.3: IDENTIFY Statement Autocorrelations Plot

The autocorrelation plot shows how values of the series are correlated with past values of the series. For example, the value 0.95672 in the "Correlation" column for the Lag 1 row of the plot means that the correlation between SALES and the SALES value for the previous period is 0.95672. The rows of asterisks show the correlation values graphically.

These plots are called autocorrelation functions because they show the degree of correlation with past values of the series as a function of the number of periods in the past (that is, the lag) at which the correlation is computed.
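
The numbers in Figure 7.3 are consistent with the usual sample definitions, sketched here under two assumptions: that the autocovariances use a divisor of n, and that the standard error at lag k treats the series as a moving-average process of order k-1. The symbols x_t (the SALES value at time t), n (the number of observations, 100), c_k, and r_k are notation introduced for this illustration.

```latex
\begin{align*}
c_k &= \frac{1}{n}\sum_{t=1}^{n-k}\bigl(x_t-\bar{x}\bigr)\bigl(x_{t+k}-\bar{x}\bigr),
\qquad r_k = \frac{c_k}{c_0} \\[4pt]
\mathrm{SE}(r_k) &= \sqrt{\frac{1}{n}\Bigl(1 + 2\sum_{j=1}^{k-1} r_j^{2}\Bigr)} \\[4pt]
r_1 &= \frac{288.454}{301.503} \approx 0.95672 \\[4pt]
\mathrm{SE}(r_1) &= \sqrt{\frac{1}{100}} = 0.100000,
\qquad
\mathrm{SE}(r_2) = \sqrt{\frac{1 + 2(0.95672)^2}{100}} \approx 0.168245
\end{align*}
```

The worked values agree with the "Correlation" and "Std Error" columns for lags 1 and 2, and c_0 = 301.503 is the square of the standard deviation 17.36385 reported in Figure 7.2.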

The NLAG= option controls the number of lags for which autocorrelations are shown. By default, the autocorrelation functions are plotted to lag 24; in this example the NLAG=8 option is used, so only the first 8 lags are shown.

Most books on time series analysis explain how to interpret autocorrelation plots and partial autocorrelation plots. See the section "The Inverse Autocorrelation Function" later in this chapter for a discussion of inverse autocorrelation plots.

By examining these plots, you can judge whether the series is stationary or nonstationary. In this case, a visual inspection of the autocorrelation function plot indicates that the SALES series is nonstationary, since the ACF decays very slowly. For more formal stationarity tests, use the STATIONARITY= option. (See the section "Stationarity" later in this chapter.)
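
As a sketch of the more formal check, the following statements request augmented Dickey-Fuller tests through the STATIONARITY= option. The list of augmenting lags (0, 1, and 2) is an illustrative assumption, not a recommendation for this series.

```sas
proc arima data=test;
   /* ADF tests with 0, 1, and 2 augmenting lags (illustrative choice) */
   identify var=sales nlag=8 stationarity=(adf=(0,1,2));
run;
quit;
```

Large p-values in the resulting test table would support the visual impression that the level of SALES is nonstationary.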

The inverse and partial autocorrelation plots are printed after the autocorrelation plot. These plots have the same form as the autocorrelation plots, but display inverse and partial autocorrelation values instead of autocorrelations and autocovariances. The partial and inverse autocorrelation plots are not shown in this example.

White Noise Test

The last part of the default IDENTIFY statement output is the check for white noise. This is an approximate statistical test of the hypothesis that none of the autocorrelations of the series up to a given lag are significantly different from 0. If this is true for all lags, then there is no information in the series to model, and no ARIMA model is needed for the series.

The autocorrelations are checked in groups of 6, and the number of lags checked depends on the NLAG= option. The check for white noise output is shown in Figure 7.4.

The ARIMA Procedure

Autocorrelation Check for White Noise
To Lag  Chi-Square  DF  Pr > ChiSq  Autocorrelations
     6      426.44   6      <.0001  0.957  0.907  0.852  0.791  0.726  0.659

Figure 7.4: IDENTIFY Statement Check for White Noise

In this case, the white noise hypothesis is rejected very strongly, which is expected since the series is nonstationary. The p-value for the test of the first six autocorrelations is printed as <.0001, which means the p-value is less than 0.0001.
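
The printed chi-square value can be reproduced with the Ljung-Box statistic, shown here as a worked check. The symbols Q_m, n, and r_k are notation introduced for this illustration, with n = 100 observations, m = 6 lags, and the r_k taken from the "Correlation" column of Figure 7.3.

```latex
\begin{align*}
Q_m &= n(n+2)\sum_{k=1}^{m}\frac{r_k^{2}}{n-k} \\[4pt]
Q_6 &= 100 \cdot 102 \left(
   \frac{0.95672^2}{99} + \frac{0.90691^2}{98} + \frac{0.85169^2}{97}
 + \frac{0.79110^2}{96} + \frac{0.72647^2}{95} + \frac{0.65876^2}{94}
 \right) \\
    &\approx 10200 \times 0.0418076 \approx 426.44
\end{align*}
```

Under the white noise hypothesis, Q_6 is compared with a chi-square distribution with 6 degrees of freedom, which is why a value this large gives a p-value far below 0.0001.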

Identification of the Differenced Series

Since the series is nonstationary, the next step is to transform it to a stationary series by differencing. That is, instead of modeling the SALES series itself, you model the change in SALES from one period to the next. To difference the SALES series, use another IDENTIFY statement and specify that the first difference of SALES be analyzed, as shown in the following statements:

   identify var=sales(1) nlag=8;
   run;

The second IDENTIFY statement produces the same information as the first but for the change in SALES from one period to the next rather than for the total sales in each period. The summary statistics output from this IDENTIFY statement is shown in Figure 7.5. Note that the period of differencing is given as 1, and one observation was lost through the differencing operation.

The ARIMA Procedure

Name of Variable = sales
Period(s) of Differencing	1
Mean of Working Series	0.660589
Standard Deviation	2.011543
Number of Observations	99
Observation(s) eliminated by differencing	1

Figure 7.5: IDENTIFY Statement Output for Differenced Series
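
To see what the specification sales(1) does, the differencing can be sketched outside PROC ARIMA with the DIF function. The data set name TEST_DIFF and the variable name CHANGE are hypothetical names used only for this illustration.

```sas
/* Compute the period-to-period change in SALES by hand */
data test_diff;
   set test;
   change = dif(sales);   /* sales minus the previous value; missing for the first observation */
run;

/* VARDEF=N requests a divisor of n for the standard deviation */
proc means data=test_diff n mean std vardef=n;
   var change;
run;
```

Because the first observation has no predecessor, CHANGE is missing there, which corresponds to the one observation eliminated by differencing. Assuming PROC ARIMA computes the standard deviation with a divisor of n, the statistics from PROC MEANS should be comparable to those in Figure 7.5.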

The autocorrelation plot for the differenced series is shown in Figure 7.6.

The ARIMA Procedure

Autocorrelations
Lag  Covariance  Correlation  -1 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 1   Std Error
  0    4.046306      1.00000  |                    |********************|          0
  1    3.351258      0.82823  |                .   |*****************   |   0.100504
  2    2.390895      0.59088  |              .     |************        |   0.154786
  3    1.838925      0.45447  |             .      |*********           |   0.176103
  4    1.494253      0.36929  |            .       |*******.            |   0.187576
  5    1.135753      0.28069  |            .       |****** .            |   0.194781
  6    0.801319      0.19804  |            .       |****   .            |   0.198825
  7    0.610543      0.15089  |            .       |***    .            |   0.200808
  8    0.326495      0.08069  |            .       |**     .            |   0.201950

"." marks two standard errors

Figure 7.6: Autocorrelations Plot for Change in SALES

The autocorrelations decrease rapidly in this plot, indicating that the change in SALES is a stationary time series.

The next step in the Box-Jenkins methodology is to examine the patterns in the autocorrelation plot to choose candidate ARMA models for the series. The partial and inverse autocorrelation function plots are also useful aids in identifying appropriate ARMA models for the series. The inverse and partial autocorrelation function plots are shown in Figure 7.7 and Figure 7.8, respectively.

The ARIMA Procedure

Inverse Autocorrelations
Lag  Correlation  -1 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 1
  1     -0.73867  |     ***************|   .                |
  2      0.36801  |                .   |*******             |
  3     -0.17538  |                ****|   .                |
  4      0.11431  |                .   |** .                |
  5     -0.15561  |                .***|   .                |
  6      0.18899  |                .   |****                |
  7     -0.15342  |                .***|   .                |
  8      0.05952  |                .   |*  .                |

Figure 7.7: Inverse Autocorrelation Function Plot for Change in SALES

The ARIMA Procedure

Partial Autocorrelations
Lag  Correlation  -1 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 1
  1      0.82823  |                .   |*****************   |
  2     -0.30275  |              ******|   .                |
  3      0.23722  |                .   |*****               |
  4     -0.07450  |                .  *|   .                |
  5     -0.02654  |                .  *|   .                |
  6     -0.01012  |                .   |   .                |
  7      0.04189  |                .   |*  .                |
  8     -0.17668  |                ****|   .                |

Figure 7.8: Partial Autocorrelation Plot for Change in SALES

In the usual Box and Jenkins approach to ARIMA modeling, the sample autocorrelation function, inverse autocorrelation function, and partial autocorrelation function are compared with the theoretical correlation functions expected from different kinds of ARMA models. This matching of theoretical autocorrelation functions of different ARMA models to the sample autocorrelation functions computed from the response series is the heart of the identification stage of Box-Jenkins modeling. Most textbooks on time series analysis discuss the theoretical autocorrelation functions for different kinds of ARMA models.

Since the input data is only a limited sample of the series, the sample autocorrelation functions computed from the input series will only approximate the true autocorrelation functions of the process generating the series. This means that the sample autocorrelation functions will not exactly match the theoretical autocorrelation functions for any ARMA model and may have a pattern similar to that of several different ARMA models.
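
As a reminder of the kind of theoretical patterns being matched, the autocorrelations of three simple models are summarized here. This is a textbook sketch, not output from the procedure. It assumes the convention (1 - \phi B) w_t = (1 - \theta B) a_t for the stationary series w_t, where B is the backshift operator and a_t is white noise; the symbols are introduced for this illustration.

```latex
\begin{align*}
\text{AR(1):}\quad & \rho_k = \phi^{k}, \quad k \ge 0
   && \text{(geometric decay; partial autocorrelations vanish after lag 1)} \\[4pt]
\text{MA(1):}\quad & \rho_1 = \frac{-\theta}{1+\theta^{2}}, \quad \rho_k = 0 \ \text{for } k \ge 2
   && \text{(autocorrelations cut off after lag 1)} \\[4pt]
\text{ARMA(1,1):}\quad & \rho_1 = \frac{(1-\phi\theta)(\phi-\theta)}{1+\theta^{2}-2\phi\theta},
   \quad \rho_k = \phi\,\rho_{k-1} \ \text{for } k \ge 2
   && \text{(geometric decay starting from lag 1)}
\end{align*}
```

In Figure 7.6 the successive ratios 0.59088/0.82823 (about 0.71) and 0.45447/0.59088 (about 0.77) are roughly constant, which resembles the geometric decay of the AR(1) and ARMA(1,1) patterns more than the cutoff of a pure MA(1) pattern.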

If the series is white noise (a purely random process), then there is no need to fit a model. The check for white noise, shown in Figure 7.9, indicates that the change in SALES is highly autocorrelated. Thus, an autoregressive model, for example an AR(1) model, might be a good candidate model to fit to this process.

The ARIMA Procedure

Autocorrelation Check for White Noise
To Lag  Chi-Square  DF  Pr > ChiSq  Autocorrelations
     6      154.44   6      <.0001  0.828  0.591  0.454  0.369  0.281  0.198

Figure 7.9: IDENTIFY Statement Check for White Noise
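
As a sketch of how the AR(1) candidate for the differenced series could be carried forward, an ESTIMATE statement with P=1 can follow the IDENTIFY statement. Estimation belongs to the next modeling stage and is only outlined here; whether AR(1) is adequate for this series is not established by the identification output alone.

```sas
proc arima data=test;
   identify var=sales(1) nlag=8;   /* analyze the first difference of SALES */
   estimate p=1;                   /* candidate AR(1) model for the change in SALES */
run;
quit;
```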
