Forecasting Methods

The FORECAST Procedure

This section explains the forecasting methods used by PROC FORECAST.

STEPAR Method

In the STEPAR method, PROC FORECAST first fits a time trend model to the series and takes the difference between each value and the estimated trend. (This process is called detrending.) Then, the remaining variation is fit using an autoregressive model.

The STEPAR method fits the autoregressive process to the residuals of the trend model using a backwards-stepping method to select parameters. Since the trend and autoregressive parameters are fit in sequence rather than simultaneously, the parameter estimates are not optimal in a statistical sense; however, the estimates are usually close to optimal, and the method is computationally inexpensive.

The STEPAR Algorithm

The STEPAR method consists of the following computational steps:

1. Fit the trend model as specified by the TREND= option using ordinary least-squares regression. This step detrends the data. The default trend model for the STEPAR method is TREND=2, a linear trend model.
2. Take the residuals from step 1 and compute the autocovariances to the number of lags specified by the NLAGS= option.
3. Regress the current values against the lags, using the autocovariances from step 2 in a Yule-Walker framework. Do not bring in any autoregressive parameter that is not significant at the level specified by the SLENTRY= option. (The default is SLENTRY=0.20.) Do not bring in any autoregressive parameter that results in a nonpositive-definite Toeplitz matrix.
4. Find the autoregressive parameter that is least significant. If its significance level is greater than the SLSTAY= value, remove the parameter from the model. (The default is SLSTAY=0.05.) Repeat this process until only significant autoregressive parameters remain. If the OUTEST= option is specified, write the estimates to the OUTEST= data set.
5. Generate the forecasts using the estimated model and write them to the OUT= data set. Form the confidence limits by combining the trend variances with the autoregressive variances.
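The first three steps (detrend, compute autocovariances, solve the Yule-Walker equations) can be sketched in Python as follows. This is an illustrative sketch, not PROC FORECAST's implementation. It assumes a linear trend (TREND=2), a complete series with no missing values, and a caller-chosen set of lags. It omits the SLENTRY=/SLSTAY= significance tests and the Toeplitz positive-definiteness check entirely. All function names and the simulated data are hypothetical.

```python
import random


def ols_linear_trend(x):
    """Step 1 with TREND=2: ordinary least-squares fit of x_t = b0 + b1*t."""
    n = len(x)
    t = range(n)
    tbar = (n - 1) / 2
    xbar = sum(x) / n
    sxx = sum((ti - tbar) ** 2 for ti in t)
    b1 = sum((ti - tbar) * (xi - xbar) for ti, xi in zip(t, x)) / sxx
    b0 = xbar - b1 * tbar
    residuals = [xi - (b0 + b1 * ti) for ti, xi in zip(t, x)]
    return (b0, b1), residuals


def autocovariances(e, nlags):
    """Step 2: sample autocovariances c_0 .. c_nlags of the residuals."""
    n = len(e)
    m = sum(e) / n
    d = [v - m for v in e]
    return [sum(d[i] * d[i + k] for i in range(n - k)) / n
            for k in range(nlags + 1)]


def solve(a, b):
    """Solve the small linear system a @ x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]            # partial pivoting
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):                   # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def yule_walker(c, lags):
    """Step 3 without significance tests: Yule-Walker AR coefficients
    for the chosen lags, built from the autocovariances c."""
    toeplitz = [[c[abs(i - j)] for j in lags] for i in lags]
    rhs = [c[i] for i in lags]
    return solve(toeplitz, rhs)


# Demo on simulated data: linear trend plus an AR(1) disturbance with phi = 0.6.
rng = random.Random(1)
noise, x = 0.0, []
for t in range(4000):
    noise = 0.6 * noise + rng.gauss(0.0, 1.0)
    x.append(5.0 + 0.1 * t + noise)
(b0, b1), resid = ols_linear_trend(x)
c = autocovariances(resid, 3)          # lags 0..3, as if NLAGS=3
print(b1, yule_walker(c, [1]), yule_walker(c, [1, 2]))
```

In the full method, the lag-2 coefficient in the second fit would be tested against SLSTAY= and, being close to zero here, would be dropped.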

Missing values are tolerated in the series; the autocorrelations are estimated from the available data and tapered if necessary.

This method requires at least three passes through the data: two passes to fit the model and a third pass to initialize the autoregressive process and write to the output data set.

Default Value of the NLAGS= Option

If the NLAGS= option is not specified, the default value of the NLAGS= option is chosen based on the data frequency specified by the INTERVAL= option and on the number of observations in the input data set, if this can be determined in advance. (PROC FORECAST cannot determine the number of input observations before reading the data when a BY statement or a WHERE statement is used or if the data are from a tape format SAS data set or external database. The NLAGS= value must be fixed before the data are processed.)

If the INTERVAL= option is specified, the default NLAGS= value includes lags for up to three years plus one, subject to the maximum of 13 lags or one third of the number of observations in your data set, whichever is less. If the number of observations in the input data set cannot be determined, the maximum NLAGS= default value is 13. If the INTERVAL= option is not specified, the default is NLAGS=13 or one-third the number of input observations, whichever is less.

If the Toeplitz matrix formed by the autocovariance matrix at a given step is not positive definite, the maximal number of autoregressive lags is reduced.

For example, for INTERVAL=QTR, the default is NLAGS=13 (that is, 4×3+1) provided that there are at least 39 observations. The NLAGS= option default is always at least 3.
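The default rules above can be summarized in a small Python sketch. The parameter `periods_per_year` is a hypothetical stand-in for the frequency implied by INTERVAL= (for example, 4 for QTR or 12 for MONTH). Rounding one third of the observation count down is an assumption, because the text does not state how fractions are handled. The sketch ignores the later reduction of lags when the Toeplitz matrix is not positive definite.

```python
def default_nlags(periods_per_year=None, nobs=None):
    """Default NLAGS= following the rules described above.

    periods_per_year: observations per year implied by INTERVAL=
        (hypothetical mapping, e.g. 4 for QTR); None if INTERVAL= is omitted.
    nobs: number of input observations, or None if it cannot be
        determined before the data are read.
    """
    cap = 13
    if nobs is not None:
        cap = min(cap, nobs // 3)          # assumption: one third, rounded down
    if periods_per_year is not None:
        nlags = min(3 * periods_per_year + 1, cap)   # three years plus one
    else:
        nlags = cap
    return max(nlags, 3)                   # the default is never below 3


print(default_nlags(4, 39))   # INTERVAL=QTR with 39 observations -> 13
print(default_nlags(12))      # INTERVAL=MONTH, unknown size -> capped at 13
```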

EXPO Method

Exponential smoothing is used when the METHOD=EXPO option is specified. The term exponential smoothing is derived from the computational scheme developed by Brown and others (Brown and Meyers 1961; Brown 1962). Estimates are computed with updating formulas that are developed across time series in a manner similar to smoothing.

The EXPO method fits a trend model such that the most recent data are weighted more heavily than data in the early part of the series. The weight of an observation is a geometric (exponential) function of the number of periods that the observation extends into the past relative to the current period. The weight function is

$w_{{\tau}}={\omega} (1-{\omega})^{t-{\tau}}$

where ${\tau}$ is the observation number of the past observation, t is the current observation number, and ${\omega}$ is the weighting constant specified with the WEIGHT= option.
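The following Python sketch evaluates the weight function over a finite history to show how quickly the weights decay. The function name is hypothetical. Over the last $t$ observations the weights form a geometric series that sums to $1-(1-{\omega})^{t}$, so the sum approaches 1 only as the history grows.

```python
def expo_weights(omega, t):
    """Weights w_tau = omega * (1 - omega) ** (t - tau) for tau = 1, ..., t."""
    return [omega * (1 - omega) ** (t - tau) for tau in range(1, t + 1)]


w = expo_weights(0.2, 10)
print([round(v, 4) for v in w])   # oldest observation first, newest last
print(sum(w))                     # equals 1 - 0.8**10, slightly below 1
```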

You specify the model with the TREND= option as follows:

TREND=1 specifies single exponential smoothing (a constant model)
TREND=2 specifies double exponential smoothing (a linear trend model)
TREND=3 specifies triple exponential smoothing (a quadratic trend model)

Updating Equations

The single exponential smoothing operation is expressed by the formula

$S_{t}={\omega}x_{t}+(1-{\omega})S_{t-1}$

where $S_t$ is the smoothed value at the current period, $t$ is the time index of the current period, and $x_t$ is the current actual value of the series. The smoothed value $S_t$ is the forecast of $x_{t+1}$ and is calculated as the smoothing constant ${\omega}$ times the value of the series, $x_t$, in the current period plus $(1-{\omega})$ times the previous smoothed value $S_{t-1}$, which is the forecast of $x_t$ computed at time $t-1$.

Double and triple exponential smoothing are derived by applying exponential smoothing to the smoothed series, obtaining smoothed values as follows:

$S_{t}^{[2]}={\omega}S_{t} +(1-{\omega}) S_{t-1}^{[2]}$

$S_{t}^{[3]}={\omega} S_{t}^{[2]} +(1-{\omega}) S_{t-1}^{[3]}$

Missing values after the start of the series are replaced with one-step-ahead predicted values, and the predicted value is then applied to the smoothing equations.
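The three smoothing recursions can be transcribed directly into Python. This sketch (the function name `brown_smooth` is hypothetical) assumes that the caller supplies the starting values $S_0$, $S_0^{[2]}$, and $S_0^{[3]}$ and that the series has no missing values. It does not implement the one-step-ahead replacement of missing values.

```python
def brown_smooth(x, omega, s0):
    """Apply the single, double, and triple smoothing recursions.

    s0 is the tuple (S_0, S_0^[2], S_0^[3]) of starting values.
    Returns one (S_t, S_t^[2], S_t^[3]) tuple per observation.
    """
    s1, s2, s3 = s0
    out = []
    for xt in x:
        s1 = omega * xt + (1 - omega) * s1   # single smoothing of the data
        s2 = omega * s1 + (1 - omega) * s2   # smoothing of S_t
        s3 = omega * s2 + (1 - omega) * s3   # smoothing of S_t^[2]
        out.append((s1, s2, s3))
    return out


print(brown_smooth([1.0, 2.0, 3.0], 0.5, (0.0, 0.0, 0.0))[-1])
```

Each stage uses the same weight ${\omega}$, which matches the traditional scheme described in this section.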

The polynomial time trend parameters CONSTANT, LINEAR, and QUAD in the OUTEST= data set are computed from $S_T$, $S_T^{[2]}$, and $S_T^{[3]}$, the final smoothed values at observation $T$, the last observation used to fit the model. In the OUTEST= data set, the values of $S_T$, $S_T^{[2]}$, and $S_T^{[3]}$ are identified by _TYPE_=S1, _TYPE_=S2, and _TYPE_=S3, respectively.
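For the linear case (TREND=2), the standard textbook relations for Brown's double exponential smoothing recover a level $2S_T - S_T^{[2]}$ and a slope $\frac{{\omega}}{1-{\omega}}(S_T - S_T^{[2]})$ from the final smoothed values. The Python sketch below assumes those textbook relations and the usual steady-state starting values for a line $a_0 + b_0 t$. It is not a statement of exactly how PROC FORECAST scales CONSTANT and LINEAR in the OUTEST= data set. The function name and data are hypothetical.

```python
def brown_linear(x, omega, a0, b0):
    """Double exponential smoothing of x, returning (level, slope) at the last
    observation from the standard textbook relations for Brown's method."""
    lag = (1 - omega) / omega
    # Assumed starting values: steady-state smoothed values of the line a0 + b0*t.
    s1, s2 = a0 - lag * b0, a0 - 2 * lag * b0
    for xt in x:
        s1 = omega * xt + (1 - omega) * s1     # S_t
        s2 = omega * s1 + (1 - omega) * s2     # S_t^[2]
    level = 2 * s1 - s2
    slope = omega / (1 - omega) * (s1 - s2)
    return level, slope


x = [10 + 2 * t for t in range(1, 21)]          # exact line, last value 50
level, slope = brown_linear(x, omega=0.10557, a0=10.0, b0=2.0)
print(level, slope, level + slope * 3)          # forecast three periods ahead
```

On an exactly linear series with consistent starting values, the recovered level equals the last observation and the slope equals the true slope.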

Smoothing Weights

Exponential smoothing forecasts are forecasts for an integrated moving-average process; however, the weighting parameter is specified by the user rather than estimated from the data. Experience has shown that good values for the WEIGHT= option are between 0.05 and 0.3. As a general rule, smaller smoothing weights are appropriate for series with a slowly changing trend, while larger weights are appropriate for volatile series with a rapidly changing trend. If unspecified, the weight defaults to $1-0.8^{1/\mathrm{trend}}$, where trend is the value of the TREND= option. This produces defaults of WEIGHT=0.2 for TREND=1, WEIGHT=0.10557 for TREND=2, and WEIGHT=0.07168 for TREND=3.
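The default weight formula can be checked with a one-line Python function (the name `default_weight` is hypothetical). It reproduces the three quoted defaults. Algebraically, the formula chooses ${\omega}$ so that $(1-{\omega})^{\mathrm{trend}} = 0.8$, so the combined discount across all smoothing stages is the same for every TREND= value.

```python
def default_weight(trend):
    """Default smoothing weight 1 - 0.8 ** (1 / trend)."""
    return 1 - 0.8 ** (1 / trend)


for trend in (1, 2, 3):
    print(trend, round(default_weight(trend), 5))
```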

Confidence Limits

The confidence limits for exponential smoothing forecasts are calculated as they would be for an exponentially weighted time-trend regression, using the simplifying assumption of an infinite number of observations. The variance estimate is computed using the mean square of the unweighted one-step-ahead forecast residuals.

More detailed descriptions of the forecast computations can be found in Montgomery and Johnson (1976) and Brown (1962).

Exponential Smoothing as an ARIMA Model

The traditional description of exponential smoothing given in the preceding section is standard in most books on forecasting, and so this traditional version is employed by PROC FORECAST.

However, the standard exponential smoothing model is, in fact, a special case of an ARIMA model (McKenzie 1984). Single exponential smoothing corresponds to an ARIMA(0,1,1) model; double exponential smoothing corresponds to an ARIMA(0,2,2) model; and triple exponential smoothing corresponds to an ARIMA(0,3,3) model.
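The ARIMA(0,1,1) correspondence for single exponential smoothing can be verified numerically. Write $f_t = S_{t-1}$ for the forecast of $x_t$ and $e_t = x_t - f_t$ for the one-step-ahead error. The smoothing equation gives $f_{t+1} = f_t + {\omega} e_t$, and therefore $x_t - x_{t-1} = e_t - (1-{\omega}) e_{t-1}$. That is an ARIMA(0,1,1) model with moving-average parameter $\theta = 1-{\omega}$. The Python sketch below (hypothetical function name and data) checks this identity on an arbitrary series.

```python
def ses_errors(x, omega, f1):
    """One-step-ahead errors e_t = x_t - f_t of single exponential smoothing.

    f1 is the forecast of the first observation (the starting value S_0);
    afterwards f_{t+1} = S_t = f_t + omega * e_t.
    """
    f, errors = f1, []
    for xt in x:
        e = xt - f
        errors.append(e)
        f += omega * e          # identical to omega*x_t + (1 - omega)*f_t
    return errors


x = [3.0, 5.0, 4.0, 6.5, 7.0, 6.0]
omega = 0.3
e = ses_errors(x, omega, f1=3.0)
# ARIMA(0,1,1) form: x_t - x_{t-1} = e_t - theta * e_{t-1}, theta = 1 - omega
for t in range(1, len(x)):
    print(x[t] - x[t - 1], e[t] - (1 - omega) * e[t - 1])
```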

The traditional exponential smoothing calculations can be viewed as a simple and computationally inexpensive method of forecasting the equivalent ARIMA model. The exponential smoothing technique was developed in the 1960s before computers were widely available and before ARIMA modeling methods were developed.

If you use exponential smoothing as a forecasting method, you might consider using the ARIMA procedure to forecast the equivalent ARIMA model as an alternative to the traditional version of exponential smoothing used by PROC FORECAST. The advantages of the ARIMA form are:

The optimal smoothing weight is automatically computed as the estimate of the moving average parameter of the ARIMA model.
For double exponential smoothing, the optimal pair of smoothing weights is computed. For triple exponential smoothing, the three optimal smoothing weights are computed by the ARIMA method. Most implementations of the traditional exponential smoothing method (including PROC FORECAST) use the same smoothing weight for each stage of smoothing.
The problem of setting the starting smoothed value is automatically handled by the ARIMA method. This is done in a statistically optimal way when the maximum likelihood method is used.
The statistical estimates of the forecast confidence limits have a sounder theoretical basis.

See Chapter 7, "The ARIMA Procedure," for information on forecasting with ARIMA models.

The Time Series Forecasting System provides for exponential smoothing models and allows you to either specify or optimize the smoothing weights. See Chapter 23, "Getting Started with Time Series Forecasting," for details.

WINTERS Method

The WINTERS method uses updating equations similar to exponential smoothing to fit parameters for the model

$x_{t} = ( a + b t ) s(t) + {\epsilon}_{t}$

where a and b are the trend parameters, and the function s(t) selects the seasonal parameter for the season corresponding to time t.

The WINTERS method assumes that the series values are positive. If negative or zero values are found in the series, a warning is printed and the values are treated as missing.

The preceding standard WINTERS model uses a linear trend. However, PROC FORECAST can also fit a version of the WINTERS method that uses a quadratic trend. When TREND=3 is specified for METHOD=WINTERS, PROC FORECAST fits the following model:

$x_{t} = ( a + b t + c t^2 ) s(t)+{\epsilon}_{t}$

The quadratic trend version of the Winters method is often unstable, and its use is not recommended.

When TREND=1 is specified, the following constant trend version is fit:

$x_{t} = a s(t) + {\epsilon}_{t}$

The default for the WINTERS method is TREND=2, which produces the standard linear trend model.

Seasonal Factors

The notation s(t) represents the selection of the seasonal factor used for different time periods. For example, if INTERVAL=DAY and SEASONS=MONTH, there are 12 seasonal factors, one for each month in the year, and the time index t is measured in days. For any observation, t is determined by the ID variable and s(t) selects the seasonal factor for the month that t falls in. For example, if t is 9 February 1993, then s(t) is the seasonal parameter for February.

When multiple seasons are specified, s(t) is the product of the parameters for the seasons. For example, if SEASONS=(MONTH DAY), then s(t) is the product of the seasonal parameter for the month corresponding to period t and the seasonal parameter for the day of the week corresponding to period t. When the SEASONS= option is not specified, the seasonal factors s(t) are not included in the model. See the section "Specifying Seasonality" later in this chapter for more information on specifying multiple seasonal factors.
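The selection of s(t) for SEASONS=(MONTH DAY) can be sketched in Python with the standard `datetime` module. All factor values are hypothetical. Day-of-week factors are indexed by Python's `date.weekday()` convention (Monday = 0, ..., Sunday = 6), which is an assumption of this sketch and not the SAS weekday numbering.

```python
from datetime import date


def seasonal_factor(d, month_factors, dow_factors):
    """s(t) for SEASONS=(MONTH DAY) with INTERVAL=DAY.

    month_factors: dict mapping month number 1..12 to its factor.
    dow_factors: list of 7 factors indexed by date.weekday().
    """
    return month_factors[d.month] * dow_factors[d.weekday()]


months = {m: 1.0 for m in range(1, 13)}
months[2] = 0.9                                  # hypothetical February factor
days = [1.0, 1.1, 1.0, 1.0, 1.0, 0.8, 0.7]       # hypothetical weekday factors
print(seasonal_factor(date(1993, 2, 9), months, days))
```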

Updating Equations

This section shows the updating equations for the Winters method. In the following formulas, $x_t$ is the actual value of the series at time $t$; $a_t$ is the smoothed value of the series at time $t$; $b_t$ is the smoothed trend at time $t$; $c_t$ is the smoothed quadratic trend at time $t$; and $s_{t-1}(t)$ selects the old value of the seasonal factor corresponding to time $t$ before the seasonal factors are updated.

The estimates of the constant, linear, and quadratic trend parameters are updated using the following equations:

For TREND=3,

$a_{t}={\omega}_{1} \frac{x_{t}}{s_{t-1}(t)} +(1-{\omega}_{1}) (a_{t-1}+b_{t-1}+c_{t-1})$

$b_{t}={\omega}_{2} (a_{t}-a_{t-1}+c_{t-1}) +(1-{\omega}_{2}) (b_{t-1}+2c_{t-1})$

$c_{t}={\omega}_{2}{1 \over 2} (b_{t} -b_{t-1}) +(1-{\omega}_{2}) c_{t-1}$

For TREND=2,

$a_{t}={\omega}_{1} \frac{x_{t}}{s_{t-1}(t)} +(1-{\omega}_{1}) (a_{t-1}+b_{t-1})$

$b_{t}={\omega}_{2} (a_{t}-a_{t-1}) +(1-{\omega}_{2}) b_{t-1}$

For TREND=1,

$a_{t}={\omega}_{1} \frac{x_{t}}{s_{t-1}(t)} +(1-{\omega}_{1}) a_{t-1}$

In this updating system, the trend polynomial is always centered at the current period, so that the intercept parameter of the trend polynomial for predicted values at times after $t$ is always the updated intercept parameter $a_t$. The predicted value for ${\tau}$ periods ahead is

$x_{t+{\tau}}=(a_{t}+b_{t}{\tau}) s_{t}(t+{\tau})$
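The TREND=2 updating equations and the forecast formula can be sketched in Python as follows. For brevity the sketch covers only the linear-trend case and holds the seasonal factor fixed. The function names and example numbers are hypothetical.

```python
def winters_step(x_t, a_prev, b_prev, s_old, w1, w2):
    """One TREND=2 update of the multiplicative Winters level and trend.

    s_old is s_{t-1}(t), the seasonal factor for period t before updating.
    """
    a_t = w1 * x_t / s_old + (1 - w1) * (a_prev + b_prev)   # level
    b_t = w2 * (a_t - a_prev) + (1 - w2) * b_prev            # trend
    return a_t, b_t


def winters_forecast(a_t, b_t, s_future, tau):
    """Forecast tau periods ahead: (a_t + b_t*tau) * s_t(t + tau)."""
    return (a_t + b_t * tau) * s_future


# Deseasonalized series 10 + 2t observed under a constant seasonal factor 1.5.
a, b = 10.0, 2.0
for t in range(1, 6):
    a, b = winters_step(1.5 * (10 + 2 * t), a, b, 1.5, 0.3, 0.1)
print(a, b, winters_forecast(a, b, 0.8, 2))
```

Because the example data follow the model exactly, the level tracks $10 + 2t$ and the trend stays at 2 regardless of the weights.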

The seasonal parameters are updated when the season changes in the data, using the mean of the ratios of the actual to the predicted values for the season. For example, if SEASONS=MONTH and INTERVAL=DAY, then, when the observation for the first of February is encountered, the seasonal parameter for January is updated using the formula

$s_{t}(t-1)={\omega}_{3} \frac{1}{31} \sum_{i=t-31}^{t-1}{\frac{x_{i}}{a_{i} } } + (1-{\omega}_{3}) s_{t-1}(t-1)$

where $t$ is February 1 of the current year and $s_t(t-1)$ is the seasonal parameter for January updated with the data available at time $t$.

When multiple seasons are used, $s_t(t)$ is a product of seasonal factors. For example, if SEASONS=(MONTH DAY), then $s_t(t)$ is the product of the seasonal factors for the month and for the day of the week: $s_t(t) = s^m_t(t)\, s^d_t(t)$.

The factor $s^m_t(t)$ is updated at the start of each month using the preceding formula, and the factor $s^d_t(t)$ is updated at the start of each week using the following formula:

$s^d_{t}(t-1)={\omega}_{3} \frac{1}7 \sum_{i=t-7}^{t-1}{\frac{x_{i}}{a_{i} } } + (1-{\omega}_{3}) s^d_{t-1}(t-1)$
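Both seasonal updates have the same shape: a weighted average of the old factor and the mean ratio $x_i / a_i$ over the season just completed. The following Python sketch (hypothetical function name and values) implements that shared formula for multiplicative seasonality only.

```python
def update_seasonal_factor(s_old, x, a, w3):
    """omega_3 * mean(x_i / a_i) + (1 - omega_3) * s_old over one completed season.

    x and a hold the actual values and smoothed levels for the periods of the
    season just completed (31 days for January, 7 days for a week).
    """
    ratios = [xi / ai for xi, ai in zip(x, a)]
    return w3 * sum(ratios) / len(ratios) + (1 - w3) * s_old


january = update_seasonal_factor(1.0, [12.0] * 31, [10.0] * 31, 0.25)
week = update_seasonal_factor(1.0, [9.0] * 7, [10.0] * 7, 0.5)
print(january, week)
```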

Missing values after the start of the series are replaced with one-step-ahead predicted values, and the predicted value is substituted for $x_i$ and applied to the updating equations.

Normalization

The parameters are normalized so that the seasonal factors for each cycle have a mean of 1.0. This normalization is performed after each complete cycle and at the end of the data. Thus, if INTERVAL=MONTH and SEASONS=MONTH are specified, and a series begins with a July value, then the seasonal factors for the series are normalized at each observation for July and at the last observation in the data set. The normalization is performed by dividing each of the seasonal parameters, and multiplying each of the trend parameters, by the mean of the unnormalized seasonal parameters.
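The normalization step can be sketched in Python (hypothetical names and values). Dividing the seasonal factors by their mean and multiplying the trend parameters by the same mean leaves every product of trend and seasonal factor, and therefore every fitted value, unchanged.

```python
def normalize(seasonals, trend_params):
    """Rescale seasonal factors to mean 1 and trend parameters inversely."""
    m = sum(seasonals) / len(seasonals)
    return [s / m for s in seasonals], [p * m for p in trend_params]


s_new, trend_new = normalize([0.9, 1.2, 1.5], [10.0, 2.0])
print(s_new, trend_new)   # factors now average 1.0
```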

Smoothing Weights

The weight for updating the seasonal factors, ${\omega}_3$, is given by the third value specified in the WEIGHT= option. If the WEIGHT= option is not used, then ${\omega}_3$ defaults to 0.25; if the WEIGHT= option is used but does not specify a third value, then ${\omega}_3$ defaults to ${\omega}_2$. The weight for updating the linear and quadratic trend parameters, ${\omega}_2$, is given by the second value specified in the WEIGHT= option; if the WEIGHT= option does not specify a second value, then ${\omega}_2$ defaults to ${\omega}_1$. The updating weight for the constant parameter, ${\omega}_1$, is given by the first value specified in the WEIGHT= option. As a general rule, smaller smoothing weights are appropriate for series with a slowly changing trend, while larger weights are appropriate for volatile series with a rapidly changing trend.

If the WEIGHT= option is not used, then ${\omega}_1$ defaults to $1-0.8^{1/\mathrm{trend}}$, where trend is the value of the TREND= option. This produces defaults of WEIGHT=0.2 for TREND=1, WEIGHT=0.10557 for TREND=2, and WEIGHT=0.07168 for TREND=3.
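The cascade of Winters weight defaults can be written as a small Python function. The function name is hypothetical. The `weight` argument stands in for the WEIGHT= list: `None` when the option is omitted, otherwise a tuple of one to three values. When WEIGHT= is omitted entirely, the sketch assumes that ${\omega}_2$ falls back to ${\omega}_1$, following the rule for a missing second value.

```python
def resolve_winters_weights(weight=None, trend=2):
    """Return (omega1, omega2, omega3) following the defaulting rules above."""
    if weight is None:
        w1 = 1 - 0.8 ** (1 / trend)
        return w1, w1, 0.25          # assumption: omega2 falls back to omega1
    w1 = weight[0]
    w2 = weight[1] if len(weight) > 1 else w1     # omega2 defaults to omega1
    w3 = weight[2] if len(weight) > 2 else w2     # omega3 defaults to omega2
    return w1, w2, w3


print(resolve_winters_weights())             # WEIGHT= omitted, TREND=2
print(resolve_winters_weights((0.3, 0.1)))   # third value taken from the second
```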

The Time Series Forecasting System provides for generating forecast models using the Winters method and allows you to specify or optimize the weights. See Chapter 23, "Getting Started with Time Series Forecasting," for details.

Confidence Limits

A method for calculating exact forecast confidence limits for the WINTERS method is not available. Therefore, the approach taken in PROC FORECAST is to assume that the true seasonal factors have small variability about a set of fixed seasonal factors and that the remaining variation of the series is small relative to the mean level of the series. The equations are written

$s_{t}(t)=\rm{I}(t)(1+{\delta}_{t})$

$x_{t}={\mu}\rm{I}(t)(1+{\gamma}_{t})$

$a_{t}={\xi}(1+{\alpha}_{t})$

where ${\mu}$ is the mean level and I(t) are the fixed seasonal factors. Assuming that ${{\alpha}_{t}}$ and ${{\delta}_{t}}$ are small, the forecast equations can be linearized, keeping only first-order terms in ${{\delta}_{t}}$ and ${{\alpha}_{t}}$. In terms of forecasts for ${{\gamma}_{t}}$, this linearized system is equivalent to a seasonal ARIMA model. Confidence limits for ${{\gamma}_{t}}$ are based on this ARIMA model and converted into confidence limits for $x_t$ using $s_t(t)$ as estimates of I(t).

The exponential smoothing confidence limits are based on an approximation to a weighted regression model, whereas the preceding Winters confidence limits are based on an approximation to an ARIMA model. You can use METHOD=WINTERS without the SEASONS= option to do exponential smoothing and get confidence limits for the EXPO forecasts based on the ARIMA model approximation. These are generally more pessimistic than the weighted regression confidence limits produced by METHOD=EXPO.

ADDWINTERS Method

The ADDWINTERS method is like the WINTERS method except that the seasonal parameters are added to the trend instead of multiplied with the trend. The default TREND=2 model is as follows:

$x_{t}=a+bt+s(t)+{\epsilon}_{t}$

The updating equations and confidence limit calculations described in the preceding section for the WINTERS method are modified accordingly for the additive version.

Holt Two-Parameter Exponential Smoothing

If the seasonal factors are omitted (that is, if the SEASONS= option is not specified), the WINTERS (and ADDWINTERS) method reduces to the Holt two-parameter version of exponential smoothing. Thus, the WINTERS method is often referred to as the Holt-Winters method. Double exponential smoothing is a special case of the Holt two-parameter smoother. The double exponential smoothing results can be duplicated with METHOD=WINTERS by omitting the SEASONS= option and appropriately setting the WEIGHT= option. Letting ${{\alpha}={\omega}(2-{\omega})}$ and ${{\beta}={\omega}/(2-{\omega})}$ , the following statements produce the same forecasts:

proc forecast method=expo trend=2 weight=ω ... ;

proc forecast method=winters trend=2
              weight=(α,β) ... ;

Although the forecasts are the same, the confidence limits are computed differently.
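Because the equivalence rests only on the substitution ${\alpha}={\omega}(2-{\omega})$ and ${\beta}={\omega}/(2-{\omega})$, it can be checked numerically. Here ${\alpha}$ plays the role of the level weight ${\omega}_1$ and ${\beta}$ the trend weight ${\omega}_2$ in the Winters updating equations. The Python sketch below runs double exponential smoothing and the nonseasonal TREND=2 Winters updates side by side from matching starting values and compares their one-step-ahead forecasts. For double smoothing it uses the standard textbook level $2S_t - S_t^{[2]}$, the slope $\frac{{\omega}}{1-{\omega}}(S_t - S_t^{[2]})$, and Brown's usual starting values, all of which are assumptions here. The function names and data are hypothetical, and the check says nothing about the confidence limits.

```python
def brown_double_forecasts(x, omega, a0, b0):
    """One-step-ahead forecasts from double exponential smoothing."""
    lag = (1 - omega) / omega
    s1, s2 = a0 - lag * b0, a0 - 2 * lag * b0    # assumed textbook start
    forecasts = []
    for xt in x:
        level = 2 * s1 - s2
        slope = omega / (1 - omega) * (s1 - s2)
        forecasts.append(level + slope)          # forecast of xt
        s1 = omega * xt + (1 - omega) * s1
        s2 = omega * s1 + (1 - omega) * s2
    return forecasts


def holt_forecasts(x, alpha, beta, a0, b0):
    """One-step-ahead forecasts from the nonseasonal TREND=2 Winters updates."""
    a, b = a0, b0
    forecasts = []
    for xt in x:
        forecasts.append(a + b)                  # forecast of xt
        a_new = alpha * xt + (1 - alpha) * (a + b)
        b = beta * (a_new - a) + (1 - beta) * b
        a = a_new
    return forecasts


omega = 0.2
alpha, beta = omega * (2 - omega), omega / (2 - omega)
x = [3.0, 4.5, 4.0, 6.0, 7.5, 7.0, 9.0]
f_expo = brown_double_forecasts(x, omega, 2.0, 1.0)
f_holt = holt_forecasts(x, alpha, beta, 2.0, 1.0)
print(max(abs(p - q) for p, q in zip(f_expo, f_holt)))   # rounding-level difference
```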

Choice of Weights for EXPO, WINTERS, and ADDWINTERS Methods

For the EXPO, WINTERS, and ADDWINTERS methods, properly chosen smoothing weights are of critical importance in generating reasonable results. There are several factors to consider in choosing the weights.

The noisier the data, the lower should be the weight given to the most recent observation. Another factor to consider is how quickly the mean of the time series is changing. If the mean of the series is changing rapidly, relatively more weight should be given to the most recent observation. The more stable the series over time, the lower should be the weight given to the most recent observation.

Note that the smoothing weights should be set separately for each series; weights that produce good results for one series may be poor for another series. Since PROC FORECAST does not have a feature to use different weights for different series, when forecasting multiple series with the EXPO, WINTERS, or ADDWINTERS method it may be desirable to use different PROC FORECAST steps with different WEIGHT= options.

For the Winters method, many combinations of weight values may produce unstable noninvertible models, even though all three weights are between 0 and 1. When the model is noninvertible, the forecasts depend strongly on values in the distant past, and predictions are determined largely by the starting values. Unstable models usually produce poor forecasts. The Winters model may be unstable even if the weights are optimally chosen to minimize the in-sample MSE. Refer to Archibald (1990) for a detailed discussion of the unstable region of the parameter space of the Winters model.

Optimal weights and forecasts for exponential smoothing models can be computed using the ARIMA procedure. For more information, see "Exponential Smoothing as an ARIMA Model" earlier in this chapter.

The ARIMA procedure can also be used to compute optimal weights and forecasts for seasonal ARIMA models similar to the Winters type methods. In particular, an ARIMA$(0,1,1)\times(0,1,1)_S$ model, where S is the length of the seasonal cycle, may be a good alternative to the additive version of the Winters method. The same ARIMA$(0,1,1)\times(0,1,1)_S$ model fit to the logarithms of the series may be a good alternative to the multiplicative Winters method. See Chapter 7, "The ARIMA Procedure," for information on forecasting with ARIMA models.

The Time Series Forecasting System can be used to automatically select an appropriate smoothing method as well as to optimize the smoothing weights. See Chapter 23, "Getting Started with Time Series Forecasting," for more information.

Starting Values for EXPO, WINTERS, and ADDWINTERS Methods

The exponential smoothing method requires starting values for the smoothed values $S_0$, $S_0^{[2]}$, and $S_0^{[3]}$. The Winters and additive Winters methods require starting values for the trend coefficients and seasonal factors.

By default, starting values for the trend parameters are computed by a time-trend regression over the first few observations for the series. Alternatively, you can specify the starting value for the trend parameters with the ASTART=, BSTART=, and CSTART= options.

The number of observations used in the time-trend regression for starting values depends on the NSTART= option. For METHOD=EXPO, the first n values of the series are used, where n is the value of the NSTART= option, and the coefficients of the time-trend regression are then used to form the initial smoothed values $S_0$, $S_0^{[2]}$, and $S_0^{[3]}$.

For METHOD=WINTERS or METHOD=ADDWINTERS, n complete seasonal cycles are used to compute starting values for the trend parameters, where n is the value of the NSTART= option. For example, for monthly data the seasonal cycle is one year, so NSTART=2 specifies that the first 24 observations at the beginning of each series are used for the time-trend regression that calculates the starting values.

The starting values for the seasonal factors for the WINTERS and ADDWINTERS methods are computed from seasonal averages over the first few complete seasonal cycles at the beginning of the series. The number of seasonal cycles averaged to compute starting seasonal factors is controlled by the NSSTART= option. For example, for monthly data with SEASONS=12 or SEASONS=MONTH, the first n January values are averaged to get the starting value for the January seasonal parameter, where n is the value of the NSSTART= option.

The $s_0(i)$ seasonal parameters are set to the ratio (for WINTERS) or difference (for ADDWINTERS) of the mean for the season to the overall mean for the observations used to compute seasonal starting values.

For example, if METHOD=WINTERS, INTERVAL=DAY, SEASONS=(MONTH DAY), and NSSTART=2 (the default), the initial seasonal parameter for January is the ratio of the mean value over days in the first two Januarys after the start of the series (that is, after the first nonmissing value) to the mean value for all days read for initialization of the seasonal factors. Likewise, the initial factor for Sundays is the ratio of the mean value for Sundays to the mean of all days read.
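The starting-value rules for the Winters methods can be sketched in Python. The sketch assumes a single seasonal cycle of fixed length `period`, a series that begins at the first season of a cycle, no missing values, and a time index that starts at 0. The function name and example data are hypothetical.

```python
def winters_start(x, period, nstart=2, nsstart=2, multiplicative=True):
    """Starting trend line and seasonal factors, following the rules above."""
    # Trend: OLS line over the first nstart complete cycles.
    head = x[: nstart * period]
    n = len(head)
    tbar, xbar = (n - 1) / 2, sum(head) / n
    b1 = sum((t - tbar) * (v - xbar) for t, v in enumerate(head)) / sum(
        (t - tbar) ** 2 for t in range(n))
    b0 = xbar - b1 * tbar
    # Seasonal factors: per-season means over the first nsstart cycles,
    # compared with the overall mean of those same observations.
    init = x[: nsstart * period]
    overall = sum(init) / len(init)
    factors = []
    for j in range(period):
        season = init[j::period]                 # every period-th value
        season_mean = sum(season) / len(season)
        factors.append(season_mean / overall if multiplicative
                       else season_mean - overall)
    return (b0, b1), factors


(b0, b1), s0 = winters_start([5 + 2 * t for t in range(6)], period=3)
print(b0, b1, s0)
```

WINTERS uses the ratio form and ADDWINTERS the difference form, selected here by the `multiplicative` flag.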

For the ASTART=, BSTART=, and CSTART= options, the values specified are associated with the variables in the VAR statement in the order in which the variables are listed (the first value with the first variable, the second value with the second variable, and so on). If there are fewer values than variables, default starting values are used for the later variables. If there are more values than variables, the extra values are ignored.
