Predicted and Residual Values
The display of the predicted values and residuals is controlled
by the P, R, CLM, and CLI options in the MODEL statement.
The P option causes PROC REG to display the observation
number, the ID value (if an ID statement is used), the
actual value, the predicted value, and the residual.
The R, CLI, and CLM options also
produce the items under the P option.
Thus, P is unnecessary if you use one of the other options.
The R option requests more detail, especially about the residuals.
The standard errors of the mean predicted
value and the residual are displayed.
The studentized residual, which is the residual divided
by its standard error, is both displayed and plotted.
A measure of influence, Cook's D, is displayed.
Cook's D measures the change to the estimates
that results from deleting each observation
(Cook 1977, 1979).
This statistic is very similar to DFFITS.
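As a sketch of how these two statistics relate, they can be written in the form below. The notation is introduced here for illustration: e_i is the residual, STDERR(.) denotes the displayed standard errors, and k is the number of parameters in the model (including the intercept).

```latex
\[
  r_i = \frac{e_i}{\mathrm{STDERR}(e_i)},
  \qquad
  D_i = \frac{1}{k}\, r_i^{2}\,
        \frac{\mathrm{STDERR}(\hat{y}_i)^{2}}{\mathrm{STDERR}(e_i)^{2}}
\]
% Check against observation 19 (Year 1970) of the example output in this
% section, where k = 3:
%   r_19 = 3.9895 / 2.178                     (approximately 1.83)
%   D_19 = (1/3)(1.831)^2 (1.7289 / 2.178)^2  (approximately 0.704)
```

The small differences from the displayed values come from using the rounded standard errors.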
The CLM option requests that PROC REG display the 100(1 - α)% lower
and upper confidence limits for the mean predicted values.
This accounts for the variation due
to estimating the parameters only.
If you want a 100(1 - α)% confidence interval for
observed values, then you can use the CLI option,
which adds in the variability of the error term.
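The difference between the two intervals can be sketched as follows. The symbols are notation assumed here: s^2 is the mean square error, n - k is the error degrees of freedom, and t is the corresponding t quantile.

```latex
\[
  \text{CLM:}\quad \hat{y}_i \pm t_{1-\alpha/2,\;n-k}\,\mathrm{STDERR}(\hat{y}_i)
\]
\[
  \text{CLI:}\quad \hat{y}_i \pm t_{1-\alpha/2,\;n-k}\,
      \sqrt{s^{2} + \mathrm{STDERR}(\hat{y}_i)^{2}}
\]
% Check against observation 1 (Year 1790) of the example output in this
% section, with alpha = 0.05, 16 error degrees of freedom (t about 2.120),
% and s^2 = 7.73410:
%   CLM: 5.0384 +/- 2.120 (1.7289)                   about ( 1.37,  8.70)
%   CLI: 5.0384 +/- 2.120 sqrt(7.73410 + 1.7289^2)   about (-1.90, 11.98)
```

The CLI interval is always the wider of the two, because it adds the error variance to the variance of the estimated mean.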
The α level can be specified with the ALPHA= option
in the PROC REG or MODEL statement.
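As a minimal sketch, the following step requests 90% limits rather than the default 95% limits. It assumes the USPop2 data set that is created in the example later in this section; the value 0.1 is chosen only for illustration.

```sas
proc reg data=USPop2;
   id Year;
   /* ALPHA=0.1 in the MODEL statement gives 90% limits for CLM and CLI */
   model Population=Year YearSq / cli clm alpha=0.1;
run;
```

Specifying ALPHA= in the PROC REG statement instead would set the level for the whole procedure step rather than for one MODEL statement.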
You can use these statistics in PLOT and PAINT statements.
This is useful in performing a variety of regression diagnostics.
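For instance, a plot of studentized residuals against predicted values might be requested as sketched below. This assumes the USPop2 data set from the example later in this section and the PLOT statement convention of writing statistic keywords with a trailing period.

```sas
proc reg data=USPop2;
   model Population=Year YearSq / r;
   /* STUDENT. is the studentized residual and P. is the predicted value;
      the trailing period marks a statistic keyword, not a variable */
   plot student.*p.;
run;
```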
For definitions of the statistics produced by these options, see
Chapter 3, "Introduction to Regression Procedures."
The following example uses the US population data
found in the section "Polynomial Regression".
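/* Create the years to predict; only Year and YearSq are supplied */
data USPop2;
   input Year @@;
   YearSq=Year*Year;
   datalines;
1980 1990 2000
;

/* Append the new years to the original data; Population is missing for them */
data USPop2;
   set USPopulation USPop2;
run;

/* Request residual detail (R) and both sets of confidence limits */
proc reg data=USPop2;
   id Year;
   model Population=Year YearSq / r cli clm;
run;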
The REG Procedure
Model: MODEL1
Dependent Variable: Population

Analysis of Variance

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               2             71799          35900    4641.72    <.0001
Error              16         123.74557        7.73410
Corrected Total    18             71923

Root MSE            2.78102    R-Square    0.9983
Dependent Mean     69.76747    Adj R-Sq    0.9981
Coeff Var           3.98613

Parameter Estimates

Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept     1                 20450         843.47533      24.25      <.0001
Year          1             -22.78061           0.89785     -25.37      <.0001
YearSq        1               0.00635        0.00023877      26.58      <.0001

Figure 55.30: Regression Using the R, CLI, and CLM Options
The REG Procedure
Model: MODEL1
Dependent Variable: Population

Output Statistics

              Dep Var  Predicted     Std Error     95% CL Mean        95% CL Predict              Std Error   Student                   Cook's
Obs  Year  Population      Value  Mean Predict                                          Residual   Residual  Residual    -2-1 0 1 2          D
  1  1790      3.9290     5.0384        1.7289    1.3734    8.7035   -1.9034   11.9803   -1.1094      2.178    -0.509  |     *|      |   0.054
  2  1800      5.3080     5.0389        1.3909    2.0904    7.9874   -1.5528   11.6306    0.2691      2.408     0.112  |      |      |   0.001
  3  1810      7.2390     6.3085        1.1304    3.9122    8.7047   -0.0554   12.6724    0.9305      2.541     0.366  |      |      |   0.009
  4  1820      9.6380     8.8472        0.9571    6.8182   10.8761    2.6123   15.0820    0.7908      2.611     0.303  |      |      |   0.004
  5  1830     12.8660    12.6550        0.8721   10.8062   14.5037    6.4764   18.8335    0.2110      2.641    0.0799  |      |      |   0.000
  6  1840     17.0690    17.7319        0.8578   15.9133   19.5504   11.5623   23.9015   -0.6629      2.645    -0.251  |      |      |   0.002
  7  1850     23.1910    24.0779        0.8835   22.2049   25.9509   17.8920   30.2638   -0.8869      2.637    -0.336  |      |      |   0.004
  8  1860     31.4430    31.6931        0.9202   29.7424   33.6437   25.4832   37.9029   -0.2501      2.624   -0.0953  |      |      |   0.000
  9  1870     39.8180    40.5773        0.9487   38.5661   42.5885   34.3482   46.8065   -0.7593      2.614    -0.290  |      |      |   0.004
 10  1880     50.1550    50.7307        0.9592   48.6972   52.7642   44.4944   56.9671   -0.5757      2.610    -0.221  |      |      |   0.002
 11  1890     62.9470    62.1532        0.9487   60.1420   64.1644   55.9241   68.3823    0.7938      2.614     0.304  |      |      |   0.004
 12  1900     75.9940    74.8448        0.9202   72.8942   76.7955   68.6350   81.0547    1.1492      2.624     0.438  |      |      |   0.008
 13  1910     91.9720    88.8056        0.8835   86.9326   90.6785   82.6197   94.9915    3.1664      2.637     1.201  |      |**    |   0.054
 14  1920    105.7100   104.0354        0.8578  102.2169  105.8540   97.8658  110.2051    1.6746      2.645     0.633  |      |*     |   0.014
 15  1930    122.7750   120.5344        0.8721  118.6857  122.3831  114.3558  126.7130    2.2406      2.641     0.848  |      |*     |   0.026
 16  1940    131.6690   138.3025        0.9571  136.2735  140.3315  132.0676  144.5374   -6.6335      2.611    -2.540  | *****|      |   0.289
 17  1950    151.3250   157.3397        1.1304  154.9434  159.7360  150.9758  163.7036   -6.0147      2.541    -2.367  |  ****|      |   0.370
 18  1960    179.3230   177.6460        1.3909  174.6975  180.5945  171.0543  184.2377    1.6770      2.408     0.696  |      |*     |   0.054
 19  1970    203.2110   199.2215        1.7289  195.5564  202.8865  192.2796  206.1633    3.9895      2.178     1.831  |      |***   |   0.704
 20  1980           .   222.0660        2.1348  217.5404  226.5916  214.6338  229.4983         .          .         .                        .
 21  1990           .   246.1797        2.6019  240.6639  251.6955  238.1062  254.2532         .          .         .                        .
 22  2000           .   271.5625        3.1257  264.9363  278.1887  262.6932  280.4317         .          .         .                        .

Sum of Residuals                  -5.8175E-11
Sum of Squared Residuals            123.74557
Predicted Residual SS (PRESS)       188.54924

Figure 55.31: Regression Using the R, CLI, and CLM Options
After producing the usual Analysis of Variance and Parameter
Estimates tables (Figure 55.30),
the procedure displays the results of
requesting the options for predicted and residual values
(Figure 55.31).
For each observation, the requested information is shown.
Note that the ID variable is used to identify each observation.
Also note that, for observations with missing dependent variables,
the predicted value, standard error of the predicted value, and
confidence intervals for the predicted value are still available.
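If the predictions for the future years are needed in a data set rather than only in the displayed output, one possible approach is the OUTPUT statement, sketched below. The data set name Pred and the variable names PredPop, LowerMean, UpperMean, LowerPred, and UpperPred are hypothetical names chosen for this illustration.

```sas
proc reg data=USPop2;
   id Year;
   model Population=Year YearSq;
   /* P= predicted value, LCLM=/UCLM= limits for the mean,
      LCL=/UCL= limits for an individual observation */
   output out=Pred p=PredPop lclm=LowerMean uclm=UpperMean
                   lcl=LowerPred ucl=UpperPred;
run;

/* List only the observations whose response is missing (1980, 1990, 2000) */
proc print data=Pred;
   where Population = .;
   var Year PredPop LowerMean UpperMean LowerPred UpperPred;
run;
```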
The plot of studentized residuals and the Cook's D
statistics are displayed as a result of requesting the R option.
In the plot of studentized residuals, a large
number of observations with absolute values
greater than two indicates an inadequate model.
A version of the studentized residual plot can be created on
a high-resolution graphics device; see Example 55.7 for a similar
example.
Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.