Multiple Regression
Choose Vars:Hat Diag.
This adds the variable H_GPA to the data window, as shown in Figure 14.11. (The residual variable, R_GPA, is added when a residual-by-predicted plot is created.)
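To reproduce H_GPA outside SAS/INSIGHT, note that the hat diagonal for observation $i$ is $h_i = x_i'(X'X)^{-1}x_i$, where $x_i$ is the $i$th row of the design matrix $X$, including the intercept column. The following is a minimal pure-Python sketch under that assumption. The function names `invert` and `hat_diagonals` are hypothetical and not part of SAS/INSIGHT, and a real analysis would use a numerical library rather than a hand-written matrix inverse.

```python
from typing import List

Matrix = List[List[float]]


def invert(a: Matrix) -> Matrix:
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Augment a copy of `a` with the identity matrix.
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        # Swap in the row with the largest pivot to limit round-off error.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular; the predictors are collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def hat_diagonals(predictors: Matrix) -> List[float]:
    """Return h_i = x_i' (X'X)^{-1} x_i, with an intercept column prepended to `predictors`."""
    X = [[1.0] + [float(v) for v in row] for row in predictors]
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xtx_inv = invert(xtx)
    return [sum(x[i] * xtx_inv[i][j] * x[j] for i in range(p) for j in range(p))
            for x in X]


# One predictor plus an intercept, so p = 2.
print([round(h, 3) for h in hat_diagonals([[1], [2], [3], [4], [5]])])
# prints [0.6, 0.3, 0.2, 0.3, 0.6]
```

The printed values sum to $p = 2$, because the trace of the hat matrix equals the number of parameters in the model.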
Drag a rectangle in the fit window to select an area for the new plot.
Choose Analyze:Scatter Plot (Y X).
This displays the scatter plot variables dialog.
Assign R_GPA the Y role and H_GPA the X role, then click on OK.
The plot appears in the fit window in the area you selected.
Belsley, Kuh, and Welsch (1980) propose a cutoff of $2p/n$ for the hat diagonal values, where $n$ is the number of observations used to fit the model and $p$ is the number of parameters in the model. Observations with values above this cutoff should be investigated. For this example, H_GPA values over 0.036 should be investigated. About 15% of the observations have values above this cutoff.

There are other measures you can use to determine the influence of observations. These include Cook's D, Dffits, Covratio, and Dfbetas. Each of these measures examines some effect of deleting the $i$th observation.
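Because the hat diagonals always sum to $p$, their average is $p/n$, so the $2p/n$ cutoff flags observations whose leverage is more than twice the average. The sketch below applies the cutoff and reports the fraction of flagged observations, the quantity summarized above as about 15%. The function name `flag_high_leverage` and the ten hat values are hypothetical, chosen only for illustration; they are not the GPA data.

```python
from typing import List, Tuple


def flag_high_leverage(hat: List[float], p: int) -> Tuple[float, List[int], float]:
    """Apply the 2p/n hat-diagonal cutoff.

    Returns the cutoff, the 0-based indices of observations above it,
    and the fraction of observations flagged.
    """
    n = len(hat)
    cutoff = 2.0 * p / n
    flagged = [i for i, h in enumerate(hat) if h > cutoff]
    return cutoff, flagged, len(flagged) / n


# Hypothetical hat diagonals for n = 10 observations and p = 2 parameters
# (they sum to p, as hat diagonals must).
hat = [0.17, 0.15, 0.13, 0.12, 0.11, 0.10, 0.10, 0.10, 0.11, 0.91]
cutoff, flagged, fraction = flag_high_leverage(hat, p=2)
print(cutoff, flagged, fraction)  # prints 0.4 [9] 0.1
```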
Choose Vars:Dffits.
A new variable, F_GPA, that contains the Dffits values is added to the data window.
Large absolute values of Dffits indicate influential observations. A general cutoff to consider is 2. In this example, it is therefore useful to identify those observations where H_GPA exceeds 0.036 and the absolute value of F_GPA is greater than 2. One way to do this is to examine the H_GPA by F_GPA scatter plot.
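Dffits for observation $i$ measures how much its fitted value changes when observation $i$ is deleted, scaled by a deleted-observation standard error: $\mathrm{DFFITS}_i = t_i\sqrt{h_i/(1-h_i)}$, where $t_i = e_i / (s_{(i)}\sqrt{1-h_i})$ is the externally studentized residual and $s_{(i)}$ is the root mean squared error of the fit without observation $i$. To keep the sketch short, it assumes a single predictor (so $p = 2$ and the least-squares fit has a closed form); the same formulas apply to F_GPA in the multiple regression. The names `fit_line` and `dffits` and the example data are hypothetical.

```python
import math
from typing import List, Tuple


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Least-squares intercept b0 and slope b1 for y = b0 + b1 * x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - b1 * xbar, b1


def dffits(x: List[float], y: List[float]) -> List[float]:
    """Dffits for each observation of a one-predictor fit (p = 2 parameters)."""
    n, p = len(x), 2
    b0, b1 = fit_line(x, y)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(e * e for e in resid)
    values = []
    for xi, e in zip(x, resid):
        # Hat diagonal for a one-predictor model with intercept.
        h = 1.0 / n + (xi - xbar) ** 2 / sxx
        # Root MSE with observation i deleted: SSE_(i) = SSE - e_i^2 / (1 - h_i).
        s_del = math.sqrt((sse - e * e / (1.0 - h)) / (n - p - 1))
        # Externally studentized residual.
        t = e / (s_del * math.sqrt(1.0 - h))
        values.append(t * math.sqrt(h / (1.0 - h)))
    return values


x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1, 18.0, 45.0]
print([round(v, 2) for v in dffits(x, y)])
```

Computing $s_{(i)}$ from the full-data SSE gives the same result as refitting the model without each observation, but avoids $n$ separate fits. Values with `abs(v) > 2` would be flagged by the general cutoff described above.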
Choose Analyze:Scatter Plot (Y X).
This displays the scatter plot variables dialog.
Assign H_GPA the Y role and F_GPA the X role, then click on OK.
This displays the H_GPA by F_GPA scatter plot.
None of the observations identified as potentially influential (H_GPA > 0.036) are, in fact, influential for this model under the criterion |F_GPA| > 2.
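The visual check in the H_GPA by F_GPA scatter plot can also be written as a filter that keeps only observations meeting both criteria. This is a sketch; the function name `influential_high_leverage` and the example values are hypothetical, and `p` is the number of parameters in the model.

```python
from typing import List


def influential_high_leverage(hat: List[float], dffits: List[float], p: int,
                              dffits_cutoff: float = 2.0) -> List[int]:
    """Return 0-based indices with hat diagonal above 2p/n AND |Dffits| above the cutoff."""
    n = len(hat)
    if len(dffits) != n:
        raise ValueError("hat and dffits must have one value per observation")
    hat_cutoff = 2.0 * p / n
    return [i for i, (h, d) in enumerate(zip(hat, dffits))
            if h > hat_cutoff and abs(d) > dffits_cutoff]


# Hypothetical values: observation 9 has high leverage but a small Dffits,
# and observation 3 has a large Dffits but low leverage.
hat = [0.17, 0.15, 0.13, 0.12, 0.11, 0.10, 0.10, 0.10, 0.11, 0.91]
dff = [0.3, -0.2, 0.1, 2.5, 0.0, -0.1, 0.2, -0.3, 0.1, -1.2]
print(influential_high_leverage(hat, dff, p=2))  # prints []
```

An empty result corresponds to the situation described above: high leverage alone does not make an observation influential under the Dffits criterion.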
Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.