Example 60.1: Performing a Stepwise Discriminant Analysis
The iris data published by Fisher (1936) have been widely used
for examples in discriminant analysis and cluster analysis.
The sepal length, sepal width, petal length, and petal width are
measured in millimeters on fifty iris specimens from each of three
species: Iris setosa, I. versicolor, and I. virginica.
/* Map the numeric species codes to species names */
proc format;
   value specname
      1='Setosa '
      2='Versicolor'
      3='Virginica ';
run;

/* Read five values per specimen; @@ allows several specimens per data line */
data iris;
   title 'Fisher (1936) Iris Data';
   input SepalLength SepalWidth PetalLength PetalWidth
         Species @@;
   format Species specname.;
   label SepalLength='Sepal Length in mm.'
         SepalWidth ='Sepal Width in mm.'
         PetalLength='Petal Length in mm.'
         PetalWidth ='Petal Width in mm.';
   datalines;
50 33 14 02 1 64 28 56 22 3 65 28 46 15 2 67 31 56 24 3
63 28 51 15 3 46 34 14 03 1 69 31 51 23 3 62 22 45 15 2
59 32 48 18 2 46 36 10 02 1 61 30 46 14 2 60 27 51 16 2
65 30 52 20 3 56 25 39 11 2 65 30 55 18 3 58 27 51 19 3
68 32 59 23 3 51 33 17 05 1 57 28 45 13 2 62 34 54 23 3
77 38 67 22 3 63 33 47 16 2 67 33 57 25 3 76 30 66 21 3
49 25 45 17 3 55 35 13 02 1 67 30 52 23 3 70 32 47 14 2
64 32 45 15 2 61 28 40 13 2 48 31 16 02 1 59 30 51 18 3
55 24 38 11 2 63 25 50 19 3 64 32 53 23 3 52 34 14 02 1
49 36 14 01 1 54 30 45 15 2 79 38 64 20 3 44 32 13 02 1
67 33 57 21 3 50 35 16 06 1 58 26 40 12 2 44 30 13 02 1
77 28 67 20 3 63 27 49 18 3 47 32 16 02 1 55 26 44 12 2
50 23 33 10 2 72 32 60 18 3 48 30 14 03 1 51 38 16 02 1
61 30 49 18 3 48 34 19 02 1 50 30 16 02 1 50 32 12 02 1
61 26 56 14 3 64 28 56 21 3 43 30 11 01 1 58 40 12 02 1
51 38 19 04 1 67 31 44 14 2 62 28 48 18 3 49 30 14 02 1
51 35 14 02 1 56 30 45 15 2 58 27 41 10 2 50 34 16 04 1
46 32 14 02 1 60 29 45 15 2 57 26 35 10 2 57 44 15 04 1
50 36 14 02 1 77 30 61 23 3 63 34 56 24 3 58 27 51 19 3
57 29 42 13 2 72 30 58 16 3 54 34 15 04 1 52 41 15 01 1
71 30 59 21 3 64 31 55 18 3 60 30 48 18 3 63 29 56 18 3
49 24 33 10 2 56 27 42 13 2 57 30 42 12 2 55 42 14 02 1
49 31 15 02 1 77 26 69 23 3 60 22 50 15 3 54 39 17 04 1
66 29 46 13 2 52 27 39 14 2 60 34 45 16 2 50 34 15 02 1
44 29 14 02 1 50 20 35 10 2 55 24 37 10 2 58 27 39 12 2
47 32 13 02 1 46 31 15 02 1 69 32 57 23 3 62 29 43 13 2
74 28 61 19 3 59 30 42 15 2 51 34 15 02 1 50 35 13 03 1
56 28 49 20 3 60 22 40 10 2 73 29 63 18 3 67 25 58 18 3
49 31 15 01 1 67 31 47 15 2 63 23 44 13 2 54 37 15 02 1
56 30 41 13 2 63 25 49 15 2 61 28 47 12 2 64 29 43 13 2
51 25 30 11 2 57 28 41 13 2 65 30 58 22 3 69 31 54 21 3
54 39 13 04 1 51 35 14 03 1 72 36 61 25 3 65 32 51 20 3
61 29 47 14 2 56 29 36 13 2 69 31 49 15 2 64 27 53 19 3
68 30 55 21 3 55 25 40 13 2 48 34 16 02 1 48 30 14 01 1
45 23 13 03 1 57 25 50 20 3 57 38 17 03 1 51 38 15 03 1
55 23 40 13 2 66 30 44 14 2 68 28 48 14 2 54 34 17 02 1
51 37 15 04 1 52 35 15 02 1 58 28 51 24 3 67 30 50 17 2
63 33 60 25 3 53 37 15 02 1
;
A stepwise discriminant analysis is performed by using
stepwise selection, the default method in PROC STEPDISC.
In the PROC STEPDISC statement, the BSSCP and
TSSCP options display the between-class SSCP matrix
and the total-sample corrected SSCP matrix.
By default, the significance level of an F test from an
analysis of covariance is used as the selection criterion.
The variable under consideration is the dependent variable,
and the variables already chosen act as covariates.
The following SAS statements produce Output 60.1.1
through Output 60.1.8:
proc stepdisc data=iris bsscp tsscp;
   class Species;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;
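The defaults that this analysis relies on can also be written out explicitly. The following sketch assumes the iris data set created earlier in this example. It spells out METHOD=STEPWISE together with SLENTRY=0.15 and SLSTAY=0.15, the significance levels reported in Output 60.1.1, so it should request the same analysis as the preceding statements.

```sas
/* Sketch: the same analysis with the default settings spelled out. */
/* Assumes the data set iris created earlier in this example.       */
proc stepdisc data=iris method=stepwise
              slentry=0.15 slstay=0.15   /* significance levels to enter and to stay */
              bsscp tsscp;
   class Species;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;
```

Writing the levels explicitly makes it easier to see which settings to change when a stricter or looser selection is wanted.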
Output 60.1.1: Iris Data: Summary Information
The Method for Selecting Variables is STEPWISE

Observations      150    Variable(s) in the Analysis       4
Class Levels        3    Variable(s) will be Included      0
                         Significance Level to Enter    0.15
                         Significance Level to Stay     0.15

Class Level Information

Species       Variable Name    Frequency     Weight    Proportion
Setosa        Setosa                  50    50.0000      0.333333
Versicolor    Versicolor              50    50.0000      0.333333
Virginica     Virginica               50    50.0000      0.333333

Output 60.1.2: Iris Data: Between-Class and Total-Sample SSCP Matrices
Between-Class SSCP Matrix

Variable       Label                  SepalLength     SepalWidth    PetalLength     PetalWidth
SepalLength    Sepal Length in mm.     6321.21333    -1995.26667    16524.84000     7127.93333
SepalWidth     Sepal Width in mm.     -1995.26667     1134.49333    -5723.96000    -2293.26667
PetalLength    Petal Length in mm.    16524.84000    -5723.96000    43710.28000    18677.40000
PetalWidth     Petal Width in mm.      7127.93333    -2293.26667    18677.40000     8041.33333

Total-Sample SSCP Matrix

Variable       Label                  SepalLength     SepalWidth    PetalLength     PetalWidth
SepalLength    Sepal Length in mm.    10216.83333     -632.26667    18987.30000     7692.43333
SepalWidth     Sepal Width in mm.      -632.26667     2830.69333    -4911.88000    -1812.42667
PetalLength    Petal Length in mm.    18987.30000    -4911.88000    46432.54000    19304.58000
PetalWidth     Petal Width in mm.      7692.43333    -1812.42667    19304.58000     8656.99333

In Step 1, the tolerance is 1.0 for each variable under
consideration because no variables have yet entered the model.
Variable PetalLength is selected because its F
statistic, 1180.161, is the largest among all variables.
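As a check on Step 1, the entry statistics for PetalLength can be reproduced by hand from the diagonal elements of the SSCP matrices in Output 60.1.2. The symbols used here are introduced only for illustration: B, T, and W are the between-class, total-sample, and within-class sums of squares for PetalLength, n = 150 is the number of observations, and g = 3 is the number of classes. With no covariates in the model yet, the analysis of covariance reduces to a one-way analysis of variance.

```latex
\begin{aligned}
B &= 43710.28, \qquad T = 46432.54, \qquad W = T - B = 2722.26,\\
R^2 &= \frac{B}{T} = \frac{43710.28}{46432.54} \approx 0.9414,\\
F &= \frac{B/(g-1)}{W/(n-g)} = \frac{43710.28/2}{2722.26/147} \approx 1180.16 .
\end{aligned}
```

The denominators g - 1 = 2 and n - g = 147 are the degrees of freedom shown in the Step 1 entry table.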
Output 60.1.3: Iris Data: Stepwise Selection Step 1
The STEPDISC Procedure
Stepwise Selection: Step 1

Statistics for Entry, DF = 2, 147

Variable       Label                  R-Square    F Value    Pr > F    Tolerance
SepalLength    Sepal Length in mm.      0.6187     119.26    <.0001       1.0000
SepalWidth     Sepal Width in mm.       0.4008      49.16    <.0001       1.0000
PetalLength    Petal Length in mm.      0.9414    1180.16    <.0001       1.0000
PetalWidth     Petal Width in mm.       0.9289     960.01    <.0001       1.0000

Variable PetalLength will be entered.

Variable(s) that have been Entered
PetalLength

Multivariate Statistics

Statistic                                   Value    F Value    Num DF    Den DF    Pr > F
Wilks' Lambda                            0.058628    1180.16         2       147    <.0001
Pillai's Trace                           0.941372    1180.16         2       147    <.0001
Average Squared Canonical Correlation    0.470686

In Step 2, with variable PetalLength already in the model,
PetalLength is tested for removal before a new variable is
selected for entry. Because PetalLength meets the criterion
to stay, it is used as a covariate in the analysis of
covariance for variable selection.
Variable SepalWidth is selected because its F statistic, 43.035,
is the largest among all variables not in the model and its
associated tolerance, 0.8164, meets the criterion to enter.
The process is repeated in Steps 3 and 4.
Variable PetalWidth is entered in Step 3,
and variable SepalLength is entered in Step 4.
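The Step 2 figures are tied together by a few simple relations, which the following worked check illustrates by using the rounded values from the output. The notation is introduced here only for illustration: R_p^2 is the partial R-square of the entering variable SepalWidth, Lambda_k is Wilks' lambda after step k, V_k is Pillai's trace after step k, and g = 3 is the number of classes. Because the inputs are rounded, the last digit can differ slightly from the displayed output.

```latex
\begin{aligned}
F &= \frac{R_p^2}{1 - R_p^2}\cdot\frac{146}{2}
   = \frac{0.3709}{0.6291}\times 73 \approx 43.04,\\
\Lambda_2 &= \Lambda_1\,(1 - R_p^2) = 0.058628 \times 0.6291 \approx 0.03688,\\
\mathrm{ASCC}_2 &= \frac{V_2}{g-1} = \frac{1.119908}{2} = 0.559954 .
\end{aligned}
```

The same product rule links the Wilks' lambda values of later steps: multiplying the previous value by one minus the partial R-square of the entering variable gives approximately 0.02498 at Step 3 and 0.02344 at Step 4.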
Output 60.1.4: Iris Data: Stepwise Selection Step 2
The STEPDISC Procedure
Stepwise Selection: Step 2

Statistics for Removal, DF = 2, 147

Variable       Label                  R-Square    F Value    Pr > F
PetalLength    Petal Length in mm.      0.9414    1180.16    <.0001

No variables can be removed.

Statistics for Entry, DF = 2, 146

Variable       Label                  Partial R-Square    F Value    Pr > F    Tolerance
SepalLength    Sepal Length in mm.              0.3198      34.32    <.0001       0.2400
SepalWidth     Sepal Width in mm.               0.3709      43.04    <.0001       0.8164
PetalWidth     Petal Width in mm.               0.2533      24.77    <.0001       0.0729

Variable SepalWidth will be entered.

Variable(s) that have been Entered
SepalWidth PetalLength

Multivariate Statistics

Statistic                                   Value    F Value    Num DF    Den DF    Pr > F
Wilks' Lambda                            0.036884     307.10         4       292    <.0001
Pillai's Trace                           1.119908      93.53         4       294    <.0001
Average Squared Canonical Correlation    0.559954

Output 60.1.5: Iris Data: Stepwise Selection Step 3
The STEPDISC Procedure
Stepwise Selection: Step 3

Statistics for Removal, DF = 2, 146

Variable       Label                  Partial R-Square    F Value    Pr > F
SepalWidth     Sepal Width in mm.               0.3709      43.04    <.0001
PetalLength    Petal Length in mm.              0.9384    1112.95    <.0001

No variables can be removed.

Statistics for Entry, DF = 2, 145

Variable       Label                  Partial R-Square    F Value    Pr > F    Tolerance
SepalLength    Sepal Length in mm.              0.1447      12.27    <.0001       0.1323
PetalWidth     Petal Width in mm.               0.3229      34.57    <.0001       0.0662

Variable PetalWidth will be entered.

Variable(s) that have been Entered
SepalWidth PetalLength PetalWidth

Multivariate Statistics

Statistic                                   Value    F Value    Num DF    Den DF    Pr > F
Wilks' Lambda                            0.024976     257.50         6       290    <.0001
Pillai's Trace                           1.189914      71.49         6       292    <.0001
Average Squared Canonical Correlation    0.594957

Output 60.1.6: Iris Data: Stepwise Selection Step 4
The STEPDISC Procedure
Stepwise Selection: Step 4

Statistics for Removal, DF = 2, 145

Variable       Label                  Partial R-Square    F Value    Pr > F
SepalWidth     Sepal Width in mm.               0.4295      54.58    <.0001
PetalLength    Petal Length in mm.              0.3482      38.72    <.0001
PetalWidth     Petal Width in mm.               0.3229      34.57    <.0001

No variables can be removed.

Statistics for Entry, DF = 2, 144

Variable       Label                  Partial R-Square    F Value    Pr > F    Tolerance
SepalLength    Sepal Length in mm.              0.0615       4.72    0.0103       0.0320

Variable SepalLength will be entered.

All variables have been entered.

Multivariate Statistics

Statistic                                   Value    F Value    Num DF    Den DF    Pr > F
Wilks' Lambda                            0.023439     199.15         8       288    <.0001
Pillai's Trace                           1.191899      53.47         8       290    <.0001
Average Squared Canonical Correlation    0.595949

Since no more variables can be added to or removed from the
model, the procedure stops at Step 5 and displays a summary of
the selection process.
Output 60.1.7: Iris Data: Stepwise Selection Step 5
The STEPDISC Procedure
Stepwise Selection: Step 5

Statistics for Removal, DF = 2, 144

Variable       Label                  Partial R-Square    F Value    Pr > F
SepalLength    Sepal Length in mm.              0.0615       4.72    0.0103
SepalWidth     Sepal Width in mm.               0.2335      21.94    <.0001
PetalLength    Petal Length in mm.              0.3308      35.59    <.0001
PetalWidth     Petal Width in mm.               0.2570      24.90    <.0001

No variables can be removed.

No further steps are possible.

Output 60.1.8: Iris Data: Stepwise Selection Summary
Stepwise Selection Summary

      Number                                              Partial                       Wilks'    Pr <        Average Squared    Pr >
Step      In  Entered      Removed  Label                R-Square  F Value  Pr > F      Lambda  Lambda  Canonical Correlation    ASCC
   1       1  PetalLength           Petal Length in mm.    0.9414  1180.16  <.0001  0.05862828  <.0001             0.47068586  <.0001
   2       2  SepalWidth            Sepal Width in mm.     0.3709    43.04  <.0001  0.03688411  <.0001             0.55995394  <.0001
   3       3  PetalWidth            Petal Width in mm.     0.3229    34.57  <.0001  0.02497554  <.0001             0.59495691  <.0001
   4       4  SepalLength           Sepal Length in mm.    0.0615     4.72  0.0103  0.02343863  <.0001             0.59594941  <.0001

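How many variables are selected depends on the significance levels. As a hedged illustration, the following sketch tightens both levels to 0.01, a value chosen here only for demonstration, and again assumes the iris data set created earlier in this example. Because SepalLength enters at Step 4 with Pr > F = 0.0103 in Output 60.1.8, it would be expected not to enter under this setting, which would leave a three-variable model. That run is not part of this example, so its output is not shown.

```sas
/* Sketch: stricter entry and stay levels (0.01 is an illustrative choice). */
/* Assumes the data set iris created earlier in this example.               */
proc stepdisc data=iris slentry=0.01 slstay=0.01;
   class Species;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;
```
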
Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.