Example 49.2: Best Subset Selection
Best subset selection is an alternative to stepwise selection of variables.
The procedure uses the branch and bound algorithm of Furnival and Wilson
(1974) to find a specified number of best models containing one, two, three
variables, and so on, up to the single model containing all of the
explanatory variables. The criterion used to determine "best" is the global
score chi-squared statistic: for two models A and B that have the same number
of explanatory variables, model A is considered better than model B if the
global score chi-squared statistic for A exceeds that for B.
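To give a sense of the size of the search, the short Python sketch below counts how many distinct models exist for each number of variables when there are nine candidate covariates, as in this example. It is only an illustration of the combinatorics; it does not reproduce the branch and bound algorithm, which is designed to find the best models without scoring every subset. The variable names are hypothetical.

```python
from math import comb

# Nine candidate covariates, matching the MODEL statement in this example.
n_covariates = 9

# Number of distinct models containing exactly k covariates, k = 1..9.
models_per_size = {k: comb(n_covariates, k) for k in range(1, n_covariates + 1)}

# Total number of non-empty subsets of the nine covariates.
total_models = sum(models_per_size.values())

for k, count in models_per_size.items():
    print(f"{k} variable(s): {count} candidate models")
print(f"total: {total_models}")
```

With nine covariates there are 36 two-variable models and 511 non-empty models in total, so requesting only a few best models per size keeps the listing short.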
To request best subset selection, specify the SELECTION=SCORE option in the
MODEL statement. The BEST=3 option requests that the procedure identify only
the three best models for each size. In other words, PROC PHREG lists the
three models that have the highest score statistics among all possible models
with a given number of covariates. The following statements request this
analysis:
   proc phreg data=Myeloma;
      model Time*VStatus(0)=LogBUN HGB Platelet Age LogWBC
                            Frac LogPBM Protein SCalc
            / selection=score best=3;
   run;
Output 49.2.1 displays the results of this analysis. The number of
explanatory variables in the model is given in the first column, the score
chi-squared statistic in the second column, and the names of the variables
in the model on the right. Within each model size, the models are listed in
descending order of their score chi-squared values. For example, among all
models containing two explanatory variables, the model that contains LogBUN
and HGB has the largest score value (12.7252), the model that contains
LogBUN and Platelet has the second largest score value (11.1842), and the
model that contains LogBUN and SCalc has the third largest score value
(9.9962).
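The ordering rule described above (group by model size, sort by score in descending order, keep the top few) can be sketched in Python as follows. This is a minimal illustration, not PROC PHREG's implementation: the function `best_per_size` and its `best` argument are hypothetical names, and the input is limited to the one- and two-variable rows reported in Output 49.2.1 rather than the full set of candidate models.

```python
from collections import defaultdict

# (number of variables, score chi-square, variables) for the one- and
# two-variable rows reported in Output 49.2.1, deliberately unsorted.
reported = [
    (2, 9.9962, "LogBUN SCalc"),
    (1, 5.0664, "HGB"),
    (2, 12.7252, "LogBUN HGB"),
    (1, 8.5164, "LogBUN"),
    (2, 11.1842, "LogBUN Platelet"),
    (1, 3.1816, "Platelet"),
]

def best_per_size(models, best):
    """Return the `best` highest-scoring models for each model size."""
    by_size = defaultdict(list)
    for size, score, variables in models:
        by_size[size].append((score, variables))
    # Sort each group by score, largest first, and keep the top `best` rows.
    return {
        size: sorted(rows, reverse=True)[:best]
        for size, rows in sorted(by_size.items())
    }

top2 = best_per_size(reported, best=2)
for size, rows in top2.items():
    for score, variables in rows:
        print(f"{size}  {score:.4f}  {variables}")
```

Here `best=2` is used only so that the cut is visible on the six reported rows; the analysis in this example keeps three models per size.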
Output 49.2.1: Best Variable Combinations
Regression Models Selected by Score Criterion

Number of Variables | Score Chi-Square | Variables Included in Model
1                   |  8.5164          | LogBUN
1                   |  5.0664          | HGB
1                   |  3.1816          | Platelet
2                   | 12.7252          | LogBUN HGB
2                   | 11.1842          | LogBUN Platelet
2                   |  9.9962          | LogBUN SCalc
3                   | 15.3053          | LogBUN HGB SCalc
3                   | 13.9911          | LogBUN HGB Age
3                   | 13.5788          | LogBUN HGB Frac
4                   | 16.9873          | LogBUN HGB Age SCalc
4                   | 16.0457          | LogBUN HGB Frac SCalc
4                   | 15.7619          | LogBUN HGB LogPBM SCalc
5                   | 17.6291          | LogBUN HGB Age Frac SCalc
5                   | 17.3519          | LogBUN HGB Age LogPBM SCalc
5                   | 17.1922          | LogBUN HGB Age LogWBC SCalc
6                   | 17.9120          | LogBUN HGB Age Frac LogPBM SCalc
6                   | 17.7947          | LogBUN HGB Age LogWBC Frac SCalc
6                   | 17.7744          | LogBUN HGB Platelet Age Frac SCalc
7                   | 18.1517          | LogBUN HGB Platelet Age Frac LogPBM SCalc
7                   | 18.0568          | LogBUN HGB Age LogWBC Frac LogPBM SCalc
7                   | 18.0223          | LogBUN HGB Platelet Age LogWBC Frac SCalc
8                   | 18.3925          | LogBUN HGB Platelet Age LogWBC Frac LogPBM SCalc
8                   | 18.1636          | LogBUN HGB Platelet Age Frac LogPBM Protein SCalc
8                   | 18.1309          | LogBUN HGB Platelet Age LogWBC Frac Protein SCalc
9                   | 18.4550          | LogBUN HGB Platelet Age LogWBC Frac LogPBM Protein SCalc
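As a descriptive aid for reading Output 49.2.1, the Python sketch below computes how much the highest score chi-square increases each time one more variable is allowed. The scores are copied from the output; the names `best_score` and `gains` are hypothetical, and the differences are simple arithmetic on the reported values, not a formal test.

```python
# Highest score chi-square reported for each model size in Output 49.2.1.
best_score = {
    1: 8.5164, 2: 12.7252, 3: 15.3053, 4: 16.9873, 5: 17.6291,
    6: 17.9120, 7: 18.1517, 8: 18.3925, 9: 18.4550,
}

# Increase in the best score when the model size grows from k-1 to k.
gains = {
    k: round(best_score[k] - best_score[k - 1], 4)
    for k in range(2, 10)
}

for k, gain in gains.items():
    print(f"{k - 1} -> {k} variables: +{gain:.4f}")
```

On these values the increase is 4.2088 from one to two variables and 2.5801 from two to three, while every increase beyond five variables is below 0.3.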
Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.