Example 62.6: Stratum Collapse
In a stratified sample, it is possible that some strata
will have only one sampling unit. When this happens,
PROC SURVEYREG collapses the strata that contain a
single sampling unit into a pooled stratum. For more
detailed information on stratum collapse, see
the section "Stratum Collapse".
Suppose that you have the following data.
data Sample;
   input Stratum X Y;
   datalines;
10 0 0
10 1 1
11 1 1
11 1 2
12 3 3
33 4 4
14 6 7
12 3 4
;
The variable Stratum is the stratification
variable, the variable X is the independent
variable, and the variable Y is the dependent
variable. You want to regress Y on X. In the
data set Sample, Stratum=33 and Stratum=14 each
contain only one observation. By default, PROC
SURVEYREG collapses these strata into one pooled
stratum in the regression analysis.
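Before you run the regression, you might want to check which strata contain a single observation. The following statements are a minimal sketch and are not part of the original example. The data set name StratumCounts is hypothetical, and the sketch assumes that each observation is one sampling unit (no clusters).

```sas
/* Count the observations in each stratum of Sample.       */
/* StratumCounts is a hypothetical output data set name.   */
proc freq data=Sample noprint;
   tables Stratum / out=StratumCounts;
run;

/* List the strata that have exactly one observation.      */
/* For the data above, these are Stratum=14 and Stratum=33. */
proc print data=StratumCounts noobs;
   where Count = 1;
   var Stratum Count;
run;
```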
To input the finite population correction information,
you create the SAS data set StratumTotal.
data StratumTotal;
   input Stratum _TOTAL_;
   datalines;
10 10
11 20
12 32
33 40
33 45
14 50
15 .
66 70
;
The variable Stratum is the stratification
variable, and the variable _TOTAL_ contains the
stratum totals. The data set StratumTotal contains
more strata than the data set Sample. Also, the
data set StratumTotal contains more than one
observation that gives a stratum total for
Stratum=33:
33 40
33 45
PROC SURVEYREG allows this type of input. The procedure
simply ignores the strata that are not present in the
data set Sample; for a stratum with multiple
entries, the procedure uses the first observation.
In this example, Stratum=33 has the stratum
total _TOTAL_=40.
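To inspect the totals that take effect under the first-observation rule, you can mimic the rule in a DATA step. The following statements only sketch the rule as described above; they are not the procedure's internal logic, and the data set names TotalsSorted and EffectiveTotals are hypothetical.

```sas
/* The EQUALS option keeps the original relative order of   */
/* observations within each BY group, so the first entry    */
/* for a stratum is still first after sorting.              */
proc sort data=StratumTotal out=TotalsSorted equals;
   by Stratum;
run;

/* Keep only the first observation for each stratum. */
data EffectiveTotals;
   set TotalsSorted;
   by Stratum;
   if first.Stratum;
run;

/* Stratum=33 should appear once, with _TOTAL_=40. */
proc print data=EffectiveTotals noobs;
run;
```

The resulting data set still contains Stratum=15 and Stratum=66; as noted above, the procedure ignores strata that do not appear in the data set Sample.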
The following SAS statements perform the regression
analysis.
title1 'Stratified Sample with Single Sampling Unit in Strata';
title2 'With Stratum Collapse';
proc surveyreg data=Sample total=StratumTotal;
   strata Stratum / list;
   model Y = X;
run;
Output 62.6.1: Summary of Data and Regression

Stratified Sample with Single Sampling Unit in Strata
With Stratum Collapse
The SURVEYREG Procedure
Regression Analysis for Dependent Variable Y

Data Summary
Number of Observations   8
Mean of Y                2.75000
Sum of Y                 22.00000

Design Summary
Number of Strata             5
Number of Strata Collapsed   2

Fit Statistics
R-square         0.9555
Root MSE         0.5129
Denominator DF   4

Output 62.6.1 shows that the input data set contains
five strata and that two of them are collapsed into a
pooled stratum. Because of the collapse, the denominator
degrees of freedom is 4 (see the section "Denominator Degrees of Freedom").
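The value 4 is consistent with subtracting the number of strata used in the analysis from the number of sampling units. The following DATA step sketches that arithmetic. The subtraction rule is an assumption stated here for illustration (the referenced section gives the actual definition), and the variable names are hypothetical.

```sas
data _null_;
   n_obs       = 8;   /* Number of Observations (Data Summary)   */
   n_strata    = 5;   /* Number of Strata (Design Summary)       */
   n_collapsed = 2;   /* Number of Strata Collapsed              */

   /* The collapsed strata are replaced by one pooled stratum. */
   n_strata_used = n_strata - n_collapsed + 1;

   /* Assumed rule: sampling units minus strata used. */
   ddf = n_obs - n_strata_used;

   /* Writes n_strata_used=4 ddf=4 to the log. */
   put n_strata_used= ddf=;
run;
```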
Output 62.6.2: Stratification Information

Stratified Sample with Single Sampling Unit in Strata
With Stratum Collapse
The SURVEYREG Procedure
Regression Analysis for Dependent Variable Y

Stratum Information
Stratum Index   Collapsed   Stratum   N Obs   Population Total   Sampling Rate
1                           10        2       10                 0.20
2                           11        2       20                 0.10
3                           12        2       32                 0.06
4               Yes         14        1       50                 0.02
5               Yes         33        1       40                 0.03
0               Pooled                2       90                 0.02
NOTE: Strata with only one observation are collapsed into the stratum with Stratum Index "0".

Output 62.6.2 displays the stratification information,
including stratum collapse. Under the column
Collapsed, the fourth stratum (Stratum Index=4)
and the fifth stratum (Stratum Index=5) are marked as
"Yes," which indicates that these two strata are
collapsed into the pooled stratum (Stratum Index=0). The
sampling rate for the pooled stratum is 2%, which is
computed from the combined fourth and fifth strata (see
the section "Sampling Rate of the Pooled Stratum from Collapse").
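The pooled row of Output 62.6.2 is consistent with adding the observation counts and the population totals of the two collapsed strata and then dividing. The following DATA step is a sketch of that arithmetic, not a statement of the procedure's exact formula; the variable names are hypothetical.

```sas
data _null_;
   /* Stratum 14: 1 observation, population total 50 */
   /* Stratum 33: 1 observation, population total 40 */
   pooled_n     = 1 + 1;
   pooled_total = 50 + 40;

   /* 2/90 is about 0.0222, displayed as 0.02 in the output. */
   pooled_rate  = pooled_n / pooled_total;

   put pooled_n= pooled_total= pooled_rate= 6.4;
run;
```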
Output 62.6.3: Parameter Estimates and Effect Tests

Stratified Sample with Single Sampling Unit in Strata
With Stratum Collapse
The SURVEYREG Procedure
Regression Analysis for Dependent Variable Y

Tests of Model Effects
Effect      Num DF   F Value   Pr > F
Model       1        155.62    0.0002
Intercept   1        0.24      0.6503
X           1        155.62    0.0002
NOTE: The denominator degrees of freedom for the F tests is 4.

Estimated Regression Coefficients
Parameter   Estimate     Standard Error   t Value   Pr > |t|
Intercept   0.13004484   0.26578532       0.49      0.6503
X           1.10313901   0.08842825       12.47     0.0002
NOTE: The denominator degrees of freedom for the t tests is 4.

Output 62.6.3 displays the parameter estimates and the
tests of the significance of the model effects.
Alternatively, if you prefer not to collapse the strata
that have a single sampling unit, you can specify the
NOCOLLAPSE option in the STRATA statement.
title1 'Stratified Sample with Single Sampling Unit in Strata';
title2 'Without Stratum Collapse';
proc surveyreg data=Sample total=StratumTotal;
   strata Stratum / list nocollapse;
   model Y = X;
run;
Output 62.6.4: Summary of Data and Regression

Stratified Sample with Single Sampling Unit in Strata
Without Stratum Collapse
The SURVEYREG Procedure
Regression Analysis for Dependent Variable Y

Data Summary
Number of Observations   8
Mean of Y                2.75000
Sum of Y                 22.00000

Design Summary
Number of Strata   5

Fit Statistics
R-square         0.9555
Root MSE         0.5129
Denominator DF   3

Unlike Output 62.6.1, Output 62.6.4 contains no stratum
collapse information. The denominator degrees of freedom
is 3 instead of the 4 shown in Output 62.6.1.
Output 62.6.5: Stratification Information

Stratified Sample with Single Sampling Unit in Strata
Without Stratum Collapse
The SURVEYREG Procedure
Regression Analysis for Dependent Variable Y

Stratum Information
Stratum Index   Stratum   N Obs   Population Total   Sampling Rate
1               10        2       10                 0.20
2               11        2       20                 0.10
3               12        2       32                 0.06
4               14        1       50                 0.02
5               33        1       40                 0.03

In Output 62.6.5, although the fourth stratum and the fifth
stratum each contain only one observation, no stratum
collapse occurs, unlike in Output 62.6.2.
Output 62.6.6: Parameter Estimates and Effect Tests

Stratified Sample with Single Sampling Unit in Strata
Without Stratum Collapse
The SURVEYREG Procedure
Regression Analysis for Dependent Variable Y

Tests of Model Effects
Effect      Num DF   F Value   Pr > F
Model       1        391.94    0.0003
Intercept   1        0.25      0.6508
X           1        391.94    0.0003
NOTE: The denominator degrees of freedom for the F tests is 3.

Estimated Regression Coefficients
Parameter   Estimate     Standard Error   t Value   Pr > |t|
Intercept   0.13004484   0.25957741       0.50      0.6508
X           1.10313901   0.05572135       19.80     0.0003
NOTE: The denominator degrees of freedom for the t tests is 3.

The parameter estimates are the same as those in
Output 62.6.3. However, because the strata are not
collapsed, the standard error estimates of the parameters
differ from those in Output 62.6.3, and so do the tests of
the significance of the model effects.
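To see how the change in standard error and denominator degrees of freedom carries through to the tests for X, you can recompute the statistics from the displayed values. The following statements are a sketch that uses the estimates and standard errors copied from Output 62.6.3 and Output 62.6.6; the data set name Check and its variable names are hypothetical.

```sas
data Check;
   length Design $12;
   input Design $ Estimate StdErr DDF;
   t = Estimate / StdErr;              /* t statistic for X            */
   F = t**2;                           /* 1-DF F statistic equals t**2 */
   p = 2 * (1 - probt(abs(t), DDF));   /* two-sided p-value            */
   datalines;
Collapse   1.10313901 0.08842825 4
NoCollapse 1.10313901 0.05572135 3
;

proc print data=Check noobs;
   format t 8.2 F 8.2 p 8.4;
run;
```

The computed t values are about 12.47 and 19.80, and the F values are about 155.62 and 391.94, which match the displayed outputs. The p-values should agree with the Pr > |t| column up to rounding.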
Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.