The SURVEYSELECT Procedure
Output Data Set
PROC SURVEYSELECT creates a SAS data set that
contains the sample of selected units. You can
specify the name of this output data set with the
OUT= option in the PROC SURVEYSELECT statement.
If you omit the OUT= option, the data set is named
DATAn, where n is the smallest integer
that makes the name unique.
The output data set contains an observation for each
unit selected for the sample.
If you specify the OUTHITS option for a method that
can select the same unit more than once (that is, a method
that selects with replacement or with minimum replacement),
the output data set contains a separate observation for
each selection. If you do not specify the OUTHITS option,
the output data set contains only one observation for each
selected unit, even if the unit is selected more than once;
the variable NumberHits contains the number of hits
(selections) for that unit.
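The difference between the two layouts can be sketched in
Python. This is only an illustration of the row structure
described above, not SAS syntax and not the procedure's
implementation. The function name output_rows, the field
name UnitID, and the input list are hypothetical; the
sketch does not model the order of observations or any
other output variables.

```python
from collections import Counter

def output_rows(selections, outhits=False):
    """Mimic the two output layouts for a with-replacement sample.

    selections: unit IDs in selection order, repeats allowed
                (hypothetical input, not a SAS data set).
    outhits:    True  -> one row per selection (as with OUTHITS);
                False -> one row per distinct unit plus NumberHits.
    """
    if outhits:
        # One observation for every hit, so repeated units repeat.
        return [{"UnitID": unit} for unit in selections]
    # Counter keeps first-seen order (dict ordering, Python 3.7+).
    counts = Counter(selections)
    return [{"UnitID": unit, "NumberHits": hits}
            for unit, hits in counts.items()]

# Unit "A" is hit three times in this made-up sample.
hits = ["A", "C", "A", "B", "A"]
per_selection = output_rows(hits, outhits=True)   # 5 rows
per_unit = output_rows(hits, outhits=False)       # 3 rows
```

In this made-up sample, the per-unit layout has three rows
and the NumberHits values sum to five, the number of
selections.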
The output data set
contains design information and selection statistics,
depending on the selection method and output options you
specify. The output data set can include the following
variables:
- STRATA variables
- Replicate, which is the sample replicate number. This
variable is included when you request replicated sampling with
the REP= option.
- ID variables
- CONTROL variables
- Zone, which is the selection zone. This variable is included
for METHOD=PPS_SEQ.
- SIZE variable
- AdjustedSize, which is the adjusted size measure.
This variable is included if you request adjusted sizes with
the MINSIZE= option or the MAXSIZE= option.
- Certain, which indicates certainty selection.
This variable is included if you specify the CERTSIZE= option.
It equals 1 for units included with certainty because their
size measures exceed the certainty size measure. Otherwise,
it equals 0.
- NumberHits, which is the number of hits or selections.
This variable is included for selection methods that are with
replacement or with minimum replacement (METHOD=URS,
METHOD=PPS_WR, METHOD=PPS_SYS, and METHOD=PPS_SEQ).
The output data set includes the following variables if
you request a PPS selection method or if you specify the
STATS option for other methods:
- ExpectedHits, which is the expected number of hits
or selections. This variable is included for selection methods that
are with replacement or with minimum replacement (METHOD=URS,
METHOD=PPS_WR, METHOD=PPS_SYS, and METHOD=PPS_SEQ).
- SelectionProb, which is the probability of selection.
This variable is included for selection methods that are without
replacement.
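- SamplingWeight, which is the sampling weight.
This variable equals the inverse of ExpectedHits for
methods that select with replacement or with minimum
replacement, and the inverse of SelectionProb for methods
that select without replacement.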
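As a small arithmetic illustration of the SamplingWeight
definition, the following Python sketch inverts a selection
probability or an expected number of hits. The function
name sampling_weight and the input values are hypothetical,
and the check for nonpositive input is an added assumption
rather than documented behavior.

```python
def sampling_weight(prob_or_hits):
    """Return the inverse of SelectionProb or ExpectedHits.

    prob_or_hits: SelectionProb for a without-replacement method,
                  or ExpectedHits for a with-replacement or
                  minimum-replacement method.
    """
    if prob_or_hits <= 0:
        # Assumption: a selected unit always has a positive value.
        raise ValueError("value must be positive")
    return 1.0 / prob_or_hits

weight_from_prob = sampling_weight(0.25)  # SelectionProb = 0.25
weight_from_hits = sampling_weight(0.5)   # ExpectedHits = 0.5
```

With these inputs, a unit whose selection probability is
0.25 gets a weight of 4, and a unit with 0.5 expected hits
gets a weight of 2.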
For METHOD=PPS_BREWER and METHOD=PPS_MURTHY, which select
two units from each stratum with probability proportional to
size, the output data set contains the following variable:
- JtSelectionProb, which is the joint probability
of selection for the two units selected from the stratum.
If you request the JTPROBS option to compute joint probabilities
of selection for METHOD=PPS or METHOD=PPS_SAMPFORD,
then the output data set contains the following variables:
- Unit, which is an identification variable that numbers
the selected units sequentially within each stratum
- JtProb_1, JtProb_2, JtProb_3, ...,
where the variable
JtProb_1 contains the joint probability of selection
for the current unit and unit 1. Similarly, JtProb_2
contains the joint probability of selection for the current
unit and unit 2, and so on.
If you request the JTPROBS option for METHOD=PPS_WR,
then the output data set contains the following variables:
- Unit, which is an identification variable that numbers
the selected units sequentially within each stratum
- JtHits_1, JtHits_2, JtHits_3, ...,
where the variable
JtHits_1 contains the joint expected number of hits
for the current unit and unit 1. Similarly, JtHits_2
contains the joint expected number of hits for the current
unit and unit 2, and so on.
If you request the OUTSIZE option, the output data set contains
the following variables. If you specify a STRATA statement,
the output data set includes stratum-level values of these variables.
Otherwise, the output data set contains population-level values of
these variables.
- MinimumSize, which is the minimum size measure
specified with the MINSIZE= option. This variable
is included if you specify the MINSIZE= option.
- MaximumSize, which is the maximum size measure
specified with the MAXSIZE= option. This variable
is included if you specify the MAXSIZE= option.
- CertaintySize, which is the certainty size measure
specified with the CERTSIZE= option. This variable
is included if you specify the CERTSIZE= option.
- Total, which is the total number of sampling units
in the stratum. This variable
is included if there is no SIZE statement.
- TotalSize, which is the total of size measures in
the stratum. This variable
is included if there is a SIZE statement.
- TotalAdjSize, which is the total of adjusted size
measures in the stratum.
This variable is included if there is a SIZE
statement and if you request adjusted sizes with the MAXSIZE=
option or the MINSIZE= option.
- SamplingRate, which is the sampling rate. This variable is
included if you specify the SAMPRATE= option.
- SampleSize, which is the sample size. This variable is included
if you specify the SAMPSIZE= option, or if you specify
METHOD=PPS_BREWER or METHOD=PPS_MURTHY, which select two units
from each stratum.
Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.