The SURVEYMEANS Procedure
PROC SURVEYMEANS Statement
- PROC SURVEYMEANS < options > < statistic-keywords > ;
The PROC SURVEYMEANS statement invokes the procedure. In
this statement, you identify the data set to be analyzed
and specify sample design information. The DATA= option
names the input data set to be analyzed. If your
analysis includes a finite population correction factor,
you can input either the sampling rate or the population
total using the RATE= or TOTAL= option. If your design
is stratified, with different sampling rates or totals
for different strata, then you can input these stratum
rates or totals in a SAS data set containing the
stratification variables.
In the PROC SURVEYMEANS statement, you also can use
statistic-keywords to specify statistics for the
procedure to compute. Available statistics include the
population mean and population total, together with their
variance estimates and confidence limits. You can also
request data set summary information and sample design
information.
You can specify the following options in the PROC
SURVEYMEANS statement.
- ALPHA=α
- sets the confidence level for confidence limits. The
value of the ALPHA= option must be between 0.0001 and
0.9999, and the default value is 0.05. A confidence
level of α produces 100(1-α)%
confidence limits. The default of ALPHA=0.05 produces
95% confidence limits. If α is between 0 and 1
but outside the range of 0.0001 to 0.9999, the
procedure uses the closest range endpoint. For
example, if you specify ALPHA=0.000001, the
procedure uses 0.0001 to determine confidence limits.
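The clamping rule and the link between α and the confidence level can be illustrated outside SAS. The following is a minimal Python sketch, not PROC SURVEYMEANS code. The names `effective_alpha` and `confidence_percent` are hypothetical, and rejecting values outside the open interval (0, 1) is an assumption, since the text above does not say what the procedure does in that case.

```python
ALPHA_MIN = 0.0001  # lower endpoint of the documented range
ALPHA_MAX = 0.9999  # upper endpoint of the documented range


def effective_alpha(alpha: float) -> float:
    """Return the alpha actually used: the closest endpoint if out of range."""
    if not 0.0 < alpha < 1.0:
        # Assumption: behavior outside (0, 1) is not described in the text.
        raise ValueError("alpha must be strictly between 0 and 1")
    return min(max(alpha, ALPHA_MIN), ALPHA_MAX)


def confidence_percent(alpha: float) -> float:
    """Confidence level 100(1 - alpha)% implied by the effective alpha."""
    return 100.0 * (1.0 - effective_alpha(alpha))
```

For instance, `effective_alpha(0.000001)` returns the lower endpoint 0.0001, matching the ALPHA=0.000001 example above.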
- DATA=SAS-data-set
- specifies the SAS data set to be analyzed by PROC SURVEYMEANS. If
you omit the DATA= option, the procedure uses the most recently
created SAS data set.
- MISSING
- requests that the procedure treat missing values as a valid
category for categorical variables.
- ORDER=DATA | FORMATTED | INTERNAL
- specifies the order in which the values of the categorical
variables are reported. The ORDER= option applies to all
the categorical variables. The one exception arises with
ORDER=FORMATTED (the default): for numeric variables for
which you have supplied no explicit format (that is, for
which there is no corresponding FORMAT statement in
the current PROC SURVEYMEANS run or in the DATA step that
created the data set), the values are ordered by their
internal (numeric) value. PROC SURVEYMEANS interprets
the values of the ORDER= option as follows.
- DATA
- orders values according to their order in the input
data set.
- FORMATTED
- orders values by their formatted values. This order
is operating environment dependent. By default,
the order is ascending.
- INTERNAL
- orders values by their unformatted values, which yields
the same order that the SORT procedure does. This order
is operating environment dependent.
By default, ORDER=FORMATTED.
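The three orderings can be compared in a small Python sketch. It is an illustration only: the function `order_levels` and the dictionary standing in for a SAS format are hypothetical, and Python's default sort is used in place of the operating-environment-dependent collation that the procedure applies.

```python
def order_levels(values, order="FORMATTED", fmt=None):
    """Return the distinct levels of a categorical variable in ORDER= order.

    values: raw (internal) values in input data set order
    fmt:    optional dict mapping raw value -> formatted label
            (a stand-in for an explicit SAS format)
    """
    distinct = list(dict.fromkeys(values))  # first-appearance order
    if order == "DATA":
        return distinct
    if order == "INTERNAL":
        return sorted(distinct)
    if order == "FORMATTED":
        if fmt is None:
            # No explicit format: fall back to the internal value.
            return sorted(distinct)
        return sorted(distinct, key=lambda v: fmt[v])
    raise ValueError(f"unknown ORDER= value: {order!r}")


codes = [2, 3, 1, 3]
labels = {1: "Low", 2: "Medium", 3: "High"}
```

With these inputs, `order_levels(codes, "DATA")` keeps the input order `[2, 3, 1]`, `order_levels(codes, "FORMATTED", labels)` sorts by label and gives `[3, 1, 2]`, and `order_levels(codes, "INTERNAL")` gives `[1, 2, 3]`.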
- RATE=value | SAS-data-set
- R=value | SAS-data-set
- specifies the sampling rate as a positive value,
or names an input data set that contains the stratum
sampling rates. The procedure uses this information to
compute a finite population correction for variance
estimation. If your sample design has multiple stages,
you should specify the first-stage sampling rate,
which is the ratio of the number of PSUs selected to the
total number of PSUs in the population.
For a nonstratified sample design, or for a stratified
sample design with the same sampling rate in all
strata, you should specify a positive value
for the RATE= option. If your design is stratified with
different sampling rates in the strata, then you should
name a SAS data set that contains the stratification
variables and the sampling rates.
See the section "Specification of Population Totals and Sampling Rates" for details.
The sampling rate value must be a positive
number. You can specify value as a proportion
between 0 and 1, or in percentage form as a number
between 1 and 100, in which case PROC SURVEYMEANS
converts it to a proportion. The procedure treats the
value 1 as 100%, not as the percentage form 1%.
If you do not specify the TOTAL= option or the RATE=
option, then the variance estimation does not include
a finite population correction. You cannot specify
both the TOTAL= option and the RATE= option.
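The rule for reading a RATE= value as either a proportion or a percentage can be sketched in Python. This is an illustration of the rule as stated above, not the procedure's implementation. The name `rate_as_proportion` is hypothetical, and rejecting values above 100 is an assumption, since the text does not cover that case.

```python
def rate_as_proportion(value: float) -> float:
    """Interpret a RATE= value as a sampling proportion in (0, 1]."""
    if value <= 0:
        raise ValueError("the sampling rate must be positive")
    if value <= 1:
        # Proportion form; the value 1 means 100%, not 1%.
        return float(value)
    if value <= 100:
        # Percentage form: convert to a proportion.
        return value / 100.0
    # Assumption: values above 100 are treated as invalid here.
    raise ValueError("the sampling rate cannot exceed 100%")
```

Under this reading, `rate_as_proportion(25)` and `rate_as_proportion(0.25)` describe the same 25% sampling rate, while `rate_as_proportion(1)` is a 100% rate.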
- TOTAL=value | SAS-data-set
- N=value | SAS-data-set
- specifies the total number of primary sampling units
(PSUs) in the study population as a positive
value, or names an input data set that
contains the stratum population totals. The procedure
uses this information to compute a finite population
correction for variance estimation.
For a nonstratified sample design, or for a stratified
sample design with the same population total in all
strata, you should specify a positive value
for the TOTAL= option. If your sample design is stratified
with different population totals in the strata, then
you should name a SAS data set that contains the
stratification variables and the population totals.
See the section "Specification of Population Totals and Sampling Rates" for details.
If you do not specify the TOTAL= option or the RATE=
option, then the variance estimation does not include
a finite population correction. You cannot specify
both the TOTAL= option and the RATE= option.
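The way TOTAL= and RATE= feed a finite population correction can be sketched as follows. This Python example is a rough illustration under stated assumptions: it uses the standard textbook correction factor 1 - f, where f is the first-stage sampling fraction, whereas the exact formula that the procedure applies is given in the section referenced above. The function names are hypothetical, and `rate` is assumed to be already expressed as a proportion.

```python
def sampling_fraction(n_sampled_psus, total=None, rate=None):
    """Return the first-stage sampling fraction f, or None if unspecified."""
    if total is not None and rate is not None:
        raise ValueError("specify TOTAL= or RATE=, not both")
    if total is not None:
        if total <= 0:
            raise ValueError("the population total must be positive")
        # Ratio of PSUs selected to PSUs in the population.
        return n_sampled_psus / total
    if rate is not None:
        return rate
    return None


def finite_population_correction(n_sampled_psus, total=None, rate=None):
    """Textbook correction factor 1 - f; 1.0 means no correction."""
    f = sampling_fraction(n_sampled_psus, total=total, rate=rate)
    return 1.0 if f is None else 1.0 - f
```

For a sample of 40 PSUs, `total=4000` and `rate=0.01` describe the same design and yield the same factor, and omitting both leaves the variance estimate uncorrected.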
- statistic-keywords
- specifies the statistics for the procedure to compute.
If you do not specify any statistic-keywords, PROC
SURVEYMEANS computes the NOBS, MEAN, STDERR, and CLM
statistics by default.
PROC SURVEYMEANS performs univariate analysis, analyzing
each variable separately. Thus the number of nonmissing
and missing observations may not be the same for all
analysis variables.
See the section "Missing Values" for more information.
The statistics produced depend on the type of the
analysis variable. If you name a numeric variable in
the CLASS statement, then the procedure analyzes that
variable as a categorical variable. The procedure
always analyzes character variables as categorical.
See the section "CLASS Statement" for more information.
PROC SURVEYMEANS computes MIN, MAX, and RANGE for
numeric variables but not for categorical variables.
For numeric variables, the keyword MEAN produces the
mean, but for categorical variables it produces the
proportion in each category or level. Also for
categorical variables, the keyword NOBS produces the
number of observations for each variable level, and the
keyword NMISS produces the number of missing
observations for each level. If you request the keyword
NCLUSTER for a categorical variable, PROC SURVEYMEANS
displays for each level the number of clusters with
observations in that level. PROC SURVEYMEANS computes
SUMWGT the same for categorical and numeric variables,
as the sum of the weights over all nonmissing
observations.
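The per-level behavior of NOBS and MEAN for a categorical variable can be sketched in Python. This is an illustration only, under several assumptions: the name `level_summary` is hypothetical, missing values are represented by `None` and are excluded (as when the MISSING option is not specified), and the proportion is computed as the usual weighted share of each level.

```python
def level_summary(levels, weights):
    """Per-level NOBS and weighted proportion for a categorical variable."""
    # Keep only observations with a nonmissing level.
    pairs = [(lv, w) for lv, w in zip(levels, weights) if lv is not None]
    sumwgt = sum(w for _, w in pairs)  # weights over nonmissing observations

    counts, weight_sums = {}, {}
    for lv, w in pairs:
        counts[lv] = counts.get(lv, 0) + 1
        weight_sums[lv] = weight_sums.get(lv, 0.0) + w

    return {
        lv: {"NOBS": counts[lv], "MEAN": weight_sums[lv] / sumwgt}
        for lv in counts
    }


summary = level_summary(["A", "B", "A", None], [2, 1, 1, 5])
```

Here level A has two observations carrying 3 of the 4 nonmissing weight units, so its proportion is 0.75, and level B has one observation with proportion 0.25.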
The valid statistic-keywords are as follows:
- ALL
- all statistics listed
- CLM
- 100(1-α)% confidence limits for the
MEAN, where α is determined by the
ALPHA= option, and the default is α=0.05
- CLSUM
- 100(1-α)% confidence limits for
the SUM, where α is determined by the
ALPHA= option, and the default is α=0.05
- CV
- coefficient of variation
- DF
- degrees of freedom for the t test
- MAX
- maximum value
- MEAN
- mean for a numeric variable,
or the proportion in each category for a categorical
variable
- MIN
- minimum value
- NCLUSTER
- number of clusters
- NMISS
- number of missing observations
- NOBS
- number of nonmissing observations
- RANGE
- range, MAX-MIN
- STD
- standard deviation of the SUM. When you
request SUM, the procedure computes STD by
default.
- STDERR
- standard error of the MEAN. When you
request MEAN, the procedure computes STDERR by
default.
- SUM
- weighted sum, Σ w_i y_i, or estimated
population total when the appropriate
sampling weights are used
- SUMWGT
- sum of the weights, Σ w_i
- T
- t value for H0: population MEAN = 0,
and its two-tailed p-value with DF
degrees of freedom
- VAR
- variance of the MEAN
- VARSUM
- variance of the SUM
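Several of the point statistics listed above can be illustrated for a single numeric variable with a short Python sketch. It is not the procedure's implementation: the name `basic_statistics` is hypothetical, missing values are represented by `None`, and MEAN is computed as the weighted sum divided by the sum of the weights, which is the usual weighted mean and is an assumption here. Variance-based statistics (STDERR, VAR, CLM, and so on) depend on the sample design and are omitted.

```python
def basic_statistics(values, weights):
    """Point statistics for one numeric analysis variable."""
    # Use only observations whose analysis value is nonmissing.
    pairs = [(y, w) for y, w in zip(values, weights) if y is not None]
    ys = [y for y, _ in pairs]
    weighted_sum = sum(w * y for y, w in pairs)
    sumwgt = sum(w for _, w in pairs)
    return {
        "NOBS": len(pairs),
        "NMISS": len(values) - len(pairs),
        "SUM": weighted_sum,
        "SUMWGT": sumwgt,
        "MEAN": weighted_sum / sumwgt,  # assumed weighted-mean form
        "MIN": min(ys),
        "MAX": max(ys),
        "RANGE": max(ys) - min(ys),
    }


stats = basic_statistics([10, 20, None, 40], [1, 2, 3, 1])
```

In this example the third observation is missing, so it contributes to NMISS but not to SUM or SUMWGT.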
Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.