PROC SURVEYMEANS Statement

The SURVEYMEANS Procedure

PROC SURVEYMEANS < options > < statistic-keywords > ;

The PROC SURVEYMEANS statement invokes the procedure. In this statement, you identify the data set to be analyzed and specify sample design information. The DATA= option names the input data set to be analyzed. If your analysis includes a finite population correction factor, you can input either the sampling rate or the population total using the RATE= or TOTAL= option. If your design is stratified, with different sampling rates or totals for different strata, then you can input these stratum rates or totals in a SAS data set containing the stratification variables.

In the PROC SURVEYMEANS statement, you also can use statistic-keywords to specify statistics for the procedure to compute. Available statistics include the population mean and population total, together with their variance estimates and confidence limits. You can also request data set summary information and sample design information.

You can specify the following options in the PROC SURVEYMEANS statement.

ALPHA= $\alpha$

sets the confidence level for confidence limits. The value of the ALPHA= option must be between 0.0001 and 0.9999, and the default value is 0.05. A value of $\alpha$ produces $100(1 - \alpha)\%$ confidence limits, so the default of ALPHA=0.05 produces 95% confidence limits. If $\alpha$ is between 0 and 1 but outside the range of 0.0001 to 0.9999, the procedure uses the closest range endpoint. For example, if you specify ALPHA=0.000001, the procedure uses 0.0001 to determine confidence limits.
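The endpoint rule for ALPHA= can be sketched in a few lines of Python. This is an illustration of the behavior described above, not SAS code; the function names are hypothetical, and raising an error for values outside the interval from 0 to 1 is an assumption, since this section does not say how such values are handled.

```python
LOWER, UPPER = 0.0001, 0.9999  # documented range for ALPHA=


def effective_alpha(alpha: float) -> float:
    """Return the alpha actually used, snapping to the closest range endpoint."""
    if not 0 < alpha < 1:
        # Assumption: values outside (0, 1) are rejected; the text does not cover them.
        raise ValueError("alpha must lie strictly between 0 and 1")
    return min(max(alpha, LOWER), UPPER)


def confidence_percent(alpha: float) -> float:
    """Return the confidence level 100(1 - alpha) as a percentage."""
    return 100 * (1 - effective_alpha(alpha))
```

For instance, `effective_alpha(0.000001)` returns 0.0001, matching the example in the option description.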

DATA=SAS-data-set

specifies the SAS data set to be analyzed by PROC SURVEYMEANS. If you omit the DATA= option, the procedure uses the most recently created SAS data set.

MISSING

requests that the procedure treat missing values as a valid category for categorical variables.

ORDER=DATA | FORMATTED | INTERNAL

specifies the order in which the values of the categorical variables are reported. The ORDER= option applies to all categorical variables. The one exception occurs with ORDER=FORMATTED (the default) for numeric variables for which you have supplied no explicit format (that is, for which there is no corresponding FORMAT statement in the current PROC SURVEYMEANS run or in the DATA step that created the data set). In that case, the values of those numeric categorical variables are ordered by their internal (numeric) value. PROC SURVEYMEANS interprets the values of the ORDER= option as follows.

DATA: orders values according to their order in the input data set.
FORMATTED: orders values by their formatted values. This order is operating environment dependent. By default, the order is ascending.
INTERNAL: orders values by their unformatted values, which yields the same order that the SORT procedure does. This order is operating environment dependent.

By default, ORDER=FORMATTED.
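The three ordering rules, including the exception for numeric variables that have no explicit format, can be mimicked with the following Python sketch. It is illustrative only: `order_levels` and its `fmt` argument (a stand-in for a SAS format) are hypothetical names, each internal value is assumed to map to a distinct formatted value, and Python's string comparison stands in for the operating-environment collating sequence.

```python
def order_levels(values, order="FORMATTED", fmt=None):
    """Return the distinct levels of a categorical variable in reporting order.

    fmt maps an internal value to its formatted value; None means that no
    explicit format was supplied.
    """
    levels = list(dict.fromkeys(values))  # distinct levels, first-appearance order
    if order == "DATA":
        return levels
    if order == "INTERNAL" or (order == "FORMATTED" and fmt is None):
        # With no explicit format, FORMATTED falls back to the internal value.
        return sorted(levels)
    if order == "FORMATTED":
        return sorted(levels, key=fmt)
    raise ValueError(f"unknown ORDER= value: {order}")


# Hypothetical data: codes 1, 2, 3 formatted as Low, Medium, High.
labels = {1: "Low", 2: "Medium", 3: "High"}
codes = [2, 3, 1, 2]
by_data = order_levels(codes, "DATA")
by_internal = order_levels(codes, "INTERNAL")
by_format = order_levels(codes, "FORMATTED", fmt=labels.get)
```

Here the formatted order sorts the labels alphabetically (High, Low, Medium), which differs from the internal numeric order.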

RATE=value | SAS-data-set

R=value | SAS-data-set

specifies the sampling rate as a positive value, or names an input data set that contains the stratum sampling rates. The procedure uses this information to compute a finite population correction for variance estimation. If your sample design has multiple stages, you should specify the first-stage sampling rate, which is the ratio of the number of PSUs selected to the total number of PSUs in the population.

For a nonstratified sample design, or for a stratified sample design with the same sampling rate in all strata, you should specify a positive value for the RATE= option. If your design is stratified with different sampling rates in the strata, then you should name a SAS data set that contains the stratification variables and the sampling rates. See the section "Specification of Population Totals and Sampling Rates" for details.

The sampling rate value must be a positive number. You can specify value as a proportion, a number greater than 0 and no greater than 1. Alternatively, you can specify value in percentage form as a number greater than 1 and no greater than 100, and PROC SURVEYMEANS converts that number to a proportion. The procedure treats the value 1 as 100%, not as the percentage form 1%.
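The interpretation of a RATE= value can be sketched as follows. This Python function is only an illustration of the rule described above; the name `sampling_fraction` is hypothetical, and rejecting values above 100 is an assumption, because this section does not describe that case.

```python
def sampling_fraction(value: float) -> float:
    """Convert a RATE= value to a proportion in (0, 1]."""
    if value <= 0:
        raise ValueError("the sampling rate must be positive")
    if value <= 1:
        return value  # already a proportion; 1 means 100%, not 1%
    if value <= 100:
        return value / 100  # percentage form
    # Assumption: values above 100 are treated as invalid.
    raise ValueError("the sampling rate cannot exceed 100%")
```

Under this rule, `sampling_fraction(25)` and `sampling_fraction(0.25)` describe the same design, while `sampling_fraction(1)` denotes a complete census of the PSUs.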

If you do not specify the TOTAL= option or the RATE= option, then the variance estimation does not include a finite population correction. You cannot specify both the TOTAL= option and the RATE= option.

TOTAL=value | SAS-data-set

N=value | SAS-data-set

specifies the total number of primary sampling units (PSUs) in the study population as a positive value, or names an input data set that contains the stratum population totals. The procedure uses this information to compute a finite population correction for variance estimation.

For a nonstratified sample design, or for a stratified sample design with the same population total in all strata, you should specify a positive value for the TOTAL= option. If your sample design is stratified with different population totals in the strata, then you should name a SAS data set that contains the stratification variables and the population totals. See the section "Specification of Population Totals and Sampling Rates" for details.

If you do not specify the TOTAL= option or the RATE= option, then the variance estimation does not include a finite population correction. You cannot specify both the TOTAL= option and the RATE= option.
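As a rough illustration of how either input can lead to a finite population correction, the Python sketch below assumes the usual factor $1 - f$, where $f$ is the first-stage sampling fraction. That formula is not given in this section, so treat it as an assumption; the function name and arguments are hypothetical, and `rate` is expected as a proportion.

```python
from typing import Optional


def finite_population_correction(
    n_sampled_psus: int,
    rate: Optional[float] = None,
    total: Optional[float] = None,
) -> float:
    """Return the assumed correction factor 1 - f for one stratum or design."""
    if rate is not None and total is not None:
        raise ValueError("specify RATE= or TOTAL=, not both")
    if rate is not None:
        return 1.0 - rate  # f supplied directly as a proportion
    if total is not None:
        return 1.0 - n_sampled_psus / total  # f = sampled PSUs / population PSUs
    return 1.0  # neither option given: no correction is applied
```

With 100 sampled PSUs, `rate=0.25` and `total=400` describe the same sampling fraction and yield the same factor.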

statistic-keywords

specifies the statistics for the procedure to compute. If you do not specify any statistic-keywords, PROC SURVEYMEANS computes the NOBS, MEAN, STDERR, and CLM statistics by default.

PROC SURVEYMEANS performs univariate analysis, analyzing each variable separately. Thus the number of nonmissing and missing observations may not be the same for all analysis variables. See the section "Missing Values" for more information.

The statistics produced depend on the type of the analysis variable. If you name a numeric variable in the CLASS statement, then the procedure analyzes that variable as a categorical variable. The procedure always analyzes character variables as categorical. See the section "CLASS Statement" for more information.

PROC SURVEYMEANS computes MIN, MAX, and RANGE for numeric variables but not for categorical variables. For numeric variables, the keyword MEAN produces the mean, but for categorical variables it produces the proportion in each category or level. Also for categorical variables, the keyword NOBS produces the number of observations for each variable level, and the keyword NMISS produces the number of missing observations for each level. If you request the keyword NCLUSTER for a categorical variable, PROC SURVEYMEANS displays for each level the number of clusters with observations in that level. PROC SURVEYMEANS computes SUMWGT the same for categorical and numeric variables, as the sum of the weights over all nonmissing observations.

The valid statistic-keywords are as follows:

ALL: all statistics listed
CLM: $100(1 - \alpha)\%$ confidence limits for the MEAN, where $\alpha$ is determined by the ALPHA= option, and the default is $\alpha = 0.05$
CLSUM: $100(1 - \alpha)\%$ confidence limits for the SUM, where $\alpha$ is determined by the ALPHA= option, and the default is $\alpha = 0.05$
CV: coefficient of variation
DF: degrees of freedom for the t test
MAX: maximum value
MEAN: mean for a numeric variable, or the proportion in each category for a categorical variable
MIN: minimum value
NCLUSTER: number of clusters
NMISS: number of missing observations
NOBS: number of nonmissing observations
RANGE: range, MAX-MIN
STD: standard deviation of the SUM. When you request SUM, the procedure computes STD by default.
STDERR: standard error of the MEAN. When you request MEAN, the procedure computes STDERR by default.
SUM: weighted sum, $\sum w_i y_i$, or estimated population total when the appropriate sampling weights are used
SUMWGT: sum of the weights, $\sum{w_i}$
T: t value for $H_0$: population MEAN = 0, and its two-tailed p-value with DF degrees of freedom
VAR: variance of the MEAN
VARSUM: variance of the SUM
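Several of the point statistics in this list reduce to simple arithmetic on the values $y_i$ and weights $w_i$. The Python sketch below illustrates this for one numeric variable, using hypothetical data and a hypothetical function name. Computing MEAN as SUM divided by SUMWGT is an assumption (the standard weighted mean), since the formula is not stated in this section, and the variance-based statistics (STDERR, STD, VAR, VARSUM, CLM, CLSUM, T) are omitted because they depend on the sample design.

```python
def point_statistics(y, w):
    """Compute design-independent statistics for one numeric analysis variable.

    y may contain None for missing values; w holds the sampling weights.
    """
    pairs = [(yi, wi) for yi, wi in zip(y, w) if yi is not None]
    values = [yi for yi, _ in pairs]
    sumwgt = sum(wi for _, wi in pairs)      # SUMWGT over nonmissing observations
    total = sum(wi * yi for yi, wi in pairs)  # SUM: weighted sum of w_i * y_i
    return {
        "NOBS": len(pairs),
        "NMISS": len(y) - len(pairs),
        "SUMWGT": sumwgt,
        "SUM": total,
        "MEAN": total / sumwgt,  # assumed weighted-mean form
        "MIN": min(values),
        "MAX": max(values),
        "RANGE": max(values) - min(values),
    }


# Hypothetical sample: four observations, one with a missing value.
stats = point_statistics([10, 20, None, 40], [2, 1, 5, 1])
```

The weight of the missing observation does not contribute to SUMWGT, in line with the description of SUMWGT as the sum of the weights over all nonmissing observations.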
