The MEANS Procedure |
See also: | The SUMMARY Procedure |
PROC MEANS <option(s)> <statistic-keyword(s)>; |
To do this | Use this option | |
---|---|---|
Specify the input data set | DATA= | |
Disable floating point exception recovery | NOTRAP | |
Specify the amount of memory to use for data summarization with class variables | SUMSIZE= | |
Control the classification levels | ||
Specify a secondary data set that contains the combinations of class variables to analyze | CLASSDATA= | |
Create all possible combinations of class variable values | COMPLETETYPES | |
Exclude from the analysis all combinations of class variable values that are not in the CLASSDATA= data set | EXCLUSIVE | |
Use missing values as valid values to create combinations of class variables | MISSING | |
Control the statistical analysis | ||
Specify the confidence level for the confidence limits | ALPHA= | |
Exclude observations with nonpositive weights from the analysis | EXCLNPWGTS | |
Specify the sample size to use for the P2 quantile estimation method | QMARKERS= | |
Specify the quantile estimation method | QMETHOD= | |
Specify the mathematical definition used to compute quantiles | QNTLDEF= | |
Select the statistics | statistic-keyword | |
Specify the variance divisor | VARDEF= | |
Control the output | ||
Specify the field width for the statistics | FW= | |
Specify the number of decimal places for the statistics | MAXDEC= | |
Suppress reporting the total number of observations for each unique combination of the class variables | NONOBS | |
Suppress all displayed output | NOPRINT | |
Order the values of the class variables according to the specified order | ORDER= | |
Display the output | ||
Display the analysis for all requested combinations of class variables | PRINTALLTYPES | |
Display the values of the ID variables | PRINTIDVARS | |
Control the output data set | ||
Specify that the _TYPE_ variable contain character values | CHARTYPE | |
Order the output data set by descending _TYPE_ value | DESCENDTYPES | |
Select ID variables based on minimum values | IDMIN | |
Limit the output statistics to the observations with the highest _TYPE_ value | NWAY | |
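As a minimal sketch of how the options in the table combine on the PROC MEANS statement, the step below requests the default statistics explicitly and limits the display to two decimal places. It assumes the SASHELP.CLASS sample data set is installed; the choice of analysis variables is only illustrative.

```sas
/* Assumption: the SASHELP.CLASS sample data set is available. */
proc means data=sashelp.class n mean std min max maxdec=2;
   var height weight;   /* analysis variables chosen for illustration */
run;
```

Options and statistic keywords can appear in any order on the PROC MEANS statement.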
Options |
Default: | .05 |
Range: | between 0 and 1 |
Interaction: | To compute confidence limits, specify the statistic-keyword CLM, LCLM, or UCLM. |
See also: | Confidence Limits |
Featured in: | Computing a Confidence Limit for the Mean |
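A short sketch of the confidence-level option follows. It assumes the SASHELP.CLASS sample data set; setting ALPHA=0.10 requests 90% two-sided confidence limits for the mean through the CLM keyword.

```sas
/* Assumption: the SASHELP.CLASS sample data set is available. */
/* ALPHA=0.10 gives 90% confidence limits instead of the default 95%. */
proc means data=sashelp.class alpha=0.10 mean clm maxdec=2;
   var height;
run;
```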
Main discussion: | Output Data Set |
Interaction: | When you specify more than 32 class variables, _TYPE_ automatically becomes a character variable. |
Featured in: | Computing Output Statistics with Missing Class Variable Values |
Restriction: | The CLASSDATA= data set must contain all class variables. Their data type and format must match the corresponding class variables in the input data set. |
Interaction: | If you use the EXCLUSIVE option, PROC MEANS excludes any observation in the input data set whose combination of class variables is not in the CLASSDATA= data set. |
Tip: | Use the CLASSDATA= data set to filter or to supplement the input data set. |
Featured in: | Using a CLASSDATA= Data Set with Class Variables |
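The filtering behavior described above can be sketched as follows. The data set name WORK.LEVELS is hypothetical, and the example assumes the SASHELP.CLASS sample data set, in which Sex is a one-character variable. Because EXCLUSIVE is specified, only observations whose Sex value appears in WORK.LEVELS should be analyzed.

```sas
/* Hypothetical secondary data set that lists the class levels to keep. */
/* The type and length of Sex match the variable in SASHELP.CLASS.      */
data work.levels;
   length Sex $ 1;
   Sex = 'F';
   output;
run;

/* EXCLUSIVE drops input observations whose Sex value is not in WORK.LEVELS. */
proc means data=sashelp.class classdata=work.levels exclusive n mean;
   class Sex;
   var height;
run;
```

Omitting EXCLUSIVE would instead supplement the input levels with those in the CLASSDATA= data set.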
Interaction: | The PRELOADFMT option in the CLASS statement ensures that PROC MEANS outputs all user-defined format ranges or values for the combinations of class variables, even when a frequency is zero. |
Tip: | Using COMPLETETYPES does not increase the memory requirements. |
Featured in: | Using Preloaded Formats with Class Variables |
Main discussion: | Input Data Sets |
Alias: | DESCENDING | DESCEND |
Interaction: | DESCENDTYPES has no effect if you specify NWAY. |
Tip: | Use DESCENDTYPES to make the overall total (_TYPE_=0) the last observation in each BY group. |
See also: | Output Data Set |
Featured in: | Computing Different Output Statistics for Several Variables |
Alias: | EXCLNPWGT |
See also: | WEIGHT= and WEIGHT Statement |
Requirement: | If a CLASSDATA= data set is not specified, this option is ignored. |
Featured in: | Using a CLASSDATA= Data Set with Class Variables |
Default: | 12 |
Tip: | If PROC MEANS truncates column labels in the output, increase the field width. |
Featured in: | Computing Specific Descriptive Statistics, Using a CLASSDATA= Data Set with Class Variables, and Using Multi-label Value Formats with Class Variables |
Interaction: | Specify PRINTIDVARS to display the value of the ID variables in the output. |
See: | ID Statement |
Default: | BEST. width for columnar format, typically about 7. (This does not apply to the PROBT statistic. The SAS system option PROBSIG= determines its format. See SAS system options in SAS Language Reference: Concepts for details.) |
Range: | 0-8 |
Featured in: | Computing Descriptive Statistics with Class Variables and Using a CLASSDATA= Data Set with Class Variables |
Default: | If you omit MISSING, PROC MEANS excludes the observations with a missing class variable value from the analysis. |
See also: | SAS Language Reference: Concepts for a discussion of missing values that have special meaning. |
Featured in: | Using Preloaded Formats with Class Variables |
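To illustrate the effect of MISSING, the sketch below builds a small hypothetical data set (WORK.SCORES, with made-up values) in which one observation has a missing class value. With MISSING, that observation should form its own class level; without it, the observation is excluded from the analysis.

```sas
/* Hypothetical data: the third observation has a missing Group value. */
data work.scores;
   input Group $ Score;
   datalines;
A 10
A 12
. 15
B 20
;
run;

/* MISSING treats the missing Group value as a valid class level. */
proc means data=work.scores missing n mean;
   class Group;
   var Score;
run;
```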
See also: | The N Obs Statistic |
Featured in: | Using Multi-label Value Formats with Class Variables and Using Preloaded Formats with Class Variables |
In operating environments where the overhead of FPE recovery is significant, NOTRAP can improve performance. Note that normal SAS System FPE handling is still in effect so that PROC MEANS terminates in the case of math exceptions.
Interaction: | If you specify a TYPES statement or a WAYS statement, PROC MEANS ignores this option. |
See also: | Output Data Set |
Featured in: | Computing Output Statistics with Missing Class Variable Values |
Interaction: | If you use PRELOADFMT in the CLASS statement, the order for the values of each class variable matches the order that PROC FORMAT uses to store the values of the associated user-defined format. If you use the CLASSDATA= option, PROC MEANS uses the order of the unique values of each class variable in the CLASSDATA= data set to order the output levels. If you use both options, PROC MEANS first uses the user-defined formats to order the output. If you omit EXCLUSIVE, PROC MEANS appends after the user-defined format and the CLASSDATA= values the unique values of the class variables in the input data set based on the order that they are encountered. |
Tip: | By default, PROC FORMAT stores a format definition in sorted order. Use the NOTSORTED option to store the values or ranges of a user-defined format in the order that you define them. |
Alias: | FMT | EXTERNAL |
Interaction: | For multiway combinations of the class variables, PROC MEANS determines the order of a class variable combination from the individual class variable frequencies. |
Interaction: | Use the ASCENDING option in the CLASS statement to order values by ascending frequency count. |
Alias: | UNFMT | INTERNAL |
Default: | UNFORMATTED |
See also: | Ordering the Class Values |
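As a hedged example of controlling class-level order, the step below uses ORDER=FREQ, which orders class levels by descending frequency count. It assumes the SASHELP.CLASS sample data set; Age is used as the class variable only for illustration.

```sas
/* Assumption: the SASHELP.CLASS sample data set is available. */
/* ORDER=FREQ lists the most frequent Age values first.         */
proc means data=sashelp.class order=freq n mean maxdec=1;
   class Age;
   var height;
run;
```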
Default: | |
Tip: | Use NOPRINT when you want to create only an OUT= output data set. |
Featured in: | Computing Output Statistics and Identifying the Top Three Extreme Values with the Output Statistics |
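A typical use of NOPRINT is to build an output data set without any displayed report. The sketch below assumes the SASHELP.CLASS sample data set; the output data set name WORK.STATS and the variable names avg_height and avg_weight are hypothetical.

```sas
/* NOPRINT suppresses the displayed report; NWAY keeps only the rows */
/* with the highest _TYPE_ value (one row per Sex level here).       */
proc means data=sashelp.class noprint nway;
   class Sex;
   var height weight;
   output out=work.stats mean=avg_height avg_weight;
run;

/* Inspect the resulting data set. */
proc print data=work.stats;
run;
```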
Alias: | PRINTALL |
Interaction: | If you use the NWAY option, the TYPES statement, or the WAYS statement, PROC MEANS ignores this option. |
Featured in: | Using a CLASSDATA= Data Set with Class Variables |
Alias: | PRINTIDS |
Interaction: | Specify IDMIN to display the minimum value of the ID variables. |
See: | ID Statement |
Default: | The default value depends on which quantiles you request. For the median (P50), number is 7. For the quartiles (P25 and P50), number is 25. For the quantiles P1, P5, P10, P90, P95, or P99, number is 105. If you request several quantiles, PROC MEANS uses the largest value of number. |
Range: | an odd integer greater than 3 |
Tip: | Increase the number of markers above the default settings to improve the accuracy of the estimate; reduce the number of markers to conserve memory and computing time. |
Main discussion: | Quantiles |
Note: This technique can be very memory-intensive.
Default: | OS |
Restriction: | When QMETHOD=P2, PROC MEANS does not compute weighted quantiles. |
Tip: | When QMETHOD=P2, reliable estimates of some quantiles (P1, P5, P95, P99) may not be possible for some data sets. |
Main discussion: | Quantiles |
Default: | 5 |
Alias: | PCTLDEF= |
Main discussion: | Calculating Percentiles |
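The quantile options can be combined as in the sketch below, which assumes the SASHELP.CLASS sample data set. It requests the P2 estimation method and raises the number of markers above the defaults; the value 211 is an arbitrary choice that satisfies the requirement of an odd integer greater than 3.

```sas
/* Assumption: the SASHELP.CLASS sample data set is available.        */
/* QMETHOD=P2 estimates quantiles with a fixed number of markers;     */
/* QMARKERS=211 is an illustrative value (odd integer greater than 3). */
proc means data=sashelp.class qmethod=p2 qmarkers=211 median p90;
   var height;
run;
```

On a data set this small the default QMETHOD=OS is the more natural choice; P2 is mainly of interest when order statistics would need too much memory.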
Descriptive statistic keywords | |
CLM | RANGE |
CSS | SKEWNESS|SKEW |
CV | STDDEV|STD |
KURTOSIS|KURT | STDERR |
LCLM | SUM |
MAX | SUMWGT |
MEAN | UCLM |
MIN | USS |
N | VAR |
NMISS | |
Quantile statistic keywords | |
MEDIAN|P50 | Q3|P75 |
P1 | P90 |
P5 | P95 |
P10 | P99 |
Q1|P25 | QRANGE |
Hypothesis testing keyword | |
PROBT | T |
Default: | N, MEAN, STD, MIN, and MAX |
Requirement: | To compute the standard error, confidence limits for the mean, and the Student's t test, you must use the default value of VARDEF=, which is DF. To compute skewness or kurtosis, you must use VARDEF=N or VARDEF=DF. |
Tip: | Use CLM or both LCLM and UCLM to compute a two-sided confidence limit for the mean. Use only LCLM or only UCLM to compute a one-sided confidence limit. |
Main discussion: | The definitions of the keywords and the formulas for the associated statistics are listed in Keywords and Formulas. |
Featured in: | Computing Specific Descriptive Statistics and Using the BY Statement with Class Variables |
Default: | The value of the SUMSIZE= system option. |
Tip: | For best results, do not make SUMSIZE= larger than the amount of physical memory that is available for the PROC step. If additional space is needed, PROC MEANS uses utility files. |
See also: | The SAS system option SUMSIZE= in SAS Language Reference: Dictionary. |
Main discussion: | Computational Resources |
Value | Divisor | Formula for Divisor |
---|---|---|
DF | degrees of freedom | n - 1 |
N | number of observations | n |
WDF | sum of weights minus one | $(\sum_i w_i) - 1$ |
WEIGHT|WGT | sum of weights | $\sum_i w_i$ |
Default: | DF |
Requirement: | To compute the standard error of the mean, confidence limits for the mean, or the Student's t-test, use the default value of VARDEF=. |
Tip: | When you use the WEIGHT statement and VARDEF=DF, the variance is an estimate of $\sigma^2$, where the variance of the ith observation is $\operatorname{var}(x_i) = \sigma^2 / w_i$ and $w_i$ is the weight for the ith observation. This yields an estimate of the variance of an observation with unit weight. |
Tip: | When you use the WEIGHT statement and VARDEF=WGT, the computed variance is asymptotically (for large n) an estimate of $\sigma^2 / \bar{w}$, where $\bar{w}$ is the average weight. This yields an asymptotic estimate of the variance of an observation with average weight. |
See also: | the example of weighted statistics |
Main discussion: | Keywords and Formulas |
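The effect of the divisor can be checked on a tiny hypothetical data set (WORK.THREE, with the made-up values 2, 4, and 6). The mean is 4 and the corrected sum of squares is 8, so the divisor n - 1 = 2 gives a variance of 4, while the divisor n = 3 gives 8/3, or about 2.667.

```sas
/* Hypothetical data: three observations with mean 4 and CSS 8. */
data work.three;
   input x;
   datalines;
2
4
6
;
run;

/* VARDEF=DF divides by n - 1 = 2, so the variance should be 4. */
proc means data=work.three vardef=df var;
   var x;
run;

/* VARDEF=N divides by n = 3, so the variance should be about 2.667. */
proc means data=work.three vardef=n var;
   var x;
run;
```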
Copyright 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.