The MEANS Procedure |
Tip: | You can use multiple OUTPUT statements to create several OUT= data sets. |
Featured in: | Computing Output Statistics , Computing Different Output Statistics for Several Variables , Computing Output Statistics with Missing Class Variable Values , Identifying an Extreme Value with the Output Statistics , and Identifying the Top Three Extreme Values with the Output Statistics |
OUTPUT
<OUT=SAS-data-set> <output-statistic-specification(s)>
<id-group-specification(s)> <maximum-id-specification(s)> <minimum-id-specification(s)> </ option(s)>; |
Options |
Default: | DATAn |
Tip: | You can use data set options with OUT=. |
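The sketch below shows OUT= with a data set option (DROP=) and two OUTPUT statements in a single PROC MEANS step. The data set SALES and the names MEANSTATS, SUMSTATS, AvgAmount, and TotalAmount are hypothetical. Without a CLASS statement, each OUT= data set is expected to hold one observation that summarizes the whole input.

```sas
/* Hypothetical sample data: five amounts, total 85, mean 17 */
data sales;
   input region $ amount;
   datalines;
East 10
East 30
West 5
West 15
West 25
;
run;

proc means data=sales noprint;
   var amount;
   /* DROP= is a data set option applied to the OUT= data set */
   output out=meanstats(drop=_type_ _freq_) mean=AvgAmount;
   /* A second OUTPUT statement creates a second OUT= data set in the same step */
   output out=sumstats(drop=_type_ _freq_) sum=TotalAmount;
run;
```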
statistic-keyword<(variable-list)>=<name(s)> |
Descriptive statistics keyword | |
CSS | RANGE |
CV | SKEWNESS|SKEW |
KURTOSIS|KURT | STDDEV|STD |
LCLM | STDERR |
MAX | SUM |
MEAN | SUMWGT |
MIN | UCLM |
N | USS |
NMISS | VAR |
Quantile statistics keyword | |
MEDIAN|P50 | Q3|P75 |
P1 | P90 |
P5 | P95 |
P10 | P99 |
Q1|P25 | QRANGE |
Hypothesis testing keyword | |
PROBT | T |
By default the statistics in the output data set automatically inherit the analysis variable's format, informat, and label. However, statistics computed for N, NMISS, SUMWGT, USS, CSS, VAR, CV, T, PROBT, SKEWNESS, and KURTOSIS will not inherit the analysis variable's format because this format may be invalid for these statistics (for example, dollar or datetime formats).
Restriction: | If you omit variable and name(s), PROC MEANS allows each statistic-keyword only once in a single OUTPUT statement, unless you also use the AUTONAME option. |
Featured in: | Computing Output Statistics , Computing Different Output Statistics for Several Variables , Identifying an Extreme Value with the Output Statistics , and Identifying the Top Three Extreme Values with the Output Statistics |
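As an illustration of format inheritance, the following sketch assigns DOLLAR10.2 to an analysis variable and requests MEAN and N. The names PAY, PAYSTATS, AvgPay, and Count are hypothetical. Following the rule stated for the statistic keywords, AvgPay is expected to carry the DOLLAR format while Count, an N statistic, is not.

```sas
/* Hypothetical data: AMOUNT carries a dollar format */
data pay;
   input amount;
   format amount dollar10.2;
   datalines;
100
250
;
run;

proc means data=pay noprint;
   var amount;
   /* AvgPay should inherit DOLLAR10.2; Count (N) should not inherit it */
   output out=paystats mean=AvgPay n=Count;
run;
```

VFORMATN can be used afterward to inspect which format each output variable received.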
Default: | all numeric analysis variables |
Default: | the analysis variable name. If you specify AUTONAME, the default is the combination of the analysis variable name and the statistic-keyword. |
Interaction: | If you specify variable-list, PROC MEANS stores the statistics in the output data set variables in the order in which you list the analysis variables. |
Featured in: | Computing Output Statistics |
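The sketch below shows how names pair with a variable-list: the names after the equal sign are matched to the variables in the order they appear inside the parentheses, not the order of the VAR statement. The data set SCORES and all new variable names are hypothetical.

```sas
/* Hypothetical data: max math = 90, max reading = 95, min math = 70 */
data scores;
   input student $ math reading;
   datalines;
A 80 70
B 90 60
C 70 95
;
run;

proc means data=scores noprint;
   var math reading;
   /* MaxRead receives MAX of reading and MaxMath receives MAX of math,
      following the order inside max(...) */
   output out=ordered max(reading math)=MaxRead MaxMath
                      min(math)=MinMath;
run;
```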
Default: | If you use the CLASS statement and an OUTPUT statement without an output-statistic-specification, the output data set contains five observations for each combination of class variables, holding the values of N, MIN, MAX, MEAN, and STD. If you use the WEIGHT statement or the WEIGHT option in the VAR statement, the output data set also contains an observation with the sum of weights (SUMWGT) for each combination of class variables. |
Tip: | Use the AUTONAME option to have PROC MEANS generate unique names for multiple variables and statistics. |
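A minimal sketch of the default output when no output-statistic-specification is given. The data set SALES and the name DEFSTATS are hypothetical. With one CLASS variable that has two levels, the output is expected to contain three groups (the overall group plus East and West), each with five observations identified by the _STAT_ variable.

```sas
/* Hypothetical data: East mean 20, West mean 15 */
data sales;
   input region $ amount;
   datalines;
East 10
East 30
West 5
West 15
West 25
;
run;

proc means data=sales noprint;
   class region;
   var amount;
   /* No statistics requested: expect _STAT_ rows N, MIN, MAX, MEAN, STD
      for the overall group and for each REGION level (3 x 5 = 15 rows) */
   output out=defstats;
run;
```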
IDGROUP (<MIN|MAX (variable-list-1) <...MIN|MAX
(variable-list-n)>> <<MISSING> <OBS>
<LAST>> OUT <[n]>
(id-variable-list)=<name(s)>) |
When you specify multiple selection variables, the ordering of observations for the selection of n extremes is done the same way that PROC SORT sorts data with multiple BY variables. PROC MEANS concatenates the variable values into a single key. The MAX(variable-list) selection criterion is similar to using PROC SORT and the DESCENDING option in the BY statement.
Default: | If you do not specify MIN or MAX, PROC MEANS uses the observation number as the selection criterion to output observations. |
Restriction: | If you specify contradictory criteria, PROC MEANS uses only the first selection criterion. |
Interaction: | When multiple observations contain the same extreme values in all the MIN or MAX variables, PROC MEANS uses the observation number to resolve which observation to output. By default, PROC MEANS outputs the first of the tied observations. However, if you specify the LAST option, PROC MEANS outputs the last of the tied observations. |
Interaction: | When you specify MIN or MAX and multiple observations contain the same extreme values, PROC MEANS uses the observation number to resolve which observation to output. If you specify LAST, PROC MEANS outputs the last of the tied observations. |
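The following sketch exercises the tie-breaking rule. Two runners share the minimum time, so the default IDGROUP is expected to select the first tied observation and the LAST suboption the last one. The data set RUNS and the names BestTime, FirstTie, and LastTie are hypothetical.

```sas
/* Hypothetical data: Bob (obs 2) and Cy (obs 3) tie at the minimum time 48 */
data runs;
   input runner $ time;
   datalines;
Ann 50
Bob 48
Cy 48
Dee 55
;
run;

proc means data=runs noprint;
   var time;
   output out=fastest min=BestTime
          /* Default tie resolution: the first tied observation (Bob) */
          idgroup(min(time) out(runner)=FirstTie)
          /* LAST resolves the tie with the last tied observation (Cy) */
          idgroup(min(time) last out(runner)=LastTie);
run;
```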
Alias: | MISS |
Interaction: | If you use WHERE processing, the value of _OBS_ may not correspond to the location of the observation in the input data set. |
Interaction: | If you use [n] to output multiple extreme values, PROC MEANS creates n _OBS_ variables and names them by appending a sequential integer suffix, from 1 to n. |
By default, PROC MEANS determines one extreme value for each level of each requested type. If n is greater than one, then n extremes are output for each level of each type. When n is greater than one and you request extreme value selection, the time complexity is $O(T \cdot N \log_2 n)$, where $T$ is the number of types requested and $N$ is the number of observations in the input data set. By comparison, to group the entire data set, the time complexity is $O(N \log_2 N)$.
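A minimal sketch of the OBS suboption: the _OBS_ variable is expected to record the input observation number of the selected extreme (observation 2 here, because no WHERE processing is applied). The data set READINGS and the names MaxValue and PeakSensor are hypothetical.

```sas
/* Hypothetical data: the maximum value 9 is in observation 2 */
data readings;
   input sensor $ value;
   datalines;
S1 4
S2 9
S3 6
;
run;

proc means data=readings noprint;
   var value;
   /* OBS adds _OBS_, the input observation number of the selected extreme */
   output out=peak max=MaxValue
          idgroup(max(value) obs out(sensor)=PeakSensor);
run;
```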
Default: | 1 |
Range: | an integer between 1 and 100 |
Example: | To output two minimum extreme values for each variable, use
idgroup(min(x) out[2](x y z)=MinX MinY MinZ);
The OUT= data set contains the variables MinX_1, MinX_2, MinY_1, MinY_2, MinZ_1, and MinZ_2. |
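For a runnable version with data, the sketch below keeps the two largest values of x together with an identifier. It assumes that, for MAX, the _1 suffix holds the largest value and _2 the next largest, following the MinX_1, MinX_2 naming pattern. The data set HEIGHTS and the names MaxX, TopX, and TopTag are hypothetical.

```sas
/* Hypothetical data: largest x is 9 (tag b), second largest is 7 (tag c) */
data heights;
   input tag $ x;
   datalines;
a 3
b 9
c 7
d 1
;
run;

proc means data=heights noprint;
   var x;
   /* out[2] requests two extremes; names get the suffixes _1 and _2 */
   output out=top2 max=MaxX
          idgroup(max(x) out[2](x tag)=TopX TopTag);
run;
```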
Default: | If you omit name, PROC MEANS uses the names of variables in the id-variable-list. |
Tip: | Use the AUTONAME option to automatically resolve naming conflicts. |
Alias: | IDGRP |
Requirement: | You must specify the MIN|MAX selection criteria first and OUT(id-variable-list)= after the suboptions MISSING, OBS, and LAST. |
Tip: | You can use an id-group-specification to mimic the behavior of the ID statement and a maximum-id-specification or minimum-id-specification in the OUTPUT statement. |
Tip: | When you want the output data set to contain extreme values along with other ID variables, it is more efficient to include them in the id-variable-list than to request separate statistics. For example, the statement
output idgrp(max(x) out(x a b)= );
is more efficient than the statement
output idgrp(max(x) out(a b)= ) max(x)=; |
Featured in: | Computing Output Statistics and Identifying the Top Three Extreme Values with the Output Statistics |
Note: If you specify fewer new variable names than there are combinations of analysis variables and identification variables, the remaining output variables use the names of the corresponding ID variables once PROC MEANS exhausts the list of new variable names.
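The sketch below combines the efficiency tip with the naming note: score is placed in the id-variable-list instead of requesting a separate MAX statistic, and only one new name is supplied for two ID variables. Following the note, score is expected to be stored as TopScore while name keeps its own name. The data set TEAM and the names AvgScore and TopScore are hypothetical.

```sas
/* Hypothetical data: the highest score is 94 (Kim) */
data team;
   input name $ score;
   datalines;
Lee 71
Kim 94
Roy 88
;
run;

proc means data=team noprint;
   var score;
   /* One new name for two ID variables: score -> TopScore,
      name keeps its original name once the name list is exhausted */
   output out=best mean=AvgScore
          idgroup(max(score) out(score name)=TopScore);
run;
```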
MAXID <(variable-1
<(id-variable-list-1)> <...variable-n
<(id-variable-list-n)>>)> = name(s) |
Tip: | If you use an ID statement and omit variable, PROC MEANS uses all analysis variables. |
Default: | the ID statement variables |
Tip: | If you use an ID statement, and omit variable and id-variable, PROC MEANS associates all ID statement variables with each analysis variable. Thus, for each analysis variable, the number of variables that are created in the output data set equals the number of variables that you specify in the ID statement. |
Tip: | Use the AUTONAME option to automatically resolve naming conflicts. |
Limitation: | If multiple observations contain the maximum value within a class level, PROC MEANS saves the value of the ID variable for only the first of those observations in the output data set. |
Featured in: | Identifying an Extreme Value with the Output Statistics |
Note: If you specify fewer new variable names than there are combinations of analysis variables and identification variables, the remaining output variables use the names of the corresponding ID variables once PROC MEANS exhausts the list of new variable names.
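A minimal MAXID sketch that also illustrates the limitation on ties: two students share the maximum score, and only the ID value from the first of those observations is expected in the output. The data set QUIZ and the names HighScore and BestStudent are hypothetical.

```sas
/* Hypothetical data: Bob (obs 2) and Cy (obs 3) tie at the maximum 95 */
data quiz;
   input student $ score;
   datalines;
Ann 88
Bob 95
Cy 95
Dee 70
;
run;

proc means data=quiz noprint;
   var score;
   /* MAXID keeps STUDENT from the observation with the largest score;
      with a tie, only the first such observation (Bob) is saved */
   output out=top max=HighScore maxid(score(student))=BestStudent;
run;
```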
MINID<(variable-1
<(id-variable-list-1)>
<...variable-n
<(id-variable-list-n)>>)> = name(s) |
Featured in: | Identifying the Top Three Extreme Values with the Output Statistics |
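The sketch below uses one MINID request for two analysis variables; the new names are assigned in the same order as the variables inside the parentheses. The data set SCORES and the names MinMath, MinRead, LowMath, and LowRead are hypothetical.

```sas
/* Hypothetical data: lowest math is 70 (C), lowest reading is 60 (B) */
data scores;
   input student $ math reading;
   datalines;
A 80 70
B 90 60
C 70 95
;
run;

proc means data=scores noprint;
   var math reading;
   /* MIN names follow the VAR order (math, reading);
      MINID names follow the order inside minid(...) */
   output out=lows min=MinMath MinRead
          minid(math(student) reading(student))=LowMath LowRead;
run;
```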
AUTONAME names each unnamed output statistic by combining the analysis variable name and the statistic-keyword. For example, the statement
output min(x)=/autoname;
produces the x_Min variable in the output data set.
AUTONAME also activates the SAS internal mechanism to automatically resolve conflicts in the variable names in the output data set, so duplicate variables do not generate errors. As a result, the statement
output min(x)= min(x)=/autoname;
produces two variables, x_Min and x_Min2, in the output data set.
Featured in: | Identifying the Top Three Extreme Values with the Output Statistics |
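A runnable sketch of AUTONAME with data. Based on the naming rule described for AUTONAME, the repeated MIN request is expected to produce x_Min2 rather than an error. The data set NUMS and the OUT= name AUTO are hypothetical.

```sas
/* Hypothetical data: min x = 2, max x = 8 */
data nums;
   input x;
   datalines;
4
2
8
;
run;

proc means data=nums noprint;
   var x;
   /* No explicit names: expect x_Min, x_Min2 (duplicate resolved), x_Max */
   output out=auto min(x)= min(x)= max(x)= / autoname;
run;
```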
Main discussion: | Output Data Set |
Featured in: | Computing Output Statistics |
Tip: | By default, the output data set contains one output variable for each analysis variable and five observations that contain N, MIN, MAX, MEAN, and STDDEV. Unless you specify NOINHERIT, each of these variables inherits the format of its analysis variable, which may be invalid for the N statistic (for example, a datetime format). |
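The sketch below requests the default output for a date-formatted variable with NOINHERIT, so that the N row is not displayed as a date. The data set VISITS and the OUT= name PLAIN are hypothetical; the check assumes that without inheritance the output variable VISIT no longer carries the DATE format.

```sas
/* Hypothetical data: two dates stored with the DATE9. format */
data visits;
   input visit :date9.;
   format visit date9.;
   datalines;
01JAN2020
15JAN2020
;
run;

proc means data=visits noprint;
   var visit;
   /* Default _STAT_ rows (N, MIN, MAX, MEAN, STD); NOINHERIT prevents
      VISIT in the output from inheriting DATE9. */
   output out=plain / noinherit;
run;
```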
Main discussion: | Output Data Set |
See also: | WAYS Statement |
Featured in: | Computing Output Statistics |
Copyright 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.