By default, PROC FREQ excludes missing values before it constructs the frequency
and crosstabulation tables. PROC FREQ also excludes missing values before
computing statistics. However, PROC FREQ displays the total frequency of observations
with missing values below each table. The following options in the TABLES
statement change how PROC FREQ handles missing values:
- MISSPRINT
- includes missing value frequencies in frequency or crosstabulation
tables.
- MISSING
- includes missing values in percentage and statistical calculations.
The OUT= option in the TABLES statement includes an observation in the
output data set that contains the frequency of missing values. The NMISS keyword
in the OUTPUT statement creates a variable in the output data set that contains
the number of missing values.
Missing Values in Frequency Tables shows three ways that PROC FREQ handles missing values. The first table uses
the default method; the second table uses MISSPRINT; and the third table uses
MISSING.
Missing Values in Frequency Tables
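The three methods can be compared with a short step such as the following. This is a minimal sketch, not the code behind the tables above: the data set WORK.PETS and the variable ANIMAL are hypothetical names chosen for illustration.

```sas
/* Hypothetical data: eight observations, two of them missing. */
/* In list input, a single period is read as a missing value.  */
data work.pets;
   input animal $ @@;
   datalines;
cat dog . cat dog dog . cat
;
run;

proc freq data=work.pets;
   /* Default: missing values are excluded; their total frequency */
   /* is reported below the table.                                */
   tables animal;

   /* MISSPRINT: the missing level appears in the table. */
   tables animal / missprint;

   /* MISSING: the missing level is also included in percentage */
   /* and statistical calculations.                              */
   tables animal / missing;
run;
```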
When a combination of variable values for a crosstabulation is missing,
PROC FREQ assigns zero to the frequency count for the table cell. By default,
PROC FREQ omits missing combinations in list format and in the output data
set that is created with a TABLES statement. To include the missing combinations,
use SPARSE with LIST or OUT= in the TABLES statement.
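As an illustration of SPARSE, the following sketch assumes a hypothetical data set WORK.SALES in which the combination REGION=West, PRODUCT=B never occurs. All data set and variable names are assumptions made for this example.

```sas
/* Hypothetical data: West*B does not occur in the input. */
data work.sales;
   input region $ product $ @@;
   datalines;
East A East B West A East A West A
;
run;

proc freq data=work.sales;
   /* List format that also reports the missing combination. */
   tables region*product / list sparse;

   /* Output data set that also contains the missing combination. */
   tables region*product / out=work.cells sparse noprint;
run;

/* Display the output data set to inspect the zero-count cell. */
proc print data=work.cells;
run;
```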
PROC FREQ treats missing BY variable values like any other BY variable
value. The missing values form a separate BY group. When the value of a WEIGHT
variable is missing, PROC FREQ excludes the observation from the analysis.
By default, a one-way table lists the variable name, variable values, frequency
counts, percentages, cumulative frequency counts, cumulative percentages,
and the number of missing values. Unless you use LIST in the TABLES statement,
a two-way table appears as a crosstabulation table. An n-way
table appears as multiple crosstabulation tables with one table for each combination
of values for the stratification variables. By default, each cell of a crosstabulation
table lists the frequency count, percentage of the total frequency count,
row percentage, and column percentage.
Use the following TABLES statement options to report additional information
for each table cell:
- CELLCHI2
- includes the cell's contribution to the total chi-square
statistic
- CUMCOL
- includes the cumulative column percentage of the cell
- DEVIATION
- includes the deviation of the cell frequency from the expected
value
- EXPECTED
- includes the expected cell frequency under the hypothesis
of independence.
You can also use the SCOROUT option to display the type of score, row score,
and column score for two-way tables.
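The cell options listed above can be combined in a single table request. The following is a sketch only; the data set WORK.AB, the variables A and B, and the weight variable WT are hypothetical.

```sas
/* Hypothetical cell counts for a 2 x 3 table, supplied through WT. */
data work.ab;
   input a b wt;
   datalines;
1 1 10
1 2 20
1 3 15
2 1 12
2 2 8
2 3 25
;
run;

proc freq data=work.ab;
   weight wt;
   /* Add expected frequency, deviation, chi-square contribution, */
   /* and cumulative column percentage to each table cell.        */
   tables a*b / expected deviation cellchi2 cumcol;
run;
```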
By default, PROC FREQ displays the next one-way frequency table on the
current page when there is enough space to display the entire table. If you
use COMPRESS in the PROC FREQ statement, the next one-way table starts to
display on the current page even when the entire table will not fit. If you
use PAGE in the PROC FREQ statement, each frequency or crosstabulation table
always displays on a separate page.
By default, PROC FREQ uses the BEST6. format to display a cell
frequency when the frequency is less than 1E6. Otherwise, it uses the BEST7.
format so that frequency values with more than seven significant digits display
in scientific notation (E format). The V5FMT option in the TABLES statement
uses BEST8. format so that frequency values with more than eight significant
digits display in scientific notation.
When scientific notation is used, only the first few significant digits
are shown. If you need more significant digits than PROC FREQ displays, create
an output data set by specifying OUT= in the TABLES statement. Then use PROC
PRINT and assign an appropriate format to the variable COUNT. For example,
the statement
format count 10.;
displays exact integer counts up
to 9999999999. For more information about formats, see the section on components
of the SAS language in SAS Language Reference: Concepts.
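The approach described above might look like the following sketch. The data set WORK.BIG, the variable GRP, and the weight variable N are hypothetical; large weights are used only to produce counts that are too wide for the default display format.

```sas
/* Hypothetical data: N carries very large frequencies. */
data work.big;
   input grp $ n;
   datalines;
A 1234567890
B 2345678901
;
run;

/* Store the counts in an output data set instead of displaying them. */
proc freq data=work.big;
   weight n;
   tables grp / out=work.counts noprint;
run;

/* Display exact integer counts with a wider format. */
proc print data=work.counts;
   format count 10.;
run;
```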
The NOPRINT option in the PROC FREQ statement and the NOPRINT, NOCOL,
NOCUM, NOFREQ, NOPERCENT, and NOROW options in the TABLES statement suppress
displayed output. Use NOPRINT in the PROC FREQ statement to suppress all displayed
output, which also suppresses output through the Output Delivery System (ODS).
Use NOPRINT in the TABLES statement to suppress frequency and crosstabulation
tables but still display the requested statistics. Use NOCOL, NOCUM, NOFREQ,
NOPERCENT, and NOROW to suppress individual frequencies and percentages in the
frequency and crosstabulation tables.
- CAUTION:
- Multiway tables can generate a great deal of displayed
output.
For example, if the variables A, B, C, D, and E
each have ten levels, the table request A*B*C*D*E may generate 1000 or more
pages of output. If you are primarily interested in the tests and measures
of association, use NOPRINT in the TABLES statement to suppress the tables
but display the statistics. Or use NOPRINT in the PROC FREQ statement to suppress
all displayed output, and use the OUTPUT statement to store the statistics
in an output data set. If you are interested in frequency counts and percentages,
use LIST in the TABLES statement.
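The two uses of NOPRINT described in this caution can be sketched as follows. The data set WORK.AB2, the variables A, B, and WT, and the output data set WORK.STATS are hypothetical names used only for illustration.

```sas
/* Hypothetical cell counts for a 2 x 3 table, supplied through WT. */
data work.ab2;
   input a b wt;
   datalines;
1 1 10
1 2 20
1 3 15
2 1 12
2 2 8
2 3 25
;
run;

/* NOPRINT in the TABLES statement: suppress the crosstabulation */
/* table but still display the chi-square statistics.            */
proc freq data=work.ab2;
   weight wt;
   tables a*b / chisq noprint;
run;

/* NOPRINT in the PROC FREQ statement: display nothing and store */
/* the statistics in an output data set instead.                 */
proc freq data=work.ab2 noprint;
   weight wt;
   tables a*b / chisq;
   output out=work.stats chisq;
run;

proc print data=work.stats;
run;
```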
PROC FREQ produces two types of output data sets that you can use with other
statistical and reporting procedures. These data sets are produced as follows:
- TABLES statement, OUT= option
- creates an output data set that contains frequency or crosstabulation
table counts and percentages.
- OUTPUT statement
- creates an output data set that contains statistics.
PROC FREQ does not display the output data set. Use PROC PRINT, PROC
REPORT, or any other SAS reporting tool to display the output data set.
The OUT= option in the TABLES statement creates an output data
set that contains one observation for each combination of the variable values
in the last table request. By default, each observation contains the frequency
and percentage for each combination of variable values. When the input data
set contains missing values, the output data set contains an observation with
the frequency of missing values. The output data set includes the following
variables:
- BY variables
- table request variables, such as A, B, C, and D in the table request
A*B*C*D
- COUNT variable containing the cell frequency
- PERCENT variable containing the cell percentage.
If you use OUTEXPECT and OUTPCT, the output data set also contains expected
frequencies and row, column, and table percentages, respectively. The additional
variables are:
- EXPECTED variable containing the expected frequency
- PCT_TABL variable containing the percentage of two-way table frequency,
for n-way tables where n > 2
- PCT_ROW variable containing the percentage of row frequency
- PCT_COL variable containing the percentage of column frequency.
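A request for these additional variables might be written as in the following sketch. The data set WORK.AB3, the variables A, B, and WT, and the output data set WORK.CELLPCT are hypothetical. Because the request is a two-way table, PCT_TABL is not expected in the output.

```sas
/* Hypothetical cell counts for a 2 x 3 table, supplied through WT. */
data work.ab3;
   input a b wt;
   datalines;
1 1 10
1 2 20
1 3 15
2 1 12
2 2 8
2 3 25
;
run;

proc freq data=work.ab3;
   weight wt;
   /* OUTEXPECT adds EXPECTED; OUTPCT adds PCT_ROW and PCT_COL. */
   tables a*b / out=work.cellpct outexpect outpct noprint;
run;

proc print data=work.cellpct;
run;
```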
When you submit the following statements
proc freq;
tables a a*b / out=d;
run;
the output data set D contains frequencies and percentages for
the last table request, A*B. If A has two levels (1 and 2), B has three levels
(1, 2, and 3), and no table cell count is zero or missing, the output data
set D includes six observations, one for each combination of A and B. The
first observation corresponds to A=1 and B=1; the second observation corresponds
to A=1 and B=2; and so on. The data set also includes the variables COUNT
and PERCENT. The value of COUNT is the number of observations that have the
given combination of A and B values. The value of PERCENT is the percent of
the total number of observations having that A and B combination.
When PROC FREQ combines different variable values into the same formatted
level, the output data set contains the smallest internal value for the formatted
level. For example, suppose a variable X has the values 1.1, 1.4, 1.7, 2.1,
and 2.3. When you submit the statement
format x 1.;
in a PROC FREQ step,
the formatted levels listed in the frequency table for X are 1 and 2. If you
create an output data set with the frequency counts, the internal values of
X are 1.1 and 1.7. To report the internal values of X when you display the
output data set, use a format of 3.1 with X.
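The example above can be reproduced with a step like the following. The data set names WORK.XVALS and WORK.XCOUNTS are hypothetical; the values of X and both formats are the ones given in the text.

```sas
/* The five values of X from the example above. */
data work.xvals;
   input x @@;
   datalines;
1.1 1.4 1.7 2.1 2.3
;
run;

/* The 1. format groups the values into the formatted levels 1 and 2. */
proc freq data=work.xvals;
   tables x / out=work.xcounts;
   format x 1.;
run;

/* A 3.1 format shows the internal value stored for each level. */
proc print data=work.xcounts;
   format x 3.1;
run;
```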
The OUTPUT statement creates a SAS data set that contains the statistics that
PROC FREQ computes for the last table request. You specify which statistics
to store in the output data set. There is an observation with the specified
statistics for each stratum or two-way table. If PROC FREQ computes summary
statistics for a stratified table, the output data set also contains a summary
observation for these statistics. Additionally, you can output statistics
for one-way tables, such as chi-square or binomial proportion statistics.
If you use a BY statement, the output data set contains observations for each
BY group.
The output data set can include the following variables:
- BY variables
- variables that identify the stratum, such as A and B in the table
request A*B*C*D
- variables that contain the specified statistics.
The output data set also includes variables with the p-value
and degrees of freedom, asymptotic standard error (ASE), or confidence limits
when PROC FREQ computes these values for a specified statistic.
The variable names for the specified statistics in the output data set
are the statistic keywords enclosed in underscores. PROC FREQ
forms variable names for the corresponding p-values, degrees
of freedom, standard errors, or confidence limits by combining the name of the
keyword with one of the following prefixes:
DF_ | degrees of freedom
E_ | asymptotic standard error (ASE)
E0_ | asymptotic standard error under the null hypothesis
L_ | lower confidence limit
P_ | p-value
P2_ | two-sided p-value
PL_ | left-sided p-value
PR_ | right-sided p-value
U_ | upper confidence limit
XP_ | exact p-value
XP2_ | exact two-sided p-value
XPR_ | exact right-sided p-value
XPL_ | exact left-sided p-value
XL_ | exact lower confidence limit
XU_ | exact upper confidence limit
Z_ | standardized value
If the length of the prefix plus the statistic keyword exceeds
eight characters, PROC FREQ truncates the keyword so that the name of the
new variable is eight characters long.
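One way to see these naming rules is to store a single statistic and list the variables in the resulting data set. In the following sketch, the data set WORK.AB4, the variables A, B, and WT, and the output data set WORK.PCHISTAT are hypothetical. By the rules above, the PCHI keyword should produce a variable named _PCHI_ along with DF_PCHI and P_PCHI, but run PROC CONTENTS to confirm the names in your release.

```sas
/* Hypothetical cell counts for a 2 x 3 table, supplied through WT. */
data work.ab4;
   input a b wt;
   datalines;
1 1 10
1 2 20
1 3 15
2 1 12
2 2 8
2 3 25
;
run;

/* Store only the Pearson chi-square statistic. */
proc freq data=work.ab4 noprint;
   weight wt;
   tables a*b / chisq;
   output out=work.pchistat pchi;
run;

/* List the variable names that PROC FREQ created. */
proc contents data=work.pchistat;
run;
```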
Copyright 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.