The UNIVARIATE Procedure |
Alias: | HIST |
Tip: | You can use multiple HISTOGRAM statements. |
Featured in: | Fitting Density Curves and Creating a Two-Way Comparative Histogram |
HISTOGRAM <variable(s)> </ option(s)>; |
To do this | Use this option | |
---|---|---|
Create output data set with information on histogram intervals | OUTHISTOGRAM= | |
Request estimated density curve | ||
Fit beta density with threshold parameter θ, scale parameter σ, and shape parameters α and β | BETA(beta-suboptions) | |
Fit exponential density with threshold parameter θ and scale parameter σ | EXPONENTIAL(exponential-suboptions) | |
Fit gamma density with threshold parameter θ, scale parameter σ, and shape parameter α | GAMMA(gamma-suboptions) | |
Fit nonparametric kernel density estimates | KERNEL(kernel-suboptions) | |
Fit lognormal density with threshold parameter θ, scale parameter ζ, and shape parameter σ | LOGNORMAL(lognormal-suboptions) | |
Fit normal density with mean μ and standard deviation σ | NORMAL(normal-suboptions) | |
Fit Weibull density with threshold parameter θ, scale parameter σ, and shape parameter c | WEIBULL(Weibull-suboptions) | |
Parametric density curve suboptions | ||
Specify shape parameter for fitted beta or gamma curve | ALPHA= | |
Specify second shape parameter for beta fitted curve | BETA= | |
Specify shape parameter for fitted Weibull curve | C= | |
Specify the mean for fitted normal curve | MU= | |
Specify scale parameter for fitted beta curve, exponential curve, gamma curve, and Weibull curve; standard deviation for fitted normal curve; or shape parameter for fitted lognormal curve | SIGMA= | |
Specify threshold parameter for fitted beta curve, exponential curve, gamma curve, lognormal curve, and Weibull curve | THETA= | |
Specify scale parameter for fitted lognormal curve | ZETA= | |
Nonparametric density curve suboptions | ||
Specify standardized bandwidth parameter for fitted kernel density estimates | C= | |
Specify type of kernel density curve | K= | |
Control appearance of fitted density curves | ||
Specify color of fitted curve | COLOR= | |
Fill area under fitted curve | FILL | |
Specify line type of fitted curve | L= | |
Display table of histogram interval midpoints | MIDPERCENTS | |
Suppress the table summarizing the fitted curve | NOPRINT | |
List percentages for calculated and estimated quantiles | PERCENTS= | |
Specify width of fitted density curve | W= | |
Control general histogram layout | ||
Specify width for the bars | BARWIDTH= | |
Force creation of a histogram | FORCEHIST | |
Create a grid | GRID | |
Specify offset for horizontal axis | HOFFSET= | |
Specify reference lines perpendicular to the horizontal axis | HREF= | |
Specify labels for HREF= lines | HREFLABELS= | |
Specify vertical position of labels for HREF= lines | HREFLABPOS= | |
Specify a line style for grid lines | LGRID= | |
Specify midpoints for histogram intervals | MIDPOINTS= | |
Suppress histogram bars | NOBARS | |
Suppress frame around plotting area | NOFRAME | |
Suppress label for horizontal axis | NOHLABEL | |
Suppress plot | NOPLOT | |
Suppress label for vertical axis | NOVLABEL | |
Suppress tick marks and tick mark labels for vertical axis | NOVTICK | |
Include right endpoint in interval | RTINCLUDE | |
Turn and vertically string out characters in labels for vertical axis | TURNVLABELS | |
Specify tick mark values for vertical axis | VAXIS= | |
Specify label for vertical axis | VAXISLABEL= | |
Specify length of offset at upper end of vertical axis | VOFFSET= | |
Specify reference lines perpendicular to the vertical axis | VREF= | |
Specify labels for VREF= lines | VREFLABELS= | |
Specify horizontal position of labels for VREF= lines | VREFLABPOS= | |
Specify scale for vertical axis | VSCALE= | |
Specify line thickness for axes and frame | WAXIS= | |
Specify line thickness for grid | WGRID= | |
Enhance the graph | ||
Specify annotate data set | ANNOTATE= | |
Specify color for axis | CAXIS= | |
Specify color of outlines of histogram bars | CBARLINE= | |
Specify color for filling under curve | CFILL= | |
Specify color for frame | CFRAME= | |
Specify color for grid lines | CGRID= | |
Specify color for HREF= lines | CHREF= | |
Specify color for text | CTEXT= | |
Specify color for VREF= lines | CVREF= | |
Specify description for plot in graphics catalog | DESCRIPTION= | |
Specify software font for text | FONT= | |
Specify height of text used outside framed areas | HEIGHT= | |
Specify number of horizontal minor tick marks | HMINOR= | |
Specify software font for text inside framed areas | INFONT= | |
Specify height of text inside framed areas | INHEIGHT= | |
Specify line style for HREF= lines | LHREF= | |
Specify line style for VREF= lines | LVREF= | |
Specify name for plot in graphics catalog | NAME= | |
Specify pattern for filling under curve | PFILL= | |
Specify number of vertical minor tick marks | VMINOR= | |
Specify line thickness for bar outlines | WBARLINE= | |
Enhance comparative histograms | ||
Apply annotation requested in ANNOTATE= data set to key cell only | ANNOKEY | |
Specify color for filling frame for row labels | CFRAMESIDE= | |
Specify color for filling frame for column labels | CFRAMETOP= | |
Specify color for proportion of frequency bar | CPROP= | |
Specify distance between tiles | INTERTILE= | |
Specify maximum number of bins to display | MAXNBIN= | |
Limit the number of bins that display to within a specified number of standard deviations above and below mean of data in key cell | MAXSIGMAS= | |
Specify number of columns in comparative histogram | NCOLS= | |
Specify number of rows in comparative histogram | NROWS= |
Arguments |
Default: | If you omit variable(s) in the HISTOGRAM statement, then the procedure creates a histogram for each variable that you list in the VAR statement, or for each numeric variable in the DATA= data set if you omit a VAR statement. |
Requirement: | If you specify a VAR statement, use a subset of the variable(s) that you list in the VAR statement. Otherwise, variable(s) are any numeric variables in the DATA= data set. |
Options |
Alias: | A= if you use it as a beta-suboption. SHAPE= if you use it as a gamma-suboption |
Default: | a maximum likelihood estimate |
Requirement: | Enclose this suboption in parentheses after the BETA option or GAMMA option. |
Requirement: | This option is ignored unless you specify the CLASS statement. |
Tip: | Use the KEYLEVEL= option in the CLASS statement to specify the key cell. |
See also: | the KEYLEVEL= option |
Alias: | ANNO= |
Tip: | You can also specify an ANNOTATE= data set in the PROC UNIVARIATE statement to enhance all the graphic displays that the procedure creates. |
See also: | ANNOTATE= in the PROC UNIVARIATE statement |
Restriction: | The BETA option can occur only once in a HISTOGRAM statement. |
Interaction: | The beta distribution is bounded below by the parameter θ and above by the value θ + σ. Use the THETA= and SIGMA= suboptions to specify these parameters. The default values for THETA= and SIGMA= are 0 and 1, respectively. You can specify THETA=EST and SIGMA=EST to request maximum likelihood estimates for θ and σ. Note: Three- and four-parameter maximum likelihood estimation may not always converge. |
Interaction: | The beta distribution has two shape parameters, α and β. If these parameters are known, you can specify their values with the ALPHA= and BETA= suboptions. By default, PROC UNIVARIATE computes maximum likelihood estimates for α and β. |
Main discussion: | See Beta Distribution |
See also: | the ALPHA= suboption, BETA= suboption, SIGMA= suboption, and THETA= suboption |
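The BETA option has no example of its own here, so the following is a minimal sketch of a beta fit with known bounds. The data set name RATIOS and the variable SCORE are hypothetical, and SCORE is assumed to lie strictly between 0 and 1.

```sas
/* Assumption: WORK.RATIOS and its variable SCORE are hypothetical names;
   SCORE is taken to lie strictly between 0 and 1. */
proc univariate data=ratios;
   var score;
   /* THETA=0 and SIGMA=1 restate the documented defaults, so the fitted
      curve is bounded below by 0 and above by 0 + 1 = 1. ALPHA= and BETA=
      are omitted, which leaves both shape parameters to maximum
      likelihood estimation. */
   histogram score / beta(theta=0 sigma=1);
run;
```

Replacing the two values with THETA=EST and SIGMA=EST would request estimates of the bounds as well, subject to the convergence caveat noted above.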
Alias: | B= |
Default: | a maximum likelihood estimate |
Requirement: | Enclose this suboption in parentheses after the BETA option. |
Default: | a maximum likelihood estimate |
Requirement: | Enclose this suboption in parentheses after the WEIBULL option. |
Default: | the bandwidth that minimizes the approximate MISE. |
Restriction: | You can specify up to five values to request multiple estimates. |
Requirement: | Enclose this suboption in parentheses after the KERNEL option. |
Interaction: | You can also use the C= suboption with the K= suboption, which specifies the kernel function, to compute multiple estimates. If you specify more kernel functions than bandwidths, PROC UNIVARIATE repeats the last bandwidth in the list for the remaining estimates. Likewise, if you specify more bandwidths than kernel functions, then PROC UNIVARIATE repeats the last kernel function for the remaining estimates. For example, the following statements compute three density estimates:
   proc univariate;
      var length;
      histogram length / kernel(c=1 2 3 k=normal quadratic);
   run;
The first uses a normal kernel and a bandwidth of 1, the second uses a quadratic kernel and a bandwidth of 2, and the third uses a quadratic kernel and a bandwidth of 3. |
Tip: | To estimate a bandwidth that minimizes the approximate mean integrated square error (MISE), use the C=MISE suboption. For example, the following statements compute three density estimates:
   proc univariate;
      var length;
      histogram length / kernel(c=0.5 1.0 mise);
   run;
The first two estimates have standardized bandwidths of 0.5 and 1.0, respectively, and the third has a bandwidth that minimizes the approximate MISE. |
Alias: | CAXES= and CA= |
Default: | the first color in the device color list |
Default: | the first color in the device color list |
Featured in: | Fitting Density Curves |
See also: | FILL option and PFILL= option |
Featured in: | Fitting Density Curves and Creating a Two-Way Comparative Histogram |
Alias: | CRF= |
Default: | The area is not filled. |
Default: | These areas are not filled. |
Requirement: | This option is ignored unless you specify the CLASS statement. |
Default: | These areas are not filled. |
Requirement: | This option is ignored unless you specify the CLASS statement. |
Default: | the first color in the device color list |
Interaction: | This option automatically invokes the GRID option. |
Default: | the first color in the device color list |
Requirement: | You must enclose this suboption in parentheses after the density curve option or the KERNEL option. |
Interaction: | You can specify as a KERNEL suboption a list of up to five colors in parentheses for multiple kernel density estimates. If there are more estimates than colors, the remaining estimates use the last color that you specify. |
Default: | bars do not display |
Requirement: | This option is ignored unless you specify the CLASS statement. |
Tip: | Use the keyword EMPTY to display empty bars. |
Alias: | CT= |
Default: | The color that you specify for the CTEXT= option in the GOPTIONS statement. If you omit the GOPTIONS statement, the default is the first color in the device color list. |
Alias: | CV= |
Default: | the first color in the device color list |
Alias: | DES= |
Default: | the variable name |
Alias: | EXP |
Restriction: | The EXPONENTIAL option can occur only once in a HISTOGRAM statement. |
Interaction: | The parameter θ must be less than or equal to the minimum data value. Use the THETA= suboption to specify θ. The default value for θ is zero. Specify THETA=EST to request the maximum likelihood estimate for θ. |
Interaction: | Use the SIGMA= suboption to specify σ. By default, PROC UNIVARIATE computes a maximum likelihood estimate for σ. For example, the following statements fit an exponential curve with θ = 10 and with a maximum likelihood estimate for σ:
   proc univariate;
      var length;
      histogram / exponential(theta=10 l=2 color=red);
   run; |
Main discussion: | See Exponential Distribution |
See also: | the SIGMA= suboption and THETA= suboption |
Featured in: | Fitting Density Curves |
Restriction: | The FILL suboption can occur with only one fitted curve. |
Requirement: | Enclose the FILL suboption in parentheses after a density curve option or the KERNEL option. |
Interaction: | The CFILL= and PFILL= options specify the color and pattern for the area under the curve. |
See also: | For a list of available colors and patterns, see SAS/GRAPH Software: Reference |
Featured in: | Fitting Density Curves |
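As a rough illustration of how FILL and CFILL= work together, the sketch below fills the area under a single fitted normal curve. The data set name PARTS is a hypothetical placeholder; LENGTH is the variable used in the other examples in this section.

```sas
/* Assumption: WORK.PARTS is a hypothetical data set that contains
   the numeric variable LENGTH. */
proc univariate data=parts;
   var length;
   /* FILL goes inside the parentheses of the one curve to be filled.
      CFILL= is a HISTOGRAM statement option, placed outside the
      parentheses, that sets the color of the filled area. */
   histogram length / normal(fill) cfill=blue;
run;
```

Because FILL can occur with only one fitted curve, a statement that fits several curves would carry FILL in only one set of parentheses.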
Default: | hardware characters |
Interaction: | The FONT= font takes precedence over the FTEXT= font that you specify in the GOPTIONS statement. |
Restriction: | The GAMMA option can occur only once in a HISTOGRAM statement. |
Interaction: | The parameter θ must be less than the minimum data value. Use the THETA= suboption to specify θ. The default value for θ is zero. Specify THETA=EST to request the maximum likelihood estimate for θ. |
Interaction: | Use the ALPHA= and SIGMA= suboptions to specify the shape parameter α and the scale parameter σ. By default, PROC UNIVARIATE computes maximum likelihood estimates for α and σ. For example, the following statements fit a gamma curve with θ = 4 and with maximum likelihood estimates for α and σ:
   proc univariate;
      var length;
      histogram length / gamma(theta=4);
   run;
PROC UNIVARIATE calculates the maximum likelihood estimate of α iteratively by using the Newton-Raphson approximation. |
Main discussion: | See Gamma Distribution |
See also: | the SIGMA= suboption, ALPHA= suboption, and THETA= suboption |
See also: | the CGRID= option |
Alias: | HM= |
Default: | 0 |
Tip: | Use HOFFSET=0 to eliminate the default offset. |
See also: | CHREF= option and LHREF= option |
Alias: | HREFLABEL= and HREFLAB= |
Restriction: | The number of labels must equal the number of reference lines. Labels can have up to 16 characters. |
1 | positions the labels along the top of the histogram |
2 | staggers the labels from top to bottom |
3 | positions the labels along the bottom. |
Default: | 1 |
See also: | For a list of fonts, see SAS/GRAPH Software: Reference. |
Default: | The height that you specify with the HEIGHT= option. If you do not specify the HEIGHT= option, the default height is the height that you specify with the HTEXT= option in the GOPTIONS statement. |
Default: | .75 in percentage screen units. |
Requirement: | This option is ignored unless you specify the CLASS statement. |
Featured in: | Creating a Two-Way Comparative Histogram |
Default: | normal kernel |
Restriction: | You can specify up to five values to request multiple estimates. |
Requirement: | You must enclose this suboption in parentheses after the KERNEL option. |
Interaction: | You can also use the K= suboption with the C= suboption, which specifies standardized bandwidths. If you specify more kernel functions than bandwidths, PROC UNIVARIATE repeats the last bandwidth in the list for the remaining estimates. Likewise, if you specify more bandwidths than kernel functions, PROC UNIVARIATE repeats the last kernel function for the remaining estimates. For example, the following statements compute three estimates with bandwidths of 0.5, 1.0, and 1.5:
   proc univariate;
      var length;
      histogram length / kernel(c=0.5 1.0 1.5 k=normal quadratic);
   run;
The first estimate uses a normal kernel, and the last two estimates use a quadratic kernel. |
Tip: | To request multiple kernel density estimates on the same histogram, specify a list of values for either the C= suboption or K= suboption. |
Main discussion: | Kernel Density Estimates |
See also: | C= suboption and K= suboption |
Default: | 1, which produces a solid line. |
Requirement: | You must enclose the L= suboption in parentheses after a density curve option or the KERNEL option. |
Interaction: | If you use the L= suboption with the KERNEL option, you can specify a single line type or a list of line types. |
See also: | For a list of available line types, see SAS/GRAPH Software: Reference |
Featured in: | Fitting Density Curves |
Default: | 1, which produces a solid line |
Interaction: | This option automatically invokes the GRID option. |
Alias: | LH= |
Default: | 2, which produces a dashed line |
Restriction: | The LOGNORMAL option can occur only once in a HISTOGRAM statement. |
Interaction: | The parameter θ must be less than the minimum data value. Use the THETA= suboption to specify θ. The default value for θ is zero. Specify THETA=EST to request the maximum likelihood estimate for θ. |
Interaction: | Use the SIGMA= and ZETA= suboptions to specify σ and ζ. By default, PROC UNIVARIATE computes maximum likelihood estimates for σ and ζ. For example, the following statements fit a lognormal distribution function with the default value of θ = 0 and with maximum likelihood estimates for σ and ζ:
   proc univariate;
      var length;
      histogram length / lognormal;
   run; |
Main discussion: | See Lognormal Distribution |
See also: | the ZETA= suboption, SIGMA= suboption, and THETA= suboption |
Alias: | LV= |
Default: | 2, which produces a dashed line |
By default, PROC UNIVARIATE determines the bin size and midpoints for the key cell, and then extends the midpoint list to accommodate the data ranges for the remaining cells. However, if the cell scales differ considerably, the resulting number of bins may be so great that each cell histogram is scaled into a narrow region. By using MAXNBIN= to limit the number of bins, you can narrow the window about the data distribution in the key cell.
Requirement: | This option is ignored unless you specify the CLASS statement. |
Tip: | MAXNBIN= provides an alternative to the MAXSIGMAS= option. |
By default, PROC UNIVARIATE determines the bin size and midpoints for the key cell, and then extends the midpoint list to accommodate the data ranges for the remaining cells. However, if the cell scales differ considerably, the resulting number of bins may be so great that each cell histogram is scaled into a narrow region. By using MAXSIGMAS= to limit the number of bins, you can narrow the window that surrounds the data distribution in the key cell.
Requirement: | This option is ignored unless you specify the CLASS statement. |
Interaction: | If you specify MIDPERCENTS in parentheses after a density estimate option, PROC UNIVARIATE displays a table that lists the midpoints, the observed percentage of observations, and the estimated percentage of the population in each interval (estimated from the fitted distribution). |
Range: | The range of midpoints, extended at each end by half of the bar width, must cover the range of the data. For example, if you specify
   midpoints=2 to 10 by 0.5
then all of the observations should fall between 1.75 and 10.25. |
Requirement: | You must use evenly spaced midpoints which you list in increasing order. |
Requirement: | This option is ignored unless you specify the CLASS statement. |
Requirement: | This option does not apply unless you specify the CLASS statement. |
Default: | If you use a CLASS statement, MIDPOINTS=KEY; however, if the key cell is empty then MIDPOINTS=UNIFORM. Otherwise, PROC UNIVARIATE computes the midpoints by using an algorithm (Terrell and Scott, 1985) that is primarily applicable to continuous data that are approximately normally distributed. |
Featured in: | Fitting Density Curves and Creating a Two-Way Comparative Histogram |
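To place the MIDPOINTS= fragment from the Range note in context, here is a minimal sketch of a complete step. The data set name PARTS is a hypothetical placeholder, and the values of LENGTH are assumed to fall between 1.75 and 10.25.

```sas
/* Assumption: WORK.PARTS is a hypothetical data set whose LENGTH values
   all fall between 1.75 and 10.25. */
proc univariate data=parts;
   var length;
   /* Evenly spaced midpoints in increasing order: 2.0, 2.5, ..., 10.0
      (17 values). Each bar is 0.5 wide, so the bars together span
      2 - 0.25 = 1.75 through 10 + 0.25 = 10.25. */
   histogram length / midpoints=2 to 10 by 0.5;
run;
```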
Default: | the sample mean |
Requirement: | You must enclose this suboption in parentheses after the NORMAL option. |
Default: | UNIVAR |
Alias: | NCOL= |
Default: | NCOLS=1, if you specify only one class variable, and NCOLS=2, if you specify two class variables. |
Requirement: | This option is ignored unless you specify the CLASS statement. |
Interaction: | If you specify two class variables, you can use the NCOLS= option with the NROWS= option. |
Featured in: | Creating a Two-Way Comparative Histogram |
Tip: | Use this option to display only the fitted curves. |
Tip: | Use this option to reduce clutter. |
Alias: | NOCHART |
Tip: | Use NOPLOT when you want to display only descriptive statistics for a fitted density or create an OUTHISTOGRAM= data set. |
Requirement: | Enclose this option in the parentheses that follow the density curve option. |
Featured in: | Fitting Density Curves |
Restriction: | The NORMAL option can occur only once in a HISTOGRAM statement. |
Interaction: | Use the MU= and SIGMA= suboptions to specify μ and σ. By default, PROC UNIVARIATE uses the sample mean and sample standard deviation for μ and σ. |
Main discussion: | See Normal Distribution |
See also: | the MU= suboption and the SIGMA= suboption |
Featured in: | Fitting Density Curves |
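The following sketch shows one way to combine the NORMAL parameter suboptions with the curve appearance suboptions. The data set name PARTS and the parameter values are illustrative assumptions, not values taken from this documentation.

```sas
/* Assumption: WORK.PARTS is a hypothetical data set that contains LENGTH;
   the values 10 and 0.3 are illustrative only. */
proc univariate data=parts;
   var length;
   /* MU= and SIGMA= fix the mean and standard deviation of the curve
      instead of using the sample estimates. COLOR=, L=, and W= set the
      color, line type, and width of the fitted curve. */
   histogram length / normal(mu=10 sigma=0.3 color=red l=2 w=2);
run;
```

Omitting MU= and SIGMA= would fall back to the sample mean and sample standard deviation, as described in the interaction above.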
Interaction: | This option automatically invokes the NOVLABEL option. |
Alias: | NROW= |
Default: | 2 |
Requirement: | This option is ignored unless you specify the CLASS statement. |
Interaction: | If you specify two class variables, you can use the NCOLS= option with the NROWS= option. |
Featured in: | Creating a Two-Way Comparative Histogram |
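A two-way comparative histogram that uses NROWS= with NCOLS= might be requested as in the sketch below. The data set PARTS and the class variables SUPPLIER and SHIFT are hypothetical names chosen for illustration.

```sas
/* Assumption: WORK.PARTS, SUPPLIER, and SHIFT are hypothetical names. */
proc univariate data=parts;
   /* Two class variables produce a two-way comparative histogram;
      without a CLASS statement, NROWS= and NCOLS= are ignored. */
   class supplier shift;
   var length;
   /* Request 2 rows and 3 columns of tiles. */
   histogram length / nrows=2 ncols=3;
run;
```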
Alias: | OUTHIST= |
See also: | OUTHISTOGRAM= Data Set |
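As a minimal sketch, the statements below save the histogram interval information to a data set and suppress the plot, which is the combination suggested in the NOPLOT tip. The names PARTS and HISTINFO are hypothetical.

```sas
/* Assumption: WORK.PARTS is a hypothetical input data set and HISTINFO
   is a hypothetical name for the output data set. */
proc univariate data=parts;
   var length;
   /* OUTHISTOGRAM= writes information on the histogram intervals to a
      data set; NOPLOT suppresses the histogram itself. */
   histogram length / outhistogram=histinfo noplot;
run;

/* List the saved interval information. */
proc print data=histinfo;
run;
```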
Alias: | PERCENT= |
Default: | 1, 5, 10, 25, 50, 75, 90, 95, and 99 percent |
Range: | between 0 and 100 |
Requirement: | You must enclose this suboption in parentheses after the curve option. |
Default: | The bars and curve areas are not filled. |
See also: | CFILL= option and FILL option |
See also: | SAS/GRAPH Software: Reference |
See also: | SIGMA= suboption and ZETA= suboption |
See also: | ALPHA= suboption, SIGMA= suboption, and C= suboption |
Default: | see Uses of the SIGMA suboption |
Requirement: | You must enclose this suboption in parentheses after the density curve option. |
Tip: | As a BETA suboption, you can specify SIGMA=EST to request a maximum likelihood estimate for σ. |
Distribution Keyword | SIGMA= Specifies | Default Value | Alias |
---|---|---|---|
BETA | scale parameter σ | 1 | SCALE= |
EXPONENTIAL | scale parameter σ | maximum likelihood estimate | SCALE= |
GAMMA | scale parameter σ | maximum likelihood estimate | SCALE= |
WEIBULL | scale parameter σ | maximum likelihood estimate | SCALE= |
LOGNORMAL | shape parameter σ | maximum likelihood estimate | SCALE= |
NORMAL | scale parameter σ | standard deviation | SHAPE= |
Default: | 0 |
Requirement: | You must enclose this suboption in parentheses after the curve option. |
Tip: | To compute a maximum likelihood estimate for θ, specify THETA=EST. |
Alias: | TURNVLABEL |
Requirement: | Use evenly spaced values which you list in increasing order. The first value must be zero and the last value must be greater than or equal to the height of the largest bar. You must scale the values in the same units as the bars. |
See also: | the VSCALE= option |
Featured in: | Creating a Two-Way Comparative Histogram |
Requirement: | Labels can have up to 40 characters. |
Featured in: | Creating a Two-Way Comparative Histogram |
Alias: | VM= |
Default: | 0 |
See also: | CVREF= option and LVREF= option |
Alias: | VREFLABEL= and VREFLAB= |
Restriction: | The number of labels must equal the number of reference lines. Labels can have up to 16 characters. |
1 | positions the labels at the left of the histogram. |
2 | positions the labels at the right of the histogram. |
Default: | 1 |
Default: | PERCENT |
Featured in: | Creating a Two-Way Comparative Histogram |
Default: | 1 |
Requirement: | You must enclose this suboption in parentheses after the density curve option or the KERNEL option. |
Interaction: | As a KERNEL suboption, you can specify a list of up to five W= values. |
Default: | 1 |
Default: | 1 |
Restriction: | The WEIBULL option can occur only once in a HISTOGRAM statement. |
Interaction: | The parameter θ must be less than the minimum data value. Use the THETA= suboption to specify θ. The default value for θ is zero. Specify THETA=EST to request the maximum likelihood estimate for θ. |
Interaction: | Use the C= and SIGMA= suboptions to specify the shape parameter c and the scale parameter σ. By default, PROC UNIVARIATE computes maximum likelihood estimates for c and σ. For example, the following statements fit a Weibull curve with θ = 4 and with maximum likelihood estimates for c and σ:
   proc univariate;
      var length;
      histogram length / weibull(theta=4);
   run;
PROC UNIVARIATE calculates the maximum likelihood estimate of c iteratively by using the Newton-Raphson approximation. |
Main discussion: | See Weibull Distribution |
See also: | the C= suboption, SIGMA= suboption, and THETA= suboption |
Default: | a maximum likelihood estimate |
Requirement: | You must enclose this suboption in parentheses after the LOGNORMAL option. |
Copyright 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.