Formulas for Fitted Curves

The following sections provide information on the families of parametric distributions that you can fit with the HISTOGRAM statement. Properties of these distributions are discussed by Johnson et al. (1994, 1995).

Beta Distribution

The fitted density function is

$p(x) = \begin{cases} \frac{(x-\theta)^{\alpha-1}(\sigma+\theta-x)^{\beta-1}}{B(\alpha,\beta)\,\sigma^{\alpha+\beta-1}}\, h \times 100\% & \text{for } \theta < x < \theta + \sigma \\ 0 & \text{for } x \leq \theta \text{ or } x \geq \theta + \sigma \end{cases}$


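where $B(\alpha,\beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}$ and
		 $\theta$ = lower threshold parameter (lower endpoint parameter)
		 $\sigma$ = scale parameter ($\sigma > 0$)
		 $\alpha$ = shape parameter ($\alpha > 0$)
		 $\beta$ = shape parameter ($\beta > 0$)
		 h = width of histogram interval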

Note: This notation is consistent with that of other distributions that you can fit with the HISTOGRAM statement. However, many texts, including Johnson et al. (1995), write the beta density function as

$p(x) = \begin{cases} \frac{(x - a)^{p - 1} (b - x)^{q - 1}}{B(p, q)(b - a)^{p + q - 1}} & \text{for } a < x < b \\ 0 & \text{for } x \leq a \text{ or } x \geq b \end{cases}$


The two notations are related as follows:

$\theta = a, \quad \sigma = b - a, \quad \alpha = p, \quad \beta = q$

The range of the beta distribution is bounded below by a threshold parameter $\theta = a$ and above by $\theta + \sigma = b$. If you request a fitted beta curve with the BETA option, $\theta$ must be less than the minimum data value, and $\theta + \sigma$ must be greater than the maximum data value. You can specify $\theta$ and $\sigma$ with the THETA= and SIGMA= beta-options in parentheses after the keyword BETA. By default, $\sigma=1$ and $\theta=0$. If you specify THETA=EST and SIGMA=EST, maximum likelihood estimates are computed for $\theta$ and $\sigma$.

In addition, you can specify $\alpha$ and $\beta$ with the ALPHA= and BETA= beta-options, respectively. By default, the procedure calculates maximum likelihood estimates for $\alpha$ and $\beta$ . For example, to fit a beta density curve to a set of data bounded below by 32 and above by 212 with maximum likelihood estimates for $\alpha$ and $\beta$ , use the following statement:

   histogram length / beta(theta=32 sigma=180);
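As an illustration only, the fitted beta curve can be evaluated outside SAS. The Python sketch below is not the procedure's code: the function names are hypothetical, it assumes the h × 100% factor is applied as `h * 100` so the curve sits on the histogram's percent scale, and the values α = 2, β = 3, h = 10 are made up (the statement above estimates α and β). It also converts the endpoint notation (a, b) to (θ, σ) and checks the support rule for θ and θ + σ.

```python
import math


def beta_fn(a, b):
    """Complete beta function B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b)."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))


def endpoints_to_theta_sigma(a, b):
    """Map the textbook endpoints (a, b) to theta = a and sigma = b - a."""
    return a, b - a


def support_ok(data, theta, sigma):
    """theta must lie below the smallest value and theta + sigma above the largest."""
    return theta < min(data) and theta + sigma > max(data)


def beta_curve(x, theta, sigma, alpha, beta, h):
    """Fitted beta curve on the percent scale (assumed h * 100 factor).

    Returns zero outside the open interval (theta, theta + sigma).
    """
    if not (theta < x < theta + sigma):
        return 0.0
    numerator = (x - theta) ** (alpha - 1) * (sigma + theta - x) ** (beta - 1)
    denominator = beta_fn(alpha, beta) * sigma ** (alpha + beta - 1)
    return h * 100.0 * numerator / denominator


# Bounds from the example statement: lower endpoint 32, upper endpoint 212.
theta, sigma = endpoints_to_theta_sigma(32, 212)   # (32, 180)
# alpha = 2 and beta = 3 are hypothetical values chosen for illustration.
print(beta_curve(122, theta, sigma, alpha=2.0, beta=3.0, h=10.0))
```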

The beta distributions are also referred to as Pearson Type I or II distributions. These include the power-function distribution ( $\beta = 1$ ), the arc-sine distribution ( $\alpha =\beta = \frac{1}2$ ), and the generalized arc-sine distributions ( $\alpha +\beta =1$ , $\beta \neq \frac{1}2$ ).

You can use the DATA step function BETAINV to compute beta quantiles and the DATA step function PROBBETA to compute beta probabilities.

Exponential Distribution

The fitted density function is

$p(x) = \begin{cases} \frac{h \times 100\%}{\sigma} \exp\left(-\frac{x - \theta}{\sigma}\right) & \text{for } x \geq \theta \\ 0 & \text{for } x < \theta \end{cases}$


where
		 $\theta$ = threshold parameter
		 $\sigma$ = scale parameter ($\sigma > 0$)
		 h = width of histogram interval

The threshold parameter $\theta$ must be less than or equal to the minimum data value. You can specify $\theta$ with the THRESHOLD= exponential-option. By default, $\theta=0$. If you specify THETA=EST, a maximum likelihood estimate is computed for $\theta$. In addition, you can specify $\sigma$ with the SCALE= exponential-option. By default, the procedure calculates a maximum likelihood estimate for $\sigma$. Note that some authors define the scale parameter as $\frac{1}{\sigma}$.
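A minimal Python sketch of the fitted exponential curve follows, again assuming the h × 100% factor is applied as `h * 100`. For a fixed threshold θ, the maximum likelihood estimate of σ has the textbook closed form mean(x) - θ; the helper uses that standard result and is not a description of how the procedure computes its estimate. Function names and data are hypothetical.

```python
import math
from statistics import mean


def exponential_curve(x, theta, sigma, h):
    """Fitted exponential curve on the percent scale; zero for x < theta."""
    if x < theta:
        return 0.0
    return h * 100.0 / sigma * math.exp(-(x - theta) / sigma)


def exponential_sigma_mle(data, theta):
    """Closed-form MLE of sigma when theta is held fixed: mean(x) - theta."""
    return mean(data) - theta


data = [1.2, 0.4, 2.9, 0.8, 1.7]   # made-up observations
sigma_hat = exponential_sigma_mle(data, theta=0.0)
print(sigma_hat, exponential_curve(1.0, 0.0, sigma_hat, h=0.5))
```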

The exponential distribution is a special case of both the gamma distribution (with $\alpha=1$ ) and the Weibull distribution (with c=1). A related distribution is the extreme value distribution. If Y = exp(-X) has an exponential distribution, then X has an extreme value distribution.

Gamma Distribution

The fitted density function is

$p(x) = \begin{cases} \frac{h \times 100\%}{\Gamma(\alpha)\sigma} \left(\frac{x - \theta}{\sigma}\right)^{\alpha - 1} \exp\left(-\frac{x - \theta}{\sigma}\right) & \text{for } x > \theta \\ 0 & \text{for } x \leq \theta \end{cases}$


where
		 $\theta$ = threshold parameter
		 $\sigma$ = scale parameter ($\sigma > 0$)
		 $\alpha$ = shape parameter ($\alpha > 0$)
		 h = width of histogram interval
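The gamma formula can be sketched in Python as below, assuming `h * 100` for the h × 100% factor; the function name is hypothetical. Working on the log scale with `math.lgamma` is a design choice here that avoids overflowing Γ(α) for large shape values.

```python
import math


def gamma_curve(x, theta, sigma, alpha, h):
    """Fitted gamma curve on the percent scale; zero for x <= theta."""
    if x <= theta:
        return 0.0
    y = (x - theta) / sigma
    # Evaluate y**(alpha - 1) * exp(-y) / Gamma(alpha) on the log scale.
    log_kernel = (alpha - 1) * math.log(y) - y - math.lgamma(alpha)
    return h * 100.0 / sigma * math.exp(log_kernel)


# Hypothetical parameter values, for illustration only.
print(gamma_curve(3.0, theta=1.0, sigma=2.0, alpha=2.5, h=0.5))
```

Setting α = 1 reproduces the exponential curve, and an integer α gives an Erlang curve.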

The threshold parameter $\theta$ must be less than the minimum data value. You can specify $\theta$ with the THRESHOLD= gamma-option. By default, $\theta=0$. If you specify THETA=EST, a maximum likelihood estimate is computed for $\theta$. In addition, you can specify $\sigma$ and $\alpha$ with the SCALE= and ALPHA= gamma-options. By default, the procedure calculates maximum likelihood estimates for $\sigma$ and $\alpha$.

The gamma distributions are also referred to as Pearson Type III distributions, and they include the chi-square, exponential, and Erlang distributions. The probability density function for the chi-square distribution is

$p(x) = \begin{cases} \frac{1}{2\Gamma\left(\frac{\nu}{2}\right)} \left(\frac{x}{2}\right)^{\frac{\nu}{2} - 1} \exp\left(-\frac{x}{2}\right) & \text{for } x > 0 \\ 0 & \text{for } x \leq 0 \end{cases}$

Notice that this is a gamma distribution with $\alpha = \frac{\nu}{2}$, $\sigma=2$, and $\theta=0$. The exponential distribution is a gamma distribution with $\alpha=1$, and the Erlang distribution is a gamma distribution with $\alpha$ a positive integer. A related distribution is the Rayleigh distribution. If $X$ has a $\chi^2_{\nu}$ distribution, then $\sqrt{X}$ has a $\chi_{\nu}$ distribution with probability density function

$p(x) = \begin{cases} \left[2^{\frac{\nu}{2}-1}\Gamma\left(\frac{\nu}{2}\right)\right]^{-1} x^{\nu-1} \exp\left(-\frac{x^2}{2}\right) & \text{for } x > 0 \\ 0 & \text{for } x \leq 0 \end{cases}$

If $\nu=2$ , the preceding distribution is referred to as the Rayleigh distribution.
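The special cases in this subsection can be checked numerically. The sketch below uses hypothetical helper names and unscaled densities (no h × 100% factor). It confirms that the chi-square density equals the gamma density with α = ν/2, σ = 2, θ = 0; that the χ_ν density equals 2x times the χ²_ν density at x², which is the change-of-variables density of √X when X has a χ²_ν distribution; and that ν = 2 gives the Rayleigh density x exp(-x²/2).

```python
import math


def gamma_pdf(x, alpha, sigma, theta=0.0):
    """Unscaled gamma density with threshold theta."""
    if x <= theta:
        return 0.0
    y = (x - theta) / sigma
    return math.exp((alpha - 1) * math.log(y) - y - math.lgamma(alpha)) / sigma


def chisq_pdf(x, nu):
    """Chi-square density with nu degrees of freedom."""
    if x <= 0:
        return 0.0
    return (x / 2) ** (nu / 2 - 1) * math.exp(-x / 2) / (2 * math.gamma(nu / 2))


def chi_pdf(x, nu):
    """Chi density with nu degrees of freedom."""
    if x <= 0:
        return 0.0
    return x ** (nu - 1) * math.exp(-x * x / 2) / (2 ** (nu / 2 - 1) * math.gamma(nu / 2))


for nu in (1, 2, 5):
    for x in (0.3, 1.0, 4.2):
        # Chi-square is gamma with alpha = nu/2, sigma = 2, theta = 0.
        assert math.isclose(chisq_pdf(x, nu), gamma_pdf(x, nu / 2, 2.0))
        # Density of sqrt(X) for X ~ chi-square(nu) is 2x * f_X(x^2).
        assert math.isclose(chi_pdf(x, nu), 2 * x * chisq_pdf(x * x, nu))

# nu = 2 reduces to the Rayleigh density x * exp(-x^2 / 2).
assert math.isclose(chi_pdf(1.3, 2), 1.3 * math.exp(-1.3 ** 2 / 2))
```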

You can use the DATA step function GAMINV to compute gamma quantiles and the DATA step function PROBGAM to compute gamma probabilities.

Johnson S_B Distribution

The fitted density function is

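$p(x) = \begin{cases} \frac{\delta\, h \times 100\%}{\sigma\sqrt{2\pi}} \left[\left(\frac{x-\theta}{\sigma}\right)\left(1 - \frac{x-\theta}{\sigma}\right)\right]^{-1} \exp\left[-\frac{1}{2}\left(\gamma + \delta \log\left(\frac{x-\theta}{\theta+\sigma-x}\right)\right)^2\right] & \text{for } \theta < x < \theta + \sigma \\ 0 & \text{for } x \leq \theta \text{ or } x \geq \theta + \sigma \end{cases}$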


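where
		 $\theta$ = threshold parameter
		 $\sigma$ = scale parameter ($\sigma > 0$)
		 $\delta$ = shape parameter ($\delta > 0$)
		 $\gamma$ = shape parameter ($-\infty < \gamma < \infty$)
		 h = width of histogram interval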
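A Python sketch of the S_B curve, assuming `h * 100` for the h × 100% factor and using a hypothetical function name, is shown below. The midpoint-rule check is only a sanity test: with h = 1 the curve should integrate to about 100 (percent) over the bounded range.

```python
import math


def sb_curve(x, theta, sigma, delta, gamma, h):
    """Fitted Johnson S_B curve on the percent scale; zero outside (theta, theta + sigma)."""
    if not (theta < x < theta + sigma):
        return 0.0
    y = (x - theta) / sigma                      # position within the range, in (0, 1)
    z = gamma + delta * math.log((x - theta) / (theta + sigma - x))
    scale = delta * h * 100.0 / (sigma * math.sqrt(2 * math.pi))
    return scale / (y * (1 - y)) * math.exp(-0.5 * z * z)


# Midpoint-rule integral over (0, 1) for theta=0, sigma=1, delta=1, gamma=0, h=1.
n_steps = 20000
dx = 1.0 / n_steps
area = sum(sb_curve((i + 0.5) * dx, 0.0, 1.0, 1.0, 0.0, 1.0) for i in range(n_steps)) * dx
print(round(area, 3))
```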

The S_B distribution is bounded below by the parameter $\theta$ and above by the value $\theta + \sigma$. The parameter $\theta$ must be less than the minimum data value. You can specify $\theta$ with the THETA= S_B-option, or you can request that $\theta$ be estimated with the THETA=EST S_B-option. The default value for $\theta$ is zero. The sum $\theta + \sigma$ must be greater than the maximum data value. The default value for $\sigma$ is one. You can specify $\sigma$ with the SIGMA= S_B-option, or you can request that $\sigma$ be estimated with the SIGMA=EST S_B-option.

By default, the method of percentiles given by Slifker and Shapiro (1980) is used to estimate the parameters. This method is based on four data percentiles, denoted by $x_{-3z}$, $x_{-z}$, $x_{z}$, and $x_{3z}$, which correspond to the four equally spaced percentiles $-3z$, $-z$, $z$, and $3z$ of a standard normal distribution under the transformation

$z = \gamma + \delta \log ( \frac{x - \theta} {\theta + \sigma - x} )$

The default value of z is 0.524. The results of the fit depend on the choice of z, and you can specify other values with the FITINTERVAL= option (specified in parentheses after the SB option). If you use the method of percentiles, you should select a value of z that corresponds to percentiles that are critical to your application.

The following values are computed from the data percentiles:

$\begin{aligned} m &= x_{3z} - x_{z} \\ n &= x_{-z} - x_{-3z} \\ p &= x_{z} - x_{-z} \end{aligned}$

It was demonstrated by Slifker and Shapiro (1980) that

$\frac{mn}{p^2} \begin{cases} > 1 & \text{for any } S_U \text{ distribution} \\ < 1 & \text{for any } S_B \text{ distribution} \\ = 1 & \text{for any } S_L \text{ (lognormal) distribution} \end{cases}$

A tolerance interval around one is used to discriminate among the three families with this ratio criterion. You can specify the tolerance with the FITTOLERANCE= option (specified in parentheses after the SB option). The default tolerance is 0.01. Assuming that the criterion satisfies the inequality

$\frac{mn}{p^2} < 1 - \text{tolerance}$

the parameters of the S_B distribution are computed using the explicit formulas derived by Slifker and Shapiro (1980).
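The percentile ratio and the tolerance test can be sketched as follows, using the default z = 0.524 and tolerance 0.01 given above. This is an illustration under assumptions, not the procedure's implementation: the empirical percentile rule (linear interpolation with plotting positions (i + 0.5)/n) is a choice made here, the function names are hypothetical, and the explicit Slifker and Shapiro parameter formulas are not reproduced. The demo data are exact quantiles of known S_B, S_U, and lognormal distributions, obtained by inverting each transformation, so each ratio should fall in its family's region.

```python
import math
from statistics import NormalDist


def empirical_quantile(sorted_data, prob):
    """Linear interpolation between order statistics.

    Assumption: the i-th smallest value (0-based) sits at plotting
    position (i + 0.5) / n; this rule is a choice made for the sketch.
    """
    n = len(sorted_data)
    pos = prob * n - 0.5
    if pos <= 0:
        return sorted_data[0]
    if pos >= n - 1:
        return sorted_data[-1]
    lo = int(pos)
    frac = pos - lo
    return sorted_data[lo] + frac * (sorted_data[lo + 1] - sorted_data[lo])


def percentile_ratio(data, z=0.524):
    """Return mn / p^2 from the data percentiles matched to -3z, -z, z, 3z."""
    xs = sorted(data)
    cdf = NormalDist().cdf
    x_m3z, x_mz, x_z, x_3z = (empirical_quantile(xs, cdf(k * z)) for k in (-3, -1, 1, 3))
    m = x_3z - x_z
    n = x_mz - x_m3z
    p = x_z - x_mz
    return m * n / (p * p)


def classify(ratio, tolerance=0.01):
    """Pick a Johnson family using the tolerance interval around one."""
    if ratio < 1 - tolerance:
        return "S_B"
    if ratio > 1 + tolerance:
        return "S_U"
    return "S_L"


# Demo data: exact quantiles of known distributions at the plotting positions.
std = NormalDist()
N = 2001
w = [std.inv_cdf((i + 0.5) / N) for i in range(N)]
sb_data = [1.0 / (1.0 + math.exp(-wi)) for wi in w]  # S_B: theta=0, sigma=1, gamma=0, delta=1
su_data = [math.sinh(wi) for wi in w]                 # S_U: theta=0, sigma=1, gamma=0, delta=1
ln_data = [math.exp(wi) for wi in w]                  # lognormal: theta=0, zeta=0, sigma=1

for name, sample in (("S_B", sb_data), ("S_U", su_data), ("lognormal", ln_data)):
    r = percentile_ratio(sample)
    print(name, round(r, 3), classify(r))
```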

If you specify FITMETHOD = MOMENTS (in parentheses after the SB option), the method of moments is used to estimate the parameters. If you specify FITMETHOD = MLE (in parentheses after the SB option), the method of maximum likelihood is used to estimate the parameters. Note that maximum likelihood estimates may not always exist. Refer to Bowman and Shenton (1983) for a discussion of methods for fitting Johnson distributions.

Johnson S_U Distribution

The fitted density function is

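$p(x) = \frac{\delta\, h \times 100\%}{\sigma\sqrt{2\pi}} \frac{1}{\sqrt{1 + \left(\frac{x-\theta}{\sigma}\right)^2}} \exp\left[-\frac{1}{2}\left(\gamma + \delta \sinh^{-1}\left(\frac{x-\theta}{\sigma}\right)\right)^2\right] \quad \text{for } -\infty < x < \infty$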


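where
		 $\theta$ = location parameter
		 $\sigma$ = scale parameter ($\sigma > 0$)
		 $\delta$ = shape parameter ($\delta > 0$)
		 $\gamma$ = shape parameter ($-\infty < \gamma < \infty$)
		 h = width of histogram interval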
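A matching Python sketch for the S_U curve follows, under the same `h * 100` assumption and with a hypothetical function name and parameter values. Because the S_U family is unbounded, the curve is evaluated for every real x; `math.asinh` supplies the inverse hyperbolic sine.

```python
import math


def su_curve(x, theta, sigma, delta, gamma, h):
    """Fitted Johnson S_U curve on the percent scale (defined for all real x)."""
    y = (x - theta) / sigma
    z = gamma + delta * math.asinh(y)
    scale = delta * h * 100.0 / (sigma * math.sqrt(2 * math.pi))
    return scale / math.sqrt(1 + y * y) * math.exp(-0.5 * z * z)


for x in (-2.0, 0.0, 2.0):
    print(x, su_curve(x, theta=0.0, sigma=1.0, delta=1.5, gamma=0.3, h=1.0))
```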

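You can specify the parameters with the THETA=, SIGMA=, DELTA=, and GAMMA= S_U-options, which are enclosed in parentheses after the SU option. If you do not specify these parameters, they are estimated. By default, the method of percentiles given by Slifker and Shapiro (1980) is used, as described for the S_B distribution, with the data percentiles matched to the standard normal percentiles $-3z$, $-z$, $z$, and $3z$ under the transformation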

$z = \gamma + \delta \sinh^{-1} ( \frac{x - \theta}{\sigma} )$

The following values are computed from the data percentiles:

$\begin{aligned} m &= x_{3z} - x_{z} \\ n &= x_{-z} - x_{-3z} \\ p &= x_{z} - x_{-z} \end{aligned}$

It was demonstrated by Slifker and Shapiro (1980) that

$\frac{mn}{p^2} \begin{cases} > 1 & \text{for any } S_U \text{ distribution} \\ < 1 & \text{for any } S_B \text{ distribution} \\ = 1 & \text{for any } S_L \text{ (lognormal) distribution} \end{cases}$

A tolerance interval around one is used to discriminate among the three families with this ratio criterion. You can specify the tolerance with the FITTOLERANCE= option (specified in parentheses after the SU option). The default tolerance is 0.01. Assuming that the criterion satisfies the inequality

$\frac{mn}{p^2} > 1 + \text{tolerance}$

the parameters of the S_U distribution are computed using the explicit formulas derived by Slifker and Shapiro (1980).

If you specify FITMETHOD = MOMENTS (in parentheses after the SU option), the method of moments is used to estimate the parameters. If you specify FITMETHOD = MLE (in parentheses after the SU option), the method of maximum likelihood is used to estimate the parameters. Note that maximum likelihood estimates may not always exist. Refer to Bowman and Shenton (1983) for a discussion of methods for fitting Johnson distributions.

Lognormal Distribution

The fitted density function is

$p(x) = \begin{cases} \frac{h \times 100\%}{\sigma\sqrt{2\pi}\,(x - \theta)} \exp\left(-\frac{(\log(x-\theta)-\zeta)^2}{2\sigma^2}\right) & \text{for } x > \theta \\ 0 & \text{for } x \leq \theta \end{cases}$


where
		 $\theta$ = threshold parameter
		 $\zeta$ = scale parameter ($-\infty < \zeta < \infty$)
		 $\sigma$ = shape parameter ($\sigma > 0$)
		 h = width of histogram interval
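The lognormal curve can be sketched in Python as below, assuming `h * 100` for the h × 100% factor; the function name and the parameter values are hypothetical. Following this section's notation, `zeta` is the scale parameter and `sigma` the shape parameter. The cross-check relies on the fact that log(X - θ) is normal with mean ζ and standard deviation σ, so the curve equals that normal density evaluated at log(x - θ), divided by x - θ.

```python
import math
from statistics import NormalDist


def lognormal_curve(x, theta, zeta, sigma, h):
    """Fitted lognormal curve on the percent scale; zero for x <= theta.

    zeta is the scale parameter and sigma the shape parameter in this notation.
    """
    if x <= theta:
        return 0.0
    u = math.log(x - theta)
    coef = h * 100.0 / (sigma * math.sqrt(2 * math.pi) * (x - theta))
    return coef * math.exp(-((u - zeta) ** 2) / (2 * sigma ** 2))


# Cross-check against the normal density of log(x - theta).
x, theta, zeta, sigma, h = 3.0, 0.5, 0.2, 0.7, 2.0
direct = lognormal_curve(x, theta, zeta, sigma, h)
via_normal = h * 100.0 * NormalDist(zeta, sigma).pdf(math.log(x - theta)) / (x - theta)
print(direct, via_normal)
```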

The threshold parameter $\theta$ must be less than the minimum data value. You can specify $\theta$ with the THRESHOLD= lognormal-option. By default, $\theta=0$. If you specify THETA=EST, a maximum likelihood estimate is computed for $\theta$. You can specify $\zeta$ and $\sigma$ with the SCALE= and SHAPE= lognormal-options, respectively. By default, the procedure calculates maximum likelihood estimates for these parameters.

Note: The lognormal distribution is also referred to as the S_L distribution in the Johnson system of distributions.

Note: This book uses $\sigma$ to denote the shape parameter of the lognormal distribution, whereas $\sigma$ is used to denote the scale parameter of the beta, exponential, gamma, normal, and Weibull distributions. The use of $\sigma$ to denote the lognormal shape parameter is based on the fact that $\frac{1}{\sigma}(\log(X-\theta)-\zeta)$ has a standard normal distribution if X is lognormally distributed.

Normal Distribution

The fitted density function is

$p(x) = \frac{h \times 100\%}{\sigma\sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2\right) \quad \text{for } -\infty < x < \infty$



where
		 $\mu$ = mean
		 $\sigma$ = standard deviation ($\sigma > 0$)
		 h = width of histogram interval

You can specify $\mu$ and $\sigma$ with the MU= and SIGMA= normal-options, respectively. By default, the procedure estimates $\mu$ with the sample mean and $\sigma$ with the sample standard deviation.
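A minimal Python sketch of the fitted normal curve with the default estimates follows, assuming `h * 100` for the h × 100% factor. It assumes the sample standard deviation uses the n - 1 divisor, which is what `statistics.stdev` computes; that divisor is an assumption of the sketch, not a statement about the procedure. Function names and data are hypothetical.

```python
import math
import statistics


def normal_curve(x, mu, sigma, h):
    """Fitted normal curve on the percent scale."""
    return h * 100.0 / (sigma * math.sqrt(2 * math.pi)) * math.exp(-0.5 * ((x - mu) / sigma) ** 2)


def default_normal_estimates(data):
    """Sample mean and sample standard deviation (n - 1 divisor)."""
    return statistics.mean(data), statistics.stdev(data)


data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up observations
mu_hat, sigma_hat = default_normal_estimates(data)
print(mu_hat, sigma_hat, normal_curve(mu_hat, mu_hat, sigma_hat, h=1.0))
```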

You can use the DATA step function PROBIT to compute normal quantiles and the DATA step function PROBNORM to compute normal probabilities.

Note: The normal distribution is also referred to as the S_N distribution in the Johnson system of distributions.

Weibull Distribution

The fitted density function is

$p(x) = \begin{cases} \frac{c\, h \times 100\%}{\sigma} \left(\frac{x - \theta}{\sigma}\right)^{c - 1} \exp\left(-\left(\frac{x - \theta}{\sigma}\right)^c\right) & \text{for } x > \theta \\ 0 & \text{for } x \leq \theta \end{cases}$


where
		 $\theta$ = threshold parameter
		 $\sigma$ = scale parameter ($\sigma > 0$)
		 c = shape parameter (c > 0)
		 h = width of histogram interval

The threshold parameter $\theta$ must be less than the minimum data value. You can specify $\theta$ with the THRESHOLD= Weibull-option. By default, $\theta=0$. If you specify THETA=EST, a maximum likelihood estimate is computed for $\theta$. You can specify $\sigma$ and c with the SCALE= and SHAPE= Weibull-options, respectively. By default, the procedure calculates maximum likelihood estimates for $\sigma$ and c.

The exponential distribution is a special case of the Weibull distribution where c=1.
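The special case can be confirmed numerically. The Python sketch below, with hypothetical function names and the assumed `h * 100` scaling, evaluates the Weibull curve with c = 1 next to the exponential curve for a few x > θ; the two agree.

```python
import math


def weibull_curve(x, theta, sigma, c, h):
    """Fitted Weibull curve on the percent scale; zero for x <= theta."""
    if x <= theta:
        return 0.0
    y = (x - theta) / sigma
    return c * h * 100.0 / sigma * y ** (c - 1) * math.exp(-(y ** c))


def exponential_curve(x, theta, sigma, h):
    """Fitted exponential curve on the percent scale; zero for x < theta."""
    if x < theta:
        return 0.0
    return h * 100.0 / sigma * math.exp(-(x - theta) / sigma)


# With c = 1 the Weibull curve coincides with the exponential curve for x > theta.
for x in (0.5, 1.0, 3.0):
    print(x, weibull_curve(x, 0.0, 2.0, 1.0, 1.0), exponential_curve(x, 0.0, 2.0, 1.0))
```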
