Parameter Estimation

The RELIABILITY Procedure

Maximum Likelihood Estimation

Maximum likelihood estimation of the parameters of a statistical model involves maximizing the likelihood or, equivalently, the log likelihood with respect to the parameters. The parameter values at which the maximum occurs are the maximum likelihood estimates of the model parameters. The likelihood is a function of the parameters and of the data.

Let x₁, x₂, ... , x_n be the observations in a random sample, including the failures and censoring times (if the data are censored). Let $f({\theta};x)$ be the probability density of failure time, $S({\theta};x)=Pr\{X\ge x\}$ be the reliability function, and $F({\theta};x)=Pr\{X\le x\}$ be the cumulative distribution function (CDF), where ${\theta}$ is the vector of parameters to be estimated, ${\theta}=(\theta_{1}, \theta_{2}, ... , \theta_{p})$. The probability density, reliability function, and CDF are determined by the specific distribution selected as a model for the data. The log likelihood is defined as

$L({\theta}) = \sum_{i}\log(f({\theta};x_{i})) + {\sum_{i}}^{'}\log(S({\theta};x_{i})) + {\sum_{i}}^{''}\log(F({\theta};x_{i})) + {\sum_{i}}^{'''}\log(F({\theta};x_{ui})-F({\theta};x_{li}))$

where

$\sum$ is the sum over failed units
${\sum}^{'}$ is the sum over right-censored units
${\sum}^{''}$ is the sum over left-censored units
${\sum}^{'''}$ is the sum over interval-censored units

and (x_li, x_ui) is the interval in which the ith unit is interval censored. Only the sums appropriate to the type of censoring in the data are included when the preceding equation is used.
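
As a rough illustration of how the four sums combine, the following Python sketch evaluates the log likelihood for a normal model. The function name `log_likelihood` and the tuple-based data layout are assumptions made for this example; they are not part of the RELIABILITY procedure.

```python
import math
from statistics import NormalDist

def log_likelihood(mu, sigma, data):
    """Censored-data log likelihood for a normal model (illustrative only).

    data is a list of tuples (hypothetical layout):
      ("fail", x), ("right", x), ("left", x), ("interval", lower, upper)
    """
    dist = NormalDist(mu, sigma)
    total = 0.0
    for obs in data:
        kind = obs[0]
        if kind == "fail":          # failed unit: log density
            total += math.log(dist.pdf(obs[1]))
        elif kind == "right":       # right censored: log reliability
            total += math.log(1.0 - dist.cdf(obs[1]))
        elif kind == "left":        # left censored: log CDF
            total += math.log(dist.cdf(obs[1]))
        elif kind == "interval":    # interval censored: log of CDF difference
            total += math.log(dist.cdf(obs[2]) - dist.cdf(obs[1]))
        else:
            raise ValueError(f"unknown observation type: {kind}")
    return total

# One failure and one right-censored unit, both at x = 0, under N(0, 1).
value = log_likelihood(0.0, 1.0, [("fail", 0.0), ("right", 0.0)])
```

Only the branches that match the censoring types present in the data contribute, which mirrors the remark that only the appropriate sums are included.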

The RELIABILITY procedure maximizes the log likelihood with respect to the parameters ${\theta}$ using a Newton-Raphson algorithm. The Newton-Raphson algorithm is an iterative method for computing the maximum of a function. On the rth iteration, the algorithm updates the parameter vector ${{\theta}}_{r}$ with

${{\theta}}_{r+1} = {{\theta}}_{r} - H^{-1}g$

where H is the Hessian (second derivative) matrix, and g is the gradient (first derivative) vector of the log likelihood function, both evaluated at the current value of the parameter vector. That is,

$g = [g_{j}] = \left. \left[ \frac{\partial L}{\partial \theta_{j}} \right] \right|_{{\theta}={\theta}_{r}}$

and

$H = [h_{ij}] = \left. \left[ \frac{\partial^2 L}{\partial\theta_{i}\partial\theta_{j}} \right] \right|_{{\theta}={\theta}_{r}}$

Iteration continues until the parameter estimates converge. The convergence criterion is

$|\theta^{r+1}_{i}-\theta^{r}_{i}| \leq c \;\; {\rm if} \;\; |\theta^{r+1}_{i}| < .01$

$\left|\frac{\theta^{r+1}_{i}-\theta^{r}_{i}}{\theta^{r+1}_{i}}\right| \leq c \;\; {\rm if} \;\; |\theta^{r+1}_{i}| \geq .01$

for all i = 1,2, ... ,p where c is the convergence criterion. The default value of c is 0.001, and it can be specified with the CONVERGE= option in the MODEL, PROBPLOT, RELATIONPLOT, and ANALYZE statements.
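
The update rule and stopping test can be sketched in Python for a one-parameter case. The example below is an assumption chosen for simplicity, not the procedure's implementation: it fits an exponential model with right censoring on the log scale, where $\mu = \log\theta$ and the log likelihood is $L(\mu) = -r\mu - Te^{-\mu}$, with r the number of failures (at least one) and T the total time on test. The function names are hypothetical.

```python
import math

def converged(new, old, c=0.001):
    """Absolute change for |new| < .01, relative change otherwise."""
    if abs(new) < 0.01:
        return abs(new - old) <= c
    return abs((new - old) / new) <= c

def fit_exponential_log_scale(failures, censored, c=0.001, max_iter=50):
    """Newton-Raphson for mu = log(theta); requires at least one failure."""
    r = len(failures)
    total_time = sum(failures) + sum(censored)
    mu = math.log(total_time / (r + len(censored)))  # crude starting value
    for _ in range(max_iter):
        g = -r + total_time * math.exp(-mu)   # gradient dL/dmu
        h = -total_time * math.exp(-mu)       # Hessian d2L/dmu2
        new_mu = mu - g / h                   # Newton-Raphson update
        if converged(new_mu, mu, c):
            return new_mu
        mu = new_mu
    raise RuntimeError("Newton-Raphson did not converge")

mu_hat = fit_exponential_log_scale([10.0, 20.0, 30.0], [40.0, 40.0])
theta_hat = math.exp(mu_hat)  # close to the closed form T / r = 140 / 3
```

For this model the maximum has a closed form, so the iterate can be checked against total time divided by the number of failures.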

After convergence by the above criterion, the quantity

$tc = \frac{g'H^{-1}g}{L}$

is computed. If tc > d then a warning is printed that the algorithm did not converge. tc is called the relative Hessian convergence criterion. The default value of d is .0001. You can specify other values for d with the CONVH= option. The relative Hessian criterion is useful in detecting the occasional case where no progress can be made in increasing the log-likelihood, yet the gradient g is not zero.

A location-scale model has a CDF of the form

$F(x) = G(\frac{x-\mu}{\sigma})$

where $\mu$ is the location parameter, $\sigma$ is the scale parameter, and G is a standardized form $(\mu=0, \sigma=1)$ of the cumulative distribution function. The parameter vector is ${\theta}=(\mu, \sigma)$. It is more convenient computationally to maximize log likelihoods that arise from location-scale models. If you specify a distribution from Table 30.37 that is not a location-scale model, it is transformed to a location-scale model by taking the natural (base e) logarithm of the response. If you specify the lognormal base 10 distribution, the logarithm (base 10) of the response is used. The Weibull, lognormal, and log-logistic distributions in Table 30.37 are not location-scale models. Table 30.38 shows the corresponding location-scale models that result from taking the logarithm of the response.

Maximum likelihood is the default method of estimating the location and scale parameters in the MODEL, PROBPLOT, RELATIONPLOT, and ANALYZE statements. If the Weibull distribution is specified, the logarithms of the responses are used to obtain maximum likelihood estimates $(\hat{\mu}, \hat{\sigma})$ of the location and scale parameters of the extreme value distribution. The maximum likelihood estimates $(\hat{\alpha}, \hat{\beta})$ of the Weibull scale and shape parameters are computed as $\hat{\alpha}=\exp(\hat{\mu})$ and $\hat{\beta}=1/\hat{\sigma}$.

Regression Models

In a regression model, the location parameter for the ith observation of a location-scale model is a linear function of parameters:

$\mu_{i} = {x_{i}}'{{\beta}}$

where x_i is a vector of explanatory variables for the ith observation determined by the experimental setup and ${\beta}$ is a vector of parameters to be estimated.

You can specify a regression model using the MODEL statement. For example, if you want to relate the lifetimes of electronic parts in a test to operating temperature using the Arrhenius relationship, then an appropriate model might be

$\mu_{i} = \beta_{0} + x_{i}\beta_{1}$

where x_i= 1000/(T_i+273.15), and T_i is the centigrade temperature at which the ith unit is tested. Here, x_i' =[ 1 x_i ].

There are two types of explanatory variables: continuous variables and class (or classification) variables. Continuous variables represent physical quantities, such as temperature or voltage, and they must be numeric. Continuous explanatory variables are sometimes called covariates. Class variables identify classification levels and are declared in the CLASS statement. These are also referred to as categorical, dummy, qualitative, discrete, or nominal variables. Class variables can be either character or numeric. The values of class variables are called levels. For example, the class variable BATCH could have levels 'batch1' and 'batch2' to identify items from two production batches. An indicator (0-1) variable is generated for each level of a class variable and is used as an explanatory variable. See Nelson (1990, p. 277) for an example using an indicator variable in the analysis of accelerated life test data. In a model, an explanatory variable that is not declared in a CLASS statement is assumed to be continuous.

By default, all regression models automatically contain an intercept term; that is, the model is of the form

$\mu_{i} = \beta_{0} + \beta_{1}x_{i1} + ...$

where $\beta_{0}$ does not have an explanatory variable multiplier. The intercept term can be excluded from the model by specifying the NOINT option in the MODEL statement.

For numerical stability, continuous explanatory variables are centered and scaled internally to the procedure. This transforms the parameters ${\beta}$ in the original model to a new set of parameters. The parameter estimates ${\beta}$ and covariances are transformed back to the original scale before reporting, so that the parameters should be interpreted in terms of the originally specified model. Covariates that are indicator variables, that is, those specified in a CLASS statement, are not centered and scaled.

Initial values of the regression parameters used in the Newton-Raphson method are computed by ordinary least squares. The parameters ${\beta}$ and the scale parameter $\sigma$ are jointly estimated by maximum likelihood, taking a logarithmic transformation of the responses, if necessary, to get a location-scale model.

The generalized gamma distribution is fit using the log lifetimes. The regression parameters ${\beta}$, the scale parameter $\sigma$, and the shape parameter $\lambda$ are jointly estimated.

The Weibull distribution shape parameter estimate is computed as $\hat{\beta}=1/\hat{\sigma}$ , where $\sigma$ is the scale parameter from the corresponding extreme value distribution. The Weibull scale parameter $\hat{\alpha_{i}}=\exp({x_{i}}'\hat{{\beta}})$ is not computed by the procedure. Instead, the regression parameters ${\beta}$ and the shape $\beta$ are reported.

In a model with a single continuous explanatory variable x, you can use the RELATION= option in the MODEL statement to specify a transformation that is applied to the variable before model fitting. Table 30.45 shows the available transformations.

Table 30.45: Variable Transformations

Relation	Transformed variable
ARRHENIUS (Nelson parameterization)	1000/(x+273.15)
ARRHENIUS2 (activation energy parameterization)	11605/(x+273.15)
POWER	log(x)
LINEAR	x
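
The transformations in Table 30.45 can be written out as a small Python sketch. The dictionary `TRANSFORMS` and the helper `location` are hypothetical names for this illustration, and the POWER transformation is assumed here to use the natural logarithm.

```python
import math

# Transformations from Table 30.45, keyed by the RELATION= keyword.
TRANSFORMS = {
    "ARRHENIUS": lambda x: 1000.0 / (x + 273.15),    # Nelson parameterization
    "ARRHENIUS2": lambda x: 11605.0 / (x + 273.15),  # activation energy parameterization
    "POWER": lambda x: math.log(x),                  # natural log assumed
    "LINEAR": lambda x: x,
}

def location(beta0, beta1, x, relation="LINEAR"):
    """Location parameter mu = beta0 + beta1 * transformed(x)."""
    return beta0 + beta1 * TRANSFORMS[relation](x)

# A test temperature of 26.85 degrees centigrade is 300 K, so x = 1000/300.
x_arrhenius = TRANSFORMS["ARRHENIUS"](26.85)
```

The coefficients passed to `location` would come from a fitted model; no particular values are implied here.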

Stable Parameters

The location and scale parameters $(\mu, \sigma)$ are estimated by maximizing the likelihood function by numerical methods, as described previously. An alternative parameterization that may have better numerical properties for heavy censoring is $(\eta, \sigma)$ , where $\eta = \mu + z_{p}\sigma$ and z_p is the pth quantile of the standardized distribution. See Meeker and Escobar (1998, p. 90) and Doganaksoy and Schmee (1993) for more details on alternate parameterizations.

By default, RELIABILITY estimates a value of z_p from the data that will improve the numerical properties of the estimation. You can also specify values of p from which the value of z_p will be computed with the PSTABLE= option in the ANALYZE, PROBPLOT, RELATIONPLOT, or MODEL statements. Note that a value of p=0.632 for the Weibull and extreme value and p=0.5 for all other distributions will give z_p=0 and the parameterization will then be the usual location-scale parameterization.

All estimates and related statistics are reported in terms of the location and scale parameters $(\mu, \sigma)$. If you specify the ITPRINT option in the ANALYZE, PROBPLOT, or RELATIONPLOT statement, a table showing the values of p, $\eta$, $\sigma$, and the last evaluation of the gradient and Hessian for these parameters is produced.

Covariance Matrix

An estimate of the covariance matrix of the maximum likelihood estimators (MLEs) of the parameters ${\theta}$ is given by the inverse of the negative of the matrix of second derivatives of the log likelihood, evaluated at the final parameter estimates:

${\Sigma} = [\sigma_{ij}] = -H^{-1} = -[ \frac{\partial^2L} {\partial\theta_{i}\partial\theta_{j}} ]^{-1}_{{\theta}={\hat{\theta}}}$

The negative of the matrix of second derivatives is called the (observed) Fisher information matrix. The diagonal term $\sigma_{ii}$ is an estimate of the variance of $\hat{\theta}_{i}$. Estimates of standard errors of the MLEs are provided by

${\rm SE}_{\theta_{i}} = \sqrt{\sigma_{ii}}$

An estimator of the correlation matrix is

$R = [ \frac{\sigma_{ij}}{\sqrt{\sigma_{ii}\sigma_{jj}}} ]$

The covariance matrix for the Weibull distribution parameter estimators is computed by a first-order approximation from the covariance matrix of the estimators of the corresponding extreme value parameters $(\mu, \sigma)$ as

${\rm Var}(\hat{\alpha}) = [\exp(\hat{\mu})]^2{\rm Var}(\hat{\mu})$

${\rm Var}(\hat{\beta}) = \frac{{\rm Var}(\hat{\sigma})}{\hat{\sigma}^4}$

${\rm Cov}(\hat{\alpha},\hat{\beta}) = -\frac{\exp(\hat{\mu})}{\hat{\sigma}^2}{\rm Cov}(\hat{\mu},\hat{\sigma})$

For the regression model, the variance of the Weibull shape parameter estimator $\hat{\beta}$ is computed from the variance of the estimator of the extreme value scale parameter $\sigma$ as shown previously. The covariance of the regression parameter estimator $\hat{\beta}_{i}$ and the Weibull shape parameter estimator $\hat{\beta}$ is computed in terms of the covariance between $\hat{\beta}_{i}$ and $\hat{\sigma}$ as

${\rm Cov}(\hat{\beta}_{i}, \hat{\beta}) = -\frac{{\rm Cov}(\hat{\beta}_{i},\hat{\sigma})}{\hat{\sigma}^2}$
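
The first-order (delta-method) approximation for the Weibull parameters can be sketched in Python as follows. It uses the derivatives $\partial\alpha/\partial\mu = \exp(\mu)$ and $\partial\beta/\partial\sigma = -1/\sigma^2$; the function name and argument layout are assumptions for illustration.

```python
import math

def weibull_from_extreme_value(mu, sigma, var_mu, var_sigma, cov_mu_sigma):
    """Map extreme value estimates and covariances to Weibull scale and shape."""
    alpha = math.exp(mu)            # Weibull scale
    beta = 1.0 / sigma              # Weibull shape
    var_alpha = alpha ** 2 * var_mu
    var_beta = var_sigma / sigma ** 4
    cov_alpha_beta = -alpha / sigma ** 2 * cov_mu_sigma
    return alpha, beta, var_alpha, var_beta, cov_alpha_beta

# Hypothetical input values, chosen only to exercise the formulas.
result = weibull_from_extreme_value(0.0, 0.5, 0.04, 0.01, 0.002)
```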

Confidence Intervals for Distribution Parameters

Table 30.46 shows the method of computation of approximate two-sided $\gamma \times 100\%$ confidence limits for distribution parameters. The default value of confidence is $\gamma = 0.95$. Other values of confidence are specified using the CONFIDENCE= option. In Table 30.46, $K_{\gamma}$ represents the $(1+\gamma)/2 \times 100\%$ percentile of the standard normal distribution, and $\hat{\mu}$ and $\hat{\sigma}$ are the MLEs of the location and scale parameters for the normal, extreme value, and logistic distributions. For the lognormal, Weibull, and log-logistic distributions, $\hat{\mu}$ and $\hat{\sigma}$ represent the MLEs of the corresponding location and scale parameters of the location-scale distribution that results when the logarithm of the lifetime is used as the response. For the Weibull distribution, $\mu$ and $\sigma$ are the location and scale parameters of the extreme value distribution for the logarithm of the lifetime. ${\rm SE}_{\hat{\theta}}$ denotes the standard error of the MLE of $\theta$, computed as the square root of the appropriate diagonal element of the inverse of the Fisher information matrix.

Table 30.46: Confidence Limit Computation

	Parameters
Distribution	Location	Scale	Shape

Normal	$\mu_{L}=\hat{\mu}-K_{\gamma}{\rm (SE}_{\hat{\mu}})$	$\sigma_{L}=\hat{\sigma}/\exp[K_{\gamma}{\rm (SE}_{\hat{\sigma}})/\hat{\sigma}]$
	$\mu_{U}=\hat{\mu}+K_{\gamma}{\rm (SE}_{\hat{\mu}})$	$\sigma_{U}=\hat{\sigma}\exp[K_{\gamma}{\rm (SE}_{\hat{\sigma}})/\hat{\sigma}]$

Lognormal	$\mu_{L}=\hat{\mu}-K_{\gamma}{\rm (SE}_{\hat{\mu}})$	$\sigma_{L}=\hat{\sigma}/\exp[K_{\gamma}{\rm (SE}_{\hat{\sigma}})/\hat{\sigma}]$
	$\mu_{U}=\hat{\mu}+K_{\gamma}{\rm (SE}_{\hat{\mu}})$	$\sigma_{U}=\hat{\sigma}\exp[K_{\gamma}{\rm (SE}_{\hat{\sigma}})/\hat{\sigma}]$

Lognormal	$\mu_{L}=\hat{\mu}-K_{\gamma}{\rm (SE}_{\hat{\mu}})$	$\sigma_{L}=\hat{\sigma}/\exp[K_{\gamma}{\rm (SE}_{\hat{\sigma}})/\hat{\sigma}]$
(base 10)	$\mu_{U}=\hat{\mu}+K_{\gamma}{\rm (SE}_{\hat{\mu}})$	$\sigma_{U}=\hat{\sigma}\exp[K_{\gamma}{\rm (SE}_{\hat{\sigma}})/\hat{\sigma}]$

Extreme Value	$\mu_{L}=\hat{\mu}-K_{\gamma}{\rm (SE}_{\hat{\mu}})$	$\sigma_{L}=\hat{\sigma}/\exp[K_{\gamma}{\rm (SE}_{\hat{\sigma}})/\hat{\sigma}]$
	$\mu_{U}=\hat{\mu}+K_{\gamma}{\rm (SE}_{\hat{\mu}})$	$\sigma_{U}=\hat{\sigma}\exp[K_{\gamma}{\rm (SE}_{\hat{\sigma}})/\hat{\sigma}]$

Weibull		$\alpha_{L}=\exp[\hat{\mu}-K_{\gamma}{\rm (SE}_{\hat{\mu}})]$	$\beta_{L}=\exp[-K_{\gamma}{\rm (SE}_{\hat{\sigma}})/\hat{\sigma}]/\hat{\sigma}$
		$\alpha_{U}=\exp[\hat{\mu}+K_{\gamma}{\rm (SE}_{\hat{\mu}})]$	$\beta_{U}=\exp[K_{\gamma}{\rm (SE}_{\hat{\sigma}})/\hat{\sigma}]/\hat{\sigma}$

Exponential		$\alpha_{L}=\exp[\hat{\mu}-K_{\gamma}{\rm (SE}_{\hat{\mu}})]$
		$\alpha_{U}=\exp[\hat{\mu}+K_{\gamma}{\rm (SE}_{\hat{\mu}})]$

Logistic	$\mu_{L}=\hat{\mu}-K_{\gamma}{\rm (SE}_{\hat{\mu}})$	$\sigma_{L}=\hat{\sigma}/\exp[K_{\gamma}{\rm (SE}_{\hat{\sigma}})/\hat{\sigma}]$
	$\mu_{U}=\hat{\mu}+K_{\gamma}{\rm (SE}_{\hat{\mu}})$	$\sigma_{U}=\hat{\sigma}\exp[K_{\gamma}{\rm (SE}_{\hat{\sigma}})/\hat{\sigma}]$

Log-logistic	$\mu_{L}=\hat{\mu}-K_{\gamma}{\rm (SE}_{\hat{\mu}})$	$\sigma_{L}=\hat{\sigma}/\exp[K_{\gamma}{\rm (SE}_{\hat{\sigma}})/\hat{\sigma}]$
	$\mu_{U}=\hat{\mu}+K_{\gamma}{\rm (SE}_{\hat{\mu}})$	$\sigma_{U}=\hat{\sigma}\exp[K_{\gamma}{\rm (SE}_{\hat{\sigma}})/\hat{\sigma}]$

Generalized		$\sigma_{L}=\hat{\sigma}/\exp[K_{\gamma}{\rm (SE}_{\hat{\sigma}})/\hat{\sigma}]$	$\mu_{L}=\hat{\lambda}-K_{\gamma}{\rm (SE}_{\hat{\lambda}})$
gamma		$\sigma_{U}=\hat{\sigma}\exp[K_{\gamma}{\rm (SE}_{\hat{\sigma}})/\hat{\sigma}]$	$\mu_{U}=\hat{\lambda}+K_{\gamma}{\rm (SE}_{\hat{\lambda}})$
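
The location and scale limits in Table 30.46 follow two patterns: a symmetric interval for the location and a multiplicative interval for the scale. A minimal Python sketch, with hypothetical function names and inputs, is:

```python
import math
from statistics import NormalDist

def normal_k(confidence=0.95):
    """K_gamma: the (1 + gamma)/2 x 100% percentile of the standard normal."""
    return NormalDist().inv_cdf((1.0 + confidence) / 2.0)

def location_limits(mu_hat, se_mu, confidence=0.95):
    """Symmetric limits: mu_hat -/+ K * SE."""
    k = normal_k(confidence)
    return mu_hat - k * se_mu, mu_hat + k * se_mu

def scale_limits(sigma_hat, se_sigma, confidence=0.95):
    """Multiplicative limits: sigma_hat / factor and sigma_hat * factor."""
    k = normal_k(confidence)
    factor = math.exp(k * se_sigma / sigma_hat)
    return sigma_hat / factor, sigma_hat * factor

mu_lo, mu_hi = location_limits(10.0, 1.0)
sig_lo, sig_hi = scale_limits(2.0, 0.5)
```

The multiplicative form keeps both scale limits positive, which a symmetric interval would not guarantee.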

Regression Parameters

Approximate $\gamma \times 100\%$ confidence limits for the regression parameter $\beta_{i}$ are given by

$\beta_{iL}=\hat{\beta}_{i} - K_{\gamma}({\rm SE}_{\hat{\beta}_{i}})$

$\beta_{iU}=\hat{\beta}_{i} + K_{\gamma}({\rm SE}_{\hat{\beta}_{i}})$

Percentiles

The maximum likelihood estimate of the p ×100% percentile x_p for the extreme value, normal, and logistic distributions is given by

$\hat{x}_{p} = \hat{\mu} + z_{p}\hat{\sigma}$

where z_p=G^-1(p), G is the standardized CDF shown in Table 30.47, and $(\hat{\mu},\hat{\sigma})$ are the maximum likelihood estimates of the location and scale parameters of the distribution. The maximum likelihood estimate of the percentile t_p for the Weibull, lognormal, and log-logistic distributions is given by

$\hat{t}_{p} = \exp[\hat{\mu} + z_{p}\hat{\sigma}]$

where z_p=G^-1(p), and G is the standardized CDF of the location-scale model corresponding to the logarithm of the response. For the lognormal (base 10) distribution,

$\hat{t}_{p} = 10^{[\hat{\mu} + z_{p}\hat{\sigma}]}$

Table 30.47: Standardized Cumulative Distribution Functions

Distribution	Location-Scale Distribution	Location-Scale CDF

Weibull	Extreme Value	1 - exp[-exp(z)]

Lognormal	Normal	$\int_{-\infty}^z\frac{1}{\sqrt{2\pi}}\exp(-\frac{u^2}2)du$

Log-logistic	Logistic	[exp(z)/(1+exp(z))]

Confidence Intervals

The variance of the MLE of the p×100% percentile for the normal, extreme value, or logistic distribution is

$Var(\hat{x}_{p}) = Var(\hat{\mu}) + z_{p}^2Var(\hat{\sigma}) + 2z_{p}Cov(\hat{\mu},\hat{\sigma})$

Two-sided approximate $100\gamma\%$ confidence limits for x_p are

$x_{pL} = \hat{x}_{p} - K_{\gamma}\sqrt{Var(\hat{x}_{p})}$

$x_{pU} = \hat{x}_{p} + K_{\gamma}\sqrt{Var(\hat{x}_{p})}$

where $K_{\gamma}$ represents the $(1+\gamma)/2 \times 100\%$ percentile of the standard normal distribution.

The limits for the lognormal, Weibull, or log-logistic distributions are

$t_{pL} = \exp(\hat{x}_{p} - K_{\gamma}\sqrt{Var(\hat{x}_{p})})$

$t_{pU} = \exp(\hat{x}_{p} + K_{\gamma}\sqrt{Var(\hat{x}_{p})})$

where x_p refers to the percentile of the corresponding location-scale distribution (normal, extreme value, or logistic) for the logarithm of the lifetime. For the lognormal (base 10) distribution,

$t_{pL} = 10^{(\hat{x}_{p} - K_{\gamma}\sqrt{Var(\hat{x}_{p})})}$

$t_{pU} = 10^{(\hat{x}_{p} + K_{\gamma}\sqrt{Var(\hat{x}_{p})})}$
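
A short Python sketch of the percentile estimate and its limits is given below for the normal and lognormal cases. The variance used is that of $\hat{\mu} + z_{p}\hat{\sigma}$, including the covariance term $2z_{p}Cov(\hat{\mu},\hat{\sigma})$. The function name and the `log_response` flag are assumptions for this example; for a Weibull model, z_p would instead come from the extreme value CDF, z_p = log(-log(1-p)).

```python
import math
from statistics import NormalDist

def percentile_limits(p, mu, sigma, var_mu, var_sigma, cov,
                      confidence=0.95, log_response=False):
    """Return (lower, estimate, upper) for the p x 100% percentile."""
    std = NormalDist()
    z_p = std.inv_cdf(p)                          # standardized normal quantile
    x_p = mu + z_p * sigma
    var_x = var_mu + z_p ** 2 * var_sigma + 2.0 * z_p * cov
    k = std.inv_cdf((1.0 + confidence) / 2.0)     # K_gamma
    half_width = k * math.sqrt(var_x)
    lower, upper = x_p - half_width, x_p + half_width
    if log_response:                              # lognormal: back-transform
        return math.exp(lower), math.exp(x_p), math.exp(upper)
    return lower, x_p, upper

# Hypothetical estimates; the median (p = 0.5) has z_p = 0.
lo, est, hi = percentile_limits(0.5, 5.0, 2.0, 0.04, 0.01, 0.005)
```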

Reliability Function

For the extreme value, normal, and logistic distributions shown in Table 30.47, the maximum likelihood estimate of the reliability function R(x) = Pr{X>x} is given by

$\hat{R}(x)=1-G(\frac{x-\hat{\mu}}{\hat{\sigma}})$

The MLE of the CDF is $\hat{F}(x)=1-\hat{R}(x)$ .

Confidence Intervals

Let $\hat{u}=\frac{x-\hat{\mu}}{\hat{\sigma}}$. The variance of $\hat{u}$ is approximated by

$Var(\hat{u})\approx \frac{Var(\hat{\mu})+\hat{u}^2Var(\hat{\sigma})+ 2\hat{u}Cov(\hat{\mu},\hat{\sigma})}{\hat{\sigma}^2}$

Two-sided approximate $\gamma \times 100\%$ confidence intervals for R(x) are computed as

$R_{L}(x)=\hat{R}(u_{2})$

$R_{U}(x)=\hat{R}(u_{1})$

where

$u_{1}=\hat{u} - K_{\gamma}\sqrt{Var(\hat{u})}$

$u_{2}=\hat{u} + K_{\gamma}\sqrt{Var(\hat{u})}$

and $K_{\gamma}$ represents the $(1+\gamma)/2 \times 100\%$ percentile of the standard normal distribution.

The corresponding limits for the CDF are

F_L(x) = 1-R_U(x)

F_U(x) = 1-R_L(x)

Limits for the Weibull, lognormal, and log-logistic reliability function R(t) are the same as those for the corresponding extreme value, normal, or logistic reliability R(y), where y = log(t).
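
The reliability limits can be sketched in Python for a normal model, where the standardized CDF is the standard normal CDF. The function name and the numeric inputs are hypothetical; the steps follow the formulas for u, Var(u), u₁, and u₂ given above.

```python
import math
from statistics import NormalDist

def reliability_limits(x, mu, sigma, var_mu, var_sigma, cov, confidence=0.95):
    """Return (R_L, R_hat, R_U) for a normal model at the point x."""
    std = NormalDist()
    u = (x - mu) / sigma
    var_u = (var_mu + u ** 2 * var_sigma + 2.0 * u * cov) / sigma ** 2
    k = std.inv_cdf((1.0 + confidence) / 2.0)   # K_gamma
    u1 = u - k * math.sqrt(var_u)
    u2 = u + k * math.sqrt(var_u)
    r_hat = 1.0 - std.cdf(u)
    r_lower = 1.0 - std.cdf(u2)   # larger u gives smaller reliability
    r_upper = 1.0 - std.cdf(u1)
    return r_lower, r_hat, r_upper

r_lo, r_est, r_hi = reliability_limits(5.0, 5.0, 2.0, 0.04, 0.01, 0.005)
```

The CDF limits then follow as 1 minus the opposite reliability limit.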

Estimation with the Binomial and Poisson Distributions

In addition to estimating the parameters of the distributions in Table 30.37, you can estimate parameters, compute confidence limits, compute predicted values and prediction limits, and compute chi-squared tests for differences in groups for the binomial and Poisson distributions using the ANALYZE statement. Specify either BINOMIAL or POISSON in the DISTRIBUTION statement to use one of these distributions. The ANALYZE statement options available for the binomial and Poisson distributions are given in Table 30.5. See "Analysis of Binomial Data" for an example of an analysis of binomial data.

Binomial Distribution

If r is the number of successes and n is the number of trials in a binomial experiment, then the maximum likelihood estimator of the probability p in the binomial distribution in Table 30.39 is computed as

$\hat{p} = r/n$

Two-sided $\gamma \times 100\%$ confidence limits for p are computed as in Johnson, Kotz, and Kemp (1992, p. 130):

$p_{L}= \frac{\nu_{1}F[(1-\gamma)/2; \nu_{1}, \nu_{2}] } {\nu_{2} + \nu_{1}F[(1-\gamma)/2; \nu_{1}, \nu_{2}] }$

with $\nu_{1}=2r$ and $\nu_{2}=2(n-r+1)$ and

$p_{U}= \frac{\nu_{1}F[(1+\gamma)/2; \nu_{1}, \nu_{2}] } {\nu_{2} + \nu_{1}F[(1+\gamma)/2; \nu_{1}, \nu_{2}] }$

with $\nu_{1}=2(r+1)$ and $\nu_{2}=2(n-r)$, where $F[\gamma;\nu_{1},\nu_{2}]$ is the $\gamma \times 100\%$ percentile of the F distribution with $\nu_{1}$ degrees of freedom in the numerator and $\nu_{2}$ degrees of freedom in the denominator.

You can compute a sample size required to estimate p within a specified tolerance w with probability $\gamma$. Nelson (1982, p. 206) gives the following formula for the approximate sample size:

$n \approx \hat{p}(1-\hat{p})(\frac{K_{\gamma}}w)^2$

where $K_{\gamma}$ is the $(1+\gamma)/2 \times 100\%$ percentile of the standard normal distribution. The formula is based on the normal approximation for the distribution of $\hat{p}$. Nelson recommends using this formula if np > 10 and np(1-p) > 10. The value of $\gamma$ used for computing confidence limits is used in the sample size computation. The default value of confidence is $\gamma = 0.95$. Other values of confidence are specified using the CONFIDENCE= option. You specify a tolerance of number with the TOLERANCE(number) option.
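
The sample size formula is simple enough to check by hand. The Python sketch below uses a hypothetical function name, and rounding the result up to a whole number of trials is an assumption of this example rather than something stated for the procedure.

```python
import math
from statistics import NormalDist

def binomial_sample_size(p_hat, tolerance, confidence=0.95):
    """Approximate n = p(1 - p)(K / w)^2 from the normal approximation."""
    k = NormalDist().inv_cdf((1.0 + confidence) / 2.0)   # K_gamma
    return p_hat * (1.0 - p_hat) * (k / tolerance) ** 2

# Estimate p near 0.1 to within 0.02 with 95% probability.
n_needed = math.ceil(binomial_sample_size(0.1, 0.02))
```

With these inputs the formula gives roughly 864.3, so 865 trials after rounding up.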

The predicted number of successes X in a future sample of size m, based on the previous estimate of p, is computed as

$\hat{X}=m(r/n) = m\hat{p}$

Two-sided approximate $\gamma \times 100\%$ prediction limits are computed as in Nelson (1982, p. 208). The prediction limits are the solutions X_L and X_U of

$X_{U}/m = [(r+1)/n]F[(1+\gamma)/2;2(r+1),2X_{U}]$

$m/(X_{L}+1)=(n/r)F[(1+\gamma)/2;2(X_{L}+1),2r]$

where $F[\gamma;\nu_{1},\nu_{2}]$ is the $\gamma \times 100\%$ percentile of the F distribution with $\nu_{1}$ degrees of freedom in the numerator and $\nu_{2}$ degrees of freedom in the denominator. You request predicted values and prediction limits for a future sample of size number with the PREDICT(number) option.

You can test groups of binomial data for equality of their binomial probability using the ANALYZE statement. You specify the K groups to be compared with a group variable having K levels.

Nelson (1982, p. 450) discusses a chi-squared test statistic for comparing K binomial proportions for equality. Suppose there are r_i successes in n_i trials for i = 1,2, ... , K. The grouped estimate of the binomial probability is

$\hat{p}=\frac{r_{1}+r_{2}+ ... +r_{K}}{n_{1}+n_{2}+ ... +n_{K}}$

The chi-squared test statistic for testing the hypothesis p₁ = p₂ = ... = p_K against $p_{i}\neq p_{j}$ for some i and j is

$Q=\sum_{i=1}^K\frac{(r_{i}-n_{i}\hat{p})^2}{n_{i}\hat{p}(1-\hat{p})}$

The statistic Q has an asymptotic chi-squared distribution with K-1 degrees of freedom. The RELIABILITY procedure computes the contribution of each group to Q, the value of Q, and the p-value for Q based on the limiting chi-squared distribution with K-1 degrees of freedom. If you specify the PREDICT option, predicted values and prediction limits are computed for each group, as well as for the pooled group. The p-value is defined as $p_{0}=1-\chi^2_{K-1}[Q]$, where $\chi^2_{K-1}[x]$ is the chi-squared CDF with K-1 degrees of freedom, and Q is the observed value. A test of the hypothesis of equal binomial probabilities among the groups with significance level $\alpha$ is

$p_{0} > \alpha$ : do not reject the equality hypothesis
$p_{0} \le \alpha$ : reject the equality hypothesis
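
The test statistic and p-value can be sketched in Python using only the standard library. The chi-squared CDF below is a simple series evaluation of the regularized incomplete gamma function, chosen for this illustration and suitable only for moderate values of Q. The function names are hypothetical, and the pooled estimate is assumed to lie strictly between 0 and 1.

```python
import math

def chi2_cdf(x, df):
    """Chi-squared CDF via the series for the regularized lower incomplete gamma."""
    if x <= 0.0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)       # next term: z^n / (a (a+1) ... (a+n))
        total += term
        n += 1
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))

def binomial_equality_test(successes, trials):
    """Return (Q, per-group contributions, p-value) for K binomial groups."""
    p_hat = sum(successes) / sum(trials)          # grouped estimate
    contributions = [
        (r - n * p_hat) ** 2 / (n * p_hat * (1.0 - p_hat))
        for r, n in zip(successes, trials)
    ]
    q = sum(contributions)
    p_value = 1.0 - chi2_cdf(q, len(trials) - 1)  # K - 1 degrees of freedom
    return q, contributions, p_value

# Two hypothetical groups: 10/100 and 20/100 successes.
q, parts, p0 = binomial_equality_test([10, 20], [100, 100])
```

Here p0 falls just below 0.05, so at a significance level of 0.05 the equality hypothesis would be rejected for these made-up counts.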

Poisson Distribution

You can use the ANALYZE statement to model data using the Poisson distribution. The data consist of a count Y of occurrences in a "length" of observation T. The length T is typically an exposure time, but it can have other units, such as distance. The ANALYZE statement enables you to compute the rate of occurrences, confidence limits, and prediction limits.

An estimate of the rate $\lambda$ is computed as

$\hat{\lambda}=Y/T$

Two-sided $\gamma \times 100\%$ confidence limits for $\lambda$ are computed as in Nelson (1982, p. 201):

$\lambda_{L}=.5\chi^2[(1-\gamma)/2;2Y]/T$

$\lambda_{U}=.5\chi^2[(1+\gamma)/2;2(Y+1)]/T$

where $\chi^2[\delta;\nu]$ is the $\delta \times 100\%$ percentile of the chi-squared distribution with $\nu$ degrees of freedom.

You can compute a length T required to estimate $\lambda$ within a specified tolerance w with probability $\gamma$. Nelson (1982, p. 202) provides the following approximate formula:

$\hat{T}\approx\hat{\lambda}(\frac{K_{\gamma}}w)^2$

where $K_{\gamma}$ is the $(1+\gamma)/2 \times 100\%$ percentile of the standard normal distribution. The formula is based on the normal approximation for $\hat{\lambda}$ and is more accurate for larger values of $\lambda T$. Nelson recommends using the formula when $\lambda T > 10$. The value of $\gamma$ used for computing confidence limits is also used in the length computation. The default value of confidence is $\gamma = 0.95$. Other values of confidence are specified using the CONFIDENCE= option. You specify a tolerance of number with the TOLERANCE(number) option.

The predicted future number of occurrences in a length S is

$\hat{X}=(Y/T)S = \hat{\lambda}S$
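
The rate estimate, the approximate required length, and the predicted count are all direct computations. A minimal Python sketch with hypothetical function names and made-up inputs:

```python
from statistics import NormalDist

def poisson_rate(count, length):
    """Rate estimate: occurrences per unit length."""
    return count / length

def required_length(rate_hat, tolerance, confidence=0.95):
    """Approximate length T = lambda (K / w)^2 from the normal approximation."""
    k = NormalDist().inv_cdf((1.0 + confidence) / 2.0)   # K_gamma
    return rate_hat * (k / tolerance) ** 2

def predicted_count(count, length, future_length):
    """Predicted occurrences in a future length S."""
    return poisson_rate(count, length) * future_length

rate = poisson_rate(12, 1000.0)                 # 12 occurrences in 1000 hours
length_needed = required_length(rate, 0.005)    # tolerance on the rate
future = predicted_count(12, 1000.0, 500.0)     # next 500 hours
```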

Two-sided approximate $\gamma \times 100\%$ prediction limits are computed as in Nelson (1982, p. 203). The prediction limits are the solutions X_L and X_U of

$X_{U}/S = [(Y+1)/T]F[(1+\gamma)/2;2(Y+1),2X_{U}]$

$S/(X_{L}+1)=(T/Y)F[(1+\gamma)/2;2(X_{L}+1),2Y]$

where $F[\gamma;\nu_{1},\nu_{2}]$ is the $\gamma \times 100\%$ percentile of the F distribution with $\nu_{1}$ degrees of freedom in the numerator and $\nu_{2}$ degrees of freedom in the denominator. You request predicted values and prediction limits for a future exposure number with the PREDICT(number) option.

You can compute a chi-squared test statistic for comparing K Poisson rates for equality. You specify the K groups to be compared with a group variable having K levels.

Refer to Nelson (1982, p. 444) for more information on this test. Suppose that there are Y_i Poisson counts in lengths T_i for i = 1,2, ... , K and that the Y_i are independent. The grouped estimate of the Poisson rate is

$\hat{\lambda}=\frac{Y_{1}+Y_{2}+ ... +Y_{K}}{T_{1}+T_{2}+ ... +T_{K}}$

The chi-squared test statistic for testing the hypothesis $\lambda_{1}=\lambda_{2}= ... =\lambda_{K}$ against $\lambda_{i}\neq \lambda_{j}$ for some i and j is

$Q=\sum_{i=1}^K\frac{(Y_{i}-\hat{\lambda}T_{i})^2}{\hat{\lambda}T_{i}}$

The statistic Q has an asymptotic chi-squared distribution with K-1 degrees of freedom. The RELIABILITY procedure computes the contribution of each group to Q, the value of Q, and the p-value for Q based on the limiting chi-squared distribution with K-1 degrees of freedom. If you specify the PREDICT option, predicted values and prediction limits are computed for each group, as well as for the pooled group. The p-value is defined as $p_{0}=1-\chi^2_{K-1}[Q]$, where $\chi^2_{K-1}[x]$ is the chi-squared CDF with K-1 degrees of freedom and Q is the observed value. A test of the hypothesis of equal Poisson rates among the groups with significance level $\alpha$ is

$p_{0} > \alpha$ : do not reject the equality hypothesis
$p_{0} \le \alpha$ : reject the equality hypothesis

Least Squares Fit to the Probability Plot

Fitting to the probability plot by least squares is an alternative to maximum likelihood estimation of the parameters of a life distribution. Only the failure times are used. A least squares fit is computed using points (x_(i), m_i), where m_i=F^-1(a_i) and the a_i are the plotting positions used for the probability plot. The x_(i) are either the lifetimes for the normal, extreme value, or logistic distributions or the log lifetimes for the lognormal, Weibull, or log-logistic distributions. The ANALYZE, PROBPLOT, or RELATIONPLOT statement option FITTYPE=LSXY specifies the x_(i) as the dependent variable ('y-coordinate') and the m_i as the independent variable ('x-coordinate'). You can optionally reverse the quantities used as dependent and independent variables by specifying the FITTYPE=LSYX option.

Weibayes Estimation

Weibayes estimation is a method of performing a Weibull analysis when there are few or no failures. The FITTYPE=WEIBAYES option requests this method. The method of Nelson (1985) is used to compute a one-sided confidence interval for the Weibull scale parameter when the Weibull shape parameter is specified. Also refer to Abernethy (1996) for more discussion and examples. The Weibull shape parameter $\beta$ is assumed to be known and is specified to the procedure with the SHAPE=number option. Let T₁,T₂, ... ,T_n be the failure and censoring times, and let $r\ge 0$ be the number of failures in the data. If there are no failures (r=0), a lower $\gamma \times 100\%$ confidence limit for the Weibull scale parameter $\alpha$ is computed as

$\alpha_{L}=\{\sum_{i=1}^nT_{i}^{\beta}/[-\log(1-\gamma)]\}^{1/\beta}$

The default value of confidence is $\gamma = 0.95$. Other values of confidence are specified using the CONFIDENCE= option.

If $r\ge 1$, the MLE of $\alpha$ is given by

$\hat{\alpha}=[\sum_{i=1}^nT_{i}^{\beta}/r]^{1/\beta}$

and a lower $\gamma \times 100\%$ confidence limit for the Weibull scale parameter $\alpha$ is computed as

$\alpha_{L}=\hat{\alpha}[2r/\chi^2(\gamma, 2r+2)]^{1/\beta}$

where $\chi^2(\gamma, 2r+2)$ is the $\gamma \times 100\%$ percentile of a chi-square distribution with 2r+2 degrees of freedom. The procedure uses the specified value of $\beta$ and the computed value of $\alpha_{L}$ to compute distribution percentiles and the reliability function.
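
The two closed-form Weibayes quantities that need no chi-square percentile can be sketched in Python. The function names and the sample times are hypothetical; the lower limit for $r\ge 1$ is omitted because it requires a chi-square percentile, which the standard library does not provide.

```python
import math

def weibayes_lower_limit_no_failures(times, shape, confidence=0.95):
    """Lower confidence limit for the Weibull scale when r = 0."""
    total = sum(t ** shape for t in times)
    return (total / -math.log(1.0 - confidence)) ** (1.0 / shape)

def weibayes_scale_mle(times, shape, failures):
    """MLE of the Weibull scale for a known shape when r >= 1."""
    if failures < 1:
        raise ValueError("the MLE requires at least one failure")
    total = sum(t ** shape for t in times)
    return (total / failures) ** (1.0 / shape)

# Five units run to 100 hours with no failures, assumed shape of 2.
alpha_lower = weibayes_lower_limit_no_failures([100.0] * 5, 2.0)

# Three times, all failures, with shape 1 (exponential): the MLE is the mean.
alpha_hat = weibayes_scale_mle([10.0, 20.0, 30.0], 1.0, 3)
```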
