Computational Issues
Computational Method
In addition to numerous matrix-multiplication routines, PROC MIXED
frequently uses the sweep operator (Goodnight 1979) and the Cholesky
root (Golub and Van Loan 1989). The routines perform a modified W
transformation (Goodnight and Hemmerle 1979) for G-side
likelihood calculations and a direct method for R-side
likelihood calculations. For the Type III F-tests, PROC MIXED
uses the algorithm described in Chapter 30, "The GLM Procedure."
PROC MIXED uses a ridge-stabilized Newton-Raphson algorithm to
optimize either a full (ML) or residual (REML) likelihood function.
The Newton-Raphson algorithm is preferred to the EM algorithm
(Lindstrom and Bates 1988). PROC MIXED profiles the likelihood
with respect to the fixed effects and also with respect to the
residual variance whenever it appears reasonable to do so. The
residual profiling can be avoided by using the NOPROFILE option
of the PROC MIXED statement. PROC MIXED uses the MIVQUE0 method
(Rao 1972; Giesbrecht 1989) to compute initial values.
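As a point of reference, the following minimal sketch shows where these options appear in a call to PROC MIXED; the data set WORK.HEIGHTS and the variables HEIGHT, TRT, and BLOCK are hypothetical. METHOD=REML (the default) requests the residual likelihood, and NOPROFILE keeps the residual variance in the Newton-Raphson iterations rather than profiling it out.

   proc mixed data=work.heights method=reml noprofile;
      class block trt;
      model height = trt;
      random block;
   run;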
The likelihoods that PROC MIXED optimizes are usually well-defined
continuous functions with a single optimum. The Newton-Raphson
algorithm typically performs well and finds the optimum in a few
iterations. It is a quadratically converging algorithm, meaning
that the error of the approximation near the optimum is squared
at each iteration. The quadratic convergence property is evident
when the convergence criterion drops to zero by factors of ten
or more.
Table 41.9: Notation for Order Calculations

   Symbol | Number
   -------+---------------------------------
   p      | columns of X
   g      | columns of Z
   N      | observations
   q      | covariance parameters
   t      | maximum observations per subject
   S      | subjects
Using the notation from Table 41.9, the following are
estimates of the computational speed of the algorithms used in PROC
MIXED. For likelihood calculations, the crossproducts matrix
construction is of order N(p + g)² and the sweep operations are of
order (p + g)³. The first derivative calculations for parameters in
G are of order qg³ for ML and q(g³ + pg² + p²g)
for REML. If you specify a subject effect in the RANDOM statement
and if you are not using the REPEATED statement, then replace g by
g/S and q by qS in these calculations. The first derivative
calculations for parameters in R are of order qS(t³ + gt²
+ g²t) for ML and qS(t³ + (p + g)t² + (p² + g²)t) for
REML. For the second derivatives, replace q by q(q+1)/2 in the
first derivative expressions. When you specify both G- and
R-side parameters (that is, when you use both the RANDOM and
REPEATED statements), then additional calculations are required of
an order equal to the sum of the orders for G and R.
Considerable execution times may result in this case.
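For example, with hypothetical sizes p = 10, g = 50, and N = 1,000, the crossproducts matrix construction is of order

   N(p + g)² = 1,000 × 60² = 3,600,000

operations, and each set of sweep operations is of order (p + g)³ = 216,000.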
For further details about the computational techniques used in PROC
MIXED, refer to Wolfinger, Tobias, and Sall (1994).
Parameter Constraints
By default, some covariance parameters are assumed to satisfy certain
boundary constraints during the Newton-Raphson algorithm. For
example, variance components are constrained to be nonnegative and
autoregressive parameters are constrained to be between -1 and 1.
You can remove these constraints with the NOBOUND option in the
PARMS statement, but this may lead to estimates that produce an
infinite likelihood. You can also introduce or change boundary
constraints with the LOWERB= and UPPERB= options in the PARMS
statement.
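For illustration, the following sketch uses a hypothetical data set WORK.TRIAL with a single variance component for BLOCK plus the residual variance, so the PARMS statement supplies two initial values. The first run removes the boundary constraints with NOBOUND; the second instead imposes explicit lower bounds with LOWERB= (one value per covariance parameter, in order).

   proc mixed data=work.trial;
      class block trt;
      model y = trt;
      random block;
      parms (0.5) (2) / nobound;            /* no boundary constraints */
   run;

   proc mixed data=work.trial;
      class block trt;
      model y = trt;
      random block;
      parms (0.5) (2) / lowerb=1E-4,1E-4;   /* explicit lower bounds   */
   run;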
During the Newton-Raphson algorithm, a parameter may
be set equal to one of its boundary constraints for a few
iterations and then it may move away from the boundary.
You see a missing value in the Criterion column
of the "Iteration History" table whenever a boundary
constraint is dropped.
For some data sets the final estimate of a parameter may equal one
of its boundary constraints. This is usually not a cause for
concern, but it may lead you to consider a different model. For
instance, a variance component estimate can equal zero; in this
case, you may want to drop the corresponding random effect from the
model. However, be aware that changing the model in this fashion
can impact degrees of freedom calculations.
Convergence Problems
For some data sets, the Newton-Raphson algorithm can fail to
converge. Non-convergence can result from a number of causes,
including flat or ridged likelihood surfaces and ill-conditioned
data.
It is also possible for PROC MIXED to converge to a point
that is not the global optimum of the likelihood, although
this usually occurs only with the spatial covariance structures.
If you experience convergence problems, the following points may
be helpful:
- One useful tool is the PARMS statement, which lets you input
initial values for the covariance parameters and performs a grid
search over the likelihood surface.
- Sometimes the Newton-Raphson algorithm does not perform well
when two of the covariance parameters are on very different scales; that
is, they are several orders of magnitude apart. This is because the
Hessian matrix is processed jointly for the two parameters, and
elements of it corresponding to one of the parameters can become
close to internal tolerances in PROC MIXED. In this case, you can
improve stability by rescaling the effects in the model so that the
covariance parameters are on the same scale.
- Data values that are extremely large or extremely small in magnitude
can adversely affect results because of the internal tolerances in
PROC MIXED. Rescaling the data can improve stability.
- For stubborn problems, you may want to specify
ODS OUTPUT CovParms=data-set-name;
to output the "CovParms" table as a precautionary measure.
That way, if the run does not converge, you can read the final
parameter values back into a new run with the PARMSDATA= option in
the PARMS statement (see the sketch following this list).
- Fisher scoring can be more robust than Newton-Raphson to poor
MIVQUE(0) starting values. Specifying a SCORING= value of 5 or so
may help to recover from poor starting values.
- Tuning the singularity options SINGULAR=, SINGCHOL=, and
SINGRES= in the MODEL statement may improve the
stability of the optimization process.
- Tuning the MAXITER= and MAXFUNC= options in the PROC MIXED
statement can save resources. Also, the ITDETAILS option displays
the values of all of the parameters at each iteration.
- Using the NOPROFILE and NOBOUND options in the PROC MIXED statement
may help convergence, although they can produce unusual results.
- Although the CONVH convergence criterion usually gives the
best results, you may want to try CONVF or CONVG, possibly along
with the ABSOLUTE option.
- If the convergence criterion bottoms out at a relatively small
value such as 1E-7 but never gets less than 1E-8, you may want to
specify CONVH=1E-6 in the PROC MIXED statement to get results;
however, interpret the results with caution.
- An infinite likelihood during the iteration process means
that the Newton-Raphson algorithm has stepped into a region where
either the R or V matrix is nonpositive definite. This is
usually no cause for concern as long as iterations continue. If
PROC MIXED stops because of an infinite likelihood, recheck your
model to make sure that no observations from the same subject are
producing identical rows in R or V and that you have enough
data to estimate the particular covariance structure you have
selected. Any time that the final estimated likelihood is infinite,
subsequent results should be interpreted with caution.
- A nonpositive definite Hessian matrix can indicate a
surface saddlepoint or linear dependencies among the parameters.
- A warning message about the singularities of X
changing indicates that there is some linear dependency in the
estimate of X'V⁻¹X that is not found in X'X. This can adversely
affect the likelihood calculations and the optimization process.
If you encounter this problem, make sure that
your model specification is reasonable and that you have enough data
to estimate the particular covariance structure you have selected.
Rearranging effects in the MODEL statement so that the most
significant ones are first can help, because PROC MIXED sweeps the
estimate of X'V⁻¹X in the order of the MODEL effects
and the sweep is more stable if larger pivots are dealt with first.
If this does not help, specifying starting values with the PARMS
statement can place the optimization on a different and possibly
more stable path.
- Lack of convergence may indicate model misspecification or
a violation of the normality assumption.
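As a concrete sketch of two of these suggestions, the following runs use a hypothetical data set WORK.TRIAL. The first run performs a PARMS grid search over the BLOCK variance and saves the "CovParms" table with ODS OUTPUT; if it stops short of convergence, the second run reads the saved estimates back in as starting values through the PARMSDATA= option.

   proc mixed data=work.trial;
      class block trt;
      model y = trt;
      random block;
      parms (0.1 to 2 by 0.1) (1);     /* grid of starting values for Var(BLOCK); residual starts at 1 */
      ods output CovParms=work.cp;     /* save the covariance parameter estimates                      */
   run;

   proc mixed data=work.trial;
      class block trt;
      model y = trt;
      random block;
      parms / parmsdata=work.cp;       /* restart from the saved values                                */
   run;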
Memory
Let p be the number of columns in X, and let g be the
number of columns in Z. For large models, most of the
memory resources are required for holding symmetric matrices of
order p, g, and p + g. The approximate memory
requirement in bytes is
   40(p² + g²) + 32(p + g)²
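For example, with hypothetical sizes p = 20 and g = 100, the approximate requirement is

   40(20² + 100²) + 32(120)² = 416,000 + 460,800 = 876,800 bytes,

or a little under 1 MB.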
If you have a large model that exceeds the memory capacity of your
computer, see the suggestions listed under "Computing Time."
Computing Time
PROC MIXED is computationally intensive, and execution times can be
long. In addition to the CPU time used in collecting sums and
cross products and in solving the mixed model equations (as in PROC
GLM), considerable CPU time is often required to compute the
likelihood function and its derivatives. These latter computations
are performed for every Newton-Raphson iteration.
If you have a model that takes too long to run, the following
suggestions may be helpful:
- Examine the "Model Information" table to find out the
number of columns in the X and Z matrices. A large number
of columns in either matrix can greatly increase computing time.
You may want to eliminate some higher order effects if they are too
large.
- If you have a Z matrix with a lot of columns,
use the DDFM=BW option in the MODEL statement
to eliminate the time required for the containment
method.
- If possible, "factor out" a common effect
from the effects in the RANDOM statement and make it
the SUBJECT= effect. This creates a block-diagonal
G matrix and can often speed calculations.
- If possible, use the same or nested SUBJECT= effects in
all RANDOM and REPEATED statements.
- If your data set is very large, you may want to analyze
it in pieces. The BY statement can help implement this strategy.
- In general, specify random effects with a lot of levels
in the REPEATED statement and those with a few levels
in the RANDOM statement.
- The METHOD=MIVQUE0 option runs faster than either
the METHOD=REML or
METHOD=ML option because it is noniterative.
- You can specify known values for the covariance
parameters using the HOLD= or
NOITER option in the PARMS
statement or the GDATA= option
in the RANDOM statement. This eliminates the need for iteration.
- The LOGNOTE option in the PROC MIXED statement writes
periodic messages to the SAS log concerning the status
of the calculations. It can help you diagnose where the
slowdown is occurring.
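As an illustration of the "factor out a common effect" suggestion, the following sketch uses hypothetical effects A and BLOCK. The two RANDOM statements request an equivalent random-effects structure, but the SUBJECT= form produces a block-diagonal G matrix and is usually much faster to process.

   proc mixed data=work.example;
      class a block;
      model y = a;
      random a*block;              /* one large G matrix           */
   run;

   proc mixed data=work.example;
      class a block;
      model y = a;
      random a / subject=block;    /* equivalent, block-diagonal G */
   run;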