LSMEANS Statement

The GLM Procedure

LSMEANS effects < / options > ;

Least-squares means (LS-means) are computed for each effect listed in the LSMEANS statement. You may specify only classification effects in the LSMEANS statement; that is, effects that contain only classification variables. You may also specify options to perform multiple comparisons. In contrast to the MEANS statement, the LSMEANS statement performs multiple comparisons on interactions as well as main effects.

LS-means are predicted population margins; that is, they estimate the marginal means over a balanced population. In a sense, LS-means are to unbalanced designs as class and subclass arithmetic means are to balanced designs. Each LS-mean is computed as L'b for a certain column vector L, where b is the vector of parameter estimates (that is, the solution of the normal equations). For further information, see the section "Construction of Least-Squares Means".

Multiple effects can be specified in one LSMEANS statement, or multiple LSMEANS statements can be used, but they must all appear after the MODEL statement. For example,

   proc glm;
      class A B;
      model Y=A B A*B;
      lsmeans A B A*B;
   run;

LS-means are displayed for each level of the A, B, and A*B effects.

You can specify the following options in the LSMEANS statement after a slash:

ADJUST=BON

ADJUST=DUNNETT

ADJUST=SCHEFFE

ADJUST=SIDAK

ADJUST=SIMULATE <(simoptions)>

ADJUST=SMM | GT2

ADJUST=TUKEY

ADJUST=T

requests a multiple comparison adjustment for the p-values and confidence limits for the differences of LS-means. The ADJUST= option modifies the results of the TDIFF and PDIFF options; thus, if you specify neither the TDIFF nor the PDIFF option, the ADJUST= option has no effect. By default, PROC GLM analyzes all pairwise differences unless you specify ADJUST=DUNNETT, in which case PROC GLM analyzes all differences with a control level. The default is ADJUST=T, which means no adjustment for multiple comparisons.

The BON (Bonferroni) and SIDAK adjustments involve correction factors described in the "Multiple Comparisons" section and in Chapter 43, "The MULTTEST Procedure." When you specify ADJUST=TUKEY and your data are unbalanced, PROC GLM uses the approximation described in Kramer (1956) and identifies the adjustment as "Tukey-Kramer" in the results. Similarly, when you specify ADJUST=DUNNETT and the LS-means are correlated, PROC GLM uses the factor-analytic covariance approximation described in Hsu (1992) and identifies the adjustment as "Dunnett-Hsu" in the results. The preceding references also describe the SCHEFFE and SMM adjustments.

The SIMULATE adjustment computes the adjusted p-values from the simulated distribution of the maximum or maximum absolute value of a multivariate t random vector. The simulation estimates q, the true $(1-\alpha)$ th quantile, where $1 - \alpha$ is the confidence coefficient. The default $\alpha$ is the value of the ALPHA= option in the PROC GLM statement or 0.05 if that option is not specified. You can change this value with the ALPHA= option in the LSMEANS statement.

The number of samples for the SIMULATE adjustment is set so that the tail area for the simulated q is within a certain accuracy radius $\gamma$ of $1 - \alpha$ with an accuracy confidence of $100(1-\epsilon)$ %. In equation form,

$P(|F(\hat{q}) - (1-\alpha)| \leq \gamma) = 1 - \epsilon$

where $\hat{q}$ is the simulated q and F is the true distribution function of the maximum; refer to Edwards and Berry (1987) for details. By default, $\gamma$ = 0.005 and $\epsilon$ = 0.01 so that the tail area of $\hat{q}$ is within 0.005 of 0.95 with 99% confidence. You can specify the following simoptions in parentheses after the ADJUST=SIMULATE option.

ACC=value: specifies the target accuracy radius $\gamma$ of a $100(1-\epsilon)$ % confidence interval for the true probability content of the estimated $(1-\alpha)$ th quantile. The default value is ACC=0.005. Note that, if you also specify the CVADJUST simoption, then the actual accuracy radius will probably be substantially less than this target.
CVADJUST: specifies that the quantile should be estimated by the control variate adjustment method of Hsu and Nelson (1998) instead of simply as the quantile of the simulated sample. Specifying the CVADJUST option typically has the effect of significantly reducing the accuracy radius $\gamma$ of a $100(1-\epsilon)$ % confidence interval for the true probability content of the estimated $(1-\alpha)$ th quantile. The control-variate-adjusted quantile estimate takes roughly twice as long to compute, but it is typically much more accurate than the sample quantile.
EPS=value: specifies the value $\epsilon$ for a $100(1-\epsilon)$ % confidence interval for the true probability content of the estimated $(1-\alpha)$ th quantile. The default value for the accuracy confidence is 99%, corresponding to EPS=0.01.
NSAMP=n: specifies the sample size for the simulation. By default, n is set based on the values of the target accuracy radius $\gamma$ and accuracy confidence $100(1-\epsilon)$ % for the true probability content of the estimated $(1-\alpha)$ th quantile. With the default values for $\gamma$ , $\epsilon$ , and $\alpha$ (0.005, 0.01, and 0.05, respectively), NSAMP=12604 by default.
REPORT: specifies that a report on the simulation should be displayed, including a listing of the parameters, such as $\gamma$ , $\epsilon$ , and $\alpha$ , as well as an analysis of various methods for estimating or approximating the quantile.
SEED=number: specifies a positive integer less than 2³¹ - 1. The value of the SEED= option is used to start the pseudo-random number generator for the simulation. The default is a value generated from reading the time of day from the computer's clock.
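
As an illustration of the ADJUST= option and its simoptions, the following sketch requests Tukey-adjusted and simulation-adjusted pairwise comparisons. It assumes the SASHELP.CARS sample data set that ships with SAS; the variables Origin and MPG_City and the simoption values shown are chosen only for illustration.

```sas
/* Sketch only: SASHELP.CARS, Origin, and MPG_City are illustrative assumptions */
proc glm data=sashelp.cars;
   class Origin;
   model MPG_City = Origin;
   /* Tukey adjustment of the pairwise p-values and confidence limits */
   lsmeans Origin / pdiff cl adjust=tukey;
   /* Simulation-based adjustment with a fixed seed, an explicit sample size,
      the control variate method, and a report on the simulation */
   lsmeans Origin / pdiff cl adjust=simulate(nsamp=20000 seed=12345 cvadjust report);
run;
quit;
```

Because the default seed is taken from the computer's clock, specifying SEED= is one way to make the simulated adjustment repeatable from run to run.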

ALPHA=p

specifies the level of significance p for 100(1-p)% confidence intervals. This option is useful only if you also specify the CL option, and, optionally, the PDIFF option. By default, p is equal to the value of the ALPHA= option in the PROC GLM statement or 0.05 if that option is not specified. This value is used to set the endpoints for confidence intervals for the individual means as well as for differences between means.

AT variable = value

AT (variable-list) = (value-list)

AT MEANS

enables you to modify the values of the covariates used in computing LS-means. By default, all covariate effects are set equal to their mean values for computation of standard LS-means. The AT option enables you to set the covariates to whatever values you consider interesting. For more information, see the section "Setting Covariate Values".
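
The three forms of the AT option can be sketched as follows. The example assumes the SASHELP.CLASS sample data set, with Sex as the classification effect and Height and Age as covariates; the covariate values 60 and 13 are arbitrary choices for illustration.

```sas
/* Sketch only: SASHELP.CLASS and the chosen covariate values are illustrative assumptions */
proc glm data=sashelp.class;
   class Sex;
   model Weight = Sex Height Age;
   lsmeans Sex;                            /* default: covariates at their mean values */
   lsmeans Sex / at Height=60;             /* Height set to 60 */
   lsmeans Sex / at (Height Age)=(60 13);  /* Height set to 60 and Age set to 13 */
   lsmeans Sex / at means;                 /* covariates explicitly set to their means */
run;
quit;
```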

BYLEVEL

requests that PROC GLM process the OM data set by each level of the LS-mean effect in question. For more details, see the entry for the OM option in this section.

CL

requests confidence limits for the individual LS-means. If you specify the PDIFF option, confidence limits for differences between means are produced as well. You can control the confidence level with the ALPHA= option. Note that, if you specify an ADJUST= option, the confidence limits for the differences are adjusted for multiple inference but the confidence intervals for individual means are not adjusted.
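
For instance, the CL, PDIFF, and ALPHA= options might be combined as in the following sketch, which assumes the SASHELP.CARS sample data set and an arbitrary level of 0.10.

```sas
/* Sketch only: SASHELP.CARS and ALPHA=0.10 are illustrative assumptions */
proc glm data=sashelp.cars;
   class Origin;
   model MPG_City = Origin;
   /* 90% confidence limits for each LS-mean and for each pairwise difference */
   lsmeans Origin / cl pdiff alpha=0.10;
run;
quit;
```

No ADJUST= option or difftype is given here, so the limits for the differences are not adjusted for multiple comparisons.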

COV

includes variances and covariances of the LS-means in the output data set specified in the OUT= option in the LSMEANS statement. Note that this is the covariance matrix for the LS-means themselves, not the covariance matrix for the differences between the LS-means, which is used in the PDIFF computations. If you omit the OUT= option, the COV option has no effect. When you specify the COV option, you can specify only one effect in the LSMEANS statement.

E

displays the coefficients of the linear functions used to compute the LS-means.
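
The E option is a convenient way to inspect the vector L in the expression L'b for each LS-mean. The following sketch assumes the SASHELP.CLASS sample data set, with one classification effect and one covariate.

```sas
/* Sketch only: SASHELP.CLASS and the model are illustrative assumptions */
proc glm data=sashelp.class;
   class Sex;
   model Weight = Sex Height;
   /* Display the coefficients L used to form each LS-mean of Sex */
   lsmeans Sex / e;
run;
quit;
```

For a model of this form, the displayed coefficient for the covariate Height would be expected to equal its mean, since covariates are set to their mean values by default.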

E=effect

specifies an effect in the model to use as an error term. The procedure uses the mean square for the effect as the error mean square when calculating estimated standard errors (requested with the STDERR option) and probabilities (requested with the STDERR, PDIFF, or TDIFF option). Unless you specify STDERR, PDIFF or TDIFF, the E= option is ignored. By default, if you specify the STDERR, PDIFF, or TDIFF option and do not specify the E= option, the procedure uses the error mean square for calculating standard errors and probabilities.

ETYPE=n

specifies the type (1, 2, 3, or 4, corresponding to Type I, II, III, and IV tests, respectively) of the E= effect. If you specify the E= option but not the ETYPE= option, the highest type computed in the analysis is used. If you omit the E= option, the ETYPE= option has no effect.
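
The following sketch shows the syntax of the E= and ETYPE= options together. It assumes the SASHELP.CARS sample data set; the choice of the interaction as the error term is made only to demonstrate the syntax and is not a recommendation for these data.

```sas
/* Sketch only: SASHELP.CARS and the choice of error term are illustrative assumptions */
proc glm data=sashelp.cars;
   class Origin DriveTrain;
   model MPG_City = Origin DriveTrain Origin*DriveTrain;
   /* Use the Type III mean square for Origin*DriveTrain as the error mean square */
   lsmeans Origin / stderr pdiff e=Origin*DriveTrain etype=3;
run;
quit;
```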

NOPRINT

suppresses the normal display of results from the LSMEANS statement. This option is useful when an output data set is created with the OUT= option in the LSMEANS statement.

OBSMARGINS

OM

specifies a potentially different weighting scheme for computing LS-means coefficients. The standard LS-means have equal coefficients across classification effects; however, the OM option changes these coefficients to be proportional to those found in the input data set. For more information, see the section "Changing the Weighting Scheme".

The BYLEVEL option modifies the observed-margins LS-means. Instead of computing the margins across the entire data set, the procedure computes separate margins for each level of the LS-mean effect in question. The resulting LS-means are actually equal to raw means in this case. If you specify the BYLEVEL option, it disables the AT option.
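
The three weighting schemes can be compared side by side, as in the following sketch. It assumes the SASHELP.CARS sample data set, in which the DriveTrain levels are not equally frequent.

```sas
/* Sketch only: SASHELP.CARS and the model are illustrative assumptions */
proc glm data=sashelp.cars;
   class Origin DriveTrain;
   model MPG_City = Origin DriveTrain;
   lsmeans Origin;                /* standard LS-means: equal coefficients across DriveTrain */
   lsmeans Origin / om;           /* coefficients proportional to the observed margins */
   lsmeans Origin / om bylevel;   /* margins computed separately for each level of Origin */
run;
quit;
```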

OUT=SAS-data-set

creates an output data set that contains the values, standard errors, and, optionally, the covariances (see the COV option) of the LS-means. For more information, see the "Output Data Sets" section.
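
A minimal sketch of the OUT= option used with the COV and NOPRINT options follows. It assumes the SASHELP.CARS sample data set, and the output data set name WORK.LSM is hypothetical.

```sas
/* Sketch only: SASHELP.CARS and the data set name WORK.LSM are illustrative assumptions */
proc glm data=sashelp.cars;
   class Origin;
   model MPG_City = Origin;
   /* COV permits only one effect; NOPRINT suppresses the displayed LSMEANS results */
   lsmeans Origin / out=work.lsm cov noprint;
run;
quit;

/* List the LS-means, standard errors, and covariances that were saved */
proc print data=work.lsm;
run;
```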

PDIFF<=difftype>

requests that p-values for differences of the LS-means be produced. The optional difftype specifies which differences to display. Possible values for difftype are ALL, CONTROL, CONTROLL, and CONTROLU. The ALL value requests all pairwise differences, and it is the default. The CONTROL value requests the differences with a control that, by default, is the first level of each of the specified LS-mean effects.

To specify which levels of the effects are the controls, list the quoted formatted values in parentheses after the keyword CONTROL. For example, if the effects A, B, and C are class variables, each having two levels, '1' and '2', the following LSMEANS statement specifies the '1' '2' level of A*B and the '2' '1' level of B*C as controls:

   lsmeans A*B B*C / pdiff=control('1' '2', '2' '1');

For multiple effect situations such as this one, the ordering of the list is significant, and you should check the output to make sure that the controls are correct.

Two-tailed tests and confidence limits are associated with the CONTROL difftype. For one-tailed results, use either the CONTROLL or CONTROLU difftype. The CONTROLL difftype tests whether the noncontrol levels are significantly less than the control; the lower confidence limits for the control minus the noncontrol levels are considered to be minus infinity. Conversely, the CONTROLU difftype tests whether the noncontrol levels are significantly greater than the control; the upper confidence limits for the noncontrol levels minus the control are considered to be infinity.
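
For a single effect, the control-based difftypes might be used as in the following sketch. It assumes the SASHELP.CARS sample data set, in which Origin has the formatted values 'Asia', 'Europe', and 'USA'.

```sas
/* Sketch only: SASHELP.CARS and the choice of control level are illustrative assumptions */
proc glm data=sashelp.cars;
   class Origin;
   model MPG_City = Origin;
   /* Two-tailed comparisons of each level with the control level 'USA' */
   lsmeans Origin / pdiff=control('USA') cl;
   /* One-tailed comparisons against the default control (the first level of Origin) */
   lsmeans Origin / pdiff=controlu cl;
run;
quit;
```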

The default multiple comparisons adjustment for each difftype is shown in the following table.

*difftype*	Default ADJUST=
Not specified	T
ALL	TUKEY
CONTROL	DUNNETT
CONTROLL	DUNNETT
CONTROLU	DUNNETT

If no difftype is specified, the default for the ADJUST= option is T (that is, no adjustment); for PDIFF=ALL, ADJUST=TUKEY is the default; in all other instances, the default value for the ADJUST= option is DUNNETT. If there is a conflict between the PDIFF= and ADJUST= options, the ADJUST= option takes precedence.

For example, in order to compute one-sided confidence limits for differences with a control, adjusted according to Dunnett's procedure, the following statements are equivalent:

   lsmeans Treatment / pdiff=controll cl;
   lsmeans Treatment / pdiff=controll cl adjust=dunnett;

SLICE = fixed-effect

SLICE = (fixed-effects)

specifies effects within which to test for differences between interaction LS-mean effects. This can produce what are known as tests of simple effects (Winer 1971). For example, suppose that A*B is significant and you want to test for the effect of A within each level of B. The appropriate LSMEANS statement is

   lsmeans A*B / slice=B;

This code tests for the simple main effects of A for B, which are calculated by extracting the appropriate rows from the coefficient matrix for the A*B LS-means and using them to form an F-test as performed by the CONTRAST statement.
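
The parenthesized form of the SLICE= option can be sketched as follows, slicing an interaction by each of its two factors in turn. The example assumes the SASHELP.CARS sample data set and that every combination of Origin and DriveTrain occurs in the data.

```sas
/* Sketch only: SASHELP.CARS and the model are illustrative assumptions */
proc glm data=sashelp.cars;
   class Origin DriveTrain;
   model MPG_City = Origin DriveTrain Origin*DriveTrain;
   /* Slice the interaction LS-means by Origin and, separately, by DriveTrain */
   lsmeans Origin*DriveTrain / slice=(Origin DriveTrain);
run;
quit;
```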

SINGULAR=number

tunes the estimability checking. If ABS(L - LH) > C×number for any row, then L is declared nonestimable. H is the $(X'X)^{-}X'X$ matrix, and C is ABS(L) except for rows where L is zero, and then it is 1. The default value for the SINGULAR= option is $10^{-4}$. Values for the SINGULAR= option must be between 0 and 1.

STDERR

produces the standard error of the LS-means and the probability level for the hypothesis H₀: LS-mean = 0.

TDIFF

produces the t values for all hypotheses H₀: LS-mean(i) = LS-mean(j) and the corresponding probabilities.
