MODEL Statement
- MODEL response-effect=< design-effects >< / options > ;
PROC CATMOD requires a MODEL statement. You can specify the
following in a MODEL statement:
- response-effect
- can be either a single
variable, a crossed effect with two or more variables joined
by asterisks, or _F_. The _F_ specification indicates
that the response functions and their estimated covariance
matrix are to be read directly into the procedure. The
response-effect indicates the dependent variables that
determine the response categories (the columns of the
underlying contingency table).
- design-effects
- specify potential sources of variation (such as main effects
and interactions) in the model. Thus, these effects
determine the number of model parameters, as well as the
interpretation of such parameters. In addition, if there is
no POPULATION statement, PROC CATMOD uses these variables to
determine the populations (the rows of the underlying
contingency table). When fitting the model, PROC CATMOD
adjusts the independent effects in the model for all other
independent effects in the model.
Design-effects can be any of those described in
the section "Specification of Effects", or they can be defined by specifying the
actual design matrix, enclosed in parentheses (see
the "Specifying the Design Matrix Directly" section). In addition, you can use the keyword _RESPONSE_
alone or as part of an effect. Effects cannot
be nested within _RESPONSE_, so effects of the form
A(_RESPONSE_) are invalid.
For more information, see the "Log-Linear Model Analysis" section and
the "Repeated Measures Analysis" section.
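As a minimal sketch of the _RESPONSE_ keyword in a log-linear model, the following statements fit an independence model to a two-way table. The data set name survey, the variable names, and the counts are hypothetical and serve only as illustration. The LOGLIN statement, which is documented elsewhere in PROC CATMOD, defines the effects that _RESPONSE_ stands for.

```sas
/* Hypothetical 2 x 2 table: one row per cell; count holds the cell frequency */
data survey;
   input r $ s $ count;
   datalines;
yes yes 30
yes no  10
no  yes 15
no  no  25
;
run;

proc catmod data=survey;
   weight count;                       /* cell frequencies                     */
   model r*s=_response_ / noprofile;   /* r*s defines the response categories  */
   loglin r s;                         /* _RESPONSE_ = main effects of r and s */
run;
quit;
```

Because the response-effect is the crossed effect r*s and no independent variables appear, the whole table forms a single population.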
Some examples of MODEL statements are
model r=a b; | main effects only |
model r=a b a*b; | main effects with interaction |
model r=a b(a); | nested effect |
model r=a|b; | complete factorial |
model r=a b(a=1) b(a=2); | nested-by-value effects |
model r*s=_response_; | log-linear model |
model r*s=a _response_(a); | nested repeated measurement factor |
model _f_=_response_; | direct input of the response functions |
The relationship between these specifications and the
structure of the design matrix X is described in
the "Generation of the Design Matrix" section.
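To show one of these MODEL statements in a complete step, here is a hedged sketch that fits a complete factorial model to cell counts. The data set trial, its variables, and its counts are invented for illustration.

```sas
/* Hypothetical cell counts: treatment x sex populations, binary response */
data trial;
   input treatment $ sex $ response $ count;
   datalines;
A F yes 21
A F no   9
A M yes 14
A M no  16
B F yes 12
B F no  18
B M yes  8
B M no  22
;
run;

proc catmod data=trial;
   weight count;                     /* each row is a cell, not one subject    */
   model response=treatment|sex;     /* bar form of: treatment sex treatment*sex */
run;
quit;
```

With no POPULATION statement, the four combinations of treatment and sex determine the populations, and the two levels of response determine the response categories.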
The following table summarizes the options available in the
MODEL statement.
Task | Options |
Specify details of computation |
Generates maximum likelihood estimates | ML |
Generates weighted least-squares estimates | GLS, WLS |
Omits intercept term from the model | NOINT |
Adds a number to each cell frequency | ADDCELL= |
Averages main effects across response functions | AVERAGED |
Specifies the convergence criterion for maximum likelihood | EPSILON= |
Specifies the number of iterations for maximum likelihood | MAXITER= |
Request additional computation and tables |
Estimated correlation matrix of estimates | CORRB |
Covariance matrix of response functions | COV |
Estimated covariance matrix of estimates | COVB |
Two-way frequency tables | FREQ |
One-way frequency tables | ONEWAY |
Predicted values | PRED=, PREDICT |
Probability estimates | PROB |
Crossproducts matrix | XPX |
Title | TITLE= |
Suppress output |
Design matrix | NODESIGN |
Iterations for maximum likelihood | NOITER |
Parameter estimates | NOPARM |
Population and response profiles | NOPROFILE |
_RESPONSE_ matrix | NORESPONSE |
The following list describes these options in alphabetical order.
- ADDCELL=number
-
adds number to the frequency count in each cell,
where number is any positive number.
This option has no effect on maximum likelihood analysis;
it is used only for weighted least-squares analysis.
- AVERAGED
-
specifies that dependent variable effects can be modeled
and that independent variable main effects are averaged
across the response functions in a population.
For further information on the effect of using
(or not using) the AVERAGED option, see
the "Generation of the Design Matrix" section.
Direct input of the design matrix or specification
of the _RESPONSE_ keyword in the MODEL statement
automatically induces an AVERAGED model type.
- CORRB
-
displays the estimated correlation matrix of the parameter
estimates.
- COV
-
displays S_i, which is the covariance matrix of
the response functions for each population.
- COVB
-
displays the estimated covariance matrix of the parameter
estimates.
- EPSILON=number
-
specifies the convergence criterion for the
maximum likelihood estimation of the parameters. The
iterative estimation process stops when the proportional
change in the log likelihood is less than number, or
after the number of iterations specified by the MAXITER=
option, whichever comes first. By default, EPSILON=1E-8.
- FREQ
-
produces the two-way frequency table for the
cross-classification of populations by responses.
- MAXITER=number
-
specifies the maximum number of iterations used for
the maximum likelihood estimation of the parameters.
By default, MAXITER=20.
- ML
-
computes maximum likelihood estimates. This option is
available when generalized logits are used, or for the
special case of a single two-level dependent variable where
cumulative logits or adjacent category logits are used. For
generalized logits (the default response functions), ML is
the default estimation method.
- NODESIGN
-
suppresses the display of the design matrix X.
- NOINT
-
suppresses the intercept term in the model.
- NOITER
-
suppresses the display of parameter estimates and other
information at each iteration of a maximum likelihood analysis.
- NOPARM
-
suppresses the display of the estimated parameters and
the statistics for testing that each parameter is zero.
- NOPREDVAR
-
suppresses the display of the variable levels in
tables requested with the PRED= option.
- NOPRINT
-
suppresses the normal display of results.
The NOPRINT option is useful when you only want to create
output data sets with the OUT= or OUTEST= option in the
RESPONSE statement. A NOPRINT
option is also available in the PROC CATMOD statement.
Note that this option
temporarily disables the Output Delivery
System (ODS); see Chapter 15, "Using the Output Delivery System," for
more information.
- NOPROFILE
-
suppresses the display of the population
profiles and the response profiles.
- NORESPONSE
-
suppresses the display of the _RESPONSE_ matrix for
log-linear models. For further information, see
the "Log-Linear Model Design Matrices" section.
- ONEWAY
-
produces a one-way table of frequencies for each variable
used in the analysis. This table is useful in determining
the order of the observed levels for each variable.
- PREDICT
- PRED=FREQ | PROB
-
displays the observed and predicted values of the response
functions for each population, together with their standard
errors and the residuals (observed - predicted). In
addition, if the response functions are the standard ones
(generalized logits), then the PRED=FREQ option specifies
the computation and display of predicted cell frequencies,
while PRED=PROB (or just PREDICT) specifies the computation
and display of predicted cell probabilities.
The OUT= data set always contains the predicted
probabilities. If the response functions are the
generalized logits, the predicted cell probabilities are
output unless the option PRED=FREQ is specified, in which
case the predicted cell frequencies are output.
- PROB
-
produces the two-way table of probability estimates for
the cross-classification of populations by responses.
These estimates sum to one across the
response categories for each population.
- TITLE='title'
-
displays the title at the top of certain pages
of output that correspond to this MODEL statement.
- WLS
- GLS
-
computes weighted least-squares estimates. This type of
estimation is also called generalized-least-squares
estimation. For response functions other than the default
(of generalized logits), WLS is the default estimation
method.
- XPX
-
displays X'S^{-1}X,
the crossproducts matrix for the normal equations.
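The options above can be combined after the slash in a MODEL statement. The following sketch, which assumes a hypothetical data set trial with invented counts, runs one maximum likelihood analysis and one weighted least-squares analysis of the same model. The option values are illustrative, not recommendations.

```sas
/* Hypothetical cell counts: treatment x sex populations, binary response */
data trial;
   input treatment $ sex $ response $ count;
   datalines;
A F yes 21
A F no   9
A M yes 14
A M no  16
B F yes 12
B F no  18
B M yes  8
B M no  22
;
run;

/* Maximum likelihood: tighter iteration limits, extra matrices, no iteration log */
proc catmod data=trial;
   weight count;
   model response=treatment sex
         / ml epsilon=1e-6 maxiter=50 covb corrb noiter pred=freq;
run;
quit;

/* Weighted least squares: ADDCELL= affects only this type of analysis */
proc catmod data=trial;
   weight count;
   model response=treatment sex
         / wls addcell=0.5 freq prob oneway;
run;
quit;
```

Two separate PROC steps are used here only to keep the two analyses clearly apart.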
If you specify the design matrix directly, adjacent rows of
the matrix must be separated by a comma, and the matrix must
have q × s rows, where s is the number of
populations and q is the number of response functions per
population. The first q rows correspond to the response
functions for the first population, the second set of q
rows corresponds to the functions for the second population,
and so forth. The following is an example using direct
specification of the design matrix.
proc catmod;
   model R=(1 0,
            1 1,
            1 2,
            1 3);
run;
These statements are appropriate for the case of one
population and for R with five levels (generating four
response functions), so that q × s = 4 × 1 = 4. These
statements are also appropriate for a situation with
two populations and two response functions per population,
giving q × s = 2 × 2 = 4 rows of the design matrix. (To induce
more than one population, the POPULATION statement is
needed.)
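The second situation can be sketched as follows. The data set visits, the variable group, and the counts are hypothetical. R has three levels, so the default generalized logits yield q=2 response functions per population, and the POPULATION statement induces s=2 populations.

```sas
/* Hypothetical counts: 2 groups (populations) x 3 response levels */
data visits;
   input group $ r $ count;
   datalines;
a low  12
a mid  18
a high 10
b low   8
b mid  15
b high 17
;
run;

proc catmod data=visits;
   weight count;
   population group;        /* induces the two populations            */
   model r=(1 0,            /* population 1, response function 1      */
            1 1,            /* population 1, response function 2      */
            1 2,            /* population 2, response function 1      */
            1 3);           /* population 2, response function 2      */
run;
quit;
```

The row comments follow the ordering rule stated above: the first q rows belong to the first population, the next q rows to the second.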
When you input the design matrix directly, you also have the
option of specifying that any subsets of the parameters be
tested for equality to zero. Indicate each subset by
specifying the appropriate column numbers of the design
matrix, followed by an equal sign and a label (24 characters
or fewer, in quotes) that describes the subset.
Adjacent subsets are separated by a comma, and the entire
specification is enclosed in parentheses and placed after
the design matrix. For example,
proc catmod;
   population Group Time;
   model R=(1  1  0  0,
            1  1  0  1,
            1  1  0  2,
            1  0  1  0,
            1  0  1  1,
            1  0  1  2,
            1 -1 -1  0,
            1 -1 -1  1,
            1 -1 -1  2) (1  ='Intercept',
                         2 3='Group main effect',
                         4  ='Linear effect of Time');
run;
The preceding statements are appropriate when Group
and Time each have three levels, and R is dichotomous.
The POPULATION statement induces nine populations, and q=1
(since R is dichotomous), so q × s = 1 × 9 = 9.
If you input the design matrix directly but do not specify
any subsets of the parameters to be tested, then PROC CATMOD
tests the effect of MODEL | MEAN, which represents the
significance of the model beyond what is explained by an
overall mean. For the previous example, the MODEL | MEAN
effect is the same as that obtained by specifying
(2 3 4='model|mean');
at the end of the MODEL statement.
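As a sketch of that equivalent specification, the following step repeats the previous design matrix and tests columns 2 through 4 as a single labeled subset. The data set study and its counts are hypothetical. Only the subset specification differs from the earlier example.

```sas
/* Hypothetical counts: 3 groups x 3 times (9 populations), dichotomous R */
data study;
   input Group Time R $ count @@;
   datalines;
1 1 y 12  1 1 n 18  1 2 y 15  1 2 n 15  1 3 y 19  1 3 n 11
2 1 y 10  2 1 n 20  2 2 y 13  2 2 n 17  2 3 y 16  2 3 n 14
3 1 y  8  3 1 n 22  3 2 y 11  3 2 n 19  3 3 y 13  3 3 n 17
;
run;

proc catmod data=study;
   weight count;
   population Group Time;
   model R=(1  1  0  0,
            1  1  0  1,
            1  1  0  2,
            1  0  1  0,
            1  0  1  1,
            1  0  1  2,
            1 -1 -1  0,
            1 -1 -1  1,
            1 -1 -1  2) (2 3 4='model|mean');   /* all non-intercept columns */
run;
quit;
```

Column 1 is the intercept, so the subset 2 3 4 covers every parameter beyond the overall mean.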
Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.