Background: The Underlying Model
The CATMOD procedure analyzes data that can be represented
by a two-dimensional contingency table.
The rows of the table correspond to populations (or samples)
formed on the basis of one or more independent variables.
The columns of the table correspond to observed responses
formed on the basis of one or more dependent variables.
The frequency in the (i,j)th cell is the number of subjects
in the ith population that have the jth response.
The frequencies in the table are assumed to follow a product
multinomial distribution, corresponding to a sampling design
in which a simple random sample is taken for each population.
The contingency table can be represented
as shown in Table 22.1.
Table 22.1: Contingency Table Representation
Sample | Response 1 | Response 2 | ... | Response r | Total |
1 | n11 | n12 | ... | n1r | n1 |
2 | n21 | n22 | ... | n2r | n2 |
... | ... | ... | ... | ... | ... |
s | ns1 | ns2 | ... | nsr | ns |
For each sample $i$, the probability of the $j$th response
($\pi_{ij}$) is estimated by the sample proportion,
$p_{ij} = n_{ij}/n_i$. The vector ($p$) of all such
proportions is then transformed into a vector of functions,
denoted by $F = F(p)$. If $\pi$ denotes the vector of true
probabilities for the entire table, then the functions of the
true probabilities, denoted by $F(\pi)$, are assumed to follow
a linear model

$$E_A(F) = F(\pi) = X\beta$$

where $E_A$ denotes asymptotic expectation, $X$
is the design matrix containing fixed constants, and
$\beta$ is a vector of parameters to be estimated.
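The step from cell frequencies to sample proportions to response functions can be sketched in a few lines of Python. This is only an illustration, not PROC CATMOD code: the 2 x 2 table `counts` is made-up data, the helper names are hypothetical, and the logit is chosen as the response function because it is one of the standard response functions for a binary response.

```python
import math

# Hypothetical 2 x 2 table: rows are populations (samples),
# columns are responses. These counts are invented for illustration.
counts = [
    [30, 70],  # population 1: n11, n12
    [45, 55],  # population 2: n21, n22
]


def sample_proportions(table):
    """Return p_ij = n_ij / n_i for every cell, row by row."""
    props = []
    for row in table:
        n_i = sum(row)  # row total n_i
        props.append([n_ij / n_i for n_ij in row])
    return props


def logits(props):
    """Return one logit per population, log(p_i1 / p_i2).

    Assumes a binary response (exactly two columns).
    """
    return [math.log(p[0] / p[1]) for p in props]


p = sample_proportions(counts)  # the vector p, arranged by population
F = logits(p)                   # the response functions F(p)
```

Each row of `p` sums to 1, and `F` holds one function value per population, which is the vector that the linear model is fitted to.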
PROC CATMOD provides two estimation methods:
- The maximum likelihood method estimates the parameters
of the linear model so as to maximize the value of the
joint multinomial likelihood function of the
responses. Maximum likelihood estimation is available
only for the standard response functions, logits and
generalized logits,
which are used for logistic regression analysis and
log-linear model analysis. For details of the theory,
refer to Bishop, Fienberg, and Holland (1975).
- The weighted least-squares method minimizes the
weighted residual sum of squares for the model. The
weights are contained in the inverse covariance matrix
of the functions F(p). According to
central limit theory, if the sample sizes within
populations are sufficiently large, the elements of
F and b (the estimate of $\beta$) are distributed
approximately as multivariate normal.
This allows the computation of statistics for testing
the goodness of fit of the model and the significance
of other sources of variation. For details of the
theory, refer to Grizzle, Starmer, and Koch (1969) or Koch
et al. (1977, Appendix 1). Weighted least-squares
estimation is available for all types of response
functions.
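As a rough illustration of the weighted least-squares idea, the sketch below fits the simplest possible model, a single common logit for all populations (a design matrix consisting of one column of ones). It is a minimal example under stated assumptions, not the procedure's implementation: the counts are invented, the function names are hypothetical, and the variance of each observed logit is taken from the usual large-sample (delta method) approximation, 1/n_i1 + 1/n_i2. With one logit per population the covariance matrix of F(p) is diagonal, so the general estimator b = (X'S^{-1}X)^{-1} X'S^{-1}F reduces to a weighted mean.

```python
import math


def logit_and_variance(row):
    """Observed logit log(n1/n2) and its large-sample variance 1/n1 + 1/n2."""
    n1, n2 = row
    return math.log(n1 / n2), 1.0 / n1 + 1.0 / n2


def wls_common_logit(table):
    """Weighted least-squares fit of an intercept-only model to the logits.

    Returns the estimate b, its approximate variance, and the weighted
    residual sum of squares (a goodness-of-fit statistic for the model).
    """
    F, w = [], []
    for row in table:
        f, v = logit_and_variance(row)
        F.append(f)
        w.append(1.0 / v)  # weight = inverse variance of the function

    b = sum(wi * fi for wi, fi in zip(w, F)) / sum(w)
    var_b = 1.0 / sum(w)
    q_resid = sum(wi * (fi - b) ** 2 for wi, fi in zip(w, F))
    return b, var_b, q_resid


# Hypothetical counts: two populations, binary response.
b, var_b, q_resid = wls_common_logit([[30, 70], [45, 55]])
```

Under this model the residual statistic `q_resid` would be compared with a chi-square distribution with s - 1 degrees of freedom, where s is the number of populations (here, 1 degree of freedom).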
Following parameter estimation, hypotheses about linear
combinations of the parameters can be tested. For that
purpose, PROC CATMOD computes generalized Wald (1943)
statistics, which are approximately distributed as
chi-square if the sample sizes are sufficiently large and
the null hypotheses are true.
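For a hypothesis of the form Lβ = 0, a Wald statistic has the general form Q = (Lb)' (L V L')^{-1} (Lb), where V is the estimated covariance matrix of b. The sketch below handles only the special case of a single contrast (one row in L), where the matrix inverse becomes a scalar division and Q has 1 degree of freedom. The estimates `b` and covariance `cov_b` are invented numbers and the function name is hypothetical; the p-value uses the identity P(chi-square with 1 df > q) = erfc(sqrt(q/2)).

```python
import math


def wald_single_contrast(b, cov_b, contrast):
    """Wald chi-square (1 df) and p-value for H0: contrast . beta = 0."""
    k = len(b)
    # Estimated value of the linear combination, l'b.
    estimate = sum(c * bi for c, bi in zip(contrast, b))
    # Its estimated variance, l' V l.
    variance = sum(
        contrast[i] * cov_b[i][j] * contrast[j]
        for i in range(k)
        for j in range(k)
    )
    q = estimate ** 2 / variance
    # Upper tail probability of a chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(q / 2.0))
    return q, p_value


# Hypothetical estimates and covariance matrix for a two-parameter model.
b = [-0.52, 0.32]
cov_b = [[0.022, 0.001], [0.001, 0.022]]

# Test whether the second parameter is zero.
q, p_value = wald_single_contrast(b, cov_b, [0.0, 1.0])
```

Hypotheses with several rows in L need the full matrix form and a chi-square distribution whose degrees of freedom equal the rank of L.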
Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.