Background: The Underlying Model

The CATMOD Procedure

The CATMOD procedure analyzes data that can be represented by a two-dimensional contingency table. The rows of the table correspond to populations (or samples) formed on the basis of one or more independent variables. The columns of the table correspond to observed responses formed on the basis of one or more dependent variables. The frequency in the (i,j)th cell is the number of subjects in the ith population that have the jth response. The frequencies in the table are assumed to follow a product multinomial distribution, corresponding to a sampling design in which a simple random sample is taken for each population. The contingency table can be represented as shown in Table 22.1.

Table 22.1: Contingency Table Representation

	Response
Sample	1	2	...	r	Total
1	$n_{11}$	$n_{12}$	...	$n_{1r}$	$n_1$
2	$n_{21}$	$n_{22}$	...	$n_{2r}$	$n_2$
$\vdots$	$\vdots$	$\vdots$	$\ddots$	$\vdots$	$\vdots$
s	$n_{s1}$	$n_{s2}$	...	$n_{sr}$	$n_s$

For each sample i, the probability of the jth response ($\pi_{ij}$) is estimated by the sample proportion, $p_{ij} = n_{ij}/n_i$. The vector p of all such proportions is then transformed into a vector of functions, denoted by $F = F(p)$. If $\pi$ denotes the vector of true probabilities for the entire table, then the functions of the true probabilities, denoted by $F(\pi)$, are assumed to follow a linear model

$E_A(F) = F(\pi) = X\beta$

where $E_A$ denotes asymptotic expectation, X is the design matrix containing fixed constants, and $\beta$ is a vector of parameters to be estimated.
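
The step from cell frequencies to response functions can be illustrated with a short Python sketch. It is not PROC CATMOD code. It assumes a binary response (r = 2) and the logit as the response function, and the counts and the helper name `response_functions` are hypothetical.

```python
import math

def response_functions(table):
    """Return (proportions, logits) for a table whose rows are [n_i1, n_i2]."""
    proportions = []
    logits = []
    for row in table:
        n_i = sum(row)                          # sample total n_i
        p = [n_ij / n_i for n_ij in row]        # p_ij = n_ij / n_i
        proportions.append(p)
        logits.append(math.log(p[0] / p[1]))    # F_i = log(p_i1 / p_i2)
    return proportions, logits

# Hypothetical counts: s = 2 samples, r = 2 responses
table = [[30, 10], [20, 20]]
proportions, logits = response_functions(table)
```

Each sample contributes one logit here, so F has s elements, and the linear model then relates those s values to the columns of X.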

PROC CATMOD provides two estimation methods:

The maximum likelihood method estimates the parameters of the linear model so as to maximize the value of the joint multinomial likelihood function of the responses. Maximum likelihood estimation is available only for the standard response functions, logits and generalized logits, which are used for logistic regression analysis and log-linear model analysis. For details of the theory, refer to Bishop, Fienberg, and Holland (1975).
The weighted least-squares method minimizes the weighted residual sum of squares for the model. The weights are contained in the inverse covariance matrix of the functions $F(p)$. According to central limit theory, if the sample sizes within populations are sufficiently large, the elements of F and b (the estimate of $\beta$) are distributed approximately as multivariate normal. This allows the computation of statistics for testing the goodness of fit of the model and the significance of other sources of variation. For details of the theory, refer to Grizzle, Starmer, and Koch (1969) or Koch et al. (1977, Appendix 1). Weighted least-squares estimation is available for all types of response functions.
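
As a rough illustration of the weighted least-squares idea, the following Python sketch fits a two-parameter model to the logits of a binary response. It is not how PROC CATMOD is implemented. It assumes independent samples, so the covariance matrix of F is diagonal, and it uses the delta-method variance $1/n_{i1} + 1/n_{i2}$ for each logit. The data, the design matrix, and the function name `wls_logit` are hypothetical.

```python
import math

def wls_logit(table, X):
    """Weighted least squares for logits of a binary response.

    table: rows [n_i1, n_i2]; X: design matrix rows with two columns.
    Returns (b, cov_b, residual_chi_square).
    """
    F = [math.log(n1 / n2) for n1, n2 in table]
    # Delta-method variance of each logit is 1/n_i1 + 1/n_i2; the weights
    # are the diagonal of the inverse covariance matrix of F.
    w = [1.0 / (1.0 / n1 + 1.0 / n2) for n1, n2 in table]

    # Normal equations (X' W X) b = X' W F for a two-column X.
    a11 = sum(wi * x[0] * x[0] for wi, x in zip(w, X))
    a12 = sum(wi * x[0] * x[1] for wi, x in zip(w, X))
    a22 = sum(wi * x[1] * x[1] for wi, x in zip(w, X))
    g1 = sum(wi * x[0] * fi for wi, x, fi in zip(w, X, F))
    g2 = sum(wi * x[1] * fi for wi, x, fi in zip(w, X, F))

    # Invert the 2x2 matrix X' W X; the inverse estimates the covariance of b.
    det = a11 * a22 - a12 * a12
    cov_b = [[a22 / det, -a12 / det], [-a12 / det, a11 / det]]
    b = [cov_b[0][0] * g1 + cov_b[0][1] * g2,
         cov_b[1][0] * g1 + cov_b[1][1] * g2]

    # Weighted residual sum of squares, used as a goodness-of-fit statistic.
    fitted = [x[0] * b[0] + x[1] * b[1] for x in X]
    q = sum(wi * (fi - fh) ** 2 for wi, fi, fh in zip(w, F, fitted))
    return b, cov_b, q

# Hypothetical data: three populations, intercept plus a linear score.
table = [[10, 30], [20, 20], [30, 10]]
X = [[1, -1], [1, 0], [1, 1]]
b, cov_b, q = wls_logit(table, X)
```

With three populations and two parameters, the residual statistic `q` has one degree of freedom. These counts are deliberately symmetric, so the linear model fits the three logits exactly and `q` is zero up to rounding error.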

Following parameter estimation, hypotheses about linear combinations of the parameters can be tested. For that purpose, PROC CATMOD computes generalized Wald (1943) statistics, which are approximately distributed as chi-square if the sample sizes are sufficiently large and the null hypotheses are true.
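
For a single linear combination $c'\beta$, a Wald statistic of this kind reduces to the squared estimate divided by its estimated variance, referred to a chi-square distribution with one degree of freedom. The Python sketch below illustrates that special case only. The parameter estimates, the covariance matrix, and the function name `wald_single_contrast` are hypothetical and do not come from PROC CATMOD output.

```python
import math

def wald_single_contrast(c, b, cov_b):
    """Wald chi-square (1 df) and p-value for H0: c'b = 0."""
    k = len(c)
    estimate = sum(c[i] * b[i] for i in range(k))           # c'b
    variance = sum(c[i] * cov_b[i][j] * c[j]
                   for i in range(k) for j in range(k))     # c' V c
    q = estimate ** 2 / variance
    # Upper-tail probability of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(q / 2.0))
    return q, p_value

# Hypothetical estimates and covariance matrix
b = [0.5, 1.2]
cov_b = [[0.04, 0.0], [0.0, 0.09]]
q, p_value = wald_single_contrast([0, 1], b, cov_b)
```

Here the contrast `[0, 1]` tests whether the second parameter is zero. A hypothesis with several rows would need the matrix form $(Cb)'(C V_b C')^{-1}(Cb)$, with degrees of freedom equal to the rank of C.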
