Generalized Estimating Equations

Correlated data are modeled using the same link function and linear predictor setup (systematic component) as the independence case. The random component is described by the same variance functions as in the independence case, but the covariance structure of the correlated measurements must also be modeled. Let the vector of measurements on the ith subject be $\mathbf{Y} = [Y_{i1}, \ldots, Y_{i n_i}]'$ with corresponding vector of means $\boldsymbol{\mu} = [\mu_{i1}, \ldots, \mu_{i n_i}]'$, and let V be the covariance matrix of $\mathbf{Y}$. Let the vector of independent, or explanatory, variables for the jth measurement on the ith subject be

Working Correlation Matrix

The working correlation matrix is usually unknown and must be estimated. It is estimated in the iterative fitting process using the current value of the parameter vector ${\beta}$ to compute appropriate functions of the Pearson residual

${\displaystyle e_{ij} = \frac{y_{ij} - \mu_{ij}}{\sqrt{v(\mu_{ij})}} }$

Following are the structures of the working correlation supported by the GENMOD procedure and the estimators used to estimate the working correlations.

Working Correlation Structure		Estimator
Fixed
	Corr(Y_ij,Y_ik) = r_jk where r_jk is the jkth element of a constant, user-specified correlation matrix R₀.	The working correlation is not estimated in this case.

Independent
	${\displaystyle \mathrm{Corr}(Y_{ij},Y_{ik}) = \begin{cases} 1 & j = k \\ 0 & j \ne k \end{cases} }$	The working correlation is not estimated in this case.

m-dependent
	${\displaystyle \mathrm{Corr}(Y_{ij},Y_{i,j+t}) = \begin{cases} 1 & t = 0 \\ \alpha_{t} & t = 1, 2, \ldots, m \\ 0 & t > m \end{cases} }$	${\displaystyle \hat{\alpha}_{t} = \frac{1}{(K_t-p)\phi}\sum_{i=1}^K \sum_{j\leq n_i-t}e_{ij}e_{i,j+t} }$
		$K_t = \sum_{i=1}^K (n_i - t)$

Exchangeable
	${\displaystyle \mathrm{Corr}(Y_{ij},Y_{ik}) = \begin{cases} 1 & j = k \\ \alpha & j \neq k \end{cases} }$	${\displaystyle \hat{\alpha} = \frac{1}{(N^{*}-p)\phi}\sum_{i=1}^K \sum_{j\neq k}e_{ij}e_{ik} }$
		$N^{*}=\sum_{i=1}^K n_i(n_i-1)$

Unstructured
	${\displaystyle \mathrm{Corr}(Y_{ij},Y_{ik}) = \begin{cases} 1 & j = k \\ \alpha_{jk} & j \neq k \end{cases} }$	${\displaystyle \hat{\alpha}_{jk} = \frac{1}{(K-p)\phi}\sum_{i=1}^K e_{ij}e_{ik} }$

Autoregressive AR(1)
	$\mathrm{Corr}(Y_{ij},Y_{i,j+t}) = \alpha^t$ for $t = 0, 1, 2, \ldots, n_i - j$	${\displaystyle \hat{\alpha} = \frac{1}{(K_1-p)\phi}\sum_{i=1}^K \sum_{j\leq n_i-1}e_{ij}e_{i,j+1} }$
		$K_1 = \sum_{i=1}^K (n_i - 1)$
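The estimators in the table can be illustrated with a small sketch. The following Python code is not PROC GENMOD's implementation; it only evaluates the exchangeable and AR(1) formulas as written above, and builds an AR(1) working correlation matrix. The function names, the made-up residual values, and the choice to pass $\phi$ in as a known argument are assumptions for illustration.

```python
def exchangeable_alpha(resid, p, phi):
    """Exchangeable estimator: products e_ij * e_ik summed over ordered pairs j != k."""
    n_star = sum(len(e) * (len(e) - 1) for e in resid)
    total = 0.0
    for e in resid:
        s = sum(e)
        # sum over j != k of e_j * e_k equals (sum e)^2 - sum e^2
        total += s * s - sum(x * x for x in e)
    return total / ((n_star - p) * phi)


def ar1_alpha(resid, p, phi):
    """AR(1) estimator: lag-1 products e_ij * e_i,j+1 summed within each subject."""
    k1 = sum(len(e) - 1 for e in resid)
    total = sum(e[j] * e[j + 1] for e in resid for j in range(len(e) - 1))
    return total / ((k1 - p) * phi)


def ar1_matrix(alpha, n):
    """n x n working correlation matrix with entries alpha**|j - k|."""
    return [[alpha ** abs(j - k) for k in range(n)] for j in range(n)]


# Hypothetical Pearson residuals for K = 3 subjects (illustrative numbers only).
resid = [[1.0, 1.0, 0.0], [-1.0, -1.0], [1.0, 0.0, 1.0, 0.0]]

a_exch = exchangeable_alpha(resid, p=1, phi=1.0)  # 6 / (20 - 1)
a_ar1 = ar1_alpha(resid, p=1, phi=1.0)            # 2 / (6 - 1)
R = ar1_matrix(0.5, 3)
```

Subjects may have different numbers of measurements, so the residuals are kept as a list of per-subject lists rather than a rectangular array.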

Dispersion Parameter

The square root of $\hat{\phi}$ is reported by PROC GENMOD as the scale parameter in the "Analysis of GEE Parameter Estimates Model-Based Standard Error Estimates" output table.
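As an illustration of how a dispersion estimate and the reported scale relate, the sketch below assumes the common moment estimator $\hat{\phi} = \frac{1}{N-p}\sum_{i}\sum_{j} e_{ij}^2$, where $N$ is the total number of measurements and $p$ the number of regression parameters. That formula is not stated in the text above, so treat it as an assumption; the function name and residual values are hypothetical.

```python
import math


def dispersion(resid, p):
    """Assumed moment estimator: sum of squared Pearson residuals over (N - p)."""
    n_total = sum(len(e) for e in resid)
    ss = sum(x * x for e in resid for x in e)
    return ss / (n_total - p)


# Hypothetical Pearson residuals for three subjects (illustrative numbers only).
resid = [[1.0, 1.0, 0.0], [-1.0, -1.0], [1.0, 0.0, 1.0, 0.0]]

phi_hat = dispersion(resid, p=1)  # 6 / (9 - 1)
scale = math.sqrt(phi_hat)        # the square root is what is reported as the scale
```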

Fitting Algorithm

Missing Data

The contribution of the ith unit to the parameter update equation is computed by omitting the elements of $(\mathbf{Y} - \boldsymbol{\mu})$, the columns of $D_i' = \left[\frac{\partial \boldsymbol{\mu}}{\partial \beta}\right]'$, and the rows and columns of V corresponding to missing measurements.
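The omission rule can be sketched as follows. This is a minimal illustration, not the procedure's code: it assumes missing responses are marked with `None`, that `D` is stored with one row per measurement (so its rows are the columns of $D_i'$), and that `V` is a full square matrix for the subject. The function name and the numbers are hypothetical.

```python
def drop_missing(y, mu, D, V):
    """Keep only the entries of (y - mu), D, and V that belong to observed measurements."""
    keep = [j for j, yj in enumerate(y) if yj is not None]
    resid = [y[j] - mu[j] for j in keep]            # elements of (Y - mu)
    D_obs = [D[j] for j in keep]                    # rows of D, i.e. columns of D'
    V_obs = [[V[j][k] for k in keep] for j in keep]  # matching rows and columns of V
    return resid, D_obs, V_obs


# One subject with three planned measurements; the second is missing.
y = [1.0, None, 0.0]
mu = [0.5, 0.5, 0.25]
D = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
V = [[1.0, 0.5, 0.25], [0.5, 1.0, 0.5], [0.25, 0.5, 1.0]]

resid, D_obs, V_obs = drop_missing(y, mu, D, V)
```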

Parameter Estimate Covariances

Multinomial GEEs

Alternating Logistic Regressions

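For binary data, the correlation between the jth and kth response is, by definition,

${\displaystyle \mathrm{Corr}(Y_{ij},Y_{ik}) = \frac{\Pr(Y_{ij}=1, Y_{ik}=1) - \mu_{ij}\mu_{ik}}{\sqrt{\mu_{ij}(1-\mu_{ij})\mu_{ik}(1-\mu_{ik})}} }$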

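The ALR algorithm seeks to model the logarithm of the odds ratio, $\gamma_{ijk} = \log(\mathrm{OR}(Y_{ij},Y_{ik}))$, as

$\gamma_{ijk} = z_{ijk}'\alpha$

where $\alpha$ is the vector of regression parameters for the log odds ratios and $z_{ijk}$ is a fixed, specified vector of coefficients.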

The parameter $\gamma_{ijk}$ can take any value in $(-\infty, \infty)$ with $\gamma_{ijk} = 0$ corresponding to no association.
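A short numeric check makes the scale of $\gamma_{ijk}$ concrete. The sketch below uses the standard definition of the odds ratio for a pair of binary responses, $\mathrm{OR} = \frac{\Pr(1,1)\Pr(0,0)}{\Pr(1,0)\Pr(0,1)}$; the function names and the probabilities are made up for illustration.

```python
import math


def log_odds_ratio(p11, p10, p01, p00):
    """Log odds ratio of two binary responses from their joint cell probabilities."""
    return math.log((p11 * p00) / (p10 * p01))


def independent_cells(mu_j, mu_k):
    """Joint cell probabilities when the two responses are independent."""
    return (mu_j * mu_k, mu_j * (1 - mu_k), (1 - mu_j) * mu_k, (1 - mu_j) * (1 - mu_k))


# Independent responses give a log odds ratio of zero (up to rounding).
gamma_indep = log_odds_ratio(*independent_cells(0.3, 0.6))

# Responses that tend to agree give a positive log odds ratio: OR = 16 here.
gamma_pos = log_odds_ratio(0.4, 0.1, 0.1, 0.4)

# Swapping the roles of agreement and disagreement flips the sign.
gamma_neg = log_odds_ratio(0.1, 0.4, 0.4, 0.1)
```

Unlike a correlation between binary responses, whose attainable range depends on the means, the log odds ratio is unconstrained.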

The log odds ratio, when modeled in this way with a regression model, can take different values in subgroups defined by $z_{ijk}$. For example, $z_{ijk}$ can define subgroups within clusters, or it can define "block effects" between clusters.

You specify a GEE model for binary data using log odds ratios by specifying a model for the mean, as in ordinary GEEs, and a model for the log odds ratios. You can use any of the link functions appropriate for binary data in the model for the mean, such as logistic, probit, or complementary log-log. The ALR algorithm alternates between a GEE step to update the model for the mean and a logistic regression step to update the log odds ratio model. Upon convergence, the ALR algorithm provides estimates of the regression parameters for the mean, ${\beta}$, the regression parameters for the log odds ratios, ${\alpha}$, their standard errors, and their covariances.

Pair	Parameter
(1,2)	Alpha1
(1,3)	Alpha2
(1,4)	Alpha3
(2,3)	Alpha4
(2,4)	Alpha5
(3,4)	Alpha6
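The table appears to number the parameters by listing the pairs (j,k) with j < k in lexicographic order for a cluster of size 4. Under that assumption, the mapping can be reproduced for any cluster size with the sketch below; the function name is hypothetical and the code is only an illustration of the ordering, not of how PROC GENMOD assigns parameters internally.

```python
from itertools import combinations


def pair_parameters(n):
    """Map each pair (j, k), j < k, in a cluster of size n to a parameter name."""
    pairs = combinations(range(1, n + 1), 2)  # (1,2), (1,3), ..., (n-1,n)
    return {pair: f"Alpha{idx}" for idx, pair in enumerate(pairs, start=1)}


params = pair_parameters(4)
for pair, name in params.items():
    print(pair, name)
```

A cluster of size n yields n(n-1)/2 distinct pairs, so a cluster of size 4 needs six parameters.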
