Residuals

The PHREG Procedure

The cumulative baseline hazard function $\Lambda_0$ is estimated by

$\hat{\Lambda}_0(t) = \sum_{i=1}^n\int_{0}^t \frac{dN_{i}(s)}{\sum_{j=1}^nY_{j}(s)\exp(\hat{{\beta}}'Z_{j}(s))}$

Although this formula is for the TIES=BRESLOW option, the same formula is used for the other TIES= options. The discrepancies between results obtained by using the formula appropriate to a nondefault TIES= option and those obtained by the given formula are minimal.
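The estimator above can be illustrated numerically. The following is a minimal sketch, not the procedure's implementation: it assumes right-censored data, covariates that do not change over time, and a coefficient vector that has already been estimated. The function name `breslow_cumhaz` and its arguments are hypothetical.

```python
import numpy as np

def breslow_cumhaz(time, status, z, beta):
    """Breslow estimate of Lambda_0 evaluated at each subject's observation time."""
    time = np.asarray(time, dtype=float)
    status = np.asarray(status, dtype=float)   # 1 = event, 0 = censored
    z = np.asarray(z, dtype=float)             # shape (n, p), fixed covariates (assumed)
    risk = np.exp(z @ np.asarray(beta, dtype=float))

    # at_risk[k, j] = Y_j(t_k): subject j is still under observation at t_k
    at_risk = (time[None, :] >= time[:, None]).astype(float)
    denom = at_risk @ risk                     # sum_j Y_j(t_k) exp(beta'z_j)
    jump = status / denom                      # increment of Lambda_0 at t_k, 0 if censored

    # before[i, k] = 1 when t_k <= t_i, so row i accumulates the jumps up to t_i
    before = (time[None, :] <= time[:, None]).astype(float)
    return before @ jump
```

When all coefficients are zero, every subject has unit risk score and the estimate reduces to the Nelson-Aalen estimator.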

The martingale residual at t is defined as

$\hat{M}_i(t) = N_{i}(t) - \int_{0}^tY_{i}(s)\exp(\hat{{\beta}}'Z_{i}(s)) d\hat{\Lambda}_0(s)$

Here $\hat{M}_i(t)$ estimates the difference over $(0,t]$ between the observed number of events for the ith subject and a conditional expected number of events. The quantity $\hat{M}_i \equiv \hat{M}_i(\infty)$ is referred to as the martingale residual for the ith subject. When the counting process MODEL specification is used, the RESMART= variable contains the component $\hat{M}_i(t_2) - \hat{M}_i(t_1)$ instead of the martingale residual at $t_2$. The martingale residual for a subject can be obtained by summing these component residuals within the subject. For the Cox model with no time-dependent explanatory variables, the martingale residual for the ith subject with observation time $t_i$ and event status $\delta_i$, where

$\delta_{i} = \begin{cases} 0 & \text{if } t_{i} \text{ is a censored time} \\ 1 & \text{if } t_{i} \text{ is an event time} \end{cases}$

is

$\hat{M}_i = \delta_i - \hat{\Lambda}_0(t_i)\exp(\hat{{\beta}}' z_i)$
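As an illustration of the closed form for fixed covariates, the sketch below computes the martingale residuals with the Breslow estimate of the cumulative baseline hazard. It is a hedged example under the same assumptions as the formula (right-censored data, no time-dependent covariates, one record per subject); the name `martingale_residuals` is hypothetical and the code is not taken from PROC PHREG.

```python
import numpy as np

def martingale_residuals(time, status, z, beta):
    """M_i = delta_i - Lambda_0(t_i) * exp(beta'z_i), with Breslow's Lambda_0."""
    time = np.asarray(time, dtype=float)
    status = np.asarray(status, dtype=float)   # delta_i: 1 = event, 0 = censored
    z = np.asarray(z, dtype=float)             # shape (n, p)
    risk = np.exp(z @ np.asarray(beta, dtype=float))

    # at_risk[k, j] = Y_j(t_k); before[i, k] = 1 when t_k <= t_i
    at_risk = (time[None, :] >= time[:, None]).astype(float)
    before = (time[None, :] <= time[:, None]).astype(float)

    jump = status / (at_risk @ risk)           # increments of Lambda_0
    cumhaz = before @ jump                     # Lambda_0(t_i)
    return status - cumhaz * risk
```

With the Breslow estimate, these residuals sum to zero over subjects for any coefficient vector, which gives a quick sanity check on an implementation.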

The deviance residuals $d_i$ are a transform of the martingale residuals:

$d_{i}= sign(\hat{M}_i)\sqrt{2 \biggl[ -\hat{M}_i- N_{i}(\infty)\log \biggl( \frac{N_{i}(\infty)- \hat{M}_i}{N_{i}(\infty)} \biggr) \biggr]}$

The square root shrinks large negative martingale residuals, while the logarithmic transformation expands martingale residuals that are close to unity. As such, the deviance residuals are more symmetrically distributed about zero than the martingale residuals. For the Cox model, the deviance residual reduces to the form

$d_{i}= sign(\hat{M}_i)\sqrt{2 [ -\hat{M}_i- \delta_i \log( \delta_i - \hat{M}_i)]}$
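The Cox-model form can be sketched as a direct transform of martingale residuals that have already been computed. This is an illustrative example only: the function name `deviance_residuals` is hypothetical, and the term $\delta_i \log(\delta_i - \hat{M}_i)$ is taken to be zero for censored subjects, which is the usual convention and is an assumption here.

```python
import numpy as np

def deviance_residuals(mart, status):
    """d_i = sign(M_i) * sqrt(2 * [-M_i - delta_i * log(delta_i - M_i)])."""
    mart = np.asarray(mart, dtype=float)
    status = np.asarray(status, dtype=float)
    is_event = status == 1

    # For events the log term is log(1 - M_i); for censored subjects it is
    # set to 0 (assumed convention), so the inner np.where avoids log(-M_i).
    log_term = np.where(is_event, np.log(np.where(is_event, 1.0 - mart, 1.0)), 0.0)

    # The bracket is nonnegative in exact arithmetic; clip tiny negative round-off.
    inside = np.maximum(2.0 * (-mart - log_term), 0.0)
    return np.sign(mart) * np.sqrt(inside)
```

Each deviance residual keeps the sign of its martingale residual, and a martingale residual of zero maps to a deviance residual of zero.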

When the counting process MODEL specification is used, values of the RESDEV= variable are set to missing because the deviance residuals can be calculated on a per-subject basis only.

The Schoenfeld residual vector is calculated on a per event time basis as

$U_{i}(t) = Z_{i}(t) - \bar{Z}(t)$

where t is an event time, and $\bar{Z}(t)$ is a weighted average of the covariates over the risk set at time t and is given by

$\bar{Z}(t) = \frac{\sum_{l=1}^n Y_{l}(t)Z_{l}(t)\exp(\hat{{\beta}}'Z_{l}(t))} {\sum_{l=1}^n Y_{l}(t)\exp(\hat{{\beta}}'Z_{l}(t))}$

Under the proportional hazards assumption, the Schoenfeld residuals have the sample path of a random walk; therefore, they are useful in assessing time trend or lack of proportionality. Harrell (1986) proposed a z-transform of the Pearson correlation between these residuals and the rank order of the failure time as a test statistic for nonproportional hazards. Therneau, Grambsch, and Fleming (1990) considered a Kolmogorov-type test using the cumulative sum of the residuals.
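The Schoenfeld residual and the risk-set average $\bar{Z}(t)$ defined above can be sketched for the fixed-covariate case as follows. The example assumes right-censored data with one record per subject and a coefficient vector supplied by the caller; the function name `schoenfeld_residuals` is hypothetical, and the code illustrates the formulas rather than the procedure's own computation.

```python
import numpy as np

def schoenfeld_residuals(time, status, z, beta):
    """Return one row z_i - Zbar(t_i) for each subject with an event."""
    time = np.asarray(time, dtype=float)
    status = np.asarray(status, dtype=float)   # 1 = event, 0 = censored
    z = np.asarray(z, dtype=float)             # shape (n, p)
    risk = np.exp(z @ np.asarray(beta, dtype=float))

    # weights[k, l] = Y_l(t_k) * exp(beta'z_l)
    at_risk = (time[None, :] >= time[:, None]).astype(float)
    weights = at_risk * risk[None, :]

    # Row k of zbar is the weighted covariate average over the risk set at t_k
    zbar = (weights @ z) / weights.sum(axis=1, keepdims=True)

    events = status == 1
    return z[events] - zbar[events]
```

Rows are returned in the input order of the subjects that had events. Sorting them by event time and plotting each column against time is one informal way to look for a trend.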

The score process for the ith subject at time t is

$L_{i}(t) = \int_{0}^t [Z_{i}(s) - \bar{Z}(s)] d\hat{M}_{i}(s)$

The vector $L_i \equiv L_i(\infty)$ is the score residual for the ith subject. When the counting process MODEL specification is used, the RESSCO= variables contain the components of $L_i(t_2) - L_i(t_1)$ instead of the score process at $t_2$. The score residual for a subject can be obtained by summing these component residuals within the subject.

The score residuals are a decomposition of the first partial derivative of the log likelihood. They are useful in assessing the influence of each subject on individual parameter estimates. They also play an important role in the computation of the robust sandwich variance estimators of Lin and Wei (1989) and Wei, Lin, and Weissfeld (1989).
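For fixed covariates, the integral that defines $L_i$ splits into an event term $\delta_i [z_i - \bar{Z}(t_i)]$ minus a compensator term accumulated over the event times up to $t_i$. The sketch below evaluates both terms with the Breslow increments of $\hat{\Lambda}_0$. It is a hedged illustration under the assumptions of right-censored data, one record per subject, and a caller-supplied coefficient vector; the name `score_residuals` is hypothetical.

```python
import numpy as np

def score_residuals(time, status, z, beta):
    """Return an (n, p) array whose row i is the score residual L_i."""
    time = np.asarray(time, dtype=float)
    status = np.asarray(status, dtype=float)   # 1 = event, 0 = censored
    z = np.asarray(z, dtype=float)             # shape (n, p)
    risk = np.exp(z @ np.asarray(beta, dtype=float))

    at_risk = (time[None, :] >= time[:, None]).astype(float)   # Y_j(t_k)
    before = (time[None, :] <= time[:, None]).astype(float)    # t_k <= t_i
    weights = at_risk * risk[None, :]
    s0 = weights.sum(axis=1)                   # risk-set total at each t_k
    zbar = (weights @ z) / s0[:, None]         # Zbar(t_k)
    jump = status / s0                         # increment of Lambda_0 at t_k

    # Event term: delta_i * (z_i - Zbar(t_i))
    event_term = status[:, None] * (z - zbar)

    # Compensator: exp(beta'z_i) * sum over t_k <= t_i of (z_i - Zbar(t_k)) * jump_k
    cumhaz = before @ jump
    compensator = risk[:, None] * (z * cumhaz[:, None] - before @ (jump[:, None] * zbar))
    return event_term - compensator
```

Summing the rows over subjects gives the first derivative of the log partial likelihood at the supplied coefficients, which equals the sum of the Schoenfeld residuals because the compensator terms cancel within each risk set.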
