Validation for the Cox Model. Terry M. Therneau, Mayo Foundation. Breslow estimates. Exact partial likelihood, page 2

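The second term is $d\,I^{-1}d'$, where $I^{-1}$ is the variance–covariance matrix of the Cox model ($I$ being the information matrix) and $d$ is a vector. This term accounts for the fact that the weights themselves have a variance; $d$ is the derivative of $S(t)$ with respect to $\beta$ and can be formally written as

$$ \exp(x_i\beta) \int_0^t \bigl(\bar{x}(s) - x_i\bigr)\, d\hat\Lambda_0(s), $$

where $\bar{x}(s)$ is the risk-weighted mean covariate at time $s$ and $\hat\Lambda_0$ is the estimated baseline cumulative hazard.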

This can be recognized as −1 times the score residual process for a subject with $x_i$ as covariates and no events; it measures the leverage of a particular observation on the estimate of β. It is intuitive that a small score residual (an observation with such covariates has little influence on β) results in a small added variance; that is, β has little influence on the estimated survival.

Time    Term 1
1       $1/(3r+3)^2$
6       $1/(3r+3)^2 + 2/(r+3)^2$
9       $1/(3r+3)^2 + 2/(r+3)^2 + 1/1^2$

Time    d
1       $(r/(r+1)) \cdot 1/(3r+3)$
6       $(r/(r+1)) \cdot 1/(3r+3) + (r/(r+3)) \cdot 2/(r+3)$
9       $(r/(r+1)) \cdot 1/(3r+3) + (r/(r+3)) \cdot 2/(r+3) + 0 \cdot 1$
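The two tables above can be rebuilt directly from risk sets. The following is a minimal sketch, not the author's code: it assumes a six-subject data set `(time, status, x)` that is not listed in this section and was chosen only because it reproduces the risk sums 3r + 3, r + 3 and 1 seen in the tables. The name `breslow_terms` is hypothetical, and only the case x = 0 is handled.

```python
from fractions import Fraction

# Assumed test data (time, status, x); not given in this section, but
# consistent with the risk-set sums 3r+3, r+3 and 1 used in the tables.
DATA = [(1, 1, 1), (1, 0, 1), (6, 1, 1), (6, 1, 0), (8, 0, 0), (9, 1, 0)]


def breslow_terms(r, data=DATA):
    """Cumulative Term 1 and d at each death time, for a subject with x = 0.

    r = exp(beta); a subject with covariate x has risk score r**x.
    Ties are handled with the Breslow convention (one shared risk set).
    """
    death_times = sorted({t for t, status, _ in data if status == 1})
    term1 = 0
    d = 0
    rows = []
    for t in death_times:
        at_risk = [(r ** x, x) for time, _, x in data if time >= t]
        denom = sum(w for w, _ in at_risk)              # sum of risk scores
        xbar = sum(w * x for w, x in at_risk) / denom   # weighted mean covariate
        deaths = sum(1 for time, status, _ in data if time == t and status == 1)
        term1 += deaths / denom ** 2                    # increment of Term 1
        d += xbar * deaths / denom                      # (xbar - 0) * hazard jump
        rows.append((t, term1, d))
    return rows


rows_beta0 = breslow_terms(Fraction(1))   # beta = 0, exact arithmetic
```

With r = 1 the cumulative values come out as 1/36, 11/72, 83/72 for Term 1 and 1/12, 5/24, 5/24 for d, which are the closed forms in the tables evaluated at r = 1. Passing a float such as `math.exp(beta)` instead of a `Fraction` gives the numeric version.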

For β = 0, x = 0:

Time    Variance
1       $1/36 + 1.6\,(1/12)^2 = 7/180$
6       $(1/36 + 2/16) + 1.6\,(1/12 + 2/16)^2 = 2/9$
9       $(1/36 + 2/16 + 1) + 1.6\,(1/12 + 2/16 + 0)^2 = 11/9$

For β = 1.4752849, x = 0:

Time    Variance
1       0.0038498 + 0.004021  = 0.007871
6       0.040648  + 0.0704631 = 0.111111
9       1.040648  + 0.0704631 = 1.111111
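Both variance tables can be checked with a few lines of arithmetic. The sketch below adds the cumulative Term 1 to the inverse information times d squared, using the closed forms from the Term 1 and d tables. Two things are assumptions rather than statements from this section: the helper name `breslow_variance`, and the Breslow information formula r/(r+1)² + 6r/(r+3)² used for the second table, which is adopted here because it yields the factor 1.6 at r = 1.

```python
import math
from fractions import Fraction


def breslow_variance(r, inv_info):
    """Variance at the three death times for x = 0.

    Returns cumulative Term 1 + inv_info * d**2, with the per-time
    increments taken from the Term 1 and d tables.
    """
    term1_steps = [1 / (3 * r + 3) ** 2, 2 / (r + 3) ** 2, 1]
    d_steps = [r / (r + 1) / (3 * r + 3), r / (r + 3) * 2 / (r + 3), 0]
    out = []
    term1 = 0
    d = 0
    for t1, dd in zip(term1_steps, d_steps):
        term1 += t1
        d += dd
        out.append(term1 + inv_info * d ** 2)
    return out


# beta = 0: r = 1, and the factor 1.6 = 8/5 is read off the table.
exact = breslow_variance(Fraction(1), Fraction(8, 5))

# beta = 1.4752849: the information formula is an assumption (see lead-in).
r = math.exp(1.4752849)
info = r / (r + 1) ** 2 + 6 * r / (r + 3) ** 2
approx = breslow_variance(r, 1 / info)
```

Here `exact` holds the fractions 7/180, 2/9 and 11/9, and `approx` agrees with the second table to the digits shown.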

1.2  Efron approximation

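The Efron approximation [?] differs from the Breslow only at day 6, where two deaths occur. A useful way to think about the approximation is this: assume that if the data had been measured with higher accuracy, the deaths would not have been tied; that is, two cases died on day 6, but they did not perish at the same instant on that day. There are thus two separate events on day 6. Four subjects were alive and at risk for the first of the events. Three subjects were at risk for the second event, either subjects 3, 5, and 6 or subjects 4, 5, and 6, but we do not know which. In some sense, then, subjects 3 and 4 each have probability 0.5 of being at risk for the second event at time $6+$. In the computation, we treat the two deaths as two separate times (two terms in the loglik), with subjects 3 and 4 each having a case weight of 1/2 for the second event. With $r = e^{\beta}$, the mean covariate for the second event is then

$$ \frac{1 \cdot r/2 + 0 \cdot 1/2 + 0 \cdot 1 + 0 \cdot 1}{r/2 + 1/2 + 1 + 1} = \frac{r}{r+5}, $$

and the main quantities are

$$ LL = \{\beta - \log(3r+3)\} + \{\beta - \log(r+3)\} + \{0 - \log(r/2 + 5/2)\} + \{0 - 0\} = 2\beta - \log(3r+3) - \log(r+3) - \log\{(r+5)/2\}, $$

$$ U = \left(1 - \frac{r}{r+1}\right) + \left(1 - \frac{r}{r+3}\right) + \left(0 - \frac{r}{r+5}\right) + (0 - 0) = \frac{-r^3 + 23r + 30}{(r+1)(r+3)(r+5)}, $$

$$ I = \frac{r}{(r+1)^2} + \frac{3r}{(r+3)^2} + \frac{5r}{(r+5)^2}. $$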

The solution corresponds to the one positive root of $U(\beta) = 0$, which can be written as $\phi = \arccos\{(45/23)\sqrt{3/23}\}$, $r = 2\sqrt{23/3}\,\cos(\phi/3) \approx 5.348721$, or $\hat\beta = \log(r) \approx 1.676858$. Then

LL(0) = −4.276666

LL(βˆ) = −3.358979

U(0) = 52/48

U(βˆ) = 0

I(0) = 83/144

I(βˆ) = 0.612632.
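The Efron quantities can be evaluated numerically as a check. The sketch below is illustrative only: the function name `efron_quantities` is hypothetical, and the closed forms for the loglikelihood, score and information are assumed to be those that follow from the case-weight argument (risk sums 3r + 3, r + 3, (r + 5)/2 and 1 for the four death terms). The root is taken from the trigonometric expression quoted in the text.

```python
import math


def efron_quantities(beta):
    """Loglikelihood, score and information under the Efron approximation.

    Assumed forms: the second day-6 death uses case weights of 1/2 for
    subjects 3 and 4, so its risk sum is r/2 + 1/2 + 1 + 1 = (r + 5)/2
    and its mean covariate is r/(r + 5).
    """
    r = math.exp(beta)
    loglik = (2 * beta - math.log(3 * r + 3) - math.log(r + 3)
              - math.log((r + 5) / 2))
    score = (1 - r / (r + 1)) + (1 - r / (r + 3)) + (0 - r / (r + 5))
    info = r / (r + 1) ** 2 + 3 * r / (r + 3) ** 2 + 5 * r / (r + 5) ** 2
    return loglik, score, info


# Positive root of U(beta) = 0 via the trigonometric formula in the text.
phi = math.acos((45 / 23) * math.sqrt(3 / 23))
r_hat = 2 * math.sqrt(23 / 3) * math.cos(phi / 3)
beta_hat = math.log(r_hat)

ll0, u0, i0 = efron_quantities(0.0)
ll_hat, u_hat, i_hat = efron_quantities(beta_hat)
```

At β = 0 this gives LL(0) = −log 72, U(0) = 52/48 and I(0) = 83/144, and the score at `beta_hat` vanishes to rounding error, which is a useful sanity check that the trigonometric root and the score formula describe the same equation.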

The cumulative hazard now has a jump of size 1/(r + 3) + 2/(r + 5) at time 6. Efron [?] did not discuss estimation of the cumulative hazard, but it follows directly from the same argument as that used for the loglikelihood so we refer to it as the “Efron” estimate of the hazard. In S this hazard is the default whenever the Efron approximation for ties is used; the estimate is not available in SAS. For simple survival curves (i.e., the no-covariate case), the estimate is explored by Fleming and Harrington [?] as an alternative to the Kaplan–Meier.
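To make the difference between the two hazard estimates concrete, the sketch below compares the day-6 jump under each tie convention. The function name `hazard_jump_day6` is hypothetical, and the Breslow jump 2/(r + 3) is inferred from the d table rather than stated in this paragraph.

```python
from fractions import Fraction


def hazard_jump_day6(r, ties="efron"):
    """Jump in the baseline cumulative hazard at day 6, with r = exp(beta)."""
    if ties == "efron":
        # First death: risk sum r + 3. Second death: risk sum (r + 5)/2.
        return 1 / (r + 3) + 2 / (r + 5)
    if ties == "breslow":
        # Both deaths share the full risk sum r + 3.
        return 2 / (r + 3)
    raise ValueError("ties must be 'efron' or 'breslow'")


efron_at_0 = hazard_jump_day6(Fraction(1), "efron")
breslow_at_0 = hazard_jump_day6(Fraction(1), "breslow")
```

At r = 1 the Efron jump is 1/4 + 1/3 = 7/12, a sum over successively reduced risk sets, while the Breslow jump is 2/4 = 1/2. The Efron jump is the larger of the two whenever r < 5 + 2·3, that is for r below 1 as well as for moderately large r, so it is worth evaluating both at the fitted value rather than assuming an ordering.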