Validation for the Cox Model. Terry M. Therneau, Mayo Foundation.

giving numeric values of 0.0012706, 0.0649885, and 0.2903805, respectively.

3.2  Efron approximation

For the Efron approximation the combination of tied times and case weights can be approached in at least two ways. One is to treat the case weights as replication counts. There are then 10 tied deaths at time 2 in the data above, and the Efron approximation involves 10 different denominator terms. Let a = 7r + 3, the sum of risk scores for the 3 observations with an event at time 2 and b = 4r + 2, the sum of risk scores for the other subjects at risk at time 2. For the replication approach, the loglikelihood is

\begin{align*}
LL &= \{2\beta - \log(r^2 + 11r + 7)\} \\
   &\quad + \{7\beta - \log(a + b) - \log(0.9a + b) - \cdots - \log(0.1a + b)\} \\
   &\quad + \{2\beta - \log(2r + 1) - \log(r + 1)\}.
\end{align*}

A test program can be created by comparing results from the weighted data set (9 observations) to the unweighted replicated data set (19 observations). This is the approach taken by SAS phreg using the freq statement. Its advantage is that the appropriate result for all of the weighted computations is perfectly clear; its disadvantage is that only integer case weights are supported. (A second advantage is that I did not need to create another algebraic derivation for my test suite.)
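The comparison described above can be sketched in Python. This is only an illustration, not the test program used for S or SAS: the data rows are reconstructed so that their risk-score sums agree with the expressions quoted in the text (r^2 + 11r + 7, a = 7r + 3, b = 4r + 2, 2r + 1), and the function names `replicate`, `efron_loglik` and `closed_form` are hypothetical.

```python
import math

# (time, status, x, weight) rows. ASSUMPTION: reconstructed to be consistent
# with the risk-score sums quoted in the text, not copied from a data table.
WEIGHTED = [
    (1, 1, 2, 1),
    (1, 0, 0, 2),
    (2, 1, 1, 3),  # observation C
    (2, 1, 1, 4),  # observation D
    (2, 1, 0, 3),  # observation E
    (2, 0, 1, 2),
    (3, 0, 0, 1),
    (4, 1, 1, 2),
    (5, 0, 0, 1),
]


def replicate(rows):
    """Expand each weighted row into `weight` identical unweighted rows."""
    return [(t, s, x) for (t, s, x, w) in rows for _ in range(w)]


def efron_loglik(rows, beta):
    """Unweighted Efron partial log-likelihood for one covariate."""
    total = 0.0
    for t in sorted({t for (t, s, x) in rows if s == 1}):
        deaths = [x for (tt, s, x) in rows if tt == t and s == 1]
        risk_all = sum(math.exp(beta * x) for (tt, s, x) in rows if tt >= t)
        risk_dead = sum(math.exp(beta * x) for x in deaths)
        d = len(deaths)
        total += beta * sum(deaths)
        # The k-th denominator removes a fraction k/d of the tied deaths.
        for k in range(d):
            total -= math.log(risk_all - (k / d) * risk_dead)
    return total


def closed_form(beta):
    """The replication-approach log-likelihood written out in the text."""
    r = math.exp(beta)
    a, b = 7 * r + 3, 4 * r + 2
    ll = 2 * beta - math.log(r * r + 11 * r + 7)
    ll += 7 * beta - sum(math.log((k / 10) * a + b) for k in range(1, 11))
    ll += 2 * beta - math.log(2 * r + 1) - math.log(r + 1)
    return ll
```

Evaluating `efron_loglik(replicate(WEIGHTED), beta)` and `closed_form(beta)` at a few values of beta should give the same number up to rounding, which is the kind of agreement a replication-based test relies on.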

A second approach, used in S, allows for non-integer weights. SAS also has weighted estimates, but I am not familiar with their algorithms. The data is considered to be 3 tied observations, and the log-likelihood at time 2 is the sum of 3 weighted terms. The first term of the three is one of

\[
3[\beta - \log(a + b)], \qquad 4[\beta - \log(a + b)], \qquad \text{or} \qquad 3[0 - \log(a + b)],
\]

depending on whether the event for observation C, D or E actually happened first (had we observed the time scale more exactly); the leading multiplier of 3, 4 or 3 is the case weight. The second term is one of 6 possibilities

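\[
\begin{array}{ll}
4[\beta - \log(4r + 3 + b)] & \text{CDE} \\
3[0 - \log(4r + 3 + b)] & \text{CED} \\
3[\beta - \log(3r + 3 + b)] & \text{DCE} \\
3[0 - \log(3r + 3 + b)] & \text{DEC} \\
3[\beta - \log(7r + 0 + b)] & \text{ECD} \\
4[\beta - \log(7r + 0 + b)] & \text{EDC}
\end{array}
\]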

The first choice corresponds to an event order of observation C then D (subject D has the event, with D and E still at risk), and so on. For a weighted Efron approximation, first replace each term by its average over the possible orderings, just as in the unweighted case. The first term ends up as (7/3)β − (10/3) log(a + b), the second as (7/3)β − (20/6) log(2a/3 + b), and the third as (7/3)β − (10/3) log(a/3 + b). This replaces the interior of the log function with its average, and the multiplier of the log with the average weight of 10/3 (= 20/6). The final log-likelihood, score statistic, and information are
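The averaging step can be checked by brute force over the six event orders. The sketch below is illustrative only: it assumes the covariate and case weight of C, D and E as read off the three first-term options above (x = 1, 1, 0 and weights 3, 4, 3), stores each risk score as a pair (coefficient of r, constant), and uses a hypothetical helper name `averaged_terms`.

```python
from fractions import Fraction
from itertools import permutations

# (x, case weight) of the three tied events. ASSUMPTION: read off the
# first-term options in the text.
TIED = {"C": (1, 3), "D": (1, 4), "E": (0, 3)}


def risk(label):
    """Risk score w * r**x as (coefficient of r, constant); x must be 0 or 1."""
    x, w = TIED[label]
    return (w, 0) if x == 1 else (0, w)


def averaged_terms():
    """Average each of the three terms over all 6 possible event orders.

    Returns one tuple per position: (multiplier of beta, multiplier of the
    log, (coefficient of r, constant)). The last pair is the tied subjects'
    share of the denominator, to which b is then added.
    """
    orders = list(permutations(TIED))
    n = len(orders)
    result = []
    for pos in range(3):
        beta_mult = Fraction(0)
        log_mult = Fraction(0)
        coef = Fraction(0)
        const = Fraction(0)
        for order in orders:
            x, w = TIED[order[pos]]       # subject with the event at this step
            beta_mult += Fraction(w * x, n)
            log_mult += Fraction(w, n)
            for label in order[pos:]:     # tied subjects still at risk
                c, k = risk(label)
                coef += Fraction(c, n)
                const += Fraction(k, n)
        result.append((beta_mult, log_mult, (coef, const)))
    return result
```

Under these assumptions every position should come out with a β multiplier of 7/3 and a log multiplier of 10/3, and the averaged interiors should be 7r + 3, (14/3)r + 2 and (7/3)r + 1, that is a, 2a/3 and a/3.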

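\begin{align*}
LL &= \{2\beta - \log(r^2 + 11r + 7)\} \\
   &\quad + \{7\beta - (10/3)[\log(a + b) + \log(2a/3 + b) + \log(a/3 + b)]\} \\
   &\quad + 2\{\beta - \log(2r + 1)\} \\
U  &= (2 - \bar{x}_1) + 2(1 - \bar{x}_3) \\
   &\quad + 7 - (10/3)[\bar{x}_2 + 26r/(26r + 12) + 19r/(19r + 9)] \\
   &= 11 - (\bar{x}_1 + (10/3)(\bar{x}_2 + \bar{x}_{2b} + \bar{x}_{2c}) + 2\bar{x}_3) \\
I  &= [(4r^2 + 11r)/(r^2 + 11r + 7) - \bar{x}_1^2] \\
   &\quad + (10/3)[(\bar{x}_2 - \bar{x}_2^2) + (\bar{x}_{2b} - \bar{x}_{2b}^2) + (\bar{x}_{2c} - \bar{x}_{2c}^2)] \\
   &\quad + 2(\bar{x}_3 - \bar{x}_3^2)
\end{align*}

Here \bar{x}_{2b} = 26r/(26r + 12) and \bar{x}_{2c} = 19r/(19r + 9) are the weighted means of x for the second and third averaged denominators at time 2.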

The solution is at β = 0.87260425, and

\[
\begin{array}{ll}
LL(0) = -30.29218 & LL(\hat\beta) = -29.41678 \\
U(0) = 2.148183 & U(\hat\beta) = 0 \\
I(0) = 2.929182 & I(\hat\beta) = 1.969447.
\end{array}
\]
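These values can be reproduced by evaluating the closed-form expressions directly. The following is a minimal sketch, assuming the weighted Efron formulas for LL, U and I given above (with x̄ − x̄² as the variance term at the times where x is 0 or 1); the function names `efron_weighted` and `solve` are hypothetical, and the Newton-Raphson loop is a generic solver rather than the iteration used by any particular package.

```python
import math


def efron_weighted(beta):
    """Return (LL, U, I) for the weighted Efron approximation at beta."""
    r = math.exp(beta)
    a, b = 7 * r + 3, 4 * r + 2
    d1 = r * r + 11 * r + 7                  # risk-score sum at the first event time
    x1 = (2 * r * r + 11 * r) / d1           # weighted mean of x at the first event time
    x2 = 11 * r / (11 * r + 5)               # mean of x for denominator a + b
    x2b = 26 * r / (26 * r + 12)             # mean of x for denominator 2a/3 + b
    x2c = 19 * r / (19 * r + 9)              # mean of x for denominator a/3 + b
    x3 = 2 * r / (2 * r + 1)                 # mean of x at the last event time

    ll = (2 * beta - math.log(d1)
          + 7 * beta
          - (10 / 3) * (math.log(a + b) + math.log(2 * a / 3 + b)
                        + math.log(a / 3 + b))
          + 2 * (beta - math.log(2 * r + 1)))
    u = 11 - (x1 + (10 / 3) * (x2 + x2b + x2c) + 2 * x3)
    info = ((4 * r * r + 11 * r) / d1 - x1 ** 2
            + (10 / 3) * sum(m - m * m for m in (x2, x2b, x2c))
            + 2 * (x3 - x3 ** 2))
    return ll, u, info


def solve(beta=0.0, tol=1e-12, max_iter=50):
    """Newton-Raphson on the score equation U(beta) = 0."""
    for _ in range(max_iter):
        _, u, info = efron_weighted(beta)
        if abs(u) < tol:
            break
        beta += u / info
    return beta
```

Calling `efron_weighted(0.0)` gives the three values at β = 0, and `efron_weighted(solve())` gives them at the solution, which can then be compared with the figures quoted above.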