Validation for the Cox Model. Terry M. Therneau Mayo Foundation. Breslow estimates. Exact partial likelihood

Страницы работы

Содержание работы

Validation for the Cox Model

Terry M. Therneau Mayo Foundation

Oct 2010

It is useful to have a set of test data where the results have been worked out in detail, both to illuminate the computations and to form a test case for software programs. The data sets below are quite simple, but have proven useful in this regard.

1  Test data 1

In this data set x is a 0/1 treatment covariate, with n = 6 subjects. There is one tied death time, one time with a death and a censoring, one with only a death, and one with only a censoring. (This is as small as a data set can be and still cover the four cases.) Let r = exp(β) be the risk score for a subject with x = 1. Table 1 shows the data set along with the mean and increment to the hazard at each point.

1.1  Breslow estimates

The loglikelihood has a term for each event; each term is the log of the ratio of the score for the subject who had an event over the sum of scores for

x¯(t)

dΛˆ0(t)

Time

Status

x

Breslow

Efron

Breslow

Efron

1

1

1

r/(r + 1)

r/(r + 1)

1/(3r + 3)

1/(3r + 3)

1

0

1

6

1

1

r/(r + 3)

r/(r + 3)

2/(r + 3)

1/(r + 3)

6

1

0

r/(r + 5)

2/(r + 5)

8

0

0

9

1

0

0

0

1

1

Table 1: Test data 1

those who did not.

       The actual solution corresponds to√   U(β) = 0, which from the quadratic

formula is r = (1/2)(3 +         33) ≈ 4.372281, or βˆ = log(r) ≈ 1.475285. Then

                                   LL(0) = −4.564348      LL(βˆ) = −3.824750

                                   U(0) = 1                           U(βˆ) = 0

I(0) = 5/8 = 0.625 I(βˆ) = 0.634168.

Newton–Raphson iteration has increments of −I−1U. Starting with the usual initial estimate of β = 0, the N–R iterates are 0, 8/5, 1.4727235, 1.4752838, 1.4752849, .... S considers the algorithm to have converged after three iterations, SAS after four (using the default convergence criteria in each package).

The martingale residuals are a simple function of the cumulative hazard, Mi = δi rΛ(ˆ ti).

Subject

Λ0

Mc(0)

1

1/(3r + 3)

5/6

.728714

2

1/(3r + 3)

–1/6

–.271286

3

1/(3r + 3) + 2/(r + 3)

1/3

–.457427

4

1/(3r + 3) + 2/(r + 3)

1/3

.666667

5

1/(3r + 3) + 2/(r + 3)

–2/3

–.333333

6

1/(3r + 3) + 2/(r + 3) + 1

–2/3

–.333333

The score residual Li can be calculated from the entries in Table 1. For subject number 3, for instance, we have

.

Let a = (r + 1)(3r + 3) and b = (r + 3)2; then the residuals are as follows.

Subject

L

L(0)

L(βˆ)

1

(2r + 3)/a

5/12

.135643

2

r/a

–1/12

–.050497

3

r/a + 3(3 − r)/b

7/24

–.126244

4

r/a r(r + 1)/b

–1/24

–.381681

5

r/a + 2r/b

5/24

.211389

6

r/a + 2r/b

5/24

.211389

The Schoenfeld residuals are defined at the three unique death times, and have values of 1−r/(r +1) = 1/(r +1), {1−r/(r +3)}+{0−r/(r +3)} = (3 − r)/(3 + r), and 0 at times 1, 6, and 9, respectively. For convenience in plotting and use, however, the programs return one residual for each event rather than one per unique event time. The two values returned for time 6 are 3/(r + 3) and −r/(r + 3).

The Nelson–Aalen estimate of the hazard is closely related to the Breslow approximation for ties. The baseline hazard is shown as the column Λ0 above. The hazard estimate for a subject with covariate xi is Λi(t) = exp(xiβ0(t) and the survival estimate is Si(t) = exp(−Λi(t)).

The variance of the cumulative hazard is the sum of two terms. Term 1 is a natural extension of the Nelson–Aalen estimator to the case where there are weights. It is a running sum, with an increment at each death time of dN(t)/(PYi(t)ri(t))2. For a subject with covariate xi this term is multiplied by [exp(xiβ)]2.

Похожие материалы

Информация о работе