Иностранные языки \ Английский язык

Validation for the Cox Model. Terry M. Therneau Mayo Foundation. Breslow estimates. Exact partial likelihood

Страницы работы

18 страниц (Word-файл)

Посмотреть все страницы

Скачать файл

Содержание работы

Validation for the Cox Model

Terry M. Therneau Mayo Foundation

Oct 2010

It is useful to have a set of test data where the results have been worked out in detail, both to illuminate the computations and to form a test case for software programs. The data sets below are quite simple, but have proven useful in this regard.

1 Test data 1

In this data set x is a 0/1 treatment covariate, with n = 6 subjects. There is one tied death time, one time with a death and a censoring, one with only a death, and one with only a censoring. (This is as small as a data set can be and still cover the four cases.) Let r = exp(β) be the risk score for a subject with x = 1. Table 1 shows the data set along with the mean and increment to the hazard at each point.

1.1 Breslow estimates

The loglikelihood has a term for each event; each term is the log of the ratio of the score for the subject who had an event over the sum of scores for

			x¯(t)		dΛ^ˆ₀(t)
Time	Status	x	Breslow	Efron	Breslow	Efron
1	1	1	r/(r + 1)	r/(r + 1)	1/(3r + 3)	1/(3r + 3)
1	0	1
6	1	1	r/(r + 3)	r/(r + 3)	2/(r + 3)	1/(r + 3)
6	1	0		r/(r + 5)		2/(r + 5)
8	0	0
9	1	0	0	0	1	1

Table 1: Test data 1

those who did not.

The actual solution corresponds to√ U(β) = 0, which from the quadratic

formula is r = (1/2)(3 + 33) ≈ 4.372281, or β^ˆ= log(r) ≈ 1.475285. Then

LL(0) = −4.564348 LL(β^ˆ) = −3.824750

U(0) = 1 U(β^ˆ) = 0

I(0) = 5/8 = 0.625 I(β^ˆ) = 0.634168.

Newton–Raphson iteration has increments of −I⁻¹U. Starting with the usual initial estimate of β = 0, the N–R iterates are 0, 8/5, 1.4727235, 1.4752838, 1.4752849, .... S considers the algorithm to have converged after three iterations, SAS after four (using the default convergence criteria in each package).

The martingale residuals are a simple function of the cumulative hazard, M_i= δ_i− rΛ(^ˆt_i).

Subject	Λ₀	M_c(0)
1	1/(3r + 3)	5/6	.728714
2	1/(3r + 3)	–1/6	–.271286
3	1/(3r + 3) + 2/(r + 3)	1/3	–.457427
4	1/(3r + 3) + 2/(r + 3)	1/3	.666667
5	1/(3r + 3) + 2/(r + 3)	–2/3	–.333333
6	1/(3r + 3) + 2/(r + 3) + 1	–2/3	–.333333

The score residual L_ican be calculated from the entries in Table 1. For subject number 3, for instance, we have

Let a = (r + 1)(3r + 3) and b = (r + 3)²; then the residuals are as follows.

Subject	L	L(0)	L(β^ˆ)
1	(2r + 3)/a	5/12	.135643
2	−r/a	–1/12	–.050497
3	−r/a + 3(3 − r)/b	7/24	–.126244
4	r/a − r(r + 1)/b	–1/24	–.381681
5	r/a + 2r/b	5/24	.211389
6	r/a + 2r/b	5/24	.211389

The Schoenfeld residuals are defined at the three unique death times, and have values of 1−r/(r +1) = 1/(r +1), {1−r/(r +3)}+{0−r/(r +3)} = (3 − r)/(3 + r), and 0 at times 1, 6, and 9, respectively. For convenience in plotting and use, however, the programs return one residual for each event rather than one per unique event time. The two values returned for time 6 are 3/(r + 3) and −r/(r + 3).

The Nelson–Aalen estimate of the hazard is closely related to the Breslow approximation for ties. The baseline hazard is shown as the column Λ₀above. The hazard estimate for a subject with covariate x_iis Λ_i(t) = exp(x_iβ)Λ₀(t) and the survival estimate is S_i(t) = exp(−Λ_i(t)).

The variance of the cumulative hazard is the sum of two terms. Term 1 is a natural extension of the Nelson–Aalen estimator to the case where there are weights. It is a running sum, with an increment at each death time of dN(t)/(^PY_i(t)r_i(t))². For a subject with covariate x_ithis term is multiplied by [exp(x_iβ)]².