Validation for the Cox Model
Terry M. Therneau Mayo Foundation
Oct 2010
It is useful to have a set of test data where the results have been worked out in detail, both to illuminate the computations and to form a test case for software programs. The data sets below are quite simple, but have proven useful in this regard.
In this data set x is a 0/1 treatment covariate, with n = 6 subjects. There is one tied death time, one time with a death and a censoring, one with only a death, and one with only a censoring. (This is as small as a data set can be and still cover the four cases.) Let r = exp(β) be the risk score for a subject with x = 1. Table 1 shows the data set along with the mean and increment to the hazard at each point.
The loglikelihood has a term for each event; each term is the log of the ratio of the score for the subject who had an event over the sum of scores for
x¯(t) |
dΛˆ0(t) |
|||||
Time |
Status |
x |
Breslow |
Efron |
Breslow |
Efron |
1 |
1 |
1 |
r/(r + 1) |
r/(r + 1) |
1/(3r + 3) |
1/(3r + 3) |
1 |
0 |
1 |
||||
6 |
1 |
1 |
r/(r + 3) |
r/(r + 3) |
2/(r + 3) |
1/(r + 3) |
6 |
1 |
0 |
r/(r + 5) |
2/(r + 5) |
||
8 |
0 |
0 |
||||
9 |
1 |
0 |
0 |
0 |
1 |
1 |
Table 1: Test data 1
those who did not.
The actual solution corresponds to√ U(β) = 0, which from the quadratic
formula is r = (1/2)(3 + 33) ≈ 4.372281, or βˆ = log(r) ≈ 1.475285. Then
LL(0) = −4.564348 LL(βˆ) = −3.824750
U(0) = 1 U(βˆ) = 0
I(0) = 5/8 = 0.625 I(βˆ) = 0.634168.
Newton–Raphson iteration has increments of −I−1U. Starting with the usual initial estimate of β = 0, the N–R iterates are 0, 8/5, 1.4727235, 1.4752838, 1.4752849, .... S considers the algorithm to have converged after three iterations, SAS after four (using the default convergence criteria in each package).
The martingale residuals are a simple function of the cumulative hazard, Mi = δi − rΛ(ˆ ti).
Subject |
Λ0 |
Mc(0) |
|
1 |
1/(3r + 3) |
5/6 |
.728714 |
2 |
1/(3r + 3) |
–1/6 |
–.271286 |
3 |
1/(3r + 3) + 2/(r + 3) |
1/3 |
–.457427 |
4 |
1/(3r + 3) + 2/(r + 3) |
1/3 |
.666667 |
5 |
1/(3r + 3) + 2/(r + 3) |
–2/3 |
–.333333 |
6 |
1/(3r + 3) + 2/(r + 3) + 1 |
–2/3 |
–.333333 |
The score residual Li can be calculated from the entries in Table 1. For subject number 3, for instance, we have
.
Let a = (r + 1)(3r + 3) and b = (r + 3)2; then the residuals are as follows.
Subject |
L |
L(0) |
L(βˆ) |
1 |
(2r + 3)/a |
5/12 |
.135643 |
2 |
−r/a |
–1/12 |
–.050497 |
3 |
−r/a + 3(3 − r)/b |
7/24 |
–.126244 |
4 |
r/a − r(r + 1)/b |
–1/24 |
–.381681 |
5 |
r/a + 2r/b |
5/24 |
.211389 |
6 |
r/a + 2r/b |
5/24 |
.211389 |
The Schoenfeld residuals are defined at the three unique death times, and have values of 1−r/(r +1) = 1/(r +1), {1−r/(r +3)}+{0−r/(r +3)} = (3 − r)/(3 + r), and 0 at times 1, 6, and 9, respectively. For convenience in plotting and use, however, the programs return one residual for each event rather than one per unique event time. The two values returned for time 6 are 3/(r + 3) and −r/(r + 3).
The Nelson–Aalen estimate of the hazard is closely related to the Breslow approximation for ties. The baseline hazard is shown as the column Λ0 above. The hazard estimate for a subject with covariate xi is Λi(t) = exp(xiβ)Λ0(t) and the survival estimate is Si(t) = exp(−Λi(t)).
The variance of the cumulative hazard is the sum of two terms. Term 1 is a natural extension of the Nelson–Aalen estimator to the case where there are weights. It is a running sum, with an increment at each death time of dN(t)/(PYi(t)ri(t))2. For a subject with covariate xi this term is multiplied by [exp(xiβ)]2.
Уважаемый посетитель!
Чтобы распечатать файл, скачайте его (в формате Word).
Ссылка на скачивание - внизу страницы.