Basic & Clinical Biostatistics, 4th Edition

9. Analyzing Research Questions About Survival

 

KEY CONCEPTS

When research involves time-related variables, such as survival and recurrence, we generally do not know the outcome for all patients at the time the study is published, so these outcomes are called censored.

The logrank statistic is one of the most commonly used methods to learn if two curves are significantly different.

Observations are doubly censored when not all patients enter the study at the same time.

The hazard ratio is similar to the odds ratio; the difference is that the hazard ratio compares risk over time, while the odds ratio examines risk at a given time.

An example of why special methods are needed to analyze survival data helps illustrate the logic behind them.

The Mantel–Haenszel statistic is also used to compare curves, not just survival curves.

Life table or actuarial methods were developed to show survival curves; although generally surpassed by Kaplan–Meier curves, they occasionally appear in the literature.

Several versions of the logrank statistic exist. The logrank statistic assumes that the risk of the outcome is constant over time.

Survival analysis gives patients credit for how long they have been in the study, even if the outcome has not yet occurred.

The Mantel–Haenszel statistic essentially combines a number of 2 × 2 tables for an overall measure of difference.

The Kaplan–Meier procedure is the most commonly used method to illustrate survival curves.

The hazard function gives the probability that an outcome will occur in a given period, assuming that the outcome has not occurred during previous periods.

Estimates of survival are less precise as the time from entry into the study becomes longer, because the number of patients in the study decreases.

The intention-to-treat principle states that subjects are analyzed in the group to which they were assigned. It minimizes bias when there are treatment crossovers or dropouts.

Survival curves can also be used to compare survival in two or more groups.

 

PRESENTING PROBLEMS

Presenting Problem 1

Lung cancer is the leading cause of cancer deaths in men and in women between 15 and 64 years of age. Small-cell lung cancer accounts for 20–25% of all cases of lung cancer. At the time of diagnosis, 40% of the patients with small-cell cancer have disease confined to the thorax (limited disease) and 60% have metastases outside of the thorax (extensive disease). Current standard chemotherapy for extensive disease using a combination of cisplatin and etoposide yields a median survival of 8–10 months and a 2-year survival rate of 10%. Preliminary studies using a combination of cisplatin with irinotecan resulted in a median survival of 13.2 months. For this reason, Noda and colleagues (2002) at the Japan Clinical Oncology Group conducted a prospective, randomized clinical trial to compare irinotecan plus cisplatin with etoposide plus cisplatin. The primary endpoint was overall survival. Secondary endpoints included rates of complete and overall response. A complete response was defined as the disappearance of all clinical and radiologic evidence of a tumor for at least 4 weeks.

Over a 3-year period, 154 patients with histologically confirmed small-cell cancer and extensive disease (defined as distant metastasis or contralateral hilar-node metastasis) were enrolled. All patients were evaluated weekly. Tumor response was assessed by chest radiograph and chest CT.

We use data from this study to illustrate methods for analyzing survival data. A sample of 26 patients is used to show actual calculations, but results from the entire study are also given. Data from the study are available in a data set entitled “Noda” on the CD-ROM.

Presenting Problem 2

In the United States the formation of renal stones (nephrolithiasis) occurs with an annual incidence of 7–21 per 10,000. Nephrolithiasis is complicated by obstruction, infection, and severe pain; and 5–10% of all cases may require hospitalization. Studies of the natural history of this disease show that 50% of patients will have a recurrence in 5 years. Seventy percent of stones are calcium oxalate, and idiopathic hypercalciuria is an important, common risk factor for the formation of stones.

Previous studies have shown that a low calcium intake reduces urinary calcium excretion but can cause a deficiency of calcium and an increase in urinary oxalate. A low animal protein and low-salt diet decreases urinary excretion of calcium and oxalate and may lower the endogenous synthesis of oxalate. Borghi and colleagues (2002) compared the efficacy of a traditional low-calcium diet with a diet containing a normal amount of calcium but reduced amounts of animal protein and salt in the prevention of recurrent stone formation. They enrolled 120 men with idiopathic hypercalciuria and a history of recurrent formation of calcium oxalate stones on at least two occasions in a 5-year randomized trial. One group consisting of 60 men consumed a diet containing a normal amount of calcium (30 mmol/day) but reduced amounts of animal protein and salt. The other group of 60 men followed a traditional low-calcium diet that contained 10 mmol calcium/day. Twenty-four hour urine collections were obtained at baseline, 1 week after randomization, and at yearly intervals; they were used to estimate liquid consumption, and salt, total protein, and animal protein intake. Calcium and oxalate excretion were measured. Renal ultrasound and abdominal flat-plate examinations were performed at yearly intervals. The primary outcome measure was the time to the first recurrence of symptomatic renal stone or presence of radiographically identified stone. Recurrences were classified as either silent or symptomatic. We will use data from this study to illustrate Kaplan–Meier analysis to determine the cumulative incidence of recurrent stones; data are in a file on the CD-ROM called “Borghi.”

Presenting Problem 3

Prostate-specific antigen (PSA) is a serine protease glycoprotein secreted by both normal and neoplastic prostate epithelia. It circulates in the bloodstream and can be measured by a variety of assays and consequently has become an important tool in the understanding of prostate cancer biology and growth. PSA value correlates with the stage of the prostate tumor at diagnosis and is an important prognostic variable. (See also Shipley et al, 1999, Presenting Problem 3 in Chapter 4.) An increasing PSA value after prostate cancer treatment is a sensitive indicator of relapse but does not discriminate between local recurrence and metastatic recurrence. Crook and coinvestigators (1997) studied the correlation between both the pretreatment PSA and posttreatment nadir PSA with the outcome in men with localized prostate cancer who were treated with external beam radiation therapy.

This study was a cohort study of 207 men with localized adenocarcinoma of the prostate treated with radiation therapy. Pretreatment PSA values were obtained for all of the men, and posttreatment values were obtained at 3- and 6-month intervals for 5 years and yearly thereafter. Posttreatment prostate biopsies were done at 12 months. Patients with residual tumor had repeat biopsies about every 6 months, whereas those with negative biopsy results had repeat biopsies at 36 months or if a rising PSA value occurred.

The Gleason histologic scoring system was used to classify tumors on a scale of 2 to 10. A low score indicates a well-differentiated tumor, a medium score a moderately differentiated tumor, and a high score a poorly differentiated tumor. Tumors were also classified using the TNM (tumor, node, metastasis) staging system, called the T classification. A T1 tumor is a nonpalpable tumor identified by biopsy. A T2 tumor is palpable on digital rectal examination and limited to the prostate gland. T3 and T4 tumors have invaded adjacent prostate structures, such as the bladder neck or seminal vesicle. The median age of patients was 69 years, and the median duration of follow-up was 36 months. Sixty-eight of the 207 patients had a recurrence of prostate cancer: 20 had local recurrence, 24 had nodal or metastatic recurrence, 7 had both local and distant recurrence, and 17 had biochemical recurrence (elevated PSA with negative biopsy and metastatic workup). The prognostic importance of pretreatment PSA, posttreatment nadir PSA, the Gleason score, and the T classification was examined.

We summarize this study and present the results from the Kaplan–Meier survival analysis predicting recurrence. Data on the patients are given on the CD-ROM in the data set entitled “Crook.”

PURPOSE OF THE CHAPTER

Many studies in medicine are designed to determine whether a new medication, a new treatment, or a new procedure will perform better than the one in current use. Although measures of short-term effects are of interest with efforts to provide more efficient health care, long-term outcomes, including mortality and major morbidity, are also important. Often, studies focus on comparing survival times for two or more groups of patients.

The methods of data analysis discussed in previous chapters are not appropriate for measuring length of survival for two reasons.

First, investigators frequently must analyze data before all patients have died; otherwise, it may be many years before they know which treatment is better. When analysis of survival is done while some patients in the study are still living, the observations on these patients are called censored observations, because we do not know how long these patients will remain alive. Figure 9-1 illustrates a situation in which observations on patients B and E are censored.

The second reason special methods are needed to analyze survival data is that patients do not typically begin treatment or enter the study at the same time, as they did for Figure 9-1. For example, in the cisplatin study, patients entered the study at different times. When the entry time for patients is not simultaneous and some patients are still in the study when the analysis is done, the data are said to be progressively censored. Figure 9-2 shows results for a study with progressively censored observations. The study began at time 0 months with patient A; then, patient B entered the study at time 7 months; patient C entered at time 8 months; and so on. Patients B and E were still alive at the time the data were analyzed at 40 months.
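The bookkeeping behind progressive censoring can be sketched in a few lines of Python. This is only an illustration: the function name `follow_up` is ours, and the death month used for patient A is hypothetical because Figure 9-2 is not reproduced here. Patient B's entry at 7 months and the analysis at 40 months come from the text.

```python
def follow_up(entry_month, death_month, analysis_month):
    """Return (months observed since entry, censored?) for one patient.

    death_month is None when the patient is still alive at the analysis.
    """
    if death_month is not None and death_month <= analysis_month:
        # The outcome was observed, so the full survival time is known.
        return death_month - entry_month, False
    # Still alive when the data were analyzed: the observation is censored.
    return analysis_month - entry_month, True


ANALYSIS_MONTH = 40

# Patient B entered at 7 months and was alive at the 40-month analysis.
patient_b = follow_up(entry_month=7, death_month=None, analysis_month=ANALYSIS_MONTH)

# Patient A entered at time 0; a death at month 12 is a hypothetical value.
patient_a = follow_up(entry_month=0, death_month=12, analysis_month=ANALYSIS_MONTH)

print(patient_b)  # (33, True)
print(patient_a)  # (12, False)
```

Measuring every patient from his or her own entry time is what allows patients who entered at different calendar times to be placed on a common time axis.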

Analysis of survival times is sometimes called actuarial, or life table, analysis. Historically, astronomer Edmund Halley (of Halley's Comet fame) first used life tables in the 17th century to describe survival times of residents of a town. Since then, these methods have been used in various ways. Life insurance companies use them to determine the life expectancy of individuals, and this information is subsequently used to establish premium schedules. Insurers generally use cross-sectional data about how long people of different age groups are expected to live in order to develop a current life table. In medicine, however, most studies of survival use cohort life tables, in which the same group of subjects is followed for a given period. The data for life tables may come from cohort studies (either prospective or historical) or from clinical trials; the key feature is that the same group of subjects is followed for a prescribed time.

Figure 9-1. Example of censored observations (X means patient died).

Figure 9-2. Example of progressively censored observations (X means patient died).

In this chapter, we examine two methods to determine survival curves. Actually, it is more accurate to describe them as methods to examine curves with censored data, because many times the outcome is something other than survival. The outcome studied by Borghi and colleagues (2002) was the incidence of stones in patients having idiopathic hypercalciuria who were randomized to one of two diets. Crook and colleagues (1997) studied the recurrence of prostate cancer based on different risk factors.

WHY SPECIALIZED METHODS ARE NEEDED TO ANALYZE SURVIVAL DATA

Before illustrating the methods for analyzing survival data, let us consider briefly why some intuitive methods are not very useful or appropriate. To illustrate these points, we selected a sample of 26 patients being treated for lung cancer: 13 patients taking irinotecan plus cisplatin and 13 patients taking etoposide plus cisplatin (Table 9-1).

Colton (1974, pp. 238–241) gives a creative presentation of some simple methods to analyze survival data; the arguments presented in this section are modeled on his discussion. Some methods appear at first glance to be appropriate for analyzing survival data, but closer inspection shows they are incorrect.

Suppose someone suggests calculating the mean length of time patients survive with small-cell lung cancer. Using the data on 26 patients in Table 9-1, the mean survival time for patients on irinotecan plus cisplatin is 17.51 months, and, for patients on etoposide plus cisplatin, 9.12 months. The problem is that mean survival time depends on when the data are analyzed; it will change with each passing month until the point when all the subjects have died. Therefore, mean survival estimates calculated in this way are useful only when all the subjects have died or the event being analyzed has occurred. Almost always, however, investigators wish to analyze their data prior to that time.
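The two means can be checked directly from the Months Survival column of Table 9-1. The short Python sketch below is ours (the variable names are not part of the study data set) and simply averages the 13 values in each arm.

```python
from statistics import mean

# Months of survival from Table 9-1, split by treatment arm.
irinotecan = [13.57, 11.70, 12.52, 30.65, 2.73, 25.49, 13.31,
              16.89, 10.94, 8.18, 9.72, 15.61, 56.38]
etoposide = [8.11, 5.82, 1.94, 13.34, 7.56, 6.47, 14.69,
             16.26, 9.43, 9.49, 4.86, 14.65, 5.95]

mean_irinotecan = mean(irinotecan)
mean_etoposide = mean(etoposide)

print(f"irinotecan plus cisplatin: {mean_irinotecan:.2f} months")
print(f"etoposide plus cisplatin:  {mean_etoposide:.2f} months")
```

Any patient still alive would add more months at a later analysis, which is why these means are not stable summaries of survival.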

An estimate of median length of survival time is also possible, and it can be calculated after only half of the subjects have died. Again, however, investigators often wish to evaluate the outcome prior to that time.

A concept sometimes used in epidemiology is the number of deaths per 100 person-years of observation. To illustrate, we use the observations in Table 9-1 to determine the number of person-months of survival. Regardless of whether patients are alive or dead at the end of the study, they contribute to the calculation for however long they have been in the study. Patient 1 therefore contributes 13.57 months, patient 2 contributes 11.70 months, and so on. The total number of months patients have been observed is 346.25 months; converting to years by dividing by 12 gives approximately 28.9 person-years.
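The person-time total can be reproduced with the sketch below, which is our own illustration. Because it sums the two-decimal values printed in Table 9-1, the total comes to 346.26 months, within rounding of the 346.25 quoted in the text.

```python
import math

# Months of survival for all 26 patients in Table 9-1 (rows 1 to 26).
months = [13.57, 11.70, 12.52, 30.65, 2.73, 25.49, 13.31, 16.89, 10.94,
          8.18, 9.72, 15.61, 56.38, 8.11, 5.82, 1.94, 13.34, 7.56,
          6.47, 14.69, 16.26, 9.43, 9.49, 4.86, 14.65, 5.95]

# Every patient contributes the time he or she was observed,
# whether or not the outcome has occurred.
total_months = math.fsum(months)
person_years = total_months / 12

print(f"person-months: {total_months:.2f}")
print(f"person-years:  {person_years:.2f}")
```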

One problem with using person-years of observation is that the same number is obtained by observing 1000 patients for 1 year or by observing 100 patients for 10 years. Although the number of subjects is involved in the calculation of person-years, it is not evident as an explicit part of the result; and no statistical methods are available to compare these numbers. Another problem is the inherent assumption that the risk of an event, such as death or rejection, during any one unit of time is constant throughout the study (although several other survival methods also make this assumption).

Mortality rates (see Chapter 3) are a familiar way to deal with survival data, and they are used (especially in oncology) to estimate 3- and 5-year survival with various types of medical conditions. We cannot determine a mortality rate using data on all patients until the specified length of time has passed.

Suppose we have a study with 20 patients: 10 lived at least 1 year, 4 died before 1 year, and 6 had been in the study less than a year (ie, they were censored). We have to decide what to do with the six censored patients in order to calculate a 1-year mortality rate. One solution is to divide the number who died in the first year, 4, by the total number in the study, 20, for an estimate of 0.20, or 20%. This estimate, however, is probably too low, because it assumes that none of the six patients in the study less than 1 year will die before the year is up.

An alternative solution is to ignore the patients who were not in the study for 1 year to obtain 4/(20 – 6) = 0.286, or 28.6%. This technique is similar to the approach used in cancer research in which 3- and 5-year mortality rates are based on only those patients who were in the study at least 3 or 5 years. The shortcoming of this approach is that it ignores completely the contribution of the six patients who were in the study for part of a year. We need a way to use information gained from all patients who entered the study. A reasonable approach should produce an estimate between 20% and 28.6%, which is exactly what actuarial life table analysis and Kaplan–Meier product limit methods do. They give credit for the amounts of time subjects survived up to the time when the data are analyzed.
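The two naive estimates, and one estimate that falls between them, can be compared with a few lines of Python. The third calculation is our own illustration: it borrows the half-credit convention of the actuarial method described in the next section and applies it to this hypothetical 20-patient study.

```python
n_total = 20     # patients in the study
n_died = 4       # died before 1 year
n_censored = 6   # in the study less than 1 year

# Treat censored patients as if they all survive the year (too low).
lower = n_died / n_total

# Drop the censored patients entirely (ignores their partial follow-up).
upper = n_died / (n_total - n_censored)

# Assumption for illustration: give each censored patient credit for
# half of the year, as the actuarial method does for withdrawals.
actuarial = n_died / (n_total - n_censored / 2)

print(f"lower bound:       {lower:.3f}")
print(f"upper bound:       {upper:.3f}")
print(f"half-credit value: {actuarial:.3f}")
```

The half-credit value, 4/17 or about 0.235, lies between 20% and 28.6%, as a reasonable estimate should.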

Table 9-1. Data report on a sample of 26 patients.

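| Row | Case # | Survive | Treatment | Months Survival | Progression-Free Survival | Response |
|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 1 | 13.57 | 7.39 | 1 |
| 2 | 3 | 1 | 1 | 11.70 | 4.67 | 1 |
| 3 | 6 | 1 | 1 | 12.52 | 6.64 | 1 |
| 4 | 8 | 1 | 1 | 30.65 | 10.12 | 1 |
| 5 | 9 | 1 | 1 | 2.73 | 1.87 | 0 |
| 6 | 12 | 1 | 1 | 25.49 | 11.14 | 1 |
| 7 | 13 | 1 | 1 | 13.31 | 4.30 | 1 |
| 8 | 14 | 1 | 1 | 16.89 | 7.23 | 1 |
| 9 | 16 | 1 | 1 | 10.94 | 8.02 | 1 |
| 10 | 19 | 1 | 1 | 8.18 | 5.88 | 1 |
| 11 | 20 | 1 | 1 | 9.72 | 8.84 | 1 |
| 12 | 24 | 1 | 1 | 15.61 | 5.75 | 1 |
| 13 | 26 | 1 | 1 | 56.38 | 38.87 | 1 |
| 14 | 2 | 1 | 2 | 8.11 | 8.05 | 1 |
| 15 | 4 | 1 | 2 | 5.82 | 3.12 | 1 |
| 16 | 5 | 1 | 2 | 1.94 | 1.02 | 0 |
| 17 | 7 | 1 | 2 | 13.34 | 3.71 | 1 |
| 18 | 10 | 1 | 2 | 7.56 | 4.40 | 0 |
| 19 | 15 | 1 | 2 | 6.47 | 4.57 | 1 |
| 20 | 17 | 1 | 2 | 14.69 | 11.96 | 1 |
| 21 | 18 | 1 | 2 | 16.26 | 6.54 | 1 |
| 22 | 21 | 1 | 2 | 9.43 | 7.06 | 1 |
| 23 | 22 | 1 | 2 | 9.49 | 8.08 | 1 |
| 24 | 23 | 1 | 2 | 4.86 | 2.76 | 1 |
| 25 | 25 | 1 | 2 | 14.65 | 4.67 | 1 |
| 26 | 28 | 1 | 2 | 5.95 | 2.40 | 1 |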

Source: Data, used with permission, from Noda K, Nishiwaki Y, Kawahara M, Negoro S, Sugiura T, Yokoyama A, et al: Irinotecan plus cisplatin compared with etoposide plus cisplatin for extensive small-cell lung cancer. N Engl J Med 2002; 346: 85–91. Table produced with NCSS, used with permission.

ACTUARIAL, OR LIFE TABLE, ANALYSIS

Actuarial, or life table, analysis is also sometimes referred to in the medical literature as the Cutler–Ederer method (1958). The actuarial method is not computationally overwhelming and, at one time, was the predominant method used in medicine. The availability of computers makes it far less often used today, however, than the Kaplan–Meier product limit method discussed in the next section.

We briefly illustrate the calculations involved in actuarial analysis by arranging the 13 patients on etoposide plus cisplatin according to the length of time they had no progression of their disease (Table 9-2). We use the observations in Table 9-2 to produce Table 9-3. The time intervals are arbitrary but should be selected so that the number of censored observations in any interval is small; we group by 3-month intervals.

The column headed ni in Table 9-3 is the number of patients in the study at the beginning of the interval; all patients (13) began the study, so n1 is 13. Three patients did not complete the first time interval: patients 5, 28, 23. Of these, one patient's disease progressed (patient 5), referred to as a terminal event (d1); the remaining two patients are referred to as withdrawals (w1).

The actuarial method assumes that patients withdraw randomly throughout the interval; therefore, on the average, they withdraw halfway through the time represented by the interval. In a sense, this method gives patients who withdraw credit for being in the study for half of the period. One-half of the number of patients withdrawing is subtracted from the number beginning the interval, so the denominator used to calculate the proportion having a terminal event is reduced by half of the number who withdraw during the period, 13 – (½ × 2), or 12 in our example. The proportion terminating is 1/12 = 0.0833. The proportion surviving is 1 – 0.0833 = 0.9167, and, because we are still in the first period, the cumulative survival is also 0.9167.

Table 9-2. Survival of a sample of patients in the etoposide plus cisplatin arm.

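Filter: trt_arm = 2

| Row | Case # | Survive | Treatment | Progression-Free Survival | Response |
|---|---|---|---|---|---|
| 1 | 5 | 1 | 2 | 1.02 | 0 |
| 3 | 28 | 1 | 2 | 2.40 | 1 |
| 4 | 23 | 1 | 2 | 2.76 | 1 |
| 5 | 4 | 1 | 2 | 3.12 | 1 |
| 6 | 7 | 1 | 2 | 3.71 | 1 |
| 8 | 10 | 1 | 2 | 4.40 | 0 |
| 9 | 15 | 1 | 2 | 4.57 | 1 |
| 11 | 25 | 1 | 2 | 4.67 | 1 |
| 14 | 18 | 1 | 2 | 6.54 | 1 |
| 16 | 21 | 1 | 2 | 7.06 | 1 |
| 20 | 2 | 1 | 2 | 8.05 | 1 |
| 21 | 22 | 1 | 2 | 8.08 | 1 |
| 25 | 17 | 1 | 2 | 11.96 | 1 |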

Source: Data, used with permission, from Noda K, Nishiwaki Y, Kawahara M, Negoro S, Sugiura T, Yokoyama A, et al: Irinotecan plus cisplatin compared with etoposide plus cisplatin for extensive small-cell lung cancer. N Engl J Med 2002; 346: 85–91. Table produced with NCSS, used with permission.

 

At the beginning of the second interval, only ten patients remain. During the second period, four patients withdrew and one's disease progressed (patient 10), so d2 = 1 and w2 = 4. The proportion terminating at the second interval is not 1/10 because, although ten patients began the interval, four patients withdrew, giving 10 – 2 for the denominator. In our example, the proportion terminating during the second period is 1/[10 – (4/2)] = 1/8, or 0.1250. Again, the proportion with no progression is 1 – 0.1250, or 0.8750, and the cumulative proportion is 0.9167 × 0.8750, or 0.8021. This computation procedure continues until the table is completed.

Table 9-3. Life table for sample of 13 patients treated with etoposide plus cisplatin.

Life Table Survival Variable: Progression-Free Survival

| Interval Start Time | Number Entering this Interval (ni) | Number Withdrawn During Interval (wi) | Number Exposed to Risk | Number of Terminal Events (di) | Propn Terminating, qi = di/[ni – (wi/2)] | Propn Surviving, pi = 1 – qi | Cumul Propn Surv at End, si = pi × p(i–1) × … × p1 |
|---|---|---|---|---|---|---|---|
| 0.0 | 13.0 | 2.0 | 12.0 | 1.0 | 0.0833 | 0.9167 | 0.9167 |
| 3.0 | 10.0 | 4.0 | 8.0 | 1.0 | 0.1250 | 0.8750 | 0.8021 |
| 6.0 | 5.0 | 4.0 | 3.0 | 0.0 | 0.0000 | 1.0000 | 0.8021 |
| 9.0 | 1.0 | 1.0 | 0.5 | 0.0 | 0.0000 | 1.0000 | 0.8021 |

Source: Data, used with permission, from Noda K, Nishiwaki Y, Kawahara M, Negoro S, Sugiura T, Yokoyama A, et al: Irinotecan plus cisplatin compared with etoposide plus cisplatin for extensive small-cell lung cancer. N Engl J Med 2002; 346: 85–91. Table produced with NCSS, used with permission.

Note that pi is the probability of surviving interval i only; and to survive interval i, a patient must have survived all previous intervals as well. Thus, pi is an example of a conditional probability because the probability of surviving interval i is dependent, or conditional, on surviving until that point. This probability is sometimes called the survival function. Recall from Chapter 4 that if one event is conditional on a previous event, the probability of their joint occurrence is found by multiplying the probability of the conditional event by the probability of the previous event. The cumulative probability of surviving interval i plus all previous intervals is therefore found by multiplying pi by pi-1, pi-2, …, p1.
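The life table arithmetic can be sketched in Python as follows. This is a minimal illustration rather than the NCSS procedure used to produce Table 9-3; the function name `life_table` is ours, and we assume that Response = 0 in Table 9-2 marks a progression (the terminal event) while every other patient counts as a withdrawal, which matches the counts described in the text.

```python
# (progression-free months, progressed?) for the 13 patients in Table 9-2.
# Assumption: Response = 0 is coded here as progressed=True.
PFS = [(1.02, True), (2.40, False), (2.76, False), (3.12, False),
       (3.71, False), (4.40, True), (4.57, False), (4.67, False),
       (6.54, False), (7.06, False), (8.05, False), (8.08, False),
       (11.96, False)]


def life_table(observations, width):
    """Actuarial life table with fixed-width intervals.

    Returns rows of (start, n, w, exposed, d, q, p, cumulative).
    """
    rows = []
    n_entering = len(observations)
    cumulative = 1.0
    start = 0.0
    while n_entering > 0:
        leaving = [e for t, e in observations if start <= t < start + width]
        d = sum(leaving)              # terminal events in the interval
        w = len(leaving) - d          # withdrawals in the interval
        exposed = n_entering - w / 2  # withdrawals get half credit
        q = d / exposed               # proportion terminating
        p = 1 - q                     # proportion surviving the interval
        cumulative *= p               # product of conditional probabilities
        rows.append((start, n_entering, w, exposed, d, q, p, cumulative))
        n_entering -= d + w
        start += width
    return rows


table = life_table(PFS, width=3)
for start, n, w, exposed, d, q, p, s in table:
    print(f"{start:4.1f} {n:3d} {w:2d} {exposed:5.1f} {d:2d} {q:.4f} {p:.4f} {s:.4f}")
```

With 3-month intervals the sketch reproduces the four rows of Table 9-3, including the cumulative proportion of 0.8021.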

The results from an actuarial analysis can help answer questions that may help clinicians counsel patients or their families. For example, we might ask, If X is the length of time survived by a patient selected at random from the population represented by these patients, what is the probability that X is 6 months or greater? From Table 9-3, the probability is 0.80, or 4 out of 5, that a patient will live for at least 6 months.

Journal articles rarely present the results from life table analysis as we have in Table 9-3; rather, the results are usually presented in a survival curve. The line in Figure 9-3 is a survival curve for the sample of 13 patients on etoposide plus cisplatin.

The actuarial method involves two assumptions about the data. The first is that all withdrawals during a given interval occur, on average, at the midpoint of the interval. This assumption is of less consequence when short time intervals are analyzed; however, considerable bias can occur if the intervals are large, if many withdrawals occur, and if withdrawals do not occur midway in the interval. The Kaplan–Meier method introduced in the next section overcomes this problem. The second assumption is that, although survival in a given period depends on survival in all previous periods, the probability of survival at one period is treated as though it is independent of the probability of survival at others. This condition, although probably violated somewhat in much medical research, does not appear to cause major concern to biostatisticians.

Figure 9-3. Life table survival plot of a sample of patients in the etoposide plus cisplatin arm. (Data, used with permission, from Noda K, Nishiwaki Y, Kawahara M, Negoro S, Sugiura T, Yokoyama A, et al: Irinotecan plus cisplatin compared with etoposide plus cisplatin for extensive small-cell lung cancer. N Engl J Med 2002; 346: 85–91. Survival plot produced with NCSS, used with permission.)

KAPLAN–MEIER PRODUCT LIMIT METHOD

The Kaplan–Meier method of estimating survival is similar to actuarial analysis except that time since entry in the study is not divided into intervals for analysis. Depending on the number of patients who died, the Kaplan–Meier product limit method, commonly called Kaplan–Meier curves, may involve fewer calculations than the actuarial method, primarily because survival is estimated each time a patient dies, so withdrawals are ignored. We will illustrate with data from Noda and colleagues (2002) using the same subset of patients as with life table analysis, patients on etoposide plus cisplatin.

The first step is to list the times when a death or dropout occurs, as in the column “Event Time” in Table 9-4. One patient's disease progressed at 1 month and another at 4.4 months, and they are listed under the column “Number of Events.” Then, each time an event or outcome occurs, the mortality, survival, and cumulative survival are calculated in the same manner as with the life table method. If the table is published in an article, it is often formatted in an abbreviated form, such as in Table 9-5.

Note that the Kaplan–Meier procedure gives exact survival proportions because it uses exact survival times; the actuarial method gives approximations because it groups survival times into intervals. Prior to the widespread use of computers, the actuarial method was much easier to use for a very large number of observations.
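A minimal Python sketch of the product limit calculation follows. It is our own illustration, not the program used for Table 9-4, and the function name `kaplan_meier` is hypothetical. As in the life table example, we assume Response = 0 in Table 9-2 marks progression and treat all other patients as censored.

```python
# (progression-free months, progressed?) for the 13 patients in Table 9-2.
# Assumption: Response = 0 is coded here as progressed=True.
PFS = [(1.02, True), (2.40, False), (2.76, False), (3.12, False),
       (3.71, False), (4.40, True), (4.57, False), (4.67, False),
       (6.54, False), (7.06, False), (8.05, False), (8.08, False),
       (11.96, False)]


def kaplan_meier(observations):
    """Return (time, number at risk, events, cumulative survival)
    for each time at which at least one event occurs."""
    ordered = sorted(observations)
    n_at_risk = len(ordered)
    survival = 1.0
    curve = []
    i = 0
    while i < len(ordered):
        t = ordered[i][0]
        tied = [e for u, e in ordered if u == t]  # everyone leaving at time t
        d = sum(tied)                             # events at time t
        if d > 0:
            # Survival changes only when an event occurs.
            survival *= 1 - d / n_at_risk
            curve.append((t, n_at_risk, d, survival))
        # Events and censored patients both leave the risk set.
        n_at_risk -= len(tied)
        i += len(tied)
    return curve


curve = kaplan_meier(PFS)
for t, n, d, s in curve:
    print(f"t = {t:5.2f}  at risk = {n:2d}  events = {d}  S = {s:.4f}")
```

Censored patients never change the estimate directly; they only reduce the number at risk for later events, which is why the second step uses 8 in the denominator.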

Typically, as the interval from entry into the study becomes longer, the number of patients who remain in the study becomes increasingly smaller. This means that the standard deviation of the estimate of the proportion surviving gets increasingly larger over time. Sometimes the number of patients remaining in the study is printed under the time line (as in Figures 1 and 2 in the article by Noda et al). Some authors provide graphs with dashed lines on either side of the survival curve that represent 95% confidence bands for the curve. The confidence limits become wider as time progresses, reflecting decreased confidence in the estimate of the proportion as the sample size decreases. These practices are desirable, but not all computer programs provide them.

Table 9-4. Kaplan–Meier survival curve in detail for patients on etoposide plus cisplatin.

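| Event Time (T) | Number at Risk, ni | Number of Events, di | Mortality, qi = di/ni | Survival, pi = 1 – qi | Cumulative Survival, S = pi × p(i–1) × … × p2 × p1 |
|---|---|---|---|---|---|
| 1.0 | 13 | 1 | 0.0769 | 0.9231 | 0.9231 |
| 2.4 | 12 | | | | |
| 2.8 | 11 | | | | |
| 3.1 | 10 | | | | |
| 3.7 | 9 | | | | |
| 4.4 | 8 | 1 | 0.1250 | 0.8750 | 0.8077 |
| 4.6 | 7 | | | | |
| 4.7 | 6 | | | | |
| 6.5 | 5 | | | | |
| 7.1 | 4 | | | | |
| 8.0 | 3 | | | | |
| 8.1 | 2 | | | | |
| 12.0 | 1 | | | | |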

Source: Data, used with permission, from Noda K, Nishiwaki Y, Kawahara M, Negoro S, Sugiura T, Yokoyama A, et al: Irinotecan plus cisplatin compared with etoposide plus cisplatin for extensive small-cell lung cancer. N Engl J Med 2002; 346: 85–91. Table produced with NCSS, used with permission.

To illustrate confidence bands, we analyze actual survival for all patients in both treatment arms. The procedure for obtaining confidence bands uses the standard error of the cumulative survival estimate Si:

SE(Si) = Si × √{Σ dj/[nj(nj – dj)]}, where the sum is taken over all event times up to and including time i

For example, at month 32, 10 patients taking irinotecan plus cisplatin are still in the study and 1 patient's disease has progressed, so

d/[n(n – d)] = 1/(10 × 9) = 0.0111

and the standard error is

SE = 0.9000 × √0.0111 = 0.0949

The remaining calculations for both treatment arms are given in Table 9-6.
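The standard errors reported in Table 9-6 can be checked with the sketch below. It assumes the standard error is computed with Greenwood's formula, which reproduces the printed values for the irinotecan plus cisplatin arm; the function name `greenwood` and the use of 1.96 for approximate 95% limits are our choices, and the limits are simply truncated to the range 0 to 1.

```python
import math

# (number at risk just before the death, deaths) at each death time in the
# irinotecan plus cisplatin arm of Table 9-6, from month 32.6 onward.
DEATHS = [(10, 1), (9, 1), (6, 1), (5, 1), (4, 1), (3, 1), (2, 1)]


def greenwood(steps, z=1.96):
    """Return (survival, standard error, lower limit, upper limit) per step."""
    survival = 1.0
    running_sum = 0.0
    results = []
    for n, d in steps:
        if n == d:
            # Everyone at risk has the event: survival drops to 0 and the
            # formula is undefined, so report zeros and stop.
            results.append((0.0, 0.0, 0.0, 0.0))
            break
        survival *= (n - d) / n
        running_sum += d / (n * (n - d))
        se = survival * math.sqrt(running_sum)
        lower = max(0.0, survival - z * se)
        upper = min(1.0, survival + z * se)
        results.append((survival, se, lower, upper))
    return results


bands = greenwood(DEATHS)
for s, se, lo, hi in bands:
    print(f"S = {s:.4f}  SE = {se:.4f}  95% limits = ({lo:.3f}, {hi:.3f})")
```

The limits widen as the number at risk shrinks, which is the pattern described above for confidence bands late in follow-up.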

Figure 9-4 is a graph of the Kaplan–Meier product limit curve for all patients on irinotecan plus cisplatin illustrating 95% confidence bands. In this graph, the curve is step-like because the proportion of patients surviving changes precisely at the points when a subject dies.

Table 9-5. Kaplan–Meier survival curve in abbreviated form for patients on etoposide plus cisplatin.

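| Event Time (T) | Number at Risk, ni | Number of Events, di | Mortality, qi = di/ni | Survival, pi = 1 – qi | Cumulative Survival, S = pi × p(i–1) × … × p2 × p1 |
|---|---|---|---|---|---|
| 1.0 | 13 | 1 | 0.0769 | 0.9231 | 0.9231 |
| 4.4 | 8 | 1 | 0.1250 | 0.8750 | 0.8077 |
| . | | | | | |
| . | | | | | |
| . | | | | | |
| 12.0 | | | | | |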

Source: Data, used with permission, from Noda K, Nishiwaki Y, Kawahara M, Negoro S, Sugiura T, Yokoyama A, et al: Irinotecan plus cisplatin compared with etoposide plus cisplatin for extensive small-cell lung cancer. N Engl J Med 2002; 346: 85–91. Table produced with NCSS, used with permission.

COMPARING TWO SURVIVAL CURVES

Although some journal articles report survival for only one group, more often investigators wish to compare two or more samples of patients. Table 9-6 contains the analysis of survival for the entire sample of 154 patients, separately by each treatment arm (Noda et al, 2002).

The Kaplan–Meier survival curves for both treatment arms are given in Figure 9-5. It is difficult to tell by looking whether the two curves are significantly different. We cannot make judgments simply on the basis of the amount of separation between two lines; a small difference may be statistically significant if the sample size is large, and a large difference may not if the sample size is small. As you might suspect, we need to perform a statistical test to evaluate the degree of any differences. Use the CD-ROM and replicate our analysis to produce the survival curves.

We need special methods to compare survival distributions. If no censored observations occur, the Wilcoxon rank sum test introduced in Chapter 6 is appropriate for comparing the ranks of survival time. The independent-groups t test is not appropriate because survival times are not normally distributed and tend to be positively skewed (extremely so, in some cases).

If some observations are censored, several methods may be used to compare survival curves. Most articles in the medical literature report a comparison of survival curves using the logrank statistic or the Mantel–Haenszel chi-square statistic. The computations for all of the methods are very time-consuming, and computer programs are readily available. We illustrate the logrank and Mantel-Haenszel methods; both methods are straightforward, if computationally onerous, and are useful in helping us understand the logic behind the method. Within the context of the logrank statistic, we illustrate the hazard ratio, a useful descriptive statistic for comparing two groups at risk.

The Logrank Test

Several forms of the logrank statistic have been published by different biostatisticians, so it is called by several different names in the literature: the Mantel logrank statistic, the Cox–Mantel logrank statistic, and simply the logrank statistic. The logrank test compares the number of observed deaths in each group with the number of deaths that would be expected based on the number of deaths in the combined groups, that is, if group membership did not matter. An approximate chi-square test is used to test the significance of a mathematical expression involving the observed and expected number of deaths.
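The observed-versus-expected logic can be sketched in Python. Several assumptions apply: the two small groups below are hypothetical values made up for illustration and are not data from any study in this chapter; the sketch works with exact event times rather than grouped intervals; and it uses one simple form of the statistic, the sum of (O – E)²/E over the two groups, which is only one of the published versions.

```python
def logrank(group_a, group_b):
    """Each group is a list of (time, event) pairs; event is True for a
    death and False for a censored observation.

    Returns (observed deaths, expected deaths, chi-square) with the
    observed and expected counts listed as [group A, group B].
    """
    groups = (group_a, group_b)
    event_times = sorted({t for t, e in group_a + group_b if e})
    observed = [0, 0]
    expected = [0.0, 0.0]
    for t in event_times:
        # Patients still under observation at time t are at risk.
        at_risk = [sum(1 for u, _ in g if u >= t) for g in groups]
        deaths = [sum(1 for u, e in g if u == t and e) for g in groups]
        total_deaths = sum(deaths)
        total_at_risk = sum(at_risk)
        for k in (0, 1):
            observed[k] += deaths[k]
            # If group membership did not matter, deaths would be shared
            # in proportion to the number at risk in each group.
            expected[k] += total_deaths * at_risk[k] / total_at_risk
    chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return observed, expected, chi_square


# Hypothetical follow-up times in months, for illustration only.
group_a = [(2, True), (4, True), (6, False)]
group_b = [(1, True), (3, True), (5, True)]

observed, expected, chi_square = logrank(group_a, group_b)
print(observed, [round(e, 3) for e in expected], round(chi_square, 3))
```

For these made-up data the statistic is about 0.48, far below 3.84, the critical value of chi-square with 1 degree of freedom at the 0.05 level, so the two tiny groups would not be judged different.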

To illustrate the logrank test, we use the data from Borghi and colleagues (2002) in which 60 patients received a low-calcium diet and 60 received a normal-calcium, low-protein, low-salt diet. The times to recurrence and outcomes are given in Table 9-7, and the time-to-event curves are shown in Figure 9-6. We grouped the data for the entire sample of 120 patients in Table 9-7; the steps for calculating the logrank statistic follow.

1. The second and third columns contain the number of patients in each group who were at risk of developing stones during the time interval. Thus, at 0–10 months, all 60 patients in each sample were at risk. At 11–20 months, patients who developed stones in the first 10 months or were censored (were not in the study that long) are subtracted to obtain the number of patients still at risk, resulting in 55 and 54, respectively. In column 4, the total number at risk in the combined samples is given; that is, the sum of columns 2 and 3. This calculation continues through all periods.

Table 9-6. Survival analysis for OSM (months of survival) in both treatment arms.

Factor TRT_ARM = Irinotecan + cisplatin

Time        Status   Cumulative Survival   Standard Error   Cumulative Events   Number Remaining
0.5585216   Alive                                           0                   76
2.135524    Alive                                           0                   75
.
.
.
29.99589    Alive                                           0                   11
30.65298    Alive                                           0                   10
32.55852    Dead     .9000                 .0949            1                   9
33.31417    Dead     .8000                 .1265            2                   8
34.20123    Alive                                           2                   7
37.65092    Alive                                           2                   6
44.61602    Dead     .6667                 .1610            3                   5
44.71458    Dead     .5333                 .1755            4                   4
44.78029    Dead     .4000                 .1751            5                   3
52.76386    Dead     .2667                 .1596            6                   2
53.38809    Dead     .1333                 .1235            7                   1
56.37782    Alive                                           7                   0

Number of Cases: 77  Censored: 70 (90.91%)  Events: 7

Survival Analysis for OSM: Months of Survival
Factor TRT_ARM = Etoposide + cisplatin

Time        Status   Cumulative Survival   Standard Error   Cumulative Events   Number Remaining
1.938398    Alive                                           0                   76
2.299795    Alive                                           0                   75
.
.
.
19.28542    Alive                                           0                   5
21.74949    Alive                                           0                   4
35.21971    Dead     .7500                 .2165            1                   3
38.57084    Alive                                           1                   2
40.34497    Dead     .3750                 .2864            2                   1
54.63655    Dead     .0000                 .0000            3                   0

Number of Cases: 77  Censored: 74 (96.10%)  Events: 3

Survival Analysis for OSM: Months of Survival

                                   Total   Number Events   Number Censored   Percent Censored
TRT_ARM   Irinotecan + cisplatin   77      7               70                90.91
TRT_ARM   Etoposide + cisplatin    77      3               74                96.10
Overall                            154     10              144               93.51

Source: Data, used with permission, from Noda K, Nishiwaki Y, Kawahara M, Negoro S, Sugiura T, Yokoyama A, et al: Irinotecan plus cisplatin compared with etoposide plus cisplatin for extensive small-cell lung cancer. N Engl J Med 2002; 346: 85–91. Table produced with NCSS, used with permission.

Figure 9-4. Kaplan–Meier curve with 95% confidence limits for patients on irinotecan plus cisplatin. (Data, used with permission, from Noda K, Nishiwaki Y, Kawahara M, Negoro S, Sugiura T, Yokoyama A, et al: Irinotecan plus cisplatin compared with etoposide plus cisplatin for extensive small-cell lung cancer. N Engl J Med 2002; 346: 85–91. Survival plot produced with NCSS, used with permission.)

2. In columns 5 through 7, the number of patients in each group who relapse during that interval and the total number are listed. Thus, at 0–10 months, two patients in each group developed stones, whereas three patients on the low-calcium diet and four in the other group were censored because they were in the study less than 11 months. This calculation continues through all periods.

3. The last three columns contain the expected number of relapses for each group and the total at each period. The expected number of relapses for a given group is found by multiplying the total number of relapses in a given period by the proportion of patients in that group. For example, at 31–40 months, 44 patients remain in group 1 and 46 in group 2, for a total of 90. Four relapses are noted; so 4 × (44/90) is the number of relapses expected to occur in group 1, and 4 × (46/90) is the number of relapses expected in group 2. This calculation is done for all periods.

4. The totals are calculated for each column.

The following expression can be used to test the null hypothesis that the survival distributions are the same in the two groups:

$$\chi^2 = \frac{(O_1 - E_1)^2}{E_1} + \frac{(O_2 - E_2)^2}{E_2}$$

where O1 is the total number of observed failures in group 1, E1 is the total number of expected failures in group 1, and so forth. The statistic χ2 follows an approximate chi-square distribution with 1 degree of freedom. In our example, using the totals in Table 9-8, the calculation is

$$\chi^2 = \frac{(23 - 17.07)^2}{17.07} + \frac{(12 - 17.93)^2}{17.93} = 2.06 + 1.96 = 4.02$$

Figure 9-5. Kaplan–Meier survival curves for patients in both treatment arms. (Data, used with permission, from Noda K, Nishiwaki Y, Kawahara M, Negoro S, Sugiura T, Yokoyama A, et al: Irinotecan plus cisplatin compared with etoposide plus cisplatin for extensive small-cell lung cancer. N Engl J Med 2002; 346: 85–91. Survival plot produced with NCSS, used with permission.)

 

The chi-square distribution with 1 degree of freedom in Table A–5 indicates that a critical value of 3.841 is required for significance at 0.05. Our value of 4.02 (the sum shown in Table 9-8) exceeds this critical value, so we conclude that a statistically significant difference exists in the distributions of time to recurrence of stones for patients on the two diets.
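The four steps and the chi-square expression can be checked with a short script. The following is a minimal Python sketch, not the procedure used by NCSS or SPSS: it works only from the grouped 10-month counts in Table 9-8, and the function names `grouped_logrank` and `chi_square_p_value_1df` are illustrative. The P value relies on the fact that a chi-square variable with 1 degree of freedom is a squared standard normal variable.

```python
import math

# Grouped counts from Table 9-8, one tuple per 10-month interval:
# (at risk in group 1, at risk in group 2,
#  recurrences in group 1, recurrences in group 2)
INTERVALS = [
    (60, 60, 2, 2),  # 0-10 months
    (55, 54, 5, 5),  # 11-20 months
    (49, 47, 3, 2),  # 21-30 months
    (44, 46, 3, 1),  # 31-40 months
    (39, 43, 6, 1),  # 41-50 months
    (32, 41, 4, 1),  # 51-60 months
]


def grouped_logrank(intervals):
    """Return (O1, O2, E1, E2, chi_square) for the grouped approximation."""
    o1 = o2 = e1 = e2 = 0.0
    for n1, n2, d1, d2 in intervals:
        total_events = d1 + d2
        total_at_risk = n1 + n2
        o1 += d1
        o2 += d2
        # Expected events: total events times the group's share of those at risk.
        e1 += total_events * n1 / total_at_risk
        e2 += total_events * n2 / total_at_risk
    chi_square = (o1 - e1) ** 2 / e1 + (o2 - e2) ** 2 / e2
    return o1, o2, e1, e2, chi_square


def chi_square_p_value_1df(x):
    """Upper-tail probability for a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


o1, o2, e1, e2, chi_square = grouped_logrank(INTERVALS)
p_value = chi_square_p_value_1df(chi_square)
print(f"O1 = {o1:.0f}, E1 = {e1:.5f}; O2 = {o2:.0f}, E2 = {e2:.5f}")
print(f"chi-square = {chi_square:.3f}, P = {p_value:.3f}")
```

The totals should agree with the last row of Table 9-8, and the statistic with the sum of about 4.02 shown there. Because the counts are grouped into intervals, the result is only an approximation to the value a statistical program reports from the individual event times.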

Computer programs that calculate the logrank statistic do so without dividing the recurrences, or failures, into periods. Instead, they calculate the observed and expected number of failures at each time that a patient dies or is censored, and the result is more accurate—but even more computationally intensive—than the approach we used. Use the CD-ROM and find the value of the logrank statistic for this sample. Is it in close agreement with our calculations?

The Hazard Ratio

One benefit of calculating the logrank statistic is that the hazard ratio can easily be calculated from the information given in Table 9-8. It is estimated by O1/E1 divided by O2/E2. In our example, the hazard ratio, or risk of recurrence of stones, in patients who were on the low-calcium diet compared with patients on the normal-calcium, low-protein, low-salt diet is

$$\text{Hazard ratio} = \frac{O_1/E_1}{O_2/E_2} = \frac{23/17.07}{12/17.93} = \frac{1.35}{0.67} = 2.01$$

The hazard ratio of 2.01 can be interpreted in a manner similar to the odds ratio: The risk of recurrence of stones at any time in the group on the traditional low-calcium diet is approximately twice the risk in the group on the normal-calcium, low-protein, low-salt diet. Using the hazard ratio assumes that the ratio of the hazards in the two groups, that is, the relative risk of the outcome, is the same throughout the time of the study; we will discuss the concept of hazard again in the next chapter.
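As a quick check on the arithmetic, the estimate can be reproduced from the totals in Table 9-8. This is a small illustrative sketch; the function name `hazard_ratio` is an assumption made for the example, not part of any statistical package.

```python
def hazard_ratio(observed_1, expected_1, observed_2, expected_2):
    """Estimate the hazard ratio as (O1/E1) divided by (O2/E2)."""
    return (observed_1 / expected_1) / (observed_2 / expected_2)


# Totals from Table 9-8: group 1 is the low-calcium diet,
# group 2 the normal-calcium, low-protein, low-salt diet.
hr = hazard_ratio(23, 17.07456, 12, 17.92544)
print(f"hazard ratio (group 1 vs group 2) = {hr:.2f}")
print(f"hazard ratio (group 2 vs group 1) = {1 / hr:.2f}")
```

The reciprocal expresses the same comparison in the other direction, that is, the risk on the normal-calcium diet relative to the low-calcium diet.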

 

Table 9-7. Data on outcomes for patients on a low calcium vs normal calcium diet.

Group 1: Low-Calcium Diet              Group 2: Normal-Calcium, Low-Protein & Salt Diet
Months of Follow-Up   Outcome          Months of Follow-Up   Outcome
6.00                  censored         0.23                  censored
8.00                  censored         0.23                  censored
8.00                  recurrence       0.23                  censored
9.00                  recurrence       8.00                  recurrence
10.00                 censored         10.00                 censored
11.00                 recurrence       10.00                 recurrence
12.00                 recurrence       12.00                 recurrence
12.00                 recurrence       13.00                 recurrence
12.00                 recurrence       14.00                 recurrence
12.23                 censored         15.00                 recurrence
18.00                 recurrence       16.00                 recurrence
21.00                 censored         18.00                 censored
23.00                 recurrence       18.00                 censored
24.00                 recurrence       24.00                 recurrence
24.23                 censored         26.00                 recurrence
26.00                 recurrence       36.00                 recurrence
33.00                 recurrence       40.00                 censored
36.00                 recurrence       43.00                 censored
36.23                 censored         48.00                 recurrence
36.23                 censored         51.00                 recurrence
40.00                 recurrence       60.00                 censored
43.00                 recurrence       60.00                 censored
46.00                 recurrence       60.00                 censored
46.00                 recurrence       60.00                 censored
48.00                 recurrence       60.00                 censored
48.00                 recurrence       60.00                 censored
48.00                 recurrence       60.00                 censored
48.23                 censored         60.00                 censored
52.00                 recurrence       60.00                 censored
56.00                 recurrence       60.00                 censored
60.00                 censored         60.00                 censored
60.00                 censored         60.00                 censored
60.00                 recurrence       60.00                 censored
60.00                 censored         60.00                 censored
60.00                 censored         60.00                 censored
60.00                 censored         60.00                 censored
60.00                 censored         60.00                 censored
60.00                 censored         60.00                 censored
60.00                 censored         60.00                 censored
60.00                 recurrence       60.00                 censored
60.00                 censored         60.00                 censored
60.00                 censored         60.00                 censored
60.00                 censored         60.00                 censored
60.00                 censored         60.00                 censored
60.00                 censored         60.00                 censored
60.00                 censored         60.00                 censored
60.00                 censored         60.00                 censored
60.00                 censored         60.00                 censored
60.00                 censored         60.00                 censored
60.00                 censored         60.00                 censored
60.00                 censored         60.00                 censored
60.00                 censored         60.00                 censored
60.00                 censored         60.00                 censored
60.00                 censored         60.00                 censored
60.00                 censored         60.00                 censored
60.00                 censored         60.00                 censored
60.00                 censored         60.00                 censored
60.00                 censored         60.00                 censored
60.00                 censored         60.00                 censored
60.00                 censored         60.00                 censored

Source: Data, used with permission, from Borghi L, Schianchi T, Meschi T, Guerra A, Allegri F, Maggiore U, et al: Comparison of two diets for the prevention of recurrent stones in idiopathic hypercalciuria. N Engl J Med 2002; 346: 77–88. Listing produced with SPSS Inc.; used with permission.

 

The Mantel–Haenszel Chi-Square Statistic

Another method for comparing survival distributions is an estimate of the odds ratio developed by Mantel and Haenszel that follows (approximately) a chi-square distribution with 1 degree of freedom. The Mantel–Haenszel test combines a series of 2 × 2 tables formed at different survival times into an overall test of significance of the survival curves. The Mantel–Haenszel statistic is very useful because it can be used to compare any distributions, not simply survival curves.

Figure 9-6. Kaplan–Meier survival curve for recurrence of stones. (Data, used with permission, from Borghi L, Schianchi T, Meschi T, Guerra A, Allegri F, Maggiore U, et al: Comparison of two diets for the prevention of recurrent stones in idiopathic hypercalciuria. N Engl J Med 2002; 346: 77–88. Survival plot produced with NCSS, used with permission.)

We again use data from Borghi and colleagues (2002) to illustrate the calculation of the Mantel–Haenszel statistic (Table 9-9). The first step is to select the time intervals for which 2 × 2 tables will be formed; we use the 10-month intervals as before. For each interval, the number of patients who remained stone-free and the number who developed stones are the rows, and the number of patients in each group are the columns of the 2 × 2 tables.

Table 9-8. Logrank statistic for survival.

 

              Number of Patients at Risk    Number of Observed Occurrences    Number of Expected Occurrences
Time Period   Group 1   Group 2   Total     Group 1   Group 2   Total         Group 1    Group 2    Total
0–10          60        60        120       2         2         4             2          2          4
11–20         55        54        109       5         5         10            5.045872   4.954128   10
21–30         49        47        96        3         2         5             2.552083   2.447917   5
31–40         44        46        90        3         1         4             1.955556   2.044444   4
41–50         39        43        82        6         1         7             3.329268   3.670732   7
51–60         32        41        73        4         1         5             2.191781   2.808219   5
Totals                                      23        12        35            17.07456   17.92544   35

Calculations of the logrank statistic

          O-E            (O-E)²     (O-E)²/E   Sum
Group 1   5.925440437    35.11084   2.056325   4.015041
Group 2   -5.925440437   35.11084   1.958716

Group 1: Low-calcium diet

Group 2: Normal-calcium, low-protein & salt diet

Source: Data, used with permission, from Borghi L, Schianchi T, Meschi T, Guerra A, Allegri F, Maggiore U, et al: Comparison of two diets for the prevention of recurrent stones in idiopathic hypercalciuria. N Engl J Med 2002; 346: 77–88. Table produced with Microsoft Excel.

 

Table 9-9. Illustration of Mantel–Haenszel using entire sample.

 

As with the logrank test, the Mantel–Haenszel test is onerous to compute. The first step estimates a pooled odds ratio, which is useful for descriptive purposes but is not needed for the statistical test itself. The pooled odds ratio is

$$\text{OR}_{MH} = \frac{\sum (ad/n)}{\sum (bc/n)}$$

where a, b, c, d, and n are defined as they were in the 2 × 2 table in Table 6-9. The numerator and denominator are calculated in the columns under the heading “Odds Ratio.” For the first time period, the term ad/n is (58 × 2)/120 or 0.97. The sum of the terms in the numerator is 5.61 and in the denominator is 8.39. The estimate of the odds ratio using the Mantel–Haenszel approach is therefore 5.61/8.39 = 0.67. The hypothesis to be tested is whether 0.67 is significantly different from 1.

The remaining calculations focus on cell (1, 1) of the table; we first find its expected value and its variance for each 2 × 2 table. For example, at 21–30 months, among the 49 patients on the low-calcium diet who had not developed stones and who were in the study at least 21 months, 46 remained stone-free and 3 relapsed during that period. Among the 47 on the experimental diet, 45 remained stone-free and 2 relapsed. The expected values are found in the same manner as in the chi-square test discussed in Chapters 5 and 6. For example, the expected value of cell (1, 1) in this period is the row total times the column total divided by the grand total:

$$E(a) = \frac{(a+b)(a+c)}{n} = \frac{(46+45)(46+3)}{96} = \frac{91 \times 49}{96} = 46.45$$

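In addition, the variance of cell (1, 1) is calculated. Using the notation from Table 6-9, the estimated variance is

$$V(a) = \frac{(a+b)(c+d)(a+c)(b+d)}{n^2(n-1)}$$

For this period, the variance, with rounding, is

$$V(a) = \frac{91 \times 5 \times 49 \times 47}{96^2 \times 95} = 1.20$$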

After the expected value and the variance are found for each 2 × 2 table, the values are added, along with the number of observed patients in cell (1, 1) in each table. The three sums are 262, 264.78, and 6.91, as you can see in Table 9-9. The Mantel–Haenszel statistic is the squared difference between the sum of the observed numbers and the sum of the expected numbers, divided by the sum of the variances:

$$\chi^2_{MH} = \frac{\left(\sum a - \sum E(a)\right)^2}{\sum V(a)} = \frac{(262 - 264.78)^2}{6.91} = 1.12$$

This value is smaller than the value we found for the logrank test, and it is no longer significant. Which statistic should we use? It is not a straightforward choice. The value of the logrank statistic in NCSS and SPSS is 4.33, significant at P = 0.04. We give some general guidelines in the next section.
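The per-table quantities described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the book's worksheet: the contents of Table 9-9 are not reproduced here, so the example uses only numbers quoted in the text (the 21–30 month table, the first-period odds ratio term, and the three column sums), and the variance is assumed to be the usual hypergeometric form. The function names are hypothetical.

```python
def cell_expected_and_variance(a, b, c, d):
    """Expected value and variance of cell (1, 1) in one 2 x 2 table.

    a, b are the stone-free counts in groups 1 and 2;
    c, d are the counts that developed stones in groups 1 and 2.
    """
    n = a + b + c + d
    expected = (a + b) * (a + c) / n
    variance = (a + b) * (c + d) * (a + c) * (b + d) / (n ** 2 * (n - 1))
    return expected, variance


def pooled_odds_ratio(tables):
    """Mantel-Haenszel pooled odds ratio: sum of ad/n over sum of bc/n."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator


def mantel_haenszel_from_sums(sum_observed, sum_expected, sum_variance):
    """Chi-square statistic from the three column sums."""
    return (sum_observed - sum_expected) ** 2 / sum_variance


# 21-30 months: 46 and 45 stone-free, 3 and 2 relapsed (from the text).
expected, variance = cell_expected_and_variance(46, 45, 3, 2)
print(f"E(a) = {expected:.2f}, V(a) = {variance:.2f}")

# First period: the numerator term ad/n quoted in the text.
first_term = 58 * 2 / 120
print(f"ad/n for 0-10 months = {first_term:.2f}")

# Column sums quoted from Table 9-9.
chi_square_mh = mantel_haenszel_from_sums(262, 264.78, 6.91)
print(f"Mantel-Haenszel chi-square = {chi_square_mh:.2f}")
```

Given a full list of 2 × 2 tables, `pooled_odds_ratio` would return the descriptive odds ratio, and summing the outputs of `cell_expected_and_variance` would supply the inputs to `mantel_haenszel_from_sums`.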

Summary of Procedures to Compare Survival Distributions

Logrank statistics are used very frequently in the medical literature. Several logrank methods are seen in the literature, such as that developed by Peto and Peto (1972). The logrank procedure gives all calculations the same weight, regardless of the time at which an event occurs. In contrast, the Peto logrank test weights the terms (observed minus expected) by the number of patients at risk at that time, thereby giving more weight to early events when the number of patients at risk is large. Some biostatisticians choose this method because they believe that calculations based on larger sample sizes should receive more weight than calculations based on smaller sample sizes that occur later in time. On the other hand, other statisticians, such as Hintze, who is the developer of NCSS, recommend using the logrank that gives equal weighting across time unless investigators have specific reasons for deciding otherwise. If the pattern of deaths is similar over time, the Peto logrank statistic and the logrank statistic we illustrated earlier generally lead to the same conclusion. If, however, a higher proportion of deaths occurs during one interval, such as sometimes occurs early in the survival curve, the Peto logrank test and the logrank test may differ.

In truth, the information available to guide investigators in deciding which procedure is appropriate in any given application is quite complex, and, in some situations, readers of journal articles cannot determine which procedure actually was used. It is unfortunate that many of the statistical procedures used to compare survival distributions are called by a variety of names. Part of the confusion has occurred because the same biostatisticians (eg, Mantel, Gehan, Cox, Peto and Peto, Haenszel) are or have been leading researchers who developed a number of statistical tests. Another source of confusion is that research on biostatistical methods for analyzing survival data is still underway; as a result, the Mantel procedure and the Peto logrank procedure were only recently shown to be equivalent.

The Mantel–Haenszel chi-square test is referred to as the logrank test in some texts, and although it is technically different, on many occasions it leads to the same conclusion. This statistic actually may be considered an extension of the logrank test because it can be used in more general situations.

For example, the Mantel–Haenszel chi-square test can be used to combine two or more 2 × 2 tables in other situations, such as a 2 × 2 table for men and a 2 × 2 table for women. This procedure is similar to other methods to control for confounding factors, topics discussed in Chapter 10.

To summarize, all logrank tests, regardless of what they are called, and the Mantel–Haenszel chi-square test may be considered similar procedures. The Gehan and Wilcoxon tests, however, are conceptually different. The Gehan, or generalized Wilcoxon, test is an extension of the Wilcoxon rank sum test illustrated in Chapter 6, modified so that it can be used with censored observations (Gehan, 1965). This test is also referred to in the literature as the Breslow test or the generalized Kruskal–Wallis test for comparison of more than two samples (Kalbfleisch and Prentice, 2002). As with the Peto logrank test, the generalized Wilcoxon test uses the number of patients at risk as weights and therefore counts losses that occur early in the survival distribution more heavily than losses that occur late.
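The effect of the weighting can be illustrated with the grouped counts from Table 9-8. The sketch below is a rough illustration only, under several assumptions that are not from the text: it uses 10-month intervals rather than individual event times, it uses the hypergeometric variance for each interval (so the equally weighted value will not match the 4.02 obtained from the simpler observed-and-expected formula), and the function name `weighted_logrank` is hypothetical.

```python
# Grouped counts from Table 9-8, one tuple per 10-month interval:
# (at risk in group 1, at risk in group 2,
#  recurrences in group 1, recurrences in group 2)
INTERVALS = [
    (60, 60, 2, 2),
    (55, 54, 5, 5),
    (49, 47, 3, 2),
    (44, 46, 3, 1),
    (39, 43, 6, 1),
    (32, 41, 4, 1),
]


def weighted_logrank(intervals, weight):
    """Chi-square (1 df) built from weighted observed-minus-expected terms.

    `weight` maps the total number at risk in an interval to the weight
    given to that interval's (observed - expected) term.
    """
    score = 0.0
    variance = 0.0
    for n1, n2, d1, d2 in intervals:
        n = n1 + n2
        d = d1 + d2
        if n < 2 or d == 0:
            continue  # an interval with no events carries no information
        w = weight(n)
        expected_1 = d * n1 / n
        # Hypergeometric variance of the group 1 event count (an assumption).
        variance_1 = n1 * n2 * d * (n - d) / (n ** 2 * (n - 1))
        score += w * (d1 - expected_1)
        variance += w ** 2 * variance_1
    return score ** 2 / variance


equal_weights = weighted_logrank(INTERVALS, lambda n: 1.0)
at_risk_weights = weighted_logrank(INTERVALS, lambda n: float(n))
print(f"equal weights:          chi-square = {equal_weights:.2f}")
print(f"number-at-risk weights: chi-square = {at_risk_weights:.2f}")
```

In these data the two diets separate mainly in the later intervals, so weighting by the number at risk, which emphasizes early intervals, gives a smaller statistic than equal weighting.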

Another difference between these two families of tests is that the logrank statistic assumes that the ratio of hazard rates in the two groups stays the same throughout the period of interest. When a constant hazard ratio cannot be assumed, the generalized Wilcoxon procedure is preferred. In the special situation in which the hazard rates are proportional, a method called Cox's proportional hazard model can be used; it is increasingly used in the literature because it permits investigators to control for confounding variables (see Chapter 10).

As you can see, the issue is complex and illustrates the advisability of consulting a statistician when performing a survival analysis. Back to our question in the previous section: When comparing patients on the low-calcium diet with those on the normal-calcium, low-protein, low-salt diet, should we accept the result from the logrank test, which was significant, or the Mantel–Haenszel test, which was not? Looking at the distribution of recurrences across time periods in Table 9-8, a constant hazard appears to be a reasonable assumption. Therefore, we would opt for the logrank statistic and conclude the diets are different at P = 0.04, as did Borghi and colleagues (2002). Readers who want more information are referred to the Lee and Wang (2003) text, possibly the most comprehensive text available. An introductory text devoted to survival analysis is that by Kleinbaum (1996). Other texts that discuss survival methods include books by Hosmer and Lemeshow (1999), Collett (2003), Fisher and van Belle (1996), Fleiss (1981; 1999), and Schlesselman (1982).

Many of the statistical computer programs provide several test statistics for survival, but they generally provide at least one from the Gehan/Wilcoxon family and one from the logrank group. In the Kaplan–Meier procedure, NCSS gives more than ten statistics, including the logrank test, the Gehan–Wilcoxon, the Peto test, and the Mantel–Haenszel chi-square. SPSS provides the logrank test and two others. JMP gives two tests: the Wilcoxon test and the logrank test.

THE HAZARD FUNCTION IN SURVIVAL ANALYSIS

In the introduction to this chapter, we stated that calculating mean survival is generally not useful, and we subsequently illustrated how its value depends on the time when the data are analyzed. Estimates of mean survival that are reasonable can be obtained, however, when the sample size is fairly large. This procedure depends on the hazard function, which is the probability that a person dies in the time interval i to i + 1, given that the person has survived until time i. The hazard function is also called the conditional failure rate; in epidemiology, the term force of mortality is used.

Although the exponential probability distribution was not discussed in Chapter 4 when we introduced other probability distributions (ie, the normal, binomial, and Poisson), many survival curves follow an exponential distribution. It is a continuous distribution that involves the natural logarithm, ln, and it depends on a constant rate (which determines the shape of the curve) and on time. It provides a model for describing processes such as radioactive decay.

If an exponential distribution is a reasonable assumption for the shape of a survival curve, then the following formula can be used to estimate the hazard rate, symbolized by the letter H, when censored observations occur:

$$H = \frac{d}{\sum f + \sum c}$$

where d is the number of deaths, Σf is the sum of failure times, and Σc is the sum of censored times. Calculating the hazard rate requires us to add all of the failure times and censored times.

One reason the hazard rate is of interest is that its reciprocal is an estimate of mean survival time. The formulas are complex, and, fortunately, the NCSS computer program calculates the hazard function and its 95% confidence interval as part of the Kaplan–Meier analysis. Details on using the hazard function are given in the comprehensive text on survival analysis by Lee and Wang (2003).
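To make the hazard-rate formula concrete, here is a minimal Python sketch. The follow-up times are hypothetical values invented for illustration (they are not taken from any table in this chapter), and the function names are assumptions. The survival function shown is the standard exponential form, in which the probability of surviving past time t is exp(−Ht).

```python
import math


def exponential_hazard_rate(failure_times, censored_times):
    """Estimate H as deaths divided by total follow-up time."""
    deaths = len(failure_times)
    total_time = sum(failure_times) + sum(censored_times)
    return deaths / total_time


def exponential_survival(hazard_rate, t):
    """Probability of surviving beyond time t under an exponential model."""
    return math.exp(-hazard_rate * t)


# Hypothetical follow-up times in months.
failure_times = [8.0, 12.0, 23.0, 36.0]     # patients who had the event
censored_times = [10.0, 24.0, 60.0, 60.0]   # patients censored

hazard = exponential_hazard_rate(failure_times, censored_times)
mean_survival = 1.0 / hazard  # the reciprocal estimates mean survival time
print(f"H = {hazard:.4f} per month")
print(f"estimated mean survival = {mean_survival:.2f} months")
print(f"estimated S(12 months) = {exponential_survival(hazard, 12.0):.3f}")
```

With these made-up values there are 4 events in 233 patient-months of follow-up, so the estimated mean survival is 233/4 = 58.25 months. The estimate is only as good as the assumption of a constant hazard.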

INTERPRETING SURVIVAL CURVES FROM THE LITERATURE

The investigators in the study described in Presenting Problem 3 (Crook et al, 1997) reported on a group of 207 men followed prospectively after treatment for prostate carcinoma with radiotherapy. Follow-up included systematic transrectal ultrasound-guided biopsies and measurements of serum prostate-specific antigen (PSA) levels. The median duration of follow-up for the patients at the time of analysis was 36 months, with a range from 12 to 70 months. Failures were observed in 68 patients. The investigators wanted to look at the relationship between patient outcome and both the stage of the tumor at diagnosis and pretreatment PSA levels. Time-dependent variables, survival, and time to failure were examined using the Kaplan–Meier product limit method. All outcomes were calculated from the time the patient completed radiotherapy, and the curves were compared using the logrank test. Treatment failure was categorized as local, distant, or chemical. The investigators examined the relationship between multiple variables and the time until failure using the Cox proportional hazard model; we return to this study when we discuss this method in Chapter 10.

Figure 9-7. Kaplan–Meier survival curve showing disease-free survival by tumor stage. (Used, with permission, from Crook JM, Bahadur YA, Bociek RG, Perry GA, Robertson SJ, Esche BA: Radiotherapy for localized prostate carcinoma. Cancer 1997; 79: 328–336. Figure produced with NCSS; used with permission.)

The authors presented six different survival curves:

1. Overall survival by tumor stage

2. Disease-free survival by tumor stage

3. Time to any failure by pretreatment PSA level (divided into six categories)

4. Time to local failure

5. Time to distant failure

6. Time to any failure by the lowest (nadir) posttreatment PSA

We reproduced two of the figures using the NCSS computer program and the data provided by the authors. Figure 9-7 shows the disease-free survival curves for patients categorized by their tumor stage (T classification) using the NCSS computer program. The curves for T1b–c and T2a are very similar and, in fact, cross several times. The survival rate was very high in these two groups for the first 2 years following treatment. Even without a statistical test, it appears that the survival pattern for patients with these tumor stages follows similar survival curves. The situation is different for those with stages T2b–c and T3–4. Survival rates in these two stages are considerably lower. Both curves demonstrate a fairly steady decrease in disease-free survival over time.

Table 9-10 gives the results of the statistical tests comparing the survival curves using the NCSS computer program. You can see that the values for the three procedures are in close agreement: the Gehan–Wilcoxon chi-square is 25.94, the Peto–Wilcoxon is 27.98, and the logrank is 27.70. All are highly significant and substantiate our tentative conclusion that the curves, certainly for tumor stages T2b–c and T3–4, are significantly lower, indicating earlier deaths in patients with these two tumor stages.

We can use the filter procedure in NCSS or SPSS to select a subset of the cases to learn if the curves for patients with tumor stages T2b–c and T3–4 differ. Selecting only the patients with stages T2b–c and T3–4 gives a chi-square value for the logrank statistic of 6.81 with a P value of 0.009. We suggest you replicate these analyses using the CD-ROM.

Figure 9-8 illustrates the six survival curves for the patients categorized according to their pretreatment PSA levels using the NCSS computer program. The investigators formed these six categories of PSA level: 0–5, 5.1–10, 10.1–15, 15.1–20, 20.1–50, and > 50. Is the pattern of survival what you would expect; that is, do patients with lower pretreatment PSA values survive longer, on average? What would you advise a patient with a PSA level of 3? It appears that such a patient has a very good chance of survival to 5 or 6 years, although it is important to remember that the curves are based on a small number of patients. The problem associated with small samples is exacerbated when the subjects are divided into groups. What would you tell a patient who inquires about survival if he has a pretreatment PSA level greater than 15?

Table 9-10. Statistical tests for survival curves by tumor stages.

Gehan–Wilcoxon Section

Tumor Stage Value   Failed Count   Censored Count   Total Count   Sum        Mean
T1b–c               4              30               34            1231.00    36.2059
T2a                 4              30               34            1255.00    36.9118
T2b–c               27             52               79            -15.00     -0.1899
T3–4                33             27               60            -2471.00   -41.1833

χ2 = 25.94   df = 3   Probability = 0.000010

Peto–Wilcoxon Section

Tumor Stage Value   Failed Count   Censored Count   Total Count   Sum        Mean
T1b–c               4              30               34            -7.02      -0.2066
T2a                 4              30               34            -7.28      -0.2141
T2b–c               27             52               79            0.37       0.0047
T3–4                33             27               60            13.93      0.2322

χ2 = 27.98   df = 3   Probability = 0.000004

Logrank Section

Tumor Stage Value   Failed Count   Censored Count   Total Count   Sum        Mean
T1b–c               4              30               34            -8.55      -0.2514
T2a                 4              30               34            -8.74      -0.2571
T2b–c               27             52               79            0.66       0.0084
T3–4                33             27               60            16.63      0.2771

χ2 = 27.70   df = 3   Probability = 0.000004

     

Source: Data, used with permission, from Crook JM, Bahadur YA, Bociek RG, Perry GA, Robertson SJ, Esche BA: Radiotherapy for localized prostate carcinoma. Cancer 1997; 79: 328–336. Output produced with NCSS; used with permission.

Use the CD-ROM to determine the logrank statistic for survival by pretreatment PSA. We produced the output in Table 9-11.

THE INTENTION-TO-TREAT PRINCIPLE

In the method section of journal articles that report clinical trial results, investigators often state that they analyzed the data on an intention-to-treat basis. For instance, Presenting Problem 5 in Chapter 3 described a study evaluating antenatal administration of thyrotropin-releasing hormone to improve pulmonary outcome in preterm infants (Ballard et al, 1998). The study consisted of 996 women in active labor who were randomized to receive an injection of thyrotropin-releasing hormone or normal saline. The primary outcome was infant death on or before the 28th day after delivery or chronic lung disease, defined as the need for oxygen therapy for 21 of the first 28 days of life.

Figure 9-8. Kaplan–Meier survival curve showing time to any failure by pretreatment PSA level group. (Used, with permission, from Crook JM, Bahadur YA, Bociek RG, Perry GA, Robertson SJ, Esche BA: Radiotherapy for localized prostate carcinoma. Cancer 1997; 79: 328–336. Figure produced with NCSS; used with permission.)

Table 9-11. Statistics comparing time to any failure by pretreatment PSA group.

Survival Analysis for Months of Survival

                          Total   Number of Events   Number Censored   Percent Censored
GPPREPSA   1.0 to 5.0     34      1                  33                97.06
GPPREPSA   5.1 to 10.0    66      10                 56                84.85
GPPREPSA   10.1 to 15.0   31      11                 20                64.52
GPPREPSA   15.1 to 20.0   11      5                  6                 54.55
GPPREPSA   20.1 to 50     46      26                 20                43.48
GPPREPSA   >50.0          19      15                 4                 21.05
Overall                   207     68                 139               67.15

Test Statistics for Equality of Survival Distributions for PSA Pretreatment Groups

              Statistic   df   Significance
Logrank       64.73       5    0.0000
Breslow       56.64       5    0.0000
Tarone–Ware   60.41       5    0.0000

   

Source: Data, used with permission, from Crook JM, Bahadur YA, Bociek RG, Perry GA, Robertson SJ, Esche BA: Radiotherapy for localized prostate carcinoma. Cancer 1997; 79: 328–336. Output produced with NCSS; used with permission.

In the method section, the researchers state: “All analyses were based on the intention-to-treat principle…”. This statement means that the results for each patient who entered the trial were included in the analysis of the group to which the patient was randomized, regardless of any subsequent events. In Ballard and coworkers' study, 18 women in the treatment group (3.7%) and 7 in the placebo group (1.4%) withdrew from the trial because of side effects. These patients, however, were included in the analysis.

Analyzing data on an intention-to-treat basis is appropriate for several reasons. First is the issue of dropouts, as in the study by Ballard and coworkers. Although the percentage of patients dropping out of this study was relatively small, more than twice as many patients dropped out of the treatment group as from the control group. Is it possible that the patients who dropped out of the treatment group had some characteristics that, independent of the treatment, could affect the outcome? Suppose, for instance, that the women who dropped out of the study had gestations of fewer than 26 weeks. Women with gestations of fewer than 26 weeks are more likely to have infants with respiratory distress. If these patients are omitted from the analysis, the results may appear to be better for the treatment group than they should; that is, the results are biased. Although there is no indication that this occurred in the study by Ballard and coworkers, it is easy to see how such events could affect the conclusions, and these investigators were correct to analyze the data on the intention-to-treat basis.

The intention-to-treat principle is also important in studies in which patients cross over from one treatment group to another. For example, the classic Coronary Artery Surgery Study (CASS, 1983) was a randomized trial of coronary bypass surgery. Patients were assigned to medical treatment or surgical intervention to evaluate the effect of treatment on outcomes for patients with coronary artery disease. As in many studies that compare a conservative treatment with a more aggressive intervention, some patients in the CASS study who were randomized to medical treatment subsequently underwent surgery. In addition, some patients randomized to surgery were treated medically instead.

The problem with studies in which patients cross over from one treatment to another is that we do not know why the crossover occurred. Did some of the patients originally assigned to medical treatment improve so that they became candidates for surgery? If so, this could cause results in the surgery group to appear better than they really were (because “healthier” patients were removed from the medical group and transferred to the surgery group). On the other hand, perhaps the condition of the patients originally assigned to medical treatment worsened to such a degree that the patient or family insisted on having surgery. If so, this could cause the surgery group results to appear worse than they really were (because “sicker” patients were transferred from the medical to the surgical group). The point is, we do not know why the patients crossed over, and neither do the investigators.

In the past, some investigators presented with such a situation analyzed the patients by the group they were in at the end of the study. Other researchers omitted from the analysis any patients who crossed over. It should be easy to see why both of these approaches are potentially biased. The best approach, one recommended by biostatisticians and advocates of evidence-based medicine, is to perform all analyses on the original groups to which the patients were randomized. The CASS study occurred several years ago, and no consensus existed at that time on the best way to analyze the findings. The CASS investigators therefore performed the analyses in several ways: by the original group (intention-to-treat), by the final groups of the study, and by eliminating all crossovers from the analysis. All of these methods gave the same result, namely that no difference in survival occurred, although later studies showed differences in quality-of-life indicators.
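The three ways of grouping patients can be contrasted with a toy example. The records below are entirely hypothetical, invented only to show how the grouping rule changes the comparison; they are not CASS data, and the helper name `event_rates` is an assumption. In this made-up scenario, the two patients who crossed from medical treatment to surgery were the sickest.

```python
from collections import defaultdict

# Hypothetical records: (arm assigned, arm actually received, died)
PATIENTS = [
    ("medical", "medical", True),
    ("medical", "medical", False),
    ("medical", "medical", False),
    ("medical", "medical", False),
    ("medical", "surgery", True),   # crossed over to surgery
    ("medical", "surgery", True),   # crossed over to surgery
    ("surgery", "surgery", True),
    ("surgery", "surgery", False),
    ("surgery", "surgery", False),
    ("surgery", "surgery", False),
    ("surgery", "surgery", False),
    ("surgery", "medical", False),  # treated medically instead
]


def event_rates(patients, group_of):
    """Death rate per group; `group_of` returns a group name or None to omit."""
    counts = defaultdict(lambda: [0, 0])  # group -> [deaths, patients]
    for patient in patients:
        group = group_of(patient)
        if group is None:
            continue
        counts[group][0] += int(patient[2])
        counts[group][1] += 1
    return {group: deaths / total for group, (deaths, total) in counts.items()}


# Intention-to-treat: analyze by the arm to which the patient was randomized.
intention_to_treat = event_rates(PATIENTS, lambda p: p[0])
# By final group: analyze by the treatment actually received.
as_treated = event_rates(PATIENTS, lambda p: p[1])
# Omit crossovers: keep only patients who received their assigned arm.
crossovers_omitted = event_rates(PATIENTS, lambda p: p[0] if p[0] == p[1] else None)

for label, rates in [("intention-to-treat", intention_to_treat),
                     ("by final group", as_treated),
                     ("crossovers omitted", crossovers_omitted)]:
    print(label, {group: round(rate, 2) for group, rate in rates.items()})
```

With these invented records, the intention-to-treat comparison shows a higher death rate in the medical arm, the by-final-group comparison reverses the direction, and omitting crossovers makes the arms look nearly alike. Only the first comparison preserves the groups created by randomization.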

The intention-to-treat principle applies to studies other than those with survival as the outcome. We included the topic here, however, because it is so pertinent to survival studies. Gillings and Koch (1991) provide a comprehensive and very readable discussion.

SUMMARY

Special methods are needed to analyze data from studies of survival time because censored observations occur when patients enter at different times and remain in a study for different periods. Otherwise, investigators would have to wait until all subjects were in the study for a given period before analyzing the data. In medicine, survival curves are commonly drawn by the Kaplan–Meier product limit method and, less frequently, the actuarial (life table) method. The quality of survival studies published in the medical literature was reviewed by Altman and associates (1995). They found that almost half of the papers did not summarize the length of follow-up or clearly define all endpoints. They suggested some guidelines for presenting survival analyses in medical journals.

The study of small-cell lung cancer by Noda and colleagues (2002) was terminated early when an interim analysis showed a significant difference in overall survival between the two groups. When the final analysis was conducted 64 months after the first patient was enrolled and 28 months after the last patient was enrolled, the median overall survival was 12.8 months in the irinotecan-plus-cisplatin group and 9.4 months in the etoposide-plus-cisplatin group. The rates of overall survival at 1 and 2 years were 58.4% and 19.5%, respectively, in the irinotecan group and 37.7% and 5.2%, respectively, in the etoposide group. The authors conclude that the combination of irinotecan plus cisplatin is an effective treatment option for patients with metastatic small-cell lung cancer.

In the study by Borghi and colleagues (2002), 23 of the 60 men on the low-calcium diet and 12 of the 60 men on the normal-calcium, low-protein, low-salt diet had recurrence of stones. This study shows that a diet with a normal amount of calcium but with reduced amounts of animal protein and salt is more effective than the traditional low-calcium diet in reducing the risk of recurrent stones in men with hypercalciuria.
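As a rough check on the counts just quoted, the crude recurrence proportions can be computed directly. This is only a sketch of the arithmetic: it ignores time to recurrence and censoring, so it is not the survival analysis used in the study itself, and the variable names are our own.

```python
# Counts quoted in the text (Borghi and colleagues, 2002)
low_calcium_recurrences, low_calcium_n = 23, 60
normal_calcium_recurrences, normal_calcium_n = 12, 60

# Crude proportion with recurrence in each diet group
p_low = low_calcium_recurrences / low_calcium_n
p_normal = normal_calcium_recurrences / normal_calcium_n

# Crude comparisons (no adjustment for follow-up time)
risk_difference = p_low - p_normal
relative_risk = p_normal / p_low   # normal-calcium diet relative to low-calcium diet

print(f"low-calcium diet:    {p_low:.3f}")            # 0.383
print(f"normal-calcium diet: {p_normal:.3f}")         # 0.200
print(f"risk difference:     {risk_difference:.3f}")  # 0.183
print(f"relative risk:       {relative_risk:.3f}")    # 0.522
```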

We illustrated the Kaplan–Meier and actuarial methods for the length of survival time. The Kaplan–Meier method calculates survival each time a patient dies and provides exact estimates. Although Kaplan–Meier estimates are generally more time-intensive to calculate, the widespread use of computers has made Kaplan–Meier curves the procedure of choice.
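The product-limit calculation is short enough to sketch in a few lines. The function names and the data below are hypothetical and chosen only for illustration. Survival is recalculated at each time an outcome occurs, and a censored patient counts as at risk up to and including the time of censoring.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of survival.

    times  -- follow-up duration for each patient
    events -- True if the outcome occurred, False if the observation is censored
    Returns a list of (time, estimated survival) at each time an outcome occurred.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect everyone whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            deaths += int(data[i][1])
            leaving += 1
            i += 1
        if deaths:
            survival *= (at_risk - deaths) / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

def median_survival(curve):
    """First time at which estimated survival is 0.5 or less; None if never reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None

# Hypothetical data: months in study and whether the patient died
months = [2, 3, 3, 5, 8, 8, 12, 15]
died = [True, True, False, True, False, True, False, False]

curve = kaplan_meier(months, died)
for t, s in curve:
    print(f"month {t}: survival {s:.3f}")
print("median survival:", median_survival(curve))
```

For these made-up data the estimates are 0.875, 0.750, 0.600, and 0.450 at months 2, 3, 5, and 8, so the median survival is 8 months. When a curve never falls to 0.5, the median cannot be estimated and the sketch returns None.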

Published results on treatment failure for men with prostate carcinoma treated with radiotherapy were examined in Presenting Problem 3 (Crook et al, 1997). This study found that treatment failure and survival are related to the initial tumor stage and the pretreatment PSA level: patients with T classifications of T1b, T1c, or T2a and those with lower PSA levels had relatively long survival times.

We concluded the chapter with a discussion of the important principle of intention-to-treat, whereby patients are analyzed in the group to which they were originally assigned. We described some of the problems in interpreting the results when this principle is not adhered to, and we pointed out the applicability of the intention-to-treat principle to any study, regardless of whether the outcome is survival or another variable.

EXERCISES

1. A renal transplant surgeon compared two groups of patients who received kidney transplants (see footnote a). One group underwent transplantation and received azathioprine to retard rejection of the transplanted organ. The other group was treated with cyclosporine, an immunomodulatory substance. Data are given on these two groups in Tables 9-12 and 9-13.

a. Perform the calculations for Kaplan–Meier survival curves. We suggest you use the CD-ROM and the file called “Birtch.”

b. Draw the survival curves. Do you think the survival curves are significantly different?

Table 9-12. Survival of kidney in patients having a transplant and receiving azathioprine.

Patient   Date of Transplant   Months in Study
1         1-11-1978            2
2         1-18-1978            23
3         1-29-1978            23
4         4-4-1978             1
5         4-19-1978            20
6         5-10-1978            19
7         5-14-1978            3
8         5-21-1978            5
9         6-6-1978             17
10        6-17-1978            18
11        6-21-1978            18
12        7-22-1978            3
13        9-27-1978            15
14        10-5-1978            3
15        10-22-1978           14
16        11-15-1978           13
17        12-6-1978            12
18        12-12-1978           12
19        2-1-1979             10
20        2-16-1979            10
21        4-8-1979             8
22        4-11-1979            8
23        4-18-1979            8
24        6-26-1979            1
25        7-3-1979             5
26        7-12-1979            5
27        7-18-1979            1
28        8-23-1979            4
29        10-16-1979           2
30        12-12-1979           1
31        12-24-1979           1

Source: Data courtesy of Dr. A. Birtch.

c.  Perform the logrank test, and interpret the results.
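For readers working without the CD-ROM, the mechanics of a two-group logrank test can be sketched as follows. This is one of several versions of the statistic (it uses the hypergeometric variance, as in the Mantel–Haenszel approach), and the function name and example data are hypothetical. Note that the calculation needs an indicator of whether each kidney failed or the observation was censored, in addition to the months in study.

```python
import math

def logrank(times1, events1, times2, events2):
    """Two-group logrank test; returns (chi-square with 1 df, p-value).

    times  -- follow-up durations; events -- True if the outcome occurred,
    False if censored. Assumes at least one outcome occurs while both
    groups still have patients at risk (otherwise the variance is zero).
    """
    pooled = [(t, e, 1) for t, e in zip(times1, events1)]
    pooled += [(t, e, 2) for t, e in zip(times2, events2)]
    event_times = sorted({t for t, e, _ in pooled if e})

    observed1 = expected1 = variance = 0.0
    for t in event_times:
        # Patients still at risk in each group just before time t
        n1 = sum(1 for u, _, g in pooled if g == 1 and u >= t)
        n2 = sum(1 for u, _, g in pooled if g == 2 and u >= t)
        # Outcomes occurring at time t in each group
        d1 = sum(1 for u, e, g in pooled if g == 1 and u == t and e)
        d2 = sum(1 for u, e, g in pooled if g == 2 and u == t and e)
        n, d = n1 + n2, d1 + d2
        observed1 += d1
        expected1 += d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (n2 / n) * (n - d) / (n - 1)

    chi2 = (observed1 - expected1) ** 2 / variance
    # Upper tail of the chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical data: months in study; True = graft failed, False = censored
chi2, p = logrank([1, 2, 3, 5, 8], [True, True, True, False, True],
                  [4, 6, 9, 12, 12], [True, False, True, False, False])
print(f"chi-square = {chi2:.2f}, P = {p:.3f}")
```

Each distinct failure time contributes one 2 × 2 table (group by failed or not failed); the test compares the observed number of failures in the first group with the number expected if the two groups had the same risk.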

2. Moertel and colleagues (1985) performed a double-blind, randomized trial of high-dose vitamin C versus placebo in the treatment of advanced colorectal cancer in patients who had never received chemotherapy. In addition to analyzing survival as an outcome, the investigators used the Kaplan–Meier method and the logrank statistic to analyze progression of the disease. Progression was defined as any of the following: an increase of more than 50% in the product of the perpendicular diameters of any area of known malignancy, a new area of malignancy, substantial worsening of symptoms or performance status, or weight loss of 10% or more.


The results of their analysis are reproduced in Figure 9-9.

Table 9-13. Survival of kidney in patients having a transplant and receiving cyclosporine.

Patient   Date of Transplant   Months in Study
1         2-8-1984             22
2         2-22-1984            22
3         2-25-1984            22
4         2-29-1984            8
5         3-12-1984            21
6         3-22-1984            1
7         4-26-1984            20
8         5-2-1984             19
9         5-9-1984             19
10        6-6-1984             18
11        7-11-1984            17
12        7-20-1984            17
13        8-18-1984            16
14        9-5-1984             15
15        9-15-1984            15
16        10-3-1984            14
17        11-9-1984            13
18        11-27-1984           6
19        12-5-1984            12
20        12-6-1984            12
21        12-19-1984           12

Source: Data courtesy of Dr. A. Birtch.

a. What conclusion can be drawn from the figure?

b. What is the median time to disease progression in each group?

c.  Do you think the analysis of survival times found a statistically significant difference?

3. Bajwa and colleagues (1996) wanted to determine risk factors that predict which patients undergoing chronic dialysis treatment will decide to discontinue their life-saving dialysis. A prospective study of 235 chronic dialysis patients analyzed sociodemographic, quality-of-life, medical, and dialysis variables over a period of 3½ years. Seventy-six patients (32%) died during the follow-up period. Thirteen (17%) died because of discontinuation of dialysis. Data from the study are available in a data set entitled “Bajwa” on the CD-ROM.

Use the data to compare survival for patients treated using continuous ambulatory peritoneal dialysis versus traditional hemodialysis procedures.

Figure 9-9. Comparison of disease progression in patients with colorectal cancer receiving vitamin C versus placebo. (Reproduced, with permission, from Moertel CG, Fleming TR, Creagan ET, Rubin J, O'Connell MJ, Ames MM: High-dose vitamin C versus placebo in the treatment of patients with advanced cancer who have had no prior chemotherapy: A randomized double-blind comparison. N Engl J Med 1985; 312: 137–141.)

a. Which treatment resulted in longer survival times? Was the longer survival sustained over a long period?

b. What is the value of the logrank statistic? What do you conclude from this value?

c.  What are the potential biases in drawing conclusions about treatment method in this study?

4. Refer to Figure 9-7, illustrating the survival curves for the patients categorized by the stage of their tumor (Crook et al, 1997). Is it possible to find the median survival for any of these groups? If so, what is the median survival time for any of these groups?

5. Borghi and colleagues (2002) also classified patients as high versus normal risk for stones.

a. Use the data on the CD-ROM to produce Kaplan–Meier curves for the number of months until recurrence of stones for patients who were high-risk versus normal-risk. In NCSS the cumulative incidence procedure must be used; in SPSS, request the curves for 1 – survival.

b. Based on the graph, what would you conclude about the risk classification used in the study?

6. Refer to the MRFIT study discussed in Chapter 8 and used as a group exercise in that chapter.


Refer to Figure 2 in the study: cumulative coronary heart disease and total mortality rates for the two groups. What statistical method is optimal for determining whether the two groups differed in either of these outcomes?

Figure 9-10. Survival probabilities for 1698 patients according to the extent of coronary artery disease before operation. (Reproduced, with permission, from Figure 3 in Lawrie GM, Morris GC, Earle N: Long-term results of coronary bypass surgery. Ann Surg 1991; 213: 377–385.)

7. Outcomes in a cohort of patients who had had coronary artery bypass surgery 20 years earlier were described by Lawrie and coworkers (1991). Follow-up activities included physician visits, questionnaires, and telephone interviews at regular intervals. Data were available on 92% of the patients 20 years after surgery. The investigators examined survival for a number of subgroups and used the expected survival for an age- and sex-adjusted population from the U.S. census to provide a baseline. The curves were calculated using the Kaplan–Meier method; groups defined by the vessel involved are given in Figure 9-10. It is important to recognize that coronary artery bypass surgery procedures have changed greatly since the time of this study, and today's patients enjoy more favorable long-term results.

a. Which group had the best survival?

b. Which group had the highest mortality rate?

c.  What was the approximate median survival in each group?

Footnote

a. Data kindly provided by Dr. Alan Birtch, Professor Emeritus, Southern Illinois University School of Medicine, Department of Surgery.


