KEY CONCEPTS

PRESENTING PROBLEMS
Presenting Problem 1
Lung cancer is the leading cause of cancer deaths in men and in women between 15 and 64 years of age. Small-cell lung cancer accounts for 20–25% of all cases of lung cancer. At the time of diagnosis, 40% of the patients with small-cell cancer have disease confined to the thorax (limited disease) and 60% have metastases outside the thorax (extensive disease). Current standard chemotherapy for extensive disease using a combination of cisplatin and etoposide yields a median survival of 8–10 months and a 2-year survival rate of 10%. Preliminary studies using a combination of cisplatin with irinotecan resulted in a median survival of 13.2 months. For this reason, Noda and colleagues (2002) at the Japan Clinical Oncology Group conducted a prospective, randomized clinical trial to compare irinotecan plus cisplatin with etoposide plus cisplatin. The primary endpoint was overall survival. Secondary endpoints included rates of complete and overall response. A complete response was defined as the disappearance of all clinical and radiologic evidence of a tumor for at least 4 weeks.
Over a 3-year period, 154 patients with histologically confirmed small-cell cancer and extensive disease (defined as distant metastasis or contralateral hilar-node metastasis) were enrolled. All patients were evaluated weekly. Tumor response was assessed by chest radiograph and chest CT.
We use data from this study to illustrate methods for analyzing survival data. A sample of 26 patients is used to show actual calculations, but results from the entire study are also given. Data from the study are available in a data set entitled “Noda” on the CD-ROM.
Presenting Problem 2
In the United States, the formation of renal stones (nephrolithiasis) occurs with an annual incidence of 7–21 per 10,000. Nephrolithiasis is complicated by obstruction, infection, and severe pain, and 5–10% of all cases may require hospitalization. Studies of the natural history of this disease show that 50% of patients will have a recurrence within 5 years. Seventy percent of stones are calcium oxalate, and idiopathic hypercalciuria is an important, common risk factor for stone formation.
Previous studies have shown that a low calcium intake reduces urinary calcium excretion but can cause a deficiency of calcium and an increase in urinary oxalate. A low-animal-protein, low-salt diet decreases urinary excretion of calcium and oxalate and may lower the endogenous synthesis of oxalate. Borghi and colleagues (2002) compared the efficacy of a traditional low-calcium diet with that of a diet containing a normal amount of calcium but reduced amounts of animal protein and salt in preventing recurrent stone formation. In a 5-year randomized trial, they enrolled 120 men with idiopathic hypercalciuria who had formed calcium oxalate stones on at least two occasions. One group of 60 men consumed a diet containing a normal amount of calcium (30 mmol/day) but reduced amounts of animal protein and salt. The other group of 60 men followed a traditional low-calcium diet that contained 10 mmol of calcium per day. Twenty-four-hour urine collections were obtained at baseline, 1 week after randomization, and at yearly intervals; they were used to estimate the intake of liquid, salt, total protein, and animal protein. Calcium and oxalate excretion were measured. Renal ultrasound and abdominal flat-plate examinations were performed at yearly intervals. The primary outcome measure was the time to the first recurrence of a symptomatic renal stone or the presence of a radiographically identified stone. Recurrences were classified as either silent or symptomatic. We will use data from this study to illustrate Kaplan–Meier analysis to determine the cumulative incidence of recurrent stones; data are in a file on the CD-ROM called “Borghi.”
Presenting Problem 3
Prostate-specific antigen (PSA) is a serine protease glycoprotein secreted by both normal and neoplastic prostate epithelia. It circulates in the bloodstream and can be measured by a variety of assays; consequently, it has become an important tool in understanding prostate cancer biology and growth. The PSA value correlates with the stage of the prostate tumor at diagnosis and is an important prognostic variable. (See also Shipley et al, 1999, Presenting Problem 3 in Chapter 4.) An increasing PSA value after prostate cancer treatment is a sensitive indicator of relapse but does not discriminate between local recurrence and metastatic recurrence. Crook and coinvestigators (1997) studied the correlation of both pretreatment PSA and post-treatment nadir PSA with outcome in men with localized prostate cancer who were treated with external beam radiation therapy.
This study was a cohort study of 207 men with localized adenocarcinoma of the prostate treated with radiation therapy. Pretreatment PSA values were obtained for all of the men, and post-treatment values were obtained at 3- and 6-month intervals for 5 years and yearly thereafter. Post-treatment prostate biopsies were done at 12 months. Patients with residual tumor had repeat biopsies about every 6 months, whereas those with negative biopsy results had repeat biopsies at 36 months or if the PSA value began to rise.
The Gleason histologic scoring system was used to classify tumors on a scale of 2 to 10. A low score indicates a well-differentiated tumor, a medium score a moderately differentiated tumor, and a high score a poorly differentiated tumor. Tumors were also classified using the tumor component of the TNM (tumor, node, metastasis) staging system, called the T classification. A T1 tumor is a nonpalpable tumor identified by biopsy. A T2 tumor is palpable on digital rectal examination and limited to the prostate gland. T3 and T4 tumors have invaded structures adjacent to the prostate, such as the bladder neck or seminal vesicle. The median age of patients was 69 years, and the median duration of follow-up was 36 months. Sixty-eight of the 207 patients had a recurrence of prostate cancer: 20 had local recurrence, 24 had nodal or metastatic recurrence, 7 had both local and distant recurrence, and 17 had biochemical recurrence (elevated PSA with negative biopsy and metastatic workup). The prognostic importance of pretreatment PSA, post-treatment nadir PSA, the Gleason score, and the T classification was examined.
We summarize this study and present the results from the Kaplan–Meier survival analysis predicting recurrence. Data on the patients are given on the CD-ROM in the data set entitled “Crook.”
PURPOSE OF THE CHAPTER
Many studies in medicine are designed to determine whether a new medication, a new treatment, or a new procedure will perform better than the one in current use. Although short-term effects are of interest, particularly in efforts to provide more efficient health care, long-term outcomes, including mortality and major morbidity, are also important. Often, studies focus on comparing survival times for two or more groups of patients.
The methods of data analysis discussed in previous chapters are not appropriate for measuring length of survival for two reasons.
First, investigators frequently must analyze data before all patients have died; otherwise, it may be many years before they know which treatment is better. When analysis of survival is done while some patients in the study are still living, the observations on these patients are called censored observations, because we do not know how long these patients will remain alive. Figure 9–1 illustrates a situation in which observations on patients B and E are censored.
The second reason special methods are needed to analyze survival data is that patients do not typically begin treatment or enter the study at the same time, as they did for Figure 9–1. For example, in the cisplatin study, patients entered the study at different times. When the entry time for patients is not simultaneous and some patients are still in the study when the analysis is done, the data are said to be progressively censored. Figure 9–2 shows results for a study with progressively censored observations. The study began at time 0 months with patient A; then patient B entered the study at time 7 months, patient C at time 8 months, and so on. Patients B and E were still alive at the time the data were analyzed at 40 months.
Analysis of survival times is sometimes called actuarial, or life table, analysis. Historically, the astronomer Edmund Halley (of Halley's Comet fame) first used life tables in the 17th century to describe survival times of residents of a town. Since then, these methods have been used in various ways. Life insurance companies use them to determine the life expectancy of individuals, and this information is subsequently used to establish premium schedules. Insurers generally use cross-sectional data about how long people of different age groups are expected to live in order to develop a current life table. In medicine, however, most studies of survival use cohort life tables, in which the same group of subjects is followed for a given period. The data for life tables may come from cohort studies (either prospective or historical) or from clinical trials; the key feature is that the same group of subjects is followed for a prescribed time.
Figure 9–1. Example of censored observations (X means patient died).
Figure 9–2. Example of progressively censored observations (X means patient died).
In this chapter, we examine two methods to determine survival curves. Actually, it is more accurate to describe them as methods to examine curves with censored data, because many times the outcome is something other than survival. The outcome studied by Borghi and colleagues (2002) was the incidence of stones in patients having idiopathic hypercalciuria who were randomized to one of two diets. Crook and colleagues (1997) studied the recurrence of prostate cancer based on different risk factors.
WHY SPECIALIZED METHODS ARE NEEDED TO ANALYZE SURVIVAL DATA
Before illustrating the methods for analyzing survival data, let us consider briefly why some intuitive methods are not very useful or appropriate. To illustrate these points, we selected a sample of 26 patients being treated for lung cancer: 13 patients taking irinotecan plus cisplatin and 13 patients taking etoposide plus cisplatin (Table 9–1).
Colton (1974, pp. 238–241) gives a creative presentation of some simple methods to analyze survival data; the arguments presented in this section are modeled on his discussion. Some methods appear at first glance to be appropriate for analyzing survival data, but closer inspection shows they are incorrect.
Suppose someone suggests calculating the mean length of time patients survive with small-cell lung cancer. Using the data on 26 patients in Table 9–1, the mean survival time for patients on irinotecan plus cisplatin is 17.51 months, and, for patients on etoposide plus cisplatin, 9.12 months. The problem is that mean survival time depends on when the data are analyzed; it will change with each passing month until all the subjects have died. Therefore, mean survival estimates calculated in this way are useful only when all the subjects have died or the event being analyzed has occurred. Almost always, however, investigators wish to analyze their data before that time.
An estimate of median length of survival time is also possible, and it can be calculated after only half of the subjects have died. Again, however, investigators often wish to evaluate the outcome prior to that time.
A concept sometimes used in epidemiology is the number of deaths per 100 person-years of observation. To illustrate, we use the observations in Table 9–1 to determine the number of person-months of survival. Regardless of whether patients are alive or dead at the end of the study, they contribute to the calculation for however long they have been in the study. Patient 1 therefore contributes 13.57 months, patient 2 contributes 11.70 months, and so on. The total number of months patients have been observed is 346.25; converting to years by dividing by 12 gives about 28.85 person-years.
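The person-time calculation takes only a few lines. The sketch below is a minimal illustration. The function names are hypothetical. Apart from the 346.25 person-month total quoted above, the follow-up times and death count are made up.

```python
def person_years(follow_up_months):
    """Total observation time in person-years.

    Every patient contributes the time observed, whether alive or dead
    at the end of follow-up.
    """
    return sum(follow_up_months) / 12.0


def deaths_per_100_person_years(n_deaths, follow_up_months):
    """Number of deaths per 100 person-years of observation."""
    return 100.0 * n_deaths / person_years(follow_up_months)


# Total quoted in the text: 346.25 person-months.
print(round(person_years([346.25]), 2))  # 28.85

# Hypothetical: 2 deaths among 3 patients followed 12, 6, and 18 months.
print(round(deaths_per_100_person_years(2, [12, 6, 18]), 1))  # 66.7

# The limitation discussed next: 1000 patients x 1 year and
# 100 patients x 10 years give the same person-years.
print(person_years([12] * 1000) == person_years([120] * 100))  # True
```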
One problem with using person-years of observation is that the same number is obtained by observing 1000 patients for 1 year or 100 patients for 10 years. Although the number of subjects is involved in the calculation of person-years, it is not evident as an explicit part of the result, and no statistical methods are available to compare these numbers. Another problem is the inherent assumption that the risk of an event, such as death or rejection, during any one unit of time is constant throughout the study (although several other survival methods also make this assumption).
Mortality rates (see Chapter 3) are a familiar way to deal with survival data, and they are used (especially in oncology) to estimate 3- and 5-year survival with various types of medical conditions. We cannot determine a mortality rate using data on all patients until the specified length of time has passed.
Suppose we have a study with 20 patients: 10 lived at least 1 year, 4 died before 1 year, and 6 had been in the study less than a year (ie, they were censored). We have to decide what to do with the six censored patients in order to calculate a 1-year mortality rate. One solution is to divide the number who died in the first year, 4, by the total number in the study, 20, for an estimate of 0.20, or 20%. This estimate, however, is probably too low, because it assumes that none of the six patients in the study less than 1 year will die before the year is up.
An alternative solution is to ignore the patients who were not in the study for 1 year, giving 4/(20 – 6) = 0.286, or 28.6%. This technique is similar to the approach used in cancer research in which 3- and 5-year mortality rates are based only on patients who were in the study at least 3 or 5 years. The shortcoming of this approach is that it completely ignores the contribution of the six patients who were in the study for part of a year. We need a way to use information gained from all patients who entered the study. A reasonable approach should produce an estimate between 20% and 28.6%, which is exactly what actuarial life table analysis and the Kaplan–Meier product limit method do: they give credit for the amount of time subjects survived up to the time when the data are analyzed.
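The three candidate estimates for the 20-patient example can be computed side by side. In this sketch the variable names are illustrative. The third estimate applies the actuarial convention described in the next section, with the whole first year treated as a single interval: each censored patient counts as half a patient at risk. This gives 4/17, or about 23.5%, which lies between the two naive answers.

```python
n_total, n_died, n_censored = 20, 4, 6  # 10 survived 1 year, 4 died, 6 censored

# (1) Censored patients kept in the denominator, as if none will die.
estimate_all = n_died / n_total                      # 4/20

# (2) Censored patients dropped from the analysis entirely.
estimate_dropped = n_died / (n_total - n_censored)   # 4/14

# (3) Actuarial convention: each censored patient counts as half a patient at risk.
estimate_half = n_died / (n_total - n_censored / 2)  # 4/17

for label, value in [("all in denominator", estimate_all),
                     ("censored dropped", estimate_dropped),
                     ("half credit", estimate_half)]:
    print(f"{label:>20}: {value:.3f}")
```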
Table 9–1. Data report on a sample of 26 patients.


ACTUARIAL, OR LIFE TABLE, ANALYSIS
Actuarial, or life table, analysis is also sometimes referred to in the medical literature as the Cutler–Ederer method (1958). The actuarial method is not computationally overwhelming and was, at one time, the predominant method used in medicine. Now that computers are widely available, however, it is used far less often than the Kaplan–Meier product limit method discussed in the next section.
We briefly illustrate the calculations involved in actuarial analysis by arranging the 13 patients on etoposide plus cisplatin according to the length of time they had no progression of their disease (Table 9–2). We use the observations in Table 9–2 to produce Table 9–3. The time intervals are arbitrary but should be selected so that the number of censored observations in any interval is small; we group by 3-month intervals.
The column headed $n_i$ in Table 9–3 is the number of patients in the study at the beginning of the interval; all 13 patients began the study, so $n_1$ is 13. Three patients did not complete the first time interval: patients 5, 28, and 23. Of these, one patient's disease progressed (patient 5), referred to as a terminal event ($d_1$); the remaining two patients are referred to as withdrawals ($w_1$).
The actuarial method assumes that patients withdraw randomly throughout the interval; therefore, on average, they withdraw halfway through the interval. In a sense, this method gives patients who withdraw credit for being in the study for half of the period. One-half of the number of patients withdrawing is subtracted from the number beginning the interval, so the denominator used to calculate the proportion having a terminal event is 13 – (½ × 2), or 12, in our example. The proportion terminating is 1/12 = 0.0833. The proportion surviving is 1 – 0.0833 = 0.9167, and, because we are still in the first period, the cumulative survival is also 0.9167.
Table 9–2. Survival of a sample of patients in the etoposide plus cisplatin arm.


At the beginning of the second interval, only 10 patients remain. During the second period, four patients withdrew and one patient's disease progressed (patient 10), so $d_2 = 1$ and $w_2 = 4$. The proportion terminating in the second interval is not 1/10 because, although 10 patients began the interval, four withdrew, so the denominator is 10 – (4/2) = 8. The proportion terminating during the second period is therefore 1/8, or 0.1250. Again, the proportion with no progression is 1 – 0.1250, or 0.8750, and the cumulative proportion is 0.9167 × 0.8750, or 0.8021. This computation continues until the table is completed.
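The interval-by-interval arithmetic can be written as a small function. This is a minimal sketch of the Cutler–Ederer calculation under the half-withdrawal assumption. The function name and the (d, w) input layout are our own choices. Only the first two intervals of Table 9–3 are reproduced, because only those counts appear in the text.

```python
def actuarial_life_table(n_start, intervals):
    """Cutler-Ederer (actuarial) life table.

    n_start   -- number of patients entering the first interval
    intervals -- list of (d, w): terminal events and withdrawals per interval
    Returns one row per interval: (n, effective n, q, p, cumulative survival).
    """
    rows = []
    n, cumulative = n_start, 1.0
    for d, w in intervals:
        effective_n = n - w / 2.0   # withdrawals credited with half the interval
        q = d / effective_n         # proportion terminating in this interval
        p = 1.0 - q                 # conditional proportion surviving it
        cumulative *= p             # product p_1 x p_2 x ... x p_i
        rows.append((n, effective_n, q, p, cumulative))
        n -= d + w                  # patients who enter the next interval
    return rows


# First two 3-month intervals for the etoposide plus cisplatin sample:
# interval 1: 13 patients, 1 progression, 2 withdrawals
# interval 2: 10 patients, 1 progression, 4 withdrawals
rows = actuarial_life_table(13, [(1, 2), (1, 4)])
for i, (n, n_eff, q, p, s) in enumerate(rows, 1):
    print(f"interval {i}: n={n}, n'={n_eff:g}, q={q:.4f}, p={p:.4f}, S={s:.4f}")
```

The printed cumulative proportions, 0.9167 and 0.8021, agree with the hand calculation. Passing only the starting count and deriving each later $n_i$ as $n_{i-1} - d_{i-1} - w_{i-1}$ keeps the table internally consistent.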
Table 9–3. Life table for a sample of 13 patients treated with etoposide plus cisplatin.


Note that $p_i$ is the probability of surviving interval $i$ only; to survive interval $i$, a patient must have survived all previous intervals as well. Thus, $p_i$ is an example of a conditional probability, because the probability of surviving interval $i$ is dependent, or conditional, on surviving until that point. Recall from Chapter 4 that if one event is conditional on a previous event, the probability of their joint occurrence is found by multiplying the probability of the conditional event by the probability of the previous event. The cumulative probability of surviving interval $i$ plus all previous intervals, sometimes called the survival function, is therefore found by multiplying $p_i$ by $p_{i-1}, p_{i-2}, \ldots, p_1$:

$$S_i = p_i \times p_{i-1} \times p_{i-2} \times \cdots \times p_1$$

The results from an actuarial analysis can help clinicians counsel patients or their families. For example, we might ask: If X is the length of time survived by a patient selected at random from the population represented by these patients, what is the probability that X is 6 months or greater? From Table 9–3, the probability is 0.80, or 4 out of 5, that a patient will live for at least 6 months.
Journal articles rarely present the results from life table analysis as we have in Table 9–3; rather, the results are usually presented as a survival curve. The line in Figure 9–3 is a survival curve for the sample of 13 patients on etoposide plus cisplatin.
The actuarial method involves two assumptions about the data. The first is that all withdrawals during a given interval occur, on average, at the midpoint of the interval. This assumption is of less consequence when short time intervals are analyzed; however, considerable bias can occur if the intervals are large, if many withdrawals occur, and if withdrawals do not occur midway in the interval. The Kaplan–Meier method introduced in the next section overcomes this problem. The second assumption is that, although survival in a given period depends on survival in all previous periods, the probability of survival at one period is treated as though it is independent of the probability of survival at others. This condition, although probably violated somewhat in much medical research, does not appear to cause major concern to biostatisticians.
Figure 9–3. Life table survival plot of a sample of patients in the etoposide plus cisplatin arm. (Data, used with permission, from Noda K, Nishiwaki Y, Kawahara M, Negoro S, Sugiura T, Yokoyama A, et al: Irinotecan plus cisplatin compared with etoposide plus cisplatin for extensive small-cell lung cancer. N Engl J Med 2002; 346: 85–91. Survival plot produced with NCSS, used with permission.)
KAPLAN–MEIER PRODUCT LIMIT METHOD
The Kaplan–Meier method of estimating survival is similar to actuarial analysis except that time since entry into the study is not divided into intervals. Depending on the number of patients who died, the Kaplan–Meier product limit method (whose plots are commonly called Kaplan–Meier curves) may involve fewer calculations than the actuarial method. The main reason is that survival is recalculated only at the times when a patient dies. A withdrawal does not trigger a new calculation, although it reduces the number at risk at later times. We illustrate with data from Noda and colleagues (2002), using the same subset of patients on etoposide plus cisplatin as in the life table analysis.
The first step is to list the times when a death or dropout occurs, as in the column “Event Time” in Table 9–4. One patient's disease progressed at 1 month and another's at 4.4 months; these are counted in the column “Number of Events.” Then, each time an event occurs, the mortality, survival, and cumulative survival are calculated in the same manner as with the life table method. When such a table is published in an article, it is often given in an abbreviated form, such as Table 9–5.
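A minimal Kaplan–Meier sketch follows. It assumes each patient is recorded as a follow-up time plus a flag that is True when the event (death or progression) occurred and False when the observation is censored. The function name and the six patients in the example are hypothetical, not data from Table 9–4. Patients censored at the same time as an event are treated as still at risk at that time, which is the usual convention.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier product limit estimate.

    times  -- follow-up time for each patient
    events -- True if the event occurred at that time, False if censored
    Returns (time, number at risk, number of events, cumulative survival)
    at each distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = n_censored = 0
        # Collect every observation tied at time t.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                n_events += 1
            else:
                n_censored += 1
            i += 1
        if n_events:
            # Survival changes only at event times.
            survival *= 1.0 - n_events / n_at_risk
            curve.append((t, n_at_risk, n_events, survival))
        # Events and censored patients both leave the risk set after t.
        n_at_risk -= n_events + n_censored
    return curve


# Hypothetical data for six patients (months; False = censored).
times = [1.0, 2.5, 4.4, 6.0, 7.3, 9.0]
events = [True, False, True, True, False, True]
for t, n, d, s in kaplan_meier(times, events):
    print(f"t={t:4.1f}  at risk={n}  events={d}  S={s:.4f}")
```

The patient censored at 2.5 months never produces a row of its own. It still lowers the number at risk from 5 to 4 before the event at 4.4 months.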
Note that the Kaplan–Meier procedure uses the exact survival times, whereas the actuarial method groups survival times into intervals and therefore gives approximate survival proportions. Before computers were widely used, the actuarial method was much easier to apply to a very large number of observations.
Typically, as the time since entry into the study increases, the number of patients who remain in the study becomes smaller. This means that the standard error of the estimated proportion surviving becomes larger over time. Sometimes the number of patients remaining in the study is printed under the time axis (as in Figures 1 and 2 in the article by Noda et al). Some authors provide graphs with dashed lines on either side of the survival curve that represent 95% confidence bands for the curve. The confidence limits become wider as time progresses, reflecting decreased confidence in the estimate of the proportion as the sample size decreases. These practices are desirable, but not all computer programs provide them.
Table 9–4. Kaplan–Meier survival curve in detail for patients on etoposide plus cisplatin.


To illustrate confidence bands, we analyze actual survival for all patients in both treatment arms. The procedure for obtaining confidence bands uses the standard error of the cumulative survival estimate $S_i$:
For example, at month 32, 10 patients taking irinotecan plus cisplatin are still in the study and 1 patient's disease has progressed, so
and the standard error is
The remaining calculations for both treatment arms are given in Table 9–6.
Figure 9–4 is a graph of the Kaplan–Meier product limit curve for all patients on irinotecan plus cisplatin, with 95% confidence bands. The curve is step-like because the estimated proportion of patients surviving changes only at the times when a subject dies.
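One widely used way to compute the standard error of $S_i$ is Greenwood's formula, $\mathrm{SE}(S_i) = S_i \sqrt{\sum_{j \le i} d_j / [n_j(n_j - d_j)]}$. The sketch below assumes that formula, which may differ in detail from the one used to produce Table 9–6. It uses simple symmetric limits $S_i \pm 1.96\,\mathrm{SE}$, clipped to [0, 1]. The function name and the data are hypothetical.

```python
import math


def km_greenwood(times, events, z=1.96):
    """Kaplan-Meier estimate with Greenwood standard errors (an assumed formula).

    Returns (time, at risk, events, S, SE, lower, upper) at each event time,
    using symmetric limits S +/- z*SE clipped to [0, 1].
    """
    data = sorted(zip(times, events))
    n_at_risk, survival, greenwood_sum = len(data), 1.0, 0.0
    rows, i = [], 0
    while i < len(data):
        t = data[i][0]
        d = c = 0
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                d += 1
            else:
                c += 1
            i += 1
        if d:
            survival *= 1.0 - d / n_at_risk
            if n_at_risk > d:  # term undefined when everyone at risk has the event
                greenwood_sum += d / (n_at_risk * (n_at_risk - d))
            se = survival * math.sqrt(greenwood_sum)
            rows.append((t, n_at_risk, d, survival, se,
                         max(0.0, survival - z * se), min(1.0, survival + z * se)))
        n_at_risk -= d + c
    return rows


# Hypothetical data (months; False = censored).
rows = km_greenwood([1.0, 2.5, 4.4, 6.0, 7.3, 9.0],
                    [True, False, True, True, False, True])
for t, n, d, s, se, lo, hi in rows:
    print(f"t={t:4.1f} n={n} d={d} S={s:.3f} SE={se:.3f} 95% CI=({lo:.3f}, {hi:.3f})")
```

Under Greenwood's formula, the month-32 event described above (10 patients at risk, 1 progression) would add $1/(10 \times 9) \approx 0.0111$ to the running sum. Many programs instead compute the limits on a transformed scale, such as log–log, which keeps them inside [0, 1] without clipping.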
Table 9–5. Kaplan–Meier survival curve in abbreviated form for patients on etoposide plus cisplatin.


COMPARING TWO SURVIVAL CURVES
Although some journal articles report survival for only one group, more often investigators wish to compare two or more samples of patients. Table 9–6 contains the analysis of survival for the entire sample of 154 patients, separately for each treatment arm (Noda et al, 2002).
The Kaplan–Meier survival curves for both treatment arms are given in Figure 9–5. It is difficult to tell by looking whether the two curves are significantly different. We cannot make judgments simply on the basis of the amount of separation between two lines; a small difference may be statistically significant if the sample size is large, and a large difference may not be if the sample size is small. As you might suspect, we need to perform a statistical test to evaluate the degree of any differences. Use the CD-ROM to replicate our analysis and produce the survival curves.
We need special methods to compare survival distributions. If no censored observations occur, the Wilcoxon rank sum test introduced in Chapter 6 is appropriate for comparing the ranks of survival times. The independent-groups t test is not appropriate because survival times are not normally distributed and tend to be positively skewed (extremely so, in some cases).
If some observations are censored, several methods may be used to compare survival curves. Most articles in the medical literature report a comparison of survival curves using the logrank statistic or the Mantel–Haenszel chi-square statistic. The computations for all of these methods are very time-consuming, but computer programs are readily available. We illustrate the logrank and Mantel–Haenszel methods. Both are straightforward, if computationally onerous, and working through them helps us understand the logic behind the methods. Within the context of the logrank statistic, we illustrate the hazard ratio, a useful descriptive statistic for comparing two groups at risk.
The Logrank Test
Several forms of the logrank statistic have been published by different biostatisticians, so it is called by several different names in the literature: the Mantel logrank statistic, the Cox–Mantel logrank statistic, and simply the logrank statistic. The logrank test compares the number of observed deaths in each group with the number of deaths that would be expected based on the number of deaths in the combined groups, that is, if group membership did not matter. An approximate chi-square test is used to test the significance of a mathematical expression involving the observed and expected numbers of deaths.
To illustrate the logrank test, we use the data from Borghi and colleagues (2002), in which 60 patients received a low-calcium diet and 60 received a normal-calcium, low-protein, low-salt diet. The times to recurrence and outcomes are given in Table 9–7, and the time-to-event curves are shown in Figure 9–6. We grouped the data for the entire sample of 120 patients in Table 9–7; the steps for calculating the logrank statistic follow.
1. The second and third columns contain the number of patients in each group who were at risk of developing stones during the time interval. Thus, at 0–10 months, all 60 patients in each group were at risk. At 11–20 months, patients who developed stones in the first 10 months or were censored (were not in the study that long) are subtracted to obtain the number of patients still at risk, resulting in 55 and 54, respectively. Column 4 gives the total number at risk in the combined samples, that is, the sum of columns 2 and 3. This calculation continues through all periods.
Table 9–6. Survival analysis for OSM (months of survival) in both treatment arms.



Figure 9–4. Kaplan–Meier curve with 95% confidence limits for patients on irinotecan plus cisplatin. (Data, used with permission, from Noda K, Nishiwaki Y, Kawahara M, Negoro S, Sugiura T, Yokoyama A, et al: Irinotecan plus cisplatin compared with etoposide plus cisplatin for extensive small-cell lung cancer. N Engl J Med 2002; 346: 85–91. Survival plot produced with NCSS, used with permission.)
2. Columns 5 through 7 list the number of patients in each group who relapsed during the interval and the total number. Thus, at 0–10 months, two patients in each group developed stones, whereas three patients on the low-calcium diet and four in the other group were in the study less than 11 months. This calculation continues through all periods.
3. The last three columns contain the expected number of relapses for each group and the total for each period. The expected number of relapses for a given group is found by multiplying the total number of relapses in a given period by the proportion of the patients at risk in that period who belong to that group. For example, at 31–40 months, 44 patients remain in group 1 and 46 in group 2, for a total of 90. Four relapses are noted, so 4 × (44/90) = 1.96 relapses are expected in group 1 and 4 × (46/90) = 2.04 in group 2. This calculation is done for all periods.
4. The totals are calculated for each column.
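Steps 3 and 4 amount to splitting each interval's relapses in proportion to the numbers at risk and then summing the columns. The sketch below is illustrative. The function names are hypothetical, and the only row taken from the text is the 31–40 month example in step 3.

```python
def expected_relapses(at_risk_1, at_risk_2, total_relapses):
    """Step 3: split an interval's relapses in proportion to the numbers at risk."""
    n = at_risk_1 + at_risk_2
    return total_relapses * at_risk_1 / n, total_relapses * at_risk_2 / n


def grouped_logrank_totals(rows):
    """Step 4: column totals of observed and expected relapses.

    rows -- (at risk group 1, at risk group 2, relapses group 1, relapses group 2)
            for each interval
    Returns (O1, E1, O2, E2).
    """
    O1 = E1 = O2 = E2 = 0.0
    for n1, n2, d1, d2 in rows:
        e1, e2 = expected_relapses(n1, n2, d1 + d2)
        O1 += d1
        O2 += d2
        E1 += e1
        E2 += e2
    return O1, E1, O2, E2


# Step 3 example from the text: 31-40 months, 44 and 46 at risk, 4 relapses.
e1, e2 = expected_relapses(44, 46, 4)
print(f"{e1:.2f} {e2:.2f}")
```

By construction, the expected totals always add up to the observed totals ($E_1 + E_2 = O_1 + O_2$). This gives a handy check on a hand-built table.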
The following expression can be used to test the null hypothesis that the survival distributions are the same in the two groups:

$$\chi^2 = \frac{(O_1 - E_1)^2}{E_1} + \frac{(O_2 - E_2)^2}{E_2}$$

where $O_1$ is the total number of observed failures in group 1, $E_1$ is the total number of expected failures in group 1, and so forth. The statistic $\chi^2$ follows an approximate chi-square distribution with 1 degree of freedom. In our example, the calculation is
Figure 9–5. Kaplan–Meier survival curves for patients in both treatment arms. (Data, used with permission, from Noda K, Nishiwaki Y, Kawahara M, Negoro S, Sugiura T, Yokoyama A, et al: Irinotecan plus cisplatin compared with etoposide plus cisplatin for extensive small-cell lung cancer. N Engl J Med 2002; 346: 85–91. Survival plot produced with NCSS, used with permission.)
The chisquare distribution with 1 degree of freedom in Table A–5 indicates that a critical value of 3.841 is required for significance at 0.05. We therefore conclude that a statistically significant difference exists in the distributions of time to recurrence of stones for patients on the two diets.
Computer programs that calculate the logrank statistic do so without dividing the recurrences, or failures, into periods. Instead, they calculate the observed and expected numbers of failures at each time a failure occurs, removing censored patients from the number at risk as they leave the study. The result is more accurate than the grouped approach we used, although it is even more computationally intensive. Use the CD-ROM to find the value of the logrank statistic for this sample. Is it in close agreement with our calculations?
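The event-time version can be sketched as follows. It accumulates observed and expected failures at every distinct failure time, removing censored patients from the risk set as they leave. It then sums $(O - E)^2/E$ over the two groups. The function name and the eight-patient example are hypothetical. Many statistical packages report a logrank statistic based on the hypergeometric variance instead, so their value can differ somewhat from this one.

```python
def logrank_event_times(times, events, groups):
    """Logrank comparison evaluated at every distinct failure time.

    times  -- follow-up time for each patient
    events -- True if the patient failed at that time, False if censored
    groups -- list of 1 or 2 for each patient
    Returns (O1, E1, O2, E2, chi_square) with chi_square = sum of (O - E)**2 / E.
    """
    at_risk = {1: groups.count(1), 2: groups.count(2)}
    observed = {1: 0, 2: 0}
    expected = {1: 0.0, 2: 0.0}
    data = sorted(zip(times, events, groups))
    i = 0
    while i < len(data):
        t = data[i][0]
        failed = {1: 0, 2: 0}
        leaving = {1: 0, 2: 0}
        while i < len(data) and data[i][0] == t:
            _, event, g = data[i]
            if event:
                failed[g] += 1
            leaving[g] += 1
            i += 1
        d = failed[1] + failed[2]
        if d:
            n = at_risk[1] + at_risk[2]
            for g in (1, 2):
                observed[g] += failed[g]
                expected[g] += d * at_risk[g] / n  # share of failures expected in g
        for g in (1, 2):
            at_risk[g] -= leaving[g]  # failures and censorings leave after time t
    chi_square = sum((observed[g] - expected[g]) ** 2 / expected[g] for g in (1, 2))
    return observed[1], expected[1], observed[2], expected[2], chi_square


# Hypothetical data: group 1 fails at 2, 4, 6, 8 months; group 2 fails at
# 3 and 7 months and is censored at 5 and 9 months.
O1, E1, O2, E2, chi2 = logrank_event_times(
    [2, 4, 6, 8, 3, 5, 7, 9],
    [True, True, True, True, True, False, True, False],
    [1, 1, 1, 1, 2, 2, 2, 2],
)
print(f"O1={O1} E1={E1:.2f} O2={O2} E2={E2:.2f} chi-square={chi2:.2f}")
```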
The Hazard Ratio
One benefit of calculating the logrank statistic is that the hazard ratio can easily be calculated from the information given in Table 9–8. It is estimated by $(O_1/E_1)/(O_2/E_2)$. In our example, the hazard ratio, or relative risk of recurrence of stones, in patients on the low-calcium diet compared with patients on the normal-calcium, low-protein, low-salt diet is
The hazard ratio of 2.01 can be interpreted in a manner similar to the odds ratio: the risk of recurrence of stones at any time in the group on the traditional low-calcium diet is approximately twice the risk in the group on the normal-calcium, low-protein, low-salt diet. Summarizing the comparison with a single hazard ratio assumes that the ratio of the hazards (risks of the event) in the two groups stays constant throughout the study; we will discuss the concept of hazard again in the next chapter.
Table 9–7. Data on outcomes for patients on a low-calcium vs normal-calcium diet.


The Mantel–Haenszel Chi-Square Statistic
Another method for comparing survival distributions is the Mantel–Haenszel statistic. It is based on a pooled estimate of the odds ratio developed by Mantel and Haenszel, and it follows (approximately) a chi-square distribution with 1 degree of freedom. The Mantel–Haenszel test combines a series of 2 × 2 tables formed at different survival times into an overall test of significance of the survival curves. The Mantel–Haenszel statistic is very useful because it can be used in many settings, not simply to compare survival curves.
Figure 9–6. Kaplan–Meier survival curve for recurrence of stones. (Data, used with permission, from Borghi L, Schianchi T, Meschi T, Guerra A, Allegri F, Maggiore U, et al: Comparison of two diets for the prevention of recurrent stones in idiopathic hypercalciuria. N Engl J Med 2002; 346: 77–88. Survival plot produced with NCSS, used with permission.)
We again use data from Borghi and colleagues (2002) to illustrate the calculation of the Mantel–Haenszel statistic (Table 9–9). The first step is to select the time intervals for which 2 × 2 tables will be formed; we use the 10-month intervals as before. In each 2 × 2 table, the rows are the number of patients who remained stone-free and the number who developed stones during the interval, and the columns are the two diet groups.
Table 9–8. Logrank statistic for survival.



Table 9–9. Illustration of Mantel–Haenszel using entire sample.
As with the logrank test, the Mantel–Haenszel test is onerous to compute. The first step estimates a pooled odds ratio, which is useful for descriptive purposes but is not needed for the statistical test itself. The pooled odds ratio is
$$\text{OR}_{MH} = \frac{\sum_{i} a_{i} d_{i} / n_{i}}{\sum_{i} b_{i} c_{i} / n_{i}}$$
where a, b, c, d, and n are defined as they were in the 2 × 2 table in Table 6–9. The numerator and denominator terms are calculated in the columns under the heading “Odds Ratio.” For the first time period, ad/n is (58 × 2)/120, or 0.97. The sum of the terms in the numerator is 5.61 and in the denominator is 8.39. The Mantel–Haenszel estimate of the odds ratio is therefore 5.61/8.39 = 0.67. The hypothesis to be tested is whether 0.67 is significantly different from 1.
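The pooled odds ratio can be sketched in Python as follows. Each table is assumed to be a tuple (a, b, c, d) laid out as in Table 6–9, and the function name is hypothetical. The usage line reuses the 21–30 month table described in the next paragraph (46 and 45 stone-free, 3 and 2 relapsed); with only that one table, the pooled estimate reduces to ad/bc.

```python
from typing import Iterable, Tuple

Table = Tuple[int, int, int, int]  # (a, b, c, d) as in a standard 2 x 2 table


def mh_pooled_odds_ratio(tables: Iterable[Table]) -> float:
    """Mantel-Haenszel pooled odds ratio: sum(ad/n) / sum(bc/n)."""
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        numerator += a * d / n    # this table's contribution to the numerator
        denominator += b * c / n  # this table's contribution to the denominator
    return numerator / denominator


print(round(mh_pooled_odds_ratio([(46, 45, 3, 2)]), 2))  # 0.68
```

Passing all of the interval tables from Table 9–9 would give the pooled value in the text; each table enters with weight 1/n, so intervals with more patients at risk contribute more.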
The remaining calculations focus on cell (1, 1) of the table; we first find its expected value and its variance for each 2 × 2 table. For example, at 21–30 months, among the 49 patients on the low-calcium diet who had not developed stones and who were in the study at least 21 months, 46 remained stone-free and 3 relapsed during that period. Among the 47 on the experimental diet, 45 remained stone-free and 2 relapsed. The expected values are found in the same manner as in the chi-square test discussed in Chapters 5 and 6. For example, the expected value of cell (1, 1) in this period is the row total times the column total divided by the grand total:
$$E(a) = \frac{(a + b)(a + c)}{n} = \frac{91 \times 49}{96} = 46.45$$
In addition, the variance of cell (1, 1) is calculated. Using the notation from Table 6–9, the estimated variance is
$$\operatorname{Var}(a) = \frac{(a + b)(c + d)(a + c)(b + d)}{n^{2}(n - 1)}$$
For this period, the variance, with rounding, is
$$\operatorname{Var}(a) = \frac{91 \times 5 \times 49 \times 47}{96^{2} \times 95} = 1.20$$
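A short Python sketch of these two calculations for cell (1, 1) follows, again assuming the (a, b, c, d) layout of Table 6–9 with rows for stone-free and relapsed patients and columns for the two diets. The function name is hypothetical; the numbers are the 21–30 month counts given above.

```python
def cell11_expected_and_variance(a: int, b: int, c: int, d: int) -> tuple:
    """Expected value and hypergeometric variance of cell (1, 1) of a 2 x 2 table."""
    n = a + b + c + d
    row1, row2 = a + b, c + d  # stone-free, relapsed
    col1, col2 = a + c, b + d  # low-calcium diet, normal-calcium diet
    expected = row1 * col1 / n
    variance = row1 * row2 * col1 * col2 / (n ** 2 * (n - 1))
    return expected, variance


expected, variance = cell11_expected_and_variance(46, 45, 3, 2)
print(round(expected, 2), round(variance, 2))  # 46.45 1.2
```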
After the expected value and the variance are found for each 2 × 2 table, they are summed, along with the observed number of patients in cell (1, 1) of each table. The three sums are 262, 264.78, and 6.91, as you can see in Table 9–9. The Mantel–Haenszel statistic is the squared difference between the sum of the observed numbers and the sum of the expected numbers, divided by the sum of the variances:
$$\chi^{2}_{MH} = \frac{\left(\sum O - \sum E\right)^{2}}{\sum \operatorname{Var}} = \frac{(262 - 264.78)^{2}}{6.91} = 1.12$$
This value is smaller than the value we found for the logrank test, and it is not significant. Which statistic should we use? The choice is not straightforward. The value of the logrank statistic in NCSS and SPSS is 4.33, significant at P = 0.04. We give some general guidelines in the next section.
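The final step, along with converting a 1-degree-of-freedom chi-square value to a P value, can be sketched with the Python standard library. The sketch relies on the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal variable, so its upper-tail probability is erfc(√(χ²/2)). The function name is hypothetical.

```python
import math


def mantel_haenszel_test(sum_observed: float, sum_expected: float,
                         sum_variance: float) -> tuple:
    """Return (chi-square, P) for the Mantel-Haenszel test with 1 df."""
    chi2 = (sum_observed - sum_expected) ** 2 / sum_variance
    # For 1 df, P(chi-square > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


chi2, p = mantel_haenszel_test(262, 264.78, 6.91)
print(f"chi-square = {chi2:.2f}, P = {p:.2f}")  # chi-square = 1.12, P = 0.29
```

Applying the same conversion to the logrank value of 4.33, `math.erfc(math.sqrt(4.33 / 2))`, gives a P value of about 0.04, in line with the NCSS and SPSS result quoted above.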
Summary of Procedures to Compare Survival Distributions
Logrank statistics are used frequently in the medical literature. Several logrank methods appear in the literature, such as the one developed by Peto and Peto (1972). The logrank procedure gives all calculations the same weight, regardless of the time at which an event occurs. In contrast, the Peto logrank test weights the terms (observed minus expected) by the number of patients at risk at that time, thereby giving more weight to early events, when the number of patients at risk is large. Some biostatisticians choose this method because they believe that calculations based on larger sample sizes should receive more weight than calculations based on the smaller sample sizes that occur later in time. Other statisticians, however, such as Hintze, the developer of NCSS, recommend using the logrank test with equal weighting across time unless investigators have specific reasons for deciding otherwise. If the pattern of deaths is similar over time, the Peto logrank statistic and the logrank statistic we illustrated earlier generally lead to the same conclusion. If, however, a higher proportion of deaths occurs during one interval, as sometimes happens early in the survival curve, the Peto logrank test and the logrank test may differ.
In truth, the information available to guide investigators in deciding which procedure is appropriate in a given application is quite complex, and in some situations readers of journal articles cannot determine which procedure was actually used. Unfortunately, many of the statistical procedures used to compare survival distributions go by a variety of names. Part of the confusion arises because the same biostatisticians (eg, Mantel, Gehan, Cox, Peto and Peto, Haenszel) are or have been leading researchers who developed a number of statistical tests. Another source of confusion is that research on biostatistical methods for analyzing survival data is still under way; for example, the Mantel procedure and the Peto logrank procedure were only recently shown to be equivalent.
The Mantel–Haenszel chi-square test is sometimes referred to as the logrank test, and although it is technically different, it often leads to the same conclusion. This statistic may actually be considered an extension of the logrank test because it can be used in more general situations.
For example, the Mantel–Haenszel chi-square test can be used to combine two or more 2 × 2 tables in other situations, such as a 2 × 2 table for men and a 2 × 2 table for women. This procedure is similar to other methods of controlling for confounding factors, topics discussed in Chapter 10.
To summarize, all logrank tests, regardless of what they are called, and the Mantel–Haenszel chi-square test may be considered similar procedures. The Gehan and Wilcoxon tests, however, are conceptually different. The Gehan, or generalized Wilcoxon, test is an extension of the Wilcoxon rank sum test illustrated in Chapter 6, modified so that it can be used with censored observations (Gehan, 1965). This test is also referred to in the literature as the Breslow test or, for comparisons of more than two samples, the generalized Kruskal–Wallis test (Kalbfleisch and Prentice, 2002). As with the Peto logrank test, the generalized Wilcoxon test uses the number of patients at risk as weights and therefore counts losses that occur early in the survival distribution more heavily than losses that occur late.
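To make the weighting idea concrete, here is a minimal Python sketch of a weighted logrank statistic for two groups. With every weight equal to 1 it is the ordinary logrank test; with weights equal to the total number at risk it is the Gehan (generalized Wilcoxon, or Breslow) test described above. The function name and the toy data are hypothetical, the variance is the usual hypergeometric one, and other weighting schemes that software packages offer are not shown.

```python
from typing import List, Tuple

Observation = Tuple[float, int]  # (time, event): event = 1 failure, 0 censored


def weighted_logrank(group1: List[Observation], group2: List[Observation],
                     weighting: str = "logrank") -> float:
    """Two-group weighted logrank chi-square (1 df).

    weighting="logrank": weight 1 at every event time.
    weighting="gehan":   weight = total number at risk (Gehan/Breslow).
    """
    event_times = sorted({t for t, e in group1 + group2 if e == 1})
    numerator = 0.0  # weighted sum of (observed - expected) for group 1
    variance = 0.0   # weighted sum of hypergeometric variances
    for t in event_times:
        n1 = sum(1 for s, _ in group1 if s >= t)  # at risk just before t
        n2 = sum(1 for s, _ in group2 if s >= t)
        d1 = sum(1 for s, e in group1 if s == t and e == 1)
        d2 = sum(1 for s, e in group2 if s == t and e == 1)
        n, d = n1 + n2, d1 + d2
        if weighting == "logrank":
            w = 1.0
        elif weighting == "gehan":
            w = float(n)
        else:
            raise ValueError(f"unknown weighting: {weighting}")
        numerator += w * (d1 - d * n1 / n)
        if n > 1:  # with one subject at risk the term is 0 (and n - 1 would be 0)
            variance += w ** 2 * n1 * n2 * d * (n - d) / (n ** 2 * (n - 1))
    return numerator ** 2 / variance


early = [(1, 1), (2, 1)]  # hypothetical group whose events occur early
late = [(3, 1), (4, 1)]   # hypothetical group whose events occur late
print(round(weighted_logrank(early, late, "logrank"), 3))  # 2.882
print(round(weighted_logrank(early, late, "gehan"), 3))    # 2.667
```

In this toy example the two weightings give slightly different values because the early event times, where more patients are at risk, count more heavily under the Gehan weights.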
Another difference between these two families of tests is that the logrank statistic assumes that the ratio of hazard rates in the two groups stays the same throughout the period of interest. When a constant hazard ratio cannot be assumed, the generalized Wilcoxon procedure is preferred. In the special situation in which the hazard rates are proportional, a method called Cox's proportional hazard model can be used; it is increasingly used in the literature because it permits investigators to control for confounding variables (see Chapter 10).
As you can see, the issue is complex and illustrates the advisability of consulting a statistician when performing a survival analysis. To return to our question from the previous section: when comparing patients on the low-calcium diet with those on the normal-calcium, low-protein, low-salt diet, should we accept the result from the logrank test, which was significant, or from the Mantel–Haenszel test, which was not? Looking at the distribution of recurrences across time periods in Table 9–8, a constant hazard ratio appears to be a reasonable assumption. Therefore, we would opt for the logrank statistic and conclude that the diets differ (P = 0.04), as did Borghi and colleagues (2002). Readers who want more information are referred to the text by Lee and Wang (2003), possibly the most comprehensive one available. An introductory text devoted to survival analysis is that by Kleinbaum (1996). Other texts that discuss survival methods include books by Hosmer and Lemeshow (1999), Collett (2003), Fisher and van Belle (1996), Fleiss (1981; 1999), and Schlesselman (1982).
Many statistical computer programs provide several test statistics for survival, generally including at least one from the Gehan/Wilcoxon family and one from the logrank group. In the Kaplan–Meier procedure, NCSS gives more than ten statistics, including the logrank test, the Gehan–Wilcoxon test, the Peto test, and the Mantel–Haenszel chi-square. SPSS provides the logrank test and two others. JMP gives two tests: the Wilcoxon test and the logrank test.
THE HAZARD FUNCTION IN SURVIVAL ANALYSIS
In the introduction to this chapter, we stated that calculating mean survival is generally not useful, and we subsequently illustrated how its value depends on the time when the data are analyzed. Reasonable estimates of mean survival can be obtained, however, when the sample size is fairly large. This procedure depends on the hazard function, which is the probability that a person dies in the time interval i to i + 1, given that the person has survived until time i. The hazard function is also called the conditional failure rate; in epidemiology, the term force of mortality is used.
Although the exponential probability distribution was not discussed in Chapter 4 when we introduced other probability distributions (ie, the normal, binomial, and Poisson), many survival curves follow an exponential distribution. It is a continuous distribution based on the exponential function (the inverse of the natural logarithm, ln), and it depends on time and on a constant rate that determines the shape of the curve. It provides a model for describing processes such as radioactive decay.
If an exponential distribution is a reasonable assumption for the shape of a survival curve, then the following formula can be used to estimate the hazard rate, symbolized by the letter H, when censored observations occur:
$$H = \frac{d}{\sum f + \sum c}$$
where d is the number of deaths, Σf is the sum of failure times, and Σc is the sum of censored times. Calculating the hazard rate therefore requires us to add all of the failure times and censored times.
One reason the hazard rate is of interest is that its reciprocal is an estimate of mean survival time. The formulas are complex, and, fortunately, the NCSS computer program calculates the hazard function and its 95% confidence interval as part of the Kaplan–Meier analysis. Details on using the hazard function are given in the comprehensive text on survival analysis by Lee and Wang (2003).
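Under the exponential assumption, the hazard rate and its reciprocal are simple to compute. The sketch below uses a hypothetical function name and made-up times, and it computes only the point estimates, not the confidence interval that NCSS reports. It also evaluates the exponential survival function S(t) = e^(−Ht), which is the curve implied by a constant hazard H.

```python
import math
from typing import Sequence


def exponential_hazard(failure_times: Sequence[float],
                       censored_times: Sequence[float]) -> float:
    """H = d / (sum of failure times + sum of censored times)."""
    d = len(failure_times)  # number of deaths (failures)
    return d / (sum(failure_times) + sum(censored_times))


# Hypothetical data in months: three deaths and one censored patient.
H = exponential_hazard([2, 4, 6], [8])
mean_survival = 1 / H               # reciprocal of the hazard rate
survival_at_12 = math.exp(-H * 12)  # S(12) under a constant hazard
print(round(H, 3), round(mean_survival, 2), round(survival_at_12, 3))  # 0.15 6.67 0.165
```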
INTERPRETING SURVIVAL CURVES FROM THE LITERATURE
The investigators in the study described in Presenting Problem 3 (Crook et al, 1997) reported on a group of 207 men followed prospectively after treatment for prostate carcinoma with radiotherapy. Follow-up included systematic transrectal ultrasound-guided biopsies and measurements of serum prostate-specific antigen (PSA) levels. The median duration of follow-up at the time of analysis was 36 months, with a range from 12 to 70 months. Failures were observed in 68 patients. The investigators wanted to look at the relationship between patient outcome and both the stage of the tumor at diagnosis and pretreatment PSA levels. Time-dependent variables, survival, and time to failure were examined using the Kaplan–Meier product limit method. All outcomes were calculated from the time the patient completed radiotherapy, and the curves were compared using the logrank test. Treatment failures were categorized as local, distant, or chemical. The investigators examined the relationship between multiple variables and the time until failure using the Cox proportional hazard model; we return to this study when we discuss this method in Chapter 10.
Figure 9–7. Kaplan–Meier survival curve showing disease-free survival by tumor stage. (Used, with permission, from Crook JM, Bahadur YA, Bociek RG, Perry GA, Robertson SJ, Esche BA: Radiotherapy for localized prostate carcinoma. Cancer 1997; 79: 328–336. Figure produced with NCSS; used with permission.)
The authors presented six different survival curves:
1. Overall survival by tumor stage
2. Disease-free survival by tumor stage
3. Time to any failure by pretreatment PSA level (divided into six categories)
4. Time to local failure
5. Time to distant failure
6. Time to any failure by the lowest (nadir) post-treatment PSA
We reproduced two of the figures using the NCSS computer program and the data provided by the authors. Figure 9–7 shows the disease-free survival curves for patients categorized by their tumor stage (T classification). The curves for T1b–c and T2a are very similar and, in fact, cross several times. The survival rate was very high in these two groups for the first 2 years following treatment. Even without a statistical test, it appears that patients with these tumor stages follow similar survival curves. The situation is different for those with stages T2b–c and T3–4: survival rates in these two stages are considerably lower, and both curves show a fairly steady decrease in disease-free survival over time.
Table 9–10 gives the results of the statistical tests comparing the survival curves, computed with the NCSS computer program. The values for the three procedures are in close agreement: the Gehan–Wilcoxon chi-square is 25.94, the Peto–Wilcoxon is 27.98, and the logrank is 27.70. All are highly significant and substantiate our tentative conclusion that the curves, certainly for tumor stages T2b–c and T3–4, are significantly lower, indicating earlier deaths in patients with these two tumor stages.
We can use the filter procedure in NCSS or SPSS to select a subset of the cases and learn whether the curves for patients with tumor stages T2b–c and T3–4 differ. Selecting only the patients with stages T2b–c and T3–4 gives a chi-square value for the logrank statistic of 6.81, with a P value of 0.009. We suggest you replicate these analyses using the CD-ROM.
Figure 9–8 illustrates the six survival curves, produced with the NCSS computer program, for the patients categorized according to their pretreatment PSA levels. The investigators formed these six categories of PSA level: 0–5, 5.1–10, 10.1–15, 15.1–20, 20.1–50, and > 50. Is the pattern of survival what you would expect; that is, do patients with lower pretreatment PSA values survive longer, on average? What would you advise a patient with a PSA level of 3? It appears that such a patient has a very good chance of survival to 5 or 6 years, although it is important to remember that the curves are based on a small number of patients. The problem associated with small samples is exacerbated when the subjects are divided into groups. What would you tell a patient who inquires about survival if he has a pretreatment PSA level greater than 15?
Table 9–10. Statistical tests for survival curves by tumor stages.


Use the CD-ROM to determine the logrank statistic for survival by pretreatment PSA. We produced the output in Table 9–11.
THE INTENTION-TO-TREAT PRINCIPLE
In the methods section of journal articles that report clinical trial results, investigators often state that they analyzed the data on an intention-to-treat basis. For instance, Presenting Problem 5 in Chapter 3 described a study evaluating antenatal administration of thyrotropin-releasing hormone to improve pulmonary outcome in preterm infants (Ballard et al, 1998). The study consisted of 996 women in active labor who were randomized to receive an injection of thyrotropin-releasing hormone or normal saline. The primary outcome was infant death on or before the 28th day after delivery or chronic lung disease, defined as the need for oxygen therapy for 21 of the first 28 days of life.
Figure 9–8. Kaplan–Meier survival curve showing time to any failure by pretreatment PSA level group. (Used, with permission, from Crook JM, Bahadur YA, Bociek RG, Perry GA, Robertson SJ, Esche BA: Radiotherapy for localized prostate carcinoma. Cancer 1997; 79: 328–336. Figure produced with NCSS; used with permission.)

Table 9–11. Statistics comparing time to any failure by pretreatment PSA group.


In the methods section, the researchers state: “All analyses were based on the intention-to-treat principle…”. This statement means that the results for each patient who entered the trial were included in the analysis of the group to which the patient was randomized, regardless of any subsequent events. In Ballard and coworkers' study, 18 women in the treatment group (3.7%) and 7 in the placebo group (1.4%) withdrew from the trial because of side effects. These patients, however, were included in the analysis.
Analyzing data on an intention-to-treat basis is appropriate for several reasons. First is the issue of dropouts, as in the study by Ballard and coworkers. Although the percentage of patients dropping out of this study was relatively small, more than twice as many patients dropped out of the treatment group as out of the control group. Is it possible that the patients who dropped out of the treatment group had some characteristics that, independent of the treatment, could affect the outcome? Suppose, for instance, that the women who dropped out of the study had gestations of fewer than 26 weeks. Women with gestations of fewer than 26 weeks are more likely to have infants with respiratory distress. If these patients are omitted from the analysis, the results may appear to be better for the treatment group than they should; that is, the results are biased. Although there is no indication that this occurred in the study by Ballard and coworkers, it is easy to see how such events could affect the conclusions, and these investigators were correct to analyze the data on an intention-to-treat basis.
The intention-to-treat principle is also important in studies in which patients cross over from one treatment group to another. For example, the classic Coronary Artery Surgery Study (CASS, 1983) was a randomized trial of coronary bypass surgery. Patients were assigned to medical treatment or surgical intervention to evaluate the effect of treatment on outcomes for patients with coronary artery disease. As in many studies that compare a conservative treatment with a more aggressive intervention, some patients in the CASS study who were randomized to medical treatment subsequently underwent surgery, and some patients randomized to surgery were treated medically instead.
The problem with studies in which patients cross over from one treatment to another is that we do not know why the crossover occurred. Did some of the patients originally assigned to medical treatment improve so that they became candidates for surgery? If so, this could cause results in the surgery group to appear better than they really were (because “healthier” patients were removed from the medical group and transferred to the surgery group). On the other hand, perhaps the condition of the patients originally assigned to medical treatment worsened to such a degree that the patient or family insisted on having surgery. If so, this could cause the surgery group results to appear worse than they really were (because “sicker” patients were transferred from the medical to the surgical group). The point is, we do not know why the patients crossed over, and neither do the investigators.
In the past, some investigators faced with such a situation analyzed the patients by the group they were in at the end of the study. Other researchers omitted from the analysis any patients who crossed over. It should be easy to see why both of these approaches are potentially biased. The best approach, one recommended by biostatisticians and advocates of evidence-based medicine, is to perform all analyses on the original groups to which the patients were randomized. The CASS study occurred several years ago, when no consensus existed on the best way to analyze the findings. The CASS investigators therefore performed the analyses in several ways: by the original groups (intention-to-treat), by the final groups of the study, and by eliminating all crossovers from the analysis. All of these methods gave the same result, namely that no difference in survival occurred, although later studies showed differences in quality-of-life indicators.
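The three analysis strategies that the CASS investigators used differ only in how each patient is assigned to a comparison group. The Python sketch below makes that explicit with a handful of hypothetical patients (not CASS data), each recorded as (assigned arm, treatment actually received, died). It compares crude proportions of deaths rather than survival curves, purely to show how the grouping rule can change the comparison; the function name and rule labels are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Patient = Tuple[str, str, int]  # (assigned arm, arm actually received, died 0/1)


def death_proportions(patients: List[Patient], rule: str) -> Dict[str, float]:
    """Proportion of deaths per group under one of three grouping rules."""
    tallies = defaultdict(lambda: [0, 0])  # group -> [deaths, patients]
    for assigned, received, died in patients:
        if rule == "intention_to_treat":
            group = assigned                  # analyze as randomized
        elif rule == "as_treated":
            group = received                  # analyze by final group
        elif rule == "exclude_crossovers":
            if assigned != received:
                continue                      # drop patients who crossed over
            group = assigned
        else:
            raise ValueError(f"unknown rule: {rule}")
        tallies[group][0] += died
        tallies[group][1] += 1
    return {g: deaths / total for g, (deaths, total) in tallies.items()}


# Hypothetical trial: two sicker medical patients crossed over to surgery.
trial = [
    ("medical", "medical", 0), ("medical", "medical", 0),
    ("medical", "surgical", 1), ("medical", "surgical", 1),
    ("surgical", "surgical", 0), ("surgical", "surgical", 1),
    ("surgical", "surgical", 0), ("surgical", "surgical", 0),
]
for rule in ("intention_to_treat", "as_treated", "exclude_crossovers"):
    print(rule, death_proportions(trial, rule))
```

In this toy example the intention-to-treat comparison favors surgery (0.25 vs 0.5), while both the as-treated and the exclude-crossovers analyses favor medical treatment, because the patients who crossed over were also the ones who died.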
The intention-to-treat principle applies to studies other than those with survival as the outcome. We included the topic here, however, because it is so pertinent to survival studies. Gillings and Koch (1991) provide a comprehensive and very readable discussion.
SUMMARY
Special methods are needed to analyze data from studies of survival time because censored observations occur when patients enter at different times and remain in a study for different periods. Without such methods, investigators would have to wait until all subjects had been in the study for a given period before analyzing the data. In medicine, survival curves are commonly drawn by the Kaplan–Meier product limit method and, less frequently, the actuarial (life table) method. The quality of survival studies published in the medical literature was reviewed by Altman and associates (1995). They found that almost half of the papers did not summarize the length of follow-up or clearly define all endpoints. They suggested some guidelines for presenting survival analyses in medical journals.
The study of small-cell lung cancer by Noda and colleagues (2002) was terminated early when an interim analysis showed a significant difference in overall survival between the two groups. When the final analysis was conducted 64 months after the first patient was enrolled and 28 months after the last patient was enrolled, the median overall survival was 12.8 months in the irinotecan-plus-cisplatin group and 9.4 months in the etoposide-plus-cisplatin group. The rates of overall survival in the irinotecan group at 1 and 2 years were 58.4% and 19.5%, respectively, and in the etoposide group, 37.7% and 5.2%, respectively. The authors concluded that the combination of irinotecan plus cisplatin is an effective treatment option for patients with metastatic small-cell lung cancer.
In the study by Borghi and colleagues (2002), 23 of the 60 men on the low-calcium diet and 12 of the 60 men on the normal-calcium, low-protein, low-salt diet had a recurrence of stones. This study shows that a diet with a normal amount of calcium but with reduced amounts of animal protein and salt is more effective than the traditional low-calcium diet in reducing the risk of recurrent stones in men with hypercalciuria.
We illustrated the Kaplan–Meier and actuarial methods for estimating the length of survival time. The Kaplan–Meier method recalculates survival each time a patient dies and thus provides exact estimates at each death time. Although it is generally more time-intensive to calculate by hand, the widespread use of computers has made the Kaplan–Meier curve the procedure of choice.
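The product-limit calculation itself is short. The Python sketch below (hypothetical function name, made-up data) updates the survival estimate at each distinct death time by multiplying the previous estimate by (1 − deaths/number at risk). Patients censored at a death time are counted as still at risk at that time, which is a common convention.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float],
                 events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (time, survival) at each distinct death time (event = 1 death, 0 censored)."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:  # gather all subjects with this time
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk      # product-limit step
            curve.append((t, survival))
        at_risk -= removed                        # deaths and censored leave the risk set
    return curve


# Hypothetical data: times in months; the patient at 2 months with event 0 is censored.
print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
```

Here the estimates are 0.8, 0.6, and 0.3 at months 1, 2, and 3; the censored patient at month 4 adds no step to the curve.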
Published results on treatment failure for men with prostate carcinoma treated with radiotherapy were examined in Presenting Problem 3 (Crook et al, 1997). This study found that treatment failure and survival are related to the initial tumor stage and the pretreatment PSA level: patients with T classifications of T1b, T1c, or T2a and those with lower PSA levels had relatively long survival times.
We concluded the chapter with a discussion of the important intention-to-treat principle, whereby patients are analyzed in the group to which they were originally assigned. We described some of the problems in interpreting the results when this principle is not followed, and we pointed out that the intention-to-treat principle applies to any study, regardless of whether the outcome is survival or another variable.
EXERCISES
1. A renal transplant surgeon compared two groups of patients who received kidney transplants.^{a} One group underwent transplantation and received azathioprine to retard rejection of the transplanted organ. The other group was treated with cyclosporine, an immunomodulatory substance. Data are given on these two groups in Tables 9–12 and 9–13.
a. Perform the calculations for Kaplan–Meier survival curves. We suggest you use the CD-ROM and the file called “Birtch.”
b. Draw the survival curves. Do you think the survival curves are significantly different?
Table 9–12. Kidney survival in patients having a transplant and receiving azathioprine.


c. Perform the logrank test, and interpret the results.
2. Moertel and colleagues (1985) performed a double-blind, randomized trial of high-dose vitamin C versus placebo in the treatment of advanced colorectal cancer in patients who had never received chemotherapy. In addition to analyzing survival as an outcome, the investigators used the Kaplan–Meier method and the logrank statistic to analyze progression of the disease. Progression was defined as any of the following: an increase of more than 50% in the product of the perpendicular diameters of any area of known malignancy, a new area of malignancy, substantial worsening of symptoms or performance status, or weight loss of 10% or more.
The results of their analysis are reproduced in Figure 9–9.
Table 9–13. Kidney survival in patients having a transplant and receiving cyclosporine.


a. What conclusion can be drawn from the figure?
b. What is the median time to disease progression in each group?
c. Do you think the analysis of survival times found a statistically significant difference?
3. Bajwa and colleagues (1996) wanted to determine risk factors that predict which patients undergoing chronic dialysis treatment will decide to discontinue their lifesaving dialysis. A prospective study of 235 chronic dialysis patients analyzed sociodemographic, quality-of-life, medical, and dialysis variables over a period of 3½ years. Seventy-six patients (32%) died during the follow-up period. Thirteen of these (17%) died because of discontinuation of dialysis. Data from the study are available in a data set entitled “Bajwa” on the CD-ROM.
Use the data to compare survival for patients treated using continuous ambulatory peritoneal dialysis versus traditional hemodialysis procedures.
Figure 9–9. Comparison of disease progression in patients with colorectal cancer receiving vitamin C versus placebo. (Reproduced, with permission, from Moertel CG, Fleming TR, Creagan ET, Rubin J, O'Connell MJ, Ames MM: High-dose vitamin C versus placebo in the treatment of patients with advanced cancer who have had no prior chemotherapy: A randomized double-blind comparison. N Engl J Med 1985; 312: 137–141.)
a. Which treatment resulted in longer survival times? Was the longer survival sustained over a long period?
b. What is the value of the logrank statistic? What do you conclude from this value?
c. What are the potential biases in drawing conclusions about treatment method in this study?
4. Refer to Figure 9–7, illustrating the survival curves for the patients categorized by the stage of their tumor (Crook et al, 1997). Is it possible to find the median survival for any of these groups? If so, what is the median survival time for each such group?
5. Borghi and colleagues (2002) also classified patients as high versus normal risk for stones.
a. Use the data on the CD-ROM to produce Kaplan–Meier curves for the number of months until recurrence of stones for patients who were high-risk versus normal-risk. In NCSS the cumulative incidence procedure must be used; in SPSS, request the curves for 1 – survival.
b. Based on the graph, what would you conclude about the risk classification used in the study?
6. Refer to the MRFIT study discussed in Chapter 8 and used as a group exercise in that chapter.
Refer to Figure 2 in the study: cumulative coronary heart disease and total mortality rates for the two groups. What statistical method is optimal for determining whether the two groups differed in either of these outcomes?
Figure 9–10. Survival probabilities for 1698 patients according to the extent of coronary artery disease before operation. (Reproduced, with permission, from Figure 3 in Lawrie GM, Morris GC, Earle N: Long-term results of coronary bypass surgery. Ann Surg 1991; 213: 377–385.)
7. Outcomes in a cohort of patients who had had coronary artery bypass surgery 20 years earlier were described by Lawrie and coworkers (1991). Follow-up activities included physician visits, questionnaires, and telephone interviews at regular intervals. Data were available on 92% of the patients 20 years after surgery. The investigators examined survival for a number of subgroups and used the expected survival of an age- and sex-adjusted population from the U.S. census to provide a baseline. The curves were calculated using the Kaplan–Meier method; groups defined by the vessel involved are given in Figure 9–10. It is important to recognize that coronary artery bypass surgery procedures have changed greatly since the time of this study, and today's patients enjoy more favorable long-term results.
a. Which group had the best survival?
b. Which group had the highest mortality rate?
c. What was the approximate median survival in each group?
Footnote
^{a}Data kindly provided by Dr. Alan Birtch, Professor Emeritus, Southern Illinois University School of Medicine, Department of Surgery.