Brian Haynes MD, PhD, FACP1
Professor of Clinical Epidemiology and Medicine and Chair
Harold C. Sox Jr MD, MACP2
1Department of Clinical Epidemiology and Biostatistics, McMaster University Health Sciences Centre
2Annals of Internal Medicine
The authors have no commercial relationships with manufacturers of products or providers of services discussed in this chapter.
An increasing amount of very useful quantitative evidence from health care research is available to practitioners. New research findings continually expand the knowledge base of what does more good than harm for patients, and institutional forces, both professional and financial, are accelerating the adoption of research findings. More and more information is available on issues related to such important clinical topics as screening and diagnostic tests, preventive and therapeutic interventions, prognosis and clinical prediction, risk of adverse outcomes, improvement in quality of care, and cost-effectiveness of tests and treatments. Clinical application of this evidence has lagged, however, for a number of reasons.1 First, evidence from research is often not definitive or covers only some aspects of practice. Second, clinicians are often slow to adopt research findings, even those that are well validated. Third, resources may be inadequate or too poorly organized in the local setting to permit implementation. Fourth, clinicians may be unfamiliar with the concepts that lie behind the application of quantitative reasoning to clinical care. This chapter addresses the last of these barriers: principles and methods for quantitative reasoning.
Lack of precision in clinical thinking is beginning to yield to several encouraging developments—in particular, clinicians increasingly applying principles of critical appraisal to evidence in the medical literature; formulation of methods for medical decision analysis; increasing clinical comfort with terms such as sensitivity, specificity, likelihood ratio, number needed to treat, and confidence interval (CI); and creation of print and electronic resources that minimize the effort that clinicians must make to find and interpret valid quantitative evidence when it is needed. These developments notwithstanding, the possibility of miscommunication is still considerable. A 2003 study of primary care physicians reported that just over 50% of respondents were able to answer questions about critical appraisal of methods and interpretation of results of studies focusing on treatments and diagnostic tests.2 Patients are entitled to expect clearer thinking from their physicians, especially because many patients have difficulty themselves interpreting information about risks, benefits, and prognoses provided by their doctors.3 Moreover, the current health care environment increasingly demands that physicians be able to justify clinical policies and decisions with an evidence-based, quantitative approach.
We have two principal goals in this chapter. The first is to provide a basic explanation of the measurements used in critical appraisal of the literature and the ways in which physicians interpret these measurements in evidence-based clinical decision making. With the advent of electronic access to MEDLINE and its clinical subsets, specialized compendia of studies (e.g., Clinical Evidence4 and Physicians' Information and Education Resource [PIER]5), systematic reviews of studies (e.g., the Cochrane Library6), and alerting services for new, clinically relevant evidence (e.g., bmjupdates+7 and MEDSCAPE Best Evidence alerts8), the current best evidence for clinical practice is becoming more and more accessible to clinicians.
The second goal is to introduce the topic of medical decision analysis. Clinicians use decision analysis in two ways. One way is essentially indirect: reliance on products of decision analyses conducted by others. For example, practice guidelines increasingly influence many of the quick, straightforward decisions that occur in daily practice. Many of these guidelines are based on formal decision analyses. The second way of employing decision analysis is more direct: using the tools of decision analysis to assist in making major decisions about the care of an individual patient. Although few physicians spend the hours required to conduct a formal decision analysis from scratch, some tools of decision analysis (e.g., likelihood ratios of test results) are easy to apply; moreover, some decision analyses are accessible on a desktop or palmtop computer and only require the clinician to enter the clinical findings required by the decision tree.
It is important to understand the intent of this effort to achieve precision and quantitation in measurement and decision making: to enhance the quality of care by tailoring it more closely to the individual patient. Anything that can be measured, even if only qualitatively, can be counted and turned into a clinically useful quantitative measure. For example, a study might classify clinical outcomes only qualitatively (e.g., as satisfactory or unsatisfactory), but if the numbers of participants in the study who fall into one or the other of the two outcome states are counted, the result then becomes quantitative. If physicians can define individual states and measure them quantitatively (e.g., by using a continuous scale to assess functional status), they can describe individual patient status more precisely and therefore can make finer distinctions between groups of patients. By placing patients in distinctive groups, physicians can achieve one of the great goals of patient care: to inform patients' choices between alternative treatments on the basis of the known predictors of response to those treatments.
What is the role of the individual practitioner in retrieving and evaluating evidence from research and incorporating it into individual clinical decisions? The answer to this innocuous question distills the angst of contemporary health care. In some settings, the practitioner has the freedom to act as circumstances dictate, whereas in others (e.g., certain managed care settings), someone else tries to dictate how to translate research results into patient care. We believe that practitioners cede their responsibility for clinical decision making to others at great risk to their patients and themselves, because any clinical decisions must take into account not only the evidence available and the guidelines in force but also the patient's unique circumstances and individual wishes. In today's world, the freedom to determine the content of one's practice is increasingly precious. To use this freedom responsibly, practitioners must have ready access to information that is based on current best evidence, must understand the basic principles of quantitative decision making and decision analysis, must be able to determine whether others have applied these principles appropriately in published works or in practice, and must be able to understand how to use evidence from research to make decisions in clinical practice.
How to Critically Evaluate Research Reports
To use numbers wisely in making decisions about patients, the physician must have some way of determining whether the numbers are derived from sound research. Detailed users' guides for interpreting the medical literature are available9; in an effort to simplify this issue, we have provided an abbreviated set of such guides [see Table 1].10 Physicians may find these guides especially useful when reading research reports in the primary literature. However, when physicians are not getting and interpreting evidence themselves, they should look to evidence-based publications, such as Clinical Evidence and PIER; systematic review articles, such as those from the Cochrane Collaboration and clinical journals; and practice guidelines that use explicit criteria for evaluating evidence [see Table 1].
Table 1 Abbreviated Users' Guides for Appraisal of Medical Journal Articles
How to Apply Research Results to Patient Care
Once a physician is satisfied that the quantitative results from the relevant research were derived through sound methods, he or she can interpret them in light of the patient's circumstances and use them to help determine the best way to proceed with management. The interpretation of research results takes five main forms: (1) measures of disease frequency, (2) measures of diagnostic certainty, (3) measures of diagnostic test performance and interpretation, (4) measures of the effects of treatment, and (5) measures of treatment outcomes adjusted for quality of life.
MEASURES OF DISEASE FREQUENCY
Clinically useful measures of disease frequency include incidence, prevalence, the case-fatality rate, the P value, and the CI [see Table 2]. The use of these terms is illustrated in more detail below.
Table 2 Clinically Useful Measures of Disease Frequency
MEASURES OF DIAGNOSTIC CERTAINTY: USE OF PROBABILITIES
When asked how sure they are of their diagnoses, most physicians express their degree of certainty in words rather than numbers. A classic study illustrates the difficulty of this approach.11 The authors examined pathology and radiology reports and recorded various terms expressive of the probability of a disorder, such as “compatible with,” “consistent with,” “likely,” “probably,” and “pathognomonic.” They then asked a group of clinicians to assign numerical probabilities to all of these terms. For each term (even “pathognomonic”), the range of probabilities stretched over half the scale. For example, to one physician, “likely” meant there was a 45% chance that the disease in question was present, whereas to another, “likely” meant the probability was higher than 90%. When diagnostic-test specialists were asked on two different occasions what they meant by these terms, the earlier and later answers were highly consistent for each individual specialist but highly inconsistent from one specialist to the next.
An alternative to using words to express the degree of diagnostic certainty is to use a number—namely, the probability that the diagnosis is present. A probability is a number between 0 and 1 that expresses the likelihood that an event will occur; 0 represents certainty that it will not occur, and 1 represents certainty that it will. Using probability to express diagnostic certainty has two key advantages. First, it facilitates precise communication. Comparison of probability estimates is a far more precise method of comparing degrees of diagnostic certainty than exchanging verbal assessments. Second, there exists an accurate method of calculating changes in the likelihood of disease as new information (e.g., a test result) becomes available. This method, Bayes' theorem, should be one of the central principles that underlie medical practice. This claim may seem audacious to some readers, but we all recognize that the interpretation of new information about the patient moves us either away from or closer to a diagnosis and, therefore, away from or closer to the decision to use a specific treatment.
The probability of an event is not precisely the same thing as the odds of an event occurring, even though the two are mathematically equivalent ways of expressing diagnostic uncertainty. Habitués of the racetrack are reputed to use odds directly, but most clinicians are likely to find probabilities easier to use. Each of these measures can be readily converted to the other, as follows:
Odds = probability / (1 − probability)
Probability = odds / (1 + odds)
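The two-way conversion between probability and odds can be sketched in a few lines of Python. This is a minimal illustration only; the function names are hypothetical and do not come from the chapter.

```python
def probability_to_odds(probability: float) -> float:
    """Convert a probability in [0, 1) to odds in favor of the event."""
    if not 0.0 <= probability < 1.0:
        raise ValueError("probability must be at least 0 and less than 1")
    return probability / (1.0 - probability)


def odds_to_probability(odds: float) -> float:
    """Convert nonnegative odds back to a probability."""
    if odds < 0.0:
        raise ValueError("odds must be nonnegative")
    return odds / (1.0 + odds)


if __name__ == "__main__":
    # A probability of 0.75 corresponds to odds of 3 (that is, 3 to 1 in favor).
    print(probability_to_odds(0.75))
    print(odds_to_probability(3.0))
```

Converting a probability to odds and back returns the starting value, which is the sense in which the two measures are mathematically equivalent.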
To use a test result quantitatively, a physician must first estimate the pretest probability of the disease. Unaided, physicians are not particularly good at this task. In a 1982 study, when primary care physicians were given clinical scenarios and asked for their estimates of the probabilities of given disorders, they provided estimates—quite confident ones—but their estimates did not agree with those of their fellow clinicians.12 Indeed, when individual physicians were tested subsequently with the same scenarios, their later estimates did not agree with their initial ones.
How does a physician estimate the probability that a patient's chief complaint is a manifestation of a particular disease? The first step is to take a careful history and do a physical examination. From this point, the physician may take any of three basic approaches to estimating the probability of a disease13: (1) subjective estimation, (2) estimation based on the prevalence of disease in other patients with the same syndrome, or (3) application of clinical prediction rules.
Subjective Estimation
In principle, the physician can draw on personal experience with similar patients and use the estimated frequency of the disease in those patients. In practice, this approach is little more than a semiquantitative guess and is prone to error because of defective recall, as well as to bias in the application of the heuristics (i.e., the rules of thumb) for estimating probability. Examples of such heuristics are representativeness, by which one estimates a probability on the basis of the similarity of the patient's signs and symptoms to the features of the classic description of the disease, and availability, by which one estimates a probability partly on the basis of how easy it is to recall similar cases. One very useful heuristic is anchoring and adjustment, by which one establishes an initial estimate (e.g., the prevalence of pulmonary embolism in 100 patients presenting to the emergency department with pleuritic chest pain) and then adjusts the estimate upward or downward by taking into account the patient's findings (e.g., hypoxemia, unilateral leg swelling, or a history of cancer). Physicians can, in principle, calculate the extent of such adjustments by using Bayes' theorem (see below).
Estimation Based on the Prevalence of Disease in Other Patients with the Same Syndrome
One antidote to the failures of subjective probability estimation is to base the estimate on accurate diagnoses established in a series of patients with the same clinical syndrome as the patient under consideration. The best example is the diagnosis of suspected coronary artery disease in patients with chronic chest pain. On the basis of the clinical history, the physician can place the patient in one of three categories: typical angina pectoris, atypical angina, or nonanginal chest pain. Many published studies have reported the frequency of angiographically proven coronary disease in patients with these syndromes. These studies have shown, for example, that in a man with atypical angina, the probability of significant coronary artery disease is approximately 0.70 (see below).
Application of Clinical Prediction Rules
Clinical prediction rules describe the key clinical findings that predict a disease and show how to use these findings to estimate the probability of disease in a patient. Such rules are based on analysis of a standardized set of data, including clinical findings and the final diagnosis, for each of many patients with a diagnostic problem. One type of clinical prediction rule uses regression analysis to identify the best clinical predictors and their diagnostic weights. The sum of the diagnostic weights corresponding to a patient's findings is a score, and the probability of disease for each patient is equivalent to the prevalence of disease among patients with a similar score. A well-known example of this approach is the rule for estimating the probability of cardiac complications from noncardiac surgery.14 Another interesting example showed that the prevalence of coronary artery disease in patients with similar chest pain scores varied systematically according to the overall prevalence of coronary artery disease in several study populations.15 This study suggested that the probability of disease corresponding to a patient's clinical history varies depending on whether the setting of care is a primary care practice or a referral practice. Diagnostic Strategies for Common Medical Problems16 is an excellent source of pretest probabilities, as is Evidence-Based Physical Diagnosis.17
MEASURES OF DIAGNOSTIC TEST PERFORMANCE AND INTERPRETATION
Clinically useful measures of diagnostic test performance include sensitivity, specificity, and the likelihood ratio; clinically useful measures of test interpretation include pretest odds, pretest probability, probability after a positive test result, and probability after a negative test result [see Table 3]. Physicians should memorize and internalize the definitions of these terms to avoid becoming muddled when attempting to use information from diagnostic tests in decision making.
Table 3 Definitions of Clinically Useful Measures of Diagnostic Test Performance and Interpretation
In the past, articles usually described the performance of a diagnostic test only in terms of sensitivity and specificity. These familiar terms do not directly describe the effect of a test result on the probability of disease. To correct this shortcoming, many articles now use the likelihood ratio (LR), which is the factor by which the odds of a disease change with new information. This value is calculated as follows:
LR = probability of the test result in patients with the disease / probability of the same test result in patients without the disease
Because physicians often express test results as either positive or negative, there is a likelihood ratio for a positive test result (LR+) and a likelihood ratio for a negative test result (LR−). The formula for the likelihood ratio for a positive test result is as follows:
LR+ = sensitivity / (1 − specificity)
The formula for the LR for a negative test is as follows:
LR− = (1 − sensitivity) / specificity
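Both likelihood ratios can be computed directly from a test's sensitivity and specificity. The Python sketch below assumes a test with dichotomous (positive or negative) results; the function name and the example values of 0.90 and 0.80 are hypothetical and are not taken from any study cited in this chapter.

```python
from typing import Tuple


def likelihood_ratios(sensitivity: float, specificity: float) -> Tuple[float, float]:
    """Return (LR+, LR-) for a test with positive/negative results.

    LR+ = sensitivity / (1 - specificity)
    LR- = (1 - sensitivity) / specificity
    """
    if not 0.0 <= sensitivity <= 1.0:
        raise ValueError("sensitivity must be between 0 and 1")
    if not 0.0 < specificity < 1.0:
        # Excluding 0 and 1 avoids division by zero in either ratio.
        raise ValueError("specificity must be strictly between 0 and 1")
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


if __name__ == "__main__":
    # Hypothetical test: sensitivity 0.90, specificity 0.80.
    lr_pos, lr_neg = likelihood_ratios(0.90, 0.80)
    print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.3f}")
```

An LR+ well above 1 raises the odds of disease after a positive result, and an LR− well below 1 lowers them after a negative result.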
The likelihood ratio is generally a better descriptor than sensitivity or specificity because it more directly describes the effect of a test result on the odds of disease. Calculating the probability of disease after obtaining new information is an application of Bayes' theorem. The most useful form of Bayes' theorem for this purpose is the odds ratio format:
Posttest odds = pretest odds × likelihood ratio
This form of Bayes' theorem illustrates a very powerful concept that clinicians often overlook: new information has meaning only in context. Operationally, the statement means that a physician should never interpret a test result in isolation but should always take into account the individual patient's pretest probability. Simply stated, the posttest probability after a positive test result will be greater if the pretest index of suspicion was high than if the pretest index of suspicion was low. The most important practical application of this reasoning is to be suspicious when a test result is negative in a patient whose clinical findings strongly point toward a disease—that is, the probability of the disease may still be high, even after the negative test. One should also be suspicious when a test is positive in a patient for whom the likelihood of disease is very low.
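The dependence of the posttest probability on the pretest probability can be made concrete with a short Python sketch of the odds ratio format. The likelihood ratios and pretest probabilities used here are hypothetical values chosen only to illustrate the point, and the function name is an assumption.

```python
def posttest_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Apply posttest odds = pretest odds x likelihood ratio, then convert to a probability."""
    if not 0.0 <= pretest_probability < 1.0:
        raise ValueError("pretest probability must be at least 0 and less than 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood ratio must be nonnegative")
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)


if __name__ == "__main__":
    # The same positive result (hypothetical LR+ of 10) at two levels of suspicion.
    for pretest in (0.05, 0.60):
        print(f"pretest {pretest:.2f} -> posttest {posttest_probability(pretest, 10.0):.2f}")
    # A negative result (hypothetical LR- of 0.1) when suspicion is high.
    print(f"pretest 0.80 -> posttest {posttest_probability(0.80, 0.1):.2f}")
```

With these illustrative numbers, the same positive result leaves disease less likely than not when the pretest probability is 0.05 but makes it very likely when the pretest probability is 0.60, and a negative result in a patient with a pretest probability of 0.80 still leaves a probability of disease of more than one in four.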
The evaluation of suspected pulmonary embolism (PE) is a good example of the practical use of these statistical terms and methods. A 37-year-old woman presents to the emergency department (ED) with pleuritic chest pain and new dyspnea. She has a low-grade fever and has no cough or hemoptysis, but the ED physician believes it necessary to rule out PE. The patient has none of the other known risk factors for PE (e.g., recent surgery or prolonged bed rest, previous deep vein thrombosis [DVT], coagulopathy, malignancy, pregnancy, and use of oral contraceptives), and physical examination reveals no evidence of DVT. The arterial oxygen tension (PaO2) is 92 mm Hg on room air. The patient is quite distressed. The ED physician orders a chest x-ray and a helical CT scan. The CT scan is interpreted as negative for PE. The resident wishes to explain this result to the patient and then to take the appropriate next steps.
A useful flowchart for working up patients with suspected PE is provided elsewhere [see 1:XVIII Venous Thromboembolism]; however, this chart provides no guidance on how to estimate the clinical probability of PE. It is instructive to examine how the results of a quantitative, evidence-based approach to this patient's case relate to the recommendations outlined in the flowchart.
The initial step is to estimate the pretest probability of PE by one of two approaches. The first is to use the anchoring and adjustment heuristic. The anchor, or starting point, is the prevalence of PE in adults who present to the ED with pleuritic chest pain. One very careful study found that 21% of such patients (36/173) had a positive pulmonary angiogram.18 The physician should use this 21% initial probability as the starting point (the anchor) for the patient under discussion and adjust it on the basis of the history and the physical examination. As noted, this patient has no predisposing factors for PE and no evidence of DVT, and her PaO2 is greater than 90 mm Hg. Using this approach, the ED physician concludes that the probability of PE before helical CT is quite low, perhaps 10%.19
The second approach is to use a clinical prediction rule.20 This model places patients into three categories on the basis of clinical findings (typical for PE, atypical for PE, severe PE), the likelihood of alternative diagnoses, and the presence of risk factors for DVT. The prevalence rates of PE in the three categories are 3.4%, 27.8%, and 78.4%, respectively. The algorithm for placing patients into one of the three categories is somewhat complex but is easy to use when represented on the screen of a palmtop computer. Assuming that the ED physician did not identify an alternative diagnosis that seemed more likely than PE, the patient's pretest probability of PE was 28%, considerably higher than the ED physician's subjective probability.
With an estimate for the pretest probability of PE, the next step is to obtain the likelihood ratio for a negative helical CT scan. The sensitivity and specificity of the helical CT scan have varied considerably among studies. A recent meta-analysis of studies of diagnostic tests for pulmonary embolism found the likelihood ratio for a positive chest CT scan to be 24.1 (95% CI, 12.4 to 46.7). The likelihood ratio for a negative scan was 0.04 (95% CI, 0.03 to 0.06).21
To calculate the posttest odds of PE, the ED physician must combine the patient's pretest odds with the test's likelihood ratio by means of the odds ratio format of Bayes' theorem mentioned earlier (posttest odds = pretest odds × likelihood ratio). An alternative to converting the pretest probability to odds and doing the calculation of posttest odds is to use a nomogram [see Figure 1]. To estimate posttest probability, anchor a straightedge at a pretest probability of 28% (corresponding to the clinical prediction rule's estimate of pretest probability) in the left-hand column; then pass the straightedge through a likelihood ratio for a negative helical CT scan, 0.04, in the middle column. Read the posttest probability from the right-hand column: about 1.5%. The math for this estimate is as follows, with the 0.28 pretest probability of PE first needing to be converted to pretest odds:
Pretest odds = 0.28 / (1 − 0.28) = 0.28 / 0.72 = 0.39
Figure 1. Nomogram for Conversion of Pretest Probabilities to Posttest Probabilities
Nomogram for converting pretest probabilities to posttest probabilities when test results are presented as likelihood ratios.
Now, the post-helical CT scan odds of PE for this patient must be determined by multiplying the pretest odds of PE, 0.39, by the likelihood ratio for a negative helical CT, 0.04:
Posttest odds = 0.39 × 0.04 = 0.0156
Convert the posttest odds to the posttest probability, as follows:
Posttest probability = 0.0156 / (1 + 0.0156) = 0.0154, or about 1.5%
At a posttest probability of PE of 1.5%, only 15 patients per 1,000 would have PE. Anticoagulating patients with a 1.54% probability of PE would mean exposing 65 patients (i.e., 1/0.0154) to the harms of anticoagulation to benefit one patient with a PE. Most physicians would follow this patient closely without giving specific treatment for PE. This same logic can be applied to all screening and diagnostic tests for PE, including D-dimer testing (high sensitivity and low specificity), which is therefore more useful for ruling out PE (when it is negative) than ruling it in (when it is positive).22 D-dimer tests can also be used for calibrating clinical observations to enhance the quantitation of pretest probabilities.23
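The arithmetic for this patient can be reproduced with a short Python sketch. The inputs (a pretest probability of 0.28 and a likelihood ratio of 0.04 for a negative helical CT scan) are the values given in the text; the function names are hypothetical. Because the sketch does not round the pretest odds to 0.39, it yields a posttest probability of about 1.53% rather than 1.54%, a difference that is clinically immaterial.

```python
def pe_probability_after_negative_ct(pretest_probability: float = 0.28,
                                     lr_negative: float = 0.04) -> float:
    """Posttest probability of PE after a negative helical CT scan."""
    pretest_odds = pretest_probability / (1.0 - pretest_probability)  # about 0.39
    posttest_odds = pretest_odds * lr_negative                        # about 0.0156
    return posttest_odds / (1.0 + posttest_odds)


def patients_treated_per_true_case(probability_of_disease: float) -> float:
    """Number of patients who would be treated for each one who truly has the disease."""
    return 1.0 / probability_of_disease


if __name__ == "__main__":
    p = pe_probability_after_negative_ct()
    print(f"posttest probability of PE: {p:.4f}")
    print(f"patients anticoagulated per true PE: {patients_treated_per_true_case(p):.0f}")
```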
MEASURES OF TREATMENT EFFECTS
One of the most important tasks of clinicians is to advise patients about the current best treatment for their condition. Such advice should be based on the best evidence available. Clinically useful measures of treatment effects reported in clinical trials include the experimental event rate (EER), the control event rate (CER), relative risk reduction (RRR), absolute risk reduction (ARR), the number needed to treat (NNT), and the number needed to harm [see Table 4]. These measures can be effective tools for quantifying the magnitude of treatment benefits and risks, provided that there is a statistically significant difference in the clinical event rate between experimental subjects and control subjects (i.e., between the EER and the CER).
Table 4 Definitions of Clinically Useful Measures of Treatment Effects from Clinical Trials
Again, we illustrate the practical application of these terms by a specific example. A 69-year-old hypertensive male smoker has experienced a partial left hemispheric stroke, with good recovery of function. He has a 75% ipsilateral internal carotid artery stenosis. One option would be to give this patient aspirin or clopidogrel and manage his risk factors for cerebrovascular disease; another would be to offer him carotid endarterectomy in addition to medical treatment. The question is, how and on what evidentiary basis does the clinician choose one treatment over another? It is tempting to think of treatments in black-and-white terms, as either working or not working, but the reality is rarely so absolute; often, the choice is between two or more treatments, each of which works after a fashion in certain situations. To apply the available evidence to the decision-making process in the most effective manner, the clinician must interpret it quantitatively, offering accurate, relevant figures instead of gut feelings when the patient asks what his chances are with each therapeutic approach.
Three randomized, controlled trials of carotid endarterectomy for symptomatic carotid artery stenosis24,25,26 can inform our choice of treatment in this hypothetical patient. Examination of the North American Symptomatic Carotid Endarterectomy Trial (NASCET)24 in the light of the users' guides discussed earlier [see Table 1] reveals that this study meets the three criteria for a study focusing on therapy. First, patients with symptomatic hemispheric transient ischemic attacks or partial strokes and ipsilateral carotid stenoses of 70% to 99% were randomly assigned to either an experimental group that underwent carotid endarterectomy or a control group that did not. All patients received continuing medical care, with special attention given to risk factors for cerebrovascular disease. Second, the study assessed the effect of carotid endarterectomy on important clinical events—namely, recurrence of stroke or perioperative stroke or death. Third, none of the patients were lost to follow-up. Consequently, the data from the study are likely to be valid guides in determining which treatment is best for this patient.
In the NASCET report, the risk of major or fatal ipsilateral stroke within a 2-year follow-up period was 2.5% in the group that underwent carotid endarterectomy and 13.1% in the control group. The absolute risk reduction, therefore, was 13.1% − 2.5%, or 10.6% (P < 0.001; CI, 5.5% to 15.7%), and the relative risk reduction was 10.6%/13.1%, or 81%. The number needed to treat was 10 (1/0.106, rounded up); that is, 10 patients (CI, 7 to 18) would have to be treated with carotid endarterectomy (rather than medical treatment alone) to prevent, on average, one major or fatal ipsilateral stroke. The NASCET report indicates that this benefit is somewhat lower for patients with less severe stenosis (70% to 79%) and somewhat higher for patients with multiple risk factors for cerebrovascular disease, circumstances that offset one another in the case of the patient under consideration here.
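The measures of treatment effect quoted from NASCET follow directly from the two event rates. The Python sketch below recomputes them from the control event rate (13.1%) and the experimental event rate (2.5%) given in the text; the names are hypothetical, and rounding the NNT up to the next whole patient is a convention assumed here because it matches the figure of 10 in the text.

```python
import math
from typing import NamedTuple


class TreatmentEffect(NamedTuple):
    arr: float  # absolute risk reduction
    rrr: float  # relative risk reduction
    nnt: int    # number needed to treat, rounded up


def treatment_effect(control_event_rate: float, experimental_event_rate: float) -> TreatmentEffect:
    """Compute ARR, RRR, and NNT when treatment lowers the event rate."""
    arr = control_event_rate - experimental_event_rate
    if arr <= 0.0:
        raise ValueError("experimental event rate must be lower than control event rate")
    rrr = arr / control_event_rate
    nnt = math.ceil(1.0 / arr)
    return TreatmentEffect(arr, rrr, nnt)


if __name__ == "__main__":
    effect = treatment_effect(control_event_rate=0.131, experimental_event_rate=0.025)
    print(f"ARR = {effect.arr:.3f}, RRR = {effect.rrr:.2f}, NNT = {effect.nnt}")
```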
With the NNT determined, the next question is whether an NNT of 10 for major or fatal stroke over a 2-year period represents a small benefit or a large one. By comparison, treatment of elevated diastolic blood pressures that do not exceed 115 mm Hg is associated with an NNT of 167 to prevent one stroke over a 5-year period.25 Thus, for patients who have symptomatic, severe carotid artery stenosis, carotid endarterectomy is highly beneficial.
Given this conclusion, the next question is, do these research results apply to a specific patient, hospital, and surgeon? For example, the NASCET data reflect operative procedures performed by highly competent surgeons in specialized centers. One would have to know the perioperative complication rates for local surgeons to be able to assess a patient's level of risk if referred to any of those surgeons. If the local surgeons' perioperative complication rates for carotid endarterectomy are lower than 7%, the results are comparable to the NASCET results. On the other hand, patients with a stenosis of less than 70% are at substantially less risk for subsequent stroke to begin with. Potential benefit is similar to potential harm for patients with stenoses of 50% to 70%; for patients with stenoses of less than 50%, current evidence indicates that carotid endarterectomy would not yield any net reduction of this risk, even when the procedure is done by a highly skilled surgeon.26,27
MEASURES OF TREATMENT OUTCOME, ADJUSTED FOR QUALITY OF LIFE
Measures of treatment outcome, such as reduction in mortality, are important in deciding whether to start a medication or perform an operation, but they do not answer a question that is important to many patients: How much longer can they expect to live if treatment is started? One way of responding is to frame the answer in terms of life expectancy, the average length of life after starting treatment, which has a simple relation to the annual mortality in patients undergoing treatment.13
Although life expectancy is a useful measure of treatment outcome, it has one shortcoming: it places the same value on years in perfect health as on years in poor health. Arguably, a year with partially treated chronic disease is not equivalent to a year in perfect health. A solution to this problem is to adjust life expectancy for the quality of life that the patient experiences during a year of poor health by multiplying life expectancy by a number, expressed on a scale of 0 to 1, that reflects how the patient feels about the quality of life experienced during an illness. This number is usually called a utility. When life expectancy, expressed in years, is multiplied by a utility, the result is a quality-adjusted life year (QALY). One QALY is equivalent to a year in perfect health.
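The quality adjustment described above is a single multiplication, sketched here in Python. The chapter does not state the relation between life expectancy and annual mortality; the sketch assumes one commonly used approximation, in which a constant annual mortality rate gives a life expectancy equal to its reciprocal. The function names and the example values (annual mortality of 0.10, utility of 0.8) are hypothetical.

```python
def life_expectancy_from_mortality(annual_mortality_rate: float) -> float:
    """Approximate life expectancy in years, ASSUMING a constant annual mortality rate.

    This reciprocal relation is an assumption of this sketch, not a formula
    given in the chapter.
    """
    if annual_mortality_rate <= 0.0:
        raise ValueError("annual mortality rate must be positive")
    return 1.0 / annual_mortality_rate


def quality_adjusted_life_years(life_expectancy_years: float, utility: float) -> float:
    """Multiply life expectancy by a utility on a scale of 0 to 1."""
    if not 0.0 <= utility <= 1.0:
        raise ValueError("utility must be between 0 and 1")
    return life_expectancy_years * utility


if __name__ == "__main__":
    years = life_expectancy_from_mortality(0.10)      # hypothetical annual mortality
    qalys = quality_adjusted_life_years(years, 0.8)   # hypothetical utility
    print(f"life expectancy: {years:.1f} years, quality adjusted: {qalys:.1f} QALYs")
```

With a utility of 1 (perfect health), the quality-adjusted figure equals the unadjusted life expectancy.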
Medical Decision Analysis
Clearly, there is more to clinical decision making than simply collecting numbers that measure treatment effects. Reports of treatment effects in randomized, controlled trials are important starting points that help determine whether a treatment has merit in its own right, but the actual decision whether to offer a given patient a particular treatment is complex and must take into account each patient's specific clinical circumstances and individual wishes. For example, if the patient has significant comorbidity that would result in an especially high risk of perioperative complications, surgical therapy might not be the best choice. Even if the patient is well enough to undergo operation, individual preferences and values must be taken into account: the patient might be strongly averse to the immediate risks posed by surgery or might lack the resources to pay for the procedure.
THE THRESHOLD MODEL OF DECISION MAKING
At the conclusion of every history and physical examination, the clinician must choose one of three options: to treat, to observe, or to obtain more information. The optimal approach to making this choice starts with the assumption that the physician will seek more information (i.e., order diagnostic tests) only if the results may alter the treatment decision. Although occasional exceptions are easily justified, this rule is a good guiding principle for a lean style of practice. It is also the central assumption behind the threshold model of decision making.
When a diagnosis is uncertain, the decision whether to start treatment depends on the probability of the diagnosis. If the probability is 0, no one would start treatment; if the probability is 1, everyone would start treatment. Therefore, there must be a probability between 0 and 1 at which a physician would have no preference between treating and not treating. This probability is called the treatment threshold probability.
The treatment threshold probability is a key to solving the important decision-making problem of whether to treat, to observe, or to obtain more information. The most elegant way of obtaining the treatment threshold probability is to construct a decision tree that represents the choice between starting treatment and withholding treatment [see Figure 2]. In a decision tree, decisions are represented by squares (decision nodes), and the chance events that follow a decision are represented by circles (chance nodes). The probabilities of the events after a chance node must total 1.0. A terminal node (represented by a rectangle enclosing the name of the state) represents a state in which there are no subsequent chance events. Each terminal node has a value, which is a measure of the outcome associated with the event.
Figure 2. Decision Tree for Treatment Threshold Probability
Shown is a decision tree for calculating the treatment threshold probability in a patient who is a possible candidate for carotid endarterectomy. (D—disease; U—utility)
In a decision tree for starting or withholding treatment, each branch of the two chance nodes ends in a terminal node whose value is the utility (U) for being in the state specified. For example, U[D+, Rx+] is the utility for having the disease (D) and being treated for it, which one could calculate by representing that state as a tree with chance nodes and terminal nodes. To obtain the treatment threshold probability, one sets the expected utility of treatment equal to the expected utility of no treatment and then solves for the probability of disease. The general solution to the equation is as follows:
\[ \text{treatment threshold probability} = \frac{\text{harm}}{\text{harm} + \text{benefit}} \]
where harm is the net utility lost by being treated when disease is absent (U[D-, Rx-] - U[D-, Rx+]) and benefit is the net utility gained by being treated when disease is present (U[D+, Rx+] - U[D+, Rx-]). This relationship between harms and benefits of treatment is fundamental to solving the common decision problem of deciding about treatment when the diagnosis is not known with certainty. Because the treatment threshold depends on the benefits and harms of the treatment, it will vary from treatment to treatment. When the benefits of a treatment exceed harms, which is usually the case, the treatment threshold probability must be less than 0.50.
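As a rough sketch of this calculation, the Python below derives the treatment threshold from the four terminal-node utilities and checks that the expected utilities of treating and not treating are equal at that probability. Harm is entered as the utility lost by treating a patient without disease, U[D-, Rx-] - U[D-, Rx+], so that it is a positive number; that sign convention, the function names, and the four utility values are assumptions made for illustration and are not taken from Figure 2.

```python
def treatment_threshold(u_dpos_rx: float, u_dpos_norx: float,
                        u_dneg_rx: float, u_dneg_norx: float) -> float:
    """Probability of disease at which treating and not treating are equally good."""
    benefit = u_dpos_rx - u_dpos_norx   # utility gained by treating diseased patients
    harm = u_dneg_norx - u_dneg_rx      # utility lost by treating non-diseased patients
    if benefit <= 0 or harm < 0:
        raise ValueError("sketch assumes benefit > 0 and harm >= 0")
    return harm / (harm + benefit)


def expected_utility(p_disease: float, u_if_disease: float, u_if_no_disease: float) -> float:
    """Average out a two-branch chance node (disease present vs. absent)."""
    return p_disease * u_if_disease + (1.0 - p_disease) * u_if_no_disease


# Hypothetical utilities, for illustration only.
U_DPOS_RX, U_DPOS_NORX = 0.90, 0.60
U_DNEG_RX, U_DNEG_NORX = 0.95, 1.00

p_star = treatment_threshold(U_DPOS_RX, U_DPOS_NORX, U_DNEG_RX, U_DNEG_NORX)
eu_treat = expected_utility(p_star, U_DPOS_RX, U_DNEG_RX)
eu_no_treat = expected_utility(p_star, U_DPOS_NORX, U_DNEG_NORX)
print(f"threshold = {p_star:.3f}; EU(treat) = {eu_treat:.4f}; EU(no treat) = {eu_no_treat:.4f}")
```

Because the illustrative benefit (0.30) is larger than the harm (0.05), the threshold falls well below 0.50, as the text predicts.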
To make the choice between treating, not treating, and ordering tests to obtain additional information, the physician needs to know the range of probabilities of disease within which testing is the preferred action. The probability scale can be divided into three ranges [see Figure 3], one of which is the test range. The first step in defining the test range is to establish the treatment threshold probability. For the next step, we must invoke the principle that the physician should seek more information only if the results might alter the treatment decision. Translated to the threshold model, this principle takes the following form: testing is indicated only if the result of the test might move the probability of disease from one side of the treatment threshold (the do-not-treat side) to the other (the treat side). A physician can use this principle to decide whether to obtain a test in an individual patient. If the patient's pretest probability is below the treatment threshold and therefore in the do-not-treat zone, the physician should order the test only if the posttest probability of disease after a positive test result would be higher than the treatment threshold probability.
Figure 3. Ranges of Probability Corresponding to Different Actions
Probability scale showing the ranges of probability corresponding to different actions following the initial history and physical examination.
To obtain the test range, we must extend this example to a more general solution, which is to use the test's likelihood ratio and Bayes' theorem to calculate the pretest probability at which the posttest probability after a positive result is exactly equal to the treatment threshold probability [see Figure 3]. This probability is called the no treat-test threshold probability. Clearly, if the pretest probability is lower than the no treat-test threshold probability, the test should not be done, because even after a positive result the posttest probability will be lower than the treatment threshold probability (i.e., a positive result will not change the management decision); conversely, if the pretest probability is higher than the no treat-test threshold probability, the test should be done, because the posttest probability after a positive result will be higher than the treatment threshold probability (i.e., a positive test result would change the management decision from do not treat to treat).
The size of the test range depends on the likelihood ratios reported for the test. If LR- is close to zero and LR+ is much greater than 1.0, the test range will be very wide. In general, the better the test, the larger the test range. If the posttest probability falls within the treat zone, the physician must then decide which treatment to offer. The choice among treatments offers a good opportunity to explore the principles of decision making under conditions of uncertainty.
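The odds form of Bayes' theorem (posttest odds = pretest odds × likelihood ratio) makes the boundaries of the test range easy to compute. The sketch below works backward from the treatment threshold to the pretest probability at which a positive result just reaches it (the no treat-test threshold). It also computes, by the same logic with LR-, an upper boundary above which even a negative result would leave the patient in the treat zone; that upper boundary is an extension not spelled out in the text. The threshold of 0.25 and the likelihood ratios of 9 and 0.1 are hypothetical values.

```python
def to_odds(probability: float) -> float:
    return probability / (1.0 - probability)


def to_probability(odds: float) -> float:
    return odds / (1.0 + odds)


def posttest_probability(pretest: float, likelihood_ratio: float) -> float:
    """Bayes' theorem in odds form."""
    return to_probability(to_odds(pretest) * likelihood_ratio)


def pretest_for_posttest(target_posttest: float, likelihood_ratio: float) -> float:
    """Pretest probability that yields exactly target_posttest after the given result."""
    return to_probability(to_odds(target_posttest) / likelihood_ratio)


# Hypothetical inputs, for illustration only.
TREATMENT_THRESHOLD = 0.25
LR_POSITIVE = 9.0
LR_NEGATIVE = 0.1

# Lower edge of the test range: a positive result just reaches the treatment threshold.
no_treat_test_threshold = pretest_for_posttest(TREATMENT_THRESHOLD, LR_POSITIVE)
# Upper edge (assumed extension): a negative result just falls to the treatment threshold.
upper_test_boundary = pretest_for_posttest(TREATMENT_THRESHOLD, LR_NEGATIVE)

print(f"test range: {no_treat_test_threshold:.3f} to {upper_test_boundary:.3f}")
```

Moving LR+ further above 1.0 or LR- closer to zero pushes the two boundaries apart, which is the sense in which a better test has a larger test range.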
MEASURES OF EXPECTED-OUTCOME DECISION MAKING: THE TREATMENT DECISION
The purpose of decision analysis is to help with those decisions for which the outcome cannot be foretold (e.g., the decision whether to treat carotid artery stenosis surgically). Even when randomized trial results indicate that one treatment generally gives better results than another, some degree of uncertainty remains: individual patients may still exhibit idiosyncratic outcomes or may experience unusual but serious side effects of treatment. Faced with this uncertainty, most physicians choose the treatment that gives the best results averaged over a large number of patients. In so doing, they become, perhaps unwittingly, what are known as “expected-value decision makers.” Expected value is the value of an intervention when the outcomes of that intervention are averaged over many patients. A more general term might be “expected-outcome decision maker,” which would denote a physician who chooses the treatment that gives the best outcome when averaged over many patients. This concept is the basis of expected-outcome decision analysis, which is a method of framing a decision problem in terms of the expected outcome of each decision alternative. Thus, in a patient with stable angina, the physician would decide between medical management, coronary angioplasty, and coronary artery bypass surgery by first calculating the patient's life expectancy, expressed in years in good health, after undergoing each of these treatment options; then, the physician would choose the treatment with the highest life expectancy.
We can illustrate the application of expected-outcome decision making by returning to the example of the 69-year-old man who has recovered from a hemispheric stroke and has a 75% carotid stenosis. The question to be answered is the same: Should the patient be offered carotid endarterectomy in addition to best medical treatment? The first step is to represent the problem by a decision tree [see Figure 4]. Each of the terminal nodes in this decision tree is associated with a life expectancy, as well as a utility representing the value of life in the outcome state represented by the terminal node. As noted earlier [see How to Apply Research Results to Patient Care, Measures of Treatment Outcome Adjusted for Quality of Life, above], life expectancy by itself is not a sufficiently precise measure: clearly, 2 years of life after a major stroke is not equivalent to 2 years in perfect health. The decision maker needs a quantitative measure of the patient's feelings about being in an outcome state. The physician can obtain the patient's utility for that state by asking the patient to indicate the length of time in perfect health that he would consider equivalent to his life expectancy in a disabled state (e.g., after a major stroke). This technique is called time trade-off. Other techniques used to obtain this utility include linear scaling and the standard reference gamble.13
Figure 4. Application of Expected-Outcome Decision Analysis
Shown is a decision tree depicting the application of expected-outcome decision analysis to the same patient referred to in Figure 2. (LE—life expectancy; U—utility)
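The time trade-off technique reduces to a simple ratio: the time in perfect health that the patient judges equivalent, divided by the time in the impaired state. The Python sketch below shows that ratio under stated assumptions; the function name and the example answer (12 healthy months judged equivalent to 24 months with disability) are hypothetical.

```python
def time_trade_off_utility(equivalent_healthy_time: float, time_in_state: float) -> float:
    """Utility of a health state from a time trade-off answer (same time units for both)."""
    if time_in_state <= 0:
        raise ValueError("time in the health state must be positive")
    if not 0 <= equivalent_healthy_time <= time_in_state:
        raise ValueError("equivalent healthy time must lie between 0 and the time in state")
    return equivalent_healthy_time / time_in_state


# Hypothetical answer: 12 months in perfect health judged equivalent to 24 months disabled.
utility_disabled = time_trade_off_utility(12.0, 24.0)
print(f"utility = {utility_disabled:.2f}")
```

A patient who would give up no time at all to avoid the state has a utility of 1.0 for it; the more time the patient would trade away, the lower the utility.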
To calculate the expected value of surgical management, the decision maker starts at the chance nodes that are farthest from the decision node (the tips of the branches of the decision tree), multiplies the probability of each event at each chance node by the value of the event, and sums these products over all the events at the chance node. This calculation is known as averaging out at a chance node [see Figure 5]. The value obtained for each chance node by means of this process becomes the outcome measure for the next step, which is to repeat the averaging-out process at the next chance node to the left.
Figure 5. Process of Averaging Out at a Chance Node
Illustrated is the process of averaging out at a chance node, as applied to the upper (carotid endarterectomy) portion of the decision tree depicted in Figure 2. (EV—expected value; LE—life expectancy; U—utility)
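Averaging out can be expressed as a short recursive function that works from the tips of the tree back toward the decision node. In the sketch below, a terminal node is a number (its value, for example life expectancy multiplied by utility) and a chance node is a list of (probability, child) pairs. This representation and all of the probabilities and values are assumptions for illustration; they are not the figures from the decision tree in the text.

```python
from typing import List, Tuple, Union

# A node is either a terminal value or a chance node: a list of (probability, child) pairs.
Node = Union[float, List[Tuple[float, "Node"]]]


def average_out(node: Node) -> float:
    """Expected value of a node, averaging out chance nodes from the tips inward."""
    if isinstance(node, (int, float)):
        return float(node)  # terminal node: its value is already known
    total_probability = sum(probability for probability, _ in node)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("probabilities at a chance node must total 1.0")
    return sum(probability * average_out(child) for probability, child in node)


# Hypothetical branch: early death (value 0) versus survival, followed by a second
# chance node for major stroke (2 years x utility 0.7) versus no stroke (2 years x 1.0).
later_events: Node = [(0.05, 2.0 * 0.7), (0.95, 2.0 * 1.0)]
treatment_branch: Node = [(0.01, 0.0), (0.99, later_events)]

expected_value = average_out(treatment_branch)
print(f"expected value = {expected_value:.4f} QALY")
```

Running the same function on each branch leaving the decision node gives the expected values that the decision maker compares.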
With either therapeutic option (aspirin combined with carotid endarterectomy or continued management with aspirin alone), there is a chance of death within 30 days, stroke within 30 days, or stroke within 2 years [see Figure 4]. As noted [see How to Apply Research Results to Patient Care, Measures of Treatment Effects, above], reliable data on the probabilities of these adverse events are available in the NASCET report.24 To simplify the presentation of the decision analysis, we measure survival only within the 2-year time frame addressed in the NASCET report, and we assume that all late strokes occur at the start of this 2-year period. Further, we assume that a patient would value 2 years of disability resulting from a stroke as equivalent to 16.8 months of healthy life (24 months × 0.70), which means that the utility representing the state of having experienced a major stroke is 0.70.
The decision analysis indicates that the decision maker should prefer surgical treatment to medical treatment. The expected value of carotid endarterectomy for this patient is 1.96 QALY, whereas the expected value of medical treatment is 1.91 QALY. Admittedly, this difference is not very large, indicating a close call, and it is reasonable to ask how high the operative mortality would have to be to make medical treatment the favored approach. Sensitivity analysis, one of the most powerful features of decision analysis, shows that the operative mortality would have to increase considerably before medical treatment would become preferable. The baseline figure for operative mortality in the NASCET report was 0.6%. The sensitivity analysis indicates that medical treatment would have a higher expected value than surgical treatment only if the operative mortality were 3.2% or higher, which might be the case if considerable comorbidity were present or if the surgeon seldom performed carotid endarterectomy. Although most physicians would not have the time or expertise to carry out this decision analysis, storing the appropriate decision tree in a palmtop computer would make it possible to do the decision analysis easily in the office setting, using values specific to the clinical setting and the patient.
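A one-way sensitivity analysis of this kind varies a single input and looks for the value at which the preferred option changes. The Python sketch below does this for operative mortality in a deliberately simplified model: surgery yields a fixed expected value for patients who survive the operation and zero for those who do not. The model, the function names, and the value of 1.97 QALY for survivors are assumptions for illustration. It is not the decision tree from the text, so its crossover point should not be expected to reproduce the figure reported there.

```python
from typing import Callable

EV_IF_SURVIVES_SURGERY = 1.97  # hypothetical QALYs for a patient who survives the operation
EV_MEDICAL = 1.91              # QALYs with medical treatment (value quoted in the text)


def expected_value_surgery(operative_mortality: float) -> float:
    """Simplified model: operative death is valued at 0 QALYs."""
    return (1.0 - operative_mortality) * EV_IF_SURVIVES_SURGERY


def find_crossover(difference: Callable[[float], float], low: float, high: float,
                   tolerance: float = 1e-9) -> float:
    """Bisection search for the input at which difference() changes sign."""
    diff_low = difference(low)
    if diff_low * difference(high) > 0:
        raise ValueError("difference does not change sign on the interval")
    while high - low > tolerance:
        middle = (low + high) / 2.0
        diff_middle = difference(middle)
        if diff_low * diff_middle <= 0:
            high = middle                      # crossover lies in the lower half
        else:
            low, diff_low = middle, diff_middle  # crossover lies in the upper half
    return (low + high) / 2.0


threshold_mortality = find_crossover(
    lambda mortality: expected_value_surgery(mortality) - EV_MEDICAL, 0.0, 0.2)
print(f"surgery is preferred while operative mortality is below {threshold_mortality:.1%}")
```

The same search can be repeated for any other input (a stroke probability, a utility) to see which assumptions the conclusion is most sensitive to.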
Cost-effectiveness analysis is a method for comparing the impact of expenditures on different health outcomes. Cost-effectiveness analysis assesses the trade-off between added benefit and added cost by examining costs and benefits at the margin (i.e., comparing one intervention with another or with no intervention). The cost-effectiveness of one intervention (A) versus another (B) is calculated as follows:
\[ \text{cost-effectiveness of A versus B} = \frac{\text{cost}_A - \text{cost}_B}{\text{effectiveness}_A - \text{effectiveness}_B} \]
In the carotid endarterectomy example, the costs would include all costs associated with a subsequent stroke. If we assume that the average lifetime cost associated with carotid endarterectomy is $10,000 and the average lifetime cost associated with medical treatment is $8,000, then the cost-effectiveness of surgical treatment, as compared with medical treatment, would be calculated as follows:
\[ \frac{\$10{,}000 - \$8{,}000}{1.96\ \text{QALY} - 1.91\ \text{QALY}} = \frac{\$2{,}000}{0.05\ \text{QALY}} = \$40{,}000\ \text{per QALY} \]
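This marginal calculation can be sketched in Python using the costs and expected values quoted in the text ($10,000 and 1.96 QALY for surgery, $8,000 and 1.91 QALY for medical treatment). The function name is hypothetical, and the sketch simply divides the added cost by the added effectiveness.

```python
def incremental_cost_effectiveness(cost_a: float, cost_b: float,
                                   effect_a: float, effect_b: float) -> float:
    """Added cost per added unit of effectiveness for intervention A versus B."""
    added_effect = effect_a - effect_b
    if added_effect == 0:
        raise ValueError("interventions are equally effective; the ratio is undefined")
    return (cost_a - cost_b) / added_effect


# Values from the carotid endarterectomy example in the text.
cost_per_qaly = incremental_cost_effectiveness(
    cost_a=10_000, cost_b=8_000,  # lifetime costs: surgery vs. medical treatment
    effect_a=1.96, effect_b=1.91  # expected QALYs: surgery vs. medical treatment
)
print(f"about ${cost_per_qaly:,.0f} per added QALY")
```

Because the denominator is a small difference (0.05 QALY), the ratio is sensitive to modest changes in either expected value, which is worth keeping in mind when comparing interventions.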
One may then ask, is a treatment choice that costs $40,000 for each extra QALY cost-effective? There is no absolute answer to this question. In practice, a physician compares the cost-effectiveness of carotid endarterectomy with that of other interventions. How this information should affect the decision whether to offer surgical treatment to any given patient is an even more difficult question. Indeed, most experts would say that cost-effectiveness is a technique for deciding policies that would apply to many patients. An organization with limited resources would choose policies that prescribe interventions with the lowest cost per added QALY. The organization would not offer interventions that have a high cost relative to the magnitude of the anticipated benefit.
Quantitative approaches to clinical reasoning are still evolving. By combining better evidence from health care research with today's burgeoning information technology, physicians can apply evidence effectively to individual patient care. As requirements for efficiency and accountability continue to increase, physicians are under more and more pressure to adopt a quantitative, evidence-based approach to patient care. Physicians who can back up their decisions with sound research and sound reasoning will be in a better position to provide their patients with optimal care.
Editors: Dale, David C.; Federman, Daniel D.