Basic & Clinical Biostatistics, 4th Edition

13. Reading the Medical Literature

 

PURPOSE OF THE CHAPTER

This final chapter has several purposes. Most importantly, it ties together concepts and skills presented in previous chapters and applies these concepts very specifically to reading medical journal articles. Throughout the text, we have attempted to illustrate the strengths and weaknesses of some of the studies discussed, but this chapter focuses specifically on those attributes of a study that indicate whether we, as readers of the medical literature, can use the results with confidence. The chapter begins with a brief summary of major types of medical studies. Next, we examine the anatomy of a typical journal article in detail, and we discuss the contents of each component: abstract or summary, introduction, methods, results, discussion, and conclusions. In this examination, we also point out common shortcomings, sources of bias, and threats to the validity of studies.

Clinicians read the literature for many different reasons. Some articles are of interest because you want only to be aware of advances in a field. In these instances, you may decide to skim the article with little interest in how the study was designed and carried out. In such cases, it may be possible to depend on experts in the field who write review articles to provide a relatively superficial level of information. On other occasions, however, you want to know whether the conclusions of the study are valid, perhaps so that they can be used to determine patient care or to plan a research project. In these situations, you need to read and evaluate the article with a critical eye in order to detect poorly done studies that arrive at unwarranted conclusions.

To assist readers in their critical reviews, we present a checklist for evaluating the validity of a journal article. The checklist notes some of the characteristics of a well-designed and well-written article. The checklist is based on our experiences with medical students, house staff, journal clubs, and interactions with physician colleagues. It also reflects the opinions expressed in an article describing how journal editors and statisticians can interact to improve the quality of published medical research (Marks et al, 1988). A number of authors have found that only a minority of published studies meet the criteria for scientific adequacy. The checklist should assist you in using your time most effectively by allowing you to differentiate valid articles from poorly done studies so that you can concentrate on the more productive ones.

Two guidelines recently published increase our optimism that the quality of the published literature will continue to improve. The International Conference on Harmonization (ICH) E9 guideline “Statistical Principles for Clinical Trials” (1999) addresses issues of statistical methodology in the design, conduct, analysis, and evaluation of clinical trials. Application of the principles is intended to facilitate the general acceptance of analyses and conclusions drawn from clinical trials.

The International Committee of Medical Journal Editors published the Uniform Requirements for Manuscripts Submitted to Biomedical Journals in 1997. Under Statistics, the document states:

Describe statistical methods with enough detail to enable a knowledgeable reader with access to the original data to verify the reported results…. When data are summarized in the results section, specify the statistical methods used to analyze them.

The requirements also recommend using confidence intervals and avoiding sole reliance on P values.

REVIEW OF MAJOR STUDY DESIGNS

Chapter 2 introduced the major types of study designs used in medical research, broadly divided into experimental studies (including clinical trials); observational studies (cohort, case–control, cross-sectional/surveys, case–series); and meta-analyses. Each design has certain advantages over the others as well as some specific disadvantages; they are briefly summarized in the following paragraphs. (A more detailed discussion is presented in Chapter 2.)

Clinical trials provide the strongest evidence for causation because they are experiments and, as such, are subject to the fewest problems or biases. Trials with randomized controls are the study type of choice when the objective is to evaluate the effectiveness of a treatment or a procedure. Drawbacks to using clinical trials include their expense and the generally long time needed to complete them.

Cohort studies are the best observational study design for investigating the causes of a condition, the course of a disease, or risk factors. Causation cannot be proved with cohort studies, because they do not involve interventions. Because they are longitudinal studies, however, they incorporate the correct time sequence to provide strong evidence for possible causes and effects. In addition, in cohort studies that are prospective, as opposed to historical, investigators can control many sources of bias. Cohort studies have disadvantages, of course. If they take a long time to complete, they are frequently weakened by patient attrition. They are also expensive to carry out if the disease or outcome is rare (so that a large number of subjects needs to be followed) or requires a long time to develop.

Case–control studies are an efficient way to study rare diseases, examine conditions that take a long time to develop, or investigate a preliminary hypothesis. They are the quickest and generally the least expensive studies to design and carry out. Case–control studies also are the most vulnerable to possible biases, however, and they depend entirely on high-quality existing records. A major issue in case–control studies is the selection of an appropriate control group. Some statisticians have recommended the use of two control groups: one similar in some ways to the cases (such as having been hospitalized or treated during the same period) and another made up of healthy subjects.

Cross-sectional studies and surveys are best for determining the status of a disease or condition at a particular point in time; they are similar to case–control studies in being relatively quick and inexpensive to complete. Because cross-sectional studies provide only a snapshot in time, they may lead to misleading conclusions if interest focuses on a disease or other time-dependent process.

Case–series studies are the weakest kinds of observational studies and represent a description of typically unplanned observations; in fact, many would not call them studies at all. Their primary use is to provide insights for research questions to be addressed by subsequent, planned studies.

Studies that focus on outcomes can be experimental or observational. Clinical outcomes remain the major focus, but emphasis is increasingly placed on functional status and quality-of-life measures. It is important to use properly designed and evaluated methods to collect outcome data. Evidence-based medicine makes great use of outcome studies.

Meta-analysis may likewise focus on clinical trials or observational studies. Meta-analyses differ from the traditional review articles in that they attempt to evaluate the quality of the research and quantify the summary data. They are helpful when the available evidence is based on studies with small sample sizes or when studies come to conflicting conclusions. Meta-analyses do not, however, take the place of well-designed clinical trials.

THE ABSTRACT & INTRODUCTION SECTIONS OF A RESEARCH REPORT

Journal articles almost always include an abstract or summary of the article prior to the body of the article itself. Most of us are guilty of reading only the abstract on occasion, perhaps because we are in a great hurry or have only a cursory interest in the topic. This practice is unwise when it is important to know whether the conclusions stated in the article are justified and can be used to make decisions. This section discusses the abstract and introduction portions of a research report and outlines the information they should contain.

The Abstract

The major purposes of the abstract are (1) to tell readers enough about the article so they can decide whether to read it in its entirety and (2) to identify the focus of the study. The International Committee of Medical Journal Editors (1997) recommended that the abstract “state the purposes of the study or investigation, basic procedures (selection of study subjects or experimental animals; observational and analytic methods), main findings (specific data and their statistical significance, if possible) and the principal conclusions.” An increasing number of journals, especially those we consider to be of high quality, now use structured abstracts in which authors succinctly provide the above-mentioned information in separate, easily identified paragraphs (Haynes et al, 1990).

We suggest asking two questions to decide whether to read the article: (1) If the study has been properly designed and analyzed, would the results be important and worth knowing? (2) If the results are statistically significant, does the magnitude of the change or effect also have clinical significance; if the results are not statistically significant, was the sample size sufficiently large to detect a meaningful difference or effect? If the answers to these questions are yes, then it is worthwhile to continue to read the report. Structured abstracts are a boon to the busy reader and frequently contain enough information to answer these two questions.
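The second question can be made concrete with a rough power calculation. The Python sketch below uses a normal approximation for comparing two group means with equal group sizes and a common standard deviation; the function name approx_power and the numbers in the usage lines are illustrative assumptions, not values from any study cited in this chapter.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(delta, sigma, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z test.

    delta        smallest difference in means considered clinically meaningful
    sigma        assumed common standard deviation (hypothetical input)
    n_per_group  number of subjects in each group
    The small contribution of the opposite tail is ignored.
    """
    z = NormalDist()                      # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)     # about 1.96 when alpha = 0.05
    se = sigma * sqrt(2 / n_per_group)    # standard error of the difference
    return z.cdf(abs(delta) / se - z_crit)

# Hypothetical numbers: a 5-unit difference with a standard deviation of 10.
for n in (20, 64):
    print(n, round(approx_power(delta=5, sigma=10, n_per_group=n), 2))
```

With these made-up inputs, the study with 20 subjects per group has well under 50% power, so a nonsignificant result from it says little about whether a meaningful difference exists; about 64 subjects per group are needed to reach roughly 80% power.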

The Introduction or Abstract

At one time, the following topics were discussed (or should have been discussed) in the introduction section; however, with the advent of the structured abstract, many of these topics are now addressed directly in that section. The important issue is that the information be available and easy to identify.

Reason for the study

The introduction section of a research report is usually fairly short. Generally, the authors briefly mention previous research that indicates the need for the present study. In some situations, the study is a natural outgrowth or the next logical step of previous studies. In other circumstances, previous studies have been inadequate in one way or another. The overall purpose of this information is twofold: to provide the necessary background information to place the present study in its proper context and to provide reasons for doing the present study. In some journals, the main justification for the study is given in the discussion section of the article instead of in the introduction.

Purpose of the study

Regardless of the placement of background information on the study, the introduction section is where the investigators communicate the purpose of their study. The purpose of the study is frequently presented in the last paragraph or last sentences at the end of the introduction. The purpose should be stated clearly and succinctly, in a manner analogous to a 15-second summary of a patient case. For example, in the study described in Chapter 5, Dennison and colleagues (1997, p. 15) do this very well; they stated their objective as follows:

To evaluate, in a population-based sample of healthy children, fruit juice consumption and its effects on growth parameters during early childhood

This statement concisely communicates the population of interest (healthy children), the focus of the study or independent variable (fruit juice consumption), and the outcome (effects on growth). As readers, we should be able to determine whether the purpose for the study was conceived prior to data collection or if it evolved after the authors viewed their data; the latter situation is much more likely to capitalize on chance findings. The lack of a clearly stated research question is the most common reason medical manuscripts are rejected by journal editors (Marks et al, 1988).

Population included in the study

In addition to stating the purpose of the study, the structured abstract or introduction section may include information on the study's location, length of time, and subjects. Alternatively, this information may be contained in the methods section. This information helps readers decide whether the location of the study and the type of subjects included in the study are applicable in the readers' own practice environment.

The time covered by a study gives important clues regarding the validity of the results. If the study on a particular therapy covers too long a period, patients entering at the beginning of the study may differ in important ways from those entering at the end. For example, major changes may have occurred in the way the disease in question is diagnosed, and patients entering near the end of the study may have had their disease diagnosed at an earlier stage than did patients who entered the study early (see the section titled “Detection bias”). If the purpose of the study is to examine sequelae of a condition or procedure, the period covered by the study must be sufficiently long to detect consequences.

THE METHOD SECTION OF A RESEARCH REPORT

The method section contains information about how the study was done. Simply knowing the study design provides a great deal of information, and this information is often given in a structured abstract. In addition, the method section contains information regarding subjects who participated in the study or, in animal or inanimate studies, information on the animals or materials. The procedures used should be described in sufficient detail that the reader knows how measurements were made. If methods are novel or require interpretation, information should be given on the reliability of the assessments. The study outcomes should be specified along with the criteria used to assess them. The method section also should include information on the sample size for the study and on the statistical methods used to analyze the data; this information is often placed at the end of the method section. Each of these topics is discussed in detail in this section.

How well the study has been designed is of utmost importance. The most critical statistical errors, according to a statistical consultant to the New England Journal of Medicine, involve improper research design: “Whereas one can correct incorrect analytical techniques with a simple reanalysis of the data, an error in research design is almost always fatal to the study—one cannot correct for it subsequent to data collection” (Marks et al, 1988, p. 1004). Many statistical advances have occurred in recent years, especially in the methods used to design, conduct, and analyze clinical trials, and investigators should offer evidence that they have obtained expert advice.

Subjects in the Study

Methods for choosing subjects

Authors should provide several critical pieces of information about the subjects included in their study so that we readers can judge the applicability of the study results. Of foremost importance is how the patients were selected for the study and, if the study is a clinical trial, how treatment assignments were made.

Randomized selection or assignment greatly enhances the generalizability of the results and avoids biases that otherwise may occur in patient selection (see the section titled “Bias Related to Subject Selection”). Some authors believe it is sufficient merely to state that subjects were randomly selected or treatments were randomly assigned, but most statisticians recommend that the type of randomization process be specified as well. Authors who report the randomization methods provide some assurance that randomization actually occurred, because some investigators have a faulty view of what constitutes randomization. For example, an investigator may believe that assigning patients to the treatment and the control on alternate days makes the assignment random. As we emphasized in Chapter 4, however, randomization involves one of the precise methods that ensure that each subject (or treatment) has a known probability of being selected.
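The difference between alternation and a true random mechanism can be sketched in a few lines of Python. The function names, the block size, and the seed are assumptions made for illustration; permuted-block randomization is shown only as one commonly used method and is not claimed to be the procedure used in any study discussed here.

```python
import random

def alternate_assignment(n_patients):
    """Alternation: the next assignment is fully predictable, so it is NOT random."""
    return ["treatment" if i % 2 == 0 else "control" for i in range(n_patients)]

def permuted_block_assignment(n_patients, block_size=4, seed=None):
    """Permuted-block randomization.

    Within each block, half of the slots go to each arm and the order is
    shuffled by a random number generator, so the next assignment cannot
    be anticipated from the previous one.
    """
    if block_size % 2 != 0:
        raise ValueError("block_size must be even")
    rng = random.Random(seed)
    assignments = []
    while len(assignments) < n_patients:
        half = block_size // 2
        block = ["treatment"] * half + ["control"] * half
        rng.shuffle(block)
        assignments.extend(block)
    return assignments[:n_patients]

print(alternate_assignment(8))
print(permuted_block_assignment(8, block_size=4, seed=2024))
```

Recording the seed and the block size is one way for authors to document how the assignment list was generated.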

Eligibility criteria

The authors should present information to illustrate that major selection biases (discussed in the section titled “Bias Related to Subject Selection”) have been avoided, an aspect especially important in nonrandomized trials. The issue of which patients serve as controls was discussed in Chapter 2 in the context of case–control studies. In addition, the eligibility criteria for both inclusion and exclusion of subjects in the study must be specified in detail. We should be able to state, given any hypothetical subject, whether this person would be included in or excluded from the study. Sauter and coworkers (2002) gave the following information on patients included in their study:

Patients undergoing CHE in our Surgical Department were consecutively included into the study provided that they did not meet one of the following exclusion criteria: (a) inflammatory bowel disease, history of intestinal surgery, or diarrhea within the preceding 2 years, (b) body weight > 90 kg, (c) pregnancy, (d) abnormal liver function tests …, (e) diabetes mellitus, (f) history of radiation of the abdominal region, and (g) drug therapy with antibiotics, lipid lower agents, laxatives, and cholestyramine.

Patient follow-up

For similar reasons, sufficient information must be given regarding the procedures the investigators used to follow up patients, and they should state the numbers lost to follow-up. Some articles include this information under the results section instead of in the methods section.

The description of follow-up and dropouts should be sufficiently detailed to permit the reader to draw a diagram of the information. Occasionally, an article presents such a diagram, as was done by Hébert and colleagues in their study of elderly residents in Canada (1997), reproduced in Figure 13-1. Such a diagram makes very clear the number of patients who were eligible, those who were not eligible because of specific reasons, the dropouts, and so on.

Bias Related to Subject Selection

Bias should not occur in a well-conducted study; it is an error related to selecting subjects or procedures or to measuring a characteristic. Biases are sometimes called measurement errors or systematic errors to distinguish them from random error (random variation), which occurs any time a sample is selected from a population. This section discusses selection bias, a type of bias common in medical research.

Selection biases can occur in any study, but they are easier to control in clinical trials and cohort designs. It is important to be aware of selection biases, even though it is not always possible to predict exactly how their presence affects the conclusions. Sackett (1979) enumerated 35 different biases. We discuss some of the major ones that seem especially important to the clinician. If you are interested in a more detailed discussion, consult the article by Sackett and the text by Feinstein (1985), which devotes several chapters to the discussion of bias (especially Chapter 4 and Chapters 15–17).

Prevalence or incidence bias

Prevalence (Neyman) bias occurs when a condition is characterized by early fatalities (some subjects die before they are diagnosed) or silent cases (cases in which the evidence of exposure disappears when the disease begins). Prevalence bias can result whenever a time gap occurs between exposure and selection of study subjects and the worst cases have died. A cohort study begun prior to the onset of disease is able to detect occurrences properly, but a case–control study that begins at a later date consists only of the people who did not die. This bias can be prevented in cohort studies and avoided in case–control studies by limiting eligibility for the study to newly diagnosed or incident cases. The practice of limiting eligibility is common in population-based case–control studies in cancer epidemiology.

To illustrate prevalence or incidence bias, let us suppose that two groups of people are being studied: those with a risk factor for a given disease (eg, hypertension as a risk factor for stroke) and those without the risk factor. Suppose 1000 people with hypertension and 1000 people without hypertension have been followed for 10 years. At this point, we might have the situation shown in Table 13-1.

Table 13-1. Illustration of prevalence bias: Actual situation.

Number of Patients in 10-Year Cohort Study

Patients               Alive with Cerebrovascular Disease   Dead from Stroke   Alive with No Cerebrovascular Disease
With hypertension      50                                   250                700
Without hypertension   80                                   20                 900

Figure 13-1. Flow of the subjects through the study, a representative sample of elderly people living at home in Sherbrooke, Canada, 1991–1993. (Reproduced, with permission, from Figure 1 in Hébert R, Brayne C, Spiegelhalter D: Incidence of functional decline and improvement in a community-dwelling very elderly population. Am J Epidemiol 1997; 145: 935–944.)

A cohort study begun 10 years ago would conclude correctly that patients with hypertension are more likely to develop cerebrovascular disease than patients without hypertension (300 to 100) and far more likely to die from it (250 to 20).

Suppose, however, a case–control study is undertaken at the end of the 10-year period without limiting eligibility to newly diagnosed cases of cerebrovascular disease. Then the situation illustrated in Table 13-2 occurs.

The odds ratio is calculated as (50 × 900)/(80 × 700) = 0.80, making it appear that hypertension is actually a protective factor for the disease! The bias introduced in an improperly designed case–control study of a disease that kills off one group faster than the other can lead to a conclusion exactly the opposite of the correct conclusion that would be obtained from a well-designed case–control study or a cohort study.

Table 13-2. Illustration of prevalence bias: Result with case–control design.

Number of Patients in Case–Control Study at End of 10 Years

Patients               With Cerebrovascular Disease   Without Cerebrovascular Disease
With hypertension      50                             700
Without hypertension   80                             900
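The arithmetic behind Tables 13-1 and 13-2 can be checked with a few lines of Python. This is only a sketch; the variable and function names are illustrative, and the counts are those given in the two tables.

```python
# Counts from Table 13-1 (10-year cohort, 1000 people per group)
cohort = {
    "with_htn":    {"alive_cvd": 50, "dead_stroke": 250, "alive_no_cvd": 700},
    "without_htn": {"alive_cvd": 80, "dead_stroke": 20,  "alive_no_cvd": 900},
}

def cohort_risk(row):
    """Risk of cerebrovascular disease (alive with disease or dead from stroke)."""
    diseased = row["alive_cvd"] + row["dead_stroke"]
    return diseased / (diseased + row["alive_no_cvd"])

def survivor_odds_ratio(table):
    """Odds ratio when only people alive at 10 years can be sampled (Table 13-2)."""
    a = table["with_htn"]["alive_cvd"]         # exposed cases
    b = table["with_htn"]["alive_no_cvd"]      # exposed controls
    c = table["without_htn"]["alive_cvd"]      # unexposed cases
    d = table["without_htn"]["alive_no_cvd"]   # unexposed controls
    return (a * d) / (b * c)

risk_ratio = cohort_risk(cohort["with_htn"]) / cohort_risk(cohort["without_htn"])
print(f"Cohort risk ratio:   {risk_ratio:.2f}")                   # 300/1000 versus 100/1000
print(f"Survivor odds ratio: {survivor_odds_ratio(cohort):.2f}")  # (50 x 900)/(80 x 700)
```

The cohort risk ratio is 3, whereas the odds ratio among survivors falls below 1, which is the reversal described in the text.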

Admission rate bias

Admission rate bias (Berkson's fallacy) occurs when the study admission rates differ, which causes major distortions in risk ratios. As an example, admission rate bias can occur in studies of hospitalized patients when patients (cases) who have the risk factor are admitted to the hospital more frequently than either the cases without the risk factor or the controls with the risk factor.

This fallacy was first pointed out by Berkson (1946) in evaluating an earlier study that had concluded that tuberculosis might have a protective effect on cancer. This conclusion was reached after a case–control study found a negative association between tuberculosis and cancer: The frequency of tuberculosis among hospitalized cancer patients was less than the frequency of tuberculosis among the hospitalized control patients who did not have cancer. These counterintuitive results occurred because a smaller proportion of patients who had both cancer and tuberculosis were hospitalized and thus available for selection as cases in the study; chances are that patients with both diseases were more likely to die than patients with cancer or tuberculosis alone.

It is important to be aware of admission rate bias because many case–control studies reported in the medical literature use hospitalized patients as sources for both cases and controls. The only way to control for this bias is to include an unbiased control group, best accomplished by choosing controls from a wide variety of disease categories or from a population of healthy subjects. Some statisticians suggest using two control groups in studies in which admission bias is a potential problem.
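How differing admission rates alone can produce a spurious association can be seen in a small worked example. Every number in the Python sketch below (population size, disease frequencies, and admission probabilities) is hypothetical and chosen only for illustration; none comes from Berkson's paper. The two diseases are constructed to be independent in the population.

```python
# Hypothetical population of 100,000 in which cancer (5%) and
# tuberculosis (10%) occur independently of each other.
population = {
    ("cancer", "tb"): 500,
    ("cancer", "no_tb"): 4500,
    ("no_cancer", "tb"): 9500,
    ("no_cancer", "no_tb"): 85500,
}

# Hypothetical probabilities of being hospitalized. Patients with both
# diseases are assumed to be admitted less often (for example, because
# more of them die before they can be admitted).
admission_prob = {
    ("cancer", "tb"): 0.20,
    ("cancer", "no_tb"): 0.50,
    ("no_cancer", "tb"): 0.30,
    ("no_cancer", "no_tb"): 0.05,
}

def odds_ratio(counts):
    """Odds ratio for tuberculosis in cancer cases versus noncancer controls."""
    return (counts[("cancer", "tb")] * counts[("no_cancer", "no_tb")]) / (
        counts[("cancer", "no_tb")] * counts[("no_cancer", "tb")]
    )

# Expected number of hospitalized patients in each cell
hospital = {cell: n * admission_prob[cell] for cell, n in population.items()}

print(f"Odds ratio in the population: {odds_ratio(population):.2f}")
print(f"Odds ratio in the hospital:   {odds_ratio(hospital):.2f}")
```

Under these assumptions the population odds ratio is 1 (no association), while the odds ratio among hospitalized patients is far below 1, so tuberculosis appears protective purely because of who is admitted.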

Nonresponse bias and the volunteer effect

Several steps discussed in Chapter 11 can be taken to reduce potential bias when subjects fail to respond to a survey. Bias that occurs when patients either volunteer or refuse to participate in studies is similar to nonresponse bias. This effect was studied in the nationwide Salk polio vaccine trials in 1954 by using two different study designs to evaluate the effectiveness of the vaccine (Meier, 1989). In some communities, children were randomly assigned to receive either the vaccine or a placebo injection. Some communities, however, refused to participate in a randomized trial; they agreed, instead, that second graders could be offered the vaccination and first and third graders could constitute the controls. In analysis of the data, researchers found that families who volunteered their children for participation in the nonrandomized study tended to be better educated and to have a higher income than families who refused to participate. They also tended to be absent from school with a higher frequency than nonparticipants.

Although in this example we might guess how absence from school could bias results, it is not always easy to determine how selection bias affects the outcome of the study; it may cause the experimental treatment to appear either better or worse than it should. Investigators should therefore reduce the potential for nonresponse bias as much as possible by using all possible means to increase the response rate and obtain the participation of most eligible patients. Using databases reduces response bias, but sometimes other sources of bias are present, that is, reasons that a specific group or selected information is underrepresented in the database.

Membership bias

Membership bias is essentially a problem of preexisting groups. It arises because one or more of the same characteristics that cause people to belong to the groups are related to the outcome of interest. For example, investigators have not been able to perform a clinical trial to examine the effects of smoking; some researchers have claimed it is not smoking itself that causes cancer but some other factor that simply happens to be more common in smokers. As readers of the medical literature, we need to be aware of membership bias because it cannot be prevented, and it makes the study of the effect of potential risk factors related to life-style very difficult.

A problem similar to membership bias is called the healthy worker effect; it was recognized in epidemiology when workers in a hazardous environment were unexpectedly found to have a higher survival rate than the general public. After further investigation, the cause of this incongruous finding was determined: Good health is a prerequisite in persons who are hired for work, but being healthy enough to work is not a requirement for persons in the general public.

Procedure selection bias

Procedure selection bias occurs when treatment assignments are made on the basis of certain characteristics of the patients, with the result that the treatment groups are not really similar. This bias frequently occurs in studies that are not randomized and is especially a problem in studies using historical controls. A good example is the comparison of a surgical versus a medical approach to a problem such as coronary artery disease. In early studies comparing surgical and medical treatment, patients were not randomized, and the evidence pointed to the conclusion that patients who received surgery were healthier than those treated medically; that is, only healthier patients were subjected to the risks associated with the surgery. The Coronary Artery Surgery Study (CASS, 1983) was undertaken in part to resolve these questions. It is important to be aware of procedure selection bias because many published studies describe a series of patients, some treated one way and some another way, and then proceed to make comparisons and draw inappropriate conclusions as a result.

Procedures Used in the Study and Common Procedural Biases

Terms and measurements

The procedures used in the study are also described in the method section. Here authors provide definitions of measures used in the study, especially any operational definitions developed by the investigators. If unusual instruments or methods are used, the authors should provide a reference and a brief description. For example, the study of screening for domestic violence in emergency department patients by Lapidus and colleagues (2002) defined domestic violence as “past or current physical, sexual, emotional, or verbal harm to a woman caused by a spouse, partner, or family member.” Domestic violence screening was defined as “assessing an individual to determine if she has been a victim of domestic violence.”

The journal Stroke has the practice of presenting the abbreviations and acronyms used in the article in a box. This makes the abbreviations clear and also easy to refer to in reading other sections of the article. For example, in reporting their study of sleep-disordered breathing and stroke, Good and colleagues (1996) presented a list of abbreviations at the top of the column that describes the subjects and methods in the study.

Several biases may occur in the measurement of various patient characteristics and in the procedures used or evaluated in the study. Some of the more common biases are described in the following subsections.

Procedure bias

Procedure bias, discussed by Feinstein (1985), occurs when groups of subjects are not treated in the same manner. For example, the procedures used in an investigation may lead to detection of other problems in patients in the treatment group and make these problems appear to be more prevalent in this group. As another example, the patients in the treatment group may receive more attention and be followed up more vigorously than those in another group, thus stimulating greater compliance with the treatment regimen. The way to avoid this bias is by carrying out all maneuvers except the experimental factor in the same way in all groups and examining all outcomes using similar procedures and criteria.

Recall bias

Recall bias may occur when patients are asked to recall certain events, and subjects in one group are more likely to remember the event than those in the other group. For example, people take aspirin commonly and for many reasons, but patients diagnosed as having peptic ulcer disease may recall the ingestion of aspirin with greater accuracy than those without gastrointestinal problems. In the study of the relationship between juice consumption and growth, Dennison and associates (1997) asked parents to keep a daily journal of all the liquid consumed by their children; a properly maintained journal helps reduce recall bias.

Insensitive-measure bias

Measuring instruments may not be able to detect the characteristic of interest or may not be properly calibrated. For example, routine x-ray films are an insensitive method for detecting osteoporosis because bone loss of approximately 30% must occur before a roentgenogram can detect it. Newer densitometry techniques are more sensitive and thus avoid insensitive-measure bias.

Detection bias

Detection bias can occur because a new diagnostic technique is introduced that is capable of detecting the condition of interest at an earlier stage. Survival for patients diagnosed with the new procedure inappropriately appears to be longer, merely because the condition was diagnosed earlier.

A spin-off of detection bias, called the Will Rogers phenomenon (because of his attention to human phenomena), was described by Feinstein and colleagues (1985). They found that a cohort of subjects with lung cancer first treated in 1953–1954 had lower 6-month survival rates for patients with each of the three main stages (localized, nodal involvement, and metastases) as well as for the total group than did a 1977 cohort treated at the same institutions. Newer imaging procedures were used with the later group; however, according to the old diagnostic classification, this group had a prognostically favorable zero-time shift in that their disease was diagnosed at an earlier stage. In addition, by detecting metastases in the 1977 group that were missed in the earlier group, the new technologic approaches resulted in stage migration; that is, members of the 1977 cohort were diagnosed as having a more advanced stage of the disease, whereas they would have been diagnosed as having earlier-stage disease in 1953–1954. The individuals who migrated from the earlier-stage group to the later-stage group tended to have the poorest survival in the earlier-stage group; so removing them resulted in an increase in survival rates in the earlier group. At the same time, these individuals, now assigned to the later-stage group, were better off than most other patients in this group, and their addition to the group resulted in an increased survival in the later-stage group as well. The authors stated that the 1953–1954 and 1977 cohorts actually had similar survival rates when patients in the 1977 group were classified according to the criteria that would have been in effect had there been no advances in diagnostic techniques.
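Stage migration is easier to see with numbers. The survival times below are invented solely for illustration (they are not data from Feinstein and colleagues); the Python sketch moves the patient with the poorest survival in the earlier-stage group into the later-stage group and recomputes the group means.

```python
def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

# Hypothetical survival times in months under the old staging
early_stage = [12, 24, 36, 48, 60]
late_stage = [2, 4, 6, 8]

# Newer imaging finds metastases in the early-stage patient with the
# poorest survival, so that patient is reclassified as late stage.
migrated = min(early_stage)
early_after = [t for t in early_stage if t != migrated]
late_after = late_stage + [migrated]

print("Early stage: ", mean(early_stage), "->", mean(early_after))
print("Late stage:  ", mean(late_stage), "->", mean(late_after))
print("All patients:", mean(early_stage + late_stage),
      "->", mean(early_after + late_after))
```

Both stage-specific means rise while the mean for all patients is unchanged, because no individual's survival has changed; only the labels have.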

Compliance bias

Compliance bias occurs when patients find it easier or more pleasant to comply with one treatment than with another. For example, in the treatment of hypertension, a comparison of α-methyldopa versus hydrochlorothiazide may demonstrate inaccurate results because some patients do not take α-methyldopa owing to its unpleasant side effects, such as drowsiness, fatigue, or impotence in male patients.

Assessing Study Outcomes

Variation in data

In many clinics, a nurse collects certain information about a patient (eg, height, weight, date of birth, blood pressure, pulse) and records it on the medical record before the patient is seen by a physician. Suppose a patient's blood pressure is recorded as 140/88 on the chart; the physician, taking the patient's blood pressure again as part of the physical examination, observes a reading of 148/96. Which blood pressure reading is correct? What factors might be responsible for the differences in the observation? We use blood pressure and other clinical information to examine sources of variation in data and ways to measure the reliability of observations. Two classic articles in the Canadian Medical Association Journal (McMaster University Health Sciences Centre, Department of Clinical Epidemiology and Biostatistics, 1980a; 1980b) discuss sources of clinical disagreement and ways disagreement can be minimized.

Factors that contribute to variation in clinical observations

Variation in clinical observations and measurements, that is, variability in measurements on the same subject, can be classified into three categories: (1) variation in the characteristic being measured, (2) variation introduced by the examiner, and (3) variation owing to the instrument or method used. It is especially important to control variation due to the last two factors as much as possible so that the reported results will generalize as intended.

Substantial variability may occur in the measurement of biologic characteristics. For example, a person's blood pressure is not the same from one time to another, and thus, blood pressure values vary. A patient's description of symptoms to two different physicians may vary because the patient may forget something. Medications and illness can also affect the way a patient behaves and what information he or she remembers to tell a nurse or physician.

Even when no change occurs in the subject, different observers may report different measurements. When examination of a characteristic requires visual acuity, such as the reading on a sphygmomanometer or the features on an x-ray film, differences may result from the varying visual abilities of the observers. Such differences can also play a role when hearing (detecting heart sounds) or feeling (palpating internal organs) is required. Some individuals are simply more skilled than others in history taking or performing certain examinations.

Variability also occurs when the characteristic being measured is a behavioral attribute. Two examples are measurements of functional status and measurements of pain; here the additional component of patient or observer interpretation can increase apparent variability. In addition, observers may tend to observe and record what they expect based on other information about the patient. These factors point out the need for a standardized protocol for data collection.

The instrument used in the examination can be another source of variation. For instance, mercury column sphygmomanometers are inherently less variable than aneroid models. In addition, the environment in which the examination takes place, including lighting and noise level, presence of other individuals, and room temperature, can produce apparent differences. Methods for measuring behavior-related characteristics such as functional status or pain usually consist of a set of questions answered by patients and hence are not as precise as instruments that measure physical characteristics.

Several steps can be taken to reduce variability. Taking a history when the patient is calm and not heavily medicated and checking with family members when the patient is incapacitated are both useful in minimizing errors that result from a patient's illness or the effects of medication. Collecting information and making observations in a proper environment is also a good strategy. Recognizing one's own strengths and weaknesses helps one evaluate the need for other opinions. Blind assessment, especially of subjective characteristics, guards against errors resulting from preconceptions. Repeating questionable aspects of the examination or asking a colleague to perform a key aspect (blindly, of course) reduces the possibility of error. Having well-defined operational guidelines for using classification scales helps people use them in a consistent manner. Ensuring that instruments are properly calibrated and correctly used eliminates many errors and thus reduces variation.

Ways to measure reliability and validity

A common strategy to ensure the reliability or reproducibility of measurements, especially for research purposes, is to replicate the measurements and evaluate the degree of agreement. We discussed intra- and interrater reliability in Chapter 5 and discussed reliability and validity in detail in Chapter 11.

One approach to establishing the reliability of a measure is to repeat the measurements at a different time or by a different person and compare the results. When the outcome is nominal, the kappa statistic is used; when the scale of measurement is numerical, the statistic used to examine the relationship is the correlation coefficient (Chapter 8) or the intraclass correlation (Chapter 11).
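As a reminder of how the kappa statistic corrects observed agreement for agreement expected by chance, here is a minimal sketch. The counts are hypothetical (two physicians classifying 100 x-ray films as abnormal or normal), and the function name cohen_kappa is our own.

```python
def cohen_kappa(table):
    """Cohen's kappa for a square agreement table (rows: rater 1, columns: rater 2)."""
    k = len(table)
    n = sum(sum(row) for row in table)
    # Observed agreement: proportion of subjects on the diagonal.
    observed = sum(table[i][i] for i in range(k)) / n
    # Chance agreement: product of the raters' marginal proportions, summed over categories.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical counts, invented for illustration.
films = [[40, 10],
         [5, 45]]
kappa = cohen_kappa(films)
print(round(kappa, 2))
```

Here the raters agree on 85% of the films, chance alone would produce 50% agreement, and kappa is the fraction of the remaining possible agreement that was achieved.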

Hébert and colleagues (1997) used the Functional Autonomy Measurement System (SMAF) instrument to measure cognitive functioning and depression. They evaluated its validity by comparing the SMAF score with the nursing time required for care (r = 0.88) and by comparing disability scores among residents living in settings with different levels of care. The high correlation between nursing time and score indicates that patients with higher (more dependent) scores required more nursing time, a reasonable expectation. Another indication of validity is higher disability scores among residents living in settings where they were provided with a high level of care and lower scores among residents who live independently.

Blinding

Another aspect of assessing the outcome is related to ways of increasing the objectivity and decreasing the subjectivity of the assessment. In studies involving the comparison of two treatments or procedures, the most effective method for achieving objective assessment is to have both patient and investigator be unaware of which method was used. If only the patient is unaware, the study is called blind; if both patient and investigator are unaware, it is called double-blind.

Ballard and colleagues (1998) studied the effect of antenatal thyrotropin-releasing hormone in preventing lung disease in preterm infants in a randomized study. Experimental subjects were given the hormone, and controls were given placebo. The authors stated:

The women were randomly assigned within centers to the treatment or placebo group in permuted blocks of four. The study was double-blinded, and only the pharmacies at the participating centers had the randomization schedule.
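For readers unfamiliar with permuted blocks, the following sketch shows one way such a schedule could be generated. It is an illustration only, not the procedure Ballard and colleagues used; the function name, the arm labels, and the seed are assumptions.

```python
import random

def permuted_block_schedule(n_subjects, block_size=4, arms=("treatment", "placebo"), seed=None):
    """Assign subjects in shuffled blocks so each block has equal numbers per arm."""
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)
    per_arm = block_size // len(arms)
    schedule = []
    while len(schedule) < n_subjects:
        block = list(arms) * per_arm   # e.g., two treatment and two placebo slots
        rng.shuffle(block)             # random order within the block
        schedule.extend(block)
    return schedule[:n_subjects]

schedule = permuted_block_schedule(12, seed=1)
for start in range(0, len(schedule), 4):
    print(schedule[start:start + 4])
```

Because every block of four contains two assignments to each arm, the group sizes within a center can never differ by more than two at any point during enrollment.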

Blinding helps to reduce a priori biases on the part of both patient and physician. Patients who are aware of their treatment assignment may imagine certain side effects or expect specific benefits, and their expectations may influence the outcome. Similarly, investigators who know which treatment has been assigned to a given patient may be more watchful for certain side effects or benefits. Although we might suspect an investigator who is not blinded to be more favorable to the new treatment or procedure, just the opposite may happen; that is, the investigator may bend over backward to keep from being prejudiced by his or her knowledge and therefore may err in favor of the control.

Knowledge of treatment assignment may be somewhat less influential when the outcome is straightforward, as is true for mortality. With mortality, it is difficult to see how outcome assessment can be biased. Many examples exist in which the outcome appears to be objective, however, even though its evaluation contains subjective components. Many clinical studies attempt to ascribe reasons for mortality or morbidity, and judgment begins to play a role in these cases. For example, mortality is often an outcome of interest in studies involving organ transplantation, and investigators wish to differentiate between deaths from failure of the organ and deaths from an unrelated cause. If the patient dies in an automobile accident, for example, investigators can easily decide that the death is not due to organ rejection; but in most situations, the decision is not so easy.

The issue of blinding becomes more important as the outcome becomes less amenable to objective determination. Research that focuses on quality-of-life outcomes, such as chest pain status, activity limitation, or recreational status, requires subjective measures. Although patients cannot be blinded in many studies, the subjective outcomes can be assessed by a person, such as another physician, a psychologist, or a physical therapist, who is blind to the treatment the patient received.

Data quality and monitoring

The method section is also the place where steps taken to ensure the accuracy of the data should be described. Increased variation and possibly incorrect conclusions can occur if the correct observation is made but is incorrectly recorded or coded. Dennison and colleagues (1997, p. 16) stated: “All questionnaire data were dual-entered and verified before being entered into a … database.” Dual or duplicate entry decreases the likelihood of errors because it is unusual for the same entry error to occur twice.
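The verification step in dual entry amounts to comparing the two keyed copies field by field and resolving every mismatch against the source document. A minimal sketch follows; the record layout, field names, and values are hypothetical and are not taken from Dennison and colleagues.

```python
def entry_discrepancies(first_pass, second_pass):
    """Return (record id, field, first value, second value) for every mismatch."""
    problems = []
    for rec_id in sorted(first_pass.keys() | second_pass.keys()):
        a = first_pass.get(rec_id, {})
        b = second_pass.get(rec_id, {})
        for field in sorted(a.keys() | b.keys()):
            if a.get(field) != b.get(field):
                problems.append((rec_id, field, a.get(field), b.get(field)))
    return problems

# Hypothetical blood pressure records keyed twice; the second pass transposes two digits.
first = {101: {"sbp": 140, "dbp": 88}, 102: {"sbp": 148, "dbp": 96}}
second = {101: {"sbp": 140, "dbp": 88}, 102: {"sbp": 184, "dbp": 96}}
mismatches = entry_discrepancies(first, second)
print(mismatches)
```

Only the transposed systolic value for record 102 is flagged; an error would go undetected only if both passes made the identical mistake.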

Multicenter studies provide additional data quality challenges. It is important that an accurate and complete protocol be developed to ensure that data are handled the same way in all centers. Gelber and colleagues (1997) studied data collected from 63 centers in North America in setting normative values for cardiovascular autonomic nervous system tests. They reported that:

All site personnel were trained by a member of the Autonomic Nervous System (ANS) Reading Center in the use of the equipment and testing methodology…. All data were analyzed at a single Autonomic Reading Center. The analysis program contains internal checks which alert the analyzing technician to any aberrant data points overlooked during the editing process and warns the technician when the results suggest that the test may have been performed improperly. The analysis of each study was reviewed by the director of the ANS Reading Center.

In addition to standardized training, the data entry process itself was monitored for potential errors.

Determining an Appropriate Sample Size

Specifying the sample size needed to detect a difference or an effect of a given magnitude is one of the most crucial pieces of information in the report of a medical study. Recall that missing a significant difference is called a type II error, and this error can happen when the sample size is too small. We provide many illustrations in the chapters that discuss specific statistical methods, especially Chapters 5–8 and 10–11.

Determination of sample size is referred to as power analysis or as determining the power of a study. An assessment of power is essential in negative studies, studies that fail to find an expected difference or relationship; we feel so strongly about this point that we recommend that readers disregard negative studies that do not provide information on power.

Harper (1997) studied the use of paracervical block to diminish cramping associated with cryosurgery. She stated:

To have a power of 80% to detect a difference of 20 mm on the visual analog scale at the 0.05 level of significance (assuming a standard deviation of 30 mm), the power analysis a priori showed that 35 women would be needed in each cohort. The first 35 women who met the inclusion and exclusion criteria for cryosurgery were treated in the usual manner with no anesthetic block given before cryosurgery. The variances of the actual responses were greater than anticipated in the a priori power analysis, leading to the subsequent enrollment of the next five women qualifying for the study for a total of 40 women in the usual treatment group. This increase in enrollment maintained the power of the study.

Thus, as a result of analysis of data, the investigator opted to increase the sample size to maintain power.
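The figures quoted by Harper can be checked approximately with the usual normal-approximation formula for comparing two means, n = 2[(z(α/2) + z(β))σ/δ]² per group. The sketch below assumes a two-sided test and equal group sizes; the quotation does not say which formula or program Harper used, so close agreement rather than an exact match should be expected.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided comparison of two means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # critical value for the significance level
    z_beta = NormalDist().inv_cdf(power)            # value corresponding to the desired power
    return 2 * ((z_alpha + z_beta) * sd / delta) ** 2

n = n_per_group(delta=20, sd=30)          # 20 mm difference, assumed SD of 30 mm
print(round(n, 1), ceil(n))

n_larger_sd = n_per_group(delta=20, sd=35)   # hypothetical larger SD, for comparison
print(ceil(n_larger_sd))
```

The approximation gives a little over 35 women per group, in line with the 35 reported. The second call illustrates why a larger-than-anticipated standard deviation calls for more subjects if power is to be maintained; the value of 35 mm is an arbitrary choice, not the variability Harper observed.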

We repeatedly emphasize the need to perform a power analysis prior to beginning a study and have illustrated how to estimate power using statistical programs for that purpose. Investigators planning complicated studies or studies involving a number of variables are especially advised to contact a statistician for assistance.

Evaluating the Statistical Methods

Statistical methods are the primary focus of this text, and only a brief summary and some common problems are listed here. At the risk of oversimplification, the use of statistics in medicine can be summarized as follows: (1) to answer questions concerning differences; (2) to answer questions concerning associations; and (3) to control for confounding issues or to make predictions. If you can determine the type of question investigators are asking (from the stated purpose of the study) and the types and numbers of measures used in the study (numerical, ordinal, nominal), then the appropriate statistical procedure should be relatively easy to specify. Tables 10-1 and 10-2 in Chapter 10 and the flowcharts in Appendix C were developed to assist with this process. Some common biases in evaluating data are discussed in the next sections.

Fishing expedition

A fishing expedition is the name given to studies in which the investigators do not have clear-cut research questions guiding the research. Instead, data are collected, and a search is carried out for results that are significant. The problem with this approach is that it capitalizes on chance occurrences and leads to conclusions that may not hold up if the study is replicated. Unfortunately, such studies are rarely repeated, and incorrect conclusions can remain a part of accepted wisdom.

Multiple significance tests

Multiple tests in statistics, just as in clinical medicine, result in increased chances of making a type I, or false-positive, error when the results from one test are interpreted as being independent of the results from another. For example, a factorial design for analysis of variance in a study involving four groups measured on three independent variables has the possibility of 18 comparisons (6 comparisons among the four groups on each of three variables), ignoring interactions. If each comparison is made for P ≤ 0.05, the probability of finding one or more comparisons significant by chance is considerably greater than 0.05. The best way to guard against this bias is by performing the appropriate global test with analysis of variance prior to making individual group comparisons (Chapter 7) or using an appropriate method to analyze multiple variables (Chapter 10).
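To see how much greater than 0.05 this probability can be, the sketch below counts the comparisons in the example and computes the chance of at least one false-positive result. It assumes, for simplicity, that the 18 tests are independent, which is rarely exactly true in practice; the function name is our own.

```python
from math import comb

def familywise_error(n_tests, alpha=0.05):
    """Probability of at least one type I error among n independent tests."""
    return 1 - (1 - alpha) ** n_tests

# Pairwise comparisons among four groups, repeated for each of three variables.
n_comparisons = comb(4, 2) * 3
fwe = familywise_error(n_comparisons)
print(n_comparisons, round(fwe, 2))
```

Under the independence assumption, the chance of at least one spuriously significant comparison is roughly 60% when no true differences exist.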

A similar problem can occur in a clinical trial if too many interim analyses are done. Sometimes it is important to analyze the data at certain stages during a trial to learn if one treatment is clearly superior or inferior. Many trials are stopped early when an interim analysis determines that one therapy is markedly superior to another. For instance, the Women's Health Initiative trial on estrogen plus progestin (Writing Group for WHI, 2002) and the study of finasteride and the development of prostate cancer (Thompson et al, 2003) were both stopped early. In the estrogen study, the conclusion was that overall health risks outweighed the benefits, whereas finasteride was found to prevent or delay prostate cancer in a significant number of men. In these situations, it is unethical to deny patients the superior treatment or to continue to subject them to a risky treatment. Interim analyses should be planned as part of the design of the study, and the overall probability of a type I error (the α level) should be adjusted to compensate for the multiple comparisons.

Migration bias

Migration bias occurs when patients who drop out of the study are also dropped from the analysis. The tendency to drop out of a study may be associated with the treatment (eg, its side effects), and dropping these subjects from the analysis can make a treatment appear more or less effective than it really is. Migration bias can also occur when patients cross over from the treatment arm to which they were assigned to another treatment. For example, in crossover studies comparing surgical and medical treatment for coronary artery disease, patients assigned to the medical arm of the study sometimes require subsequent surgical treatment for their condition. In such situations, the appropriate method is to analyze the patient according to his or her original group; this is referred to as analysis based on the intention-to-treat principle.
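The difference between analyzing patients by assigned treatment and by treatment received can be seen in a small sketch. The eight patient records are invented for illustration, and the field and function names are assumptions.

```python
# Hypothetical trial records; "assigned" is the randomized arm and "received" is the
# treatment actually given. One patient assigned to medical therapy crossed over to surgery.
patients = [
    {"assigned": "medical", "received": "medical", "survived": True},
    {"assigned": "medical", "received": "medical", "survived": True},
    {"assigned": "medical", "received": "medical", "survived": False},
    {"assigned": "medical", "received": "surgical", "survived": True},
    {"assigned": "surgical", "received": "surgical", "survived": True},
    {"assigned": "surgical", "received": "surgical", "survived": True},
    {"assigned": "surgical", "received": "surgical", "survived": False},
    {"assigned": "surgical", "received": "surgical", "survived": False},
]

def survival_rate(records, arm, by):
    """Proportion surviving in `arm`, grouping patients by the key `by`."""
    group = [r for r in records if r[by] == arm]
    return sum(r["survived"] for r in group) / len(group)

arms = ("medical", "surgical")
intention_to_treat = {arm: survival_rate(patients, arm, "assigned") for arm in arms}
as_treated = {arm: survival_rate(patients, arm, "received") for arm in arms}
print(intention_to_treat)
print(as_treated)
```

Grouping by treatment received moves the crossover patient's outcome from one arm to the other and changes the apparent result in both arms, whereas the intention-to-treat grouping preserves the comparison created by randomization.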

Entry time bias

Entry time bias may occur when time-related variables, such as survival or time to remission, are counted differently for different arms of the study. For example, consider a study comparing survival for patients randomized to a surgical versus a medical treatment in a clinical trial. Patients randomized to the medical treatment who die at any time after randomization are counted as treatment failures; the same rule must be followed with patients randomized to surgery, even if they die prior to the time surgery is performed. Otherwise, a bias exists in favor of the surgical treatment.

THE RESULTS SECTION OF A RESEARCH REPORT

The results section of a medical report contains just that: results of (or findings from) the research directed at questions posed in the introduction. Typically, authors present tables or graphs (or both) of quantitative data and also report the findings in the text. Findings generally consist of both descriptive statistics (means, standard deviations, risk ratios, etc) and results of statistical tests that were performed. Results of statistical tests are typically given as either P values or confidence limits; authors seldom give the value of the test statistic itself but, rather, give the P value associated with it. The two major aspects for readers evaluating the results section are the adequacy of information and the sufficiency of the evidence to withstand possible threats to the validity of the conclusions.

Assessing the Data Presented

Authors should provide adequate information about measurements made in the study. At a minimum, this information should include sample sizes and either means and standard deviations for numerical measures or proportions for nominal measures. For example, in describing the effect of sex, race, and age on estimating percentage body fat from body mass index, Jackson and colleagues (2002) specified the independent variables in the method section along with how they were coded for the regression analysis. They also gave the equation of the standard error of the estimate that they used to evaluate the fit of the regression models. In the results section, they included a table of means and standard deviations of the variables broken down by sex and race.

In addition to presenting adequate information on the observations in the study, good medical reports use tables and graphs appropriately. As we outlined in Chapter 3, tables and graphs should be clearly labeled so that they can be interpreted without referring to the text of the article. Furthermore, they should be properly constructed, using the methods illustrated in Chapter 3.

Assuring the Validity of the Data

A good results section should have the following three characteristics. First, authors of medical reports should provide information about the baseline measures of the group (or groups) involved in the study as did Jackson and colleagues (2002) in Table 2 of their article. Tables like this one typically give information on the gender, age, and any important risk factors for subjects in the different groups and are especially important in observational studies. Even with randomized studies, it is always a good idea to show that, in fact, the randomization worked and the groups were similar. Investigators often perform statistical tests to demonstrate the lack of significant difference on the baseline measures. If it turns out that the groups are not equivalent, it may be possible to make adjustments for any important differences by one of the covariance adjusting methods discussed in Chapter 10.

Second, readers should be alert for the problem of multiple comparisons in studies in which many statistical tests are performed. Multiple comparisons can occur because a group of subjects is measured at several points in time, for which repeated-measures analysis of variance should be used. They also occur when many study outcomes are of interest; investigators should use multivariate procedures in these situations. In addition, multiple comparisons result when investigators perform many subgroup comparisons, such as between men and women, among different age groups, or between groups defined by the absence or presence of a risk factor. Again, either multivariate methods or clearly stated a priori hypotheses are needed. If investigators find unexpected differences that were not part of the original hypotheses, these should be advanced as tentative conclusions only and should be the basis for further research.

Third, it is important to watch for inconsistencies between information presented in tables or graphs and information discussed in the text. Such inconsistencies may be the result of typographic errors, but sometimes they are signs that the authors have reanalyzed and rewritten the results or that the researchers were not very careful in their procedures. In any case, more than one inconsistency should alert us to watch for other problems in the study.

THE DISCUSSION & CONCLUSION SECTIONS OF A RESEARCH REPORT

The discussion and conclusion section(s) of a medical report may be one of the easier sections for clinicians to assess. The first and most important point to watch for is consistency among comments in the discussion, questions posed in the introduction, and data presented in the results. In addition, authors should address the consistency or lack of same between their findings and those of other published results. Careful readers will find that a surprisingly large number of published studies do not address the questions posed in their introduction. A good habit is to refer to the introduction and briefly review the purpose of the study just prior to reading the discussion and conclusion.

The second point to consider is whether the authors extrapolated beyond the data analyzed in the study. For example, are there recommendations concerning dosage levels not included in the study? Have conclusions been drawn that require a longer period of follow-up than that covered by the study? Have the results been generalized to groups of patients not represented by those included in the study?

Finally, note whether the investigators point out any shortcomings of the study, especially those that affect the conclusions drawn, and discuss research questions that have arisen from the study or that still remain unanswered. No one is in a better position to discuss these issues than the researchers who are intimately involved with the design and analysis of the study they have reported.

A CHECKLIST FOR READING THE LITERATURE

It is a rare article that meets all the criteria we have included in the following lists. Many articles do not even provide enough information to make a decision about some of the items in the checklist. Nevertheless, practitioners do not have time to read all the articles published, so they must make some choices about which ones are most important and best presented. Bordage and Dawson (2003) developed a set of guidelines for preparing a study and writing a research grant that contains many topics that are relevant to reading an article as well. The companion to our text, the book by Greenberg and colleagues (2000), is recommended for suggestions in reading the epidemiologic literature. Greenhalgh (1997b) presents a collection of articles on various topics published in the British Medical Journal, and the Journal of the American Medical Association has published a series of excellent articles under the general title of “Users' Guides to the Medical Literature.”

The following checklist is fairly exhaustive, and some readers may not want to use it unless they are reviewing an article for their own purposes or for a report. The items on the checklist are included in part as a reminder to the reader to look for these characteristics. Its primary purpose is to help clinicians decide whether a journal article is worth reading and, if so, what issues are important when deciding if the results are useful. The items in italics can often be found in a structured abstract. An asterisk (*) designates items that we believe are the most critical; these items are the ones readers should use when a less comprehensive checklist is desired.

Reading the Structured Abstract

·      *A. Is the topic of the study important and worth knowing about?

·      *B. What is the purpose of the study? Is the focus on a difference or a relationship? The purpose should be clearly stated; one should not have to guess.

·      C. What is the main outcome from the study? Does the outcome describe something measured on a numerical scale or something counted on a categorical scale? The outcome should be clearly stated.

·      D. Is the population of patients relevant to your practice—can you use these results in the care of your patients? The population in the study affects whether or not the results can be generalized.

·      *E. If statistically significant, do the results have clinical significance as well?

Reading the Introduction

If the article does not contain a structured abstract, the introduction section should include all of the preceding information plus the following information.

·      *A. What research has already been done on this topic and what outcomes were reported? The study should add new information.

Reading the Methods

·      *A. Is the appropriate study design used (clinical trial, cohort, case–control, cross-sectional, meta-analysis)?

·      B. Does the study cover an adequate period of time? Is the follow-up period long enough?

·      *C. Are the criteria for inclusion and exclusion of subjects clear? How do these criteria limit the applicability of the conclusions? The criteria also affect whether or not the results can be generalized.

·      *D. Were subjects randomly sampled (or randomly assigned)? Was the sampling method adequately described?

·      E. Are standard measures used? Is a reference to any unusual measurement/procedure given if needed? Are the measures reliable/replicable?

·      F. What other outcomes (or dependent variables) and risk factors (or independent variables) are in the study? Are they clearly defined?

·      *G. Are statistical methods outlined? Are they appropriate? (The first question is easy to check; the second may be more difficult to answer.)

·      *H. Is there a statement about power—the number of patients that are needed to find the desired outcome? A statement about sample size is essential in a negative study.

·      I. In a clinical trial:

   1. How are subjects recruited?

   *2. Are subjects randomly assigned to the study groups? If not:

      a. How are patients selected for the study to avoid selection biases?

      b. If historical controls are used, are methods and criteria the same for the experimental group; are cases and controls compared on prognostic factors?

   *3. Is there a control group? If so, is it a good one?

   4. Are appropriate therapies included?

   5. Is the study blind? Double-blind? If not, should it be?

   6. How is compliance ensured/evaluated?

   *7. If some cases are censored, is a survival method such as Kaplan–Meier or the Cox model used?

·      J. In a cohort study:

   *1. How are subjects recruited?

   2. Are subjects randomly selected from an eligible pool?

   *3. How rigorously are subjects followed? How many dropouts does the study have and who are they?

   *4. If some cases are censored, is a survival method such as Kaplan–Meier curves used?

·      K. In a case–control study:

   *1. Are subjects randomly selected from an eligible pool?

   2. Is the control group a good one (bias-free)?

   3. Are records reviewed independently by more than one person (thereby increasing the reliability of data)?

·      L. In a cross-sectional (survey, epidemiologic) study:

   1. Are the questions unbiased?

   *2. Are subjects randomly selected from an eligible pool?

   *3. What is the response rate?

·      M. In a meta-analysis:

   *1. How is the literature search conducted?

   2. Are the criteria for inclusion and exclusion of studies clearly stated?

   *3. Is an effort made to reduce publication bias (because negative studies are less likely to be published)?

   *4. Is there information on how many studies are needed to change the conclusion?

Reading the Results

·      *A. Do the reported findings answer the research questions?

·      *B. Are actual values reported—means, standard deviations, proportions—so that the magnitude of differences can be judged by the reader?

·      C. Are many P values reported, thus increasing the chance that some findings are bogus?

·      *D. Are groups similar on baseline measures? If not, how did investigators deal with these differences (confounding factors)?

·      E. Are the graphs and tables, and their legends, easy to read and understand?

·      *F. If the topic is a diagnostic procedure, is information on both sensitivity and specificity (or the false-positive rate, which is 1 − specificity) given? If predictive values are given, is the dependence on prevalence emphasized?

Reading the Conclusion and Discussion

·      *A. Are the research questions adequately discussed?

·      *B. Are the conclusions justified? Do the authors extrapolate more than they should, for example, beyond the length of time subjects were studied or to populations not included in the study?

·      C. Are the conclusions of the study discussed in the context of other relevant research?

·      D. Are shortcomings of the research addressed?

Table 13-3. Association between anticipatory dementia and well-being by sample group.

 

Anticipatory Dementia     Adult Children (n = 25)     Control (n = 25)

CES-D
   r                       0.352                       0.266
   P                       0.085                       0.200
Psych symptoms
   r                       0.341                       0.433
   P                       0.095                       0.031
Life satisfaction
   r                      -0.419                      -0.283
   P                       0.037                       0.171
Health status
   r                      -0.173                      -0.446
   P                       0.409                       0.026

CES-D = Center for Epidemiological Studies Depression Scale.
Source: Reproduced, with permission, from Hodgson LG, Cutler SJ: Anticipatory dementia and well-being. Am J Alzheimer's Dis 1997;12:62–66.

Table 13-4. Predictors of depression score at wave 2.

| Predictor Variableᵃ | Betaᵇ | P | R² | R² change |
| --- | --- | --- | --- | --- |
| Depression score, wave 1 | 0.231 | 0.000 | 0.182 | 0.182 |
| Sociodemographic variables | | | | |
| Age | -0.024 | 0.528 | 0.187 | 0.005 |
| Sex | 0.034 | 0.370 | | |
| Psychologic health variables | | | | |
| Neuroticism, wave 1 | 0.077 | 0.056 | 0.237 | 0.050 |
| Past history of depression or nervous breakdown, wave 2 | 0.136 | 0.000 | | |
| Physical health variables | | | | |
| ADL, wave 1 | -0.103 | 0.033 | 0.411 | 0.174 |
| ADL, wave 2 | 0.283 | 0.012 | | |
| ADL squared, wave 2 | -0.150 | 0.076 | | |
| Number current symptoms, wave 2 | 0.117 | 0.009 | | |
| Number medical conditions, wave 2 | 0.226 | 0.000 | | |
| Blood pressure: systolic, wave 2 | -0.092 | 0.010 | | |
| Global health rating change between waves | 0.079 | 0.028 | | |
| Sensory impairment change between waves | -0.064 | 0.073 | | |
| Social support/inactivity | | | | |
| Social support: friends, wave 2 | -0.095 | 0.015 | 0.442 | 0.031 |
| Social support: social visits, wave 2 | -0.087 | 0.032 | | |
| Activity level, wave 2 | 0.095 | 0.025 | | |
| Services (community residents only) | | | | |
| Total services used, wave 2 | 0.135ᶜ | 0.001ᶜ | 0.438ᶜ | 0.015ᶜ |

a Only the variables shown were included in the final model.
b Standardized beta value, controlling for all other variables in the regression, except service use. Based on community and institutional residents.
c Regression limited to community sample only; coefficients for other variables vary only very slightly from those obtained with regression on the full sample.
Source: Reproduced, with permission, from Henderson AS, Korten AE, Jacomb PA, MacKinnon AJ, Jorm AF, Christensen H, et al: The course of depression in the elderly: A longitudinal community-based study in Australia. Psychol Med 1997;27:119–129.

Table 13-5. Partial regression coefficients (β) and significance levels of selected independent variables, according to multivariate analysis in model C.a

| Outcome | Level of BMIᵇ: β | Level of BMIᵇ: P value | Slope of BMI: β | Slope of BMI: P value | CV of BMIᶜ: β | CV of BMIᶜ: P value |
| --- | --- | --- | --- | --- | --- | --- |
| Men | | | | | | |
| Total mortality | +0.024 | 0.16 | -2.52 | 0.0001 | +9.24 | 0.0001 |
| Morbidity due to CHD | +0.042 | 0.04 | -2.24 | 0.0010 | +10.89 | 0.0001 |
| Mortality from CHD | +0.037 | 0.17 | -3.16 | 0.0002 | +12.42 | 0.0003 |
| Morbidity due to cancer | +0.015 | 0.50 | -1.46 | 0.0800 | +5.42 | 0.0900 |
| Women | | | | | | |
| Total mortality | +0.024 | 0.09 | -1.60 | 0.0002 | +3.64 | 0.04 |
| Morbidity due to CHD | +0.047 | 0.02 | -1.13 | 0.0800 | +4.92 | 0.07 |
| Mortality from CHD | +0.057 | 0.01 | -2.62 | 0.0004 | +6.77 | 0.02 |
| Morbidity due to cancer | +0.010 | 0.60 | +0.14 | 0.8000 | +0.67 | 0.80 |

a Model C included five risk factors for cardiovascular disease (smoking, serum cholesterol level, systolic blood pressure, glucose tolerance, and level of physical activity) in addition to age, level of the body mass index (BMI, the weight in kilograms divided by the square of the height in meters), slope of the BMI (defined as the change in the BMI per year from 25 years of age to the eighth examination), and the coefficient of variation (CV) of the BMI. CHD denotes coronary heart disease.
b Calculated as a mean value for each subject.
c Defined as the standard deviation of the BMI divided by the mean BMI.
Source: Reproduced, with permission, from Table 2 in Lissner L, Odell PM, D'Agostino RB, Stokes J 3rd, Kreger BE, Belanger AJ, et al: Variability of body weight and health outcomes in the Framingham population. N Engl J Med 1991;324:1839–1844.

Table 13-6. Clinical and metabolic details.a,b

| Subjects | Age (years) | BMI (kg/m²) | Basal Glucose (mmol/L) | Basal Insulin (pmol/L) | KG (%/min) |
| --- | --- | --- | --- | --- | --- |
| NW-HT (n = 8) | 39.4 ± 5.0 (20–62) | 23.5 ± 0.6 (21.4–25) | 4.99 ± 0.26 | 84.8 ± 13.6 | 2.45 ± 0.41 |
| OW-HT (n = 6) | 42.7 ± 5.7 (23–60) | 28.0 ± 0.7 (26–31.4) | 5.05 ± 0.11 | 134 ± 23.6 | 1.84 ± 0.20 |
| Total-HT (n = 14) | 40.8 ± 3.7 (20–62) | 25.4 ± 0.8 (21.4–31.4) | 5.02 ± 0.15 | 105.9 ± 13.9 | 2.19 ± 0.26 |
| NW-C (n = 11) | 42.8 ± 5.0 (22–66) | 21.3 ± 0.9 (18–24) | 4.81 ± 0.11 | 79.6 ± 7.0 | 2.35 ± 0.13 |
| OW-C (n = 8) | 45.0 ± 4.7 (26–62) | 29.0 ± 0.8 (25.5–31.6) | 5.0 ± 0.22 | 112.5 ± 20.5 | 2.65 ± 0.35 |
| Total-C (n = 19) | 43.7 ± 3.4 (22–66) | 24.5 ± 1.0 (18–31.6) | 4.89 ± 0.11 | 15.5 ± 1.6 | 2.47 ± 0.16 |

a Numbers in parentheses are range values.
b Data presented as mean plus or minus the standard error of the mean.
BMI = body mass index; NW = normal weight; OW = overweight; HT = hyperthyroid patients; C = control subjects.
Source: Reproduced, with permission, from Table 1 in Gonzalo MA, Grant C, Moreno I, Garcia FJ, Suarez AI, Herrera-Pombo JL, et al: Glucose tolerance, insulin secretion, insulin sensitivity and glucose effectiveness in normal and overweight hyperthyroid women. Clin Endocrinol 1996;45:689–697.

EXERCISES

For Questions 1–65, choose the single best answer.

1. Table 2 from Hodgson and Cutler (1997) is reproduced in Table 13-3. Which variable is most closely associated with anticipatory dementia?

a. CES-D

b. Psych symptoms

c. Life satisfaction

d. Health status

Questions 2–3

2. Henderson and colleagues (1997) studied a community sample of elderly subjects for a period of 3–4 years. One of their goals was to predict the level of depression, as measured by a depression score, at the end of the study period. Table 13-4 gives the list of measures used in their analysis. What type of statistical method was used to produce this information?

a. Multiple regression

b. Logistic regression

c. Cox proportional hazard model

d. Paired t tests

e. Wilcoxon signed rank test

3. Which set of variables increased the prediction of depression score by the largest amount?

a. Sociodemographic variables

b. Age

c. Psychological health variables

d. Physical health variables

e. ADL at wave 2

4. In a study of mortality from selected diseases in England and Wales by Barker (1989), the author stated that the trend in deaths from thyrotoxicosis rose to a peak in the 1930s and declined thereafter. A graph from this investigation is given in Figure 13-2. Based on this figure, which of the following statements is true?

a. Iodine deficiency during childhood was not a problem among people born in Britain in the 1800s.

b. Age-specific rates of thyrotoxicosis rose progressively beginning in 1836 and reached a peak in 1880.

c. People who are iodine-deficient in youth are more able to adapt to increased iodine intake in later life.

d. People exposed to increased levels of iodine during their adult years are more likely to develop thyrotoxicosis.

Questions 5–6

In a sample of 49 individuals, the mean total leukocyte count is found to be 7600 cells/mm³, with a standard deviation of 1400 cells/mm³.

5. If it is reasonable to assume that total leukocyte counts follow a normal distribution, then approximately 50% of the individuals will have a value

a. Between 6200 and 9000

b. Between 7400 and 7800

c. Below 6200 or above 9000

d. Below 7600

e. Above 9000

6. Again assuming a normal distribution of total leukocyte counts, a randomly selected individual has a total leukocyte count lower than 4800 cells/mm³

a. 1% of the time

b. 2.5% of the time

c. 5% of the time

d. 10% of the time

e. 16.5% of the time
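Readers who want to check their reasoning for Questions 5–6 can compute areas under the normal curve directly. The sketch below uses Python's standard `statistics.NormalDist` with the mean and standard deviation given above. The helper name `fraction_between` is ours, and the calculation treats the sample mean and standard deviation as if they were the population values.

```python
from statistics import NormalDist

# Leukocyte counts modeled as normal with the sample mean and SD from the text.
counts = NormalDist(mu=7600, sigma=1400)


def fraction_between(low: float, high: float) -> float:
    """Proportion of the distribution lying between `low` and `high`."""
    return counts.cdf(high) - counts.cdf(low)


print(round(fraction_between(6200, 9000), 3))  # within 1 SD of the mean
print(round(counts.cdf(7600), 3))              # below the mean
print(round(counts.cdf(4800), 3))              # more than 2 SD below the mean
```

The same two calls, `cdf` and a difference of `cdf` values, cover every option listed in the two questions.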

7. If the correlation between two measures of functional status is 0.80, we can conclude that

a. The value of one measure increases by 0.80 when the other measure increases by 1.

b. 64% of the observations fall on the regression line.

c. 80% of the observations fall on the regression line.

d. 80% of the variation in one measure is accounted for by the other.

e. 64% of the variation in one measure is accounted for by the other.

Figure 13-2. Relative mortality from thyrotoxicosis in successive generations of women in England and Wales according to year of birth, together with estimated per capita daily iodine intake from milk, meat, and fish. (Reproduced, with permission, from Figure 2 in Barker DJP: Rise and fall of Western diseases. Nature 1989; 338: 371–372.)

Questions 8–10

An evaluation of an antibiotic in the treatment of possible occult bacteremia was undertaken. Five hundred children with fever but no focal infection were randomly assigned to the antibiotic or to a placebo. A blood sample for culture was obtained prior to beginning therapy, and all patients were reevaluated after 48 h.

8. The design used in this study is best described as a

a. Randomized clinical trial

b. Placebo-controlled trial

c. Controlled clinical trial

d. Cohort study

e. Crossover study

9. The authors reported the proportion of children with major infectious morbidity among those with bacteremia was 13% in the placebo group and 10% in the antibiotic group. The 95% confidence interval for the difference in proportions was -2.6% to +8.6%. Thus, the most important conclusion is that

a. The difference in major infectious morbidity between placebo and antibiotic is statistically significant.

b. The proportion of children with major infectious morbidity is the same with placebo and antibiotic.

c. No statistically significant difference exists in the proportions that received placebo and antibiotic.

d. The study has low power to detect a difference owing to the small sample size, and no conclusions should be drawn until a larger study is done.

e. Using a chi-square test to determine significance is preferable to determining a confidence interval for the difference.

Figure 13-3. Box plots showing median levels of B-type natriuretic peptide among patients in each of the four New York Heart Association classes. Boxes show interquartile ranges, and I bars represent highest and lowest values. (Reproduced, with permission, from Figure 2 in Maisel AS, Krishnaswamy P, Nowak RM, McCord J, Hollander JE, Duc P, et al: Rapid measurement of B-type natriuretic peptide in the emergency diagnosis of heart failure. N Engl J Med 2002; 347: 161–167.)

10. What is the approximate number needed to treat to prevent one patient from developing occult bacteremia?

a. 15

b. 6.7 or 7

c. 65

d. 3

e. 33.3 or 33
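The number needed to treat is the reciprocal of the absolute risk reduction. A minimal Python sketch of that calculation follows, using the two proportions reported in Question 9 purely as illustrative inputs; the function name is ours.

```python
def number_needed_to_treat(control_risk: float, treated_risk: float) -> float:
    """Reciprocal of the absolute risk reduction (control risk minus treated risk)."""
    absolute_risk_reduction = control_risk - treated_risk
    if absolute_risk_reduction <= 0:
        raise ValueError("Treatment shows no risk reduction; NNT is undefined.")
    return 1.0 / absolute_risk_reduction


# Proportions with major infectious morbidity: 13% on placebo, 10% on antibiotic.
print(round(number_needed_to_treat(0.13, 0.10), 1))
```

Because the confidence interval for the difference in Question 9 includes zero, any number needed to treat computed from these proportions is itself very imprecise.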

Questions 11–12

Maisel and colleagues (2002) conducted a multinational trial to evaluate the use of B-type natriuretic peptide measurements in the diagnosis of congestive heart failure (CHF). Patients complaining of shortness of breath were evaluated in the emergency department; physicians assessed the probability that the patient had CHF without knowledge of the B-type natriuretic peptide results. Refer to Figure 13-3 for box plots of the distributions of B-type natriuretic peptide (pg/mL) for patients in the four New York Heart Association functional classes (based on limitations in physical activity, fatigue, and dyspnea).

11. What is the estimated median B-type natriuretic peptide level for patients in class II?

a. >1000

b. Approximately 500

c. Approximately 300

d. 500 – 100 = 400

e. 1000 – 50 = 950

12. The statistical method most appropriate to analyze the means of the four groups is

a. Analysis of variance

b. Correlation

c. Independent groups t test

d. Simple linear regression

e. Trend analysis

Questions 13–16

A study of fluctuation in body weight and health outcomes using data on participants in the Framingham study was undertaken (Lissner et al, 1991). The investigators used body mass index (BMI) values from the first eight biennial examinations during the study plus the subject's recalled weight at age 25 to determine three measures for each subject: mean BMI, linear trend of BMI, and coefficient of variation. Table 13-5 presents the regression coefficients and P values for level, slope, and coefficient of variation (CV) of BMI in predicting total mortality, morbidity and mortality from coronary heart disease, and morbidity from cancer for men and women separately.

13. The dependent variable best predicted for men was

a. Total mortality

b. Morbidity due to coronary heart disease (CHD)

c. Mortality from CHD

d. Morbidity due to cancer

e. Impossible to determine

14. The measure of BMI most important in predicting outcomes for women was

a. Level of BMI

b. Slope of BMI

c. CV of BMI

d. Impossible to determine

15. The BMI variable with the widest 95% confidence interval is

a. Level of BMI for morbidity due to CHD in men

b. Slope of BMI for predicting total mortality in women

c. CV of BMI for predicting morbidity due to cancer in women

d. CV of BMI for predicting total mortality in men

e. Impossible to determine

16. The legend to Table 13-5 says that these results are for a model that included five risk factors in addition to age and the three BMI variables. The main purpose for using this model is to

a. Include all relevant risk factors

b. Determine the effect of BMI variables controlling for risk factors

c. Determine if differences exist between men and women once risk factors are included

d. Determine if the risk factors are important

17. Table 1 from the study of glucose tolerance and insulin sensitivity in normal and overweight hyperthyroid women is reproduced below in Table 13-6 (Gonzalo et al, 1996). Which statistical procedure is best if the authors wanted to compare the baseline glucose (mmol/L) in the women to see if weight or thyroid level have an effect?

a. Correlation

b. Independent-groups t test

c. Paired t test

d. One-way ANOVA

e. Two-way ANOVA

18. If the relationship between two measures is linear and the coefficient of determination has a value near 1, a scatterplot of the observations

a. Is a horizontal straight line

b. Is a vertical straight line

c. Is a straight line that is neither horizontal nor vertical

d. Is a random scatter of points about the regression line

e. Has a positive slope

Questions 19–22

A study was undertaken to evaluate the use of computed tomography (CT) in the diagnosis of lumbar disk herniation. Eighty patients with lumbar disk herniation confirmed by surgery were evaluated with CT, as were 50 patients without herniation. The CT results were positive in 56 of the patients with herniation and in 10 of the patients without herniation.

19. The sensitivity of CT for lumbar disk herniation in this study is

a. 10/50, or 20%

b. 24/80, or 30%

c. 56/80, or 70%

d. 40/50, or 80%

e. 56/66, or 85%

20. The false-positive rate in this study is

a. 10/50, or 20%

b. 24/80, or 30%

c. 56/80, or 70%

d. 40/50, or 80%

e. 56/66, or 85%

21. Computed tomography is used in a patient who has a 50–50 chance of having a herniated lumbar disk, according to the patient's history and physical examination. What are the chances of herniation if the CT is positive?

a. 35/100, or 35%

b. 50/100, or 50%

c. 35/50, or 70%

d. 40/55, or 73%

e. 35/45, or 78%

22. The likelihood ratio is

a. 0.28

b. 0.875

c. 1.0

d. 3.5

e. 7.0
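The quantities asked for in Questions 19–22 all come from the same 2 × 2 table. The sketch below is one hedged way to organize the arithmetic in Python; the function names are ours, "likelihood ratio" is taken to mean the likelihood ratio for a positive test, and the post-test probability is obtained by converting the pretest probability to odds.

```python
def summarize_test(true_pos: int, false_neg: int, false_pos: int, true_neg: int):
    """Return sensitivity, false-positive rate, and positive likelihood ratio."""
    sensitivity = true_pos / (true_pos + false_neg)
    false_positive_rate = false_pos / (false_pos + true_neg)
    likelihood_ratio = sensitivity / false_positive_rate
    return sensitivity, false_positive_rate, likelihood_ratio


def posttest_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Pretest probability -> pretest odds -> post-test odds -> probability."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1 + posttest_odds)


# CT study above: 56 of 80 patients with herniation and 10 of 50 without
# herniation had a positive CT.
sens, fpr, lr = summarize_test(true_pos=56, false_neg=24, false_pos=10, true_neg=40)
print(round(sens, 2), round(fpr, 2), round(lr, 2))
print(round(posttest_probability(0.5, lr), 3))
```

Working through odds rather than a second 2 × 2 table makes it easy to repeat the calculation for any other pretest probability.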

23. In a placebo-controlled trial of the use of oral aspirin–dipyridamole to prevent arterial restenosis after coronary angioplasty, 38% of patients receiving the drug had restenosis, and 39% of patients receiving placebo had restenosis. In reporting this finding, the authors stated that P > 0.05, which means that

a. Chances are greater than 1 in 20 that a difference would again be found if the study were repeated.

b. The probability is less than 1 in 20 that a difference this large could occur by chance alone.

c. The probability is greater than 1 in 20 that a difference this large could occur by chance alone.

d. Treated patients were 5% less likely to have restenosis.

e. The chance is 95% that the study is correct.

24. A study of the relationship between the concentration of lead in the blood and hemoglobin resulted in the following prediction equation: Y = 15 – 0.1(X), where Y is the predicted hemoglobin and X is the concentration of lead in the blood. From the equation, the predicted hemoglobin for a person with blood lead concentration of 20 μg/dL is

a. 13

b. 14.8

c. 14.9

d. 15

e. 20

Questions 25–26

25. D'Angio and colleagues (1995) studied a group of extremely premature infants to learn whether they have immunologic responses to tetanus toxoid and polio vaccines that are similar to the response of full-term infants. Figure 1 from their study contains the plots of the antitetanus titers before and after the vaccine was given (reproduced in Figure 13-4). The authors calculated the geometric means. Which statistical test is best to learn if the titer level increases in the preterm group after they were given the vaccine?

Figure 13-4. Antitetanus toxoid antibody levels. PT = preterm; FT = full term; GMT = geometric mean titer. * indicates P < 0.001. (Reproduced, with permission, from Figure 1 in D'Angio CT, Maniscalco WM, Pichichero ME: Immunologic response of extremely premature infants to tetanus, Haemophilus influenzae, and polio immunizations. Pediatrics 1995; 96: 18–22.)

a. One-group t test

b. Paired t test

c. Wilcoxon signed rank test

d. Two independent-groups t test

e. Wilcoxon rank sum test

26. Based on Figure 13-4, which is the appropriate conclusion?

a. No difference exists between preterm and full-term infants before and after the vaccine.

b. No difference exists between preterm and full-term infants before the vaccine, but a difference does occur after the vaccine.

c. A difference exists between preterm and full-term infants before and after the vaccine.

d. A difference exists between preterm and full-term infants before the vaccine but not after.

Questions 27–28

Figure 13-5 summarizes the gender-specific distribution of values on a laboratory test.

27. From Figure 13-5, we can conclude that

a. Values on the laboratory test are lower in men than in women.

b. The distribution of laboratory values in women is bimodal.

c. Laboratory values were reported more often for women than for men in this study.

d. Half of the men have laboratory values between 30 and 43.

e. The standard deviation of laboratory values is equal in men and women.

28. The most appropriate statistical test to compare the distribution of laboratory values in men with that in women is

a. Chi-square

b. Paired t test

c. Independent-groups t test

d. Correlation

e. Regression

29. A study was undertaken to evaluate any increased risk of breast cancer among women who use birth control pills. The relative risk was calculated. A type I error in this study consists of concluding

a. A significant increase in the relative risk when the relative risk is actually 1

b. A significant increase in the relative risk when the relative risk is actually greater than 1

c. A significant increase in the relative risk when the relative risk is actually less than 1

d. No significant increase in the relative risk when the relative risk is actually 1

e. No significant increase in the relative risk when the relative risk is actually greater than 1

Figure 13-5. Gender-specific distributions of values on a laboratory test.

Questions 30–31

30. A graph of the lowest oxyhemoglobin percentage in the patients studied overnight in the study by Good and colleagues (1996) is given in Figure 1 from their article (see Figure 13-6). What is the best way to describe the distribution of these values?

a. Normal distribution

b. Chi-square distribution

c. Binomial distribution

d. Negatively skewed distribution

e. Positively skewed distribution

31. If the investigators wanted to compare oxyhemoglobin levels in patients who died within 12 months with the levels of oxyhemoglobin percentage in patients who survived, what method should they use?

a. t test for two independent groups

b. Wilcoxon rank sum test

c. Analysis of variance

d. Chi-square test

e. Kaplan–Meier curves

32. The scale used in measuring cholesterol is

a. Nominal

b. Ordinal

c. Interval

d. Discrete

e. Qualitative

Figure 13-6. Distribution of lowest oxyhemoglobin (OxyHb) values for each patient during overnight oximetry. (Reproduced, with permission, from Figure 1 in Good DC, Henkle JQ, Gelber D, Welsh J, Verhulst S: Sleep-disordered breathing and poor functional outcome after stroke. Stroke 1996; 27: 252–259.)

33. The scale used in measuring presence or absence of a risk factor is

a. Binary

b. Ordinal

c. Interval

d. Continuous

e. Quantitative

34. Which of the following sources is most likely to provide an accurate estimate of the prevalence of multiple sclerosis (MS) in a community?

a. A survey of practicing physicians asking how many MS patients they are currently treating

b. Information from hospital discharge summaries

c. Data from autopsy reports

d. A telephone survey of a sample of randomly selected homes in the community asking how many people living in the home have the disease

e. Examination of the medical records of a representative sample of people living in the community

Questions 35–36

In an epidemiologic study of carbon-black workers, 500 workers with respiratory disease and 200 workers without respiratory disease were selected for study. The investigators obtained a history of exposure to carbon-black dust in both groups of workers. Among workers with respiratory disease, 250 gave a history of exposure to carbon-black dust; among the 200 workers without respiratory disease, 50 gave a history of exposure.

35. The odds ratio is

a. 1.0

b. 1.5

c. 2.0

d. 3.0

e. Cannot be determined from the preceding information

Figure 13-7. A: Distribution of normative values for heart rate variation to deep breathing (VAR) and B: Valsalva ratio (VAL) for the entire study population. (Reproduced, with permission, from Figure 1 in Gelber DA, Pfeifer M, Dawson B, Shumer M: Cardiovascular autonomic nervous system tests: Determination of normative values and effect of confounding variables. J Auton Nerv Syst 1997; 62: 40–44.)

36. This study is best described as a

a. Case–control study

b. Cohort study

c. Cross-sectional study

d. Controlled experiment

e. Randomized clinical trial
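For the carbon-black data in Questions 35–36, the odds ratio is the cross-product ratio of the exposure-by-disease table. A small Python sketch, with argument names of our own choosing, lays out the four cells explicitly:

```python
def odds_ratio(exposed_diseased: int, unexposed_diseased: int,
               exposed_healthy: int, unexposed_healthy: int) -> float:
    """Cross-product ratio: odds of exposure among the diseased divided by
    the odds of exposure among those without disease."""
    return (exposed_diseased * unexposed_healthy) / (unexposed_diseased * exposed_healthy)


# 500 workers with respiratory disease (250 exposed, so 250 unexposed);
# 200 workers without disease (50 exposed, so 150 unexposed).
print(odds_ratio(exposed_diseased=250, unexposed_diseased=250,
                 exposed_healthy=50, unexposed_healthy=150))
```

Writing out the two unexposed counts, which the question leaves implicit, is the step most often skipped when this calculation is done by hand.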

37. A physician wishes to study whether a particular risk factor is associated with some disease. If, in reality, the presence of the risk factor leads to a relative risk of disease of 4.0, the physician wants to have a 95% chance of detecting an effect this large in the planned study. This statement is an illustration of specifying

a. A null hypothesis

b. A type I, or alpha, error

c. A type II, or beta, error

d. Statistical power

e. An odds ratio

Table 13-7. Results of test.

 

| | CHD Present | CHD Absent |
| --- | --- | --- |
| Positive test | 80 | 50 |
| Negative test | 20 | 150 |

CHD = coronary heart disease.

38. The most likely explanation for a lower crude annual mortality rate in a developing country than in a developed country is that the developing country has

a. An incomplete record of deaths

b. A younger age distribution

c. An inaccurate census of the population

d. A less stressful life-style

e. Lower exposure to environmental hazards

Figure 13-8. Box plots of the mean SMAF score and subscores at baseline (T1) according to sex, among a representative sample of elderly people living at home in Sherbrooke, Canada, 1991–1993. SMAF = Functional Autonomy Measurement System; ADL = activities of daily living; IADL = instrumental activities of daily living. (Reproduced, with permission, from Figure 2 in Hébert R, Brayne C, Spiegelhalter D: Incidence of functional decline and improvement in a community-dwelling very elderly population. Am J Epidemiol 1997; 145: 935–944.)

Questions 39–40

39. The distribution of variation in heart rate to deep breathing and the Valsalva ratio from the study by Gelber and colleagues (1997) is given in Figure 1 in their article and is reproduced below in Figure 13-7. What is the best way to describe the distribution of these values?

a. Normal distribution

b. Chi-square distribution

c. Binomial distribution

d. Negatively skewed distribution

e. Positively skewed distribution

40. What is the best method to find the normal range of values for the Valsalva ratio?

a. The values determined by the mean ± 2 standard deviations

b. The values determined by the mean ± 2 standard errors

c. The values determined by the upper and lower 5% of the distribution

d. The values determined by the upper and lower 2½% of the distribution

e. The highest and lowest values defined by the whiskers on a box plot

41. A study of the exercise tolerance test for detecting coronary heart disease had the characteristics shown in Table 13-7. The authors concluded the exercise stress test has positive predictive value in excess of 75%. The error in this conclusion is that

a. Not enough patients are included in the study for precise estimates.

b. The sample sizes should be equal in these kinds of studies.

c. The positive predictive value is really 80%.

d. The prevalence of coronary heart disease (CHD) is assumed to be 33%.

e. No gold standard exists for CHD.
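Question 41 turns on the fact that a predictive value computed from a study table silently adopts the study's own mix of diseased and nondiseased subjects. The Python sketch below, which is ours and uses hypothetical prevalence values for illustration, takes the sensitivity and specificity implied by Table 13-7 and recomputes the positive predictive value with Bayes' theorem at several prevalences.

```python
def positive_predictive_value(sensitivity: float, specificity: float,
                              prevalence: float) -> float:
    """Bayes' theorem: P(disease | positive test)."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)


# Table 13-7: 80 of 100 patients with CHD and 50 of 200 without CHD tested positive.
sensitivity = 80 / (80 + 20)
specificity = 150 / (150 + 50)

# The value 1/3 is the proportion with CHD in the table; the others are hypothetical.
for prevalence in (0.05, 0.10, 1 / 3, 0.50):
    ppv = positive_predictive_value(sensitivity, specificity, prevalence)
    print(f"prevalence {prevalence:.2f} -> PPV {ppv:.3f}")
```

The predictive value rises steadily with prevalence, so a figure quoted without the prevalence it assumes cannot be carried over to another patient population.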

Questions 42–44

42. Hébert and colleagues (1997) studied functional decline in a very elderly population of community-dwelling subjects. They used the Functional Autonomy Measurement System (SMAF) questionnaire to measure several indices of functioning. Figure 2 from their article displays box plots of scores for men and women at baseline and is reproduced in Figure 13-8. Based on the plots, men and women had the most similar scores on which measure?

a. ADL (activities of daily living)

b. Mobility

c. Communication

d. Mental

e. IADL (instrumental activities of daily living)

f. Total score

g. Impossible to tell

43. Among men, which measure has the most symmetric distribution?

a. ADL (activities of daily living)

b. Mobility

c. Communication

d. Mental

e. IADL (instrumental activities of daily living)

f. Total score

g. Impossible to tell

44. What is the median mental score for women?

a. 0

b. 0.15

c. 0.3

d. 0.7

e. Impossible to tell

Questions 45–46

Several studies have shown that men with a low blood cholesterol level as well as those with high levels have an increased risk of early death. A report in the March 21, 1993, New York Times described a study by Dr Carlos Iribarren at the University of Southern California on a cohort of 8000 Japanese-American men followed for 23 years. The purpose was to study further the link between low blood cholesterol levels and higher death rate. The men were divided into four groups: healthy men, men with chronic disorders of the stomach or liver, heavy smokers, and heavy drinkers. Within each group, subjects were stratified according to their cholesterol level. Among men with cholesterol levels below 189 mg/dL, death rates were higher in the chronic illness, heavy smoker, and heavy drinker groups but not in the group of healthy men.

45. This study highlights a potential threat that may have occurred in previous studies that suggested that low cholesterol is linked to higher death rates. The most likely threat in the previous studies is

a. Selection bias

b. Length of follow-up

c. A confounding variable

d. Inappropriate sample size

46. Using the guidelines in Table 10-2, select the most appropriate statistical method to analyze the data collected in this study.

a. Mantel–Haenszel chi-square

b. ANOVA

c. Multiple regression

d. Log-linear methods

Questions 47–48

47. Figure 2 in Dennison and colleagues (1997) shows the relationship between the age of children and daily consumption of fruit juice (see Figure 13-9). What is the best method to learn if a difference exists between 2-year-old and 5-year-old children?

a. Chi-square test

b. Fisher's exact test

c. Fisher's z distribution

d. Paired t test

e. Pearson correlation

Figure 13-9. The prevalence of children with decreased stature (height less than 20th age- and sex-specific percentile) between children drinking less than 12 fl oz/day of fruit juice and children drinking more than 12 fl oz/day of fruit juice. * Indicates Fisher's exact test P < 0.01. (Reproduced, with permission, from Figure 2 in Dennison BA, Rockwell HL, Baker SL: Excess fruit juice consumption by preschool-aged children is associated with short stature and obesity. Pediatrics 1997; 99: 15–22.)

48. What is the best way to describe the magnitude of the relationship between the age of children and daily consumption of fruit juice?

a. Kappa

b. Odds ratio

c. Pearson correlation

d. Spearman correlation

e. r²

49. Birth weights of a population of infants at 40 weeks gestational age are approximately normally distributed, with a mean of 3000 g. Roughly 68% of such infants weigh between 2500 and 3500 g at birth. If a sample of 100 infants were studied, the standard error would be

a. 50

b. 100

c. 250

d. 500

e. None of the above
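Question 49 combines two facts: about 68% of a normal distribution lies within one standard deviation of the mean, and the standard error of the mean is the standard deviation divided by the square root of the sample size. A brief Python sketch of that chain of reasoning, with a function name of our own:

```python
import math


def standard_error(standard_deviation: float, sample_size: int) -> float:
    """Standard error of the mean: SD divided by the square root of n."""
    return standard_deviation / math.sqrt(sample_size)


# Roughly 68% of birth weights fall between 2500 and 3500 g, so that
# interval spans one SD on either side of the 3000 g mean.
sd = (3500 - 2500) / 2
print(sd, standard_error(sd, 100))
```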

50. A significant positive correlation has been observed between alcohol consumption and the level of systolic blood pressure in men. From this correlation, we may conclude that

a. No association exists between alcohol consumption and systolic pressure.

b. Men who consume less alcohol are at lower risk for increased systolic pressure.

c. Men who consume less alcohol are at higher risk for increased systolic pressure.

d. High alcohol consumption can cause increased systolic pressure.

e. Low alcohol consumption can cause increased systolic pressure.

51. In a randomized trial of patients who received a cadaver renal transplant, 100 were treated with cyclosporin and 50 were treated with conventional immunosuppression therapy. The difference in treatments was not statistically significant at the 5% level. Therefore

a. This study has proven that cyclosporin is not effective.

b. Cyclosporin could be significant at the 1% level.

c. Cyclosporin could be significant at the 10% level.

d. The groups have been shown to be the same.

e. The treatments should not be compared because of the differences in the sample sizes.

52. The statistical method used to develop guidelines for diagnosis-related groups (DRGs) was

a. Survival analysis

b. t tests for independent groups

c. Multiple regression

d. Analysis of covariance

e. Correlation

53. Suppose the confidence limits for the mean of a variable are 8.55 and 8.65. These limits are

a. Less precise but have a higher confidence than 8.20 and 9.00

b. More precise but have a lower confidence than 8.20 and 9.00

c. Less precise but have a lower confidence than 8.20 and 9.00

d. More precise but have a higher confidence than 8.20 and 9.00

e. Indeterminate because the level of confidence is not specified

54. A senior medical student wants to plan her elective schedule. The probability of getting an elective in endocrinology is 0.8, and the probability of getting an elective in sports medicine is 0.5. The probability of getting both electives in the same semester is 0.4. What is the probability of getting into endocrinology or sports medicine or both?

a. 0.4

b. 0.5

c. 0.8

d. 0.9

e. 1.7
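Question 54 is an application of the addition rule for events that are not mutually exclusive. The sketch below, written by us with Python's `fractions` module to avoid floating-point rounding, subtracts the overlap so that it is not counted twice.

```python
from fractions import Fraction


def probability_of_union(p_a: Fraction, p_b: Fraction, p_both: Fraction) -> Fraction:
    """Addition rule: P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_both


# Endocrinology 0.8, sports medicine 0.5, both in the same semester 0.4.
p = probability_of_union(Fraction(8, 10), Fraction(5, 10), Fraction(4, 10))
print(p, float(p))
```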

55. A manager of a multispecialty clinic wants to determine the proportion of patients who are referred by their primary care physician to another physician within the group versus a physician outside the group. Every 100th patient chart is selected and reviewed for a letter from a physician other than the primary care physician. This procedure is best described as

a. Random sampling

b. Stratified sampling

c. Quota sampling

d. Representative sampling

e. Systematic sampling

56. The probability is 0.6 that a medical student will receive his first choice of residency programs. Four senior medical students want to know the probability that they all will obtain their first choice. The solution to this problem is best found by using

a. The binomial distribution

b. The normal distribution

c. The chi-square distribution

d. The z test

e. Correlation
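For Question 56, the probability that all four students succeed can be written as a binomial probability. The following Python sketch is ours and assumes, as the question implicitly does, that the four outcomes are independent and share the same success probability.

```python
from math import comb


def binomial_probability(successes: int, trials: int, p_success: float) -> float:
    """P(exactly `successes` successes in `trials` independent trials)."""
    return (comb(trials, successes)
            * p_success ** successes
            * (1 - p_success) ** (trials - successes))


# Four students, each with probability 0.6 of obtaining a first-choice residency.
print(round(binomial_probability(4, 4, 0.6), 4))
```

Summing `binomial_probability(k, 4, 0.6)` over k = 0 through 4 returns 1 (up to rounding), a quick check that the function describes a complete probability distribution.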

57. The graph in Figure 13-10 gives the 5-year survival rates for patients with cancer at various sites. From this figure, we can conclude

a. That breast and uterine corpus cancer are increasing at higher rates than other cancers

b. That lung cancer is the slowest growing cancer

c. That few patients are diagnosed with distant metastases

d. That survival rates for patients with regional involvement are similar, regardless of the primary site of disease

e. None of the above

Table 13-8. Diagnostic test characteristics of electrocardiographic criteria for acute myocardial infarction.a

| ECG Characteristic | Sensitivity, % (95% CI) | Specificity, % (95% CI) | Positive Predictive Value, % (95% CI) | Negative Predictive Value, % (95% CI) |
| --- | --- | --- | --- | --- |
| ST elevation ≥1 mm in concordant leadsᵇ | 7 (1–21) | 100 (95–100) | 100 (16–100) | 71 (61–80) |
| ST depression ≥1 mm in leads V1, V2, or V3ᵇ | 3 (0–17) | 100 (95–100) | 100 (2–100) | 71 (61–79) |
| ST elevation ≥5 mm in discordant leadsᵇ | 19 (7–37) | 82 (71–90) | 32 (13–57) | 70 (59–80) |
| Overall ECG algorithm | 10 (2–26) | 100 (96–100) | 100 (29–100) | 72 (62–81) |
| QRS notching | 39 (22–58) | 57 (45–59) | 28 (15–44) | 68 (55–80) |
| RS complex in lead V6 | 26 (12–45) | 79 (68–88) | 35 (16–57) | 71 (60–81) |
| Sign of Cabrera | 7 (1–21) | 86 (76–93) | 17 (2–48) | 68 (58–78) |
| ST elevation ≥7 mm in discordant lead, or ≥2 mm depression in concordant lead | 3 (0–17) | 99 (93–100) | 50 (1–100) | 70 (60–79) |
| Positive T waves in lead with upright QRS complex | 3 (0–17) | 93 (85–98) | 17 (0–64) | 69 (59–78) |
| Sign of Chapman | 3 (0–17) | 92 (83–97) | 14 (3–58) | 68 (58–78) |

aApplies to patients with acute cardiopulmonary symptoms and left bundle-branch block among 31 patient presentations with myocardial infarction and 72 without.
b Criteria used in algorithm of Sgarbossa et al.
ECG = electrocardiogram; CI = confidence interval.
Source: Reproduced, with permission, from Table 3 from Shlipak MG, Lyons WL, Go AS, Chou TM, Evans GT, Browner WS: Should the electrocardiogram be used to guide therapy for patients with left bundle-branch block and suspected myocardial infarction? JAMA 1999;281:714–719.
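
As a rough illustration of how the four measures reported in Table 13-8 relate to a 2 × 2 table, the following Python sketch recomputes them for the overall ECG algorithm. The cell counts are not given in the table; they are back-calculated assumptions, chosen to agree with the reported percentages and with the 31 presentations with MI and 72 without that the footnote mentions. The function name is hypothetical.

```python
def diagnostic_measures(tp, fn, fp, tn):
    """Return sensitivity, specificity, PPV, and NPV from 2 x 2 cell counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positives among those with disease
        "specificity": tn / (tn + fp),  # true negatives among those without disease
        "ppv": tp / (tp + fp),          # diseased among those testing positive
        "npv": tn / (tn + fn),          # disease-free among those testing negative
    }


# Assumed counts (not from the source): 3 of 31 MI presentations test positive,
# and none of the 72 presentations without MI test positive.
measures = diagnostic_measures(tp=3, fn=28, fp=0, tn=72)

for name, value in measures.items():
    print(f"{name}: {value:.0%}")
```

Under these assumed counts, the rounded values agree with the point estimates in the overall ECG algorithm row; the confidence intervals are not reproduced here.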

Figure 13-10. Five-year survival rates for patients with cancer. (Adapted and reproduced, with permission, from Rubin P [editor]: Clinical Oncology: A Multidisciplinary Approach, 6th ed. American Cancer Society, 1983.)

Questions 58–59

58. Table 3 from Shlipak and colleagues (1999) provides diagnostic test characteristics of electrocardiographic criteria for determining myocardial infarction (MI) (see Table 13-8). Which of the following five features is the best single feature for ruling in MI?

a. ST elevation ≥ 5 mm

b. Overall ECG algorithm

c. QRS notching

d. RS complex in lead V6

e. Sign of Cabrera

59. Which of the following five features is the best single feature for ruling out MI?

a. ST elevation ≥ 5 mm

b. Overall ECG algorithm

c. QRS notching

d. RS complex in lead V6

e. Sign of Cabrera

60. A study was undertaken to compare treatment options in black and white patients who are diagnosed as having breast cancer. The 95% confidence interval for the odds ratio for blacks being more likely to be untreated than whites was 1.1 to 2.5. The statement that most accurately describes the meaning of these limits is that

a. Ninety-five percent of the odds ratios fall within these limits.

b. Black women are up to 2.5 times more likely than whites to be untreated.

c. Ninety-five percent of the time blacks are more likely than whites to be untreated.

d. Blacks are 95 times more likely than whites to receive no treatment for breast cancer.

e. No difference exists in the treatment of black and white women.

Questions 61–63

61. In the study to identify risk factors for discontinuing dialysis, Bajwa and colleagues (1996) presented information on the sociodemographic characteristics of patients in the study (Table 2 from Bajwa is reproduced in Table 13-9). Which of these characteristics has the greatest statistical significance in identifying those who stopped versus those who continued on dialysis?

a. Sex

b. Age

c. Marital status

d. Treatment

e. Education

Table 13-9. Sociodemographic data on all patients.a

62. What statistical technique can the investigators use to control for differences in any of the sociodemographic characteristics in the two groups that might confound the statistical analysis of mortality?

a. Paired t test

b. Analysis of variance

c. Chi-square

d. Regression

e. Logistic regression

63. The results of the multivariate analysis to predict stopping dialysis from the study by Bajwa and colleagues (1996) are given in Table 13-10. All of the independent variables were coded as yes or no. The regression coefficient for severe pain is -1.20 and for comorbidity is 0.79. Why is the P value for comorbidity lower than the P value for severe pain?

a. The standard error for comorbidity is smaller.

b. The relationship between stopping dialysis and comorbidity is positive.

c. The relative risk associated with comorbidity is higher.

d. Comorbidity is a more objective measure.

Questions 64–65

Physicians wish to determine whether the emergency department (ED) at the local hospital is being overused by patients with minor health problems. A random sample of 5000 patients was selected and categorized by age and degree of severity of the problem that brought them to the ED; severity was measured on a scale of 1 to 3, with 3 being most severe; the results are given in Table 13-11.

64. The joint probability that a 3-year-old patient with a problem of high severity is selected for review is

a. 200/5000 = 0.04

b. 200/1600 = 0.125

c. 1600/5000 = 0.32

d. 1100/5000 = 0.22

e. 1600/5000 + 1100/5000 = 0.54

Table 13-10. Result of multivariate analysis.

| Variable | Regression Coefficient | SE | P | R | Relative Risk |
|---|---|---|---|---|---|
| No severe pain | -1.1975 | 0.4881 | 0.01 | -0.2007 | 0.3020 |
| Living with partner | -0.6614 | 0.3087 | 0.03 | -0.1612 | 0.5161 |
| Comorbidity | 0.7856 | 0.2673 | 0.003 | 0.2580 | 2.1937 |

Source: Reproduced, with permission, from Table 5 in Bajwa K, Szabo E, Kjellstrand CM: A prospective study of risk factors and decision making in discontinuation of dialysis. Arch Intern Med 1996;156:2571–2577.
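
The columns of a regression table such as Table 13-10 are often related in a simple way. The Python sketch below works through the "Living with partner" row under two assumptions that the source does not state: that the P value comes from a two-sided Wald test (coefficient divided by its standard error, referred to the standard normal distribution), and that the Relative Risk column is the exponentiated coefficient. The function name is hypothetical.

```python
import math


def wald_summary(coef, se):
    """Return the Wald z statistic, its two-sided P value, and exp(coef)."""
    z = coef / se
    # Two-sided tail area of the standard normal distribution beyond |z|
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p, math.exp(coef)


# Coefficient and SE for "Living with partner" as reported in Table 13-10
z, p, ratio = wald_summary(coef=-0.6614, se=0.3087)
print(f"z = {z:.2f}, P = {p:.3f}, exp(coef) = {ratio:.4f}")
```

For this row, the computed values are close to the tabled P and Relative Risk entries, which is consistent with the assumptions but does not confirm how the authors actually derived them.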

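Table 13-11. Patients seen in emergency department.

| Age (years) | Severity of Problem: Low | Severity of Problem: Medium | Severity of Problem: High | Total |
|---|---|---|---|---|
| <5 | 1100 | 300 | 200 | 1600 |
| 5–14 | 500 | 900 | 300 | 1700 |
| >14 | 600 | 500 | 600 | 1700 |
| Total | 2200 | 1700 | 1100 | 5000 |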
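
To show how joint, marginal, and conditional probabilities are read from a table like Table 13-11, the Python sketch below uses one cell (patients aged 5–14 with a problem of medium severity) purely as an illustration. The variable names are hypothetical; the counts are those in the table.

```python
# Counts from Table 13-11, keyed by age group and then by severity
counts = {
    "<5": {"low": 1100, "medium": 300, "high": 200},
    "5-14": {"low": 500, "medium": 900, "high": 300},
    ">14": {"low": 600, "medium": 500, "high": 600},
}

total = sum(sum(row.values()) for row in counts.values())
row_total = sum(counts["5-14"].values())

# Joint probability: cell count divided by the grand total
joint = counts["5-14"]["medium"] / total

# Marginal probability: row total divided by the grand total
marginal_age = row_total / total

# Conditional probability: cell count divided by the row total,
# that is, P(medium severity | age 5-14)
conditional = counts["5-14"]["medium"] / row_total

print(f"joint = {joint:.2f}, marginal = {marginal_age:.2f}, conditional = {conditional:.2f}")
```

The three quantities are linked by the multiplication rule: the joint probability equals the marginal probability times the conditional probability.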

Table 13-12. Logistic-regression model to predict aspirin therapy before infarction.

| Variable | Odds Ratio (95% CI) |
|---|---|
| PTCA before MI | 2.66 (1.57–4.51) |
| Catheterization before MI | 2.22 (1.59–3.10) |
| Previous MI | 1.95 (1.49–2.54) |
| CABG before MI | 1.80 (1.27–2.55) |
| White race | 1.47 (0.97–2.23) |
| Randomization after 1/28/88 | 1.43 (1.11–1.85) |
| Angina before MI | 1.22 (0.94–1.60) |
| Hypertension | 1.19 (0.95–1.50) |
| Male sex | 1.19 (0.86–1.64) |
| Married status | 1.03 (0.79–1.35) |
| Age | 1.02 (1.01–1.04) |
| Education after high school | 0.97 (0.77–1.22) |
| Orthopedic disease | 0.97 (0.55–1.69) |
| Type of hospital (academic vs community) | 0.92 (0.71–1.18) |
| Diabetes | 0.83 (0.62–1.08) |

CI = confidence interval; PTCA = percutaneous transluminal coronary angioplasty; MI = myocardial infarction; CABG = coronary-artery bypass grafting.
Source: Adapted and used, with permission, from Table 2 in Lamas GA, Pfeffer MA, Hamm P, Wertheimer J, Rouleau JL, Braunwald E: Do the results of randomized clinical trials of cardiovascular drugs influence medical practice? N Engl J Med 1992; 327:241–247.
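
A 95% confidence interval for an odds ratio that excludes 1 corresponds to statistical significance at the two-sided 0.05 level. The Python sketch below applies that check to two rows of Table 13-12, chosen only for illustration; the function and variable names are hypothetical.

```python
def excludes_one(lower, upper):
    """Return True when a confidence interval for an odds ratio excludes 1."""
    return lower > 1.0 or upper < 1.0


# 95% confidence limits copied from Table 13-12
examples = {
    "PTCA before MI": (1.57, 4.51),
    "Hypertension": (0.95, 1.50),
}

results = {name: excludes_one(lower, upper) for name, (lower, upper) in examples.items()}

for name, significant in results.items():
    verdict = "excludes 1" if significant else "includes 1"
    print(f"{name}: 95% CI {verdict}")
```

The same check can be applied to any other row of the table by supplying its confidence limits.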

65. If a patient comes to the ED with a problem of low severity, how likely is it that the patient is older than 14 years of age?

a. 1700/5000 × 2200/5000 = 0.15

b. 600/2200 = 0.27

c. 1700/5000 = 0.34

d. 600/1700 = 0.35

e. 2200/5000 = 0.44

Questions 66–70

These questions constitute a set of extended matching items. For each of the situations outlined here, select the most appropriate statistical method to use in analyzing the data from the choices a–i that follow. Each choice may be used more than once.

a. Independent-groups t test

b. Chi-square test

c. Wilcoxon rank sum test

d. Pearson correlation

e. Analysis of variance

f. Mantel–Haenszel chi-square

g. Multiple regression

h. Paired t test

i. Odds ratio

66. Investigating average body weight before and after a supervised exercise program

67. Investigating gender of the head of household in families of patients whose medical costs are covered by insurance, Medicaid, or self

68. Investigating a possible association between exposure to an environmental pollutant and miscarriage

69. Investigating blood cholesterol levels in patients who follow a diet either low or moderate in fat and who take either a drug to lower cholesterol or a placebo

70. Investigating physical functioning in patients with diabetes on the basis of demographic characteristics and level of diabetic control

Questions 71–75

These questions constitute a set of multiple true–false items. For each of the statements, determine whether the statement is true or false.

Table 13-12 contains the variables used by Lamas and colleagues (1992) to predict aspirin therapy before myocardial infarction (MI). Refer to the table to answer the following questions.

71. Patients who had had a previous MI were significantly more likely to take aspirin.

72. Race was a more significant predictor of aspirin therapy than age.

73. Older patients were significantly more likely to take aspirin.

74. Diabetic patients were significantly less likely to take aspirin.

75. The type of hospital was significantly associated with aspirin use.


