Ordinarily Well: The Case for Antidepressants


Better, Faster, Cheaper

IMIPRAMINE ENDED EPISODES of depression. The NIMH researchers defined “recovery” as a Hamilton score below 6. By this demanding standard—a score that low suggests solid mental health—over 40 percent of those who entered the imipramine arm recovered. Half as many subjects recovered with placebo. Whatever distortions the recruitment process produced—and it does seem that some participants had exaggerated their symptoms—overall, the recovery numbers suggested that the trial had attracted patients with substantial mood disorders.

But what if, instead of using imipramine to test the trial, we want to use the trial to test imipramine?

The NIMH study was designed to evaluate psychotherapy. Since imipramine was meant as a comparator, it was prescribed according to methods used in prior research where antidepressants succeeded. Clinicians were to push the dose up to a good level and hold it there. But the trials that served as models lasted six or eight weeks. To accommodate psychotherapy, the TDCRP lasted sixteen.

Jan Fawcett, the psychiatrist who wrote the manual for drug administration, later said that he had always entertained misgivings about the medication protocol. With the rigid plan, a research subject might be dosed heavily, show no response, suffer agitation and indigestion, and persist with those symptoms for months. Patients would be exposed to risk and discomfort, and imipramine’s score sheet would be blemished by bad outcomes rarely seen in settings where doctors can adjust regimens. As the trial proceeded, Fawcett recalled, participating pharmacologists became demoralized.

There are, so far as I know, no published accounts of how much medication patients in the TDCRP took, but worrisome indicators came to Fawcett’s attention.

Doctors were beginning to regulate patients’ imipramine doses according to blood levels, a practice that later became common. The intention was to keep the concentrations in a “therapeutic window,” a range in which medicine is most effective. A classic study found that patients with proper drug levels were 70 percent likelier to respond than those with readings below or above the window. Once the NIMH trial had ended, Fawcett learned that ten participants had reached toxic drug levels—enough imipramine to cause disturbances of electrical conduction in the heart. It was lucky that no one died. The blood levels ran almost triple what they should have. Given the level of overdosing in these ten patients, many more of the volunteers must have been at less alarming levels that were nonetheless outside—above—the window.
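The logic of a therapeutic window is simple enough to sketch. In the toy Python fragment below, the window bounds and the blood-level readings are invented for illustration; they are not imipramine's actual reference range.

```python
# Classify blood levels against a therapeutic window.
# The bounds below are illustrative placeholders, not clinical values.
WINDOW_LOW = 150   # hypothetical lower bound, ng/mL
WINDOW_HIGH = 300  # hypothetical upper bound, ng/mL

def classify(level_ng_ml):
    """Say where a blood level falls relative to the window."""
    if level_ng_ml < WINDOW_LOW:
        return "below window: likely ineffective dose"
    if level_ng_ml > WINDOW_HIGH:
        return "above window: risk of toxicity"
    return "within window: most likely to respond"

# Invented readings for three hypothetical patients.
for level in [95, 210, 880]:
    print(level, classify(level))
```

On this scheme, a reading of 880 would be flagged as roughly triple the upper bound—the sort of toxic level Fawcett learned about only after the trial ended.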

Donald Klein developed concerns that the trial’s design had caused problems at the other extreme—below the therapeutic window. Knowing that patients would be stuck at whatever dose their imipramine had been raised to, prudent clinicians might have prescribed cautiously for relatively healthy patients. Klein wrote, “It is quite possible the milder patients received ineffective, small doses.”

The trial’s method—drawing blood only late in the trial, withholding laboratory results from prescribers, and forbidding dose decreases—had the virtue of simplicity. If you provided blood levels to prescribers, would you offer them pretend data for the placebo group? The TDCRP avoided complexity and still got the wished-for confirmation: real patients. But if half the TDCRP participants had blood levels outside the therapeutic window, then the study results might understate imipramine’s benefits by a third. That’s a lot of efficacy left on the table—a lot of difference between what the trial reported and what imipramine does in doctors’ offices.

Sir Austin Bradford Hill had anticipated this dilemma, the need to choose between simplicity and full efficacy. He said, “It may well be asked, therefore, in the planning of a trial, which is the more important—for the doctor to be ignorant of the treatment and unbiased in his judgment or for him to know what he is doing and to be able to adjust what he is doing so as to observe closely the results and then make unbiased judgments to the best of his ability…”

The NIMH trial was blinded, but that precaution, meant to minimize bias, almost certainly skewed the results anyway, against medication.

If the study underestimated the value of imipramine, it may also have overstated the effects of placebo pills. I have mentioned problems that can occur at the start of a trial, when patients may exaggerate symptoms. Later, we will consider ways in which baseline score inflation has clouded our understanding of antidepressants. For now, let’s consider distortions that can occur as participants leave a study.

What drives people away? When researchers ask, patients give consistent answers. In the group on placebo, almost all who leave do so because the treatment isn’t working. On the drug side, dropouts for side effects predominate.

Placebo arms lose sick patients and retain healthier ones: people who get better spontaneously, people who were not depressed to begin with, and people who respond to minimal supportive psychotherapy. Describing what happens in the drug arm is more difficult. If participants know from experience that their depression tends to remit with, say, psychotherapy, even a mild drug side effect may cause them to seek help elsewhere. These dropouts may include the most naturally responsive patients in the sample. For some patients, drugs will work partial magic, relieving half their symptoms, say, and instilling hope of more improvement to follow—so that quite fragile and vulnerable patients are carried along.

When participants exit different arms of a trial for different reasons, the study is said to suffer from differential dropout. Differential dropout ruins randomization. The placebo arm will have retained patients who were resilient to start with. The medication arm will have retained a sicker group. The experiment is back where it would have been if the triage nurse had assigned healthy people to take placebo and gravely burdened ones to take medication. This distortion—dropout bias—can produce results that understate antidepressants’ worth.

The NIMH researchers worried in advance about differential dropout. A third of patients did quit early—but fully 45 percent of those given placebo. In the placebo group, patients who left the study tended to be those who had been severely depressed to start with.

Dropout bias tends to be greatest when researchers base their calculations on patients who stick with a trial, what is called a completer analysis. If only healthy patients persist in the placebo arm of a trial, a completer analysis will reverse reality, giving placebo credit for having driven off people who are still suffering.

In an alternative approach, an intention-to-treat analysis, researchers include results from all patients who enter a trial. Any who drop out are assigned the last Hamilton score recorded before they left. If a patient quits toward the start of a trial, her final Hamilton score will be the initial one—no improvement. The intention-to-treat analysis does not reward treatments for their failures.
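The two accounting methods can be made concrete. The sketch below, with invented Hamilton scores, contrasts a completer analysis with an intention-to-treat analysis that carries each dropout's last observed score forward.

```python
# Completer vs. intention-to-treat (last observation carried forward).
# Each patient is a list of Hamilton scores over four scheduled visits;
# shorter lists represent dropouts. All numbers are invented.

patients = [
    [28, 22, 14, 7],   # completed the trial, recovered
    [30, 26],          # dropped out early, still depressed
    [25, 20, 18, 12],  # completed, partial improvement
    [27],              # quit at once: no improvement recorded
]

def completer_mean_final(records):
    """Average final score among patients who finished all four visits."""
    finals = [r[-1] for r in records if len(r) == 4]
    return sum(finals) / len(finals)

def itt_mean_final(records):
    """Average final score for everyone, carrying the last score forward."""
    finals = [r[-1] for r in records]
    return sum(finals) / len(finals)

print(completer_mean_final(patients))  # only completers counted
print(itt_mean_final(patients))        # dropouts keep their exit scores
```

The completer average looks rosy because the two dropouts—both still quite depressed when they left—vanish from the arithmetic; the intention-to-treat average keeps them on the books.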

In the completer analysis, looking at severely depressed patients who stuck with the program, the NIMH trial found that, contrasted with placebo, imipramine was four times as likely to induce a recovery—good efficacy. In the intention-to-treat analysis, counting all severely depressed patients who started the trial, it found that imipramine was six or seven times as likely to induce a recovery—fantastic efficacy. Arguably, imipramine was much more helpful than the completer analysis had shown it to be.

Okay. Researchers decide whose results (everyone, or those who stick it out) to include in an analysis—but which results?

We can count Hamilton points. On average, did patients on imipramine shed more symptoms than patients on placebo?

Another choice is to count people or instances of improvement. How many people responded—that is, shed half their symptoms? How many recovered—that is, exited depression? Response and recovery are categories. When they count people, researchers are doing a categorical analysis.

Because they help some patients a lot while leaving others unchanged or worse off, antidepressants look most effective in categorical analyses. Because in the course of a study many people will shed a few symptoms, placebos do well in analyses that look at averages.
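The divergence is easy to reproduce on paper. In the invented data below, the drug helps half its patients dramatically and the rest barely at all, while placebo gives everyone a modest lift: the average changes come out nearly equal, but the categorical response rates (shedding at least half of one's symptoms) are not close.

```python
# Mean-change vs. categorical analysis on invented Hamilton data.
# Everyone starts at a baseline score of 24; values are endpoint scores.

BASELINE = 24
drug_endpoints = [4, 5, 23, 24, 6, 22]       # big responders plus non-responders
placebo_endpoints = [14, 15, 14, 15, 14, 15]  # modest, uniform improvement

def mean_change(endpoints):
    """Average number of Hamilton points shed."""
    return sum(BASELINE - e for e in endpoints) / len(endpoints)

def response_rate(endpoints):
    """Fraction who shed at least half their baseline symptoms."""
    responders = [e for e in endpoints if BASELINE - e >= BASELINE / 2]
    return len(responders) / len(endpoints)

print(mean_change(drug_endpoints), mean_change(placebo_endpoints))      # ~equal
print(response_rate(drug_endpoints), response_rate(placebo_endpoints))  # not
```

Here the drug averages 10 points of improvement to placebo's 9.5—a wash—yet half the drug patients meet the response criterion and no placebo patient does.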

The NIMH trial had that pattern. In terms of average outcome, imipramine always outperformed placebo, but only in a few calculations were the differences statistically significant. It was when they looked at categories—How many people did well?—that researchers found the most decisive evidence of imipramine’s superiority.

Interpreters of outcome studies have choices: Completer sample or intention-to-treat? Categorical analysis or attention to averages? But what measure should we be talking about? Having met Per Bech, we may be wondering about the clinical gestalt.

The NIMH study employed an instrument called the Global Assessment Scale. Raters summarized each subject’s well-being in a single number, considering social, psychological, and occupational competency. On the global scale, imipramine’s efficacy was apparent even in average scores. This result is common. If an experiment includes a scale based on global impressions, where observers say how each research participant is doing overall, that measure will best capture antidepressants’ advantage over placebo.

Can we use the NIMH trial data to develop a portrait of imipramine? We know that the medication was administered poorly to reluctant patients, people hoping for free psychotherapy. Even the strongest of the trial’s results do not capture imipramine’s inherent efficacy. But the data give an impressionistic picture. Imipramine will rarely shine “on average.” Its strength is in bringing about substantial change in the large subset of patients it works for. The benefits are not limited to the syndrome the Hamilton recognizes. They extend to overall well-being.

Later, when we think about meta-analyses, we will be able to put our tools to work. Does a collection include the NIMH study, using the results as if they represented a fair assessment of imipramine’s potential? If so, do the researchers rely on a completer analysis? The answers will give us a sense of what genre we’re encountering—what sort of narrator is telling the tale.

Regarding the NIMH-led collaboration, if the news was not as good as reporters had claimed, it was not all bad either. A representative of the psychodynamic school, interpersonal psychotherapy, had shown its worth, and the field’s old friend imipramine had done yeoman’s work. These successes covered most of what mental health professionals relied on in the treatment of depression.

One more irony emerged. This scrupulous effort, designed by topflight academics, had led to controversy and calls for reinterpretation. The contentious aftermath suggested that it was hard to conduct outcome research in a way that produced definitive or even convincing results. But the general understanding about the TDCRP was the reverse: it had demonstrated that randomized trials of depression treatment were feasible. As Gerry Klerman had hoped, they became the standard of evidence for psychotherapy.