BEFORE RONALD REAGAN took office, Gerry Klerman left government, and the Senate bill requiring psychotherapy trials lost momentum. Why had psychologists ever supported the legislation? Payment under government programs was an incentive, but they had another reason to trust in randomized trials—an ace in the hole. In the 1970s, a psychologist had invented a new statistical tool, meta-analysis, and it showed that psychotherapy worked.

As interest in controlled trials grew, psychotherapy had come under attack as mere placebo or worse. Chief among the critics was a German-born British psychologist, Hans Eysenck. Since the 1950s, Eysenck had been writing that research failed to support psychotherapy’s central claim, that it lessened neurosis.

In an era when hospitalization was used to treat milder conditions, Eysenck reviewed data on discharges of mentally ill inpatients. Many had never received formal psychotherapy. Eysenck estimated that over 70 percent of neurotics got better with general medical care. Looking at wards where psychotherapy was practiced, Eysenck found that 64 percent of patients responded—and where psychoanalysis was used, only 44 percent.

Psychotherapy seemed to relieve neurosis—until you looked at a control condition. With conventional medical treatment, symptoms faded, and more reliably. Eysenck suggested that therapy might be harmful: “the more psychotherapy, the lower the recovery rate.”

The attack was preposterous. When a patient returned to work, his GP might consider him cured even when a psychoanalyst would doubt that the neurosis had budged. As a contemporary commentator put it, Eysenck’s finding—less relief among analytic patients—“reflects the probability that the more intensive the therapy, the higher the standard of recovery.”

However unreasonable, Eysenck’s challenge was a burr under the saddle for psychotherapists. A young psychologist and statistician, Gene Glass, had become frustrated by what he called “Eysenck’s frequent and tendentious reviews of the psychotherapy outcome research that proclaimed psychotherapy as worthless.” Glass intended, he wrote, “to annihilate Eysenck and prove that psychotherapy really works” by making “a big splash.” The occasion was an address that Glass was slated to give to a professional organization in 1976. He later recalled, “I set about to do battle with Dr. Eysenck and prove that psychotherapy—my psychotherapy—was an effective treatment.”

The problem was the lack of agreement about how to summarize a body of research literature. Experts did that job all the time, but the review article was a ready forum for the expression of prejudices. Glass complained, “A common method for integrating several studies with inconsistent findings is to carp on the design or analysis deficiencies of all but a few studies—those remaining frequently being one’s own work or that of one’s students or friends.”

Glass believed that Eysenck had ignored inconvenient data.

Intent on doing the opposite, Glass assembled every study he could find that tested a psychotherapy against another intervention or against a placebo condition such as “usual medical treatment.” Where Eysenck had located 11 qualifying studies, Glass gathered 475 involving more than twenty-five thousand patients. He then set out to combine the results in clever fashion. Taking a basket of apples and oranges, he would prove that it was possible to study fruit.

But how? One researcher might test psychoanalysis in the treatment of alcoholism. Another might evaluate cognitive therapy for college students made anxious by a hard math problem. To amalgamate diverse studies—to create a class of fruit—Glass turned to a statistical concept, *effect size*.

In his reworking of the Ten Commandments, the poet W. H. Auden admonished, “Thou shalt not sit / With statisticians nor commit / A social science.” Because the antidepressant controversy consists largely of arguments about effect sizes, we may need to transgress, if gingerly.

Effect size looks at a group of people and says how far treatment moves those who receive it. The calculation begins with our standard maneuver, subtracting the progress patients make in the control arm from the greater progress their counterparts make in the treatment arm. Then, to make the results of dissimilar studies comparable, the formula brings in a basic measure in statistics, the standard deviation.
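For readers who like to see the arithmetic, the calculation can be sketched in a few lines. The scores below are invented for illustration; the formula shown is Glass’s variant, which divides the treatment–control difference in means by the control group’s standard deviation.

```python
from statistics import mean, stdev

# Hypothetical symptom-improvement scores (higher is better).
treated = [57, 61, 65, 65, 69, 73]
control = [52, 56, 60, 60, 64, 68]

# Glass's delta: how many control-group standard deviations
# the treated group's average sits above the control average.
delta = (mean(treated) - mean(control)) / stdev(control)
print(round(delta, 2))  # 0.88
```

Cohen’s better-known formulation pools the standard deviations of both groups; with invented data like these, either version makes the same point.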

We won’t get into the math except to say that, for our purposes, a standard deviation is “a lot.” To attach a number: If a medical intervention shifts those on it one standard deviation in a good direction, the average treatment user will now be at the 84 percent mark, so that only 16 percent of people in the original group are healthier. A therapy that brings about change at that level is said to have an effect size of one. It shifted the average by one standard deviation—a lot.

Few medical treatments do that well. An intervention may move patients ahead half a standard deviation, a result expressed as an effect size of 0.5. The average treated subject will do better than 69 percent of untreated subjects.

Glass took the concept of effect size from an influential statistician, Jacob Cohen, who, in the 1960s, had developed a field called power analysis. To that point, statistics had been mostly an all-or-none business. When an experiment found a result (for instance, that psychotherapy, more than usual medical care, decreases patients’ anxiety), statisticians would say whether the result was “significant”—unlikely to be due to chance alone. By the 1930s, statisticians had adopted a working agreement: only if there was less than a 5 percent probability that it had arisen by chance would an experimental finding be deemed significant, that is, credible. Statistical significance answers the question “Is the result real?” or, more precisely, “Is there only a small probability that it’s not real?”

Changes can be real without being noteworthy. A therapy might predictably help people but only by a tiny amount. Cohen’s measure looked beyond yes-or-no to say *how much*. Effect size expresses the amount of improvement likely to occur in response to treatment. It’s the statistical answer to the Ed Koch question, “How’m I doing?”

Since effect size is not an intuitive concept, Cohen provided a guide. An effect size of 0.2 is small; 0.5 is medium; and 0.8 is large. (He later admonished those who took these labels too seriously that they had been “set forth throughout with much diffidence, qualifications, and invitations not to employ them if possible.”) Standard medical treatments tend to show medium effect sizes, just under 0.5.

Gene Glass saw that he could use effect sizes to combine apples and oranges. If psychoanalysis for alcoholism moved treated patients to the 84 percent mark and so did the anxiety therapy (and some others), then Glass would say that therapy in general performed its job at that level, one standard deviation’s worth, with an effect size of one.
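The combining step itself is simple in spirit: each study, whatever its subject matter, contributes one effect size, and the summary is the average. The numbers below are invented; Glass’s actual analysis involved 475 studies and refinements this sketch omits.

```python
# Each entry is one hypothetical study's effect size --
# psychoanalysis for alcoholism, therapy for math anxiety, etc.
effect_sizes = [1.10, 0.40, 0.95, 0.70, 1.05]

# The unweighted mean is the meta-analytic summary.
overall = sum(effect_sizes) / len(effect_sizes)
print(round(overall, 2))  # 0.84
```

Apples and oranges become fruit: once every study speaks the common language of standard deviations, averaging across them is legitimate.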

Glass had collected all kinds of psychotherapy outcome studies. They measured movement toward freedom from delusions in schizophrenic patients and movement toward freedom from snake phobias in healthy college students. In each case, Glass asked how far treatment moved research subjects relative to their group as a whole. Combining his 475 studies into one big experiment, Glass found that the effect size for psychotherapy was 0.85—large. Few other interventions in the social or psychological arena work as well. Glass’s summary result—high efficacy for psychotherapy—constituted a thorough refutation of Eysenck. Glass had achieved his goal, total victory in the psychotherapy dispute.