Shorter Oxford Textbook of Psychiatry, 6th Ed.

CHAPTER 6. Evidence-based approaches to psychiatry

What is evidence-based medicine?

Individual treatment studies

Systematic reviews

Implementation of evidence-based medicine

Evaluation of evidence-based medicine

Other applications of evidence-based medicine

Qualitative research methods

Evidence-based medicine in psychiatry

What is evidence-based medicine?

Evidence-based medicine (EBM) is a systematic way of obtaining clinically important information about aetiology, diagnosis, prognosis, and treatment. The evidence-based approach is a process in which the following steps are applied:

• formulation of an answerable clinical question

• identification of the best evidence

• critical appraisal of the evidence for validity and utility

• implementation of the findings

• evaluation of performance.

The principles of EBM can be applied to a variety of medical procedures. For psychiatry, the main use of EBM at present is to assess the value of therapeutic interventions. For this reason, in the following sections the application of EBM will be linked to studies of treatment. Applications to other areas such as diagnosis and prognosis are discussed later.

History of evidence-based approaches

Examples of what we now might call ‘evidence-based approaches’ to the investigation of treatments have a long if sporadic history in medicine. For example, in 1747 a naval surgeon, James Lind, studied six pairs of sailors ‘as similar as I could have them’ who were suffering from scurvy. The sailors who received oranges and lemons recovered within a few weeks, in contrast to those who simply received the same housing and general diet. Lind’s study was not carried out ‘blind’, but in 1784 Benjamin Franklin applied blindfolds to the participants in a mesmerism study, who were therefore unaware whether or not the treatment was being applied. The ‘blinding’ abolished the treatment effect of mesmerism, providing strong evidence that its effects were mediated by suggestion (Devereaux et al., 2002).

The application of modern randomized trial methodology to medicine is attributed to Sir Austin Bradford Hill (1897–1991), who designed the Medical Research Council (MRC) trial of streptomycin treatment of tuberculosis in 1948. Subsequently, Bradford Hill lent his influence to the application of randomized trials in the evaluation of psychiatric treatments, often in the face of vociferous opposition from the profession. The first psychiatric trial to use this methodology was carried out at the Maudsley Hospital in 1955 by David Davies and Michael Shepherd, who demonstrated that, relative to placebo, reserpine had beneficial effects in anxiety and depression. A few years later Ackner and Oldham (1962) used double-blind randomized methods to debunk insulin coma therapy (see p. 508). Subsequently, in 1965, an MRC group reported the first large-scale, multi-centre, randomized controlled trial in psychiatry, in which imipramine and electroconvulsive therapy (ECT) were shown to be therapeutically superior to placebo in the treatment of hospitalized depressed patients (see Tansella, 2002).

More recent developments in evidence-based approaches owe much to Archibald Cochrane (1909–1988), an epidemiologist and author of an influential book, Effectiveness and Efficiency: Random Reflections on Health Services, which was published in 1972. Cochrane emphasized the need, when planning treatment provision, to use evidence from randomized controlled trials because it is more reliable than any other kind. In a frequently cited quotation (Cochrane, 1979), he wrote: ‘It is surely a great criticism of our profession that we have not organized a critical summary, by specialty or subspecialty, adapted periodically, of all relevant randomized controlled trials.’

Cochrane’s views were widely accepted, and two further developments enabled his vision to be realized. First, the availability of electronic databases and computerized searching made it feasible to find all (or nearly all) of the relevant randomized trials when gathering evidence on particular therapeutic questions. Secondly, the statistical techniques of meta-analysis enabled randomized trials to be combined, providing greater power and allowing a reliable quantification of treatment effects. Results from studies using these methodologies are called ‘systematic reviews’ to distinguish them from the more traditional, less reliable ‘narrative reviews’ in which the judgement of the authors plays a major role in deciding what evidence to include and what weight to give it. The Cochrane Collaboration, which was formed in 1993, is now the largest organization in the world engaged in the production and maintenance of systematic reviews. In the UK, the Centre for Reviews and Dissemination, based at the University of York, maintains an up-to-date database of systematic reviews of healthcare interventions.

Why do we need evidence-based medicine?

There are two main related problems in clinical practice which can be helped by the application of EBM:

• the difficulty in keeping up to date with clinical and scientific advances

• the tendency of practitioners to work in idiosyncratic ways that are not justified by the available evidence.

With the burgeoning number of clinical and scientific journals, the most assiduous clinician is unable to keep up to date with all of the relevant articles even in their own field. In fact, it has been estimated that to accomplish this task would require scrutiny of 20 publications a day! Clinicians therefore have to rely on information gathered from other sources, which might include, for example, unsystematic expert reviews, opinions of colleagues, information from pharmaceutical companies, and their own clinical experiences and beliefs. This can lead to wide variations in practice—for example, those described for the use of ECT (see UK ECT Review Group, 2003).

Kinds of evidence

The fundamental assumption of EBM is that some kinds of evidence are better (i.e. more valid and of greater clinical applicability) than others. This view is most easily elaborated for questions about therapy. A commonly used ‘hierarchy’ is shown in Table 6.1.

In this hierarchy, evidence from randomized trials is regarded as more valid than evidence from non-randomized trials, while systematic reviews of randomized trials are seen as the gold standard for answering clinical questions. This assumption has itself yet to be tested systematically, and some argue that large trials with simple clinically relevant end points may be more valid than meta-analyses (see Furukawa, 2004). It is certainly important that clinicians are trained in critical evaluation of systematic reviews before they apply their results to clinical practice (see Geddes, 1999).

Table 6.1 Hierarchy of the quality of research about treatment

Ia Evidence from a systematic review of randomized controlled trials

Ib Evidence from at least one randomized controlled trial

IIa Evidence from at least one controlled study without randomization

IIb Evidence from at least one other type of quasi-experimental study

III Evidence from non-experimental descriptive studies, such as comparative studies, correlation studies, and case–control studies

IV Evidence from expert committee reports or opinions and/or clinical experience of respected authorities

Individual treatment studies


The key criterion for validity in treatment studies is randomization. In addition, clinicians who are entering patients into a therapeutic trial should be unaware of the treatment group to which their patients are being allocated. This is usually referred to as concealment of the randomization list. Without concealed randomization, the validity of a study is questionable and its results may be misleading.

Other important points to consider when assessing the validity of a study include the following:

• Were all of the patients who entered the trial accounted for at its conclusion?

• Were patients analysed in the groups to which they were allocated (so-called ‘intention-to-treat’ analysis)?

• Were patients and clinicians blind to the treatment received (a different question to that of blind allocation)?

• Apart from the experimental treatment, were the groups treated equally?

• Did the randomization process result in the groups being similar at baseline?

Presentation of results

Odds ratios and relative risk

When the outcome of a clinical trial is an event (e.g. admission to hospital), a commonly used measure of effectiveness is the odds ratio. The odds ratio is the odds of an event occurring in the experimental group divided by the odds of it occurring in the control group. The odds ratio is given with 95% confidence intervals (which indicate the range of values within which we have a 95% certainty that the true value falls). The narrower the confidence intervals are, the greater is the precision of the study.

If the odds ratio of an event such as admission to hospital is 1.0, this means that the rates of admission do not differ between the control and experimental groups. Therefore if the confidence interval of the odds ratio of an individual study includes the value of 1.0, the study has failed to show that the experimental and control treatments differ from each other.

Relative risk also measures the relative likelihood of an event occurring in two distinct groups. It is regarded as a more intuitive measure of effectiveness than the odds ratio. For example, if action A carries a risk of 99.9% and action B carries a risk of 99.0%, the relative risk is just over 1, which seems intuitively correct for two such similar outcomes. However, the calculated odds ratio is just over 10! With relatively infrequent events, the odds ratio and relative risk become more similar. Measures of relative risk cannot be used in case–control designs, and are hard to adjust by covariance for confounding variables.
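The contrast between the two measures can be checked numerically. The following Python sketch is illustrative only: the function names and the trial counts used for the confidence interval are assumptions made for this example, not data from any study, and the interval relies on the common log-odds (Woolf) approximation rather than on a method specified in the text.

```python
import math

def relative_risk(risk_a, risk_b):
    """Ratio of two event probabilities."""
    return risk_a / risk_b

def odds(risk):
    """Convert an event probability into odds."""
    return risk / (1 - risk)

def odds_ratio(risk_a, risk_b):
    """Ratio of the odds of an event in two groups."""
    return odds(risk_a) / odds(risk_b)

def odds_ratio_ci(a, c, b, d, z=1.96):
    """Odds ratio with an approximate 95% confidence interval.

    a, c: patients with / without the event in the experimental group
    b, d: patients with / without the event in the control group
    (hypothetical argument names chosen for this sketch)
    """
    log_or = math.log((a / c) / (b / d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # Woolf approximation
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)

# Very frequent events (the example in the text): the two measures diverge.
rr_common = relative_risk(0.999, 0.990)   # just over 1
or_common = odds_ratio(0.999, 0.990)      # just over 10

# Infrequent events: the two measures nearly coincide.
rr_rare = relative_risk(0.02, 0.01)
or_rare = odds_ratio(0.02, 0.01)

# Hypothetical trial: 20 of 100 events on the experimental arm, 35 of 100 on control.
or_est, ci_low, ci_high = odds_ratio_ci(a=20, c=80, b=35, d=65)
```

In the hypothetical trial the whole interval lies below 1.0, so under these assumptions the two treatments would be judged to differ.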

Effect sizes

In many studies the outcome measure of interest is a continuous variable, such as a mean score on the Hamilton Rating Scale for Depression. It is possible to use the original measure in the meta-analysis, although more often an estimate of effect size is made because it is more statistically robust.

Effect sizes are obtained by dividing the difference in mean outcome between the experimental group and the control group by the standard deviation of the outcome measure (usually the pooled standard deviation of the two groups). The clinical interpretation of the effect size is discussed below.

Clinical utility of interventions

Risk reduction and number needed to treat

An important part of EBM involves using the results of randomized trials of groups of patients to derive the impact of an intervention at the level of the individual patient. A useful concept when assessing the value of a treatment is that of absolute risk reduction. This compares the proportion of patients receiving the experimental treatment who experienced a clinically significant adverse outcome (e.g. clinical relapse) with the rate in patients receiving the comparison treatment. These rates are known as the experimental event rate (EER) and control event rate (CER), respectively, and are calculated as percentages. The difference between these two outcome rates is the absolute risk reduction (ARR).

The ARR can be converted into a more clinically useful number, known as the number needed to treat (NNT). The NNT is the reciprocal of the ARR, and it tells us how many patients would need to be treated in order to experience one more positive outcome event compared with a comparator treatment (or no treatment) (see Box 6.1). Like odds ratios, NNTs are usually given with 95% confidence intervals.


Paykel et al. (1999) randomized 158 patients with residual depressive symptoms following an episode of major depression to either clinical management or clinical management with 18 sessions of cognitive–behaviour therapy (CBT). Over the following 68 weeks the relapse rate in the CBT-treated group (29%) was significantly lower than that in the clinical management group (47%; P = 0.02).

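Box 6.1 Indices for translating research results into clinical practice

Treatment X (experimental): positive outcome = a; negative outcome = c

Treatment Y (control): positive outcome = b; negative outcome = d

Control event rate (CER) = b/(b + d)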

Experimental event rate (EER) = a/(a + c)

Absolute risk reduction (ARR)

The difference in the proportions with a positive outcome on treatments X and Y = (CER − EER)

Relative risk = EER/CER

Odds ratio (OR)

The ratio of the odds of a positive outcome on treatments X and Y = (a/c)/(b/d) = ad/bc

Number needed to treat (NNT): the number of patients that need to be treated with treatment X in order to obtain one more positive outcome than would be expected on treatment Y (= 1/ARR)

The ARR in relapse with CBT is 47 − 29 = 18%. The NNT is the reciprocal of this number, which is approximately 6 (usually the NNT is rounded up to the next highest integer). This means that six patients with residual depressive symptoms have to be treated with CBT in order to avoid one relapse. In general, an NNT of less than 10 denotes a useful treatment effect. However, interpretation of the NNT will also depend on the nature of the treatment, together with the extent of its therapeutic and adverse effects. The NNTs for some common psychiatric treatments are shown in Table 6.2.

If the outcome measure of an intervention is a beneficial event (e.g. recovery) rather than avoidance of an adverse event, the effect of the intervention is calculated as the absolute benefit increase (ABI) in the same way as the ARR (see above), with the NNT being similarly computed. A concept related to NNT is the number needed to harm (NNH), which describes the adverse risks of particular therapies (e.g. extrapyramidal symptoms with antipsychotic drugs).
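The arithmetic for the CBT example can be written out as a short Python sketch. The function names are hypothetical, and the event rates are the rounded percentages quoted in the text for Paykel et al. (1999); rounding the NNT up follows the convention mentioned above.

```python
import math

def absolute_risk_reduction(cer, eer):
    """ARR = control event rate minus experimental event rate (as proportions)."""
    return cer - eer

def number_needed_to_treat(cer, eer):
    """NNT = 1 / ARR; only meaningful when the experimental rate is lower."""
    arr = absolute_risk_reduction(cer, eer)
    if arr <= 0:
        raise ValueError("no risk reduction, so the NNT is undefined")
    return 1 / arr

# Relapse rates quoted in the text: 47% with clinical management, 29% with added CBT.
cer, eer = 0.47, 0.29

arr = absolute_risk_reduction(cer, eer)      # 0.18, i.e. 18%
nnt_exact = number_needed_to_treat(cer, eer) # about 5.6
nnt_reported = math.ceil(nnt_exact)          # rounded up to 6
relative_risk = eer / cer                    # about 0.62
```

The same two functions apply unchanged to an absolute benefit increase if the rates are entered so that the difference is positive.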

Table 6.2 Examples of number needed to treat (NNT) for interventions in psychiatry


Computing the NNT from odds ratios

If a study or meta-analysis provides an odds ratio, it is possible to compute an NNT that may be more relevant to the clinical circumstances of the practitioner and their patient. In the example given above (Paykel et al., 1999), relapses occurred in 35 of 78 subjects in the clinical management group, compared with 23 of 80 subjects in the CBT group. This gives an odds ratio in the risk of relapse between the two treatments of 0.49. To obtain an NNT from the odds ratio it is necessary to know, or to estimate, the expected relapse rate in the control group. This is known as the patient expected event rate (PEER). The PEER is combined with the odds ratio (OR) in the following formula:

NNT = [1 − PEER × (1 − OR)] / [(1 − PEER) × PEER × (1 − OR)]

If we take the relapse rate in the patients who were in the clinical management group in the above study (45%), we have:

NNT = [1 − 0.45 × (1 − 0.49)] / [(1 − 0.45) × 0.45 × (1 − 0.49)] = 0.77/0.126 ≈ 6.1
This gives an NNT of about 6, which we also derived from the other method of calculation involving the ARR. However, if from a local audit we know that in our own service the relapse rate of patients with residual depressive symptoms is about 20% (rather than the figure of 45% reported by Paykel et al.), using the above formula the NNT becomes about 11. This means in our own service we would need to treat 11 patients with CBT in order to obtain one less relapse. Thus odds ratios can be used to adjust NNTs to local clinical conditions, thereby aiding decisions about the applicability of interventions.
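The adjustment to local conditions can be sketched in Python. This is a minimal illustration that assumes the commonly cited conversion NNT = [1 − PEER × (1 − OR)] / [(1 − PEER) × PEER × (1 − OR)], which reproduces both figures quoted in the text; the function name is hypothetical.

```python
def nnt_from_odds_ratio(odds_ratio, peer):
    """Convert an odds ratio into an NNT for a given patient expected event rate.

    odds_ratio: odds ratio for the adverse event (must be below 1 for a benefit)
    peer: expected event rate in untreated or control patients, as a proportion
    """
    if not 0 < peer < 1:
        raise ValueError("PEER must lie strictly between 0 and 1")
    if not 0 < odds_ratio < 1:
        raise ValueError("this form of the formula assumes an odds ratio below 1")
    benefit = 1 - odds_ratio
    return (1 - peer * benefit) / ((1 - peer) * peer * benefit)

# Control relapse rate observed in the trial (45%).
nnt_trial = nnt_from_odds_ratio(0.49, 0.45)   # about 6.1

# Lower relapse rate found in a local audit (20%).
nnt_local = nnt_from_odds_ratio(0.49, 0.20)   # about 11.0
```

Holding the odds ratio fixed while varying the PEER is what allows a published result to be translated to a service with a different baseline risk.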

Clinical relevance of effect size

Like the odds ratio, the effect size is not easy to interpret clinically. A useful approach is to use the effect size to estimate the degree of overlap between the control and experimental populations. In this way we obtain the proportion of control group scores that are lower than the average score in the experimental group. (A negative effect size simply means that scores in the control group are higher than those in the experimental group.)

For example, in a review of the effects of benzodiazepines and zolpidem on total sleep time relative to placebo, Nowell et al. (1997) found an overall effect size of 0.71. From normal distribution tables this means that 76% of controls had less total sleep time than the average sleep time in the hypnotic-treated patients. Effect sizes have been classified in the following way:

• 0.2 = small

• 0.5 = moderate

• ≥0.8 = large.
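The lookup in normal distribution tables can be reproduced with the Python standard library. This sketch assumes that scores in both groups are normally distributed with equal variance; the function names are hypothetical, and treating the benchmarks above as cut-offs between bands is an assumption of this example rather than a rule stated in the text.

```python
from statistics import NormalDist

def proportion_of_controls_below_treated_mean(effect_size):
    """Proportion of control scores lying below the mean of the treated group."""
    return NormalDist().cdf(effect_size)

def describe_effect_size(effect_size):
    """Label an effect size using 0.2, 0.5 and 0.8 as band boundaries (assumed)."""
    magnitude = abs(effect_size)
    if magnitude >= 0.8:
        return "large"
    if magnitude >= 0.5:
        return "moderate"
    if magnitude >= 0.2:
        return "small"
    return "below the 'small' benchmark"

# Effect size for hypnotics on total sleep time quoted from Nowell et al. (1997).
overlap = proportion_of_controls_below_treated_mean(0.71)   # about 0.76
label = describe_effect_size(0.71)                          # "moderate"
```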

The effect size of antidepressant medication relative to placebo is about 0.4–0.5. At the kind of response levels seen in antidepressant-treated patients (response rate of around 30% in the placebo group and 60% in the experimental group), an effect size of 0.2 is equivalent to an NNT of about 10. With an effect size of 0.5, the NNT falls to 5.

Ethical aspects of therapeutic trials


As we have seen, randomization is a key process in the conduct of an evidence-based clinical trial, because it is the best way of avoiding bias due to chance and random error. However, a clinician may feel uncomfortable about randomization if, for example, they have a strong belief in the efficacy of one of the treatments that is being assessed. Randomization is ethical where there is genuine uncertainty about the best treatment for the individual concerned. In fact, EBM suggests that this situation is more common than clinicians may realize, in that many strongly held beliefs about the efficacy of therapeutic interventions are based on anecdotal experience rather than systematic evidence.

Use of placebo

The use of drug placebo in trials of psychotropic agents is controversial. However, such studies are required by many drug-licensing authorities before, for example, a new antidepressant drug is licensed. The arguments for the use of placebo in antidepressant drug trials have been summarized as follows (see Kader and Pantelis, 2009):

• The placebo response in major depression is variable and unpredictable, and is not infrequently equivalent in therapeutic effect to active treatment.

• Placebo is required to establish the efficacy of new antidepressants. Comparison against an active treatment is not methodologically sufficient because while a finding of ‘no difference’ in antidepressant activity might mean that the new and established treatments have equivalent efficacy, it might also mean that neither treatment was actually effective under the particular trial conditions employed.

• The lack of placebo-controlled design in antidepressant drug development might lead to the marketing of a drug that is ineffective, thereby harming public health.

These arguments have to be weighed against the knowledge that antidepressants are generally somewhat more effective than placebo in the treatment of depression. Therefore a patient who is treated with placebo in a randomized trial is not receiving the best available therapy. One way of trying to deal with this is to ensure that patients in such trials receive particularly close clinical monitoring which will result in their being withdrawn from the study if they are not doing well.

An unintended consequence of this problem is that patients in placebo-controlled trials may not be representative of the patients who will receive the treatment concerned in real-world conditions. For example, in placebo-controlled studies of depression, patients are recruited by advertisement, and there can be inflation of depression scores by raters prior to treatment so that individuals will meet the cut-off score for recruitment into the trial. Such patients tend to do well, whether they are receiving placebo or active treatment, making it difficult to show convincing clinical differences between drug and placebo (Landin et al., 2000; Kirsch et al., 2008).

Informed consent

The role of informed consent is crucial to the ethical conduct of randomized and placebo-controlled trials. This raises difficulties with some psychiatric disorders in which the judgement and decision-making abilities of patients may be impaired. Kader and Pantelis (2009) have outlined a number of important factors.

• Patients must be made specifically aware that the trial is not being conducted for their individual benefit.

• With placebo treatment there must be clear specification of the probability of receiving placebo, the lack of improvement that might result, and the possibility of symptomatic worsening.

• Patients must be free from any coercion or inducement.

• Patients have the right to withdraw from the study at any time without any kind of penalty.

• In addition to the investigator, a family member or other suitable person should be encouraged to monitor the patient’s condition and report to the investigator if there are any concerns.

The key issues therefore are open and explicit information sharing with the patient and their family, and all necessary measures to avoid placebo treatment leading to harm to the subject. However, the issue remains controversial.

Systematic reviews


The most common aim of a systematic review is to obtain all of the available valid evidence about a specific procedure or intervention, and from this to provide a more precise quantitative assessment of its efficacy. Other uses of systematic reviews (e.g. to examine prognosis) are discussed later. Two advances have greatly increased the feasibility of systematic reviews—first, the availability of electronic databases such as Medline and Scopus, and secondly, new statistical techniques through which results from different studies can be combined in a quantitative manner. Because a meta-analysis uses all of the available valid data, its statistical power is greater than that of an individual study. It may therefore demonstrate moderate but clinically important effects of treatment that were not apparent in individual randomized studies.

Systematic reviews of treatment, like single therapeutic studies, have to be tested for validity and quality. The following questions should be posed.

• Is it a systematic review of relevant and randomized studies? We have already seen that the first task in the EBM process is to ask a clearly formulated question. It is therefore necessary to determine whether the subject of the systematic review is truly relevant to the therapeutic question that needs to be answered. The next step is to make sure that only randomized studies have been included. Systematic reviews that contain a mixture of randomized and non-randomized studies may give misleading results.

• Do the authors describe the methods by which relevant trials were located? Although electronic searching greatly facilitates the identification of clinical trials, up to half of the relevant studies may be missed because of miscoding. It is therefore important for authors to make clear whether they supplemented electronic searching with hand-searching of appropriate journals. They may also, for example, have contacted authors of trials, as well as relevant groups in the pharmaceutical industry. In general, negative studies are less likely to be published than positive ones, which can lead to falsely optimistic conclusions about the efficacy and tolerability of particular treatments. For example, analysis of all completed studies of new antidepressants in adolescents indicated that some SSRIs and venlafaxine might increase the risk of suicidal behaviour. This potentially important finding was not apparent from analyses of the published data alone (Whittington et al., 2004).

• How did the authors decide which studies should be included in the systematic review? In a systematic review, authors have to decide which of the various studies that they identify should be included in the overall analysis. This means defining explicit measures of quality, which will be based on the factors outlined above. Because these judgements are in part subjective, it is desirable for them to be made independently by at least two of the investigators.

• Were the results of the therapeutic intervention consistent from one study to another? It is common to see differences in the size of the effect of a therapeutic intervention from one study to another. However, if the effects are mixed, with some studies showing a large clinical effect while others find none at all, the trials are said to show heterogeneity. Sometimes heterogeneity can be accounted for by factors such as lower doses of a drug treatment or differences in patient characteristics. If there is no likely explanation for it, the results of the review must be considered tentative.

Presentation of results

Combining odds ratios

The results of meta-analyses are often presented as a ‘forest plot’ in which the findings of the various studies are shown in diagrammatic form (see Figure 6.1). As noted above, studies in which the outcome is an event are presented as odds ratios with 95% confidence intervals.

The aim of meta-analysis is to obtain a pooled estimate of the treatment effect by combining the odds ratios or effect sizes of all the studies. This is not simply an average of all the odds ratios, but is weighted so that studies with more statistical information and greater precision (with narrower confidence intervals) contribute relatively more to the final result. The pooled odds ratio also has a 95% confidence interval. Once again, if this interval includes the value of 1.0, the experimental intervention does not differ from the control.

In Figure 6.1 some of the studies show a significant effect of assertive community treatment (ACT) in decreasing readmission, while others do not. The two pooled analyses are difficult to interpret because the confidence intervals of one of them (the fixed effects model) do not overlap with 1.0, making ACT significantly different from the control, whereas the other (the random effects model) just overlaps with 1.0 and is therefore of marginal statistical significance.

We have already seen that the studies in a meta-analysis may indicate heterogeneity. This can be tested statistically with a modification of the chi-squared test. If significant heterogeneity is present, the most appropriate meta-analytic technique is a pooled random effects model. This model assumes that different treatment effects will occur in different studies, and takes this into account in the pooled estimate. This usually results in wider confidence intervals, as in the pooled random effects odds ratio in the ACT studies. If the studies suggest a single underlying population treatment effect (i.e. lack of heterogeneity), the pooled treatment analysis should use a fixed effects model. This estimate has narrower confidence intervals that may, however, be misleading in the presence of significant heterogeneity.
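The difference between the two pooling models can be illustrated with a short Python sketch. Several choices here are assumptions, not methods named in the text: inverse-variance weighting of log odds ratios, Cochran's Q as the heterogeneity statistic (referred to a chi-squared distribution with one fewer degrees of freedom than the number of studies), and the DerSimonian and Laird estimate of between-study variance for the random effects model. The trial counts are invented purely for illustration.

```python
import math

def log_odds_ratio(a, c, b, d):
    """Log odds ratio and its variance (Woolf approximation).

    a, c: events / non-events in the experimental group
    b, d: events / non-events in the control group
    """
    return math.log((a / c) / (b / d)), 1 / a + 1 / b + 1 / c + 1 / d

def pool(effects, variances, tau2=0.0):
    """Inverse-variance weighted mean; tau2 > 0 gives random effects weights."""
    weights = [1 / (v + tau2) for v in variances]
    estimate = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return estimate, math.sqrt(1 / sum(weights))

def meta_analyse(tables):
    """Pooled odds ratios (fixed and random effects) with 95% intervals."""
    pairs = [log_odds_ratio(*t) for t in tables]
    effects = [y for y, _ in pairs]
    variances = [v for _, v in pairs]

    fixed, se_fixed = pool(effects, variances)

    # Cochran's Q: weighted squared deviations from the fixed effects estimate.
    weights = [1 / v for v in variances]
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(tables) - 1

    # DerSimonian-Laird between-study variance, truncated at zero.
    scale = sum(weights) - sum(w * w for w in weights) / sum(weights)
    tau2 = max(0.0, (q - df) / scale)

    random, se_random = pool(effects, variances, tau2)

    def as_odds_ratio(est, se):
        return tuple(math.exp(x) for x in (est, est - 1.96 * se, est + 1.96 * se))

    return {
        "q": q,
        "df": df,
        "tau2": tau2,
        "fixed": as_odds_ratio(fixed, se_fixed),
        "random": as_odds_ratio(random, se_random),
    }

# Invented counts, ordered (a, c, b, d) for each hypothetical trial.
hypothetical_trials = [(10, 90, 20, 80), (30, 70, 35, 65), (5, 95, 25, 75), (22, 78, 24, 76)]
result = meta_analyse(hypothetical_trials)
```

Because the random effects weights add the between-study variance to every study, the random effects interval can never be narrower than the fixed effects interval, which matches the pattern described for the ACT studies.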


Figure 6.1 Effect of assertive community treatment on the odds of admission to hospital. Reproduced from Evidence Based Mental Health, Nick Freemantle and John Geddes, 1 (4), pp. 102–4, copyright 1998 with permission from BMJ Publishing Group Ltd.

In Figure 6.1 there is statistically significant heterogeneity between the studies, and inspection of the data shows that the majority of the benefit is contributed by two of the studies, which are not the largest. The random and fixed effects models find a similar mean benefit of ACT in preventing readmission, but the random effects model has a wider confidence interval and, as noted above, just overlaps with 1.0. Because of the heterogeneity of the studies, the random effects model is the more appropriate way of analysing the data. Overall, therefore, we would be cautious about accepting the efficacy of ACT in lowering readmission rates, unless we were able to find a convincing reason for the variation in the study results.

Effect sizes

As noted above, where the outcome measure is a continuous variable, the usual method of calculating results is to use effect sizes. As with odds ratios, the effect sizes can be combined to give a pooled estimate of greater precision.

Clinical utility

The clinical utility of meta-analyses is assessed as described for individual studies above. Meta-analyses will often provide figures for the NNT. However, as shown above, it is also possible to calculate NNT values from meta-analysis data using ARR or odds ratios.

Problems with meta-analysis

Biased conclusions

Apart from a systematic location of evidence, the aim of meta-analysis is to combine data from multiple studies into a single estimate of treatment effect. There are a number of ways in which the results of such an exercise can be misleading.

• Publication bias. Evidence indicates that studies which show positive treatment effects are more likely to be published than negative studies. If negative studies are not included in the meta-analysis, the effect of treatment will be inflated.

• Duplication of publication. Just as negative treatment studies may go unpublished, positive studies may be published several times in different forms, sometimes with different authors! This again will falsely elevate treatment effects if the same study is included more than once.

• Heterogeneity of studies. As noted above, individual studies may vary widely in the results obtained, because of quite subtle differences in study design, quality, and patient population. If such heterogeneity is not recognized and accounted for in the meta-analysis, misleading conclusions will be drawn.

How accurate is meta-analysis?

There are some well-known examples where the results of meta-analyses have been contradicted subsequently by single large randomized trials. For example, a meta-analysis which showed that intravenous magnesium improved outcome in patients with myocardial infarction was later decisively refuted by a single large randomized trial of 58 000 patients. The misleading result of the meta-analysis was later explained on the basis of publication bias, poor methodological quality in the smaller trials, and clinical heterogeneity (Baigent et al., 2010).

Reviews of this area have generally found that about 80% of meta-analyses agree with single large trials in terms of direction of effect of treatment, but the size and statistical significance of the effect often differ between the two methods. Furthermore, separate meta-analyses of the same therapeutic intervention may reach quite different conclusions (Furukawa, 2004).

Funnel plots

One way of improving the reliability of meta-analyses is by the use of funnel plots. The funnel plot is based on the assumption that the precision (confidence interval) of the estimated treatment effect will be greater in studies with a larger sample size. Therefore the effect sizes of larger studies should cluster around the overall mean difference between experimental and control groups. By contrast, the results from smaller studies should be more dispersed around the mean. Therefore, when the precision of individual studies is plotted against their odds ratios or effect sizes, the resulting graphical plot should resemble a symmetrical inverted funnel (the funnel plot). Statistically significant deviations from this plot suggest that the meta-analysis may be biased and should be treated with caution.
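One way to put a number on funnel plot asymmetry is sketched below in Python. The text does not name a specific test; this example assumes Egger's regression approach, in which each study's standardized effect (effect divided by its standard error) is regressed on its precision (the reciprocal of the standard error), and an intercept far from zero suggests asymmetry. The function name and both data sets are invented for illustration, and a formal analysis would also test whether the intercept differs significantly from zero.

```python
def egger_intercept(effects, standard_errors):
    """Intercept of the regression of (effect / SE) on (1 / SE).

    A value close to zero is compatible with a symmetrical funnel plot.
    """
    precision = [1 / se for se in standard_errors]
    standardized = [y / se for y, se in zip(effects, standard_errors)]
    n = len(effects)
    mean_x = sum(precision) / n
    mean_y = sum(standardized) / n
    sxx = sum((x - mean_x) ** 2 for x in precision)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(precision, standardized))
    slope = sxy / sxx
    return mean_y - slope * mean_x

# Invented symmetrical data: small studies scatter evenly around an effect of 0.5.
symmetric = egger_intercept(
    effects=[0.5, 0.4, 0.6, 0.2, 0.8],
    standard_errors=[0.1, 0.2, 0.2, 0.4, 0.4],
)

# Same data with the small study showing little benefit left out,
# as might happen if it had never been published.
asymmetric = egger_intercept(
    effects=[0.5, 0.4, 0.6, 0.8],
    standard_errors=[0.1, 0.2, 0.2, 0.4],
)
```

In this invented example the intercept is zero for the symmetrical set and clearly positive once the unfavourable small study is omitted.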

Large-scale randomized trials

As noted above, the advantage of meta-analysis is that by combining individual studies it can assemble sufficient patient numbers to allow detection of moderate-sized but clinically important therapeutic effects. Another way of detecting moderate-sized treatment effects is to randomize very large numbers of patients to a single study. These large-scale randomized (simple) trials (or mega-trials) have advantages over meta-analysis in that all of the patients can be allocated to a single study design. Such studies need numerous collaborators and therefore require a simple study design and a clear end point.

This methodology has been most successfully applied to areas of medicine, such as cardiovascular disorders, where interventions can sometimes be simple (e.g. one dose of aspirin daily) and end points (e.g. myocardial infarction, or death) clearly identified (Baigent et al., 2010). The challenge for psychiatric trials is to adapt such methodology to conditions for which interventions are more complex and end points more subtle.


A general problem when applying evidence from randomized trials and meta-analysis to routine clinical work is that clinical trials are often conducted in rather ‘ideal’ conditions, which in a number of respects may not match routine clinical work:

• Patient population. Patients in controlled trials may differ systematically from those in routine clinical care in being less severely ill and having fewer comorbid difficulties. Thus trials may be conducted on patients who are in fact rather unrepresentative of a usual patient population.

• Level of supervision. In drug trials, concordance is regularly monitored by frequent review and supervision. Thus patients are less likely to drop out of treatment even when drugs are not particularly well tolerated.

• Therapist variables. Particularly in psychotherapy trials, treatment may be administered by skilled and experienced therapists. In routine practice, treatments may be given by people with less experience. Furthermore, in trials the performance of therapists is often monitored closely to ensure that it conforms to the treatment protocol. Everyday practice may match the protocol less well.

Pragmatic trials

To overcome these limitations, it has been suggested that pragmatic trials might be a more appropriate way to study the effect of certain psychiatric interventions. Such studies aim to conduct randomized trials in ‘real-life’ situations. Methodologically they have much in common with the mega-trials described earlier, in that they are designed to answer simple and important clinical questions. As far as possible, pragmatic trials are conducted in a routine clinical setting. Other important features include the following:

• Randomization of very large numbers of subjects is necessary to take account of the fact that most advances in treatment will yield only moderate effect sizes. Blinding is seen as less important than randomization, particularly where active treatments are compared.

• The process of recruitment is simplified by avoiding restrictive entry criteria. The principal criterion is that both doctor and patient should feel substantial uncertainty as to which of the trial treatments is best.

• Assessments are streamlined so that they fit in with routine clinical practice: ‘Many trials would be of much greater scientific value if they collected 10 times less data … on 10 times more patients’ (Baigent et al., 2010).

• Clinically relevant outcome measures are used. For example, in a trial of a therapeutic intervention in schizophrenia, a rating by a patient and family member on a simple scale of well-being may carry more clinical relevance than a score on a standardized rating scale.

Implementation of evidence-based medicine

Implementing EBM for the individual patient

Having obtained the best evidence on a therapeutic intervention and decided that it is valid and therapeutically useful, it is necessary to decide how applicable it is to the individual patient you are considering. In large measure this depends on the answers to the questions on ‘applicability’ listed above. The key issues are as follows:

• How similar is the patient to those in the randomized trials? (This might be particularly relevant to psychiatry, where patients with more severe illness may be under-represented.)

• Can the local service deliver the intervention successfully? (For example, it is no use recommending applied relaxation for generalized anxiety disorder if there are no trained therapists available to carry it out.)

When making the decision about implementation, it may be useful to adjust the NNT for local clinical conditions if the relevant information is available (see above). A further way of taking more information into account in clinical decision making is provided by the concept of ‘likelihood of being helped or harmed’ (LHH). Straus and McAlister (2001) describe an example in which a patient and clinician are considering the use of the anticholinesterase, donepezil, to decrease the risk of cognitive decline. The NNT of donepezil for this indication is 6, while the NNH to experience an adverse event with donepezil is 11. The LHH is calculated as the ratio of (1/NNT) to (1/NNH), or (1/6):(1/11), which is about 2 to 1 in favour of donepezil. It is also possible to weight the NNT and NNH with factors that incorporate the patient’s attitude to the value of avoiding cognitive decline relative to that of experiencing adverse effects. Whether such efforts at quantification add significantly to a careful clinical assessment and discussion with the patient is questionable. The important point is that results from randomized trials need to be adapted to the differing needs of individuals.
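The LHH arithmetic for the donepezil example can be written out as a minimal sketch. The function name is hypothetical, and the optional `benefit_weight` argument is only one assumed way of representing the patient's relative valuation of benefit against harm; it is not a method prescribed by the source.

```python
def likelihood_helped_vs_harmed(nnt, nnh, benefit_weight=1.0):
    """Return the LHH, the ratio of (1/NNT) to (1/NNH), as a single number.

    `benefit_weight` is a hypothetical multiplier: a value of 2.0 means the
    patient regards the benefit as twice as important as avoiding the harm.
    """
    if nnt <= 0 or nnh <= 0:
        raise ValueError("NNT and NNH must be positive")
    return benefit_weight * (1.0 / nnt) / (1.0 / nnh)


# Figures quoted in the text for donepezil: NNT = 6, NNH = 11.
unweighted = likelihood_helped_vs_harmed(nnt=6, nnh=11)
print(f"LHH = {unweighted:.2f} to 1 in favour of treatment")

# Hypothetical patient who values avoiding cognitive decline twice as
# highly as avoiding an adverse event.
weighted = likelihood_helped_vs_harmed(nnt=6, nnh=11, benefit_weight=2.0)
print(f"weighted LHH = {weighted:.2f} to 1")
```

The unweighted ratio is 11/6, or roughly 1.8, which the text rounds to about 2 to 1.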

Implementing EBM at a service level

Haynes (1999) suggested that the following stages are important in the implementation of a new treatment:

• Efficacy. Does the intervention work under carefully controlled (‘ideal’) conditions?

• Effectiveness. Does the intervention work when provided under the usual circumstances of healthcare practice?

• Efficiency. What is the effectiveness of the intervention in relation to the resources that it consumes (i.e. its cost-effectiveness or cost–benefit)?

Ideally, the full implementation of EBM would involve successful negotiation of all these stages, and only interventions that have satisfied the three criteria of efficacy, effectiveness, and efficiency would be used. In practice, many therapeutic interventions in psychiatry (particularly drug treatment and cognitive–behaviour therapy) are of proven efficacy, but there is often uncertainty about their effectiveness and efficiency. For example, lithium treatment is efficacious in the prophylaxis of bipolar disorder, but appears to have disappointing effectiveness, mainly because under standard clinical conditions relatively few patients take lithium reliably (Goodwin, 1999).

Clinical practice guidelines

In some medical fields there is a substantial amount of evidence of different kinds, but still considerable clinical uncertainty about the best therapeutic management. In this situation it may be worth developing clinical practice guidelines, which are explicitly evidence based.

Such guidelines are best developed in the following way.

• A guideline development group, made up of professionals from several disciplines together with patient representatives, decides on the precise clinical questions to be answered.

• The available evidence is systematically reviewed and classified according to the hierarchy shown in Table 6.1.

• The guideline development group makes recommendations, explicitly demonstrating how their recommendations are linked to the available evidence.

Clinical guidelines are best developed at a national level by appropriate professional organizations, but usually require modification to take local clinical conditions into account. Guidelines will only be effective if they are actively disseminated and implemented.

In the UK, the National Institute for Health and Clinical Excellence has taken a prominent role in analysing and promulgating evidence about therapeutic interventions in the form of national guidelines. The success of this process in changing clinical practice is not certain, and may depend on other factors such as the strength and stability of evidence, and cost issues (Freemantle, 2004). A further problem is that sometimes the evidence which is used in guideline development is based on a few trials whose relevance to the real world may be questionable. Nevertheless, workable guidelines need relatively definitive advice; this can lead to the issuing of rather arbitrary guidance together with a diminished probability that more informative studies will be carried out subsequently. Clinicians need to be aware of the key guidelines relating to the conditions that they treat, and should be able to justify pursuing different approaches where they have judged this to be necessary in individual cases.

Evaluation of evidence-based medicine

EBM needs to be evaluated through randomized trials of effectiveness as described above for the guidelines on treating depression. Individual practitioners can also evaluate their EBM performance by:

• auditing what proportion of their clinical decisions are evidence based

• recognizing gaps in practice that require a search for and appraisal of relevant evidence

• auditing the effectiveness of evidence-based practice changes.

In this way the process of EBM can become an integral part of continuing professional development and the audit cycle.

Other applications of evidence-based medicine

The foregoing account has focused on the use of EBM in the assessment of therapeutic interventions in psychiatry. Other applications of EBM include the assessment of evidence relating to diagnosis, prognosis, and aetiology. These applications require rather different methodologies from the randomized trials considered above, and diagnosis and prognosis will be discussed in the remainder of the chapter. Approaches to aetiology have been discussed in Chapter 5. All of these applications start with a focused question which, as with treatment-related questions, must:

• be directly relevant to the identified problem

• be constructed in a way that facilitates searching for a precise answer (Geddes, 1999; see Table 6.3).


Table 6.3 Common types of clinical question

If a practitioner is trying to judge the value of a particular study of a diagnostic test, they need to consider a number of questions.

• Was there an independent, blind comparison of the test with a diagnostic gold standard?

• Did the sample include the range of patients to whom the test is likely to be applied in clinical practice?

• What are the sensitivity and specificity of the test?

• Will it aid the management of my patients?


Question: How useful is the CAGE questionnaire in detecting problem drinking in medical and surgical inpatients?

The CAGE questionnaire is a simple four-item questionnaire designed to detect patients with alcohol misuse (see Chapter 17, p. 454). Sackett (1996) describes a study in a community-based teaching hospital in Boston where the CAGE questionnaire was administered to 518 patients. The gold standard to which the CAGE questionnaire was compared was an extensive social history and clinical examination supplemented by liver biopsy where indicated. We can therefore be reasonably confident that cases of alcohol misuse were reliably identified.

On clinical (‘gold standard’) grounds, 117 patients met the criteria for alcohol misuse or dependence. Of the 61 patients who scored positively on the CAGE questionnaire (scores of 3 or 4), 60 were found to have ‘gold standard’ evidence of alcohol misuse, so only one of the 401 patients without alcohol misuse had a positive result. The CAGE is therefore highly specific (see Figure 6.2). However, the remaining 57 patients with alcohol misuse were not identified by the CAGE, which detected only 60 of the 117 cases (about 51%). The CAGE therefore has only a modest sensitivity (see Figure 6.2).

These results show that the CAGE is a useful screening instrument for problem drinking in a general hospital setting, in that a positive response is highly predictive of alcohol problems. However, the test would have to be applied in the knowledge that a negative CAGE response does not rule out alcohol misuse, particularly if there is other evidence of problem drinking.
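The figures quoted for the CAGE study are enough to rebuild the full two-by-two table and derive the usual accuracy measures. The sketch below does this; the function and variable names are hypothetical, and the counts of false positives, false negatives, and true negatives are derived by subtraction from the totals given in the text rather than reported directly.

```python
def diagnostic_accuracy(tp, fp, fn, tn):
    """Return standard accuracy measures from the cells of a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),                 # cases detected
        "specificity": tn / (tn + fp),                 # non-cases cleared
        "positive_predictive_value": tp / (tp + fp),   # positives that are cases
        "negative_predictive_value": tn / (tn + fn),   # negatives that are non-cases
    }


# Figures quoted in the text.
total_patients = 518
with_misuse = 117        # 'gold standard' cases
cage_positive = 61       # CAGE score of 3 or 4
true_positive = 60       # CAGE positive and a 'gold standard' case

# Remaining cells, derived by subtraction.
false_positive = cage_positive - true_positive
false_negative = with_misuse - true_positive
true_negative = total_patients - with_misuse - false_positive

results = diagnostic_accuracy(true_positive, false_positive,
                              false_negative, true_negative)
for name, value in results.items():
    print(f"{name}: {value:.1%}")
```

On these numbers a positive CAGE is almost always correct, while about one in eight patients with a negative CAGE still has alcohol misuse, which matches the caution expressed above about negative results.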


Figure 6.2 The CAGE questions for screening for alcohol abuse/dependency. From Sackett DL (1996). Evaluation of clinical method. In DJ Weatherall, JGG Ledingham and DA Warrell, eds. Oxford Textbook of Medicine, 15–21. 3rd edn. Oxford University Press.

Studies relating to prognosis should be assessed by considering the following questions.

• Was a defined, representative sample of patients identified at a common point, early in the course of the disorder?

• Was the follow-up sufficiently long and was it complete?

• Were objective outcome criteria applied in a blind fashion?

• Are these follow-up data likely to apply to my own patients?

A common problem with prognostic studies is lack of complete follow-up. As a rule of thumb, less than 5% drop-out is ideal, and more than 20% makes the study of questionable validity. As with treatment trials, the applicability of the study will depend critically on the extent to which the patients in it resemble those whom the practitioner is considering.


Question: How much of the time during long-term follow-up do patients with bipolar disorder experience affective symptomatology, and what is the pattern of symptoms?

Judd et al. (2002) recruited 146 patients with a diagnosis of bipolar disorder and a current episode of major mood disturbance from five tertiary care centres in the USA. They were followed up with interviews every 6 months for the first 5 years, and annually thereafter. At interview, affective symptoms were elicited using Psychiatric Status Rating Scales which were linked to the Research Diagnostic Criteria (RDC). Affective symptoms that did not meet the criteria for RDC diagnosis were assigned to sub-syndromal categories of depression or mania.

The mean follow-up period was 14.2 years, and 93% of the subjects were followed up for more than 2 years. Patients were symptomatically ill for about half of the time. Overall, about 90% of the patients spent one or more weeks during follow-up with depressive symptoms, and almost the same number (86%) experienced at least one week of manic or hypomanic symptoms. However, depressive symptoms during follow-up (32% of follow-up weeks) were about three times as common as manic or hypomanic symptoms (9.3%). Most of these depressive states were classified as sub-syndromal depression or minor mood disorder, rather than major depression. At least one week of mood cycling or mixed affective states was noted in about 48% of patients.

This study suggests that patients with bipolar disorder who are referred to a tertiary centre with an episode of major mood disturbance will have some symptoms of mood disturbance for about half of the time over the next few years. Overall, depressive symptoms predominate, particularly minor and sub-syndromal depressive states. However, over time patients can experience considerable fluctuation in symptoms (both manic and depressive, or mixed). This study also has some aetiological implications, because it suggests that bipolar disorder and other milder depressive and hypomanic states are different expressions of the same underlying disorder. In terms of applicability, we would note that the patients in the study are tertiary referrals, so the findings might not apply, for example, to patients in primary care.

Qualitative research methods

Qualitative research methods are used to collect and analyse data that cannot be easily represented by numbers (see Brown and Lloyd, 2001). Although current evidence-based approaches in psychiatry have focused on the use of randomized controlled trials with quantitative end points, the use of qualitative approaches has a long history and encompasses, for example, the classificatory work of Kraepelin and the case studies of Freud. More modern case series, such as that of Russell (1979) describing bulimia nervosa, also rely on a qualitative approach.

The key differences between qualitative and quantitative research are summarized in Table 6.4.

When should qualitative methods be used?

There are a number of circumstances in which the application of qualitative methods is appropriate in psychiatric research and service development (Brown and Lloyd, 2001):

• in the initial stages of research, to conceptualize and clarify the relevant questions and to generate hypotheses

• to gather and collate the attitudes, beliefs, and experiences of service users, carers, and professionals

• in the development of assessment tools and rating scales

• to examine the use of evidence-based interventions in practice and to understand problems with their implementation.

From this it can be seen that qualitative and quantitative approaches are not conflicting, but have particular uses in defined situations. For example, Rogers et al. (2003) combined a qualitative methodology with a quantitative randomized trial, which aimed to improve the management of antipsychotic medication in patients with schizophrenia. The qualitative study indicated that the trial participants did not readily recall the details of the interventions to which they had been exposed. On the other hand, they valued the opportunity provided by the trial for greater communication and contact with professionals. This was associated with greater feelings of self-efficacy and clinical improvement.

Table 6.4 Key differences between quantitative and qualitative research


Evaluation of qualitative research

Like quantitative research, qualitative research needs to be relevant and valid. Similar general principles apply to the processes of participant selection, which should be clear and justified. It is important to be explicit about the reasons for choosing a qualitative approach, and to give a clear description of the methods of data collection and analysis. The concept of permeability, or the extent to which observations have influenced understanding, is important in qualitative research. The following factors influence study permeability:

• The degree of engagement with the material. How far were theories generated by direct contact (e.g. through interviews, naturalistic observation, and familiarity with textual sources)?

• Iteration. Did the investigators continually reformulate and re-examine their interpretations in the light of continuing observations?

• Grounding. Were the procedures for linking interpretations with observations clearly presented, and were illustrative examples given?

• Context. Were the values and expectations of the investigator disclosed? Was the cultural context of the research and its meaning to the participants made explicit?


The concept of validity in qualitative research refers to the soundness and reasoning of interpretation, rather than comparison with an objective external criterion. Validity also differs according to role. For example, readers of a study will look for coherence and internal consistency of material, whereas participants must feel that their experiences are accurately described by the interpretations. Finally, the authors of the research need to take into account the effect that the research process itself might have on the data which they have collected (‘reflexivity’).

Assessment of qualitative research is not always straightforward, and the specialist terminology may defeat general readers. Brown and Lloyd (2001) have pointed out the lack of utility of evaluative checklists whose terminology is not readily understood by health service researchers. For a further discussion of the evaluation of qualitative research, see Greenhalgh (2010).

Evidence-based medicine in psychiatry

Practising according to evidence-based principles is an important aspect of modern psychiatry, and is followed as far as possible in this textbook. EBM is probably best viewed as an ‘approach’ whereby the key tasks for the practitioner are to formulate a relevant clinical question and then identify the best evidence with which to answer it. In addition, EBM provides an important means by which psychiatrists can work in a way that improves both consistency and quality of treatment. However, clinicians work with individual patients with particular needs in specific settings. Therefore EBM needs to be tempered with the clinical skills that an expert practitioner brings to each clinical encounter.

Further reading

Greenhalgh T (2010). How to Read a Paper. Wiley Blackwell, Oxford. (A concise handbook that provides a clear exposition of the principles of EBM and their implementation.)