June Raine
Synopsis
This chapter is about evidence-based drug therapy.
New drugs are progressively introduced by clinical pharmacological studies in rising numbers of healthy and/or patient volunteers until sufficient information has been gained to justify formal therapeutic studies. Each of these is usually a randomised controlled trial, in which a precisely framed question is posed and answered by treating equivalent groups of patients in different ways.
The key to the ethics of such studies is informed consent from patients, efficient scientific design and review by an independent research ethics committee. The key interpretative factors in the analysis of trial results are calculations of confidence intervals and statistical significance. Potential clinical significance is then assessed within the confines of controlled clinical trials. It is best expressed by stating not only the percentage difference, but also the absolute difference or its reciprocal, the number of patients who have to be treated to obtain one desired outcome. The outcome may encompass both efficacy and safety.
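The arithmetic behind the absolute difference and the number needed to treat (NNT) can be sketched in a few lines (a minimal illustration using hypothetical event rates, not data from any particular trial):

```python
# Hypothetical example: the undesired outcome occurs in 20% of control
# patients and 15% of treated patients.
control_rate = 0.20
treated_rate = 0.15

relative_risk_reduction = (control_rate - treated_rate) / control_rate  # 25%
absolute_risk_reduction = control_rate - treated_rate                   # 5 percentage points
number_needed_to_treat = 1 / absolute_risk_reduction                    # 20 patients

print(f"RRR = {relative_risk_reduction:.0%}")  # 25%
print(f"ARR = {absolute_risk_reduction:.0%}")  # 5%
print(f"NNT = {number_needed_to_treat:.0f}")   # 20
```

Note how the percentage (relative) difference of 25% sounds far more impressive than the absolute difference of 5 percentage points, which is why quoting both, or the NNT, gives a fairer picture.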
Surveillance studies and the reporting of spontaneous adverse reactions respectively determine the clinical profile of the drug and detect rare adverse events. Further trials to compare new medicines with existing medicines are also required. These form the basis of cost–effectiveness comparisons.
Topics include:
• Experimental therapeutics.
• Ethics of research.
• Rational introduction of a new drug.
• Need for statistics.
• Types of trial: design, size.
• Meta-analysis.
• Pharmacoepidemiology.
Experimental therapeutics
After preclinical evidence of efficacy and safety has been obtained in animals, potential medicines are tested in healthy volunteers and volunteer patients. Studies in healthy normal volunteers can help to determine the safety, tolerability, pharmacokinetics and, for some drugs (e.g. anticoagulants and anaesthetic agents), their dynamic effect and likely dose range. For many drugs, the dynamic effect and hence therapeutic potential can be investigated only in patients, e.g. drugs for parkinsonism and antimicrobials will have no measurable efficacy in subjects without movement disorder or infection, respectively.
Modern medicine is sometimes accused of callous application of science to human problems and of subordinating the interests of the individual to those of the group (society).1 Official regulatory bodies rightly require scientific evaluation of drugs. Drug developers need to satisfy the official regulators and they also seek to persuade the medical profession to prescribe their products. Patients, too, are more aware of the comparative advantages and limitations of their medicines than they used to be. To some extent, this helps encourage patients to participate in trials so that future patients can benefit, as they do now, from the knowledge gained from such trials. An ethical framework is required to ensure that the interests of the individual participant take precedence over those of society (and, more obviously, those of an individual or corporate investigator).
Research involving human subjects
The definition of research continues to present difficulties. The distinction between medical research and innovative medical practice derives from the intent. In medical practice the sole intention is to benefit the individual patient consulting the clinician, not to gain knowledge of general benefit, though such knowledge may incidentally emerge from the clinical experience gained. In medical research the primary intention is to advance knowledge so that patients in general may benefit; the individual patient may or may not benefit directly.2
Consider also the process of audit, which is used extensively to assess performance, e.g. by individual health-care workers, by departments within hospitals or between hospitals. Audit is a systematic examination designed to determine the degree to which an action or set of actions achieves predetermined standards. Whereas research seeks to address ‘known unknowns’ and often discovers ‘unknown unknowns’,3 audit is limited to the monitoring of ‘known knowns’: this may be important, but it is clearly limited.
Ethics of research in humans4
Some dislike the word ‘experiment’ in relation to humans, thinking that its mere use implies a degree of impropriety in what is done. It is better that all should recognise from the true meaning of the word, ‘to ascertain or establish by trial’,5 that the benefits of modern medicine derive almost wholly from experimentation and that some risk is inseparable from much medical advance.
The issue of (adequately informed) consent is a principal concern for Research Ethics Committees (also called Institutional Review Boards). People have the right to choose for themselves whether or not they will participate in research, i.e. they have the right to self-determination (the ethical principle of autonomy). They should be given whatever information is necessary for making an adequately informed choice (consent) with the right to withdraw at any stage. Consent procedures, especially information on risks, loom larger in research than they do in medical practice. This is appropriate given that in research, patients may be submitting themselves to extra risks, or simply to extra inconvenience (e.g. more or longer visits). It is a moot point whether more consent in routine practice might not go amiss. It is also likely that patients participating in well-conducted trials receive more, and sometimes better, care and attention than might otherwise be available. Sometimes the unintended consequences of ethical procedures include causing unnecessary apprehension to patients with long, legalistic documents, and creating a false impression of clinical researchers as people from whom patients need protection.
The moral obligation of all doctors lies in ensuring that in their desire to help patients (the ethical principle of beneficence) they should never allow themselves to put the individual who has sought their aid at any disadvantage (the ethical principle of non-maleficence) for ‘the scientist or physician has no right to choose martyrs for society’.6
In principle, it may be thought proper to perform a therapeutic trial only when doctors (and patients) have genuine uncertainty as to which treatment is best.7 Not all trials are comparisons of different treatments. Some, especially early phase trials of new drugs, are comparisons of different doses. Comparisons of new with old should usually offer patients the chance of receiving current best treatment with one which might be better. Since this is often rather more than is offered in resource-constrained routine care, the obligatory patient information sheet mantra that ‘the decision whether to take part has no bearing on your usual care’ may be economical with the truth. But it is also simplistic to view the main purpose of all trials with medicines as comparative.
The past decade has seen the pharmaceutical industry struggle to match the pace of new understanding about disease pathogenesis, and models of research are being adapted to the complexity of common disease that is now apparent. In diseases where many good medicines already exist, the industry spent much time developing minor modifications which were broadly equivalent to current therapy with possible advantages for some patients. With many of the standard blockbusters now off patent, new drugs for such diseases are unattractive, and the industry is concentrating more on harder therapeutic targets where no satisfactory treatment yet exists. Just as in basic science, non-hypothesis-led ‘fishing expedition’ research – genome scans, microarrays – is no longer frowned upon, so the imaginative clinical investigator must throw his stone – a new medicine – into the pond, and be able to make sense of the ripples. One such approach is to move away from trial design in which it is the average response of the group which is of interest towards the design in which the investigator attempts to match differences in response to differences – ethnic, gender, genetic – among the patients. Matches at a molecular level give clues both to how the drug may best be used, and who will benefit most.
The ethics of the randomised and placebo-controlled trial
Providing that ethical surveillance is rooted in the ethical principles of justice,8 there should be no difficulty in clinical research adapting to current needs. And even if the nature of early phase research is changing, the randomised controlled trial will remain the cornerstone of how cause and effect is proven in clinical practice, and how drugs demonstrate the required degree of efficacy and safety to obtain a licence for their prescription.
The use of a placebo (or dummy) raises both ethical and scientific issues (see placebo medicines and the placebo effect, Ch. 2). There are clear-cut cases when placebo use would be ethically unacceptable and scientifically unnecessary, e.g. drug trials in epilepsy and tuberculosis, when the control groups comprise patients receiving the best available therapy.
The pharmacologically inert (placebo) treatment arm of a trial is useful:
• To distinguish the pharmacodynamic effects of a drug from the psychological effects of the act of medication and the circumstances surrounding it, e.g. increased interest by the doctor, more frequent visits, for these latter may have their placebo effect. Placebo responses have been reported in 30–50% of patients with depression and in 30–80% with chronic stable angina pectoris.
• To distinguish drug effects from natural fluctuations in disease that occur with time, e.g. with asthma or hay fever, and other external factors, provided active treatment, if any, can be ethically withheld. This is also called the ‘assay sensitivity’ of the trial.
• To avoid false conclusions. The use of placebos is valuable in Phase 1 healthy volunteer studies of novel drugs to help determine whether minor but frequently reported adverse events are drug related or not. Although a placebo treatment can pose ethical problems, it is often preferable to the continued use of treatments of unproven efficacy or safety. The ethical dilemma of subjects suffering as a result of receiving a placebo (or ineffective drug) can be overcome by designing clinical trials that provide mechanisms to allow them to be withdrawn (‘escape’) when defined criteria are reached, e.g. blood pressure above levels that represent treatment failure. Similarly, placebo (or new drug) can be added against a background of established therapy; this is called the ‘add on’ design.
• To provide a result using fewer research subjects. The difference in response when a test drug is compared with a placebo is likely to be greater than that when a test drug is compared with the best current, i.e. active, therapy (see later).
Investigators who propose to use a placebo, or otherwise withhold effective treatment, should justify their intention. The variables to consider are:
• The severity of the disease.
• The effectiveness of standard therapy.
• Whether the novel drug under test aims to give only symptomatic relief, or has the potential to prevent or slow up an irreversible event, e.g. stroke or myocardial infarction.
• The length of treatment.
• The objective of the trial (equivalence, superiority or non-inferiority; see p. 45). Thus it may be quite ethical to compare a novel analgesic against placebo for 2 weeks in the treatment of osteoarthritis of the hip (with escape analgesics available). It would not be ethical to use a placebo alone as comparator in a 6-month trial of a novel drug in active rheumatoid arthritis, even with escape analgesia.
The precise use of the placebo will depend on the study design, e.g. whether crossover, when all patients receive placebo at some point in the trial, or parallel group, when only one cohort receives placebo. Generally, patients easily understand the concept of distinguishing between the imagined effects of treatment and those due to a direct action on the body. Provided research subjects are properly informed and give consent freely, they are not the subject of deception in any ethical sense; but a patient given a placebo in the absence of consent is deceived and research ethics committees will, rightly, decline to agree to this. (See also: Lewis et al (2002) in Guide to further reading, at the end of this chapter.)
Injury to research subjects9
The question of compensation for accidental (physical) injury due to participation in research is a vexed one. Plainly there are substantial differences between the position of healthy volunteers (whether or not they are paid) and that of patients who may benefit and, in some cases, who may be prepared to accept even serious risk for the chance of gain. There is no simple answer. But the topic must always be addressed in any research carrying risk, including the risk of withholding known effective treatment. The CIOMS/WHO Guidelines4 state:
Research subjects who suffer physical injury as a result of their participation are entitled to such financial or other assistance as would compensate them equitably for any temporary or permanent impairment or disability. In the case of death, their dependants are entitled to material compensation. The right to compensation may not be waived.
Therefore, when giving their informed consent to participate, research subjects should be told whether there is provision for compensation in case of physical injury, and the circumstances in which they or their dependants would receive it.
Payment of subjects in clinical trials
Healthy volunteers are usually paid to take part in a clinical trial. The rationale is that they will not benefit from treatment received and should be compensated for discomfort and inconvenience. There is a fine dividing line between this and a financial inducement, but it is unlikely that more than a small minority of healthy volunteer studies would now take place without a ‘fee for service’ provision, including ‘out of pocket’ expenses. It is all the more important that the sums involved are commensurate with the invasiveness of the investigations and the length of the studies. The monies should be declared and agreed by the ethics committee.
There is an intuitive aversion among physicians to paying patients (compared with healthy volunteers), because they feel the accusation of inducement or persuasion could be levelled at them, and because they assuage any feeling of taking advantage of the doctor–patient relationship by the hope that the medicines under test may be of benefit to the individual. This is not an entirely comfortable position.10
Rational introduction of a new drug to humans
When studies in animals predict that a new molecule may be a useful medicine, i.e. effective and safe in relation to its benefits, then the time has come to put it to the test in humans. Most doctors will be involved in clinical trials at some stage of their career and need to understand the principles of drug development. When a new chemical entity offers a possibility of doing something that has not been done before or of doing something familiar in a different or better way, it can be seen to be worth testing. But where it is a new member of a familiar class of drug, potential advantage may be harder to detect. Yet these ‘me too’ drugs are often worth testing. Prediction from animal studies of modest but useful clinical advantage is particularly uncertain and, therefore, if the new drug seems reasonably effective and safe in animals it is rational to test it in humans. From the commercial standpoint, the investment in the development of a new drug can be over £500 million, but will be substantially less for a ‘me too’ drug entering an already developed and profitable market.
Phases of clinical development
Human experiments progress in a commonsense manner that is conventionally divided into four phases (Fig. 4.1). These phases are divisions of convenience in what is a continuous expanding process. It begins with a small number of subjects (healthy subjects and volunteer patients) closely observed in laboratory settings, and proceeds through hundreds of patients, to thousands before the drug is agreed to be a medicine by a national or international regulatory authority. It is then licensed for general prescribing (though this is by no means the end of the evaluation). The process may be abandoned at any stage for a variety of reasons, including poor tolerability or safety, inadequate efficacy and commercial pressures. The phases are:
• Phase 1. Human pharmacology (20–50 subjects):
healthy volunteers or volunteer patients, according to the class of drug and its safety
pharmacokinetics (absorption, distribution, metabolism, excretion)
pharmacodynamics (biological effects) where practicable, tolerability, safety, efficacy.
• Phase 2. Therapeutic exploration (50–300 subjects):
patients
pharmacokinetics and pharmacodynamic dose ranging, in carefully controlled studies for efficacy and safety,11 which may involve comparison with placebo.
• Phase 3. Therapeutic confirmation (randomised controlled trials; 250–1000 + subjects):
patients
efficacy on a substantial scale; safety; comparison with existing drugs.
• Phase 4. Therapeutic use (pharmacovigilance, post-licensing studies) (2000–10 000 + subjects):
surveillance for safety and efficacy: further formal therapeutic trials, especially comparisons with other drugs, marketing studies and pharmacoeconomic studies.
Fig. 4.1 The phases of drug discovery and development.
(With permission of Pharmaceutical Research and Manufacturers of America.)
Official regulatory guidelines and requirements12
For studies in humans (see also Ch. 6) these ordinarily include:
• Studies of pharmacokinetics and bioavailability and, in the case of generics, bioequivalence (equal bioavailability) with respect to the reference product.
• Therapeutic trials (reported in detail) that substantiate the safety and efficacy of the drug under likely conditions of use, e.g. a drug for long-term use in a common condition will require a total of at least 1000 patients (preferably more), depending on the therapeutic class, of which (for chronic diseases) at least 100 have been treated continuously for about 1 year.
• Special groups. If the drug will be used in, for example, the elderly or children, then these populations should be studied. New drugs are not normally studied in pregnant women. Studies in patients having disease that affects drug metabolism and elimination may be needed, such as patients with impaired liver or kidney function.
• Fixed-dose combination products will require explicit justification for each component.
• Interaction studies with other drugs likely to be taken simultaneously. Plainly, all possible combinations cannot be evaluated; a rational choice, based on knowledge of pharmacodynamics and pharmacokinetics, is made.
• The application for a licence for general use (marketing application) should include a draft Summary of Product Characteristics for prescribers. A Patient Information Leaflet must be submitted. These should include information on the form of the product (e.g. tablet, capsule, sustained-release, liquid), its uses, dosage (adults, children, elderly where appropriate), contraindications, warnings and precautions (less strong), side-effects/adverse reactions, overdose and how to treat it.
The emerging discipline of pharmacogenomics seeks to identify patients who will respond beneficially or adversely to a new drug by defining certain genotypic profiles. Individualised dosing regimens may be evolved as a result (see p. 101). This tailoring of drugs to individuals is consuming huge resources from drug developers but has yet to establish a place in routine drug development.
Therapeutic investigations
There are three key questions to be answered during drug development:
• Does it work?
• Is it safe?
• What is the dose?
With few exceptions, none of these is easy to answer definitively within the confines of a pre-registration clinical trials programme. Effectiveness and safety have to be balanced against each other. What may be regarded as ‘safe’ for a new oncology drug in advanced lung cancer would not be so regarded in the treatment of childhood eczema. The use of the term ‘dose’, without explanation, is irrational as it implies a single dose for all patients. Pharmaceutical companies cannot be expected to produce a large array of different doses for each medicine, but the maxim to use the smallest effective dose that results in the desired effect holds true. Some drugs require titration, others have a wide safety margin so that one ‘high’ dose may achieve optimal effectiveness with acceptable safety. There are two classes of endpoint or outcome of a therapeutic investigation:
• The therapeutic effect itself (sleep, eradication of infection), i.e. the outcome.
• A surrogate effect, a short-term effect that can be reliably correlated with long-term therapeutic benefit, e.g. blood lipids or glucose or blood pressure. A surrogate endpoint might also be a pharmacokinetic parameter, if it is indicative of the therapeutic effect, e.g. plasma concentration of an antiepileptic drug.
Use of surrogate effects presupposes that the disease process is fully understood. They are best justified in diseases for which the true therapeutic effect can be measured only by studying large numbers of patients over many years. Such long-term outcome studies are indeed always preferable but may be impracticable on organisational, financial and sometimes ethical grounds prior to releasing new drugs for general prescription. It is in areas such as these that the techniques of large-scale surveillance for efficacy, as well as for safety, under conditions of ordinary use (below), would be needed to supplement the necessarily smaller and shorter formal therapeutic trials employing surrogate effects. Surrogate endpoints are of particular value in early drug development to select candidate drugs from a range of agents.
Therapeutic evaluation
The aims of therapeutic evaluation are three-fold:
1. To assess the efficacy, safety and quality of new drugs to meet unmet clinical needs.
2. To expand the indications for the use of current drugs (or generic drugs13) in clinical and marketing terms.
3. To protect public health over the lifetime of a given drug.
The process of therapeutic evaluation may be divided into pre- and post-registration phases (Table 4.1), the purposes of which are set out below.
Table 4.1 Process of therapeutic evaluation
When a new drug is being developed, the first therapeutic trials are devised to find out the best that the drug can do under conditions ideal for showing efficacy, e.g. uncomplicated disease of mild to moderate severity in patients taking no other drugs, with carefully supervised administration by specialist doctors. Interest lies particularly in patients who complete a full course of treatment. If the drug is ineffective in these circumstances there is no point in proceeding with an expensive development programme. Such studies are sometimes called explanatory trials as they attempt to ‘explain’ why a drug works (or fails to work) in ideal conditions.
If the drug is found useful in these trials, it becomes desirable next to find out how closely the ideal may be approached in the rough and tumble of routine medical practice: in patients of all ages, at all stages of disease, with complications, taking other drugs and relatively unsupervised. Interest continues in all patients from the moment they are entered into the trial and it is maintained if they fail to complete, or even to start, the treatment; the need is to know the outcome in all patients deemed suitable for therapy, not only in those who successfully complete therapy.14
The reason some drop out may be related to aspects of the treatment and it is usual to analyse these according to the clinicians' initial intention (intention-to-treat analysis), i.e. investigators are not allowed to risk introducing bias by exercising their own judgement as to who should or should not be excluded from the analysis. In these real-life, or ‘naturalistic’, conditions the drug may not perform so well, e.g. minor adverse effects may now cause patient non-compliance, which had been avoided by supervision and enthusiasm in the early trials. These naturalistic studies are sometimes called ‘pragmatic’ trials.
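The contrast between intention-to-treat and per-protocol (completers-only) analysis can be sketched as follows (all records and rates are invented for illustration):

```python
# Hypothetical trial records: the group assigned at randomisation, whether
# the patient completed treatment, and whether the outcome (success) occurred.
patients = [
    {"group": "new", "completed": True,  "success": True},
    {"group": "new", "completed": True,  "success": True},
    {"group": "new", "completed": False, "success": False},  # dropped out
    {"group": "new", "completed": True,  "success": False},
]

def success_rate(records):
    return sum(r["success"] for r in records) / len(records)

# Intention-to-treat: every randomised patient is analysed in the group to
# which they were allocated, whether or not they started or completed treatment.
itt = success_rate([p for p in patients if p["group"] == "new"])

# Per-protocol: only completers are analysed - prone to bias, because
# drop-out may itself be related to the treatment.
pp = success_rate([p for p in patients if p["group"] == "new" and p["completed"]])

print(f"Intention-to-treat success rate: {itt:.0%}")  # 50%
print(f"Per-protocol success rate: {pp:.0%}")         # 67%
```

Excluding the drop-out flatters the drug; the intention-to-treat figure is the one that reflects the outcome for all patients deemed suitable for therapy.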
The methods used to test the therapeutic value depend on the stage of development, who is conducting the study (a pharmaceutical company, or an academic body or health service at the behest of a regulatory authority), and the primary endpoint or outcome of the trial. The methods include:
• Formal therapeutic trials.
• Equivalence and non-inferiority trials.
• Safety surveillance methods.
Formal therapeutic trials are conducted during Phase 2 and Phase 3 of pre-registration development, and in the post-registration phase to test the drug in new indications. Equivalence trials aim to show the therapeutic equivalence of two treatments, usually the new drug under development and an existing drug used as a standard active comparator. Equivalence trials may be conducted before or after registration for the first therapeutic indication of the new drug (see p. 46 below for further discussion). Safety surveillance methods use the principles of pharmacoepidemiology (see p. 51) and are concerned mainly with evaluating adverse events and especially rare events, which formal therapeutic trials are unlikely to detect.
Need for statistics
In order truly to know whether patients treated in one way are benefited more than those treated in another, it is essential to use numbers. Statistics has been defined as ‘a body of methods for making wise decisions in the face of uncertainty’.15 Used properly, statistical methods are tools of great value for promoting efficient therapy. More than 100 years ago Francis Galton saw this clearly:
The human mind is … a most imperfect apparatus for the elaboration of general ideas … In our general impressions far too great weight is attached to what is marvellous … Experience warns us against it, and the scientific man takes care to base his conclusions upon actual numbers … to devise tests by which the value of beliefs may be ascertained.16
Concepts and terms
Hypothesis of no difference
When it is suspected that treatment A may be superior to treatment B, and the truth is sought, it is convenient to start with the proposition that the treatments are equally effective – the ‘no difference’ hypothesis (null hypothesis). After two groups of patients have been treated and it has been found that improvement has occurred more often with one treatment than with the other, it is necessary to decide how likely it is that this difference is due to a real superiority of one treatment over the other.
To make this decision we need to understand two major concepts, statistical significance and confidence intervals.
A statistical significance test17
such as the Student's t-test or the chi-squared (χ2) test will tell how often an observed difference (or a larger one) would occur by chance (random influences) if there is, in reality, no difference between the treatments. Where the statistical significance test shows that an observed difference would occur only five times in 100 repetitions of the experiment, this is often taken as sufficient evidence that the null hypothesis is unlikely to be true. Therefore, the conclusion is that there is (probably) a real difference between the treatments. This level of probability is generally expressed in therapeutic trials as: ‘the difference was statistically significant’, or ‘significant at the 5% level’ or ‘P = 0.05’ (P is the probability based on chance alone). Statistical significance simply means that the result is unlikely to have occurred if there was no genuine treatment difference, i.e. there probably is a difference.
If the analysis reveals that the observed difference, or greater, would occur only once if the experiment were repeated 100 times, the results are generally said to be ‘statistically highly significant’, or ‘significant at the 1% level’ or ‘P = 0.01’.
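As an illustration, a chi-squared test on a hypothetical 2 × 2 trial result might be run as follows (the counts are invented; SciPy applies Yates' continuity correction by default for 2 × 2 tables):

```python
from scipy.stats import chi2_contingency

# Hypothetical trial result: improved vs not improved on each treatment.
#         improved  not improved
table = [[60, 40],   # treatment A (100 patients)
         [45, 55]]   # treatment B (100 patients)

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi-squared = {chi2:.2f}, df = {dof}, P = {p_value:.3f}")

# P < 0.05 here: under the null hypothesis of no treatment difference, a
# difference this large would arise by chance fewer than 5 times in 100
# repetitions, so the difference is 'significant at the 5% level'.
assert p_value < 0.05
```

Note that P just below 0.05 says nothing about the size of the difference; for that, a confidence interval (next section) is needed.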
Confidence intervals
The problem with the P value is that it conveys no information on the amount of the differences observed or on the range of possible differences between treatments. A result that a drug produces a uniform 2% reduction in heart rate may well be statistically significant but it is clinically meaningless. What doctors are interested to know is the size of the difference, and what degree of assurance (confidence) they may have in the precision (reproducibility) of this estimate. To obtain this it is necessary to calculate a confidence interval (see Figs 4.2 and 4.3).18
Fig. 4.2 Relationship between significance tests and confidence intervals for the comparisons between a new treatment and control. The treatment differences a, b, c are all in favour of ‘New treatment’, but superiority is shown only in A and B. In C, superiority has not been shown. This may be because the effect is small and not detected. The result, nevertheless, is compatible with equivalence or non-inferiority. Adequate precision and power are assumed for all the trials.
Fig. 4.3 Power curves – an illustrative method of defining the number of subjects required in a given study. In practice, the actual number would be calculated from standard equations. In this example the curves are constructed for 16, 40, 100 and 250 subjects per group in a two-limb comparative trial. The graphs can provide three pieces of information: (1) the number of subjects that need to be studied, given the power of the trial and the difference expected between the two treatments; (2) the power of a trial, given the number of subjects included and the difference expected; and (3) the difference that can be detected between two groups of subjects of given number, with varying degrees of power. Also see p. 48.
(With permission from Baber N, Smith R N, Griffin J P, O'Grady J, D'Arcy P F (eds) 1998 Textbook of Pharmaceutical Medicine, 3rd edn. Queen's University of Belfast Press, Belfast.)
A confidence interval expresses a range of values that contains the true value with 95% (or other chosen percentage) certainty. The range may be broad, indicating uncertainty, or narrow, indicating (relative) certainty. A wide confidence interval occurs when numbers are small or differences observed are variable and points to a lack of information, whether the difference is statistically significant or not; it is a warning against placing much weight on (or confidence in) the results of small or variable studies. Confidence intervals are extremely helpful in interpretation, particularly of small studies, as they show the degree of uncertainty related to a result. Their use in conjunction with non-significant results may be especially enlightening.19
A finding of ‘not statistically significant’ can be interpreted as meaning there is no clinically useful difference only if the confidence intervals for the results are also stated in the report and are narrow. If the confidence intervals are wide, a real difference may be missed in a trial with a small number of subjects, i.e. the absence of evidence that there is a difference is not the same as showing that there is no difference. Small numbers of patients inevitably give low precision and low power to detect differences.
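A 95% confidence interval for the difference between two proportions can be sketched with the normal approximation (hypothetical counts; a trial statistician would typically use a more refined method):

```python
from math import sqrt

# Hypothetical results: 60/100 improved on the new treatment, 45/100 on control.
p1, n1 = 60 / 100, 100
p2, n2 = 45 / 100, 100

diff = p1 - p2
# Standard error of the difference between two independent proportions.
se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
z = 1.96  # normal-distribution multiplier for a 95% confidence interval

lower, upper = diff - z * se, diff + z * se
print(f"difference = {diff:.0%}, 95% CI {lower:.1%} to {upper:.1%}")
```

With these numbers the interval runs from about 1% to 29%: it excludes zero (so the difference is statistically significant at the 5% level), but its width is a warning that 100 patients per group estimate the true difference only imprecisely.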
Types of error
The above discussion provides us with information on the likelihood of falling into one of the two principal kinds of error in therapeutic experiments, for the hypothesis that there is no difference between treatments may either be accepted incorrectly or rejected incorrectly.
Type I error
(α) is the finding of a difference between treatments when in reality they do not differ, i.e. rejecting the null hypothesis incorrectly. Investigators decide the degree of this error which they are prepared to tolerate on a scale in which 0 indicates complete rejection of the null hypothesis and 1 indicates its complete acceptance; clearly the level for α must be set near to 0. This is the same as the significance level of the statistical test used to detect a difference between treatments. Thus α = 0.05 (or P = 0.05) indicates that the investigators will accept a 5% chance that an observed difference is not a real difference.
Type II error
(β) is the finding of no difference between treatments when in reality they do differ, i.e. accepting the null hypothesis incorrectly. The probability of making this error is often given wider limits, e.g. β = 0.1–0.2, which indicates that the investigators are willing to accept a 10–20% chance of missing a real effect. Conversely, the power of the study (1 − β) is the probability of avoiding this error and detecting a real difference, in this case 80–90%.
It is up to the investigators to decide the target difference20 and what probability level (for either type of error) they will accept if they are to use the result as a guide to action.
Plainly, trials should be devised to have adequate precision and power, both of which are consequences of the size of study. It is also necessary to make an estimate of the likely size of the difference between treatments, i.e. the target difference. Adequate power is often defined as giving an 80–90% chance of detecting (at 1–5% statistical significance, P = 0.01–0.05) the defined useful target difference (say 15%). It is rarely worth starting a trial that has less than a 50% chance of achieving the set objective, because the power of the trial is too low.
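The relationship between the standardised difference, group size and power can be sketched numerically. Assuming a two-sided comparison of means at α = 0.05 and using the normal approximation, the hypothetical function below estimates the power (1 − β); it is a rough textbook approximation, not a substitute for proper power software.

```python
import math

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def power_two_sample(std_diff, n_per_group):
    """Approximate power of a two-sided, two-sample comparison of means
    at alpha = 0.05 (z = 1.96), for a given standardised difference
    (target difference divided by standard deviation)."""
    z_alpha = 1.96
    return normal_cdf(std_diff * math.sqrt(n_per_group / 2) - z_alpha)

# A standardised difference of 0.5 with 64 subjects per group gives
# roughly the conventional 80% power.
print(round(power_two_sample(0.5, 64), 2))  # → 0.81
```

Halving the group size drops the power well below the 50% threshold at which, as noted above, a trial is rarely worth starting.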
Types of therapeutic trial
A therapeutic trial is:
a carefully, and ethically, designed experiment with the aim of answering some precisely framed question. In its most rigorous form it demands equivalent groups of patients concurrently treated in different ways or in randomised sequential order in crossover designs. These groups are constructed by the random allocation of patients to one or other treatment … In principle the method has application with any disease and any treatment. It may also be applied on any scale; it does not necessarily demand large numbers of patients.21
This is the classical randomised controlled trial (RCT), the most secure method for drawing a causal inference about the effects of treatments. Randomisation attempts to control biases of various kinds when assessing the effects of treatments. RCTs are employed at all phases of drug development and in the various types and designs of trials discussed below. Fundamental to any trial are:
• A hypothesis.
• The definition of the primary endpoint.
• The method of analysis.
• A protocol.
Other factors to consider when designing or critically appraising a trial are:
• The characteristics of the patients.
• The general applicability of the results.
• The size of the trial.
• The method of monitoring.
• The use of interim analyses.22
• The interpretation of subgroup comparisons.
The aims of a therapeutic trial, not all of which can be attempted at any one occasion, are to decide:
• Whether a treatment is effective.
• The magnitude of that effect (compared with other remedies – or doses, or placebo).
• The types of patients in whom it is effective.
• The best method of applying the treatment (how often, and in what dosage if it is a drug).
• The disadvantages and dangers of the treatment.
Dose–response trials
Response in relation to the dose of a new investigational drug may be explored in all phases of drug development. Dose–response trials serve a number of objectives, of which the following are of particular importance:
• Confirmation of efficacy (hence a therapeutic trial).
• Investigation of the shape and location of the dose–response curve.
• The estimation of an appropriate starting dose.
• The identification of optimal strategies for individual dose adjustments.
• The determination of a maximal dose beyond which additional benefit is unlikely to occur.
Superiority, equivalence and non-inferiority in clinical trials
The therapeutic efficacy of a novel drug is most convincingly established by demonstrating superiority to placebo, or to an active control treatment, or by demonstrating a dose–response relationship (as above).
In some cases the purpose of a comparison is to show not necessarily superiority, but either equivalence or non-inferiority. Such trials avoid the use of placebo, explore possible advantages of safety, dosing convenience and cost, and present an alternative or ‘second-line’ therapy. Examples of a possible outcome in a ‘head to head’ comparison of two active treatments appear in Figure 4.2.
There are, in general, two types of equivalence trial in clinical development: bioequivalence and clinical equivalence. In the former, certain pharmacokinetic variables of a new formulation have to fall within specified (and regulated) margins of those of the standard formulation of the same active entity. The advantage of this type of trial is that, if bioequivalence is ‘proven’, then proof of clinical equivalence is not required.
Design of trials
Techniques to avoid bias
The two most important techniques are:
• Randomisation.
• Blinding.
Randomisation
Introduces a deliberate element of chance into the assignment of treatments to the subjects in a clinical trial. It provides a sound statistical basis for the evaluation of the evidence relating to treatment effects, and tends to produce treatment groups that have a balanced distribution of prognostic factors, both known and unknown. Together with blinding, it helps to avoid possible bias in the selection and allocation of subjects.
Randomisation may be accomplished in simple or more complex ways, such as:
• Sequential assignments of treatments (or sequences in crossover trials).
• Randomising subjects in blocks. This helps to increase comparability of the treatment groups when subject characteristics change over time or there is a change in recruitment policy. It also gives a better guarantee that the treatment groups will be of nearly equal size.
• By dynamic allocation, in which treatment allocation is influenced by the current balance of allocated treatments.23
Blinding
The fact that both doctors and patients are subject to bias due to their beliefs and feelings has led to the invention of the double-blind technique, which is a control device to prevent bias from influencing results. On the one hand, it rules out the effects of hopes and anxieties of the patient by giving both the drug under investigation and a placebo (dummy) of identical appearance in such a way that the subject (the first ‘blind’ person) does not know which he or she is receiving. On the other hand, it also rules out the influence of preconceived hopes of, and unconscious communication by, the investigator or observer by keeping him or her (the second ‘blind’ person) ignorant of whether he or she is prescribing a placebo or an active drug. At the same time, the technique provides another control, a means of comparison with the magnitude of placebo effects. The device is both philosophically and practically sound.24
A non-blind trial is called an open trial.
The double-blind technique should be used wherever possible, and especially for occasions when it might at first sight seem that criteria of clinical improvement are objective when in fact they are not. For example, the range of voluntary joint movement in rheumatoid arthritis has been shown to be influenced greatly by psychological factors, and a moment's thought shows why, for the amount of pain patients will put up with is influenced by their mental state.
Blinding should go beyond the observer and the observed. None of the investigators should be aware of treatment allocation, including those who evaluate endpoints, assess compliance with the protocol and monitor adverse events. Breaking the blind (for a single subject) should be considered only when the subject's physician deems knowledge of the treatment assignment essential in the subject's best interests.
Sometimes the double-blind technique is not possible, because, for example, side-effects of an active drug reveal which patients are taking it or tablets look or taste different; but it never carries a disadvantage (‘only protection against biased data’). It is not, of course, used with new chemical entities fresh from the animal laboratory, whose dose and effects in humans are unknown, although the subject may legitimately be kept in ignorance (single blind) of the time of administration. Single-blind techniques have a place in therapeutics research, but only when the double-blind procedure is impracticable or unethical.
Ophthalmologists are understandably disinclined to refer to the ‘double-blind’ technique; they call it double-masked.
Some common design configurations
Parallel group design
This is the most common clinical trial design for confirmatory therapeutic (Phase 3) trials. Subjects are randomised to one of two or more treatment ‘arms’. These treatments will include the investigational drug at one or more doses, and one or more control treatments such as placebo and/or an active comparator. Parallel group designs are particularly useful in conditions that fluctuate over a short term, e.g. migraine or irritable bowel syndrome, but are also used for chronic stable diseases such as Parkinson's disease and some types of cancer. The particular advantages of the parallel group design are simplicity, the ability to approximate more closely the likely conditions of use, and the avoidance of ‘carry-over effects’ (see below).
Crossover design
In this design, each subject is randomised to a sequence of two or more treatments, and hence acts as his or her own control for treatment comparisons. The advantage of this design is that subject-to-subject variation is eliminated from the treatment comparison, so that the number of subjects needed is reduced.
In the basic crossover design each subject receives each of the two treatments in a randomised order. There are variations to this in which each subject receives a subset of treatments or ones in which treatments are repeated within the same subject (to explore the reproducibility of effects).
The potential disadvantage of the crossover design is carry-over, i.e. the residual influence of treatments on subsequent treatment periods. This can often be avoided either by separating treatments with a ‘wash-out’ period or by selecting treatment lengths based on a knowledge of the disease and the new medication. The crossover design is best suited to chronic stable diseases, e.g. hypertension, chronic stable angina pectoris, where the baseline conditions are attained at the start of each treatment arm. The pharmacokinetic characteristics of the new medication are also important, the principle being that the plasma concentration at the start of the next dosing period should be zero, so that no dynamic effect of the previous treatment can be detected.
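The wash-out reasoning can be made concrete with the usual exponential-decay rule of thumb: after each elimination half-life the plasma concentration halves, so a wash-out of about five half-lives leaves only ~3% of the previous dose. The sketch below uses illustrative figures that are not from the text.

```python
# Fraction of drug remaining in plasma after n elimination half-lives.
# A wash-out of ~5 half-lives is a common (illustrative) rule of thumb.
def fraction_remaining(n_half_lives):
    return 0.5 ** n_half_lives

for n in (1, 3, 5):
    print(n, round(fraction_remaining(n), 3))
# 5 half-lives leave about 0.031 (~3%) of the previous dose
```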
Factorial designs
In the factorial design, two or more treatments are evaluated simultaneously through the use of varying combinations of the treatments. The simplest example is the 2 × 2 factorial design in which subjects are randomly allocated to one of four possible combinations of two treatments A and B. These are: A alone, B alone, A + B, neither A nor B (placebo). The main uses of the factorial design are to:
• Make efficient use of clinical trial subjects by evaluating two treatments with no more individuals than would be needed to evaluate one.
• Examine the interaction of A with B.
• Establish dose–response characteristics of the combination of A and B when the efficacy of each has been previously established.
Multicentre trials
Multicentre trials are carried out for two main reasons. First, they are an efficient way of evaluating a new medication, by accruing sufficient subjects in a reasonable time to satisfy trial objectives. Second, multicentre trials may be designed to provide a better basis for the subsequent generalisation of their findings. Thus they provide the possibility of recruiting subjects from a wide population and of administering the medication in a broad range of clinical settings. Multicentre trials can be used at any phase in clinical development, but are especially valuable when used to confirm therapeutic value in Phase 3. Large-scale multicentre trials using minimised data collection techniques and simple endpoints have been of immense value in establishing modest but real treatment effects that apply to a large number of patients, e.g. drugs that improve survival after myocardial infarction.
N-of-1 trials
Patients give varied treatment responses and the average effect derived from a population sample may not be helpful in expressing the size of benefit or harm for an individual. In the future, pharmacogenomics may provide an answer, but in the meantime the best way to settle doubt as to whether a test drug is effective for an individual patient is the N-of-1 trial. This is a crossover design in which each patient receives two or more administrations of drug or placebo in random manner; the results from individuals can then be displayed. Two conditions apply. First, the disease in which the drug is being tested must be chronic and stable. Second, the treatment effect must wear off rapidly. N-of-1 trials are not used routinely in drug development and, when they are, only at the Phase 3 stage.25,26
Historical controls
Any temptation simply to give a new treatment to all patients and to compare the results with the past (historical controls) is almost always unacceptable, even with a disease such as leukaemia. The reasons are that standards of diagnosis and treatment change with time, and the severity of some diseases (infections) fluctuates. The general provision stands that controls must be concurrent and concomitant. An exception to this rule is the case–control study (see p. 52).
Size of trials
Before the start of any controlled trial it is necessary to decide the number of patients that will be needed to deliver an answer, for ethical as well as practical reasons. This is determined by four factors:
1. The magnitude of the difference sought or expected on the primary efficacy endpoint (the target difference). For between-group studies, the focus of interest is the mean difference that constitutes a clinically significant effect.
2. The variability of the measurement of the primary endpoint as reflected by the standard deviation of this primary outcome measure. The magnitude of the expected difference (above) divided by the standard deviation of the difference gives the standardised difference (Fig. 4.3).
3. The defined significance level, i.e. the level of chance for accepting a Type I (α) error. Levels of 0.05 (5%) and 0.01 (1%) are common targets.
4. The power or desired probability of detecting the required mean treatment difference, i.e. the level of chance for accepting a Type II (β) error. For most controlled trials, a power of 80–90% (0.8–0.9) is frequently chosen as adequate, although higher power is chosen for some studies.
It will be intuitively obvious that a small target difference between the two treatment groups, a large variability in the measurement of the primary endpoint, a stringent significance level (low P value) or a large power requirement all act to increase the required sample size. Figure 4.3 gives a graphical representation of how the power of a clinical trial relates to values of clinically relevant standardised difference for varying numbers of trial subjects (shown by the individual curves). It is clear that the larger the number of subjects in a trial, the smaller is the difference that can be detected for any given power value.
The aim of any clinical trial is to have small Type I and II errors, and consequently sufficient power to detect a difference between treatments, if it exists. Of the four factors that determine sample size, the power and significance level are chosen to suit the level of risk felt to be appropriate. The magnitude of the effect can be estimated from previous experience with drugs of the same or similar action; the variability of the measurements is often known from published experiments on the primary endpoint, with or without drug. These data will not be available for novel substances in a new class, and frequently the sample size in the early phase of development is chosen on a more arbitrary basis. Numbers required to detect the difference in frequency of a categorical outcome, e.g. fractures in a trial of osteoporosis or remissions in a cancer trial, are generally larger than numbers required to detect differences in a continuous quantitative variable. As an example, a trial that would detect, at the 5% level of statistical significance, a treatment that raised a cure rate from 75% to 85% would require 500 patients for 80% power.
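The worked example above can be checked with the standard normal-approximation formula for comparing two proportions (z values for two-sided 5% significance and 80% power); this is a sketch of the usual textbook calculation, not a definitive design tool.

```python
import math

def n_per_group(p1, p2, z_alpha=1.96, z_beta=0.8416):
    """Approximate subjects per group to detect a difference between two
    proportions p1 and p2 at two-sided alpha = 0.05 with 80% power."""
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return (z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2

n = math.ceil(n_per_group(0.75, 0.85))
print(n, 'per group,', 2 * n, 'in total')
# → 248 per group, 496 in total, i.e. approximately the 500 patients
#   quoted in the text for detecting a cure rate rise from 75% to 85%
```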
Fixed sample size and sequential designs
Defining when a clinical trial should end is not as simple as it first appears. In the standard clinical trial the end is defined by the passage of all of the recruited subjects through the complete design. However, it is results and decisions based on the results that matter, not the number of subjects. The result of the trial may be that one treatment is superior to another or that there is no difference. These trials are of fixed sample size. In fact, patients are recruited sequentially, but the results are analysed at a fixed time-point.
The results of this type of trial may be disappointing if they miss the agreed and accepted level of significance.
It is not legitimate, having just failed to reach the agreed level (say, P = 0.05), to take in a few more patients in the hope that they will bring the P value down to 0.05 or less, for this is deliberately not allowing chance and the treatment to be the sole factors involved in the outcome, as they should be.
An alternative (or addition) to repeating the fixed sample size trial is to use a sequential design in which the trial is run until a useful result is reached.27 These adaptive designs, in which decisions are taken on the basis of results to date, can assess results on a continuous basis as data for each subject become available or, more commonly, on groups of subjects (group sequential design). The essential feature of these designs is that the trial is terminated when a predetermined result is attained, and not when the investigator looking at the results thinks it appropriate. Reviewing results on a continuous or interim basis requires formal interim analysis, and there are specific statistical methods for handling the data, which need to be agreed in advance. Group sequential designs are especially successful in large long-term trials of mortality or major non-fatal endpoints when safety must be monitored closely.
Such sequential designs recognise the reality of medical practice and provide a reasonable balance between statistical, medical and ethical needs. Interim analyses, however, reduce the power of statistical significance tests each time that they are performed, and carry a risk of a false positive result if chance differences between groups are encountered before the scheduled end of a trial.
Sensitivity of trials
Definitive therapeutic trials are expensive and on occasion may be so prolonged that aspects of treatment have been superseded by the time a result is obtained. A single trial, however well designed, executed and analysed, can answer only the question addressed. The regulatory authorities give guidance as to the number and design of trials that, if successful, would lead to a therapeutic claim. But changing clinical practice in the longer term depends on many other factors, of which confirmatory trials in other centres by different investigators under different conditions are an important part.
Meta-analysis
The two main outcomes for therapeutic trials are to influence clinical practice and, where appropriate, to make a successful claim for a drug with the regulatory authorities. Investigators are eternally optimistic and frequently plan their trials to look for large effects. Reality is different. The results of a planned (or unplanned) series of clinical trials may vary considerably for several reasons, but most significantly because the studies are too small to detect a treatment effect. In common but serious diseases such as cancer or heart disease, however, even small treatment effects can be important in terms of their total impact on public health. It may be unreasonable to expect dramatic advances in these diseases; we should be looking for small effects. Drug developers, too, should be interested not only in whether a treatment works, but also how well, and for whom.
The collecting together of a number of trials with the same objective in a systematic review28 and analysing the accumulated results using appropriate statistical methods is termed meta-analysis. The principles of a meta-analysis are that:
• It should be comprehensive, i.e. include data from all trials, published and unpublished.
• Only randomised controlled trials should be analysed, with patients entered on the basis of ‘intention to treat’.29
• The results should be determined using clearly defined, disease-specific endpoints (this may involve a re-analysis of original trials).
There are strong advocates and critics of the concept, its execution and interpretation. Arguments that have been advanced against meta-analysis are:
• An effect of reasonable size ought to be demonstrable in a single trial.
• Different study designs cannot be pooled.
• Lack of accessibility of all relevant studies.
• Publication bias (‘positive’ trials are more likely to be published).
In practice, the analysis involves calculating an odds ratio for each trial included in the meta-analysis. This is the odds of patients experiencing a particular endpoint, e.g. death, in the treatment group (the number who experience it divided by the number who do not), divided by the equivalent odds in the control group. The number of deaths observed in the treatment group is then compared with the number to be expected if it is assumed that the treatment is ineffective, to give the ‘observed minus expected’ statistic. The treatment effects for all trials in the analysis are then combined by summing the ‘observed minus expected’ values of the individual trials to obtain the overall odds ratio. An odds ratio of 1.0 indicates that the treatment has no effect, an odds ratio of 0.5 a halving, and an odds ratio of 2.0 a doubling, of the odds that patients will experience the chosen endpoint.
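For a single trial, the odds ratio calculation described above can be sketched as follows; the 2 × 2 table is hypothetical and chosen only for illustration.

```python
def odds_ratio(events_treat, n_treat, events_control, n_control):
    """Odds ratio for one trial: odds of the endpoint on treatment
    divided by the equivalent odds on control."""
    odds_treat = events_treat / (n_treat - events_treat)
    odds_control = events_control / (n_control - events_control)
    return odds_treat / odds_control

# Hypothetical table: 30/200 deaths on treatment vs 50/200 on control.
print(round(odds_ratio(30, 200, 50, 200), 2))  # → 0.53, i.e. benefit
```

A value below 1.0 favours the treatment for an unfavourable endpoint such as death; in a meta-analysis each trial contributes such a ratio (or its ‘observed minus expected’ statistic) to the pooled estimate.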
From the position of drug development, the general requirement that scientific results have to be repeatable has been interpreted in the past by the Food and Drug Administration (the regulatory agency in the USA) to mean that two well controlled studies are required to support a claim. But this requirement is itself controversial and its relation to a meta-analysis in the context of drug development is unclear.
In clinical practice, and in the era of cost-effectiveness, the use of meta-analysis as a tool to aid medical decision-making and underpinning ‘evidence-based medicine’ is here to stay.
Figure 4.4 shows detailed results from 11 trials in which antiplatelet therapy after myocardial infarction was compared with a control group. The number of vascular events per treatment group is shown in the second and third columns, and the odds ratios with the point estimates (the value most likely to have resulted from the study) are represented by black squares and their 95% confidence intervals (CI) in the fourth column.
Fig. 4.4 A clear demonstration of benefits from meta-analysis of available trial data, when individual trials failed to provide convincing evidence (see text).
(Reproduced with permission of Collins R 2001 Lancet 357:373–380.)
The size of the square is proportional to the number of events. The diamond gives the point estimate and CI for overall effect.
Results: implementation
The way in which data from therapeutic trials are presented can influence doctors' perceptions of the advisability of adopting a treatment in their routine practice.
Relative and absolute risk
The results of therapeutic trials are commonly expressed as the percentage reduction of an unfavourable (or percentage increase in a favourable) outcome, i.e. as the relative risk; this can seem very impressive until the figures are presented as the number of individuals actually affected per 100 people treated, i.e. as the absolute risk.
Where a baseline risk is low, a statement of relative risk alone is particularly misleading as it implies large benefit where the actual benefit is small. Thus a reduction of risk from 2% to 1% is a 50% relative risk reduction, but it saves only one patient for every 100 patients treated. But where the baseline is high, say 40%, a 50% reduction in relative risk saves 20 patients for every 100 treated.
To make clinical decisions, readers of therapeutic studies need to know how many patients must be treated30 (and for how long) to obtain one desired result: the number needed to treat. This is the inverse (or reciprocal) of the absolute risk reduction.
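The arithmetic linking relative risk reduction, absolute risk reduction and number needed to treat can be sketched directly, using the illustrative baseline risks discussed in the text:

```python
def risk_summary(risk_control, risk_treated):
    """Return (relative risk reduction, number needed to treat).
    NNT is the reciprocal of the absolute risk reduction."""
    arr = risk_control - risk_treated          # absolute risk reduction
    rrr = arr / risk_control                   # relative risk reduction
    return rrr, 1 / arr

# Baseline risk 2% falling to 1%: a 50% relative risk reduction,
# but 100 patients must be treated to prevent one event.
print(risk_summary(0.02, 0.01))   # → (0.5, 100.0)
# Baseline risk 40% halved: the same 50% relative reduction, NNT only 5.
print(risk_summary(0.40, 0.20))   # → (0.5, 5.0)
```

The identical relative risk reduction thus conceals a twenty-fold difference in the number of patients who must be treated, which is why both measures should be reported.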
Relative risk reductions can remain high (and thus make treatments seem attractive) even when susceptibility to the events being prevented is low (and the corresponding numbers needed to be treated are large). As a result, restricting the reporting of efficacy to just relative risk reductions can lead to great – and at times excessive – zeal in decisions about treatment for patients with low susceptibilities.31
A real-life example follows:
Antiplatelet drugs reduce the risk of future non-fatal myocardial infarction by 30% [relative risk] in trials of both primary and secondary prevention. But when the results are presented as the number of patients who need to be treated for one nonfatal myocardial infarction to be avoided [absolute risk] they look very different.
In secondary prevention of myocardial infarction, 50 patients need to be treated for 2 years, while in primary prevention 200 patients need to be treated for 5 years, for one non-fatal myocardial infarction to be prevented. In other words, it takes 100 patient-years of treatment in primary prevention to produce the same beneficial outcome of one fewer non-fatal myocardial infarction.32
Whether a low incidence of adverse drug effects is acceptable becomes a serious issue in the context of absolute risk. Non-specialist doctors, particularly those in primary care, need and deserve clear and informative presentation of therapeutic trial results that measure the overall impact of a treatment on the patient's life, i.e. on clinically important outcomes such as morbidity, mortality, quality of life, working capacity, fewer days in hospital. Without it, they cannot adequately advise patients, who may themselves be misled by inappropriate use of statistical data in advertisements or on internet sites.
Important aspects of therapeutic trial reports
• Statistical significance and its clinical importance.
• Confidence intervals.
• Number needed to treat, or absolute risk.
Pharmacoepidemiology
Pharmacoepidemiology is the study of the use and effects of drugs in large numbers of people. Some of the principles of pharmacoepidemiology are used to gain further insight into the efficacy, and especially the safety, of new drugs once they have passed from limited exposure in controlled therapeutic pre-registration trials to the looser conditions of their use in the community. Trials in this setting are described as observational because the groups to be compared are assembled from subjects who are, or who are not (the controls), taking the treatment in the ordinary way of medical care. These (Phase 4) trials are subject to greater risk of selection bias33 and confounding34 than experimental studies (randomised controlled trials) where entry and allocation of treatment are strictly controlled (increasing internal validity). Observational studies, nevertheless, come into their own when sufficiently large randomised trials are logistically and financially impracticable. The following approaches are used.
Observational cohort35 studies
Patients receiving a drug are followed up to determine the outcomes (therapeutic or adverse). This is usually forward-looking (prospective) research. A cohort study does not require a suspicion of causality; subjects can be followed ‘to see what happens’ (event recording). Prescription event monitoring (below) is an example, and there is an increasing tendency to recognise that most new drugs should be monitored in this way when prescribing becomes general. Major difficulties include the selection of an appropriate control group, and the need for large numbers of subjects and for prolonged surveillance. This sort of study is scientifically inferior to the experimental cohort study (the randomised controlled trial) and is cumbersome for research on drugs.
Investigation of the question of thromboembolism and the combined oestrogen–progestogen contraceptive pill by means of an observational cohort study required enormous numbers of subjects36 (the adverse effect is, fortunately, uncommon) followed over years. An investigation into cancer and the contraceptive pill by an observational cohort would require follow-up for 10–15 years. Happily, epidemiologists have devised a partial alternative: the case–control study.
Case–control studies
This reverses the direction of scientific logic from a forward-looking, ‘what happens next’ (prospective) to a backward-looking, ‘what has happened in the past’ (retrospective)37 investigation. The case–control study requires a definite hypothesis or suspicion of causality, such as an adverse reaction to a drug. The investigator assembles a group of patients who have the condition. A control group of people who have not had the reaction is then assembled (matched, e.g. for sex, age, smoking habits) from hospital admissions for other reasons, primary care records or electoral rolls. A complete drug history is taken from each group, i.e. the two groups are ‘followed up’ backwards to determine the proportion in each group that has taken the suspect agent. Case–control studies do not prove causation.38 They reveal associations and it is up to investigators and critical readers to decide the most plausible explanation.
A case–control study has the advantage that it requires a much smaller number of cases (hundreds) of disease and can thus be done quickly and cheaply. It has the disadvantage that it follows up subjects backwards and there is always suspicion of the intrusion of unknown and so unavoidable biases in the selection of both patients and controls. Here again, independent repetition of the studies, if the results are the same, greatly enhances confidence in the outcome.
Surveillance systems: pharmacovigilance
When a drug reaches the market, a good deal is known about its therapeutic activity but rather less about its safety when used in large numbers of patients with a variety of diseases, for which they are taking other drugs. The term pharmacovigilance refers to the process of identifying and responding to issues of drug safety through the detection in the community of drug effects, usually adverse. Over a number of years increasingly sophisticated systems have been developed to provide surveillance of drugs in the post-marketing phase. For understandable reasons, they are strongly supported by governments. The position has been put thus:
Four kinds of logic can be applied to drug safety monitoring:
• To gain experience from regular reporting of suspected adverse drug reactions from health professionals during the regular clinical use of the drug.
• To attempt to follow a complete cohort of (new) drug users for as long as it is deemed necessary to have adequate information.
• To perform special studies in areas which may be predicted to give useful information.
• To examine disease trends for drug-related causality.39
Drug safety surveillance relies heavily on the techniques of pharmacoepidemiology, which include the following:
Voluntary reporting
Doctors, nurses, pharmacists and patients may report suspected adverse reactions to drugs. In the UK, this is called the ‘Yellow Card’ system and the Commission for Human Medicines advises the Medicines and Healthcare products Regulatory Agency of the government on trends and signals. It is recommended that for:
• newer drugs: all suspected reactions should be reported, i.e. any adverse or any unexpected event, however minor, that could conceivably be attributed to the drug
• established drugs: all serious suspected reactions should be reported, even if the effect is well recognised.
Inevitably the system depends on the intuitions and willingness of those called on to respond. Surveys suggest that no more than 10% of serious reactions are reported. Voluntary reporting is effective for identifying reactions that develop shortly after starting therapy, i.e. at providing early warnings of drug toxicity, particularly rare adverse reactions. Thus, it is the first line in post-marketing surveillance. Reporting is particularly low, however, for reactions with long latency, such as tardive dyskinesia from chronic neuroleptic use. As the system has no limit of quantitative sensitivity, it may detect the rarest events, e.g. those with an incidence of 1:5000 to 1:10 000. Voluntary systems are, however, unreliable for estimating the incidence of adverse reactions as this requires both a high rate of reporting (the numerator) and a knowledge of the rate of drug usage (the denominator).
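The numerator/denominator problem can be made concrete with some invented figures. A sketch, assuming (hypothetically) 50 spontaneous reports, the 10% reporting rate suggested by surveys, and 250 000 patients exposed:

```python
reports = 50            # suspected reactions reported (numerator, hypothetical)
reporting_rate = 0.10   # surveys suggest no more than 10% of serious reactions are reported
patients_exposed = 250_000  # drug usage (denominator, hypothetical)

# Crude rate from reports alone, vs rate corrected for under-reporting
crude_rate = reports / patients_exposed
corrected = (reports / reporting_rate) / patients_exposed

print(f"crude: 1 in {1/crude_rate:.0f}; corrected: 1 in {1/corrected:.0f}")
```

The two estimates differ tenfold (1 in 5000 versus 1 in 500), which illustrates why voluntary systems can detect rare reactions but cannot by themselves estimate their incidence.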
Prescription event monitoring
This is a form of observational cohort study. Prescriptions for a drug (say, 20 000) are collected (in the UK this is made practicable by the existence of a National Health Service in which prescriptions are sent to a single central authority for pricing and payment of the pharmacist). The prescriber is sent a questionnaire and asked to report all events that have occurred (not only suspected adverse reactions) with no judgement regarding causality. Thus ‘a broken leg is an event. If more fractures were associated with this drug they could have been due to hypotension, CNS effects or metabolic disease’.40 By linking general practice and hospital records and death certificates, both prospective and retrospective studies can be done and unsuspected effects detected. Prescription event monitoring can be used routinely on newly licensed drugs, especially those likely to be widely prescribed in general practice, and it can also be implemented quickly in response to a suspicion raised, e.g. by spontaneous reports.
Fig. 4.5 Oscillations in the development of a drug.
(By courtesy of Dr Robert H. Williams and the editor of the Journal of the American Medical Association.)
Medical record linkage
allows computer correlation in a population of life and health events (birth, marriage, death, hospital admission) with history of drug use. It is being developed as far as resources permit. It includes prescription event monitoring (above). The largest UK medical record linkage is the General Practice Research Database (GPRD) at the Medicines and Healthcare products Regulatory Agency.
Population statistics
e.g. birth defect registers and cancer registers. These are insensitive unless a drug-induced event is highly remarkable or very frequent. If suspicions are aroused then case–control and observational cohort studies will be initiated.
Strength of evidence
A number of types of clinical investigation are described in this chapter, and elsewhere in the book. When making clinical decisions about a course of therapeutic action, it is obviously relevant to judge the strength of evidence generated by different types of study. This has been summarised as follows, in rank order:41
1. Systematic reviews and meta-analyses.
2. Randomised controlled trials with definitive results (confidence intervals that do not overlap the threshold of the clinically significant effect).42
3. Randomised controlled trials with non-definitive results (a difference that suggests a clinically significant effect but with confidence intervals overlapping the threshold of this effect).
4. Cohort studies.
5. Case–control studies.
6. Cross-sectional surveys.
7. Case reports.
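The distinction drawn in items 2 and 3 between ‘definitive’ and ‘non-definitive’ randomised trial results amounts to checking the confidence interval against the threshold of the clinically significant effect. A sketch with hypothetical numbers (the function and the example values are illustrative, not drawn from any real trial):

```python
def classify(ci_low, ci_high, threshold):
    """Classify a trial result whose effect is 'better' when larger.
    threshold: the smallest difference judged clinically significant."""
    if ci_low > threshold:
        return "definitive benefit"            # whole CI beyond the threshold
    if ci_high < threshold:
        return "definitively below threshold"  # whole CI short of the threshold
    return "non-definitive"                    # CI overlaps the threshold

# Hypothetical absolute risk reductions (%) with 95% CIs; threshold 2%
print(classify(3.1, 7.4, 2.0))   # definitive benefit
print(classify(0.5, 5.2, 2.0))   # non-definitive
```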
In conclusion43
Drug development is a high risk business. Early hopes and expectations can later be shattered by the realities of clinical practice, when the risks as well as the benefits of a medicine emerge with the passage of time.
Guide to further reading
Biomarkers Definitions Working Group. Biomarkers and surrogate endpoints: preferred definitions and conceptual framework. Clin. Pharmacol. Ther. 2001;69(3):89–95.
Bland J.M., Altman D.G. Statistical notes: the odds ratio. Br. Med. J. 2000;320:1468.
Bracken M.B. Why animal studies are often poor indicators of human reactions to exposure. J. R. Soc. Med. 2008;101:120–122.
Chatellier G., Zapletal E., Lemaitre D., et al. The number needed to treat: a clinically useful nomogram in its proper context. Br. Med. J. 1996;312:426–429.
Doll R. Controlled trials: the 1948 watershed. Br. Med. J. 1998;317:1217–1220. (and following articles)
Egger M., Smith G.D., Phillips A.N. Meta-analysis: principles and procedures. Br. Med. J. 1997;315:1533–1537. (see also other articles in the series entitled ‘Meta-analysis’)
Emanuel E.J., Miller F.G. The ethics of placebo-controlled trials – a middle ground. N. Engl. J. Med. 2001;345:915–919.
Garattini S., Chalmers I. Patients and the public deserve big changes in the evaluation of drugs. Br. Med. J. 2009;338:804–806.
GRADE Working Group. GRADE: what is ‘quality of evidence’ and why is it important to clinicians? Br. Med. J. 2008;336:924–929. (and the other papers of this series)
Greenhalgh T. Papers that report drug trials. Br. Med. J. 1997;315:480–483. (see also other articles in the series entitled ‘How to read a paper’)
Kaptchuk T.J. Powerful placebo: the dark side of the randomised controlled trial. Lancet. 1998;351:1722–1725.
Khan K.S., Kunz R., Kleijnen J., Antes G. Five steps to conducting a systematic review. J. R. Soc. Med. 2003;96:118–121.
Lewis J.A., Jonsson B., Kreutz G., et al. Placebo-controlled trials and the Declaration of Helsinki. Lancet. 2002;359:1337–1340.
Miller F.G., Rosenstein D.L. The therapeutic orientation to clinical trials. N. Engl. J. Med. 2003;348:1383–1386.
Rochon P.A., Gurwitz J.H., Sykora K., et al. Reader's guide to critical appraisal of cohort studies: 1. Role and design. Br. Med. J. 2005;330:895–897.
Rothwell P.M. External validity of randomised controlled trials: ‘to whom do the results of this trial apply?’ Lancet. 2005;365:82–93.
Rothwell P.M. Treating individuals 2. Subgroup analysis in randomised controlled trials: importance, indications, and interpretation. Lancet. 2005;365:176–186.
Sackett D., Rosenberg W., Gray J., et al. Evidence based medicine: what it is and what it isn't [editorial]. Br. Med. J. 1996;312:71–72.
Silverman W.A., Altman D.G. Patients' preferences and randomised trials. Lancet. 1996;347:171–174.
Vlahakes G.J. The value of phase 4 clinical testing [editorial]. N. Engl. J. Med. 2006;354(4):413–415.
Waller P.C., Jackson P.R., Tucker G.T., Ramsay L.E. Clinical pharmacology with confidence [intervals]. Br. J. Clin. Pharmacol. 1994;37(4):309.
Williams R.L., Chen M.L., Hauck W.W. Equivalence approaches. Clin. Pharmacol. Ther. 2002;72:229–237.
Zwarenstein M., Treweek S., Gagnier J.J., et al. CONSORT group: Pragmatic Trials in Healthcare (Practihc) group. Improving the reporting of pragmatic trials: an extension of the CONSORT statement. Br. Med. J. 2008;337:a2390.
1 Guidance to researchers in this matter is clear. The World Medical Association Declaration of Helsinki (Edinburgh revision 2000) states that ‘considerations related to the well-being of the human subject should take precedence over the interests of science and society’. The General Assembly of the United Nations adopted in 1966 the International Covenant on Civil and Political Rights, of which Article 7 states, ‘In particular, no one shall be subjected without his free consent to medical or scientific experimentation’. This means that subjects are entitled to know that they are being entered into research even though the research be thought ‘harmless’. But there are people who cannot give (informed) consent, e.g. the demented. The need for special procedures for such is now recognised, for there is a consensus that, without research, they and the diseases from which they suffer will become therapeutic ‘orphans’.
2 Report: Royal College of Physicians of London 1996 Guidelines on the Practice of Ethics Committees in Medical Research Involving Human Subjects. Royal College of Physicians, London.
3 American Defence Secretary Donald Rumsfeld, on 12 February 2002, at a press briefing where he addressed the absence of evidence linking the government of Iraq with the supply of weapons of mass destruction to terrorist groups.
4 For extensive practical detail, see Council for International Organisations of Medical Sciences (CIOMS) in collaboration with the World Health Organization (WHO) 2002 International Ethical Guidelines for Biomedical Research Involving Human Subjects. CIOMS, Geneva. (WHO publications are available in all UN member countries.) See also: Guideline for Good Clinical Practice, International Conference on Harmonisation Tripartite Guideline. EU Committee on Proprietary Medicinal Products (CPMP/ICH/135/95). Smith T 1999 Ethics in Medical Research: A Handbook of Good Practice. Cambridge University Press, Cambridge.
5 Oxford English Dictionary. See also: Edwards M 2004 Historical keywords: Trial. Lancet 364:1659.
6 Kety S. Quoted by Beecher H K 1959 Journal of the American Medical Association 169:461.
7 This is the uncertainty principle: the concept that patients entering a randomised therapeutic trial will have equal potential for benefit and risk is referred to as equipoise.
8 The ‘four principles’ approach (above) is widely utilised in biomedical ethics. A full description and an analysis of the contribution of this and other ethical theories to decision-making in clinical, including research, practice can be found in: Beauchamp T L, Childress J F 2001 Principles of Biomedical Ethics, 5th edn. Oxford University Press, Oxford.
9 Injury to participants in clinical trials is uncommon and serious injury is rare. In March 2006, eight healthy young men entered a trial of a humanised monoclonal antibody designed to be an agonist of a particular receptor on T lymphocytes that stimulates their production and activation. This was the first administration to humans; preclinical testing in rabbits and monkeys at doses up to 500 times those received by the volunteers apparently showed no ill effect. Six of the volunteers quickly became seriously ill and required admission to an intensive care facility with multi-organ failure due to a ‘cytokine release syndrome’, in effect a massive attack on the body's own tissues. All the volunteers recovered but some with disability. This toxicity in humans, despite apparent safety in animals, may be due to the specifically humanised nature of the monoclonal antibody. Testing of perceived high-risk new medicines is likely to be subject to particularly stringent regulation in future. See Wood A J J, Darbyshire J 2006 Injury to research volunteers – the clinical research nightmare. New England Journal of Medicine 354:1869–1871.
10 Freedman B 1987 Equipoise and the ethics of clinical research. New England Journal of Medicine 317:141–145.
11 Moderate to severe adverse events have occurred in about 0.5% of healthy subjects. See Orme M, Harry J, Routledge P, Hobson S 1989 British Journal of Clinical Pharmacology 27:125; Sibille M et al 1992 European Journal of Clinical Pharmacology 42:393.
12 Guidelines for the conduct and analysis of a range of clinical trials in different therapeutic categories are released from time to time by the Committee on Medicinal Products for Human Use (CHMP) of the European Commission. These guidelines apply to drug development in the European Union. Other regulatory authorities issue guidance, e.g. the Food and Drug Administration in the USA, the Ministry of Health, Labour and Welfare in Japan. There has been considerable success in aligning different guidelines across the world through the International Conferences on Harmonisation (ICH). The source for CHMP guidelines is info@mhra.gsi.gov.uk
13 A drug for which the original patent has expired, so that any pharmaceutical company may market it in competition with the inventor. The term ‘generic’ has come to be synonymous with the non-proprietary or approved name (see Ch. 7).
14 Information on both categories (method effectiveness and use effectiveness) is valuable (Sheiner L B, Rubin D B 1995 Intention-to-treat analysis and the goals of clinical trials. Clinical Pharmacology and Therapeutics 57(1):6–15).
15 Wallis W A, Roberts H V 1957 Statistics, A New Approach. Methuen, London.
16 Galton F 1879 Generic images. Proceedings of the Royal Institution.
17 Altman D G, Gore S M, Gardner M J, Pocock S J 1983 Statistical guidelines for contributors to medical journals. British Medical Journal 286:1489–1493.
18 Gardner M J, Altman D G 1986 Confidence intervals rather than P values: estimation rather than hypothesis testing. British Medical Journal 292:746–750.
19 Altman D G, Gore S M, Gardner M J, Pocock S J 1983 Statistical guidelines for contributors to medical journals. British Medical Journal 286:1489–1493.
20 The target difference. Differences in trial outcomes fall into three grades: (1) that the doctor will ignore, (2) that will make the doctor wonder what to do (more research needed), and (3) that will make the doctor act, i.e. change prescribing practice.
21 Bradford Hill A 1977 Principles of Medical Statistics. Hodder and Stoughton, London. If there is a ‘father’ of the modern scientific therapeutic trial, it is he.
22 Particularly in large-scale outcome trials, an independent data monitoring committee is given access to the results as these are accumulated; the committee is empowered to discontinue a trial if the results show significant advantage or disadvantage to one or other treatment.
23 Note also patient preference trials. Conventionally, patients are invited to participate in a clinical trial, give consent and are then randomised to a particular treatment group. In special circumstances, randomisation takes place first, the patients are informed of the treatment to be offered and are allowed to opt for this or another treatment. This is called pre-consent randomisation or ‘pre-randomisation’. In a trial of simple mastectomy versus lumpectomy with or without radiotherapy for early breast cancer, recruitment was slow because of the disfiguring nature of the mastectomy option. A policy of pre-randomisation was then adopted, letting women know the group to which they would be allocated should they consent. Recruitment increased sixfold and the trial was completed, providing sound evidence that survival was as long with the less disfiguring option (Fisher B, Bauer M, Margolese R et al 1985 Five-year results of a randomised clinical trial comparing total mastectomy and segmental mastectomy with and without radiotherapy in the treatment of breast cancer. New England Journal of Medicine 312:665–673). However, the benefit of enhanced recruitment may be limited by potential for introducing bias.
24 Modell W, Houde R W 1958 Factors influencing clinical evaluation of drugs; with special reference to the double-blind technique. Journal of the American Medical Association 167:2190–2199.
25 Senn S 1997 N-of-1 Trials: Statistical Issues in Drug Development. John Wiley, Chichester, pp. 249–255.
26 Jull A, Bennet D 2005 Do N-of-1 trials really tailor treatment? Lancet 365:1992–1994.
27 Whitehead J 1992 The Design and Analysis of Sequential Clinical Trials, 2nd edn. Ellis Horwood, Chichester.
28 A review that strives comprehensively to identify and synthesise all the literature on a given subject (sometimes called an overview). The unit of analysis is the primary study, and the same scientific principles and rigour apply as for any study. If a review does not state clearly whether and how all relevant studies were identified and synthesised, it is not a systematic review (Cochrane Library 1998).
29 Reports of therapeutic trials should contain an analysis of all patients entered, regardless of whether they dropped out or failed to complete, or even started the treatment for any reason. Omission of these subjects can lead to serious bias (Laurence D R, Carpenter J 1998 A Dictionary of Pharmacological and Allied Topics. Elsevier, Amsterdam).
30 See Cook R J, Sackett D L 1995 The number needed to treat: a clinically useful measure of treatment effect. British Medical Journal 310:452.
31 Sackett D L, Cook R J 1994 Understanding clinical trials: what measures of efficacy should journal articles provide busy clinicians? British Medical Journal 309:755.
32 For example, drug therapy for high blood pressure carries risks, but the risks of the disease vary enormously according to severity of disease: ‘Depending on the initial absolute risk, the benefits of lowering blood pressure range from preventing one cardiovascular event a year for about every 20 people treated, to preventing one event for about every 5000–10 000 people treated. The level of risk at which treatment should be started is debatable’ (Jackson R, Barham P, Bills J et al 1993 Management of raised blood pressure in New Zealand: a discussion document. British Medical Journal 307:107–110).
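The arithmetic behind figures like these is the number needed to treat (see footnotes 30–31): the reciprocal of the absolute risk reduction, which for a given relative effect depends entirely on the baseline risk. A sketch with hypothetical baseline risks chosen to echo the blood-pressure example:

```python
def nnt(baseline_risk, relative_risk_reduction):
    """Number needed to treat = 1 / absolute risk reduction."""
    absolute_risk_reduction = baseline_risk * relative_risk_reduction
    return 1 / absolute_risk_reduction

# The same hypothetical 25% relative risk reduction applied at very
# different baseline annual risks of a cardiovascular event:
print(round(nnt(0.20, 0.25)))    # high-risk patient: treat 20 to prevent one event
print(round(nnt(0.0004, 0.25)))  # low-risk patient: treat 10000 to prevent one event
```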
33 A systematic error in the selection or randomisation of patients on admission to a trial such that they differ in prognosis, i.e. the outcome is weighted one way or another by the selection, not by the trial.
34 When the interpretation of an observed association between two variables may be affected by a strong influence from a third variable (which may be hidden or unknown). Examples of confounders would be concomitant drug therapy or differences in known risk factors, e.g. smoking, age, sex.
35 Used here for a group of people having a common attribute, e.g. they have all taken the same drug.
36 The Royal College of General Practitioners (UK) recruited 23 000 women takers of the pill and 23 000 controls in 1968 and issued a report in 1973. It found an approximately doubled incidence of venous thrombosis in combined-pill takers (the dose of oestrogen was reduced because of this study).
37 For this reason such studies have been named trohoc (cohort spelled backwards) studies (Feinstein A 1981 Journal of Chronic Diseases 34:375).
38 Experimental cohort studies (i.e. randomised controlled trials) are on firmer ground with regard to causation as there should be only one systematic difference between the groups (i.e. the treatment being studied). In case–control studies the groups may differ systematically in several ways.
39 Edwards I R 1998 A perspective on drug safety. In: Edwards I R (ed) Drug Safety. Adis International, Auckland, p. xii.
40 Inman W H W, Rawson N S B, Wilton L V 1986 Prescription-event monitoring. In: Inman W H W (ed) Monitoring for Drug Safety, 2nd edn. MTP, Lancaster, p. 217.
41 Guyatt G H, Sackett D L, Sinclair J C et al 1995 Users' guides to the medical literature. IX. A method for grading health care recommendations. Evidence-Based Medicine Working Group. Journal of the American Medical Association 274:1800–1804.
42 The reporting of randomised controlled trials has been systemised so that only high-quality studies will be considered. See Moher D, Schulz K F, Altman D G 2001 CONSORT Group. The CONSORT statement: revised recommendations for improving the quality of reports of parallel group randomised trials. Lancet 357:1191–1194.
43 ‘Quick, let us prescribe this new drug while it remains effective’. Richard Asher.