Core Topics in General and Emergency Surgery

Outcomes and health economic issues in surgery

Sharath C.V. Paravastu and Jonathan A. Michaels


Evidence-based medicine demands that all those making decisions regarding clinical management, either on an individual patient basis or at a policy level, consider existing evidence in order to maximise the chance of favourable outcomes and optimise the use of available resources. However, such evidence is rarely clear-cut and there may be conflicting advice because of differences in the way that outcomes are measured, the way in which costs are assessed or the perspective from which an economic evaluation is carried out.

This chapter deals with some of the issues around the measurement of outcomes, the calculation of costs and the methods of economic evaluation. The available outcome measures are considered, drawing the distinctions between disease-specific and generic measures and explaining concepts such as health-related quality of life, quality-adjusted life-years (QALYs) and utilities. The differences between costs, charges and resource use are highlighted, followed by a discussion of issues such as discounting, sensitivity analysis and marginal costing. Finally, a section on economic evaluation describes the different techniques available – cost minimisation, cost-effectiveness, cost–utility and cost–benefit analysis – and discusses the use of cost-effectiveness league tables. The intention is not to provide a full reference work on these subjects but to raise awareness of some of the important issues to be considered when evaluating evidence on specific interventions that may rely on differing outcome measurements or methods of economic evaluation.

Outcome measures

Clinicians tend routinely to consider health outcomes in terms of clinical or biomedical measures such as blood pressure levels, blood sugar levels or bone mineral density. Process-based outcomes such as readmission rates, reintervention or complications are readily considered alternatives. Data such as these are seen as readily available, easily measured, objective and comparable between differing settings. However, the present environment in medical services makes it necessary for the healthcare professional to consider more than just the treatment of the condition. Greater emphasis is now placed on the clinician's consideration of the patient's quality of life. Assessment now extends beyond the value of an intervention or the effectiveness of a drug regimen to include the patient's physical, mental and social well-being. In line with this, there has been considerable research into subjective non-biomedical measures, and the development of such tools (or ‘instruments’) has been substantial since the early 1980s.

When considering which instrument of assessment to choose from the plethora now available, the user should carefully consider what parameter is to be measured before making a final selection of an outcome measure. Before applying this measure to a patient population, particular consideration needs to be given to deciding whether it will measure what we are interested in measuring and whether it will answer the questions that we wish to be answered.

All too often assessment tools may be applied to patients in the wrong circumstances or used when there is no realistic opportunity to measure what we wish to measure. These are important considerations because the administration and analysis of these measures can be costly, as well as taking up valuable time for both patients and clinicians. In addition, the use of unsuitable measures applied in the wrong context might yield results that are perhaps plausible but wrong, thus leading to erroneous conclusions. The implications of such findings for patients or the health service can be substantial.

Instruments should therefore be carefully selected for their appropriateness (able to answer the research question), acceptability (acceptable to patients), feasibility (ease of administering), reliability (reproducibility), validity (measures the outcome it is meant to), responsiveness (ability to respond to changes), precision (of scores) and interpretability (ease of understanding the results).1 In particular, attention should be given to reliability and validity. Reliability refers to whether the instrument will be reproducible, such that if applied in different settings or circumstances to the same unchanged population then the same results should be achieved. This has particular implications for studies using instruments to derive longitudinal data on a particular sample of patients. In such circumstances, we need to be confident that observed changes over a given time period reflect actual change. Test–retest reliability is an important consideration and is assessed by making repeated assessments under the same circumstances at differing points in time and comparing the results using correlations or differences. Similarly, for instruments that require administration by interviewers, there needs to be a high level of agreement between different raters assessing the same patients but at different periods in time. For example, Collin et al. found a high level of agreement between the patients, a trained nurse and two skilled observers during applications of the Barthel Index (assesses patients' ability to carry out daily activities) to the same group of patients.2 In another example, Aissaoui et al. found a high level of agreement between doctors and nurses during the application of Behavioural Pain Score (BPS) in the same group of critically ill patients.3
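As a rough illustration of the test–retest comparison described above, the sketch below computes a Pearson correlation and the mean paired difference for two administrations of the same instrument to the same patients. The scores and the function name `retest_summary` are hypothetical and chosen only for illustration; published reliability studies may use other statistics.

```python
from math import sqrt

def retest_summary(first, second):
    """Return (Pearson r, mean difference second - first) for paired scores."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need paired scores for at least two patients")
    n = len(first)
    mean_first = sum(first) / n
    mean_second = sum(second) / n
    # Sums of cross-products and squared deviations around each mean.
    cross = sum((a - mean_first) * (b - mean_second) for a, b in zip(first, second))
    ss_first = sum((a - mean_first) ** 2 for a in first)
    ss_second = sum((b - mean_second) ** 2 for b in second)
    r = cross / sqrt(ss_first * ss_second)
    return r, mean_second - mean_first

# Hypothetical scores for six unchanged patients on two occasions.
first_visit = [12, 15, 9, 18, 14, 11]
second_visit = [13, 14, 10, 17, 15, 11]

r, mean_difference = retest_summary(first_visit, second_visit)
print(f"Pearson r = {r:.2f}, mean difference = {mean_difference:.2f}")
```

A correlation close to 1 together with a mean difference close to 0 would be consistent with good reproducibility in this invented sample.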

Another psychometric criterion for consideration is that of validity, which means that an instrument measures what it is intended to measure. It should be borne in mind that measures can be reliable without being valid, but they cannot be valid without being reliable. Three types of validity are described. The first is content validity, which relates to the choice, appropriateness and representativeness of the content of the instrument. Judging content validity involves an assessment of whether all of the relevant concepts are represented. For example, a representative sample of asthmatic patients could be used to develop an asthma questionnaire in order to ensure that it captures all the domains of interest for such a patient population. Second, there is a requirement to consider criterion validity, which is the degree to which the measure obtains results that are comparable to some kind of ‘gold standard’. While this is theoretically a simple concept, there are very few such gold standards for comparison. Finally, there is construct validity, which relates to whether expected patterns of relationships are observed. For example, if a method of valuation of outcomes predicts that a patient prefers option A to option B, then one would expect this to be reflected by their behaviour when faced with genuine clinical choices. This is normally assessed through the use of multitrait–multimethod techniques,4 which map the correlations between alternative approaches to measuring the same construct and between measures of different constructs.

As can be seen, the choice of an outcome measure is not always as straightforward as it may seem at first. In addition to considerations regarding the patient group, there are also important considerations regarding the psychometric properties of the instruments. Different outcome measures will have uses for differing patient groups. For example, a biomedical measure such as blood pressure alone might be considered suitable for comparing two similar drug regimens to assess ‘best’ control of blood pressure. However, a study that attempts to compare renal transplant with dialysis might also wish to consider a much broader picture and would be likely to require consideration of quality-of-life issues together with mortality as outcome measures.

It is extremely important to choose the right outcomes for the purpose in question, as different conclusions can be drawn from the application of different outcome measures in the same study. For example, a study of vascular patients compared exercise training with angioplasty for stable claudication with results expressed in terms of ankle–brachial pressure indices and walking distance.5 In the short term, it was found that angioplasty improved the pressure but not the walking distance, while exercise improved the walking distance but not the pressure. This example shows that different outcome measures may not always change in the same direction and used in isolation could lead to opposite conclusions.

The following sections examine some of the issues involved in the evaluation and application of some common specific outcome measures.


Mortality

The mortality rate expresses the incidence of death in a population of interest over a given period of time. It is calculated by dividing the number of fatalities in the given population by the total population.

Mortality is often used as an outcome measure in studies as an indication of the effectiveness, or otherwise, of a treatment. It is often easily derived and as such represents a readily accessible outcome measure. While mortality can indeed provide much useful information, its use in reporting results should always be interpreted with caution. First, procedure- and diagnosis-related mortality rates often refer to inpatient deaths only or perhaps mortality over a given postoperative period, for example 30 days. Variations in short-term survival rates might simply reflect differing discharge practices between differing hospitals or settings. Longer-term survival rates are frequently reported for cancer and other chronic conditions. In interpreting these, it must be borne in mind that distortion may occur as a result of the starting point or choice of time frame. For example, survival may be longer in a screened population because there is an earlier starting point,6 and comparisons between surgical and medical treatments may be very sensitive to follow-up periods because of early excess operative mortality in surgical treatments, which may be offset by better long-term survival. For these reasons, it is often necessary to compare survival curves rather than total survival at a specific time point. This may raise further issues regarding the possible need for discounting, to take account of a preference for survival in the earlier years after treatment (see below).

Second, mortality can only be a partial measure of quality, and it is often not the most appropriate outcome measure. Many studies report mortality and tend to ignore other important outcomes such as morbidity and quality of life. These are more complex to quantify and are not routinely collected. Mortality is particularly limited in usefulness for studies investigating low-risk procedures. Accurate assessment of the quality of such procedures requires more sensitive measures. For example, use of mortality alone as an outcome measure for parathyroidectomy would not be appropriate as the procedure is associated with an extremely low mortality. A more appropriate measure would be to assess improvement in symptoms or quality of life.

It is also necessary to highlight the effect that differences in case mix can have on the mortality rate. For example, there is a tendency in studies relating workload to outcome for the results to be reported for the whole sample of patients. It is important to be aware that results reported in this way may be misleading as no account is taken of the diversity of patient characteristics that may be contained in such a sample. Both differences in severity of illness and in risk of adverse outcomes relating to comorbidity can significantly affect any interpretation of mortality rates. This problem is illustrated by Sowden and Sheldon, who discuss examples from coronary artery bypass grafting and intensive care to demonstrate the importance of adjusting for case mix.7 For coronary artery bypass grafting, they report that the strength of the relationship between low volume and increased mortality is reduced in studies that adjust for differences in risk among patients receiving treatment. With adult intensive care, they cite a study by Jones and Rowan8 in which the apparent higher mortality associated with smaller intensive care units ceased to be significant once the data were adjusted to reflect the fact that severity of illness was on average higher among patients admitted to small units. These examples clearly demonstrate that, in order to minimise bias in such studies, account must be taken of all possible factors (beyond workload) that are likely to affect patient outcomes.
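The effect of case mix can be illustrated with a small sketch. All figures below are invented for illustration and are not taken from the studies cited above: the two hypothetical groups of units have identical mortality within each severity stratum, yet the crude rates differ because the small units admit a larger share of high-severity patients.

```python
# Hypothetical admissions as (patients, deaths) per severity stratum.
units = {
    "small units": {"low severity": (100, 5), "high severity": (300, 90)},
    "large units": {"low severity": (300, 15), "high severity": (100, 30)},
}

def crude_mortality(strata):
    """Deaths divided by patients, ignoring severity."""
    patients = sum(n for n, _ in strata.values())
    deaths = sum(d for _, d in strata.values())
    return deaths / patients

def stratum_mortality(strata):
    """Mortality rate within each severity stratum."""
    return {name: deaths / patients for name, (patients, deaths) in strata.items()}

for unit, strata in units.items():
    by_stratum = ", ".join(
        f"{name} {rate:.0%}" for name, rate in stratum_mortality(strata).items()
    )
    print(f"{unit}: crude {crude_mortality(strata):.1%} ({by_stratum})")
```

In this invented data set the crude comparison favours the large units, while the stratum-specific comparison shows no difference, which is the pattern that case-mix adjustment is intended to expose.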

Condition-Specific Outcome Measures

The term ‘condition specific’ describes instruments designed to measure health outcomes considered to be of specific interest to patients who incur health problems attributable to a particular disease or as the result of other processes. Such instruments are often referred to as ‘disease specific’, but ‘condition specific’ is the more general term as it also encompasses areas such as natural ageing, trauma and pregnancy, which are not diseases.9

The measurement of health status is not restricted to broad generic measures. There are many instances when researchers and clinicians are interested in assessing the health status of individuals with a certain condition or disease. As might be anticipated, many tools have been designed for this purpose and these are primarily aimed at measuring changes that are of importance to clinicians. For example, Spilker et al. identified over 300 such instruments in 1987 and many more are presently available.10 Examples of such instruments include: measures for arthritis, such as the Arthritis Impact Measurement Scales;11 measures for the heart, such as the Specific Activity Scale;12 measures that assess pain, such as the McGill Pain Questionnaire;13 and measures for varicose veins, such as the Aberdeen Varicose Vein Questionnaire.14 These instruments have a varying number of dimensions, differing numbers of items and are generally self-completion or interview, though some methods include professional assessment and clinical interview. Such questionnaires are usually scored in a simplistic fashion. Most have simple numerical scaling, such as from 1 to 5, and these scores are usually summed across the items for each dimension, or across all items.

Advantages of condition-specific measures include their relevance and their greater responsiveness to health change.15 Disadvantages are that they often exclude items relevant to potential complications of treatment and symptoms that do not easily fit the medical model of disease. Generic measures have tended to be used in preference because they can be used to assess benefits for differing treatments or conditions, in a common and exchangeable currency. This enables decisions to be made on allocative efficiency between healthcare programmes within the total healthcare budget, rather than helping to establish the technical efficiency of producing health benefits for a specific condition.9

Patient-Reported Outcome Measures (PROMs)

Owing to the increasing need to assess the effectiveness of care from the patient's perspective, a number of PROMs have been developed. For example, in the NHS, patients undergoing primary hip or knee replacement, hernia surgery or varicose vein surgery are asked to complete questionnaires assessing their symptoms and disability before and after surgery. Some NHS trusts use electronic methods of recording outcome data, such as the Patient Assessment Questionnaires (ePAQ) in gynaecology. The advantage of PROMs is that patient experience is assessed directly, which minimises observer bias. These results can be useful in informing patients, redesigning the provision of services and quality improvement. One disadvantage can be a poor response rate and, in the case of electronic data collection, patients' access to and use of the technology needed to complete the questionnaires.

The Measurement Of Pain

Pain is a common and important symptom of many medical conditions and deserves special consideration. While many generic and condition-specific instruments dedicate specific dimensions to the measurement of pain, there are also a number of instruments designed specifically to assess levels of pain. Pain cannot be directly assessed through clinical measures (e.g. blood samples) and, in the absence of such objective approaches, it is necessary to assess pain subjectively through the patient's own perceptions.

Subjective measures allow for reproducible results provided that the tools used to assess pain are applied appropriately. While subjective measures of pain have many advantages over other instruments, there are problems in assessing subjects for whom communication is difficult. Examples include very young children and patients who are incapable of expressing how they feel. Those patients who are unconscious or who are terminally ill will continue to present particular difficulties for those attempting to assess levels of pain.

As with other outcome measures, options exist to use binary, categorical or visual analogue scales. Within this area, there are both pain-relief scales and pain-intensity scales to be considered. There are often occasions when it will be necessary to measure the state of the pain rather than the effect of a particular intervention or therapy, and in these circumstances pain-intensity scales will be appropriate. Pain-relief scales may well encompass more than an intensity scale, as any side-effects resulting from an intervention, such as dizziness, might be included in such measurements. Pain-relief scales also require the patient to make a judgment as to how the current pain compares with remembered pain, before the intervention. This might make such scales more complicated for patients to grasp compared with intensity scales and this may raise doubts about validity. In situations where both pain-relief and pain-intensity scales are appropriate, a decision is required regarding which one should be used.

Pain can manifest itself in a variety of qualities, and one of the most widely used tools is the McGill Pain Questionnaire.13 This is a generic instrument, designed primarily for adults, which was developed to specify the qualities and intensities of pain; as such, it is intended to provide a quantifiable profile of pain. The questionnaire can be completed by the patient or administered in an interview. Completion takes roughly 15–20 minutes but becomes quicker on subsequent applications. The instrument contains 78 pain descriptor words, which are grouped into 20 subclasses, each of which contains two to six words that describe pain on an ordinal scale. These words are arranged to reflect three dimensions of pain: sensory, affective and evaluative. The questionnaire can yield three indices of pain: a pain-rating index based on the scale values of the words chosen by the patient; a rank score using the rank values within each subgroup chosen; and the total number of descriptions chosen by the patient.

The McGill Pain Questionnaire has been thoroughly investigated in recent years and the instrument has proved to be the most reliable in applications to patients with moderate-to-severe chronic or acute pain. It has been shown to be particularly useful in disaggregating explained pain from unexplained pain. When the instrument was used in a cohort of cancer patients with lymphoedema, it proved to be more sensitive than categorical or analogue scales.16

For patients who are critically ill or unconscious, the Behavioural Pain Score (BPS) was developed to assess response to pain. The BPS is based on three items: facial expression, movement of upper limbs and compliance with mechanical ventilation. Each item is graded from 1 to 4, with 1 being no response and 4 being full response, giving a minimum score of 3 and a maximum score of 12.17 Over the last decade the BPS has been validated in a number of studies.3,18
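The scoring rule just described can be sketched in a few lines. The function and parameter names are hypothetical; only the three items, the 1 to 4 grading and the resulting 3 to 12 range come from the text, and the descriptors attached to each grade are not reproduced here.

```python
def bps_total(facial_expression, upper_limbs, ventilation_compliance):
    """Sum the three BPS items, each graded as an integer from 1 to 4."""
    items = {
        "facial expression": facial_expression,
        "upper limbs": upper_limbs,
        "compliance with ventilation": ventilation_compliance,
    }
    for name, grade in items.items():
        if grade not in (1, 2, 3, 4):
            raise ValueError(f"{name} must be graded 1, 2, 3 or 4, got {grade!r}")
    return sum(items.values())

print(bps_total(1, 1, 1))  # lowest possible total
print(bps_total(4, 4, 4))  # highest possible total
```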

Health-Related Quality Of Life

Recent years have witnessed quite an upsurge in interest in the measurement of health-related quality of life, much of which has no doubt been stimulated by the analytical demands of researchers and the need for outcome information as a basis for policy decisions.

Such health status measures are standardised questionnaires, or instruments, which are used to evaluate patient health across a broad range of areas. These areas include symptoms, physical functioning, mental well-being, work and social activities. Measures can be either generic or condition specific, and such measures can generate a profile of scores or a single index. These scores can be based upon people's preferences (e.g. EQ-5D) or, more usually, arbitrary scoring procedures (e.g. SF-36, assumes equal weighting for most items).19 Four of the most frequently used instruments are discussed in more detail below.


EQ-5D

Of those instruments that generate a single index, EQ-5D has been rapidly adopted and widely used.19 The measure was developed by a group of researchers from seven centres across five countries.20 The present measure has five dimensions, whereas the original version had six. Patients are classified by the completion of a five-item questionnaire. The EuroQol is a brief, easy-to-use questionnaire of two pages. It can be made even simpler by using the one-page descriptive classification. Self-completion, or interview, usually only takes a matter of minutes and response rates tend to be extremely high.19 The five dimensions of the EQ-5D comprise mobility, self-care, usual activities, pain/discomfort and anxiety/depression. Each of the five questions has three levels; therefore, combining the five questions defines 243 health states. In addition, the instrument contains a visual analogue scale: a thermometer scale calibrated from 0 (representing ‘worst possible health state’) to 100 (representing ‘best imaginable health state’).
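The figure of 243 health states follows from five questions with three levels each (3 × 3 × 3 × 3 × 3). A minimal sketch that enumerates them is shown below; writing each state as a five-digit code such as 11111 is used here purely as a compact label, and the variable names are illustrative.

```python
from itertools import product

DIMENSIONS = (
    "mobility",
    "self-care",
    "usual activities",
    "pain/discomfort",
    "anxiety/depression",
)
LEVELS = (1, 2, 3)  # three response levels per question

# Every combination of one level per dimension is one health state.
states = ["".join(str(level) for level in combo)
          for combo in product(LEVELS, repeat=len(DIMENSIONS))]

print(len(states))            # number of distinct health states
print(states[0], states[-1])  # first and last codes in the enumeration
```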


The EQ-5D is now one of the most widely adopted methods for measuring health-related quality of life and is the method favoured by the National Institute for Health and Clinical Excellence (NICE) for deriving utility weightings for cost-effectiveness analysis (see below). It has been used in a wide variety of clinical areas, many pharmaceutical companies now include it as a standard outcome measure in clinical research, and the questionnaire has been translated into many different languages and validated in a number of different countries.21


SF-36

The SF-36 is a self-administered questionnaire composed of 36 items. It measures health across eight multi-item dimensions, covering functional status, well-being and overall evaluation of health. Responses within each of the dimensions are combined in order to generate a score from 0 to 100, where 0 represents ‘worst health’ and 100 indicates ‘best health’. Dimension scores should not be aggregated into a single index score. The questionnaire takes about 5 minutes to complete, attains good response rates and is suitable for completion by the patients themselves or for administration by trained interviewers on a face-to-face or telephone basis.22

The questionnaire has been used in a variety of settings, administered to differing populations, and exhibits good psychometric properties. Detailed information regarding the scoring process and computation is readily available, as are published data for comparative norms. The question complexity can be a problem in situations where the sample comprises individuals with low education levels, but otherwise the instrument is suitable for administration in a wide range of settings.22


SF-6D

The SF-6D is a newer health measure, which is derived from various items of SF-36 or SF-12 and provides a single index measure.23 It comprises six multi-level dimensions and describes 18 000 health states. The SF-6D measure can be derived for any individual patient completing SF-36 or SF-12. A standard gamble technique has been used to obtain parametric and non-parametric preference weights from a sample of the general population who were asked to rate a variety of health states. These weights are modelled to predict values for the health states described by SF-6D. This measure, however, may not be readily used in all countries as preferences vary between people living in different countries. Currently, SF-6D preference weights are available for the UK, Australia, Brazil, Hong Kong, Japan, Portugal and Singapore.

Nottingham Health Profile

The Nottingham Health Profile (NHP) measures levels of self-reported distress. The instrument consists of two parts, each of which can be used independent of the other. Part 1, the most frequently used component, comprises 38 statements that are grouped into six sections: physical mobility, pain, sleep, social isolation, emotional reaction, and energy. The number of statements in each of these sections varies from three for the energy dimension to nine for emotional reaction. The second part of the instrument asks respondents to indicate whether or not their state of health influences activity in seven areas of everyday life: work, looking after the home, social life, home life, sex life, interests and hobbies, and holidays. Responses for both parts 1 and 2 are yes/no responses.24

Scoring is straightforward, with 0 assigned to a ‘no’ response and 1 to a ‘yes’ response. Because a ‘yes’ response indicates distress, scores for each of the sections range between 0 (‘best health’) and 100 (‘worst health’). The NHP was designed for self-completion and can readily be used in postal surveys, although it can also be administered by an interviewer. Generally, the instrument takes around 5 minutes to complete.

Quality-Adjusted Life-Years

QALYs were developed as an outcome measure to incorporate effects on both the quality (morbidity) and quantity (mortality) of life. Each year of life is multiplied by a weighting factor reflecting quality of life.25 An alternative to QALYs is healthy years equivalent (HYE), which is discussed below.

QALYs are estimated by assigning every life-year a weight between 0 and 1, where a weight of 0 reflects a health status that is valued as equal to being dead and a weight of 1 represents full health. For example, consider a patient with colon cancer who has a health state of 0.9. Without surgery, he will die in 2 years. With surgery, his health state deteriorates slightly to 0.7, but he lives 5 more years (7 years in total). Therefore, the QALYs gained with surgery = [(0.7 × 7) − (0.9 × 2)] = 4.9 − 1.8 = 3.1. QALYs gained can also be read from a QALY chart. For example, a patient with a particular disease is expected to deteriorate and die at an estimated point 1. With intervention the patient would deteriorate more slowly and die at an estimated point 2. The area between the two curves is the QALY gained by the intervention (Fig. 2.1).26

FIGURE 2.1 An example QALY chart showing the QALY gained with intervention.
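The colon cancer example above can be reproduced with a short sketch. The function name `qalys` and the representation of a health profile as a list of (years, weight) pairs are assumptions made for illustration; a stepwise profile of this kind is also a simple approximation to the area under a curve in a QALY chart.

```python
def qalys(profile):
    """Total QALYs for a profile given as (years, quality weight) pairs."""
    total = 0.0
    for years, weight in profile:
        if not 0 <= weight <= 1:
            raise ValueError("quality weights must lie between 0 and 1")
        total += years * weight
    return total

without_surgery = [(2, 0.9)]  # 2 years at a weight of 0.9
with_surgery = [(7, 0.7)]     # 7 years at a weight of 0.7

gained = qalys(with_surgery) - qalys(without_surgery)
print(f"QALYs gained with surgery: {gained:.1f}")
```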

The QALY is frequently used by decision-makers or researchers to draw comparisons between differing types of health programme or intervention. The QALY can be used in economic evaluation: the number of additional QALYs that a new surgical intervention might yield can be compared with the costs of the new procedure, thus enabling cost per QALY ratios to be generated.27 Such ratios have been used in so-called QALY league tables, whereby procedures or interventions are ranked on this basis. These tables might be seen as useful within the decision-making process, but caution should be exercised in making quick decisions regarding the allocation of resources based upon these comparative tables (see below).28–31
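A cost per QALY ratio, and a league table built from such ratios, might be sketched as follows. The procedures, additional costs and QALY gains are entirely hypothetical, and the function name `cost_per_qaly` is an assumption for illustration.

```python
def cost_per_qaly(additional_cost, qalys_gained):
    """Additional cost divided by additional QALYs."""
    if qalys_gained <= 0:
        raise ValueError("the ratio is only meaningful for a positive QALY gain")
    return additional_cost / qalys_gained

# Hypothetical interventions: (additional cost, QALYs gained).
interventions = {
    "procedure A": (6200, 3.1),
    "procedure B": (9000, 1.5),
    "procedure C": (1200, 0.1),
}

# Rank from the lowest to the highest cost per QALY.
league_table = sorted(
    ((name, cost_per_qaly(cost, gain)) for name, (cost, gain) in interventions.items()),
    key=lambda row: row[1],
)

for name, ratio in league_table:
    print(f"{name}: {ratio:,.0f} per QALY")
```

The ranking depends entirely on the inputs, which is one reason such tables need to be interpreted with caution.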

HYEs were developed as an alternative in order to address some of the suggested shortcomings of QALYs.32 HYEs produce a hypothetical combination of the number of years in full health that equates to the individual's utility of living a number of years at a health state rated at less than full health. In effect, this can be considered as a lifetime profile of health and is often referred to as a stream of health states. In order to produce the values for the HYEs, a two-stage gamble procedure is used. There has, however, been criticism of the use of HYEs and at present the use of QALYs remains the most common method of performing cost-effectiveness analysis.33,34 For the QALY to be a useful measure of outcome, it should reflect patient or individual preferences, which means that if an individual has a choice between two or more treatments or interventions then they should choose the treatment option that yields the most QALYs.


QALYs remain the most common measure of effectiveness used in cost-effectiveness analysis.33,34


Quality weights for use in QALYs can be obtained directly using one of three main methods – visual analogue scales (VAS), standard gamble and time trade-off – which are described in turn below. In addition to these direct methods of measurement, it is possible to obtain utilities through the mapping of health states derived from other quality-of-life measures onto valuations obtained from members of the general population.35 The method for obtaining utility weightings is currently the subject of considerable research, and there is evidence that different values are obtained depending on the method of evaluation and framing of questions. In addition, there is considerable controversy as to whether the most appropriate utility values for economic evaluation are those of the general population or of patients with the particular conditions.36

Rating Scale Or Visual Analogue Scales: In this approach respondents are asked to rate their preference of outcomes or their present health state on a straight-line chart, often a thermometer with fixed points, where the bottom of the thermometer is represented by 0, which is a health state equivalent to being dead, and the top of the thermometer is 1, which indicates full health. If thermometers ranging between 0 and 100 are used, these values are equated to 0–1, and the same assumptions regarding the health states hold. If, by way of an example, we wanted to measure the quality weight for being an amputee, and the respondent selects the health state at 55 on the thermometer scale of 0–100, then for that health state their quality weight is equivalent to 0.55. Advantages of VAS include their simplicity, but this can be counterbalanced by the fact that there is no choice to be made between health states and therefore it is not possible to observe any trade-offs that the individual might have between health states. Another problem with VAS is that it is uncertain whether the responses lie on a linear interval scale: it is questionable whether moving from 20 to 30 is equivalent to moving from 70 to 80. Further, there is scope for end-of-scale bias, where respondents tend to avoid the extremes of scales, and spacing out bias, where respondents tend to space out the outcomes irrespective of the significance.37 Given these reservations, it is difficult to use VAS methods to determine quality weights in a way that is consistent with the theoretical basis of utility measurement.38

Standard Gamble: The second method routinely used to estimate quality weights is standard gamble. Under this approach, the quality weight of a health state (patients' feeling of being well or unwell) can be constructed by comparing a specific number of years in the health state to a gamble with a probability (P) of achieving full health for the same number of years and a complementary probability (1 − P) of immediate death. The probability of full health is varied until the individual is indifferent between the alternatives, and the quality weight of the assessed health state is therefore equal to P. As an example, let us consider that we are comparing 10 years with heart disease with the gamble of full health for 10 years (probability P) or immediate death, with a complementary probability (1 − P). Here, let us assume that the individual is indifferent at a probability of 0.6 of full health. In practical terms, this means that the individual would consider a certainty of living 10 years with the heart disease to be equivalent to taking a gamble that gives a 60% chance of living 10 years in full health and a 40% chance of immediate death. The quality weight here would be 0.6 for the heart disease health state. The advantages of standard gamble include the fact that it is based on expected utility theory;38 however, one of the major disadvantages is that respondents might find the concept difficult to understand because of the probabilities associated with the method. Another drawback is that the hypothetical choices used in such an approach are not representative of ‘real’ life, since choices between large improvements in health status and large mortality risks are seldom encountered, especially with the assumption that we will live for a certain number of years for sure.
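The search for the indifference point in a standard gamble can be sketched as below. The bisection procedure and the simulated respondent are assumptions made for illustration, not a description of how interviews are conducted in practice; the only element taken from the text is that the quality weight equals the probability P at which the respondent is indifferent.

```python
def elicit_standard_gamble(prefers_gamble, tolerance=0.001):
    """Narrow down the probability P at which the respondent is indifferent.

    prefers_gamble(p) should return True when the respondent would take a
    gamble offering full health with probability p (and immediate death with
    probability 1 - p) over the certain health state.
    """
    low, high = 0.0, 1.0
    while high - low > tolerance:
        p = (low + high) / 2
        if prefers_gamble(p):
            high = p  # gamble still attractive, so indifference lies lower
        else:
            low = p   # certain state preferred, so indifference lies higher
    return (low + high) / 2

# Simulated respondent whose indifference point is 0.6, as in the
# heart disease example above.
def respondent(p):
    return p > 0.6

weight = elicit_standard_gamble(respondent)
print(f"Quality weight for the health state: {weight:.2f}")
```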

Time Trade-Off: A third approach to estimating quality weights is time trade-off.39 This method compares D years duration in the health state with X years in full health. The number of years in full health is varied until the individual is indifferent between their options; at this point, the quality weight of the health state is calculated as X/D. As an example, consider that we wish to calculate the quality weight for heart disease and that we assess the measurement for 10 years of heart disease. If the individual considers that 10 years of living with heart disease (D = 10) is equivalent to 6 years of full health (X = 6), then the quality weight is equal to 0.6 (6/10).
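The three direct methods described above each reduce to a simple calculation once the respondent's answer is known. The following is a minimal sketch in Python using the worked figures from the text; the function names are illustrative and are not part of any standard instrument.

```python
def vas_weight(score, scale_max=100.0):
    """Rescale a thermometer reading so that 0 is dead and 1 is full health."""
    if not 0 <= score <= scale_max:
        raise ValueError("score must lie between 0 and scale_max")
    return score / scale_max


def standard_gamble_weight(p_full_health):
    """The weight is the probability P of full health at the indifference point."""
    if not 0 <= p_full_health <= 1:
        raise ValueError("P must be a probability")
    return p_full_health


def time_trade_off_weight(years_full_health, years_in_state):
    """The weight is X / D: X years of full health judged equal to D years in the state."""
    if years_in_state <= 0 or not 0 <= years_full_health <= years_in_state:
        raise ValueError("need D > 0 and 0 <= X <= D")
    return years_full_health / years_in_state


print(vas_weight(55))                # amputee example: 0.55
print(standard_gamble_weight(0.6))   # heart disease example: 0.6
print(time_trade_off_weight(6, 10))  # heart disease example: 0.6
```

The difficulty in practice lies in eliciting the indifference point from the respondent, not in the arithmetic that follows.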

Obtaining Utilities From Health-Related Quality Of Life: The direct methods of obtaining utilities have a number of drawbacks. In particular, the VAS has a poor theoretical basis, and standard gamble and time trade-off methods may be complex to administer.40 An alternative approach is to derive single index measures from generic or disease-specific health-related quality-of-life measures.41,42

One of the most widely adopted methods is derived from the EQ-5D, as described above.42 The questionnaire generates a limited number of discrete health states depending upon the grading of the responses to each of the questions and previous research has used direct measurement methods to obtain a utility tariff associated with each of these states.43 These tariffs have been widely used for cost-effectiveness analysis and alternative tariffs for different populations have also been developed.21 Another generic measure that has been used to generate utilities is the SF-6D, a measure derived from the SF-36 questionnaire.44 This has been extensively validated and compared with EQ-5D in a number of studies.45

In cases where information from generic scores is not available, methods have been developed to derive utility weightings from disease-specific measures such as scoring systems for arthritis46 or even from laboratory measures such as haemoglobin levels, used as a proxy for anaemia-related symptoms.47 Disease-specific scores may be more sensitive to changes in symptoms for the condition in question, but suffer from the disadvantage that they may fail to capture adverse events or other outcomes of treatment that are outside the main domains that are of interest in the specific disease so that comparability with other diseases may be questionable. There remains considerable controversy about the relative benefits of the use of generic and disease-specific measures to generate utilities.48,49

Costs, charges and resource use

From the above discussion, it can be seen that, from a health economist's perspective, there are several methods for assessing the benefits that might be accrued from a healthcare intervention within the context of an economic evaluation. Whichever approach is adopted within the economic evaluation, the method of identifying either the costs or the benefits is essentially the same for each of the approaches. In order to identify the relevant costs, it will be necessary to categorise all items of resource that will be utilised within the healthcare programme. Therefore, we need to identify which resources are required and which are not. Measurement requires an estimation of the amount of resources used within the programmes, and these should be measured using natural units of measurement. For example, to look specifically at staffing time, one would use units of time (such as hours) that are spent on activities relating to the programme and the specific grades of staff. For other categories of resource use, different units would be appropriate. One might look at drug use in units such as doses of specific drugs. Other examples of resource use and their relevant methods of measurement are outlined in Table 2.1.

Table 2.1

Resource use and methods for measurement

Many of the items in Table 2.1 are readily identifiable and straightforward to value. Of these, staffing costs usually have the greatest impact upon healthcare costs, and these can be readily costed provided that we are aware of the staffing scale and can use wage rates or salary levels attached to the staff level. For example, to cost consultant time one would multiply the number of hours of consultant time in the programme by their hourly pay, with added allowance to cover the costs of leave, sick pay and superannuation, etc. The majority of the other categories of resource use for health services identified in the table (consumables, overheads, capital, etc.) can be readily costed through the use of the market price.
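The costing approach just described (quantity in natural units multiplied by a unit cost) can be sketched as follows. All figures, the oncost rate and the resource names are hypothetical and serve only to show the arithmetic.

```python
def staff_cost(hours, hourly_pay, oncost_rate):
    """Hours x hourly pay, uplifted for leave, sick pay, superannuation, etc."""
    return hours * hourly_pay * (1 + oncost_rate)


def programme_cost(items):
    """Sum quantity x unit cost over a dict of (quantity, unit_cost) pairs."""
    return sum(quantity * unit_cost for quantity, unit_cost in items.values())


# All figures below are hypothetical.
consultant = staff_cost(hours=40, hourly_pay=60.0, oncost_rate=0.25)
resources = {
    "consultant time": (1, consultant),  # already costed above
    "drug doses": (200, 1.50),           # doses x price per dose
    "theatre consumables": (10, 85.0),   # packs x market price per pack
}

print(consultant)                 # 3000.0
print(programme_cost(resources))  # 4150.0
```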

Elsewhere, community services, ambulance services and the expenses incurred by patients and their families would usually be costed in the same manner as health service resources.

Within the above, certain components are notoriously difficult to cost. For example, using patient or family leisure time incurs an opportunity cost, which is complex to measure as there are differing types of activity that are forgone in such situations. It is also complex to attach monetary values to activity involving voluntary care or time lost from housework. No accessible market value exists for either of these areas; therefore, it is customary to use a comparable market value from another market. As an example, Gerard used the wage rate for auxiliary nursing staff in order to cost the inputs by volunteers assisting with respite services for mentally handicapped adults.50 However, there are occasions when comparable proxy market values are not readily accessible. This is often the case for the costing of ‘time off usual activities’, such as housework, which by its nature is not of routine duration and is often irregular in its occurrence, making comparison with other occupations virtually impossible. One approach that is advocated in such circumstances is to use average female labour costs as a relatively accurate reflection of the opportunity cost of housework.51

While costing might appear to be rather simplistic and straightforward, there are a number of considerations that need to be taken into account before embarking upon such an exercise.

Counting Costs In Base Year

First, we should consider whether healthcare costs should be counted in base-year prices. By this, we mean that the costs should be adjusted in order to take account of inflation. If we assume that the annual inflation rate is running at 6% then £1060 would be required to purchase an item of medical equipment in a year's time that would currently cost £1000. The two values (both now and in 1 year's time) are considered to be equivalent in real terms, although of course they represent two different amounts of money. This problem becomes more acute if we are considering a comparison of two or more health programmes that have their costs spread at differing proportions over a different number of years. To illustrate this point, let us consider the following example from Auld et al.51 where surgical and drug treatment options are considered for the same hypothetical condition (Table 2.2). We assume that each option has the same effect, but the cost streams are different between the two options, and the inflation rate is 5% per annum. This rate means that a cost of £1050 in a year's time is equivalent to £1000 now (i.e. £1050/1.05), and likewise £1102.50 in 2 years' time is also equivalent to £1000 now (i.e. £1102.50/1.05²). If we compare the unadjusted costs of the two options in this example, we would conclude that surgery is the more efficient option, since it is the less costly of the two but equally effective. However, it should be noted at this point that the drug therapy option is only greater in terms of cost owing to inflation. Therefore, if the costs are adjusted to take account of inflation, by adjusting costs to year 0 prices, then both therapies cost exactly the same, with the same effectiveness, and neither of the options can be considered superior to the other.
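The adjustment to year 0 prices can be sketched in a few lines of Python. The two cost streams below are hypothetical and are not the entries of Table 2.2; they are chosen only so that, at 5% inflation, the drug option appears more costly before adjustment and identical after it.

```python
def to_base_year(cost, years_ahead, inflation_rate):
    """Deflate a cost incurred years_ahead from now to year 0 prices."""
    return cost / (1 + inflation_rate) ** years_ahead


def real_total(stream, inflation_rate):
    """Total of a cost stream (list index = year) expressed in year 0 prices."""
    return sum(to_base_year(c, t, inflation_rate) for t, c in enumerate(stream))


# Hypothetical cost streams for years 0, 1 and 2.
surgery = [3000.0, 0.0, 0.0]
drugs = [1000.0, 1050.0, 1102.5]

print(sum(surgery), sum(drugs))             # unadjusted: 3000.0 3152.5
print(round(real_total(surgery, 0.05), 2))  # 3000.0
print(round(real_total(drugs, 0.05), 2))    # 3000.0
```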

Table 2.2

Adjusting costs to base year (assuming 5% inflation)

Discounting

Not all costs and benefits of healthcare programmes are observed to occur at the same point in time. For example, the costs associated with a vaccination programme are incurred very early in order to provide benefits to the individual, or society, in later life. In general, individuals prefer to reap the benefits sooner rather than later and prefer to incur the costs later rather than sooner. The most common method of allowing for such circumstances is to apply a discount rate to future costs and benefits.26 This leads us to consider whether the costs (and benefits) occurring at differing time points should be allocated equal weighting. There is no consensus amongst health economists over what the appropriate discount rates are, or whether costs and benefits should be discounted at the same rate. Choosing the appropriate discount rate can have significant implications for the results of evaluations; the currently recommended rate in England and Scotland is 3.5%. Consequently, sensitivity analysis is essential to assess the implications of varying the discount rate. Recent convention dictates that costs should be discounted at the same rate as benefits,52,53 though again this should be done in conjunction with sensitivity analysis to assess variations in such assumptions. Issues of discounting of costs and benefit may have a particularly dramatic effect on conclusions if different options have marked differences in the timing of expenditure or outcomes. This is particularly important when considering screening programmes and preventative treatments.
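As an illustration of discounting, the sketch below computes the present value of a stream of costs (or benefits) and repeats the calculation at several discount rates, which is the simplest form of the sensitivity analysis recommended above. The cost stream and the alternative rates of 0% and 6% are hypothetical; only the 3.5% rate comes from the text.

```python
def present_value(stream, discount_rate):
    """Discount a stream of values (list index = year) back to year 0."""
    return sum(v / (1 + discount_rate) ** t for t, v in enumerate(stream))


# Hypothetical programme costing 1000 in each of years 0, 1 and 2.
costs = [1000.0, 1000.0, 1000.0]

for rate in (0.0, 0.035, 0.06):
    print(f"discount rate {rate:.1%}: present value {present_value(costs, rate):.2f}")
```

The higher the rate, the less weight later values receive, which is why options with very different timing of expenditure or outcomes are especially sensitive to this choice.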

Marginal Costing

The marginal cost is the cost incurred or saved from producing one unit more, or one unit less, of a healthcare programme. This is in contrast to the average cost, which is the total cost of a programme divided by the total units produced.

In calculating marginal costs the costs of treating an extra case, or moving from one programme to another, need to be assessed. For example, if there is currently a breast-screening programme for 50- to 65-year-old women and one wishes to consider whether breast screening is as cost-effective in women aged 40–50 years of age, this should be done by looking at the marginal costs and benefits of reducing the age at which screening is started rather than assessing average costs for the entire programme. The use of marginal rather than average costs has been found to be extremely important in screening programmes, where the marginal cost of screening an additional individual can be significantly lower than the average cost.54

Auld et al. illustrate this point with an example of hospital care.51 Although it may well cost £25 000 per annum, on average, to care for an elderly person in hospital, it is extremely unlikely that this amount would be saved if one person less were admitted to the hospital. Similarly, this figure is unlikely to equate to an additional expense if one person more were admitted to the hospital. The reasoning behind this is that some costs, such as capital and overhead costs plus some staffing costs, will not differ with small incremental changes in the numbers of patients entering the hospital.
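The distinction between average and marginal cost can be made concrete with a short sketch. The split between fixed and variable costs and the number of patients are hypothetical, chosen only so that the average matches the £25 000 figure quoted above.

```python
def total_cost(patients, fixed_cost, variable_cost_per_patient):
    """Fixed costs (capital, overheads, some staffing) plus patient-related costs."""
    return fixed_cost + variable_cost_per_patient * patients


def average_cost(patients, fixed_cost, variable_cost_per_patient):
    """Total cost divided by the number of patients treated."""
    return total_cost(patients, fixed_cost, variable_cost_per_patient) / patients


def marginal_cost(patients, fixed_cost, variable_cost_per_patient):
    """Extra cost of admitting one more patient at the current level of activity."""
    return (total_cost(patients + 1, fixed_cost, variable_cost_per_patient)
            - total_cost(patients, fixed_cost, variable_cost_per_patient))


# Hypothetical hospital: fixed costs of 2 000 000 and 5000 per patient.
FIXED, VARIABLE, PATIENTS = 2_000_000.0, 5_000.0, 100

print(average_cost(PATIENTS, FIXED, VARIABLE))   # 25000.0
print(marginal_cost(PATIENTS, FIXED, VARIABLE))  # 5000.0
```

Under these assumptions, admitting one patient fewer saves only the variable element, a fraction of the average cost.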

The use of full costs or marginal costs may depend upon the purpose of an economic evaluation and also the time scale of interest. For example, if one is considering making the most cost-effective use of limited resources then the issue may be one of opportunity costs in that a change in activity may be more or less cost-effective depending upon the activity that is displaced. Marginal costing depends upon an ability to distinguish between fixed and variable costs, and this can be difficult as it may depend upon time-scale and capacity issues. In the long term most resources can be altered in line with activity, although there may be issues of economies of scale. In general the assessment of whether a new technology is considered a cost-effective use of resources will depend upon an assessment of the full costs that would be incurred; however, there may be special considerations where a particular resource has limited availability that could not be influenced by additional expenditure due to capacity constraints.55

Another issue that arises in addressing costs is the difference between the true cost of providing care or treatment and the charge that may be made for that treatment by the providing authority. In the UK in recent years the costs of treatment have been classified according to health resource groups that may cover a range of related procedures and/or diagnostic groupings. For these procedures, reference costs have been determined that represent the average costs of treatment within the NHS and a set of tariffs has been developed that determines the level of funding available to providers that deliver the services. In some cases these tariff rates or reference costs are used for economic analysis; however, they frequently cover a wide range of procedures and case mix and may not represent true costs.56

Summary Of Cost Analysis

This section highlights the importance of adhering to the appropriate methods when undertaking the costing component of an evaluation. Failure to do so might well lead to incorrect conclusions and to recommendations based upon flawed analysis. Clearly, not all studies that are published will have fully adopted the principles underlying the methods outlined in this section, and it is important when assessing the results from evaluations to consider whether appropriate analyses have been undertaken.

Economic evaluation

Within the healthcare sector, there will never be enough resources to allow the sufficient provision of healthcare to satisfy the demands of society. Quite simply, resources are scarce and choices need to be made about how best to distribute such resources. Such problems are further increased by the fact that healthcare is a mercurial environment with changing technology and population structures. This leads on to the concept of opportunity cost. Because of the scarcity of resources, choices need to be made regarding the best method for their deployment. It is therefore inevitable that choosing to use resources for one activity requires that their use in other activities must be forsaken. The benefits, often referred to as utility, that would have resulted from these forsaken activities are referred to as opportunity costs. In healthcare, the opportunity costs of the use of resources for a particular healthcare programme or intervention are equivalent to the benefits forsaken in the best alternative use of these resources.

It is necessary to identify from whose perspective the economic evaluation is undertaken. The perspective can be that of the individual patient, the NHS, the individual hospital or service provider, the government or society as a whole. If we considered the societal perspective, then we would seek to include all costs and benefits, no matter where they occur. In the UK, the methods recommended by NICE consider that the base case for cost-effectiveness analysis should include health and personal social service costs, but not all societal costs or costs incurred by patients, although other issues may be taken into account in evaluating technologies.

Within healthcare, economic evaluation is used as a general term to describe a range of methods that look at the costs and consequences of different programmes or interventions.54 Each of the methods involves identifying, measuring and, where necessary, valuing all of the relevant costs and consequences of the programme or intervention under review.

There are four main approaches for undertaking economic evaluation: cost-minimisation analysis, cost-effectiveness analysis, cost–utility analysis and cost–benefit analysis. A summary of the features of these is given in Table 2.3 and each is discussed below, outlining their appropriate use in healthcare.

Table 2.3

Methods of economic evaluation

Type of economic evaluation | Units of measurement
Cost-minimisation analysis | Outcomes are the same between the different options; evaluation based upon cost
Cost-effectiveness analysis | Benefits are quantity or quality of life, which are measured in natural units (e.g. life-years gained, cases avoided, etc.)
Cost–utility analysis | Benefits are quantity and quality of life, which are measured using QALYs or HYEs
Cost–benefit analysis | Benefits are quantity and quality of life, which are measured in monetary terms such as human capital or willingness to pay

HYE, healthy years equivalent; QALY, quality-adjusted life-year.

Cost-Minimisation Analysis

Cost-minimisation analysis is often considered to be a form of cost-effectiveness analysis but is treated here as a separate method of economic evaluation. This particular form of economic evaluation is appropriate in circumstances where, prior to investigation, there is no reason to expect that there will be any therapeutic difference in the outcomes of the procedures under consideration. For example, we might wish to consider two different settings of treatment for varicose veins, such as day-case and inpatient treatment. Here, one might assume that there would be no expected differences in outcome between the two forms of treatment, and therefore the preferred option would involve choosing the treatment method that was the least costly of the two. Care should be taken in applying cost minimisation and it should be borne in mind that the lack of evidence regarding differences in outcomes is not the same as evidence that such differences do not exist. If there is potential for such differences then a safer approach is to carry out cost-effectiveness analysis with a sensitivity analysis to examine the effect of possible differences in outcome (see below). It is not unusual to find that plausible but unsubstantiated differences in outcome would outweigh the cost differences, which might have led to incorrect conclusions had analysis been confined to cost-minimisation techniques.

Cost-Effectiveness Analysis

Cost-effectiveness analysis should be used when the outcomes from the different programmes or interventions are anticipated to vary. The outcomes are expressed in natural units, though the appropriate measure to be used in such studies depends ultimately upon the programmes that are being compared. For interventions that would be expected to extend life, natural units such as life-years gained would be an appropriate measure. However, there might be a programme, such as the surgical approach for a hernia repair, where other measures might be considered appropriate, in this case, for example, recurrence rates or time taken to return to work. Likewise, there might be a comparison of two different preventative treatments for coronary heart disease, where heart attacks avoided might be a suitable measure to use. In order to assess the cost-effectiveness, or otherwise, of the interventions, cost is expressed per unit of outcome (cost-effectiveness ratios). The outcome of interest in the appraisal of two or more interventions must be exactly the same for each of the alternatives that are considered. Therefore, the results from cost-effectiveness studies cannot often be generalised in order to assess the impact of interventions for differing conditions unless a unified measure that reflects both quantity and quality of survival is used (see cost–utility analysis). In conclusion, cost-effectiveness analysis is a useful tool for informing choices between alternatives where common outcomes have been used for the analysis.

As an example, one might assess the cost per stroke avoided in comparing the cost-effectiveness of a drug treatment for stroke prevention with that of carotid endarterectomy. However, such figures would be of little help in comparing the value of these treatments with that of a treatment for a different condition, such as joint replacement.

Cost–Utility Analysis

As has been discussed above, one of the potential limitations of cost-effectiveness analysis is that it does not allow for decisions to be made regarding different treatments for differing diseases or conditions if the units of outcome differ between disease areas.

Cost–utility analysis can be thought of as a special case of cost-effectiveness analysis where the outcomes are expressed in generic units that are able to represent the outcome for different conditions and treatments. Therefore, the units of outcome combine both mortality and morbidity information into a single unit of measurement (such as QALYs or HYEs). This two-dimensional outcome measure allows comparisons to be drawn between treatments for different therapeutic areas. These units of measurement are often expressed in terms of a universal unit, usually cost per QALY gained. Such units have resulted in league tables that compare the outcomes for treatments in different areas, and these will be discussed below.


Cost–utility analysis expressed in terms of cost per QALY gained has become the predominant form of cost-effectiveness analysis in recent years.

Incremental Cost-Effectiveness Ratios

In comparing the results of cost-effectiveness analysis for different treatments it is usual to report the incremental cost-effectiveness ratio (ICER). The ICER is the ratio between the additional cost that is incurred and the additional benefit (frequently measured in QALYs). Thus, for example, if a treatment produces 0.5 of a quality-adjusted life-year more than the next best treatment but costs an additional £5000, the incremental cost-effectiveness ratio would be £5000 divided by 0.5 or £10 000 per QALY. If the treatment that produces greater clinical benefit is less costly than the alternative then the ICER is negative and the treatment with better outcomes is said to ‘dominate’, thus being the preferred option in that it produces greater benefit at lower cost.

Where the ICER is positive it reflects the fact that the additional benefit comes at an additional cost and the ICER can be compared with other alternative uses for the available resources. Such calculations have been used to produce league tables to compare the cost of providing benefits by treatment in different clinical settings.
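The ICER calculation and the notion of dominance can be sketched as follows. The incremental figures (an extra £5000 for an extra 0.5 QALY) are those used in the text; the absolute costs and QALYs for each treatment, and the labels returned by the function, are hypothetical.

```python
def compare(cost_new, qaly_new, cost_old, qaly_old):
    """Return a label and, where a trade-off exists, the ICER in cost per QALY."""
    d_cost = cost_new - cost_old
    d_qaly = qaly_new - qaly_old
    if d_qaly == 0:
        return "no difference in outcome: compare on cost alone", None
    if d_qaly > 0 and d_cost <= 0:
        return "new treatment dominates", None
    if d_qaly < 0 and d_cost >= 0:
        return "new treatment is dominated", None
    # Either more benefit at more cost, or less benefit at less cost.
    return "trade-off", d_cost / d_qaly


print(compare(15_000.0, 4.5, 10_000.0, 4.0))  # ('trade-off', 10000.0)
print(compare(9_000.0, 4.5, 10_000.0, 4.0))   # ('new treatment dominates', None)
```

Returning a label instead of a negative ratio is a design choice made here: a negative ICER does not by itself show which treatment is better, so the sign of each increment is checked directly.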

Cost-Effectiveness League Tables

Decision-makers face difficult decisions when asked how to allocate resources in healthcare. Such decisions are increasingly influenced by the relative cost-effectiveness of different treatments and by comparisons between healthcare interventions in terms of their cost per life-year or per QALY gained. The first compilation of such league tables was undertaken by Williams, who calculated the cost per QALY of a range of interventions and divided them into strong candidates for expansion and less strong candidates for expansion.27 Advocates of such analyses argue that if properly constructed, these tables provide comprehensive and valid information to aid decision-makers.

There are, however, problems with the use of such tables and these can make interpretation and comparison between studies problematical.57 First, the year of origin for the studies varies and, because of technological changes and shifts in relative prices, the ranking might not be truly reflective of the intervention under current practice. Second, differing discount rates have been used in the studies, some appropriately and others inappropriately, which impacts upon the results. Third, there have been a variety of preference values for health states, and currently it is difficult to determine which measure of quality of life has been used to derive the estimates concealed within the statistics presented in the league table. Clearly, if there is a high degree of homogeneity between the methods used to derive such estimates then these statistics might well aid decision-makers. Fourth, there is a wide range of costs used within the studies, and often costs are presented at an insufficient level of detail to allow recalculation to reflect local circumstances. In addition, many studies used in such league tables are often compared with differing programmes from which the incremental cost per QALY has been assessed. For example, some might compare with a ‘do nothing’ or ‘do minimum’ alternative, while other programmes would compare with the incremental cost per QALY of expanding services to other groups of patients. Finally, the setting of the study will prove important in drawing comparisons between the statistics in such tables, especially in situations where the studies are undertaken in different countries, requiring adjustments for exchange rates.

There has been a substantial amount of literature on the topic,28–31 and while these tables might aid the decision-maker they also need to be interpreted with extreme caution as there is ample opportunity to mislead the casual observer.

Willingness-To-Pay Thresholds

A more common practice in recent years has been to compare incremental cost-effectiveness ratios against a ‘willingness-to-pay’ (WTP) threshold rather than directly against the ICER of other specific treatments. The WTP threshold is a figure that represents the amount that those who fund the healthcare consider is the maximum that should be paid to generate one unit of benefit, usually one QALY. This figure may be specific to local circumstances, depending upon the population and funding arrangements. In the UK, there has been some discussion in recent years on the appropriate level of WTP thresholds and the theoretical basis for the figures.58,59 On one hand it may be thought of as representing the amount that society is prepared to pay to generate additional healthcare benefit. An alternative interpretation is that if one assumes that the resources available to health are finite then additional expenditure on a new treatment will displace expenditure elsewhere. Assuming that healthcare provision was fully efficient, the WTP threshold would be the level of cost per additional QALY at which the activity displaced resulted in a net loss of health benefit that was equivalent to that produced by the new treatment. Thus, any treatments that fell below the threshold would displace activities that produced less overall benefit, providing a net gain in health, whilst any treatment that was purchased above the threshold would be displacing greater health gain than was provided by the new treatment.

At present there is active empirical research to try to establish a realistic WTP threshold. In the UK setting NICE works with a threshold of £20 000–30 000 per QALY,60 although early empirical research suggests that this may be set rather high.61,62

Cost–Benefit Analysis

Whilst cost-effectiveness analysis and cost–utility analysis tell us whether a programme or intervention has better outcomes at additional costs or gains more QALYs, they cannot tell us whether the use of resources to achieve those outcomes is justified. Cost–benefit analysis is a type of evaluation that places a single value, usually in monetary terms, upon the benefits and outcomes from differing programmes of healthcare, i.e. it determines the absolute benefit of both quality and quantity, which is vital in resource allocation. In order to do this the health outcomes from treatment need to be measured in the same units as cost. This can be carried out as an extension to cost–utility analysis where the costs and benefits are converted to the same units. In the UK this is usually done with reference to the WTP threshold set by NICE. For example, if one were to consider a treatment that produces one additional QALY then at a WTP threshold of £20 000 this may be considered equivalent to a £20 000 benefit. If the cost of providing that treatment were £10 000 the treatment would result in a net benefit of £10 000. However, at a cost of £30 000 the net benefit would be −£10 000, implying that there would be a net loss from providing the treatment (as, for example, if it were to displace a more cost-effective use of the available resources). An alternative way of presenting such analysis is in terms of net health benefit rather than economic benefit, so that the result is presented in terms of QALY rather than monetary terms (0.5 or −0.5 QALY in the above example).63
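The net benefit calculations in the example above can be written out as a short sketch. The function and parameter names are illustrative; the figures are those given in the text.

```python
def net_monetary_benefit(extra_qalys, extra_cost, wtp_threshold):
    """Value the health gain at the WTP threshold and subtract the extra cost."""
    return wtp_threshold * extra_qalys - extra_cost


def net_health_benefit(extra_qalys, extra_cost, wtp_threshold):
    """Convert the extra cost into QALYs forgone and subtract from the health gain."""
    return extra_qalys - extra_cost / wtp_threshold


WTP = 20_000.0  # threshold per QALY used in the worked example
for cost in (10_000.0, 30_000.0):
    print(cost,
          net_monetary_benefit(1.0, cost, WTP),  # 10000.0, then -10000.0
          net_health_benefit(1.0, cost, WTP))    # 0.5, then -0.5
```

A positive net benefit on either scale corresponds to an ICER below the threshold, so the two presentations lead to the same decision.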

Choosing An Evaluation Method

The appropriate method of economic evaluation depends upon which choices need to be made and the context within which those choices need to be reached (for example, refer to Table 2.4). If outcomes are expected to be the same then the choice is quite straightforward: cost-minimisation analysis may be used. The limitations of cost-effectiveness with disease-specific outcomes should be borne in mind. Cost–utility analysis has increased in popularity in an attempt to standardise and allow comparisons across different conditions and healthcare programmes. Cost–benefit analysis may offer decision-makers an alternative way of viewing such analysis but is dependent upon a predetermined WTP threshold.

Table 2.4

An example of how to choose a type of economic evaluation based on the question

Sensitivity Analysis

Evaluations will always be subject to elements of uncertainty, be it in terms of resource use, costs or effectiveness. Sensitivity analysis is essential in such circumstances as it allows us to assess how sensitive the study results are to variations in key parameters or assumptions that have been used in the analysis. This allows us to assess whether changes in key parameters will result in savings or costs.

It is possible to undertake sensitivity analysis using as few or as many variables as desired. Commonly, variables such as production variables or discount rates will be used, or if statistical analysis of the variables has been undertaken one can carry out sensitivity analysis around known confidence intervals. Although sensitivity analysis is advocated for evaluations, a review by Briggs and Sculpher52 found that only 39% of articles reviewed had taken at least an adequate account of uncertainty, while only 14% were judged to have provided a good account of uncertainty. In addition, 24% had failed to consider uncertainty at all. There are differing methods of sensitivity analysis, which are discussed below.

Simple sensitivity analysis

Simple sensitivity analysis, in which one or more parameters contained within the evaluation are varied across a plausible range, is widely practised. With one-way analysis, each uncertain component of the evaluation is varied individually in order to assess the separate impact that each component will have upon the results of the analysis. Multi-way sensitivity analysis involves varying two or more of the components of the evaluation at the same time and assessing the impact upon the results. It should be noted that multi-way sensitivity analysis becomes more difficult to interpret as progressively more variables are varied in the analysis.52
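A one-way analysis of this kind can be sketched as below. The model is deliberately trivial (an ICER computed from an incremental cost and an incremental QALY gain), and the base-case values and plausible ranges are hypothetical.

```python
def icer(params):
    """Toy model: incremental cost divided by incremental QALYs."""
    return params["extra_cost"] / params["extra_qalys"]


def one_way(model, base_case, name, values):
    """Vary one parameter across its range, holding all others at base case."""
    results = []
    for value in values:
        scenario = dict(base_case)  # copy so the base case is left unchanged
        scenario[name] = value
        results.append((value, model(scenario)))
    return results


base_case = {"extra_cost": 5000.0, "extra_qalys": 0.5}
ranges = {
    "extra_cost": [3000.0, 5000.0, 8000.0],
    "extra_qalys": [0.25, 0.5, 1.0],
}

for name, values in ranges.items():
    for value, result in one_way(icer, base_case, name, values):
        print(f"{name} = {value}: ICER = {result}")
```

A multi-way analysis would change two or more entries of the scenario at once, at the cost of a rapidly growing number of combinations to interpret.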

Threshold analysis

Threshold analysis involves the identification of the critical value of a parameter above or below which the conclusion of a study will change from one conclusion to another.64 Threshold analysis is of greatest use when a particular parameter in the evaluation is indeterminate, for example a new drug with a price that has not yet been determined. A major limitation of threshold analysis is that it deals only with uncertainty in continuous variables, meaning that it is normally only useful for addressing uncertainty in analyses with data inputs.52
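For the example of a drug whose price has not yet been set, a threshold analysis amounts to finding the price at which the net benefit is exactly zero. The sketch below uses simple bisection; the toy model, its parameter values (doses, other costs, QALY gain, WTP threshold) and the search interval are all hypothetical.

```python
def find_threshold(f, low, high, tol=1e-6):
    """Bisection: locate x in [low, high] where f(x) crosses zero.

    Assumes f is continuous and changes sign exactly once on the interval.
    """
    f_low = f(low)
    if f_low * f(high) > 0:
        raise ValueError("f does not change sign on the interval")
    while high - low > tol:
        mid = (low + high) / 2
        f_mid = f(mid)
        if f_low * f_mid <= 0:
            high = mid               # the crossing lies in [low, mid]
        else:
            low, f_low = mid, f_mid  # the crossing lies in [mid, high]
    return (low + high) / 2


def net_benefit(price_per_dose):
    """Toy model: 0.5 QALY gained, 10 doses, 2000 of other costs, WTP 20 000."""
    return 20_000.0 * 0.5 - (2_000.0 + 10 * price_per_dose)


print(round(find_threshold(net_benefit, 0.0, 2_000.0), 2))  # 800.0
```

Under these assumptions the conclusion changes at a price of 800 per dose: below it the drug has a positive net benefit, above it a negative one.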

Analysis of extremes

In analysis of extremes, a base-case analysis is undertaken that incorporates the best estimates of the inputs and then further analyses consider extreme estimates of the relevant variables. For example, if two alternative treatment strategies are being compared, then both the high and low costs can be considered for both therapies and costs can be assessed for each of the options based upon combinations of these. Analysis of extremes can be particularly effective in situations where a base-case value is known together with a plausible range, but the actual distribution between the outer limits is unknown. However, a problem with this approach is that it does not consider how likely it is that the various scenarios will arise.52
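The combinations of high and low costs described above can be enumerated mechanically. In this sketch the two therapies and their base-case, low and high costs are hypothetical.

```python
from itertools import product

# Hypothetical cost estimates for two therapies.
cost_a = {"low": 800.0, "base": 1000.0, "high": 1200.0}
cost_b = {"low": 900.0, "base": 1100.0, "high": 1500.0}


def extreme_scenarios(costs_a, costs_b):
    """Cost difference (B minus A) for every low/high combination."""
    rows = []
    for level_a, level_b in product(("low", "high"), repeat=2):
        rows.append((level_a, level_b, costs_b[level_b] - costs_a[level_a]))
    return rows


print("base case:", cost_b["base"] - cost_a["base"])  # 100.0
for level_a, level_b, difference in extreme_scenarios(cost_a, cost_b):
    print(f"A {level_a}, B {level_b}: B costs {difference} more than A")
```

With these figures therapy A is cheaper in the base case and in three of the four extreme scenarios, but not when A is at its high cost and B at its low cost. The analysis says nothing about how likely that scenario is.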

Probabilistic sensitivity analysis

A final approach to dealing with uncertainty is through the use of probabilistic sensitivity analysis (PSA). This method allows ranges and distributions to be assigned to variables about which we are uncertain, thus giving more weight to combinations of values that are more likely to occur. For example, it is unlikely that all of the pessimistic assumptions regarding costs will hold at the same time. Techniques such as Monte Carlo simulation select values for all of the uncertain inputs simultaneously and at random from their assigned distributions and undertake the analysis on hypothetical patient cohorts. This approach allows the proportion of patients to be estimated for whom one of the options under evaluation is preferred; generally, proportions approaching 100% suggest that the intervention is nearly always preferable under a range of conditions. PSA is generally considered to be the most rigorous form of sensitivity analysis and is gaining widespread use.65
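A Monte Carlo simulation of this kind might be sketched as follows. The choice of distributions (gamma for incremental cost, normal for incremental QALYs), their parameters and the WTP value are all assumptions made for illustration, and the decision rule used here (positive net monetary benefit) is one common convention rather than the only option.

```python
import random


def run_psa(n_draws=5000, wtp=20000.0, seed=1):
    """Proportion of simulated cohorts in which the new option is preferred.

    Each draw samples every uncertain input at once from its distribution.
    All distribution parameters below are hypothetical.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    preferred = 0
    for _ in range(n_draws):
        # Incremental cost: gamma with shape 16 and scale 250 (mean 4000).
        d_cost = rng.gammavariate(16.0, 250.0)
        # Incremental QALYs: normal with mean 0.3 and standard deviation 0.1.
        d_qaly = rng.gauss(0.3, 0.1)
        # Net monetary benefit of the new option relative to the comparator.
        nmb = wtp * d_qaly - d_cost
        if nmb > 0:
            preferred += 1
    return preferred / n_draws


proportion_preferred = run_psa()
```

Repeating the run at several WTP values would show how the proportion changes as the threshold is raised or lowered.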

Value of information analysis

Value of information analysis is a recent development that is an extension of PSA. The method uses the results of PSA to consider the effect of reducing the uncertainty. Whilst PSA can provide a measure of the uncertainty around a prediction of cost-effectiveness, expected value of perfect information (EVPI) gives a measure that also incorporates the importance of such uncertainty.66 Further developments of this may help to guide priorities for future research67 or help to design studies and estimate required sample size.68
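In its standard formulation, EVPI is the difference between the expected net benefit when the best option can be chosen separately in every PSA draw (perfect information) and the expected net benefit of the single option that is best on average (current information). The sketch below applies that definition to a handful of made-up draws; the function name and the figures are hypothetical.

```python
def evpi(nb_draws):
    """Expected value of perfect information from PSA output.

    nb_draws is a list of tuples; each tuple holds the net benefit of
    every option in one simulated draw.
    """
    n = len(nb_draws)
    n_options = len(nb_draws[0])
    # Current information: commit to the option with the best average.
    mean_by_option = [sum(d[j] for d in nb_draws) / n for j in range(n_options)]
    value_current_info = max(mean_by_option)
    # Perfect information: pick the best option within each draw.
    value_perfect_info = sum(max(d) for d in nb_draws) / n
    return value_perfect_info - value_current_info


# Four hypothetical draws of net benefit for options A and B.
draws = [(100.0, 150.0), (200.0, 120.0), (50.0, 90.0), (300.0, 100.0)]
evpi_per_patient = evpi(draws)
```

Option A is best on average in this example, but B is better in two of the four draws, so resolving the uncertainty has a positive value. If one option were best in every draw, the EVPI would be zero and further research would not change the decision.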

Ethical issues

Any formal method for determining the costs and benefits of different treatments that may be used to allocate resources is likely to raise complex ethical issues. In particular, certain methods may create apparent discrimination against certain groups, such as the elderly or disabled, due to reduced capacity to gain from a particular treatment. Such methods may also fail to take into account other issues that are seen by society as being important in allocating resources, such as preferences relating to the process of care and issues such as equity.69 It is important that such economic methods should not be used without considering these wider implications of the decisions which stem from such analyses.

Recent advances

Most economic evaluations in healthcare use the above-mentioned methods to assess the monetary value of new treatment options. There are, however, a number of complex issues in economic evaluation that remain controversial. These include whether to use patient or societal preferences, weighting of QALYs to reflect severity of disease, carer benefits and the incorporation of a value for innovation. Over the last decade, multi-criteria decision analysis (MCDA) has been suggested as a way to incorporate these complex and often conflicting values in economic evaluation. In MCDA, ‘criteria’ refers to the values taken into consideration. The process involves consideration of multiple criteria, each of which is given a weight in coming to an ‘objective’ decision.70 Currently, NICE health technology appraisals predominantly use the ICER provided by cost–utility analysis. The ICER is then considered alongside informal methods for incorporating other issues that are not thought to be captured in the costs or QALY measures, often by adjusting the WTP threshold that is considered acceptable.71
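One simple way to combine weighted criteria is a weighted sum of scores, sketched below. This is only one of several MCDA approaches, and the criteria, weights, scores and treatment names are entirely hypothetical; in practice the choice and weighting of criteria is itself a matter of judgment.

```python
# Hypothetical criteria weights; they are chosen to sum to 1.
WEIGHTS = {"cost_effectiveness": 0.5, "disease_severity": 0.3, "innovation": 0.2}

# Hypothetical scores on a 0-100 scale for two treatments.
SCORES = {
    "treatment_x": {"cost_effectiveness": 60.0, "disease_severity": 80.0, "innovation": 40.0},
    "treatment_y": {"cost_effectiveness": 80.0, "disease_severity": 30.0, "innovation": 50.0},
}


def weighted_score(scores, weights):
    """Weighted sum of criterion scores for one treatment."""
    return sum(weights[criterion] * scores[criterion] for criterion in weights)


def rank_options(all_scores, weights):
    """Treatments ordered from highest to lowest overall score."""
    return sorted(
        all_scores,
        key=lambda name: weighted_score(all_scores[name], weights),
        reverse=True,
    )


ranking = rank_options(SCORES, WEIGHTS)
```

In this made-up case treatment X ranks first despite its lower cost-effectiveness score, because the weight given to disease severity offsets it. Changing the weights can reverse the ranking, which is why the weighting step needs to be explicit.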

Another major change in NICE economic evaluations has arisen in appraisals of ‘end-of-life’ interventions. As mentioned in earlier sections, NICE considers interventions to be cost-effective if the cost per QALY gained is below £20 000–30 000. However, in 2009 NICE issued guidance under which some ‘end-of-life’ interventions or therapeutics that cost more than £30 000 per QALY gained may be given consideration if three conditions are met: the treatment is indicated for conditions with a life expectancy of less than 24 months; there is sufficient evidence that the new intervention improves life expectancy by at least 3 months compared with the available NHS treatment; and the treatment is licensed for small population groups.72
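The three conditions described above can be written as a simple check. This is an illustrative sketch only: the function and parameter names are hypothetical, the judgment about whether a population is ‘small’ is passed in as a yes/no flag because the text gives no numerical cut-off, and the requirement for sufficient evidence is not modelled.

```python
def meets_end_of_life_criteria(life_expectancy_months, extension_months, small_population):
    """True if all three conditions stated in the text are satisfied.

    life_expectancy_months: expected survival on current NHS treatment
    extension_months: additional survival offered by the new treatment
    small_population: caller's judgment that the licensed population is small
    """
    return (
        life_expectancy_months < 24
        and extension_months >= 3
        and small_population
    )


# Hypothetical example: 18 months' life expectancy, 4 months gained.
eligible = meets_end_of_life_criteria(18, 4, True)
```

Meeting these conditions means only that an ICER above £30 000 per QALY may be given consideration; it does not in itself imply approval.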


Whether making individual or policy decisions regarding healthcare provision, it is becoming increasingly important for clinicians to take into account evidence about both the effectiveness and the cost-effectiveness of the treatment options. This requires that they examine the available evidence with particular attention to the appropriateness of the outcome measures used and of any techniques for economic analysis. In particular, there is a need for both clinicians and researchers to focus upon outcomes that are relevant to patients and truly represent their views about the relative values of the health states and events that they may encounter. Outcome research and economic evaluation are relatively new areas of healthcare research but they are progressing rapidly. An understanding of the methods used is a prerequisite for an adequate interpretation of the conclusions drawn from such work.


Key points

  • The choice of outcome measure is important in assessing the results of surgical treatment and needs to be carefully considered.
  • The measure used should be clinically relevant and preferably have been validated by previous research.
  • Possible measures relevant to surgery include mortality, condition-specific measures, standard pain questionnaires and generic measures of health-related quality of life.
  • Quality-adjusted life-years are a commonly used measure of outcome and there are several different ways to produce the weights (utilities) that are required to calculate these.
  • The estimation of the cost of treatments should include a detailed analysis of the resources used and their valuation, and may require consideration of the timing of incurring various costs.
  • There are several different methods of economic evaluation, including cost-minimisation, cost-effectiveness, cost–utility and cost–benefit analysis.
  • The use of cost-effectiveness analysis may allow comparison of health benefits to be gained by expenditure on different treatments but is not without both technical and ethical problems in its application.


  1. Fitzpatrick, R., Davey, C., Buxton, M.J., et al. Evaluating patient-based outcome measures for use in clinical trials. Health Technol Assess. 1998;2(14):i–iv. [1–74].
  2. Collin, C., Wade, D.T., Davies, S., et al, The Barthel Index: a reliability study. Int Disabil Stud. 1988;10(2):61–63. 3403500
  3. Aissaoui, Y., Zeggwagh, A.A., Zekraoui, A., et al, Validation of a behavioral pain scale in critically ill, sedated, and mechanically ventilated patients. Anesth Analg. 2005;101(5):1470–1476. 16244013
  4. Campbell, D.T., Fiske, D.W., Convergent and discriminant validation by the multitrait–multimethod matrix. Psychol Bull. 1959;56(2):81–105. 13634291
  5. Perkins, J.M., Collin, J., Creasy, T.S., et al, Exercise training versus angioplasty for stable claudication. Long and medium term results of a prospective, randomised trial. Eur J Vasc Endovasc Surg. 1996;11(4):409–413. 8846172
  6. Stockton, D., Davies, T., Day, N., et al. Retrospective study of reasons for improved survival in patients with breast cancer in east Anglia: earlier diagnosis or better treatment. Br Med J. 1997;314(7079):472–475.
  7. Sowden, A.J., Sheldon, T.A., Does volume really affect outcome? Lessons from the evidence. J Health Serv Res Policy. 1998;3(3):187–190. 10185378
  8. Jones, J., Rowan, K., Is there a relationship between the volume of work carried out in intensive care and its outcome? Int J Technol Assess Health Care. 1995;11(4):762–769. 8567208
  9. Brazier, J., Dixon, S., The use of condition specific outcome measures in economic appraisal. Health Econ. 1995;4(4):255–264. 8528428
  10. Spilker, B., Molinek, F.R., Jr., Johnston, K.A., et al. Quality of life bibliography and indexes. Med Care. 1990;28(12, Suppl):DS1–D77.
  11. Meenan, R.F., Mason, J.H., Anderson, J.J., et al. AIMS2. The content and properties of a revised and expanded Arthritis Impact Measurement Scales Health Status Questionnaire. Arth Rheum. 1992;35(1):1–10.
  12. Goldman, L., Hashimoto, B., Cook, E.F., et al, Comparative reproducibility and validity of systems for assessing cardiovascular functional class: advantages of a new specific activity scale. Circulation. 1981;64(6):1227–1234. 7296795
  13. Melzack, R., The McGill Pain Questionnaire: major properties and scoring methods. Pain. 1975;1(3):277–299. 1235985
  14. Garratt, A.M., Macdonald, L.M., Ruta, D.A., et al. Towards measurement of outcome for patients with varicose veins. Qual Health Care. 1993;2(1):5–10.
  15. Guyatt, G.H., Berman, L.B., Townsend, M., et al, A measure of quality of life for clinical trials in chronic lung disease. Thorax. 1987;42(10):773–778. 3321537
  16. Carroll, D., Rose, K., Treatment leads to significant improvement. Effect of conservative treatment on pain in lymphoedema. Prof Nurse. 1992;8(1):32–33, 35–36. 1480641
  17. Payen, J.F., Bru, O., Bosson, J.L., et al, Assessing pain in critically ill sedated patients by using a behavioral pain scale. Crit Care Med. 2001;29(12):2258–2263. 11801819
  18. Young, J., Siffleet, J., Nikoletti, S., et al, Use of a Behavioural Pain Scale to assess pain in ventilated, unconscious and/or sedated patients. Intensive Crit Care Nurs. 2006;22(1):32–39. 16198570
  19. Brazier, J., Deverill, M., Green, C., et al. A review of the use of health status measures in economic evaluation. Health Technol Assess. 1999;3(9):i–iv. [1–164].
  20. The EuroQol Group. EuroQol – a new facility for the measurement of health-related quality of life. Health Policy. 1990;16(3):199–208.
  21. Rabin, R., de Charro, F., EQ-5D: a measure of health status from the EuroQol Group. Ann Med. 2001;33(5):337–343. 11491192
  22. Brazier, J.E., Harper, R., Jones, N.M., et al. Validating the SF-36 health survey questionnaire: new outcome measure for primary care. Br Med J. 1992;305(6846):160–164.
  23. Brazier, J.E., Roberts, J., The estimation of a preference-based measure of health from the SF-12. Med Care. 2004;42(9):851–859. 15319610
  24. Hunt, S.M., McKenna, S.P., McEwen, J. Measuring health status. London: Croom-Helm; 1986.
  25. Torrance, G.W., Measurement of health state utilities for economic appraisal. J Health Econ. 1986;5(1):1–30. 10311607
  26. Drummond, M.F., Sculpher, M.J., Torrance, G.W., et al. Methods for the economic evaluation of healthcare programmes, 3rd ed. Oxford: Oxford University Press; 2005.
  27. Williams, A. Economics of coronary artery bypass grafting. Br Med J (Clin Res Ed). 1985;291(6491):326–329.
  28. Birch, S., Gafni, A., Cost-effectiveness ratios: in a league of their own. Health Policy. 1994;28(2):133–141. 10136058
  29. Drummond, M., Torrance, G., Mason, J., Cost-effectiveness league tables: more harm than good? Soc Sci Med. 1993;37(1):33–40. 8332922
  30. Mason, J., Drummond, M., Torrance, G. Some guidelines on the use of cost effectiveness league tables. Br Med J. 1993;306(6877):570–572.
  31. Drummond, M., Mason, J., Torrance, G., Cost-effectiveness league tables: think of the fans. Health Policy. 1995;31(3):231–238. 10142619
  32. Mehrez, A., Gafni, A., Healthy-years equivalents versus quality-adjusted life years: in pursuit of progress. Med Decis Making. 1993;13(4):287–292. 8246700
  33. Buckingham, K., A note on HYE (healthy years equivalent). J Health Econ. 1993;12(3):301–309. 10145202
  34. Johannesson, M., Jonsson, B., Karlsson, G., Outcome measurement in economic evaluation. Health Econ. 1996;5(4):279–296. 8880165
  35. Brazier, J., Roberts, J., Deverill, M., The estimation of a preference-based measure of health from the SF-36. J Health Econ. 2002;21(2):271–292. 11939242
  36. Johannesson, M., O'Conor, R.M., Cost–utility analysis from a societal perspective. Health Policy. 1997;39(3):241–253. 10165464
  37. Bleichrodt, H., Johannesson, M., Standard gamble, time trade-off and rating scale: experimental results on the ranking properties of QALYs. J Health Econ. 1997;16(2):155–175. 10169092
  38. von Neumann, J., Morgenstern, O. Theory of games and economic behaviour. New York: Wiley; 1967.
  39. Torrance, G.W., Thomas, W.H., Sackett, D.L., A utility maximization model for evaluation of health care programs. Health Serv Res. 1972;7(2):118–133. 5044699
  40. Hollingworth, W., Deyo, R.A., Sullivan, S.D., et al, The practicality and validity of directly elicited and SF-36 derived health state preferences in patients with low back pain. Health Econ. 2002;11(1):71–85. 11788983
  41. Stein, K., Fry, A., Round, A., et al, What value health? A review of health state values used in early technology assessments for NICE. Appl Health Econ Health Policy. 2005;4(4):219–228. 16466273
  42. Rasanen, P., Roine, E., Sintonen, H., et al, Use of quality-adjusted life years for the estimation of effectiveness of health care: a systematic literature review. Int J Technol Assess Health Care. 2006;22(2):235–241. 16571199
  43. Dolan, P., Modeling valuations for EuroQol health states. Med Care. 1997;35(11):1095–1108. 9366889
  44. Brazier, J., Usherwood, T., Harper, R., et al, Deriving a preference-based single index from the UK SF-36 Health Survey. J Clin Epidemiol. 1998;51(11):1115–1128. 9817129
  45. Brazier, J., Roberts, J., Tsuchiya, A., et al, A comparison of the EQ-5D and SF-6D across seven patient groups. Health Econ. 2004;13(9):873–884. 15362179
  46. Bansback, N., Marra, C., Tsuchiya, A., et al. Using the health assessment questionnaire to estimate preference-based single indices in patients with rheumatoid arthritis. Arth Rheum. 2007;57(6):963–971.
  47. Wilson, J., Yao, G.L., Raftery, J., et al. A systematic review and economic evaluation of epoetin alpha, epoetin beta and darbepoetin alpha in anaemia associated with cancer, especially that attributable to cancer treatment. Health Technol Assess. 2007;11(13):iii–iv. [1–202].
  48. Sculpher, M.J., Price, M., Measuring costs and consequences in economic evaluation in asthma. Respir Med. 2003;97(5):508–520. 12735668
  49. Stolk, E.A., Busschbach, J.J., Validity and feasibility of the use of condition-specific outcome measures in economic evaluation. Qual Life Res. 2003;12(4):363–371. 12797709
  50. Gerard, K., Determining the contribution of residential respite care to the quality of life of children with severe learning difficulties. Child Care Health Dev. 1990;16(3):177–188. 2350870
  51. Auld, C., Donaldson, C., Mitton, C., et al. Economic evaluation. In: Detels R., et al, eds. Oxford textbook of public health, Vol. 2: The methods of public health. Oxford: Oxford University Press, 2002.
  52. Briggs, A., Sculpher, M., Sensitivity analysis in economic evaluation: a review of published studies. Health Econ. 1995;4(5):355–371. 8563834
  53. Cairns, J., Discounting and health benefits: another perspective. Health Econ. 1992;1(1):76–79. 1342634
  54. Drummond, M., Maynard, A. Purchasing and providing cost-effective health care. Edinburgh: Churchill Livingstone; 1993.
  55. Hartwell, D., Colquitt, J., Loveman, E., et al. Clinical effectiveness and cost-effectiveness of immediate angioplasty for acute myocardial infarction: systematic review and economic evaluation. Health Technol Assess. 2005;9(17):iii–iv. [1–99].
  56. Baboolal, K., McEwan, P., Sondhi, S., et al, The cost of renal dialysis in a UK setting – a multicentre study. Nephrol Dial Transplant. 2008;23(6):1982–1989. 18174268
  57. Mason, J., Drummond, M., Reporting guidelines for economic studies. Health Econ. 1995;4(2):85–94. 7613600
  58. Raftery, J. Should NICE's threshold range for cost per QALY be raised? No. Br Med J. 2009;338:b185.
  59. Towse, A. Should NICE's threshold range for cost per QALY be raised? Yes. Br Med J. 2009;338:b181.
  60. Rawlins, M.D., Culyer, A.J. National Institute for Clinical Excellence and its value judgments. Br Med J. 2004;329(7459):224–227.
  61. Appleby, J., Devlin, N., Parkin, D. NICE's cost effectiveness threshold. Br Med J. 2007;335(7616):358–359.
  62. Martin, S., Rice, N., Smith, P., The link between health care spending and health outcomes: evidence from English programme budgeting data. Centre for Health Economics Research Paper 24. University of York; 2007.
  63. Stinnett, A.A., Mullahy, J. Net health benefits: a new framework for the analysis of uncertainty in cost-effectiveness analysis. Med Decis Making. 1998;18(2, Suppl):S68–S80.
  64. Pauker, S.G., Kassirer, J.P., The threshold approach to clinical decision making. N Engl J Med. 1980;302(20):1109–1117. 7366635
  65. Claxton, K., Sculpher, M., McCabe, C., et al, Probabilistic sensitivity analysis for NICE technology assessment: not an optional extra. Health Econ. 2005;14(4):339–347. 15736142
  66. Felli, J.C., Hazen, G.B., Sensitivity analysis and the expected value of perfect information. Med Decis Making. 1998;18(1):95–109. 9456214
  67. Claxton, K.P., Sculpher, M.J., Using value of information analysis to prioritise health research: some lessons from recent UK experience. Pharmacoeconomics. 2006;24(11):1055–1068. 17067191
  68. Ades, A.E., Lu, G., Claxton, K., Expected value of sample information calculations in medical decision modeling. Med Decis Making. 2004;24(2):207–227. 15090106
  69. Ubel, P.A., DeKay, M.L., Baron, J., et al, Cost-effectiveness analysis in a setting of budget constraints – is it equitable? N Engl J Med. 1996;334(18):1174–1177. 8602185
  70. Belton, V., Stewart, T.J. Multi criteria decision analysis: an integrated approach. Dordrecht: Kluwer Academic; 2002.
  71. Thokala, P. Multiple criteria decision analysis for health technology assessment. Decision Support Unit, NICE; 2011.
  72. National Institute for Health and Clinical Excellence (NICE). Appraising life-extending, end of life treatments. 2009. [accessed 05.01.12].