Urogynecology: Evidence-Based Clinical Practice 2nd ed.

5. Outcome Measures Used to Assess Response

Kate H. Moore


Department of Obstetrics & Gynaecology, St George Hospital, Kogarah, New South Wales, Australia


In the past, doctors recommended a particular treatment because, in their experience, most patients said that they were “better” after receiving it. In the last two decades, we have realized that this is not good enough. We need objective measures by which we can determine what percentage of patients are “cured” (normal) or at least have greater than 50% reduction in symptoms, after any given treatment.


In this century, outcome measures are going to become even more important, because there is not enough money to fund all health care. Doctors (and administrators) must assess whether one treatment is more effective than another, so that money can be spent on that which is most effective. This is loosely termed “health economics.”

In the 1980s, continence clinicians began to realize the importance of outcome measures. It was a time of great creation. Many different outcome measures were created but not necessarily fully “validated.” The process of validation involves the following steps:

·               Establish the validity of the test, that is, that it measures what it is supposed to measure. Validity has three subsets: content validity, construct validity, and criterion validity.

·               Establish the reliability of the test. For questionnaires, measure the internal consistency of the different parts of the test. For questionnaires and other physical tests, the reproducibility, or test–retest reliability, needs to be proven.

·               Establish the responsiveness to change of the test, before and after treatment.
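The reliability step can be illustrated with a short sketch. The chapter does not name specific statistics, so two commonly used ones are assumed here: Cronbach's alpha for internal consistency and a Pearson correlation between two administrations for test–retest reliability. The function names and all scores are hypothetical.

```python
from statistics import mean, variance

def cronbach_alpha(item_scores):
    """Internal consistency (assumed statistic: Cronbach's alpha).

    item_scores: one list per patient, holding that patient's score
    on each questionnaire item.
    """
    k = len(item_scores[0])                      # number of items
    columns = list(zip(*item_scores))            # scores grouped by item
    sum_item_var = sum(variance(col) for col in columns)
    total_var = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum_item_var / total_var)

def pearson_r(first, second):
    """Test-retest reliability (assumed statistic: Pearson r)."""
    mx, my = mean(first), mean(second)
    cov = sum((a - mx) * (b - my) for a, b in zip(first, second))
    sx = sum((a - mx) ** 2 for a in first) ** 0.5
    sy = sum((b - my) ** 2 for b in second) ** 0.5
    return cov / (sx * sy)

# Hypothetical data: four patients, three items each
scores = [[1, 2, 1], [3, 3, 4], [5, 4, 5], [2, 2, 3]]
alpha = cronbach_alpha(scores)

# Hypothetical total scores from the same patients two weeks apart
retest_r = pearson_r([4, 10, 14, 7], [5, 9, 15, 7])
```

Values of either statistic close to 1 suggest a consistent, reproducible instrument; where to place an acceptance threshold is a judgment left to the investigator.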

This chapter provides a brief overview of outcome measures that have been validated. Most are used in this book to describe the effectiveness of different treatments.

The Standardization Committee of the International Continence Society (ICS) is the main body that has governed terminology and outcome measures in the field of urinary incontinence since 1978 [8]. The urodynamic measures described in the previous chapter and the pelvic floor assessments (Oxford score and POPQ) described in Chap. 2 are also used as outcome measures. The tests described in this chapter do not require physical examination or invasive procedures. More recently, the International Urogynecological Association (IUGA) has produced a joint ICS and IUGA standardization of terminology document, which also fully covers the terminology for prolapse conditions. The IUGA website is http://www.iuga.org/.

The World Health Organization has recently acknowledged the global importance of incontinence by holding a regular International Consultation on Incontinence (ICI) (Abrams et al. [13]). The resulting publications also consider which treatments are the most effective, as judged by standardized outcome measures. Finally, the Cochrane Collaboration performs meta-analyses of randomized controlled trials in the field of incontinence, which also use the outcome measures described in this text.

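The ICS recommends that there should be five main groups, or “domains,” of outcome measures [9]. The sections of this chapter follow these domains:

·               Tests that measure the patient’s symptoms

·               Tests that quantify the patient’s symptoms

·               Tests that measure anatomical and functional observations by doctors

·               Quality of life measures

·               Socioeconomic evaluation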

Tests That Measure Patient’s Symptoms

The International Consultation on Incontinence Questionnaire Short Form (ICIQ-SF) was validated under the auspices of the ICI. It records incontinence symptoms and severity, with a simple quality of life question. The final ICIQ-SF comprises three scored items (Fig. 5.1, maximum score 21) and an unscored self-diagnostic item.
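As a minimal sketch, the total score is simply the sum of the three scored items. The per-item ranges used below (frequency 0–5, amount 0/2/4/6, interference with everyday life 0–10) are taken from the published ICIQ-SF form rather than from this chapter, and the function name is hypothetical.

```python
def iciq_sf_score(frequency, amount, interference):
    """Sum the three scored ICIQ-SF items (total range 0-21).

    frequency:    how often urine leaks, 0-5
    amount:       how much urine usually leaks, one of 0, 2, 4, 6
    interference: impact on everyday life, 0-10
    The self-diagnostic item is unscored and is not included.
    """
    if frequency not in range(0, 6):
        raise ValueError("frequency must be an integer from 0 to 5")
    if amount not in (0, 2, 4, 6):
        raise ValueError("amount must be 0, 2, 4, or 6")
    if interference not in range(0, 11):
        raise ValueError("interference must be an integer from 0 to 10")
    return frequency + amount + interference

# Hypothetical patient: leaks about once a day (3), a small amount (2),
# and rates the interference with everyday life as 6 out of 10
total = iciq_sf_score(frequency=3, amount=2, interference=6)
```

Comparing the total before and after treatment gives a simple measure of responsiveness to change.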


Figure 5.1

The ICIQ-SF questionnaire

The Wexner Score for Fecal Incontinence

This was originally a 20-point score concerning three types of incontinence with one question for impact upon lifestyle (italic bold in Table 5.1). Later, a score for wearing pads, taking constipating medication, or suffering from fecal urgency was added (ordinary typeface in Table 5.1). The Wexner score has been fully validated [15] and is used worldwide.

Table 5.1

Modified Wexner fecal incontinence score

Original items (italic bold):

·               Incontinent solid stool

·               Incontinent liquid stool

·               Incontinent to gas

·               Alters lifestyle

Later additions (ordinary typeface):

·               Need to wear pad/plug

·               Take constipating meds

·               Unable to defer 15 min


Tests That Quantify Patients’ Symptoms

Rather than giving the patient a questionnaire about her symptoms, the following tests actually measure symptoms such as stress leak, urge leak, frequency, or nocturia.

Bladder Chart

The bladder chart is a generic term used to indicate several types of records.

·               The micturition chart only asks patients to record times of voiding and incontinence episodes; only urine output is considered, and only roughly.

·               The frequency–volume chart (FVC) also asks patients to record their fluid intake and the volume they void and when they change pads, usually over 3 days.

·               The urinary diary includes the details of the FVC but also includes symptoms and activities at leakage episodes, including urgency, coughing, lifting, and others.

The micturition chart tells you nothing about patients who drink too much (>3 l/day) or too little (<1.5 l/day). Most clinicians use the frequency–volume chart. Although the urinary diary (Fig. 5.2) provides even more detail about the type of leakage, patients often object to the detail required, depending on how many days of charting you require.


Figure 5.2

A urinary diary from a patient with urge incontinence. The patient drinks little (1.065 l/day), has marked frequency (11 voids per day), nocturia ×2, and a small bladder capacity (average of 12 voids = 108 ml). Note that the diary gives the extra details that she leaks with urge, laughing, and running water. Note also the caffeine intake.

The FVC is a useful outcome measure. It tells you:

·               The number of leakage episodes per 24 h (in mild cases, convert to leaks per week by taking the daily average over the 3 days and multiplying by 7)

·               The number of voids per day (“frequency”)

·               The episodes of nocturia

·               Whether patients are fluid restricting for fear of urge leak or drinking more than 2 l/day (over-drinking)
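The quantities listed above can be derived from a chart with a few lines of code. This is a minimal sketch only: the `Entry` record, the `summarize_fvc` function, and the sample data are hypothetical, and the fluid thresholds default to the <1.5 l/day and >3 l/day figures quoted for the micturition chart, which a unit may wish to change.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    day: int                 # day of charting: 1, 2, 3, ...
    kind: str                # "void", "leak", or "drink"
    volume_ml: float = 0.0   # voided or drunk volume; 0 for leaks
    at_night: bool = False   # True if the patient woke from sleep to void

def summarize_fvc(entries, low_l=1.5, high_l=3.0):
    """Summarize a frequency-volume chart as per-day averages."""
    if not entries:
        raise ValueError("the chart has no entries")
    # Days are counted from the entries, so a day left blank is not counted
    n_days = len({e.day for e in entries})
    voids = [e for e in entries if e.kind == "void"]
    leaks = [e for e in entries if e.kind == "leak"]
    intake_ml = sum(e.volume_ml for e in entries if e.kind == "drink")
    intake_l_per_day = intake_ml / 1000 / n_days
    leaks_per_day = len(leaks) / n_days
    mean_voided = sum(e.volume_ml for e in voids) / len(voids) if voids else 0.0
    return {
        "voids_per_day": len(voids) / n_days,
        "nocturia_per_night": sum(e.at_night for e in voids) / n_days,
        "leaks_per_day": leaks_per_day,
        "leaks_per_week": leaks_per_day * 7,
        "mean_voided_ml": mean_voided,
        "intake_l_per_day": intake_l_per_day,
        "fluid_restricting": intake_l_per_day < low_l,
        "over_drinking": intake_l_per_day > high_l,
    }

# Hypothetical one-day chart
chart = [
    Entry(1, "drink", 300), Entry(1, "drink", 300), Entry(1, "drink", 300),
    Entry(1, "void", 100), Entry(1, "void", 100), Entry(1, "void", 100),
    Entry(1, "void", 100, at_night=True),
    Entry(1, "leak"),
]
summary = summarize_fvc(chart)
```

Reporting per-day averages keeps a 3-day chart at the first visit directly comparable with a 24-h chart at follow-up.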

The ideal duration of the FVC is controversial. The 7-day diary is the most sensitive and accurate, but patients dislike this burden, so compliance is poor. Because the first 3 days and the last 4 days of a 7-day test correlate well (r = 0.9), most clinicians use a 3-day FVC, at least at the first visit. The ICI Committee for Research Methodology found that in most cases, a single 24-h diary is sufficient. In our unit, we use a 3-day FVC for the first visit and a 24-h urinary diary for follow-up visits (see discussion of bladder training in Chap. 7).

The Pad Test

The One-Hour Pad Test

The 1-h pad test was initially the “industry standard” after its introduction in 1983 and its recommendation by the ICS in 1988. The test involves the following steps:

·               The patient attends with a comfortably full bladder.

·               She is given a pre-weighed continence pad.

·               She then drinks 500 ml of water over 15 min.

·               She then performs a standard series of activities to provoke leakage.

·               The voided volume is then measured, and the wet pad is re-weighed.

Unfortunately, the 1-h pad test fails to correlate with other measures of severity (poor criterion validity) and has poor sensitivity (up to 40% false-negative rate). For many years, the 1-h pad test was the only objective method that could be used to define mild (1–10 g leakage per 1 h), moderate (11–50 g/h), and severe (>50 g/h) incontinence; thus, it is used in many publications quoted in this text.

The 24-Hour Pad Test

Because of the problems with the 1-h test, the 24-h home pad test was developed in the late 1980s. This test involves the following.

·               Women are given a set of pre-weighed pads in sealed bags. The pads are worn at home for 24 h. Ordinary provocative activities are carried out.

·               They return pads in a sealed plastic bag, personally or by post, to be re-weighed.

There is no loss of accuracy by evaporation from the sealed plastic bag for durations of 72 h to 2 weeks. Thus, wet pads can be returned via post. The 24-h pad test is more sensitive than the 1-h test (10% false negative rate).

Normal ranges for the 24-h pad test have been controversial. Studies from 1989 to 1996 in small samples of women (n = 23–78), using simple kitchen scales, gave normal values of 3–8 g. This seems a lot of fluid on the underwear to be tolerated by an asymptomatic woman. However, this definition of “continent” (up to 8 g) is used throughout most studies in this book.

In 2003 [6], the normal values were redefined (n = 120) using scales accurate to 0.1 g. A median value of 0.3 g (95th centile 1.3 g) was obtained. The test correlates with the ICIQ-SF [7].

The definition of mild, moderate, and severe is important. Because conservative therapy is more likely to cure patients with mild incontinence and surgery is often offered to patients with severe leakage, a pad test should be able to define severity. In 2004, mild, moderate, and severe were characterized as 1.3–20, 21–74, and >75 g on the 24-h test [10].
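The arithmetic behind both pad tests can be sketched as follows: the leakage is the total weight gain of the pads, which is then placed in a severity band. The cut-offs are the ones quoted in this chapter for the 1-h and 24-h tests. How to treat weights that fall between the published whole-gram bands (for example 20.5 g) is an assumption: here they are assigned to the lower band. The function names and pad weights are hypothetical.

```python
# (lower limit of mild, upper limit of mild, upper limit of moderate), in grams
CUTOFFS_G = {
    "1h": (1.0, 10.0, 50.0),    # mild 1-10, moderate 11-50, severe >50
    "24h": (1.3, 20.0, 75.0),   # mild 1.3-20, moderate 21-74, severe >75
}

def pad_gain_g(dry_weights_g, wet_weights_g):
    """Total fluid gain across all pads worn during the test."""
    if len(dry_weights_g) != len(wet_weights_g):
        raise ValueError("each pad needs both a dry and a wet weight")
    return sum(wet - dry for dry, wet in zip(dry_weights_g, wet_weights_g))

def classify_leakage(gain_g, test="24h"):
    """Place a pad weight gain in the chapter's severity bands."""
    mild_min, mild_max, moderate_max = CUTOFFS_G[test]
    if gain_g < mild_min:
        return "below mild range"
    if gain_g <= mild_max:
        return "mild"
    if gain_g <= moderate_max:   # assumption: boundary values go to the lower band
        return "moderate"
    return "severe"

# Hypothetical 24-h test with two pads
gain = pad_gain_g([10.0, 10.2], [12.5, 30.2])
grade = classify_leakage(gain, test="24h")
```

On the 24-h test, a gain below 1.3 g lies within the 95th centile reported for asymptomatic women, so "below mild range" there corresponds to the redefined normal range.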

Tests That Measure Anatomical and Functional Observations by Doctors

·               The Oxford score for measuring pelvic floor muscle strength and the POPQ scoring system for measuring prolapse were shown in Chap. 2.

·               The standard urodynamic test measurements were shown in Chap. 4.

Quality of Life for Incontinence

A large array of quality of life (QOL) tests is available in urogynecology.

·               Generic tests that just measure overall QOL are often used to provide a comparison with other medical therapies (e.g., cardiac surgery). The SF-36 is the most common.

·               For incontinence, the two most common are the urogenital distress inventory (UDI) and the incontinence impact questionnaire (IIQ), from the United States. Both come in a short form and have been fully validated. The King’s health questionnaire is also often used (from the United Kingdom, available in many languages). For full review, see Abrams [13].

·               In order to perform a health economic analysis, a QOL test that scales from 1 to 100 needs to be used, such as the York questionnaire or the AQOL (for review, see Moore et al. [8]).

Quality of Life Tests for Prolapse and Sexual Function

Why Do We Need QOL Tests for Prolapse?

Up until the last decade, gynecologists judged the “success” of their surgery for prolapse on the anatomical findings after the operation. As will be seen in Chap. 10 on Prolapse Management, it now appears that we may have focused too heavily on getting a perfect “vaginal reconstruction,” without asking the patient whether she is still bothered by any prolapse symptoms.

Therefore, we needed to have outcome measures that could reliably define the severity of “prolapse bother symptoms.” These outcome measures are still undergoing refinement, but the most useful ones are listed here:

·               The pelvic floor distress inventory

·               The pelvic floor impact questionnaire

These questionnaires mainly focus upon the “bother” of urinary, bowel, vaginal prolapse, and sexual symptoms [4] but do not measure the frequency or severity of the symptoms. They were based on the UDI and the IIQ, which are long-standing validated measures of QOL impairment due to leakage (see earlier in this chapter).

In 2006, the International Consultation on Incontinence (ICI) expanded the ICIQ into a series of modules that could be added onto the ICIQ, to encompass prolapse symptoms [11].

A copy of the ICIQ-VS (the vaginal symptoms module) is available on the ICIQ website: www.iciq.net.

Why Do We Need QOL Tests for Sexual Function?

For many years, women over the age of 60–70 (the peak ages for prolapse) were not thought to desire sexual activity any more. Now that the median lifespan for women is 83 years in most Western countries, gynecologists have learned that many women (and/or their partners) do wish to continue normal sexual activity well into later life. Thus, we have become aware that simply restoring normal vaginal anatomy is not necessarily sufficient. If the surgery required to do this causes scarring and fibrosis with dyspareunia, then the woman, if asked, may not consider this operation a functional success.

As a result, QOL tests that measure sexual function have been developed:

·               The PISQ (Pelvic Organ Prolapse/Urinary Incontinence Sexual Questionnaire) was published in 2001 [12].

·               The GRISS test of sexual function was published in 1986 [13] and includes 28 questions about overall sexual function without any items that focus on prolapse symptoms, but it has been used as an outcome measure for prolapse surgery.

Socioeconomic Evaluation

·               A standard test for measuring the personal and treatment costs of incontinence is the Dowell Bryant incontinence cost index (DBICI), which is validated [5]. Another common test is the willingness-to-pay questionnaire, usually tailor-made for the particular condition.


In later chapters in this text, studies that employ validated outcome measures are emphasized, but in the absence of objective data, some studies presenting mainly subjective data are mentioned.



References

1. Abrams P, Khoury S, Wein A, editors. Incontinence: report of World Health Organization. Plymouth: Health Publications Ltd; 1998.

2. Abrams P, Cardozo L, Khoury S, Wein A, editors. Incontinence: report of World Health Organization. Plymouth: Health Publications Ltd; 2001.

3. Abrams P, Cardozo L, Khoury S, Wein A, editors. Incontinence: report of World Health Organization. Plymouth: Health Publications Ltd; 2005.

4. Barber MD, Kuchibhatla MN, Pieper CF, Bump RC. Psychometric evaluation of 2 condition-specific quality of life instruments for women with pelvic floor disorders. Am J Obstet Gynecol. 2001;185:1388–95.

5. Dowell CJ, Bryant CM, Moore KH, Simons AM. Calculation of the direct costs of urinary incontinence: the DBICI, a new test instrument. Br J Urol. 1999;83:596–606.

6. Karantanis E, O’Sullivan R, Moore KH. The 24-hour pad test in continent women and men: normal values and cyclical alterations. Br J Obstet Gynaecol. 2003;110:567–71.

7. Karantanis E, Fynes M, Moore KH, Stanton SL. Comparison of the ICIQ-SF and 24-hour pad test with other measures for evaluating the severity of urodynamic stress incontinence. Int Urogynecol J. 2004;15:111–6.

8. Moore KH, Hu TW, Subak L, Wagner TH, Deutekom M. Economics of urinary & faecal incontinence, and prolapse. In: Abrams P, Cardozo L, Khoury S, Wein A, editors. Report of World Health Organization. Plymouth: Health Publications Ltd; 2009. p. 1687–712.

9. Lose G, Fantl A, Victor A, Walter S, Wells T, Wyman J, et al. Outcome measures for research in adult women with symptoms of lower urinary tract dysfunction. Neurourol Urodyn. 1998;17:255–62.

10. O’Sullivan R, Karantanis E, Stevermuer TL, Allen W, Moore KH. Definition of mild, moderate and severe incontinence on the 24-hour pad test. Br J Obstet Gynaecol. 2004;111:859–62.

11. Price N, Jackson SR, Avery K, Brookes ST, Abrams P. Development and psychometric evaluation of the ICIQ vaginal symptom questionnaire: the ICIQ-VS. BJOG. 2006;113:700–12.

12. Rogers RG, Kammerer-Doak D, Villarreal A, Coates K, Qualls C. A new instrument to measure sexual function in women with urinary incontinence or pelvic organ prolapse. Am J Obstet Gynecol. 2001;188:552–8.

13. Rust J, Golombok S. The GRISS: a psychometric instrument for the assessment of sexual dysfunction. Arch Sex Behav. 1986;15:157–65.

14. Srikrishna S, Robinson D, Cardozo L, Gonzalez J. Can sex survive pelvic floor surgery? Int Urogynecol J. 2010;21:1313–9.

15. Vaizey C, Carapeti E, Cahill J, Kamm M. Prospective comparison of faecal incontinence grading systems. Gut. 1999;44:77–80.