Georgette Siparsky, PhD
Frank J. Accurso, MD
Laboratory tests provide valuable information necessary to evaluate a patient’s condition and to monitor recommended treatment. Chemistry and hematology test results are compared with those of healthy individuals or those undergoing similar therapeutic treatment to determine clinical status and progress. In the past, the term normal ranges carried some ambiguity: statistically, normal implied a specific (Gaussian or normal) distribution, and epidemiologically it implied the state of the majority, which is not necessarily the desirable or targeted population. This is most apparent in cholesterol levels, where values greater than 200 mg/dL are common, but not desirable. Use of the term reference range or reference interval is therefore recommended by the International Federation of Clinical Chemistry (IFCC) and the Clinical and Laboratory Standards Institute (CLSI, formerly the National Committee for Clinical Laboratory Standards [NCCLS]) to indicate that the values relate to a reference population and clinical condition.
Reference ranges are established for a specific age (eg, alpha-fetoprotein), sex, and sexual maturity (eg, luteinizing hormone and testosterone); they are also defined for a specific pharmacologic status (eg, taking cyclosporine), dietary restrictions (eg, phenylalanine), and stimulation protocol (eg, growth hormone). Similarly, diurnal variation is a factor (eg, cortisol), as is degree of obesity (eg, insulin). Some reference ranges are particularly meaningful when combined with other results (eg, parathyroid hormone and calcium), or when an entire set of analytes is evaluated (eg, lipid profile: triglyceride, cholesterol, high-density lipoprotein, and low-density lipoprotein).
Laboratory tests are becoming more specific and measure much lower concentrations than ever before. Therefore, reference ranges should reflect the analytical procedure as well as reagents and instrumentation used for a specific analysis. As test methodology continues to evolve, reference ranges are modified and updated.
CHALLENGES IN DETERMINING & INTERPRETING PEDIATRIC REFERENCE INTERVALS
The pediatric environment is particularly challenging for the determination of reference intervals since growth and developmental stages do not have a distinct and finite boundary by which test results can be tabulated. Reference ranges may overlap and, in many cases, complicate diagnosis and treatment. Collection and allocation of test results by age for the purpose of establishing a reference range is a convenient and manageable way to report them, but caution is needed in their interpretation and clinical correlation.
A particular difficulty lies in establishing reference ranges for analytes whose levels are changed under scheduled stimulation conditions. The common glucose tolerance test is such an example, but more complex endocrinology tests (eg, stimulation by clonidine and cosyntropin) require skill and extensive experience to interpret. Reference ranges for these serial tests are established over a long period of time and are not easily transferable between test methodologies. Changing analytical technologies add a new dimension to the challenges of establishing pediatric reference ranges.
Adeli K: Special Issue on Laboratory Reference Intervals. eJIFCC. September 2008. http://www.ifcc.org/PDF/190201200801.pdf.
CLSI C28-A3: Defining, Establishing, and Verifying Reference Intervals in the Clinical Laboratory: Approved Guideline, 3rd ed. http://www.clsi.org/source/orders.
GUIDELINES FOR USE OF DATA IN A REFERENCE RANGE STUDY
The College of American Pathologists provides guidelines for the adoption of reference ranges used in hospitals and commercial clinical laboratories. It recognizes the enormous task of establishing a laboratory’s own reference ranges, and recommends alternatives to the process. A laboratory may acquire reference ranges by:
1. Conducting its own study to evaluate a statistically significant number of “healthy” volunteers. It is a monumental task for a laboratory to develop its own pediatric reference ranges, because parental consent and procedural approval by review boards need to be addressed. The numerous age categories to be evaluated also add to the complexity and size of the study.
2. Adopting ranges established by the manufacturer of a particular analytical instrument. The laboratory must validate the data by analyzing a sample of 20 “reference” subjects (patients representing that specific population) to confirm that the adopted range is truly representative of that group.
3. Using reference data in the general medical literature and conferring with physicians to make sure the data agree with their clinical experience. A validation study is also recommended.
4. Analyzing hospital patient data. Laboratory test results from hospital patients have been used to compute reference ranges provided they fulfill stated clinical criteria. Patient records need to indicate that the patient’s specific medical condition does not influence the analyte whose reference range is being determined. For example, a child undergoing surgery for bone fracture repair is expected to have normal electrolytes and thyroid function, whereas a child examined for precocious puberty should not be included in a reference range study for luteinizing hormone. In the REALAB Project, Grossi et al chose the “inclusion criterion,” whereby tests that have a single laboratory measurement were used; justification was based on the fact that persons with repeated testing had a higher probability of being diseased and their results should be excluded from the study.
Statistically, the sample size of a hospital patient study should be considerably larger than that of a healthy group. A study from a healthy population may require 20 subjects to be statistically significant, whereas a hospital population should evaluate a minimum of 120 patients. At Children’s Hospital, Colorado, the free thyroxine reference range was recently established using 1480 clinic and hospital patient results in this manner.
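The validation step described for adopted ranges (analyzing a sample of 20 reference subjects) can be sketched as a simple count of results falling outside the adopted limits. This is a minimal sketch, not the College of American Pathologists' or CLSI's procedure: the text does not state an acceptance criterion, so the threshold `max_outside=2` (no more than 2 of 20 results outside the range) is an assumption for illustration, and the function name and example values are hypothetical. A laboratory would take its actual criterion from the guideline it follows.

```python
def validate_adopted_range(results, lower, upper, max_outside=2):
    """Check an adopted reference range against local reference subjects.

    results      -- test results from the local reference subjects (eg, 20)
    lower, upper -- limits of the adopted reference range
    max_outside  -- ASSUMED acceptance threshold, not taken from the text

    Returns (accepted, number_outside).
    """
    outside = sum(1 for r in results if r < lower or r > upper)
    return outside <= max_outside, outside


if __name__ == "__main__":
    # Hypothetical results from 20 reference subjects (arbitrary units).
    sample = [1.2, 1.4, 1.1, 0.9, 1.6, 1.3, 1.0, 1.5, 1.2, 1.1,
              1.4, 1.3, 0.8, 1.7, 1.2, 1.0, 1.9, 1.3, 1.1, 1.4]
    accepted, n_out = validate_adopted_range(sample, lower=0.9, upper=1.8)
    print(f"outside range: {n_out} of {len(sample)}; accepted: {accepted}")
```

If the check fails, the adopted range would not be considered representative of the local population under this assumed rule, and a larger study or a different source of reference data would be needed.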
Biological Variation Database Reference List: http://www.westgard.com/biological-variation-database-reference-list.htm.
College of American Pathologists publication: http://www.cap.org/apps/docs/laboratory_accreditation/sample_checklist.pdf.
Grossi E et al: The REALAB Project: a new method for the formulation of reference intervals based on current data. Clin Chem 2005;51:1232 [PMID: 15919879].
Schnabl K, Chan MK, Gong Y, Adeli K: Closing the gap on paediatric reference intervals: the CALIPER initiative. Clin Biochem Rev 2008 Aug;29(3):89–96 [PMID: 19107221].
STATISTICAL COMPUTATION OF REFERENCE INTERVALS
The establishment of reference intervals is based on a statistical distribution of test results obtained from a representative population. The CLSI recommendation for data collection and statistical analysis provides guidelines for managing the data. Clinicians need not be able to reproduce the calculation; it is far more important that they understand the benefits and limitations of the statistical approaches described and evaluate patient results with these limitations in mind.
In reviewing the statistics, 95% of all results will be inherently included in the reference range. Note that 5% of that population will have “abnormal” results, when in fact they are “healthy” and an integral part of the reference group study. Similarly, an equivalent 5% of the “ill” population will have laboratory results within the reference range. These are inherent features of the statistical computation. Taking that analysis one step further, the probability of a healthy patient having a test result within a calculated reference range is
P = .95
When multiple tests or panels of tests are used, the combined probability of all the test results falling in their respective reference ranges drops dramatically. For example, the probability of all results from 10 tests in the complete metabolic panel being in the reference range is
P = (.95)^10 ≈ .60
Therefore, about 40% of healthy patients will have at least one test result in the panel that is outside the reference range.
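The panel arithmetic above can be reproduced in a few lines. This is a minimal sketch that, like the calculation in the text, assumes the tests are statistically independent (in practice, many analytes in a panel are correlated, so the figures are approximate). The function names are illustrative only.

```python
def prob_all_within(n_tests, p_single=0.95):
    """Probability that a healthy patient has all n_tests results inside
    their reference ranges, ASSUMING the tests are independent."""
    return p_single ** n_tests


def prob_any_outside(n_tests, p_single=0.95):
    """Probability that at least one of n_tests results is outside its range."""
    return 1.0 - prob_all_within(n_tests, p_single)


if __name__ == "__main__":
    for n in (1, 5, 10, 20):
        print(f"{n:2d} tests: all within = {prob_all_within(n):.2f}, "
              f"at least one outside = {prob_any_outside(n):.2f}")
```

For a 10-test panel this gives roughly 0.60 for all results within range and 0.40 for at least one result outside range.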
A. Parametric Method of Computation
The parametric method of establishing reference intervals is simple, though not always representative, since it is based on the assumption that the data have a Gaussian distribution. A mean (x̄) and standard deviation (s) are calculated; test results of 95% of that specific population will fall within the mean ±1.96s, as shown in Figure 46–1.
Figure 46–1. Gaussian distribution and parametric calculation using x̄ ± 1.96s to define the range.
Where the distribution is not Gaussian, a mathematical manipulation of the values (eg, plotting the log of the value, instead of the value itself) may give a Gaussian distribution. The mean and standard deviation are then converted back to give a usable reference range.
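The parametric calculation, including the optional log transformation, might be sketched as follows. This is an illustration under stated assumptions rather than a validated laboratory procedure: the function name is hypothetical, the sample standard deviation is used for s, and the natural log is chosen as the transformation. The limits are computed on the transformed scale and then converted back.

```python
import math
import statistics


def parametric_interval(values, log_transform=False):
    """Return (lower, upper) limits as mean +/- 1.96 * s.

    If log_transform is True, the limits are computed on the natural-log
    scale and converted back with exp(); all values must then be > 0.
    """
    data = [math.log(v) for v in values] if log_transform else list(values)
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1 denominator)
    lower, upper = mean - 1.96 * s, mean + 1.96 * s
    if log_transform:
        lower, upper = math.exp(lower), math.exp(upper)
    return lower, upper


if __name__ == "__main__":
    # Hypothetical right-skewed results (arbitrary units).
    results = [0.8, 1.0, 1.1, 1.2, 1.3, 1.5, 1.8, 2.4, 3.5, 5.0]
    print("untransformed:  ", parametric_interval(results))
    print("log-transformed:", parametric_interval(results, log_transform=True))
```

With skewed data, the untransformed limits can extend to implausible (even negative) values, which is one reason a transformation or the nonparametric method is preferred in that situation.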
B. Nonparametric Method of Computation
The nonparametric method of establishing reference ranges is currently recommended by CLSI, since it makes no assumption about the shape of the distribution: the reference limits are set at the 2.5th and 97.5th percentiles, and the values in the extreme 2.5% at each end of the data are treated as outliers. Where these limits fall depends on the skew of the curve, and so the computation accommodates a non-Gaussian distribution. A histogram depicting the non-Gaussian distribution of data from a free thyroxine reference range study conducted at Children’s Hospital in Denver is shown in Figure 46–2.
Figure 46–2. Histogram of free thyroxine (FT4) using clinic and hospital patients at Children’s Hospital in Denver.
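A nonparametric interval of this kind can be sketched with the Python standard library by taking the 2.5th and 97.5th percentiles of the sorted results. This is a minimal illustration, not the CLSI procedure itself: the function name is hypothetical, and the choice of `statistics.quantiles` with `method="exclusive"` (which interpolates at rank p × (n + 1)) is an assumption about how the percentiles are estimated.

```python
import statistics


def nonparametric_interval(values):
    """Return (lower, upper) as the 2.5th and 97.5th percentiles.

    statistics.quantiles with n=40 yields cut points at 2.5%, 5%, ..., 97.5%;
    the first and last cut points are the reference limits.
    """
    cuts = statistics.quantiles(values, n=40, method="exclusive")
    return cuts[0], cuts[-1]


if __name__ == "__main__":
    # Hypothetical data set of 119 results, 1 through 119 (arbitrary units).
    results = list(range(1, 120))
    print(nonparametric_interval(results))
```

Because the limits are read from ranked data, the estimate at each tail rests on only a handful of observations, which is why larger sample sizes are needed for this method than for a verification study.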
Ichihara K, Boyd JC; IFCC Committee on Reference Intervals and Decision Limits (C-RIDL): An appraisal of statistical procedures used in derivation of reference intervals. Clin Chem Lab Med 2010 Nov;48(11):1537–1551 [PMID: 21062226].
WHY REFERENCE INTERVALS VARY
Recent modifications to reference ranges are due to the introduction of new and improved analytical procedures, advanced automated instrumentation, and standardization of reagents and reference materials. Reference ranges are also affected by preanalytical variations that can occur during sample collection, processing, and storage.
Preanalytical variations of biological origin can occur when specimens are drawn in the morning versus in the evening, or from hospitalized recumbent patients versus ambulatory outpatients. Variations also may be caused by metabolic and hemodynamic factors. Preanalytical factors may be a product of the socioeconomic environment or ethnic background (eg, genetic or dietary).
Analytical variations are caused by differences in analytical measurements and depend on the analytical tools as well as an inherent variability in obtaining a quantitative value. Furthermore, scientific progress is constantly introducing new reagents, instruments, and improved testing procedures to the clinical laboratory, and each tool adds an element of variability between tests.
1. Antigen-antibody reactions have revolutionized clinical chemistry, but have also added a degree of variability because biologically derived reagents have different specificity and sensitivity. In addition to the targeted analyte, some of its metabolites are also measured, and these may or may not be biologically active.
2. Reference materials continue to be reviewed and evaluated by organizations such as the World Health Organization and the National Institute of Standards and Technology. A new standard was recently established for troponin I, and reference ranges were modified to reflect the new standard.
3. Analytical instrumentation with advanced electronics and robotics has improved accuracy of results and increased throughput. However, these advances have added an element of variability between instruments from different manufacturers.
4. Analytical detection methods have also made big strides as they have expanded from simple ultraviolet-visible spectrophotometry to fluorescence, nephelometry, radioimmunoassay, and chemiluminescence. For example, the third-generation thyroid-stimulating hormone assays can now measure concentrations as low as 0.001 μIU/mL; the reference range was recently modified to reflect the improved sensitivity of the assay.
Jung B, Adeli K: Clinical laboratory reference intervals in pediatrics: The CALIPER initiative. Clin Biochem 2009 Nov;42(16–17):1589–1595 [Epub 2009 Jul 7] [PMID: 19591815].
Nakamoto J, Fuqua JS: Laboratory assays in pediatric endocrinology: common aspects. Pediatr Endocrinol Rev 2007 Oct;5(Suppl 1): 539–554 [PMID: 18167465].
Yüksel B et al. J Clin Res Pediatr Endocrinol. 2011 Jun;3(2):84–88 [Epub 2011 Jun 8] [PMID: 21750637]. Free PMC Article at http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3119446/?tool=pubmed.
SENSITIVITY & SPECIFICITY
Despite its statistical derivation, a reference interval does not necessarily provide a finite and clear-cut guideline as to whether a patient has a disease. There will always be a segment of the population with test values that fall within the reference interval, but clinical manifestations that indicate disease is present. Similarly, a segment of the population will have test values outside the reference interval, but no clinical signs of disease. The ability of a test and corresponding reference interval to detect individuals with disease is defined by the diagnostic sensitivity of the test. Similarly, the ability of a test to detect individuals without disease is described by the diagnostic specificity. These characteristics are governed by the analytical quality of the test as well as the numerical parameters (reference interval) that define the presence of disease. The tolerance level for the desired sensitivity and specificity of a test requires significant input from clinicians.
Generally, specificity increases as sensitivity decreases. A typical reference range frequency distribution, shown in Figure 46–3 (solid line), provides information on the test results of a number of healthy subjects, or individuals without the disease. A second curve (dashed line) shows test results for individuals with the disease. As with most tests, there is an overlap area. A patient with a test result of 1 is likely healthy, and the result indicates a true negative (TN) for the presence of disease. A patient with a test result of 9 is likely to have the disease and the test result is a true positive (TP). There is a small, but significant, population with a test result of 2–5 in whom the test is not 100% conclusive. A statistical analysis may determine the most likely cutoff for healthy individuals, but the clinically acceptable cutoff depends on the test as well as clinical correlation. Where the treatment is aggressive and has serious side effects, a clinician may choose to err on the side of caution and withhold treatment for anyone with a test value of less than 6.
Figure 46–3. Frequency distribution of test results for patients with and without disease. FN, false negative; FP, false positive; TN, true negative; TP, true positive.
If cutoff values for the reference interval are such that a test result indicates that a healthy patient has the disease, the result is a false positive (FP). Conversely, if a test result indicates that a patient is well when in fact he or she has the disease, the result is a false negative (FN). To define the ability of the test and reference interval to identify a disease state, the diagnostic sensitivity and specificity are measured.
Diagnostic sensitivity = TP/(TP + FN)
Diagnostic specificity = TN/(TN + FP)
In the example shown in Figure 46–3, a reference interval of 0.5–3 will provide more TP results and minimize FN results, at the cost of more FP results. Alternatively, a reference interval of 0.5–4 will increase the rate of FN while reducing FP. Thus, an increase in sensitivity leads to a decrease in specificity. A medical condition that requires aggressive treatment may necessitate a test and corresponding reference interval with a high sensitivity, which is a measure of the TP rate. This is accomplished at the expense of lowering specificity.
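The effect of moving the upper reference limit can be illustrated with a small calculation using the sensitivity and specificity formulas above. This is a sketch under stated assumptions: the two lists of results are invented for illustration (they are not the data behind Figure 46–3), any result above the upper limit is called positive, and the function names are hypothetical.

```python
def count_outcomes(healthy, diseased, upper_limit):
    """Count (TP, FN, TN, FP) when results above upper_limit are called positive."""
    tp = sum(1 for v in diseased if v > upper_limit)
    fn = len(diseased) - tp
    fp = sum(1 for v in healthy if v > upper_limit)
    tn = len(healthy) - fp
    return tp, fn, tn, fp


def diagnostic_sensitivity(tp, fn):
    return tp / (tp + fn)


def diagnostic_specificity(tn, fp):
    return tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical test results; the two groups overlap between 3 and 5.
    healthy = [1, 1, 2, 2, 2, 3, 3, 4, 4, 5]
    diseased = [3, 4, 5, 5, 6, 7, 8, 8, 9, 9]
    for limit in (3, 4):
        tp, fn, tn, fp = count_outcomes(healthy, diseased, limit)
        print(f"upper limit {limit}: TP={tp} FN={fn} TN={tn} FP={fp} "
              f"sensitivity={diagnostic_sensitivity(tp, fn):.2f} "
              f"specificity={diagnostic_specificity(tn, fp):.2f}")
```

With these invented data, raising the upper limit from 3 to 4 lowers sensitivity from 0.90 to 0.80 (one more FN) and raises specificity from 0.70 to 0.90 (two fewer FP), which is the trade-off the text describes.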
A reference interval is a statistical representation of test results from a finite population, but it is by no means inclusive of every member of the group. It is merely one component in the measure of a patient’s status to be viewed in relation to the sensitivity and specificity of the test.
Clinical laboratory data are frequently interpreted using a dynamic approach in which each value is compared with another. In this case, time series analysis of relative values may provide more important information than comparison against a strict reference range. Examples include the evaluation of enzyme activity over time and various stimulation studies for endocrine assessment. Drug monitoring also is a dynamic process.
PEDIATRIC REFERENCE INTERVALS
The establishment of reference ranges is a complex process. Assumptions are made in the management of data processes, regardless of whether “healthy” individuals or hospital patients are used for the accumulation of test results. Analytical instrument manufacturers conduct large studies to identify reference intervals for each specific analyte, and pediatric values have always been the most challenging. Some of the manufacturers’ recommended reference intervals are listed in Table 46–1 for general chemistry, Table 46–2 for endocrinology, and Table 46–3 for hematology. The interpretation of chemistry and hematology laboratory results is equally complex and forms a continuous challenge for physicians and the medical community at large.
Table 46–1 General chemistry.
Table 46–2 Endocrine chemistry.
Table 46–3 Hematology.