Rodak's Hematology: Clinical Principles and Applications, 5th Ed.

CHAPTER 5. Quality assurance in hematology and hemostasis testing

George A. Fritsma*


Statistical Significance and Expressions of Central Tendency and Dispersion

Statistical Significance

Computing the Mean

Determining the Median

Determining the Mode

Computing the Variance

Computing the Standard Deviation

Computing the Coefficient of Variation

Validation of a New or Modified Assay


Statistical Tests



Lower Limit of Detection

Analytical Specificity

Levels of Laboratory Assay Approval

Documentation and Reliability

Lot-to-Lot Comparisons

Development of the Reference Interval and Therapeutic Range

Internal Quality Control


Moving Average of the Red Blood Cell Indices

Delta Checks

External Quality Assessment

Assessing Diagnostic Efficacy

The Effects of Population Incidence and Odds Ratios on Diagnostic Efficacy

Receiver Operating Characteristic Curve

Assay Feasibility

Laboratory Staff Competence

Proficiency Systems

Continuing Education

Quality Assurance Plan: Preanalytical and Postanalytical

Agencies That Address Hematology and Hemostasis Quality


After completion of this chapter, the reader will be able to:

1. Describe the procedures to validate and document a new or modified laboratory assay.

2. Compare a new or modified assay to a reference using statistical tests to establish accuracy.

3. Select appropriate statistical tests for a given application and interpret the results.

4. Define and compute precision using standard deviation and coefficient of variation.

5. Determine assay linearity using graphical representations and transformations.

6. Discuss analytical limits and analytical sensitivity and specificity.

7. Explain Food and Drug Administration clearance levels for laboratory assays.

8. Compute a reference interval and a therapeutic range for a new or modified assay.

9. Interpret internal quality control using controls and moving averages.

10. Explain the benefits of participation in periodic external quality assessment.

11. Measure and describe assay clinical efficacy.

12. Interpret relative and absolute risk ratios.

13. Interpret receiver operating characteristic curves.

14. Describe methods to enhance and assess laboratory staff competence.

15. Describe a quality assurance plan to control for preanalytical and postanalytical variables.

16. List the agencies that regulate hematology and hemostasis quality.


After studying the material in this chapter, the reader should be able to respond to the following case study:

On an 8:00 am assay run, the results for three levels of a preserved hemoglobin control specimen are 2 g/dL higher than the upper limit of the target interval. The medical laboratory scientist reviews delta check data on the hemoglobin results for the last 10 patients in sequence and notices that the results are consistently 1.8 to 2.2 g/dL higher than results generated the previous day.

1. What do you call the type of error detected in this case?

2. Can you continue to analyze patient specimens as long as you subtract 2 g/dL from the results?

3. What aspect of the assay should you first investigate in troubleshooting this problem?

In medical laboratory science, quality implies the ability to provide accurate, reproducible assay results that offer clinically useful information.1 Because physicians base 70% of their clinical decision making on laboratory results, assay results must be reliable.2 Reliability requires vigilance and effort on the part of all laboratory staff members.3 An experienced medical laboratory scientist who is a quality assurance and quality control specialist often directs this effort.

Of the terms quality control and quality assurance, quality assurance is the broader concept, encompassing preanalytical, analytical, and postanalytical variables (Box 5-1). Quality control processes are employed to document assay validity, accuracy, and precision, including external quality assessment, reference interval preparation and publication, and lot-to-lot validation.4

BOX 5-1

Examples of Components of Quality Assurance

1. Preanalytical variables: selection of assay relative to patient need; implementation of assay selection; patient identification and preparation; specimen collection equipment and technique; specimen transport, preparation, and storage; monitoring of specimen condition

2. Analytical variables: laboratory staff competence; assay and instrument selection; assay validation, including linearity, accuracy, precision, analytical limits, and specificity; internal quality control; external quality assessment

3. Postanalytical variables: accuracy in transcription and filing of results; content and format of laboratory report, narrative report; reference interval and therapeutic range; timeliness in communicating critical values; patient and physician satisfaction; turnaround time; cost analysis; physician application of laboratory results

Preanalytical variables, listed further in this chapter in Table 5-11, are addressed in Chapter 3, which discusses blood specimen collection, and in Chapter 42, which includes a section on coagulation specimen management. Postanalytical variables are discussed briefly at the end of this chapter and are listed in Table 5-12. Quality assurance further encompasses laboratory assay utilization and physician test ordering patterns, nicknamed “pre-pre” analytical variables, and the appropriate application of laboratory assay results, sometimes called “post-post” analytical variables. There exists a combined 17% medical error rate associated with the pre-pre and post-post analytical phases of laboratory test utilization and application, prompting laboratory directors and scientists to develop clinical query systems that guide clinicians in laboratory assay selection.5 Clinical query systems are enhanced by reflex assay algorithms developed in collaboration with the affiliated medical and surgical staff.5 Equally important, a system of narrative reports that accompany and augment numerical laboratory assay output, authored by medical laboratory scientists and directors, is designed to assist physicians with case management.6 A discussion of pre-pre and post-post analytical variables extends beyond the scope of this textbook but may be found in the references listed at the end of this chapter.7,8 Quality control relies on the initial computation of central tendency and dispersion.

TABLE 5-11

Preanalytical Quality Assurance Components and the Laboratory’s Responsibility

| Preanalytical Component | Laboratory Staff Responsibility |
|---|---|
| Test orders | Conduct continuous utilization reviews to ensure that physician laboratory orders are comprehensive and appropriate to patient condition. Inform physician about laboratory test availability and ways to avoid unnecessary orders. Reduce unnecessary repeat testing. |
| Test request forms | Are requisition forms legible? Can you confirm patient identity? Are physician orders promptly and correctly interpreted and transcribed? Is adequate diagnostic, treatment, and patient preparation information provided to assist the laboratory in appropriately testing and interpreting results? |
| Stat orders and timeliness | Do turnaround time expectations match clinical necessity and ensure that stat orders are reserved for medical emergencies? |
| Specimen collection | Is the patient correctly identified, prepared, and available for specimen collection? Is fasting and therapy status appropriate for laboratory testing? Is the tourniquet correctly applied and released at the right time? Are venipuncture sites appropriately cleansed? Are timed specimens collected at the specified intervals? Are specimen tubes collected in the specified order? Are additive tubes properly mixed? Are specimen tubes labeled correctly? |
| Specimen transport | Are specimens delivered intact, sealed, and in a timely manner? Are they maintained at the correct temperature? |
| Specimen management | Are specimens centrifuged correctly? Are tests begun within specified time frames? Are specimens and aliquots stored properly? Are coagulation specimens platelet-poor when specified? |

TABLE 5-12

Postanalytical Quality Assurance Components and the Laboratory’s Responsibility

Postanalytical Component

Laboratory Staff Responsibility

Publication of reports

Are results accurately transcribed into the information system? Are they reviewed for errors by additional laboratory staff? If autoverification is in effect, are the correct parameters employed? 

Do reports provide reference intervals? Do they flag abnormal results? Are result narratives appended when necessary? Does the laboratory staff conduct in-service education to support test result interpretation? 

Are critical values provided to nursing and physician staff? Are verbal reports confirmed with feedback? Are anomalous findings resolved?


Are turnaround times recorded and analyzed? Are laboratory reports being posted to patient charts in a timely fashion?

Patient satisfaction

Does the institution include laboratory care in patient surveys? Was specimen collection explained to the patient?

Statistical significance and expressions of central tendency and dispersion

Statistical significance

When applying a statistical test such as the Student’s t-test of means or the analysis of variance (ANOVA), the statistician begins with a null hypothesis. The null hypothesis states that there is no difference between or among the means or variances of the populations being compared.9 The alternative (research) hypothesis is the logical opposite of the null hypothesis.10 For example, the null hypothesis may state there is no difference between t-test means, but the alternative hypothesis states that the null hypothesis is rejected and a statistical difference between the means does indeed exist (Table 5-2). In medical research, the null and alternative hypotheses may go unstated but are always implied.11


TABLE 5-2

Typical Student’s t-Test Results

[Table values not reproduced. Each example compares a Reference Method column with a New or Modified Method column. Rows: Null hypothesis, Selected p-value, Computed p-value, Two-tail t, Critical two-tail t.]

Example 1. Null hypothesis: x̄ of control = x̄ of test. Null hypothesis is supported, means are not unequal.

Example 2. Null hypothesis: x̄ of control = x̄ of test. Null hypothesis is rejected, means are unequal.

In the first example of a two-tail t-test, the difference in the means does not rise to statistical significance, the computed p-value exceeds the selected p-value, and the means are “not unequal.” In the second example, the selected p-value exceeds the computed p-value, and the means are unequal.

x̄, mean; SD, standard deviation; n, number of data points.

The power of a statistical test is defined as its ability to reject the null hypothesis when the null hypothesis is indeed false. Power is expressed as a probability on a scale from 0 to 1: it equals 1 − β, where β is the probability of failing to detect an effect that truly exists. Power should not be confused with the p-value used to judge significance. Power is determined by the sample size (number of data points, n), the design of the research study, and the study’s ability to control for extraneous variables.

The conventional levels for significance, or for rejecting the null hypothesis, are p ≤ 0.05 (5%) or p ≤ 0.01 (1%). In the former instance, a difference at least as large as the one observed would arise by chance alone no more than 5% of the time if the null hypothesis were true; in the latter instance, the more stringent limit is 1%. Often researchers combine the statistical results, and thus the powers of several studies, to compute a common p-value, a process called meta-analysis.

The term significant has a specific meaning based on the p-value, and it should not be generalized to imply practical or clinical significance.12 A statistical test result may indicate a statistically significant difference that is based on a selected study condition and power, but the difference may not possess practical importance because the clinical difference may be inconsequential. Experience and clinical judgment help when analyzing data, as does asking the question “Will this result generate a change in the prognosis, diagnosis, or treatment plan?”

Computing the mean

The arithmetic mean (x̄), or average, of a data series is the sum (Σ) of the individual data values divided by the number (n) of data points. A data series that represents a single population, for instance, a series of prothrombin time results from a population, is called a sample. Often clinical laboratory personnel apply the terms sample and specimen interchangeably. A specimen may be defined as a single data point within a data series (sample). In the clinical laboratory, of course, a specimen often means a tube of blood or a piece of tissue collected from a patient, which provides a single data point. The sum of the deviations of the data values above the mean is equal to the sum of the deviations of the data values below the mean; however, the actual numbers of points above and below the mean are not necessarily equal.13 The mean is a standard expression of central tendency employed in most scientific applications; however, it is profoundly affected by outliers and is unreliable in a skewed population. This is the formula for computing the arithmetic mean:

$$\bar{x} = \frac{\sum x}{n}$$

where x̄ = mean; Σx = sum of data point values; and n = number of data points

The geometric mean is the nth root of the product of n individual data points and is used to compute means of unlike data series. The geometric mean of the prothrombin time reference interval is used to compute the prothrombin time international normalized ratio (Chapter 43). This is the formula for computing the geometric mean:

$$\text{Geometric mean of } n \text{ instances of } a = \sqrt[n]{a_1 \times a_2 \times \cdots \times a_n}$$
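The two means described above can be compared with a short sketch. The function names and the prothrombin time values are hypothetical and serve only to illustrate the arithmetic; they are not taken from the chapter.

```python
import math

def arithmetic_mean(values):
    """Sum of the data values divided by the number of data points."""
    return sum(values) / len(values)

def geometric_mean(values):
    """nth root of the product of n positive data points."""
    return math.prod(values) ** (1.0 / len(values))

# Hypothetical prothrombin time results in seconds (illustrative only).
pt_seconds = [12.0, 12.5, 13.0, 12.8, 12.2]

print("arithmetic mean:", round(arithmetic_mean(pt_seconds), 2))
print("geometric mean: ", round(geometric_mean(pt_seconds), 2))
```

For positive data the geometric mean never exceeds the arithmetic mean, and the two are close when the dispersion is small, as it is here.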


Determining the median

The median is the data point that separates the upper half from the lower half of a data series (sample). To find the median, arrange the data series in numerical order and select the central data point. If the data series has an even number of data points, the median is the mean of the two central points. The median is a robust expression of central tendency in a skewed distribution because it minimizes the effects of outliers.

Determining the mode

The mode of a data series (sample) is the data point that appears most often in the sample. The mode is not a true measure of central tendency because there is often more than one mode in a data series. For instance, a typical white blood cell histogram may be trimodal, with three modes, one each for lymphocytes, monocytes, and neutrophils. Conversely, in a Gaussian, “normal” sample, in which the data points are distributed symmetrically, the mean, median, and mode coincide at a single data point.
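As an illustration of the median and mode, the following sketch uses Python's standard `statistics` module on a hypothetical, deliberately skewed series. The data values and variable names are assumptions made for the example.

```python
import statistics

# Hypothetical skewed sample: one outlier at the high end.
wbc_counts = [4.5, 5.1, 5.1, 6.0, 6.3, 7.2, 25.0]

mean_value = statistics.mean(wbc_counts)       # pulled upward by the outlier
median_value = statistics.median(wbc_counts)   # central point of the sorted series
modes = statistics.multimode(wbc_counts)       # every most-frequent value

# With an even number of points the median is the mean of the two central points.
even_median = statistics.median([1, 2, 3, 4])

print(mean_value, median_value, modes, even_median)
```

`multimode` is used rather than `mode` because, as noted above, a data series may have more than one mode.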

Computing the variance

Variance (σ²) expresses the deviation of each data point from its expected value, usually the mean of the data series (sample) from which the data point is drawn. The difference between each data point and the mean is squared, the squared differences are summed, and the sum of squares is divided by n – 1. Variance is expressed in the units of the variable squared as follows:

$$\sigma^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$

where σ² = sample variance; xᵢ = value of each data point; x̄ = mean; and n = number of data points

Computing the standard deviation

Standard deviation (SD), a commonly used measure of dispersion, is the square root of the variance and expresses the typical distance of the data points in a sample from the sample mean (x̄). The larger the SD of a sample, the greater the deviation from the mean. In clinical analyses, the SD of an assay is an expression of its quality based on its inherent dispersion or variability. The formula for SD is:

$$SD = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$

where SD = standard deviation; xᵢ = each data point value; x̄ = mean; and n = number of observations

FIGURE 5-1 The values generated by repeated assays of an analyte are graphed as a frequency distribution. Incremental values are plotted on the horizontal (x) scale and number of times each value was obtained (frequency) on the vertical (y) scale. In this example, the values are normally distributed about their mean (symmetric, Gaussian distribution). Results from an accurate assay generate a mean that closely duplicates the reference target value. Results from a precise assay generate small dispersion about the mean, whereas imprecision is reflected in a broad curve. The ideal assay is both accurate and precise.

SD states the confidence, or degree of random error, for statistical conclusions. Dispersion is typically expressed as x̄ ± 2 SD or the 95.5% confidence interval (CI). Data points that are over 2 SD from the mean are outside the 95.5% CI and may be considered abnormal. The dispersion of data points within x̄ ± 2 SD is considered the expression of random or chance variation. Typically, x̄ ± 2 SD is used to establish biological reference intervals (normal ranges), provided the frequency of the data points is “Gaussian,” or normally distributed, meaning symmetrically distributed about the mean.
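The variance, SD, and mean ± 2 SD limits can be computed step by step as in the sketch below. The control results are hypothetical, and the function names are chosen for this example only.

```python
import math

def sample_variance(values):
    """Sum of squared deviations from the mean divided by n - 1."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / (len(values) - 1)

def sample_sd(values):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(values))

def two_sd_limits(values):
    """Return (mean - 2 SD, mean + 2 SD)."""
    mean = sum(values) / len(values)
    sd = sample_sd(values)
    return mean - 2 * sd, mean + 2 * sd

# Hypothetical repeated hemoglobin control results in g/dL.
control = [14.0, 14.2, 13.8, 14.1, 13.9]

low, high = two_sd_limits(control)
print("variance:", round(sample_variance(control), 4))
print("SD:", round(sample_sd(control), 4))
print("mean +/- 2 SD:", round(low, 2), "to", round(high, 2))
```

Five data points are far too few for a real reference interval; the small series is used only so the arithmetic can be followed by hand.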

Computing the coefficient of variation

The coefficient of variation (CV) is the normalized expression of the SD, ordinarily articulated as a percentage (CV%). CV% is the most commonly used measure of dispersion in laboratory medicine. CV% is expressed without units (except percentage), thus making it possible to compare data sets that use different units. The computation formula is:

$$CV\% = \frac{SD}{\bar{x}} \times 100$$

where CV% = coefficient of variation expressed as a percentage; SD = standard deviation; and x̄ = mean
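A brief sketch of the CV% computation follows, assuming the sample SD (n – 1 denominator) defined earlier in the chapter. The two data series are hypothetical and are given in different units to show why a unitless measure is convenient.

```python
import statistics

def cv_percent(values):
    """Coefficient of variation: 100 x SD / mean, using the sample SD (n - 1)."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)

# Hypothetical control results in different units.
hemoglobin_g_dl = [14.0, 14.2, 13.8, 14.1, 13.9]
platelets_k_ul = [250, 262, 241, 255, 247]

print("hemoglobin CV%:", round(cv_percent(hemoglobin_g_dl), 2))
print("platelet CV%:  ", round(cv_percent(platelets_k_ul), 2))
```

Because both the SD and the mean scale with the unit of measurement, converting a series to another unit leaves its CV% unchanged.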

Validation of a new or modified assay

All new laboratory assays and all assay modifications require validation.14 Validation comprises procedures to determine accuracy, specificity, precision, limits, and linearity.15 The results of these procedures are faithfully recorded and made available to on-site assessors upon request.16


Accuracy is the measure of agreement between an assay value and the theoretical “true value” of its analyte (Figure 5-1). Some statisticians prefer to define accuracy as the magnitude of error separating the assay result from the true value. By comparison, precision is the expression of reproducibility or dispersion about the mean, often expressed as SD or CV%, as discussed in a subsequent section, “Precision.” Accuracy is easy to define but difficult to establish and maintain; precision is relatively easy to measure and maintain.

For many analytes, laboratory professionals employ primary standards to establish accuracy. A primary standard is a material of known, fixed composition that is prepared in pure form, often by determining its mass on an analytical balance. The practitioner dissolves the weighed standard in an aqueous solution, prepares suitable dilutions, calculates the anticipated concentration for each dilution, and assigns the calculated concentrations to assay outcomes. For example, he or she may obtain pure glucose, weigh 100 mg, dilute it in 100 mL of buffer, and assay an aliquot of the solution using photometry. The resulting absorbance would then be assigned the value of 100 mg/dL. The practitioner may repeat this procedure using a series of four additional glucose solutions at 20, 60, 120, and 160 mg/dL to produce a five-point standard curve. The curve may be reassayed several times to generate means for each concentration. Standard curve generation is automated; however, laboratory professionals retain the ability to generate curves manually when necessary. The assay is then employed on human serum or plasma, with absorbance compared with the standard curve to generate a result. The matrix of a primary standard need not match the matrix of the patient specimen; the standard may be dissolved in an aqueous buffer, whereas the test specimen may be human serum or plasma.
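A manual five-point standard curve of the kind described above might be sketched as follows. The concentrations are those named in the text; the absorbance readings, the straight-line fit, and the function names are assumptions made for illustration, and a real photometric assay may not be linear across its whole range.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Standard concentrations from the text (mg/dL); absorbances are hypothetical.
concentrations = [20, 60, 100, 120, 160]
absorbances = [0.10, 0.30, 0.50, 0.60, 0.80]

slope, intercept = fit_line(concentrations, absorbances)

def concentration_from_absorbance(absorbance):
    """Read a specimen's concentration off the fitted standard curve."""
    return (absorbance - intercept) / slope

# A hypothetical patient specimen with an absorbance of 0.45.
print(round(concentration_from_absorbance(0.45), 1), "mg/dL")
```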

To save time and resources, the laboratory professional may employ a secondary standard, perhaps purchased, that the vendor has previously calibrated to a primary standard. The secondary standard may be a preserved plasma preparation at a certified known concentration. The laboratory professional merely thaws or reconstitutes the secondary standard and incorporates it into the test series during validation or revalidation. Manufacturers often match secondary standards as closely as possible to the test specimen’s matrix, for instance, serum to serum, plasma to plasma, and whole blood to whole blood. Primary and secondary standards are seldom assayed during routine patient specimen testing, only during calibration or when the assay tends to be unstable.

Regrettably, in hematology and hemostasis, where the analytes are often cell suspensions or enzymes, there are just a handful of primary standards: cyanmethemoglobin, fibrinogen, factor VIII, protein C, antithrombin, and von Willebrand factor.17 For scores of analytes, the hematology and hemostasis practitioner relies on calibrators. Calibrators for hematology may be preserved human blood cell suspensions, sometimes supplemented with microlatex particles or nucleated avian red blood cells (RBCs) as surrogates for hard-to-preserve human white blood cells (WBCs). In hemostasis, calibrators may be frozen or lyophilized plasma from healthy human donors. For most of these analytes, it is impossible to prepare “weighed-in” standards; instead, calibrators are assayed using reference methods (“gold standards”) at selected independent expert laboratories. For instance, a vendor may prepare a 1000-L lot of preserved human blood cell suspension, assay for the desired analytes within their laboratory (“in-house”), and send aliquots to five laboratories that employ well-controlled reference instrumentation and methods. The vendor obtains blood count results from all five, averages the results, compares them to their in-house values, and publishes the averages as the reference calibrator values. The vendor then distributes sealed aliquots to customer laboratories with the calibrator values published in the accompanying package inserts. Vendors often market calibrators in sets of three or five, spanning the range of assay linearity or the range of potential clinical results.

As with secondary standards, vendors attempt to match their calibrators as closely as possible to the physical properties of the test specimen. For instance, human preserved blood used to calibrate complete blood count (CBC) analytes generated by an automated cell counter is prepared to closely match the matrix of fresh anticoagulated patient blood specimens, despite the need for preservatives, refrigeration, and sealed packaging. Vendors submit themselves to rigorous certification by governmental or voluntary standards agencies in an effort to verify and maintain the validity of their products.

The laboratory practitioner assays the calibration material using the new or modified assay and compares results with the vendor’s published results. When new results parallel published results within a selected range, for example ±10%, the results are recorded and the assay is validated for accuracy. If they fail to match, the new assay is modified or a new reference interval and therapeutic range are prepared.
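The ±10% comparison can be expressed as a simple check. This is a minimal sketch: the tolerance is the example figure from the text, while the calibrator values and the function name are hypothetical.

```python
def within_tolerance(measured, published, tolerance_percent=10.0):
    """True when measured deviates from published by no more than the tolerance."""
    deviation_percent = abs(measured - published) / published * 100.0
    return deviation_percent <= tolerance_percent

# Hypothetical calibrator set: (published value, value from the new assay).
calibrators = [(50.0, 52.0), (100.0, 97.0), (150.0, 168.0)]

results = [within_tolerance(new, published) for published, new in calibrators]
print(results)
print("validated for accuracy:", all(results))
```

In this example the third calibrator deviates by 12% from its published value, so the set as a whole would not pass.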

Medical laboratory professionals may employ locally collected fresh blood from a healthy donor as a calibrator; however, the process for validation and certification is laborious, so few attempt it. The selected specimens are assayed using reference instrumentation and methods, calibration values are assigned, and the new or modified assay is calibrated (adjusted) from these values.

New or modified assays may also be compared to reference methods. A reference method may be a previously employed, well-controlled assay or an assay currently being used by a neighboring laboratory. Several statistics are available to compare results of the new or modified assay to a reference method, including the Student’s t-test, analysis of variance (ANOVA), linear regression, Pearson correlation coefficient, and the Bland-Altman plot.

Statistical tests

Comparing means of two data series using Student’s t-test

The Student’s t-test compares the sample mean of a new or modified assay to the sample mean of a reference assay.18 In a standard t-test the operator assumes that population distributions are normal (Gaussian), the SDs are equal, and the assays are independent. Often laboratory professionals use the more robust paired t-test in which the new and reference assays are performed using specimens from the same donors (aliquots). Laboratory scientists also choose between the one-tailed and two-tailed t-test, depending on whether the population being sampled has one (high or low) versus two (high and low) critical values. For instance, when assaying plasma for glucose, clinicians are concerned about both elevated and reduced glucose values, so the laboratory professional would use the two-tailed t-test; however, when assaying for bilirubin, clinical concern focuses only on elevated bilirubin concentrations, so the laboratory professional would apply the more robust one-tailed t-test in method comparison studies.

The laboratory professional generates t-test data by entering the paired data sets side by side into columns of a spreadsheet and applying an automated t-test formula. The program generates the number, mean, and variance for each data series (sample; n₁, n₂; x̄₁, x̄₂; σ²₁, σ²₂), and the “degrees of freedom” (df) for the test: df = n₁ + n₂ – 2 for the standard two-sample test (a paired t-test uses df = n – 1, where n is the number of pairs). The operator selects the appropriate critical value (p), often p ≤ 0.05. The computer uses df and p to compare the computed t-value to the standard table of critical t-values (Table 5-1) and reports the corresponding p-value. If the p-value is less than 0.05, the means of the two samples are unequal and the result is “statistically significant.” For instance, if the two assays are each performed on aliquots from 10 donors, the df is 18. If the computed t-value is 2.10 or higher, the means are unequal at p ≤ 0.05. Applying a stricter critical value, the computed t-value would have to be 2.88 or higher for the means to be considered unequal at p ≤ 0.01. Table 5-2 illustrates a typical t-test result.


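TABLE 5-1

Excerpt from the Standard Table of Critical t-Values for a Two-tailed Test

[Table values not reproduced. Column headings: p ≤ 0.05 and p ≤ 0.01, repeated across three column groups. Only one row is recoverable: t = 4.30 at p ≤ 0.05; t = 9.93 at p ≤ 0.01.]
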

The operator matches the degrees of freedom (df) with the test and looks up the critical t-value at the selected level of significance, often p ≤ 0.05 or p ≤ 0.01. If the computed t-value exceeds the critical value from the table, the null hypothesis is rejected and the difference between method means is statistically significant.
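The spreadsheet computation can be sketched by hand. The example below assumes the standard two-sample form with equal SDs, for which df = n₁ + n₂ – 2, and uses the critical value of 2.10 quoted in the text for df = 18 at p ≤ 0.05. The data and the function name are hypothetical.

```python
import math
import statistics

def pooled_t(sample_1, sample_2):
    """Two-sample t-statistic assuming equal SDs; returns (t, df)."""
    n1, n2 = len(sample_1), len(sample_2)
    var1 = statistics.variance(sample_1)   # sample variance, n - 1 denominator
    var2 = statistics.variance(sample_2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (statistics.mean(sample_1) - statistics.mean(sample_2)) / std_err
    return t, df

# Hypothetical results from 10 aliquots per method.
reference = [12.1, 12.4, 11.9, 12.6, 12.0, 12.3, 12.2, 12.5, 11.8, 12.2]
new_assay = [12.3, 12.5, 12.0, 12.8, 12.1, 12.2, 12.4, 12.6, 12.0, 12.3]

t_value, df = pooled_t(reference, new_assay)
CRITICAL_T_DF18 = 2.10   # two-tailed, p <= 0.05, as quoted in the text
significant = abs(t_value) >= CRITICAL_T_DF18

print("t =", round(t_value, 3), "df =", df, "significant:", significant)
```

The absolute value of t is compared with the critical value because the test is two-tailed. Here the computed t does not reach 2.10, so the null hypothesis is supported and the means are "not unequal."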

When the t-test indicates that two sample means are not unequal, the operator may choose to implement the new or modified assay. However, statistically, if two means are adjudged “not unequal,” that is not the same as “equal.” To increase the power of the validation, the scientist often chooses to compute the Pearson correlation coefficient and to apply linear regression and the Bland-Altman plot.

Using analysis of variance to compare variances of more than two data series

ANOVA accomplishes the same outcomes as the t-test; however, ANOVA may be applied to more than two series of data. A laboratory scientist may often choose to compare two, three, or four new methods with a reference method. The ANOVA computes the variance among the group means (between-group σ²), the pooled variance inside the groups (within-group σ²), and an F-statistic (similar to the t-statistic), which is the ratio of the between-group mean square to the within-group mean square. The F-statistic is compared with a table of critical F-statistic values to determine significance analogous to the t-statistic as shown in Table 5-1.

Like the Student’s t-test, ANOVA is available on computer spreadsheets. The operator enters the data in one column per data series (group or sample) and applies the ANOVA formula. The test reports between-groups df (the number of groups – 1) and the within-group df (the total number of observations minus the number of groups, equivalent to n – 1 summed over the groups). The test also computes and reports the sum of squares within and between groups, the total sum of squares, the mean squares within and between groups, and the F-statistic. Spreadsheet programs compare the F-statistic to the table of critical F-values and report the p-value, which the operator then compares to the selected p-value limit to determine significance. Table 5-3 illustrates typical ANOVA results.


TABLE 5-3

Typical ANOVA Results

[Table values not reproduced. Recoverable group labels: Test 1, Test 2. Variation source rows: Between groups, Within groups. Final column: Critical F.]

In this example, the F-value does not exceed the critical F-value from the table of critical values; there is no significant difference among the groups.

This test fails significance at p ≤ 0.05; null hypothesis is supported

ANOVA = analysis of variance; df = degrees of freedom; SS = sum of squares; MS = mean squares.
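The quantities a spreadsheet reports for a one-way ANOVA (SS, df, MS, and F) can be sketched as below. The three data series are hypothetical and deliberately simple so the result can be checked by hand; the function name is chosen for this example. The critical F-value is not computed here and would still be looked up in a table for the reported df.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of data series."""
    all_values = [x for group in groups for x in group]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(group) / len(group) for group in groups]

    # Sum of squares between groups: spread of the group means about the grand mean.
    ss_between = sum(
        len(group) * (mean - grand_mean) ** 2
        for group, mean in zip(groups, group_means)
    )
    # Sum of squares within groups: spread of each value about its own group mean.
    ss_within = sum(
        (x - mean) ** 2
        for group, mean in zip(groups, group_means)
        for x in group
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_statistic = (ss_between / df_between) / (ss_within / df_within)
    return f_statistic, df_between, df_within

# Hypothetical results: a reference method and two new methods.
reference = [1.0, 2.0, 3.0]
test_1 = [2.0, 3.0, 4.0]
test_2 = [3.0, 4.0, 5.0]

f_value, df_b, df_w = one_way_anova([reference, test_1, test_2])
print("F =", f_value, "df between =", df_b, "df within =", df_w)
```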

Comparing data series using the Pearson correlation coefficient

In addition to comparing means by t-test or ANOVA, the laboratory professional compares a series of paired data to learn if the data points agree with adequate precision throughout the measurable range. For instance, to validate a new prothrombin time reagent, the scientist or technician assembles 100 plasma aliquots, assays them in sequence using first the new and then the current reagents, and records the paired data points in two spreadsheet columns, x and y. He or she then applies the spreadsheet’s Pearson product-moment correlation coefficient formula to generate a Pearson r, or correlation coefficient, which may range from –1.0 to +1.0. The spreadsheet uses this formula:

$$r = \frac{\sum xy - n\,\bar{x}\,\bar{y}}{(n - 1)\,SD_X\,SD_Y}$$

where r = Pearson correlation coefficient; Σxy = sum of the products of each pair of scores; n = number of values; x̄ = mean of the X distribution; ȳ = mean of the Y distribution; SD_X = SD of the X distribution; and SD_Y = SD of the Y distribution.

Pearson r-values from 0 to +1.0 represent positive correlation; 1.0 equals perfect correlation. Laboratorians employ the Pearson formula to assess the range of values from two like assays or to compare assay results to previously assigned standard or calibrator results. Most operators set an r-value of 0.975 (or r2-value of 0.95) as the lower limit of correlation; any Pearson r-value less than 0.975 is considered invalid because it indicates unacceptable variability of the reference method.
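The Pearson computation may be sketched as follows, using the sample SD (n – 1 denominator) defined earlier in the chapter and the 0.975 limit cited above. The paired prothrombin time values are hypothetical, and only six pairs are shown for brevity.

```python
import math

def pearson_r(xs, ys):
    """Pearson r from the sum of products, the means, and the sample SDs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    return (sum_xy - n * mean_x * mean_y) / ((n - 1) * sd_x * sd_y)

# Hypothetical paired prothrombin times in seconds (current vs new reagent).
current_pt = [11.0, 12.5, 14.0, 18.0, 25.0, 32.0]
new_pt = [11.4, 12.3, 14.5, 18.6, 24.2, 33.1]

r_value = pearson_r(current_pt, new_pt)
print("r =", round(r_value, 4), "meets 0.975 limit:", r_value >= 0.975)
```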

When the Pearson r-value result indicates the adequacy of the range of values, the linear regression equation described in the next section is applied. Linear regression finds the line that best predicts y from x, but its equation does not account for dispersion. The Pearson correlation coefficient formula quantifies how x and y vary together while documenting dispersion.

Comparing data series using linear regression

If a series of five calibrators is used, results may be analyzed by the following regression equation:

$$y = a + bx, \qquad b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2}, \qquad a = \frac{\sum Y - b\sum X}{n}$$

where x and y are the variables; a = the intercept between the regression line and the y-axis; b = the slope of the regression line; n = number of values or elements; X = first calibrator value; Y = second calibrator value; ΣXY = sum of the product of first and second calibrator values; ΣX = sum of first calibrator values; ΣY = sum of second calibrator values; and ΣX² = sum of squared first calibrator values.

Perfect correlation generates a slope of 1 and a y intercept of 0. Local policy based on total error calculation establishes limits for slope and y intercept; for example, many laboratory directors reject a slope of less than 0.9 or an intercept of more than 10% above or below zero (Figure 5-2).


FIGURE 5-2 Linear regression comparing four new assays with a reference method. Assay 1 is a perfect match. The y intercept of assay 2 is 5.0, which illustrates a constant systematic error, or bias. The slope (b) value for assay 3 is 0.89, which illustrates a proportional systematic error. New assay 4 has both bias and proportional error.

Slope measures proportional systematic error; the higher the analyte value, the greater the deviation from the line of identity. Proportional errors are caused by malfunctioning instrument components or a failure of some part of the testing process. The magnitude of the error increases with the concentration or activity of the analyte. An assay with proportional error may be invalid.

Intercept measures constant systematic error (or bias, in laboratory vernacular), a constant difference between the new and reference assay regardless of assay result magnitude. A laboratory director may choose to adopt a new assay with systematic error but must modify the published reference interval.

Regression analysis gains sufficient power when 40 or more patient specimens are tested using both the new and reference assay in place of or in addition to calibrators. Data may be entered into a spreadsheet program that offers an automatic regression equation.
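As a complement to a spreadsheet, the slope and intercept can be computed directly from the sums named in the variable list (n, ΣX, ΣY, ΣXY, ΣX²). The sketch below assumes ordinary least squares; the function names and data are hypothetical, and only the 0.9 slope limit is taken from the text. The intercept limit is left to local policy.

```python
def least_squares_line(x, y):
    """Return (slope b, intercept a) of y = a + b*x by ordinary least squares."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired results")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    intercept = (sum_y - slope * sum_x) / n
    return slope, intercept

def slope_acceptable(slope, min_slope=0.9):
    """Example policy from the text: reject a slope of less than 0.9."""
    return slope >= min_slope

# Hypothetical reference results and two new assays
reference = [10.0, 20.0, 30.0, 40.0, 50.0]
biased = [15.0, 25.0, 35.0, 45.0, 55.0]        # constant error of +5
proportional = [0.89 * v for v in reference]   # proportional error

slope_biased, intercept_biased = least_squares_line(reference, biased)
slope_prop, intercept_prop = least_squares_line(reference, proportional)
print(slope_biased, intercept_biased)   # slope 1, intercept 5: bias only
print(slope_prop, intercept_prop)       # slope near 0.89, intercept near 0
```

The first hypothetical assay mirrors the constant-bias case (intercept of 5.0) and the second mirrors the proportional-error case (slope of 0.89) in Figure 5-2.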

Comparing data series using the Bland-Altman difference plot

Linear regression and the Pearson correlation coefficient are essential tests of accuracy and performance; however, both are influenced by dispersion. The Bland-Altman difference plot, also known as the Tukey mean-difference plot, provides a graphical representation of agreement between two assays.19 Similar to the t-test, Pearson correlation, and linear regression, paired assay results are tabled in automated spreadsheet columns. This formula is applied:

$$\bar{x}_i = \frac{x_{1,i} + x_{2,i}}{2}, \qquad d_i = x_{1,i} - x_{2,i}$$

where x₁ and x₂ are the paired results for specimen i from the two assays, x̄ is their mean, and d is their signed difference.


The operator computes the mean of the assays and the signed difference between the values. A chart is prepared with the means plotted on the x-axis and the numerical or % differences on the y-axis. Difference limits are provided, characteristically at the mean difference ± 2 SD (Figure 5-3). The plot visually illustrates the magnitude of the differences. In a normal distribution, 95.5% of the values are expected to fall within the limits; when more than 5% of data points fall outside the limits, the assay is rejected.
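The Bland-Altman computation can be sketched as follows. This is an illustration under stated assumptions: the difference is taken as new minus current (the sign convention is a local choice), the SD is the sample standard deviation, and the PTT values are hypothetical.

```python
from statistics import mean, stdev

def bland_altman(current, new):
    """Return plot coordinates, limits, and the 5% acceptance decision."""
    means = [(c + n) / 2 for c, n in zip(current, new)]   # x-axis values
    diffs = [n - c for c, n in zip(current, new)]         # y-axis values
    d_bar = mean(diffs)
    sd = stdev(diffs)
    lower, upper = d_bar - 2 * sd, d_bar + 2 * sd
    outside = sum(1 for d in diffs if d < lower or d > upper)
    return {
        "means": means,
        "differences": diffs,
        "mean_difference": d_bar,
        "limits": (lower, upper),
        "outside": outside,
        # Rejected when more than 5% of points fall outside the limits
        "accepted": outside / len(diffs) <= 0.05,
    }

# Hypothetical PTT results in seconds from a current and a new reagent
current_ptt = [30.0, 32.0, 34.0, 36.0, 38.0, 40.0]
new_ptt = [31.0, 33.0, 33.0, 37.0, 39.0, 39.0]
result = bland_altman(current_ptt, new_ptt)
print(result["mean_difference"], result["limits"], result["accepted"])
```

The returned means and differences are the coordinates that would be charted; a real study would use far more than six pairs.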





FIGURE 5-3 Bland-Altman data and plot. A, Excerpt illustrating 10 of the 31 PTT result data points from a current and new reagent; B, preliminary correlation data; C, identity line; and D, Bland-Altman plot based on data given above. The differences between the results from the two assays are within the acceptance limits; the new PTT assay is validated. PTT, partial thromboplastin time.


Unlike the determination of accuracy, assessment of precision (dispersion, reproducibility, variation, random error) is a simple validation effort, because it merely requires performing a series of assays on a single specimen or lot of reference material (Figure 5-1).20 Precision studies always assess both within-day and day-to-day variation about the mean and are usually performed on three to five calibration specimens, although they may also be performed using a series of patient specimens. To calculate within-day precision, the scientist assays a single specimen at least 20 consecutive times using one reagent batch and one instrument run. For day-to-day precision, 20 assays are required on at least 10 runs on 10 consecutive days. The day-to-day precision study employs the same source specimen and instrument but separate aliquots. Day-to-day precision accounts for the effects of different operators, reagents, and environmental conditions such as temperature and barometric pressure.

The collected data from within-day and day-to-day sequences are reduced by formula to the mean and a measure of dispersion such as standard deviation or, most often, coefficient of variation in percent (CV%), as described in “Statistical Significance and Expressions of Central Tendency and Dispersion”. The CV% documents the degree of dispersion or random error generated by an assay, a function of assay stability.

CV% limits are established locally. For analytes based on primary standards, the within-run CV% limit may be 5% or less, and for hematology and hemostasis assays, 10% or less; however, the day-to-day run CV% limits may be as high as 30%, depending on the stability and complexity of the assay. Although accuracy, linearity, and analytical specificity are just as important, medical laboratory professionals often equate the quality of an assay with its CV%. The best assay, of course, is one that combines the smallest CV% with the greatest accuracy.
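A precision study reduces to a short calculation. The sketch below assumes the usual definition, CV% = 100 × SD / mean, with the sample standard deviation; the replicate values and the 10% limit passed in are illustrative only, and a real within-day study would use at least 20 replicates.

```python
from statistics import mean, stdev

def cv_percent(results):
    """Coefficient of variation in percent for replicate assay results."""
    return 100 * stdev(results) / mean(results)

def precision_acceptable(results, cv_limit):
    """Compare the observed CV% with a locally established limit."""
    return cv_percent(results) <= cv_limit

# Hypothetical replicate results from a single specimen (shortened for display)
replicates = [98.0, 100.0, 102.0, 100.0, 100.0]
cv = cv_percent(replicates)
print(f"CV% = {cv:.2f}, acceptable at 10%: {precision_acceptable(replicates, 10.0)}")
```

The same function serves for within-day and day-to-day data; only the way the replicates are collected and the limit applied differ.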

Precision for visual light microscopy leukocyte differential counts on stained blood films is immeasurably broad, particularly for low-frequency eosinophils and basophils.21 Most visual differential counts are performed by reviewing 100 to 200 leukocytes. Although impractical, it would take differential counts of 800 or more leukocytes to improve precision to measurable though inadequate levels. Automated differential counts generated by profiling instruments, however, provide CV% levels of 5% or lower because these instruments count thousands of cells.


Linearity is the ability to generate results proportional to the calculated concentration or activity of the analyte.22 The laboratory professional dilutes a high-end calibrator or elevated patient specimen to produce at least five dilutions spanning the full range of the assay. The dilutions are then assayed. Computed and assayed results for each dilution are paired and plotted on a linear graph, with computed values on the x-axis and assayed values on the y-axis. The line is inspected visually for nonlinearity at the highest and lowest dilutions (Figure 5-4). The acceptable range of linearity is established just above the low value and below the high value at which linearity loss is evident. Although formulas exist for computing the limits of linearity, visual inspection is an accepted practice. Nonlinear graphs may be transformed using semilog or log-log graphs when necessary.


FIGURE 5-4 Determination of linearity. At least five dilutions of standard or calibrator are prepared. Dilutions must span the expected range of analyte measurements (analytical measurement range, AMR). The concentration of the analyte for each of the five dilutions is calculated. The assayed values are plotted on the y scale and the computed concentrations on the x scale. The linear range is selected by visual inspection, including the dilutions for which assayed values vary in a linear manner. In this example, the limits of linearity are 56.1% to 146.2%. Assay results that fall outside these limits are inaccurate.

Patient specimens with results above the linear range must be diluted and reassayed. Results from diluted specimens that fall within the linear range are valid; however, they must be multiplied by the dilution factor (reciprocal of the dilution) to produce the final concentration. Laboratory personnel never report results that fall below or above the linear limits, because accuracy is compromised in the nonlinear regions of the assay. Lower limits are especially important when counting platelets or assaying coagulation factors. For example, the difference between 1% and 3% coagulation factor VIII activity affects treatment options and the potential for predicting coagulation factor inhibitor formation. Likewise, the difference between a platelet count of 10,000/μL and 5000/μL affects the decision to treat with platelet concentrate.
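The reporting rule for diluted specimens can be expressed as a small helper. This is a sketch with hypothetical names; the limits of 56.1 and 146.2 are borrowed from the Figure 5-4 example, and the dilution is expressed as a fraction (0.25 for a 1:4 dilution).

```python
def reportable_result(assayed, dilution, low_limit, high_limit):
    """Return the final value, or None when the assayed value is outside
    the linear range and must not be reported.

    dilution is the fraction of specimen in the mixture (1.0 = undiluted).
    """
    if not (low_limit <= assayed <= high_limit):
        return None
    dilution_factor = 1 / dilution   # reciprocal of the dilution
    return assayed * dilution_factor

# Hypothetical specimen: undiluted result is above the linear range,
# so it is diluted 1:4 and reassayed.
first_pass = reportable_result(150.0, 1.0, 56.1, 146.2)
second_pass = reportable_result(80.0, 0.25, 56.1, 146.2)
print(first_pass, second_pass)
```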

Lower limit of detection

Linearity studies are coupled with the lower limit of detection study.23 A “zero calibrator,” or blank, is assayed 20 times, and the mean and standard deviation are computed from the results. The lower limit of detection is determined from the computed standard deviation. The limit is three standard deviations above the mean of blank assay results. This cutoff prevents false-positive results generated by low-end assay interference, commonly called noise. The manufacturer or distributor typically performs limit assays and provides the results on the package insert; however, local policies often require that results of the manufacturer’s limit studies be confirmed.
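The lower limit of detection follows directly from the blank replicates. The sketch below assumes the sample standard deviation and uses 20 hypothetical blank readings in arbitrary units.

```python
from statistics import mean, stdev

def lower_limit_of_detection(blank_results):
    """Mean of the blank results plus three standard deviations."""
    return mean(blank_results) + 3 * stdev(blank_results)

# Hypothetical readings from 20 assays of a zero calibrator
blanks = [0.1, 0.3] * 10
lod = lower_limit_of_detection(blanks)
print(f"Lower limit of detection: {lod:.3f}")
```

Results below this value cannot be distinguished from noise and would not be reported as detected.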

Analytical specificity

Analytical specificity is the ability of an assay to distinguish the analyte of interest from anticipated interfering substances within the specimen matrix. The laboratory practitioner “spikes” identical specimens with potential interfering substances and measures the effects of each upon the assay results. Analytical specificity is determined by the manufacturer and need not be confirmed at the local laboratory unless there is suspicion of interference from a particular substance not assayed by the manufacturer. Manufacturer specificity data are transferred from the package insert to the laboratory validation report.

Levels of laboratory assay approval

The U.S. Food and Drug Administration (FDA) categorizes assays as FDA-cleared assays, analyte-specific reagent (ASR) assays, research use only (RUO) assays, and laboratory-developed (home-brew) assays. FDA-cleared assays are approved for the detection of specific analytes and should not be used for non-cleared (off-label) applications. ASRs that are bundled with other ASRs or other general reagents and labeled with an intended use are subject to premarket review requirements. RUO kits may be used on a trial basis, but the institution or a clinical trial typically bears their expense, not the third-party payer or the patient. The FDA monitors in-house assays by regulating the main components, which include, but are not limited to, ASRs, locally prepared reagents, and laboratory instrumentation. Details are given in Table 5-4.


TABLE 5-4 Categories of Laboratory Assay Approval by the United States Food and Drug Administration

| Assay Category | Comment |
|---|---|
| FDA-cleared assay | The local institution may use package insert data for linearity and specificity but must establish accuracy and precision. |
| Analyte-specific reagent | Manufacturer may provide individual reagents but not in kit form, and may not provide package insert validation data. Local institution must perform all validation steps. |
| Research use only | Local institution must perform all validation steps. Research use only assays are intended for clinical trials, and carriers are not required to pay. |
| Laboratory-developed assay | Assays are devised locally; the Food and Drug Administration evaluates them using criteria developed for FDA-cleared assay kits. |

Documentation and reliability

Validation is recorded on standard forms available from commercial sources, for example, Data Innovations LLC EP Evaluator®. Validation records are stored for 7 to 10 years in readily accessible databases and made available to laboratory assessors upon request.24

Precision and accuracy records document assay reliability over specified periods. The recalibration interval may be once every 6 months or in accordance with operators’ manual recommendations. Recalibration is necessary whenever reagent lots are updated unless the laboratory professional can demonstrate that the reportable range is unchanged using lot-to-lot comparison. When control results demonstrate a shift or consistently fall outside action limits, or when an instrument is repaired, the laboratory professional repeats the validation procedure.25

Regularly scheduled validity rechecks, lot-to-lot comparisons, instrument preventive maintenance, staff competence, and scheduled performance of internal quality control and external quality assessment procedures ensure continued reliability and enhance the value of a laboratory assay to the patient and physician.

Lot-to-lot comparisons

Laboratory managers reach agreements with vendors to sequester kit and reagent lots, thereby ensuring infrequent lot changes, optimistically no more than once a year.26 The new reagent lot must arrive approximately a month before the laboratory runs out of the old lot so that lot-to-lot comparisons may be completed and differences resolved, if necessary. The scientist uses control or patient specimens and prepares a range of analyte dilutions, typically five, spanning the limits of linearity. If the reagent kits provide controls, these are also included, and all are assayed using the old and new reagent lots. Results are charted as illustrated in Table 5-5.


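TABLE 5-5 Example of a Lot-to-Lot Comparison

| Specimen | Old Lot Value | New Lot Value | % Difference |
|---|---|---|---|
| Low middle value | | | |
| High middle value | | | |
| Old kit control 1 | | | |
| Old kit control 2 | | | |
| New kit control 1 | | | |
| New kit control 2 | | | |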




Negative % difference indicates the new lot value is below the old lot (reference) value. The new lot is rejected because the low and high middle value results differ by more than 10%.

Action limits vary by laboratory, but many managers reject the new lot when more than one specimen (data point pair) generates a variance greater than 10% or when all variances are positive or negative. In the latter case, the new lot may be rejected or it may be necessary to use the lot but develop a new reference interval and therapeutic range.
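The acceptance rule in the preceding paragraph can be sketched as a short routine. The assumptions are that % difference is computed as (new − old) / old × 100, so that a negative value means the new lot reads lower, and that the 10% limit is the one in use locally; all names and values are hypothetical.

```python
def percent_difference(old, new):
    """Signed % difference of the new lot relative to the old (reference) lot."""
    return (new - old) / old * 100

def evaluate_new_lot(pairs, limit=10.0):
    """Summarize a lot-to-lot comparison from (old, new) result pairs."""
    diffs = [percent_difference(old, new) for old, new in pairs]
    over_limit = sum(1 for d in diffs if abs(d) > limit)
    one_sided = all(d > 0 for d in diffs) or all(d < 0 for d in diffs)
    return {
        "differences": diffs,
        "over_limit": over_limit,
        # More than one pair beyond the limit: reject the new lot
        "too_variable": over_limit > 1,
        # All differences share a sign: reject, or adopt the lot with a
        # new reference interval and therapeutic range
        "one_sided": one_sided,
    }

# Hypothetical (old lot, new lot) results
scattered = evaluate_new_lot([(10.0, 10.5), (50.0, 44.0), (100.0, 88.0), (150.0, 153.0)])
shifted = evaluate_new_lot([(10.0, 10.5), (50.0, 52.0), (100.0, 103.0)])
print(scattered["too_variable"], shifted["one_sided"])
```

The two flags are returned separately because the text treats them differently: excessive scatter leads to rejection, whereas a consistent one-sided difference leaves the director a choice.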

For several analytes, lot-to-lot comparisons include revalidation of the analytical measurement range (AMR) or reportable range. AMR is the range of results a method produces without any specimen pre-treatment, such as dilution, and is similar to a linearity study.

Development of the reference interval and therapeutic range

Once an assay is validated, the laboratory professional develops the reference interval (reference range, normal range).26 Most laboratory professionals use the vernacular phrase normal range; however, reference interval is preferred by statisticians. Using strict mathematical definitions, range encompasses all assay results from largest to smallest, whereas interval is a statistic that trims outliers.

To develop a reference interval, the laboratory professional carefully defines the desired healthy population and recruits representative donors who meet the criteria to provide blood specimens. The definition may, for example, exclude smokers, women taking oral contraceptives, and people using specified over-the-counter or prescription medications. Donors may be paid. There should be an equal number of males and females, and the chosen healthy donors should match the institution’s population demographics in terms of age and race. When practical, large-volume blood specimens are collected, aliquotted, and placed in long-term storage. For instance, plasma aliquots for coagulation reference interval development are stored indefinitely at –70° C. It may be impractical to develop local reference intervals for infants, children, or geriatric populations. In these cases the laboratory director may choose to use published (textbook) intervals.27 In general, although published reference intervals are available for educational and general discussion purposes, local laboratories must generate their own reference intervals for adults to most closely match the demographics of the area served by their institution.

The minimum number of subject specimens (data points) required to develop a reference interval may be determined using statistical power computations; however, practical limitations prevail.28 For a new assay with no currently established reference interval, a minimum of 120 data points is necessary. In most cases, however, the assay manufacturer provides a reference interval on the package insert, and the local laboratory practitioner need only assay 30 specimens, approximately 15 male and 15 female, to validate the manufacturer’s reference interval, a process called transference. Likewise, the practitioner may refer to published reference intervals and, once they are locally validated, transfer them to the institution’s report form.

Scientists assume that the population specimens employed to generate reference intervals will produce frequency distributions (in laboratory vernacular, histograms) that are normal bell-shaped (Gaussian) curves (Figure 5-5). In a Gaussian frequency distribution the mean is at the center; the mean, median, and mode coincide; and the dispersion about the mean is identical in both directions. In many instances, however, biologic frequency distributions are “log-normal” with a “tail” on the high end. For example, laboratory professionals assumed for years that the visual reticulocyte percentage reference interval in adults is 0.5% to 1.5%; however, repeated analysis of healthy populations in several locations has established the interval to be 0.5% to 2%, owing to a subset of healthy donors whose reticulocyte counts fall at the high end of the interval.29 Scientists may choose to live with a log-normal distribution, or they may transform it by replotting the curve using a semilog or log-log graphic display. The decision to transform may arise locally but eventually becomes adopted as a national practice standard.


FIGURE 5-5 Normal (Gaussian) distribution. When the test values obtained for a given subject population are normally distributed, the mean is at the peak and the mean, mode, and median coincide. The segments of the population distribution representing ±1, ±2, and ±3 standard deviations are illustrated. In developing the reference interval, laboratory directors often use ±2 standard deviations to establish the 95.5% confidence interval. This means that 95.46% of the test values from the healthy population are included within ±2 standard deviations. Consequently, 4.54%, or approximately 1 in 20 test results from theoretically healthy donors, fall outside the interval, half (2.27%) above and half below.

In a normal distribution, the mean (x̄) is computed by dividing the sum of the observed values by the number of data points, n, as shown in the equation in the section entitled “Statistical Significance and Expressions of Central Tendency and Dispersion.” The standard deviation is calculated using the formula provided in the same section. A typical reference interval is computed as ±2 standard deviations and assumes that the distribution is normal (Gaussian). The limits at ±2 standard deviations encompass 95.46% of results from healthy individuals, known as the 95.5% confidence interval. This implies that 4.54% of results from theoretically healthy individuals fall outside the interval. A standard deviation computed from a non-Gaussian distribution may turn out to be too narrow to reflect the true reference interval and may thus encompass fewer than 95.5% of results from presumed healthy donors and generate a number of false positives. Assays with high CV% values have high levels of random error reflected in a broad curve; low CV% assays with “tight” dispersal have smaller random error and generate a narrow curve, as illustrated in Figure 5-1. The breadth of the curve may also reflect biologic variation in values of the analyte.
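For a Gaussian data set, the reference interval calculation is brief. The following sketch assumes the sample standard deviation and uses five hypothetical donor results only to keep the arithmetic visible; as noted earlier, an actual study requires 120 donors, or 30 to validate a manufacturer's interval.

```python
from statistics import mean, stdev

def reference_interval(results, k=2.0):
    """Return (lower, upper) limits at mean ± k standard deviations."""
    m = mean(results)
    s = stdev(results)
    return m - k * s, m + k * s

def fraction_within(results, lower, upper):
    """Proportion of donor results that fall inside the interval."""
    inside = sum(1 for r in results if lower <= r <= upper)
    return inside / len(results)

# Hypothetical donor results (far fewer than a real study would use)
donors = [12.0, 13.0, 14.0, 15.0, 16.0]
low, high = reference_interval(donors)
print(f"{low:.2f} to {high:.2f}, fraction within: {fraction_within(donors, low, high)}")
```

Checking the fraction of donors inside the computed limits is one simple way to notice a skewed (for example, log-normal) distribution, for which ±2 SD limits may not capture the expected 95.5%.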

A few hematology and hemostasis assays are used to monitor drug therapy. For instance, the international normalized ratio (INR) for prothrombin time is used to monitor the effects of oral Coumadin (warfarin) therapy, and the therapeutic range is universally established at an INR of 2 to 3. On the other hand, the therapeutic range for monitoring treatment with unfractionated heparin using the partial thromboplastin time (PTT) assay must be established locally by regressing the PTT results in seconds against the results of the chromogenic anti-Xa heparin assay, whose therapeutic range is established empirically as 0.3 to 0.7 international heparin units per milliliter. The PTT therapeutic range is called the Brill-Edwards curve and is described in Chapter 42.

If assay revalidation or lot-to-lot comparison reveals a systematic change caused by reagent or kit modifications, a new reference interval (and therapeutic range, when applicable) is established. The laboratory director must advise the hospital staff of reference interval and therapeutic range changes because failure to observe new intervals and ranges may result in diagnosis and treatment errors.

Internal quality control


Laboratory managers prepare, or more often purchase, assay controls. Although it may appear similar, a control is wholly distinct from a calibrator. Indeed, cautious laboratory directors may insist that controls be purchased from distributors different from those who supply their calibrators. As discussed in the section “Validation of a New or Modified Assay,” calibrators are used to adjust instrumentation or to develop a standard curve. Calibrators are assayed by a reference method in expert laboratories, and their assigned value is certified. Controls are used independently of the calibration process so that systematic errors caused by deterioration of the calibrator or a change in the analytical process can be detected through internal quality control. This process is continuous and is called calibration verification.29 Compared with calibrators, control materials are inexpensive and are prepared from the same matrix as patient specimens except for preservatives, lyophilization, or freezing necessary to prolong shelf life. Controls provide known values and are sampled alongside patient specimens to accomplish within-run assay validation. In nearly all instances, two controls are required per test run: one within the reference interval and one above or below the reference interval. For some assays there is reason to select controls whose values are just outside the upper or lower limit of the reference interval, “slightly” abnormal. In institutions that perform continuous runs, the controls should be run at least once per shift, for instance, at 7 am, 3 pm, and 11 pm. In laboratories where assay runs are discrete events, two controls are assayed with each run.

Control results must fall within predetermined dispersal limits, typically ±2 SD. Control manufacturers provide limits; however, local laboratory practitioners must validate and transfer manufacturer limits or establish their own, usually by computing standard deviation from the first 20 control assays. Whenever the result for a control is outside the established limits, the run is rejected and the cause is found and corrected. The steps for correction are listed in Table 5-6.


TABLE 5-6 Steps Used to Correct an Out-of-Control Assay Run

| Step | Description |
|---|---|
| 1. Reassay | When a limit of ±2 standard deviations is used, 5% of expected assay results fall above or below the limit. |
| 2. Prepare new control and reassay | Controls may deteriorate over time when exposed to adverse temperatures or subjected to conditions causing evaporation. |
| 3. Prepare fresh reagents and reassay | Reagents may have evaporated or become contaminated. |
| 4. Recalibrate instrument | Instrument may require repair. |

Control results are plotted on a Levey-Jennings chart that displays each data point in comparison to the mean and limits (Figure 5-6).30 The Levey-Jennings chart assumes that the control results distribute in a Gaussian manner and provides limits at 1, 2, and 3 SD above and below the mean. In addition to being analyzed for single-run errors, the data points are examined for sequential errors over time (Figure 5-7). Both single-run and long-term control variation are a function of assay dispersion or random error and reflect the CV% of an assay.


FIGURE 5-6 Levey-Jennings chart illustrating acceptable control results. Control results from 19 runs in 20 days all fall within the action limits established as ±2 standard deviations (s). Results distribute evenly about the mean.


FIGURE 5-7 Levey-Jennings chart that illustrates a systematic error or Westgard 10× condition (shift, Table 5-7). Control results from 21 runs in 22 days all fall within the action limits established as ±2 standard deviations (s); however, the final 11 control results are above the mean. When 10 consecutive control results fall on one side of the mean, the assay has been affected by a systematic error (shift). The operator troubleshoots and recalibrates the assay.

Dr. James Westgard has established a series of internal quality control rules that are routinely applied to long-term deviations, called the Westgard rules.31 The rules were developed for assays that employ primary standards, but a few Westgard rules that are the most useful in hematology and hemostasis laboratories are provided in Table 5-7, along with the appropriate actions to be taken.32


TABLE 5-7 Westgard Rules Employed in Hematology and Hemostasis

| Rule | Description |
|---|---|
| 1-3s | A single control value is outside the ±3 SD limit. |
| 2-2s | Two control values are outside the ±2 SD limit. |
| R-4s | Two consecutive control values within a run are more than 4 SD apart. |
| 4-1s | Four consecutive control values within a run exceed the mean by ±1 SD. |
| 10× | Also called a “shift.” A series of 10 consecutive control values remain within the dispersal limits but are consistently above or below the mean. |
| 7T | Also called a “trend.” A series of at least 7 control values trend in a consistent direction. |

10× or 7T may indicate an instrument calibration issue that has introduced a constant systematic error (bias). Shifts or trends may be caused by deterioration of reagents, pump fittings, or light sources. Abrupt shifts may reflect a reagent or instrument fitting change.

In all cases, assay results are rejected and the error is identified using the steps in Table 5-6.
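The rules lend themselves to a simple screening routine applied to a series of control results. The sketch below is an illustration, not a validated implementation. It makes several assumptions: the two values beyond 2 SD are consecutive and on the same side of the mean; the 4 SD spread is checked between consecutive values; the four values beyond 1 SD lie on the same side; and a trend means seven strictly rising or falling values. The rule labels and function names are hypothetical.

```python
def _any_window(z, size, predicate):
    """True if any run of `size` consecutive z-scores satisfies predicate."""
    return any(predicate(z[i:i + size]) for i in range(len(z) - size + 1))

def westgard_flags(values, mean, sd):
    """Return the set of rule labels violated by a series of control results."""
    z = [(v - mean) / sd for v in values]   # distance from the mean in SD units
    flags = set()
    if any(abs(v) > 3 for v in z):
        flags.add("1-3s")
    for a, b in zip(z, z[1:]):
        if (a > 2 and b > 2) or (a < -2 and b < -2):
            flags.add("2-2s")
        if abs(a - b) > 4:
            flags.add("R-4s")
    if _any_window(z, 4, lambda w: all(v > 1 for v in w) or all(v < -1 for v in w)):
        flags.add("4-1s")
    if _any_window(z, 10, lambda w: all(v > 0 for v in w) or all(v < 0 for v in w)):
        flags.add("10x")   # shift
    if _any_window(z, 7, lambda w: all(a < b for a, b in zip(w, w[1:]))
                   or all(a > b for a, b in zip(w, w[1:]))):
        flags.add("7T")    # trend
    return flags

# Hypothetical control with an established mean of 100 and SD of 2
in_control = westgard_flags([100, 101, 99, 102, 98, 100], 100, 2)
shift = westgard_flags([101] * 11, 100, 2)
trend = westgard_flags([97, 98, 99, 100, 101, 102, 103], 100, 2)
outlier = westgard_flags([100, 107], 100, 2)
print(in_control, shift, trend, outlier)
```

An empty set means no rule was violated; any other result would prompt the corrective steps in Table 5-6.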

Moving average (X̄B) of the red blood cell indices

In 1974, Dr. Brian Bull proposed a method of employing patient RBC indices to monitor the stability of hematology analyzers, recognizing that the RBC indices mean cell volume (MCV), mean cell hemoglobin (MCH), and mean cell hemoglobin concentration (MCHC) remain constant on average despite individual patient variations.33 Each consecutive sequence of 20 patient RBC index assay results is collected and treated by the moving average formula (see reference), which accumulates, “smooths,” and “trims” data to reduce the effect of outliers. Each trimmed 20-specimen mean, X̄B, is plotted on a Levey-Jennings chart and tracked for trends and shifts using Westgard rules. The formula has been automated and embedded in the circuitry of all hematology analyzers, which provide a Levey-Jennings chart for MCV, MCH, and MCHC. The moving average concept has been generalized to WBC and platelet counts and to some clinical chemistry analytes, albeit with moderate success.

To begin, 500 consecutive specimens are analyzed for the mean MCV, MCH, and MCHC. A Levey-Jennings chart is prepared using ±3% of the mean or one SD as the action limits, and subsequent data accumulation commences in groups of 20.
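The batching and limit check can be illustrated with a deliberately simplified sketch. It uses the plain arithmetic mean of each group of 20 results and does not reproduce the smoothing and trimming of the published moving average formula. The target MCV of 90 fL and the patient values are hypothetical; the ±3% limit is the one given above.

```python
from statistics import mean

def batch_means(values, batch_size=20):
    """Plain means of consecutive, non-overlapping batches (no trimming)."""
    last_start = len(values) - batch_size
    return [mean(values[i:i + batch_size])
            for i in range(0, last_start + 1, batch_size)]

def outside_action_limit(batch_mean, target, tolerance=0.03):
    """True when a batch mean deviates from the target by more than ±3%."""
    return abs(batch_mean - target) > tolerance * target

# Hypothetical MCV results in fL: a stable batch followed by a shifted batch
target_mcv = 90.0   # in practice, the mean of 500 consecutive specimens
mcv_results = [90.0] * 20 + [94.0] * 20
means = batch_means(mcv_results)
flags = [outside_action_limit(m, target_mcv) for m in means]
print(means, flags)
```

Because untrimmed means are sensitive to a few extreme patient values, this sketch would flag more often than an analyzer's built-in algorithm in a population with many abnormal results.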

The moving average method requires a computer to calculate the averages, does not detect within-run errors, and is less sensitive than the use of commercial controls in detecting systematic shifts and trends. It works well in institutions that assay specimens from generalized populations that contain minimal numbers of sickle cell or oncology patients. A population that has a high percentage of abnormal hematologic results, as may be seen in a tertiary care facility, may generate a preponderance of moving average outliers.34 Moving average systems do not replace the use of control specimens but provide additional means to detect shifts and trends.

Delta checks

The δ-check system compares a current analyte result with the result from the most recent previous analysis for the same patient.35 Certain patient values remain relatively consistent over time unless there is an intervention. A result that fails a δ-check, often a 20% deviation, is investigated for intervention such as a transfusion or surgery, or a profound change in the patient’s condition subsequent to the previous analysis. If there is no ready explanation, the failed δ-check may indicate an analytical error or mislabeled specimen. Results that fail a δ-check are sequestered until the cause is found. Laboratory directors may require δ-checks on MCV, RDW, HGB, PLT, PT, INR, and PTT. Action limits for δ-checks are based on clinical impression and are assigned by hematology and hemostasis laboratory directors in collaboration with clinicians and laboratory staff. Computerization is essential, and δ-checks are designed only to identify gross errors, not changes in random error, or shifts or trends. There is no regulatory requirement for δ-checks.
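A δ-check is a comparison of two numbers against a limit. The sketch below assumes a percentage-based limit and uses the 20% figure mentioned above as a default; the function name and the hemoglobin values are hypothetical, and actual limits are set per analyte by the laboratory director.

```python
def fails_delta_check(previous, current, limit_percent=20.0):
    """True when the current result deviates from the previous result
    for the same patient by more than limit_percent."""
    if previous == 0:
        raise ValueError("previous result must be nonzero for a % comparison")
    change = abs(current - previous) / abs(previous) * 100
    return change > limit_percent

# Hypothetical hemoglobin results in g/dL for one patient
print(fails_delta_check(12.0, 9.0))    # 25% change: hold and investigate
print(fails_delta_check(12.0, 11.0))   # about 8% change: release
```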

External quality assessment

External quality assessment further validates the accuracy of hematology and hemostasis assays by comparing results from identical aliquots of specimens distributed at regular intervals among laboratories nationwide or worldwide. The aliquots are often called survey or proficiency testing specimens and include preserved human donor plasma and whole blood, stained peripheral blood films and bone marrow smears, and photomicrographs of cells or tissues.

In most proficiency testing systems, target (true or reference) values for the test specimens are established in-house by their manufacturer or distributor and are then further validated by preliminary distribution to a handful of “expert” laboratories. Separate target values may be assigned for various assay methods and instruments, as feasible.

Laboratories that participate in external quality assessment are directed to manage the survey specimens using the same principles as those employed for patient specimens—survey specimens should not receive special attention. Turnaround is swift, and results are sent electronically to the provider.

In addition to establishing a target value, agencies that administer surveys reduce the returned data to statistics, including the mean, median, and standard deviation of all participant results. Provided the survey is large enough, the statistics may be computed individually for the various instruments and assay methods. The statistics collected from participants should match the predetermined targets. If they do not, the agency troubleshoots the assay and assigns the most reliable statistics, usually the group mean and standard deviations.

The agency provides a report to each laboratory, illustrating its result in comparison with the target value and appending a comment if the laboratory result exceeds the established limits, usually ±2 standard deviations from the mean. If the specimen is a blood or bone marrow smear, a photomicrograph, or a problem that requires a binary (positive/negative, yes/no) response, the local laboratory comment is compared with expert opinion and consensus.

Although a certain level of error is tolerated, error rates that exceed established limits result in corrective recommendations or, in extreme circumstances, loss of laboratory accreditation or licensure.

There are a number of external quality assessment agencies; however, the College of American Pathologists (CAP) and the American Proficiency Institute (API) provide the largest survey systems. Survey packages are provided for laboratories offering all levels of service. API and CAP are nongovernmental agencies; however, survey participation is necessary to meet the accreditation requirements of the Joint Commission and to qualify for Medicare reimbursement. The North American Specialized Coagulation Laboratory Association provides survey systems for specialty coagulation laboratories in the United States and Canada and is affiliated with the ECAT (external quality control of diagnostic assays and tests) Foundation External Quality Assessment Program of the Netherlands, which provides survey materials throughout Europe. Many state health agencies provide proficiency testing surveys, requiring laboratories to participate as a condition of licensure.

Assessing diagnostic efficacy

Since the 1930s, surgeons have used the bleeding time test to predict the risk of intraoperative hemorrhage. The laboratory scientist, technician, or phlebotomist activates an automated lancet to make a 5-mm long, 1-mm deep incision in the volar surface of the forearm and uses a clean piece of filter paper to meticulously absorb drops of blood in 30-second intervals. The time interval from initial incision to bleeding cessation is recorded, normally 2 to 9 minutes. The test is simple and logical, and experts have claimed for over 50 years that if the incision bleeds for longer than 9 minutes, there is a risk of surgical bleeding. In the 1990s clinical researchers compared within-range and prolonged bleeding times with instances of intraoperative bleeding and found to their surprise that prolonged bleeding time results predicted fewer than 50% of intraoperative bleeds.36,37 Many bleeds occurred despite a bleeding time shorter than 9 minutes. Thus, the positive predictive value of the bleeding time for intraoperative bleeding was less than 50%, which is the probability of turning up heads in a coin toss. Today the bleeding time test is widely agreed to have no clinical efficacy and is obsolete, though still available.

Like the bleeding time test, many time-honored hematology and hemostasis assays gain credibility on the basis of logic and expert opinion. Now, however, besides being valid, accurate, linear, and precise, a new or modified assay must be diagnostically effective.38 To compute diagnostic efficacy, the laboratory professional obtains a series of specimens from healthy subjects, volunteers who do not have the particular disease or condition being measured, called controls; and from patients who conclusively possess a disease or condition. The patients’ diagnosis is based on downstream clinical outcomes, discharge notes, or the results of valid existing laboratory tests, excluding the new assay. The new assay is then applied to specimens from both the healthy control and disease patient groups to assess its efficacy.

In a perfect world, the laboratory scientist sets the discrimination threshold at the 95.5% confidence interval limit (±2 SD) of the mean. When this threshold, also called the limit or “cut point,” is used, the test ideally yields a positive result, meaning a level elevated beyond the upper limit or reduced below the lower limit, in every instance of disease and a negative result, within the reference interval, in all subjects (controls) without the disease. In reality, there is always some overlap: a “gray area” in which some positive test results are generated from non-disease specimens (false positives) and some negative results are generated from specimens taken from patients with proven disease (false negatives). False positives cause unnecessary anxiety, follow-up expense, and erroneous diagnostic leads, which are worrisome, expensive, and time consuming, but seldom fatal. False negatives fail to detect the disease and may delay treatment, which can be life threatening. The laboratory scientist employs diagnostic efficacy computations to establish the effectiveness of laboratory assays and to minimize both false-positive and false-negative results (Table 5-8). Diagnostic efficacy testing includes determination of diagnostic sensitivity and specificity, positive and negative predictive value, and receiver operating characteristic analysis.


TABLE 5-8

Diagnostic Efficacy Definitions and Binary Display

| Term | Definition |
|---|---|
| True positive | Assay correctly identifies a disease or condition in those who have it. |
| False positive | Assay incorrectly identifies disease when none is present. |
| True negative | Assay correctly excludes a disease or condition in those without it. |
| False negative | Assay incorrectly excludes disease when it is present. |

| | Individuals Unaffected by the Disease or Condition | Individuals Affected by the Disease or Condition |
|---|---|---|
| Assay is negative | True negative | False negative |
| Assay is positive | False positive | True positive |

To start a diagnostic efficacy study, the scientist selects control specimens from healthy subjects and specimens from patients proven to have the disease or condition addressed by the assay. To make this discussion simple, assume that 50 specimens of each are chosen. All are assayed, and the results are shown in Table 5-9.

The scientist next computes diagnostic sensitivity and specificity and positive and negative predictive value as shown in Table 5-10. These values are then used to consider the conditions in which the assay may be effectively used.


TABLE 5-9

Diagnostic Efficacy Study

| | Individuals Unaffected by the Disease or Condition | Individuals Affected by the Disease or Condition |
|---|---|---|
| Assay is negative | True negative: 40 | False negative: 5 |
| Assay is positive | False positive: 10 | True positive: 45 |

Data on specimens from 50 individuals who are unaffected by the disease or condition and 50 individuals who are affected by the disease or condition.

TABLE 5-10

Diagnostic Efficacy Computations

| Statistic | Definition | Formula | Example* | Application |
|---|---|---|---|---|
| Diagnostic sensitivity | Proportion with the disease who have a positive test result | Sensitivity (%) = TP/(TP + FN) × 100 | 45/(45 + 5) × 100 = 90% | Distinguish diagnostic sensitivity from analytical sensitivity. Analytical sensitivity is a measure of the smallest increment of the analyte that can be distinguished by the assay. |
| Diagnostic specificity | Proportion without the disease who have a negative test result | Specificity (%) = TN/(TN + FP) × 100 | 40/(40 + 10) × 100 = 80% | Distinguish diagnostic specificity from analytical specificity. Analytical specificity is the ability of the assay to distinguish the analyte from interfering substances. |
| Positive predictive value (PPV) | Proportion with a disease who have a positive test result compared with all individuals who have a positive test result | PPV (%) = TP/(TP + FP) × 100 | 45/(45 + 10) × 100 = 82% | The positive predictive value predicts the probability that an individual with a positive assay result has the disease or condition. |
| Negative predictive value (NPV) | Proportion without a disease who have a negative test result compared with all individuals who have a negative test result | NPV (%) = TN/(TN + FN) × 100 | 40/(40 + 5) × 100 = 89% | The negative predictive value predicts the probability that an individual with a negative assay result does not have the disease or condition. |

* Using data from Table 5-9.

FN, false negative; FP, false positive; TN, true negative; TP, true positive.
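The four computations in Table 5-10 can be checked with a few lines of code. The following is a minimal sketch using the counts from Table 5-9; the function name `diagnostic_efficacy` and its argument names are illustrative choices, not part of any laboratory software.

```python
def diagnostic_efficacy(tp, fp, tn, fn):
    """Return diagnostic sensitivity, specificity, PPV, and NPV as percentages."""
    return {
        "sensitivity": 100 * tp / (tp + fn),  # affected subjects who test positive
        "specificity": 100 * tn / (tn + fp),  # unaffected subjects who test negative
        "ppv": 100 * tp / (tp + fp),          # positive results that are correct
        "npv": 100 * tn / (tn + fn),          # negative results that are correct
    }


# Counts taken from Table 5-9
results = diagnostic_efficacy(tp=45, fp=10, tn=40, fn=5)
for name, value in results.items():
    print(f"{name}: {value:.0f}%")
```

Rounded to whole percentages, the output reproduces the 90%, 80%, 82%, and 89% listed in Table 5-10.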

The effects of population incidence and odds ratios on diagnostic efficacy

Epidemiologists describe population events using the terms prevalence and incidence. Prevalence describes the total number of events or conditions in a broadly defined population, for instance, the total number of patients with chronic heart disease in the United States. Prevalence quantitates the burden of a disease on society but is not qualified by time intervals and does not predict disease risk.

Incidence describes the number of events occurring within a randomly selected number of subjects representing a population, over a defined time, for instance, the number of new cases of heart disease per 100,000 U.S. residents per year. Incidence numbers are non-cumulative. Incidence can be further defined, for instance by the number of heart disease cases per 100,000 nonsmokers, 100,000 women, or 100,000 people ages 40 to 50. Scientists use incidence, not prevalence, to select laboratory assays for specific applications such as screening or confirmation.

For all assays, as diagnostic sensitivity rises, specificity declines. A screening test is an assay that is applied to a large number of subjects within a convenience sample where the participant’s condition is unknown, for example, lipid profiles offered in a shopping mall. Assays that possess high sensitivity and low specificity make effective screening tests, although they produce a number of false positives. For instance, if the condition being studied has an incidence of 0.0001 (1 in 10,000 per year) and the false-positive rate is a modest 1%, then among 10,000 subjects one is affected and about 100 of the 9999 unaffected subjects test positive. The assay therefore produces roughly 100 false-positive results for every true-positive result. Clearly such a test is useful only when the consequence of a false-positive result is minimal and follow-up confirmation is readily available.
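The screening arithmetic above can be sketched as follows. The population size of 10,000 and the assumption that the assay detects every affected subject (sensitivity of 1.0) are illustrative choices made for this sketch; the incidence and false-positive rate come from the example in the text.

```python
def screening_yield(population, incidence, sensitivity, false_positive_rate):
    """Return the expected numbers of true-positive and false-positive results."""
    affected = population * incidence
    unaffected = population - affected
    true_positives = affected * sensitivity
    false_positives = unaffected * false_positive_rate
    return true_positives, false_positives


# Incidence 1 in 10,000 and a 1% false-positive rate, as in the text;
# perfect sensitivity is assumed here for simplicity.
true_pos, false_pos = screening_yield(
    population=10_000, incidence=0.0001, sensitivity=1.0, false_positive_rate=0.01
)
print(f"true positives: {true_pos:.2f}, false positives: {false_pos:.2f}")
print(f"false positives per true positive: {false_pos / true_pos:.0f}")
```

Under these assumptions the ratio is on the order of 100 false positives for each true positive, which is why a low-incidence screen needs inexpensive and readily available confirmation.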

Conversely, as specificity rises, sensitivity declines. Assays with high specificity provide effective confirmation when used in follow-up to positive results on screening assays. High-specificity assays produce a number of false negatives and should not be used as initial screens. A positive result on both a screening assay and a confirmatory assay provides a definitive conclusion. A positive screening result followed by a negative confirmatory test result generates a search for alternative diagnoses.

Laboratory assays are most effective when chosen to assess patients with high clinical pretest probability. In such instances, the incidence of the condition is high enough to mitigate the effects of false positives and false negatives. For instance, when a physician orders hemostasis testing for patients who are experiencing easy bruising, there is a high pretest probability, which raises the assays’ diagnostic efficacy. Conversely, ordering hemostasis assays as screens of healthy individuals prior to elective surgery introduces a low pretest probability and reduces the efficacy of the test profile, raising the relative rate of false positives.
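The effect of pretest probability on predictive value can be illustrated with a short computation. This sketch reuses the 90% sensitivity and 80% specificity from Table 5-10; the pretest probabilities of 1%, 10%, and 50% are hypothetical values chosen only to show the trend.

```python
def predictive_values(sensitivity, specificity, pretest_probability):
    """Return (PPV, NPV) as fractions for a given pretest probability."""
    tp = sensitivity * pretest_probability
    fn = (1 - sensitivity) * pretest_probability
    tn = specificity * (1 - pretest_probability)
    fp = (1 - specificity) * (1 - pretest_probability)
    return tp / (tp + fp), tn / (tn + fn)


# Hypothetical pretest probabilities: healthy screen, mild suspicion, strong suspicion
for probability in (0.01, 0.10, 0.50):
    ppv, npv = predictive_values(0.90, 0.80, probability)
    print(f"pretest {probability:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

At a pretest probability of 50% the result matches the PPV of Table 5-10, because that study enrolled equal numbers of affected and unaffected subjects. At 1% the same assay has a PPV below 5%, which is the situation described for preoperative screening of healthy individuals.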

Epidemiologists further assist laboratory professionals by designing prospective randomized control trials to predict the relative odds ratio (or relative risk ratio, RRR) and the absolute odds ratio (or absolute risk ratio, ARR) of an intervention that is designed to modify the incidence of an event within a selected population, as illustrated in the following example.38

You design a 5-year study in which you select 2000 obese smokers ages 40 to 60 who have no heart disease. You randomly select 1000 for intervention: periodic laboratory assays for inflammatory markers, with follow-up aspirin for those who have positive assay results. The 1000 controls are tested with the same lab assays but are given a placebo that resembles aspirin. The primary endpoint is acute myocardial infarction (AMI). No one dies or drops out, and at the end of five years, 100 of the 1000 controls and 50 of the 1000 members of the intervention arm have suffered AMIs. The control arm ratio is 100/1000 = 0.1; the intervention arm ratio is 50/1000 = 0.05; and the RRR is 0.05/0.1 = 0.5 (50%). You predict from your study that the odds (RRR) of having a heart attack are cut in half by the intervention. You repeat the study using 2000 slim nonsmokers ages 20 to 40. In this sample, 10 of the 1000 controls and 5 of the 1000 members of the intervention group suffer AMIs. The computation is 0.005/0.01 = 0.5, the same as in the obese smoker group, thus enabling you to generalize your results to slim nonsmokers. RRR has been used extensively to support widespread medical interventions, often without regard to control arm incidence or the risks associated with generalizing to non-studied populations.

You go on to compute the ARR, which is the absolute value of the arithmetic difference in the event rates of the control and intervention arms.39 In our example using the obese smokers group, the ARR = 0.1 – 0.05 = 0.05, or 5% (not 50%, as reported in the example using RRR above). The ARR is often expressed as the number necessary to treat (NNT), the inverse of ARR. In our example, 1/0.05 = 20, so for every subject whose AMI is prevented by the laboratory test and subsequent treatment, you would have to treat 20 subjects over a 5-year period. Further, if you reduce the ARR to an annual rate, it becomes 0.05/5 years = 0.01, or a 1% annual reduction. You conclude from your study that 100 interventions per year are required to prevent one AMI.
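The RRR, ARR, and NNT computations for both study groups can be sketched as follows. The function name and return order are illustrative; the event counts are those given in the example, and RRR is computed as the chapter defines it (intervention arm rate divided by control arm rate).

```python
def intervention_effect(control_events, control_n, treated_events, treated_n):
    """Return (RRR, ARR, NNT) using the chapter's definitions."""
    control_rate = control_events / control_n
    treated_rate = treated_events / treated_n
    rrr = treated_rate / control_rate        # ratio of the two event rates
    arr = abs(control_rate - treated_rate)   # absolute difference in event rates
    nnt = 1 / arr                            # number necessary to treat
    return rrr, arr, nnt


obese_smokers = intervention_effect(100, 1000, 50, 1000)
slim_nonsmokers = intervention_effect(10, 1000, 5, 1000)

for label, (rrr, arr, nnt) in (("obese smokers", obese_smokers),
                               ("slim nonsmokers", slim_nonsmokers)):
    print(f"{label}: RRR {rrr:.2f}, ARR {arr:.3f}, NNT {nnt:.0f} over 5 years")
```

Both groups share an RRR of 0.5, but the ARR in the slim nonsmokers is one-tenth of that in the obese smokers, so ten times as many subjects must be treated to prevent one AMI. This is the reason the text cautions against reporting RRR without the control arm incidence.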

Finally, your RRR and ARR are not discrete integers but means computed from samples of 1000, so they must include an expression of dispersion, usually ±2 SD or a 95.5% confidence interval. Suppose the 95.5% confidence interval for the RRR turns out to be relatively broad: –0.1 to +1.1. A ratio of 1 implies no effect from the intervention, that is, the event rate in the intervention arm is equal to the event rate in the control arm. Given that the 95.5% confidence interval embraces the number 1, the study has failed to demonstrate any benefit from the intervention.
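The chapter does not state how the interval around the ratio is computed. One common approximation, used here purely as an illustration, places the interval on the logarithm of the ratio and then transforms back; the multiplier of 2 mirrors the ±2 SD convention used in this chapter. The function name and the choice of method are assumptions of this sketch.

```python
import math


def ratio_with_interval(control_events, control_n, treated_events, treated_n, z=2.0):
    """Return (ratio, lower, upper) using a log-scale approximation (assumed method)."""
    ratio = (treated_events / treated_n) / (control_events / control_n)
    # Approximate standard error of ln(ratio) for two independent proportions
    se_log = math.sqrt(
        1 / treated_events - 1 / treated_n + 1 / control_events - 1 / control_n
    )
    lower = math.exp(math.log(ratio) - z * se_log)
    upper = math.exp(math.log(ratio) + z * se_log)
    return ratio, lower, upper


for label, counts in (("obese smokers", (100, 1000, 50, 1000)),
                      ("slim nonsmokers", (10, 1000, 5, 1000))):
    ratio, lower, upper = ratio_with_interval(*counts)
    verdict = "embraces 1" if lower <= 1 <= upper else "excludes 1"
    print(f"{label}: ratio {ratio:.2f}, interval {lower:.2f} to {upper:.2f} ({verdict})")
```

With this approximation the obese smoker interval runs from roughly 0.36 to 0.70 and excludes 1, whereas the slim nonsmoker interval runs from roughly 0.17 to 1.49 and embraces 1. Although the two ratios are identical, the second study has too few events to demonstrate a benefit.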

In summary, once a laboratory assay is verified to be accurate and precise, it must then be revealed to possess diagnostic efficacy and to provide for effective intervention as determined by favorable RRR and ARR or NNT values. Application of the receiver operating characteristic (ROC) curve may help achieve these goals.

Receiver operating characteristic curve

A ROC curve is a further refinement of diagnostic efficacy testing that may be employed to determine the decision limit (cutoff, threshold) for an assay when the assay generates a continuous variable.40 In diagnostic efficacy testing as described in the previous section, the ±2 SD limits of the reference interval are used as the thresholds for discriminating a positive from a negative test result. Often the “true” threshold varies from the ±2 SD limit. Using ROC analysis, the limit is adjusted by increments of 1 (or other increments depending upon the analytical range), and the true-positive and false-positive rates are recomputed for each new threshold level using the same formulas provided in the section named “Assessing Diagnostic Efficacy.” The limit that is finally selected is the one that provides the largest true-positive and smallest false-positive rate (Figure 5-8). The operator generates a line graph plotting true positives on the y-axis and false positives on the x-axis. Measuring the area under the curve (a computer-based calculus function) assesses the overall efficacy of the assay. If the area under the curve is 0.5, the curve is at the line of identity between false and true positives and provides no discrimination. Most agree that a clinically useful assay should have an area under the curve of 0.85 or higher.41


FIGURE 5-8 A, Receiver operating characteristic curve. The false-positive and true-positive rates for each discrimination threshold from 70% to 80% are computed and graphed as paired variables on a linear scale, false-positive rate on the horizontal (x) scale and true-positive rate on the vertical (y) scale. The assay has acceptable discrimination between affected and unaffected individuals, with an area under the curve (AUC) of 0.85, and 73% is the threshold that produces the most desirable false-positive and true-positive rates. B, This assay has unacceptable discrimination between affected and unaffected individuals, with an AUC of 0.70. It is difficult to find the threshold that produces the most desirable false-positive and true-positive rates. C, This assay, with an AUC of 0.50, has no ability to discriminate. FP, false positive; TP, true positive.
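The threshold sweep and area measurement described for ROC analysis can be sketched in a few lines. The assay values below are hypothetical and are not taken from Figure 5-8. The area is estimated with the trapezoid rule, and the "most desirable" threshold is taken here to be the one with the largest difference between the true-positive and false-positive rates, which is one of several possible selection rules.

```python
def roc_points(unaffected, affected, thresholds):
    """Return (false-positive rate, true-positive rate) for each threshold.

    A result at or above the threshold is called positive.
    """
    points = []
    for t in thresholds:
        tpr = sum(v >= t for v in affected) / len(affected)
        fpr = sum(v >= t for v in unaffected) / len(unaffected)
        points.append((fpr, tpr))
    return points


def area_under_curve(points):
    """Estimate the area under the ROC curve with the trapezoid rule."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical assay results in arbitrary units
unaffected = [2, 3, 3, 4, 4, 5, 5, 6]
affected = [4, 6, 6, 7, 7, 8, 8, 9]
thresholds = list(range(2, 11))  # adjust the limit in increments of 1

points = roc_points(unaffected, affected, thresholds)
auc = area_under_curve(points)
# Assumed selection rule: maximize (true-positive rate - false-positive rate)
best_threshold, (best_fpr, best_tpr) = max(
    zip(thresholds, points), key=lambda pair: pair[1][1] - pair[1][0]
)
print(f"AUC = {auc:.3f}")
print(f"threshold {best_threshold}: FP rate {best_fpr:.3f}, TP rate {best_tpr:.3f}")
```

For these hypothetical data the area under the curve exceeds the 0.85 level cited in the text, and the selected threshold of 6 yields a true-positive rate of 0.875 with a false-positive rate of 0.125.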

Assay feasibility

Most laboratory managers and directors review assay feasibility before launching complex validation, efficacy, reference interval, and quality control initiatives. Feasibility studies include a review of assay throughput (number of assays per unit time), dwell time (length of assay interval from specimen sampling to report), cost per test, cost/benefit ratio, turnaround time, and the technical skill required to perform the assay. To select a new instrument, the manager reviews issues of operator safety, footprint, overhead, compatibility with laboratory utilities and information system, the need for middleware, frequency and duration of breakdowns, and distributor support and service.

Laboratory staff competence

Staff integrity and professional staff competence are the keys to assay reliability. In the United States, California, Florida, Georgia, Hawaii, Louisiana, Montana, Nevada, New York, North Dakota, Rhode Island, Tennessee, West Virginia, and Puerto Rico enforce licensure laws. In these jurisdictions, only licensed laboratory professionals may be employed in medical center or reference laboratories. Legislatures in Alaska, Illinois, Massachusetts, Minnesota, Missouri, Vermont, and Virginia have considered and rejected licensure bills, the bills having been opposed by competing health care specialty associations and for-profit entities. In non-licensure states, conscientious laboratory directors employ only nationally certified professionals. Certification is available from the American Society for Clinical Pathology Board of Certification in Chicago, Illinois. Studies of laboratory errors and outcomes demonstrate that laboratories that employ only licensed or certified professionals produce the most reliable assay results.42,43

Competent laboratory staff members continuously watch for and document errors by inspecting the results of internal validation and quality control programs and external quality assessment. Error is inevitable, and incidents should be documented and highlighted for quality improvement and instruction. When error is associated with reprimand, the opportunity for improvement may be lost to cover-up. Except in cases of negligence, the analysis of error without blame is consistently practiced in an effort to improve the quality of laboratory service.

Proficiency systems

Laboratory managers and directors assess and document professional staff skills using proficiency systems. The hematology laboratory manager may, for instance, maintain a collection of normal and abnormal blood films, case studies, or laboratory assay reports that technicians and scientists are required to examine at regular intervals. Personnel who fail to reproduce the target values on examination of the blood film are provided remedial instruction. The proficiency set may also be used to assess applicants for laboratory positions. Proficiency testing systems are available from external quality assessment agencies, and proficiency reports are made accessible to laboratory assessors.

Continuing education

The American Society for Clinical Pathology Board of Certification and state medical laboratory personnel licensure boards require technicians and scientists to participate in and document continuing education for periodic recertification or relicensure. Educators and experts deliver continuing education in the form of journal articles, case studies, online seminars (webinars), and seminars and workshops at professional meetings. Medical centers offer periodic internal continuing education opportunities (in-service education) in the form of grand rounds, lectures, seminars, and participative educational events. Presentation and discussion of current cases are particularly effective. Continuing education maintains the critical skills of laboratory personnel and provides opportunities to learn about new clinical and technical approaches. The Colorado Association for Continuing Medical Laboratory Education, the American Society for Clinical Laboratory Science, the American Society for Clinical Pathology, the American Society of Hematology, the National Hemophilia Foundation, and the Fritsma Factor are examples of the scores of organizations that direct their activities toward quality continuing education in hematology and hemostasis.

The medical laboratory science profession stratifies professional staff responsibilities by educational preparation. In the United States, professional levels are defined as the associate (2-year) degree level, or medical laboratory technician; bachelor (4-year) degree level, or medical laboratory scientist; and the levels of advanced degrees: master’s degree or doctorate in clinical laboratory science and related sciences. Many colleges and universities offer articulation programs that enable professional personnel to advance their education and responsibility levels. Several of these institutions provide undergraduate and graduate distance-learning opportunities. A current list is maintained by the National Accrediting Agency for Clinical Laboratory Sciences, and the American Society for Clinical Laboratory Science publishes the Directory of Graduate Programs for Clinical Laboratory Practitioners, 5th ed. Enlightened employers encourage personnel to participate in advanced educational programs, and many provide resources for this purpose. Education contributes to quality laboratory services.

Quality assurance plan: Preanalytical and postanalytical

In addition to keeping analytical quality control records, U.S. regulatory agencies such as the Centers for Medicare and Medicaid Services require laboratory directors to maintain records of preanalytical and postanalytical quality assurance and quality improvement efforts.44 Although not exhaustive, Table 5-11 lists and characterizes a number of examples of preanalytical quality efforts, and Table 5-12 provides a review of postanalytical components. All quality assurance plans provide objectives, sources of authority, scope of services, an activity calendar, corrective action, periodic evaluation, standard protocol, personnel involvement, and methods of communication.45

API Paperless Proficiency Testing™ and CAP Q-PROBES® are subscription services that provide model quality assurance programs. Experts in quality assurance continuously refine the consensus of appropriate indicators of laboratory medicine quality. Quality assurance programs search for events that provide improvement opportunities.

Agencies that address hematology and hemostasis quality

The following are agencies that are concerned with quality assurance in hematology and hemostasis laboratory testing:

• Data Innovations North America, 120 Kimball Avenue, Suite 100, South Burlington, VT 05403. Quality assurance management software: instrument management middleware, laboratory production management software, EP Evaluator®, reference interval tables, allowable total error tables.

• Clinical and Laboratory Standards Institute (CLSI), 940 West Valley Road, Suite 1400, Wayne, PA 19087. International guidelines and standards for laboratory practice. Hematology and hemostasis guidelines and standards include CLSI H02-A4, H07-A3, and H21–H48. Standards for method evaluation and assessment of diagnostic accuracy (mostly EP-prefix) and for quality assurance and quality management systems (mostly QMS-prefix) are also available.

• Centers for Medicare and Medicaid Services (CMS), 7500 Security Boulevard, Baltimore, MD 21244. Administers the laws and rules developed from the Clinical Laboratory Improvement Amendments of 1988. Establishes Current Procedural Terminology (CPT) codes and reimbursement rules and classifies assay complexity.

• American Proficiency Institute (API), 1159 Business Park Drive, Traverse City, MI 49686. Laboratory proficiency testing and quality assurance programs, continuing education programs and summaries, tutorials, special topics library.

• College of American Pathologists (CAP), 325 Waukegan Road, Northfield, IL 60093. Laboratory accreditation, proficiency testing, and quality assurance programs; laboratory education, reference resources, and e-lab solutions.

• Joint Commission, One Renaissance Boulevard, Oakbrook Terrace, IL 60181. Medical center-wide accreditation and certification programs.

• Laboratory Medicine Quality Improvement, an initiative of the U.S. Centers for Disease Control and Prevention.


• Hematology and hemostasis laboratory quality assurance relies on basic statistics describing measures of central tendency, measures of dispersion, and significance.

• Each new assay or assay modification must be validated for accuracy, precision, linearity, specificity, and lower limit of detection ability. In the hematology and hemostasis laboratory, accuracy validation usually requires a series of calibrators. Accuracy is established using the Student’s t-test, ANOVA, Pearson product-moment correlation, linear regression, and the Bland-Altman distribution.

• Precision is established by using repeated within-day and day-to-day assays, then computing the mean, standard deviation, and coefficient of variation of the results.

• Vendors usually provide assay linearity, specificity, and lower limit of detection; however, laboratory managers may require that these parameters be revalidated locally.

• Internal quality control is accomplished by assaying controls with each test run. Control results are compared with action limits, usually the mean of the control assay ±2 SD. When a specified number of control values are outside the limits, the use of the assay is suspended and the practitioner begins troubleshooting. Control results are plotted on Levey-Jennings charts and examined for shifts and trends. Internal quality control is enhanced through the use of the moving average algorithm and delta checks.

• All conscientious laboratory directors subscribe to an external quality assessment system, also known as proficiency testing or proficiency surveys. External quality assessment enables the director to compare selected assay results with other laboratory results, nationally and internationally, as a further check of accuracy. Maintaining a good external quality assessment record is essential to laboratory accreditation. Most U.S. states require external quality assessment for laboratory licensure.

• All laboratory assays are analyzed for diagnostic efficacy, including diagnostic sensitivity and specificity, their true-positive and true-negative rates, and positive and negative predictive values. Highly sensitive assays may be used for population screening but may poorly discriminate between the healthy and diseased population. Specific assays may be used to confirm a condition, but generate a number of false negatives. Assays are chosen on the basis of the value of their intervention, based on relative or absolute risk ratios. Diagnostic efficacy computations expand to include receiver operating characteristic curve analysis.

• Conscientious laboratory managers hire only certified or licensed medical laboratory scientists and technicians and provide regular individual proficiency tests that are correlated with in-service education. They encourage staff members to participate in continuing education activities and in-house discussion of cases. Quality laboratories provide resources for staff to pursue higher education.

• The laboratory director maintains a protocol for assessing and improving upon preanalytical and postanalytical variables and finds means to communicate enhancements to other members of the health care team.

Now that you have completed this chapter, go back and read again the case study at the beginning and respond to the questions presented.

Review questions

Answers can be found in the Appendix.

1. What procedure is employed to validate a new assay?

a. Comparison of assay results to a reference method

b. Test for assay precision

c. Test for assay linearity

d. All of the above

2. You validate a new assay using linear regression to compare assay calibrator results with the distributor’s published calibrator results. The slope is 0.99 and the y intercept is +10%. What type of error is present?

a. No error

b. Random error

c. Constant systematic error

d. Proportional systematic error

3. Which is a statistical test comparing means?

a. Bland-Altman

b. Student’s t-test


d. Pearson

4. The acceptable hemoglobin control value range is 13 ± 0.4 g/dL. The control is assayed five times and produces the following five results: 12.0 g/dL, 12.3 g/dL, 12.0 g/dL, 12.2 g/dL, and 12.1 g/dL. These results are:

a. Accurate but not precise

b. Precise but not accurate

c. Both accurate and precise

d. Neither accurate nor precise

5. A WBC count control has a mean value of 6000/μL and a standard deviation of 300/μL. What is the 95.5% confidence interval?

a. 3000 to 9000/μL

b. 5400 to 6600/μL

c. 5500 to 6500/μL

d. 5700 to 6300/μL

6. The ability of an assay to distinguish the targeted analyte from interfering substances within the specimen matrix is called:

a. Analytical specificity

b. Analytical sensitivity

c. Clinical specificity

d. Clinical sensitivity

7. The laboratory purchases reagents from a manufacturer and develops an assay using standard references. What FDA category is this assay?

a. Cleared

b. Home-brew

c. Research use only

d. Analyte-specific reagent

8. A laboratory scientist measures prothrombin time for plasma aliquots from 15 healthy males and 15 healthy females. She computes the mean and 95.5% confidence interval and notes that they duplicate the manufacturer’s statistics within 5%. This procedure is known as:

a. Confirming linearity

b. Setting the reference interval

c. Determining the therapeutic range

d. Establishing the reference interval by transference

9. You purchase a preserved whole blood specimen from a distributor who provides the mean values for several complete blood count analytes. What is this specimen called?

a. Normal specimen

b. Calibrator

c. Control

d. Blank

10. You perform a clinical efficacy test and get the following results:


Unaffected by Disease or Condition

Affected by Disease or Condition

Assay is negative



Assay is positive



What is the number of false-negative results?

a. 40

b. 10

c. 5

d. 45

11. What agency provides external quality assurance (proficiency) surveys and laboratory accreditation?

a. Clinical Laboratory Improvement Advisory Committee (CLIAC)

b. Centers for Medicare and Medicaid Services (CMS)

c. College of American Pathologists (CAP)

d. Joint Commission

12. What agency provides continuing medical laboratory education?

a. Colorado Association for Continuing Medical Laboratory Education (CACMLE)

b. Clinical Laboratory Improvement Advisory Committee (CLIAC)

c. Centers for Medicare and Medicaid Services (CMS)

d. College of American Pathologists (CAP)

13. Regular review of blood specimen collection quality is an example of:

a. Postanalytical quality assurance

b. Preanalytical quality assurance

c. Analytical quality control

d. External quality assurance

14. Review of laboratory report integrity is an example of:

a. Preanalytical quality assurance

b. Analytical quality control

c. Postanalytical quality assurance

d. External quality assurance

15. When performing a receiver operating characteristic curve analysis, what parameter assesses the overall efficacy of an assay?

a. Area under the curve

b. Performance limit (threshold)

c. Positive predictive value

d. Negative predictive value

16. You require your laboratory staff to annually perform manual lupus anticoagulant profiles on a set of plasmas with known values. This exercise is known as:

a. Assay validation

b. Proficiency testing

c. External quality assessment

d. Pre-pre analytical variable assay


1.  Westgard J.O. Assuring the Right Quality Right Good Laboratory Practices for Verifying the Attainment of the Intended Quality of Test Results. Madison, WI : Westgard QC 2007.

2.  Becich M.J. Information management moving from test results to clinical information. Clin Leadership Manag Rev; 2000; 14:296-300.

3.  Clinical and Laboratory Standards Institute (CLSI). Quality Management System Continual Improvement, Approved Guideline. 3rd ed. Wayne, PA : CLSI document QMS06–A3 2011.

4.  Clinical and Laboratory Standards Institute (CLSI). Procedures for the Collection of Diagnostic Blood Specimens by Venipuncture, Approved Standard. 6th ed. Wayne, PA : CLSI document H3–A6 2007.

5.  Stroobants A.K, Goldschmidt H.M, Plebani M. Error budget calculations in laboratory medicine linking the concepts of biological variation and allowable medical errors. Clin Chim Acta; 2003; 333:169-176.

6.  Laposata M.E, Laposata M, Van Cott E.M, et al. Physician survey of a laboratory medicine interpretive service and evaluation of the influence of interpretations on laboratory test ordering. Arch Pathol Lab Med; 2004; 128:1424-1427.

7.  Hickner J, Graham D.G, Elder N.C, et al. Testing practices a study of the American Academy of Family Physicians National Research Network. Qual Saf Health Care; 2008; 17:194-200.

8.  Laposata M, Dighe A. “Pre-pre” and “post-post” analytical error high-incidence patient safety hazards involving the clinical laboratory. Clin Chem Lab Med; 2007; 45:712-719.

9.  Isaac S, Michael W.B. Handbook in Research and Evaluation A Collection of Principles, Methods, and Strategies useful in the Planning, Design, and Evaluation of Studies in Education and the Behavioral Sciences. 3rd ed. San Diego : EdITS Publishers 1995; 262.

10.  Huck S.W. Reading Statistics and Research. 6th ed. New York : Pearson 2011; 592.

11.  Fritsma G.A, McGlasson D.L. Quick Guide to Laboratory Statistics and Quality Control. Washington, DC : AACC Press 2012.

12.  McGlasson D.L, Plaut D, Shearer C. Statistics for the Hemostasis Laboratory, CD-ROM. McLean, VA : ASCLS Press 2006.

13.  Westgard J.O. Basic Method Validation. 3rd ed. Madison, Wisc : Westgard QC 2008.

14.  Clinical and Laboratory Standards Institute (CLSI). Method Comparison and Bias Estimation using Patient Samples; Approved Guideline. 2nd ed. Wayne, PA : CLSI document EP09–A2 2010 (interim revision)

15.  McGlasson D, Plaut D, Shearer C. Statistics for the Hemostasis Laboratory. McLean, VA : ASCLS Press 2006.

16.  Clinical Laboratory Improvement Amendments (CLIA). Calibration and Calibration Verification. Brochure No. 3. Available at: Accessed 24.05.13.

17.  Clinical and Laboratory Standards Institute (CLSI). Laboratory Instrument Implementation, Verification, and Maintenance; Approved Guideline. Wayne, PA : CLSI document GP31–A 2009.

18.  Bartz A.E. Basic Statistical Concepts. 3rd ed. New York : Macmillan Publishing 1988.

19.  Bland J.M, Altman D.G. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet; 1986; 327:307-310.

20.  Westgard J.O, Quam E.F, Barry P.L, et al. Basic QC Practices. 3rd ed. Madison, WI : Westgard QC 2010.

21.  Koepke J.A, Dotson M.A, Shifman M.A. A critical evaluation of the manual/visual differential leukocyte counting method. Blood Cells; 1985; 11:173-186.

22.  Henderson M.P.A, Cotten S.W, Rogers M.W, et al. Method evaluation. In: Bishop M.L, Fody E.P, Schoeff L.E. Clinical Chemistry: Principles, Techniques, and Applications. 7th ed. Philadelphia : Lippincott Williams & Wilkins 2013.

23.  Schork M.A, Remington R.D. Statistics with Applications to the Biological and Health Sciences. 3rd ed. Upper Saddle River, NJ : Prentice-Hall 2000.

24.  Data Innovations EP Evaluator. Available at: Accessed 25.05.13.

25.  Burns C, More L. Quality assessment in the hematology laboratory. In: McKenzie S.B, Williams J.L. Clinical Laboratory Hematology. Boston : Prentice-Hall 2010.

26.  Clinical and Laboratory Standards Institute (CLSI). Defining, Establishing and Verifying Reference Intervals in the Clinical Laboratory; Approved Guideline. 3rd ed. Wayne, PA : CLSI document C28–A3 2010.

27.  Malone B. The quest for pediatric reference ranges: why the national children’s study promises answers. Clin Lab News; 2012; 38:3-4.

28.  Krejcie R.V, Morgan D.W. Determining sample size for research activities. Educ Psychological Measurement; 1970; 30:607-610.

29.  Bachner P. Quality assurance in hematology. In: Howanitz J.F, Howanitz J.H. Laboratory Quality Assurance. New York : McGraw-Hill 1987.

30.  Levey S, Jennings E.R. The use of control charts in the clinical laboratory. Am J Clin Pathol; 1950; 20:1059-1066.

31.  Westgard J.O. Basic QC Practices. 3rd ed. Madison, WI : Westgard QC 2010.

32.  Westgard J.O, Barry P.L, Hunt M.R, et al. A multi-rule Shewhart chart for quality control in clinical chemistry. Clin Chem; 1981; 27:493-501.

33.  Bull B.S, Elashoff R.M, Heilbron D.C, et al. A study of various estimators for the derivation of quality control procedures from patient erythrocyte indices. Am J Clin Pathol; 1974; 61:473-481.

34.  Cembrowski G.S, Smith B, Tung D. Rationale for using insensitive quality control rules for today’s hematology analyzers. Int J Lab Hematol; 2010; 32:606-615.

35.  Ovens K, Naugler C. How useful are delta checks in the 21st century? A stochastic-dynamic model of specimen mix-up and detection. J Pathol Inform; 2012; 3:5.

36.  Lind S.E. The bleeding time does not predict surgical bleeding. Blood; 1991; 77:2547-2552.

37.  Gewirtz A.S, Miller M.L, Keys T.F. The clinical usefulness of the preoperative bleeding time. Arch Pathol Lab Med; 1996; 120:353-356.

38.  Tripepi G, Jager K.J, Dekker F.W, Zoccali C. Measures of effect in epidemiological research. Clin Practice; 2010; 115:91-93.

39.  Replogle W.H, Johnson W.D. Interpretation of absolute measures of disease risk in comparative research. Fam Med; 2007; 39:432-435.

40.  Fardy J.M. Evaluation of diagnostic tests. Methods Mol Biol; 2009; 473:127-136.

41.  Søreide K. Receiver-operating characteristic curve analysis in diagnostic, prognostic and predictive biomarker research. J Clin Pathol; 2009; 62:1-5.

42.  Novis D.A. Detecting and preventing the occurrence of errors in the practices of laboratory medicine and anatomic pathology: 15 years’ experience with the College of American Pathologists’ Q-PROBES and Q-TRACKS programs. Clin Lab Med; 2004; 24:965-978.

43.  Winkelman J.W, Mennemeyer S.T. Using patient outcomes to screen for clinical laboratory errors. Clin Lab Manage Rev; 1996; 10:139-142.

44.  Westgard J.O, Ehrmeyer S.S, Darcy T.P. CLIA Final Rule for Quality Systems, Quality Assessment Issues and Answers. Madison, WI : Westgard QC 2004.

45.  Howanitz P.J, Hoffman G.G, Schifman R.B, et al. A nationwide quality assurance program can describe standards for the practice of pathology and laboratory medicine. Qual Assur Health Care; 1992; 3:245-256.

*The author acknowledges David McGlasson, M.S., MLS (ASCP)CM, Clinical Research Scientist, Wilford Hall Ambulatory Surgical Center, JBSA, San Antonio, Texas, for his assistance in preparing this chapter.