Robert D. Becher, J. Wayne Meredith, and Patrick D. Kilgo
INTRODUCTION
Traumatic accidents have long been classified in terms of their severity. The world’s oldest known surgical document, the Edwin Smith Surgical Papyrus (ca. 17th century BC), classified 48 traumatic injuries from ancient Egyptian battlefields and construction sites as successfully treatable, possibly curable, or untreatable.1 Such predictions about patient outcomes, and attempts to quantify the severity of traumatic accidents, are today the realm of injury severity scores (ISSs).
Trauma injury severity scoring quantifies the risk of an outcome following trauma. Injury scoring provides a single metric based on elements of clinical acumen and statistical theory to describe aspects of the patient condition after a traumatic incident. The primary outcome of interest is usually survival, though the outcome can be whatever one wants to measure: hospital or ICU length of stay (LOS), a vital sign such as blood pressure, performance of a procedure, or any other endpoint of interest.
Clinically, these “scores” assist in the prehospital triage of trauma patients and can help to more accurately predict patient outcomes to assist with clinical decision making, especially at the end of life. In the outcomes research setting, ISSs allow valid comparisons between disparate groups, which in turn can be translated into myriad applications: quality improvements in patient care, advancements in trauma systems and health care delivery, enhancements in injury prevention, valid benchmarking and quality control “report cards,” and epidemiological studies of trauma, among others.
Outcomes research is defined as a method of creating “empirically verified information” to better understand how variables in the real-world setting (from injury to treatment) affect a wide range of outcome variables (from mortality to satisfaction with care).2 Because outcomes are the product of many influences, the outcomes researcher must isolate the effects he or she wants to study from the effects of other “noisy” factors that can influence the outcome. This is called risk adjustment, or “case mix” adjustment, and is essential for proper outcomes analysis.
In trauma outcomes research, trauma ISSs are the essential tools for stratified risk adjustment, thereby allowing accurate comparisons among disparate patient populations with varied degrees of risk. The goal is to compare populations with similar degrees of traumatic injury so that other risk factors (time to treatment, mechanism, injury prevention equipment, etc.) may be properly isolated to examine their relationship to particular outcomes. Risk adjustment might be as simple as defining classes of a variable to stratify risk groups or as complicated as using a risk adjustor in a multivariable regression model.3
This chapter provides a background into injury severity scoring and outcomes research, reviewed in three sections. The first section, Injury Coding, discusses the two major schemes used to classify traumatic injury in the United States, the Abbreviated Injury Scale (AIS) and the International Classification of Diseases (ICD). The second section, Injury Severity Scoring, highlights the major trauma scoring systems used for outcome prediction and risk adjustment. The final section, Outcomes Research, discusses the increasingly important role of outcomes research in the field of trauma, the databases used for such research, and the basic approaches to risk adjustment and statistical analysis.
INJURY CODING
Accurate classification of a patient’s injuries, also known as “injury coding,” is fundamental to the validity and success of severity scoring. This is because ISSs are uniformly based on two classification schemes: the AIS and the ICD (Table 5-1).
TABLE 5-1 Injury Coding/Classification Schemes: A Comparison
The most advanced trauma-specific, anatomically based coding lexicon is the AIS, which was first conceived as a system to define the type and severity of injuries arising from motor vehicle accidents.4 The last major revision to the AIS occurred in 2005,5 with a subsequent update in 2008.6 To calculate AIS scores, medical records of traumatic incidents are transcribed into specific codes that capture individual injuries. AIS is a proprietary classification system and requires specialized training for coding personnel. Therefore, AIS is not captured at every hospital.
The actual AIS code consists of two numerical components. The first component is a six-digit injury descriptor code (“pre-dot”), which is unique to each traumatic injury; pre-dots classify the injury by region, type of anatomic structure, specific structure, and level. The second component is a severity score (“post-dot”), graded from 1 (minor) to 5 (critical injury), with the caveat that all unsurvivable injuries are scored a 6 (Table 5-2); these severity scores, or “AIS severity,” are consensus-derived assessments assigned by a group of experts. Of note, AIS is used as both a classification scheme for injury coding (the pre-dots) and a severity score (the post-dots; see next section).
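As a minimal sketch of the two-part structure just described, the following Python splits an AIS code into its pre-dot descriptor and post-dot severity. The class and function names are hypothetical, the example code string is made up for illustration and is not a real AIS entry, and only the 1 to 6 severity range stated above is accepted.

```python
from typing import NamedTuple


class AISCode(NamedTuple):
    descriptor: str  # six-digit "pre-dot" injury descriptor
    severity: int    # "post-dot" severity: 1 (minor) to 6 (unsurvivable)


def parse_ais(code: str) -> AISCode:
    """Split an AIS code of the form 'dddddd.s' into descriptor and severity."""
    pre, sep, post = code.strip().partition(".")
    if sep != "." or len(pre) != 6 or not pre.isdigit():
        raise ValueError(f"expected a six-digit pre-dot descriptor: {code!r}")
    if len(post) != 1 or post not in "123456":
        raise ValueError(f"expected a post-dot severity of 1-6: {code!r}")
    return AISCode(descriptor=pre, severity=int(post))


# Hypothetical code string, used only to show the format.
parsed = parse_ais("123456.3")
print(parsed.descriptor, parsed.severity)
```

Keeping the descriptor as a string preserves leading zeros and makes it usable as a nominal category, while the severity is kept as an integer for ordinal use.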
TABLE 5-2 AIS Components, Definition of 1–6
The second method to classify traumatic injury is the ICD coding system. ICD is not trauma-specific, but rather is a general, all-purpose diagnosis taxonomy for all health conditions; it is over 110 years old and is currently in its 10th revision (ICD-10),7 although in the United States the 9th revision (ICD-9)8 is most commonly used (a conversion to ICD-10 will be complete by the year 2013). Codes exist for over 10,000 medical conditions, about 2,000 of which are physical injuries (the block of ICD-9 codes from 800.0 to 959.9 encompasses all traumatic injuries). ICD-9 codes are used by all hospitals in the United States, primarily to classify diagnoses for administrative purposes, such as billing and event reporting.
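The injury block quoted above (800.0 to 959.9) lends itself to a simple filter when working with administrative data. The sketch below is illustrative only: the function name is hypothetical, and it assumes diagnosis codes arrive as strings with a decimal point, with non-numeric supplementary codes treated as non-injury diagnoses.

```python
def is_icd9_injury(code: str) -> bool:
    """Return True if an ICD-9 diagnosis code falls in the 800.0-959.9 injury block."""
    try:
        value = float(code.strip())
    except ValueError:
        # Alphanumeric supplementary codes are not in the numeric injury block.
        return False
    return 800.0 <= value <= 959.9


codes = ["805.4", "250.00", "E812.0", "959.9"]
print([c for c in codes if is_icd9_injury(c)])
```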
For the trauma outcomes researcher, AIS codes are generally preferred over ICD-9 because of their greater specificity of injury description (the pre-dot classification). However, as discussed in the next section, valid severity scores can be formulated from either system. Additionally, while the AIS classification scheme attaches an ordinal 1–6 severity level to each injury, ICD-9 codes are only nominal classifications and therefore do not measure the severity of injury.
INJURY SEVERITY SCORES
ISSs quantify the risk of an outcome after trauma, for both clinical and research purposes. The selection of which trauma severity score to use should be based on a clear sense of what one wants to measure and why. The scores vary considerably, from complexity of calculation to ease of use. The majority of scores are based on either the trauma-specific AIS coding classification or the more general ICD-9 taxonomy. However, trauma scoring systems are continuously being revised, tested, and compared to each other, and still today there is no consensus on a “best” injury scoring system.
The trauma outcomes researcher needs to be familiar with the various scoring schemes (Table 5-3) in order to risk adjust the patient population accurately and so best isolate the effects of an independent predictor variable on a dependent outcome variable. In general, four types of risk adjustments (also called “scores”) are calculated to account for trauma severity: (1) Anatomic Injury Scores; (2) Physiological Derangement Scores; (3) Comorbidity Scores; and (4) a combination of the three. Unlike other circumstantial factors (time to treatment, quality of care, etc.), each of these scores is intrinsic to the patient and is therefore important to understand and quantify.
TABLE 5-3 Injury Severity Scores: A Comparison by Type of Score
Anatomic Scoring Systems
Anatomic injury scores are the most developed types of risk adjustment following trauma. Many scores have been proposed in the literature, but this review will be limited to scores that have gained practical acceptance. The majority of scoring algorithms are designed to predict mortality (Table 5-3) and are not specifically validated on other outcomes, such as LOS or functional status, though moderate correlations may exist.
The AIS is not only a method to classify injuries, as described earlier, it is also a validated method to score injury severity. The AIS severity designation (ordinal scale from 1 to 6; Table 5-2) that accompanies each coded injury is the simplest form of a score. The maximum AIS (maxAIS), which is the largest AIS severity among all of a patient’s injuries, is highly associated with mortality but ignores information provided from other injuries.
In 1974, Baker et al. first posited a multiinjury score by introducing the ISS.9 ISS divides the body into six regions: head or neck, face, abdominal, chest, extremities, and external. Injuries in each region are given an AIS score and the highest AIS scores in the 3 most severely injured regions are squared and summed to form the ISS. ISSs have a range from 1 (least severe) to 75 (unsurvivable); higher scores reflect higher likelihood of mortality. Any patient with an AIS severity of 6 is automatically given an aggregate score of 75.
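The ISS calculation described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a validated implementation: the region identifiers and function name are hypothetical, and injuries are assumed to arrive as (region, AIS severity) pairs.

```python
from typing import Dict, Iterable, Tuple

# The six ISS body regions named in the text (identifier spellings are hypothetical).
ISS_REGIONS = ("head_neck", "face", "chest", "abdomen", "extremities", "external")


def injury_severity_score(injuries: Iterable[Tuple[str, int]]) -> int:
    """Sum of squares of the highest AIS severity in the three worst regions."""
    worst_by_region: Dict[str, int] = {}
    for region, severity in injuries:
        if region not in ISS_REGIONS:
            raise ValueError(f"unknown ISS body region: {region!r}")
        if not 1 <= severity <= 6:
            raise ValueError(f"AIS severity must be 1-6, got {severity!r}")
        if severity == 6:
            return 75  # any AIS 6 injury automatically gives an ISS of 75
        worst_by_region[region] = max(worst_by_region.get(region, 0), severity)
    top_three = sorted(worst_by_region.values(), reverse=True)[:3]
    return sum(s * s for s in top_three)


# Two head injuries (AIS 4 and 3), one chest injury (AIS 3), one extremity injury (AIS 2).
example_patient = [("head_neck", 4), ("head_neck", 3), ("chest", 3), ("extremities", 2)]
print(injury_severity_score(example_patient))  # 4^2 + 3^2 + 2^2 = 29
```

Note that the second head injury (AIS 3) does not contribute, because only the worst injury in each region is counted.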
ISS correlates well with mortality and remains the most widely used anatomical scoring system. However, ISS has many limitations.10 ISS is often incorrectly treated as a continuous, monotonic function of mortality, though it is neither (Fig. 5-1).11,12 There are only 44 distinct values of ISS, some of which are possible in two different combinations of sums of squares. Optimally, each combination would be treated nominally (as its own class) in terms of risk adjustment, but in practice this seldom occurs. Furthermore, ISS only considers one injury in each of the body regions and thus ignores important injury information. Because of these shortcomings we continue to believe ISS should be retired and replaced by one of the more modern injury scores that are now available (see below).
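The count of 44 distinct ISS values, and the fact that some values arise from two different severity combinations, can be checked by brute-force enumeration. The short Python sketch below (function name hypothetical) lists every unordered triple of regional maximum severities from 0 (no injury in that region) to 5 and groups the triples by their sum of squares.

```python
from collections import defaultdict
from itertools import combinations_with_replacement
from typing import Dict, List, Tuple


def iss_value_combinations() -> Dict[int, List[Tuple[int, int, int]]]:
    """Map each attainable ISS value to the severity triples that produce it."""
    combos: Dict[int, List[Tuple[int, int, int]]] = defaultdict(list)
    # 0 means "no injury counted"; AIS 6 is excluded because it maps directly to 75.
    for triple in combinations_with_replacement(range(0, 6), 3):
        total = sum(s * s for s in triple)
        if total > 0:
            combos[total].append(triple)
    return dict(combos)


combos = iss_value_combinations()
ambiguous = sorted(value for value, triples in combos.items() if len(triples) > 1)
print(len(combos))   # 44 distinct values, from 1 to 75
print(ambiguous)     # values reachable by two different combinations
print(combos[25])    # [(0, 0, 5), (0, 3, 4)]
```

An ISS of 25, for example, can describe a single critical injury (AIS 5) or a pair of injuries with severities 3 and 4, which is one reason to treat each combination as its own class.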
FIGURE 5-1 ISS versus actual mortality. This graph plots the mortality associated with each ISS value. Of note is the erratic choppiness of the curve, indicating that ISS is not a monotonically increasing function of mortality. It is characterized in places by steep decreases in mortality as ISS gets larger. Ideally, ISS would be considered nominal and not ordinal.
The New Injury Severity Score (NISS) was formulated by Osler et al. to address some of the ISS shortcomings, specifically its omission of multiple occurrences of serious injuries within the same body region.13 NISS is the sum of the squares of the three most severe AIS severities, regardless of body region (and keeping the convention that an AIS of 6 automatically results in a NISS of 75). This permutation offers a slight prediction advantage but has several of the same shortcomings as ISS (Fig. 5-2).
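A minimal Python sketch of the NISS rule stated above follows; the function name is hypothetical and the input is assumed to be a flat list of AIS severities, since body region is ignored.

```python
from typing import Iterable


def new_injury_severity_score(severities: Iterable[int]) -> int:
    """Sum of squares of the three highest AIS severities, regardless of region."""
    values = list(severities)
    for severity in values:
        if not 1 <= severity <= 6:
            raise ValueError(f"AIS severity must be 1-6, got {severity!r}")
    if 6 in values:
        return 75  # same convention as ISS: any AIS 6 gives a score of 75
    top_three = sorted(values, reverse=True)[:3]
    return sum(s * s for s in top_three)


# Two head injuries (AIS 4 and 3), one chest injury (AIS 3), one extremity injury (AIS 2).
print(new_injury_severity_score([4, 3, 3, 2]))  # 4^2 + 3^2 + 3^2 = 34
```

For this patient, counting only the worst injury per body region would give 4^2 + 3^2 + 2^2 = 29; NISS is higher because the second head injury is allowed to contribute.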
FIGURE 5-2 NISS versus actual mortality. This graph plots the mortality associated with each NISS value. The NISS curve is also very nonmonotonic, even more so than ISS.
The Anatomic Profile Score (APS), developed by Copes et al., adjusted for body region differences and AIS severity.11 Three “modified components” are weighted to form a single scalar based on anatomic location of all serious injuries (AIS severity of >3). Although APS represents a logical approach to anatomic scoring, it has failed to supplant ISS.
The International Classification of Diseases Injury Severity Score (ICISS), created by Osler et al., took an empirical estimation approach to injury severity scoring with the formulation of ICD-9 survival risk ratios (SRRs).14 An SRR is an ICD-9 code-specific estimate of the survival probability associated with that particular injury. For a set of patients, the SRR for a particular injury code is the number of patients that survive that injury divided by the number of patients who display the injury. The ICISS score is the product of the SRRs corresponding to a patient’s set of injuries, and ranges from 0 (unsurvivable) to 1 (high likelihood of survival).
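The SRR and ICISS definitions above translate directly into code. The following Python sketch uses a tiny made-up cohort purely for illustration (the patient records are toy data, not results from any registry), and the function names are hypothetical.

```python
from collections import Counter
from math import prod
from typing import Dict, Iterable, List, Tuple


def survival_risk_ratios(patients: Iterable[Tuple[List[str], bool]]) -> Dict[str, float]:
    """SRR per code: patients surviving with that code / patients with that code."""
    seen: Counter = Counter()
    survived_with: Counter = Counter()
    for codes, survived in patients:
        for code in set(codes):  # count each patient once per code
            seen[code] += 1
            if survived:
                survived_with[code] += 1
    return {code: survived_with[code] / seen[code] for code in seen}


def iciss(codes: Iterable[str], srr: Dict[str, float]) -> float:
    """ICISS: product of the SRRs for a patient's injury codes."""
    return prod(srr[code] for code in codes)


# Toy cohort: (ICD-9 injury codes, survived?). Values are illustrative only.
cohort = [
    (["807.0"], True),
    (["807.0", "860.0"], True),
    (["807.0", "860.0"], False),
    (["860.0"], True),
    (["851.0"], False),
    (["851.0", "807.0"], True),
]
srr = survival_risk_ratios(cohort)
print(srr["807.0"])                    # 3 of 4 patients survived: 0.75
print(iciss(["807.0", "851.0"], srr))  # 0.75 * 0.5 = 0.375
```

Because each SRR here is computed from all patients carrying the code, including those with other injuries, the toy example also shows the "contamination" issue: the product behaves as a score rather than a true survival probability.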
ICISS offers several advantages over other anatomic scores. First, because of its ICD-9 base coding lexicon, it can be used in any clinical setting, including smaller centers that typically do not perform AIS coding. Second, unlike the consensus-derived AIS severity scores, ICISS’ empirical approach means that powerful statistical estimates of injury-specific survival can be computed if enough representative patients are available for study. Consequently, unlike ISS and NISS, ICISS is a smooth, if nonlinear, function of mortality (Fig. 5-3).
FIGURE 5-3 ICISS versus actual mortality. ICISS, unlike ISS and NISS, has a very smooth association with mortality, though it too is nonlinear. In the places where ICISS mortality decreases from one value to the next, the decrease is very slight, never more than about 7% and corrects itself quickly. Contrast these small decreases with the decreases seen in ISS and NISS, which can be as large as 20% from one value to the next and 30% in the span of two values.
However, ICISS does have limitations. First, although it resembles an overall probability, ICISS can only be considered a scalar since most SRRs are “contaminated” by patients with multiple injuries. Independent SRRs can be calculated from patients who only have an isolated injury, but these are not available for all codes because many injuries rarely occur in isolation.15 Second, SRRs are database-specific and the degree to which they are applicable within disparate populations remains uncertain.16
ICD-9 codes are nominal, meaning they are unordered, qualitative categories not ranked by severity. If one ignores the AIS severity score, AIS codes can also be treated nominally, taking advantage of their specificity in injury classification. As such, AIS injury descriptor codes can be used to create SRRs, similar to the SRR calculated from ICD-9 codes for ICISS. AIS-based SRRs are used for the TRAIS score (Trauma Registry Abbreviated Injury Score), which is the product of AIS-derived SRRs. Kilgo et al. showed that ICISS and TRAIS behave very similarly in a large group of patients coded both ways (Fig. 5-4) and that TRAIS out-predicts its AIS counterparts ISS, NISS, and APS.17
FIGURE 5-4 TRAIS and ICISS by mortality rate. ICISS and TRAIS behave very similarly in terms of their association with mortality (the vertical axis) despite being derived from two very different types of codes. This suggests that empirical approaches might obviate the inherent structure of the coding systems.
Trauma clinicians, outcomes researchers, and hospital administrators may ask: which of these approaches is the best? There is no consensus, and many publications each year continue to debate this question.
Several large studies, including Sacco et al. and Meredith et al., compared these anatomic scores in terms of their ability to predict mortality.18,19 Both studies found that APS and ICISS better discriminate survivors from nonsurvivors than ISS, NISS, and the ICDMAP versions of ISS, NISS, and APS. A surprising finding was that maxAIS performed better than its multiinjury counterparts ISS and NISS. Based on this result, Kilgo et al. showed that the patient’s worst injury, regardless of the coding lexicon (ICD-9 or AIS) or the estimation approach (AIS severity-consensus or empirical SRRs), was a better predictor of mortality than multiinjury scores, though there remains no consensus on this.17 More recently, however, Harwood et al. found that NISS was better than the ISS and equivalent to the maxAIS in the prediction of mortality in blunt trauma patients.20
Finally, in 1987, the American Association for the Surgery of Trauma (AAST) introduced the AAST Organ Injury Scale (OIS).21 The goal of the scale was not to predict outcomes, but to standardize the descriptive language of injuries to improve communication between trauma surgeons and physicians. Like AIS, the OIS provides an ordinal scale to each level of organ disfigurement, with Grade 1 injuries being relatively minor and Grade 5 injuries being destructive injuries that are thought to be fatal. These scales, originally developed by Moore et al. via a series of journal articles, exist for 32 organ and body region systems.22–27
Although descriptions using this lexicon are common, the scale has not been widely adopted into formal risk adjustment methods. The potential exists for these scales to make an enduring impact on outcomes research. The validation of OIS should be carried out with a large representative database.
Physiological Scoring Systems
Physiological status is a powerful predictor of mortality. Clinical markers, including respiratory rate (RR), systolic blood pressure (SBP), base deficit, and others, are important prognosticators of outcome and are routinely used in clinical management. However, unlike anatomic injuries and preexisting comorbidities, which are fixed at the time of hospital admission, physiological parameters are ever-changing, both spontaneously and in response to therapy. This makes them difficult to utilize in risk adjustment. The solution, even though imperfect and with some exceptions, is to use a “snapshot” of physiological status at one point in time, usually immediately upon emergency department (ED) arrival.
Perhaps the most widely employed physiological adjuster is the Glasgow Coma Scale (GCS), first proposed by Teasdale and co-workers as a means to monitor postoperative craniotomy patients.28,29 The GCS was subsequently adopted by trauma surgeons as a measure of overall physiological derangement. The scale has three components, motor (GCS-M), verbal (GCS-V), and eye (GCS-E), each with ordinal characterizations of severity (Table 5-4). The scales can be summed to produce the Glasgow Coma Score, also abbreviated GCS. The GCS is labeled a measure of brain injury but in actuality it measures brain function. It ranges from 3 (completely unresponsive) to 15 (completely responsive) and has been shown to be highly associated with survival. Osler and co-workers used the National Trauma Data Bank (NTDB) to show that the Glasgow Motor Component was almost as powerful as the full GCS score and had better statistical properties in general.30 As such, the motor score alone could replace the full GCS score.
TABLE 5-4 Descriptors of GCS Components
The Trauma Score, later updated to the Revised Trauma Score (RTS), was designed by Champion et al. as an approach to combining clinical and observational physiological data into one score.31,32 Two forms of the RTS exist, one for triage (Triage-RTS) and one for outcomes evaluation and risk adjustment. Both are based on variable physiological breakpoints for GCS, SBP, and RR (Table 5-5). The Triage-RTS score is calculated by summing the coded values for each of the three variables; it has a minimum score of 0 and maximum of 12.
TABLE 5-5 Revised Trauma Score (RTS) Variable Breakpoints
The RTS equation for outcomes evaluation computes indexed values of GCS, SBP, and RR (Table 5-5) by weighting them with logistic regression coefficients and summing them.
The RTS score ranges from 0 to 7.84; lower scores translate into more physiological derangement. RTS is highly associated with mortality and remains important in injury scoring through its contribution to the TRISS model (see below). Studies have also shown that the combined use of SBP and GCS-M is just as effective at predicting patient survival as the RTS.33
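Both forms of the RTS can be sketched in Python as follows. The breakpoints and regression weights used here are the values commonly published for the RTS; they are not reproduced in the text of this section, so they should be treated as assumptions and checked against Table 5-5 before use. Function and constant names are hypothetical.

```python
# Assumed weights (commonly published RTS coefficients); verify before use.
RTS_WEIGHTS = {"gcs": 0.9368, "sbp": 0.7326, "rr": 0.2908}


def code_gcs(gcs: int) -> int:
    """Coded value (0-4) for a total GCS of 3-15 (assumed breakpoints)."""
    if not 3 <= gcs <= 15:
        raise ValueError("GCS must be between 3 and 15")
    if gcs >= 13:
        return 4
    if gcs >= 9:
        return 3
    if gcs >= 6:
        return 2
    if gcs >= 4:
        return 1
    return 0


def code_sbp(sbp: float) -> int:
    """Coded value (0-4) for systolic blood pressure in mm Hg (assumed breakpoints)."""
    if sbp < 0:
        raise ValueError("SBP cannot be negative")
    if sbp > 89:
        return 4
    if sbp >= 76:
        return 3
    if sbp >= 50:
        return 2
    if sbp >= 1:
        return 1
    return 0


def code_rr(rr: float) -> int:
    """Coded value (0-4) for respiratory rate per minute (assumed breakpoints)."""
    if rr < 0:
        raise ValueError("RR cannot be negative")
    if rr > 29:
        return 3  # tachypnea is coded lower than the normal 10-29 range
    if rr >= 10:
        return 4
    if rr >= 6:
        return 2
    if rr >= 1:
        return 1
    return 0


def triage_rts(gcs: int, sbp: float, rr: float) -> int:
    """Triage-RTS: unweighted sum of the three coded values (0-12)."""
    return code_gcs(gcs) + code_sbp(sbp) + code_rr(rr)


def rts(gcs: int, sbp: float, rr: float) -> float:
    """RTS for outcomes evaluation: weighted sum of the three coded values."""
    return (RTS_WEIGHTS["gcs"] * code_gcs(gcs)
            + RTS_WEIGHTS["sbp"] * code_sbp(sbp)
            + RTS_WEIGHTS["rr"] * code_rr(rr))


print(triage_rts(15, 120, 16), round(rts(15, 120, 16), 2))
```

With these assumed weights, a patient with normal coded values on all three variables reaches the maximum of about 7.84, which matches the range stated above.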
The Acute Physiologic and Chronic Health Evaluation II (APACHE-II) was first introduced in 1985.34 It is calculated from 12 physiological parameters (the worst values within 24 hours of ICU admission), age, and chronic health conditions. It has long been validated for use in both medical and surgical ICU patients, though its use in the trauma intensive care unit (TICU) has been limited and debated. This is because of APACHE-II’s poor correlation with ISS and its inability to predict hospital LOS.35 However, APACHE-II very accurately predicts mortality in the TICU population.36,37 APACHE-II has also been shown to be superior to TRISS and ISS at predicting TICU mortality,38 and we advocate for its use in risk adjustments in critically injured patients.
Comorbidity Scoring Systems
Trauma outcomes research has long recognized the importance of comorbidities on patient outcomes. Morris et al., among others, identified several preexisting conditions that worsen prognosis following trauma, most notably liver cirrhosis, chronic obstructive pulmonary disease (COPD), congenital coagulopathy, diabetes, and congenital heart disease.39 Morbid obesity has now been added to this list.40 Accordingly, specific comorbidity adjustments, such as the Charlson Comorbidity Index (CCI), which are widely used in other disciplines,41 have been incorporated into current injury severity models in attempts to enhance their predictive abilities. Results, however, have been poor.42
The incorporation of preexisting conditions into injury severity models is difficult because so many potential comorbidities exist, each of which may itself occur with variable severity. Further, many are relatively rare, confounded by age, and may be inconsistently recorded. One accepted convention is to simply use patient age as a surrogate for comorbidities because age is moderately associated with serious preexisting disease. Another approach is to use the presence of individual comorbidities or classes of conditions (ICD-9 ranges) in risk adjustment methods. Either of these is acceptable but eventually a generalized score that incorporates all of this information might improve the accuracy of trauma scoring.
One trauma-specific score that adjusts for comorbidities is available, based on adjustments to TRISS.43 The Trauma and Injury Severity Score Comorbidity (TRISSCOM) adjusts the initial TRISS model (see below) to dichotomize age at 65 years old (as opposed to 55 years old) and to include eight comorbidities recorded as a binary yes/no variable if any one of the eight was present in the patient (based on ICD-9 diagnosis ranges: pulmonary disease, cardiac disease, diabetes, coagulopathy/anticoagulation, neurological disease or dementia, hepatic insufficiency, chronic renal insufficiency on dialysis, active neoplasia of the hematological or lymphatic system, or metastatic cancer). The end result was that the TRISSCOM model improved the predictive performance of TRISS but not its ability to discriminate.
Combined Scoring Systems
The three types of risk adjustments (anatomic, physiological, and comorbid) can be easily combined so that information from all three sources is used to predict outcomes. The first such attempt resulted in the Trauma and Injury Severity Score (TRISS).44 TRISS has become the standard tool to estimate survival probabilities. TRISS incorporates ISS (anatomic component), RTS (physiological component), and an age indicator (≤55, >55; comorbidity component) to estimate survival. Two separate equations, one each for blunt and penetrating patients, represent weighted sums of each of the three components; the equations were calculated from data gathered in the Major Trauma Outcomes Study (MTOS).45 From these equations, a probability of survival can be calculated for an individual patient (Table 5-6). This probability (usually called the TRISS Score) can be used as a risk adjustor.
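TABLE 5-6 Equation for TRISS: Probability of Survival = $1/(1 + e^{-\mathrm{LOGIT}})$, where LOGIT is given by: $\mathrm{LOGIT} = \mathrm{Intercept} + \beta_{\mathrm{ISS}} \cdot \mathrm{ISS} + \beta_{\mathrm{RTS}} \cdot \mathrm{RTS} + \beta_{\mathrm{AGE}} \cdot \mathrm{AGE}$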
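The TRISS equation in Table 5-6 can be sketched in Python as shown below. The coefficient container and function names are hypothetical. The numeric blunt-trauma coefficients in the usage example are values commonly cited as MTOS-derived; they do not appear in this text and are included only as an assumption for illustration, so they should be verified against a primary source before any real use. The age indicator follows the ≤55 / >55 split given in the text.

```python
from math import exp
from typing import NamedTuple


class TrissCoefficients(NamedTuple):
    intercept: float
    beta_iss: float
    beta_rts: float
    beta_age: float


def triss_survival_probability(iss: int, rts: float, age_years: float,
                               coef: TrissCoefficients) -> float:
    """Probability of survival = 1 / (1 + e^(-LOGIT)), per Table 5-6."""
    age_indicator = 1 if age_years > 55 else 0  # dichotomized age term
    logit = (coef.intercept
             + coef.beta_iss * iss
             + coef.beta_rts * rts
             + coef.beta_age * age_indicator)
    return 1.0 / (1.0 + exp(-logit))


# ASSUMPTION: commonly cited blunt-mechanism coefficients, not taken from this text.
ASSUMED_BLUNT = TrissCoefficients(intercept=-0.4499, beta_iss=-0.0835,
                                  beta_rts=0.8085, beta_age=-1.7430)

younger = triss_survival_probability(iss=29, rts=7.8408, age_years=40, coef=ASSUMED_BLUNT)
older = triss_survival_probability(iss=29, rts=7.8408, age_years=70, coef=ASSUMED_BLUNT)
print(round(younger, 3), round(older, 3))
```

A separate coefficient set would be supplied for penetrating mechanisms, since the text notes that the blunt and penetrating equations are distinct.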
However, the TRISS approach has shortcomings.46 It requires 8–10 variables (depending on the number of injuries used by ISS); failing to capture even a single predictor renders TRISS incalculable. This is the case in as many as 28% of all trauma cases. TRISS could be improved by replacing ISS with ISS squared or replacing it with a better anatomic predictor, accounting for comorbidities more accurately, and updating the MTOS equations with more modern NTDB coefficients that reflect the advancements made since TRISS first appeared.47,48
Other TRISS-like models aim to account for all three risk adjustments.48 The ASCOT score (A Severity Characterization of Trauma) was introduced to address some weaknesses in TRISS, in particular its poor prediction for certain types of trauma (e.g., penetrating torso trauma) and the reliance upon ISS.49 Like TRISS, ASCOT relies upon anatomic descriptors, emergency department physiological status, age, and mechanism. However, instead of ISS, the Anatomic Profile (AP), which is the basis for the APS score, is used to adjust for anatomic severity.11 Further, age is parsed into five ordinal categories rather than two. Similar to TRISS, all the values are statistically weighted in such a manner as to produce a probability of survival. Although ASCOT provides better predictions than TRISS, it has failed to replace TRISS as the standard survival predictor.
OUTCOMES RESEARCH
Trauma outcomes research was at one point focused solely on predicting patient survival. Today, it has become much more complex, as contemporary trauma outcomes research keeps pace with the changes in the medical and scientific research communities on the whole. This has translated into a move away from a predominant focus on quantitative outcome measures, such as mortality and hospital LOS, and toward much more qualitative and subjective measures, such as health-related quality of life, chronic functional impairment, and quality-adjusted life years (QALYs). These changes reflect a trauma community that has begun to embrace the World Health Organization’s definition of health, which is a “state of complete physical, mental, and social well-being and not merely the absence of disease or infirmity.”50
Current severity scoring systems are inadequate for predicting nonfatal, subjective outcome measures. There is considerable room for growth and advancement in the scores’ ability to predict myriad potential outcomes in trauma patients, such as appropriateness of care, cost-utility, satisfaction with care, and functionality.51–53 This will be essential given the renewed national focus on comparative-effectiveness research (CER).54 To create such tools requires a basic understanding of the process of outcomes research, which is the focus of this section.
Outcomes Research Basics
There are five essential steps in outcomes research as outlined by Kane,2 and each step is to be performed sequentially (Kane RL. Understanding Health Care Outcomes Research. 2nd ed. Sudbury, MA: Jones & Bartlett Learning; 2005. www.jblearning.com. Reprinted with permission):
1. Define a research question
2. Develop a conceptual model
3. Identify the critical dependent and independent variables
4. Identify appropriate measures for each
5. Develop an analysis plan
Each step is critical, though none more so than refining your research question (step 1) by use of a conceptual model (step 2). Such a model (often a drawn diagram) outlines all the determinants thought to influence/cause an outcome, either directly or indirectly. The practice of creating a conceptual model should fully elucidate the multifactorial and multidimensional nature of the outcome under study. As such, the conceptual model is the foundation for outcomes research, and can be simplistically written as follows (and can be amended based on specific outcomes):
Outcomes = f (baseline, patient clinical characteristics, patient demographics, psychosocial characteristics, treatment, setting).2
Quality outcomes research attempts to define the determinants of an outcome in a quantifiable relationship. This therefore depends on quality, comprehensive data collection. Kane maintains that the ultimate goal of outcomes research analysis is to isolate the true relationship between an outcome of interest and its determinants. In order to do this, the researcher must risk adjust the data, meaning he or she must control for the effects of the other relevant variables in the outcomes model (see below). Accordingly, the more accurate the data, the better the risk adjustment, and in turn the more valid the statistical conclusions.
Identifying the Critical Variables
The goal of trauma outcomes research is to discover true relationships between input variables and outcome variables, collectively known as the “critical variables.” To do this requires statistical hypothesis testing, which enables inferences about populations based on samples from those populations. From these samples powerful inferences can be made if studies are properly designed and adequately powered. The statistical “model” is usually the modus operandi for exploring relationships among the critical variables. In general, three types of variables are used in the statistical modeling of data.
Outcome/Dependent Variables
The dependent, or outcome, variable is the one that is described in terms of the other variables (the independent variables) in the model under study. The outcome variables in trauma research include mortality, ICU and hospital LOS, the presence of some complication, functional status, and others. The data type of the outcome (continuous, dichotomous, ordinal, etc.) drives the type of statistical model chosen.
Predictor/Independent Variables
The independent, or predictor, variables are those variables that are hypothesized to influence the outcome of interest (the dependent variable). Independent variables are measured or observed. Examples would be ICD-9 code of an injury or a patient’s preexisting condition.
Covariates
Covariates are variables that are known to influence the study outcome, but whose relationship to the outcome is not of primary interest. Their purpose is to account for as much of the variance in the outcome as possible. Sometimes called “confounders” or “nuisance variables,” covariates are included in the model so that the association between predictors and outcomes is properly ascribed. The significance of the covariates is of no interest; all that matters is the association of the predictors to the outcome in the presence of covariates. In observational and interventional studies, trauma severity scores are usually used as covariates, thereby removing (or adjusting for) the confounding effect coincident with some other predictor of interest.
Analysis and Risk Adjustment Approaches
Unlike in randomized controlled trials, which are controlled experiments under controlled conditions in populations comparable on every level except the intervention being studied (termed “efficacy” studies), outcomes research evaluates the results of interventions and health care processes in real-world conditions (termed “effectiveness” studies). In such studies, patient populations can be vastly different, with varied degrees of injury severity, physiological derangement, and comorbidities. To address these differences, risk adjustments are made that allow accurate comparisons among such disparate patient populations.
Risk adjustment in trauma outcomes research uses the injury severity scoring systems mentioned above. Risk adjustment is increasingly simple because of the advent of large relational databases and powerful, easily implemented statistical software. Researchers interested in risk adjustment should choose carefully which methods best accommodate their data constraints. Here are some factors to consider when planning a risk-adjusted study.
Database Choices
The type of patient database used for research will determine what type of risk adjustments can be made (Table 5-7). Trauma registries, such as the NTDB,55 exist at most verified trauma centers for clinical documentation, research, and quality control purposes. These data include the pertinent medical records outcomes for each patient over a range of variables, including anatomic injury measures, physiological parameters, and comorbidities. In most cases, any of the aforementioned risk adjustments can be made to NTDB data. Absent large amounts of missing data, comprehensive TRISS-like risk models can be fit and probabilities of survival computed, if desired. This situation is optimal because the best available risk adjustments are derived from scoring approaches that use all three types of trauma severity adjustments.
TABLE 5-7 Databases Used in Trauma Outcomes Research and to Populate Injury Severity Scores
Administrative databases, on the other hand, exist primarily for billing purposes and are not meant specifically to be used for clinical research. In many cases, administrative databases will have at least some injury ICD-9 codes and some comorbidity information, but seldom do they have physiological data. Therefore, risk adjustments on administrative data are usually limited to anatomic severity adjustments.56 If only a principal ICD-9 diagnosis code exists, then the worst injury approach is indicated and the SRR for this code is used for adjustments. If a complete set of injury codes is present, the evidence suggests that ICISS should be used.
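As a minimal sketch of the two anatomic adjustments just described, the following Python fragment computes the worst-injury SRR and ICISS for a single patient. It assumes the conventional definition of ICISS as the product of the SRRs of all of a patient's injury codes (see reference 14). The SRR values in the lookup table are made-up placeholders rather than published estimates, and the function and variable names are illustrative only.

```python
from math import prod

# Hypothetical SRR lookup table (ICD-9 code string -> survival risk ratio).
# The values are invented for illustration. In practice each SRR is estimated
# from a reference database as the proportion of patients carrying the code
# who survived.
HYPOTHETICAL_SRR = {
    "807.01": 0.98,
    "860.0": 0.95,
    "864.04": 0.80,
}

def worst_injury_srr(codes, srr_table):
    """Return the lowest SRR among the patient's codes (the worst injury)."""
    return min(srr_table[code] for code in codes)

def iciss(codes, srr_table):
    """Return ICISS, assumed here to be the product of all of the SRRs."""
    return prod(srr_table[code] for code in codes)

patient_codes = ["807.01", "860.0", "864.04"]
print("Worst-injury SRR:", round(worst_injury_srr(patient_codes, HYPOTHETICAL_SRR), 4))
print("ICISS:", round(iciss(patient_codes, HYPOTHETICAL_SRR), 4))
```

When only a principal diagnosis code is available, the list of codes has a single entry and the two functions return the same value.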
Finally, the Trauma Quality Improvement Program (TQIP) uses NTDB-collected data to provide risk-adjusted mortality and morbidity analysis of participating trauma centers to track outcomes and improve patient care.57 Although still in its pilot stages, TQIP will eventually lead to the creation of an entire risk-adjusted database. The risk adjustment will be based on observed outcome to expected outcome ratios (O/E ratio; see below) for both survival and complications.
Risk Adjustment Choices
The use of AIS severities for risk adjustment has the advantage of familiarity, but studies show that SRR approaches account for more variance in the outcome, discriminate dichotomies better, and contain more information. The problem is that conglomerate scores such as TRISS and ASCOT use AIS severities, and no established, empirically based alternative exists. Hence, it is advisable to take empirical approaches such as TRAIS or ICISS when adjusting only with anatomic scores. Otherwise, TRISS-like combined scores that are AIS-based offer a substantial improvement over single anatomic adjustments. Finally, O/E ratios in databases such as TQIP compare observed outcomes with population-based expected outcome probabilities and can be used for risk stratification. Overall, a low O/E ratio indicates a better than expected outcome and a high O/E ratio indicates a poorer than expected outcome.
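The O/E ratio can be illustrated with a short Python sketch. It assumes that the outcome is an adverse event (here, death) coded 1 or 0, and that each patient's expected probability comes from some previously fitted risk model. The data are invented, and the function name is hypothetical rather than part of any TQIP tool.

```python
def oe_ratio(outcomes, expected_probs):
    """Observed event count divided by the sum of expected probabilities."""
    observed = sum(outcomes)
    expected = sum(expected_probs)
    if expected == 0:
        raise ValueError("expected event count is zero; O/E is undefined")
    return observed / expected

# Invented data for one center: 1 = died, 0 = survived.
deaths = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
# Invented model-based probabilities of death for the same ten patients.
predicted_mortality = [0.05, 0.10, 0.60, 0.02, 0.40,
                       0.08, 0.15, 0.30, 0.20, 0.10]

ratio = oe_ratio(deaths, predicted_mortality)
print("O/E ratio:", round(ratio, 2))
```

In this example one death is observed where the model expects about two, so the ratio falls below 1, which corresponds to a better than expected outcome.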
Injury Coding/Classification Choices
Injury classification is based on either AIS or ICD taxonomy. When the variables necessary for TRISS or ASCOT are available, AIS codes should be used. TRAIS may be calculated for any case where AIS codes are present, though it represents only the effect due to anatomic injury. Alternatively, when only ICD-9 codes are available (as with most administrative databases), the literature suggests that the ICISS score be used rather than mapping software. When both types of codes are available, the decision is more difficult and no consensus exists. AIS codes possess more specificity in describing the trauma landscape and have in the past been used for these types of adjustments. However, ICD-9-based scores repeatedly have been shown to possess similar statistical properties. Although most trauma surgeons will prefer AIS scoring, these decisions are usually guided by other facets of the study design.
Evaluation of Trauma Severity Codes
Several statistical criteria are employed when evaluating the efficacy of the trauma severity scores. The choice of the model-based evaluation depends on the data type of the outcome. When the outcome is continuous, multiple linear regression analysis or analysis of variance methods, including model R-square values, information criteria, and tests of significance of risk factors, suffice to evaluate the association of these scores with the outcome.58
If the outcome is dichotomous (i.e., it takes on one of two possible values), then logistic regression is warranted.59,60 Logistic regression has two important functions. First, it establishes the relationships between the outcome and the predictors. Within logistic models, the strength of the association between predictors and outcomes is directly measured and inferences about statistical significance are made. Second, logistic regression returns an estimated probability of the outcome of interest for each patient. This estimated (expected) probability can be compared with the observed outcomes in the following ways:
Tests of Discrimination—a score that discriminates well is able to efficiently separate dichotomies. For example, survivors get accurately classified as survivors with minimal probability for misclassification as nonsurvivors. Popular tests of discrimination include the area under the Receiver Operating Characteristic (ROC) curve and Harrell’s c-index.60,61
Tests of Goodness-of-Fit—These tests measure the degree of agreement between empirically observed and statistically predicted probabilities. The Hosmer–Lemeshow (HL) statistic is probably used the most, but it has severe limitations.60 Many researchers prefer to graph predicted and observed classes in deciles and compare them visually.
Information Criterion Scores—Because models can be compared using different criteria (ROC, HL, etc.) that may disagree among themselves as to which model is preferred, it is desirable to have a mathematically consistent approach to comparing models. Based upon the work of Kullback and Leibler, it is possible to measure the distance between any two models in terms of the amount of information contained in each model.62 In order to compare two models of a system (say, two models predicting death from trauma), it is enough to measure the Kullback–Leibler distance from each putative model to the “true model.” The “true model” is never known (otherwise we would have no interest in modeling it), but by means of a mathematically rigorous sleight of hand it is possible to substitute another measure of information content, the Akaike Information Criterion (AIC), for the Kullback–Leibler distance and avoid the need to explicitly specify the true model.60 Fortuitously, the AIC is a simple function of the likelihood function (available from all standard statistical software) and the number of parameters estimated for the model of interest, and is thus straightforward to calculate. Once the AICs for each model are available, it is a simple matter to order them and to further assign probabilities to each model as to the likelihood that it is, in fact, the true model.63
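The three kinds of evaluation listed above can be sketched in plain Python. The fragment below assumes that each model has already produced a predicted probability of death for every patient. It computes a concordance index (which, for a dichotomous outcome, equals the area under the ROC curve), a grouped table of predicted versus observed rates, and AIC values with the corresponding Akaike weights. The patient data, the parameter count of 2 per model, and all function names are illustrative assumptions. The formulas used are the standard AIC = 2k − 2 ln(L) and weights proportional to exp(−Δ/2).

```python
import math

def c_index(probs, outcomes):
    """Share of (event, non-event) pairs ranked correctly; ties count 0.5."""
    events = [p for p, y in zip(probs, outcomes) if y == 1]
    nonevents = [p for p, y in zip(probs, outcomes) if y == 0]
    if not events or not nonevents:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for pe in events:
        for pn in nonevents:
            if pe > pn:
                score += 1.0
            elif pe == pn:
                score += 0.5
    return score / (len(events) * len(nonevents))

def calibration_table(probs, outcomes, groups=10):
    """Sort by predicted risk, split into groups, and return
    (mean predicted probability, observed event rate) for each group."""
    ranked = sorted(zip(probs, outcomes))
    n = len(ranked)
    table = []
    for g in range(groups):
        chunk = ranked[g * n // groups:(g + 1) * n // groups]
        if chunk:
            mean_pred = sum(p for p, _ in chunk) / len(chunk)
            obs_rate = sum(y for _, y in chunk) / len(chunk)
            table.append((mean_pred, obs_rate))
    return table

def log_likelihood(probs, outcomes):
    """Bernoulli log-likelihood; probabilities must lie strictly in (0, 1)."""
    return sum(math.log(p if y == 1 else 1.0 - p)
               for p, y in zip(probs, outcomes))

def aic(loglik, n_params):
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * loglik

def akaike_weights(aics):
    """Relative support for each model, normalized to sum to 1."""
    best = min(aics)
    raw = [math.exp(-(a - best) / 2.0) for a in aics]
    total = sum(raw)
    return [r / total for r in raw]

# Invented outcomes (1 = died) and predictions from two hypothetical models.
outcomes = [0, 0, 0, 0, 1, 0, 1, 1]
model_a = [0.1, 0.2, 0.1, 0.3, 0.7, 0.4, 0.6, 0.9]
model_b = [0.3, 0.4, 0.2, 0.5, 0.5, 0.6, 0.4, 0.7]

print("c-index A:", round(c_index(model_a, outcomes), 3))
print("c-index B:", round(c_index(model_b, outcomes), 3))
print("Calibration A (2 groups):", calibration_table(model_a, outcomes, groups=2))

# Both models are assumed to have 2 estimated parameters.
aics = [aic(log_likelihood(m, outcomes), 2) for m in (model_a, model_b)]
weights = akaike_weights(aics)
print("AIC:", [round(a, 2) for a in aics])
print("Akaike weights:", [round(w, 3) for w in weights])
```

With real data the grouped table would normally use deciles (groups=10) and far more patients. Two groups are used here only because the invented sample has eight cases.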
REFERENCES
1. Breasted JH. The Edwin Smith Surgical Papyrus: Published in Facsimile and Hieroglyphic Transliteration with Translation and Commentary in Two Volumes. Chicago, IL: The University of Chicago Press, Oriental Institute Publications; 1930.
2. Kane RL. Understanding Health Care Outcomes Research. 2nd ed. Jones & Bartlett; 2005.
3. Iezzoni LI. Risk Adjustment for Measuring Healthcare Outcomes. 3rd ed. Health Administration Press; 2003.
4. Rating the severity of tissue damage. I. The abbreviated scale. JAMA. 1971;215(2):277–280.
5. Association for the Advancement of Automotive Medicine, Committee on Injury Scaling. The Abbreviated Injury Scale 2005. Des Plaines, IL: Committee on Injury Scaling; 2005.
6. AAAM. Abbreviated Injury Scale (AIS) 2005-Update 2008. 1st ed. AAAM; 2008.
7. World Health Organization. The International Statistical Classification of Diseases and Health Related Problems ICD-10. 2nd ed. World Health Organization; 2004.
8. American Medical Association. International Classification of Diseases 9th Revision Clinical Modification ICD-9-CM 2001. Volumes 1 and 2. AMA Press; 2000.
9. Baker SP, O’Neill B, Haddon W, Long WB. The injury severity score: a method for describing patients with multiple injuries and evaluating emergency care. J Trauma. 1974;14(3):187–196.
10. Linn S. The injury severity score—importance and uses. Ann Epidemiol. 1995;5(6):440–446.
11. Copes WS, Champion HR, Sacco WJ, et al. Progress in characterizing anatomic injury. J Trauma. 1990;30(10):1200–1207.
12. Kilgo PD, Meredith JW, Hensberry R, Osler TM. A note on the disjointed nature of the injury severity score. J Trauma. 2004;57(3):479–485; discussion 486–487.
13. Osler T, Baker SP, Long W. A modification of the injury severity score that both improves accuracy and simplifies scoring. J Trauma. 1997;43(6):922–925; discussion 925–926.
14. Osler T, Rutledge R, Deis J, Bedrick E. ICISS: an international classification of disease-9 based injury severity score. J Trauma. 1996;41(3):380–386; discussion 386–388.
15. Meredith JW, Kilgo PD, Osler TM. Independently derived survival risk ratios yield better estimates of survival than traditional survival risk ratios when using the ICISS. J Trauma. 2003;55(5):933–938.
16. Meredith JW, Kilgo PD, Osler T. A fresh set of survival risk ratios derived from incidents in the National Trauma Data Bank from which the ICISS may be calculated. J Trauma. 2003;55(5):924–932.
17. Kilgo PD, Osler TM, Meredith W. The worst injury predicts mortality outcome the best: rethinking the role of multiple injuries in trauma outcome scoring. J Trauma. 2003;55(4):599–606; discussion 606–607.
18. Sacco WJ, MacKenzie EJ, Champion HR, Davis EG, Buckman RF. Comparison of alternative methods for assessing injury severity based on anatomic descriptors. J Trauma. 1999;47(3):441–446; discussion 446–447.
19. Meredith JW, Evans G, Kilgo PD, et al. A comparison of the abilities of nine scoring algorithms in predicting mortality. J Trauma. 2002;53(4):621–628; discussion 628–629.
20. Harwood PJ, Giannoudis PV, Probst C, et al. Which AIS based scoring system is the best predictor of outcome in orthopaedic blunt trauma patients? J Trauma. 2006;60(2):334–340.
21. The American Association for the Surgery of Trauma. AAST Injury Scoring Scale Resource for Trauma Care Professionals. Available at: http://www.aast.org/Library/TraumaTools/InjuryScoringScales.aspx. Accessed March 19, 2010.
22. Moore EE, Shackford SR, Pachter HL, et al. Organ injury scaling: spleen, liver, and kidney. J Trauma. 1989;29(12):1664–1666.
23. Moore EE, Cogbill TH, Malangoni MA, et al. Organ injury scaling, II: pancreas, duodenum, small bowel, colon, and rectum. J Trauma. 1990;30(11):1427–1429.
24. Moore EE, Cogbill TH, Jurkovich GJ, et al. Organ injury scaling. III: chest wall, abdominal vascular, ureter, bladder, and urethra. J Trauma. 1992;33(3):337–339.
25. Moore EE, Malangoni MA, Cogbill TH, et al. Organ injury scaling. IV: thoracic vascular, lung, cardiac, and diaphragm. J Trauma. 1994;36(3):299–300.
26. Moore EE, Jurkovich GJ, Knudson MM, et al. Organ injury scaling. VI: extrahepatic biliary, esophagus, stomach, vulva, vagina, uterus (nonpregnant), uterus (pregnant), fallopian tube, and ovary. J Trauma. 1995;39(6):1069–1070.
27. Moore EE, Malangoni MA, Cogbill TH, et al. Organ injury scaling VII: cervical vascular, peripheral vascular, adrenal, penis, testis, and scrotum. J Trauma. 1996;41(3):523–524.
28. Teasdale G, Murray G, Parker L, Jennett B. Adding up the Glasgow Coma Score. Acta Neurochir Suppl (Wien). 1979;28(1):13–16.
29. Segatore M, Way C. The Glasgow Coma Scale: time for change. Heart Lung. 1992;21(6):548–557.
30. Healey C, Osler TM, Rogers FB, et al. Improving the Glasgow Coma Scale score: motor score alone is a better predictor. J Trauma. 2003;54(4):671–678; discussion 678–680.
31. Champion HR, Sacco WJ, Carnazzo AJ, Copes W, Fouty WJ. Trauma score. Crit Care Med. 1981;9(9):672–676.
32. Champion HR, Sacco WJ, Copes WS, et al. A revision of the Trauma Score. J Trauma. 1989;29(5):623–629.
33. Oyetunji T, Crompton JG, Efron DT, et al. Simplifying physiologic injury severity measurement for predicting trauma outcomes. J Surg Res. 2010;159(2):627–632.
34. Knaus WA, Draper EA, Wagner DP, Zimmerman JE. APACHE II: a severity of disease classification system. Crit Care Med. 1985;13(10):818–829.
35. McAnena OJ, Moore FA, Moore EE, et al. Invalidation of the APACHE II scoring system for patients with acute trauma. J Trauma. 1992;33(4):504–506; discussion 506–507.
36. Rutledge R, Fakhry S, Rutherford E, Muakkassa F, Meyer A. Comparison of APACHE II, Trauma Score, and Injury Severity Score as predictors of outcome in critically injured trauma patients. Am J Surg. 1993;166(3):244–247.
37. Aslar AK, Kuzu MA, Elhan AH, Tanik A, Hengirmen S. Admission lactate level and the APACHE II score are the most useful predictors of prognosis following torso trauma. Injury. 2004;35(8):746–752.
38. Dossett LA, Redhage LA, Sawyer RG, May AK. Revisiting the validity of APACHE II in the trauma ICU: improved risk stratification in critically injured adults. Injury. 2009;40(9):993–998.
39. Morris JA, MacKenzie EJ, Edelstein SL. The effect of preexisting conditions on mortality in trauma patients. JAMA. 1990;263(14):1942–1946.
40. Byrnes MC, McDaniel MD, Moore MB, Helmer SD, Smith RS. The effect of obesity on outcomes among injured patients. J Trauma. 2005;58(2):232–237.
41. Needham DM, Scales DC, Laupacis A, Pronovost PJ. A systematic review of the Charlson Comorbidity Index using Canadian administrative databases: a perspective on risk adjustment in critical care research. J Crit Care. 2005;20(1):12–19.
42. Gabbe BJ, Magtengaard K, Hannaford AP, Cameron PA. Is the Charlson Comorbidity Index useful for predicting trauma outcomes? Acad Emerg Med. 2005;12(4):318–321.
43. Bergeron E, Rossignol M, Osler T, Clas D, Lavoie A. Improving the TRISS methodology by restructuring age categories and adding comorbidities. J Trauma. 2004;56(4):760–767.
44. Boyd CR, Tolson MA, Copes WS. Evaluating trauma care: the TRISS method. Trauma Score and the Injury Severity Score. J Trauma. 1987;27(4):370–378.
45. Champion HR, Copes WS, Sacco WJ, et al. The Major Trauma Outcome Study: establishing national norms for trauma care. J Trauma. 1990;30(11):1356–1365.
46. Gabbe BJ, Cameron PA, Wolfe R. TRISS: does it get better than this? Acad Emerg Med. 2004;11(2):181–186.
47. Osler TM, Rogers FB, Badger GJ, et al. A simple mathematical modification of TRISS markedly improves calibration. J Trauma. 2002;53(4):630–634.
48. Kilgo PD, Meredith JW, Osler TM. Incorporating recent advances to make the TRISS approach universally available. J Trauma. 2006;60(5):1002–1009.
49. Champion HR, Copes WS, Sacco WJ, et al. A new characterization of injury severity. J Trauma. 1990;30(5):539–545; discussion 545–546.
50. World Health Organization. WHO Constitution. Available at: http://www.who.int/governance/eb/constitution/en/index.html. Accessed March 23, 2010.
51. Lee CN, Ko CY. Beyond outcomes—the appropriateness of surgical care. JAMA. 2009;302(14):1580–1581.
52. Holtslag HR, van Beeck EF, Lindeman E, Leenen LPH. Determinants of long-term functional consequences after major trauma. J Trauma. 2007;62(4):919–927.
53. Schluter PJ, Neale R, Scott D, Luchter S, McClure RJ. Validating the functional capacity index: a comparison of predicted versus observed total body scores. J Trauma. 2005;58(2):259–263.
54. Iglehart JK. Prioritizing comparative-effectiveness research—IOM recommendations. N Engl J Med. 2009;361(4):325–328.
55. American College of Surgeons: Trauma Programs: National Trauma Data Bank (NTDB). Available at: http://www.facs.org/trauma/ntdb/index.html. Accessed March 22, 2010.
56. Clark DE, Winchell RJ. Risk adjustment for injured patients using administrative data. J Trauma. 2004;57(1):130–140; discussion 140.
57. American College of Surgeons: Trauma Programs: National Trauma Data Bank (NTDB) Trauma Quality Improvement Program (TQIP). Available at: http://www.facs.org/trauma/ntdb/tqip.html. Accessed March 22, 2010.
58. Kutner M, Nachtsheim C, Neter J, Li W. Applied Linear Statistical Models. 5th ed. McGraw-Hill/Irwin; 2004.
59. Hosmer DW, Lemeshow S. Applied Logistic Regression. 2nd ed. Wiley-Interscience; 2000.
60. Harrell FE. Regression Modeling Strategies. Corrected. Springer; 2001.
61. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology. 1982;143(1):29–36.
62. Kullback S, Leibler R. On information and sufficiency. Ann Math Statist. 1951;22:79.
63. Burnham KP. Model Selection and Multi-Model Inference: A Practical Information-Theoretic Approach. New York: Springer; 2010.