﻿ Statistical Methods for Multiple Variables - Basic & Clinical Biostatistics, 4th Edition

## Basic & Clinical Biostatistics, 4th Edition

### 10. Statistical Methods for Multiple Variables

KEY CONCEPTS

- The choice of statistical methods depends on the research question, the scales on which the variables are measured, and the number of variables to be analyzed.
- Several methods can be used to select variables in a multivariate regression.
- Polynomial regression can be used when the relationship is curvilinear.
- Many of the advanced statistical procedures can be interpreted as an extension or modification of multiple regression analysis.
- Cross-validation tells us how applicable the model will be if we use it in another sample of subjects.
- Many of the statistical methods used for questions with one independent variable have direct analogies with methods for multiple independent variables.
- A good rule of thumb is to have ten times as many subjects as variables.
- The term “multivariate” is used when more than one independent variable is analyzed.
- Analysis of covariance controls for confounding variables; it can be used as part of analysis of variance or in multiple regression.
- Multiple regression is a simple and ideal method to control for confounding variables.
- Logistic regression predicts a nominal outcome; it is the most widely used regression method in medicine.
- Multiple regression coefficients indicate whether the relationship between the independent and dependent variables is positive or negative.
- The regression coefficients in logistic regression can be transformed to give odds ratios.
- Dummy, or indicator, coding is used when nominal variables are used in multiple regression.
- The Cox model is the multivariate analogue of the Kaplan–Meier curve; it predicts time-dependent outcomes when there are censored observations.
- Regression coefficients indicate the amount of change in the dependent variable for each one-unit change in the X variable, holding other independent variables constant.
- The Cox model is also called the proportional hazard model; it is one of the most important statistical methods in medicine.
- Multiple regression measures a linear relationship only.
- Meta-analysis provides a way to combine the results from several studies in a quantitative way and is especially useful when studies have come to opposite conclusions or are based on small samples.
- The Multiple R statistic is the best indicator of how well the model fits the data, that is, how much variance is accounted for.
- An effect size is a measure of the magnitude of differences between two groups; it is a useful concept in estimating sample sizes.
- Several methods are available when the goal is to classify subjects into groups.
- The Cochrane Collection is a set of very well designed meta-analyses and is available at libraries and online.
- Multivariate analysis of variance, or MANOVA, is analogous to using ANOVA when there are several dependent variables.

PRESENTING PROBLEMS

Presenting Problem 1

In Chapter 8 we examined the study by Jackson and colleagues (2002) who evaluated the relationship between BMI and percent body fat. Please refer to that chapter for more details on the study. We found a significant relationship between these two measures and calculated a correlation coefficient of r = 0.73. These investigators knew, however, that variables other than BMI may also affect the relationship between BMI and percent body fat and developed separate models for men and women. We use their data in this chapter to illustrate two important procedures: multiple regression to control possible confounding variables, and polynomial regression to model the nonlinear relationship we noted in Chapter 8. Data are on the CD-ROM in a file entitled “Jackson.”

Presenting Problem 2

Soderstrom and coinvestigators (1997) wanted to develop a model to identify trauma patients who are likely to have a blood alcohol concentration (BAC) in excess of 50 mg/dL. They evaluated data from a clinical trauma registry and toxicology database at a level I trauma center. Such patients might be candidates for alcohol and drug abuse and dependence treatment and intervention programs.

Data, including BAC, were available on 11,062 patients of whom approximately 71% were male and 65% were white. The mean age was 35 years with a standard deviation of 17 years. Type of injury was classified as unintentional, typically accidental (78.2%), or intentional, including suicide attempts (21.8%). Of these patients, 3180 (28.7%) had alcohol detected in the blood, and 91.2% of those patients had a BAC in excess of 50 mg/dL. Among the patients with a BAC > 50, percentages of men and whites did not differ appreciably from the entire sample; however, the percentage of intentional injuries in this group was higher (28.9%). We use a random sample of data provided by the investigators to illustrate the calculation and interpretation of the logistic model, the statistical method they used to develop their predictive model. Data are in a file called “Soderstrom” on the CD-ROM.

Presenting Problem 3

In the previous chapter we used data from a study by Crook and colleagues (1997) to illustrate the Kaplan–Meier survival analysis method. These investigators studied the correlation between both the pretreatment prostate-specific antigen (PSA) and posttreatment nadir PSA levels in men with localized prostate cancer who were treated using external beam radiation therapy. The Gleason histologic scoring system was used to classify tumors on a scale of 2 to 10. Please refer to Chapter 9 for more details. The investigators wanted to examine factors other than tumor stage that might be associated with treatment failure, and we use observations from their study to describe an application of the Cox proportional hazard model. Data on the patients are given in the file entitled “Crook” on the CD-ROM.

Presenting Problem 4

The use of central venous catheters to administer parenteral nutrition, fluids, or drugs is a common medical practice. Catheter-related bloodstream infections (CR-BSI) are a serious complication estimated to occur in about 200,000 patients each year. Many studies have suggested that impregnation of the catheter with the antiseptic chlorhexidine/silver sulfadiazine reduces bacterial colonization, but only one study has shown a significant reduction in the incidence of bloodstream infections.

It is difficult for physicians to interpret the literature when studies report conflicting results about the benefits of a clinical intervention or practice. As you now know, studies frequently fail to find significance because of low power associated with small sample sizes. Traditionally, conflicting results in medicine are dealt with by reviewing many studies published in the literature and summarizing their strengths and weaknesses in what are commonly called review articles. Veenstra and colleagues (1999) used a more structured method to combine the results of several studies in a statistical manner. They applied meta-analysis to 11 randomized, controlled clinical trials, comparing the incidence of bloodstream infection in impregnated catheters versus nonimpregnated catheters, so that overall conclusions regarding efficacy of the practice could be drawn. The section titled “Meta-Analysis” summarizes the results.

PURPOSE OF THE CHAPTER

The purpose of this chapter is to present a conceptual framework that applies to almost all the statistical procedures discussed so far in this text. We also describe some of the more advanced techniques used in medicine.

A Conceptual Framework

The previous chapters illustrated statistical techniques that are appropriate when the number of observations on each subject in a study is limited. For example, a t test is used when two groups of subjects are studied and the measure of interest is a single numerical variable, such as in Presenting Problem 1 in Chapter 6, which discussed differences in pulse oximetry in patients who did and did not have a pulmonary embolism (Kline et al, 2002). When the outcome of interest is nominal, the chi-square test can be used, such as in the Lapidus et al (2002) study of screening for domestic violence in the emergency department (Chapter 6 Presenting Problem 3). Regression analysis is used to predict one numerical measure from another, such as in the study predicting insulin sensitivity in hyperthyroid women (Gonzalo et al, 1996; Chapter 7 Presenting Problem 2).

Alternatively, each of these examples can be viewed conceptually as involving a set of subjects with two observations on each subject: (1) for the t test, one numerical variable, pulse oximetry, and one nominal (or group membership) variable, development of pulmonary embolism; (2) for the chi-square test, two nominal variables, training in domestic violence and screening in the emergency department; (3) for regression, two numerical variables, insulin sensitivity and body mass index. It is advantageous to look at research questions from this perspective because ideas are analogous to situations in which many variables are included in a study.

To practice viewing research questions from a conceptual perspective, let us reconsider Presenting Problem 1 in Chapter 7 by Woeber (2002). The objective was to determine whether differences exist in serum free T4 concentrations in patients who had thyroiditis with normal serum TSH values and were not taking L-T4 replacement, had normal TSH values and were taking L-T4 replacement therapy, or had normal thyroid and serum TSH levels. The research question in this study may be viewed as involving a set of subjects with two observations per subject: one numerical variable, serum free T4 concentrations, and one ordinal (or group membership) variable, thyroid status, with three categories. If only two categories were included for thyroid status, the t test would be used. With more than two groups, however, one-way analysis of variance (ANOVA) is appropriate.

Many problems in medicine have more than two observations per subject because of the complexity involved in studying disease in humans. In fact, many of the presenting problems used in this text have multiple observations, although we chose to simplify the problems by examining only selected variables. One method involving more than two observations per subject has already been discussed: two-way ANOVA. Recall that in Presenting Problem 2 in Chapter 7 insulin sensitivity was examined in overweight and normal weight women with and without hyperthyroid disease (Gonzalo et al, 1996). For this analysis, the investigators classified women according to two nominal variables (weight status and thyroid status, both measured as normal or higher than normal) and one numerical variable, insulin sensitivity. (Although both weight and thyroid level are actually numerical measures, the investigators transformed them into nominal variables by dividing the values into two categories.)

If the term independent variable is used to designate the group membership variables (eg, development of pulmonary embolism or not), or the X variable (eg, blood pressure measured by a finger device), and the term dependent is used to designate the variables whose means are compared (eg, pulse oximetry), or the Y variable (eg, blood pressure measured by the cuff device), the observations can be summarized as in Table 10-1. (For the sake of simplicity, this summary omits ordinal variables; variables measured on an ordinal scale are often treated as if they are nominal.) Data from several of the presenting problems are available on the CD-ROM, and we invite you to replicate the analyses as you go through this chapter.

Table 10-1. Summary of conceptual frameworkᵃ for questions involving two variables.

| Independent Variable | Dependent Variable | Method |
|---|---|---|
| Nominal | Nominal | Chi-square |
| Nominal (binary) | Numerical | t testᵃ |
| Nominal (more than two values) | Numerical | One-way ANOVAᵃ |
| Nominal | Numerical (censored) | Actuarial methods |
| Numerical | Numerical | Regressionᵇ |

ᵃ Assuming the necessary assumptions (eg, normality, independence, etc.) are met.
ᵇ Correlation is appropriate when neither variable is designated as independent or dependent.
ANOVA = analysis of variance.

Introduction to Methods for Multiple Variables

Statistical techniques involving multiple variables are used increasingly in medical research, and several of them are illustrated in this chapter. The multiple-regression model, in which several independent variables are used to explain or predict the values of a single numerical response, is presented first, partly because it is a natural extension of the regression model for one independent variable illustrated in Chapter 8. More importantly, however, all the other advanced methods except meta-analysis can be viewed as modifications or extensions of the multiple-regression model. All except meta-analysis involve more than two observations per subject and are concerned with explanation or prediction.

The goal in this chapter is to present the logic of the different methods listed in Table 10-2 and to illustrate how they are used and interpreted in medical research. These methods are generally not mentioned in traditional introductory texts, and most people who take statistics courses do not learn about them until their third or fourth course. These methods are being used more frequently in medicine, however, partly because of the increased involvement of statisticians in medical research and partly because of the availability of complex statistical computer programs. In truth, few of these methods would be used very much in any field were it not for computers because of the time-consuming and complicated computations involved. To read the literature with confidence, especially studies designed to identify prognostic or risk factors, a reasonable acquaintance with the methods described in this chapter is required. Few of the available elementary books discuss multivariate methods. One that is directed toward statisticians is nevertheless quite readable (Chatfield, 1995); Katz (1999) is intended for readers of the medical literature and contains explanations of many of the topics we discuss in this chapter (Dawson, 2000), as do Norman and Streiner (1996).

Before we examine the advanced methods, however, a comment on terminology is necessary. Some statisticians reserve the term “multivariate” to refer to situations that involve more than one dependent (or response) variable. By this strict definition, multiple regression and most of the other methods discussed in this chapter would not be classified as multivariate techniques. Other statisticians, ourselves included, use the term to refer to methods that examine the simultaneous effect of multiple independent variables. By this definition, all the techniques discussed in this chapter (with the possible exception of some meta-analyses) are classified as multivariate.

MULTIPLE REGRESSION

Review of Regression

Simple linear regression (Chapter 8) is the method of choice when the research question is to predict the value of a response (dependent) variable, denoted Y, from an explanatory (independent) variable X. The regression model is

$$Y = a + bX$$

For simplicity of notation in this chapter we use Y to denote the dependent variable, even though Y′, the predicted value, is actually given by this equation. We also use a and b, the sample estimates, instead of the population parameters, β0 and β1, where a is the intercept and b the regression coefficient. Please refer to Chapter 8 if you'd like to review simple linear regression.

Multiple Regression

The extension of simple regression to two or more independent variables is straightforward. For example, if four independent variables are being studied, the multiple regression model is

$$Y = a + b_1X_1 + b_2X_2 + b_3X_3 + b_4X_4$$

where X1 is the first independent variable and b1 is the regression coefficient associated with it, X2 is the second independent variable and b2 is the regression coefficient associated with it, and so on. This arithmetic equation is called a linear combination; thus, the response variable Y can be expressed as a (linear) combination of the explanatory variables. Note that a linear combination is really just a weighted average that gives a single number (or index) after the X's are multiplied by their associated b's and the bX products are added. The formulas for a and b were given in Chapter 8, but we do not give the formulas in multiple regression because they become more complex as the number of independent variables increases; and no one calculates them by hand, in any case.
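Because the coefficients are always computed by software, a short sketch may help show what the computer is doing. The example below fits a four-variable model by least squares with NumPy. The data, the "true" coefficients, and all variable names are made up for illustration and are not from any study in this chapter.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 subjects, four independent variables X1..X4.
n = 200
X = rng.normal(size=(n, 4))
true_a = 2.0
true_b = np.array([1.5, -0.5, 0.0, 3.0])
Y = true_a + X @ true_b + rng.normal(scale=0.5, size=n)

# Add a column of ones so the intercept a is estimated along with the b's.
design = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
a, b = coef[0], coef[1:]

# The prediction for one subject is the linear combination
# a + b1*X1 + b2*X2 + b3*X3 + b4*X4.
first_prediction = a + np.dot(b, X[0])
print("intercept:", round(a, 3))
print("coefficients:", np.round(b, 3))
```

With this much simulated data the estimates should land close to the values used to generate it, which is all the sketch is meant to show.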

Table 10-2. Summary of conceptual frameworkᵃ for questions involving two or more independent (explanatory) variables.

| Independent Variables | Dependent Variable | Method(s) |
|---|---|---|
| Nominal | Nominal | Log-linear |
| Nominal and numerical | Nominal (binary) | Logistic regression |
| Nominal and numerical | Nominal (2 or more categories) | Logistic regression; Discriminant analysisᵃ; Cluster analysis; Propensity scores; CART |
| Nominal | Numerical | ANOVAᵃ; MANOVAᵃ |
| Numerical | Numerical | Multiple regressionᵃ |
| Nominal and numerical | Numerical (censored) | Cox proportional hazard model |
| Confounding factors | Numerical | ANCOVAᵃ; MANOVAᵃ; GEEᵃ |
| Confounding factors | Nominal | Mantel–Haenszel |
| Numerical only | | Factor analysis |

ᵃ Certain assumptions (eg, multivariate normality, independence, etc.) are needed to use these methods.
CART = classification and regression tree; ANOVA = analysis of variance; ANCOVA = analysis of covariance; MANOVA = multivariate analysis of variance; GEE = generalized estimating equations.

The dependent variable Y must be a numerical measure. The traditional multiple-regression model calls for the independent variables to be numerical measures as well; however, nominal independent variables may be used, as discussed in the next section. To summarize, the appropriate technique for numerical independent variables and a single numerical dependent variable is the multiple regression model, as indicated in Table 10-2.

Multiple regression can be difficult to interpret, and the results may not be replicable if the independent variables are highly correlated with each other. In the extreme situation, two variables that are perfectly correlated are said to be collinear. When multicollinearity occurs, the variances of the regression coefficients are large so the observed value may be far from the true value. Ridge regression is a technique for analyzing multiple regression data that suffer from multicollinearity by reducing the size of standard errors. It is hoped that the net effect will be to give more reliable estimates. Another regression technique, principal components regression, is also available, but ridge regression is the more popular of the two methods.
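To make the effect of multicollinearity concrete, the following sketch compares ordinary least squares with a simple closed-form ridge estimate on simulated data in which one variable is nearly a copy of the other. The data, the penalty value of 1.0, and the helper name `ridge_coefficients` are assumptions chosen only for illustration; statistical packages choose the penalty in more careful ways.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: x2 is almost a copy of x1, so the two are highly correlated.
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)
X = np.column_stack([x1, x2])
y = 1.0 + 2.0 * x1 + 2.0 * x2 + rng.normal(scale=1.0, size=n)

# Center X and y so the intercept drops out and is not penalized.
Xc = X - X.mean(axis=0)
yc = y - y.mean()

def ridge_coefficients(Xc, yc, penalty):
    """Closed-form ridge estimate; penalty = 0 gives ordinary least squares."""
    k = Xc.shape[1]
    return np.linalg.solve(Xc.T @ Xc + penalty * np.eye(k), Xc.T @ yc)

b_ols = ridge_coefficients(Xc, yc, penalty=0.0)
b_ridge = ridge_coefficients(Xc, yc, penalty=1.0)
print("least squares:", np.round(b_ols, 2))
print("ridge:        ", np.round(b_ridge, 2))
```

The least-squares coefficients for the two nearly identical variables can swing far apart from sample to sample, whereas the ridge estimates are pulled toward each other, at the cost of a small bias.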

Interpreting the Multiple Regression Equation

Jackson and colleagues (2002) (Presenting Problem 1) wanted to study the way in which sex, age, and race affect the relationship between BMI and percent body fat. We provide some basic information on these variables in Table 10-3 and see that the study included 121 black females, 238 white females, 81 black men, and 215 white men.

Table 10-4 shows the regression equation to predict percent body fat (see the shaded values). Focusing initially on the Regression Equation Section, we see that all the variables are statistically significantly related to percent body fat.

Table 10-3. Means and standard deviations broken down by gender and race.

| Gender | Race | Statistic | Age | BMI | PCTFAT |
|---|---|---|---|---|---|
| Female | Black | Mean | 32.7770 | 28.1380 | 35.997 |
| | | N | 121 | 121 | 121 |
| | | Standard deviation | 11.35229 | 6.14086 | 8.756 |
| | White | Mean | 34.4032 | 24.8182 | 29.971 |
| | | N | 238 | 238 | 238 |
| | | Standard deviation | 13.79910 | 4.91353 | 9.8447 |
| | Total | Mean | 33.8551 | 25.9371 | 32.002 |
| | | N | 359 | 359 | 359 |
| | | Standard deviation | 13.03256 | 5.57608 | 9.9349 |
| Male | Black | Mean | 34.2526 | 26.9269 | 22.944 |
| | | N | 81 | 81 | 81 |
| | | Standard deviation | 11.97843 | 4.83454 | 7.3195 |
| | White | Mean | 36.4834 | 26.5334 | 22.963 |
| | | N | 215 | 215 | 215 |
| | | Standard deviation | 15.06562 | 4.66455 | 9.0302 |
| | Total | Mean | 35.8730 | 26.6411 | 22.958 |
| | | N | 296 | 296 | 296 |
| | | Standard deviation | 14.30226 | 4.70670 | 8.5839 |
| Total | Black | Mean | 33.3687 | 27.6524 | 30.763 |
| | | N | 202 | 202 | 202 |
| | | Standard deviation | 11.60057 | 5.67188 | 10.4632 |
| | White | Mean | 35.3905 | 25.6322 | 26.645 |
| | | N | 453 | 453 | 453 |
| | | Standard deviation | 14.43550 | 4.86781 | 10.0846 |
| | Total | Mean | 34.7670 | 26.2552 | 27.915 |
| | | N | 655 | 655 | 655 |
| | | Standard deviation | 13.64747 | 5.20919 | 10.3710 |

Source: Data, used with permission, from Jackson AS, Stanforth PR, Gagnon J, Rankinen T, Leon AS, Rao DC, et al: The effect of sex, age and race on estimating percentage body fat from body mass index: The Heritage Family Study. Int J Obes Relat Metab Disord 2002; 26: 789–796. Table produced with SPSS Inc.; used with permission.

The first variable is a numerical variable, age, with regression coefficient, b, of 0.1603, indicating that greater age is associated with higher percent body fat. The second variable, BMI, is also numerical; the regression coefficient of 1.3710 indicates that patients with higher BMI also have higher percent body fat, which certainly makes sense.

The third variable, sex, is a binary variable having two values. For regression models it is convenient to code binary variables as 0 and 1; in the Jackson example, females have a 0 code for sex, and males have a 1. This procedure, called dummy or indicator coding, allows investigators to include nominal variables in a regression equation in a straightforward manner. The dummy variables are interpreted as follows: A subject who is male has the code for males, 1, multiplied by the regression coefficient for sex, -10.2746, resulting in 10.2746 points being subtracted from his predicted percent body fat. The decision of which value is assigned 1 and which is assigned 0 is an arbitrary decision made by the researcher but can be chosen to facilitate interpretations of interest to the researcher.

The final variable is race, also dummy coded, with 0 for black and 1 for white. The regression coefficient is negative and indicates that white patients have 0.9161 subtracted from their percent body fat. The intercept itself is -8.3748, meaning that the predicted percent body fat is reduced by this amount after including all variables in the equation. The regression coefficients can be used to predict percent body fat by multiplying a given patient's value for each independent variable X by the corresponding regression coefficient b and then summing the products together with the intercept to obtain the predicted percent body fat.
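The prediction step can be written out as a few lines of code. The sketch below uses the coefficients reported in the Regression Equation Section of Table 10-4; the function name and the two example patients are hypothetical and are not taken from the study.

```python
# Coefficients copied from the Regression Equation Section of Table 10-4.
INTERCEPT = -8.3748
B_AGE = 0.1603
B_BMI = 1.3710
B_RACE = -0.9161   # dummy code: black = 0, white = 1
B_SEX = -10.2746   # dummy code: female = 0, male = 1

def predict_percent_body_fat(age, bmi, race, sex):
    """Return predicted percent body fat from the Table 10-4 equation.

    race and sex are given as words and converted to 0/1 dummy codes.
    """
    race_code = {"black": 0, "white": 1}[race]
    sex_code = {"female": 0, "male": 1}[sex]
    return (INTERCEPT + B_AGE * age + B_BMI * bmi
            + B_RACE * race_code + B_SEX * sex_code)

# Hypothetical patients: same age, BMI, and race, but different sex.
woman = predict_percent_body_fat(age=40, bmi=25.0, race="white", sex="female")
man = predict_percent_body_fat(age=40, bmi=25.0, race="white", sex="male")
print(round(woman, 2), round(man, 2))
```

Because the two hypothetical patients differ only in the dummy code for sex, their predictions differ by exactly the regression coefficient for sex.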

Table 10-4. Multiple regression predicting percent body fat.

Multiple Regression Report

Run Summary Section

| Parameter | Value | Parameter | Value |
|---|---|---|---|
| Dependent variable | PCTFAT | Rows processed | 655 |
| Number independent variables | 4 | Rows filtered out | 0 |
| Weight variable | None | Rows with X's missing | 0 |
| R2 | 0.8042 | Rows with weight missing | 0 |
| Adj R2 | 0.8030 | Rows with Y missing | 0 |
| Coefficient of variation | 0.1649 | Rows used in estimation | 655 |
| Mean square error | 21.18832 | Sum of weights | 655.000 |
| Square root of MSE | 4.603077 | Completion status | Normal completion |
| Ave Abs Pct Error | 19.089 | | |

Regression Equation Section

| Independent Variable | Regression Coefficient b(i) | Standard Error Sb(i) | T-Value to test H0:B(i)=0 | Prob Level | Reject H0 at 5%? | Power of Test at 5% |
|---|---|---|---|---|---|---|
| Intercept | -8.3748 | 1.0338 | -8.101 | 0.0000 | Yes | 1.0000 |
| Age | 0.1603 | 0.0140 | 11.442 | 0.0000 | Yes | 1.0000 |
| BMI | 1.3710 | 0.0372 | 36.809 | 0.0000 | Yes | 1.0000 |
| Race | -0.9161 | 0.4005 | -2.287 | 0.0225 | Yes | 0.6283 |
| Sex | -10.2746 | 0.3638 | -28.242 | 0.0000 | Yes | 1.0000 |

Regression Coefficient Section

| Independent Variable | Regression Coefficient | Standard Error | Lower 95% C.L. | Upper 95% C.L. | Standardized Coefficient |
|---|---|---|---|---|---|
| Intercept | -8.3748 | 1.0338 | -10.4011 | -6.3486 | 0.0000 |
| Age | 0.1603 | 0.0140 | 0.1328 | 0.1877 | 0.2109 |
| BMI | 1.3710 | 0.0372 | 1.2980 | 1.4440 | 0.6886 |
| Race | -0.91616 | 0.4005 | -1.7011 | -0.1311 | -0.0408 |
| Sex | -10.2746 | 0.3638 | -10.9876 | -9.5616 | -0.4934 |

Note: The T-Value used to calculate these confidence limits was 1.960.

Source: Data, used with permission, from Jackson AS, Stanforth PR, Gagnon J, Rankinen T, Leon AS, Rao DC, et al: The effect of sex, age and race on estimating percentage body fat from body mass index: The Heritage Family Study. Int J Obes Relat Metab Disord 2002; 26: 789–796. Analysis produced using NCSS; used with permission.

Regression coefficients are interpreted differently in multiple regression than in simple regression. In simple regression, the regression coefficient b indicates the amount the predicted value of Y changes each time X increases by 1 unit. In multiple regression, a given regression coefficient indicates how much the predicted value of Y changes each time X increases by 1 unit, holding the values of all other variables in the regression equation constant, as though all subjects had the same value on the other variables. For example, predicted percent body fat increases by 0.1603 for each increase of 1 year in a patient's age, assuming all other variables are held constant. This feature of multiple regression makes it an ideal method to control for baseline differences and confounding variables, as we discuss in the section titled “Controlling for Confounding.”

It bears repeating that multiple regression measures only the linear relationship between the independent variables and the dependent variable, just as in simple regression. In the Jackson study, the authors examined the scatterplot between BMI and percent body fat, which we have reproduced in Figure 10-1. The figure indicates a curvilinear relationship, and the investigators decided to transform BMI by taking its natural logarithm. They developed four models for females and males separately to examine the cumulative effect of including variables in the regression equation; results are reproduced in Table 10-5. Model I includes only ln BMI and the intercept; model II adds age, model III adds race, and model IV adds interactions of ln BMI with race and age. The rationale for including interactions is the same as discussed in Chapter 7, namely that they wanted to know whether the relationship between ln BMI and percent body fat was the same for all levels of race or age.
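The value of a logarithmic transformation can be illustrated with simulated data. In the sketch below, the outcome is generated so that it rises more slowly at higher BMI values; a straight line is then fitted once to BMI and once to ln BMI. The data and the generating values (-100 and 40) are assumptions made up for the illustration, not the Jackson data.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical curvilinear data: percent fat rises with BMI, but more slowly at high BMI.
bmi = rng.uniform(18, 45, size=500)
pct_fat = -100.0 + 40.0 * np.log(bmi) + rng.normal(scale=1.0, size=500)

def r_squared(x, y):
    """R^2 from a simple least-squares line of y on x."""
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    return 1.0 - residuals.var() / y.var()

r2_raw = r_squared(bmi, pct_fat)          # straight line in BMI
r2_log = r_squared(np.log(bmi), pct_fat)  # straight line in ln BMI
print(round(r2_raw, 3), round(r2_log, 3))
```

When the underlying relationship is curved in this way, the line fitted to ln BMI accounts for more of the variance than the line fitted to BMI itself, which is the reason for transforming before using a linear method.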

Figure 10-1. Plot illustrating the nonlinear relationship between BMI and percent body fat. (Data, used with permission, from Jackson AS, Stanforth PR, Gagnon J, Rankinen T, Leon AS, Rao DC, et al: The effect of sex, age, and race on estimating percentage body fat from body mass index: The Heritage Family Study. Int J Obes Relat Metab Disord 2002; 26: 789–796. Analysis produced using NCSS; used with permission.)

Table 10-5. Results from the regression analyses predicting percent body fat.

| Variable | Female I | Female II | Female III | Female IV | Male I | Male II | Male III | Male IV |
|---|---|---|---|---|---|---|---|---|
| Intercept | 107.22ᵃ | 102.01ᵃ | 97.11ᵃ | 82.83ᵃ | 111.13ᵃ | 103.94ᵃ | 104.21ᵃ | 149.24ᵃ |
| ln BMI | 43.05ᵃ | 39.96ᵃ | 38.67ᵃ | 34.43ᵃ | 41.04ᵃ | 37.31ᵃ | 37.35ᵃ | 51.31ᵃ |
| Age | | 0.14ᵃ | 0.15ᵃ | 0.14ᵃ | | 0.14ᵃ | 0.14ᵃ | 1.47ᵃ |
| Raceᵇ | | | -1.63ᵃ | -26.02ᵃ | | | -0.23 | |
| Race × ln BMI | | | | 7.48ᵃ | | | | |
| Age × ln BMI | | | | | | | | -0.41ᵃ |
| r2 | 0.78ᵃ | 0.80 | 0.81ᵃ | 0.82ᵃ | 0.67ᵃ | 0.72ᵃ | 0.72ᵃ | 0.73ᵃ |
| r2Δ | | 0.01ᵃ | 0.01ᵃ | 0.01ᵃ | | 0.05ᵃ | 0.00 | 0.01ᵃ |
| s.e.e. (% fat) | 4.7 | 4.4 | 4.3 | 4.3 | 4.9 | 4.6 | 4.6 | 4.5 |

ᵃ P < 0.001.
ᵇ Key: race, black = 0 and white = 1.
Source: Data, used with permission, from Jackson AS, Stanforth PR, Gagnon J, Rankinen T, Leon AS, Rao DC, et al: The effect of sex, age and race on estimating percentage body fat from body mass index: The Heritage Family Study. Int J Obes Relat Metab Disord 2002; 26: 789–796.

Statistical Tests for the Regression Coefficient

Table 10-6 shows the output from NCSS for model III for female subjects; it contains a number of features to discuss. In the upper half of the table, note the columns headed by t value and probability level. Both the t test and the F test can be used to determine whether a regression coefficient is different from zero, or the t distribution can be used to form confidence intervals for each regression coefficient. Remember that even though the P values are sometimes reported as 0.000, there is always some probability, even if it is very small. Many statisticians believe, and we agree, that it is more accurate to report P < 0.001.
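The t values and confidence limits in the NCSS output follow directly from each coefficient and its standard error. As a minimal sketch, the code below recomputes them from the rounded values printed in Table 10-6, using the critical value of 1.960 that the output reports; small differences from the table are to be expected because the printed inputs are rounded.

```python
# Coefficients and standard errors copied from Table 10-6 (females, model III).
estimates = {
    "Log_BMI": (38.6724, 1.2684),
    "Age": (0.1510, 0.0190),
    "Race": (-1.6308, 0.5125),
}
T_CRITICAL = 1.960  # value the NCSS output reports using for the 95% limits

results = {}
for name, (b, se) in estimates.items():
    t_value = b / se                 # tests H0: coefficient = 0
    lower = b - T_CRITICAL * se      # lower 95% confidence limit
    upper = b + T_CRITICAL * se      # upper 95% confidence limit
    results[name] = (t_value, lower, upper)
    print(f"{name}: t = {t_value:.3f}, 95% CI = ({lower:.4f}, {upper:.4f})")
```

A confidence interval that excludes zero, as all three do here, leads to the same conclusion as rejecting the null hypothesis that the coefficient is zero at the 5% level.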

Standardized Regression Coefficients

Most authors present regression coefficients that can be used with individual subjects to obtain predicted Y values. But the size of the regression coefficients cannot be used to decide which independent variables are the most important, because their size is also related to the scale on which the variables are measured, just as in simple regression. For example, in Jackson and colleagues' study, the variable race was coded 1 if white and 0 if black, and the variable age was coded as the number of years of age at the time of the first data collection. Then, if race and age are equally important in predicting percent body fat, the regression coefficient for race would be much larger than the regression coefficient for age so that the same amount would be added to the prediction of percent body fat. These regression coefficients are sometimes called unstandardized; they cannot be used to draw conclusions about the importance of the variable, but only about whether the relationship with the dependent variable Y is positive or negative.ᵃ

One way to eliminate the effect of scale is to standardize the regression coefficients. Standardization can be done by subtracting the mean value of X and dividing by the standard deviation before analysis, so that all variables have a mean of 0 and a standard deviation of 1. Then it is possible to compare the magnitudes of the regression coefficients and draw conclusions about which explanatory variables play an important role. It is also possible to calculate the standardized regression coefficients after the regression model has been developed.ᵇ The larger the standardized coefficient, the larger the value of the t statistic. Standardized regression coefficients are often referred to as beta (β) coefficients. The major disadvantage of standardized regression coefficients is that they cannot readily be used to predict outcome values.

The lower half of Table 10-6 contains the standardized regression coefficients in the far right column for the variables used to predict percent body fat in Jackson and colleagues' study. Using the standardized coefficients in Table 10-6, can you determine which variable, age or race, has more influence in predicting percent body fat? If you chose age, you are correct, because the absolute value of its standardized coefficient is larger, 0.1981, compared with -0.0777 for race.
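Standardized coefficients can also be obtained after the model is fitted by rescaling each unstandardized coefficient: multiply b by the standard deviation of X and divide by the standard deviation of Y. The sketch below applies this to the female subjects, using the standard deviations in Table 10-3 and the coefficients in Table 10-6. Computing the standard deviation of the race dummy variable from the group counts is our own step and is not shown in the tables.

```python
import math

# Values copied from Table 10-3 (females, total) and Table 10-6.
n_black, n_white = 121, 238
sd_pct_fat = 9.9349
sd_age = 13.03256
b_age, b_race = 0.1510, -1.6308

# Race is a 0/1 dummy variable, so its sample SD follows from the group counts.
n = n_black + n_white
p_white = n_white / n
sd_race = math.sqrt(p_white * (1 - p_white) * n / (n - 1))

# Standardized coefficient = b * (SD of X) / (SD of Y).
beta_age = b_age * sd_age / sd_pct_fat
beta_race = b_race * sd_race / sd_pct_fat
print(round(beta_age, 4), round(beta_race, 4))
```

The results can be compared with the Standardized Coefficient column of Table 10-6 for age and race.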

Table 10-6. Regression analysis of females, model III.

Regression Equation Section

| Independent Variable | Regression Coefficient b(i) | Standard Error Sb(i) | T-Value to test H0:B(i)=0 | Prob Level | Reject H0 at 5%? | Power of Test at 5% |
|---|---|---|---|---|---|---|
| Intercept | -97.1096 | 4.0314 | -24.088 | 0.0000 | Yes | 1.0000 |
| Log_BMI | 38.6724 | 1.2684 | 30.490 | 0.0000 | Yes | 1.0000 |
| Age | 0.1510 | 0.0190 | 7.938 | 0.0000 | Yes | 1.0000 |
| Race | -1.6308 | 0.5125 | -3.182 | 0.0016 | Yes | 0.8875 |

Regression Coefficient Section

| Independent Variable | Regression Coefficient | Standard Error | Lower 95% C.L. | Upper 95% C.L. | Standardized Coefficient |
|---|---|---|---|---|---|
| Intercept | -97.1096 | 4.0314 | -105.0110 | -89.2082 | 0.0000 |
| Log_BMI | 38.6724 | 1.2684 | 36.1864 | 41.1584 | 0.7910 |
| Age | 0.1510 | 0.0190 | 0.1137 | 0.1883 | 0.1981 |
| Race | -1.6308 | 0.5125 | -2.6354 | -0.6262 | -0.0777 |

Note: The T-value used to calculate these confidence limits was 1.960.

Source: Data, used with permission, from Jackson AS, Stanforth PR, Gagnon J, Rankinen T, Leon AS, Rao DC, et al: The effect of sex, age and race on estimating percentage body fat from body mass index: The Heritage Family Study. Int J Obes Relat Metab Disord 2002; 26: 789–796. Table produced with NCSS; used with permission.

Multiple R

Multiple R is the multiple-regression analogue of the Pearson product moment correlation coefficient r. It is also called the multiple correlation coefficient (its square, R2, is the coefficient of multiple determination), but most authors use the shorter term. As an example, suppose percent body fat is calculated for each person in the study by Jackson and colleagues; then, the correlation between predicted percent body fat and the actual percent body fat is calculated. This correlation is the multiple R. If the multiple R is squared (R2), it measures how much of the variation in the actual percent body fat is accounted for by knowing the information included in the regression equation. The term R2 is interpreted in exactly the same way as r2 in simple correlation and regression, with 0 indicating no variance accounted for and 1.00 indicating 100% of the variance accounted for. Recall that in simple regression, the correlation between the actual value Y of the dependent variable and the predicted value, denoted Y′, is the same as the correlation between the dependent variable and the independent variable; that is, $r_{YY'} = r_{XY}$. Thus, R and R2 in multiple regression play the same role as r and r2 in simple regression. The statistical test for R and R2, however, uses the F distribution instead of the t distribution.
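The definition of multiple R as a correlation between actual and predicted values can be checked numerically. The following sketch uses simulated data (not the Jackson data) to compute multiple R that way and then confirms that its square equals the proportion of variance accounted for.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical data: three explanatory variables and one numerical outcome.
n = 300
X = rng.normal(size=(n, 3))
y = 5.0 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=2.0, size=n)

# Fit the regression and compute the predicted values Y'.
design = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
y_pred = design @ coef

# Multiple R is the ordinary correlation between actual and predicted values.
multiple_r = np.corrcoef(y, y_pred)[0, 1]

# R^2 computed the other way: proportion of variance accounted for.
ss_residual = np.sum((y - y_pred) ** 2)
ss_total = np.sum((y - y.mean()) ** 2)
r2_from_variance = 1.0 - ss_residual / ss_total
print(round(multiple_r ** 2, 4), round(r2_from_variance, 4))
```

The two printed values agree for any least-squares model that includes an intercept, which is why R2 can be read as the fraction of variance explained.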

The computations are time-consuming, and fortunately, computers do them for us. Jackson and colleagues included R2 in Table 10-5 (although they used lowercase r2); it was 0.81 for model III (and is also shown in the NCSS output in Table 10-4). After ln BMI, age, and race are entered into the regression equation, R2 = 0.81 indicates that more than 80% of the variability in percent body fat is accounted for by knowing patients' BMI, age, and race. Because R2 is less than 1, we know that factors other than those included in the study also play a role in determining a person's percent body fat.

Selecting Variables for Regression Models

The primary purpose of Jackson and colleagues in their study of BMI and percent body fat was explanation; they used multiple regression analysis to learn how specific characteristics confounded the relationship between BMI and percent body fat. They also wanted to know how the characteristics interacted with one another, such as gender and race. Some research questions, however, focus on the prediction of the outcome, such as using the regression equation to predict percent body fat in future subjects.

Deciding on the variables that provide the best prediction is a process sometimes referred to as model building and is exemplified in Table 10-5. Selecting the variables for regression models can be accomplished in several ways. In one approach, all variables are introduced into the regression equation, called the “enter” method in SPSS and used in the multiple regression procedure in NCSS. Then, especially if the purpose is prediction, the variables that do not have significant regression coefficients are eliminated from the equation. The regression equation may be recalculated using only the variables retained because the regression coefficients have different values when some variables are removed from the analysis.

Computer programs also contain routines to select an optimal set of explanatory variables. One such procedure is called forward selection. Forward selection begins with one variable in the regression equation; then, additional variables are added one at a time until all statistically significant variables are included in the equation. The first variable in the regression equation is the X variable that has the highest correlation with the response variable Y. The next X variable considered for the regression equation is the one that increases R2 by the largest amount. If the increment in R2 is statistically significant by the F test, it is included in the regression equation. This step-by-step procedure continues until no X variables remain that produce a significant increase in R2. The values for the regression coefficients are calculated, and the regression equation resulting from this forward selection procedure can be used to predict outcomes for future subjects. The increment in R2 was calculated by Jackson and colleagues; it is shown as r2Δ in Table 10-5.
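The forward selection steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data, the variable names `x1`, `x2`, `x3`, the helper functions, and the fixed cutoff `f_to_enter = 4.0` (standing in for a proper F test at the 0.05 level) are all hypothetical, and statistical packages implement the procedure with more care.

```python
def ols(X, y):
    """Least-squares coefficients; each row of X starts with 1.0 for the intercept."""
    k = len(X[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, k), key=lambda row: abs(A[row][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def r_squared(cols, y):
    """R^2 for a regression of y on the given predictor columns plus an intercept."""
    n = len(y)
    X = [[1.0] + [c[i] for c in cols] for i in range(n)]
    b = ols(X, y)
    pred = [sum(bj * xj for bj, xj in zip(b, row)) for row in X]
    mean_y = sum(y) / n
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def forward_selection(predictors, y, f_to_enter=4.0):
    """Add one variable at a time while the increase in R^2 passes the cutoff."""
    selected, r2, n = [], 0.0, len(y)
    while len(selected) < len(predictors):
        # Try each remaining variable; keep the one giving the largest R^2.
        trials = {name: r_squared([predictors[s] for s in selected + [name]], y)
                  for name in predictors if name not in selected}
        best = max(trials, key=trials.get)
        k = len(selected) + 1  # number of predictors in the candidate model
        f_stat = (trials[best] - r2) / ((1.0 - trials[best]) / (n - k - 1))
        if f_stat < f_to_enter:  # increment in R^2 too small: stop
            break
        selected.append(best)
        r2 = trials[best]
    return selected, r2


# Hypothetical data: y depends on x1 and x2; x3 is unrelated to y.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0]
x2 = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0, 3.0, 5.0, 8.0]
x3 = [2.0, 7.0, 1.0, 8.0, 2.0, 8.0, 1.0, 8.0, 2.0, 8.0, 4.0, 5.0]
noise = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.3, -0.3, 0.1, 0.2, -0.2]
y = [2.0 * a + 1.5 * b + e for a, b, e in zip(x1, x2, noise)]

selected, r2 = forward_selection({"x1": x1, "x2": x2, "x3": x3}, y)
print(selected, round(r2, 4))
```

Backward elimination and stepwise selection could be sketched the same way by removing, or re-checking, variables at each step.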

A similar backward elimination procedure can also be used; in it, all variables are initially included in the regression equation. The X variable that would reduce R2 by the smallest increment is removed from the equation. If the resulting decrease is not statistically significant, that variable is permanently removed from the equation. Next, the remaining X variables are examined to see which produces the next smallest decrease in R2. This procedure continues until the removal of an X variable from the regression equation causes a significant reduction in R2. That X variable is retained in the equation, and the regression coefficients are calculated.

When features of both the forward selection and the backward elimination procedures are used together, the method is called stepwise regression (stepwise selection). Stepwise selection is commonly used in the medical literature; it begins in the same manner as forward selection. After each addition of a new X variable to the equation, however, all previously entered X variables are checked to see whether they maintain their level of significance. Previously entered X variables are retained in the regression equation only if their removal would cause a significant reduction in R2. The forward versus backward versus stepwise procedures have subtle advantages related to the correlations among the independent variables that cannot be covered in this text. They do not generally produce identical regression equations, but conceptually, all approaches determine a “parsimonious” equation using a subset of explanatory variables.

Some statistical programs examine all possible combinations of predictor values and determine the one that produces the overall highest R2, such as All Possible Regression in NCSS. We do not recommend this procedure, however, and suggest that a more appealing approach is to build a model in a logical way. Variables are sometimes grouped according to their function, such as all demographic characteristics, and added to the regression equation as a group or block; this process is often called hierarchical regression; see exercise 7 for an example. The advantage of a logical approach to building a regression model is that, in general, the results tend to be more stable and reliable and are more likely to be replicated in similar studies.

Polynomial Regression

Polynomial regression is a special case of multiple regression in which each term in the equation is a power of X. Polynomial regression provides a way to fit a regression model to curvilinear relationships and is an alternative to transforming the data to a linear scale. For example, the following equation can be used to predict a quadratic relationship:

$$Y' = a + b_1 X + b_2 X^2$$

If a linear and a quadratic term do not provide an adequate fit, a cubic term, a fourth-power term, and so on, can also be included until an adequate fit is obtained.

Jackson and colleagues (2002) used polynomial regression to fit separate curves for men and women, illustrated in Figure 10-1. Two approaches to polynomial regression can be used. The first method calculates squared terms, cubic terms, and so on; these terms are then entered one at a time using multiple regression. Another approach is to use a program that permits curve fitting, such as the regression curve estimation procedure in SPSS. We used the SPSS procedure to fit a quadratic curve of BMI to percent body fat for women. The regression equation was:

A plot produced by SPSS is given in Figure 10-2.
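The first approach to polynomial regression (computing a squared term and entering it as an additional predictor) can be sketched as follows. The BMI values and the generating curve are hypothetical, chosen only so that the fit can be checked; they are not the Jackson data, and the helper names are ours.

```python
def ols(X, y):
    """Least-squares coefficients for a design matrix X given as a list of rows."""
    k = len(X[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, k), key=lambda row: abs(A[row][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def fit(x, y, degree):
    """Fit Y' = a + b1*X + ... + b_degree*X**degree by multiple regression."""
    X = [[xi ** p for p in range(degree + 1)] for xi in x]
    return ols(X, y)


def predict(coef, xi):
    return sum(c * xi ** p for p, c in enumerate(coef))


def r_squared(coef, x, y):
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - predict(coef, xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical, noise-free data generated from a known quadratic curve.
bmi = [18.0, 21.0, 24.0, 27.0, 30.0, 33.0, 36.0, 40.0]
fat = [-30.0 + 3.2 * b - 0.04 * b ** 2 for b in bmi]

linear = fit(bmi, fat, 1)
quadratic = fit(bmi, fat, 2)
r2_linear = r_squared(linear, bmi, fat)
r2_quadratic = r_squared(quadratic, bmi, fat)
print([round(c, 3) for c in quadratic], round(r2_linear, 4), round(r2_quadratic, 4))
```

Comparing the two R2 values mirrors the comparison of the linear and quadratic curves in Figure 10-2: the quadratic term is worth keeping only if it improves the fit appreciably.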

Missing Observations

When studies involve several variables, some observations on some subjects may be missing. Controlling the problem of missing data is easier in studies in which information is collected prospectively; it is much more difficult when information is obtained from already existing records, such as patient charts. Two important factors are the percentage of observations that is missing and whether missing observations are randomly missing or missing because of some causal factor.

 Figure 10-2. Linear and quadratic curves for the relationship between BMI and percent body fat in females. (Data, used with permission, from Jackson AS, Stanforth PR, Gagnon J, Rankinen T, Leon AS, Rao DC, et al: The effect of sex, age, and race on estimating percentage body fat from body mass index: The Heritage Family Study. Int J Obes Relat Metab Disord 2002; 26: 789–796. Table produced with SPSS Inc.; used with permission.)

For example, suppose a researcher designs a case–control study to examine the effect of leg length inequality on the incidence of loosening of the femoral component after total hip replacement. Cases are patients who developed loosening of the femoral component, and controls are patients who did not. In reviewing the records of routine follow-up, the researcher found that leg length inequality was measured in some patients by using weight-bearing anterior–posterior (AP) hip and lower extremity films, whereas other patients had measurements taken using non-weight-bearing films. The type of film ordered during follow-up may well be related to whether the patient complained of hip pain; patients with symptoms were more likely to have received the weight-bearing films, and patients without symptoms were more likely to have had the routine non-weight-bearing films. A researcher investigating this question must not base the leg length inequality measures on weight-bearing films only, because controls are less likely than cases to have weight-bearing film measures in their records. In this situation, the missing leg length information occurred because of symptoms and not randomly.

The potential for missing observations increases in studies involving multiple variables. Depending on the cause of the missing observations, solutions include dropping subjects who have missing observations from the study, deleting variables that have missing values from the study, or substituting some value for the missing data, such as the mean or a predicted value, called imputing. SPSS has an option to estimate missing data with the mean for that variable calculated with the subjects who had the data. The Data Screening procedure (in Descriptive Statistics) in NCSS provides the option of substituting either the mean or a predicted score. Investigators in this situation should seek advice from a statistician on the best way to handle the problem.

Cross Validation

The statistical procedures for all regression models are based on correlations among the variables, which, in turn, are related to the amount of variation in the variables included in the study. Some of the observed variation in any variable, however, occurs simply by chance; and the same degree of variation does not occur if another sample is selected and the study is replicated. The mathematical procedures for determining the regression equation cannot distinguish between real and chance variation. If the equation is to be used to predict outcomes for future subjects, it should therefore be validated on a second sample, a process called cross validation. The regression equation is used to predict the outcome in the second sample, and the predicted outcomes are compared with the actual outcomes; the correlation between the predicted and actual values indicates how well the model fits. Cross-validating the regression equation gives a realistic evaluation of the usefulness of the prediction it provides.

In medical research we rarely have the luxury of cross-validating the findings on another sample of the same size. Several alternative methods exist. First, researchers can hold out a proportion of the subjects for cross validation, perhaps 20% or 25%. The holdout sample should be randomly selected from the entire sample prior to the original analysis. The predicted outcomes in the holdout sample are compared with the actual outcomes, often using R2 to judge how well the findings cross-validate.
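A holdout validation of this kind might look like the following sketch. The data are simulated, and the 25% holdout fraction, the seed, and the function names are illustrative assumptions; a single predictor is used only to keep the example short.

```python
import random


def fit_line(x, y):
    """Simple least-squares regression; returns (intercept, slope)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope


def squared_correlation(u, v):
    """Square of the Pearson correlation between two lists."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv ** 2 / (suu * svv)


# Simulated (hypothetical) sample of 40 subjects.
rng = random.Random(1)
x = [float(i) for i in range(40)]
y = [3.0 + 2.0 * xi + rng.gauss(0.0, 2.0) for xi in x]

# Randomly set aside 25% of the subjects BEFORE any model fitting.
holdout = set(rng.sample(range(40), 10))
train = [i for i in range(40) if i not in holdout]

# Fit on the remaining subjects only.
a, b = fit_line([x[i] for i in train], [y[i] for i in train])

# Predict the held-out subjects and compare with their actual outcomes.
test_idx = sorted(holdout)
predicted = [a + b * x[i] for i in test_idx]
actual = [y[i] for i in test_idx]
r2_holdout = squared_correlation(predicted, actual)
print(round(r2_holdout, 3))
```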

Another method is the jackknife in which one observation is left out of the sample, call it x1; regression is performed using the n – 1 observations, and the results are applied to x1. Then this observation is returned to the sample, and another, x2, is held out. This process continues until there is a predicted outcome for each observation in the sample; the predicted and actual outcomes are then compared.
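The leave-one-out procedure just described (often also called leave-one-out cross validation) can be sketched as follows, using a single predictor and a small set of hypothetical values.

```python
def fit_line(x, y):
    """Simple least-squares regression; returns (intercept, slope)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope


# Hypothetical observations.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]

loo_pred = []
for i in range(len(x)):
    # Leave observation i out and fit on the remaining n - 1 observations.
    x_rest = x[:i] + x[i + 1:]
    y_rest = y[:i] + y[i + 1:]
    a, b = fit_line(x_rest, y_rest)
    # Apply the fitted equation to the observation that was held out.
    loo_pred.append(a + b * x[i])

# Compare predicted and actual outcomes.
errors = [yi - pi for yi, pi in zip(y, loo_pred)]
sum_sq_error = sum(e ** 2 for e in errors)
print([round(p, 2) for p in loo_pred], round(sum_sq_error, 3))
```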

The bootstrap method works in a similar manner although the goal is different. The bootstrap can be used with small samples to estimate the standard error and confidence intervals. A small hold-out sample is randomly selected and the statistic of interest calculated. Then the hold-out sample is returned to the original sample, and another hold-out sample is selected. After a fairly large number of samples is analyzed, generally a minimum of 200, standard errors and confidence intervals can be estimated. In essence, the bootstrap method uses the data itself to determine the sampling distribution rather than the central limit theorem discussed in Chapter 4.
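As an illustration, the sketch below uses the usual nonparametric form of the bootstrap, in which each resample has the same size as the original sample and is drawn with replacement; this differs slightly from the hold-out wording above and is named here as an assumption. The data values, the seed, and the choice of the mean as the statistic of interest are hypothetical.

```python
import random
import statistics

# Hypothetical sample (e.g., percent body fat in 15 subjects).
data = [23.1, 25.4, 27.8, 22.0, 30.5, 28.3, 24.7, 26.9,
        31.2, 29.0, 21.5, 27.1, 25.9, 33.4, 26.2]

rng = random.Random(42)
B = 1000          # number of bootstrap samples
n = len(data)

boot_means = []
for _ in range(B):
    resample = rng.choices(data, k=n)   # draw n values with replacement
    boot_means.append(statistics.mean(resample))

boot_means.sort()
se_boot = statistics.stdev(boot_means)          # bootstrap standard error
ci_low = boot_means[int(0.025 * B)]             # percentile confidence limits
ci_high = boot_means[int(0.975 * B) - 1]

# Conventional standard error of the mean, for comparison.
se_formula = statistics.stdev(data) / n ** 0.5
print(round(se_boot, 3), round(se_formula, 3), round(ci_low, 2), round(ci_high, 2))
```

For a mean, the bootstrap standard error should be close to the conventional formula; the method is most useful for statistics, such as odds ratios, where no simple formula is convenient.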

Both the jackknife and bootstrap are called resampling methods; they are very computer-intensive and require special software. Kline and colleagues (2002) used a bootstrap method to develop confidence intervals for odds ratios in their study of the use of the d-dimer test in the emergency department.

It is possible to estimate the magnitude of R or R2 in another sample without actually performing the cross validation. This R2 is smaller than the R2 for the original sample, because the mathematical formula used to obtain the estimate removes the chance variation. For this reason, the formula is called a formula for shrinkage. Many computer programs, including NCSS, SPSS, and SAS, provide both R2 for the sample used in the analysis as well as R2 adjusted for shrinkage, often referred to as the adjusted R2. Refer to Table 10-4 where NCSS gives the “Adj R2” in the fifth row of the first column of the computer analysis.
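A commonly used shrinkage formula is adjusted R2 = 1 − (1 − R2)(n − 1)/(n − k − 1), where n is the number of subjects and k the number of independent variables. Whether each program named above uses exactly this form is an assumption. The sketch below applies it to R2 = 0.81 with three predictors; the sample size of 359 is used only for illustration.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n subjects and k independent variables."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# With a large sample the shrinkage is negligible...
large_sample = adjusted_r2(0.81, n=359, k=3)
# ...but with few subjects per variable it is noticeable.
small_sample = adjusted_r2(0.81, n=20, k=3)
print(round(large_sample, 4), round(small_sample, 4))
```

The comparison shows why the ratio of subjects to variables matters: the same sample R2 shrinks more when it is based on fewer subjects.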

Sample Size Requirements

The only easy way to determine how large a sample is needed in multiple regression or any multivariate technique is to use a computer program. Some rules of thumb, however, may be used for guidance. A common recommendation by statisticians calls for ten times as many subjects as the number of independent variables. For example, this rule of thumb prescribes a minimum of 60 subjects for a study predicting the outcome from six independent variables. Having a large ratio of subjects to variables decreases problems that may arise because assumptions are not met.

Assumptions about normality in multiple regression are complicated, depending on whether the independent variables are viewed as fixed or random (as in fixed-effects model or random-effects model in ANOVA), and they are beyond the scope of this text. To ensure that estimates of regression coefficients and multiple R and R2 are accurate representatives of actual population values, we suggest that investigators never perform regression without at least five times as many subjects as variables.

A more accurate estimate is found by using a computer power program. We used the PASS power program to find the power of a study using five predictor variables, as in the Jackson study (Table 10-5). We posed the question: How many subjects are needed to test whether a given variable increases R2 by 0.05, given that four variables are already in the regression equation and they collectively provide an R2 of 0.50? The output from the program is shown in Box 10-1. The power table indicates that a sample of 80 gives power of 0.84, assuming an α or P value of 0.05. The accompanying graph shows the power curve for different sample sizes and different values of α. As you can see, the sample of 359 females and 296 males in the study by Jackson and colleagues was more than adequate for the regression model.

CONTROLLING FOR CONFOUNDING

Analysis of Covariance

Analysis of covariance (ANCOVA) is the statistical technique used to control for the influence of a confounding variable. Confounding variables occur most often when subjects cannot be assigned at random to different groups, that is, when the groups of interest already exist. Gonzalo and colleagues (1996) (Chapters 7 and 8) predicted insulin sensitivity from body mass index (BMI); they wanted to control for age of the women and did so by adding age to the regression equation. When BMI alone is used to predict insulin sensitivity (IS) in hyperthyroid women, the regression equation is

$$\text{IS} = 2.336 - 0.077 \times \text{BMI}$$

where IS is the insulin sensitivity level. Using this equation, a hyperthyroid woman's insulin sensitivity level is predicted to decrease by 0.077 for each increase of 1 in BMI. For instance, a woman with a BMI of 25 has a predicted insulin sensitivity of 0.411. What would happen, however, if age were also related to insulin sensitivity? A way to control for the possible confounding effect of age is to include that variable in the regression equation. The equation with age included is

$$\text{IS} = 2.291 - 0.068 \times \text{BMI} - 0.0045 \times \text{Age}$$

Using this equation, a hyperthyroid woman's insulin sensitivity level is predicted to decrease by 0.068 for each increase of 1 in BMI, holding age constant or independent of age. A 30-year-old woman with a BMI of 25 has a predicted insulin sensitivity of 0.456, whereas a 60-year-old woman with the same BMI of 25 has a predicted insulin sensitivity of 0.321.
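The arithmetic behind these predictions can be checked with a short sketch. The BMI slopes (0.077 and 0.068) are quoted in the text; the intercepts and the age coefficient used below are back-calculated from the quoted predictions and should be treated as assumptions, as should the function names.

```python
def is_bmi_only(bmi):
    """Predicted insulin sensitivity from BMI alone (intercept is an assumption)."""
    return 2.336 - 0.077 * bmi


def is_bmi_age(bmi, age):
    """Predicted insulin sensitivity from BMI and age (intercept and age
    coefficient are assumptions back-calculated from the quoted predictions)."""
    return 2.291 - 0.068 * bmi - 0.0045 * age


pred_bmi25 = is_bmi_only(25)
pred_age30 = is_bmi_age(25, 30)
pred_age60 = is_bmi_age(25, 60)
print(round(pred_bmi25, 3), round(pred_age30, 3), round(pred_age60, 3))

# Holding age constant, a one-unit increase in BMI changes the prediction
# by the BMI coefficient, whatever the age.
change_per_bmi_unit = is_bmi_age(26, 45) - is_bmi_age(25, 45)
```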

A more traditional use of ANCOVA is illustrated by a study of the negative influence of smoking on the cardiovascular system. Investigators wanted to know whether smokers have more ventricular wall motion abnormalities than nonsmokers (Hartz et al, 1984). They might use a t test to determine whether the mean number of wall motion abnormalities differs in these two groups. The investigators know, however, that wall motion abnormalities are also related to the degree of coronary stenosis, and smokers generally have a greater degree of coronary stenosis. Thus, any difference observed in the mean number of wall abnormalities between smokers and nonsmokers may really be a difference in the amount of coronary stenosis between these two groups of patients.

This situation is illustrated in the graph of hypothetical data in Figure 10-3; in the figure, the relationship between occlusion scores and wall motion abnormalities appears to be the same for smokers and nonsmokers. Nonsmokers, however, have both lower occlusion scores and lower numbers of wall motion abnormalities; smokers have higher occlusion scores and higher numbers of wall motion abnormalities. The question is whether the difference in wall motion abnormalities is due to smoking, to occlusion, or to both.

In this study, the investigators must control for the degree of coronary stenosis so that it does not confound (or confuse) the relationship between smoking and wall motion abnormalities. Useful methods to control for confounding variables are analysis of covariance (ANCOVA) and the Mantel–Haenszel chi-square procedure. Table 10-2 specifies ANCOVA when the dependent variable is numerical (eg, wall motion) and the independent measures are grouping variables on a nominal scale (eg, smoking versus nonsmoking), and confounding variables occur (eg, degree of coronary occlusion). If the dependent measure is also nominal, such as whether a patient has survived to a given time, the Mantel–Haenszel chi-square discussed in Chapter 9 can be used to control for the effect of a confounding (nuisance) variable. ANCOVA can be performed by using the methods of ANOVA; however, most medical studies use one of the regression methods discussed in this chapter.

Box 10-1. LINEAR AND QUADRATIC CURVES FOR THE RELATIONSHIP BETWEEN BMI AND PERCENT BODY FAT IN FEMALES.

Multiple-Regression Power Analysis

| Power | N | Alpha | Beta | Ind. Variables Tested: Cnt | Ind. Variables Tested: R2 | Ind. Variables Controlled: Cnt | Ind. Variables Controlled: R2 |
|---|---|---|---|---|---|---|---|
| 0.53528 | 40 | 0.05000 | 0.46472 | 1 | 0.05000 | 4 | 0.50000 |
| 0.63511 | 50 | 0.05000 | 0.36489 | 1 | 0.05000 | 4 | 0.50000 |
| 0.71765 | 60 | 0.05000 | 0.28235 | 1 | 0.05000 | 4 | 0.50000 |
| 0.78431 | 70 | 0.05000 | 0.21569 | 1 | 0.05000 | 4 | 0.50000 |
| 0.83709 | 80 | 0.05000 | 0.16291 | 1 | 0.05000 | 4 | 0.50000 |
| 0.87818 | 90 | 0.05000 | 0.12182 | 1 | 0.05000 | 4 | 0.50000 |
| 0.90973 | 100 | 0.05000 | 0.09027 | 1 | 0.05000 | 4 | 0.50000 |
| 0.93366 | 110 | 0.05000 | 0.06634 | 1 | 0.05000 | 4 | 0.50000 |
| 0.95160 | 120 | 0.05000 | 0.04840 | 1 | 0.05000 | 4 | 0.50000 |

Summary Statements

A sample size of 80 achieves 84% power to detect an R-Squared of 0.05000 attributed to 1 independent variable(s) using an F-Test with a significance level (alpha) of 0.05000. The variables tested are adjusted for an additional 4 independent variable(s) with an R-Squared of 0.50000.

Figure. No caption available.

Source: Data, used with permission, from Jackson AS, Stanforth PR, Gagnon J, Rankinen T, Leon AS, Rao DC: The effect of sex, age and race on estimating percentage body fat from body mass index: The Heritage Family Study. Int J Obes Relat Metab Disord 2002; 26: 789–796. Output produced with PASS; used with permission.

If ANCOVA is used in this example, the occlusion score is called the covariate, and the mean number of wall motion abnormalities in smokers and nonsmokers is said to be adjusted for the occlusion score (or degree of coronary stenosis). Put another way, ANCOVA simulates the Y outcome observed if the value of X is held constant, that is, if all the patients had the same degree of coronary stenosis. This adjustment is achieved by calculating a regression equation to predict mean number of wall motion abnormalities from the covariate, degree of coronary stenosis, and from a dummy variable coded 1 if the subject is a member of the group (ie, a smoker) and 0 otherwise. For example, the regression equation determined for the hypothetical observations in Figure 10-3 is

The equation illustrates that smokers have a larger number of predicted wall motion abnormalities, because 1.28 is added to the equation if the subject is a smoker. The equation can be used to obtain the mean number of wall motion abnormalities in each group, adjusted for degree of coronary stenosis.

 Figure 10-3. Relationship between degree of coronary stenosis and ventricular wall motion abnormalities in smokers and nonsmokers (hypothetical data).

If the relationship between coronary stenosis and ventricular motion is ignored, the mean number of wall motion abnormalities, calculated from the observations in Figure 10-3, is 3.33 for smokers and 1.00 for nonsmokers. If, however, ANCOVA is used to control for degree of coronary stenosis, the adjusted mean wall motion is 2.81 for smokers and 1.53 for nonsmokers, a difference of 1.28, represented by the regression coefficient for the dummy variable for smoking. In ANCOVA, the adjusted Y mean for a given group is obtained by (1) finding the difference between the group's mean on the covariate variable X, denoted X̅j, and the grand mean X̅; (2) multiplying the difference by the regression coefficient b for the covariate; and (3) subtracting this product from the unadjusted mean. Thus, for group j, the adjusted mean is

$$\text{Adj. } \bar{Y}_j = \bar{Y}_j - b(\bar{X}_j - \bar{X})$$

(See Exercise 1.)

This result is consistent with our knowledge that coronary stenosis alone has some effect on abnormality of wall motion; the unadjusted means contain this effect as well as any effect from smoking. Controlling for the effect of coronary stenosis therefore results in a smaller difference in number of wall motion abnormalities, a difference related only to smoking.
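The three-step adjustment described above can be illustrated with a small sketch. The stenosis scores and wall motion counts below are hypothetical (they are not the observations in Figure 10-3), and the helper names are ours. The regression uses the covariate plus a dummy variable coded 1 for smokers and 0 for nonsmokers.

```python
def ols(X, y):
    """Least-squares coefficients for a design matrix X given as a list of rows."""
    k = len(X[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, k), key=lambda row: abs(A[row][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Hypothetical data: (stenosis scores, wall motion abnormalities) per group.
groups = {
    "nonsmoker": ([1, 2, 2, 3, 3, 4], [0, 1, 1, 1, 2, 2]),
    "smoker": ([3, 4, 5, 5, 6, 7], [2, 3, 3, 4, 4, 5]),
}

rows, y = [], []
for name, (xs, ys) in groups.items():
    z = 1.0 if name == "smoker" else 0.0      # dummy variable for smoking
    for xi, yi in zip(xs, ys):
        rows.append([1.0, float(xi), z])      # intercept, covariate, dummy
        y.append(float(yi))

a, b_stenosis, b_smoker = ols(rows, y)
grand_mean_x = sum(r[1] for r in rows) / len(rows)

unadjusted, adjusted = {}, {}
for name, (xs, ys) in groups.items():
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    unadjusted[name] = mean_y
    # Adjusted mean = unadjusted mean - b * (group covariate mean - grand mean).
    adjusted[name] = mean_y - b_stenosis * (mean_x - grand_mean_x)

unadjusted_diff = unadjusted["smoker"] - unadjusted["nonsmoker"]
adjusted_diff = adjusted["smoker"] - adjusted["nonsmoker"]
print(round(unadjusted_diff, 3), round(adjusted_diff, 3), round(b_smoker, 3))
```

In this sketch the difference between the adjusted means equals the coefficient of the dummy variable, and it is smaller than the unadjusted difference because part of that difference is attributable to stenosis.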

Using hypothetical data, Figure 10-4 illustrates schematically the way ANCOVA adjusts the mean of the dependent variable if the covariate is important. Using unadjusted means is analogous to using a separate regression line for each group. For example, the mean value of Y for group 1 is found by using the regression line drawn through the group 1 observations to project the mean value X̅1 onto the Y-axis, denoted Y̅1 in Figure 10-4. Similarly, the mean of group 2 is found at Y̅2 by using the regression line to project the mean X̅2 in that group. The Y means in each group adjusted for the covariate (stenosis) are analogous to the projections based on the overall mean value of the covariate; that is, as though the two groups had the same mean value for the covariate. The adjusted means for groups 1 and 2, Adj. Y̅1 and Adj. Y̅2, are illustrated by the dotted line projections of X̅ from each separate regression line in Figure 10-4.

 Figure 10-4. Illustration of means adjusted using analysis of covariance.

ANCOVA assumes that the relationship between the covariate (X variable) and the dependent variable (Y) is the same in both groups, that is, that any relationship between coronary stenosis and wall motion abnormality is the same for smokers and nonsmokers. This assumption is equivalent to requiring that the regression slopes be the same in both groups; geometrically, ANCOVA asks whether a difference exists between the intercepts, assuming the slopes are equal.

ANCOVA is an appropriate statistical method in many situations that occur in medical research. For example, age is a variable that affects almost everything studied in medicine; if preexisting groups in a study have different age distributions, investigators must adjust for age before comparing the groups on other variables, just as Gonzalo and colleagues recognized. The methods illustrated in Chapter 3 to adjust mortality rates for characteristics such as age and birth weight are used when information is available on groups of individuals; when information is available on individuals themselves, ANCOVA is used.

Before leaving this section, we point out some important aspects of ANCOVA. First, although only two groups were included in the example, ANCOVA can be used to adjust for the effect of a confounding variable in more than two groups. In addition, it is possible to adjust for more than one confounding variable in the same study, and the confounding variables may be either nominal or numerical. Thus, it is easy to see why the multiple regression model for analysis of covariance provides an ideal method to incorporate confounding variables.

Finally, ANCOVA can be considered as a special case of the more general question of comparing two regression lines (discussed in Chapter 8). In ANCOVA, we assume that the slopes are equal, and attention is focused on the intercept. We can also perform the more global test of both slope and intercept, however, by using multiple regression. In Presenting Problem 4 in Chapter 8 on insulin sensitivity (Gonzalo et al, 1996), interest focused on comparing the regression lines predicting insulin activity from body mass index (BMI) in women who had normal versus elevated thyroid levels. ANCOVA can be used for this comparison using dummy coding. If we let X be BMI, Y be insulin sensitivity level, and Z be a dummy variable, where Z = 1 if the woman is hyperthyroid and Z = 0 for controls, then the multiple-regression model for testing whether the two regression lines are the same (coincident) is

$$Y = a + b_1 X + b_2 Z + b_3 XZ$$

The regression lines have equal slopes and are parallel when b3 is 0, that is, no interaction between the independent variable X and the group membership variable Z. The regression lines have equal intercepts and equal slopes (are coincident) if both b2 and b3 are 0; thus, the model becomes the simple regression equation Y = a + bX. The statistical test for b2 and b3 is the t test discussed in the section titled, “Statistical Tests for the Regression Coefficient.”
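To see how the dummy variable Z and the product term XZ capture differences in intercept and slope, consider the following sketch. The two lines used to generate the data are hypothetical and noise-free (they are not the Gonzalo data), so the fitted b2 and b3 simply recover the differences built into the example; the t tests for b2 and b3 are not computed here.

```python
def ols(X, y):
    """Least-squares coefficients for a design matrix X given as a list of rows."""
    k = len(X[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, k), key=lambda row: abs(A[row][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


bmi = [20.0, 22.0, 25.0, 28.0, 31.0, 35.0]

# Hypothetical lines: controls (Z = 0) and hyperthyroid women (Z = 1).
lines = {0.0: (2.0, -0.05), 1.0: (2.3, -0.08)}   # Z -> (intercept, slope)

rows, y = [], []
for z, (intercept, slope) in lines.items():
    for x in bmi:
        rows.append([1.0, x, z, x * z])          # 1, X, Z, and the product XZ
        y.append(intercept + slope * x)

a, b1, b2, b3 = ols(rows, y)
print(round(a, 4), round(b1, 4), round(b2, 4), round(b3, 4))
```

Here b2 is the difference between the two intercepts and b3 the difference between the two slopes; both would be near 0 if the lines were coincident.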

Generalized Estimating Equations (GEE)

Many research designs, including both observational studies and clinical trials, concern observations that are clustered or hierarchical. A group of methods has been developed for these special situations. To illustrate, a study to examine the effect of different factors on complication rates following total knee arthroplasty was undertaken in a province of Canada (Kreder et al, 2003). Outcomes included length of hospital stay, inpatient complications, and mortality. Can the researchers examine the outcomes for patients and conclude that any differences are due to the risk factors? The statistical methods we have examined thus far assume that one observation is independent from another. The problem with this study design, however, is that the outcome for patients operated on by the same surgeon may be related to factors other than the surgical method, such as the skill level of the surgeon. In this situation, patients are said to be nested within physicians.

Many other examples come to mind. Comparing the efficacy of medical education curricula is difficult because students are nested within medical schools. Comparing health outcomes for children within a community is complicated by the fact that children are nested within families. Many clinical trials create nested situations, such as when trials are carried out in several medical centers. The issue arises of how to define the unit of analysis—should it be the students or the school? the children or the families? the patients or the medical center?

The group of methods that accommodates these types of research questions includes generalized estimating equations (GEE), multilevel modeling, and the analysis of hierarchically structured data. Most of these methods have been developed within the last decade and statistical software is just now becoming widely available. In addition to some specialized statistical packages, SAS, Stata, and SPSS contain procedures to accommodate hierarchical data. Using these models is more complex than some of the other methods we have discussed, and it is relatively easy to develop a model that is meaningless or misleading. Investigators who have research designs that involve nested subjects should consult a biostatistician for assistance.

PREDICTING NOMINAL OR CATEGORICAL OUTCOMES

In the regression model discussed in the previous section, the outcome or dependent Y variable is measured on a numerical scale. When the outcome is measured on a nominal scale, other approaches must be used. Table 10-2 indicates that several methods can be used to analyze problems with several independent variables when the dependent variable is nominal. First we discuss logistic regression, a method that is frequently used in the health field. One reason for the popularity of logistic regression is that many outcomes in health are nominal, actually binary, variables: they either occur or do not occur. The second reason is that the regression coefficients obtained in logistic regression can be transformed into odds ratios. So, in essence, logistic regression provides a way to obtain an odds ratio for a given risk factor that controls for, or is adjusted for, confounding variables; in other words, we can do analysis of covariance with logistic regression as well as with multiple linear regression.

Other methods are log-linear analysis and several methods that attempt to classify subjects into groups. These methods appear occasionally in the medical literature, and we provide a brief illustration, primarily so that readers can have an intuitive understanding of their purpose. The classification methods are discussed in the section titled “Methods for Classification.”

Logistic Regression

Logistic regression is commonly used when the independent variables include both numerical and nominal measures and the outcome variable is binary (dichotomous). Logistic regression can also be used when the outcome has more than two values (Hosmer and Lemeshow, 2000), but its most frequent use is as in Presenting Problem 2, which illustrates the use of logistic regression to identify trauma patients who are alcohol-positive, a yes-or-no outcome. Soderstrom and his coinvestigators (1997) wanted to develop a model to help emergency department staff identify the patients most likely to have blood alcohol concentrations (BAC) in excess of 50 mg/dL at the time of admission. The logistic model gives the probability that the outcome, such as high BAC, occurs as an exponential function of the independent variables. For example, with three independent variables, the model is

$$P(X) = \frac{1}{1 + \exp[-(b_0 + b_1 X_1 + b_2 X_2 + b_3 X_3)]}$$

where b0 is the intercept, b1, b2, and b3 are the regression coefficients, and exp indicates that the base of the natural logarithm (2.718) is taken to the power shown in parentheses (ie, the antilog). The equation can be derived by specifying the variables to be included in the equation or by using a variable selection method similar to the ones for multiple regression. A chi-square test (instead of the t or F test) is used to determine whether a variable adds significantly to the prediction.

In the study described in Presenting Problem 2, the variables used by the investigators to predict blood alcohol concentrations included the variables listed in Table 10-7. The investigators coded the values of the independent variables as 0 and 1, a method useful both for dummy variables in multiple regression and for variables in logistic regression. This practice makes it easy to interpret the odds ratio. In addition, if a goal is to develop a score, as is the case in the study by Soderstrom and coinvestigators, the coefficient associated with a given variable needs to be included in the score only if the patient has a 1 on that variable. For instance, if patients are more likely to have BAC ≥ 50 mg/dL on weekends, the score associated with day of week is not included if the injury occurs on a weekday.

Table 10-7. Variables, codes, and frequencies for variables.ᵃ

| Variable | Category | Value | Frequency |
|---|---|---|---|
| Age | 39 or younger | 0 | 3514 |
| | 40 or older | 1 | 1534 |
| Time of day | 6 PM–6 AM | 0 | 2601 |
| | 6 AM–6 PM | 1 | 2447 |
| Day of week | Monday–Thursday | 0 | 2642 |
| | Friday–Sunday | 1 | 2406 |
| Sex | Female | 0 | 1457 |
| | Male | 1 | 3591 |
| Race | Non-Caucasian | 0 | 1758 |
| | Caucasian | 1 | 3290 |
| Injury type | Unintentional | 0 | 3966 |
| | Intentional | 1 | 1082 |
| Blood alcohol concentration | <50 mg/dL | 0 | 4067 |
| | ≥50 mg/dL | 1 | 1465 |

ᵃ Not all totals are the same because of missing data on some variables.

Source: Data, used with permission, from Soderstrom CA, Kufera JA, Dischinger PC, Kerns TJ, Murphy JG, Lowenfels A: Predictive model to identify trauma patients with blood alcohol concentrations 50 mg/dL. J Trauma 1997; 42: 67–73.

The investigators calculated logistic regression equations for each of four groups: males with intentional injury, males with unintentional injury, females with intentional injury, and females with unintentional injury. The results of the analysis on males who were injured unintentionally are given in Table 10-8.

We need to know which value is coded 1 and which 0 in order to interpret the results. For example, time of day has a negative regression coefficient. The hours of 6 AM–6 PM are coded as 1, so a male coming to the emergency department with unintentional injuries in the daytime is less likely to have BAC ≥ 50 mg/dL than a male with unintentional injuries at night. The age variable is not significant (P = 0.268). Interpreting the equation for the other variables indicates that males with unintentional injuries who come to the emergency department at night and on weekends and are Caucasian are more likely to have elevated blood alcohol levels.

Table 10-8. Logistic regression report for men with unintentional injury.ᵃ

Filter: sex=1; injtype=0
Response: BAC50

Parameter Estimation Section

| Variable | Regression Coefficient | Standard Error | χ2 | Probability Level | Last R2 |
|---|---|---|---|---|---|
| Intercept | -0.7960357 | 0.1188189 | 44.88 | 0.000000 | 0.016780 |
| Daytime | -1.8445640 | 0.1062133 | 301.60 | 0.000000 | 0.102879 |
| Weekday | 0.6622602 | 0.0975930 | 46.05 | 0.000000 | 0.017208 |
| Race | 0.2780667 | 0.1125357 | 6.11 | 0.013477 | 0.002316 |
| Age 40 | -0.1198371 | 0.1082209 | 1.23 | 0.268148 | 0.000466 |

Odds Ratio Estimation Section

| Variable | Regression Coefficient | Standard Error | Odds Ratio | Lower 95% Confidence Limit | Upper 95% Confidence Limit |
|---|---|---|---|---|---|
| Intercept | -0.796036 | 0.118819 | | | |
| Daytime | -1.844564 | 0.106213 | 0.158094 | 0.128383 | 0.194682 |
| Weekday | 0.662260 | 0.097593 | 1.939170 | 1.601565 | 2.347942 |
| Race | 0.278067 | 0.112536 | 1.320574 | 1.059186 | 1.646468 |
| Age 40 | -0.119837 | 0.108221 | 0.887065 | 0.717526 | 1.096663 |

Model Summary Section

| Model R2 | Model df | Model χ2 | Model Probability |
|---|---|---|---|
| 0.141881 | 4 | 434.84 | 0.000000 |

Classification Table

| Actual | | Predicted 0 | Predicted 1 | Total |
|---|---|---|---|---|
| 0 | Count | 1751.00 | 202.00 | 1953.00 |
| | Row percent | 89.66 | 10.34 | 100.00 |
| | Column percent | 80.88 | 42.98 | 74.12 |
| 1 | Count | 414.00 | 268.00 | 682.00 |
| | Row percent | 60.70 | 39.30 | 100.00 |
| | Column percent | 19.12 | 57.02 | 25.88 |
| Total | Count | 2165.00 | 470.00 | 2635.00 |
| | Row percent | 82.16 | 17.84 | |
| | Column percent | 100.00 | 100.00 | |

Percent correctly classified = 76.62

ᵃ Results from logistic regression for men with unintentional injury.

Source: Data, used with permission, from Soderstrom CA, Kufera JA, Dischinger PC, Kerns TJ, Murphy JG, Lowenfels A: Predictive model to identify trauma patients with blood alcohol concentrations 50 mg/dL. J Trauma 1997; 42: 67–73. Output produced using NCSS; used with permission.

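The logistic equation can be used to find the probability for any given individual. For instance, let us find the probability that a 27-year-old Caucasian man who comes to the emergency department at 2 PM on Thursday has BAC ≥ 50 mg/dL. The regression coefficients from Table 10-8 give the linear combination

$$-0.796 - 1.845\,(\text{Daytime}) + 0.662\,(\text{Weekday}) + 0.278\,(\text{Race}) - 0.120\,(\text{Age 40})$$

and we evaluate it as follows, using the codes in Table 10-7 (Daytime = 1 for 2 PM, Weekday = 0 for Thursday, Race = 1 for Caucasian, and Age 40 = 0 for a 27-year-old):

$$-0.796 - 1.845(1) + 0.662(0) + 0.278(1) - 0.120(0) = -2.363 \approx -2.36$$

Substituting -2.36 in the equation for the probability:

$$P = \frac{1}{1 + \exp[-(-2.36)]} = \frac{1}{1 + 10.59} = 0.086$$

Therefore, the chance that this man has a high BAC is less than 1 in 10. See Exercise 3 to determine the likelihood of a high BAC if the same man came to the emergency department on a Saturday night.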
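The same calculation can be sketched in a few lines of Python. This is only an illustration: the function and dictionary names are hypothetical, the coefficients are copied from Table 10-8, and we assume that the row labeled "Weekday" in the NCSS output is the day-of-week indicator coded 1 for Friday–Sunday, as in Table 10-7.

```python
import math

# Coefficients for men with unintentional injury, copied from Table 10-8.
# Assumption: the "Weekday" row is the day-of-week indicator from Table 10-7,
# coded 1 for Friday-Sunday and 0 for Monday-Thursday.
COEFFICIENTS = {
    "intercept": -0.7960357,
    "daytime": -1.8445640,      # 1 = 6 AM-6 PM, 0 = 6 PM-6 AM
    "weekend": 0.6622602,       # 1 = Friday-Sunday, 0 = Monday-Thursday
    "caucasian": 0.2780667,     # 1 = Caucasian, 0 = non-Caucasian
    "age_40_plus": -0.1198371,  # 1 = 40 or older, 0 = 39 or younger
}


def probability_high_bac(daytime, weekend, caucasian, age_40_plus):
    """Return the predicted probability of BAC >= 50 mg/dL from the logistic model."""
    linear = (
        COEFFICIENTS["intercept"]
        + COEFFICIENTS["daytime"] * daytime
        + COEFFICIENTS["weekend"] * weekend
        + COEFFICIENTS["caucasian"] * caucasian
        + COEFFICIENTS["age_40_plus"] * age_40_plus
    )
    # Logistic function: 1 / (1 + exp(-linear combination))
    return 1.0 / (1.0 + math.exp(-linear))


# 27-year-old Caucasian man arriving at 2 PM on a Thursday.
p_example = probability_high_bac(daytime=1, weekend=0, caucasian=1, age_40_plus=0)
print(round(p_example, 3))
```

Changing the arguments to the function gives the predicted probability for any other combination of the four indicators.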

One advantage of logistic regression is that it requires no assumptions about the distribution of the independent variables. Another is that the regression coefficient can be interpreted in terms of odds ratios, which approximate relative risks when the outcome is uncommon. For example, the odds ratio for an elevated BAC in males with unintentional trauma during the day is exp (-1.845) = 0.158, the value listed in Table 10-8. The odds ratio for night is the reciprocal, 1/0.158 = 6.33; therefore, males with unintentional injuries who come to the emergency department at night have more than six times the odds of BAC ≥ 50 mg/dL compared with males coming during the day.

How can readers easily tell which odds ratios are statistically significant? Recall from Chapter 8 that if the 95% confidence interval does not include 1, we can be 95% sure that the factor associated with the odds ratio either is a significant risk or provides a significant level of protection. Do any of the independent variables in Table 10-8 have a 95% confidence interval for the odds ratio that contains 1? Did you already know without looking that it would be age because the age variable is not statistically significant?
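As a check on this reasoning, the odds ratios and their limits can be recomputed from the coefficients and standard errors in Table 10-8. The sketch below assumes the usual large-sample interval, exp(b ± 1.96 × SE); the function names are hypothetical, and this is not necessarily the exact routine NCSS uses, although it agrees with the printed limits to the precision shown.

```python
import math

# (coefficient, standard error) pairs copied from Table 10-8.
TABLE_10_8 = {
    "Daytime": (-1.8445640, 0.1062133),
    "Weekday": (0.6622602, 0.0975930),
    "Race": (0.2780667, 0.1125357),
    "Age 40": (-0.1198371, 0.1082209),
}


def odds_ratio_ci(coefficient, standard_error, z=1.96):
    """Return (odds ratio, lower limit, upper limit) using exp(b +/- z * SE)."""
    return (
        math.exp(coefficient),
        math.exp(coefficient - z * standard_error),
        math.exp(coefficient + z * standard_error),
    )


def interval_contains_one(lower, upper):
    """An interval that includes 1 means the odds ratio is not significant at that level."""
    return lower <= 1.0 <= upper


not_significant = []
for name, (b, se) in TABLE_10_8.items():
    odds_ratio, lower, upper = odds_ratio_ci(b, se)
    if interval_contains_one(lower, upper):
        not_significant.append(name)
    print(f"{name}: OR = {odds_ratio:.3f}, 95% CI {lower:.3f} to {upper:.3f}")
```

Only the variables collected in `not_significant` have intervals that include 1.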

The overall results from a logistic regression may be tested with Hosmer and Lemeshow's goodness of fit test. The test is based on the chi-square distribution. A P value ≥ 0.05 means that the model's estimates fit the data at an acceptable level.

There is no straightforward statistic to judge the overall logistic model as R2 is used in multiple regression. Some statistical programs give R2, but it cannot be interpreted as in multiple regression because the predicted and observed outcomes are nominal. Several other statistics are available as well, including Cox and Snell's R2 and a modification called Nagelkerke's R2, which is generally larger than Cox and Snell's R2.

Before leaving the topic of logistic regression, it is worthwhile to inspect the classification table in Table 10-8. This table gives the actual and the predicted number of males with unintentional injuries who had normal versus elevated BAC. The logistic equation tends to underpredict those with elevated concentrations: 470 males are predicted versus the 682 who actually had BAC ≥ 50 mg/dL. Overall, the prediction using the logistic equation correctly classified 76.62% of these males. Although this sounds rather impressive, it is important to compare this percentage with the baseline: 74.12% of the time we would be correct if we simply predicted a male to have normal BAC. Can you recall an appropriate way to compensate for or take the baseline into consideration? Although computer programs typically do not provide the kappa statistic, discussed in Chapter 5, it provides a way to evaluate the percentage correctly classified (see Exercise 4). Other measures of association are used so rarely in medicine that we did not discuss them in Chapter 8. SPSS provides two nonparametric correlations, the lambda correlation and the tau correlation, that can be interpreted as measures of strength of the relationship between observed and predicted outcomes.
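For readers who want to try the kappa calculation by computer, the following is a minimal sketch of Cohen's kappa applied to the counts in the classification table of Table 10-8. The function name and the layout of the table as a nested list are our own choices for illustration.

```python
def cohen_kappa(table):
    """Cohen's kappa for a square table of counts (rows = actual, columns = predicted)."""
    n = sum(sum(row) for row in table)
    # Observed agreement: proportion of cases on the diagonal.
    observed = sum(table[i][i] for i in range(len(table))) / n
    # Agreement expected by chance, from the row and column totals.
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    expected = sum(r * c for r, c in zip(row_totals, column_totals)) / n ** 2
    return (observed - expected) / (1 - expected)


# Counts from the classification table in Table 10-8.
classification = [
    [1751, 202],  # actual 0: predicted 0, predicted 1
    [414, 268],   # actual 1: predicted 0, predicted 1
]
kappa = cohen_kappa(classification)
print(round(kappa, 3))
```

Because kappa subtracts the agreement expected by chance, it gives a more sober picture than the raw percentage correctly classified.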

Log-Linear Analysis

Psoriasis, a chronic, inflammatory skin disorder characterized by scaling erythematous patches and plaques of skin, has a strong genetic influence; about one third of patients have a positive family history. Stuart and colleagues (2002) conducted a study to determine differences in clinical manifestation between patients with positive and negative family histories of psoriasis and with early-onset versus late-onset disease. This study was used in Exercise 7 in Chapter 7.

The hypothesis was that the variables age at onset (in 10-year categories), onset (early or late), and familial status (sporadic or familial) had no effect on the occurrence of joint complaints. Results from the analysis of age, familial status, and frequency of joint complaints are given in Table 10-9.

Each independent variable in this research problem is measured on a categorical or nominal scale (age, onset, and familial status), as is the outcome variable (occurrence of joint complaints). If only two variables are being analyzed, the chi-square method introduced in Chapter 6 can be used to determine whether a relationship exists between them; with three or more nominal or categorical variables, a statistical method called log-linear analysis is appropriate. Log-linear analysis is analogous to a regression model in which all the variables, both independent and dependent, are measured on a nominal scale. The technique is called log-linear because it involves using the logarithm of the observed frequencies in the contingency table.

Table 10-9. Frequency of joint complaints by familial status, stratified by age at examination.

| Age at Examination (Years) | Joint Complaints (%): Sporadic | Joint Complaints (%): Familial | Relative risk | Pearson χ2 | P-value |
|---|---|---|---|---|---|
| 0–20 | 0.0 | 0.0 | — | — | — |
| 21–30 | 14.6 | 19.4 | 1.41 | 0.32 | 0.57 |
| 31–40 | 26.6 | 40.9 | 1.91 | 3.33 | 0.068 |
| 41–50 | 20.5 | 26.8 | 1.42 | 0.43 | 0.51 |
| 51–60 | 47.1 | 47.3 | 1.01 | 0.00024 | 0.99 |
| >60 | 28.6 | 45.3 | 2.07 | 0.72 | 0.40 |
| Total | 19.7 | 34.0 | 2.09 | 13.32 | 0.00026 |

Source: Reproduced, with permission, from Stuart P, Malick F, Nair RP, Henseler T, Lim HW, Jenisch S, et al: Analysis of phenotypic variation in psoriasis as a function of age at onset and family history. Arch Dermatol Res 2002; 294: 207–213.

Stuart and colleagues (2002) concluded that joint complaints and familial psoriasis were conditionally independent given age at examination, but that age at examination was not independent of either joint complaints or a family history.

Log-linear analysis may also be used to analyze multidimensional contingency tables in situations in which no distinction exists between independent and dependent variables, that is, when investigators simply want to examine the relationship among a set of nominal measures. The fact that log-linear analysis does not require a distinction between independent and dependent variables points to a major difference between it and other regression models—namely, that the regression coefficients are not interpreted in log-linear analysis.

PREDICTING A CENSORED OUTCOME: COX PROPORTIONAL HAZARD MODEL

In Chapter 9, we found that special methods must be used when an outcome has not yet been observed for all subjects in the study sample. Studies of time-limited outcomes in which there are censored observations, such as survival, naturally fall into this category; investigators usually cannot wait until all patients in the study experience the event before presenting information.

Many times in clinical trials or cohort studies, investigators wish to look at the simultaneous effect of several variables on length of survival. For example, in the study described in Presenting Problem 3, Crook and her colleagues (1997) wanted to evaluate the relationship of pretreatment prostate-specific antigen (PSA) and posttreatment nadir PSA on the failure pattern of radiotherapy for treating localized prostate carcinoma. They categorized failures as biochemical, local, and distant. They analyzed data from a cohort study of 207 patients, but only 68 had a failure due to any cause in the 70 months during which the study was underway. The observations on the remaining 139 patients were therefore censored. The independent variables they examined included the Gleason score, the T classification, whether the patient had received hormonal treatment, the PSA before treatment, and the lowest PSA following treatment.

Table 10-2 indicates that the regression technique developed by Cox (1972) is appropriate when time-dependent censored observations are included. This technique is called the Cox regression, or proportional hazard, model. In essence this model allows the covariates (independent variables) in the regression equation to vary with time. The dependent variable is the survival time of the jth patient, denoted Yj. Both numerical and nominal independent variables may be used in the model.

The Cox regression coefficients can be used to determine the relative risk or odds ratio (introduced in Chapter 3) associated with each independent variable and the outcome variable, adjusted for the effect of all other variables in the equation. Thus, instead of giving adjusted means, as ANCOVA does in regression, the Cox model gives adjusted relative risks. We can also use a variety of methods to select the independent variables that add significantly to the prediction of the outcome, as in multiple regression; however, a chi-square test (instead of the F test) is used to test for significance.

The Cox proportional hazard model involves a complicated exponential equation (Cox, 1972). Although we will not go into detail about the mathematics involved in this model, its use is so common in medicine that an understanding of the process is needed by readers of the literature. Our primary focus is on the application and interpretation of the Cox model.

Understanding the Cox Model

Recall from Chapter 9 that the survival function gives the probability that a person will survive the next interval of time, given that he or she has survived up until that time. The hazard function, also defined in Chapter 9, is in some ways the opposite: it is the probability that a person will die (or that there will be a failure) in the next interval of time, given that he or she has survived until the beginning of the interval. The hazard function plays a key role in the Cox model.

The Cox model examines two pieces of information: the amount of time since the event first happened to a person and the person's observations on the independent variables. Using the Crook example, the amount of time might be 3 years, and the observations would be the patient's Gleason score, T classification, whether he had been treated with hormones, and the two PSA scores (pretreatment and lowest posttreatment). In the Cox model, the length of time is evaluated using the hazard function, and the linear combination of the independent values (like the linear combination we obtain when we use multiple regression) is the exponent of e, the base of the natural logarithm. For example, for the Crook study, the model is written as

$$h(t \mid X_1, \ldots, X_5) = h_0(t)\,\exp(b_1X_1 + b_2X_2 + b_3X_3 + b_4X_4 + b_5X_5)$$

where $h_0(t)$ is the baseline hazard and $X_1$ through $X_5$ denote the Gleason score, T classification, hormone treatment, pretreatment PSA, and lowest posttreatment PSA.

In words, the model is saying that the probability of dying in the next time interval, given that the patient has lived until this time and has the given values for Gleason score, T classification, and so on, can be found by multiplying the baseline hazard (h0) by e, the base of the natural logarithm, raised to the power of the linear combination of the independent variables. In other words, a given person's probability of dying is influenced by how commonly patients die and the given person's individual characteristics. If we take the antilog of the linear combination, we multiply rather than add the values of the covariates. In this model, the covariates have a multiplicative, or proportional, effect on the probability of dying; hence the term “proportional hazard” model.
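A short derivation may make the word “proportional” more concrete. Suppose, purely for illustration, that A and B are two hypothetical patients with covariate values $X_{A1}, \ldots, X_{Ak}$ and $X_{B1}, \ldots, X_{Bk}$, and that $h_0(t)$ is the baseline hazard. Dividing one patient's hazard by the other's cancels the baseline hazard:

$$\frac{h(t \mid X_A)}{h(t \mid X_B)} = \frac{h_0(t)\,\exp(b_1X_{A1} + \cdots + b_kX_{Ak})}{h_0(t)\,\exp(b_1X_{B1} + \cdots + b_kX_{Bk})} = \exp[\,b_1(X_{A1} - X_{B1}) + \cdots + b_k(X_{Ak} - X_{Bk})\,]$$

The ratio does not depend on t, so under the model the two hazards stay in the same proportion at every point in time. If the patients differ by one unit on a single covariate, the ratio reduces to exp(b) for that covariate.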

An Example of the Cox Model

In the study described in Presenting Problem 3, Crook and her colleagues (1997) used the Cox proportional hazard model to examine the relationship between pretreatment PSA and posttreatment PSA nadir and treatment failure in men with prostate carcinoma following treatment with radiotherapy. Failure was categorized as biochemical, local, or distant. The investigators wanted to control for possible confounding variables, including the Gleason score and the T classification (both measures of severity) and whether the patient received hormones prior to the radiotherapy. The outcome is a censored variable, the amount of time before the treatment fails, so the Cox proportional hazard model is the appropriate statistical method. We use the results of analysis using SPSS, given in Table 10-10, to point out some salient features of the method.

Both numerical and nominal variables can be used as independent variables in the Cox model. If the variables are nominal, it is necessary to tell the computer program so they can be properly analyzed. SPSS prints this information. PRERTHOR, pretreatment hormone therapy, is recoded so that 0 = no and 1 = yes. Prior to doing the analysis, we recoded the Gleason score into a variable called GSCORE with two values: 0 for Gleason scores 2–6 and 1 for Gleason scores 7–10. The T classification variable, TUMSTAGE, was recoded by the computer program using dummy variable coding. Note that for four values of TUMSTAGE, only three variables are needed, with the three more advanced stages compared with the lowest stage, T1b-2.
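The dummy coding that SPSS applies to TUMSTAGE can be mimicked with a small helper. This is a sketch only: the function name is hypothetical, and the stage labels are written with plain hyphens for convenience.

```python
# Tumor stage categories, with the lowest stage used as the reference group.
TUMOR_STAGES = ["T1b-2", "T2a", "T2b-c", "T3-4"]


def dummy_code(value, levels, reference):
    """Return one 0/1 indicator per non-reference level, in the order of `levels`."""
    if value not in levels:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == level else 0 for level in levels if level != reference]


# Four categories need only three indicator variables.
for stage in TUMOR_STAGES:
    print(stage, dummy_code(stage, TUMOR_STAGES, reference="T1b-2"))
```

A patient in the reference stage receives zeros on all three indicators, which is why each coefficient compares a more advanced stage with T1b–2.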

Among the 207 men in the study, 68 had experienced a failure by the time the data were analyzed. The authors reported a median follow-up of 36 months with a range of 12 to 70 months. The log likelihood statistic, which SPSS reports as –2 log likelihood (–2LL), is used to evaluate the significance of the overall model; smaller values of –2LL indicate that the data fit the model better. The change in –2LL between the initial (null) model, in which no independent variables are included in the equation, and the model after the variables are entered is calculated. In this example, the change is 72.706 (see Table 10-10), and it is the basis of the chi-square statistic used to determine the significance of the model. The significance is reported, as often occurs with computer programs, as 0.0000.
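The chi-square statistic can be verified directly from the two –2 log likelihood values printed in Table 10-10. The degrees of freedom equal the number of coefficients added to the model: one each for GSCORE, PRERTHOR, PRERXPSA, and NADIRPSA, and three for the dummy variables that represent TUMSTAGE.

$$\chi^2 = (-2LL_{\text{initial}}) - (-2LL_{\text{model}}) = 649.655 - 576.950 = 72.705, \qquad df = 7$$

The difference in the third decimal place from the 72.706 reported by SPSS presumably reflects rounding of the printed values.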

In addition to testing the overall model, it is possible to test each independent variable to see if it adds significantly to the prediction of failure. Were any of the potentially confounding variables significant? The significance of TUMSTAGE requires some explanation. The variable itself is significant, with P = 0.0025 (shaded in the table). The TUMSTAGE(3) variable (which indicates the patient has T3–4 stage tumor), however, is the one that really matters because it is the only significant stage (P = 0.0066). Note that Gleason score and hormone therapy were not significant. Was either of the PSA values important in predicting failure? It appears that the pretreatment PSA is not significant, but the lowest PSA (NADIRPSA) reached following treatment has a very low P value.

As in logistic regression, the regression coefficients in the Cox model can be interpreted in terms of relative risks or odds ratios (by finding the antilog) if they are based on independent binary variables, such as hormone therapy. For this reason, many researchers divide independent variables into two categories, as we did with Gleason score, even though this practice can be risky if the correct cutpoint is not selected. The T classification variable was recoded as three dummy variables to facilitate interpretation in terms of odds ratios for each stage. The odds ratios are listed under the column titled “Exp (B)” in Table 10-10. Using the T3–4 stage (TUMSTAGE(3)) as an illustration, the antilog of the regression coefficient, 1.5075, is exp (1.5075) = 4.5156. Note that the 95% confidence interval goes from approximately 1.52 to 13.40; because this interval does not contain 1, the odds ratio is statistically significant (consistent with the P value).
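The confidence limits for TUMSTAGE(3) can be reproduced from its coefficient and standard error in Table 10-10, assuming the usual large-sample interval of the coefficient plus or minus 1.96 standard errors, followed by taking the antilog:

$$\exp(1.5075 \pm 1.96 \times 0.5548) = \exp(1.5075 \pm 1.0874) = \bigl(\exp(0.4201),\ \exp(2.5949)\bigr) \approx (1.52,\ 13.40)$$

The interval is symmetric around the coefficient on the log scale but not around Exp (B) itself, which is why the upper limit lies much farther from 4.52 than the lower limit does.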

Table 10-10. Results from Cox proportional hazard model using both pretreatment and posttreatment variables.

Indicator Parameter Coding

| Variable | Value | Frequency | (1) | (2) | (3) |
|---|---|---|---|---|---|
| PRERTHOR | 1.00 | 44 | 0.000 | | |
| | 2.00 | 163 | 1.000 | | |
| GSCORE (Recoded Gleason score) | 2–6 | 168 | 0.000 | | |
| | 7–10 | 39 | 1.000 | | |
| TUMSTAGE (Tumor stage) | T1b–2 | 34 | 0.000 | 0.000 | 0.000 |
| | T2a | 34 | 1.000 | 0.000 | 0.000 |
| | T2b–c | 79 | 0.000 | 1.000 | 0.000 |
| | T3–4 | 60 | 0.000 | 0.000 | 1.000 |

Dependent Variable: TIMEANYF

| Events | Censored |
|---|---|
| 68 | 139 (67.1%) |

Beginning block number 0. Initial log likelihood function

-2 Log likelihood: 649.655

Beginning block number 1. Method: Enter

Variable(s) entered at step number 1: GSCORE (Recoded Gleason score), TUMSTAGE (Tumor stage), PRERTHOR, PRERXPSA, NADIRPSA

Coefficients converged after seven iterations.

-2 Log likelihood: 576.950

| | χ2 | df | Significance |
|---|---|---|---|
| Overall (score) | 274.737 | 7 | 0.0000 |
| Change (–2LL) from previous block | 72.706 | 7 | 0.0000 |
| Change (–2LL) from previous step | 72.706 | 7 | 0.0000 |

Variables in the equation

| Variable | B | SE | Wald | df | Significance | R |
|---|---|---|---|---|---|---|
| GSCORE | 0.4420 | 0.2843 | 2.4172 | 1 | 0.1200 | 0.0253 |
| TUMSTAGE | | | 14.3608 | 3 | 0.0025 | 0.1134 |
| TUMSTAGE (1) | -0.0224 | 0.7077 | 0.0010 | 1 | 0.9747 | 0.0000 |
| TUMSTAGE (2) | 0.8044 | 0.5506 | 2.1342 | 1 | 0.1440 | 0.0144 |
| TUMSTAGE (3) | 1.5075 | 0.5548 | 7.3828 | 1 | 0.0066 | 0.0910 |
| PRERTHOR | -0.1348 | 0.3168 | 0.1811 | 1 | 0.6704 | 0.0000 |
| PRERXPSA | 0.0040 | 0.0029 | 1.8907 | 1 | 0.1691 | 0.0000 |
| NADIRPSA | 0.0769 | 0.0115 | 44.7491 | 1 | 0.0000 | 0.2565 |

| Variable | Exp (B) | 95% CI for Exp (B): Lower | 95% CI for Exp (B): Upper |
|---|---|---|---|
| GSCORE | 1.5558 | 0.8912 | 2.7161 |
| TUMSTAGE (1) | 0.9778 | 0.2443 | 3.9140 |
| TUMSTAGE (2) | 2.2353 | 0.7597 | 6.5770 |
| TUMSTAGE (3) | 4.5156 | 1.5221 | 13.3962 |
| PRERTHOR | 0.8739 | 0.4697 | 1.6260 |
| PRERXPSA | 1.0040 | 0.9983 | 1.0098 |
| NADIRPSA | 1.0799 | 1.0559 | 1.1045 |

df = degrees of freedom; SE = standard error; CI = confidence interval; Wald = statistic used by SPSS to test the significance of variances.
Source: Data, used with permission, from Crook JM, Bahadur YA, Bociek RG, Perry GA, Robertson SJ, Esche BA: Radiotherapy for localized prostate carcinoma. Cancer 1997; 79: 328–336. Output produced using SPSS 10.0, a registered trademark of SPSS, Inc; used with permission.
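To see how the coefficients in Table 10-10 combine for an individual patient, the sketch below computes a hazard relative to a reference patient. The function and dictionary names are hypothetical. The last coefficient row of the printed output carries no label; we assume it belongs to NADIRPSA because exp (0.0769) matches the 1.0799 listed for NADIRPSA under Exp (B).

```python
import math

# Coefficients (column B) copied from Table 10-10.
# Assumption: the unlabeled final row (B = 0.0769) is NADIRPSA.
COX_COEFFICIENTS = {
    "GSCORE": 0.4420,
    "TUMSTAGE(1)": -0.0224,
    "TUMSTAGE(2)": 0.8044,
    "TUMSTAGE(3)": 1.5075,
    "PRERTHOR": -0.1348,
    "PRERXPSA": 0.0040,
    "NADIRPSA": 0.0769,
}


def relative_hazard(patient, reference=None):
    """Hazard of `patient` relative to `reference`; covariates not listed count as 0."""
    reference = reference or {}
    linear = sum(
        b * (patient.get(name, 0) - reference.get(name, 0))
        for name, b in COX_COEFFICIENTS.items()
    )
    # The baseline hazard cancels in the ratio, leaving exp(linear difference).
    return math.exp(linear)


# Gleason score 7-10 and stage T3-4, compared with a patient in both reference
# categories who has the same hormone status and PSA values.
high_risk = relative_hazard({"GSCORE": 1, "TUMSTAGE(3)": 1})
print(round(high_risk, 2))
```

Because the effects multiply, the result equals the product of the two Exp (B) values, 1.5558 × 4.5156. Whether such a combined figure is meaningful depends on the individual coefficients; GSCORE, for example, was not statistically significant in this model.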

Crook and colleagues (1997) also computed the Cox model using only the variables known prior to treatment (see Exercise 8).

Importance of the Cox Model

The Cox model is very useful in medicine, and it is easy to see why it is being used with increasing frequency. It provides the only valid method of predicting a time-dependent outcome, and many health-related outcomes are related to time. If the independent variables are divided into two categories (dichotomized), the exponential of the regression coefficient, exp (b), is the odds ratio, a useful way to interpret the risk associated with any specific factor. In addition, the Cox model provides a method for producing survival curves that are adjusted for confounding variables. The Cox model can be extended to the case of multiple events for a subject, but that topic is beyond our scope. Investigators who have repeated measures in a time-to-survival study are encouraged to consult a statistician.

META-ANALYSIS

Meta-analysis is a way to combine results of several independent studies on a specific topic. Meta-analysis is different from the methods discussed in the preceding sections because its purpose is not to identify risk factors or to predict outcomes for individual patients; rather, this technique is applicable to any research question. We briefly introduced meta-analysis in Chapter 2. Because we could not talk about it in detail until the basics of statistical tests (confidence limits, P values, etc) were explained, we include it in this chapter. It is an important technique increasingly used for studies in health, and it can be looked on as an extension of multivariate analysis.

The idea of summarizing a set of studies in the medical literature is not new; review articles have long had an important role in helping practicing physicians keep up to date and make sense of the many studies on any given topic. Meta-analysis takes the review article a step further by using statistical procedures to combine the results from different studies. Glass (1977) developed the technique because many research projects are designed to answer similar questions, but they do not always come to similar conclusions. The problem for the practitioner is to determine which study to believe, a problem unfortunately too familiar to readers of medical research reports.

Sacks and colleagues (1987) reviewed meta-analyses of clinical trials and concluded that meta-analysis has four purposes: (1) to increase statistical power by increasing the sample size, (2) to resolve uncertainty when reports do not agree, (3) to improve estimates of effect size, and (4) to answer questions not posed at the beginning of the study. Purpose 3 requires some expansion because the concept of effect size is central to meta-analysis. Cohen (1988) developed this concept and defined effect size as the degree to which the phenomenon is present in the population. An effect size may be thought of as an index of how much difference exists between two groups—generally, a treatment group and a control group. The effect size is based on means if the outcome is numerical, on proportions or odds ratios if the outcome is nominal, or on correlations if the outcome is an association. The effect sizes themselves are statistically combined in meta-analysis.

Veenstra and colleagues (1999) used meta-analysis to evaluate the efficacy of impregnating central venous catheters with an antiseptic. They examined the literature, using manual and computerized searches, for publications containing the words chlorhexidine, antiseptic, and catheter and found 215 studies. Of these, 24 were comparative studies in humans. Nine studies were eliminated because they were not randomized, and another two were excluded based on the criteria for defining catheter colonization and catheter-related bloodstream infection. Ten studies examined both outcomes, two examined only catheter colonization, and one reported only catheter-related bloodstream infection.

Two authors independently read and evaluated each article. They reviewed the sample size, patient population, type of catheter, catheterization site, other interventions, duration of catheterization, reports of adverse events, and several other variables describing the incidence of colonization and catheter-related bloodstream infection. The authors also evaluated the appropriateness of randomization, the extent of blinding, and the description of eligible subjects. Discrepancies between the reviewers were resolved by a third author. Some basic information about the studies evaluated in this meta-analysis is given in Table 10-11.

The authors of the meta-analysis article calculated the odds ratios and 95% confidence intervals for each study and used a statistical method to determine summary odds ratios over all the studies. These odds ratios and intervals for the outcome of catheter-related bloodstream infection are illustrated in Figure 10-5. This figure illustrates the typical way findings from meta-analysis studies are presented. Generally the results from each study are shown, and the summary or combined results are given at the bottom of the figure. When the summary statistic is the odds ratio, a line representing the value of 1 is drawn to make it easy to see which of the studies have a significant outcome.

From the data in Table 10-11 and Figure 10-5, it appears that only one study (of the 11) reported a statistically significant outcome because only one has a confidence interval that does not contain 1. The entire confidence interval in Maki and associates' study (1997) is less than 1, indicating that these investigators found a protective effect when using the treated catheters. Of interest is the summary odds ratio, which illustrates that by pooling the results from 11 studies, treating the catheters appears to be beneficial. Several of the studies had relatively small sample sizes, however, and the failure to find a significant difference may be due to low power. Using meta-analysis to combine the results from these studies can provide insight on this issue.

A meta-analysis does not simply add the means or proportions across studies to determine an “average” mean or proportion. Although several different methods can be used to combine results, they all use the same principle of determining an effect size in each study and then combining the effect sizes in some manner. The methods for combining the effect sizes include the z approximation for comparing two proportions (Chapter 6); the t test for comparing two means (Chapter 6); the P values for the comparisons; and the odds ratio, as shown in Veenstra and colleagues' study (1999). The values corresponding to the effect size in each study are the numbers combined in the meta-analysis to provide a pooled (overall) P value or confidence interval for the combined studies. The most commonly used method for reporting meta-analyses in the medical literature is the odds ratio with confidence intervals.
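One simple way to combine odds ratios is inverse-variance (fixed-effect) pooling on the log scale, sketched below. This is an illustration of the general principle, not necessarily the method Veenstra and colleagues used; the function names are hypothetical, and the 2 × 2 counts are invented for the example rather than taken from any study.

```python
import math


def log_odds_ratio(a, b, c, d):
    """Log odds ratio and its variance for one 2 x 2 table.

    a, b: events and non-events in the treatment group
    c, d: events and non-events in the control group
    (A table with a zero cell would need a continuity correction, omitted here.)
    """
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def pooled_odds_ratio(tables, z=1.96):
    """Fixed-effect pooled odds ratio with confidence limits.

    Each study's log odds ratio is weighted by the inverse of its variance,
    so larger (more precise) studies count for more.
    """
    total_weight = 0.0
    weighted_sum = 0.0
    for table in tables:
        log_or, variance = log_odds_ratio(*table)
        weight = 1 / variance
        total_weight += weight
        weighted_sum += weight * log_or
    pooled = weighted_sum / total_weight
    standard_error = math.sqrt(1 / total_weight)
    return (
        math.exp(pooled),
        math.exp(pooled - z * standard_error),
        math.exp(pooled + z * standard_error),
    )


# Hypothetical counts (events, non-events) for treatment and control groups.
hypothetical_studies = [(5, 95, 10, 90), (3, 47, 6, 44), (8, 192, 12, 188)]
summary_or, lower, upper = pooled_odds_ratio(hypothetical_studies)
print(f"Pooled OR = {summary_or:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

The pooled interval is narrower than the interval from any single study, which is how a meta-analysis can find a significant effect when the individual studies do not.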

In addition to being potentially useful when published studies reach conflicting conclusions, meta-analysis can help raise issues to be addressed in future clinical trials. The procedure is not, however, without its critics, and readers should be aware of some of the potential problems in its use. To evaluate meta-analysis, LeLorier and associates (1997) compared the results of a series of large randomized, controlled trials with relevant previously published meta-analyses. Their results were mixed: They found that meta-analysis accurately predicted the outcome in only 65% of the studies; however, the difference between the trial results and the meta-analysis results was statistically significant in only 12% of the comparisons. Ioannidis and colleagues (1998) determined that the discrepancies in the conclusions were attributable to different disease risks, different study protocols, varying quality of the studies, and possible publication bias (discussed in a following section). These reports serve as a useful reminder that well-designed clinical trials remain a critical source of information.

Studies designed in dissimilar manners should not be combined. In performing a meta-analysis, investigators should use clear and well-accepted criteria for deciding whether studies should be included in the analysis, and these criteria should be stated in the published meta-analysis.

Most meta-analyses are based on the published literature, and some people believe it is easier to publish studies with positive results than studies that show no difference. This potential problem is called publication bias. Researchers can take at least three important steps to reduce publication bias. First, they can search for unpublished data, typically done by contacting the authors of published articles. Veenstra and his colleagues (1999) did this and contacted the manufacturer of the treated catheters as well but were unable to identify any unpublished data. Second, researchers can perform an analysis to see how sensitive the conclusions are to certain characteristics of the studies. For instance, Veenstra and colleagues assessed sources of heterogeneity or variation among the studies and reported that excluding these studies had no substantive effect on the conclusions. Third, investigators can estimate how many studies showing no difference would have to be done but not published to raise the pooled P value above the 0.05 level or produce a confidence interval that includes 1 so that the combined results would no longer be significant. The reader can have more confidence in the conclusions from a meta-analysis that finds a significant effect if a large number of unpublished negative studies would be required to repudiate the overall significance. The increasing use of computerized patient databases may lessen the effect of publication bias in future meta-analyses. Montori and colleagues (2000) provide a review of publication bias for clinicians.
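The third step can be illustrated with one common approach, Rosenthal's fail-safe N, which works from the z statistic of each study. The text does not say which method any particular meta-analysis used, so this is only a sketch; the function name is hypothetical and the z values are invented for the example.

```python
def fail_safe_n(z_scores, z_critical=1.645):
    """Rosenthal's fail-safe N.

    Estimates how many unpublished studies averaging z = 0 would be needed
    to bring the combined one-tailed P value up to 0.05 (z_critical = 1.645).
    A negative result means the combined result is not significant to begin with.
    """
    k = len(z_scores)
    return (sum(z_scores) ** 2) / (z_critical ** 2) - k


# Hypothetical z statistics from four published studies.
hypothetical_z = [2.0, 2.0, 2.0, 2.0]
n_unpublished = fail_safe_n(hypothetical_z)
print(round(n_unpublished, 1))
```

The larger this number is relative to the number of published studies, the less plausible it becomes that unpublished negative studies alone could overturn the conclusion.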

Table 10-11. Characteristics of studies comparing antiseptic-impregnated with control catheters.

| Study, yᵃ | Number of Catheter Lumens | Patient Population | Catheter Exchangeᵇ | Number of Catheters (Number of Patients): Treatment Group | Number of Catheters (Number of Patients): Control Group | Catheter Duration, Mean, d: Treatment Group | Catheter Duration, Mean, d: Control Group | Outcome Definitions: Catheter Colonizationᶜ | Outcome Definitions: Catheter-Related Bloodstream Infectionᵈ |
|---|---|---|---|---|---|---|---|---|---|
| Tennenberg et al, 1997 | 2, 3 | Hospital | No | 137 (137) | 145 (145) | 5.1 | 5.3 | SQ (IV, SC, >15 CFU) | SO (IV, SC, site), CS, NS |
| Maki et al, 1997 | 3 | ICU | Yes | 208 (72) | 195 (86) | 6.0 | 6.0 | SQ (IV, >15 CFU) | SO (>15 CFU, IV, hub, inf)ᵉ |
| van Heerden et al, 1996ᶠ | 3 | ICU | No | 28 (28) | 26 (26) | 6.6 | 6.8 | SQ (IV, >15 CFU) | NR |
| Hannan et al, 1996 | 3 | ICU | NR | 68 (NR) | 60 (NR) | 7 | 8 | SQ (IV, >10³ CFU)ᵍ | SQ (IV, >10³ CFU), NS |
| Bach et al, 1994ᶠ | 3 | ICU | No | 14 (14) | 12 (12) | 7.0 | 7.0 | QN (IV, >10³ CFU) | NR |
| Bach et al, 1996ᶠ | 2, 3 | Surgical | No | 116 (116) | 117 (117) | 7.7 | 7.7 | QN (IV, >10³ CFU) | SO (IV) |
| Heard et al, 1998ᶠ | 3 | SICU | Yes | 151 (107) | 157 (104) | 8.5 | 9 | SQ (IV, SC, >14 CFU) | SO (IV, SC, >4 CFU) |
| Collin, 1999 | 1, 2, 3 | ED/ICU | Yes | 98 (58) | 139 (61) | 9.0 | 7.3 | SQ (IV, SC, >15 CFU) | SO (IV, SC) |
| Ciresi et al, 1996ᶠ | 3 | TPN | Yes | 124 (92) | 127 (99) | 9.6 | 9.1 | SQ (IV, SC, >15 CFU) | SO (IV, SC) |
| Pemberton et al, 1996 | 3 | TPN | No | 32 (32) | 40 (40) | 10 | 11 | NR | SO (IV), res, NS |
| Ramsay et al, 1994ᵉ | 3 | Hospital | No | 199 (199) | 189 (189) | 10.9 | 10.9 | SQ (IV, SC, >15 CFU) | SO (IV, SC) |
| Trazzera et al, 1995ᵉ | 3 | ICU/BMT | Yes | 123 (99) | 99 (82) | 11.2 | 6.7 | SQ (IV, >15 CFU) | SO (IV, >15 CFU) |
| George et al, 1997 | 3 | Transplant | No | 44 (NR) | 35 (NR) | NR | NR | SQ (IV, >5 CFU) | SO (IV) |

ᵃ Readers should refer to the original article for these citations.
ᵇ Catheter exchange was performed using a guide wire.
ᶜ Catheter segments cultured and criteria for positive culture are given in parentheses.
ᵈ Catheter segment or site cultured and criteria for positive culture are given in parentheses.
ᵉ Organism identity was confirmed by restriction-fragment subtyping.
ᶠ Additional information was provided by author (personal communications, Jan 1998–Mar 1998).
ᵍ Culture method is reported as semiquantitative; criteria for culture growth suggest quantitative method.

NR = not reported; ICU = intensive care unit; SICU = surgical intensive care unit; TPN = total parenteral nutrition; BMT = bone marrow transplant; ED = emergency department; hospital, hospitalwide or a variety of settings; SQ = semiquantitative culture; QN = quantitative culture; CFU = colony-forming units; IV = intravascular catheter segment; SC = subcutaneous catheter segment; site = catheter insertion site; hub = catheter hub; inf = catheter infusate; SO = same organism isolated from blood and catheter; CS = clinical symptoms of systemic infection; res = resolution of symptoms on catheter removal; and NS = no other sources of infection.

Source: Table 1 from Veenstra DL, Saint S, Saha S, Lumley T, Sullivan SD: Efficacy of antiseptic-impregnated central venous catheters in preventing catheter-related bloodstream infection. JAMA 1999; 281: 261–267. Copyright © 1999, American Medical Association; used with permission.

Figure 10-5. Analysis of catheter-related bloodstream infection in trials comparing chlorhexidine/silver sulfadiazine-impregnated central venous catheters with nonimpregnated catheters. The diamond indicates odds ratio (OR) and 95% confidence interval (CI). Studies are ordered by increasing mean duration of catheterization in the treatment group. The size of the squares is inversely proportional to the variance of the studies. (Reproduced, with permission, from Veenstra DL, Saint S, Saha S, Lumley T, Sullivan SD: Efficacy of antiseptic-impregnated central venous catheters in preventing catheter-related bloodstream infection. JAMA 1999;281:261–267. Copyright © 1999, American Medical Association.)
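The weighting idea behind a figure of this kind, in which studies with smaller variance count for more, can be sketched with a fixed-effect, inverse-variance pooling of log odds ratios. This is only an illustration under stated assumptions: the trial counts below are hypothetical, the function names are invented for the example, and the sketch is not claimed to be the pooling method used by Veenstra and colleagues.

```python
import math

def log_odds_ratio(events_t, n_t, events_c, n_c):
    """Return (log OR, variance) for one trial, using the Woolf variance estimate."""
    a, b = events_t, n_t - events_t   # treatment group: events, non-events
    c, d = events_c, n_c - events_c   # control group: events, non-events
    log_or = math.log((a * d) / (b * c))
    variance = 1 / a + 1 / b + 1 / c + 1 / d
    return log_or, variance

def pooled_odds_ratio(trials):
    """Fixed-effect pooled OR and 95% CI; each trial is weighted by 1 / variance."""
    stats = [log_odds_ratio(*trial) for trial in trials]
    weights = [1 / variance for _, variance in stats]
    pooled = sum(w * lo for w, (lo, _) in zip(weights, stats)) / sum(weights)
    se = math.sqrt(1 / sum(weights))
    return math.exp(pooled), math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se)

# Hypothetical trials (NOT the data in Table 10-11):
# (infections in treatment, n treatment, infections in control, n control)
hypothetical_trials = [(4, 100, 8, 100), (3, 150, 9, 150), (5, 120, 7, 120)]
or_pooled, ci_low, ci_high = pooled_odds_ratio(hypothetical_trials)
print(f"pooled OR = {or_pooled:.2f}, 95% CI {ci_low:.2f} to {ci_high:.2f}")
```

The Woolf variance breaks down when any cell count is zero; in practice a continuity correction or a different pooling method would then be needed.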

The Cochrane Collection is a large and growing database of meta-analyses that were done according to specific guidelines. Each meta-analysis contains a description and an assessment of the methods used in the articles that constitute the meta-analysis. Graphs such as Figure 10-5 are produced, and, if appropriate, graphs for subanalyses are presented. For instance, if both cohort studies and clinical trials have been done on a given topic, the Cochrane Collection presents a separate figure for each. The Cochrane Collection is available on CD-ROM or via the Internet for an annual fee. The Cochrane Web site states that: “Cochrane reviews (the principal output of the Collaboration) are published electronically in successive issues of The Cochrane Database of Systematic Reviews. Preparation and maintenance of Cochrane reviews is the responsibility of international collaborative review groups.”

No one has argued that meta-analyses should replace clinical trials. Veenstra and his colleagues (1999) conclude that a large trial may be warranted to confirm their findings. Despite their shortcomings, meta-analyses can provide guidance to clinicians when the literature contains several studies with conflicting results, especially when the studies have relatively small sample sizes. Furthermore, based on the increasingly large number of published meta-analyses, it appears that this method is here to stay. As with all types of studies, however, the methods used in a meta-analysis need to be carefully assessed before the results are accepted.

METHODS FOR CLASSIFICATION

Several multivariate methods can be used when the research question is related to classification. When the goal is to classify subjects into groups, discriminant analysis, cluster analysis, and propensity score analysis are appropriate. These methods all involve multiple measurements on each subject, but they have different purposes and are used to answer different research questions.

Discriminant Analysis

Logistic regression is used extensively in the biologic sciences. A related technique, discriminant analysis, although used with less frequency in medicine, is a common technique in the social sciences. It is similar to logistic regression in that it is used to predict a nominal or categorical outcome. It differs from logistic regression, however, in that it assumes that the independent variables follow a multivariate normal distribution, so it must be used with caution if some X variables are nominal.

The procedure involves determining several discriminant functions, which are simply linear combinations of the independent variables that separate or discriminate among the groups defined by the outcome measure as much as possible. The number of discriminant functions needed is determined by a multivariate test statistic called Wilks' lambda. The discriminant functions' coefficients can be standardized and then interpreted in the same manner as in multiple regression to draw conclusions about which variables are important in discriminating among the groups.
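For the simplest case of two groups and two measurements, a single discriminant function can be sketched as follows. The example uses Fisher's linear discriminant with a pooled covariance matrix; the athlete measurements, group labels, and function names are hypothetical and are not taken from any study cited here. With more than two groups, several functions are needed and statistical software would normally be used.

```python
def mean_vector(rows):
    """Column means of a list of (x1, x2) observations."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def pooled_covariance(group_a, group_b):
    """Pooled within-group 2x2 covariance matrix for two groups."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for rows in (group_a, group_b):
        m = mean_vector(rows)
        for r in rows:
            for i in range(2):
                for j in range(2):
                    s[i][j] += (r[i] - m[i]) * (r[j] - m[j])
    dof = len(group_a) + len(group_b) - 2
    return [[s[i][j] / dof for j in range(2)] for i in range(2)]

def discriminant_coefficients(group_a, group_b):
    """Fisher's linear discriminant: w = S_pooled^{-1} (mean_a - mean_b)."""
    s = pooled_covariance(group_a, group_b)
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det], [-s[1][0] / det, s[0][0] / det]]
    ma, mb = mean_vector(group_a), mean_vector(group_b)
    diff = [ma[0] - mb[0], ma[1] - mb[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]

def score(w, x):
    """Value of the discriminant function (a linear combination) for one subject."""
    return w[0] * x[0] + w[1] * x[1]

# Hypothetical (height in cm, body mass in kg) for two made-up groups
skaters = [(158, 50), (160, 52), (155, 48), (162, 53), (157, 51)]
volleyball = [(172, 62), (175, 66), (170, 60), (178, 68), (174, 63)]
w = discriminant_coefficients(skaters, volleyball)
# Midpoint between the two group means on the discriminant scale
cutoff = (score(w, mean_vector(skaters)) + score(w, mean_vector(volleyball))) / 2
```

A subject whose score lies above `cutoff` resembles the first group more closely, and one below it resembles the second group. The coefficients here are unstandardized; to compare the importance of variables, they would first be standardized as the text describes.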

Leone and coworkers (2002) wanted to identify characteristics that differentiate among expert adolescent female athletes in four different sports. Body mass, height, girth of the biceps and calf, skinfold measures, measures of aerobic power, and flexibility were among the measures they examined. Sports included were tennis with 15 girls, skating with 46, swimming with 23, and volleyball with 16. Discriminant analysis is useful when investigators want to evaluate several explanatory variables and the goal is to classify subjects into two or more categories or groups, such as those defined by the four sports.

Their analysis revealed three significant discriminant functions. The first function discriminated between skaters and the other three groups; the second reflected differences between volleyball players and swimmers, and the third between swimmers and tennis players. They concluded that adolescent female athletes show physical and biomotor differences that distinguish among them according to their sport.

Although discriminant analysis is most often employed to explain or describe factors that distinguish among groups of interest, the procedure can also be used to classify future subjects. Classification involves determining a separate prediction equation corresponding to each group that gives the probability of belonging to that group, based on the explanatory variables. For classification of a future subject, a prediction is calculated for each group, and the individual is classified as belonging to the group he or she most closely resembles.

Factor Analysis

Andrewes and colleagues (2003) wanted to know how scores on the Emotional and Social Dysfunction Questionnaire (ESDQ) can be used to help decide the level of support needed following brain surgery. Similarly, the Medical Outcomes Study Short Form 36 (MOS-SF36) is a questionnaire commonly used to measure patient outcomes (Stewart et al, 1988). In examples such as these, tests with a large number of items are developed, patients or other subjects take the test, and scores on various items are combined to produce scores on the relevant factors.

The MOS-SF36 is probably used more frequently than any other questionnaire to measure functional outcomes and quality of life; it has been used all over the world and in patients with a variety of medical conditions. The questionnaire contains 36 items that are combined to produce a patient profile on eight concepts: physical functioning, role-physical, bodily pain, general health, vitality, social functioning, role-emotional, and mental health. The first four concepts are combined to give a measure of physical health, and the last four concepts are combined to give a measure of mental health. The developers used factor analysis to decide how to combine the questions to develop these concepts.

In a research problem in which factor analysis is appropriate, all variables are considered to be independent; in other words, there is no desire to predict one on the basis of others. Conceptually, factor analysis works as follows: First, a large number of people are measured on a set of items; a rule of thumb calls for at least ten times as many subjects as items. The second step involves calculating correlations. To illustrate, suppose 500 patients answered the 36 questions on the MOS-SF36. Factor analysis answers the question of whether some of the items group together in a logical way, such as items that measure the same underlying component of physical activity. If two items measure the same component, they can be expected to have higher correlations with each other than with other items.

In the third step, factor analysis manipulates the correlations among the items to produce linear combinations, similar to a regression equation without the dependent variable. The difference is that each linear combination, called a factor, is determined so that the first one accounts for the most variation among the items, the second factor accounts for the most residual variation after the first factor is taken into consideration, and so forth. Typically, a small number of factors account for enough of the variation among subjects that it is possible to draw inferences about a patient's score on a given factor. For example, it is much more convenient to refer to scores for physical functioning, role-physical, bodily pain, and so on, than to refer to scores on the original 36 items. Thus, the fourth step involves determining how many factors are needed and how they should be interpreted.
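The second and third steps can be sketched on a toy data set: compute the correlations among items, then find the linear combination that accounts for the most variation. The sketch below extracts only the first principal component of the correlation matrix by power iteration, which is one simple way to obtain a first factor and not a full factor analysis (no rotation, no communality estimates). The responses and function names are hypothetical.

```python
import math

def correlation_matrix(data):
    """Pearson correlations among the columns (items) of data."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in data) / (n - 1))
           for j in range(p)]
    corr = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            cov = sum((row[i] - means[i]) * (row[j] - means[j]) for row in data) / (n - 1)
            corr[i][j] = cov / (sds[i] * sds[j])
    return corr

def first_component(corr, iterations=200):
    """Largest eigenvalue and its eigenvector, found by power iteration."""
    p = len(corr)
    v = [1.0] + [0.0] * (p - 1)          # arbitrary starting direction
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(corr[i][j] * v[j] for j in range(p)) for i in range(p))
    return eigenvalue, v

# Hypothetical responses of 6 subjects to 4 items; items 1-2 rise together
# while items 3-4 fall, so one underlying factor should dominate
responses = [
    (1, 2, 5, 4), (2, 2, 4, 5), (3, 4, 3, 3),
    (4, 3, 3, 2), (5, 5, 1, 1), (4, 5, 2, 1),
]
corr = correlation_matrix(responses)
eigenvalue, weights = first_component(corr)
share = eigenvalue / len(corr)   # proportion of total standardized variance explained
```

Six subjects is far too few for a real analysis (the rule of thumb above asks for ten subjects per item); the data set is kept tiny only so the arithmetic is easy to follow.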

Andrewes and colleagues analyzed the ESDQ, a questionnaire designed for brain-damaged populations.

They performed a factor analysis of the ratings by the partner or caretaker of 211 patients. They found that the relationships among the questions could be summarized by eight factors, including anger, helplessness, emotional dyscontrol, indifference, inappropriateness, fatigue, maladaptive behavior, and insight. The researchers subsequently used the scores on the factors for a discriminant analysis to differentiate between the brain-damaged patients and a control group with no cerebral dysfunction and found significant discrimination.

Investigators who use factor analysis usually have an idea of what the important factors are, and they design the items accordingly. Many other issues are of concern in factor analysis, such as how to derive the linear combinations, how many factors to retain for interpretation, and how to interpret the factors. Using factor analysis, as well as the other multivariate techniques, requires considerable statistical skill.

Cluster Analysis

A statistical technique similar conceptually to factor analysis is cluster analysis. The difference is that cluster analysis attempts to find similarities among the subjects that were measured instead of among the measures that were made. The object in cluster analysis is to determine a classification or taxonomic scheme that accounts for variance among the subjects. Cluster analysis can also be thought of as similar to discriminant analysis, except that the investigator does not know to which group the subjects belong. As in factor analysis, all variables are considered to be independent variables.

Cluster analysis is frequently used in archeology and paleontology to determine if the existence of similarities in objects implies that they belong to the same taxon. Biologists use this technique to help determine classification keys, such as using leaves or flowers to determine appropriate species. A study by Penzel and colleagues (2003) used cluster analysis to examine the relationships among chromosomal imbalances in thymic epithelial tumors. Journalists and marketing analysts also use cluster analysis, referred to in these fields as Q-type factor analysis, as a way to classify readers and consumers into groups with common characteristics.
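To make the idea of grouping subjects without known group labels concrete, here is a minimal sketch of k-means, one of several clustering algorithms (the studies cited above may well have used other methods, such as hierarchical clustering). The subjects, their two measurements, and the starting centers are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two subjects' measurements."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(points, centers, iterations=20):
    """Assign each subject to the nearest center, then move centers to cluster means."""
    centers = [tuple(c) for c in centers]
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(len(centers)), key=lambda k: squared_distance(p, centers[k]))
                  for p in points]
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old center if a cluster happens to be empty
                centers[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centers

# Hypothetical subjects measured on two variables
subjects = [(1.0, 1.2), (0.8, 1.0), (1.2, 0.9), (5.0, 5.1), (5.2, 4.8), (4.9, 5.3)]
labels, centers = k_means(subjects, centers=[subjects[0], subjects[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The number of clusters and the starting centers are choices the analyst must make; different choices can produce different groupings.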

Propensity Scores

The propensity score method is an alternative to multiple regression and analysis of covariance. It provides a creative method to control for an entire group of confounding variables. Conceptually, a propensity score is found by using the confounding variables as predictors of the group to which a subject belongs; this step is generally accomplished by using logistic regression. For example, many cohort studies are handicapped by the problem of many confounding variables, such as age, gender, race, comorbidities, and so forth. Once group membership is known for the subjects in the cohort (for example, exposed or not exposed to the risk factor), the confounding variables are used to develop a logistic regression equation that predicts the group to which each patient belongs. This predicted probability, based on a combination of the confounding variables, is calculated for all subjects and then used as a single confounding variable in subsequent analyses. Developers of the technique maintain it does a better job of controlling for confounding variables (Rubin, 1997). See Katzan and colleagues (2003) for an example of the application of propensity score analysis in a clinical study to determine the effect of pneumonia on mortality in patients with acute stroke.
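As a rough sketch of the first step, the code below fits a logistic regression of group membership on two confounders and returns each subject's predicted probability of belonging to the exposed group, which serves as the propensity score. The cohort is hypothetical, the fitting routine is a bare-bones gradient ascent written for illustration (statistical packages use more refined algorithms), and the function names are invented for the example.

```python
import math

def fit_logistic(x_rows, y, learning_rate=0.1, iterations=5000):
    """Fit logistic regression by batch gradient ascent; returns [intercept, b1, ...]."""
    p = len(x_rows[0])
    coef = [0.0] * (p + 1)
    n = len(y)
    for _ in range(iterations):
        grad = [0.0] * (p + 1)
        for row, target in zip(x_rows, y):
            z = coef[0] + sum(c * v for c, v in zip(coef[1:], row))
            error = target - 1 / (1 + math.exp(-z))   # observed minus predicted
            grad[0] += error
            for j, v in enumerate(row):
                grad[j + 1] += error * v
        coef = [c + learning_rate * g / n for c, g in zip(coef, grad)]
    return coef

def propensity(coef, row):
    """Predicted probability of belonging to the exposed group."""
    z = coef[0] + sum(c * v for c, v in zip(coef[1:], row))
    return 1 / (1 + math.exp(-z))

# Hypothetical cohort: (age in decades minus 7, number of comorbidities)
confounders = [(0.5, 2), (1.0, 3), (0.2, 1), (-0.3, 2), (0.8, 0),
               (-1.0, 0), (-0.5, 1), (0.1, 0), (-0.8, 1), (0.4, 2)]
exposed = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]   # 1 = exposed group, 0 = comparison group
coef = fit_logistic(confounders, exposed)
scores = [propensity(coef, row) for row in confounders]
```

The resulting `scores` would then replace the individual confounders in the outcome analysis, for example as a covariate or as the basis for matching or stratifying subjects.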

Classification and Regression Tree (CART) Analysis

Classification and regression tree (CART) analysis is an approach to analyzing large databases to find significant patterns and relationships among variables. The patterns are then used to develop predictive models for classifying future subjects. As an example, CART was used in a study of 105 patients with stage IV colon or rectal cancer (Dixon et al, 2003). CART identified optimal cut points for carcinoembryonic antigen (CEA) and albumin (ALB) to form four groups of patients: low CEA with high ALB, low CEA with low ALB, high CEA with high ALB, and high CEA with low ALB. A survival analysis (Kaplan–Meier) was then used to compare survival times in these four groups. In another application of CART analysis, researchers were successful in determining the values of semen measurements that discriminate between fertile and infertile men (Guzick et al, 2001). The method requires special software and extensive computing power.
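The search for an optimal cut point on a single variable can be sketched as follows. The example tries every midpoint between adjacent observed values and keeps the one that minimizes the weighted Gini impurity of the two resulting groups, which is one common splitting criterion. The marker values and outcomes are hypothetical, and the criterion is an assumption for illustration rather than the one used in the studies cited above. A full tree would repeat the search within each resulting group.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 outcomes (0 means the group is pure)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_cut_point(values, labels):
    """Return (cut, weighted impurity) for the best split into value <= cut vs value > cut."""
    pairs = sorted(zip(values, labels))
    distinct = sorted(set(values))
    best = (None, float("inf"))
    for low, high in zip(distinct, distinct[1:]):
        cut = (low + high) / 2   # candidate cut: midpoint of adjacent values
        left = [lab for v, lab in pairs if v <= cut]
        right = [lab for v, lab in pairs if v > cut]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if impurity < best[1]:
            best = (cut, impurity)
    return best

# Hypothetical marker values and outcomes (1 = event, 0 = no event)
marker = [2.0, 3.5, 4.0, 5.5, 9.0, 12.0, 15.0, 20.0]
event = [0, 0, 0, 0, 1, 0, 1, 1]
cut, impurity = best_cut_point(marker, event)
print(cut, impurity)  # 7.25 0.1875
```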

MULTIPLE DEPENDENT VARIABLES

Multivariate analysis of variance and canonical correlation are similar to each other in that they both involve multiple dependent variables as well as multiple independent variables.

Multivariate Analysis of Variance

Multivariate analysis of variance (MANOVA) conceptually (although not computationally) is a simple extension of the ANOVA designs discussed in Chapter 7 to situations in which two or more dependent variables are included. As with ANOVA, MANOVA is appropriate when the independent variables are nominal or categorical and the outcomes are numerical. If the results from the MANOVA are statistically significant, using the multivariate statistic called Wilks' lambda, follow-up ANOVAs may be done to investigate the individual outcomes.

Weiner and Rudy (2002) wanted to identify nursing home resident and staff attitudes that are barriers to effective pain management. They collected information from nurses, nursing assistants, and residents in seven long-term care facilities. They designed questionnaires to collect beliefs about 12 components of chronic pain management and administered them to these three groups. They wanted to know if there were attitudinal differences among the three groups on the 12 components. Had they used univariate analysis of variance, they would have needed 12 separate ANOVAs, which increases the probability that at least one component appears significant by chance alone. With these multiple dependent variables, they correctly chose to use MANOVA. Results indicated that residents believed that chronic pain does not change, and they were fearful of addiction. The nursing staff believed that many complaints were unheard by busy staff. Note that this study used a nested design (patients and staff within nursing homes) and would be a candidate for GEE or multilevel model analysis.

The motivation for doing MANOVA prior to univariate ANOVA is similar to the reason for performing univariate ANOVA prior to t tests: to avoid doing many significance tests, which increases the likelihood that a chance difference is declared significant. In addition, MANOVA permits the statistician to look at complex relationships among the dependent variables. The results from MANOVA are often difficult to interpret, however, and it is used sparingly in the medical literature.
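The size of the multiple-testing problem can be illustrated with a short calculation. Assuming, purely for simplicity, that the 12 tests are independent and each uses a significance level of 0.05 (in a real study the components are likely correlated, so this is only an approximation), the chance of at least one spurious significant result is:

```python
def familywise_error(alpha, tests):
    """Chance of at least one false-positive result among independent tests."""
    return 1 - (1 - alpha) ** tests

print(round(familywise_error(0.05, 12), 3))  # 0.46
```

Under these assumptions, nearly half of such studies would report at least one chance finding even if no true differences existed.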

Canonical Correlation Analysis

Canonical correlation analysis also involves both multiple independent and multiple dependent variables. This method is appropriate when both the independent variables and the outcomes are numerical, and the research question focuses on the relationship between the set of independent variables and the set of dependent variables. For example, suppose researchers wish to examine the overall relationship between indicators of health outcome (physical functioning, mental health, health perceptions, age, gender, etc) measured at the beginning of a study and the set of outcomes (physical functioning, mental health, social contacts, serious symptoms, etc) measured at the end of the study. Canonical correlation analysis forms a linear combination of the independent variables to predict not just a single outcome measure, but a linear combination of outcome measures. The two linear combinations of independent variables and dependent variables, each resulting in a single number (or index), are determined so the correlation between them is as large as possible. The correlation between the pair of linear combinations (or numbers or indices) is called the canonical correlation. Then, as in factor analysis, a second pair of linear combinations is derived from the residual variation after the effect of the first pair is removed, and the third pair from those remaining, and so on. The canonical coefficients in the linear combinations are interpreted in the same manner as regression coefficients in a multiple regression equation, and the canonical correlations as multiple R. Generally, the first two or three pairs of linear combinations account for sufficient variation, and they can be interpreted to gain insights about related factors or dimensions.
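In symbols, the definition above can be written compactly. The notation is an assumption introduced here for illustration: X is the vector of independent variables, Y the vector of dependent variables, and a_k and b_k are the canonical coefficient vectors for the k-th pair.

```latex
U_k = a_k^{\top} X, \qquad V_k = b_k^{\top} Y, \qquad
\rho_k = \operatorname{corr}(U_k, V_k)

(a_1, b_1) = \arg\max_{a,\,b} \; \operatorname{corr}\!\left(a^{\top} X,\; b^{\top} Y\right)

\text{for } k > 1:\ (a_k, b_k) \text{ maximize } \operatorname{corr}\!\left(a^{\top} X,\; b^{\top} Y\right)
\text{ subject to } \operatorname{corr}(U_k, U_j) = \operatorname{corr}(V_k, V_j) = 0 \text{ for all } j < k
```

Here the first canonical correlation is the largest correlation attainable between any pair of linear combinations, and each later pair is constrained to be uncorrelated with the pairs already extracted.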

The relationship between personality and symptoms of depression was studied in a community-based sample of 804 individuals. Grucza and colleagues (2003) used the Temperament and Character Inventory (TCI) to assess personality and the Center for Epidemiologic Studies Depression scale (CES-D) to measure symptoms of depression. Both of these questionnaires contain multiple scales or factors, and the authors used canonical correlation analysis to learn how the factors on the TCI are related to the factors on the CES-D. They discovered several relationships and concluded that depression symptom severity and patterns are partially explained by personality traits.

The advanced methods presented in this chapter are used in approximately 10–15% of the articles in medical and surgical journals. Unfortunately for readers of the medical literature, these methods are complex and not easy to understand, and they are not always described adequately. As with other complex statistical techniques, investigators should consult with a statistician if an advanced statistical method is planned. Table 10-2 gives a guide to the selection of the appropriate method(s), depending on the number of independent variables and the scale on which they are measured.

EXERCISES

1. Using the following formula, verify the adjusted mean number of ventricular wall motion abnormalities in smokers and nonsmokers from the hypothetical data in the section titled, “Controlling for Confounding.” That is,

2. Blood flow through an artery measured as peak systolic velocity (PSV) increases with narrowing of the artery. The well-known relationship between area of the arterial vessels and velocity of blood flow is important in the use of carotid Doppler measurements for grading stenosis of the artery. Alexandrov and collaborators (1997) examined 80 bifurcations in 40 patients and compared the findings from the Doppler technique with two angiographic methods of measuring carotid stenosis (the North American or NASCET [N] method and the common carotid [C or CSI] method). They investigated the fit provided by a linear equation, a quadratic equation, and a cubic equation.

a. Using data in the file “Alexandrov” on the CD-ROM, produce a scatterplot with PSV on the y-axis and CSI on the x-axis. How do you interpret the scatterplot?

b. Calculate the correlation of PSV with the N method and with the C method. Which method is more highly related to PSV?

c.  Perform a multiple regression to predict PSV from CSI using linear and quadratic terms.

d. Using the regression equation, what is the predicted PSV if the measurement of angiographic stenosis using the CSI method is 60%?

3. Refer to the study by Soderstrom and coinvestigators (1997). Find the probability that a 27-year-old Caucasian man who comes to the emergency department on Saturday night has a BAC ≥50 mg/dL.

4. Refer to the study by Soderstrom and coinvestigators (1997). From Table 10-8, find the value of the kappa statistic for the agreement between the predicted and actual number of males with unintentional injuries who have a BAC ≥ 50 mg/dL when they come to the emergency department.

5. Bale and associates (1986) performed a study to consider the physique and anthropometric variables of athletes in relation to their type and amount of training and to examine these variables as potential predictors of distance running performance. Sixty runners were divided into three groups: (1) elite runners with 10-km runs in less than 30 min; (2) good runners with 10-km times between 30 and 35 min, and (3) average runners with 10-km times between 35 and 45 min. Anthropometric data included body density, percentage fat, percentage absolute fat, lean body mass, ponderal index, biceps and calf circumferences, humerus and femur widths, and various skinfold measures. The authors wanted to determine whether the anthropometric variables were able to differentiate between the groups of runners. What is the best method to use for this research question?

Table 10-12. Regression coefficients and t test values for predicting bed-days in RAND study.

Dependent-Variable Equation

| Explanatory Variables and Other Measures (X) | Coefficient (b) | t Test |
|---|---|---|
| Intercept | 0.613 | 22.36 |
| FFS freeplan | -0.017 | -2.17 |
| FFS payplan | -0.014 | -2.18 |
| Personal functioning | -0.0002 | -1.35 |
| Mental health | -0.00006 | 0.25 |
| Health perceptions | -0.002 | -5.17 |
| Age | -0.0001 | -0.54 |
| Male | -0.026 | -4.58 |
| Income | -0.021 | -1.65 |
| Three-year term | 0.002 | 0.44 |
| Took physical | -0.003 | -0.56 |
| Bed-day00 | 0.105 | 6.15 |
| Sample size | 1568 | |
| R² | 0.12 | |
| Residual standard error | 0.01 | |

FFS = fee for service.

Source: Reproduced, with permission, from Ware JE, Brook RH, Rogers WH, Keeler EB, Davie AR, Sherbourne CD, et al: Health Outcomes for Adults in Prepaid and Fee-for-Service Systems of Care. (R–3459–HHS.) Santa Monica, CA: The RAND Corporation, 1987, p. 59.

Table 10-13. Values for prediction equation.

| Variable | Value |
|---|---|
| Personal functioning | 80 |
| Mental health | 80 |
| Health perceptions | 75 |
| Age | 70 |
| Income | 10 (from a formula used in the RAND study) |
| Three-year term* | Yes |
| Took physical* | Yes |
| Bed-day00 | 14 |

* Indicates a dummy variable with 1 = yes and 0 = no.

Table 10-14. Regression results for predicting depression at wave 2.

| Predictor Variable [a] | b | Beta [b] | P | R² | R² Change |
|---|---|---|---|---|---|
| Depression score, wave 1 | 0.267 | 0.231 | 0.000 | 0.182 | 0.182 |
| **Sociodemographic** | | | | | |
| Age | -0.014 | -0.024 | 0.538 | 0.187 | 0.005 |
| Sex | 0.165 | 0.034 | 0.370 | | |
| **Psychologic health** | | | | | |
| Neuroticism, wave 1 | 0.067 | 0.077 | 0.056 | 0.237 | 0.050 |
| Past history of depression | 0.320 | 0.136 | 0.000 | | |
| **Physical health** | | | | | |
| ADL, wave 1 | -0.154 | -0.103 | 0.033 | 0.411 | 0.174 |
| ADL, wave 2 | 0.275 | 0.283 | 0.012 | | |
| ADL², wave 2 | -0.013 | -0.150 | 0.076 | | |
| Number of current symptoms, wave 2 | 0.115 | 0.117 | 0.009 | | |
| Number of medical conditions, wave 2 | 0.309 | 0.226 | 0.000 | | |
| BP, systolic, wave 2 | -0.010 | -0.092 | 0.010 | | |
| Global health rating change | 0.284 | 0.079 | 0.028 | | |
| Sensory impairment change | -0.045 | -0.064 | 0.073 | | |
| **Social support/inactivity** | | | | | |
| Social support (friends), wave 2 | -1.650 | -0.095 | 0.015 | 0.442 | 0.031 |
| Social support (visits), wave 2 | -1.229 | -0.087 | 0.032 | | |
| Activity level, wave 2 | 0.061 | 0.095 | 0.025 | | |
| Services (community residents only), wave 2 | 0.207 [c] | 0.135 [c] | 0.001 [c] | 0.438 [c] | 0.015 [c] |

[a] Only those variables are shown that were included in the final model.
[b] Standardized beta value, controlling for all other variables in the regression, except service use. Based on community and institutional residents.
[c] Regression limited to community sample only; coefficients for other variables vary only very slightly from those obtained with regression on the full sample.

ADL = activities of daily living; BP = blood pressure.

Source: Table 3 from the article was modified with the addition of unstandardized regression coefficients; used, with permission, from Henderson AS, Korten AE, Jacomb PA, MacKinnon AJ, Jorm AF, Christensen H, et al: The course of depression in the elderly: A longitudinal community-based study in Australia. Psychol Med 1997;27:119–129.

6. Ware and collaborators (1987) reported a study of the effects on health for patients in health maintenance organizations (HMO) and for patients in fee-for-service (FFS) plans. Within the FFS group, some patients were randomly assigned to receive free medical care and others shared in the costs. The health status of the adults was evaluated at the beginning and again at the end of the study. In addition, the number of days spent in bed because of poor health was determined periodically throughout the study. These measures, recorded at the beginning of the study—along with information on the participant's age, gender, income, and the system of health care to which he or she was assigned (HMO, free FFS, or pay FFS)—were the independent variables used in the study. The dependent variables were the values of these same 13 measures at the end of the study. The results from a multiple-regression analysis to predict number of bed days are given in Table 10-12.

Use the regression equation to predict the number of bed-days during a 30-day period for a 70-year-old woman in the FFS pay plan who has the values on the independent variables shown in Table 10-13 (asterisks [*] designate dummy variables given a value of 1 if yes and 0 if no).

7. Symptoms of depression in the elderly may be more subtle than in younger patients, but recognizing depression in the elderly is important because it can be treated. Henderson and colleagues in Australia (1997) studied a group of more than 1000 elderly, all age 70 years or older. They examined the outcome of depressive states 3–4 years after initial diagnosis to identify factors associated with persistence of depressive symptoms and to test the hypothesis that depressive symptoms in the elderly are a risk factor for dementia or cognitive decline. They used the Canberra Interview for the Elderly (CIE), which measures depressive symptoms and cognitive performance, and referred to the initial measurement as “wave 1” and the follow-up as “wave 2.” The regression equation predicting depression at wave 2 for 595 people who completed the interview on both occasions is given in Table 10-14, and data are in the file on the CD-ROM entitled, “Henderson.” The variables have been entered into the regression equation in blocks, an example of hierarchical regression.

Table 10-15. Cox proportional hazard model using only pretreatment variables.

-2 Log Likelihood: 610.312

| | χ² | df | Significance |
|---|---|---|---|
| Overall (score) | 51.483 | 6 | 0.0000 |
| Change (-2LL) from previous block | 39.344 | 6 | 0.0000 |
| Change (-2LL) from previous step | 39.344 | 6 | 0.0000 |

Variables in the Equation

| Variable | B | SE | Wald | df | Significance | R |
|---|---|---|---|---|---|---|
| GSCORE | 0.2999 | 0.2818 | 1.1321 | 1 | 0.2873 | 0.0000 |
| TUMSTAGE | | | 12.1032 | 3 | 0.0070 | 0.0969 |
| TUMSTAGE (1) | -0.0263 | 0.7075 | 0.0014 | 1 | 0.9703 | 0.0000 |
| TUMSTAGE (2) | 1.0141 | 0.5419 | 3.5014 | 1 | 0.0613 | 0.0481 |
| TUMSTAGE (3) | 1.4588 | 0.5535 | 6.9458 | 1 | 0.0084 | 0.0873 |
| PRERTHOR | 0.1332 | 0.3262 | 0.1668 | 1 | 0.6830 | 0.0000 |
| PRERXPSA | 0.0080 | 0.0027 | 9.0391 | 1 | 0.0026 | 0.1041 |

| Variable | Exp (B) | 95% CI for Exp (B), Lower | 95% CI for Exp (B), Upper |
|---|---|---|---|
| GSCORE | 1.3497 | 0.7768 | 2.3450 |
| TUMSTAGE (1) | 0.9740 | 0.2434 | 3.8979 |
| TUMSTAGE (2) | 2.7568 | 0.9530 | 7.9746 |
| TUMSTAGE (3) | 4.3008 | 1.4534 | 12.7264 |
| PRERTHOR | 1.1425 | 0.6028 | 2.1654 |
| PRERXPSA | 1.0080 | 1.0028 | 1.0133 |

df = degrees of freedom; SE = standard error; CI = confidence interval; Wald = statistic used by SPSS to test the significance of variables.

Source: Data, used with permission of the authors and publisher, from Crook JM, Bahadur YA, Bociek RG, Perry GA, Robertson SJ, Esche BA: Radiotherapy for localized prostate carcinoma. Cancer 1997;79:328–336. Output produced using SPSS 10.0, a registered trademark of SPSS, Inc. Used with permission.

9.

a. Based on the regression equation in Table 10-14, what is the relationship between depression score initially and at follow-up?

b. The regression coefficient for age is -0.014. Is it significant? How would you interpret it?

c.  Once a person's depression score at wave 1 is known, which group of variables accounts for more of the variation in depression at wave 2?

d. Use the data on the CD-ROM to replicate the analysis.

10.    Table 10-15 contains the results from an analysis of the data from Crook and colleagues (1997) using only information known before treatment was given.

a. Is the overall Cox model significant when based on pretreatment variables only? What level of significance is reported?

b. Were any of the potentially confounding variables significant?

c.  Confirm the value of the odds ratios associated with the TUMSTAGE(3) variable of T classifications, and interpret the confidence interval.

Table 10-16. Case summaries. [a]

| Case | Final Height SDS | Age | Dose | Father's Height | Mother's Height | Height SDS Chronologic Age | Height SDS Bone Age |
|---|---|---|---|---|---|---|---|
| 1 | -2.18 | 6.652 | 20.00 | -2.14 | -2.20 | -3.07 | -1.75 |
| 2 | -2.15 | 10.383 | 16.00 | -2.78 | -0.15 | -2.20 | -2.43 |
| 3 | -1.65 | 10.565 | 16.00 | -1.91 | -1.87 | -2.62 | -0.90 |
| 4 | -1.18 | 10.104 | 15.00 | 0.84 | -1.63 | -2.43 | -2.06 |
| 5 | -1.31 | 11.145 | 14.00 | -1.80 | -0.70 | -1.92 | -1.49 |
| 6 | -1.35 | 9.682 | 14.00 | 0.09 | 0.38 | -1.38 | -0.54 |
| 7 | -1.18 | 9.863 | 16.00 | -0.26 | -1.70 | -2.08 | -1.32 |
| 8 | -2.51 | 9.463 | 15.00 | -3.62 | -0.75 | -2.36 | -0.45 |
| 9 | -1.61 | 7.704 | 19.00 | -2.11 | -1.25 | -1.98 | 1.23 |
| 10 | -2.15 | 5.858 | 25.00 | 1.31 | 0.23 | -2.43 | -1.95 |
| 11 | 0.80 | 5.153 | 22.00 | -0.14 | -1.83 | -2.36 | -0.39 |
| 12 | -0.20 | 6.986 | 21.00 | -0.14 | -1.83 | -2.09 | -0.98 |
| 13 | 0.20 | 8.967 | 17.00 | -0.14 | -1.83 | -1.38 | -0.19 |
| 14 | -0.71 | 6.970 | 21.00 | 0.50 | 1.63 | -1.09 | 0.49 |
| 15 | -1.71 | 6.515 | 13.00 | 0.84 | -0.10 | -1.98 | -0.18 |
| 16 | -2.32 | 7.548 | 21.00 | -2.89 | -0.20 | -3.30 | -2.25 |
| Total N | 16 | 16 | 16 | 16 | 16 | 16 | 16 |

[a] Limited to first 100 cases.

SDS = standard deviation score.

Source: Data, used with permission, from Hindmarsh PC, Brook CGD: Final height of short normal children treated with growth hormone. Lancet 1996;348:13–16. Table produced using SPSS 10.0; used with permission.

d. What are the major differences in this analysis compared with the one that included posttreatment variables as well?

11.    Hindmarsh and Brook (1996) examined the final height of 16 short children who were treated with growth hormone. They studied several variables they thought might predict height in these children, such as the mother's height, the father's height, the child's chronologic and bone age, dose of the growth hormone during the first year, age at the start of therapy, and the peak response to an insulin-induced hypoglycemia test. All anthropometric indices were expressed as standard deviation scores; these scores express height in terms of standard deviations from the mean in a norm group. For example, a height score of -2.00 indicates the child is 2 standard deviations below the mean height for his or her age group.

Data are given in Table 10-16 and in a file entitled “Hindmarsh” on the CD-ROM.

a. Use the data to perform a stepwise regression and interpret the results. We reproduced a portion of the output in Table 10-17.

b.

Table 10-17. Results from stepwise multiple regression to predict final height in standard deviation scores. [a]

| Model | Variable | B (Unstandardized Coefficient) | SE | β (Standardized Coefficient) | t | Significance |
|---|---|---|---|---|---|---|
| 1 | (Constant) | -1.055 | 0.248 | | -4.261 | 0.001 |
| | Father's height | 0.302 | 0.142 | 0.494 | 2.126 | 0.052 |
| 2 | (Constant) | -1.335 | 0.284 | | -4.705 | 0.000 |
| | Father's height | 0.337 | 0.135 | 0.552 | 2.503 | 0.026 |
| | Mother's height | -0.343 | 0.200 | -0.378 | -1.715 | 0.110 |
| 3 | (Constant) | 0.205 | 0.734 | | 0.280 | 0.785 |
| | Father's height | 0.211 | 0.131 | 0.345 | 1.612 | 0.133 |
| | Mother's height | -0.478 | 0.185 | -0.527 | -2.581 | 0.024 |
| | Height SDS chronologic age | 0.820 | 0.368 | 0.505 | 2.230 | 0.046 |
| 4 | (Constant) | -1.110 | 0.927 | | -1.198 | 0.256 |
| | Father's height | 0.128 | 0.124 | 0.210 | 1.035 | 0.323 |
| | Mother's height | -0.559 | 0.170 | -0.617 | -3.284 | 0.007 |
| | Height SDS chronologic age | 1.132 | 0.363 | 0.697 | 3.116 | 0.010 |
| | Dose | 0.104 | 0.052 | 0.385 | 2.009 | 0.070 |
| 5 | (Constant) | -1.138 | 0.929 | | -1.225 | 0.244 |
| | Mother's height | -0.575 | 0.170 | -0.634 | -3.381 | 0.005 |
| | Height SDS chronologic age | 1.325 | 0.313 | 0.816 | 4.229 | 0.001 |
| | Dose | 0.121 | 0.049 | 0.451 | 2.487 | 0.029 |

[a] Dependent variable: final height SDS.

SE = standard error; SDS = standard deviation scores.

Note: Because the sample size is small, we set probability for variables to enter the regression equation at 0.15 and for variables to be removed at 0.20.

Source: Data, used with permission, from Hindmarsh PC, Brook CGD: Final height of short normal children treated with growth hormone. Lancet 1996;348:13–16. Stepwise regression results produced using SPSS; used with permission.

c.  What variable entered the equation on the first iteration (model 1)? Why do you think it entered first?

d. What variables are in the equation at the final model? Which of these variables makes the greatest contribution to the prediction of final height?

e. Why do you think the variable that entered the equation first is not in the final model?

f.   Using the regression equation, what is the predicted height of the first child? How close is this to the child's actual final height (in SDS scores)?

Footnotes

a. Technically it is possible for the regression coefficient and the correlation to have different signs. If so, the variable is called a moderator variable; it affects the relationship between the dependent variable and another independent variable.

b. The standardized coefficient = the unstandardized coefficient multiplied by the standard deviation of the X variable and divided by the standard deviation of the Y variable: $\beta_j = b_j \,(SD_X / SD_Y)$.
