Thompson & Thompson Genetics in Medicine, 8th Edition

CHAPTER 8. Complex Inheritance of Common Multifactorial Disorders

Common diseases such as congenital birth defects, myocardial infarction, cancer, neuropsychiatric disorders, diabetes, and Alzheimer disease cause morbidity and premature mortality in nearly two of every three individuals during their lifetimes (Table 8-1). Many of these diseases “run in families”—cases seem to cluster among the relatives of affected individuals more frequently than in the general population. However, their inheritance generally does not follow one of the mendelian patterns seen in the single-gene disorders described in Chapter 7. This is because such diseases rarely result simply from inheriting one or two alleles of major effect at a single locus, as is the case for dominant and recessive mendelian disorders. Instead, they are thought to result from complex interactions among a number of genetic variants that alter susceptibility to disease, combined with certain environmental exposures and perhaps chance events as well, all of which acting together may trigger, accelerate, or protect against the disease process. For this reason, these disorders are considered to be multifactorial in origin, and the familial clustering generates a pattern of inheritance that is referred to as complex.


TABLE 8-1 Frequency of Different Types of Genetic Disease


Data from Rimoin DL, Connor JM, Pyeritz RE: Emery and Rimoin's principles and practice of medical genetics, ed 3, Edinburgh, 1997, Churchill Livingstone.

The familial clustering and complex inheritance seen with multifactorial disorders can be explained by recognizing that family members share a greater proportion of their genetic information and environmental exposures than individuals chosen at random in the population. Thus the relatives of an affected individual are more likely to experience the same gene-gene and gene-environment interactions that led to disease in the proband than are individuals who are unrelated to the proband.

In this chapter, we first address the question of how we infer that gene variants in the population predispose to such common diseases. We then describe how studies of familial aggregation and twin studies are used by geneticists to quantify the relative contributions of genetic variation and environment and show how these tools have been applied to multifactorial diseases. Finally, we devote the remainder of the chapter to describing a few examples of complex disorders where information is beginning to emerge about the specific nature of the genetic and environmental contributions to disease.

As we shall see in this chapter, the individual genes, the particular variants in those genes, and the environmental factors that interact with these variants have not yet been fully identified for the vast majority of common multifactorial diseases. A more detailed understanding of the approaches that geneticists use to identify the genetic factors underlying complex disease first requires a full appreciation of the distribution of genetic variation in different populations. This topic is presented in Chapter 9, after which we will turn, in Chapter 10, to discussion of the specific population-based epidemiological approaches that geneticists are using to identify the particular genes and the variants in those genes responsible for an increasing number of conditions with complex inheritance.

Ultimately, finding the genes and their variants that interact with the environment to contribute to susceptibility will give us a better understanding of the underlying processes leading to common multifactorial diseases and, perhaps, better tools for prevention or treatment.

Qualitative and Quantitative Traits

Multifactorial diseases with complex inheritance can be classified either as discrete qualitative traits or as continuous quantitative traits. A qualitative trait is the simpler of the two; a disease, such as lung cancer or rheumatoid arthritis, is either present or absent in an individual. Distinguishing between individuals who either have a disease or not is usually straightforward, but it may sometimes require detailed examination or specialized testing if the manifestations are subtle.

In contrast, a quantitative trait is a measurable physiological or biochemical quantity, such as height, blood pressure, serum cholesterol concentration, or body mass index, that varies among different individuals within a population. Although a quantitative trait varies continuously across a range of values, there are certain disease diagnoses, such as short stature, hypertension, hypercholesterolemia, or obesity, that are defined based on whether the value of the trait falls outside the so-called normal range, defined as an arbitrary interval around the population average. Frequently the normal range is derived by using the normal distribution, which is described in the next section, as an approximation for the distribution of the values of a quantitative trait in the population. Note that the term normal is used here in two different ways. Asserting that a physiological quantity has a “normal” distribution in the population and stating that an individual's value is in the “normal” range are different uses of the same word, one statistical and the other a measure of conformity to what is typically observed.

The Normal Distribution

As is often the case with physiological quantities, such as systolic blood pressure, a graph of the number (or the fraction) of individuals in the population (y-axis) having a particular quantitative value (x-axis) approximates the familiar, bell-shaped curve known as the normal (or gaussian) distribution (Fig. 8-1A). The position of the peak and the width of the curve of the normal distribution are governed by two quantities, the mean (µ) and the variance (σ²), respectively. The mean is the arithmetic average of the values, and because more people have values for the trait near the average, the curve ordinarily has its peak at the mean value. The variance (or its square root, σ, the standard deviation, abbreviated SD) is a measure of how much spread there is in the values to either side of the mean and therefore determines the breadth of the curve.


FIGURE 8-1 A, The normal gaussian distribution, with mean (average) and standard deviations (SDs) indicated. For many traits, the “normal” range is considered the mean ± 2 SD, as indicated by the shaded region. B, Distribution of systolic blood pressure in approximately 3300 men aged 40 to 45 (solid line) and approximately 2200 men aged 50 to 55 (dotted line). The mean and ± 2 SD are shown above double-headed arrows. See Sources & Acknowledgments.

Any physiological quantity that can be measured across a sample of a population is a quantitative phenotype, and the mean and variance for that sample can be calculated and used to approximate the underlying mean and variance of the population from which the sample was drawn. For example, the systolic blood pressure of thousands of men in two different age-groups is shown in Figure 8-1B. The distribution of systolic blood pressure in the younger cohort is nearly symmetrical; in the older age-group, however, the curve becomes more “skewed” (asymmetrical), with more individuals with systolic blood pressures above the mean than below, indicating a tendency toward hypertension in that age-group.

The normal distribution provides guidelines for setting the limits of the normal range. A normal range is often defined as the values of a quantitative trait that are seen in approximately 95% of the population. Basic statistical theory states that when the values of a quantitative trait in a population follow the bell-shaped curve (i.e., are normally distributed), approximately 5% of the population will have measurements more than 2 SD above or below the population mean. For a given individual, however, a value outside the “normal” range may still be perfectly “normal” (i.e., the individual is in good health).
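The mean ± 2 SD rule can be checked numerically. The following is a minimal sketch, assuming a trait that is exactly normally distributed; the mean of 120 mm Hg and SD of 10 mm Hg are hypothetical values chosen for illustration and are not taken from Figure 8-1B.

```python
from statistics import NormalDist

# Hypothetical values for illustration only (not taken from Fig. 8-1B).
mean_sbp = 120.0  # assumed population mean, mm Hg
sd_sbp = 10.0     # assumed standard deviation, mm Hg

dist = NormalDist(mu=mean_sbp, sigma=sd_sbp)

# "Normal" range defined as the mean plus or minus 2 SD.
lower = mean_sbp - 2 * sd_sbp
upper = mean_sbp + 2 * sd_sbp

# Fraction of a normally distributed population that falls outside that range:
# the lower tail below `lower` plus the upper tail above `upper`.
fraction_outside = dist.cdf(lower) + (1.0 - dist.cdf(upper))

print(f"normal range: {lower:.0f} to {upper:.0f} mm Hg")
print(f"fraction outside the range: {fraction_outside:.4f}")
```

Under these assumptions the fraction outside the range is about 4.6%, which is the "approximately 5%" quoted in the text. The result does not depend on the particular mean and SD chosen, only on the assumption of normality.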

Familial Aggregation and Correlation

Allele Sharing among Relatives

The more closely related two individuals are in a family, the more alleles they have in common, inherited from their common ancestors (see Chapter 7). The most extreme examples of two individuals having alleles in common are identical (monozygotic [MZ]) twins (see later in this chapter), who have the same alleles at every locus. The next most closely related individuals in a family are first-degree relatives, such as a parent and child or a pair of sibs, including fraternal (dizygotic [DZ]) twins. In a parent-child pair, the child has exactly one allele out of two (50% of alleles) in common with each parent at every locus, that is, the allele the child inherited from that parent. Siblings (including DZ twins) also have 50% of their alleles in common with their other siblings, but this is only on average. This is because a pair of sibs inherits the same two alleles at a locus one fourth of the time, no alleles in common one fourth of the time, and one allele in common one half of the time (Fig. 8-2). At any one locus therefore, the average number of alleles an individual is expected to share with a sibling is given by:

$$\tfrac{1}{4}\,(2\ \text{alleles}) + \tfrac{1}{2}\,(1\ \text{allele}) + \tfrac{1}{4}\,(0\ \text{alleles}) = 1\ \text{allele}$$

The more distantly related two members of a family are, the fewer alleles they will have in common, inherited from a common ancestor.


FIGURE 8-2 Allele sharing at an arbitrary locus between sibs concordant for a disease. The parents' genotypes are shown as A1A2 for the father and A3A4 for the mother. All four possible genotypes for sib #1 are given across the top of the table, and all four possible genotypes for sib #2 are given along the left side of the table. The numbers inside the boxes represent the number of alleles both sibs have in common for all 16 different combinations of genotypes for both sibs. For example, the upper left-hand corner has the number 2 because sib #1 and sib #2 both have the genotype A1A3 and so have both A1 and A3 alleles in common. The bottom left-hand corner contains the number 0 because sib #1 has genotype A1A3, whereas sib #2 has genotype A2A4, so there are no alleles in common.
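The 16 combinations tabulated in Figure 8-2 can be enumerated directly. The sketch below assumes, as in the figure, a father with genotype A1A2 and a mother with genotype A3A4, with each child receiving one allele from each parent with equal probability; the variable and function names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

father = ("A1", "A2")
mother = ("A3", "A4")

# Each child receives one paternal and one maternal allele: 4 possible genotypes.
child_genotypes = list(product(father, mother))


def shared_alleles(sib1, sib2):
    """Number of alleles two sibs have in common at this locus."""
    return len(set(sib1) & set(sib2))


# Tabulate allele sharing over all 16 equally likely sib-sib combinations.
counts = Counter(
    shared_alleles(s1, s2) for s1, s2 in product(child_genotypes, repeat=2)
)
total = sum(counts.values())
probabilities = {k: Fraction(v, total) for k, v in counts.items()}

# Expected number of shared alleles at the locus.
expected_shared = sum(k * p for k, p in probabilities.items())

for k in sorted(probabilities, reverse=True):
    print(f"{k} alleles shared: {probabilities[k]}")
print(f"expected number shared: {expected_shared}")
```

The enumeration reproduces the one-fourth, one-half, one-fourth split for sharing two, one, and zero alleles, and an average of one allele shared out of two (50%).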

Familial Aggregation in Qualitative Traits

If certain alleles increase the chance of developing a disease, one would expect an affected individual to have a greater-than-expected number of affected relatives compared to what would be predicted from the frequency of the disease in the general population (familial aggregation of disease). This is because the more closely related the family members are to the affected relative, the more they will share the relevant alleles and the greater their chance of also being affected. Here, we will present two approaches to measuring familial aggregation: relative risk ratios and family history case-control studies.

Relative Risk Ratio

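One way to measure familial aggregation of a disease is by comparing the frequency of the disease in the relatives of an affected proband with its frequency (prevalence) in the general population. The relative risk ratio λr (where the subscript “r” refers to relatives) is defined as:

$$\lambda_r = \frac{\text{Prevalence of the disease in the relatives of an affected person}}{\text{Prevalence of the disease in the general population}}$$
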
The value of λr as a measure of familial aggregation depends both on how frequently a disease is found to have recurred in a relative of an affected individual (the numerator) and on the population prevalence (the denominator); the larger λr is, the greater is the familial aggregation. The population prevalence enters into the calculation because the more common a disease is, the greater is the likelihood that aggregation may be just a coincidence based on drawing alleles from the overall gene pool rather than a result of sharing the alleles that predispose to disease because of familial inheritance. A value of λr = 1 indicates that a relative is no more likely to develop the disease than is any individual in the population, whereas a value greater than 1 indicates that a relative is more likely to develop the disease. In practice, one measures λ for a particular class of relatives (e.g., r = s for sibs or r = p for parents). Examples of relative risk ratios determined for various diseases in samples of siblings (thus, λs) are shown in Table 8-2.
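As a rough illustration of how λr behaves, the ratio can be written as a small function. This is a sketch only; the function name and the prevalence figures in the example are hypothetical and are not taken from Table 8-2.

```python
def relative_risk_ratio(prevalence_in_relatives: float,
                        population_prevalence: float) -> float:
    """Relative risk ratio: disease prevalence among a class of relatives of
    affected probands divided by prevalence in the general population."""
    if not 0.0 <= prevalence_in_relatives <= 1.0:
        raise ValueError("prevalence_in_relatives must lie between 0 and 1")
    if not 0.0 < population_prevalence <= 1.0:
        raise ValueError("population_prevalence must be greater than 0 and at most 1")
    return prevalence_in_relatives / population_prevalence


# Hypothetical figures: 4% of sibs of probands affected, 0.2% of the population.
lambda_s = relative_risk_ratio(0.04, 0.002)

# The same 4% recurrence in sibs of a common disease (hypothetical 4% prevalence)
# shows no aggregation at all.
lambda_s_common = relative_risk_ratio(0.04, 0.04)

print(f"rare disease:   lambda_s = {lambda_s:.1f}")
print(f"common disease: lambda_s = {lambda_s_common:.1f}")
```

The second call shows why the population prevalence belongs in the denominator: the same recurrence frequency in sibs gives a ratio of 1 when the disease is equally common in the population at large.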


TABLE 8-2 Risk Ratios λs for Siblings of Probands with Diseases with Familial Aggregation and Complex Inheritance

Manic-depressive (bipolar) disorder
Type 1 diabetes mellitus
Crohn disease
Multiple sclerosis

Data from Rimoin DL, Connor JM, Pyeritz RE: Emery and Rimoin's principles and practice of medical genetics, ed 3, Edinburgh, 1997, Churchill Livingstone; and King RA, Rotter JI, Motulsky AG: The genetic basis of common diseases, ed 2, Oxford, England, 2002, Oxford University Press.

Family History Case-Control Studies

Another approach to assessing familial aggregation is the case-control study, in which patients with a disease (the cases) are compared with suitably chosen individuals without the disease (the controls), with respect to family history of disease (as well as other factors, such as environmental exposures, occupation, geographical location, parity, and previous illnesses). To assess a possible genetic contribution to familial aggregation of a disease, the frequency with which the disease is found in the extended families of the cases (positive family history) is compared with the frequency of positive family history among suitable controls, matched for age and ethnicity, but who do not have the disease. Spouses are often used as controls in this situation because they usually match the cases in age and ethnicity and share the same household environment. Other frequently used controls are patients with unrelated diseases matched for age, occupation, and ethnicity. Thus, for example, in a study of multiple sclerosis (MS), approximately 3.5% of first-degree relatives of patients with MS also had MS, a prevalence that was much higher than among first-degree relatives of matched controls without MS (0.2%). Thus the odds of having a first-degree relative with MS were 18 times higher among MS patients than among controls. (In Chapter 10, we will discuss how one calculates odds ratios in case-control studies.) One can conclude therefore that substantial familial aggregation is occurring in MS, thereby providing evidence of a genetic predisposition to this disease.
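The figure of 18 quoted for MS can be reproduced from the two percentages given above. The sketch below treats 3.5% and 0.2% as the proportions entering the odds for cases and controls, respectively, which is a simplifying assumption; the calculation of odds ratios from case-control data is discussed properly in Chapter 10. The function names are illustrative.

```python
def odds(proportion: float) -> float:
    """Convert a proportion p into odds p / (1 - p)."""
    if not 0.0 <= proportion < 1.0:
        raise ValueError("proportion must be at least 0 and less than 1")
    return proportion / (1.0 - proportion)


def odds_ratio(proportion_cases: float, proportion_controls: float) -> float:
    """Ratio of the odds among cases to the odds among controls."""
    return odds(proportion_cases) / odds(proportion_controls)


# Percentages quoted in the text for first-degree relatives with MS.
ms_odds_ratio = odds_ratio(0.035, 0.002)

print(f"odds ratio: {ms_odds_ratio:.1f}")
```

Because both proportions are small, the odds ratio is close to the simple ratio of the two percentages (3.5/0.2 = 17.5).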

Measuring the Genetic Contribution to Quantitative Traits

Just as a hereditary contribution to a disease increases familial aggregation for that disease, sharing of alleles that govern a particular quantitative trait affects the distribution of values of that trait in family members. The more sharing of alleles that govern a quantitative trait there is among relatives, the more similar the value of the trait will be among family members compared to what would be expected from the variance of the trait measured in the general population. The effect of genetic variation on quantitative traits is often measured and reported in two related ways: correlation between relatives and heritability.

Familial Correlation

The tendency for the values of a physiological measurement to be more similar among relatives than they are in the general population is measured by determining the degree of correlation of particular physiological quantities among relatives. The coefficient of correlation (symbolized by the letter r) is a statistical measure of correlation applied to a pair of measurements, such as, for example, a child's serum cholesterol level and that of a parent. Accordingly, a positive correlation would exist between the cholesterol measurements in a group of patients and the cholesterol levels in their relatives if it is found that the higher a patient's level, the proportionately higher is the level in the patient's relatives. When a correlation exists, a graph of values in the proband and his or her relatives, in which each point represents a proband-relative pair of values, will tend to cluster around a straight line. In such examples, the value of r can range from 0 when there is no correlation to +1 for perfect positive correlation. In the example of serum cholesterol, Figure 8-3 shows a modest positive correlation (r = 0.294) between serum cholesterol level of mothers aged 30 to 39 and those of their male children aged 4 to 9. In contrast, a negative correlation exists when the greater the increase in the patient's measurement, the lower the measurement is in the patient's relatives. The measurements are still correlated, but in the opposite direction. In such a case, the value of r can range from 0 to −1 for a perfect negative correlation.


FIGURE 8-3 Plot of serum cholesterol levels in a group of mothers aged 30 to 39 and in their male children aged 4 to 9. Each dot represents a mother-son pair of measurements. The straight line is a “best fit” through the data points. See Sources & Acknowledgments.
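The coefficient of correlation for a set of proband-relative pairs can be computed as sketched below. The text does not name a specific formula; this sketch assumes the standard Pearson product-moment coefficient, and the mother and son cholesterol values are invented for illustration and are not the data plotted in Figure 8-3.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired measurements."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical mother-son pairs of serum cholesterol values (mg/dL).
mothers = [150, 180, 200, 220, 250]
sons = [140, 150, 170, 160, 190]

r = pearson_r(mothers, sons)
print(f"r = {r:.3f}")
```

Pairs that fall exactly on a rising straight line give r = +1, pairs on a falling line give r = −1, and real family data such as those in Figure 8-3 fall somewhere in between.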


Heritability

The concept of heritability of a quantitative trait (symbolized as H²) was developed in an attempt to determine how much the genetic differences between individuals in a population contribute to variability of that trait in the population. H² is defined as the fraction of the total phenotypic variance of a quantitative trait that is due to allelic variation in the broadest sense, regardless of the mechanism by which the various alleles affect the phenotype. The higher the heritability, the greater is the contribution of genetic differences among people to the variability of the trait in the population. The value of H² varies from 0, if genotype contributes nothing to the total phenotypic variance in a population, to 1, if genotype is totally responsible for the phenotypic variance in that population.

Heritability of a human trait is a theoretical quantity that is usually estimated from the correlation between measurements of that trait among relatives of known degrees of relatedness, such as parents and children, siblings, or, as we shall see later in this chapter, twins.

Determining the Relative Contributions of Genes and Environment to Complex Disease

Distinguishing between Genetic and Environmental Influences Using Family Studies

For both qualitative and quantitative traits, similarities among family members are most likely the result of overlapping genotype and common exposure to nongenetic (i.e., environmental) factors such as socioeconomic status, local environment, dietary habits, or cultural behaviors, all of which are frequently shared among family members but are generally considered to be of nongenetic origin. Given evidence of familial aggregation of a disease or correlation of a quantitative trait, geneticists attempt to separate the relative contributions of genotype and environment to the phenotype using a variety of approaches. One approach is to compare λr measurements or quantitative trait correlations between relatives who are of varying degrees of relatedness to the proband. For example, if genes predispose to a disease, one would expect λr to be greatest for MZ twins, to be somewhat smaller for first-degree relatives such as sibs or parent-child pairs, and to continue to decrease as allele-sharing decreases among the more distant relatives in a family (see Figure 7-3).

To illustrate this approach, consider cleft lip with or without cleft palate, or CL(P), one of the most common congenital malformations, affecting 1.4 per 1000 newborns worldwide. CL(P) originates as a failure of fusion of embryonic tissues that will go to make up the upper lip and the hard palate at approximately the 35th day of gestation. It is a multifactorial disorder with complex inheritance; for reasons that are not well understood, approximately 60% to 80% of those affected with CL(P) are males. Despite the similarity in names, CL(P) is usually etiologically distinct from isolated cleft palate (i.e., without cleft lip).

CL(P) is heterogeneous and includes forms in which the clefting is only one feature of a syndrome that includes other anomalies, known as syndromic CL(P), as well as forms that are not associated with other birth defects, which are known as nonsyndromic CL(P). Syndromic CL(P) can be inherited as a mendelian single-gene disorder or can be caused by chromosome disorders (especially trisomy 13 and 4p deletion syndrome) (see Chapter 6) or teratogenic exposure (rubella embryopathy, thalidomide, or anticonvulsants) (see Chapter 14). Nonsyndromic CL(P) can also be inherited as a single-gene disorder but more commonly is a sporadic occurrence and demonstrates some degree of familial aggregation without an obvious mendelian inheritance pattern.

The risk for CL(P) in a child increases with the number of relatives the child has who are affected with CL(P) and with how closely related they are to the child (Table 8-3). The simplest explanation for this is that the more closely related one is to the proband, and the more probands there are in the family, the more likely one is to share disease-susceptibility alleles with the probands; therefore one's risk for the disorder increases.


TABLE 8-3 Risk for Cleft Lip with or without Cleft Palate in a Child Depending on the Number of Affected Parents and Other Relatives


CL(P), Cleft lip with or without cleft palate.

Another approach is to compare the disease relative risk ratio in biological relatives of the proband with that in biologically unrelated family members (e.g., adoptees or spouses), all living in the same household environment. Returning to MS, for example, λr is 190 for MZ twins and 20 to 40 for first-degree biological relatives (parents, children, and sibs). In contrast, λr is 1 for the adopted siblings of an affected individual, suggesting that most of the familial aggregation in MS is genetic rather than the result of a shared environment. A similar analysis can be carried out for quantitative traits such as blood pressure: no correlation exists between a child's blood pressure and that of his adopted siblings, in contrast to the positive correlation with blood pressure of biological siblings, all living in the same household.

Distinguishing between Genetic and Environmental Influences Using Twin Studies

Of all methods used to separate genetic and environmental influences, geneticists have relied most heavily on twin studies.


MZ and DZ twins are “experiments of nature” that provide an excellent opportunity to separate environmental and genetic influences on phenotypes in humans. MZ twins arise from the cleavage of a single fertilized zygote into two separate zygotes early in embryogenesis (see Chapter 14). They occur in approximately 0.3% of all births, without significant differences among different ethnic groups. At the time the zygote cleaves in two, MZ twins start out with identical genotypes at every locus and are therefore often thought of as having identical genotypes and gene expression patterns.

In contrast, DZ twins arise from the simultaneous fertilization of two eggs by two sperm; genetically, DZ twins are siblings who share a womb and, like all siblings, share, on average, 50% of the alleles at all loci. DZ twins are of the same sex half the time and of opposite sex the other half. In contrast to MZ twins, DZ twins occur with a frequency that varies as much as fivefold in different populations, from a low of 0.2% among Asians to more than 1% of births in parts of Africa and among African Americans.

The striking difference between MZ and DZ twins in their genetic makeup is most easily seen by comparing the pattern for a type of so-called DNA fingerprint in twins (Fig. 8-4). This method of DNA fingerprinting is generated by simultaneously examining many DNA fragments of varying lengths that share a particular DNA sequence (minisatellite) and are located throughout the genome. MZ twins show an indistinguishable pattern, whereas many differences are seen between DZ twins, whether of same sex or not.


FIGURE 8-4 DNA fingerprinting of twins by detecting a variable number tandem repeat polymorphism, a class of polymorphism that has many alleles in loci around the genome due to variation in the number of copies repeated in tandem (see Chapter 4). Each pair of lanes contains DNA from a set of twins. The twins of the first and third sets have identical DNA fingerprints, indicating that they are identical (MZ) twins. The twins of the set in the middle have clearly distinguishable DNA fingerprints, confirming that they are fraternal (DZ) twins. See Sources & Acknowledgments.

Disease Concordance in Monozygotic and Dizygotic Twins

When twins have the same disease, they are said to be concordant for that disorder. Conversely, when only one member of the pair of twins is affected and the other is not, the twins are discordant for the disease. An examination of how frequently MZ twins are concordant for a disease is a powerful method for determining whether genotype alone is sufficient to produce a particular disease. The difference between a disease that is mendelian and one that shows complex inheritance is immediately evident. Using sickle cell disease (Case 42) as an example of a mendelian disorder, if one MZ twin has sickle cell disease, the other twin will always have the disease as well. In contrast, as an example of a multifactorial disorder, when one MZ twin has type 1 diabetes mellitus (previously known as insulin-dependent or juvenile diabetes) (Case 26), the other twin will also have type 1 diabetes in only approximately 40% of such twin pairs. Disease concordance less than 100% in MZ twins is strong evidence that nongenetic factors play a role in the disease. Such factors could include environmental influences, such as exposure to infection or diet, as well as other effects, such as somatic mutation, effects of aging, or epigenetic changes in gene expression in one twin compared with the other.

MZ and same-sex DZ twins share a common intrauterine environment and sex and are usually reared together in the same household by the same parents. Thus a comparison of concordance for a disease between MZ and same-sex DZ twins shows how frequently disease occurs when relatives who experience the same prenatal and often the same postnatal environment have the same alleles at every locus (MZ twins), compared with only 50% of their alleles in common (DZ twins). Greater concordance in MZ versus DZ twins is strong evidence of a genetic component to the disease, as shown in Table 8-4 for a number of disorders.


TABLE 8-4 Concordance Rates in MZ and DZ Twins for Various Multifactorial Disorders

Concordance (%)*

Nontraumatic epilepsy
Multiple sclerosis
Type 1 diabetes
Bipolar disease
Rheumatoid arthritis
Cleft lip with or without cleft palate
Systemic lupus erythematosus

*Rounded to the nearest percent.

DZ, Dizygotic; MZ, monozygotic.

Data from Rimoin DL, Connor JM, Pyeritz RE: Emery and Rimoin's principles and practice of medical genetics, ed 3, Edinburgh, 1997, Churchill Livingstone; King RA, Rotter JI, Motulsky AG: The genetic basis of common diseases, Oxford, England, 1992, Oxford University Press; and Tsuang MT: Recent advances in genetic research on schizophrenia. J Biomed Sci 5:28-30.
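A concordance rate of the kind compared between MZ and DZ twins can be computed as sketched below. The sketch assumes the simple pairwise definition (concordant pairs divided by all pairs with at least one affected twin); the text does not specify which definition underlies the published figures, and the pair counts are hypothetical.

```python
def pairwise_concordance(concordant_pairs: int, discordant_pairs: int) -> float:
    """Fraction of twin pairs with at least one affected twin in which
    both twins are affected."""
    if concordant_pairs < 0 or discordant_pairs < 0:
        raise ValueError("pair counts must be non-negative")
    total = concordant_pairs + discordant_pairs
    if total == 0:
        raise ValueError("need at least one twin pair")
    return concordant_pairs / total


# Hypothetical counts of twin pairs in which at least one twin is affected.
mz_concordance = pairwise_concordance(concordant_pairs=40, discordant_pairs=60)
dz_concordance = pairwise_concordance(concordant_pairs=5, discordant_pairs=95)

print(f"MZ concordance: {mz_concordance:.0%}")
print(f"DZ concordance: {dz_concordance:.0%}")
```

In this invented example, MZ concordance well below 100% points to nongenetic factors, whereas MZ concordance well above DZ concordance points to a genetic component, which is the pattern of reasoning applied to the disorders in Table 8-4.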

Estimating Heritability from Twin Studies

Just as twin data may be used to assess the separate roles of genes and environment in qualitative disease traits, twins are also used to estimate the heritability of a quantitative trait using the correlation in the values of a physiological measurement in MZ and DZ twins. If one assumes that the alleles affecting the trait exert their effect additively (which is certainly overly simplistic and probably incorrect in many, if not all cases), MZ twins, who share 100% of their alleles, have twice the amount of allele sharing compared to that of DZ twins, who share 50% of their alleles on average. H², introduced earlier in this chapter, can therefore be approximated by taking twice the difference in the correlation coefficient r for a quantitative trait between MZ twins (rMZ) and r between same-sex DZ twins (rDZ) (as given by Falconer's formula):

$$H^2 = 2 \times (r_{MZ} - r_{DZ})$$

If the variability of the trait is determined chiefly by environment, the correlation within pairs of DZ twins will be similar to that seen between pairs of MZ twins; there will be little difference in the value of r for MZ and DZ twins. Thus, rMZ − rDZ ≈ 0, and H² will approach 0. At the other extreme, however, if the variability is determined exclusively by genetic makeup, the correlation coefficient r between MZ pairs will approach 1, whereas r between DZ twins will be half of that. Now, rMZ − rDZ ≈ 1/2, and therefore H² will be approximately 2 × (1/2) = 1.
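The two limiting cases just described can be checked with a short function implementing Falconer's formula, H² = 2 × (rMZ − rDZ). This is a sketch under the additive assumption stated above; the function name is illustrative, and the intermediate correlation values are hypothetical.

```python
def falconer_heritability(r_mz: float, r_dz: float) -> float:
    """Approximate heritability as twice the difference between the MZ and
    same-sex DZ twin correlations (Falconer's formula)."""
    for r in (r_mz, r_dz):
        if not -1.0 <= r <= 1.0:
            raise ValueError("correlation coefficients must lie between -1 and 1")
    return 2.0 * (r_mz - r_dz)


# Variability determined chiefly by environment: similar MZ and DZ correlations.
h2_environmental = falconer_heritability(r_mz=0.5, r_dz=0.5)

# Variability determined exclusively by genotype: r_MZ near 1, r_DZ half of that.
h2_genetic = falconer_heritability(r_mz=1.0, r_dz=0.5)

# A hypothetical intermediate case.
h2_intermediate = falconer_heritability(r_mz=0.75, r_dz=0.5)

print(h2_environmental, h2_genetic, h2_intermediate)
```

Because the formula is only an approximation, sampling error or departures from additivity can produce estimates below 0 or above 1 with real data; the function deliberately does not clip the result.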

Twins Reared Apart

Although a rare occurrence, twins are sometimes separated at birth for social reasons and placed in different homes, thus providing an opportunity to observe individuals of identical or half-identical genotypes reared in different environments. Such studies have been used primarily in research in psychiatric disorders, substance abuse, and eating disorders, in which strong environmental influences within the family are believed to play a role in the development of disease. For example, in one study of obesity, the body mass index (BMI; weight/height², expressed in kg/m²) was measured in MZ and DZ twins reared in the same household versus those reared apart (Table 8-5). Although the average BMI among MZ or DZ twins was similar, regardless of whether they were reared together or apart, the pairwise correlation for BMI between a pair of twins was much higher for the MZ than the DZ twins. Also interesting is that the higher correlation in MZ versus DZ twins was independent of whether the twins were reared together or apart, which suggests that genotype has a highly significant impact on adult weight and consequently on the risk for obesity and its complications.


TABLE 8-5 Pairwise Correlation of BMI between MZ and DZ Twins Reared Together and Apart


*Mean ± 1 SD.

BMI, Body mass index; DZ, dizygotic; MZ, monozygotic.

Data from Stunkard AJ, Harris JR, Pedersen NL, McClearn GE: The body-mass index of twins who have been reared apart. N Engl J Med 322:1483-1487, 1990.

Limitations of Familial Aggregation and Heritability Estimates from Family and Twin Studies

Potential Sources of Bias

There are a number of difficulties in measuring and interpreting λs. One is that studies of familial aggregation of disease are subject to various forms of bias. There is ascertainment bias, which arises when families with more than one affected sibling are more likely to come to a researcher's attention, thereby inflating the sibling recurrence risk λs. Ascertainment bias is also a problem for twin studies. Many studies rely on asking one twin with a particular disease to recruit the other twin to participate in a study (volunteer-based ascertainment), rather than ascertaining them first as twins through a twin registry and only then examining their health status (population-based ascertainment). Volunteer-based ascertainment can give biased results because twins, particularly MZ twins who may be emotionally close, are more likely to volunteer if they are concordant than if they are not, which inflates the concordance rate.
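The inflation produced by volunteer-based ascertainment can be illustrated with a small calculation. The sketch below assumes a simple model in which concordant and discordant pairs each volunteer with a fixed probability; the model, the function name, and all of the numbers are hypothetical and serve only to show the direction of the bias.

```python
def observed_concordance(true_concordance: float,
                         p_volunteer_concordant: float,
                         p_volunteer_discordant: float) -> float:
    """Concordance seen among volunteering pairs when concordant and
    discordant pairs enroll with different probabilities."""
    for p in (true_concordance, p_volunteer_concordant, p_volunteer_discordant):
        if not 0.0 <= p <= 1.0:
            raise ValueError("all arguments must lie between 0 and 1")
    enrolled_concordant = true_concordance * p_volunteer_concordant
    enrolled_discordant = (1.0 - true_concordance) * p_volunteer_discordant
    enrolled = enrolled_concordant + enrolled_discordant
    if enrolled == 0:
        raise ValueError("no pairs enrolled under these assumptions")
    return enrolled_concordant / enrolled


# Equal willingness to volunteer: the true concordance is recovered.
unbiased = observed_concordance(0.40, 0.5, 0.5)

# Concordant pairs twice as likely to volunteer: the estimate is inflated.
biased = observed_concordance(0.40, 0.8, 0.4)

print(f"unbiased estimate: {unbiased:.2f}")
print(f"biased estimate:   {biased:.2f}")
```

With a true concordance of 40%, doubling the enrollment probability of concordant pairs raises the observed concordance to about 57% in this model, which is why population-based ascertainment through a twin registry is preferred.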

Similarly, because case-control studies of family history often rely, for practical reasons, on taking a history from the proband rather than examining all the relatives directly, there may be recall bias, in which a proband may be more likely than the controls to know of family members with the same or a similar disease. Such biases will inflate the level of familial aggregation.

Other difficulties arise in measuring and interpreting heritability. The same trait may yield different measurements of heritability in different populations because of different allele frequencies or diverse environmental conditions. For example, heritability measurements of height would be lower when measured in a population with widespread famine that stunts growth in childhood as compared to the same population after food becomes plentiful. Heritability of a trait should therefore not be thought of as an intrinsic, universally applicable measure of “how genetic” the trait is, because it depends on the population and environment in which the estimate is being made. Although heritability estimates are still made in genetic research, most geneticists consider them to be only crude estimates of the role of genetic variation in causing phenotypic variation.
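The dependence of heritability on the environment can be made concrete with the definition of H² as the fraction of total phenotypic variance due to genetic variation. The sketch below assumes that the total variance is simply the sum of a genetic and an environmental component, with no gene-environment interaction or covariance, which is a simplification; the variance values for height are invented for illustration.

```python
def broad_sense_heritability(genetic_variance: float,
                             environmental_variance: float) -> float:
    """Fraction of total phenotypic variance attributable to genetic variance,
    assuming total variance = genetic variance + environmental variance."""
    if genetic_variance < 0 or environmental_variance < 0:
        raise ValueError("variances must be non-negative")
    total = genetic_variance + environmental_variance
    if total == 0:
        raise ValueError("total phenotypic variance must be greater than 0")
    return genetic_variance / total


# Hypothetical variances in height (cm^2) for the same population, with the
# same genetic variance, under two environments.
h2_plentiful_food = broad_sense_heritability(genetic_variance=30.0,
                                             environmental_variance=10.0)
h2_famine = broad_sense_heritability(genetic_variance=30.0,
                                     environmental_variance=30.0)

print(f"food plentiful: H2 = {h2_plentiful_food:.2f}")
print(f"famine:         H2 = {h2_famine:.2f}")
```

In this invented example the genetic variance is unchanged, yet the heritability estimate falls when variable nutrition adds environmental variance, which illustrates why a heritability estimate applies only to the population and environment in which it was made.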

Potential Genetic or Epigenetic Differences

Despite the evident power of twin studies, one must caution against thinking of such studies as perfectly controlled experiments that compare individuals who share either half or all of their genetic variation and are exposed either to the same or to different environments. Studies of MZ twins assume the twins are genetically identical. Although this is mostly true, genotype and gene expression patterns may come to differ between MZ twins because of genetic or epigenetic changes that occur after the cleavage event that produced the MZ twin embryos. There are a number of ways that MZ twins may differ in their genotypes or patterns of gene expression. Genotype may differ due to somatic rearrangements and/or rare somatic mutations that occur after the cleavage event (see Chapter 3). Epigenetic changes may occur in response to environmental or stochastic factors, thus leading to differences in gene expression between MZ twins. (Female MZ twins have an additional source of variability, because of the stochastic nature of X inactivation patterns in various tissues, as presented in Chapter 6.)

Other Limitations

Another problem may arise when assuming that the environmental exposure of MZ and DZ twins has been held constant when they are reared together but not when twins are reared apart. Environmental exposures, including even intrauterine environment, may vary for twins reared in the same family. For example, MZ twins frequently share a placenta, and there may be a disparity between the twins in blood supply, intrauterine development, and birth weight. For late-onset diseases, such as neurodegenerative disease of late adulthood, the assumption that MZ and DZ twins are exposed to similar environments throughout their adult lives becomes less and less valid, and thus a difference in concordance provides less strong evidence for genetic factors in disease causation. Conversely, one assumes that by determining disease concordance in MZ twins reared apart, one is measuring the effect of different environments on the same genotype. However, the environment of twins reared apart may actually not be as different as one might suppose. Thus no twin study is a perfectly controlled assessment of genetic versus environmental influence.

Finally, caution is necessary when generalizing from twin studies. The most extreme situation would be when the phenotype being studied is only sometimes genetic in origin; that is, nongenetic phenocopies may exist. If genotype alone causes the disease in half the pairs of twins (MZ twin concordance of 100%) in your sample and a nongenetic phenocopy affects only one twin of the other half of twin pairs in your sample (MZ twin concordance of 0%), twin studies will show an intermediate level of 50% concordance that really applies to neither form of the disease.
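The arithmetic behind this example can be written out as a weighted average over subgroups of twin pairs. The helper name below is a hypothetical choice for illustration; the fractions and concordance values are those of the example in the text.

```python
def pooled_concordance(subgroups):
    """Overall concordance for a sample made up of subgroups of twin pairs.

    subgroups: list of (fraction_of_pairs, concordance_within_subgroup) tuples;
    the fractions are expected to sum to 1.
    """
    return sum(fraction * concordance for fraction, concordance in subgroups)


# Half the pairs have the purely genetic form (100% MZ concordance);
# the other half have a nongenetic phenocopy affecting one twin (0% concordance).
pooled = pooled_concordance([(0.5, 1.0), (0.5, 0.0)])
print(pooled)
```

The pooled figure describes neither subgroup, which is the point of the caution about generalizing from a heterogeneous sample.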

Examples of Common Multifactorial Diseases with a Genetic Contribution

In this section and the next, we turn to considering examples of several common conditions that illustrate general concepts of multifactorial disorders and their complex inheritance, as summarized here (see Box).

Characteristics of Inheritance of Complex Diseases

• Genetic variation contributes to diseases with complex inheritance, but these diseases are not single-gene disorders and do not demonstrate a simple mendelian pattern of inheritance.

• Diseases with complex inheritance often demonstrate familial aggregation because relatives of an affected individual are more likely than unrelated individuals to have disease-predisposing alleles in common with the affected person.

• Diseases with complex inheritance are more common among the close relatives of a proband and become less common in relatives who are less closely related and therefore share fewer predisposing alleles. Greater concordance for disease is expected among monozygotic versus dizygotic twins.

• However, pairs of relatives who share disease-predisposing genotypes at relevant loci may still be discordant for phenotype (show lack of penetrance) because of the crucial role of nongenetic factors in disease causation. The most extreme examples of lack of penetrance despite identical genotypes are discordant monozygotic twins.

Multifactorial Congenital Malformations

Many common congenital malformations, occurring as isolated defects and not as part of a syndrome, are multifactorial and demonstrate complex inheritance (Table 8-6). Among these, congenital heart malformations are some of the most common and serve to illustrate the current state of understanding of other categories of congenital malformation.


TABLE 8-6

Some Common Congenital Malformations with Multifactorial Inheritance


Approximate Population Incidence (per 1000)

Cleft lip with or without cleft palate


Cleft palate


Congenital dislocation of hip


Congenital heart defects


 Ventricular septal defect


 Patent ductus arteriosus


 Atrial septal defect


 Aortic stenosis


Neural tube defects


 Spina bifida and anencephaly


Pyloric stenosis

1†, 5*

*Per 1000 males.

†Per 1000 females.

Data from Carter CO: Genetics of common single malformations. Br Med Bull 32:21-26, 1976; Nora JJ: Multifactorial inheritance hypothesis for the etiology of congenital heart diseases: the genetic environmental interaction. Circulation 38:604-617, 1968; and Lin AE, Garver KL: Genetic counseling for congenital heart defects. J Pediatr 113:1105-1109, 1988.

Congenital heart defects (CHDs) occur at a frequency of approximately 4 to 8 per 1000 births. They are a heterogeneous group, caused in some cases by single-gene or chromosomal mechanisms and in others by exposure to teratogens, such as rubella infection or maternal diabetes. The cause is usually unknown, however, and the majority of cases are believed to be multifactorial in origin.

There are many types of CHDs, with different population incidences and empirical risks. When heart defects recur in a family, however, the affected children do not necessarily have exactly the same anatomical defect but instead show recurrence of lesions that are similar with regard to developmental mechanisms (see Chapter 14). By using developmental mechanisms as a classification scheme, five main groups of CHDs can be distinguished:

• Flow lesions

• Defects in cell migration

• Defects in cell death

• Abnormalities in extracellular matrix

• Defects in targeted growth

The subtype of congenital heart malformations known as flow lesions illustrates the familial aggregation and elevated risk for recurrence in relatives of an affected individual, all characteristic of a complex trait (Table 8-7). Flow lesions, which constitute approximately 50% of all CHDs, include hypoplastic left heart syndrome, coarctation of the aorta, atrial septal defect of the secundum type, pulmonary valve stenosis, a common type of ventricular septal defect, and other forms (Fig. 8-5). Up to 25% of patients with flow lesions, particularly tetralogy of Fallot, may have the deletion of chromosome region 22q11 seen in the velocardiofacial syndrome (see Chapter 6).


TABLE 8-7

Population Incidence and Recurrence Risks for Various Flow Lesions



FIGURE 8-5 Diagram of various flow lesions seen in congenital heart disease. Blood on the left side of the circulation is shown in red, on the right side in blue. Abnormal admixture of oxygenated and deoxygenated blood is purple. AO, Aorta; LA, left atrium; LV, left ventricle; PA, pulmonary artery; RA, right atrium; RV, right ventricle.

Certain isolated CHDs are inherited as multifactorial traits. Until more is known, the figures shown in Table 8-7 can be used as estimates of the recurrence risk for flow lesions in first-degree relatives. There is, however, a rapid falloff in risk (to levels not much higher than the population risk) in second- and third-degree relatives of index patients with flow lesions. Similarly, relatives of index patients with types of CHDs other than flow lesions can be offered reassurance that their risk is no greater than that of the general population. For further reassurance, many CHDs can now be assessed prenatally by ultrasonography (see Chapter 17).

Neuropsychiatric Disorders

Mental illnesses are some of the most common and perplexing of human diseases, affecting 4% of the human population worldwide. The annual cost in medical care and social services exceeds $150 billion in the United States alone. Among the most severe of the mental illnesses are schizophrenia and bipolar disease (manic-depressive illness).

Schizophrenia affects 1% of the world's population. It is a devastating psychiatric illness, with onset commonly in late adolescence or young adulthood, and is characterized by abnormalities in thought, emotion, and social relationships, often associated with delusional thinking and disordered mood. A genetic contribution to schizophrenia is supported by both twin and family aggregation studies. MZ concordance in schizophrenia is estimated to be 40% to 60%; DZ concordance is 10% to 16%. The recurrence risk ratio is elevated in first- and second-degree relatives of schizophrenic patients (Table 8-8).


TABLE 8-8

Recurrence Risks and Relative Risk Ratios in Schizophrenia Families

Relation to Individual Affected by Schizophrenia

Recurrence Risk (%)


Child of two schizophrenic parents









Nephew or niece



Uncle or aunt



First cousin






Although there is considerable evidence of a genetic contribution to schizophrenia, only a subset of the genes and alleles that predispose to the disease has been identified to date. A major exception is the small percentage (<2%) of all schizophrenia that is found in individuals with interstitial deletions of particular chromosomes, such as the 22q11 deletion responsible for the velocardiofacial syndrome. It is estimated that 25% of patients with 22q11 deletions develop schizophrenia, even in the absence of many or most of the other physical signs of the syndrome. The mechanism by which a deletion of 3 Mb of DNA on 22q11 (see Fig. 6-5) causes mental illness in patients with this syndrome is unknown. Chromosomal microarrays have been used to scan the entire genome for other deletions and duplications, many too small to be detectable by standard cytogenetic approaches, as introduced in Chapter 5. These studies have revealed numerous deletions and duplications (copy number variants [CNVs]) throughout the genome in both normal individuals and individuals with a variety of psychiatric and neurodevelopmental disorders (see Chapter 6). In particular, small (1- to 1.5-Mb) interstitial deletions at 1q21.1, 15q11.2, and 15q13.3 have been implicated repeatedly in a small fraction of patients with schizophrenia. For the vast majority of patients with schizophrenia, however, genetic lesions are not known, and counseling therefore relies on empirical risk figures (see Table 8-8).

Bipolar disease is predominantly a mood disorder in which episodes of mood elevation, grandiosity, high-risk dangerous behavior, and inflated self-esteem (mania) alternate with periods of depression, decreased interest in what are normally pleasurable activities, feelings of worthlessness, and suicidal thinking. The prevalence of bipolar disease is 0.8%, approximately equal to that of schizophrenia, with a similar age at onset. The seriousness of this condition is underscored by the high (10% to 15%) rate of suicide in affected patients.

A genetic contribution to bipolar disease is strongly supported by twin and family aggregation studies. MZ twin concordance is 40% to 60%; DZ twin concordance is 4% to 8%. Disease risk is also elevated in relatives of affected individuals (Table 8-9). One striking aspect of bipolar disease in families is that the condition has variable expressivity; some members of the same family demonstrate classic bipolar illness, others have depression alone (unipolar disorder), and others carry a diagnosis of a psychiatric syndrome that involves both thought and mood (schizoaffective disorder). Even less is known about genes and alleles that predispose to bipolar disease than is known for schizophrenia; in particular, although an increase in de novo deletions or duplications has been identified in bipolar psychosis, recurrent CNVs involving particular regions of the genome have not been identified. Counseling therefore typically relies on empirical risk figures (see Table 8-9).


TABLE 8-9

Recurrence Risks and Relative Risk Ratios in Bipolar Disorder Families

Relation to Individual Affected with Bipolar Disease


Risk (%)*


Child of two parents with bipolar disease









Second-degree relative



*Recurrence of bipolar, unipolar, or schizoaffective disorder.

Coronary Artery Disease

Coronary artery disease (CAD) kills approximately 500,000 individuals in the United States yearly and is one of the most frequent causes of morbidity and mortality in the developed world. CAD due to atherosclerosis is the major cause of the nearly 1,500,000 cases of myocardial infarction (MI) and the more than 200,000 deaths from acute MI occurring annually. In the aggregate, CAD costs more than $143 billion in health care expenses alone each year in the United States, not including lost productivity. For unknown reasons, males are at higher risk for CAD both in the general population and within affected families.

Family studies have repeatedly supported a role for heredity in CAD, particularly when it occurs in relatively young individuals. The pattern of increased risk suggests that when the proband is female or young, there is likely to be a greater genetic contribution to MI in the family, thereby increasing the risk for disease in the proband's relatives. For example, the recurrence risk (Table 8-10) in male first-degree relatives of a female proband is sevenfold greater than that in the general population, compared with the 2.5-fold increased risk in female relatives of a male proband. When the proband is young (<55 years) and female, the risk for CAD is more than 11 times greater than that of the general population. Having multiple relatives affected at a young age increases risk substantially as well. Twin studies also support a role for genetic variants in CAD (Table 8-11).

TABLE 8-10

Risk for Coronary Artery Disease in Relatives of a Proband


Proband | Increased Risk for CAD in a Family Member*
Male | 3-fold in male first-degree relatives; 2.5-fold in female first-degree relatives
Female | 7-fold in male first-degree relatives
Female <55 years of age | 11.4-fold in male first-degree relatives
Two male relatives <55 years of age | 13-fold in first-degree relatives

*Relative to the risk in the general population.

CAD, Coronary artery disease.

Data from Silberberg JS: Risk associated with various definitions of family history of coronary heart disease. Am J Epidemiol 147:1133-1139, 1998.

TABLE 8-11

Twin Concordance Rates and Relative Risks for Fatal Myocardial Infarction When Proband Had Early Fatal Myocardial Infarction*


*Early myocardial infarction defined as age <55 years in males, age <65 years in females.

Relative to the risk in the general population.

DZ, Dizygotic; MZ, monozygotic.

Data from Marenberg ME: Genetic susceptibility to death from coronary heart disease in a study of twins. N Engl J Med 330:1041-1046, 1994.

A few mendelian disorders leading to CAD are known. Familial hypercholesterolemia (Case 16), an autosomal dominant defect of the low-density lipoprotein (LDL) receptor discussed in Chapter 12, is one of the most common of these but accounts for only approximately 5% of survivors of MI. Most cases of CAD show multifactorial inheritance, with both nongenetic and genetic predisposing factors. There are many stages in the evolution of atherosclerotic lesions in the coronary artery. What begins as a fatty streak in the intima of the artery evolves into a fibrous plaque containing smooth muscle, lipid, and fibrous tissue. These intimal plaques become vascular and may bleed, ulcerate, and calcify, thereby causing severe vessel narrowing as well as providing fertile ground for thrombosis, resulting in sudden, complete occlusion and MI. Given the many stages in the evolution of atherosclerotic lesions in the coronary artery, it is not surprising that many genetic differences affecting the various pathological processes involved could predispose to or protect from CAD (Fig. 8-6; also see Box). Additional risk factors for CAD include other disorders that are themselves multifactorial with genetic components, such as hypertension, obesity, and diabetes mellitus. The metabolic and physiological derangements represented by these disorders also contribute to enhancing the risk for CAD. Finally, diet, physical activity, systemic inflammation, and smoking are environmental factors that also play a major role in influencing the risk for CAD. Given all the different processes, metabolic derangements, and environmental factors that contribute to the development of CAD, it is easy to imagine that genetic susceptibility to CAD could be a complex multifactorial condition (see Box).

Genes and Gene Products Involved in the Stepwise Process of Coronary Artery Disease

A large number of genes and gene products have been suggested and, in some cases, implicated in promoting one or more of the developmental stages of coronary artery disease. These include genes involved in the following:

• Serum lipid transport and metabolism—cholesterol, apolipoprotein E, apolipoprotein C-III, the low-density lipoprotein (LDL) receptor, and lipoprotein(a)—as well as total cholesterol level. Elevated LDL cholesterol level and decreased high-density lipoprotein cholesterol level, both of which elevate the risk for coronary artery disease, are themselves quantitative traits with significant heritabilities of 40% to 60% and 45% to 75%, respectively.

• Vasoactivity, such as angiotensin-converting enzyme

• Blood coagulation, platelet adhesion, and fibrinolysis, such as plasminogen activator inhibitor 1, and the platelet surface glycoproteins Ib and IIIa

• Inflammatory and immune pathways

• Arterial wall components


FIGURE 8-6 Sections of coronary artery demonstrating the steps leading to coronary artery disease. Genetic and environmental factors operating at any or all of the steps in this pathway can contribute to the development of this complex, common disease. See Sources & Acknowledgments.

CAD is often an incidental finding in family histories of patients with other genetic diseases. In view of the high recurrence risk, physicians and genetic counselors may need to consider whether first-degree relatives of patients with CAD should be evaluated further and offered counseling and therapy, even when CAD is not the primary genetic problem for which the patient or relative has been referred. Such an evaluation is clearly indicated when the proband is young, particularly if the proband is female.

Examples of Multifactorial Traits for Which Specific Genetic and Environmental Factors are Known

Up to this point, we have described some of the epidemiological approaches involving family and twin studies that are used to assess the extent to which there may be a genetic contribution to a complex trait. It is important to realize, however, that studies of familial aggregation, disease concordance, or heritability do not specify how many loci there are, which loci and alleles are involved, or how a particular genotype and set of environmental influences interact to cause a disease or to determine the value of a particular physiological measurement. In most cases, all we can show is that there is some genetic contribution and estimate its magnitude. There are, however, a few multifactorial diseases with complex inheritance for which we have begun to identify the genetic and, in some cases, environmental factors responsible for increasing disease susceptibility. We give a few examples in the next part of this chapter, illustrating increasing levels of complexity.

Modifier Genes in Mendelian Disorders

As discussed in Chapter 7, allelic variation at a single locus can explain variation in the phenotype in many single-gene disorders. However, even for well-characterized mendelian disorders known to be due to defects in a single gene, variation at other gene loci may affect some aspect of the phenotype, thereby illustrating features of complex inheritance.

In cystic fibrosis (CF) (Case 12), for example, whether or not a patient has pancreatic insufficiency requiring enzyme replacement can be explained largely by which mutant alleles are present in the CFTR gene (see Chapter 12). The correlation is imperfect, however, for other phenotypes. For example, the variation in the degree of pulmonary disease seen in CF patients remains unexplained by allelic heterogeneity. It has been proposed that the genotype at other genetic loci could act as genetic modifiers, that is, genes whose alleles have an effect on the severity of pulmonary disease seen in CF patients. For example, reduction in forced expiratory volume after 1 second (FEV1), calculated as a percentage of the value expected for CF patients (a CF-specific FEV1 percent), is a quantitative trait commonly used to measure deterioration in pulmonary function in CF patients. A comparison of CF-specific FEV1 percent in affected MZ versus affected DZ twins provides an estimate of the heritability of the severity of lung disease in CF patients of approximately 50%. This value is independent of the specific CFTR allele(s) (because both kinds of twins will have the same CF mutations).
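One common way to turn a comparison of MZ and DZ twins into a heritability estimate for a quantitative trait is to double the difference between the within-pair correlations of the two kinds of twins (often called Falconer's formula). The passage above does not state which estimator produced the 50% figure, so the sketch below is an assumption about method, and the two correlation values are hypothetical numbers chosen only so that the result comes out near 50%.

```python
def twin_heritability(r_mz, r_dz):
    """Heritability estimate from twin correlations: 2 * (r_MZ - r_DZ).

    r_mz: within-pair correlation of the trait in MZ twins
    r_dz: within-pair correlation of the trait in DZ twins
    Assumes MZ and DZ pairs share their environments to the same degree.
    """
    return 2.0 * (r_mz - r_dz)


# Hypothetical correlations of CF-specific FEV1 percent within affected twin pairs.
h2_lung_severity = twin_heritability(r_mz=0.60, r_dz=0.35)
print(f"estimated heritability: {h2_lung_severity:.2f}")
```

Because every affected twin pair, MZ or DZ, carries the same CFTR mutations within the pair, any excess similarity of MZ pairs in this estimate reflects variation at other loci.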

Two loci harboring alleles responsible for modifying the severity of pulmonary disease in CF are known: MBL2, a gene that encodes a serum protein called mannose-binding lectin; and the TGFB1 locus encoding the cytokine transforming growth factor β (TGFβ). Mannose-binding lectin is a plasma protein in the innate immune system that binds to many pathogenic organisms and aids in their destruction by phagocytosis and complement activation. A number of common alleles that result in reduced blood levels of the lectin exist at the MBL2 locus in European populations. Lower levels of mannose-binding lectin appear associated with worse outcomes for CF lung disease, perhaps because low levels of lectin result in difficulties with containing respiratory pathogens, particularly Pseudomonas. Alleles at the TGFB1 locus that result in higher TGFβ production are also associated with worse outcome, perhaps because TGFβ promotes lung scarring and fibrosis after inflammation. Thus both MBL2 and TGFB1 are modifier genes, variants at which—while they do not cause CF—can modify the clinical phenotype associated with disease-causing alleles at the CFTR locus.

Digenic Inheritance

The next level of complexity is a disorder determined by the additive effect of the genotypes at two or more loci. One clear example of such a disease phenotype has been found in a few families of patients with a form of retinal degeneration called retinitis pigmentosa (RP) (Fig. 8-7). Affected individuals in these families are heterozygous for mutant alleles at two different loci (double heterozygotes). One locus encodes the photoreceptor membrane protein peripherin and the other encodes a related photoreceptor membrane protein called Rom1. Heterozygotes for only one or the other of these mutations in these families are unaffected. Thus the RP in these families is caused by the simplest form of multigenic inheritance, inheritance due to the effect of mutant alleles at two loci, without any known environmental factors that influence disease occurrence or severity. The proteins encoded by these two genes are likely to have overlapping physiological function because they are both located in the stacks of membranous disks found in retinal photoreceptors. It is the additive effect of having an abnormality in two proteins with overlapping function that produces disease.
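The digenic rule described for these families can be expressed as a small genotype-to-phenotype function. This is a minimal sketch: the function names are hypothetical, the allele labels "1" (normal) and "mut" (mutant) follow the pedigree notation, and homozygous mutant genotypes are deliberately rejected because the text does not describe their phenotype.

```python
NORMAL, MUTANT = "1", "mut"


def is_heterozygous(genotype):
    """True when a two-allele genotype has exactly one normal and one mutant allele."""
    return sorted(genotype) == [NORMAL, MUTANT]


def is_affected(peripherin_genotype, rom1_genotype):
    """Apply the digenic rule: affected only when heterozygous at both loci."""
    for genotype in (peripherin_genotype, rom1_genotype):
        if sorted(genotype) == [MUTANT, MUTANT]:
            # Not covered by the families described in the text.
            raise ValueError("phenotype of homozygous mutant genotypes is not described")
    return is_heterozygous(peripherin_genotype) and is_heterozygous(rom1_genotype)


print(is_affected(("1", "mut"), ("1", "mut")))  # double heterozygote
print(is_affected(("1", "mut"), ("1", "1")))    # heterozygous at peripherin only
print(is_affected(("1", "1"), ("mut", "1")))    # heterozygous at ROM1 only
```

Only the first call returns True, matching the observation that single heterozygotes in these families are unaffected.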


FIGURE 8-7 Pedigree of a family with retinitis pigmentosa due to digenic inheritance. Dark blue symbols are affected individuals. Each individual's genotypes at the peripherin locus (first line) and ROM1 locus (second line) are written below each symbol. The normal allele is 1; the mutant allele is mut. Light blue symbols are unaffected, despite carrying a mutation in one or the other gene. See Sources & Acknowledgments.

A multigenic model has also been observed in a few families with Bardet-Biedl syndrome, a rare birth defect characterized by obesity, variable degrees of intellectual disability, retinal degeneration, polydactyly, and genitourinary malformations. Fourteen different genes have been found in which mutations cause the syndrome. Although inheritance is clearly autosomal recessive in most families, a few families appear to demonstrate digenic inheritance, in which the disease occurs only when an individual is homozygous for mutations at one of these 14 loci and is heterozygous for a mutation at another of the loci.

Gene-Environment Interactions in Venous Thrombosis

Another example of gene-gene interaction predisposing to disease is found in the group of conditions referred to as hypercoagulability states, in which venous or arterial clots form inappropriately and cause life-threatening complications of thrombophilia (Case 46). With hypercoagulability, however, there is a third factor, an environmental influence that, in the presence of the predisposing genetic factors, increases the risk for disease even more.

One such disorder is idiopathic cerebral vein thrombosis, a disease in which clots form in the venous system of the brain, causing catastrophic occlusion of cerebral veins in the absence of an inciting event such as infection or tumor. It affects young adults, and although quite rare (<1 per 100,000 in the population), it carries a high mortality rate (5% to 30%). Three relatively common factors—two genetic and one environmental—that lead to abnormal coagulability of the clotting system are each known to individually increase the risk for cerebral vein thrombosis (Fig. 8-8):

• A missense variant in the gene for the clotting factor, factor V

• A variant in the 3′ untranslated region (UTR) of the gene for the clotting factor prothrombin

• The use of oral contraceptives


FIGURE 8-8 The clotting cascade relevant to factor V Leiden and prothrombin variants. Once factor X is activated, through either the intrinsic or extrinsic pathway, activated factor V promotes the production of the coagulant protein thrombin from prothrombin, which in turn cleaves fibrinogen to generate fibrin required for clot formation. Oral contraceptives (OC) increase blood levels of prothrombin and factor X as well as a number of other coagulation factors. The hypercoagulable state can be explained as a synergistic interaction of genetic and environmental factors that increase the levels of factor V, prothrombin, factor X and others to promote clotting. Activated forms of coagulation proteins are indicated by the letter a. Solid arrows are pathways; dashed arrows are stimulators.

A polymorphic allele of factor V, factor V Leiden (FVL), in which arginine is replaced by glutamine at position 506 (Arg506Gln), has a frequency of approximately 2.5% in white populations but is rarer in other population groups. This alteration affects a cleavage site used to degrade factor V, thereby making the protein more stable and able to exert its procoagulant effect for a longer duration. Heterozygous carriers of FVL, approximately 5% of the white population, have a risk for cerebral vein thrombosis that, although still quite low, is sevenfold higher than that in the general population; homozygotes have a risk that is eightyfold higher.
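The relationship between the 2.5% allele frequency and the roughly 5% carrier frequency can be checked with Hardy-Weinberg proportions. The sketch below assumes the population is in Hardy-Weinberg equilibrium at this locus; the function name is a hypothetical choice for illustration.

```python
def hardy_weinberg_genotype_frequencies(variant_allele_frequency):
    """Expected genotype frequencies for a two-allele locus under Hardy-Weinberg equilibrium."""
    q = variant_allele_frequency
    p = 1.0 - q
    return {
        "normal homozygote": p * p,
        "heterozygote": 2.0 * p * q,
        "variant homozygote": q * q,
    }


# Allele frequency of factor V Leiden in white populations, as given in the text.
fvl = hardy_weinberg_genotype_frequencies(0.025)
for genotype, frequency in fvl.items():
    print(f"{genotype}: {frequency:.4%}")
```

The heterozygote frequency comes out just under 5%, consistent with the figure quoted for carriers, while homozygotes are expected to be far rarer.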

The second genetic risk factor, a mutation in the prothrombin gene, changes a G to an A at position 20210 in the 3′ UTR of the gene (prothrombin g.20210G>A). Approximately 2.4% of white individuals are heterozygotes, but the allele is rare in other ethnic groups. This change appears to increase the level of prothrombin mRNA, resulting in increased translation and elevated levels of the protein. Being heterozygous for the prothrombin 20210G>A allele raises the risk for cerebral vein thrombosis threefold to sixfold.

Finally, the use of oral contraceptives containing synthetic estrogen increases the risk for thrombosis 14- to 22-fold, independent of genotype at the factor V and prothrombin loci, probably by increasing the levels of many clotting factors in the blood. Although using oral contraceptives and being heterozygous for FVL cause only a modest increase in risk compared with either factor alone, oral contraceptive use in a heterozygote for prothrombin 20210G>A raises the relative risk for cerebral vein thrombosis 30- to 150-fold!
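One way to judge how large the combined effect is would be to compare it with what two simple reference models predict from the individual relative risks: an additive model (excess risks add) and a multiplicative model (relative risks multiply). The sketch below is a rough illustration, not the analysis used in the underlying studies; the function names are hypothetical, and the ranges quoted in the text are treated simply as lower and upper bounds.

```python
def additive_joint_rr(rr_a, rr_b):
    """Joint relative risk if the excess risks of the two factors simply add."""
    return rr_a + rr_b - 1


def multiplicative_joint_rr(rr_a, rr_b):
    """Joint relative risk if the two relative risks multiply."""
    return rr_a * rr_b


# Ranges quoted in the text for cerebral vein thrombosis.
ORAL_CONTRACEPTIVE_RR = (14, 22)
PROTHROMBIN_HET_RR = (3, 6)
OBSERVED_JOINT_RR = (30, 150)

additive_range = tuple(
    additive_joint_rr(oc, pt) for oc, pt in zip(ORAL_CONTRACEPTIVE_RR, PROTHROMBIN_HET_RR)
)
multiplicative_range = tuple(
    multiplicative_joint_rr(oc, pt) for oc, pt in zip(ORAL_CONTRACEPTIVE_RR, PROTHROMBIN_HET_RR)
)

print("additive model:      ", additive_range)
print("multiplicative model:", multiplicative_range)
print("observed:            ", OBSERVED_JOINT_RR)
```

The additive model predicts a joint relative risk of 16 to 27 and the multiplicative model 42 to 132, so the observed 30- to 150-fold increase lies above the additive prediction and overlaps the multiplicative one.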

There is also interest in the role of FVL and prothrombin 20210G>A alleles in deep venous thrombosis (DVT) of the lower extremities, a condition that occurs in approximately 1 in 1000 individuals per year, far more common than idiopathic cerebral venous thrombosis. Mortality due to DVT (primarily due to pulmonary embolus) can be up to 10%, depending on age and the presence of other medical conditions. Many environmental factors are known to increase the risk for DVT and include trauma, surgery (particularly orthopedic surgery), malignant disease, prolonged periods of immobility, oral contraceptive use, and advanced age.

The FVL allele increases the relative risk for a first episode of DVT sevenfold in heterozygotes; heterozygotes who use oral contraceptives see their risk increased thirtyfold compared with controls. Heterozygotes for prothrombin 20210G>A also have an increase in their relative risk for DVT of twofold to threefold. Notably, double heterozygotes for FVL and prothrombin 20210G>A have a relative increased risk of twentyfold—a risk approaching a few percent of the population.
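These relative risks can be converted into approximate absolute risks by multiplying them by the baseline incidence of DVT given above, about 1 in 1000 individuals per year. The sketch below is a rough approximation: it treats the overall population incidence as the risk in people without the risk factor, which is an assumption, and the names used are hypothetical.

```python
BASELINE_ANNUAL_DVT_RISK = 1 / 1000  # approximate population incidence from the text

# Relative risks for DVT quoted in the text.
RELATIVE_RISKS = {
    "FVL heterozygote": 7,
    "FVL heterozygote using oral contraceptives": 30,
    "FVL and prothrombin 20210G>A double heterozygote": 20,
}


def absolute_annual_risk(baseline_risk, relative_risk):
    """Approximate absolute risk as baseline risk multiplied by relative risk."""
    return baseline_risk * relative_risk


absolute_risks = {
    label: absolute_annual_risk(BASELINE_ANNUAL_DVT_RISK, rr)
    for label, rr in RELATIVE_RISKS.items()
}
for label, risk in absolute_risks.items():
    print(f"{label}: about {risk:.1%} per year")
```

On this approximation, a twentyfold relative risk corresponds to an annual risk of about 2%, which is the sense in which the risk in double heterozygotes approaches a few percent.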

Thus each of these three factors, two genetic and one environmental, on its own increases the risk for an abnormal hypercoagulable state; having two or all three of these factors at the same time raises the risk even more, to the point that thrombophilia screening programs for selected populations of patients may be indicated in the future.

Multiple Coding and Noncoding Elements in Hirschsprung Disease

A more complicated set of interacting genetic factors has been described in the pathogenesis of a developmental abnormality of the enteric nervous system in the gut known as Hirschsprung disease (HSCR) (Case 22). In HSCR, there is complete absence of some or all of the intrinsic ganglion cells in the myenteric and submucosal plexuses of the colon. An aganglionic colon is incapable of peristalsis, resulting in severe constipation, symptoms of intestinal obstruction, and massive dilatation of the colon (megacolon) proximal to the aganglionic segment. The disorder affects approximately 1 in 5000 newborns of European ancestry but is twice as common among Asian infants. HSCR occurs as an isolated birth defect 70% of the time, as part of a chromosomal syndrome 12% of the time, and as one element of a broad constellation of congenital abnormalities in the remainder of cases. Among patients with HSCR as an isolated birth defect, 80% have only a single, short aganglionic segment of colon at the level of the rectum (hence, HSCR-S), whereas 20% have aganglionosis of a long segment of colon, the entire colon or, occasionally, the entire colon plus the ileum as well (hence, HSCR-L).

Familial HSCR-L is often characterized by patterns of inheritance that suggest dominant or recessive inheritance, but consistently with reduced penetrance. HSCR-L is most commonly caused by loss-of-function missense or nonsense mutations in the RET gene, which encodes RET, a receptor tyrosine kinase. A small minority of families have mutations in genes encoding ligands that bind to RET, but with even lower penetrance than those families with RET mutations.

HSCR-S is the more common type of HSCR and has many of the characteristics of a disorder with complex genetics. The relative risk ratio for sibs, λs, is very high (approximately 200), but MZ twins do not show perfect concordance and families do not show any obvious mendelian inheritance pattern for the disorder. When pairs of siblings concordant for HSCR-S were analyzed genome-wide to see which loci and which sets of alleles at these loci each sib had in common with an affected brother or sister, alleles at three loci (including RET) were found to be significantly shared, suggesting gene-gene interactions and/or multigenic inheritance; indeed, most of the concordant sibpairs were found to share alleles at all three loci. Although the non-RET loci have yet to be identified, Figure 8-9 illustrates the range of interactions necessary to account for much of the penetrance of HSCR in even this small cohort of patients.
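Because λs is the ratio of the risk in siblings of an affected individual to the risk in the general population, a λs value can be turned back into an approximate sibling recurrence risk. The sketch below uses the overall HSCR incidence of about 1 in 5000 quoted earlier as a stand-in for the population risk of HSCR-S, which is an assumption made only for illustration; the function name is hypothetical.

```python
def sib_recurrence_risk(lambda_s, population_risk):
    """Sibling recurrence risk implied by a relative risk ratio and a population risk."""
    return lambda_s * population_risk


# lambda_s of approximately 200 from the text; population risk assumed to be 1 in 5000.
hscr_sib_risk = sib_recurrence_risk(lambda_s=200, population_risk=1 / 5000)
print(f"implied sibling recurrence risk: {hscr_sib_risk:.1%}")
```

A very high λs can therefore coexist with a modest absolute risk for siblings when the disorder is rare in the population.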


FIGURE 8-9 Patterns of allele sharing among sibpairs concordant for Hirschsprung disease, divided according to the number of loci for which the sibs show allele sharing. The three loci are located at 10q11.2 (the RET locus), 3p21, and 19q12. See Sources & Acknowledgments.

HSCR mutations have now been described at over a dozen loci, with RET mutations being by far the most common. The current data suggest that the RET gene is implicated in nearly all HSCR patients and, in particular, have pointed to two interacting noncoding regulatory variants near the RET gene, one in a potent gut enhancer with a binding site for the relevant transcription factor SOX10 and the other at an even more distant noncoding site some 125 kb upstream of the RET transcription start site. Thus HSCR-S is a multifactorial disease that results from mutations in or near the RET locus, perturbing the normally tightly controlled process of enteric nervous system development, combined with mutations at a number of other loci, both known and still unknown. Current genomic approaches of the type discussed in Chapter 10 suggest the possibility that many dozens of additional genes could be involved.

The identification of common, low-penetrant variants in noncoding elements serves to illustrate that the gene variants responsible for modifying expression of a multifactorial trait may be subtle in how they exert their effects on gene expression and, as a consequence, on disease penetrance and expressivity. It is also sobering to realize that the underlying genetic mechanisms for this relatively well defined congenital malformation have turned out to be so surprisingly complex; still, they are likely to be far simpler than are the mechanisms involved in the more common complex diseases, such as diabetes.

Type 1 Diabetes Mellitus

A common complex disease for which some of the underlying genetic architecture is being delineated is diabetes mellitus. Diabetes occurs in two major forms: type 1 (T1D) (sometimes referred to as insulin-dependent; IDDM) (Case 26) and type 2 (T2D) (sometimes referred to as non–insulin-dependent; NIDDM) (Case 35), representing approximately 10% and 88% of all cases, respectively. Familial aggregation is seen in both types of diabetes, but in any given family, usually only T1D or T2D is present. They differ in typical onset age, MZ twin concordance, and association with particular genetic variants at particular loci. Here, we focus on T1D to illustrate the major features of complex inheritance in diabetes.

T1D has an incidence in the white population of approximately 2 per 1000 (0.2%), but this is lower in African and Asian populations. It usually manifests in childhood or adolescence. It results from autoimmune destruction of the β cells of the pancreas, which normally produce insulin. A large majority of children who will go on to have T1D develop multiple autoantibodies early in childhood against a variety of endogenous proteins, including insulin, well before they develop overt disease.

There is strong evidence for genetic factors in T1D: concordance among MZ twins is approximately 40%, which far exceeds the 5% concordance in DZ twins. The lifetime risk for T1D in siblings of an affected proband is approximately 7%, resulting in an estimated λs of ≈35. However, the earlier the age of onset of the T1D in the proband, the greater is λs.
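The λs value quoted here follows directly from the definition of the relative risk ratio: the risk in siblings of a proband divided by the risk in the general population. The short Python sketch below reproduces that arithmetic from the figures in the text. The function name `relative_risk_ratio` is illustrative, and using the population incidence of about 2 per 1000 as the denominator is the same approximation the text makes.

```python
def relative_risk_ratio(risk_in_relatives: float, population_risk: float) -> float:
    """Return lambda_r: risk in relatives of a proband / population risk."""
    if population_risk <= 0:
        raise ValueError("population_risk must be positive")
    return risk_in_relatives / population_risk


# Figures quoted in the text for type 1 diabetes.
population_risk = 0.002   # about 2 per 1000 (0.2%)
sibling_risk = 0.07       # about 7% lifetime risk in siblings of a proband

lambda_s = relative_risk_ratio(sibling_risk, population_risk)
print(f"lambda_s is approximately {lambda_s:.0f}")

# Twin concordance is a separate line of evidence: MZ far exceeds DZ.
mz_concordance = 0.40
dz_concordance = 0.05
concordance_ratio = mz_concordance / dz_concordance
print(f"MZ concordance is about {concordance_ratio:.0f} times DZ concordance")
```

Because λs depends on the population risk in the denominator, the same sibling risk would give a different ratio in a population with a lower incidence of T1D.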

The Major Histocompatibility Complex

The major genetic factor in T1D is the major histocompatibility complex (MHC) locus, which spans some 3 Mb on chromosome 6 and is the most highly polymorphic locus in the human genome, with over 200 known genes (many involved in immune functions) and well over 2000 alleles known in populations around the globe (Fig. 8-10). On the basis of structural and functional differences, two major subclasses, the class I and class II genes, correspond to the human leukocyte antigen (HLA) genes, originally discovered by virtue of their importance in tissue transplantation between unrelated individuals. The HLA class I (HLA-A, HLA-B, HLA-C) and class II (HLA-DR, HLA-DQ, HLA-DP) genes encode cell surface proteins that play a critical role in the presentation of antigen to lymphocytes, which cannot recognize and respond to an antigen unless it is complexed with an HLA molecule on the surface of an antigen-presenting cell. Within the MHC, the HLA class I and class II genes are by far the most highly polymorphic loci (see Fig. 8-10).


FIGURE 8-10 Genomic landscape of the major histocompatibility complex (MHC). The classic MHC is shown on the short arm of chromosome 6, comprising the class I region (yellow) and class II region (blue), both enriched in human leukocyte antigen (HLA) genes. Sequence-level variation is shown for single nucleotide polymorphisms (SNPs) found with at least 1% frequency. Remarkably high levels of polymorphism are seen in regions containing the classic HLA genes where variation is enriched in coding exons involved in defining the antigen-binding cleft. Other genes (pink) in the MHC region show lower levels of polymorphism. dbSNP, minor allele frequency in the Single Nucleotide Polymorphism database. See Sources & Acknowledgments.

The original studies showing an association between T1D and alleles designated as HLA-DR3 and HLA-DR4 relied on a serological method in use at that time for distinguishing between different HLA alleles, one that was based on immunological reactions in a test tube. This method has long been superseded by direct determination of the DNA sequence of different alleles, and sequencing of the MHC in a large number of individuals has revealed that the serologically determined “alleles” associated with T1D are not single alleles at all (see Box). Both DR3 and DR4 can be subdivided into a dozen or more alleles located at a locus now termed HLA-DRB1.

Human Leukocyte Antigen Alleles and Haplotypes

The human leukocyte antigen (HLA) system can be confusing at first because the nomenclature used to define and describe different HLA alleles has undergone a fundamental change with the advent of widespread DNA sequencing of the major histocompatibility complex (MHC). According to the older system of HLA nomenclature, the different alleles were distinguished from one another serologically. However, as the genes responsible for encoding the class I and class II MHC chains were identified and sequenced (see Fig. 8-10), single HLA alleles initially defined serologically were shown to consist of multiple alleles defined by different DNA sequence variants even within the same serological allele. The 100 serological specificities at HLA-A, B, C, DR, DQ, and DP loci now comprise more than 1300 alleles defined at the DNA sequence level! For example, what used to be a single B27 allele defined serologically is now referred to as HLA-B*2701, HLA-B*2702, and so on, based on DNA-based genotyping.
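The relationship between a serological specificity such as B27 and its sequence-defined alleles can be made concrete by splitting an allele name into its parts. The Python sketch below handles only the compact four-digit form used in this chapter (for example, HLA-B*2701 or DQB1*0303). It assumes, as in the B27 example, that the first two digits name the allele group and the last two the specific allele within it. The names `HLAAllele`, `parse_hla`, and `group_alleles` are hypothetical helpers, not part of any standard library.

```python
import re
from collections import defaultdict
from typing import NamedTuple


class HLAAllele(NamedTuple):
    locus: str    # e.g. "B" or "DQB1"
    group: str    # first two digits (the allele group, e.g. "27")
    subtype: str  # last two digits (the specific allele within the group)


# Optional "HLA-" prefix, locus name, "*", then exactly four digits.
_FOUR_DIGIT = re.compile(r"^(?:HLA-)?([A-Z]+\d*)\*(\d{2})(\d{2})$")


def parse_hla(name: str) -> HLAAllele:
    """Split a four-digit HLA allele name into locus, group, and subtype."""
    match = _FOUR_DIGIT.match(name)
    if match is None:
        raise ValueError(f"not a four-digit HLA allele name: {name!r}")
    return HLAAllele(*match.groups())


def group_alleles(names):
    """Collect allele names under their (locus, group) pair."""
    groups = defaultdict(list)
    for name in names:
        allele = parse_hla(name)
        groups[(allele.locus, allele.group)].append(name)
    return dict(groups)


print(parse_hla("HLA-B*2701"))
print(group_alleles(["HLA-B*2701", "HLA-B*2702", "DQB1*0303"]))
```

Grouping by locus and first two digits shows how several sequence-level alleles collapse into what serology treated as one "allele".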

The set of HLA alleles at the different class I and class II loci on a given chromosome together form a haplotype. Within any one ethnic group, some HLA alleles and haplotypes are found commonly; others are rare or never seen. The differences in the distribution and frequency of the alleles and haplotypes within the MHC are the result of complex genetic, environmental, and historical factors at play in each of the different populations. The extreme levels of polymorphism at HLA loci and their resulting haplotypes have been extraordinarily useful for identifying associations of particular variants with specific diseases (see Chapter 10), many of which (as one might predict) are autoimmune disorders, associated with an abnormal immune response apparently directed against one or more self-antigens resulting from polymorphism in immune response genes.

Furthermore, it is now clear that the association between certain DRB1 alleles and T1D is due, in part, to alleles at two other class II loci, DQA1 and DQB1, located approximately 80 kb away from DRB1, that form a particular combination of alleles with each other—that is, a haplotype—that is typically inherited as a unit (due to linkage disequilibrium; see Chapter 10). DQA1 and DQB1 encode the α and β chains of the class II DQ protein. Certain combinations of alleles at these three loci form a haplotype that increases the risk for T1D more than elevenfold over that for the general population, whereas other combinations of alleles reduce the risk fiftyfold. The DQB1*0303 allele contained in this protective haplotype results in the amino acid aspartic acid at position 57 of the DQB1 product, whereas other amino acids at this position (alanine, valine, or serine) confer susceptibility. In fact, approximately 90% of patients with T1D are homozygous for DQB1 alleles that do not encode aspartic acid at position 57. It is likely that differences in antigen binding, determined by which amino acid is at position 57, contribute directly to the autoimmune response that destroys the insulin-producing cells of the pancreas. Other loci and alleles in the MHC, however, are also important, as can be seen from the fact that some patients with T1D do have an aspartic acid at this position.
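The position 57 rule and the fold changes in risk described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a clinical tool: residues are supplied directly as one-letter amino acid codes (D for aspartic acid) rather than looked up from allele names, the baseline risk is the population figure of about 0.2% quoted earlier for T1D, and the function names are hypothetical.

```python
ASPARTIC_ACID = "D"  # one-letter code for the residue associated with protection


def classify_dqb1_position_57(residue_1: str, residue_2: str) -> str:
    """Classify a DQB1 genotype by the residue each allele encodes at position 57."""
    non_asp = sum(1 for r in (residue_1, residue_2) if r.upper() != ASPARTIC_ACID)
    if non_asp == 2:
        return "homozygous non-Asp57"   # the pattern seen in ~90% of T1D patients
    if non_asp == 1:
        return "heterozygous"
    return "homozygous Asp57"


def scaled_risk(baseline_risk: float, fold_change: float) -> float:
    """Scale a baseline risk by a fold change (>1 raises risk, <1 lowers it)."""
    return baseline_risk * fold_change


baseline = 0.002                                  # ~0.2% population risk (assumed)
high_risk = scaled_risk(baseline, 11)             # "more than elevenfold" increase
protected_risk = scaled_risk(baseline, 1 / 50)    # "fiftyfold" reduction

print(classify_dqb1_position_57("A", "S"))
print(f"High-risk haplotype: at least {high_risk:.1%}")
print(f"Protective haplotype: about {protected_risk:.3%}")
```

Even the highest-risk haplotype leaves most carriers unaffected, which is consistent with the reduced penetrance discussed throughout this chapter.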

Genes Other Than Class II Major Histocompatibility Complex Loci in Type 1 Diabetes

The MHC haplotype alone accounts for only a portion of the genetic contribution to the risk for T1D in siblings of a proband. Family studies in T1D (Table 8-12) suggest that even when siblings share the same MHC class II haplotypes, the risk for disease is only approximately 17%, still well below the MZ twin concordance rate of approximately 40%. Thus there must be other genes, elsewhere in the genome, that contribute to the development of T1D, assuming that MZ twins and sibs have similar environmental exposures. Indeed, genetic association studies (to be described in Chapter 10) indicate that variation at nearly 50 different loci around the genome can increase susceptibility to T1D, although most have very small effects on increasing disease susceptibility.

TABLE 8-12

Empirical Risks for Counseling in Type 1 Diabetes

Relationship to Affected Individual

Risk for Development of Type 1 Diabetes (%)



MZ twin




Sibling with no DR haplotypes in common


Sibling with 1 DR haplotype in common


Sibling with 2 DR haplotypes in common




Child of affected mother


Child of affected father


*20%-25% for particular shared haplotypes.

MZ, Monozygotic.

It is important to stress, however, that genetic factors alone do not cause T1D because the MZ twin concordance rate is only approximately 40%, not 100%. Until a more complete picture develops of the genetic and nongenetic factors that cause T1D, risk counseling using HLA haplotyping must remain empirical (see Table 8-12).

Alzheimer Disease

Alzheimer disease (AD) (Case 4) is a fatal neurodegenerative disease that affects 1% to 2% of the United States population. It is the most common cause of dementia in older adults and is responsible for more than half of all cases of dementia. As with other dementias, patients experience a chronic, progressive loss of memory and other cognitive functions, associated with loss of certain types of cortical neurons. Age, sex, and family history are the most significant risk factors for AD. Once a person reaches 65 years of age, the risk for any dementia, and AD in particular, increases substantially with age and female sex (Table 8-13).

TABLE 8-13

Cumulative Age- and Sex-Specific Risks for Alzheimer Disease and Dementia

Time Interval Past 65 Years of Age

Risk for Development of AD (%)

Risk for Development of Any Dementia (%)

65 to 80 years







65 to 100 years







AD, Alzheimer disease.

Data from Seshadri S, Wolf PA, Beiser A, et al: Lifetime risk of dementia and Alzheimer's disease. The impact of mortality on risk estimates in the Framingham Study. Neurology 49:1498-1504, 1997.

AD can be diagnosed definitively only postmortem, on the basis of neuropathological findings of characteristic protein aggregates (β-amyloid plaques and neurofibrillary tangles; see Chapter 12). The most important constituent of the plaques is a small (39– to 42–amino acid) peptide, Aβ, derived from cleavage of a normal neuronal protein, the amyloid protein precursor. The secondary structure of Aβ gives the plaques the staining characteristics of amyloid proteins.

In addition to three rare autosomal dominant forms of the disease (see Chapter 12), in which disease onset is in the third to fifth decade, there is a common form of AD with onset after the age of 60 years (late onset). This form has no obvious mendelian inheritance pattern but does show familial aggregation and an elevated relative risk ratio (λs ≈ 4) typical of disorders with complex inheritance. Twin studies have been inconsistent but suggest MZ concordance of approximately 50% and DZ concordance of approximately 18%.

The ε4 Allele of Apolipoprotein E

The major locus with alleles found to be significantly associated with common late-onset AD is APOE, which encodes apolipoprotein E. Apolipoprotein E is a protein component of the LDL particle and is involved in clearing LDL through an interaction with high-affinity receptors in the liver. Apolipoprotein E is also a constituent of amyloid plaques in AD and is known to bind the Aβ peptide. The APOE gene has three alleles, ε2, ε3, and ε4, due to substitutions of arginine for two different cysteine residues in the protein (see Chapter 12).

When the genotypes at the APOE locus were analyzed in AD patients and controls, a genotype with at least one ε4 allele was found two to three times more frequently among patients than among controls in both the general U.S. and Japanese populations (Table 8-14), with much less of an association in Hispanic and African American populations. Even more striking is that the risk for AD appears to increase further if both APOE alleles are ε4, through an effect on the age at onset of AD; patients with two ε4 alleles have an earlier onset of disease than those with only one. In a study of patients with AD and unaffected controls, the age at which AD developed in the affected patients was earliest for ε4/ε4 homozygotes, next for ε4/ε3 heterozygotes, and significantly later for the other genotypes (Fig. 8-11).

TABLE 8-14

Association of Apolipoprotein E ε4 Allele with Alzheimer Disease*


*Frequency of genotypes with and without the ε4 allele among Alzheimer disease (AD) patients and controls from the United States and Japan.


FIGURE 8-11 Chance of developing Alzheimer disease as a function of age for different APOE genotypes for each sex. At one extreme is the ε4/ε4 homozygote, who has ≈40% chance of remaining free of the disease by the age of 85 years, whereas an ε3/ε3 homozygote has ≈70% to ≈90% chance of remaining disease free at the age of 85 years, depending on the sex. General population risk is also shown for comparison. See Sources & Acknowledgments.

In the population in general, the risk for developing AD by age 80 approaches 10%. The ε4 allele is clearly a predisposing factor that increases the risk for development of AD by shifting the age at onset to an earlier age, such that ε3/ε4 heterozygotes have a 40% risk for developing the disease, and ε4/ε4 homozygotes have a 60% risk, by age 85. Despite this increased risk, other genetic and environmental factors must be important because a significant proportion of ε3/ε4 and ε4/ε4 individuals live to extreme old age with no evidence of AD. There are also reports of association between the presence of the ε4 allele and neurodegenerative disease after traumatic head injury (as seen in professional boxers, football players, and soldiers who have suffered blast injuries), indicating that at least one environmental factor, brain trauma, can interact with the ε4 allele in the pathogenesis of AD.
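The risks quoted here and the disease-free percentages in the legend to Figure 8-11 are two views of the same quantity: the chance of remaining free of AD by a given age is one minus the cumulative risk. The brief Python sketch below makes that link explicit for the two genotypes whose risks by age 85 are stated in the text. The dictionary and function names are illustrative, and no values are supplied for genotypes the text does not quantify.

```python
# Cumulative risk of AD by age 85, as quoted in the text.
# Alleles are written "e3"/"e4" for ε3/ε4; keys are sorted allele pairs.
RISK_BY_AGE_85 = {
    ("e3", "e4"): 0.40,
    ("e4", "e4"): 0.60,
}


def risk_by_85(allele_1: str, allele_2: str) -> float:
    """Look up the quoted cumulative risk for an APOE genotype."""
    genotype = tuple(sorted((allele_1, allele_2)))
    if genotype not in RISK_BY_AGE_85:
        raise KeyError(f"no risk quoted in the text for genotype {genotype}")
    return RISK_BY_AGE_85[genotype]


def disease_free_by_85(allele_1: str, allele_2: str) -> float:
    """Chance of remaining free of AD at age 85: 1 minus cumulative risk."""
    return 1.0 - risk_by_85(allele_1, allele_2)


for genotype in (("e4", "e3"), ("e4", "e4")):
    free = disease_free_by_85(*genotype)
    print(f"{'/'.join(genotype)}: {free:.0%} remain disease free at age 85")
```

The result for ε4/ε4 homozygotes matches the ≈40% disease-free figure given in the legend to Figure 8-11, which is what makes ε4 a predisposing rather than a determining allele.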

The ε4 variant of APOE represents a prime example of a predisposing allele: it predisposes to a complex trait in a powerful way but does not predestine any individual carrying the allele to the disease. Additional genes as well as environmental effects are also clearly involved; although several of these appear to have a significant effect, most remain to be identified. In general, testing of asymptomatic people for the APOE ε4 allele remains inadvisable because knowing that one is a heterozygote or homozygote for the ε4 allele does not mean one will develop AD, nor are there any interventions currently known that can affect the chance one will or will not develop AD (see Chapter 18).

The Challenge of Multifactorial Disease with Complex Inheritance

The greatest challenge facing medical genetics and genomic medicine going forward is unraveling the complex interactions between the variants at multiple loci and the relevant environmental factors that underlie the susceptibility to common multifactorial disease. This area of research is the central focus of the field of population-based genetic epidemiology (to be discussed more fully in Chapter 10). The field is developing rapidly, and it is clear that the genetic contribution to many more complex diseases in humans will be elucidated in the coming years. Such understanding will, in time, allow the development of novel preventive and therapeutic measures for the common disorders that cause such significant morbidity and mortality in the population.

General References

Chakravarti A, Clark AG, Mootha VK. Distilling pathophysiology from complex disease genetics. Cell. 2013;155:21–26.

Rimoin DL, Pyeritz RE, Korf BR. Emery and Rimoin's essential medical genetics. Academic Press (Elsevier): Waltham, MA; 2013.

Scott W, Ritchie M. Genetic analysis of complex disease. ed 3. John Wiley and Sons: Hoboken, NJ; 2014.

References for Specific Topics

Amiel J, Sproat-Emison E, Garcia-Barcelo M, et al. Hirschsprung disease, associated syndromes, and genetics: a review. J Med Genet. 2008;45:1–14.

Bertram L, Lill CM, Tanzi RE. The genetics of Alzheimer disease: back to the future. Neuron. 2010;68:270–281.

Concannon P, Rich SS, Nepom GT. Genetics of type 1A diabetes. N Engl J Med. 2009;360:1646–1664.

Emison ES, Garcia-Barcelo M, Grice EA, et al. Differential contributions of rare and common, coding and noncoding Ret mutations to multifactorial Hirschsprung Disease liability. Am J Hum Genet. 2010;87:60–74.

Malhotra D, McCarthy S, Michaelson JJ, et al. High frequencies of de novo CNVs in bipolar disorder and schizophrenia. Neuron. 2011;72:951–963.

Segal JB, Brotman DJ, Necochea AJ, et al. Predictive value of Factor V Leiden and prothrombin G20210A in adults with venous thromboembolism and in family members of those with a mutation. JAMA. 2009;301:2472–2485.

Trowsdale J, Knight JC. Major histocompatibility complex genomics and human disease. Annu Rev Genomics Hum Genet. 2013;14:301–323.


1. For a certain malformation, the recurrence risk in sibs and offspring of affected persons is 10%, the risk in nieces and nephews is 5%, and the risk in first cousins is 2.5%.

a. Is this more likely to be an autosomal dominant trait with reduced penetrance or a multifactorial trait? Explain.

b. What other information might support your conclusion?

2. A large sex difference in affected persons is often a clue to X-linked inheritance. How would you establish that pyloric stenosis is multifactorial rather than X-linked?

3. A series of children with a particular congenital malformation includes both boys and girls. In all cases, the parents are normal. How would you determine whether the malformation is more likely to be multifactorial than autosomal recessive?