Thompson & Thompson Genetics in Medicine, 8th Edition

CHAPTER 9. Genetic Variation in Populations

We have explored in previous chapters the nature of genetic and genomic variation and mutation and the inheritance of different alleles in families. Throughout, we have alluded to differences in the frequencies of different alleles in different populations, whether assessed by examining different single nucleotide polymorphisms (SNPs), indels, or copy number variants (CNVs) in the genomes of many thousands of individuals studied worldwide (see Chapter 4) or inferred by ascertaining individuals with specific phenotypes and genetic disorders among populations around the globe (see Chapters 7 and 8). Here we consider in greater detail the genetics of populations and the principles that influence the frequency of genotypes and phenotypes in those populations.

Population genetics is the quantitative study of the distribution of genetic variation in populations and of how the frequencies of genes and genotypes are maintained or change over time both within and between populations. Population genetics is concerned both with genetic factors, such as mutation and reproduction, and with environmental and societal factors, such as selection and migration, which together determine the frequency and distribution of alleles and genotypes in families and communities. A mathematical description of the behavior of alleles in populations is an important element of many disciplines, including anthropology, evolutionary biology, and human genetics. At present, human geneticists use the principles and methods of population genetics to address many unanswered questions concerning the history and genetic structure of human populations, the flow of alleles between populations and between generations, and, importantly, the optimal methods for identifying genetic susceptibilities to common diseases, as we introduced in Chapter 8. In the practice of medical genetics, population genetics provides knowledge about various disease genes that are common in different populations. Such information is needed for clinical diagnosis and genetic counseling, including determining the allele frequencies required for accurate risk calculations.

In this chapter, we describe the central, organizing concept of population genetics, Hardy-Weinberg equilibrium; we consider its assumptions and the factors that may cause true or apparent deviation from equilibrium in real as opposed to idealized populations. Finally, we provide some insight into how differences in allelic variant or disease gene frequencies arise among members of different, more or less genetically isolated groups.

Genotypes and Phenotypes in Populations

Allele and Genotype Frequencies in Populations

To illustrate the relationship between allele and genotype frequencies in populations, we begin with an important example of a common autosomal trait governed by a single pair of alleles. Consider the gene CCR5, which encodes a cell surface cytokine receptor that serves as an entry point for certain strains of the human immunodeficiency virus (HIV), which causes the acquired immunodeficiency syndrome (AIDS). A 32-bp deletion in this gene results in an allele (ΔCCR5) that encodes a nonfunctional protein due to a frameshift and premature termination. Individuals homozygous for the ΔCCR5 allele do not express the receptor on the surface of their immune cells and, as a consequence, are resistant to HIV infection. Loss of function of CCR5 appears to be a benign trait, and its only known phenotypic consequence is resistance to HIV infection. A sampling of 788 individuals from Europe illustrates the distribution of individuals who were homozygous for the wild-type CCR5 allele, homozygous for the ΔCCR5 allele, or heterozygous (Table 9-1).


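Table 9-1. Genotype Frequencies for the Wild-Type CCR5 Allele and the ΔCCR5 Deletion Allele

Genotype          Number of individuals   Observed relative genotype frequency
CCR5/CCR5         647                     0.821
CCR5/ΔCCR5        134                     0.170
ΔCCR5/ΔCCR5       7                       0.009
Total             788                     1.000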


Data from Martinson JJ, Chapman NH, Rees DC, et al: Global distribution of the CCR5 gene 32-basepair deletion. Nat Genet 16:100-103, 1997.

On the basis of the observed genotype frequencies, we can directly determine the allele frequencies by simply counting the alleles. In this context, when we refer to the population frequency of an allele, we are considering a hypothetical gene pool as a collection of all the alleles at a particular locus for the entire population. For autosomal loci, the size of the gene pool at one locus is twice the number of individuals in the population because each autosomal genotype consists of two alleles; that is, a ΔCCR5/ΔCCR5 individual has two ΔCCR5 alleles, and a CCR5/ΔCCR5 individual has one of each. In this example, then, the observed frequency of the CCR5 allele is:

$$\text{Frequency of the } CCR5 \text{ allele} = \frac{(2 \times 647) + (1 \times 134)}{788 \times 2} = \frac{1428}{1576} = 0.906$$

Similarly, one can calculate the frequency of the ΔCCR5 allele by counting how many ΔCCR5 alleles are present: (2 × 7) + (1 × 134) = 148 out of a total of 1576 alleles in this sample, giving a ΔCCR5 allele frequency of 148/1576 = 0.094. Alternatively (and more simply), one can subtract the frequency of the wild-type CCR5 allele, 0.906, from 1, because the frequencies of the two alleles must add up to 1; this again gives a ΔCCR5 allele frequency of 0.094.
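The allele-counting procedure can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `allele_frequencies` and its parameter names are our own, and the genotype counts (647, 134, and 7 among 788 individuals) are those quoted in the text.

```python
def allele_frequencies(n_wt_hom, n_het, n_del_hom):
    """Estimate allele frequencies at an autosomal locus by counting alleles.

    Each individual contributes two alleles, so the gene pool holds
    2 x (number of individuals) alleles.
    """
    total_alleles = 2 * (n_wt_hom + n_het + n_del_hom)
    p = (2 * n_wt_hom + n_het) / total_alleles   # wild-type CCR5 allele
    q = (2 * n_del_hom + n_het) / total_alleles  # deletion (ΔCCR5) allele
    return p, q


p, q = allele_frequencies(647, 134, 7)
print(f"CCR5 allele frequency:  {p:.3f}")
print(f"ΔCCR5 allele frequency: {q:.3f}")
```

Because every allele in the sample is either CCR5 or ΔCCR5, the two returned values always sum to 1, which is the shortcut noted in the text.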

The Hardy-Weinberg Law

As we have just shown with the CCR5 example, we can use a sample of individuals with known genotypes in a population to derive estimates of the allele frequencies by simply counting the alleles in individuals with each genotype. How about the converse? Can we calculate the proportion of the population with various genotypes once we know the allele frequencies? Deriving genotype frequencies from allele frequencies is not as straightforward as counting because we actually do not know in advance how the alleles are distributed among homozygotes and heterozygotes. If a population meets certain assumptions (see later), however, there is a simple mathematical equation for calculating genotype frequencies from allele frequencies. This equation is known as the Hardy-Weinberg law. This law, the cornerstone of population genetics, was named for Godfrey Hardy, an English mathematician, and Wilhelm Weinberg, a German physician, who independently formulated it in 1908.

The Hardy-Weinberg law has two critical components. The first is that under certain ideal conditions (see Box), a simple relationship exists between allele frequencies and genotype frequencies in a population. Suppose p is the frequency of allele A, and q is the frequency of allele a in the gene pool. Assume alleles combine into genotypes randomly; that is, mating in the population is completely at random with respect to the genotypes at this locus. The chance that two A alleles will pair up to give the AA genotype is then $p^2$; the chance that two a alleles will come together to give the aa genotype is $q^2$; and the chance of having one A and one a pair, resulting in the Aa genotype, is $2pq$ (the factor 2 comes from the fact that the A allele could be inherited from the mother and the a allele from the father, or vice versa). The Hardy-Weinberg law states that the frequency of the three genotypes AA, Aa, and aa is given by the terms of the binomial expansion of $(p + q)^2 = p^2 + 2pq + q^2$. This law applies to all autosomal loci and to the X chromosome in females, but not to X-linked loci in males, who have only a single X chromosome.

Box. The Hardy-Weinberg Law

The Hardy-Weinberg law rests on these assumptions:

• The population under study is large, and matings are random with respect to the locus in question.

• Allele frequencies remain constant over time because of the following:

    • There is no appreciable rate of new mutation.

    • Individuals with all genotypes are equally capable of mating and passing on their genes; that is, there is no selection against any particular genotype.

    • There has been no significant immigration of individuals from a population with allele frequencies very different from the endogenous population.

A population that reasonably appears to meet these assumptions is considered to be in Hardy-Weinberg equilibrium.

The law can be adapted for genes with more than two alleles. For example, if a locus has three alleles, with frequencies p, q, and r, the genotypic distribution can be determined from $(p + q + r)^2$. In general terms, the genotype frequencies for any known number of alleles $a_n$ with allele frequencies $p_1, p_2, \ldots, p_n$ can be derived from the terms of the expansion of $(p_1 + p_2 + \cdots + p_n)^2$.
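As a minimal sketch of this expansion, the following Python function lists the expected genotype frequencies for any number of alleles. The function name and the allele frequencies in the example (0.7/0.3 and 0.5/0.3/0.2) are hypothetical values chosen for illustration; they do not come from any population discussed in this chapter.

```python
from itertools import combinations_with_replacement


def hardy_weinberg_genotypes(allele_freqs):
    """Expected genotype frequencies under Hardy-Weinberg proportions.

    allele_freqs maps each allele name to its frequency; the frequencies
    must sum to 1. The result holds the terms of (p1 + p2 + ... + pn)^2.
    """
    if abs(sum(allele_freqs.values()) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    genotypes = {}
    for a, b in combinations_with_replacement(sorted(allele_freqs), 2):
        freq = allele_freqs[a] * allele_freqs[b]
        if a != b:
            freq *= 2  # a heterozygote can arise in two ways
        genotypes[(a, b)] = freq
    return genotypes


# Two alleles: p^2, 2pq, q^2
two_alleles = hardy_weinberg_genotypes({"A": 0.7, "a": 0.3})

# Three alleles: the six terms of (p + q + r)^2
three_alleles = hardy_weinberg_genotypes({"A1": 0.5, "A2": 0.3, "A3": 0.2})

for genotype, freq in three_alleles.items():
    print(genotype, round(freq, 4))
```

With n alleles there are n homozygous and n(n − 1)/2 heterozygous genotypes, and their frequencies sum to 1.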

A second component of the Hardy-Weinberg law is that if allele frequencies do not change from generation to generation, the proportion of the genotypes will not change either; that is, the population genotype frequencies from generation to generation will remain constant, at equilibrium, if the allele frequencies p and q remain constant. More specifically, when there is random mating in a population that is at equilibrium and genotypes AA, Aa, and aa are present in the proportions $p^2 : 2pq : q^2$, then genotype frequencies in the next generation will remain in the same relative proportions, $p^2 : 2pq : q^2$. Proof of this equilibrium is shown in Table 9-2. It is important to note that Hardy-Weinberg equilibrium does not specify any particular values for p and q; whatever allele frequencies happen to be present in the population will result in genotype frequencies of $p^2 : 2pq : q^2$, and these relative genotype frequencies will remain constant from generation to generation as long as the allele frequencies remain constant and the other conditions introduced in the Box are met.


Table 9-2. Frequencies of Mating Types and Offspring for a Population in Hardy-Weinberg Equilibrium with Parental Genotypes in the Proportion $p^2 : 2pq : q^2$


Sum of AA offspring = $p^4 + p^3q + p^3q + p^2q^2 = p^2(p^2 + 2pq + q^2) = p^2(p + q)^2 = p^2$. (Remember that $p + q = 1$.)

Sum of Aa offspring = $p^3q + p^3q + p^2q^2 + p^2q^2 + 2p^2q^2 + pq^3 + pq^3 = 2pq(p^2 + 2pq + q^2) = 2pq(p + q)^2 = 2pq$.

Sum of aa offspring = $p^2q^2 + pq^3 + pq^3 + q^4 = q^2(p^2 + 2pq + q^2) = q^2(p + q)^2 = q^2$.
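The same bookkeeping can be checked numerically. The sketch below enumerates every mother-father combination, weights it by the product of the parental genotype frequencies (random mating), and applies Mendelian segregation. The names `GAMETE_A` and `next_generation` and the allele frequencies 7/10 and 3/10 are illustrative assumptions; exact fractions are used so that the comparison is not blurred by rounding.

```python
from fractions import Fraction
from itertools import product

# Probability that a parent of each genotype transmits the A allele
GAMETE_A = {"AA": Fraction(1), "Aa": Fraction(1, 2), "aa": Fraction(0)}


def next_generation(geno_freqs):
    """Offspring genotype frequencies after one round of random mating."""
    offspring = {"AA": Fraction(0), "Aa": Fraction(0), "aa": Fraction(0)}
    for mother, father in product(geno_freqs, repeat=2):
        mating_freq = geno_freqs[mother] * geno_freqs[father]
        m, f = GAMETE_A[mother], GAMETE_A[father]
        offspring["AA"] += mating_freq * m * f
        offspring["Aa"] += mating_freq * (m * (1 - f) + (1 - m) * f)
        offspring["aa"] += mating_freq * (1 - m) * (1 - f)
    return offspring


p, q = Fraction(7, 10), Fraction(3, 10)  # hypothetical allele frequencies
parents = {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}
children = next_generation(parents)
print(children == parents)  # genotype proportions are unchanged

# A population that starts out of equilibrium (no heterozygotes at all)
skewed = {"AA": Fraction(1, 2), "Aa": Fraction(0), "aa": Fraction(1, 2)}
print(next_generation(skewed))
```

The second call illustrates a related point under the same idealized assumptions: a single generation of random mating is enough to bring the genotypes into the proportions $p^2 : 2pq : q^2$ for whatever allele frequencies are present.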

Applying the Hardy-Weinberg law to the CCR5 example given earlier, with relative frequencies of the two alleles in the population of 0.906 (for the wild-type allele CCR5) and 0.094 (for ΔCCR5), the relative proportions of the three combinations of alleles (genotypes) are $p^2$ = 0.906 × 0.906 = 0.821 (for an individual having two wild-type CCR5 alleles), $q^2$ = 0.094 × 0.094 = 0.009 (for two ΔCCR5 alleles), and $2pq$ = (0.906 × 0.094) + (0.094 × 0.906) = 0.170 (for one CCR5 and one ΔCCR5 allele). When these genotype frequencies, which were calculated by the Hardy-Weinberg law, are applied to a population of 788 individuals, the derived numbers of people with the three different genotypes (647 : 134 : 7) are, in fact, identical to the actual observed numbers in Table 9-1. As long as the assumptions of the Hardy-Weinberg law are met in a population, we would expect these genotype frequencies (0.821 : 0.170 : 0.009) to remain constant generation after generation in that population.
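The comparison of expected and observed genotype counts can be reproduced with a short sketch. The function name `expected_genotype_counts` is our own; the allele frequency (0.906) and sample size (788) are taken from the text.

```python
def expected_genotype_counts(p, n_individuals):
    """Expected genotype counts under Hardy-Weinberg proportions."""
    q = 1 - p
    freqs = {
        "CCR5/CCR5": p * p,
        "CCR5/ΔCCR5": 2 * p * q,
        "ΔCCR5/ΔCCR5": q * q,
    }
    return {genotype: f * n_individuals for genotype, f in freqs.items()}


expected = expected_genotype_counts(0.906, 788)
observed = {"CCR5/CCR5": 647, "CCR5/ΔCCR5": 134, "ΔCCR5/ΔCCR5": 7}

for genotype, count in expected.items():
    print(f"{genotype}: expected {count:.1f}, observed {observed[genotype]}")
```

The expected counts are not whole numbers; it is after rounding to the nearest individual that they coincide with the observed counts.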

The Hardy-Weinberg Law in Autosomal Recessive Disease

The major practical application of the Hardy-Weinberg law in medical genetics is in genetic counseling for autosomal recessive disorders. For a disease such as phenylketonuria (PKU), there are hundreds of different mutant alleles with frequencies that vary among different population groups defined by geography and/or ethnicity (see Chapter 12). Affected individuals can be homozygotes for the same mutant allele but, more often than not, are compound heterozygotes for different mutant alleles (see Chapter 7). For many disorders, however, it is convenient to consider all disease-causing alleles together and treat them as a single mutant allele, with frequency q, even when there is significant allelic heterogeneity in disease-causing alleles. Similarly, the combined frequency of all wild-type or normal alleles, p, is given by 1 − q.

Suppose we would like to know the frequency of all disease-causing PKU alleles in a population for use in genetic counseling, for example, to inform couples of their risk for having a child with PKU. If we were to attempt to determine the frequency of disease-causing PKU alleles directly from genotype frequencies, as we did in the earlier example of the ΔCCR5 allele, we would need to know the frequency of heterozygotes in the population, a frequency that cannot be measured directly because of the recessive nature of PKU; heterozygotes are asymptomatic silent carriers (see Chapter 7), and their frequency in the population (i.e., 2pq) cannot be reliably determined directly from phenotype.

However, the frequency of affected homozygotes/compound heterozygotes for disease-causing alleles in the population (i.e., $q^2$) can be determined directly, by counting the number of babies with PKU born over a given period of time and identified through newborn screening programs (see Chapter 18), divided by the total number of babies screened during that same period of time. Now, using the Hardy-Weinberg law, we can calculate the mutant allele frequency (q) from the observed frequency of homozygotes/compound heterozygotes alone ($q^2$), thereby providing an estimate ($2pq$) of the frequency of heterozygotes for use in genetic counseling.

To illustrate this example further, consider a population in Ireland, where the frequency of PKU is approximately 1 per 4500. If we group all disease-causing alleles together and treat them as a single allele with frequency q, then the frequency of affected individuals $q^2 = 1/4500$. From this, we calculate q = 0.015, and thus $2pq$ = 0.029. The carrier frequency for all disease-causing alleles lumped together in the Irish population is therefore approximately 3%. For an individual known to be a carrier of PKU through the birth of an affected child in the family, there would then be an approximately 3% chance that he or she would find a new mate of Irish ethnicity who would also be a carrier, and this estimate could be used to provide genetic counseling. Note, however, that this estimate applies only to the population in question; if the new mate were not from Ireland, but from Finland, where the frequency of PKU is much lower (≈1 per 200,000), the same calculation gives q ≈ 0.0022 and $2pq$ ≈ 0.0045, so his or her chance of being a carrier would be only approximately 0.45%.
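The calculation from disease incidence to carrier frequency is easy to script. The sketch below assumes, as the text does, that the population is in Hardy-Weinberg equilibrium and that all disease-causing alleles are pooled into a single allele of frequency q; the function name `carrier_frequency` is our own.

```python
import math


def carrier_frequency(incidence):
    """Return (q, 2pq) for an autosomal recessive disorder.

    incidence is the frequency of affected individuals, taken to equal q^2.
    """
    q = math.sqrt(incidence)
    p = 1 - q
    return q, 2 * p * q


q_ireland, carriers_ireland = carrier_frequency(1 / 4500)
print(f"q   = {q_ireland:.3f}")
print(f"2pq = {carriers_ireland:.3f}")
```

Because the carrier frequency scales roughly with the square root of the incidence, a large difference in disease frequency between two populations corresponds to a much smaller difference in carrier frequency.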

In this example, we lumped all PKU-causing alleles together for the purpose of estimating q. For other disorders, however, such as hemoglobin disorders that we will consider in Chapter 11, different mutant alleles can lead to very different diseases, and therefore it would make no sense to group all mutant alleles together, even when the same locus is involved. Instead, the frequency of alleles leading to different phenotypes (such as sickle cell anemia and β-thalassemia in the case of different mutant alleles at the β-globin locus) is calculated separately.

The Hardy-Weinberg Law in X-Linked Disease

Recall from Chapter 7 that, for X-linked genes, there are three female genotypes but only two possible male genotypes. To illustrate gene frequencies and genotype frequencies when the gene of interest is X-linked, we use the trait known as X-linked red-green color blindness, which is caused by mutations in the series of visual pigment genes on the X chromosome. We use color blindness as an example because, as far as we know, it is not a deleterious trait (except for possible difficulties with traffic lights), and color blind persons are not subject to selection. As discussed later, allowing for the effect of selection complicates estimates of gene frequencies.

In this example, we use the symbol cb for all the mutant color blindness alleles and the symbol + for the wild-type allele, with frequencies q and p, respectively (Table 9-3). The frequencies of the two alleles can be determined directly from the incidence of the corresponding phenotypes in males by simply counting the alleles. Because females have two X chromosomes, their genotypes are distributed like autosomal genotypes, but because color blindness alleles are recessive, the normal homozygotes and heterozygotes are typically not distinguishable. As shown in Table 9-3, the frequency of color blindness in females is much lower than that in males. Less than 1% of females are color blind, but nearly 15% are carriers of a mutant color blindness allele and have a 50% chance of having a color blind son with each male pregnancy.


Table 9-3. X-Linked Genes and Genotype Frequencies (Color Blindness)
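The relationship between male and female frequencies for an X-linked trait can be sketched as follows. The allele frequency q = 0.08 used here is an assumption chosen only because it reproduces the figures quoted in the text (nearly 15% carrier females, fewer than 1% affected females); the function and dictionary key names are our own.

```python
def x_linked_frequencies(q):
    """Genotype frequencies for an X-linked recessive allele of frequency q.

    Males are hemizygous, so their genotype frequencies equal the allele
    frequencies. Females carry two X chromosomes and follow p^2 : 2pq : q^2.
    """
    p = 1 - q
    return {
        "male_unaffected": p,
        "male_affected": q,
        "female_homozygous_normal": p * p,
        "female_carrier": 2 * p * q,
        "female_affected": q * q,
    }


freqs = x_linked_frequencies(0.08)  # assumed value of q, for illustration
for label, value in freqs.items():
    print(f"{label}: {value:.4f}")
```

Under these assumptions the ratio of affected males to affected females is q : q², or 1/q, which is why X-linked recessive phenotypes are so much more common in males.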


Factors That Disturb Hardy-Weinberg Equilibrium

Underlying the Hardy-Weinberg law and its use are a number of assumptions (see Box, earlier), not all of which can be met (or reasonably inferred to be met) by all populations. The first is that the population under study is large and that mating is random. However, a very small population in which random events can radically alter an allele frequency may not meet this first assumption. This first assumption is also breached when the population contains subgroups whose members choose to marry within their own subgroup rather than the population at large. The second assumption is that allele frequencies do not change significantly over time. This requires that there is no migration in or out of the population by groups whose allele frequencies at a locus of interest are radically different from the allele frequencies in the population as a whole. Similarly, selection for or against particular alleles, or the addition of new alleles to the gene pool due to mutations, will break the assumptions of the Hardy-Weinberg law.

In practice, some of these violations are more damaging than others to the application of the law to human populations. As shown in the sections that follow, violating the assumption of random mating can cause large deviations from the frequency of individuals homozygous for an autosomal recessive condition that we might expect from population allele frequencies. On the other hand, changes in allele frequency due to mutation, selection, or migration usually cause more minor and subtle deviations from Hardy-Weinberg equilibrium. Finally, when Hardy-Weinberg equilibrium does not hold for a particular disease allele at a particular locus, it may be instructive to investigate why the allele and its associated genotypes are not in equilibrium because this may provide clues about the pathogenesis of the condition or point to historical events that have affected the frequency of alleles in different population groups over time.

Exceptions to Large Populations with Random Mating

As introduced earlier, the principle of random mating is that for any locus, an individual of a given genotype has a purely random probability of mating with an individual of any other genotype, the proportions being determined only by the relative frequencies of the different genotypes in the population. One's choice of mate, however, may not be at random. In human populations, nonrandom mating may occur because of three distinct but related phenomena: stratification, assortative mating, and consanguinity.

Stratification


Stratification describes a population in which there are a number of subgroups that have—for a variety of historical, cultural, or religious reasons—remained relatively genetically separate during modern times. Worldwide, there are numerous stratified populations; for example, the United States population is stratified into many subgroups, including whites of northern or southern European ancestry, African Americans, and numerous Native American, Asian, and Hispanic groups. Similarly stratified populations exist in other parts of the world as well, either currently or in the recent past, such as Sunni and Shia Muslims, Orthodox Jews, French-speaking Canadians, or different castes in India. When mate selection in a population is restricted for any reason to members of one particular subgroup, and that subgroup happens to have a variant allele with a higher frequency than in the population as a whole, the result will be an apparent excess of homozygotes in the overall population beyond what one would predict from allele frequencies in the population as a whole if there were truly random mating.

To illustrate this point, suppose a population contains a minority group, constituting 10% of the population, in which a mutant allele for an autosomal recessive disease has a frequency $q_{min} = 0.05$ and the wild-type allele has frequency $p_{min} = 0.95$. In the remaining majority 90% of the population, the mutant allele is nearly absent (i.e., $q_{maj} \approx 0$ and $p_{maj} \approx 1$). An example of just such a situation is the African American population of the United States and the mutant allele at the β-globin locus responsible for sickle cell disease (Case 42). The overall frequency of the disease allele in the total population, $q_{pop}$, is therefore equal to 0.1 × 0.05 = 0.005, and, simply applying the Hardy-Weinberg law, the frequency of the disease in the population as a whole would be predicted to be $q_{pop}^2 = (0.005)^2 = 2.5 \times 10^{-5}$ if mating were perfectly random throughout the entire population. If, however, individuals belonging to the minority group were to mate exclusively with other members of that same minority group (an extreme situation that does not apply in reality), then the frequency of affected individuals in the minority group would be $q_{min}^2 = (0.05)^2 = 0.0025$. Because the minority group is one tenth of the entire population, the frequency of disease in the total population is 0.0025/10 = $2.5 \times 10^{-4}$, or 10-fold higher than the calculated $q_{pop}^2 = 2.5 \times 10^{-5}$ obtained by naively applying the Hardy-Weinberg law to the population as a whole without consideration of stratification.
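The effect of stratification in this example can be reproduced with a short sketch. The two function names are our own, and the subgroup values are those of the hypothetical population just described (a 10% minority with q = 0.05 and a 90% majority in which q is taken to be 0).

```python
def disease_frequency_random(subgroups):
    """Predicted frequency of affected homozygotes if the whole population
    mated at random. subgroups is a list of (population_fraction, q)."""
    q_pop = sum(fraction * q for fraction, q in subgroups)
    return q_pop ** 2


def disease_frequency_stratified(subgroups):
    """Frequency of affected homozygotes if mating occurs only within
    each subgroup (the extreme case of complete stratification)."""
    return sum(fraction * q ** 2 for fraction, q in subgroups)


groups = [(0.1, 0.05), (0.9, 0.0)]
naive = disease_frequency_random(groups)
stratified = disease_frequency_stratified(groups)

print(f"random mating:  {naive:.2e}")
print(f"stratified:     {stratified:.2e}")
print(f"fold increase:  {stratified / naive:.1f}")
```

Real populations fall somewhere between these two extremes, so the sketch brackets the expected disease frequency rather than predicting it.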

By way of comparison, stratification has no effect on the frequency of autosomal dominant disease and would have only a minor effect on the frequency of X-linked disease by increasing the small number of females homozygous for the mutant allele.

Assortative Mating

Assortative mating is the choice of a mate because the mate possesses some particular trait. Assortative mating is usually positive; that is, people tend to choose mates who resemble themselves (e.g., in native language, intelligence, stature, skin color, musical talent, or athletic ability). To the extent that the characteristic shared by the partners is genetically determined, the overall genetic effect of positive assortative mating is an increase in the proportion of the homozygous genotypes at the expense of the heterozygous genotype.

A clinically important aspect of assortative mating is the tendency to choose partners with similar medical problems, such as congenital deafness or blindness or exceptionally short stature. In such a case, the expectations of Hardy-Weinberg equilibrium do not apply because the genotype of the mate at the disease locus is not determined by the allele frequencies found in the general population. For example, consider achondroplasia (Case 2), an autosomal dominant form of skeletal dysplasia with a population incidence of 1 per 15,000 to 1 per 40,000 live births. Offspring homozygous for the achondroplasia mutation have a severe, lethal form of skeletal dysplasia that is almost never seen unless both parents have achondroplasia and are thus heterozygous for the mutation. This would be highly unlikely to occur by chance, except for assortative mating among those with achondroplasia.

When mates have autosomal recessive disorders caused by the same mutation or by allelic mutations in the same gene, all of their offspring will also have the disease. Importantly, however, not all cases of blindness, deafness, or short stature have the same genetic basis; many families have been described, for example, in which two parents with albinism have had children with normal pigmentation, or two deaf parents have had hearing children, because of locus heterogeneity (discussed in Chapter 7). Even if there is locus heterogeneity with assortative mating, however, the chance that two individuals are carrying mutations in the same disease locus is increased over what it would be under true random mating, and therefore the risk for the disorder in their offspring is also increased. Although the long-term population effect of this kind of positive assortative mating on disease gene frequencies is insignificant, a specific family may find itself at very high genetic risk that would not be predicted from strict application of the Hardy-Weinberg law.

Consanguinity and Inbreeding

Consanguinity, like stratification and positive assortative mating, brings about an increase in the frequency of autosomal recessive disease by increasing the frequency with which carriers of an autosomal recessive disorder mate. Unlike the disorders in stratified populations, in which each subgroup is likely to have a high frequency of a few alleles, the kinds of recessive disorders seen in the offspring of related parents may be very rare and unusual in the population as a whole because consanguineous mating allows an uncommon allele inherited from a heterozygous common ancestor to become homozygous. A similar phenomenon is seen in genetic isolates, small populations derived from a limited number of common ancestors who tended to mate only among themselves. Mating between two apparently “unrelated” individuals in a genetic isolate may have the same risk for certain recessive conditions as that observed in consanguineous marriages because the individuals are both carriers by inheritance from common ancestors of the isolate, a phenomenon known as inbreeding.

For example, among Ashkenazi Jews in North America, mutant alleles for Tay-Sachs disease (GM2 gangliosidosis) (Case 43), discussed in detail in Chapter 12, are relatively more common than in other ethnic groups. The frequency of Tay-Sachs disease is 100 times higher in Ashkenazi Jews (1 per 3600) than in most other populations (1 per 360,000). Thus the Tay-Sachs carrier frequency among Ashkenazi Jews is approximately 1 in 30 ($q^2 = 1/3600$, $q = 1/60$, $2pq \approx 1/30$) as compared to a carrier frequency of approximately 1 in 300 in non-Ashkenazi individuals.

Exceptions to Constant Allele Frequencies

Effect of Mutation

We have shown that nonrandom mating can substantially upset the relative frequency of various genotypes predicted by the Hardy-Weinberg law, even within the time of a single generation. In contrast, changes in allele frequency due to selection or mutation usually occur slowly, in small increments, and cause much less deviation from Hardy-Weinberg equilibrium, at least for recessive diseases.

The rates of new mutations (see Chapter 4) are generally well below the frequency of heterozygotes for autosomal recessive diseases; the addition of new mutant alleles to the gene pool thus has little effect in the short term on allele frequencies for such diseases. In addition, most deleterious recessive alleles are hidden in asymptomatic heterozygotes and thus are not subject to selection. As a consequence, selection is not likely to have major short-term effects on the allele frequency of these recessive alleles. Therefore, to a first approximation, Hardy-Weinberg equilibrium may apply even for alleles that cause severe autosomal recessive disease.

Importantly, however, for dominant or X-linked disease, mutation and selection do perturb allele frequencies from what would be expected under Hardy-Weinberg equilibrium, by substantially reducing or increasing certain genotypes.

Selection and Fitness

The molecular and genomic basis for mutation was considered earlier in Chapter 4. Here we examine the concept of fitness, the chief factor that determines whether a mutation is eliminated immediately, becomes stable in the population, or even becomes, over time, the predominant allele at the locus concerned. The frequency of an allele in a population at any given time represents a balance between the rate at which mutant alleles appear through mutation and the effects of selection. If either the mutation rate or the effectiveness of selection is altered, the allele frequency is expected to change.

Whether an allele is transmitted to the succeeding generation depends on its fitness, f, which is a measure of the number of offspring of affected persons who survive to reproductive age, compared with an appropriate control group. If a mutant allele is just as likely as the normal allele to be represented in the next generation, f equals 1. If an allele causes death or sterility, selection acts against it completely, and f equals 0. Values between 0 and 1 indicate transmission of the mutation, but at a rate that is less than that of individuals who do not carry the mutant allele.

A related parameter is the coefficient of selection, s, which is a measure of the loss of fitness and is defined as 1 − f, that is, the proportion of mutant alleles that are not passed on and are therefore lost as a result of selection. In the genetic sense, a mutation that prevents reproduction by an adult is just as “lethal” as one that causes a very early miscarriage of an embryo, because in neither case is the mutation transmitted to the next generation. Fitness is thus the outcome of the joint effects of survival and fertility. When a genetic disorder limits reproduction so severely that the fitness is zero (i.e., s = 1), it is thus referred to as a genetic lethal. In the biological sense, fitness has no connotation of superior endowment except in a single respect: comparative ability to contribute alleles to the next generation.

Selection in Recessive Disease.

Selection against harmful recessive mutations has far less effect on the population frequency of the mutant allele than does selection against dominant mutations because only a small proportion of the genes are present in homozygotes and are therefore exposed to selective forces. Even if there were complete selection against homozygotes (f = 0), as in many lethal autosomal recessive conditions, it would take many generations to reduce the gene frequency appreciably because most of the mutant alleles are carried by heterozygotes with normal fitness. For example, as we saw previously in this chapter, the frequency of mutant alleles causing Tay-Sachs disease, q, can be as high as 1.5% in Ashkenazi Jewish populations. Given this value of q, we can estimate that approximately 3% of such populations (2 × p × q) are heterozygous and carry one mutant allele, whereas only 1 individual per 3600 ($q^2$) is a homozygote with two mutant alleles. The proportion of all mutant alleles found in homozygotes in such a population is thus given by:

$$\frac{2 \times \frac{1}{3600}}{\left(2 \times \frac{1}{3600}\right) + (1 \times 0.03)} \approx 0.018$$

Thus, less than 2% of all the mutant alleles in the population are in affected homozygotes and would therefore be exposed to selection in the absence of effective treatment.
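Both points, the small share of mutant alleles exposed to selection and the slow decline in allele frequency, can be explored with a brief sketch. It assumes an idealized population with random mating, complete selection against homozygotes (f = 0), heterozygotes of normal fitness, and no new mutation. The function names are our own, and q = 1/60 is the value implied by the incidence of 1 per 3600 quoted earlier. Exact fractions avoid rounding effects.

```python
from fractions import Fraction


def fraction_of_mutant_alleles_in_homozygotes(q):
    """Share of all mutant alleles carried by homozygotes: 2q^2 / (2q^2 + 2pq)."""
    p = 1 - q
    return (2 * q * q) / (2 * q * q + 2 * p * q)


def generations_to_halve(q0):
    """Generations of complete selection against homozygotes (f = 0)
    needed to reduce the mutant allele frequency to half of q0."""
    q, generations = q0, 0
    while q > q0 / 2:
        p = 1 - q
        # Only AA (p^2) and Aa (2pq) individuals reproduce; each Aa parent
        # transmits the mutant allele to half of its offspring.
        q = (p * q) / (p * p + 2 * p * q)
        generations += 1
    return generations


q0 = Fraction(1, 60)
print(float(fraction_of_mutant_alleles_in_homozygotes(q0)))
print(generations_to_halve(q0))
```

Under these assumptions the share of mutant alleles in homozygotes simplifies to q itself, and each generation changes q to q/(1 + q), so halving the allele frequency takes on the order of 1/q generations.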

Reduction or removal of selection against an autosomal recessive disorder by successful medical treatment (e.g., as in the case of PKU [see Chapter 12]) would have just as slow an effect on increasing the gene frequency over many generations. Thus as long as mating is random, genotypes in autosomal recessive diseases can be considered to be in Hardy-Weinberg equilibrium, despite selection against homozygotes for the recessive allele. The mathematical relationship between genotype and allele frequencies described in the Hardy-Weinberg law therefore holds for most practical purposes in recessive disease.

Selection in Dominant Disorders.

In contrast to recessive mutant alleles, dominant mutant alleles are exposed directly to selection. Consequently, the effects of selection and mutation are more obvious and can be more readily measured for dominant traits. A genetic lethal dominant allele, if fully penetrant, will be exposed to selection in heterozygotes, thus removing all alleles responsible for the disorder in a single generation. Several human diseases are thought or known to be autosomal dominant traits with zero or near-zero fitness and thus always result from new rather than inherited autosomal dominant mutations (Table 9-4), a point of great significance for genetic counseling. In some, the genes and specific mutant alleles are known, and family studies show new mutations in the affected individuals that were not inherited from the parents. In other conditions, the genes are not known, but a paternal age effect (see Chapter 4) has been seen, suggesting (but not proving) that a de novo mutation in the paternal germline is a possible cause of the disorder. The implication for genetic counseling is that the parents of a child with an autosomal dominant but genetically lethal condition will typically have a very low risk for recurrence in subsequent pregnancies because the condition would generally require another independent mutation to recur. A caveat to keep in mind, however, is the possibility of germline mosaicism, as we saw in Chapter 7 (see Fig. 7-18).


TABLE 9-4 Examples of Disorders Occurring as Sporadic Conditions due to New Mutations with Zero Fitness

Disorder: Description

Early lethal form of short-limbed skeletal dysplasia

Cornelia de Lange syndrome: Intellectual disability, micromelia, synophrys, and other abnormalities; can be caused by mutation in the NIPBL gene

Osteogenesis imperfecta, type II: Perinatal lethal type, with a defect in type I collagen (COL1A1, COL1A2) (see Chapter 12)

Thanatophoric dysplasia: Early lethal form of skeletal dysplasia due to de novo mutations in the FGFR3 gene (see Fig. 7-6C)

Mutation and Selection Balance in Dominant Disease.

If a dominant disease is deleterious but not lethal, affected persons may reproduce but will nevertheless contribute fewer than the average number of offspring to the next generation; that is, their fitness, f, will be reduced. Such a mutation will be lost through selection at a rate proportional to the reduced fitness of heterozygotes. The frequency of the mutant alleles responsible for the disease in the population therefore represents a balance between loss of mutant alleles through the effects of selection and gain of mutant alleles through recurrent mutation. A stable allele frequency will be reached at whatever level balances the two opposing forces: one (selection) that removes mutant alleles from the gene pool and one (de novo mutation) that adds new ones back. The mutation rate per generation, µ, at a disease locus must be sufficient to account for that fraction of all the mutant alleles (allele frequency q) that are lost by selection from each generation. Thus,

\[ \mu = sq \]

As an illustration of this relationship, in achondroplasia, the fitness of affected patients is not zero, but they have only approximately one fifth as many children as people of normal stature in the population. Thus their average fitness, f, is 0.20, and the coefficient of selection, s, is 1 − f, or 0.80. In the subsequent generation, then, only 20% of current achondroplasia alleles are passed on from the current generation to the next. Because the frequency of achondroplasia appears stable from generation to generation, new mutations must be responsible for replacing the 80% of mutant genes in the population lost through selection.

If the fitness of affected persons suddenly improved (e.g., because of medical advances), the observed incidence of the disease in the population would be predicted to increase and reach a new equilibrium. Retinoblastoma (Case 39) and other dominant embryonic tumors with childhood onset are examples of conditions that now have a greatly improved prognosis, with a predicted consequence of increased disease frequency in the population. Allele frequency, mutation rate, and fitness are related; thus, if any two of these three characteristics are known, the third can be estimated.
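
The relationship among allele frequency, mutation rate, and fitness can be made concrete with a short calculation. The following Python sketch assumes the equilibrium relationship µ = sq with s = 1 − f for a dominant disorder; the function names are illustrative, and the allele frequency used in the example is a hypothetical value chosen only to show the arithmetic, not a measured frequency for achondroplasia.

```python
def selection_coefficient(fitness):
    """Coefficient of selection, s = 1 - f."""
    return 1.0 - fitness


def mutation_rate(fitness, allele_freq):
    """Mutation rate needed to replace alleles lost to selection: mu = s * q."""
    return selection_coefficient(fitness) * allele_freq


def equilibrium_allele_freq(fitness, mu):
    """Allele frequency at mutation-selection balance: q = mu / s."""
    s = selection_coefficient(fitness)
    if s == 0:
        raise ValueError("s = 0: no selection, so no mutation-selection balance")
    return mu / s


def fitness_from(mu, allele_freq):
    """Fitness implied by a mutation rate and allele frequency: f = 1 - mu / q."""
    return 1.0 - mu / allele_freq


# f = 0.20 is the achondroplasia fitness quoted in the text.
# q = 1e-5 is a HYPOTHETICAL allele frequency used only for illustration.
f = 0.20
q = 1e-5
mu = mutation_rate(f, q)
print("mutation rate:", mu)

# If treatment raised fitness to 0.60 (hypothetical) while mu stayed the same,
# the equilibrium allele frequency would rise.
print("new equilibrium q:", equilibrium_allele_freq(0.60, mu))
```

Under these assumptions, halving the coefficient of selection (from 0.80 to 0.40) doubles the equilibrium allele frequency, which is the increase in disease frequency predicted when the prognosis of a dominant disorder improves.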

Mutation and Selection Balance in X-Linked Recessive Mutations.

For those X-linked phenotypes of medical interest that are recessive, or nearly so, selection occurs in hemizygous males and not in heterozygous females, except for the small proportion of females who are manifesting heterozygotes with reduced fitness (see Chapter 7). In this brief discussion, however, we assume that heterozygous females have normal fitness.

Because males have one X chromosome and females two, the pool of X-linked alleles in the population is partitioned at any given time, with one third of mutant alleles present in males and two thirds in females. As we saw in the case of autosomal dominant mutations, mutant alleles lost through selection must be replaced by recurrent new mutations to maintain the observed disease incidence. If the incidence of a serious X-linked disease is not changing and selection is operating against (and only against) hemizygous males, the mutation rate, µ, must equal the coefficient of selection, s (i.e., the proportion of mutant alleles that are not passed on), times q, the allele frequency, adjusted by a factor of 3 because selection is operating only on the third of the mutant alleles in the population that are present in males at any time. Thus,

\[ \mu = \frac{sq}{3} \]

For an X-linked genetic lethal disease, s = 1, and one third of all copies of the mutant gene responsible are lost from each generation and must, in a stable equilibrium, be replaced by de novo mutations. Therefore, in such disorders, one third of all persons who have X-linked lethal disorders are predicted to carry a new mutation, and their genetically normal mothers have a low risk for having subsequent children with the same disorder (again, assuming the absence of germline mosaicism). The remaining two thirds of the mothers of individuals with an X-linked lethal disorder would be carriers, with a 50% risk for having another affected son. However, the prediction that two thirds of the mothers of individuals with an X-linked lethal disorder are carriers of a disease-causing mutation is based on the assumption that mutation rates in males and in females are equal. It can be shown that if the mutation rate in males is much greater than in females, then the chance of a new mutation in the egg is very low, and most of the mothers of affected children will be carriers, having inherited the mutation as a new mutation from their unaffected fathers and then passing it on to their affected children. The effect on genetic counseling of differences in the rate of disease-causing mutations in male and female gametes will be discussed in Chapter 16.
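
The one-third and two-thirds predictions follow directly from the equilibrium relationship µ = sq/3. The Python sketch below works through that arithmetic under the same assumptions made in the text: equal mutation rates in male and female gametes, normal fitness in heterozygous females, and no germline mosaicism. The function names are illustrative.

```python
def xlinked_mutation_rate(s, q):
    """Equilibrium mutation rate for an X-linked recessive allele: mu = s * q / 3."""
    return s * q / 3.0


def new_mutation_fraction(s):
    """Fraction of affected males whose mutation is new: mu / q = s / 3."""
    return s / 3.0


def carrier_mother_fraction(s):
    """Fraction of affected males whose mothers are carriers."""
    return 1.0 - new_mutation_fraction(s)


# Genetic lethal disorder: s = 1.
print("new mutations:", new_mutation_fraction(1.0))
print("carrier mothers:", carrier_mother_fraction(1.0))
```

For a genetic lethal (s = 1), the sketch returns one third for new mutations and two thirds for carrier mothers. If the mutation rate is higher in males than in females, the carrier fraction is larger than this calculation suggests.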

In less severe disorders, such as hemophilia A (Case 21), the proportion of affected individuals representing new mutations is less than one third (currently approximately 15%). Because the treatment of hemophilia is improving rapidly, the total frequency of mutant alleles can be expected to rise relatively rapidly and to reach a new equilibrium. Assuming (as seems reasonable) that the mutation rate at this locus stays the same over time, the proportion of hemophiliacs who result from a new mutation will decrease, but the overall incidence of the disease will increase. Such a change would have significant implications for genetic counseling for this disorder (see Chapter 16).

Genetic Drift

Chance events can have a much greater effect on allele frequencies in a small population than in a large one. For example, when a new mutation occurs in a small population, its frequency is represented by only one copy among all the copies of that gene in the population. Random effects of environment or other chance occurrences that are independent of the genotype (i.e., events that occur for reasons unrelated to whether an individual is carrying the mutant allele) can produce significant changes in the frequency of the disease allele when the population is small. Such chance occurrences disrupt Hardy-Weinberg equilibrium and cause the allele frequency to change from one generation to the next. This phenomenon, known as genetic drift, can explain how allele frequencies can change as a result of chance. During the next few generations, while the population remains small, there may be considerable fluctuation in allele frequency until allele frequencies come to a new equilibrium as the population increases in size. In contrast to gene flow (see next section), in which allele frequencies change because of the mixing of previously distinct populations, the mechanism of genetic drift is simply chance operating on a small population.
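
The dependence of drift on population size can be illustrated with a simple simulation. The Python sketch below uses random sampling of allele copies from one generation to the next (a Wright-Fisher style model, which is an assumption introduced here and is not described in the text), with no selection, mutation, or migration. The population sizes, starting frequency, and seed are arbitrary illustrative choices.

```python
import random


def wright_fisher(pop_size, start_freq, generations, rng):
    """Return the allele-frequency trajectory under chance sampling alone.

    Each generation, the 2 * pop_size allele copies of the next generation
    are drawn at random (with replacement) from the current gene pool.
    """
    n_copies = 2 * pop_size
    freq = start_freq
    trajectory = [freq]
    for _ in range(generations):
        mutant_copies = sum(1 for _ in range(n_copies) if rng.random() < freq)
        freq = mutant_copies / n_copies
        trajectory.append(freq)
    return trajectory


def mean_abs_change(pop_size, start_freq, replicates, rng):
    """Average one-generation change in allele frequency across replicates."""
    total = 0.0
    for _ in range(replicates):
        after = wright_fisher(pop_size, start_freq, 1, rng)[-1]
        total += abs(after - start_freq)
    return total / replicates


rng = random.Random(1)  # fixed seed so the run is repeatable
small = wright_fisher(pop_size=25, start_freq=0.10, generations=50, rng=rng)
large = wright_fisher(pop_size=2500, start_freq=0.10, generations=50, rng=rng)
print("small population:", [round(f, 2) for f in small[::10]])
print("large population:", [round(f, 2) for f in large[::10]])
```

In runs of this kind, the allele frequency in the small population typically wanders widely and may reach 0 (loss) or 1 (fixation), whereas the frequency in the large population stays close to its starting value.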

Founder Effect.

One special form of genetic drift is referred to as founder effect. When a small subpopulation breaks off from a larger population, the gene frequencies in the small population may be different from those of the population from which it originated because the new group contains a small, random sample of the parent group and, by chance, may not have the same gene frequencies as the parent group. If one of the original founders of a new group just happens to carry a relatively rare allele, that allele will have a far higher frequency than it had in the larger group from which the new group was derived.

Migration and Gene Flow

Migration can change allele frequency by the process of gene flow, defined as the slow diffusion of genes across a barrier. Gene flow usually involves a large population and a gradual change in gene frequencies. The genes of migrant populations with their own characteristic allele frequencies are gradually merged into the gene pool of the population into which they have migrated, a process referred to as genetic admixture. The term migration is used here in the broad sense of crossing a reproductive barrier, which may be racial, ethnic, or cultural; the barrier need not be geographical, and crossing it need not require physical movement from one region to another. Some examples of admixture reflect well-known and well-documented events in human history (e.g., the African diaspora from the 15th to the 19th century), whereas others can only be inferred from the genomic study of variation in ancient DNA samples (see Box).

Returning to the example of the 32-bp deletion allele of the CCR5 cytokine receptor gene, ΔCCR5, the frequency of this allele has been studied in many populations all over the world. The frequency of the ΔCCR5 allele is highest, up to 18%, in parts of northwestern Europe and then declines along a gradient into eastern and southern Europe, falling to a few percent in the Middle East and the Indian subcontinent. The ΔCCR5 allele is virtually absent from Africa and the Far East. The best interpretation of the current geographical distribution of the ΔCCR5 allele is that the mutation originated in northern Europe and then underwent both positive selection and gene flow over long distances (Fig. 9-1).

Ancient Migrations and Gene Flow

A fascinating example of gene flow during human prehistory comes from the sequencing of DNA samples obtained from the bones of three Neanderthals who died approximately 38,000 years ago in Europe. The most recent common ancestors of Neanderthals and Homo sapiens lived in Africa over 200,000 years ago, well before the migration of Neanderthals out of Africa to settle in Europe and the Middle East. An analysis of the sequence of Neanderthal DNA revealed that approximately 1% to 4% of the DNA of modern Europeans and Asians, but not of Africans, matches Neanderthal DNA. A variety of statistical techniques indicate that the introduction of Neanderthal DNA likely occurred approximately 50,000 years ago, well after the migration of modern humans out of Africa into Europe and beyond, which explains why traces of the Neanderthal genome are not present in modern Africans.

The analysis of individual Neanderthal genomes and their comparison to genomes of modern human populations promises to provide clues about characteristic differences between these groups, as well as about the frequency of possible disease genes or alleles that were more or less common in these ancient populations compared to different modern human populations.


FIGURE 9-1 The frequency of ΔCCR5 alleles in various geographical regions of Europe, the Middle East, and the Indian subcontinent. The various allele frequencies are shown with color coding provided on the right. Black dots indicate the locations where allele frequencies were sampled; the rest of the frequencies were then interpolated in the regions between where direct sampling was done. Gray areas are regions where there were insufficient data to estimate allele frequencies. See Sources & Acknowledgments.

Ethnic Differences in the Frequency of Various Genetic Diseases

The previous discussion of the Hardy-Weinberg law explained how, at equilibrium, genotype frequencies are determined by allele frequencies and remain stable from generation to generation, assuming the allele frequencies in a large, isolated, randomly mating population remain constant. However, there is a problem of interest to human geneticists that the Hardy-Weinberg law does not address: Why are allele frequencies different in different populations in the first place? In particular, for the medical geneticist, why are some mutant alleles that are clearly deleterious more common in certain population groups than in others? We address these issues in the rest of this chapter.

Differences in frequencies of alleles that cause genetic disease are of particular interest to the medical geneticist and genetic counselor because they cause different disease risks in specific population groups. Well-known examples include Tay-Sachs disease in people of Ashkenazi Jewish ancestry, sickle cell disease in African Americans, and hemolytic disease of the newborn and PKU in white populations (Table 9-5).


TABLE 9-5 Incidence, Gene Frequency, and Heterozygote Frequency for Selected Autosomal Disorders in Different Populations


The Rh System

One clinically important example of marked differences in allele frequencies is seen with the Rh blood group. The Rh blood group is very important clinically because of its role in hemolytic disease of the newborn and in transfusion incompatibilities. In simplest terms, the population is separated into Rh-positive individuals, who express, on their red blood cells, the antigen Rh D, a polypeptide encoded by the RHD gene, and Rh-negative individuals, who do not express this antigen. Being Rh-negative is therefore inherited as an autosomal recessive trait in which the Rh-negative phenotype occurs in individuals homozygous or compound heterozygous for nonfunctional alleles of the RHD gene. The frequency of Rh-negative individuals varies enormously in different ethnic groups (see Table 9-5).

Hemolytic Disease of the Newborn Caused by Rh Incompatibility

The chief significance of the Rh system is that Rh-negative persons can readily form anti-Rh antibodies after exposure to Rh-positive red blood cells. Normally, during pregnancy, small amounts of fetal blood cross the placental barrier and reach the maternal bloodstream. If the mother is Rh-negative and the fetus Rh-positive, the mother will form antibodies that return to the fetal circulation and damage the fetal red blood cells, causing hemolytic disease of the newborn with consequences that can be severe if not treated.

In pregnant Rh-negative women, the risk for immunization by Rh-positive fetal red blood cells can be minimized with an injection of Rh immune globulin at 28 to 32 weeks of gestation and again after pregnancy. Rh immune globulin serves to clear any Rh-positive fetal cells from the mother's circulation before she is sensitized. Rh immune globulin is also given after miscarriage, termination of pregnancy, or invasive procedures such as chorionic villus sampling or amniocentesis, in case any Rh-positive cells gained access to the mother's circulation. The discovery of the Rh system and its role in hemolytic disease of the newborn has been a major contribution of genetics to medicine. At one time ranking as the most common human genetic disease among individuals of European ancestry, hemolytic disease of the newborn caused by Rh incompatibility is now relatively rare, but only because obstetricians remain vigilant, identify at-risk patients, and routinely give them Rh immune globulin to prevent sensitization.

Ethnic Differences in Disease Frequencies

A number of factors discussed earlier in this chapter are thought to explain how differences in alleles and allele frequencies among ethnic groups develop. One is the lack of gene flow due to genetic isolation, so that a mutation in one group would not have an opportunity to be spread through matings to other groups. Other factors are genetic drift, including nonrandom distribution of alleles among the individuals who founded particular subpopulations (founder effect), and heterozygote advantage under environmental conditions that favor the reproductive fitness of carriers of deleterious mutations. Specific examples of these are illustrated in the next section. However, in many cases, we do not have a clear explanation for how these differences developed.

Founder Effect

One extreme example of a difference in the incidence of genetic disease among different ethnic groups is the high incidence of Huntington disease (Case 24) among the indigenous inhabitants around Lake Maracaibo, Venezuela, that resulted from the introduction of a Huntington disease mutation into this genetic isolate. There are numerous other examples of founder effect involving other disease alleles in genetic isolates throughout the world, such as the French-Canadian population of Canada, which has high frequencies of certain disorders that are rare elsewhere. For example, hereditary type I tyrosinemia is an autosomal recessive condition that causes hepatic failure and renal tubular dysfunction due to deficiency of fumarylacetoacetase, an enzyme in the degradative pathway of tyrosine. The disease frequency is 1 in 685 in the Saguenay–Lac-Saint-Jean region of Quebec, but only 1 in 100,000 in other populations. As predicted for a founder effect, 100% of the mutant alleles in the Saguenay–Lac-Saint-Jean patients are due to the same mutation.
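
The difference in disease incidence between the two populations translates into a large difference in carrier frequency. As a rough illustration, the Python sketch below applies the Hardy-Weinberg relationship (incidence = q², carrier frequency = 2pq) to the two incidence figures quoted above. It assumes random mating, which may hold only approximately in a genetic isolate, so the results are estimates rather than measured values.

```python
import math


def carrier_frequency(incidence):
    """Estimate allele and carrier frequencies for an autosomal recessive disorder.

    Assumes Hardy-Weinberg proportions: incidence = q**2, carriers = 2 * p * q.
    """
    q = math.sqrt(incidence)
    p = 1.0 - q
    return q, 2.0 * p * q


for label, incidence in [
    ("Saguenay-Lac-Saint-Jean", 1 / 685),
    ("other populations", 1 / 100_000),
]:
    q, carriers = carrier_frequency(incidence)
    print(f"{label}: q = {q:.4f}, carriers = {carriers:.4f} (about 1 in {1 / carriers:.0f})")
```

Under these assumptions, roughly 1 person in 14 in the Saguenay–Lac-Saint-Jean region would be a carrier, compared with roughly 1 in 160 in other populations.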

Thus one of the outcomes of the founder effect and genetic drift is that each population may be characterized by its own particular mutant alleles, as well as by an increase or decrease in specific diseases. The relative mobility of most present-day populations, in comparison with their ancestors of only a few generations ago, may reduce the effect of genetic drift in the future while increasing the effect of gene flow.

Positive Selection for Heterozygotes (Heterozygote Advantage)

Although certain mutant alleles may be deleterious in homozygotes, there may be environmental conditions in which heterozygotes for some diseases have increased fitness not only over homozygotes for the mutant allele but even over homozygotes for the normal allele. This situation is termed heterozygote advantage. Even a slight heterozygote advantage can lead to an increase in frequency of an allele that is severely detrimental in homozygotes, because heterozygotes greatly outnumber homozygotes in the population. A situation in which selective forces operate both to maintain a deleterious allele and to remove it from the gene pool is described as a balanced polymorphism.

Malaria and Hemoglobinopathies.

A well-known example of heterozygote advantage is resistance to malaria in heterozygotes for the mutation in sickle cell disease (Case 42). The sickle cell allele in the β-globin gene has reached its highest frequency in certain regions of West Africa, where heterozygotes are more fit than either type of homozygote because heterozygotes are relatively more resistant to the malarial organism. In regions where malaria is endemic, normal homozygotes are susceptible to malaria; many become infected and are severely, even fatally, affected, leading to reduced fitness. Sickle cell homozygotes are even more seriously disadvantaged, with a low relative fitness that approaches zero because of their severe hematological disease, discussed more fully in Chapter 11. Heterozygotes for sickle cell disease have red cells that are inhospitable to the malaria parasite but do not undergo sickling under normal environmental conditions; the heterozygotes are thus relatively more fit than homozygotes for the normal β-globin allele and reproduce at a higher rate. Thus, over time, the sickle cell mutant allele has reached a frequency as high as 0.15 in some areas of West Africa that are endemic for malaria, far higher than could be accounted for by recurrent mutation alone.
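
How a slight advantage in heterozygotes can hold a severely deleterious allele at a stable frequency can be sketched numerically. The Python example below iterates the standard one-locus selection recursion, a model that is an assumption introduced here rather than one given in the text. The relative fitness values are hypothetical and chosen only for illustration; they are not measured values for any population.

```python
def next_generation_freq(q, w_aa, w_as, w_ss):
    """Frequency of the S allele after one generation of selection.

    q is the current S allele frequency; w_aa, w_as, and w_ss are the
    relative fitnesses of the A/A, A/S, and S/S genotypes.
    """
    p = 1.0 - q
    mean_fitness = p * p * w_aa + 2 * p * q * w_as + q * q * w_ss
    return (p * q * w_as + q * q * w_ss) / mean_fitness


def iterate(q0, w_aa, w_as, w_ss, generations):
    """Apply selection repeatedly, starting from allele frequency q0."""
    q = q0
    for _ in range(generations):
        q = next_generation_freq(q, w_aa, w_as, w_ss)
    return q


# HYPOTHETICAL fitnesses: A/A reduced by malaria, A/S most fit, S/S near zero.
W_AA, W_AS, W_SS = 0.85, 1.00, 0.00

q_final = iterate(0.001, W_AA, W_AS, W_SS, 500)

# Predicted balance point for this model: s / (s + t),
# where s = 1 - W_AA and t = 1 - W_SS (with W_AS = 1).
q_predicted = (1 - W_AA) / ((1 - W_AA) + (1 - W_SS))
print(round(q_final, 4), round(q_predicted, 4))
```

With these hypothetical values, an allele that starts out rare rises and settles at a frequency of approximately 0.13, even though S/S homozygotes leave no offspring. When the A/A fitness is set to 1.0, so that heterozygotes have no advantage, the same recursion drives the allele frequency steadily downward.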

The heterozygote advantage in sickle cell disease demonstrates how violating one of the fundamental assumptions of Hardy-Weinberg equilibrium—that allele frequencies are not significantly altered by selection—causes the mathematical relationship between allele and genotype frequencies to diverge from what is expected under the Hardy-Weinberg law (see Box).

Balanced Selection and the Hardy-Weinberg Law

Consider two alleles at the β-globin gene, the normal A allele and the mutant S allele, which give rise to three genotypes: A/A (normal), A/S (heterozygous carriers), and S/S (sickle cell disease). In a study of 12,387 individuals from an adult West African population, the three genotypes were detected in the proportions 9365 A/A : 2993 A/S : 29 S/S.

By counting the A and S alleles in these three genotypes, one can determine the allele frequencies in this population to be p = 0.877 for the A allele and q = 0.123 for the S allele. Under Hardy-Weinberg equilibrium, the ratio of genotypes is determined by p² : 2pq : q² and should therefore be 9527 A/A : 2672 A/S : 188 S/S. In this West African population, the observed number of A/S individuals exceeds what was predicted assuming Hardy-Weinberg equilibrium, whereas the observed number of S/S homozygotes is far below what was predicted, reflecting balanced selection at this locus. This example of the sickle cell allele illustrates how the forces of selection, operating both negatively on the relatively rare S/S genotype and positively on the more common A/S genotype, cause a deviation from Hardy-Weinberg equilibrium in a population.
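
The allele counting and the Hardy-Weinberg expectations in this example can be reproduced with a few lines of Python. The sketch below uses the observed genotype counts given above; the function names are illustrative. Because it works with unrounded allele frequencies, the expected counts it prints differ by a few individuals from the figures quoted in the text, which are based on p and q rounded to three decimal places.

```python
def allele_frequencies(n_aa, n_as, n_ss):
    """Count alleles in the three genotypes and return (p, q)."""
    total_alleles = 2 * (n_aa + n_as + n_ss)
    p = (2 * n_aa + n_as) / total_alleles  # A allele
    q = (2 * n_ss + n_as) / total_alleles  # S allele
    return p, q


def expected_counts(p, q, n):
    """Expected genotype counts under Hardy-Weinberg: p^2, 2pq, and q^2 times n."""
    return p * p * n, 2 * p * q * n, q * q * n


observed = {"A/A": 9365, "A/S": 2993, "S/S": 29}
n = sum(observed.values())
p, q = allele_frequencies(observed["A/A"], observed["A/S"], observed["S/S"])
expected = dict(zip(observed, expected_counts(p, q, n)))

print(f"p = {p:.3f}, q = {q:.3f}")
for genotype in observed:
    print(genotype, "observed:", observed[genotype], "expected:", round(expected[genotype]))
```

The comparison shows the pattern described in the box: more A/S heterozygotes and far fewer S/S homozygotes than Hardy-Weinberg proportions predict.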

Change in selective pressures would be expected to lead to a rapid change in the relative frequency of the sickle cell allele. Today, in fact, major efforts are being made to eradicate the mosquito responsible for transmitting the disease in malarial areas; in addition, many sickle cell heterozygotes live in nonmalarial regions. There is evidence that in the African American population in the United States, the frequency of the sickle cell gene may already be falling from its high level in the original African population of several generations ago, although other factors, such as the admixture of alleles from non-African populations into the African American gene pool, may also be playing a role. Some other deleterious alleles, including those responsible for thalassemia (Case 44) and glucose-6-phosphate dehydrogenase deficiency (Case 19), are also thought to be maintained at their present high frequencies in certain populations because of the protection that they provide against malaria.

Balanced Selection in Other Infectious Diseases

The effects of balanced selection in malaria are also apparent in other infectious diseases. For example, many Africans and African Americans with the severe renal disease known as focal segmental glomerulosclerosis are homozygotes for certain variant alleles in the coding region of the APOL1 gene that encodes the apolipoprotein L1. Apolipoprotein L1 is a serum factor that kills the trypanosome parasite Trypanosoma brucei that causes trypanosomiasis (sleeping sickness). The same variants that increase the risk for severe kidney disease in homozygotes tenfold over the rest of the population protect heterozygotes carrying these variants against strains of trypanosomes (T. brucei rhodesiense) that have developed resistance to wild-type apolipoprotein L1. As a result, the frequency of heterozygous carriers for these variant alleles can be as high as approximately 45% in parts of Africa in which the rhodesiense trypanosomiasis is endemic.

Drift Versus Heterozygote Advantage

Determining whether drift or heterozygote advantage is responsible for the increased frequency of some deleterious alleles in certain populations is not always straightforward, because it involves the integration of modern genetic data and public health with the historical record of population movement and ancient diseases. The environmental selective pressure responsible for heterozygote advantage may have been operating in the past and not be identifiable in modern times. As seen in Figure 9-1, the northwest to southeast gradient in the frequency of the ΔCCR5 allele reflects major differences in the frequency of this allele in different ethnic groups. For example, the highest frequency of the ΔCCR5 allele, seen among Ashkenazi Jews, is 0.21, and it is nearly that high in Iceland and the British Isles. The moderate variation in allele frequencies across Europe is most consistent with genetic drift acting on a neutral polymorphism.

However, the overall elevation of allele frequencies in Europe (relative to non-European populations) is more suggestive of positive selection in response to some infectious agent. Although the current AIDS pandemic is too recent to have affected gene frequencies through selection, it is possible that a different selective factor (perhaps another infectious disease such as smallpox or bubonic plague) may have elevated the frequency of the ΔCCR5 allele in northern European populations during a period of intense selection many generations ago. Thus geneticists continue to debate whether genetic drift or heterozygote advantage (or both) adequately accounts for the unusually high frequencies that some deleterious alleles achieve in some populations.

Genetics and Ancestry

Ancestry Informative Markers

Although the approximately 20,000 coding genes and their location and order on the chromosomes are nearly identical in all humans, we saw in Chapter 4 that humans as a whole have tens of millions of different alleles, ranging from changes in single base pairs (SNPs) to large genomic variants (CNVs or indels) hundreds of kilobases in size, that underlie extensive polymorphism among individuals. Many of the alleles found in one population are found in all human populations, at similar frequencies around the globe.

However, following a period of explosive population growth, today's human species, with more than 7 billion members, is derived from much smaller subpopulations, which, until quite recently, existed as separate ethnic groups with different geographical origins and population histories that resulted in restricted mating between the subgroups. Different alleles arose from random mutation in humans who lived in these small isolated settlements; most of these would be expected to confer no selective advantage or disadvantage and are therefore selectively neutral. For the population geneticist and anthropologist, selectively neutral genetic markers provide a means of tracing human history. The interactions of genetic drift, selection due to environmental factors, and gene flow brought about by migration and intermarriage have different effects at loci around the genome: they may equalize allele frequencies throughout many subpopulations, may cause major differences in frequency between populations, or may cause certain alleles to be restricted to just one population.

Alleles that show large differences in allele frequency among populations originating in different parts of the world are referred to as ancestry informative markers (AIMs). Sets of AIMs have been identified whose frequencies differ among populations derived from widely separated geographical origins (e.g., European, African, Far East Asian, Middle Eastern, Native American, and Pacific Islanders). They are therefore useful as markers for charting human migration patterns, for documenting historical admixture between or among populations, and for determining the degree of genetic diversity among identifiable population subgroups. Studies of hundreds of thousands of AIMs from across the genome have been used to distinguish and determine the genome-wide relationships among many different population groups, including communities of Jews in Europe, Africa, Asia, and the Americas; dozens of distinct Native American populations from South America, North America, and Siberia; and many castes and tribal groups in India. Figure 9-2 illustrates this type of analysis to establish that Hispanics as a group are genetically very heterogeneous, with ancestors from many parts of the world.


FIGURE 9-2 Mixed ancestry of a group of Americans who self-identify as African American (AA), European American (EA), or Hispanic American (HA), as assessed using ancestry informative markers. Each vertical line represents one individual (totaling hundreds, as shown by the numbers), and subjects are displayed according to the predominant ancestry contribution to their genomes. Different colors indicate different geographical origins, as inferred from AIMs, as follows: Africa (blue), Europe (red), Middle East (purple), Central Asia (yellow), Far East Asia (cyan), Oceania (amber), and America (green). Most African Americans have genomes of predominantly African origin (blue), and most European Americans have genomes of predominantly European origin (red), although there is a range of ancestry contribution among different subjects. In contrast, Hispanic Americans are a more heterogeneous group, and most individuals have genomes with significant contributions from four or five different origins. See Sources & Acknowledgments.

Although there are millions of variants with different allele frequencies that can distinguish different population groups, genotyping as few as a few hundred to a thousand SNPs in an individual is sufficient to identify the likely proportion of his or her genome contributed by ancestors from these different continental populations and to infer, therefore, the likely geographical origin(s) of that individual's ancestors. For example, Figure 9-3 shows the results from several hundred individuals from Puerto Rico, whose individual genomes can be shown to consist of various proportions of African and European heritage, with much less Native American genetic heritage. Dozens of companies now offer ancestry testing services to consumers. Although there is disagreement about the scientific, medical, or anthropological value of the information for most individuals, the availability of ancestry testing has attracted widespread attention from those with interests in their family histories or diasporic heritage.


FIGURE 9-3 Ancestry contributions in an admixed Puerto Rican population. Three-dimensional display of the similarity of genomes from 192 Puerto Ricans to West African, European, and Native American control genomes, using a statistical measure known as principal components (PC) analysis. The PCs shown on the three axes correspond to groups of ancestry informative markers that distinguish the populations in question. The West African, European, and Native American genomes cluster in three distinct locations by PC analysis. The analysis demonstrates that the Puerto Rican genomes are heterogeneous; some individuals have genomes of predominantly European origin, and others have a much greater contribution from West Africa, whereas there is much less contribution from the Americas. See Sources & Acknowledgments.

Population Genetics and Race

Population genetics uses quantitative methods to explain why and how differences in the frequency of genetic diseases and the alleles responsible for them arose among different individuals and ethnic groups. What population genetics does not do, however, is provide a biological foundation for the concept of “race.”

In one sense, racial distinctions are both “real” and widely used (and misused); they are social constructs that can have a profound impact on the health of individuals experiencing racial categorization in their day-to-day lives. Physicians must pay attention to the social milieu their patients navigate, and the impact of racial categorization on the health and well-being of their patients must be taken into account if physicians are to understand and respond to patient needs (see Box).

Ancestry and Health

The significance of genetic ancestry for the practice of medicine reflects the role that allelic variants with different frequencies in different populations have on various clinically relevant functions. Although this area of study is still in its early stages, it is already clear that including assessment of genetic ancestry can provide useful information to improve prognostic predictions compared to those that depend only on self-declared racial or ethnic identity.

For example, when examined with panels of AIMs, the genomes of individuals who self-identify as African American contain DNA that ranges from less than 10% to more than 95% of African origin. For genetically determined traits influenced by ancestry, the effect of one or more particular single nucleotide polymorphisms on gene function will then depend on whether the responsible allele or alleles are of African or European origin, a determination of genetic origin that is distinct from one's self-identification as an African American.

As an illustration of this point, a study that used lung function to classify disease severity in a group of African American asthmatics showed that predictions of the overall degree of lung impairment in these patients were more accurate when genetic ancestry was considered, rather than relying solely on self-reported race. Disease status (i.e., lung function in the “normal” range or not) could be misclassified in up to 5% of patients when ancestry information was omitted.

In addition to its potential value in clinical management, AIMs testing can also be useful in research for identifying the particular genes and variants that are responsible for genetic diseases and other complex traits that differ markedly in incidence between different ethnic or geographical groups. Successful examples of this powerful approach are described in Chapter 10.

From the scientific point of view, however, race is a fiction. Racial categories are constructed using poorly defined criteria that subdivide humankind using physical appearance (i.e., skin color, hair texture, and facial structures) combined with social characteristics that have their origins in the geographical, historical, cultural, religious, and linguistic backgrounds of the community in which an individual was born and raised. Although some of these distinguishing characteristics have a basis in the differences in the alleles carried by individuals of different ancestry, others likely have little or no basis in genetics. Racial categorization has been widely used in the past in medicine as the basis for making a number of assumptions concerning an individual's genetic makeup. Knowing the frequencies of alleles of relevance to health and disease in different populations around the globe is valuable for alerting a physician to an increased likelihood for disease based on an individual's genetic ancestry. However, with the expansion of individualized genetic medicine, it is hoped that more and more of the variants that contribute to disease will be assessed directly rather than having ethnicity or “race” used as a surrogate for an accurate genotype.

General References

Li CC. First course in population genetics. Boxwood Press: Pacific Grove, CA; 1975.

Nielsen R, Slatkin M. An introduction to population genetics. Sinauer Associates, Inc.: Sunderland, MA; 2013.

References for Specific Topics

Behar DM, Yunusbayev B, Metspalu M, et al. The genome-wide structure of the Jewish people. Nature. 2010;466:238–242.

Corona E, Chen R, Sikora M, et al. Analysis of the genetic basis of disease in the context of worldwide human relationships and migration. PLoS Genet. 2013;9:e1003447.

Kumar R, Seibold MA, Aldrich MC, et al. Genetic ancestry in lung-function predictions. N Engl J Med. 2010;363:321–330.

Reich D, Patterson N, Campbell D, et al. Reconstructing Native American population history. Nature. 2012;488:370–374.

Reich D, Thangaraj K, Patterson N, et al. Reconstructing Indian population history. Nature. 2009;461:489–494.

Royal CD, Novembre J, Fullerton SM, et al. Inferring genetic ancestry: opportunities, challenges and implications. Am J Hum Genet. 2010;86:661–673.

Sankararaman S, Mallick S, Dannemann M, et al. The genomic landscape of Neanderthal ancestry in present-day humans. Nature. 2014;507:354–357.


1. A short tandem repeat DNA polymorphism consists of five different alleles, each with a frequency of 0.20. What proportion of individuals would be expected to be heterozygous at this locus? What if the five alleles have frequencies of 0.40, 0.30, 0.15, 0.10, and 0.05?

2. If the allele frequency for Rh-negative is 0.26 in a population, what fraction of first pregnancies would sensitize the mother (assume Hardy-Weinberg equilibrium)? If no prophylaxis were given, what fraction of second pregnancies would be at risk for hemolytic disease of the newborn due to Rh incompatibility?

3. In a population at equilibrium, three genotypes are present in the following proportions: A/A, 0.81; A/a, 0.18; a/a, 0.01.

a. What are the frequencies of A and a?

b. What will their frequencies be in the next generation?

c. What proportion of all matings in this population are A/a × A/a?

4. In a screening program to detect carriers of β-thalassemia in an Italian population, the carrier frequency was found to be approximately 4%. Calculate:

a. The frequency of the β-thalassemia allele (assuming that there is only one common β-thalassemia mutation in this population)

b. The proportion of matings in this population that could produce an affected child

c. The incidence of affected fetuses or newborns in this population

d. The incidence of β-thalassemia among the offspring of couples both found to be heterozygous

5. Which of the following populations is in Hardy-Weinberg equilibrium?

a. A/A, 0.70; A/a, 0.21; a/a, 0.09.

b. For the MN blood group polymorphism, with two codominant alleles, M and N: (i) M, 0.33; MN, 0.34; N, 0.33. (ii) 100% MN.

c. A/A, 0.32; A/a, 0.64; a/a, 0.04.

d. A/A, 0.64; A/a, 0.32; a/a, 0.04.

What explanations could you offer for the frequencies in those populations that are not in equilibrium?

6. You are consulted by a couple, Abby and Andrew, who tell you that Abby's sister Anna has Hurler syndrome (a mucopolysaccharidosis) and that they are concerned that they themselves might have a child with the same disorder. Hurler syndrome is an autosomal recessive condition with a population incidence of approximately 1 in 90,000 in your community.

a. If Abby and Andrew are not consanguineous, what is the risk that Abby and Andrew's first child will have Hurler syndrome?

b. If they are first cousins, what is the risk?

c. How would your answers to these questions differ if the disease in question were cystic fibrosis instead of Hurler syndrome?

7. In a certain population, each of three serious neuromuscular disorders—autosomal dominant facioscapulohumeral muscular dystrophy, autosomal recessive Friedreich ataxia, and X-linked Duchenne muscular dystrophy—has a population frequency of approximately 1 in 25,000.

a. What are the gene frequency and the heterozygote frequency for each of these?

b. Suppose that each one could be treated, so that selection against it is substantially reduced and affected individuals can have children. What would be the effect on the gene frequencies in each case? Why?

8. As discussed in this chapter, the autosomal recessive condition tyrosinemia type I has an observed incidence of 1 in 685 individuals in one population in the province of Quebec, but an incidence of approximately 1 in 100,000 elsewhere. What is the frequency of the mutant tyrosinemia allele in these two groups? Suggest two possible explanations for the difference in allele frequencies between the population in Quebec and populations elsewhere.
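
The calculations that these problems call for follow from the Hardy-Weinberg relationships, and a few of them can be sketched in Python as shown below. This is a minimal sketch rather than a set of worked answers: the function names are hypothetical, the numerical values are illustrative and are deliberately not taken from the problems above, and the tolerance used to judge equilibrium is an arbitrary assumption.

```python
from math import sqrt

def allele_frequencies(f_AA, f_Aa, f_aa):
    """Return (p, q) for alleles A and a from genotype proportions summing to 1."""
    p = f_AA + f_Aa / 2
    return p, 1 - p

def hardy_weinberg_expected(p):
    """Expected proportions of A/A, A/a, and a/a given the frequency p of A."""
    q = 1 - p
    return p * p, 2 * p * q, q * q

def is_in_equilibrium(f_AA, f_Aa, f_aa, tol=1e-3):
    """Compare observed genotype proportions with Hardy-Weinberg expectations.
    The tolerance is an arbitrary choice made for this illustration."""
    p, _ = allele_frequencies(f_AA, f_Aa, f_aa)
    expected = hardy_weinberg_expected(p)
    return all(abs(o - e) <= tol for o, e in zip((f_AA, f_Aa, f_aa), expected))

def expected_heterozygosity(freqs):
    """Proportion of heterozygotes at a multiallelic locus: 1 - sum of p_i^2."""
    return 1 - sum(f * f for f in freqs)

def recessive_allele_frequency(incidence):
    """Allele frequency q for an autosomal recessive disorder, assuming
    Hardy-Weinberg equilibrium so that incidence = q^2."""
    return sqrt(incidence)

def carrier_frequency(q):
    """Heterozygote (carrier) frequency 2pq, with p = 1 - q."""
    return 2 * q * (1 - q)

# Illustrative values only (not from the problems).
print(allele_frequencies(0.49, 0.42, 0.09))
print(is_in_equilibrium(0.49, 0.42, 0.09))
print(is_in_equilibrium(0.50, 0.20, 0.30))
print(expected_heterozygosity([0.25, 0.25, 0.25, 0.25]))
q = recessive_allele_frequency(1 / 10_000)
print(q, carrier_frequency(q))
```

The helpers cover only autosomal loci; X-linked disorders and the effects of consanguinity require the additional reasoning developed earlier in the chapter.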