A question about inheritance usually focuses on individuals or families. That is not surprising. Individuals are the object of medical concern. Families give us information about the genes they have inherited—and which relatives might, too. But we have already seen how we can learn important information by expanding our perspective to groups of families through pedigrees. Pedigree analysis tells us things about transmission that an individual family with a small number of children might not. New information surfaces. But even single families do not explain many important factors. For example, we cannot quantify the amount of a trait’s penetrance from just a single family or even a pedigree of several families. We may be able to find one, or even a few, examples of incomplete penetrance. But that does not tell us how often that event will happen in the population as a whole. Yet, it is on that population frequency that individual predictions depend. We must look at many families in which the trait is segregating. We must use “population thinking.”
Every population is highly diverse, but the genetic basis of diversity is not uniformly distributed. Examples are familiar, such as the higher frequency of sickle cell hemoglobin in those with ancestry in some northern African and Mediterranean areas or the X-linked glucose-6-phosphate dehydrogenase deficiency (G6PD, or favism) in those from areas like southern Italy. As our knowledge of genetic differences between one group and another improves, so does our ability to make predictions about individuals in those groups.
Population genetics gives us a quantitative perspective on variation, and new techniques are expanding the field. Analytical approaches like genomics, proteomics, and metabolomics are beginning to make their mark. In this chapter, we will explore some of the ways that knowing the genetic makeup of a population can provide valuable predictive tools, and we will see how these new approaches promise to change the future of genetic assessment. But first, history is also important. Let’s take a moment to consider the implications of population thinking in an ethical context. Our example will be a case in which, because of poor understanding of population genetics and even poorer humanitarian concern, authorities caused terrible pain for innocent people.
The story is about a woman who was ordered to undergo sterilization because she was labeled “feebleminded.” Carrie Buck (Figure 15-1) was born in 1906 and became pregnant when she was 17 after being raped by a presumed member of her foster family. Perhaps from embarrassment, her foster parents had her committed to the Virginia State Colony for Epileptics and Feebleminded, which took patients for being feebleminded or for displaying unmanageable behavior or promiscuity. Her daughter, Vivian, was born in March 1924. Carrie was the first person ordered to be sterilized under a new Virginia law as part of the state’s eugenics program. The case eventually ended in the United States Supreme Court. By an eight to one vote, the Virginia Sterilization Act of 1924 was found not to be in violation of the United States Constitution. There were various factors, including the contention that the Virginia Act was not a punishment and, since it was applied only to those living in a state institution, it did not deny them equal protection under the law. The Supreme Court Justice Oliver Wendell Holmes, Jr., wrote in the decision, “It is better for all the world, if instead of waiting to execute degenerate offspring for crime, or to let them starve for their imbecility, society can prevent those who are manifestly unfit from continuing their kind… . Three generations of imbeciles are enough.” Remember, Carrie Buck’s only “crime” was being raped by a relative.
Figure 15-1. Photograph of Carrie Buck, a young woman ordered to be sterilized under a state law, under the auspices of the state’s eugenics program. (Reprinted from Paul B. Popenoe, “The Progress of Eugenic Sterilization,” Journal of Heredity, 25:1 (1934), 23.)
Carrie was released soon after her sterilization and eventually married. Her sister, Doris, was also sterilized without her knowledge when hospitalized for appendicitis. She did not find out until many years later, after having tried unsuccessfully to have a child for decades. Carrie’s daughter, Vivian, was an average student but died of an intestinal disease when she was only eight. There is no evidence that Carrie, her daughter, or her sister were “feebleminded.”
This is a sad story of an individual’s mistreatment based on an erroneous understanding of the genetics of populations. When Justice Oliver Wendell Holmes declared, “Three generations of imbeciles are enough,” he implied that removal of affected individuals from the reproductive pool would have a quick, predictable, positive effect. Indeed, in the United States, as elsewhere, the eugenics movement of the late 19th and early 20th centuries had the goal of improving the genetic composition of our population. By 1935 more than 20,000 people had been forced to undergo “eugenic” sterilization and about 30 states had laws like the one in Virginia. In Hitler’s Germany, that doctrine had terrible consequences. Using the American model, about 375,000 people were sterilized just before the start of World War II.
But the scientific logic on which these acts were based was biologically unfounded. It was mathematically wrong. Allele frequencies do not change that quickly. The proof will come in the next section. Hopefully, we can all accept this as a lesson learned, if learned the hard way. Science and medicine can never be separated from bioethical considerations. Nor should they be. None of us are ever very far from ethical questions about how information is discovered, collected, and used. And it is increasingly true that informed, intelligent people are watching and care.
Part 1: Background and Systems Integration
Some important elements of genetic analysis can only be applied when we evaluate the population as a whole. Often this must be done theoretically—at least in part. What we say about a population will seldom allow us to make a concrete prediction about a specific future individual in that population. Instead, it is an argument built on probabilities. But powerful molecular and genetic tools are beginning to help us better understand individual patients and their families. In this section, we will introduce the quantitative approach of these tools.
Allele Frequencies in a Gene Pool
In Chapter 6 on patterns of Mendelian transmission, we used the Punnett square to help visualize the events that can occur at fertilization (Figure 6-4). The Punnett square combines two independent probabilities, i.e., the genetic makeups of an egg nucleus and of a sperm. If the cross is between two heterozygotes (A a), for example, then the probability of a gamete carrying, say, the A allele is ½, and the probability of the offspring inheriting the A allele from both parents and being A A is the product of the individual probabilities, ½ × ½ = ¼.
The same approach can be used to predict probabilities of each genotype in a population, with one minor generalization of the Mendelian cross assumptions. In a Mendelian cross, a heterozygous individual (A a) will have a ½ chance of producing a gamete with the A allele and ½ chance of the a allele. Thus, p = q = ½. But in a population, our thinking must expand from looking at the outcomes of a genotype, and instead consider the events that occur in a gene pool. The gene pool is a theoretical concept that represents all of the alleles in all of the individuals in the population. In a population, then, we can let p represent the proportion of all A alleles and q the proportion of all a alleles in the gene pool. If these are the only alleles for a given gene, then
p + q = 1
Although we can expand the algebra to account for more than two alleles (e.g., p + q + r = 1), in most cases that is not necessary. Note that, if we know one of these frequencies, say the frequency of a recessive allele a = 0.21 = q, then we can directly calculate the frequency of the other, since p = 1 − q and, in this example, p = 1 − 0.21 = 0.79. We will see the application of this idea as we explore the ways it is used to model genotype predictions.
The Hardy-Weinberg Equilibrium
From the allele frequencies in a population, it is an easy step to predicting the proportions of each genotype, assuming that nothing except normal meiosis and random fertilization is at work. We will use a simple application of the rule of multiplying independent probabilities (Figure 15-2). The assumption is that gametes with a given allele will combine as a function of the frequency of that allele in the gene pool. If, say, 10% of the alleles are R, then the likelihood that two R alleles will combine at fertilization to produce an R R genotype is 0.1 × 0.1 = 0.01.
Figure 15-2. The Hardy-Weinberg equilibrium is a simple derivation from the familiar Punnett square, which summarizes all outcomes and their proportions. If we let p be the frequency of the dominant R allele and q be the frequency of the recessive r allele, random associations among gametes in the reproducing gene pool will yield the three genotypes in the proportion p² + 2pq + q².
Often the application of this rule works in the reverse. If we know the frequency of a rare recessive condition in the population (q²), the square root of that frequency will equal the allele frequency (q). For example, let’s say that a recessive condition is found in one child out of 2,500 births, a frequency of 0.0004 in the population. This is q², so
q = 0.02 and
p = 1 − 0.02 = 0.98
Most often the frequency of interest is that for heterozygous carriers, 2pq, since they are phenotypically normal but have a chance of passing the allele to their offspring. In this case, for this example, the frequency of heterozygotes in the population will be
2pq = 2 (0.98) (0.02) = 0.0392
or about one child in 25 will be heterozygous.
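The arithmetic in this example can be checked with a short Python sketch. The function names are illustrative only and are not part of any standard library. The sketch assumes a single gene with two alleles and that all Hardy-Weinberg assumptions hold.

```python
import math

def allele_freqs_from_recessive_incidence(incidence):
    """Return (p, q) given the frequency of the recessive phenotype, q^2."""
    q = math.sqrt(incidence)   # q is the square root of q^2
    p = 1.0 - q                # because p + q = 1
    return p, q

def genotype_freqs(p):
    """Return the Hardy-Weinberg proportions (p^2, 2pq, q^2)."""
    q = 1.0 - p
    return p * p, 2 * p * q, q * q

# One affected child in 2,500 births, as in the example above
p, q = allele_freqs_from_recessive_incidence(1 / 2500)
homozygous_dominant, carriers, affected = genotype_freqs(p)

print(f"q = {q:.4f}, p = {p:.4f}")
print(f"carrier frequency 2pq = {carriers:.4f} (about 1 in {1 / carriers:.1f})")
```

The same two functions can be reused for any recessive condition whose incidence is known, provided the population is reasonably close to Hardy-Weinberg expectations.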
Hardy-Weinberg is a simple, but very powerful, predictive tool. The heterozygotes are often hidden among the homozygous dominant members of the population. But their frequency can be estimated if we know how many in the population show the recessive trait. While it is true that advances in biochemical techniques can detect heterozygotes for some important traits, these are not often employed in routine assessments of a family. To sort out these associations, the Hardy-Weinberg relationship is very useful. But it only holds if key assumptions are true.
Hardy-Weinberg Assumptions: A Null Hypothesis for Population Genetics
The Hardy-Weinberg relationship is the null hypothesis of population genetics. By “null hypothesis” we mean the predictions hold if nothing is acting to change the basic process of passing alleles to offspring. It only assumes that meiosis and fertilization are normal and random. But this model has some additional dimensions. An obvious underlying assumption is that the species reproduces sexually. One cannot, therefore, apply these ideas to an asexually reproducing human pathogen such as a bacterium. Another is that individuals are diploid. Some of the consequences of this discussed earlier, such as linkage disequilibrium (Chapter 11), are also relevant when considering the composition of a gene pool. In this chapter we will primarily look at examples involving simple diploid inheritance. Here, of course, there is an important exception: X-linked genes in males. In fact, it is the exceptions to this and other assumptions that make the study of population genetics such a fascinating and complex subject. The general impacts of exceptions are summarized in Table 15-1 and are described more formally in the next several sections. But we will only develop the formal mathematics in a couple of examples.
Table 15-1. Consequences for Exceptions to the Hardy-Weinberg Equilibrium Assumptions
Effects of Migration
Migration is a good example to show how exceptions can change expected gene frequencies. In this case, they may not change very much. But the point is that the effect of migration can be quantified with a relatively simple mathematical model. You might think that something like migration only relates to animals and plants. But migration among human populations with their regional and ethnic genetic histories will act the same way.
Assume two populations differ in the frequency of an allele (Figure 15-3). In this example, the frequency of the A allele (p = 0.8) is higher in the recipient population (r) than in the group providing migrants, where p = 0.4. The effect of this difference in allele frequency will be a function of how many migrants (m) move to the recipient population. All of the other alleles in the recipient population (1 − m) represent the nonmigrants that stayed in the original population. To see the effect of migration on allele frequency in the next generation, it is traditional to focus on the recessive allele frequency, with q denoting the frequency of the a allele (q = 0.2 in this recipient population and q = 0.6 in this donor population) and q′ its frequency after one generation.
Figure 15-3. An example of migration between two partially separated gene pools, in which the recipient population (r) receives a proportion of migrants (m) from a population having a different frequency of the two alleles.
The frequency of the a allele in the recipient population, r, after migration has occurred will be symbolized qr′. This new frequency is a function of the proportion of alleles that remain in the recipient population and their frequency, plus the proportion of migrant alleles and their frequency in the donor population:
qr′ = qr (1 − m) + qm (m) = qr − m qr + m qm
If we let Δq (read “delta q”) represent the change in allele frequency for this one generation of migration (i.e., qr′ − qr is the difference in frequency of the new generation minus the previous generation), we can substitute into the formula and derive the generalized prediction of this migration effect.
Δq = qr′ − qr
= qr − m qr + m qm − qr
The qr terms cancel out, since qr − qr = 0. If we factor migration (m) out of the remaining terms and rearrange them to subtract, this reduces to:
Δq = qr − m qr + m qm − qr
= − m qr + m qm
= m (qm − qr)
In other words, the effect of migration is a function of two things: how often does migration occur (m) and what is the difference in allele frequency between the two populations? This is logical. It also shows how relatively straightforward and precise the mathematical relation describing important population dynamics can be. We will not develop the formulae for other exceptions to the Hardy-Weinberg assumptions, except for the important case of selection. Instead, we will simply describe the consequences such processes can have. But all can be formalized in a manner like this.
To complete this specific example, we only need to substitute allele frequencies. For this demonstration, we will let the proportion of migrants, m, be high (m = 0.10). The new slightly elevated frequency of a in the recipient population is the original frequency plus the change introduced by the migrants:
qr′ = qr + Δq = 0.20 + 0.10 (0.60 − 0.20) = 0.24
Similarly, the frequency of the dominant allele in the recipient population will decrease from 0.80 to 0.76.
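As a quick numerical check of the migration result, the sketch below computes the change both ways: from the general formula Δq = m (qm − qr) and directly from the weighted average of resident and migrant alleles. The function and variable names are illustrative assumptions, and only a single generation of one-way migration is modeled.

```python
def migration_delta_q(q_recipient, q_migrant, m):
    """Change in q after one generation of migration: m * (qm - qr)."""
    return m * (q_migrant - q_recipient)

def q_after_migration(q_recipient, q_migrant, m):
    """New q as a weighted average of resident (1 - m) and migrant (m) alleles."""
    return q_recipient * (1 - m) + q_migrant * m

# Values from the worked example: qr = 0.20, qm = 0.60, m = 0.10
q_r, q_m, m = 0.20, 0.60, 0.10

delta_q = migration_delta_q(q_r, q_m, m)
q_new = q_after_migration(q_r, q_m, m)

print(f"delta q = {delta_q:.2f}")
print(f"new q   = {q_new:.2f}")
print(f"new p   = {1 - q_new:.2f}")
```

Both routes give the same new frequency, which is simply a restatement of the algebra above.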
Effects of Mutation
Mutation turns one allele into another. Mutation rates vary from one gene to another based on factors like the size of the gene, i.e., the number of nucleotides that can change. So-called back mutation can revert a mutant back to the original allele. But back mutation is much less common than the typical forward mutation. This makes sense if you consider that there are many points along a gene that can alter its function if changed (i.e., forward mutation). But once changed, there are many fewer ways in which that change can be corrected back to normal.
Mutation rate is a “population” measurement. It applies to individuals, but it occurs with a rate that can only be measured from a group and that can vary due to environmental and other conditions. Even in cases like chemical- or radiation-induced mutation or the insertion of transposable elements into DNA, mutation rates for a gene must be estimated from their population frequencies. But approximations based on cumulative mutation data are generally sufficient for clinical assessments.
Many processes can mask or change a trait’s phenotypic expression. That can make it hard to distinguish back mutation from processes like variable expression or incomplete penetrance unless you have DNA sequence data or understand the gene’s mechanism of phenotype expression. While back mutation is an uncommon finding in typical medical situations with current assessment techniques, it may become a complicating factor to keep in mind when evaluating DNA sequences as more detailed genetic diagnostic tools become available in the future.
Population Size and Nonrandom Mating
As the size of a population gets smaller, the probability of inbreeding increases. This can have at least two important consequences. First, it can increase the likelihood that a recessive allele is inherited by both parents from some common ancestor. In that case a rare recessive trait is more likely to be expressed. For example, in the United States about 20% of the instances of recessive albinism and xeroderma pigmentosum and 30% to 40% of the occurrences of Tay-Sachs disease are found in families with parents who are first cousins. The effect increases with the rarity of the recessive deleterious allele. That is because a rare allele will typically be found only as an occasional heterozygote in the population. But homozygous offspring will increase when related heterozygotes mate.
A second outcome of reduced population size is that variation due to random sampling becomes more significant. This results in genetic drift (Figure 15-4), which is the random, unpredictable change in allele frequencies from one generation to the next. Drifting to complete homozygosity can occur in especially small populations, such as those that pass through a bottleneck caused by a severe reduction in population size (Figure 15-5). Earlier, we saw a similar kind of reduction in genetic diversity when we discussed the transmission of mitochondrial mutants during cytoplasmic inheritance (Chapter 13). There a bottleneck in mitochondrial number during oogenesis can significantly change the proportions of normal and affected cells carrying an mtDNA mutation.
Figure 15-4. Results from a computer simulation of random sampling of alleles over time. When the number of individuals (N) is only 20, random drift changes allele frequencies and can lead to fixation of one or the other allele. When N is large, e.g., 1000, the effect of random sampling does not change the composition of the gene pool beyond minor sampling variation. (Reprinted with permission from Brooker RJ: Genetics: Analysis & Principles, 4th ed. New York: McGraw-Hill, 2012.)
Figure 15-5. “Bottleneck effect.” A decrease in genetic diversity can occur even to the point of complete homozygosity. This may occur due to severe reductions in population size. (a) Hourglass representation. (b) Cheetahs. (a: Reprinted with permission from Brooker RJ: Genetics: Analysis & Principles, 4th ed. New York: McGraw-Hill, 2012. b: Photo by Gary M. Stolz, U.S. Fish and Wildlife Service, via Wikimedia Commons.)
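The effect of population size on random sampling can be explored with a small simulation. This is a minimal sketch, not the method behind Figure 15-4: it assumes a Wright-Fisher style model in which each generation is formed by drawing 2N alleles at random from the previous gene pool, with no selection, mutation, or migration. The function name, seed, and number of generations are arbitrary choices for illustration.

```python
import random

def simulate_drift(n_individuals, p0, generations, rng):
    """Track the frequency of one allele under random sampling alone.

    Assumed model: each generation, 2N alleles are drawn at random
    from the current gene pool (no selection, mutation, or migration).
    """
    n_alleles = 2 * n_individuals
    p = p0
    trajectory = [p]
    for _generation in range(generations):
        # Count how many of the 2N sampled alleles are the tracked allele
        copies = sum(1 for _draw in range(n_alleles) if rng.random() < p)
        p = copies / n_alleles
        trajectory.append(p)
    return trajectory

rng = random.Random(15)  # arbitrary seed so the run is repeatable

small = simulate_drift(20, 0.5, 50, rng)     # N = 20
large = simulate_drift(1000, 0.5, 50, rng)   # N = 1000

print("N = 20:   final frequency =", small[-1])
print("N = 1000: final frequency =", large[-1])
```

Running the sketch with several different seeds should show the small population wandering widely, sometimes to fixation at 0 or 1, while the large population stays near its starting frequency. Once an allele is fixed or lost in this model, it stays that way, because there is no mutation or migration to reintroduce variation.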
Deviations from random mating, such as choosing a mate based on some preferred trait, can also affect genotype frequencies. In small populations like religious enclaves or geographically isolated groups, there is often little difference between nonrandom mating due to population size and nonrandom mating due to behavioral factors like mate choice. But in most large populations, nonrandom mating like positive assortative mating (mates choosing each other because of similarity in a character) can be important. Instead of changing allele frequencies, assortative mating only changes genotype frequencies. Specifically, it increases the frequencies of both homozygotes.
The effect in an extreme case of inbreeding, complete self-fertilization, is shown in Figure 15-6. The proportion of heterozygotes is halved each generation. In less extreme cases the outcome is similar, but it occurs more slowly. Inbreeding increases homozygosity without changing allele frequencies.
Figure 15-6. The extreme example of inbreeding (self-fertilization). Note that the proportion of heterozygotes is halved with each generation.
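The halving of heterozygotes under complete self-fertilization can be followed generation by generation. In the sketch below, the starting population (all heterozygotes) and the function name are assumptions chosen for illustration. It relies only on the Mendelian result that a selfed heterozygote yields ¼ A A, ½ A a, and ¼ a a offspring, while homozygotes breed true.

```python
def selfing_genotypes(het0, generations, hom_dom0=0.0, hom_rec0=0.0):
    """Genotype frequencies (AA, Aa, aa) under complete self-fertilization."""
    hom_dom, het, hom_rec = hom_dom0, het0, hom_rec0
    history = [(hom_dom, het, hom_rec)]
    for _generation in range(generations):
        # Selfed heterozygotes give 1/4 AA, 1/2 Aa, 1/4 aa; homozygotes breed true
        hom_dom += het / 4
        hom_rec += het / 4
        het = het / 2
        history.append((hom_dom, het, hom_rec))
    return history

history = selfing_genotypes(het0=1.0, generations=5)

for generation, (hom_dom, het, hom_rec) in enumerate(history):
    p = hom_dom + het / 2   # frequency of the A allele
    print(f"gen {generation}: AA={hom_dom:.4f} Aa={het:.4f} aa={hom_rec:.4f} p={p:.2f}")
```

The printed allele frequency p stays the same in every generation, which illustrates the point that inbreeding redistributes alleles among genotypes without changing their frequencies.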
Consequences of Selection
Factors like inbreeding, mutation, and geographic origin may be more important than selection in their influence on many human population events. But selection is still a highly visible population phenomenon, at least theoretically. The Hardy-Weinberg equilibrium assumes no selection. Another way to say this is that each genotype will contribute equivalently to the next generation. Alleles are passed to the next generation as a function of the genotype’s frequency, not because it is better favored or more successful than another. But if one trait has an advantage over another, it will make a larger contribution to the next generation. There has been selection for the favored trait, and the alleles that produce it will increase in the next generation’s gene pool.
We can illustrate this idea by modeling the case of a recessive lethal genotype. If the homozygous recessive individuals die before reproduction, the Hardy-Weinberg assumption of “no selection” fails to hold. In Figure 6-4, we introduced the Punnett square to show how alleles combine by the product rule to yield all genotypic outcomes in their Mendelian proportions. In Figure 15-2, we relaxed the assumption that p = q = ½. That let us use the Punnett square to demonstrate the foundations of the Hardy-Weinberg equilibrium. Now we will make one more change: the homozygous r r die (Figure 15-7). There is selection in favor of the R allele, carried both in the heterozygotes and in homozygous R R individuals.
Figure 15-7. Punnett square demonstrating the case of a homozygous (r r) lethal condition.
As in earlier examples, the effect of selection can be modeled quite easily. The proportion of the three genotypes can initially be predicted from Hardy-Weinberg expectations. For this example, we will let the frequency of the normal dominant allele be 0.9 and the frequency of the recessive lethal allele be 0.1. In practice, the frequency of a lethal allele would not be that high. But these beginning frequencies will let us see the effect of selection easily. If p = 0.9, then p² = 0.81, and so on.
Genotype:    R R          R r           r r
Frequency:   p² = 0.81    2pq = 0.18    q² = 0.01
Now, if all r r individuals die or at least fail to reproduce, then the frequency of heterozygotes among the surviving parents (the only ones still carrying the r allele and that can transmit it to the next generation) will be:
Frequency of R r = 2pq/(p² + 2pq)
The new frequency of the r allele (q′) is, therefore:
q′ = pq/(p² + 2pq)
Note that we use pq here instead of 2pq, since only half of the alleles inherited from a heterozygote (with a frequency of 2pq, see above) will be the recessive allele. Now, if we cancel p from both the numerator and denominator, this reduces to:
q′ = q/(p + 2q)
But, since p + q = 1, it follows that p = 1 − q. We can substitute this value for p, so the change in frequency is expressed only in terms of the recessive allele, q.
q′ = q/(1 − q + 2q) = q/(1 + q)
Again, this is a simple relationship that shows the predicted change in frequency of the lethal recessive allele. Substituting allele frequencies from the problem we began with, the new proportion q′ is 0.1/1.1 = 0.0909, and the expected proportion of homozygous recessive individuals in the next generation of a randomly mating population will be q′², or 0.0083. The power of selection will rapidly decline as the frequency of the lethal allele gets smaller.
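Applying q′ = q/(1 + q) repeatedly shows how the allele frequency changes over several generations. The following sketch is illustrative only: the function names are assumptions, and it models complete selection against the homozygous recessive genotype in an otherwise randomly mating population.

```python
def next_q_recessive_lethal(q):
    """One generation of complete selection against r r: q' = q / (1 + q)."""
    return q / (1 + q)

def q_after(q0, generations):
    """Apply the one-generation rule repeatedly, starting from q0."""
    q = q0
    for _generation in range(generations):
        q = next_q_recessive_lethal(q)
    return q

q0 = 0.1  # starting frequency of the lethal allele, as in the example

for generation in range(6):
    q = q_after(q0, generation)
    print(f"gen {generation}: q = {q:.4f}, q^2 = {q * q:.4f}")
```

Generation 1 reproduces the values in the text (q′ ≈ 0.0909 and q′² ≈ 0.0083). Each later step removes a smaller amount of the allele than the one before, because fewer and fewer copies are exposed in homozygotes.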
This is a demonstration of straightforward directional selection. The population is changed in a beneficial direction, such as increasing the frequency of an allele that gives DDT resistance to a treated agricultural pest (Figure 15-8). But other kinds of selection can also occur and may, in fact, be more common.
Figure 15-8. Directional selection demonstrated in survivors after DDT exposure. Note the increase in the percent of survivors with each passing generation. (Reprinted with permission from Brooker RJ: Genetics: Analysis & Principles, 4th ed. New York: McGraw-Hill, 2012.)
Not all selection acts for or against a specific allele. In fact, most traits are well fitted for the survival of the individual and the population. This means that retaining the norm is good. This kind of selection favoring the normal or population average is called stabilizing selection (Figure 15-9). It reduces the variation for a character, since there is selection against survival of both extremes. A classical example is birth weight in newborn babies, in which there is reduced survival of newborns who are significantly below or above the population average birth weight.
Figure 15-9. Stabilizing selection—favoring the normal or population average in which the specific trait is advantageous. Note the decreased diversity in the population after this type of selection. (Reprinted with permission from Brooker RJ: Genetics: Analysis & Principles, 4th ed. New York: McGraw-Hill, 2012.)
There is a slightly more complex, but still easily understood, kind of selection that can act on traits. Diversifying selection occurs when more than one trait is favored in a population because the habitat conditions are diverse or variable (Figure 15-10). The color changes seen in the industrial melanism of moths in England are a visually concrete example. Although some discussions of this classic case describe it as directional selection, they are simply focusing attention on what occurred in a single habitat. But it is more informative to consider the different selection pressures found at different locations within the whole distribution of the species in England. Not all portions of its range were exposed to the same environmental change.
Figure 15-10. Diversifying selection occurs when more than one trait is favored in a population because the habitat conditions are diverse or variable. The bottom graph demonstrates two prevalent phenotypes after selection has occurred. (Reprinted with permission from Brooker RJ: Genetics: Analysis & Principles, 4th ed. New York: McGraw-Hill, 2012.)
Many animals evolved melanism in response to the environmental effects of the British industrial revolution in the mid- to late-1800s. For scalloped hazel moths, the light speckled form was the most common until coal-fired factories began to dump large amounts of heavy-metal-laden smoke into the atmosphere. In the humid British climate, this rained down as heavy metal pollution and killed the light-colored lichens on tree trunks in the industrialized part of the country. The transition from light to melanic forms can, in fact, be documented quite accurately by looking at the moth and butterfly collections kept by wealthy landowners throughout the 19th century. A melanic mutant scalloped hazel moth was a rare prize until the industrial revolution.
Consider two contrasting habitats: tree bark covered with light-colored lichens in rural woodlands and the dark bark remaining after sensitive lichens had been killed by industrial pollution. Birds prey visually on resting moths. In clean rural forests, predators can easily see a dark moth against the lighter lichen-covered tree trunk. The spotted form is hidden and has a higher survival rate. The opposite is found in the industrialized midlands, where the loss of lichens left dark bark backgrounds exposed. Melanic moths blended better and avoided predation. Across the species range, therefore, the frequencies of the two forms differed as a function of the pollution conditions affecting lichen survival and, thus, on the differing colored backgrounds against which the moths rested and the birds hunted. No single trait was uniformly favored. It depended on environmental conditions. This, in fact, was one of the first well-documented examples of a change in allele frequency due to selection in a natural population. Many similar examples are now known. Interestingly, the light form of the scalloped hazel moth is increasing in frequency now that better air quality standards are in place and lichens are returning.
Selection is one of those genetic phenomena we are aware of intellectually. It has, of course, given us a wide range of pet dog and cat breeds. But we also tend to take it for granted as relatively unimportant to our own species. Are humans affected by directional selection, stabilizing selection, or diversifying selection? How common are they? For us, it is hard to say. The effects of population phenomena are often measured over decades or centuries. On a more manageable scale, however, these processes have a major impact on variables like pathogen virulence. There is, in fact, a whole medical specialty devoted to exploring “evolutionary medicine.” Our environment is both naturally variable each season and changing over time globally. We probably override much of the natural selection pressure on us by controlling our habitat (e.g., living in climate-controlled houses) and by promoting good health care. But as a natural process, selection is worth keeping in mind. It certainly affects the animal and plant species—and the pathogens—around us.
Diploidy and the Special Case of Sex-linked Genes
In its common form, the Hardy-Weinberg model assumes that genotypes are diploid. But there is an important exception that we discussed earlier, sex-linked genes in males. This exception actually has a mathematically simple result. Since males have only one X chromosome, the frequency of a sex-linked condition in males must equal its frequency in the gene pool (q). Its frequency in females would be the square, q². Thus, sex-linked recessive conditions are expressed much more often in males than in females. To be specific, the frequency in males (q) is the square root of the frequency in females (q²).
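A short calculation makes the difference between the sexes concrete. In this sketch the allele frequency q = 0.08 is a hypothetical value chosen only for illustration, and the function name is likewise an assumption. The carrier frequency in females uses the ordinary Hardy-Weinberg term 2pq, which assumes random mating and equal allele frequencies in both sexes.

```python
def x_linked_recessive_frequencies(q):
    """Expected frequencies for an X-linked recessive allele of frequency q."""
    p = 1 - q
    return {
        "affected_males": q,           # one X chromosome: phenotype frequency equals q
        "affected_females": q * q,     # two X chromosomes: homozygotes, q^2
        "carrier_females": 2 * p * q,  # heterozygous females, 2pq
    }

freqs = x_linked_recessive_frequencies(0.08)  # hypothetical allele frequency

for label, value in freqs.items():
    print(f"{label}: {value:.4f}")

ratio = freqs["affected_males"] / freqs["affected_females"]
print(f"affected males per affected female: {ratio:.1f}")
```

The ratio of affected males to affected females is q/q² = 1/q, so the rarer the allele, the more strongly the condition is concentrated in males.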
Revisiting the Case of Carrie Buck
In the introduction to this chapter, we presented the case of Carrie Buck, one of those sterilized in an attempt to reduce the frequency of so-called “feeblemindedness” in the population. But now applying our knowledge of allele frequencies and selection, we can make the lesson even more concrete. In the hereditary sense, sterilization is biologically equivalent to death. No offspring are produced by that genotype. Consider the analysis we did earlier for selection against a recessive lethal allele. As the frequency of the recessive, q, declines, most alleles will be carried in heterozygotes, 2pq. The proportion of homozygous recessive individuals, q², will become vanishingly small and selection pressure fades away (Figure 15-11). But large numbers of the targeted allele still remain for a long time in the gene pool.
Figure 15-11. Graph demonstrating the change over time in the frequency of a recessive allele when the homozygous individuals are “genetic lethal.” The proportion of homozygous recessive individuals, q², becomes vanishingly small, and selection pressure eventually fades away. However, a significant number of copies of the targeted allele still remain for an extended time in the population gene pool.
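How slow this decline is can be estimated from the recessive lethal model. Because q′ = q/(1 + q) implies 1/q′ = 1/q + 1, repeating the step n times gives q after n generations as q0/(1 + n q0), and halving the allele frequency therefore takes 1/q0 generations. The sketch below checks that shortcut against direct iteration. The starting frequencies are hypothetical, the function names are illustrative, and the model treats the targeted trait as a single recessive genotype under complete selection, which is the simplification used in this section.

```python
def q_after_generations(q0, n):
    """Closed form for n generations of complete selection against r r."""
    return q0 / (1 + n * q0)

def generations_to_halve(q0):
    """Solve q0 / (1 + n * q0) = q0 / 2 for n, which gives n = 1 / q0."""
    return 1 / q0

def iterate_selection(q0, n):
    """Direct repetition of q' = q / (1 + q), used to check the closed form."""
    q = q0
    for _generation in range(n):
        q = q / (1 + q)
    return q

for q0 in (0.1, 0.01, 0.001):  # hypothetical starting allele frequencies
    n = round(generations_to_halve(q0))
    print(f"q0 = {q0}: about {n} generations to reach q = {iterate_selection(q0, n):.5f}")
```

Under these assumptions, an allele at a frequency of 0.01 needs on the order of 100 generations of complete selection just to fall to 0.005, which is why removing only the homozygotes cannot produce a quick change in the gene pool.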
For feeblemindedness, the anticipated outcome of the eugenics movement was fundamentally flawed. While this is a mistake we hope will never happen again, we must remember that population genetics, indeed every aspect of genetics, is not just an intellectual exercise. It can have real consequences for real people.
Part 2: Medical Genetics
In this section, we will use the population perspective to explore some of its medical applications. But we will also look into the future. Advances in biotechnology are rapidly adding tools that change the way we can respond to genetic conditions. When the information about a patient includes a genetic profile, the questions and answers may become more complex. But they also become more precise. Added to this, we continue to learn a lot about our own biology from other organisms. The fruit fly, Drosophila, for example, has always been a centerpiece of genetic research. Some of the insights that have come from model organisms will be a key focus of the final chapter. There is no question that new insights that are coming from biotechnology and model organisms are improving our ability to understand and respond to medical questions faced by all of us.
Why Are Some Mutations So Frequent?
There are many reasons why some mutations are more frequent than others. From the population view, one of the most obvious is that a genetic difference—a “mutation” in the sense of simply being different from other common forms of the gene—may have such a minor effect on the phenotype that selection against it is relatively weak. Alternatively, one gene form may be better adapted to one environment or to one developmental context than another. Industrial melanism discussed in Part 1 is one example. Selection favors one allele in one region or at one time but favors a different allele at another.
But there are other situations that may have more direct medical or diagnostic significance. One example is a founder effect. Essentially, a founder effect is a form of sampling variation with consequences similar to genetic drift and bottlenecks discussed earlier. Indeed, the main difference among these processes is their cause, not so much their outcome. As the name suggests, a founder effect is due to establishing a new population with a small number of founders from a larger, more genetically diverse population. The founders will carry only a small sample of the alleles in the original gene pool and the frequency of these alleles can vary by chance. If this event is followed by some degree of inbreeding, which would be typical, then the probability of having otherwise rare recessive conditions appear in the group will be high. One example is the Kel Kummer Tuareg, a small group in the Sahara Desert that traces its origin to 156 founders about 300 years ago. Their ancestry is complex, and each pair of current members shares about 15 common ancestors. Essentially all of the current gene pool of the group traces back to just 25 individuals. Clearly, a patient’s ancestral history can be valuable information.
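The sampling argument can be made concrete with a short calculation. The sketch below assumes that founders are drawn at random from the source population and that the locus is autosomal, so N founders carry 2N gene copies. The allele frequency of 0.01 is a hypothetical value, and treating the 25 ancestors mentioned above as a random sample of founders is a simplification made only for illustration.

```python
def prob_allele_absent(p, n_founders):
    """Chance that an allele of frequency p is missing from all 2N founder gene copies."""
    return (1 - p) ** (2 * n_founders)


def min_frequency_if_present(n_founders):
    """Lowest possible frequency among founders if at least one copy was sampled."""
    return 1 / (2 * n_founders)


if __name__ == "__main__":
    p = 0.01          # hypothetical frequency in the source population
    n_founders = 25   # illustrative founder count
    print(f"P(allele lost at founding) = {prob_allele_absent(p, n_founders):.3f}")
    print(f"Frequency if it is present  >= {min_frequency_if_present(n_founders):.3f}")
```

With these illustrative numbers, the allele is lost outright in about 60% of foundings. When it is carried along, its frequency is at least 2%, double the starting value, which is how chance alone can make an otherwise rare recessive condition common in a founder population.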
Another mechanism for maintaining higher than expected frequencies of alternative alleles is that the heterozygote is more fit than either homozygote. Heterozygous advantage will cause the heterozygous individuals to make a larger contribution to the next generation's gene pool than predicted by the Hardy-Weinberg equilibrium, thus maintaining both alleles.
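A small simulation can show how heterozygous advantage holds both alleles in a population. The sketch below uses the standard one-locus selection model with relative fitnesses of 1 - s for AA, 1 for Aa, and 1 - t for aa. The symbols s and t, the function names, and the numerical values are assumptions chosen for illustration. In this model the frequency of allele a settles at s/(s + t) regardless of where it starts.

```python
def next_q(q, s, t):
    """One generation of selection with fitnesses AA = 1 - s, Aa = 1, aa = 1 - t."""
    p = 1 - q
    mean_fitness = p * p * (1 - s) + 2 * p * q + q * q * (1 - t)
    # Allele a comes from Aa parents (half of their gametes) and surviving aa parents.
    return (p * q + q * q * (1 - t)) / mean_fitness


def equilibrium_q(s, t):
    """Predicted stable frequency of allele a under heterozygous advantage."""
    return s / (s + t)


def simulate(q0, s, t, generations):
    """Follow the frequency of allele a for a number of generations."""
    q = q0
    for _ in range(generations):
        q = next_q(q, s, t)
    return q


if __name__ == "__main__":
    s, t = 0.1, 0.5  # hypothetical selection coefficients
    for q0 in (0.01, 0.5, 0.9):
        print(f"start {q0:.2f} -> after 2000 generations {simulate(q0, s, t, 2000):.4f}")
    print(f"predicted equilibrium {equilibrium_q(s, t):.4f}")
```

Whether the allele starts rare or common, it converges on the same intermediate frequency, which is the signature of a balanced polymorphism.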
Sickle Cell Anemia and Protection from Malaria: An Example of Heterozygous Advantage
Sickle cell anemia is a serious condition we discussed earlier. It is often found in those tracing their lineage to southern Europe and Africa. At amino acid position 6 of the β-chain of hemoglobin, the amino acid glutamic acid is replaced by valine. Those with sickle cell trait are heterozygous for the condition and can show some sickling of the red blood cells (RBCs). Those with sickle cell disease (the homozygotes) have hemoglobin that readily crystallizes in low oxygen tension, such as at high elevations or following strenuous exercise. Red blood cells become distorted and less flexible. For that reason, those with sickle cell disease have a high frequency of capillary blockages, often with devastating effects on blood supply to key organs. But this condition is also closely associated with a completely different phenomenon, malaria.
The malarial protozoan parasite, Plasmodium falciparum, is transmitted among endothermic (“warm-blooded”) hosts like humans by the mosquito, Anopheles. Both those homozygous and those heterozygous for the sickle cell allele, Hbbβ, have partial protection from infection, because the sickle RBCs break apart, or lyse, before the parasite has been able to reproduce. Those without the sickle cell gene have a 2- to 3-fold higher likelihood of becoming infected. This is an excellent example of heterozygous advantage. Both alleles are maintained in the population because heterozygotes are favored over both homozygotes: malaria in the environment selects against the normal hemoglobin homozygotes, who are more susceptible to infection, and sickle cell disease selects against those homozygous for the sickle cell allele.
Other Examples of Heterozygous Advantage
While the heterozygote advantage of malaria and sickle cell disease is one of the most cited and best understood, there are other such examples. We have discussed cystic fibrosis (CF) several times in this book (see Chapter 4, including Figure 4-21). Clinically, CF is characterized by pancreatic insufficiency and chronic progressive pulmonary disease. The pathogenesis of the disorder is related to the causative gene, CFTR. The CFTR gene is a chloride transport gene that functions primarily in exocrine cells. The pancreatic and pulmonary problems are due to inspissated mucus in these organs, which results from hyperviscosity of the mucus secondary to increased chloride concentrations (due to the transport defect). Homozygotes for CF have a progressive lethal condition. Cystic fibrosis occurs at a high frequency in Caucasians of northern European descent. The carrier frequency for pathogenic CF mutations in this population is around 1 in 20. This high frequency is also suspected to be due to heterozygote advantage. It has been postulated that carriers of CF (with presumably subclinical differences in ion transport) are more resistant to the severe secretory diarrhea in cholera. Thus carriers of CF were more likely to survive the great epidemics in Europe in the 19th century. This then led to an increased frequency of CF mutations in this population and their descendants. The calculated heterozygote advantage for a mutant allele to reach equilibrium at a carrier frequency of 1:20 is around 2%.
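One way to see where a figure of about 2% could come from is sketched below. The model behind the text's calculation is not stated, so this example assumes the standard balanced-polymorphism result, in which the equilibrium frequency of the disease allele is s/(s + t). Here s is the fitness disadvantage of normal homozygotes relative to carriers and t is the disadvantage of affected homozygotes. Setting t = 1 treats CF homozygotes as leaving no offspring, which is an assumption made for this illustration, as are the function names.

```python
import math


def allele_freq_from_carrier_freq(carrier_freq):
    """Solve 2q(1 - q) = carrier frequency for the rarer allele frequency q."""
    return (1 - math.sqrt(1 - 2 * carrier_freq)) / 2


def required_advantage(q_hat, t=1.0):
    """Rearrange q_hat = s / (s + t) to get s = t * q_hat / (1 - q_hat)."""
    return t * q_hat / (1 - q_hat)


if __name__ == "__main__":
    carrier_freq = 1 / 20  # carrier frequency quoted in the text
    q_hat = allele_freq_from_carrier_freq(carrier_freq)
    s = required_advantage(q_hat)
    print(f"allele frequency q = {q_hat:.4f}")
    print(f"required carrier advantage s = {s:.3%}")
```

Under these assumptions the required advantage comes out between 2% and 3%, the same order of magnitude as the figure cited above. A very small edge for carriers is enough to sustain a lethal recessive allele at a high frequency.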
Smith-Lemli-Opitz syndrome (SLOS) is a multiple anomaly syndrome associated with a defect in the terminal stage of cholesterol synthesis (see Chapter 8 including Figures 8-5 and 8-6). The carrier frequency of SLOS is estimated to be as high as 1/30. However, the observed incidence of SLOS is between 1/20,000 and 1/60,000. Based on the carrier frequency, the incidence should be 1/3600. This discrepancy has been suggested to be due to in utero loss and/or unrecognized milder cases. Given the high carrier frequency of SLOS, heterozygote advantage with founder effect has been proposed. It is postulated that the advantage to carriers is due to better vitamin D synthesis.
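The expected incidence of 1/3600 follows directly from the carrier frequency, as the short calculation below shows. It assumes random mating and a 1 in 4 chance of an affected child for each carrier couple. The function name is illustrative, and exact fractions are used only to keep the arithmetic transparent.

```python
from fractions import Fraction


def expected_incidence(carrier_frequency):
    """Expected incidence of an autosomal recessive condition under random mating.

    Both parents must be carriers, and a carrier couple has a 1 in 4 chance
    of an affected child with each pregnancy.
    """
    return carrier_frequency ** 2 * Fraction(1, 4)


if __name__ == "__main__":
    expected = expected_incidence(Fraction(1, 30))
    print(f"expected incidence: {expected}")
    for observed in (Fraction(1, 20000), Fraction(1, 60000)):
        gap = float(expected / observed)
        print(f"observed {observed}: expected is {gap:.1f} times higher")
```

The gap between expected and observed incidence is roughly 6- to 17-fold, which is the discrepancy that in utero loss and unrecognized mild cases have been proposed to explain.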
Heterozygote Disadvantage
It is important to note that not all mutations are advantageous in heterozygotes (carriers). Homocystinuria (HC) is an autosomal recessive disorder associated with abnormalities in amino acid metabolism (see Chapter 8). The condition is genetically heterogeneous. The most common cause of HC is a deficiency in the enzyme cystathionine beta-synthase (CBS). Symptoms of HC include a Marfanoid body habitus, mental retardation, and lens dislocations (Figure 8-7b). The common biochemical features are elevated plasma homocysteine levels with subsequent increased homocysteine in the urine, from which the condition derives its name. Elevated homocysteine in the blood is toxic to the vascular endothelium, which potentiates the lipoprotein LDL and which leads to increased platelet adhesion. Thus a major part of the pathogenesis of the condition is the occurrence of micro-emboli and the resultant problems associated with the vascular occlusion.
The carrier frequency (heterozygotes) of CBS deficiency is 0.3% to 1%. As a group, heterozygotes for CBS deficiency have normal fasting plasma homocysteine levels, but they often have increased urine concentrations. Some, but not all, will show elevated homocysteine levels in response to a methionine load. Hyper-homocysteinemia is a well-documented independent risk factor for cardiovascular disease. Likewise, increased post-load plasma homocysteine concentrations are a risk factor for vascular disease and neural tube defects. There remains significant debate in a large body of literature as to the relative overall contribution of CBS deficiency to the epidemiology of vascular disease. Regardless, the overriding concept is that carriers are at increased risk for medical problems, not at a genetic advantage.
Assessment of Human Genetic Diversity: The HapMap Example
In earlier chapters we referred to some of the initiatives that are taking place to improve genetic mapping and interpretation. One of these is the HapMap Project. It focuses on linked groups of genes, or haplotypes, that are marked by a unique single nucleotide polymorphism (SNP). Rather than trying to track the 10 million or so individual SNPs tied to all regions of the genome, the use of representative tag SNPs reduces the analysis to about 500,000 sites. This is just one example of the advances in genetic technology that will contribute to deciphering the genetic bases of heritable conditions.
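The idea behind tag SNPs can be illustrated with a toy example. The sketch below is not the HapMap Project's actual procedure, which the text does not describe. It is a generic greedy selection that assumes SNPs are scored 0 or 1 on each sampled chromosome and that two SNPs can stand in for each other when their squared correlation (r²) is at least 0.8. The haplotype data, the threshold, and all names are hypothetical.

```python
def r_squared(a, b):
    """Squared correlation between two biallelic SNPs coded 0/1 per chromosome."""
    n = len(a)
    pa = sum(a) / n
    pb = sum(b) / n
    pab = sum(x & y for x, y in zip(a, b)) / n
    d = pab - pa * pb  # linkage disequilibrium coefficient D
    denom = pa * (1 - pa) * pb * (1 - pb)
    return 0.0 if denom == 0 else d * d / denom


def greedy_tags(snps, threshold=0.8):
    """Repeatedly pick the SNP that captures the most still-untagged SNPs."""
    untagged = set(snps)
    tags = []
    while untagged:
        best = max(
            sorted(untagged),
            key=lambda s: sum(r_squared(snps[s], snps[u]) >= threshold for u in untagged),
        )
        tags.append(best)
        captured = {u for u in untagged if r_squared(snps[best], snps[u]) >= threshold}
        untagged -= captured | {best}  # always remove the chosen tag itself
    return tags


# Hypothetical genotypes at five SNPs on eight sampled chromosomes.
EXAMPLE_SNPS = {
    "snp1": [0, 0, 0, 0, 1, 1, 1, 1],
    "snp2": [0, 0, 0, 0, 1, 1, 1, 1],
    "snp3": [1, 1, 1, 1, 0, 0, 0, 0],
    "snp4": [0, 1, 0, 1, 0, 1, 0, 1],
    "snp5": [0, 1, 0, 1, 0, 1, 0, 1],
}

if __name__ == "__main__":
    print(greedy_tags(EXAMPLE_SNPS))
```

In this toy data set, five SNPs fall into two tightly linked groups, so two tag SNPs capture all of the variation. The same logic, applied genome-wide, is what allows roughly 10 million SNPs to be represented by about 500,000 sites.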
The Impact of Changing Technology: Genomics-Proteomics-Metabolomics
It was not long ago that words like proteomics and metabolomics would have been nonsense. Now they are specialties with their own journals and professional organizations. Working in a field that is changing as rapidly as medicine, a physician will always be in one sense a medical student. Advances in genetic technology share a lot with the advances in population genetics. Both must think in terms of systems, not just individuals. A genome is not simply about A and a alleles or about expression of homozygotes versus heterozygotes. Instead, genotypes are at the core of a multilevel biochemical system.
The developing field of genomics is concerned with determining the DNA sequence of a species. The first successes were sequencing the 5386 bp genome of a virus, φX-174, and of a mitochondrion by Fred Sanger in the 1970s and 1980s. With this work, he and his group developed the first sequencing, data handling, and analysis tools that later work built upon. The first complete organismal genome was of Haemophilus influenzae in 1995. With the completion of the 13-year Human Genome Project in 2003, the technologies it developed are now being applied to sequencing genomes from other species. This allows comparisons of gene function to be made across the living spectrum.
For some the term “genomics” has now taken on a broader meaning to include not only the DNA sequence but also functions associated with it. Functional genomics focuses on the dynamics of gene and protein activities and interactions. Rather than simply annotating DNA sequences, this field explores gene transcription and translation as well as the interactions that occur between genes and proteins and in protein-protein interactions. In a similar but more medically targeted way, pharmacogenetics (sometimes now called pharmacogenomics) is a specialized branch of pharmacology that explores individual differences in response to drugs and environmental chemicals.
The focus of transcriptomics is slightly narrower. Only a tiny percentage of the DNA in humans is transcribed. As complex as the genome is, it has the advantage of being relatively stable from one cell or one individual to another. In contrast, whereas the human genome may have 20,000 to 30,000 genes, there are probably more than 2 million different proteins produced during a lifetime. This is accomplished by posttranslational modification and related processes we discussed in earlier chapters. Each of these proteins has a different function, and together they participate in a wide array of activities: catalyzing reactions, serving as cell structural components, trophic growth signaling, neurotransmission, defending the body against disease, and so forth. Proteomics is the study of these proteins and their functions.
The term “proteomics” is drawn from PROTEin and the genOME. The protein profiles differ from one cell type to another and from one time in development to another. Factors that go beyond mRNA translation include various forms of posttranslational modification like phosphorylation, ubiquitination, methylation, glycosylation, and other modifications. At a more complex level, each of these must be considered in terms of their protein-protein interactions. When we think of the functioning of the body, proteomics can give us a better understanding than can a study of the genome alone. By looking just at the genome, we have no idea how often a gene is transcribed or how stable its mRNA or its protein product will be. On the other hand, proteins are not easy to study, especially when present in low amounts or for short periods. At this time, therefore, proteomic studies often suffer from reproducibility problems that limit their predictive strength. Part of this limitation is, of course, the fact that proteins are being modified. So the dynamic process and its inherent limitation are intertwined.
But proteins are not end points. At a more functional level of cell biochemistry, metabolomics is the study of metabolites produced by cellular processes. The genome and proteome offer a profile of the controlling processes, and the metabolome documents the end products of their activity. Its tools include nuclear magnetic resonance (NMR) spectroscopy and mass spectrometry. The first metabolomics web database, METLIN, now contains information on over 40,000 human metabolites. Parallel studies are also being done in several plant species.
Metagenomics is the analysis of genomes from material collected from an environmental sample. Traditional techniques have missed many of the species living in a habitat, and this field is providing a new look at microbial diversity. Since samples are sequenced directly from natural collections, there is no requirement that an organism be cultured, a limitation that had kept the vast majority of microbial organisms from being discovered and studied. One of the earliest such projects was led by Craig Venter, a pioneer in sequencing the human genome. In samples from the Sargasso Sea, the project found DNA from almost 2000 different species, including almost 150 new types of bacteria. Because of the medical importance of bacteria, metagenomics is now being used to profile the microbial communities from numerous body sites in several hundred representative people in the Human Microbiome Initiative.
Personal Genomics
We carry our genetic heritage with us. For some conditions, knowing the geographic “origin” of a patient may give useful insights comparable to knowing whether a patient has traveled recently to areas where infectious disease is endemic. Commercial sources from which a person can obtain a genome profile are increasingly available. Sometimes this can have a clearly beneficial outcome, such as selecting the best drug option for an individual's physiology. But our knowledge of drug × physiology interactions is just in its infancy. So this is a new frontier. An associated issue is the potential for discrimination based on genetic profiles.
Let’s step back for a moment. Why have we chosen to bring these developing areas of genetics into a discussion of populations? We hope the answer is fairly obvious. Population genetics and the developing fields in molecular biology all share a focus on systems. They try to understand systems of alleles or individuals in a population, systems of proteins, systems of metabolic interactions, and ultimately systems of…, well, systems. But as impressive as the growth of medical genetics is in understanding all these levels, the most promising realization is rather simpler. The more we know about the complexity of genetic control of development and function, the better able we are to diagnose and respond to the conditions faced by an individual patient.
Tracking Populations by Genes
The advent of molecular genetic technologies has provided powerful and exciting tools for use in tracking populations and their relationships. Many polymorphisms exist between individuals and among populations. The exclusive maternal inheritance of the mitochondrial genome provides a method for tracking maternal lineage over long periods of time. Analyses of mitochondrial polymorphisms have provided fascinating insights into human populations (Figures 15-12 and 15-13). One of the most amazing observations in these studies has been how well the DNA data correlate with earlier archaeological, anthropological, and linguistic predictions. Similar studies have been done with polymorphisms on the Y chromosome that exclusively track paternal lineages.
Figure 15-12. Diagram showing lineages based on mitochondrial DNA (mtDNA) polymorphisms. (Image from MITOMAP: A human mitochondrial genome database. Available at http://www.mitomap.org. Image is licensed by a Creative Commons Attribution 3.0 license.)
Figure 15-13. Map showing the hypothesized migration of human populations based on mtDNA analysis. (Image from MITOMAP: A human mitochondrial genome database. Available at http://www.mitomap.org. Image is licensed by a Creative Commons Attribution 3.0 license.)
DNA Sequencing of Model Organisms
One of the surprising results to come from sequencing many species is that we share a significant part of our genetic background. Population genetics was founded on the study of genes in moths, fruit flies, and snails. The genetic information from model organisms offers an insight into biology that parallels that from advances in technology. The biological similarities among animals will help us understand ourselves better. We explore this in the final chapter.
Part 3: Clinical Correlation
Noonan syndrome (Figure 15-14) is an autosomal dominant syndrome characterized by dysmorphic facial features (hypertelorism, down-slanting palpebral fissures, and low-set/posteriorly rotated ears). Other features seen in Noonan syndrome include short stature, a short neck (with or without webbing), cardiac anomalies (especially pulmonic stenosis), epicanthal folds, neurosensory hearing loss, developmental delay, and a bleeding diathesis. Noonan syndrome is genetically heterogeneous with at least 11 genes now known to be associated with the condition (Table 15-2).
Figure 15-14. A young girl with Noonan syndrome. This child has a mutation in a gene known as KRAS. Noonan syndrome has significant genetic heterogeneity with at least 11 known genes associated with the condition.
Table 15-2. Genes Associated with Noonan Syndrome
PTPN11 (50% of cases)
BRAF
CBL
HRAS
KRAS
MAP2K1
MAP2K2
NRAS
RAF1
SHOC2
SOS1
Neurofibromatosis type I (Figure 15-15) is an autosomal dominant disorder which exhibits complete penetrance with a wide range of variability in its expression. It is characterized by the increased propensity to develop a variety of benign and malignant tumors. Neurofibromas, from which the condition gets its name, originate from nonmyelinating Schwann cells and are among the most common tumors seen in these patients. Other tumor types reported include CNS tumors (optic gliomas and meningiomas among others), endocrine tumors, and sarcomas. Patients with neurofibromatosis I also have a variety of dermatologic findings such as hyperpigmented macules (designated as “cafe-au-lait spots”) and freckling in abnormal locations such as the axillary or inguinal regions. Other cardinal features are the presence of iris hamartomas, or Lisch nodules, and bony dysplasias. Neurofibromatosis I is caused by mutations in the neurofibromin gene at chromosome location 17q11.2. Neurofibromin functions as a tumor suppressor involved in the RAS signal transduction pathway.
Figure 15-15. Adult male with neurofibromatosis type I. (a) Back of the patient demonstrating multiple hyperpigmented macules and tumors (neurofibromas). (b) Hamartomas of the iris (Lisch nodules), which are a characteristic feature of neurofibromatosis type I.
Both Noonan syndrome and neurofibromatosis are established conditions with well-delineated phenotypes and diagnostic criteria. Over time clinicians have described a number of patients who have features overlapping both conditions. Noonan-neurofibromatosis syndrome is a condition described in patients with neurofibromatosis who also have manifestations of Noonan syndrome, such as short stature, characteristic facial features, and pulmonic stenosis (Figure 15-16). A similar condition is Watson syndrome. The two major features of Watson syndrome are pulmonic stenosis and cafe-au-lait spots. Patients with Watson syndrome may also have cognitive impairments and short stature. Most patients have macrocephaly. Many have Lisch nodules and/or neurofibromas. Molecular genetic studies have now shown that these three conditions are allelic—all being caused by mutations in the neurofibromin gene.
Figure 15-16. Adult female with neurofibromatosis with a confirmed neurofibromin mutation. Note the craniofacial features which are typical of those seen in Noonan syndrome.
Further investigation into the relationship between Noonan syndrome and neurofibromatosis I has shown a molecular link. As mentioned earlier, the neurofibromin gene modulates the RAS oncogene. The genes that are associated with Noonan syndrome have also been shown to be involved in the same signaling pathway. Likewise, several other conditions with phenotypes that overlap with Noonan syndrome and/or neurofibromatosis have been shown to involve this pathway as well. These include Costello syndrome, cardio-facial-cutaneous syndrome, Legius syndrome, and LEOPARD syndrome (multiple lentigines, electrocardiographic conduction abnormalities, ocular hypertelorism, pulmonic stenosis, abnormal genitalia, retardation of growth, and sensorineural deafness).
Thus at the level of metabolomics, all of these conditions are related. In fact they represent a family of disorders linked by their involvement in the RAS signaling pathway (Figure 15-17). Collectively they have now been termed the RASopathies based on this central shared pathogenesis. As more is understood about the molecular basis of human disorders, the list of these genetic families continues to grow accordingly.
Figure 15-17. Diagram of the RAS-MAPK signal induction pathway. The multiple syndromes (RASopathies) and their associated genes are noted in the boxed areas. (Adapted from Ekvall S, Hagenas L, Allanson J, et al. “Co-occurring SHOC2 and PTPN11 mutations in a patient with severe/complex Noonan syndrome-like phenotype.” Am J Med Genet A., 155(6): 1217-24, 2011.)
Board-Format Practice Questions
1. Assume that a population is in Hardy-Weinberg equilibrium and the frequency of a rare autosomal recessive disease allele is q, then the frequency of disease carriers can be estimated as:
A. p²
B. q²
C. 2pq
D. 2q
E. 2(1-p)
2. Mr and Mrs Smith have come to your clinic. Mrs Smith is 9 weeks pregnant and has a sister with cystic fibrosis (CF), an autosomal recessive disease. They are concerned about whether their baby will also be born with the same disease. Mr and Mrs Smith are Caucasian (incidence for CF is 1 in 2500). What is the probability that Mrs Smith is a carrier of CF?
A. 1/2
B. 1/4
C. 2/3
D. 1/25
E. 1/50
3. It is recognized that parents and children share ½ (half) of their genes and that siblings share ½ (half) of their genes. Which one of the following is the correct number of shared genes for first cousins?
A. 1/8
B. 1/16
C. 1/32
D. 1/64
E. 1/128
4. Hemochromatosis is a disorder associated with iron overload. It is an autosomal recessive condition. It is caused by mutations in a gene called HFE. It is one of the most common human single gene disorders occurring in about 1 in 256 individuals of northern European descent. What then is the approximate carrier frequency of hemochromatosis in the northern European population?
A. 1 in 2
B. 1 in 8
C. 1 in 50
D. 1 in 100
E. 1 in 256
5. One possible reason that certain mutations occur frequently in a given population would be:
A. Heterozygote advantage.
B. Heterozygote disadvantage.
C. Foundation efforts.
D. Random mating.
E. Low mutation rate.