Medical Genetics 1st Ed

Chapter 2

Information Flow and Levels of Regulation

CHAPTER SUMMARY

Deoxyribonucleic acid (DNA) is sometimes described as a “blueprint” for development. Although easy to visualize, that view is a damaging oversimplification. A blueprint defines the location of each element in a given structure. In contrast, genetic control of development is much more dynamic. A better description for the role of DNA is as a “recipe for interactions.” DNA codes for the assembly of proteins by way of a ribonucleic acid (RNA) intermediate, the messenger RNA (mRNA). But it is the interactions among the resulting proteins, other RNAs like microRNAs, and their feedback influences on the genome that determine how cells, tissues, organs, and the body as a whole take on their form and function.

As the ultimate information resource for biological processes, DNA is essential. But in many ways it is the simplest part of the development puzzle. In 1953, Watson and Crick proposed a model for the structure of DNA that has been confirmed by experiments to test predictions about processes like DNA replication during cell division. But knowing the structure of DNA does not explain how it works. Sequencing the human genome was also not the final answer. Instead, knowing the genome’s DNA structure leads to a higher level of questions. How is DNA organized into genes that control the activities of a cell to create individual phenotypes? How do genes influence each other? How do proteins interact with each other to form networks of biochemical change? How do functional pathways and feedback loops influence the organism at a level beyond the simple turning-on of a gene? Answering questions like these is the focus of new fields like genomics, bioinformatics, proteomics, and metabolomics. Furthermore, genes do not work in isolation from their cellular and developmental environment. What roles do environmental variables like temperature play in forming a trait?

Building from that perspective, we can think of our development as the product of a molecular storm. Storms may be influenced by rules, but the rules are often complex and random events can be influential. Rather than genes providing a simple blueprint, the unfolding of each step in development is actually the result of hundreds, if not thousands, of different molecular interactions. This perspective is introduced here but will be explored in detail in later chapters. DNA is only the beginning.

Part 1: Background and Systems Integration

From DNA to Protein: the Central Dogma of Molecular Biology

A dictionary definition of “dogma” is that it is information presented as an established opinion or an authoritative view, but without significant grounds of support. In that sense, the Central Dogma of Molecular Biology is famously misnamed. The flow of information it describes is extensively supported by experimental evidence. Still, it compactly summarizes the unifying theme of molecular genetics: DNA ↔ RNA → polypeptide (Figure 2-1). The nucleotides that make up a region of DNA are transcribed into a complementary strand of RNA that is then translated into a strand of amino acids, a polypeptide. A large polypeptide is called a protein.

Figure 2-1. The Central Dogma is a unifying theme of molecular biology. (Reprinted with permission from Brooker RJ: Genetics: Analysis & Principles, 3rd ed. New York: McGraw-Hill, 2008.)

One portion of this information flow is reversible in special circumstances. Retroviruses, which have RNA as their genetic material, use reverse transcriptase to create a DNA copy that can integrate into the host’s genome and so be transmitted to daughter cells during cell division. The other step, translation of the RNA sequence into protein, is not reversible. This provides a molecular argument against the early evolutionary hypothesis by Jean-Baptiste Lamarck that traits acquired during an organism’s lifetime, such as modifications through use or lack of use, can become heritable. The “inheritance of acquired characters” was a powerful idea that competed with later Mendelian genetic models and was even promoted by Trofim Lysenko, with strong political support, to the serious detriment of Soviet agriculture as late as the 1960s.

But biology is complex and can sometimes hide surprises. We know that the biochemical process of protein synthesis on ribosomes is not reversible, so altering a protein or other body part does not change the heritable DNA. Acquired traits cannot be passed to offspring that way. But recent advances in our understanding of the biochemistry of DNA suggest that there might be important exceptions to this rule. Some regulatory mechanisms may leave an imprint on chromosomal DNA that can alter its later activity. In such a case some acquired conditions can influence development in later generations, an idea that is being explored further.

The outline of the Central Dogma came about by recognizing that the genome in the nucleus is physically separated from the site of protein synthesis on ribosomes in the cytoplasm. There must be some intermediary molecule that Francis Crick labeled a “messenger,” and mRNA was soon discovered. But the existence of an intermediary like mRNA has other far-reaching implications. In cells with a membrane barrier between DNA and the site of protein synthesis, there are opportunities for several levels of regulation that dramatically increase the potential coding power and flexibility of the genome.

Importance of Having a Nuclear Membrane

Prokaryotes like bacteria differ from eukaryotes (animals, plants, fungi, and most single-cell organisms) in several ways (Figure 2-2). But their names indicate one of the most important differences. The term “prokaryote” literally means “before a nucleus,” and “eukaryote” means “true nucleus.” In prokaryotes, the DNA and other cell components are in the same cellular space, so an mRNA molecule might still be forming when its first part begins binding with ribosomes to start protein synthesis. The connection between transcription and translation is immediate. Eukaryotic cells, on the other hand, have a double membrane separating the chromosomes from other parts of the cell. This creates two functional domains, the nuclear and the cytoplasmic. Eukaryotes also have an array of other important membranous organelles that will be a focus of our later discussions. Among these, mitochondria even have their own DNA. But the nuclear envelope offers a key element in the genetic regulation of biochemical expression. By physically separating the process of transcription from translation, eukaryotic cells are able to modify the initial RNA transcript in various ways before the mature mRNA molecules are transported out of the nucleus. This simple separation of functions thus allows a potentially large expansion of the possible polypeptides that can be produced by each original gene in eukaryotes. It helps explain how the 100,000 or more proteins expressed in a human can be produced by a surprisingly small number of genes.

Figure 2-2. The eukaryotic cell has many membranous organelles that influence the way genetic-coded information is processed and expressed. The prokaryotic cell lacks membrane-bound organelles. (Reprinted with permission from Brooker RJ: Genetics: Analysis & Principles, 3rd ed. New York: McGraw-Hill, 2008.)

Opportunities for Regulation

The formation of a protein is ultimately encoded in the sequence of nucleotides in a molecule of DNA. But the steps between these points allow many opportunities to influence the outcome (Figure 2-3). By “turning-on” a gene, we simply mean that enzymes and regulatory proteins are activated to synthesize a molecule of RNA using one of the two DNA complementary strands as a template. This initial RNA molecule is then modified in several ways to become a functional mRNA. Certain sequences, introns, are spliced out of the initial transcript leaving behind the coding sequences, exons, in the mature mRNA. But alternative splicing of introns can result in several slightly different versions of mRNA, so one gene can yield several related products, depending on the cell type or developmental stage. The mRNA is then exported through pores in the nuclear membrane with protein complexes that control the traffic of this and other large molecules between the nuclear and the cytoplasmic domains.

Figure 2-3. The regulation of gene expression involves determining which genes are transcribed as well as the various ways the initial transcripts are processed. Regulation also takes place during translation and modifications that can occur in the protein products. (Reprinted with permission from Brooker RJ: Genetics: Analysis & Principles, 3rd ed. New York: McGraw-Hill, 2008.)

In the cytoplasm, the mRNA binds with ribosome sub-units to initiate protein synthesis. Competition among mRNAs, variation in RNA longevity, and temporary inactivation or silencing by microRNAs or other agents can influence how quickly and for how long each type of mRNA will function. But influences on the gene product can occur even after translation. The polypeptide produced at the ribosome may be active immediately as a structural protein or enzyme. But in many cases a protein is initially inactive until it is activated by some external inducer molecule or it binds with other polypeptides to form a higher-order complex. One example is an inactive proteolytic enzyme, pepsinogen, that is activated into pepsin by hydrochloric acid in the stomach so it does not prematurely digest the proteins in the cell that made it. Thus, there are many opportunities for regulation or intervention in the events between gene and protein. The information flow from a gene to a final phenotype is a network of interactions.

The genetic codes of prokaryotes and eukaryotes are fundamentally similar, and many of the biochemical processes involved in DNA replication and genetic control have parallels that make prokaryotes excellent models with which to study more complex eukaryotic mechanisms. Much of what we know about DNA replication, transcription, and translation comes from these simpler systems. In the next sections we will outline some of the key events in each process as seen in prokaryotes and point out some ways in which eukaryotes like humans may differ.

DNA Replication

In Chapter 1, we introduced DNA as a double-stranded molecule, a polymer composed of nucleotide subunits. Several characteristics of this helical molecule are important to keep in mind when exploring its replication (Figure 2-4). Each nucleotide carries one of the four nitrogenous bases: adenine (A) and guanine (G) are purines, and thymine (T) and cytosine (C) are pyrimidines. Nucleotides have an important directional asymmetry based upon the way subunit components attach to the 5-carbon sugar, deoxyribose. The nitrogenous base is attached to the 1′ carbon, a phosphoric acid group is attached to the 5′ carbon, and there is a hydroxyl (-OH) group on the 3′ carbon. DNA polymerases are the enzymes that link nucleotides together to form a single strand. They are only able to add a new nucleotide to a preexisting 3′ OH sugar group. They catalyze the formation of a covalent bond between the 3′ carbon of an existing nucleotide and the phosphoric acid group attached to the 5′ carbon of the new nucleotide, creating a sugar-phosphate backbone with bases at regular intervals. In such a strand, the “earliest” carbon is the 5′ carbon of the first nucleotide, and the newest position is the 3′ carbon of the last nucleotide. In other words, a strand of DNA grows in the 5′ to 3′ direction.

Figure 2-4. The DNA double helix is composed of two antiparallel nucleotide strands, one running 5′ to 3′ and the other 3′ to 5′. There are about 10 base pairs in each 360° turn of the double helix. (Reprinted with permission from Brooker RJ: Genetics: Analysis & Principles, 3rd ed. New York: McGraw-Hill, 2008.)

Another key factor is the way hydrogen bonds are formed between nucleotide bases. Adenine only binds to thymine, using two hydrogen bonds; and cytosine only binds to guanine, using three hydrogen bonds. For that reason, the proportion of A is the same as T in double-stranded DNA, and G is the same as C. This relationship, known as Chargaff’s rule, can be presented in several different ways, such as A = T and G = C, or A + G = T + C (indicating there is one purine for each pyrimidine). In the double-stranded DNA molecule, the sugar-phosphate backbones of the complementary strands are antiparallel. One is oriented 5′ to 3′ and the opposite strand is oriented 3′ to 5′. These orientations are important both to the process of replication and to the mechanism for identifying and transcribing genes in the proper sequence during development.
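The pairing rules and the antiparallel arrangement can be checked with a few lines of code. The following is only a minimal sketch: the 14-base sequence and the function names are hypothetical and chosen for illustration. It builds the partner strand, writes it in the conventional 5′ to 3′ direction, and confirms that the base counts over both strands satisfy Chargaff’s rule.

```python
from collections import Counter

# Watson-Crick partners: A pairs with T, G pairs with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_strand(strand_5to3: str) -> str:
    """Return the partner strand, also written 5' to 3'.

    The partner is antiparallel, so the sequence is reversed while
    each base is swapped for its pairing partner.
    """
    return "".join(PAIR[base] for base in reversed(strand_5to3))


def duplex_base_counts(strand_5to3: str) -> Counter:
    """Count each base over both strands of the double helix."""
    return Counter(strand_5to3) + Counter(complementary_strand(strand_5to3))


strand = "ATGGCGTACCTTGA"  # hypothetical sequence, for illustration only
counts = duplex_base_counts(strand)

print(complementary_strand(strand))
print("A = T:", counts["A"] == counts["T"])
print("G = C:", counts["G"] == counts["C"])
print("A + G = T + C:", counts["A"] + counts["G"] == counts["T"] + counts["C"])
```

Note that the rule holds for the double-stranded molecule as a whole. A single strand taken alone need not have equal numbers of A and T or of G and C.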

The DNA helix is not a uniform spiral. It has a major and a minor groove. In the major groove, the DNA bases are in contact with water, and proteins that regulate gene action can bind there. To replicate a double-stranded molecule, it is first necessary to separate it into two single strands to serve as templates for new synthesis (Figure 2-5). But DNA is a double helix. If you have ever tried to pull apart the individual strands of a twisted rope, you will realize that separating the twisted strands generates supercoiling of the remaining part. The occurrence of supercoiling, and the fact that new strands grow only at their 3′ ends along antiparallel templates, mean that the replication of DNA has special challenges. Some of the main enzymes responsible for replication are shown in Figure 2-6, which diagrams the events at one replication fork. Of special note is the fact that once the hydrogen bonds have been broken by DNA helicase and the single strands are stably separated by single-strand binding proteins to form two complementary templates, a short RNA sequence, a primer, is laid down by primase. DNA polymerase III (Figure 2-7) can use the 3′-OH position of an RNA nucleotide as a point for attaching the first DNA nucleotide. Replication at this fork occurs in opposite directions on the two template strands.

Figure 2-5. The replication of DNA is semi-conservative, or “half saved.” Each of the two strands of the original DNA molecule serves as a single-strand template for the synthesis of a new complementary strand. Thus, in the next generation, half of the DNA double helix is directly from the original and half is new. (Reprinted with permission from Brooker RJ: Genetics: Analysis & Principles, 3rd ed. New York: McGraw-Hill, 2008.)

Figure 2-6. DNA replication involves many proteins. DNA helicase unwinds the double-stranded molecule; topoisomerase relaxes the supercoiling that is generated by this unwinding; single-strand binding proteins keep the complementary strands from reassociating with each other; primase forms a short RNA sequence on which DNA polymerase III can attach new DNA nucleotides; and DNA ligase links together the short DNA fragments, the Okazaki fragments, that are formed by replication on the strands in which the 3′ end is nearest the replication fork. (Reprinted with permission from Brooker RJ: Genetics: Analysis & Principles, 3rd ed. New York: McGraw-Hill, 2008.)

Figure 2-7. Diagram of the events that occur when DNA polymerase III moves along the template toward its 5′ end, adding nucleotides to the 3′ end of the new strand. (Reprinted with permission from Ying Li et al: Crystal structures of open and closed forms of binary and ternary complexes of the large fragment of Thermus aquaticus DNA polymerase I: structural basis for nucleotide incorporation. Embo J. 1998; 17:24, 7514-7525.)

On the leading strand, where the template is oriented with the 5′ end nearest the replication fork, synthesis of a new complementary strand can be continuous because it adds nucleotides at the 3′ end. But on the lagging strand, synthesis occurs in discontinuous bursts as new template is opened by the DNA helicase (Figure 2-8) with creation of periodic RNA primer sequences. This results in short sequences, Okazaki fragments, of about 1000 to 2000 nucleotides long in bacteria and about 100 to 200 nucleotides long in eukaryotes. To complete synthesis on the lagging strand, therefore, the RNA primers must be removed, DNA nucleotides must replace them, and the final covalent bond must be formed between adjacent fragments. In bacteria, DNA polymerase I removes the primers and inserts DNA nucleotides. DNA ligase catalyzes the final covalent bond.
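A rough calculation shows what these fragment sizes imply for the lagging strand. The sketch below is illustrative only: the 100,000-nucleotide template length is a hypothetical figure, and the function name is our own. It counts how many Okazaki fragments, and therefore how many RNA primers, would be needed at each end of the size ranges given above.

```python
import math


def okazaki_fragment_count(lagging_length_nt: int, fragment_length_nt: int) -> int:
    """Number of fragments needed to cover a stretch of lagging-strand template."""
    return math.ceil(lagging_length_nt / fragment_length_nt)


template_length = 100_000  # hypothetical stretch of lagging-strand template

# Fragment size ranges as quoted in the text.
size_ranges = [("bacteria", 1000, 2000), ("eukaryotes", 100, 200)]

for label, shortest, longest in size_ranges:
    fewest = okazaki_fragment_count(template_length, longest)
    most = okazaki_fragment_count(template_length, shortest)
    print(f"{label}: {fewest} to {most} fragments, each started from its own RNA primer")
```

Under these assumptions the eukaryotic lagging strand needs roughly ten times as many primers as the bacterial one over the same distance, and every one of them must later be removed and replaced with DNA.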

Figure 2-8. Events at a replication fork during DNA synthesis. (Reprinted with permission from Brooker RJ: Genetics: Analysis & Principles, 3rd ed. New York: McGraw-Hill, 2008.)

Some of the enzymes involved in replication are part of a larger complex (Figure 2-9). DNA helicase and primase are bound together as a primosome which separates the parental strands and creates RNA primers spaced along the lagging strand. Linking the proteins helps coordinate their functions more efficiently. The primosome, in turn, is associated with two molecules of DNA polymerase III, one for the leading and one for the lagging strand. This complex is the replisome. When the polymerase on the lagging strand completes an Okazaki fragment, it is released from the template and jumps to the next nearest RNA primer to begin again. Although highly accurate, there is some error in all of these processes, and indeed error-prone enzymes are the source of enhanced mutation associated with certain genetic diseases. But many of the errors are detected and corrected during and soon after replication. Repair systems will be discussed in Chapter 7.

Figure 2-9. During DNA replication, the helicase and primase form a primosome, which associates with two molecules of DNA polymerase III. One of these synthesizes a new strand continuously on the leading strand, but the other synthesizes new fragments in discontinuous bursts on the lagging strand. The primosome plus the two polymerases make up the replisome. (Reprinted with permission from Brooker RJ: Genetics: Analysis & Principles, 3rd ed. New York: McGraw-Hill, 2008.)

The understanding of replication discussed so far comes from studies in simple prokaryotes. Although the biochemistry of eukaryotic DNA replication is not as well understood, there are many similarities with that in prokaryotes. A comparable array of enzymes is involved, but the process is more complex. One obvious difference comes from the vastly larger size of the eukaryotic genome. In bacteria, there is a single origin with replication proceeding bidirectionally along two forks that eventually meet around the circular bacterial chromosome. Eukaryotic chromosomes are much longer DNA strands and are linear. Multiple origins of replication are required for the process to occur rapidly (Figure 2-10). Like prokaryotic origins of replication, those identified so far in eukaryotes have a high proportion of A and T bases. With two, rather than three, hydrogen bonds in an A-T pair, enzymes can separate these regions into single-strand templates more easily.
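The hydrogen-bond argument can be made concrete by counting bonds along a sequence. The sketch below uses a hypothetical 23-base sequence and function names of our own choosing. It scores each window by its number of hydrogen bonds (two per A-T pair, three per G-C pair) and reports the window that would, by this count alone, be easiest to separate.

```python
# Hydrogen bonds contributed by the base pair that each base forms.
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def hydrogen_bonds(seq: str) -> int:
    """Total hydrogen bonds holding this stretch of duplex together."""
    return sum(H_BONDS[base] for base in seq)


def easiest_window(seq: str, width: int) -> tuple[int, str]:
    """Return (start, window) for the window with the fewest hydrogen bonds."""
    starts = range(len(seq) - width + 1)
    best = min(starts, key=lambda i: hydrogen_bonds(seq[i:i + width]))
    return best, seq[best:best + width]


seq = "GCGCCGGATTATATTAAGCGGCC"  # hypothetical sequence with an A-T rich core
start, window = easiest_window(seq, width=10)

print(start, window, hydrogen_bonds(window))
```

This is a deliberate simplification. Real origins are recognized by specific proteins, and strand separation also depends on factors such as base stacking that a simple bond count ignores.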

Figure 2-10. The large eukaryotic genome requires multiple replication forks that originate during the S phase of interphase prior to mitosis or meiosis. (Reprinted with permission from Brooker RJ: Genetics: Analysis & Principles, 3rd ed. New York: McGraw-Hill, 2008.)

In eukaryotes a prereplication complex of at least 14 proteins is required to begin replication at an origin site. In addition, eukaryotes have many different DNA polymerases; mammals have over a dozen. Some of these appear to function in error correction and in repairing various kinds of DNA damage. A final major difference between circular bacterial versus linear eukaryotic strands is the need to handle replication at the ends of a chromosome. Since DNA polymerase must have a preexisting 3′-OH nucleotide to which to attach the first DNA nucleotide, it cannot replicate the initial 3′ end of the chromosome since there is no place for a primer to be synthesized for it. Even if a primer is placed at the tip, the DNA polymerase cannot replace the most distal RNA nucleotides with DNA nucleotides without an earlier 3′-OH to link with. To keep the coding region of DNA from becoming shorter at each replication cycle, eukaryotic chromosomes have tandemly-repeated sequences (TTAGGG in humans repeated 250 to 1,500 times) in a region called a telomere at each end (Figure 2-11). This extra DNA provides a site for primer formation and avoids chromosome shortening into the information-coding DNA.
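The repeat numbers quoted above translate directly into telomere lengths. The following is a minimal sketch with hypothetical function names: it converts a repeat count into base pairs and counts the intact copies of the repeat at the end of a short made-up sequence.

```python
REPEAT = "TTAGGG"  # human telomere repeat, as given in the text


def telomere_length_bp(repeat_count: int) -> int:
    """Length of a telomere built from repeat_count tandem copies."""
    return len(REPEAT) * repeat_count


def count_terminal_repeats(seq: str, unit: str = REPEAT) -> int:
    """Count intact copies of unit sitting back to back at the end of seq."""
    count = 0
    while seq.endswith(unit * (count + 1)):
        count += 1
    return count


print(telomere_length_bp(250), "to", telomere_length_bp(1500), "base pairs")

chromosome_end = "ACGT" + REPEAT * 3  # hypothetical, far shorter than a real telomere
print(count_terminal_repeats(chromosome_end))
```

With 250 to 1,500 copies of a six-base repeat, the telomere spans roughly 1,500 to 9,000 base pairs of non-coding DNA at each chromosome end.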

Figure 2-11. Telomeres at each end of a eukaryotic chromosome are made up of tandemly duplicated sequences and a short overhang. (Reprinted with permission from Brooker RJ: Genetics: Analysis & Principles, 3rd ed. New York: McGraw-Hill, 2008.)

Manipulation of DNA Replication: the Polymerase Chain Reaction

Many techniques of molecular biology depend on having pure samples of a specific piece of DNA. This is now routinely accomplished by a manipulation of the process of DNA replication, the polymerase chain reaction (PCR) (Figure 2-12). As we have just seen, replication requires a single-stranded DNA template, primers to provide the 3′-OH group to which a new nucleotide can attach, DNA polymerase, and the four deoxynucleoside triphosphates (dNTPs). The primers are commercially prepared nucleotide sequences, or oligonucleotides, typically about 18 to 22 bases long, that flank the region to be amplified. Primers are easily made to order for researchers interested in any particular region of the genome. During PCR, these components in an appropriately buffered solution are manipulated by repeated cycles of heating and cooling to amplify the targeted DNA region.
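What it means for two primers to “flank” a region can be sketched in code. In this hypothetical example the template, both 18-base primers, and the function names are invented for illustration. The forward primer matches the top strand directly, while the reverse primer anneals to the top strand, so it is the reverse complement of the primer that appears in the top-strand sequence. Only exact matches are considered.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Complementary strand written 5' to 3'."""
    return "".join(PAIR[base] for base in reversed(seq))


def amplicon(template: str, forward_primer: str, reverse_primer: str):
    """Return the region bounded by the two primers, or None if a primer has no exact match.

    template is the top strand, written 5' to 3'.
    """
    start = template.find(forward_primer)
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse_primer)
    end = template.find(reverse_site, start + len(forward_primer))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


# Hypothetical sequences, for illustration only.
forward_primer = "ATGCGTACCGGATTCAGT"
reverse_primer = "TACCGTTAGCCTGAAGTC"
target_interior = "CCCCAAAATTTTGGGG"
template = ("TTGACC" + forward_primer + target_interior
            + reverse_complement(reverse_primer) + "AGTC")

product = amplicon(template, forward_primer, reverse_primer)
print(product, len(product))
```

The amplified product includes both primer sequences, which is why the primers define the exact ends of the copied region.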

Figure 2-12. The polymerase chain reaction (PCR). (Reprinted with permission from Brooker RJ: Genetics: Analysis & Principles, 3rd ed. New York: McGraw-Hill, 2008.)

A small amount of genomic DNA is first heated to 94°C-95°C to separate the double helix into single strands. The reaction is then cooled to a temperature between 52°C and 58°C or slightly higher, which has been predetermined as optimal for hybridizing the two primers to the single-strand templates. After 30 seconds to a minute, the temperature is raised to 72°C, at which a heat-resistant DNA polymerase from Thermus aquaticus (Taq polymerase, isolated from a bacterium adapted to living in hot springs) extends the new strand by as much as 1000 bases or so. This cycle of heating and cooling is then repeated, often 25 to 35 times, yielding a large number of copies of the targeted DNA region. The use of heat-adapted Taq polymerase means that the enzyme is not destroyed each time the reaction is heated to 94°C-95°C to melt the DNA.
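The power of repeated cycling is easiest to appreciate numerically. The sketch below rests on an idealizing assumption that is not stated in the text: every template is copied exactly once per cycle, so the number of copies doubles each time. Real reactions fall short of this. The annealing temperature of 55°C is simply one value within the range given above, and the names are hypothetical.

```python
# One PCR cycle as (step, temperature in degrees C); 55 is an assumed
# annealing temperature within the 52-58 range quoted in the text.
CYCLE = [("denature", 94), ("anneal", 55), ("extend", 72)]


def ideal_copies(starting_templates: int, cycles: int) -> int:
    """Copies after the given number of cycles, assuming perfect doubling."""
    return starting_templates * 2 ** cycles


for step, temperature in CYCLE:
    print(f"{step}: {temperature} C")

for cycles in (25, 30, 35):
    print(f"{cycles} cycles: {ideal_copies(1, cycles):,} copies from one template")
```

Even under less than perfect efficiency, 25 to 35 cycles turn a handful of template molecules into many millions of copies.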

Transcription and RNA Processing

Transcription is the process of synthesizing a single-stranded RNA molecule from an active gene, literally “transcribing” a copy of the genetic message. The molecular signals that actually initiate the process will be part of our discussion of development and pattern formation in Chapter 3. Here we will focus upon the events that yield the initial RNA transcript and will introduce some of the processes that modify this transcript into one or more functionally-related mature mRNA molecules.

Transcription (Figure 2-13) can be divided conveniently into three phases: initiation of transcription, elongation of the RNA transcript, and termination. Recognition of the beginning of a gene involves the action of transcription factors, which are DNA-binding proteins that assist RNA polymerase to bind to a promoter, a specific sequence of nucleotides upstream from the beginning of the gene’s coding region (Figure 2-14). Other transcription factors can bind short nucleotide sequences near the promoter and either enhance or inhibit the rate of transcription. RNA polymerase begins synthesizing an RNA strand starting at the promoter, so each transcript has a stretch of nucleotides before coming to the ones that are eventually translated into the polypeptide.
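Recognizing a promoter is, at its simplest, a pattern-matching problem. The sketch below searches a hypothetical sequence for a bacterial-style promoter. It assumes the commonly cited consensus hexamers TTGACA (at −35) and TATAAT (at −10) separated by 16 to 18 bases; the −35 sequence and the spacing are standard textbook values rather than figures taken from this passage, and the function names are our own. One mismatch per hexamer is tolerated, since real promoters rarely match the consensus perfectly.

```python
MINUS_35 = "TTGACA"  # assumed -35 consensus
MINUS_10 = "TATAAT"  # -10 consensus (Pribnow box)


def mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def find_promoters(seq: str, max_mismatches: int = 1, spacers=(16, 17, 18)):
    """Return (start of -35 hexamer, start of -10 hexamer) for each candidate."""
    hits = []
    for i in range(len(seq) - 6 + 1):
        if mismatches(seq[i:i + 6], MINUS_35) > max_mismatches:
            continue
        for gap in spacers:
            j = i + 6 + gap
            box = seq[j:j + 6]
            if len(box) == 6 and mismatches(box, MINUS_10) <= max_mismatches:
                hits.append((i, j))
    return hits


# Hypothetical sequence: 3 bases, -35 hexamer, 17-base spacer, -10 hexamer, 7 bases.
seq = "GGC" + "TTGACA" + "ATCGGCTAGCTAGGCTA" + "TATAAT" + "GCAGCTA"
hits = find_promoters(seq)
print(hits)
```

In the cell this recognition is carried out by proteins reading the DNA, not by string comparison, so the code should be read only as an analogy for how sequence defines where transcription begins.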

Figure 2-13. Transcription has three stages: initiation, elongation of the RNA transcript, and termination. (Reprinted with permission from Brooker RJ: Genetics: Analysis & Principles, 3rd ed. New York: McGraw-Hill, 2008.)

Figure 2-14. The bacterial promoter region, identified by the consensus −35 and −10 sequences shown. The 5′-TATAAT-3′ sequence is sometimes called the Pribnow box. (Reprinted with permission from Brooker RJ: Genetics: Analysis & Principles, 3rd ed. New York: McGraw-Hill, 2008.)

The transcription factors and RNA polymerase bind to the DNA double helix and must open it to expose the single strand that will serve as the template for RNA synthesis. In bacteria, the RNA polymerase core enzyme is composed of five subunits, and a sixth protein, the sigma (σ) factor (Figure 2-15), completes the RNA polymerase holoenzyme and assists it to recognize the promoter sequence. Synthesis occurs as RNA polymerase moves along the template strand and catalyzes the insertion of an RNA nucleotide complementary to the template DNA sequence (Figure 2-16). A pairing similar to that in replication occurs, except that uracil (U) is attached in place of thymine, so when the template presents an A, the RNA transcript will add a U to the 3′ end of the growing chain (Figure 2-17).

Figure 2-15. The σ factor illustrates how proteins that facilitate transcription interact with the DNA double helix. DNA has a major and a minor groove. The σ factor protein contains two α-helices connected by a turn, an arrangement called a helix-turn-helix motif. The amino acids in the α-helices bond with nucleotide bases in the major groove. (Reprinted with permission from Brooker RJ: Genetics: Analysis & Principles, 3rd ed. New York: McGraw-Hill, 2008.)

Figure 2-16. The initiation of bacterial transcription. The σ factor helps RNA polymerase recognize the promoter region, and the open complex makes one DNA strand available as a template for RNA synthesis. (Reprinted with permission from Brooker RJ: Genetics: Analysis & Principles, 3rd ed. New York: McGraw-Hill, 2008.)

Figure 2-17. Key events in the synthesis of an RNA transcript. (Reprinted with permission from Brooker RJ: Genetics: Analysis & Principles, 3rd ed. New York: McGraw-Hill, 2008.)

Special cases are known, such as overlapping genes that share portions of the same DNA sequence and nested genes that use different parts of the same sequence in separate transcription cycles. But typically, transcription can be thought of as using only one of the two strands as a template. The template strand is defined by the nucleotide sequence of the promoter and regulatory gene regions. Along an extended stretch of DNA, one strand may be the template for gene #1 and be transcribed, say, to the right, while further along the other strand might be the template for gene #2 and be transcribed to the left. The key is the 5′ to 3′ nucleotide sequences of the transcription signals.

Nucleotides are added to the RNA transcript at the 3′-OH end, just as in DNA replication. The template strand is, therefore, being read in the 3′ to 5′ direction, since the RNA is antiparallel to the DNA template and is growing at the 3′ end. One consequence of this is that the unused complementary DNA strand, often called the coding strand or sense strand, has the same nucleotide sequence as the RNA being formed, except that RNA has uracil (U) wherever the DNA had thymine (T). For that reason, publications often present a gene by showing the nucleotide sequence of the sense strand rather than the template strand that is actually being used, making it easy to convert the information into a form in which amino acid content can be determined mentally.
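The relationship among the template strand, the coding strand, and the RNA transcript can be verified directly. In this minimal sketch the 12-base template and the function names are hypothetical. The template is written 3′ to 5′, the direction in which it is read, so that the RNA and the coding strand both come out in the conventional 5′ to 3′ direction.

```python
# Base added to the RNA for each base read on the DNA template.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3to5: str) -> str:
    """RNA transcript (5' to 3') made from a template written 3' to 5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3to5)


def coding_strand(template_3to5: str) -> str:
    """Coding (sense) strand, 5' to 3', paired with the given template."""
    return "".join(DNA_TO_DNA[base] for base in template_3to5)


template = "TACGGCATTAGC"  # hypothetical template strand, written 3' to 5'
rna = transcribe(template)
coding = coding_strand(template)

print("template 3'-", template, "-5'")
print("coding   5'-", coding, "-3'")
print("RNA      5'-", rna, "-3'")
print(rna == coding.replace("T", "U"))
```

The final comparison shows why published gene sequences usually give the coding strand: replacing each T with U yields the transcript directly.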

The overview we have presented of transcription in prokaryotes has direct parallels in eukaryotes, although there is greater variation among promoter sequences and a larger role for a range of regulatory elements. Transcription begins when RNA polymerase II, general transcription factors, and a mediator bind to a promoter sequence (Figure 2-18). With typically 12 subunits in RNA polymerase II, 5 general transcription factors, and a mediator with multiple subunits, this is a complex in every sense of the word (Figure 2-19). Another complication in eukaryotes is that the chromosome has its DNA wrapped around histone protein complexes, the nucleosomes. The chromatin must be remodeled to remove the nucleosomes before transcription can proceed. Chromatin remodeling and related issues will be discussed when we explore chromosome structure in more depth in Chapter 5.

Figure 2-18. Representative elements of the promoter for structural genes in eukaryotes, which are typically more complex and variable than in prokaryotes. For structural genes recognized by the eukaryotic RNA polymerase II, there are sites at which regulatory elements bind, a TATA box, and a start site for transcription. The start site is typically an adenine with a cytosine and two pyrimidines before it and five after it. The promoters for other RNA polymerases differ from this pattern. (Reprinted with permission from Brooker RJ: Genetics: Analysis & Principles, 3rd ed. New York: McGraw-Hill, 2008.)

Figure 2-19. The events that occur in producing the open complex for transcription in eukaryotes. (Reprinted with permission from Brooker RJ: Genetics: Analysis & Principles, 3rd ed. New York: McGraw-Hill, 2008.)

RNA Processing in the Nucleus

Ribonucleic acid is processed in various ways before the final molecule is ready for use by the cell. In some cases, for example, a long transcript is cleaved into smaller pieces, as in the production of ribosomal RNA (rRNA) or transfer RNA (tRNA) molecules from a region of a chromosome where their sequences are tandemly repeated many times.

As mentioned earlier, the coding portion of a typical eukaryotic gene is interrupted by intervening sequences that occur between those that will eventually define the content of an mRNA. The initial RNA transcript, sometimes called heterogeneous nuclear RNA (hnRNA), is processed to cleave out the non-coding intervening sequences, the introns, leaving behind the coding part of the mRNA molecule, the exons. Introns are removed by a spliceosome, composed of subunits called snRNPs, small nuclear ribonucleoproteins. These RNA plus protein complexes bind splice sites at the edges of an intron, cut the RNA, attach the adjacent exons, and remove the intron in the form of a lariat (Figure 2-20). Different cell types may not splice the introns in exactly the same way, causing alternative splicing that can yield slightly modified versions of the protein in different tissues.
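Splicing and alternative splicing can be illustrated as operations on a string. The sketch below is a simplified model: the exon and intron sequences are invented, the function name is hypothetical, and alternative splicing is represented only by skipping one exon, which is one of several ways it can occur. Exon positions are recorded as start and end coordinates within the initial transcript.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join the listed exon intervals (0-based, end-exclusive) in order."""
    return "".join(pre_mrna[start:end] for start, end in exons)


# Hypothetical initial transcript built from labelled segments.
segments = [
    ("exon", "AUGGCC"),
    ("intron", "GUAAGUCCCUAACAG"),
    ("exon", "UUCGAA"),
    ("intron", "GUGAGUAAACUGACAG"),
    ("exon", "GGAUAA"),
]
pre_mrna = "".join(seq for _, seq in segments)

# Work out where each exon sits within the initial transcript.
exons = []
position = 0
for kind, seq in segments:
    if kind == "exon":
        exons.append((position, position + len(seq)))
    position += len(seq)

full_length = splice(pre_mrna, exons)               # all three exons retained
exon_skipped = splice(pre_mrna, [exons[0], exons[2]])  # middle exon skipped

print(len(pre_mrna), "->", len(full_length), full_length)
print(len(pre_mrna), "->", len(exon_skipped), exon_skipped)
```

Two different mature mRNAs arise from the same initial transcript, which is the sense in which one gene can yield several related products.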

Figure 2-20. Removal of an intron by a spliceosome during RNA processing in the nucleus. (Reprinted with permission from Brooker RJ: Genetics: Analysis & Principles, 3rd ed. New York: McGraw-Hill, 2008.)

The ends of the mRNA are also modified in the nucleus by adding a cap of 7-methylguanosine to the 5′ end (Figure 2-21) and a string of adenine nucleotides as a poly-A tail to the 3′ end. The 5′ cap is recognized by cap-binding proteins that may be required for proper export of the mRNA from the nucleus, and the cap is recognized by initiation factors to begin translation at the ribosome. The poly-A tail is important for mRNA stability.

Figure 2-21. The 5′ end of the mRNA is capped with 7-methylguanosine. (Reprinted with permission from Brooker RJ: Genetics: Analysis & Principles, 3rd ed. New York: McGraw-Hill, 2008.)

Translation

Genetic information is translated from one molecular language to another, from nucleotides in DNA/RNA to amino acids in polypeptides, at the ribosomes in the cytoplasm. Like the process of transcription, translation can be viewed as a sequence of stages: initiation, elongation, and termination. The mRNA can be translated many times until it eventually breaks down and its nucleotides are recycled. The translator is a population of RNA molecules called transfer RNA (tRNA) that each carry one of the 20 naturally-occurring amino acids and cause their code-appropriate insertion into a growing polypeptide. The translation dictionary is the Genetic Code.

The 20 naturally-occurring amino acids have in common an amino (NH2) group attached to a carbon and then to a carboxyl (O=C-OH) group (Figure 2-22). The group attached to the middle carbon of this nitrogen-carbon-carbon backbone affects the molecular behavior of its part of the polypeptide. During translation, amino acids are linked together by adding the next one to the carboxyl-end of the chain. Thus, polypeptides have an important asymmetry. The “earliest” amino acid in a growing chain has a free amino group (thus, the N-terminal), and the newest amino acid has a free carboxyl group (the C-terminal). The groups attached to the central carbons influence the three-dimensional polypeptide shape and thus its function. For example, some side chains make the amino acid non-polar, so it is less likely to intermix with water. Such amino acids are hydrophobic, or “water fearing.” Regions of a polypeptide with non-polar amino acids tend to coil toward the inner part of the folded chain, away from water. Polar amino acids, on the other hand, readily interact with polar water molecules. They are hydrophilic, or “water loving,” and fold toward the outside in contact with the aqueous cytoplasm or intercellular fluid.

Figure 2-22. Proteins contain 20 different amino acids that have chemical characteristics that contribute to protein structure. (Reprinted with permission from Brooker RJ: Genetics: Analysis & Principles, 3rd ed. New York: McGraw-Hill, 2008.)

The information in the nucleotide sequence of mRNA is colinear with the order of amino acids in a polypeptide. In other words, there is a direct linear correspondence between the two molecular languages without gaps. During translation, nucleotides are read three at a time from a starting point near the 5′ end of the mRNA molecule. Such a triplet is called a codon. Each triplet is unambiguously associated with a specific amino acid carried by its corresponding tRNA molecule (Figure 2-23) to which it is attached by a covalent bond at the 3′ end. The appropriate amino acid is covalently attached to the correct tRNA molecule by an aminoacyl-tRNA synthetase (Figure 2-24). Each type of tRNA molecule also has a loop with three nucleotides forming the anticodon, which is complementary, and antiparallel, to the three nucleotides of the mRNA codon. In that way, the codon binds the complementary anticodon of a specific tRNA carrying its specified amino acid (Figure 2-25). In effect, the tRNA molecules are the translators at a ribosome. By keeping track of the A with T (or U) and G with C complementary pairing of DNA and RNA nucleotides and the antiparallel 5′-3′ orientation of each nucleotide strand, one can trace the colinear alignment of genetic information from the double helix to its resultant polypeptide. For example, the 3′ end of the DNA template strand corresponds to the 5′ end of the mRNA, which in turn corresponds with the N-terminal end of the polypeptide. Catalytic RNAs that are part of the ribosome structure link sequential amino acids into the growing polypeptide chain.
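The colinear relationships described above can be traced in a few lines of Python. This is only an illustrative sketch: the nine-nucleotide template strand is made up, and the function names are our own rather than part of any standard library.

```python
# Illustrative sketch: follow a short, hypothetical DNA template strand
# to its mRNA, its codons, and the matching tRNA anticodons.

# Base-pairing rules for transcription (DNA template -> RNA).
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}
# Base-pairing rules between two RNA strands (codon <-> anticodon).
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Return the mRNA (read 5'->3') copied from a template written 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5)


def codons(mrna: str) -> list[str]:
    """Split an mRNA into consecutive, non-overlapping triplets."""
    usable = len(mrna) - len(mrna) % 3
    return [mrna[i:i + 3] for i in range(0, usable, 3)]


def anticodon(codon: str) -> str:
    """Return the anticodon written 5'->3' (antiparallel to the codon)."""
    return "".join(RNA_PAIR[base] for base in reversed(codon))


template = "TACGAACTT"           # hypothetical template strand, 3'->5'
mrna = transcribe(template)      # AUGCUUGAA, read 5'->3'
for codon in codons(mrna):
    print(codon, "pairs with anticodon", anticodon(codon))
```

Because the template is written 3′ to 5′, its first base lines up with the 5′ end of the mRNA, and the first codon corresponds to the N-terminal amino acid, as stated in the text.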

Figure 2-23. A schematic diagram of a tRNA molecule, showing the acceptor stem that binds an amino acid and the anticodon that binds a codon on mRNA. (Reprinted with permission from Brooker RJ: Genetics: Analysis & Principles, 3rd ed. New York: McGraw-Hill, 2008.)

Figure 2-24. Aminoacyl-tRNA synthetase “charges” a tRNA molecule by catalyzing the attachment of the correct amino acid. (Reprinted with permission from Brooker RJ: Genetics: Analysis & Principles, 3rd ed. New York: McGraw-Hill, 2008.)

Figure 2-25. An overview of the relationships between the coding and template DNA strands, the mRNA, the tRNAs, and the polypeptide. (Reprinted with permission from Brooker RJ: Genetics: Analysis & Principles, 3rd ed. New York: McGraw-Hill, 2008.)

There are only four different nucleotides to account for 20 different amino acids in living cells. For that reason, a triplet code is the simplest possible translation vocabulary. If it were just a two-letter code, for example, there would be only 4² = 16 different combinations of the four nucleotides, yielding only 16 unique two-letter codons. In a three-letter code, there are 4³ = 64 different codons. At that level, however, there clearly must be some redundancy, called degeneracy, of the code, since there are more possible triplets than amino acids (Figure 2-26). Of the 64 possible triplets, three (UAA, UAG, and UGA) do not bind a tRNA molecule. Instead they are involved in terminating protein synthesis. There are, therefore, 61 sense codons, meaning that 61 triplets bind a tRNA anticodon and result in the addition of an amino acid during translation. Although it has a functional role in the process, the triplet AUG binds a Met-tRNA molecule and is, therefore, one of the sense codons.
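The counting argument in this paragraph can be checked directly by enumerating every possible codon. The short Python sketch below does so; the helper name all_codons is our own.

```python
from itertools import product

BASES = "UCAG"
STOP_CODONS = {"UAA", "UAG", "UGA"}


def all_codons(length: int) -> list[str]:
    """List every possible 'word' of the given length over the four bases."""
    return ["".join(letters) for letters in product(BASES, repeat=length)]


doublets = all_codons(2)   # a hypothetical two-letter code
triplets = all_codons(3)   # the actual three-letter code
sense = [codon for codon in triplets if codon not in STOP_CODONS]

print(len(doublets), len(triplets), len(sense))  # 16 64 61
```

Sixteen doublets cannot specify 20 amino acids, while 64 triplets leave room for the three stop signals and for the redundancy discussed next.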

Figure 2-26. The Genetic Code. (Reprinted with permission from Brooker RJ: Genetics: Analysis & Principles, 3rd ed. New York: McGraw-Hill, 2008.)

In some cases six codons yield the same amino acid (such as leucine or serine). In other cases, four (proline and alanine), two (histidine and glutamine), or only one (methionine) possible codon is found. An examination of the codons within the same amino acid group reveals that it is the third nucleotide, the one in the 3′ position of the mRNA triplet, that primarily accounts for degeneracy of the Genetic Code. This is called the wobble position and is explained in part by the tolerance of some mismatched pairing in that position and the incorporation of modified bases into the tRNA anticodon (Figure 2-27) that pair differently than the normal four bases. One consequence is that a cell does not need to produce 61 different types of tRNA to accommodate all of the sense codons, although new information about tRNA diversity in humans suggests it may be more extensive than previously thought.
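The degeneracy pattern can be tallied from the standard Genetic Code. The sketch below builds the code table from a compact 64-letter string of one-letter amino acid codes, a common convention in sequence databases, with "*" marking stop codons. The variable names are our own.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# One-letter amino acid codes for all 64 codons, in UCAG order
# (first base varies slowest); "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# How many codons specify each amino acid?
codons_per_aa = Counter(aa for aa in CODE.values() if aa != "*")
print(codons_per_aa["L"], codons_per_aa["P"], codons_per_aa["H"], codons_per_aa["M"])

# Count the two-base prefixes for which the third (wobble) base is irrelevant.
fully_degenerate = [
    first + second
    for first, second in product(BASES, repeat=2)
    if len({CODE[first + second + third] for third in BASES}) == 1
]
print(len(fully_degenerate), "of 16 prefixes ignore the third base")
```

In half of the 16 two-base prefixes, all four choices of the third base give the same amino acid, which is why the third position accounts for most of the redundancy.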

Figure 2-27. The wobble position is the third base of the 5′ to 3′ codon, which corresponds to the first base of the antiparallel anticodon. The tRNA can carry modified bases in addition to the normal A, U, G, and C. Examples are inosine (I), 5-methyl-2-thiouridine (xm5s2U), 5-methyl-2′-O-methyluridine (xm5Um), 2′-O-methyluridine (Um), 5-methyluridine (xm5U), 5-hydroxyuridine (xo5U), and lysidine (k2C). The bases in parentheses are not well-recognized by tRNA. (Reprinted with permission from Brooker RJ: Genetics: Analysis & Principles, 3rd ed. New York: McGraw-Hill, 2008.)

The Genetic Code is almost universal. Few differences among organisms have been found. Among these, however, are differences in the code used in mammalian mitochondria, a cell organelle with its own DNA. For example, the triplet AUA usually codes for isoleucine, but it codes for methionine in mammalian mitochondria, and UGA codes for tryptophan rather than being a stop codon. But in general the near-universality of the Genetic Code is evidence for the continuity of life and is a boon for the use of model organisms to decipher its puzzles.
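To see what these two differences mean in practice, the sketch below translates the same short message with the standard code and with a variant table. The variant applies only the two mammalian mitochondrial changes named above (other known differences are deliberately left out), and the 12-nucleotide message is hypothetical.

```python
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Variant table: only the two differences mentioned in the text.
MITO = {**STANDARD, "AUA": "M", "UGA": "W"}


def translate(mrna: str, code: dict[str, str]) -> str:
    """Read codons from the first base until a stop codon ('*') is reached."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = code[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


message = "AUGAUAUGAUAA"                 # hypothetical: AUG AUA UGA UAA
print(translate(message, STANDARD))      # MI  (UGA read as stop)
print(translate(message, MITO))          # MMW (AUA = Met, UGA = Trp)
```

The same nucleotide sequence yields a shorter and different peptide under the standard code, which is one reason a mitochondrial gene cannot simply be expressed from the nucleus without modification.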

A key to understanding the process of protein synthesis rests in the ribosome and its active regions. Ribosomes are complexes with a small and a large subunit, each containing one or more types of ribosomal RNA (rRNA) and a large number of proteins (Figure 2-28). Most of the mass and important catalytic activity is associated with the RNA component. There is only one kind of ribosome in bacterial cells, but in eukaryotes, the structure of the main ribosomes, those found in the cytoplasm, differs from those found in the mitochondria (and in the chloroplasts of plant cells). The sizes of the rRNA and the subunits are described in terms of their rate of sedimentation under centrifugation. Svedberg units (S) are named after the inventor of the ultracentrifuge. The cytoplasmic ribosomes in a eukaryote have a 40S small subunit composed of an 18S rRNA and 33 proteins plus a 60S large subunit composed of 5S, 5.8S, and 28S rRNAs with 49 proteins. These two subunits are assembled at initiation to produce an 80S ribosome (Svedberg units are not simply additive) with several active sites. Bacterial ribosomes have a 30S small subunit and 50S large subunit.

Figure 2-28. RNA and protein compositions of: (a) bacterial ribosomes; and (b) eukaryotic ribosomes. (Reprinted with permission from Brooker RJ: Genetics: Analysis & Principles, 3rd ed. New York: McGraw-Hill, 2008.)

An overview of events in protein synthesis is shown in Figure 2-29. The initiation of translation requires the assembly of a ribosome from its two subunits and binding with a molecule of mRNA and the initiator tRNA. Elongation of the polypeptide occurs when a codon is drawn into the aminoacyl site (A site) of the ribosome, a charged tRNA binds to it, and the amino acid that tRNA carries is covalently linked to the polypeptide carried by the tRNA in the peptidyl site (P site). The now-uncharged tRNA from the P site is released from the ribosome through the exit site (E site), and the ribosome brings the remaining tRNA into the P site by moving along the mRNA. This draws a new triplet into the A site. The process is repeated a few hundred times or more for an average size protein.

Figure 2-29. Summary of protein translation in bacteria: initiation, elongation, and termination. These stages are presented in more detail in Figures 2-30 to 2-32. (Reprinted with permission from Brooker RJ: Genetics: Analysis & Principles, 3rd ed. New York: McGraw-Hill, 2008.)

Termination occurs when a stop codon is brought into the A site. There the stop codon binds with a protein that acts as a release factor. Each type of tRNA molecule is recharged with its appropriate amino acid, and all components are reused until they break down.

With that overview of the process, we will now look at the events of initiation, elongation, and termination of protein synthesis in more detail. As before, these events will be described first for bacterial protein synthesis where they are perhaps best understood, and key features of eukaryotic protein synthesis will then be described.

The bacterial initiation complex (Figure 2-30) involves the small ribosomal subunit, the mRNA, three protein initiation factors, and the initiator tRNA that we will denote as fMet tRNA or tRNAfMet (you may encounter different abbreviations in other references). This tRNA is a special form that carries a methionine amino acid covalently bound to a formyl group that effectively blocks the amino end from forming bonds with another amino acid. It is only used at initiation and helps ensure the unidirectional growth of the polypeptide at the C-terminal.

Figure 2-30. Bacterial translation: initiation. In addition to the 30S ribosome subunit and the mRNA with its 9-nucleotide-long Shine-Dalgarno recognition sequence and AUG start codon, three initiation factors (IF) are required. The initiator tRNA carries the modified formyl-Methionine (f-Met) amino acid, and the initiation complex is complete when the 50S ribosome subunit becomes attached. (Reprinted with permission from Brooker RJ: Genetics: Analysis & Principles, 3rd ed. New York: McGraw-Hill, 2008.)

Bacterial mRNA contains a 9-nucleotide-long ribosomal binding site called the Shine-Dalgarno sequence. This facilitates binding of the mRNA to the small 30S ribosomal subunit, because it is complementary to a sequence of nucleotides in the rRNA molecule present there. Again we see that complementary base pairing is found throughout these processes. One of the initiation factors assists the binding of tRNAfMet to the start codon, which is usually AUG. After translation has been completed, the formyl group or the complete fMet may be removed from the protein, so methionine is not the first amino acid of every protein. Finally, initiation phases into elongation of the polypeptide when the large 50S ribosome subunit is attached to complete the protein synthesis workbench with its active A, P, and E sites.

Elongation begins when a charged tRNA, i.e., a tRNA molecule carrying its specified amino acid, binds to an mRNA codon at the A site of the ribosome (Figure 2-31). Accuracy of the codon-anticodon pairing is assisted by the decoding function associated with the 16S rRNA in the small subunit. If mispairing occurs, elongation is halted until the mispaired tRNA leaves the A site. An uncorrected error in elongation occurs only about once per 10,000 amino acids. This level of accuracy is especially impressive when we note that elongation of a polypeptide chain occurs at a rate of approximately 15 to 18 amino acids per second in bacteria and about 6 per second in eukaryotes.
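A quick calculation puts these rates and the error frequency in perspective. The sketch below uses the figures quoted in this paragraph; the 300-amino-acid protein length is an arbitrary illustration, and the calculation assumes that errors at different positions occur independently.

```python
def translation_time_seconds(length_aa: int, rate_aa_per_s: float) -> float:
    """Time to elongate a polypeptide of the given length at a constant rate."""
    return length_aa / rate_aa_per_s


def prob_error_free(length_aa: int, error_rate: float = 1e-4) -> float:
    """Chance that no position is misincorporated, assuming independent errors."""
    return (1 - error_rate) ** length_aa


length = 300  # hypothetical protein length, in amino acids

print("bacteria:  ", translation_time_seconds(length, 18), "to",
      translation_time_seconds(length, 15), "seconds")
print("eukaryotes:", translation_time_seconds(length, 6), "seconds")
print("error-free copies:", round(prob_error_free(length), 3))
```

Under these assumptions a protein of this size is finished in well under a minute, and roughly 97 out of every 100 copies contain no misincorporated amino acid.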

Figure 2-31. Bacterial translation: elongation. Catalytic activities and binding sites are associated with the large ribosome subunit, and elongation factors (EF) promote tRNA binding and translocation. (Reprinted with permission from Brooker RJ: Genetics: Analysis & Principles, 3rd ed. New York: McGraw-Hill, 2008.)

Next, the peptide attached to the tRNA in the P site is transferred to the tRNA at the A site, and a covalent peptide bond is formed with the new amino acid. This peptidyl transfer is catalyzed by a component of the 50S subunit called peptidyltransferase, composed of rRNA and several proteins. It is actually the 23S rRNA that catalyzes the dehydration reaction to synthesize the peptide bond, an example of the enzymatic capability of RNA. The ribosome then translocates three nucleotides in the 3′ direction. This does two things. It moves the two tRNA molecules into the E and P sites, respectively, and it brings a new codon into the now-empty A site. The uncharged tRNA in the E site is released from the ribosome, and a new charged tRNA binds the codon in the A site. This process is repeated until one of the stop codons enters the A site.

In most organisms, the stop codons are UAA, UAG, and UGA. Instead of binding with a tRNA, they bind with proteins called release factors (Figure 2-32). The bond between the completed polypeptide and the tRNA in the P site is broken (hydrolyzed). The polypeptide and uncharged tRNA are released from the ribosome, then the ribosome disassembles into its subunits and a free mRNA. Components are reused until they degrade.
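The cycle of A-site binding, peptide bond formation, translocation, and release can be summarized as a toy model. The sketch below is a deliberate simplification: it locates the start simply as the first AUG in the message rather than by a Shine-Dalgarno sequence, it ignores initiation, elongation, and release factors, and the 16-nucleotide mRNA is hypothetical.

```python
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(mrna: str) -> tuple[str, list[tuple[str, str, str]]]:
    """Return the peptide and a trace of (P-site codon, A-site codon, event)."""
    start = mrna.find("AUG")          # simplification: first AUG is the start
    if start == -1:
        return "", []
    peptide = [CODE["AUG"]]           # initiator tRNA sits in the P site
    p_site = "AUG"
    trace = []
    position = start + 3
    while position + 3 <= len(mrna):
        a_site = mrna[position:position + 3]
        if CODE[a_site] == "*":       # stop codon: release, no tRNA binds
            trace.append((p_site, a_site, "release"))
            break
        peptide.append(CODE[a_site])  # peptide transferred to A-site tRNA
        trace.append((p_site, a_site, "peptide bond"))
        p_site = a_site               # translocation: A-site tRNA moves to P
        position += 3                 # next codon enters the empty A site
    return "".join(peptide), trace


peptide, trace = translate("GGAUGGUGCACUAAGG")
print(peptide)
for step in trace:
    print(step)
```

Each pass through the loop corresponds to one elongation cycle, and the loop ends only when a stop codon occupies the A site, mirroring the sequence of events described above.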

Figure 2-32. Bacterial translation: termination. A stop codon is recognized by release factors (RF) that promote termination and the dissociation of components. (Reprinted with permission from Brooker RJ: Genetics: Analysis & Principles, 3rd ed. New York: McGraw-Hill, 2008.)

The translation process in eukaryotes parallels the bacterial events outlined here, but with some not-unexpected added complexity. Some translational proteins are conserved (i.e., found in both systems), but there are additional initiation factors in eukaryotes and only one release factor, compared to the three present in bacteria. Furthermore, eukaryotic mRNA does not have a Shine-Dalgarno sequence. Instead, several initiation factors bind to the mRNA, one of which (cap-binding protein I, CBPI) recognizes the 5′ cap of 7-methylguanosine added in the nucleus during mRNA processing. These initiation factors also unwind any secondary folding that might be present in the mRNA and assist binding to the ribosome’s 40S small subunit. Typically, translation uses the first AUG triplet in the 3′ direction as the start codon, although the sequence of flanking nucleotides plays an important role in scanning by the ribosome. One of the initiation factors helps to complete ribosome assembly by the addition of the 60S large subunit.

Factors Affecting Protein Shape and Function

The amino acid sequence of the gene product determines its function. But the relationship is not a straightforward one. The amino acid side chains have their own individual chemical characteristics. Individual amino acids or subregions of the polypeptide can also react with each other and with different domains in the cell, such as the aqueous cytosol or the non-polar membrane lipid bilayer. In addition, binding with other polypeptides or cofactors can influence shape and function. For simplicity, it is helpful to begin by recognizing four general levels at which protein structure can be described (Figure 2-33).

Figure 2-33. Levels of protein structure. (Reprinted with permission from Brooker RJ: Genetics: Analysis & Principles, 3rd ed. New York: McGraw-Hill, 2008.)

The primary (1°) level of protein structure is the amino acid sequence. This is presented in the literature by listing the amino acid abbreviations or their one-letter codes (Figure 2-22) in sequential order from the N-terminal to the C-terminal. The secondary (2°) level derives from the way hydrogen bonds can create repeating shapes within some localized portions of a protein. There are two forms of secondary structure. The α-helix is a right-handed coil of the nitrogen-carbon-carbon backbone, and the β-pleated sheet is formed between parallel regions. Both are stabilized by hydrogen bonds between amino- and carboxyl-groups of different amino acids. When protein shape is denatured by heat or increased acidity, it is often these comparatively weak hydrogen bonds that are broken, causing the protein to fall into a random coil.

As it is synthesized, the polypeptide folds into a tertiary (3°) structure in which the three-dimensional shape determines many of its functional characteristics in the cell. Often other proteins, called chaperones, help produce the proper shape. Some proteins are long and fibrous, such as collagen and elastin that affect the strength and flexibility of connective tissue or actin that participates in cell movement. Most proteins, however, are globular with one or more active sites that allow proteins to carry out a broad array of functions. Perhaps the most familiar class of globular proteins is enzymes that catalyze the biochemical events of metabolism. Globular proteins also include membrane receptors, ion channels, cell signaling molecules, protein hormones, and many other critical elements of a cell and its products.

Many proteins are also found to have an additional level of structural complexity, the quaternary (4°) level, in which two or more polypeptides are linked together to form a complex. These polypeptides are often produced by different genes, so several genes yield one active product. Hemoglobin, microtubules, microfilaments, connective tissue proteins, and many of the enzymatic complexes we described in DNA replication, transcription, and translation must be understood at this level. A change in any one of the component proteins can affect the function of the complex, so several different genetic mutations can have related consequences for the cell. Indeed, the way multimeric proteins are assembled can affect their function and be abnormal, even if each of the component subunits is normal.

Protein shape is not rigid. In fact, many proteins must be flexible in order to carry out their metabolic role. The motor molecule dynein “walks” up an adjacent tubule to cause movement of cilia and flagella. Movement of the end of myosin molecules in muscle cells is essential to contraction. Temporary shape changes can also have a regulatory influence. Allosteric proteins are those that go through reversible changes in shape by binding with another molecule.

Allosteric interactions with an activator or inhibitor molecule play important regulatory roles in biochemical pathways and in processes like facilitating or inhibiting the binding of RNA polymerase to initiate transcription in eukaryotes. We will explore this last example in more detail in Chapter 3.

From the genetic point of view, it is easy to understand the impact a mutation can have by causing an amino acid substitution. But there is a range of severity among mutations. Not all amino acid changes will alter a protein in a major way, and some amino acid substitutions are biochemically equivalent. Many other point mutations, however, substitute one amino acid with another that differs in chemical properties and affects protein shape in a major way. Thus, the consequences of a mutation can range from being phenotypically neutral, to being conditional on the biochemical environment, to being severe or even fatal.

Posttranslational Modification and Protein Sorting

We have seen that the process of information flow from DNA to phenotype has many points at which regulatory events can operate. In the previous section, some examples of regulation at the protein level were described. But we can formalize this idea by introducing the concept of posttranslational modification. Protein structure can be changed after translation in several ways. For example, a few amino acids can be removed from an end of the polypeptide, and that can change the protein’s activity. Some proteolytic enzymes of the digestive tract, such as trypsin, are initially synthesized and secreted in an inactive form (trypsinogen, in this case) so they do not damage the cells that make them. They are then activated when they reach their working location. Similarly, the inactive fibrinogen is activated into the fibrin monomer of a blood clot by platelet breakdown or another activating signal.

Several small polypeptides can be produced from a larger one by cleavage. Pituitary hormones provide an example. Depending on how it is cleaved, proopiomelanocortin (POMC) yields a total of five different hormones, including ACTH and β-endorphin. Posttranslational modifications can also include adding chemical groups, such as methyl groups (methylation), phosphate groups (phosphorylation), and carbohydrates (glycosylation). Phosphorylation, for example, is dependent on a class of enzymes called kinases and can either activate or inactivate a protein. In Chapter 5 we will see how cyclin-dependent kinases (CDKs) are involved in regulating the cell division cycle by phosphorylating proteins like those required for DNA replication and for chromosome condensation.

A related idea is directed transportation or sorting of proteins within the cell. Many protein sequences include signals that will sort proteins to particular targets, such as a specific membrane-bounded organelle (Figure 2-34), since each protein typically works in a restricted area of the cell. Proteins involved in ATP synthesis but encoded in the nuclear genome, for example, must be sorted to the mitochondria. Sometimes the polypeptide includes a sequence that is recognized by an RNA-protein complex called the signal recognition particle (SRP) that temporarily halts translation until the ribosome has been bound to the membrane of the endoplasmic reticulum (ER). The polypeptide is then synthesized into the inner lumen of the ER. Sorting signals are short amino acid sequences that are recognized by specific ultrastructural elements. The SRP signal is a group of about 20 primarily nonpolar amino acids near the amino terminal, while the mitochondrial-sorting signal is a short sequence that includes positively charged amino acids that fold into an alpha helix with the positive charges on the outside. Clearly, the flow of information in a cell is much broader and more dynamic than expressed by the Central Dogma, DNA ↔ RNA → polypeptide.

Figure 2-34. One means of regulation involves sorting of the protein into various cell regions. Posttranslational sorting occurs with proteins synthesized in the cytosol. They either remain in the cytosol or are sorted to the mitochondria, chloroplasts, peroxisomes, or nucleus. Cotranslational sorting involves the signal recognition particle (SRP) detecting a short amino acid sequence near the amino terminal. These proteins are sorted first to the endoplasmic reticulum and then to the Golgi complex, lysosomes, secretory vesicles, or the plasma membrane. Note that the diagram represents snapshot points in translation, not three ribosomes translating an mRNA at the same time. (Reprinted with permission from Brooker RJ: Genetics: Analysis & Principles, 3rd ed. New York: McGraw-Hill, 2008.)

Pleiotropy

In contrast to the Central Dogma, the flow of genetic information is often not linear. The gene and phenotype do not always show a one-to-one relationship. Pathways branch and merge. Sometimes one mutation can have several apparently unrelated phenotypic effects, a phenomenon called pleiotropy. Sickle cell anemia is a classic example (Figure 2-35).

Figure 2-35. (a) A comparison of the amino acid sequence between normal beta-globin and sickle cell beta-globin. (b) Abnormally shaped (sickled) red blood cells in sickle cell anemia. (a: Reprinted with permission from Brooker RJ: Genetics: Analysis & Principles, 3rd ed. New York: McGraw-Hill, 2008. b: CDC/Sickle Cell Foundation of Georgia: Jackie George, Beverly Sinclair.)

Hemoglobin is a multimeric protein composed of two α-globin and two β-globin polypeptides. The change from glutamic acid to valine at position 6 of β-globin causes the hemoglobin to crystallize in low oxygen conditions, such as during strenuous exercise or at high altitude. Deformed and rigid RBCs block capillaries and cause heart attacks and strokes. They also rupture and cause anemia that places high demands on RBC forming tissues in bone marrow, which can alter bone size and shape. Thus, one amino acid substitution has a range of superficially unrelated phenotypic consequences.
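At the nucleotide level, a substitution like this can arise from a single base change. The sketch below compares two one-base changes to a glutamic acid codon. The specific codons are our addition and are not given in the text above: GAG and GAA both code for glutamic acid and GUG codes for valine in the standard Genetic Code, and the sickle cell change is commonly described as GAG to GUG in the mRNA. Only the three codons needed here are included in the table.

```python
# Minimal lookup limited to the three codons used in this illustration.
CODON_TO_AA = {"GAG": "Glu", "GAA": "Glu", "GUG": "Val"}


def point_differences(before: str, after: str) -> int:
    """Count the positions at which two equal-length codons differ."""
    return sum(1 for x, y in zip(before, after) if x != y)


def describe(before: str, after: str) -> str:
    """Summarize a codon change and its effect on the encoded amino acid."""
    changes = point_differences(before, after)
    aa_before, aa_after = CODON_TO_AA[before], CODON_TO_AA[after]
    effect = "same amino acid" if aa_before == aa_after else "amino acid substitution"
    return f"{before}->{after}: {changes} base change, {aa_before}->{aa_after} ({effect})"


print(describe("GAG", "GUG"))  # change at the second position
print(describe("GAG", "GAA"))  # change at the third (wobble) position
```

Both changes alter exactly one base, yet only the second-position change replaces the amino acid, which illustrates why the phenotypic consequences of point mutations vary so widely.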

Understanding pleiotropic expression can uncover the central event of a genetic change. For example, what can small ears and kidney problems in a mouse have in common? First, let us consider the ear phenotype. What can account for having small ears? One possibility is a defect in cartilage production. If cartilage formation is retarded, what other body structures might be similarly affected through their cartilage? The nose and the cartilage disks between vertebrae are candidates. It might be hard to tell if a mouse’s nose is shorter than usual. But the vertebral disks can certainly have a noticeable effect on body length. Having normal-sized organs in a shortened abdomen can cramp structures like the ureters, leading to back pressure of urine into the kidneys and eventually to tissue damage. Hydronephrosis, the resulting swelling of the kidney that can progress to atrophy, is therefore a functional consequence of the same genetic defect that caused the ears to be small.

The opposite defect, abnormal proliferation of cartilage cells or their precursors in rats, was one of the first examples of pleiotropy studied in detail. Excess cartilage narrows the lumen of the trachea and causes ribs to be larger. For that reason, breathing is inhibited and there is chronic oxygen deficiency. Hemoglobin levels increase to compensate and the blood thickens. Higher resistance to pulmonary circulation contributes to hypertrophy of the right ventricle. The affected rats cannot suckle or sneeze. Death occurs soon after birth. This range of phenotypes, including death, can be traced to the action of a single gene.

In this example, death is a phenotype. In fact, recessive lethal mutations are the largest class of gene mutations. When a recessive lethal mutation also has developmental effects that can be detected in heterozygotes, it is showing pleiotropic expression. An example is the creeper mutation in chickens. The heterozygote displays skeletal defects of the legs, but the homozygote dies in early development. Thus, in this pleiotropic mutation, the leg malformation is dominant and lethality is recessive. In fact, given enough information about their developmental influences, most if not all genes are probably pleiotropic at some level.

Genotype × Environment Interactions

It would be a mistake to limit one’s evaluation of genetic influences to their direct products: the RNA and proteins. Since temperature can affect the rate of chemical reactions, it is hardly surprising that environmental conditions can affect the phenotype produced by a gene. Genotype × environment interactions are situations in which the genetic effects on a phenotype differ due to environmental factors like nutrition, climate, or presence of a specific chemical or drug. The field of medicine called pharmacogenetics is devoted to identifying situations in which a person’s genetically defined physiology puts them at risk for serious reactions to properly prescribed treatments.

An environmental factor can even have genetic consequences one or two generations later. An example is dietary intake by a mother and its potential influence on body weight of grandchildren. In a female fetus, primary oocytes are already present in developing ovaries by week 10 after conception. Meiosis, the cell division leading to gamete formation, begins in these primary oocytes but is temporarily arrested at an early stage until it is stimulated to continue years later after puberty. This timing has many important implications and will be discussed in more detail in Chapter 3. Here, the main point is that the cellular events in early development essentially collapse the physiological separation between a grandmother and a grandchild. Genetic and environmental factors affecting a pregnant female can influence the developing oocytes in the fetal ovaries of her daughter in utero.

Biochemical Pathways

As we have seen, genes do not produce their phenotypes in isolation. Proteins influence development by participating in networks of synthesis and degradation. An early insight into the role of genes was formalized in the “one gene, one enzyme” hypothesis by George Beadle and Edward Tatum in 1941, one of the first major insights establishing the field of molecular biology. But it soon became clear that this was an oversimplification, because some proteins, such as hemoglobin, are formed by combining two or more different polypeptides into one functional unit and, of course, not all proteins are enzymes. So the hypothesis was revised as “one gene, one polypeptide.” Current advances, like identifying numerous proteins produced from the same gene by alternative splicing and other processes, will continue to refine our appreciation of the complexity of gene effects. But historically this idea was useful in guiding early studies of the role that genes play in determining phenotypes.

Even in its earliest formulation, the role of a gene was understood as controlling sequential steps in a metabolic pathway. By following the inheritance patterns of a rare metabolic disorder, alkaptonuria, Archibald Garrod pioneered the study of inherited diseases of metabolism in 1902. A mutation in homogentisic acid oxidase causes a buildup of homogentisic acid, which oxidizes to a black color in urine. Its easy detection in diapers made it the first human metabolic disease clearly associated with a genetic mutation. Expanding the work to include metabolic studies of related compounds, Garrod developed an appreciation of genetic network relationships. A classic example is the metabolism of phenylalanine derived from dietary protein (Figure 2-36).

Figure 2-36. A representative metabolic pathway. Phenylalanine from the diet is broken down through tyrosine to maleylacetoacetic acid. Mutations at several points in the process lead to well-known human genetic diseases. (Reprinted with permission from Brooker RJ: Genetics: Analysis & Principles, 3rd ed. New York: McGraw-Hill, 2008.)

Phenylketonuria (PKU) is due to a mutation affecting the activity of the enzyme phenylalanine hydroxylase, blocking the conversion of phenylalanine to tyrosine. Such an enzymatic defect can potentially affect a phenotype in two ways: it can reduce the available pool of the product of the reaction, and it can cause excess buildup of the precursor acted upon by the normal enzyme. In this case, excess phenylalanine is diverted to phenylpyruvic acid, and the buildup of these compounds interferes with normal neurological development, leading to cognitive impairment.
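The two consequences of an enzymatic block, less product and more precursor, can be illustrated with a toy model of the phenylalanine pool. Everything numerical in the sketch below is an arbitrary assumption chosen for illustration (the intake per step, the fraction converted to tyrosine, and the fraction diverted to phenylpyruvic acid); it is not a physiological model.

```python
def simulate(to_tyrosine: float, to_phenylpyruvate: float = 0.05,
             intake: float = 1.0, steps: int = 200) -> dict[str, float]:
    """Track a phenylalanine pool with a main route and a minor side route.

    to_tyrosine: fraction of the pool converted to tyrosine per step
                 (stands in for phenylalanine hydroxylase activity).
    to_phenylpyruvate: fraction diverted to phenylpyruvic acid per step.
    All values are arbitrary, unitless illustration parameters.
    """
    phe = tyr = ppa = 0.0
    for _ in range(steps):
        phe += intake                         # dietary phenylalanine arrives
        made_tyr = phe * to_tyrosine          # main pathway
        made_ppa = phe * to_phenylpyruvate    # side pathway
        phe -= made_tyr + made_ppa
        tyr += made_tyr
        ppa += made_ppa
    return {"phenylalanine pool": phe, "tyrosine made": tyr,
            "phenylpyruvate made": ppa}


normal = simulate(to_tyrosine=0.5)    # hypothetical normal enzyme activity
pku = simulate(to_tyrosine=0.01)      # hypothetical severely reduced activity

for name in normal:
    print(f"{name}: normal {normal[name]:.1f}, reduced activity {pku[name]:.1f}")
```

With the main route nearly blocked, the precursor pool grows many-fold, less tyrosine is produced, and far more material leaves through the side route, matching the two effects described above.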

Now imagine a much larger database of protein relationships. Specialties like proteomics combine advanced computational tools with rapidly growing knowledge about the genome and biochemical makeup of humans and model organisms. The technical challenges of analyzing such immense data collections should not be underestimated, and they are spurring rapid progress in another fairly new field, bioinformatics. In contrast to the simple relationships seen in the pathway for phenylalanine metabolism, a more current view looks like the protein-interaction map shown in Figure 2-37.

Figure 2-37. (a) Diagram showing 911 high confidence (HC) interactions involving 401 proteins for disease proteins (orange), proteins with gene ontology (GO) annotation (light blue), and proteins without GO and disease annotation (yellow). Interactions that connect the nodes are color-coded to denote confidence scores: green for 3, blue for 4, red for 5, and purple for 6. (b) Proteins linked to one specific role, the Wnt signaling pathway. (From Stelzl et al., 2005, Cell 122: 957-968).
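A protein-interaction map of this kind is, computationally, a graph whose nodes are proteins and whose edges carry a confidence score. The following minimal sketch shows how such data might be filtered by confidence and ranked by connectivity. The protein names `P1` to `P5` and their interactions are hypothetical placeholders, not data from the published map; only the 3 to 6 scoring scale is taken from the figure legend.

```python
from collections import defaultdict

# Hypothetical interactions: (protein_a, protein_b, confidence score).
# Names are placeholders; scores follow the 3-6 scale in the figure legend.
interactions = [
    ("P1", "P2", 6),
    ("P1", "P3", 4),
    ("P1", "P4", 3),
    ("P2", "P3", 5),
    ("P4", "P5", 3),
]


def build_network(edges, min_confidence=3):
    """Return an adjacency map holding only edges at or above min_confidence."""
    network = defaultdict(set)
    for a, b, score in edges:
        if score >= min_confidence:
            network[a].add(b)
            network[b].add(a)
    return network


def degree_ranking(network):
    """List (protein, number of partners), most connected first, ties by name."""
    return sorted(
        ((protein, len(partners)) for protein, partners in network.items()),
        key=lambda item: (-item[1], item[0]),
    )


full = build_network(interactions)
strict = build_network(interactions, min_confidence=5)

print(degree_ranking(full))    # every interaction included
print(degree_ranking(strict))  # only the highest-confidence interactions
```

Raising the confidence threshold removes weakly supported edges, so the apparent "hub" of a network can change depending on how stringently the data are filtered.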

One theme of this chapter, and indeed of the whole book, is the way model organisms can help us understand the human genome and its role in physiology, development, and disease. The molecular interactions and mutation effects in model organisms like Drosophila are very similar to events in human development and our genetic diseases. For that reason, the proteins identified in model organisms suggest prime targets for developing potential drugs and new medical therapies. In a sense, then, the flow of genetic information extends beyond the organism and into medical applications.

Part 2: Medical Genetics

A question often asked in the medical school classroom is, “Why do I need to know (or review) this basic science material? I want to train to be a doctor.” A corollary question would be, “Why should the medical student know the basic mechanics of information flow at the molecular level as described in this chapter?” The answer lies in the progression of the understanding of the basis of human disease in terms of appreciating systems, and how they relate to the structure and function of an organism. In general this progression can be tied to periods of time in the advancement of medical knowledge that correspond to its “Golden Ages.” The history of acquiring medical knowledge parallels the “hot” subject of an era. In periods gone by, advances in medical information have reflected the discipline yielding the “cutting edge” of the day. Previous periods of emphasis have occurred in anatomy, physiology, and microbiology. The last 20 years have clearly ushered in the era of genetics. Most recently, the role of genetics in medicine has evolved into the further refined disciplines of genomics, leading to proteomics and eventually metabolomics discussed in later chapters.

One of the greatest challenges for the student in assimilating the ever-increasing knowledge base in genetics is how to organize the material into some sort of functional construct. What is needed is a way to group the information into tangibly related categories without losing important detail in the process. This can be seen in the realm of clinical classification with the two major types of diagnosticians: the “lumpers” and the “splitters.” As with any dichotomy in science, the best answer probably lies somewhere between the extremes.

The discipline of dysmorphology involves identifying specific physical features in a patient and then trying to group the findings into a recognizable pattern (Chapter 3). As dysmorphology emerged as its own discipline in the 1970s, painstaking emphasis was placed on identifying specific, often subtle, developmental features in an individual. The patterns of these features were then grouped into diagnoses that were designated as syndromes, associations, or sequences. A collection of these conditions was assembled into the seminal book on dysmorphology by the father of the field, Dr. David Smith (Recognizable Patterns of Human Malformation). Because so many such conditions have been described, there are now observations of persons with similar, but not identical, features. The diagnostic key then is to determine the critical features that, at a minimum, should be present to make a diagnosis.

For most of the more common conditions, specific diagnostic criteria have been determined by expert panels. Patients who have many of the features of a particular condition, but not enough to meet criteria, represent a diagnostic dilemma. Should their condition be classified as a milder expression of the primary disorder or as an unrelated but similar condition? Advances in molecular diagnostics have shown that the correct answer can be either. These diagnostics have also demonstrated that there can be a tremendous range of variability in expression from changes in the same gene, such that two clinically very different conditions may be linked by a common gene, i.e., be allelic disorders. Thus, the best approach appears to be the use of strong clinical characterization supported by molecular confirmation.

As these types of correlations emerge, apparently unrelated conditions may be grouped by any number of differing parameters. Depending on the reasons that necessitate grouping (or simply by personal preference), they may be grouped by clinical, biochemical, physiological, or molecular characteristics. At one end of the spectrum, diseases may be classified by how the condition actually affects the patient. In this way of thinking, diseases are linked by the total spectrum of the disease and the clinical presentation of the condition (Table 2-1). A particularly attractive way to pull apparently disparate scenarios together is by pathogenesis, i.e., the underlying mechanism of the disease. Thus, in light of the discussions in the first section of this chapter, specific conditions may be linked by identifying the point in the flow of information at which normal function is disrupted.

Table 2-1. Common Medical Presentations of Genetic Changes (Phenotype from Genotype)

Acquired/degenerative condition

Adverse reproductive outcomes

• Infertility

• Prenatal lethal condition

• Spontaneous miscarriage

Congenital anomaly

Endocrinopathy

Genetic susceptibility to an environmental agent

Inborn error of metabolism

Neoplasia/tumor formation

Neuromuscular dysfunction

Specific organ dysfunction

As these mechanisms have been discovered, the answers have not always been intuitive. For example, Hutchinson-Gilford syndrome is a premature aging condition. Patients with this condition usually start showing problems by 2 years of age. They have markedly decreased linear growth, hair growth, and subcutaneous fat. Features of premature aging include atherosclerosis, presbycusis (hearing loss associated with aging), and arthritic changes as early as 4-5 years old. This condition has now been shown to be due to abnormalities of the LMNA gene, which codes for a protein called lamin A, one major component of the nuclear membrane. Molecular tools were needed to make this association, as no clinician would have deduced from clinical observation alone that the features of premature aging are caused by a gene that codes for a protein in the nuclear membrane. Other examples of clinical disorders associated with disruption of a specific component of information flow are given in Table 2-2.

Table 2-2. Processes in Information Flow and Corresponding Disorders

images

Alternatively, genetic “families” may be defined based on the common gene or locus involved. A spectrum may thus be defined as a range of conditions that represent different levels of severity in the disruption of that particular gene’s function. Type II collagen is a structural connective tissue protein that lends strength to tissues such as bone and cartilage. Abnormalities in type II collagen lead to problems with the bones, joints, eyes, and other tissues. Molecular studies have now demonstrated that several clinically described conditions share mutations in type II collagen. These conditions range from skeletal disorders so severe that the child dies shortly after birth to less severe conditions like early-onset osteoarthritis (Table 2-3).

Table 2-3. Examples of “Genetic Families”

images

Similarly, conditions may be linked by a common signaling pathway: mutations in different genes produce similar clinical expression because those genes feed into the same system for transmitting molecular information. Neurofibromatosis is a neurocutaneous condition characterized by pigmentary changes in the skin (café au lait spots and abnormal freckling), tumors of the nerves, and various skeletal problems (Figure 2-38). Noonan syndrome is a multiple anomaly syndrome characterized by short stature, characteristic facies, pulmonic stenosis, and skeletal changes (Figure 2-39). Clinical geneticists familiar with both conditions had occasionally noted patients who had the typical changes of neurofibromatosis together with some of the features of Noonan syndrome (typical facies and pulmonic stenosis). These presentations were given names such as Watson syndrome and Noonan-neurofibromatosis. Molecular studies eventually showed that they were actually allelic to neurofibromatosis (neurofibromin gene on chromosome 17). More recent molecular studies have demonstrated that the link between neurofibromatosis and Noonan syndrome is that both are due to mutations in genes that contribute to a common signaling pathway, the RAS/MAPK system (Table 2-4). Most importantly, beyond its role in diagnosis, this type of understanding will have significant implications for potential therapies. For a more detailed discussion of pathogenesis and categorizing conditions, see Chapter 16.

Figure 2-38. (a) Adult male with neurofibromatosis. (b) Note the multiple cutaneous tumors (neurofibromas).

Figure 2-39. Two patients with Noonan syndrome. (a) Known mutation in SOS1 gene, and (b) Known mutation in KRAS gene.

Table 2-4. Examples of Disorders Linked By a Common Signaling Pathway

images
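The two grouping strategies described in this part, by shared gene ("genetic families") and by shared signaling pathway, amount to indexing the same set of records by different keys. The sketch below illustrates the idea using only the gene and pathway associations mentioned in the text; the record layout and the helper name `group_by` are assumptions made for illustration, and the list is not a complete catalog.

```python
from collections import defaultdict

# (disorder, gene, signaling pathway), limited to associations named in the text.
records = [
    ("Neurofibromatosis", "neurofibromin", "RAS/MAPK"),
    ("Watson syndrome", "neurofibromin", "RAS/MAPK"),
    ("Noonan syndrome", "SOS1", "RAS/MAPK"),
    ("Noonan syndrome", "KRAS", "RAS/MAPK"),
]


def group_by(rows, field):
    """Group disorder names by 'gene' or by 'pathway' (hypothetical helper)."""
    index = {"gene": 1, "pathway": 2}[field]
    groups = defaultdict(set)
    for row in rows:
        groups[row[index]].add(row[0])
    # Sort names so the output is stable and easy to read.
    return {key: sorted(names) for key, names in groups.items()}


by_gene = group_by(records, "gene")        # allelic disorders share a key
by_pathway = group_by(records, "pathway")  # different genes, one pathway

print(by_gene)
print(by_pathway)
```

Grouping by gene places neurofibromatosis and Watson syndrome together as allelic disorders, while grouping by pathway also brings in Noonan syndrome, whose causative genes differ but act in the same RAS/MAPK system.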

Part 3: Clinical Correlation

Increasing knowledge of the molecular basis of human medical conditions has identified clinical correlates of the disruption of each part of the information flow outlined in the first section of this chapter. Table 2-2 lists the major processes we have described and gives examples of the types of conditions that may occur when there is interference with that process.

Congenital Disorders of Glycosylation

Glycosylation is one of several known posttranslational modifications of biological molecules. The process is both intricate and specific, and glycosylation pathways are some of the most complex metabolic processes known. To date at least eleven glycosylation pathways have been identified. Glycosylation plays an important part in the completion of protein production. Two major types of protein glycosylation have been described. N-linked glycosylation involves the attachment of a glycan to the amide nitrogen of asparagine; O-linked glycosylation involves attachment to the hydroxyl oxygen of serine or threonine.

The first clinical disorder recognized as being due to a problem with protein glycosylation was described in 1980. Since then over 30 such conditions have been reported. These conditions were originally known as “carbohydrate deficient glycoprotein syndromes.” Current nomenclature now identifies them as congenital disorders of glycosylation (CDGs). The clinical spectrum of CDGs is widely variable. Patients may present as extremely ill neonates or as mildly affected adults. They often manifest as multi-system disorders. CDGs should be included in the differential diagnosis of symptoms as varied as problems with the nervous, ocular, skeletal, clotting, or immune systems. They may present with nonspecific features such as growth abnormalities or low muscle tone. CDGs should be considered in almost any patient with otherwise unexplained multi-system problems. Fortunately, a simple and relatively inexpensive screening test is available as a first-tier evaluation. This test, called “transferrin isoelectric focusing,” looks at the migration of the protein transferrin on an electrophoretic gel. In the case of abnormal glycosylation, the protein will migrate on the gel in a pattern different from normal (Figure 2-40).

Figure 2-40. Isoelectric focusing of the protein transferrin. (a) Normal pattern. (b) Isoelectric focusing of the protein transferrin from a patient with CDG 1a. Note the increases in disialotransferrin and asialotransferrin in the affected patient and the general shift in the pattern to the left. (Graphs courtesy of Dr. Tim Wood, Greenwood Genetics Center, Greenville, SC.)

Congenital disorder of glycosylation type 1a (CDG 1a) has also been called Jaeken syndrome. It is the most common form of CDG and was the first to be reported. It is known to be due to a deficiency of the enzyme phosphomannomutase 2. This deficiency results in glycoproteins with decreased levels of sialic acid. Patients with CDG 1a typically present with neurologic symptoms including cognitive deficits, central (supranuclear) hypotonia, decreased stretch reflexes, and truncal ataxia. Other features that may be present include cardiomyopathy, enlarged fibrotic liver, and problems with the kidneys. Endocrine, immune, and clotting abnormalities may also be involved. There may be a number of important clues seen on physical exam that may alert the clinician to the possible diagnosis. Such features include dysmorphic facial features (Figure 2-41), inverted nipples, abnormal distribution of subcutaneous fat, and an “orange peel” appearance to areas of the skin.

Figure 2-41. Patient with a congenital disorder of glycosylation (CDG). This patient has CDG type 1a (Jaeken syndrome) due to phosphomannomutase-2 enzyme deficiency.

Board-Format Practice Questions

1. Different types of RNA can play a role in which of the following processes:

A. DNA replication.

B. Enzymatic/catalytic functions.

C. Secondary messengers for cell membrane receptors.

D. Endocytosis.

E. Synaptic communication.

2. Which is the best example of pleiotropy?

A. One patient with neurofibromatosis has only a few hyperpigmented spots. A second (unrelated) patient has multiple tumors including spinal tumors that cause extreme pain.

B. A patient who carries a mutation inherited from a parent who is affected with a medical condition due to the mutation shows no expression of the condition themselves.

C. A mutation in the SOS1 gene produces Noonan syndrome. Patients with Noonan syndrome have short stature, heart malformations, dysmorphic facies, and learning difficulties.

D. A mutation in one part of a gene produces one clinical problem. A mutation in another part of the gene produces a completely different problem.

E. A specific gene change causes no clinical problem.

3. It is estimated that humans only have 22,000 functioning genes. Much simpler organisms have many more functioning genes. A major reason that the more complex human development can occur with fewer genes is:

A. the presence of multiple pseudogenes per each copy of a ‘real’ gene.

B. posttranslational modification of produced proteins.

C. gene amplification.

D. single splicing options.

E. epimerases.

4. Mutations in a gene known as PTEN can produce many different clinical conditions. These would include Cowden disease (a familial cancer syndrome), autism with macrocephaly, and Bannayan-Riley-Ruvalcaba syndrome (a multiple anomaly syndrome with mental retardation and dysmorphic features). These conditions could be grouped together as:

A. pleiotropic conditions.

B. genetically linked conditions.

C. co-dominant conditions.

D. a genetic family.

E. contiguous gene disorders.


