The basal transcriptional machinery mediates gene transcription
Protein-coding genes are transcribed by an enzyme called RNA polymerase II (Pol II), which catalyzes the synthesis of RNA that is complementary in sequence to a DNA template. Pol II is a large protein (molecular mass of 600 kDa) comprising 10 to 12 subunits. Although Pol II catalyzes mRNA synthesis, by itself it is incapable of binding to DNA and initiating transcription at specific sites. The recruitment of Pol II and the initiation of transcription require an assembly of proteins called general transcription factors. Six general transcription factors are known (TFIIA, TFIIB, TFIID, TFIIE, TFIIF, and TFIIH), each of which contains multiple subunits. N4-2 These general transcription factors are essential for the transcription of all protein-coding genes, which distinguishes them from the transcription factors discussed below that are involved in the transcription of specific genes. Together with Pol II, the general transcription factors constitute the basal transcriptional machinery, which is also known as the RNA polymerase holoenzyme or preinitiation complex because its assembly is required before transcription can begin. The basal transcriptional machinery assembles at a region of DNA that is immediately upstream from the gene and includes the transcription initiation site. This region is called the gene promoter (Fig. 4-6).
FIGURE 4-6 Promoter and DNA regulatory elements. The basal transcriptional machinery assembles on the promoter. Transcriptional activators bind to enhancers, and repressors bind to negative regulatory elements.
Sequential Assembly of General Transcription Factors
Contributed by Peter Igarashi
In vitro, the general transcription factors and Pol II assemble in a stepwise, ordered fashion on DNA. The first protein that binds to DNA is TFIID, which induces a bend in the DNA and forms a platform for the assembly of the remaining factors. Once TFIID binds to DNA, the other components of the basal transcriptional machinery assemble spontaneously by protein-protein interactions. The next general transcription factor that binds is TFIIA, which stabilizes the interaction of TFIID with DNA. Assembly of TFIIA is followed by assembly of TFIIB, which interacts with TFIID and also binds DNA. TFIIB then recruits a preassembled complex of Pol II and TFIIF. Entry of the Pol II–TFIIF complex into the basal transcriptional machinery is followed by binding of TFIIE and TFIIH. TFIIF and TFIIH may assist in the transition from basal transcriptional machinery to an elongation complex, which may involve unwinding of the DNA that is mediated by the helicase activity of TFIIH. Although this stepwise assembly of Pol II and general transcription factors occurs in vitro, the situation in vivo may be different. In vivo, Pol II has been observed in a multiprotein complex containing general transcription factors and other proteins. This preformed complex may be recruited to DNA to initiate transcription.
EFIGURE 4-2 The sequential assembly of general transcription factors and RNA polymerase II (Pol II) results in the formation of the basal transcriptional machinery.
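The in vitro assembly order described above can be summarized as an ordered list. The following Python fragment is only an illustrative sketch: the names `ASSEMBLY_ORDER` and `next_component` are hypothetical, and placing TFIIE before TFIIH is an assumption based on the order in which the text mentions them. It does not model the preformed complex that may be recruited in vivo.

```python
from typing import List, Optional

# Stepwise in vitro order as described in the text. Pol II enters as a
# preassembled complex with TFIIF. The relative order of TFIIE and TFIIH
# is an assumption taken from the order of mention.
ASSEMBLY_ORDER = ["TFIID", "TFIIA", "TFIIB", "Pol II-TFIIF", "TFIIE", "TFIIH"]


def next_component(bound: List[str]) -> Optional[str]:
    """Return the next component expected to join, or None when assembly is complete.

    Raises ValueError if `bound` is not a prefix of the stepwise order.
    """
    if bound != ASSEMBLY_ORDER[:len(bound)]:
        raise ValueError("components are not bound in the stepwise in vitro order")
    if len(bound) == len(ASSEMBLY_ORDER):
        return None
    return ASSEMBLY_ORDER[len(bound)]


# TFIID binds first; after TFIID, TFIIA, and TFIIB, the Pol II-TFIIF complex is recruited.
first = next_component([])
after_tfiib = next_component(["TFIID", "TFIIA", "TFIIB"])
```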
The promoter determines the initiation site and direction of transcription
The promoter is a cis-acting regulatory element that is required for expression of the gene. In addition to locating the site for initiation of transcription, the promoter also determines the direction of transcription. Perhaps somewhat surprisingly, no unique sequence defines the gene promoter. Instead, the promoter consists of modules of simple sequences (DNA elements). N4-3 A common DNA element in many promoters is the Goldberg-Hogness TATA box. The TATA box has the consensus sequence 5′-GNGTATA(A/T)A(A/T)-3′, where N is any nucleotide. The TATA box is usually located ~30 bp upstream (5′) from the site of transcription initiation. The general transcription factor TFIID—a component of the basal transcriptional machinery—recognizes the TATA box, which is thus believed to determine the site of transcription initiation. TFIID itself is composed of TATA-binding protein (TBP) and at least 10 TBP-associated factors (TAFs). The TBP subunit is a sequence-specific DNA-binding protein that binds to the TATA box. TAFs are involved in the activation of gene transcription (more on this below).
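The TATA box consensus given above can be written as a pattern and searched for in a DNA sequence. The sketch below is a minimal illustration in Python, not a promoter-prediction method: the function name `find_tata_boxes` and the example sequence are invented for this purpose, and only exact matches to the consensus on the given strand are reported. Real promoters often deviate from the consensus.

```python
import re
from typing import List, Tuple

# Consensus from the text: 5'-GNGTATA(A/T)A(A/T)-3', where N is any nucleotide.
TATA_CONSENSUS = re.compile(r"G[ACGT]GTATA[AT]A[AT]")


def find_tata_boxes(sequence: str) -> List[Tuple[int, str]]:
    """Return (0-based start, matched text) for each non-overlapping consensus match."""
    seq = sequence.upper()
    return [(m.start(), m.group()) for m in TATA_CONSENSUS.finditer(seq)]


# Hypothetical sequence, invented for illustration only.
example = "CCGAGCGTATAAATAGGCCTTACG"
hits = find_tata_boxes(example)
```

Because TFIID positions the machinery relative to the TATA box, a match at a given start position would suggest a transcription initiation site roughly 30 bp downstream of it.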
Binding of Specific Transcription Factors to Promoter Elements on DNA
Contributed by Peter Igarashi
EFIGURE 4-3 In this example, specific transcription factors bind to enhancer elements on the DNA and interact with the basal transcriptional machinery to increase the efficiency of gene transcription.
Some promoters do not contain a TATA box. Instead, these promoters contain other elements, such as the initiator (Inr), that bind general transcription factors. In addition to the TATA box and Inr, gene promoters contain other DNA elements that are necessary for initiating transcription. These elements consist of short DNA sequences and are sometimes called promoter-proximal sequences because they are located within ~100 bp upstream from the transcription initiation site. Promoter-proximal sequences are a type of regulatory element that is required for the transcription of specific genes. Well-characterized examples include the GC box (5′-GGGCGG-3′) and the CCAAT box (5′-CCAAT-3′), N4-4 as well as the CACCC box and octamer motif (5′-ATGCAAAT-3′). These DNA elements function as binding sites for additional proteins (transcription factors) that are necessary for initiating transcription of particular genes. The proteins that bind to these sites help recruit the basal transcriptional machinery to the promoter. Examples include the transcription factor NF-Y, which recognizes the CCAAT box, and Sp1 (stimulating protein 1), which recognizes the GC box. The CCAAT box is often located ~50 bp upstream from the TATA box, whereas multiple GC boxes are frequently found in TATA-less gene promoters. Some promoter-proximal sequences are present in genes that are active only in certain cell types. For example, the CACCC box found in gene promoters of β-globin (see pp. 80–81) is recognized by the erythroid-specific transcription factor EKLF (erythroid Krüppel-like factor).
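The promoter-proximal elements whose sequences are given above can be located in an upstream region by simple string matching. The following Python fragment is a sketch under stated assumptions: `PROXIMAL_ELEMENTS` and `scan_proximal_elements` are hypothetical names, the example sequence is invented, and only exact matches on the given strand are reported. The CACCC box is omitted because the text does not spell out its sequence.

```python
from typing import Dict, List

# Element sequences (5'->3') and binding factors as given in the text.
PROXIMAL_ELEMENTS = {
    "GC box": ("GGGCGG", "Sp1"),
    "CCAAT box": ("CCAAT", "NF-Y"),
    "octamer motif": ("ATGCAAAT", None),  # no binding factor is named in the text
}


def scan_proximal_elements(upstream: str) -> Dict[str, List[int]]:
    """Map each element name to the 0-based start positions of its exact matches."""
    seq = upstream.upper()
    found = {}
    for name, (motif, _factor) in PROXIMAL_ELEMENTS.items():
        positions = []
        start = seq.find(motif)
        while start != -1:
            positions.append(start)
            # Resume one base later so that overlapping matches are not missed.
            start = seq.find(motif, start + 1)
        found[name] = positions
    return found


# Hypothetical upstream sequence with two GC boxes and one CCAAT box.
example = "TTGGGCGGAACCAATCGGGGCGGTT"
elements = scan_proximal_elements(example)
```

A region with several GC box matches and no TATA box would resemble the TATA-less promoters described above.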
Typical Eukaryotic Gene Promoters
Contributed by Peter Igarashi
EFIGURE 4-4 Typical eukaryotic gene promoters. A promoter consists of modules of simple DNA sequences or “elements.”
Positive and negative regulatory elements modulate gene transcription
Although the promoter is the site where the basal transcriptional machinery binds and initiates transcription, the promoter alone is not generally sufficient to initiate transcription at a physiologically significant rate. High-level gene expression generally requires activation of the basal transcriptional machinery by specific transcription factors, which bind to additional regulatory elements located near the target gene. Two general types of regulatory elements are recognized. First, positive regulatory elements or enhancers represent DNA-binding sites for proteins that activate transcription; the proteins that bind to these DNA elements are called activators. Second, negative regulatory elements (NREs) or silencers are DNA-binding sites for proteins that inhibit transcription; the proteins that bind to these DNA elements are called repressors (see Fig. 4-6).
A general property of enhancers and silencers is that they consist of modules of relatively short sequences of DNA, generally 6 to 12 bp. Regulatory elements are generally located in the vicinity of the genes that they regulate. Typically, regulatory elements reside in the 5′ flanking region that is upstream from the promoter. However, enhancers and silencers may be located downstream from the transcription initiation site or a considerable distance from the gene promoter, many hundreds or thousands of base pairs away. Moreover, the distance between the enhancer or silencer and the promoter can often be varied experimentally without substantially affecting transcriptional activity. In addition, many regulatory elements work equally well if their orientation is inverted. Thus, in contrast to the gene promoter, enhancers and silencers exhibit position independence and orientation independence. Another property of regulatory elements is that they are active on heterologous promoters; that is, if enhancers and silencers from one gene are placed near a promoter for a different gene, they can stimulate or inhibit transcription of the second gene.
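Orientation independence has a simple sequence-level reading: an inverted element appears on the reference strand as its reverse complement, so a search for a regulatory element should accept either form. The Python sketch below illustrates this. The function names and the 8-bp element `AGGTCATT` are hypothetical and chosen only for illustration; they do not correspond to any element named in the text.

```python
from typing import List, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_element(seq: str, element: str) -> List[Tuple[int, str]]:
    """Return (0-based start, orientation) for matches in either orientation.

    A palindromic element is reported as "forward" only.
    """
    seq = seq.upper()
    element = element.upper()
    inverted = reverse_complement(element)
    hits = []
    for i in range(len(seq) - len(element) + 1):
        window = seq[i:i + len(element)]
        if window == element:
            hits.append((i, "forward"))
        elif window == inverted:
            hits.append((i, "inverted"))
    return hits


# Hypothetical element in its original and inverted orientations.
forward_hits = find_element("CCAGGTCATTGG", "AGGTCATT")
inverted_hits = find_element("CCAATGACCTGG", "AGGTCATT")
```

Position independence is reflected in the fact that the search places no constraint on where in the sequence a match may occur.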
After transcription factors (activators or repressors) bind to regulatory elements (enhancers or silencers), they may interact with the basal transcriptional machinery to alter gene transcription. How do transcription factors that bind to regulatory elements physically distant from the promoter interact with components of the basal transcriptional machinery? Regulatory elements may be located hundreds or even thousands of base pairs from the promoter. This distance is much too great to permit proteins that are bound at the regulatory element and at the promoter to come into contact along a linear strand of DNA. Rather, DNA looping explains these long-range effects: the transcription factor binds to the regulatory element, and the basal transcriptional machinery assembles on the gene promoter. Looping out of the intervening DNA permits physical interaction between the transcription factor and the basal transcriptional machinery, which subsequently leads to alterations in gene transcription.
Locus control regions and insulator elements influence transcription within multigene chromosomal domains
In addition to enhancers and silencers, which regulate the expression of individual genes, some cis-acting regulatory elements are involved in the regulation of chromosomal domains containing multiple genes.
The first of this type of element to be discovered was the locus control region (LCR), also called the locus-activating region or dominant control region. The LCR is a dominant, positive-acting cis element that regulates the expression of several genes within a chromosomal domain. LCRs were first identified at the β-globin gene locus, which encodes the β-type subunits of hemoglobin. Together with α-type subunits, these β-globin–like subunits form embryonic, fetal, and adult hemoglobin (see Box 29-1). The β-globin gene locus consists of a cluster of five genes (ε, γG, γA, δ, β) that are distributed over 90 kilobases (kb) on chromosome 11. N4-5 During ontogeny, the genes exhibit highly regulated patterns of expression in which they are transcribed only in certain tissues and only at precise developmental stages. Thus, embryonic globin (ε) is expressed in the yolk sac, fetal globins (γG, γA) are expressed in fetal liver, and adult globins (δ, β) are expressed in adult bone marrow. This tightly regulated expression pattern requires a regulatory region that is located far from the structural genes. This region, designated the LCR, extends from 6 to 18 kb upstream from the ε-globin gene. The LCR is essential for high-level expression of the β-globin–like genes within red blood cell precursors because the promoters and enhancers near the individual genes permit only low-level expression.
LCR for the β-Globin Gene Family
Contributed by Peter Igarashi
EFIGURE 4-5 Locus control region for the β-globin gene family. A, The β-globin gene LCR lies upstream from the genes that encode the ε-, γG-, γA-, δ-, and β-globin subunits. The five vertical arrows indicate sites at which the DNA is unusually sensitive to degradation by deoxyribonuclease (DNase). B, The LCR ensures that the genes are expressed in a temporally colinear manner, with ε and γ expressed early during development and δ and β expressed later.
The β-globin LCR contains five sites, each with an enhancer-like structure that consists of modules of simple sequence elements that are binding sites for the erythrocyte-specific transcription factors GATA-1 and NF-E2. It is believed that the LCRs perform two functions: one is to alter the chromatin structure of the β-globin gene locus so that it is more accessible to transcription factors, and the second is to serve as a powerful enhancer of transcription of the individual genes. In one model, temporally dependent expression of β-type globin genes is achieved by sequential interactions involving activator proteins that bind to the LCR and promoters of individual genes (Fig. 4-7).
FIGURE 4-7 Cis-acting elements that regulate gene transcription. This model shows a loop of chromatin that contains genes A, B, and C. The matrix-attachment region (MAR) is an insulator element on the DNA. Matrix-attachment regions attach to the chromosome scaffold and thus isolate this loop of chromatin from other chromosomal domains. Contained within this loop are several cis-acting elements (i.e., DNA sequences that regulate genes on the same piece of DNA), including promoters, enhancers, negative regulatory elements, and the LCR.
A potential problem associated with the existence of LCRs that can exert transcriptional effects over long distances is that the LCRs may interfere with the expression of nearby genes. One solution to this problem is provided by insulator elements, which function to isolate genes from neighboring regulatory elements. Insulator elements may represent sites of attachment of DNA to the chromosome scaffold, generating loops of physically separated DNA that may correspond to discrete functional domains. A transcription factor called CTCF (CCCTC-binding factor) binds to insulator elements and prevents interactions between regulatory elements and genes located on different sides of the insulator.
Figure 4-7 summarizes our understanding of the arrangement of cis-acting regulatory elements and their functions. Each gene has its own promoter where transcription is initiated. Enhancers are positively acting regulatory elements that may be located either near or distant from the transcription initiation site; silencers are regulatory elements that inhibit gene expression. A cluster of genes within a chromosomal domain may be under the control of an LCR. Finally, insulator elements functionally separate one chromosomal domain from another.
Abnormalities of Regulatory Elements in β-Thalassemias
The best-characterized mutations affecting DNA regulatory elements occur at the gene cluster encoding the β-globin–like chains of hemoglobin. Some of these mutations result in thalassemia, whereas others cause hereditary persistence of fetal hemoglobin. The β-thalassemias are a heterogeneous group of disorders characterized by anemia caused by a deficiency in production of the β chain of hemoglobin. The anemia can be mild and inconsequential or severe and life-threatening. The thalassemias were among the first diseases to be characterized at the molecular level. As noted on page 80, the β-globin gene locus consists of five β-globin–like genes that are exclusively expressed in hematopoietic cells and exhibit temporal colinearity. N4-5 As expected, many patients with β-thalassemia have mutations or deletions that affect the coding region of the β-globin gene. These patients presumably have thalassemia because the β-globin gene product is functionally abnormal or absent. In addition, some patients have a deficiency in β-globin as a result of inadequate levels of expression of the gene. Of particular interest are patients with the Hispanic and Dutch forms of β-thalassemia. These patients have deletions of portions of chromosome 11. However, the deletions do not extend to include the β-globin gene itself. Why, then, do these patients have β-globin deficiency? It turns out that the deletions involve the region 50 to 65 kb upstream from the β-globin gene, which contains the LCR. In these cases, deletion of the LCR results in failure of expression of the β-globin gene, even though the structural gene and its promoter are completely normal. These results underscore the essential role that the LCR plays in β-globin gene expression.