Robb E. Moses M.D.1
Jone E. Sampson M.D.2
1Professor and Chair, Department of Molecular and Medical Genetics, Oregon Health and Science University School of Medicine
2Assistant Professor, Department of Molecular and Medical Genetics, Oregon Health and Science University School of Medicine
The authors have no commercial relationships with manufacturers of products or providers of services discussed in this chapter.
The human genome consists of approximately three billion pairs of nucleotides (bases) that encode about 30,000 genes in a DNA duplex. The information in the protein-coding genes is converted to functional elements through the copying of base sequences to RNA. RNA may itself be active—ribonucleotide-containing proteins can interact with newly synthesized RNA to regulate structural changes—or it may be copied to protein. Whereas DNA is a relatively stable molecule, RNA is much less so. Reproductive germ cells have one copy of each gene; in somatic cells, the genome contains two copies of each gene, and the genome is partitioned during cell division. At all levels, the information content is protected by mechanisms safeguarding the stability of the genome. Information is transferred out of the genome along two paths: within the cell for defined functions specific to cell type (horizontal information transfer) and from one generation of cells to another through cell division, either for cell multiplication or for reproduction (vertical transfer).
The advances in the past century, from the verification of Mendel's observations1 to sequencing of the complete human genome, occurred in bursts of insight and technology. During the first half of the 20th century, the principle of inheritance by means of packets of genetic information—packets that were stable and that persisted independently of other units of inheritance—was verified in animals and plants, with the fruit fly Drosophila melanogaster being a notable organism of study. In the 1940s, DNA was unequivocally shown to be the chemical basis of the gene.2 Within a dozen years, the structure of DNA at the molecular level was proposed by Watson and Crick.3 This structural model had immediate implications for the copying of genes and the mode of transfer of information [see Figures 1 and 2].4 During the next 2 decades, the fundamental rules and mechanisms of these processes were determined; these advances relied heavily on the study of bacteria and their viruses, the phages. In the 1970s, three different technological advances catapulted genetics to the point at which the human DNA sequence could be determined. The first of these disparate techniques was the ability to determine DNA sequence information in a relatively simple and reproducible manner; the second was the ability to move and duplicate isolated segments of DNA; and the third was the evolution of computers with adequate power for storing and comparing large amounts of sequence information. Refinements in these basic advances, coupled with automation, led to the sequencing of the human genome by century's end.
Figure 1. In replication, the two strands of the parent DNA molecule (gray) separate as the base pairs detach. The daughter strands (blue) form when guanine (G) pairs with cytosine (C) and when adenine (A) pairs with thymine (T). The orientation of the two strands is anti-parallel, so the strands grow in opposite directions.
Figure 2. (a) Synthesis initiates with priming of a lagging strand. (b) Elongation proceeds from 5′ to 3′ on each strand. (c) The discontinuous fragment reaches the 5′ terminus of the lagging strand. (d) The replicative complex releases the lagging strand to form a new initiation complex.
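The pairing rules and anti-parallel orientation described in the Figure 1 caption can be sketched in a few lines of code. This is a minimal illustration, not a model of the replication machinery; the function name and the example sequence are invented for the purpose.

```python
# Watson-Crick pairing as stated in the Figure 1 caption: G with C, A with T.
PAIRING = {"A": "T", "T": "A", "G": "C", "C": "G"}


def daughter_strand(template_5_to_3: str) -> str:
    """Return the strand synthesized against a template, written 5' to 3'."""
    # Pair each base, reading the template backward because the two
    # strands of the duplex run in opposite (anti-parallel) directions.
    return "".join(PAIRING[base] for base in reversed(template_5_to_3.upper()))


template = "ATGGCC"  # hypothetical six-base template
copy = daughter_strand(template)
print(copy)
# Copying the copy regenerates the original template sequence.
print(daughter_strand(copy) == template)
```

Because each strand fully determines its partner, either parental strand can serve as the template for a complete daughter duplex.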
With progress in technology, genetics has become more applicable to clinical practice. The physician should be aware of patterns of inheritance suggestive of genetic disease in a family member; the physician should also be aware of diagnostic capabilities for inherited diseases and be able to interpret the results of genetic testing for the patient or refer the patient for counseling. Given the pace of advances and the complexity of techniques, it is important that clinicians have knowledge of a range of resources, including Internet sites, for technical and medical information and patient referral.
Diagnostic capabilities continue to improve. Prenatal diagnosis and neonatal screening are powerful tools for the prevention of disease. New diagnostics based on microarray analysis, the hope of gene therapy, and the tailoring of drug therapies to maximize responsiveness and sensitivity for individuals are feasible goals.
Genome Structure and Function
The DNA double helix encoding the genes is packaged into the chromosome, a structure recognizable by light microscopy. In addition to DNA, chromatin (i.e., the substance that forms chromosomes) contains RNA and various proteins.
The packaging of genes is very efficient. The DNA is in the form of supercoils, like a rubber band that is tightly wound until it compacts upon itself. It is then folded into the chromatin assembly by the binding of basic histone proteins. The resulting structure resembles beads on a string, with the DNA wound tightly around a core of eight histone proteins (two molecules each of H2A, H2B, H3, and H4) to form the nucleosome. Nucleosomes are spaced approximately 80 bases apart. The DNA structure is further condensed by the addition of other proteins.
There are 23 pairs of chromosomes in each somatic cell; one pair consists of the sex chromosomes, X and Y. In females, there are two X chromosomes; in males, an X and a Y chromosome. The remaining 22 pairs of chromosomes are termed autosomes. In the process of cell division, or mitosis, the chromosomes condense and are duplicated [see Figure 4], with a complete set going to each daughter cell. In producing germ cells for reproduction, the number of chromosomes is halved to the haploid number through the process of meiosis, leaving 22 autosomes plus either an X or a Y chromosome in spermatozoa and 22 autosomes plus an X chromosome in oocytes. There is a standard system of nomenclature for describing the chromosomes and recognizable alterations or rearrangements of the chromosomes. For example, the normal male karyotype is listed as 46,XY and the normal female as 46,XX.
Figure 3. Genes contain both coding and noncoding portions, termed exons and introns, respectively. (a) The β-globin gene contains three exons (orange) separated by two introns (green). The boundaries between exons and introns are known as splice junctions and contain specific nucleotide sequences that are required for proper joining of the exons. The synthesis of messenger RNA (mRNA) from the β-globin gene proceeds in a 5′ to 3′ direction. The enzyme RNA polymerase II (dark green) binds to a promoter region (light green) located 200 to 300 base pairs in the 5′ direction or located upstream of the point at which mRNA synthesis begins. (b) mRNA begins with a 7-methylguanosine residue, referred to as the CAP site, and includes a 5′ untranslated region (light purple), a coding region of exons and introns, and a 3′ untranslated region (light purple). Nearly all mRNAs that encode proteins terminate at their 3′ ends with a string of approximately 200 adenine residues [known as the poly (A) tail], which are added 18 to 20 base pairs downstream from an AAUAAA signal in the 3′ untranslated region. (c) After mRNA is synthesized but before it leaves the nucleus, the introns are excised and the exons are spliced together to form mature mRNA (d). (e) Once the mature mRNA reaches the cytoplasm, it attaches to ribosomes and is translated into protein.
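The splicing step in panels (c) and (d) of Figure 3 can be sketched as removing intron intervals and joining the remaining exons. The transcript below is a toy sequence invented for illustration (it is not the β-globin sequence), and the coordinates and function name are assumptions; the only feature borrowed from the caption is the layout of three exons separated by two introns.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join exon intervals (0-based, end-exclusive) in order, dropping introns."""
    return "".join(pre_mrna[start:end] for start, end in exons)


# Toy transcript, invented for illustration: uppercase = exon, lowercase = intron.
PRE_MRNA = "AUGGCC" + "guaaguag" + "UUUGAA" + "guccccag" + "CCCUAA"

# Three exons and two introns, mirroring the layout of the gene in Figure 3.
EXONS = [(0, 6), (14, 20), (28, 34)]

mature = splice(PRE_MRNA, EXONS)
print(mature)
print(len(PRE_MRNA), "bases before splicing,", len(mature), "after")
```

A real splicing model would locate the splice junctions from their sequence signals rather than take coordinates as input; the sketch only shows the bookkeeping.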
Each chromosome contains a region of repeated sequence DNA called the centromere. This is the portion that anchors the replicated duplexes (chromatids) together at the time of cell division. The centromere is typically not centrally located; this results in a long arm, termed the q arm, and a short arm, termed the p arm.
Integrity of the terminal ends of chromosomes is maintained by special DNA sequences known as telomeres. In many cell types, telomeres shorten slightly each time the cell divides; this may limit the potential number of cell divisions that can take place before the cell undergoes apoptosis, which has been hypothesized as a factor in longevity and the development of certain cancers. In some cell types (e.g., stem cells), telomere length is maintained through the activity of enzymes known as telomerases.
Genes are not distributed at an average frequency throughout the genome but are clustered in gene-rich regions. Moreover, there is significant variability among the chromosomes with respect to gene content per unit length.
The protein-coding genes [see Figure 3] constitute only perhaps 2% of the total genome. Another 15% to 20% of the genome consists of repetitive sequences, insertional elements, mobile elements, and remnants of viral sequences.5 Although the human genome contains many retroviruses, or transposable elements, most of these do not appear to be active; however, many can function as mobile genetic elements.
Figure 4. (a) Mitosis, or somatic cell division, leads to two identical daughter cells, each of which has the same number of chromosomes as the parent cell (i.e., the diploid number). (b) Meiosis, or sex cell division, produces four gametes, each of which has half the number of chromosomes of the parent cell (i.e., the haploid number). In meiosis, the chromatids form junctions known as chiasmata, and segments of chromatids cross over.
Even though they do not code for protein, these sequences and elements are of great significance. Short sequences, stably inherited, often influence the level of expression of genes; because level of expression can have important phenotypic consequences, these sequences form an important basis for interindividual variability. Repeat sequences can produce small duplications in the genome, giving rise to the same effect. Repeat sequences normally found in low numbers—so-called low copy repeat sequences—afford a rich basis for nonallelic homologous recombination, fostering deletions and duplications.
The term micro-RNA refers to small, noncoding RNA molecules that constitute perhaps 20% to 25% or more of the genome. These noncoding RNA molecules have been recognized as basic regulatory elements. In turn, micro-RNA is regulated by short interfering RNA (siRNA), as are some protein-encoding genes. Finally, the proteins produced by coding information can be modified by the addition of carbohydrate moieties, acetylation, phosphorylation, or ubiquitination. Thus, there is a rich variability in the end effect of the genome, far exceeding the number of coding genes.5
Techniques of Genetic Analysis
Chromosome identification and characterization was much improved by the development of staining or banding techniques. This process involves partial denaturation of the DNA and proteins, followed by staining. The resulting preparations allow identification of up to 800 bands by microscopy, permitting evaluation for structural changes. These techniques have led to the recognition of many rearrangements, leading to localization of genes.
Additional staining techniques based on binding to complementary short stretches of DNA tagged with fluorescent probes have been developed, allowing fluorescence in situ hybridization (FISH) [see Figure 5]. This technique permits identification of deletions down to 100 kb in length. Further development has led to chromogenic stains, which allow the identification of individual chromosomes and of certain regions of the chromosome (for example, the centromere or telomere).
Figure 5. Fluorescence in situ hybridization (FISH) was performed using the TUPLE1 probe (red) for the VCFS/DGS region at 22q11.2 along with an identifier probe (green) (Vysis, Inc., Downers Grove, Illinois). One homologue has both probe signals (red and green); however, the other homologue has only the identifier signal (green), which indicates that this homologue is deleted.
In the past 20 years, advances in technology, genetics, and bioinformatics have facilitated the analysis of hundreds to thousands of genes in parallel, thus providing a mechanism for profiling gene expression in a single individual with a given condition. These techniques include serial analysis of gene expression (SAGE), differential displays, and DNA microarrays. A microarray is a miniature high-density dot blot consisting of thousands of probes immobilized on a solid matrix. The arrays are built using a variety of DNA substrates, including oligonucleotides, complementary DNA (cDNA), or bacterial artificial chromosomes (BACs), for expression arrays. Each probe detects the presence of an RNA transcript in the sample, compared with control tissue. The two are co-hybridized on the solid support, and specifically designed software is used to determine the relative amounts of messenger RNA (mRNA) in the two samples for every gene built into the array. If DNA samples are used, then the array can detect DNA copy changes (including deletions, duplications, or amplifications) simultaneously at multiple loci represented on the array, in a process known as comparative genomic hybridization (CGH). Although array CGH is applicable to a number of clinical areas, the predominant focus of microarray-based research has been in clinical oncology, where the so-called genetic signature can potentially determine tumor type and provide information regarding prognosis, treatment, or both [see 12:II Molecular Genetics of Cancer]. Clinical utility of array-based genetic testing is dependent on targeting the microarrays for diagnostic and prognostic information as well as clinical management. Regardless of the specific format, the data generated from these arrays can be massive and technically difficult to manage and synthesize in the context of the biologic question at hand.
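The comparison of co-hybridized sample and control signals is commonly summarized as a log2 ratio per probe. The sketch below assumes that convention; the probe names and intensity values are hypothetical, and real analysis software also performs normalization and background correction, which are omitted here. For a DNA (array CGH) experiment against a two-copy reference, a single-copy deletion would ideally give a ratio of 1/2 and a single-copy gain a ratio of 3/2.

```python
import math


def log2_ratios(sample: dict[str, float], control: dict[str, float]) -> dict[str, float]:
    """Per-probe log2(sample / control) from co-hybridized signal intensities."""
    return {probe: math.log2(sample[probe] / control[probe]) for probe in sample}


# Hypothetical probe names and intensities, chosen only to illustrate the arithmetic.
control = {"probe_1": 1000.0, "probe_2": 1000.0, "probe_3": 1000.0}
sample = {"probe_1": 500.0, "probe_2": 1000.0, "probe_3": 1500.0}

for probe, ratio in log2_ratios(sample, control).items():
    # Negative values suggest loss relative to control; positive values suggest gain.
    print(f"{probe}: {ratio:+.2f}")
```

Working on a log scale makes halving and doubling symmetric around zero, which is one reason the convention is convenient when thousands of probes are compared at once.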
Array CGH also has many research applications, including cancer profiling, gene discovery, and understanding epigenetic modifications and chromatin conformation.
Although microarrays have been constructed for all or parts of the human genome, a whole-genome approach is not practical for clinical use because of the large number of normal polymorphisms present. Targeted arrays based on BACs are likely to be replaced with oligonucleotide chips in the near future, as analytical methods improve.6
Mutations in Clinical Conditions
Mutations are changes in information contained within DNA, which may result in gene products that function in an abnormal or deleterious manner. Different types of mutations lead to genetic disease. Mutations range from single base changes that alter the gene product to the addition or deletion of whole chromosomes. Intermediate structural rearrangements may involve segments that are large enough to be detected microscopically, or they may involve segments so small that they require detection by molecular labeling methods. Genetic diseases resulting from single gene mutations are inherited in classic Mendelian fashion, although there is the possibility of new mutations occurring in individuals with unaffected parents. Alternatively, changes other than primary nucleotide sequence may alter gene function. Understanding the mechanisms by which mutations occur in these disorders has led to an understanding of other factors that influence disease; such factors include the effect of imprinting on phenotype expression, the role of trinucleotide expansion in genetic diseases, and the role of mitochondrial DNA (mtDNA) mutations in disorders of energy metabolism.
The following discussions describe the roles of several different mechanisms of mutation in genetic conditions seen more commonly in the general population.
Disorders Caused by an Abnormal Number of Chromosomes
The ability to associate clinical disease with detectable changes in genetic material was established in patients who were identified as having an abnormal number of chromosomes, including trisomy 21 and abnormalities of the sex chromosomes. Before the development of techniques for identifying and separating individual chromosomes from cell preparations, these patients were described clinically on the basis of a shared constellation of congenital anomalies and dysmorphic features (i.e., a syndrome). For example, individuals with Down syndrome arising from trisomy 21 have characteristic facial features; they experience hypotonia in infancy, delayed development, and cognitive impairment, as well as a pattern of congenital malformations. With the ability to karyotype individuals, rarer abnormalities of whole chromosomes were detected in dysmorphic stillborn infants or in live-born infants who subsequently died early in infancy (for example, from trisomy 18 or 13 syndrome) and were detected in analyses of first-trimester abortuses, which in general have a 50% rate of chromosome abnormalities.
The gain or loss of an entire chromosome is generally the result of nondisjunction, the missegregation of chromosomes at the time of cell division (i.e., in meiosis or mitosis). This results in one daughter cell having two copies of a particular chromosome and the other daughter cell having no copy. Fertilization of a germ cell containing two copies of a chromosome results in a zygote trisomic for that chromosome. Three autosomal trisomy syndromes have been described in live-born infants: the syndromes associated with trisomies 13, 18, and 21. Occasionally, trisomies of other autosomes occur, but there is usually a normal cell line present as well (mosaicism). In most cases, mosaicism is the result of a postzygotic segregation error in mitosis that occurs early in embryogenesis. Extra chromosomal material is better tolerated than missing material; there are no viable autosomal monosomy syndromes. The lack of one of the sex chromosomes is deleterious, and the complete absence of an X chromosome is lethal. Having a single X chromosome without a Y chromosome (Turner syndrome, which is associated with karyotype 45,X) results in a high proportion of fetal wastage. Triploidy and tetraploidy result in abnormal embryogenesis, and a haploid conceptus has never been reported.7
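The copy-number arithmetic of nondisjunction followed by fertilization can be written out explicitly. This is a minimal sketch for a single chromosome; the function names are invented for illustration, and the model ignores mosaicism and every other chromosome.

```python
NORMAL_GAMETE = 1  # copies of a given chromosome in a normal haploid gamete


def nondisjunction_gametes() -> tuple[int, int]:
    """Copies of the missegregated chromosome in the two daughter cells."""
    return 2, 0


def zygote_status(gamete_a: int, gamete_b: int) -> str:
    """Name the zygote's copy number for one chromosome after fertilization."""
    copies = gamete_a + gamete_b
    names = {1: "monosomy", 2: "disomy", 3: "trisomy"}
    return names.get(copies, f"{copies} copies")


for gamete in nondisjunction_gametes():
    print(f"{gamete} + {NORMAL_GAMETE} -> {zygote_status(gamete, NORMAL_GAMETE)}")
```

The two outcomes are produced together by one missegregation event, yet only the trisomic outcome is compatible with live birth for autosomes, in line with the observation that extra material is better tolerated than missing material.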
The development of techniques for chromosome banding allowed the identification of individual chromosomes by banding pattern rather than merely by size. This enabled the detection of the addition, loss, or rearrangement of large groups of genes by means of changes in chromosome appearance; thus, translocations, inversions, duplications, isochromosomes, and ring or marker chromosomes were described. These changes may or may not have an effect on phenotype, depending on whether there is a net conservation of genetic material, but they can have profound effects on reproductive fitness, affecting the process of chromosome segregation in meiosis. In approximately 5% of couples with a history of three or more first-trimester losses, one of the partners will be found to have a chromosome abnormality; thus, karyotype analysis is indicated in such persons.8
Disorders of Partial Chromosome Deletion
The 22q11 deletion syndrome is a common microdeletion syndrome, occurring in one in 4,000 persons; it is unique in that most cases are de novo (i.e., not inherited from an affected individual). Persons with 22q11 deletion syndrome have variable clinical features, including (1) congenital heart disease (occurring in 74% of patients), particularly conotruncal malformations, such as tetralogy of Fallot, interrupted aortic arch, and truncus arteriosus; (2) palatal abnormalities (69%), notably velopharyngeal incompetence, submucosal cleft, and cleft palate; (3) characteristic facial features, including auricular abnormalities, hypoplastic alae nasi with a bulbous nasal tip, prominent nasal root, malar flatness, and hooded eyelids (> 50%); and (4) learning disabilities (70% to 90%).
Before the identification of the 22q11 microdeletion, patients were diagnosed on the basis of clinical features. The condition went under several names, including velocardiofacial syndrome, DiGeorge syndrome, Shprintzen syndrome, CATCH-22 (cardiac defects, abnormal facies, thymic hypoplasia, cleft palate, hypocalcemia), Cayler syndrome, and conotruncal anomaly face syndrome. Velocardiofacial syndrome was originally described as the combination of velopharyngeal incompetence, congenital heart disease, characteristic facial features, and developmental delay. DiGeorge syndrome, which includes the previously mentioned features as well as parathyroid deficiency and immune dysfunction from thymic aplasia or hypoplasia, was thought to be a developmental field defect of the third and fourth pharyngeal pouches.
In 1992, a microdeletion of chromosome 22 at band q11.2 was first reported; this was subsequently confirmed in other cases. In approximately 15% of cases, a deletion can be seen microscopically. In individuals with submicroscopic 22q11 deletions, the diagnosis is made by FISH with DNA probes from the DiGeorge chromosome region [see Figure 5]. Fewer than 5% of patients with clinical features of deletion 22q11 have normal results on cytogenetic studies and negative results on FISH testing.9
The typical deletion encompasses three million base pairs, with smaller deletions of several hundred thousand base pairs reported. However, there is no correlation between the size of the deletion and the expression of the syndrome. It is still unknown whether the syndrome involves contiguous genes or is the result of a single gene deletion that is variably expressed in affected individuals. The broadness of the phenotype and the unification of the various above-mentioned syndromes under the umbrella of 22q11 deletion have created some confusion. All cases of velocardiofacial syndrome, DiGeorge syndrome, Cayler syndrome, and conotruncal anomaly face syndrome that are associated with a deletion at 22q11 represent the same disorder.
The high occurrence of de novo deletions of 22q11 suggests some instability in this region. Because the overwhelming majority of patients have the same deletion in the 3 Mb (megabase) region, this area has been sequenced and carefully examined. This region contains four copies of duplicated sequence or low copy repeats, located nearer to the end points of the region. Each low copy repeat contains one or more duplicated modules, which contain duplicated markers. The presence of these low copy repeats at this typically deleted region suggests that occasionally these areas misalign; during cell division and homologous recombination, this leads to duplication of the region on one chromatid and deletion on the other. The presence of these low copy repeats, therefore, gives us some insight into the mechanisms responsible for the recurrence of this common de novo deletion involving chromosome 22.10 Repetitive sequences contribute to the inherent instability of some chromosome regions.
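The misalignment mechanism described above can be sketched by treating a chromatid as an ordered list of segments and exchanging arms between two different copies of a low copy repeat. The segment labels below are placeholders invented for illustration, not actual loci in 22q11, and the function name is an assumption.

```python
def unequal_crossover(chromatid: list[str], first_lcr: int, second_lcr: int):
    """Products of a crossover between misaligned repeats at two indices.

    Both chromatids start with the same segment order; the repeat at
    first_lcr on one pairs with the repeat at second_lcr on the other.
    """
    # Deletion product: left arm through the first repeat, then everything
    # that lies beyond the second repeat.
    deletion = chromatid[: first_lcr + 1] + chromatid[second_lcr + 1 :]
    # Duplication product: left arm through the second repeat, then everything
    # that lies beyond the first repeat, so the intervening region appears twice.
    duplication = chromatid[: second_lcr + 1] + chromatid[first_lcr + 1 :]
    return deletion, duplication


# Placeholder segments: "LCR" marks a low copy repeat; letters are flanking regions.
chromatid = ["A", "LCR", "B", "C", "LCR", "D"]
deleted, duplicated = unequal_crossover(chromatid, first_lcr=1, second_lcr=4)
print(deleted)
print(duplicated)
```

The two products together contain exactly the material of the two starting chromatids, which is why each such event yields a reciprocal deletion and duplication.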
Chromosome analysis has always been pivotal in the evaluation of individuals with congenital anomalies, developmental delay, and mental retardation. Microarrays designed for clinical diagnosis of medically significant and relatively common chromosome microdeletions have become commercially available. A number of clinical syndromes result from microdeletions (less than 5 Mb) or microduplications that are not always apparent on a karyotype. Abnormalities in the telomere regions are also suspected to account for the phenotype in some of these individuals; these abnormalities are not revealed in routine chromosome analysis. The advantage of the targeted microarray technology is that numerous chromosomes can be screened in one procedure.11
Disorders of Single Gene Mutations
Anemia, which is a common clinical problem, is an excellent example of a condition that has many causes, both genetic and environmental. As monogenic disorders, the hemoglobinopathies are varied and complex. Approximately 7% of the world's population are carriers of different inherited disorders of hemoglobin, including structural hemoglobin variants and the thalassemias, which are disorders resulting from defective synthesis of the globin chains. Hemoglobin is a tetramer of two pairs of dissimilar globin chains, commonly α-globin and β-globin chains in hemoglobin A (HbA) or α-globin and δ-globin chains in HbA2. Healthy adults can have a residual amount of HbF (fetal hemoglobin, composed of two α-globin and two γ-globin chains), which is produced during fetal life and then replaced by adult hemoglobin in the first year of life. Since the discovery of a single point mutation that leads to the amino acid substitution of valine for glutamic acid, resulting in sickle cell anemia, over 700 structural hemoglobin variants have been identified, the most common of which are sickle hemoglobin (HbS), HbC, and HbE. The thalassemias are generally classified on the basis of the particular globin chain or chains that are inefficiently synthesized in affected individuals.12
β-Thalassemia results from defective β-globin synthesis, which leads to an excess of α-globin chains. Over 200 mutations in β-globin genes have been identified in patients with β-thalassemia. The majority of these are point mutations, such as single-base substitutions or the loss of one or two bases, that disrupt gene function at the transcriptional, translational, or posttranslational level and decrease synthesis of the β-globin chain. Clinically, one would expect the severity of the condition to correlate with the amount of β-globin chain produced, with homozygotes or compound heterozygotes being profoundly anemic and requiring lifelong blood transfusions and heterozygotes having a milder or silent condition. However, sibship studies have demonstrated phenotypic diversity in family members with the same genotype. This diversity may be a reflection of the inheritance of mutations in other loci involved in globin synthesis, because mutations for thalassemias and structural hemoglobinopathies occur together at a higher frequency in many populations. Combinations of structural hemoglobinopathies may positively alter the phenotype of thalassemia and reduce the concurrence of α- and β-thalassemia mutations in an individual; the occurrence of this process can vary from individual to individual in families. In addition, there are several mutations in the β-globin gene cluster and in the promoter region of the γ-globin genes that result in the persistence of fetal hemoglobin; such persistence produces a milder phenotype overlaying either a structural hemoglobinopathy or a β-thalassemia. Finally, mutations or polymorphisms in genes involved in bilirubin, iron, and bone metabolism may play a role in the clinical course of the disease in affected individuals.13
Gregor Mendel reported that the outcomes of reciprocal crosses were independent of the parental origin of a trait. In the late 1980s, however, researchers discovered that the two parental genomes are not equivalent in mammals. In the mouse zygote, the two pronuclei are distinct from one another and can be individually removed from the cell, and a zygote containing two female pronuclei or two male pronuclei can be created (a phenomenon known as uniparental disomy). Early embryonic development in such zygotes is abnormal. Purely female-derived embryos have poorly developed extraembryonic tissues, and purely male-derived embryos demonstrate abnormal embryo development. This phenomenon occurs sporadically in human conception, when a sperm fertilizes an egg without a pronucleus. This causes a doubling of the sperm chromosomes. The resulting diploid conceptus is a hydatidiform mole, which is a mass of extraembryonic membranes without an embryo. In contrast, ovarian dermoid cysts are derived from the spontaneous division of an oocyte; this results in the duplication of the maternal genome.
This phenomenon, whereby progeny phenotypes differ according to whether the genetic material is maternal or paternal in origin, is called genomic imprinting. This represents an extreme situation wherein the genetic material is derived entirely from one parent. In studying this phenomenon, investigators focused on the chromosomal regions responsible for the genomic imprinting effects observed in mouse embryos. Certain regions of distinct chromosomes were found to produce markedly different phenotypes, depending on whether the two copies were inherited from one parent, resulting in duplication or deficiency of one parental complement. An imprinted allele is one whose expression is changed or silenced as it passes through a particular sex. An allele is paternally imprinted if it is not expressed when it is inherited from the father. It is maternally imprinted if it is not expressed when it is inherited through the mother. Imprinted regions have been identified in both mouse and human chromosomes; alterations in normal imprinting patterns are associated with disorders of growth and development, cell proliferation, and behavior.
During gamete formation in mammals, some genes are altered by the methylation of certain cytosine groups in DNA. This process tends to prevent access by transcription machinery to that region of the chromosome, thus resulting in the “silencing” of that gene or genes. Whatever the process, the imprint would have to be erased during embryogenesis so that an individual could reimprint its genes according to its own sex during gametogenesis. Demethylation in embryonic cells occurs in the early cleavage divisions. Shortly after implantation, the embryonic somatic cells are methylated again, whereas the germ cells in the developing embryo are methylated later, as they develop in the gonads. An imprinting center on chromosome 15 may play a role in this process.14
Prader-Willi syndrome (PWS) and Angelman syndrome (AS) are two clinically distinct genetic diseases associated with deletions of the same region of chromosome 15. These syndromes are characterized by deficiencies in growth and sexual development, behavioral abnormalities, and mental retardation. Major diagnostic criteria for PWS include hypotonia; hyperphagia with resulting obesity; hypogonadism; and developmental delay. Patients with AS may have ataxia; sleep disorders; seizures; and hyperactivity with severe mental retardation. They may exhibit characteristic outbursts of inappropriate laughter.
Approximately 70% of patients with PWS and AS have a de novo 3 to 4 Mb deletion in the q11–q13 region of chromosome 15. Because this region is imprinted, the phenotypes that result from this deletion differ, depending on the allele upon which the deletion occurred. When the deletion occurs on the paternal chromosome, it results in PWS; when it occurs in the maternal copy, it results in AS. This suggests that the normal PWS gene is expressed from the paternal chromosome and that the normal AS gene is expressed from the maternal chromosome. Most of the remaining cases of PWS are the result of maternal uniparental disomy; paternal uniparental disomy accounts for only 4% of AS cases.15 In uniparental disomy, an individual inherits both copies of a chromosome from either the mother or the father through a nondisjunction error in meiosis. Again, lack of paternal 15q11–15q13 results in PWS; lack of maternal 15q11–15q13 results in AS. Imprinting defects have been implicated in some individuals with these syndromes.
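The parent-of-origin logic in this paragraph reduces to a simple rule: what matters is which parental contribution of 15q11–q13 is missing, not how it came to be missing. The sketch below encodes only that rule. The function name and scenario labels are invented for illustration, imprinting-center defects are not modeled, and the code is not a diagnostic tool.

```python
def classify_15q11_q13(paternal_copies: int, maternal_copies: int) -> str:
    """Map the number of intact parental copies of 15q11-q13 to the syndrome."""
    if paternal_copies == 0 and maternal_copies == 0:
        return "not described in the text"
    if paternal_copies == 0:
        return "Prader-Willi syndrome"  # lack of the paternal contribution
    if maternal_copies == 0:
        return "Angelman syndrome"  # lack of the maternal contribution
    return "neither (both parental contributions present)"


# (paternal copies, maternal copies) for each mechanism named in the paragraph.
SCENARIOS = {
    "deletion on paternal chromosome": (0, 1),
    "maternal uniparental disomy": (0, 2),
    "deletion on maternal chromosome": (1, 0),
    "paternal uniparental disomy": (2, 0),
    "biparental inheritance, no deletion": (1, 1),
}

for label, (paternal, maternal) in SCENARIOS.items():
    print(f"{label}: {classify_15q11_q13(paternal, maternal)}")
```

Note that maternal uniparental disomy leaves two copies of the region yet still produces PWS, which shows that total copy number alone does not predict the phenotype in an imprinted region.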
Defects in a region termed the imprinting center, located within 15q11–15q13, can change the DNA methylation and transcription activity of certain genes that reside in the region. Thus, if there is a mutation in the imprinting center, the process of activation or inactivation of the imprinted region may not occur. If the methylation pattern is characteristic of maternal inheritance only, the underlying molecular class of the mutation can be determined for counseling. Although the risk of recurrence of a deletion is low, a mutation of the imprinting center is associated with a recurrence risk of 50%.16
Trinucleotide Repeat Disorders
Several inherited disorders are known to have a worsening phenotype in each subsequent generation of family members affected by the disease.
Fragile X Syndrome, Huntington Disease, and Friedreich Ataxia
In the early 1990s, molecular geneticists discovered a new type of mutation, first in fragile X syndrome and then in a series of inherited neurologic disorders, including myotonic dystrophy, Huntington disease, and Friedreich ataxia. The mutation involves a repeat expansion of a DNA triplet, a trinucleotide repeat in a gene. The affected triplet may be in an exon (a portion of the gene that codes for protein synthesis) or an intron (a portion of the gene that is not expressed in the gene product). In patients with these conditions, the normal number of repeats is expanded (compared with unaffected individuals). The number of nucleotide repeats can increase in successive generations, causing disease symptoms to appear at an earlier age. The molecular basis of repeat instability is not well understood, but increased severity of the phenotype and earlier age of onset in successive generations (a phenomenon termed anticipation) are generally associated with larger repeat length. The parental origin of the disease allele can also influence expression; for most of these disorders, there is greater risk of repeat expansion with paternal transmission, although in fragile X syndrome and congenital myotonic dystrophy (see below), the maternally transmitted alleles are more prone to expansion, thereby causing more severe phenotypes. Most of the trinucleotide repeat disorders are inherited in an autosomal dominant or X-linked fashion, with the exception of Friedreich ataxia, which is an autosomal recessive disorder [see Table 1].17
Table 1 Trinucleotide Repeat Disorders
Myotonic dystrophy is a trinucleotide repeat disorder resulting in multisystem involvement of skeletal and smooth muscle, as well as involvement of the eye, heart, endocrine system, and central nervous system. The disorder represents a continuum of clinical findings; it has been classified for diagnostic purposes into three somewhat overlapping phenotypes: mild, classic, and congenital. Mild myotonic dystrophy is characterized by the development of cataracts in early adulthood and mild myotonia (difficulty relaxing the muscles after contraction). The symptoms may be so subtle that diagnosis is made retrospectively, after the birth of an affected offspring. Classic myotonic dystrophy is characterized by muscle weakness and wasting, myotonia, cataract formation, and cardiac conduction abnormalities that occur in adulthood. The life span of these patients may be somewhat shorter than normal.
Congenital myotonic dystrophy is a disease of the neonate characterized by generalized hypotonia, respiratory insufficiency requiring ventilatory support, and mental retardation if the infant survives to childhood. Individuals have characteristic facial features, which include drooping eyelids, facial weakness resulting in an open-mouthed appearance, and wasting of the muscles in the jaw and neck. The overall incidence of myotonic dystrophy is estimated to be one in 20,000 persons.
The diagnosis of myotonic dystrophy is confirmed by detection of an expansion of the cytosine-thymine-guanine (CTG) trinucleotide repeat that affects the noncoding regions of two adjacent genes (DMPK and SIX5) on chromosome 19q13. The trinucleotide repeat is located at the 3′ end of the gene (transcription occurs in the 5′ to 3′ direction), in a part of the gene that is transcribed but not translated into the final protein product. Unaffected individuals have a polymorphic repeat length of five to 37 CTG repeats; this repeat length is stable when passed from generation to generation. When the number of repeats exceeds 37, however, the expansion not only disrupts the function of the gene but also engenders further instability and larger expansions. This tendency accounts for the phenomenon of anticipation seen in families with this disorder, in which a mildly affected adult can give birth to a child with the congenital form of the disease. In rare cases, the region will contract, with the CTG repeat being smaller in an offspring. In affected individuals, further expansion can occur during somatic cell division, resulting in mosaicism from tissue to tissue.18
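The repeat threshold described above can be illustrated with a short sketch that counts the longest uninterrupted CTG run in a sequence string and compares it with the upper limit of 37 given in the text. The function names are hypothetical, and clinical repeat sizing is done by laboratory assay, not by string matching; this is only a way to make the rule concrete.

```python
import re

NORMAL_MAX_REPEATS = 37  # upper limit of the normal range stated in the text


def longest_ctg_run(sequence: str) -> int:
    """Return the number of CTG units in the longest uninterrupted CTG run."""
    runs = re.findall(r"(?:CTG)+", sequence.upper())
    return max((len(run) // 3 for run in runs), default=0)


def is_expanded(sequence: str) -> bool:
    """True when the longest CTG run exceeds the normal maximum."""
    return longest_ctg_run(sequence) > NORMAL_MAX_REPEATS
```

For example, a string containing `"CTG" * 40` would be reported as expanded, whereas one containing `"CTG" * 5` would not.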
The gene product of DMPK is a protein kinase. It is expressed in the different organs involved in the disease: skeletal muscle, the heart, the brain, and the testes. The function of the protein is unknown, and it is unclear how the expansion in the untranslated region leads to the phenotype. There is evidence, however, that the mutant DMPK transcripts accumulate abnormally in the nuclei and bind to RNA-binding proteins, thus disrupting RNA splicing and metabolism. A second form of myotonic dystrophy has been described. This form involves a chromosome 3 trinucleotide repeat, which also results in the accumulation of RNA in cells.19
This disorder presents a unique genetic counseling problem. Individuals who are mildly affected have a 50% chance of passing on the expanded allele to their offspring. However, because of the instability of the expanded region, it is impossible to accurately predict the severity of the condition in an affected child. There is a risk of having a child with the severe form of the disease—congenital myotonic dystrophy—only if the mutation is transmitted through the mother. Approximately 20% of the offspring of an affected mother who inherit the mutation manifest the severe form, depending on the size of the expansion in the mother. Although prenatal testing for the expansion is available, often the diagnosis of mild myotonic dystrophy in the mother is established only after the birth of an infant with the congenital form of the disease.
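The two counseling figures above combine by simple multiplication. The sketch below is an illustrative calculation, not a counseling tool: it assumes the 50% transmission risk and the approximate 20% figure from the text, and it ignores the dependence on maternal expansion size that the text notes. The constant and function names are hypothetical.

```python
P_TRANSMIT = 0.5  # chance an affected parent passes on the expanded allele
P_CONGENITAL_IF_INHERITED = 0.2  # approximate figure for maternal transmission


def congenital_risk_per_pregnancy(transmitting_parent: str) -> float:
    """Approximate chance that a child has the congenital form.

    transmitting_parent is a hypothetical label: "mother" or "father".
    Per the text, the congenital form arises only with maternal transmission.
    """
    if transmitting_parent != "mother":
        return 0.0
    return P_TRANSMIT * P_CONGENITAL_IF_INHERITED
```

Under these assumptions, an affected mother has roughly a 10% chance per pregnancy of having a child with congenital myotonic dystrophy.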
Mutations in mtDNA have been associated with a number of disorders with a unique inheritance pattern, termed maternal transmission [see Table 2]. In maternal transmission, a condition affects individuals in each generation, suggesting dominant inheritance. Males or females may be affected, but men never transmit the disorder to their offspring. Women pass the trait on to all of their children, although there is great variability in expression.
Table 2 Disorders Resulting from Mitochondrial Mutations
mtDNA is a circular, double-stranded structure without introns. It contains 16,569 base pairs that encode 13 known proteins required for oxidative phosphorylation. In addition, mtDNA contains the transfer RNA (tRNA) and ribosomal RNA (rRNA) involved in the translation of these proteins in the organelle.
The manner in which mitochondria are passed from one generation to the next accounts for the phenomenon of maternal transmission. At the time of fertilization, the sperm sheds its cytoplasm, and only the nuclear DNA enters the egg. Therefore, all mitochondria in the zygote are contributed by the egg cell. However, there are hundreds of copies of mtDNA in each cell. During cell division, each mtDNA replicates, but unlike nuclear DNA, the newly synthesized mitochondria segregate passively to the daughter cells. This random segregation can leave cells with varying mixtures of mutant and normal mtDNA, a state termed heteroplasmy, and results in unpredictability in phenotype from individual to individual.
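The effect of passive segregation can be made concrete with a toy simulation. The sketch below assumes a simplified model that is not taken from the text: every mtDNA copy replicates once, and a random half of the copies passes to one daughter cell. The function names and copy numbers are hypothetical.

```python
import random
from typing import Tuple


def segregate(mutant: int, normal: int, rng: random.Random) -> Tuple[int, int]:
    """Replicate every mtDNA copy, then pass a random half to one daughter cell."""
    pool = ["M"] * (2 * mutant) + ["N"] * (2 * normal)
    rng.shuffle(pool)
    daughter = pool[: len(pool) // 2]
    mutant_in_daughter = daughter.count("M")
    return mutant_in_daughter, len(daughter) - mutant_in_daughter


def mutant_fraction_after(divisions: int, mutant: int, normal: int, seed: int = 0) -> float:
    """Follow one cell lineage and return its final fraction of mutant mtDNA.

    Assumes the starting cell holds at least one mtDNA copy.
    """
    rng = random.Random(seed)
    for _ in range(divisions):
        mutant, normal = segregate(mutant, normal, rng)
    return mutant / (mutant + normal)
```

Running `mutant_fraction_after` with different seeds from the same starting mixture gives lineages whose mutant fractions drift apart, which is one way to picture why expression varies among individuals who inherit the same mutation.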
Examples of disorders associated with deletions of mtDNA are chronic progressive external ophthalmoplegia, Kearns-Sayre syndrome, and Pearson syndrome. Point mutations of mtDNA result in mitochondrial encephalomyopathy with lactic acidosis and strokelike episodes (MELAS), myoclonic epilepsy with ragged-red fibers (MERRF), and neuropathy, ataxia, and retinitis pigmentosa (NARP).20
Thus far, we have focused on genetic diseases with specific phenotypes associated with the presence of germline mutations that lead to expression of one or more abnormally functioning proteins. This type of mutation is present in all the cells of an individual from birth, although there may be a degree of mosaicism, depending on the stage of development at which the mutation occurs. Investigation into the control of cell growth has given new insight into genetic changes that occur in both germ cells and somatic cells and that can lead to malignancy. Mutations in three types of genes that regulate cell growth are involved in the development of cancer: tumor suppressor genes, proto-oncogenes, and DNA repair genes. Somatic mutations in these genes may result in unchecked proliferation or clonal expansion of a single cell with subsequent loss of cellular organization; somatic mutations may also confer the ability to metastasize. This process generally requires a number of mutations, because there is an elaborate backup system in place to prevent faulty cell proliferation. Although sporadic mutations arise in individual somatic cells and ultimately play a role in cancer development, the study of familial or inherited cancer syndromes has contributed to our understanding of the genetic changes responsible for the development of some of the more common cancers.
It is perhaps easiest to understand how a germline mutation in a tumor suppressor gene could lead to a predisposition to cancer. Such is the case in families with an inherited mutation in BRCA1 or BRCA2. These individuals have only one functional copy of the gene; a subsequent somatic mutation in the normal copy in a single cell can give rise to a population of cells that have no functional BRCA1 or BRCA2 gene and have therefore lost a tumor suppressor activity that limits cell proliferation. In individuals with a germline mutation in BRCA1 or BRCA2, the chance of developing breast cancer over one's lifetime is estimated to be 40% to 70%; the chance of developing ovarian cancer is 20% to 50%.21 In such patients, the cancer usually develops at an earlier age than is seen in the general population, and there can be multiple primary sites. Nevertheless, most breast cancer is sporadic, and the disease is common enough that family history may be misleading, particularly in a large family. There are algorithms for assessing a patient's risk of developing breast cancer, as well as the risk of carrying a germline mutation (e.g., http://qap.sdsu.edu/screening/breastcancer/bda/pdf/Algos_all_2005.pdf; the National Cancer Institute's breast cancer risk calculator is available online at http://www.cancer.gov/bcrisktool/). Such risk-assessment algorithms are based on personal health history and family history of breast cancer, ovarian cancer, or both. With regard to family history, important factors include the age of onset in affected individuals and whether there was more than one primary site. Verification of the family member's medical records is imperative.22
Proto-oncogenes are recessively acting genes that regulate the cell cycle. Mutant dominant genes, called oncogenes, are usually gain-of-function mutations; the altered products of such mutations cause uncontrolled cell proliferation. Oncogenes were discovered by transformation experiments in tissue culture. In these experiments, normal cells were made into malignant cells by the insertion of a mutant piece of DNA (the oncogene). A number of proto-oncogenes have been located in the human genome, and mutations in them have been implicated in the development of leukemias, lymphomas, breast and ovarian carcinomas, and cancer of the colon, thyroid, lung, and pancreas. Many oncogenes are caused by chromosomal changes that result from breakage and translocations occurring as cell proliferation becomes more disorganized. The so-called Philadelphia chromosome seen in chronic myelogenous leukemia is a translocation between chromosomes 9 and 22. The breakpoint in chromosome 9 occurs in the cellular proto-oncogene ABL, which normally codes for a tyrosine kinase that binds to DNA. The breakpoint in chromosome 22 is in a gene called BCR, or breakpoint cluster region, which codes for a serine kinase. The fused BCR-ABL gene in the Philadelphia chromosome makes a novel protein, which leads to unregulated proliferation of hematopoietic stem cells and chronic myelogenous leukemia.23
Although most cases of colorectal cancer are sporadic, there are two somewhat common forms of colorectal cancer that are inherited in an autosomal dominant fashion. Taken together, familial adenomatous polyposis (FAP) and hereditary nonpolyposis colorectal cancer account for about 10% of cases of colorectal cancer. FAP is caused by a germline mutation in a tumor suppressor gene, the APC gene. Loss of function of the second APC allele leads to adenoma formation and progression to cancer through the accumulation of other somatic mutations.
Supporting the concept that accumulated genetic changes underlie the development of neoplasia, the defect in hereditary nonpolyposis colorectal cancer (HNPCC) was found to be in DNA mismatch repair genes, active in maintaining genome stability. In the course of normal cell division, DNA replication is subject to error, although the fidelity of DNA polymerase (an enzyme that assists in replication) is quite good. The mismatch repair genes, of which at least six are known (MSH2, MSH6, MLH1, MLH3, PMS1, and PMS2), function as a complex to recognize the deformation in the double helix and to then recruit enzymes to correct the error. Without these proteins, errors are propagated in successive generations of cells. Individuals with a germline mutation in the mismatch repair system may undergo loss of function in the second mismatch repair allele, resulting in the characteristic microsatellite instability (MSI). Microsatellites are repeating DNA sequences of unknown function that are found throughout the genome. These repetitive sequences are more prone to errors in replication. Loss of mismatch repair mechanisms permits expansion of these repeats, as may be demonstrated in tumor specimens. The presence of MSI indicates an increased likelihood of HNPCC, although MSI is seen in 15% of sporadic colorectal cancers. Families with HNPCC have an increased incidence of other types of cancers, including endometrial, ovarian, upper gastrointestinal tract, renal pelvis, and brain cancers.24
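The comparison of tumor and normal tissue described above can be sketched as a simple check of repeat counts at a set of microsatellite markers. This is an illustration only: the marker names and repeat counts are invented, the function name is hypothetical, and no clinical threshold for calling MSI is implied.

```python
from typing import Dict, List


def unstable_markers(normal: Dict[str, int], tumor: Dict[str, int]) -> List[str]:
    """Return markers whose repeat count in tumor differs from normal tissue.

    Both arguments map a marker name to its repeat count; only markers
    present in both samples are compared.
    """
    return sorted(
        name
        for name, count in normal.items()
        if name in tumor and tumor[name] != count
    )


# Invented example data for two samples from the same (hypothetical) patient.
normal_tissue = {"marker_A": 21, "marker_B": 18, "marker_C": 25}
tumor_tissue = {"marker_A": 24, "marker_B": 18, "marker_C": 29}
shifted = unstable_markers(normal_tissue, tumor_tissue)
```

Here `shifted` lists the two markers whose repeat length changed in the tumor, which is the pattern the text describes when mismatch repair has been lost.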
As cancer evolves, the tumor's genome undergoes many changes—including point mutations, rearrangements, deletions, and amplifications—that enable the cells to escape the normal cell cycle controls. Profiling these changes with microarray analysis may provide precise diagnostic criteria, as well as a basis for recommending therapy.25
Human Genome Project
The Human Genome Project was a massive effort undertaken to determine the sequence of the entire human genome. Sequencing of the genome was essentially completed by 2000, and it is already beginning to have important effects on medical research and practice.
Only about 2%, at most, of the genome codes directly for information. About 24% is intronic sequence (i.e., noncoding sections of genes); the remaining 75% is intergenic. The function of this intronic and intergenic material has not been fully elucidated.
Another difficulty is that scientists do not understand the rules of so-called genomic punctuation. With genes embedded in small exons occupying less than 1% of the genome, recognition of the coding regions is difficult. Identification of genes is based on the assumption that genes are expressed and usually converted to mRNA. Because mRNA has a poly-A 3′ tail, it is possible to capture portions of messages. These snippets are termed expressed sequence tags; they were used to identify the signposts for genes in the genome project. However, identification of the information content or function for the remaining portion of the genome remains an open question.
Knowledge of human genetics and its application to clinical medicine is constantly evolving. Geneticists have progressed from inferring inheritance modes by pedigree analysis and from inferring risk to future offspring by probability calculations to molecular testing based on the identification of mutations in a gene or genes involved in a specific disorder. The human genome has been sequenced. Despite these advances, the genetic bases of the remaining single gene disorders, the genetic component of multifactorial inheritance conditions, and the function of noncoding DNA remain to be deciphered. This task, aided by improvements and advances in molecular technology, falls to scientists in the 21st century.
Future Applications of Genetics to Medicine
Scanning the Genome for Risk of Disease
Technology holds the promise of detailed analysis of the genomes of individuals with attention to particular areas. By combining computer-chip design with DNA hybridization techniques, arrays of DNA sequences containing many thousands of specified sequence variations can be made.17 This will allow searching for disease-specific mutations or associated polymorphisms in a person. However, because it appears that most common diseases have a genetic component but manifest on the basis of other factors (multifactorial disease), so-called array analysis offers a new tool. The human genome contains single-base variations, termed single-nucleotide polymorphisms (SNPs), that occur at a rate of about one per 1,000 bases; there are close to three million SNPs in the human genome. Of these, perhaps 1% are in exons and can be used to identify disease risk by linkage. As associations with multifactorial disease are made, scanning for markers linked to risk will allow determination of the apparent risk of multifactorial disease in an individual patient, even though the basis of the risk at the molecular level remains unknown. It is expected that array analysis will prove useful in assessing the risk of diabetes, heart disease, cancers, and other common diseases.
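The SNP figures above follow from simple arithmetic on the genome size of approximately three billion base pairs given at the start of the chapter. The short calculation below is a sketch using those rounded values; the variable names are chosen here for illustration.

```python
GENOME_SIZE_BASES = 3_000_000_000  # approximate size of the human genome
BASES_PER_SNP = 1_000              # about one SNP per 1,000 bases
EXONIC_PERCENT = 1                 # "perhaps 1%" of SNPs lie in exons

# Integer arithmetic keeps the rounded estimates exact.
total_snps = GENOME_SIZE_BASES // BASES_PER_SNP
exonic_snps = total_snps * EXONIC_PERCENT // 100

print(f"Estimated SNPs in the genome: {total_snps:,}")
print(f"Estimated SNPs in exons:      {exonic_snps:,}")
```

With these inputs the estimate is about three million SNPs in total, of which roughly 30,000 would fall in exons.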
Identifying Drug Responsiveness
A second avenue of use for the complex analysis of individual genomes may come with regard to drug prescription. Associations of drug responsiveness and genome markers will develop. It seems likely that genome variations affect a patient's response to drug therapy and that such variations may thus have a role in drug selection and dosage schedules. Medical practice may come to utilize an array analysis for a given drug. The Food and Drug Administration has approved tests that use genotyping of the cytochrome P-450 system, which acts in drug metabolism, to guide the prescribing of certain medications.
Privacy Issues in Genetics
The testing capabilities and the ability to store and compare sequence data raise ethical concerns. To a large degree, these questions are not new in medicine, but the extent of the knowledge and the possible predictive nature of the information make the issue one of new focus and attention. Collection, storage, and dissemination of an individual's genetic information have become topics for discussion at the state and national levels. Already a number of states have revised statutes regarding privacy. The question of privacy in a time of electronic records is in itself a difficult one for health care providers. Access to records is a thorny issue. Added to this are concerns over the availability of health care insurance and life insurance for individuals with a family history of genetic disease. With patient profiles that include a large number of disease-causing sequence alterations now a reality, the problem has only become larger.
Online Resources for Genetic Information
Several Web-based sites for information regarding the genome or genetic diseases are available [see Sidebar Selected Internet Resources for Genetic Information].
Selected Internet Resources for Genetic Information
A convenient entry point to search for publications or to search the genome database for sequence information. Published by the National Library of Medicine.
The Online Mendelian Inheritance in Man (OMIM)
Provides a compilation of heritable diseases and a summary of the clinical and molecular information relating to them. It is organized under the direction of the National Center for Biotechnology Information.
Provides information for clinicians regarding molecular testing laboratories. This resource is helpful in patient evaluation and in locating laboratory testing sites for families with known diagnoses or risk of genetic diseases.
American Society of Human Genetics
Provides access to electronic publications and links to other sources of information.
Figure 1 George Kelvin.
Figure 2 Seward Hung.
Figures 4 and 5 Dimitry Schidlovsky.
Editors: Dale, David C.; Federman, Daniel D.