Abeloff's Clinical Oncology, 4th Edition

Part I – Science of Clinical Oncology

Section A – Biology and Cancer

Chapter 1 – Molecular Tools in Cancer Research

Nadia Rosenthal




Our understanding and treatment of cancer have always relied heavily on parallel developments in biological research. Molecular biology provides the basic tools to study genes that are involved in cancer growth patterns and tumor suppression. An advanced understanding of the molecular processes that govern cell growth and differentiation has revolutionized the diagnosis and prognosis of malignant disorders.



Current cancer research seeks to integrate the complex interaction of the cell's genome with its environment and emphasizes the need for a systematic approach to the analysis of gene function and dysfunction in the context of the intact organism. Discovery of the mechanisms that are responsible for genetic instability may lead to more reliable tests for hereditary susceptibility to cancer and, ultimately, to more effective therapies.



This introductory chapter relates basic principles of molecular biology to emerging perspectives on the origin and progression of cancer and explains newly developed laboratory techniques, including whole-genome analysis, expression profiling, and refined genetic manipulation in animal models, providing the conceptual and technical background necessary to grasp the central principles and new methods of current cancer research.


Since the previous edition of this book was published, advances in our understanding of the basic mechanisms of cancer have continued to inform and refine clinical approaches to prevention and therapy. New prognostic and predictive markers derived from molecular biology can now pinpoint specific genetic changes in particular tumors or detect occult malignant cells in normal tissues, leading to improved technologies for tumor screening and early detection. Diagnostic approaches have expanded from morphologic criteria and single-gene analysis to whole-genome technologies imported from other biological disciplines. A new systemic vision of cancer is emerging, in which the importance of individual mutation has been superseded by an appreciation for the new parameters set by gene-environment interactions, with profound influences on a tumor cell's transcriptional profile and function. Results from these cross-disciplinary applications underscore the complexity of carcinogenesis and promise to streamline the design of strategies for both cancer prevention and advanced cancer therapy.

This overview will serve as a foundation of conceptual and technical information for understanding the exciting new advances in cancer research that will be described in subsequent chapters. Since the discovery of oncogenes, which provided the first concrete evidence of cancer's genetic basis, applications of advanced molecular techniques and instrumentation have yielded new insights into normal cell biology as well. A basic fluency in molecular biology will soon be a prerequisite for clinical oncologists, since many of the diagnostic and prognostic tools that are in use today, and that will be in use tomorrow, rely on these fundamental principles of gene, protein, and cell function.


Cancer genetics has classically relied on the candidate gene approach, detecting acquired or inherited changes in specific genetic loci accumulated in a single cell, which then proliferates to produce a tumor composed of its identical clonal progeny. During the early steps of tumor formation, mutations that lead to an intrinsic genetic instability allow additional deleterious genetic alterations to accumulate. These genetic changes confer selective advantages on tumor cell clones by disrupting control of cell proliferation. The identification of specific mutations that characterize a tumor cell has proved invaluable for analyzing the neoplastic progression and remission of the disease.

Methods for mutation detection all rely on the manipulation of DNA, the basic building block of heredity in the cell. DNA consists of two long strands of polynucleotides that twist around each other clockwise in a double helix ( Fig. 1-1 ). Nucleic acid bases attached to the sugar groups of each strand face each other within the helix, perpendicular to its axis. These comprise only four bases: the purines adenine and guanine (A and G) and the pyrimidines cytosine and thymine (C and T). During assembly of the double helix, stable pairings of nucleotides from either strand are made between A and T or between G and C. Each base pair forms one of the millions of rungs in the long, unbroken ladder of DNA that forms a chromosome.
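The pairing rule described above can be made concrete with a short Python sketch. It is an illustration only: the function names and the example sequence are hypothetical. It also relies on one fact that the text does not spell out, namely that the two strands run in opposite directions, so the partner strand is conventionally written as the reverse of the base-by-base complement.

```python
# Watson-Crick pairing rule: A pairs with T, and G pairs with C.
PAIRING = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the base-by-base partner of a strand, in the same left-to-right order."""
    return "".join(PAIRING[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the partner strand written in its own reading direction."""
    return complement(strand)[::-1]


# Hypothetical six-base sequence, chosen only for illustration.
example = "ATGCGT"
partner = reverse_complement(example)
print(example, "pairs with", partner)
```

Because each base has exactly one partner, either strand fully determines the other. This redundancy is what permits both repair of a damaged strand and faithful replication.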


Figure 1-1  DNA (deoxyribonucleic acid) is the cell's genetic material, contained in single compacted strands comprising chromosomes within the cell nucleus. In the DNA double helix, the two intertwined components of its backbone are composed of sugar (deoxyribose) and phosphate molecules that are connected by pairs of molecules called bases. The sequence of four bases (guanine, adenine, thymine, and cytosine) in the DNA helix determines the specificity of genetic information. The bases face inward from the sugar-phosphate backbone and form pairs with complementary bases on the opposing strand for specific recognition. The arrangement of chemical groups is unique for each base pair, allowing base pairs to be specifically targeted by transcription factors, polymerases, restriction enzymes, and other DNA-binding proteins.



The functional unit of inherited information in DNA—the gene—is most often represented by a discrete section of sequence that is necessary to encode a particular protein structure. Gene expression is initiated by forming a copy of the gene, messenger RNA (mRNA), which is constructed base by base from the DNA template by a polymerase enzyme. Once the sequence is transcribed, an RNA transcript is modified at both ends and then undergoes a highly regulated process called splicing. In higher organisms, most protein-coding gene sequences are interrupted by stretches of noncoding sequences, called introns. The genetic machinery must remove these introns to form a continuous chain of coding sequences, or exons, which subsequently undergo translation into protein. The splicing process requires absolute precision because the deletion or addition of a single nucleotide at the splice junction would throw the three-base coding sequence out of frame.

The biologic importance of RNA splicing is not entirely understood, but many medically relevant genes have alternative splice patterns in which different combinations of exons are chosen for the final mRNA transcript, such that one gene can encode many different proteins ( Fig. 1-2 ). The choice of protein isoform to be expressed from a gene with multiple splicing possibilities is a decision that can be perturbed in disease. Since protein synthesis occurs in the cytoplasm, genetic information is transported out of the nucleus by mRNA. In the cytoplasm, proteins are then synthesized, or translated, in macromolecular complexes called ribosomes that read the mRNA sequence and convert the nucleic acid code, based on three-base segments or codons, into a 20-amino-acid code to form the corresponding protein.
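The need for precision at the splice junction can be illustrated with a minimal Python sketch. The toy gene, its exon coordinates, and the function names are hypothetical, and the transcript is written in the DNA alphabet (T in place of U) as a simplifying assumption. The codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join the exon segments (half-open coordinates), discarding the introns."""
    return "".join(pre_mrna[start:end] for start, end in exons)


def translate(mrna: str) -> str:
    """Read codons from the first base; stop at a stop codon or at the last full codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical toy gene: exon 1, an intron that begins with GT and ends with AG, exon 2.
pre_mrna = "ATGGCC" + "GTAAGTTTTCAG" + "AAATGA"

correct = splice(pre_mrna, [(0, 6), (18, 24)])   # precise junction
shifted = splice(pre_mrna, [(0, 6), (17, 24)])   # one intron base retained

print(translate(correct))  # protein from the correctly spliced transcript
print(translate(shifted))  # protein after a one-nucleotide splicing error
```

In this toy case the correct transcript encodes Met-Ala-Lys followed by a stop codon, whereas retaining a single intron base changes every codon downstream of the junction and removes the stop codon from the reading frame.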


Figure 1-2  Alternative splicing produces multiple related proteins, or isoforms, from a single gene.  (Adapted from Guttmacher AE, Collins F: Genomic medicine: A primer. N Engl J Med 2002;347:1512–1520.)




The complete set of DNA sequences carried on all the chromosomes is known as the genome. Although the general map of the genome is shared by all members of a species, the recent sequencing of the human genome has given us new tools to reveal the more subtle variations that arise between individuals. These variations are critical, both as a natural engine driving heterogeneity within a species and as a source of predisposition to cancer types. The most common forms of human genetic variations arise as single-nucleotide polymorphisms (SNPs). Because these allelic dissimilarities are abundant, inherited, and dispersed throughout the genome, SNPs can be used to track racial diversity, personal traits, and susceptibility to common forms of cancer ( Fig. 1-3 ).


Figure 1-3  Using SNPs to determine cancer susceptibility. Millions of single-nucleotide polymorphisms (SNPs) exist between individuals, as depicted by the red arrows and the SNP density map of human chromosome 11 (right). By contrast, point mutations, deletions, insertions, and rearrangements between normal tissues and tumors or between primary and secondary tumors probably number in the tens to hundreds (or potentially thousands), as depicted by the spectral karyotype image at the bottom of the figure. Because the constitutional genetic polymorphisms are present in all of the tissues of the body, it might be possible to use nontumor tissue to distinguish individuals predisposed to metastatic versus nonmetastatic tumors before they ever develop a solid tumor.  (Adapted from Hunter K: Host genetics influence tumour metastasis. Nat Rev Cancer 2006;6:141–146.)




How do SNPs arise between individuals? One source of variation in DNA sequence derives from deviations in the strict base-pairing rule underlying the structure, storage, retrieval, and transfer of genetic information. The duplicated genetic information in the two strands of DNA not only permits the repair of a damaged coding sequence, but also forms the basis for the replication of DNA. During cell division, polymerase enzymes unwind the DNA strands and copy them, using the base sequences as a template for constructing a new helix so that the dividing cell passes its entire genetic content on to its progeny. Errors in this process are rare, and person-to-person differences make up only about 0.1% of the human genome. SNPs are inherited if they occur in the germline. Most genetic variation is of no obvious functional consequence, occurring in regions that do not encode protein or alter the regulation of nearby genes. Given the disruptive effects that even subtle genetic changes can have on cell function, it is important to distinguish SNPs that represent mutations from benign polymorphisms.
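At its simplest, an SNP is a position at which two otherwise identical sequences carry different bases. The following Python sketch shows the idea for two aligned sequences of equal length. The sequences and the function name are hypothetical, and real genotyping works on experimental signals rather than on clean text strings.

```python
def find_snps(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """List single-base differences as (1-based position, reference base, sample base).

    Assumes the two sequences are already aligned and of equal length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        (position, ref_base, sample_base)
        for position, (ref_base, sample_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != sample_base
    ]


# Hypothetical 12-base stretch from two individuals.
reference = "ATGGCCAAGTTC"
sample = "ATGGCTAAGTTC"

snps = find_snps(reference, sample)
print(snps)
print(f"{len(snps) / len(reference):.1%} of positions differ in this toy example")
```

Whether such a difference is a benign polymorphism or a functionally important mutation cannot be read from the sequence comparison alone; that depends on where the change falls and what it disrupts.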

Our ability to monitor hundreds of thousands of SNPs simultaneously is one of the most important advances in modern medical genetics. Relatively simple genotyping technologies for SNP detection rely largely on the polymerase chain reaction (PCR). In this procedure, two chemically synthesized single-stranded DNA fragments, or primers, are designed to match chromosomal DNA sequences flanking the segment in which an SNP is positioned. The strands of genomic DNA are separated by heating and, after cooling, each primer binds to its matching sequence in the genomic DNA. With the addition of nucleotide building blocks and a heat-stable DNA polymerase, the primer pairs, or amplimers, initiate synthesis of new DNA strands using the chromosomal material as a template. Each successive copying cycle, initiated by “melting” the resulting double-stranded products with heat, doubles the number of DNA segments in the reaction ( Fig. 1-4 ). The technique is exceptionally sensitive; millions of identical DNA copies can be generated in a matter of hours with PCR using a single DNA molecule as the starting material. Other novel methods for large-scale SNP detection include single nucleotide primer extension, allele-specific hybridization, oligonucleotide ligation assay, and invasive signal amplification, which detect polymorphisms directly from genomic DNA without the requirement of PCR amplification. Regardless of the method that is used to characterize them, the collective SNPs in a selected genomic region characterize a haplotype, or specific combination of alleles at multiple linked genetic loci along a chromosome that are inherited together.
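Two aspects of PCR lend themselves to a short worked example in Python: the doubling arithmetic, and the way a primer pair defines the amplified segment. The sketch below assumes perfect efficiency in every cycle, which real reactions do not achieve, and it uses a hypothetical template with primers far shorter than those used in practice. All function names are illustrative.

```python
def ideal_copies(start_copies: int, cycles: int) -> int:
    """Copies after a number of cycles, assuming every cycle exactly doubles the DNA."""
    return start_copies * 2 ** cycles


def reverse_complement(strand: str) -> str:
    """Partner strand written in its own reading direction."""
    return strand.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def amplified_segment(template: str, forward_primer: str, reverse_primer: str):
    """Return the stretch of the template bounded by the two primers, or None.

    The forward primer matches the given strand directly; the reverse primer
    binds the opposite strand, so its reverse complement is what appears here.
    """
    start = template.find(forward_primer)
    if start == -1:
        return None
    site = reverse_complement(reverse_primer)
    end = template.find(site, start + len(forward_primer))
    if end == -1:
        return None
    return template[start:end + len(site)]


# Doubling arithmetic starting from a single molecule.
print(ideal_copies(1, 20))  # about one million copies
print(ideal_copies(1, 30))  # about one billion copies

# Hypothetical template and primers, for illustration only.
template = "TTGACCATGGCTAGCTAGGATCCAAGT"
product = amplified_segment(template, "GACCA", "TTGGATC")
print(product)
```

Under this idealized doubling, 20 cycles starting from one molecule yield 1,048,576 copies, which is consistent with the statement that millions of copies can be produced within hours.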


Figure 1-4  Amplification of DNA by PCR. The DNA sequence to be amplified is selected by primers, which are short, synthetic oligonucleotides that correspond to sequences flanking the DNA to be amplified. After an excess of primers is added to the DNA, together with a heat-stable DNA polymerase, the strands of both the genomic DNA and the primers are separated by heating and allowed to cool. The polymerase elongates the primers on either strand, thus generating two new, identical double-stranded DNA molecules and doubling the number of DNA fragments. Each cycle takes just a few minutes and doubles the number of copies of the original DNA fragment.



Even when the SNPs within a given haplotype are not directly involved in a disease, they provide markers for clonality and for the loss or rearrangement of specific chromosomal segments in growing tumors. In the human nucleus, each of the 23 pairs of tightly compacted chromosomes has a characteristic size and structure and a distinctive base sequence that carries unique protein coding information. Other noncoding DNA sequences are used for directing the transcription of neighboring genes through complex regulatory circuits that involve protein binding and modification of the DNA itself or shifting of its chromosomal packaging. Although genomic instability is generally considered a consequence of tumor formation rather than the initial trigger of cancer, the loss, gain, or rearrangement of chromosomal segments through deletion or translocation is a common form of neoplastic mutation, as protein-coding segments from different genes are combined or regulatory sequences are brought into new proximity to genes that they do not normally control. Gross changes in DNA arrangement can be detected by cytogenetic analysis of chromosomal features on metaphase spreads. Fluorescent in situ hybridization provides greater resolution by localizing specific chromosomal DNA sequences corresponding to fluorescently labeled probes ( Fig. 1-5 ) and can be used to track specific alterations in chromosomal structure where known genes are involved.


Figure 1-5  Detection of chromosomal translocations in interphase cells by fluorescent in situ hybridization. This technology uses a labeled DNA segment as a probe to search homologous sequences in interphase chromosomes for the t(9;22)(q34;q11) translocation, associated with chronic myeloid leukemia. On the left, patient nuclei were hybridized with probes for chromosome 9 (labeled with SpectrumRed fluorophore) and chromosome 22 (labeled with SpectrumGreen).  (Republished with permission of AlphaMed Press, from Oncologist, Varella-Garcia M. Molecular cytogenetics in solid tumors: laboratorial tool for diagnosis, prognosis, and therapy. 2003;45–58; permission conveyed through Copyright Clearance Center, Inc.)




Although cytogenetic techniques are useful in detecting consistent, nonrandom structural abnormalities of clonal tumor cells, they require cell culture, which can limit their usefulness, particularly in analyzing solid tumors. When chromosomal alterations are too small to be detected by cytogenetic analysis, exploitation of the exquisite sequence specificity of certain bacterial DNA endonucleases, called restriction enzymes, allows the systematic cleavage of very large DNA molecules isolated from tumor samples into predictable, manageable subfragments. These can be identified by hybridization of a short, specific DNA or RNA oligonucleotide probe to its complementary base sequence, or target, in the fractionated genomic material. In most applications, the enzyme-digested sample of DNA is size fractionated by gel electrophoresis and transferred or “blotted” onto a nylon membrane to which labeled probe is then applied. In this procedure, DNA fragments containing sequences that hybridize with radioactively labeled probe can be detected by autoradiography or, alternatively, by nonisotopic colorimetric or chemiluminescent systems ( Fig. 1-6 ). When an SNP, or lost or rearranged DNA, alters the recognition site of a restriction enzyme, the enzyme cleavage pattern changes, and the resulting restriction-fragment-length polymorphisms can be detected by blotting analysis. PCR-based analysis using flanking DNA sequences as primers is also used to produce fragments, changes in the size of which, due to addition or loss of DNA bases, can be detected by gel electrophoresis. Subtler single base mutations can be identified by automated DNA-sequencing techniques.
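The logic of a restriction-fragment-length polymorphism can be sketched in a few lines of Python. The example uses the EcoRI recognition sequence GAATTC, which is cut after the first G. The DNA sequences and the function names are hypothetical, and only one strand of a linear molecule is modeled.

```python
def digest(dna: str, site: str = "GAATTC", cut_offset: int = 1) -> list[str]:
    """Cut a linear sequence at every occurrence of the recognition site.

    cut_offset is the number of bases of the site left on the upstream fragment
    (1 for EcoRI, which cuts between G and AATTC).
    """
    fragments = []
    start = 0
    position = dna.find(site)
    while position != -1:
        fragments.append(dna[start:position + cut_offset])
        start = position + cut_offset
        position = dna.find(site, position + 1)
    fragments.append(dna[start:])
    return fragments


def fragment_lengths(dna: str) -> list[int]:
    """Fragment sizes, largest first, as they would order on a gel."""
    return sorted((len(fragment) for fragment in digest(dna)), reverse=True)


# Hypothetical alleles: the variant differs by a single base in the second site.
normal = "A" * 10 + "GAATTC" + "C" * 20 + "GAATTC" + "T" * 5
variant = "A" * 10 + "GAATTC" + "C" * 20 + "GAGTTC" + "T" * 5

print(fragment_lengths(normal))   # three fragments
print(fragment_lengths(variant))  # two fragments: one cut site has been lost
```

A single-base change that destroys a recognition site merges two fragments into one larger fragment, and it is this shift in the band pattern that blotting analysis detects.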


Figure 1-6  A, Analysis of DNA by gel electrophoresis and Southern blotting. Genomic DNA is cut with restriction enzymes into fragments before being separated according to size by gel electrophoresis. The four lanes on the gel represent the digestion of the DNA with four different restriction enzymes. In Northern blotting, total cellular RNA, including messenger RNA, can also be separated according to size. After electrophoresis, the nucleic acids in the gel are transferred directly onto a charged nylon filter, to which they are tightly bound. Thus, the filter contains a precise replica of the nucleic acid distribution in the gel. The filter is then hybridized in a rotating sealed chamber with a DNA or RNA probe specific for the target of interest (in this case, sequences in a microbial pathogen). Probes have traditionally been radioactively labeled with nucleotides containing phosphorus 32; however, the use of nonradiolabeled probes is becoming more common. After the probe has hybridized to its target sequence, the nonhybridized probe is washed away, and the filter is exposed to x-ray film. A DNA sequence complementary to the probe is seen as a dark band on the developed film. The position of the hybridized target sequence in each lane is unique to the restriction enzyme that is used to digest the DNA. This procedure is termed Southern blotting when DNA is analyzed and Northern blotting when RNA is analyzed. B, Cytogenetic and molecular analyses of tumor cells. Three methods of detecting the specific genetic alterations shared by all the neoplastic cells in a tumor are shown. (1) If the genetic alteration is large enough, as in the deletion of a region of DNA between loci A and B, cytogenetic analysis can detect grossly visible karyotypic changes. (2) Southern blot analysis can detect small changes in gene structure that routine karyotyping studies cannot find. 
In this example, a probe to locus A normally detects a large DNA restriction fragment, as shown by the band for the normal DNA sample. Because of the deleted DNA segment between loci A and B in the tumor cells, the probe hybridizes to a smaller, rearranged, tumor-specific restriction fragment. The normal, larger band shown on the blot is from DNA contributed by nonneoplastic stromal and reactive cells. (3) In many applications, the polymerase chain reaction (PCR) can detect alterations in DNA structure with the highest degree of sensitivity. Here, the primers that anneal to loci A and B in normal DNA are too far apart to yield an amplified PCR product. The deletion shown in the tumor DNA brings the two annealing sites close to one another, allowing the generation of a novel amplified PCR product.  (A, Modified from Naber SP: Molecular pathology-diagnosis of infectious disease. N Engl J Med 1994;331:1212–1215. B, From Naber SP: Molecular pathology-detection of neoplasia. N Engl J Med 1994;331:1508–1510.)




The plethora of data that arise from genome-wide association studies using currently available techniques poses particular challenges to cancer researchers. Discerning the causal genetic variants among genotype-phenotype associations requires extensive replication, control for underlying genetic differences in population cohorts, and consistent classification of clinical outcomes. New technologies must be met with equivalently sophisticated and rigorous analytical methodologies for the true genetic cause of cancer to be teased out from our variable and often unstable heredity.


The engineering of genes by recombinant DNA technology evolved from methods that were initially devised to provide sequences in amounts sufficient for biochemical analysis. The original protocol involves clipping the desired segment from the surrounding DNA and inserting it into a bacterial or viral vector, which is then amplified millions of times in a host bacterium. Using recombinant DNA technology, genetic engineering can routinely produce industrial quantities of pure, clinically useful products in a cost-effective way. For diagnostic purposes, it is easier and faster to amplify a known genomic DNA sequence directly from a patient sample with PCR, but the classic approach is still applied to the construction of recombinant DNA libraries.

To be useful, a DNA library must be as complete as possible, with recombinant members, or clones, sufficiently numerous to include all the sequences in an individual genome. For certain kinds of gene linkage analysis that require long, uninterrupted stretches of DNA, special vectors, such as bacterial or yeast artificial chromosomes, can carry foreign DNA fragments of enormous lengths. Chromosomal segments represented in genomic DNA libraries can contain the structure of an entire gene, including the information that regulates its expression, and such segments formed the starting material for sequencing the human genome.

For some applications, construction of partial libraries, which contain only the DNA sequences transcribed by a particular tissue or type of cell, is sufficient. The starting material in this case is mRNA. For cloning purposes, the enzyme reverse transcriptase can convert mRNA into complementary DNA (cDNA). Advanced techniques for acquiring full-length copies of RNA transcripts include rapid amplification of cDNA ends. The cDNAs are then incorporated into bacterial vectors. The number of clones in a cDNA library is much smaller than that in a genomic library, since a cDNA library represents only the genes that are expressed by the tissue of interest and contains exclusively the coding portion of genes. Screening DNA libraries for a specific gene classically relied on mass growth of bacterial hosts on agar, transfer of replicates to nylon filters, and exposure to a specific DNA probe. The probe's unique sequence of nucleotides ensures that it hybridizes only to a nucleic acid molecule with the complementary sequence, which marks the position of the target clone. Cloning vectors can also be modified to drive the expression of their payload, producing “expression” libraries in bacteria that can be screened for protein production using specific antibodies, a technique that is becoming increasingly important in large-scale screening of protein products.


Mutations that lead to oncogenic transformation of a cell invariably affect the expression of its genetic information that specifies functional products, either RNA molecules or proteins used for various cellular functions. The primary level of gene control is the transcription of DNA into RNA. Gene regulation, or the control of RNA synthesis, represents a complex process that is frequently a target of neoplastic mutation.

DNA regulatory sequences do not encode a product. Yet without them, a cell could not coordinate the expression of the tens of thousands of genes in its nucleus, select only certain genes for expression, and activate or repress them in response to precise internal or external signals. These control centers of the genome contain binding sites for multiple proteins, called transcription factors, which interact to form regulatory networks that control gene transcription. Their function can be altered by signals that induce modifications such as phosphorylation, or by interactions with other regulators such as steroid hormones. Many of the cell's responses to a wide variety of external stimuli, such as neurotransmitters, antigens, cytokines, and growth factors, are mediated through transcription factors binding to DNA regulatory sequences.

Certain regulatory DNA sequences common to many genes are positioned upstream of the transcription start site ( Fig. 1-7 ). Collectively called the promoter of a gene, these proximal sequences comprise binding sites for the RNA polymerase and its numerous cofactors. Whereas the position of the promoter with regard to the transcription start site is relatively inflexible, other DNA regulatory elements, known as enhancers, occur in unpredictable locations, often at a considerable distance from the genes they control. Some transcription factors bind to particular regions of enhancers and drive their associated genes in many types of cells, whereas others, which are active in only a limited variety of cells, maintain a tissue-specific pattern of gene expression. Enhancers are often responsible for the aberrant expression of genes induced by the chromosomal translocations associated with specific forms of cancer; a normally quiescent gene promoting cell growth that is dislocated to a position near a strong enhancer may be activated inappropriately, resulting in loss of growth control.


Figure 1-7  Mammalian gene structure and expression. The DNA sequences that are transcribed as RNA are collectively called the gene and include exons (expressed sequences) and introns (intervening sequences). Introns invariably begin with the nucleotide sequence GT and end with AG. An AT-rich sequence in the last exon forms a signal for processing the end of the RNA transcript. Regulatory sequences that make up the promoter and include the TATA box occur close to the site where transcription starts. Enhancer sequences are located at variable distances from the gene. Gene expression begins with the binding of multiple protein factors to enhancer sequences and promoter sequences. These factors help to form the transcription-initiation complex, which includes the enzyme RNA polymerase and multiple polymerase-associated proteins. The primary transcript (pre-mRNA) includes both exon and intron sequences. Post-transcriptional processing begins with changes at both ends of the RNA transcript. At the 5′ end, enzymes add a special nucleotide cap; at the 3′ end, an enzyme clips the pre-mRNA about 30 base pairs after the AAUAAA sequence in the last exon. Another enzyme adds a polyA tail, which consists of up to 200 adenine nucleotides. Next, spliceosomes remove the introns by cutting the RNA at the boundaries between exons and introns. The process of excision forms lariats of the intron sequences. The spliced mRNA is now mature and can leave the nucleus for protein translation in the cytoplasm.  (Adapted from Rosenthal N: Regulation of gene expression. N Engl J Med 1994;331:931–932.)




Enhancers and promoters have been assigned specific roles by means of cell culture assays or in transgenic animals in which putative regulatory DNA sequences are linked to test or “reporter” genes and are examined for their ability to activate expression of the reporter gene in response to the appropriate signals. By assessing the effects of deleting, adding, or changing DNA sequences within the regulatory element, the precise nucleotides that are critical for recognition by transcription factors can be determined.

The interaction between protein and DNA is also used to identify transcription factor-binding sites in a regulatory region. Whereas electrophoretic mobility shift assays and DNA footprinting were once the standard techniques for determining protein-DNA interactions, emerging genome-wide technologies, such as ChIP sequencing (see Fig. 1-13 later in the chapter), are revolutionizing the way in which we see the simultaneous interaction of a transcription factor complex with virtually all of its potential genomic targets in a particular cell state.


Figure 1-13  ChIP-on-chip is a technique for location, isolation, and identification of the DNA sequences occupied by specific DNA-binding proteins in cells. These binding sites may indicate functions of various transcriptional regulators and help to identify their target genes during development and disease progression. The types of functional elements that one can identify using ChIP-on-chip include promoters, enhancers, repressor and silencing elements, insulators, boundary elements, and sequences that control DNA replication.  (Adapted from Ren lab: www.chiponchip.org/Images/scheme_800x600_crop.jpg.)


Our appreciation of oncogenic perturbations, either by mutation of regulatory protein-coding genes or in the target sequences these proteins recognize, has recently extended to include epigenetic lesions, although these mechanisms have been more difficult to define. Multiple levels of control are necessary to ensure correct gene expression, which is so central to the normal function of the cell. Epigenetics refers generally to control information that is inherited during cell division other than the DNA sequence itself.

A key component of epigenetic regulation, chromatin, wraps DNA into coils with scaffolding proteins such as histones as a necessary component of chromosomal compaction but also plays a critical role in gene accessibility ( Fig. 1-8 ). Active genetic loci are associated with loosely configured euchromatin, whereas silent loci are condensed in heterochromatin. The formation of chromatin configurations both controls and is controlled by patterns of methylation on specific DNA sequences, relating the underlying genetic information to its higher-order structure that determines whether a particular gene regulatory element is available to transcription factors. These epigenetic modifications of the nuclear environment that determine the accessibility of a gene can persist during cell division, as inherited patterns of methylation provide permanent marks for altered chromatin configuration in daughter cells.


Figure 1-8  Chromatin packaging of DNA. The 2 meters of DNA in every human cell must be compressed in the nucleus, reaching compaction ratios of 1:400,000. This is achieved by wrapping the DNA (blue) around histone protein complexes (green), forming nucleosomes that are connected by a thread of free linker DNA. Each nucleosome, together with its linker, packages about 200 base pairs (66 nm) of DNA. The nucleosomes are then coiled into chromatin, a rope of nucleoprotein about 30 nm thick (bottom left electron micrograph). To allow DNA to be accessed by transcription and replication apparatus, chromatin is relaxed (bottom right electron micrograph).  (Courtesy of Jakob Waterborg www.umkc.edu/sbs/waterborg/chromat/chromatn.html © 1998 Jakob Waterborg.)




Recent research has linked rearrangement of chromatin and associated DNA methylation with the inactivation of tumor suppressor genes and neoplastic transformation. Defects that could lead to cancer involve perturbations in the “epigenotype” of a particular locus through the silencing of normally active genes or activation of normally silent genes, which are associated with changes in DNA methylation, histone modification, and chromatin proteins ( Fig. 1-9 ). Changes in the number or density of heterochromatin proteins associated with cancer-related genes such as EZH2 or of euchromatic proteins such as trithorax in leukemia can also be associated with abnormal patterns of methylation in gene promoter regions as well as with higher-order chromosomal structures that are only beginning to be understood. Finally, it is increasingly evident that interactions between the “epigenome,” the genome, and the environment are a common target for mutation and can have profound effects on the gene expression readout of a cancer cell.


Figure 1-9  Gene accessibility through epigenetics. The nature of epigenetic lesions. The cartoon depicts known and possible defects in the epigenome that could lead to disease. A, X is a transcriptionally active gene with sparse DNA methylation (brown circles), an open chromatin structure, interaction with euchromatin proteins (green protein complex), and histone modifications such as H3K9 acetylation and H3K4 methylation (green circles). Y is a transcriptionally silent gene with dense DNA methylation, a closed chromatin structure, interaction with heterochromatin proteins (red protein complex), and histone modifications such as H3K27 methylation (pink circles). B, The abnormal cell could switch its epigenotype through the silencing of normally active genes or activation of normally silent genes, with the attendant changes in DNA methylation, histone modification, and chromatin proteins. In addition, the epigenetic lesion could include a change in the number or density of heterochromatin proteins in gene X (such as EZH2 in cancer) or euchromatic proteins in gene Y (such as trithorax in leukemia). There may also be an abnormally dense pattern of methylation in gene promoters (shown in gene X) and an overall reduction in DNA methylation (shown in gene Y) in cancer. The insets show that the higher-order loop configuration may be altered, although such structures are currently only beginning to be understood.  (Adapted from Feinberg AP: Phenotypic plasticity and the epigenetics of human disease. Nature 2007;447:433–440.)





Monitoring global gene expression patterns of cells represents one of the latest breakthroughs in developing a molecular taxonomy of cancer. Genome-wide profiling of gene expression in tumors delivers an unprecedented view into the biological processes underlying tumor progression by following the changes in a tumor cell's transcriptional landscape. Relying on two-color fluorescence-based microarray technology (DNA microarray), simultaneous evaluation of thousands of gene transcripts and their relative expression can provide a snapshot of the “transcriptome,” the full complement of RNA transcripts that are produced at a specific time during the progression of malignancy.

Transcriptional profiling using microarrays typically involves screens of mRNA expression from two sources (such as tumor and normal cells), using complementary DNA (cDNA) or oligonucleotide libraries that are arranged in extremely high density on microchips. These are probed with a mixture of fluorescently tagged cDNAs generated from the tumor and normal samples, which results in differential staining of each gene spot. The relative intensity of the two different colors reflects the RNA expression level of each gene in each sample as analyzed with a laser confocal scanner ( Fig. 1-10 ). By using microarrays, single genes that constitute diagnostic, prognostic, or therapeutically relevant markers can be systematically monitored. Alternatively, the entire set of expressed genes can be collectively analyzed by using powerful statistical methods to classify tumors by their transcriptional profile. Microarray analysis has already dramatically improved our ability to explore the genetic changes that are associated with cancer etiology and development and is providing new tools for disease diagnosis and prognostic assessment. For example, DNA microarray analysis of multiple primary breast tumor transcriptomes has revealed a reproducible 70-gene expression signature that was recently cleared by the U.S. Food and Drug Administration for a PCR-based application in which expression analysis of a relatively small gene group can predict the prognosis of early-stage breast cancers. When applied on a larger scale, these assays can predict response to chemotherapy or optimize pharmaceutical intervention by targeting therapeutic approaches to specific patient populations and, ultimately, to individualized therapy.
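The two-color readout can be illustrated with a minimal sketch. The gene names, spot intensities, and the twofold cutoff below are hypothetical values chosen for illustration, and the background correction and normalization steps that a real analysis requires are omitted.

```python
import math

# Hypothetical background-corrected spot intensities (arbitrary units).
# "red" is the tumor cDNA channel; "green" is the reference cDNA channel.
spots = {
    "GENE_A": {"red": 8000.0, "green": 1000.0},
    "GENE_B": {"red": 500.0, "green": 4000.0},
    "GENE_C": {"red": 2100.0, "green": 2000.0},
}

def log2_ratio(red, green):
    """Return log2(tumor / reference) for a single spot."""
    return math.log2(red / green)

def classify(ratio, threshold=1.0):
    """Label a spot using an assumed twofold cutoff (|log2 ratio| >= 1)."""
    if ratio >= threshold:
        return "up in tumor (red)"
    if ratio <= -threshold:
        return "down in tumor (green)"
    return "similar (yellow)"

results = {
    gene: classify(log2_ratio(v["red"], v["green"]))
    for gene, v in spots.items()
}
for gene, label in results.items():
    print(gene, label)
```

Working on a log scale treats an eightfold increase and an eightfold decrease symmetrically (+3 and -3), which is one reason such ratios are commonly log-transformed before statistical analysis.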


Figure 1-10  Microarray-based expression profiling of breast tumor tissue. A, Reference RNA and tumor RNA are labeled by reverse transcription with different fluorescent dyes (green for the reference cells and red for the tumor cells) and are hybridized to a cDNA microarray containing robotically printed cDNA clones. B, The slides are scanned with a confocal laser scanning microscope, and color images are generated with RNA from the tumor and reference cells for each hybridization. Genes that are upregulated in the tumors appear red; those with decreased expression appear green. Genes with similar levels of expression in the two samples appear yellow. Genes of interest are selected on the basis of the differences in the level of expression by known tumor classes (e.g., BRCA1-mutation-positive and BRCA2-mutation-positive). Statistical analysis determines whether these differences in the gene expression profiles are greater than would be expected by chance. C, The differences in the patterns of gene expression between tumor classes can be portrayed in the form of a color-coded plot, and the relationships between tumors can be portrayed in the form of a multidimensional-scaling plot. Tumors with similar gene expression profiles cluster close to one another in the multidimensional-scaling plot. D, Particular genes of interest can be further studied through the use of a large number of arrayed, paraffin-embedded tumor specimens, referred to as tissue microarrays. E, Immunohistochemical analyses of hundreds or thousands of these arrayed biopsy specimens can be performed to extend the microarray findings.  (From Hedenfalk I, Duggan D, Chen Y, et al: Gene expression profiles in hereditary breast cancer. N Engl J Med 2001;344:539–548.)




Serial analysis of gene expression (SAGE) provides a simultaneous, comprehensive evaluation of multiple mRNA species. Unlike microarray analysis, SAGE does not require prior knowledge of the genes of interest and provides quantitative and qualitative data on potentially every transcribed sequence in a particular tissue or cell type. Furthermore, SAGE can quantify low-abundance transcripts and can reliably detect relatively small differences in transcript concentrations between cell populations. The SAGE method generates a short sequence tag that functions as a unique identifier of a transcript, derived from a defined location within that transcript. Many transcript tags are concatenated into a single molecule and then sequenced, revealing the identity of multiple tags simultaneously. The relative representation of each member of a SAGE tag library is proportional to the corresponding mRNA abundance in the original transcript population. Comparative expression profiles can then be deduced by comparing the abundance of individual tags within each sample set. This allows changes in global expression profiles of normal or malignant tissues under different therapeutic conditions to be rapidly evaluated.
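Because tag representation is proportional to mRNA abundance, comparing two SAGE libraries reduces to counting tags. The sketch below assumes two small hypothetical tag libraries; the tag sequences and counts are invented for illustration, and tags found in only one library are simply ignored.

```python
from collections import Counter

def tag_frequencies(tags):
    """Return each tag's share of all sequenced tags in one library."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}

def fold_changes(freq_a, freq_b):
    """Ratio of tag frequency in library B to library A, for shared tags."""
    shared = freq_a.keys() & freq_b.keys()
    return {tag: freq_b[tag] / freq_a[tag] for tag in shared}

# Hypothetical libraries of sequenced tags (10 tags each).
normal_tags = ["AAACCTGGTA"] * 6 + ["GGCTTACCAG"] * 3 + ["TTGACCGTAA"] * 1
tumor_tags = ["AAACCTGGTA"] * 2 + ["GGCTTACCAG"] * 3 + ["TTGACCGTAA"] * 5

normal_freq = tag_frequencies(normal_tags)
tumor_freq = tag_frequencies(tumor_tags)
ratios = fold_changes(normal_freq, tumor_freq)

for tag in sorted(ratios):
    print(tag, round(ratios[tag], 2))
```

Frequencies rather than raw counts are compared so that libraries sequenced to different depths remain comparable.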

These technologies can be applied to the analysis of noncoding RNA species as well. Besides the 20,000 protein-coding transcripts that are used to classify a wide variety of human tumors, hundreds if not thousands of small, noncoding interference RNA species have recently been discovered with critical functions in multiple biological processes, many of which are directly or indirectly involved in the control of cell proliferation. Known as microRNAs (miRNAs), these short transcripts arise from primary genome-encoded transcripts of variable sizes that are processed into 70- to 100-nucleotide hairpin-shaped precursors, which in turn are cleaved into mature miRNAs of 21 to 23 nucleotides ( Fig. 1-11 ). miRNAs function by base-pairing with specific mRNAs to inhibit translation or to promote mRNA degradation. In the context of cancer, miRNAs could act in concert with other effectors such as p53 to inhibit inappropriate cell proliferation. A global decrease in miRNA levels is often observed in human cancers, indicating that small RNAs could have an intrinsic function in tumor suppression. The utility of monitoring the expression of miRNAs in human cancer is just now being explored, but preliminary findings reveal an extraordinary level of diversity in miRNA expression across cancers and a large amount of diagnostic information encoded in a relatively small number of miRNAs. Significant technologic advances facilitating the profiling of miRNA expression patterns in normal and cancer tissues hint that miRNA expression signatures may be unexpectedly more reliable than the corresponding signatures of protein-coding genes in classifying cancer types. Along with their potential diagnostic value, miRNAs are also being tested for their prognostic use in predicting clinical behaviors of cancer patients.
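Base-pairing between an miRNA and its target can be sketched as a search for complementary sites. The example below makes assumptions that go beyond the text: it considers only perfect Watson-Crick pairing to a "seed" region (nucleotides 2 to 8 of the miRNA), a convention common in target prediction, and it uses an illustrative 22-nucleotide miRNA and a hypothetical mRNA fragment.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence (5' to 3')."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))

def seed_sites(mirna, mrna, seed_start=1, seed_end=8):
    """Find 0-based mRNA positions that pair perfectly with the miRNA seed.

    The seed is taken as miRNA nucleotides 2-8 (Python slice 1:8),
    an assumed convention; G:U wobble pairs are not considered.
    """
    target = reverse_complement(mirna[seed_start:seed_end])
    width = len(target)
    return [
        i for i in range(len(mrna) - width + 1)
        if mrna[i:i + width] == target
    ]

mirna = "UGAGGUAGUAGGUUGUAUAGUU"      # illustrative 22-nucleotide miRNA
mrna = "AAGCUACCUCAGGGACUACCUCUU"     # hypothetical mRNA fragment

sites = seed_sites(mirna, mrna)
print(sites)
```

A sequence match of this kind only nominates a candidate target; whether translation is actually repressed must be tested experimentally.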


Figure 1-11  MicroRNA production and gene regulation in animal cells. Mature functional microRNAs of approximately 22 nucleotides are generated from long primary microRNA (pri-microRNA) transcripts. First, the pri-microRNAs, which usually contain a few hundred to a few thousand base pairs, are processed in the nucleus into stem-loop precursors (pre-microRNA) of approximately 70 nucleotides by the RNase III endonuclease Drosha and DiGeorge syndrome critical region gene 8 (DGCR8). The pre-microRNAs are then actively transported into the cytoplasm by exportin 5 and Ran-GTP and further processed into small RNA duplexes of approximately 22 nucleotides by the Dicer RNase III enzyme and its partner Loquacious (Loqs), a homolog of the human immunodeficiency virus transactivating response RNA-binding protein (TRBP). The functional strand of the microRNA duplex is then loaded into the RNA-induced silencing complex (RISC). Finally, the microRNA guides the RISC to the target messenger RNA (mRNA) for translational repression or degradation of mRNA.  (Adapted from Chen C-J: MicroRNAs as oncogenes and tumor suppressors. N Engl J Med 2005;353:1768–1771.)




Although Northern blot analysis is a reliable technique to detect gene expression at the mRNA level, it has some limitations, such as unequal hybridization efficiency of individual probes and difficulty in detecting multiple miRNAs simultaneously. For cancer studies, it is important to be able to compare the expression pattern of all known miRNAs between cancer cells and normal cells. Thus, DNA microarrays are used to detect the full complement of miRNA expression at a single point in time. Since probe specificity in miRNA microarray analysis can be problematic, owing to the small target size, hybridization can first be performed in solution and then quantified by using multicolor flow sorting. Real-time PCR can also be employed to quantify specific miRNA sets or to capture a more detailed picture of their changing expression profiles in tumor progression. Identification of the miRNAs that are involved in tumor pathogenesis and elucidation of their action in a specific cancer will be the next necessary steps for their manipulation in a therapeutic setting.


The term proteome describes the entire complement of proteins that are expressed by the genome of a cell, tissue, or organism. More specifically, it is used to describe the set of all the expressed proteins at a given time point in a defined setting, such as a tumor. Like RNA transcription, the synthesis of proteins is a highly regulated process that contributes to the specific proteome of a particular cell and can be perturbed in diseases such as cancer.

Advances in protein analytical techniques over the last decade have progressed to the point at which even small numbers of specific proteins expressed in tissues can be used to predict the prognosis of a cancer. The improvement of protein-based assays has made it possible to identify and examine the expression of most proteins and to envision large-scale protein analysis on the level of gene-based screens. Various systematic methodologies have contributed to the current explosion of information on the proteome and are being compared for their ability to provide suitable platforms for generating databases on protein structural features, interaction maps, activity profiles, and regulatory modifications.

The yeast two-hybrid system is a popular genetics-based approach for detecting protein-protein interactions inside a cell ( Fig. 1-12 ). One protein that is fused to the DNA-binding domain (bait) and a different protein that is fused to the activation domain of a transcriptional activator (prey) are expressed together in yeast cells. If the bait and prey interact, transcription of a reporter gene is induced and detected, typically by a color reaction that reflects the transactivation of the reporter gene and, by proxy, the interaction of the two test proteins. The method can also be used for large-scale protein interactions, RNA-protein interactions, and protein-ligand binding.


Figure 1-12  Exploring protein-protein interactions with the yeast two-hybrid system. Two-hybrid technology exploits the fact that transcriptional activators are modular in nature. Two physically distinct functional domains are necessary to get transcription: a DNA-binding domain that binds to the DNA of the promoter and an activation domain that binds to the basal transcription apparatus and activates transcription. A, The known gene encoding protein A is cloned into the “bait” vector, fused to the gene encoding a DNA-binding domain from some transcription factor. When placed into a yeast system with a reporter gene, this fusion protein can bind to the reporter gene promoter, but it cannot activate transcription. B, Separately, a second gene (or a library of cDNA fragments encoding potential interactors), protein B, is cloned into the “prey” vector, fused to an activation domain of a different transcription factor. When placed into a yeast strain containing the reporter gene, it cannot activate transcription because it has no DNA-binding domain. C, When the two vectors are placed into the same yeast, a transcription factor is formed that can activate the reporter gene if protein B, made by the second plasmid, binds to protein A. D, Screening a yeast two-hybrid library. The plate on the left holds 96 different yeast strains in patches (or colonies), each of which expresses a different bait protein (top). The plate on the right holds 96 patches, each of the same yeast strain (prey strain) that expresses a protein fused to an activation domain (prey). The plate of bait strains and the plate of prey strains are pressed to the same replica velvet, and the impression is lifted with a plate containing YPD medium. After 1 day of growth on the YPD plate, during which time the two strains mate to form diploids, the YPD plate is pressed to a new replica velvet, and the impression is lifted with a plate containing diploid selection medium and an indicator such as X-Gal. 
Blue patches (dark spots) on the X-Gal plate indicate that the lacZ reporter is transcribed, suggesting that the prey interacts with the bait at that location.  (C, Text after http://www.invitrogen.com/catalog_project/cat_hybrid.html [July, 2000]; figure retrieved from http://www.nature.com/nature/journal/v403/n6770/pdf/403601a0.pdf. D, From Bartel PL, Fields S (eds): The Yeast Two-Hybrid System. New York, Oxford University Press, 1997; Finley RL Jr, Brent R: Two-hybrid analysis of genetic regulatory networks. Retrieved from http://www.genetics.wayne.edu/finlab/YTHnetworks.html.)




As a complementary proteomics tool, mass spectrometry provides accurate mass measurement of charged peptides, such as those isolated by two-dimensional gel electrophoresis. The instrument measures the mass-to-charge ratio of ionized samples under vacuum, which can be used to determine the sequence identity of peptides. Combined with a specific proteolytic cleavage step, mass spectrometry can be used for peptide mass mapping. Automation of this process has made mass spectrometry the analytic tool of choice for many proteomics projects.
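Peptide mass mapping can be sketched as an in silico digest followed by a mass calculation. The example assumes trypsin as the protease (cleavage after lysine or arginine unless the next residue is proline), which the text does not specify, and uses a hypothetical protein sequence. The residue masses are standard monoisotopic values in daltons.

```python
# Standard monoisotopic residue masses (Da).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free N- and C-termini

def tryptic_digest(protein):
    """Cleave after K or R, except when the next residue is P (assumed rule)."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides

def peptide_mass(peptide):
    """Neutral monoisotopic mass of a peptide in daltons."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER

protein = "MAGKRPSLKGER"  # hypothetical sequence
peptides = tryptic_digest(protein)
for pep in peptides:
    print(pep, round(peptide_mass(pep), 4))
```

In practice, the list of predicted masses for every protein in a database is compared with the measured masses, and the protein whose predicted peptides best match the spectrum is reported as the likely identity.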

Monoclonal antibodies (mAbs) have been a cornerstone of protein analysis in cancer research and more recently have risen to prominence as cancer therapeutics based on their exquisite specificity for protein targets and their potent interference with protein function. Laboratory mice have been the animal model of choice for generating a ready source of diverse high-affinity and high-specificity mAbs; however, the use of rodent antibodies as therapeutic agents has been restricted by the inherent immunogenicity of mouse proteins in a human setting. The more recent application of transgenic mouse technology to introduce variable regions encoded by human sequences into the corresponding mouse immunoglobulin genes has enabled the generation of “humanized” therapeutic mAbs with reduced immunogenicity. Many of these mAb-based agents are currently in trial or in use as therapeutics for cancer, and the potential for further optimization of mAbs through genetic engineering promises to open new avenues for in vivo therapy.

From an epigenetics perspective, new techniques are enabling the genome-wide characterization of protein-DNA interactions that can uncover novel transcription factor targets, histone modifications, and DNA methylation patterns within a cancer cell. Combining chromatin immunoprecipitation (ChIP) with microarray analysis (ChIP-on-chip) allows genome-wide screening for the binding position of protein factors to their gene targets. In ChIP-on-chip assays, a cross-linking reagent is applied in vivo to fix proteins to the DNA with which they are associated in the nucleus; the resulting complexes can then be immunoprecipitated with specific antibodies to the protein under analysis. The bound DNA and appropriate controls are then fluorescently labeled and applied to microscope slides for microarray analysis, rendering a simultaneous profile of all the binding positions of specific proteins in the cancer cell's genome ( Fig. 1-13 ).

After a decade of development, proteomics is still primarily a basic research activity, yet in the near future, this technology is likely to have a profound impact on medicine. By defining the collective protein-protein interactions in a cancer cell (its “interactome”), functional relationships between disease-promoting genes could be revealed that would provide novel candidates for intervention. Networks of disorder-gene associations are already being built that offer a platform for describing all known phenotype and disease gene associations, often indicating the common genetic origin of many diseases. A precise diagnosis of cancer using proteomics could be envisioned, based on highly discriminating patterns of proteins in easily accessible patient samples. Proteomics information also promises to provide sophisticated mathematical models of the molecular events underlying a process as complex as neoplastic transformation, which will capture the dynamics of the disease with unprecedented power.


Once the mechanistic underpinnings of a particular cancer have been described, creating an animal model to test that mechanism becomes critical to understanding the pathophysiology and to designing therapeutic strategies for treatment. Recent advances in manipulating the mouse genome have resulted in more sophisticated models of human cancer. These methodologies can circumvent embryonic death by targeted alteration of gene expression only after a critical period in development and reduce the complexity of gene functional analysis by restricting its pattern of activation. Inducible gene expression or silencing also allows acute, as opposed to chronic, effects to be assessed.

Integrating an oncogene that causes malignancy into the genome of a mouse without altering the mouse's own genes generates a transgenic, cancer-prone mouse that transmits this trait to its offspring with a dominant pattern of inheritance. Although species differences in tumor susceptibility and disease remission exist between mouse and human, the tools for genetic manipulation in mouse are superior to those in other mammals, and useful information about the function of oncogenes can be gained by targeted expression of mutant protein products in mouse tissues.

The technology for producing transgenic mice joins recombinant DNA methodology with standard techniques that are used today by in vitro fertilization clinics, relying on our understanding of mammalian reproduction and the development of protocols to harvest, manipulate, and reimplant eggs and early embryos ( Fig. 1-14 ). The transgene is constructed so that the gene product will be expressed under appropriate spatial and temporal control. In addition to all the standard signals that are necessary for efficient transcription and translation of the gene, transgenes contain a promoter, or regulatory region, that drives transcription in either a ubiquitous or tissue-restricted pattern. This requires an extensive knowledge of genetic regulation in the target cells. A recent advance that circumvents this requirement involves embedding the transgene inside another gene locus that is expressed in the desired pattern. Held in a bacterial artificial chromosome for easier manipulation, this long stretch of DNA surrounding the host gene is likely to carry all the necessary regulatory information to guarantee a predictable expression pattern of the introduced transgene.


Figure 1-14  Generation of transgenic mice. The transgene containing the DNA sequences necessary for the expression of a functional protein is injected into the male (larger) pronucleus of uncleaved fertilized eggs through a micropipette. The early embryos are then transferred into the reproductive tract of a female mouse that has been rendered “pseudopregnant” by hormonal therapy. The resulting pups (founders) are tested for incorporation of the transgene by assaying genomic DNA from their tails. Founder animals that have incorporated the transgene (+) are mated with nontransgenic mice, and their offspring are mated with each other to confirm germline integration and to establish a line of homozygous transgenic mice. Several transgenic lines that have incorporated different numbers of transgenes at different integration sites (and thus express various amounts of the protein of interest) are usually studied. UT, untranslated.  (Adapted from Shuldiner AR: Transgenic animals. N Engl J Med 1996;334:653–655.)




The transgene DNA is then injected into the male pronucleus of a fertilized mouse egg, obtained from a female mouse in which hyperovulation has been hormonally induced. The injected eggs are cultured to the two-cell stage and then implanted in the oviduct of another recipient female mouse. Transgenic pups are identified by the presence of the transgene in their genomic DNA (obtained from the tip of the tail and analyzed by PCR). Typically, several copies of the transgene are incorporated in a head-to-tail orientation into a single random site in the mouse genome. About 30% of the resulting pups will have integrated the transgene into their germline DNA and constitute the founders of the transgenic lines. RNA analysis of their progeny determines the level of transgene expression and whether the transgene is being expressed in the desired location or at the appropriate time. Given the variability in transgene number and chromosomal location, transgene expression patterns and levels can diverge considerably among different founder lines carrying the same transgene.

In general, transgenesis is optimal for modeling oncogenic mutations that cause a gain of function, producing disease even when they occur in only one of a gene's two alleles. For example, an activating mutation in a growth factor that causes abnormal cell proliferation can be mimicked by introducing a transgenic version of the mutated growth factor gene under the control of an appropriate regulatory sequence for expression in the tissue of interest. The relative susceptibility of such a transgenic mouse to tumorigenesis can help to distinguish between a primary and secondary role of the mutant factor, and established lines of these animals can be used for testing new therapeutic protocols.

The genetic construction of cancer-prone transgenic mice with the capacity to induce oncogene expression in vivo provides a new avenue to modeling the role of oncogenes in tumor generation and maintenance. This technology relies on conditional mutagenesis.

Producing conditional mutations in mice requires a DNA recombinase enzyme that does not recognize any mouse sequence but rather targets short, foreign recognition sequences to catalyze recombination between them. By strategic placement of these recognition sequences in appropriate orientations either beside or within a mouse gene, the recombination results in deletion, insertion, inversion, or translocation of associated genomic DNA ( Fig. 1-15 ). Two recombinase systems are currently in use: the Cre-loxP system from bacteriophage P1 and the Flp-FRT system from yeast. The 34–base-pair loxP or FRT recognition sequences do not occur in the mouse genome, and both Cre and Flp recombinases function autonomously, without the need for cofactors. Cre- or Flp-mediated recombination is not distance or cell-type dependent and can occur in proliferating or differentiated tissues.
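The dependence of the outcome on site orientation can be illustrated with a toy model. The sketch represents a locus as an ordered list of labeled segments, with the two loxP sites written as "loxP>" or "<loxP" to indicate direction; the segment names and the function are hypothetical, and only deletion (sites in the same orientation) and inversion (sites in opposite orientation) on a single DNA molecule are modeled.

```python
def recombine(segments, cre_present=True):
    """Toy model of Cre acting on a locus that carries two loxP sites.

    Same orientation: the DNA between the sites is excised and one
    site remains. Opposite orientation: the intervening DNA is inverted.
    """
    sites = [i for i, s in enumerate(segments) if s in ("loxP>", "<loxP")]
    if not cre_present or len(sites) != 2:
        return list(segments)  # no recombinase or no site pair: unchanged
    first, second = sites
    if segments[first] == segments[second]:
        # Direct repeats ("floxed" DNA): deletion of the flanked segment.
        return segments[:first + 1] + segments[second + 1:]
    # Inverted repeats: reverse the order of the flanked segments.
    inverted = segments[first + 1:second][::-1]
    return segments[:first + 1] + inverted + segments[second:]

floxed = ["promoter", "loxP>", "exon2", "loxP>", "exon3"]
inverted_sites = ["promoter", "loxP>", "a", "b", "<loxP", "tail"]

print(recombine(floxed))                     # flanked exon removed
print(recombine(floxed, cre_present=False))  # locus intact without Cre
print(recombine(inverted_sites))             # flanked region inverted
```

The second call makes the conditional logic explicit: the flanked gene is lost only in cells in which the recombinase is expressed.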


Figure 1-15  Conditional mutagenesis schemes demonstrated with the Cre-loxP system. A, Two mouse lines are required for conditional gene deletion: a conventional transgenic mouse line with Cre targeted to a specific tissue or cell type and a mouse strain that embodies a target gene (endogenous gene or transgene) flanked by two loxP sites in a direct orientation (“floxed gene”). Recombination (excision and consequently inactivation of the target gene) occurs only in cells that express Cre recombinase. Hence, the target gene remains active in all cells and tissues that do not express the Cre recombinase. B, The Z/EG double reporter system. These transgenic mice constitutively express lacZ under the control of the cytomegalovirus enhancer/chicken actin promoter. Expression is widespread, notable exceptions being liver and lung tissue. Expression is observed throughout all embryonic and adult stages. When crossed with a Cre recombinase-expressing strain, lacZ expression is replaced with enhanced green fluorescent protein expression in tissues that express Cre. This double reporter system makes it possible to distinguish a lack of reporter expression from a lack of Cre recombinase expression while providing a means to assess Cre excision activity in live animals and cells.  (A, Courtesy of Kay-Uwe Wagner, National Institutes of Health. B, From Novak A, Guo C, Yang W, et al: Z/EG, a double reporter mouse line that expresses enhanced green fluorescent protein upon Cre-mediated excision. Genesis 2000;28:147–155.)




The general scheme involves two mouse lines, one carrying the recombinase either as a transgene driven by inducible regulatory elements or knocked into one allele of a gene expressed in the desired tissue. The other mouse line harbors a modified gene target, including recognition sequences. Mating the two lines results in progeny carrying both the target gene and the recombinase, which interacts with the target gene only in the desired tissue.

A popular conditional methodology is based on the activation of nuclear hormone receptors to control gene expression. Two current systems involve activation of a mammalian estrogen receptor by the estrogen analog 4-hydroxytamoxifen or activation of an insect hormone receptor by its corresponding ligand, ecdysone. Although several variations on these hormone-receptor systems are currently in use, the underlying principle is the same. The gene encoding Cre recombinase or another regulatory protein, such as a transcription factor, is fused with the sequence encoding the ligand-binding domain of a nuclear hormone receptor protein. The resulting chimeric transgene is placed under the control of a promoter that directs expression to the tissue of interest, and transgenic animals are generated. In the absence of the hormone or an analog, the fusion protein accumulates in the desired tissue but is rendered inactive through its association with resident heat shock proteins. Hormone administered either systemically or topically binds to the ligand-binding domain moiety of the fusion protein, dissociates it from the heat shock protein, and allows the regulatory component to find its natural DNA targets and promote loxP-mediated recombination or, in the case of an inducible transcription factor, activate expression of the corresponding genes. If the ligand-binding domain is fused to a recombinase, administration of hormone leads to the rearrangement of target sequences. This reaction is not reversible but lends additional temporal control over recombinase-based mutation. If the ligand-binding domain is fused to a transcription factor, removal of hormone leads to inactivation of the fusion protein and gene downregulation.

Another inducible method in use is the tetracycline (tet) regulatory system. In the classic design (tTA or tet-off), a fusion protein combining a bacterial tet repressor and a viral transactivation domain drives expression of the target transgene by binding to upstream tet operator sequences flanking the transgene transcription start site. In the presence of the antibiotic inducer, the fusion protein is dissociated from the operator sequences, inactivating the transgene. In a complementary design, called reverse tTA (rtTA or tet-on), structural modification of the tet repressor makes the antibiotic an active requirement for binding of the fusion protein to the operator sequences, such that its administration activates transgene expression at any time during the life span of the mouse, whereas withdrawal results in downregulation of the gene. It is important that the transgene integrate into a genomic locus that permits proper tTA or rtTA regulation so that the system exhibits minimal intrinsic leakiness and good antibiotic responsiveness.
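The opposite behavior of the two designs can be summarized as a small truth table. The function and argument names below are hypothetical, and the model is deliberately idealized: it ignores the leakiness and dose dependence mentioned above and treats expression as simply on or off.

```python
def transgene_expressed(system, antibiotic_present):
    """Idealized on/off logic of the tetracycline regulatory systems.

    "tet-off" (tTA): the fusion protein binds the operator, and drives
    expression, only when the antibiotic is absent.
    "tet-on" (rtTA): the fusion protein binds the operator only when
    the antibiotic is present.
    """
    if system == "tet-off":
        return not antibiotic_present
    if system == "tet-on":
        return antibiotic_present
    raise ValueError(f"unknown system: {system}")

for system in ("tet-off", "tet-on"):
    for antibiotic in (False, True):
        state = "ON" if transgene_expressed(system, antibiotic) else "OFF"
        print(f"{system:8s} antibiotic={antibiotic!s:5s} transgene {state}")
```

Which design is preferable depends on the experiment: tet-off requires continuous antibiotic exposure to keep a transgene silent, whereas tet-on requires exposure only during the window in which expression is wanted.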

Conditional expression systems have already been developed to generate hematopoietic, leukemogenic, and lymphomagenic mutations in the mouse, as well as solid tumors. These inducible cancer models can be exploited to identify oncogenic signals that influence host-tumor interactions, to establish the role of a given oncogenic lesion in advanced tumors, and to evaluate therapies targeted toward cancer-causing mutations. Potential clinical applications of inducible systems include targeting virally delivered transgene expression to malignant tissues by the use of specific inducible regulatory elements, restricting the expression of transgenes exclusively to affected tissues, and increasing the therapeutic index of the vectors, particularly in the context of solid tumors. In all cases, a basic knowledge of the specific mutations that are involved in the molecular genetics of malignancies is required, since it is often unclear whether the causal mutation underlying the genesis of neoplasia continues to play a central role in the progression to the fully transformed state. This is particularly important in modeling cancers that are characterized by genetic plasticity, in which drug resistance can arise subsequent to primary tumor formation.


In contrast to dominantly acting oncogenes, recessive genetic disorders, such as loss-of-function mutations in a tumor suppressor gene, require both copies (alleles) of a gene to be inactivated. The methods that are needed to produce animal models of recessive genetic disease differ from those that are used in studying dominant traits. Gene knockout technology has been developed to generate mice in which one allele of an endogenous gene is removed or altered in a heritable pattern ( Fig. 1-16 ). Gene disruption or replacement is first engineered in pluripotent cells, termed embryonic stem (ES) cells, which are genetically altered by introduction of a replacement gene that is inactive or mutant.


Figure 1-16  Gene knockout strategy for generating mice that lack a tumor suppressor gene. Embryonic stem cells (upper left panel) contain the tumor suppressor cellular gene (upper right panel), which consists of exon 1 (olive green, a 5′ noncoding region), an intron, and exon 2 (red, a protein-coding region, and yellow, a 3′ noncoding region). A knockout vector consisting of a collinear assembly of a DNA flanking segment 5′ to the cellular gene (blue), the phosphoglycerate kinase-bacterial neomycin gene (pgk-neo, violet), a 3′ segment of the cellular gene (yellow), a DNA flanking segment 3′ to the cellular gene (green), and the phosphoglycerate kinase-viral thymidine kinase gene (pgk-tk, orange) is created and introduced into the embryonic stem cell culture. Double recombination occurs between the cellular gene and the knockout vector in the 5′ homologous regions and the 3′ homologous regions (dashed lines), resulting in the incorporation of the inactive knockout vector, including pgk-neo but not pgk-tk, into the cellular genomic locus of the embryonic stem cell. The presence of pgk-neo and the absence of pgk-tk in these replaced genes will allow survival of these embryonic stem cells after positive-negative selection with neomycin and ganciclovir. The clone of mutant embryonic stem cells is injected into a host blastocyst, which is implanted into a pseudopregnant foster mother and subsequently develops into a chimeric offspring (bottom panel). The contribution of the embryonic stem cells to the germ cells of the chimeric mouse results in germline transmission of the embryonic stem cell genome to offspring that are heterozygous for the mutated tumor suppressor allele. The heterozygotes are mated to produce mutant, cancer-prone mice that are homozygous for tumor suppressor deficiency.  (Modified from Majzoub JA, Muglia LJ: Knockout mice. N Engl J Med 1996;334:904–906.)




To reduce random integration of the foreign DNA, the replacement gene is embedded into a long stretch of DNA from its native locus in the mouse, which targets the recombination event to the homologous position in the ES cell genome. Inclusion of selectable markers along with the replacement gene allows selection of the cells in which homologous recombination has taken place. Site-specific recombinase systems combined with gene-targeting techniques in ES cells can also be used to induce recessive single point mutations or site-specific chromosomal rearrangements in a tissue- and time-restricted pattern. In a variation on this theme called knock-in, a foreign gene, such as one encoding a marker, can be placed in the locus of an endogenous gene. The engineered ES cells are then microinjected into the cavity of an intact mouse blastocyst sufficiently early in gestation that these cells can, in principle, populate all the tissues of the developing chimeric embryo. In practice this is rarely the case, so the contribution of ES cells to the resulting animal is most often assessed by using ES cells and blastocysts whose genes for coat color differ.
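The positive-negative selection described in Figure 1-16 can be summarized as a simple logical rule: a cell survives only if it carries pgk-neo and lacks pgk-tk. The following Python fragment is a minimal sketch of that rule, not a model of any laboratory protocol. The class name `ESCell`, the function `survives_selection`, and the three example cells are hypothetical, and the sketch assumes that a random integration event retains the whole vector, including pgk-tk, which is a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ESCell:
    """Hypothetical record of which vector markers an ES cell has integrated."""
    label: str
    has_neo: bool  # pgk-neo present: cell tolerates neomycin (positive selection)
    has_tk: bool   # pgk-tk present: cell is killed by ganciclovir (negative selection)


def survives_selection(cell: ESCell) -> bool:
    """Return True if the cell survives both neomycin and ganciclovir."""
    survives_positive = cell.has_neo      # cells without neo die in neomycin
    survives_negative = not cell.has_tk   # cells with tk die in ganciclovir
    return survives_positive and survives_negative


# Three idealized outcomes after the knockout vector is introduced.
cells = [
    ESCell("no integration", has_neo=False, has_tk=False),
    # Assumption: random integration keeps the entire vector, including pgk-tk.
    ESCell("random integration", has_neo=True, has_tk=True),
    # Double recombination within the homologous arms excludes pgk-tk.
    ESCell("homologous recombination", has_neo=True, has_tk=False),
]

survivors = [cell.label for cell in cells if survives_selection(cell)]
print(survivors)
```

Under these assumptions, only the homologous recombinant passes both selections. Because some randomly integrated copies can lose pgk-tk, surviving clones are enriched for, rather than guaranteed to carry, the targeted allele.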

If the ES cells contribute to the germ cells of the founder mouse, their entire haploid genome can be passed on to subsequent generations. By mating subsequent progeny of the founder mouse, both alleles of the mutated gene can be passed to a single animal. Overlapping genetic functions can also be defined by crossbreeding mice with mutations in different genes. In this way, it is possible to study the combinatorial effects of oncogene and tumor suppressor gene mutations.
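The breeding step can be illustrated with a short calculation. The Python sketch below enumerates the offspring of a cross between two mice that are heterozygous for the mutated allele, assuming simple Mendelian segregation at a single locus and equal viability of all genotypes. The function name `cross` and the allele symbols `+` (wild type) and `-` (knockout) are illustrative choices, not notation from the text.

```python
from collections import Counter
from itertools import product


def cross(parent_a: str, parent_b: str) -> Counter:
    """Count offspring genotypes for a single-locus cross.

    Each parent is a two-character string of alleles, for example '+-'.
    Every pairing of one allele from each parent is treated as equally likely.
    """
    offspring = Counter()
    for allele_a, allele_b in product(parent_a, parent_b):
        # Sort so that '+-' and '-+' are counted as the same genotype.
        genotype = "".join(sorted(allele_a + allele_b))
        offspring[genotype] += 1
    return offspring


het_intercross = cross("+-", "+-")
total = sum(het_intercross.values())
for genotype, count in sorted(het_intercross.items()):
    print(genotype, f"{count}/{total}")
```

Under these assumptions, one quarter of the progeny are expected to be homozygous for the mutated allele. Observed ratios can depart from this expectation, for example when loss of both alleles impairs embryonic development.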

These experimental systems are of great value in dissecting the pathogenesis of many tumor types. In some knockout studies, the phenotype of the mutated gene is anticipated by prior knowledge of the gene's function. However, unexpected mutant phenotypes may help clarify the mechanism of the underlying neoplasia. Pharmacologic manipulation of transgenic knockout animal models of cancer will prove useful in screening therapeutic agents with potential for study in clinical trials. Therapy involving gene or cell replacement can also be tested in genetically engineered disease models.

Several caveats are important in considering the use of knockout technology. Most knockout mutations are loss-of-function (null) germline mutations. Inactivation of widely expressed genes with multiple functions may have complex phenotypes. Conversely, if the functions of two genes overlap, a mutation in one of the genes might not produce an abnormal phenotype, owing to compensation by the unaltered partner.

Perhaps the greatest drawback of conventional knockout technology derives from the disruption of gene function at the earliest stage of its expression. If the gene has a vital developmental role, its functions later in development can be obscured. Therefore, although the generation of a null mutation is an excellent starting point for analysis, it is far from being functionally exhaustive. For these reasons, conditional mutagenesis is the method of choice for elucidating the functions of genes that exert pleiotropic effects in a variety of cell types and tissues throughout the life of the animal, which is particularly relevant for the generation of mouse models of adult-onset diseases such as cancer.

By using the recombinase-mediated gene mutation described previously for conditional transgenesis, conditional knockout mutations can be designed to disrupt the function of a target gene in a specific tissue (spatial control) and/or life stage (temporal control). Depending on the design of the experiment, recombinase action can delete an entire gene, remove blocking sequences to induce gene expression, or rearrange chromosomal segments. With the advent of recent internationally coordinated systematic mutagenesis programs that aim to place a conditional inactivating mutation in each of the approximately 20,000 genes in the mouse genome, the possibilities for modeling cancer are limited only by a researcher's choice of the gene loci to test. The constantly evolving techniques for gene manipulation in vivo constitute a major advance in cancer research. They promise to integrate the underlying molecular biological principles of malignancy with their pathophysiologic consequences, generating an invaluable resource for understanding the complex genetics of tumor formation and for improving the treatment of human cancer.
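The spatial and temporal control described above can be expressed as a small decision rule. The following Python sketch is an idealized illustration only. It assumes that both alleles of the target gene carry recombinase target sites, that the recombinase is expressed in a single tissue from a given age onward, and that recombination is complete and irreversible once the recombinase is active. The names `CellContext`, `recombinase_active`, and `gene_functional`, as well as the tissue names and the induction age, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CellContext:
    """Hypothetical description of where and when a cell is examined."""
    tissue: str
    age_weeks: int


def recombinase_active(ctx: CellContext, driver_tissue: str, induction_week: int) -> bool:
    """Recombinase acts only in the chosen tissue (spatial) after induction (temporal)."""
    in_target_tissue = ctx.tissue == driver_tissue
    after_induction = ctx.age_weeks >= induction_week
    return in_target_tissue and after_induction


def gene_functional(ctx: CellContext, driver_tissue: str, induction_week: int) -> bool:
    """Assumption: the conditional allele works normally until recombination deletes it."""
    return not recombinase_active(ctx, driver_tissue, induction_week)


# Example: deletion restricted to liver and induced at 8 weeks of age.
contexts = [
    CellContext("liver", 4),
    CellContext("liver", 12),
    CellContext("lung", 12),
]
status = {
    (ctx.tissue, ctx.age_weeks): gene_functional(ctx, "liver", 8)
    for ctx in contexts
}
print(status)
```

In this idealized example, the gene is lost only in adult liver, so its earlier developmental role and its role in other tissues are left intact. This is the property that makes conditional mutations suited to modeling adult-onset disease.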

