Pocket Oncology (Pocket Notebook Series), 1st Ed.

GENE EXPRESSION PROFILING

Chung-Han Lee

Gene Expression Profiling

• Goal: Get a global understanding of cellular functions by the simultaneous measurement of thousands of genes

• Rationale:

DNA is shared among multiple cell types; however, cell fates & physiology vary

Protein expression & post-translational modification ultimately decide physiology; however, global assessment of protein levels & states remains impractical

Measuring RNA levels can serve as a proxy for measuring proteins

• Assumptions:

RNA levels correlate w/protein expression (ie, mRNA transcribed is proportional to protein translated)

Protein levels drive activity more than post-translational modification does

Identified RNA → unique protein (ie, assay distinguishes between alternative splice forms)

• Basic Experimental Design:

Two or more experimental conditions are designed

Samples from the experimental conditions are assayed

Statistical comparisons are made between expression levels of thousands of genes

Individual genes or clusters of related genes are used to define the effects of the experimental conditions

• Limitations:

Expense of assay limits numbers of samples → limits statistical power

Different genes may have different thresholds for significant changes (eg, changes in PTEN mRNA levels more significant than changes in actin mRNA levels)

For tightly regulated proteins, constant mRNA levels may not reflect protein levels

Proteins in signal transduction pathways exist in on/off states related to post-translational modification, localization, & binding partners; mRNA levels give no info regarding those states

DNA Microarrays

• AKA: Gene chips

• The principle:

Probes = short unique seq of DNA designed to bind specific cDNA or mRNA

Targets = cDNA or mRNA from genes of interest

A microfluidic chip is designed w/DNA probes placed at known locations

Samples are placed on chip to allow probes to capture targets

Binding of probes & targets is assayed
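
The probe/target principle above can be sketched in a few lines of Python. This is a toy model only: it assumes binding means perfect complementarity, whereas real hybridization is thermodynamic & tolerates mismatches. The names `reverse_complement`, `probe_captures`, & `read_chip`, the 7-base probes, & the sample sequences are all hypothetical.

```python
# Toy model of probe/target capture on a chip (illustrative assumption:
# a probe "binds" only if it is perfectly complementary to part of the target).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def probe_captures(probe, target):
    """True if the probe is perfectly complementary to a stretch of the target."""
    return reverse_complement(probe) in target

def read_chip(chip, targets):
    """For each known chip location, count the targets captured by its probe."""
    return {
        location: sum(probe_captures(probe, t) for t in targets)
        for location, probe in chip.items()
    }

# Probes placed at known (row, column) locations; sequences are made up.
chip = {(0, 0): "GATTACA", (0, 1): "CCCGGGA"}
# cDNA fragments in the sample; the first two contain the site for probe (0, 0).
targets = ["AATGTAATCGG", "TTTGTAATCAA", "ACGTACGT"]
signal = read_chip(chip, targets)
```

In this sketch the count at each location stands in for the measured signal; a location w/no complementary target reads zero.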

RNA-Seq

• AKA: Whole transcriptome seq

• The principle:

Build a cDNA library: coding RNAs identified by 3’ polyA tail; ribosomal RNA removed by collecting only RNAs containing polyA tail; reverse transcription generates cDNA

cDNA is sequenced using next-generation sequencing technology

Data Analysis

Controversial & still subject to research

Fold change as cutoff: Easiest, but arbitrary & lacks biologic rationale

Statistical testing such as ANOVA: Complicated by large numbers of genes involved (eg, p-value < 0.01, examining 10,000 genes → ~100 false positives by chance alone)

Q-value: Analogue of p-value under false discovery rate (FDR) control; FDR procedure proposed by Yoav Benjamini & Yosef Hochberg, q-value by John Storey; helps balance tradeoff between power & error
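
The multiple-testing problem & an FDR-style correction can be illustrated w/a short Python sketch. It assumes the Benjamini-Hochberg step-up adjustment; BH-adjusted p-values are often used like q-values but are not identical to Storey's estimate. The function names & the five example p-values are hypothetical.

```python
def expected_false_positives(n_genes, alpha):
    """Expected number of genes passing the cutoff by chance if none truly differ."""
    return n_genes * alpha

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# 10,000 genes tested at p < 0.01: about 100 pass by chance alone.
by_chance = expected_false_positives(10000, 0.01)

# Made-up p-values for five genes.
p = [0.001, 0.008, 0.039, 0.041, 0.6]
adj = benjamini_hochberg(p)
significant = [a <= 0.05 for a in adj]
```

In this example four of the five raw p-values fall below 0.05, but only the two smallest remain at or below 0.05 after adjustment.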

• Principal Components Analysis (PCA)

Reduces dimensions of analysis by consolidating correlated variables

Constructs a smaller set of “independent” (uncorrelated) variables

Assumes directions w/low variance yield little info & discards them

Principal component is a normalized linear combination of original variables

Lossy & simplifies data for quick comparison of samples
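
A minimal sketch of how a first principal component can be obtained, using only the Python standard library. It assumes power iteration on the sample covariance matrix, which is one of several ways to compute PCA (production code would typically use an SVD routine). The function names & the 4-sample, 3-gene data are hypothetical.

```python
import math

def center(rows):
    """Subtract each gene's mean so every column has mean zero."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def covariance(centered):
    """Sample covariance matrix (genes x genes) of already-centered data."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
            for a in range(d)]

def first_principal_component(rows, iterations=200):
    """Return (unit-length loading vector, per-sample scores) for PC1."""
    x = center(rows)
    cov = covariance(x)
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d           # arbitrary unit-length starting vector
    for _ in range(iterations):            # power iteration: v <- cov.v / |cov.v|
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    scores = [sum(r[j] * v[j] for j in range(d)) for r in x]
    return v, scores

# Rows = samples, columns = genes. Genes 1 & 2 rise together; gene 3 barely varies.
samples = [[1.0, 2.0, 5.0], [2.0, 4.1, 5.1], [3.0, 5.9, 4.9], [4.0, 8.0, 5.0]]
loadings, scores = first_principal_component(samples)
```

Here the loadings weight the two co-varying genes heavily & the near-constant gene hardly at all, so each sample is summarized by a single score instead of three values.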

• Self-Organizing Maps

Nonlinear generalization of principal components analysis (PCA)

Originally adapted from unsupervised neural network learning algorithms

Map nodes compete to become the most representative node (best match) for each sample

Samples organize by similarity to the representative nodes
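
The competitive step can be sketched as a tiny one-dimensional self-organizing map in Python. This is an illustration under simplifying assumptions, not a reference implementation: nodes sit on a line, start evenly spaced between the per-gene minimum & maximum, & the learning rate & neighborhood radius decay linearly. All names, parameters, & data are hypothetical.

```python
import math

def nearest_node(weights, sample):
    """Index of the node whose weight vector is closest to the sample."""
    def sq_dist(w):
        return sum((a - b) ** 2 for a, b in zip(w, sample))
    return min(range(len(weights)), key=lambda j: sq_dist(weights[j]))

def train_som(samples, n_nodes=3, epochs=30, lr0=0.5, radius0=1.5):
    """Train a 1-D map (n_nodes >= 2) & return the node weight vectors."""
    dims = len(samples[0])
    lo = [min(s[d] for s in samples) for d in range(dims)]
    hi = [max(s[d] for s in samples) for d in range(dims)]
    # Deterministic start: nodes evenly spaced between per-gene min & max.
    weights = [[lo[d] + (hi[d] - lo[d]) * j / (n_nodes - 1) for d in range(dims)]
               for j in range(n_nodes)]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs
        lr = lr0 * frac                       # learning rate shrinks over time
        radius = max(radius0 * frac, 0.1)     # neighborhood shrinks over time
        for sample in samples:
            winner = nearest_node(weights, sample)   # competition step
            for j in range(n_nodes):
                # The winner & its map neighbors move toward the sample.
                influence = math.exp(-((j - winner) ** 2) / (2 * radius ** 2))
                for d in range(dims):
                    weights[j][d] += lr * influence * (sample[d] - weights[j][d])
    return weights

# Two low-expression & two high-expression samples (made-up values).
samples = [[1.0, 1.0], [1.2, 0.9], [9.0, 9.0], [8.8, 9.1]]
weights = train_som(samples)
assignment = [nearest_node(weights, s) for s in samples]
```

After training, similar samples share a best-matching node, so the low & high groups end up on different parts of the map.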

• K-means Clustering

Investigator identifies number of clusters before clustering (parameter k)

k initial means (centroids) are chosen

Genes or samples are assigned to the cluster w/the nearest mean

Means are recalculated based on new clustering

Process is repeated until results converge & clusters are stable

Choice of k is critical for correct clustering; however, the right value is difficult to predict
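
The assign/recalculate loop can be written out as a short Python sketch. It assumes squared Euclidean distance & takes the first k points as the initial means; other seeding schemes exist & can change the result. Function names & data are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_vector(points):
    """Component-wise mean of a non-empty list of profiles."""
    return [sum(col) / len(points) for col in zip(*points)]

def k_means(points, k, max_iter=100):
    """Return (cluster label per point, list of k cluster means)."""
    means = [list(p) for p in points[:k]]   # assumption: first k points seed the means
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster w/the nearest mean.
        new_labels = [min(range(k), key=lambda c: squared_distance(p, means[c]))
                      for p in points]
        if new_labels == labels:            # converged: assignments are stable
            break
        labels = new_labels
        # Update step: recalculate each mean from its current members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                means[c] = mean_vector(members)
    return labels, means

# Six made-up samples measured on two genes, w/k = 2 chosen in advance.
points = [[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 9.5], [1.2, 0.8], [8.5, 9.0]]
labels, means = k_means(points, k=2)
```

W/this seeding the second point starts in the wrong cluster & is reassigned once the means are recalculated, which is the behavior the iteration is meant to show.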

Figure 5-2 Sample of Hierarchical Clustering

• Hierarchical Clustering

Initially each sample (or gene) is considered its own “cluster”

Most similar clusters are combined until a single cluster remains

Produces a clustering tree (dendrogram) showing a hierarchy of clusters
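
A minimal agglomerative sketch of the merging process in Python. It assumes Euclidean distance w/average linkage; the notes do not specify a linkage rule, & single or complete linkage are common alternatives. The dendrogram is represented as nested tuples of sample indices; names & data are hypothetical.

```python
def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def average_linkage(c1, c2, profiles):
    """Mean pairwise distance between members of two clusters."""
    pairs = [(i, j) for i in c1 for j in c2]
    return sum(euclidean(profiles[i], profiles[j]) for i, j in pairs) / len(pairs)

def hierarchical_clustering(profiles):
    """Return (dendrogram as nested tuples, list of merges w/their distances)."""
    # Each cluster is (member indices, subtree); start w/one cluster per profile.
    clusters = [((i,), i) for i in range(len(profiles))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a][0], clusters[b][0], profiles)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best                       # the two most similar clusters
        merged = (clusters[a][0] + clusters[b][0], (clusters[a][1], clusters[b][1]))
        merges.append((clusters[a][0], clusters[b][0], d))
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)] + [merged]
    return clusters[0][1], merges

# Five made-up samples: two low, two high, one intermediate.
profiles = [[1.0, 1.0], [1.2, 0.9], [9.0, 9.0], [8.5, 9.0], [5.0, 5.0]]
tree, merges = hierarchical_clustering(profiles)
```

The merge distances recorded in `merges` correspond to branch heights in a dendrogram: the two tight pairs join first & the intermediate sample joins later.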

• Gene Expression Profiling and Oncology:

Expression profiling can help reclassify tumor types & correlate them w/physiologic properties

8000 individual genes examined in 60 cell lines

Gene expression patterns correlated w/cell of origin

Different patterns correlated to doubling time, drug metabolism, & interferon response (Nat Genet 2000;24:227)

• Hypoxic vs. Non-hypoxic Phenotype in Kidney CA:

Gene expression profiling of 91 ccRCC tumors

75/91 tumors upregulated genes associated w/hypoxia

49/75 hypoxic tumors have inactivating Mts in VHL

VHL helps degrade HIF

Loss of VHL → ↑ HIF → Hypoxia-dependent transcription → hypoxic phenotype (Nature 2010;463:360)

• Oncotype DX testing in breast CA:

Commercial clinical application of gene expression profiling

21 genes (16 CA-related genes, 5 reference genes) → recurrence score (RS)

In tamoxifen-treated node negative ER + breast CA → stratification into risk categories by 10-y distant recurrence rate

RS < 18 = 6.8%

18 ≤ RS < 31 = 14.3%

RS ≥ 31 = 30.5%

(NEJM 2004;351:2817)

RS predicts benefit of chemotherapy in node negative ER + breast CA

High RS (≥31) → RR 0.26

Low RS (<18) → RR 1.31

(JCO 2006;24:3726)

RS predicts prognosis & benefit of CAF in node positive ER + breast CA

HR 2.64, p < 0.05 for 50 point difference in RS

High RS (≥31) + CAF → HR 0.59, p < 0.05

Low RS (<18) + CAF → HR 1.02, p = 0.97

(Lancet Oncol 2010;11:55)