A Systems Biology Approach to Understanding Transcriptional Networks


Material Information

A Systems Biology Approach to Understanding Transcriptional Networks
Physical Description:
1 online resource (196 p.)
Yang, Yajie
University of Florida
Place of Publication:
Gainesville, Fla.
Publication Date:

Thesis/Dissertation Information

Doctorate ( Ph.D.)
Degree Grantor:
University of Florida
Degree Disciplines:
Genetics and Genomics
Committee Chair:
Mcintyre, Lauren M
Committee Co-Chair:
Renne, Rolf Friedrich
Committee Members:
Baker, Henry V
Bungert, Jorg
Barbazuk, William Bradley
Tibbetts, Scott Aaron


Subjects / Keywords:
chip-seq -- kshv -- microrna -- snp
Genetics and Genomics -- Dissertations, Academic -- UF
Genetics and Genomics thesis, Ph.D.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, territorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation


Dysregulation of gene regulatory networks (GRNs), caused by aberrant cellular or environmental factors, is associated with complex diseases. While it is important to understand the molecular function of each component contributing to the overall system, new insights into disease mechanisms can be gained through the more contemporary systems biology approach, which seeks to integrate data. Towards this goal, I have studied a number of aspects of biological systems that contribute to GRNs, including sequence variants, divergence in chromatin state, transcription factor occupancy, and microRNA targeting. Infection by Kaposi’s sarcoma-associated herpesvirus (KSHV) causes KS in endothelial cells but primary effusion lymphoma (PEL) and multicentric Castleman’s disease (MCD) in B cells. I identified the tissue-specific GRNs affected by a single KSHV microRNA, miR-K12-11, an important post-transcriptional regulator and a viral homolog of the human oncomiR miR-155. The common and divergent components of miR-K12-11 regulation were identified by linking information from secondary resources to data collected from perturbation experiments. Allelic imbalance is another important component of GRNs that contributes to phenotypic diversity and disease. A cost-effective microarray platform was developed to assay allele-specific gene expression in Drosophila simulans. In addition, to improve the understanding of DNA-protein interactions in GRNs, I carried out a thorough exploration of methods for quantifying ChIP-seq data from multiple biological replicates. As a result, I found that the reliability of peak identification increases with the number of biological replicates. A simple majority rule improved site discovery compared to requiring absolute concordance of peak identification between two replicates. By examining the genetic, epigenetic, and post-transcriptional control of transcriptional networks in different contexts, I have learned how to integrate data and aggregate knowledge of GRNs.
General Note:
In the series University of Florida Digital Collections.
General Note:
Includes vita.
Includes bibliographical references.
Source of Description:
Description based on online resource; title from PDF title page.
Source of Description:
This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Statement of Responsibility:
by Yajie Yang.
Thesis (Ph.D.)--University of Florida, 2013.
Adviser: Mcintyre, Lauren M.
Co-adviser: Renne, Rolf Friedrich.
Electronic Access:

Record Information

Source Institution:
Rights Management:
Applicable rights reserved.
lcc - LD1780 2013
System ID:

Full Text




© 2013 Yajie Yang


To my supporting family for their constant encouragement


ACKNOWLEDGMENTS
I would like to thank my mentors, Dr. Lauren McIntyre and Dr. Rolf Renne, without whom this work would not have been possible. They provided me with invaluable guidance, essential training, and a great working environment in which to achieve my goal of becoming a scientist. Thanks also to my dissertation committee members: Dr. Henry Baker, Dr. Brad Barbazuk, and Dr. Jörg Bungert. I would like to acknowledge all my colleagues in the McIntyre lab and the Renne lab: Dr. Alison Morse, Justin Fear, Alexi Runnels, Chelsea Tymms, Hong Seok Choi, Lauren Gay, Vaibhav Jain, and Sunantha Sethuraman. I also received tremendous help from past members: Dr. Isaac Boss, Dr. Irina Haecker, Dr. Jianhong Hu, Dr. Karlie Plaisance Bonstaff, Dr. Rita Graze, Rhonda Bacher, Victor Amin, and Brandon Waltz. Lastly, special thanks to my family: my father Yang Zhengde, my late mother Bian Aixian, my stepmother Bian Cuihong, my brother Yang Ruijie, and my husband He Hao, for their unwavering support and endless love.


TABLE OF CONTENTS
ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
ABSTRACT
CHAPTER
1 INTRODUCTION
1.1 Overview
1.2 KSHV Provides a Model for the Study of Cell Type-Specific Regulatory Networks
1.3 microRNAs are Important Post-Transcriptional Regulators that Shape Tissue Specificity
1.3.1 Discovery, Biogenesis and Functional Mechanisms of miRNAs
1.3.2 Tissue Specificity of miRNAs
1.3.3 MiRNAs and Signal Pathways
1.3.4 KSHV miR-K12-11
1.4 Approaches to Identify miRNA Functions are Developed
1.4.1 Bioinformatic Target Prediction Algorithms
1.4.2 Ribonomics Approaches for Target Identification
1.4.3 Proteome and Transcriptome Analysis after Perturbing miRNA Expression
1.4.4 Target Validation with Molecular Experiments
1.5 Biological Functions Must Be Studied in the Context of Regulatory Networks
1.5.1 Systems Biology
1.5.2 Network Inference from Expression Profiles
1.5.3 Network Inference by Integrating Multiple Resources
1.6 Understanding the Gene Regulatory Networks (GRN)
1.6.1 Allelic Imbalance Contributes to Phenotypes and Diseases
1.6.2 Epigenetic Modification Determine Gene Expression of Host Cellular and Herpesvirus Genes
1.6.3 Transcription Factors as Major Regulators of Gene Expression
1.6.4 Methodology to Investigate DNA-Protein Interactions
1.6.5 Transcriptional Regulatory Networks
1.6.6 Protein Functioning Networks
1.6.7 Complex GRNs can be Inferred by Integrating Multiple Layers of Regulation
2 A SYSTEMS BIOLOGY APPROACH TO ANALYSIS OF MIRNA PERTURBATION EXPERIMENTS
2.1 Overview
2.2 Results and Discussion
2.2.1 Ectopic Expression of miR-K12-11 and miR-155 Caused Transcriptome Changes in BJAB and TIVE Cells
2.2.2 Effect of miR-K12-11 is Amplified by Direct Targeting of Transcription Factors
2.2.3 MiR-K12-11 Targeted Different Components of Biological Pathways in BJAB and TIVE Cells
2.2.4 MiR-K12-11 Targets Multiple Components of Interferon Signaling Pathways in Endothelial Cells
2.2.5 Incorporating Physical Interactions into TF-Target Pairs Extends the Regulatory Networks
2.3 Conclusions
2.4 Methods
2.4.1 Experimental Design
2.4.2 Vector System
2.4.3 Cell Culture
2.4.4 Transduction and Validation
2.4.5 Microarray Analysis
2.4.6 Identification of Direct miRNA Targets
2.4.7 Identification of Transcription Factor Regulation
2.4.8 Identification of Signaling Genes
2.4.9 Identification of Functional Interaction
3 PARTITIONING TRANSCRIPT VARIATION IN DROSOPHILA: ABUNDANCE, ISOFORMS AND ALLELES
3.1 Overview
3.2 Results
3.2.1 Quality Control
3.2.2 Verifying the SNP Module
3.2.3 AI Analysis
3.3 Conclusions
3.4 Methods
3.4.1 Chip Design
3.4.2 Verification Experiment
3.4.3 Signal Quantification
3.4.4 General Quality Control
3.4.5 SNP Calling and Genotyping Accuracy
3.4.6 Analysis of AI
4 LEVERAGING BIOLOGICAL REPLICATES TO IMPROVE ANALYSIS IN CHIP-SEQ EXPERIMENTS
4.1 Overview
4.2 Results
4.2.1 There is Variability Among Biological Replicates of ChIP-seq Experiments
4.2.2 Proportion of Common and Unique Peaks Reflects the Reproducibility of Replicates
4.2.3 Quantitative Signal Strength Examines the Reproducibility More Accurately
4.2.4 Peaks Identified in the Majority of Replicates are Reliable
4.2.5 Genomic Features Provide an Alternative to 7
4.3 Discussion
4.4 Methods
4.4.1 Data
4.4.2 Analysis
5 CONCLUSIONS AND FUTURE DIRECTIONS
5.1 Genetic, Epigenetic, and Post-Transcriptional Factors Contribute to Systematical Understanding of GRNs
5.2 Application of the Systems Biology Approach to Other Viral and Host Factors is Necessary to Understand GRNs During Viral Infection
5.3 Measurements of Additional Genetic and Epigenetic Regulators Will Complete the GRNs
5.4 Gene-Specific Experiments Can Validate Observations in Genome-Wide Profiling
5.5 Cellular Background Should be Carefully Considered
5.6 Time-Series Measurements Can Model the Dynamic Aspects of Virus-Host Interactions
5.7 Understanding the Quantitative Nature of Biological Processes
5.8 Conclusions
LIST OF REFERENCES
BIOGRAPHICAL SKETCH


LIST OF TABLES
1-1 Validated targets of KSHV miRNAs
1-2 Algorithms for miRNA target prediction
2-1 MiRNA-regulated genes in each treatment group
3-1 Proportions of probes detected above the median GC band control
3-2 Tests for the effect of sex from the 3′ expression and exon probe modules at multiple FDR levels
3-3 The rank of hybridization signal corresponds to the expectation based upon sequence information (homozygous genotypes)
3-4 Allele imbalance overall and separated by sex (n = 6,579)
4-1 Summary statistics for the four ChIP-seq experiments examined
4-2 Numbers of common peaks


LIST OF FIGURES
1-1 Multiple layers of gene regulatory networks (GRN)
1-2 The KSHV genome
1-3 KSHV miRNAs are encoded in the KSHV latency-associated region (KLAR)
1-4 Biogenesis pathway for miRNAs
1-5 Methodology for miRNA target identification
2-1 MicroRNAs can affect GRNs directly and indirectly
2-2 Experimental design and analysis pipeline
2-3 BJAB and TIVE cells stably express GFP after foamy virus transduction and purification by fluorescence-activated cell sorting
2-4 Ectopic miR-K12-11 and miR-155 expression. problematic and removed from future analysis.
2-5 Overall effect on the transcriptome after ectopic miRNA expression
2-6 Verification of microarray measurements by qPCR on four previously reported genes that were targets of miR-155/miR-K12-11
2-7 Different components of the same IFN pathway were targeted in TIVE and BJAB cells
2-8 TFBS prediction for TFs directly targeted by miR-K12-11 in TIVE
2-9 MiR-K12-11 attenuated the interferon pathways by downregulating multiple genes
2-10 Connectivity of human protein-protein interactions
3-1 Probe design. A total of 2,424,414 probes were printed on the chip
3-2 SNP probe design windows
3-3 Distribution of SNP probe sets per gene
3-4 The proportion of probes detected above background (DABG) is reported for all probe sets of each sample
3-5 Box plot for probe intensity classified by genotype and nucleic acid
3-6 Expression for known sex-specific genes in female and male RNA samples
3-7 Linear discriminant plots of three genotypes: AA, AC and CC. Different genotypes had hybridization patterns that are visually separable by linear discriminant analysis. Each genotype is colored differently.
4-1 Defining the consensus regions for overlapping peaks across replicates
4-2 Consistency across replicates of the RNAPII ChIP-seq experiment
4-3 Percentages of peaks that were detected above background (DABG) in replicates where no algorithmically identified peaks were present
4-4 Spearman correlation coefficients were comparable when the peaks were identified in all replicates or in the majority of the replicates
4-5 Bland-Altman plots showing the sample agreement, using genomic features as the quantification unit


LIST OF ABBREVIATIONS
AI    allele imbalance
AS    alternative splicing
ASE   allele-specific expression
BJAB  EBV-negative, Burkitt-like lymphoma
ChIP  chromatin immunoprecipitation
CLIP  crosslinked to Argonaute followed by immunoprecipitation
CRE   cis-regulatory element
DEG   differentially expressed genes
GRN   gene regulatory network
KLAR  KSHV latency-associated region
KS    Kaposi’s sarcoma
KSHV  Kaposi’s sarcoma-associated herpesvirus
LANA  latency-associated nuclear antigen
PEL   primary effusion lymphoma
PPI   protein-protein interaction
RISC  RNA-induced silencing complex
SNP   single nucleotide polymorphism
TIVE  telomerase-immortalized human umbilical vein endothelial
TF    transcription factor
TFBS  transcription factor binding site
UTR   untranslated region


Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy
A SYSTEMS BIOLOGY APPROACH TO UNDERSTANDING TRANSCRIPTIONAL NETWORKS
By Yajie Yang
August 2013
Chair: Lauren McIntyre
Cochair: Rolf Renne
Major: Genetics and Genomics

Dysregulation of gene regulatory networks (GRNs), caused by aberrant cellular or environmental factors, is associated with complex diseases. While it is important to understand the molecular function of each component contributing to the overall system, new insights into disease mechanisms can be gained through the more contemporary systems biology approach, which seeks to integrate data. Towards this goal, I have studied a number of aspects of biological systems that contribute to GRNs, including sequence variants, divergence in chromatin state, transcription factor occupancy, and microRNA targeting. Infection by Kaposi’s sarcoma-associated herpesvirus (KSHV) causes KS in endothelial cells but primary effusion lymphoma (PEL) and multicentric Castleman’s disease (MCD) in B cells. I identified the tissue-specific GRNs affected by a single KSHV microRNA, miR-K12-11, an important post-transcriptional regulator and a viral homolog of the human oncomiR miR-155. The common and divergent components of miR-K12-11 regulation were identified by linking information from secondary resources to data collected from perturbation experiments. Allelic imbalance is another important component of GRNs that contributes to phenotypic diversity and disease. A cost-effective microarray platform was developed to assay allele-specific gene expression in Drosophila simulans. In addition, to improve the understanding of DNA-protein interactions in GRNs, I carried out a thorough exploration of methods for quantifying ChIP-seq data from multiple biological replicates. As a result, I found that the reliability of peak identification increases with the number of biological replicates. A simple majority rule improved site discovery compared to requiring absolute concordance of peak identification between two replicates. By examining the genetic, epigenetic, and post-transcriptional control of transcriptional networks in different contexts, I have learned how to integrate data and aggregate knowledge of GRNs.
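The contrast between a majority rule and strict concordance can be illustrated with a small Python sketch. This is not the pipeline described in Chapter 4: it assumes each replicate's peaks are given as half-open (chromosome, start, end) intervals, merges peaks that overlap across replicates into consensus regions, and keeps a region only if peaks from at least `min_support` distinct replicates contribute to it. The function name `consensus_regions` and the toy coordinates are hypothetical.

```python
from typing import List, Tuple

Peak = Tuple[str, int, int]         # (chromosome, start, end), half-open interval
Region = Tuple[str, int, int, int]  # (chromosome, start, end, n_supporting_replicates)


def consensus_regions(replicates: List[List[Peak]], min_support: int) -> List[Region]:
    """Merge overlapping peaks across replicates into consensus regions and keep
    those supported by peaks from at least `min_support` distinct replicates."""
    # Pool all peaks, tag each with its replicate index, and sort by position.
    tagged = sorted(
        (chrom, start, end, rep)
        for rep, peaks in enumerate(replicates)
        for chrom, start, end in peaks
    )
    merged = []  # each entry: [chrom, start, end, set of supporting replicates]
    for chrom, start, end, rep in tagged:
        last = merged[-1] if merged else None
        if last is not None and last[0] == chrom and start < last[2]:
            # Overlaps the open region: extend it and record the replicate.
            last[2] = max(last[2], end)
            last[3].add(rep)
        else:
            merged.append([chrom, start, end, {rep}])
    return [
        (chrom, start, end, len(reps))
        for chrom, start, end, reps in merged
        if len(reps) >= min_support
    ]


if __name__ == "__main__":
    reps = [
        [("chr1", 100, 200), ("chr1", 500, 600)],
        [("chr1", 150, 250), ("chr2", 10, 50)],
        [("chr1", 180, 220), ("chr1", 520, 580)],
    ]
    majority = len(reps) // 2 + 1
    print(consensus_regions(reps, majority))   # [('chr1', 100, 250, 3), ('chr1', 500, 600, 2)]
    print(consensus_regions(reps, len(reps)))  # [('chr1', 100, 250, 3)]
```

With three replicates the majority threshold is two, so the chr1 region at 500-600 (found in the first and third replicates) is kept; requiring agreement between only the first two replicates would discard it.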


CHAPTER 1
INTRODUCTION
1.1 Overview
Gene regulatory networks (GRNs) are specific to time points and cell types, allowing an identical genome to produce various phenotypes. GRNs consist of many layers of cellular regulators as well as the interplay across layers (Figure 1-1). On top of the endogenous differences among cell types, environmental factors such as pathogen molecules can interact with the cellular GRNs and cause malignant changes in the phenotype (i.e., diseases). Infection by Kaposi’s sarcoma-associated herpesvirus (KSHV) causes KS in endothelial cells but primary effusion lymphoma (PEL) and multicentric Castleman’s disease (MCD) in B cells. Thus, KSHV gene products provide a model system to study tissue-specific GRNs affected by virus-host interactions. Among the few genes expressed during KSHV latency, I am particularly interested in a single KSHV-encoded microRNA, miR-K12-11, because (a) miRNAs are important post-transcriptional regulators that contribute to tissue specificity; (b) miR-K12-11 is a homolog of the oncomiR hsa-miR-155; and (c) miR-K12-11 is expressed in PEL but not in KSHV-infected telomerase-immortalized human umbilical vein endothelial (TIVE) cells. Studying the cell type-specific regulatory effects of miR-K12-11 will shed light on KSHV pathogenesis. To dissect the GRNs affected by miR-K12-11, I used a systems biology approach that perturbs the biological system by changing the factor of interest (i.e., ectopic expression of miR-K12-11), measures the outcome in response to that change, and integrates the primary data with secondary information sources.
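The "integrate" step can take many forms. One simple, widely used form asks whether genes predicted to be targets of the miRNA (a secondary resource) are over-represented among genes downregulated after ectopic expression (the primary perturbation data). The Python sketch below illustrates that idea with a one-sided hypergeometric test; it is not the specific procedure used in Chapter 2, and the function name `target_enrichment` and the toy gene sets are hypothetical.

```python
from math import comb


def target_enrichment(measured, predicted_targets, downregulated):
    """One-sided hypergeometric test for over-representation of predicted
    miRNA targets among genes downregulated after ectopic miRNA expression.

    All arguments are sets of gene identifiers. Returns the overlapping genes
    and P(X >= observed overlap) under random sampling without replacement.
    """
    # Only genes measured in the perturbation experiment can be tested.
    predicted = predicted_targets & measured
    down = downregulated & measured
    overlap = predicted & down

    N = len(measured)   # population: all measured genes
    K = len(predicted)  # predicted targets within the population
    n = len(down)       # draws: downregulated genes
    k = len(overlap)    # predicted targets observed among the draws

    # Exact upper-tail probability using integer binomial coefficients.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return overlap, tail / comb(N, n)


if __name__ == "__main__":
    genes = {f"g{i}" for i in range(20)}
    predicted = {f"g{i}" for i in range(5)}
    down = {"g0", "g1", "g2", "g10"}
    overlap, p = target_enrichment(genes, predicted, down)
    print(sorted(overlap), round(p, 4))
```

With 20 measured genes, 5 predicted targets and 4 downregulated genes, observing 3 predicted targets among the downregulated set gives P = 155/4845, about 0.032.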


While perturbation has always been the basis of any functional study, systematic measurements have only become possible after breakthroughs in high-throughput technologies that assay biological activities at the genome scale, such as microarrays and massively parallel sequencing. Now we can measure gene expression and the components of GRNs, including sequence variants, chromatin state, transcription factor occupancy, and miRNA targeting, all on the genome scale. Besides the experiment-specific data produced in an individual laboratory, we can also take advantage of data generated by consortium projects such as ENCODE and GENCODE, and data stored in public databases such as GEO and UCSC.

In this dissertation, Chapter 1 introduces the biology of KSHV (Section 1.2) and microRNAs (Section 1.3). The characteristics of miR-K12-11 are described in particular detail. Section 1.4 reviews progress in the target identification of miRNAs. The development of systems biology and its applications are covered in Section 1.5. The functions of other components of GRNs, and current advances in the methodology to study them, are introduced in Section 1.6. Chapter 2 presents my study of the tissue-specific regulatory effects of miR-K12-11 using a systems biology approach. Transcriptome changes were measured using microarrays. To practice my skills in microarray data analysis, I analyzed the pilot experiments of a custom microarray chip (Chapter 3). I also participated in the design and annotation of this custom chip, during which I developed my bioinformatic skills and understanding of genomic features. To meet the challenge of comprehensively identifying the regulators in GRNs, I obtained training in genome sequencing, RNA-seq, HITS-CLIP, and ChIP-seq from multiple collaborative projects. Besides bioinformatic skills in data manipulation, pipeline development, software usage, and scripting, my knowledge of KSHV biology, human genomic features, and resources for genomic studies has also increased through these collaborations. My experience with ChIP-seq data led me to develop a data analysis pipeline that incorporates biological replication (Chapter 4). In the last chapter (Chapter 5), conclusions and future directions are discussed.

1.2 KSHV Provides a Model for the Study of Cell Type-Specific Regulatory Networks
Kaposi’s sarcoma-associated herpesvirus (KSHV), also known as human herpesvirus 8 (HHV-8), is the etiological agent of the human vascular tumor Kaposi’s sarcoma (KS). It was first isolated from KS biopsies from HIV-infected patients and identified as a gammaherpesvirus (Chang et al., 1994). Similar to other gammaherpesviruses, KSHV infects lymphocytes. In addition to its namesake tumor in endothelial cells, KSHV causes two lymphoproliferative disorders in B cells: primary effusion lymphoma (PEL) and multicentric Castleman’s disease (MCD) (Cesarman and Knowles, 1999; Soulier et al., 1995). The exact mechanisms of KSHV pathogenesis remain unclear. In cell culture, KSHV infection reprograms the entire expression profile of the host cell (Chandriani et al., 2010; Hong et al., 2004; Lagos et al., 2007; Wang et al., 2004). The distinct pathological outcomes of KSHV in two types of human tissue serve as a model system for studying the mechanisms of cell type-specific gene regulation.

KSHV is a dsDNA virus with a genome of ~170 kb, consisting of a unique internal sequence (~140 kb) that encodes 87 open reading frames (ORFs) and is flanked by GC-rich terminal repeats (TR) (Russo et al., 1996) (Figure 1-2). More than 20 KSHV genes are homologs of cellular genes. This high level of homology is believed to give KSHV an advantage in hijacking host cellular pathways without eliciting an immune response (Neipel et al., 1997). In KS tumors and PELs, the majority of cells are latently infected and express only viral genes within a specific region of the viral genome: the KSHV latency-associated region (KLAR) (Dittmer et al., 1998; Renne et al., 1996; Zhong et al., 1996) (Figure 1-3). This region encodes the latency-associated nuclear antigen (LANA, the master regulator of the establishment and maintenance of latency), v-Cyclin (a cyclin D homolog that promotes S-phase entry), v-FLIP (which promotes cell survival), the kaposin gene family (involved in cytokine mRNA stabilization and cell transformation), and 12 miRNA genes (more in Section 1.3). Together, latent gene products modulate gene expression and signal transduction in infected host cells, thus establishing and maintaining a stable, long-term infection (Arest and Blackbourn, 2009; Wen and Damania, 2009).

1.3 microRNAs are Important Post-Transcriptional Regulators that Shape Tissue Specificity
1.3.1 Discovery, Biogenesis and Functional Mechanisms of miRNAs
MiRNAs are small RNAs of 19-24 nucleotides that post-transcriptionally regulate gene expression, leading to translational inhibition and/or mRNA degradation. The first miRNA gene identified was lin-4 of Caenorhabditis elegans (Lee et al., 1993; Ruvkun et al., 1991; Wightman et al., 1991; Wightman et al., 1993). It does not encode a protein but plays a role in developmental timing through complementary binding of its RNA to the transcript of the key regulator gene lin-14, inducing translational silencing of the latter. This type of regulation turned out to be universal: miRNAs are encoded by metazoans, plants, and viruses. MiRNAs have been identified in all herpesviruses except varicella-zoster virus. There are currently more than 20,000 known miRNAs (mirbase.org) (Griffiths-Jones, 2004), and this number may grow as more sensitive techniques for discovering miRNAs are developed. Sequences of cellular miRNAs are highly conserved through evolution. A third of the miRNAs expressed by Caenorhabditis elegans have homologs in humans (Lee et al., 2003). The selection pressure on miRNA genes suggests the importance of their functions.

Cellular and viral miRNAs share the same biogenesis machinery (Figure 1-4). The primary transcript, produced by RNA polymerase II, contains a stem-loop structure, which is the defining characteristic required for the subsequent processing steps: cleavage by Drosha in the nucleus, export to the cytoplasm, and processing by Dicer. Both Drosha and Dicer are RNase III-type endonucleases (Bartel, 2004). The mature product is a miRNA duplex consisting of a guide strand and the opposite passenger strand, termed miR and miR* (star), respectively. Argonaute proteins (AGO) bind either the duplex or the guide strand alone (while the miR* strand is degraded), assembling it into the RNA-induced silencing complex (RISC) (Schwarz et al., 2003; Tomari et al., 2004). RISC facilitates miRNA pairing mainly, but not exclusively, with the 3′ untranslated regions (UTRs) of mRNA targets. The seven bases from nucleotide 2 to 8 of the guide strand, termed the seed sequence, largely determine targeting specificity (Lewis et al., 2005; Lim et al., 2005; Stark et al., 2003). Targeted mRNAs have shorter half-lives and are sent for degradation (Baek et al., 2008; Guo et al., 2010; Selbach et al., 2008). Assays in single cells revealed that the strength of repression varies considerably across a population of identically prepared cells (Mukherji et al., 2011). The degree of repression also depends on the concentration of the target mRNAs themselves. When the target concentration is low, miRNAs can impose effective repression, as in the case


19 of lin 14 mRNA by lin 4 miRNA during Caenorhabditis elegans development (Bagga et al., 2005) In most physiological conditions, the amount of available targets exceeds those of miRNAs, thus, miRNAs fine tune instead of switc hing the target levels on and off to (Mukherji et al., 2011) Correspondingly, m ost targets of an overexpressed miRNA were downregulated les s than 50% at the protein level as demonstrated by genome wide proteomics (Baek et al., 2008; Selbach et al., 2008) The short length of the seed sequences makes it possible that mRNAs are targets of multiple miRNAs and that miRNAs have large number of targets In mammalian species, it has been estimated that greater than half of all protein coding genes contain a miRNA target site (Friedman et al., 2009) Indeed, miRNA regulation is a widespread phenomenon. MiRNAs are involved in various biological processes including development immune response, apoptosis, hematopoiesis and tumorigenesis. The seemingly all encompassing breadth of the miRNA mediated regulatory network has fueled hypotheses about the relevance of miRNAs to human disease s Functional data are quickly accumulating t hat implicate miRNAs in human diseases including cancers (Calin and Croce, 2006; Calin et al., 2004; Garzon et al., 2008; Volinia et al., 2006) 1.3 .2 Tissue Specificity of miRNAs Many miRNAs exhibit temporally and spatially controlled expression. The tissuespecific distribution of miRNAs was investigated early on using cloning (Lagos Quintana et al., 2002; Lau et al., 2001) With technology advancements in highthroughput experiments, entire microRNAomes of vario us tissues were determined (Baskerville and Bartel, 2005; Liu et al., 2005; Lu et al., 2005; Mineno et al., 2006; Nakano et al., 2006; Nelson et al., 2007) MiRNA expression is gradually induced over the course of


20 embryonic development (Kloosterman and Plasterk, 2006; Thomson et al., 2006; Wienholds et al., 2005) Aberrant expression of miRNAs can be used as a signature for the clinical diagnosis of different types of leukemia and lymphomas (Garzon et al., 2008) There is evidence for tissue specific response of dynamic expression levels to environmental stimuli. For example, acute stress increases let 7a, miR 9 and miR 26a/b expression levels in the mouse frontal cortex, but not in the hippocampus (Rinaldi et al., 2010) MiRNA targeting also shows tis sue specificity The levels of miRNAs and their target mRNAs are often mutually exclusive among tissues (Farh et al., 2005; Sood et al., 2006; Stark et al., 2003; Tsang et al., 2007) Transcripts can esca pe miRNA regulation by expressi n g shorter 3 UTRs via utilization of alternative polyA adenylation signals (Ji and Tian, 2009) Genes with tissue specific expression have longer 3 UTRs with more miRNA binding sites (Stark et al., 2005) MiRNA regulation contributes to the differ entiation programs that lead to distinct tissue types. During embryogenesis, miRNA regulation shapes gene expression post transcriptionally to delimit spatial and temporal boundaries. Neighboring cells with different miRNA profiles end up with different terminal fates. It was found that for each tissue specifically expressed miRNA, the tissues in which their predicted targets were most highly significantly downregulated (relative to all other tissues) matched precisely the tissues in which the miRNAs are specifically expressed (Sood et al., 2006) By destabilizing unwanted transcripts that are leaky from transcriptional control or residual of previous stages, miRNAs sharpen the tissue specific transcript ome for example in worm (Farh et al., 2005) and mouse (Gao et al., 2011)
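The escape of transcripts from miRNA regulation through alternative polyadenylation, described above, can be illustrated with a minimal Python sketch. It assumes that a target site is a perfect 7-nt match to the reverse complement of miRNA nucleotides 2-8 (the seed), and it simply counts such sites in the 3' UTR retained by each polyadenylation (cleavage) position. The miRNA, the UTR, and the cleavage positions are hypothetical toy values, not real transcripts, and actual site recognition involves more than seed matching.

```python
"""Toy illustration: shorter 3' UTR isoforms can lose miRNA seed-match sites.

All sequences and cleavage positions are hypothetical examples.
"""

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_site(mirna: str) -> str:
    """Target-side 7-nt site pairing with miRNA nucleotides 2-8 (1-based)."""
    return reverse_complement(mirna[1:8])


def count_sites(utr: str, site: str) -> int:
    """Count occurrences of `site` in `utr`, allowing overlaps."""
    width = len(site)
    return sum(1 for i in range(len(utr) - width + 1) if utr[i:i + width] == site)


def sites_per_isoform(utr: str, mirna: str, cleavage_ends: list[int]) -> dict[int, int]:
    """Map each hypothetical cleavage position to the number of seed sites
    in the 3' UTR that would be retained (utr[:end])."""
    site = seed_site(mirna)
    return {end: count_sites(utr[:end], site) for end in cleavage_ends}


if __name__ == "__main__":
    toy_mirna = "AGCUACGAUUUUUUUUUUUUUU"  # hypothetical miRNA, seed = GCUACGA
    site = seed_site(toy_mirna)           # "UCGUAGC"
    toy_utr = "AAAA" + site + "CCCC" + site + "GGGG" + site + "AAAA"
    # Proximal, intermediate and distal polyadenylation sites (hypothetical)
    print(sites_per_isoform(toy_utr, toy_mirna, [12, 23, len(toy_utr)]))
```

With these toy values the proximal isoform retains one site, the intermediate isoform two, and the full-length 37-nt UTR all three, which is the qualitative pattern by which shorter isoforms can escape repression.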


Tissue-specific 3' UTRs are largely the result of alternative splicing (AS), which is ubiquitous in human transcriptomes (Pan et al., 2008; Wang et al., 2008b). Chang et al. (2011) measured exon usage in KSHV-infected lymphatic endothelial cells (LEC) and identified 542 AS genes, including several genes related to cell growth, DNA repair and angiogenesis. Functional analysis revealed Gene Ontology (GO) enrichment in multiple tumorigenesis pathways, including the cell cycle and insulin receptor signaling pathways. RNA polymerase II (RNAPII) occupancy was higher over exons than over adjacent introns, likely reflecting a link between transcriptional elongation and splicing (Lee et al., 2011). By interacting with RNAPII and chromatin remodeling factors (Allemand et al., 2008), the splicing machinery can act co-transcriptionally. Splicing factors are subject to miRNA targeting (Boutz et al., 2007; Kalsotra et al., 2010; Makeyev et al., 2007). MiRNAs may also regulate splicing when they are located near splice donor/acceptor sites in the genome. The EBV BART miRNAs reside within introns of a multiply spliced transcript, and the competition between miRNA maturation and mRNA splicing suppresses the usage of the surrounding exons (Edwards et al., 2008).

1.3.3 MiRNAs and Signaling Pathways

The modest repression effect of miRNAs under physiological conditions seems to conflict with their importance as indicated by selection pressure (i.e., sequence conservation) or by Dicer knockout phenotypes (Murchison et al., 2005). Inui et al. (2010) proposed that miRNAs target genes in signaling pathways, including the ligands and receptors that mediate intercellular communication. Like miRNA expression, signaling pathways are frequently altered in disease cells (Karin, 2006; Pouyssegur et al., 2006; Shaw and Cantley, 2006). Signaling pathways control fundamental biological processes such as transcription and protein synthesis, affecting cell division, growth and apoptosis. Many signaling pathways are under varying degrees of default repression, and dysregulation of signaling pathways may play a role in oncogenesis and cancer. Detection of viral infection induces the synthesis of type I interferon (IFN1), which in turn upregulates more than 100 genes to confer protection against the virus. The insulin receptor (INSR) signaling pathway is essential in tumorigenesis for several cancer types, including KSHV-induced KS (Rose et al., 2007).

Small changes in the expression levels of signaling genes can affect the signal transduction cascade significantly, and many signaling genes are dosage-dependent. For example, Notch, Delta and Hairless in the Drosophila Notch pathway are haploinsufficient (the loss of a single allele confers a completely penetrant morphological mutant phenotype). The sensitivity of cell signaling can amplify an initially modest difference into a massive output (Hagen and Lai, 2008). MiRNA targeting of transduction-limiting repressors could increase pathway activity; conversely, miRNAs might repress signal-regulated events by targeting positively acting pathway components. MiRNAs often coordinately regulate targets that function together in pathways and protein complexes (Mayr et al., 2007). The miR-17/miR-92 cluster targets TSP-1 and CTGF in the MYC pathway (Dews et al., 2006). Four members of the insulin signaling pathway are validated targets of the miR-96/miR-182/miR-183 cluster (Xu and Wong, 2008). By regulating components within the same signaling pathway, these miRNA clusters act coordinately to control a biological process.

1.3.4 KSHV miR-K12-11

KSHV encodes 17 mature miRNAs from 12 miRNA genes, all of which are located in the latency-associated region (Cai et al., 2005; Grundhoff et al., 2006; Pfeffer et al., 2005; Samols et al., 2005). In PEL cells, all KSHV miRNAs are highly expressed during latency, and induction of lytic replication has only moderate effects on viral miRNA expression, with the exception of miR-K10 (Cai and Cullen, 2006; Pearce et al., 2005). Validated targets of KSHV miRNAs (Table 1-1) indicate involvement in several fundamental cellular processes (angiogenesis, cell cycle, immunity and apoptosis) and in key steps of the herpesvirus life cycle, such as latency and the switch from latent to lytic replication. Viral miRNAs show limited conservation among related viral species (Schafer et al., 2007); rather, they appear to co-evolve with their respective host target sequences (Sood et al., 2006). Multiple viral homologs of host miRNAs have been identified (Gottwein et al., 2011; Gottwein et al., 2007; Skalsky and Cullen, 2010; Waidner et al., 2009; Zhao et al., 2011).

Specifically, miR-K12-11 is a homolog of the oncomiR miR-155: the identical seed sequences of miR-K12-11 and miR-155 suggest similar targeting specificity (Gottwein et al., 2007; Skalsky et al., 2007). Microarray profiling of HEK293 cells transfected with miR-155 or miR-K12-11 revealed a subset of similarly regulated genes (Skalsky et al., 2007). BACH1, a predicted miR-155 target, was shown by reporter assays to be targeted equally by miR-155 and miR-K12-11 (Skalsky et al., 2007). Gottwein et al. (2007) overexpressed miR-K12-11 in BJAB cells and identified downregulated genes, which were enriched for predicted miR-155 targets. PEL cells express high levels of miR-K12-11 but no miR-155; in contrast, only miR-155 expression was detected in KSHV-infected endothelial cells (Skalsky et al., 2007). Given the importance of miR-155 in immune response and tumorigenesis, it is possible that miR-K12-11 hijacks miR-155 function during B cell infection, but not during infection of endothelial cells.

MiR-155 was first identified as bic (B-cell integration cluster), a noncoding gene that is a common retroviral integration site in chicken B cell lymphomas (Tam et al., 1997). Expression of bic was later detected in human germinal center B cells and activated T cells by Northern blot and RNA in situ hybridization (Tam, 2001; van den Berg et al., 2003). MiR-155 has been characterized as an oncomiR, a miRNA with tumorigenic activity when overexpressed in hematopoietic cells in vivo (Costinean et al., 2006; Eis et al., 2005; Kluiver et al., 2005; O'Connell et al., 2008; van den Berg et al., 2003). It is overexpressed in a number of B cell lymphomas. When overexpressed in transgenic mice, miR-155 leads to abnormal immune responses and B cell lymphomas (Costinean et al., 2006).

MiR-155 plays an important role at different stages of hematopoiesis. Microarrays detected miR-155 expression in early human CD34+ hematopoietic stem/progenitor cells (Georgantas et al., 2007) and at much lower levels in mature peripheral B cells, T cells, monocytes, and granulocytes (Merkerova et al., 2008; Ramkissoon et al., 2006). Overexpression of miR-155 in human CD34+ progenitor cells decreased myeloid and erythroid colony formation (Georgantas et al., 2007). In vitro colony-forming assays revealed that miR-155 is differentially expressed during lineage-specific cell differentiation (Chen et al., 2004; Georgantas et al., 2007).

MiR-155 is also associated with immune activation. Activated murine B cells showed a transient increase in miR-155 production (Thai et al., 2007). Upon in vitro stimulation of the antigen receptors, primary murine macrophages expressed miR-155 (O'Connell et al., 2007). The upregulation of miR-155 in response to immune activation suggests its involvement in immune pathways. MiR-155 knockout mice display defective adaptive and innate immunity (Moffett and Novina, 2007; Rodriguez et al., 2007; Thai et al., 2007). A number of miR-155 targets have been characterized by in vitro assays; these include PU.1 and AID in B cells, c-MAF in T cells, and SMAD2, BACH1, SOCS1, ETS1, and Meis1 in various tissues, many of which have been implicated in tumorigenesis (Costinean et al., 2009; Dorsett et al., 2008; Jiang et al., 2010; Louafi et al., 2010; Lu et al., 2008; O'Connell et al., 2009; Romania et al., 2008; Vigorito et al., 2007).

KSHV miR-K12-11 can target signaling molecules in the antiviral interferon response pathway (Liang et al., 2011). MiR-K12-11-expressing cells had markedly attenuated interferon signaling and enhanced viral titers in response to Sendai virus (SeV) and vesicular stomatitis virus (VSV) (Lei et al., 2010). In addition, miR-K12 (Lei et al., 2010; Lu et al., 2010a; Yu et al., 2007) and regulates lytic reactivation. Inhibition of miR-K12-11 leads to an increase in lytic gene expression (RTA and ORF65) in bacmid chemical agent that triggers lytic reactivation, was used (Lei et al., 2010; Lu et al., 2010a). The promoter of RTA contains a putative NFIB binding site, and ectopic NFIB expression could activate an RTA promoter construct. ShRNA knockdown of NFIB resulted in decreased RTA expression. Boss (2011) identified a number of genes involved in B cell biology as targets of miR-K12-11 in both splenic B cells and PEL cells. Chapter 2 presents a study of the tissue-specific regulatory effects of miR-K12-11 in lymphatic and endothelial tissues.
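The argument that an identical seed implies similar targeting specificity, as for miR-K12-11 and miR-155, amounts to grouping miRNAs into seed families. The minimal Python sketch below assumes the seed is nucleotides 2-8 of the mature miRNA; the names and sequences are hypothetical placeholders, not the real miR-K12-11 or miR-155 sequences, which should be taken from miRBase.

```python
"""Group mature miRNAs into seed families (nucleotides 2-8).

Names and sequences are hypothetical placeholders.
"""
from collections import defaultdict


def seed(mirna: str) -> str:
    """Return nucleotides 2-8 (1-based) of a mature miRNA written 5'->3'."""
    if len(mirna) < 8:
        raise ValueError("mature miRNA must be at least 8 nt long")
    return mirna[1:8]


def group_by_seed(mirnas: dict[str, str]) -> dict[str, list[str]]:
    """Map each seed to the sorted names of the miRNAs that share it."""
    families: dict[str, list[str]] = defaultdict(list)
    for name, sequence in mirnas.items():
        families[seed(sequence)].append(name)
    return {s: sorted(names) for s, names in families.items()}


def share_seed(mirna_a: str, mirna_b: str) -> bool:
    """True if two miRNAs are seed homologs, regardless of their 3' ends."""
    return seed(mirna_a) == seed(mirna_b)


if __name__ == "__main__":
    toy = {
        "toy_host_miR": "UCAGUCAGGAAAAUCCCCGU",
        "toy_viral_miR": "UCAGUCAGCUUGACGAGGAA",
        "toy_unrelated_miR": "UGGCAAGGUAACGUACUAAG",
    }
    print(group_by_seed(toy))
    print(share_seed(toy["toy_host_miR"], toy["toy_viral_miR"]))
```

Here the toy host and viral miRNAs differ outside the seed but fall into the same family, which is the sense in which a viral miRNA could mimic a host miRNA's target repertoire; seed identity is only a first approximation of shared targeting.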


1.4 Approaches to Identify miRNA Functions Are Developed

The function of a miRNA is defined by its target genes and their downstream effects. Target identification is therefore a key step towards understanding the biological function of any miRNA, and there has been a dedicated effort to develop methodology to identify miRNA targets (Figure 1-5).

1.4.1 Bioinformatic Target Prediction Algorithms

Rules governing the recognition of target transcripts have been summarized from studies of miRNA:target pairs (reviewed by Bartel, 2009). The target mRNA must contain sites complementary to the seed sequence (Lewis et al., 2005; Lim et al., 2005; Stark et al., 2003). Other factors used by prediction algorithms include the number and location of miRNA seed sites within the 3' UTR (Doench and Sharp, 2004), an AU-rich flanking sequence context (Grimson et al., 2007), thermodynamic stability (Krüger and Rehmsmeier, 2006) and evolutionary conservation. In addition, machine learning approaches have been employed to predict targets based on rules learned from known targets (Saetrom et al., 2005). Table 1-2 reviews these prediction algorithms. The criteria used by individual algorithms, as well as the choice of benchmark set, cause the algorithms to favor some targets while underestimating others. Alexiou et al. (2009) compared eight prediction algorithms and found that agreement of target prediction among algorithms was limited. Experimental approaches are therefore necessary to validate the predicted targets.

1.4.2 Ribonomics Approaches for Target Identification

It is important to note that mRNAs with these sequence properties may not actually be targeted in a given cell, since the copy numbers of both the miRNA and the mRNA, as well as the presence and concentration of other targets, contribute to targeting efficiency.
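The seed-pairing rule of Section 1.4.1 can be made concrete with a minimal Python sketch that scans a 3' UTR for canonical seed-matched sites and labels them with the site types commonly used in the literature reviewed by Bartel (2009): 8mer (match to nucleotides 2-8 plus an A opposite nucleotide 1), 7mer-m8 (match to 2-8), 7mer-A1 (match to 2-7 plus the A) and 6mer (match to 2-7). It deliberately ignores the other features listed in that section (site location, AU-rich context, thermodynamics, conservation), the sequences are hypothetical, and it is not the implementation of any algorithm in Table 1-2.

```python
"""Scan a 3' UTR for canonical miRNA seed-matched sites (toy sketch)."""

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def classify_sites(utr: str, mirna: str) -> list[tuple[int, str]]:
    """Return (start, site_type) for every 6-nt core matching miRNA nt 2-7.

    `start` is the 0-based position of the core in `utr`. In the target,
    the base pairing with miRNA nt 8 lies immediately 5' of the core and
    the A opposite miRNA nt 1 lies immediately 3' of it.
    """
    core = reverse_complement(mirna[1:7])   # pairs with miRNA nt 2-7
    m8_base = COMPLEMENT[mirna[7]]           # pairs with miRNA nt 8
    hits = []
    start = utr.find(core)
    while start != -1:
        end = start + len(core)
        has_m8 = start > 0 and utr[start - 1] == m8_base
        has_a1 = end < len(utr) and utr[end] == "A"
        if has_m8 and has_a1:
            site_type = "8mer"
        elif has_m8:
            site_type = "7mer-m8"
        elif has_a1:
            site_type = "7mer-A1"
        else:
            site_type = "6mer"
        hits.append((start, site_type))
        start = utr.find(core, start + 1)
    return hits


if __name__ == "__main__":
    toy_mirna = "AGCUACGAUUUUUUUUUUUUUU"  # hypothetical miRNA
    toy_utr = ("GG" + "UCGUAGCA" + "GG" + "GCGUAGCA"
               + "GG" + "UCGUAGCG" + "GG" + "GCGUAGCG" + "GG")
    for position, kind in classify_sites(toy_utr, toy_mirna):
        print(position, kind)
```

The toy UTR contains one site of each type, so the scan reports an 8mer, a 7mer-A1, a 7mer-m8 and a 6mer in that order; a real predictor would then weight these sites by the additional context features.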


In contrast to target prediction, crosslinking and immunoprecipitation of Argonaute followed by massively parallel sequencing of the crosslinked RNAs (CLIP-seq) directly measures miRNA:mRNA targeting events within RISC complexes. The interactions within RISC are crosslinked in vivo by UV light, RNase removes unprotected RNA outside RISC, and the complexes are immunoprecipitated using AGO-specific antibodies. The crosslinked miRNA:mRNA segments are then isolated and converted into cDNA libraries for sequencing. High-throughput sequencing of RNA isolated by crosslinking immunoprecipitation (HITS-CLIP) (Chi et al., 2009) uses 254 nm UV light for crosslinking. Photoactivatable-Ribonucleoside-Enhanced Crosslinking and Immunoprecipitation (PAR-CLIP) (Hafner et al., 2010) treats cells with nucleoside analogs that are incorporated into nascent mRNAs, and uses 365 nm UV light for crosslinking. By analyzing the sequencing output, CLIP-seq can pinpoint the exact sequence context in which miRNA binding occurs. Haecker et al. (2012) in our lab performed HITS-CLIP in two PEL cell lines that are at different developmental stages, and identified enriched sequencing clusters as putative targets of endogenous human and KSHV miRNAs. Comparison of the results from the two cell lines further supports the notion that miRNA regulation is cell-type- and developmental-stage-specific.

1.4.3 Proteome and Transcriptome Analysis after Perturbing miRNA Expression

Profiling changes in the proteome after perturbing miRNA levels is another approach to identifying the targets of a miRNA. Ectopic miRNA expression induces genome-wide degradation of target mRNAs. Gain of function can be introduced by transfection or retroviral transduction. Conversely, loss of function can be studied by genetic knockout or by expression of miRNA-silencing constructs, which allow target levels to increase. Various types of silencing constructs have been developed. To name a few, antagomirs are 2'-O-methyl backbone-modified oligonucleotides antisense to a specific miRNA, with extensive complementarity beyond the seed sequence (Krützfeldt et al., 2005). Sponges are miRNA inhibitors, expressed as synthetic mRNAs, that contain multiple binding sites for the seed sequence of a miRNA. When expressed transgenically in tissue culture or in vivo, a sponge can block an entire family of related miRNAs with functional redundancy (Ebert et al., 2007).

SILAC (stable isotope labeling with amino acids in cell culture) is a mass spectrometry approach that measures the relative protein abundance of samples labeled with different isotopes. SILAC studies that assayed the proteomic effect of miRNA overexpression (Baek et al., 2008; Selbach et al., 2008; Vinther et al., 2006; Yang et al., 2010) have shown that protein repression and mRNA degradation are strongly correlated for miRNA targets, with only a small number of targets showing a change in protein level without a reduction in mRNA. Microarray analysis showed that miRNAs can reduce the levels of their target transcripts (Lim et al., 2005). Furthermore, ribosome profiling was applied to measure the translation of target mRNAs during ectopic and endogenous miRNA targeting; the results indicated that at least 84% of miRNA-mediated repression was attributable to decreased mRNA abundance (Guo et al., 2010). The conclusion that mRNA destabilization, rather than translational inhibition, is the predominant mechanism of miRNA targeting (Guo et al., 2010) supports transcriptome approaches, including microarrays and RNA-seq, for miRNA target identification.
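Because mRNA destabilization dominates, a natural follow-up to a transcriptome experiment is to ask whether genes downregulated after miRNA overexpression are enriched for seed sites relative to all measured genes. The minimal Python sketch below uses a one-sided hypergeometric test; the fold-change cutoff, gene names, and seed-site annotations are hypothetical inputs, and the test is one reasonable choice rather than the method of the studies cited here. SciPy's `scipy.stats.hypergeom` distribution gives the same tail probability.

```python
"""One-sided hypergeometric test for seed-site enrichment among
downregulated genes (toy sketch; all inputs are hypothetical)."""
from math import comb


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) when drawing n of N genes, K of which carry a seed site."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def seed_enrichment(log2fc: dict[str, float], genes_with_site: set[str],
                    cutoff: float = -0.5) -> tuple[int, int, int, int, float]:
    """Test whether genes with log2 fold change <= cutoff are enriched
    for seed sites. Returns (k, n, K, N, p_value)."""
    universe = set(log2fc)                      # all measured genes
    down = {g for g, fc in log2fc.items() if fc <= cutoff}
    with_site = genes_with_site & universe      # ignore unmeasured genes
    k = len(down & with_site)
    n, K, N = len(down), len(with_site), len(universe)
    return k, n, K, N, hypergeom_upper_tail(k, n, K, N)


if __name__ == "__main__":
    toy_fc = {"g1": -1.0, "g2": -0.8, "g3": 0.1, "g4": 0.2}
    print(seed_enrichment(toy_fc, {"g1", "g2", "g9"}))
```

In the toy call both downregulated genes carry a site (k = 2 of n = 2, with K = 2 of N = 4 measured genes), giving p = 1/6; a lack of enrichment in real data would be consistent with the observation that many DEGs are indirect targets.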


Microarrays rely on nucleic acid hybridization to measure gene expression levels at the genome scale (Schena et al., 1995). DNA probes are attached to a solid surface, allowing transcripts of all known genes in the genome to be hybridized simultaneously. The experimental conditions are designed so that only truly complementary pairs bind, and negative control probes measure random hybridization. Microarrays have successfully identified many putative targets of KSHV miRNAs (Gottwein et al., 2007; Samols et al., 2007; Skalsky et al., 2007; Ziegelbauer et al., 2009). However, the differentially expressed genes (DEGs) are not necessarily enriched for seed sequences (Lim et al., 2005; Park et al., 2008), indicating that many DEGs are not regulated by the miRNA directly.

RNA analysis by massively parallel cDNA sequencing (RNA-seq) provides an alternative to microarrays. The introduction of high-throughput sequencing technologies revolutionized omics studies. One of the most popular sequencing systems, the Illumina Genome Analyzer, amplifies single DNA molecules on the surface of a flow cell to generate clusters, which are then sequenced by adding fluorophore-labeled reversible chain terminators. Each cycle reads one base, so the number of sequencing cycles determines the read length. Increasing speed and falling costs allow high-throughput sequencing to be applied for various purposes, such as genome resequencing, miRNA target identification (CLIP-seq), and identification of cis-regulatory elements (more in Section 1.6). RNA-seq has been used to study overall expression, miRNA distribution, splicing events, and even new gene discovery (reviewed by Pepke et al., 2009). Cap Analysis of Gene Expression (CAGE), which directly measures the transcriptional signal at the transcription start sites (TSS) of genes, was initially developed for Sanger sequencing but is now more powerful when combined with high-throughput sequencing (Shiraki et al., 2003). In RNA-seq, polyadenylated (poly(A)) RNAs are enriched, fragmented and converted into a cDNA library for sequencing. The resulting sequencing reads are mapped to the reference genome and counted to obtain the number of reads corresponding to each gene.

Wen et al. (2011) reported that targets identified by CLIP-seq and those identified by expression-based methods have different features. Target contextual features are highly predictive in both types. However, seed match features are more predictive for expression-based targets, whereas 3' UTR length and conservation of the flanking region are more discriminative for CLIP-seq targets (Wen et al., 2011). Expression-based methods detect target genes with substantially changed expression levels, but whether these changes result from direct or indirect regulation by the miRNA remains unknown. In contrast, CLIP-seq methods detect target genes bound by one or more miRNPs, but the effect of such targeting is not measured.

1.4.4 Target Validation with Molecular Experiments

Target genes identified by high-throughput methods need to be validated molecularly. qPCR assays and western blots detect the repression effect of perturbed miRNA expression at the mRNA and protein levels, respectively (reviewed by Kuhn et al., 2008). Luciferase reporter assays can distinguish direct targets from indirect targets (e.g., Boss et al., 2011). The caveat is that the downregulation observed in reporter assays may not occur at physiological levels.

1.5 Biological Functions Must Be Studied in the Context of Regulatory Networks

Due to the limitations of each target identification method, it is best to combine the results of several methods to obtain a high-confidence target set. While a miRNA has the potential to regulate many mRNAs, only a fraction of the potential targets may be available in a given cellular context and have phenotypic consequences. Identifying the direct mRNA targets that are expressed at the same time and place as the miRNA of interest is the first step in identifying the physiologically important targets. Once the direct targets are known, dissecting the downstream phenotypic effects requires elucidating the functions of the individual genes in the context of the networks in which they operate.

Genes and their products always act in networks with other genes and proteins and in the context of their environment. Mendelian diseases, in which a single gene switches the phenotype, are rare, and whether such genes act in simple linear cascades or in more complex ways is a topic of much research. For more complex disorders, the phenotypes arise from cumulative genetic and/or epigenetic defects in a number of driver genes. Complex traits such as cancer are emergent properties of perturbed molecular networks that are modulated by many factors (Chen et al., 2008). It is imperative to increase knowledge of the biological mechanisms by which these genes form GRNs that are dysregulated in disease states.

1.5.1 Systems Biology

The central task of systems biology is to comprehensively gather information and integrate these data to generate predictive models of complex biological systems. To accomplish this task, a systems biology framework includes three key components: perturbation, measurement and model development (Ideker et al., 2001). Biological systems are robust, with buffering mechanisms provided by redundant functions, and GRNs are resilient to environmental fluctuations. A single perturbation typically does not greatly affect the biological system. For example, individual miRNA mutant worms (Miska et al., 2007) and mice (Ebert and Sharp, 2012) showed no gross phenotypes, likely due to overlapping functions provided by other members of the miRNA family.


The robustness of biological networks is reflected in the connectivity of network nodes, which follows a power-law distribution (Barabasi and Albert, 1999; Wagner and Fell, 2001): most genes are sparsely connected, while a few highly connected hubs dominate the network topology. Perturbing non-hub genes is unlikely to have severe outcomes, whereas perturbation of key nodes in the network may have profound effects. Hub genes are hypothesized to be essential for survival, and they offer perturbation targets for understanding and manipulating the system. In the yeast protein-protein interaction (PPI) network, hub genes control basic cellular functions (Carter et al., 2004; Han et al., 2004; Jeong et al., 2001). However, even for hub genes, the properties of an individual element alone are insufficient to predict complex cellular behavior. In living cells, GRNs constantly detect, process, and respond to changing signals from the environment; all cellular information processing is dynamic in nature. A complete model would include isoforms and chemical modifications, and would have to restrict interactions between molecules located in different cellular compartments.

1.5.2 Network Inference from Expression Profiles

A series of approaches and algorithms have been developed to infer GRNs from observational data. One of the earliest approaches is clustering, in which genes with similar expression patterns are organized into groups (Eisen et al., 1998). Each gene in a cluster is a node of the network, regulatory interactions are represented by edges between nodes, and the correlation coefficient indicates the strength of pairwise interactions. Genes belonging to the same functional group are typically clustered together, as expected, but uncharacterized genes were also observed in these clusters (Eisen et al., 1998). The edges imply a coordinated response, but not necessarily causality.
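The correlation-based view of networks described above, in which co-expressed genes are joined by edges and hubs are the most highly connected nodes, can be sketched in Python as follows. It assumes a simple hard threshold on the absolute Pearson correlation (the value 0.7 and the four toy profiles are hypothetical choices); it is not the procedure of Eisen et al. (1998), which used hierarchical clustering, and its edges imply coordinated response rather than causality.

```python
"""Toy correlation-based co-expression network and node degrees."""
from itertools import combinations
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; returns 0.0 if either profile has zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def coexpression_edges(profiles: dict[str, list[float]],
                       threshold: float = 0.7) -> dict[tuple[str, str], float]:
    """Connect gene pairs whose |r| >= threshold (undirected edges)."""
    edges = {}
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if abs(r) >= threshold:
            edges[(g1, g2)] = r
    return edges


def degrees(profiles: dict[str, list[float]],
            edges: dict[tuple[str, str], float]) -> dict[str, int]:
    """Number of edges per gene; a few high-degree genes would be hubs."""
    deg = {gene: 0 for gene in profiles}
    for g1, g2 in edges:
        deg[g1] += 1
        deg[g2] += 1
    return deg


if __name__ == "__main__":
    toy = {  # hypothetical expression profiles across four samples
        "x": [1, -1, 1, -1],
        "y": [1, 1, -1, -1],
        "hub": [2, 0, 0, -2],   # x + y, so correlated with both
        "z": [1, -1, -1, 1],    # uncorrelated with the others
    }
    edges = coexpression_edges(toy, threshold=0.7)
    print(edges)
    print(degrees(toy, edges))
```

In this toy example "hub" is connected to both "x" and "y" (r of about 0.707 each), while "x" and "y" are uncorrelated with each other; tabulating such degrees over thousands of genes is how one would check for the power-law pattern mentioned above.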


33 Improvement to the simple clustering method takes advantage of advances in information theory to eliminate a large fraction of indirect interactions. By computing the mutual information (MI) (Butte and Kohane, 2000) for all gene pairs in a microarray dataset, gene pairs with the strongest (and irreducible to other dependencies) statistical association between mRNA abundance levels can be identified. MI is a probabilistic measure of relatedness. The MI algorithm models a targets behavior as a function of its regulators and searches for the most predictive regulator set. Statistical dependent genes are only considered as biologically related if the pairwise MI between two genes is significant ly above threshold. This approach has been used to reconstruct yeast metabolic networks from gene expression data (Pe'er et al., 2002) ARACNE (Margolin and et al., 2006) and CLR (context likelihood of relatedness) (Faith et al., 2007) both implement the MI algorithm and can take a large set (>100) of microarray profiles to infer the transcriptional regulatory networks for systems as complex as mammalian cells. Bayesian networks (Pearl, 1988) are probabilistic models that represent statistical dependencies among multiple variables, which are biomolecules in the cases of GRNs. Bayesian networks illustrate the influential connections (i.e. pathway interactions) among variables i n the form of an influence graph with nodes (i.e. biomolecules) and edges (dependencies between a pair of nodes), and a joint probability distribution that describes the form and magnitude of each nodes dependence on its parent(s). Bayesian networks searc h the space of possible structures in order to find the one that best reflects probabilistic relationships in a biological dataset. These directed acyclic graphs (DAG) describe the strength and


direction of edges between nodes using a series of equations. Molecular networks are described as quantitative models of biochemical reactions. Bayesian networks have been used extensively in biology to model connectivity within both genetic regulatory pathways (Friedman et al., 2000; Hartemink et al., 2001) and signaling pathways (Sachs et al., 2002; Sachs et al., 2005; Woolf et al., 2005) in model organisms. For example, Sachs et al. (2005) used Bayesian networks to reconstruct detailed signaling pathway structures in human T cells using only the concentrations of phosphoproteins measured simultaneously in individual cells. Without prior knowledge of any pathway, they identified the majority of known influences between the measured signaling components.

These network identification algorithms require at least 100 different types of samples. Even this sample size is too small with regard to the ultimate goal: a comprehensive network involving thousands of genes that are either observable or hidden in the expression profiles. Overfitting is a common problem in current studies of network construction from expression data, and true correlations are hard to distinguish from spurious ones. For systems as complex as mammalian cells, few major insights have emerged from such modeling efforts. To address this challenge, bootstrap methods can identify significant network features that are robust to perturbations of the observations. Factor analysis models the covariation among genes and can include an underlying continuum of unobserved biological components (Coffman et al., 2005; Nuzhdin et al., 2009; Tarone et al., 2012). Factor analysis is especially suitable when the number of genes is greater than the number of samples, which is the case in most genomic settings. Another approach is to take advantage of prior knowledge about biological
principles to restrict the set of possible network structures. This approach is termed bottom-up, in contrast to the top-down approaches, including clustering and probabilistic models, which identify networks as patterns in a large set of expression data.

1.5.3 Network Inference by Integrating Multiple Resources

Integrative methods use independent experimental and bioinformatic datasets to understand the properties of biological systems. Known biological principles are used as the foundation for bottom-up network inference. Many regulatory features have been identified using collections of microarray data integrated with maps of interaction networks (e.g., Ihmels et al., 2002; Li and Yang, 2004). Mapping gene expression data onto a physical interaction network provides a context in which hypotheses can be developed to explain observed effects on gene expression (Lin et al., 2005). This integrated approach has been applied in yeast (Washburn et al., 2003) and mouse (Grieneisen et al., 2007). Yuh et al. (1998) knocked out a single gene, measured the response of the whole network during sea urchin development, and integrated the genome-wide measurements with previous knowledge about CREs. The result was a proposed network specified by direct effects. Davidson et al. (2002) integrated expression profiles and sequence features obtained after perturbation of individual TFs into transcriptional regulatory networks. Gunsalus et al. (2005) integrated transcriptomes, PPI, and RNAi data to provide a multidimensional view of how molecular machines work together during embryogenesis of Caenorhabditis elegans. Chuang et al. (2007) developed a network-based classifier by integrating gene expression profiles of metastatic and nonmetastatic cancer patients with the human PPI network. Pujana et al. (2007) combined gene expression profiling with functional genomic and proteomic data from different species to identify genes associated with breast cancer risk.
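The mutual information relevance-network idea discussed above (Butte and Kohane, 2000), together with bootstrap resampling to find network features that are robust to perturbations of the observations, can be illustrated with a minimal sketch. It assumes equal-width binning of expression values, a plug-in MI estimator, and an arbitrary MI threshold in bits; all function names are hypothetical. It is not the ARACNE or CLR implementation: those tools add steps that are not reproduced here (for example, ARACNE's data processing inequality to prune indirect edges, and CLR's background correction), and plug-in MI estimates from few samples are biased upward.

```python
import math
import random
from collections import Counter
from itertools import combinations


def discretize(values, n_bins=3):
    """Assign each expression value to one of n_bins equal-width bins (0..n_bins-1)."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant profile carries no information
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # min(...) keeps the maximum value inside the last bin
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(x, y):
    """Plug-in estimate of MI (in bits) between two equal-length discrete vectors."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


def relevance_network(expr, threshold, n_bins=3):
    """Return edges (g1, g2) whose pairwise MI exceeds `threshold`.

    `expr` maps gene name -> list of expression values, one per sample."""
    binned = {g: discretize(v, n_bins) for g, v in expr.items()}
    edges = set()
    for g1, g2 in combinations(sorted(binned), 2):
        if mutual_information(binned[g1], binned[g2]) > threshold:
            edges.add((g1, g2))
    return edges


def bootstrap_edge_support(expr, threshold, n_boot=200, n_bins=3, seed=0):
    """Fraction of bootstrap resamples (samples drawn with replacement)
    in which each edge is recovered by relevance_network."""
    rng = random.Random(seed)
    n_samples = len(next(iter(expr.values())))
    counts = Counter()
    for _ in range(n_boot):
        idx = [rng.randrange(n_samples) for _ in range(n_samples)]
        resampled = {g: [v[i] for i in idx] for g, v in expr.items()}
        counts.update(relevance_network(resampled, threshold, n_bins))
    return {edge: c / n_boot for edge, c in counts.items()}
```

For toy data in which gene B is a scaled copy of gene A and gene C follows an unrelated repeating pattern, for example `expr = {"A": list(range(1, 13)), "B": [2 * a for a in range(1, 13)], "C": [1, 2, 3] * 4}`, `relevance_network(expr, 0.5)` returns `{("A", "B")}`, and the A-B edge is recovered in nearly every bootstrap resample. Edges with low bootstrap support would be treated as unreliable.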


1.6 Understanding the Gene Regulatory Networks (GRN)

The following section introduces the complex mechanisms by which transcription and gene expression are regulated, with special emphasis on those that I have aimed to integrate into our systems biology approach to decipher the role of two miRNAs in endothelial and lymphoid cells. Complex GRNs consist of alternative (or altered) gene expression patterns and the functional output of their gene products. Divergent gene expression patterns are achieved by the combinatorial roles of cis- and trans-acting factors in response to signal stimuli (Cowles et al., 2002; Nuzhdin et al., 2009; Sieberts and Schadt, 2007; Tarone et al., 2005; Wray, 2007). Cis-regulatory elements (CREs) include promoters, enhancers, silencers and insulators. Promoters and enhancers are positive elements of gene regulation, for proximal and distal genes respectively. Silencers are sites of heterochromatin initiation that inhibit specific promoters. Insulators protect genes located in one chromatin domain from promiscuous regulation by enhancers or silencers in neighboring domains (reviewed by Raab and Kamakaka, 2010). The function of a CRE depends on genetic variation in its DNA sequence as well as on the epigenetic state of the chromatin, which determines its accessibility to trans-acting DNA-binding proteins.

1.6.1 Allelic Imbalance Contributes to Phenotypes and Diseases

Allelic imbalance (AI), arising from sequence variation between the two copies of a gene, is observed in diploid organisms including humans (Graze et al., 2009; Guo et al., 2008; Lo et al., 2003; Zhang et al., 2009; Zhang and Borevitz, 2009). AI can affect gene regulation through allele-specific binding (ASB) of regulatory proteins and allele-specific expression (ASE) (McDaniell et al., 2010; Rozowsky et al., 2011). AI contributes to phenotypic variation in human populations (Johnson et al., 2005; Pickrell et al.,
2010). Approximately 18% of annotated human genes exhibit AI (Djebali et al., 2012). AI is an underlying factor in cancer etiology and is associated with the risk of developing breast cancer (Meyer et al., 2008) and colorectal cancer (de la Chapelle, 2009; Valle et al., 2008). Genes showing AI in oral squamous cell carcinoma tumors relative to matched normal tissues were enriched in cancer-related functions (Tuch et al., 2010). AI of the proapoptotic gene DAPK1 is associated with chronic lymphocytic leukemia (Lynch et al., 2002). AI in heterozygous individuals has been assayed with allele-specific qPCR (Szabó and Mann, 1995), with pyrosequencing (Ahmadian et al., 2000; Wittkopp et al., 2004), with targeted SNP typing arrays (e.g., Serre et al., 2008), and by genome-scale methods such as high-density microarrays (e.g., Zhang and Borevitz, 2009) or RNA-seq (McManus et al., 2010; Pickrell et al., 2010; Zhang et al., 2009). We have designed a reliable, cost-effective microarray platform that measures overall expression, exon usage and AI simultaneously for Drosophila (Yang et al., 2011). These data, which have been published, are described in detail in Chapter 3.

1.6.2 Epigenetic Modifications Determine Gene Expression of Host Cellular and Herpesvirus Genes

Epigenetic mechanisms such as histone modifications, DNA methylation and DNA looping alter the physical state of chromatin and modulate the accessibility of CREs to trans-acting factors in a cell type-specific manner (Deaton et al., 2011; Gheldof et al., 2010; Saiz and Vilar, 2006; Wiench et al., 2011). Acetylation and methylation of histones are involved in activation or repression of gene expression (Barrera et al., 2008; Heintzman et al., 2009). For example, H3Ac and H3K4me3 are linked to gene activation (Barski et al., 2007; Boyer et al., 2006), and H3K9me3 and H3K27me3 are
markers of repressed regions (Li et al., 2007; Tachibana et al., 2002). Bernstein et al. (2012) demonstrated cell type-specific patterns of histone modifications by mapping 12 types of histone modifications in 46 cell types. Expression status and expression levels can be accurately predicted from the presence of different combinations of histone modifications (Ha et al., 2011; Kurdistani et al., 2004; Wang et al., 2008c). Direct methylation of DNA at CG dinucleotides is another epigenetic modification that affects gene expression (Ball et al., 2009), and it also exhibits significant tissue specificity (ENCODE Project Consortium, 2012). In addition, DNA looping allows extensive interactions between distantly located enhancers and promoters (Branco and Pombo, 2006; Fraser and Bickmore, 2007; Fullwood et al., 2009; Handoko et al., 2011; Li et al., 2012; Osborne and Eskiw, 2008; Woodcock, 2006). Genes that are coordinately expressed but located in different regions, or even on different chromosomes, can be looped into common transcription factories. Genome-wide profiling of chromatin loops by 3C mapping in 125 cell types found that only a small fraction of the identified DNA loops were common across cell types, while roughly one third were detected in only one cell type (Thurman et al., 2012). The epigenetic state also influences miRNA gene expression (reviewed by Iorio et al., 2010). Approximately half of miRNA genes are associated with CpG islands, suggesting regulation by promoter methylation (Weber et al., 2007). Chromosomal abnormalities that are frequent in diseases such as cancer affect the expression of miRNAs located in fragile regions of the genome (Zhang et al., 2006). Aberrant miRNA profiles have been described for most cancers and have been proposed as useful markers for tumor diagnosis (Calin and Croce, 2006; Calin et al., 2004; Garzon et al., 2008;
Volinia et al., 2006). In a study that integrated miRNA expression with DNA methylation and H3K27me3 marks at the promoter regions of miRNA genes in two differentiated cell types, 19% of the cell type-specific miRNAs showed epigenetic regulation (Vrba et al., 2011). Conversely, miRNAs can regulate the expression levels of epigenetic effectors. For example, the miR-29 family directly targets the DNA methyltransferases DNMT3a and DNMT3b and reverts aberrant methylation in lung cancer (Fabbri et al., 2007). In mice, the miR-206/miR-133a cluster targets the hSWI/SNF ATP-dependent complexes, a major group of chromatin-modifying complexes (Nohata et al., 2012).

1.6.3 Transcription Factors as Major Regulators of Gene Expression

Besides CREs, trans-acting factors such as transcription factors (TFs) (Takahashi and Yamanaka, 2006; Vaquerizas et al., 2009) are critical for the establishment and maintenance of cell type specificity. TFs are the largest family of human proteins: around 10% of human proteins are TFs (Babu et al., 2004). Broadly speaking, TFs can be classified as general or sequence-specific. General TFs such as TFIID and TFIIB cooperate with RNA polymerase II (RNAPII) and bind ubiquitously to a large fraction of genes (Lee and Young, 2000). Sequence-specific TFs recognize TF binding sites (TFBSs) in subsets of genes, restricted by their sequence preferences. The two types of TFs can function together. Binding of general TFs to DNA may induce conformational changes, allowing transcriptional regulation by additional TFs to follow (Leung et al., 2004; Meijsing et al., 2009). Sequence-specific TFs bind to DNA sequences at enhancers and promoters and recruit a polymerase complex to the transcriptional start site (TSS) to initiate transcription (Ptashne and Gann, 2001). Differential expression and binding patterns of
sequence-specific TFs lead to distinct spatiotemporal patterns of target expression (Kadonaga, 2004). TFs interact with components of the polymerase complex and with other complexes, such as chromatin remodelers, to regulate transcription. Binding of sequence-specific TFs is correlated with the expression levels of their putative targets (Lee et al., 2011). Using gene expression and TF binding data, statistical models revealed that TF binding signals around the TSSs of genes predict gene expression as accurately as differences in chromatin accessibility (Cheng et al., 2011; Ouyang et al., 2009). At promoter regions, TF binding signals and histone modification signals are highly correlated (Cheng et al., 2012).

The latency-associated nuclear antigen (LANA) of KSHV is a transcriptional regulator of both viral and host cellular gene expression. LANA tethers the viral genome to host chromosomes during metaphase, maintaining a stable copy number of episomes (extrachromosomal, covalently closed circular copies of the genome) through cell division. LANA also acts as a TF that represses the expression of the replication and transcription activator (RTA), which triggers the switch from latency to lytic replication (Lan et al., 2004). Deletion of LANA increases the expression of lytic genes and the infectivity of the virus (Li et al., 2008a). LANA interacts with the cellular TFs ATF4/CREB2 and SP1 (Krithivas et al., 2000; Lim et al., 2000). LANA can inhibit the cellular tumor suppressor p53 and thereby avoid p53-mediated apoptosis (Friborg et al., 1999). LANA competes with the activating TF IRF3 at the IFNB promoter to breach the interferon pathway (Cloutier and Flamand, 2010). Through its interactions with various host TFs and chromatin regulatory proteins, LANA promotes cell growth and attenuates the antiviral defense. LANA also mediates epigenetic modifications of the viral genome and regulates
the expression of viral genes. By binding to the KSHV terminal repeats, LANA mediates replication of the KSHV genome (Garber et al., 2002; Garber et al., 2001; Hu et al., 2002; Hu and Renne, 2005).

1.6.4 Methodology to Investigate DNA-Protein Interactions

Chromatin immunoprecipitation followed by microarray (ChIP-chip) or by sequencing (ChIP-seq) enables genome-wide localization of DNA-binding proteins, including histones and TFs (Barski et al., 2007; Johnson et al., 2007; Ren et al., 2000). In ChIP assays, cellular DNA-protein interactions are preserved by crosslinking with formaldehyde. The chromatin is sheared into small fragments by sonication, and the DNA-protein complexes of interest are selected using specific antibodies, resulting in an enrichment of DNA fragments that were bound to the protein of interest. The crosslinking is then reversed, and the DNA fragments are released from the binding complex to be assayed. Sequencing reads aligned to the reference genome form peaks at putative binding sites when visualized in a genome browser. ChIP-seq experiments identified thousands of LANA binding sites in PEL cells (Hu et al., In prep; Lu et al., 2012). Many of these binding sites are located in the promoter regions of genes involved in cancer, apoptosis, and immune responses.

ChIP experiments have limitations. Developing a high-quality antibody for the protein of interest is time-consuming, and indirect TF-DNA interactions mediated by protein-protein interactions may also be captured. It is difficult to determine which sites among thousands of peaks are functionally active. A single ChIP experiment captures the binding profile of only one TF in one cell type, so a large number of ChIP experiments is required to obtain a comprehensive protein-DNA interactome. Noise may be introduced into the data at many steps of a ChIP protocol: antibody quality, library bias toward
certain genomic regions, technical issues while performing the experiment, and sequencing errors. Thus, in addition to the sequences truly associated with the protein of interest, random sequences are also present and create a background of nonspecific binding. Despite attempts by peak-calling algorithms to account for potential biases (Kidder et al., 2011; Kuan et al., 2011), false positives in peak identification are prevalent. While analyzing LANA and histone ChIP-seq data, I comparatively tested various tools and algorithms to distinguish specific signals from noise and to accurately annotate epigenetic marks and LANA binding sites genome-wide. Consequently, I developed a guideline for the design and analysis of ChIP-seq experiments that incorporates biological replicates (Yang et al., submitted to Computational and Structural Biotechnology Journal; described in detail in Chapter 4).

Footprinting methods identify open chromatin sites where active transcription is ongoing. Without histone protection, TF-occupied sites are sensitive to DNase I and are preferentially cut (Gross and Garrard, 1988). DNase-seq combines DNase I treatment with high-throughput sequencing and can identify hypersensitive sites across the genome (Urnov, 2003). Formaldehyde-assisted isolation of regulatory elements (FAIRE-seq) exploits differences in crosslinking efficiency: because DNA bound by TFs is crosslinked less efficiently than DNA packed into nucleosomes, FAIRE-seq recovers nucleosome-depleted genomic regions.

Identification of TF binding sites (TFBSs) can also benefit from bioinformatics. Sequence-specific TFs can bind variable regulatory sites, which are rarely identical but often share a core similarity. A consensus sequence may emerge from individual binding sites with similar sequence motifs. One method to accurately represent a TFBS
is the position weight matrix (PWM). Each possible base at each position of the TFBS is assigned a weight reflecting its agreement with the consensus, and each weight is an element of the matrix. The sum of the elements selected by a particular sequence is the score of that candidate TFBS (Hawley and McClure, 1983; Stormo, 1988). Searching the genome, especially CRE regions, for the consensus sequence or with PWMs can identify putative TFBSs. Many algorithms have been developed to identify the optimal alignment (Lawrence and Reilly, 1990; Stormo and Hartzell, 1989). In another scenario, the genomic regions bound by a specific TF have been identified, but the consensus motif of the binding sites still needs to be extracted. For this purpose, bioinformatic tools have been developed to discover patterns enriched in a set of sequences. Some of the most commonly used tools include the MEME suite (Bailey et al., 2009) and Regulatory Sequence Analysis Tools (RSAT) (Thomas-Chollier et al., 2011). Once de novo motifs are identified, comparison with known TF motifs helps to remove artifacts and retain those that are likely to be biologically relevant.

1.6.5 Transcriptional Regulatory Networks

The number of genomic binding sites varies greatly among transcription factors (e.g., ENCODE ChIP-seq data). For some TFs, ChIP-seq identified a large number of binding sites: over 10,000 STAT1 binding sites were identified in human HeLa S3 cells upon interferon-γ stimulation (Robertson et al., 2007). These numbers exceed our expectation of the possible direct targets of STAT1. They may be artifacts, but more and more studies report similar results (Cheng et al., 2009; Frankel et al., 2010; Kassouf et al., 2010; Palii et al., 2011), raising doubts about the traditional model in which TFs directly activate or repress the expression of a defined list of target genes in a discrete network. The excessive binding may be nonfunctional (Li et al., 2008b) or involved in indirect regulation such as chromatin looping and nuclear structure (MacQuarrie et al., 2011). Based on these data, Biggin (2011) proposed that TF regulatory networks are continuous rather than discrete: a TF may bind essentially all genes, with a continuum of binding strengths. The most strongly bound genes are mostly targets with important biological effects. Other target genes are weakly bound and moderately regulated, and yet they are evolutionarily conserved (Biggin, 2011; Taney and Smith, 2006). Low-affinity binding may not regulate the expression levels of the bound genes but rather control the concentration of free-state TFs (Riggs, 1975).

Frequently, multiple TFs bind to CREs in a combinatorial fashion to shape cell type-specific gene expression. It has been estimated that 75% of all metazoan TFs heterodimerize with other factors (Walhout, 2006). For example, CTCF co-localizes with the cohesin subunits RAD21 and SMC3 (Parelho et al., 2008; Rubio et al., 2008). This CTCF-cohesin complex activates gene expression (Schmidt et al., 2010) and mediates long-range DNA interactions via chromatin looping at many loci (Degner et al., 2009b; Hadjur et al., 2009; Hou et al., 2010; Kim et al., 2011; Majumder and Boss, 2011). Many combinations are universal, but cell type specificity is evident: FOS and JUN associate in K562 and HeLa cells but not in lymphoblastoid GM12878 cells (Gerstein et al., 2012). Different combinations of TFs also have different preferences for genomic positions. For example, FOS-SP2 binds proximal regulatory elements such as core promoters, whereas FOS-JUND typically binds distal regions (Gerstein et al., 2012). Variability in the composition of TF complexes leads to differential binding patterns
(Leung et al., 2004; Saccani et al., 2003; So et al., 2007; van Dam and Castellazzi, 2001).

1.6.6 Protein Functioning Networks

Besides forming TF complexes, protein-protein interactions (PPIs) are crucial for many other biological processes. Proteins rarely act alone at the biochemical level (Alberts, 1998); most gene functions are carried out not by single proteins but by protein complexes. A TF can function as an activator or a repressor depending on its binding context (Botquin et al., 1998; Ma, 2005; Zhao et al., 2000). PPIs interplay with other layers of regulation. ChIP data have revealed interesting connections between histone modifications and splicing (Kolasinska-Zwierz et al., 2009; Schwartz et al., 2009). Individual splice variants can affect PPIs: isoforms regulated by specific splicing factors may physically associate with each other through PPIs (Calarco et al., 2009; Ule et al., 2005; Warzecha et al., 2010). A significant proportion of alternative exons structurally map to the surface of proteins, potentially influencing PPIs (Wang et al., 2005). Tissue-specific alternative splicing (AS) networks may have evolved in part to fine-tune PPIs for a given cell or tissue type.

PPI data can be obtained from publicly available databases such as IntAct (Kerrien et al., 2007) and BioGRID (Breitkreutz et al., 2008; Stark et al., 2006). Experimental approaches to identify PPIs include the classical yeast two-hybrid (Y2H) assay (Fields and Song, 1989; Ito et al., 2001; Uetz et al., 2000), protein chips (Zhu et al., 2001), and high-throughput co-immunoprecipitation (co-IP) followed by mass spectrometry (MS) analysis of purified protein complexes (Ho, 2001). Y2H assays detect binary interactions through activation of reporter gene expression and have the potential to predict direct protein interaction partners. Y2H results contain many artificial interactions due to the
occasional conformational changes induced by the bait/prey proteins, and to unnatural expression levels and compartmentalization. MS of co-IPed protein complexes indicates that the proteins are indeed assembled into complexes. As with ChIP-seq, however, the co-IP process for MS assays introduces noise and false positives. The overlap between Y2H and MS results is relatively low (Krogan et al., 2006; Yu et al., 2008), so interpretation of PPI data requires caution. The list of interacting proteins can be filtered by checking spatiotemporal expression information from expression profiling.

A key component of the cellular machinery is the set of signaling proteins. Signaling proteins form signaling pathways, which interact to form complex signaling networks that allow a cell to sense its internal and external environment and respond by altering metabolism and gene expression. KEGG (Aoki-Kinoshita and Kanehisa, 2007) is a substantial knowledge base for these pathways. The core pathways are modified in different organisms and specific cell types. The PPIs of some core pathways have been reviewed in detail, such as the cancer-associated Raf-MEK-ERK MAPK cascade (Roberts and Der, 2007) and the diabetes-associated insulin pathway (Liu et al., 2007). Similarly, one can build a signaling network for a system of interest by integrating other layers of regulators, including TFs and miRNAs. Differential gene expression profiles can be used to identify aberrant signaling proteins. However, genes without changes in expression may still be key regulators, because many signals are propagated by post-translational modifications such as phosphorylation, acetylation and sumoylation, or by proteolytic cleavage events that change protein localization and structure. Measurements of such post-translational modifications in vivo are presently very limited.
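The expression-based filtering of PPI lists mentioned above can be sketched in a few lines. This is a minimal illustration under stated assumptions: a protein counts as present in a cell type when its expression value meets a single, arbitrary cutoff (`min_level`), proteins missing from the expression table are treated as absent, and the function names and gene identifiers are hypothetical. A real analysis would choose cutoffs per platform and might also use temporal or subcellular information.

```python
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[str, str]


def filter_ppi_by_expression(ppi_edges: Iterable[Edge],
                             expression: Dict[str, float],
                             min_level: float) -> List[Edge]:
    """Keep interactions whose two partners are both expressed (>= min_level).

    Proteins absent from `expression` are treated as not expressed."""
    expressed = {gene for gene, level in expression.items() if level >= min_level}
    return [(a, b) for a, b in ppi_edges if a in expressed and b in expressed]


def cell_type_subnetworks(ppi_edges: Iterable[Edge],
                          expression_by_cell_type: Dict[str, Dict[str, float]],
                          min_level: float) -> Dict[str, List[Edge]]:
    """Build one filtered PPI subnetwork per cell type from a shared interaction list."""
    edges = list(ppi_edges)  # materialize once so it can be reused per cell type
    return {cell_type: filter_ppi_by_expression(edges, expr, min_level)
            for cell_type, expr in expression_by_cell_type.items()}
```

For example, with interactions P1-P2, P2-P3 and P3-P4 and a cutoff of 5.0, a cell type in which P3 is below the cutoff keeps only P1-P2, whereas a cell type in which P1 is below the cutoff keeps P2-P3 and P3-P4. The same literature-derived interaction list thus yields different, cell type-specific subnetworks.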


1.6.7 Complex GRNs can be Inferred by Integrating Multiple Layers of Regulation

Interaction, cross-regulation and collaboration among components of multiple layers form the complex networks of tissue-specific regulation. TFs were considered the dominant players in transcriptional regulatory networks, but recent studies have identified other influential factors. Allelic imbalance affects transcript composition as well as the transcript production process, through allele-specific expression and allele-specific binding, respectively. Chromatin structure is determined by nucleosome occupancy and histone modification. It influences the accessibility of DNA to TFs and is in turn influenced by TFs. TFs cross-regulate one another and form context-specific co-associations. After transcription, RNA molecules are subject to post-transcriptional mechanisms, including miRNA targeting. Once translated and post-translationally modified, the final protein products must interact with one another to function properly, sometimes composing signaling networks. Individual cell types use their own combinations of these features to construct their unique biology. On top of the cellular circuitry, viral infection introduces additional regulatory mechanisms through viral gene products. Multiple profiling and interaction datasets from these layers have to be integrated to obtain a global picture of the biology. The heterogeneity of datasets and databases, in terms of data format, data processing protocol, interaction model and annotation sources, presents challenges to such integrative analysis. Moreover, assembling pairwise interactions into hierarchies is itself a challenging task. Ultimately, systems biology aims to comprehensively and quantitatively analyze all of the components, interactions, and dynamics of a biological system. Powerful new high-throughput tools allow experimenters to collect the primary data.
Next-generation sequencing-based technology has increased our knowledge of TF targets and miRNA
targets in vivo, providing input material for dissecting the GRN. Given the global network of protein-protein, protein-DNA, and other known physical interactions (e.g., Bader et al., 2001; Bader and Hogue, 2003), data integration can provide important insight into reconstructing networks from sparse and noisy expression data (e.g., Ihmels et al., 2002; Li and Yang, 2004). Integration of several types of data can reduce the effect of noise and lead to a more coherent reconstruction of the network.
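How integrated layers let the effect of a single regulator spread can be illustrated with a toy signed, directed graph that merges miRNA-target and TF-target edges. In this minimal sketch, each edge carries +1 (activation) or -1 (repression), and a breadth-first search predicts the direction of change for every gene reachable from a perturbed regulator, together with the number of regulatory steps. The graph, gene names (`miR-X`, `TF1`, `G1` to `G4`) and function name are hypothetical. The sketch also simplifies heavily: each gene takes the sign of the first shortest path found, so conflicting paths, edge weights and feedback are ignored.

```python
from collections import deque
from typing import Dict, List, Tuple

# regulator -> list of (target, sign); sign +1 = activation, -1 = repression
SignedGraph = Dict[str, List[Tuple[str, int]]]


def propagate_perturbation(edges: SignedGraph, source: str,
                           source_change: int = +1) -> Dict[str, Tuple[int, int]]:
    """Predict (direction, hops) for genes downstream of a perturbed regulator.

    direction is +1 (up) or -1 (down); hops counts regulatory steps from the
    source. Each gene is assigned along the first shortest path found by
    breadth-first search."""
    result: Dict[str, Tuple[int, int]] = {}
    seen = {source}
    queue = deque([(source, source_change, 0)])
    while queue:
        node, direction, hops = queue.popleft()
        for target, sign in edges.get(node, []):
            if target in seen:
                continue
            seen.add(target)
            effect = direction * sign  # repression flips the predicted direction
            result[target] = (effect, hops + 1)
            queue.append((target, effect, hops + 1))
    return result


# Hypothetical example: an overexpressed miRNA represses a TF and a gene G1;
# the TF activates G2 and represses G3; G2 activates G4.
toy = {
    "miR-X": [("TF1", -1), ("G1", -1)],
    "TF1": [("G2", +1), ("G3", -1)],
    "G2": [("G4", +1)],
}
```

Running `propagate_perturbation(toy, "miR-X", +1)` predicts that TF1 and G1 decrease (one step), G2 decreases and G3 increases (two steps, through TF1), and G4 decreases (three steps). Only G1 and TF1 are direct targets; the remaining changes are secondary effects transmitted through the repressed TF.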


Table 1-1. Validated targets of KSHV miRNAs
Viral targets
KSHV miRNA | Target | Function
miR-K12-9*, miR-K12-5, miR-K12-7 | RTA | Replication and transcriptional activator
Cellular targets
KSHV miRNA | Target | Function
miRNA cluster | THBS1 | Angiogenesis inhibitor
miRNA cluster | EXOC6 | SEC15 gene family
miRNA cluster | ZNF684 | Zinc finger protein
miRNA cluster | CDK5RAP1 | Regulation of neuronal differentiation
miR-K12-1 | p21 | Inducer of cell cycle arrest
miR-K12-1 | NF |
miR-K12-3 | LRRC8D | Immune cell activator
miR-K12-3 | NHP2L1 | U4 snRNA nuclear binding protein
miR-K12-3 | NFIB | Transcriptional activator
miR-K12-3, miR-K12-7, miR-K12-11 | | Transcriptional activator
miR-K12-4-3p | GEMIN8 | Required for splicing
miR-K12-5 | BCLAF1 | Pro-apoptotic factor
miR-K12-5 | Rbl2 | Rb-like protein
miR-K12-6, miR-K12-11 | MAF | Transcription factor
miR-K12-7 | MICB | NK cell ligand
miR-K12-11 | BACH1 | Transcriptional suppressor
miR-K12-11 | | Inducer of interferon
miR-K12-10a | TWEAKR | Pro-apoptotic factor


Table 1-2. Algorithms for miRNA target prediction
Algorithm | Criteria for prediction and ranking | Website | Reference
TargetScan | Stringent seed pairing, site number, site type, site context (which includes factors that influence site accessibility); option of ranking by likelihood of preferential conservation rather than site context | http://targetscan.org | Friedman et al., 2008
EMBL | Stringent seed pairing, site number, overall predicted pairing stability | http://russell.embl-heidelberg.de | Stark et al., 2005
PicTar | Stringent seed pairing for at least one of the sites for the miRNA, site number, overall predicted pairing stability | http://pictar.mdc-berlin.de | Lall et al., 2006
ElMMo | Stringent seed pairing, site number, likelihood of preferential conservation | http://www.mirz.unibas.ch/ElMMo2 | Gaidatzis et al., 2007
miRanda | Moderately stringent seed pairing, site number, pairing to most of the miRNA | http://www.microrna.org | Betel et al., 2008
miRBase Targets | Moderately stringent seed pairing, site number, overall pairing | http://microrna.sanger.ac.uk | Griffiths-Jones et al., 2008
PITA | Moderately stringent seed pairing, site number, overall predicted pairing stability, predicted site accessibility | http://genie.weizmann.ac.il/pubs/mir07/mir07_data.html | Kertesz et al., 2007
mirWIP | Moderately stringent seed pairing, site number, overall predicted pairing stability, predicted site accessibility | ery | Hammell et al., 2008
RNA22 | Moderately stringent seed pairing, matches to sequence patterns generated from miRNA set, overall predicted pairing and predicted pairing stability | http://cbcsrv.watson.ibm.com/rna22.html | Miranda et al., 2006
RNAhybrid | Thermodynamic stability, moderately stringent seed pairing | http://bibiserv.techfak.uni-bielefeld.de/rnahybrid/ | Krüger and Rehmsmeier, 2006
TargetBoost | Moderately stringent seed pairing; site number; conservation; thermodynamic stability | http://www.interagon.com/demo/ | Sætrom et al., 2005
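Most algorithms in Table 1-2 start from seed pairing and site type. The sketch below scans a 3' UTR for canonical seed-match sites and classifies each one using the site-type definitions common in the literature (e.g., Bartel, 2009). An 8mer pairs with miRNA nucleotides 2-8 and has an A opposite nucleotide 1. A 7mer-m8 pairs with nucleotides 2-8. A 7mer-A1 pairs with nucleotides 2-7 and has an A opposite nucleotide 1. A 6mer pairs with nucleotides 2-7 only. This is a minimal illustration, not TargetScan's or any other listed tool's implementation: it ignores site context, conservation, pairing stability and accessibility. The function names and the miRNA and UTR sequences in the example are hypothetical.

```python
COMPLEMENT = {"A": "U", "C": "G", "G": "C", "U": "A"}


def revcomp(rna: str) -> str:
    """Reverse complement of an RNA sequence (5'->3' in, 5'->3' out)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def find_seed_sites(mirna: str, utr: str):
    """Return (start, site_type) for each canonical seed site in `utr`.

    Both sequences are read 5'->3'; DNA input (T) is converted to RNA (U).
    Each occurrence of the 6-nt core (pairing with miRNA nts 2-7) is reported
    once, upgraded to 7mer/8mer when the flanking bases qualify."""
    mirna = mirna.upper().replace("T", "U")
    utr = utr.upper().replace("T", "U")
    site6 = revcomp(mirna[1:7])  # pairs with miRNA nucleotides 2-7
    site7 = revcomp(mirna[1:8])  # pairs with miRNA nucleotides 2-8
    hits = []
    for i in range(len(utr) - len(site6) + 1):
        if utr[i:i + 6] != site6:
            continue
        # nt 8 pairs with the base just 5' of the 6-nt core
        m8 = i > 0 and utr[i - 1:i + 6] == site7
        # the base opposite miRNA nt 1 lies just 3' of the core
        a1 = i + 6 < len(utr) and utr[i + 6] == "A"
        if m8 and a1:
            hits.append((i - 1, "8mer"))
        elif m8:
            hits.append((i - 1, "7mer-m8"))
        elif a1:
            hits.append((i, "7mer-A1"))
        else:
            hits.append((i, "6mer"))
    return hits
```

For the hypothetical miRNA `UAGCUUAGCAAUCGGAAUCGGCA`, the 6-nt core site is `UAAGCU` and the 7mer-m8 site is `CUAAGCU`. A UTR that contains `CUAAGCUA`, `CUAAGCUG`, `GUAAGCUA` and `GUAAGCUG` separated by G spacers therefore yields one site of each type, in that order. Counting these hits per UTR gives the "site number" criterion in Table 1-2.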


Figure 1-1. Multiple layers of gene regulatory networks (GRNs). Gene expression is subject to changes in chromatin state, transcription, post-transcriptional control and protein function. Adapted from Schonrock et al. 2012 http://circres.ahajournals.org/content/111/10/1349/F3.expansion.html


Figure 1-2. The KSHV genome. Open reading frames (ORFs) are colored according to their expression pattern during latent, immediate-early, early, or late infection. The KSHV latency-associated region (KLAR) is underlined in blue, and the miRNA genes are labeled in orange.


Figure 1-3. KSHV miRNAs are encoded in the KSHV latency-associated region (KLAR). The latent genes in KLAR are shown in orange, with the direction of latent transcription denoted by orange arrows. Latent promoters are indicated by black directional arrows. The miRNA cluster contains 10 miRNA genes, and 2 additional miRNA genes are encoded downstream of the cluster.


Figure 1-4. Biogenesis pathway for miRNAs. MiRNA precursors begin as hairpin loops in RNAPII or RNAPIII transcripts, in introns or exons. Drosha cleaves the pri-miRNA transcript, leaving a ~80-nt stem-loop (pre-miRNA) that is exported into the cytoplasm. Dicer cleaves off the loop structure, leaving a 21-24 nt dsRNA molecule. The miRNA is incorporated into the RISC, where it binds to the 3' UTR of target transcripts and induces either translational silencing or mRNA degradation, depending on the level of complementarity. The seed sequence of the miRNA, nucleotides 2 through 8, is known to be a critical component of target recognition and binding.


Figure 1-5. Methodology for miRNA target identification. MiRNA targets have mainly been identified by pairwise analysis approaches, which establish that a gene initially predicted to be a target by a number of bioinformatic algorithms can be regulated by perturbation (gain-of-function or loss-of-function) of a specific miRNA. These assays entail cloning and mutagenesis of the 3' UTRs of potential targets downstream of a reporter (luciferase or GFP) and/or monitoring potential targets by real-time RT-PCR in miRNA- or antagomir-transfected cells. Ribonomics approaches such as high-throughput sequencing of RNA isolated by crosslinking immunoprecipitation (HITS-CLIP) and photoactivatable-ribonucleoside-enhanced crosslinking and immunoprecipitation (PAR-CLIP) enable direct identification of miRNA-targeted genes. Both techniques use UV crosslinking to fix RNA-protein interactions, followed by immunoprecipitation of Ago, which enriches miRNAs that are incorporated into RISC complexes and guided to their cognate targets.


CHAPTER 2
A SYSTEMS BIOLOGY APPROACH TO ANALYSIS OF MIRNA PERTURBATION EXPERIMENTS

2.1 Overview

Kaposi's sarcoma (KS) is an endothelial tumor and a major cause of death in AIDS patients. Its associated herpesvirus (KSHV, HHV-8) is a double-stranded DNA virus and a member of the gammaherpesvirus subfamily of human herpesviruses (Chang et al., 1994). KSHV can also infect lymphocytes, in which it establishes latency, and in immunodeficient patients it causes primary effusion lymphoma (PEL) and multicentric Castleman's disease (MCD) (Cesarman and Knowles, 1999; Soulier et al., 1995). The distinct pathological outcomes of KSHV infection in two types of human tissue serve as a model system for studying cell type-specific gene regulation. In KS tumors and PELs, the majority of cells are latently infected and express only viral genes within a specific region of the viral genome, the KSHV latency-associated region (KLAR) (Dittmer et al., 1998; Renne et al., 1996; Zhong et al., 1996). This region encodes the latency-associated nuclear antigen (LANA, involved in latent DNA replication and episomal maintenance), v-Cyclin (a cyclin D homolog that promotes S phase entry), v-FLIP (which promotes cell survival), the kaposin gene family (involved in cytokine mRNA stabilization and cell transformation), and 12 microRNA (miRNA) genes. MiRNAs are small RNAs of 19-24 nucleotides that inhibit translation (Baek et al., 2008; Selbach et al., 2008) and induce mRNA degradation (Eulalio et al., 2009; Farh et al., 2005; Guo et al., 2010). The genomic location of KSHV miRNAs suggests their potential importance in latency establishment and KSHV pathogenesis. Target identification is the first step toward understanding the function of miRNAs. The target specificity of a miRNA is largely determined by the seed sequence at
57 its 5 end (Lewis et al., 2005; Lewis et al., 2003; Lim et al., 2005; Stark et al., 2003) which facilitates binding to the complementary sites primarily within 3UTRs) of messenger RNAs (mRNAs) in the RNA induced silencing complex (RISC). Based on the c omplementary and other sequence properties (Bartel, 2009; Grimson et al., 2007) miRNA targets can be predicted using bioinformatic algorithms (Alexiou et al., 2009; Mazire and Enright, 2007) Individual miRNAs may target large numbers of mRNAs due to the short length of the seed sequence (68 bases). In mammalian cells, it has been estimated that greater than half of all protein coding genes contain miRNA target sites (Friedman et al., 2009) However, to identify the regulatory output of a miRNA in living cells, it is necessary to determine transcriptome and/or proteome changes after perturbing miRNA expression. Th e regulatory network of a miRNA consists of the direct targets as well as the secondary effects mediated by the indirect targets. Change in a direct target that is positioned at the higher level of the network hierarchy will result in changes in downstream genes (Figure 21). Dissecting the downstream phenotypic effects requires elucidating the functions of the individual genes in the context of the networks in which they operate. To accomplish such a task, I employ a systems biology approach which includes three key components: perturbation, measurements, and integration (Ideker et al., 2001) Data integration can provide im portant insight in reconstructing networks with sparse and noisy expression data (e.g. Ihmels et al., 2002; Li and Yang, 2004) By comprehensively gathering information and to integrate these data, the regulatory network of a miRNA will be connected between different layers of regulators. Besides, while one miRNA potentially can regulate many mRNAs, only a fraction of


58 these targets are functional since both the miRNA expression levels as well as the transcriptome vary between cell types and even cells within a population (Graur et al., 2013; Mukherji et al., 2011) Examining the target genes in the context of cellular networks can separate driver targets that have phenotypic consequences from nonfunctional ones. In this study, I applied this integrative approach to study the cell type specific regulatory network of one KSHV miRNA, miR K12 11 (Figure 22). MiR K12 11 shares the seed sequence of the cellular onco miR miR155 (Gottwein et al., 2007; Skalsky et al., 2007) which is asso ciated with immune activation (Moffett and Novina, 2007; Rodriguez et al., 2007; Thai et al., 2007) and implicated in tumorigenesis (Costinean et al., 2006; Eis et al., 2005; Kluiver et al., 2005; O'Connell et al., 2008; van den Berg et al., 2003) To ask if and how miR K12 11 contributes to KSHV pathogenesis, I ectopically expressed miR K12 11 and miR 155 using retroviral transduction, identified differentially express ed genes (DEGs) by microarray analysis, and integrated the DEGs with transcription networks, signaling pathways, and protein/protein interaction data. To explore the cell type specific pathogenesis of KSHV, I compared the regulatory effects of miR K12 11 i n BJAB (lymphoid origin) and TIVE (endothelial origin) cells, and found that the two cell types shared only a small number of direct miR K12 11 targets. Nevertheless, common pathways such as carbohydrate metabolisms and cytokine signaling were affected in both cell types, although different components of these pathways were targeted. Various transcription factors were repressed, and amplified the modest miRNA dependent negative regulation to their downstream genes. By influencing key elements in the gene regulatory networks, such as transcription factors


59 and signaling proteins, miR K12 11 opposes the host defense and contributes to the proliferation and survival of KSHV infected cells. 2.2 Results and Discussion 2.2.1 Ectopic Expression of miR K12 11 and mi R 155 Caused Transcriptome Changes in BJAB and TIVE Cells MiR K 1211 and miR 155 were expressed from a bi cistronic mRNA together with GFP. After retroviral transduction, BJAB and TIVE cells stably express GFP, the indicator of infection (Figure 23). Qu antitative PCR confirmed ectopic miRNA expression (Figure 24). There was some endogenous expression of miR 155 in both cell types. However, miR 155 expression was significantly higher in transduced BJAB cells compared to mock. In contrast, the miR 155 transduced TIVE cells did not significantly increase miR 155 levels over endogenous expression. MiR K12 11 was successfully over expressed compared to the mock control in both BJAB and TIVE cells. MiRNA copy numbers of ectopic miR K12 11 was lower than in the KSHV infected B cell line BCBL 1, indicating that it did not express at superphysiological levels (Figure 2 4). Quality control analysis of the microarray indicated all slides successfully hybridized. There was good agreement among biological replicates (Pearson correlation>0.9, Spearman correlation >0.9, weighted kappa >0.7). The array surveyed 13,793 genes, 3,189 of which were identified as differentially expressed genes (DEGs) in response to miR K12 11 in TIVE cells, 1,215 DEGs in BJAB cells, and 141 D EGs as regulated by miR 155 in BJAB cells (Table 21). Consistent with previous reports that the repression mediated by miRNAs is small (Baek et al., 2008; Guo et al., 2010) : 91% of DEGs of miR K12 11 expressing TIVE cells had less than a 50% change in the RNA levels (Figure 25). The effect was even more


60 moderate in BJ AB cells with 97% having less than a 50% change in RNA levels. Genes commonly affected between different cell types were few, indicating that the regulatory effect is highly tissue specific (Figure 25). The much fewer number of DEGs caused by miR155 is m ost likely due to the endogenous expression of miR 155, for which the targets genes were already under regulation before the ectopic expression. For four genes mRNA repression levels were verified by qPCR. Expression of AGTRAP (angiotensin), APOBEC3G (cont rols RNA processing), SAMHD1 (regulates TNF proinflammatory responses) and SOCS1 (cytokine suppressor) validated the array data (Figure 2 6). Direct targets of miRNAs are expected to be repressed through sequence complementarity. By comparing the compu tational target prediction with the down regulated DEG, I observed that direct targeting of the ectopic miRNAs only explained a small portion of the DEGs. The association between downregulated genes and predicted targets was weak. In addition, the number of up regulated genes was about the same as the number downregulated. Hence, most of the observed changes must be attributed to indirect regulation mediated by downstream events, such as changes in transcriptional regulation, signaling pathways, and phys ical associations. Enrichment analysis of gene ontology (GO) confirmed this hypothesis, as sequence specific transcription factors and protein binding were among the top molecular functions. 2.2.2 Effect of miR K12 11 is Amplified by Direct Targeting of T ranscription Factors List enrichment test confirmed that miR K12 11 preferentially targets Transcription factors (TFs) in both cell types (Fishers exact test p<0.05). TFs regulate the expression of multiple genes and can amplify the effect of the initial miRNA targeting


61 event (Figure 21). I identified the following TFs as putative direct targets of miR K12 11 in TIVE cells: CEBP E2F1, PAX6, RELA (NF algorithmically predicted, and showed strong repression (fc>1.4). They are also key regulators involved in cancer, strongly suggesting a role for miR K1211 in KS tumorigenesis. Particularly, CEBP is a previously confirmed target for both miR 155 and miR K12 11 in B cells and in the context of human hematopoiesis (Boss et al., 2011; Yin et al., 2010) In BJAB cells, several TF genes were identified as putative direct targets by prediction and decreased expression. It was not clear which TFs were the primary targets that mediated downstream regulation in BJAB cells, for the fold changes of expression levels were modest for all TFs (fc<1.2). However, motif analysis using RSAT and TOMTOM identified E2F, SP1 and KLF as enriched motifs within the promoter sequences of downregulated genes, strongly suggesting these TFs as putative direct targets in BJAB cells. From the set of upregulated genes, motifs matching the FOXA TFs were enriched, though FOXA genes themselves did not show significant changes of expression levels. The regulation might occur at the post translational modification that changed TF activity, such as phosphorylation or acetylation and therefore not detectable at the transcript level. In contrast, for CEBP E2F1, PAX6, RELA (NF identified based on both expression changes and the presence of TF binding sites (TFBS) within these promoters. As TF binding is predominantly cell type specific (the ENCODE consortium 2012), I increased the tissue specificity of TF target prediction by excluding genes that were absent in the cellular context due to spatial and temporal specificity of gene


62 expression. Those that were not on the array, or were absent in the mock transduced cells (i.e. absent call on the array) were removed from the analysis. I did not find published ChIP data for these TFs in the corresponding cell types for a list of in vivo TFBS. However, the ENCODE project contained a DNase seq data from primar y endothelial cells (HUVEC) and B cells (GM12878 line) which allows identification of active chromatin regions. Genes that did not show DNase hypersensitivity and therefore were not transcribed were excluded from our TFBS analysis list. Using these criteria, a strong correlation was observed between downregulation of TFs and their cognate targets, many of which are not direct targets of miR K12 11 (Table 22 ; Figure 28 ). E2F1 was directly target e d by miR K12 11 in both BJAB and TIVE. Down regulation of E2F1 affects the cell cycle transition from G1 to S and potentially prevents apoptosis (BioCarta database). The repression of this master regulator of cell cycle, miR K12 11 may help the establishment of latency. E2F1 possibly affected 240 genes, which contained TFBS of E2F1 and which showed decreased expression levels. Since more than 66% of these 240 genes did not contain seed sequence matches their downregulation was likely due to miR K12 11 targeting of E2F1. Many of these indirect miR K12 11 targets are known to be regulated by E2F1, including other E2F family members (Muller et al., 2001) the DNA replicase POLA2 (Polager et al., 2002) cell cycle control genes such as CCN E1 (Ishi da et al., 2001; Stanelle et al., 2002; Young et al., 2003) MCM5 (Ren et al., 2002) MYBL2 (Ishida et al., 2001) and RAD51 (Weinmann et al., 2002) Similar indirect connections between miRK12 11 and DEGs were identified as mediated by CEBP PAX6, RELA and STAT1, connecting hundreds o f downregulated genes to the downstream effect of miR -


63 K12 11 targeting of TFs (Table 22 ; Figure 28 ). This analysis revealed how a relatively small miRNA inhibitory effect on TFs can convey significant gene expression changes to a large number of downstream genes. 2.2.3 MiR K12 11 Targeted Different Components of Biological Pathways in BJAB and TIVE Cells Although the overlap between DEGs in TIVE and BJAB cells of DEGs was limited, common pathways were targeted in both cell types, including innate imm une response, cytokinemediated signaling pathways, regulation of cell cycle, and proliferation. Specifically, cancer cells increased glucose uptake to meet the energy needs of tumor progression, by switch ing from glycolysis to the more efficient oxidative phosphorylation (Warburg, 1956; Warburg et al., 1924) It has been reported that that KSHV infection of endothelial cells strongly induced the Warburg effect during latency (Delgado et al., 2010) When utilizing HITS CLIP to comprehensively analyze KSHV miRNA targets in KSHV infected B cells, glycolysis was identified as the top enriched biological process (Haecker et al., 2012) Consistently with these studies, I found that response to glucose stimulus was enriched for downregulated genes by miR K12 11 in BJAB, and that carbohydrate metabolism was enriched for downregulated genes by miRK12 11 in TIVE. The IFN signaling pathway was affected in both BJAB and TIVE cells by miR K12 11, although differ ent components of the cascade were targeted (Supplemental Table 2). In TIVE, both SOCS1 (fold change>1.4) and the transcription activators STATs (STAT1 and STAT2 fc>2, STAT3>1.4) were downregulated. SOCS1 is a known target of miR 155 in breast cancer (Jiang et al., 2010) The cytokine receptor IFNGR1 (fc>1.2) was targeted in BJAB, and IFNGR1 is a confirmed target of miR 155 in CD4+ T


64 cells (Banerjee et al., 2010) Mapping miR K12 11 induced DEGs in both celltypes to the core JAK STAT pathway in KEGG (Aoki Kinoshita and Kanehisa, 2007) revealed differential targeting of the JAK STAT pathway between BJAB and TIVE cells (Figure 2 7) 2.2.4 MiR K12 11 Targets Multiple Components of Interferon Signaling Pathways in Endothelial C ells Of the TFs I investigated, RELA and STAT1 are key components of NF JAK STAT signaling, which are two main cytokinesignaling pathways with the potential for cross talk (Shuai and Liu, 2003) The NF regulation, and the activation of several downstream genes that have antiviral functions requires the cooperation of STATs and NF (Li and Verma, 2002) JAK STAT pathways are a paradigm of cell s ignaling used by many cytokines and growth factors. Both signaling pathways are associated with the interferon response, which was also identified within DEG by functional enrichment analysis. I looked into the details of the interferon response and find t hat multiple key components were directly or indirectly targeted by miR K12 11 in TIVE cells. Consistent decrease of expression levels were observed for STAT1, STAT2, STAT3, and their transcriptio nal targets (Figure 29 ). Liang et al. (2011) has identified IKK as a miR K12 11 target in lung cancer cells. Though IKK level was unchanged in this experiment, its downstream effectors IRF and NF K12 11 attenuates IFN signaling by downregulating multiple possible components, IKK in lung cancer cells, IFNGR1 in B cells, and STAT1 in endothelial cells (Figure 27 ; Figure 29 ), in order to evade the host immune response. A number of well characterized interferon stimulated genes (ISGs) such as ISG15, USP18 and the OAS gene family all exhibited significant down-


65 regulation by miR K12 11, strongly supporting that miR K12 11 inhibits interferonresponses in endothelial cells. The expression of IRF1, IRF7 and IRF9 decreased, likely due to reduced STATs levels since none of these IRFs contain seed sequence mat ches. As important transcription factors, reduced IRFs can activate or repress a variety of downstream genes. RELA may be subject to the negative regulation of IRFs, and it is also a putative direct target of miR K12 11. The observed downregulation of RELA suggested that the latter mechanism overrode the first, reflecting the complexity of these regulatory networks. A similar function has been reported for miR 155, which by attenuating NF activity, contributes to stabilization of EBV latency ( Lu et al., 2008) Another antagonizing regulation was that both SOCS1 and STAT1 were downregulated, even though decreased SOCS1 might increase STAT1 levels by lessening the repressive effect on JAKs. Such incongruity was the natural result of interplay of multiple components. To increasing the complexity of signaling regulation, miR K12 11 also targets key regulators of additional signaling pathways that are important for proliferation, apoptosis, and immune evasion and that likely cross talk with the interferon pathways. These are PTEN and AKT1S1 of the AKT signaling pathway, SKI and SMAD4 of the TGF MYD88 pathway which regulates host defense. Feedback and crosstalk among these signaling proteins interconnected these p athways to work synergistically.
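The attribution reasoning used in Sections 2.2.2 and 2.2.4 can be written as simple set operations. A downregulated gene carrying a seed match is treated as a candidate direct target. A downregulated gene with a binding site for a repressed TF but no seed match is treated as a candidate TF-mediated, indirect target. Examples are E2F1-responsive genes without seed matches and IRFs downstream of the STATs. The sketch below is a hypothetical illustration of that logic. The function and field names are invented here, and the gene sets are toy placeholders rather than data from this study. In this framing, the "more than 66%" figure for E2F1 corresponds to `fraction_via_tf`.

```python
from dataclasses import dataclass


@dataclass
class Attribution:
    direct: set             # downregulated genes that carry a seed match
    via_tf: set             # downregulated genes with a TFBS for the TF but no seed match
    fraction_via_tf: float  # share of TFBS-containing downregulated genes lacking seed matches


def attribute_downregulation(down, seed_matched, tfbs_genes):
    """Split downregulated genes into candidate direct and candidate TF-mediated targets."""
    down, seed_matched, tfbs_genes = set(down), set(seed_matched), set(tfbs_genes)
    tf_responsive = down & tfbs_genes        # went down and carry the TF's binding site
    via_tf = tf_responsive - seed_matched    # no seed match: plausibly indirect, via the TF
    direct = down & seed_matched             # seed match: candidate direct miRNA target
    fraction = len(via_tf) / len(tf_responsive) if tf_responsive else 0.0
    return Attribution(direct=direct, via_tf=via_tf, fraction_via_tf=fraction)


if __name__ == "__main__":
    # Hypothetical toy gene sets, not results from this study.
    result = attribute_downregulation(
        down={"G1", "G2", "G3", "G4", "G5"},
        seed_matched={"G3", "G4"},
        tfbs_genes={"G1", "G2", "G3"},
    )
    print(sorted(result.via_tf), round(result.fraction_via_tf, 3))
```

Genes with both a seed match and a TFBS (G3 here) are counted as direct. This is a conservative choice that avoids crediting the TF for changes the miRNA could explain on its own.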


2.2.5 Incorporating Physical Interactions into TF-Target Pairs Extends the Regulatory Networks

Biological functions depend on physical interactions among individual proteins and nucleic acids. Combining DNA-protein interaction information with a protein-protein interaction (PPI) map revealed further evidence for a cascading regulatory effect: a PPI pair can transmit an expression change in one protein repressed by the miRNA to its interacting partner (Figure 2-1). The human protein interactome assembled from IntAct (Kerrien et al., 2007) and BioGrid (Breitkreutz et al., 2008; Stark et al., 2006) contains 173,609 interacting pairs involving 11,494 genes. The connectivity and the number of neighbors followed a power-law distribution (Figure 2-10). The secondary interactome expanded from this primary interactome consisted of 19,893,302 interacting gene pairs.

A particular TF can co-occupy promoters with different sets of TFs to form functionally distinct regulatory complexes in a cell type-specific manner. Such complexes, or regulatory modules, might be a common mechanism for pleiotropically expressed TFs such as E2Fs and STATs. STAT1 and E2F1 may bind together to regulate gene transcription concomitantly, because their binding sites frequently overlap (Kiuchi et al., 1999). In addition to its cognate DNA-binding motif, differential protein-protein interactions, nuclear/cytoplasmic sequestration, and co-occupancy of promoters determine the regulatory specificity of each TF. Previous genome-wide binding studies of E2F family members have suggested that protein-protein interactions may be the main mechanism by which E2F proteins are recruited to specific promoters (Cao et al., 2011). By integrating the expression data with the human PPI network, I identified enriched modules of such co-occupancy. Nodes corresponding to non-expressed genes were removed from the PPI network. The neighbors of E2F1 were enriched for genes downregulated by miR-K12-11, indicating that this subnetwork was targeted.

2.3 Conclusions

Dramatically changed miRNA concentrations may create competition for AGO complexes and thereby affect the target genes of other miRNAs (Jeyapalan et al., 2011; Poliseno et al., 2010; Sumazin et al., 2011; Tay et al., 2011). Unlike experiments that overexpress miRNAs in unnatural settings, our retrovirally transduced expression of miR-K12-11 was lower than that observed in PEL cells (Figure 2-4) and therefore should not trigger artificial effects such as IFN responses or apoptosis. Fewer than 10% of downregulated genes were repressed to half or less of control levels. Our results are consistent with the hypothesis that in somatic cells miRNAs are fine-tuners of gene expression.

As evidence has accumulated for the tissue specificity of gene expression and regulation, it has become imperative to identify miRNA targets in the relevant cell types. The phenotypes of KSHV infection in B cells and endothelial cells diverge drastically, causing PEL and KS, respectively. Appreciating the divergent regulatory control underlying the two tissues helps to elucidate the pathogenesis of KSHV. This is the first target identification of KSHV miRNAs in TIVE cells, which are the only available cell culture system for studying the effects of KSHV on host endothelial cells. We observed a stronger response to ectopic expression of miR-K12-11 in TIVE than in BJAB cells. The transcriptome changes caused by miR-K12-11 in the two cell types were almost completely different at the level of individual genes, with few DEGs shared between TIVE and BJAB. However, GO analysis identified similar enriched terms, such as protein binding, sequence-specific transcription factor binding, and cytokine-mediated signaling pathways. Cellular use of energy is an important aspect of cancer progression.


MiR-K12-11 affected response to glucose stimulus in BJAB cells, whereas in TIVE cells it affected the broader process of carbohydrate metabolism.

By exploring beyond the direct biochemical miRNA:mRNA target pairs, we gained insight into the regulatory cascade triggered by miR-K12-11. In both BJAB and TIVE cells, direct targets are relatively underrepresented in the list of DEGs compared with the broader indirect effects. Direct targeting of TFs and other signaling genes suffices to significantly influence the function of a whole pathway. Early studies pointed out the importance of the 3' UTR for the mRNA stability of sequence-specific TFs (Kabnick and Housman, 1988; Yeilding et al., 1996). Short TF lifetimes are essential for accurate regulation of transcription, and this may be implemented in part by miRNA regulation. In this study, we observed a preference for targeting TFs, including CEBPβ, PAX6, RELA, and STAT1 in TIVE cells, FOXA, KLF, and SP1 in BJAB cells, and E2F1 in both. Decreases in TF levels amplify the effect of miR-K12-11 to many more downstream genes.

Signaling pathways have been suggested as ideal targets of miRNA regulation, as small changes in the expression levels of upstream signaling genes significantly affect the signal transduction cascade (Inui et al., 2010). Some individual miRNAs are able to target several components of a single signaling pathway (Kennell et al., 2008; Leucht et al., 2008; Ricarte-Filho et al., 2009; Uhlmann et al., 2012). Consistent with this, we identified the IFN signaling pathway as such a case for miR-K12-11. In both BJAB and TIVE cells, the IFN pathway was affected, albeit through the targeting of different components (Figure 2-7). Interferons are potent cytokines that mediate antiviral defense, and modulation of interferon pathways is required to suppress the innate immune response and establish successful latent infection. Previous studies also suggested that miR-K12-11 could attenuate IFN signaling and help KSHV evade the host immune response by targeting IKKε (Liang et al., 2011). The ability of miR-K12-11 to regulate IFN-related genes thus seems to be universal across cell types but conveyed by targeting different genes. In addition to miR-K12-11, KSHV expresses homologs of cellular IRFs (vIRFs), which prevent the association of cellular IRFs with their coactivators (Joo et al., 2007; Lin et al., 2001). The inhibition of cellular IRFs by miR-K12-11 and by vIRFs may be mutually reinforcing through a feedforward loop. While we cannot estimate the relative contribution of miR-K12-11 versus vIRF signaling, expressing a miRNA offers the added advantage of not eliciting cellular or humoral host immune responses. Other KSHV gene products, such as v-cyclin and vILs, are also cytokine signaling genes that can block the activity of their host homologs (Damania, 2004). Taken together, these findings indicate that KSHV is able to manipulate the cell cycle and apoptosis, evade the immune response, and promote the proliferation and survival of infected cells.

2.4 Methods

2.4.1 Experimental Design

The experimental design allows comparison of miR-155-transduced cells, miR-K12-11-transduced cells, and mock-transduced cells. The experiment was conducted in four successive time periods, such that all experimental conditions were independently repeated.

2.4.2 Vector System

The foamy virus vector plasmid pCEGFPL was constructed as described previously (Boss et al., 2011). In this vector, the gag, pol, and env genes are replaced by a miRNA gene, placed downstream of a minimal human cytomegalovirus (CMV) immediate-early promoter at the transcription start site in the 5' LTR, and by a GFP reporter gene. The replication ability of the viral vector can be restored by co-transfection with the packaging plasmid pCI env3.5 (provided by the Mergia lab). Recombinant vectors expressing miR-155 or miR-K12-11, and an empty vector without insert as the control, were produced by transient co-transfection using Mirus transfection reagent according to the manufacturer's instructions. The supernatant collected after transfection was filtered through a Durapore filter, concentrated by centrifugation at 1100 rpm for 30 min, resuspended in culture medium, and stored at 4 °C for short-term and at −80 °C for long-term storage. Vector yields were titrated on fresh 293T cells plated at a density of 4.0 × 10^4 cells per well in 24-well plates. Seventy-two hours after infection, GFP-expressing cells were enumerated by fluorescence microscopy.

2.4.3 Cell Culture

BJAB is a human B cell line derived from Burkitt's lymphoma. It is non-infective and Epstein-Barr virus negative. BJAB cells were grown in suspension culture in complete RPMI medium with 10% fetal bovine serum (FBS) and 5% penicillin-streptomycin. Cells were split when confluent, as indicated by the medium turning from red to yellow. Telomerase-immortalized human umbilical vein endothelial (TIVE) cells were developed specifically to study the effects of KSHV latent infection in endothelial cells. TIVE cells are adherent and were grown in Medium 199 supplemented with 20% FBS and 60 μg/mL endothelial cell growth factor (ECGF). TIVE cells require medium changes twice a week and should be split weekly when they reach 60% confluency.
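The titration described in Section 2.4.2 and the two multiplicities of infection used in Section 2.4.4 are linked by a commonly used estimate. The titer in transducing units (TU) per mL is approximately the number of cells at transduction, times the GFP-positive fraction, times any dilution factor, divided by the inoculum volume. The sketch below applies that generic formula. It is not a record of the calculation actually performed here: the GFP-positive fraction, inoculum volume, and target cell number in the example are hypothetical, and only the 4.0 × 10^4 cells per well comes from the text.

```python
def functional_titer(cells_at_transduction, fraction_gfp_positive, inoculum_ml, dilution_factor=1.0):
    """Estimated transducing units (TU) per mL from the GFP-positive fraction.

    Assumes each GFP-positive cell received one vector particle, which is only
    reasonable when the positive fraction is low (a common guideline is below
    roughly 20 to 30%).
    """
    if not 0.0 < fraction_gfp_positive < 1.0:
        raise ValueError("fraction_gfp_positive must be between 0 and 1")
    if inoculum_ml <= 0:
        raise ValueError("inoculum_ml must be positive")
    return cells_at_transduction * fraction_gfp_positive * dilution_factor / inoculum_ml


def volume_for_moi(moi, target_cells, titer_tu_per_ml):
    """Volume of vector stock (mL) that delivers `moi` TU per target cell."""
    return moi * target_cells / titer_tu_per_ml


if __name__ == "__main__":
    # Hypothetical readout: 4.0e4 cells per well, 25% GFP-positive, 1 uL (0.001 mL) of stock.
    titer = functional_titer(4.0e4, 0.25, 0.001)
    for moi in (1, 10):  # the two MOIs used for transduction
        print(moi, volume_for_moi(moi, 1e6, titer))
```

Counting cells at the time of transduction rather than at readout is deliberate. The cells divide during the 72-hour incubation, but only the cells present when the vector was added could be transduced.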


71 2. 4 .4 Transduction and Validation TIVE and BJAB c ells were retr ovirally transduced at two levels of Multiplicity of Infection (MOI): 1 and 10. 72hr post transduction, positive cells were sorted according to their GFP signal. Cells were aliquoted in 1 million cells per tube and frozen down in liquid nitrogen. Empty vec tors without miRNA expression cassett es were used for m ock transduction to control for the impact of retroviral integration on the cellular activities. The aim of the freezing is to synchronize the growth status of the cells across samples, and to reduce noise to microarray profiling. Later, cells were removed f rom liquid nit rogen and grown for a same amount of time. RNA was extracted using the RNA Bee reagent according to the manufacturers instructions. The RNA was quantified usi ng Nanodrop. The quality of RNA was confirmed by NanoDrop spectrometer and agarose gel electroph oresis The integrity of total RNA was assessed with Agilent Bioanalyzer. The ectopic expression of miRNAs was examined using TaqMan qPCR. Expression levels of miR155 and miR K12 11 were normalized by RNU66 levels. The MOI did not result in differences i n observed induction levels of the miRNAs. Therefore, I treated all samples as biological replicates of a common MOI 2. 4 .5 Microarray Analysis For each HG 133 plus 2.0 chip, 200ng RNA was used as the starting material. aRNA was synthesized and labeled us ing GeneChip 3 IVT Express Kit and chips were hybridized according to manufacturer instructions (Affymetrix). Raw data (cell intensity files, CEL) were summarized using Affymetrix Expression Console software (v1.1). Chips were examined for successful hybridization by ensuring that the marginal distribution of all slides was similar. Samples were compared for the global effect of miRNA treatment at a population level using principal component analysis (Johnson and


72 Wichern, 1992) Probe sets were flagged as absent if they were absent according to Affymetrix probe detection algorithm (Affymetrix Statistical Algorithms Description Document. http://media.affymetrix.com/support/technical/whitepapers/ sadd_whitepaper.pdf) in more than half of the samples. The following model was fit yij i ij where yij is the difference of the log2 signals for each probe set between the miRNA transduced and control vector for the ith signal differences between miRNA transduced samples and their corresponding control samples were used as this paired design reflects the experimental design. The test of i =0 is a direct test of the miRNA condition. F tests for each of the miRNA conditions (miR 155 in BJAB, miR K12 11 in BJAB, miR K12 11 in TIVE) were conducted. An FDR of 0.05 was used to determine statistical significance for the probe set (Benjamini and Hochberg, 1995) The probe sets were annotated by comparing the genome positions of human genes and of probe set hits. A gene was considered differentially expressed (DEG) when at least one probe set was significant. The change in expression levels was the difference in the mean of all probe sets between treatment and control. DEGs were examined for potential functional groups by enrichment analysis ( (Mootha et al., 2003) Enriched Gene Ontology terms (Subramanian et al., 2005) of the DEGs and known biological pathways were compared using Fishers exact test. 2. 4 .6 Identification of Direct miRNA Targets Our analysis and previous reports (Alexiou et al., 2009; Ritchie et al., 2009) found the lack of concordance across the miRNA target prediction of different algorithms. This is the result of using different training set of target genes when the


73 algorithms were developed. I created a comprehensive list of putative targets of miR 155/miR K12 11 by using the union of target prediction from multiple algorithms: EMBLEBI mirBase (Betel et al., 2008) TargetScan (Friedman et al., 2009) PITA (Kertesz et al., 2007) DIANA (Kiriakidou et al., 2004) miRDB (Wang, 2008) RNA22 (Miranda et al., 2006) mirWalk (Dweep et al., 2011) mirZ (Gaidatzis et al., 2007) and PicTar (Krek et al., 2005) In addition, I used SylArray (Bartonicek and Enright, 2010) to identify enrichment of miRNA seed sequence matches. 2. 4 .7 Identification of Transcription Factor Regulation A list of human transcriptional factor (TF) genes was obtained from the JASPAR database (Sandelin et al., 2004) and a TF census study (Vaquerizas et al., 2009) DEGs on this list as well on the miRNA target list were were examined in detail for expression changes and biological implications, as they were the primary targets of the miRNA. I used MAPPER (Marinescu et al., 2005; Riva, 2012) which uses binding site information from TRANSFAC and JASPAR databases derived Hidden Markov Models, to detect putative transcription factor binding sites (TFBS). Genes containing TF BS within the upstream 2kb region of transcription start sites were identified as genes that might be under TF regulation. For DEGs with the same direction of expression change, enriched motifs in their promoter regions were identified using RSAT oligo analysis (Thomas Chollier et al., 2011) The motifs were compared to the binding motifs of TFs using the TOMTOM program of the MEME suite (Bailey et al., 2009) Motifs identified from up and downregulated set of DEGs were compared, and unique motifs for each set were identified. Additional evidence for TF regulation was obtained from literature se arch and the Transcriptional Regulatory Element Database (TRED) (Jiang et al., 2007) ChIP seq


74 (measuring DNA protein interaction) and DNase seq (measuring DNA accessibility to regulatory proteins) profiles of the ENCODE project (Neph et al., 2012) from corresponding cell types were used to constrain the TF regulated genes to be tissue specific. 2. 4 .8 Identification of Signaling Genes Human signaling pathway data was obtained from the National Cancer Institute Pathway Interaction Database (NCI PID) (Schaefer et al., 2009) which is a manually curated collection of biomolecular interactions and key cellular processes assembled into signaling pathways. NCI PID holds 128 pathways including 47 subnetworks. I combined all subnetworks with their parent networks to the set of signaling pathways. Pathways curated in the BioCarta database (http://www.biocarta.com/) were used for crossreferencing to reduce ambiguity. In addition, I kept all pathways that have more than one predicted microRNA target gene, leading to a final data set of 79 human signaling pathways containing 1573 unique human proteins. The database also provides information on subcellular location terms from the Gene Ontology Consortium. I extracted process type information for each biological process, which can be input, output, positive or negative regulator. In total, there are 1120 interactions of which 765 are activating, 74 inhibiting and 281 proteins acting as activators as well as inhibitors. 2. 4 .9 Identification of Functional Interaction I assembled a binary interactome enabling an overview of all physical interactions that can occur between human proteins. Gene association data were downloaded from GeneRIF (Gene References into Function) database at NCBI (Benson et al., 2005) and the IntAct database (Kerrien et al., 2007) at EBI on Febuary 28 2011. The interactions in GeneRIF are sourced from Bind (Bader et al., 2001; Bader and
Hogue, 2000), BioGRID (Breitkreutz et al., 2008; Stark et al., 2006), EcoCyc (Karp et al., 2000) and HPRD (Peri et al., 2004). The IntAct database includes interactions from literature curation at EBI as well as user submissions. Only human protein-protein interaction (PPI) data were retained. The formatted data contain a list of focal genes covering all available gene identifiers, the interacting genes for each focal gene, the detection method and the source of each interaction. Secondary interactions were derived as the interactions of the genes identified as interactors of the initial focal gene. The human PPI networks were plotted as undirected graphs, in which nodes are proteins and two nodes are connected by an edge if the corresponding proteins physically bind to each other. DEGs were mapped onto the interactomes to identify the interactants of the indirect targets. The expression levels of genes in the map were examined, and genes called absent were removed. Up- and downregulated DEGs were flagged so that they could be displayed in different colors. A focal gene together with its neighboring genes was defined as a subnetwork, and the percentage of DEGs in the subnetwork of each focal gene was calculated. If DEGs made up a larger share of the subnetwork than of the experiment as a whole, the focal gene was identified as an enriched regulator and its subnetwork was considered responsive. GO enrichment was also examined for the enriched regulators, to determine whether transcriptionally regulated subnetworks shared GO terms indicative of known or related biological functions. The subnetworks were viewed in Cytoscape (Cline et al., 2007; Shannon et al., 2003) to identify active biological pathways.
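The subnetwork rule just described (a focal gene plus its direct neighbors, flagged when its share of DEGs exceeds the experiment-wide share) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used in this study: the function names are hypothetical, the interactome is taken to be a plain list of undirected protein pairs, "the experiment as a whole" is assumed to mean the fraction of DEGs among all expressed genes, and no significance test is applied.

```python
from collections import defaultdict
from fractions import Fraction


def build_adjacency(edges):
    """Build undirected adjacency sets from (protein_a, protein_b) pairs."""
    adjacency = defaultdict(set)
    for a, b in edges:
        if a != b:  # self-interactions add no neighbors
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def enriched_regulators(edges, expressed, degs):
    """Return {focal gene: DEG share} for focal genes whose subnetwork
    (the focal gene plus its direct neighbors) has a larger share of DEGs
    than the set of all expressed genes.

    Hypothetical helper: genes not in `expressed` are treated as absent and
    dropped; exact fractions avoid floating-point ties.
    """
    background = Fraction(len(degs & expressed), len(expressed))
    enriched = {}
    for focal, neighbors in build_adjacency(edges).items():
        if focal not in expressed:
            continue
        subnetwork = ({focal} | neighbors) & expressed
        share = Fraction(len(subnetwork & degs), len(subnetwork))
        if share > background:
            enriched[focal] = share
    return enriched


if __name__ == "__main__":
    # Toy interactome with placeholder gene names (not data from this study).
    toy_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "F")]
    toy_expressed = {"A", "B", "C", "D", "E", "F"}
    toy_degs = {"B", "C"}
    result = enriched_regulators(toy_edges, toy_expressed, toy_degs)
    for gene, share in sorted(result.items()):
        print(gene, share)
```

With the toy data, A, B and C are flagged. D is not: its subnetwork {C, D, E} has a DEG share of 1/3, which only ties the background share of 2/6. A real analysis would normally replace the strict "greater than" comparison with a statistical test, such as a hypergeometric test.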


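Table 2-1. MiRNA-regulated genes in each treatment group (FC: fold change)

miRNA | Cell type | Direction | FDR < 0.05 | FDR < 0.05 and FC > 1.2 | FDR < 0.05 and FC > 2
miR-K12-11 | TIVE | Down | 1,607 | 1,332 | 151
miR-K12-11 | TIVE | Up | 1,582 | |
miR-K12-11 | BJAB | Down | 608 | 325 | 21
miR-K12-11 | BJAB | Up | 607 | |
miR-155 | BJAB | Down | 52 | 37 | 4
miR-155 | BJAB | Up | 89 | |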


Table 2-2. Transcription factors that contain seed sequence matches for miR-K12-11 and show a transcriptional response to its ectopic expression

Gene | Name | Activities | Expression fold change
CEBPB | CCAAT/enhancer binding protein (C/EBP), beta | Transcription factor involved in immune and inflammatory genes; binds the IL-1 response element in the IL6 gene; binds several acute-phase and cytokine genes | > 1.4
E2F1 | E2F transcription factor 1 | Cell cycle control; tumor suppressor; binds RB in a cell-cycle-dependent manner; mediates proliferation and apoptosis | > 1.4
PAX6 | paired box 6 | Homeobox transcription factor | > 2
RELA | v-rel reticuloendotheliosis viral oncogene homolog A | The most abundant form of NF-kappaB is NFKB1 complexed with the product of RELA | 1.4
STAT1 | signal transducer and activator of transcription 1 | Transcription activator in response to cytokines and growth factors | STAT1 (> 2), STAT2 (> 2), STAT3 (> 1.4)


Figure 2-1. MicroRNAs can affect GRNs directly and indirectly. The regulatory effects of a miRNA are not limited to direct targeting within RISC. Both direct and indirect targets are integral components of GRNs and should be included in functional analysis. When a miRNA is overexpressed, its direct targets should be downregulated. If a direct target is a repressor of downstream genes, those genes will be derepressed as a result of miRNA regulation, and their levels will go up. Conversely, genes downstream of activators will go down along with the direct targets. In addition, proteins that physically associate with the direct targets to function together in protein complexes will also be affected.


Figure 2-2. Experimental design and analysis pipeline. Orange boxes: secondary resources that were integrated with the experimental data. MiR-K12-11 and its human homolog were ectopically expressed in endothelial and B cells using retroviral transduction. Genes with significant changes were identified by comparing the microarray profiles of miRNA-expressing cells with those of mock-transduced cells. Downregulated genes with predicted miRNA binding sites were categorized as putative direct targets of miR-K12-11/miR-155. For direct targets that are also transcription factors, transcription factor binding sites (TFBS) were searched for in the promoter regions of other genes. For the resulting indirect targets, motif analysis of their sequences identified potential regulators. In addition, Gene Ontology and known protein-protein interactions helped to build the regulatory networks.
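The classification steps in this pipeline (pooling predictions across algorithms, calling downregulated predicted targets "direct", then asking which DEGs carry a site for a directly targeted TF within 2 kb upstream of their transcription start site) can be sketched in Python. This is a simplified illustration, not the study's code. Assumptions: all function names are hypothetical; `log_fc` holds a signed log fold change for DEGs only; every coordinate lies on a single chromosome; and each TFBS is reduced to a single position, as if already predicted by a tool such as MAPPER.

```python
from typing import Dict, List, Set, Tuple


def union_of_predictions(predictions: Dict[str, Set[str]]) -> Set[str]:
    """Pool predicted targets: a gene counts if any algorithm lists it."""
    pooled: Set[str] = set()
    for targets in predictions.values():
        pooled |= targets
    return pooled


def in_upstream_window(tss: int, strand: str, site: int, window: int = 2000) -> bool:
    """True if `site` lies within `window` bp upstream of the TSS on `strand`."""
    if strand == "+":
        return tss - window <= site < tss
    return tss < site <= tss + window  # minus strand: upstream means higher coordinates


def classify(
    log_fc: Dict[str, float],
    predictions: Dict[str, Set[str]],
    tf_genes: Set[str],
    tss: Dict[str, Tuple[int, str]],
    tfbs: Dict[str, List[int]],
):
    """Return (direct targets, directly targeted TFs, {DEG: TFs with an upstream site})."""
    predicted = union_of_predictions(predictions)
    # Putative direct targets: downregulated DEGs with a predicted miRNA site.
    direct = {g for g, fc in log_fc.items() if fc < 0 and g in predicted}
    direct_tfs = direct & tf_genes
    tf_regulated: Dict[str, Set[str]] = {}
    for tf in sorted(direct_tfs):
        for gene, (start, strand) in tss.items():
            if gene not in log_fc:
                continue  # only DEGs are considered
            if any(in_upstream_window(start, strand, s) for s in tfbs.get(tf, [])):
                tf_regulated.setdefault(gene, set()).add(tf)
    return direct, direct_tfs, tf_regulated
```

Genes in the third result can be upregulated as well as downregulated, matching the logic of Figure 2-1. Before interpretation, the study further filtered such predictions by expression and chromatin state (Figure 2-8).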


Figure 2-3. BJAB and TIVE cells stably express GFP after foamy virus transduction and purification by fluorescence-activated cell sorting. Left: BJAB. Right: TIVE.


Figure 2-4. Ectopic miR-K12-11 and miR-155 expression. Top: expression levels of miR-155 in BJAB cells. There was endogenous expression of miR-155, although the ectopic miRNA expression was higher. The expression levels were for. Multiplicity of infection (MOI, i.e. copies per cell) did not result in consistent changes in expression levels. The second batch was problematic and was excluded from further analysis. Y axis: relative quantity. Bottom: copy numbers of miR-K12-11 in transduced cells compared to the PEL cell line BCBL-1. The absolute numbers of miR-K12-11 in transduced cells were much lower than in BCBL-1, so the ectopic expression did not reach supraphysiological levels.


Figure 2-5. Overall effect on the transcriptome after ectopic miRNA expression. Top: miRNA effects are quantitatively moderate; the fold change in expression for most DEGs was below 2. Bottom: the overlap of downregulated genes was small.


Figure 2-6. Verification of microarray measurements by qPCR on four previously reported targets of miR-155/miR-K12-11.


Figure 2-7. Different components of the same IFN pathway were targeted in TIVE and BJAB cells. Green: unchanged; blue: upregulated; pink: downregulated; pink boxes with red text: downregulated genes that are potential direct targets. Top: in BJAB cells, the cytokine receptor may be directly targeted by miR-K12-11, leading to reduced levels of downstream factors. Bottom: in TIVE cells, the transcription factor STAT and the kinase AKT are directly targeted, amplifying the effect across a large set of genes.


Figure 2-8. TFBS prediction for TFs directly targeted by miR-K12-11 in TIVE cells. Top: the MAPPER2 program predicted thousands of genes with binding sites for each of the 5 TFs. Most of these predicted genes were not on the array, were not expressed in our context, were not affected according to our microarray measurements, or were located in inactive chromatin regions according to ENCODE data. Bottom: downregulated genes containing TFBS can be further divided into two groups. Red genes contain binding sites for both the TFs and the miRNA. Expression changes in green genes can be attributed only to transcriptional regulation by the TFs.


Figure 2-9. MiR-K12-11 attenuated the interferon pathways by downregulating multiple genes. Pink shapes: downregulated. Ellipses: transcription factors. Triangles: kinases. Circles: enzymes. Rectangles: cytokine suppressors. Arrows: positive regulation. Blunt arrows: negative regulation. Signaling effectors such as STAT, IRF and RELA were likely directly targeted by miR-K12-11. Because they are transcription factors, their decreased levels affected many genes under their transcriptional control, including BCL2 and the OAS and IFI families.


Figure 2-10. Connectivity of human protein-protein interactions. The degree distribution follows a power law: a few proteins have many neighbors, while most proteins are sparsely connected.
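The connectivity plotted in Figure 2-10 is a degree distribution: for each degree k, the number of proteins with exactly k interaction partners. The Python sketch below tabulates it from an undirected edge list. It is an illustration only, and the function name is hypothetical. Under a power law P(k) ∝ k^(−γ), these counts fall roughly on a straight line on log-log axes. The sketch does not estimate γ, which would call for a dedicated fitting procedure rather than a simple tabulation.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple


def degree_distribution(edges: Iterable[Tuple[str, str]]) -> Dict[int, int]:
    """Map degree k to the number of proteins with exactly k distinct partners.

    Edges are treated as undirected; duplicate pairs (e.g. the same interaction
    reported by two databases) and self-interactions are ignored.
    """
    partners: Dict[str, Set[str]] = {}
    for a, b in edges:
        if a == b:
            continue
        partners.setdefault(a, set()).add(b)
        partners.setdefault(b, set()).add(a)
    return dict(Counter(len(p) for p in partners.values()))


if __name__ == "__main__":
    # Toy hub-and-spoke network with placeholder names, including one duplicate pair.
    toy = [("H", "A"), ("H", "B"), ("H", "C"), ("H", "D"), ("A", "B"), ("A", "H")]
    print(sorted(degree_distribution(toy).items()))
```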


CHAPTER 3
PARTITIONING TRANSCRIPT VARIATION IN DROSOPHILA: ABUNDANCE, ISOFORMS AND ALLELES¹

3.1 Overview

Gene expression analysis has progressed from a primary focus on overall transcript level (Rifkin et al., 2003; Ross et al., 2000; Schena et al., 1995) to more sophisticated analyses, including those that examine the expression of different isoforms (Johnson et al., 2003; Kwan et al., 2008) or of individual alleles (Lo et al., 2003; Zhang et al., 2009). Commercial platforms exist for measuring 3′ expression or exon-level expression, but as yet there is no single cost-effective platform for measuring expression at multiple levels. This paper presents an array for Drosophila with three modules: 3′ expression, exon and SNP probes. In diploid organisms, expression from two, potentially different, copies of each gene contributes to transcript level and subsequent protein production. Unequal expression of these alleles is termed allelic imbalance (AI). AI is observed in model organisms and humans (Graze et al., 2009; Guo et al., 2004; Lo et al., 2003; Zhang and Borevitz, 2009). AI is a factor in predisposition to complex diseases (de la Chapelle, 2009; Meyer et al., 2008) and contributes to phenotypic variation in human populations (Johnson et al., 2005; Pickrell et al., 2010). For example, AI is associated with the risk of developing breast cancer (Meyer et al., 2008) and colorectal cancer (de la Chapelle, 2009).

¹ Reprinted from the published journal article: Yang Y, Graze RM, Walts BM, Lopez CM, Baker HV, Wayne ML, Nuzhdin SV, McIntyre LM. Partitioning transcript variation in Drosophila: abundance, isoforms, and alleles. G3 (Bethesda). 2011 Nov;1(6):427-36.


AI has a genetic basis (Pastinen et al., 2004; Serre et al., 2008; Verlaan et al., 2009; Wang et al., 2008a). Recent developments in the study of complex diseases have revealed regulatory polymorphisms contributing to the evolution of gene regulation (e.g., Emerson et al., 2010). Genome-wide associations of gene expression and phenotype identify the genetic basis of disease and other important phenotypic variation (Nica and Dermitzakis, 2008; Nica et al., 2010; Stranger et al., 2007). AI identifies causal cis-regulatory variants (Wittkopp et al., 2004). Allele-specific association studies extend these analyses and increase our knowledge of the regulatory process (Rockman and Kruglyak, 2006; Serre et al., 2008; Stamatoyannopoulos, 2004). Analysis of AI is an important next step in identifying the genetic basis of expression differences. AI has been assayed with pyrosequencing (Ahmadian et al., 2000; Wittkopp et al., 2004), targeted SNP-typing arrays (e.g., Serre et al., 2008), high-density array designs (e.g., Zhang and Borevitz, 2009), RNA-Seq-based methods (McManus et al., 2010; Pickrell et al., 2010; Zhang et al., 2009) and smaller-scale methods such as allele-specific qPCR (Szabó and Mann, 1995). This paper presents a custom array for measuring 3′ expression, exon expression (and thus alternative splicing) and AI. The array was designed for Drosophila on an Affymetrix platform (UFL Custom Dros_snpa520 726F Array Format: 497875, available for purchase from Affymetrix). A single platform is cost-effective, and a single hybridization simplifies the statistical analysis. We designed 60,118 D. simulans SNP probe sets from previously reported SNP variants (Begun et al., 2007; Benson et al., 2005). In total, these probe sets allow AI to be assessed for
11,929 genes (79% of the 15,107 genes in FlyBase R5.11), with most genes represented by multiple SNP probe sets. This SNP module is complemented by two additional modules: one that measures 3′ expression, and another that measures exon-level expression concurrently with allele-specific expression. The levels of sex bias (34% of 18,769 probe sets), alternative exon usage (164 genes) and AI (37% of 6,579 probe sets within a species) observed in our experiments are consistent with previous reports on other platforms (Fontanillas et al., 2010; McIntyre et al., 2006; Telonis-Scott et al., 2008; Wayne et al., 2007).

3.2 Results

3.2.1 Quality Control

Quality control evaluations showed that the three C167.4 parental DNA hybridizations had weaker overall signals, and their kernel density distributions were markedly different from those of all other chips. The detection above background (DABG) rate was only 70% for these hybridizations (Figure 3-4), compared to ~90% for the other DNA samples. The remaining chips showed no obvious hybridization problems. All modules (3′ expression, exon and SNP) had similar proportions of probes detected above the median of the GC band control signals (Table 3-1): ~72% for RNA samples and ~90% for DNA samples. The distribution of signal values across all modules was similar for all RNA hybridizations and differed from that of the DNA hybridizations (Figure 3-5). PCA identified no other hybridization anomalies. Both sexes had similar hybridization patterns. The normalized signal intensities of the exon module probe sets and expression module probe sets for the Acp and Yp genes gave the expected results (Figure 3-6). The average intensity for each gene was consistent between modules.
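The detection proportions above rest on a simple per-probe rule: a probe counts as detected when its intensity exceeds the median intensity of the GC band controls that have the same number of G/C bases (see Signal Quantification in the Methods). The Python sketch below computes the detected fraction for one array under that rule. It is a hypothetical illustration, not the pipeline used here. Assumptions: probes and controls arrive as (sequence, intensity) pairs, probes whose GC bin has no control are skipped, and the toy sequences are shorter than real 25-mer probes.

```python
from statistics import median
from typing import Dict, List, Tuple


def gc_count(sequence: str) -> int:
    """Number of G or C bases in a probe sequence."""
    return sum(base in "GCgc" for base in sequence)


def fraction_detected(
    probes: List[Tuple[str, float]], controls: List[Tuple[str, float]]
) -> float:
    """Fraction of probes whose intensity exceeds the median intensity of the
    GC bin controls with the same G/C count (probes without a bin are skipped)."""
    bins: Dict[int, List[float]] = {}
    for seq, intensity in controls:
        bins.setdefault(gc_count(seq), []).append(intensity)
    cutoffs = {gc: median(values) for gc, values in bins.items()}
    scored = [(gc_count(seq), v) for seq, v in probes if gc_count(seq) in cutoffs]
    if not scored:
        raise ValueError("no probe falls into a GC bin that has controls")
    detected = sum(v > cutoffs[gc] for gc, v in scored)
    return detected / len(scored)
```

For example, with controls at 10, 30 and 20 in the two-G/C bin (median 20), a two-G/C probe at 25 is detected and one at 15 is not.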


To test the performance of the smaller-feature 3′ expression probe sets, we analyzed the effect of sex on expression level. Previous studies in Drosophila (Bownes, 1994; Jin et al., 2001; McIntyre et al., 2006; Parisi et al., 2003; Ranz et al., 2003; Telonis-Scott et al., 2008) have all found sex bias in overall expression. Differential expression analysis of the 3′ expression module revealed a strong effect of sex on gene expression. Results at three significance levels (FDR < 0.05, FDR < 0.1 and FDR < 0.2) are reported (Table 3-2). Among all sex-biased probe sets at FDR < 0.1, 2,216 had higher expression in males and 4,140 had higher expression in females. Sex bias of expression was analyzed in the same way for exon probe sets corresponding to single exons present in all transcripts of a gene (constitutive exons; n = 47,122; Table 3-2). The results for these exon probe sets were compared with those for the 3′ expression probe sets of the same genes. There were 2,091 genes with probe sets for exons contained in all transcripts that could be matched unambiguously to a single probe set in the 3′ expression module. Most of these genes showed similar sex bias in the 3′ expression and exon modules, and simple agreement was high (1,720 genes, 82%). The kappa statistic between the two modules was 0.63, indicating good agreement. Discordant calls were asymmetric: 106 genes were detected by the exon module alone and 265 by the 3′ expression module alone (McNemar's test statistic = 68.1429, P < 0.0001). There were 164 genes that showed a significant interaction between exon type and sex (FDR 0.1); these were considered to show putative isoform-specific sex bias.

3.2.2 Verifying the SNP Module

The D. simulans sequences used to design the SNP probes were sequenced as part of the Drosophila Population Genomics Project (DPGP). The DPGP sequencing strategy was to sequence one line at ~4X
coverage and six additional lines (including C167.4) at 1X. The confidence of SNP calls therefore varied with sequence quality and depth. Of the 60,118 probe sets, 42,978 had SNP base information for C167.4 from DPGP, 35,379 had additional data from Illumina sequencing of C167.4 RNA, and 49,758 had additional data from Illumina sequencing of st e DNA. The concordance of C167.4 SNP base calls between the ~1X DPGP C167.4 genome sequence used for the chip design and the resequencing data was 66.72%, significantly greater than expected by chance. This rate did not affect the quality of the SNP probe sets, because C167.4 was only one of the seven DPGP lines used for our chip design. The concordance between the resequencing bases and the alleles used in the design was high: the C167.4 RNA-Seq base agreed with one of the two alleles identified in the design 99.58% of the time, and the st e DNA-Seq bases agreed with the two design alleles 99.75% of the time. Next, the concordance between resequencing and hybridization was examined by comparing resequencing SNP base calls to the probe bases ranked by the strength of their hybridization signals. The comparison was carried out separately for cases where the resequencing bases were the same for C167.4 and st e (homozygous F1) and where they differed (heterozygous F1). For n = 13,573 SNP probe sets homozygous at the SNP site in the C167.4/st e genotype, the ranked hybridization intensities of the probes corresponding to each base (within a given SNP probe set) were compared to the predicted genotype at the SNP base (Table 3-3). A SNP probe set was included in this comparison when the common SNP allele call for the C167.4 and st e strains corresponded to one of the alleles in the chip design, and the base was also the
same in the DPGP sequence for the C167.4 strain. For example, if the SNP allele in both strains is A (with respect to the forward strand), probes for targets with A at the SNP position are expected to show higher hybridization intensity than the T, G and C probes for all genotypes tested. The percentage of probe sets in which the probes for the target SNP allele show the highest intensity is reported overall and separately for each base (Table 3-3). The concordance between the resequencing SNP and the C167.4 DNA arrays was significantly lower than for the other arrays, likely because of the weaker signal intensity of the C167.4 DNA arrays. Similarly, for n = 2,769 SNP probe sets heterozygous at the SNP site in the C167.4/st e genotype, the ranked hybridization intensities of the probes corresponding to each base (within a given SNP probe set) were compared to the predicted genotype at the SNP base. The observed concordance is striking given the small number of genotypes used in this experiment. A previous study using arrays to detect sequence polymorphisms (Borevitz et al., 2003) reported a similar error rate, and larger experiments with more samples can reduce the error rate (Edenberg et al., 2005; Rabbee and Speed, 2006). In comparison, a short-read sequencing experiment requires more than 200 reads unambiguously mapped to a gene to achieve a similar result for that gene (Fontanillas et al., 2010). The hybridization signals were also compared to the two alleles (PM1 and PM2) used in the design. The concordance between the two highest-ranked probe bases and the PM1 and PM2 bases was high: 96.22% for C167.4 DNA chips and 98.77% for st e DNA chips. For F1 hybrids, the concordance was 98.62% for DNA chips and 97.53% for RNA chips. To determine whether the pattern of hybridization could be
used to predict genotype, we performed a linear discriminant analysis (Johnson and Wichern, 1992) for several arbitrarily selected probe sets heterozygous for the SNP base in the F1 hybrid. All of those examined showed visual separation of the expression patterns (Figure 3-7). These comparisons confirmed that the hybridizations performed as expected.

3.2.3 AI Analysis

There were 33,914 unambiguous probe sets with sequence information at the SNP site from both the st e and C167.4 resequencing experiments. The SNP base was the same in the two parental lines for 74.78% of these probe sets (n = 25,362), which were classified as homozygous for the F1 genotype. The remaining probe sets (n = 8,552) had heterozygous F1 genotypes. The 6,579 autosomal probe sets were analyzed for AI using the combined data from both sexes and for each sex separately (Table 3-4).

3.3 Conclusions

Our custom platform performs similarly to previous array platforms with a larger feature size (Jin et al., 2001; McIntyre et al., 2006; Ranz et al., 2003; Wayne et al., 2007). McIntyre et al. (2006) analyzed 10,014 transcripts in eight lines of D. melanogaster and identified 5,221 sex-biased transcripts at FDR 0.05 (56% male-biased and 44% female-biased). The overall sex effect for the eight genotypes was 53%. Similarly, in a study of nine D. melanogaster lines, Wayne et al. (2007) reported sexually dimorphic expression for 7,617 of 9,312 genes (4,070 male-biased and 3,547 female-biased). These previous studies used multiple genotypes of D. melanogaster. When the data of Wayne et al. (2007) were reanalyzed for each genotype separately, the percentage of sex-biased genes ranged from 1.29% to 40.09%. For the genotype considered in this study, 31% of genes showed a significant sex effect, close to the
upper end of that range. A slight excess of genes with increased expression in females was also observed in this analysis, as in previous analyses (Ranz et al., 2003). These findings are consistent with those from arrays with a larger feature size. Two previous array studies (McIntyre et al., 2006; Telonis-Scott et al., 2008) found significant sex differences in alternative exon usage for many genes. For the single genotype examined here, approximately 5.6% of the genes examined showed evidence of sex-specific isoform expression. For each of four sex determination pathway genes with previously reported sex-specific splicing in adults (tra2, Sxl, dsx and fru), at least one exon showed evidence of sex bias. This is the first genome-wide study of allele-specific expression variation within D. simulans. Although only one genotype was used, significant AI was detected for 37% of the probe sets examined that contained heterozygous SNP loci. Other work using a priori selected genes found that almost 67% of the genes tested showed evidence for AI within species (Wittkopp et al., 2008). This chip can therefore be used to detect allele-specific variation in expression within species. Large differences between males and females were observed in the number of probe sets with significant AI. It is currently unclear whether this reflects true differences in AI between the sexes or sex bias in overall expression that makes the power of detection uneven. As with microarrays, adequately assessing allele-specific expression (ASE) for a particular transcript with RNA-Seq requires adequate coverage of both alleles of that gene. For samples from the same organism and tissue, the detection rate for transcription of a particular exon is 57% with four gigabases of RNA-Seq data (Graze et
al., 2012), compared with 72% for tiling arrays (Graze et al., 2009). Initial studies of AI examined how many informative reads per gene were needed to estimate allelic frequencies (Fontanillas et al., 2010). That study suggests that the average depth of coverage needed is quite large if most genes are to be evaluated. The actual coverage needed depends on specific assumptions and varies, but it often exceeds 100X. Other examinations of RNA-Seq find that a minimum average depth of 5 reads per nucleotide is needed to estimate expression (McIntyre et al., 2011), and one lane of a GAIIx provides enough reads for 5X coverage of ~30% of the transcriptome (McIntyre et al., 2011). Arrays therefore still provide a cost-effective way of assessing transcription genome-wide (Malone and Oliver, 2011). Detailed within-species studies that examine AI variation genome-wide and identify the impact of sex on this variation are needed to understand the true extent of cis-regulatory variation within Drosophila species. This array is a good tool for such studies because it allows the overall and allele-specific components of expression variation to be examined in a single experiment, on a single platform, for many more genes than was previously possible.

3.4 Methods

3.4.1 Chip Design

The chip has 2,424,414 informative features, covering four types of probes: SNP probes (n = 1,442,832; 60,118 probe sets), 3′ expression probes (n = 262,766; 18,769 probe sets), exon probes (n = 699,865; 61,919 probe sets) and control probes (16,943 GC band controls and 2,008 hybridization and labeling controls; Figure 3-1). The 3′ expression probes consist of all perfect match (PM) probes from the Affymetrix GeneChip Drosophila Genome 2.0 array (900531, 900532 and 900533). The exon
probe sets measure the expression of each individual exon, which allows control for signal fluctuation caused by 5′ bias in expression assays as well as measurement of alternative exon usage. The exon probes consist of all Affymetrix Drosophila Tiling 2.0 Array (901021) probes that mapped uniquely to exonic regions (FlyBase R5.11, August 2008) at the time of chip design. Overlapping exons with alternative start/end sites in the same genomic region were combined into a single unique exonic region. Most exonic regions contain a single exon, and for simplicity exonic regions are referred to as exons throughout the text. Each exon corresponds to a unique probe set. The 3′ expression probes and exon probes on this custom chip were designed by Affymetrix from D. melanogaster sequences. These probe sets have been used for other Drosophila species (Dworkin and Jones, 2009; Graze et al., 2009; Kopp et al., 2008; Lu et al., 2010b), so using them allows direct comparison with the existing literature and straightforward quality control. Because each probe set has multiple probes, the impact of sequence divergence on summary measures of expression is likely to be minimal. However, investigators comparing among species should consider filtering individual probes.

The SNP module was designed for estimating AI. The design had three main steps: SNP identification, SNP quality assessment and probe selection.

SNP identification: alignment sets were created from multiple sequence sources, including FlyBase R5.4 exons (n = 68,536), six D. simulans strain genomes from the Drosophila Population Genomics Project (DPGP, http://www.dpgp.org; Begun et al., 2007) and all D. simulans sequences in GenBank (n = 343,420; Benson et al., 2005) that were not annotated as whole genome. In DPGP, D. simulans genomes, except for
the heterochromatic regions, were assembled against the FlyBase R4.2 D. melanogaster genome. Exons from FlyBase R5.4 were aligned to the DPGP genomes and GenBank sequences using BLAST (Altschul et al., 1990). There were 325 exons with only GenBank sequence, 62,161 with only DPGP sequence, 3,163 with both GenBank and DPGP sequence, and 2,887 for which no sequence was available. The genome location of each FlyBase R5.4 exon in the D. melanogaster R4.2 genome was determined by BLAST. Exons that matched more than one location and exons located on chromosome 4 or U were excluded (n = 1,912). All matching sequences for each exon were aligned using ClustalW (Thompson et al., 1994) to create a multiple sequence alignment at the exon's genome position. All SNPs, regardless of their location in the exon, were identified from the multiple sequence alignment.

SNP quality assessment: SNP quality was assessed on a design window consisting of the 17 bases upstream and the 17 bases downstream of each SNP (Figure 3-2). A SNP locus supported by fewer than five sequences was discarded. SNPs were also discarded when the design window mapped to multiple places in the genome or when more than one SNP occurred in the design window. These criteria identified 589,915 SNPs, of which 196,345 were biallelic. Only biallelic SNPs were considered further. SNP data were identified from GenBank alone for 558 exons, from DPGP alone for 51,418 exons and from both for 2,992 exons, for a total of 54,968 exons with SNPs, i.e. 81% coverage of the entire FlyBase R5.4 transcriptome (68,536 exons).

Probe selection: for each SNP, 24 probes were designed, with the SNP at position 0, +4 or −4 relative to the probe center, for the forward and reverse strands, and with
each possible base (A, C, G and T) at the SNP site. Probe hybridization quality was predicted by an Affymetrix internal scoring algorithm that takes into account Tm, secondary structure and previous empirical observations. A probe set was eliminated if any of its probes contained a homopolymer run or could not be synthesized, or if one third or more of its probes had poor predicted hybridization. For genes with seven or fewer SNPs, all SNPs were selected. For genes with more than seven SNPs, additional probe sets were selected at random (n = 610) to fill the chip. In sum, 60,118 custom SNP probe sets representing 11,929 genes (Figure 3-3) were included on the chip. The mean number of probe sets per gene was 4.4, and the majority of genes (8,013) had more than three probe sets. The chip library files are available at http://bioinformatics.ufl.edu/McIntyre_Lab/ASE. Probe sequences and chip annotation are available from the Gene Expression Omnibus (GEO) under accession ID GPL11273.

3.4.2 Verification Experiment

Experimental design: two isogenic strains of D. simulans, st e and C167.4, and the male and female progeny of their cross were used for the verification study. Three replicates of RNA from female and male progeny of the cross st e × C167.4 were assayed, for a total of six RNA samples. In addition, DNA was used as a control for estimating AI (Degner et al., 2009a; McManus et al., 2010; Wittkopp et al., 2008; Wittkopp et al., 2004). Three replicate gDNA samples were prepared for each of female st e, female C167.4 and female F1 progeny of the cross st e × C167.4, for a total of nine gDNA samples.

Sample collection: flies were reared in incubators (25 °C, 12:12 hour light/dark cycle) on a standard dextrose medium. Isogenic strains of D. simulans (C167.4, BDSC
4736; st e isogenic, DSSC 14021-0251.041, inbred > 20 generations) were used. For each of the three cross genotypes (C167.4, st e and C167.4 × st e), 20 virgin females were crossed to five males. Female and male progeny were collected on consecutive days (under CO2) and aged 5 to 7.5 days in single-sex vials. Flies were then flash frozen in liquid nitrogen (without anesthesia) within a 2.5-hour window (4:00 to 6:30 pm). For RNA samples, two sets of 20 flies (subsamples) were collected for each replicate from multiple cross vials. No vial was used for more than one replicate.

Sample processing: flies were freeze-dried at 20 overnight prior to homogenization. Dried flies were ground to a fine powder using a GenoGrinder (maximum setting, three minutes, repeated twice). Trizol (1 ml) was added to each homogenized sample and mixed thoroughly in the GenoGrinder (maximum setting, 3 minutes). Samples were transferred to a new tube, 1 µl linear acrylamide was added to each, and samples were incubated at room temperature for 5 minutes. RNA was extracted using a standard Trizol protocol: phase separation with 0.1 volume BCP, RNA precipitation with isopropanol, a 70% ethanol wash and resuspension in 80 µl DEPC H2O. Concentration was measured using a NanoDrop, and up to 30 µg RNA per sample was treated with DNase I in 100 µl reaction volumes for 30 minutes at 37 °C (reaction mix: 4 U Cloned DNase I, TaKaRa 2220A; 80 U Promega Recombinant RNasin, N2515; in 1X TaKaRa Cloned DNase I Buffer II). Samples were cleaned prior to concentration using the Qiagen RNeasy Mini Kit (Cat. # 74104) following the manufacturer's standard protocol, with 30 µl DEPC H2O elutions (run through the column twice). RNA quality was examined using BioAnalyzer RNA 6000 Nano chips, and all samples were of good quality. Genomic DNA was isolated from 35 to 40 flash-frozen females using the
AllPrep Mini Kit (Qiagen) following the manufacturer's standard protocols. Samples were concentrated by standard ethanol precipitation and resuspended in 31 µl DEPC H2O.

Fragmentation, labeling and array hybridization: target materials were prepared for array hybridization using the recommended Affymetrix kits, following the no-amplification protocol of the GeneChip WT Double-Stranded Target Assay Manual for single Tiling Arrays (DNA samples started from Procedure D onward). Briefly, 10 µg of total RNA was concentrated to 8 µl in DEPC H2O, followed by first- and second-strand cDNA synthesis using the WT Double-Stranded DNA Synthesis Kit (P/N 900813). Following the GeneChip Sample Cleanup Manual (P/N 900371), 7.5 µg of dsDNA was fragmented. For each DNA sample, 7.5 µg of gDNA was fragmented to between 25 and 200 bp with 0.02 U/µg DNase I (Takara Cloned DNase I, 2 U/µl) in a 40 µl reaction containing 4 µl reaction buffer (10X reaction buffer: 100 mM Tris acetate, 100 mM magnesium acetate and 100 mM potassium acetate) and 0.8 µl BSA (10 mg/ml). Reactions were incubated for 16 minutes at 37 °C and heat-inactivated at 99 °C for 15 minutes, and fragment size was checked by agarose gel electrophoresis. Fragmented cDNA and gDNA targets were labeled using the WT Double-Stranded DNA Terminal Labeling Kit (P/N 900812). The prepared targets were hybridized using the Hybridization, Wash and Stain Kit (P/N 900720) following the manufacturer's protocol (FS450_0001) for the Fluidics Station 450. Arrays were scanned using an Affymetrix 7G scanner. The GEO accession number for the array data is GSE31750.

3.4.3 Signal Quantification

Signals were extracted from the scans using the apt-cel-extract program of the Affymetrix Power Tools suite (version 1.10.2). GC bin control probes provide an
estimate of non-specific hybridization (Affymetrix, 2005) and help to assess the overall quality of the hybridization. A GC bin control is a standard Affymetrix control based on the number of G/C bases (from 3 to 24) in the 25-mer probe. None of the GC bin control probes align to the D. melanogaster or D. simulans reference genomes. Individual probes were classified according to their GC content and matched to the corresponding GC bin controls. A probe was considered detected when its signal was higher than the median intensity of the corresponding GC bin controls. Detection above background (DABG) was calculated at the individual probe level, and the overall intensity of the array was also evaluated at the individual probe level. To correct for background noise and to normalize the probe signals, each probe was classified into a GC bin and the fifth-percentile signal for that GC bin was subtracted from the probe signal. $Y_i$, the signal for probe set $i$, is estimated as:

$$Y_i = \ln\left(\frac{\sum_{j}\left(X_{ij} - GC_j\right)}{N_i} + 100\right)$$

where $X_{ij}$ is the intensity for probe $j$ in probe set $i$, $GC_j$ is the average intensity of the control probes in the corresponding GC bin, and $N_i$ is the number of probes in probe set $i$. Chip performance was evaluated first for overall hybridization quality, then for each module on the chip: the 3' expression module, the exon module, and the SNP module.

3.4.4 General Quality Control

The distribution of the overall signal across all modules was compared using kernel density estimates for each slide separately (Silverman, 1986), with the goal of identifying any slide with an unusual distribution. Similar marginal distributions would be expected for samples of the same type. Principal Component Analysis (PCA) (Johnson and Wichern, 1992) was carried out to determine whether there was any pattern or grouping in the data.
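
The detection call and the probe-set summary defined in Section 3.4.3 can be sketched in a few lines of Python. This is a minimal illustration, not the Affymetrix Power Tools implementation: it assumes probe intensities, per-probe G/C counts, and GC-bin control intensities are already in memory, and the function names (`gc_count`, `is_detected`, `probe_set_signal`) are hypothetical. Following the formula as written, $GC_j$ is taken as the mean control intensity of probe $j$'s GC bin.

```python
from math import log
from statistics import mean, median


def gc_count(seq):
    """Number of G/C bases in a probe sequence (used to pick its GC bin)."""
    return sum(base in "GCgc" for base in seq)


def is_detected(intensity, gc, controls_by_gc):
    """DABG call: detected if the probe exceeds the median of its GC-bin controls."""
    return intensity > median(controls_by_gc[gc])


def probe_set_signal(intensities, gcs, controls_by_gc, offset=100.0):
    """Y_i = ln( sum_j (X_ij - GC_j) / N_i + offset ) for one probe set.

    GC_j is the mean control intensity of probe j's GC bin. The log is undefined
    if the background-corrected mean drops to -offset or below; no guard is added.
    """
    n = len(intensities)
    corrected = sum(x - mean(controls_by_gc[g]) for x, g in zip(intensities, gcs))
    return log(corrected / n + offset)


# Hypothetical control intensities keyed by G/C count, and one two-probe probe set
controls = {10: [50.0, 60.0, 70.0], 12: [80.0, 90.0, 100.0]}
print(is_detected(160.0, 10, controls))                      # True
print(probe_set_signal([160.0, 190.0], [10, 12], controls))  # ln(200), about 5.298
```

Keeping the detection call (median of controls) separate from the summary (mean of controls) mirrors the two distinct uses of the GC bin controls described in the text.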

To verify the veracity of probe set estimates of 3' expression, we compared the signal from probe sets for well-known sex-biased genes (Bownes, 1994; Wolfner, 1997): Yp (Yp1, Yp2, Yp3) and Acp (Acp29AB, Acp32CD, Acp36DE, Acp53C14a, Acp53C14b, Acp53C14c, Acp62F, Acp76A). Consistency of gene expression estimates across modules was also examined using Bland-Altman plots (Bland and Altman, 1986; Bland and Altman, 1988; Dudoit et al., 2002; McIntyre et al., 2011), in which the exon module and the SNP module were each plotted against the 3' expression module. The feature size of this array (5 micron) is smaller than that of the Affymetrix GeneChip Drosophila Genome 2.0 array (11 micron). Although the PM probes are identical on these two chips, feature size may have an impact on differential expression (Ammar et al., 2009; Dandy et al., 2007), which raises the concern of a potential loss of sensitivity. To evaluate the performance of the 3' expression and exon modules, we compared expression in RNA samples between the two sexes of the F1 progeny. Sex-biased expression is well described for Drosophila (Bownes, 1994; Jin et al., 2001; McIntyre et al., 2006; Parisi et al., 2003; Ranz et al., 2003; Telonis-Scott et al., 2008). The fixed effects model

$$Y_{ij} = \mu + s_i + \varepsilon_{ij} \qquad (3\text{-}1)$$

was fit for each probe set in the 3' expression and exon modules, where $Y_{ij}$ is the probe set signal described above for sample $j$ of sex $i$, $\mu$ is the overall mean, $s_i$ is the fixed effect of sex, and $\varepsilon_{ij}$ is the random error. The null hypothesis that males and females had equal expression levels was tested using an F test (Neter et al., 1990). All probes in a given probe set were used. As only one genotype is considered, any polymorphisms between the genotype used and the probe will be the same between the
two sexes of the same genotype. Results were corrected for multiple testing using FDR (Benjamini and Hochberg, 1995; Verhoeven et al., 2005). Where multiple probe sets matched the same gene, the exon probe sets and the 3' expression probe sets were examined for agreement in detecting sex bias using Kappa statistics (Fleiss, 1981) and McNemar's test (Johnson and Wichern, 1992).

There is a sex bias in isoform usage in Drosophila (Kwan et al., 2008; McIntyre et al., 2006; Telonis-Scott et al., 2008). The use of alternative transcript isoforms between the two sexes can be detected from the measurements taken by the exon module. A probe set represents either a constitutive exon of a gene (included in all annotated isoforms) or an alternative exon (included in a subset of known isoforms). Inferences can become ambiguous when probe set annotations correspond to exon regions located in overlapping regions of multiple gene models. Probe sets mapping to more than one gene, or to ambiguous regions of overlapping exons, were excluded from analysis (n = 2,611). The model

$$Y_{ij} = \mu + x_i + s_j + (xs)_{ij} + \varepsilon_{ij} \qquad (3\text{-}2)$$

was fit, where $x_i$ is the fixed effect of exon type, $s_j$ is the fixed effect of sex, and $(xs)_{ij}$ is their interaction. Probe sets from multiple constitutive exons were grouped as one exon type, while probe sets representing alternative exons were each considered a different exon type. The variance was estimated separately for each sex. The significance of the interaction was tested using an F test, followed by FDR correction.
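
For a single probe set, model (3-1) with two sexes is a one-way analysis of variance, and the FDR step is the Benjamini-Hochberg procedure. The standard-library sketch below, with hypothetical function names, computes the F statistic and BH-adjusted p-values. Converting F to a p-value needs the upper tail of the F(1, N - 2) distribution (for example `scipy.stats.f.sf`); that step is left out to keep the example dependency-free, so the p-values passed to the BH function here are illustrative inputs.

```python
from statistics import mean


def sex_f_statistic(female, male):
    """F statistic for Y_ij = mu + s_i + e_ij with two sexes (one-way ANOVA)."""
    groups = [female, male]
    all_y = [y for g in groups for y in g]
    grand = mean(all_y)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((y - mean(g)) ** 2 for g in groups for y in g)
    df_between = len(groups) - 1          # 1 for two sexes
    df_within = len(all_y) - len(groups)  # N - 2
    return (ss_between / df_between) / (ss_within / df_within)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda k: pvalues[k])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        k = order[rank - 1]
        running_min = min(running_min, pvalues[k] * m / rank)
        adjusted[k] = running_min
    return adjusted


print(sex_f_statistic([5, 6, 7], [1, 2, 3]))          # 24.0
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))   # about [0.04, 0.0533, 0.0533, 0.2]
```

A probe set would then be called sex-biased when its adjusted value falls below the chosen FDR level (0.05, 0.1 or 0.2 in Table 3-2).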

3.4.5 SNP Calling and Genotyping Accuracy

By design, there are DPGP/GenBank sequences for all 60,118 probe sets in the SNP set, from which biallelic SNPs were defined and used for the chip design. SNP alleles were verified using Illumina genome resequencing data for the C167.4 and st e strains (SRP005952) of D. simulans. D. simulans C167.4 sequence data were obtained from male head RNA libraries sequenced on multiple lanes with Illumina paired-end procedures and chemistry (Celniker et al., 2009; McIntyre et al., 2011). D. simulans st e sequences were from genomic DNA extracted from adult st e females. Average coverage was 30X. Reads were aligned to updated reference genomes (Graze et al., 2012) using Bowtie (Langmead et al., 2009) and LAST (Frith et al., 2010). Alignments were converted to pileup format using SAMtools (Li et al., 2009). SNP bases were identified from the pileup alignments and compared to the alleles identified from DPGP/GenBank. The bases identified from C167.4 resequencing were also compared to the DPGP genome sequences for the C167.4 strain.

3.4.6 Analysis of AI

To verify the chip's capacity to identify differences in AI, we selected a subset of SNP probe sets that were unambiguous for the two alleles in the design, confirmed by both the C167.4 and st e resequencing, and heterozygous in the F1. The model

$$Y_{ijk} = \mu + s_i + t_j + \varepsilon_{ijk} \qquad (3\text{-}3)$$

was fit for probe sets in this subset using the nine F1 arrays (six RNA, three DNA). $Y_{ijk}$ is the normalized signal value for the $i$th sex, $j$th treatment, and $k$th replicate. The treatment groups were defined by combinations of nucleic acid
(DNA/RNA) and allele (PM1, PM2, and MM), giving six treatment levels ($j = 1, \ldots, 6$), with $k$ indexing the replicates. AI was examined by testing the difference in hybridization intensity between PM1 and PM2 in the RNA, compared with the same difference in the DNA. An F test for this contrast was performed, and the results were corrected for multiple testing using FDR. The power to detect AI may differ between the sexes because of sex bias in gene expression. There may also be sex-specific differences in AI. Both phenomena would result in a difference in the detection of AI between the sexes. Unfortunately, the power of the test for an interaction was low. AI was therefore also analyzed for the female and male data separately, so that any differences between the sexes would be apparent in the results.
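
The AI test compares the PM1 - PM2 difference in RNA with the same difference in DNA. Written as a contrast over the six treatment levels $t_j$, the coefficients are +1/-1 for PM1/PM2 in RNA, -1/+1 for PM1/PM2 in DNA, and 0 for MM. The sketch below computes only the point estimate from treatment means; it assumes a balanced layout and ignores the sex term, whereas the actual F test would use the residual variance from fitting model (3-3). The names `LEVELS`, `CONTRAST` and `ai_contrast_estimate` are hypothetical.

```python
from statistics import mean

# Six treatment levels t_j: nucleic acid (DNA/RNA) x allele (PM1, PM2, MM)
LEVELS = [(acid, allele) for acid in ("DNA", "RNA") for allele in ("PM1", "PM2", "MM")]

# (PM1 - PM2) in RNA minus (PM1 - PM2) in DNA; MM probes do not enter the contrast
CONTRAST = {
    ("RNA", "PM1"): 1, ("RNA", "PM2"): -1, ("RNA", "MM"): 0,
    ("DNA", "PM1"): -1, ("DNA", "PM2"): 1, ("DNA", "MM"): 0,
}


def ai_contrast_estimate(signals):
    """Point estimate of the AI contrast for one probe set.

    signals maps each (acid, allele) level to a list of normalized replicate
    signals. Assumes a balanced layout and ignores the sex term.
    """
    if sum(CONTRAST.values()) != 0:
        raise ValueError("contrast coefficients must sum to zero")
    return sum(CONTRAST[level] * mean(signals[level]) for level in LEVELS)


example = {
    ("RNA", "PM1"): [8.0, 8.2], ("RNA", "PM2"): [6.0, 6.2], ("RNA", "MM"): [3.0],
    ("DNA", "PM1"): [7.0, 7.1], ("DNA", "PM2"): [7.0, 7.1], ("DNA", "MM"): [3.1],
}
print(ai_contrast_estimate(example))  # about 2.0: RNA favors PM1 beyond the DNA baseline
```

Using the DNA difference as the baseline removes allele-specific differences in probe affinity, since the F1 genomic DNA carries the two alleles in equal proportion.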

Table 3-1. Proportions of probes detected above the median GC band control. The average proportion of probes detected above background (DABG) by genotype, nucleic acid, and sex. An individual probe was detected when its signal was higher than the median intensity of the corresponding GC band control. Average DABG was calculated for each individual module as well as for the overall slide. The individual expression, exon, and SNP modules, as well as the entire slide, had similar proportions of probes detected above the median GC band control. The distribution of signal values across all modules was similar for all RNA hybridizations and differed from the DNA hybridizations. The three C167.4 parental DNA hybridizations had lower DABG than the other two genotypes of DNA samples.

Genotype      Nucleic acid  Sex     Overall DABG  Exon module DABG  Expression module DABG  SNP module DABG
C167.4        DNA           Female  0.714435      0.744564          0.750768                0.6968
st e          DNA           Female  0.898116      0.897156          0.919405                0.899683
C167.4/st e   DNA           Female  0.908025      0.903021          0.915376                0.914164
C167.4/st e   RNA           Female  0.722919      0.791827          0.761107                0.770503
C167.4/st e   RNA           Male    0.722919      0.812318          0.79766                 0.791968

Table 3-2. Tests for the effect of sex from the 3' expression and exon probe modules at multiple FDR levels. The probe sets in the 3' expression module and the exon module were tested for sex-biased expression. The number of significant probe sets and the percentage of significant probe sets among all probe sets within a module are shown. Results at different significance thresholds (FDR < 0.05, 0.1, and 0.2) all indicate a strong sex effect measured by both the expression module and the exon module.

Module                              FDR < 0.05       FDR < 0.1        FDR < 0.2
3' expression module (n = 18,769)   3,607 (19.22%)   6,356 (33.86%)   9,574 (51.01%)
Exon module (n = 47,122)            6,201 (13.16%)   15,241 (32.34%)  26,226 (55.66%)

Table 3-3. The rank of hybridization signal corresponds to the expectation based on sequence information (homozygous genotypes). Probe sets (n = 13,573) were considered where: 1) the SNP allele calls for the C167.4 and st e strains correspond to the PM1 and PM2 alleles in the chip design; 2) the C167.4 base is concordant between the resequencing data and the DPGP sequence for the C167.4 strain; and 3) the C167.4/st e genotype is homozygous for the SNP site. The signal from each base was estimated as the average of the probes representing that base. For each probe set the four bases were ranked according to signal, and the base with the greatest hybridization signal was compared to the known base. The percentage of probe sets for which the top-ranked base was the known allele is reported, considering all SNP bases and separately for A, C, G, and T alleles. The concordance between the resequencing SNP calls and the C167.4 DNA arrays was significantly lower than for the other arrays, likely due to the weaker signal intensity of the C167.4 DNA arrays.

             Hybridized arrays
SNP allele   C167.4 DNA   st e DNA   C167.4/st e DNA   C167.4/st e RNA
Overall      71.47%       91.54%     86.95%            90.09%
A            59.98%       84.82%     79.90%            83.11%
C            77.12%       94.35%     90.23%            93.15%
G            77.07%       95.26%     90.30%            93.30%
T            59.94%       84.90%     80.44%            84.23%
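
The ranking procedure behind Table 3-3 amounts to averaging the probes for each base, taking the top-ranked base, and counting agreement with the known base. A minimal sketch follows; the data layout (a dict from base to probe signals per probe set) and the function names are assumptions for illustration, and ties are broken by dictionary order, which the original analysis may have handled differently.

```python
from statistics import mean


def top_base(probe_signals):
    """Base whose probes have the highest mean signal (ties: first in dict order)."""
    return max(probe_signals, key=lambda base: mean(probe_signals[base]))


def concordance(probe_sets, known_bases):
    """Percentage of probe sets whose top-ranked base equals the known base."""
    hits = sum(top_base(ps) == known for ps, known in zip(probe_sets, known_bases))
    return 100.0 * hits / len(probe_sets)


def concordance_by_allele(probe_sets, known_bases):
    """Concordance split by the known allele, as in the A/C/G/T rows."""
    result = {}
    for allele in "ACGT":
        subset = [(ps, k) for ps, k in zip(probe_sets, known_bases) if k == allele]
        if subset:
            result[allele] = concordance([ps for ps, _ in subset], [k for _, k in subset])
    return result


sets = [
    {"A": [9.0, 9.4], "C": [2.0], "G": [2.1], "T": [3.0]},  # known A: top base A
    {"A": [5.0], "C": [2.0], "G": [8.0], "T": [1.0]},       # known C: top base G
]
print(concordance(sets, ["A", "C"]))            # 50.0
print(concordance_by_allele(sets, ["A", "C"]))  # {'A': 100.0, 'C': 0.0}
```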

Table 3-4. Allelic imbalance overall and separated by sex (n = 6,579). AI was tested for male and female samples separately and combined. The numbers of probe sets with a significant AI effect are reported at different significance cutoffs (FDR < 0.05, 0.1, and 0.2). A large proportion of probe sets had significant AI in male samples, in female samples, and when both sexes were analyzed together.

              FDR < 0.05   FDR < 0.1   FDR < 0.2
Both sexes    2,013        2,453       3,004
Male          1,657        2,028       2,497
Female        923          1,384       1,899

Figure 3-1. Probe design. A total of 2,424,414 probes were printed on the chip. They are of four types: SNP probes (n = 1,442,832; 60,118 probe sets), 3' expression probes (n = 262,766; 18,769 probe sets), exon probes (n = 699,865; 61,919 probe sets), and control probes (not shown). The 3' expression probes consist of all perfect match (PM) probes from the Affymetrix GeneChip Drosophila Genome 2.0 array. An example of a 3' expression probe set is shown in navy. The exon probes consist of all Affymetrix Drosophila Tiling 2.0 Array probes that map uniquely to exonic regions annotated in FlyBase R5.11 (August 2008). Exon probe sets within the example gene are shown in light blue. SNP probes are custom made. The SNP probes corresponding to a single SNP site base are shown in dark blue (matching the forward strand) and blue (matching the reverse strand).

Figure 3-2. SNP probe design windows. For each SNP site there are four sets of probes, one for each possible base at the SNP site. The SNP base is placed at three different positions in the probe: in the middle, shifted four bases upstream, or shifted four bases downstream. Each SNP probe set contains 24 probes, which can be classified by allele as PM1 (n = 6), PM2 (n = 6), and MM (n = 12). A SNP probe set has a 35-base design window, comprising 17 bases upstream and 17 bases downstream of the SNP. Only biallelic SNPs that were unique in their design window and supported by five or more sequences in the multiple alignment were considered: a SNP supported by fewer than five sequences was discarded, and if more than one SNP occurred in the design window, the alignment was considered suspect and the SNP was not printed on the array.
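
The 24-probe layout in Figure 3-2 follows from 4 bases x 3 SNP positions x 2 strands, with the PM1 base giving 6 probes, the PM2 base 6, and the two remaining bases 12 MM probes. The sketch below enumerates such probes from a 35-base window. It is illustrative only: the sign convention for the four-base shift, the use of the reverse complement for the reverse-strand probe, and the function names are assumptions rather than the documented chip design pipeline.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def design_snp_probes(window, allele1, allele2, probe_len=25, shift=4):
    """Enumerate the 24 probes for one SNP from a 35-base design window.

    window: 17 upstream bases + SNP base + 17 downstream bases.
    Returns (sequence, snp_base, offset, strand, allele_class) tuples.
    The offset sign convention (-shift/+shift) is a hypothetical choice.
    """
    flank = (len(window) - 1) // 2         # 17 for a 35-base window
    center_start = flank - probe_len // 2  # start that puts the SNP mid-probe
    if center_start - shift < 0 or center_start + shift + probe_len > len(window):
        raise ValueError("design window too short for the requested shifts")
    probes = []
    for base in "ACGT":
        variant = window[:flank] + base + window[flank + 1:]
        if base == allele1:
            label = "PM1"
        elif base == allele2:
            label = "PM2"
        else:
            label = "MM"
        for offset in (-shift, 0, shift):
            start = center_start + offset
            forward = variant[start:start + probe_len]
            probes.append((forward, base, offset, "+", label))
            probes.append((reverse_complement(forward), base, offset, "-", label))
    return probes


window = "A" * 17 + "C" + "G" * 17  # hypothetical window for a C/T SNP
probes = design_snp_probes(window, allele1="C", allele2="T")
print(len(probes))                         # 24
print(sum(p[4] == "MM" for p in probes))   # 12
```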

Figure 3-3. Distribution of SNP probe sets per gene. 60,118 probe sets, representing 11,929 genes, were selected for the SNP module of the custom chip. The number of genes (y axis) with a given number of corresponding SNP probe sets (x axis) is shown. Most genes are represented on the array by one to five SNP probe sets.

Figure 3-4. The proportion of probes detected above background (DABG) is reported for all probe sets of each sample: C167.4 parental DNA, st e parental DNA, and DNA and RNA of the F1 genotype. DNA samples are shown in dark grey; RNA samples are shown in light grey. The y axis is the overall percentage of DABG. Probes were classified according to their GC content and matched to the GC band controls of the corresponding %GC bin. A probe was considered detected when its signal was higher than the median intensity of the corresponding GC band controls. The three C167.4 parental DNA hybridizations had lower DABG than the other two genotypes of DNA samples.

Figure 3-5. Box plot of probe intensity classified by genotype and nucleic acid. DNA samples are shown in dark grey; RNA samples are shown in light grey. The y axis is the normalized signal. Probes were classified according to their GC content and matched to the GC band controls of the corresponding %GC bin. The fifth-percentile signal for that GC bin was subtracted from the probe signal for background correction, and the corrected signals were then log transformed. The three C167.4 parental DNA hybridizations had overall weaker signals.

Figure 3-6. Expression of known sex-specific genes in female and male RNA samples. The y axis is the normalized signal. A value around or below three is close to background intensity and should therefore be considered not detected. Female samples are shown in red; male samples are shown in blue. A. The mean signal of all probe sets for each Acp gene. B. The expression of individual probe sets designed for Acp genes. C. The mean signal of all probe sets for each Yp gene. D. The expression of individual probe sets designed for Yp genes. The directions of sex bias are as expected (Acps are male-specific genes and Yps are female-specific genes). Individual probe sets for the same gene behave consistently.

Figure 3-7. Linear discriminant plots of three genotypes: AA, AC, and CC. Different genotypes had hybridization patterns that are visually separable by linear discriminant analysis. Each genotype is colored differently.

CHAPTER 4
LEVERAGING BIOLOGICAL REPLICATES TO IMPROVE ANALYSIS IN CHIP-SEQ EXPERIMENTS

4.1 Overview

The goal of chromatin immunoprecipitation (ChIP) experiments is to map the binding sites of a protein across the genome in a cell type or tissue (Orlando, 2000). In ChIP assays, cellular DNA-protein interactions are preserved by cross-linking with formaldehyde. The chromatin is sheared into small fragments by sonication, and the DNA-protein complexes of interest are recovered using specific antibodies, resulting in an enrichment of DNA fragments that were bound by the protein of interest. The cross-linking is then reversed, and the DNA fragments are released from the binding complex to be assayed. Usually a PCR amplification step is included to increase the amount of starting DNA. The first genome-wide ChIP studies used microarrays (ChIP-chip) to analyze the DNA fragments (Iyer et al., 2001; Ren et al., 2000), which can now be sequenced directly (ChIP-seq) using massively parallel sequencing (Johnson et al., 2007; Jothi et al., 2008; Robertson et al., 2007). After the sequenced reads are aligned to the reference genome, site-specific binding of transcription factors produces very narrow peaks at putative binding sites when visualized in a genome browser. Consistent with functions such as stabilizing the chromatin state, histones bind DNA over regions ranging from several nucleosomes to large domains covering multiple genes (Barski et al., 2007; Blahnik et al., 2011; Mikkelsen et al., 2007). These two distinct types of binding are termed point source and broad source, respectively. RNA polymerase II is an example of a mixed-source factor, which can form both highly localized and spreading peaks at different genome positions (Baugh et al., 2009; Rozowsky et al., 2009).

In addition to sequences truly associated with the protein of interest, random background noise is also present due to non-specific binding or biases in library construction and sequencing (Chen et al., 2012; Dohm et al., 2008; Kuan et al., 2011; Park, 2009; Vega et al., 2009). The use of control samples may mitigate these biases but cannot eliminate all sources of noise. The noisy nature of ChIP-seq data makes peak identification challenging. Consequently, there has been a concerted effort in algorithm development for peak finding, with over 30 different programs available (Wilbanks and Facciotti, 2010). Peak placement depends on the background in each independent experiment. Replication is necessary to separate actual biological events from variability resulting from random chance (Rozowsky et al., 2009; Tuteja et al., 2009). Technical replication measures a single biological sample repeatedly and allows estimation of the variability in the sequencing process. Biological replication measures multiple biological samples independently and enables inferences about variability during sample preparation, as well as about the biological activity of the broader population from which the samples are drawn. Biological replicates and their advantage over technical replicates have been well described in the context of gene expression studies such as microarrays (e.g., Chu et al., 2002; Churchill, 2002; Kerr, 2003; Yang and Speed, 2002) and mass spectrometry (Oberg and Vitek, 2009), and more recently in RNA-seq experiments (Bullard et al., 2010; McIntyre et al., 2011). For ChIP-seq experiments, with the ease of multiplexing and the plummeting cost of sequencing, increased sample sizes (i.e., numbers of replicates) are not only more affordable but are also becoming standard practice. For example, the ENCODE consortium requires a minimum of two biological replicates in ChIP experiments (Landt et al., 2012).

There is not yet unanimity on how to analyze replicate ChIP-seq samples. Pooling biological replicates is common in current ChIP-seq protocols. In some cases, multiple biological samples were pooled and then divided into aliquots before sequencing (Chen et al., 2012). Other investigators sequenced the biological replicates separately but pooled the sequencing data before proceeding to data analysis (Chen et al., 2011; Hutchins et al., 2013; Robertson et al., 2007; Tuteja et al., 2009). Pooling replicates is also integrated into the ENCODE framework (Consortium, 2011), where the replicates are first analyzed separately to determine the Irreproducible Discovery Rate (IDR) (Li et al., 2011) and then pooled for identification of the peaks passing the IDR threshold. IDR avoids statistical methods and streamlines data management for the ENCODE consortium. However, IDR has many limitations. For the bivariate model of IDR, the preliminary peaks have to contain both high-quality peaks and peaks that are most likely to be only noise, which is possible for only a few peak callers such as SPP (Kharchenko et al., 2008), PeakSeq (Rozowsky et al., 2009), and MACS (Zhang et al., 2008). Investigators, however, may prefer peak callers optimized for the binding factor of interest. The more stringent peak callers, such as CisGenome (Ji et al., 2008) and QUEST (Valouev et al., 2008), do not work with IDR. Moreover, IDR relies on the ranking of the preliminary peaks and does not handle ties in the ranks, although such ties are common in ChIP-seq peaks, especially for weaker signals. A true signal may be dropped by IDR when one replicate is of low quality, because IDR favors signals with consistent ranking over signals that rank high in one replicate but low in the other. In this scenario, weak signals with consistent ranking between replicates are considered
more credible than signals that were strong in one replicate but weak in the other (inconsistent ranking). In genomic experiments, independent processing of biological replicates is standard. Combined data may be unduly influenced by an outlier sample. Detection rates are also reduced, with binding sites that have smaller signal-to-noise ratios being especially affected. Detection is critical for investigators who want to obtain maximal information from their ChIP-seq experiments. Another severe limitation of analyzing a single combined sample is that it precludes downstream quantitative comparisons across samples. Recently, attention has been drawn to analyzing individual samples separately in ChIP-seq experiments (Blahnik et al., 2011; Blahnik et al., 2009; 2008; Zhu et al., 2010). Some groups have proposed focusing on the analysis of one replicate, using the additional samples for confirmation only (Revilla-i-Domingo et al., 2012). Overlapping peaks have been compared to study transcription factor occupancy (Fujiwara et al., 2009; Lu et al., 2012), to compare ChIP-seq techniques (Yu et al., 2009), and across cell cycle phases (Liu et al., 2010). Still, there is no consensus on how to leverage the information provided by biological replicates. In this study, we analyzed four ChIP-seq experiments with three or more replicates. Multiple methods for identifying consensus peaks using biological replicates were considered in order to minimize variability and maximize consistency. We confirm results from genomic studies and conclude that more than two biological replicates are essential for ChIP-seq experiments. We propose using a majority rule for
peak identification and show that this yields more reliable peaks than absolute concordance with fewer replicates.

4.2 Results

4.2.1 There is Variability Among Biological Replicates of ChIP-seq Experiments

For all of the experiments we examined, the sequencing depth and quality varied among replicates (Table 4-1). Sufficient numbers of total reads and uniquely mapped reads were necessary for binding site discovery. The RNAPII data met the rule of thumb for the minimum number of mapped reads per sample, which is 2 million for Drosophila and 10 million for mammalian genomes (Landt et al., 2012). Under this rule, the FOXA1 and NFKB experiments appeared to lack sequencing depth. The first replicate of the H3K4 data had far fewer reads than the other replicates. Consistent with their biological functions, the binding signals of RNAPII and H3K4me3 were associated with genic regions, with more prominent peaks near transcription start sites (TSSs). Clear and narrow peaks were found at the TSSs of known NFKB targets such as TP53 (Schumm et al., 2006; Wu and Lozano, 1994), NFKBIA (Haskill et al., 1991; Sun et al., 1993), NFKB1 (Ten et al., 1992), and SHH (Kasperczyk et al., 2009). The numbers of peaks independently identified also differed among replicates of the same experiment (Table 4-1). The differences between peak-calling programs were also evident. Using default settings, MACS2 (Zhang et al., 2008) identified more peaks in the RNAPII data, while CisGenome (Ji et al., 2008) identified more in the other datasets. CisGenome peaks were also wider, especially for the NFKB data. Multiple consecutive peaks identified by MACS2 in RNAPII were frequently identified as a single peak by CisGenome. The fraction of reads in peaks (FRIP) varied with the number of peaks identified (Table 4-1). MACS2 and
CisGenome differ both in their underlying statistical models (Poisson vs. negative binomial) and in the assumptions of their default settings. For example, redundant reads, which were plentiful for high PBC data such as H3K4 and FOXA1, must be removed manually for CisGenome but are removed automatically by MACS2. When this step was suppressed in MACS2 using the --keep-dup option, the number of peaks became comparable to that identified by CisGenome for H3K4 and FOXA1 (data not shown).

4.2.2 Proportion of Common and Unique Peaks Reflects the Reproducibility of Replicates

Without prohibitively costly independent validation experiments, the rates of false positive and false negative peaks cannot be accurately estimated. However, consistency among replicates provides a proxy for such an estimate, as the general assumption is that peaks identified in multiple samples, in approximately the same region, represent the same protein/DNA binding event. Despite discrepancies in the number of peaks identified by CisGenome and MACS2 in individual replicates, the numbers of common peaks were more comparable between the two programs (Table 4-2). The proportion of overlapping peaks between a pair of replicates reflects sample agreement, which was fair for the RNAPII and NFKB data. By contrast, CisGenome did not perform well on the H3K4me3 data, probably because CisGenome, unlike MACS2, was not optimized for histone signals (broad peaks). The FOXA1 data also had few reproducible peaks across replicates. Compared to the other datasets, the FOXA1 data appeared noisier in the genome browser, and we were not able to observe noticeable peaks near selected known FOXA1 target genes. The metric we proposed (proportion of overlapping peaks) and the existing metrics (sequencing and mapped
reads) all suggest high background noise in these data. The researchers in the original report combined the five replicates into one sample prior to analysis. We therefore decided not to analyze the FOXA1 dataset further. Generally, the number of peaks increases with the number of sequenced reads for both CisGenome and MACS2 (Table 4-1), consistent with previous studies (Rozowsky et al., 2009). McNemar's test (Johnson and Wichern, 1992) shows that the numbers of unique peaks are unbalanced for a given pair of replicates, with more peaks identified in the sample with greater sequencing depth. The second and third replicates of the H3K4me3 experiment did not follow this pattern, probably because of their high PBC values.

4.2.3 Quantitative Signal Strength Examines the Reproducibility More Accurately

Read coverage in peaks provides a quantitative measure of enrichment above background. We calculated reads per kilobase per million mapped reads (RPKM) (Mortazavi et al., 2008) in the consensus regions for common peaks (defined in Methods). Because differently defined consensus regions mostly varied in width (Figure 4-1), the choice of consensus region affected read coverage and, in turn, the estimate of sample agreement, though this effect was small (Figure 4-2). ASF consensus peaks had relatively lower agreement across replicates, indicating that ASF is not a good choice of consensus despite its use of biological knowledge of a protein's footprint size. Although factors bind short regions of DNA (typically 5-25 bp), the DNA fragments that are pulled down typically cover a wider region of 150-600 bp around the binding site (Park, 2009). Thus, the width of identified peak regions need not reflect the actual size of the biological binding site. We also examined the enrichment in the corresponding regions of peaks identified in
the replicate with the most reads. This is comparable to other ChIP-seq studies that arbitrarily selected one replicate as the reference sample (e.g., Revilla-i-Domingo et al., 2012). Unsurprisingly, such consensus peaks were heavily biased towards the sample that was selected as the standard. For RNAPII and NFKB, CisGenome peaks had higher agreement across replicates (Supplemental Figure 3: tighter BA plots, higher Kappa and Spearman's coefficients), indicating that it called fewer peaks of higher quality. Another contributing factor may be that, under the default settings, CisGenome peaks were wider and so included more reads covering broader regions. In the H3K4me3 data, MACS2 identified far fewer but higher-quality peaks than CisGenome. The first replicate of the H3K4me3 data was less correlated with the other replicates and is possibly an outlier, as suggested by its much lower read count. Despite the difference in the number of identified peaks, the RNAPII and NFKB replicates were highly correlated in terms of signal quantification (Figure 4-2). QC based on sequencing depth and peak-calling results might flag the third replicate of the NFKB experiment as failed; however, when measured quantitatively, it actually had very good agreement with the other samples.

4.2.4 Peaks Identified in the Majority of Replicates are Reliable

Due to the noisy nature of ChIP experiments and the limitations of peak-calling programs, peak identification varies across samples. Requiring support from all replicates for common peaks is likely to increase the false negative rate. Visual inspection using the genome browser found clear peaks at the TSSs of known NFKB targets such as TP53 (Schumm et al., 2006; Wu and Lozano, 1994), NFKBIA (Haskill et al., 1991; Sun et al., 1993), NFKB1 (Ten et al., 1992), and SHH (Kasperczyk et al.,
2009), although these peaks were not identified in all replicates by CisGenome or MACS2. In addition, there were distinct increases in signal near the TSSs of BRCA2 and PTEN, both of which are known targets of NFKB (Wu et al., 2000; Xia et al., 2007), that were not identified as peaks. The absence of peaks at these regions may be the result of insufficient coverage or excessive noise at these genome positions. As identified peaks alone do not accurately reflect the quantitative properties of ChIP-seq signals (Figure 4-2), it is overly conservative to require a peak to be identified in all replicates. Instead, we hypothesized that if a peak was identified in more than 50% of the replicates (i.e., two out of three, three out of five), there is sufficient support for its existence. More peaks were included as common under this majority rule (Table 4-2, Common in the majority). Detection above background (DABG) was used to determine whether the observed signal in the putative peak region was greater than expected due to noise. Peaks common in the majority were considered detected in the remaining replicates when the RPKM in the peak was greater than the lower quantile of coverage in all peaks for that sample (Z test, p < 0.05). For the RNAPII data, peaks identified in the majority of replicates had a high confirmation rate under this simple DABG, compared with peaks unique to one replicate, regardless of the peak caller used or the consensus definition (Figure 4-3). More than 92% of unique peaks in NFKB's first replicate were also significant in the other replicates. Possibly, many genuine signals were missed by the peak callers because of a lack of sequencing depth in certain areas. Consistent with the agreement described above, peaks identified in the third and fourth replicates of the NFKB data were less likely to be detected above background than those in the other replicates. When the peaks were identified only in the third and fourth

PAGE 127

replicates, 11% and 25% were significantly above background in the other replicates. These percentages increased to as much as 100% when the peaks were also identified in two additional replicates. Overall, additional replicates make it possible to infer peaks where information is missing in one of several replicates.

Spearman's correlation between pairs of replicates was expected to be high for peaks identified by the peak callers in all replicates. The correlation was only slightly lower when peaks identified in the majority of replicates were also included (Figure 4-4). However, when a peak needed to be identified in only one replicate, the correlation in enrichment among replicates dropped dramatically (Figure 4-4). This indicates that peaks identified in the majority of replicates were comparable to the common peaks, and that both were much more reliable than peaks identified in a single replicate.

4.2.5 Genomic Features Provide an Alternative to 7

The performance of the different methods for determining consensus peaks depended on the mode of protein binding, the data quality and the peak caller used. For the data we examined, the MAX, SMT and ASW consensus peaks yielded high estimates of consistency for point-source and mixed-source factors. The results were less conclusive for broad-source factors, because most peak callers are not designed for them. For well-annotated genomes, genomic features may serve as a reasonable alternative quantification unit. For example, because H3K4me3 marks are associated with TSSs, sample consistency can be inferred by inspecting the read coverage at TSSs. Even for factors whose functions are less well defined, regulation by many proteins is gene-centric, so the binding strength in nearby genic regions may provide a measure of biological activity.
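
Using genomic features as quantification units comes down to simple coordinate arithmetic. The sketch below derives the two windows defined in the Methods from a single transcript record: the promoter window (±2 kb around the TSS) and the genic window (from 2 kb upstream of the TSS to 2 kb downstream of the TTS). It is a minimal illustration, not the pipeline used in this study. The `Transcript` record, the 1-based inclusive coordinates, and the clipping at base 1 are all assumptions.

```python
from typing import NamedTuple

FLANK = 2000  # 2 kb flank used for both promoter and genic windows


class Transcript(NamedTuple):
    chrom: str
    start: int   # leftmost base, 1-based inclusive (assumed convention)
    end: int     # rightmost base, 1-based inclusive
    strand: str  # "+" or "-"


def tss(t: Transcript) -> int:
    """Transcription start site: the 5' end of the transcript."""
    return t.start if t.strand == "+" else t.end


def tts(t: Transcript) -> int:
    """Transcript termination site: the 3' end of the transcript."""
    return t.end if t.strand == "+" else t.start


def promoter_window(t: Transcript, flank: int = FLANK) -> tuple[str, int, int]:
    """Promoter = TSS +/- flank, clipped so it never starts before base 1."""
    site = tss(t)
    return (t.chrom, max(1, site - flank), site + flank)


def genic_window(t: Transcript, flank: int = FLANK) -> tuple[str, int, int]:
    """From `flank` upstream of the TSS to `flank` downstream of the TTS.

    Upstream and downstream follow the strand, so on either strand this is the
    transcript span padded by `flank` on both sides."""
    left = min(tss(t), tts(t)) - flank
    right = max(tss(t), tts(t)) + flank
    return (t.chrom, max(1, left), right)
```

Reads counted in each window can then be converted to RPKM and compared between replicates, just as for consensus peaks.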

We calculated the coverage in the regions surrounding TSSs for the H3K4me3 data and the coverage in transcripts for the RNAPII data. Enrichment in the regions surrounding TSSs agreed well between the second and third replicates of the H3K4me3 data (Figure 4-5). Consistent with the other measures, the first H3K4me3 replicate appears to be an outlier. Enrichment in transcripts agreed well across all replicates of the RNAPII data (Figure 4-5).

4.3 Discussion

Noise may be introduced at many steps of ChIP. Some of it stems from technical issues in immunoprecipitation (IP), library construction or sequencing, and some may be due to biological differences among individual samples. This noise makes peak identification from ChIP-seq data a challenging task. We analyzed four ChIP-seq data sets (three publicly available and one unpublished), each with three or more biological replicates. Common and unique peaks were separated, and the proportions of common and unique peaks were used as indicators of individual sample quality. Deeply sequenced experiments, such as the RNAPII data in this study, showed better concordance among replicates than those with lower read counts. Yet reproducible peaks could still be determined from studies with lower coverage. The boundaries of the consensus peaks, and the analyses based on them, are sensitive to the peaks identified in individual replicates. Nevertheless, quantification of the signals in the consensus regions was consistent. Despite their distinct results in peak identification, the two programs used in this study (CisGenome and MACS2) produced comparable quantitative measurements of consensus peaks and led to similar conclusions. Importantly, sample consistency was higher than the absolute concordance of peak identification across all replicates would suggest.
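
Two pairwise indicators of sample quality are defined in the Methods: the simple agreement coefficient and McNemar's test on unique peaks. Both can be sketched from three counts per replicate pair. This is an illustrative sketch with some assumptions. The function names are hypothetical, and a common peak is assumed to be counted once in the denominator. The continuity correction is optional because the text does not say whether one was used.

```python
import math


def simple_agreement(n_common: int, n_unique_a: int, n_unique_b: int) -> float:
    """Common peaks divided by all peaks found in the pair (a common peak counted once)."""
    total = n_common + n_unique_a + n_unique_b
    return n_common / total if total else float("nan")


def mcnemar(n_unique_a: int, n_unique_b: int, continuity: bool = False) -> tuple[float, float]:
    """McNemar's chi-square on the discordant counts (peaks unique to A vs. unique to B).

    Returns (statistic, p-value). The p-value uses the chi-square distribution
    with 1 degree of freedom, whose upper tail at x equals erfc(sqrt(x / 2))."""
    b, c = n_unique_a, n_unique_b
    if b + c == 0:
        return 0.0, 1.0
    diff = abs(b - c) - (1 if continuity else 0)
    stat = max(diff, 0) ** 2 / (b + c)
    return stat, math.erfc(math.sqrt(stat / 2))
```

A large statistic (small p-value) means that one replicate contributes many more unique peaks than the other. This asymmetry flags that replicate as a possible outlier.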

Using the binding sites concordant across three replicates as the benchmark set, a previous ChIP-seq study concluded that a third replicate did not increase site discovery (Rozowsky et al., 2009). However, the real binding sites are unknown in most ChIP studies. A strategy that requires a peak to be identified in all replicates is likely to exclude genuine binding sites. Failure to detect a peak in a particular sample may result from low coverage or high background at the peak position, combined with the uncertainty inherent in peak-calling algorithms. A practical approach to maximizing site discovery is to increase the number of replicates and apply the majority rule. We showed that peaks identified in the majority of replicates were likely to be enriched above background in the other replicates. When more than two replicates were examined, many peaks that would have been considered unique in a pair of replicates were confirmed in the additional replicate. Detection above background was more significant for peaks identified in most (>50%) replicates than for singleton peaks, suggesting that these peaks were more likely to be true positives. The majority rule may also apply to other IP-seq studies. Using high-throughput sequencing of RNA isolated by crosslinking immunoprecipitation (HITS-CLIP), twice as many microRNA binding sites were identified in two out of three replicates as in all three replicates (Haecker et al., 2012). Real target sites may not recur above background, as defined by a particular algorithm, in every replicate. Annotation-based approaches provide quantification that is independent of peak calling. They complement peak identification for proteins that bind at promoters or transcripts, and they can be used when peak calling is difficult. Notably, they cannot replace peak callers, as many binding sites would be
missed. Previous ChIP experiments have demonstrated that TFs, even transcriptional activators such as STAT1 (Robertson et al., 2007) and E2F1 (Bieda et al., 2006; Cao et al., 2011), can bind in previously unrecognized regions of the genome, though the function of such binding remains unclear. The decade-long debate on replication in microarray experiments (Allison et al., 2006), and more recently in RNA-seq experiments (Auer and Doerge, 2010), applies equally to ChIP-seq data. Increased replication is not only sensible from a statistical point of view. It also enables the identification of more reliable signals in noisy ChIP-seq data. The more variable the sample source, the more biological replicates are necessary. More replicates also guard against undercalling, as a given peak caller is unlikely to identify all real peaks in all replicates. When a peak is missing in one sample but present in other replicates, the signal in the sample where it is missing might still be estimated from the other replicates.

4.4 Methods

4.4.1 Data

We used four ChIP-seq data sets in this study. One is previously unpublished data generated in our lab. The raw data (FASTQ files) for the other three were downloaded from the Gene Expression Omnibus (GEO).
a) RNA polymerase II (RNAPII) ChIP-seq in Drosophila melanogaster with three replicates and one input DNA control (GEO accession: GSE36107).
b) NFKB transcription factor ChIP-seq (Kasowski et al., 2010) in the human lymphoblastoid cell line GM10847 (GEO accession: GSE19485). The cells were stimulated with
TNF. The data set comprises five biological replicates and two IgG control samples.
c) FOXA1 ChIP-seq in mouse liver with five biological replicates and three input control samples (Bochkis et al., 2012; Soccio et al., 2011) (GEO accessions: GSE25836 and GSE33666).
d) H3K4me3 ChIP-seq in Drosophila melanogaster with three biological replicates and three input control samples (unpublished).

4.4.2 Analysis

Sequencing reads were mapped to the reference genomes (FlyBase release 5.30 for Drosophila, mm9 for mouse and hg19 for human) using Bowtie (Langmead et al., 2009) with the options -m 10 --best --strata. Aligned reads were visualized in the Integrative Genomics Viewer (Broad Institute) (Robinson et al., 2011; Thorvaldsdóttir et al., 2013). This step checked the overall shape of the read distribution and the signal strength of the factor and the control at individual loci. Although this is not a quantitative check, visible peaks at known binding regions are expected in a successful ChIP-seq experiment. The PCR bottleneck coefficient (PBC), an approximate measure of library complexity, was calculated as the ratio of non-redundant uniquely mapped reads to all uniquely mapped reads. For peak identification, we used two of the most popular peak callers, MACS2 (Zhang et al., 2008) and CisGenome (Ji et al., 2008), which have been found to compare well with other peak callers (Chen et al., 2012; Li et al., 2011). Both programs were run with default settings, using the input DNA sample as the control. MACS2 reported peaks with FDR < 0.05, and CisGenome reported peaks with fold enrichment > 3. The fraction of reads in peaks (FRIP) (Ji et al., 2008) was calculated to estimate the global enrichment of signal over background.
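
Both library-level quality metrics above reduce to counting. The sketch below follows the definitions given in this section, with three assumptions. Each uniquely mapped read is summarized as a hypothetical `(chrom, position, strand)` tuple, and "non-redundant" means a distinct tuple. Peak coordinates are inclusive. Other published variants of PBC are defined differently, so values are only comparable when the same definition is used.

```python
from bisect import bisect_right
from collections import defaultdict

Read = tuple[str, int, str]  # hypothetical read summary: (chrom, 5' position, strand)
Peak = tuple[str, int, int]  # (chrom, start, end), inclusive coordinates


def pcr_bottleneck_coefficient(reads: list[Read]) -> float:
    """Non-redundant (distinct chrom/position/strand) reads over all uniquely mapped reads."""
    return len(set(reads)) / len(reads) if reads else float("nan")


def _merge(spans: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Sort and merge overlapping intervals so each read is counted at most once."""
    merged: list[list[int]] = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def fraction_of_reads_in_peaks(reads: list[Read], peaks: list[Peak]) -> float:
    """FRIP: share of reads whose position falls inside any called peak."""
    if not reads:
        return float("nan")
    by_chrom: dict[str, list[tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in peaks:
        by_chrom[chrom].append((start, end))
    merged = {chrom: _merge(spans) for chrom, spans in by_chrom.items()}
    starts = {chrom: [s for s, _ in spans] for chrom, spans in merged.items()}
    inside = 0
    for chrom, pos, _strand in reads:
        spans = merged.get(chrom)
        if not spans:
            continue
        i = bisect_right(starts[chrom], pos) - 1  # last peak starting at or before pos
        if i >= 0 and spans[i][1] >= pos:
            inside += 1
    return inside / len(reads)
```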

For peaks independently identified in multiple replicates, consensus regions were defined in four ways (Figure 4-1):
- MAX: the maximum area encompassing the identified peak regions.
- SMT: the area between the summits of overlapping peaks.
- ASF: an area the size of the known footprint of the specific binding protein, centered at the average summit.
- ASW: an area whose width is the empirically observed average peak width, again centered at the average summit.

Coverage in the consensus peaks was calculated as reads per kilobase per million mapped reads (RPKM) (Mortazavi et al., 2008). Consistency between pairs of replicates was assessed using weighted kappa coefficients (Fleiss, 1981) of ranked coverage (five groups) and Spearman's correlation. Bland-Altman plots, which plot the difference between two replicates against their mean, were also used (Bland and Altman, 1986; Bland and Altman, 1988). Unique and common peaks were identified by comparing the positions of individual peaks across replicates. Peaks in different replicates were considered overlapping if they shared at least one nucleotide. Peaks found in only a single replicate were considered unique, and peaks present in all replicates were considered common. Peaks present in a subset of replicates were used to examine the corresponding regions in the other replicates. The simple agreement coefficient was calculated as the number of common peaks divided by the number of all peaks identified in a pair of replicates. McNemar's test (Johnson and Wichern, 1992) was used to evaluate the symmetry of unique peaks between replicates, providing a further measure of agreement.
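
For one group of overlapping peaks, the four consensus definitions and the RPKM quantification can be sketched as follows. This is a minimal sketch with several assumptions. The `Peak` record and the function names are hypothetical, and boundaries are rounded to whole bases. For ASW, the average width is taken from the peaks in the group itself, whereas the study may have used an experiment-wide empirical width. The footprint size for ASF is protein-specific and must be supplied by the caller.

```python
from statistics import mean
from typing import NamedTuple


class Peak(NamedTuple):
    start: int
    end: int
    summit: int


def centered(center: float, width: float) -> tuple[int, int]:
    """Interval of roughly `width` bases centered on `center` (boundaries rounded)."""
    half = width / 2
    return round(center - half), round(center + half)


def consensus_regions(peaks: list[Peak], footprint: int) -> dict[str, tuple[int, int]]:
    """Four consensus definitions for one group of overlapping peaks on one chromosome."""
    avg_summit = mean(p.summit for p in peaks)
    avg_width = mean(p.end - p.start + 1 for p in peaks)  # assumption: group-level average
    return {
        "MAX": (min(p.start for p in peaks), max(p.end for p in peaks)),
        "SMT": (min(p.summit for p in peaks), max(p.summit for p in peaks)),
        "ASF": centered(avg_summit, footprint),  # protein footprint, supplied by the caller
        "ASW": centered(avg_summit, avg_width),
    }


def rpkm(reads_in_region: int, region_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of region per million mapped reads."""
    return reads_in_region * 1e9 / (region_length_bp * total_mapped_reads)
```

Because MAX takes the union of peak extents, it grows with every replicate added. ASF and ASW keep a fixed width around the average summit, which makes their RPKM values less sensitive to one replicate's unusually wide peak.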

A Z test was used to assess whether the enrichment in a peak region was significantly above background (detection above background, DABG), with x: RPKM in the peak region for a specific sample; coverage in all peaks; n: total number of consensus peaks for the experiment. The General Feature Format (GFF) file containing the genomic annotation of D. melanogaster was downloaded from ftp://ftp.flybase.net/genomes/Drosophila_melanogaster/dmel_r5.30_FB2010_07/. Promoters were defined as ±2 kb around the TSSs. Genic regions were defined as extending from 2 kb upstream of the TSS to 2 kb downstream of the transcript termination site (TTS). Agreement between the RPKM values of pairs of replicates was inspected using Bland-Altman plots for both promoters and genic regions.
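
Each Bland-Altman comparison plots, for every feature, the difference between two replicates against their mean. The sketch below computes those points plus two summary lines often drawn on such plots: the mean difference (bias) and the 95% limits of agreement (bias ± 1.96 SD of the differences). These summary lines are a standard part of the Bland-Altman method, but the text does not state that they were drawn here. The input values can be RPKM or a log transform of it, and that choice is left to the caller. The function name is hypothetical.

```python
from statistics import mean, stdev


def bland_altman(a: list[float], b: list[float]) -> dict[str, object]:
    """Points and summary lines for a Bland-Altman comparison of two replicates."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length lists with at least two features")
    diffs = [x - y for x, y in zip(a, b)]        # Y axis: difference between replicates
    means = [(x + y) / 2 for x, y in zip(a, b)]  # X axis: average of the two replicates
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return {"means": means, "diffs": diffs, "bias": bias,
            "limits": (bias - spread, bias + spread)}
```

A narrow band of points centered near zero across the whole range of means corresponds to the "narrow and symmetrical" pattern described for well-agreeing replicates.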

Table 4-1. Summary statistics for the four ChIP-seq experiments examined. The numbers of total reads and uniquely mapped reads reflect the sequencing depth and quality of a ChIP-seq experiment. The PCR bottleneck coefficient (PBC) is the ratio of non-redundant reads to uniquely mapped reads and serves as an approximate measure of library complexity. CisGenome and MACS2 identified different numbers of peaks under default settings. The different peak sets resulted in different fractions of reads in peaks (FRIP), a rough estimate of global enrichment.

Data set | Rep | Total reads | Uniquely mapped reads | PBC | Peaks (CisGenome) | FRIP CisGenome (%) | Peaks (MACS2) | FRIP MACS2 (%)
--- | --- | --- | --- | --- | --- | --- | --- | ---
RNAPII in Drosophila melanogaster | 1 | 4,426,333 | 3,264,564 | 0.0642 | 2,438 | 4.20 | 3,493 | 8.56
 | 2 | 6,871,525 | 5,323,280 | 0.1260 | 2,893 | 5.25 | 5,930 | 5.45
 | 3 | 5,017,458 | 3,710,467 | 0.0292 | 3,100 | 5.59 | 7,578 | 7.76
NFKB in Homo sapiens | 1 | 12,511,139 | 7,829,396 | 0.0181 | 1,203 | 1.18 | 165 | 0.06
 | 2 | 10,804,133 | 6,633,449 | 0.0150 | 1,073 | 0.81 | 122 | 0.05
 | 3 | 5,094,627 | 3,119,736 | 0.0198 | 409 | 0.48 | 1,614 | 0.90
 | 4 | 20,241,529 | 7,773,601 | 0.0133 | 699 | 0.37 | 1,090 | 0.39
 | 5 | 19,376,644 | 9,038,510 | 0.0216 | 1,304 | 1.92 | 4,612 | 2.26
FOXA1 in Mus musculus | 1 | 12,220,197 | 4,583,104 | 0.1809 | 15,669 | 2.31 | 41 | 0.51
 | 2 | 7,901,063 | 4,463,503 | 0.1271 | 8,721 | 1.15 | 59 | 0.13
 | 3 | 8,480,219 | 4,293,683 | 0.0577 | 2,131 | 0.88 | 17 | 0.68
 | 4 | 15,312,860 | 6,761,376 | 0.3907 | 78,533 | 7.86 | 579 | 0.15
 | 5 | 15,076,772 | 5,829,339 | 0.7683 | 318,662 | 51.61 | 1,047 | 0.54
H3K4me3 in Drosophila melanogaster | 1 | 3,155,567 | 1,502,023 | 0.7700 | 16,078 | 37.42 | 825 | 3.13
 | 2 | 45,168,009 | 25,813,997 | 0.7750 | 17,045 | 7.49 | 130 | 0.41
 | 3 | 17,460,280 | 9,833,701 | 0.7653 | 19,528 | 14.02 | 165 | 0.52

Table 4-2. Numbers of common peaks. Common in all replicates: a peak was counted when it has overlapping peaks in each of the replicates. Common in the majority: a peak was counted when it has overlapping peaks in more than 50% of the replicates (i.e., three out of five, two out of three, etc.).

Data set | Common in all replicates: CisGenome | Common in all replicates: MACS2 | Common in the majority: CisGenome | Common in the majority: MACS2
--- | --- | --- | --- | ---
RNAPII | 1,391 | 1,874 | 2,278 | 3,569
FOXA1 | 5 | 3 | 439 | 28
H3K4me3 | 160 | 53 | 3,288 | 154
NFKB | 113 | 62 | 432 | 781
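
The counts in Table 4-2 rest on two rules stated in the text. First, peaks overlap if they share at least one nucleotide. Second, a peak is common in the majority when more than half of the replicates contain an overlapping peak. A minimal sketch of that classification follows, with some assumptions. The `(chrom, start, end)` tuples use inclusive coordinates, and the labels and function names are hypothetical. The all-pairs scan is written for clarity rather than speed; genome-scale peak sets would call for a sorted-interval search. As in the table, a peak common in all replicates also satisfies the majority rule.

```python
Peak = tuple[str, int, int]  # (chrom, start, end), inclusive coordinates


def overlaps(a: Peak, b: Peak) -> bool:
    """True if the peaks share at least one nucleotide on the same chromosome."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def support(peak: Peak, replicates: list[list[Peak]]) -> int:
    """Number of replicates with at least one peak overlapping `peak`."""
    return sum(any(overlaps(peak, other) for other in rep) for rep in replicates)


def classify(peak: Peak, replicates: list[list[Peak]]) -> str:
    """Label a peak by how many replicates support it (its own replicate included)."""
    n, k = len(replicates), support(peak, replicates)
    if k == n:
        return "all"       # common in all replicates (also a majority peak)
    if k > n / 2:
        return "majority"  # common in the majority but not in every replicate
    if k == 1:
        return "unique"
    return "minority"      # e.g. supported by 2 of 5 replicates
```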

Figure 4-1. Defining the consensus regions for overlapping peaks across replicates. (A) Scheme showing the different methods of combining individual peaks into a consensus. MAX: the maximum area encompassing all peak regions. SMT: the area between the summits of the peaks. Summits of individual peaks are marked in red, and the average summit of the individual peaks is shown as a star. ASF: an area the size of the footprint of the bound protein, centered at the average summit. ASW: an area the size of the average peak width, centered at the average summit. (B) Snapshot of signals (grey bar charts on top), algorithmically identified peaks (black) and consensus regions (blue) for point-source factors that form narrow peaks at the transcription start site (TSS). The ChIP signals are distinct from the input control. The signal profiles are highly similar for all five replicates when the signal range is not fixed but allowed to auto-adjust to the local background (not shown). Here the range is set to a constant to allow comparison of the relative signal strengths, which vary across samples. The peaks identified in individual samples are similar in position and width. (C) Snapshot for broad-source factors whose binding signals span an entire gene (cropped at the 3' end for readability). The identified peaks differ more across replicates.

Figure 4-2. Consistency across replicates of the RNAPII ChIP-seq experiment. (A) Boxplot of weighted kappa coefficients. The coverage in the consensus peaks was binned into five ranked groups, and the agreement of this ranked coverage between replicates was measured by weighted kappa coefficients. A value over 0.75 indicates excellent agreement, which was met for all replicates regardless of the consensus definition used. (B) Heat map of the Spearman correlation of coverage in the consensus peaks. Correlations were high. (C) Bland-Altman plots show the relationship between the difference (Y axis) and the mean (X axis) for a pair of replicates. Narrow, symmetrical plots reflect better agreement. Replicate 2 and replicate 3 are shown here. The other pairs (replicate 1 vs. replicate 2, replicate 1 vs. replicate 3) show similar patterns.
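
The kappa analysis in panel A can be sketched in two steps. First, bin each replicate's consensus-peak coverage into five rank groups. Then compute a weighted kappa between the two group labelings. Linear disagreement weights are an assumption here: the text cites Fleiss (1981) but does not state the weighting scheme. The function names are hypothetical.

```python
def rank_groups(values: list[float], groups: int = 5) -> list[int]:
    """Bin values into `groups` equal-sized rank groups (0 = lowest coverage)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, idx in enumerate(order):
        labels[idx] = rank * groups // len(values)
    return labels


def weighted_kappa(x: list[int], y: list[int], k: int = 5) -> float:
    """Weighted kappa with linear disagreement weights |i - j| / (k - 1).

    Undefined (division by zero) if all observations fall in a single group."""
    n = len(x)
    observed = [[0.0] * k for _ in range(k)]
    for i, j in zip(x, y):
        observed[i][j] += 1 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def weight(i: int, j: int) -> float:
        return abs(i - j) / (k - 1)

    cells = [(i, j) for i in range(k) for j in range(k)]
    disagree_obs = sum(weight(i, j) * observed[i][j] for i, j in cells)
    disagree_exp = sum(weight(i, j) * row[i] * col[j] for i, j in cells)
    return 1 - disagree_obs / disagree_exp
```

Identical rankings give a kappa of 1, and the 0.75 threshold in the caption is read on this scale.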

Figure 4-3. Percentages of peaks that were detected above background (DABG) in replicates where no algorithmically identified peak was present. The read coverage (RPKM) in each identified peak, unique or common, was compared to the lower quantile of coverage in all peaks. A peak was considered detected if the difference was statistically significant by a Z test. Peaks identified in the majority of replicates were more often confirmed by DABG than peaks unique to one replicate (Supplemental Table 3). The Y axis is the percentage of peaks DABG. The mean is indicated by the solid line, and the whiskers are the 25th and 75th percentile values.
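
A minimal sketch of the DABG check described in this caption follows. The precise Z statistic is not spelled out in this section, so the sketch assumes one plausible form: z = (x - Q1) / s. Here x is a peak's RPKM in the sample being tested, and Q1 is the lower quartile of RPKM across all consensus peaks in that sample ("lower quantile" is read as lower quartile). The value s is the standard deviation of those RPKM values, and the test is one-sided. Treat this as an illustration of the idea, not as the exact test used in the study.

```python
from statistics import NormalDist, quantiles, stdev


def dabg_pvalue(x: float, peak_rpkms: list[float]) -> float:
    """One-sided p-value that a peak's RPKM exceeds the background reference.

    ASSUMED form (not taken from the thesis): z = (x - Q1) / s, where Q1 is the
    lower quartile and s the standard deviation of RPKM across all n consensus
    peaks in the same sample."""
    q1 = quantiles(peak_rpkms, n=4)[0]  # lower quartile (default 'exclusive' method)
    s = stdev(peak_rpkms)
    z = (x - q1) / s
    return 1.0 - NormalDist().cdf(z)


def detected_above_background(x: float, peak_rpkms: list[float], alpha: float = 0.05) -> bool:
    """Call a peak detected in a replicate when the one-sided p-value is below alpha."""
    return dabg_pvalue(x, peak_rpkms) < alpha
```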

Figure 4-4. Spearman correlation coefficients were comparable when the peaks were identified in all replicates or in the majority of replicates. However, the correlation was much lower for uniquely identified peaks. The Y axis is the correlation coefficient. The mean is indicated by the solid line, and the whiskers are the 25th and 75th percentile values.
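
The pairwise coefficients summarized in this figure are Spearman's rho between replicates' enrichment values over a shared set of peaks. The sketch below computes rho as the Pearson correlation of average ranks, which handles tied coverage values. It uses only the standard library, and the function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)  # raises ZeroDivisionError if either input is constant


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho = Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def pairwise_spearman(replicates: list[list[float]]) -> dict[tuple[int, int], float]:
    """rho for every replicate pair; keys are 1-based replicate numbers."""
    return {(i + 1, j + 1): spearman(replicates[i], replicates[j])
            for i, j in combinations(range(len(replicates)), 2)}
```

Each inner list holds one replicate's values (for example RPKM) over the same ordered set of peaks. The peak set is where the three curves in this figure differ: common peaks, majority peaks, or peaks identified in any single replicate.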

Figure 4-5. Bland-Altman plots showing sample agreement, using genomic features as the quantification unit. The difference (Y axis) between a pair of replicates at a genomic feature (transcripts for RNAPII [A] and TSSs for H3K4me3 [B]) was plotted against the average of the two samples. (A) Enrichment in transcripts was in good agreement for all replicates of the RNAPII data. (B) The first H3K4me3 replicate appeared to be an outlier, with low agreement with the other replicates. The second and third replicates agreed very well in their enrichment near the TSS.

CHAPTER 5
CONCLUSIONS AND FUTURE DIRECTIONS

5.1 Genetic, Epigenetic and Post-transcriptional Factors Contribute to Systematic Understanding of GRNs

Understanding the dynamic interplay between pathogen and host is essential for developing therapeutic treatments. This interplay is a complex system with many components. The focus of my research has been transcriptional networks, which are shaped by genetic, epigenetic and post-transcriptional control. Perturbing key components of the regulatory network is one way to identify changes in the downstream components of the GRN. Integration of experimental data across multiple platforms can lead to a deeper understanding of interactions in the GRN and to testable hypotheses about critical regulatory interactions. In Chapter 2, I examined the GRNs affected by the post-transcriptional regulator miR-K12-11. I observed a striking divergence in the responses of individual genes to miR-K12-11 between BJAB and TIVE cells. RISC-dependent direct targeting of TFs and signaling proteins extended the effect of miR-K12-11 to a much larger set of genes. I developed a systems approach that links information from many different experiments deposited in databases to current experiments. With it, I showed that common signaling (e.g., IFN) and metabolic (e.g., carbohydrate) pathways were targeted in both tissues through changes in the expression of different sets of genes. I identified distinct tissue specificity in the interactions between miR-K12-11 and host genes. The results suggest a possible global view of the role of miR-K12-11 but also demonstrate the dramatic effects of the tissue-specific transcriptome. Tissue specificity and the regulatory changes through which it manifests are important in the study of virus-host interactions. This is particularly true for herpesviruses, which always infect a number of different tissues. As an example,
KSHV uses oral epithelial cells for transmission and memory B cells for long-term latency and persistence, and it also transforms endothelial cells in KS pathogenesis. Hence, all KSHV-encoded miRNAs, including miR-K12-11, must have evolved to carry out both common and highly cell-type-specific regulatory roles in these vastly different cellular environments. In Chapter 3, I presented a cost-effective platform to quantify the effect of genetic variation on gene expression in Drosophila. The custom-designed microarray simultaneously measures expression, exon-level signal and variation in allele-specific expression (Yang et al., 2011). The platform was applied to a large series of X-chromosome-substituted Drosophila simulans lines. This application successfully identified sex-specific regulatory variation in cis-regulatory elements and trans-acting factors (Graze et al., submitted). I also contributed to a review paper on how to model cis and trans effects in GRNs (Jansen et al., 2009). Together, these projects helped me understand the role of genetic polymorphism in gene networks. In Chapter 4, I tested methods for examining biological replicates in ChIP-seq assays, which have become a common tool for genome-wide epigenetic studies (Yang et al., submitted). Noise is ubiquitous in ChIP samples and complicates site discovery, and determining the common binding regions across multiple noisy samples is challenging. After a thorough exploration of ways to approach this problem, I demonstrated that increasing the number of replicates increases detection rates. I also showed that binding sites identified in more than half of the replicates can be reliably detected in the other replicates. This analysis approach, together with the integrative systems biology approach of Chapter 2, has been applied to a study of KSHV
epigenomic regulation, which found that LANA contributes to the epigenetic profile of the major latency promoters (Hu et al., in prep.). Identifying the GRNs affected by miR-K12-11 improved our insight into virus-host interactions. However, it is important to realize that these types of studies are descriptive. The next step is to transition to fully predictive biological models of virus-host interactions. Once we are able to build such predictive models, therapeutic interventions will be within closer reach. Such models will ultimately also help to distinguish solid putative targets from those whose perturbation may have much smaller overall effects.

5.2 Application of the Systems Biology Approach to Other Viral and Host Factors Is Necessary to Understand GRNs during Viral Infection

Many viral and host factors are transcriptionally active during KSHV infection, constituting complex GRNs. In Chapter 2, I focused on miR-K12-11 and developed a systems approach to perturb the GRN by overexpressing this miRNA and measuring its impact on endothelial and B cells. However, miR-K12-11 is only one of the 17 KSHV miRNAs expressed during latency. In addition to miRNAs, other latent genes, such as LANA, v-Cyclin and v-FLIP, significantly shape the cellular environment and play critical roles in KSHV latency and pathogenesis (Dittmer et al., 1998; Renne et al., 1996; Zhong et al., 1996). Host gene products are also integral components of the GRNs during KSHV infection. For example, the oncogenic miR-17-92 family was upregulated in PEL (O'Hara et al., 2008). While Chapter 2 focuses on miR-K12-11, the presented approach can be extended to each of these components. KSHV gene products have been shown to target a diverse set of host genes associated with latency, proliferation, immunity, cell signaling and transcription
(Arestè and Blackbourn, 2009; Wen and Damania, 2009). Host miRNAs and proteins may work together with viral gene products to regulate the host transcriptome, the proteome and, subsequently, the interactome. To separate driving factors from nonessential ones, these factors need to be perturbed both individually and in combination. Integration of multiple experiments can identify synergistic activity among individual factors. Combined information on the roles of individual gene products will give broader insight into the pathways involved in viral pathogenesis. Detailed information on the components and targets of GRNs would be highly useful for improving diagnostics and therapeutics for KSHV infection. This combination of direct experimentation with existing knowledge can be applied to a broader range of systems in which virus and host interact. miRNA regulation is a component of all herpesviruses except varicella-zoster virus. Comparative analysis of miRNAs from different herpesviruses may identify common regulatory nodes in the cellular processes that must be blunted or modulated to support efficient viral replication and/or latency. Identification of key GRNs affected by viral miRNAs can lead to an understanding of the mechanisms underlying complex diseases. Newly identified critical components in such complex networks may also serve as novel drug targets.

5.3 Measurements of Additional Genetic and Epigenetic Regulators Will Complete the GRNs

Beyond the microarray-based transcriptome profiling that I performed, additional genome-wide assays can be carried out on the foamy virus-transduced BJAB and TIVE cells. Ultimately, we wish to determine the paths through the network that connect miR-K12-11 to every other affected gene. Many of the interactions linking the direct targets to the indirect targets are still unknown. Even a gene with undetectable
expression-level changes may participate in the regulatory network through cascading pathways. To comprehensively understand the transcriptional and functional networks, we need information on other components of the cellular GRNs. These include the epigenetic state (DNA methylation and histone modifications), TF binding, chromatin conformation, short RNA content and proteomic profiles. Thanks to recent technological advances, genome-wide measurements of these regulators have become possible, and more and more data are being deposited in public databases. By integrating multiple components of the cellular GRNs, we can identify interactions across different layers of regulation. ChIP-seq of the TFs of interest in TIVE cells is important for unraveling KS pathogenesis, because publicly available data on endothelial cells are currently scarce. Such experiments can identify the actual TF targets in vivo and eliminate irrelevant sites identified by bioinformatic prediction alone. In the present study, we filtered the predicted TFBSs by integrating DNase-seq data from the ENCODE project. This filtering helps to exclude TFBSs located in inactive chromatin regions in endothelial cells. However, false predictions, such as active binding sites occupied by TFs other than the ones inspected, are likely to remain. Combining computational prediction with ChIP-seq experiments will provide higher resolution for TFBS identification. Detecting the expression levels and binding of all short RNAs present in lymphoid and endothelial cells will reveal more potential regulators involved in shaping cell-type-specific transcriptomes. Deep sequencing of short RNAs can identify and quantify all miRNAs in the BJAB and TIVE cells ectopically expressing miR-K12-11.
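
The DNase-seq filtering of predicted TFBSs mentioned above amounts to an interval-overlap test. A minimal sketch follows. It assumes that predicted sites and DNase hypersensitive regions have already been parsed into hypothetical `(chrom, start, end)` tuples with inclusive coordinates. It illustrates the idea rather than reproducing the filtering actually performed in this study.

```python
from bisect import bisect_right
from collections import defaultdict

Interval = tuple[str, int, int]  # (chrom, start, end), inclusive coordinates


def merge_by_chrom(intervals: list[Interval]) -> dict[str, list[tuple[int, int]]]:
    """Sort and merge overlapping intervals separately for each chromosome."""
    grouped: dict[str, list[tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in intervals:
        grouped[chrom].append((start, end))
    merged: dict[str, list[tuple[int, int]]] = {}
    for chrom, spans in grouped.items():
        out: list[list[int]] = []
        for start, end in sorted(spans):
            if out and start <= out[-1][1]:
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = [(s, e) for s, e in out]
    return merged


def in_open_chromatin(sites: list[Interval], dhs: list[Interval]) -> list[Interval]:
    """Keep predicted sites that share at least one base with a DNase HS region."""
    merged = merge_by_chrom(dhs)
    starts = {chrom: [s for s, _ in spans] for chrom, spans in merged.items()}
    kept = []
    for chrom, start, end in sites:
        spans = merged.get(chrom, [])
        # The merged regions are disjoint and sorted, so only the last region that
        # begins at or before the site's end can overlap the site.
        i = bisect_right(starts.get(chrom, []), end) - 1
        if i >= 0 and spans[i][1] >= start:
            kept.append((chrom, start, end))
    return kept
```

This test only removes sites in closed chromatin. As the paragraph notes, sites in open chromatin that are occupied by other TFs still pass.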

CLIP-seq assays can capture miRNA:mRNA pairs in action. I participated in the CLIP-seq study of two PEL cell lines (Haecker et al., 2012). Key targets of miRNAs can be identified more precisely when the miRNAome, transcriptome and AGO binding profiles are combined. Gene expression and proteomic measurements, histone ChIP-seq, FAIRE-seq and DNA methylation data are being produced by the ENCODE project and other consortia such as the Epigenomics Initiative. It is now possible for individual scientists to integrate knowledge from small-lab experiments with big data. These resources can be used in broader contexts, regardless of the original purpose for which the data were produced. For example, EBV (a gammaherpesvirus closely related to KSHV) infection of primary B cells leads to the efficient establishment of continuously proliferating, genetically stable lymphoblastoid cell lines. ENCODE and HapMap have generated a large repository of genotyping, expression profiling and epigenetic data for these lymphoblastoid cell lines. Reanalysis of the consortium data, with the additional step of alignment to the EBV reference genome, identified viral genes that are co-expressed during latent infection. Integration of host regulators, chromatin structure and protein binding maps revealed host-virus interactions used by EBV (Arvey et al., 2012). Understanding the methods used in consortium studies and their underlying assumptions is critical for effective use of the data. Many early studies lacked replicates altogether or used only two replicates without independently determining the noise for each new assay (Rozowsky et al., 2009). The raw data may be more useful than the initial analysis results. Reanalysis of raw data may identify novel findings that had been
omitted in the original interpretation. More sophisticated analysis pipelines can be devised for particular questions of interest. Big data are useful for narrowing the scope of the next logical set of experiments at the bench. However, it is important to understand how to move from mere suggestions of correlation to concrete evidence of biological mechanism. Establishing causal relationships requires molecular biology validation experiments (Section 5.4).

5.4 Gene-Specific Experiments Can Validate Observations in Genome-Wide Profiling

Validation of specific genes is an indispensable step in any functional study. In the present study, putative direct targets of miR-K12-11 were identified by combining expression profiles with bioinformatic predictions. The molecular experiments described in Section 1.3.4 can be performed to further confirm the targeting of individual genes. We have verified the downregulation of four genes using qPCR (Figure 2-6, page 80). To examine whether the changes at the transcript level are also present at the protein level, western blot analysis will be informative. Luciferase reporter assays can test whether binding between the miRNA and the target transcript modulates the identified target through seed-sequence-dependent miRNA targeting (Boss et al., 2011). The inferred pathways of miR-K12-11 targeting make specific, testable predictions, which need to be confirmed through functional assays. Repressed pathways may be rescued by transfecting the missing elements into the cells, and elevated pathways with upregulated genes may be suppressed using RNAi. We have assayed the effect of miR-K12-11 gain of function. Loss-of-function approaches will further validate the important role played by miR-K12-11.
149 Antagomirs and sponges specific to miR K12 11 should release the repression for its targets. Infection with mutant viru ses that contain defined miRNA mutations opens the possibility of functional studies in the context of viral infection. Our lab has designed miRNA knockout KSHV mutants for all 17 KSHV miRNAs. Comparing the GRNs trigged by mutant viruses with those by wild type viruses, will help to further elucidate viral miRNA function. However, it is important to note that for pathways like IFN and STAT signaling, experiments in tissue culture may not yield additional insights. One major limitation for KSHV studies is th at currently no efficient animal models exist. However, miRK12 11 ability to modulate cell cycle has been confirmed in the context of humanized mouse models in which miR K12 11 expression like its human counterpart miR155 is associated with splenic B cel l proliferation (Boss et al., 2011) 5.5 Cellular Background Should be Carefully Considered The exquisite cell type specificity of GRNs emphasizes the importance of having appropriate biological systems on which to test hypotheses. Due to high specificity of transcriptomes and proteomes i n different cell types, a regulatory effect is only relevant within a particular cell type. We identified direct and indirect targets of miR K12 11 in two types of KSHV host cells, which more accurately reflect the context in which the viral miRNA operates and contributes to pathogenesis, compared to the studies using 293 cells and lung cancer cells as previously published. Endogenous expression miR 155 was detected in mock transduced BJAB cells. Given the seed sequence homology between miR 155 and miR K1 2 11, it is possible that some miR K12 11 sensitive GRNs had already been modulated. Indeed, we observed only few DEGs in miR 155 transduced BJAB. The number of DEGs in response to miR K12 11 was smaller in BJAB than in TIVE. Although not viral infected,
BJAB cells were isolated from lymphoma tissue and have already undergone transformation. Tumor-associated pathways may therefore be activated and may obscure those triggered exclusively by miR-K12-11. A cleaner background may be obtained by conducting similar experiments in progenitor or naïve B cells. In addition to endothelial cells and lymphoid cells, KSHV also infects epithelial cells in the context of oral transmission (Moore and Chang, 2003). It would be of interest to determine whether miR-K12-11 induces yet another cell type-specific GRN in these cells. The lesson on tissue specificity is broadly applicable to any study of viral-host interaction. To identify genes and interactions that are efficient therapeutic targets, it is critical to carry out the experiments in those cell types that best resemble the genetic background of the physiological host cell. Interactions that do not occur in the diseased tissue are not informative for understanding pathogenesis.

5.6 Time-Series Measurements Can Model the Dynamic Aspects of Viral-Host Interactions

We have only tested the effects of miR-K12-11 in a static manner (one time point and one stage of differentiation). However, to pinpoint the exact causal relationships between a regulatory element and its downstream targets, time-series data are necessary. As reviewed in 1.4.2, large sets of time-series expression data allow the construction of directed regulatory networks with probabilistic models. Such models provide a genome-scale view of GRNs and can generate predictions of the system's response to further perturbations. Likewise, time-series measurements of dynamic DNA-protein, protein-protein, and short RNA interactions and of epigenetic state can be added. Once a comprehensive set of time-series data covering multiple layers of regulation is collected,
we will be able to move from descriptive models to dynamic, more quantitative models that integrate changes in the relevant protein, RNA, and chromatin components over time. Integrating the dynamics of transcriptomes and functional interactomes will offer new insights into complex biological systems. With predictive models, iterative cycles of perturbation, data integration, and development of further probabilistic models can be carried out to refine the description of the GRNs. Hypotheses can be formulated to explain unpredicted observations. Iterative experimentation and computation identify novel regulators that play a role in the biological system. The hypotheses derived in each round in turn drive new experimental design and new data collection. This iterative approach has the potential to one day fully dissect the complex interactions between virus and host. With the rapidly decreasing cost and increasing speed of high-throughput technologies, the iterative approach has become increasingly feasible for tackling complex problems such as host-pathogen interaction or embryonic development.

5.7 Understanding the Quantitative Nature of Biological Processes

Complex traits are the combinatorial results of quantitative genetics (as opposed to the molecular genetics of Mendelian traits) (Richards, 2009; Rutherford and Henikoff, 2003). As the number of nodes within GRNs increases, the number of possible phenotypes may grow exponentially. Improved technology allows higher resolution in characterizing the quantitative nature of biology. Recent evidence suggests that TF regulation is a continuum rather than a discrete all-or-nothing event, and that binding specificity is more plastic than strict binding to consensus motifs would imply (Biggin, 2011; MacQuarrie et al., 2011). The same flexibility is present in miRNA regulation. Beyond the seed sequence (bases 2-7), miRNA binding specificity wobbles when the
1st or 8th base changes (Bartel, 2009). Very recently, a new technique that clones miRNA targets by ligating RISC-associated miRNAs to their cognate mRNA targets revealed many non-seed-dependent interactions (Helwak et al., 2013). As a result, our analysis may have underestimated the number of direct targets in miR-K12-11-expressing TIVE cells. The regulatory effect of miRNAs is also subtle rather than abrupt: miRNAs act more in fine-tuning transcript abundance than in switching it on and off. To accurately measure the quantitative effects of biological processes, a single-cell approach may be necessary. Most biological processes occur at the level of the individual cell, and observations based on a population of cells may not reflect the biological activities in a single cell. There is a large degree of heterogeneity (genetic, microenvironmental, stochastic) between individual cells, even within a clonal population (Cohen et al., 2008; Sharma et al., 2010). Variation arises from differences in isoform presence, protein concentrations, and stochastic fluctuations in biochemical reactions involving low-copy-number molecules, which can lead to phenotypic variation that models assuming homogeneity cannot address. The effect of individual components in single cells may be obscured in a mixed population of cells at different stages. This may have created some discrepancies among the regulatory networks, which need to be clarified with more measurements of the status and quantities of functional elements. Notably, subpopulations are especially frequent in tumor cells, which are the research material of many labs. Most of the information on cell signaling was collected from population-level studies using bulk assays, yet single-cell assays have raised some doubts as to whether
population data faithfully reflect how individual cells respond (Cohen et al., 2008). Comparisons of bulk and single-cell assays of NF-κB signaling show that molecules that are mutually exclusive in single cells may appear statistically dependent at the population level (Tay et al., 2011). The fraction of NF-κB-responding cells increases with dose, creating a graded, dose-dependent appearance at the population level even though the response of each single cell is all-or-nothing (Tay et al., 2011). In another example, the pulsed responses of p53 to radiation damage are evident only at the single-cell level and are blurred out in population measurements (Batchelor et al., 2009; Lahav et al., 2004; Spencer et al., 2009). However, it remains a challenge to infer temporal regulation and plausible causal relations that reflect the dynamic properties of GRNs (Wiggins and Nemenman, 2003). Single-cell sequencing and mass-spectrometry-based cytometry have been developed (Ornatsky et al., 2008). At present it remains challenging to simultaneously measure multiple components of a single cell over time and space, because the number of proteins that can be measured is still limited. Profiling data usually capture random steady states of the underlying biochemical dynamics. Accurate temporal measurements are especially difficult to obtain for human tissues.

5.8 Conclusions

I have aimed to understand GRNs systematically by profiling the genetic, epigenetic, and post-transcriptional factors. This integrative approach can be applied to a broader range of viral and host systems and will shed light on viral pathogenesis in many diseases. The GRNs will become more comprehensive with more profiling data on the genetic and epigenetic regulators. Integration of multiple lines of evidence can suggest cellular pathways disrupted and/or modified by the virus. In order to identify the
causal factors, it is indispensable to validate hypotheses about the role of individual genes and pathways using molecular biology experiments. Notably, it is critical to use a relevant model for the investigation of GRNs, because GRNs are highly tissue specific. To move from descriptive to predictive models, time-series measurements need to be added. Single-cell measurements may better model the variation in quantitative biological systems once the technological bottleneck is removed. With profiling data on every regulatory level, in a broader range of genetic backgrounds, and across time for each genetic background, we can map the GRNs and hopefully use this knowledge to design disease- and tissue-specific intervention strategies.
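
The argument in 5.7 and 5.8 that single-cell measurements may better capture quantitative variation rests on the observation that population averages can mask all-or-nothing or pulsatile behavior in individual cells (Tay et al., 2011; Batchelor et al., 2009). The toy Python sketch below illustrates both effects under deliberately simple assumptions that are not taken from those studies: each hypothetical cell switches on digitally once the dose exceeds its own threshold (100 thresholds spread evenly from 1 to 10, an arbitrary choice), and each cell's pulsatile signal is a sinusoid with a shared period but a different phase. These are stand-ins for illustration, not models of NF-κB or p53.

```python
from __future__ import annotations

import math


def single_cell_response(dose: float, threshold: float) -> int:
    """All-or-nothing (digital) activation of one hypothetical cell."""
    return 1 if dose >= threshold else 0


def population_response(dose: float, thresholds: list[float]) -> float:
    """Fraction of activated cells, i.e. what a bulk assay would average."""
    active = sum(single_cell_response(dose, t) for t in thresholds)
    return active / len(thresholds)


def mean_pulse(t: float, n_cells: int, period: float = 5.0) -> float:
    """Population mean of n_cells sinusoidal pulses with evenly spread phases."""
    total = 0.0
    for k in range(n_cells):
        phase = 2 * math.pi * k / n_cells  # cells are desynchronized
        total += math.sin(2 * math.pi * t / period + phase)
    return total / n_cells


# Hypothetical cell-to-cell heterogeneity: 100 thresholds spread from 1 to 10.
THRESHOLDS = [1 + 9 * i / 99 for i in range(100)]

if __name__ == "__main__":
    for dose in (0, 2, 4, 6, 8, 10):
        # Each cell is 0 or 1, yet the population fraction rises gradually.
        print(dose, population_response(dose, THRESHOLDS))
    for t in (0.0, 1.25, 2.5):
        # One cell swings between -1 and 1; the desynchronized mean stays near 0.
        print(t, round(mean_pulse(t, 1), 3), round(mean_pulse(t, 50), 3))
```

With these assumptions the bulk readout rises smoothly with dose even though no individual cell has an intermediate state, and the averaged pulse signal is flat even though every cell oscillates with full amplitude, which is the kind of discrepancy that only single-cell measurements can resolve.
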

LIST OF REFERENCES

Affymetrix (2005). Exon array background detection. (http://media.affymetrix.com/support/technical/whitepapers/exon_background_correction_whitepaper.pdf)
Ahmadian, A., Gharizadeh, B., Gustafsson, A.C., Sterky, F., Nyrén, P., Uhlén, M., and Lundeberg, J. (2000). Single-Nucleotide Polymorphism Analysis by Pyrosequencing. Analytical Biochemistry 280, 103-110.
Alberts, B. (1998). The cell as a collection of protein machines: preparing the next generation of molecular biologists. Cell 92, 291-294.
Alexiou, P., Maragkakis, M., Papadopoulos, G.L., Reczko, M., and Hatzigeorgiou, A.G. (2009). Lost in translation: an assessment and perspective for computational microRNA target identification. Bioinformatics 25, 3049-3055.
Allemand, E., Batsché, E., and Muchardt, C. (2008). Splicing, transcription, and chromatin: a ménage à trois. Curr Opin Genet Dev 18, 145-151.
Allison, D.B., Cui, X., Page, G.P., and Sabripour, M. (2006). Microarray data analysis: from disarray to consolidation and consensus. Nat Rev Genet 7, 55-65.
Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. (1990). Basic Local Alignment Search Tool. Journal of Molecular Biology 215, 403-410.
Ammar, R., Smith, A.M., Heisler, L.E., Giaever, G., and Nislow, C. (2009). A comparative analysis of DNA barcode microarray feature size. BMC Genomics 10, 471.
Aoki-Kinoshita, K.F., and Kanehisa, M. (2007). Gene annotation and pathway mapping in KEGG, Vol 396.
Arest, C., and Blackbourn, D.J. (2009). Modulation of the immune system by Kaposi's sarcoma-associated herpesvirus. Trends in Microbiology 17, 119-129.
Arvey, A., Tempera, I., Tsai, K., Chen, H.S., Tikhmyanova, N., Klichinsky, M., Leslie, C., and Lieberman, P.M. (2012). An atlas of the Epstein-Barr virus transcriptome and epigenome reveals host-virus regulatory interactions. Cell Host Microbe 12, 233-245.
Auer, P.L., and Doerge, R.W. (2010). Statistical Design and Analysis of RNA Sequencing Data. Genetics 185, 405-416.
Babu, M.M., Luscombe, N.M., Aravind, L., Gerstein, M., and Teichmann, S.A. (2004). Structure and evolution of transcriptional regulatory networks. Curr Opin Struct Biol 14, 283-291.
Bader, G.D., Donaldson, I., Wolting, C., Ouellette, B.F.F., Pawson, T., and Hogue, C.W.V. (2001). BIND--The Biomolecular Interaction Network Database. Nucl. Acids Res. 29, 242-245.
Bader, G.D., and Hogue, C.W.V. (2000). BIND--a data specification for storing and describing biomolecular interactions, molecular complexes and pathways, Vol 16.
Bader, G.D., and Hogue, C.W.V. (2003). An automated method for finding molecular complexes in large protein interaction networks, Vol 4.
Baek, D., Villen, J., Shin, C., Camargo, F.D., Gygi, S.P., and Bartel, D.P. (2008). The impact of microRNAs on protein output. Nature 455, 64-71.
Bagga, S., Bracht, J., Hunter, S., Massirer, K., Holtz, J., Eachus, R., and Pasquinelli, A.E. (2005). Regulation by let-7 and lin-4 miRNAs results in target mRNA degradation. Cell 122, 553-563.
Bailey, T.L., Boden, M., Buske, F.A., Frith, M., Grant, C.E., Clementi, L., Ren, J., Li, W.W., and Noble, W.S. (2009). MEME Suite: tools for motif discovery and searching. Nucl. Acids Res. 37, W202-W208.
Ball, M.P., Li, J.B., Gao, Y., Lee, J.H., LeProust, E.M., Park, I.H., Xie, B., Daley, G.Q., and Church, G.M. (2009). Targeted and genome-scale strategies reveal gene-body methylation signatures in human cells. Nat Biotechnol 27, 361-368.
Banerjee, A., Schambach, F., DeJong, C.S., Hammond, S.M., and Reiner, S.L. (2010). Micro-RNA-155 inhibits IFN-gamma signaling in CD4+ T cells. Eur J Immunol 40, 225-231.
Barabasi, A.L., and Albert, R. (1999). Emergence of Scaling in Random Networks. Science 286, 509-512.
Barrera, L.O., Li, Z., Smith, A.D., Arden, K.C., Cavenee, W.K., Zhang, M.Q., Green, R.D., and Ren, B. (2008). Genome-wide mapping and analysis of active promoters in mouse embryonic stem cells and adult organs. Genome Res 18, 46-59.
Barski, A., Cuddapah, S., Cui, K., Roh, T.Y., Schones, D.E., Wang, Z., Wei, G., Chepelev, I., and Zhao, K. (2007). High-Resolution Profiling of Histone Methylations in the Human Genome. Cell 129, 823-837.
Bartel, D.P. (2004). MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 116, 281-297.
Bartel, D.P. (2009). MicroRNAs: target recognition and regulatory functions. Cell 136, 215-233.
Bartonicek, N., and Enright, A.J. (2010). SylArray: a web server for automated detection of miRNA effects from expression data. Bioinformatics 26, 2900-2901.
Baskerville, S., and Bartel, D.P. (2005). Microarray profiling of microRNAs reveals frequent coexpression with neighboring miRNAs and host genes. RNA 11, 241-247.
Batchelor, E., Loewer, A., and Lahav, G. (2009). The ups and downs of p53: understanding protein dynamics in single cells. Nat Rev Cancer 9, 371-377.
Baugh, L.R., DeModena, J., and Sternberg, P.W. (2009). RNA Pol II Accumulates at Promoters of Growth Genes During Developmental Arrest. Science 324, 92-94.
Begun, D., Holloway, A., Stevens, K., Hillier, L., Poh, Y.P., et al. (2007). Population Genomics: Whole-Genome Analysis of Polymorphism and Divergence in Drosophila simulans. PLoS Biol 5.
Benjamini, Y., and Hochberg, Y. (1995). Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society. Series B (Methodological) 57, 289-300.
Benson, D.A., Karsch-Mizrachi, I., Lipman, D.J., Ostell, J., and Wheeler, D.L. (2005). GenBank. Nucleic Acids Research 33, D34-D38.
Bernstein, B.E., Birney, E., Dunham, I., Green, E.D., Gunter, C., and Snyder, M. (2012). An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57-74.
Betel, D., Wilson, M., Gabon, A., Marks, D.S., and Sander, C. (2008). The microRNA.org resource: targets and expression. Nucleic Acids Res., D149-D153.
Bieda, M., Xu, X., Singer, M.A., Green, R., and Farnham, P.J. (2006). Unbiased location analysis of E2F1-binding sites suggests a widespread role for E2F1 in the human genome. Genome Research 16, 595-605.
Biggin, M.D. (2011). Animal transcription networks as highly connected, quantitative continua. Dev Cell 21, 611-626.
Blahnik, K.R., Dou, L., Echipare, L., Iyengar, S., O'Geen, H., Sanchez, E., Zhao, Y., Marra, M.A., Hirst, M., Costello, J.F., et al. (2011). Characterization of the Contradictory Chromatin Signatures at the 3' ... PLoS ONE 6, e17121.
Blahnik, K.R., Dou, L., O'Geen, H., McPhillips, T., and Xu, X. (2009). Sole-Search: an integrated analysis program for peak detection and functional annotation using ChIP-seq data. Nucl. Acids Res. 38, e13.
Bland, J.M., and Altman, D.G. (1986). Statistical methods for assessing agreement between two methods of clinical measurement. Lancet, 307-310.
Bland, J.M., and Altman, D.G. (1988). Misleading Statistics: errors in textbooks, software and manuals. International Journal of Epidemiology 17, 201-203.
Bochkis, I.M., Schug, J., Ye, D.Z., Kurinna, S., Stratton, S.A., Barton, M.C., and Kaestner, K.H. (2012). Genome-Wide Location Analysis Reveals Distinct Transcriptional Circuitry by Paralogous Regulators Foxa1 and Foxa2. PLoS Genet 8, e1002770.
Borevitz, J.O., Liang, D., Plouffe, D., Chang, H.S., Zhu, T., Weigel, D., Berry, C.C., Winzeler, E., and Chory, J. (2003). Large-scale identification of single-feature polymorphisms in complex genomes, Vol 13.
Boss, I.W., Nadeau, P.E., Abbott, J.R., Yang, Y., Mergia, A., and Renne, R. (2011). A Kaposi's Sarcoma-Associated Herpesvirus-Encoded Ortholog of MicroRNA miR-155 Induces Human Splenic B-Cell Expansion in NOD/LtSz ... Journal of Virology 85, 9877-9886.
Botquin, V., Hess, H., Fuhrmann, G., Anastassiadis, C., Gross, M.K., Vriend, G., and Schöler, H.R. (1998). New POU dimer configuration mediates antagonistic control of an osteopontin preimplantation enhancer by Oct-4 and Sox-2. Genes Dev 12, 2073-2090.
Boutz, P.L., Chawla, G., Stoilov, P., and Black, D.L. (2007). MicroRNAs regulate the expression of the alternative splicing factor nPTB during muscle development. Genes Dev 21, 71-84.
Bownes, M. (1994). The regulation of the yolk protein genes, a family of sex differentiation genes in Drosophila melanogaster. BioEssays 16, 745-752.
Boyer, L.A., Plath, K., Zeitlinger, J., Brambrink, T., Medeiros, L.A., Lee, T.I., Levine, S.S., Wernig, M., Tajonar, A., Ray, M.K., et al. (2006). Polycomb complexes repress developmental regulators in murine embryonic stem cells. Nature 441, 349-353.
Branco, M.R., and Pombo, A. (2006). Intermingling of chromosome territories in interphase suggests role in translocations and transcription-dependent associations. PLoS Biol 4, 780-788.
Breitkreutz, B.J., Stark, C., Reguly, T., Boucher, L., Breitkreutz, A., Livstone, M., Oughtred, R., Lackner, D.H., Bahler, J., Wood, V., et al. (2008). The BioGRID Interaction Database: 2008 update.
Bullard, J., Purdom, E., Hansen, K., and Dudoit, S. (2010). Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments. BMC Bioinformatics 11, 94.
Butte, A.J., and Kohane, I.S. (2000). Mutual information relevance networks: functional genomic clustering using pairwise entropy measurements. Pac Symp Biocomput., 418-429.
Cai, X., and Cullen, B.R. (2006). Transcriptional origin of Kaposi's sarcoma-associated herpesvirus microRNAs. J Virol 80, 2234-2242.
Cai, X., Lu, S., Zhang, Z., Gonzalez, C.M., Damania, B., and Cullen, B.R. (2005). Kaposi's sarcoma-associated herpesvirus expresses an array of viral microRNAs in latently infected cells. Proc Natl Acad Sci U S A 102, 5570-5575.
Calarco, J.A., Superina, S., O'Hanlon, D., Gabut, M., Raj, B., Pan, Q., Skalska, U., Clarke, L., Gelinas, D., van der Kooy, D., et al. (2009). Regulation of vertebrate nervous system alternative splicing and development by an SR-related protein. Cell 138, 898-910.
Calin, G.A., and Croce, C.M. (2006). MicroRNA signatures in human cancers. Nat Rev Cancer 6, 857-866.
Calin, G.A., Liu, C.G., Sevignani, C., Ferracin, M., Felli, N., Dumitru, C.D., Shimizu, M., Cimmino, A., Zupo, S., Dono, M., et al. (2004). MicroRNA profiling reveals distinct signatures in B cell chronic lymphocytic leukemias. Proc Natl Acad Sci U S A 101, 11755-11760.
Cao, A.R., Rabinovich, R., Xu, M., Xu, X., Jin, V.X., and Farnham, P.J. (2011). Genome-wide Analysis of Transcription Factor E2F1 Mutant Proteins Reveals That N- and C-terminal Protein Interaction Domains Do Not Participate in Targeting E2F1 to the Human Genome. Journal of Biological Chemistry 286, 11985-11996.
Carter, C.R., Whitmore, K.M., and Thorpe, R. (2004). The significance of carbohydrates on G-CSF: differential sensitivity of G-CSFs to human neutrophil elastase degradation. J Leukoc Biol 75, 515-522.
Celniker, S.E., Dillon, L.A.L., Gerstein, M.B., Gunsalus, K.C., Henikoff, S., Karpen, G.H., Kellis, M., Lai, E.C., Lieb, J.D., MacAlpine, D.M., et al. (2009). Unlocking the secrets of the genome. Nature 459, 927-930.
Cesarman, E., and Knowles, D.M. (1999). The role of Kaposi's sarcoma-associated herpesvirus (KSHV/HHV-8) in lymphoproliferative diseases. Semin Cancer Biol 9, 165-174.
Chandriani, S., Xu, Y., and Ganem, D. (2010). The lytic transcriptome of Kaposi's sarcoma-associated herpesvirus reveals extensive transcription of noncoding regions, including regions antisense to important genes. J Virol 84, 7934-7942.
Chang, T.Y., Wu, Y.H., Cheng, C.C., and Wang, H.W. (2011). Differentially regulated splice variants and systems biology analysis of Kaposi's sarcoma-associated herpesvirus-infected lymphatic endothelial cells. Nucleic Acids Res 39, 6970-6985.
Chang, Y., Cesarman, E., Pessin, M.S., Lee, F., Culpepper, J., Knowles, D.M., and Moore, P.S. (1994). Identification of herpesvirus-like DNA sequences in AIDS-associated Kaposi's sarcoma. Science 266, 1865-1869.
Chen, C.Z., Li, L., Lodish, H.F., and Bartel, D.P. (2004). MicroRNAs modulate hematopoietic lineage differentiation. Science 303, 83-86.
Chen, Y., Meyer, C., Liu, T., Li, W., Liu, J., and Liu, X.S. (2011). MM-ChIP enables integrative analysis of cross-platform and between-laboratory ChIP-chip or ChIP-seq data. Genome Biology 12, R11.
Chen, Y., Negre, N., Li, Q., Mieczkowska, J.O., Slattery, M., Liu, T., Zhang, Y., Kim, T.K., He, H.H., Zieba, J., et al. (2012). Systematic evaluation of factors influencing ChIP-seq fidelity. Nat Meth 9, 609-614.
Chen, Y., Zhu, J., Lum, P.Y., Yang, X., Pinto, S., MacNeil, D.J., Zhang, C., Lamb, J., Edwards, S., Sieberts, S.K., et al. (2008). Variations in DNA elucidate molecular networks that cause disease. Nature 452, 429-435.
Cheng, C., Alexander, R., Min, R., Leng, J., Yip, K.Y., Rozowsky, J., Yan, K.K., Dong, X., Djebali, S., Ruan, Y., et al. (2012). Understanding transcriptional regulation by integrative analysis of transcription factor binding data. Genome Res 22, 1658-1667.
Cheng, C., Yan, K.K., Hwang, W., Qian, J., Bhardwaj, N., Rozowsky, J., Lu, Z.J., Niu, W., Alves, P., Kato, M., et al. (2011). Construction and analysis of an integrated regulatory network derived from high-throughput sequencing data. PLoS Comput Biol 7, e1002190.
Cheng, H., Jiang, L., Wu, M., and Liu, Q. (2009). Inferring Transcriptional Interactions by the Optimal Integration of ChIP-chip and Knock-out Data. Bioinform Biol Insights 3, 129-140.
Chi, S.W., Zang, J.B., Mele, A., and Darnell, R.B. (2009). Argonaute HITS-CLIP decodes microRNA-mRNA interaction maps. Nature 460, 479-486.
Chu, T.M., Weir, B., and Wolfinger, R. (2002). A systematic statistical linear modeling approach to oligonucleotide array experiments. Mathematical Biosciences 176, 35-51.
Chuang, H.Y., Lee, E., Liu, Y.T., Lee, D., and Ideker, T. (2007). Network-based classification of breast cancer metastasis. Mol Syst Biol 3, 140.
Churchill, G.A. (2002). Fundamentals of experimental design for cDNA microarrays. Nat Genet.
Cline, M.S., Smoot, M., Cerami, E., Kuchinsky, A., Landys, N., Workman, C., Christmas, R., Avila-Campilo, I., Creech, M., Gross, B., et al. (2007). Integration of biological networks and gene expression data using Cytoscape. Nat. Protocols 2, 2366-2382.
Cloutier, N., and Flamand, L. (2010). Kaposi sarcoma-associated herpesvirus latency-associated nuclear antigen inhibits interferon (IFN) beta expression by competing with IFN regulatory factor-3 for binding to IFNB promoter. J Biol Chem 285, 7208-7221.
Coffman, C., Wayne, M., Nuzhdin, S., Higgins, L., and McIntyre, L. (2005). Identification of co-regulated transcripts affecting male body size in Drosophila. Genome Biology 6, R53.
Cohen, A.A., Geva-Zatorsky, N., Eden, E., Frenkel-Morgenstern, M., Issaeva, I., Sigal, A., Milo, R., Cohen-Saidon, C., Liron, Y., Kam, Z., et al. (2008). Dynamic proteomics of individual cancer cells in response to a drug. Science 322, 1511-1516.
Consortium, G.P. (2011). A map of human genome variation from population-scale sequencing. Nature 467, 1061-1073.
Costinean, S., Sandhu, S.K., Pedersen, I.M., Tili, E., Trotta, R., Perrotti, D., Ciarlariello, D., Neviani, P., Harb, J., Kauffman, L.R., et al. (2009). Src homology 2 domain-containing inositol-5-phosphatase and CCAAT enhancer-binding protein beta are targeted by miR-155 in B cells of Emicro-MiR-155 transgenic mice. Blood 114, 1374-1382.
Costinean, S., Zanesi, N., Pekarsky, Y., Tili, E., Volinia, S., Heerema, N., and Croce, C.M. (2006). Pre-B cell proliferation and lymphoblastic leukemia/high-grade lymphoma in E(mu)-miR155 transgenic mice. Proc Natl Acad Sci U S A 103, 7024-7029.
Cowles, C.R., Hirschhorn, J.N., Altshuler, D., and Lander, E.S. (2002). Detection of regulatory variation in mouse genes. Nat Genet 32, 432-437.
Damania, B. (2004). Modulation of cell signaling pathways by Kaposi's sarcoma-associated herpesvirus (KSHV/HHV-8). Cell Biochem Biophys 40, 305-322.
Dandy, D.S., Wu, P., and Grainger, D.W. (2007). Array feature size influences nucleic acid surface capture in DNA microarrays. PNAS 104, 8223-8228.
Davidson, E.H., Rast, J.P., Oliveri, P., Ransick, A., Calestani, C., Yuh, C.H., Minokawa, T., Amore, G., Hinman, V., Arenas-Mena, C., et al. (2002). A genomic regulatory network for development. Science 295, 1669-1678.
de la Chapelle, A. (2009). Genetic predisposition to human disease: allele-specific expression and low-penetrance regulatory loci. Oncogene 28, 3345-3348.
Deaton, A.M., Webb, S., Kerr, A.R., Illingworth, R.S., Guy, J., Andrews, R., and Bird, A. (2011). Cell type-specific DNA methylation at intragenic CpG islands in the immune system. Genome Res 21, 1074-1086.
Degner, J.F., Marioni, J.C., Pai, A.A., Pickrell, J.K., Nkadori, E., Gilad, Y., and Pritchard, J.K. (2009a). Effect of read-mapping biases on detecting allele-specific expression from RNA-sequencing data. Bioinformatics 25, 3207-3212.
Degner, S.C., Wong, T.P., Jankevicius, G., and Feeney, A.J. (2009b). Cutting edge: developmental stage-specific recruitment of cohesin to CTCF sites throughout immunoglobulin loci during B lymphocyte development. J Immunol 182, 44-48.
Delgado, T., Carroll, P.A., Punjabi, A.S., Margineantu, D., Hockenbery, D.M., and Lagunoff, M. (2010). Induction of the Warburg effect by Kaposi's sarcoma herpesvirus is required for the maintenance of latently infected endothelial cells. Proc Natl Acad Sci U S A 107, 10696-10701.
Dews, M., Homayouni, A., Yu, D., Murphy, D., Sevignani, C., Wentzel, E., Furth, E.E., Lee, W.M., Enders, G.H., Mendell, J.T., et al. (2006). Augmentation of tumor angiogenesis by a Myc-activated microRNA cluster. Nat Genet 38, 1060-1065.
Dittmer, D., Lagunoff, M., Renne, R., Staskus, K., Haase, A., and Ganem, D. (1998). A cluster of latently expressed genes in Kaposi's sarcoma-associated herpesvirus. J Virol 72, 8309-8315.
Djebali, S., Davis, C.A., Merkel, A., Dobin, A., Lassmann, T., Mortazavi, A., Tanzer, A., Lagarde, J., Lin, W., Schlesinger, F., et al. (2012). Landscape of transcription in human cells. Nature 489, 101-108.
Doench, J.G., and Sharp, P.A. (2004). Specificity of microRNA target selection in translational repression. Genes Dev 18, 504-511.
Dohm, J.C., Lottaz, C., Borodina, T., and Himmelbauer, H. (2008). Substantial biases in ultra-short read data sets from high-throughput DNA sequencing. Nucleic Acids Research 36, e105.
Dorsett, Y., McBride, K.M., Jankovic, M., Gazumyan, A., Thai, T.H., Robbiani, D.F., Di Virgilio, M., Reina-San-Martin, B., Heidkamp, G., Schwickert, T.A., et al. (2008). MicroRNA-155 suppresses activation-induced cytidine deaminase-mediated Myc-Igh translocation. Immunity 28, 630-638.
Dudoit, S., Yang, Y.H., Callow, M.J., and Speed, T.P. (2002). Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments. Statistica Sinica 12, 111-139.
Dweep, H., Sticht, C., Pandey, P., and Gretz, N. (2011). miRWalk--database: prediction of possible miRNA binding sites by "walking" the genes of three genomes. J Biomed Inform 44, 839-847.
Dworkin, I., and Jones, C.D. (2009). Genetic changes accompanying the evolution of host specialization in Drosophila sechellia. Genetics 181, 721-736.
Ebert, M.S., Neilson, J.R., and Sharp, P.A. (2007). MicroRNA sponges: competitive inhibitors of small RNAs in mammalian cells. Nat Methods 4, 721-726.
Ebert, M.S., and Sharp, P.A. (2012). Roles for MicroRNAs in Conferring Robustness to Biological Processes. Cell 149, 515-524.
Edenberg, H.J., Bierut, L.J., Boyce, P., Cao, M., Cawley, S., Chiles, R., Doheny, K.F., Hansen, M., Hinrichs, T., Jones, K., et al. (2005). Description of the data from the Collaborative Study on the Genetics of Alcoholism (COGA) and single-nucleotide polymorphism genotyping for Genetic Analysis Workshop 14. BMC Genet 6 Suppl 1, S2.
Edwards, R.H., Marquitz, A.R., and Raab-Traub, N. (2008). Epstein-Barr virus BART microRNAs are produced from a large intron prior to splicing. J Virol 82, 9094-9106.
Eis, P.S., Tam, W., Sun, L., Chadburn, A., Li, Z., Gomez, M.F., Lund, E., and Dahlberg, J.E. (2005). Accumulation of miR-155 and BIC RNA in human B cell lymphomas. Proc Natl Acad Sci U S A 102, 3627-3632.
Eisen, M.B., Spellman, P.T., Brown, P.O., and Botstein, D. (1998). Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci U S A 95, 14863-14868.
Emerson, J.J., Hsieh, L.C., Sung, H.M., Wang, T.Y., Huang, C.J., Lu, H.H.S., Lu, M.Y.J., Wu, S.H., and Li, W.H. (2010). Natural selection on cis and trans regulation in yeasts. Genome Research 20, 826-836.
Eulalio, A., Huntzinger, E., Nishihara, T., Rehwinkel, J., Fauser, M., and Izaurralde, E. (2009). Deadenylation is a widespread effect of miRNA regulation. RNA 15, 21-32.
Fabbri, M., Garzon, R., Cimmino, A., Liu, Z., Zanesi, N., Callegari, E., Liu, S., Alder, H., Costinean, S., Fernandez-Cymering, C., et al. (2007). MicroRNA-29 family reverts aberrant methylation in lung cancer by targeting DNA methyltransferases 3A and 3B. Proceedings of the National Academy of Sciences 104, 15805-15810.
Faith, J.J., Hayete, B., Thaden, J.T., Mogno, I., Wierzbowski, J., Cottarel, G., Kasif, S., Collins, J.J., and Gardner, T.S. (2007). Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biol 5, e8.
Farh, K.K.H., Grimson, A., Jan, C., Lewis, B.P., Johnston, W.K., Lim, L.P., Burge, C.B., and Bartel, D.P. (2005). The Widespread Impact of Mammalian MicroRNAs on mRNA Repression and Evolution. Science 310, 1817-1821.
Fields, S., and Song, O. (1989). A novel genetic system to detect protein-protein interactions. Nature 340, 245-246.
Fleiss, J. (1981). Statistical methods for rates and proportions, 2nd edn (Wiley, New York).
Fontanillas, P., Landry, C.R., Wittkopp, P.J., Russ, C., Gruber, J.D., Nusbaum, C., and Hartl, D.L. (2010). Key considerations for measuring allelic expression on a genomic scale using high-throughput sequencing. Molecular Ecology 19, 212-227.
Frankel, N., Davis, G.K., Vargas, D., Wang, S., Payre, F., and Stern, D.L. (2010). Phenotypic robustness conferred by apparently redundant transcriptional enhancers. Nature 466, 490-493.
Fraser, P., and Bickmore, W. (2007). Nuclear organization of the genome and the potential for gene regulation. Nature 447, 413-417.
Friborg, J., Jr., Kong, W., Hottiger, M.O., and Nabel, G.J. (1999). p53 inhibition by the LANA protein of KSHV protects against cell death. Nature 402, 889-894.
Friedman, N., Linial, M., Nachman, I., and Pe'er, D. (2000). Using Bayesian networks to analyze expression data. J Comput Biol 7, 601-620.
Friedman, R.C., Farh, K.K., Burge, C.B., and Bartel, D.P. (2009). Most mammalian mRNAs are conserved targets of microRNAs. Genome Res 19, 92-105.
Frith, M.C., Wan, R., and Horton, P. (2010). Incorporating sequence quality data into alignment improves DNA read mapping. Nucleic Acids Research 38, e100.
Fujiwara, T., O'Geen, H., Keles, S., Blahnik, K., Linnemann, A.K., Kang, Y.A., Choi, K., Farnham, P.J., and Bresnick, E.H. (2009). Discovering Hematopoietic Mechanisms through Genome-wide Analysis of GATA Factor Chromatin Occupancy. Molecular Cell 36, 667-681.
Fullwood, M.J., Liu, M.H., Pan, Y.F., Liu, J., Xu, H., Mohamed, Y.B., Orlov, Y.L., Velkov, S., Ho, A., Mei, P.H., et al. (2009). An oestrogen-receptor-α-bound human chromatin interactome. Nature 462, 58-64.
Gaidatzis, D., van Nimwegen, E., Hausser, J., and Zavolan, M. (2007). Inference of miRNA targets using evolutionary conservation and pathway analysis. BMC Bioinformatics 8, 69.
Gao, Y., Schug, J., McKenna, L.B., Le Lay, J., Kaestner, K.H., and Greenbaum, L.E. (2011). Tissue-specific regulation of mouse microRNA genes in endoderm-derived tissues. Nucleic Acids Res 39, 454-463.
Garber, A.C., Hu, J., and Renne, R. (2002). Latency-associated nuclear antigen (LANA) cooperatively binds to two sites within the terminal repeat, and both sites contribute to the ability of LANA to suppress transcription and to facilitate DNA replication. J Biol Chem 277, 27401-27411.
Garber, A.C., Shu, M.A., Hu, J., and Renne, R. (2001). DNA binding and modulation of gene expression by the latency-associated nuclear antigen of Kaposi's sarcoma-associated herpesvirus. J Virol 75, 7882-7892.
Garzon, R., Volinia, S., Liu, C.G., Fernandez-Cymering, C., Palumbo, T., Pichiorri, F., Fabbri, M., Coombes, K., Alder, H., Nakamura, T., et al. (2008). MicroRNA signatures associated with cytogenetics and prognosis in acute myeloid leukemia. Blood 111, 3183-3189.
Georgantas, R.W., 3rd, Hildreth, R., Morisot, S., Alder, J., Liu, C.G., Heimfeld, S., Calin, G.A., Croce, C.M., and Civin, C.I. (2007). CD34+ hematopoietic stem-progenitor cell microRNA expression and function: a circuit diagram of differentiation control. Proc Natl Acad Sci U S A 104, 2750-2755.
Gerstein, M.B., Kundaje, A., Hariharan, M., Landt, S.G., Yan, K.K., Cheng, C., Mu, X.J., Khurana, E., Rozowsky, J., Alexander, R., et al. (2012). Architecture of the human regulatory network derived from ENCODE data. Nature 489, 91-100.
Gheldof, N., Smith, E.M., Tabuchi, T.M., Koch, C.M., Dunham, I., Stamatoyannopoulos, J.A., and Dekker, J. (2010). Cell-type-specific long-range looping interactions identify distant regulatory elements of the CFTR gene. Nucleic Acids Res 38, 4325-4336.
Gottwein, E., Corcoran, D.L., Mukherjee, N., Skalsky, R.L., Hafner, M., Nusbaum, J.D., Shamulailatpam, P., Love, C.L., Dave, S.S., Tuschl, T., et al. (2011). Viral microRNA targetome of KSHV-infected primary effusion lymphoma cell lines. Cell Host Microbe 10, 515-526.
Gottwein, E., Mukherjee, N., Sachse, C., Frenzel, C., Majoros, W.H., Chi, J.T., Braich, R., Manoharan, M., Soutschek, J., Ohler, U., et al. (2007). A viral microRNA functions as an orthologue of cellular miR-155. Nature 450, 1096-1099.
Graur, D., Zheng, Y., Price, N., Azevedo, R.B., Zufall, R.A., and Elhaik, E. (2013). On the immortality of television sets: "function" in the human genome according to the evolution-free gospel of ENCODE. Genome Biol Evol 5, 578-590.

Graze, R.M., McIntyre, L.M., Main, B.J., Wayne, M.L., and Nuzhdin, S.V. (2009). Regulatory Divergence in Drosophila melanogaster and D. simulans, a Genome-wide Analysis of Allele-specific Expression. Genetics, genetics.109.105957.

Graze, R.M., Novelo, L.L., Amin, V., Fear, J.M., Casella, G., Nuzhdin, S.V., and McIntyre, L.M. (2012). Allelic imbalance in Drosophila hybrid heads: exons, isoforms, and evolution. Mol Biol Evol 29, 1521-1532.

Grieneisen, V.A., Xu, J., Maree, A.F.M., Hogeweg, P., and Scheres, B. (2007). Auxin transport is sufficient to generate a maximum and gradient guiding root growth. Nature 449, 1008-1013.

Griffiths-Jones, S. (2004). The microRNA Registry. Nucleic Acids Res 32, D109-111.

Grimson, A., Farh, K.K., Johnston, W.K., Garrett-Engele, P., Lim, L.P., and Bartel, D.P. (2007). MicroRNA targeting specificity in mammals: determinants beyond seed pairing. Mol Cell 27, 91-105.

Gross, D.S., and Garrard, W.T. (1988). Nuclease hypersensitive sites in chromatin. Annu Rev Biochem 57, 159-197.

Grundhoff, A., Sullivan, C.S., and Ganem, D. (2006). A combined computational and microarray-based approach identifies novel microRNAs encoded by human gamma-herpesviruses. RNA 12, 733-750.

Gunsalus, K.C., Ge, H., Schetter, A.J., Goldberg, D.S., Han, J.D., Hao, T., Berriz, G.F., Bertin, N., Huang, J., Chuang, L.S., et al. (2005). Predictive models of molecular machines involved in Caenorhabditis elegans early embryogenesis. Nature 436, 861-865.

Guo, H., Ingolia, N.T., Weissman, J.S., and Bartel, D.P. (2010). Mammalian microRNAs predominantly act to decrease target mRNA levels. Nature 466, 835-840.

Guo, M., Rupe, M.A., Zinselmeier, C., Habben, J., Bowen, B.A., and Smith, O.S. (2004). Allelic Variation of Gene Expression in Maize Hybrids. Plant Cell 16, 1707-1716.

Guo, M., Yang, S., Rupe, M., Hu, B., Bickel, D., Arthur, L., and Smith, O. (2008). Genome-wide allele-specific expression analysis using Massively Parallel Signature Sequencing (MPSS) Reveals cis and trans effects on gene expression in maize hybrid meristem tissue. Plant Molecular Biology 66, 551-563.

Ha, M., Ng, D.W.K., Li, W.-H., and Chen, Z.J. (2011). Coordinated histone modifications are associated with gene expression variation within and between species. Genome Research 21, 590-598.

Hadjur, S., Williams, L.M., Ryan, N.K., Cobb, B.S., Sexton, T., Fraser, P., Fisher, A.G., and Merkenschlager, M. (2009). Cohesins form chromosomal cis-interactions at the developmentally regulated IFNG locus. Nature 460, 410-413.

Haecker, I., Gay, L.A., Yang, Y., Hu, J., Morse, A.M., McIntyre, L.M., and Renne, R. (2012). Ago HITS-CLIP Expands Understanding of Kaposi's Sarcoma-associated Herpesvirus miRNA Function in Primary Effusion Lymphomas. PLoS Pathog 8, e1002884.

Hafner, M., Landthaler, M., Burger, L., Khorshid, M., Hausser, J., Berninger, P., Rothballer, A., Ascano, M., Jr., Jungkamp, A.C., Munschauer, M., et al. (2010). Transcriptome-wide identification of RNA-binding protein and microRNA target sites by PAR-CLIP. Cell 141, 129-141.

Hagen, J.W., and Lai, E.C. (2008). microRNA control of cell-cell signaling during development and disease. Cell Cycle 7, 2327-2332.

Han, J.D., Bertin, N., Hao, T., Goldberg, D.S., Berriz, G.F., Zhang, L.V., Dupuy, D., Walhout, A.J., Cusick, M.E., Roth, F.P., et al. (2004). Evidence for dynamically organized modularity in the yeast protein-protein interaction network. Nature 430, 88-93.

Handoko, L., Xu, H., Li, G., Ngan, C.Y., Chew, E., Schnapp, M., Lee, C.W.H., Ye, C., Ping, J.L.H., Mulawadi, F., et al. (2011). CTCF-mediated functional chromatin interactome in pluripotent cells. Nat Genet 43, 630-638.

Hartemink, A.J., Gifford, D.K., Jaakkola, T.S., and Young, R.A. (2001). Using graphical models and genomic expression data to statistically validate models of genetic regulatory networks. Pac Symp Biocomput, 422-433.

Haskill, S., Beg, A.A., Tompkins, S.M., Morris, J.S., Yurochko, A.D., Sampson-Johannes, A., Mondal, K., Ralph, P., and Baldwin Jr, A.S. (1991). Characterization of an immediate-early gene induced in adherent monocytes that like activity. Cell 65, 1281-1289.

Hawley, D.K., and McClure, W.R. (1983). Compilation and analysis of Escherichia coli promoter DNA sequences. Nucleic Acids Res 11, 2237-2255.

Heintzman, N.D., Hon, G.C., Hawkins, R.D., Kheradpour, P., Stark, A., Harp, L.F., Ye, Z., Lee, L.K., Stuart, R.K., Ching, C.W., et al. (2009). Histone modifications at human enhancers reflect global cell-type-specific gene expression. Nature 459, 108-112.

Helwak, A., Kudla, G., Dudnakova, T., and Tollervey, D. (2013). Mapping the Human miRNA Interactome by CLASH Reveals Frequent Noncanonical Binding. Cell 153, 654-665.

Ho, N. (2001). More light shed on the Mauna Kea controversy. Nature 411, 737-738.

Hong, Y.K., Foreman, K., Shin, J.W., Hirakawa, S., Curry, C.L., Sage, D.R., Libermann, T., Dezube, B.J., Fingeroth, J.D., and Detmar, M. (2004). Lymphatic reprogramming of blood vascular endothelium by Kaposi sarcoma-associated herpesvirus. Nat Genet 36, 683-685.

Hou, C., Dale, R., and Dean, A. (2010). Cell type specificity of chromatin organization mediated by CTCF and cohesin. Proc Natl Acad Sci U S A 107, 3651-3656.

Hu, J., Garber, A.C., and Renne, R. (2002). The latency-associated nuclear antigen of Kaposi's sarcoma-associated herpesvirus supports latent DNA replication in dividing cells. J Virol 76, 11677-11687.

Hu, J., and Renne, R. (2005). Characterization of the minimal replicator of Kaposi's sarcoma-associated herpesvirus latent origin. J Virol 79, 2637-2642.

Hu, J., Yang, Y., McIntyre, L., Morse, A., and Renne, R. (In prep). LANA Selectively Associates with H3K4 Methyltransferase hSET1 and Contributes to the KSHV Epigenome.

Hutchins, A.P., Diez, D., Takahashi, Y., Ahmad, S., Jauch, R., Tremblay, M.L., and Miranda-Saavedra, D. (2013). Distinct transcriptional regulatory modules underlie STAT3's cell type-independent and cell type-specific functions. Nucl. Acids Res. 41, 2155-2170.

Ideker, T., Thorsson, V., Ranish, J.A., Christmas, R., Buhler, J., Eng, J.K., Bumgarner, R., Goodlett, D.R., Aebersold, R., and Hood, L. (2001). Integrated genomic and proteomic analyses of a systematically perturbed metabolic network. Science, 929-934.

Ihmels, J., Friedlander, G., Bergmann, S., Sarig, O., Ziv, Y., and Barkai, N. (2002). Revealing modular organization in the yeast transcriptional network. Nat. Genet., 370-377.

Inui, M., Martello, G., and Piccolo, S. (2010). MicroRNA control of signal transduction. Nat Rev Mol Cell Biol 11, 252-263.

Iorio, M.V., Piovan, C., and Croce, C.M. (2010). Interplay between microRNAs and the epigenetic machinery: An intricate network. Biochimica et Biophysica Acta (BBA) - Gene Regulatory Mechanisms 1799, 694-701.

Ishida, S., Huang, E., Zuzan, H., Spang, R., Leone, G., West, M., and Nevins, J.R. (2001). Role for E2F in control of both DNA replication and mitotic functions as revealed from DNA microarray analysis. Mol Cell Biol 21, 4684-4699.

Ito, T., Chiba, T., Ozawa, R., Yoshida, M., Hattori, M., and Sakaki, Y. (2001). A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc Natl Acad Sci U S A 98, 4569-4574.

Iyer, V.R., Horak, C.E., Scafe, C.S., Botstein, D., Snyder, M., and Brown, P.O. (2001). Genomic binding sites of the yeast cell-cycle transcription factors SBF and MBF. Nature 409, 533-538.

Jansen, R.C., Tesson, B.M., Fu, J., Yang, Y., and McIntyre, L.M. (2009). Defining gene and QTL networks. Curr Opin Plant Biol 12, 241-246.

Jeong, H., Mason, S.P., Barabasi, A.L., and Oltvai, Z.N. (2001). Lethality and centrality in protein networks, Vol 411.

Jeyapalan, Z., Deng, Z., Shatseva, T., Fang, L., He, C., and Yang, B.B. (2011). Expression of CD44 3'-untranslated region regulates endogenous microRNA functions in tumorigenesis and angiogenesis. Nucleic Acids Res 39, 3026-3041.

Ji, H., Jiang, H., Ma, W., Johnson, D., and Myers, R. (2008). An integrated software system for analyzing ChIP-chip and ChIP-seq data. Nat Biotechnol 26, 1293-1300.

Ji, Z., and Tian, B. (2009). Reprogramming of 3' untranslated regions of mRNAs by alternative polyadenylation in generation of pluripotent stem cells from different cell types. PLoS One 4, e8419.

Jiang, C., Xuan, Z., Zhao, F., and Zhang, M.Q. (2007). TRED: a transcriptional regulatory element database, new entries and other development. Nucl. Acids Res. 35, D137-D140.

Jiang, S., Zhang, H.W., Lu, M.H., He, X.H., Li, Y., Gu, H., Liu, M.F., and Wang, E.D. (2010). MicroRNA-155 functions as an OncomiR in breast cancer by targeting the suppressor of cytokine signaling 1 gene. Cancer Res 70, 3119-3127.

Jin, W., Riley, R.M., Wolfinger, R.D., White, K.P., Passador-Gurgel, G., and Gibson, G. (2001). The contributions of sex, genotype and age to transcriptional variance in Drosophila melanogaster. Nat Genet 29, 389-395.

Johnson, A.D., Wang, D., and Sadee, W. (2005). Polymorphisms affecting gene regulation and mRNA processing: Broad implications for pharmacogenetics. Pharmacology & Therapeutics 106, 19-38.

Johnson, D.S., Mortazavi, A., Myers, R.M., and Wold, B. (2007). Genome-wide mapping of in vivo protein-DNA interactions. Science 316, 1497-1502.

Johnson, J.M., Castle, J., Garrett-Engele, P., Kan, Z., Loerch, P.M., Armour, C.D., Santos, R., Schadt, E.E., Stoughton, R., and Shoemaker, D.D. (2003). Genome-Wide Survey of Human Alternative Pre-mRNA Splicing with Exon Junction Microarrays. Science 302, 2141-2144.

Johnson, R.A., and Wichern, D.W. (1992). Applied Multivariate Statistical Analysis, 3rd edn (Prentice Hall).

Joo, C.H., Shin, Y.C., Gack, M., Wu, L., Levy, D., and Jung, J.U. (2007). Inhibition of interferon regulatory factor 7 (IRF7)-mediated interferon signal transduction by the Kaposi's sarcoma-associated herpesvirus viral IRF homolog vIRF3. J Virol 81, 8282-8292.

Jothi, R., Cuddapah, S., Barski, A., Cui, K., and Zhao, K. (2008). Genome-wide identification of in vivo protein-DNA binding sites from ChIP-Seq data. Nucl. Acids Res. 36, 5221-5231.

Kabnick, K.S., and Housman, D.E. (1988). Determinants that contribute to cytoplasmic stability of human c-fos and beta-globin mRNAs are located at several sites in each mRNA. Mol Cell Biol 8, 3244-3250.

Kadonaga, J.T. (2004). Regulation of RNA polymerase II transcription by sequence-specific DNA binding factors. Cell 116, 247-257.

Kalsotra, A., Wang, K., Li, P.F., and Cooper, T.A. (2010). MicroRNAs coordinate an alternative splicing network during mouse postnatal heart development. Genes Dev 24, 653-658.

Karin, M. (2006). Nuclear factor-κB in cancer development and progression. Nature 441, 431-436.

Karp, P.D., et al. (2000). The EcoCyc and MetaCyc databases. Nucleic Acids Res 28, 56-59.

Kasowski, M., Grubert, F., Heffelfinger, C., Hariharan, M., Asabere, A., Waszak, S.M., Habegger, L., Rozowsky, J., Shi, M., Urban, A.E., et al. (2010). Variation in Transcription Factor Binding Among Humans. Science 328, 232-235.

Kasperczyk, H., Baumann, B., Debatin, K.M., and Fulda, S. (2009). Characterization of sonic hedgehog as a novel NF-κB target gene that promotes NF-κB-mediated apoptosis resistance and tumor growth in vivo. FASEB J 23, 21-33.

Kassouf, M.T., Hughes, J.R., Taylor, S., McGowan, S.J., Soneji, S., Green, A.L., Vyas, P., and Porcher, C. (2010). Genome-wide identification of TAL1's functional targets: insights into its mechanisms of action in primary erythroid cells. Genome Res 20, 1064-1083.

Kennell, J.A., Gerin, I., MacDougald, O.A., and Cadigan, K.M. (2008). The microRNA miR-8 is a conserved negative regulator of Wnt signaling. Proc Natl Acad Sci U S A 105, 15417-15422.

Kerr, K.M. (2003). Design Considerations for Efficient and Effective Microarray Studies. Biometrics 59, 822-828.

Kerrien, S., Alam-Faruque, Y., Aranda, B., Bancarz, I., Bridge, A., Derow, C., Dimmer, E., Feuermann, M., Friedrichsen, A., Huntley, R., et al. (2007). IntAct: open source resource for molecular interaction data, Vol 35.

Kertesz, M., Iovino, N., Unnerstall, U., Gaul, U., and Segal, E. (2007). The role of site accessibility in microRNA target recognition. Nat Genet 39, 1278-1284.

Kharchenko, P.V., Tolstorukov, M.Y., and Park, P.J. (2008). Design and analysis of ChIP-seq experiments for DNA-binding proteins. Nat Biotechnol 26, 1351-1359.

Kidder, B.L., Hu, G., and Zhao, K. (2011). ChIP-Seq: technical considerations for obtaining high-quality data. Nat Immunol 12, 918-922.

Kim, Y.J., Cecchini, K.R., and Kim, T.H. (2011). Conserved, developmentally regulated mechanism couples chromosomal looping and heterochromatin barrier activity at the homeobox gene A locus. Proc Natl Acad Sci U S A 108, 7391-7396.

Kiriakidou, M., Nelson, P.T., Kouranov, A., Fitziev, P., Bouyioukos, C., Mourelatos, Z., and Hatzigeorgiou, A. (2004). A combined computational-experimental approach predicts human microRNA targets. Genes Dev 18, 1165-1178.

Kiuchi, N., Nakajima, K., Ichiba, M., Fukada, T., Narimatsu, M., Mizuno, K., Hibi, M., and Hirano, T. (1999). STAT3 is required for the gp130-mediated full activation of the c-myc gene. J Exp Med 189, 63-73.

Kloosterman, W.P., and Plasterk, R.H.A. (2006). The Diverse Functions of MicroRNAs in Animal Development and Disease. Developmental Cell 11, 441-450.

Kluiver, J., Poppema, S., de Jong, D., Blokzijl, T., Harms, G., Jacobs, S., Kroesen, B.J., and van den Berg, A. (2005). BIC and miR-155 are highly expressed in Hodgkin, primary mediastinal and diffuse large B-cell lymphomas. J Pathol 207, 243-249.

Kolasinska-Zwierz, P., Down, T., Latorre, I., Liu, T., Liu, X.S., and Ahringer, J. (2009). Differential chromatin marking of introns and expressed exons by H3K36me3. Nat Genet 41, 376-381.

Kopp, A., Barmina, O., Hamilton, A.M., Higgins, L., McIntyre, L.M., and Jones, C.D. (2008). Evolution of gene expression in the Drosophila olfactory system. Mol Biol Evol 25, 1081-1092.

Krek, A., Grun, D., Poy, M.N., Wolf, R., Rosenberg, L., Epstein, E.J., MacMenamin, P., da Piedade, I., Gunsalus, K.C., Stoffel, M., et al. (2005). Combinatorial microRNA target predictions. Nat Genet 37, 495-500.

Krithivas, A., Young, D.B., Liao, G., Greene, D., and Hayward, S.D. (2000). Human herpesvirus 8 LANA interacts with proteins of the mSin3 corepressor complex and negatively regulates Epstein-Barr virus gene expression in dually infected PEL cells. J Virol 74, 9637-9645.

Krogan, N.J., Cagney, G., Yu, H., Zhong, G., Guo, X., Ignatchenko, A., Li, J., Pu, S., Datta, N., Tikuisis, A.P., et al. (2006). Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature 440, 637-643.

Krüger, J., and Rehmsmeier, M. (2006). RNAhybrid: microRNA target prediction easy, fast and flexible. Nucleic Acids Res 34, W451-454.

Krützfeldt, J., Rajewsky, N., Braich, R., Rajeev, K.G., Tuschl, T., Manoharan, M., and Stoffel, M. (2005). Silencing of microRNAs in vivo with 'antagomirs'. Nature 438, 685-689.

Kuan, P.F., Chung, Statistical Framework for the Analysis of ChIP-Seq Data. Journal of the American Statistical Association 106, 891-903.

Kuhn, D.E., Martin, M.M., Feldman, D.S., Terry, A.V., Nuovo, G.J., and Elton, T.S. (2008). Experimental validation of miRNA targets. Methods 44, 47-54.

Kurdistani, S.K., Tavazoie, S., and Grunstein, M. (2004). Mapping Global Histone Acetylation Patterns to Gene Expression. Cell 117, 721-733.

Kwan, T., Benovoy, D., Dias, C., Gurd, S., Provencher, C., Beaulieu, P., Hudson, T.J., Sladek, R., and Majewski, J. (2008). Genome-wide analysis of transcript isoform variation in humans. Nat Genet 40, 225-231.

Lagos, D., Trotter, M.W., Vart, R.J., Wang, H.W., Matthews, N.C., Hansen, A., Flore, O., Gotch, F., and Boshoff, C. (2007). Kaposi sarcoma herpesvirus-encoded vFLIP and vIRF1 regulate antigen presentation in lymphatic endothelial cells. Blood 109, 1550-1558.

Lagos-Quintana, M., Rauhut, R., Yalcin, A., Meyer, J., Lendeckel, W., and Tuschl, T. (2002). Identification of tissue-specific microRNAs from mouse. Curr Biol 12, 735-739.

Lahav, G., Rosenfeld, N., Sigal, A., Geva-Zatorsky, N., Levine, A.J., Elowitz, M.B., and Alon, U. (2004). Dynamics of the p53-Mdm2 feedback loop in individual cells. Nat Genet 36, 147-150.

Lan, K., Kuppers, D.A., Verma, S.C., and Robertson, E.S. (2004). Kaposi's sarcoma-associated herpesvirus-encoded latency-associated nuclear antigen inhibits lytic replication by targeting Rta: a potential mechanism for virus-mediated control of latency. J Virol 78, 6585-6594.

Landt, S.G., Marinov, G.K., Kundaje, A., Kheradpour, P., Pauli, F., Batzoglou, S., Bernstein, B.E., Bickel, P., Brown, J.B., Cayting, P., et al. (2012). ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia. Genome Res 22, 1813-1831.

Langmead, B., Trapnell, C., Pop, M., and Salzberg, S. (2009). Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology 10, R25.

Lau, N.C., Lim, L.P., Weinstein, E.G., and Bartel, D.P. (2001). An abundant class of tiny RNAs with probable regulatory roles in Caenorhabditis elegans. Science 294, 858-862.

Lawrence, C.E., and Reilly, A.A. (1990). An expectation maximization (EM) algorithm for the identification and characterization of common sites in unaligned biopolymer sequences. Proteins 7, 41-51.

Lee, B.K., Bhinge, A.A., Battenhouse, A., McDaniell, R.M., Liu, Z., Song, L., Ni, Y., Birney, E., Lieb, J.D., Furey, T.S., et al. (2011). Cell-type-specific and combinatorial usage of diverse transcription factors revealed by genome-wide binding studies in multiple human cells. Genome Res 22, 9-24.

Lee, R.C., Feinbaum, R.L., and Ambros, V. (1993). The C. elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14. Cell 75, 843-854.

Lee, T.I., and Young, R.A. (2000). Transcription of eukaryotic protein-coding genes. Annu Rev Genet 34, 77-137.

Lee, Y., Ahn, C., Han, J., Choi, H., Kim, J., Yim, J., Lee, J., Provost, P., Radmark, O., Kim, S., et al. (2003). The nuclear RNase III Drosha initiates microRNA processing. Nature 425, 415-419.

Lei, X., Bai, Z., Ye, F., Xie, J., Kim, C.G., Huang, Y., and Gao, S.J. (2010). Regulation of NF-kappaB inhibitor IkappaBalpha and viral replication by a KSHV microRNA. Nat Cell Biol 12, 193-199.

Leucht, C., Stigloher, C., Wizenmann, A., Klafke, R., Folchert, A., and Bally-Cuif, L. (2008). MicroRNA-9 directs late organizer activity of the midbrain-hindbrain boundary. Nat Neurosci 11, 641-648.

Leung, T.H., Hoffmann, A., and Baltimore, D. (2004). One nucleotide in a kappaB site can determine cofactor specificity for NF-kappaB dimers. Cell 118, 453-464.

Lewis, B.P., Burge, C.B., and Bartel, D.P. (2005). Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell 120, 15-20.

Lewis, B.P., Shih, I.H., Jones-Rhoades, M.W., Bartel, D.P., and Burge, C.B. (2003). Prediction of mammalian microRNA targets. Cell 115, 787-798.

Li, B., Carey, M., and Workman, J.L. (2007). The role of chromatin during transcription. Cell 128, 707-719.

Li, F., and Yang, Y. (2004). Recovering genetic regulatory networks from microarray data and location analysis data. Genome Inform, 131-140.

Li, G., Ruan, X., Auerbach, R.K., Sandhu, K.S., Zheng, M., Wang, P., Poh, H.M., Goh, Y., Lim, J., Zhang, J., et al. (2012). Extensive Promoter-Centered Chromatin Interactions Provide a Topological Basis for Transcription Regulation. Cell 148, 84-98.

Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G., Durbin, R., and Genome Project Data Processing, S. (2009). The Sequence Alignment/Map (SAM) Format and SAMtools. Bioinformatics 25, 2078-2079.

Li, H., Ruan, J., and Durbin, R. (2008a). Mapping short DNA sequencing reads and calling variants using mapping quality scores, Vol 18.

Li, Q., Brown, J.B., Huang, H., and Bickel, P.J. (2011). Measuring reproducibility of high-throughput experiments. Ann. Appl. Stat. 5, 1752-1779.

Li, Q., and Verma, I.M. (2002). NF-kappaB regulation in the immune system. Nat Rev Immunol 2, 725-734.

Li, X.Y., MacArthur, S., Bourgon, R., Nix, D., Pollard, D.A., Iyer, V.N., Hechmer, A., Simirenko, L., Stapleton, M., Luengo Hendriks, C.L., et al. (2008b). Transcription factors bind thousands of active and inactive regions in the Drosophila blastoderm. PLoS Biol 6, e27.

Liang, D., Gao, Y., Lin, X., He, Z., Zhao, Q., Deng, Q., and Lan, K. (2011). A human herpesvirus miRNA attenuates interferon signaling and contributes to maintenance of viral latency by targeting IKKε. Cell Res 21, 793-806.

ChIP-seq. Bioinformatics 28, 121-122.

Lim, C., Sohn, H., Gwack, Y., and Choe, J. (2000). Latency-associated nuclear antigen of Kaposi's sarcoma-associated herpesvirus (human herpesvirus 8) binds ATF4/CREB2 and inhibits its transcriptional activation activity. J Gen Virol 81, 2645-2652.

Lim, L.P., Lau, N.C., Garrett-Engele, P., Grimson, A., Schelter, J.M., Castle, J., Bartel, D.P., Linsley, P.S., and Johnson, J.M. (2005). Microarray analysis shows that some microRNAs downregulate large numbers of target mRNAs. Nature 433, 769-773.

Lin, B., White, J.T., Lu, W., Xie, T., Utleg, A.G., Yan, X., Yi, E.C., Shannon, P., Khrebtukova, I., Lange, P.H., et al. (2005). Evidence for the Presence of Disease-Perturbed Networks in Prostate Cancer Cells by Genomic and Proteomic Analyses: A Systems Approach to Disease, Vol 65.

Lin, R., Genin, P., Mamane, Y., Sgarbanti, M., Battistini, A., Harrington, W.J., Barber, G.N., and Hiscott, J. (2001). HHV-8 encoded vIRF-1 represses the interferon antiviral response by blocking IRF-3 recruitment of the CBP/p300 coactivators. Oncogene 20, 800-811.

Liu, J., Valencia-Sanchez, M.A., Hannon, G.J., and Parker, R. (2005). MicroRNA-dependent localization of targeted mRNAs to mammalian P-bodies. Nat Cell Biol 7, 719-723.

Liu, M., Liberzon, A., Kong, S.W., Lai, W.R., Park, P.J., Kohane, I.S., and Kasif, S. (2007). Network-based analysis of affected biological processes in type 2 diabetes models. PLoS Genet 3, e96.

Liu, W., Tanasa, B., Tyurina, O.V., Zhou, T.Y., Gassmann, R., Liu, W.T., Ohgi, K.A., Benner, C., Garcia-Bassets, I., Aggarwal, A.K., et al. (2010). PHF8 mediates histone H4 lysine 20 demethylation events involved in cell cycle progression. Nature 466, 508-512.

Lo, H.S., Wang, Z., Hu, Y., Yang, H.H., Gere, S., Buetow, K.H., and Lee, M.P. (2003). Allelic Variation in Gene Expression Is Common in the Human Genome. Genome Research 13, 1855-1862.

Louafi, F., Martinez-Nunez, R.T., and Sanchez-Elsner, T. (2010). MicroRNA-155 targets SMAD2 and modulates the response of macrophages to transforming growth factor-β. J Biol Chem 285, 41328-41336.

Lu, C., Tej, S.S., Luo, S., Haudenschild, C.D., Meyers, B.C., and Green, P.J. (2005). Elucidation of the small RNA component of the transcriptome. Science 309, 1567-1569.

Lu, F., Stedman, W., Yousef, M., Renne, R., and Lieberman, P.M. (2010a). Epigenetic regulation of Kaposi's sarcoma-associated herpesvirus latency by virus-encoded microRNAs that target Rta and the cellular Rbl2-DNMT pathway. J Virol 84, 2697-2706.

Lu, F., Tsai, K., Chen, H.S., Wikramasinghe, P., Davuluri, R.V., Showe, L., Domsic, J., Marmorstein, R., and Lieberman, P.M. (2012). Identification of Host Chromosome Binding Sites and Candidate Gene Targets for Kaposi's Sarcoma-Associated Herpesvirus LANA. Journal of Virology 86, 5752-5762.

Lu, F., Weidmer, A., Liu, C.G., Volinia, S., Croce, C.M., and Lieberman, P.M. (2008). Epstein-Barr virus-induced miR-155 attenuates NF-kappaB signaling and stabilizes latent virus persistence. J Virol 82, 10436-10443.

Lu, Q., Yan, J., and Adler, P.N. (2010b). The Drosophila planar polarity proteins inturned and multiple wing hairs interact physically and function together. Genetics 185, 549-558.

Lynch, H.T., Weisenburger, D.D., Quinn-Laquer, B., Watson, P., Lynch, J.F., and Sanger, W.G. (2002). Hereditary chronic lymphocytic leukemia: an extended family study and literature review. Am J Med Genet 115, 113-117.

Ma, J. (2005). Crossing the line between activation and repression. Trends Genet 21, 54-59.

MacQuarrie, K.L., Fong, A.P., Morse, R.H., and Tapscott, S.J. (2011). Genome-wide transcription factor binding: beyond direct target regulation. Trends in Genetics 27, 141-148.

Majumder, P., and Boss, J.M. (2011). DNA methylation dysregulates and silences the HLA-DQ locus by altering chromatin architecture. Genes Immun 12, 291-299.

Makeyev, E.V., Zhang, J., Carrasco, M.A., and Maniatis, T. (2007). The MicroRNA miR-124 promotes neuronal differentiation by triggering brain-specific alternative pre-mRNA splicing. Mol Cell 27, 435-448.

Malone, J.H., and Oliver, B. (2011). Microarrays, deep sequencing and the true measure of the transcriptome. BMC Biol 9, 34.

Margolin, A., et al. (2006). ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics (Suppl 1), S7.

Marinescu, V.D., Kohane, I.S., and Riva, A. (2005). The MAPPER database: a multi-genome catalog of putative transcription factor binding sites. Nucleic Acids Res 33, D91-97.

Mayr, C., Hemann, M.T., and Bartel, D.P. (2007). Disrupting the pairing between let-7 and Hmga2 enhances oncogenic transformation. Science 315, 1576-1579.

Mazière, P., and Enright, A.J. (2007). Prediction of microRNA targets. Drug Discov Today 12, 452-458.

McDaniell, R., Lee, B.K., Song, L., Liu, Z., Boyle, A.P., Erdos, M.R., Scott, L.J., Morken, M.A., Kucera, K.S., Battenhouse, A., et al. (2010). Heritable individual-specific and allele-specific chromatin signatures in humans. Science 328, 235-239.

McIntyre, L., Bono, L., Genissel, A., Westerman, R., Junk, D., Telonis-Scott, M., Harshman, L., Wayne, M., Kopp, A., and Nuzhdin, S. (2006). Sex-specific expression of alternative transcripts in Drosophila. Genome Biology 7, R79.

McIntyre, L.M., Lopiano, K., AM, M., V, A., AL, O., LJ, Y., and SV, N. (2011). RNA-seq: technical variability and sampling. BMC Genomics 12.

McManus, C.J., Coolon, J.D., Duff, M.O., Eipper-Mains, J., Graveley, B.R., and Wittkopp, P.J. (2010). Regulatory divergence in Drosophila revealed by mRNA-seq. Genome Research 20, 816-825.

Meijsing, S.H., Pufall, M.A., So, A.Y., Bates, D.L., Chen, L., and Yamamoto, K.R. (2009). DNA binding site sequence directs glucocorticoid receptor structure and activity. Science 324, 407-410.

Merkerova, M., Belickova, M., and Bruchova, H. (2008). Differential expression of microRNAs in hematopoietic cell lineages. Eur J Haematol 81, 304-310.

Meyer, K.B., Maia, A.T., O'Reilly, M., Teschendorff, A.E., Chin, S.F., Caldas, C., and Ponder, B.A.J. (2008). Allele-Specific Up-Regulation of FGFR2 Increases Susceptibility to Breast Cancer. PLoS Biol 6, e108.

Mikkelsen, T.S., Ku, M., Jaffe, D.B., Issac, B., Lieberman, E., Giannoukos, G., Alvarez, P., Brockman, W., Kim, T.K., Koche, R.P., et al. (2007). Genome-wide maps of chromatin state in pluripotent and lineage-committed cells. Nature 448, 553-560.

Mineno, J., Okamoto, S., Ando, T., Sato, M., Chono, H., Izu, H., Takayama, M., Asada, K., Mirochnitchenko, O., Inouye, M., et al. (2006). The expression profile of microRNAs in mouse embryos. Nucleic Acids Res 34, 1765-1771.

Miranda, K.C., Huynh, T., Tay, Y., Ang, Y.S., Tam, W.L., Thomson, A.M., Lim, B., and Rigoutsos, I. (2006). A pattern-based method for the identification of MicroRNA binding sites and their corresponding heteroduplexes. Cell 126, 1203-1217.

Miska, E.A., Alvarez-Saavedra, E., Abbott, A.L., Lau, N.C., Hellman, A.B., McGonagle, S.M., Bartel, D.P., Ambros, V.R., and Horvitz, H.R. (2007). Most Caenorhabditis elegans microRNAs are individually not essential for development or viability. PLoS Genet 3, e215.

Moffett, H., and Novina, C. (2007). A small RNA makes a Bic difference, Vol 8.

Moore, P.S., and Chang, Y. (2003). Kaposi's sarcoma-associated herpesvirus immunoevasion and tumorigenesis: two sides of the same coin? Annu Rev Microbiol 57, 609-639.

Mootha, V.K., Lindgren, C.M., Eriksson, K.F., Subramanian, A., Sihag, S., Lehar, J., Puigserver, P., Carlsson, E., Ridderstrale, M., Laurila, E., et al. (2003). PGC-1α-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat Genet 34, 267-273.

Mortazavi, A., Williams, B.A., McCue, K., Schaeffer, L., and Wold, B. (2008). Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5, 621-628.

Mukherji, S., Ebert, M.S., Zheng, G.X., Tsang, J.S., Sharp, P.A., and van Oudenaarden, A. (2011). MicroRNAs can generate thresholds in target gene expression. Nat Genet 43, 854-859.

Muller, H., Bracken, A.P., Vernell, R., Moroni, M.C., Christians, F., Grassilli, E., Prosperini, E., Vigo, E., Oliner, J.D., and Helin, K. (2001). E2Fs regulate the expression of genes involved in differentiation, development, proliferation, and apoptosis. Genes Dev 15, 267-285.

Murchison, E.P., Partridge, J.F., Tam, O.H., Cheloufi, S., and Hannon, G.J. (2005). Characterization of Dicer-deficient murine embryonic stem cells. Proc Natl Acad Sci U S A 102, 12135-12140.

Nakano, M., Nobuta, K., Vemaraju, K., Tej, S.S., Skogen, J.W., and Meyers, B.C. (2006). Plant MPSS databases: signature-based transcriptional resources for analyses of mRNA and small RNA. Nucleic Acids Res 34, D731-735.

Neipel, F., Albrecht, J.C., and Fleckenstein, B. (1997). Cell-homologous genes in the Kaposi's sarcoma-associated rhadinovirus human herpesvirus 8: determinants of its pathogenicity? J Virol 71, 4187-4192.

Nelson, P.T., De Planell-Saguer, M., Lamprinaki, S., Kiriakidou, M., Zhang, P., O'Doherty, U., and Mourelatos, Z. (2007). A novel monoclonal antibody against human Argonaute proteins reveals unexpected characteristics of miRNAs in human blood cells. RNA 13, 1787-1792.

Neph, S., Vierstra, J., Stergachis, A.B., Reynolds, A.P., Haugen, E., Vernot, B., Thurman, R.E., John, S., Sandstrom, R., Johnson, A.K., et al. (2012). An expansive human regulatory lexicon encoded in transcription factor footprints. Nature 489, 83-90.

Neter, Wasserman, and Kutner (1990). Applied Linear Statistical Models (McGraw-Hill/Irwin).

Nica, A.C., and Dermitzakis, E.T. (2008). Using gene expression to investigate the genetic basis of complex disorders. Human Molecular Genetics 17, R129-R134.

Nica, A.C., Montgomery, S.B., Dimas, A.S., Stranger, B.E., Beazley, C., Barroso, I., and Dermitzakis, E.T. (2010). Candidate Causal Regulatory Effects by Integration of Expression QTLs with Complex Trait Genetic Associations. PLoS Genet 6, e1000895.

Nohata, N., Hanazawa, T., Enokida, H., and Seki, N. (2012). microRNA-1/133a and microRNA-206/133b clusters: dysregulation and functional roles in human cancers. Oncotarget 3, 9-21.

Nuzhdin, S.V., Brisson, J.A., Pickering, A., Wayne, M.L., Harshman, L.G., and McIntyre, L.M. (2009). Natural genetic variation in transcriptome reflects network structure inferred with major effect mutations: insulin/TOR and associated phenotypes in Drosophila melanogaster. BMC Genomics 10, 124.

Oberg, A.L., and Vitek, O. (2009). Statistical Design of Quantitative Mass Spectrometry-Based Proteomic Experiments. Journal of Proteome Research 8, 2144-2156.

O'Connell, R.M., Chaudhuri, A.A., Rao, D.S., and Baltimore, D. (2009). Inositol phosphatase SHIP1 is a primary target of miR-155. Proc Natl Acad Sci U S A 106, 7113-7118.

O'Connell, R.M., Rao, D.S., Chaudhuri, A.A., Boldin, M.P., Taganov, K.D., Nicoll, J., Paquette, R.L., and Baltimore, D. (2008). Sustained expression of microRNA-155 in hematopoietic stem cells causes a myeloproliferative disorder. J Exp Med 205, 585-594.

O'Connell, R.M., Taganov, K.D., Boldin, M.P., Cheng, G., and Baltimore, D. (2007). MicroRNA-155 is induced during the macrophage inflammatory response. Proc Natl Acad Sci U S A 104, 1604-1609.

O'Hara, A.J., Vahrson, W., and Dittmer, D.P. (2008). Gene alteration and precursor and mature microRNA transcription changes contribute to the miRNA signature of primary effusion lymphoma. Blood 111, 2347-2353.

Ornatsky, O.I., Lou, X., Nitz, M., Schäfer, S., Sheldrick, W.S., Baranov, V.I., Bandura, D.R., and Tanner, S.D. (2008). Study of cell antigens and intracellular DNA by identification of element-containing labels and metallointercalators using inductively coupled plasma mass spectrometry. Anal Chem 80, 2539-2547.

Osborne, C.S., and Eskiw, C.H. (2008). Where shall we meet? A role for genome organisation and nuclear subcompartments in mediating interchromosomal interactions. J Cell Biochem 104, 1553-1561.

Ouyang, Z., Zhou, Q., and Wong, W.H. (2009). ChIP-Seq of transcription factors predicts absolute and differential gene expression in embryonic stem cells. Proc Natl Acad Sci U S A 106, 21521-21526.

Palii, C.G., Perez-Iratxeta, C., Yao, Z., Cao, Y., Dai, F., Davison, J., Atkins, H., Allan, D., Dilworth, F.J., Gentleman, R., et al. (2011). Differential genomic targeting of the transcription factor TAL1 in alternate haematopoietic lineages. EMBO J 30, 494–509.
Pan, Q., Shai, O., Lee, L.J., Frey, B.J., and Blencowe, B.J. (2008). Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing. Nat Genet 40, 1413–1415.
Parelho, V., Hadjur, S., Spivakov, M., Leleu, M., Sauer, S., Gregson, H.C., Jarmuz, A., Canzonetta, C., Webster, Z., Nesterova, T., et al. (2008). Cohesins functionally associate with CTCF on mammalian chromosome arms. Cell 132, 422–433.
Parisi, M., Nuttall, R., Naiman, D., Bouffard, G., Malley, J., Andrews, J., Eastman, S., and Oliver, B. (2003). Paucity of genes on the Drosophila X chromosome showing male-biased expression. Science 299, 697–700.
Park, P.J. (2009). ChIP-seq: advantages and challenges of a maturing technology. Nat Rev Genet 10, 669–680.
Park, S.M., Gaur, A.B., Lengyel, E., and Peter, M.E. (2008). The miR-200 family determines the epithelial phenotype of cancer cells by targeting the E-cadherin repressors ZEB1 and ZEB2. Genes Dev 22, 894–907.
Pastinen, T., Sladek, R., Gurd, S., Sammak, A.A., Ge, B., Lepage, P., Lavergne, K., Villeneuve, A., Gaudin, T., Brandstrom, H., et al. (2004). A survey of genetic and epigenetic variation affecting human gene expression. Physiol Genomics 16, 184–193.
Pe'er, D., Regev, A., and Tanay, A. (2002). Minreg: inferring an active regulator set. Bioinformatics 18 Suppl 1, S258–267.
Pearce, M., Matsumura, S., and Wilson, A.C. (2005). Transcripts encoding K12, v-FLIP, v-cyclin, and the microRNA cluster of Kaposi's sarcoma-associated herpesvirus originate from a common promoter. J Virol 79, 14457–14464.
Pearl, J. (1988). Probabilistic reasoning in intelligent systems: networks of plausible inference (Morgan Kaufmann).
Pepke, S., Wold, B., and Mortazavi, A. (2009). Computation for ChIP-seq and RNA-seq studies. Nature Methods 6, S22–32.
Peri, S., Navarro, J.D., Kristiansen, T.Z., Amanchy, R., Surendranath, V., Muthusamy, B., Gandhi, T.K.B., Chandrika, K.N., Deshpande, N., Suresh, S., et al. (2004). Human protein reference database as a discovery resource for proteomics. Nucleic Acids Res 32, D497–501.
Pfeffer, S., Sewer, A., Lagos-Quintana, M., Sheridan, R., Sander, C., Grasser, F.A., van Dyk, L.F., Ho, C.K., Shuman, S., Chien, M., et al. (2005). Identification of microRNAs of the herpesvirus family. Nat Methods 2, 269–276.
Pickrell, J.K., Marioni, J.C., Pai, A.A., Degner, J.F., Engelhardt, B.E., Nkadori, E., Veyrieras, J.-B., Stephens, M., Gilad, Y., and Pritchard, J.K. (2010). Understanding mechanisms underlying human gene expression variation with RNA sequencing. Nature 464, 768–772.
Polager, S., Kalma, Y., Berkovich, E., and Ginsberg, D. (2002). E2Fs upregulate expression of genes involved in DNA replication, DNA repair and mitosis. Oncogene 21, 437–446.
Poliseno, L., Salmena, L., Zhang, J., Carver, B., Haveman, W.J., and Pandolfi, P.P. (2010). A coding-independent function of gene and pseudogene mRNAs regulates tumour biology. Nature 465, 1033–1038.
Pouyssegur, J., Dayan, F., and Mazure, N.M. (2006). Hypoxia signalling in cancer and approaches to enforce tumour regression. Nature 441, 437–443.
Ptashne, M., and Gann, A. (2001). Transcription initiation: imposing specificity by localization. Essays Biochem 37, 1–15.
Pujana, M.A., Han, J.D., Starita, L.M., Stevens, K.N., Tewari, M., Ahn, J.S., Rennert, G., Moreno, V., Kirchhoff, T., Gold, B., et al. (2007). Network modeling links breast cancer susceptibility and centrosome dysfunction. Nat Genet 39, 1338–1349.
Raab, J.R., and Kamakaka, R.T. (2010). Insulators and promoters: closer than we think. Nat Rev Genet 11, 439–446.
Rabbee, N., and Speed, T.P. (2006). A genotype calling algorithm for affymetrix SNP arrays. Bioinformatics 22, 7–12.
Ramkissoon, S.H., Mainwaring, L.A., Ogasawara, Y., Keyvanfar, K., McCoy, J.P., Jr., Sloand, E.M., Kajigaya, S., and Young, N.S. (2006). Hematopoietic-specific microRNA expression in human cells. Leuk Res 30, 643–647.
Ranz, J.M., Castillo-Davis, C.I., Meiklejohn, C.D., and Hartl, D.L. (2003). Sex-dependent gene expression and evolution of the Drosophila transcriptome. Science 300, 1742–1745.
Ren, B., Cam, H., Takahashi, Y., Volkert, T., Terragni, J., Young, R.A., and Dynlacht, B.D. (2002). E2F integrates cell cycle progression with DNA repair, replication, and G(2)/M checkpoints. Genes Dev 16, 245–256.
Ren, B., Robert, F., Wyrick, J.J., Aparicio, O., Jennings, E.G., Simon, I., Zeitlinger, J., Schreiber, J., Hannett, N., Kanin, E., et al. (2000). Genome-Wide Location and Function of DNA Binding Proteins. Science 290, 2306–2309.
Renne, R., Lagunoff, M., Zhong, W., and Ganem, D. (1996). The size and conformation of Kaposi's sarcoma-associated herpesvirus (human herpesvirus 8) DNA in infected cells and virions. J Virol 70, 8151–8154.
Revilla-i-Domingo, R., Bilic, I., Vilagos, B., Tagoh, H., Ebert, A., Tamir, I.M., Smeenk, L., Trupke, J., Sommer, A., Jaritz, M., et al. (2012). The B-cell identity factor Pax5 regulates distinct transcriptional programmes in early and late B lymphopoiesis. EMBO J 31, 3130–3146.
Ricarte-Filho, J.C., Fuziwara, C.S., Yamashita, A.S., Rezende, E., da Silva, M.J., and Kimura, E.T. (2009). Effects of let-7 microRNA on Cell Growth and Differentiation of Papillary Thyroid Cancer. Transl Oncol 2, 236–241.
Richards, E.J. (2009). Quantitative epigenetics: DNA sequence variation need not apply. Genes & Development 23, 1601–1605.
Rifkin, S.A., Kim, J., and White, K.P. (2003). Evolution of gene expression in the Drosophila melanogaster subgroup. Nat Genet 33, 138–144.
Riggs, A. (1975). Hemoglobin in humans. Science 188, 11.
Rinaldi, A., Vincenti, S., De Vito, F., Bozzoni, I., Oliverio, A., Presutti, C., Fragapane, P., and Mele, A. (2010). Stress induces region-specific alterations in microRNAs expression in mice. Behav Brain Res 208, 265–269.
Ritchie, W., Flamant, S., and Rasko, J.E.J. (2009). Predicting microRNA targets and functions: traps for the unwary. Nat Meth 6, 397–398.
Riva, A. (2012). The MAPPER2 Database: a multi-genome catalog of putative transcription factor binding sites. Nucleic Acids Res 40, D155–161.
Roberts, P.J., and Der, C.J. (2007). Targeting the Raf-MEK-ERK mitogen-activated protein kinase cascade for the treatment of cancer. Oncogene 26, 3291–3310.
Robertson, G., Hirst, M., Bainbridge, M., Bilenky, M., Zhao, Y., Zeng, T., Euskirchen, G., Bernier, B., Varhol, R., Delaney, A., et al. (2007). Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing. Nat Meth 4, 651–657.
Robinson, J.T., Thorvaldsdottir, H., Winckler, W., Guttman, M., Lander, E.S., Getz, G., and Mesirov, J.P. (2011). Integrative genomics viewer. Nat Biotech 29, 24–26.
Rockman, M.V., and Kruglyak, L. (2006). Genetics of global gene expression. Nat Rev Genet 7, 862–872.
Rodriguez, A., Vigorito, E., Clare, S., Warren, M.V., Couttet, P., Soond, D.R., van Dongen, S., Grocock, R.J., Das, P.P., Miska, E.A., et al. (2007). Requirement of bic/microRNA-155 for normal immune function. Science 316, 608–611.
Romania, P., Lulli, V., Pelosi, E., Biffoni, M., Peschle, C., and Marziali, G. (2008). MicroRNA-155 modulates megakaryopoiesis at progenitor and precursor level by targeting Ets-1 and Meis1 transcription factors. Br J Haematol 143, 570–580.
Rose, P.P., Carroll, J.M., Carroll, P.A., DeFilippis, V.R., Lagunoff, M., Moses, A.V., Roberts, C.T., and Früh, K. (2007). The insulin receptor is essential for virus-induced tumorigenesis of Kaposi's sarcoma. Oncogene 26, 1995–2005.
Ross, D.T., Scherf, U., Eisen, M.B., Perou, C.M., Rees, C., Spellman, P., Iyer, V., Jeffrey, S.S., Van de Rijn, M., Waltham, M., et al. (2000). Systematic variation in gene expression patterns in human cancer cell lines. Nat Genet 24, 227–235.
Rozowsky, J., Abyzov, A., Wang, J., Alves, P., Raha, D., Harmanci, A., Leng, J., Bjornson, R., Kong, Y., Kitabayashi, N., et al. (2011). AlleleSeq: analysis of allele-specific expression and binding in a network framework. Mol Syst Biol 7, 522.
Rozowsky, J., Euskirchen, G., Auerbach, R.K., Zhang, Z.D., and Gibson, T. (2009). PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls. Nature Biotechnology 27, 66–75.
Rubio, E.D., Reiss, D.J., Welcsh, P.L., Disteche, C.M., Filippova, G.N., Baliga, N.S., Aebersold, R., Ranish, J.A., and Krumm, A. (2008). CTCF physically links cohesin to chromatin. Proc Natl Acad Sci U S A 105, 8309–8314.
Russo, J.J., Bohenzky, R.A., Chien, M.C., Chen, J., Yan, M., Maddalena, D., Parry, J.P., Peruzzi, D., Edelman, I.S., Chang, Y., et al. (1996). Nucleotide sequence of the Kaposi sarcoma-associated herpesvirus (HHV8). Proc Natl Acad Sci U S A 93, 14862–14867.
Rutherford, S.L., and Henikoff, S. (2003). Quantitative epigenetics. Nat Genet 33, 6–8.
Ruvkun, G., Wightman, B., Burglin, T., and Arasu, P. (1991). Dominant gain-of-function mutations that lead to misregulation of the C. elegans heterochronic gene lin-14, and the evolutionary implications of dominant mutations in pattern-formation genes. Dev Suppl 1, 47–54.
Saccani, S., Pantano, S., and Natoli, G. (2003). Modulation of NF-kappaB activity by exchange of dimers. Mol Cell 11, 1563–1574.
Sachs, K., Gifford, D., Jaakkola, T., Sorger, P., and Lauffenburger, D.A. (2002). Bayesian network approach to cell signaling pathway modeling. Sci STKE 2002, pe38.
Sachs, K., Perez, O., Pe'er, D., Lauffenburger, D.A., and Nolan, G.P. (2005). Causal protein-signaling networks derived from multiparameter single-cell data. Science 308, 523–529.
Saetrom, O., Snove, O., Jr., and Saetrom, P. (2005). Weighted sequence motifs as an improved seeding step in microRNA target prediction algorithms. RNA 11, 995–1003.
Saiz, L., and Vilar, J.M.G. (2006). DNA looping: the consequences and its control. Current Opinion in Structural Biology 16, 344–350.
Samols, M.A., Hu, J., Skalsky, R.L., and Renne, R. (2005). Cloning and identification of a microRNA cluster within the latency-associated region of Kaposi's sarcoma-associated herpesvirus. J Virol 79, 9301–9305.
Samols, M.A., Skalsky, R.L., Maldonado, A.M., Riva, A., Lopez, M.C., Baker, H.V., and Renne, R. (2007). Identification of cellular genes targeted by KSHV-encoded microRNAs. PLoS Pathog 3, e65.
Sandelin, A., Alkema, W., Engström, P., Wasserman, W.W., and Lenhard, B. (2004). JASPAR: an open-access database for eukaryotic transcription factor binding profiles. Nucleic Acids Research 32, D91–D94.
Schaefer, C.F., Anthony, K., Krupa, S., Buchoff, J., Day, M., Hannay, T., and Buetow, K.H. (2009). PID: the Pathway Interaction Database. Nucleic Acids Res 37, D674–679.
Schafer, A., Cai, X., Bilello, J.P., Desrosiers, R.C., and Cullen, B.R. (2007). Cloning and analysis of microRNAs encoded by the primate gammaherpesvirus rhesus monkey rhadinovirus. Virology 364, 21–27.
Schena, M., Shalon, D., Davis, R.W., and Brown, P.O. (1995). Quantitative Monitoring of Gene Expression Patterns with a Complementary DNA Microarray. Science 270, 467–470.
Schmidt, D., Wilson, M.D., Ballester, B., Schwalie, P.C., Brown, G.D., Marshall, A., Kutter, C., Watt, S., Martinez-Jimenez, C.P., Mackay, S., et al. (2010). Five-Vertebrate ChIP-seq Reveals the Evolutionary Dynamics of Transcription Factor Binding. Science 328, 1036–1040.
Schumm, K., Rocha, S., Caamano, J., and Perkins, N.D. (2006). Regulation of p53 tumour suppressor target gene expression by the p52 NF-kappaB subunit. EMBO J 25, 4820–4832.
Schwartz, S., Meshorer, E., and Ast, G. (2009). Chromatin organization marks exon-intron structure. Nat Struct Mol Biol 16, 990–995.
Schwarz, D.S., Hutvagner, G., Du, T., Xu, Z., Aronin, N., and Zamore, P.D. (2003). Asymmetry in the assembly of the RNAi enzyme complex. Cell 115, 199–208.
Selbach, M., Schwanhausser, B., Thierfelder, N., Fang, Z., Khanin, R., and Rajewsky, N. (2008). Widespread changes in protein synthesis induced by microRNAs. Nature 455, 58–63.
Serre, D., Gurd, S., Ge, B., Sladek, R., Sinnett, D., Harmsen, E., Bibikova, M., Chudin, E., Barker, D.L., Dickinson, T., et al. (2008). Differential Allelic Expression in the Human Genome: A Robust Approach To Identify Genetic and Epigenetic Cis-Acting Mechanisms Regulating Gene Expression. PLoS Genet 4, e1000006.
Shannon, P., Markiel, A., Ozier, O., Baliga, N.S., Wang, J.T., Ramage, D., Amin, N., Schwikowski, B., and Ideker, T. (2003). Cytoscape: a software environment for integrated models of biomolecular interaction networks, Vol 13.
Shao, Z., Zhang, Y., Yuan, G.-C., Orkin, S., and Waxman, D. (2012). MAnorm: a robust model for quantitative comparison of ChIP-Seq data sets. Genome Biology 13, R16.
Sharma, S.V., Lee, D.Y., Li, B., Quinlan, M.P., Takahashi, F., Maheswaran, S., McDermott, U., Azizian, N., Zou, L., Fischbach, M.A., et al. (2010). A chromatin-mediated reversible drug-tolerant state in cancer cell subpopulations. Cell 141, 69–80.
Shaw, R.J., and Cantley, L.C. (2006). Ras, PI(3)K and mTOR signalling controls tumour cell growth. Nature 441, 424–430.
Shiraki, T., Kondo, S., Katayama, S., Waki, K., Kasukawa, T., Kawaji, H., Kodzius, R., Watahiki, A., Nakamura, M., Arakawa, T., et al. (2003). Cap analysis gene expression for high-throughput analysis of transcriptional starting point and identification of promoter usage. Proc Natl Acad Sci U S A 100, 15776–15781.
Shuai, K., and Liu, B. (2003). Regulation of JAK-STAT signalling in the immune system. Nat Rev Immunol 3, 900–911.
Sieberts, S.K., and Schadt, E.E. (2007). Moving toward a system genetics view of disease. Mamm Genome 18, 389–401.
Silverman, B.W. (1986). Density Estimation for Statistics and Data Analysis (Chapman and Hall).
Skalsky, R.L., and Cullen, B.R. (2010). Viruses, microRNAs, and host interactions. Annu Rev Microbiol 64, 123–141.
Skalsky, R.L., Samols, M.A., Plaisance, K.B., Boss, I.W., Riva, A., Lopez, M.C., Baker, H.V., and Renne, R. (2007). Kaposi's sarcoma-associated herpesvirus encodes an ortholog of miR-155. J Virol 81, 12836–12845.
So, A.Y., Chaivorapol, C., Bolton, E.C., Li, H., and Yamamoto, K.R. (2007). Determinants of cell- and gene-specific transcriptional regulation by the glucocorticoid receptor. PLoS Genet 3, e94.
Soccio, R.E., Tuteja, G., Everett, L.J., Li, Z., Lazar, M.A., and Kaestner, K.H. (2011). Species-Specific Strategies Underlying Conserved Functions of Metabolic Transcription Factors. Molecular Endocrinology 25, 694–706.
Sood, P., Krek, A., Zavolan, M., Macino, G., and Rajewsky, N. (2006). Cell-type-specific signatures of microRNAs on target mRNA expression. Proc Natl Acad Sci U S A 103, 2746–2751.
Soulier, J., Grollet, L., Oksenhendler, E., Cacoub, P., Cazals-Hatem, D., Babinet, P., d'Agay, M.F., Clauvel, J.P., Raphael, M., Degos, L., et al. (1995). Kaposi's sarcoma-associated herpesvirus-like DNA sequences in multicentric Castleman's disease. Blood 86, 1276–1280.
Spencer, S.L., Gaudet, S., Albeck, J.G., Burke, J.M., and Sorger, P.K. (2009). Non-genetic origins of cell-to-cell variability in TRAIL-induced apoptosis. Nature 459, 428–432.
Stamatoyannopoulos, J.A. (2004). The genomics of gene expression. Genomics 84, 449–457.
Stanelle, J., Stiewe, T., Theseling, C.C., Peter, M., and Pützer, B.M. (2002). Gene expression changes in response to E2F1 activation. Nucleic Acids Res 30, 1859–1867.
Stark, A., Brennecke, J., Bushati, N., Russell, R.B., and Cohen, S.M. (2005). Animal MicroRNAs confer robustness to gene expression and have a significant impact on 3'UTR evolution. Cell 123, 1133–1146.
Stark, A., Brennecke, J., Russell, R.B., and Cohen, S.M. (2003). Identification of Drosophila MicroRNA targets. PLoS Biol 1, E60.
Stark, C., Breitkreutz, B.J., Reguly, T., Boucher, L., Breitkreutz, A., and Tyers, M. (2006). BioGRID: a general repository for interaction datasets, Vol 34.
Stormo, G.D. (1988). Computer methods for analyzing sequence recognition of nucleic acids. Annu Rev Biophys Biophys Chem 17, 241–263.
Stormo, G.D., and Hartzell, G.W. (1989). Identifying protein-binding sites from unaligned DNA fragments. Proc Natl Acad Sci U S A 86, 1183–1187.
Stranger, B.E., Nica, A.C., Forrest, M.S., Dimas, A., Bird, C.P., Beazley, C., Ingle, C.E., Dunning, M., Flicek, P., Koller, D., et al. (2007). Population genomics of human gene expression. Nat Genet 39, 1217–1224.
Subramanian, A., Tamayo, P., Mootha, V.K., Mukherjee, S., Ebert, B.L., Gillette, M.A., Paulovich, A., Pomeroy, S.L., Golub, T.R., Lander, E.S., et al. (2005). Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences of the United States of America 102, 15545–15550.
Sumazin, P., Yang, X., Chiu, H.S., Chung, W.J., Iyer, A., Llobet-Navas, D., Rajbhandari, P., Bansal, M., Guarnieri, P., Silva, J., et al. (2011). An extensive microRNA-mediated network of RNA-RNA interactions regulates established oncogenic pathways in glioblastoma. Cell 147, 370–381.
Sun, S.C., Ganchi, P.A., Ballard, D.W., and Greene, W.C. (1993). NF-kappa B controls expression of inhibitor I kappa B alpha: evidence for an inducible autoregulatory pathway. Science 259, 1912–1915.
Szabó, P.E., and Mann, J.R. (1995). Biallelic expression of imprinted genes in the mouse germ line: implications for erasure, establishment, and mechanisms of genomic imprinting. Genes & Development 9, 1857–1868.
Tachibana, M., Sugimoto, K., Nozaki, M., Ueda, J., Ohta, T., Ohki, M., Fukuda, M., Takeda, N., Niida, H., Kato, H., et al. (2002). G9a histone methyltransferase plays a dominant role in euchromatic histone H3 lysine 9 methylation and is essential for early embryogenesis. Genes Dev 16, 1779–1791.
Takahashi, K., and Yamanaka, S. (2006). Induction of pluripotent stem cells from mouse embryonic and adult fibroblast cultures by defined factors. Cell 126, 663–676.
Tam, W. (2001). Identification and characterization of human BIC, a gene on chromosome 21 that encodes a noncoding RNA. Gene 274, 157–167.
Tam, W., Ben-Yehuda, D., and Hayward, W.S. (1997). bic, a novel gene activated by proviral insertions in avian leukosis virus-induced lymphomas, is likely to function through its noncoding RNA. Mol Cell Biol 17, 1490–1502.
Taney, K.G., and Smith, M.M. (2006). Surgical extraction of impacted teeth in a dog. J Vet Dent 23, 168–177.
Tarone, A.M., McIntyre, L.M., Harshman, L.G., and Nuzhdin, S.V. (2012). Genetic variation in the Yolk protein expression network of Drosophila melanogaster: sex-biased negative correlations with longevity. Heredity (Edinb) 109, 226–234.
Tarone, A.M., Nasser, Y.M., and Nuzhdin, S.V. (2005). Genetic variation for expression of the sex determination pathway genes in Drosophila melanogaster. Genet Res 86, 31–40.
Tay, Y., Kats, L., Salmena, L., Weiss, D., Tan, S.M., Ala, U., Karreth, F., Poliseno, L., Provero, P., Di Cunto, F., et al. (2011). Coding-independent regulation of the tumor suppressor PTEN by competing endogenous mRNAs. Cell 147, 344–357.
Telonis-Scott, M., Kopp, A., Wayne, M.L., Nuzhdin, S.V., and McIntyre, L.M. (2008). Sex-specific Splicing in Drosophila: Widespread Occurrence, Tissue specificity, and Evolutionary Conservation. Genetics, doi108.096743.
Ten, R.M., Paya, C.V., Israël, N., Bail, O.L., Mattei, M.G., Virelizier, J.L., Kourilsky, P., and Israël, A. (1992). The characterization of the promoter of the gene encoding the p50 subunit of NF-kappa B indicates that it participates in its own regulation. EMBO J 11, 195–203.
Thai, T.H., Calado, D.P., Casola, S., Ansel, K.M., Xiao, C., Xue, Y., Murphy, A., Frendewey, D., Valenzuela, D., Kutok, J.L., et al. (2007). Regulation of the germinal center response by microRNA-155. Science 316, 604–608.
Thomas-Chollier, M., Defrance, M., Medina-Rivera, A., Sand, O., Herrmann, C., Thieffry, D., and van Helden, J. (2011). RSAT 2011: regulatory sequence analysis tools. Nucleic Acids Res 39, W86–91.
Thompson, J.D., Higgins, D.G., and Gibson, T.J. (1994). CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Research 22, 4673–4680.
Thomson, J.M., Newman, M., Parker, J.S., Morin-Kensicki, E.M., Wright, T., and Hammond, S.M. (2006). Extensive post-transcriptional regulation of microRNAs and its implications for cancer. Genes & Development 20, 2202–2207.
Thorvaldsdóttir, H., Robinson, J.T., and Mesirov, J.P. (2013). Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Briefings in Bioinformatics 14, 178–192.
Thurman, R.E., Rynes, E., Humbert, R., Vierstra, J., Maurano, M.T., Haugen, E., Sheffield, N.C., Stergachis, A.B., Wang, H., Vernot, B., et al. (2012). The accessible chromatin landscape of the human genome. Nature 489, 75–82.
Tomari, Y., Du, T., Haley, B., Schwarz, D.S., Bennett, R., Cook, H.A., Koppetsch, B.S., Theurkauf, W.E., and Zamore, P.D. (2004). RISC assembly defects in the Drosophila RNAi mutant armitage. Cell 116, 831–841.
Tsang, J., Zhu, J., and van Oudenaarden, A. (2007). MicroRNA-Mediated Feedback and Feedforward Loops Are Recurrent Network Motifs in Mammals. Molecular Cell 26, 753–767.
Tuch, B.B., Laborde, R.R., Xu, X., Gu, J., Chung, C.B., Monighetti, C.K., Stanley, S.J., Olsen, K.D., Kasperbauer, J.L., Moore, E.J., et al. (2010). Tumor transcriptome sequencing reveals allelic expression imbalances associated with copy number alterations. PLoS One 5, e9317.
Tuteja, G., White, P., Schug, J., and Kaestner, K.H. (2009). Extracting transcription factor targets from ChIP-Seq data. Nucleic Acids Research 37, e113.
Uetz, P., Giot, L., Cagney, G., Mansfield, T.A., Judson, R.S., Knight, J.R., Lockshon, D., Narayan, V., Srinivasan, M., Pochart, P., et al. (2000). A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature 403, 623–627.
Uhlmann, S., Mannsperger, H., Zhang, J.D., Horvat, E.A., Schmidt, C., Kublbeck, M., Henjes, F., Ward, A., Tschulena, U., Zweig, K., et al. (2012). Global microRNA level regulation of EGFR-driven cell-cycle protein network in breast cancer. Mol Syst Biol 8.
Ule, J., Ule, A., Spencer, J., Williams, A., Hu, J.S., Cline, M., Wang, H., Clark, T., Fraser, C., Ruggiu, M., et al. (2005). Nova regulates brain-specific splicing to shape the synapse. Nat Genet 37, 844–852.
Urnov, F.D. (2003). Chromatin remodeling as a guide to transcriptional regulatory networks in mammals. J Cell Biochem 88, 684–694.
Valle, L., Serena-Acedo, T., Liyanarachchi, S., Hampel, H., Comeras, I., Li, Z., Zeng, Q., Zhang, H.T., Pennison, M.J., Sadim, M., et al. (2008). Germline allele-specific expression of TGFBR1 confers an increased risk of colorectal cancer. Science 321, 1361–1365.
Valouev, A., Johnson, D.S., Sundquist, A., Medina, C., Anton, E., Batzoglou, S., Myers, R.M., and Sidow, A. (2008). Genome-wide analysis of transcription factor binding sites based on ChIP-Seq data. Nat Meth 5, 829–834.
van Dam, H., and Castellazzi, M. (2001). Distinct roles of Jun:Fos and Jun:ATF dimers in oncogenesis. Oncogene 20, 2453–2464.
van den Berg, A., Kroesen, B.J., Kooistra, K., de Jong, D., Briggs, J., Blokzijl, T., Jacobs, S., Kluiver, J., Diepstra, A., Maggio, E., et al. (2003). High expression of B-cell receptor inducible gene BIC in all subtypes of Hodgkin lymphoma. Genes Chromosomes Cancer 37, 20–28.
Vaquerizas, J.M., Kummerfeld, S.K., Teichmann, S.A., and Luscombe, N.M. (2009). A census of human transcription factors: function, expression and evolution. Nat Rev Genet 10, 252–263.
Vega, V.B., Cheung, E., Palanisamy, N., and Sung, W.K. (2009). Inherent Signals in Sequencing-Based Chromatin-ImmunoPrecipitation Control Libraries. PLoS ONE 4, e5241.
Verhoeven, K.J.F., Simonsen, K.L., and McIntyre, L.M. (2005). Implementing false discovery rate control: increasing your power. Oikos 108, 643–647.
Verlaan, D.J., Berlivet, S., Hunninghake, G.M., Madore, A.M., Larivière, M., Moussette, S., Grundberg, E., Kwan, T., Ouimet, M., Ge, B., et al. (2009). Allele-Specific Chromatin Remodeling in the ZPBP2/GSDMB/ORMDL3 Locus Associated with the Risk of Asthma and Autoimmune Disease. American Journal of Human Genetics 85, 377–393.
Vigorito, E., Perks, K.L., Abreu-Goodger, C., Bunting, S., Xiang, Z., Kohlhaas, S., Das, P.P., Miska, E.A., Rodriguez, A., Bradley, A., et al. (2007). microRNA-155 regulates the generation of immunoglobulin class-switched plasma cells. Immunity 27, 847–859.
Vinther, J., Hedegaard, M.M., Gardner, P.P., Andersen, J.S., and Arctander, P. (2006). Identification of miRNA targets with stable isotope labeling by amino acids in cell culture. Nucleic Acids Res 34, e107.
Volinia, S., Calin, G.A., Liu, C.G., Ambs, S., Cimmino, A., Petrocca, F., Visone, R., Iorio, M., Roldo, C., Ferracin, M., et al. (2006). A microRNA expression signature of human solid tumors defines cancer gene targets. Proc Natl Acad Sci U S A 103, 2257–2261.
Vrba, L., Garbe, J.C., Stampfer, M.R., and Futscher, B.W. (2011). Epigenetic regulation of normal human mammary cell type-specific miRNAs. Genome Res 21, 2026–2037.
Wagner, A., and Fell, D.A. (2001). The small world inside large metabolic networks. Proc R Soc Lond B 268, 1803–1810.
Waidner, L.A., Morgan, R.W., Anderson, A.S., Bernberg, E.L., Kamboj, S., Garcia, M., Riblet, S.M., Ouyang, M., Isaacs, G.K., Markis, M., et al. (2009). MicroRNAs of Gallid and Meleagrid herpesviruses show generally conserved genomic locations and are virus-specific. Virology 388, 128–136.
Walhout, A.J. (2006). Unraveling transcription regulatory networks by protein-DNA and protein-protein interaction mapping. Genome Res 16, 1445–1454.
Wang, D., Chen, H., Momary, K.M., Cavallari, L.H., Johnson, J.A., and Sadee, W. (2008a). Regulatory polymorphism in vitamin K epoxide reductase complex subunit 1 (VKORC1) affects gene expression and warfarin dose requirement. Blood 112, 1013–1021.
Wang, E.T., Sandberg, R., Luo, S., Khrebtukova, I., Zhang, L., Mayr, C., Kingsmore, S.F., Schroth, G.P., and Burge, C.B. (2008b). Alternative isoform regulation in human tissue transcriptomes, Vol 456.
Wang, P., Yan, B., Guo, J.T., Hicks, C., and Xu, Y. (2005). Structural genomics analysis of alternative splicing and application to isoform structure modeling. Proc Natl Acad Sci U S A 102, 18920–18925.
Wang, X. (2008). miRDB: a microRNA target prediction and functional annotation database with a wiki interface. RNA 14, 1012–1017.
Wang, Y., Li, H., Chan, M.Y., Zhu, F.X., Lukac, D.M., and Yuan, Y. (2004). Kaposi's sarcoma-associated herpesvirus ori-Lyt-dependent DNA replication: cis-acting requirements for replication and ori-Lyt-associated RNA transcription. J Virol 78, 8615–8629.
Wang, Z., Zang, C., Rosenfeld, J.A., Schones, D.E., Barski, A., Cuddapah, S., Cui, K., Roh, T.-Y., Peng, W., Zhang, M.Q., et al. (2008c). Combinatorial patterns of histone acetylations and methylations in the human genome. Nat Genet 40, 897–903.
Warburg, O. (1956). On the origin of cancer cells. Science 123, 309–314.
Warburg, O., Posener, K., and Negelein, E. (1924). Ueber den Stoffwechsel der Tumoren [On the metabolism of tumors]. Biochemische Zeitschrift, 319–344.
Warzecha, C.C., Jiang, P., Amirikian, K., Dittmar, K.A., Lu, H., Shen, S., Guo, W., Xing, Y., and Carstens, R.P. (2010). An ESRP-regulated splicing programme is abrogated during the epithelial-mesenchymal transition. EMBO J 29, 3286–3300.
Washburn, M.P., Koller, A., Oshiro, G., Ulaszek, R.R., Plouffe, D., Deciu, C., Winzeler, E., and Yates, J.R., 3rd (2003). Protein pathway and complex clustering of correlated mRNA and protein expression analyses in Saccharomyces cerevisiae. Proc Natl Acad Sci U S A 100, 3107–3112.
Wayne, M.L., Telonis-Scott, M., Bono, L.M., Harshman, L., Kopp, A., Nuzhdin, S.V., and McIntyre, L.M. (2007). Simpler mode of inheritance of transcriptional variation in male Drosophila melanogaster. Proceedings of the National Academy of Sciences of the United States of America 104, 18577–18582.
Weber, M., Hellmann, I., Stadler, M.B., Ramos, L., Pääbo, S., Rebhan, M., and Schübeler, D. (2007). Distribution, silencing potential and evolutionary impact of promoter DNA methylation in the human genome. Nat Genet 39, 457–466.
Weinmann, A.S., Yan, P.S., Oberley, M.J., Huang, T.H., and Farnham, P.J. (2002). Isolating human transcription factor targets by coupling chromatin immunoprecipitation and CpG island microarray analysis. Genes Dev 16, 235–244.
Wen, J., Parker, B.J., Jacobsen, A., and Krogh, A. (2011). MicroRNA transfection and AGO-bound CLIP-seq data sets reveal distinct determinants of miRNA action. RNA 17, 820–834.
Wen, K.W., and Damania, B. (2009). Kaposi sarcoma-associated herpesvirus (KSHV): Molecular biology and oncogenesis. Cancer Letters, In Press, Corrected Proof.
Wiench, M., John, S., Baek, S., Johnson, T.A., Sung, M.H., Escobar, T., Simmons, C.A., Pearce, K.H., Biddie, S.C., Sabo, P.J., et al. (2011). DNA methylation status predicts cell type-specific enhancer activity. EMBO J 30, 3028–3039.
Wienholds, E., Kloosterman, W.P., Miska, E., Alvarez-Saavedra, E., Berezikov, E., de Bruijn, E., Horvitz, H.R., Kauppinen, S., and Plasterk, R.H. (2005). MicroRNA expression in zebrafish embryonic development. Science 309, 310–311.
Wiggins, C.H., and Nemenman, I. (2003). Process pathway inference via time series analysis. Experimental Mechanics 43, 361–370.
Wightman, B., Burglin, T.R., Gatto, J., Arasu, P., and Ruvkun, G. (1991). Negative regulatory sequences in the lin-14 3' untranslated region are necessary to generate a temporal switch during Caenorhabditis elegans development. Genes Dev 5, 1813–1824.
Wightman, B., Ha, I., and Ruvkun, G. (1993). Posttranscriptional regulation of the heterochronic gene lin-14 by lin-4 mediates temporal pattern formation in C. elegans. Cell 75, 855–862.
Wilbanks, E.G., and Facciotti, M.T. (2010). Evaluation of algorithm performance in ChIP-seq peak detection. PLoS One 5, e11471.
Wittkopp, P., Haerum, B., and Clark, A. (2008). Regulatory changes underlying expression differences within and between Drosophila species. Nature Genetics 40, 346–350.
Wittkopp, P.J., Haerum, B.K., and Clark, A.G. (2004). Evolutionary changes in cis and trans gene regulation. Nature 430, 85–88.
Wolfner, M.F. (1997). Tokens of love: Functions and regulation of Drosophila male accessory gland products. Insect Biochemistry and Molecular Biology 27, 179–192.
Woodcock, C.L. (2006). Chromatin architecture. Curr Opin Struct Biol 16, 213–220.
Woolf, P.J., Prudhomme, W., Daheron, L., Daley, G.Q., and Lauffenburger, D.A. (2005). Bayesian analysis of signaling networks governing embryonic stem cell fate decisions. Bioinformatics 21, 741–753.
193 Wray, G.A. (2007). The evolutionary significance of cis regulatory mutations. Nat Rev Genet 8 206 216. Wu, H., and Lozano, G. (1994). NF kappa B activation of p53. A potential mechanism for suppressing cell growth in response to stress. J. Biol. Chem. 269 20067 20074. Wu, K., Jiang, S W., Thangaraju, M., Wu, G., and Couch, F.J. (2000). Induction of the BRCA2 Promoter by Nuclear Factor 275 35548 35556. Xia, D., Srinivas, H., Ahn, Y.h., Sethi, G., Sheng, X., Yung, W.K.A., Xia, Q., Chiao, P.J., Kim, H., Brown, P.H., et al. (2007). Mitogenactivated Protein Kinase Kinase4 Promotes Cell Survival by Decreasing PTEN Expression through an NFB dependent Pathway. J. Biol. Chem. 282 35073519. Xu, H., Wei, C.L., Lin, F., and Sung, W. K. (2008). An HMM approach to genomewide identification of differential histone modification sites from ChIP seq data. Bioinformatics 24, 2344 2349. Xu, J., and Wong, C. (2008). A computational screen for mouse signaling pathways targeted by microRNA clusters. RNA 14 12761283. Yang, Y., Ch aerkady, R., Kandasamy, K., Huang, T.C., Selvan, L.D., Dwivedi, S.B., Kent, O.A., Mendell, J.T., and Pandey, A. (2010). Identifying targets of miR 143 using a SILAC based proteomic approach. Mol Biosyst 6 1873 1882. Yang, Y., Graze, R.M., Walts, B.M., Lop ez, C.M., Baker, H.V., Wayne, M.L., Nuzhdin, S.V., and McIntyre, L.M. (2011). Partitioning transcript variation in Drosophila: abundance, isoforms, and alleles. G3 (Bethesda) 1 427 436. Yang, Y.H., and Speed, T. (2002). Design issues for cDNA microarray experiments. Nat Rev Genet 3 579 588. Yeilding, N.M., Rehman, M.T., and Lee, W.M. (1996). Identification of sequences in c myc mRNA that regulate its steady state levels. Mol Cell Biol 16 3511 3522. Yin, Q., Wang, X., Fewell, C., Cameron, J., Zhu, H., Bad doo, M., Lin, Z., and Flemington, E.K. (2010). MicroRNA miR 155 inhibits bone morphogenetic protein (BMP) signaling and BMP mediated EpsteinBarr virus reactivation. J Virol 84, 63186327. 
Young, A.P., Nagarajan, R., and Longmore, G.D. (2003). Mechanisms of transcriptional regulation by Rb-E2F segregate by biological pathway. Oncogene 22, 7209-7217.
Yu, F., Harada, J.N., Brown, H.J., Deng, H., Song, M.J., Wu, T.-T., Kato-Stankiewicz, J., Nelson, C.G., Vieira, J., Tamanoi, F., et al. (2007). Systematic identification of cellular signals reactivating Kaposi sarcoma-associated herpesvirus. PLoS Pathog 3, e44.
Kishikawa, T., Gebreab, F., Li, N., Simonis, N., et al. (2008). High-quality binary protein interaction map of the yeast interactome network. Science 322, 104-110.
Yu, M., Riva, L., Xie, H., Schindler, Y., Moran, T.B., Cheng, Y., Yu, D., Hardison, R., Weiss, M.J., Orkin, S.H., et al. (2009). Insights into GATA-1-mediated gene activation versus repression via genome-wide chromatin occupancy analysis. Mol Cell 36, 682-695.
Yuh, C.H., Bolouri, H., and Davidson, E.H. (1998). Genomic cis-regulatory logic: experimental and computational analysis of a sea urchin gene. Science 279, 1896-1902.
Zhang, K., Li, J.B., Gao, Y., Egli, D., Xie, B., Deng, J., Li, Z., Lee, J.-H., Aach, J., Leproust, E.M., et al. (2009). Digital RNA allelotyping reveals tissue-specific and allele-specific gene expression in human. Nat Methods 6, 613-618.
Zhang, L., Huang, J., Yang, N., Greshock, J., Megraw, M.S., Giannakakis, A., Liang, S., Naylor, T.L., Barchetti, A., Ward, M.R., et al. (2006). microRNAs exhibit high frequency genomic alterations in human cancer. Proc Natl Acad Sci U S A 103, 9136-9141.
Zhang, X., and Borevitz, J.O. (2009). Global analysis of allele-specific expression in Arabidopsis thaliana. Genetics, genetics.109.103499.
Zhang, Y., Liu, T., Meyer, C.A., Eeckhoute, J., Johnson, D.S., Bernstein, B.E., Nussbaum, C., Myers, R.M., Brown, M., Li, W., et al. (2008). Model-based Analysis of ChIP-Seq (MACS). Genome Biol 9, R137.
Zhao, R., Gish, K., Murphy, M., Yin, Y., Notterman, D., Hoffman, W.H., Tom, E., Mack, D.H., and Levine, A.J. (2000). Analysis of p53-regulated gene expression patterns using oligonucleotide arrays. Genes Dev 14, 981-993.
Zhao, Y., Xu, H., Yao, Y., Smith, L.P., Kgosana, L., Green, J., Petherbridge, L., Baigent, S.J., and Nair, V. (2011). Critical role of the virus-encoded microRNA-155 ortholog in the induction of Marek's disease lymphomas. PLoS Pathog 7, e1001305.
Zhong, W., Wang, H., Herndier, B., and Ganem, D. (1996). Restricted expression of Kaposi sarcoma-associated herpesvirus (human herpesvirus 8) genes in Kaposi sarcoma. Proc Natl Acad Sci U S A 93, 6641-6646.
Zhu, H., Bilgin, M., Bangham, R., Hall, D., Casamayor, A., Bertone, P., Lan, N., Jansen, R., Bidlingmaier, S., Houfek, T., et al. (2001). Global analysis of protein activities using proteome chips. Science 293, 2101-2105.
Zhu, L., Gazin, C., Lawson, N., Pages, H., Lin, S., Lapointe, D., and Green, M. (2010). ChIPpeakAnno: a Bioconductor package to annotate ChIP-seq and ChIP-chip data. BMC Bioinformatics 11, 237.
Ziegelbauer, J.M., Sullivan, C.S., and Ganem, D. (2009). Tandem array-based expression screens identify host mRNA targets of virus-encoded microRNAs. Nat Genet 41, 130-134.

BIOGRAPHICAL SKETCH
Yajie Yang earned her B.S. (2006) from Huazhong Agricultural University, China. She is pursuing her Ph.D. in genetics and genomics under the supervision of Dr. Lauren McIntyre and Dr. Rolf Renne. Her research interests include gene regulatory networks, microRNAs, and human genomics. She won the Outstanding International Student Award in 2012.