Discovery of MLL1 binding units, their localization to CpG Islands, and their potential function in mitotic chromatin


Material Information

Title:
Discovery of MLL1 binding units, their localization to CpG Islands, and their potential function in mitotic chromatin
Series Title:
BMC Genomics
Physical Description:
Mixed Material
Creator:
Minou Bina
Phillip Wyss
Elise Novorolsky
Noorfatin Zulkelfi
Jing Xue
Randi Price
Matthew Fay
Zach Gutmann
Brian Fogler
Daidong Wang
Publisher:
BMC Genomics
Publication Date:

Notes

Abstract:
Background: Mixed Lineage Leukemia 1 (MLL1) is a mammalian ortholog of the Drosophila Trithorax. In Drosophila, Trithorax complexes transmit the memory of active genes to daughter cells through interactions with Trithorax Response Elements (TREs). However, despite their functional importance, nothing is known about sequence features that may act as TREs in mammalian genomic DNA. Results: By analyzing results of reported DNA binding assays, we identified several CpG rich motifs as potential MLL1 binding units (defined as morphemes). We find that these morphemes are dispersed within a relatively large collection of human promoter sequences and appear densely packed near transcription start sites of protein-coding genes. Genome-wide analyses localized frequent morpheme occurrences to CpG islands. In the human HOX loci, the morphemes are spread across CpG islands and in some cases tail into the surrounding shores and shelves of the islands. By analyzing results of chromatin immunoprecipitation assays, we found a connection between morpheme occurrences, CpG islands, and chromatin segments reported to be associated with MLL1. Furthermore, we found a correspondence of reported MLL1-driven “bookmarked” regions in chromatin to frequent occurrences of MLL1 morphemes in CpG islands. Conclusion: Our results implicate the MLL1 morphemes in sequence-features that define the mammalian TREs and provide a novel function for CpG islands. Apparently, our findings offer the first evidence for existence of potential TREs in mammalian genomic DNA and the first evidence for a connection between CpG islands and gene-bookmarking by MLL1 to transmit the memory of highly active genes during mitosis. Our results further suggest a role for overlapping morphemes in producing closely packed and multiple MLL1 binding events in genomic DNA so that MLL1 molecules could interact and reside simultaneously on extended potential transcriptional maintenance elements in human chromosomes to transmit the memory of highly active genes during mitosis.

Record Information

Source Institution:
University of Florida
Holding Location:
University of Florida
Rights Management:
All rights reserved by the source institution.
System ID:
AA00020102:00001

Full Text

PAGE 1

CHR1 to CHRYMorphemeCompl CGIs, expected counts for random occurrences CGIs, Observed counts CHR1 CGCG 27817118 CHR1 CGTCG CGACG 683390 CHR1 CGCCG CGGCG 13017019 CHR1 CGCGCG 102231 CHR1 CGTGCG CGCACG 361574 CHR1 CGCCCG CGGGCG 2055888 CHR1 CGTCCG CGGACG 321384 CHR1 CGTACG 6106 CHR2 CGCG 18313437 CHR2 CGTCG CGACG 492481 CHR2 CGCCG CGGCG 8613334 CHR2 CGCGCG 61703 CHR2 CGTGCG CGCACG 251115 CHR2 CGCCCG CGGGCG 1284289 CHR2 CGTCCG CGGACG 211033 CHR2 CGTACG 578 CHR3 CGCG 1058348 CHR3 CGTCG CGACG 261630 CHR3 CGCCG CGGCG 458579 CHR3 CGCGCG 41143 CHR3 CGTGCG CGCACG 13824 CHR3 CGCCCG CGGGCG 802767 CHR3 CGTCCG CGGACG 12703 CHR3 CGTACG 260 CHR4 CGCG 947775 CHR4 CGTCG CGACG 251469 CHR4 CGCCG CGGCG 427818 CHR4 CGCGCG 31020 CHR4 CGTGCG CGCACG 13714CHR4 CGCCCG CGGGCG 672419 CHR4 CGTCCG CGGACG 11751 CHR4 CGTACG 229 CHR5 CGCG 1198930 CHR5 CGTCG CGACG 311735 CHR5 CGCCG CGGCG 528642 CHR5 CGCGCG 41089 CHR5 CGTGCG CGCACG 15758 CHR5 CGCCCG CGGGCG 852807 CHR5 CGTCCG CGGACG 13752 CHR5 CGTACG 379 CHR6 CGCG 1288729 CHR6 CGTCG CGACG 321700

PAGE 2

CHR6 CGCCG CGGCG 578513 CHR6 CGCGCG 51097 CHR6 CGTGCG CGCACG 16789 CHR6 CGCCCG CGGGCG 882663 CHR6 CGTCCG CGGACG 14660 CHR6 CGTACG 339 CHR7 CGCG 18910521 CHR7 CGTCG CGACG 491932 CHR7 CGCCG CGGCG 8910130 CHR7 CGCGCG 71372 CHR7 CGTGCG CGCACG 251014 CHR7 CGCCCG CGGGCG 1303351 CHR7 CGTCCG CGGACG 21878 CHR7 CGTACG 470 CHR8 CGCG 1137820 CHR8 CGTCG CGACG 341528 CHR8 CGCCG CGGCG 547571 CHR8 CGCGCG 51257 CHR8 CGTGCG CGCACG 17773 CHR8 CGCCCG CGGGCG 732502 CHR8 CGTCCG CGGACG 13611 CHR8 CGTACG 340 CHR9 CGCG 1378792 CHR9 CGTCG CGACG 341705 CHR9 CGCCG CGGCG 688624 CHR9 CGCGCG 41157 CHR9 CGTGCG CGCACG 17768 CHR9 CGCCCG CGGGCG 992947 CHR9 CGTCCG CGGACG 15740 CHR9 CGTACG 346 CHR10 CGCG 1538912 CHR10 CGTCG CGACG 411641 CHR10 CGCCG CGGCG 718653 CHR10 CGCGCG 51186 CHR10 CGTGCG CGCACG 22795 CHR10 CGCCCG CGGGCG 992846 CHR10 CGTCCG CGGACG 17690 CHR10 CGTACG 442 CHR11 CGCG 1549312 CHR11 CGTCG CGACG 411885 CHR11 CGCCG CGGCG 778956 CHR11 CGCGCG 51207 CHR11 CGTGCG CGCACG 22931 CHR11 CGCCCG CGGGCG 1063084 CHR11 CGTCCG CGGACG 17827 CHR11 CGTACG 461 CHR12 CGCG 1387440

PAGE 3

CHR12 CGTCG CGACG 331513 CHR12 CGCCG CGGCG 627682 CHR12 CGCGCG 5985 CHR12 CGTGCG CGCACG 18687 CHR12 CGCCCG CGGGCG 1002503 CHR12 CGTCCG CGGACG 15620 CHR12 CGTACG 354 CHR13 CGCG 484470 CHR13 CGTCG CGACG 13777 CHR13 CGCCG CGGCG 214215 CHR13 CGCGCG 2528 CHR13 CGTGCG CGCACG 6371 CHR13 CGCCCG CGGGCG 331302 CHR13 CGTCCG CGGACG 5338 CHR13 CGTACG 131 CHR14 CGCG 815938 CHR14 CGTCG CGACG 211233 CHR14 CGCCG CGGCG 385859 CHR14 CGCGCG 3752 CHR14 CGTGCG CGCACG 11501 CHR14 CGCCCG CGGGCG 571997 CHR14 CGTCCG CGGACG 9445 CHR14 CGTACG 231 CHR15 CGCG 936500 CHR15 CGTCG CGACG 221333 CHR15 CGCCG CGGCG 426467 CHR15 CGCGCG 3839 CHR15 CGTGCG CGCACG 12553 CHR15 CGCCCG CGGGCG 672211 CHR15 CGTCCG CGGACG 10523 CHR15 CGTACG 245 CHR16 CGCG 2389317 CHR16 CGTCG CGACG 631974 CHR16 CGCCG CGGCG 1279193 CHR16 CGCGCG 81110 CHR16 CGTGCG CGCACG 341033 CHR16 CGCCCG CGGGCG 1573135 CHR16 CGTCCG CGGACG 28869 CHR16 CGTACG 552 CHR17 CGCG 36111275 CHR17 CGTCG CGACG 812352 CHR17 CGCCG CGGCG 17311108 CHR17 CGCGCG 111414 CHR17 CGTGCG CGCACG 481078 CHR17 CGCCCG CGGGCG 2663875 CHR17 CGTCCG CGGACG 37967 CHR17 CGTACG 766

PAGE 4

CHR18 CGCG 584143 CHR18 CGTCG CGACG 16814 CHR18 CGCCG CGGCG 274036 CHR18 CGCGCG 2595 CHR18 CGTGCG CGCACG 8421 CHR18 CGCCCG CGGGCG 371275 CHR18 CGTCCG CGGACG 6350 CHR18 CGTACG 221 CHR19 CGCG 64513264 CHR19 CGTCG CGACG 1353068 CHR19 CGCCG CGGCG 32612190 CHR19 CGCGCG 231655 CHR19 CGTGCG CGCACG 771551 CHR19 CGCCCG CGGGCG 4814232 CHR19 CGTCCG CGGACG 661139 CHR19 CGTACG 993 CHR20 CGCG 1175663 CHR20 CGTCG CGACG 321309 CHR20 CGCCG CGGCG 585626 CHR20 CGCGCG 4770 CHR20 CGTGCG CGCACG 17557 CHR20 CGCCCG CGGGCG 751940 CHR20 CGTCCG CGGACG 14443 CHR20 CGTACG 228 CHR21 CGCG 372493 CHR21 CGTCG CGACG 10445 CHR21 CGCCG CGGCG 192269 CHR21 CGCGCG 1323 CHR21 CGTGCG CGCACG 5233 CHR21 CGCCCG CGGGCG 23744 CHR21 CGTCCG CGGACG 4226 CHR21 CGTACG 17 CHR22 CGCG 1275479 CHR22 CGTCG CGACG 321121 CHR22 CGCCG CGGCG 665370 CHR22 CGCGCG 4704 CHR22 CGTGCG CGCACG 17462 CHR22 CGCCCG CGGGCG 841948 CHR22 CGTCCG CGGACG 14475 CHR22 CGTACG 222 CHRX CGCG 835690 CHRX CGTCG CGACG 231425 CHRX CGCCG CGGCG 355848 CHRX CGCGCG 3714 CHRX CGTGCG CGCACG 12594 CHRX CGCCCG CGGGCG 641787 CHRX CGTCCG CGGACG 9515

PAGE 5

CHRX CGTACG 242 CHRY CGCG 6730 CHRY CGTCG CGACG 2187 CHRY CGCCG CGGCG 3618 CHRY CGCGCG 091 CHRY CGTGCG CGCACG 188 CHRY CGCCCG CGGGCG 5190 CHRY CGTCCG CGGACG 170 CHRY CGTACG 016 TOTAL CGCG 3139192096 TOTAL CGTCG CGACG 80438647 TOTAL CGCCG CGGCG 1477188320 TOTAL CGCGCG 11024942 TOTAL CGTGCG CGCACG 42018184 TOTAL CGCCCG CGGGCG 220862702 TOTAL CGTCCG CGGACG 34916009 TOTAL CGTACG 721158



PAGE 1

Morpheme/complement   Promoter counts   Non-motif/complement   Promoter counts
CGCG                  27964             CGAACG                 665
CGACG                 13178             CGACCG                 1823
CGGCG                 17440             CGAGCG                 4535
CGTGCG                3487              CGATCG                 568
CGCCCG                13086             CGGCCG                 10703
CGGACG                6589
CGCGCG                3588
CGTACG                279



PAGE 1

#CHR1 to CHRYnon motif compl CGIs, expected counts for random occurrences CGIs, Observed counts CHR1 CGAACG CGTTCG 15480 CHR1 CGATCG 6163 CHR1 CGACCG CGGTCG 14944 CHR1 CGAGCG CGCTCG 272160 CHR1 CGGCCG 372808 CHR2 CGAACG CGTTCG 11321 CHR2 CGATCG 4107 CHR2 CGACCG CGGTCG 10660 CHR2 CGAGCG CGCTCG 171633 CHR2 CGGCCG 242112 CHR3 CGAACG CGTTCG 6248 CHR3 CGATCG 377 CHR3 CGACCG CGGTCG 6475 CHR3 CGAGCG CGCTCG 101087 CHR3 CGGCCG 141369 CHR4 CGAACG CGTTCG 6175 CHR4 CGATCG 350 CHR4 CGACCG CGGTCG 5388 CHR4 CGAGCG CGCTCG 9911 CHR4 CGGCCG 111172 CHR5 CGAACG CGTTCG 7248 CHR5 CGATCG 384 CHR5 CGACCG CGGTCG 6469 CHR5 CGAGCG CGCTCG 111097 CHR5 CGGCCG 151322 CHR6 CGAACG CGTTCG 7258 CHR6 CGATCG 367 CHR6 CGACCG CGGTCG 7432 CHR6 CGAGCG CGCTCG 121089 CHR6 CGGCCG 161368 CHR7 CGAACG CGTTCG 10251 CHR7 CGATCG 498 CHR7 CGACCG CGGTCG 10498 CHR7 CGAGCG CGCTCG 171282 CHR7 CGGCCG 251678 CHR8 CGAACG CGTTCG 7179 CHR8 CGATCG 367 CHR8 CGACCG CGGTCG 7376

PAGE 2

CHR8 CGAGCG CGCTCG 11960 CHR8 CGGCCG 151235 CHR9 CGAACG CGTTCG 7233 CHR9 CGATCG 378 CHR9 CGACCG CGGTCG 7447 CHR9 CGAGCG CGCTCG 131075 CHR9 CGGCCG 181380 CHR10 CGAACG CGTTCG 10176 CHR10 CGATCG 479 CHR10 CGACCG CGGTCG 8393 CHR10 CGAGCG CGCTCG 151071 CHR10 CGGCCG 191363 CHR11 CGAACG CGTTCG 8268 CHR11 CGATCG 479 CHR11 CGACCG CGGTCG 9514 CHR11 CGAGCG CGCTCG 161164 CHR11 CGGCCG 211546 CHR12 CGAACG CGTTCG 7189 CHR12 CGATCG 369 CHR12 CGACCG CGGTCG 7454 CHR12 CGAGCG CGCTCG 13894 CHR12 CGGCCG 171274 CHR13 CGAACG CGTTCG 392 CHR13 CGATCG 133 CHR13 CGACCG CGGTCG 2214 CHR13 CGAGCG CGCTCG 4487 CHR13 CGGCCG 6678 CHR14 CGAACG CGTTCG 4166 CHR14 CGATCG 245 CHR14 CGACCG CGGTCG 4324 CHR14 CGAGCG CGCTCG 8731 CHR14 CGGCCG 10959 CHR15 CGAACG CGTTCG 5145 CHR15 CGATCG 247 CHR15 CGACCG CGGTCG 5339 CHR15 CGAGCG CGCTCG 9825 CHR15 CGGCCG 111082 CHR16 CGAACG CGTTCG 13228 CHR16 CGATCG 569 CHR16 CGACCG CGGTCG 13479 CHR16 CGAGCG CGCTCG 261113 CHR16 CGGCCG 311583 CHR17 CGAACG CGTTCG 15315 CHR17 CGATCG 8117 CHR17 CGACCG CGGTCG 17609 CHR17 CGAGCG CGCTCG 331438 CHR17 CGGCCG 441932

PAGE 3

CHR18 CGAACG CGTTCG 3104 CHR18 CGATCG 133 CHR18 CGACCG CGGTCG 3207 CHR18 CGAGCG CGCTCG 5517 CHR18 CGGCCG 7699 CHR19 CGAACG CGTTCG 21377 CHR19 CGATCG 11124 CHR19 CGACCG CGGTCG 31688 CHR19 CGAGCG CGCTCG 551495 CHR19 CGGCCG 852028 CHR20 CGAACG CGTTCG 6148 CHR20 CGATCG 357 CHR20 CGACCG CGGTCG 7314 CHR20 CGAGCG CGCTCG 11703 CHR20 CGGCCG 17980 CHR21 CGAACG CGTTCG 273 CHR21 CGATCG 123 CHR21 CGACCG CGGTCG 2118 CHR21 CGAGCG CGCTCG 3263 CHR21 CGGCCG 5316 CHR22 CGAACG CGTTCG 5134 CHR22 CGATCG 254 CHR22 CGACCG CGGTCG 7279 CHR22 CGAGCG CGCTCG 13618 CHR22 CGGCCG 17915 CHRX CGAACG CGTTCG 5191 CHRX CGATCG 254 CHRX CGACCG CGGTCG 4363 CHRX CGAGCG CGCTCG 8759 CHRX CGGCCG 11873 CHRY CGAACG CGTTCG 120 CHRY CGATCG 012 CHRY CGACCG CGGTCG 025 CHRY CGAGCG CGCTCG 166 CHRY CGGCCG 198 TOTAL CGAACG CGTTCG 1725019 TOTAL CGATCG 741686 TOTAL CGACCG CGGTCG 16510009 TOTAL CGAGCG CGCTCG 29823438 TOTAL CGGCCG 40430770





PAGE 1

RESEARCH ARTICLE Open Access
Discovery of MLL1 binding units, their localization to CpG Islands, and their potential function in mitotic chromatin
Minou Bina*, Phillip Wyss, Elise Novorolsky, Noorfatin Zulkelfi, Jing Xue, Randi Price, Matthew Fay, Zach Gutmann, Brian Fogler and Daidong Wang
Abstract
Background: Mixed Lineage Leukemia 1 (MLL1) is a mammalian ortholog of the Drosophila Trithorax. In Drosophila, Trithorax complexes transmit the memory of active genes to daughter cells through interactions with Trithorax Response Elements (TREs). However, despite their functional importance, nothing is known about sequence features that may act as TREs in mammalian genomic DNA.
Results: By analyzing results of reported DNA binding assays, we identified several CpG rich motifs as potential MLL1 binding units (defined as morphemes). We find that these morphemes are dispersed within a relatively large collection of human promoter sequences and appear densely packed near transcription start sites of protein-coding genes. Genome wide analyses localized frequent morpheme occurrences to CpG islands. In the human HOX loci, the morphemes are spread across CpG islands and in some cases tail into the surrounding shores and shelves of the islands. By analyzing results of chromatin immunoprecipitation assays, we found a connection between morpheme occurrences, CpG islands, and chromatin segments reported to be associated with MLL1. Furthermore, we found a correspondence of reported MLL1-driven “bookmarked” regions in chromatin to frequent occurrences of MLL1 morphemes in CpG islands.
Conclusion: Our results implicate the MLL1 morphemes in sequence-features that define the mammalian TREs and provide a novel function for CpG islands. Apparently, our findings offer the first evidence for existence of potential TREs in mammalian genomic DNA and the first evidence for a connection between CpG islands and gene-bookmarking by MLL1 to transmit the memory of highly active genes during mitosis. Our results further suggest a role for overlapping morphemes in producing closely packed and multiple MLL1 binding events in genomic DNA so that MLL1 molecules could interact and reside simultaneously on extended potential transcriptional maintenance elements in human chromosomes to transmit the memory of highly active genes during mitosis.
Keywords: Cis-elements, Chromatin structure, Codes in DNA, CGG repeats, CpG islands, FMR1, HOXA, HOXB, HOXC, HOXD, MLL, MLL1, Gene bookmarking, Gene regulation, Human genome, Mammalian genomes, Regulatory codes, Trithorax response elements, TREs, Mitosis, Cell division
*Correspondence: bina@purdue.edu
Department of Chemistry, Purdue University, West Lafayette, IN 47907, USA
© 2013 Bina et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Bina et al. BMC Genomics 2013, 14:927
http://www.biomedcentral.com/1471-2164/14/927

PAGE 2

Background
The DNA in human chromosomes is relatively long [1]. In addition to protein-coding genes, the genome contains numerous sequence features, including gene deserts [2], a multitude of long noncoding RNAs with little or no protein-coding capacity [3], and many islands of CpG-rich sequences [4]. CpG islands (CGIs) include G-tracts and numerous nonmethylated CpGs [4]. CpG-richness is a remarkable feature since, generally, bulk genomic DNA is depleted of CpG, owing to selective deamination of 5-meC [5,6]. CGIs vary in size and CpG content [6-8]. In close proximity (~2 kb) to CGIs, there are regions (known as shores) that contain a lower CpG density than the values computationally selected to define the position of CpG islands [9,10]. Sequences (~2 kb) that flank the shores are referred to as shelves [11]. Sequences beyond the shelves are described as open sea [11]. Both shores and shelves appear to contribute to developmental and regulatory processes that control CpG methylation patterns in chromosomes leading to gene repression [12].
Gene repression and activation are regulated by proteins that interact with DNA, by enzymes that modify the core histones in nucleosomes, and by proteins that bind modified residues in histones [13]. Core-histone modifications include methylation (me), acetylation (ac), phosphorylation (P), and ubiquitination (ub) [14]. A conserved domain (SET) catalyzes methylation of H3K4 (lysine 4 in histone H3) producing H3K4me3 [15]. Trimethylated H3K4 is associated with active or transcriptionally poised chromatin states [16]. In mammalian cells, H3K4 trimethylation involves several enzymes that include SETD1A, SETD1B, and members of the MLL family. The MLL family comprises MLL1, MLL2, MLL3, and MLL4 [15,17]. In the literature, the human MLL1 is also referred to as MLL, ALL-1, and MLLT1; its official symbol is KMT2A. In our studies, we refer to human KMT2B as MLL2, to KMT2C as MLL3, and to KMT2D as MLL4.
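The island, shore, shelf, and open-sea terminology introduced in the first paragraph of this section can be illustrated with a small classifier. The sketch below is not part of the reported analyses: the function name `classify_position`, the half-open interval convention, and the fixed 2,000-bp flank width (standing in for the "~2 kb" values quoted above) are assumptions made for illustration.

```python
from typing import List, Tuple

FLANK = 2000  # assumed width in bp of each shore and each shelf ("~2 kb" in the text)

# Lower rank means closer to an island.
_RANK = {"island": 0, "shore": 1, "shelf": 2, "open sea": 3}


def classify_position(pos: int, islands: List[Tuple[int, int]], flank: int = FLANK) -> str:
    """Label a coordinate as 'island', 'shore', 'shelf', or 'open sea'.

    `islands` holds half-open (start, end) CpG-island intervals on one chromosome.
    When several islands are nearby, the label closest to an island wins.
    """
    best = "open sea"
    for start, end in islands:
        if start <= pos < end:
            dist = 0
        elif pos < start:
            dist = start - pos
        else:
            dist = pos - end + 1
        if dist == 0:
            label = "island"
        elif dist <= flank:
            label = "shore"
        elif dist <= 2 * flank:
            label = "shelf"
        else:
            label = "open sea"
        if _RANK[label] < _RANK[best]:
            best = label
    return best
```

With a single hypothetical island at (10000, 11000), a position 1 kb upstream is labeled a shore and a position 3 kb upstream a shelf.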
Earlier studies discovered the MLL1 gene through its involvement in chromosome translocations that cause acute leukemia [18,19]. Translocations often produce abnormal proteins consisting of the amino-terminus of MLL1 fused in frame to the carboxyl terminus of another protein [20]. The normal form of MLL1 is relatively large and contains several domains: a plant homeodomain, a bromodomain, a transactivation domain, a SET domain, and a cysteine-rich CXXC domain [21]. The CXXC domain is known as MT since it shows sequence similarity to DNA methyltransferases [22,23]. A similar domain exists in MLL2 and CXXC1 (also known as CGBP and CFP1). Even though the MT domain in MLL1 and CXXC1 binds non-methylated CpG containing sequences [24-26], swapping experiments have shown that CXXC domains have specific and nonredundant activities that impact downstream regulatory functions [27]. Colony forming ability and leukemogenicity of a fusion protein (MLL-AF9) were abrogated when the MLL-derived segment was replaced with the DNA binding domain of CXXC1 [27]. Furthermore, even though MLL1 and MLL2 displayed almost indistinguishable DNA-binding properties, their corresponding MT-domains guided the proteins to largely non-overlapping gene repertoires [25].
Evidence supports central roles for native forms of MLL1 in mechanisms that preserve “the memory” of highly active genes during cell division and at specific stages in embryonic development [28-31]. In Drosophila, two groups of proteins support heritable memory systems that maintain the transcriptional state of target genes [32,33]. Trithorax Group (TrxG) binds TrxG Response Elements (TREs) to maintain active states [32]. Polycomb Group (PcG) perpetuates repressed states through PcG Response Elements (PREs) [32,33]. In Drosophila, related DNA sequence elements are thought to contribute to the recruitment of both TrxG and PcG complexes to chromatin [32]. Mammalian PcG proteins consist of two groups: Polycomb Repressive Complexes 1 and 2 (PRC1 and PRC2), see [34] and references therein. PRC1 catalyzes mono-ubiquitylation of histone H2A; PRC2 methylates lysine 27 in histone H3 producing H3K27me2/me3 [16,35]. The PRC2 complex includes EZH2, EED, and SUZ12 [36]. EZH2 is the enzymatic component of the PRC2 complex and produces the repressive H3K27me3 marks in nucleosomes [16,35]. Interestingly, emerging data indicate that the PRC2 complex is recruited to chromatin by CpG islands [34].
Syndromic manifestations support the opposing functions that MLL1 and EZH2 play in embryonic development. Mutations in the EZH2 gene cause autosomal dominant Weaver syndrome characterized by generalized overgrowth, advanced bone age, marked macrocephaly, hypertelorism, and characteristic facial features [37,38]. De novo mutations in the MLL1 gene cause Wiedemann-Steiner syndrome [39-41]. Symptoms vary and may include delayed growth and development, asymmetry of the face, hypotonia, and intellectual disability [39-41]. Mutations often produce frame-shifts removing downstream domains. Studies of Mll1 knockout mice support a central role for MLL1 in regulating developmental pathways [28-30]. Mll1 heterozygous (+/−) mice displayed retarded growth, haematopoietic abnormalities, and bidirectional homeotic transformations of the axial skeleton as well as sternal malformations [28].
Mll1 deficiency (−/−) was embryonic lethal [28]. In mice, Mll1 was required for maintaining gene expression early in embryogenesis [42], necessary for correct development of multiple tissues, and essential for successful skeletal, neural, and craniofacial development [28,42].
Protein networks that include MLL1 drive coordinated patterns of gene expression (Figure 1). These networks

PAGE 3

are organized as hubs that receive and transmit information to activate, upregulate, downregulate, or repress the expression of a given gene [13]. Components in molecular circuitries include multiprotein complexes that are relatively large and highly dynamic [13]. Depending on environmental milieu, MLL1 associates directly or indirectly with numerous regulatory proteins including MEN1, RBBP5, WDR5, ASH2L, HCF1, LEDGF, and CXXC1 (Figure 1). In protein networks, MLL1, HCF1, and CXXC1 also communicate with large and dynamic protein complexes that repress transcription (Figure 1). CXXC1 binds non-methylated CpG [26,43] and interacts with H3K4 methyltransferases known as SET1A/SETD1A and SET1B/SETD1B (Figure 1). These enzymes play a more widespread role in H3K4 trimethylation than do MLL1 complexes in mammalian cells [17]. These and related findings indicate that in addition to H3K4 methylation, MLL1 performs histone methyltransferase-independent functions [31].
As the main component in trithorax-based regulation networks, MLL1 plays a central role in preserving transcriptional memory during mitosis [31]. Analyses of synchronized human cells identified a globally rearranged pattern of MLL1 occupancy during mitosis in a manner favoring genes that were highly transcribed during the interphase stage of cell-cycle [31]. However, how MLL1 bookmarks genes to maintain transcriptional memory has not been addressed. The finding that gene-bookmarking by MLL1 is largely independent of the methylation status of H3K4 on mitotic chromosomes [31] provokes the question of whether interactions of MLL1 with genomic DNA may play a role in bookmarking events that preserve the memory of highly transcribed genes at the onset of mitosis. To explore this question, we have analyzed data concerning interactions of MLL1 with DNA and chromatin. We show that DNA sequences that bind the MLL1 MT-domain can be described as minimal units or morphemes: the smallest ‘words’ in DNA that selectively bind the MT-domain in MLL1. We find that the MLL1 morphemes occur in chromatin segments that are bookmarked by MLL1 during mitosis. Furthermore, we show that frequent morpheme occurrences map to genomic sequences that correspond to CGIs. Collectively, our results suggest that CGIs include TREs that bind MLL1 to maintain the memory of highly active genes at the onset of mitosis.
[Figure 1 network nodes: SETD1B, SETD1A, DPY30, ASH2L, MEN1, CXXC1, MOF, EZH2, DNMT3B, DNMT3A, DNMT1, HDAC2, SUZ12, HDAC1, WDR5, EP300, MECP2, HCFC1, SIN3B, CDK9, EED, RBBP5, KDM6A/UTX, CREBBP, BRD4, SIN3A, LEDGF, MLL2/KMT2B, MLL1]
Figure 1 A subset of protein networks that involve MLL1. Proteins are depicted as nodes, their interactions as edges [44]: pink, proteins that bind unmethylated CpGs; red, proteins found in repressive complexes. The interactions were obtained from BioGRID [45,46]. The figure does not display all reported interactions. It focuses on underlying connectivity among proteins that associate with MLL1 to form large and highly dynamic multiprotein complexes.

PAGE 4

Results and discussion
Localization of CpG-rich motifs in promoters of human protein-coding genes
Protein coding genes are transcribed by RNA polymerase II (POLII). Earlier studies deduced that MLL1 exclusively regulated the expression of homeotic genes and proper segmental identity in mammals [28,42]. However, emerging data indicate that MLL1 associates with a substantial fraction of human POLII promoters, supporting a global role for MLL1 in regulation of transcription [31,47].
To uncover sequence motifs that may selectively interact with MLL1, we analyzed sequences of 19 cloned inserts that the MT-domain in MLL1 selected in DNA binding assays [24]. In 16 inserts, we identified motifs consisting of CGCG with 0–2 nucleotides between the two CpGs. The remaining 3 inserts contained CpG but lacked discernable motifs (Figure 2A). To explore the relevance of the identified motifs to gene regulation, we examined a relatively large collection of human POLII promoters. We focused on the region upstream of transcription start sites (−500 to −1) since this DNA segment contributed to formation of protein complexes that regulated initiation of mRNA synthesis [13]. In promoter selection, we imposed filtering criteria to eliminate redundancy. The final promoter set included nearly 16,000 sequences. We analyzed this set for occurrences of CGCG, CGNCG, and CGNNCG. Additional file 1: Figure S1 shows that these motifs are spread across the DNA segment that precedes the transcription start sites (TSSs). Motif frequencies steadily increase in sequences approaching proximal promoters and TSSs in genomic DNA (Additional file 1: Figure S1). Certain motifs appear more prevalent than others, displaying the following trend: CGNNCG > CGNCG > CGCG.
Lexical units recognized by the MLL1 MT-domain and their localization to POLII promoters
Encouraged by results of preliminary promoter analyses, we asked whether the cloned inserts obtained from SELEX assays included sequence-elements that may correspond to MLL1 recognition sites. To approach this question, we separated motifs consisting of CGNCG and CGNNCG according to nucleotides that appeared at N position.
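The promoter scan for CGCG, CGNCG, and CGNNCG described above can be sketched in a few lines. This is an illustrative reconstruction, not the code used in the study: the function names, the 50-bp bin size, and the convention that each input string is the sense-strand sequence ending at position −1 are assumptions. Lookahead patterns are used so that overlapping occurrences are all reported.

```python
import re
from collections import Counter

# Zero-width lookaheads report every start position, including overlapping ones.
MOTIF_PATTERNS = {
    "CGCG": re.compile(r"(?=CGCG)"),
    "CGNCG": re.compile(r"(?=CG[ACGT]CG)"),
    "CGNNCG": re.compile(r"(?=CG[ACGT]{2}CG)"),
}


def motif_positions(upstream):
    """Return {motif: [coordinates relative to the TSS]} for one upstream sequence.

    Assumption: `upstream` ends at position -1, so string index i maps to
    coordinate i - len(upstream).
    """
    seq = upstream.upper()
    return {
        name: [m.start() - len(seq) for m in pattern.finditer(seq)]
        for name, pattern in MOTIF_PATTERNS.items()
    }


def position_histogram(promoters, motif, bin_size=50):
    """Count motif starts per bin across many promoters (bins keyed by their left edge)."""
    counts = Counter()
    for seq in promoters:
        for pos in motif_positions(seq)[motif]:
            counts[(pos // bin_size) * bin_size] += 1
    return dict(sorted(counts.items()))
```

Note that CGNNCG, as written here, also matches CGCGCG, because N stands for any base.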
We uncovered several motifs, which we refer to equivalently as MLL1 binding sites, binding units, or morphemes (Figure 2B). Examples include CGCG, CGTCG or its complement (CGACG), CGGCG or its complement (CGCCG), and CGTACG, a palindromic sequence (Figure 2B). Thus, the MLL1 morphemes derived from CGNCG include all possible bases at the N position: A, G, C, or T. Among the combinatorial permutations of NN (in CGNNCG), we did not find CGGCCG, CGAACG, CGATCG, CGAGCG, and CGACCG. We refer to these sequences as non-motifs.
[Figure 2, panel A, insert sequences as extracted (boundaries between the 19 numbered inserts are not marked): CCGTGCTAGTG CGTCG TACC CGCCG A CGTGTCTGGCTGTGTCGGTGACCGGCA CTGCGGGCCCATCGNGANT CGCG GCG XXTTA CGTGCG GTGAGATGCGTCATG TTCGGCATGGCGTTTGACCTG CGCGCG CGTACG TGTCGTAAGTGTTGCGATGTG XXTTA CGTGCG GTGAGATGCGTCATG CGG CGCG ATGTATCGTCCTCGTTTACG AT CGCCCG G G C G A G T G G G G G C T T T G T CGACG ACTACGGTGCCGTTGCGCTTCG CGGTGCTG CGCG TGTCGTTGCTCGTGT A CGCCG TAATT CGCGCG GGTG CGCACG CGACGTCG TGATCGTAGTTA CGTCG GT TACGTGGAC CGACGTCG TACG GTACG CGTACG GTGGGATAGA CGT CG CGGACG CTGGCGATGTCTCCGAGTTGTGTGCCG ACGTACTTCTTGGATGCCTG CGTCCG ATCA CGCG ACTCT CGACG TGAATTGCG CGTCACGTATTTGTGGGCTTCGGGATCG]
[Figure 2, panel B, forward/complement morpheme pairs: CGCG/CGCG, CGACG/CGTCG, CGCCG/CGGCG, CGCGCG/CGCGCG, CGTGCG/CGCACG, CGCCCG/CGGGCG, CGGACG/CGTCCG, CGTACG/CGTACG]
Figure 2 Analysis of cloned inserts obtained from SELEX assays. (A) The inserts were isolated and sequenced by Birke et al. [24]. We numbered the inserts as shown on the left. Bold numbers highlight inserts that include one or more MLL1 morpheme(s). Underlined CGs denote the position of morpheme overlaps. Yellow boxes highlight CpGs that did not correspond to discernable motifs. (B) Color-coding scheme for distinguishing various MLL1 morphemes.
Results of promoter analyses prompted examination of a sequence pattern that appeared frequently at the 5′ boundary of human POLII genes [48]. This pattern consists of BVSCGSSSCB: where B corresponds to C, G, or T; V to A, C, or G; S to C or G. We find that this pattern describes three of the MLL1 morphemes (CGCGCG, CGCCCG, CGCCG), supporting a possible role for such morphemes in regulation of transcription. Additionally, earlier studies analyzed human POLII promoters for frequently occurring 8-mers and 9-mers [49-52]. When ranked according to statistical criteria, including occurrences in total human

PAGE 5

genomic DNA, we find that a relatively large proportion of promoter 9-mers are composed of CpG-rich sequences [49,50] that include MLL1 morphemes.
Therefore, we reanalyzed the POLII promoter set for morpheme occurrences (Figure 3, Additional file 2: Table S1). We find that as observed for CGCG, CGNCG, and CGNNCG (Additional file 1: Figure S1), the MLL1 morphemes are spread across POLII promoters and their density increases in sequences approaching the TSSs (Figure 3). For morpheme frequencies, we observe the following trend: CGCG > CGGCG > CGACG > CGCCCG > CGGACG > CGCGCG > CGTGCG > CGTACG (Additional file 2: Table S1). In toto, results of complementary analyses support a role for MLL1 morphemes in promoter associated functions.
Morpheme occurrences in functional DNA sequences
Since the MLL1 morphemes were identified from the results of SELEX assays, we asked whether the morphemes have any relevance to sequences that bind MLL1 in a cellular context. In literature surveys, we found studies that dealt with interactions of MLL1 with both synthetic and naturally occurring DNA sequences [53-55]. One study examined a naturally occurring DNA derived from the proximal GC-box in the HSV TK promoter [53]. We found that the GC-box in the HSV TK promoter included a sequence (CGGCGCG) produced from two overlapping MLL1 morphemes, CGGCG and CGCG, which share the central CG (Figure 4A). In transient expression assays, the GC-box recruited MLL1 to DNA to activate expression of a linked reporter gene [53]. This finding supports a role for MLL1-DNA interactions in activation of transcription. Furthermore, amino acid substitutions in the region encompassing the MT-domain abrogated transcription and reporter gene activation [53]. These findings support a role for interactions of the MT-domain with DNA in the regulation of transcription.
Results of another study provide evidence for functionality of MLL1 morphemes in vivo. Specifically, in an upstream promoter of the mouse Hoxa9 gene, the study localized several CpG-rich clusters that were associated with MLL1 [54]. Gene-knockout experiments showed that MLL1 was required for protection of the CpG clusters from methylation [54]. We find that the CpG clusters in the Hoxa9 promoter include MLL1 morphemes (Figure 4B).
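The notion of overlapping morphemes, as in the GC-box sequence CGGCGCG discussed above, can be made concrete with a short sketch. The morpheme list is taken from the forward and complement forms named in the text; the function names and the interval convention (0-based start, exclusive end) are assumptions made for illustration, and this is not the authors' software.

```python
import re

# Forward and complement forms of the MLL1 morphemes named in the text.
MORPHEMES = [
    "CGCG", "CGACG", "CGTCG", "CGCCG", "CGGCG", "CGCGCG", "CGTACG",
    "CGTGCG", "CGCACG", "CGCCCG", "CGGGCG", "CGGACG", "CGTCCG",
]


def find_morphemes(seq):
    """Return sorted (start, end, morpheme) hits on the given strand; end is exclusive."""
    seq = seq.upper()
    hits = []
    for morpheme in MORPHEMES:
        # The lookahead keeps overlapping copies of the same morpheme.
        for match in re.finditer(f"(?={morpheme})", seq):
            hits.append((match.start(), match.start() + len(morpheme), morpheme))
    return sorted(hits)


def overlapping_pairs(hits):
    """Return pairs of hits that share at least one base; `hits` must be sorted by start."""
    pairs = []
    for i, first in enumerate(hits):
        for second in hits[i + 1:]:
            if second[0] >= first[1]:
                break  # later hits start even further right, so none can overlap
            pairs.append((first, second))
    return pairs
```

Because each morpheme is listed together with its complement, scanning one strand is enough to report sites in either orientation.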
Isolated morphemes include CGCG, CGCCG, and CGCGCG. An MLL-protected cluster (CGGGCGGGCG) is produced from overlap of CGGGCG and CGGGCG. Thus, results of MLL-knockout experiments provide support for a role for MLL1 morphemes in an in vivo context.
Morpheme occurrences in CpG islands
The finding that the MLL1 morphemes are CpG-rich raises the question of whether they are localized in CGIs in order to recruit MLL1 to chromatin. However, since the morphemes are relatively short, a priori one could suspect that they may appear frequently in human genomic DNA just by chance: once every 256 bps for a 4-nucleotide motif; once every 1024 bps for a 5-nucleotide motif; once every 4096 bps for a 6-nucleotide motif. To examine this issue, we counted morpheme occurrences in total human genomic DNA. We find that morpheme frequencies in genomic DNA are relatively low. For example: CGCG (4 bps) occurs once per 53,977 bps; CGACG/CGTCG (5 bps) occurs once per 210,681 bps; and CGCGCG (6 bps) occurs once per 1,546,669 bps.
Figure 3 MLL1 morpheme distribution and occurrences in promoters of human protein-coding genes: CGCG, red full circles; CGGCG, blue full circles; CGCCCG, magenta empty circles; CGACG, green empty circles; CGGACG, light blue empty circles; CGCGCG, black circles; CGTGCG, x magenta; CGTACG, + red.
To evaluate more rigorously a possible connection between MLL1 morphemes and CGIs, we followed a

PAGE 6

previously described statistical model [49]. The statistical procedure partitions the human genome according to occurrences of a given MLL1 morpheme in CGIs and in regions outside CGIs. The probabilistic model assumes that the total genomic DNA is generated by a memoryless or Markov source. The statistical derivations are based on the principle of large deviations, often referred to as p-value analyses [56]. Results revealed that frequent morpheme occurrences in CGIs were statistically significant with p < 10^−50 (detailed in methods section).
To further assess a possible association of MLL1 morphemes with CGIs, we examined individual human chromosomes and total genomic DNA for morpheme occurrences. The analysis compared expected frequencies for random occurrences to observed morpheme frequencies in CGIs. We found that morpheme-occurrences in CGIs exceeded the values expected for random distribution in each human chromosome and in total genomic DNA (Additional file 3: Table S2).
For morpheme frequencies in CGIs, we noted the following trend: CGCG > CGCCG/CGGCG > CGCCCG/CGGGCG > CGCGCG > CGTCG/CGACG > CGTGCG/CGCACG > CGTCCG/CGGACG > CGTACG. As expected, the frequencies are influenced by morpheme-length. Nonetheless, the trend indicates a bias in favor of GC-rich morphemes. For example, in CGIs, a 5-bp morpheme (CGCCG/CGGCG) occurred 188,320 times while CGTCG/CGACG occurred 38,647 times. In CGIs, a 6-bp morpheme (CGCCCG/CGGGCG) occurred 62,702 times while CGTGCG/CGCACG occurred 18,184 times. Overall, the observed trend is consistent with a possible connection between MLL1 morphemes and CGIs since a high G+C content is a hallmark of sequences localized in CpG islands [5].
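Two pieces of arithmetic behind this section can be sketched briefly: the chance spacing of a fixed k-mer in uniform random DNA, and a large-deviation (Chernoff) bound on seeing many more occurrences than expected. This is a simplified stand-in, not the model of [49]: it treats positions as independent Bernoulli trials rather than a Markov source, and the function names and the example numbers for island length and observed count are hypothetical.

```python
import math


def chance_spacing(k):
    """Average spacing in bp of one fixed k-mer when all four bases are equally likely."""
    return 4 ** k


def kl_bernoulli(q, p):
    """Kullback-Leibler divergence D(q || p) between two Bernoulli distributions, in nats."""
    total = 0.0
    if q > 0:
        total += q * math.log(q / p)
    if q < 1:
        total += (1 - q) * math.log((1 - q) / (1 - p))
    return total


def log10_chernoff_upper_tail(n, p, observed):
    """log10 of the Chernoff bound exp(-n * D(q || p)) on P(X >= observed), X ~ Binomial(n, p).

    Assumption: the n positions are independent, which the cited Markov-source
    model does not require.
    """
    q = observed / n
    if q <= p:
        return 0.0  # the bound is trivial when the count does not exceed expectation
    return -n * kl_bernoulli(q, p) / math.log(10)


# Hypothetical illustration: 1,000,000 island positions, a background rate of one
# CGCG per 53,977 bp (the genome-wide figure quoted in the text), 5,000 observed hits.
example_log10_bound = log10_chernoff_upper_tail(1_000_000, 1 / 53_977, 5_000)
```

Working in log10 avoids floating-point underflow, since bounds of this kind are far smaller than the smallest representable double.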
Additionally, we performed statistical evaluations of CpG-rich motifs that did not appear in results of SELEX assays. The analysis revealed that the non-motifs also are associated with CGIs. However, except for CGGCCG, the overall frequencies of non-motifs in CGIs were much lower than those observed for MLL1 morphemes (Additional file 4: Table S3). For example: in CGIs, CGAGCG/CGCTCG occurs 23,438 times; CGACCG/CGGTCG occurs 10,009 times; CGAACG/CGTTCG occurs 5,019 times; CGATCG occurs 1,686 times.
Occurrences of MLL1 morphemes in classified POLII promoters
Human POLII promoters can be classified into three groups: group I (about 30%) does not have a CpG island at the TSS; group II (about 60%) has a single CpG island at the TSS; group III (about 10%) has two or more CpG islands in the vicinity of the TSS [57]. Generally, the density of CpG dinucleotides in genomic DNA positively correlates with positions of H3K4me3 marks in chromatin, indicating that these two properties are mechanistically linked [57,58]. CpG-rich promoters may be enriched in RNA polymerase II poised for transcription [16]. In contrast, by default, AT-rich promoters are transcriptionally inactive (19).
Promoter classifications [57] led us to examine the distribution pattern of MLL1 morphemes in human genomic DNA with respect to CGI positions, overall CpG occurrences, and H3K4me3 modification patterns. We present three representative examples, chosen for comparison with results of POLII promoter classification [57]. The first example covers a region that does not include a CGI (Figure 5). The depicted segment is about 211,000 bp long. It includes a protein-coding gene (SCN1A) and many CpGs but not a CGI. Figure 5 shows that MLL1 morphemes are scattered throughout the segment, possibly reflecting random occurrences (track labeled MLL1 sites). As observed previously [57], we did not find H3K4me3 marks for nucleosomes associated with that region in human genomic DNA (Figure 5, track labeled Layered H3K4me3).
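The three-group promoter classification summarized at the start of this subsection can be expressed as a short rule. The sketch below is an illustration only: the function name, the half-open island intervals, and in particular the `vicinity` window of 2,000 bp are assumptions, since the text does not state how "at the TSS" or "in the vicinity" was measured in [57].

```python
from typing import List, Tuple


def classify_promoter(tss: int, islands: List[Tuple[int, int]], vicinity: int = 2000) -> str:
    """Assign a promoter to group 'I', 'II', or 'III' from the CpG islands near its TSS.

    `islands` holds half-open (start, end) intervals on the same chromosome.
    `vicinity` is a hypothetical window (bp) added on both sides of each island.
    """
    nearby = [
        (start, end)
        for start, end in islands
        if start - vicinity <= tss < end + vicinity
    ]
    if not nearby:
        return "I"    # no CpG island at the TSS
    if len(nearby) == 1:
        return "II"   # a single CpG island
    return "III"      # two or more CpG islands nearby
```

Setting `vicinity=0` restricts the test to islands that directly contain the TSS.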
Figure 4 Morphemes in DNA fragments that interact with MLL1 in DNA binding and functional assays. (A) Sequences analyzed in DNA binding or transient expression assays. Probe 1 corresponded to insert 1, shown in Figure 2A; probe 2 was derived from the HSV TK promoter [53]. Colored sequences highlight the position of MLL1 morphemes in probe 1 and probe 2. (B) MLL1 morphemes in a DNA segment from the mouse Hoxa9 gene. This segment includes the promoter of a Hoxa9 transcript [54]. Colored sequences highlight CpG-rich clusters that in vivo MLL1 protected from methylation [54]. Color-coding follows the scheme in Figure 2B.
The second example shows a region that includes a single CGI [57] encompassing TSSs of various POLR1B transcripts (Figure 6). Consistent with sequence characteristics

PAGE 7

ofCGIs,frequencyofCpGdinucleotidesisrelativelyhigh withintheislandandtailsintoflankingsequencesdesignatedinliteratureasshoresandshelves(Figure5,lanelabeledshortmatch).TheMLL1morphemesareprimarily localizedwithintheisland(Figure6).LayersofH3K4me3 marksencompasstheCpGislandandextendintothe island ’ sshoresandshelves. ThethirdexampleshowsaDNAsegmentthatcontainsseveralCGIsandincludesaregionspanninga protein-codinggeneknownas SIX2 [57].Figure7shows thattheMLL1morphemesaredenselypackedwithin theislands.Incontrast,thedistributionofCpGsoccurrencesissignificantlymorebroadandextendstosurroundingshoresandshelvesoftheCGIs.Asnoted previously[57],theH3K4me3marksencompasstheregionsthatincludesahighCpGdensity.SinceSIX2functionsinmyogenesis[60],H3K4me3marksareprimarily observedforHSMMcells,humanskeletalmusclemyoblast(Figure7).Morphemeoccurrencesinhuman HOX lociInboth Drosophila andvertebrates,thehomeoticgenes playessentialrolesincorrectpatterningofthebodyplan [61].TrxGcomplexesandPcGcomplexesmaintainthe expressionpatternofgeneslocalizedinappropriatedomains[61,62].Asinmice,thehumanhomeoticgenesare organizedintofourclusters: HOXA HOXB HOXC ,and HOXD [63].Thisgroupofgenesencodeafamilyof transcriptionfactorsthatplayfundamentalrolesin morphogenesisduringdevel opment[42].Notably,severalgenesintheclustersincludeknownMLL1targets [42,47,55]. NumerousCGIsarespreadacrossthehuman HOX loci:about31 – 36CGIs/locus(Figures8,9,10,11and12). Intheseloci,theMLL1morphemesareprimarilylocalizedinCGIsandinsomecasestailintotheshoresand shelvesoftheislands(Figures8,9,10,11and12).ACGI maybeassociatedwithabidirectionalpromoterregulatingtheexpressionofa HOX geneandanoncoding RNA.ExamplesincludeaCGIthatinclude HOXA1 and HOTAIRM promoters(Figure8).Transcriptionof HOTAIRM1 originatesfromthesameCpGislandthat embedsthestartsiteof HOXA1 [64].Similarly,aCGI encompassesabidirectiona lpromoterthatregulates theexpressionof HOXA13 and HOTTIP (Figure8). 
Transcription of HOTTIP produces a noncoding RNA, implicated in maintaining active chromatin to coordinate the expression of genes in the HOXA locus [65]. The transcription initiation site of another noncoding RNA gene (HOXD-AS1) is within a CGI that includes the coding region of HOXD1 (Figure 11).

Figure 5 Human genomic DNA without a CGI. [Genome browser view, hg19, chr2; axis ticks 166,850,000 to 167,000,000.] Track labeled “Short Match” marks the position of CpGs. Track labeled “MLL1 sites” marks morpheme positions; “non-motifs” mark the position of sequences not found in results of SELEX assays [24]. Track labeled “Layered H3K4me3” shows the position of H3K4me3 marks.

Figure 6 Human genomic DNA with one CGI encompassing the POLR1B gene. [Genome browser view, hg19, chr2; axis ticks 113,298,000 to 113,304,000.] Horizontal green bar marks the CGI position. Track labeled “Short Match” marks the position of CpGs. Track labeled “MLL1 sites” marks morpheme positions; “non-motifs” mark the position of sequences not found in results of SELEX assays [24]. Track labeled “Layered H3K4me3” shows the position of H3K4me3 marks, determined by the ENCODE project using a variety of cell lines including: lymphoblastoid cells (GM12878), H1 human embryonic stem cell line (H1-hESC), human skeletal muscle myoblasts (HSMM), human umbilical vein endothelial cells (HUVEC), erythroleukemia type cell line (K562), normal human epidermal keratinocytes (NHEK), and normal human lung fibroblasts (NHLF) [59]. A layered representation is displayed in order to provide an overview of H3K4me3 profiles [59].

In the human HOXA locus, a previous study discovered extensive MLL1 binding events to a transcriptionally active chromatin domain [47]. In ChIP assays of a human monocytic cell line (U937), MLL1 was localized to chromatin segments encompassing HOXA1 and the 5′ HOXA subcluster including HOXA7, HOXA9, HOXA10, HOXA11, and HOXA13 (Figure 8). Binding of MLL1 to these genes correlated with high levels of their expression [47]. We find that MLL1 morphemes occur frequently in chromatin regions with which MLL1 associates (Figure 8).

We cover several examples illustrating the correspondence of morpheme occurrences to CGIs in the human HOXA locus and to MLL1-associated regions determined by ChIP assays [47]. These regions are marked by horizontal brown bars in Figure 8. ChIP assays localized an MLL-bound segment that included the TSS of HOXA1, extending into the transcribed region of the gene [47]. We find that the corresponding genomic DNA segment encompasses two CGIs that contain clusters of MLL1 morphemes (Figure 8). A short MLL-associated chromatin segment includes the HOXA5 promoter and extends to the second exon in HOXA6 [47]. The MLL-bound segment is within a CGI that contains two clusters of MLL1 morphemes (Figure 8). A long MLL-bound segment encompasses four CGIs that include several clusters of morphemes. A shorter MLL1-associated segment overlaps with a CGI that includes a cluster of morphemes; similarly sized MLL-bound segments also encompass CGIs that contain clusters of MLL1 morphemes (Figure 8).

In the human HOXB locus, CGIs are associated with promoters of several genes including HOXB5, HOXB7, and HOXB9 (Figure 9). Intra- and inter-genic islands occur often and include both isolated and overlapping MLL1 morphemes (Figure 9). The HOXC locus primarily contains intra- and inter-genic CGIs (Figure 10).
HOXC8, HOXC9, and HOXC10 promoters are within CGIs that extend into coding sequences. These CGIs also include MLL1 morphemes (Figure 10). TSSs of genes in the HOXD locus are often within CGIs that contain MLL1 morphemes (Figure 11). Examples include CGIs encompassing HOXD1, HOXD6, HOXD8, HOXD9, HOXD12, and HOXD13 promoters (Figure 11).

Morpheme occurrences in chromatin regions bookmarked by MLL1 during mitosis

To further assess the relevance of MLL1 morphemes to biological functions, we examined chromosomal regions reported to bind MLL1 during mitosis [31]. Binding of MLL1 to these regions preserved the memory of genes that were highly active prior to the onset of cell division [31]. The assays primarily focused on regions encompassing promoter sequences of POLII genes [31]. Therefore, we analyzed results of ChIP assays to determine whether the MLL-bound chromatin segments mapped to CGIs and to evaluate whether the bookmarked segments included MLL1 morphemes. We cover three representative examples, selected to compare our findings to figures discussed in a previous publication [31]. The first example deals with association of MLL1 with a chromatin segment that includes the TSS of the EEF1A1 gene. This association appeared exclusively in mitotic chromosomes [31]. The MLL-bound segment included the major TSS of EEF1A1, extending into the first exon and the first intron of the gene (brown bars in Figure 12). We find that the corresponding genomic DNA is within a CGI and includes clusters of both isolated and overlapping MLL1 morphemes (Figure 12, track labeled MLL1 sites).

Figure 7 Human genomic DNA with several CGIs and the SIX2 gene. [Genome browser view, hg19, chr2; axis ticks 45,225,000 to 45,250,000.] Horizontal green bars mark the position of CGIs. Track labeled “Short Match” marks the position of CpGs. Track labeled “MLL1 sites” marks morpheme positions; “non-motifs” mark the position of sequences not found in results of SELEX assays [24]. Track labeled “Layered H3K4me3” shows the position of H3K4me3 marks [59].

Figure 8 Localization of CGIs and MLL1 morphemes in DNA segments encompassing the human HOXA locus. Horizontal green bars mark the position of CGIs. Horizontal brown bars mark MLL1 binding segments observed in ChIP assays [47]. Track labeled “MLL1 sites” marks morpheme positions; “non-motifs” mark the position of sequences not found in results of SELEX assays [24]. Track labeled “Layered H3K4me3” shows the position of H3K4me3 marks [59].


The second example covers the association of MLL1 with the MYC locus (Figure 13). ChIP assays revealed that MLL1 was preferentially bound to the MYC locus during mitosis, whereas POLII occupied the locus in the interphase stage of the cell cycle [31]. In mitotic chromosomes, the MLL1-associated chromatin segment included sequences from ~0.5 kb upstream to ~1 kb downstream of the TSS of MYC [31]. The corresponding DNA segment encompasses a CGI that contains numerous clusters of MLL1 morphemes (Figure 13). The PABPC1 locus provides an example of numerous occurrences of both isolated and overlapping morphemes in a region bookmarked by MLL1 during mitosis [31]. The bookmarked segment (2.5 kb) is within a CGI that spans the promoter, the first exon, and part of the first intron of the PABPC1 gene (Figure 14). MLL1 morphemes are spread across the DNA segment localized in ChIP assays (Figure 14). The segment includes several morpheme overlaps produced from permutations of CGCG, CGCCG, CGCCCG, and complements of these sequences. Examples include CG CG CCG, CGC CG CG, CG CG GGCG, CG CG GCG, and CG CG GCG in the PABPC1 promoter. Notably, the PABPC1-associated CGI contains multiple occurrences of morphemes that also occur in a region that in vivo MLL1 protected from CpG methylation (Figure 4B).

On gene bookmarking during mitosis

Overall, results of our analyses implied that interactions of MLL1 with its morphemes may contribute to gene bookmarking events that preserved the memory of genes that were highly active prior to mitosis [31]. However, evidence is lacking for involvement of other MLL1 family members in gene bookmarking events. Like MLL1, MLL2/KMT2B binds non-methylated CpGs [25]. Furthermore, a study has shown binding of MLL2 to a POLII promoter within a CpG island [66]. However, ChIP assays revealed that MLL2 was evicted from mitotic chromatin, indicating that MLL2 did not contribute to gene bookmarking during mitosis [31]. The structure of the other two family members (MLL3/KMT2C and MLL4/KMT2D) does not contain an MT-domain. Therefore, it seems unlikely that MLL3 and MLL4 would interact with CpG-rich motifs localized in CpG islands.
MLL1 is a component of relatively large and dynamic multiprotein complexes [13]. Therefore, one may ask whether other components in these complexes would contribute to gene bookmarking by MLL1 [31]. In protein networks, MLL1 interacts with several proteins, including MEN1, RBBP5, and ASH2L [67] (Figure 1). All three proteins associate with MLL1 during both interphase and mitosis [31]. In MLL-deficient cells, most of RBBP5, ASH2L, and MEN1 were localized to the cytoplasm, indicating that their association with mitotic chromatin was MLL-dependent [31]. Even though MEN1 interacts with DNA, the binding is not DNA-sequence specific [68]. MEN1 also associates with a variety of DNA structures, including Y-structures, branched structures, and 4-way junction structures [68]. In literature surveys, we did not find evidence for direct interactions of MEN1 with CpG-containing sequences. Furthermore, while during mitotic silencing of highly expressed genes MEN1 was associated with mitotic chromatin, MLL1 was required for this association [31].

Figure 9 Localization of CGIs and MLL1 morphemes in DNA segments encompassing the human HOXB locus. Horizontal green bars mark the position of CGIs. Track labeled “MLL1 sites” marks the position of MLL1 morphemes; “non-motifs” mark the position of sequences not found in results of SELEX assays [24]. Track labeled “Layered H3K4me3” shows the position of H3K4me3 marks [59].

Figure 10 Localization of CGIs and MLL1 morphemes in DNA segments encompassing the human HOXC locus. [Genome browser view, hg19, chr12; axis ticks 54,300,000 to 54,450,000.] Horizontal green bars mark the position of CGIs. Track labeled “MLL1 sites” marks morpheme positions; “non-motifs” mark the position of sequences not found in results of SELEX assays [24]. Track labeled “Layered H3K4me3” shows the position of H3K4me3 marks [59].

Other candidates for gene bookmarking include LEDGF/p75 (Figure 1). LEDGF is best known for its role in tethering to chromatin the protein complexes that integrate the HIV-1 genome into the host-cell chromosomes [69]. LEDGF primarily associates with chromatin in regions downstream of TSSs, to effect gene-specific HIV-1 integration [70]. Furthermore, in contrast to MLL1 [24,53], LEDGF does not bind CpG-rich DNA sequences.

Since MLL1 is best known for its H3K4 methyltransferase activity, one may expect that gene bookmarking by MLL1 could involve mechanisms dealing with trimethylation of histone H3 [31]. However, MLL1 was dispensable for preserving histone H3K4 methylation during mitosis, indicating that MLL1 served H3K4 methyltransferase-independent functions to propagate active chromatin during mitosis [31]. Furthermore, during mitosis, SETD1A was evicted from mitotic chromosomes, implying that it did not contribute to gene-bookmarking events [31].
SETD1A is the major H3K4 methyltransferase during interphase and targets many nucleosome-associated genes for histone H3 modifications [17]. Both SETD1A and SETD1B interact with a protein (CXXC1/Cfp1) that binds unmethylated CpG [26,43] (Figure 1). Earlier genome-wide studies localized CXXC1 to CGIs and deduced that CXXC1 functions included recruitment of SETD1A and SETD1B to chromatin for trimethylation of H3K4 [71]. However, subsequent studies revealed that while CXXC1 played a key role in organizing genome-wide H3K4me3 in mouse ES cells, its DNA binding domain was not required for recruitment of enzymes that produced H3K4me3 marks on CGI-associated nucleosomes [72]. While CXXC1 is crucial for early embryonic development and regulates genomic cytosine methylation patterns [73-75], it remains to be determined whether CXXC1 may also play a role in gene bookmarking during mitosis.

Occurrences of overlapping MLL1 morphemes

We noted that in some cases, morphemes overlapped in various orders and combinations. Based on statistical criteria (described in the Methods section), occurrences of morpheme overlaps in CGIs are even rarer events than those obtained for isolated morphemes. We found that morpheme overlaps creating long sequences appeared infrequently in human genomic DNA. We noticed that morpheme overlaps could be produced from a repeated DNA sequence element. Examples include CGG repeats associated with genetic abnormalities. (CGG)n creates morpheme overlaps of the following form: CGG CG G CG G CG G CG G CG G etc.
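
The way a simple repeat generates overlapping morphemes can be illustrated with a short sketch. It is not the authors' code: it assumes, for illustration only, a motif set made of the three sequences named in the text (CGCG, CGCCG, CGCCCG) plus their reverse complements, and the helper names (`find_hits`, `merge_overlaps`) are hypothetical.

```python
import re

# Illustrative subset only: motifs named in the text plus reverse complements.
# The complete morpheme list is not reproduced here.
BASE_MOTIFS = ["CGCG", "CGCCG", "CGCCCG"]


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


MOTIFS = sorted({m for b in BASE_MOTIFS for m in (b, reverse_complement(b))})


def find_hits(seq):
    """Return sorted (start, end, motif) tuples, overlapping hits included."""
    hits = []
    for motif in MOTIFS:
        # A zero-width lookahead reports every start position,
        # so matches that overlap one another are not skipped.
        for m in re.finditer(f"(?={motif})", seq):
            hits.append((m.start(), m.start() + len(motif), motif))
    return sorted(hits)


def merge_overlaps(hits):
    """Merge hits sharing at least one base into (start, end, n_hits) clusters."""
    clusters = []
    for start, end, _ in hits:
        if clusters and start < clusters[-1][1]:
            first, last_end, n = clusters[-1]
            clusters[-1] = (first, max(last_end, end), n + 1)
        else:
            clusters.append((start, end, 1))
    return clusters


repeat = "CGG" * 6            # a short (CGG)n tract with n = 6
hits = find_hits(repeat)
clusters = merge_overlaps(hits)
print(hits)
print(clusters)
```

In this toy tract every hit is CGGCG (the complement of CGCCG), each one sharing two bases with the next, so the whole repeat collapses into a single cluster of overlapping morphemes.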
Notable examples include the FMR1 locus, in which CGG expansion causes mental retardation [76]. This expansion arises in a CGI associated with Fragile X Syndrome, in the 5′ untranslated region of the FMR1 gene [77]. In normal individuals, repeat size varies from 6 to 54 CGG [78]. All alleles with greater than 52 repeats, including those identified in a normal family, are mitotically unstable [78]. Remarkably, in carriers FMR1 transcription increases, displaying a positive correlation between repeat number and levels of FMR1 transcripts [79]. Additionally, carriers display changes in TSS utilization [76]. Thus, repeated overlapping morphemes, downstream of transcription initiation sites, may influence TSS utilization and upregulation of gene expression.

Figure 11 Localization of CGIs and MLL1 morphemes in DNA segments encompassing the human HOXD locus. [Genome browser view, hg19, chr2; axis ticks 176,950,000 to 177,050,000.] Horizontal green bars mark the position of CGIs. Track labeled “MLL1 sites” marks morpheme positions; “non-motifs” mark the position of sequences not found in results of SELEX assays [24]. Track labeled “Layered H3K4me3” shows the position of H3K4me3 marks [59].

Figure 12 Zoom-out view of MLL1 morpheme occurrences in the EEF1A1 locus. [Genome browser view, hg19, chr6; axis ticks 74,220,000 to 74,245,000.] Horizontal green bars mark the position of CGIs. Horizontal brown bar marks the chromatin segment bookmarked by MLL1 during mitosis [31]. Track labeled “MLL1 sites” marks morpheme positions; “non-motifs” mark the position of sequences not found in results of SELEX assays [24]. Track labeled “Layered H3K4me3” shows the position of H3K4me3 marks [59].

Several overlapping MLL1 morphemes are dispersed across the human HOX loci (Figures 8, 9, 10 and 11). In some cases morpheme overlaps are localized upstream of or near the TSS of specific genes. Examples include morpheme overlaps in promoter/upstream sequences of genes in various loci: CGCGCGCGCG, HOXA4; CGCCCGCCCGCCGCCCGCCCG, HOXA6; CGCCCGCGCCCGGCG, HOXA7; CGGCGCGCGCG, HOXA11; two repeats (CGCCGCCGCCGCCGCCGCCGCCCG and CGCCGCCCGCCGCCGCCGCCG), HOXC8; and CGGCGGCGGCGGCG, HOXD10.

In some cases, overlapping morphemes are localized in coding regions, producing repeated amino acid residues in a polypeptide chain. Notable examples include morpheme overlaps in HOXA13 and HOXD13 coding sequences producing tracts of alanines. Amplification of DNA sequences in a HOXD13 exon causes syndactyly, a fusion of digits [63]. It is thought that syndactyly is due to expansion of alanine tracts [63]. However, it seems plausible that overlapping morphemes in coding sequences may play a regulatory role at the level of gene expression. In fact, an emerging view is that gene regulatory and coding sequences are more intermingled than once believed [80].

Statistical criteria indicate that morpheme overlaps are rare events in genomic DNA, raising the question of whether occurrences of overlapping morphemes could play a role in cellular functions regulated by MLL1. In this context, we noted that a previous study found that the SET domain in MLL1 self-associated to form homo-oligomeric complexes [81]. This association was observed in various experimental settings including yeast two-hybrid methodology, biochemical studies, and deletion analyses [81]. The study found a similar self-association for the SET domain in the Drosophila trithorax [81].

In leukemogenic MLL1 fusion proteins, the SET domain is deleted and replaced by one of over 40 different translocation partners [20]. Invariably, the MLL1 MT-domain is retained at the amino-terminus of fusion proteins [20]. MLL1 fusion partners include transcriptional activators that upregulate gene expression in leukemic cells. Also, there are partners that impart transcriptional activating properties to MLL1 fusion proteins by promoting dimerization [82]. Dimerization of fusion proteins immortalized hematopoietic cells by upregulating transcription of several endogenous genes [82]. Interestingly, protein dimerization enhanced the binding of the MLL1 amino-terminus to regulatory regions, leading to upregulation of linked genes [82].

Since normal forms of MLL1 self-associate via the SET domain to produce homo-oligomeric complexes [81], it seems plausible that, as observed for leukemic cells [82], association of MLL1 molecules could enhance the affinity of MLL1 for DNA. Furthermore, self-association might operate in linking MLL1 molecules so that they would reside simultaneously on different maintenance elements in chromosomes [81]. This mechanism would integrate the activity of MLL1 in activation of a target gene, shared target genes, or both [81]. Propagation of MLL1 association with DNA may arise from a combination of two molecular events: binding of MLL1 to overlapping morphemes and MLL1 oligomerization via the SET domain. Cooperative DNA binding, via self-association, often increases the DNA binding specificity of a protein. We imagine that overlapping MLL1 morphemes may facilitate MLL1 self-association, linking MLL1 molecules to reside cooperatively on DNA sequence elements (TREs) that maintain cellular memory during development. Our data indicate that such TREs also could function in gene bookmarking to preserve the memory of highly active genes during mitosis. Also, one could imagine that overlapping morpheme occurrences may facilitate localized propagation of MLL1 binding to DNA to maintain a nucleosome-free region and, thus, an open chromatin configuration.

Figure 13 Zoom-out view of morpheme occurrences in the MYC locus. [Genome browser view, hg19, chr8; axis ticks 128,735,000 to 128,760,000.] Horizontal green bars mark the position of CGIs. Horizontal brown bars mark the chromatin segment bookmarked by MLL1 during mitosis [31]. Track labeled “MLL1 sites” marks the position of MLL1 morphemes; “non-motifs” mark the position of sequences not found in results of SELEX assays [24]. Track labeled “Layered H3K4me3” shows the position of H3K4me3 marks [59].

Figure 14 Zoom-out view of morpheme occurrences in the PABPC1 locus. [Genome browser view, hg19, chr8; axis ticks 101,705,000 to 101,750,000.] Horizontal green bar marks the position of CGIs. Horizontal brown bar marks the chromatin segment bookmarked by MLL1 during mitosis [31]. Track labeled “MLL1 sites” marks morpheme positions; “non-motifs” mark the position of sequences not found in results of SELEX assays [24]. Track labeled “Layered H3K4me3” shows the position of H3K4me3 marks [59].

Conclusions

Annotation of the human genome has involved numerous experimental and computational strategies to identify and describe DNA sequences that are important to cellular functions. However, despite cutting-edge advances, we lack a complete understanding of the function of CpG islands, which were discovered some time ago [4,5]. Results of our analysis provide suggestive evidence for specific sequence motifs in CGIs that may function in the recruitment of MLL1 to mitotic chromatin. We show that various combinations of MLL1 morphemes occur in chromatin regions bookmarked by MLL1 during mitosis [31]. Thus, our results implicate the MLL1 morphemes in sequence features that define the mammalian TREs. Our results also suggest a role for overlapping morphemes in producing multiple MLL1 binding events, linking MLL1 molecules so that they would reside simultaneously on different maintenance elements in chromosomes, as previously proposed [81].

Our findings also may explain why CGIs often extend to include promoter, exonic, and intronic sequences of genes. By binding CGIs, MLL1 might preserve and maintain an open chromatin configuration to regulate gene expression and to facilitate rapid gene activation upon mitotic exit. Association of MLL1 with CGIs agrees with a global role for MLL1 in regulation of transcription [47].

Apparently, our findings provide the first evidence for the existence of potential TREs in mammalian genomic DNA and the first evidence for a connection between CGIs and gene bookmarking by MLL1 to transmit the memory of highly active chromatin states during cell division. Because of the strong connection of TREs and PREs in Drosophila [32], we speculate that the MLL1 morphemes may play a dual role: (1) contribute directly to the recruitment of mammalian TrxG complexes to chromatin and (2) contribute indirectly, or directly, to the recruitment of PRC2 complexes to chromatin to repress transcription. This possibility is consistent with the finding that the mammalian PRC2 repressive complex binds CGIs [34] and our discovery of frequent occurrences of MLL1 morphemes in CpG islands.

Methods

Identification of MLL1 morphemes and their localization in human genomic DNA

We identified the MLL1 binding units by analyzing results of reported SELEX- and PCR-based assays. These assays were conducted to determine the DNA binding properties of the MLL1 MT-domain [24]. In our analyses, we included counting schemes to assess the number of CpGs and to identify nucleotides that appeared between CpG dinucleotides in each cloned insert.

To count genomic occurrences of MLL1 morphemes, we downloaded nucleotide sequences of CGIs and human chromosomes from the human genome browser at UCSC [83]. A Perl script was written to determine occurrences of each morpheme in downloaded sequences and to create outputs displaying the results. We followed various counting schemes. We found that including or omitting morpheme overlaps gave about the same number of counts (variation among procedures was less than 10%).
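
The difference between counting schemes can be sketched as follows. This is a minimal Python illustration rather than the Perl script used in the study; it assumes one possible reading of "including or omitting overlaps" (self-overlapping matches of a single motif), and the function names and the toy sequence are hypothetical.

```python
import re


def count_overlapping(seq, motif):
    """Count every start position of motif, including overlapping matches."""
    return sum(1 for _ in re.finditer(f"(?={re.escape(motif)})", seq))


def count_non_overlapping(seq, motif):
    """Count left-to-right matches that share no bases with one another."""
    return seq.count(motif)


def count_table(seq, motifs):
    """Map each motif to (overlapping count, non-overlapping count)."""
    seq = seq.upper()
    return {
        m: (count_overlapping(seq, m), count_non_overlapping(seq, m))
        for m in motifs
    }


# Toy sequence for illustration only; not taken from the genome.
toy_sequence = "ttCGCGCGaaCGCCGttCGCCCGCGcc"
table = count_table(toy_sequence, ["CGCG", "CGCCG", "CGCCCG"])
for motif, (with_overlaps, without_overlaps) in table.items():
    print(motif, with_overlaps, without_overlaps)
```

The two schemes differ only where a motif overlaps itself, as CGCG does inside CGCGCG, which is consistent with the small variation between counting procedures reported above.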

To localize genomic positions of MLL1 morphemes, we retrieved genomic DNA (hg19) from the Genome Browser at UCSC [83]. Sequence analyses involved scanning the human genome for morpheme occurrences, using Perl scripts [50]. Similarly, we developed scripts to create outputs (BED files) to display the position of MLL1 morphemes on the Genome Browser at UCSC [50]. Tools offered by the browser facilitated examining genomic maps in the context of landmarks, including the position of genes, CGIs, and chromatin modification patterns [83,84].

Studies of promoter sequences of human genes

To analyze promoter sequences of POLII genes, we obtained the accession numbers of human cDNAs from the UCSC database [83]. Heather Trumbower (at UCSC) wrote queries and retrieved the accession numbers of 44,338 cDNAs, organized according to their position in human chromosomes. To reduce sequence redundancy, we selected one cDNA per gene. Subsequently, we computationally removed cDNAs that appeared to be incomplete. Accession numbers of the remaining cDNAs were uploaded to the table browser at UCSC to obtain the nucleotide sequence of corresponding promoters: −500 to the transcription start site. Since the human genome may contain multiple copies of a given gene [85], we chose one promoter to represent redundant genes.

Afterwards, we followed previously described methods [49,50,86] to create a database for retrieving information about the final set (15,906) of promoters. The database (RF_data_06) tracked the number of occurrences as well as the position of all possible 9-mers in POLII promoters, with respect to TSSs. For statistical evaluations, the database included counts of 9-mers in total human genomic DNA and in repetitive DNA sequences [49]. For promoter analyses, we queried RF_data_06 to obtain counts for a given subsequence (i.e. CGCG, CGNCG, CGNNCG, and MLL1 morphemes) at each nucleotide position (−500 to −1).

Statistical evaluation

For statistical evaluation, we followed a previously described approach [49]. Briefly, Regnier and Szpankowski have shown that occurrences of words in a randomly generated text (based on either a Bernoulli or a Markov model) are normally distributed around a mean [87]. We used their findings to perform statistical derivations based on the principle of large deviations [49].

We chose the following notations: $L_G$, length of total genomic DNA; $L_E$, total length of CGIs; and $L_F$, length of regions that do not correspond to CGIs. Thus, $L_F = L_G - L_E$. Subsequently, we created a motif table ($w_1, \ldots, w_M$), consisting of MLL1 morphemes, to identify elements that matched sequences in $L_E$ and $L_G$. For $1 \le i \le M$, we denote by $E_i$, $F_i$, and $G_i$, respectively, the frequency of the $i$th element ($w_i$) in $L_E$, $L_F$, and $L_G$. Since $L_E$ is significantly shorter than $L_G$, as an approximation we assume $|L_E| < |L_F| \approx |L_G|$.

Quantities of interest are total counts normalized with respect to the length of analyzed sequences:

$$e_i = \frac{E_i}{L_E}, \qquad f_i = \frac{F_i}{L_F}, \qquad g_i = \frac{G_i}{L_G}$$

We made two additional justifiable approximations: $f_i \approx g_i$ and $f_i \approx p(w_i)$. Since $|L_G|$ is very large, within the margin of error, $f_i$ approximates the probability of occurrence of morpheme $w_i$ in genomic DNA.

As previously [49], we aimed to determine a threshold $\theta_{th}$ so that we could assign statistical significance to cases in which $e_i > \theta f_i$ (or $e_i > \theta g_i$). Evaluations require comparing empirical data to a reference model. For reference, we chose a probabilistic model assuming that the genome is generated by a memoryless or Markov source. In this model, $e_i$ and $f_i$ become random variables. As detailed above, we simplified the analysis by assuming that $f_i = p(w_i)$ is a constant. Subsequently, we determine whether, for a given $\theta$, the event $e_i > \theta f_i$ is statistically significant, provided that the probability of $e_i > \theta f_i$ is smaller than $\delta$. That is, $P(e_i > \theta f_i) < \delta$ (the chance of randomness that generates the event $e_i > \theta f_i$ is very small). We set $\delta = 10^{-50}$ to compute the $\theta_{th}$ threshold.
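
The significance criterion can be made concrete with a small numerical sketch. It is an illustration under stated assumptions, not the authors' program: occurrences are treated as independent Bernoulli trials (overlaps ignored), the threshold is obtained from the standard multiplicative Chernoff bound exp(−με²/3), which holds for 0 < ε ≤ 1, and all counts and lengths below are hypothetical.

```python
import math


def normalized_frequency(count, length):
    """Occurrences per base, e.g. e_i = E_i / L_E or f_i = F_i / L_F."""
    return count / length


def chernoff_threshold(expected_count, delta):
    """Return theta = 1 + eps, where exp(-mu * eps**2 / 3) equals delta.

    Assumption: the multiplicative Chernoff bound for a sum of independent
    Bernoulli variables, valid for 0 < eps <= 1.
    """
    eps = math.sqrt(3.0 * math.log(1.0 / delta) / expected_count)
    return 1.0 + eps


def is_enriched(E_i, L_E, F_i, L_F, delta=1e-50):
    """Test e_i > theta * f_i; return (significant, e_i / f_i, theta)."""
    e_i = normalized_frequency(E_i, L_E)
    f_i = normalized_frequency(F_i, L_F)  # stands in for p(w_i)
    mu = L_E * f_i                        # expected count inside islands
    theta = chernoff_threshold(mu, delta)
    return e_i > theta * f_i, e_i / f_i, theta


# Hypothetical counts and lengths, chosen only to exercise the arithmetic.
significant, ratio, theta = is_enriched(
    E_i=50_000, L_E=20_000_000, F_i=300_000, L_F=3_000_000_000
)
print(significant, round(ratio, 2), round(theta, 3))
```

With these made-up inputs the expected island count is 2,000, the threshold is roughly 1.42, and an observed 25-fold excess clears it easily; with a very small expected count the computed ε would exceed 1 and this particular bound would no longer apply.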

From [87], we knew that $E_i$ values should be normally distributed around a mean

$$\mathbf{E}[E_i] = L_E\, p(w_i)$$

When $E_i$ does not deviate from the mean by more than $O\!\left(\sqrt{L_E\, p(w_i)}\right)$,

$$E_i \sim N\!\left(L_E\, p(w_i),\; L_E\, \sigma^2(w_i)\right)$$

where $N(\mu, \sigma^2)$ denotes the normal distribution with mean $\mu$ and variance $\sigma^2$. When $E_i$ deviates from the mean by more than $O\!\left(\sqrt{L_E\, p(w_i)}\right)$, another probabilistic law would govern the $E_i$ behavior: namely, the large deviations law [56]. Previously, Regnier and Szpankowski [87] proved

$$P\big(E_i > (1+\varepsilon)\, L_E\, p(w_i)\big) \sim \frac{1}{\sqrt{2\pi L_E}} \exp\big(-L_E\, I(\varepsilon)\big) \qquad (1)$$

where $I(\varepsilon)$ is a complicated function of $\varepsilon$ that depends on moment generating functions [88]. To compute the threshold $\theta = 1 + \varepsilon > 1$, we estimate $\varepsilon$ from

$$P\big(e_i > (1+\varepsilon)\, p(w_i)\big) < \delta$$

That equation translates into $P\big(E_i > (1+\varepsilon)\, L_E\, p(w_i)\big) < \delta$, which is clearly within the large deviations domain.

For the analyses, we need to apply Eq. (1). However, numerical computations of the large deviation function $I(\varepsilon)$ are rather cumbersome. Therefore, we followed approximations, noting that a good bound was needed only for the large deviation probability. Ignoring overlapping morphemes, $E_i$ would be a sum of independent Bernoulli random variables. In that case, the following bound can be found (cf. for example, Ref. [88]):

$$P\big(E_i > (1+\varepsilon)\, L_E\, p(w_i)\big) < \exp\!\left(-\frac{L_E\, p(w_i)\, \varepsilon^2}{3}\right) \qquad (2)$$

To be rigorous and take into account overlapping morphemes, we must somewhat relax equation (2). Referring to Azuma's inequality (cf. Ref. [56]), we obtain:

$$P\big(E_i > (1+\varepsilon)\, L_E\, p(w_i)\big) \le \exp\!\left(-\frac{L_E\, p(w_i)\, \varepsilon^2}{2}\right) \qquad (3)$$

From equations (2) and (3), we obtain the following estimates for the threshold:

$$\theta_{th} = 1 + \sqrt{\frac{2\ln(1/\delta)}{L_E\, p(w_i)}}, \qquad \theta_{th} = 1 + \sqrt{\frac{3\ln(1/\delta)}{L_E\, p(w_i)}}$$

Additional files

Additional file 1: Figure S1. Occurrences of CpG-rich motifs in promoter regions of human protein-coding genes. Full magenta circles correspond to CGNNCG, blue circles to CGNCG, and empty red circles to CGCG. Motif frequencies are shown as a function of nucleotide positions in promoter sequences, numbered with respect to TSSs.
Additional file 2: Table S1. Frequency of morphemes and non-motifs in promoter sequences of POLII genes.
Additional file 3: Table S2. Counts of expected and observed morpheme occurrences in CpG islands.
Additional file 4: Table S3. Counts of expected and observed non-motif occurrences in CpG islands.

Abbreviations

CGIs: CpG islands; H3K4: Lysine 4 on histone H3; PcG: Polycomb group; POLII: RNA polymerase II; PRC1: Polycomb repressive complexes 1; PRC2: Polycomb repressive complexes 2; PREs: Polycomb response elements; TREs: Trithorax response elements; TSSs: Transcription start sites; TrxG: Trithorax group; UCSC: University of California Santa Cruz.

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

MB designed experiments, performed analyses, and wrote the manuscript. PW wrote programs for creating the database and performed statistical evaluations. EN analyzed results of SELEX assays and identified the MLL1 morphemes. NZ mapped the position of MLL1 morphemes in human genomic DNA. JX improved the programs used in mapping studies. RP and ZG analyzed a listing of human protein-coding genes downloaded from the human genome browser at UCSC. MF created a database of 9-mers for studies of promoter regions of human genes. BF contributed to statistical evaluations. DW wrote programs to map the position of MLL1 morphemes in human chromosomes. All authors read and approved the final manuscript.

Acknowledgements

We thank Heather Trumbower for retrieving from the UCSC genome browser a listing of human protein-coding genes. We thank Arnold Stein for helpful discussions and for his critical review of the manuscript.

Received: 2 August 2013 Accepted: 16 December 2013 Published: 28 December 2013

References

1. International Human Genome Sequencing Consortium: Initial sequencing and analysis of the human genome. Nature 2001, 409(6822):860–921.
2. Taylor J: Clues to function in gene deserts. Trends Biotechnol 2005, 23(6):269–271.
3. Wilusz JE, Sunwoo H, Spector DL: Long noncoding RNAs: functional surprises from the RNA world. Genes Dev 2009, 23(13):1494–1504.
4. Deaton AM, Bird A: CpG islands and the regulation of transcription. Genes Dev 2011, 25(10):1010–1022.
5. Gardiner-Garden M, Frommer M: CpG islands in vertebrate genomes. J Mol Biol 1987, 196(2):261–282.
6. Illingworth RS, Bird AP: CpG islands – 'a rough guide'. FEBS Lett 2009, 583(11):1713–1720.
7. Cross SH, Bird AP: CpG islands and genes. Curr Opin Genet Dev 1995, 5(3):309–314.
8. Zhao Z, Han L: CpG islands: algorithms and applications in methylation studies. Biochem Biophys Res Commun 2009, 382(4):643–645.
9. Irizarry RA, Ladd-Acosta C, Wen B, Wu Z, Montano C, Onyango P, Cui H, Gabo K, Rongione M, Webster M, et al: The human colon cancer methylome shows similar hypo- and hypermethylation at conserved tissue-specific CpG island shores. Nat Genet 2009, 41(2):178–186.
10. Doi A, Park IH, Wen B, Murakami P, Aryee MJ, Irizarry R, Herb B, Ladd-Acosta C, Rho J, Loewer S, et al: Differential methylation of tissue- and cancer-specific CpG island shores distinguishes human induced pluripotent stem cells, embryonic stem cells and fibroblasts. Nat Genet 2009, 41(12):1350–1353.
11. Shen J, Wang S, Zhang YJ, Wu HC, Kibriya MG, Jasmine F, Ahsan H, Wu DP, Siegel AB, Remotti H, et al: Exploring genome-wide DNA methylation profiles altered in hepatocellular carcinoma using Infinium HumanMethylation 450 BeadChips. Epigenetics 2013, 8(1):34–43.
12. Wang D, Liu X, Zhou Y, Xie H, Hong X, Tsai HJ, Wang G, Liu R, Wang X: Individual variation and longitudinal pattern of genome-wide DNA methylation from birth to the first two years of life. Epigenetics 2012, 7(6):594–605.
13. Bina M: Gene regulation. Methods Mol Biol 2013, 977:1–11.
14. Kouzarides T: Chromatin modifications and their function. Cell 2007, 128(4):693–705.
15. Ruthenburg AJ, Allis CD, Wysocka J: Methylation of lysine 4 on histone H3: intricacy of writing and reading a single epigenetic mark. Mol Cell 2007, 25(1):15–30.
16. Zhou VW, Goren A, Bernstein BE: Charting histone modifications and the functional organization of mammalian genomes. Nat Rev Genet 2011, 12(1):7–18.
17. Shilatifard A: The COMPASS family of histone H3K4 methylases: mechanisms of regulation in development and disease pathogenesis. Annu Rev Biochem 2012, 81:65–95.
18.RowleyJD: Rearrangementsinvolvingchromosomeband11Q23inacute leukaemia. SeminCancerBiol 1993, 4 (6):377 – 385. 19.BernardOA,BergerR: Molecularbasisof11q23rearrangementsin hematopoieticmalignantproliferations. GenesChromosomesCancer 1995, 13 (2):75 – 85. 20.MunteanAG,HessJL: Thepathogenesisofmixed-lineageleukemia. AnnuRevPathol 2012, 7: 283 – 301. 21.CosgroveMS,PatelA: Mixedlineageleukemia:astructure-function perspectiveoftheMLL1protein. FEBSJ 2010, 277 (8):1832 – 1842. 22.MaQ,AlderH,NelsonKK,ChatterjeeD,GuY,NakamuraT,CanaaniE,CroceCM, SiracusaLD,BuchbergAM: AnalysisofthemurineAll-1generevealsconserveddomainswithhumanALL-1andidentifiesamotifsharedwithDNA methyltransferases. ProcNatlAcadSciUSA 1993, 90(13):6350 – 6354. 23.CierpickiT,RisnerLE,GrembeckaJ,LukasikSM,PopovicR,OmonkowskaM, ShultisDD,Zeleznik-LeNJ,BushwellerJH: StructureoftheMLLCXXC domain-DNAcomplexanditsfunctionalroleinMLL-AF9leukemia. NatStructMolBiol 2010, 17 (1):62 – 68. 24.BirkeM,SchreinerS,Garcia-CuellarMP,MahrK,TitgemeyerF,SlanyRK: TheMT domainoftheproto-oncoproteinMLLbindstoCpG-containingDNAand discriminatesagainstmethylation. NucleicAcidsRes 2002, 30 (4):958 – 965. 25.BachC,MuellerD,BuhlS,Garcia-CuellarMP,SlanyRK: Alterationsofthe CxxCdomainprecludeoncogenicactivationofmixed-lineageleukemia 2. Oncogene 2009, 28 (6):815 – 823. 26.LeeJH,SkalnikDG: CpG-bindingprotein(CXXCfingerprotein1)isa componentofthemammalianSet1histoneH3-Lys4methyltransferase complex,theanalogueoftheyeastSet1/COMPASScomplex. JBiolChem 2005, 280 (50):41725 – 41731. 27.RisnerLE,KuntimaddiA,LokkenAA,AchilleNJ,BirchNW,SchoenfeltK, BushwellerJH,Zeleznik-LeNJ: FunctionalspecificityofCpGDNA-binding CXXCdomainsinmixedlineageleukemia. JBiolChem 2013, 288 (41):29901 – 29910. 28.YuBD,HessJL,HorningSE,BrownGA,KorsmeyerSJ: AlteredHox expressionandsegmentalidentityinMll-mutantmice. Nature 1995, 378 (6556):505 – 508. 
29.HansonRD,HessJL,YuBD,ErnstP,vanLohuizenM,BernsA,vanderLugtNM, ShashikantCS,RuddleFH,SetoM, etal : MammalianTrithoraxandpolycombgrouphomologuesareantagonisticregulatorsofhomeoticdevelopment. ProcNatlAcadSciUSA 1999, 96 (25):14372 – 14377. 30.AytonP,SneddonSF,PalmerDB,RosewellIR,OwenMJ,YoungB,PresleyR, SubramanianV: TruncationoftheMllgeneinexon5bygenetargeting leadstoearlypreimplantationlethalityofhomozygousembryos. Genesis 2001, 30 (4):201 – 212. 31.BlobelGA,KadaukeS,WangE,LauAW,ZuberJ,ChouMM,VakocCR: A reconfiguredpatternofMLLoccupancywithinmitoticchromatin promotesrapidtranscriptionalreactivationfollowingmitoticexit. MolCell 2009, 36 (6):970 – 983. 32.RingroseL,ParoR: Polycomb/Trithoraxresponseelementsandepigenetic memoryofcellidentity. Development 2007, 134 (2):223 – 232. 33.SchuettengruberB,ChourroutD,VervoortM,LeblancB,CavalliG: Genome regulationbypolycombandtrithoraxproteins.Cell 2007, 128 (4):735 – 745. 34.MendenhallEM,KocheRP,TruongT,ZhouVW,IssacB,ChiAS,KuM, BernsteinBE: GC-richsequenceelementsrecruitPRC2inmammalianES cells. PLoSGenet 2010, 6 (12):e1001244. 35.BannisterAJ,KouzaridesT: Regulationofchromatinbyhistone modifications. CellRes 2011, 21 (3):381 – 395.Bina etal.BMCGenomics 2013, 14 :927 Page14of16 http://www.biomedcentral.com/1471-2164/14/927




doi:10.1186/1471-2164-14-927
Cite this article as: Bina et al.: Discovery of MLL1 binding units, their localization to CpG Islands, and their potential function in mitotic chromatin. BMC Genomics 2013 14:927.
http://www.biomedcentral.com/1471-2164/14/927