Tandemly repeated DNA families in the mouse genome


Material Information

Title:
Tandemly repeated DNA families in the mouse genome
Series Title:
BMC Genomics
Physical Description:
Book
Language:
English
Creator:
Komissarov, Aleksey S.
Gavrilova, Ekaterina V.
Demin, Sergey Ju
Ishov, Alexander M.
Podgornaya, Olga I.
Publisher:
BioMed Central
Publication Date:

Notes

Abstract:
Background: Functional and morphological studies of tandem DNA repeats, which make up a large portion of most genomes, are mostly limited by the incomplete characterization of these genome elements. We report here a genome-wide analysis of the large tandem repeats (TR) found in the mouse genome assemblies. Results: Using a bioinformatics approach, we identified large TR with array size more than 3 kb in two mouse whole genome shotgun (WGS) assemblies. Large TR were classified based on sequence similarity, chromosome position, monomer length, array variability, and GC content; we identified four superfamilies, eight families, and 62 subfamilies, including 60 not previously described. 1) The superfamily of centromeric minor satellite is only found in the unassembled part of the reference genome. 2) The pericentromeric major satellite is the most abundant superfamily and reveals high order repeat structure. 3) Transposable elements related superfamily contains two families. 4) The superfamily of heterogeneous tandem repeats includes four families. One family is found only in the WGS, while two families represent tandem repeats with either single or multi locus location. Despite multi locus location, TRPC-21A-MM is placed into a separate family due to its abundance, strictly pericentromeric location, and resemblance to big human satellites. To confirm our data, we next performed in situ hybridization with three repeats from distinct families. The TRPC-21A-MM probe hybridized to chromosomes 3 and 17, the multi locus TR-22A-MM probe hybridized to ten chromosomes, and the single locus TR-54B-MM probe hybridized with the long loops that emerge from chromosome ends. In addition to the chromosomes predicted in silico, several extra chromosomes were positive for TR by in situ analysis, potentially indicating inaccurate genome assembly of the heterochromatic genome regions. Conclusions: Chromosome-specific TR had been predicted for mouse but no reliable cytogenetic probes were available before. We report a new analysis that identified in silico and confirmed in situ the 3/17 chromosome-specific probe TRPC-21A-MM. Thus, the new classification has proven to be a useful tool for continuation of genome study, while annotated TR can be a valuable source of cytogenetic probes for chromosome recognition.

Record Information

Source Institution:
University of Florida
Holding Location:
University of Florida
Rights Management:
All rights reserved by the source institution.
System ID:
AA00008691:00001




Full Text


RESEARCH ARTICLE (Open Access)

Tandemly repeated DNA families in the mouse genome

Aleksey S Komissarov1, Ekaterina V Gavrilova1,2,3, Sergey Ju Demin1, Alexander M Ishov3 and Olga I Podgornaya1,2*

Abstract

Background: Functional and morphological studies of tandem DNA repeats, which make up a large portion of most genomes, are mostly limited by the incomplete characterization of these genome elements. We report here a genome-wide analysis of the large tandem repeats (TR) found in the mouse genome assemblies.

Results: Using a bioinformatics approach, we identified large TR with array size more than 3 kb in two mouse whole genome shotgun (WGS) assemblies. Large TR were classified based on sequence similarity, chromosome position, monomer length, array variability, and GC content; we identified four superfamilies, eight families, and 62 subfamilies, including 60 not previously described. 1) The superfamily of centromeric minor satellite is only found in the unassembled part of the reference genome. 2) The pericentromeric major satellite is the most abundant superfamily and reveals high order repeat structure. 3) Transposable elements related superfamily contains two families. 4) The superfamily of heterogeneous tandem repeats includes four families. One family is found only in the WGS, while two families represent tandem repeats with either single or multi locus location. Despite multi locus location, TRPC-21A-MM is placed into a separate family due to its abundance, strictly pericentromeric location, and resemblance to big human satellites.

To confirm our data, we next performed in situ hybridization with three repeats from distinct families. The TRPC-21A-MM probe hybridized to chromosomes 3 and 17, the multi locus TR-22A-MM probe hybridized to ten chromosomes, and the single locus TR-54B-MM probe hybridized with the long loops that emerge from chromosome ends. In addition to the chromosomes predicted in silico, several extra chromosomes were positive for TR by in situ analysis, potentially indicating inaccurate genome assembly of the heterochromatic genome regions.
Conclusions: Chromosome-specific TR had been predicted for mouse but no reliable cytogenetic probes were available before. We report a new analysis that identified in silico and confirmed in situ the 3/17 chromosome-specific probe TRPC-21A-MM. Thus, the new classification has proven to be a useful tool for continuation of genome study, while annotated TR can be a valuable source of cytogenetic probes for chromosome recognition.

* Correspondence: opodg@yahoo.com
1 Institute of Cytology RAS, 4 Tikhoretsky avenue, 194064, St. Petersburg, Russia
Full list of author information is available at the end of the article

Komissarov et al. BMC Genomics 2011, 12:531
http://www.biomedcentral.com/1471-2164/12/531

© 2011 Komissarov et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Background

Tandemly repeated DNA represents a significant portion of the mouse genome and includes centromere and pericentromere regions. Although historically referred to as "junk DNA", Tandem Repeats (TR) appear to provide unique structural and functional characteristics due to their tandem organization. Tandemly repeated DNA contains multiple copies of a repeat unit (or monomer) arranged in a head to tail fashion. Centromeres from fission yeast to humans contain TR, and pericentromeric regions enriched in TR appear to be critically important for establishing heterochromatin formation and proper chromosome segregation [1]. Some of these functions appear to involve RNA interference-mediated chromatin modifications [2-4].

TR content is well investigated in the human genome, and it shows a wide range of repeat sizes and organization, ranging from microsatellites of a few base pairs to megasatellites of up to several kilobases. Microsatellites and Variable Number Tandem Repeats (minisatellites or VNTRs) can be highly polymorphic and thus are used as genetic markers [5,6].

The centromeric region of human chromosomes contains alpha satellite DNA (satDNA), the largest TR family in the human genome. This family has been extensively studied and provides a paradigm for understanding the genomic organization of TR [7,8]. These tandem arrays are composed either of diverged monomers, with no higher order repeat structure, or of chromosome-specific Higher-Order Repeat (HOR) units characterized by distinct periodicity and arrangements of an integral number of basic monomers [9]. The HOR structure of human centromeric alpha satellite is important for centromere function [7].

In humans, the pericentromeric regions consist of alpha satDNA arrays that are surrounded by arrays of "classical" satellites (e.g. human satDNA 1-4) [10-13]. These pericentromeric regions have a specific high order chromatin structure and might be responsible for chromatin spatial organization.

In the house mouse, Mus musculus, centromeric and pericentromeric regions are represented by two highly conserved, tandemly repeated sequences known as minor and major satellites (MiSat and MaSat, respectively; SATMIN and GSAT_MM in Repbase nomenclature). MiSat are composed of 120-bp AT-rich monomers that occupy 300-600 kb of the terminal region of all mouse telocentric (single-armed) chromosomes; these TR serve as the site of kinetochore formation and spindle microtubule attachment [14-18]. MaSat is more abundant, is composed of 234-bp monomers, and resides adjacent to MiSat. MaSat are implicated in heterochromatin formation and sister chromatid cohesion [17,19]. Neither of these satDNA was identified at the centromere of the morphologically distinct acrocentric Y chromosome, which has a very short arm that distinguishes it from the telocentric autosomes and chromosome X [20]. Recently, the centromere of the Y chromosome was shown to contain a highly diverged MiSat-like sequence (designated Ymin) with HOR organization previously not described for mouse MiSat arrays [20].

Here we report the analysis of mouse large TR genome organization by combined bioinformatics and cytological approaches. All large TR found in two mouse whole genome shotgun assemblies (WGS) were classified into four superfamilies, eight families, and 62 subfamilies, including 60 not described yet. The proposed classification is based on array similarity, monomer length, the degree of unit similarity, position on the reference genome chromosome assemblies, and GC content. Three TR were selected for the experimental work due to their abundance in the WGS. All array-based probes recognize chromosomes predicted in silico. The data reported here represent the overall genome-wide assessment of the number, position and organization of large TR in the sequenced mouse genome. Annotated TR could be an important resource for further characterization and overall understanding of the mouse genome.

Results

Tandemly repeated DNA in mouse whole genome shotgun assemblies

For the initial search of large TR we used two WGS assemblies: the Mouse Genome Sequencing Consortium (MGSC) and Celera assemblies [21,22]. A WGS assembly comprises all shotgun sequencing reads assembled into contigs, including euchromatic and heterochromatic regions, even when the contigs are not yet assembled into gapped contigs or anchored on chromosomes. The regions enriched in TR are mostly not anchored, although TR, and in particular satDNA, are present in WGS due to their abundance in the genome.

To identify all TR with unit size up to 2 kb we used TRF (Tandem Repeats Finder [23]). The initial raw TRF output contains data redundancy due to nested repeats and repeats with the same coordinates but different unit sizes.
To eliminate this redundancy all nested repeats with array length less than the parent array were removed. In case of same coordinates an array with a longer unit size was removed. Both in MGSC (~3%) and Celera (~5%) WGS the amount of non-redundant TR is less than the experimentally determined amount of the MaSat alone (~8%) [24], indicating that even in WGS datasets TR remain underrepresented (Table 1). Since the mouse genome is enriched with micro- and minisatellites [21], we tried to get rid of them with a filter that excluded any array less than 3 kb. In both WGS collections we found 941 large TR (Table 1), which were further grouped into families due to sequence similarity (Figure 1).

Table 1 Tandem repeats in mouse WGS assemblies

Assembly     GPID    Size (bp)       Contigs     TR (all)    % of assembly   TR (>3 kb)
MGSC WGS     13183   2,477,633,597   224,713     849,466     2.9%            157
Celera WGS   11785   3,003,109,157   837,963     1,084,552   5.0%            784
Total WGS            5,480,742,754   1,062,676   1,934,018   3.8%            941

GPID - NCBI Genome Project ID. TR (all) - total amount found in assembly; MGSC - the mouse genome sequencing consortium.

Families and superfamilies

Each pair of arrays was compared by bl2seq program, and the score value was used as a measure of TR sequence similarity. Two tandem repeats were placed in the same subfamily if they had a bl2seq match with score greater than 90. This subdivided the TR into 62 subfamilies. We used the Blast search versus rodent Repbase repeat collection to check the similarity with known mouse repeats. This search determined only two known mouse satDNA: 715 arrays (~76%) represent pericentromeric MaSat and 21 (~2%) represent centromeric MiSat family (Table 2). The rest of the TR families are not present in Repbase; therefore, they were named according to their structure and genome position. For two families (C4, C5 in Table 2) the published nomenclature was used: single locus (SL) family for arrays found only once in the reference genome, whereas multilocus (ML) family for arrays found at more than one locus [5]. A subfamily name includes the letters TR (Tandem Repeat), genomic position (if known), minimal unit size in bp, index letter if there is more than one TR with similar unit size (A, B, etc), and suffix MM (Mus Musculus), with the latter present only in the tables and figures.

Figure 1 Overview of the large tandem repeats analysis. For each program only parameters that were changed are shown. The blastn was used for the Repbase search and for the genome mapping with parameters identical to bl2seq. TR family names are given according to the Table 2. The complete description of the workflow is given in Results and Methods.

Table 2 Mouse large tandem repeats classification

Superfamily          N  Family          Genome position  Arrays  % of TR  Subfamilies
A. Centromeric       1  MiSat           Cen*             21      2.2      1
B. Pericentromeric   2  MaSat           periCen*         715     76.0     1
C. Heterogeneous     3  TRPC-21A-MM     periCen          50      5.3      1
                     4  Multilocus      Any              57      6.0      20
                     5  Single locus    Any              56      6.0      29
                     6  Unplaced        Absent           11      1.2      8
D. TE-related        7  MTA related     Any              15      1.6      1
                     8  L1_MM related   Any              16      1.7      1

Eight families of the TR found in WGS were combined in four superfamilies (A-D). Families are formed according to sequence similarity and/or position in the reference genome. Arrays - the number of TR arrays found in WGS; % of TR - percent of all TR found in WGS; Subfamilies - number of subfamilies in family. (*) MaSat and MiSat position is determined by in situ data published; (D) - Tandem repeats related to transposable elements (TE); (MTA) - mouse transcript retrotransposon, (L1) - L1_MM.

The characteristic feature of superfamily C is the prominent variability of the TR, which could be divided into subfamilies. The most abundant is TRPC-21A, which has a strictly pericentromeric location (Table 2, C3). Multi and single locus families each represent ~6% of the TR dataset. Some of the TR arrays (~1%) from Unplaced family (UnP, Table 2, C6), which have a distinct monomer and relatively long arrays, are not found in the reference genome.

The superfamily D is formed by MTA-related and L1-related families (~3% together), which show structural characteristics related to dispersed transposable elements (TE), but are tandemly organized, and have several features quite distinct from the most members of the set (Table 2, D7, D8).

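The filtering and grouping steps described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `Array` record, the function names, and the toy data are hypothetical. The only values taken from the text are the 3 kb array-size limit and the score threshold of 90. Treating subfamily membership as transitive (single linkage) is also an assumption, since the text states only the pairwise rule.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

MIN_ARRAY_BP = 3000  # arrays shorter than 3 kb are excluded (from the text)
MIN_SCORE = 90       # pairwise score above which two arrays share a subfamily


@dataclass(frozen=True)
class Array:
    """Hypothetical record for one tandem repeat array reported by a TR finder."""
    name: str
    contig: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    unit: int   # monomer (unit) size in bp

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def is_redundant(a: Array, b: Array) -> bool:
    """True if array `a` should be dropped because of array `b`."""
    if a.contig != b.contig:
        return False
    if (a.start, a.end) == (b.start, b.end):
        return a.unit > b.unit  # same coordinates: drop the longer unit size
    nested = b.start <= a.start and a.end <= b.end
    return nested and a.length < b.length  # nested and shorter than the parent


def large_nonredundant(arrays: List[Array]) -> List[Array]:
    """Remove redundant arrays, then keep only arrays of at least 3 kb."""
    kept = [a for a in arrays
            if not any(is_redundant(a, b) for b in arrays if b is not a)]
    return [a for a in kept if a.length >= MIN_ARRAY_BP]


def group_subfamilies(names: List[str],
                      scores: Dict[Tuple[str, str], float]) -> List[List[str]]:
    """Single-linkage grouping: link two arrays when their score exceeds 90."""
    parent = {n: n for n in names}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), score in scores.items():
        if score > MIN_SCORE:
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Toy input (invented for illustration only).
toy_arrays = [
    Array("a1", "contig1", 1, 5000, 21),
    Array("a2", "contig1", 100, 900, 7),    # nested in a1 and shorter
    Array("a3", "contig1", 1, 5000, 42),    # same coordinates as a1, longer unit
    Array("a4", "contig2", 1, 2000, 30),    # shorter than 3 kb
    Array("a5", "contig3", 10, 4009, 58),
]
toy_scores = {("x", "y"): 120.0, ("y", "z"): 95.0, ("z", "w"): 40.0}

kept_names = [a.name for a in large_nonredundant(toy_arrays)]
subfamilies = group_subfamilies(["x", "y", "z", "w"], toy_scores)
```

In the toy data, `x` and `z` end up in one group even though they were never compared directly. That is the consequence of the single-linkage assumption and may differ from how borderline cases were handled in the original analysis.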
The relationship between families depending on monomer length, the degree of unit similarity and GC content is shown in the graph (Figure 2). The most clear and compact cloud is formed by MaSat arrays, though it is not as uniform as might have been expected from the experimental studies [24,25]. The MiSat cloud is in proximity to the MaSat cloud but forms a distinct group. In the area of relatively short monomer unit, two defined clouds of TRPC-21A and other multilocus TR are visible. The transposon-related TR form a loose cloud in the region of long monomer units. Arrays from SL and UnP families are scattered throughout the whole plot. It is likely that additional data from upcoming mouse genome resequencing could improve the classification of SL and UnP families.

Chromosome ends

Even for human, the best assembled mammalian genome, only chromosomes 8 and X have the higher-order repeat units known to be at the centromeric region [8,26,27]. The large regions of classical heterochromatin are poorly assembled [6], and for the mouse genome even less is known. Mouse telocentric chromosomes have extended TR arrays at the ends. That is the reason why these regions are difficult to assemble and chromosomes end abruptly in 3 Mb gaps reserved for centromeric regions.

We identified what kinds of TR precede these gaps (Table 3). The assembly of the chromosome ends does not allow TR to be found on all chromosomes, so we determined the distance from the gap to the first gene (Additional file 1, Table S1). Only two assemblies end up in MaSat arrays: chromosomes 9 and 11. Four assemblies end up in the newly found TRPC-21A (chromosomes 3, 4, 16 and 17).

Figure 2 TR arrays distribution graph. The graph of tandem repeat arrays distribution was done in Mathematica 7.0. Each circle represents one array found in WGS assemblies. Each family was colored according to the Table 2: centromeric MiSat (magenta); pericentromeric MaSat (blue); TRPC-21A-MM (orange); heterogeneous multilocus (ML, indigo); heterogeneous single locus (SL, yellow); heterogeneous Unplaced (UnP, burnt orange); TE-related tandem repeats (TE, green). X axis - monomer length (bp) up to 2 kb; Y axis - GC-content is normalized to 1; Z axis - similarity between monomers. A and B - different projections of the same graph.
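The three quantities plotted in Figure 2 (monomer length, GC content normalized to 1, and similarity between monomers) can be approximated per array as follows. This is a simplified sketch under stated assumptions: the function names are hypothetical, monomers are cut at fixed unit-length offsets, and similarity is the mean pairwise identity without gaps. An alignment-based tool also accounts for insertions and deletions, so its values will generally differ.

```python
from itertools import combinations
from typing import List


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the sequence (0.0 to 1.0)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def split_monomers(array_seq: str, unit: int) -> List[str]:
    """Cut the array into consecutive full-length monomers.

    A trailing partial unit is ignored.
    """
    return [array_seq[i:i + unit]
            for i in range(0, len(array_seq) - unit + 1, unit)]


def mean_monomer_similarity(array_seq: str, unit: int) -> float:
    """Mean pairwise identity between monomers (needs at least two monomers)."""
    monomers = split_monomers(array_seq.upper(), unit)
    identities = [
        sum(x == y for x, y in zip(a, b)) / unit
        for a, b in combinations(monomers, 2)
    ]
    return sum(identities) / len(identities)


# Toy arrays (invented): a perfect repeat and one with a single substitution.
perfect = "ACGT" * 3
diverged = "ACGT" + "ACGA" + "ACGT"

perfect_gc = gc_content(perfect)
perfect_similarity = mean_monomer_similarity(perfect, 4)
diverged_similarity = mean_monomer_similarity(diverged, 4)
```

Monomer variability, as used in the tables, would then correspond to one minus this similarity, again only under the gap-free assumption made here.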


On chromosomes 4 and 17 the arrays of TR-22A and TR-27A are followed by TRPC-21A. TR-22A arrays are also found at the very ends of chromosomes 6 and 18. We found out that only eight chromosome ends contain TR arrays and six of them are distinct from the pericentromeric MaSat.

MiSat (minor satellite) and MaSat (major satellite) families

The previous experimental data indicated the sequence uniformity of mouse satDNA, i.e. MaSat monomers variability is less than 5% [25], and ~5.6% variation is found between MiSat monomers [28]. MaSat and MiSat are both AT-rich (64% and 66% respectively), and share stretches of sequences with 83% homology [16]. MiSat arrays were not found in the assembled reference genome. However, Chromosome Unknown (ChrUn) contains MiSat (Additional file 1, Table S2). Centromeric position of MiSat in Table 2 is given according to fluorescent in situ hybridisation (FISH) [29-31]. All the MiSat arrays (the longest array is ~6 kb) are AT-rich, with GC content no more than 33%. Monomer variability of MiSat family is the lowest of all families except TE-related superfamily. In accordance with the data published [18,20,28,32] and low monomer variability MiSat arrays do not have a prominent HOR structure. One third of the arrays have the 120 bp monomer unit reported for MiSat [14,28,32]. The rest has units of 112 bp, 223 bp, 232 bp and one of the units is 1054 bp. The unit difference may be a base for the HOR structure, but the limited number of MiSat arrays found in WGS makes it difficult to draw conclusions on this point right now.

The pericentromeric AT-rich MaSat is formed by 234 bp heterotetramer that consists of four different 58-60 bp monomers with common motif [24]. MaSat is the most abundant family in WGS (Table 2). Very few MaSat arrays found in the WGS exceed 10 kb, with the longest being ~23 kb (Additional file 2, NN 234 and 316). The array of 38 kb is found at the end of chromosome 9 in the reference genome (Table 3). This feature, the array length, differs from the human genome, where alpha satDNA are assembled in arrays with length > 100 kb [6].

The MaSat family has GC content no more than 37% and the mean monomer variability of 30%. The MaSat has two common unit size variants: 35% of arrays have the experimentally described 58-59 bp monomer [24] and 31% have the 234 bp classical monomer (Figure 3).
MaSat arrays with short monomers have the most prominent variability (~30% for 58 bp unit). Arrays with 234 bp monomer show the lowest rate of the variability, with a mean of ~15% (NN 397-617 in Additional file 2). Very few of the arrays have variability about 5%. Thus, bioinformatics approach does not confirm the high degree of MaSat sequence conservation that was concluded from the experimental data [25].

The high rate of the unit variability suggests the existence of a HOR structure in the array. This was checked with a dot-plot similarity analysis where the sequence is self-compared with the fixed 13 bp window (Figure 4). A degree of similarity is indicated by a grey scale where a darker grey represents higher degree of similarity. Therefore, repeated units with high similarity look like diagonal lines, and repeated motifs look like square patterns. We found that about 60% of MaSat arrays have a HOR structure with a clear "tartan" pattern (Figure 4A). A conservative 234 bp heterotetramer (58+60+58+58 bp units) is visible at higher magnification (Figure 4C). Moreover, each unit consists of two less conservative 28 bp and 30 bp subunits (Figure 4D).

TRF output contained MaSat arrays with a unit size of more than 1000 bp (Figure 3; Additional file 2, NN 698-715). It is likely that MaSat has units even larger than 2 kb, which are not detected by the TRF search that was restricted to a maximal unit size of 2 kb. Nevertheless the black and white dot-plot with 51 bp window size demonstrates the overall difference between HORs in different MaSat arrays and confirms the existence of ~2 kb HOR (Additional file 3, Figure S1A, B).

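The self-comparison dot plot described above can be illustrated with a small Python sketch. It is an assumption-laden toy, not the software used for Figure 4: the function names are hypothetical, and similarity is taken as the fraction of identical bases between two windows of fixed length (13 bp by default, as in the text). In a grey-scale rendering, higher values would be drawn darker.

```python
from typing import List


def window_identity(seq: str, i: int, j: int, window: int) -> float:
    """Fraction of identical bases between the windows starting at i and j."""
    matches = sum(1 for k in range(window) if seq[i + k] == seq[j + k])
    return matches / window


def self_dotplot(seq: str, window: int = 13) -> List[List[float]]:
    """Square matrix of window identities for a sequence compared with itself."""
    n = len(seq) - window + 1
    return [[window_identity(seq, i, j, window) for j in range(n)]
            for i in range(n)]


def mean_diagonal(matrix: List[List[float]], offset: int) -> float:
    """Mean similarity along the diagonal shifted by `offset` positions."""
    n = len(matrix)
    values = [matrix[i][i + offset] for i in range(n - offset)]
    return sum(values) / len(values)


# Toy array (invented): six exact copies of an arbitrary 21 bp unit.
unit = "ACGGTCATTGCAAGCTTAGGC"
toy_array = unit * 6
plot = self_dotplot(toy_array, window=13)

main_diagonal = mean_diagonal(plot, 0)          # sequence against itself
repeat_diagonal = mean_diagonal(plot, len(unit))  # shifted by one unit
off_period = mean_diagonal(plot, 10)            # shift that is not a period
```

A perfect repeat gives a diagonal of full identity at every multiple of the unit length. In a real array with a higher-order structure, the strongest off-diagonal lines would be expected at multiples of the HOR length rather than of the basic monomer.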
A prominent difference between MaSat arrays could be expected from dot-plot analysis (Figure 4A). The form of MaSat cloud on Figure 2 also suggests that MaSat is not as uniform as it was previously thought [30]. We suppose that being cloned and assembled each MaSat array might come to the different chromosomes, and then chromosome specificity could be suspected for MaSat previously counted as uniform.

Table 3 TR arrays in the region adjacent to centromeric gap

Chromosome  TR subfamily   Array length (kb)  Coordinates (bp)
3           TRPC-21A-MM    33.6               3000001-3033629
4           TRPC-21A-MM    7.0                3006469-3013522
4           TR-22A-MM      4.9                3104899-3109811
6           TR-22A-MM      9.9                3082006-3091879
9           MaSat          38.4               3000003-3038419
11          MaSat          3.9                3000004-3003872
16          TRPC-21A-MM    9.0                3232335-3241336
17          TRPC-21A-MM    32.5               3006399-3038945
17          TR-27A-MM      4.6                3070530-3075093
18          TR-22A-MM      8.0                3112790-3120776

Only TR with the array more than 3 kb in the distance up to 2 Mb from the centromeric gap is shown. TR - TR name is given according to Tables 4 and 5. Coordinates - the array position on chromosome.

TRPC-21A-MM family

The second largest family in WGS is TRPC-21A (Heterogeneous TR, family C3, Table 2). It is more GC-rich in comparison to MiSat and MaSat, but its monomer variability is nearly the same (Table 4). In four cases, when it was found in the assembled genome, it is localized to the very end of centromeric gap (Table 3). Only on chromosome 7 it is placed in the internal band (7D1, Table 4). Moreover, TRPC-21A arrays are found in ChrUn which contains mostly pericentromeric regions (indicated by PC suffix in TRPC-21A name).

All TRPC-21A arrays were divided into ten groups according to the similarity to the specific locus in the reference genome (Table 4). The longest array of ~30 kb (N 35 in Additional file 1, Table S3) probably belongs to chromosome 17 due to the high sequence and length similarity with the array at the end of this chromosome (Table 3). Most arrays show similarity with the band 3A that has the large TRPC-21A field at the end of chromosome (Tables 3 and 4).

Arrays of TRPC-21A are organized by multiplication of the basic 21 bp unit, although TRPC-21A arrays are more homogeneous than MaSat arrays (Figure 5). All TRPC-21A arrays have a HOR structure on dot-plot. In this case even 60-mer units appeared (Figure 5A). PCR with specific primers on the template of total M. musculus DNA gave the ladder for TRPC-21A as well as for MaSat, indicating the characteristic feature of the satDNA, also caused by variable monomers organized in HOR (data not shown).

All the features of TRPC-21A are those of a "big classical" satDNA such as human satellites 1-4 [33]. They are known to be chromosome-specific. For example, the bulk of human satellite 3 (HS3) is located on chromosome 1, but it could be distinguished from HS3 on chromosome 9 [34]. To design a FISH probe for TRPC-21A we selected the array with a high similarity to the band 3A2.

Multilocus, single locus and unplaced families

The Heterogeneous TR superfamily (Table 2) is classified into families according to their presence (ML, SL) or absence (UnP) in the reference genome (Tables 5, 6, and 7). The most abundant ML subfamily, TR-22A, was found in four loci in the reference genome; three are associated with centromeric gap (Table 3 and 4A2, 6A2, 18B2 in Table 5) and one is located more distant from the centromeric gap (7A2, Table 5).

ML TR-4A consists of a very short AT-rich unit.
AboutahalfoftheMLsubfamiliesispresentonthesex chromosomes(Table5).Itcouldbeexplainedbymore accurateassemblyoftheheterochromaticregionsonthe sexchromosomesrelativetoautosomes.Ontheother hand,itisknownthatthesexchromosomeshave uniqueDNArepeats[35-37]andMLTR-4Acanbeone ofthem. Despitetheminimalsequencesimilarity,severalML andSLsubfamilieshavesimilarGC-content,unitsize, andarrayvariability,formingthreevisuallydistinct groups(clouds)onthegraph:GC-rich,AT-rich,and GC-neutral(Figure2). TR-22AsubfamilyisthecoreofGC-richcloudinthe areaof55-60%GC,whileTR-6A,TR-57A,TR-16Aand TR-31Barecloselyadjoined.Atleastonesubfamily Figure3 MaSatunitlengthdistribution.Xaxis-unitlength(bp);Yaxis-numberofthearrayswithcorrespondentunit.Thedetaileddataare showninAdditionalfile2.Twomainpeaksrepresent58-59bpand234bpunits;presenceoflargerunitscanbeinterpretedastheHOR structureforMaSat. Komissarov etal BMCGenomics 2011, 12 :531 http://www.biomedcentral.com/1471-2164/12/531 Page6of21

PAGE 7

fromSL,TR-31D,alsobelongstothisgroup(Additional file1,TableS3). ThecoreofAT-richcloudintheareaof40-45%GC isformedbysubfamiliesfromSL(TR-17A,TR-38A, TR-39Aandother).However,severalML,suchasTR81A,alsogravitatetowardsthiscloud.SeveralUnP arrays(TR-24B,TR-28Aandother)belongtothisgroup aswell(Figure2Table7).TwoofsubfamiliesfromATrichcloud(TR-39A,TR-44A)areembeddedintothe MaSatcloudorgravitatetoMaSat(TR-81A,TR-4A). SeveralMLsubfamilies(TR-31A,TR-58A)andSL subfamilies(TR-54B,TR-29A,TR-24C)arraysgravitate uponTRPC-21AandformcloudwithneutralGC-content(45-55%GC).MostofthemhaveHORandarepresentinChrUn(Tables5and6). AnumberofMLarrays(Table5,NN19-22)andSL arrays(Table6,NN29-34)haveaverylongmonomer of>1kb,althoughthestructureofthelongSLandML Figure4 MaSatHORstructure .A:ThedotplotoftheMaSatarrayN707(Additionalfile2);awindowsizeis13bp,sequencesimilarityis showningrayscale.HORunitsareshownasarrowswithindicatedlength;smallerarrowsindicateHORsubunits;differentcolorsandletters indicatesubunitsvariants.B:2154bpHORunitstructure;thecolorcodefordifferentunitsisshown.C:ThestructureofconventionalMaSat234 bpheterotetramer.D:58bpunitisbuiltof28bpand30bpsubunitsconsistingof7-11bpsubunits;lettersindicatesubunitsvariants. Table4TRPC-21A-MMfamilyNUnit (bp) ChromoBandsArraysGC % Length (bp) Var % 142*3A2149.3473927 2213A2,4A2750.1528829 3633A2,17A2248.6441729 44216A2,17A2248.22988431 5217D1,16A2,17A2650.1495629 6213A2,4A2,17A2748.9769830 7213A2,16A2,17A2548.71719831 8213A2,7D1,16A2,17A21849.51568429 92093A2,4A2,16A2,17A2148.4748129 10213A2,4A2,7D1,16A2,17A2149.4802129ArraysoftheTRPC-21A-MMfamily(Heterogeneoussuperfamily,Table2,C3). N-rowindex;Unit(bp)-minimalunitlength;ChromoBands-chromosomal positionsinthereferencegenome;Arrays-numberofarraysinWGSwith similaritytothosechromobands;GC%-meanarrayGCcontent;Length(bp) -maximalarraylength;Var%-meanvariabilitybetweenmonomersinarray. AllarrayshaveHORandpresentinChrUn,socorrespondentcolumnsare omittedfromthistable. 
*thissequencewasusedfortheFISHprobesdesign.Komissarov etal BMCGenomics 2011, 12 :531 http://www.biomedcentral.com/1471-2164/12/531 Page7of21

PAGE 8

TRdoesnotshowextensivesimilaritywithknownTEs. However,TRwithlongunitsthatareclassifiedasML orSLfamiliescouldbebuiltonthebaseofverydivergentorunknownTE.TheexistenceofsuchTEswas predictedinvertebrategenomes[38]. AlistofarraypositionsinWGSfortheHeterogeneoussuperfamilyisgivenintheAdditionalfile1, TableS3.TransposableelementsrelatedtandemrepeatsTwofamilieshavestructurals imilaritytotransposable elements(TE,superfamilyD,Table2).Thearraysare formedbythelargemonomerswithalowdegreeof diversityandsimilarGC-contentinbothfamilies(Additionalfile1,TableS4). Firstfamily,TR-MTA,isformedbyMTAfragments: MTa,MaLR-LTR,MammalianapparentLTR-retrotransposonsinRepbase(Figure6A).Secondfamily,L1relatedfamilyisformedbypartoftheORF2and3 LTR (Figure6B). MTAtransposonshavestructuralsimilaritiestoendogenousretroviruses,namelyERV3,andarerelatedto THE1inhumans[39].Endogenousretrovirusesby themselvescomprise~10%ofthemousegenome[40]. Overtime,mostMaLRshavedivergedconsiderably fromtheirconsensussequence,sotheirnumberisnow estimatedat25-94,000copies[39].Preliminaryanalysis hasnotyetrevealedsignificantsimilaritiesoftheputativeproductofMTAORFtoanyproteinpresentinthe databanks.Theres idualpartoftheORFisnowdeterminedasinternalpartinMTARepbaseconsensusand itisincludedinTRarrays. InordertomapTE-relatedarraystothereference genometworuleswereapplied.First,aTRhitata chromosomelocuscountsaspositiveonlywhenthe alignmentlengthismorethan2850bp(95%fromthe Figure5 TRPC-21A-MMHORstructure .A:ThesimilaritydotplotofTRPC-21A-MMarrayN50(inAdditionalfile1,TableS3);awindowsizeis 13bp,sequencesimilarityisshowningrayscale.Unitsareshownaslargearrowswithlengthindicated.SmallerarrowsindicateHORsubunits; colorsindicatesubunitsvariants.B;Thestructureof2017bpunits,twosubunitswithcorrespondingsizeareindicated.C:21bpisthebasicunit forTRPC-21A-MMasitisvisibleondotplotathighmagnification.Numberof21monomersisindicated.Bluearrowrepresentsoneunit.D:The blackandwhitedotplotwithawindowsizeof51bpandminimum80%identity. 
Komissarov etal BMCGenomics 2011, 12 :531 http://www.biomedcentral.com/1471-2164/12/531 Page8of21

PAGE 9

originalTRarraylimitof3kb).Second,ahitiscounted asasinglewhenthedistancebetweentwohitsisless than150bp(5%).Afterapplyingtheserules284hits withprecisepositionsremained(Additionalfile1,Table S5).MostofthelocifoundforTR-L1familydonot exceed5kb.ForTR-MTAfamilywefoundtwoloci witharraylengthabout10kb.Alllociweredisplayed onthebandedchromosomes.ThereisnoobviousregularityinTR-MTAfamilydistribution,probablydueto thelimitedamountofthearraysfound(Figure7, orange).TheTR-L1familyisenrichedinheterochromaticbandsandtheconcentrationonchromosomeXis visible(Figure7,blue).AtthesametimenoTE-related TRarefoundonYchromosome.Validationofthese findingsbyFISHistechnicallychallenging,becausethe LTRsofotherretroelementsmayobscuretheresults.TandemrepeatpositiondefinedbyFISHBioinformaticspredictionsaboutthepositionsofnewlyfoundTRwerecheckedby insitu experiments.Wedid notexpecttoobtain insitu thefullcorrespondenceof TRpositionsfound insilico ,sincetheassemblyofheterochromaticpartofthereferencegenomeisfarfrom beingcomplete.Nevertheless, insilico chromosome locationsshouldbeincludedinthesetofthe insitu labelledchromosomes.Monomerunitsfromthree classeswereselectedforprobedesign(seeMethodssection).Allprobesequenceswithashortdescription showninAdditionalfile1,TableS6.Inthereference genome,TRPC-21Ahaspredicted insilico pericentromericlocationonfourchromosomes(Table3and4) andTRPC-21AarrayswerefoundinChrUn,whichcontainsmostlypericentrom ericregions;therefore, Table5MultilocusfamilyNSubfamilyUnit(bp)ChromoBandsArraysGC%LengthVar%ChrUnHOR 1TR-22A-MM224A2,6A2, 7A2,18B2 958.112896263+ 2TR-4A-MM4 1H6,9F1, 9F3,XA4, XC2,XF5, YA2 6 33.17704303 + 3TR-27A-MM27 14B,17A2 4 39.67073270 + 4TR-31C-MM31 9A2* 4 49.513059240 + 5TR-18A-MM18 14A2,XF5 3 55.55644271 + 6TR-19B-MM19 5G1.2,12D13 47.85852190 + 7TR-38C-MM38 12B1,13A3.23 48.76797260 + 8TR-57A-MM57 5C2,7F5, 8A1.2,10D3, 12F1,14B, 16C2,17A2, XA6 3 52.35619210 + 9TR-4B-MM4 1H6,XA4, XC2 2 46.24161363 + 10TR-6A-MM6 5C2,XC2, XF3,XF5 2 60.26649390 + 11TR-16A-MM16 
6E2,8A1.2, 16C3.2 2 54.26765170 + 12TR-20A-MM20 XA4,XF5 2 45.53604310 + 13TR-31A-MM31 7D1,8C1 2 50.66558203 + 14TR-31B-MM31 7D1,14A2 2 53.24922191 + 15TR-58A-MM58 6B2.2,6C2, 6F3,13B1 2 50.54658340 + 16TR-1521A-MM15217F3,XA1.2 2 44.63213111 + 17TR-30A-MM30 5B3,17B1 1 46.33912370 + 18TR-81A-MM81 14B,17A2 1 40.03483310 + 19TR-1164A-MM1164XC2,XD 1 48.63333190 20TR-1595A-MM1595XA3.2,XA6 1 38.83007160 21TR-1149A-MM1149XF1,XF3 1 45.85463135 22TR-1527A-MM1527XC2,XD 1 46.03120120 +SubfamiliesorderedbynumberofarraysinWGSandthenbyunitlength.N-rowindex;Unit(bp)-minimalunitlength;ChromoBands-chromosomalpositions inthereferencegenome;Arrays-numberofarraysofeachsubfamilyfoundinbothWGS;GC%-meanarrayGCcontent;Length(bp)-maxarraylength;Var%meanvariabilitybetweenmonomersinarray.ChrUn-numberofarraysfoundinChrUn;HOR-presenceofHOR. *ThereareseveraldistinctarraysofTR-31C-MMin9A2locusinthereferencegenome.Komissarov etal BMCGenomics 2011, 12 :531 http://www.biomedcentral.com/1471-2164/12/531 Page9of21

PAGE 10

Table6SinglelocusfamilyNSubfamilyUnit(bp)ChromoBandsArraysGC%LengthVar%ChrUnHOR 1TR-17A-MM1717D543.210312330+ 2TR-54B-MM54 XA1.2 5 47.911978310 3TR-29A-MM29 2F1 4 50.34175260 + 4TR-734A-MM734 XC2 3 37.09507220 + 5TR-1870A-MM18707D1 3 44.6579538 6TR-19A-MM19 18A2 2 49.74883320 + 7TR-34A-MM34 12F2 2 55.2335470 8TR-38A-MM38 8C1 2 42.75426250 + 9TR-38B-MM38 8C1 2 42.06113250 + 10TR-54A-MM54 XA1.2 2 48.110744300 11TR-100A-MM100 XA1.2 2 44.04364260 12TR-234A-MM234 3F2.2 2 59.4687850 13TR-23A-MM23 17B1 1 43.46018130 14TR-24C-MM24 12F1 1 52.5329890 15TR-29B-MM29 2F1 1 50.015896300 + 16TR-31D-MM31 12A1.2 1 55.95175241 + 17TR-33A-MM33 7B1 1 53.43601170 18TR-39A-MM39 1C1.2 1 39.2338750 19TR-40A-MM40 15A2 1 62.66612210 + 20TR-44A-MM44 2H3 1 36.3301690 21TR-48A-MM48 14D3 1 50.76603280 22TR-56A-MM56 XA1.2 1 42.94194220 + 23TR-84A-MM84 7E2 1 48.5304090 24TR-93A-MM93 XC2 1 51.33124243 25TR-111A-MM111 9F4 1 52.43347220 26TR-168A-MM168 XA1.2 1 41.94046230 27TR-297A-MM297 17E1.2 1 56.23100190 28TR-321A-MM321 11E2 1 42.33152210 + 29TR-814A-MM814 5A2 1 45.5317530 30TR-1146A-MM114619B 1 47.53056140 31TR-1284A-MM128410C3 1 42.35239140 32TR-1384A-MM1384XF1 1 25.8366580 33TR-1872A-MM18729A4 1 42.54285170 34TR-1908A-MM19082A2 1 39.84126140 +SubfamiliesorderedbynumberofarraysinWGSandthenbyunitlength.N-rowindex;Unit(bp)-minimalunitlength;ChromoBands-chromosomalpositions inthereferencegenome;Arrays-numberofarraysofeachsubfamilyfoundinbothWGS;GC%-meanarrayGCcontent;Length(bp)-maxarraylength;Var%meanvariabilitybetweenmonomersinarray.ChrUn-numberofarraysfoundinChrUn;HOR-presenceofHOR. 
Table7UnplacedfamilyNSubfamilyUnit(bp)ArraysGC%Length(bp)Var%HORChrUnWGSChr 1TR-24B-MM24334.4763628+211 2TR-24A-MM24245.6354022+2Un 3TR-13A-MM13 1 56.04477 15+0 6 4TR-27B-MM27 1 63.53452 36+1 Un 5TR-28A-MM28 1 41.23195 23+1 Un 6TR-36A-MM36 1 64.03003 8+0 19 7TR-102A-MM102 1 45.73698 31-0 X 8TR-624A-MM624 1 43.33297 2-0 UnSubfamiliesorderedbynumberofarraysinWGSandthenbyunitlength.N-rowindex;Unit(bp)-minimalunitlength;Arrays-numberofarraysinWGS;GC% -meanarrayGC%;Length(bp)-maxarraylength;Var%-meanvariabilitybetweenmonomersinarray.HOR-presenceofHOR;ChrUn-arraysfoundinChrUn; WGSChr-chromosomesourcefromWGScontigdescription.Komissarov etal BMCGenomics 2011, 12 :531 http://www.biomedcentral.com/1471-2164/12/531 Page10of21

additional chromosomes have to be labelled in the same region. A single-strand dimer probe labelled at both ends yielded signal on nine chromosomes: 3, 5, 6, 7, 8, 12, 16, 17, and Y. The largest signal belongs to chromosome 3 on all chromosome spreads. In each case the label was at the pericentromeric region, except on the Y (Figure 8). Four chromosomes predicted as TRPC-21A bearing (Table 4) are in the set of in situ labelled chromosomes.

Figure 6 Structure of TE-related tandem repeats. A: The general scheme of the MTA related TR family; regions of MTA are denoted, and a fragment of the TR unit is marked. B: The general scheme of the L1 related TR family; L1 regions are denoted, and a fragment of the TR unit is marked.

Figure 7 Chromosomal location of TE-related tandem repeats. Ideogram of the mouse karyotype with MTA-like array positions indicated in orange and L1-like array positions indicated in blue. For the ideogram description see the Methods section.

Figure 8 Fluorescent in situ hybridization (FISH) with TRPC-21A-MM short probe. A: bone marrow metaphase plates; B: one of the metaphase sets of chromosomes, negative DAPI-banded; numbers of signal-bearing chromosomes are indicated. For A and B: DAPI in blue, FISH signal in green; bar - 5 µm. C: all chromosomes karyotyped. In each group the middle image is a G-banded mouse chromosome from the atlas [41]; the side negative DAPI-banded chromosomes are from the plate shown in B. Nine chromosomes with the label are indicated by circles; four chromosomes with in situ signal that confirmed the in silico prediction are indicated by orange circles. The assembled chromosome 4 has a short TRPC-21A-MM array but does not have a signal (indicated by an empty orange circle). Chromosome 7 has a signal in the pericentromeric region instead of the in silico predicted 7D1 band.

Chromosome 4 has a short TRPC-21A array in silico (Table 3) but lacks any signal, probably due to incorrect assembly. Another discrepancy is the pericentromeric signal seen on chromosome 7, while in silico TRPC-21A mapped to the internal 7D1 band. Instead, the Y chromosome has an internal signal, which could be explained by the unique repeat content of the sex chromosomes [20].

The HOR structure of TRPC-21A suggests chromosome-specific variants (Figure 5). The next probe was based on the array fragment from chromosome 3. The probe is a double stranded ~150 bp sequence with additional ~20 bp flanking sequences. The flanks make it possible to label the probe by PCR (Additional file 3, Figure S2). This probe gives a strong signal on chromosomes 3 and 17, in accordance with the position of large TRPC-21A arrays at the ends of these chromosomes in the reference genome (Table 3, Figure 9). We suppose that probes designed on the basis of TRPC-21A variants could be specific for other chromosomes.

TR-22A (ML, C4 in Table 2) was chosen for probe design due to its abundance in the reference genome as well as in ChrUn (Table 5). The monomeric single strand probe labelled from both ends hybridized to ten chromosomes, four of them predicted as TR-22A bearing (Table 5). In this case the main part of the signal is located at the pericentromeric regions (chromosomes 2, 6, 7, 9, 11, 17, 18), with additional signals located on the arms of chromosomes 2 and 15, and in the subtelomeric region of chromosome 13. In each case signals are located in heterochromatic dark bands (Figure 10). The signal is stronger on L929 chromosome spreads compared with the signal on normal bone marrow cells (Figure 10Ac). This could be explained by the known chromosome polyploidy and rearrangements within heterochromatic regions in L929 cells [30,41]. There is no obvious main signal on any chromosome spread, so the design of a chromosome-specific probe on the basis of TR-22A could be more complicated; moreover, the arrays at the ends of chromosomes 4, 6 and 18 in the reference genome do not exceed 10 kb (Table 3).
Finally, the SL TR-54B (C5, Table 2) was selected due to the abundance of its arrays at the XA1.2 pericentromeric band. A double strand dimer probe was designed and labelled by PCR. About half of all signals obtained in the late prophase chromosome spreads belong to the long loops that emerge from subtelomeric regions of chromosomes during the inevitable osmotic shock, which is a necessary step in chromosome spread isolation [42,43]. The signal on chromosome X is located at the predicted region. However, this signal, like most of the others, could only be recognized on "fuzzy" chromosomes, when all the DAPI stained material is visible but the bands are obscure. In contrast to the reference genome assembly, TR-54B is not a single locus TR, because about fifty signals in total are visible on chromosome spreads (Figure 11). Further mapping of TR-54B using an additional probe for the subtelomeric region is required to clarify its exact location.

Discussion

Computational approaches to genome-wide TR analysis have gradually appeared as genome sequencing has advanced [5,6,44-46]. At the chromosomal level TR can be of profound structural as well as evolutionary importance, since genomic regions with a high density of TR, e.g., telomeric, centromeric, and heterochromatic regions, often have specific properties such as alternative DNA structure and packaging [47-49]. At the nuclear level of organization, constitutive heterochromatin may help maintain the proper spatial relationships necessary for the efficient operation of the cell through the stages of mitosis and meiosis. In the interphase nucleus satDNA have one property in common despite their species specificity, namely heterochromatization, which involves RNA interference-mediated chromatin modifications [2,3,50-54]. The strand-specific burst in transcription of pericentromeric satellites is required for chromocenter formation in early mouse development. Specific expression dynamics of MaSat repeats, together with their strand-specific control, represent necessary mechanisms during a critical time window in preimplantation development that are of key importance to consolidate the maternal and to set up the paternal heterochromatic state at pericentromeric domains [55].
Such an important finding is based on the known sequence of the mouse MaSat. Most of the other mouse TR could not be tested in similar experiments, being undescribed and unclassified.

Mouse major satellite

The proportion of MaSat in a total mouse DNA preparation is about 8%, which is higher than the amount of satDNA found in total DNA preparations from rat and human [24]. MaSat is located near chromosome centromeres [56]. The most widespread opinion, based on experimental data, is that a high degree of MiSat and MaSat sequence conservation exists across the telocentric domain of all mouse chromosomes. The earlier publications do not confirm MaSat uniformity. There are data for both short-range [57] and long-range periodicity in MaSat [58]. EcoRII digestion breaks MaSat into fragments, which form a series of bands on gel electrophoresis (a ladder). The DNA in the strongest band was 220-260 bp and the other bands were multiples of this length. The stronger bands of the minor patterns fall half-way between the bands of the main pattern, and the smallest is 120 to 130 nucleotide pairs long [58].

Monomers of the corresponding length are the third most represented among MaSat monomers in the arrays (Figure 3). The sequence was shown to be based on a repeating unit less than 20 bp in length. Four major oligonucleotides were identified, all of which could derive from an original sequence d(GA5TGA) for the light strand [57]. Short units of a size similar to the reported oligonucleotides could be tracked by MaSat dot-plot analysis (Figure 4D). In contrast to the proposed MaSat uniformity based on limited experimental data [25], our results indicate that its monomer variation is quite high. Despite the abundance of MaSat in TRF outputs, the majority of MaSat is unplaced and in all likelihood will be placed in the 3 Mb centromeric gaps on each chromosome. We suppose that MaSat arrays could be chromosome-specific and thus may be assigned to different chromosomes during attempts to fill the centromeric gaps. For this purpose probes based on different MaSat variants could be designed and checked by FISH.

Mouse minor satellite

There were previous attempts to find MiSat chromosome-specific variants. MiSat specificity has been shown for chromosome 2 with synthetic oligonucleotide probes and Southern hybridization [59]. Oligonucleotide probes that specifically detect sequence variations were found in some cloned MiSat fragments, and they detected a limited subset of MiSat arrays using pulsed-field gel electrophoresis with Southern hybridization and PRINS (primed in situ hybridization). The most prominent label

Figure 9 FISH with TRPC-21A-MM long probe. A: bone marrow metaphase plates; B: chromosome analysis on the metaphase plates; the numbers are indicated (left). DAPI in blue, FISH signal in green. In situ positive chromosomes (negative DAPI-banded) are shown (right); bar - 5 µm.

Figure 10 FISH with TR-22A-MM probe. A: Primary bone marrow metaphase plates (a, b) and a metaphase plate from cell line L929 (c). DAPI is blue, FISH signal is green; bar - 5 µm. B: one of the bone marrow metaphase plates with chromosome numbers indicated. Bar - 5 µm. C: in each chromosome group the middle image is a G-banded mouse chromosome from the atlas [41]; the side (left and right) negative DAPI-banded chromosomes are from the plate shown in B. Ten chromosomes with the FISH signal are indicated by circles; four chromosomes with in situ signal that confirmed the in silico prediction are indicated by orange circles.

corresponded to chromosomes 1 and 14 [28]. The existence of a chromosome-specific MiSat implies that the rate of sequence exchange between non-homologous chromosomes relative to the rate of exchange between homologous chromosomes is much lower than was postulated previously. Based on these results, the suggestion was made that the high degree of sequence homogeneity of both known mouse satDNA may reflect recent common ancestry [28]. Still, none of these probes has been developed into a reliable cytogenetic marker. Since only a few MiSat arrays were found in WGS (Additional file 1, Table S2), there is little hope of successfully designing a chromosome-specific probe with a purely bioinformatics approach.

Classical satellites

Big classical satDNA are well known and have been studied extensively. Human satDNA 1-4 (HS1-4) are based on a "simple" 5-6 bp motif, and HS3 is the best investigated [10,11]. HS3 was found in pericentromeric regions of all chromosomes but 2, 6, 8, 11, 12, 18, 19, and X [60,61]. Chromosome-specific subfamilies of HS3 have been determined [62,63], and those that belong to chromosomes 1 (HS3-1) and 9 (HS3-9) are two of them [61].

Figure 11 High resolution FISH with TR-54B-MM probe. A: (a) bone marrow prophase chromosome spread. DAPI in blue, FISH signal in green; bar - 5 µm; additionally shown are a negative DAPI-banded central core of chromosomes (b) and the "fuzzy" structure of whole DAPI-stained chromosomes (c). B: In each group the middle image is from the atlas [41]; the side (left and right) negative DAPI-banded chromosomes are from the plate. Ten chromosomes with the label are indicated by circles; chromosome X bearing the label in accordance with the in silico prediction is indicated by an orange circle. The labels on the short chromatin loops are marked with blue asterisks; labels that belong to the long loops are marked with red circles.

Mouse TRPC-21A resembles human "classical" or "simple sequence" satellites in most features. The dot plot at a high magnification suggests that units of ~5-7 bp could be distinguished inside the basic 21 bp monomer (grey lines between black ones in Figure 5C), although the degree of diversity demands a special investigation to determine the exact oligonucleotide sequence. All of the TRPC-21A arrays contain HORs, which is a common characteristic of classical satellites. The letters "PC" are included in the TRPC-21A name to indicate the strictly pericentromeric location determined according to the relevant WGS position and the position in the assembled genome (Tables 3 and 4), and confirmed by the FISH signal (Figures 8 and 9). The most prominent chromosome-specific probe was designed on the basis of the chromosome 3 variant of TRPC-21A (Table 4; Additional file 3, Figure S2; Additional file 1, Table S6).

TRPC-21A was the first tested, but within the ML family some members (TR-22A, TR-27A, and TR-31A) look promising for the design of chromosome-specific probes due to their HORs and their presence in ChrUn. TR-22A and TR-27A were also found at the ends of assembled chromosomes (Table 3), so the possibility of mapping them by FISH is quite high.

GC content

It is notable that most of the newly found subfamilies have a GC content higher than MaSat and MiSat: the mean for TRPC-21A is ~50%, and it is even higher for the ML family, ~57% (Figure 2). Both GC-rich and AT-rich satDNA are known in human and most of the higher eukaryotes [64,65]; hence our results resolve the strange asymmetric satDNA distribution reported for mouse until now. The isochores (regions that differ in GC content) have functional significance for the optimization of epigenetic genome regulation, which supports the notion that noncoding DNA is important for orderly chromatin condensation and chromatin-mediated suppression of tissue-specific genes [66]. The absolute values of thermostability, bendability and the ability to undergo the B-Z transition correlated positively with the GC content, whereas curvature correlated negatively [67]. Although these conclusions were made on the basis of introns and intergenic spacers as examples of noncoding DNA, TR of different GC and AT content may
add to the isochoric genome structure due to their abundance in some regions.

Barcode

Alpha-satDNA is the only functional DNA sequence associated with all naturally occurring human centromeres. Two distinct forms of alpha-satellite are recognized based on their organization and sequence properties. A large fraction of alpha-satellite is arranged into HOR arrays where corresponding monomers are organized as multimeric repeat units ranging in size from 3 to 5 Mb [68,69]. Human chromosome-specific probes based on alpha-satDNA [70,71], "classical" satDNA [34], and megasatellites [6] exist and are used in cytogenetic analysis. It appears that, using the human WGS and assembled genome, a set of TR characteristic for each human chromosome could be found, suggesting that TR might provide a kind of "barcode" for each chromosome.

The lack of mouse chromosome-specific probes causes problems for most genome-connected studies, including studies in developmental biology. Using WGS we have identified 62 subfamilies of large tandemly repeated DNA. The next step is to map most of them to check whether there is chromosome specificity in the hybridization pattern. Probably, it will be possible to create an individual chromosome "bar-code" set of probes to be used in cytogenetic analysis. We suggest that this "bar-code" describes the heterochromatin signature for each chromosome and that these signatures help to arrange chromosomes in the nucleus in a specific order during development. Potentially, this "bar-code" or signature represents the hypothetical Master Development Program, previously attributed to the heterochromatic regions [72].

Conclusions

Eight families including 62 subfamilies are found and characterized here by bioinformatics analysis. Most of them are more GC-rich than the well known MaSat and MiSat. HOR structure was determined for some of them, suggesting the existence of TR chromosome-specific variants. Probes for the representatives of three TR families were designed on the basis of TR monomer units.
In situ hybridization signal positions are in accordance with the in silico predictions on the reference genome, although other chromosomes are also labelled due to the poor assembly of the heterochromatic genome regions. A long probe based on the chromosome 3 variant of TRPC-21A recognizes the longest fields of TR at the ends of chromosomes 3 and 17. No reliable cytogenetic probe had been designed up to now. We suppose that with future investigation of the newly characterized TR families it will be possible to determine the set of mouse chromosome-specific TR.

Methods

Sequence databases

Mouse sequences were obtained from the NCBI ftp site in FASTA format: two WGS assemblies for projects AAHY and CAAA [73]; the reference genome assembly build 37.1 and Celera genome assembly build 37.1 [74]; MGSC genome assembly release 3 [75]. The genome banding annotation was obtained from the NCBI ftp site

[76]. The Repbase database version 15.07 in FASTA format was obtained from [77]. To compile local blast databases we used the blastdb program from the BLAST+ suite with default parameters.

Programs and search parameters used

Sequence alignments were performed using blastn and bl2seq from the BLAST+ suite [78]. Several search parameters were changed to work with repetitive DNA: max_target_seqs (the maximum number of database sequences for which any alignment will be reported) and num_descriptions (the maximum number of one-line descriptions of significant database sequences reported) were set to 10,000; evalue (expectation value threshold for saving hits) was set to 10^-16; word_size for the wordfinder algorithm was set to 10; dust (arguments to the DUST filtering algorithm) was set to "no"; the soft_masking parameter (simple repeat filter) was set to "false". All other search parameters were set to default values.

Tandem repeat search was performed using TRF [23]. The search parameter mismatch was set to 5; maximum period size was set to 2000. Other search parameters were set to the default values.

Self dot-plot matrix computations were done with in-house software with two sets of parameters: (1) window size set to 13 bp and similarity indicated by gray-scale color from black (100% window match) to white (100% window mismatch); (2) window size set to 51 bp and similarity indicated by two colors: black for >90% (MaSat arrays) or >80% (TRPC-21A arrays) window match; white for the corresponding mismatch.

To store the mouse large tandem repeat collection we used a MySQL 5.1 database. TRF output analysis was performed with custom Python scripts. 3D-plots were rendered with Mathematica 7.0. A coordinate representation of the mouse chromosome ideogram [76] and the band positions of DNA repeats were used for chromosome ideogram drawing with a custom Python script.

TR analysis

To eliminate any redundant entries from the TRF output, all embedded TR arrays were discarded; in the case when two arrays had the same sequence coordinates, the TR with the larger unit size was discarded. Overlapping arrays were considered as independent arrays. Repbase version 15.07 was used to compare TR with known repeats [79]. To remove false positive matches from Blast versus Repbase results, all matches that were covered by
repeats from Repbase less than 80% were discarded. Each pair of arrays was compared using bl2seq. We got a number of false-positive alignments due to the tandem nature of the compared sequences. To remove false-positive or suspicious alignments we discarded all pair matches with a score less than 90. The remaining arrays were separated into subfamilies by Blast defined similarity. Two tandem repeats were placed in the same subfamily if they had a bl2seq match with a score greater than 90. Finally, each subfamily was checked by hand for errors. In several cases the exact subfamily borders are fuzzy (TR-29A/B; TR-4A/B; TR-81A and TR-27A; TR-54A/B; TR-38A/B). Those subfamily pairs can be joined into one bigger subfamily with less strict Blast parameters.

Mouse genome databases comparison

We used three mouse genome assemblies: the reference genome, the alternate (Celera) genome, and the MGSC genome assembly. Each genome has a Chromosome Unknown (ChrUn) that contains all unplaced or unmapped contigs remaining after assembly. TRF search with the same parameters as described above was applied to these 6 databases (Additional file 1, Table S7). There are prominent differences in the total and large TR amounts in different databases. This can be explained by differences in the methodology of genome sequencing and assembly [80,81]. In the genome assembly process additional sequence sources (e.g. clone based sequence) were used [21], which caused the difference in the TR number found in WGS and genome assemblies. We used the reference genome build 37.1 to map newly found TR, as it is the most comprehensive, widely used, and contains the largest amount of large TR. The reference genome build 37.1 was assembled without WGS sequences and it has a small ChrUn. The alternate (Celera) ChrUn was used to check the amount of TR found in WGS (Additional file 1, Table S7). We did not use the MGSC genome and ChrUn assemblies, as they are outdated and lack the Y chromosome [21].

Probe design

Probes 1 and 4 tested in FISH were designed as follows. A fragment composed of several monomers with a total length of ~150 bp was chosen from the most variable region of the tandem array, and it was flanked by two different adapters (Additional file 1, Table S6 and Additional file 3, Figure S2). The probe was amplified with primers to the adapters and labelled
with biotin-dUTP by PCR: 95 °C for 15 sec, 60 °C for 30 sec, 72 °C for 30 sec, 20 cycles. Probes 2 and 3 were synthesized as 3'-/5'-biotin labelled (Beagle, St. Petersburg, Russia).

Mitotic chromosomes

Chromosome spreads from bone marrow cells were made according to the previously published method [82]. Colchicine (0.4 ml of 0.04% solution) was injected intraperitoneally 90 min before mice were sacrificed by cervical dislocation under anaesthesia. Bone marrow was washed out from the leg tubular bones with 75 mM KCl. The suspension was incubated for 15 min at 37 °C and centrifuged for 5 min at 1000 rpm. The pellet was fixed for 15

min in cold fixative (methanol:acetic acid - 3:1) at 4 °C. Suspension and centrifugation cycles were repeated three times. At last, the suspension was dropped on wet cold slides, which were air dried to get rid of the fixative. Metaphase plates of good quality were selected under the microscope.

L929 cells were cultured in Petri dishes in full medium (DMEM + 10% FCS) until 40-50% confluency. The cells were treated with colcemid (0.5 µg/ml) for 2 h before harvesting. After treatment with hypotonic solution (50 mM KCl:1% Na citrate, 1:1) for 20 min at 37 °C the cells were fixed with acetic acid:methanol (1:3), dropped onto ice-cold glass slides and air-dried. The slides were kept at -20 °C until FISH was performed.

FISH with double stranded probes

FISH with double stranded probes was done in the usual way [29]. The labeled probes were dissolved in the hybridization mixture (50 µg/µl sheared yeast total DNA, 50% formamide, 10% dextran sulphate, 2x SSC), loaded on a slide with cells, covered with a smaller coverslip, and sealed with rubber cement. Slides were denatured simultaneously on a hot-block at 75 °C for 2 min. Hybridization was performed overnight at 37 °C in a humid chamber. Post-hybridization washes were done at 42 °C in 50% formamide for 10 min, twice in 2x SSC for 5 min, 0.5x SSC for 10 min and finally in 2x SSC for 10 min at room temperature. The slides were counterstained with 0.5 µg/ml DAPI in 2x SSC solution for 5 min, rinsed in 2x SSC, and mounted in Citifluor antifade solution (Citifluor Ltd, UK).

FISH with single stranded oligonucleotide probes

FISH with single stranded probes was done according to a published protocol [83] with the following modifications. Oligonucleotides were synthesized as 3'-/5'-biotin labeled for probes 2 and 3 (Additional file 1, Table S6). After RNase and pepsin pretreatment, metaphase chromosome spreads were dehydrated in an ethanol series and air-dried. Chromosomes were denatured for 2 min in 70% formamide, 2x SSC, at 65 °C. After dehydration in an ice-cold ethanol series, hybridization was performed for 12-16 h at 37 °C. The hybridization solution contained 5 ng/ml probe, sheared yeast total DNA (50 µg/µl), 25% formamide, 4x SSC.
After hybridization, the slides were washed three times for 5 min in 2x SSC at room temperature. For detection, preparations were incubated with fluorescein avidin D (Vector Laboratories) (5 µg/ml in 2x SSC containing 5% BSA) for 40 min at room temperature. Then they were washed three times for 8 min in 2x SSC at room temperature. Signal amplification was performed by treating the slides with a biotinylated goat anti-avidin (Vector Laboratories) (5 µg/ml in 2x SSC plus 5% BSA) for 40 min at room temperature. Preparations were washed again three times for 5 min with 2x SSC and a new incubation with fluorescein avidin D (Vector Laboratories) was carried out for 40 min at room temperature. The slides were counterstained with 0.5 µg/ml DAPI in 2x SSC solution for 5 min, rinsed in 2x SSC and mounted in Citifluor antifade solution (Citifluor Ltd, UK).

Microscopy and Image Acquisition

For image acquisition the confocal microscope Leica TCS SP5 equipped with an immersion 100x objective and 488 nm argon and 405 nm diode lasers was used. For primary image analysis Leica LAS AF software was used. The series of confocal sections were collected with a step size of 0.25 µm, and maximal projections of the series were obtained. The negative (inverse) DAPI-banding pattern, which coincides with the G-banding one, was computer processed according to the published protocol [84]. Chromosome identification was done with the help of images of individual G-banded mouse chromosomes with different levels of compaction [41].

Additional material

Additional file 1: Supplementary tables. This file can be viewed with: Adobe Acrobat Reader.
Additional file 2: Coordinates of MaSat arrays. This file can be viewed with: Adobe Acrobat Reader.
Additional file 3: Supplementary figures. This file can be viewed with: Adobe Acrobat Reader.

Acknowledgements

The work was supported by an MCB grant from the Russian Academy of Sciences, RFBR 11-04-01700-a and NIH/NCI R01 CA127378-01A1 grant. The authors thank Dr. A. Fedorov, Dr. A. Voronin and Dr. I. Kuznetsova for the fruitful discussions during the work. Viktor Nikonov is thanked for help with 3D-plot rendering in Wolfram Mathematica 7.0. Alena Yakusheva is thanked for the pictures design. We are grateful to Dr. D. Kipling, Dr. K.I.
Galaktionov, and B.A. Dianova for helpful comments and language revision of the manuscript. We would like to thank two anonymous reviewers and Dr. M. Gelfand for careful reading and helpful comments, which improved the manuscript.

Author details

1 Institute of Cytology RAS, 4 Tikhoretsky avenue, 194064, St. Petersburg, Russia. 2 Faculty of Biology and Soil Sciences, St. Petersburg State University, Universitetskaya nab. 7/9, St. Petersburg 199034, Russia. 3 Department of Anatomy and Cell Biology, University of Florida College of Medicine, 1376 Mowry, Gainesville FL 32610-3633, USA.

Authors' contributions

GEV and DSJ performed probe labelling, FISH and chromosome identification. KAS performed genomic analysis and probe design. IAM participated in data interpretation and helped to draft the manuscript. POI and KAS conceived the experimental design, analyzed and interpreted data. POI wrote the final manuscript and coordinated the project. All authors have read and approved the final manuscript.
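Illustrative sketch of the TR analysis steps

The redundancy filtering and subfamily grouping described under "TR analysis" in the Methods can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' scripts: the Array record, the function names, and the toy coordinates and scores are hypothetical; all arrays are assumed to lie on one sequence; and the grouping is assumed to be transitive (single linkage), which the text does not state explicitly.

```python
from collections import namedtuple

# Hypothetical record for one TRF hit on a single sequence.
Array = namedtuple("Array", "name start end unit")


def drop_redundant(arrays):
    """Apply the two redundancy rules from the Methods.

    1. Same coordinates: the array with the larger unit size is discarded.
    2. Embedded arrays (fully inside another array) are discarded.
    Arrays that merely overlap are kept as independent arrays.
    """
    best = {}
    for a in arrays:
        key = (a.start, a.end)
        if key not in best or a.unit < best[key].unit:
            best[key] = a
    unique = list(best.values())

    kept = []
    for a in unique:
        embedded = any(
            b is not a and b.start <= a.start and a.end <= b.end
            for b in unique
        )
        if not embedded:
            kept.append(a)
    return sorted(kept, key=lambda a: (a.start, a.end))


def group_subfamilies(names, scores, min_score=90):
    """Group arrays whose pairwise score exceeds min_score.

    scores maps (name_a, name_b) to an alignment score. Pairs are merged
    with a small union-find, so grouping is transitive (an assumption).
    """
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for (a, b), score in scores.items():
        if score > min_score:
            parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Toy data, not taken from the paper.
toy_arrays = [
    Array("a", 0, 100, 21),   # kept
    Array("b", 0, 100, 42),   # same coordinates as "a", larger unit
    Array("c", 10, 50, 5),    # embedded in "a"
    Array("d", 80, 150, 30),  # overlaps "a" only partially, kept
]
toy_kept = drop_redundant(toy_arrays)
toy_groups = group_subfamilies(
    ["a", "d", "e"],
    {("a", "d"): 120, ("d", "e"): 95, ("a", "e"): 40},
)
```

With single linkage, "a" and "e" end up in the same group through "d" even though their direct score is below the threshold. This may be one reason why some subfamily borders are described as fuzzy and were checked by hand.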

Received: 5 February 2011 Accepted: 28 October 2011 Published: 28 October 2011

References

1. Morris CA, Moazed D: Centromere assembly and propagation. Cell 2007, 128:647-50.
2. Martienssen RA: Maintenance of heterochromatin by RNA interference of tandem repeats. Nature genetics 2003, 35:213-4.
3. Alleman M, Sidorenko L, McGinnis K, Seshadri V, Dorweiler JE, White J, Sikkink K, Chandler VL: An RNA-dependent RNA polymerase is required for paramutation in maize. Nature 2006, 442:295-8.
4. Enukashvily NI, Malashicheva AB, Waisertreiger IS-R: Satellite DNA spatial localization and transcriptional activity in mouse embryonic E-14 and IOUD2 stem cells. Cytogenetic and genome research 2009, 124:277-87.
5. Ames D, Murphy N, Helentjaris T, Sun N, Chandler V: Comparative analyses of human single- and multilocus tandem repeats. Genetics 2008, 179:1693-704.
6. Warburton PE, Hasson D, Guillem F, Lescale C, Jin X, Abrusan G: Analysis of the largest tandemly repeated DNA families in the human genome. BMC genomics 2008, 9:533.
7. Schueler MG, Higgins AW, Rudd MK, Gustashaw K, Willard HF: Genomic and genetic definition of a functional human centromere. Science 2001, 294:109-15.
8. Rudd MK, Willard HF: Analysis of the centromeric regions of the human genome assembly. Trends in genetics 2004, 20:529-33.
9. Choo KH: Centromere DNA dynamics: latent centromeres and neocentromere formation. American journal of human genetics 1997, 61:1225-33.
10. Lee C, Wevrick R, Fisher RB, Ferguson-Smith MA, Lin CC: Human centromeric DNAs. Human genetics 1997, 100:291-304.
11. Moyzis RK, Albright KL, Bartholdi MF, Cram LS, Deaven LL, Hildebrand CE, Joste NE, Longmire JL, Meyne J, Schwarzacher-Robinson T: Human chromosome-specific repetitive DNA sequences: novel markers for genetic analysis. Chromosoma 1987, 95:375-86.
12. Prosser J, Frommer M, Paul C, Vincent PC: Sequence relationships of three human satellite DNAs. Journal of molecular biology 1986, 187:145-55.
13. Podgornaya O, Dey R, Lobov I, Enukashvili N: Human satellite 3 (HS3) binding protein from the nuclear matrix: isolation and binding properties. Biochimica et biophysica acta 2000, 1497:204-14.
14. Kipling D, Ackford HE, Taylor BA, Cooke HJ: Mouse minor satellite DNA genetically maps to the centromere and is physically linked to the proximal telomere. Genomics 1991, 11:235-41.
15. Kalitsis P, Griffiths B, Choo KHA: Mouse telocentric sequences reveal a high rate of homogenization and possible role in Robertsonian translocation. Proc Natl Acad Sci 2006, 103:8786-91.
16. Wong AK, Rattner JB: Sequence organization and cytological localization of the minor satellite of mouse. Nucleic acids research 1988, 16:11645-61.
17. Guenatri M, Bailly D, Maison C, Almouzni G: Mouse centric and pericentric satellite repeats form distinct functional heterochromatin. The Journal of cell biology 2004, 166:493-505.
18. Broccoli D, Miller OJ, Miller DA: Relationship of mouse minor satellite DNA to centromere activity. Cytogenetics and cell genetics 1990, 54:182-6.
19. Hörz W, Altenburger W: Nucleotide sequence of mouse satellite DNA. Nucleic acids research 1981, 9:683-96.
20. Pertile MD, Graham AN, Choo KHA, Kalitsis P: Rapid evolution of mouse Y centromere repeat DNA belies recent sequence stability. Genome research 2009, 19:2202-13.
21. Waterston RH, Lindblad-Toh K, Birney E, et al: Initial sequencing and comparative analysis of the mouse genome. Nature 2002, 420:520-62.
22. Mural RJ, Adams MD, Myers EW, et al: A comparison of whole-genome shotgun-derived mouse chromosome 16 and the human genome. Science 2002, 296:1661-71.
23. Benson G: Tandem repeats finder: a program to analyze DNA sequences. Nucleic acids research 1999, 27:573-80.
24. Abdurashitov MA, Chernukhin VA, Gonchar DA, Degtyarev SK: GlaI digestion of mouse gamma-satellite DNA: study of primary structure and ACGT sites methylation. BMC genomics 2009, 10:322.
25. Vissel B, Choo KH: Mouse major (gamma) satellite DNA is highly conserved and organized into extremely long tandem arrays: implications for recombination between nonhomologous chromosomes. Genomics 1989, 5:407-14.
26. Kidd JM, Sampas N, Antonacci F, Graves T, Fulton R, Hayden HS, Alkan C, Malig M, Ventura M, Giannuzzi G, Kallicki J, Anderson P, Tsalenko A, Yamada NA, Tsang P, Kaul R, Wilson RK, Bruhn L, Eichler EE: Characterization of missing human genome sequences and copy-number polymorphic insertions. Nature methods 2010, 7:365-71.
27. She X, Horvath JE, Jiang Z, Liu G, Furey TS, Christ L, Clark R, Graves T, Gulden CL, Alkan C, Bailey JA, Sahinalp C, Rocchi M, Haussler D, Wilson RK, Miller W, Schwartz S, Eichler EE: The structure and evolution of centromeric transition regions within the human genome. Nature 2004, 430:857-64.
28. Kipling D, Wilson HE, Mitchell AR, Taylor BA, Cooke HJ: Mouse centromere mapping using oligonucleotide probes that detect variants of the minor satellite. Chromosoma 1994, 103:46-55.
29. Kuznetsova IS, Enukashvily NI, Noniashvili EM, Shatrova AN, Aksenov ND, Zenin VV, Dyban AP, Podgornaya OI: Evidence for the existence of satellite DNA-containing connection between metaphase chromosomes. Journal of cellular biochemistry 2007, 101:1046-61.
30. Kuznetsova IS, Voronin AP, Podgornaya OI: Telomere and TRF2/MTBP localization in respect to satellite DNA during the cell cycle of mouse cell line L929. Rejuvenation research 2006, 9:391-401.
31. Kuznetsova I, Podgornaya O, Ferguson-Smith M: High-resolution organization of mouse centromeric and pericentromeric DNA. Cytogenetic and genome research 2006, 112:248-55.
32. Broccoli D, Trevor KT, Miller OJ, Miller DA: Isolation of a variant family of mouse minor satellite DNA that hybridizes preferentially to chromosome 4. Genomics 1991, 10:68-74.
33. Richard G-F, Kerrest A, Dujon B: Comparative genomics and molecular dynamics of DNA repeats in eukaryotes. Microbiol Mol Biol Rev 2008, 72:686-727.
34. Enukashvily NI, Donev R, Waisertreiger IS-R, Podgornaya OI: Human chromosome 1 satellite 3 DNA is decondensed, demethylated and transcribed in senescent cells and in A431 epithelial carcinoma cells. Cytogenetic and genome research 2007, 118:42-54.
35. Namekawa SH, Payer B, Huynh KD, Jaenisch R, Lee JT: Two-step imprinted X inactivation: repeat versus genic silencing in the mouse. Molecular and cellular biology 2010, 30:3187-205.
36. Cooke HJ, Brown WR, Rappold GA: Hypervariable telomeric sequences from the human sex chromosomes are pseudoautosomal. Nature 1985, 317:687-92.
37. Cooke HJ, Brown WA, Rappold GA: Closely related sequences on human X and Y chromosomes outside the pairing region. Nature 1984, 311:259-61.
38. Giordano J, Ge Y, Gelfand Y, Abrusan G, Benson G, Warburton PE: Evolutionary history of mammalian transposons determined by genome-wide defragmentation. PLoS computational biology 2007, 3:e137.
39. Smit AF: Identification of a new, abundant superfamily of mammalian LTR-transposons. Nucleic acids research 1993, 21:1863-72.
40. Stocking C, Kozak CA: Murine endogenous retroviruses. Cellular and molecular life sciences 2008, 65:3383-98.
41. Mamaeva S: Atlas of chromosomes of human and animal cell lines. Moscow: Scientific World; 2002, 1-236.
42. Ushiki T, Hoshi O: Atomic force microscopy for imaging human metaphase chromosomes. Chromosome Res 2008, 16:383-96.
43. Kireeva N, Lakonishok M, Kireev I, Hirano T, Belmont AS: Visualization of early chromosome condensation: a hierarchical folding, axial glue model of chromosome structure. The Journal of cell biology 2004, 166:775-85.
44. Alkan C, Cardone MF, Catacchio CR, Antonacci F, O'Brien SJ, Ryder OA, Purgato S, Zoli M, Della Valle G, Eichler EE, Ventura M: Genome-wide characterization of centromeric satellites from multiple mammalian genomes. Genome research 2010.
45. Alkan C, Ventura M, Archidiacono N, Rocchi M, Sahinalp SC, Eichler EE: Organization and evolution of primate centromeric DNA from whole-genome shotgun sequence data. PLoS computational biology 2007, 3:1807-18.
46. Mayer C, Leese F, Tollrian R: Genome-wide analysis of tandem repeats in Daphnia pulex - a comparative approach. BMC genomics 2010, 11:277.
47. Plohl M, Luchetti A, Meštrović N, Mantovani B: Satellite DNAs between selfishness and functionality: structure, genomics and evolution of tandem repeats in centromeric (hetero)chromatin. Gene 2008, 409:72-82.

48.PodgornayaOI,VoroninAP,EnukashvilyNI,MatveevIV,LobovIB: StructurespecificDNA-bindingproteinsasthefoundationforthree-dimensional chromatinorganization. Internationalreviewofcytology 2003, 224 :227-96. 49.KobliakovaI,ZatsepinaO,StefanovaV,PolyakovV,KireevI: Thetopology ofearly-andlate-replicatingchromatinindifferentiallydecondensed chromosomes. ChromosomeRes 2005, 13 :169-81. 50.YunisJJ,YasminehWG: Heterochromatin,satelliteDNA,andcellfunction. StructuralDNAofeucaryotesmaysupportandprotectgenesandaidin speciation. Science 1971, 174 :1200-9. 51.GrewalSIS,ElginSCR: TranscriptionandRNAinterferenceinthe formationofheterochromatin. Nature 2007, 447 :399-406. 52.MaisonC,BaillyD,PetersAH,QuivyJP,RocheD,TaddeiA,LachnerM, JenuweinT,AlmouzniG: Higher-orderstructureinpericentric heterochromatininvolvesadistinctpatternofhistonemodificationand anRNAcomponent. Naturegenetics 2002, 30 :329-34. 53.MuchardtC,GuillemeM,SeelerJS,TroucheD,DejeanA,YanivM: CoordinatedmethylandRNAbindingisrequiredforheterochromatin localizationofmammalianHP1alpha. EMBOreports 2002, 3 :975-81. 54.LuJ,GilbertDM: Proliferation-dependentandcellcycleregulated transcriptionofmousepericentricheterochromatin. TheJournalofcell biology 2007, 179 :411-21. 55.ProbstAV,OkamotoI,CasanovaM,ElMarjouF,LeBacconP,AlmouzniG: A strand-specificburstintranscriptionofpericentricsatellitesisrequired forchromocenterformationandearlymousedevelopment. Developmentalcell 2010, 19 :625-38. 56.FalconerE,ChavezEA,HendersonA,PoonSS,McKinneyS,BrownL, HuntsmanDG,LansdorpPM: IdentificationofsisterchromatidsbyDNA templatestrandsequences. Nature 2010, 463 :93-7. 57.BiroPA,Carr-BrownA,SouthernEM,WalkerPM: Partialsequenceanalysis ofmousesatelliteDNAevidenceforshortrangeperiodicities. Journalof molecularbiology 1975, 94 :71-86. 58.SouthernEM: LongrangeperiodicitiesinmousesatelliteDNA. Journalof molecularbiology 1975, 94 :51-69. 
59.HayashiT,OhtsukaH,KuwabaraK,MafuneY,MiyashitaN,MoriwakiK, TakahashiY,KominamiR: Avariantfamilyofmouseminorsatellite locatedonthecentromericregionofchromosome2. Genomics 1993, 17 :490-2. 60.GosdenJR,MtichellAR,BucklandRA,ClaytonRP,EvansHJ: Thelocationof fourhumansatelliteDNAsonhumanchromosomes. Cytogeneticsandcell genetics 1975, 14 :338-9. 61.TherkelsenAJ,NielsenA,KlvraaS: LocalisationoftheclassicalDNA satellitesonhumanchromosomesasdeterminedbyprimedinsitu labelling(PRINS). Humangenetics 1997, 100 :322-6. 62.HigginsMJ,WangHS,ShtromasI,HaliotisT,RoderJC,HoldenJJ,WhiteBN: Organizationofarepetitivehuman1.8kbKpnIsequencelocalizedin theheterochromatinofchromosome15. Chromosoma 1985, 93 :77-86. 63.CookeHJ,HindleyJ: CloningofhumansatelliteIIIDNA:different componentsareondifferentchromosomes. Nucleicacidsresearch 1979, 6 :3177-97. 64.PalomequeT,LoriteP: SatelliteDNAininsects:areview. Heredity 2008, 100 :564-73. 65.BeridzeT: SatelliteDNA Springer-Verlag;1986,149. 66.VinogradovAE: NoncodingDNA,isochoresandgeneexpression: nucleosomeformationpotential. Nucleicacidsresearch 2005, 33 :559-63. 67.VinogradovAE: DNAhelix:theimportanceofbeingGC-rich. Nucleicacids research 2003, 31 :1838-44. 68.MahtaniMM,WillardHF: Pulsed-fieldgelanalysisofalpha-satelliteDNA atthehumanXchromosomecentromere:high-frequency polymorphismsandarraysizeestimate. Genomics 1990, 7 :607-13. 69.PaarV,PavinN,RosandicM,GluncicM,BasarI,PezerR,ZinicSD: ColorHOR novelgraphicalalgorithmforfastscanofalphasatellite higher-orderrepeatsandHORannotationforGenBanksequenceof humangenome. Bioinformatics 2005, 21 :846-852. 70.WarburtonPE,HaafT,GosdenJ,LawsonD,WillardHF: Characterizationof achromosome-specificchimpanzeealphasatellitesubset:evolutionary relationshiptosubsetsonhumanchromosomes. Genomics 1996, 33 :220-8. 71.AlexandrovI,KazakovA,TumenevaI,ShepelevV,YurovY: Alpha-satellite DNAofprimates:oldandnewfamilies. Chromosoma 2001, 110 :253-266. 
72.ParrisGE: ScopeofmedicalimplicationsoftheMasterDevelopment Programhypothesis. Medicalhypotheses 2010, 74 :953. 73. GenBankftpsite. [ftp://ftp.ncbi.nih.gov/genbank/wgs]. 74. Mousegenomeassemblybuild37.1. [ftp://ftp.ncbi.nih.gov/genomes/ M_musculus/ARCHIVE/BUILD.37.1/Assembled_chromosomes/]. 75. MGSCgenomeassemblyrelease3. [ftp://ftp.ncbi.nih.gov/genomes/ M_musculus/ARCHIVE/MGSCv3_Release3/Assembled_Chromosomes/]. 76. Mouseideograms. [ftp://ftp.ncbi.nih.gov/genomes/M_musculus/ARCHIVE/ BUILD.37.1/mapview/ideogram.gz]. 77. Repbasecollectionversion15.7. [http://www.girinst.org/server/archive/ RepBase15.07/]. 78.CamachoC,CoulourisG,AvagyanV,MaN,PapadopoulosJ,BealerK, MaddenTL: BLAST+:architectureandapplications. BMCbioinformatics 2009, 10 :421. 79.JurkaJ,KapitonovVV,PavlicekA,KlonowskiP,KohanyO,WalichiewiczJ: RepbaseUpdate,adatabaseofeukaryoticrepetitiveelements. Cytogeneticandgenomeresearch 2005, 110 :462-7. 80.BlakeJA,BultCJ,EppigJT,KadinJA,RichardsonJE: TheMouseGenome Databasegenotypes::phenotypes. Nucleicacidsresearch 2009, 37 :D712-9. 81.QuinlanAR,ClarkRA,SokolovaS,LeibowitzML,ZhangY,HurlesME, MellJC,HallIM: Genome-widemappingandassemblyofstructural variantbreakpointsinthemousegenome. Genomeresearch 2010. 82.FordEH,HamertonJL: Astudyofthemitoticchromosomesofmiceif thestrongaline. ExpCellRes 1963, 32 :320-6. 83.TagarroI,Fernndez-PeraltaAM,Gonzlez-AguileraJJ: Chromosomal localizationofhumansatellites2and3byaFISHmethodusing oligonucleotidesasprobes. Humangenetics 1994, 93 :383-8. 84.DeminS,PleskachN,SvetlovaM,SolovjevaL: High-ResolutionMappingof InterstitialTelomericRepeatsinSyrianHamsterMetaphase Chromosomes. Cytogeneticandgenomeresearch 2011, 132 :151-5.doi:10.1186/1471-2164-12-531 Citethisarticleas: Komissarov etal .: TandemlyrepeatedDNAfamilies inthemousegenome. BMCGenomics 2011 12 :531. 

Additional file 1: Tables S1-S7

Table S1. Distances from the CEN gap to the first gene in the reference genome (build 37.1)
For each chromosome except Y (Chr), the distance from the first gene to the centromeric gap (Dist), the number of genes (N genes) and the position of the first TR array (Dist TR; "-", not found; "0", the TR array is located right at the gap) in the 2 Mb region adjacent to the centromeric gap are shown. The gene ID and the type of evidence are also shown.

Chr  Dist (kb)  N genes  Dist TR (kb)  First gene     Gene ID    Gene evidence type
1    205        17       -             Xkr4           497097     best RefSeq; identical
2    36         31       -             9630050M13Rik  269233     best RefSeq; identical
3    149        9        0             Trnak-cuu      100093675  External
4    13         23       6             LOC100039124   100039124  Protein
5    0          26       -             LOC171266      171266     best RefSeq; mismatch
6    19         22       82            LOC100039885   100039885  Protein
7    72         91       -             LOC100041028   100041028  Protein
8    83         62       -             EG664801       664801     Protein
9    134        16       0             4930433N12Rik  114673     mRNA
10   134        11       -             Cnksr3         215748     best RefSeq; identical
11   19         58       0             LOC236604      236604     best RefSeq
12   13         36       -             LOC668260      668260     Protein
13   8          30       -             LOC667375      667375     mRNA
14   4          68       -             LOC100040452   100040452  mRNA; identical
15   3          15       -             LOC674207      674207     mRNA
16   99         47       232           LOC100042177   100042177  mRNA; identical
17   42         20       6             LOC383196      383196     protein; identical
18   161        20       113           LOC664794      664794     Protein
19   94         19       -             LOC630980      630980     Protein
X    32         31       -             LOC385516      385516     best RefSeq


Table S2. WGS MiSat arrays with length >3 kb
For each array found in WGS, row index (N), unit length (Unit), array length (Length), GC%, variability between monomers in array (Var%) and GenBank GI (GI) with array position (Start and End pos) are shown.

N   Unit (bp)  Length (bp)  GC%   Var%  GI        Start pos  End pos
1   112        3105         33.1  7     69824452  1          3105
2   112        3054         33.2  9     69824279  1          3054
3   112        3035         33.1  6     69824189  1          3035
4   112        3020         33.0  7     69824129  1          3020
5   112        3614         33.3  10    69825764  1          3614
6   112        3505         32.6  8     69825514  1          3505
7   112        3376         32.6  8     69825237  1          3376
8   112        3255         32.8  9     69824919  1          3255
9   112        3171         33.3  15    69824647  1          3171
10  120        3850         31.7  17    69780418  956        4805
11  120        3029         33.1  5     69824173  1          3029
12  120        6080         32.2  15    69778468  1          6080
13  120        3142         32.9  18    69825085  177        3318
14  223        3874         31.7  16    69776774  1          3874
15  232        4672         31.4  15    69827328  1          4672
16  232        4670         31.6  15    69777779  1          4670
17  232        3434         31.2  16    69825410  19         3452
18  232        5491         31.6  12    69828178  1          5491
19  240        4490         32.2  15    69827114  8          4497
20  360        4694         32.4  14    69827298  1          4694
21  1054       3206         31.9  10    69824761  1          3206
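
The relationship between the Length, Start pos and End pos columns of Table S2 can be checked with a short script. The following is a minimal sketch, not part of the original analysis: it assumes that positions are 1-based and inclusive (an assumption that agrees with the rows shown), and the helper names `span_length` and `approx_copies` are hypothetical. It uses a handful of rows copied from the table.

```python
# Rows copied from Table S2: (N, unit_bp, length_bp, start_pos, end_pos).
rows = [
    (1, 112, 3105, 1, 3105),
    (10, 120, 3850, 956, 4805),
    (13, 120, 3142, 177, 3318),
    (17, 232, 3434, 19, 3452),
    (19, 240, 4490, 8, 4497),
    (21, 1054, 3206, 1, 3206),
]


def span_length(start, end):
    """Length of a 1-based, inclusive interval (assumed coordinate convention)."""
    return end - start + 1


def approx_copies(length, unit):
    """Approximate number of monomers in an array (illustrative helper)."""
    return length / unit


# Collect the row indices whose reported length disagrees with the coordinates.
inconsistent = [
    n for n, unit, length, start, end in rows
    if span_length(start, end) != length
]

for n, unit, length, start, end in rows:
    print(n, unit, length, round(approx_copies(length, unit), 1))
print("inconsistent rows:", inconsistent)
```

The same check is useful when digits have been split during text extraction, because a mis-joined number in any of the three columns shows up as an inconsistent row.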


3 Table S 3 WGS Multi locus, Single locus and U n placed TR arrays with length >3 kb Columns names are th e same as in Table S 3 with the subfamily name added for each array N Subfamily Unit (bp) Length (bp) GC % Var % NCBI GI Start pos End pos TR 21A MM 1 TRPC 21A MM 21 4869 49 28 69913686 12 4880 2 TRPC 21A MM 21 4302 5 1 29 69885816 2279 6580 3 TRPC 21 A MM 21 15684 50 28 69845347 2865 18548 4 TRPC 21A MM 21 10656 50 27 69845346 1 10656 5 TRPC 21A MM 21 3336 5 1 27 69825142 1 3336 6 TRPC 21A MM 21 6763 49 30 69551604 2806 9568 7 TRPC 21A MM 21 3775 50 29 69829807 4 3778 8 TRPC 21A MM 21 7726 48 31 20 779795 3 7728 9 TRPC 21A MM 21 4581 50 27 69827752 481 5061 10 TRPC 21A MM 21 11791 50 32 69787202 12893 24683 11 TRPC 21A MM 21 3120 50 31 20595812 73097 76216 12 TRPC 21A MM 21 4956 49 30 20595819 1405 6360 13 TRPC 21A MM 21 4089 48 30 20595822 2 40 90 14 TRPC 21A MM 21 4349 4 8 31 20595828 296 4644 15 TRPC 21A MM 21 5078 50 27 20615795 3 5080 16 TRPC 21A MM 21 5288 48 30 20629104 4626 9913 17 TRPC 21A MM 21 3152 49 29 20646974 47 3198 18 TRPC 21A MM 21 3430 50 31 20653070 4 3433 19 TRPC 21A MM 2 1 4335 49 31 20663797 124 4458 20 TRPC 21A MM 21 3584 4 9 29 20663806 1 3584 21 TRPC 21A MM 21 4101 4 9 27 20679820 14184 18284 22 TRPC 21A MM 21 3584 48 30 20698680 10019 13602 23 TRPC 21A MM 21 4787 49 31 20716982 22 4808 24 TRPC 21A MM 21 3221 5 1 28 20721109 3 3223 25 TRPC 21A MM 21 14741 50 28 20725869 1 14741 26 TRPC 21A MM 21 12118 50 28 20726665 27 12144 27 TRPC 21A MM 21 7698 49 32 20733427 3 7700 28 TRPC 21A MM 21 7307 48 31 69798865 2 7308 29 TRPC 21A MM 21 5903 49 27 69798864 1 5903 30 T RPC 21A MM 21 8021 49 29 69798860 9 8029 31 TRPC 21A MM 21 3715 49 29 69798859 66 3780 32 TRPC 21A MM 21 4107 50 31 20757956 2436 6542 33 TRPC 21A MM 21 6271 4 9 28 69798688 11593 17863 34 TRPC 21A MM 21 9932 50 27 20763719 1 9932 35 TRPC 21A MM 42 298 84 48 33 69970350 142 30025 36 TRPC 21A MM 42 7307 49 30 20779788 9 7315 37 TRPC 21A MM 42 15628 50 30 69779856 8 15635 38 
TRPC 21A MM 42 7426 48 30 69779230 703 8128


4 N Subfamily Unit (bp) Length (bp) GC % Var % NCBI GI Start pos End pos 39 TRPC 21A MM 42 4739 49 27 20706577 228 4966 40 TRPC 21A MM 42 3738 4 8 27 20725369 218 3955 41 TRPC 21A MM 42 6167 48 30 69798871 2099 8265 42 TRPC 21A MM 63 4417 50 28 69845345 3 4419 43 TRPC 21A MM 63 8397 48 32 69970350 58294 66690 44 TRPC 21A MM 63 12594 50 28 69779062 4 12597 45 TRPC 21A MM 63 4386 49 30 69798870 1 4386 46 TR PC 21A MM 125 3137 49 32 20725866 4 3140 47 TRPC 21A MM 168 10758 49 32 69970349 50388 61145 48 TRPC 21A MM 209 17198 4 9 31 69970350 33180 50377 49 TRPC 21A MM 209 9587 48 32 20698680 294 9880 50 TRPC 21A MM 209 7481 48 29 69798863 17 7497 Multi locus TR 51 TR 4A MM 4 7704 44 37 69762187 10027 17730 52 TR 4A MM 6 3001 28 29 69760073 30856 33856 53 TR 4A MM 8 3063 37 30 20741311 1472 4534 5 4 TR 4A MM 10 3416 27 34 69760073 36036 39451 55 TR 4A MM 19 3304 32 31 69895613 541 3844 56 TR 4A MM 23 3661 30 18 69995841 18009 21669 57 TR 4B MM 4 4161 44 38 69762184 3574 7734 58 TR 4B MM 30 3294 48 33 69798594 275 3568 59 TR 6A MM 6 4151 60 40 69772082 12 4162 60 TR 6A MM 14 6649 60 39 69772081 68 6716 61 TR 16A MM 16 6765 53 24 69920266 4719 11483 62 TR 16A MM 16 3350 55 11 69894355 7221 10570 63 TR 18A MM 19 4013 56 25 69770893 9 4021 64 TR 18A MM 19 5644 56 29 20615423 2663 8306 65 TR 18A MM 19 3855 55 28 69780122 4397 8251 66 TR 19B MM 19 4175 46 13 69874805 8073 12247 67 TR 19B MM 19 5852 51 30 20692411 12070 17921 68 TR 19B MM 19 4062 46 14 20740135 1 4062 69 TR 20A MM 20 3470 55 26 69767056 338141 341610 70 TR 20A MM 20 3604 36 36 69980749 39780 43383 71 TR 22A MM 22 4784 58 23 69924996 3752 8535 72 TR 22A MM 22 3429 59 30 69777571 16 3 444 73 TR 22A MM 22 8198 58 23 69916504 11 8208 74 TR 22A MM 22 5824 58 23 69916503 5 5828 75 TR 22A MM 22 3090 59 25 69863943 4 3093 76 TR 22A MM 22 5424 59 25 69863942 2 5425 77 TR 22A MM 22 12896 57 23 69780624 1 12896 78 TR 22A MM 22 5674 57 22 6 9798385 3 5676 79 TR 22A MM 44 8522 58 27 69916502 34698 43219 80 TR 27A MM 27 3346 
39 26 69892976 4096 7441 81 TR 27A MM 27 7073 39 29 69863967 1 7073 82 TR 27A MM 27 4261 40 26 69863966 15 4275


5 N Subfamily Unit (bp) Length (bp) GC % Var % NCBI GI Start pos End pos 83 TR 27A MM 27 4929 40 26 20735549 1 4929 84 TR 30A M M 30 3912 46 37 69844207 316 4227 85 TR 31A MM 31 3673 50 13 69826951 740 4412 86 TR 31A MM 93 6558 52 27 69864994 1866 8423 87 TR 31B MM 31 3037 53 18 69778775 1 3037 88 TR 31B MM 31 4922 53 20 69778774 1 4922 89 TR 31C MM 31 3147 50 23 69406430 1 31 47 90 TR 31C MM 31 13059 49 25 69963525 1 13059 91 TR 31C MM 31 3584 49 23 69963521 1 3584 92 TR 31C MM 31 10181 50 25 20654307 3 10183 93 TR 38C MM 38 6796 50 26 69871294 6158 12953 94 TR 38C MM 38 6797 50 29 20595065 37970 44766 95 TR 38C MM 38 360 1 46 22 20622731 26471 30071 96 TR 57A MM 57 3229 59 30 69887864 83720 86948 97 TR 57A MM 192 3228 59 31 69887864 83719 86946 98 TR 57A MM 1869 5619 39 1 20614333 30335 35953 99 TR 58A MM 58 4652 51 34 69922432 22463 27114 100 TR 58A MM 58 4658 51 35 20611377 1479 6136 101 TR 81A MM 81 3483 40 31 69845344 11316 14798 102 TR 1149A MM 1149 5463 46 13 69830128 1816 7278 103 TR 1164A MM 1164 3333 49 19 69763564 3174 6506 104 TR 1521A MM 1521 3211 45 11 20694878 25015 28225 105 TR 1521A MM 1526 3213 45 12 69980699 1682 4894 106 TR 1527A MM 1527 3120 46 12 20695066 1 3120 107 TR 1595A MM 1595 3007 39 16 69766272 1407 4413 Single locus TR 108 TR 17A MM 17 3748 44 31 69842800 104362 108109 109 TR 17A MM 17 3443 44 31 20724788 12 3454 110 TR 17A MM 38 10270 43 34 69842800 78622 88891 111 TR 17A MM 75 10312 43 35 69842800 78568 88879 112 TR 17A MM 173 3828 43 34 20735817 16645 20472 113 TR 19A MM 19 4883 50 32 69839474 1 4883 114 TR 19A MM 19 4101 50 32 20610023 100 4200 115 TR 23A MM 23 6018 43 13 20784691 54 6071 116 TR 24C MM 24 3298 53 9 69873212 8626 11923 117 TR 29A MM 29 4175 51 27 20782309 21613 25787 118 TR 29A MM 29 3389 50 26 20567260 3 3391 119 TR 29A MM 29 3540 50 25 20567262 1329 4868 120 TR 29A MM 145 3591 50 28 20567262 749 4339 121 TR 29B MM 29 15896 50 30 69974133 17596 33491 122 TR 31D MM 31 5175 56 24 69827874 1 5175 123 TR 33A MM 33 
3601 53 17 20584741 3 3603 124 TR 34A MM 34 3352 55 7 69872454 20569 23920 125 TR 34A MM 34 3354 55 7 20680935 20521 23874 126 TR 38A MM 3 8 5426 42 27 20783783 76 5501


6 N Subfamily Unit (bp) Length (bp) GC % Var % NCBI GI Start pos End pos 127 TR 38A MM 38 3941 44 23 20670356 6 3946 128 TR 38B MM 38 6113 42 26 69904150 1 6113 129 TR 38B MM 38 4436 42 25 69904117 1 4436 130 TR 39A MM 39 3387 39 5 20793760 481 3867 131 TR 40A MM 40 6612 63 21 69856745 14026 2 0637 132 TR 44A MM 44 3016 36 9 20625740 1 3016 133 TR 48A MM 48 6603 51 28 69861320 12400 19002 134 TR 54A MM 54 4947 48 28 20687861 2718 7664 135 TR 54A MM 162 10744 48 32 20752452 3470 14213 136 TR 54A MM 269 5702 48 35 20756725 1 5702 137 TR 54B MM 54 3164 48 26 69768762 6466 9629 138 TR 54B MM 108 11978 48 33 69768777 40749 52726 139 TR 54B MM 161 3322 48 28 69768758 369 3690 140 TR 54B MM 162 10870 48 31 69768762 6494 17363 141 TR 56A MM 56 4194 43 22 69768250 2137 6330 142 TR 56A MM 168 40 46 42 23 69768246 2003 6048 143 TR 84A MM 84 3040 49 9 20793506 14440 17479 144 TR 93A MM 93 3124 51 24 69825195 16 3139 145 TR 100A MM 100 4364 44 26 69768907 7155 11518 146 TR 100A MM 100 4364 44 26 20680076 7577 11940 147 TR 111A MM 111 3347 52 22 20782355 1996 5342 148 TR 234A MM 234 4472 60 4 20792971 2 4473 149 TR 234A MM 696 6878 59 6 20794729 16026 22903 150 TR 297A MM 297 3100 56 19 69842306 98123 101222 151 TR 321A MM 321 3152 42 21 20643323 25711 28862 152 TR 734A MM 734 9507 37 24 2074 7431 35163 44669 153 TR 734A MM 740 8739 37 21 20747431 26230 34968 154 TR 734A MM 1474 4899 37 21 69762220 13783 18681 155 TR 814A MM 814 3175 45 3 20649433 1845 5019 156 TR 1146A MM 1146 3056 47 14 69834409 63293 66348 157 TR 1284A MM 1284 5239 42 1 4 69888739 38050 43288 158 TR 1384A MM 1384 3665 26 8 20679591 308 3972 159 TR 1870A MM 1870 5795 45 2 20649296 1 5795 160 TR 1870A MM 1872 4818 44 5 20646096 1 4818 161 TR 1870A MM 1873 4540 45 3 20675563 1 4540 162 TR 1872A MM 1872 4285 43 17 698996 72 10072 14356 163 TR 1908A MM 1908 4126 40 14 20745884 6859 10984 Un p laced TR 164 TR 13A MM 13 4477 56 15 69917804 1327 5803 165 TR 24A MM 24 3322 46 23 20641441 6 3327 166 TR 24A MM 24 3540 
45 22 20756212 1 3540 167 TR 24B MM 24 4393 35 26 69883785 2 4394 168 TR 24B MM 24 7636 34 28 69883682 2 7637 169 TR 24B MM 24 4707 34 29 69827337 1 4707 170 TR 27B MM 27 3452 64 36 69798693 7952 11403


7 N Subfamily Unit (bp) Length (bp) GC % Var % NCBI GI Start pos End pos 171 TR 28A MM 28 3195 41 23 69825025 1 3195 172 TR 36A MM 36 3003 64 8 69834529 8 3010 173 TR 102A MM 102 3698 46 31 69765558 1 3698 174 TR 624A MM 624 3297 43 2 20754462 1 3297


Table S4. WGS TE-related arrays with length >3 kb
Column names are the same as in Table S3.

N   Unit (bp)  Length (bp)  GC%   Var%  GI        Start pos  End pos
TR-MTA-MM
1   1478       3351         45.8  1     20568724  10446      13796
2   1484       3363         46.2  1     69867327  15344      18706
3   1488       3225         45.5  19    69762556  22864      26088
4   1488       3379         45.4  18    20681598  1809       5187
5   1489       4850         45.4  1     69900412  13860      18709
6   1490       3375         45.8  0     69916732  23310      26684
7   1491       3370         45.9  2     69921411  13359      16728
8   1492       3381         45.2  0     69866223  1031       4411
9   1493       3497         45.4  3     20676965  258        3754
10  1493       3377         45.8  2     20705703  6907       10283
11  1494       3386         46.3  1     69958941  35314      38699
12  1497       3384         45.1  2     20610870  2687       6070
13  1502       4272         46.4  1     20583185  1          4272
14  1505       3352         46.1  2     20564801  17230      20581
15  1551       3498         45.0  4     20637905  13405      16902
TR-LINE-MM
16  1090       3284         42.9  9     20565614  5144       8427
17  1119       3127         41.7  11    20613371  2668       5794
18  1185       3559         39.6  5     20698228  1682       5240
19  1304       3880         44.7  4     69956417  1350       5229
20  1528       3973         40.5  5     69762966  24672      28644
21  1559       3114         42.0  17    69834101  27784      30897
22  1581       3169         42.4  8     69983060  25193      28361
23  1590       3187         43.8  3     69837416  15306      18492
24  1595       3195         42.6  12    20778512  12941      16135
25  1601       3251         43.1  7     20701771  2788       6038
26  1627       4214         42.3  3     20600012  272        4485
27  1874       3712         42.2  10    69834759  16426      20137
28  1877       3702         42.1  11    20597462  22         3723
29  1962       3863         42.0  9     69885494  34804      38666
30  1982       3949         42.4  9     69960655  25917      29865


9 Table S 5 Positions of TE related arrays with length >3 kb in the mouse reference genome (build 37.1) For each array row index (N), chromosome N (Chr) and Chromo band, start and end position in the reference genome are shown. Alignment length is the length of a genomic region covered with precise array. N Chr Chr omo Band Start Position (bp) End Position (bp) Alignment length (bp) TR LINE MM 1 1 1A2 6538348 6541607 3259 2 1 1B 26546515 26550550 4035 3 1 1B 30527087 30530065 2978 4 1 1C1.2 44531737 44535648 3911 5 1 1C1.2 46498285 46501441 3156 6 1 1C1.2 4675 6918 46760143 3230 7 1 1C1.2 48744898 48748065 3167 8 1 1C1.2 48910328 48914008 3743 9 1 1C4 67764812 67768002 3194 10 1 1C4 70007668 70010917 3249 11 1 1D 81444956 81448665 3790 12 1 1E1.2 100291932 100294973 3041 13 1 1E1.2 101437680 101440834 322 5 14 1 1E1.2 101474628 101477508 2880 15 1 1E1.2 101821917 101825051 3147 16 1 1E2.2 108570242 108573341 3099 17 1 1E2.2 111152468 111155646 3843 18 1 1E3 114267839 114271625 3786 19 1 1E3 115617472 115620725 3266 20 1 1E3 119585097 119588204 3108 21 1 1F 132301392 132304559 3167 22 1 1G2 148889162 148892240 3078 23 1 1H5 179759633 179762784 3151 24 1 1H5 179847620 179851576 3956 25 2 2A2 3764054 3767782 3811 26 2 2A2 11912285 11915749 3464 27 2 2A2 12584733 12588362 3629 28 2 2B 15502152 155 05315 3185 29 2 2B 17011062 17014036 2974 30 2 2C1.2 48906504 48909523 3019 31 2 2C1.2 56784083 56787377 3294 32 2 2C1.2 57424863 57428488 3708 33 2 2E2 83218105 83221691 3586 34 2 2E2 86588263 86591281 3018 35 2 2E2 87724158 87727847 3752 36 2 2E2 95055992 95059259 3267 37 2 2E2 95125623 95128884 3261 38 2 2E2 97683248 97686487 3245 39 2 2E4 111012787 111015776 2989 40 2 2F3 123772834 123776340 3506 41 2 2G2 139086653 139089811 3158 42 2 2G2 140704524 140707406 2882 43 2 2H1 151277926 151281 284 3358 44 2 2H1 151303993 151307247 3254 45 3 3A2 4329803 4332693 2890 46 3 3A2 5112053 5115147 3099 47 3 3A2 6036134 6039150 3016 48 3 3A2 6599986 6603129 3143 49 3 3A2 10002125 
10005358 3233 50 3 3A2 18252262 18256254 3998 51 3 3B 18882173 1888 5328 3159


10 N Chr Chr omo Band Start Position (bp) End Position (bp) Alignment length (bp) TR LINE MM 52 3 3B 22540031 22544101 4083 53 3 3B 22883572 22886468 2896 54 3 3D 41650908 41653916 3019 55 3 3D 55944653 55947734 3081 56 3 3E2 61475717 61478708 2991 57 3 3E2 61930403 61934298 3895 58 3 3F1 70411095 70414545 3450 59 3 3F1 81397986 81401269 3283 60 3 3F2.2 90707536 90710613 3077 61 3 3F3 106663642 106667609 3971 62 3 3H1 133515183 133519087 3904 63 3 3H1 134732641 134735786 3145 64 3 3H3 139038573 139041743 3170 65 3 3H3 144433609 144436861 3252 66 4 4A2 4200374 4203694 3320 67 4 4A2 7634501 7638447 3949 68 4 4A4 18992723 18996641 3918 69 4 4C4 74748540 74752240 3700 70 4 4C4 80127224 80130418 3223 71 4 4C6 89385628 89388853 3225 72 4 4C6 96512963 96516768 3876 73 4 4D2.2 113513641 113516819 3184 74 4 4D2.2 114981068 11 4984491 3423 75 5 5A2 12776311 12779975 3664 76 5 5B1 15045395 15048626 3255 77 5 5B3 26634125 26637339 3218 78 5 5C2 54950922 54954121 3228 79 5 5E2 85538841 85541710 2869 80 5 5E2 89762491 89766353 3862 81 5 5E4 94054232 94057403 3171 82 6 6B3 49 835162 49838351 3189 83 6 6C2 65247127 65256435 9330 84 6 6C2 69505679 69508678 2999 85 6 6C2 73495770 73499723 3953 86 6 6D3 89679335 89682498 3168 87 6 6D3 90169142 90172227 3085 88 6 6F3 121879755 121883404 3649 89 6 6F3 123080116 123084000 3955 90 6 6G2 131718311 131721314 3003 91 6 6G2 132585280 132588477 3226 92 6 6G3 142127405 142131316 3914 93 7 7B1 17811407 17814548 3154 94 7 7B3 32170794 32173801 3007 95 7 7B3 32906624 32909783 3159 96 7 7B3 33645188 33648351 3163 97 7 7B3 33872085 33875247 3162 98 7 7B3 39125037 39128802 3765 99 7 7B5 48965411 48968513 3102 100 7 7B5 55826053 55829745 3692 101 7 7D1 62858952 62862142 3202 102 7 7D1 68100912 68104131 3230 103 7 7D1 68251632 68254828 3210 104 7 7D1 70544436 70547294 2858 105 7 7E2 95824993 95828113 3132 106 7 7E2 96842603 96845823 3246 107 7 7F1 113541782 113545673 3891 108 8 8A4 30621163 30624226 3063 109 8 8B3.2 58668811 58671827 3016


11 N Chr Chr omo Band Start Position (bp) End Position (bp) Alignment length (bp) TR LINE MM 110 8 8B3.2 58834024 58837215 3197 111 8 8B3.2 62708560 62711735 3180 112 8 8B3.2 645 06155 64509542 3387 113 8 8B3.2 65774983 65780527 5607 114 8 8B3.2 66298633 66302587 3954 115 8 8C1 69049720 69053257 3537 116 8 8C1 73574835 73578831 3996 117 8 8D2 101192544 101195583 3039 118 8 8E1 105470284 105473252 2968 119 8 11A2 131299974 13 1304452 4478 120 9 9A2 8229749 8232922 3173 121 9 9A2 12538018 12541977 3963 122 9 9A4 16599046 16602975 3929 123 9 9A4 18043919 18047872 3953 124 9 9A4 23464563 23468525 3962 125 9 9A5.2 25649201 25652183 2982 126 9 9A5.2 25979682 25982733 3051 12 7 9 9A5.2 26680326 26683552 3226 128 9 9E3.2 83172562 83176224 3745 129 9 9E3.2 84049846 84052779 2933 130 9 9E3.2 87684471 87689308 4840 131 9 9F3 105048812 105051983 3171 132 9 9F3 109204212 109208033 3824 133 9 9F3 109256192 109260011 3890 134 9 9F3 109295944 109299170 3230 135 10 10B2 38479057 38482588 3531 136 10 10B4 47248386 47251665 3280 137 10 10B4 47615536 47618790 3254 138 10 10B4 48314097 48317981 3955 139 10 10B4 49560360 49563567 3207 140 10 10B4 49657259 49660237 2978 141 10 10B 4 52214918 52218602 3747 142 10 10B5.2 65787396 65791114 3807 143 10 10C3 95467930 95471050 3120 144 10 10D2 105387660 105390703 3043 145 11 11A2 8222737 8226666 3933 146 11 11A2 9256588 9259930 3342 147 11 11A2 11993516 11996414 2898 148 11 11A3.2 13763035 13766717 3787 149 11 11A3.2 16281623 16285264 3705 150 11 11A4 26758967 26762451 3484 151 11 11A4 28685025 28688181 3156 152 11 11A4 29236183 29239481 3298 153 11 11B1.1 40055348 40058248 2971 154 11 11B1.3 45420129 45423055 2926 155 11 11D 89826761 89829962 3210 156 11 11E2 105565019 105568704 3748 157 12 12A1.2 5803240 5806391 3151 158 12 12A1.2 9784395 9788057 3662 159 12 12B1 27926337 27929757 3420 160 12 12C2 45589211 45593128 3921 161 12 12C2 50958842 50961981 3139 162 12 12C2 6 2303932 62307122 3235 163 12 12C2 65862526 65866307 3852 164 12 12F1 
102471912 102475082 3199 165 12 12A1.2 115950311 115954025 3841 166 12 12C2 116121534 116125302 3839 167 12 12F1 116890059 116892978 2922


12 N Chr Chr omo Band Start Position (bp) End Position (bp) Alignment length (bp) TR LINE MM 168 12 12A1.2 116983962 116986931 2969 169 13 13A2 11476000 11479490 3490 170 13 13A3.2 16342821 16346681 3860 171 13 13A3.2 22929568 22932637 3069 172 13 13B1 50591334 50595305 3976 173 13 13B3 61329896 61332956 3060 174 13 13C2 70015945 70019825 3880 175 13 13C2 77841875 77845077 3212 176 13 13D1 82157765 82160938 3178 177 13 13D1 84979160 84982082 2922 178 13 13D1 87326313 87329409 3096 179 13 13D1 90587629 90591585 3956 180 13 13D1 92089603 92092909 3327 181 14 14C2 37193667 37197484 3817 182 14 14C2 42679815 42683634 3819 183 14 14C2 43024244 43028063 3819 184 14 14C2 43438020 43441839 3819 185 14 14C2 43629218 43633038 3820 186 14 14D1 59501748 59505477 3729 187 14 14D3 71456588 71459633 3045 188 14 14E5 109969382 109972543 3161 189 14 14E5 119637177 119640212 3035 190 15 15A2 5323741 5327600 3863 191 15 15A2 14889788 14892765 2977 192 15 15B2 18590028 18594015 3987 193 15 15B2 19064719 19068627 3911 194 15 15B2 20435500 20438676 3176 195 15 15B2 24735827 24738987 3164 196 15 15B2 26099537 26103433 3896 197 15 15B2 2 9275259 29278377 3123 198 15 15B3.2 29924994 29928191 3226 199 15 15B3.2 33658595 33662420 3896 200 15 15B3.2 39526886 39530869 3983 201 15 15C 45634175 45637441 3297 202 15 15D2 52353886 52357074 3217 203 15 15D2 63626140 63629326 3191 204 15 15D2 65894286 65897450 3187 205 15 15E3 82554452 82557507 3060 206 17 17B3 38320865 38324574 3772 207 17 17B3 38375792 38379483 3755 208 17 17B3 92821587 92825279 3755 209 18 18A2 12484457 12487579 3122 210 18 18A2 19195065 19197960 2895 211 18 18C 41674 103 41677391 3288 212 18 18C 41763448 41766618 3170 213 18 18D2 50597051 50600235 3184 214 18 18D2 52612627 52615699 3072 215 18 18E1 60498102 60502037 3935 216 18 18E1 85786652 85790379 3727 217 18 18C 90224992 90228683 3773 218 19 19B 7993081 7996 231 3150 219 19 19B 9726672 9730270 3598 220 19 19B 13323411 13326359 2948 221 19 19B 14045002 14048115 3113 222 19 
19D1 39426481 39432408 5927 223 19 19D3 51728435 51732398 3963 224 19 19D3 52505202 52508855 3653 225 X XA1.2 7933876 7936864 2988


13 N Chr Chr omo Band Start Position (bp) End Position (bp) Alignment length (bp) TR LINE MM 2 26 X XA1.2 8055758 8059013 3255 227 X XA1.2 8361541 8364515 2974 228 X XA2 19354237 19358030 3796 229 X XA2 19621522 19624644 3140 230 X XA3.2 21100522 21104454 3932 231 X XA3.2 22606305 22610003 3761 232 X XA4 36184421 36187755 3334 233 X XA4 36967 814 36970874 3060 234 X XA6 47255840 47258912 3072 235 X XA7.2 61138059 61141954 3966 236 X XA7.2 61162816 61165732 2916 237 X XA7.2 63588958 63592482 3527 238 X XA7.2 67525370 67528424 3054 239 X XB 68591885 68594881 2996 240 X XB 72603481 72607492 4011 241 X XB 73798138 73801477 3339 242 X XC2 82617192 82620235 3043 243 X XD 90488309 90491966 3740 244 X XD 92461140 92465114 3974 245 X XE2 105690272 105693332 3060 246 X XE2 105804998 105807945 2947 247 X XE2 112647960 112650875 2915 248 X XE 2 116796602 116800031 3429 249 X XF1 123360830 123364018 3192 250 X XF1 125021740 125024862 3122 251 X XF1 125847938 125850879 2941 252 X XF3 132164562 132167756 3199 253 X XF3 133950754 133954654 3904 254 X XF3 138583415 138587051 3714 255 X XF3 14 2144293 142148029 3799 256 X XF5 146806123 146809930 3810 257 X XF5 151446178 151449374 3204 258 X XF5 153758165 153761282 3132 259 X XF5 155336215 155339255 3040 260 X XF5 155571523 155576301 4801 261 X XF5 162092385 162095448 3063 262 X XF5 165195 974 165198845 2900 TR MTA MM 263 1 1C1.2 32014240 32022623 8383 264 2 2A2 5466924 5469925 3001 265 2 2B 20344705 20348204 3499 266 2 2C1.2 46198663 46203548 4885 267 3 3F1 77771511 77774895 3385 268 3 3F2.2 96074241 96077418 3177 269 4 4B1 37970056 37973441 3385 270 4 4C6 86104230 86109095 4865 271 5 5F 103128255 103131638 3383 272 6 6E2 102800300 102803770 3470 273 10 19D3 129900158 129903529 3371 274 12 12C2 52134210 52137613 3403 275 12 12D3 84388876 84392397 3521 276 12 12D3 84657947 8466 1468 3521 277 16 16C1.2 44947448 44950829 3381 278 16 16C4 85781779 85785167 3388 279 17 17B3 33191637 33195008 3371 280 17 17B3 34599239 34602615 3378 281 19 
19D1 39330195 39345529 15334 282 19 19D3 53042426 53045797 3371


14 N Chr Chr omo Band Start Position (bp) End Position (bp) Alignment length (bp) TR LINE MM 283 X XB 69435096 69438444 3350 284 X XC2 77211342 77214715 3373


Table S6. Probes used for FISH
For each probe, the subfamily and family (according to Tables 4-6), the probe type (double-stranded, DS, or single-stranded, SS), the probe length with monomer multiplication, and the probe sequence are shown. In the sequences, adaptors are indicated by upper case.

N  Subfamily    Family  Type  Length
1  TRPC-21A-MM  C3, ML  DS    163 bp (42 bp x3)
   Sequence: GGTCGAAGACACGAAGAACTTTtgtcacagtgctccgctgtggtgacaaagtgtctactgtgttgcaaagtgctcaacttttgtgtcacaatgctgcactgtgttgtcacattctggaactgtggtgtcacagtgttccactttttAGACCGTCATCGGCGTAG
2  TRPC-21A-MM  C3, ML  SS    42 bp (21 bp x2)
   Sequence: gtgtcacagtgctccactgtggtgtcacagtgctccactgtg
3  TR-22A-MM    C4, ML  SS    22 bp
   Sequence: tagccccagggcccaacccatt
4  TR-54B-MM    C5, SL  DS    151 bp (54 bp x2)
   Sequence: GGTCGAAGACACGAAGAACTTTggattgggccttactgtcctttgcataccgcaacacactctgcagctaggatggactaagccttactgtccttagactgacctacagcacaccctgtagctaggataccaccAGACCGTCATCGGCGTAG

Table S7. Tandem repeats in mouse genome assemblies

Assembly                     Build  Assembly size (bp)  Number of contigs  TR (total)  % of assembly  TR (>3 kb)
Reference Genome             37.1   2,654,895,218       21                 826,028     2.6%           234
Alternate (Celera) Assembly  37.1   2,679,921,514       21                 760,414     2.2%           121
MGSC Assembly                3      2,580,596,378       20*                752,199     2.2%           77
ChrUn Ref                    37.1   3,350,358           52                 1,032       12.2%          29
ChrUn Alt (Celera)           37.1   95,253,641          12,483             28,090      10.3%          704
ChrUn MGSC                   3      103,946,130         42,254             38,177      9.9%           111
*except Y chromosome
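
For the probes in Table S6, the upper-case adaptors and the lower-case repeat core can be separated programmatically. The sketch below is illustrative only: the function name `split_probe` is hypothetical, the probe strings are copied as printed in the source (including stray spaces, which are stripped), and the regular expression assumes that adaptors flank a single lower-case core.

```python
import re

# Probe sequences as printed in Table S6; whitespace is an artifact and is removed.
PROBE_2 = "gtgtcacag t gctccactg t ggtgtcaca g tgctccact g tg"
PROBE_3 = "tagccccag g gcccaaccc a tt"
PROBE_4 = (
    "GGTCGAAGA C ACGAAGAAC T TTggattgg g ccttactgt c ctttgcata c c "
    "gcaacaca c tctgcagct a ggatggact a agccttact g tccttagac t "
    "gacctacag c acaccctgt a gctaggata c accAGACCG T CATCGGCGT A G"
)


def split_probe(raw):
    """Return (5' adaptor, repeat core, 3' adaptor) for one probe string.

    Assumes upper-case adaptors (possibly empty) around one lower-case core.
    """
    seq = "".join(raw.split())  # drop all whitespace
    match = re.fullmatch(r"([A-Z]*)([a-z]+)([A-Z]*)", seq)
    if match is None:
        raise ValueError("probe does not follow the adaptor/core/adaptor layout")
    return match.group(1), match.group(2), match.group(3)


for name, raw in [("probe 2", PROBE_2), ("probe 3", PROBE_3), ("probe 4", PROBE_4)]:
    five, core, three = split_probe(raw)
    total = len(five) + len(core) + len(three)
    print(name, "total:", total, "core:", len(core), "adaptors:", five, three)
```

Single-stranded probes carry no adaptors, so both adaptor fields come back empty for them.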


Additional file 2. List of MaSat arrays in the mouse WGS assemblies with length greater than 3 kb. For each array found in WGS, the unit length (Unit), array length (Length), GC%, variability between monomers in the array (Var%), HOR, and GenBank GI (GI) with the array position (Start and End pos) are shown. All MaSat arrays are present in ChrUn.

N Unit (bp) Length (bp) GC% Var% HOR GI Start pos End pos
1 30 3234 36.5 46 + 69778673 1634 4867
2 58 7262 35.0 31 + 69829786 34 7295
3 58 7942 35.4 31 + 69829517 71 8012
4 58 7563 35.1 31 + 69829356 1 7563
5 58 6037 34.2 30 + 69829296 4 6040
6 58 6725 35.3 33 + 69829197 1 6725
7 58 3126 35.3 33 + 69829087 48 3173
8 58 4809 36.1 30 + 69828991 47 4855
9 58 6572 36.0 31 + 69828909 1 6572
10 58 6424 35.7 34 + 69828831 2 6425
11 58 5042 35.4 30 + 69828825 8 5049
12 58 6046 35.4 31 + 69828614 1 6046
13 58 5597 36.1 30 + 69828374 110 5706
14 58 5424 37.1 32 + 69828164 44 5467
15 58 5250 35.2 31 + 69827957 7 5256
16 58 5062 35.4 32 + 69827764 8 5069
17 58 5049 35.6 33 + 69827741 1 5049
18 58 5001 35.9 29 + 69827710 34 5034
19 58 4945 35.4 31 + 69827619 1 4945
20 58 4925 35.6 30 + 69827589 1 4925
21 58 3438 34.1 32 + 69827529 21 3458
22 58 4539 35.9 31 + 69827148 31 4569
23 58 4545 34.6 30 + 69827115 1 4545
24 58 4396 35.5 31 + 69826932 2 4397
25 58 4338 35.0 32 + 69826878 7 4344
26 58 4260 35.6 29 + 69826799 26 4285
27 58 4183 35.8 30 + 69826678 4 4186
28 58 4048 35.8 31 + 69826544 51 4098
29 58 4064 35.2 31 + 69826540 11 4074
30 58 4043 35.8 33 + 69826458 2 4044
31 58 4028 36.1 31 + 69826429 2 4029
32 58 4010 35.8 30 + 69826402 1 4010
33 58 3953 35.5 37 + 69826390 54 4006
34 58 3917 34.9 31 + 69826302 1 3917
35 58 3929 34.9 33 + 69826300 10 3938
36 58 3845 34.6 33 + 69826261 1 3845
37 58 3868 37.9 29 + 69826197 3 3870
38 58 3830 36.0 29 + 69826179 33 3862
39 58 3351 35.0 31 + 69826178 506 3856
40 58 3796 34.6 33 + 69826057 1 3796
41 58 3407 34.9 32 + 69826048 385 3791
42 58 3729 35.3 31 + 69826010 44 3772
43 58 3659 34.0 32 + 69825951 45 3703
44 58 3662 35.3 32 + 69825874 5 3666
45 58 3642 34.9 34 + 69825813 1 3642
46 58 3632 35.9 31 + 69825807 4 3635
47 58 3623 35.3 30 + 69825806 7 3629
48 58 3558 35.2 31 + 69825717 15 3572
49 58 3590 35.5 31 + 69825714 1 3590
50 58 3537 34.9 32 + 69825615 1 3537


51 58 3508 35.6 33 + 69825596 4 3511
52 58 3524 35.0 30 + 69825574 1 3524
53 58 3507 35.1 31 + 69825530 5 3511
54 58 3482 35.4 30 + 69825525 28 3509
55 58 3440 35.9 30 + 69825474 34 3473
56 58 3479 34.8 30 + 69825463 2 3480
57 58 3472 36.5 30 + 69825447 1 3472
58 58 3458 35.6 30 + 69825443 4 3461
59 58 3319 38.0 34 + 69825345 3 3321
60 58 3422 35.0 30 + 69825340 1 3422
61 58 3403 35.0 33 + 69825298 1 3403
62 58 3386 35.0 31 + 69825261 1 3386
63 58 3370 35.7 30 + 69825250 2 3371
64 58 3364 35.2 31 + 69825222 9 3372
65 58 3326 35.7 30 + 69825193 1 3326
66 58 3335 35.9 30 + 69825190 1 3335
67 58 3125 35.2 31 + 69825189 228 3352
68 58 3352 36.3 30 + 69825187 5 3356
69 58 3334 35.7 31 + 69825167 1 3334
70 58 3335 35.4 31 + 69825157 1 3335
71 58 3325 35.3 32 + 69825114 5 3329
72 58 3311 34.8 33 + 69825101 6 3316
73 58 3298 35.3 28 + 69825096 20 3317
74 58 3241 35.3 31 + 69825084 55 3295
75 58 3275 34.5 33 + 69825069 6 3280
76 58 3298 34.9 32 + 69825052 13 3310
77 58 3265 35.8 29 + 69825035 38 3302
78 58 3253 37.8 28 + 69824936 10 3262
79 58 3220 35.6 30 + 69824873 21 3240
80 58 3235 36.6 33 + 69824851 1 3235
81 58 3218 34.9 30 + 69824840 14 3231
82 58 3171 35.4 33 + 69824660 3 3173
83 58 3169 36.7 31 + 69824651 2 3170
84 58 3127 35.9 30 + 69824592 1 3127
85 58 3022 35.2 31 + 69824541 109 3130
86 58 3010 36.9 31 + 69824515 34 3043
87 58 3108 34.4 31 + 69824492 6 3113
88 58 3111 35.3 30 + 69824488 4 3114
89 58 3109 35.4 32 + 69824470 1 3109
90 58 3056 35.1 30 + 69824392 20 3075
91 58 3069 35.9 30 + 69824340 1 3069
92 58 3040 35.0 31 + 69824317 25 3064
93 58 3062 35.5 31 + 69824313 1 3062
94 58 3033 34.9 31 + 69824305 5 3037
95 58 3025 35.5 31 + 69824234 1 3025
96 58 3041 34.9 32 + 69824219 1 3041
97 58 3034 35.8 29 + 69824185 1 3034
98 58 3015 35.0 32 + 69824177 6 3020
99 58 3026 35.5 32 + 69824172 3 3028
100 58 3013 36.9 30 + 69824168 13 3025
101 58 7154 36.6 29 + 69809557 6 7159
102 58 3179 35.7 32 + 69809542 2576 5754
103 58 3675 37.2 30 + 69809536 3 3677
104 58 3027 37.4 29 + 69809462 3 3029
105 58 5270 37.3 35 + 69809461 12 5281
106 58 4930 37.1 29 + 69809459 3068 7997
107 58 7891 37.3 30 + 69809439 1 7891
108 58 4371 35.9 30 + 69809132 1 4371


109 58 3715 35.1 33 + 69809131 1 3715
110 58 3783 36.1 31 + 69809129 28 3810
111 58 8067 36.0 34 + 69809128 4 8070
112 58 6305 35.6 34 + 69809024 1 6305
113 58 5249 36.6 33 + 69809021 4 5252
114 58 5542 36.9 31 + 69809018 270 5811
115 58 3127 37.4 31 + 69798858 1622 4748
116 58 3489 37.3 28 + 69798855 61 3549
117 58 5084 36.1 30 + 69798845 36 5119
118 58 6407 36.5 32 + 69798839 166 6572
119 58 4609 36.0 33 + 69798833 1 4609
120 58 3211 36.5 29 + 69798770 3 3213
121 58 9717 37.2 25 + 69798696 8 9724
122 58 9581 36.5 36 + 69798472 2 9582
123 58 7373 34.6 32 + 69798357 3 7375
124 58 5595 35.4 32 + 69798352 1 5595
125 58 5428 34.8 35 + 69798332 27 5454
126 58 3940 35.3 32 + 69798323 7 3946
127 58 4826 35.5 31 + 69798153 1 4826
128 58 3565 34.8 33 + 69798151 1 3565
129 58 3846 34.8 31 + 69798043 53 3898
130 58 5200 34.9 33 + 69798042 40 5239
131 58 3061 34.2 37 + 69798041 29 3089
132 58 4259 34.9 31 + 69798039 10 4268
133 58 3099 35.1 32 + 69798014 36 3134
134 58 3251 34.9 32 + 69798013 2 3252
135 58 7133 37.3 32 + 69797954 4 7136
136 58 4184 37.3 28 + 69797945 1 4184
137 58 9971 35.0 34 + 69797933 1 9971
138 58 5392 35.0 32 + 69797914 1 5392
139 58 5150 34.9 31 + 69787705 1 5150
140 58 3689 35.6 30 + 69787579 1 3689
141 58 5575 35.9 30 + 69787539 1 5575
142 58 3582 35.5 32 + 69787538 1 3582
143 58 3927 37.3 29 + 69787495 36 3962
144 58 4184 37.1 32 + 69787493 25 4208
145 58 3384 36.0 32 + 69787468 4 3387
146 58 6026 35.7 31 + 69787440 35 6060
147 58 3451 34.6 30 + 69787437 61 3511
148 58 4081 35.1 31 + 69787436 1 4081
149 58 5833 35.1 31 + 69787412 1578 7410
150 58 5551 34.9 32 + 69787332 1 5551
151 58 4643 36.1 30 + 69787179 12825 17467
152 58 5649 34.7 31 + 69787071 8 5656
153 58 4069 34.2 30 + 69787032 6 4074
154 58 3772 38.2 31 + 69780706 13165 16936
155 58 3900 35.7 31 + 69780432 10 3909
156 58 3340 35.7 30 + 69780430 1 3340
157 58 3962 35.2 32 + 69780289 66 4027
158 58 3635 34.8 33 + 69780288 4 3638
159 58 5224 35.2 32 + 69780281 6 5229
160 58 4415 34.9 31 + 69780280 54 4468
161 58 3147 35.1 31 + 69780253 1 3147
162 58 3882 35.8 33 + 69780251 4 3885
163 58 3042 34.8 33 + 69780163 4 3045
164 58 3725 35.3 31 + 69780131 1 3725
165 58 8091 34.3 32 + 69780126 1388 9478
166 58 7065 35.7 30 + 69780037 10 7074


167 58 5846 36.4 32 + 69779747 2215 8060
168 58 5480 35.9 30 + 69779623 1 5480
169 58 3423 36.8 31 + 69779620 8 3430
170 58 3583 35.0 34 + 69779472 27 3609
171 58 5870 38.2 29 + 69779402 1296 7165
172 58 3014 34.6 33 + 69779248 10 3023
173 58 4430 34.7 31 + 69779247 1 4430
174 58 3019 36.0 32 + 69779064 5 3023
175 58 7819 36.1 29 + 69779060 3 7821
176 58 5334 35.6 31 + 69779000 1 5334
177 58 6014 35.4 33 + 69778999 6 6019
178 58 4664 35.5 33 + 69778733 2 4665
179 58 4056 35.4 33 + 69778707 4 4059
180 58 3294 35.0 32 + 69778495 1 3294
181 58 3727 36.3 32 + 69778459 13 3739
182 58 3781 35.1 30 + 69778435 1 3781
183 58 3887 35.1 32 + 69778173 1 3887
184 58 3167 36.2 29 + 69777951 5 3171
185 58 8418 37.1 32 + 69777927 3 8420
186 58 4660 36.8 33 + 69777926 1 4660
187 58 3599 35.3 32 + 69777715 8 3606
188 58 3356 36.1 34 + 69777607 57 3412
189 58 4453 35.8 31 + 69777606 11 4463
190 58 3337 35.7 30 + 69777471 1 3337
191 58 3276 34.5 30 + 69777259 24 3299
192 58 5031 35.8 31 + 69777178 13 5043
193 58 4293 35.1 32 + 69777056 25 4317
194 58 3574 35.9 29 + 69777052 2 3575
195 58 4894 34.9 34 + 69777001 10 4903
196 58 3501 36.3 30 + 69776871 1 3501
197 58 3377 35.2 29 + 69775756 9 3385
198 58 3539 36.1 27 + 69601673 1306 4844
199 58 4148 36.6 32 + 69970351 1 4148
200 58 4496 34.4 33 + 20772172 1 4496
201 58 8325 36.2 31 + 20778062 6 8330
202 58 5026 36.4 29 + 20778115 1 5026
203 58 5427 35.5 31 + 20784542 2 5428
204 58 4061 35.2 30 + 20641030 19 4079
205 58 4283 36.1 33 + 20646310 724 5006
206 58 3551 34.6 32 + 20648551 28 3578
207 58 3002 36.7 33 + 20668347 32 3033
208 58 3011 37.5 32 + 20668393 62 3072
209 58 4128 34.9 28 + 20680103 1 4128
210 58 3748 34.4 36 + 20683179 2 3749
211 58 3919 36.6 29 + 20688657 1279 5197
212 58 4206 36.0 30 + 20713343 8 4213
213 58 3599 36.2 29 + 20719656 9 3607
214 58 4304 35.6 32 + 20722588 2463 6766
215 58 3498 36.6 33 + 20743451 5 3502
216 58 4315 34.6 34 + 20758157 1 4315
217 59 3972 35.6 28 + 69826353 7 3978
218 59 6658 35.7 30 + 69809556 1 6658
219 59 3287 35.4 30 + 69809537 1 3287
220 59 3534 37.0 26 + 69809456 15182 18715
221 59 6615 36.6 33 + 69809141 4 6618
222 59 3763 36.4 34 + 69809139 2970 6732
223 59 5732 36.6 32 + 69809012 2055 7786
224 59 3245 36.5 29 + 69798778 25 3269


225 59 3044 36.3 28 + 69798480 7515 10558
226 59 10914 35.9 32 + 69798310 2 10915
227 59 5647 35.9 33 + 69798309 398 6044
228 59 3342 36.7 32 + 69798306 1125 4466
229 59 15822 37.2 31 + 69798036 1505 17326
230 59 7117 37.0 31 + 69798036 19079 26195
231 59 4586 37.8 33 + 69798035 20 4605
232 59 6872 37.3 27 + 69797946 11044 17915
233 59 5950 37.1 28 + 69787559 595 6544
234 59 22913 37.1 33 + 69787526 33 22945
235 59 5943 36.1 31 + 69787455 69 6011
236 59 4080 35.8 29 + 69780673 4 4083
237 59 9295 37.7 32 + 69780673 15134 24428
238 59 5672 38.1 30 + 69778913 1 5672
239 59 9980 36.2 33 + 69778913 5750 15729
240 59 4252 36.7 33 + 69778864 1458 5709
241 59 5080 36.3 30 + 20766889 1 5080
242 59 3423 36.0 31 + 20775050 2745 6167
243 59 3754 36.8 30 + 20787326 23 3776
244 59 7250 36.8 32 + 20787543 15 7264
245 59 3770 37.1 32 + 20641033 1 3770
246 59 6337 37.4 30 + 20647624 16 6352
247 59 10591 36.9 32 + 20649335 33 10623
248 59 3127 36.1 32 + 20667996 21 3147
249 59 3513 37.2 27 + 20668294 1 3513
250 59 3167 35.8 32 + 20693824 15 3181
251 59 3404 35.7 30 + 20694913 27 3430
252 59 4735 36.2 29 + 20721896 60 4794
253 59 4941 37.5 35 + 20727399 1 4941
254 59 5701 36.2 31 + 20735063 8 5708
255 116 5052 35.4 26 + 69827764 1 5052
256 116 3825 35.0 23 + 69827488 34 3858
257 116 4509 35.3 23 + 69827095 5 4513
258 116 4226 35.5 27 + 69826947 3 4228
259 116 3864 35.5 24 + 69826183 1 3864
260 116 3771 35.3 26 + 69826010 1 3771
261 116 3660 35.2 17   69825846 1 3660
262 116 3361 35.9 19 + 69825698 223 3583
263 116 3110 35.1 25 + 69824474 1 3110
264 116 4618 35.5 30 + 69809548 4 4621
265 116 6019 35.9 32 + 69809547 6546 12564
266 116 3587 36.5 30 + 69809544 13 3599
267 116 6836 35.0 27 + 69798266 3 6838
268 116 3573 34.4 26 + 69798264 14 3586
269 116 4836 34.8 24 + 69798261 1 4836
270 116 4605 35.7 32 + 69787387 349 4953
271 116 3689 36.8 37 + 69787031 15 3703
272 116 3320 34.3 26 + 69780213 2 3321
273 116 6229 36.1 26 + 69778732 1 6229
274 116 3014 34.9 20 + 69778494 1 3014
275 116 3668 34.8 29 + 69777975 10 3677
276 116 3669 35.8 22 + 69777790 1 3669
277 116 3911 34.6 25 + 20779497 1875 5785
278 116 4781 35.7 34 + 20646310 223 5003
279 116 5855 35.2 26 + 20710918 28 5882
280 116 3164 35.4 21 + 20735384 1 3164
281 117 4057 36.9 33 + 69893283 7163 11219
282 117 6657 36.9 25 + 69885819 1 6657


283 117 14395 36.6 30 + 69885813 3 14397
284 117 10039 36.9 27 + 69885812 1 10039
285 117 5741 34.5 22 + 69828408 9 5749
286 117 3893 34.4 25 + 69826593 3 3895
287 117 4091 35.5 24 + 69826542 7 4097
288 117 7666 36.6 29 + 69809557 240 7905
289 117 4347 35.6 30 + 69809549 1925 6271
290 117 3644 35.3 30 + 69809548 8424 12067
291 117 8267 36.6 29 + 69809546 611 8877
292 117 7148 36.6 32 + 69809545 1 7148
293 117 4959 35.6 34 + 69809539 11273 16231
294 117 8647 36.5 29 + 69809533 1 8647
295 117 3388 35.0 39 + 69809450 1398 4785
296 117 5421 37.6 27 + 69809444 8 5428
297 117 3956 36.4 31 + 69809436 813 4768
298 117 6120 35.5 29 + 69809435 1 6120
299 117 3894 36.4 33 + 69809435 6158 10051
300 117 18293 36.7 35 + 69809140 4 18296
301 117 5076 35.6 34 + 69809136 1 5076
302 117 5067 35.0 29 + 69809131 69 5135
303 117 4439 36.5 29 + 69809127 3830 8268
304 117 5140 36.2 29 + 69809020 1177 6316
305 117 8410 36.4 33 + 69809014 15262 23671
306 117 4521 35.6 28 + 69809011 5 4525
307 117 3387 35.9 29 + 69809011 3374 6760
308 117 3765 37.2 34 + 69798842 1004 4768
309 117 6524 35.5 31 + 69798478 1502 8025
310 117 4465 36.3 29 + 69798472 5124 9588
311 117 3821 36.6 31 + 69797996 1 3821
312 117 8028 36.4 27 + 69797946 352 8379
313 117 7016 37.4 34 + 69787770 1 7016
314 117 10457 37.6 30 + 69787562 1 10457
315 117 3937 37.3 34 + 69787559 8103 12039
316 117 22944 37.1 30 + 69787526 1 22944
317 117 3501 35.6 31 + 69787457 17 3517
318 117 10616 35.7 33 + 69780706 3 10618
319 117 4164 35.4 30 + 69780706 7493 11656
320 117 3691 38.2 28 + 69780705 4 3694
321 117 5006 36.2 28 + 69780673 7507 12512
322 117 5922 38.2 27 + 69779402 1305 7226
323 117 3591 35.1 28 + 69778133 73 3663
324 117 5080 34.8 22 + 69777693 3 5082
325 117 7066 36.8 33 + 69776649 346 7411
326 117 3901 37.3 28 + 69546071 1 3901
327 117 4332 36.2 32 + 69970351 2369 6700
328 117 3315 36.3 36 + 20778080 1 3315
329 117 8975 37.4 27 + 20778100 1 8975
330 117 3001 37.4 29 + 20778105 350 3350
331 117 3679 36.4 27 + 20782443 2 3680
332 117 8420 35.3 25 + 20784845 20 8439
333 117 3133 36.6 28 + 20646354 2 3134
334 117 3314 36.9 27 + 20647327 1 3314
335 117 3476 37.2 32 + 20649420 6705 10180
336 117 3916 36.5 30 + 20719652 1 3916
337 117 3649 35.7 30 + 20720952 1441 5089
338 117 4301 37.0 31 + 20721496 2 4302
339 117 3003 34.9 23 + 20725023 494 3496
340 117 3612 36.9 26 + 20733329 1 3612


341 117 3261 35.4 31 + 20735066 19 3279
342 118 3100 35.5 21 + 69828629 1 3100
343 118 5649 35.4 24 + 69828373 2 5650
344 118 3648 34.8 18 + 69825845 2 3649
345 118 3530 35.1 22 + 69825680 5 3534
346 118 3499 35.4 20 + 69825633 52 3550
347 118 3485 35.2 16   69825475 1 3485
348 118 3304 36.3 21 + 69825039 1 3304
349 118 3254 34.8 22 + 69824986 16 3269
350 118 3213 34.8 24 + 69824839 1 3213
351 118 3219 35.9 23 + 69824816 1 3219
352 118 3157 35.7 21 + 69824625 6 3162
353 118 3127 35.1 21 + 69824530 1 3127
354 118 3055 35.4 23 + 69824327 1 3055
355 118 3493 36.1 30 + 69798479 566 4058
356 118 3440 35.3 18 + 69798324 1 3440
357 118 3137 34.6 23 + 69787704 3 3139
358 118 10204 34.9 17   69787591 1 10204
359 118 9324 35.2 26 + 69787530 88 9411
360 118 3584 35.1 26 + 69777497 2 3585
361 174 6924 36.7 33 + 69798830 612 7535
362 174 16751 36.8 32 + 69787494 13 16763
363 175 4146 37.1 33 + 69809144 393 4538
364 175 3765 35.7 29 + 69798478 8062 11826
365 175 3989 35.9 32 + 69798311 4 3992
366 175 4225 36.2 32 + 69798309 4055 8279
367 175 6614 37.8 28 + 69779930 1 6614
368 175 3457 36.8 29 + 20668036 831 4287
369 175 4530 35.1 31 + 20734680 3450 7979
370 176 4321 34.8 28 + 69826849 8 4328
371 176 4942 36.2 27 + 69809530 767 5708
372 176 5345 36.7 30 + 69809447 1 5345
373 176 5454 35.5 35 + 69809142 2 5455
374 176 10974 36.5 30 + 69809014 1717 12690
375 176 3232 36.4 30 + 69798778 16 3247
376 176 3207 36.8 32 + 69798480 692 3898
377 176 7078 37.0 32 + 69778862 17 7094
378 176 4480 35.3 27 + 69777973 70 4549
379 176 4060 35.2 27 + 20641030 28 4087
380 176 3146 38.2 26 + 20646447 2 3147
381 176 3954 36.7 33 + 20727375 4260 8213
382 231 3409 36.2 36 + 69797997 179 3587
383 232 3239 33.5 25 + 69787692 1 3239
384 233 4151 35.2 13   69826637 4 4154
385 233 3853 35.1 13   69826172 4 3856
386 233 3449 35.9 18 + 69825474 3 3451
387 233 3073 35.1 13   69824357 1 3073
388 233 3008 36.0 11   69824084 1 3008
389 233 5022 36.4 30 + 69809020 1 5022
390 233 6244 35.5 33 + 69798831 34 6277
391 233 3077 35.6 9   69798155 1 3077
392 233 3948 34.3 20 + 69787329 21 3968
393 233 5237 36.2 26 + 69787007 948 6184
394 233 3849 35.7 14   69780432 1 3849
395 233 4339 35.9 16   69777996 1 4339
396 233 3515 34.7 10 + 69777424 1 3515
397 234 3397 35.9 9   69830835 1 3397
398 234 3636 34.3 14   69830656 1 3636


399 234 6989 36.1 7   69829968 1 6989
400 234 3151 36.2 10   69829963 1 3151
401 234 7368 35.6 12   69829394 189 7556
402 234 7077 35.4 11   69829247 1 7077
403 234 4370 34.9 17 + 69829154 10 4379
404 234 3328 34.9 11   69829088 5 3332
405 234 6299 34.6 23 + 69828894 39 6337
406 234 5051 35.4 12   69828825 10 5060
407 234 6287 35.5 10   69828751 1 6287
408 234 4154 35.5 9   69828689 1 4154
409 234 5158 35.5 12   69828554 1 5158
410 234 5912 35.4 12   69828519 1 5912
411 234 5845 36.1 10   69828449 1 5845
412 234 5429 35.6 13   69828128 1 5429
413 234 4560 35.6 11   69828120 10 4569
414 234 4851 35.1 15 + 69827723 193 5043
415 234 4985 35.5 14   69827649 1 4985
416 234 4948 35.6 10   69827605 1 4948
417 234 4904 36.6 10   69827568 1 4904
418 234 4892 35.5 10   69827548 1 4892
419 234 3451 34.2 14   69827529 6 3456
420 234 4868 36.4 9   69827518 1 4868
421 234 4781 34.9 14 + 69827432 1 4781
422 234 4636 35.7 11   69827217 3 4638
423 234 4621 35.6 9   69827200 1 4621
424 234 4613 35.8 14 + 69827195 1 4613
425 234 4544 35.8 12   69827148 1 4544
426 234 4562 35.6 9   69827136 1 4562
427 234 4451 35.3 14 + 69827021 1 4451
428 234 4421 36.0 14 + 69826971 1 4421
429 234 4400 36.0 9   69826941 1 4400
430 234 4289 35.4 11   69826806 1 4289
431 234 4285 34.8 11   69826798 1 4285
432 234 4270 35.5 15   69826782 1 4270
433 234 4246 35.3 18 + 69826774 16 4261
434 234 4245 34.7 15   69826752 1 4245
435 234 4240 36.2 9   69826736 1 4240
436 234 4134 34.9 11   69826597 3 4136
437 234 4091 35.6 12   69826533 3 4093
438 234 4041 35.5 11   69826526 49 4089
439 234 4039 35.9 14 + 69826509 18 4056
440 234 4075 35.6 10   69826505 2 4076
441 234 4072 36.1 8   69826498 2 4073
442 234 4049 35.6 15 + 69826461 3 4051
443 234 3347 35.7 9   69826432 1 3347
444 234 3979 35.7 12   69826365 9 3987
445 234 3963 35.3 12   69826346 13 3975
446 234 3974 35.2 12   69826341 1 3974
447 234 3950 34.2 11   69826303 1 3950
448 234 3935 34.7 15 + 69826299 1 3935
449 234 3683 36.4 21 + 69826241 1 3683
450 234 3846 36.3 9   69826229 2 3847
451 234 3859 34.6 14   69826204 19 3877
452 234 3871 35.6 10   69826193 1 3871
453 234 3866 35.0 13   69826191 1 3866
454 234 3852 34.5 15 + 69826167 1 3852
455 234 3845 35.7 12   69826157 1 3845
456 234 3837 35.2 12   69826138 2 3838


457 234 3738 35.1 13   69826104 60 3797
458 234 3802 35.9 7   69826086 9 3810
459 234 3798 35.4 12   69826062 1 3798
460 234 3776 35.2 9   69826016 1 3776
461 234 3770 34.4 14   69826009 1 3770
462 234 3743 35.7 15   69825974 3 3745
463 234 3723 35.6 9   69825956 1 3723
464 234 3614 34.0 16 + 69825951 105 3718
465 234 3698 36.1 10   69825921 3 3700
466 234 3682 36.0 8   69825889 3 3684
467 234 3630 34.8 18 + 69825808 3 3632
468 234 3629 35.6 9   69825790 1 3629
469 234 3619 35.9 12   69825786 8 3626
470 234 3613 35.3 16 + 69825774 1 3613
471 234 3599 36.0 12   69825728 1 3599
472 234 3579 35.3 10   69825689 2 3580
473 234 3541 35.8 10   69825674 32 3572
474 234 3567 35.5 12   69825673 4 3570
475 234 3539 35.7 14   69825642 8 3546
476 234 3512 35.4 13   69825626 1 3512
477 234 3488 35.5 16 + 69825596 1 3488
478 234 3521 35.3 9   69825563 1 3521
479 234 3492 35.4 9   69825495 6 3497
480 234 3466 35.6 9   69825445 5 3470
481 234 3430 35.6 17 + 69825437 34 3463
482 234 3459 35.3 11   69825426 1 3459
483 234 3454 34.8 10   69825419 1 3454
484 234 3449 35.2 10   69825402 1 3449
485 234 3438 35.7 10   69825378 1 3438
486 234 3433 36.2 9   69825372 3 3435
487 234 3428 35.4 11   69825351 1 3428
488 234 3343 35.7 11   69825341 1 3343
489 234 3403 34.7 18 + 69825296 4 3406
490 234 3390 35.1 14   69825271 1 3390
491 234 3384 36.6 4   69825252 1 3384
492 234 3364 36.4 6   69825204 1 3364
493 234 3316 36.2 8   69825079 1 3316
494 234 3307 34.9 11   69825042 1 3307
495 234 3289 35.2 9   69825009 1 3289
496 234 3283 36.0 7   69824981 1 3283
497 234 3269 35.6 9   69824980 1 3269
498 234 3281 36.4 9   69824977 1 3281
499 234 3275 35.5 9   69824967 1 3275
500 234 3273 35.5 9   69824960 1 3273
501 234 3260 35.4 14 + 69824934 1 3260
502 234 3257 35.7 10   69824926 2 3258
503 234 3213 36.0 7   69824884 30 3242
504 234 3231 36.0 8   69824874 10 3240
505 234 3231 35.5 12   69824873 4 3234
506 234 3235 35.4 10   69824848 1 3235
507 234 3229 35.4 9   69824835 1 3229
508 234 3194 35.3 10   69824741 2 3195
509 234 3179 35.6 11   69824681 1 3179
510 234 3171 35.0 11   69824652 1 3171
511 234 3163 35.6 13   69824634 1 3163
512 234 3153 36.5 11   69824604 1 3153
513 234 3148 35.8 10   69824603 1 3148
514 234 3149 36.1 7   69824594 1 3149


515 234 3107 36.5 5   69824576 36 3142
516 234 3137 34.9 14   69824562 1 3137
517 234 3077 35.5 13   69824532 1 3077
518 234 3119 36.0 16 + 69824523 1 3119
519 234 3123 36.3 12   69824517 1 3123
520 234 3099 35.1 14   69824453 7 3105
521 234 3102 33.9 13   69824450 3 3104
522 234 3077 35.5 10   69824369 1 3077
523 234 3077 35.6 10   69824368 1 3077
524 234 3076 35.9 7   69824365 1 3076
525 234 3062 35.2 15 + 69824358 2 3063
526 234 3065 35.5 9   69824324 1 3065
527 234 3058 36.4 5   69824295 1 3058
528 234 3054 35.6 8   69824274 1 3054
529 234 3033 35.9 12   69824272 1 3033
530 234 3031 35.3 14 + 69824206 1 3031
531 234 3025 35.4 10   69824165 1 3025
532 234 3008 35.6 8   69824089 2 3009
533 234 3009 34.8 15 + 69824088 1 3009
534 234 3118 35.3 14   69809454 1 3118
535 234 4013 36.3 29 + 69809446 3 4015
536 234 5728 35.8 31 + 69798780 1 5728
537 234 3637 35.5 16 + 69798326 31 3667
538 234 5200 34.9 18 + 69798042 48 5247
539 234 3423 34.8 16 + 69798017 50 3472
540 234 6070 36.6 26 + 69797942 3 6072
541 234 7384 35.0 17 + 69797932 1 7384
542 234 3501 36.3 12   69797917 1 3501
543 234 5103 35.5 14 + 69797915 2 5104
544 234 4046 35.2 19 + 69787706 1 4046
545 234 4155 35.0 12   69787576 1 4155
546 234 6619 35.0 18 + 69787542 1 6619
547 234 3570 35.5 11   69787538 22 3591
548 234 5341 35.2 22 + 69787535 1 5341
549 234 3209 35.7 20 + 69787534 1 3209
550 234 8729 34.9 15 + 69787439 1 8729
551 234 4083 34.5 13   69787438 1 4083
552 234 7041 35.4 11   69787435 1 7041
553 234 4503 35.5 11   69787434 1 4503
554 234 11286 35.1 18 + 69787413 6631 17916
555 234 3385 36.4 9   69787373 1 3385
556 234 3556 35.7 11   69787371 1 3556
557 234 4579 35.7 13   69787180 92 4670
558 234 12819 36.1 11   69787179 1 12819
559 234 3862 35.0 12   69787069 1 3862
560 234 11702 34.6 18 + 69780671 22 11723
561 234 3099 34.5 18 + 69780488 1 3099
562 234 3239 34.9 12   69780288 447 3685
563 234 3433 35.0 18 + 69780252 1 3433
564 234 3790 35.6 10   69780160 1 3790
565 234 3048 35.1 11   69780047 1 3048
566 234 3085 36.1 18 + 69779902 1 3085
567 234 3438 35.9 15   69779900 2 3439
568 234 3536 35.0 15   69779405 170 3705
569 234 7952 37.4 28 + 69779404 1 7952
570 234 6279 36.2 9   69779250 1 6279
571 234 4356 35.4 11 + 69779249 1 4356
572 234 3672 35.7 9   69779059 1 3672


573 234 16160 35.9 15 + 69778922 24 16183
574 234 4675 34.3 19 + 69778762 12 4686
575 234 5640 35.4 11   69778737 1 5640
576 234 4455 35.5 14   69778733 1 4455
577 234 3530 35.2 17 + 69778728 1 3530
578 234 3006 35.3 14   69778691 2 3007
579 234 5460 35.0 18 + 69778690 1 5460
580 234 5149 34.9 11   69778670 1 5149
581 234 4043 35.2 19 + 69778577 3 4045
582 234 7072 35.2 15   69778568 1 7072
583 234 3510 34.4 13   69778487 33 3542
584 234 4471 36.4 17 + 69778461 3 4473
585 234 3597 35.5 12   69778415 1 3597
586 234 3154 35.8 8   69778414 1 3154
587 234 4038 35.8 12   69778298 1 4038
588 234 3329 35.5 12   69778257 1 3329
589 234 3223 35.4 15 + 69778214 1 3223
590 234 4328 35.1 11   69778164 12 4339
591 234 4684 35.6 10   69778159 20 4703
592 234 5844 35.8 11   69778066 1 5844
593 234 4465 35.8 14 + 69778055 14 4478
594 234 4691 37.0 24 + 69778035 1 4691
595 234 4994 35.7 13   69778013 33 5026
596 234 3604 34.8 21 + 69777975 2 3605
597 234 3777 35.2 10   69777885 1 3777
598 234 3255 35.5 11   69777870 1 3255
599 234 7600 35.0 19 + 69777694 213 7812
600 234 4313 35.7 14 + 69777606 153 4465
601 234 3556 35.1 15 + 69777470 48 3603
602 234 3115 35.2 8   69777324 1 3115
603 234 3484 35.8 8   69777317 1 3484
604 234 4160 35.4 13   69777034 2 4161
605 234 4128 35.9 12   69777028 33 4160
606 234 3098 34.6 19 + 69776480 1 3098
607 234 3536 36.4 9   69775833 1 3536
608 234 3038 35.4 26 + 69591580 1 3038
609 234 3536 34.8 14 + 69970353 1 3536
610 234 3091 37.3 32 + 20778105 36 3126
611 234 3168 35.7 9   20778132 1 3168
612 234 9558 35.1 19 + 20787080 5 9562
613 234 5125 35.7 10   20787119 3 5127
614 234 3409 38.0 30 + 20668056 35 3443
615 234 4109 36.7 26 + 20716693 10 4118
616 234 3935 34.4 11   20731658 16 3950
617 234 3312 34.8 16 + 20741412 16 3327
618 235 4036 36.3 9   69826439 1 4036
619 235 3380 34.8 11   69825325 1 3380
620 235 3201 35.4 14   69825192 158 3358
621 292 6812 36.2 33 + 69809016 84 6895
622 293 3197 35.2 29 + 69824737 1 3197
623 293 5484 36.0 30 + 69787453 1 5484
624 340 4116 35.4 19 + 69798265 1 4116
625 347 3863 36.3 31 + 69787470 1 3863
626 348 3619 33.2 19 + 69787693 1 3619
627 349 3462 35.6 18 + 69825437 1 3462
628 349 3689 36.2 19 + 69778459 9 3697
629 350 3709 34.7 17 + 69825937 1 3709
630 350 7469 34.9 21 + 69779406 1 7469


631 350 4617 35.4 17 + 69778669 1 4617
632 350 6151 34.6 19 + 69778640 4 6154
633 350 5020 35.7 20 + 69778013 1 5020
634 350 3006 35.2 15 + 20724572 1 3006
635 351 3212 35.9 18 + 69828056 2145 5356
636 351 3235 34.5 19 + 69825069 66 3300
637 351 7037 34.9 16 + 69809434 1 7037
638 351 5030 33.9 24 + 69798356 8 5037
639 351 3536 34.8 23 + 69787333 1 3536
640 351 4043 34.7 19 + 69778399 1 4043
641 351 3355 34.5 14   20646358 1 3355
642 352 3651 34.8 20 + 69825845 5 3655
643 352 3356 34.4 13   69825186 1 3356
644 406 4778 35.0 25 + 69809541 774 5551
645 416 4567 35.0 24 + 69809541 1075 5641
646 466 3506 35.0 13   69825607 31 3536
647 466 3434 36.2 10   69825365 1 3434
648 466 3173 36.3 13   20673162 1 3173
649 467 9869 35.4 17 + 69830098 20 9888
650 467 5725 35.6 9   69828626 351 6075
651 467 5183 35.1 9   69827882 2 5184
652 467 4378 35.5 12   69826947 28 4405
653 467 3816 35.2 9   69826095 1 3816
654 467 3459 35.5 14   69825542 3 3461
655 467 3265 36.3 12   69824944 1 3265
656 467 3024 34.8 13   69824305 37 3060
657 467 3335 34.8 14   69797936 15 3349
658 467 5369 35.9 12   69787382 8 5376
659 467 3272 35.2 9   69778215 3 3274
660 467 3068 36.1 9   69777345 11 3078
661 467 3248 34.5 14 + 69777259 11 3258
662 467 3137 34.8 16 + 20736854 1 3137
663 467 11354 35.8 12   20746950 1 11354
664 468 3258 35.8 13   69824963 1 3258
665 468 3556 34.8 13   69798151 38 3593
666 468 4156 35.8 12   20643713 2 4157
667 469 3023 34.3 21 + 69778761 1 3023
668 469 3096 35.8 10   69778084 535 3630
669 471 3297 36.2 30 + 69798480 10622 13918
670 522 4387 36.7 28 + 69797955 186 4572
671 522 6561 36.4 31 + 69787468 454 7014
672 583 3934 34.3 17 + 69787329 47 3980
673 583 3977 35.4 18 + 69777573 1 3977
674 584 4040 34.2 18 + 69826447 1 4040
675 585 3130 35.0 14 + 69824551 3 3132
676 586 3448 35.2 17 + 69825399 1 3448
677 641 3736 35.8 22 + 69826026 11 3746
678 701 3731 35.5 21 + 69825999 1 3731
679 701 3746 35.8 19 + 69825989 7 3752
680 701 3606 35.4 14 + 69825841 1 3606
681 701 3404 35.8 16 + 69825679 163 3566
682 701 3314 35.8 13   69825072 1 3314
683 701 4077 35.8 9   69787383 1 4077
684 703 4558 34.9 16 + 69798358 6 4563
685 703 3328 36.1 13   69777607 1 3328
686 703 5037 35.3 10   20745496 1 5037
687 819 4049 35.6 19 + 69826533 46 4094
688 821 3564 34.8 7   69825657 1 3564


689 821 3257 35.0 8   69825052 1 3257
690 927 4759 34.6 20 + 69798268 1 4759
691 932 3225 34.6 21 + 69824825 1 3225
692 933 3012 35.0 10   69824127 1 3012
693 933 4014 35.2 16 + 69780289 1 4014
694 935 3448 35.4 9   69825400 1 3448
695 935 4985 35.8 9   69780372 1 4985
696 935 3803 35.9 9   69777344 1 3803
697 981 4918 34.2 25 + 69778183 255 5172
698 1052 3565 35.4 13   69825673 1 3565
699 1055 3824 35.0 16 + 69787541 1 3824
700 1056 3764 35.6 13   69826625 2 3765
701 1168 3267 35.7 20 + 69780673 1 3267
702 1227 4530 34.9 18 + 20739849 103 4632
703 1277 3214 35.9 13   69828056 2146 5359
704 1284 3576 35.9 13   69825693 7 3582
705 1288 3406 34.9 13   69798017 1 3406
706 1343 4203 36.2 12   69828297 41 4243
707 1395 8369 36.2 21 + 69797957 5728 14096
708 1396 3844 35.3 8   69829461 146 3989
709 1403 3815 36.1 9   69826105 6 3820
710 1443 5271 37.3 19 + 69809461 1 5271
711 1518 4626 34.7 13   69827227 17 4642
712 1521 4205 35.8 12   69826694 2 4206
713 1634 8320 36.4 10   69779065 1 8320
714 1737 3551 35.8 11   69798832 1 3551
715 1756 3887 35.2 8   69826226 7 3893
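For readers who want to work with the rows of Additional file 2 programmatically, the following is a minimal sketch of a row parser. It assumes whitespace-separated rows in the column order given in the table header, with the HOR column holding either "+" or nothing; the names `ArrayRow`, `parse_row`, and `span_matches_length` are illustrative and do not come from the article. The consistency check (Length = End pos - Start pos + 1) is an observation that appears to hold for the rows listed here, not a rule stated by the authors.

```python
from typing import NamedTuple


class ArrayRow(NamedTuple):
    n: int        # row number (N)
    unit: int     # monomer length, bp (Unit)
    length: int   # array length, bp (Length)
    gc: float     # GC%
    var: int      # variability between monomers (Var%)
    hor: bool     # True when the HOR column holds "+"
    gi: str       # GenBank GI, kept as text
    start: int    # Start pos
    end: int      # End pos


def parse_row(line: str) -> ArrayRow:
    """Parse one table row; the HOR column may be '+' or empty."""
    fields = line.split()
    hor = "+" in fields
    if hor:
        fields.remove("+")
    if len(fields) != 8:
        raise ValueError(f"unexpected number of columns: {line!r}")
    n, unit, length, gc, var, gi, start, end = fields
    return ArrayRow(int(n), int(unit), int(length), float(gc), int(var),
                    hor, gi, int(start), int(end))


def span_matches_length(row: ArrayRow) -> bool:
    """Check that the reported length equals the Start..End span."""
    return row.end - row.start + 1 == row.length


with_hor = parse_row("1 30 3234 36.5 46 + 69778673 1634 4867")
without_hor = parse_row("261 116 3660 35.2 17 69825846 1 3660")
```

A check of this kind is also a convenient way to catch digits that were split or dropped when the table was extracted from the PDF.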


Additional file 3. Figures S1 and S2

Figure S1. The dot plot similarity analysis of MaSat and TRPC-21A-MM. The black-and-white dot plots were generated with a window size of 51 bp and a minimum identity of 90% for MaSat and 80% for TRPC-21A-MM. A: dot plot of MaSat array N707 (Additional file 2), the same array as in Figure 4; B: MaSat array N4 (Additional file 2); C: TRPC-21A-MM array N50 (Additional file 1, Table S3); D: TRPC-21A-MM array N8 (Additional file 1, Table S3). Different window sizes reveal different features of the HOR structure of tandemly repeated DNA. Similarity visualization with a 13 bp window and full grey scale distinguishes large HORs of about 2 kb (Figures 4A, 5A) as well as small units of less than 20 bp at high magnification (Figures 4C, 5C). Similarity visualization with a 51 bp window and two colors shows the overall differences between MaSat arrays (Additional file 2 and Figure 3). Very large HORs become visible in this case, and the HOR of about 1 kb is more obvious for both TR.
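The two parameters named in the legend, window size and minimum identity, can be illustrated with a minimal two-color dot plot sketch. This is not the software used by the authors, which is not named in this file; the function names and the toy sequence are assumptions made for illustration. The sketch compares forward strands only and is quadratic in sequence length, so it suits short examples rather than whole arrays.

```python
def window_identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length windows."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def dot_plot(seq_a: str, seq_b: str, window: int, min_identity: float) -> set:
    """Return the (i, j) window start positions that would be drawn as dots.

    A dot is placed when the windows starting at i in seq_a and at j in
    seq_b reach at least min_identity (for example 0.9 for 90%).
    """
    dots = set()
    for i in range(len(seq_a) - window + 1):
        win_a = seq_a[i:i + window]
        for j in range(len(seq_b) - window + 1):
            if window_identity(win_a, seq_b[j:j + window]) >= min_identity:
                dots.add((i, j))
    return dots


# Toy tandem array: three copies of a hypothetical 4 bp monomer.
array = "ACGT" * 3
dots = dot_plot(array, array, window=4, min_identity=1.0)
```

In a self-comparison of a tandem array, the dots form diagonals spaced one monomer apart, which is the pattern that makes repeat periodicity and HORs visible in such plots. Lowering `min_identity` or changing `window` changes which diagonals survive.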


Figure S2. Design of double-stranded FISH probes. Array-specific synthetic oligonucleotides were designed for FISH. Four TRPC-21A-MM repeated monomers from the most specific region of the array were chosen (orange) and flanked by two adapters (blue, green). Two primers (arrows) were used to amplify and label the probe. The monomer variability is shown by asterisks.
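The probe layout described in the legend (adapter, four monomers, adapter) can be sketched as simple string assembly. All sequences below are hypothetical placeholders, since the legend gives neither the monomer nor the adapter sequences. The sketch also assumes that the two primers correspond to the adapters, which the legend suggests with its arrows but does not state.

```python
# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def build_probe_template(monomers, left_adapter: str, right_adapter: str) -> str:
    """Top strand of the template: left adapter + monomers + right adapter."""
    return left_adapter + "".join(monomers) + right_adapter


def primer_pair(left_adapter: str, right_adapter: str):
    """Assumed primers: forward equals the left adapter, reverse is the
    reverse complement of the right adapter."""
    return left_adapter, reverse_complement(right_adapter)


# Hypothetical placeholder sequences, not taken from the article.
monomers = ["ACGTT"] * 4
left, right = "AAGGCC", "TTGACA"

template = build_probe_template(monomers, left, right)
forward, reverse = primer_pair(left, right)
```

With this layout the forward primer matches the start of the top strand and the reverse primer matches the start of the bottom strand, so amplification yields the full double-stranded template.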