This item is only available as the following downloads:
A customized Web portal for the genome of the ctenophore Mnemiopsis leidyi ( PDF )
Supplement 5 ( PDF ) Supplement 6 ( PDF ) Supplement 7 ( PDF )
[Figure: BLAST search form. Input: query FASTA sequence or uploaded sequence file. Programs: BLASTN, TBLASTN, TBLASTX, BLASTP, BLASTX. Nucleotide databases: Main Scaffolds, Gene Models (2.2), Unfiltered Gene Models (Unincorporated Predictions), Public ESTs, Mitochondrial Genome, Cufflinks-Assembled Transcripts, Trinity-Assembled Transcripts. Protein databases: Protein Models (2.2), Unfiltered Protein Models (Unincorporated Proteins), Mitochondrial Proteins. Output alignment formats: pairwise, query-anchored with or without identities, flat query-anchored with or without identities, XML, tabular, tabular with comment lines.]

DATABASE Open Access

A customized Web portal for the genome of the ctenophore Mnemiopsis leidyi

R Travis Moreland1, Anh-Dao Nguyen1, Joseph F Ryan1,2, Christine E Schnitzler1, Bernard J Koch1, Katherine Siewert1, Tyra G Wolfsberg1 and Andreas D Baxevanis1*

Abstract

Background: Mnemiopsis leidyi is a ctenophore native to the coastal waters of the western Atlantic Ocean. A number of studies on Mnemiopsis have led to a better understanding of many key biological processes, and these studies have contributed to the emergence of Mnemiopsis as an important model for evolutionary and developmental studies. Recently, we sequenced, assembled, annotated, and performed a preliminary analysis on the 150-megabase genome of the ctenophore, Mnemiopsis. This sequencing effort has produced the first set of whole-genome sequencing data on any ctenophore species and is amongst the first wave of projects to sequence an animal genome de novo solely using next-generation sequencing technologies.
Description: The Mnemiopsis Genome Project Portal (http://research.nhgri.nih.gov/mnemiopsis/) is intended both as a resource for obtaining genomic information on Mnemiopsis through an intuitive and easy-to-use interface and as a model for developing customized Web portals that enable access to genomic data. The scope of data available through this Portal goes well beyond the sequence data available through GenBank, providing key biological information not available elsewhere, such as pathway and protein domain analyses; it also features a customized genome browser for data visualization.

Conclusions: We expect that the availability of these data will allow investigators to advance their own research projects aimed at understanding phylogenetic diversity and the evolution of proteins that play a fundamental role in metazoan development. The overall approach taken in the development of this Web site can serve as a viable model for disseminating data from whole-genome sequencing projects, framed in a way that best serves the specific needs of the scientific community.

Keywords: Mnemiopsis leidyi, Genome browser, Customized Web portal, Gene wiki

Background

Ctenophores are an important group of early-branching metazoans that are essential for understanding the evolution of multicellular animals, the relationship between genomic complexity and morphological complexity, and the molecular basis for the evolution of novel cell types such as epithelia, neurons, muscle, and stem cells. One ctenophore species that has received particular attention is Mnemiopsis leidyi, which is native to the coastal waters of the Atlantic Ocean. Studies in Mnemiopsis have advanced our understanding of a number of important biological processes such as regeneration, axial patterning, and bioluminescence [1-3]. As such, Mnemiopsis has emerged as an important model organism for understanding the immense diversity and complexity seen in the early evolution of animals.
Despite the importance of Mnemiopsis as an emerging model organism, there were no high-quality genome-scale sequence data available for any ctenophore species until recently. To address this dearth of genome-scale sequence data, we recently completed the sequencing, assembly, annotation, and preliminary analysis of the 150-megabase genome of Mnemiopsis leidyi [4]; these data will serve as an invaluable resource for the growing community of developmental, evolutionary, and marine biologists studying important questions regarding early branching metazoan biology. Initial studies utilizing these sequence data have contributed to our understanding of the evolution of gene families [5-7], signaling pathways [8,9], protein domains [10], miRNAs [11], and genes involved in the production and detection of light [12]. The availability of these data also has provided a solid foundation for studies aimed at resolving the question of the phylogenetic position of the ctenophores [4].

*Correspondence: andy@mail.nih.gov
1 Genome Technology Branch, Division of Intramural Research, National Human Genome Research Institute, National Institutes of Health, 50 South Drive, Bethesda, MD 20892, USA
Full list of author information is available at the end of the article

© 2014 Moreland et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Moreland et al. BMC Genomics 2014, 15:316
http://www.biomedcentral.com/1471-2164/15/316
In recent years, databases have been created to house whole-genome sequencing data from several emerging model organisms. Genomic data and annotation are typically made accessible via public genome portals at sequencing centers such as the US Department of Energy's Joint Genome Institute (JGI) [13] and the Broad Institute [14], while other groups have developed Web-based genomic database resources that provide additional analysis tools [15] and browsing options [16] to increase the utility of the data. Still others have implemented genomic resources that offer the scientific community access to genomic annotation and actively seek user contributions [17]. Ideally, for each organism with a sequenced genome, there would be a single centralized resource where most (if not all) data retrieval and analysis could take place; this kind of resource would include, at a minimum, the ability to search, browse, and download sequence and annotation data, visualize genomic data via a Web-based browser tool, and encourage the active engagement of the scientific community in maintaining a wiki-style resource for capturing supplementary annotations of gene models and predicted proteins. Moreover, this kind of resource would (and should) be developed and maintained by researchers in the model organism community who intimately understand the needs of their scientific colleagues, presenting the data in an intuitive, user-friendly and concise manner despite its sheer volume and complexity.

In our own experience advising groups who have undertaken whole-genome sequencing projects, we have found that many of these groups do not have ready access to the kind of programming resources needed to implement some of the more advanced database solutions currently available. With that in mind, and to facilitate the creation of the kind of centralized genomic data resource described above, we set out to develop a generalized framework that strikes a reasonable balance between ease of implementation and documented structure, without the additional constraints posed by some publicly available database schemas.
Here, we describe the development and features of a comprehensive Web-based data portal for navigating the recently completed genome sequence of Mnemiopsis leidyi (http://research.nhgri.nih.gov/mnemiopsis/). The Mnemiopsis Genome Project Portal (or "MGP Portal") is a biologist-centric resource designed with a particular emphasis on usability, intuitive navigation, and clarity. Some key features of the MGP Portal include the ability to retrieve selected nucleotide and protein sequences, the availability of whole-genome datasets for download, an integrated BLAST utility for sequence comparisons, a genome browser tool, a gene-centric wiki, and phylogenetically informed gene ortholog clusters mapped to human KEGG pathways. Furthermore, the scope of data accessible through this Web site goes well beyond the sequence data available at GenBank, providing other key biological information such as Gene Ontology term assignments and data from pathway and protein domain analyses. In addition, we offer a set of Perl modules that can be utilized by other scientists as a generalized framework for implementing a gene page in MediaWiki, as well as a customizable genome browser for visualizing large-scale genomic data using JBrowse.

Construction and content

Displaying customized genome annotation

A major goal during the development of the MGP Portal was to create a reproducible workflow for the creation of well-annotated and visually accessible genome repositories (Figure 1). To that end, we have adopted the JavaScript-based JBrowse [18] as the engine for our Mnemiopsis Genome Browser, resulting in a clean and responsive user interface for viewing the genome assembly, gene models, and all supporting data. We have developed a number of tools in the Perl scripting language to convert sequence and annotation data into a format accepted by JBrowse (version 1.11.1), which are included as Additional files 1, 2 and 3. In addition, we provide a Perl script that facilitates the creation of MediaWiki pages that display nucleotide and protein sequences, exonic genomic coordinates, and PFAM domains (Additional file 4). The actual creation of genome assemblies and annotation files is outside the scope of this manuscript but is outlined in detail elsewhere [4].
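To make this kind of conversion concrete, the following is a minimal Python 3 sketch of turning scaffold sequences in FASTA format into GFF3 records, with one extra feature for each run of gap characters. It is not the Perl code distributed as Additional file 1; the function names, the feature types ("region" and "gap"), the default source label, and the demo sequence are all assumptions made for illustration.

```python
import io
import re


def read_fasta(handle):
    """Yield (identifier, sequence) pairs from FASTA-formatted text."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the identifier.
            name, chunks = line[1:].split()[0], []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def scaffold_to_gff3(name, sequence, source="example", gap_letter="N"):
    """Return GFF3 text for one scaffold plus one feature per gap run."""
    def row(kind, start, end, feature_id):
        # GFF3 columns: seqid, source, type, start, end,
        # score, strand, phase, attributes.
        return "\t".join([name, source, kind, str(start), str(end),
                          ".", ".", ".", "ID=" + feature_id])

    # Feature types "region" and "gap" are illustrative choices.
    lines = ["##gff-version 3", row("region", 1, len(sequence), name)]
    gap_runs = re.finditer(re.escape(gap_letter) + "+", sequence, re.IGNORECASE)
    for number, match in enumerate(gap_runs, start=1):
        # GFF3 coordinates are 1-based and inclusive.
        lines.append(row("gap", match.start() + 1, match.end(),
                         "{}_gap{}".format(name, number)))
    return "\n".join(lines) + "\n"


# Hypothetical two-line scaffold with a four-base gap.
demo = io.StringIO(">ML0001 demo scaffold\nACGTNN\nNNACGT\n")
for scaffold_name, scaffold_seq in read_fasta(demo):
    print(scaffold_to_gff3(scaffold_name, scaffold_seq), end="")
```

Keeping the gap character and the source label as parameters means that assemblies using a different gap symbol, or features produced by a different program, need no code changes.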
To build feature tracks, JBrowse requires properly formatted input files. While newer versions of JBrowse released subsequent to the creation of the Mnemiopsis Genome Browser are able to accept a slightly expanded number of inputs (including specific relational database dumps), the generic feature format version 3 (GFF3) flat file was the input type we adopted, a format that continues to be supported by JBrowse at the time of this writing (version 1.11.1). Adoption of the GFF3 format greatly facilitated the processing of multiple output file types produced by a number of data analysis programs.

To begin, the scaffoldToGFF3.pl script (Additional file 1) can be used to reformat scaffold sequences from FASTA to GFF3. Two parameters (-i and -o) are required, specifying the input scaffold file and desired output directory, respectively. Two optional parameters (-l and -n) are also available. The first should be called if the gap-indication letter in the input is different from "N", accepting a single character (e.g., X) as a substitute, while the second parameter can be called in order to specify which feature-generating program was used. The script creates a GFF3 file for each sequence in the input scaffold FASTA file and is able to handle both gapped [e.g., the scaffold (SCF) track] and repeat-masked [e.g., the repeat-masked (MASK) track] regions in a scaffold.

Next, evmToGFF3.pl (Additional file 2) parses a GFF3-formatted output file created by EvidenceModeler (EVM) [19] by collecting data about the start and end positions of predicted genes, using this information to create well-formed GFF3 files. The script accepts several additional parameters; the input EVM file (-i) and desired output directory (-o) must be set, while the third and optional parameter (-n) is, again, the name of the feature-generating program (e.g., EVM). The third module is called cufflinksToGFF3.pl (Additional file 3) and is used to parse predicted transcript assemblies from the GTF-formatted file created by Cufflinks [20]. The cufflinksToGFF3.pl script has the same behavior for transcript location as evmToGFF3.pl has for predicted gene location, and accepts the same three parameters (-i, -o, and -n).

To import GFF3 data into JBrowse for display as custom tracks in the main genome window, a series of three JBrowse-supplied Perl scripts (prepare-refseqs.pl, biodb-to-json.pl, and generate-names.pl) need to be executed using appropriate parameters and a system-specific configuration file, the details of which can be found in the tutorials on the JBrowse Web site (www.jbrowse.org). A number of custom CGI scripts were written to create the hyperlinks connecting features in JBrowse to the various sources of gene data.

A Perl script named create_wiki_page.pl has been used to create MediaWiki pages for displaying genomic data (Additional file 4). Here, we provide a sample wiki page that takes FASTA-formatted nucleotide and protein sequences, GFF3 files containing exonic coordinates, and an hmmscan output file containing information on Pfam-A domains as input. The create_wiki_page.pl script requires five parameters. The first three parameters specify a set of input files, including a nucleotide FASTA file (-n), a protein FASTA file (-a), and a Pfam-A file (-p). The remaining parameters specify a directory containing the input GFF3 files containing the exonic coordinates (-d) and an output directory (-o). In addition, we show a PHP command line that imports a wiki page into MediaWiki:

    php /$WIKIHOME/maintenance/importTextFile.php --user=USER wikipage.out

The wikipage.out file is created by the create_wiki_page.pl script.

Researchers interested in creating a customized genome browser or gene wiki for visualizing genome-related data are encouraged to utilize the aforementioned scripts (Additional files 1, 2, 3 and 4), as their implementation satisfies the fundamental requirements of both JBrowse and MediaWiki. Furthermore, these modules may serve as a useful framework for both the development of gene wikis and more advanced genome browser tracks as new JBrowse applications are created to visualize genomic data.

Figure 1 A visual representation of the Mnemiopsis Genome Project (MGP) Portal depicting the flow of Mnemiopsis sequence data into accessible internal data visualization and annotation tools. Colored arrows correspond to individual data types listed in the Sequence Data box (center). For example, the flow of Assembled Transcripts data (e.g., Cufflinks- and Trinity-assembled RNA-seq transcripts) is represented by the orange arrows; these data can be viewed both as tracks in the Genome Browser and/or utilized as a BLAST database.

User interface and genome browser implementation

Mnemiopsis sequences are stored as individual text files, and several Perl scripts were written to retrieve these as single combined files. Multiple sequences can be downloaded via an HTML interface that was developed using a series of CGI/Perl scripts. The Mnemiopsis BLAST tool was implemented using ViroBLAST [21] and runs on an Apache Web server (Additional file 5: Figure S1).
The ViroBLAST tool was written in PHP, a server-side scripting language [22], and Perl [23], applying the stand-alone blastall program downloaded from NCBI [24]. The Mnemiopsis BLAST databases were created using formatdb. PHP was used to parse the BLAST output created by ViroBLAST to generate the formatted BLAST results, including the internal links to the Genome Browser, the Gene Wiki pages and the Fetch Scaffold Tool. MediaWiki (version 1.19.11), written in PHP and Perl, was used for the Gene Wiki implementation. Perl was also used to create data for displaying on the Gene Wiki pages. The KEGG pathways pages were developed using JavaScript to display KEGG identifier, pathway, and gene symbol search lists. CGI/Perl scripts were used for KEGG pathway search functions and data downloading utilities (Additional file 6: Figure S2). Python (version 2.6) [25] scripts were used to search the Pfam domains, identified using hmmscan [26] from the HMMER suite, and CGI and JavaScript were used to display the Pfam domain search results (Additional file 7: Figure S3).

Utility and discussion

Mnemiopsis BLAST tool

One feature of the Mnemiopsis Genome Project Portal is a customized stand-alone Web-based BLAST interface for performing nucleotide and amino acid sequence similarity searches (Additional file 5: Figure S1). ViroBLAST was used to implement our Mnemiopsis BLAST tool, producing an organized, manageable output that is easy to parse and navigate. Users may input their FASTA-formatted query sequences directly into the search box or upload sequence files from their computer. The customary set of BLAST programs is available, including BLASTN, BLASTP, BLASTX, TBLASTN, and TBLASTX. Nucleotide sequence databases include the Mnemiopsis genomic scaffolds (Main Scaffolds), consensus gene prediction models (Gene Models 2.2) and Unfiltered Gene Models (unincorporated predictions) described in Ryan et al. [4], all publicly available Mnemiopsis ESTs and mRNAs from GenBank (Public ESTs), the Mnemiopsis mitochondrial genome [27], Cufflinks-assembled RNA-seq transcripts, and Trinity-assembled RNA-seq transcripts [28]. The protein sequence databases available through the MGP Portal include the translated proteins derived from the Mnemiopsis consensus gene prediction models (Protein Models 2.2), the unincorporated Mnemiopsis proteins derived from unincorporated gene prediction models (Unfiltered Protein Models), and the computationally derived Mnemiopsis mitochondrial proteins. Additionally, a user may create a customized user-defined BLAST database by uploading a file containing FASTA-formatted nucleotide or protein sequences of interest. BLAST output results feature customized color-coded boxes linked directly to relevant internal annotation resources, including the Mnemiopsis Genome Browser [B], the wiki-based Mnemiopsis Gene Pages [G], the Scaffold Fetch Tool [S], Unfiltered Gene Models [U], Cufflinks-assembled transcripts [C], and Trinity-assembled transcripts [T] (Figure 2).

Figure 2 The Mnemiopsis BLAST implementation provides users an intuitive Web interface for performing sequence similarity searches. Shown are tab-delimited BLASTP results from a single human PAX3 protein, providing links to relevant sequence entries in the Mnemiopsis Genome Browser (purple B box), individual Gene Wiki pages (green G box), and Unfiltered Protein Models (orange U box).

Browsing the Mnemiopsis genome

One of our primary objectives in developing the MGP Portal is to provide a graphical tool for the scientific community to visualize the various types of Mnemiopsis genome data currently available. Using the built-in JBrowse genome browser, users can view a variety of data tracks, such as the Mnemiopsis genome assembly, gene prediction models, and RNA-seq data (Figure 3). Several customized JBrowse tracks are available for viewing, including consensus gene prediction models (labeled 2.2 in the browser), Mnemiopsis RNA-seq reads assembled into transcripts using Cufflinks (CL2) and Trinity (TRN 15-30 hpf), publicly available Mnemiopsis ESTs from GenBank (EST), publicly available Mnemiopsis mRNAs from GenBank (GBNT), assembled genomic scaffolds (SCF), genomic regions that have been repeat-masked using VMatch (MASK) [29], experimentally verified Mnemiopsis RACE PCR transcripts (RACE), unincorporated gene prediction models (2.2UF), and non-redundant protein domains derived from Pfam hmmscan runs using the 2.2 and 2.2UF datasets and the six-frame translations of the Mnemiopsis genome (PFAM2.2).

The JBrowse display is organized by genomic scaffold. Scaffolds are named using a six-character convention (e.g., MLnnnn); the ML designates the species (Mnemiopsis leidyi), and the individual scaffolds are numbered from 0001 to 5100 (e.g., ML0001). Gene identifiers (e.g., ML000129a) start with a prefix indicating the scaffold on which the gene is located (in this example, ML0001), followed by a non-padded integer that is unique in combination with the scaffold identifier (in this case, 29) and ending with a lower-case letter that specifies the gene isoform (in this case, a). Data displayed in the browser can be searched using a variety of Mnemiopsis identifiers. A scaffold-based query takes the user directly to that scaffold, while a gene-based search goes to that gene's location on the appropriate scaffold. A user may also search the genome browser using a Mnemiopsis GenBank mRNA identifier, an EST identifier, or a Pfam-A domain name (e.g., AF293700.1, FC475136, or Glycohydro20, respectively). The PFAM2.2 track was created by running hmmscan against the Protein Models 2.2, the Unfiltered Protein Models, and the six-frame translations of the Mnemiopsis genome. Scaffold coordinates are displayed across the top of the browser window. Navigation options can be found beneath the coordinate bar, including the zoom tool and left-right arrows. Users can also refine the displayed region by entering the scaffold coordinates into the search box to the right of the navigation options.

Genome browser tracks are described in the Track Descriptions link above the left sidebar. The consensus Mnemiopsis gene prediction models (track 2.2) are presented by default. Additional tracks can be added by clicking on the appropriate track box on the left sidebar of the main view window. All track options are displayed in a given scaffold window even when there are no annotated features in that particular region. Features are represented as black arrows, with the direction of the arrow indicating the orientation. Exons are presented as light-colored solid bars (e.g., green for 2.2 and pink for 2.2UF), while untranslated regions (both 5′ and 3′) are rendered using darker-shaded colors. Assembled genomic scaffolds (SCF) are depicted as solid black tracks with intermittent bright pink gaps. The MASK track appears as a solid black bar, with blue highlighting the regions that were repeat-masked using VMatch (Figure 3). The Reference sequence track depicts the scaffold sequence, but only when the display is fully zoomed in.

Mnemiopsis genes in KEGG pathways

Previously, we identified gene clusters that contain likely orthologs of both Mnemiopsis and human proteins [4]. In order to gain some insight into the function of individual Mnemiopsis genes, we used this information to assign individual Mnemiopsis genes to human KEGG pathways. We converted Ensembl identifiers for all human genes from our phylogenetically informed clusters of orthologous genes [4] to Entrez Gene IDs using a file from the Entrez Gene FTP site [30]; we then used EASE [31] to link Entrez Gene IDs to KEGG pathways. These data can be accessed by following the KEGG Pathways link found in the left sidebar of most MGP Portal pages. Human KEGG pathways containing genes with a Mnemiopsis ortholog are searchable by selecting a KEGG identifier (e.g., hsa00604), a pathway name (e.g., glycosphingolipid biosynthesis), or a gene symbol (e.g., SLC33A1) from their respective lists, or by using the KEGG pathways search box (Additional file 6: Figure S2). The results are presented as pathway-specific ortholog cluster matrices (Figure 4).
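The two-step identifier mapping described above (Ensembl identifier to Entrez Gene ID, then Entrez Gene ID to KEGG pathway) amounts to chaining two lookup tables. The Python sketch below illustrates that chaining only; it does not reproduce EASE or the Entrez Gene mapping file, and every identifier in the demo data except the pathway identifier hsa00604 (quoted in the text) is a made-up placeholder.

```python
def pathways_for_clusters(clusters, ensembl_to_entrez, entrez_to_pathways):
    """Map each ortholog cluster to the KEGG pathways of its human genes.

    clusters:            {cluster_id: [human Ensembl gene IDs]}
    ensembl_to_entrez:   {Ensembl gene ID: Entrez Gene ID}
    entrez_to_pathways:  {Entrez Gene ID: [KEGG pathway IDs]}
    """
    result = {}
    for cluster_id, ensembl_ids in clusters.items():
        pathways = set()
        for ensembl_id in ensembl_ids:
            entrez_id = ensembl_to_entrez.get(ensembl_id)
            if entrez_id is None:
                # Human gene with no Entrez mapping: skip it.
                continue
            pathways.update(entrez_to_pathways.get(entrez_id, ()))
        result[cluster_id] = sorted(pathways)
    return result


# Placeholder identifiers, for illustration only.
demo_clusters = {"cluster_1": ["ENSG_A", "ENSG_B"], "cluster_2": ["ENSG_C"]}
demo_ensembl_to_entrez = {"ENSG_A": "1001", "ENSG_B": "1002"}
demo_entrez_to_pathways = {"1001": ["hsa00604"], "1002": ["hsa00604", "hsa_demo"]}

print(pathways_for_clusters(demo_clusters,
                            demo_ensembl_to_entrez,
                            demo_entrez_to_pathways))
```

Clusters whose human genes map to no pathway end up with an empty list, which makes them easy to exclude from a pathway-specific table.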
For each KEGG pathway, each row in the ortholog cluster table represents a cluster of orthologous genes from our clustering analysis [4]. For a cluster to be included in the table, at least one human gene from that cluster must be present in the KEGG pathway represented. In most cases, a row consists of one or more human genes that belong to the selected KEGG pathway, along with the computationally predicted orthologs from Mnemiopsis and 21 other model organisms. The Cluster column indicates the most inclusive phylogenetic clade that encompasses all of the genes in the particular cluster (e.g., Metazoa could indicate that the cluster contains genes from both bilaterians and cnidarians and thus cannot be characterized by a less inclusive clade, such as Bilateria). Each human gene is hyperlinked to its Entrez Gene entry. The Ratio column represents the number of human genes in the particular cluster that are in a given pathway (numerator) over the number of total human genes in that cluster (denominator). The higher the ratio, the more likely the non-human orthologs in the cluster are involved in the pathway. The numbers in the columns under each species abbreviation indicate the number of genes from that species that are in that cluster. Each number in the Ml (Mnemiopsis leidyi) column is linked to the appropriate Mnemiopsis gene ID(s) and corresponding Gene Wiki pages.

Figure 3 Several customized tracks can be displayed on the Mnemiopsis genome browser, implemented in JBrowse. Here, we present a predicted protein model (ML000129a) containing G-protein-coupled receptor domains, as evidenced by the PFAM2.2 track. Transcripts assembled from RNA-seq reads support the predicted protein model as depicted in the CL2 track. Transcripts are presented as gray arrows indicating the orientation of the transcript. Exons are presented as light-colored solid bars and untranslated regions (both 5′ and 3′) are darker-shaded bars. Assembled genomic scaffolds (SCF) are depicted as solid black tracks with intermittent gaps shaded bright pink. The MASK track also appears as a solid black bar highlighted with blue in the genomic regions that have been repeat-masked using VMatch. Other tracks shown include an EST and several unfiltered protein prediction models (2.2UF). Additional tracks are described in the main text.

Pfam domains in Mnemiopsis proteins

Another way to characterize the Mnemiopsis genes is to determine the protein domains that they encode. We used hmmscan from the HMMER suite (HMMER 3.0; March 2010) to search the Protein Models 2.2 and Unfiltered Protein Models for domains from the Pfam-A database (version 25). The gathering threshold (cut_ga) option was used to ensure conservative domain prediction. The Pfam Domains link on the home page of the MGP Portal takes the user to a query page, where researchers can specify a given Pfam-A domain by name or accession number, then search for Mnemiopsis genes that contain that domain (Additional file 7: Figure S3). The results are displayed as a list of protein models, listed by gene identifier, and the number of query domains found in those protein models.

Figure 4 Human KEGG pathways containing genes with a Mnemiopsis homolog are presented as pathway-specific ortholog cluster matrices. Each row in the glycosphingolipid biosynthesis pathway represents a cluster from our clustering analysis. The Cluster column indicates the most inclusive clade that encompasses all of the proteins in the cluster. The Ratio column represents the number of human proteins in a given cluster that are found in the pathway over the total number of human proteins in that cluster. Mnemiopsis (Ml) entries are shaded in gray and hyperlinked to their respective Gene Wiki pages.
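As a small worked illustration of the Ratio column and of the rule for including a cluster in a pathway table, the Python sketch below counts how many of a cluster's human genes belong to a pathway. The function names and the gene labels are hypothetical; only the definitions of the numerator, the denominator, and the inclusion rule come from the text.

```python
def pathway_ratio(cluster_human_genes, pathway_genes):
    """Return (numerator, denominator) for the Ratio column.

    numerator:   human genes in the cluster that are in the pathway
    denominator: all human genes in the cluster
    """
    members = set(cluster_human_genes)
    in_pathway = members & set(pathway_genes)
    return len(in_pathway), len(members)


def include_cluster(cluster_human_genes, pathway_genes):
    """A cluster is listed only if at least one of its human genes is in the pathway."""
    numerator, _ = pathway_ratio(cluster_human_genes, pathway_genes)
    return numerator >= 1


# Hypothetical gene labels: two of the three human genes are in the pathway.
cluster = ["geneA", "geneB", "geneC"]
pathway = ["geneA", "geneC", "geneD"]
numerator, denominator = pathway_ratio(cluster, pathway)
print("{}/{}".format(numerator, denominator))  # prints 2/3
```

Returning the two counts separately, rather than a single float, keeps the displayed fraction exact and lets the caller decide how to format it.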
Additionally, a user may download FASTA-formatted Pfam-A domain sequences from the resulting list by clicking on the check boxes next to the sequence(s) of interest, selecting either Pfam-A domain only or the full-length domain-containing protein from the pull-down menu, and clicking Get.

Mnemiopsis Gene Wiki

In an effort to engage the collective expertise of the scientific community, we have implemented a collaborative wiki (MediaWiki version 1.19.11) for the Mnemiopsis gene complement. The Mnemiopsis Gene Wiki is accessible from the left sidebar of most pages and is searchable either by selecting a Mnemiopsis gene identifier (e.g., ML00011a) from the drop-down menu or by manually entering an identifier in the appropriate search box. Users can also access these pages by clicking on a gene in the 2.2 track of the genome browser. Each record in the Gene Wiki represents a single Mnemiopsis gene and provides the following annotation: nucleotide and protein sequences, coding exonic genomic coordinates, pre-computed BLAST hits from numerous organisms displaying the top hits for each protein, the top non-self BLAST hit to Mnemiopsis, Pfam-A domains, Gene Ontology (GO) functional annotations, human disease genes from Online Mendelian Inheritance in Man (OMIM), and a table of ortholog clusters formed by phylogenetically informed clustering methods [4] (Figure 5). In addition, controlled editable sections have been included that permit (and encourage) the scientific community to provide further gene annotation for isoforms, in situ images, references, and other notes for each gene. Users interested in supplementing our gene model annotation at the Mnemiopsis Gene Wiki pages must first create an account and log in prior to submitting their contributions. In-house subject matter expert data curators are notified by e-mail following the creation of a new user account or an edit to an existing Gene Wiki record. Any content changes or additions to the Gene Wiki are thoroughly evaluated by these data curators and are made public subject to their approval.

Figure 5 Each record in the Gene Wiki represents a single Mnemiopsis gene (ML000127a) and provides the following annotation: nucleotide and protein sequences, coding exonic genomic coordinates, and pre-computed BLAST hits from numerous organisms displaying the top hits for each protein. The inset illustrates additional annotations available through the Gene Wiki pages, including those regarding Pfam-A domains and GO terminology (functional annotation).

Pre-compiled BLAST hits are enumerated in tabular form. Each Mnemiopsis protein was compared to the UniProt and NCBI non-redundant protein databases (nr) using BLASTP. The results display the hit number, the accession numbers, E-values, and brief descriptions of the top four hits (lowest E-values). Accession numbers are linked to relevant corresponding entries at UniProt and GenBank. The E-values are hyperlinked to the pairwise BLAST alignments. Each Mnemiopsis protein was also compared to sequence data from developmentally relevant organisms, including Homo sapiens, Drosophila melanogaster, Capitella teleta, Amphimedon queenslandica, Nematostella vectensis, Hydra magnipapillata, Trichoplax adhaerens, Monosiga brevicollis, Salpingoeca rosetta, Capsaspora owczarzaki, fungi, plants, and non-eukaryotes. The top hits for BLASTP and TBLASTN results, falling below an established E-value threshold (1 × 10⁻⁶), are displayed along with their gene or protein identifiers, E-values, and description of the best hits. For the Mnemiopsis organismal database search, the gene identifier of the top non-self hit is displayed (and linked to its corresponding Gene Wiki page) along with the E-value for that alignment.
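The selection of top hits described above can be sketched in Python as a filter over tabular BLAST output, in which each line carries twelve tab-separated columns with the subject identifier in column 2 and the E-value in column 11. This is an illustrative sketch rather than the code behind the Gene Wiki: the function name is hypothetical, and applying the 1 × 10⁻⁶ threshold and the four-hit limit together, with a strict "less than" comparison, is an assumption.

```python
def top_hits(tabular_lines, max_evalue=1e-6, limit=4):
    """Return up to `limit` (subject, E-value) pairs with the lowest E-values.

    Expects standard 12-column tabular BLAST output: query, subject,
    % identity, alignment length, mismatches, gap opens, q. start, q. end,
    s. start, s. end, E-value, bit score.
    """
    hits = []
    for line in tabular_lines:
        # Skip blank lines and the comment lines of "tabular with comments".
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        subject, evalue = fields[1], float(fields[10])
        if evalue < max_evalue:  # assumed: strictly below the threshold
            hits.append((evalue, subject))
    hits.sort()  # lowest E-value first
    return [(subject, evalue) for evalue, subject in hits[:limit]]


def demo_line(subject, evalue):
    """Build one made-up tabular line; only subject and E-value matter here."""
    return "\t".join(["query1", subject, "50.0", "100", "50", "0",
                      "1", "100", "1", "100", evalue, "100"])


demo = [demo_line("hitA", "1e-50"), demo_line("hitB", "1e-3"),
        demo_line("hitC", "1e-10"), demo_line("hitD", "1e-20"),
        demo_line("hitE", "1e-7"), demo_line("hitF", "1e-8")]
for subject, evalue in top_hits(demo):
    print(subject, evalue)
```

Sorting on the parsed floating-point E-value, rather than on the text of the column, avoids mis-ordering values such as 1e-10 and 1e-8.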
The Gene Wiki also contains a section displaying the Pfam-A domains that are encoded by the protein. The Pfam identifier, domain architectures, sequence start and end coordinates, HMM start and end coordinates, E-values, and domain sequence are displayed for each Pfam-A domain. All Pfam identifiers are hyperlinked to their corresponding entries on the Pfam Web site [32]. To further assist in classification, GO terms are presented for each gene, with GO terms assigned using the Argot2 [33] method. Functional annotations derived using Blast2GO [34] are also presented for each gene.

Retrieving single scaffold sequences

The genomic sequence of a single (or partial) scaffold can be retrieved using the Fetch Scaffold Tool. A user can download a FASTA-formatted full genomic scaffold sequence by following the Fetch a Scaffold link in the left sidebar of the MGP Portal home page, entering a scaffold identifier (e.g., ML0001) in the search box, and selecting the Fetch sequence option (Figure 6). Partial scaffold sequences can be retrieved by adding scaffold coordinates to the above query. Alternatively, users can retrieve the reverse complement or six-frame protein translation of the scaffold by selecting the appropriate option.

Downloading full or partial datasets

As noted above, one of our primary objectives in developing the MGP Portal is to simplify the dissemination of all Mnemiopsis sequence data to the scientific community at large. To that end, we have provided users a direct method for obtaining entire Mnemiopsis datasets as compressed text files. The following complete sequence datasets can be downloaded by clicking on the appropriate links located in the left sidebar: the Mnemiopsis leidyi genome assembly (5,100 scaffolds), the full set of Mnemiopsis Gene Models (16,548 genes), the Mnemiopsis Protein Models (16,548 proteins), the Mnemiopsis Unfiltered Protein Models (60,006 proteins), all publicly available Mnemiopsis EST sequences from NCBI (15,752 ESTs), and the full Mnemiopsis mitochondrial genome and protein sequences [27] (11 proteins; Table 1). Optionally, users may enter a known Mnemiopsis sequence identifier (e.g., ML0001 or ML00011a) in the search box or select from a list of identifiers to retrieve a single scaffold, gene model, protein model, or EST of interest.

Demonstrating the MGP Portal's utility: a worked example

The MGP Portal was developed to facilitate research that would benefit from the availability of genomic information from this emerging model organism and, to this end, it includes a number of intuitive data analysis tools. To illustrate this point, consider the case of a developmental biologist studying the human TALE class homeobox gene family (e.g., PBX3; [GenBank: NP_001128250.1]) who may be interested in comparing these sequences against (or predicting novel) Mnemiopsis homeodomain orthologs. A straightforward approach to addressing this question would be to run a BLASTP search of the PBX3 protein sequence against the Mnemiopsis Protein Models (2.2) database. The Mnemiopsis BLAST results display a number of high-scoring candidate proteins that can be further evaluated for properties characteristic of TALE class homeodomains (e.g., a TALE-type homeobox; Figure 1).

Alternatively, another biologist may be interested in searching for novel Mnemiopsis homeodomains, using sequence data from a closely related organism such as Amphimedon as the basis for their search. One approach would be to retrieve a complete set of known Amphimedon homeodomain proteins through an NCBI Entrez query (Search [...] that were screened and filtered out during the initial annotation process. These unfiltered protein models can be placed into genomic context by following the color-coded Mnemiopsis Genome Browser [B] links in the far right-hand column of the BLAST results. The browser provides direct access to additional annotation, including a graphical representation of the Pfam-A domain that overlaps the protein model, as well as links to the sequence of the Pfam-A domain (Figure 7).
Similarly,aresearchermayalsowanttousetheirown customscriptsorexternalcomputationaltoolstofurtherexploretheavailable Mnemiopsis datasets.Insuchacase, theDownloadSequencelinksintheMGPPortalcanbe usedtodownloadboththeProteinModelsandtheUnfilteredProteinModelsforanalysiswithtoolsfromthe HMMERsuite[26]( e.g., hmmsearchHomeobox.hmm ML2.2aa>ML_novel_HDs).Predicteddomainswith E -valuesbelowaninclusionthreshold( e.g.,E -value<0.05) couldthenbeconsideredcandidatehomeodomainsfor furtherevaluation. Conclusions The Mnemiopsis GenomeProjectPortalisintendedasa resourceforinvestigatorsfromthescientificcommunity toobtaingenomicinformationon Mnemiopsis through anintuitiveandeasy-to-useinterface;italsoservesasa modelforresearchersundertakingthedevelopmentof Table1 Mnemiopsisleidyi completesequencedatasets availablefordownloadfromthe Mnemiopsis Genome ProjectPortal DatasetNumberofsequences Genomeassembly(scaffolds)5,100 Genemodels16,548 Proteinmodels16,548 Unfilteredproteinmodels60,006 ESTs15,752 Mitochondrialgenome1 Mitochondrialproteins11 Figure6 The Mnemiopsis FetchToolisusedtoretrieveasingleorpartialscaffold,itsreversecomplementorthesix-frameprotein translation. DisplayedaboveistheoutputofaqueriedpartialgenomicscaffoldforML0001showingthespecifiedgenomicregionofinterest anditssix-frameproteintranslations(thefirstfivetranslationsaredepicted). Moreland etal.BMCGenomics 2014, 15 :316 Page10of13 http://www.biomedcentral.com/1471-2164/15/316 PAGE 11 suchacustomizedgenomeportalthemselves.Thereare anumberofcomprehensivedataportalsavailablefor well-establishedmodelorganisms( e.g. 
, FlyBase). However, as we searched for model Web sites from which to draw inspiration for the MGP Portal, we found that many repositories for next-generation sequencing data are simply Web sites with lists of links to raw sequence data accompanied by minimal annotation, or are non-intuitive and difficult to navigate. Based on this experience, we felt that the selection and utilization of essential resources to systematically manage and disseminate the considerable amounts of data generated by these sequencing projects was imperative. Thus, the presentation and conveyance of such a genome Web portal should be intuitive, user-friendly, and concise. It is within this framework that we present the MGP Portal as such a resource for the recently completed genome sequence of Mnemiopsis leidyi, and we are hopeful that this resource will inspire other groups as they create Web portals of their own.

It was our intent during the development of the MGP Portal to develop a resource to maximize usability while presenting a comprehensive series of datasets not available elsewhere. Recognizing the difficulties and lessons learned from the development of such a resource, and in our continued effort to further communicate our shared experiences to the scientific community at large, we encourage other investigators to consider the proposed genome portal model and, as such, have included a series of scripts (Additional files 1, 2, 3 and 4) to facilitate the conversion of output files produced by various programs. Specifically, this series of scripts can be used to format annotation data for visualization within a customized genome browser and a wiki.

Figure 7 Selecting the Genome Browser link (purple B box from Figure 1) from a BLASTP result entry of queried known homeodomains against the complete Mnemiopsis Unfiltered Protein Models directs the user to the Genome Browser displaying the applicable Mnemiopsis transcript model.
Shown above the 2.2UF track in the browser is the PFAM2.2 track displaying evidence of a homeobox in the targeted region. Clicking on the Homeobox link in the PFAM2.2 track opens a new browser window displaying the Pfam-A domain (ML1991_pfa) prediction results derived from a pre-compiled hmmscan run using HMMER. This Pfam-A domain record provides the genomic location of the Pfam-A domain, the genomic coordinates, transcript and protein sequences, and the hmmscan output for the homeodomain prediction.

As described above, the MGP Portal contains sequence-based information and several customized utilities not available elsewhere, increasing the utility of the data generated by our group in the course of our Mnemiopsis whole-genome sequencing project [4]. The genome browser tool provides an intuitive interface for users to visualize the various types of data available, including data resulting from our comprehensive annotation of the Mnemiopsis genome. Most importantly, many features of this site make it easy for users who do not have a background in bioinformatics to straightforwardly access information presented from a comparative genomics point-of-view, without having to perform many of the analyses themselves. For instance, our phylogenetically relevant gene clusters are mapped to human KEGG pathways, providing a clear phylogenetic perspective for any particular Mnemiopsis gene (or pathway of genes) of interest. In addition, users may contribute to our gene annotation efforts by adding isoforms, in situ images, or other notes to any Gene Wiki page using a secure login.
We trust that the availability of these data will allow investigators from numerous fields (such as developmental, evolutionary, and marine biology) to advance their own research projects aimed at understanding phylogenetic diversity and the evolution of proteins that play a fundamental role in metazoan development.

Availability and requirements

The Mnemiopsis Genome Project Portal is freely available at http://research.nhgri.nih.gov/mnemiopsis, with no barriers to access. Registration is only required if users wish to contribute data to the Isoforms, In situ Images, References, or Notes sections of any of the Gene Wiki pages.

Additional files

Additional file 1: The scaffoldToGFF3.pl script reformats scaffold sequences from FASTA to GFF3. The script creates a GFF3 file for each sequence in the input scaffold FASTA file and is able to handle both gapped [e.g., the scaffold (SCF) track] and repeat-masked [e.g., the repeat-masked (MASK) track] regions in a scaffold.

Additional file 2: The evmToGFF3.pl script parses a GFF3-formatted output file created by EvidenceModeler (EVM) by collecting data about the start and end positions of predicted genes, using this information to create well-formed GFF3 files.

Additional file 3: The cufflinksToGFF3.pl script parses predicted transcript assemblies from the GTF-formatted file created by Cufflinks.

Additional file 4: The create_wiki_page.pl script creates MediaWiki pages for displaying genomic data. Here, we provide a sample wiki page that takes as input FASTA-formatted nucleotide and protein sequences, GFF3 files containing exonic coordinates, and an hmmscan output file containing information on Pfam-A domains. The output of this Perl script is called wikipage.out.

Additional file 5: Figure S1. The Mnemiopsis BLAST tool (implemented using ViroBLAST) schematic illustrates the available user-defined input and output formats, BLAST programs, and database options. BLAST databases are provided for both Mnemiopsis nucleotide (e.g., Mitochondrial genome) and protein [e.g., Protein Models (2.2)] data.

Additional file 6: Figure S2.
The KEGG Pathways search function permits users to search KEGG pathways containing human genes, using a Mnemiopsis homolog as the query. The relationships underlying the search function are depicted as a series of associated flat files. A one-to-one relationship exists between the KEGG and PATHWAY tables and between the PEP2SOURCE_ID and CLUSTERS tables. All other relationships are one-to-many or many-to-many. The CLUSTERING_ANALYSIS table is the final output representation of a KEGG Pathways query consisting of the combination of KEGG_ENTREZ_GENE, SPECIES, and CLUSTERS.

Additional file 7: Figure S3. The PFAM Domains search function parses a series of flat files illustrated here as a relational framework. The PFAM Domains schema is represented as six attributes, with connectors indicating the nature of each applicable relationship. DOMAIN_ACCESSION and DOMAIN_NAME have a one-to-one relationship. All other relationships between PFAM attributes are many-to-many.

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

JFR and ADB conceived the study. RTM, ADN, and TGW designed and developed the database with critical input from JFR, CES, and ADB. ADN wrote the code and implemented the interfaces and tools associated with the MGP Portal. RTM, JFR, CES, BJK, KS, and TGW performed the genome annotation and data analysis. RTM, JFR, CES, TGW, and ADB tested the Web application and tools and provided feedback. RTM and ADB wrote the manuscript, with input and suggestions from CES, JFR, and TGW. ADB directed the project. All authors read and approved the final manuscript.

Acknowledgments

This research was supported by the Intramural Research Program of the National Human Genome Research Institute, National Institutes of Health. We would like to thank Steven Bond, Mark Fredriksen, Derek Gildea, and Evan Maxwell for their thoughtful, constructive comments during the development of the Portal. We also thank Steven Bond, Derek Gildea, and Evan Maxwell for their critical reading of the manuscript.
Author details

1 Genome Technology Branch, Division of Intramural Research, National Human Genome Research Institute, National Institutes of Health, 50 South Drive, Bethesda, MD 20892, USA. 2 Whitney Laboratory for Marine Bioscience, University of Florida, St. Augustine, FL 32080, USA.

Received: 12 November 2013 Accepted: 31 March 2014 Published: 28 April 2014

References

1. Henry JQ, Martindale MQ: Regulation and regeneration in the ctenophore Mnemiopsis leidyi. Dev Biol 2000, 227(2):720–733.
2. Martindale MQ, Finnerty JR, Henry JQ: The Radiata and the evolutionary origins of the bilaterian body plan. Mol Phylogenet Evol 2002, 24(3):358–365.
3. Freeman G, Reynolds GT: The development of bioluminescence in the ctenophore Mnemiopsis leidyi. Dev Biol 1973, 31(1):61–100.
4. Ryan JF, Pang K, Schnitzler CE, Nguyen AD, Moreland RT, Simmons DK, Koch BJ, Francis WR, Havlak P, NISC Comparative Sequencing Program, Smith SA, Putnam NH, Haddock SHD, Dunn CW, Wolfsberg TG, Mullikin JC, Martindale MQ, Baxevanis AD: The genome of the ctenophore Mnemiopsis leidyi and its implications for cell type evolution. Science 2013, 342(6164):1242592.
5. Ryan JF, Pang K, Mullikin JC, Martindale MQ, Baxevanis AD: The homeodomain complement of the ctenophore Mnemiopsis leidyi suggests that Ctenophora and Porifera diverged prior to the ParaHoxozoa. EvoDevo 2010, 1(1):9.
6. Reitzel AM, Pang K, Ryan JF, Mullikin JC, Martindale MQ, Baxevanis AD, Tarrant AM: Nuclear receptors from the ctenophore Mnemiopsis leidyi lack a zinc-finger DNA-binding domain: lineage-specific loss or ancestral condition in the emergence of the nuclear receptor superfamily? EvoDevo 2011, 2(1):3.
7. Liebeskind BJ: Evolution of sodium channels and the new view of early nervous system evolution. Commun Integr Biol 2011, 4(6):679–683.
8. Pang K, Ryan JF, Mullikin JC, Baxevanis AD, Martindale MQ: Genomic insights into Wnt signaling in an early diverging metazoan, the ctenophore Mnemiopsis leidyi. EvoDevo 2010, 1(1):10.
9. Pang K, Ryan JF, Baxevanis AD, Martindale MQ: Evolution of the TGF-beta signaling pathway and its potential role in the ctenophore, Mnemiopsis leidyi. PLoS One 2011, 6(9):e24152.
10. Koch BJ, Ryan JF, Baxevanis AD: The diversification of the LIM superclass at the base of the metazoa increased subcellular complexity and promoted multicellular specialization. PLoS One 2012, 7(3):e33261.
11. Maxwell EK, Ryan JF, Schnitzler CE, Browne WE, Baxevanis AD: MicroRNAs and essential components of the microRNA processing machinery are not encoded in the genome of the ctenophore Mnemiopsis leidyi. BMC Genomics 2012, 13(1):714.
12. Schnitzler CE, Pang K, Powers ML, Reitzel AM, Ryan JF, Simmons D, Tada T, Park M, Gupta J, Brooks SY, Blakesley RW, Yokoyama S, Haddock SH, Martindale MQ, Baxevanis AD: Genomic organization, evolution, and expression of photoprotein and opsin genes in Mnemiopsis leidyi: a new view of ctenophore photocytes. BMC Biol 2012, 10(1):107.
13. Srivastava M, Begovic E, Chapman J, Putnam NH, Hellsten U, Kawashima T, Kuo A, Mitros T, Salamov A, Carpenter ML, Signorovitch AY, Moreno MA, Kamm K, Grimwood J, Schmutz J, Shapiro H, Grigoriev IV, Buss LW, Schierwater B, Dellaporta SL, Rokhsar DS: The Trichoplax genome and the nature of placozoans. Nature 2008, 454(7207):955–960.
14. Genome Index - Origins of Multicellularity; Broad Institute. http://www.broadinstitute.org/annotation/genome/multicellularity_project/GenomesIndex.html.
15. Sullivan JC, Ryan JF, Watson JA, Webb J, Mullikin JC, Rokhsar D, Finnerty JR: StellaBase: the Nematostella vectensis genomics database. Nucleic Acids Res 2006, 34(Database issue):D495–D499.
16. Kreppel L, Fey P, Gaudet P, Just E, Kibbe WA, Chisholm RL, Kimmel AR: dictyBase: a new Dictyostelium discoideum genome database. Nucleic Acids Res 2004, 32(Database issue):D332–D333.
17. Aiptasia Wiki. http://aiptasia.cs.vassar.edu/AiptasiaWiki/.
18. Skinner ME, Uzilov AV, Stein LD, Mungall CJ, Holmes IH: JBrowse: a next-generation genome browser. Genome Res 2009, 19(9):1630–1638.
19. Haas BJ, Salzberg SL, Zhu W, Pertea M, Allen JE, Orvis J, White O, Buell CR, Wortman JR: Automated eukaryotic gene structure annotation using EVidenceModeler and the program to assemble spliced alignments. Genome Biol 2008, 9(1):R7.
20. Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, van Baren MJ, Salzberg SL, Wold BJ, Pachter L: Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol 2010, 28(5):511–515.
21. Deng W, Nickle DC, Learn GH, Maust B, Mullins JI: ViroBLAST: a stand-alone BLAST web server for flexible queries of multiple databases and user's datasets. Bioinformatics 2007, 23(17):2334–2336.
22. PHP. http://www.php.net.
23. The Perl Programming Language. http://www.perl.org.
24. NCBI BLAST FTP site. ftp://ftp.ncbi.nlm.nih.gov/blast.
25. Python Programming Language. http://www.python.org.
26. Finn RD, Clements J, Eddy SR: HMMER web server: interactive sequence similarity searching. Nucleic Acids Res 2011, 39(Web Server issue):W29–W37.
27. Pett W, Ryan JF, Pang K, NISC Comparative Sequencing Program, Martindale MQ, Baxevanis AD, Lavrov DV: Extreme mitochondrial evolution in the ctenophore Mnemiopsis leidyi. Mitochondrial DNA 2011, 22(4):130–142.
28. Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, Adiconis X, Fan L, Raychowdhury R, Zeng Q, Chen Z, Mauceli E, Hacohen N, Gnirke A, Rhind N, di Palma F, Birren BW, Nusbaum C, Lindblad-Toh K, Friedman N, Regev A: Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol 2011, 29:644–652.
29. Kurtz S, Choudhuri JV, Ohlebusch E, Schleiermacher C, Stoye J, Giegerich R: REPuter: the manifold applications of repeat analysis on a genomic scale. Nucleic Acids Res 2001, 29(22):4633–4642.
30. Maglott D, Ostell J, Pruitt KD, Tatusova T: Entrez gene: gene-centered information at NCBI. Nucleic Acids Res 2005, 33(Database issue):D54–D58.
31. Hosack DA, Dennis G Jr, Sherman BT, Lane HC, Lempicki RA: Identifying biological themes within lists of genes with EASE.
Genome Biol 2003, 4(10):R70.
32. Finn RD, Mistry J, Tate J, Coggill P, Heger A, Pollington JE, Gavin OL, Gunasekaran P, Ceric G, Forslund K, Holm L, Sonnhammer EL, Eddy SR, Bateman A: The Pfam protein families database. Nucleic Acids Res 2010, 38(Database issue):D211–D222.
33. Falda M, Toppo S, Pescarolo A, Lavezzo E, Di Camillo B, Facchinetti A, Cilia E, Velasco R, Fontana P: Argot2: a large scale function prediction tool relying on semantic similarity of weighted gene ontology terms. BMC Bioinformatics 2012, 13(Suppl 4):S14.
34. Conesa A, Gotz S, Garcia-Gomez JM, Terol J, Talon M, Robles M: Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics 2005, 21(18):3674–3676.

doi:10.1186/1471-2164-15-316
Cite this article as: Moreland et al.: A customized Web portal for the genome of the ctenophore Mnemiopsis leidyi. BMC Genomics 2014 15:316.