A customized Web portal for the genome of the ctenophore Mnemiopsis leidyi


Material Information

Title:
A customized Web portal for the genome of the ctenophore Mnemiopsis leidyi
Physical Description:
Mixed Material
Language:
English
Creator:
Moreland, Travis R
Nguyen, Anh-Dao
Ryan, Joseph F.
Schnitzler, Christine E.
Koch, Bernard J.
Publisher:
BioMed Central (BMC Genomics)
Publication Date:

Notes

Abstract:
Background: Mnemiopsis leidyi is a ctenophore native to the coastal waters of the western Atlantic Ocean. A number of studies on Mnemiopsis have led to a better understanding of many key biological processes, and these studies have contributed to the emergence of Mnemiopsis as an important model for evolutionary and developmental studies. Recently, we sequenced, assembled, annotated, and performed a preliminary analysis on the 150-megabase genome of the ctenophore, Mnemiopsis. This sequencing effort has produced the first set of whole-genome sequencing data on any ctenophore species and is amongst the first wave of projects to sequence an animal genome de novo solely using next-generation sequencing technologies. Description: The Mnemiopsis Genome Project Portal (http://research.nhgri.nih.gov/mnemiopsis/) is intended both as a resource for obtaining genomic information on Mnemiopsis through an intuitive and easy-to-use interface and as a model for developing customized Web portals that enable access to genomic data. The scope of data available through this Portal goes well beyond the sequence data available through GenBank, providing key biological information not available elsewhere, such as pathway and protein domain analyses; it also features a customized genome browser for data visualization. Conclusions: We expect that the availability of these data will allow investigators to advance their own research projects aimed at understanding phylogenetic diversity and the evolution of proteins that play a fundamental role in metazoan development. The overall approach taken in the development of this Web site can serve as a viable model for disseminating data from whole-genome sequencing projects, framed in a way that best-serves the specific needs of the scientific community. Keywords: Mnemiopsis leidyi, Genome browser, Customized Web portal, Gene wiki
General Note:
Moreland et al. BMC Genomics 2014, 15:316 http://www.biomedcentral.com/1471-2164/15/316; Pages 1-13
General Note:
doi:10.1186/1471-2164-15-316 Cite this article as: Moreland et al.: A customized Web portal for the genome of the ctenophore Mnemiopsis leidyi. BMC Genomics 2014 15:316.

Record Information

Source Institution:
University of Florida
Holding Location:
University of Florida
Rights Management:
© 2014 Moreland et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
System ID:
AA00024649:00001

Full Text





PAGE 1

[Figure: Mnemiopsis BLAST search form]
Input: insert query FASTA sequence or upload sequence file
Programs: BLASTN, TBLASTN, TBLASTX, BLASTP, BLASTX
Nucleotide databases: Main Scaffolds; Gene Models (2.2); Unfiltered Gene Models (Unincorporated Predictions); Public ESTs; Mitochondrial Genome; Cufflinks-Assembled Transcripts; Trinity-Assembled Transcripts
Protein databases: Protein Models (2.2); Unfiltered Protein Models (Unincorporated Proteins); Mitochondrial Proteins
Output alignment formats: Pairwise; Query-anchored with identities; Query-anchored without identities; Flat query-anchored with identities; Flat query-anchored without identities; XML; Tabular; Tabular with comment lines



PAGE 1

DATABASE Open Access

A customized Web portal for the genome of the ctenophore Mnemiopsis leidyi

R Travis Moreland1, Anh-Dao Nguyen1, Joseph F Ryan1,2, Christine E Schnitzler1, Bernard J Koch1, Katherine Siewert1, Tyra G Wolfsberg1 and Andreas D Baxevanis1*

Abstract

Background: Mnemiopsis leidyi is a ctenophore native to the coastal waters of the western Atlantic Ocean. A number of studies on Mnemiopsis have led to a better understanding of many key biological processes, and these studies have contributed to the emergence of Mnemiopsis as an important model for evolutionary and developmental studies. Recently, we sequenced, assembled, annotated, and performed a preliminary analysis on the 150-megabase genome of the ctenophore, Mnemiopsis. This sequencing effort has produced the first set of whole-genome sequencing data on any ctenophore species and is amongst the first wave of projects to sequence an animal genome de novo solely using next-generation sequencing technologies.

Description: The Mnemiopsis Genome Project Portal (http://research.nhgri.nih.gov/mnemiopsis/) is intended both as a resource for obtaining genomic information on Mnemiopsis through an intuitive and easy-to-use interface and as a model for developing customized Web portals that enable access to genomic data. The scope of data available through this Portal goes well beyond the sequence data available through GenBank, providing key biological information not available elsewhere, such as pathway and protein domain analyses; it also features a customized genome browser for data visualization.

Conclusions: We expect that the availability of these data will allow investigators to advance their own research projects aimed at understanding phylogenetic diversity and the evolution of proteins that play a fundamental role in metazoan development. The overall approach taken in the development of this Web site can serve as a viable model for disseminating data from whole-genome sequencing projects, framed in a way that best-serves the specific needs of the scientific community.
Keywords: Mnemiopsis leidyi, Genome browser, Customized Web portal, Gene wiki

Background

Ctenophores are an important group of early-branching metazoans that are essential for understanding the evolution of multicellular animals, the relationship between genomic complexity and morphological complexity, and the molecular basis for the evolution of novel cell types such as epithelia, neurons, muscle, and stem cells. One ctenophore species that has received particular attention is Mnemiopsis leidyi, which is native to the coastal waters of the Atlantic Ocean. Studies in Mnemiopsis have advanced our understanding of a number of important biological processes such as regeneration, axial patterning, and bioluminescence [1-3]. As such, Mnemiopsis has emerged as an important model organism for understanding the immense diversity and complexity seen in the early evolution of animals.

Despite the importance of Mnemiopsis as an emerging model organism, there were no high-quality genome-scale sequence data available for any ctenophore species until recently. To address this dearth of genome-scale sequence data, we recently completed the sequencing, assembly, annotation, and preliminary analysis of the 150-megabase genome of Mnemiopsis leidyi [4]; these data will serve as an invaluable resource for the growing community of developmental, evolutionary, and marine biologists studying important questions regarding early branching metazoan biology. Initial studies utilizing these sequence data have

*Correspondence: andy@mail.nih.gov
1 Genome Technology Branch, Division of Intramural Research, National Human Genome Research Institute, National Institutes of Health, 50 South Drive, Bethesda, MD 20892, USA
Full list of author information is available at the end of the article

© 2014 Moreland et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Moreland et al. BMC Genomics 2014, 15:316
http://www.biomedcentral.com/1471-2164/15/316

PAGE 2

contributed to our understanding of the evolution of gene families [5-7], signaling pathways [8,9], protein domains [10], miRNAs [11], and genes involved in the production and detection of light [12]. The availability of these data also has provided a solid foundation for studies aimed at resolving the question of the phylogenetic position of the ctenophores [4].

In recent years, databases have been created to house whole-genome sequencing data from several emerging model organisms. Genomic data and annotation are typically made accessible via public genome portals at sequencing centers such as the US Department of Energy's Joint Genome Institute (JGI) [13] and the Broad Institute [14], while other groups have developed Web-based genomic database resources that provide additional analysis tools [15] and browsing options [16] to increase the utility of the data. Still others have implemented genomic resources that offer the scientific community access to genomic annotation and actively seek user contributions [17]. Ideally, for each organism with a sequenced genome, there would be a single centralized resource where most (if not all) data retrieval and analysis could take place; this kind of resource would include, at a minimum, the ability to search, browse, and download sequence and annotation data, visualize genomic data via a Web-based browser tool, and encourage the active engagement of the scientific community in maintaining a wiki-style resource for capturing supplementary annotations of gene models and predicted proteins. Moreover, this kind of resource would (and should) be developed and maintained by researchers in the model organism community who intimately understand the needs of their scientific colleagues, presenting the data in an intuitive, user-friendly and concise manner despite its sheer volume and complexity.
In our own experience advising groups who have undertaken whole-genome sequencing projects, we have found that many of these groups do not have ready access to the kind of programming resources needed to implement some of the more "advanced" database solutions currently available. With that in mind, and to facilitate the creation of the kind of centralized genomic data resource described above, we set out to develop a generalized framework that strikes a reasonable balance between ease of implementation and documented structure, without the additional constraints posed by some publicly available database schemas.

Here, we describe the development and features of a comprehensive Web-based data portal for navigating the recently completed genome sequence of Mnemiopsis leidyi (http://research.nhgri.nih.gov/mnemiopsis/). The Mnemiopsis Genome Project Portal (or "MGP Portal") is a biologist-centric resource designed with a particular emphasis on usability, intuitive navigation, and clarity. Some key features of the MGP Portal include the ability to retrieve selected nucleotide and protein sequences, the availability of whole-genome datasets for download, an integrated BLAST utility for sequence comparisons, a genome browser tool, a gene-centric wiki, and "phylogenetically informed" gene ortholog clusters mapped to human KEGG pathways. Furthermore, the scope of data accessible through this Web site goes well beyond the sequence data available at GenBank, providing other key biological information such as Gene Ontology term assignments and data from pathway and protein domain analyses. In addition, we offer a set of Perl modules that can be utilized by other scientists as a generalized framework for implementing a gene page in MediaWiki, as well as a customizable genome browser for visualizing large-scale genomic data using JBrowse.

Construction and content

Displaying customized genome annotation

A major goal during the development of the MGP Portal was to create a reproducible workflow for the creation of well-annotated and visually accessible genome repositories (Figure 1). To that end, we have adopted the JavaScript-based JBrowse [18] as the engine for our Mnemiopsis Genome Browser, resulting in a clean and responsive user interface for viewing the genome assembly, gene models, and all supporting data. We have developed a number of tools in the Perl scripting language to convert sequence and annotation data into a format accepted by JBrowse (version 1.11.1), which are included as Additional files 1, 2 and 3. In addition, we provide a Perl script that facilitates the creation of MediaWiki pages that display nucleotide and protein sequences, exonic genomic coordinates, and PFAM domains (Additional file 4). The actual creation of genome assemblies and annotation files is outside the scope of this manuscript but is outlined in detail elsewhere [4].

To build feature tracks, JBrowse requires properly formatted input files. While newer versions of JBrowse released subsequent to the creation of the Mnemiopsis Genome Browser are able to accept a slightly expanded number of inputs (including specific relational database dumps), the generic feature format version 3 (GFF3) flat file was the input type we adopted, a format that continues to be supported by JBrowse at the time of this writing (version 1.11.1). Adoption of the GFF3 format greatly facilitated the processing of multiple output file types produced by a number of data analysis programs.

To begin, the scaffoldToGFF3.pl script (Additional file 1) can be used to reformat scaffold sequences from FASTA to GFF3. Two parameters (-i and -o) are required, specifying the input scaffold file and desired output directory, respectively. Two optional parameters (-l and -n) are also available. The first should be called if the gap-indication letter in the input is different from 'N', accepting a single

PAGE 3

character (e.g., X) as a substitute, while the second parameter can be called in order to specify which feature-generating program was used. The script creates a GFF3 file for each sequence in the input scaffold FASTA file and is able to handle both gapped [e.g., the scaffold (SCF) track] and repeat-masked [e.g., the repeat-masked (MASK) track] regions in a scaffold. Next, evmToGFF3.pl (Additional file 2) parses a GFF3-formatted output file created by EvidenceModeler (EVM) [19] by collecting data about the start and end positions of predicted genes, using this information to create well-formed GFF3 files. The script accepts several additional parameters; the input EVM file (-i) and desired output directory (-o) must be set, while the third and optional parameter (-n) is, again, the name of the feature-generating program (e.g., EVM). The third module is called cufflinksToGFF3.pl (Additional file 3) and is used to parse predicted transcript assemblies from the GTF-formatted file created by Cufflinks [20]. The cufflinksToGFF3.pl script has the same behavior for transcript location as evmToGFF3.pl has for predicted gene location, and accepts the same three parameters (-i, -o, and -n).

To import GFF3 data into JBrowse for display as custom tracks in the main genome window, a series of three JBrowse-supplied Perl scripts (prepare-refseqs.pl, biodb-to-json.pl, and generate-names.pl) need to be executed using appropriate parameters and a system-specific configuration file, the details of which can be found in the tutorials on the JBrowse Web site (www.jbrowse.org). A number of custom CGI scripts were written to create the hyperlinks connecting features in JBrowse to the various sources of gene data.

A Perl script named create_wiki_page.pl has been used to create MediaWiki pages for displaying genomic data (Additional file 4). Here, we provide a sample wiki page that takes FASTA-formatted nucleotide and protein

Figure 1 A visual representation of the Mnemiopsis Genome Project (MGP) Portal depicting the flow of Mnemiopsis sequence data into accessible internal data visualization and annotation tools. Colored arrows correspond to individual data types listed in the Sequence Data box (center). For example, the flow of 'Assembled Transcripts' data (e.g., Cufflinks- and Trinity-assembled RNA-seq transcripts) is represented by the orange arrows; these data can be viewed both as tracks in the Genome Browser and/or utilized as a BLAST database.
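The FASTA-to-GFF3 scaffold conversion described on this page can be illustrated with a short sketch. This is not the authors' scaffoldToGFF3.pl (a Perl script distributed as Additional file 1); it is a minimal Python approximation that assumes a gap is any run of the gap-indication letter ('N' unless another single character is given). The function names, the "sketch" source label, and the choice of the "supercontig" and "gap" feature types are illustrative assumptions, not taken from the paper.

```python
import re
from typing import Iterator, List, Tuple


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from FASTA-formatted text."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the identifier.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def scaffold_to_gff3(name: str, seq: str, gap_letter: str = "N",
                     source: str = "sketch") -> List[str]:
    """Return GFF3 lines: one feature for the scaffold, one per gap run.

    The feature types and the source label are hypothetical choices.
    """
    lines = ["##gff-version 3"]
    # GFF3 coordinates are 1-based and inclusive.
    lines.append("\t".join([name, source, "supercontig", "1", str(len(seq)),
                            ".", ".", ".", f"ID={name};Name={name}"]))
    gap_run = re.compile(re.escape(gap_letter) + "+", re.IGNORECASE)
    for i, match in enumerate(gap_run.finditer(seq), start=1):
        lines.append("\t".join([name, source, "gap",
                                str(match.start() + 1), str(match.end()),
                                ".", ".", ".", f"ID={name}.gap{i}"]))
    return lines
```

Passing a different `gap_letter` plays the role that the paper assigns to the optional -l parameter, and `source` corresponds loosely to -n; writing one output file per scaffold is left out to keep the sketch short.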

PAGE 4

sequences, GFF3 files containing exonic coordinates, and an hmmscan output file containing information on Pfam-A domains as input. The create_wiki_page.pl script requires five parameters. The first three parameters specify a set of input files, including a nucleotide FASTA file (-n), a protein FASTA file (-a), and a Pfam-A file (-p). The remaining parameters specify a directory containing the input GFF3 files containing the exonic coordinates (-d) and an output directory (-o). In addition, we show a PHP command line that imports a wiki page into MediaWiki:

    php /$WIKIHOME/maintenance/importTextFile.php --user=USER wikipage.out

The wikipage.out file is created by the create_wiki_page.pl script.

Researchers interested in creating a customized genome browser or gene wiki for visualizing genome-related data are encouraged to utilize the aforementioned scripts (Additional files 1, 2, 3 and 4), as their implementation satisfies the fundamental requirements of both JBrowse and MediaWiki. Furthermore, these modules may serve as a useful framework for both the development of gene wikis and more advanced genome browser tracks as new JBrowse applications are created to visualize genomic data.

User interface and genome browser implementation

Mnemiopsis sequences are stored as individual text files, and several Perl scripts were written to retrieve these as single combined files. Multiple sequences can be downloaded via an HTML interface that was developed using a series of CGI/Perl scripts. The Mnemiopsis BLAST tool was implemented using ViroBLAST [21] and runs on an Apache Web server (Additional file 5: Figure S1).
The ViroBLAST tool was written in PHP, a server-side scripting language [22], and Perl [23], applying the stand-alone blastall program downloaded from NCBI [24]. The Mnemiopsis BLAST databases were created using formatdb. PHP was used to parse the BLAST output created by ViroBLAST to generate the formatted BLAST results, including the internal links to the Genome Browser, the Gene Wiki pages and the Fetch Scaffold Tool. MediaWiki (version 1.19.11), written in PHP and Perl, was used for the Gene Wiki implementation. Perl was also used to create data for displaying on the Gene Wiki pages. The KEGG pathways pages were developed using JavaScript to display KEGG identifier, pathway, and gene symbol search lists. CGI/Perl scripts were used for KEGG pathway search functions and data downloading utilities (Additional file 6: Figure S2). Python (version 2.6) [25] scripts were used to search the Pfam domains, identified using hmmscan [26] from the HMMER suite, and CGI and JavaScript were used to display the Pfam domain search results (Additional file 7: Figure S3).

Utility and discussion

Mnemiopsis BLAST tool

One feature of the Mnemiopsis Genome Project Portal is a customized stand-alone Web-based BLAST interface for performing nucleotide and amino acid sequence similarity searches (Additional file 5: Figure S1). ViroBLAST was used to implement our Mnemiopsis BLAST tool, producing an organized, manageable output that is easy to parse and navigate. Users may input their FASTA-formatted query sequences directly into the search box or upload sequence files from their computer. The customary set of BLAST programs is available, including BLASTN, BLASTP, BLASTX, TBLASTN, and TBLASTX. Nucleotide sequence databases include the Mnemiopsis genomic scaffolds (Main Scaffolds), consensus gene prediction models (Gene Models 2.2) and Unfiltered Gene Models (unincorporated predictions) described in Ryan et al. [4], all publicly available Mnemiopsis ESTs and mRNAs from GenBank (Public ESTs), the Mnemiopsis mitochondrial genome [27], Cufflinks-assembled RNA-seq transcripts, and Trinity-assembled RNA-seq transcripts [28]. The protein sequence databases available through the MGP Portal include the translated proteins derived from the Mnemiopsis consensus gene prediction models (Protein Models 2.2), the unincorporated Mnemiopsis proteins derived from unincorporated gene prediction models (Unfiltered Protein Models), and the computationally derived Mnemiopsis mitochondrial proteins. Additionally, a user may create a customized user-defined BLAST database by uploading a file containing FASTA-formatted nucleotide or protein sequences of interest. BLAST output results feature customized color-coded boxes linked directly to relevant internal annotation resources, including the Mnemiopsis Genome Browser [B], the wiki-based Mnemiopsis Gene Pages [G], the Scaffold Fetch Tool [S], Unfiltered Gene Models [U], Cufflinks-assembled transcripts [C], and Trinity-assembled transcripts [T] (Figure 2).

Browsing the Mnemiopsis genome

One of our primary objectives in developing the MGP Portal is to provide a graphical tool for the scientific community to visualize the various types of Mnemiopsis genome data currently available. Using the built-in JBrowse genome browser, users can view a variety of data tracks, such as the Mnemiopsis genome assembly, gene prediction models, and RNA-seq data (Figure 3). Several customized JBrowse tracks are available for viewing, including consensus gene prediction models (labeled 2.2 in the browser), Mnemiopsis RNA-seq reads assembled into transcripts using Cufflinks (CL2) and Trinity (TRN15-30

PAGE 5

hpf), publicly available Mnemiopsis ESTs from GenBank (EST), publicly available Mnemiopsis mRNAs from GenBank (GBNT), assembled genomic scaffolds (SCF), genomic regions that have been repeat-masked using VMatch (MASK) [29], experimentally verified Mnemiopsis RACE PCR transcripts (RACE), unincorporated gene prediction models (2.2UF), and non-redundant protein domains derived from Pfam hmmscan runs using the 2.2 and 2.2UF datasets and the six-frame translations of the Mnemiopsis genome (PFAM2.2).

The JBrowse display is organized by genomic scaffold. Scaffolds are named using a six-character convention (e.g., MLnnnn); the ML designates the species (Mnemiopsis leidyi), and the individual scaffolds are numbered from 0001 to 5100 (e.g., ML0001). Gene identifiers (e.g., ML000129a) start with a prefix indicating the scaffold on which the gene is located (in this example, 'ML0001'), followed by a non-padded integer that is unique in combination with the scaffold identifier (in this case, '29') and ending with a lower-case letter that specifies the gene isoform (in this case, 'a'). Data displayed in the browser can be searched using a variety of Mnemiopsis identifiers. A scaffold-based query takes the user directly to that scaffold, while a gene-based search goes to that gene's location on the appropriate scaffold. A user may also search the genome browser using a Mnemiopsis GenBank mRNA identifier, an EST identifier, or a Pfam-A domain name (e.g., AF293700.1, FC475136, or "Glycohydro20", respectively). The PFAM2.2 track was created by running hmmscan against the Protein Models 2.2, the Unfiltered Protein Models, and the six-frame translations of the Mnemiopsis genome. Scaffold coordinates are displayed across the top of the browser window. Navigation options can be found beneath the coordinate bar, including the zoom tool and left-right arrows. Users can also refine the displayed region by entering the scaffold coordinates into the search box to the right of the navigation options.
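The identifier convention described above lends itself to a small parsing sketch. The regular expressions below encode one reading of the text: a scaffold is "ML" plus four digits in the range 0001 to 5100, and a gene identifier is a scaffold prefix, a non-padded integer, and a single lower-case isoform letter. The assumption that the integer starts at 1, and all function and class names, are hypothetical and not part of the MGP Portal code.

```python
import re
from typing import NamedTuple, Optional

SCAFFOLD_RE = re.compile(r"^ML(\d{4})$")
# Scaffold prefix, non-padded integer (assumed >= 1), isoform letter.
GENE_RE = re.compile(r"^(ML\d{4})([1-9]\d*)([a-z])$")


class GeneId(NamedTuple):
    scaffold: str  # e.g. "ML0001"
    number: int    # unique in combination with the scaffold
    isoform: str   # lower-case isoform letter


def is_scaffold_id(identifier: str) -> bool:
    """True for ML0001 through ML5100, the range given in the text."""
    match = SCAFFOLD_RE.match(identifier)
    return match is not None and 1 <= int(match.group(1)) <= 5100


def parse_gene_id(identifier: str) -> Optional[GeneId]:
    """Split a gene identifier into its parts, or return None."""
    match = GENE_RE.match(identifier)
    if match is None or not is_scaffold_id(match.group(1)):
        return None
    return GeneId(match.group(1), int(match.group(2)), match.group(3))
```

For the example used in the text, `parse_gene_id("ML000129a")` yields scaffold `ML0001`, number `29`, and isoform `a`.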
Genome browser tracks are described in the Track Descriptions link above the left sidebar. The consensus Mnemiopsis gene prediction models (track 2.2) are presented by default. Additional tracks can be added by clicking on the appropriate track box on the left sidebar of the main view window. All track options are displayed in a given scaffold window even when there are no

Figure 2 The Mnemiopsis BLAST implementation provides users an intuitive Web interface for performing sequence similarity searches. Shown are tab-delimited BLASTP results from a single human PAX3 protein, providing links to relevant sequence entries in the Mnemiopsis Genome Browser (purple 'B' box), individual Gene Wiki pages (green 'G' box), and Unfiltered Protein Models (orange 'U' box).

PAGE 6

annotated features in that particular region. Features are represented as black arrows, with the direction of the arrow indicating the orientation. Exons are presented as light-colored solid bars (e.g., green for 2.2 and pink for 2.2UF), while untranslated regions (both 5' and 3') are rendered using darker-shaded colors. Assembled genomic scaffolds (SCF) are depicted as solid black tracks with intermittent bright pink gaps. The MASK track appears as a solid black bar, with blue highlighting the regions that were repeat-masked using VMatch (Figure 3). The Reference sequence track depicts the scaffold sequence, but only when the display is fully zoomed in.

Mnemiopsis genes in KEGG pathways

Previously, we identified gene clusters that contain likely orthologs of both Mnemiopsis and human proteins [4]. In order to gain some insight into the function of individual Mnemiopsis genes, we used this information to assign individual Mnemiopsis genes to human KEGG pathways. We converted Ensembl identifiers for all human genes from our phylogenetically informed clusters of orthologous genes [4] to Entrez Gene IDs using a file from the Entrez Gene FTP site [30]; we then used EASE [31] to link Entrez Gene IDs to KEGG pathways. These data can be accessed by following the KEGG Pathways link found in the left sidebar of most MGP Portal pages. Human KEGG pathways containing genes with a Mnemiopsis ortholog are searchable by selecting a KEGG identifier (e.g., hsa00604), a pathway name (e.g., glycosphingolipid biosynthesis), or a gene symbol (e.g., SLC33A1) from their respective lists, or by using the KEGG pathways search box (Additional file 6: Figure S2). The results are presented as pathway-specific ortholog cluster matrices (Figure 4).

For each KEGG pathway, each row in the ortholog cluster table represents a cluster of orthologous genes from our clustering analysis [4]. For a cluster to be included in

Figure 3 Several customized tracks can be displayed on the Mnemiopsis genome browser, implemented in JBrowse. Here, we present a predicted protein model (ML000129a) containing G-protein-coupled receptor domains, as evidenced by the PFAM2.2 track. Transcripts assembled from RNA-seq reads support the predicted protein model as depicted in the CL2 track. Transcripts are presented as gray arrows indicating the orientation of the transcript. Exons are presented as light-colored solid bars and untranslated regions (both 5' and 3') are darker-shaded bars. Assembled genomic scaffolds (SCF) are depicted as solid black tracks with intermittent gaps shaded bright pink. The MASK track also appears as a solid black bar highlighted with blue in the genomic regions that have been repeat-masked using VMatch. Other tracks shown include an EST and several unfiltered protein prediction models (2.2UF). Additional tracks are described in the main text.

PAGE 7

the table, at least one human gene from that cluster must be present in the KEGG pathway represented. In most cases, a row consists of one or more human genes that belong to the selected KEGG pathway, along with the computationally predicted orthologs from Mnemiopsis and 21 other model organisms. The 'Cluster' column indicates the most inclusive phylogenetic clade that encompasses all of the genes in the particular cluster (e.g., 'Metazoa' could indicate that the cluster contains genes from both bilaterians and cnidarians and thus cannot be characterized by a less inclusive clade, such as 'Bilateria'). Each human gene is hyperlinked to its Entrez Gene entry. The 'Ratio' column represents the number of human genes in the particular cluster that are in a given pathway (numerator) over the number of total human genes in that cluster (denominator). The higher the ratio, the more likely the non-human orthologs in the cluster are involved in the pathway. The numbers in the columns under each species abbreviation indicate the number of genes from that species that are in that cluster. Each number in the 'Ml' (Mnemiopsis leidyi) column is linked to the appropriate Mnemiopsis gene ID(s) and corresponding Gene Wiki pages.

Pfam domains in Mnemiopsis proteins

Another way to characterize the Mnemiopsis genes is to determine the protein domains that they encode. We used hmmscan from the HMMER suite (HMMER 3.0; March 2010) to search the Protein Models 2.2 and Unfiltered Protein Models for domains from the Pfam-A database (version 25). The gathering threshold (cut_ga) option was used to ensure conservative domain prediction. The Pfam Domains link on the home page of the MGP Portal takes the user to a query page, where researchers can specify a given Pfam-A domain by name or accession number, then search for Mnemiopsis genes that contain that domain (Additional file 7: Figure S3). The results are displayed as a list of protein models, listed by gene identifier, and the number of query domains found in those protein models.

Figure 4 Human KEGG pathways containing genes with a Mnemiopsis homolog are presented as pathway-specific ortholog cluster matrices. Each row in the glycosphingolipid biosynthesis pathway represents a cluster from our clustering analysis. The 'Cluster' column indicates the most inclusive clade that encompasses all of the proteins in the cluster. The 'Ratio' column represents the number of human proteins in a given cluster that are found in the pathway over the total number of human proteins in that cluster. Mnemiopsis (Ml) entries are shaded in gray and hyperlinked to their respective Gene Wiki pages.
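The 'Ratio' column and the inclusion rule for a cluster row can be written down in a few lines. The sketch below assumes each cluster is available as a collection of human gene symbols and each pathway as a set of gene symbols; the function name is hypothetical, and apart from SLC33A1 (mentioned in the text) the gene names in the usage note are placeholders rather than real data.

```python
from typing import Iterable, Optional, Set, Tuple


def pathway_ratio(cluster_human_genes: Iterable[str],
                  pathway_genes: Set[str]) -> Optional[Tuple[int, int]]:
    """Return (numerator, denominator) for the 'Ratio' column.

    numerator   = human genes in the cluster that are in the pathway
    denominator = all human genes in the cluster
    Returns None when no human gene of the cluster is in the pathway,
    in which case the cluster would not appear in the pathway table.
    """
    cluster = set(cluster_human_genes)
    in_pathway = cluster & set(pathway_genes)
    if not in_pathway:
        return None
    return len(in_pathway), len(cluster)
```

For instance, a cluster holding `SLC33A1`, `GENE_B`, and `GENE_C`, checked against a pathway that lists `SLC33A1` and `GENE_C`, gives a ratio of 2 over 3.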

PAGE 8

Additionally, a user may download FASTA-formatted Pfam-A domain sequences from the resulting list by clicking on the checkboxes next to the sequence(s) of interest, selecting either Pfam-A domain only or the full-length domain-containing protein from the pull-down menu, and clicking 'Get'.

Mnemiopsis Gene Wiki

In an effort to engage the collective expertise of the scientific community, we have implemented a collaborative wiki (MediaWiki version 1.19.11) for the Mnemiopsis gene complement. The Mnemiopsis Gene Wiki is accessible from the left sidebar of most pages and is searchable either by selecting a Mnemiopsis gene identifier (e.g., ML00011a) from the drop-down menu or by manually entering an identifier in the appropriate search box. Users can also access these pages by clicking on a gene in the 2.2 track of the genome browser. Each record in the Gene Wiki represents a single Mnemiopsis gene and provides the following annotation: nucleotide and protein sequences, coding exonic genomic coordinates, pre-computed BLAST hits from numerous organisms displaying the top hits for each protein, the top non-self BLAST hit to Mnemiopsis, Pfam-A domains, Gene Ontology (GO) functional annotations, human disease genes from Online Mendelian Inheritance in Man (OMIM), and a table of ortholog clusters formed by phylogenetically informed clustering methods [4] (Figure 5). In addition, controlled editable sections have been included that permit (and encourage) the scientific community to provide further gene annotation for isoforms, in situ images, references, and other notes for each gene. Users interested in supplementing our gene model annotation at the Mnemiopsis Gene Wiki pages must first create an account and log in prior to submitting their contributions. In-house subject matter expert data curators are notified by e-mail following the creation of a new user account or an edit to an existing Gene Wiki record. Any

Figure 5 Each record in the Gene Wiki represents a single Mnemiopsis gene (ML000127a) and provides the following annotation: nucleotide and protein sequences, coding exonic genomic coordinates, and pre-computed BLAST hits from numerous organisms displaying the top hits for each protein. The inset illustrates additional annotations available through the Gene Wiki pages, including those regarding Pfam-A domains and GO terminology (functional annotation).

PAGE 9

content changes or additions to the Gene Wiki are thoroughly evaluated by these data curators and are made public subject to their approval.

Pre-compiled BLAST hits are enumerated in tabular form. Each Mnemiopsis protein was compared to the UniProt and NCBI non-redundant protein databases (nr) using BLASTP. The results display the hit number, the accession numbers, E-values, and brief descriptions of the top four hits (lowest E-values). Accession numbers are linked to relevant corresponding entries at UniProt and GenBank. The E-values are hyperlinked to the pairwise BLAST alignments. Each Mnemiopsis protein was also compared to sequence data from developmentally relevant organisms, including Homo sapiens, Drosophila melanogaster, Capitella teleta, Amphimedon queenslandica, Nematostella vectensis, Hydra magnipapillata, Trichoplax adhaerens, Monosiga brevicollis, Salpingoeca rosetta, Capsaspora owczarzaki, fungi, plants, and non-eukaryotes. The top hits for BLASTP and TBLASTN results, falling below an established E-value threshold (E-value ≤ 1 × 10^-6), are displayed along with their gene or protein identifiers, E-values, and description of the best hits. For the Mnemiopsis organismal database search, the gene identifier of the top non-self hit is displayed (and linked to its corresponding Gene Wiki page) along with the E-value for that alignment.
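The selection of top hits under an E-value threshold can be sketched as follows. The sketch assumes BLAST tabular output with the standard twelve tab-separated columns, where the query identifier, subject identifier, and E-value are columns 1, 2, and 11. Combining the "top four hits" rule and the 1 × 10^-6 cutoff in a single function is an illustrative simplification, the function name is hypothetical, and this is not the Portal's own PHP or Perl code.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def top_hits(tabular_lines: Iterable[str], max_evalue: float = 1e-6,
             n: int = 4) -> Dict[str, List[Tuple[str, float]]]:
    """Keep, per query, up to n hits with the lowest E-values.

    Hits whose E-value exceeds max_evalue are discarded first.
    Comment lines (starting with '#') and blank lines are skipped.
    """
    hits: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue <= max_evalue:
            hits[query].append((subject, evalue))
    # Sort each query's hits by ascending E-value and truncate to n.
    return {query: sorted(found, key=lambda hit: hit[1])[:n]
            for query, found in hits.items()}
```

A query with no hit at or below the threshold is simply absent from the returned dictionary.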
TheGeneWikialsocontainsasectiondisplayingthe Pfam-Adomainsthatareencodedbytheprotein.The Pfamidentifier,domainarchitectures,sequencestartand endcoordinates,HMMstartandendcoordinates, E -values, anddomainsequencearedisplayedforeachPfam-Adomain.AllPfamidentifiersarehyperlinkedtotheircorrespondingentriesonthePfamWebsite[32].Tofurther assistinclassification,GOtermsarepresentedforeach gene,withGOtermsassignedusingtheArgot2[33] method.FunctionalannotationsderivedusingBlast2GO [34]arealsopresentedforeachgene.RetrievingsinglescaffoldsequencesThegenomicsequenceofasingle(orpartial)scaffoldcan beretrievedusingtheFetchScaffoldTool.Ausercan downloadaFASTA-formattedfullgenomicscaffoldsequencebyfollowingtheFetchaScaffoldlinkintheleft sidebaroftheMGPPortalhomepage,enteringascaffold identifier( e.g., ML0001)inthesearchbox,andselecting theFetchsequenceoption(Figure6).Partialscaffoldsequencescanberetrievedbyaddingscaffoldcoordinatesto theabovequery.Alternatively,userscanretrievethereversecomplementorsix-frameproteintranslationofthe scaffoldbyselectingtheappropriateoption.DownloadingfullorpartialdatasetsAsnotedabove,oneofourprimaryobjectivesindevelopingtheMGPPortalistosimplifythedisseminationofall Mnemiopsis sequencedatatothescientificcommunityat large.Tothatend,wehaveprovidedusersadirectmethod forobtainingentire Mnemiopsis datasetsascompressed textfiles.Thefollowingcompletesequencedatasetscan bedownloadedbyclickingontheappropriatelinkslocatedintheleftsidebar:the Mnemiopsisleidyi genome assembly(5,100scaffolds),thefullsetof Mnemiopsis GeneModels(16,548genes),the Mnemiopsis Protein Models(16,548proteins),the Mnemiopsis Unfiltered ProteinModels(60,006proteins),allpubliclyavailable Mnemiopsis ESTsequencesfromNCBI(15,752ESTs), andthefull Mnemiopsis mitochondrialgenomeand proteinsequences[27](11proteins;Table1).Optionally,usersmayenteraknown Mnemiopsis sequenceidentifier( e.g., ML0001orML00011a)inthesearchboxor selectfromalistofidentifierstoretrieveasinglescaffold, 
gene model, protein model, or EST of interest.

Demonstrating the MGP Portal's utility: a worked example

The MGP Portal was developed to facilitate research that would benefit from the availability of genomic information from this emerging model organism and, to this end, it includes a number of intuitive data analysis tools. To illustrate this point, consider the case of a developmental biologist studying the human TALE class homeobox gene family (e.g., PBX3; [GenBank:NP_001128250.1]) who may be interested in comparing these sequences against (or predicting novel) Mnemiopsis homeodomain orthologs. A straightforward approach to addressing this question would be to run a BLASTP search of the PBX3 protein sequence against the Mnemiopsis Protein Models (2.2) database. The Mnemiopsis BLAST results display a number of high-scoring candidate proteins that can be further evaluated for properties characteristic of TALE class homeodomains (e.g., a TALE-type homeobox; Figure 1).

Alternatively, another biologist may be interested in searching for novel Mnemiopsis homeodomains, using sequence data from a closely related organism such as Amphimedon as the basis for their search. One approach would be to retrieve a complete set of known Amphimedon homeodomain proteins through an NCBI Entrez query (Search for: "homeodomain AND Amphimedon [ORGN]", which yields 31 known Amphimedon homeodomain proteins at the time of this writing). FASTA sequences can then be copied and pasted into the Mnemiopsis BLAST search window or uploaded as a file from a local computer. A unique feature of the MGP Portal BLAST implementation is access to the Unfiltered Protein Models database, which contains the complete unincorporated (unfiltered) protein dataset derived from the Mnemiopsis gene prediction and annotation process. A BLASTP search against the Unfiltered Protein Models database provides additional information about possible alternate transcripts and isoforms
that were screened and filtered out during the initial annotation process. These unfiltered protein models can be placed into genomic context by following the color-coded Mnemiopsis Genome Browser [B] links in the far right-hand column of the BLAST results. The browser provides direct access to additional annotation, including a graphical representation of the Pfam-A domain that overlaps the protein model, as well as links to the sequence of the Pfam-A domain (Figure 7).

Similarly, a researcher may also want to use their own custom scripts or external computational tools to further explore the available Mnemiopsis datasets. In such a case, the Download Sequence links in the MGP Portal can be used to download both the Protein Models and the Unfiltered Protein Models for analysis with tools from the HMMER suite [26] (e.g., hmmsearch Homeobox.hmm ML2.2aa > ML_novel_HDs). Predicted domains with E-values below an inclusion threshold (e.g., E-value < 0.05) could then be considered candidate homeodomains for further evaluation.

Table 1. Mnemiopsis leidyi complete sequence datasets available for download from the Mnemiopsis Genome Project Portal

Dataset                        Number of sequences
Genome assembly (scaffolds)    5,100
Gene models                    16,548
Protein models                 16,548
Unfiltered protein models      60,006
ESTs                           15,752
Mitochondrial genome           1
Mitochondrial proteins         11

Figure 6. The Mnemiopsis Fetch Tool is used to retrieve a single or partial scaffold, its reverse complement or the six-frame protein translation. Displayed above is the output of a queried partial genomic scaffold for ML0001 showing the specified genomic region of interest and its six-frame protein translations (the first five translations are depicted).

Conclusions

The Mnemiopsis Genome Project Portal is intended as a resource for investigators from the scientific community to obtain genomic information on Mnemiopsis through an intuitive and easy-to-use interface; it also serves as a model for researchers undertaking the development of
such a customized genome portal themselves. There are a number of comprehensive data portals available for well-established model organisms (e.g., FlyBase). However, as we searched for model Web sites from which to draw inspiration for the MGP Portal, we found that many repositories for next-generation sequencing data are simply Web sites with lists of links to raw sequence data accompanied by minimal annotation, or are non-intuitive and difficult to navigate. Based on this experience, we felt that the selection and utilization of essential resources to systematically manage and disseminate the considerable amounts of data generated by these sequencing projects was imperative. Thus, the presentation and conveyance of such a genome Web portal should be intuitive, user-friendly, and concise. It is within this framework that we present the MGP Portal as such a resource for the recently completed genome sequence of Mnemiopsis leidyi, and we are hopeful that this resource will inspire other groups as they create Web portals of their own.

It was our intent during the development of the MGP Portal to develop a resource to maximize usability while presenting a comprehensive series of datasets not available elsewhere. Recognizing the difficulties and lessons learned from the development of such a resource, and in our continued effort to further communicate our shared experiences to the scientific community at large, we encourage other investigators to consider the proposed genome portal model and, as such, have included a series of scripts (Additional files 1, 2, 3 and 4) to facilitate the conversion of output files produced by various programs. Specifically, this series of scripts can be used to format annotation data for visualization within a customized genome browser and a wiki.

Figure 7. Selecting the Genome Browser link (purple 'B' box from Figure 1) from a BLASTP result entry of queried known homeodomains against the complete Mnemiopsis Unfiltered Protein Models directs the user to the Genome Browser displaying the applicable Mnemiopsis transcript model.
Shown above the 2.2UF track in the browser is the PFAM2.2 track displaying evidence of a homeobox in the targeted region. Clicking on the 'Homeobox' link in the PFAM2.2 track opens a new browser window displaying the Pfam-A domain (ML1991_pfa) prediction results derived from a pre-compiled hmmscan run using HMMER. This Pfam-A domain record provides the genomic location of the Pfam-A domain, the genomic coordinates, transcript and protein sequences, and the hmmscan output for the homeodomain prediction.
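To give a sense of the kind of conversion the accompanying scripts perform when preparing annotation data for a genome browser, the following is a minimal Python sketch of turning scaffold FASTA records into GFF3 lines, with runs of N reported as gaps. It is not a port of the Perl scripts supplied as Additional files: the function names, the `source` value, and the choice of the `region` and `gap` feature types are assumptions made for illustration, and the sketch returns all lines in one list rather than writing one GFF3 file per sequence.

```python
import re


def fasta_records(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first whitespace-delimited token as the identifier.
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def scaffold_to_gff3(lines, source="example"):
    """Return GFF3 lines: one whole-scaffold feature plus one per N run.

    GFF3 coordinates are 1-based and inclusive; the nine columns are
    seqid, source, type, start, end, score, strand, phase, attributes.
    """
    out = ["##gff-version 3"]
    for name, seq in fasta_records(lines):
        out.append("\t".join(
            [name, source, "region", "1", str(len(seq)), ".", ".", ".",
             f"ID={name}"]))
        for i, match in enumerate(re.finditer(r"[Nn]+", seq), start=1):
            out.append("\t".join(
                [name, source, "gap", str(match.start() + 1),
                 str(match.end()), ".", ".", ".", f"ID={name}_gap{i}"]))
    return out


# Hypothetical two-scaffold input for illustration only.
demo_fasta = [">scaf1 demo", "ACGTNNNN", "ACGT", ">scaf2", "acgtnnAC"]
print("\n".join(scaffold_to_gff3(demo_fasta)))
```

Repeat-masked regions could be handled in the same way by matching lowercase runs instead of N runs, assuming the input is soft-masked.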
As described above, the MGP Portal contains sequence-based information and several customized utilities not available elsewhere, increasing the utility of the data generated by our group in the course of our Mnemiopsis whole-genome sequencing project [4]. The genome browser tool provides an intuitive interface for users to visualize the various types of data available, including data resulting from our comprehensive annotation of the Mnemiopsis genome. Most importantly, many features of this site make it easy for users who do not have a background in bioinformatics to straightforwardly access information presented from a comparative genomics point-of-view, without having to perform many of the analyses themselves. For instance, our phylogenetically relevant gene clusters are mapped to human KEGG pathways, providing a clear phylogenetic perspective for any particular Mnemiopsis gene (or pathway of genes) of interest. In addition, users may contribute to our gene annotation efforts by adding isoforms, in situ images, or other notes to any Gene Wiki page using a secure login. We trust that the availability of these data will allow investigators from numerous fields (such as developmental, evolutionary, and marine biology) to advance their own research projects aimed at understanding phylogenetic diversity and the evolution of proteins that play a fundamental role in metazoan development.

Availability and requirements

The Mnemiopsis Genome Project Portal is freely available at http://research.nhgri.nih.gov/mnemiopsis, with no barriers to access. Registration is only required if users wish to contribute data to the Isoforms, In situ Images, References, or Notes sections of any of the Gene Wiki pages.

Additional files

Additional file 1: The scaffoldToGFF3.pl script reformats scaffold sequences from FASTA to GFF3. The script creates a GFF3 file for each sequence in the input scaffold FASTA file and is able to handle both gapped [e.g., the scaffold (SCF) track] and repeat-masked [e.g., the repeat-masked (MASK) track] regions in a scaffold.
Additional file 2: The evmToGFF3.pl script parses a GFF3-formatted output file created by EvidenceModeler (EVM) by collecting data about the start and end positions of predicted genes, using this information to create well-formed GFF3 files.
Additional file 3: The cufflinksToGFF3.pl script parses predicted transcript assemblies from the GTF-formatted file created by Cufflinks.
Additional file 4: The create_wiki_page.pl script creates MediaWiki pages for displaying genomic data. Here, we provide a sample wiki page that takes as input FASTA-formatted nucleotide and protein sequences, GFF3 files containing exonic coordinates, and an hmmscan output file containing information on Pfam-A domains. The output of this Perl script is called wikipage.out.
Additional file 5: Figure S1. The Mnemiopsis BLAST tool (implemented using ViroBLAST) schematic illustrates the available user-defined input and output formats, BLAST programs, and database options. BLAST databases are provided for both Mnemiopsis nucleotide (e.g., Mitochondrial genome) and protein [e.g., Protein Models (2.2)] data.
Additional file 6: Figure S2. The KEGG Pathways search function permits users to search KEGG pathways containing human genes, using a Mnemiopsis homolog as the query. The relationships underlying the search function are depicted as a series of associated flat files. A one-to-one relationship exists between the KEGG and PATHWAY tables and the PEP2SOURCE_ID and CLUSTERS tables. All other relationships are one-to-many or many-to-many. The CLUSTERING_ANALYSIS table is the final output representation of a KEGG Pathways query consisting of the combination of KEGG_ENTREZ_GENE, SPECIES, and CLUSTERS.
Additional file 7: Figure S3. The PFAM Domains search function parses a series of flat files illustrated here as a relational framework. The PFAM Domains schema is represented as six attributes, with connectors indicating the nature of each applicable relationship. DOMAIN_ACCESSION and DOMAIN_NAME have a one-to-one relationship. All other relationships between PFAM attributes are many-to-many.

Competing interests

The authors declare that they have no competing interests.
Authors' contributions

JFR and ADB conceived the study. RTM, ADN, and TGW designed and developed the database with critical input from JFR, CES, and ADB. ADN wrote the code and implemented the interfaces and tools associated with the MGP Portal. RTM, JFR, CES, BJK, KS, and TGW performed the genome annotation and data analysis. RTM, JFR, CES, TGW, and ADB tested the Web application and tools and provided feedback. RTM and ADB wrote the manuscript, with input and suggestions from CES, JFR, and TGW. ADB directed the project. All authors read and approved the final manuscript.

Acknowledgments

This research was supported by the Intramural Research Program of the National Human Genome Research Institute, National Institutes of Health. We would like to thank Steven Bond, Mark Fredriksen, Derek Gildea, and Evan Maxwell for their thoughtful, constructive comments during the development of the Portal. We also thank Steven Bond, Derek Gildea, and Evan Maxwell for their critical reading of the manuscript.

Author details

1 Genome Technology Branch, Division of Intramural Research, National Human Genome Research Institute, National Institutes of Health, 50 South Drive, Bethesda, MD 20892, USA. 2 Whitney Laboratory for Marine Bioscience, University of Florida, St. Augustine, FL 32080, USA.

Received: 12 November 2013 Accepted: 31 March 2014 Published: 28 April 2014

References

1. Henry JQ, Martindale MQ: Regulation and regeneration in the ctenophore Mnemiopsis leidyi. Dev Biol 2000, 227(2):720–733.
2. Martindale MQ, Finnerty JR, Henry JQ: The Radiata and the evolutionary origins of the bilaterian body plan. Mol Phylogenet Evol 2002, 24(3):358–365.
3. Freeman G, Reynolds GT: The development of bioluminescence in the ctenophore Mnemiopsis leidyi. Dev Biol 1973, 31(1):61–100.
4. Ryan JF, Pang K, Schnitzler CE, Nguyen AD, Moreland RT, Simmons DK, Koch BJ, Francis WR, Havlak P, NISC Comparative Sequencing Program, Smith SA, Putnam NH, Haddock SHD, Dunn CW, Wolfsberg TG, Mullikin JC, Martindale MQ, Baxevanis AD: The genome of the ctenophore Mnemiopsis leidyi and its implications for cell type evolution. Science 2013, 342(6164):1242592.
5. Ryan JF, Pang K, Mullikin JC, Martindale MQ, Baxevanis AD: The homeodomain complement of the ctenophore Mnemiopsis leidyi suggests that ctenophora and porifera diverged prior to the ParaHoxozoa. Evodevo 2010, 1(1):9.
6. Reitzel AM, Pang K, Ryan JF, Mullikin JC, Martindale MQ, Baxevanis AD, Tarrant AM: Nuclear receptors from the ctenophore Mnemiopsis leidyi lack a zinc-finger DNA-binding domain: lineage-specific loss or ancestral
condition in the emergence of the nuclear receptor superfamily? Evodevo 2011, 2(1):3.
7. Liebeskind BJ: Evolution of sodium channels and the new view of early nervous system evolution. Commun Integr Biol 2011, 4(6):679–683.
8. Pang K, Ryan JF, Mullikin JC, Baxevanis AD, Martindale MQ: Genomic insights into Wnt signaling in an early diverging metazoan, the ctenophore Mnemiopsis leidyi. Evodevo 2010, 1(1):10.
9. Pang K, Ryan JF, Baxevanis AD, Martindale MQ: Evolution of the TGF-beta signaling pathway and its potential role in the ctenophore, Mnemiopsis leidyi. PLoS One 2011, 6(9):e24152.
10. Koch BJ, Ryan JF, Baxevanis AD: The diversification of the LIM superclass at the base of the metazoa increased subcellular complexity and promoted multicellular specialization. PLoS One 2012, 7(3):e33261.
11. Maxwell EK, Ryan JF, Schnitzler CE, Browne WE, Baxevanis AD: MicroRNAs and essential components of the microRNA processing machinery are not encoded in the genome of the ctenophore Mnemiopsis leidyi. BMC Genomics 2012, 13(1):714.
12. Schnitzler CE, Pang K, Powers ML, Reitzel AM, Ryan JF, Simmons D, Tada T, Park M, Gupta J, Brooks SY, Blakesley RW, Yokoyama S, Haddock SH, Martindale MQ, Baxevanis AD: Genomic organization, evolution, and expression of photoprotein and opsin genes in Mnemiopsis leidyi: a new view of ctenophore photocytes. BMC Biol 2012, 10(1):107.
13. Srivastava M, Begovic E, Chapman J, Putnam NH, Hellsten U, Kawashima T, Kuo A, Mitros T, Salamov A, Carpenter ML, Signorovitch AY, Moreno MA, Kamm K, Grimwood J, Schmutz J, Shapiro H, Grigoriev IV, Buss LW, Schierwater B, Dellaporta SL, Rokhsar DS: The Trichoplax genome and the nature of placozoans. Nature 2008, 454(7207):955–960.
14. Genome Index - Origins of Multicellularity; Broad Institute. http://www.broadinstitute.org/annotation/genome/multicellularity_project/GenomesIndex.html.
15. Sullivan JC, Ryan JF, Watson JA, Webb J, Mullikin JC, Rokhsar D, Finnerty JR: StellaBase: the Nematostella vectensis genomics database. Nucleic Acids Res 2006, 34(Database issue):D495–D499.
16. Kreppel L, Fey P, Gaudet P, Just E, Kibbe WA, Chisholm RL, Kimmel AR: dictyBase: a new Dictyostelium discoideum genome database.
Nucleic Acids Res 2004, 32(Database issue):D332–D333.
17. Aiptasia Wiki. http://aiptasia.cs.vassar.edu/AiptasiaWiki/.
18. Skinner ME, Uzilov AV, Stein LD, Mungall CJ, Holmes IH: JBrowse: a next-generation genome browser. Genome Res 2009, 19(9):1630–1638.
19. Haas BJ, Salzberg SL, Zhu W, Pertea M, Allen JE, Orvis J, White O, Buell CR, Wortman JR: Automated eukaryotic gene structure annotation using EVidenceModeler and the program to assemble spliced alignments. Genome Biol 2008, 9(1):R7.
20. Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, van Baren MJ, Salzberg SL, Wold BJ, Pachter L: Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol 2010, 28(5):511–515.
21. Deng W, Nickle DC, Learn GH, Maust B, Mullins JI: ViroBLAST: a stand-alone BLAST web server for flexible queries of multiple databases and user's datasets. Bioinformatics 2007, 23(17):2334–2336.
22. PHP. http://www.php.net.
23. The Perl Programming Language. http://www.perl.org.
24. NCBI BLAST FTP site. ftp://ftp.ncbi.nlm.nih.gov/blast.
25. Python Programming Language. http://www.python.org.
26. Finn RD, Clements J, Eddy SR: HMMER web server: interactive sequence similarity searching. Nucleic Acids Res 2011, 39(Web Server issue):W29–W37.
27. Pett W, Ryan JF, Pang K, NISC Comparative Sequencing Program, Martindale MQ, Baxevanis AD, Lavrov DV: Extreme mitochondrial evolution in the ctenophore Mnemiopsis leidyi. Mitochondrial DNA 2011, 22(4):130–142.
28. Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, Adiconis X, Fan L, Raychowdhury R, Zeng Q, Chen Z, Mauceli E, Hacohen N, Gnirke A, Rhind N, di Palma F, Birren BW, Nusbaum C, Lindblad-Toh K, Friedman N, Regev A: Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol 2011, 29:644–652.
29. Kurtz S, Choudhuri JV, Ohlebusch E, Schleiermacher C, Stoye J, Giegerich R: REPuter: the manifold applications of repeat analysis on a genomic scale. Nucleic Acids Res 2001, 29(22):4633–4642.
30. Maglott D, Ostell J, Pruitt KD, Tatusova T: Entrez gene: gene-centered information at NCBI.
Nucleic Acids Res 2005, 33(Database issue):D54–D58.
31. Hosack DA, Dennis G Jr, Sherman BT, Lane HC, Lempicki RA: Identifying biological themes within lists of genes with EASE. Genome Biol 2003, 4(10):R70.
32. Finn RD, Mistry J, Tate J, Coggill P, Heger A, Pollington JE, Gavin OL, Gunasekaran P, Ceric G, Forslund K, Holm L, Sonnhammer EL, Eddy SR, Bateman A: The Pfam protein families database. Nucleic Acids Res 2010, 38(Database issue):D211–D222.
33. Falda M, Toppo S, Pescarolo A, Lavezzo E, Di Camillo B, Facchinetti A, Cilia E, Velasco R, Fontana P: Argot2: a large scale function prediction tool relying on semantic similarity of weighted gene ontology terms. BMC Bioinforma 2012, 13(Suppl 4):S14.
34. Conesa A, Gotz S, Garcia-Gomez JM, Terol J, Talon M, Robles M: Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics 2005, 21(18):3674–3676.

doi:10.1186/1471-2164-15-316
Cite this article as: Moreland et al.: A customized Web portal for the genome of the ctenophore Mnemiopsis leidyi. BMC Genomics 2014, 15:316.