The BiSciCol Triplifier: bringing biodiversity data to the Semantic Web


Material Information

Title:
The BiSciCol Triplifier: bringing biodiversity data to the Semantic Web
Physical Description:
Mixed Material
Language:
English
Creator:
Stucky, Brian J.
Deck, John
Conlin, Tom
Ziemba, Lukasz
Cellinese, Nico
Guralnick, Robert
Publisher:
BioMed Central (BMC Bioinformatics)
Publication Date:
2014

Notes

Abstract:
Background: Recent years have brought great progress in efforts to digitize the world’s biodiversity data, but integrating data from many different providers, and across research domains, remains challenging. Semantic Web technologies have been widely recognized by biodiversity scientists for their potential to help solve this problem, yet these technologies have so far seen little use for biodiversity data. Such slow uptake has been due, in part, to the relative complexity of Semantic Web technologies along with a lack of domain-specific software tools to help non-experts publish their data to the Semantic Web. Results: The BiSciCol Triplifier is new software that greatly simplifies the process of converting biodiversity data in standard, tabular formats, such as Darwin Core-Archives, into Semantic Web-ready Resource Description Framework (RDF) representations. The Triplifier uses a vocabulary based on the popular Darwin Core standard, includes both Web-based and command-line interfaces, and is fully open-source software. Conclusions: Unlike most other RDF conversion tools, the Triplifier does not require detailed familiarity with core Semantic Web technologies, and it is tailored to a widely popular biodiversity data format and vocabulary standard. As a result, the Triplifier can often fully automate the conversion of biodiversity data to RDF, thereby making the Semantic Web much more accessible to biodiversity scientists who might otherwise have relatively little knowledge of Semantic Web technologies. Easy availability of biodiversity data as RDF will allow researchers to combine data from disparate sources and analyze them with powerful linked data querying tools. 
However, before software like the Triplifier, and Semantic Web technologies in general, can reach their full potential for biodiversity science, the biodiversity informatics community must address several critical challenges, such as the widespread failure to use robust, globally unique identifiers for biodiversity data.
Keywords: Biocollections, Biodiversity informatics, Darwin core, Linked data, Ontology, RDF, Semantic web, SPARQL
General Note:
Stucky et al. BMC Bioinformatics 2014, 15:257 http://www.biomedcentral.com/1471-2105/15/257; Pages 1-9
General Note:
doi:10.1186/1471-2105-15-257 Cite this article as: Stucky et al.: The BiSciCol Triplifier: bringing biodiversity data to the Semantic Web. BMC Bioinformatics 2014 15:257.

Record Information

Source Institution:
University of Florida
Holding Location:
University of Florida
Rights Management:
All rights reserved by the source institution.
System ID:
AA00026956:00001

Full Text




SOFTWARE Open Access

The BiSciCol Triplifier: bringing biodiversity data to the Semantic Web

Brian J Stucky1*, John Deck2, Tom Conlin3, Lukasz Ziemba4, Nico Cellinese4 and Robert Guralnick1,3

Abstract

Background: Recent years have brought great progress in efforts to digitize the world’s biodiversity data, but integrating data from many different providers, and across research domains, remains challenging. Semantic Web technologies have been widely recognized by biodiversity scientists for their potential to help solve this problem, yet these technologies have so far seen little use for biodiversity data. Such slow uptake has been due, in part, to the relative complexity of Semantic Web technologies along with a lack of domain-specific software tools to help non-experts publish their data to the Semantic Web.

Results: The BiSciCol Triplifier is new software that greatly simplifies the process of converting biodiversity data in standard, tabular formats, such as Darwin Core-Archives, into Semantic Web-ready Resource Description Framework (RDF) representations. The Triplifier uses a vocabulary based on the popular Darwin Core standard, includes both Web-based and command-line interfaces, and is fully open-source software.

Conclusions: Unlike most other RDF conversion tools, the Triplifier does not require detailed familiarity with core Semantic Web technologies, and it is tailored to a widely popular biodiversity data format and vocabulary standard. As a result, the Triplifier can often fully automate the conversion of biodiversity data to RDF, thereby making the Semantic Web much more accessible to biodiversity scientists who might otherwise have relatively little knowledge of Semantic Web technologies. Easy availability of biodiversity data as RDF will allow researchers to combine data from disparate sources and analyze them with powerful linked data querying tools. However, before software like the Triplifier, and Semantic Web technologies in general, can reach their full potential for biodiversity science, the biodiversity informatics community must address several critical challenges, such as the widespread failure to use robust, globally unique identifiers for biodiversity data.
Keywords: Biocollections, Biodiversity informatics, Darwin core, Linked data, Ontology, RDF, Semantic web, SPARQL

Background

Biocollections represent irreplaceable legacy information about our biosphere that is essential for understanding how biodiversity is changing in an era of unprecedented human impacts [1-3]. Such analyses are only practical if data from biocollections around the world are digitized, integrated, and made widely available online. These tasks are a major focus of the field of biodiversity informatics, and although they present many challenges, they also promise to deliver significant benefits for biodiversity science and its allied disciplines (see, e.g., [4-9]). In recent years, the biodiversity informatics community has made tremendous strides toward achieving this goal by creating shared common vocabularies such as Darwin Core (DwC) [10] and publishing mechanisms such as the Integrated Publishing Toolkit (IPT) [11]. Thanks to these and other national and international initiatives, we now have hundreds of millions of biodiversity records from around the world published in common formats and aggregated into centralized portals for further use.

Along with this success, however, come new challenges for effectively using such a large mass of data. In particular, as the numbers of species, geographic regions, and institutions represented continue to grow, answering questions about the complex interrelationships among these data becomes increasingly difficult. This is due in no small part to the format of most existing data, which have been assembled and mobilized for the Web using relatively “flat,” tabular formats, which often rely on identifiers that are unique only within the context of a given institution or data provider, and which typically have little or no awareness of data from other sources. Aggregation efforts by themselves do not directly solve these problems, so in a very real sense, the biodiversity data landscape still consists of many “islands” of biodiversity data with only limited connectivity. Discovering even simple relationships among these data islands is often prohibitively difficult. For example, a single collecting expedition might ultimately result in a cascade of specimens and their derivatives (tissue samples, genetic data, and so on) scattered across multiple institutional collections. As each institution populates its own data island, the links between these objects are lost, and putting these pieces back together again is, at best, very challenging and at worst, practically impossible.

*Correspondence: stuckyb@colorado.edu
1Department of Ecology and Evolutionary Biology, University of Colorado, Boulder, Colorado, USA
Full list of author information is available at the end of the article
© 2014 Stucky et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

The Semantic Web (SW) and its core technologies provide a natural solution to these problems by enabling a web of linked data and knowledge where all data objects are uniquely identified and the relationships among them are explicitly defined [12,13]. Consequently, there is growing recognition of the advantages of linked data technologies not only in biodiversity research and its related disciplines (e.g., [7,9,14,15]) but also throughout the life sciences [16-20].

Despite this considerable interest, most extant biodiversity data remain well outside of the Semantic Web. Why the apparent lack of progress? We identify three major obstacles preventing the more widespread adoption of linked data technologies in biodiversity science. First, the technologies behind the SW are generally much more complex than those of traditional data publishing. Producing high-quality linkable data typically requires, at a minimum, the use of HTTP (HyperText Transfer Protocol) URI (Uniform Resource Identifier)-based identifiers for all data objects, Resource Description Framework (RDF) [21] for representing the data and its interrelationships, and vocabularies and ontologies specified in RDF Schema (RDFS) [22] or the Web Ontology Language (OWL) [23] to describe the kinds of data and relationships that may be used. The traditional tools that these technologies replace, such as relational databases or even spreadsheets, are comparatively much simpler. Second, along with this technological complexity, most available tools for moving data to the SW are either immature, too generic, or require sophisticated knowledge of SW technologies. Such tools are intimidating and unhelpful to users who simply want to publish their data, not become SW experts. Finally, standards for guiding the creation of linked data, such as identifier schemes and services and descriptive ontologies, are either nonexistent, in flux as they undergo active development, or plagued by uncertainty due to competing proposals. This is especially problematic for interdisciplinary fields such as biodiversity science, which spans many knowledge domains, such as taxonomy, genetics, ecology, and geography. However, major collaborative efforts are underway to address this last problem (e.g., [24,25]), and we anticipate that robust, stable standards will emerge in the next few years. In the meantime, much more work will be needed to eventually overcome obstacles one and two.

The Biological Sciences Collections (BiSciCol) Triplifier is new software that takes a step toward meeting these challenges by making it easy for biodiversity scientists to take data in traditional tabular representations and transform them into a format suitable for the Semantic Web.
The Triplifier accepts data in a variety of common input formats and converts them into a full RDF representation using a consistent RDF vocabulary with RDFS classes and explicit relationships among the class instances. Data in the widely-used Darwin Core-Archive (DwC-A) format [26] are especially easy to process and require the user to have only minimal knowledge of SW technologies. The Triplifier is Free and Open Source software that can be used either as a graphical Web-based application or as a local command-line tool. In this paper, we describe the design and implementation of the Triplifier, summarize its user interface and outputs, discuss the advantages of and potential applications for the Triplifier, and consider the ongoing challenges that currently limit the utility of the Triplifier and other SW technologies in biodiversity science.

Implementation

In developing the Triplifier, there were four major design goals that guided our work. First, the Triplifier needed to accept biodiversity data in standard tabular formats, including DwC-A, and convert them into a usable, and useful, RDF representation. Second, the software needed to be easy to use, yet flexible enough to handle a variety of input data formats and structures. Third, the Triplifier’s RDF vocabulary should be based primarily on Darwin Core, which has become the standard for representing species occurrence data. Finally, the Triplifier needed to be easily extensible to support new input data sources and formats.

Software design and architecture

To meet these objectives, we chose to build the Triplifier primarily as a dynamic Web application. The software was architected using a typical “Web 2.0” approach, with Ajax (Asynchronous JavaScript and XML)-style lightweight communication between the client and a server backend used for primary data processing. The server-side component of the Triplifier was implemented in Java and communicates with the client by delivering and accepting JSON (JavaScript Object Notation)-formatted data via an HTTP web services API (application programming interface). A simplified diagram of the basic software architecture is presented in Figure 1.

In order to support multiple input formats and to allow the Triplifier to easily support new input formats, the initial stages of data processing in the server software were implemented using a simple plugin architecture. All code that is specific to a particular data format is housed within a single, self-contained Java class that implements a Java interface for reading generic tabular-formatted data. These reader classes are automatically discovered and dynamically loaded by the main server software at run time.

After initial processing, all new incoming data is converted to a standardized representation in a SQLite (http://www.sqlite.org/) database. Most data sources are converted to this database representation with few, or no changes. For Darwin Core Archives, however, we implemented more sophisticated initial processing that takes advantage of this format’s well-defined structure and close relationship with DwC. The column names in a DwC-A are first analyzed to identify which classes are present (see the discussion of the Triplifier data model below) and the data are then “normalized” by moving columns for the different classes into separate tables and eliminating duplicates.

Final output of RDF triples was implemented using D2RQ [27] and Apache Jena [28]. Guided by input from the client interface, the server dynamically builds a D2RQ database-to-RDF mapping that allows the data in the SQLite database to be converted directly to RDF in either N-Triples [29] or Turtle [30] format, or to the DOT format (http://www.graphviz.org/content/dot-language) as a directed graph representation.

The client-side component of the Triplifier was designed to support various degrees of automation depending on the input data format. Conversion of a particular format to RDF may be almost completely automated by writing a JavaScript component that defines how the source data should be mapped to class instances, properties, and relationships in RDF. These JavaScript components must adhere to a simple “interface” and are similar in concept to the plugin system for reading source data on the server.
For all input sources, we designed the Web interface to allow the user fine-grained control over the details of how the data are mapped to RDF.

Figure 1 Simplified architectural block diagram of the Triplifier. Solid lines indicate connections between major software components, dotted lines indicate movement of data into or out of files and databases, arrowheads indicate the overall direction of data movement through the system. White, rounded boxes represent key Triplifier software components; the orange, rectangular box represents key third-party software components. The remaining symbols represent data files and databases.
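
As a rough illustration of the “normalization” step described above for Darwin Core Archives, the following Python sketch splits a flat table into one table per class, removes duplicate rows, and assigns local integer identifiers. It is not the Triplifier's Java implementation: the function name normalize, the column-to-class mapping, and the sample rows are all hypothetical, and the real software works against a SQLite database rather than in-memory lists.

```python
# Sketch only: splits a flat DwC-style table into per-class tables.
# The column-to-class mapping below is an illustrative assumption.
COLUMN_CLASS = {
    "catalogNumber": "Occurrence",
    "recordedBy": "Occurrence",
    "scientificName": "Taxon",
    "genus": "Taxon",
    "country": "Location",
}


def normalize(rows, column_class=COLUMN_CLASS):
    """Return (tables, links).

    tables maps class name -> list of distinct rows, each with a local
    integer "id"; links holds, for every input row, the local id that
    row was assigned in each class table.
    """
    tables = {}  # class name -> list of row dicts
    seen = {}    # class name -> {tuple of values: local id}
    links = []
    for row in rows:
        # Group this row's columns by the class they describe.
        by_class = {}
        for column, value in row.items():
            cls = column_class.get(column)
            if cls is not None:
                by_class.setdefault(cls, {})[column] = value
        link = {}
        for cls, values in by_class.items():
            key = tuple(sorted(values.items()))
            ids = seen.setdefault(cls, {})
            if key not in ids:
                # First time this combination of values appears:
                # assign the next local integer identifier.
                ids[key] = len(ids) + 1
                tables.setdefault(cls, []).append({"id": ids[key], **values})
            link[cls] = ids[key]
        links.append(link)
    return tables, links


# Hypothetical input: three occurrences, two taxa, one location.
flat_rows = [
    {"catalogNumber": "A1", "recordedBy": "Smith",
     "scientificName": "Apis mellifera", "genus": "Apis", "country": "USA"},
    {"catalogNumber": "A2", "recordedBy": "Smith",
     "scientificName": "Apis mellifera", "genus": "Apis", "country": "USA"},
    {"catalogNumber": "A3", "recordedBy": "Jones",
     "scientificName": "Bombus impatiens", "genus": "Bombus", "country": "USA"},
]
tables, links = normalize(flat_rows)
```

In this sketch the links list plays the role of the foreign keys that would join the per-class tables back together; duplicate taxon and location values collapse to a single row each.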


Vocabulary and data model

As mentioned earlier, a major impediment to the more widespread adoption of SW standards by biodiversity scientists is the lack of robust, standardized RDF vocabularies and ontologies. This was a problem for developing the Triplifier because such standards are needed to produce meaningful and reusable linked data. To solve this problem, we chose to base the Triplifier’s working RDF vocabulary, RDFS class definitions, and ontology primarily on the Darwin Core standard. DwC is by far the most widely used vocabulary for tabular-formatted data, so it made sense to use it as much as possible for the Triplifier.

Even though Darwin Core is available as an RDF document (http://rs.tdwg.org/dwc/rdf/dwcterms.rdf), in its current form it is of limited utility for producing linked data. The most significant shortcomings are confusion about the precise meanings of the DwC “classes” and their associated “ID” terms (e.g., dwc:Occurrence and dwc:OccurrenceID [full expansions for all URI prefixes are given in Table 1]), no statements defining the domains of descriptive properties, and a lack of properties to define the relationships among class instances [10,31]. We briefly describe how we addressed each of these issues.

The principal RDFS classes in the Triplifier’s vocabulary are the DwC “categories,” which are already defined in the DwC RDF as RDFS classes. We interpreted the class-specific “ID” terms as denoting the identifiers that could be used for instances of the DwC classes, which we believe is consistent with current usage of these terms in actual biodiversity datasets. The remaining terms, which are defined in the DwC RDF as type rdf:Property, were included as properties used to connect literal objects (e.g., text strings or numeric values) to class instances. We also included seven RDFS classes defined by the Dublin Core metadata standard [32], one of which is already a part of DwC (dcterms:Location), and the remainder of which are useful for describing biocollections data (dcterms:Agent, dcterms:Image, dcterms:MovingImage, dcterms:PhysicalObject, dcterms:Sound, dcterms:Text).

Although the DwC RDF does not state the domain of any properties, it does include dwcattributes:organizedInClass, which appears to serve a similar function. We interpreted dwcattributes:organizedInClass as defining to which class each property should apply and therefore effectively defining each property’s domain. This worked for nearly all properties except for the so-called “record-level terms,” which do not indicate that they are organized within a single class. Seven of these terms (dcterms:type, dwc:institutionID, dwc:collectionID, dwc:institutionCode, dwc:collectionCode, dwc:ownerInstitutionCode, dwc:basisOfRecord) appeared to most often describe an Occurrence in actual practice, so we considered them as applying to this class. We left the remaining record-level terms in the vocabulary without a single-class domain. Because most of these terms are rarely used, they are not included by default in the Web-based user interface.

As it currently stands, DwC includes virtually no guidance about how class instances should be related to one another in a linked data context, but there have been previous independent efforts to fill this gap, such as the TaxonConcept Ontology [33], the darwin-sw project [34], and the work of the TDWG-RDF interest group (http://tdwg-rdf.googlecode.com/). However, because efforts to develop and standardize full-featured ontologies for biodiversity science and its related disciplines are well under way [25], we chose to develop a limited “ontology” for the Triplifier that defines only four high-level relationships between class instances. This allowed us to move forward with software development while we wait for richer and more descriptive ontologies to become available.

The four properties supported by the Triplifier originated with the broader BiSciCol project and were chosen because they allowed us to capture the essential connections in DwC data. We used the OWL constructs owl:SymmetricProperty and owl:TransitiveProperty to formally define how these properties should be applied when reasoning across multiple RDF statements (see [35] for a detailed discussion of these OWL constructs). The four properties are ro:derives_from (non-symmetric, transitive), bsc:depends_on (non-symmetric, non-transitive), bsc:alias_of (symmetric, transitive), and bsc:related_to (symmetric, non-transitive) (Table 2). The meaning of each property is apparent from its name: ro:derives_from is borrowed from the OBO Relation Ontology [36] and indicates that the subject of an RDF statement was physically derived from the object, bsc:depends_on indicates that the subject could not exist without the object, bsc:alias_of indicates the subject and object are the same thing, and bsc:related_to indicates that the subject and object share a non-dependent relationship.

During the Triplifier’s development, DwC included six core categories or classes (dwc:Occurrence, dwc:Event, dcterms:Location, dwc:GeologicalContext, dwc:Identification, dwc:Taxon), and we focused on explicitly defining the possible relationships among instances of these classes. Recently, a seventh category was added, dwc:MaterialSample [37], but it is not yet included in the Triplifier’s ontology. For each of these six classes, we considered their meanings as defined in the DwC standard as well as how they are most commonly used in practice in order to decide how instances of these classes could best be connected using the four simple properties discussed above. The results of this analysis became the final ontology that we used for developing the Triplifier, which is illustrated in Figure 2. This ontology was intended to cover most, but not all, DwC-based biodiversity data. For example, it does not deal with cases where ro:derives_from might be used, such as a tissue sample that is taken from a whole specimen, and it does not include the use of alias_of. Our experience has been that derives_from is rarely needed when working with current DwC datasets, and alias_of is relevant only for special cases in which a single object has been accidentally assigned multiple identifiers. Thus, the Triplifier does not automatically apply either of these properties by default. However, we designed the Triplifier’s Web interface to allow users to easily express all four relationship properties for their own data, as needed.

Table 1 URI short-form prefixes used in this paper
Prefix | URI
bsc | http://biscicol.org/terms/biscicol.owl#
dcterms | http://purl.org/dc/terms/
dwc | http://rs.tdwg.org/dwc/terms/
dwcattributes | http://rs.tdwg.org/dwc/terms/attributes/
owl | http://www.w3.org/2002/07/owl#
rdf | http://www.w3.org/1999/02/22-rdf-syntax-ns#
ro | http://www.obofoundry.org/ro/ro.owl#

Instance identifiers

A final challenge in developing the Triplifier was the lack of a broadly accepted and widely used standard for generating and resolving identifiers for biodiversity data. Although several options are available, such as Archival Resource Keys (ARKs), Biocode Commons Identifiers (BCIDs), Digital Object Identifiers (DOIs), Life Sciences Identifiers (LSIDs), Uniform Resource Name (URN)-based mechanisms, and others [38-40], none have been widely adopted by biodiversity data providers [7,41]. Given this reality, we decided to neither enforce nor endorse any particular identifier standard in the Triplifier and to instead allow users to work directly with whatever identifiers they prefer.

Many data, however, have only locally unique identifiers, which are not useful for linking on the SW, and some identifiers are not even unique within a single dataset (e.g., integer keys that are reused across database tables). Moreover, flat, single-table input formats such as DwC-A usually have no identifiers at all for most of the class instances that are implicitly present in the data. To handle these cases, we implemented a simple identifier construction algorithm. If the user does not indicate that input data uses globally unique identifiers, the Triplifier generates identifiers for each class instance by concatenating three pieces of information: SQLite_database_table_name + “ ” + identifier_column_name + “ ” + local_identifier. In the case of DwC-As, if identifiers for a particular class are missing entirely, local integer-based identifiers are generated during the data normalization step. It should be noted that this scheme is only guaranteed to produce identifiers that are unique within a given version of a dataset, which is sufficient for using the final RDF by itself but not for linking it with other datasets. Although we considered having the Triplifier mint new globally unique identifiers for user data, we ultimately decided that this was beyond the software’s intended scope.

Active development

We are currently developing a command-line version of the Triplifier to complement the Web-based Triplifier. The command-line Triplifier is intended for efficient, high-throughput processing of large numbers of data files or very large data files. Its user interface features two basic modes of operation. First, users can supply a custom D2RQ mapping file to guide the conversion of the source data to RDF. Mapping files can initially be generated by the Web-based Triplifier, further modified as needed, then used with the command-line Triplifier for batch data processing. The second mode of operation uses custom Java classes that fully automate the data conversion process for specific input data formats. These classes are conceptually analogous to the automation plugins described above for the Web-based Triplifier, and when the command-line tool is used in this mode, no D2RQ mapping file is required.
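
The identifier construction scheme described under “Instance identifiers” is simple enough to sketch in a few lines of Python. This is an illustration under stated assumptions rather than the Triplifier's own code: the function name, the example table and column names, and the example URI are hypothetical, and the separator is passed as a parameter because only the order of the three concatenated pieces is taken from the text above.

```python
def build_identifier(table_name, id_column, local_id,
                     globally_unique=False, separator="_"):
    """Construct a dataset-local identifier for a class instance.

    If the user has flagged the column as holding globally unique
    identifiers, the value is passed through unchanged. Otherwise the
    table name, column name, and local identifier are concatenated.
    The default separator is an assumption made for this sketch.
    """
    if globally_unique:
        return str(local_id)
    return separator.join([table_name, id_column, str(local_id)])


# Integer keys that are reused across tables no longer collide.
occ = build_identifier("occurrence", "id", 17)
tax = build_identifier("taxon", "id", 17)

# A value the user has declared globally unique is left untouched.
uri = build_identifier("occurrence", "id",
                       "http://example.org/specimen/17",
                       globally_unique=True)
```

As the text notes, identifiers built this way are only unique within one version of one dataset, so they are adequate for using the RDF on its own but not for linking across datasets.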

Automation classes for the command-line Triplifier can also take advantage of a Java class framework that was designed to make it easy to customize how source data are converted to RDF. For instance, automation classes could use terms from alternative vocabularies or ontologies, allowing for a broader range of semantic interpretations of source data files, or they could incorporate custom globally unique identifier schemes. Support for these relatively low-level modifications goes beyond what is easily achievable through the Web-based interface.

Table 2 Definitions and examples of the relationship properties used by the Triplifier’s ontology
Property | Definition | Example | Symmetric | Transitive
obj1 ro:derives_from obj2 | Physical material (obj1) that is substantially derived from other physical material (obj2) | A tissue sample (obj1) is derived from a specimen (obj2) | No | Yes
obj1 bsc:depends_on obj2 | An entity (obj1) whose existence depends on another entity (obj2) | An identification (obj1) depends on a specimen (obj2) | No | No
obj1 bsc:alias_of obj2 | Two instances (obj1, obj2) that are understood to be the same thing | Obj1 and obj2 both refer to the same specimen | Yes | Yes
obj1 bsc:related_to obj2 | An entity (obj1) with a non-dependent relationship with another entity (obj2) | A specimen (obj1) is related to a taxon (obj2) | Yes | No

Results

The BiSciCol Triplifier is available to users as both a Web application [42] and a command-line program. The Triplifier is Free and Open Source Software and all source code is provided under the terms of the BSD 3-Clause License [43] at the Triplifier’s project site [44]. Pre-built executables of the command-line tool are available via the Triplifier’s Subversion repository. The project site also includes user and developer documentation, and additional information about the philosophy and design decisions that guided Triplifier development can be found on the BiSciCol blog [45].

The Triplifier currently accepts source data in a variety of common tabular data formats, including comma-separated values (CSV) text files, OpenDocument (OpenOffice.org and LibreOffice) and Microsoft Excel spreadsheets, and DwC-A. The Triplifier also supports direct network connections to popular relational database management systems such as PostgreSQL, MySQL, Oracle Database, and Microsoft SQL Server. Input data are not required to follow any particular structure and are not required to use DwC terms.

After loading an input data source, the user must provide the information about the data that the Triplifier needs to successfully convert them to RDF. With the Web interface, this requires four steps. First, if the source includes multiple data tables, the user indicates which keys join the tables together (in the sense of relational databases). Second, the user specifies which table columns should be used as identifiers for instances of the classes in the Triplifier’s vocabulary and whether those columns contain globally unique identifiers. Third, columns with literal data are matched to property names in the Triplifier’s vocabulary and the classes they describe. Fourth, the user indicates how class instances should be connected to one another with the four Triplifier relationship properties.
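
The symmetric and transitive flags attached to the four relationship properties (Table 2) determine which extra statements may be inferred from those a user asserts. The following Python sketch computes that closure for a small set of triples. It is a hand-rolled illustration with made-up instance names, not the OWL reasoning machinery that a triple store would actually apply, and it deliberately skips reflexive statements such as "x alias_of x".

```python
# (symmetric, transitive) flags as listed in Table 2
PROPERTIES = {
    "ro:derives_from": (False, True),
    "bsc:depends_on": (False, False),
    "bsc:alias_of": (True, True),
    "bsc:related_to": (True, False),
}


def infer(triples):
    """Return the asserted triples plus those implied by the flags."""
    result = set(triples)
    changed = True
    while changed:  # repeat until a pass produces nothing new
        changed = False
        new = set()
        for s, p, o in result:
            symmetric, transitive = PROPERTIES[p]
            if symmetric:
                # s p o implies o p s
                new.add((o, p, s))
            if transitive:
                # s p o and o p o2 imply s p o2 (reflexive results skipped)
                for s2, p2, o2 in result:
                    if p2 == p and s2 == o and o2 != s:
                        new.add((s, p, o2))
        if not new <= result:
            result |= new
            changed = True
    return result


# Hypothetical instances: a DNA extract taken from a tissue sample,
# which was itself taken from a specimen identified to a taxon.
asserted = {
    ("dna1", "ro:derives_from", "tissue1"),
    ("tissue1", "ro:derives_from", "specimen1"),
    ("ident1", "bsc:depends_on", "specimen1"),
    ("specimen1", "bsc:related_to", "taxon1"),
}
inferred = infer(asserted)
```

Here the transitive ro:derives_from links the DNA extract directly to the specimen, and the symmetric bsc:related_to yields the reverse statement from taxon to specimen, while bsc:depends_on produces no new statements.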

Upon completion of these four steps, the Triplifier can generate and return the RDF representation of the user’s data as a single N-Triples or Turtle-formatted text file. Alternatively, the triples can be converted to the DOT graph description format and downloaded as a DOT file, which allows the data to be visualized with software such as Graphviz [46]. The user may also download the dynamically generated mapping file that captures all of the information provided in the four configuration steps. These mapping files can then be used with the command-line version of the Triplifier to rapidly process larger volumes of data.

For source data that is in the format of a DwC-A, the process is even simpler. If the user chooses, the Triplifier can automatically analyze the source archive and complete all four of the configuration steps with no user intervention. At that point, the user can either customize the configuration as desired or simply request the RDF file. In most cases, running DwC-As through the Triplifier is as straightforward as uploading the archive, then downloading the RDF representation. Example RDF output from the Triplifier for typical DwC-A data, including a graphical representation of the generated RDF triples, is provided in Additional file 1.

We built the Triplifier with biodiversity data in mind, but the code follows a modular design and should be, for the most part, relatively easy to adapt to other knowledge domains. For example, all that is required to support a new input data format is writing a single Java class that implements a reader plugin for the Triplifier’s server-side data processing system. No existing code needs to be modified. Customizing or replacing the Triplifier’s RDF vocabulary is also straightforward because it is defined by a single RDFS file.

Figure 2 Diagram of the ontology used by the Triplifier for the six core DwC classes. For simplicity of presentation, the “dwc” and “bsc” prefixes are omitted.

Discussion

The BiSciCol Triplifier is not the first software for converting tabular data to RDF (see, e.g., [47-49]), but for biodiversity scientists, there are at least three key advantages that separate the Triplifier from most of these other tools. First, unlike other software which often works with only one or a few input formats, the Triplifier supports a broad range of data formats, ranging from plain text files to full-fledged databases, and it is the only tool of which we are aware that directly supports DwC-As. Second, the Triplifier is specifically tailored for biodiversity data and comes with a ready-to-use vocabulary and interface based upon DwC. Third, and perhaps most important, because of its domain-specificity, the Triplifier can hide most or all of the complexity of SW technologies from the end user. For most users, and for most kinds of data, using the Triplifier will require little more than a conceptual understanding of the principles behind linked data technologies. For users with data in DwC-As, conversion to RDF can be fully automated and requires no special knowledge at all. This is in contrast to more generic tabular data converters for which the user must specify complex mappings that require detailed knowledge of the target RDF vocabularies and ontologies.

Despite these advantages, the current landscape for linked data in biodiversity science will likely limit the Triplifier’s ability to bring biodiversity data to the SW, at least in the short term. There are two main reasons for this. First, the absence of robust, standardized, and widely-accepted vocabularies and ontologies for linkable biodiversity data means that the RDF generated by the Triplifier is neither as expressive nor as broadly useful as it could be. For example, we do not envision the Triplifier’s simplistic ontology for relationships among its class instances as a long-term solution. Rather, we consider it merely a means for moving forward until a more satisfactory ontology becomes available and accepted as a standard.

Second, the anarchy presently governing the use of identifiers in biodiversity data is a major impediment to using these data, and RDF generated for them by the Triplifier, on the SW. If the dream of making all biodiversity data universally accessible and linkable is to become reality, biodiversity data must use identifiers that are globally unique, resolvable, and above all, persistent [7]. The Triplifier takes steps to ensure that identifiers in the data it processes are unique within the dataset, which is sufficient for producing functional RDF, but data without permanent and globally unique identifiers cannot be usefully linked with data from other sources. Unfortunately, a permanent solution to this challenge does not seem near at hand, and finding such a solution should be an urgent community priority.

Nevertheless, there are several ways in which the Triplifier will still be useful to biodiversity scientists with data in traditional formats, even if those data are not immediately destined for the larger SW. Perhaps most important, RDF-formatted data from the Triplifier can be aggregated in either local or remote “triple stores” (databases specialized for storing RDF statements) and examined with SW querying tools such as SPARQL [50]. Analyzing DwC-based biodiversity data in this way, rather than as rows in a spreadsheet or relational database table, often makes it much easier to answer high-level questions about the data.
For example, suppose we would like to know which Occurrence instances in a DwC dataset are associated with taxonomic information. In a tabular context, answering this conceptually simple question is quite cumbersome and could require inspecting the values of more than 20 table columns. With the RDF output generated by the Triplifier, we need only ask which Occurrence instances are related to a Taxon instance, and this question can be concisely represented by a simple SPARQL query, such as:

    PREFIX bsc: <http://biscicol.org/terms/biscicol.owl#>
    PREFIX dwc: <http://rs.tdwg.org/dwc/terms/>
    SELECT ?occurrence
    WHERE {
      ?occurrence bsc:related_to ?taxon .
      ?occurrence a dwc:Occurrence .
      ?taxon a dwc:Taxon
    }

Thus, unlike traditional relational databases, RDF makes it possible to work directly at the level of class instances and their relationships. This can greatly simplify the process of translating questions about the data into the queries that will answer them, and the benefits of this approach have already been demonstrated in other life sciences domains (e.g., [51]).

We also envision the Triplifier as a hands-on tool for biodiversity researchers to learn about and experiment with SW technologies. In this role, it should be especially useful to those who are curious about moving their data to the SW but have so far been deterred by the complexity of the SW technology stack. With the Triplifier, such researchers can easily apply core SW technologies directly to their own data and study the results. Use of the familiar DwC terms in the Triplifier’s RDF vocabulary and support for visualization software such as Graphviz will further assist researchers in interpreting and understanding the RDF representations of their data.
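
To make the logic of the SPARQL query shown earlier in this section concrete, the following Python sketch evaluates the same three triple patterns by hand over a tiny set of triples. A real analysis would load the Triplifier's output into a triple store and run the SPARQL query itself; the instance names and the sample statements here are made up for illustration and are not Triplifier output.

```python
RDF_TYPE = "rdf:type"  # the SPARQL keyword "a" abbreviates rdf:type

# Hypothetical triples: two occurrences, one taxon, one location.
triples = {
    ("occ1", RDF_TYPE, "dwc:Occurrence"),
    ("occ2", RDF_TYPE, "dwc:Occurrence"),
    ("tax1", RDF_TYPE, "dwc:Taxon"),
    ("loc1", RDF_TYPE, "dcterms:Location"),
    ("occ1", "bsc:related_to", "tax1"),
    ("occ2", "bsc:related_to", "loc1"),
}


def occurrences_with_taxon(triples):
    """Occurrence instances that are bsc:related_to some Taxon instance."""
    # ?occurrence a dwc:Occurrence
    occurrences = {s for s, p, o in triples
                   if p == RDF_TYPE and o == "dwc:Occurrence"}
    # ?taxon a dwc:Taxon
    taxa = {s for s, p, o in triples
            if p == RDF_TYPE and o == "dwc:Taxon"}
    # ?occurrence bsc:related_to ?taxon
    return sorted(
        s for s, p, o in triples
        if p == "bsc:related_to" and s in occurrences and o in taxa
    )


matches = occurrences_with_taxon(triples)
```

Only occ1 satisfies all three patterns: occ2 is an Occurrence, but the instance it is related to is not a Taxon. The point of the comparison is that the question is phrased in terms of class instances and their relationships, with no reference to table columns.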


Conclusions

The BiSciCol Triplifier is new open-source software that automates the process of converting tabular-format biodiversity data to RDF suitable for use on the SW. The Triplifier offers the flexibility of both Web-based and command-line interfaces, supports a wide variety of common input formats, including the popular DwC-A format, and comes with a vocabulary and simple ontology based on the widely-used Darwin Core standard. Output formats include Turtle and N-Triples for RDF and DOT for graph visualizations. A modular design makes the Triplifier adaptable to other input data formats, RDF vocabularies, and research domains.

The BiSciCol Triplifier makes it easier than ever before for biodiversity scientists to apply modern SW technologies to their data. Expressing tabular-format biodiversity data as RDF is useful not only because it allows the data to link with other data sources on the SW, but also because it allows researchers to use powerful query tools such as SPARQL to answer complex questions about their data. The Triplifier does not require users to have extensive knowledge of RDF, RDFS vocabularies, or OWL ontologies, which should make it a valuable aid for biodiversity researchers who wish to learn about and experiment with these technologies.

The Semantic Web holds great potential for biodiversity science, but a variety of challenges continue to make actually achieving this potential elusive. Perhaps most critically, biodiversity data continue to suffer from the widespread use of identifiers that are neither persistent nor globally unique, and this severely limits the usefulness of these data on the global SW. Better methods for visualizing linked biodiversity data are also a pressing need, especially for biodiversity scientists who are just beginning to explore how their data might work in a linked context. Looking further ahead, as the Triplifier and other, complementary efforts eventually succeed in mobilizing biodiversity data for the SW, we will likely need new computational tools to make sense of such a massive, and massively interconnected, dataset.
First, though, we need to make it less difficult for biodiversity researchers to actually get their data to the SW in the first place, and the BiSciCol Triplifier is a significant step toward this goal. Our hope is that definitive new standards for vocabularies, ontologies, and identifiers, in concert with software tools like the Triplifier, will make SW technologies as easy to use in the future as databases and spreadsheets have been in the past.

Availability and requirements

Project name: BiSciCol Triplifier
Project home page: http://www.biscicol.org/triplifier/ (Web-based application); http://triplifier.googlecode.com/ (project site, documentation, and source code)
Operating system(s): The Web-based interface only requires a modern web browser and has been tested with recent versions of Firefox, Chrome, Opera, Safari, and Internet Explorer. The command-line Triplifier and server-side software require Java and should run on all modern operating systems for which the Java run-time environment and development tools are available (e.g., GNU/Linux, Windows, OS X).
Programming language: Java and JavaScript
Other requirements: The server-side software requires a Java web application server such as Apache Tomcat or GlassFish. All other required components are included with the source distribution.
License: BSD 3-Clause License (http://opensource.org/licenses/BSD-3-Clause)
Any restrictions to use by non-academics: Non-academics may freely use this software.

Additional file

Additional file 1: Supplementary material for “The BiSciCol Triplifier: bringing biodiversity data to the Semantic Web”. The supplementary material provides RDF output generated by the Triplifier for a small input DwC-A dataset, and includes a graphical representation of the RDF output.

Competing interests
The authors declare that they have no competing interests.

Authors’ contributions
RG, NC, and JD conceived the project. All authors contributed to software design. BJS, LZ, JD, and TC implemented the software. RG, JD, and BJS wrote the software documentation. BJS drafted the manuscript. All authors read and approved the final manuscript.
Acknowledgments
We would like to thank Steven Baskauf, Reed Beaman, Bryan P. Heidorn, Hilmar Lapp, Richard Pyle, Tim Robertson and Robert Whitton for the constructive feedback and many discussions that took place throughout the course of software development. We are very grateful to the National Science Foundation for its support (DBI-0956371, DBI-0956350, and DBI-0956426).

Author details
1 Department of Ecology and Evolutionary Biology, University of Colorado, Boulder, Colorado, USA.
2 Berkeley Natural History Museums, University of California, Berkeley, California, USA.
3 Museum of Natural History, University of Colorado, Boulder, Colorado, USA.
4 Florida Museum of Natural History, University of Florida, Gainesville, Florida, USA.

Received: 12 June 2014 Accepted: 22 July 2014 Published: 29 July 2014

References
1. Moritz C, Patton JL, Conroy CJ, Parra JL, White GC, Beissinger SR: Impact of a century of climate change on small-mammal communities in Yosemite National Park, USA. Science 2008, 322:261–264.
2. Scoble M: Rationale and value of natural history collections digitisation. Biodivers Inform 2010, 7:77–80.
3. Erb LP, Ray C, Guralnick R: On the generality of a climate-mediated shift in the distribution of the American pika (Ochotona princeps). Ecology 2011, 92:1730–1735.
4. Bisby FA: The quiet revolution: biodiversity informatics and the Internet. Science 2000, 289:2309–2312.


5. Godfray HCJ, Clark BR, Kitching IJ, Mayo SJ, Scoble MJ: The Web and the structure of taxonomy. Syst Biol 2007, 56:943–955.
6. Sarkar IN: Biodiversity informatics: organizing and linking information across the spectrum of life. Brief Bioinform 2007, 8:347–357.
7. Page RDM: Biodiversity informatics: the challenge of linking data and the role of shared identifiers. Brief Bioinform 2008, 9:345–354.
8. Guralnick R, Hill A: Biodiversity informatics: automated approaches for documenting global biodiversity patterns and processes. Bioinformatics 2009, 25:421–428.
9. Parr CS, Guralnick R, Cellinese N, Page RDM: Evolutionary informatics: unifying knowledge about the diversity of life. Trends Ecol Evol 2012, 27:94–103.
10. Wieczorek J, Bloom D, Guralnick R, Blum S, Döring M, Giovanni R, Robertson T, Vieglais D: Darwin core: an evolving community-developed biodiversity data standard. PLoS One 2012, 7:e29715.
11. Robertson T, Döring M, Guralnick R, Bloom D, Braak K, Otegui J, Russell L, Wieczorek J, Desmet P: The GBIF integrated publishing toolkit: facilitating the efficient publishing of biodiversity data on the internet. PLoS One 2014.
12. Berners-Lee T, Hendler J, Lassila O: The semantic web. Sci Am 2001, 284:28–37.
13. Heath T, Bizer C: Linked data: evolving the web into a global data space. Synth Lect Semantic Web Theory Technol 2011, 1:1–136.
14. Madin JS, Bowers S, Schildhauer MP, Jones MB: Advancing ecological research with ontologies. Trends Ecol Evol 2008, 23:159–168.
15. Deans AR, Yoder MJ, Balhoff JP: Time to change how we describe biodiversity. Trends Ecol Evol 2012, 27:78–84.
16. Stevens R: Ontology-based knowledge representation for bioinformatics. Brief Bioinform 2000, 1:398–414.
17. Blake JA, Bult CJ: Beyond the data deluge: data integration and bio-ontologies. J Biomed Inform 2006, 39:314–320.
18. Good BM, Wilkinson MD: The life sciences semantic web is full of creeps! Brief Bioinform 2006, 7:275–286.
19. Antezana E, Kuiper M, Mironov V: Biological knowledge management: the emerging role of the Semantic Web technologies. Brief Bioinform 2009, 10:392–407.
20. Chen H, Yu T, Chen JY: Semantic web meets integrative biology: a survey.
Brief Bioinform 2013, 14:109–125.
21. RDF primer. http://www.w3.org/TR/2004/REC-rdf-primer-20040210/.
22. RDF Vocabulary Description Language 1.0: RDF Schema. http://www.w3.org/TR/rdf-schema/.
23. OWL 2 Web Ontology Language primer (second edition). http://www.w3.org/TR/2012/REC-owl2-primer-20121211/.
24. Wooley JC, Field D, Glöckner F-O: Extending standards for genomics and metagenomics data: a research coordination network for the genomic standards consortium (RCN4GSC). Stand Genomic Sci 2009, 1:85–90.
25. Walls RL, Deck J, Guralnick R, Baskauf S, Beaman R, Blum S, Bowers S, Buttigieg PL, Davies N, Endresen D, Gandolfo MA, Hanner R, Janning A, Krishtalka L, Matsunaga A, Midford P, Morrison N, Ó Tuama É, Schildhauer M, Smith B, Stucky BJ, Thomer A, Wieczorek J, Whitacre J, Wooley J: Semantics in support of biodiversity knowledge discovery: an introduction to the biological collections ontology and related ontologies. PLoS One 2014, 9:e89606.
26. Robertson T, Döring M, Wieczorek J, De Giovanni R, Vieglais D: Darwin Core text guide. http://rs.tdwg.org/dwc/terms/guides/text/.
27. Bizer C, Seaborne A: D2RQ – treating non-RDF databases as virtual RDF graphs. http://iswc2004.semanticweb.org/posters/PID-SMCVRKBT1089637165.pdf.
28. McBride B: Jena: a Semantic Web toolkit. IEEE Internet Comput 2002, 6:55–59.
29. Beckett D: RDF 1.1 N-Triples: A line-based syntax for an RDF graph. http://www.w3.org/TR/n-triples/.
30. Beckett D, Berners-Lee T: Turtle - Terse RDF Triple Language. http://www.w3.org/TeamSubmission/turtle/.
31. Baskauf SJ, Webb CO: Rationale for a Semantic Web implementation of Darwin Core. http://code.google.com/p/darwin-sw/wiki/Rationale.
32. Weibel S: The Dublin core: a simple content description model for electronic resources. Bull Am Soc Inf Sci Technol 1997, 24:9–11.
33. TaxonConcept: species concepts for the Semantic Web. http://www.taxonconcept.org/ontologies/.
34. Webb C, Baskauf S: D-SW: Darwin Core data for the Semantic Web. http://www.tdwg.org/fileadmin/2011conference/slides/Webb_DarwinSW.pdf.
35. Allemang D, Hendler JA: Semantic Web for the Working Ontologist: Modeling in RDF, RDFS and OWL.
Amsterdam; Boston: Morgan Kaufmann Publishers/Elsevier; 2008.
36. Smith B, Ceusters W, Klagges B, Köhler J, Kumar A, Lomax J, Mungall C, Neuhaus F, Rector AL, Rosse C: Relations in biomedical ontologies. Genome Biol 2005, 6:R46.
37. Darwin Core Issue 167: MaterialSample. https://code.google.com/p/darwincore/issues/detail?id=167.
38. Martin S, Hohman MM, Liefeld T: The impact of Life Science Identifier on informatics data. Drug Discov Today 2005, 10:1566–1572.
39. Hilse H-W, Kothe J, Consortium of European Research Libraries, European Commission on Preservation and Access: Implementing Persistent Identifiers: Overview of Concepts, Guidelines and Recommendations. London; Amsterdam: Consortium of European Research Libraries; European Commission on Preservation and Access; 2006.
40. Biocode Commons Identifiers (BCIDs). http://bcid.googlecode.com/.
41. Chavan VS, Ingwersen P: Towards a data publishing framework for primary biodiversity data: challenges and potentials for the biodiversity informatics community. BMC Bioinformatics 2009, 10(Suppl 14):S2.
42. Triplifier Web Application. http://www.biscicol.org/triplifier/.
43. The BSD 3-Clause License. http://opensource.org/licenses/BSD-3-Clause.
44. BiSciCol Triplifier Project Site. http://triplifier.googlecode.com/.
45. The BiSciCol Blog. http://biscicol.blogspot.com/.
46. Ellson J, Gansner ER, Koutsofios E, North SC, Woodhull G: Graphviz and Dynagraph – static and dynamic graph drawing tools. In Graph Draw Softw. Edited by Jünger M, Mutzel P. Berlin, Heidelberg: Springer Berlin Heidelberg; 2004:127–148 [Farin G, Hege H-C, Hoffman D, Johnson CR, Polthier K (Series editors)].
47. Han L, Finin T, Parr C, Sachs J, Joshi A: RDF123: From Spreadsheets to RDF. In Semantic Web - ISWC 2008. Volume 5318. Edited by Sheth A, Staab S, Dean M, Paolucci M, Maynard D, Finin T, Thirunarayan K. Berlin, Heidelberg: Springer Berlin Heidelberg; 2008:451–466.
48. Langegger A, Wöß W: XLWrap – Querying and Integrating Arbitrary Spreadsheets with SPARQL. In Semantic Web - ISWC 2009. Volume 5823.
Edited by Bernstein A, Karger DR, Heath T, Feigenbaum L, Maynard D, Motta E, Thirunarayan K. Berlin, Heidelberg: Springer Berlin Heidelberg; 2009:359–374 [Hutchison D, Kanade T, Kittler J, Kleinberg JM, Mattern F, Mitchell JC, Naor M, Nierstrasz O, Pandu Rangan C, Steffen B, Sudan M, Terzopoulos D, Tygar D, Vardi MY, Weikum G (Series editors)].
49. Lebo T, Erickson JS, Ding L, Graves A, Williams GT, DiFranzo D, Li X, Michaelis J, Zheng JG, Flores J, Shangguan Z, McGuinness DL, Hendler J: Producing and Using Linked Open Government Data in the TWC LOGD Portal. In Link Gov Data. Edited by Wood D. New York, NY: Springer New York; 2011:51–72.
50. SPARQL Query Language for RDF. http://www.w3.org/TR/rdf-sparql-query/.
51. Asiaee AH, Doshi P, Minning T, Sahoo S, Parikh P, Sheth A, Tarleton RL: From questions to effective answers: on the utility of knowledge-driven querying systems for life sciences data. In Data Integr Life Sci. Volume 7970. Edited by Baker CJO, Butler G, Jurisica I. Berlin, Heidelberg: Springer Berlin Heidelberg; 2013:38–45 [Hutchison D, Kanade T, Kittler J, Kleinberg JM, Mattern F, Mitchell JC, Naor M, Nierstrasz O, Pandu Rangan C, Steffen B, Sudan M, Terzopoulos D, Tygar D, Vardi MY, Weikum G (Series editors)].

doi:10.1186/1471-2105-15-257
Cite this article as: Stucky et al.: The BiSciCol Triplifier: bringing biodiversity data to the Semantic Web. BMC Bioinformatics 2014 15:257.