

Material Information

Title:
High-resolution deep sequencing reveals biodiversity, population structure, and persistence of HIV-1 quasispecies within host ecosystems
Physical Description:
Mixed Material
Language:
English
Creator:
Yin, Li
Liu, Li
Sun, Yijun
Hou, Wei
Lowe, Amanda C.
Gardner, Brent P.
Salemi, Marco
Williams, Wilton B.
Publisher:
BioMed Central (Retrovirology)
Publication Date:

Notes

Abstract:
Background: Deep sequencing provides the basis for analysis of biodiversity of taxonomically similar organisms in an environment. While extensively applied to microbiome studies, population genetics studies of viruses are limited. To define the scope of HIV-1 population biodiversity within infected individuals, a suite of phylogenetic and population genetic algorithms was applied to HIV-1 envelope hypervariable domain 3 (Env V3) within peripheral blood mononuclear cells from a group of perinatally HIV-1 subtype B infected, therapy-naïve children. Results: Biodiversity of HIV-1 Env V3 quasispecies ranged from about 70 to 270 unique sequence clusters across individuals. Viral population structure was organized into a limited number of clusters that included the dominant variants combined with multiple clusters of low frequency variants. Next generation viral quasispecies evolved from low frequency variants at earlier time points through multiple non-synonymous changes in lineages within the evolutionary landscape. Minor V3 variants detected as long as four years after infection co-localized in phylogenetic reconstructions with early transmitting viruses or with subsequent plasma virus circulating two years later. Conclusions: Deep sequencing defines HIV-1 population complexity and structure, reveals the ebb and flow of dominant and rare viral variants in the host ecosystem, and identifies an evolutionary record of low-frequency cell-associated viral V3 variants that persist for years. The bioinformatics pipeline developed for HIV-1 can be applied for biodiversity studies of virome populations in human, animal, or plant ecosystems. Keywords: HIV-1 envelope V3, Biodiversity, Population structure, Quasispecies, Fitness, Pyrosequencing, Founder virus persistence, Most recent common ancestor
General Note:
Publication of this article was funded in part by the University of Florida Open-Access Publishing Fund. In addition, requestors receiving funding through the UFOAP project are expected to submit a post-review, final draft of the article to UF's institutional repository, IR@UF (www.uflib.ufl.edu/UFir), at the time of funding. The Institutional Repository at the University of Florida serves the University of Florida community with research, news, outreach, and educational materials.
General Note:
Yin et al. Retrovirology 2012, 9:108 http://www.retrovirology.com/content/9/1/108; Pages 1-9
General Note:
doi:10.1186/1742-4690-9-108 Cite this article as: Yin et al.: High-resolution deep sequencing reveals biodiversity, population structure, and persistence of HIV-1 quasispecies within host ecosystems. Retrovirology 2012 9:108.

Record Information

Source Institution:
University of Florida
Holding Location:
University of Florida
Rights Management:
All rights reserved by the source institution.
System ID:
AA00013962:00001

Full Text


RESEARCH Open Access

High-resolution deep sequencing reveals biodiversity, population structure, and persistence of HIV-1 quasispecies within host ecosystems

Li Yin1*, Li Liu2, Yijun Sun2, Wei Hou3, Amanda C Lowe1, Brent P Gardner1, Marco Salemi1, Wilton B Williams1, William G Farmerie2, John W Sleasman4 and Maureen M Goodenow1*
Background

Human immunodeficiency virus type 1 (HIV-1) displays extensive genetic diversity, reflecting the error-prone characteristics of reverse transcriptase-dependent replication, an elevated recombination rate, and continuous selection of more fit viral variants within fluctuating host ecosystems. HIV-1 populations within an infected individual are complex and comprise swarms of related genomes, or quasispecies [1,2]. Studies of HIV-1 diversity within quasispecies have benefited over the years from the development of novel sequencing technologies that extended the depth of sampling [1-11]. Next generation deep sequencing significantly increases the sensitivity to identify low-frequency genetic variants within HIV-1 quasispecies that might lead to reduced susceptibility to antiretroviral treatments [12,13] or escape from immunity [14]. Beyond surveillance for drug resistance, deep sequencing provides additional advantages to detect epistatic interactions [15], estimate population structure [16], identify evolutionary intermediates, and evaluate biodiversity of organisms within an ecosystem [17-26].
Biodiversity is used in population genetics to present a unified view of the extent of variation of life forms within habitats [27] and assumes that genomes within an environment are taxonomically similar, randomly distributed, and sufficiently large [28]. Assessments of biodiversity from deep sequencing data provide unprecedented views of the richness of immune loci in primates, zebrafish, and humans [17,18,26] or of the complexity of microbiomes independent of an ability to culture microorganisms [21,24,25,29]. Biodiversity defines complexity within populations that extends beyond evaluations of diversity based on pairwise genetic distance, the major approach for analysis of small datasets of HIV-1 sequences from infected individuals [30,31]. Biodiversity within HIV-1 populations might reflect host environments, infection by circulating recombinant forms of HIV-1, or co-infection by multiple subtypes, and might provide unique and sensitive biomarkers for changes in viral populations. Moreover, the structure of HIV-1 quasispecies, or the frequency distribution of viral variants within individuals, may reveal the potential for viral populations to evolve within a fitness landscape and contribute to viral persistence [4,32-34].

We designed a deep-sequencing study of HIV-1 Env V3 quasispecies within peripheral blood cells that applied population genetics tools in a novel bioinformatics pipeline to define viral biodiversity, examine viral population structure, and explore directly the extent to which deep sequencing enriches analysis of the HIV-1 evolutionary landscape.

*Correspondence: yin@pathology.ufl.edu; goodenow@ufl.edu
1Department of Pathology, Immunology and Laboratory Medicine, College of Medicine, University of Florida, 2033 Mowry Road, PO Box 103633, Gainesville, FL 32610-3633, USA
Full list of author information is available at the end of the article
© 2012 Yin et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Yin et al. Retrovirology 2012, 9:108 http://www.retrovirology.com/content/9/1/108

Results

Biodiversity of HIV-1 quasispecies

Biodiversity is evaluated by rarefaction analysis and defined as the number of operational taxonomic units (OTU) within a population [17,18,21,23-26]. HIV-1 Env V3 pyrosequences within each sample were clustered over a range of pairwise genetic distances from 0% to 10% to compare viral populations among individuals (Figure 1). When an upper clustering threshold of 10% was applied to approximate the mean pairwise genetic distance found among subtype B Env sequences [35], the virus population formed a single OTU in S3, but included 3 or 4 OTU in the S4 or S5 viral populations, 6 OTU in S1 and S6, or as many as 10 OTU in S2 (Table 1). Biodiversity of viral populations evaluated at 0% distance (i.e., the unique level) [31] ranged from relatively low biodiversity (69 or 82 OTU) in S3 or S5, to 156 or 157 OTU in S6 or S4, or as high as 253 or 267
OTU in S1 or S2 (Figure 1). Even though viral biodiversity at the unique level was similar between some individuals, clustering genomes at distances from 1% to 5% revealed differences in complexity within host environments. For example, the virus population in S4 displayed reduced complexity compared with the population in S6, and was more similar to the viral populations in S3 or S5 (Figure 1 and Table 1). Biodiversity calculated at 3% correlated significantly with biodiversity at the unique level among the individuals [r = 0.91, p = 0.01] and provided a rationale for clustering at 3% in subsequent analyses.

Rarefaction curves at 3% distance approached, but failed to achieve, a plateau, raising the possibility that the depth of sequencing was insufficient to capture all viral diversity. Yet estimated maximum biodiversity was only about two-fold greater than, and correlated with, calculated biodiversity (r = 0.89; p = 0.02) (Table 1), indicating that sequence depth (about 25-fold coverage) was sufficient to provide a robust assessment of V3 biodiversity within a sample. In general, biodiversity among the six subjects appeared unrelated to viral levels in plasma or cells, length of infection, or CD4 T cell levels (Additional file 1), but revealed patterns of complexity within viral quasispecies in different host environments.

Figure 1 Biodiversity among viral populations. Pyrosequencing datasets from each individual were clustered at 0% (unique) to 10% genetic distances and displayed as rarefaction curves. Y-axis, number of OTU (number of sequence clusters); x-axis, percent of total pyrosequences (sequences sampled / total number of sequences x 100%). Colors of curves indicate the level of clustering: yellow, 0%; black, 1%; blue, 2%; green, 3%; cyan, 4%; purple, 5%; red, 10%. Numbers of OTU at the end of curves at 0% distance represent biodiversity calculated from the rarefaction curve at the sequence depth (100% of pyrosequences). Small red boxes indicate the approximate sequence depth achieved by conventional clonal sequences.

Population structure

To evaluate the complexity of viral population structure within each individual, unrooted phylogenetic trees were constructed to relate the distribution and frequency of sequence clusters (OTUs) with the proportion of amino acid sequences in each cluster. Each subject harbored virus populations in which 65% to 90% of sequences were organized into 1 to 3 dominant clusters with thousands of sequences per cluster (Figure 2). In each case, dominant sequence clusters were surrounded by swarms of clusters of less abundant variants, forming star-like phylogenies. In general, the structure of viral populations in different environments was distinguished not only by the number of dominant sequences but also by the distribution of the frequency of non-dominant viral variants; for example, viruses in S2, S3, and S4 each had a single dominant population, but a unique organization and frequency of less abundant variants.

Enriched evolutionary landscape within HIV-1 quasispecies

To evaluate the relationship of archived viral populations from a single time point to viral populations over time, phylogenetic trees were inferred from deep sequence V3 datasets combined with longitudinal cell-associated and plasma clonal viral sequences. Combined datasets from S1 extended over a two-year period from about 3 to 5 years of age/infection, when CD4 T cells ranged between 25% and 30% and the viral set point was about 10,000 copies (Figure 3A). Phylogenetic analysis of conventional clonal V3 sequences from viral DNA and RNA at four time points provided a view of viral populations with significantly supported branches (L1 and L2), but unclear dominant viral population(s) (Figure 3B). When pyrosequencing data were included in the phylogenetic construction, two dominant populations, one in L1 and the second in L2, became apparent (Figure 3C). Low frequency (~1%) cell-associated V3 pyrosequencing variants colocalized on the tree with virus found about eighteen months earlier by conventional sequences in both cells and plasma. Moreover, pyrosequencing variants with frequency ranging from 0.25% to >10% in cells colocalized with conventional
sequences found months later in plasma viral RNA. Overall, the array of viral variants identified by pyrosequencing at a single time point reflected the range of clonal sequences identified in longitudinal samples over 2 years of infection.

To evaluate viral populations over longer periods of time, S5 samples collected from 6 weeks to more than 6.6 years of age were analyzed (Figure 4A). Cell-associated V3 variants identified by conventional clonal sequencing shortly after birth had limited diversity, while at least two well-supported lineages of variants (L1 and L2) developed by 4.4 years of infection (Figure 4B). Pyrosequences included two dominant clusters, both in L2, as well as the repertoire of V3 domains found over the course of infection (Figure 4C). For example, some low frequency (~1%) cell-associated virus quasispecies found after 4.5 years of infection included V3 domains that colocalized with the cluster of viral DNA sequences identified shortly after birth (Figure 4C). Other low frequency cell-associated V3 variants detected by pyrosequencing (0.25%) were closely related to viral RNA expressed in plasma more than two years later. Overall, the evolutionary landscape was defined by cyclic emergence of dominant populations from low-frequency variants.

Most recent common ancestors in the evolutionary landscape

V3 populations in S5 developed along lineages with multiple amino acid changes at branch nodes, providing an opportunity to infer the most recent common ancestor (MRCA) of each lineage. Based on clonal sequences, the earliest viral population gave rise through ancestral node 1 (anc1) to two subsequent lineages (Figure 4B). L1 progressed through node 2 (anc2) with changes in V3 at two amino acid positions, E322D and Y316H (Figure 4D), whereas L2 gave rise, through two different amino acid substitutions, Q308R and E322K (Figure 4D), to viruses at 6 to 7 years of infection through anc3 (Figure 4B). The depth of conventional clonal sequencing was inadequate to assign a temporal order to the amino acid changes between the MRCA at anc1 and anc2 or anc3. Inclusion of pyrosequences in the analysis provided sufficient coverage of the viral population to infer that the E322D change (anc2') appeared before the Y316H substitution, while Q308R (anc3') preceded the E322K substitution (Figure 4D).

Table 1 Calculated and estimated biodiversity defined by operational taxonomic units (OTUs)

PID(a)   Biodiversity (OTU)(b)
         0%                      3%                      10%
         Calculated  Estimated   Calculated  Estimated   Calculated  Estimated
S1       253         315         85          178         6           7
S2       267         293         99          137         10          12
S3       69          75          12          24          1           1
S4       157         183         21          55          3           4
S5       82          98          22          91          4           5
S6       156         200         67          132         6           7

(a) Patient identification.
(b) Biodiversity, expressed as operational taxonomic units (OTU), was calculated from rarefaction curves or estimated by the abundance-based coverage estimator (ACE) using ESPRIT software [25] when sequences were clustered at 0% (unique), 3%, or 10% distance.

Figure 2 Organization of viral populations. Unrooted neighbor-joining trees were developed for each pyrosequencing dataset clustered at 3% pairwise distance. Symbols represent the proportion of total pyrosequences in a cluster: empty circle, ≤0.25%; black inverted triangle, >0.25% to 1%; black square, >1% to 10%; star, >10%.

Discussion

Biodiversity is routinely applied to metagenomics of a variety of species, including the human microbiome, but viromes in different ecological niches have received only limited, if any, assessment. Our study applies an efficient bioinformatic pipeline that we developed to assess the complexity of HIV-1 quasispecies in unique ecosystems within infected individuals. The power of pyrosequencing to generate extensive sequence datasets provides a foundation to apply population genetic analyses and extends the value of deep sequencing beyond analysis of rare variants that might indicate reduced sensitivity to drugs. Analysis of biodiversity based on sequence clustering provides a novel viral population profile for different environments independent of viral levels in
cells or plasma, perhaps reflecting length of infection if sequences were archived in lineages of long-lived cells. Consistent with this model, complex viral population structure with high biodiversity appeared as early as eighteen months, or by four to six years, of infection in some individuals. Yet similar periods of infection in other individuals were characterized by monomorphic viral populations with low complexity, indicating that biodiversity of V3 populations represents complex combinations of factors; for example, changes in viral fitness in the environmental landscape in response to host immunity, host target cells, or coreceptor evolution under selective pressure.

Another novel aspect of our study involved a combination of cross-sectional deep sequencing with conventional longitudinal sequences to provide high-resolution detection of evolutionary intermediates, which may be less fit or infrequent in peripheral blood, but nonetheless contribute to the genetic flexibility of the population. The specific order of amino acid substitutions over time may reflect important epistatic interactions that could focus detection of compensatory mutations contributing to fitness in the genetic landscape to other regions of the virus genome. Deep sequencing datasets fill in the evolutionary landscape and increase the power to infer the temporal accumulation of amino acid substitutions, or provide a basis for rational functional analysis of ancestral envelopes and the progeny that emerge from recurring viral population bottlenecks.

Figure 3 Persistence of V3 variants in PBMC for S1. A. Timeline with rainbow colors indicating timing of samples (black dots for clonal sequences; black dot with P for pyrosequences), as well as CD4% (black line) and log10 plasma viral levels (orange line), relative to age/length of infection in years. B. ML tree of longitudinal clonal V3 sequences resembled the topology of trees developed from Env V1 through V3 clonal sequences (sequence number: red, 19; yellow, 37; green, 7; blue, 13). Symbols: ovals, plasma RNA sequences; rectangles, cell-associated DNA sequences. Size of symbols represents relative abundance of sequences in the population. Colors represent timeline of samples. Asterisks on a branch represent significant approximate likelihood-ratio test (* >0.75, ** >0.90). Scale indicates 0.02 nucleotide substitutions per site. C. ML tree combining longitudinal conventional and single-time-point deep sequences. Black symbols represent pyrosequences clustered at 3% pairwise distance with symbol shapes indicating proportion of sequences in each cluster: empty circle, ≤0.25%; black inverted triangle, >0.25% to 1%; black square, >1% to 10%; star, >10%. Brackets indicate colocalization of cell-associated viral variants by pyrosequencing with: "a", clonal RNA and DNA viral sequences from an earlier time point; or "b", clonal plasma viral variants from later time points. Scale indicates 0.02 nucleotide substitutions per site.

An apparent paradox from our analyses is the contribution by low-frequency, presumably less-fit viral variants, rather than the dominant variants, to next generation plasma HIV-1 populations with enhanced fitness. Low frequency variants expand the fitness landscape for virus populations, while providing an array of evolutionary options to maximize survival in a changing ecosystem [34]. Low frequency cell-associated HIV-1 quasispecies may represent residual genomes from a past dominant population archived in long-lived cells, a sequestered reservoir that only infrequently finds its way into the peripheral blood, and/or progenitors that give rise to the next generation of dominant variants in the plasma. Transient dominance of a population leaves a molecular trail that persists as low frequency variants archived in peripheral blood. In agreement with studies of heterosexual HIV-1 transmission [37], archeological evidence of the earliest viral populations was found in our study of pediatric cells as long as four years after infection by maternal transmission, suggesting that those early viruses, or at least their V3 domains, endure during the natural history of infection.

Figure 4 Persistence of V3 variants and evolutionary intermediates. A. Timeline with rainbow colors indicating timing of samples (black dots, clonal sequences; P, pyrosequences), CD4% (black line) and log10 plasma viral load at one time point (an orange dot), relative to age/length of infection in years. B. ML tree of conventional sequences (sequence number: red, 10; yellow, 15; green, 8; blue, 17) with most recent common ancestral nodes (anc) labeled for different lineages (green circles). Scale: 0.02 nucleotide substitutions/site. Symbols: ovals, plasma RNA sequences; rectangles, cell-associated DNA sequences. Size of symbols: relative abundance of sequences in the population. Colors: timing of samples. Asterisks on branches: significant approximate likelihood-ratio test (* >0.75, ** >0.90). C. ML tree combining longitudinal conventional and single-time-point pyrosequences with anc nodes marked for different lineages (green circles: the same anc nodes as in panel B; red circles: additional anc nodes when pyrosequences filled in the phylogenetic landscape). Black symbols represent pyrosequences clustered at 3% distance with symbol shapes indicating proportion of sequences in each cluster: empty circle, ≤0.25%; black inverted triangle, >0.25% to 1%; black square, >1% to 10%; star, >10%. Brackets with "b": clustering of cell-associated viral variants by pyrosequencing with clonal plasma viral variants from a later time point. Red circle: colocalization of cell-associated virus from near birth with a subset of pyrosequences in cells 4.5 years later. D. Most recent common ancestors (MRCA) on the ML tree of panel C. Anc1, anc2 and anc3: the same ancestral nodes as on the ML tree in panel B. Anc2' and anc3': additional ancestral nodes when pyrosequences fill in the evolutionary landscape. Numbers: amino acid positions relative to HIV-1 HXB2 gp160 [36]. NOTE. MRCA analysis was not performed on S1 data because only single amino acid changes occurred between ancestral nodes on the conventional ML tree.

While the study focused on HIV-1 populations in human environments, the approach is applicable to an array of viruses with complex populations, including other subtypes or recombinant forms of HIV-1, hepatitis C or hepatitis B viruses, as well as the repertoire of related viruses that infect animals. Increased depth of sampling and extended length of the target region, now possible with pyrosequencing combined with efficient bioinformatic pipelines, provide a basis for developing quantitative measures of the ebb and flow of viral populations in changing environments.

Conclusions

Deep sequencing of HIV-1 Env V3 hypervariable domains combined with conventional longitudinal V3 sequence datasets provides high resolution of the evolutionary landscape of HIV-1 quasispecies, reveals the richness of viral diversity within the ecosystems of infected individuals, explores the ebb and flow of dominant high-fit and low frequency less-fit viral variants, infers details of multistep evolutionary events in the fitness landscape, and identifies persistence of low-frequency viral variants in peripheral blood cells that resemble transmitted viruses.

Methods

Subjects

Peripheral blood mononuclear cells (PBMC) were obtained
from a cohort of HIV-1-infected children with parental informed consent under a protocol approved by the Institutional Review Board of the University of Florida. The study included six therapy-naïve subjects, infected perinatally between 1989 and 1995 through maternal transmission of subtype B HIV-1, with a median plasma viral load of 4.9 (quartile range 4.6 to 5.3) log10 HIV-1 RNA copies per ml, a median age/length of infection of 4.4 (quartile range 2.0 to 5.1) years, and median CD4 levels of 22% (quartile range 13.3% to 25.5%) at the time of deep sequencing (Additional file 1).

Clonal and pyrosequences

Clonal sequences from HIV-1 Env V1 through V5 were generated using AmpliTaq (Life Technologies Corporation, Carlsbad, CA, US) as previously described [30]. Amplicon libraries were constructed from PBMC DNA with 400 HIV-1 copies using GoTaq DNA polymerase (Promega, Madison, WI, US), as previously described [38,39], and submitted to the University of Florida Interdisciplinary Center for Biotechnology Research for pyrosequencing using a proprietary DNA polymerase (a mixture of Taq and high-fidelity DNA polymerases) (Roche/454 Life Sciences) on a Genome Sequencer FLX (Roche/454 Life Sciences) to produce an average of about 10,000 reads per sample, or about 25-fold coverage of 400 template copies (10,000 sequences / 400 viral copies = 25-fold coverage). Raw clonal and pyrosequencing nucleic acid datasets are deposited in the EMBL database (EMBL accession numbers pending).

Analysis pipeline

A bioinformatics pipeline developed by our group was applied to the datasets. The pipeline incorporates a series of quality control and error correction filters to reduce random nucleotide substitutions, correct frame shifts, and eliminate hypermutated or recombinant sequences (Additional file 2). Overall, the analysis pipeline produced high-quality datasets with retention of about 90% to 97% of the sequences from any sample (Additional file 3). Integrity of error-corrected datasets from deep sequencing was verified by phylogenetic construction (Additional file 4).
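The coverage figure above is simple arithmetic: reads per sample divided by input template copies. The minimal sketch below restates it, together with the share of reads retained after filtering. The function names are hypothetical, and the 9,300-read example is an illustrative number chosen to fall inside the reported 90% to 97% retention range, not data from the study.

```python
def fold_coverage(n_reads: int, n_template_copies: int) -> float:
    """Mean number of reads per input viral template copy."""
    if n_template_copies <= 0:
        raise ValueError("number of template copies must be positive")
    return n_reads / n_template_copies


def percent_retained(n_filtered: int, n_raw: int) -> float:
    """Percentage of raw reads that survive quality control and error correction."""
    if n_raw <= 0:
        raise ValueError("number of raw reads must be positive")
    return 100.0 * n_filtered / n_raw


# About 10,000 reads from 400 input HIV-1 copies, as stated in the Methods.
print(fold_coverage(10_000, 400))  # 25.0
# Hypothetical sample: 9,300 of 10,000 reads retained after filtering.
print(percent_retained(9_300, 10_000))  # 93.0
```

Coverage here is an average over templates; individual templates will be sampled more or fewer times than 25.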
In general, maximum likelihood pairwise distances within deep sequence datasets were significantly greater than among conventional sequence data from each individual (p < 0.001). To assess biodiversity of HIV-1 Env quasispecies, rarefaction curves were constructed using the ESPRIT software suite [25]. Numbers of OTU are displayed on the y-axis as a function of the percentage of sequences (sequences sampled / total sequences generated from 400 input viral copies x 100%) displayed on the x-axis. Sequences were clustered across a range of pairwise distances from 0% to 10%, with all previously collapsed reads counted for their absolute occurrence. One OTU equates to one sequence cluster. ESPRIT was also used to estimate maximum biodiversity within 400 input viral copies using the abundance-based coverage estimator (ACE), to construct a consensus sequence from each sequence cluster, and to calculate the frequency of each OTU.
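ESPRIT's own implementation is not reproduced here. As a hedged illustration of what the rarefaction curves and ACE values summarize, the sketch below assumes reads have already been clustered into OTUs at one distance level and works only from the per-OTU read counts. It computes the rarefaction curve analytically (the Hurlbert/Heck expected-richness formula) instead of by repeated random subsampling, and uses the conventional rare-abundance cutoff of 10 reads for ACE (Chao and Lee); whether ESPRIT makes the same choices is an assumption. Function names and the example counts are hypothetical.

```python
import math
from collections import Counter


def log_comb(a: int, b: int) -> float:
    """Natural log of the binomial coefficient C(a, b), via log-gamma."""
    return math.lgamma(a + 1) - math.lgamma(b + 1) - math.lgamma(a - b + 1)


def expected_otus(abundances: list[int], n: int) -> float:
    """Expected number of OTUs in a random subsample of n reads drawn without
    replacement (analytical rarefaction)."""
    total_reads = sum(abundances)
    if not 0 <= n <= total_reads:
        raise ValueError("subsample size must be between 0 and the total read count")
    expected = 0.0
    for count in abundances:
        if total_reads - count < n:
            expected += 1.0  # every subsample of size n must contain this OTU
        else:
            # 1 - P(OTU absent) = 1 - C(N - N_i, n) / C(N, n)
            expected += 1.0 - math.exp(log_comb(total_reads - count, n) - log_comb(total_reads, n))
    return expected


def rarefaction_curve(abundances: list[int], steps: int = 10) -> list[tuple[float, float]]:
    """(percent of reads sampled, expected OTUs) pairs, the axes used in Figure 1."""
    total_reads = sum(abundances)
    points = []
    for k in range(1, steps + 1):
        n = round(total_reads * k / steps)
        points.append((100.0 * n / total_reads, expected_otus(abundances, n)))
    return points


def ace(abundances: list[int], rare_threshold: int = 10) -> float:
    """Abundance-based coverage estimator (ACE) of total OTU richness."""
    s_abund = sum(1 for c in abundances if c > rare_threshold)
    rare = [c for c in abundances if 0 < c <= rare_threshold]
    s_rare, n_rare = len(rare), sum(rare)
    if n_rare == 0:
        return float(s_abund)  # no rare OTUs: nothing to extrapolate
    freq = Counter(rare)  # freq[i] = number of OTUs observed exactly i times
    f1 = freq[1]
    if f1 == n_rare:
        raise ValueError("ACE is undefined when every rare OTU is a singleton")
    c_ace = 1.0 - f1 / n_rare  # estimated sample coverage of the rare OTUs
    sum_term = sum(i * (i - 1) * freq[i] for i in range(2, rare_threshold + 1))
    gamma2 = max((s_rare / c_ace) * sum_term / (n_rare * (n_rare - 1)) - 1.0, 0.0)
    return s_abund + s_rare / c_ace + (f1 / c_ace) * gamma2


# Hypothetical OTU read counts for one sample (not data from this study).
counts = [5000, 3000, 500, 200, 50, 12, 8, 5, 3, 2, 2, 1, 1, 1]
for pct, otus in rarefaction_curve(counts, steps=5):
    print(f"{pct:5.1f}% of reads -> {otus:5.2f} OTUs expected")
print(f"observed OTUs: {len(counts)}, ACE estimate: {ace(counts):.2f}")  # ACE estimate: 17.14
```

At 100% of reads the curve returns the observed OTU count, which is what the text calls calculated biodiversity; ACE extrapolates from the singletons and other rare OTUs, so it can never fall below the observed count.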


Construction of phylogenetic trees and most recent common ancestor (MRCA) analysis

Maximum likelihood (ML) phylogenetic trees combining deep sequencing cluster consensus reads and longitudinal clonal sequences for subjects S1 and S5 were constructed from nucleotide sequences aligned in BioEdit. Alignments were trimmed to the V3 loop, defined by the codons for cysteine 296 to cysteine 331 based on gp160 amino acid numbering in the HXB2 genome, and identical nucleic acid clusters were collapsed. Phylogenetic signal within the S1 or S5 datasets of aligned sequences was evaluated by likelihood mapping analyses with the program TREE-PUZZLE and proved to be sufficient for reliable phylogeny inference [40-42] (Additional file 5). Trees were constructed as previously described [9]. Briefly, the heuristic search for the best tree was performed using a neighbor-joining tree and the tree bisection reconnection algorithm with PAUP* 4.0b10 [43,44]. Trees were rooted using the earliest clonal sequences as the outgroup. Significance of branches was determined by the approximate likelihood ratio test [45-47]. For analysis of MRCA, ancestral nucleic acid sequences in the genealogy obtained for S5 were inferred by the maximum likelihood method using the codon substitution model M0 in the PAML software package [47]. Reconstructed ancestral sequences from internal nodes were analyzed in BioEdit for non-synonymous changes at each codon position.

Statistical analysis

Pearson correlation was applied to analyze correlations between biodiversity calculated from rarefaction curves generated at 0% and 3% pairwise distances, and between calculated and ACE-estimated maximum biodiversity. Statistical analyses were performed using SAS version 9.1 (SAS Institute, Cary, NC), with P < 0.05 defined as significant.

Additional files

Additional file 1: Table S1. Characteristics of study participants at time of pyrosequencing.
Additional file 2: Error correction.
Additional file 3: Table S2. Sequential filtering of datasets through the bioinformatics pipeline.
Additional file 4: Figure S1. Phylogenetic tree of clustered error-corrected pyrosequences from individuals studied.
Additional file 5: Figure S2. Likelihood mapping analysis to evaluate phylogenetic signal.
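The authors computed the correlations described under Statistical analysis in SAS. As an independent, hedged check, the pure-Python sketch below recomputes Pearson's r from the Table 1 values and a two-sided p-value from the usual transformation t = r * sqrt(n - 2) / sqrt(1 - r^2). The text does not say which distance level underlies the calculated-versus-estimated correlation; the 3% columns are used here as an assumption, and they give a value in line with the reported r = 0.89. The closed-form t-distribution CDF used for the p-value is valid only for an even number of degrees of freedom, which holds for six subjects (4 df). Function and variable names are hypothetical.

```python
import math

# Calculated and ACE-estimated biodiversity (OTU) for S1-S6, transcribed from Table 1.
calc_0pct = [253, 267, 69, 157, 82, 156]
calc_3pct = [85, 99, 12, 21, 22, 67]
est_3pct = [178, 137, 24, 55, 91, 132]


def pearson_r(x: list[float], y: list[float]) -> float:
    """Sample Pearson correlation coefficient."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two equal-length samples of size >= 3")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def two_sided_p(r: float, n: int) -> float:
    """Two-sided p-value for H0: rho = 0 using Student's t with n - 2 df.

    The t CDF is evaluated with the closed-form series for even df only."""
    df = n - 2
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs an even, positive df")
    if abs(r) >= 1.0:
        return 0.0
    t = abs(r) * math.sqrt(df) / math.sqrt(1.0 - r * r)
    theta = math.atan(t / math.sqrt(df))
    term, series = 1.0, 1.0
    for j in range(1, df // 2):
        term *= (2 * j - 1) / (2 * j) * math.cos(theta) ** 2
        series += term
    cdf = 0.5 + 0.5 * math.sin(theta) * series
    return 2.0 * (1.0 - cdf)


for label, x, y in [("0% vs 3% calculated", calc_0pct, calc_3pct),
                    ("3% calculated vs ACE", calc_3pct, est_3pct)]:
    r = pearson_r(x, y)
    print(f"{label}: r = {r:.2f}, p = {two_sided_p(r, len(x)):.3f}")
# 0% vs 3% calculated: r = 0.91, p = 0.011
# 3% calculated vs ACE: r = 0.89, p = 0.017
```

Both values agree with those reported in the Results (r = 0.91, p = 0.01; r = 0.89, p = 0.02) after rounding, which also serves as a consistency check on the reconstructed Table 1.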
Competing interests
The authors declare that they have no competing interests.

Authors' contributions
LY, WGF, JWS, and MMG designed the study, obtained funding, and analyzed and interpreted the results. JWS directed the clinical program and provided clinical samples and data about the subjects. LY and LL, with WGF, YS, and MMG, were involved in developing the analytical pipeline for data analysis and in applying population genetics analysis tools. LY developed the experiments and the methods, supervised data acquisition and analysis by BPG, WBW, and YS, and collaborated with WH on biostatistical analyses; MMG worked with ACL and MS to analyze distances, phylogeny, most recent common ancestor, and integration of deep sequencing with conventional sequences. The manuscript was written by LY and MMG with input from all authors. All authors read and approved the final manuscript.

Authors' information
LL is currently a faculty member at the University of Arizona. YS is currently a faculty member at the University of Buffalo. WH is currently a faculty member at Stony Brook University Medical Center. BPG is currently a medical student at Philadelphia College of Osteopathic Medicine in Suwanee, Georgia. WBW is currently a postdoctoral research fellow at Duke University.

Acknowledgements
The authors thank the study volunteers for participating, and Drs. Connie J. Mulligan, Volker Mai, Mark A. Wallet, Nazle Mendonca Veres, and Rebecca R. Gray for critical reading of this manuscript. Research was supported in part by NIH/NIAID R01 AI065265 and R01 AI047723; Elizabeth Glaser Pediatric AIDS Foundation MV-00-9-900-0143-0-00; Florida Center for AIDS Research; Center for Research in Human Immune Deficiency and Inflammation; and Stephany W. Holloway University Chair for AIDS Research.
Author details
1Department of Pathology, Immunology and Laboratory Medicine, College of Medicine, University of Florida, 2033 Mowry Road, PO Box 103633, Gainesville, FL 32610-3633, USA. 2Interdisciplinary Center for Biotechnology Research, University of Florida, Gainesville, FL, USA. 3Department of Epidemiology and Health Policy Research, College of Medicine and Department of Biostatistics, College of Public Health, University of Florida, Gainesville, FL, USA. 4Department of Pediatrics, Division of Allergy, Immunology and Rheumatology, College of Medicine, University of South Florida, and All Children's Hospital, St. Petersburg, FL, USA.

Received: 25 October 2012 Accepted: 20 November 2012 Published: 17 December 2012

References
1. Garcia-Arriaza J, Domingo E, Briones C: Characterization of minority subpopulations in the mutant spectrum of HIV-1 quasispecies by successive specific amplifications. Virus Res 2007, 129(2):123-134.
2. Paredes R, Clotet B: Clinical management of HIV-1 resistance. Antiviral Res 2010, 85(1):245-265.
3. Boutwell CL, Rolland MM, Herbeck JT, Mullins JI, Allen TM: Viral evolution and escape during acute HIV-1 infection. J Infect Dis 2010, 202(Suppl 2):S309-314.
4. Goodenow M, Huet T, Saurin W, Kwok S, Sninsky J, Wain-Hobson S: HIV-1 isolates are rapidly evolving quasispecies: evidence for viral mixtures and preferred nucleotide substitutions. J Acquir Immune Defic Syndr 1989, 2(4):344-352.
5. Lamers SL, Sleasman JW, She JX, Barrie KA, Pomeroy SM, Barrett DJ, Goodenow MM: Independent variation and positive selection in env V1 and V2 domains within maternal-infant strains of human immunodeficiency virus type 1 in vivo. J Virol 1993, 67(7):3951-3960.
6. Lamers SL, Sleasman JW, She JX, Barrie KA, Pomeroy SM, Barrett DJ, Goodenow MM: Persistence of multiple maternal genotypes of human immunodeficiency virus type I in infants infected by vertical transmission. J Clin Invest 1994, 93(1):380-390.
7. Nickle DC, Shriner D, Mittler JE, Frenkel LM, Mullins JI: Importance and detection of virus reservoirs and compartments of HIV infection. Curr Opin Microbiol 2003, 6(4):410-416.
8. Nowak MA, May RM, Anderson RM: The evolutionary dynamics of HIV-1 quasispecies and the development of immunodeficiency disease. AIDS 1990, 4(11):1095-1103.
9. Salemi M, Burkhardt BR, Gray RR, Ghaffari G, Sleasman JW, Goodenow MM: Phylodynamics of HIV-1 in lymphoid and non-lymphoid tissues reveals a central role for the thymus in emergence of CXCR4-using quasispecies. PLoS One 2007, 2(9):e950.
10. Simmonds P, Balfe P, Ludlam CA, Bishop JO, Brown AJ: Analysis of sequence diversity in hypervariable regions of the external glycoprotein of human immunodeficiency virus type 1. J Virol 1990, 64(12):5840-5850.
11. Wolinsky SM, Wike CM, Korber BT, Hutto C, Parks WP, Rosenblum LL, Kunstman KJ, Furtado MR, Munoz JL: Selective transmission of human immunodeficiency virus type-1 variants from mothers to infants. Science 1992, 255(5048):1134-1137.
12. Simen BB, Simons JF, Hullsiek KH, Novak RM, Macarthur RD, Baxter JD, Huang C, Lubeski C, Turenchalk GS, Braverman MS, Desany B, Rothberg JM, Egholm M, Kozal MJ: Low-abundance drug-resistant viral variants in chronically HIV-infected, antiretroviral treatment-naive patients significantly impact treatment outcomes. J Infect Dis 2009, 199(5):693-701.
13. Tsibris AM, Korber B, Arnaout R, Russ C, Lo CC, Leitner T, Gaschen B, Theiler J, Paredes R, Su Z, Hughes MD, Gulick RM, Greaves W, Coakley E, Flexner C, Nusbaum C, Kuritzkes DR: Quantitative deep sequencing reveals dynamic HIV-1 escape and large population shifts during CCR5 antagonist therapy in vivo. PLoS One 2009, 4(5):e5683.
14. Henn MR, Boutwell CL, Charlebois P, Lennon NJ, Power KA, Macalalad AR, Berlin AM, Malboeuf CM, Ryan EM, Gnerre S, Zody MC, Erlich RL, Green LM, Berical A, Wang Y, Casali M, Streeck H, Bloom AK, Dudek T, Tully D, Newman R, Axten KL, Gladden AD, Battis L, Kemper M, Zeng Q, Shea TP, Gujja S, Zedlack C, Gasser O, Brander C, Hess C, Gunthard HF, Brumme ZL, Brumme CJ, Bazner S, Rychert J, Tinsley JP, Mayer KH, Rosenberg E, Pereyra F, Levin JZ, Young SK, Jessen H, Altfeld M, Birren BW, Walker BD, Allen TM: Whole genome deep sequencing of HIV-1 reveals the impact of early minor variants upon immune recognition during acute infection. PLoS Pathog 2012, 8(3):e1002529.
15. Poon AF, Swenson LC, Dong WW, Deng W, Kosakovsky Pond SL, Brumme ZL, Mullins JI, Richman DD, Harrigan PR, Frost SD: Phylogenetic analysis of population-based and deep sequencing data to identify coevolving sites in the nef gene of HIV-1. Mol Biol Evol 2009, 27(4):819-832.
16. Eriksson N, Pachter L, Mitsuya Y, Rhee SY, Wang C, Gharizadeh B, Ronaghi M, Shafer RW, Beerenwinkel N: Viral population estimation using pyrosequencing. PLoS Comput Biol 2008, 4(4):e1000074.
17. Bimber BN, Burwitz BJ, O’Connor S, Detmer A, Gostick E, Lank SM, Price DA, Hughes A, O’Connor D: Ultradeep pyrosequencing detects complex patterns of CD8+ T-lymphocyte escape in simian immunodeficiency virus-infected macaques. J Virol 2009, 83(16):8247–8253.
18. Boyd SD, Marshall EL, Merker JD, Maniar JM, Zhang LN, Sahaf B, Jones CD, Simen BB, Hanczaruk B, Nguyen KD, Nadeau KC, Egholm M, Miklos DB, Zehnder JL, Fire AZ: Measurement and clinical monitoring of human lymphocyte clonality by massively parallel VDJ pyrosequencing. Sci Transl Med 2009, 1(12):12ra23.
19. Goodman AL, McNulty NP, Zhao Y, Leip D, Mitra RD, Lozupone CA, Knight R, Gordon JI: Identifying genetic determinants needed to establish a human gut symbiont in its habitat. Cell Host Microbe 2009, 6(3):279–289.
20. Hamady M, Knight R: Microbial community profiling for human microbiome projects: tools, techniques, and challenges. Genome Res 2009, 19(7):1141–1152.
21. Keijser BJ, Zaura E, Huse SM, van der Vossen JM, Schuren FH, Montijn RC, ten Cate JM, Crielaard W: Pyrosequencing analysis of the oral microflora of healthy adults. J Dent Res 2008, 87(11):1016–1020.
22. McCaig AE, Glover LA, Prosser JI: Molecular analysis of bacterial community structure and diversity in unimproved and improved upland grass pastures. Appl Environ Microbiol 1999, 65(4):1721–1730.
23. Schloss PD, Handelsman J: Introducing DOTUR, a computer program for defining operational taxonomic units and estimating species richness. Appl Environ Microbiol 2005, 71(3):1501–1506.
24. Sogin ML, Morrison HG, Huber JA, Mark WD, Huse SM, Neal PR, Arrieta JM, Herndl GJ: Microbial diversity in the deep sea and the underexplored “rare biosphere”. Proc Natl Acad Sci USA 2006, 103(32):12115–12120.
25. Sun Y, Cai Y, Liu L, Yu F, Farrell ML, McKendree W, Farmerie W: ESPRIT: estimating species richness using large collections of 16S rRNA pyrosequences. Nucleic Acids Res 2009, 37(10):e76.
26. Weinstein JA, Jiang N, White RA III, Fisher DS, Quake SR: High-throughput sequencing of the zebrafish antibody repertoire. Science 2009, 324(5928):807–810.
27. Campbell A: Save those molecules: molecular biodiversity and life. Journal of Applied Ecology 2003, 40(2):193–203.
28. Newton AC: Forest Ecology and Preservation: A Handbook of Techniques. Oxford: Illustrated edition; 1999.
29. Human Microbiome Project Consortium: Structure, function and diversity of the healthy human microbiome. Nature 2012, 486(7402):207–214.
30. Ho SK, Perez EE, Rose SL, Coman RM, Lowe AC, Hou W, Ma C, Lawrence RM, Dunn BM, Sleasman JW, Goodenow MM: Genetic determinants in HIV-1 Gag and Env V3 are related to viral response to combination antiretroviral therapy with a protease inhibitor. AIDS 2009, 23(13):1631–1640.
31. Rozera G, Abbate I, Bruselles A, Vlassi C, D’Offizi G, Narciso P, Chillemi G, Prosperi M, Ippolito G, Capobianchi MR: Massively parallel pyrosequencing highlights minority variants in the HIV-1 env quasispecies deriving from lymphomonocyte sub-populations. Retrovirology 2009, 6:15.
32. Domingo E, Holland JJ: RNA virus mutations and fitness for survival. Annu Rev Microbiol 1997, 51:151–178.
33. Eigen M: On the nature of virus quasispecies. Trends Microbiol 1996, 4(6):216–218.
34. Lauring AS, Andino R: Quasispecies theory and the behavior of RNA viruses. PLoS Pathog 2010, 6(7):e1001005.
35. Paladin FJ, Monzon OT, Tsuchie H, Aplasca MR, Learn GH Jr, Kurimura T: Genetic subtypes of HIV-1 in the Philippines. AIDS 1998, 12(3):291–300.
36. Los Alamos database. 2012, http://www.hiv.lanl.gov/content/index.
37. Redd AD, Collinson-Streng AN, Chatziandreou N, Mullis CE, Laeyendecker O, Martens C, Ricklefs S, Kiwanuka N, Nyein PH, Lutalo T, Grabowski MK, Kong X, Manucci J, Sewankambo N, Wawer MJ, Gray RH, Porcella SF, Fauci AS, Sagar M, Serwadda D, Quinn TC: Previously transmitted HIV-1 strains are preferentially selected during subsequent sexual transmissions. J Infect Dis 2012, 206(9):1433–1442.
38. Coberley CR, Kohler JJ, Brown JN, Oshier JT, Baker HV, Popp MP, Sleasman JW, Goodenow MM: Impact on genetic networks in human macrophages by a CCR5 strain of human immunodeficiency virus type 1. J Virol 2004, 78(21):11477–11486.
39. Ghaffari G, Tuttle DL, Briggs D, Burkhardt BR, Bhatt D, Andiman WA, Sleasman JW, Goodenow MM: Complex determinants in human immunodeficiency virus type 1 envelope gp120 mediate CXCR4-dependent infection of macrophages. J Virol 2005, 79(21):13250–13261.
40. Schmidt HA, Strimmer K, Vingron M, von Haeseler A: TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics 2002, 18(3):502–504.
41. Strimmer K, von Haeseler A: Likelihood-mapping: a simple method to visualize phylogenetic content of a sequence alignment. Proc Natl Acad Sci USA 1997, 94:6815–6819.
42. Xia X, Xie Z, Salemi M, Chen L, Wang Y: An index of substitution saturation and its application. Mol Phylogenet Evol 2003, 26:1–7.
43. Guindon S, Dufayard JF, Lefort V, Anisimova M, Hordijk W, Gascuel O: New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst Biol 2010, 59(3):307–321.
44. Swofford DSJ: Phylogeny inference based on parsimony and other methods with PAUP*. In The Phylogenetic Handbook - a Practical Approach to DNA and Protein Phylogeny. 2nd edition. Edited by Lemey P, Salemi M, Vandamme A-M. New York: Cambridge University Press; 2003:160–206.
45. Gray RR, Veras NM, Santos LA, Salemi M: Evolutionary characterization of the West Nile Virus complete genome. Mol Phylogenet Evol 2010, 56(1):195–200.
46. Veras NM, Gray RR, Brigido LF, Rodrigues R, Salemi M: High-resolution phylogenetics and phylogeography of human immunodeficiency virus type 1 subtype C epidemic in South America. J Gen Virol 2011, 92(Pt 7):1698–1709.
47. Yang Z: PAML: a program package for phylogenetic analysis by maximum likelihood. Comput Appl Biosci 1997, 13(5):555–556.
doi:10.1186/1742-4690-9-108
Cite this article as: Yin et al.: High-resolution deep sequencing reveals biodiversity, population structure, and persistence of HIV-1 quasispecies within host ecosystems. Retrovirology 2012, 9:108.




Additional file 4: Figure S1. Phylogenetic tree of clustered error-corrected pyrosequences from individuals studied.


Research
High-resolution deep sequencing reveals biodiversity, population structure, and persistence of HIV-1 quasispecies within host ecosystems
Authors: Li Yin (1, corresponding author; yin@pathology.ufl.edu), Li Liu (2; liliu@asu.edu), Yijun Sun (yijunsun@buffalo.edu), Wei Hou (3; Wei.Hou@stonybrookmedicine.edu), Amanda C Lowe (loweac@pathology.ufl.edu), Brent P Gardner (brentga@pcom.edu), Marco Salemi (salemi@pathology.ufl.edu), Wilton B Williams (wilton.williams@duke.edu), William G Farmerie (wgf2@ufl.edu), John W Sleasman (4; jsleasma@health.usf.edu), Maureen M Goodenow (goodenow@ufl.edu)
Affiliations:
1 Department of Pathology, Immunology and Laboratory Medicine, College of Medicine, University of Florida, 2033 Mowry Road, PO Box 103633, Gainesville, FL, 32610-3633, USA
2 Interdisciplinary Center for Biotechnology Research, University of Florida, Gainesville, FL, USA
3 Department of Epidemiology and Health Policy Research, College of Medicine and Department of Biostatistics, College of Public Health, University of Florida, Gainesville, FL, USA
4 Department of Pediatrics, Division of Allergy, Immunology and Rheumatology, College of Medicine, University of South Florida, and All Children’s Hospital, St. Petersburg, FL, USA
Source: Retrovirology 2012, 9(1):108 (ISSN 1742-4690); http://www.retrovirology.com/content/9/1/108
DOI: 10.1186/1742-4690-9-108; PMID: 23244298
History: received 25 October 2012; accepted 20 November 2012; published 17 December 2012
Copyright 2012 Yin et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Keywords: HIV-1 envelope V3; Biodiversity; Population structure; Quasispecies; Fitness; Pyrosequencing; Founder virus persistence; Most recent common ancestor
Abstract
Background
Deep sequencing provides the basis for analysis of biodiversity of taxonomically similar organisms in an environment. While extensively applied to microbiome studies, population genetics studies of viruses are limited. To define the scope of HIV-1 population biodiversity within infected individuals, a suite of phylogenetic and population genetic algorithms was applied to HIV-1 envelope hypervariable domain 3 (Env V3) within peripheral blood mononuclear cells from a group of perinatally HIV-1 subtype B infected, therapy-naïve children.
Results
Biodiversity of HIV-1 Env V3 quasispecies ranged from about 70 to 270 unique sequence clusters across individuals. Viral population structure was organized into a limited number of clusters that included the dominant variants combined with multiple clusters of low frequency variants. Next generation viral quasispecies evolved from low frequency variants at earlier time points through multiple non-synonymous changes in lineages within the evolutionary landscape. Minor V3 variants detected as long as four years after infection co-localized in phylogenetic reconstructions with early transmitting viruses or with subsequent plasma virus circulating two years later.
Conclusions
Deep sequencing defines HIV-1 population complexity and structure, reveals the ebb and flow of dominant and rare viral variants in the host ecosystem, and identifies an evolutionary record of low-frequency cell-associated viral V3 variants that persist for years. The bioinformatics pipeline developed for HIV-1 can be applied to biodiversity studies of virome populations in human, animal, or plant ecosystems.
Background
Human immunodeficiency virus type 1 (HIV-1) displays extensive genetic diversity, reflecting the error-prone characteristics of reverse transcriptase-dependent replication, an elevated recombination rate, and continuous selection of more fit viral variants within fluctuating host ecosystems. HIV-1 populations within an infected individual are complex and composed of swarms of related genomes, or quasispecies [1,2]. Studies of HIV-1 diversity within quasispecies have benefited over the years from the development of novel sequencing technologies that extended the depth of sampling [1-11]. Next generation deep sequencing significantly increases the sensitivity for identifying low-frequency genetic variants within HIV-1 quasispecies that might lead to reduced susceptibility to antiretroviral treatments [12,13] or escape from immunity [14]. Beyond surveillance for drug resistance, deep sequencing provides additional advantages for detecting epistatic interactions [15], estimating population structure [16], identifying evolutionary intermediates, and evaluating biodiversity of organisms within an ecosystem [17-26].
Biodiversity is used in population genetics to present a unified view of the extent of variation of life forms within habitats [27] and assumes that genomes within an environment are taxonomically similar, randomly distributed, and sufficiently large [28]. Assessments of biodiversity from deep sequencing data provide unprecedented views of the richness of immune loci in primates, zebrafish, and humans [17,18,26] or of the complexity of microbiomes independent of an ability to culture microorganisms [21,24,25,29]. Biodiversity defines complexity within populations that extends beyond evaluations of diversity based on pairwise genetic distance, the major approach for analysis of small data sets of HIV-1 sequences from infected individuals [30,31]. Biodiversity within HIV-1 populations might reflect host environments, infection by circulating recombinant forms of HIV-1, or co-infection by multiple subtypes, and could provide unique and sensitive biomarkers for changes in viral populations. Moreover, the structure of HIV-1 quasispecies, or the frequency distribution of viral variants within individuals, may reveal the potential for viral populations to evolve within a fitness landscape and contribute to viral persistence [4,32-34].
We designed a deep sequencing study of HIV-1 Env V3 quasispecies within peripheral blood cells that applied population genetics tools in a novel bioinformatics pipeline to define viral biodiversity, examine viral population structure, and explore directly the extent to which deep sequencing enriches analysis of the HIV-1 evolutionary landscape.
Results
Biodiversity of HIV-1 quasispecies
Biodiversity is evaluated by rarefaction analysis and defined as the number of operational taxonomic units (OTU) within a population [17,18,21,23-26]. HIV-1 Env V3 pyrosequences within each sample were clustered over a range of pairwise genetic distances from 0% to 10% to compare viral populations among individuals (Figure 1). When an upper clustering threshold of 10% was applied to approximate the mean pairwise genetic distance found among subtype B Env sequences [35], the virus population formed a single OTU in S3, but included 3 or 4 OTU in the S4 or S5 viral populations, respectively, 6 OTU in S1 and S6, and as many as 10 OTU in S2 (Table 1). Biodiversity of viral populations evaluated at 0% distance (i.e., the unique level) [31] ranged from relatively low (69 or 82 OTU) in S3 or S5, to 156 or 157 OTU in S6 or S4, and as high as 253 or 267 OTU in S1 or S2 (Figure 1). Even though viral biodiversity at the unique level was similar between some individuals, clustering genomes at distances from 1% to 5% revealed differences in complexity within host environments. For example, the virus population in S4 displayed reduced complexity compared with the population in S6 and was more similar to the viral populations in S3 or S5 (Figure 1 and Table 1). Biodiversity calculated at 3% correlated significantly with biodiversity at the unique level among the individuals (r = 0.91, p = 0.01), which provided a rationale for clustering at 3% in subsequent analyses.
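Clustering reads at a fixed pairwise distance is the step that turns pyrosequences into OTUs. ESPRIT's own clustering procedure is not reproduced here. As a stand-in, the Python sketch below uses single-linkage clustering on uncorrected p-distances between aligned reads, skipping gap columns. A threshold of 0.0 therefore groups only identical sequences (the unique level), and 0.03 corresponds to the 3% level. `p_distance` and `cluster_otus` are hypothetical helpers. The all-pairs loop is only practical for small data sets.

```python
from __future__ import annotations


def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance between two aligned sequences; gap columns are skipped."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        differences += x != y
    return differences / compared if compared else 0.0


def cluster_otus(seqs: list[str], threshold: float) -> list[list[int]]:
    """Single-linkage OTUs: reads share an OTU if a chain of reads links them,
    each step at p-distance <= threshold. Returns lists of read indices."""
    parent = list(range(len(seqs)))  # union-find forest over read indices

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if p_distance(seqs[i], seqs[j]) <= threshold:
                parent[find(i)] = find(j)

    groups: dict[int, list[int]] = {}
    for i in range(len(seqs)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


# Toy aligned reads (not study data); the OTU count falls as the threshold rises.
reads = ["ACGTACGTAC", "ACGTACGTAA", "TTGTACGTAC", "ACGTACGTAC"]
for t in (0.0, 0.1, 0.2):
    print(f"threshold {t:.0%}: {len(cluster_otus(reads, t))} OTU")
```

Single linkage was chosen here only because it is short to write with union-find. Complete or average linkage usually gives tighter OTUs at the same threshold, so counts from this sketch need not match ESPRIT output.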
Figure 1. Biodiversity among viral populations. Pyrosequencing data sets from each individual were clustered at 0% (unique) to 10% genetic distances and displayed as rarefaction curves. Y-axis, number of OTU (number of sequence clusters); x-axis, percent of total pyrosequences (sequences sampled ÷ total number of sequences × 100%). Colors of curves indicate the level of clustering: yellow, 0%; black, 1%; blue, 2%; green, 3%; cyan, 4%; purple, 5%; red, 10%. Numbers of OTU at the end of the curves at 0% distance represent biodiversity calculated from the rarefaction curve at full sequence depth (100% of pyrosequences). Small red boxes indicate the approximate sequence depth achieved by conventional clonal sequences.
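Each rarefaction curve in Figure 1 plots the expected number of OTUs against the fraction of reads sampled. A standard way to compute such a curve without repeated random subsampling is the analytic (Hurlbert) rarefaction formula, E[S_m] = sum over OTUs i of [1 - C(N - N_i, m) / C(N, m)]. Here N_i is the read count of OTU i, N is the total read count, and C is the binomial coefficient. The Python sketch below implements that formula. Whether ESPRIT uses this analytic form or random resampling is not stated in the text, and the function names are hypothetical.

```python
from __future__ import annotations

from math import comb


def expected_otus(abundances: list[int], m: int) -> float:
    """Expected OTU count in a random subsample of m reads drawn without replacement."""
    n = sum(abundances)
    if not 0 <= m <= n:
        raise ValueError("subsample size must lie between 0 and the total read count")
    total = comb(n, m)
    # comb(n - a, m) is 0 when m > n - a, i.e. OTU i is then certain to be sampled
    return sum(1 - comb(n - a, m) / total for a in abundances if a > 0)


def rarefaction_curve(abundances: list[int], percents: list[float]) -> list[tuple[float, float]]:
    """Points (percent of reads sampled, expected OTU count), as on the Figure 1 axes."""
    n = sum(abundances)
    return [(p, expected_otus(abundances, round(n * p / 100))) for p in percents]


# Toy reads-per-OTU counts, not data from this study.
toy = [500, 200, 120, 60, 30, 10, 5, 3, 1, 1]
for pct, otus in rarefaction_curve(toy, [1, 5, 10, 25, 50, 100]):
    print(f"{pct:>5}% of reads -> {otus:.1f} OTU")
```

At 100% of reads the curve returns the observed OTU count exactly. A curve that is still rising at that point is the non-plateau pattern discussed for the 3% curves.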
Table 1. Calculated and estimated biodiversity defined by operational taxonomic units (OTUs)
         Biodiversity (OTU) (b)
PID (a)  0% calc.  0% est.  3% calc.  3% est.  10% calc.  10% est.
S1       253       315      85        178      6          7
S2       267       293      99        137      10         12
S3       69        75       12        24       1          1
S4       157       183      21        55       3          4
S5       82        98       22        91       4          5
S6       156       200      67        132      6          7
(a) Patient identification.
(b) Biodiversity, expressed as operational taxonomic units (OTU), was calculated from rarefaction curves (calc.) or estimated by the abundance-based coverage estimator (ACE) using ESPRIT software [25] (est.) when sequences were clustered at 0% (unique), 3%, or 10% distance.
Rarefaction curves at 3% distance approached but did not reach a plateau, raising the possibility that the depth of sequencing was insufficient to capture all viral diversity. Yet estimated maximum biodiversity was only about two-fold greater than, and correlated with, calculated biodiversity (r = 0.89; p = 0.02) (Table 1), indicating that the sequence depth (about 25-fold coverage) was sufficient to provide a robust assessment of V3 biodiversity within a sample. In general, biodiversity among the six subjects appeared unrelated to viral levels in plasma or cells, length of infection, or CD4 T cell levels (Additional file 1), but revealed patterns of complexity within viral quasispecies in different host environments.
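The two correlations reported in this section (r = 0.91 for 3% versus unique, r = 0.89 for calculated versus estimated at 3%) can be rechecked from the Table 1 values. The Python sketch below assumes they are Pearson correlations tested two-tailed against zero with n - 2 degrees of freedom; the text does not name the test. The p-value uses the closed-form Student t series for an even number of degrees of freedom (n = 6 gives 4), and for this test the series variable reduces to |r|. The helper names are illustrative only.

```python
import math

# Table 1 values for S1..S6: OTU counts at 0% (unique) and 3% clustering.
UNIQUE_CALCULATED = [253, 267, 69, 157, 82, 156]
THREE_PCT_CALCULATED = [85, 99, 12, 21, 22, 67]
THREE_PCT_ESTIMATED = [178, 137, 24, 55, 91, 132]


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def two_tailed_p(r, n):
    """Two-tailed p-value for H0: rho = 0 via the t statistic with n - 2 df.

    Uses the closed-form Student t series valid for an even number of
    degrees of freedom; for this test the series variable equals |r|.
    """
    df = n - 2
    if df < 2 or df % 2:
        raise ValueError("this closed form needs an even, positive number of degrees of freedom")
    cos2 = 1.0 - r * r
    term = total = 1.0
    for k in range(1, df // 2):
        term *= (2 * k - 1) / (2 * k) * cos2  # next coefficient of the series
        total += term
    return 1.0 - abs(r) * total


pairs = [
    ("3% vs unique (calculated)", UNIQUE_CALCULATED, THREE_PCT_CALCULATED),
    ("3% calculated vs estimated", THREE_PCT_CALCULATED, THREE_PCT_ESTIMATED),
]
for label, xs, ys in pairs:
    r = pearson_r(xs, ys)
    print(f"{label}: r = {r:.2f}, p = {two_tailed_p(r, len(xs)):.2f}")
```

Run as is, it prints r = 0.91, p = 0.01 and r = 0.89, p = 0.02, matching the reported values. The agreement is consistent with, but does not prove, that a Pearson test was used.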
Additional file 1: Table S1. Characteristics of study participants at time of pyrosequencing. (1742-4690-9-108-S1.pdf)
Population structure
To evaluate the complexity of viral population structure within each individual, unrooted phylogenetic trees were constructed to relate the distribution and frequency of sequence clusters (OTUs) with the proportion of amino acid sequences in each cluster. Each subject harbored virus populations in which 65% to 90% of sequences were organized into 1 to 3 dominant clusters with thousands of sequences per cluster (Figure 2). In each case, dominant sequence clusters were surrounded by swarms of clusters of less abundant variants, forming star-like phylogenies. In general, the structure of viral populations in different environments was distinguished not only by the number of dominant sequences but also by the frequency distribution of non-dominant viral variants. For example, viruses in S2, S3, and S4 each had a single dominant population, but a unique organization and frequency of less abundant variants.
Figure 2. Organization of viral populations. Unrooted neighbor-joining trees were developed for each pyrosequencing data set clustered at 3% pairwise distance. Symbols represent the proportion of total pyrosequences in a cluster: empty circle, ≤ 0.25%; black inverted triangle, 0.25% to 1%; black square, 1% to 10%; star, >10%.
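The Figure 2 symbol classes amount to binning each cluster's share of all reads: ≤0.25%, 0.25% to 1%, 1% to 10%, and above 10%. The minimal Python sketch below assumes read counts per 3% OTU are already available. Two choices in it are assumptions made here, not taken from the text: each class is treated as a half-open interval (a, b], and clusters above 10% are counted as "dominant" when reporting the share of reads they hold. `frequency_class` and `summarize_structure` are hypothetical helpers.

```python
from __future__ import annotations

from collections import Counter


def frequency_class(fraction: float) -> str:
    """Figure 2 symbol class for a cluster holding `fraction` of all reads."""
    pct = 100.0 * fraction
    if pct <= 0.25:
        return "<=0.25%"
    if pct <= 1:
        return "0.25-1%"
    if pct <= 10:
        return "1-10%"
    return ">10%"


def summarize_structure(cluster_sizes: list[int], dominant_cutoff: float = 0.10):
    """Count clusters per frequency class and the share of reads in 'dominant' clusters."""
    total = sum(cluster_sizes)
    fractions = [size / total for size in cluster_sizes]
    classes = Counter(frequency_class(f) for f in fractions)
    dominant_share = sum(f for f in fractions if f > dominant_cutoff)
    return classes, dominant_share


sizes = [6000, 2500, 900, 400, 120, 50, 20, 10]  # toy reads per 3% OTU, not study data
classes, share = summarize_structure(sizes)
print(dict(classes))
print(f"{share:.0%} of reads fall in dominant clusters")
```

Reporting the dominant share alongside the class counts mirrors how the text contrasts populations. Two subjects can share a dominant fraction while differing in how the remaining reads spread across low-frequency classes.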
Enriched evolutionary landscape within HIV-1 quasispecies
To evaluate the relationship of archived viral populations from a single time point to viral populations over time, phylogenetic trees were inferred from deep sequence V3 data sets combined with longitudinal cell-associated and plasma clonal viral sequences. Combined data sets from S1 extended over a two-year period from about 3 to 5 years of age/infection, when CD4 T cells ranged from 25% to 30% and the viral set point was about 10,000 copies (Figure 3A). Phylogenetic analysis of conventional clonal V3 sequences from viral DNA and RNA at four time points provided a view of viral populations with significantly supported branches (L1 and L2) but unclear dominant viral population(s) (Figure 3B). When pyrosequencing data were included in the phylogenetic construction, two dominant populations, one in L1 and the second in L2, became apparent (Figure 3C). Low-frequency (~1%) cell-associated V3 pyrosequencing variants colocalized on the tree with virus found about eighteen months earlier by conventional sequences in both cells and plasma. Moreover, pyrosequencing variants with frequencies ranging from 0.25% to >10% in cells colocalized with conventional sequences found months later in plasma viral RNA. Overall, the array of viral variants identified by pyrosequencing at a single time point reflected the range of clonal sequences identified in longitudinal samples over 2 years of infection.
Figure 3. Persistence of V3 variants in PBMC for S1. A. Time line with rainbow colors indicates timing of samples (black dots for clonal sequences; black dot with P for pyrosequences), as well as CD4% (black line) and log10 plasma viral levels (orange line), relative to age/length of infection in years. B. ML tree of longitudinal clonal V3 sequences, which resembled the topology of trees developed from Env V1 through V3 clonal sequences (sequence number: red, 19; yellow, 37; green, 7; blue, 13). Symbols: ovals, plasma RNA sequences; rectangles, cell-associated DNA sequences. Size of symbols represents relative abundance of sequences in the population. Colors represent the time line of samples. Asterisks on a branch represent significant approximate likelihood-ratio test (* 0.75, ** 0.90). Scale indicates 0.02 nucleotide substitutions per site. C. ML tree combining longitudinal conventional and single-time point deep sequences. Black symbols represent pyrosequences clustered at 3% pairwise distance, with symbol shapes indicating the proportion of sequences in each cluster: empty circle, ≤ 0.25%; black inverted triangle, 0.25% to 1%; black square, 1% to 10%; star, >10%. Brackets indicate colocalization of cell-associated viral variants by pyrosequencing with: “a”, clonal RNA and DNA viral sequences from an earlier time point; or “b”, clonal plasma viral variants from later time points. Scale indicates 0.02 nucleotide substitutions per site.
To evaluate viral populations over longer periods of time, S5 samples collected from 6 weeks to more than 6.6 years of age were analyzed (Figure 4A). Cell-associated V3 variants identified by conventional clonal sequencing shortly after birth had limited diversity, while at least two well-supported lineages of variants (L1 and L2) developed by 4.4 years of infection (Figure 4B). Pyrosequences included two dominant clusters, both in L2, as well as the repertoire of V3 domains found over the course of infection (Figure 4C). For example, some low-frequency (~1%) cell-associated virus quasispecies found after 4.5 years of infection included V3 domains that colocalized with the cluster of viral DNA sequences identified shortly after birth (Figure 4C). Other low-frequency cell-associated V3 variants detected by pyrosequencing (0.25%) were closely related to viral RNA expressed in plasma more than two years later. Overall, the evolutionary landscape was defined by cyclic emergence of dominant populations from low-frequency variants.
Figure 4. Persistence of V3 variants and evolutionary intermediates. A. Time line with rainbow colors indicates timing of samples (black dots, clonal sequences; P, pyrosequences), CD4% (black line) and log10 plasma viral load at one time point (an orange dot), relative to age/length of infection in years. B. ML tree of conventional sequences (sequence number: red, 10; yellow, 15; green, 8; blue, 17) with most recent common ancestral nodes (anc) labeled for different lineages (green circles). Scale: 0.02 nucleotide substitutions/site. Symbols: ovals, plasma RNA sequences; rectangles, cell-associated DNA sequences. Size of symbols: relative abundance of sequences in the population. Colors: timing of samples. Asterisks on branches: significant approximate likelihood-ratio test (* 0.75, ** 0.90). C. ML tree combining longitudinal conventional and single-time point pyrosequences with anc nodes marked for different lineages (green circles: the same anc nodes as in panel B; red circles: additional anc nodes when pyrosequences filled in the phylogenetic landscape). Black symbols represent pyrosequences clustered at 3% distance, with symbol shapes indicating the proportion of sequences in each cluster: empty circle, ≤ 0.25%; black inverted triangle, 0.25% to 1%; black square, 1% to 10%; star, >10%. Brackets with “b”: clustering of cell-associated viral variants by pyrosequencing with clonal plasma viral variants from a later time point. Red circle: colocalization of cell-associated virus from near birth with a subset of pyrosequences in cells 4.5 years later. D. Most recent common ancestors (MRCA) on the ML tree of panel C. Anc1, anc2 and anc3: the same ancestral nodes as on the ML tree in panel B. Anc2’ and anc3’: additional ancestral nodes when pyrosequences fill in the evolutionary landscape. Numbers: amino acid positions relative to HIV-1 HXB2 gp160 [36]. Note: MRCA analysis was not performed on S1 data because only single amino acid changes occurred between ancestral nodes on the conventional ML tree.
Most recent common ancestors in the evolutionary landscape
V3 populations in S5 developed along lineages with multiple amino acid changes at branch nodes, providing an opportunity to infer the most recent common ancestor (MRCA) of each lineage. Based on clonal sequences, the earliest viral population gave rise through ancestral node 1 (anc1) to two subsequent lineages (Figure 4B). L1 progressed through node 2 (anc2) with changes in V3 at two amino acid positions, E322D and Y316H (Figure 4D), while L2 gave rise, through anc3 and two different amino acid substitutions, Q308R and E322K (Figure 4D), to viruses at 6 to 7 years of infection (Figure 4B). The depth of conventional clonal sequencing was inadequate to assign a temporal order to the amino acid changes between the MRCA at anc1 and anc2 or anc3. Inclusion of pyrosequences in the analysis provided sufficient coverage of the viral population to infer that the E322D change (anc2’) appeared before the Y316H substitution, while Q308R (anc3’) preceded the E322K substitution (Figure 4D).
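Substitution labels such as E322D or Q308R give the ancestral residue, the HXB2 gp160 position, and the derived residue. The small Python sketch below produces such labels from two aligned amino acid sequences, for example an inferred ancestral node and a descendant. It assumes the alignment starts at HXB2 position 296 (the first V3 cysteine) and has no insertions relative to HXB2, so that column i maps to position 296 + i. Alignments with insertions would need an explicit HXB2 coordinate map. `substitutions` is a hypothetical helper, not part of the authors' pipeline.

```python
from __future__ import annotations

V3_START_HXB2 = 296  # first V3 cysteine in HXB2 gp160 numbering


def substitutions(ancestor: str, descendant: str, start: int = V3_START_HXB2) -> list[str]:
    """List changes as <ancestral aa><HXB2 position><derived aa>, e.g. 'E322D'.

    Assumes column i of the alignment is HXB2 position start + i (no insertions).
    """
    if len(ancestor) != len(descendant):
        raise ValueError("sequences must be aligned to the same length")
    changes = []
    for offset, (a, d) in enumerate(zip(ancestor, descendant)):
        if a != d and "-" not in (a, d):  # ignore columns with a gap in either sequence
            changes.append(f"{a}{start + offset}{d}")
    return changes


print(substitutions("CTRPQ", "CTRPR"))  # ['Q300R']
```

Applied to successive nodes along a lineage (for example anc1 to anc2’, then anc2’ to anc2), the lists give the stepwise order of changes that the deeper sampling makes resolvable.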
Discussion
Biodiversity is routinely applied to metagenomics of a variety of species, including the human microbiome, but has been applied to viromes in different ecological niches only to a limited extent, if at all. Our study applies an efficient bioinformatic pipeline that we developed to assess the complexity of HIV-1 quasispecies in unique ecosystems within infected individuals. The power of pyrosequencing to generate extensive sequence data sets provides a foundation to apply population genetic analyses and extends the value of deep sequencing beyond analysis of rare variants that might indicate reduced sensitivity to drugs. Analysis of biodiversity based on sequence clustering provides a novel viral population profile for different environments, independent of viral levels in cells or plasma, perhaps reflecting length of infection if sequences were archived in lineages of long-lived cells. Consistent with this model, complex viral population structure with high biodiversity appeared as early as eighteen months, or by four to six years, of infection in some individuals. Yet similar periods of infection in other individuals were characterized by monomorphic viral populations with low complexity, indicating that biodiversity of V3 populations reflects complex combinations of factors; for example, changes in viral fitness in the environmental landscape in response to host immunity, host target cells, or coreceptor evolution under selective pressure.
Another novel aspect of our study involved a combination of cross-sectional deep sequencing with conventional longitudinal sequences to provide high-resolution detection of evolutionary intermediates, which may be less fit or infrequent in peripheral blood but nonetheless contribute to the genetic flexibility of the population. The specific order of amino acid substitutions over time may reflect important epistatic interactions, which could direct the search for compensatory mutations that contribute to fitness toward other regions of the virus genome. Deep sequencing data sets fill in the evolutionary landscape, increase the power to infer the temporal accumulation of amino acid substitutions, and provide a basis for rational functional analysis of ancestral envelopes and the progeny that emerge from recurring viral population bottlenecks.
An apparent paradox from our analyses is the contribution by low-frequency, presumably less-fit viral variants, rather than the dominant variants, to next generation plasma HIV-1 populations with enhanced fitness. Low-frequency variants expand the fitness landscape for virus populations, while providing an array of evolutionary options to maximize survival in a changing ecosystem [34]. Low-frequency cell-associated HIV-1 quasispecies may represent residual genomes from a past dominant population archived in long-lived cells, a sequestered reservoir that only infrequently finds its way into the peripheral blood, and/or progenitors that give rise to the next generation of dominant variants in the plasma. Transient dominance of a population leaves a molecular trail that persists as low-frequency variants archived in peripheral blood. In agreement with studies of heterosexual HIV-1 transmission [37], archeological evidence of the earliest viral populations was found in our study of pediatric cells as long as four years after infection by maternal transmission, suggesting that those early viruses, or at least their V3 domains, endure during the natural history of infection.
While the study focused on HIV-1 populations in human environments, the approach is applicable to an array of viruses with complex populations, including other subtypes or recombinant forms of HIV-1, hepatitis C or hepatitis B viruses, as well as the repertoire of related viruses that infect animals. The increased depth of sampling and extended length of the target region now possible with pyrosequencing, combined with efficient bioinformatic pipelines, provide a basis for developing quantitative measures of the ebb and flow of viral populations in changing environments.
Conclusions
Deep sequencing of HIV-1 Env V3 hypervariable domains combined with conventional longitudinal V3 sequence data sets provides high resolution of the evolutionary landscape of HIV-1 quasispecies. It reveals the richness of viral diversity within the ecosystems of infected individuals. It also explores the ebb and flow of dominant, highly fit variants and low-frequency, less-fit variants, infers details of multistep evolutionary events in the fitness landscape, and identifies persistence in peripheral blood cells of low-frequency viral variants that resemble transmitted viruses.
Methods
Subjects
Peripheral blood mononuclear cells (PBMC) were obtained from a cohort of HIV-1-infected children with parental informed consent under a protocol approved by the Institutional Review Board of the University of Florida. The study included six therapy-naïve subjects, infected perinatally between 1989 and 1995 through maternal transmission of subtype B HIV-1. At the time of deep sequencing, median plasma viral load was 4.9 (quartile range: 4.6 to 5.3) log10 HIV-1 RNA copies per ml, median age/length of infection was 4.4 (quartile range: 2.0 to 5.1) years, and median CD4 level was 22% (quartile range: 13.3% to 25.5%) (Additional file 1).
Clonal and pyrosequences
Clonal sequences from HIV-1 Env V1 through V5 were generated using AmpliTaq (Life Technologies Corporation, Carlsbad, CA, US) as previously described [30]. Amplicon libraries were constructed from PBMC DNA containing 400 HIV-1 copies using GoTaq DNA polymerase (Promega, Madison, WI, US), as previously described [38,39]. The libraries were submitted to the University of Florida Interdisciplinary Center for Biotechnology Research for pyrosequencing on a Genome Sequencer FLX (Roche/454 Life Sciences) using a proprietary DNA polymerase, a mixture of Taq and high fidelity DNA polymerases (Roche/454 Life Sciences). This produced an average of about 10,000 reads per sample, or about 25-fold coverage of 400 template copies (10,000 sequences ÷ 400 viral copies = 25-fold coverage). Raw clonal and pyrosequencing nucleic acid data sets are deposited in the EMBL database (EMBL accession numbers pending).
Analysis pipeline
A bioinformatics pipeline developed by our group was applied to the data sets. The pipeline incorporates a series of quality control and error correction filters to reduce random nucleotide substitutions, correct frame shifts, and eliminate hypermutated or recombinant sequences (Additional file 2). Overall, the analysis pipeline produced high-quality data sets, retaining about 90% to 97% of the sequences from any sample (Additional file 3). Integrity of error-corrected datasets from deep sequencing was verified by phylogenetic construction (Additional file 4).
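The filters are only outlined in the text; details are in Additional file 2. As a rough illustration of the kind of per-read checks involved, the Python sketch below discards reads that are too short, reads whose length is not a multiple of three (a symptom of an uncorrected frameshift), reads with ambiguous bases, and reads with an in-frame stop codon. It then reports the fraction retained. It is not the authors' pipeline. It assumes reads are already trimmed to start at the first codon, it discards rather than corrects frameshifts, and it does no hypermutation or recombination screening. `filter_reads` and its default `min_length` of 90 nt are hypothetical.

```python
from __future__ import annotations

STOP_CODONS = {"TAA", "TAG", "TGA"}


def passes_basic_qc(read: str, min_length: int) -> bool:
    """True if an in-frame read is long enough, frame-consistent, unambiguous and stop-free."""
    seq = read.upper()
    if len(seq) < min_length or len(seq) % 3 != 0:
        return False  # too short, or length implies an uncorrected indel/frameshift
    if set(seq) - set("ACGT"):
        return False  # ambiguous or non-nucleotide characters
    codons = (seq[i:i + 3] for i in range(0, len(seq), 3))
    return not any(codon in STOP_CODONS for codon in codons)


def filter_reads(reads: list[str], min_length: int = 90) -> tuple[list[str], float]:
    """Keep reads passing the checks and return them with the fraction retained."""
    kept = [r for r in reads if passes_basic_qc(r, min_length)]
    retention = len(kept) / len(reads) if reads else 0.0
    return kept, retention


toy_reads = ["ATGAAACCC", "ATGTAACCC", "ATGAACCC", "ATGNAACCC", "atgaaaccc"]
kept, retention = filter_reads(toy_reads, min_length=9)
print(len(kept), f"{retention:.0%}")  # 2 40%
```

A retention fraction computed this way is the kind of per-sample figure summarized by the reported range of about 90% to 97% of sequences retained. A real pipeline that corrects rather than discards reads would retain more.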
Additional file 2. Error correction. (1742-4690-9-108-S2.pdf)
Additional file 3. Table S2. Sequential filtering of data sets through the bioinformatics pipeline. (1742-4690-9-108-S3.pdf)
Additional file 4. Figure S1. Phylogenetic tree of clustered error-corrected pyrosequences from the individuals studied. (1742-4690-9-108-S4.pdf)
In general, maximum likelihood pairwise distances within the deep sequencing data sets were significantly greater than those within the conventional sequence data from each individual (p < 0.001). To assess the biodiversity of HIV-1 Env quasispecies, rarefaction curves were constructed with the ESPRIT software suite [25]. The number of operational taxonomic units (OTUs) is displayed on the y-axis as a function of the percentage of sequences sampled (sequences sampled ÷ total sequences generated from 400 input viral copies × 100%) on the x-axis. Sequences were clustered across a range of pairwise distances from 0% to 10%, with each previously collapsed read weighted by its absolute number of occurrences. One OTU corresponds to one sequence cluster. ESPRIT was also used for three further steps: estimating the maximum biodiversity within the 400 input viral copies with the abundance-based coverage estimator (ACE), constructing a consensus sequence for each sequence cluster, and calculating the frequency of each OTU.
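Rarefaction and ACE are standard ecological calculations, and a minimal Python sketch may clarify what the curves and maximum-biodiversity estimates represent. The sketch assumes OTU abundances (reads per cluster at a given distance threshold) are already available. It uses Hurlbert's analytical rarefaction formula for the expected number of OTUs in a random subsample of n reads, and the commonly used ACE formula with a rare-OTU cutoff of 10 reads. ESPRIT's own implementation, defaults, and cutoffs may differ. The function names and example abundances are hypothetical.

```python
from collections import Counter
from math import comb


def rarefaction(abundances, n):
    """Expected number of OTUs in a random subsample of n reads (without replacement).

    Hurlbert's formula: E[S_n] = sum_i [1 - C(N - N_i, n) / C(N, n)].
    Exact integer binomials are used: correct, but slow for very large N.
    """
    total_reads = sum(abundances)
    if not 0 <= n <= total_reads:
        raise ValueError("n must lie between 0 and the total number of reads")
    denominator = comb(total_reads, n)
    return sum(1 - comb(total_reads - a, n) / denominator for a in abundances)


def ace(abundances, rare_threshold=10):
    """Abundance-based coverage estimator (ACE) of total OTU richness."""
    rare = [a for a in abundances if a <= rare_threshold]
    s_abund = sum(1 for a in abundances if a > rare_threshold)
    s_rare = len(rare)
    n_rare = sum(rare)
    if n_rare == 0:
        return float(s_abund)  # no rare OTUs: nothing left to extrapolate
    freq = Counter(rare)  # freq[i] = number of OTUs observed exactly i times
    f1 = freq[1]
    if f1 == n_rare:
        raise ValueError("ACE is undefined when every rare OTU is a singleton")
    c_ace = 1 - f1 / n_rare  # estimated sample coverage of the rare OTUs
    weighted = sum(i * (i - 1) * f for i, f in freq.items())
    gamma2 = max(s_rare / c_ace * weighted / (n_rare * (n_rare - 1)) - 1, 0.0)
    return s_abund + s_rare / c_ace + f1 / c_ace * gamma2


if __name__ == "__main__":
    abundances = [50, 20, 5, 3, 2, 1, 1]  # hypothetical reads per OTU
    total = sum(abundances)
    for pct in (10, 25, 50, 100):
        n = round(total * pct / 100)
        print(f"{pct:>3}% of reads -> {rarefaction(abundances, n):.2f} expected OTUs")
    print(f"ACE estimate: {ace(abundances):.2f}")
```

For the hypothetical abundances, sampling 100% of reads recovers the 7 observed OTUs. ACE returns about 8.65, because the two singletons signal that some OTUs remain unseen.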
Construction of phylogenetic trees and most recent common ancestor (MRCA) analysis
Maximum likelihood (ML) phylogenetic trees combining deep sequencing cluster consensus reads with longitudinal clonal sequences from subjects S1 and S5 were constructed from nucleotide sequences aligned in BioEdit. Alignments were trimmed to the V3 loop, defined by the codons for cysteine 296 through cysteine 331 (gp160 amino acid numbering of the HXB2 genome), and identical nucleotide clusters were collapsed.

Phylogenetic signal in the aligned S1 and S5 data sets was evaluated by likelihood mapping analysis with the program TREE-PUZZLE and found to be sufficient for reliable phylogeny inference [40-42] (Additional file 5). Trees were constructed as previously described [9]. Briefly, the heuristic search for the best tree was performed with PAUP* 4.0b10, starting from a neighbor-joining tree and applying the tree bisection-reconnection algorithm [43,44]. Trees were rooted using the earliest clonal sequences as the outgroup. Branch significance was assessed with the approximate likelihood ratio test [45-47]. For the MRCA analysis, ancestral nucleotide sequences in the S5 genealogy were inferred by maximum likelihood using the M0 codon substitution model in the PAML software package [47]. Reconstructed ancestral sequences at internal nodes were analyzed in BioEdit for nonsynonymous changes at each codon position.
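Ancestral sequences were inspected in BioEdit. For readers who want to script the same bookkeeping, the Python sketch below translates each codon with the standard genetic code and reports every differing codon as synonymous or nonsynonymous. It assumes the ancestral and descendant sequences are already aligned, equal in length, and in frame, as for the trimmed V3 alignment. The helper is hypothetical and is not part of PAML or BioEdit. Codon numbers are relative to the start of the supplied alignment, not HXB2. Codons containing gaps or ambiguity codes are reported as ambiguous rather than guessed.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1); amino acids in TCAG x TCAG x TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def codon_changes(ancestor, descendant):
    """List codon differences between an ancestral and a descendant coding sequence.

    Returns tuples (codon_number, ancestral_codon, descendant_codon,
    ancestral_aa, descendant_aa, kind), numbered from 1 at the alignment start.
    """
    if len(ancestor) != len(descendant) or len(ancestor) % 3:
        raise ValueError("sequences must be aligned, equal in length, and in frame")
    changes = []
    for start in range(0, len(ancestor), 3):
        anc_codon = ancestor[start:start + 3].upper()
        desc_codon = descendant[start:start + 3].upper()
        if anc_codon == desc_codon:
            continue
        anc_aa = CODON_TABLE.get(anc_codon, "X")  # X: gap or ambiguity code
        desc_aa = CODON_TABLE.get(desc_codon, "X")
        if "X" in (anc_aa, desc_aa):
            kind = "ambiguous"
        elif anc_aa == desc_aa:
            kind = "synonymous"
        else:
            kind = "nonsynonymous"
        changes.append((start // 3 + 1, anc_codon, desc_codon, anc_aa, desc_aa, kind))
    return changes


if __name__ == "__main__":
    # Hypothetical 3-codon fragment: Cys-Thr-Arg in the ancestor.
    for change in codon_changes("TGTACAAGA", "TGTACCAAA"):
        print(change)
```

In the example, codon 2 changes ACA to ACC (Thr to Thr, synonymous) and codon 3 changes AGA to AAA (Arg to Lys, nonsynonymous).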
Additional file 5. Figure S2. Likelihood mapping analysis to evaluate phylogenetic signal. (1742-4690-9-108-S5.pdf)
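The approximate likelihood ratio test used for branch support compares, for each internal branch, the log-likelihood of the best tree with that of the best alternative nearest-neighbor-interchange (NNI) topology around that branch. In its parametric form, the statistic 2(lnL1 - lnL2) is compared with a 50:50 mixture of χ² distributions with 0 and 1 degrees of freedom. The Python sketch below assumes the two log-likelihoods have already been computed by the tree-building software. It only converts them into the statistic and its p-value. The function names and example log-likelihoods are illustrative.

```python
import math


def alrt_statistic(lnl_best, lnl_alternative):
    """Twice the log-likelihood gain of the best topology over the best NNI alternative."""
    return 2.0 * (lnl_best - lnl_alternative)


def alrt_pvalue(statistic):
    """P-value under the null mixture 0.5*chi2(df=0) + 0.5*chi2(df=1).

    For a chi-square variable with 1 df, P(X > x) = erfc(sqrt(x / 2)).
    """
    if statistic <= 0:
        return 1.0  # the best topology is not better: no support for the branch
    return 0.5 * math.erfc(math.sqrt(statistic / 2.0))


if __name__ == "__main__":
    # Hypothetical log-likelihoods for one internal branch.
    stat = alrt_statistic(-1520.4, -1523.1)
    print(f"aLRT statistic = {stat:.2f}, p = {alrt_pvalue(stat):.4f}")
```

The mixture halves the ordinary χ²(1) tail probability. A statistic of about 3.84, which is the 5% critical value for χ²(1), therefore corresponds to p = 0.025.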
Statistical analysis
Pearson correlation was used for two comparisons: between biodiversity values calculated from rarefaction curves at 0% and 3% pairwise distances, and between calculated and ACE-estimated maximum biodiversity. Statistical analyses were performed with SAS version 9.1 (SAS Institute, Cary, NC), and P < 0.05 was considered significant.
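The correlation analysis was run in SAS. For readers without SAS, a minimal standard-library Python sketch of Pearson's r is shown below, together with the t statistic that the conventional parametric test uses (n - 2 degrees of freedom). Evaluating the t distribution would need a third-party library, so the sketch substitutes an exact permutation p-value. This is feasible for six subjects (6! = 720 orderings). The permutation test is a swapped-in technique, not the SAS procedure. The input values in the usage example are hypothetical, not study data.

```python
import itertools
import math


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length samples with at least 3 values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)), the parametric test statistic with n - 2 df."""
    if abs(r) >= 1:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1 - r * r))


def permutation_pvalue(x, y, tol=1e-12):
    """Exact two-sided p-value: share of y orderings with |r| >= observed |r|.
    Enumerates all n! orderings, so it is practical only for small n."""
    observed = abs(pearson_r(x, y))
    hits = total = 0
    for perm in itertools.permutations(y):
        total += 1
        if abs(pearson_r(x, perm)) >= observed - tol:
            hits += 1
    return hits / total


if __name__ == "__main__":
    # Hypothetical paired values for six subjects (not study data).
    x = [10, 20, 30, 40, 50, 60]
    y = [12, 25, 27, 44, 48, 65]
    r = pearson_r(x, y)
    print(f"r = {r:.3f}, t = {t_statistic(r, len(x)):.2f}, "
          f"permutation p = {permutation_pvalue(x, y):.4f}")
```

With six subjects, the smallest attainable two-sided permutation p-value is 2/720 (about 0.0028). This bound is worth keeping in mind when judging significance at such a small sample size.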
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
LY, WGF, JWS, and MMG designed the study, obtained funding, and analyzed and interpreted the results. JWS directed the clinical program and provided clinical samples and data about the subjects. LY and LL, with WGF, YS, and MMG, developed the analytical pipeline for data analysis and applied population genetics analysis tools. LY developed the experiments and methods, supervised data acquisition and analysis by BPG, WBW, and YS, and collaborated with WH on biostatistical analyses. MMG worked with ACL and MS on the analysis of distances, phylogeny, and the most recent common ancestor, and on integrating deep sequencing with conventional sequences. The manuscript was written by LY and MMG with input from all authors. All authors read and approved the final manuscript.
Authors’ information
LL is currently a faculty member at the University of Arizona.
YS is currently a faculty member at the University at Buffalo.
WH is currently a faculty member at Stony Brook University Medical Center.
BPG is currently a medical student at the Philadelphia College of Osteopathic Medicine in Suwanee, Georgia. WBW is currently a postdoctoral research fellow at Duke University.
Acknowledgements
The authors thank the study volunteers for participating, and Drs. Connie J. Mulligan, Volker Mai, Mark A. Wallet, Nazle Mendonca Veras, and Rebecca R. Gray for critical reading of this manuscript. Research was supported in part by NIH/NIAID R01 AI065265 and R01 AI047723; Elizabeth Glaser Pediatric AIDS Foundation MV-00-9-900-0143-0-00; the Florida Center for AIDS Research; the Center for Research in Human Immune Deficiency and Inflammation; and the Stephany W. Holloway University Chair for AIDS Research.
References
1. Garcia-Arriaza J, Domingo E, Briones C: Characterization of minority subpopulations in the mutant spectrum of HIV-1 quasispecies by successive specific amplifications. Virus Res 2007, 129(2):123-134. doi:10.1016/j.virusres.2007.07.001. PMID: 17706828.
2. Paredes R, Clotet B: Clinical management of HIV-1 resistance. Antiviral Res 2010, 85(1):245-265. doi:10.1016/j.antiviral.2009.09.015. PMID: 19808056.
3. Boutwell CL, Rolland MM, Herbeck JT, Mullins JI, Allen TM: Viral evolution and escape during acute HIV-1 infection. J Infect Dis 2010, 202(Suppl 2):S309-314. PMCID: 2945609. PMID: 20846038.
4. Goodenow M, Huet T, Saurin W, Kwok S, Sninsky J, Wain-Hobson S: HIV-1 isolates are rapidly evolving quasispecies: evidence for viral mixtures and preferred nucleotide substitutions. J Acquir Immune Defic Syndr 1989, 2(4):344-352. PMID: 2754611.
5. Lamers SL, Sleasman JW, She JX, Barrie KA, Pomeroy SM, Barrett DJ, Goodenow MM: Independent variation and positive selection in env V1 and V2 domains within maternal-infant strains of human immunodeficiency virus type 1 in vivo. J Virol 1993, 67(7):3951-3960. PMCID: 237762. PMID: 8510212.
6. Lamers SL, Sleasman JW, She JX, Barrie KA, Pomeroy SM, Barrett DJ, Goodenow MM: Persistence of multiple maternal genotypes of human immunodeficiency virus type I in infants infected by vertical transmission. J Clin Invest 1994, 93(1):380-390. doi:10.1172/JCI116970. PMCID: 293789. PMID: 8282808.
7. Nickle DC, Shriner D, Mittler JE, Frenkel LM, Mullins JI: Importance and detection of virus reservoirs and compartments of HIV infection. Curr Opin Microbiol 2003, 6(4):410-416. doi:10.1016/S1369-5274(03)00096-1. PMID: 12941414.
8. Nowak MA, May RM, Anderson RM: The evolutionary dynamics of HIV-1 quasispecies and the development of immunodeficiency disease. AIDS 1990, 4(11):1095-1103. doi:10.1097/00002030-199011000-00007. PMID: 2282182.
9. Salemi M, Burkhardt BR, Gray RR, Ghaffari G, Sleasman JW, Goodenow MM: Phylodynamics of HIV-1 in lymphoid and non-lymphoid tissues reveals a central role for the thymus in emergence of CXCR4-using quasispecies. PLoS One 2007, 2(9):e950. doi:10.1371/journal.pone.0000950. PMCID: 1978532. PMID: 17895991.
10. Simmonds P, Balfe P, Ludlam CA, Bishop JO, Brown AJ: Analysis of sequence diversity in hypervariable regions of the external glycoprotein of human immunodeficiency virus type 1. J Virol 1990, 64(12):5840-5850. PMCID: 248744. PMID: 2243378.
11. Wolinsky SM, Wike CM, Korber BT, Hutto C, Parks WP, Rosenblum LL, Kunstman KJ, Furtado MR, Munoz JL: Selective transmission of human immunodeficiency virus type-1 variants from mothers to infants. Science 1992, 255(5048):1134-1137. doi:10.1126/science.1546316. PMID: 1546316.
12. Simen BB, Simons JF, Hullsiek KH, Novak RM, Macarthur RD, Baxter JD, Huang C, Lubeski C, Turenchalk GS, Braverman MS, Desany B, Rothberg JM, Egholm M, Kozal MJ: Low-abundance drug-resistant viral variants in chronically HIV-infected, antiretroviral treatment-naive patients significantly impact treatment outcomes. J Infect Dis 2009, 199(5):693-701. doi:10.1086/596736. PMID: 19210162.
13. Tsibris AM, Korber B, Arnaout R, Russ C, Lo CC, Leitner T, Gaschen B, Theiler J, Paredes R, Su Z, Hughes MD, Gulick RM, Greaves W, Coakley E, Flexner C, Nusbaum C, Kuritzkes DR: Quantitative deep sequencing reveals dynamic HIV-1 escape and large population shifts during CCR5 antagonist therapy in vivo. PLoS One 2009, 4(5):e5683. doi:10.1371/journal.pone.0005683. PMCID: 2682648. PMID: 19479085.
14. Henn MR, Boutwell CL, Charlebois P, Lennon NJ, Power KA, Macalalad AR, Berlin AM, Malboeuf CM, Ryan EM, Gnerre S, Zody MC, Erlich RL, Green LM, Berical A, Wang Y, Casali M, Streeck H, Bloom AK, Dudek T, Tully D, Newman R, Axten KL, Gladden AD, Battis L, Kemper M, Zeng Q, Shea TP, Gujja S, Zedlack C, Gasser O, Brander C, Hess C, Gunthard HF, Brumme ZL, Brumme CJ, Bazner S, Rychert J, Tinsley JP, Mayer KH, Rosenberg E, Pereyra F, Levin JZ, Young SK, Jessen H, Altfeld M, Birren BW, Walker BD, Allen TM: Whole genome deep sequencing of HIV-1 reveals the impact of early minor variants upon immune recognition during acute infection. PLoS Pathog 2012, 8(3):e1002529. doi:10.1371/journal.ppat.1002529. PMCID: 3297584. PMID: 22412369.
15. Poon AF, Swenson LC, Dong WW, Deng W, Kosakovsky Pond SL, Brumme ZL, Mullins JI, Richman DD, Harrigan PR, Frost SD: Phylogenetic analysis of population-based and deep sequencing data to identify coevolving sites in the nef gene of HIV-1. Mol Biol Evol 2009, 27(4):819-832. PMCID: 2877536. PMID: 19955476.
16. Eriksson N, Pachter L, Mitsuya Y, Rhee SY, Wang C, Gharizadeh B, Ronaghi M, Shafer RW, Beerenwinkel N: Viral population estimation using pyrosequencing. PLoS Comput Biol 2008, 4(4):e1000074. PMCID: 2323617. PMID: 18437230.
17. Bimber BN, Burwitz BJ, O’Connor S, Detmer A, Gostick E, Lank SM, Price DA, Hughes A, O’Connor D: Ultradeep pyrosequencing detects complex patterns of CD8+ T-lymphocyte escape in simian immunodeficiency virus-infected macaques. J Virol 2009, 83(16):8247-8253. doi:10.1128/JVI.00897-09. PMCID: 2715741. PMID: 19515775.
18. Boyd SD, Marshall EL, Merker JD, Maniar JM, Zhang LN, Sahaf B, Jones CD, Simen BB, Hanczaruk B, Nguyen KD, Nadeau KC, Egholm M, Miklos DB, Zehnder JL, Fire AZ: Measurement and clinical monitoring of human lymphocyte clonality by massively parallel VDJ pyrosequencing. Sci Transl Med 2009, 1(12):12ra23. doi:10.1126/scitranslmed.3000540. PMCID: 2819115. PMID: 20161664.
19. Goodman AL, McNulty NP, Zhao Y, Leip D, Mitra RD, Lozupone CA, Knight R, Gordon JI: Identifying genetic determinants needed to establish a human gut symbiont in its habitat. Cell Host Microbe 2009, 6(3):279-289. doi:10.1016/j.chom.2009.08.003. PMCID: 2895552. PMID: 19748469.
20. Hamady M, Knight R: Microbial community profiling for human microbiome projects: tools, techniques, and challenges. Genome Res 2009, 19(7):1141-1152. doi:10.1101/gr.085464.108. PMID: 19383763.
21. Keijser BJ, Zaura E, Huse SM, van der Vossen JM, Schuren FH, Montijn RC, ten Cate JM, Crielaard W: Pyrosequencing analysis of the oral microflora of healthy adults. J Dent Res 2008, 87(11):1016-1020. doi:10.1177/154405910808701104. PMID: 18946007.
22. McCaig AE, Glover LA, Prosser JI: Molecular analysis of bacterial community structure and diversity in unimproved and improved upland grass pastures. Appl Environ Microbiol 1999, 65(4):1721-1730. PMCID: 91243. PMID: 10103273.
23. Schloss PD, Handelsman J: Introducing DOTUR, a computer program for defining operational taxonomic units and estimating species richness. Appl Environ Microbiol 2005, 71(3):1501-1506. doi:10.1128/AEM.71.3.1501-1506.2005. PMCID: 1065144. PMID: 15746353.
24. Sogin ML, Morrison HG, Huber JA, Mark WD, Huse SM, Neal PR, Arrieta JM, Herndl GJ: Microbial diversity in the deep sea and the underexplored “rare biosphere”. Proc Natl Acad Sci U S A 2006, 103(32):12115-12120. doi:10.1073/pnas.0605127103. PMCID: 1524930. PMID: 16880384.
25. Sun Y, Cai Y, Liu L, Yu F, Farrell ML, McKendree W, Farmerie W: ESPRIT: estimating species richness using large collections of 16S rRNA pyrosequences. Nucleic Acids Res 2009, 37(10):e76. doi:10.1093/nar/gkp285. PMCID: 2691849. PMID: 19417062.
26. Weinstein JA, Jiang N, White RA III, Fisher DS, Quake SR: High-throughput sequencing of the zebrafish antibody repertoire. Science 2009, 324(5928):807-810. doi:10.1126/science.1170020. PMCID: 3086368. PMID: 19423829.
27. Campbell A: Save those molecules: molecular biodiversity and life. Journal of Applied Ecology 2003, 40(2):193-203. doi:10.1046/j.1365-2664.2003.00803.x.
28. Newton AC: Forest Ecology and preservation: A Handbook of Techniques. Oxford: Illustrated edition; 1999.
29. Human Microbiome Project Consortium: Structure, function and diversity of the healthy human microbiome. Nature 2012, 486(7402):207-214. doi:10.1038/nature11234. PMID: 22699609.
30. Ho SK, Perez EE, Rose SL, Coman RM, Lowe AC, Hou W, Ma C, Lawrence RM, Dunn BM, Sleasman JW, Goodenow MM: Genetic determinants in HIV-1 Gag and Env V3 are related to viral response to combination antiretroviral therapy with a protease inhibitor. AIDS 2009, 23(13):1631-1640. doi:10.1097/QAD.0b013e32832e0599. PMID: 19625947.
31. Rozera G, Abbate I, Bruselles A, Vlassi C, D’Offizi G, Narciso P, Chillemi G, Prosperi M, Ippolito G, Capobianchi MR: Massively parallel pyrosequencing highlights minority variants in the HIV-1 env quasispecies deriving from lymphomonocyte sub-populations. Retrovirology 2009, 6:15. doi:10.1186/1742-4690-6-15. PMCID: 2660291. PMID: 19216757.
32. Domingo E, Holland JJ: RNA virus mutations and fitness for survival. Annu Rev Microbiol 1997, 51:151-178. doi:10.1146/annurev.micro.51.1.151. PMID: 9343347.
33. Eigen M: On the nature of virus quasispecies. Trends Microbiol 1996, 4(6):216-218. doi:10.1016/0966-842X(96)20011-3. PMID: 8795155.
34. Lauring AS, Andino R: Quasispecies theory and the behavior of RNA viruses. PLoS Pathog 2010, 6(7):e1001005. doi:10.1371/journal.ppat.1001005. PMCID: 2908548. PMID: 20661479.
35. Paladin FJ, Monzon OT, Tsuchie H, Aplasca MR, Learn GH Jr, Kurimura T: Genetic subtypes of HIV-1 in the Philippines. AIDS 1998, 12(3):291-300. doi:10.1097/00002030-199803000-00007. PMID: 9517992.
36. Los Alamos data base. 2012. http://www.hiv.lanl.gov/content/index.
37. Redd AD, Collinson-Streng AN, Chatziandreou N, Mullis CE, Laeyendecker O, Martens C, Ricklefs S, Kiwanuka N, Nyein PH, Lutalo T, Grabowski MK, Kong X, Manucci J, Sewankambo N, Wawer MJ, Gray RH, Porcella SF, Fauci AS, Sagar M, Serwadda D, Quinn TC: Previously transmitted HIV-1 strains are preferentially selected during subsequent sexual transmissions. J Infect Dis 2012, 206(9):1433-1442. doi:10.1093/infdis/jis503. PMID: 22997233.
38. Coberley CR, Kohler JJ, Brown JN, Oshier JT, Baker HV, Popp MP, Sleasman JW, Goodenow MM: Impact on genetic networks in human macrophages by a CCR5 strain of human immunodeficiency virus type 1. J Virol 2004, 78(21):11477-11486. doi:10.1128/JVI.78.21.11477-11486.2004. PMCID: 523249. PMID: 15479790.
39. Ghaffari G, Tuttle DL, Briggs D, Burkhardt BR, Bhatt D, Andiman WA, Sleasman JW, Goodenow MM: Complex determinants in human immunodeficiency virus type 1 envelope gp120 mediate CXCR4-dependent infection of macrophages. J Virol 2005, 79(21):13250-13261. doi:10.1128/JVI.79.21.13250-13261.2005. PMCID: 1262568. PMID: 16227248.
40. Schmidt HA, Strimmer K, Vingron M, von Haeseler A: TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics 2002, 18(3):502-504. doi:10.1093/bioinformatics/18.3.502. PMID: 11934758.
41. Strimmer K, von Haeseler A: Likelihood-mapping: a simple method to visualize phylogenetic content of a sequence alignment. Proc Natl Acad Sci U S A 1997, 94:6815-6819. doi:10.1073/pnas.94.13.6815. PMCID: 21241. PMID: 9192648.
42. Xia X, Xie Z, Salemi M, Chen L, Wang Y: An index of substitution saturation and its application. Mol Phylogenet Evol 2003, 26:1-7. doi:10.1016/S1055-7903(02)00326-3. PMID: 12470932.
43. Guindon S, Dufayard JF, Lefort V, Anisimova M, Hordijk W, Gascuel O: New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst Biol 2010, 59(3):307-321. doi:10.1093/sysbio/syq010. PMID: 20525638.
44. Swofford DSJ: Phylogeny inference based on parsimony and other methods with PAUP*. In The Phylogenetic Handbook: a Practical Approach to DNA and Protein Phylogeny. 2nd edition. Edited by Lemey P, Salemi M, Vandamme A-M. New York: Cambridge University Press; 2003:160-206.
45. Gray RR, Veras NM, Santos LA, Salemi M: Evolutionary characterization of the West Nile Virus complete genome. Mol Phylogenet Evol 2010, 56(1):195-200. doi:10.1016/j.ympev.2010.01.019. PMID: 20102743.
46. Veras NM, Gray RR, Brigido LF, Rodrigues R, Salemi M: High-resolution phylogenetics and phylogeography of human immunodeficiency virus type 1 subtype C epidemic in South America. J Gen Virol 2011, 92(Pt 7):1698-1709. PMID: 21450946.
47. Yang Z: PAML: a program package for phylogenetic analysis by maximum likelihood. Comput Appl Biosci 1997, 13(5):555-556. PMID: 9367129.

Additional file 5: Figure S2. Likelihood mapping analysis to evaluate phylogenetic signal. Likelihood mapping analyses were performed with the program TREE-PUZZLE [40] for each data set by analyzing 10,000 random quartets. Star-like signals in the likelihood maps were <45%, indicating sufficient phylogenetic signal [41]. Neither data set showed significant saturation (P < 0.0001) [42], indicating that the data could be used for reliable phylogeny inference. Panel annotations: Pinv 5.7% and 13.9%.


xml version 1.0 encoding utf-8 standalone no
mets ID sort-mets_mets OBJID sword-mets LABEL DSpace SWORD Item PROFILE METS SIP Profile xmlns http:www.loc.govMETS
xmlns:xlink http:www.w3.org1999xlink xmlns:xsi http:www.w3.org2001XMLSchema-instance
xsi:schemaLocation http:www.loc.govstandardsmetsmets.xsd
metsHdr CREATEDATE 2012-12-27T20:06:54
agent ROLE CUSTODIAN TYPE ORGANIZATION
name BioMed Central
dmdSec sword-mets-dmd-1 GROUPID sword-mets-dmd-1_group-1
mdWrap SWAP Metadata MDTYPE OTHER OTHERMDTYPE EPDCX MIMETYPE textxml
xmlData
epdcx:descriptionSet xmlns:epdcx http:purl.orgeprintepdcx2006-11-16 xmlns:MIOJAVI
http:purl.orgeprintepdcxxsd2006-11-16epdcx.xsd
epdcx:description epdcx:resourceId sword-mets-epdcx-1
epdcx:statement epdcx:propertyURI http:purl.orgdcelements1.1type epdcx:valueURI http:purl.orgeprintentityTypeScholarlyWork
http:purl.orgdcelements1.1title
epdcx:valueString High-resolution deep sequencing reveals biodiversity, population structure, and persistence of HIV-1 quasispecies within host ecosystems
http:purl.orgdctermsabstract
Abstract
Background
Deep sequencing provides the basis for analysis of biodiversity of taxonomically similar organisms in an environment. While extensively applied to microbiome studies, population genetics studies of viruses are limited. To define the scope of HIV-1 population biodiversity within infected individuals, a suite of phylogenetic and population genetic algorithms was applied to HIV-1 envelope hypervariable domain 3 (Env V3) within peripheral blood mononuclear cells from a group of perinatally HIV-1 subtype B infected, therapy-naïve children.
Results
Biodiversity of HIV-1 Env V3 quasispecies ranged from about 70 to 270 unique sequence clusters across individuals. Viral population structure was organized into a limited number of clusters that included the dominant variants combined with multiple clusters of low frequency variants. Next generation viral quasispecies evolved from low frequency variants at earlier time points through multiple non-synonymous changes in lineages within the evolutionary landscape. Minor V3 variants detected as long as four years after infection co-localized in phylogenetic reconstructions with early transmitting viruses or with subsequent plasma virus circulating two years later.
Conclusions
Deep sequencing defines HIV-1 population complexity and structure, reveals the ebb and flow of dominant and rare viral variants in the host ecosystem, and identifies an evolutionary record of low-frequency cell-associated viral V3 variants that persist for years. The bioinformatics pipeline developed for HIV-1 can be applied to biodiversity studies of virome populations in human, animal, or plant ecosystems.

Creators: Yin, Li; Liu, Li; Sun, Yijun; Hou, Wei; Lowe, Amanda C; Gardner, Brent P; Salemi, Marco; Williams, Wilton B; Farmerie, William G; Sleasman, John W; Goodenow, Maureen M
Language: en
Type: Journal article
Date available: 2012-12-17
Publisher: BioMed Central Ltd
Status: Peer reviewed
Copyright holder: Li Yin et al.; licensee BioMed Central Ltd.
License: http://creativecommons.org/licenses/by/2.0
Access rights: Open access
Citation: Retrovirology. 2012 Dec 17;9(1):108
Identifier: http://dx.doi.org/10.1186/1742-4690-9-108
Files: 1742-4690-9-108.xml; 1742-4690-9-108.pdf; supplementary files 1742-4690-9-108-S1.PDF, 1742-4690-9-108-S2.PDF, 1742-4690-9-108-S3.PDF, 1742-4690-9-108-S4.PDF, 1742-4690-9-108-S5.PDF




Additional file 2: Error correction. A two-step error correction strategy was used:

1. Correct frame-shifting mutations. Frame shifts reflected insertions and deletions introduced at the read level by the detection algorithm in the Roche/454 process. Because V3 has essentially no length variation within an individual, frame-shifting reads were corrected. Sequences were aligned using Geneious (Biomatters Ltd, Auckland, New Zealand) followed by BioEdit (Informer Technologies, Inc). All insertions were removed by alignment in BioEdit, using the reference sequence HIV-1 HXB2 as a mask. Nucleotide sequences were translated into amino acid sequences to define codons; any codon with one missing nucleotide was corrected with the corresponding nucleotide in the reference sequence. Any read with more than two nucleotide deletions within a codon was removed from further analysis.

2. Correct random mismatches. Random mismatches resulted in synonymous or nonsynonymous substitutions or stop codons. To correct errors resulting from random nucleotide substitutions, hierarchical clustering of sequences at 3% pairwise distance was performed, and a consensus sequence for each cluster was generated using the ESPRIT program [25]. Since the mismatch rate was 0.3% (3 mismatches per 1,000 nt) and the V3 loop is about 100 bp long, each read carries on average about 0.3 mismatches, so only about 1 in 3 reads would display a mismatch. The most frequent nucleotide at each position appears in the consensus sequence, so a low-frequency error does not contribute to a consensus sequence. For sequence clusters that contain a single sequence (a singleton), there is a chance that 1 in 3 singleton clusters contains mismatches, so conclusions based on singleton sequence clusters should be drawn with caution.
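The two steps above can be illustrated with a minimal Python sketch. It assumes reads have already been pairwise-aligned to the HXB2 V3 reference with '-' marking gaps; the function names, the `max_missing` threshold, and the simple column-wise majority rule are illustrative assumptions, not the Geneious, BioEdit, or ESPRIT implementations, and the 3% hierarchical clustering itself is not reproduced. Because the text does not say how codons with exactly two deletions were handled, the discard threshold is left as a parameter.

```python
from collections import Counter
from typing import Optional

# Illustrative sketch only; not the authors' Geneious/BioEdit/ESPRIT pipeline.
# Assumes reads are already pairwise-aligned to the HXB2 V3 reference, with '-' for gaps.


def mask_insertions(read_aln: str, ref_aln: str) -> str:
    """Step 1a: drop columns where the reference has a gap (insertions in the read)."""
    if len(read_aln) != len(ref_aln):
        raise ValueError("aligned sequences must have equal length")
    return "".join(base for base, ref in zip(read_aln, ref_aln) if ref != "-")


def repair_codons(read: str, ref: str, max_missing: int = 1) -> Optional[str]:
    """Step 1b: fill deleted bases ('-') from the reference, codon by codon.

    ASSUMPTION: a codon with more than `max_missing` gaps causes the whole read
    to be discarded (returns None); the source leaves two-gap codons unspecified.
    """
    if len(read) != len(ref) or len(ref) % 3 != 0:
        raise ValueError("read and reference must be in frame and of equal length")
    repaired = []
    for i in range(0, len(ref), 3):
        codon, ref_codon = read[i:i + 3], ref[i:i + 3]
        if codon.count("-") > max_missing:
            return None
        repaired.append("".join(r if b == "-" else b for b, r in zip(codon, ref_codon)))
    return "".join(repaired)


def majority_consensus(cluster: list[str]) -> str:
    """Step 2: most frequent base at each position across a cluster's members.

    Ties go to the base seen first in that column (Counter.most_common ordering).
    """
    if not cluster or len({len(s) for s in cluster}) != 1:
        raise ValueError("cluster must be non-empty with equal-length sequences")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*cluster))


def mismatch_burden(rate: float = 0.003, length: int = 100) -> tuple[float, float]:
    """Expected mismatches per read, and P(>= 1 mismatch) if errors are independent."""
    return rate * length, 1 - (1 - rate) ** length


if __name__ == "__main__":
    ref_aln = "TGTA-CAAGA"   # toy reference with one insertion column
    read_aln = "TGTAGC-AGA"  # toy read: one inserted base, one deleted base
    ref = mask_insertions(ref_aln, ref_aln)
    read = mask_insertions(read_aln, ref_aln)
    print(repair_codons(read, ref))  # TGTACAAGA
    print(majority_consensus(["TGTACA", "TGTACA", "TGAACA"]))  # TGTACA
    expected, p_any = mismatch_burden()
    print(f"{expected:.1f} expected mismatches/read; P(>=1) = {p_any:.2f}")
```

The expected count of about 0.3 mismatches per 100-nt read is what "about 1 in 3 reads" reflects; under independent errors the fraction of reads carrying at least one mismatch is slightly lower (about 0.26), because some reads carry more than one.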




Additional file 3: Table S2. Sequential filtering of data sets through the bioinformatics pipeline.

         Raw      Reads removed by filters (%)       Error-corrected           Final reads
PID(a)   reads    Quality control  Error correction  consensus clusters (no.)  %      Number
S1       9,964    6.3              0.5(b)            283                       93.2   9,291
S2       9,305    2.8              0.6(c)            282                       96.6   8,988
S3       10,238   5.4              5.0               75                        89.8   9,194
S4       21,348   3.1              4.9               202                       92.1   19,662
S5       10,971   7.1              0.2               172                       93.0   10,174
S6       10,599   4.7              0.3               179                       95.0   10,067

a Patient identification.
b Error correction included removal of recombinants found in five unique reads.
c Error correction included removal of hypermutated sequences found in about 0.3% of reads.
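The Final reads percentage appears to be the share of raw reads that survive both filters, i.e. final reads / raw reads × 100. A short Python check of that arithmetic, using the raw and final counts transcribed from Table S2; the function names are illustrative.

```python
# Raw and final read counts transcribed from Table S2: (PID, raw reads, final reads).
TABLE_S2 = [
    ("S1", 9964, 9291),
    ("S2", 9305, 8988),
    ("S3", 10238, 9194),
    ("S4", 21348, 19662),
    ("S5", 10971, 10174),
    ("S6", 10599, 10067),
]


def retention_pct(raw: int, final: int) -> float:
    """Percentage of raw reads retained, rounded to one decimal place."""
    return round(100 * final / raw, 1)


def summarize(rows: list[tuple[str, int, int]]) -> tuple[dict[str, float], float]:
    """Per-subject retention and pooled retention across all subjects."""
    per_subject = {pid: retention_pct(raw, final) for pid, raw, final in rows}
    total_raw = sum(raw for _, raw, _ in rows)
    total_final = sum(final for _, _, final in rows)
    return per_subject, retention_pct(total_raw, total_final)


if __name__ == "__main__":
    per_subject, pooled = summarize(TABLE_S2)
    for pid, pct in per_subject.items():
        print(f"{pid}: {pct}%")
    print(f"pooled: {pooled}%")
```

For S1 to S4 and S6 the quotient reproduces the listed percentage; for S5 it gives 92.7 rather than the listed 93.0, which may reflect a rounding or transcription difference in the source table. Pooled over all six subjects, about 93.0% of raw reads are retained.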




Additional materials

Additional file 1: Table S1. Characteristics of study participants at time of pyrosequencing.

        Viral load                     Age/length of
PID(a)  Plasma(b)  Cell-associated(c)  infection (years)   CD4%(d)
S1      4.0        2.6                 4.5                 24
S2      4.8        4.5                 1.5                 30
S3      4.9        3.4                 4.8                 22
S4      4.9        3.3                 2.1                 17
S5      5.2        5.0                 4.2                 22
S6      5.7        5.5                 6.1                 2

a Patient identification. Subjects included in the pyrosequencing study had been infected perinatally by maternal transmission for more than one year, were naïve to combination antiretroviral therapy, had viral RNA levels >4 log10 copies per ml of plasma, and had genetic analysis of the HIV-1 envelope by conventional clonal sequencing.
b Log10 HIV-1 RNA copies per ml of plasma, measured on the HIV-1 gag gene (COBAS AMPLICOR HIV-1 MONITOR Test, v1.5, Roche, Pleasanton, CA, USA).
c Log10 HIV-1 gag copies per 10^6 CD4+ T cells.
d CD4% is used to normalize for the age-related lymphocytosis that occurs in pediatric subjects.
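The viral-load columns are on a log10 scale (footnotes b and c), so differences between subjects are multiplicative rather than additive. A small Python sketch converting the plasma values transcribed from Table S1 back to copies per ml; the names `to_linear` and `fold_difference` are illustrative.

```python
# Plasma viral loads transcribed from Table S1, in log10 HIV-1 RNA copies per ml.
PLASMA_LOG10 = {"S1": 4.0, "S2": 4.8, "S3": 4.9, "S4": 4.9, "S5": 5.2, "S6": 5.7}


def to_linear(log10_value: float) -> float:
    """Convert a log10-scale measurement back to a linear count."""
    return 10 ** log10_value


def fold_difference(log10_a: float, log10_b: float) -> float:
    """Fold difference between two log10-scale measurements (a relative to b)."""
    return 10 ** (log10_a - log10_b)


if __name__ == "__main__":
    for pid, value in PLASMA_LOG10.items():
        print(f"{pid}: ~{to_linear(value):,.0f} copies/ml")
    s6_vs_s1 = fold_difference(PLASMA_LOG10["S6"], PLASMA_LOG10["S1"])
    print(f"S6 vs S1: ~{s6_vs_s1:.0f}-fold")
```

Because the table reports log10 values to one decimal place, each converted count is only accurate to within roughly ±12% (10^0.05 ≈ 1.12); the same conversion applies to the cell-associated column, per 10^6 CD4+ T cells.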