BACTERIAL SYMBIOSIS WITH PARASITIC LICE

By

BRET MICHAEL BOYD

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2014
© 2014 Bret M Boyd
To my patient wife Sally Riewe Boyd and mentor George T Austin
ACKNOWLEDGMENTS

I thank David Reed for his time, patience, continuing support and commitment to my success. I am also thankful to my graduate committee, Gordon Burleigh, Jaret Daniels, Valerie de Crecy, Lauren McIntyre, and Marta Wayne, without whom this work would not have been possible. I thank Julie Allen for her help and support through all steps of my research and graduate school. I thank my undergraduate assistants Anna Marsakova and Zach Quicksall for their hard work and patience. For her patience over these last few years I thank my wife Sally Riewe Boyd. None of this work would have been possible without Matt Gitzendanner and Oleksander Moskalenko, who helped me implement data analysis and manage data. I thank Justin Fear for the numerous times he fixed my scripts. I thank Rita Graze, Jajie Yang, and Alison Morse for their help with processing and interpreting genome assembly data. I thank Basma El Yacoubi, Marc Bailly, Patrick Thaiville, and Ian Blaby for their guidance and help in understanding the role of louse endosymbionts. For their support in preparation of my DDIG proposal I thank Eric Lyons, Lori Wojciechowski, Tamara Mandell, and Richard Snyder. I thank Takema Fukatsu and Ryuichi Koga for their work describing the physical location of endosymbionts in seal lice and providing images used in this dissertation. I thank Brad Barbazuk and Ewen Kirkness for their help interpreting louse endosymbiont genomes. I thank Brett Olds for his help with human head louse data. I thank Ulrike Munderloh and Roderick Felsheim for their help interpreting the Rickettsia genome. I would like to thank Katie Scholl, Kristin Magrini, Angelo Soto Centeno, Jorge Pino, Marina Ascunce, Candace McCaffery, and Kelly Speer for all of their help and support in preparing this dissertation. I thank Michael Miyamoto and Stuart McDaniel for stimulating conversation and their input on my research. For their support and help navigating
graduate school I thank Wilfred Vermeris, Connie Mulligan, Jorg Bungert, Patrick Concannon, and Hope Parmeter. I thank Kevin Johnson for his support of my endosymbiont research moving forward. Finally, I thank the National Science Foundation, Florida Museum of Natural History, and the University of Florida Genetics Institute for support of my work. This work was supported in part by National Science Foundation Doctoral Dissertation Improvement Grant DEB 1310824.
TABLE OF CONTENTS

page

ACKNOWLEDGMENTS .......... 4
LIST OF TABLES .......... 9
LIST OF FIGURES .......... 10
LIST OF ABBREVIATIONS .......... 12
ABSTRACT .......... 13

CHAPTER

1 INTRODUCTION: TAXONOMY OF LICE AND THEIR ENDOSYMBIOTIC BACTERIA IN THE POST-GENOMIC ERA .......... 14
    Overview .......... 15
    Ecology and Morphology of Parasitic Lice .......... 15
    Higher Level Taxonomic Placement of Parasitic Lice .......... 17
    Classification Within the Phthiraptera .......... 18
    Phylogeny and Taxonomy of Human Lice .......... 19
    Louse Perspective in the Post-Genomic Era .......... 20
    Endosymbionts of Lice .......... 22
    Roles of Endosymbionts .......... 23
    Taxonomy and Phylogeny of Louse Primary Endosymbionts .......... 24
    Perspectives for the Post-Genomic Era .......... 27
    Concluding Remarks .......... 28

2 GENOME EROSION AND GENE LOSS IN YOUNG SYMBIONTS: A GENOMIC COMPARISON OF RECENTLY ACQUIRED ENDOSYMBIONTS FROM HUMAN AND CHIMPANZEE LICE .......... 31
    Background .......... 32
        P-endosymbionts of Human and Primate Lice .......... 32
        Rationale .......... 33
        Objectives .......... 34
    Methods .......... 35
        Specimen Collection .......... 35
        Candidatus Riesia pediculischaeffi str. PTSU Genome Sequencing .......... 35
        Assembly and Annotation of the Ca. Riesia Primary Chromosome .......... 35
        Assembly and Annotation of the Pantothenate Plasmid .......... 37
        Determining Rate of Gene Loss .......... 38
        Identifying Predicted Genes of Unknown Function .......... 39
        B vitamin Synthesis Predicted by Genomic Data .......... 39
    Results .......... 40
        Chimpanzee Louse P-endosymbiont, Ca. Riesia pediculischaeffi, Genome Assembly .......... 40
        Comparison Between Chimpanzee and Human Louse P-endosymbiont Genomes .......... 40
        Genes Shared Between and Unique to Louse P-endosymbiont Genomes .......... 40
        Departure from Purifying Selection .......... 41
        Hypothetical Short Coding Sequences .......... 41
        B vitamin Synthesis .......... 42
    Concluding Remarks .......... 43
        Genome Structure .......... 43
        Gene Loss .......... 43
        Effect of Selection .......... 45
        Abundant Genes of Unknown Function .......... 45
        The Role of Louse P-endosymbionts .......... 46
    Data Deposition .......... 48

3 BACTERIAL ENDOSYMBIOSIS IN A MARINE LOUSE (PHTHIRAPTERA, ANOPLURA) AND DRAFT GENOME SEQUENCES OF A NEW RICKETTSIA AND SODALIS-LIKE ENDOSYMBIONT .......... 54
    Background .......... 55
    Methods .......... 59
        Collection of Lice .......... 59
        DNA Extraction and Genome Sequencing .......... 59
        Isolation of Bacterial Sequence Data .......... 59
        Reconstruction of Endosymbiont Genomes .......... 60
        Testing for Rickettsia peacockii pRPR Plasmid .......... 61
        Identification of Rickettsia Pathogenicity Genes .......... 61
        Phylogenetic Analysis .......... 62
        Localization of Endosymbionts .......... 63
    Results .......... 63
        Sodalis-like Endosymbiont .......... 63
        Rickettsia .......... 64
    Concluding Remarks .......... 65
        Sodalis-like Endosymbiont .......... 67
        Rickettsia .......... 71

4 PHYLOGENOMIC ANALYSIS REVEALS A WIDESPREAD CLADE OF HERITABLE SYMBIONTIC BACTERIA IN THE PARASITIC LICE OF MAMMALS (PHTHIRAPTERA: ANOPLURA AND RHYNCOPHTHIRINA) .......... 83
    Background .......... 84
    Methods .......... 88
        Specimen Collection .......... 88
        Genome Sequencing .......... 89
        Gene Clustering .......... 89
        Gene Repertoire .......... 90
        Phylogenomic Analysis .......... 90
        Phylogenetic Analysis .......... 92
    Results .......... 93
        Genome Assembly .......... 93
        Gene Clustering .......... 93
        Gene Repertoire .......... 94
        Likelihood Tree Inference from Supermatrices with GTR .......... 94
        Likelihood Tree Inference with Non-homogenous Models .......... 95
        Likelihood Tree Inference from Recoded Supermatrices .......... 95
        Individual Gene Trees .......... 96
    Concluding Remarks .......... 96

5 CONCLUSIONS: TOWARDS DESCRIBING THE TRANSITION TO OBLIGATE VERTICALLY INHERITED SYMBIOSIS BY INSECT-ASSOCIATED BACTERIA .......... 117
    Significance .......... 117
    Insect Endosymbiosis .......... 117
    Evolutionary Models of Endosymbiosis in Insects .......... 119
    Sources of P-endosymbionts and Directionality of Transitions .......... 121
    Genome Erosion in P-endosymbionts .......... 122
    Effects of Mutation .......... 123
    Natural Selection .......... 125
    Host Control and Drift .......... 127
    Competition .......... 129
    Summary .......... 130

LIST OF REFERENCES .......... 131
BIOGRAPHICAL SKETCH .......... 143
LIST OF TABLES

Table                                                                          page

2-1 B vitamins predicted to be supplied to human lice by their p-endosymbiont, Ca. Riesia pediculicola .......... 52
2-2 Genome and assembly statistics of louse p-endosymbionts .......... 52
2-3 Age of associations between p-endosymbionts and insects and the estimated rate of gene loss in each p-endosymbiont .......... 53
3-1 Genes associated with pathogenicity in R. rickettsii SS and their state in our new Rickettsia .......... 82
4-1 Summary of phylogenetic studies including louse endosymbionts .......... 112
4-2 Louse endosymbionts and genome characteristics .......... 112
4-3 Supermatrix phylogenetic methods and the number of origins of louse symbiosis detected .......... 113
4-4 Summary of louse p-endosymbiont placement in individual gene ML trees .......... 114
4-5 Non-louse-associated taxa included in phylogenetic analysis .......... 116
LIST OF FIGURES

Figure                                                                         page

1-1 Comparison of the relationship of phthirapteran families. A) Traditional classification based on morphological data. B) Classification based on recent molecular studies .......... 30
1-2 Human head louse nymph, showing the white, circular mycetome in the abdomen where primary endosymbionts are housed .......... 30
2-1 Evolutionary history of louse p-endosymbionts, Ca. Riesia species, with dates of species divergence .......... 49
2-2 Schematic of sequencing and bioinformatics process to isolate and de novo assemble chimpanzee louse p-endosymbiont .......... 50
2-3 Alignment of the 5.2 kb plasmid from the chimpanzee louse p-endosymbiont to the 7.7 kb plasmid from the human louse p-endosymbiont .......... 51
3-1 Partial map of Russia and Alaska, USA showing collection site of lice in the Pribilof Islands, USA .......... 76
3-2 Maximum Likelihood tree based on 16S rDNA showing relationship of Sodalis-like endosymbiont to other Enterobacteria .......... 77
3-3 Alignment of draft genome sequence of the Sodalis-like endosymbiont of P. fluctus to the genome sequence of S. glossinidius .......... 78
3-4 Whole mount in situ hybridization of endosymbiont 16S rDNA in P. fluctus .......... 79
3-5 Maximum Likelihood tree based on three protein coding gene supermatrix showing relationship of a new Rickettsia found in P. fluctus .......... 80
3-6 Alignment of draft genome sequence of a new Rickettsia found in P. fluctus, R. rickettsii SS, and R. peacockii to an outgroup taxon, R. conorii .......... 81
3-7 Alignment of ankyrin repeat encoding gene from our new Rickettsia from P. fluctus and R. rickettsii SS .......... 82
4-1 Phylogenomic workflow .......... 104
4-2 proteobacteria used in this study .......... 105
4-3 proteobacteria. Neighbor-Joining tree constructed from a binary matrix representing presence or absence of genes in proteobacteria included in this study .......... 106
4-4 proteobacteria calculated from 2056 gene supermatrix .......... 107
4-5 ML trees calculated from 53 gene supermatrices .......... 108
4-6 ML tree calculated from 16S rDNA .......... 109
4-7 Counts of clusters found when comparing 104,735 translated gene sequences from 42 taxa .......... 110
4-8 Counts of amino acid sequences included in clusters that contain one sequence, 2-42 sequences, and greater than 42 sequences .......... 111
LIST OF ABBREVIATIONS

P-endosymbiont    An intracellular, obligate endosymbiont, housed in a specialized structure called a bacteriome or mycetome, vertically inherited, and often involved in nutrient provisioning.

S-endosymbiont    An intracellular, facultative endosymbiont that may be found in diverse tissues and can be vertically or horizontally transmitted.

Genome    For the purposes of this dissertation, a genome will simply be defined as the entire genetic material in a cell.

Ca.    Abbreviation for Candidatus status used in the classification of bacteria.

Mya    Millions of years ago.
Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

BACTERIAL SYMBIOSIS WITH PARASITIC LICE

By

Bret Michael Boyd

May 2014

Chair: David Lee Reed
Major: Genetics and Genomics

Numerous insect species have formed symbioses with heritable intracellular bacteria. The parasitic lice of animals have obligate symbioses with bacteria belonging to the proteobacteria. Here I review what was known about symbiosis between lice and bacteria, study genome evolution in human and primate louse p-endosymbionts, describe new symbiosis between bacteria and the northern fur seal louse, and describe a diverse clade of heritable p-endosymbionts from the blood-feeding lice of mammals. Through the use of next-generation sequencing and bioinformatics tools I can describe new symbionts, accurately place them in the bacterial tree of life, and describe the history of louse-bacterial symbiosis.
CHAPTER 1
INTRODUCTION: TAXONOMY OF LICE AND THEIR ENDOSYMBIOTIC BACTERIA IN THE POST-GENOMIC ERA

Recent studies of molecular and genomic data from the parasitic lice of birds and mammals, as well as their mutualistic endosymbiotic bacteria, are changing the accepted phylogenetic relationships and taxonomy of these organisms. Phylogenetic studies of lice suggest that vertebrate parasitism arose multiple times from free-living book and bark lice. Molecular clocks show that the major families of lice arose in the late Mesozoic and radiated in the early Cenozoic, following the radiation of mammals and birds. The recent release of the human louse genome has provided new opportunities for research. The genome is being used to find new genetic markers for phylogenetics and population genetics, to understand the complex evolutionary relationships of mitochondrial genes, and to study genome evolution. Genomes are informing us not only about lice, but also about their obligate endosymbiotic bacteria. In contrast to lice and their hosts, lice and their endosymbionts do not share common evolutionary histories, suggesting that endosymbionts are either replaced over time or that there are multiple independent origins of symbiosis in lice. Molecular phylogenetics and whole-genome sequencing have recently provided the first insights into the phylogenetic placement and metabolic characteristics of these distantly related bacteria. Comparative genomics between distantly related louse symbionts can provide insights into conserved metabolic functions and can help to explain how distantly related species are fulfilling their role as mutualistic symbionts. In lice and their endosymbionts, molecular data and genome sequencing are driving our understanding of evolutionary relationships and classification, and will continue to do so for the foreseeable future.
Overview

The parasitic lice are a diverse group of specialized insect parasites of birds and mammals. These parasites belong to a larger group of insects known as book and bark lice. Numerous phylogenies and classification schemes have been proposed for the parasitic lice. Traditional phylogenetics based on morphology suggested that parasitic lice were a monophyletic radiation, because of their permanent parasitic lifestyle (Kim et al. 1978; Kim and Ludwig 1985; Lyal 1985). However, more recently, molecular data have supported an alternative topology whereby the parasitic lifestyle would have arisen twice within the book and bark lice (Johnson et al. 2004; Yoshizawa and Johnson 2010). Molecular data are also revealing the age of this parasitism and the nature of louse-host associations over time (Light et al. 2010). Herein, we will review both old and new phylogenetic hypotheses, and we will illustrate how the recently sequenced genome of the human body louse (Kirkness et al. 2010) has already provided, and will continue to provide, additional insights into the evolutionary history of parasitic lice.

Ecology and Morphology of Parasitic Lice

There are c. 4500 recognized species of chewing lice (Amblycera, Ischnocera, and Rhyncophthirina) found on both mammals and birds (Price et al. 2003). The sucking lice (Anoplura) are a much smaller group, with 540 described species that occupy 12 mammalian orders (Light et al. 2010). Parasitic lice are generally host-specific, occupying one or a few closely related species (Price et al. 2003). Many birds and mammals are known to harbor more than one species of louse, and in some cases different louse species may be restricted to only one region of their host. The diet of most chewing lice is dominated by keratin-rich dermal components such as feathers, skin, and hair. A few species of chewing lice (Rhyncophthirina) feed from the pooled
blood of their hosts. The sucking lice feed strictly on the blood of their hosts by piercing the skin. The morphology of true lice is highly specialized to suit an ectoparasitic lifestyle. Lice complete their entire life cycle on their host, and every stage is specialized for parasitism. Eggs of lice, known as nits, are large and affixed to the hair shafts or feather barbules of the host (Grimaldi and Engel 2005). Lice are hemimetabolous, meaning that the immature stages, or nymphs, look similar to the imago and utilize the same resources. Adult parasitic lice are secondarily apterous, and the body is dorsoventrally flattened. In the Ischnocera, the head is broad and flattened, with the thorax being reduced. In other lice, the head is generally small and the thorax is reduced. The sensory organs are vestigial or absent, and antennae are greatly reduced and concealed in the Amblycera (Grimaldi and Engel 2005). The tarsi of true lice are modified into claw-like structures to grasp the feathers or hairs of the host. The Amblycera and Ischnocera retain chewing mouthparts with which to feed on the skin, hair, and/or feathers of their hosts; blood feeding is minimal in these two groups. In contrast, the mouth of the Rhyncophthirina has been modified into a long rostrum with the mandibles located at the terminus of the rostrum (Grimaldi and Engel 2005). The mandibles are rotated to rasp at the skin of the host, causing blood to pool, from which the louse sucks (Grimaldi and Engel 2005). Mouthparts in the Anoplura have been highly modified to pierce mammal skin and suck blood from the host. Some morphological characteristics of parasitic lice can also be seen in the book louse family Liposcelididae. The Liposcelididae probably represent the closest living relatives of parasitic lice, or they may be part of the parasitic lice (see Discussion below) (Kim and Ludwig 1978; Kim and Ludwig 1982; Lyal 1985; Johnson et al. 2004; Yoshizawa
and Johnson 2010). Liposcelids share similar morphological characteristics with their parasitic relatives, including loss or reduction of wings, eyes, and sensory organs, a smooth broad head, and a reduction of thoracic segments (Grimaldi and Engel 2005). These small lice have been found in animal nests, feeding on shed fur and feathers, and there are a few documented occurrences within the fur or feathers of birds and mammals (see Grimaldi and Engel 2005 for a review).

Higher Level Taxonomic Placement of Parasitic Lice

The Phthiraptera (true lice) belong to the order Psocodea, which also includes the book lice (Liposcelididae) and bark lice (Psocoptera) (Grimaldi and Engel 2005). The book and bark lice are small and often overlooked insects that are diverse and free-living. Many book and bark lice occupy moist areas, where they use modified mouthparts to scrape microorganisms from the surface of detritus (Grimaldi and Engel 2005). However, some species have acquired the ability to survive desiccation, and feed on organic materials in caves, insect and animal nests, and human habitations (Grimaldi and Engel 2005). Collectively, the Psocodea form the sister group to the Condylognatha, which includes both thrips (Thysanoptera) and true bugs (Hemiptera). The thrips and true bugs generally feed on the phloem of plants or are generalist insect predators. However, some members of the true bug group, such as bed bugs and kissing bugs, feed strictly on vertebrate blood. The Psocodea plus the Condylognatha represent a large monophyletic group of insects, known as the Paraneoptera (Grimaldi and Engel 2005). The Paraneoptera have undergone numerous radiations to occupy niches and utilize numerous food resources. Within this group, feeding by piercing and sucking has arisen multiple times, as has parasitism and feeding on vertebrate blood.
Classification Within the Phthiraptera

The accepted view of the evolutionary relationships of lice and their classification has changed considerably over multiple revisions (Smith 2004). Kim and Ludwig (1978, 1982) supported two orders, the Mallophaga (all chewing lice) and the Anoplura (all sucking lice). Lyal (1985) challenged the monophyly of the Mallophaga, instead supporting a topology of two sister clades. For many years, a phylogenetic hypothesis has persisted that contains two sister clades, one clade containing the Amblycera, and the other containing the Ischnocera, Rhyncophthirina, and Anoplura (Smith 2004). Johnson and Whiting (2002) and Barker et al. (2003) were the first to examine the phylogeny, using both nuclear and mitochondrial markers under a maximum parsimony criterion. Johnson et al. (2004) were the first to reconstruct the phylogeny of the Phthiraptera under the maximum likelihood criterion, using mitochondrial sequence data. Their findings suggested that parasitism had arisen multiple times in non-parasitic book lice, and that Phthiraptera was a polyphyletic classification. Johnson et al. (2004) supported two families, the Liposcelididae (nest parasites described previously) and the Pachytroctidae, a small group of free-living book lice, as the closest relatives of the Amblycera. Yoshizawa and Johnson (2010) further tested the polyphyly of the Phthiraptera under maximum likelihood and Bayesian frameworks, using multiple nuclear and mitochondrial markers. They also supported the Phthiraptera as polyphyletic when the Liposcelididae and the Pachytroctidae are excluded (Figure 1-1). Both of these studies suggest that parasitism of vertebrates arose twice, once in the Amblycera and again in the common ancestor of the Ischnocera, Rhyncophthirina, and Anoplura. Under this new phylogeny, Smith et al. (2011) used molecular dating
techniques, calibrated to louse and host fossils, to determine the approximate age when major louse clades diverged. They found that all four Phthiraptera families and the Liposcelididae had diverged during the Mesozoic, prior to the Cretaceous-Paleogene (K-Pg) boundary. Although Smith et al. (2011) found that the major families had Cretaceous origins, the major radiations occurred late in the Cretaceous and early in the Cenozoic. Light et al. (2010) conducted an extensive phylogenetic analysis of the Anoplura, and dated their divergence with a molecular clock calibrated to host divergence. They also found that the Anoplura diversified in the late Cretaceous, but that an additional major radiation occurred after the K-Pg boundary, following the radiation of mammals. Light et al. (2010) found considerable disagreement with accepted anopluran phylogenies, most importantly demonstrating that host switching had occurred multiple times in anopluran history. This suggests that rapid host switching, and subsequent extinctions in some groups, has played a major role in the post-K-Pg diversification of sucking lice. Whereas host specificity and co-speciation appear to be important in the evolution of the Anoplura, host associations may be less informative for louse phylogeny (Light et al. 2010).

Phylogeny and Taxonomy of Human Lice

Humans are parasitized by two species of sucking lice, the pubic louse (Pthirus pubis Linnaeus), and head and body lice (Pediculus humanus Linnaeus). Recent studies have helped to establish the taxonomic rank of the human body louse and the phylogenetic relationships of Pthirus and Pediculus. Light et al. (2008) built a phylogenetic reconstruction of human head and body lice based on mitochondrial sequence data. They found that human head and body lice did not represent different species, but rather were ecomorphs of a single species. Reed et al. (2007) investigated the
relationships of human, chimp (Pediculus schaeffi Fahrenholz) and gorilla (Pthirus gorillae Ewing) lice, using both mitochondrial and nuclear sequence data. Reed et al. (2007) found that human Pediculus species and P. gorillae shared a common phylogenetic history with their primate hosts, but that P. pubis did not. Light and Reed (2009) later supported this topology by using multiple nuclear and mitochondrial markers. Both studies supported a divergence of the human and gorilla species of Pthirus about 3 million years ago (mya). Reed et al. (2007) suggested that the human pubic louse arose from a host switch from gorillas to humans c. 3 mya.

Louse Perspective in the Post-Genomic Era

The studies surveyed above have utilized molecular data to dramatically change our understanding of louse evolutionary history. The sequencing of the human body louse genome presents new opportunities to understand louse evolution and refine louse classification. The human body louse genome is the second sequenced genome of a hemimetabolous insect, providing opportunities for comparative genomics with more distantly related insect groups (Kirkness et al. 2010). The genome sequence itself revealed numerous interesting characteristics in the nuclear and mitochondrial genomes, and holds potential for phylogenetics and population genetics in lice. The human louse possesses the smallest sequenced insect genome, about 108 Mb, but maintains a complete set of protein-coding genes and RNAs for basic metabolic functions (Kirkness et al. 2010). However, unlike in most other insects, the canalization of lice as obligate ectoparasites has led to the loss of genes associated with detecting and responding to a variable environment (Kirkness et al. 2010). The publication of the genome has also sparked interest and a series of publications on the composition of louse mitochondrial genomes and genome recombination (Kirkness et al. 2010; Shao et
al. 2009; Shao and Barker 2011; Cameron et al. 2011). Lice are the only insects known to possess mitochondrial genomes that are fragmented into a series of 18 small chromosomes, known as minichromosomes (Kirkness et al. 2010; Cameron et al. 2011). Not all louse species possess mitochondrial minichromosomes, and Cameron et al. (2011) suspected a link with the functionality of the mitochondrial single-stranded binding protein. Shao and Barker (2011) found evidence that louse mitochondrial minichromosomes have undergone multiple instances of non-homologous recombination, resulting in chimeric combinations of minichromosomes. Mitochondrial sequence data have played a major role in recent phylogenetic revisions of lice, and these studies will provide valuable information for the selection of mitochondrial sequence data for phylogenetics and interpretation of the results.

The published human louse genome also holds potential for the rapid sequencing of multiple genes to build louse phylogenies and to ask evolutionary questions. At the University of Illinois, Kevin Johnson and his research group are currently using a new method, the targeted restricted assembly method, to rapidly obtain phylogenetic markers in lice (Johnson et al. 2011). In this method, conserved gene sequences from the human louse genome are used as a reference for mining high-throughput sequence data from several louse species (K. Johnson, personal communication). The conserved gene library from the human louse allows reads to be readily mapped to genes to generate gene sequences. From the resulting multigene datasets, they plan to build extensive phylogenies of parasitic lice, particularly in the less studied chewing lice. In our laboratory at the University of Florida we have used the genome data to develop a set of microsatellite markers, non-coding regions and coding regions for population genetics and
phylogenetics in human lice and other anoplurans. These data are valuable for looking at population dynamics and migration patterns of human head lice. These lines of research were made possible or greatly accelerated by the release of the human louse genome. This genome has provided a powerful platform for elucidating louse evolutionary history, and will ultimately inform classification.

Endosymbionts of Lice

Insect-bacterium endosymbiosis is a common phenomenon. Numerous insect groups rely on nutritional provisioning by obligate endosymbiotic bacteria to sustain them on nutritionally incomplete diets. Acquisition of an endosymbiont may permit invasion of niches with limited diets, and subsequent radiation. Parasitic lice sustain themselves solely on the keratin-rich dermal components, secretions or blood of their hosts, a potentially incomplete diet. Many parasitic louse species have been found to harbour endosymbiotic bacteria that are potentially engaged in nutritional provisioning. Some endosymbionts of lice are found in the gut, whereas others are primary endosymbionts, being intracellular and housed in specialized structures known as mycetomes (Figure 1-2). Experimental removal of primary endosymbionts from lice results in increased mortality and reduced fitness. These bacteria are suspected of proteobacteria. However, all currently known primary proteobacteria, from the families Enterobacteriales and Legionellales. Lice and their primary endosymbionts deviate more often from a shared, common evolutionary history
than other groups of insects that harbour primary endosymbiotic bacteria. Because primary endosymbiotic bacteria cannot be cultured, genome sequencing and molecular phylogenetics provide the only opportunity to classify these bacteria and develop hypotheses regarding their symbiotic roles.

The prevalence and complexity of interactions between lice and endosymbionts vary across parasitic lice. Ries (1930) was the first to make an extensive review of louse mycetome structure and location. Buchner (1965) summarized the work of Ries and all subsequent work on mycetome structure and bacterial transmission. From these studies, we learned that amblyceran species appear to have limited or no associations with bacteria (Perotti et al. 2009). Proteobacteria are known to inhabit the gut of some amblycerans, and defined mycetome structures are absent (Perotti et al. 2009). The Ischnocera possess mycetomes, but these structures are not well organized (Buchner 1965). Both the Rhyncophthirina and the Anoplura possess structured mycetomes. The mycetomes of the Rhyncophthirina and the Anoplura vary considerably between species in location, structure, and number (Ries 1930; Buchner 1965). Buchner (1965) suspected that differences in housing and transmission of endosymbionts suggested that symbiosis between bacteria and lice arose multiple times.

Roles of Endosymbionts

Unfortunately, very little is known about nutritional provisioning and the metabolic roles of louse endosymbionts. Aschner (1934) and Puchta (1955) conducted experiments with endosymbiont removal in human body lice (Anoplura). They found that when endosymbionts were excluded, human body lice showed reductions in survival and fitness. Puchta (1955) (as interpreted by Perotti et al. 2009) supplemented the diets of lice without endosymbionts, and found that B vitamins
(thiamine, riboflavin, folic acid, pyridoxine, nicotinamide, pantothenate, and biotin) increased survival and fitness. Smith et al. (2010) conducted endosymbiont removal in Columbicola species, and found a reduction in fitness when endosymbionts were removed. The endosymbiont of the slender pigeon louse (Columbicola columbae [Freire and Duarte]) is closely related to Sodalis, a secondary endosymbiont of tsetse flies (Fukatsu et al. 2007). Sodalis is prototrophic for many cofactors and amino acids (Toh et al. 2006), and nutritional provisioning is suspected in other Sodalis-like endosymbionts. The endosymbiont of Columbicola may also be engaged in cofactor or amino acid provisioning.

These studies only attempted to address metabolite provisioning from the endosymbiont to the host. No studies have been conducted on provisioning from the host to the endosymbiont. For other insect endosymbionts, provisioning from the host to the endosymbiont can vary considerably between associations. For example, Carsonella, an endosymbiont of psyllids, requires provisioning of both metabolites and small proteins (Tamames et al. 2007), whereas Buchnera, an endosymbiont of aphids, requires only metabolites from its host (Gosalbes et al. 2006). Much remains unknown about the associations between lice and endosymbionts, and in the post-genomic era, there is the potential to understand these relationships.

Taxonomy and Phylogeny of Louse Primary Endosymbionts

The phylogenetic relationships of only a few louse primary endosymbionts have been investigated. Fukatsu et al. (2006) were the first to characterize the phylogenetic placement of the primary endosymbiont of human body lice. They found that it was a gamma-proteobacterium, and named it Candidatus Riesia pediculicola. Allen et al. (2007) further investigated Ca. Riesia, and found that it had co-speciated with its hosts, the lice
of great apes (Pediculidae and Pthiridae). Allen et al. (2007) also described two more species within Ca. Riesia, and noted the close relationship of Ca. Riesia to Arsenophonus, an endosymbiont of haematophagous dipterans. Novakova et al. (2009) conducted an extensive phylogenetic reconstruction of Arsenophonus species, sampling from endosymbionts of plants, ticks, and four orders of insect. Novakova et al. (2009) found that Ca. Riesia belongs within a clade of Arsenophonus endosymbionts of dipterans. Allen et al. (2009) dated the divergence between the Riesia and Arsenophonus clades at 13-25 mya, making this one of the youngest known insect primary endosymbiont associations. The next youngest involves the primary endosymbiont of the grain weevil, which is estimated to have diverged from the secondary endosymbiont of tsetse flies, Sodalis, 25 mya (Gosalbes et al. 2010). Most insect primary endosymbiont associations range from 50 to 350 mya (Gosalbes et al. 2010), which is considerably older than the louse endosymbiont association.

A sister clade to the hominid lice is the Pedicinidae, comprising the lice of cercopithecid primates (Light et al. 2010). Fukatsu et al. (2009) were the first to investigate the phylogenetic placement of primary endosymbionts in the Pedicinidae. They found that the endosymbiont of Pedicinus obtusus represented a primary endosymbiont independent of Ca. Riesia, and proposed the name Candidatus Puchtella. Interestingly, phylogenetic reconstruction placed Ca. Puchtella close to Wigglesworthia, the primary endosymbiont of tsetse flies (Fukatsu et al. 2009). Like Ca. Riesia, Ca. Puchtella is another louse primary endosymbiont that is closely related to an endosymbiont of a non-louse blood-feeding insect.
Hypsa and Krizek (2007) sampled primary endosymbionts from the anopluran genera Haematopinus, Solenopotes, Pediculus and Polyplax and from the rhyncophthirinan genus Haematomyzus (lice of ungulates, hominids, rodents, and elephants). They found that these louse primary endosymbionts represented five independent clades of endosymbionts. Whereas the primary endosymbionts of Haematopinus, Solenopotes, Pediculus and Haematomyzus were from the Enterobacteriales, the endosymbionts from Polyplax were members of the Legionellales. Collectively, with Ca. Puchtella, this suggests six known independent lineages of louse primary endosymbionts within the Anoplura and the Rhyncophthirina.

Allen et al. (2010) presented the first attempt to determine how many times louse proteobacteria. They conducted a large-scale phylogenetic analysis of thousands of bacterial strains, supporting additional lineages. They found that there are at least ten distinct lineages of endosymbionts in mutualistic relationships with lice.

Although they are much more diverse, very little is known about primary endosymbiosis in the Ischnocera. The primary endosymbiont of Columbicola columbae (slender proteobacteria by Fukatsu et al. (2007). They found it to be closely allied with Sodalis glossinidius, a secondary endosymbiont of the tsetse flies (Diptera). Additional Sodalis-like endosymbionts have recently been described from multiple distantly related insect groups, including a primary endosymbiont in the weevil genus Sitophilus (Coleoptera) (Charles et al. 1997), a secondary endosymbiont of the parasitic fly Craterina melbae (Rondani; Diptera) (Novakova and Hypsa 2007), and a primary endosymbiont of the
stinkbug Cantao ocellatus (Thunberg; Hemiptera) (Kiawa et al. 2007). Additional studies would improve our understanding of ischnoceran endosymbiosis and determine whether polyphyly is present in this group as well.

Perspectives for the Post-Genomic Era

The post proteobacterial primary endosymbionts of true lice. Unlike free-living and pathogenic bacteria, louse primary endosymbionts cannot be cultured and classified with traditional microbiological methods. Sequencing of the 16S rRNA gene by PCR has provided valuable insights into the diversity of primary endosymbionts of all insects. Recent studies have shown that lice house distantly related primary endosymbionts that are closely related to other insect symbionts and pathogens. Although this work has added to our understanding of louse primary endosymbiosis, the A/T-rich and low-complexity regions prevalent in insect endosymbiont genomes often limit PCR techniques.

The recent publication of the genome of Ca. Riesia pediculicola revealed a small genome, 574 kb, similar to what is found in other insect primary endosymbionts (Kirkness et al. 2010). Genomes of this size can easily be sequenced at low cost with current high-throughput sequencing technologies. Although primary endosymbiont bacteria cannot easily be separated from louse tissues, supercomputers and metagenomic algorithms allow for parsing of mixed short-read pools. These technologies have brought whole genome sequences of louse primary endosymbionts within reach, with respect to both budget and time.

An initiative to sequence multiple genomes of louse primary endosymbionts would provide additional markers for phylogenetic analysis and insights into the symbiotic interaction between louse and bacteria. Although 16S rRNA is a valuable resource, Novakova et al. (2009) and Comas et al. (2007) have both
highlighted the importance of using multiple phylogenetic markers when reconstructing the evolutionary history of endosymbionts. Additional markers would provide additional resolution in closely related taxa, and the recent explosion of publicly available bacterial genomes would make multigene phylogeny building feasible.

Unlike the recent queries into the evolutionary history of louse primary endosymbionts, very few attempts have been made to describe the nutritional role that the primary endosymbiont provides for its louse host, and vice versa. Past endosymbiont removal experiments, such as that conducted by Puchta (1955), may not be possible for many species of lice. Whole genome sequences would provide new insights on which we can build hypotheses of metabolic provisioning via metabolites (and potentially proteins) to both the louse host and primary endosymbiont. Collectively, these two lines of study would provide insights into how distantly related endosymbionts come to inhabit louse mycetomes and act as primary endosymbionts engaged in metabolite provisioning. Ultimately, we will learn whether these disparate bacteria have used similar means to provide for their host.

Concluding Remarks

Recent molecular data and increasingly sophisticated phylogenetic analyses are challenging our hypotheses of the evolutionary history of parasitic lice. It appears that parasitism has arisen twice within lice, and that host switching has played an important role in louse speciation. Previous proposals of louse phylogenies based on morphology have been contentious at times. Next-generation sequencing and genome assembly technologies offer an opportunity to test current phylogenetic hypotheses. Rapid sequencing and efficient gene mapping to the human louse genome will allow for extensive multigene phylogenies to be developed and improve our understanding of the evolutionary history and classification of lice.
Molecular data have provided the first insights into louse primary endosymbiont evolutionary history. Parasitic lice and their primary endosymbionts do not share a completely overlapping evolutionary history, which is largely unique in insect endosymbiont systems. Additionally, parasitic louse primary endosymbionts are among the youngest known insect primary endosymbionts. Whether these endosymbionts are being replaced by new endosymbionts or whether louse endosymbiosis has arisen multiple times independently remains an important evolutionary question. Whether these bacteria are fulfilling precisely the same roles in symbiosis is also an intriguing question. The recent surge in available proteobacteria genomes and advances in next-generation sequencing technologies will bring whole genome sequencing within the time and budget limitations of most laboratories. Whole genome sequences will provide additional markers for phylogenetics and help us to understand the roles of primary endosymbionts in lice by comparative genomics, improved phylogenies, and understanding genome evolution.
Figure 1-1. Comparison of the relationships of phthirapteran families. A) Traditional classification based on morphological data. B) Classification based on recent molecular studies. Image based on Yoshizawa K, Johnson KP. 2010. How stable is the "Polyphyly of Lice" hypothesis (Insecta: Psocodea)?: a comparison of phylogenetic signal in multiple genes. Mol Phylogenet Evol 55:939-951.

Figure 1-2. Human head louse nymph, showing the white, circular mycetome in the abdomen where primary endosymbionts are housed. Photo provided by J. M. Allen.
CHAPTER 2
GENOME EROSION AND GENE LOSS IN YOUNG SYMBIONTS: A GENOMIC COMPARISON OF RECENTLY ACQUIRED ENDOSYMBIONTS FROM HUMAN AND CHIMPANZEE LICE

The heritable endosymbionts of insects possess some of the smallest known bacterial genomes. This is likely due to a process of genome erosion occurring during symbiosis. The mode and rate of this erosion may change over time, being fast in newly formed associations and slow in long-established ones. The endosymbionts of human and anthropoid primate lice present a unique opportunity to study genome erosion in newly established (or young) symbionts. This is because we have a detailed phylogenetic history of these endosymbionts with divergence dates for closely related species. This allows for genome evolution to be studied in detail and rates of change to be estimated in a phylogenetic framework. We sequenced the genome of the chimpanzee louse endosymbiont (Candidatus Riesia pediculischaeffi) and compared it to the closely related genome of the human body louse endosymbiont. From this comparison we found evidence for recent genome erosion leading to gene loss in these endosymbionts. Additionally, we searched for genes associated with B vitamin synthesis in these two genomes. These endosymbionts are believed to synthesize B vitamins, and we expect genes encoding proteins involved in B vitamin synthesis to be conserved in these otherwise degraded genomes. All of the expected genes were present, except those involved in thiamine synthesis. These data add to a growing resource of obligate endosymbiont genomes and to our understanding of the rate and mode of genome erosion in obligate animal-associated bacteria. Ultimately, sequencing additional louse p-endosymbiont genomes will provide a model system for studying genome evolution in obligate host-associated bacteria.
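The two-genome comparison summarized above can be illustrated with a small sketch: pair genes by reciprocal best hit, treat genes without a partner as candidate losses in the other lineage, and divide the number of losses by the time since divergence. This is only a minimal illustration under stated assumptions, not the pipeline used in this work. The function names, the gene identifiers, and the 5.0 my divergence time are hypothetical placeholders, and the best-hit tables are assumed to have been computed beforehand by a sequence similarity search.

```python
from typing import Dict, Set, Tuple


def reciprocal_best_hits(
    best_a_to_b: Dict[str, str], best_b_to_a: Dict[str, str]
) -> Set[Tuple[str, str]]:
    """Return pairs (a, b) where b is the best hit of a AND a is the best hit of b."""
    return {
        (a, b)
        for a, b in best_a_to_b.items()
        if best_b_to_a.get(b) == a
    }


def gene_loss_rate(genes_lost: int, divergence_my: float) -> float:
    """Genes lost per million years since the two lineages diverged."""
    if divergence_my <= 0:
        raise ValueError("divergence time must be positive")
    return genes_lost / divergence_my


# Hypothetical best-hit tables (placeholder identifiers, not real genes).
human_best = {"h1": "c1", "h2": "c2", "h3": "c9"}
chimp_best = {"c1": "h1", "c2": "h2", "c9": "h7"}

# h3 -> c9 is not reciprocated (c9 -> h7), so h3 has no putative ortholog.
orthologs = reciprocal_best_hits(human_best, chimp_best)

# Genes present in the human symbiont without a partner in the chimpanzee
# symbiont are counted as candidate losses in the chimpanzee lineage.
human_only = set(human_best) - {a for a, _ in orthologs}

# Placeholder divergence time of 5.0 my, chosen only for illustration.
rate = gene_loss_rate(len(human_only), 5.0)
print(sorted(orthologs), sorted(human_only), rate)
```

With only two taxa, a gene missing from one genome is attributed to loss in that lineage; gene gains and losses shared by both lineages cannot be detected by this kind of comparison.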
Background

Bacterial Endosymbiosis of Insects

Many insect species are engaged in symbiosis with intracellular microbial symbionts (Kikuchi 2009; Duron and Hurst 2013). These symbionts are classified as reproductive parasites, facultative endosymbionts, or obligate beneficial endosymbionts (Dale and Moran 2006; Gosalbes et al. 2010). In some cases microbial symbiosis has permitted insects to persist on specialized diets that are nutritionally incomplete. This is because the endosymbiont provides the insect with the metabolic capacity to synthesize vitamins and/or amino acids absent in their diet (Douglas 1989). Metabolic provisioning by endosymbionts likely facilitated the evolution and radiation of economically and medically important insect groups, including blood-feeding lice. These nutritional provisioning endosymbionts (called primary endosymbionts or p-endosymbionts) are obligate, bacteriome-bound, and vertically inherited. Relationships between insects and p-endosymbionts are complex and ensure that endosymbionts are passed on to new generations (Bright and Bulgheresi 2010).

P-endosymbionts of Human and Primate Lice

The parasitic lice of humans, chimpanzees, and gorillas possess p-endosymbionts belonging to the genus Candidatus Riesia (Sasaki-Fukatsu et al. 2006; Allen et al. 2007; Hypsa and Krizek 2007; Allen et al. 2009; Novakova et al. 2009). These p-endosymbionts are housed in a large bacteriome visible in the abdomen of lice (Ries 1930; Buchner 1965; Eberle and McLean 1982; Eberle and McLean 1983; Sasaki-Fukatsu et al. 2006; Perotti et al. 2007; Bright and Bulgheresi 2010). In sexually mature females, the p-endosymbionts leave the bacteriome and migrate to the ovaries where they are passed on to the next generation of lice (Ries 1930; Buchner 1965; Eberle and McLean 1982; Eberle and McLean 1983; Perotti et al. 2007; Bright and Bulgheresi
2010). Puchta (1955) conducted p-endosymbiont removal and louse feeding experiments with human lice to determine if p-endosymbionts provided lice with an the p-endosymbionts supplied the lice with seven different B vitamins (Puchta 1955 as interpreted by Perotti et al. 2009; Table 2-1). Loss of any of these vitamins reduced survival of louse nymphs (Perotti et al. 2009). One of these vitamins, vitamin B5 or pantothenate, had the greatest effect on louse survival when absent (Perotti et al. 2009). Pantothenate is the precursor to Coenzyme A and its synthesis by louse p-endosymbionts appeared to be crucial to survival of human lice. The genes encoding proteins involved in pantothenate synthesis are located on a small plasmid (Kirkness et al. 2010). Localization of these genes on a plasmid could be significant in control and regulation of the pathway.

Rationale

The genomes of insect p-endosymbionts have been under intense study with regard to minimal genome requirements and genome erosion (Moran and Mira 2001; Silva et al. 2003; Gil et al. 2004; Delmotte et al. 2006; Moran et al. 2008; Gosalbes et al. 2010). This is because these p-endosymbionts possess some of the smallest known bacterial genomes, but likely possessed much larger genomes prior to entering into obligate endosymbiosis, and these genomes eroded during symbiosis (Moran and Mira 2001; Silva et al. 2003). Recently Allen et al. (2009) calculated substitution rates of the 16S rDNA gene of insect p-endosymbionts (see also Allen et al. 2007). They found that p-endosymbionts acquired by insects less than 100 mya have a much higher substitution rate in this gene than those in associations more than 100 my old. Of these p-endosymbionts, Ca. Riesia, the p-endosymbiont of lice, had the fastest mutation rate and is the youngest
known insect p-endosymbiont association (the association began ~13-25 mya; Figure 2-1). This suggests that in p-endosymbionts the mutation rate at this locus is not constant, but that mutations are occurring much more frequently in young p-endosymbionts and slowing as the symbiosis ages. Does global genome erosion proceed in a similar manner, quickly reducing the genomes then dramatically slowing, or is genome erosion a more continuous process with no major changes in rate? Burke and Moran (2011) looked at gene loss in the endosymbiont of aphids, Serratia symbiotica. They found evidence of numerous pseudogenes and small genomic deletions that could effectively reduce the number of genes during genome erosion. Ca. Riesia presents a unique opportunity to look at genome erosion in a recently acquired, rapidly evolving insect p-endosymbiont. This is because Ca. Riesia is the only young, rapidly evolving insect p-endosymbiont for which we have a detailed phylogenetic history with estimated divergence dates for each species (Allen et al. 2007; Allen et al. 2009).

Objectives

Here we sequenced the genome of Candidatus Riesia pediculischaeffi, the p-endosymbiont of chimpanzee lice, and compared its genome with the published genome of Candidatus Riesia pediculicola to estimate the rate of gene loss in louse p-endosymbionts. These two species diverged approximately 5.4 my ago, only 7.6-19.6 my after the symbiosis with lice began (Allen et al. 2009). Therefore, these species present an ideal opportunity to detect recent gene loss in young p-endosymbionts. We also examined the direction of selection on protein-coding genes in these p-endosymbionts. Genes departing from purifying selection might be undergoing genome erosion. We also found numerous short predicted coding sequences of unknown function in louse p-endosymbiont genomes. We surveyed these short predicted
coding sequences for conserved domains or other features that could describe their function. Even though we did not know the role of these coding sequences, their abundance in a small genome warranted additional investigation. Finally, we identified genes coding for proteins involved in B vitamin synthesis to evaluate their roles as vitamin-provisioning symbiotic bacteria.

Methods

Specimen Collection

Specimens of Pediculus schaeffi, the chimpanzee louse, were collected from Pan troglodytes schweinfurthii (individuals Oketch and Ikuru) at the Ngamba Island sanctuary, Uganda. Specimens were stored in 95% ethanol and transported to the United States. From there they were stored in 95% ethanol at -80° Celsius.

Candidatus Riesia pediculischaeffi str. PTSU Genome Sequencing

Genomic DNA was extracted from chimpanzee lice using a phenol-chloroform extraction method (J. Allen, personal communication). Extracts from four lice were pooled and a r Sample Prep Kit, selecting for an average insert size of 350 bases. The library was sequenced paired-end on the Illumina HiSeq2000 platform using the TruSeq SBS sequencing kit and analyzed using pipeline v.1.8, yielding 100 bp reads. Quality of the read library was assessed using FastQC v.0.10.0 (Babraham Bioinformatics).

Assembly and Annotation of the Ca. Riesia Primary Chromosome

We first removed repeat-containing or simple-sequence reads associated with louse telomeres to reduce the library complexity by excluding reads that mapped onto telomere scaffolds of the USDA strain Human Body Louse genome using Bowtie2 v.beta5 local alignment options (Langmead and Salzberg 2012). Remaining reads were
trimmed removed by quality trimming, that read and its paired-end mate were removed from the library. Reads were then assembled de novo into contigs using Velvet v.1.2.02, kmer=41 and long paired settings (Zerbino and Birney 2008; Zerbino 2010). Resulting contigs were compared to eight bacterial genomes (Ca. Riesia pediculicola USDA gi295698239, gi292493920; Sodalis glossinidius str. Morsitans gi85057978, gi85060411, gi85060466, gi85060490; Wigglesworthia glossinidia gi32490749, gi19225058, gi19225058; Photorhabdus luminescens subsp. laumondii gi37524032; Yersinia pestis gi31795333; Bacillus subtilis subsp. subtilis gi223666304; Buchnera aphidicola str. APS gi15616630, gi10957103, gi10957099; and Ca. Blochmannia floridanus gi33519483) using NCBI Blastn v.2.2.25, word size=11 (Altschul et al. 1990). These genomes were selected to be representative of a broad range of gamma-proteobacteria endosymbionts and a gram-positive species. Contigs with significant similarity to bacterial genomes were separated from the general population and considered as a tentative draft genome assembly of the p-endosymbiont. Bowtie2 v.beta5, with end-to-end, very-sensitive options, was used to build a library of reads aligning to the draft bacterial genome. The resulting population of reads was then reassembled de novo using both Velvet v.1.2.02 and ABySS v.1.3.4 into draft genome sequences (Simpson et al. 2009). The two draft sequences were compared; the ABySS assembly used more of the available reads and was selected as the final assembly. The draft genome sequence was annotated using the RAST pipeline (Aziz et al. 2008). The two genomes were compared using SEED individual metabolic
pathway pages and sequence-based comparison tools (Overbeek et al. 2005). Genes detected in the human louse p-endosymbiont genome, but not in the chimpanzee louse genome, were compared to the original population of contigs using blastn to ensure that no endosymbiont contigs were missed. The draft genome assembly of the chimpanzee louse p-endosymbiont genome was compared to the human louse p-endosymbiont genome using CoGe SynMap (Lyons et al. 2008) to determine the order of contigs.

Assembly and Annotation of the Pantothenate Plasmid

A small plasmid that encodes proteins involved in pantothenate biosynthesis was described by Kirkness et al. (2010) in the human louse p-endosymbiont. A LAST (v.3; Frith et al. 2010) search was used to identify the homologous plasmid in the chimpanzee louse p-endosymbiont initial contig library by comparing conserved protein-coding genes in the human louse p-endosymbiont plasmid (genes panB, panC and panE; sequences extracted from gi292493920). This search found matches to all three genes on a single contig in the initial chimpanzee louse contig library. Again, reads mapping to the contig were isolated using Bowtie2 v.beta6 and reassembled using ABySS v.1.3.4. Reads mapping to this contig were viewed in SAM format in Geneious (Biomatters, www.geneious.com) to determine that paired reads spanned the ends of the contig, demonstrating that the assembled contig was circular.

To annotate the chimpanzee louse p-endosymbiont plasmid, we extracted all open reading frames (ORFs) found in the plasmid. We then identified potential homologs by reciprocal tblastx best hits between all chimpanzee louse p-endosymbiont plasmid ORFs and predicted genes in the human louse p-endosymbiont plasmid. The plasmid sequence
was then manually annotated based on reciprocal best hit data, assigning the potential homologs to the predicted function given to the human louse p-endosymbiont plasmid.

Comparison of Chimpanzee and Human Louse P-endosymbiont Genomes

The annotated Ca. Riesia pediculischaeffi genome was added to the CoGe database of genomes (Lyons and Freeling 2008; Lyons et al. 2008). Syntenic blocks common to the Ca. Riesia pediculischaeffi and Ca. Riesia pediculicola genomes were identified using SynMap (Lyons and Freeling 2008; Lyons et al. 2008). Syntenic blocks were viewed as a dotplot in SynMap (see Lyons et al. 2008 for SynMap methods). To examine the effect of selection, we determined the ratio of non-synonymous to synonymous mutations (dn/ds) for all coding sequences found in both the Ca. Riesia pediculicola and Ca. Riesia pediculischaeffi genomes and searched for outliers. The dn/ds ratio is a common method of estimating the direction of selection on coding sequences.

Determining Rate of Gene Loss

All coding sequences in each genome were compared to all coding sequences in the other genome using reciprocal tblastx. Perl scripts were used to sort out only those genes with a reciprocal best hit (scripts obtained from the FAS Center for Systems Biology website). These genes were considered potential orthologs. If two genes in one genome were equally good matches for a gene in the other genome, both were retained as potential orthologs (this helped to correct for error when calculating rates of gene loss in louse symbionts by removing recently duplicated genes). We then identified genes for which an ortholog was not found. A manual check of genome alignments was then performed to determine if any of these genes were actually present, but were not described in the genome annotation for any reason. These genes
could then be considered lost in the other genome and the rate of gene loss calculated using divergence dates described by Allen et al. (2009). When using only two taxa we cannot account for genes lost in both taxa or for gene gains. Therefore, gene loss in a single gene was assumed for this work. Following Degnan et al. (2005) we calculated the rate of gene loss as genes lost over the time since divergence.

Identifying Predicted Genes of Unknown Function

The genome of the human louse p-endosymbiont is rich with short predicted coding sequences of unknown function (hypothetical CDS; Kirkness et al. 2010). We first wanted to determine if these hypothetical CDS possessed a shared conserved domain. To do this we first identified predicted CDS that were under 200 bp in length and were named "hypothetical protein" by the annotation pipeline. These predicted CDS were globally aligned to each other using MUSCLE (implemented through Geneious). The alignments were then used to build a predicted conserved domain using HMMER v.3, hmmbuild (Eddy 2011). We then compared this conserved domain to the non-redundant protein and Swiss-Prot databases using hmmsearch (Eddy 2011) to determine if it was similar to a known domain. The aforementioned process would be useful if these CDS proved to be mobile elements sharing a common history. If instead they were derived from different origins we would gain more information from comparing each individual gene to a database of conserved domains and protein sequences. Therefore, we compared all of the hypothetical CDS individually to the Swiss-Prot database using PSI-BLAST v.2.2.26 (Altschul et al. 1997).

B Vitamin Synthesis Predicted by Genomic Data

We used the SEED database tools and blast searches to predict if the human and chimpanzee louse p-endosymbionts were capable of synthesizing B vitamins
(Overbeek et al. 2005). Perotti et al. (2009), based on Puchta (1955), published a list of B vitamins predicted to be provisioned to the louse by its bacterial p-endosymbiont. We accessed the SEED metabolic subsystem page for biosynthesis and metabolism of each of these B vitamins using the SEED viewer. From these predicted pathways we determined if predicted CDS in the louse p-endosymbiont genomes supported synthesis of each B vitamin.

Results

Chimpanzee Louse P-endosymbiont, Ca. Riesia pediculischaeffi, Genome Assembly

Our assembly resulted in reconstruction of a primary chromosome 576,757 bp in length spanning 5 contigs (N50 = 303,941 bp, GC = 31.79%). A circular plasmid, 5,159 bp in length, was reconstructed from a single contig. Annotation of the primary chromosome resulted in 585 predicted coding sequences, with five additional genes located on the plasmid.

Comparison Between Chimpanzee and Human Louse P-endosymbiont Genomes

The genome of the human louse p-endosymbiont, Ca. Riesia pediculicola, is similar in size and composition to the chimpanzee louse p-endosymbiont genome (Table 2-2). Overall gene order and genomic synteny have been maintained between these species, with the exception of a duplicated region in the human louse p-endosymbiont at the end of the primary chromosome and a small duplicated region in the plasmid (Figure 2-3). These regions encode duplicate copies of 11 genes.

Genes Shared Between and Unique to Louse P-endosymbiont Genomes

Reciprocal tblastx searches revealed 472 potential orthologous CDS shared between the human and chimpanzee louse p-endosymbiont genomes. An additional 84
CDS were predicted in the human louse p-endosymbiont genome but not found in the chimpanzee louse p-endosymbiont, and 118 CDS were predicted in the chimpanzee louse p-endosymbiont genome but not found in the human louse p-endosymbiont. These included CDS with predicted function and small CDS with no known function. Only ten of the CDS unique to the human louse p-endosymbiont and 15 of the CDS unique to the chimpanzee louse p-endosymbiont had a predicted function (Tables 2-1 & 2-2). Using both the total number of unique CDS and the unique CDS with a known function (herein referred to as genes) we estimated a rate of gene loss in the louse p-endosymbiont genomes. When only genes with known function are considered, the predicted rate of gene loss for the chimpanzee louse p-endosymbiont was 1.79 genes/my, whereas the rate for the human louse p-endosymbiont was 2.7 genes/my. If all unique genes and small CDS of unknown function are considered, then the rate increases to 14.29 genes/my for the chimpanzee louse p-endosymbiont and 20 genes/my for the human louse p-endosymbiont.

Departure from Purifying Selection

Of the 472 CDS shared between the louse p-endosymbiont genomes, 470 appear to be maintained by purifying selection (dn/ds < 0.5). Two genes, metK and sdhA, departed from this trend with dn/ds ratios higher than expected under purifying selection (Figure 2-2).

Hypothetical Short Coding Sequences

Both louse p-endosymbiont genomes contain abundant short CDS with no known function. We failed to find a common conserved domain, shared by most or all of these small CDS, consistent with a phage or mobile element. We also failed to find significant
similarity between individual short hypothetical CDS and any entry in the conserved protein databases.

B Vitamin Synthesis

Genes associated with synthesis of the B vitamins riboflavin (B2), folate (B9), nicotinamide (B3), biotin (B7), and pyridoxine (B6) were found on the primary chromosome in both louse p-endosymbiont species (Table 2-1). Synthesis of riboflavin from GTP (ribA, pyrD and pyrR) and transformation of riboflavin to FMN and FAD appear active. Metabolism from nicotinate to NADP+ is complete in both species. The pathway for folate contained the genes folE, folB, folK, folP and folC, and a gene encoding dihydrofolate reductase (DHFR) was found in both taxa. In the human louse p-endosymbiont folB is duplicated, but not in the chimpanzee louse p-endosymbiont. Biosynthesis of biotin from pimeloyl-CoA appeared to be complete in both taxa (biotin protein ligase (EC 6.3.4.15), bioF, bioA, bioD and bioB). De novo biosynthesis of pyridoxine phosphate appears complete in both taxa (gapA, pdxB, pdxF, pdxA and pdxJ). In both taxa, synthesis of pantothenate (B5) was encoded by a small plasmid (panE, panB and panC). In the chimpanzee louse p-endosymbiont, the panE gene, one of three genes involved in pantothenate biosynthesis, is truncated compared to the human louse p-endosymbiont. We failed to detect genes associated with thiamine (B1) biosynthesis, instead finding genes encoding an ABC thiamine transport system that may act as a salvage system for exogenous thiamine in both louse p-endosymbionts (including a thiamine-binding protein, a transmembrane component, and the thiQ thiamine ATP-binding protein).
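The pathway check described above can be sketched in Python. This is a minimal illustration only, not the SEED-based procedure used in this study: the gene lists are the ones named in the paragraph above (not complete pathway models), and the function names and the example annotation set are hypothetical.

```python
# Gene lists as named in the Results text; not exhaustive pathway models.
PATHWAY_GENES = {
    "folate (B9)": ["folE", "folB", "folK", "folP", "folC"],
    "biotin (B7)": ["bioF", "bioA", "bioD", "bioB"],
    "pyridoxine (B6)": ["gapA", "pdxB", "pdxF", "pdxA", "pdxJ"],
    "pantothenate (B5)": ["panE", "panB", "panC"],
}


def missing_genes(annotated_genes):
    """Return {pathway: [listed genes absent from the annotation]}."""
    # Compare case-insensitively, since annotation pipelines differ in casing.
    found = {name.lower() for name in annotated_genes}
    return {
        pathway: [g for g in genes if g.lower() not in found]
        for pathway, genes in PATHWAY_GENES.items()
    }


def pathway_summary(annotated_genes):
    """Return {pathway: True} when every listed gene is annotated."""
    return {p: not absent for p, absent in missing_genes(annotated_genes).items()}


# Hypothetical annotation set in which panE was not called.
example_annotation = [
    "folE", "folB", "folK", "folP", "folC",
    "bioF", "bioA", "bioD", "bioB",
    "gapA", "pdxB", "pdxF", "pdxA", "pdxJ",
    "panB", "panC",
]
summary = pathway_summary(example_annotation)
```

A presence check of this kind cannot detect a truncated gene such as panE in the chimpanzee louse p-endosymbiont; that still requires inspecting the alignment.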
Concluding Remarks

Genome Structure

Ca. Riesia species possess small genomes (576,757 bp and 582,127 bp) with a low percent of GC bases (37% and 35%) that encode only a few hundred (556 and 585) predicted protein coding sequences. The human louse p-endosymbiont genome contains duplicated regions both at the end of the primary chromosome and on a small plasmid, resulting in duplication of 11 genes. We did not find any evidence of these duplicated regions in the chimpanzee louse p-endosymbiont genome. By comparing only two genomes we cannot determine if these duplicated regions represent gains in the human louse p-endosymbiont genome or losses in the chimpanzee louse p-endosymbiont genome. However, it seems parsimonious to conclude that they represent gains in the human louse p-endosymbiont genome because the duplicated gene sequences are identical.

Gene Loss

Gene loss is important in shaping the genomes of insect p-endosymbionts (Moran and Mira 2001; Delmotte et al. 2006; Burke and Moran 2011). In young endosymbionts, genes unnecessary to maintain the symbiosis may be lost quickly. To determine if whole gene loss was occurring in either or both louse p-endosymbiont genomes, we identified genes that were unique to each p-endosymbiont genome or present in both. We found that 202 of the predicted CDS were unique to one louse p-endosymbiont genome. This suggests that gene loss is occurring in these genomes; however, we cannot differentiate between gene loss and addition when comparing two taxa. We have considered only loss for the purpose of this article. The number of unique genes was surprisingly high, but further investigation found that only a fraction of
these predicted CDS showed homology to genes with known function from other bacterial genomes. Using a detailed phylogeny of human and primate louse p-endosymbionts with dates of speciation, we were able to estimate rates of gene loss in these two p-endosymbionts. When only those genes with a predicted function are considered, we find that the human louse p-endosymbiont is losing 2.7 genes/my and the chimpanzee louse p-endosymbiont is losing 1.79 genes/my. Rates of gene loss have been previously reported for Blochmannia floridanus and Blochmannia pennsylvanicus, p-endosymbionts of carpenter ants, by Degnan et al. (2005) using similar methods. Blochmannia entered into symbiosis ~30 mya and these two species diverged ~16-20 mya (Degnan et al. 2005), making them slightly older than Ca. Riesia, the p-endosymbionts of lice. The ancestor of Ca. Riesia entered into symbiosis with a parasitic louse between 13-25 mya, and the p-endosymbionts of chimpanzee lice and human lice co-speciated with their hosts ~5.6 mya (Allen et al. 2009). Degnan et al. (2005) found that B. floridanus lost 1.56-1.25 genes/my (a similar rate to louse p-endosymbionts), but that B. pennsylvanicus lost genes at a much slower rate of 0.25-0.2 genes/my (see also Delmotte et al. 2006). Degnan et al. (2005) also reported rates of gene loss in Buchnera species, which lost 0.6-0.42 genes/my (Table 2-3). Buchnera (p-endosymbionts of aphids) have been in association with aphids for >150 my (Gosalbes et al. 2010), much longer than either Blochmannia or Ca. Riesia. B. floridanus and both Ca. Riesia species are losing genes at a faster rate than the ancient Buchnera p-endosymbiont, but B. pennsylvanicus showed the slowest rate of gene loss. Unexpectedly, the younger p-endosymbiont Ca. Riesia is not losing genes at a faster rate than B. floridanus. The recently sequenced genome of Blochmannia vafer is
smaller than either B. floridanus or B. pennsylvanicus and may prove to have a faster rate of gene loss (Williams and Wernegreen 2010). Delmotte et al. (2006) interpreted this rate heterogeneity to mean that the rate of gene loss in insect p-endosymbionts is lineage specific. We also see some heterogeneity in louse p-endosymbionts. However, additional sampling of other insect p-endosymbionts is needed to determine if gene loss is truly lineage specific or if we can infer generalities about these rates.

Effect of Selection

If gene loss is occurring by gene degradation, then we might expect some genes to be evolving neutrally. We found evidence that two genes have departed from purifying selection, having dn/ds ratios much higher than other genes. One gene appears to be evolving neutrally and the other may be under positive selection in the chimpanzee louse p-endosymbiont genome, based on dn/ds ratio alone. However, in some instances AT-biased mutation in insect endosymbionts may lead to detection of positive selection in genes under purifying selection (Toft and Anderson 2010). This may lead to false detection of positive selection in obligate p-endosymbiont genomes. Therefore it is difficult to say definitively if any louse p-endosymbiont genes are evolving under positive selection.

Abundant Genes of Unknown Function

Both sequenced Ca. Riesia genomes possess abundant small (<200 bp) predicted CDS of unknown function identified by the ab initio gene finder Glimmer (Delcher et al. 1999). Because one genome was sequenced using long-read technology and the other using short-read next-generation sequencing, we do not believe these short CDS represent sequencing or assembly error (see Kirkness et al. 2010 for assembly of the Ca. Riesia pediculicola genome). These CDS are common in
both Ca. Riesia genomes, but many are unique to one genome or the other. This would suggest a rapid loss or expansion of these elements in each genome, consistent with a mobile element or phage. Mobile element activity could also help to explain the presence of duplicated regions in the human louse p-endosymbiont. If they are mobile elements or phage-associated genes, then we should have detected similarities in overall nucleotide sequence or the presence of a conserved domain. Our searches failed to find evidence of conserved features. It is possible they represent an extinct mobile element or phage and that their structure has been disrupted by mutation. Sequencing of additional Ca. Riesia species that diverged earlier during the symbiosis, such as p-endosymbionts of gorilla or human pubic lice (Allen et al. 2009), would allow us to differentiate between expansion and loss of these short CDS. Expansion would be indicative of a mobile element. Another possibility is that they represent degraded bacterial genes that are no longer identifiable. In this case we would be dramatically increasing the rate of gene loss in Ca. Riesia (from 1.79-2.7 genes/my to 14.29-20 genes/my). However, this must be interpreted with caution, as hypothetical CDS may contribute little or not at all to a bacterial phenotype (Jackson et al. 2002, Lerat and Ochman 2005, Konstantinidis et al. 2006). Therefore, we might expect only the rate of gene loss for identified genes to represent gene loss that might impact bacterial phenotype.

The Role of Louse P-endosymbionts

B vitamin synthesis has been considered to be a primary function of p-endosymbionts for parasitic lice of mammals (Puchta 1955; Perotti et al. 2009). Many of these vitamins are in low concentration or unavailable in vertebrate blood (Perotti et al. 2009). When human lice are treated to remove
symbionts, the absence of different B vitamins had varying effects on louse survival and reproduction (Puchta 1955; Perotti et al. 2009). Here we found that the genomes of both human and chimpanzee louse p-endosymbionts encoded genes involved in the synthesis of these vitamins except for thiamine (Table 2-1). In both Ca. Riesia species we found genes encoding a mechanism to import thiamine across the p-endosymbiont's cell membrane, the thiamine ABC transporter. This is similar to an endosymbiont (Sodalis glossinidius) from the tsetse fly, another blood-feeding insect (Snyder et al. 2010). There, thiamine monophosphate is scavenged from exogenous sources by Sodalis using the thiamine ABC transporter, and the available exogenous thiamine is synthesized by a different endosymbiont, Wigglesworthia glossinidia (Snyder et al. 2010). It was surprising to see a similar mechanism in place for uptake of exogenous thiamine in p-endosymbionts of lice, as there are no other known endosymbionts present that could complement biosynthesis. Also, Sodalis possesses thiamine kinase and thiamine monophosphate kinase, which convert the scavenged thiamine monophosphate to thiamine pyrophosphate and which Ca. Riesia appears to lack. Ca. Riesia must possess an unknown kinase to salvage thiamine monophosphate. The B vitamins whose absence in symbiont-free lice had the strongest effect on louse survival can be synthesized by both louse p-endosymbionts. De novo pantothenate biosynthesis, considered to be the most important role of louse p-endosymbionts (Puchta 1955; Perotti et al. 2009), is encoded on a plasmid in both species. This plasmid may increase efficiency of production, particularly in the chimpanzee louse p-endosymbiont, where one gene in the de novo pantothenate synthesis pathway is truncated.
Here we have sequenced the genome of a rare p-endosymbiotic bacterium, Ca. Riesia pediculischaeffi, found only in the parasitic lice of chimpanzees. Comparison of this genome with the genome of a closely related p-endosymbiont from human parasitic lice, Ca. Riesia pediculicola, revealed recent genome erosion and gene loss. Surprisingly, this loss is not occurring significantly faster than in slightly older symbionts from ants, despite earlier evidence that the 16S rDNA genes are experiencing a higher mutation rate in louse p-endosymbionts (Allen et al. 2009). This genome sequence also revealed two surprises: an abundance of short coding sequences of unknown function, and the absence of genes for synthesis of vitamin B1, previously thought to be important to the symbiosis with lice. Additional sequencing of Ca. Riesia species would make it possible to approach gene loss in this clade within a more rigorous phylogenetic framework using ancestral state reconstruction.

Data Deposition

This Whole Genome Shotgun project has been deposited at DDBJ/EMBL/GenBank under the accession AWXV00000000. The version described in this paper is version AWXV01000000. This genome was sequenced under the BioProject=PRJNA2187, as Ca. Riesia pediculischaeffi str. PTSU. The genome has Candidatus Riesia
Figure 2-1: Evolutionary history of louse p-endosymbionts, Ca. Riesia species, with dates of species divergence. Origin of symbiosis at <25 million years ago and subsequent speciation (based on results by Allen et al. 2009).
Figure 2-2: Schematic of sequencing and bioinformatics process to isolate and de novo assemble chimpanzee louse p [...] contigs colored in blue, louse p-endosymbiont reads and contigs colored in red.
Figure 2-3: Alignment of the 5.2 kb plasmid from the chimpanzee louse p-endosymbiont to the 7.7 kb plasmid from the human louse p-endosymbiont. Black inner ring is the human louse p-endosymbiont plasmid reference sequence, red ring represents the annotation of the human louse p-endosymbiont plasmid, and purple outer ring is the alignment of the query chimpanzee louse p-endosymbiont plasmid sequence. Genes involved in de novo synthesis of pantothenate (panB, panC and panE) are labeled in red. Image generated using BRIG (Alikhan et al. 2011).
Table 2-1: B vitamins predicted to be supplied to human lice by their p-endosymbiont, Ca. Riesia pediculicola (based on Puchta 1955 as interpreted by Perotti et al. 2009), and whether genes associated with vitamin synthesis were detected in the p-endosymbiont genomes.

B vitamin          Effect on louse if vitamin absent               Human louse p-endosymbiont   Chimpanzee louse p-endosymbiont
Thiamine (B1)      High female mortality, males survive to adult   No, transport present        No, transport present
Riboflavin (B2)    High mortality during 2nd molt                  Yes                          Yes
Folic acid (B9)    High mortality during 2nd & 3rd molts           Yes                          Yes
Pyridoxine (B6)    High mortality during 2nd molt                  Yes                          Yes
Nicotinamide (B3)  High mortality during first molt                Yes                          Yes
Pantothenate (B5)  Near complete mortality during 1st molt         Yes, plasmid based           Yes, plasmid based
Biotin (B7)        High mortality during 1st molt                  Yes                          Yes

Table 2-2: Genome and assembly statistics of louse p-endosymbionts. CDS total = sum of protein coding sequences found in a given genome. CDS unique = sum of protein coding sequences found in one louse p-endosymbiont genome, but not the other. Human louse p-endosymbiont sequenced by Kirkness et al. (2010). Chimpanzee louse p-endosymbiont sequenced in this study.

                          Human louse p-endosymbiont   Chimpanzee louse p-endosymbiont
Primary chromosome
  number of bases         582127                       576757
  number of contigs       n/a                          5
  percent GC bases        28.57                        31.79
  number of CDS total     556                          585
  number of CDS unique    84                           118
Pantothenate plasmid
  number of bases         7737                         5159
  number of contigs       n/a                          1
  percent GC bases        35.25                        37.1
  number of CDS total     12                           5
  number of CDS unique    7                            0
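The "CDS unique" counts in Table 2-2 come from the reciprocal best hit comparison described in the Methods. The following Python sketch illustrates that idea under stated assumptions; it is not the Perl scripts used in this study. It assumes the tblastx output has already been reduced to (query, subject, bit score) tuples, and all function names and gene identifiers are hypothetical. Ties for the top score are retained, mirroring the rule that equally good matches were both kept as potential orthologs.

```python
from collections import defaultdict


def best_hits(hits):
    """Map each query to the set of subjects tied for its top bit score."""
    top_score = {}
    best = defaultdict(set)
    for query, subject, score in hits:
        if query not in top_score or score > top_score[query]:
            top_score[query] = score
            best[query] = {subject}
        elif score == top_score[query]:
            best[query].add(subject)
    return best


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return (a, b) pairs where each gene is a best hit of the other."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    pairs = set()
    for a, subjects in best_ab.items():
        for b in subjects:
            if a in best_ba.get(b, set()):
                pairs.add((a, b))
    return pairs


def unique_genes(all_genes, pairs, side):
    """Genes with no reciprocal best hit; side is 0 for genome A, 1 for B."""
    matched = {pair[side] for pair in pairs}
    return set(all_genes) - matched


# Toy example with hypothetical gene identifiers.
a_vs_b = [("a1", "b1", 100), ("a1", "b2", 50), ("a2", "b2", 80), ("a3", "b2", 80)]
b_vs_a = [("b1", "a1", 100), ("b2", "a2", 80), ("b2", "a3", 80), ("b3", "a1", 20)]
pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
only_in_b = unique_genes(["b1", "b2", "b3"], pairs, side=1)
```

In the toy data, a2 and a3 tie as best hits of b2 and are both retained, while b3 has no reciprocal partner and would be a candidate for the manual alignment check before being counted as unique.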
Table 2-3: Age of associations between p-endosymbionts and insects and the estimated rate of gene loss in each p-endosymbiont. Rates and ages for Blochmannia and Buchnera species from Degnan et al. (2005), ages of louse p-endosymbionts from Allen et al. (2009), and rates of gene loss in louse p-endosymbionts calculated in this study.

P-endosymbiont                       Host insect       Age of symbiosis   Rate of gene loss
Candidatus Riesia pediculicola       Human lice        13-25 my           1 gene/0.37 my
Candidatus Riesia pediculischaeffi   Chimpanzee lice   13-25 my           1 gene/0.56 my
Blochmannia pennsylvanicus           Carpenter ants    ~30 my             1 gene/4.0-5.0 my
Blochmannia floridanus               Carpenter ants    ~30 my             1 gene/0.64-0.80 my
Buchnera aphidicola                  Aphid species     >150 my            1 gene/1.70-2.38 my
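The louse rows of Table 2-3 can be reproduced with simple arithmetic, following the definition of rate as genes lost over time since divergence. The Python sketch below is illustrative and its function names are hypothetical. The inputs are taken from the text: 10 CDS with predicted function unique to the human louse p-endosymbiont (treated as lost from the chimpanzee louse p-endosymbiont), 15 unique to the chimpanzee louse p-endosymbiont (treated as lost from the human louse p-endosymbiont), and a divergence of ~5.6 my (Allen et al. 2009).

```python
def gene_loss_rate(genes_lost, divergence_my):
    """Genes lost per million years since divergence."""
    if divergence_my <= 0:
        raise ValueError("divergence time must be positive")
    return genes_lost / divergence_my


def my_per_gene(rate):
    """Express a rate (genes/my) as the interval per lost gene (my/gene)."""
    return 1.0 / rate


DIVERGENCE_MY = 5.6  # human and chimpanzee louse p-endosymbionts

# Genes unique to one genome are assumed lost from the other.
chimp_rate = gene_loss_rate(10, DIVERGENCE_MY)
human_rate = gene_loss_rate(15, DIVERGENCE_MY)

print(f"chimpanzee: {chimp_rate:.2f} genes/my, 1 gene/{my_per_gene(chimp_rate):.2f} my")
print(f"human:      {human_rate:.2f} genes/my, 1 gene/{my_per_gene(human_rate):.2f} my")
```

This gives 1.79 genes/my (1 gene/0.56 my) for the chimpanzee louse p-endosymbiont and 2.68 genes/my (1 gene/0.37 my) for the human louse p-endosymbiont, in line with the reported 1.79 and 2.7 genes/my.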
CHAPTER 3
BACTERIAL ENDOSYMBIOSIS IN A MARINE LOUSE (PHTHIRAPTERA, ANOPLURA) AND DRAFT GENOME SEQUENCES OF A NEW RICKETTSIA AND SODALIS-LIKE ENDOSYMBIONT.

The capture and maintenance of intracellular heritable bacteria (endosymbionts) was likely key in the evolution of insect species with limited diets. One such example is the lice that parasitize mammals. These lice feed exclusively on mammal blood, a food source that lacks essential vitamins. Parasitic lice rely on endosymbionts to provision vitamins absent in their diet. Unlike other insect-bacterial endosymbioses, louse endosymbionts do not form a single monophyletic assemblage. This suggests that endosymbiosis between lice and bacteria has evolved multiple times or that new bacteria frequently replace endosymbionts. Previous studies have described bacterial symbioses in diverse louse species, but almost nothing is known about bacterial symbiosis of lice parasitizing marine mammals. These lice are subject to extreme environmental conditions, which may limit their opportunity to feed. This may put additional demands on the lice and their endosymbionts. Here we show that lice parasitizing the northern fur seal have associations with bacteria unlike previously described relationships in other louse species. These lice harbor two bacterial species, one an intracellular endosymbiont and the other an abundant Rickettsia. The endosymbiont belongs to a clade of closely related endosymbionts found in tsetse flies (Sodalis glossinidius), weevils, spittlebugs, and bird lice. The Rickettsia species is closely related to a tick-vectored pathogen that causes Rocky Mountain spotted fever. In other louse species, endosymbionts are housed in cells called bacteriocytes. These cells form specialized structures, called bacteriomes, for storage of endosymbionts. We found simple bacteriocytes in the cuticle of these lice. The Sodalis-like endosymbionts were localized
to bacteriocytes as well as being present throughout fatty tissues of the body cavity and reproductive tissues. Rickettsia were difficult to isolate, but appear to be diffuse in [...] Both of these bacteria have larger genomes than the well-characterized bacteriome-bound insect endosymbionts, but have genomes similar to insect-vectored pathogens and facultative endosymbionts. We assume one or both bacteria are essential for development or survival of these lice. These data suggest we have found a louse species with a recently acquired endosymbiont, where there has not been sufficient time for more complex associations to develop (such as localization of endosymbionts to a bacteriome). Further, we suggest that facultative endosymbionts may provide a source for obligate vertically inherited endosymbionts in insects. Collectively our findings describe an unusual louse-bacterial symbiosis, an endosymbiont with a comparatively large genome, and an undescribed insect-associated Rickettsia species. These findings also demonstrate the importance of broad sampling when documenting the diversity of animal-associated bacteria.

Background

Numerous insect species have formed complex symbioses with heritable bacteria (Bright and Bulgheresi 2010; Gosalbes et al. 2010). These symbionts are generally classified as reproductive parasites, facultative beneficial endosymbionts, or obligate endosymbionts (Dale and Moran 2006; see also Oliver et al. 2010; Hosokawa et al. 2010). Reproductive parasites are passed on vertically, but show no obvious benefit to the host. Facultative endosymbionts (often referred to as secondary or s-endosymbionts) confer beneficial traits to the host (Oliver et al. 2010) and are primarily inherited vertically, but occasionally undergo horizontal transmission (Dale and Moran 2006). Obligate endosymbionts (referred to as primary or p-endosymbionts) are only
vertically inherited and are housed in specialized structures called bacteriomes (Buchner 1965; Dale and Moran 2006; Bright and Bulgheresi 2010). These p-endosymbionts are typically involved in nutritional provisioning, and in sucking lice (Phthiraptera: Anoplura) they synthesize vitamins not available in the insect's diet (Puchta 1955; Perotti et al. 2009). The sucking lice and the closely related Rhyncophthirina have modified mouthparts to feed exclusively on mammal blood (Grimaldi and Engel 2005). Many if not all of these lice likely rely on nutritional supplements from endosymbiotic bacteria (Puchta 1955; Perotti et al. 2009). These bacteria supply B vitamins absent in vertebrate blood (based on research done with human lice by Puchta 1955; see also Perotti et al. 2009). Ries, Aschner, and Weel were the first to publish on the endosymbiotic bacteria of blood-feeding lice (Ries 1931; Aschner and Ries 1932; Ries 1932; Ries 1933; Ries and Weel 1933; Ries 1935; Buchner 1965). They found that diverse louse species are infected by obligate endosymbiotic bacteria, now known to belong to the gamma-proteobacteria (Allen et al. 2007; Allen et al. 2009; Hypsa and Krizek 2007; Fukatsu et al. 2009; Novakova et al. 2009; Allen et al. in rev.). Just as with other p-endosymbionts, these bacteria are housed in specialized organs, known as bacteriomes, and vertically transmitted from mother to offspring (Ries 1931; Aschner and Ries 1932; Ries 1932; Ries 1933; Ries and Weel 1933; Ries 1935; Buchner 1965; Eberle and McLean 1982 and 1983). However, the location in the louse, general characteristics, and tissue type forming bacteriomes can differ across species (see Buchner 1965 for a general overview). Based on these observations, Buchner (1965) suggested that bacteriomes had originated independently in different louse
species and were not derived from an ancestral structure. Later, evidence was found using molecular phylogenetics that the bacteria inhabiting these bacteriomes were distantly related across the gamma-proteobacteria (Allen et al. 2009; Hypsa and Krizek 2007; Fukatsu et al. 2009; Novakova et al. 2009; Allen et al. in rev.; see also Boyd and Reed 2012 for review). This suggests that louse-bacterial symbiosis has originated multiple times or that endosymbionts have been replaced multiple times. This is unlike other insect-bacterial symbioses, which have been stable for tens to hundreds of millions of years and in which endosymbiont replacement appears rare (Clark et al. 2000; Gosalbes et al. 2010). In addition to the beneficial bacteriomic endosymbionts, lice may also be infected by reproductive parasites (Wolbachia) and transmit disease-causing bacteria (Rickettsia and Bartonella) belonging to the alpha-proteobacteria (Bozeman et al. 1981; Gross 1996; Fournier et al. 2002; Houhamdi et al. 2002; Robinson et al. 2003; Kyei-Poku et al. 2005; Gillespie et al. 2009). Wolbachia infections have been found in diverse louse species (Kyei-Poku et al. 2005); however, Rickettsia prowazekii and Bartonella quintana are only known to infect and be transmitted by human clothing lice, Pediculus humanus corporis (and possibly rodent-associated lice; Bozeman et al. 1981; Gross 1996; Fournier et al. 2002; Houhamdi et al. 2002; Robinson et al. 2003; Gillespie et al. 2009). Both are deadly pathogens of humans, causing epidemic typhus and trench fever (Bozeman et al. 1981; Gross 1996; Houhamdi et al. 2002; Robinson et al. 2003; Gillespie et al. 2009). Infection of human body lice by R. prowazekii also results in [...] other pathogenic species of Rickettsia have been shown to infect lice in the laboratory
and all resulted in death of the louse, similar to R. prowazekii infections (Houhamdi et al. 2003; Houhamdi and Raoult 2006). There has been extensive documentation of bacterial symbiosis in many louse species, but very little is known about the bacterial endosymbionts of lice parasitizing marine mammals (Buchner 1965). The walruses and seals (Odobenidae, Otariidae, and Phocidae) are parasitized by four genera of sucking lice (Durden and Musser 1994). These lice are obligate ectoparasites, remaining on their hosts at all times. These lice are exposed to extreme conditions and may have limited opportunities to feed while the host is underwater (Kim 1971). This may place additional demands on the endosymbionts of these lice. Allen et al. (in rev.) described a gamma-proteobacterial endosymbiont from lice parasitizing the northern fur seal based on 16S ribosomal DNA. Here we describe in detail two abundant bacteria from lice (Proechinophthirus fluctus Osborne; Phthiraptera, Anoplura) parasitizing the northern fur seal (Callorhinus ursinus Gray; Pinnipedia, Otariidae). One is an endosymbiont that corresponds to the endosymbiont described by Allen et al. (in rev.). The northern fur seal is a marine mammal found in the Bering Sea, Pacific Ocean, and Sea of Japan (Nowak 1991). These seals travel great distances across frigid oceans while diving for food, coming to land at rookeries primarily on small isolated islands. We obtained preserved specimens of P. fluctus collected from seal pups at a rookery on the Pribilof Islands, Alaska, USA (Figure 3-1). Next-generation sequencing was used to sequence the genomes of bacteria in these lice for comparison to other insect endosymbionts. Fluorescent in situ hybridization of 16S rDNA was used to locate these bacteria within the lice and to characterize and classify the type of symbiosis and mechanisms of
transmission. We then built phylogenies from protein coding and ribosomal subunit genes to better understand the relationship of these bacteria to other insect-associated bacteria. From these data we attempt to evaluate if one or more of these symbionts are pathogens and to infer sources of obligate endosymbionts in lice.

Methods

Collection of Lice

Proechinophthirus fluctus Osborne (Phthiraptera: Anoplura); Ex. Callorhinus ursinus Linnaeus (Otariidae); USA, Alaska, Pribilof Islands, St. Paul Island rookery; leg. S Vignigeri Rolf. Lice were preserved in 95% ethanol and stored at -80 C.

DNA Extraction and Genome Sequencing

Genomic DNA was extracted and isolated from whole lice using a phenol-chloroform method. Extracts from four lice were pooled and used to construct a random shotgun library for next-generation sequencing. The library was constructed using the Illumina TruSeq sample preparation kit with a targeted insert size of 300-400 bp. DNA fragments were sequenced paired-end on the Illumina HiSeq 2000 platform using the TruSeq SBS sequencing kit, yielding 100 bp reads.

Isolation of Bacterial Sequence Data

Isolation and assembly of the bacterial genomes from the northern fur seal louse followed methods described in Chapter 2 to sequence the p-endosymbiont of the [...] end based on base quality score. After quality trimming, reads shorter than 75 bases were deleted along with their mate pairs. Remaining reads were de novo assembled into contigs using ABySS (Simpson et al. 2009). Resulting contigs were compared to a
library of bacterial genomes including proteobacterial endosymbionts and Bacillus (a gram-positive soil bacterium and gut commensal of humans) using blastn (Ca. Riesia pediculicola USDA gi295698239, gi292493920; Sodalis glossinidius str. Morsitans gi85057978, gi85060411, gi85060466, gi85060490; Wigglesworthia glossinidia gi32490749, gi19225058, gi19225058; Photorhabdus luminescens subsp. laumondii gi37524032; Yersinia pestis gi31795333; Bacillus subtilis subsp. subtilis gi223666304; Buchnera aphidicola str. APS gi15616630, gi10957103, gi10957099; Ca. Blochmannia floridanus gi33519483; and Rickettsia conorii str. Malish 7 gi15891923; Altschul et al. 1990). Contigs showing significant similarity to these genomes were considered to be parts of bacterial genomes. Reads used in the assembly of these bacterial contigs were identified and isolated using Bowtie v.beta2.6 (Langmead 2012).

Reconstruction of Endosymbiont Genomes

Our initial alignment of contigs to bacterial genomes suggested there were two species of bacteria present in the lice. One was a Rickettsia and the other a close relative of Sodalis-like endosymbionts. Before we could reconstruct the genomes of these two bacteria we needed to separate sequence reads belonging to each bacterial genome. To do this we first de novo assembled all bacterial reads into contigs. These contigs were then aligned to Rickettsia (R. conorii str. Malish 7 gi:15891923) and Sodalis (S. glossinidius str. Morsitans gi:85057978) genomes using blastn. Contigs were assigned to Rickettsia or the Sodalis-type endosymbiont based on blast scores (i.e., those contigs with a significantly better alignment score to the R. conorii genome and lower e-value were considered to belong to the Rickettsia genome, and likewise for better alignment scores to the Sodalis-type endosymbiont genome). Reads that went into the construction of these contigs were then isolated as belonging to either the
Rickettsia or Sodalis genomes. Both genomes were then independently assembled de novo using ABySS. Both draft genomes were then annotated using Rapid Annotation using Subsystem Technology (Overbeek et al. 2005, Aziz et al. 2008).

Testing for Rickettsia peacockii pRPR Plasmid

Some species of Rickettsia carry a plasmid that can be difficult to detect. A species closely related to the Rickettsia we found in the fur seal louse, R. peacockii, carries a 26,406 bp plasmid (Felsheim et al. 2009). The R. peacockii primary chromosome and the plasmid both carry a DnaA gene. The sequences of these two genes are considerably different. To determine if we had potentially missed this plasmid in our reconstruction of the fur seal louse Rickettsia, we aligned sequence data from our new Rickettsia to both copies of this DnaA gene from R. peacockii. We did this using Bowtie v.beta2.6, requiring that reads align end to end with a minimum of substitutions. From this we can determine whether one or two copies of the gene are present: alignment of sequence data to only one gene suggests that only the copy on the primary chromosome is present, whereas alignment to both copies suggests that both the primary chromosome and the plasmid are present.

Identification of Rickettsia Pathogenicity Genes

Felsheim et al. (2009) identified genes whose products may be involved with virulence of spotted fever group Rickettsia in vertebrates. Based on their report, we obtained sequences of these genes from the Rickettsia rickettsii SS genome (Felsheim et al. 2009 reported the gene identifiers). We used reciprocal tblastx to identify potential homologs of these genes in this new louse-associated genome. These genes were aligned to their respective homologs in R. rickettsii SS using MUSCLE (Edgar 2004) and alignments were viewed in Geneious (Biomatters, www.geneious.com).
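The contig binning step described under Reconstruction of Endosymbiont Genomes can be sketched as follows. This Python example is a hedged illustration, not the procedure actually run in this study: it assumes each contig's best blastn hit against each reference has been reduced to a (bit score, e-value) tuple, and the `min_ratio` threshold is an assumption introduced here, since the text does not state how a "significantly better" score was judged. The function name is hypothetical.

```python
def assign_contig(rickettsia_hit, sodalis_hit, min_ratio=1.5):
    """Assign a contig to a genome from its best hit to each reference.

    Each hit is a (bit_score, e_value) tuple, or None if there was no hit.
    min_ratio is an assumed threshold for a "significantly better" score.
    """
    if rickettsia_hit is None and sodalis_hit is None:
        return "unassigned"
    if sodalis_hit is None:
        return "Rickettsia"
    if rickettsia_hit is None:
        return "Sodalis"

    r_score, r_evalue = rickettsia_hit
    s_score, s_evalue = sodalis_hit
    # Require both a clearly higher bit score and an e-value at least as low.
    if r_score >= min_ratio * s_score and r_evalue <= s_evalue:
        return "Rickettsia"
    if s_score >= min_ratio * r_score and s_evalue <= r_evalue:
        return "Sodalis"
    return "ambiguous"


# Hypothetical hits for four contigs.
calls = [
    assign_contig((500.0, 1e-100), (50.0, 1e-5)),
    assign_contig(None, (200.0, 1e-50)),
    assign_contig((100.0, 1e-20), (110.0, 1e-22)),
    assign_contig(None, None),
]
```

Keeping an explicit "ambiguous" class avoids forcing contigs with comparable scores into either genome; reads from such contigs could be set aside before the two genomes are assembled independently.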
Phylogenetic Analysis

We used three protein-coding genes, AtpA, CoxA, and GltA (identified by Weinert et al. (2009) as useful phylogenetic markers in Rickettsia), to build a phylogenetic tree of our new Rickettsia and those Rickettsia species with available genomes from which we could obtain all three genes. The three protein-coding genes were identified in our draft genome sequence and in other Rickettsia and Orientia genomes (accessed through genomeevolution.org; Lyons and Freeling 2008; Lyons et al. 2008) using tblastx. Gene sequences were aligned using Muscle as implemented in Geneious. Aligned gene sequences were then concatenated to form a supermatrix. A maximum likelihood tree was constructed in RAxML (VI-HPC; Stamatakis 2006) from this supermatrix, under a of 1000 rapid bootstrap replicates. The tree was rooted to Orientia and viewed in FigTree (http://tree.bio.ed.ac.uk/software/figtree/). A similar method was used to obtain the DNA sequences encoding the 16S ribosomal RNA (16S rDNA) in the same Rickettsia and Orientia species. Again we aligned the 16S rDNA sequences in Muscle. RAxML (VI-HPC) was used to find the percent of 1000 rapid bootstrap replicates. The tree was rooted to Orientia and viewed in FigTree. The relationship of the Sodalis-like endosymbiont to other known gamma-proteobacteria was assessed using 16S rDNA. Representative 16S rDNA sequences were downloaded from the Ribosomal Database Project II (http://rdp.cme.msu.edu/seqmatch/seqmatch_intro.jsp). Sequences were aligned using
Muscle, implemented in Geneious. The maximum likelihood tree with 1000 rapid bootstrap replicates was calculated in RAxML (VI-HPC).

Localization of Endosymbionts

16S rDNA was isolated and re-sequenced from three additional lice for both endosymbionts using primers 27F, 1525R, and 1329R (Lane 1991). Polymerase chain reactions (PCR) were done using Stratagene Hi-Fidelity Master Mix. PCR products were cloned using an Invitrogen cloning kit and 96 colonies were sequenced using Sanger sequencing (the same methods used by Allen et al. in rev., who initially described this endosymbiont using 16S rDNA). Alignment of Illumina reads to the resulting Sanger sequences verified that they were identical. These Sanger-derived 16S rDNA sequences were used to construct probes to identify both endosymbionts by whole-mount fluorescent in situ hybridization. An Alexa Fluor 555 labeled oligonucleotide (AI555 Sod181R) targeting 16S rDNA of the Sodalis-like endosymbiont was used to identify the endosymbionts in resin-embedded sections of whole lice (R. Koga personal communication).

Results

We found that lice parasitizing the northern fur seal harbor two bacterial species. Phylogenetic results show that one is a Rickettsia species belonging to the spotted fever species group. The second endosymbiont belongs to a clade of gamma-proteobacterial endosymbionts from tsetse flies (S. glossinidius), beetles, spittlebugs, and chewing lice.

Sodalis-like Endosymbiont

Molecular phylogenetics using 16S rDNA suggests that this endosymbiont is most closely related to p-endosymbionts of chewing lice (Phthiraptera, Ischnocera; Figure
3-2). Bootstrap support for this association was very low (i.e., <50%). The new endosymbiont and chewing louse p-endosymbionts formed a clade sister to the p-endosymbionts and s-endosymbionts of weevils (Coleoptera), spittlebugs (Hemiptera), and tsetse flies (Diptera; S. glossinidius; Figure 3-2). The draft genome of this endosymbiont totaled 3,305,175 bp with 43.56% GC. The assembly resulted in 629 scaffolds with a relatively low N50 of 76,377 bp. All but two of the larger scaffolds could be aligned to the primary chromosome of S. glossinidius (Figure 3-3). None of our scaffolds could be aligned to the three plasmids known from S. glossinidius (pSG1, pSG2, and pSG3). Whole-mount in situ hybridization of 16S rDNA found the Sodalis-like endosymbiont concentrated in bacteriocytes in the cuticle and spread throughout the body cavity in both nymphs and adult female lice. We also found these endosymbionts in the ovaries of adult female lice (Figure 3-4). Large clusters of the endosymbionts were found in the follicle tissues surrounding developing eggs. We also observed endosymbionts that had left the follicle tissues and migrated to the posterior poles of developing eggs, where they formed large masses.

Rickettsia

Molecular phylogenetics based on protein-coding genes and 16S rDNA placed this new louse-associated Rickettsia as a member of the spotted fever species group. Specifically, this new Rickettsia belongs to a clade that includes R. rickettsii, R. parkeri, and R. peacockii (Figure 3-5). When estimating phylogenetic relationships from three protein-coding genes (AtpA, CoxA, and GltA) we found strong support (based on percent of 1000 bootstrap replicates) for this relationship (Figure 3-5). A 16S rDNA-based phylogeny of the same species yielded similar results, but with lower bootstrap
support. In the phylogeny based on protein-coding genes, our new louse-associated Rickettsia, R. rickettsii, and R. parkeri formed a clade sister to R. peacockii. In the 16S rDNA-based tree, our new louse-associated Rickettsia occupied the basal position held by R. peacockii in the tree based on protein-coding genes. The draft genome sequence of this Rickettsia totaled 1,251,943 bp with 32% GC. The genome was assembled into 3 scaffolds with a long N50 of 893,776 bp. We found no evidence of the pRPR plasmid known from the closely related R. peacockii. Alignment of our scaffolds showed that gene order was largely conserved between our new louse-associated Rickettsia and R. rickettsii, with the exception of two inversions (Figure 3-6). One is a large inversion that occurred in the common ancestor of our new Rickettsia, R. rickettsii, and R. parkeri, with a subsequent inversion within the original inversion in R. rickettsii. The second inversion appears unique to our new Rickettsia. From our de novo genome assembly we identified 18 genes possibly associated with virulence in vertebrates in the closely related R. rickettsii. We found that five of these 18 genes were disrupted by mutation, based on the next-generation sequence data assembly. Four of these genes were disrupted by major deletions and one by a frameshift mutation (see Table 3-1 and Figure 3-7). In situ hybridization using 16S rDNA failed to determine the exact location of Rickettsia in louse tissues. We found very low levels of probe binding in reproductive tissues of adult female lice.

Concluding Remarks

Here we show that lice parasitizing the northern fur seal possess two undescribed proteobacteria. One is a gamma-proteobacterium closely allied with p-endosymbionts of chewing lice, spittlebugs, and weevils as well as the s-endosymbiont
of tsetse flies (S. glossinidius). We found this endosymbiont in cuticle-based bacteriocytes, spread throughout the body cavity, in the reproductive tissues of adult female lice, and invading the posterior pole of developing eggs. The second bacterium is a Rickettsia species belonging to the spotted fever species group. We failed to localize this species by whole-mount in situ hybridization, but suspect it might also be present at low levels in reproductive tissues. Roughly one third of insect species are expected to carry heritable bacteria (Duron and Hurst 2013). Lice are no exception, having formed complex associations with many species of proteobacteria (Bozeman et al. 1981; Gross 1996; Fournier et al. 2002; Houhamdi et al. 2002; Robinson et al. 2003; Kyei-Poku et al. 2005; Sasaki-Fukatsu et al. 2006; Allen et al. 2007; Hypsa and Krizek 2007; Perotti et al. 2007; Allen et al. 2009; Fukatsu et al. 2009; Gillespie et al. 2009; Novakova et al. 2009). Many species provide no known benefit, but we suspect most if not all louse species rely on an lice, this requirement is met by a single p-endosymbiont species that occupies a specialized structure called a stomach disk (a common name for the bacteriome in this species; Puchta 1955; Buchner 1965; Sasaki-Fukatsu et al. 2006; Perotti et al. 2007; Perotti et al. 2009). Many species of blood-feeding lice, closely related chewing lice (parasites of birds; Ischnocera), and booklice (Psocoptera) carry p-endosymbionts housed in bacteriomes (Ries 1931; Aschner and Ries 1932; Ries 1932; Ries 1933; Ries and Weel 1933; Ries 1935; Buchner 1965; Perotti et al. 2006; Sasaki-Fukatsu et al. 2006; Perotti et al. 2007; Fukatsu et al. 2009; Smith et al. 2013). Most insect p-endosymbionts are very stable across evolutionary time, co-speciating with their hosts for tens to hundreds of millions of years
(see review by Gosalbes et al. 2010). However, there is evidence of frequent p-endosymbiont replacements or multiple novel origins of p-endosymbionts in lice (Allen et al. 2007; Hypsa and Krizek 2007; Allen et al. 2009; Fukatsu et al. 2009; Novakova et al. 2009; Allen et al. in rev.; see also Chapter 4 for a contrasting hypothesis). In the chewing lice, Smith et al. (2013) found evidence of frequent host switching by p-endosymbionts with little co-speciation. In the sucking lice, co-speciation has been documented in human and anthropoid primate lice, but the association is less than 25 million years old (Allen et al. 2007; Allen et al. 2009). Other sucking louse species possess unique clades of p-endosymbionts (Hypsa and Krizek 2007; Fukatsu et al. 2009; Novakova et al. 2009; Allen et al. in rev.). In the sucking lice parasitizing northern fur seals we find additional new louse-associated bacteria. However, much to our surprise, neither appears to occupy a bacteriome like those we see in other endosymbionts of sucking lice.

Sodalis-like Endosymbiont

One of the two new bacteria is a gamma-proteobacterial (Enterobacteria) endosymbiont that is closely related to p-endosymbionts of chewing lice parasitizing birds (Figure 3-2). This clade of endosymbionts belongs to a larger clade of Sodalis-like endosymbionts. This larger clade includes p-endosymbionts of chewing lice, weevils, and spittlebugs and s-endosymbionts of tsetse flies and weevils. While our phylogeny placed this endosymbiont within the chewing louse p-endosymbiont clade, overall support based on percent of bootstrap replicates was low. Smith et al. (2013) suggested that rapid radiation or host switching has left the phylogeny of these p-endosymbionts difficult to resolve. Regardless, we feel confident that this endosymbiont does belong within the larger Sodalis-like endosymbiont clade, even if its placement within the clade is unclear.
Due to data availability, our phylogenetic analysis was limited to 16S rDNA, and bootstrap support was extremely low for the relationships between taxa within this clade. Adding additional phylogenetic markers, such as protein-coding genes, would likely help resolve the tree. This is the first time a Sodalis-like endosymbiont has been characterized in a sucking louse. Other sucking lice (and the closely related Rhynchophthirina) possess Enterobacteria endosymbionts more closely related to p-endosymbionts of tsetse flies (Wigglesworthia glossinidia) and endosymbionts of parasitic bat flies and parasitoid wasps (Arsenophonus; Allen et al. 2007; Hypsa and Krizek 2007; Allen et al. 2009; Fukatsu et al. 2009; Novakova et al. 2009; Allen et al. in rev.). It is conceivable that this Sodalis-like endosymbiont is providing vitamins absent in the louse's diet, as closely related endosymbionts have similar roles. The Sodalis-like p-endosymbionts from weevils likely supply their insect hosts with vitamins and amino acids absent in their diets (Gosalbes et al. 2010). In the blood-feeding tsetse flies, S. glossinidius has a complex relationship with its host (Gosalbes et al. 2010). This s-endosymbiont can synthesize many vitamins, but lacks the ability to synthesize thiamine and likely scavenges a precursor synthesized by another endosymbiont species (Snyder et al. 2010). Thiamine synthesis was found to be important in symbiosis between p-endosymbionts and human lice (Puchta 1955; Perotti et al. 2009). However, recent work found that human louse p-endosymbionts appear to also lack this pathway (Chapter 2). We were unable to locate a well-organized bacteriome or other specialized structure housing either bacterium, finding the Sodalis-like endosymbiont in simple
bacteriocytes, spread throughout the louse body cavity, and in reproductive tissues (Figure 3-4). Closely related p-endosymbionts from chewing lice and weevils are present in specialized bacteriomes and in the ovaries (Lefevre et al. 2004; Fukatsu et al. 2007; Bright and Bulgheresi 2010; Koga et al. 2013). Weevils have more organized bacteriomes, while chewing lice have more simplified bacteriocytes for endosymbiont storage, like the fur seal louse. S. glossinidius, the s-endosymbiont of tsetse flies, is found more diffusely in host tissues and is not localized to a bacteriome (like we see in the fur seal lice; Bright and Bulgheresi 2010). It also does not localize in the ovaries of tsetse flies, but instead is passed on via a milk gland to developing larvae (tsetse flies are ovoviviparous and larval development is internal; Bright and Bulgheresi 2010). Unlike tsetse flies, lice are oviparous and transovarial transmission is required for vertical transmission of endosymbionts. These new Sodalis-like endosymbionts occupy tissues throughout these lice, similar to S. glossinidius. These data suggest that this new Sodalis-like endosymbiont is not a p-endosymbiont (commonly found in other louse species), but is an s-endosymbiont. Other insect s-endosymbionts (e.g., Serratia; see Oliver et al. 2010 for review) are known to invade bacteriocytes in bacteriomes; therefore it would not be uncharacteristic to find an s-endosymbiont in a bacteriocyte. The mechanisms of housing and transmission are thus more like those of an s-endosymbiont. Our de novo reconstruction of the Sodalis-like endosymbiont genome produced a genome totaling 3.3 Mbp with 44% GC. This genome is considerably larger than sequenced p-endosymbiont genomes from human and chimpanzee lice (~0.57 Mb, ~32% GC; Kirkness et al. 2010). However, this is considerably smaller than genomes of the
closely related s-endosymbiont, S. glossinidius, at 4.3 Mbp and 55% GC, and the p-endosymbiont of weevils at 4.5 Mbp (Charles et al. 1997; Clayton et al. 2012). We were able to align most of the scaffolds in our genome assembly to the S. glossinidius genome (Figure 3-3). Only two of our larger scaffolds did not share common sequences with that genome. None of our scaffolds shared homologous sequences with the three plasmids known from S. glossinidius. Our draft genome sequence suggests that genome erosion has reduced the total genome size in this louse-associated s-endosymbiont. Insect endosymbionts possess extremely small and AT-rich genomes (see Gosalbes et al. 2010 for review). The reduced genome size is the result of reduced purifying selection, small effective population size, and transmission bottlenecks (see also Bright and Bulgheresi 2010). These factors lead to the accumulation of deleterious mutations and erosion of the genome (Bright and Bulgheresi 2010). Genome erosion appears to quickly remove genetic material not needed to maintain an endosymbiotic lifestyle in recently acquired endosymbionts, slowing in older associations (Allen et al. 2009). Our assembly of this new Sodalis-like endosymbiont is a draft assembly with many small contigs that need to be joined into scaffolds. While more work should be done to refine the assembly, it does suggest that genome erosion has been active in the s-endosymbionts of both tsetse flies and lice. This is consistent with previous studies finding numerous pseudogenes in S. glossinidius (Toh et al. 2006). Lice appear to have arrived at the solution of endosymbiosis for provisioning vitamins multiple times. Additionally, louse endosymbionts may often go extinct and be replaced by new endosymbionts (Smith et al. 2013). This new Sodalis-like
endosymbiont may represent a recently captured endosymbiont transitioning to a p-endosymbiont lifestyle. This endosymbiont undergoes vertical transmission from female lice to offspring, but is not housed in a more organized manner. Perhaps s-endosymbionts or more loosely affiliated endosymbionts serve as sources for new p-endosymbionts in parasitic lice.

Rickettsia

The second bacterium detected in the lice parasitizing northern fur seals was an alpha-proteobacterium belonging to the genus Rickettsia. Known associations between Rickettsia and parasitic lice are few, and only R. prowazekii has been previously found in association with wild blood-feeding parasitic lice (Bozeman et al. 1981; Gross 1996; Houhamdi et al. 2002; Robinson et al. 2003; Gillespie et al. 2009). R. prowazekii is a deadly pathogen of humans, causing epidemic typhus, and is vectored by human body lice (Bozeman et al. 1981; Gross 1996; Houhamdi et al. 2002; Robinson et al. 2003; Gillespie et al. 2009). R. prowazekii infections of lice result in death of the lice due to rupture of the gut wall (Houhamdi et al. 2002). Refer rupture of the gut allows its contents to leak into the hemolymph, giving a diagnostic visible red color to the louse (Raoult and Roux 1999). Three additional species of Rickettsia (R. conorii, R. rickettsii, and R. typhi) have been shown to infect parasitic lice under artificial conditions (Houhamdi et al. 2003; Houhamdi and Raoult 2006). These R. prowazekii infections (Houhamdi et al. 2003; Houhamdi and Raoult 2006). We saw no evidence of pathology consistent with Rickettsia infections of lice in the northern fur seal louse specimens.
Our phylogenetic analysis supported this louse-associated Rickettsia as a member of the spotted fever species group of Rickettsia (Figure 3-5). We found slight differences in the placement of this new Rickettsia from the fur seal louse based on protein- and rRNA-encoding genes. 16S rDNA is a common marker used in bacterial phylogenetics; however, this marker evolves slowly in Rickettsia and is of limited use for describing phylogenetic relationships of recently diverged species (Weinert et al. 2009). Weinert et al. (2009), in a recent phylogenetic survey of Rickettsia and related genera, described three protein-coding genes (AtpA, CoxA, and GltA) that served as robust phylogenetic markers for the genus. In our phylogenetic analysis we generated trees based on these three protein-coding genes as a supermatrix and on 16S rDNA. While we found differences between the 16S rDNA and protein-coding gene trees, they were minor. Bootstrap support for the protein-coding tree was higher than for the 16S rDNA tree, and for that reason we accepted the protein-coding supermatrix-based tree as the best estimate of the species tree. Phylogenetic reconstruction of Rickettsia supported this new Rickettsia from lice as part of a clade containing R. parkeri, R. rickettsii, and R. peacockii Rickettsia to be found in a wild parasitic blood-feeding louse. Hard ticks and fleas serve as the vectors for most pathogenic species of Rickettsia (Merhej and Raoult 2010). Ixodid ticks serve as the sole known vectors for R. parkeri, R. rickettsii, and R. peacockii in the wild (Weinert et al. 2009). Additionally, R. peacockii is an obligatory associate of ticks and, unlike R. parkeri and R. rickettsii, is unable to undergo horizontal transmission to vertebrate hosts (Niebylski et al. 1997; Burgdorfer et al. 1981). It seems unlikely that this Rickettsia infection represents an
accidental or transient infection, because ticks are not known to parasitize the northern fur seal (Ward 1921; Furman and Loomis 1984; see also Osborne 1899). This species of Rickettsia could be maintained through vertical transmission between lice or by horizontal transmission between lice and northern fur seals. To attempt to determine which mechanism of transmission, or possibly both, was maintaining these symbionts in lice, we used whole-mount in situ hybridization of 16S rDNA. Unfortunately, we failed to find any aggregations of Rickettsia in louse tissue. We suspect that this endosymbiont is likely present, but diffuse throughout louse tissues. It could also be present in the gut or circulatory system of the lice, as the bacteria could have been washed away during specimen preparation. Because we could not definitively find Rickettsia in the louse's ovaries, we suspect that these bacteria are only horizontally transmitted between lice and seals. This new Rickettsia has a genome similar to the genome of its close phylogenetic relative, R. rickettsii (Figure 3-6). Both this new Rickettsia and the R. rickettsii SS genomes are ~1.3 Mb in length. R. prowazekii (the only previously reported species to be vectored by lice) possesses a smaller genome, ~1.11 Mb, and encodes fewer genes, ~874. This new Rickettsia possesses a genome more like that of spotted fever group Rickettsia than that of R. prowazekii. R. peacockii, a close relative of R. rickettsii, carries a 26 kb plasmid (Felsheim et al. 2009). This plasmid might complement mutations in the R. peacockii genome that would limit phospholipid biosynthesis (Felsheim et al. 2009). Our results suggest that this Rickettsia does not carry the plasmid found in R. peacockii.
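The plasmid test described in the methods reduces to a simple decision on where reads align. The following is a minimal sketch of that logic, assuming reads have already been aligned end to end against the chromosomal and plasmid copies of DnaA and counted. The function name, the `min_reads` cutoff, and the example counts are illustrative assumptions and are not values reported in this study.

```python
def interpret_dnaa_alignment(chromosomal_reads: int,
                             plasmid_reads: int,
                             min_reads: int = 10) -> str:
    """Classify support for the two DnaA copies from aligned read counts.

    min_reads is an assumed cutoff for calling a copy 'supported'.
    """
    chromosome_supported = chromosomal_reads >= min_reads
    plasmid_supported = plasmid_reads >= min_reads
    if chromosome_supported and plasmid_supported:
        return "chromosome and plasmid copies supported"
    if chromosome_supported:
        return "chromosomal copy only; no evidence of plasmid"
    if plasmid_supported:
        return "plasmid copy only"
    return "insufficient coverage to call either copy"


# Hypothetical counts for illustration.
print(interpret_dnaa_alignment(chromosomal_reads=250, plasmid_reads=0))
```

A result of reads aligning only to the chromosomal copy is the pattern interpreted here as absence of the plasmid, although low coverage could produce the same pattern.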
Of the three Rickettsia species most closely related to the new louse-associated Rickettsia, two possess a conserved gene order (i.e., few or no inversions) found in most Rickettsia. These species are vertebrate pathogens that may largely rely on horizontal transfer between invertebrate and vertebrate hosts to be maintained in host populations (Niebylski et al. 1999). In contrast, the R. peacockii genome has been extensively rearranged by the presence of a mobile element (Felsheim et al. 2009). This rearrangement may be associated with a shift in the life history of this species to obligate endosymbiosis (Niebylski et al. 1997). We found that this new genome had not been remodeled like that of R. peacockii. Genome synteny is largely maintained with R. rickettsii, with the exception of two large inversions (Figure 3-6). Our results suggest there was a large inversion in the ancestor of our new Rickettsia and R. rickettsii. There was then a subsequent inversion within this inverted segment in R. rickettsii, leading to the pattern of inversion seen in Figure 3-6. The second inversion in the new Rickettsia genome is large and distant from the other. This data would suggest R. peacockii and largely shares genome synteny with the virulent spotted fever group Rickettsia. Previous research suggested that genome reduction is associated with vertebrate virulence by Rickettsia (Fournier et al. 2009). However, recent research in the spotted fever group of Rickettsia has pointed to genes that may be essential to host invasion and co-opting of host cell machinery (Felsheim et al. 2009). This is due in large part to studies comparing the genomes of virulent R. rickettsii with attenuated R. rickettsii and the obligate symbiont of ticks, R. peacockii. By comparing the genome of the virulent R. rickettsii SS with the genome of R. peacockii, Felsheim et al. (2009)
identified 18 genes that might be associated with virulence (assuming the ancestor of both species was virulent and R. peacockii represents a loss of virulence). One of these genes, encoding ankyrin repeats, was also disrupted in two strains of attenuated R. rickettsii (R. rickettsii Iowa and hlp#2; Felsheim et al. 2009; R. Felsheim pers. comm.). In our new Rickettsia we found evidence for mutational loss of function in five of the 18 genes believed to play a role in virulence by R. rickettsii (Felsheim et al. 2009; Table 3-1; Figure 3-7). This included the ankyrin repeat-containing gene absent or disrupted in R. peacockii and the two attenuated strains of R. rickettsii. A large deletion was found in this gene in the Rickettsia symbiont of the fur seal louse that almost perfectly matched a deletion in the same gene in the attenuated Iowa strain of R. rickettsii (Figure 3-7). This suggests that this new Rickettsia may not be capable of acting as a zoonotic pathogen transmitted by lice. It was then surprising that we could not detect abundant Rickettsia in the reproductive tissues of these lice. Rickettsia species belonging to the Transitional and Torix species groups have been found in obligate endosymbiosis with booklice, distant relatives of the parasitic lice (Perotti et al. 2006; see also Weinert et al. 2009). These findings would suggest that Rickettsia are capable of vertical transmission and obligate endosymbiosis with lice. Our data suggest that a loss of genes involved with virulence occurred independently of the loss of virulence genes in the closely related obligate endosymbiont of ticks, R. peacockii. The disruption of genes involved in host invasion and recruitment of host resources in R. peacockii occurred by massive reorganization of its genome (Felsheim et al. 2009). In this new species of Rickettsia we do not see a similar pattern of reorganization; instead, the genome structure is largely maintained with that of its virulent neighbor species R. rickettsii.
Further investigations with this new Rickettsia could elucidate more about its biology, and the results could be significant for understanding the evolution (and potentially loss) of vertebrate pathogenicity in Rickettsia.

Figure 3-1: A) Partial map of Russia and Alaska, USA showing collection site of lice in the Pribilof Islands, USA. B) An adult female P. fluctus (northern fur seal louse) used in this study.
Figure 3-2: Maximum likelihood tree based on 16S rDNA showing relationship of Sodalis-like Sodalis vertical bars, insect associates of these bacteria noted alongside the bars.
Figure 3-3: Alignment of draft genome sequence of the Sodalis-like endosymbiont of P. fluctus to the genome sequence of S. glossinidius. Green dots indicate last alignment between genome assemblies. Gray bars indicate contig boundaries in genome assembly. Image generated using SynMap in CoGe.
Figure 3-4: Whole-mount in situ hybridization of endosymbiont 16S rDNA in P. fluctus. Left, Sodalis-like endosymbionts in simple bacteriomes located in the cuticle of a nymph. Right, Sodalis-like endosymbionts forming a mass at the posterior pole of a developing egg in a female louse. Image taken by R. Koga.
Figure 3-5: Maximum likelihood tree based on three protein-coding gene supermatrix showing relationship of a new Rickettsia found in P. fluctus. Numbers at nodes indicate percent of 1000 bootstrap replicates. Vertical bars indicate species groups described in text adjacent to bars. Louse-associated Rickettsia species indicated by red arrows. branch lengths that were shortened for publication.
Figure 3-6: Alignment of draft genome sequence of a new Rickettsia found in P. fluctus, R. rickettsii SS, and R. peacockii to an outgroup taxon, R. conorii. Green dots indicate last alignments between genomes. Tree indicates evolutionary relationship of taxa. Genome alignment images generated using SynMap in CoGe.
Figure 3-7: Alignment of ankyrin repeat-encoding gene from our new Rickettsia from P. fluctus and R. rickettsii SS, including consensus sequence, DNA sequence, and translated sequence. Black dashes represent gaps in the alignment. Alignment done using Muscle as implemented in Geneious.

Table 3-1: Genes associated with pathogenicity in R. rickettsii SS and their state in our new Rickettsia

Genes associated with virulence in Rickettsia rickettsii SS | State of homologous gene in Rickettsia from P. fluctus
A1G_03950, Methyltransferase | Intact
A1G_03470, NAD(P)H-dependent glycerol-3-phosphate dehydrogenase | Intact
A1G_06990, OmpA | Three deletions at bases 138-140, 636-1075, and 1526-3307
A1G_02570, Putative phosphoethanolamine transferase | Intact
A1G_00130, Cell surface antigen Sca1-like | Intact
A1G_02165, Protease II | Deletion of bases 1051-1059
A1G_02820, A1G_028 5, A1G_02830, ABC transport | Intact
A1G_03355, Protein disulfide isomerase, DsbA | Intact
A1G_04605, YhbC | Intact
A1G_04620, transcription regulator rirA | Frameshift mutation at base 372 leading to premature stop codon
A1G_05015, RickA | Deletion in repeated region
A1G_04995 A1G_05010, Succinyl-CoA:3-ketoacid coenzyme A transferase A & B subunits | Intact
A1G_05165, Ankyrin repeat-containing protein (contains nine copies of repeat) | Deletion of 8 copies of Ank repeat, same deletion found in attenuated R. rickettsii
CHAPTER 4
PHYLOGENOMIC ANALYSIS REVEALS A WIDESPREAD CLADE OF HERITABLE SYMBIOTIC BACTERIA IN THE PARASITIC LICE OF MAMMALS (PHTHIRAPTERA: ANOPLURA AND RHYNCHOPHTHIRINA)

Numerous insect species are engaged in obligate symbiosis with heritable intracellular bacteria (endosymbionts). Necessary for survival, many endosymbionts feeding lice parasitizing mammals rely on endosymbionts to provision essential B vitamins. Unlike other insect species that have co-speciated with endosymbionts for hundreds of millions of years, louse endosymbionts have multiple origins in the bacterial tree of life. This suggests multiple novel origins of endosymbiosis between bacteria and lice, or endosymbiont extinction with replacement throughout the evolutionary history of lice. Much of what is known about louse endosymbiont diversity is derived from single-gene phylogenies. Single-gene trees may be prone to error caused by stochastic events, providing a gene history that deviates from the species history. A preferable approach to describing the evolutionary history of a species would be through multiple-gene phylogenies, an approach facilitated by next-generation sequencing. Here we show that many louse species actually carry closely related endosymbionts based on multi-gene trees. While our results still show multiple independent origins of louse endosymbionts, they suggest that single-gene phylogenies may overestimate the number of origins. We find evidence for a clade of bacterial endosymbionts in diverse species of blood-feeding lice. We suspect that the blood-feeding lice have been co-speciating with a clade of bacteria closely related to endosymbionts found in tsetse flies. Further, we find evidence that endosymbiont replacement has occurred in lice, but that replacement may be rare. There has been substantial interest in understanding the
origins of beneficial bacterial symbionts, particularly obligate endosymbionts. Are they derived pathogens or did they originate from opportunistic commensals? Describing new endosymbionts and accurately describing their phylogenetic relationships are key to answering broader questions of endosymbiont origins. Here we provide evidence that single-gene phylogenies, including those based on 16S ribosomal RNA, may be insufficient to describe insect endosymbiont species histories in lice. Further, we suggest that future endosymbiont phylogenetic studies should focus on multi-gene phylogenies when possible to eliminate problems associated with single-gene phylogenies.

Background

Numerous insect species are engaged in beneficial symbiosis with intracellular heritable bacteria (Bright and Bulgheresi 2010; Gosalbes et al. 2010). These symbionts are classified as either primary or secondary endosymbionts (Dale and Moran 2006; Oliver et al. 2010; Hosokawa et al. 2010). The secondary endosymbionts (s-endosymbionts) are facultative and generally confer beneficial traits to the insect (Oliver et al. 2010). While these bacteria are principally inherited vertically (often mother to offspring), they occasionally undergo horizontal transmission (Dale and Moran 2006). Primary endosymbionts (p-endosymbionts) are obligate symbionts that are only vertically inherited and are housed in specialized structures called bacteriomes (Buchner 1965; Dale and Moran 2006; Bright and Bulgheresi 2010). P-endosymbionts are involved in nutritional provisioning, supplying the insect with vitamins or amino acids not available in the insect's diet (see review by Gosalbes et al. 2013). Roughly 10% of insect species rely on p-endosymbionts (Buchner 1965; Douglas 1989). Relationships between insects and p-endosymbionts often persist for tens to hundreds of millions of
years and loss or replacement of p-endosymbionts appears rare (Clark et al. 2000; Gosalbes et al. 2010). The p-endosymbionts of parasitic lice (Phthiraptera) are an exception, having multiple independent origins in the bacterial tree of life (Table 4-1; Allen et al. 2009; Hypsa and Krizek 2007; Fukatsu et al. 2009; Novakova et al. 2009; Allen et al. in rev.; Smith et al. 2013; see also Lefevre et al. 2004; Gosalbes et al. 2008; Koga et al. 2013).

The sucking lice (Phthiraptera: Anoplura) and their sister clade the Rhyncophthirina are obligate ectoparasites of mammals. Both clades have specialized mouthparts to feed exclusively on the blood of mammals (Grimaldi and Engel 2005). [...] proteobacteria (Buchner 1965; Allen et al. in rev.). In human lice these p-endosymbionts synthesize B vitamins absent from the louse's diet of blood (Puchta 1955; Perotti et al. 2009). We suspect that endosymbionts of other louse species have similar roles in vitamin provisioning.

Ries and colleagues made extensive studies of the life history and inheritance of p-endosymbionts in lice (Ries 1931; Ries 1932; Aschner and Ries 1933; Ries 1933; Ries and Weel 1934; Ries 1935). Collectively they detailed the life history of p-endosymbionts from Polyplax, Haematopinus, Hematomyzus, Linognathus, Pedicinus, Phthirus, and Pediculus species (see also Eberle and McLean 1982, 1983; Sasaki-Fukatsu et al. 2006; Perotti et al. 2007; Fukatsu et al. 2009 for more recent studies). They found considerable variation in the structure and location of bacteriomes as well as in how the bacteria were transmitted (Buchner 1965). The p-endosymbionts of Linognathus and Polyplax were housed in very simplified bacteriomes reminiscent of bacteriomes in closely related chewing lice parasitizing birds (Buchner
1965). Based on variation in the life history of endosymbionts, Buchner (1965) suggested that bacterial endosymbiosis had originated multiple times in lice.

Recent studies using molecular phylogenetics have found that louse endosymbionts have [...] proteobacteria (Allen et al. 2009; Hypsa and Krizek 2007; Fukatsu et al. 2009; Novakova et al. 2009; Allen et al. in rev.). A new endosymbiont from Proechinophthirus lice parasitizing northern fur seals has recently been described (Chapter 3). This endosymbiont was described as an s-endosymbiont (because the endosymbionts were not localized to an organized bacteriome) and closely related to p-endosymbionts of chewing lice (Smith et al. 2013). Additionally, Allen et al. (in rev.) found that Ancistroplax crocidurae (Phthiraptera: Anoplura), parasitizing the Asian Gray Shrew (Crocidura attenuata; Soricidae), had two endosymbionts; however, it is unclear if one or both are p-endosymbionts. Collectively this suggests an unusual system with diverse endosymbionts occupying closely related louse species and extensive variation in the life history of endosymbionts.

The most comprehensive molecular phylogenetic study, by Allen et al. (in rev.), found that obligate [...] proteobacteria (nine times in Enterobacteriales and once in Legionellales). However, studies on the origins of louse endosymbiosis have relied on 16S rDNA to estimate the species history (Allen et al. 2009; Hypsa and Krizek 2007; Fukatsu et al. 2009; Novakova et al. 2009; Allen et al. in rev.). Insect endosymbionts present unique challenges to phylogenetic analysis. These bacteria possess extremely small AT-rich genomes and often extremely high mutation rates (Gosalbes et al. 2010; Allen et al. 2009). While 16S rDNA may be useful for describing species-level associations, its use in understanding higher
level relationships in the Enterobacteria may be limited due to differing rates of substitution in different regions of the gene (Naum et al. 2008). The relationships of insect endosymbionts in Enterobacteria based on 16S rDNA can change depending on the number of non-insect-associated taxa included in the analysis, causing them to group either together or apart (Naum et al. 2008; Allen et al. in rev.). AT bias and high mutation rates may cause endosymbionts to be falsely assigned to a clade through long-branch attraction (Husnik et al. 2011). While the availability of 16S rDNA sequence data may make it attractive for phylogenetic studies, a preferable approach would be to use multiple independent conserved genes to estimate the origins of insect endosymbionts within the bacterial tree of life. By combining phylogenetic information from multiple independent markers we hope to estimate the species history, not a single gene history. Reduced costs and availability of next-generation sequencing can facilitate such approaches.

Only one louse p-endosymbiont has been included in a phylogenomic study: Candidatus Riesia pediculicola, p-endosymbiont of the human louse (Husnik et al. 2011). This study largely agreed with previous single-gene phylogenies that this p-endosymbiont was closely related to Arsenophonus (endosymbionts of parasitic wasps and flies), but only after extensive manipulation of the data and models of evolution (Husnik et al. 2011; see also Novakova et al. 2009). Under more standard models and approaches Husnik et al. (2011) found that Ca. Riesia grouped with Wigglesworthia (p-endosymbiont of the tsetse fly).

To better understand the origins of louse endosymbionts we have sequenced the genomes of three new Enterobacteria louse endosymbionts previously characterized as p-endosymbionts (Buchner 1965). Along with three
additional published louse endosymbiont genomes, we used a phylogenomic (or multi-gene) approach to determine the relationship of louse endosymbionts to other bacteria. Like Husnik et al. (2011) we considered traditional and more complex models of sequence evolution as well as alternative methods of coding the data matrix when building phylogenies of Enterobacteria. We then used these results to address possible sources of error from AT bias and high mutation rates. Finally, we built phylogenetic trees individually from 53 protein-coding genes and ribosomal DNA to assess variation across phylogenetic markers.

Methods

Specimen Collection

Endosymbiont genomes sequenced in this study: Hematomyzus elephantus (Rhyncophthirina) Ex. Elephas maximus indicus (Indian Elephant), INDIA: Kerala, Guruvayur, 23 Sept 1998; Linognathus spicatus (Anoplura) Ex. Connochaetes taurinus (Blue Wildebeest), ZIMBABWE: Southern Bubye Conservatory, S21 40.006 E30 10.542, Leg. J Light & J Skinner; Pedicinus badii (Anoplura) Ex. Procolobus rufomitratus (Red Colobus Monkey), UGANDA: Kibale National Park, Leg. J Allen; Haematopinus eurysternus (Anoplura) Ex. Bos (Cow) LAD 3298.

Published louse endosymbiont genomes used in this study: Proechinophthirus fluctus (Anoplura) Ex. Callorhinus ursinus (Northern Fur Seal), USA: Alaska, Pribilof Islands, St. Paul Island (Chapter 3); Candidatus Riesia pediculischaeffi str. PTSU (Anoplura) Ex. Pan troglodytes schweinfurthii (Chimpanzee), UGANDA: Ngamba Island Sanctuary (Chapter 2); Candidatus Riesia pediculicola str. USDA (Anoplura), Ex. rabbit-adapted strain of human clothing louse, USA: Maryland (Kirkness et al. 2010).
Genome Sequencing

Three to nine individual nymphs or adult female lice were selected for each louse species used in this study. Genomic DNA was extracted from each louse using a phenol-chloroform extraction method and extracts were pooled by species. Random shotgun libraries were then constructed using the TruSeq DNAseq Sample Prep kit, selecting for a 450 bp mean insert size, and libraries were barcoded by species. Libraries were then pooled and sequenced paired-end on two lanes of an Illumina HiSeq 2000 with 100 bp reads, using TruSeq SBS sequencing kit V2, with data analysis in pipeline V1.8. Quality of the library was assessed from the sequence data for each species using fastQC (Babraham Bioinformatics). Reads were trimmed or removed by quality score, bacterial reads were separated from louse reads, and bacterial genomes were assembled following methods outlined in Chapter 2. Genome sequences of louse p-endosymbiotic bacteria were annotated using Rapid Annotation using Subsystem Technology (Aziz et al. 2008).

Gene Clustering

Predicted gene sequences from louse endosymbionts and representative Enterobacteria (not associated with lice) were translated into amino acids ([...] proteobacteria gene and translated gene sequences were downloaded from CoGe; see Table 4-5; Lyons and Freeling 2008; Lyons et al. 2008). Amino acid sequences were clustered by overall similarity using cd-hit (Li and Godzik 2006; Fu et al. 2012) to find putative homologous groups of genes, herein referred to as gene clusters. To find the optimal settings for clustering amino acid sequences we performed 18 iterations of clustering, each time changing the minimum overlap between sequences (80%, 60%, and 50%) and the percent identity (90%, 80%, 70%, 60%, 50%, and 40%) of the
overlapping sequences required for inclusion of a gene in a cluster. Summary information on clusters was generated for each change in overlap and percent identity using Perl scripts available in the cd-hit package. Summary information was visualized in R v3.0.1 [...] amino acid sequences were replaced with the original corresponding nucleotide sequences.

Gene Repertoire

Optimal clusters of gene sequences were used to generate a binary (0 = absence and 1 = presence) matrix of gene presence and absence by species of bacteria (see Delsuc et al. 2005 for more on this approach). This matrix was exported as a nexus-formatted file (i.e. each row of the nexus file represented a species and each column represented a group of homologous genes). As homology was estimated during gene clustering and no gaps can be present in the matrix using this method, no homology assessment through positional alignment was needed. PAUP version 4 beta 10 was used to build a neighbor-joining tree from this binary matrix (Swofford 2003).

Phylogenomic Analysis

Global sequence alignments were performed on sequences within each gene cluster using Muscle (Edgar 2004). To exclude uninformative alignments we removed gene clusters with fewer than 10 sequences and clusters with paralogs (i.e. two or more sequences from one taxon in a cluster). Remaining gene alignments were concatenated into a single supermatrix using FASconCAT V.1 (Kuck and Meusemann 2010; see Figure 4-1 for a general outline of supermatrix analysis). Missing genes were assumed to be absent and encoded as gaps in the supermatrix. To reduce possible effects of gaps on subsequent phylogenetic analysis we built a second supermatrix containing a subset of genes from the first matrix. Gene clusters included in this
reduced matrix contained a minimum of 35 sequences and no paralogs. Maximum likelihood (ML) phylogenetic trees were calculated from both the larger and the reduced supermatrices using RAxML (VI HPC) with a GTR [...] 2006). Support was assessed from 217 rapid bootstrap replicates for the larger supermatrix and 1000 rapid bootstrap replicates for the smaller matrix.

Additional phylogenetic searches were done using nhPhyML (Galtier and Gouy 1998; Guindon and Gascuel 2003; Boussau and Gouy 2006) with the smaller supermatrix. This program uses a non-reversible model of evolution coupled with multiple classes of base frequency estimates. nhPhyML requires a rooted starting tree. We ran nhPhyML six independent times, changing settings and starting trees:
run 1) starting tree=maximum likelihood tree, G+C categories=3, positions=1st2nd3rd, rates=8, alpha=estimate, transversion/transition=estimate;
run 2) starting tree=maximum likelihood tree, G+C categories=3, positions=1st2nd, rates=8, alpha=estimate, transversion/transition=estimate;
runs 3-6) starting tree=random tree, G+C categories=3, positions=1st2nd, rates=8, alpha=estimate, transversion/transition=estimate.
Maximum likelihood starting trees were the RAxML [...] rooted to Rickettsia. Random starting trees were calculated in the R package APE (Paradis et al. 2004). In every case nhPhyML was allowed to estimate alternative topologies and branch lengths.

Next we recoded the smaller supermatrix by purines (A and G changed to R) and pyrimidines (T and C changed to Y). We then calculated the ML tree from the RY-coded supermatrix using RAxML, GTR [...]. While the GTR model was used, constraining the data to RY or SW coding will make it behave more like the
simpler model. Support was assessed from 1000 rapid bootstrap replicates. A similar process was then used to recode the smaller supermatrix by strong molecular bond (G and C changed to S) and weak molecular bond (A and T changed to W). Again an ML tree was calculated from the recoded SW matrix as in the RY analysis.

Next the 53 genes comprising the smaller supermatrix were translated to amino acid sequences. Amino acid sequences were aligned using Muscle and concatenated using FASconCAT. We used RAxML to find the maximum likelihood tree under a [...] bootstrap replicates. We then recoded the amino acid supermatrix by the Dayhoff4 and Dayhoff6 schemes and calculated the maximum likelihood tree for each matrix using RAxML with 1000 rapid bootstrap replicates each.

Phylogenetic Analysis

The 53 genes comprising the reduced supermatrix were analyzed individually as single-gene nucleotide alignments. ML trees were calculated from each gene cluster alignment using RAxML, GTR [...] rapid bootstrap replicates. Finally, 16S rDNA was obtained from Allen et al. (in rev.) for the corresponding louse endosymbionts used in this study. 16S rDNA sequences from non-louse-associated bacteria used in this study were obtained using blastx implemented on CoGe. 16S rDNA sequences were aligned using Muscle. An ML tree was calculated from the alignment using RAxML with a GTR model of evolution, and support was assessed from 1000 rapid bootstrap replicates.
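The recoding schemes described above can be sketched as follows. This is a minimal illustration, not the scripts used in this study: the function names and toy sequences are hypothetical, and the Dayhoff6 grouping shown is the commonly cited six-class scheme, which is an assumption because the groups are not listed in the text.

```python
# Nucleotide recoding maps, as described in the Methods.
RY = {"A": "R", "G": "R", "C": "Y", "T": "Y"}   # purine / pyrimidine
SW = {"G": "S", "C": "S", "A": "W", "T": "W"}   # strong / weak bond

# Assumed Dayhoff6 classes (commonly cited grouping; not given in the text).
_DAYHOFF6_GROUPS = ["AGPST", "C", "DENQ", "FWY", "HKR", "ILMV"]
DAYHOFF6 = {aa: str(i) for i, group in enumerate(_DAYHOFF6_GROUPS) for aa in group}


def recode(seq, mapping):
    """Recode one aligned sequence; gaps and unknown symbols are left unchanged."""
    return "".join(mapping.get(symbol, symbol) for symbol in seq.upper())


def recode_alignment(alignment, mapping):
    """Recode every sequence in a {taxon: aligned_sequence} dictionary."""
    return {taxon: recode(seq, mapping) for taxon, seq in alignment.items()}


# Hypothetical toy alignment.
toy = {"taxon_A": "ATGC-GTA", "taxon_B": "ATGCCGTT"}
ry_coded = recode_alignment(toy, RY)
sw_coded = recode_alignment(toy, SW)
```

Both nucleotide schemes reduce four states to two, which is why SW coding can serve as a control for RY coding: it changes the alphabet size in the same way without merging A with G or C with T.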
Results

Genome Assembly

We attempted to sequence and reconstruct the genomes of p-endosymbionts from five species of parasitic lice. In three of these species, the Elephant Louse, [...] proteobacterial genome in our sequence data. In one case, the cattle louse, we did not recover any bacterial sequence data. Collectively these louse endosymbiont genomes ranged in size from 438 kb to 3.3 Mb with a GC content of 25 to 44% (Table 4-2; Figure 4-2).

Gene Clustering

Amino acid sequences from louse endosymbionts and related Enterobacteria were clustered into groups based on sequence similarity. The process of gene clustering was repeated 18 times, each iteration changing the minimum overlap and percent identity of aligning sequences required for inclusion in a cluster (Figures 4-7 and 4-8). Changing the minimum percent overlap had little effect on gene clustering. Ultimately we chose a minimum overlap of 60% based on previous research suggesting genes truncated by more than 60% should be considered pseudogenes (Degnan et al. 2005). Changing percent identity within the overlapping sequence had a greater effect on how sequences clustered. A percent identity of 90% resulted in the greatest number of sequences not clustering with other sequences. At the opposite end of the spectrum, 40% identity produced the greatest number of sequences clustering in very large clusters (i.e. clusters with more sequences than the original number of taxa). A minimum sequence overlap of 60% and a percent identity of 50% within the overlapping region were found to be the optimal parameters for gene clustering. This yielded the fewest
genes that did not cluster with other gene sequences while minimizing the number of large clusters (i.e. clusters containing non-homologous genes). A total of 104,735 amino acid sequences were grouped into 38,875 clusters representing groups of homologous gene sequences.

Gene Repertoire

A neighbor-joining tree was constructed from a binary matrix of gene presence and absence in bacterial species (Figure 4-3). Louse p-endosymbionts, with the exception of the s-endosymbiont from the northern fur seal louse, grouped into a single clade. This clade was nested within a larger clade consisting of endosymbionts from many insect species.

Likelihood Tree Inference from Supermatrices with GTR [...] Models

A supermatrix of aligned nucleotide sequences was built from 2,056 phylogenetically informative gene clusters. Sixty-one percent of this matrix consisted of gap characters resulting from absent genes or alignment-based gaps. An ML tree calculated from this larger matrix (using a GTR [...] origins of louse endosymbionts with 100% bootstrap support at nearly every node (Figure 4-4). The fur seal louse s-endosymbiont was found to be sister to Sodalis and all other louse p-endosymbionts sister to Wigglesworthia. To reduce potential bias in [...] constructed from 53 of the most common genes. This matrix contained only 18% gap characters due to missing genes or alignment-based gaps. An ML tree was inferred from the reduced matrix (using a GTR [...] topology to the ML tree inferred from the larger matrix (Figures 4-4 and 4-5). Then the reduced supermatrix was reconstructed using aligned amino acid sequences. An ML tree was inferred from
this matrix [...] supported a similar topology to the previous analyses (Figure 4-5).

Based on the study by Husnik et al. (2011) we suspected these topologies might reflect problems with long-branch attraction (see Felsenstein 1978) erroneously grouping distantly related endosymbionts due to base frequency bias. We attempted to correct for this by two approaches: first by using models that account for multiple rapid transitions in base frequencies, and second by using recoded matrices.

Likelihood Tree Inference with Non-homogenous Models

Implemented in nhPhyML, non-homogenous models estimate model parameters and allow base frequencies to vary between a set of user-defined classes. nhPhyML was used to infer an ML tree from the reduced nucleotide supermatrix when the starting tree was the ML tree calculated under the GTR [...] considering all codon positions and when excluding the highly variable third codon position, nhPhyML inferred trees that supported two origins of louse endosymbiosis, as in previous ML trees. When using five different random starting trees, nhPhyML inferred ML trees with anywhere from two to five origins of louse symbiosis. Sister groups to louse symbionts were variable when comparing these five ML trees, with the exception of the fur seal louse s-endosymbiont. This endosymbiont grouped with Sodalis in all trees.

Likelihood Tree Inference from Recoded Supermatrices

The reduced nucleotide supermatrix was recoded using the RY and SW systems. Here RY should reduce bias seen in AT-rich taxa, while SW should not reduce this bias. We again inferred ML trees using a GTR [...] and SW recoded supermatrices (Figure 4-5). Both of the resulting ML trees supported two origins of
louse symbiosis, similar to the ML tree inferred from the original supermatrix. Next we recoded the reduced amino acid supermatrix using the Dayhoff4 and Dayhoff6 coding. Again we inferred ML trees based on each recoded matrix using the Dayhoff substitution matrix with [...] (Figure 4-5). These ML trees placed louse endosymbionts near the base of the tree and were difficult to interpret.

Individual Gene Trees

For each gene included in the reduced supermatrix and 16S rDNA we constructed an individual gene tree using a GTR [...] supported louse endosymbiont relationships similar to the supermatrix-derived trees. However, four protein-coding gene trees and the 16S rDNA tree supported Ca. Riesia as an independent clade from other louse endosymbionts and closely related to Arsenophonus nasoniae (Table 4-4; Figure 4-6).

Concluding Remarks

Here we sequenced the genomes of three new louse endosymbionts. While these are draft genome sequences, they are annotated and double the number of sequenced louse endosymbiont genomes (three previously described louse endosymbiont genomes; Kirkness et al. 2010; Chapters 2 and 3). From these data we have found that the p-endosymbionts of Indian Elephant, Blue Wildebeest, Red Colobus Monkey, Chimpanzee (Ca. Riesia pediculischaeffi), and Human lice (Ca. Riesia pediculicola) possess very small genomes similar to those of other insect p-endosymbionts (Table 4-2; Figures 4-4, 4-5, and 4-6). All of these endosymbiont genomes correspond to bacteriome-associated p-endosymbionts described by early microscopy in the 1930s (Ries 1931; Ries 1932; Aschner and Ries 1933; Ries 1933; Ries and Weel 1934; Ries 1935; Buchner 1965). An endosymbiont found in lice parasitizing the northern fur seal
possesses a much larger genome, similar to that of the s-endosymbiont from tsetse flies, Sodalis glossinidius (Table 4-2; Figures 4-4, 4-5, and 4-6; Chapter 3). This endosymbiont was described as an s-endosymbiont, not associated with an organized bacteriome, in Chapter 3.

Our data also provide new insights into the origins of Ca. Riesia, p-endosymbiont of lice. We find that Ca. Riesia and the new louse p-endosymbionts described in this report form a clade. The closest known relative of these louse p-endosymbionts is Wigglesworthia, a p-endosymbiont of tsetse flies. This is contrary to previous reports placing Ca. Riesia as a member of the Arsenophonus clade of Enterobacteria (Novakova et al. 2009; Husnik et al. 2011). The s-endosymbiont from the fur seal louse groups with S. glossinidius, consistent with previous descriptions (Allen et al. in rev.; Chapter 3).

Previous authors have concluded that louse p-endosymbioses have multiple [...] proteobacteria (Hypsa and Krizek 2007; Fukatsu et al. 2009; Novakova et al. 2009; Allen et al. in rev.). The most studied of these bacteria are the p-endosymbionts of human and anthropoid primate lice, belonging to the genus Ca. Riesia (Sasaki-Fukatsu et al. 2006; Allen et al. 2007; Allen et al. 2009; Novakova et al. 2009; Allen et al. in rev.). Previous studies have found that Ca. Riesia belongs to a larger clade of insect-associated bacteria roughly falling under the name Arsenophonus (Novakova et al. 2009; Husnik et al. 2011). Novakova et al. (2009) even suggested that the Ca. Riesia name was not valid, as Ca. Riesia represents a clade within Arsenophonus based on their results. Other louse p-endosymbionts have been found to be closely related to other insect endosymbionts (including Wigglesworthia, Buchnera,
Sodalis) and one related to Legionella (Hypsa and Krizek 2007; Fukatsu et al. 2009; Novakova et al. 2009; Allen et al. in rev.). The most extensive study of louse p-endosymbionts, by Allen et al. (in rev.), found 10 origins of louse endosymbiosis. While [...] proteobacteria, studies focusing on one gene may overestimate the number of origins of endosymbiosis.

Husnik et al. (2011) were the first to include a louse p-endosymbiont (Ca. Riesia) [...] proteobacteria. They found that under standard models of sequence evolution (e.g. GTR [...] Ca. Riesia was more closely related to Wigglesworthia than to Arsenophonus. However, they suspected that this relationship represented a problem of long-branch attraction. Long-branch attraction occurs when two unrelated taxa experience major shifts in nucleotide base frequencies and phylogenetic analyses erroneously favor grouping these unrelated taxa. When using new models and/or matrix recoding to correct for long-branch attraction, Husnik et al. (2011) found support for a clade containing Ca. Riesia and Arsenophonus.

Like Husnik et al. (2011) we found that Ca. Riesia grouped with Wigglesworthia, along with other louse p-endosymbionts from elephant, wildebeest, and colobus monkey lice, under a standard GTR [...] ML trees from our 2056-gene supermatrix. The only exception was the s-endosymbiont from the fur seal louse, which grouped with Sodalis (see Allen et al. in rev.; Chapter 3; Figure 4-4). We were concerned that three factors may cause Ca. Riesia to erroneously group with Wigglesworthia and newly sequenced louse p-endosymbionts
based on our data: gap characteristics of the supermatrix, AT bias, and inadequate taxon sampling.

First, gap characters in our supermatrix might leave little overlap between genes shared by p-endosymbionts and facultatively associated or free-living bacteria. Analysis of bacterial gene repertoires suggested Ca. Riesia, other louse p-endosymbionts, and Wigglesworthia shared a similar set of genes (Figure 4-3). Insect p-endosymbionts possess small genomes with a relatively similar set of genes, likely arriving at this repertoire by convergent processes (Merhej et al. 2009). To correct for this source of error we reduced our 2056-gene supermatrix to the 53 most conserved genes across our taxa. Re-analysis of our reduced supermatrix under standard models supported our original finding, that Ca. Riesia along with other louse p-endosymbionts is closely related to Wigglesworthia (Figure 4-5).

Second, we were concerned that base frequency bias may lead to long-branch attraction in our data set. Like Husnik et al. (2011) we used two approaches to correct for long-branch attraction: model correction and matrix recoding. For model correction we used a non-reversible model with three classes of GC base frequencies implemented in nhPhyML. However, because of the variability of results produced by repeated re-analysis of our data set under this model, we suspect the model (as implemented in nhPhyML) may be prone to finding locally optimal trees, never finding a globally optimal tree. For this reason we felt nhPhyML was not a valid option to describe the origins of louse endosymbionts. Next we recoded our reduced supermatrix to try to account for possible long-branch attraction. This included recoding amino acid matrices by Dayhoff6 [...]
[...] Pyrimidine should control for the AT base bias commonly found in endosymbiont genomes that may lead to long-branch attraction. However, strong-weak coding should not correct for this bias while still changing from four to two characters, acting as a control for this method. Surprisingly, ML trees calculated from both RY and SW recoded matrices supported Ca. Riesia, other louse p-endosymbionts, and Wigglesworthia as closely related with strong bootstrap support (Figure 4-5). This suggests AT bias in p-endosymbionts may not be as problematic as suggested by Husnik et al. (2011). ML trees calculated from Dayhoff4 and Dayhoff6 recoded amino acid matrices showed little support for the placement of Ca. Riesia and were difficult to interpret, because louse endosymbionts grouped at the base of the tree (Figure 4-5).

Third, previous studies using 16S rDNA have demonstrated that insect endosymbionts can group together or apart based on the number of non-endosymbiont taxa included in the tree (Naum et al. 2008; Allen et al. in rev.). We wanted to verify that our taxon sampling was adequate to describe multiple origins of louse endosymbionts when considering only 16S rDNA. An ML gene tree was calculated under a GTR [...] 16S rDNA from the same taxa included in our phylogenomic analysis. Our results with 16S rDNA supported five origins of louse endosymbionts. The taxa included in this analysis represented a subset of taxa included in a recent large study of louse endosymbionts using 16S rDNA (Allen et al. in rev.). However, our results with 16S rDNA appeared to be consistent with this larger study by Allen et al. (in rev.).

To better understand how individual protein-coding genes contributed to our overall supermatrix results, we constructed individual ML trees for each of the 53 genes
that composed our reduced supermatrix. Each gene tree was calculated from nucleotide sequences under a GTR [...] supported more than two origins of louse endosymbionts with Ca. Riesia as closely related to Arsenophonus, similar to the 16S rDNA-based phylogeny. The remaining 49 gene trees supported a topology similar to the supermatrix-derived trees (Figure 4-6). These data point to the utility of multi-gene approaches when determining phylogenetic relationships of bacteria. While a few genes support a relationship similar to previous studies, the majority of gene trees supported an entirely different relationship.

Our data support the genus Ca. Riesia not as part of the genus Arsenophonus but as a phylogenetically distinct clade with its closest known relative being Wigglesworthia (p-endosymbionts of tsetse flies). Our findings also suggest that Ca. Riesia is a much more diverse clade of louse p-endosymbionts than previously thought. We found Ca. Riesia p-endosymbionts in three genera of sucking lice (Anoplura) and in the Rhyncophthirina, a sister group of the sucking lice. Based on our phylogenomic data we propose that these three p-endosymbionts be designated as species within the genus Ca. Riesia.

New species Candidatus Riesia rubra; p-endosymbiont of Pedicinus badii. This species is named for the red color of the Red Colobus Monkey on which the host lice were found. P. badii are known to parasitize other Colobus monkey species, but we do not know if they all harbor this new species of endosymbiont. Fukatsu et al. (2009) proposed the name Candidatus Puchtella pedicinophila for the p-endosymbiont of Pedicinus obtusus (Phthiraptera, Anoplura). Allen et al. (in rev.) found that all p-endosymbionts of Pedicinus species form a clade based
on 16S rDNA. We suggest Ca. Puchtella represents a junior synonym of Ca. Riesia and Ca. Puchtella pedicinophila should be considered a species of Ca. Riesia.

New species Candidatus Riesia indica; p-endosymbiont of Hematomyzus elephantus, parasite of elephants. This Ca. Riesia species possesses the smallest sequenced genome of any Ca. Riesia species. This is the only known Ca. Riesia species in symbiosis with a member of the Rhyncophthirina, sister clade to the Anoplura. The proposed species name was chosen because it is a p-endosymbiont of lice found on the Indian elephant.

New species Candidatus Riesia caerulea; p-endosymbiont of Linognathus spicatus, parasite of Blue Wildebeest. The proposed species name was chosen because this Ca. Riesia species is a p-endosymbiont of lice parasitizing Blue Wildebeest; derived from the Latin word for blue, caeruleus.

The s-endosymbiont found in sucking lice parasitizing northern fur seals was unique in our study. This endosymbiont was supported as closely related to Sodalis in all of our analyses, consistent with 16S rDNA analysis (Allen et al. in rev.; Chapter 3). S. glossinidius is an s-endosymbiont found in tsetse flies (Dale and Maudlin 1999). Sodalis-like p-endosymbionts have been detected in grain weevils, spittlebugs, and chewing lice (Clayton et al. 2012; Smith et al. 2013; Koga et al. 2013). It seems likely that the Sodalis-like endosymbionts are actually a diverse clade of insect endosymbionts, only just beginning to be described. This symbiont has a genome very similar to that of Sodalis (Figures 4-3 and 4-7; Chapter 3; see also Charles et al. 1997; Toh et al. 2006; Clayton et al. 2012). Its genome is considerably larger than any Ca. Riesia genome and has a much higher GC content. In Chapter 3 this was described as an s-
endosymbiont and it likely represents a recent endosymbiont replacement in these unusual marine lice.

Our results show a departure from previous work on the number of origins and [...] proteobacteria. We still suggest multiple origins (e.g. the Sodalis-like s-endosymbiont included in this study, a Legionella-like endosymbiont not available for our study, the dual endosymbionts of Ancistroplax crocidurae, and likely more undescribed); however, previous studies may have overestimated the number of origins. Based on 16S rDNA we expected to find five origins of louse endosymbiosis in our data, but phylogenomic data supported only two. Additionally, we find evidence for a widespread p-endosymbiont clade in sucking lice and their sister group, the Rhyncophthirina. This suggests lice possess a clade of ancient p-endosymbionts with small AT-rich genomes. We rigorously controlled for problems in phylogenetic analysis of p-endosymbionts: long-branch attraction and gene repertoire. Further, we looked at variation in a set of individual genes, finding that only a few genes deviated from our proposed phylogeny of louse p-endosymbionts. By combining multiple gene sequences and increased taxon sampling of louse endosymbionts we were able to confidently build a backbone tree of Enterobacteria accurately placing louse p-endosymbionts.
Figure 4-1. Phylogenomic workflow. Gray ovals indicate supermatrices used in phylogenomic analysis and include information about each supermatrix. Analysis workflow is indicated by arrows. Model of sequence evolution and data type are indicated in columns; supermatrix type and methods of data coding are indicated in rows.
Figure 4-2. [...] proteobacteria used in this study. Red triangles denote louse endosymbionts, green crosses denote non-louse insect p-endosymbionts, blue triangles denote non-louse insect s-endosymbionts, and all others are denoted by a black circle. Note that Serratia symbiotica is designated as a p-endosymbiont in this chart, although it is sometimes classified as an s-endosymbiont.
Figure 4-3. [...] proteobacteria. Neighbor-Joining tree constructed from a binary matrix representing presence or absence of genes [...] proteobacteria included in this study. Louse p-endosymbionts highlighted in red.
Figure 4-4. [...] proteobacteria calculated from 2056-gene supermatrix. Values at nodes indicate percent bootstrap support from 217 replicates. Vertical bars at tree tips indicate endosymbionts of insects, with roles indicated as PE=primary endosymbiont, SE=secondary endosymbiont, and RP=reproductive parasite.
Figure 4-5. ML trees calculated from 53-gene supermatrices. Data type indicated at top of columns, data coding indicated next to each tree, values at nodes indicate [...] parts of trees have been pruned. Louse endosymbionts highlighted in red. Trees originally calculated with 42 taxa, but trimmed to summarize results.
Figure 4-6. ML tree calculated from 16S rDNA. Values at nodes indicate percent bootstrap support from 1000 replicates. Louse endosymbionts highlighted in red.
Figure 4-7. Counts of clusters found when comparing 104,735 translated gene sequences from 42 taxa. Changes in minimum overlap between gene sequences for inclusion in a cluster are plotted on x axes. Changes in minimum percent identity between overlapping sequences are shown in legend. Upper left shows total number of clusters. Upper right shows total number of clusters that contain only one sequence. Lower left shows total number of clusters that contain between 2 and 42 sequences. Lower right shows total number of clusters containing greater than 42 sequences.
Figure 4-8. Counts of amino acid sequences included in clusters that contain one sequence, 2-42 sequences, and greater than 42 sequences. Minimum overlap required for inclusion of a sequence in a cluster is 0.6 of total sequence length. Minimum percent identity of sequences within the overlap is plotted on x axes.
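The tallies summarized in Figures 4-7 and 4-8 can be illustrated with a short sketch. It assumes the clustering step has already produced a list of cluster sizes (number of sequences per cluster); the function name `bin_cluster_sizes` and the bin labels are hypothetical and are not taken from the analysis pipeline used in this study. The three size classes follow the figure captions: one sequence, 2 to 42 sequences (42 being the number of taxa), and more than 42 sequences.

```python
def bin_cluster_sizes(cluster_sizes, n_taxa=42):
    """Tally clusters and member sequences in three size classes.

    cluster_sizes: iterable of positive integers, one per cluster.
    n_taxa: number of taxa compared (42 in the captions above).
    """
    # Each entry holds [number of clusters, number of sequences].
    bins = {"singleton": [0, 0], "2_to_n_taxa": [0, 0], "over_n_taxa": [0, 0]}
    for size in cluster_sizes:
        if size < 1:
            raise ValueError("cluster sizes must be positive")
        if size == 1:
            key = "singleton"
        elif size <= n_taxa:
            key = "2_to_n_taxa"
        else:
            key = "over_n_taxa"
        bins[key][0] += 1      # one more cluster in this class
        bins[key][1] += size   # its sequences count toward this class
    return {k: {"clusters": v[0], "sequences": v[1]} for k, v in bins.items()}


# Hypothetical cluster sizes, for illustration only.
summary = bin_cluster_sizes([1, 1, 5, 42, 43, 100])
print(summary)
```

Repeating such a tally for each combination of minimum overlap and minimum percent identity would give the kind of counts plotted in the two figures.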
Table 4-1. Summary of phylogenetic studies including louse endosymbionts.
Source | Data | Taxa included | Origins detected | Taxa in this study
Fukatsu et al. 2006 | 16S rDNA | 1 | 1 | 1
Allen et al. 2007 | 16S rDNA | 4 | 1 | 2
Hypsa & Krizek 2007 | 16S rDNA | 8 | 5 | 2
Allen et al. 2009 | 16S rDNA | 4 | 1 | 2
Fukatsu et al. 2009 | 16S rDNA, groEL | 2 | 2 | 2
Novakova et al. 2009 | 16S rDNA | 6 | 3 | 2
Husnik et al. 2009 | 69 genes | 1 | 1 | 1
Allen et al. in rev. | 16S rDNA | Many | 10 | 6

Table 4-2. Louse endosymbionts and genome characteristics.
Louse | Endosymbiont | Reference | Genome size | %GC | Contigs | N50
Human Body Louse: Pediculus humanus | Ca. Riesia pediculicola str. USDA | Kirkness et al. 2010 | 582,127 | 28.57 | na | na
Chimpanzee Louse: Pediculus schaeffi | Ca. Riesia pediculischaeffi str. PTSU | Chapter 2 | 581,916 | 31.79 | na | na
Indian Elephant Louse: Hematomyzus maximus | Ca. Riesia minima new sp. | this study | 437,514 | 25.28 | 258 | 45,962
Northern Fur Seal Louse: Proechinophthirus fluctus | Sodalis-like | Chapter 3 | 3,305,175 | 43.56 | 629 | 76,377
Blue Wildebeest Louse: Linognathus spicatus | Ca. Riesia ungula new sp. | this study | 511,310 | 26.64 | 1 | 511,310
Red Colobus Monkey Louse: Pedicinus badii | Ca. Riesia rubra new sp. | this study | 558,122 | 24.19 | 2 | 305,250
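For readers unfamiliar with the assembly statistics in Table 4-2, the following is a minimal sketch of how %GC and N50 are conventionally computed. It is not the software used to generate the table; the function names are illustrative. N50 is taken here as the length of the contig at which the running total of contig lengths, sorted from longest to shortest, first reaches half of the assembly size.

```python
def gc_percent(sequence):
    """Percent G+C among unambiguous bases (A, C, G, T) of a sequence."""
    seq = sequence.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G, or T")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def n50(contig_lengths):
    """Contig length at which the cumulative sum of sorted lengths
    (longest first) reaches at least half of the total assembly length."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contigs supplied")


# A single-contig assembly has an N50 equal to its genome size,
# as in the Blue Wildebeest louse endosymbiont row of Table 4-2.
print(n50([511310]))
print(gc_percent("ATGC"))
```

By this definition an assembly in two contigs reports the longer contig as its N50, which agrees with the two-contig row of the table.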
Table 4-3. Supermatrix phylogenetic methods and the number of origins of louse symbiosis detected.
Software | Model | Data type | Coding type | Positions | Loci | Starting tree | Bootstrap | Origins of louse symbiosis
RAxML | GTR | Nucl. Acid | ATCG | 123 | 2056 | None | 217 | 2
RAxML | GTR | Nucl. Acid | ATCG | 123 | 54 | None | 1K | 2
nhPhyML | Variable, GC=3 | Nucl. Acid | ATCG | 123 | 54 | ML tree | NA | 2
nhPhyML | Variable, GC=3 | Nucl. Acid | ATCG | 12 | 54 | ML tree | NA | 2
nhPhyML | Variable, GC=3 | Nucl. Acid | ATCG | 12 | 54 | Random | NA | 2-5
RAxML | GTR | Nucl. Acid | RY | 123 | 54 | None | 1K | 2
RAxML | GTR | Nucl. Acid | SW | 123 | 54 | None | 1K | 2
RAxML | CAT Dayhoff | Amino Acid | Standard | NA | 54 | None | 1K | 2
RAxML | CAT Dayhoff | Amino Acid | Dayhoff4 | NA | 54 | None | 1K | 2
RAxML | CAT Dayhoff | Amino Acid | Dayhoff6 | NA | 54 | None | 1K | 2
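The nucleotide coding types listed in Table 4-3 can be illustrated with a brief sketch. RY coding collapses purines (A, G) to R and pyrimidines (C, T) to Y, while SW coding collapses strong (G, C) and weak (A, T) bases to S and W, following the IUPAC ambiguity codes. The helper names below are hypothetical, and the code is an illustration rather than the scripts used in this study. The sketch also assumes that the Positions column (123 versus 12) refers to the codon positions retained in the alignment, which is an interpretation of the table.

```python
# IUPAC-based recoding schemes; other characters (gaps, N) pass through.
RY_CODING = {"A": "R", "G": "R", "C": "Y", "T": "Y"}
SW_CODING = {"G": "S", "C": "S", "A": "W", "T": "W"}


def recode(sequence, scheme):
    """Recode an aligned nucleotide sequence with the given scheme."""
    return "".join(scheme.get(base, base) for base in sequence.upper())


def drop_third_positions(sequence):
    """Keep codon positions 1 and 2 of an in-frame coding alignment."""
    return "".join(base for i, base in enumerate(sequence) if i % 3 != 2)


aligned = "ATGCCA"
print(recode(aligned, RY_CODING))     # purine/pyrimidine coding
print(recode(aligned, SW_CODING))     # strong/weak coding
print(drop_third_positions(aligned))  # positions 1 and 2 only
```

Because RY coding merges A with G and C with T, it removes the signal carried by transitions, including much of the AT versus GC compositional difference among taxa, which is the usual motivation for applying such recoding to AT-rich endosymbiont genomes.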
Table 4-4. Summary of louse p-endosymbiont placement in individual gene ML trees (LPE=louse p-endosymbiont, PFE=Fur Seal Louse endosymbiont). Trees calculated using a GTR [...]. Edwar=Edwardsiella.
Cluster/gene ID | PFE sister species | Ca. Riesia & other LPEs sister species
31415/rpmI 50S ribosomal subunit protein L35 | Sodalis | Wigglesworthia+Buchnera
29875/csrA pleiotropic regulatory protein for carbon source |  | Wigglesworthia+Buchnera
28870/rpsR ribosomal protein S18 | Sodalis | Wigglesworthia+Buchnera
26576/rpsJ ribosomal protein S10 | Sodalis | Wigglesworthia+Buchnera
26174/rplX 50S ribosomal subunit protein L24 | Sodalis | Wigglesworthia+Buchnera
26076/30S ribosomal protein S15 | Sodalis | Wigglesworthia+Buchnera
24804/rplR ribosomal protein L18 | Sodalis | Wigglesworthia+Buchnera
23953/rpsH ribosomal protein S8 | Sodalis | Wigglesworthia+Buchnera
23440/rplP ribosomal protein L16 | Sodalis | Wigglesworthia+Buchnera
23365/rplQ ribosomal protein L17 | Sodalis | Wigglesworthia+Buchnera
22609/rpsK ribosomal protein S11 | Sodalis | Wigglesworthia+Buchnera
22137/groS chaperonin GroS | Sodalis | Arsenophonus+Wigglesworthia+Buchnera
21055/rplA ribosomal protein L1 |  | Wigglesworthia+Buchnera
19809/rpsE ribosomal protein S5 | Sodalis | Wigglesworthia+Buchnera
19623/nuoI NADH:ubiquinone oxidoreductase, chain I | Sodalis | Wigglesworthia+Buchnera
19443/dksA RNA polymerase binding protein DksA |  | Wigglesworthia+Buchnera
18664/oligoribonuclease | Sodalis | Wigglesworthia+Buchnera
18275/ATP-dependent protease HslV | Sodalis | Arsenophonus+Wigglesworthia+Buchnera
17281/ribA GTP cyclohydrolase II | Sodalis | Arsenophonus+Wigglesworthia+Buchnera
16261/rpoA DNA-directed RNA polymerase | Sodalis | Wigglesworthia+Buchnera
15666/rpsC ribosomal protein S3 | Sodalis | Wigglesworthia+Buchnera
15128/rpiA ribose 5-phosphate isomerase A | Sodalis | Wigglesworthia+Buchnera
14359/rpsB ribosomal protein S4 | Sodalis | Wigglesworthia+Buchnera
9049/pfkA 6-phosphofructokinase | Sodalis | Arsenophonus+Wigglesworthia+Buchnera
9029/gidA2 glucose-inhibited division protein A |  | Wigglesworthia+Buchnera
8923/rpsD ribosomal protein S4 | Sodalis | Wigglesworthia+Buchnera
8920/ispE 4-diphosphocytidyl-2C-methyl-erythritol kinase |  | Wigglesworthia+Buchnera
8383/gap glyceraldehyde 3-phosphate dehydrogenase, type I | Sodalis | Wigglesworthia+Buchnera
7182/fbaA fructose bisphosphate aldolase, class II | Sodalis | Wigglesworthia+Buchnera
7046/aroC chorismate synthase | Edwardsiella | Wigglesworthia+Buchnera
6927/ftsZ cell division protein | Sodalis | Wigglesworthia+Buchnera
6895/ribonucleoside diphosphate reductase, beta subunit | Sodalis | Wigglesworthia+Buchnera
6597/dnaJ chaperone protein DnaJ | Sodalis | Wigglesworthia+Buchnera
6031/3-oxoacyl-[acyl carrier protein] synthase 1 |  | Wigglesworthia+Buchnera
5495/UDP-N-acetylglucosamine 1-carboxyvinyltransferase | Sodalis | Wigglesworthia+Buchnera
5089/eno phosphopyruvate hydratase |  | Wigglesworthia+Buchnera
4864/preprotein translocase, SecY subunit |  | Wigglesworthia+Buchnera
Table 4-4. Continued
Cluster/gene ID | PFE sister species | Ca. Riesia & other LPEs sister species
4827/hslU heat shock protein HslVU, ATPase subunit HslU |  | Wigglesworthia+Buchnera
4745/rho transcription termination factor Rho | Sodalis | Wigglesworthia+Buchnera
4646/clpX ATP-dependent Clp protease, ATP-binding subunit ClpX | Sodalis | Wigglesworthia+Buchnera
3851/asnS asparaginyl tRNA synthetase | Sodalis | Wigglesworthia+Buchnera
3754/atpD F1 sector of membrane-bound ATP synthase, beta subunit | Sodalis | Wigglesworthia+Buchnera
2566/glnS glutaminyl tRNA synthetase | Sodalis | Wigglesworthia+Buchnera
2520/rpsA ribosomal protein S1 | Sodalis | Wigglesworthia+Buchnera
2429/argS arginyl tRNA synthase | Sodalis | 
2128/nuoC NADH:ubiquinone oxidoreductase, chain C,D | Sodalis | Wigglesworthia+Buchnera
1952/lepA GTP-binding protein LepA | Sodalis | Wigglesworthia+Buchnera
1855/thrS threonyl tRNA synthetase | Sodalis | Wigglesworthia+Buchnera
1850/hflB ATP-dependent metallopeptidase HflB | Sodalis | Wigglesworthia+Buchnera
1748/dnaK chaperone protein DnaK | Sodalis | Wigglesworthia+Buchnera
983/lon, ATP-dependent protease La | Sodalis | Wigglesworthia+Buchnera
718/gyrA DNA gyrase A subunit | Sodalis | Wigglesworthia+Buchnera
231/rpoC DNA-directed RNA polymerase beta subunit | Sodalis | Wigglesworthia+Buchnera
Table 4-5. Non-louse-associated taxa included in phylogenetic analysis. Special note: Serratia symbiotica is treated as a primary endosymbiont, as some species of aphids require its presence in the bacteriome.
Genus | Species | Strain | P or S endosymbiont | Genome size | %GC
Arsenophonus | nasoniae | Son killer |  | 3,575,339 | 35.58
Baumannia | cicadellinicola | Hc | primary | 686,194 | 33.24
Brenneria | sp. |  |  | 4,943,773 | 55.85
Buchnera | aphidicola | Ak | primary | 653,223 | 25.69
Buchnera | aphidicola | Cc | primary | 416,380 | 20.1
Buchnera | aphidicola | Cinara | primary | 444,925 | 23.03
Buchnera | aphidicola | Sg | primary | 641,454 | 25.33
Buchnera | aphidicola | Tuc7 | primary | 641,895 | 26.29
Buchnera | aphidicola | Ua | primary | 627,953 | 24.18
Ca. Blochmannia | floridanus |  | primary | 705,557 | 27.38
Ca. Blochmannia | pennsylvanicus | BPEN | primary | 791,654 | 29.56
Ca. Blochmannia | vafer | BVAF | primary | 722,585 | 27.51
Ca. Hamiltonella | defensa | 5AT | secondary | 1,843,969 | 40.26
Ca. Regiella | insecticola | R5 15 | secondary | 2,013,072 | 42.6
Dickeya | dadantii | 3937 |  | 4,922,802 | 56.3
Dickeya | dadantii | Ech586 |  | 4,818,394 | 53.64
Edwardsiella | ictaluri | 93-146 |  | 3,812,315 | 57.44
Escherichia | coli | K-12 |  | 4,686,137 | 50.78
Escherichia | coli | O157:H7 |  | 5,376,914 | 50.24
Hafnia | alvei | ATCC |  | 4,883,880 | 48.19
Pantoea | ananatis | AJI3355 |  | 4,877,280 | 53.7
Pectobacterium | wasabiae | wpp |  | 5,063,892 | 50.48
Photorhabdus | asymbiotica |  |  | 1,611,855 | 43.27
Photorhabdus | luminescens | laumondii |  | 5,688,987 | 42.83
Rahnella | aquatilis | CIP |  | 5,448,900 | 52.02
Rickettsia | rickettsii | Iowa | OUTGROUP |  | 
Serratia | odorifera | DSM |  | 5,362,525 | 56.16
Serratia | sp. | AS12 |  | 5,443,009 | 55.96
Serratia | symbiotica | Cc | primary | 1,762,765 | 29.23
Serratia | symbiotica | Tucson | primary | 2,792,868 | 47.93
Shigella | boydii | 3594-74 |  | 4,662,214 | 51.15
Sodalis | glossinidius | morsitans | secondary | 4,292,502 | 54.59
Wigglesworthia | glossinidia | G. brevipalpis | primary | 703,004 | 22.48
Wigglesworthia | glossinidia | G. morsitans | primary | 719,535 | 25.22
Xenorhabdus | bovienii | SS-2004 |  | 4,225,498 | 44.97
Xenorhabdus | nematophila | ATCC |  | 4,587,917 | 44.27
CHAPTER 5
CONCLUSIONS: TOWARDS DESCRIBING THE TRANSITION TO OBLIGATE VERTICALLY INHERITED SYMBIOSIS BY INSECT-ASSOCIATED BACTERIA

Significance

Insects are host to a variety of intracellular heritable bacteria (Buchner 1965), some of which convey adaptive traits to their insect hosts (Feldhaar 2011; Gosalbes et al. 2010). These bacteria are referred to as endosymbionts. Some are obligate symbionts that can only be passed on from mother to offspring (Buchner 1965; Bright and Bulgheresi 2010), while others maintain a free-living stage as well or can be horizontally transmitted (Bright and Bulgheresi 2010). Obligate symbionts provide dietary supplements to their hosts (including essential vitamins and amino acids) and/or assist in metabolism (Gosalbes et al. 2010). The acquisition of obligate heritable endosymbionts likely facilitated the radiation of insect groups that exclusively utilize nutritionally incomplete diets (Allen et al. 2009). Despite this, there is no evolutionary model to describe how these symbioses originated. This leaves two open questions. What are the sources of these obligate vertically inherited bacterial symbionts? How do evolutionary forces shape or limit the development of these symbioses?

Insect Endosymbiosis

Symbiosis is an interaction between two dissimilar organisms (Smith and Douglas 1987). This definition does not describe the consequences to the fitness of either player in the symbiotic interaction. Endosymbionts (the focus of this chapter) are simply microbial symbionts that are housed within host tissues and/or cells (Wernegreen 2004). Here, I will exclude pathogens (those microorganisms that have a strong negative effect on host fitness and are primarily horizontally transmitted) and refer to them separately. I will focus on heritable bacterial endosymbionts
of insects. These endosymbionts are primarily or exclusively vertically transmitted between host generations and can be broken into three categories following Dale and Moran (2006): primary endosymbionts, secondary endosymbionts, and reproductive parasites.

Primary endosymbionts (p-endosymbionts) are obligate, vertically inherited (maternal), intracellular bacteria that are housed in specialized structures called bacteriomes (for review see Buchner 1965; Bright and Bulgheresi 2010). Bacteriomes are large complex structures with specialized cells that store p-endosymbionts (Buchner 1965). These structures form during early embryonic development (following infection by p-endosymbionts [...] to storage, transmission of p-endosymbionts is facilitated and highly regulated by the host through signaling and immune response (Buchner 1965; Hinde 1971; Login et al. 2011). In aphids, Hinde (1971) presented evidence that p-endosymbionts present in the hemolymph or fatty tissues are subject to attack by the host immune system. Hinde (1971) also found that the same immune responses were active in the bacteriome to remove dead or damaged p-endosymbiont cells. Login et al. (2011) reported that weevils regulate the growth and spread of p-endosymbionts through a p-endosymbiont-targeted antimicrobial peptide. Login et al. (2011) showed that stopping production of this peptide results in a change in the morphology of p-endosymbionts and that p-endosymbionts were able to escape the bacteriome, infecting other host tissues. Host control of p-endosymbionts has likely facilitated ancient associations between bacterial endosymbionts and insects. Associations between p-endosymbionts and insects have persisted for tens to hundreds of millions of years, with
p-endosymbionts co-speciating with their hosts (Clark et al. 2000; Dale and Moran 2006).

Secondary endosymbionts (s-endosymbionts) are facultative intracellular endosymbionts that are primarily vertically inherited. Unlike p-endosymbionts, s-endosymbionts are not housed in specialized structures, but are found in multiple tissue types (Dale and Moran 2006). Although primarily vertically inherited, these endosymbionts can undergo horizontal transmission within and between host species, and thus do not show patterns of co-speciation with their hosts (Dale and Moran 2006; Oliver et al. 2010).

Reproductive parasites are ancient parasites that manipulate the reproductive biology of arthropods to ensure they are vertically transmitted in host populations (Dale and Moran 2006). Like s-endosymbionts, these parasites undergo horizontal exchange and do not co-speciate with their hosts (Dale and Moran 2006; Werren et al. 2008).

Evolutionary Models of Endosymbiosis in Insects

Interactions between insects and endosymbiotic bacteria are complex and much remains unknown about the evolutionary forces that shape those relationships. Insect p-endosymbionts have some of the smallest known bacterial genomes; these tiny genomes are the result of an endosymbiotic lifestyle (Moran 1996; Nilsson et al. 2005; [...] presented a model to describe the evolutionary forces that shape the genomes of ancient insect p-endosymbionts, including the effects of natural selection, drift, and host control. In this example, the symbiosis is stable and has persisted for tens to hundreds of millions of years, as in the endosymbionts of aphids or tsetse flies (Gosalbes et al. 2010).
Smith (2007) presented evidence on how selfish mutations might drive adaptation in populations of host s-endosymbionts and reproductive parasites. Importantly, neither of these models attempted to describe how obligate p-endosymbiosis begins and what evolutionary forces shape endosymbionts and their hosts during this transition period.

Ewald (1987) reviewed early concepts on the origins of beneficial endosymbiosis and how selection acts on endosymbionts. He reviewed two hypotheses, the continuum hypothesis and the preadapted hypothesis. The continuum hypothesis presents a deterministic model where selection acts on the interaction between a host and a parasitic microorganism. Selection will favor synergistic interactions, driving the loss of pathogenicity and the development of beneficial interactions (Feldhaar 2011). This implies that both facultative and obligate endosymbionts originated from reproductive parasites or pathogenic bacteria. The origin of beneficial endosymbionts (both obligate and facultative) from pathogens has gained traction with phylogenetic researchers (Sachs et al. 2010; Clayton et al. 2012); however, this model does not attempt to describe how natural selection acts within and across endosymbiont populations. While endosymbionts may often be closely related to pathogens, phylogenetic data provide little inference about the biology of ancestral states.

The alternative to the continuum hypothesis is the preadapted hypothesis (Ewald 1987). Here, naïve bacteria possess traits beneficial in symbiosis prior to interaction with the host (Ewald 1987). Ewald (1987) seemed to discount this hypothesis because the benefit of the symbiosis would have to outweigh the costs of a symbiotic lifestyle.
However, pathogens are already associated with a host and the interaction is available for selection (Ewald 1987).

It seems unlikely that obligate p-endosymbionts are all derived from a similar source (such as parasites). Instead, it is more likely that p-endosymbionts are derived from multiple sources, and accordingly the path to obligate symbiosis is more variable than presented in the continuum hypothesis. I argue that we need an evolutionary model that could describe how obligate p-endosymbioses form. This model would evaluate how evolutionary forces shape endosymbionts and their hosts, including directionality of transitions to p-endosymbionts, effects of genome erosion, effects of new mutations on the endosymbiont and host, effects of natural selection on the endosymbionts, effects of host control over endosymbionts, and competition among strains of the same endosymbiont, as well as with other potential symbionts. It would also serve to evaluate potential reservoirs of new p-endosymbionts (including facultative endosymbionts, reproductive parasites, pathogens, enteric flora, and free-living species).

Sources of P-endosymbionts and Directionality of Transitions

It is likely that there is more than one source for new p-endosymbionts. Rickettsia and Wolbachia have recently been found to act as p-endosymbionts in booklice and bedbugs (Perotti et al. 2006; Hosokawa et al. 2010). These bacteria typically act as reproductive parasites or pathogens in a diversity of arthropods, suggesting a recent transition to p-endosymbiosis (Werren et al. 2008; Weinert et al. 2009). In Chapter 3, I presented data showing that a Sodalis-like endosymbiont is potentially taking on the role of a p-endosymbiont, but without the organized storage seen with louse p-endosymbionts. Because there is no p-endosymbiont present, this
suggests that an s-endosymbiont is provisioning vitamins for the host. Collectively, this points to facultative endosymbionts (both reproductive parasites and s-endosymbionts) as reservoirs of proto-p-endosymbionts. This evidence, however, does not rule out additional sources, such as pathogenic, enteric, or free-living bacteria. As transitions in symbiosis occur over long evolutionary time, it is critical to develop an evolutionary model that could evaluate potential sources. These results could then be supported or challenged based on the phylogenetic placement of p-endosymbionts in the bacterial tree of life, rather than relying on phylogenetics alone to pinpoint sources.

A model could also inform the directionality of transitions in endosymbiosis. It seems likely that the transition to p-endosymbiont from facultative endosymbiont, pathogen, or free-living bacterium would be in one direction. Experimental data suggest that a p-endosymbiont genome should rapidly degrade and become much smaller (Nilsson et al. 2005; see also Delmotte et al. 2006; Allen et al. 2009). These rapid genomic changes may quickly lead a p-endosymbiont to become dependent on a particular host for survival and limit its ability to escape obligate symbiosis. This suggests that the transition to p-endosymbiont could be an irreversible, one-way change.

Genome Erosion in P-endosymbionts

Insect p-endosymbionts possess very small, AT-rich genomes (Gosalbes et al. 2010; Chapter 4). They also lack many of the DNA repair mechanisms seen in free-living or enteric bacteria (Dale et al. 2003). It is likely that proto-p-endosymbionts entered into symbiosis with much larger genomes, but much of the genome was lost through a process known as genome erosion (Delmotte et al. 2006; Allen et al. 2009). This process is facilitated by the lifestyle of p-endosymbionts. These endosymbionts exist in small clonal populations and undergo population bottlenecks during
[...] accumulation of mutations in endosymbiont genomes (Moran 1996; Allen et al. 2009). Genome erosion has been experimentally shown to remove large regions of bacterial genomes over short periods of evolutionary time, particularly when DNA repair mechanisms are inhibited (Nilsson et al. 2005). Delmotte et al. (2006) found evidence that this erosion removes genes unnecessary to maintain the symbiosis, probably genes that were selectively constrained prior to symbiosis (see also Allen et al. 2009). Allen et al. (2009) suggested that this process of genome erosion slows as the symbiosis matures. This implies selection may limit genome erosion after superfluous genes are lost. However, this process will continue to impact the genomes of p-endosymbionts and has led to the development of mechanisms to compensate for deleterious mutations that cannot be purged from populations (Tamas et al. 2008; Kupper et al. 2014). This suggests a growing dependency of gene function on the genetic background through evolutionary time. Therefore, models that describe genome evolution in long-established p-endosymbionts may not be informative if applied to bacteria transitioning to p-endosymbiosis.

Effects of Mutation

New mutations arising in the p-endosymbiont genome could have a neutral or nearly neutral effect on the fitness of both the endosymbiont and its host. The effect and prevalence of neutral mutations may change with the age of the symbiosis, because the p-endosymbiont genome shrinks considerably during symbiosis (Delmotte et al. 2006). Genome shrinkage could, in and of itself, have a fitness effect for the endosymbiont by removing mutational targets and decreasing energetic costs associated with a larger genome. However, it is not entirely clear how much of genome
decay is adaptive and how much is driven by neutral processes. Endosymbionts have extremely AT-rich genomes (Gosalbes et al. 2010). This could be energetically beneficial, but mutation is universally biased from GC to AT in bacteria (Hershberg and Petrov 2010; Van Leuven and McCutcheon 2011). Therefore, AT accumulation could be explained equally easily by genetic drift or by adaptive processes.

It is much more straightforward to describe the effect of mutations arising in the p-endosymbiont genome that are deleterious to both p-endosymbionts and their hosts. These mutations would likely impact essential cell functions of the p-endosymbiont, inhibiting survival and fitness of the p-endosymbiont (Rispe and Moran 2000). As the p-endosymbionts are essential to the survival of the host insect, any loss of fitness in p-endosymbionts would be deleterious to the host. The loss of DNA repair mechanisms may increase the number of deleterious mutations arising in p-endosymbiont populations (i.e. mutations that could have been removed in other bacterial species). However, the effect of DNA repair may increase with the severity of genome erosion.

The remaining type of mutation that could arise is a selfish mutation. These mutations would be beneficial to the p-endosymbiont, but detrimental to its host (Rispe and Moran 2000). P-endosymbionts produce essential compounds absent in the diet of insects (Gosalbes et al. 2010). It is expected that the p-endosymbionts produce these nutrients in quantities above what is needed by the p-endosymbionts in order to make them available to the host (Rispe and Moran 2000). Any mutation that disrupts this increased production would provide an energetic benefit to the endosymbiont and selfishly increase its own fitness, while causing a fitness cost to the host (Rispe and Moran 2000).
Rispe and Moran (2000) described targets for deleterious and selfish mutations in the p-endosymbionts of aphids, Buchnera. 80-85% of the Buchnera genome was predicted to be devoted to essential cellular functions, while only 15-20% was devoted to functions essential to its host alone (Rispe and Moran 2000). Mutations affecting the larger set of essential cell functions would be deleterious to both Buchnera and the aphid. However, mutations arising in the functions essential to the host could increase the fitness of Buchnera while being deleterious to the host.

The sizes and effects of deleterious and selfish mutations would likely be different in recently captured p-endosymbionts. There would likely be more essential cell functions and therefore a larger target for deleterious mutations (see Rispe and Moran 2000). The roles of the endosymbiont would not be as well defined and selfish mutations may impact the host less, but could have a larger effect in competition between candidate p-endosymbiont species. Therefore, any model attempting to describe the formation of p-endosymbionts would need to allow for changes in the effect of selfish mutations over time.

Natural Selection

Different approaches have been put forth to understand how natural selection acts on endosymbionts (Ewald 1987; Rispe and Moran 2000; Smith 2007; Feldhaar 2011). The holobiont or hologenome approach, which treats a host and its endosymbionts as a single unit visible to selection, has some support (Feldhaar 2011). This approach is more or less a form of group selection (Leigh 2010) and shares similarities with the continuum hypothesis. Here the unit of selection is the interaction between host and endosymbiont (Ewald 1987). However, this approach largely discounts selection acting within a population of endosymbionts (i.e. within a host) and across endosymbiont populations (i.e. across a host population). Rispe and Moran (2000) demonstrated that
selection acts within host and across host populations of endosymbionts in established p-endosymbionts. They found that changes in the size of the transmission bottlenecks had dramatic consequences for genetic drift and natural selection (Rispe and Moran 2000). Similar processes should be acting within forming p-endosymbionts; however, the effect of selection could be different (e.g. in the persistence of selfish mutations).

Kin selection, a specialized case of group selection, occurs when closely related organisms cooperate to increase the fitness of one or more partners (Hamilton 1964). Kin selection could be pronounced in p-endosymbionts. For example, in grain weevils, a single source population of p-endosymbionts divides and diverges during development of the weevil, with one population occupying a region near the gut and the other population occupying the ovaries (Bright and Bulgheresi 2010). In this example, only one population has the opportunity to be passed on to the next host generation, while the other population serves to ensure the survival of the host. Kin selection could be used to explain the evolution of this symbiotic system. However, the grain weevil example may be an extreme case. In most insects, there is not such a clear physical division of endosymbionts, with only a compartmentalized subgroup being transmitted to the next generation (Buchner 1965; Bright and Bulgheresi 2010). Nevertheless, could kin selection be applied to understand the persistence of other endosymbionts? Or could some simpler idea be invoked to understand selection in p-endosymbionts? P-endosymbionts cannot disperse outside of their bacteriome and may be in direct competition, therefore limiting the opportunity for kin selection in most cases (West et al. 2007). Clearly space and nutritional resources are limited for p-endosymbionts, again potentially promoting competition.
Allen et al. (2009) studied the substitution rates of 16S rDNA in insect p-endosymbionts. They found that the rate of substitution slowed with respect to the age of the symbiosis and suggested that selection may limit fixation of mutations in older p-endosymbionts (Allen et al. 2009). This would suggest that the effect of selection does change over time in p-endosymbionts. However, Allen et al. (2009) did not account for host effective population size or for p-endosymbiont transmission census size (relevant data were unavailable at that time), although both factors were found to be important in rates of evolution in p-endosymbionts by Rispe and Moran (2000). While further validation is needed to control for stochastic changes resulting from population effects, the findings of Allen et al. (2009) do provide tantalizing evidence that the effect of selection changes over evolutionary time in p-endosymbionts. This also shows that models meant to describe established p-endosymbiosis could not be applied to newly formed relationships; a new model is needed.

Host Control and Drift

Insects have developed complex mechanisms to store p-endosymbionts and guide their infection of developing offspring (Buchner 1965; Hinde 1971; Bright and Bulgheresi 2010; Login et al. 2013; Diekmann and Pereira Leal 2013). These include signaling and immune control to prevent infection of non-bacteriome-associated tissues and to guide transmission (Hinde 1971; Shigenobu and Stern 2012; Login et al. 2013). Control of insect p-endosymbionts could be particularly interesting to study in parasitic lice (Insecta, Anoplura), as these species have evolved different strategies for storage and transmission of p-endosymbionts (Buchner 1965). Here, the bacteriomes are derived from different host tissues and migration to the ovaries takes different routes in
different, but closely related, species of lice (Buchner 1965). These differences could be explained simply by p-endosymbiosis having evolved multiple times in the parasitic lice. The variation may then simply reflect random differences that arose independently. 16S rDNA-based phylogenies of louse p-endosymbionts provide evidence for multiple origins of p-endosymbioses (Allen et al. in rev.). However, this variation could also have evolved independently as different louse species needed to control the spread of p-endosymbionts. The phylogenetic data presented in Chapter 4 suggest there could have been a single infection in an ancestral louse species. The variation in p-endosymbiont storage and transmission would then have arisen independently in the presence of similar p-endosymbionts.

Rispe and Moran (2000) provided evidence that transmission bottleneck size was important in controlling proliferation of selfish mutations and limiting drift in p-endosymbiont genomes. In ancient p-endosymbionts, smaller transmission bottlenecks limited fixation of selfish mutations, while increasing the effect of drift (Rispe and Moran 2000). However, increasing the census size of p-endosymbionts passed on to the next generation facilitated maintenance of selfish mutations possibly detrimental to the host (Rispe and Moran 2000). Hence, host and p-endosymbiont population sizes must be considered when developing a model of endosymbiont transmission. As I argued above, the prevalence of selfish mutations could change with the age of the symbiosis. Parasitic lice could provide insights into host control over p-endosymbiont transmission, which makes an accurate phylogenetic reconstruction of these endosymbiotic bacteria necessary.
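The role of transmission bottleneck size in drift can be illustrated with elementary binomial sampling. The sketch below is not the model of Rispe and Moran (2000); it only assumes that the endosymbiont cells passed to an offspring are a random sample of n cells from a maternal population in which a variant has frequency p. The function names are hypothetical.

```python
def sampling_variance(p, n):
    """Variance in variant frequency among offspring after one
    bottleneck of n randomly sampled cells (binomial sampling)."""
    if not 0.0 <= p <= 1.0 or n < 1:
        raise ValueError("need 0 <= p <= 1 and n >= 1")
    return p * (1.0 - p) / n


def loss_probability(p, n):
    """Probability that none of the n transmitted cells carries the variant."""
    if not 0.0 <= p <= 1.0 or n < 1:
        raise ValueError("need 0 <= p <= 1 and n >= 1")
    return (1.0 - p) ** n


# Illustrative values only: a variant at 10% frequency in the mother.
for n_cells in (10, 100, 1000):
    print(n_cells, sampling_variance(0.1, n_cells), loss_probability(0.1, n_cells))
```

Under these assumptions a small bottleneck frequently loses a variant by chance and produces large differences among offspring, whereas a large bottleneck transmits the maternal frequency almost unchanged. This is consistent with the pattern described above, in which larger transmitted populations allow variants that arose within a host to persist.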
If endosymbiont transmission is not strictly unidirectional (i.e., a new p-endosymbiont may still be able to leave the bacteriomic population), migration between source populations could promote p-endosymbiont diversity by introducing new genetic material to p-endosymbiont populations. This could slow the accumulation of mutations and genome erosion in p-endosymbionts and have consequences for the effect of host control and p-endosymbiont transmission. Much remains unknown about how early p-endosymbionts are maintained, but studies of Buchnera provide clues (Rispe and Moran 2000; see also Sachs et al. 2010). The development of host control through physical storage and controlled transmission is likely important at all stages in the evolution of p-endosymbiosis.

Competition

It is known that s-endosymbionts have the potential to invade bacteriomes, the structures normally reserved for p-endosymbionts (see Oliver et al. 2010 for a review in aphids). Bacteriome invasion by s-endosymbionts in aphids can have different effects on host fitness depending on the s-endosymbiont strain, the p-endosymbiont strain, and environmental conditions (Oliver et al. 2010). This suggests that competition for space in the bacteriome exists, at least in established p-endosymbioses. S-endosymbionts could consume space, nutrients provisioned by the host, and possibly nutrients made by the p-endosymbionts. Competition from closely related bacterial strains (with at least some phenotypic differences) or from unrelated species likely affects the formation of new p-endosymbioses. Any model attempting to describe how obligate symbiosis evolves should account for insult from invasion and competition by other bacteria.
Summary

Intracellular symbiosis is prevalent in plants and animals, and is particularly common in insects (Smith and Douglas 1987). The associations between insects and bacteria provide opportunities to study and understand how intricate and obligate endosymbioses originate. Findings from insect endosymbionts will be a significant contribution to understanding the origins of complex organisms and how animals adapt to new resources. Here, I have pointed to an area where little is known about the evolution of insect endosymbionts. With the increasing availability of genome sequence data, large-scale computing, and ecological data, the questions discussed here may be approachable. A model describing how drift and natural selection shape the origins of primary obligate endosymbiosis is important to understanding the diversity and evolution of bacteria and insects alike.
LIST OF REFERENCES

Alikhan NF, Petty NK, Zakour NLB, Beatson SA. 2011. BLAST ring image generator (BRIG): simple prokaryote genome comparisons. BMC Genomics 12:402.
Allen JM, Burleigh G, Light JE, Reed DL. In rev. Extensive gammaproteobacterial sampling reveals multiple origins of endosymbiosis in blood-feeding lice.
Allen JM, Light JE, Perotti MA, Braig HR, Reed DL. 2009. Mutational meltdown in
Allen JM, Light JE, Reed DL. 2010. Coevolutionary history of sucking lice and their primary endosymbionts. Turkiye Parazitol Derg 34 (suppl 1):68.
Allen JM, Reed DL, Perotti MA, Braig HR. 2007. Evolutionary relationships of hematophagous primate lice. Appl Environ Microbiol 73:1659-1664.
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. Basic local alignment search tool. J Mol Biol 215:403-410.
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25:3389-3402.
Archetti M, Francisco U, Fundenberg D, Green J, Pierce NE, Yu DW. 2011. Let the right one in: a microeconomic approach to partner choice in mutualisms. Am Nat 177:75-85.
Aschner M. 1934. Studies on the symbiosis of the body louse: I. Elimination of the symbionts by centrifugalisation of the eggs. Parasitology XXVI:309-314, plate XII.
Aschner M, Ries E. 1933. Das verhalten der kleiderlaus bei ausschaltung ihrer symbioten. Zeitschrift fur Morphologie und Okologie der Tiere 26:529-590.
Aziz RK, Bartels D, Best AA, DeJongh M, Disz T, Edwards RA, Formsma K, Gerdes S, Glass EM, Kubal M, Meyer F, Olsen GJ, Olson R, Osterman AL, Overbeek RA, McNeil LK, Paarmann D, Paczian T, Parrello B, Pusch C, Reich C, Stevens R, Vassieva O, Vonstein V, Wilke A, Zagnitko O. 2008. The RAST server: rapid annotations using subsystems technology. BMC Genomics 9:75.
Babraham Bioinformatics. FastQC. http://www.bioinformatics.babraham.ac.uk/projects/fastqc/ (2012).
Barker SC, Whiting M, Johnson KP, Murrell A. 2003. Phylogeny of the lice (Insecta, Phthiraptera) inferred from small subunit rRNA. Zool Scr 32:407-414.
Boyd BM, Reed DL. 2012. Taxonomy of lice and their endosymbiotic bacteria in the post-genomic era. Clin Microbiol Infect 18:324-331.
Bozeman FM, Sonenshine DE, Williams MS, Chadwick DP, Lauer DM, Elisberg BL. 1981. Experimental infection of ectoparasitic arthropods with Rickettsia prowazekii (GvF-16 strain) and transmission to flying squirrels. Am J Trop Med Hyg 30:253-263.
Bright M, Bulgheresi S. 2010. A complex journey: transmission of microbial symbionts. Nat Rev Microbiol 8:218-230.
Buchner P. 1965. Endosymbiosis of Animals with Plant Microorganisms. Interscience: New York.
Burgdorfer W, Hayes SF, Maves AJ. 1981. Nonpathogenic Rickettsiae in Dermacentor andersoni: A limiting factor for the distribution of Rickettsia rickettsii. In Burgdorfer W, Anacker RL, eds. Rickettsiae and Rickettsial diseases. Academic Press, New York.
Burke GR, Moran NA. 2011. Massive genomic decay in Serratia symbiotica, a recently evolved symbiont of aphids. Genome Biol Evol 3:195-208.
Cameron SL, Yoshizawa K, Mizukoshi A, Whiting MF, Johnson KP. 2011. Mitochondrial genome deletions and minicircles are common in lice (Insecta: Phthiraptera). BMC Genomics 12:394-408.
Charles H, Condemine G, Nardon C, Nardon P. 1997. Genome size characterization of the principal endocellular symbiotic bacteria of the weevil Sitophilus oryzae, using pulsed field gel electrophoresis. Insect Biochem Mol Biol 27:345-350.
Clark MA, Moran NA, Baumann P, Wernegreen JJ. 2000. Cospeciation between bacterial endosymbionts (Buchnera) and a recent radiation of aphids (Uroleucon) and pitfalls of testing for phylogenetic congruence. Evolution 54:517-525.
Clayton AL, Oakeson KF, Gutin M, Pontes A, Dunn DM, Niederhausern ACV, Weiss RB, Fisher M, Dale C. 2012. A novel human-infection-derived bacterium provides insights into the evolutionary origins of mutualistic insect-bacterial symbiosis. PLoS Genet 8:e1002990.
Comas I, Moya A, Gonzales Candelas F. 2007.
From phylogenetics to phylogenomics: proteobacteria as a test case. Syst Biol 56:1-16.
de Crecy-Lagard V, Marck C, Grosjean H. 2012. Decoding in Candidatus Riesia pediculicola, close to a minimal tRNA modification set? Trends Cell Mol Biol 7:11-34.
Dale C, Maudlin I. 1999. Sodalis gen. nov. and Sodalis glossinidius sp. nov., a microaerophilic secondary endosymbiont of the tsetse fly Glossina morsitans morsitans. Int J Syst Bacteriol 49:267-275.
Dale C, Moran NA. 2006. Molecular interactions between bacterial symbionts and their hosts. Cell 126:453-465.
Dale C, Wang B, Moran N, Ochman H. 2003. Loss of DNA recombination and repair enzymes in the initial stages of genome degeneration. Mol Biol Evol 20:1188-1194.
Degnan PH, Lazarus AB, Wernegreen JJ. 2005. Genome sequence of Blochmannia pennsylvanicus indicates parallel evolutionary trends among bacterial mutualists of insects. Genome Res 15:1023-1033.
Delcher AL, Harmon D, Kasif S, White O, Salzberg SL. 1999. Improved microbial gene identification with GLIMMER. Nucleic Acids Res 27:4636-4641.
Delmotte F, Rispe C, Schaber J, Silva FJ, Moya A. 2006. Tempo and mode of early gene loss in endosymbiotic bacteria from insects. BMC Evol Biol 6:56.
Delsuc F, Brinkmann H, Philippe H. 2005. Phylogenomics and the reconstruction of the tree of life. Nat Rev Genet 6:361-373.
Diekmann Y, Pereira-Leal JB. 2013. Evolution of intracellular compartmentalization. Biochem J 449:319-331.
Douglas AE. 1989. Mycetome symbiosis in insects. Biol Rev 64:409-434.
Durden LA, Musser GG. 1994. The mammalian hosts of the sucking lice (Anoplura) of the world: a host-parasite list. J Vector Ecol 19:130-168.
Duron O, Hurst GDD. 2013. Arthropods and inherited bacteria: from counting the symbionts to understanding how symbionts count. BMC Biol 11:45.
Eberle MW, McLean DL. 1982. Initiation and orientation of the symbiote migration in the human body louse Pediculus humanus L. J Insect Physiol 28:417-422.
Eberle MW, McLean DL. 1983. Observation of symbiote migration in human body lice with scanning and transmission electron microscopy. Can J Microbiol 29:755-762.
Eddy SR. 2011. Accelerated profile HMM searches. PLoS Comput Biol 7:e1002195.
Edgar RC. 2004.
MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32:1792-1797.
Ewald PW. 1987. Transmission modes and evolution of the parasitism-mutualism continuum. Ann NY Acad Sci 503:295-306.
FAS Center for Systems Biology, Harvard. Sequence Analysis Protocols. http://sysbio.harvard.edu/csb/resources/computational/scriptome/unix/Protocols/Sequences.html (2012).
Felsenstein J. 1978. Cases in which parsimony or compatibility methods will be positively misleading. Syst Zool 27:401-410.
Feldhaar H. 2011. Bacterial symbionts as mediators of ecologically important traits of insect hosts. Ecol Entomol 36:533-543.
Felsheim RF, Kurtti TJ, Munderloh UG. 2009. Genome sequence of the endosymbiont Rickettsia peacockii and comparison with virulent Rickettsia rickettsii: identification of virulence factors. PLoS ONE 4:e8361.
Fournier PE, Karkouri KE, Leroy Q, Robert C, Giumelli B, Renesto P, Socolovschi C, Parola P, Audic S, Raoult D. 2009. Analysis of the Rickettsia africae genomes reveals that virulence acquisition in Rickettsia species may be explained by genome reduction. BMC Genomics 10:166.
Fournier P, Ndihokubwayo J, Guidran J, Kelly PJ, Raoult D. 2002. Human pathogens in body and head lice. Emerg Infect Dis 8:1515-1518.
Frith MC, Hamada M, Horton P. 2010. Parameters for accurate genome alignment. BMC Bioinf 11:80.
Fu L, Niu B, Zhu Z, Wu S, Li W. 2012. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinf 28:3150-3152.
Fukatsu T, Hosokawa T, Koga R, Nikoh N, Kato T, Haymam S, Takefushi H, Tanak I. 2009. Intestinal endocellular symbiotic bacterium of the macaque louse Pedicinus obtusus: distinct endosymbiont origins in anthropoid primate lice and the old world monkey louse. Appl Environ Microbiol 75:3796-3799.
Fukatsu KS, Koga R, Nikoh N et al. 2006. Symbiotic bacteria associated with stomach discs of human lice. Appl Environ Microbiol 72:7349-7352.
Fukatsu T, Koga R, Smith WA et al. 2007.
Bacterial endosymbionts of the slender pigeon louse, Columbicola columbae, allied to endosymbionts of the grain weevils and tsetse flies. Appl Environ Microbiol 73:6660-6668.
Funk DJ, Wernegreen JJ, Moran NA. 2001. Intraspecific variation in symbiont genomes: bottlenecks and the aphid-Buchnera association. Genetics 157:477-489.
Furman DP, Loomis EC. 1984. The ticks of California (Acari: Ixodida). Bull California Insect Survey 25:1-89.
Galtier N, Gouy M. 1998. Inferring patterns and process: maximum likelihood implementation of a nonhomogenous model of DNA sequence evolution for phylogenetic analysis. Mol Biol Evol 15:871-879.
Gil R, Silva FJ, Pereto J, Moya A. 2004. Determination of the core of a minimal bacterial gene set. Microbiol Mol Biol 68:518-537.
Gillespie JJ, Ammerman NC, Beier-Sexton M, Sobral BS, Azad AF. 2009. Louse- and flea-borne rickettsioses: biological and genomic analyses. Vet Res 40:12.
Gosalbes MJ, Lamelas A, Moya A, Latorre A. 2008. The striking case of tryptophan provision in the cedar aphid Cinara cedri. J Bacteriol 190:6026-6029.
Gosalbes MJ, Latorre A, Lamelas A, Moya A. 2010. Genomics of intracellular symbionts in insects. International J Med Microbiol 300:271-278.
Grimaldi D, Engel MS. 2005. Evolution of the insects. Cambridge: Cambridge University Press.
Gross L. 1996. How Charles Nicolle of the Pasteur Institute discovered that epidemic typhus is transmitted by lice: Reminiscences from my years at the Pasteur Institute in Paris. Proc Natl Acad Sci 93:10539-10540.
Hamilton WD. 1964. The genetical evolution of social behavior. J Theor Biol 7:1-16.
Hershberg R, Petrov DA. 2010. Evidence that mutation is universally biased towards AT in bacteria. PLoS Genet 6:e1001115.
Hinde R. 1971. The control of the mycetome symbiotes of the aphids Brevicoryne brassicae, Myzus persicae, and Macrosiphum rosae. J Insect Physiol 17:1791-1800.
Hosokawa T, Koga R, Kikuchi Y, Meng X, Fukatsu T. 2010. Wolbachia as a bacteriocyte-associated nutritional mutualist. PNAS 107:769-774.
Houhamdi L, Raoult D. 2006. Experimentally infected human body lice (Pediculus humanus humanus) as vectors of Rickettsia rickettsii and Rickettsia conorii in a rabbit model. Am J Trop Med Hyg 74:521-525.
Houhamdi L, Fournier PE, Fang R, Raoult D. 2003. An experimental model of human body louse infection with Rickettsia typhi. Ann NY Acad Sci 990:617-627.
Hypsa V, Krizek J. 2007.
Molecular evidence for polyphyletic origin of the primary symbionts of sucking lice (Phthiraptera, Anoplura). Microb Ecol 54:242-251.
Jackson JH, Harrison SH, Herring PA. 2002. A theoretical limit to coding space in chromosomes of bacteria. Omics 6:115-121.
Johnson KP, Walden KKO, Robertson HM. 2011. A targeted restricted assembly method (TRAM) for phylogenetics. Nat Proc npre20104612 1.
Johnson KP, Whiting MF. 2002. Multiple genes and the monophyly of Ischnocera (Insecta: Phthiraptera). Mol Phylogenet Evol 22:101-110.
Johnson KP, Yoshizawa K, Smith VS. 2004. Multiple origins of parasitism in lice. Proc R Soc B 271:1771-1776.
Kaiwa N, Hosokawa T, Kikuchi Y et al. 2007. Primary gut symbiont and secondary, Sodalis-allied symbiont of the scutellerid stinkbug Cantao ocellatus. Appl Environ Microbiol 76:3486-3494.
Kim KC. 1971. The sucking lice (Anoplura: Echinophthiriidae) of the northern fur seal; descriptions and morphological adaptation. Ann Entomol Soc Am 64:280-292.
Kim KC, Ludwig HW. 1978. The family classification of Anoplura. Syst Entomol 3:249-284.
Kim KC, Ludwig HW. 1982. Parallel evolution, cladistics, and classification of parasitic Psocodea. Ann Entomol Soc Am 75:537-548.
Kirkness EF, Haas BJ, Sun W, Braig HR, Perotti MA, et al. 2010. Genome sequence of the human body louse and its primary endosymbiont provide insights into the permanent parasitic lifestyle. PNAS 1003379107.
Konstantinidis KT, Ramette A, Tiedje JM. 2006. The bacterial species definition in the genomic era. Phil Trans R Soc B 361:1929-1940.
Koga R, Bennett GM, Cryan JR, Moran NA. 2013. Evolutionary replacement of obligate symbionts in an ancient and diverse insect lineage. Environ Microbiol 15:2073-2081.
Kikuchi Y. 2009. Endosymbiotic bacteria in insects: their diversity and culturability. Microbes Environ 24:195-204.
Kuck P, Meusemann K. 2010. FASconCAT, version 1.0. Zool Forschungsmuseum A. Koenig, Germany.
Kupper M, Gupta SK, Feldhaar H, Gross R. 2014. Versatile roles for the chaperonin GroEL in microorganism-insect interactions. FEMS Microbiol Lett DOI 10.1111/1574-6968.12390.
Kyei-Poku GK, Colwell DD, Coghlin P, Benkel B, Floate KD. 2005. On the ubiquity and phylogeny of Wolbachia in lice. Mol Ecol 14:285-294.
Langmead B, Salzberg S. 2012. Fast gapped-read alignment with Bowtie 2. Nat Methods 9:357-359.
Lefevre C, Charles H, Vallier A, Delobel B, Farrel B, Heddi A. 2004. Endosymbiont phylogenesis in the Dryophthoridae weevils: Evidence for bacterial replacement. Mol Biol Evol 21:965-973.
Leigh EG Jr. 2010. The group selection controversy. J Evol Biol 23:6-19.
Lerat E, Ochman H. 2005. Recognizing the pseudogenes in bacterial genomes. Nucleic Acids Res 33:3125-3132.
Li W, Godzik A. 2006. Cd-hit: a fast program for clustering and comparing large data sets of protein or nucleotide sequences. Bioinf 22:1658-1659.
Light JE, Smith VS, Allen JM, Durden LA, Reed DL. 2010. Evolutionary history of mammalian sucking lice (Phthiraptera: Anoplura). BMC Evol Biol 10:292-306.
Light JE, Reed DL. 2009. Multigene analysis of phylogenetic relationships and divergence times of primate sucking lice (Phthiraptera: Anoplura). Mol Phylogenet Evol 50:376-390.
Light JE, Toups MA, Reed DL. 2008. human head louse and body louse. Mol Phylogenet Evol 47:1203-1216.
Login FH, Balmand S, Vallier A, Vincent-Monegat C, Vigneron A, Weiss-Gayet M, et al. 2011. Antimicrobial peptides keep insect endosymbionts under control. Science 334:362.
Lyal CHC. 1985. Phylogeny and classification of the Psocodea, with particular reference to the lice (Psocodea, Phthiraptera). Syst Entomol 10:145-165.
Lyons E, Freeling M. 2008. How to usefully compare homologous plant genes and chromosomes as DNA sequences. The Plant Journal 53:661-673.
Lyons E, Penderson B, Kane J, Freeling M. 2008. The value of nonmodel genomes and an example using SynMap with CoGe to dissect the hexaploidy that predates rosids. Tropical Plant Biol 1:181-190.
Merhej V, Royer-Carenzi M, Pontarotti P, Raoult D. 2009. Massive comparative genomic analysis reveals convergent evolution of specialized bacteria. Biol Direct 4:13.
Mira A, Moran NA. 2002. Estimating population size and transmission bottlenecks in maternally transmitted endosymbiotic bacteria. Microb Ecol 44:137-143.
het in endosymbiotic bacteria.
Proc Natl Acad Sci 93:2873-2878.
Moran NA, Mira A. 2001. The process of genome shrinkage in the obligate symbiont Buchnera aphidicola. Genome Biol 2:research0054 research0054.12.
Moran NA, McCutcheon JP, Nakabachi A. 2008. Genomics and evolution of heritable bacterial symbionts. Annu Rev Genet 42:165-90.
Naum M, Brown EW, Mason-Gamer RJ. 2008. Is 16S rDNA a reliable phylogenetic marker to characterize relationships below the family level in the enterobacteria. J Mol Evol 66:630-642.
Niebylski ML, Schrumpt ME, Burgdorfer W, Fisher ER, Gage KL, Schwan TG. 1997. Rickettsia peacockii sp. nov., a new species infecting wood ticks, Dermacentor andersoni, in western Montana. Int J Syst Bacteriol 47:446-52.
Niebylski ML, Peacock MG, Schwan TG. 1999. Lethal effect of Rickettsia rickettsii on its tick vector (Dermacentor andersoni). Appl Environ Microbiol 65:773-778.
Nilsson AJ, Koskiniemi S, Eriksson S, Kugelberg E, Hinton JCD, Andersson DI. 2005. Bacterial genome size reduction by experimental evolution. PNAS 102:12112-12116.
Novakova E, Hypsa V. 2007. A new Sodalis lineage from bloodsucking fly Craterina melbae (Diptera, Hippoboscoidea) originated independently of the tsetse fly symbiont Sodalis glossinidius. FEMS Microbiol Lett 269:131-135.
Novakova E, Hypsa V, Moran NA. 2009. Arsenophonus, an emerging clade of intracellular symbionts with a broad host distribution. BMC Microbiol 9:143.
University Press, Baltimore.
intracellular symbionts. Evolution 62:361-373.
Oliver KM, Degnan PH, Burke GR, Moran NA. 2010. Facultative symbionts in aphids and horizontal transfer of ecologically important traits. Annu Rev Entomol 55:247-266.
Osborne H. 1899. Insects of the Pribilof Islands IX Acarina. In Jordan et al. Fur seals and fur seal islands, North Pacific Ocean. Pt3 9:553-554 & figure 2.
Overbeek R, Begley T, Butler RM, Choudhuri JV, Chuang HY, Cohoon M, de Crecy-Lagard V, Diaz N, Disz T, Edwards R, Fonstein M, Frank ED, Gerdes S, Glass EM, Goesmann A, Hanson A, Iwata-Reuyl D, Jensen R, Jamshidi N, Krause L, Kubal M, Larsen N, Linke B, McHardy AC, Meyer F, Neuweger H, Olsen G, Olsen R, Osterman A, Portnoy V, Pusch GD, Rodionov DA, Ruckert C, Steiner J, Stevens R, Thiele I, Vassieva O, Ye Y, Zagnitko O, Vonstein V. 2005. The subsystems approach to genome annotation and its use in the project to annotate 1000 genomes. Nucleic Acids Res 33:5691-5702.
Paradis E, Claude J, Strimmer K. 2004. APE: analyses of phylogenetics and evolution in R language. Bioinf 20:289-290.
Perotti MA, Clarke HK, Turner BD, Braig HR. 2006. Rickettsia as obligate and mycetomic bacteria. FASEB J 20:2372-2374 & E1646-E1656.
Perotti MA, Allen JM, Reed DL, Braig HR. 2007. Host-symbiont interaction of the primary endosymbiont of human head and body lice. FASEB J 21:1058-1066.
Perotti MA, Kirkness EF, Reed DL, Braig HR. 2009. Endosymbionts of lice. In Bourtzis K, Miller TA, eds. Insect Symbiosis, Vol. 3. Boca Raton, FL: CRC Press, 205-219.
Price RD, Hellenthal RA, Palma RL, Johnson KP, Clayton DH. 2003. The chewing lice: world checklist and biological overview. Illinois Nat Hist Special Publication 24:i-501.
Puchta O. 1955. Experimentelle untersuchungen uber die bedeutung der symbiose der kleiderlaus Pediculus vestimenti Burm. Z Parasitenkd 17:1.
Raoult D, Roux V. 1999. The body louse as a vector of reemerging human diseases. Clin Infect Dis 29:888-911.
Reed DL, Light JE, Allen JM, Kirchman JJ. 2007. Pair of lice lost or parasites regained: the evolutionary history of anthropoid primate lice. BMC Biol 5:7-17.
Ries E. 1930. Die symbiose der lause und federlinge. Aus dem Zoologischen der Universitat Breslau mit74.
Ries E. 1931. Die symbiose der lause und federlinge. Zeitschrift fur Morphologie und Okologie der Tiere 20:233-367.
Ries E. 1932. Die prozesse der eibildung und des eiwachstums bei Pediculiden und Mallophagen. Zeitschrift fur Zellforschung und Mikroskopische Anatomie 16:314-388.
Ries E. 1933. Endosymbiose und parasitismus. Zeitschrift fur Parasitenkunde 6:339-349.
Ries E. 1935. Uber den sinn der erblichen insektensymbiose. Naturwissenschaften 23:744-749.
Ries E, van Weel PB. 1934. Die eibildung der kleiderlaus, untersucht an lebenden, vital gefarbten und fixierten praparaten. Zeitschrift fur Zellforschung und Mikroskopische Anatomie 20:565-618.
Rispe C, Moran NA. 2000.
Accumulation of deleterious mutations in endosymbionts: on. Am Nat 156:425-441.
Robinson D, Leo N, Prociv P, Barker SC. 2003. Potential role of head lice, Pediculus humanus capitis, as vectors of Rickettsia prowazekii. Parasitol Res 90:209-211.
Sachs JL, Russell JE, Lii YE, Black KC, Lopez G, Patil AS. 2010. Host control over infection and proliferation of a cheater symbiont. J Evol Biol 23:1919-1927.
Sasaki-Fukatsu K, Koga R, Nikoh N, Yoshizawa K, Kasai S, Mihara M, Kobayashi M, Tomita T, Fukatsu T. 2006. Symbiotic bacteria associated with stomach disc of human lice. Appl Environ Microbiol 72:7349-7352.
Shao R, Barker SC. 2011. Chimeric mitochondrial minichromosomes of the human body louse, Pediculus humanus: evidence for homologous and non-homologous recombination. Gene 473:36-43.
Shao R, Kirkness EF, Barker SC. 2009. The single mitochondrial chromosome typical of animals has evolved into 18 minichromosomes in the human body louse, Pediculus humanus. Genome Res 19:904-912.
Shigenobu S, Stern DL. 2012. Aphids evolved novel secreted proteins for symbiosis with bacterial endosymbiont. Proc R Soc B 280:1750.
Silva FJ, Latorre A, Moya A. 2003. Why are the genomes of endosymbiotic bacteria so stable? Trends Genet 19:176-180.
Simpson JT, Wong K, Jackman SD, Schein JE, Jones SJ, Birol I. 2009. ABySS: a parallel assembler for short read sequence data. Genome Res 19:1117-1123.
eye view of symbiont transmission. Am Nat 170:542-550.
Smith VS. 2004. Lousy phylogenies: Phthiraptera systematics and the antiquity of lice. Entomol Abh 61:150-151.
Smith VS, Ford T, Johnson KP, Johnson PCD, Yoshizawa K, Light JE. 2011. Multiple lineages of lice pass through the K-Pg boundary. Biol Lett 7:782-785.
Smith DC, Douglas AE. 1987. The Biology of Symbiosis. Edward Arnold, London. 302pp.
Smith WA, Dale C, Clayton DH. 2010. Determining the role of bacterial symbionts with the genus Columbicola. Turkiye Parazitol Derg 34 (suppl 1):67.
Smith WA, Oakeson KF, Johnson KP, Reed DL, Carter T, Smith KL, Koga R, Fukatsu T, Clayton DH, Dale C. 2013.
Phylogenetic analysis of symbionts in feather-feeding lice of the genus Columbicola: evidence for repeated symbiont replacements. BMC Evol Biol 13:109.
Snyder AK, Deberry JW, Runyen-Janecky L, Rio RVM. 2010. Nutrient provisioning facilitates homeostasis between tsetse fly (Diptera: Glossinidae) symbionts. Proc R Soc B 277:2389-2397.
Stamatakis A. 2006. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinf 22:2688-2690.
Swofford DL. 2003. PAUP. Phylogenetic analysis using parsimony. V4. Sinauer Associates, Sunderland, Massachusetts.
Tamas J, Gil R, Latorre A, Pereto J, Silva FJ, Moya A. 2007. The frontier between cell and organelle: genome analysis of Candidatus Carsonella ruddii. BMC Evol Biol 7:181-187.
Toft C, Anderson SGE. 2010. Evolutionary microbial genomics: insights into bacterial host adaptation. Nat Rev Genet 11:465-475.
Toh H, Weiss BL, Perkin SAH, Yamashita A, Oshima K, Hattori M, Aksoy S. 2006. Massive genome erosion and functional adaptations provide insights into the symbiotic lifestyle of Sodalis glossinidius in the tsetse host. Genome Res 16:149-156.
Van Leuven JT, McCutcheon JP. 2011. An AT mutation bias in the tiny GC-rich endosymbiont genome of Hodgkinia. Genome Biol Evol 4:24-27.
Ward BT. 1921. Alaska Fishery and Fur seal industries in 1920. In Appendix VI of Doc 909 of the Report to the US Commissioner of Fisheries for 1921.
Weinert LA, Werren JH, Aebi A, Stone GN, Jiggins FM. 2009. Evolution and diversity of Rickettsia bacteria. BMC Biol 7:6.
Weyl EG, Frederickson ME, Yu DW, Pierce NE. 2010. Economic contract theory tests models of mutualism. PNAS 107:15712-15716.
Wernegreen JJ. 2004. Endosymbiosis: lessons in conflict resolution. PLoS Biol 2:e68.
Werren JH, Baldo L, Clark ME. 2008. Wolbachia: master manipulators of invertebrate biology. Nat Rev 6:741-751.
West SA, Griffin AS, Gardner A. 2007. Evolutionary explanations for cooperation. Current Biology 17:R661-R672.
Williams LK, Wernegreen JJ. 2010. Unprecedented loss of ammonia assimilation capability in a urease-encoding bacterial mutualist. BMC Genomics 11:687.
Yoshizawa K, Johnson KP. 2010.
Yoshizawa K, Johnson KP. 2010. (Insects: Psocodea)?: a comparison of phylogenetic signal in multiple genes. Mol Phylogenet Evol 55:939-951.
Zerbino DR. 2010. Using the Velvet de novo assembler for short-read sequencing technologies. Curr Protoc Bioinformatics DOI 10.1002/0471250953.bi1105s31.
Zerbino DR, Birney E. 2008. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res 18:821-829.
BIOGRAPHICAL SKETCH

Bret M. Boyd began studying butterflies and moths (Lepidoptera, Insecta) during grade school in Henderson, Nevada. Under the guidance of George T. Austin and with the support of his parents, Bret continued his studies through the Nevada State Museum and Historical Society. In 1993 the Nevada State Legislature, in Senate Concurrent Resolution 31, recognized Bret on the Senate and Assembly floor for his contribution to the understanding of Lepidoptera. Bret continued his studies through high school, working with faculty from Stanford University and the University of Nevada, Reno to document Lepidoptera diversity in Nevada and California. This included conducting field and laboratory research relating to the Nevada Biodiversity Initiative, Spring Mountains Conservation Agreement, and Walker River Drainage Biodiversity Inventory. Bret obtained an Associate of Science from Richmond Community College in 2005 and received an Excellence in Science award at graduation. In 2005 he moved to Gainesville, Florida and began working under the direction of Dr. Jaret C. Daniels. Here Bret was part of an effort to reintroduce the endangered Miami Blue to the Florida Keys and to conduct research relating to other imperiled insect taxa. Following this, Bret enrolled as an undergraduate at the University of Florida, Department of Entomology and Nematology, while continuing his studies of Lepidoptera. He was awarded a Bachelor of Science in 2008 and began working in the lab of Dr. Marta L. Wayne, University of Florida, Department of Biology. In 2009, he enrolled in the Genetics and Genomics Graduate Program, University of Florida, Genetics Institute. Under the supervision of Dr. David L. Reed he received a Doctor of Philosophy in the spring of 2014. His research focused on understanding the evolution of bacterial symbiosis in lice using phylogenomic and comparative genomic tools. In 2013 he received a
National Science Foundation Doctoral Dissertation Improvement grant to continue his graduate work.