ANTHROPOLOGICAL GENETIC ANALYSIS OF HUMAN DISEASE FROM EVOLUTIONARY, POPULATION, AND CLINICAL PERSPECTIVES

By
REBECCA R. GRAY

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA
2008
© 2008 Rebecca R. Gray
ACKNOWLEDGMENTS

Foremost, I thank my Ph.D. advisor Dr. Connie Mulligan for her mentorship and for affording me the opportunity to conduct this research. I thank the members of my Ph.D. committee; specifically, I thank Dr. Maureen Goodenow for her training and expertise; I thank Dr. John Krigbaum and Dr. Marta Wayne for their early guidance; and I thank Dr. David Reed for his helpful analytical advice. I thank Dr. Marco Salemi for his instruction and for involving me in additional projects. I thank my collaborators on the Treponema project at the University of Washington, including Dr. Sheila Lukehart and Dr. Arturo Centurion. I thank my collaborators at the National Institutes of Health, including Dr. Jordi Clarimon, Dr. Andrew Singleton, Dr. David Goldstein, and Dr. Mary-Anne Enoch. I thank Dr. Grace Aldrovandi at the Childrens Hospital in Los Angeles, who collaborated on the breastmilk project. I thank my undergraduate advisor Dr. Carole Counihan for my initial and continued enthusiasm for anthropological research. I appreciate the suggestions and feedback on these projects from the members of Drs. Mulligan and Goodenow's lab. I acknowledge the undergraduate students who assisted me on these projects, including Danielle Muchnick and Lindsay Williams. I profess deep gratitude to the Native American individuals and the Zambian women who participated in the studies which are part of this dissertation. Finally, I thank my parents for the intellectual foundation they provided, and my husband for his encouragement and profound patience.
TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER

1 INTRODUCTION

2 MOLECULAR EVOLUTION OF THE TPRC, D, I, K, G AND J GENES IN THE PATHOGENIC GENUS Treponema
    Introduction
    Materials and Methods
        Treponemal Strains and tpr Sequencing
        Evolutionary Analysis of Sequences
        Phylogenetic Analyses
        Detection of Recombination
    Results
        Phylogenetic Analyses
            Phylogenetic analyses of subfamily I
            Phylogenetic analyses of subfamily II
            Phylogenetic analyses of subfamily III
        Statistical Tests for Recombination
        Analysis of Nucleotide Diversity and Composition
    Discussion

3 LINKAGE DISEQUILIBRIUM AND ASSOCIATION ANALYSIS OF ALPHA SYNUCLEIN (SNCA) AND ALCOHOL AND DRUG DEPENDENCE IN TWO AMERICAN INDIAN POPULATIONS
    Introduction
    Materials and Methods
        Sampling Strategy
        Testing Instruments, Interviews, and Psychiatric Diagnoses
        Genotyping
        Statistical Analysis
    Results
    Discussion

4 LACK OF ASSOCIATION BETWEEN ADH/ALDH MARKERS AND SUBSTANCE USE DISORDER IN NATIVE AMERICAN POPULATION
    Introduction
    Materials and Methods
        Samples
        Testing Instruments, Interviews, and Psychiatric Diagnoses
        Genotyping
        Statistical Analysis
    Results
    Discussion

5 DYNAMIC AND DISTINCT EVOLUTION OF HIV-1 IN BREASTMILK OVER TWO YEARS POST-PARTUM
    Introduction
    Background
        Human Immunodeficiency Virus Type 1 Infection
        Stages of Breastmilk Production
        Cellular Composition of Breastmilk
        Compartmentalization of Breastmilk Virus
        Risk of Transmission via Breast-feeding
        Our Study
    Materials and Methods
        Subject
        Viral Isolation, Amplification, and Sequencing
        Sequence Analysis and Recombination
        Phylogenetic Analyses
            Branch Selection Analysis
            Compartmentalization
    Results
        Subtype Analysis
        Sequence Analysis
            Variable regions 1 and 2 sequence analysis
            Variable region 3 loop analysis
        Recombination Analysis
        Phylogenetic Analyses
            Bayesian tip-date phylogeny
            Rooting the phylogeny
            Branch selection analysis
            Inclusion of breastmilk month 1 sequences
        Migration Analysis
    Discussion

6 CONCLUSION

APPENDIX

A LIST OF QUESTIONS FOR SUBSTANCE ABUSE CATEGORIZATION

LIST OF REFERENCES
BIOGRAPHICAL SKETCH
LIST OF TABLES

Table 2-1. Treponema isolates used in this study.
Table 2-2. T. pallidum primers used in this study.
Table 2-3. Polymorphism at the tprC and tprD loci among pathogenic treponemes.
Table 2-4. Recombinant regions identified by RDP2.
Table 2-6. Average GC content at combined 1st + 2nd (GC1+2) and 3rd codon (GC3) positions.
Table 3-1. Demographic and phenotypic characteristics of southwest (SW) and plains populations.
Table 4-1. Loci, primers, cycling conditions and restriction enzymes for 12 loci studied.
Table 4-2. Phenotypic characteristics of the dataset.
Table 4-3. Haplotype frequencies and p-value for comparisons of cases vs. controls.
Table 4-4. Chi-squared and regression p-values for genotype and allele associations for each marker.
Table 5-1. Number of sequences generated for each tissue.
Table 5-2. Sequence characteristics of V1 and V2.
Table 5-3. Combination of V1 and V2 haplotypes.
Table 5-4. Hudson test for population structure.
Table 5-5. Number of putative recombinant clones.
Table 5-6. Marginal likelihoods for models used in the Bayesian analysis.
LIST OF FIGURES

Figure 1-1. Model for studying human diseases.
Figure 2-1. Unrooted ML phylogenies of multiple tpr genes.
Figure 2-2. ML phylogenies for tpr D, C, and I.
Figure 2-3. ML phylogeny of tprG and J.
Figure 2-4. ML phylogeny of tprK.
Figure 3-1. Relative positions of single nucleotide polymorphisms assessed in α-synuclein (SNCA) gene.
Figure 3-2. Single-marker analyses representing p values for each marker on a logarithmic scale.
Figure 3-3. Allelic distribution of the NACP-REP1 microsatellite repeats.
Figure 4-1. Linkage disequilibrium of markers assessed in the ADH gene family.
Figure 5-1. Sampling times and tissues.
Figure 5-2. Neighbor-joining phylogeny of all subtypes in group M plus this patient.
Figure 5-3. Haplotype analysis of V1.
Figure 5-4. Haplotype analysis of V3.
Figure 5-5. Recombination alignments.
Figure 5-6. Network of breast milk sequences from week 1.
Figure 5-7. Bayesian consensus phylogeny for C2V5 (A).
Figure 5-8. Bayesian consensus phylogeny for C2V5 (B).
Figure 5-9. Best-rooted maximum likelihood phylogeny.
Figure 5-10. Bayesian consensus phylogeny for the C2V5 (A) dataset with branches under significant selection.
Figure 5-11. Bayesian consensus phylogeny with breast milk month 1 sequences.
Figure 5-12. Migration analysis for two tissues.
Figure 5-13. Migration analysis for three tissues.
Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

ANTHROPOLOGICAL GENETIC ANALYSIS OF HUMAN DISEASE FROM EVOLUTIONARY, POPULATION, AND CLINICAL PERSPECTIVES

By
Rebecca R. Gray
May 2008

Chair: Connie Mulligan
Major: Anthropology

In this dissertation, I used genetic data from both humans and pathogens to explore the evolution and etiology of three diseases from temporally distinct perspectives. I employed an evolutionary framework to address the origin of syphilis, a population perspective to determine genetic components contributing to alcoholism in Native Americans, and a clinical perspective to study factors relating to the transmission of HIV-1 via breastfeeding.

In the first study, I used sequence data from six genes from three subspecies of Treponema pallidum, the spirochetes that cause venereal syphilis, yaws, and endemic syphilis in humans, as well as two other Treponema species, to determine their evolutionary origin and relationships using phylogenetic and population genetic analyses. My data discriminate between key components of several of the leading theories of treponemal evolution, and provide new loci that are distinct among the treponemes and can be used for diagnosis.

Second, I genotyped ~1000 Native American individuals for markers in the alpha-synuclein gene (SNCA) and used sequence data from ~400 Native Americans from the alcohol dehydrogenase gene (ADH) and the aldehyde dehydrogenase gene (ALDH) to test for an association with substance abuse in these populations. I used both dichotomous and continuous measures of addiction and several statistical tests to determine association. Despite the high power of the study, no significant association was detected. This may be the result of the past evolutionary history of Native Americans, who experienced a severe genetic bottleneck during the migration from Asia and may have lost variants that have been previously associated with substance abuse. I concluded that a focus on environmental causes and solutions may be most appropriate in these populations.

Finally, I sequenced and analyzed the env gene of the human immunodeficiency virus type 1 (HIV-1) in breast milk and blood plasma from an HIV-1 positive woman who transmitted the virus to her infant via breastfeeding. This was the first longitudinal study of HIV-1 in breast milk, and major findings included the distinctiveness of the virus in milk during the first month post-partum, the compartmentalization of the virus over time, and the dynamic evolutionary pattern of the virus in the milk. These results provided information about the biological mechanism responsible for differential transmission risks associated with various modes of breastfeeding.

Genetic anthropologists are equipped with the analytical tools to study the biological mechanisms of diseases and to incorporate information about the relevant underlying population structure and evolutionary history. This unique perspective allows genetic anthropologists to provide comprehensive clinical and policy recommendations based on genetic data. Finally, the multi-disciplinary approach employed by anthropologists can be valuable in ensuring that resulting applications of the data are culturally appropriate and provide maximum health benefits to communities in need.
CHAPTER 1
INTRODUCTION

Disease has been a major component of the human experience for the past 10,000 years, and likely long before that time as well (Cockburn 1971; Omran 1971; Armelagos and Dewey 1975; Cohen and Armelagos 1984; Barrett et al. 1998). Human health and diseases are studied in a wide range of fields, including anthropology, biology, and medicine. Anthropologists in particular pride themselves on the holistic nature of their discipline, which not only incorporates incredibly diverse research, but also encourages interdisciplinary communication and interpretation. Dialogue between subfields within anthropology and across disciplines such as medicine and public health allows for a more comprehensive and cross-cultural perspective on the nature of disease.

Anthropological genetics is a subfield of biological anthropology which uses genetic data and evolutionary concepts to address anthropological questions, including the nature of human and non-human primate relationships (Krings et al. 1997; Krings et al. 1999; Krings et al. 2000; Relethford 2001; Lalueza-Fox et al. 2005; Caramelli et al. 2006; Plagnol and Wall 2006; Krause et al. 2007), the routes and timing of human migrations around the world (e.g. Cann, Stoneking, and Wilson 1987; Kolman, Sambuughin, and Bermingham 1996; Quintana-Murci et al. 1999; Macaulay et al. 2005; Ramachandran et al. 2005), and the demographic forces that have shaped human history (e.g. Harpending et al. 1993; Harpending 1994; Sherry et al. 1994; Harpending et al. 1998; Harpending and Rogers 2000). The impact of disease on human genetic diversity is often addressed as well, in part because alleles affecting and affected by diseases are often population specific and provide information about the questions posed above. In addition, many anthropologists are interested in determining the relative contribution of genetics to the etiology and severity of human disease.
In this dissertation, I use genetic data from both humans and pathogens to explore the evolution and etiology of one complex disease and two infectious diseases from temporally distinct perspectives. I have adapted a model that incorporates three major perspectives (evolutionary, population, and clinical) from which the anthropological study of the genetics of human disease can be approached (Figure 1-1). Although the original model was applied to infectious disease (Quintana-Murci et al. 2007), I have broadened the model to include complex disease as well. This modification provides a temporal framework for considering genetic diversity and disease, which enables a more holistic treatment of the evolutionary, demographic, and cultural forces that are operating on, and in concert with, genetic variability. I used an evolutionary perspective to address the origin of syphilis, a population perspective to determine genetic components contributing to alcoholism in Native Americans, and a clinical perspective to study factors relating to the transmission of HIV via breastfeeding.

The evolutionary perspective typically incorporates the greatest genetic variation, because the questions addressed in this framework are often rooted in the distant past. For example, constant exposure to pathogens has shaped the human genome (Nielsen et al. 2007), either through negative selection (Schwartz et al. 1995; Diaz et al. 2000; Hugot et al. 2001) or balancing selection (Schroeder, Gaughan, and Swift 1995; Allen et al. 1997; Verrelli et al. 2002). An intriguing example is the chemokine receptor CCR5 locus, at which homozygosity for the delta32 mutation confers resistance to human immunodeficiency virus type 1 (HIV-1) infection (Samson et al. 1996). Under the assumption of neutrality, the high frequency (up to 10%) of the variant in European populations would suggest that the allele arose over 100,000 years ago (Stephens et al. 1998b; Galvani and Novembre 2005). However, using linkage disequilibrium and geographic structure analyses, the origin of the allele has been estimated at ca. 1,000 years ago, which suggests that strong selection has driven the allele to its current high frequency (Libert et al. 1998; Stephens et al. 1998b; Lucotte 2001). Clearly HIV-1, which entered the human population less than one hundred years ago (Sharp et al. 2001), could not have been the selective force. However, smallpox, which is phenotypically similar to HIV-1 and has caused high rates of mortality episodically over the past millennium, may have been the selective force (Galvani and Slatkin 2003; Galvani and Novembre 2005). For comparison, the nucleotide diversity at the CCR5 gene was compared between humans and chimpanzees, which are subject to a simian immunodeficiency virus (SIVcpz) similar to HIV-1 but much less pathogenic. An excess of rare variants was found in the chimpanzee gene, suggesting that the locus was influenced by a selective sweep (Wooding et al. 2005). CCR5 was much more diverse in humans and characterized by an excess of common variants, suggesting balancing selection (Wooding et al. 2005). These studies of the CCR5 gene demonstrate the global nature of the evolutionary perspective, which considers comprehensive human genetic variation and the impact on human and nonhuman primate genomes from past pathogen experiences. In chapter two, I used genetic information from pathogens themselves to address evolutionary questions of anthropological interest, such as where and when in human history venereal syphilis evolved. This project elucidates an important evolutionary mechanism of emerging pathogens, gene conversion, which may have a significant impact on our approach to treatment and vaccination.

The second stage of the model is the population perspective, which considers the genetic, demographic, and cultural influences that shape the distribution of diseases between and within geographic and ethnic groups.
Disease may differentially affect populations either because particular disease-causing variants may exist at higher frequencies in populations due to demographic history, or because environmental forces within a population may exacerbate the effect of variants predisposing to a disease (Schork 1997). One well-studied example is the extreme difference in diabetes prevalence between different ethnic groups (Fujimoto 1996), which can range from over 50% in the Native American Pima (Knowler et al. 1990) and in Pacific Islanders (Amos, McCarty, and Zimmet 1997; McCarthy and Zimmet 2001) to a fraction of that prevalence in European populations with the same risk factors (West 1974; Young et al. 2000). The thrifty-gene hypothesis proposed that repeated exposure to famine in certain hunter-gatherers led to the selection of genes which promote storage of fat; however, in modern times, an over-abundance of food has led to the high prevalence of diabetes and metabolic disorders (Neel 1962; Neel 1999), which is especially exacerbated in non-Western populations that have possibly had less time to adapt to changing conditions (Neel 1982). Although this hypothesis helped to explain Native American rates of diabetes (Johnson and McNutt 1964; Doeblin, Evans, and Ingall 1969; Wise 1976) and continues to be discussed (Benyshek and Watson 2006; Paradies, Montoya, and Fullerton 2007), numerous objections have been raised, including the ethnographic validity of the premise that hunter-gatherers experience more famines than agriculturalists (Dirks 1993; Benyshek and Watson 2006), the importance of the fetal environmental component (Hales and Barker 1992; Barker et al. 1993; Hales and Barker 2001; Lindsay and Bennett 2001; Ordovas, Pittas, and Greenberg 2003), and the reductionist approach to human variability encompassed by a typological (race-based) approach (Fee 2006; Paradies, Montoya, and Fullerton 2007), among other points. However, the longevity of the hypothesis demonstrates both the attraction of an anthropological theory that incorporates evolution and culture, as well as the complications inherent in the etiology of complex diseases.
In chapters three and four, I investigated the potential association between genetic markers and substance abuse in Native American populations using the population perspective. The past genetic history of these populations may have contributed to the non-significance of the genetic data, and I ultimately concluded that a focus on environmental causes and solutions might be most appropriate in these populations.

Finally, the clinical perspective focuses on individuals involved in experimental and intervention studies usually run by medical practitioners. This perspective typically measures either the response to an intervention or the frequency with which healthy individuals succumb to a particular disease over time. Studies from the clinical perspective often do not explicitly account for genetic and cultural diversity, which risks misinterpretation of results due to the underlying population genetic stratification and/or cultural influences that may impact the outcome of such trials. For example, the majority of drug trial studies for HIV-1 have been conducted in the developed world (Perrin, Kaiser, and Yerly 2003). Host genetic factors, such as the human leukocyte antigen (HLA) genes, are certainly involved in HIV-1 infection (Moore et al. 2002; O'Brien and Nelson 2004; Fellay et al. 2007; Brass et al. 2008) and are differentially distributed among geographic groups (Cavalli-Sforza, Menozzi, and Piazza 1994; Monsalve, Helgason, and Devine 1999; Blanco-Gelaz et al. 2001; Cao et al. 2004; Prugnolle et al. 2005). If aspects of host genetics affect the efficacy of drug therapy or vaccines, then only incorporating a subset of the total human genetic variation in these trials can lead researchers to misinterpret the value of their discoveries, since treatments may not have the same efficacy in every human group. Furthermore, the need for such drugs is much greater in developing countries than in the West, and ignoring the particular genetic and cultural aspects of these populations hinders the development of effective treatments.
Another potential concern with clinical studies is the generalized use of race, which is often used as a quick proxy by the medical community to represent perceived differences in genetic ancestry and cultural lifestyle, when in fact the factors underlying a person's genetic ancestry and choices are much more complex (Duster 2007; Hoover 2007). For example, in a controversial decision by the FDA, approval was granted for a drug to be marketed towards African-Americans (Carmody and Anderson 2007; Yancy et al. 2007). Some have argued that the identification of the efficacy in African-Americans but not Caucasians was prospective and questionable (Bibbins-Domingo and Fernandez 2007; Duster 2007), although others suggest that acknowledging the interplay between human genomic variation and pharmacogenomics may improve drug development and global health care (Seguin et al. 2008). Lastly, the ethics of clinical studies can be questionable when indigenous populations are used as study subjects who may not have the expertise to fully give their voluntary informed consent, and who may receive no benefit from their participation. The expertise of anthropologists is sorely needed in the clinical realm to advise on, plan, and interpret studies and data that make use of clinical trials so that maximum benefit for the eventual recipients of the intervention can be achieved. In chapter five, I use a clinical perspective to investigate potential molecular mechanisms involved in transmission of HIV-1 via breastfeeding. I believe that current recommendations about breastfeeding by HIV-1 positive women in the developing world should both account for the difficulties inherent in the practice and its cessation for women and their infants, as well as ensure that all aspects of the guidelines are scientifically sound.

In this dissertation, I chose to study three diseases affecting humans corresponding to the three perspectives outlined above. I used genetic variation from both the pathogen itself and from humans to address anthropological, evolutionary, public health, and medical questions.
I used an evolutionary framework to address the origin of syphilis, a population perspective to determine genetic components contributing to alcoholism in Native Americans, and a clinical
perspective to study factors relating to the transmission of HIV-1 via breastfeeding. Thus, my results not only have broad relevance to a range of anthropological questions, such as the origin of venereal syphilis, but can also be translated into clinical significance and inform health policies.

In chapter two, I examined the evolution of three human treponemes: Treponema pallidum subsp. pallidum, which is the etiological agent of venereal syphilis; T. p. subsp. pertenue, which causes yaws; and T. p. subsp. endemicum, which causes endemic syphilis. Previous knowledge of these diseases has come primarily from archaeological and historical evidence; however, it is difficult to discern the three diseases in the archaeological record because the bone pathologies they cause are similar, and a major diagnostic criterion is therefore the frequency and distribution of treponemal pathology among skeletons at burial sites and the anatomical distribution of lesions (reviewed in Powell and Cook 2005). Even the diagnosis of contemporary samples is difficult because the clinical manifestations are similar and there is a dearth of distinct molecular markers defining the three diseases (Centurion-Lara et al. 2006). Several prominent hypotheses have been advanced describing the evolution of the treponemes (Baker and Armelagos 1988; Powell and Cook 2005). Rothschild (2003) proposed that yaws (T. p. subsp. pertenue) was the most ancestral of the three T. pallidum subspecies and was present at least as far back as the origin of modern humans in Africa, and that the other two subspecies each derived from yaws, with T. p. subsp. pallidum evolving in the New World no more than ~2000 years ago. A New World origin of T. p. subsp. pallidum is central to the original Columbian hypothesis, which suggested that venereal syphilis was brought to Europe by Columbus' crews returning from the New World (Crosby 1969).
An alternative Columbian hypothesis was advanced by Baker and Armelagos (1988) that suggested venereal syphilis
evolved very rapidly from a non-venereal treponeme during Columbus' return voyage from the New World and was subsequently introduced to Europe (molecular data supporting this view were published recently [Harper et al. 2008], as was a critical review [Mulligan, Norris, and Lukehart 2008], and both attracted astonishingly widespread interest among the general public). In contrast, the Pre-Columbian hypothesis suggests that treponemal diseases, including venereal syphilis, existed in the Old World prior to Columbus' voyages but were diagnosed incorrectly. For example, Hackett (1963) proposed that pinta was the original form, present throughout the world during the Pleistocene, followed by the evolution of yaws (12,000 years ago), then endemic syphilis (9,000 years ago) and, finally, venereal syphilis (5,000 years ago). Lastly, the Unitarian hypothesis suggests that venereal syphilis, endemic syphilis, yaws and pinta are not in fact distinct diseases, but rather are environmentally determined manifestations of the same disease (Hudson 1965).

My goal was to use molecular genetic data sampled from contemporary strains of each of the three main treponemes (no molecular data exist for T. carateum, which causes pinta), as well as two outgroup species, to determine the support for any of these hypotheses (Chapter 2; Gray et al. 2006). This was the first phylogenetic study of the treponemes, and it provided valuable information on the possible evolutionary scenarios of these pathogens. Furthermore, I was able to establish particular alleles that are specific to each of the three subspecies and that could be used in future clinical investigations to aid in diagnosis. Finally, I provided new data suggesting that treponemal genome evolution has been driven by recombination, specifically gene conversion, much more often than was previously known or predicted.

In the second study (chapters three and four), I used a population perspective to study alcoholism in Native Americans.
This group experiences alcohol-related deaths at more than five times the rate of the general United States population (IHS 2006) and is twice as likely to die of
chronic liver disease as Caucasians (CDC 2006a). The possible cultural reasons for this disparity include high rates of poverty and unemployment, lack of access to health care, and overall poor health (IHS 2006). However, high rates of alcoholism in Native Americans are surprising in light of research showing that ancestral Asian populations have a low level of alcoholism, most likely mediated by a very high frequency of two alleles at the two main genes involved in alcohol metabolism (alcohol dehydrogenase [ADH] and aldehyde dehydrogenase [ALDH]). These alleles slow the body's metabolism of alcohol, resulting in toxic accumulation of acetaldehyde that produces an intensely uncomfortable sensation, i.e. the flushing response, that ultimately protects against alcoholism through the behavioral response of consuming less alcohol (Chao et al. 1994; Thomasson et al. 1994; Chen et al. 1996; Nakamura et al. 1996; Tanaka et al. 1996; Shen et al. 1997; Osier et al. 1999). Because Native Americans are genetically descended from a north-central Asian source population within the last 20,000 years (Meltzer 1993; Merriwether, Rothhammer, and Ferrell 1995; Kolman, Sambuughin, and Bermingham 1996), it might be expected that they would have inherited these protective alleles. However, these protective alleles were found to be absent in a Southwest population, although a significant association was found between other alleles at the ADH locus and the behavior of binge drinking (Mulligan et al. 2003). In addition, a genomewide association study performed with the same Native American population found a strong association signal with alcoholism on chromosome four near the ADH gene (Long et al. 1998).
In order to further investigate the possible genetic basis of alcoholism in Native Americans, I examined 12 single nucleotide polymorphisms (SNPs) at both the ADH and ALDH genes in ~400 individuals from a Plains population for association with multiple dichotomous and continuous measures of alcohol and drug abuse (Chapter 4). Despite the numerous phenotypes and the extensive genetic dataset, no
significant associations were detected. I also analyzed genotype data from the alpha-synuclein (SNCA) gene, also located on chromosome four near ADH and therefore another attractive candidate gene for alcoholism (Chapter 3; Clarimon et al. 2007). α-Synuclein is involved in dopaminergic neurotransmission, and the overexpression of the protein has been implicated in the etiology of Parkinson's disease (Polymeropoulos et al. 1997; Kruger et al. 1998) and Alzheimer's disease (Ueda et al. 1993), possibly because of neurodegeneration of dopamine neurons due to toxic build-up of the protein (Mash et al. 2003). More recently, α-synuclein has also been associated with alcoholism (Liang et al. 2003; Bonsch et al. 2005a; Bonsch et al. 2005b; Bonsch et al. 2005c) and drug addiction (Mash et al. 2003; Kobayashi et al. 2004). Specifically, mRNA and protein levels are elevated in alcohol-preferring individuals among humans, rats, and macaque monkeys (Liang et al. 2003; Spence et al. 2005; Walker and Grant 2006) and are associated with alcohol craving in humans (Bonsch et al. 2005a; Bonsch et al. 2005c). I genotyped and analyzed 15 SNPs at the SNCA locus in ~1000 individuals from a Plains and a Southwest population and again found no significant association between any SNP and alcohol or drug abuse or dependence. Since genetic variability and promoter polymorphisms upstream of SNCA may mediate the increase in mRNA and protein expression (Bonsch et al. 2005b), my results suggest that study of upstream polymorphisms may represent a productive avenue for future research. However, the results of these two studies suggest that the environment may be a more influential component in substance abuse among Native Americans, and therefore further resources should be devoted to addressing the underlying economic and social problems in these populations.
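The single-SNP association tests summarized above can be illustrated with a short sketch. It assumes an allelic 2x2 chi-square test with a Bonferroni correction across SNPs, which is one common approach and not necessarily the procedure used in Chapters 3 and 4; the SNP names and allele counts are hypothetical and serve only to show the mechanics.

```python
import math


def allelic_chi_square(case_counts, control_counts):
    """Pearson chi-square test (1 df) on a 2x2 table of allele counts.

    Each argument is a (major_allele_count, minor_allele_count) tuple.
    Returns (chi2, p_value).
    """
    a, b = case_counts
    c, d = control_counts
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    if 0 in row_totals or 0 in col_totals:
        # Empty group or monomorphic SNP: the test is undefined, report no signal.
        return 0.0, 1.0
    chi2 = n * (a * d - b * c) ** 2 / (
        row_totals[0] * row_totals[1] * col_totals[0] * col_totals[1]
    )
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def bonferroni_significant(p_values, alpha=0.05):
    """Flag SNPs whose p-value passes a Bonferroni-corrected threshold."""
    threshold = alpha / len(p_values)
    return {snp: p < threshold for snp, p in p_values.items()}


# Hypothetical allele counts: (cases, controls), each (major, minor).
snp_counts = {
    "snp_1": ((120, 80), (115, 85)),
    "snp_2": ((60, 140), (70, 130)),
    "snp_3": ((150, 50), (145, 55)),
}

p_values = {
    snp: allelic_chi_square(cases, controls)[1]
    for snp, (cases, controls) in snp_counts.items()
}
significant = bonferroni_significant(p_values)
```

With many SNPs and several phenotypes, the corrected threshold becomes strict, which is one reason a modest true effect can go undetected in samples of this size.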
In the final study (chapter five), I used molecular data to investigate recent observations that, contrary to previous widely held opinion, breastfeeding by HIV-1 positive women in
resource-poor areas is more beneficial to the long-term health of their children than complete or partial replacement feeding (feeding of any substance other than breastmilk). This study used a clinical perspective, as I investigated the evolution of HIV-1 over time in the breastmilk and blood of a woman who participated in a clinical trial on breastfeeding-mediated transmission of HIV. This study addressed many anthropological issues. Worldwide, an estimated 420,000 children were infected with HIV-1 in 2007, the vast majority through mother-to-child transmission (MTCT) (WHO 2007). Breastfeeding accounts for one-third to one-half of all MTCT events during the first 24 months of life (Dabis et al. 1999; Iliff et al. 2005). In the US, women are counseled by the CDC to replace breastfeeding with formula if infected with HIV-1 (CDC 2007), and the World Health Organization (WHO) previously recommended that HIV-1 positive women in all countries avoid all breastfeeding (WHO 2003). However, formula-feeding is impractical for women in resource-poor regions of the world where they do not have consistent access to clean water, formula, and health care, and breastfeeding may be the only practical option. Cultural pressures also make women reluctant to eschew breastfeeding, as this can be seen as a tacit admission of HIV-1 status. However, recent observational studies have suggested that exclusive breastfeeding, as opposed to the simultaneous feeding of milk and other foods, may significantly reduce the risk of transmission of HIV-1 (Coutsoudis 2000; Coutsoudis et al. 2001; Coutsoudis et al. 2002; Iliff et al. 2005; Kuhn et al. 2007). The WHO subsequently changed its recommendations to women in developing countries to encourage exclusive breastfeeding up to six months followed by abrupt weaning (WHO 2006). However, the biological mechanisms underlying the reduction of risk through exclusive breastfeeding have not been clearly elucidated.
Also, the benefits of abruptly weaning at six months are not at all clear, while the practice is difficult and painful for the mother who would typically wean over a period
of months. I amplified and sequenced the env gene from viral populations present in the breastmilk and plasma over a two-year period from an HIV-1 positive woman participating in a clinical trial in Zambia. I used phylogenetic and sequence-based analyses to examine the evolution of the virus over time and within tissues. I concluded that the breastmilk virus was genotypically distinct from the plasma virus during the early stages of breastfeeding, and the virus in both tissues was subject to changing evolutionary dynamics and selective pressures over time. The benefit of an anthropological genetic perspective that I bring to this study is the ability to use evolutionary analyses to investigate the molecular basis of modulated risk of MTCT, with the goal of advocating a scientifically sound and culturally sensitive breastfeeding management plan to women while eliminating unnecessarily onerous measures.

In sum, this dissertation demonstrates how genetic anthropology can be used to address both anthropological and clinical concerns from three temporally and philosophically distinct perspectives. My studies incorporate pathogen genetics in addition to human genetics, which can broaden our evolutionary understanding of the interaction between humans and pathogens. My dissertation demonstrates the value of using an anthropological perspective in arenas often dominated by medical practitioners. My unique advantage as a genetic anthropologist is that I can apply analytical tools of evolutionary genetics to study the biological mechanisms of diseases, while maintaining a multidisciplinary approach that considers the cultural, historical, and demographic factors that influence etiology. In addition, my training as an anthropologist allows me to interpret the clinical results from these studies in a culturally appropriate context for the maximum benefit of the participants.
Figure 1-1. Model for studying human diseases. (Axis labels: Time, Diversity.)
CHAPTER 2
MOLECULAR EVOLUTION OF THE TPRC, D, I, K, G AND J GENES IN THE PATHOGENIC GENUS Treponema¹

¹ Gray, R. R., C. J. Mulligan, B. J. Molini, E. S. Sun, L. Giacani, C. Godornes, A. Kitchen, S. A. Lukehart, and A. Centurion-Lara. 2006. Molecular evolution of the tprC, D, I, K, G, and J genes in the pathogenic genus Treponema. Mol Biol Evol 23:2220-2233.

Introduction

The evolution of bacterial genomes has been heavily influenced by processes such as horizontal gene transfer and homologous recombination, both of which can accelerate adaptation through the generation of new alleles (Feavers et al. 1992; Baldo et al. 2006). Horizontal (or lateral) gene transfer occurs through the uptake of genetic material from another genome, i.e. an inter-genomic event, and includes transformation, conjugation, and transduction (Ochman, Lawrence, and Groisman 2000). Homologous recombination, which is typically an intra-genomic event, also occurs with high frequency in bacterial genomes (Smith, Dowson, and Spratt 1991; Feil et al. 2001; Feil and Spratt 2001). Several outcomes may arise from a recombination event, including translocations, deletions, duplications, inversions, and gene conversions (Hughes 2000). Gene conversions are intra-genomic events that are the result of a non-reciprocal transfer of genetic information from a donor locus to a recipient locus, either through the permanent transfer of genetic material to the recipient locus or through the temporary use of the donor sequence as a template for DNA synthesis on the recipient strand (Santoyo and Romero 2005). Gene conversion is especially important in the evolution of gene families (Slightom, Blechl, and Smithies 1980; Drouin et al. 1999; Lathe and Bork 2001; Noonan et al. 2004). Gene families are composed of paralogous genes, which are defined as two or more genes within the same genome that are so similar in DNA sequence they are assumed to have originated from one
ancestral gene (King and Stansfield 1997). The initial event creating the gene family was thus likely to be one or more duplication events. The high sequence homology between paralogous genes that signals a past duplication event also sets the stage for potential future homologous recombination events (Schimenti 1994; Posada, Crandall, and Holmes 2002). Orthologous genes, on the other hand, share sequence homology and are assumed to be descended from a common ancestral gene, but are present in different species (King and Stansfield 1997; Gogarten and Olendzenski 1999). In this case, the genes most likely evolved through speciation rather than duplication. Recombination can significantly impact inferred phylogenetic relationships (Feil et al. 1999; Holmes, Urwin, and Maiden 1999; Feil and Spratt 2001; Worobey 2001). In the case of gene families, gene conversion can cause paralogous genes to cluster more closely than orthologous genes, thus confusing the order of evolution of the organisms (Drouin et al. 1999).

There are two seemingly opposite outcomes of gene conversion, concerted evolution and increased sequence diversity, which may be distinctive of different stages of multi-gene evolution (Santoyo and Romero 2005). After a gene family has been generated by ancient duplication events, paralogous and orthologous comparisons should exhibit the same degree of divergence. If the paralogous comparisons are more similar, then the genes in a multi-gene family are evolving in a non-independent manner leading to homogenization of the genes, or concerted evolution (Ohta 1992; Howell-Adams and Seifert 2000; Liao 2000; Lathe and Bork 2001). This may be beneficial in the case where a weakly advantageous point mutation arises in one gene, and its effect is multiplied when the entire gene sequence is converted to other loci (Dover 2002).
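The contrast between paralogous and orthologous comparisons described above can be sketched in a few lines. This is a minimal illustration that assumes pre-aligned sequences of equal length and uses the uncorrected proportion of differing sites (p-distance) as the divergence measure; the strain and locus labels and the toy sequences are hypothetical and are not data from this study.

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Columns containing a gap ('-') in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for base_a, base_b in zip(seq_a, seq_b):
        if base_a == "-" or base_b == "-":
            continue
        compared += 1
        if base_a != base_b:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differences / compared


# Hypothetical toy alignment: two loci in each of two strains.
alignment = {
    ("strainA", "locus1"): "ATGCCGTTAGCA",
    ("strainA", "locus2"): "ATGCCGTTAGCA",  # paralog homogenized to match locus1
    ("strainB", "locus1"): "ATGTCGTTGGCA",
    ("strainB", "locus2"): "ATGTCGATGGCT",
}

# Paralogous comparison: different loci, same genome.
paralog_d = p_distance(
    alignment[("strainA", "locus1")], alignment[("strainA", "locus2")]
)
# Orthologous comparison: same locus, different genomes.
ortholog_d = p_distance(
    alignment[("strainA", "locus1")], alignment[("strainB", "locus1")]
)

# Paralogs more similar than orthologs is the pattern expected
# under concerted evolution.
concerted_signal = paralog_d < ortholog_d
```

In this toy case the paralogs in strainA are identical while the orthologs differ at two of twelve sites, so the comparison flags the homogenization pattern.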
This is consistent with the proposal that purifying selection may operate on genes that have undergone duplication on the assumption that a duplicated gene must have an initial benefit for the organism and, thus, its sequence must be conserved (Lynch and Conery 2000;
Kondrashov et al. 2002). As the sequences accumulate neutral diversity, though, the process of gene conversion becomes less efficient. After a time, only small islands of homology exist and a site-specific system of shorter regions of gene conversion may take over, the outcome of which is increased sequence variation (Zhang et al. 1992; Zhang et al. 1997; Zhang and Norris 1998; Santoyo and Romero 2005; Taguchi et al. 2005). This is consistent with Ohno (1970), who suggested that duplicated genes are under less selective pressure and may accumulate more mutations, leading to loss of the paralog or creation of a new function (Kimura and King 1979; Walsh 1995; Wagner 1998; Lynch and Force 2000). Thus, concerted evolution and increased sequence diversity may indicate earlier and later stages, respectively, in the evolution of gene families (Santoyo and Romero 2005).

In this study, we examine genes in the tpr (Treponema pallidum repeat) gene family in members of the genus Treponema (Spirochete family of bacteria) to investigate the evolution of the gene family and, possibly, evolution of the treponemes themselves. The tpr gene family consists of 12 paralogous genes that comprise 2% of the T. pallidum genome and have probably evolved through gene duplication and gene conversion. These genes are related to the major outer sheath protein (Msp) in Treponema denticola (TDE0405); however, it appears that T. denticola did not experience a history of gene duplication and gene conversion at this locus since T. denticola possesses only one tpr-like gene (Seshadri et al. 2004). The tpr gene family in T. pallidum is believed to encode potential virulence factors and is divided into three subfamilies: Subfamily I (tprC, D, I and F), Subfamily II (tprE, G, and J), and Subfamily III (tprA, B, H, K and L).
The gene products from Subfamilies I and II have conserved amino and carboxyl terminal sequences with unique central regions, while Subfamily III has scattered conserved and unique or variable regions (Centurion-Lara et al. 1999). Gene conversion has previously been
reported in tprK (Centurion-Lara et al. 2000a; Centurion-Lara et al. 2004). Seven variable regions within tprK were proposed to have been created by gene conversion using sequences from the flanking regions of tprD as donors (Centurion-Lara et al. 2004). The degree of diversity in these variable regions appears to increase in the presence of adaptive immune pressure, suggesting that a function of these gene conversions may be to create antigenic diversity (Centurion-Lara et al. 2004).

The pathogenic treponemes include three Treponema pallidum subspecies, T. carateum, T. paraluiscuniculi (rabbit syphilis) and the unclassified Fribourg-Blanc (simian) isolate. The three T. pallidum subspecies include pallidum, which is the causative agent of human venereal syphilis, and pertenue and endemicum, which cause yaws and bejel, respectively. T. carateum is the etiological agent of pinta, although no isolates of this organism are known to exist. None of the pathogenic treponemes mentioned above can be propagated in vitro. The complete T. p. subsp. pallidum genome (from the Nichols strain) was sequenced in 1998, and Nichols is considered the reference strain (Fraser et al. 1998). T. denticola, considered a non-pathogenic treponeme, probably diverged from T. pallidum in the distant past, based on the large difference in GC content between T. pallidum and T. denticola (52.8% and 37.9%, respectively) and in genome length (1.14 Mb and 2.84 Mb, respectively) (Seshadri et al. 2004), and thus the T. denticola sequence was not considered in this study. Although lateral gene transfer has been identified as a probable evolutionary force in the genome of T. denticola, no evidence exists for lateral gene transfer in T. pallidum (Seshadri et al. 2004). In this project, we examined eight strains of T. pallidum subsp. pallidum and two strains each of T. pallidum subsp. pertenue and T. pallidum subsp. endemicum, representing all known propagated human strains (two additional T. pallidum subsp.
pertenue strains have recently been
obtained and are under study), as well as two non-human strains, T. paraluiscuniculi and the simian isolate. Six tpr genes, representing all three subfamilies, were sequenced: tprC, D, G, J, I and K. In order to investigate the evolution of these tpr genes, we utilized phylogenetic methods, general measures of nucleotide diversity, and specific methods to detect recombination events.

Materials and Methods

Treponemal Strains and tpr Sequencing

All treponemal isolates used in this study were propagated in New Zealand White rabbits (Lukehart et al. 1980) with the approval of the University of Washington Institutional Animal Care and Use Committee. The Fribourg-Blanc strain was isolated from the popliteal lymph node of a baboon from a yaws-endemic area (Fribourg-Blanc, Mollaret, and Niel 1966); a single report describes an experimental infection of humans with this strain (Smith et al. 1971). Strain designations and origins of the isolates are indicated in Table 2-1. Organisms were extracted by mincing infected testicular tissue in 0.9% saline and were quantitated by darkfield microscopy. Treponemal suspensions were mixed with an equal volume of 2x DNA lysis buffer (20 mM Tris, pH 8; 0.2 M EDTA, pH 8; 1.0% sodium dodecyl sulfate). DNA from treponemes was extracted as previously described (LaFond et al. 2003).

Full-length open reading frames (ORFs) of 1791-2268 bp (Table 2-2) from each strain were amplified, cloned, and sequenced as previously described (Giacani et al. 2004; Sun et al. 2004). The ORFs were amplified from T. pallidum strains by PCR using primers (Table 2-2) located in the flanking regions of the genes, cloned into the TOPO II vector (Invitrogen, Carlsbad, CA) and sequenced in both directions by the primer walking approach as previously described (Centurion-Lara et al. 2000b); the amplicons at tprG and J from MexicoA were obtained using primers internal to the start and stop codons and contained no flanking sequence.
A minimum of two clones were sequenced for each amplicon and ambiguities were resolved by
sequencing a third clone from an independent PCR, except for the Gauthier tprG, I, and J ORFs, for which a single clone for each ORF was sequenced in both directions. For most sequences, five clones were analyzed. The T. paraluiscuniculi sequences were described previously (Giacani et al. 2004). GenBank accession numbers for the sequences are: tprC: NC_000919, AY536645-6, AY550204, AY542157, AY590560, AY550206, AY542153-5, AY685236, DQ886671-73; tprD: AF217537-41, AF187952, AY685237, AE000520, AY533515, AY542156; tprI: AY533508-14, NC_000919, DQ886678-82; tprG/tprJ: NC_000919, AF073527, AY685239-40, DQ886674-77; tprK: NC_000919, AY685248-50, DQ886683-700.

Evolutionary Analysis of Sequences

Six loci were considered in this analysis: tprC, D, G, I, J and K. Sequences were aligned using ClustalX (Thompson et al. 1997) as well as manually using BioEdit to ensure proper amino acid alignment (http://www.mbio.ncsu.edu/BioEdit/bioedit.html). Frameshift mutations in T. paraluiscuniculi (bp 439 in tprC and tprD; bp 653 in tprG1 and tprG2) and T. p. subsp. pallidum Sea81-4 (bp 1860 at tprG) were removed from the alignment, as these would have created a misalignment of the amino acids for the rest of the sequence. Levels of nucleotide diversity within and between human treponemal subspecies (π and Dxy, respectively) were calculated using DnaSP v. 4.10.4 (Rozas et al. 2003). GC content, using all available tpr sequences from human treponemes (see Table 2-1), was calculated using PAML (Yang 1997). An AMOVA (Analysis of Molecular Variance) was performed for tprC, I and K using Arlequin version 3.0 (Excoffier, Laval, and Schneider 2005).

Phylogenetic Analyses

Maximum likelihood (ML) methods were used to infer the phylogenetic relationships among the tested loci. First, the most appropriate substitution model for each locus was determined using MODELTEST 3.06 (Posada and Crandall 1998). The following models were
selected for each locus: tprC (without T. paraluiscuniculi), HKY+I+Γ, which allows for different base frequencies and separate transition and transversion rates (HKY model; Hasegawa, Kishino, and Yano 1985) as well as a proportion of invariant sites (I) and a gamma distribution of mutation rates (Γ); tprD, HKY+Γ; tprG/J, GTR+Γ, which is a general time reversible model that allows six different mutation rate categories (GTR) as well as a gamma distribution of mutation rates (Γ) (with Nichols J), and HKY+Γ (without Nichols J); tprI, HKY; tprK, HKY+Γ; tprC, D and I, GTR+Γ. The HKY+Γ model was used for the phylogeny including all twelve Nichols tpr genes to reduce computational time due to the complexity of the dataset.

A maximum likelihood phylogeny was inferred using PAUP* 4.0b10 (Swofford 2002) and the indicated substitution model. Full heuristic searching with the simple addition of sequences and tree-bisection-reconnection (TBR) branch-swapping algorithms were used to traverse the tree-space. Bootstrap analysis (1000 maximum likelihood replicates) was performed using PAUP* 4.0b10 to determine the relative support for internal nodes. Third positions were excluded in a separate analysis in order to determine if these positions had been subject to mutational saturation.

Detection of Recombination

The RDP2 package (Martin, Williamson, and Posada 2005) was used to detect recombination. This program implements several non-parametric methods to identify recombinant and parental sequences and to estimate breakpoint positions that identify the limits of the recombinant DNA in the sequences (Martin, Williamson, and Posada 2005). We used four methods implemented in the RDP2 program: the RDP method, which is a phylogenetic method that uses discordant branching patterns to infer recombination; the Maximum Chi-squared (MaxChi) method (Smith 1992; Posada and Crandall 2001), which uses a sliding-window
approach along pairwise comparisons to identify discrepancies; the Chimera method (Smith 1992; Posada and Crandall 2001), which is similar to MaxChi but uses triplets of sequences instead of pairs; and GENECONV, which compares fragments of sequence pairs (Padidam, Sawyer, and Fauquet 1999). Non-default settings that were used consisted of a window size of 100, linear sequences, maximum p-value of 0.01 or 0.001 and a Bonferroni correction. All events were listed. For the RDP method, internal and external reference sequences were used, the window size was set to 10, and 0-100 sequence identity was used. For both the MaxChi and the Chimera methods, the number of variable sites was set to 30 with 1000 permutations and a max p-value of 0.05. For the GENECONV method, the program was set to scan sequence triplets. In all cases, the same alignment files from the phylogenetic analyses were used for the recombination analyses.

Results

We examined six genes of the tpr gene family (tprC, D, G, J, I and K) in three human treponemal subspecies (T. pallidum subsp. pallidum, endemicum and pertenue) and in two non-human treponemes (T. paraluiscuniculi and the simian isolate) (Table 2-1). We were interested in the relationship of the genes and alleles to one another as well as evidence for recombination. Because of the well documented evidence for gene conversion in gene families and because no evidence exists for lateral gene transfer in T. pallidum, we were specifically interested in identifying intra-genomic recombination events, i.e. gene conversion. In order to investigate the evolution of these tpr genes, we utilized 1) phylogenetic methods, 2) specific methods to detect recombination events, and 3) general measures of nucleotide diversity and composition.

Phylogenetic Analyses

In order to obtain an overall view of the genetic diversity at all of the studied loci, a maximum likelihood (ML) tree was created using an alignment of 2708 nucleotides from all
twelve available tpr gene sequences for T. p. subsp. pallidum Nichols strain (obtained from GenBank) (fig. 1a). Sequences from Subfamily I (tprC, D, I, F) and Subfamily II (tprE, J, G) cluster in two separate clades that are each clearly separated from the rest of the phylogeny. In contrast, Subfamily III (tprA, B, H, L, K) sequences do not cluster with each other or any other sequences and are distributed with varying branch lengths between the Subfamily I and II clades. These results are consistent with previous studies in which Subfamily III membership was less clearly defined than the other subfamilies (Centurion-Lara et al. 2000b).

Phylogenetic analyses of subfamily I

In order to focus on Subfamily I diversity, a ML phylogeny was generated for all available DNA sequences for all Subfamily I loci: tprC, D and I (Figure 2-1). All of the tprI sequences cluster together, while the tprC and D sequences are interspersed with each other such that there are no major monophyletic tprC or D clades. There are three instances in which paralogous sequences cluster more closely than their orthologous counterparts, all of which involve tprC and D: 1) the tprC and D sequences for four of the T. pallidum subsp. pallidum strains; 2) the tprC and D sequences from pertenue Gauthier (along with SamoaD tprC); 3) the tprC and D sequences for T. paraluiscuniculi. The eight pallidum sequences are identical, while the pertenue Gauthier and T. paraluiscuniculi tprC and D sequences differ by a maximum of one and three point mutations, respectively, highlighting the paralogous relationship of tprC and D in these strains.

Individual ML trees were also created for each of the Subfamily I loci examined in this study: tprC, D, and I. In the tprD phylogeny, two distinct clades are evident (Figure 2-2). One clade is comprised of four identical T. p. subsp. pallidum sequences and T.
paraluiscuniculi, all of which carry the D2 allele (using terminology of Centurion-Lara et al. 2000b; sequences that differ by a few base pairs but have the same defining motif are considered the same allele). The
second clade is comprised of the other four identical T. p. subsp. pallidum sequences that carry the D allele and the T. p. subsp. pertenue Gauthier strain, which carries the D3 allele that is 95% homologous to the D allele (Centurion-Lara et al. 2000b). Although sequence data are unavailable, PCR analysis suggests that T. p. subsp. endemicum and non-Gauthier strains of T. p. subsp. pertenue would cluster in the D2 clade (Centurion-Lara et al. 2000b). The D and D2 alleles differ from each other by a 330 bp central region at bp 855-1180 and three smaller variable regions at bp 1275-1306, 1425-1503 and 1569-1626 (relative to the Nichols strain sequence). A contiguous expanse encompassing the four variable regions (bp 855-1626) was removed and the new alignment was used to generate a ML tree in which the eight T. p. subsp. pallidum sequences comprise a monophyletic clade (data not shown).

An initial tprC phylogeny included all strains (not shown). This phylogeny contained a very long branch leading to T. paraluiscuniculi, which increased the scale by an order of magnitude (data not shown). The long T. paraluiscuniculi branch, along with the paralogous grouping of T. paraluiscuniculi tprC and D sequences in Figure 2-1, suggested a gene conversion event in T. paraluiscuniculi that replaced the ancestral sequence at tprC with tprD (Table 2-3 summarizes the proposed gene conversion events between tprC and D). The T. paraluiscuniculi sequence was removed and an alternative phylogeny was generated (Figure 2-2). In the new phylogeny, the three human subspecies cluster separately with strong bootstrap support (94-100%). Simian is contained in the T. p. subsp. pertenue clade although it is distinct from the two T. p. subsp. pertenue sequences. The T. p. subsp. pallidum sequences form three well-supported clusters within a monophyletic clade. Four of the T. p. subsp.
pallidum sequences are identical to each other as well as to the tprD sequences in these same strains and are considered to carry the D allele at both loci (see examples of paralogous sequences clustering above). The tprC alleles in
the other four T. p. subsp. pallidum strains show high sequence homology with the D allele and are labeled D-like alleles (Centurion-Lara et al. 2004) (Table 2-3). The fact that there is higher similarity between the paralogous tprC and D sequences in the strains carrying the D allele (they are identical) than between their respective homologs suggests that a gene conversion event has occurred between tprC and tprD in the D allele strains. In this case, tprC appears to be the likely donor since there is detectable homology among all of the subspecies at this locus, whereas the D and D2 alleles differ by a long central variable region. Furthermore, the tprC and tprD sequences in the T. p. subsp. pertenue Gauthier strain are identical, suggesting a third gene conversion event, again with the tprC locus serving as the donor due to the detectable homology among the tprC homologs (Table 2-3).

The tprI phylogeny includes the same isolates as the tprC phylogeny with the exception of T. paraluiscuniculi, which does not have a tprI locus (Figure 2-2). The phylogeny for tprI shows a relatively long branch between T. p. subsp. pallidum and the other treponemes, similar in length (0.016 substitutions/site) to the corresponding branch in the tprC phylogeny (0.014 substitutions/site). There is 100% bootstrap support for the two monophyletic clades consisting of T. p. subsp. pallidum (all eight pallidum sequences are identical) and T. p. subsp. endemicum, respectively, moderate support for clustering of the simian isolate with T. p. subsp. pertenue SamoaD (85%), and little support for a T. p. subsp. pertenue + simian clade (62%), although the simian sequence clearly does not belong with the other two clades. The tprI phylogeny confirms the close relationship between the unclassified Fribourg-Blanc simian isolate and T. p. subsp. pertenue that was also evident in the tprC phylogeny.
Phylogenetic clustering of these sequences suggests that there is no strong species boundary, a conclusion that is supported by the fact that the simian treponeme is reported to infect humans (Smith et al. 1971).
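
The removal of the variable regions from the tprD alignment (bp 855-1626 relative to the Nichols strain sequence) can be sketched in a few lines of Python. This is an illustration only and not the software used in this study: the function name `excise_regions` and the toy sequences are hypothetical, and the sketch assumes that alignment columns coincide with the stated coordinates (i.e. no gaps were introduced upstream of the excised regions).

```python
def excise_regions(alignment, regions):
    """Drop 1-based, inclusive column ranges from every sequence in an alignment.

    alignment: dict mapping a sequence name to an aligned sequence string
    regions:   iterable of (start, end) column ranges to remove
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must be aligned to the same length")
    length = lengths.pop()

    drop = set()
    for start, end in regions:
        if not 1 <= start <= end <= length:
            raise ValueError(f"region {start}-{end} lies outside the alignment")
        drop.update(range(start - 1, end))  # convert to 0-based indices

    keep = [i for i in range(length) if i not in drop]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


# Toy example (hypothetical sequences): remove columns 4-6.
toy = {"strain_1": "AAACCCGGGTTT", "strain_2": "AAATTTGGGTTT"}
trimmed = excise_regions(toy, [(4, 6)])
# Both toy sequences become "AAAGGGTTT" once the variable block is removed.

# For an alignment in Nichols coordinates, the region described above
# would be passed as [(855, 1626)].
```

Accepting a list of ranges rather than a single range lets the same helper remove several separate variable regions from one locus in a single pass.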
Phylogenetic analyses of subfamily II

The tpr Subfamily II consists of tprE, G, and J. Previous studies have shown that the T. p. subsp. pallidum Nichols tprG and J sequences are highly homologous at the 5' and 3' ends while the central regions show extreme divergence (Giacani, Hevner, and Centurion-Lara 2005), specifically at two variable regions (V1 = sites 976-1510, with a small internal region of homology at sites 1168-1295, and the much smaller V2 = sites 1879-1947) that are unlikely to have evolved through point mutation. Different V1 and V2 sequences are classified as G and J motifs, which are used to define the G, J, and G/J alleles present at the tprG and J loci. The G allele is defined as a G motif at V1 and V2, the J allele is defined as a J motif at V1 and V2, and the G/J allele is defined as a G motif at V1 and a J motif at V2. At the tprG locus, analysis of our alignment shows that two of the three pallidum strains analyzed at this locus (Nichols and Sea81-4) carry the G allele, while the other pallidum strain (MexicoA) and the pertenue strain (Gauthier) carry the G/J allele. At tprJ, Nichols and MexicoA carry the J allele, while Sea81-4 and pertenue Gauthier carry the G/J allele (data not shown). PCR analysis indicates that the other five pallidum strains discussed in this study also carry the J allele at tprJ, although sequence data do not exist for these strains. PCR analysis also indicates that the other T. p. subsp. pertenue strain (SamoaD) as well as a T. p. subsp. endemicum strain (IraqB) carry the G/J allele (Centurion-Lara, unpublished). In T. paraluiscuniculi, the positions corresponding to tprE and J contain two almost identical G/J allele sequences that are designated the G2 and G1 alleles, respectively (Giacani et al. 2004). In the T. paraluiscuniculi G1 and G2 alleles, the second half of V1 is somewhat different from V1 in the human G/J allele, although it is still much more similar to the G/J allele than to the J allele. Furthermore, in T.
paraluiscuniculi, tprG has recombined with tprI (Subfamily I) to form a single allele termed the G/I hybrid at the tprG locus (Giacani et al. 2004).
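
The allele definitions given above reduce to a lookup on the motif observed at V1 and at V2. The sketch below is a hypothetical illustration (the names `classify_allele` and `MOTIFS` are not from this study); the motif pairs are simply those implied by the allele calls reported in the text for the sequenced strains, and a J motif at V1 combined with a G motif at V2 is left unclassified because no such allele is defined here.

```python
# Allele definitions from the text: (motif at V1, motif at V2) -> allele name.
ALLELES = {
    ("G", "G"): "G",
    ("J", "J"): "J",
    ("G", "J"): "G/J",
}


def classify_allele(v1_motif, v2_motif):
    """Return the allele name for a pair of V1/V2 motifs, or 'unclassified'."""
    return ALLELES.get((v1_motif, v2_motif), "unclassified")


# Motif pairs implied by the allele calls reported above (assumed, not measured here).
MOTIFS = {
    ("Nichols", "tprG"): ("G", "G"),
    ("Sea81-4", "tprG"): ("G", "G"),
    ("MexicoA", "tprG"): ("G", "J"),
    ("Gauthier", "tprG"): ("G", "J"),
    ("Nichols", "tprJ"): ("J", "J"),
    ("MexicoA", "tprJ"): ("J", "J"),
    ("Sea81-4", "tprJ"): ("G", "J"),
    ("Gauthier", "tprJ"): ("G", "J"),
}

for (strain, locus), motifs in MOTIFS.items():
    print(f"{strain} {locus}: {classify_allele(*motifs)}")
```

Laid out this way, the table makes it easy to see that the pallidum strains do not group the same way at the two loci: Sea81-4 pairs with Nichols at tprG but with Gauthier at tprJ.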
For our phylogenetic analysis, tprG and J sequences were grouped together because of the high amount of gene conversion at and between these loci (Figure 2-3). In this phylogeny, a long branch with 100% bootstrap support leads to the Nichols and MexicoA tprJ sequences, while the MexicoA tprG, Sea81-4 tprJ, and Gauthier tprG and J form a polytomy (no bootstrap support). The tprG sequences from Nichols and Sea81-4 form a highly supported clade (99%) and are clearly closer to the rest of the sequences than are the Nichols and MexicoA tprJ sequences. Because the J allele is only found in T. p. subsp. pallidum while the G/J allele is found in T. p. subsp. pallidum, pertenue, and endemicum and in T. paraluiscuniculi, the latter is most likely ancestral. The J motif at V2 may be the result of a gene conversion or lateral gene transfer, although no sequence homology was found in a search of the public database. The G motif at V2 in the G allele also occurs in Nichols tprE (data not shown) and may represent a small gene conversion event from tprE to tprG that replaced only V2 of the ancestral G/J allele in Nichols and Sea81-4 (although more tprE sequence data are needed to be certain). The Gauthier tprG and J sequences differ by only 2 bp and may also represent a paralogous clustering reflective of a gene conversion event, although the polytomy makes it difficult to be certain. The T. paraluiscuniculi clade is also highly supported (100%), which represents a paralogous clustering of closely related G/J alleles at tprE and J.

Phylogenetic analyses of subfamily III

The tprK phylogeny includes multiple clones from all represented strains because the locus is highly variable and accumulates mutations within a single infection (Figure 2-4). Seven variable regions have been identified in tprK that are likely the result of gene conversion events, with the probable donor sites located in the 3' and 5' flanking regions of tprD (Centurion-Lara et al. 2004).
These variable regions were removed from our analysis in order to focus on the nonrecombinant history of the locus (variable regions were slightly modified to capture additional
flanking sites, i.e. bp 132-180, 596-671, 749-834, 866-920, 963-1059, 1141-1215, and 1291-1390). T. paraluiscuniculi appears to be an appropriate outgroup for tprK as the scale is on the same order of magnitude as tprC and I. Strong bootstrap support is shown for the T. paraluiscuniculi (100%) and T. p. subsp. endemicum (97%) clades, as well as for a combined T. p. subsp. pallidum + T. p. subsp. pertenue clade (96%) in which these two subspecies are unresolved relative to each other. However, the fact that variable regions in tprK appear to accumulate more variation in response to selective pressure (Centurion-Lara et al. 2004) and that clones from single individuals show single nucleotide polymorphisms (SNPs) even after removal of variable regions suggests that tprK may evolve differently than the other tpr loci.

Statistical Tests for Recombination

Four tests (RDP, MaxChi, Chimera and GENECONV) in the RDP2 package were used to investigate recombination events in the Subfamily I, II and III genes (Table 2-4). We use these methods to identify significant recombination and the location of recombinant breakpoints, but we do not infer donor and recipient alleles because inter-locus recombination is likely also occurring and will go undetected, since these methods focus on a single locus at a time. In all cases, the same alignment files from the phylogenetic analyses were used for the recombination analyses. Our primary interest was to investigate support for the putative regions of gene conversion identified in the phylogenetic analyses. Relatively strong overlap was shown in the results from all four methods and, in general, the most recombination events were found by MaxChi, which was previously shown to be the most powerful test in the RDP2 package (Posada and Crandall 2001). In tprD, one region of recombination was identified in all of the T. p. subsp. pallidum D2 allele sequences, pertenue Gauthier, and T.
paraluiscuniculi (see Table 2-4 for exact location of recombinant regions). These results are consistent with a recombination breakpoint present at
site 855, which marks the beginning of the central variable region that differentiates the D and D2 alleles. In tprC, two regions of recombination in the T. p. subsp. pallidum D allele sequences were identified at bp 137-889 and 1459-1728. This result is consistent with the 100% clustering of these sequences within the T. p. subsp. pallidum clade (Figure 2-2). Multiple recombination events were identified in MexicoA and Sea81-3 that are consistent with a branch leading to a monophyletic clade containing MexicoA and Sea81-3 in the tprC phylogeny (Figure 2-2). In tprI, only one recombination event was identified, in pertenue SamoaD, although the sequence has only two unique single nucleotide polymorphisms in this region and point mutation seems a more likely evolutionary mechanism than recombination in this case.

In tprG and J, more than 40 recombination events were identified when the significance level was set to p=0.01. This result was impossible to interpret precisely, so the analysis was performed again with a more stringent setting of p=0.001 and the requirement that more than one method identify each recombination event. Five sequences showed no evidence of recombination under these conditions: pallidum Sea81-4 tprJ (G/J allele), pertenue Gauthier tprG (G/J allele), pertenue Gauthier tprJ (G/J allele), pallidum Nichols tprJ (J allele), and pallidum MexicoA tprJ (J allele). However, all four methods identified recombination at the region containing V2 in tprG sequences for both pallidum G alleles (Sea81-4 and Nichols) as well as the pallidum G/J allele (MexicoA). There are four polymorphisms between V1 and V2 that are shared between the pallidum G alleles and MexicoA G/J, although they are not found in any of the G/J alleles from other subspecies, which may contribute to this result. In tprK, with the extended variable regions excluded, no recombination events were found by any of the methods.
These results agree with the phylogenetic analyses, which also do not indicate any recombination outside of the variable regions (although Giacani and colleagues
(2004) did identify a putative region of recombination at tprK in CuniculiA between V5 and V6 that was not detected in any of our analyses).

The results of the tests for recombination were consistent with the phylogenetic analyses in the overall detection of a high level of recombination across the studied loci. The recombination tests also identified new regions of recombination, particularly at tprC. Overall, far more recombination was indicated at tprG and J than for any other locus studied here, and this result is consistent with our phylogenetic analyses that revealed multiple instances of paralogous clustering and the presence of multiple divergent alleles at tprG and J.

Analysis of Nucleotide Diversity and Composition

Additional measures, such as nucleotide diversity and GC content, can be used to investigate recombination events, with the acknowledgement that other phenomena also affect these measures (Baldo et al. 2006). The amount of within-subspecies genetic diversity is low for all three subspecies at loci tprC, I, and K (π = 0-0.0076) (Table 2-5). At tprD and J, however, the diversity within T. p. subsp. pallidum is very high (π = 0.101 and 0.0958, respectively), reflecting the intra-subspecies gene conversion events discussed above. The amount of diversity at tprG within T. p. subsp. pallidum is intermediate (π = 0.0154), and specifically lower than at tprJ, reflecting a smaller putative gene conversion event, i.e. the V2 region.

The pattern of genetic diversity between subspecies of T. pallidum differs among loci, especially for T. p. subsp. pallidum. The Dxy nucleotide diversity between T. p. subsp. pertenue and T. p. subsp. endemicum is fairly consistent within tprC, I, and K (no tprD sequence data currently exist for T. p. subsp. endemicum). The Dxy distance between T. p. subsp. pallidum and the other two subspecies is approximately doubled relative to the distance between T. p. subsp. pertenue and T. p. subsp.
endemicum at tprC and I, consistent with the long branches leading to
T. p. subsp. pallidum in these phylogenies. However, at tprK the distance between T. p. subsp. pallidum and T. p. subsp. pertenue is much smaller than the distance between T. p. subsp. endemicum and the others (0.0028 vs 0.011 and 0.013), consistent with the clustering of T. p. subsp. pallidum and T. p. subsp. pertenue in the tprK phylogeny. At tprG the distance between T. p. subsp. pallidum and T. p. subsp. pertenue is intermediate (Dxy = 0.0130), while at tprJ the distance between T. p. subsp. pallidum and T. p. subsp. pertenue is only slightly lower than between these same two subspecies at tprD (Dxy = 0.0962). Again, this is in agreement with the proposed gene conversion or horizontal gene transfer event that created the highly divergent J allele in most T. p. subsp. pallidum strains.

Previous studies have suggested that gene conversion events lead to increased GC content at third codon positions (Eyre-Walker 1993; Galtier et al. 2001; Galtier 2003; Noonan et al. 2004). Although the molecular mechanism is unknown, it may be due to a GC bias in mismatch repair, which is required to resolve conversion events (Galtier et al. 2001; Noonan et al. 2004). Third positions reflect this bias more strongly because they are under less selective constraint, since base changes at this position are less likely to result in a change in amino acid. At each of the six tpr loci studied here, GC content was increased at the third position (GC3) relative to the first and second positions combined (GC1+2) (table 6), although not as dramatically as reported in other systems (Galtier et al. 2001; Noonan et al. 2004). This analysis supports our general finding of multiple gene conversion events at the studied loci.

Discussion

Intra-genomic homologous recombination appears to have been a major force in the evolution of the tpr gene family in the pathogenic Treponema species.
After the gene duplication events that created the gene family, our phylogenetic analyses of tprC, D, I, G, J, and K suggest that the high levels of homology among the loci have supported multiple gene conversion events
both within and between these tpr genes. Although lateral gene transfer can theoretically produce the genetic signatures we describe, this mechanism has not been reported in T. pallidum (lateral gene transfer has been identified as a probable evolutionary force in the genome of T. denticola due to the signature presence of phage-mediated integration events and restriction-modification systems that may serve as a barrier against lateral gene transfer, but neither of these signatures is present in T. pallidum (Seshadri et al. 2004)), and gene conversion appears more likely, particularly at tprC, D, G, and K, where donor regions can be identified within the same genome. No donor sequence was identified for the V1 region of the J allele at tprJ and, thus, horizontal gene transfer cannot be definitively ruled out.

In Subfamily I, we propose three gene conversion events between loci tprC and tprD: 1) a tprC-to-tprD conversion that introduced the D allele into tprD in the D pallidum strains, 2) a tprC-to-tprD conversion in the T. p. subsp. pertenue Gauthier strain that introduced the D3 allele into tprD, and 3) a tprD-to-tprC conversion in T. paraluiscuniculi that introduced the D2 allele into tprC (Table 2-3). At this point, there are insufficient data to determine the order of the three proposed gene conversion events. However, it is clear that the tprC-to-tprD conversions (#1 and 2) represent two distinct events since the pertenue Gauthier sequences differ by only two bp and the pallidum sequences are identical, but the pertenue Gauthier and pallidum sequences differ from each other by 55 bp. Furthermore, our results suggest that the tprC locus is likely to be older than the tprD locus because there is more variation among the pallidum D-like alleles at tprC compared to the pallidum D2 sequences at tprD that are identical (fig. 2a and b). At tprD the D2 allele is most likely the ancestral allele since it is found in multiple subspecies (i.e.
pallidum, pertenue, endemicum) as well as in T. paraluiscuniculi, while the D allele is only
found in a subset of pallidum strains. Non-D2 alleles in the four pallidum strains and pertenue Gauthier are likely the result of two subsequent gene conversion events, as described above.

At tprC a single gene conversion event (originating from tprD) is posited in T. paraluiscuniculi based on the phylogenetic analyses. The RDP2 recombination analysis identified several small, additional recombinant regions in all of the T. p. subsp. pallidum sequences, which is consistent with the higher diversity observed in the T. p. subsp. pallidum tprC sequences (Figure 2-2, Table 2-4). Close inspection of the tprC alignment (including pallidum and non-pallidum strains) revealed the presence of a high number of non-synonymous mutations in the pallidum strains that were grouped in clusters rather than scattered throughout the alignment. The transition/transversion ratio was decreased in these clusters and initially revealed a significant signal for positive selection at tprC in T. p. subsp. pallidum (data not shown). However, when we examined an alignment of tprC, D, and I together, we found that the majority of the transversions and non-synonymous mutations were unique to T. p. subsp. pallidum at tprC. The presence of clustered mutations, with a high frequency of transversions, argues against accumulated point mutations and instead suggests that multiple smaller, site-specific gene conversion events may have occurred at tprC in T. p. subsp. pallidum. This may be similar to the presence of multiple variable regions in tprK, although the tprC regions do not appear to undergo rapid sequence variation as occurs in tprK. These putative recombination events at tprC would have to have occurred prior to the major tprC-to-tprD gene conversion that replaced the D2 allele with the D allele at tprD since the D alleles at tprC and D are identical (Table 2-3). In proteins with antigenic relevance, recombination produces variation that may have an adaptive purpose.
However, it is not understood whether these proteins are more likely to undergo recombination or whether high variability simply increases the power of detection of
these events (Baldo et al. 2006). In the case of tprC, which may be a cell-surface protein as predicted by PSORT analysis (data not shown), it appears that the majority of tprC variation may be a result of gene conversion events, supporting an adaptive explanation.

At the tprI locus, variation among the treponemes was scattered and did not highlight a specific region that might have undergone gene conversion as described above for other loci. However, the fact that all eight of the T. p. subsp. pallidum tprI sequences were identical is intriguing, considering this was not the case at any other locus in our study. There are several possible explanations for the 100% sequence homology, including a significantly lower (point) mutation rate at tprI in T. p. subsp. pallidum, a more recent divergence of the T. p. subsp. pallidum tprI sequences, functional constraint at tprI in T. p. subsp. pallidum, or a gene conversion event that occurred prior to evolution of the T. p. subsp. pallidum tprI sequences (if a sequence longer than that in our dataset were replaced, the recombination event would go undetected by our recombination analysis, which looked for the endpoints of recombination events). Both a lower mutation rate and more recent evolution of T. p. subsp. pallidum tprI seem unlikely because the lengths of branches leading to T. p. subsp. pallidum at tprI and tprC are comparable (0.016 and 0.014 substitutions/site, respectively). Using a tprI phylogeny, a test for selection on the branch leading to the eight T. p. subsp. pallidum sequences indicated that the nonsynonymous/synonymous rate ratio was not significantly different from 1.0 (data not shown). Thus, functional constraint does not appear to explain the lack of mutations at this locus in the pallidum subspecies. No specific gene conversion events were identified at tprI in our analyses (although GC3 content was highest at tprI).
The most likely explanation may be that the rate of point mutations is generally low at all tpr genes and the pallidum tprI sequences have escaped
(by chance) the frequent gene conversion events that are mainly responsible for the diversity seen at tprC and D in T. p. subsp. pallidum.

In Subfamily II, the various alleles that occur at tprG and J (and at tprE and J in T. paraluiscuniculi), i.e. the G, J, and G/J alleles, are strongly suggestive of multiple gene conversion events, although the directionality of these events is difficult to determine due to the complexity of the DNA sequences at these loci. Because the G/J allele occurs in multiple subspecies and at multiple loci (Figure 2-3), it appears to be the ancestral sequence. The Nichols and MexicoA tprJ sequences have a divergent central region that is suggestive of a gene conversion event (or horizontal gene transfer) that replaced the G/J allele (most likely only the V1 region was replaced with a J motif V1). Unlike the scenario proposed above for gene conversions at tprC and D, no donor region is immediately apparent for the gene conversion that created the J motif V1 (BLAST searches did not identify any homology between the J motif at the V1 variable region and any other treponemal or non-treponemal sequences). Interestingly, the clustering of the pallidum strains is not consistent between loci. At tprJ, only Sea81-4 has apparently escaped the gene conversion that created the divergent J allele. At tprG, however, Nichols and Sea81-4 appear to have shared a gene conversion event creating the G motif at V2 to the exclusion of MexicoA. This is in contrast with tprD, where the ancestral D2 allele was replaced by the D allele in Nichols, but not in MexicoA or Sea81-4. A consistent history of the evolution of the subspecies pallidum strains therefore cannot be ascertained from these data.

Previous studies have demonstrated a high frequency of gene conversion events at the tprK locus in T. p. subsp. pallidum (Centurion-Lara et al. 2004).
Seven variable regions have been identified at tprK that are likely the result of multiple gene conversion events, with the probable donor sites located in the 3' and 5' flanking regions of tprD. A multi-site/multi-step
recombination process has been described for the accumulation of diversity within the variable regions. This diversity was shown to accumulate more dramatically in the presence of adaptive immune pressure, suggesting a mechanism to generate antigenic diversity in T. pallidum (Centurion-Lara et al. 2004).

It is probable that the gene conversion operating at tprK is different from that affecting the tprC and D loci. Gene conversion at the tprK locus generates diversity, and each event seems to affect a relatively small region (each variable region is 48-99 bases). In contrast, gene conversion at the tprC and D loci appears to result in concerted evolution and to affect a larger portion of the gene, since the T. p. subsp. pallidum D alleles are identical at both tprC and D. These seemingly contradictory outcomes of concerted evolution and increased diversity, both mediated by gene conversion, may be explained by differing stages of evolution in a multi-gene family (Santoyo and Romero 2005). In the first stage after the initial gene duplication that creates the multi-gene family, homogenization or concerted evolution is likely the dominant force because the high sequence homology drives gene conversion at a faster rate than point mutation occurs. As point mutations accumulate over time, homologous recombination is no longer as effective and the rate of point mutations may surpass that of gene conversion. At this stage of evolution of a multi-gene family, smaller scale site-specific gene conversion may become more significant, thus allowing concerted evolution to occur in small regions while antigenic variation is created throughout the gene (Santoyo and Romero 2005). This explanation may indicate a younger history for the tprD sequences (i.e. concerted evolution stage) relative to tprK, while the tprK (and possibly tprC) sequences are experiencing site-specific gene conversion events, possibly leading to increased antigenic variation.
Several scenarios have been proposed for the evolution of the human treponemal species (Baker and Armelagos 1988; Powell and Cook 2005). A New World vs. Old World origin of
venereal syphilis has long been debated (for a recent review, see Powell and Cook 2005). The Columbian hypothesis originally suggested that venereal syphilis (T. p. subsp. pallidum) originated in the New World and was brought to Europe by Columbus' crews returning from the New World (Crosby 1969). This was based on the paucity of skeletal and historical evidence for treponemal disease in the Old World prior to the early 1500s. For example, Rothschild (2003) proposed that yaws (T. p. subsp. pertenue) was the most ancestral of the three T. pallidum subspecies and was present at least as far back as the origin of modern humans in Africa, and that the other two subspecies each derived from yaws, with T. p. subsp. pallidum evolving in the New World no more than ~2000 years ago (Rothschild 2003). Baker and Armelagos (1988) have proposed an alternative Columbian hypothesis, which suggests that venereal syphilis evolved in Europe from a New World non-venereal treponeme that was introduced to Europe by Columbus' crews. This hypothesis is based on the lack of specific evidence for venereal syphilis in the New World, despite the overwhelming evidence of treponemal disease. The Pre-Columbian hypothesis suggests that treponemal diseases, including venereal syphilis, existed in the Old World prior to Columbus' voyages but were diagnosed incorrectly. One scenario suggests that pinta was the original form present throughout the world during the Pleistocene, followed by the evolution of yaws (12,000 years ago), then endemic syphilis (9,000 years ago) and, finally, venereal syphilis (5,000 years ago) (Hackett 1963). Finally, a Unitarian hypothesis, based on skeletal morphology data, has been advanced by Hudson (1965), who suggests that venereal syphilis, endemic syphilis, yaws, and pinta are not in fact distinct diseases, but rather are environmentally determined manifestations of the same disease.
More recently, Armelagos and colleagues (2005) have reviewed the molecular literature on human treponemes and they suggest a lack of molecular distinction between these subspecies.
Our molecular data suggest that the three subspecies are legitimately classified as distinct entities (Figure 2-2). The phylogenies for tprC and tprI demonstrate high bootstrap support for the separation of the three subspecies into separate clades, with relatively long branches leading to endemicum and pallidum. In tprD, pertenue is artificially closer to some pallidum sequences because of the gene conversion event that separates the pallidum D and D2 alleles. In the tprK phylogeny, there is no bootstrap support to separate pallidum and pertenue, but the tprK phylogeny is difficult to interpret because this locus has an exceedingly high mutation rate, as demonstrated by the fact that multiple tprK sequences exist in a single individual (even when the variable regions are removed). Furthermore, AMOVA results reveal a significant amount of among-subspecies variation (70-95%, p=0.000) when analyzing tprC, I, and K from all three subspecies, further supporting the genetic distinctiveness of the subspecies (data not shown).

It is clear that recombination has played a significant role in the evolution of the tpr genes, and possibly in the evolution of the treponemes. Therefore, studies that look only at SNPs, as reviewed by Armelagos et al. (2005), will miss this high level of recombination, and it may appear that there are few subspecies-specific variants because recombination has frequently scrambled alleles within a subspecies, e.g. the D and D2 alleles at tprD. Moreover, the fact that these recombination events are unique to a subspecies argues strongly in favor of the genetic distinctiveness of the three subspecies.

Ascertaining the distinctiveness of the subspecies is prerequisite to resolving their evolutionary history. In general, our analyses do not appear to support a dramatically older origin of yaws relative to venereal syphilis, contra Rothschild (2003), i.e. we do not see greater levels of variation or longer tree branches for T. p. subsp. pertenue relative to T. p.
subsp. pallidum (Table 2-4, Figures 2-2 and 2-4). Furthermore, our results do not clearly support current hypotheses that
consider venereal syphilis to be the most recently evolved of the treponemal syndromes. For instance, multiple gene conversion events have occurred within subsets of T. p. subsp. pallidum strains at tprC, D, G, and J (Figures 2-2 and 2-3), arguing for an older evolution of the entire subspecies of pallidum in order to allow sufficient time for these events to have occurred. The long branch in the tprI phylogeny leading to T. p. subsp. pallidum reflects a large number of point mutations, which are assumed to evolve in a clock-like manner, suggesting that more evolution has occurred on the branch to pallidum than on the branches to the other two subspecies. Our results are generally consistent with a relatively coincident evolution of the three human treponemal subspecies as proposed by Hackett (1963), but contra Rothschild (2003), who proposed dramatically different timeframes for evolution of yaws and venereal syphilis. Moreover, the T. p. subsp. pallidum sequences appear to carry too much variation to support the modified Columbian hypothesis of evolution of venereal syphilis within the past 500 years (Baker and Armelagos 1988). This is further supported by the fact that at least one T. p. subsp. pallidum strain (Nichols) was collected in the early 1900s and is identical at the loci examined to several other strains collected later in the 20th century, suggesting that the mutation rate is not high enough to have created variants within this time frame. Additional samples, e.g. more representatives of T. p. subsp. endemicum and T. p. subsp. pertenue, and analysis of more loci will be needed to definitively answer questions concerning the origin and evolution of the treponemes.
Moreover, the high levels of recombination revealed in our study suggest that the analysis of contiguous sequence data, as opposed to analysis of scattered SNPs, will be necessary to identify possible recombination events prior to reconstruction of the evolutionary history of the treponemes.
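
For readers who wish to compute comparable summary statistics from contiguous sequence data, the following Python sketch illustrates the three measures discussed in this chapter: within-group nucleotide diversity, between-group Dxy, and GC content by codon position. It rests on simplifying assumptions that are not taken from this study: distances are uncorrected proportions of differing sites, columns containing a gap or ambiguous base are skipped pairwise, and the ORF is assumed to start in frame. The function names are hypothetical, and the values reported in the tables of this chapter were not produced with this code.

```python
from itertools import combinations, product


def p_distance(a, b):
    """Uncorrected proportion of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    sites = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in "ACGT" and y in "ACGT"]  # skip gaps and ambiguous bases
    if not sites:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in sites) / len(sites)


def nucleotide_diversity(seqs):
    """Mean pairwise distance among sequences sampled from one group."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


def dxy(group_x, group_y):
    """Mean pairwise distance between sequences drawn from two different groups."""
    pairs = list(product(group_x, group_y))
    if not pairs:
        raise ValueError("both groups must contain sequences")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


def gc_by_codon_position(orf):
    """Return (GC1+2, GC3) as fractions for an in-frame ORF; trailing bases are ignored."""
    orf = orf.upper()
    usable = len(orf) - len(orf) % 3
    if usable == 0:
        raise ValueError("ORF must contain at least one codon")
    first_two = [orf[i] for i in range(usable) if i % 3 != 2]
    third = [orf[i] for i in range(usable) if i % 3 == 2]

    def gc(bases):
        return sum(b in "GC" for b in bases) / len(bases)

    return gc(first_two), gc(third)


# Toy sequences (hypothetical) showing how the functions are called.
group_a = ["ACGTACGT", "ACGTACGA"]
group_b = ["ACGAACGA"]
print(nucleotide_diversity(group_a), dxy(group_a, group_b))
print(gc_by_codon_position("ATGGCC"))
```

A comparison of GC3 against GC1+2 for each locus, or of Dxy across loci for the same pair of subspecies, follows directly from these helpers once each alignment has been split by subspecies.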
Table 2-1. Treponema isolates used in this study

Species/Subspecies | Strain | Geographical source | Year isolated | Sequence data (tprC, tprD, tprG, tprI, tprJ, tprK) | Reference
T. p. subsp. pallidum | Nichols [a] | Washington, D.C. | 1912 | + [e] + + + + + | (Nichols and Hough 1913)
T. p. subsp. pallidum | Sea81-4 [b] | Seattle | 1980 | + + + + + + | (Lukehart et al. 1988)
T. p. subsp. pallidum | Chicago [c] | Chicago | 1951 | + + [f] + | (Turner and Hollander 1957)
T. p. subsp. pallidum | Bal73-1 [c] | Baltimore | 1968 | + + + | (Hardy et al. 1970)
T. p. subsp. pallidum | Bal3 [c] | Baltimore | Unknown | + + + |
T. p. subsp. pallidum | Bal7 [c] | Baltimore | 1976 | + + + | (Tramont 1976)
T. p. subsp. pallidum | Sea81-3 [b] | Seattle | 1981 | + + + | (Lukehart et al. 1988)
T. p. subsp. pallidum | MexicoA [c] | Mexico | 1953 | + + + + + | (Turner and Hollander 1957)
T. p. subsp. pertenue | Gauthier [d] | Brazzaville | 1960 | + + + + + + | (Gastinel et al. 1963)
T. p. subsp. pertenue | SamoaD [c] | Western Samoa | 1953 | + + + | (Turner and Hollander 1957)
T. p. subsp. endemicum | BosniaA [c] | Bosnia | 1950 | + + + | (Turner and Hollander 1957)
T. p. subsp. endemicum | IraqB [c] | Iraq | 1951 | + + + | (Turner and Hollander 1957)
T. paraluiscuniculi | CuniculiA [c] | Baltimore | Unknown | + + + + + |
Simian species | Fribourg-Blanc [c] | Guinea | 1966 | + + | (Fribourg-Blanc, Mollaret, and Niel 1966)

[a] Originally provided by James N. Miller, University of California, Los Angeles, CA
[b] Strain isolated in Seattle by Sheila A. Lukehart, University of Washington, Seattle, WA
[c] Strains provided by Paul Hardy and Ellen Nell, Johns Hopkins University, Baltimore, MD
[d] Provided by Peter Perine, Centers for Disease Control and Prevention, Atlanta, GA
[e] + indicates sequence data that were generated in the current study
[f] indicates sequence data were not generated in the current study
Table 2-2. T. pallidum primers used in this study.
Locus a | Strains sequenced | Sense primer | Antisense primer | Length of ORFs analyzed (bp)
tprC | BosniaA, Simian, SamoaD, CuniculiA | 5'-gggggtgaggtagaagtgaga | 5'-ccctcatccgagacaaaat | 1797
tprC | Nichols, Bal7, Bal73-1, Chicago, Sea81-3, Sea81-4, MexicoA, Bal3, IraqB, Gauthier | 5'-ttagaggaggcgtcagaacg | 5'-ggtccgtgagcaggaagtaa | 1797
tprD | Nichols, Bal7, Bal73-1, Chicago, Sea81-3, Sea81-4, MexicoA, Bal3, Gauthier | 5'-cgcgtaccgctttgcagttca | 5'-catggcattggtgagaaagacg | 1791
tprD | CuniculiA | 5'-aagaggttcaggaagcaacg | 5'-acttcgtaggagcagcagga | 1791
tprI | BosniaA, IraqB, Simian, SamoaD | 5'-cgtcaccctctcctggtagt | 5'-atccctcgcctgtaaactga | 1830
tprI | Nichols, Bal7, Bal73-1, Chicago, Sea81-3, MexicoA, Bal3, Sea81-4 | 5'-tgggagcttgtatgcagatg | 5'-gggaaccctctcccttcc | 1830
tprI | Gauthier | 5'-agggtgagggggctactaga | 5'-atccctcgcctgtaaactga | 1830
tprG | Gauthier, Sea81-4 | 5'-ccctgcgtttcccatctg | 5'-gtactaccttcccccggtct | 2268
tprG | MexicoA | 5'-caggttttgccgttaagc | 5'-aatcaagggagaataccgtc | 1812
tprG | CuniculiA | 5'-cgcgtacccacttctctctc | 5'-gtactaccttcccccggtct | 2268
tprJ | MexicoA | 5'-caggttttgccgttaagc | 5'-aatcaagggagaataccgtc | 1812
tprJ | Gauthier | 5'-agggtgagggggctactaga | 5'-atccctcgcctgtaaactga | 2268
tprJ | Sea81-4 | 5'-cgagtgaggctcatcaagaa | 5'-agtaagccctgcccaagaac | 2268
tprJ | CuniculiA | 5'-aagtttgctttcagat | 5'-gtactaccttcccccggtct | 2268
tprK | BosniaA, IraqB, Simian, SamoaD, Gauthier | 5'-agtaatggttttcggcatcg | 5'-ccatacatccctaccaaatca | 1470
tprK | Sea81-4, CuniculiA | 5'-tcccccagttgcagcactat | 5'-tcgcggtagtcaacaatacca | 1470
a PCR cycling conditions:
tprC (other than IraqB or CuniculiA): Denaturation at 94 °C for 3 minutes, then 40 cycles of 94 °C for 1 minute, 60 °C for 2 minutes, and 72 °C for 1 minute, with a final elongation step of 72 °C for 10 minutes.
tprC (IraqB): 40 cycles of 94 °C for 1 minute, 58 °C for 1 minute, and 72 °C for 2 minutes, with a final elongation step of 72 °C for 2 minutes.
tprC, D, G, J, K (CuniculiA): Denaturation at 95 °C for 2 minutes, then 35 cycles of 95 °C for 1 minute, 60 °C for 1 minute, and 72 °C for 2 minutes, with a final elongation step of 72 °C for 3 minutes.
tprD (other than CuniculiA): Denaturation at 94 °C for 3 minutes, then 30 cycles of 94 °C for 1 minute, 60 °C for 2 minutes, and 72 °C for 1 minute, with a final extension step of 72 °C for 10 minutes.
tprI: Denaturation at 94 °C for 3 minutes, then 30 cycles (Nichols, Bal7, Bal73-1, Chicago, Sea81-3, MexicoA, Bal3, Sea81-4) or 40 cycles (BosniaA, IraqB, Simian, SamoaD) of 94 °C for 1 minute, 60 °C for 2 minutes, and 72 °C for 1 minute, with a final elongation step of 72 °C for 10 minutes.
tprI, J (Gauthier): Denaturation at 94 °C for 1 minute, then 40 cycles of 98 °C for 10 seconds and 63 °C for 5 minutes, with a final elongation step of 72 °C for 10 minutes (used LA Kit, Takara Bio Inc., Shiga, Japan).
tprG (Gauthier): Denaturation at 94 °C for 1 minute, then 45 cycles of 98 °C for 10 seconds and 62 °C for 5 minutes, with a final elongation step of 72 °C for 10 minutes (used LA Kit, Takara Bio Inc., Shiga, Japan).
tprG (Sea81-4): Denaturation at 94 °C for 1 minute, then 35 cycles of 98 °C for 20 seconds and 68 °C for 5 minutes, with a final elongation step of 72 °C for 10 minutes (used LA Kit, Takara Bio Inc., Shiga, Japan).
tprG, J (MexicoA): Denaturation at 94 °C for 3 minutes, then 40 cycles of 94 °C for 1 minute, 63 °C for 2 minutes, and 68 °C for 1 minute, with a final elongation step of 68 °C for 7 minutes. Note that these primers amplify a partial ORF.
tprK (all except CuniculiA): Denaturation at 94 °C for 3 minutes, then 40-45 cycles of 94 °C for 1 minute, 60 °C for 2 minutes, and 72 °C for 1 minute, with a final elongation step of 72 °C for 10 minutes.

Table 2-3. Polymorphism at the tprC and tprD loci among pathogenic treponemes.
Species/Subspecies | Strain | tprD allele | Directionality of proposed gene conversion events | tprC allele
T. p. subsp. pallidum | Nichols | D | | D
T. p. subsp. pallidum | Chicago | D | | D
T. p. subsp. pallidum | Bal73-1 | D | | D
T. p. subsp. pallidum | Bal7 | D | | D
T. p. subsp. pallidum | Sea81-3 | D2 | No conversion | D-like
T. p. subsp. pallidum | Sea81-4 | D2 | No conversion | D-like
T. p. subsp. pallidum | Bal3 | D2 | No conversion | D-like
T. p. subsp. pallidum | MexicoA | D2 | No conversion | D-like
T. p. subsp. endemicum | Iraq | D2 | No conversion | D3-like
T. p. subsp. endemicum | Bosnia | ND | ND a | ND
T. p. subsp. pertenue | Gauthier | D3 | | D3
T. p. subsp. pertenue | SamoaD | D2 | No conversion | D3-like
a ND = No sequence data
Table 2-4. Recombinant regions identified by RDP2.
Values for each strain are listed in column order: tprC, tprD, tprI, tprK, tprG/J.
T. p. subsp. pallidum
MexicoA: ~800-1600 a,c,d,e ~1450-1750 a,c,d, 1-854 e NC f NC 1669-2149 (G/J) b,c
Sea81-3: ~1450-1750 a,c,d, 1-1500 c 1-854 e NC ND ND g
Sea81-4: NC 1-854 e NC NC 1669-2149 (G) b,c,d,e
Bal3: NC 1-854 e NC ND ND
Nichols: 137-889 c 1459-1728 c,d NC NC ND 1556-2176 (G) b,c,d,e
Chicago: 137-889 c 1459-1728 c,d NC NC ND ND
Bal73-1: 137-889 c 1459-1728 c,d NC NC ND ND
Bal7: 137-889 c 1459-1728 c,d NC NC ND ND
T. p. subsp. pertenue
Gauthier: NC 1-854 e 1701-1768 c,d,e 1627-1796 d NC NC NC
SamoaD: NC 143-621 c NC ND
T. p. subsp. endemicum
BosniaA: 1459-end c,d ND NC NC ND
IraqB: 714-1471 c,d ND NC NC ND
Simian: NC ND NC ND ND
a Multiple regions were identified by all four programs; the consensus is reported here.
b RDP program identified this region as recombinant.
c MaxChi program identified this region as recombinant.
d Chimera program identified this region as recombinant.
e GENECONV program identified this region as recombinant.
f NC = no conversion identified.
g ND = no DNA sequence data.
Table 2-5. Levels of nucleotide diversity within and between subspecies.
Locus | Within subspecies (π), pallidum | Within subspecies (π), pertenue | Between subspecies (Dxy)
tprD | 0.10105 | NC a |
tprC | 0.00659 | 0.00111 |
tprI | 0.0 | 0.00765 |
tprK | 0.00066 | 0.00443 |
tprG | 0.01541 | NC |
tprJ | 0.09584 | NC |
a NC = not calculated because only one DNA sequence was available.
b ND = no DNA sequence data

Table 2-6. Average GC content at combined 1st + 2nd (GC1+2) and 3rd codon (GC3) positions
Locus a | GC1+2 | GC3
tprD | 0.524 | 0.634
tprC | 0.537 | 0.625
tprI | 0.535 | 0.654
tprK | 0.481 | 0.569
tprG+J | 0.523 | 0.599
a All available human sequences were used for each locus; see Table 2-1 for list of sequences used.
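The GC1+2 and GC3 quantities reported in Table 2-6 can be illustrated with a minimal sketch. It assumes an in-frame open reading frame that starts at the first codon position; the function name and the toy sequence in the comments are hypothetical and are not taken from the software used for the dissertation.

```python
def gc_by_codon_position(orf: str) -> tuple[float, float]:
    """Return (GC1+2, GC3) for an in-frame coding sequence.

    GC1+2 is the GC fraction over first and second codon positions combined;
    GC3 is the GC fraction over third codon positions.
    """
    seq = orf.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    gc12 = n12 = gc3 = n3 = 0
    for i in range(usable):
        base = seq[i]
        if base not in "ACGT":
            continue  # skip alignment gaps and ambiguity codes
        is_gc = base in "GC"
        if i % 3 == 2:  # third codon position
            n3 += 1
            gc3 += is_gc
        else:  # first or second codon position
            n12 += 1
            gc12 += is_gc
    if n12 == 0 or n3 == 0:
        raise ValueError("sequence contains no complete, unambiguous codon")
    return gc12 / n12, gc3 / n3


# Hypothetical toy ORF (not a tpr sequence): codons ATG, GCC, AAG.
toy_gc12, toy_gc3 = gc_by_codon_position("ATGGCCAAG")
```

Averaging the per-sequence values over all sequences at a locus would give a per-locus figure comparable in form to the table, although the exact averaging used for Table 2-6 is not stated here.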
Figure 2-1. Unrooted ML phylogenies of multiple tpr genes. (a) Unrooted ML phylogeny of twelve Nichols tpr nucleotide sequences based on an alignment of 2708 bp. (b) ML phylogeny of tprC, D, and I nucleotide sequences based on an alignment of 1797 bp using mid-point rooting. Subspecies designations are indicated by vertical lines. Grey boxes indicate clades in which paralogous sequences group together. The specific tpr locus designation is appended to each strain name. Bootstrap values based on (a) 250 and (b) 1000 replications are shown next to branches.
Figure 2-2. ML phylogenies for tprD, C, and I. Phylogenies are based on nucleotide sequences of (a) tprD; (b) tprC; (c) tprI. Human subspecies are circled and labeled. Bootstrap values based on 1000 replications are shown next to branches.
Figure 2-3. ML phylogeny of tprG and J. The specific tpr locus designation is appended to each strain name. Subspecies are circled and labeled. For T. paraluiscuniculi, G1 signifies the G/J allele at tprJ and G2 signifies the G/J allele at tprE. Bootstrap values based on 1000 replications are shown next to branches.
Figure 2-4. ML phylogeny of tprK. Rooted using CuniculiA strains. Subspecies designations are indicated by vertical lines and labeled. Bootstrap values based on 1000 replications are shown next to branches.
CHAPTER 3
LINKAGE DISEQUILIBRIUM AND ASSOCIATION ANALYSIS OF ALPHA SYNUCLEIN (SNCA) AND ALCOHOL AND DRUG DEPENDENCE IN TWO AMERICAN INDIAN POPULATIONS 2

Introduction

Alpha synuclein is involved in dopaminergic neurotransmission and has been implicated in a number of neurological disorders. An association between α-synuclein and neurodegenerative disorders is well established. Overexpression of α-synuclein has been implicated in the etiology of Parkinson's disease (Polymeropoulos et al. 1997; Kruger et al. 1998) and Alzheimer's disease (Ueda et al. 1993), possibly because of neurodegeneration of dopamine neurons due to toxic build-up of α-synuclein (Mash et al. 2003).

More recently, α-synuclein has been associated with neuropsychiatric disorders, such as alcoholism (Liang et al. 2003; Bonsch et al. 2005a; Bonsch et al. 2005b; Bonsch et al. 2005c) and drug addiction (Mash et al. 2003; Kobayashi et al. 2004). Alpha synuclein is located at a quantitative trait locus for alcohol preference in humans (Bonsch et al. 2005a; Bonsch et al. 2005c), and levels of its mRNA and protein are elevated in alcohol-preferring rats and macaque monkeys (Liang et al. 2003; Spence et al. 2005; Walker and Grant 2006). The complex microsatellite repeat, NACP-REP1, which is located ~10 kb upstream of the α-synuclein gene (SNCA), has been associated with alcohol dependence in humans; specifically, longer alleles were correlated with elevated levels of α-synuclein and were more frequent in alcohol-dependent patients (Bonsch et al. 2005b). Increased methylation of the SNCA promoter was also detected in alcoholic patients (Bonsch et al. 2005c), and elevated SNCA mRNA and protein levels have also been associated with craving in alcoholic patients (Bonsch et al. 2005a; Bonsch et al. 2005c).

2 Clarimon, J., R. R. Gray, L. N. Williams, M. A. Enoch, R. W. Robin, B. Albaugh, A. Singleton, D. Goldman, and C. J. Mulligan. 2007.
Linkage disequilibrium and association analysis of alpha-synuclein and alcohol and drug dependence in two American Indian populations. Alcohol Clin Exp Res 31:546-554.
With respect to drug abuse, three out of four SNPs assayed in intron 1 of SNCA were found to be significantly associated with methamphetamine psychosis/dependence (Kobayashi et al. 2004). Overexpression of SNCA was also observed in the dopamine neurons of cocaine abusers, although α-synuclein may not directly increase the risk of drug abuse; it was speculated that SNCA overexpression may be a protective response to changes in dopamine turnover resulting from cocaine abuse (Mash et al. 2003). In general, overexpression of α-synuclein is thought to interfere with dopaminergic neurotransmission, which has been proposed as a main mechanism for withdrawal and craving (Self et al. 1995), two important factors for the development, maintenance and relapse of addictive disorders.

The gene for α-synuclein has been mapped to chromosome position 4q21.3-22 (Chen et al. 1995; Shibasaki et al. 1995; Spillantini, Divane, and Goedert 1995). Independent genome-wide linkage studies have provided modest evidence that a locus in this region contributes to alcohol dependence in one of the American Indian populations analyzed in the current study (Southwest population; Long et al. 1998; Mulligan et al. 2003) as well as in Euro-Americans (Reich et al. 1998). Follow-up studies on the Euro-American population demonstrated strong linkage to a phenotype defined by the maximum number of drinks consumed on one occasion (Saccone et al. 2000). A recent study of Mission Indians reported no linkage in this region to a diagnosis of alcohol dependence, but detected modest support for linkage to a more narrowly defined phenotype of drinking severity (Ehlers et al. 2004a). This region has also been associated with drug abuse through a genome-wide single-nucleotide polymorphism (SNP) scan (Uhl et al. 2001).

In our study, we assayed 15 SNPs in the α-synuclein gene and one upstream microsatellite repeat (NACP-REP1) in participants belonging to Southwest (n=514) and Plains
(n=420) American Indian populations. Patterns of linkage disequilibrium (LD) across the assayed SNPs were similar in both tested populations and were consistent with the LD patterns of the SNCA region in Caucasians, Africans and Asians as reflected in the HapMap database (www.hapmap.org). The assayed alleles and constructed haplotypes were tested for association with alcohol dependence or alcohol use disorders (pooled diagnoses of alcohol dependence and alcohol abuse) and drug dependence or drug use disorders (pooled diagnoses of drug dependence and drug abuse), which are disorders that reach lifetime prevalences as high as 64% in the study populations. Individual alleles and constructed haplotypes were also tested against two symptom count phenotypes (all 18 questions and the eight questions that are diagnostic for alcohol dependence).

Materials and Methods

Sampling Strategy

Blood samples and clinical data were collected from 514 adult members of a Southwest (SW) American Indian tribe and from 420 adult members of a Plains American Indian tribe. Participants were initially chosen at random from the tribal registry, and family members of ascertained alcoholics were subsequently included. Descriptive data on both populations are presented in Table 3-1 (only a subset of these individuals was typed for a sufficient number of SNPs to be included in the haplotype analysis; see Table 3-2). All participants were ≥21 years of age and were required to have a minimum of 25% ancestry to be included on the registry. Williams et al. (1992) found a high correspondence between overall levels of stated ancestry and ancestry estimated from genetic markers, and they found evidence for 5% non-American Indian admixture in the SW population. Belfer et al. (2006) report low admixture with non-American Indian tribes in the Plains population. Informed consent was obtained under a human subjects research protocol approved
by the respective Tribal Councils of each population group and the Institutional Review Board (IRB) of the National Institute on Alcohol Abuse and Alcoholism.

Testing Instruments, Interviews, and Psychiatric Diagnoses

Focus groups composed of tribal staff and community members reviewed testing instruments and questionnaires for potential cultural biases and general suitability to the population. Research diagnoses for lifetime alcohol dependence and abuse and lifetime drug dependence and abuse were based on: 1) semi-structured psychiatric interviews using the Schedule for Affective Disorders and Schizophrenia Lifetime Version (SADS-L) with probes added to enable diagnoses using both Research Diagnostic Criteria and Diagnostic and Statistical Manual of Mental Disorders, Third Edition, Revised (DSM-III-R, American Psychiatric Association, 1987) criteria (Robin et al. 1998), 2) medical, educational, court, and other records, and 3) corroborative information from family members. The SADS-L was administered to all subjects by a psychologist experienced with psychiatric assessment in this tribe and other American Indian populations. DSM-III-R diagnoses of alcohol dependence were made from the SADS-L by following operationally defined criteria and using the instructions of Spitzer et al. (1989) (an additional criterion of heavy drinking for one year or more was added for a diagnosis of alcohol dependence in the Plains population; Belfer et al. 2006). Diagnoses were made from the SADS-L interview data independently by two raters: a clinical social worker and a clinical psychologist. Diagnostic differences were resolved in a consensus conference that included a senior psychiatrist experienced in diagnosis in American Indian people. Sampling strategy, interview procedure, and diagnosis protocol are summarized from Belfer et al. (2006), Long et al. (1998), and Robin et al. (1998).
Genotyping

Genotype data from the HapMap project (www.hapmap.org) were used to generate an optimal set of six tagging SNPs through the use of TagIT software. However, because the HapMap SNP frequencies were calculated using Euro-American populations, we added nine more SNPs in order to provide full coverage of genetic variation within this locus in American Indian populations (Figure 3-1). Thirteen of the SNPs were assayed using TaqMan Assays-by-Design(SM) SNP Genotyping Services (Applied Biosystems). Thermal cycling and end-point PCR analysis were performed on an ABI PRISM 7900HT Sequence Detection System. Primer and probe sequences are available upon request. Two SNPs, rs920624 and rs3775423, were assayed as restriction fragment length polymorphisms by digesting amplification products with restriction enzymes PsiI or MseI and SspI (New England Biolabs, Beverly, MA), respectively, and electrophoresing digests on 2% agarose gels. Primers for rs920624 were 5'-ACTACTTCTCTGTTGGATTGC-3' and 5'-AAGATTCTTCACCTCTGTGTG-3' and for rs3775423 were 5'-GTATCCAATGCCCAAAGG-3' and 5'-TGCCTCAGAAAGAACAGATG-3'. The NACP-REP1 dinucleotide repeat polymorphism was amplified as previously described (Farrer et al. 2001). PCR products were run on an ABI PRISM 3100 (Applied Biosystems) automated sequencer. GeneScan(TM) analysis software (version 3.7) and Genotyper (version 3.7) were used to assess fragment sizes.

Statistical Analysis

Previous studies have demonstrated that the degree of genetic relationship (kinship) in both the Plains and SW samples is low and close to the average for the source populations, thus permitting analyses that assume independence of individual samples (Robin et al. 1998; Belfer et al. 2006). Hardy-Weinberg equilibrium was assessed by Fisher's exact test, implemented in
the Arlequin ver. 2.000 program (Schneider, Roessli, and Excoffier 2000). Haplotypes were reconstructed with PHASE ver. 2.1 (Stephens and Donnelly 2003), which uses a Bayesian approach to infer haplotypes from unphased population genotypic data and takes into account recombination events and decay of LD with distance between markers, thus leading to a more accurate inference of the real haplotype. One thousand permutations were performed for each comparison. Haploview (Barrett et al. 2005) was used to visualize LD relationships between SNCA SNPs as well as to ascertain the tagSNPs that resulted from the LD analyses. LD blocks were constructed following the D' method by Gabriel et al. (2002), also implemented in Haploview.

A total of four clinical phenotypes were tested: alcohol dependence, alcohol use disorder (diagnoses of alcohol dependence and alcohol abuse were pooled), drug dependence, and drug use disorder (diagnoses of drug dependence and drug abuse were pooled). Differences in allele and genotype distributions were analyzed using the chi-square test, and two-tailed P values are presented. Data were analyzed using SPSS(TM) ver. 11.0 for Windows (SPSS, Chicago, IL). Haplotype frequency comparisons between cases and controls were performed with PHASE ver. 2.1 (Stephens and Donnelly 2003). In order to correct for multiple comparisons, global P values were calculated using the COCAPHASE module of the UNPHASED statistical package (Dudbridge 2003). Permutation correction was performed using 10,000 permutations.

Symptom counts were also used as two additional phenotypes, which were calculated as 1) the total number of affirmative responses to the SADS-L interview questions (Endicott and Spitzer 1978) and 2) the total number of affirmative responses to eight questions used in the diagnosis of alcohol dependence.
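The comparison of allele distributions between cases and controls described in this section can be sketched as a Pearson chi-square test on a 2x2 table of allele counts. This is a minimal illustration under stated assumptions (no continuity correction, one degree of freedom, hypothetical counts); it is not the SPSS procedure used in the study, and the function name is invented for the example.

```python
import math


def allelic_chi_square(case_counts, control_counts):
    """Pearson chi-square test (1 df, no continuity correction) on allele counts.

    case_counts and control_counts are (allele_1, allele_2) tuples of counts.
    Returns (chi-square statistic, P value).
    """
    a, b = case_counts
    c, d = control_counts
    n = a + b + c + d
    row_cases, row_controls = a + b, c + d
    col_1, col_2 = a + c, b + d
    if 0 in (row_cases, row_controls, col_1, col_2):
        raise ValueError("a margin of the 2x2 table is zero; test is undefined")
    chi2 = n * (a * d - b * c) ** 2 / (row_cases * row_controls * col_1 * col_2)
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical allele counts, for illustration only (not data from this study).
chi2, p_value = allelic_chi_square((60, 40), (40, 60))
```

For a 2x2 table, the chi-square statistic is non-directional, so its P value corresponds to a two-sided comparison of allele frequencies.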
ANOVA tests were used to analyze the variance of the two symptom count phenotypes among the four most common haplotypes and among the six SNPs used to define haplotypes using the R program (R Foundation for Statistical Computing, Vienna, Austria). The
CLUMP program (Sham and Curtis 1995) was used to analyze contingency tables resulting from NACP-REP1 alleles. Significance was assessed using a Monte Carlo approach by performing a total of 10,000 simulations.

A power calculation, based on haplotypes, indicated that our study had 83-100% power to detect an association at an odds ratio of 2.0 using pooled haplotype frequencies, 91% power based on the most frequent haplotype in the SW population and 87% power based on the most frequent haplotype in the Plains population (p = 0.05).

Results

Based on 14 common SNPs located within the SNCA gene, similar LD patterns were detected in both American Indian populations that are also consistent with the LD structures of European Caucasian, Yoruban (African) and Chinese and Japanese Asian populations in the HapMap database (Figure 3-1; SNP rs920624 and the NACP-REP1 polymorphism were not included in the comparative LD analysis because there are no associated frequency data in the HapMap database). Based on a strict criterion of continuous LD >90%, one small and one large LD block were defined in the SW population and three small LD blocks were identified in the Plains population. A less stringent criterion based on the presence of a recombination region between SNPs 4 and 5 suggests two large LD blocks (SNPs 2-4 and 5-14) present in both populations. Haplotype tests can be more powerful than using only one SNP to define a haplotype block (Schaid 2004), so SNPs 3, 4, 8, 9, 12, and 13 (as depicted in Figure 3-1) were chosen to define haplotypes in our analysis. The high level of LD suggests that we have assayed the full spectrum of SNCA variation present in our study populations, i.e. effectively all SNPs currently identified at SNCA (433 SNPs, http://www.ncbi.nlm.nih.gov/SNP/) were assayed. Phenotypes were investigated with respect to individual SNPs as well as to the haplotype blocks defined above.
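The pairwise LD measure D' that underlies the LD plots and block definitions can be illustrated with a short sketch. It assumes two biallelic SNPs with known (already phased) haplotype and allele frequencies; the function name and the example frequencies are hypothetical, and this is not the Haploview implementation.

```python
def d_prime(p_ab: float, p_a: float, p_b: float) -> float:
    """Return |D'| for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and B at locus 2
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:
        raise ValueError("D' is undefined when either locus is monomorphic")
    return abs(d) / d_max


# Hypothetical frequencies: allele A always occurs with allele B, so |D'| is 1.
complete_ld = d_prime(p_ab=0.5, p_a=0.5, p_b=0.5)
# Hypothetical frequencies: haplotype frequency equals the product, so |D'| is 0.
no_ld = d_prime(p_ab=0.25, p_a=0.5, p_b=0.5)
```

Normalizing D by its maximum attainable value given the allele frequencies is what lets D' reach 1 whenever at most three of the four possible haplotypes are present, which is why it is commonly read as evidence about historical recombination.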
Frequencies of all analyzed SNPs were in Hardy-Weinberg equilibrium for cases and controls. We did not find any allele or genotype frequency differences between cases and controls for alcohol dependence, alcohol use disorders (pooled diagnoses of alcohol dependence and alcohol abuse), or drug use disorders (pooled diagnoses of drug dependence and drug abuse) when total populations were tested (p values for allele frequency comparisons are presented in Figure 3-2). Drug dependence was significantly associated with SNPs rs2583978, rs356186, rs356198 and rs3775423 in the SW population (Figure 3-2). We tested whether this association was gender-dependent. When we focused on males, the association with drug dependence persisted for SNPs rs356186 and rs3775423, and SNPs rs3775439 and rs356165 also were significantly associated (Figure 3-2). However, when we adjusted for multiple testing in order to control for the family-wise type I error rate (FWER) by means of a permutation test with 10,000 replicates, the global P value was 0.113 for the entire population and 0.112 when only males were counted. None of the SNP associations were significant in females (Figure 3-2). Stratification of the Plains population by gender revealed a significant association between rs356163 and alcohol dependence and alcohol use disorders and between rs2572324 and alcohol use disorders in males (Figure 3-2). However, the global COCAPHASE P value in Plains males was not significant (p > 0.1 for both alcohol dependence and alcohol use disorders).

Association of the four addiction phenotypes was also tested against haplotypes constructed across the entire gene using SNPs 3, 4, 8, 9, 12, and 13 as described above. Several SNPs within the SNCA gene may be contributing to an addiction phenotype in the form of a super-allele (Schaid 2004); therefore we combined the six markers into a haplotype to increase the power of detecting an association.
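The permutation-based global P value reported above for the single-SNP tests can be sketched as a max-statistic permutation test: case and control labels are shuffled, the largest single-SNP statistic is recorded for each shuffle, and the observed maximum is compared with that null distribution. This is an illustrative assumption about the general technique, not the COCAPHASE algorithm; the genotype coding (0, 1, or 2 copies of one allele), function names, and default seed are hypothetical.

```python
import random


def allelic_stat(genotypes, labels):
    """2x2 Pearson chi-square on allele counts for one SNP.

    genotypes: per-individual counts (0, 1, or 2) of a chosen allele
    labels:    per-individual booleans, True for cases
    """
    n_cases = sum(1 for y in labels if y)
    n_controls = len(labels) - n_cases
    a = sum(g for g, y in zip(genotypes, labels) if y)      # allele copies, cases
    c = sum(g for g, y in zip(genotypes, labels) if not y)  # allele copies, controls
    b, d = 2 * n_cases - a, 2 * n_controls - c              # other allele
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        return 0.0  # uninformative table
    return (a + b + c + d) * (a * d - b * c) ** 2 / margins


def global_p_value(snps, labels, n_perm=10_000, seed=0):
    """Family-wise P value for the strongest single-SNP association."""
    rng = random.Random(seed)
    observed = max(allelic_stat(snp, labels) for snp in snps)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any genotype-phenotype link
        if max(allelic_stat(snp, shuffled) for snp in snps) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Because every permutation keeps the genotypes of each individual together, the correlation among SNPs in LD is preserved in the null distribution, which makes this kind of correction less conservative than a Bonferroni adjustment across correlated markers.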
Four major haplotypes were detected in the SW and Plains populations (>5% frequency in all phenotype categories) (Table 3-2). No difference in haplotype
frequencies between cases and controls was detected in either population. A power calculation using pooled haplotype frequency data from both populations indicated 83-100% power to detect an association at an odds ratio of 2.0 (p = 0.05).

Categorical diagnoses of substance abuse may not adequately describe a disease that is both heterogeneous and occurs on a continuum, and symptom counts may be used as a quantitative variable to provide more information (Helzer et al. 2006). Thus, the total number of affirmative responses on the SADS-L interview (n = 18) and the eight diagnostic questions for alcohol dependence were used as additional phenotypes. No significant association between the symptom count and the four most common haplotypes was detected using an ANOVA in either the SW or the Plains populations. When each of the six SNPs used to define the haplotypes was tested individually against the symptom count, only SNP rs356198 was marginally significant (p = 0.044) in the SW population; however, that significance disappeared after correction for multiple testing. No comparisons with individual SNPs were significant in the Plains population.

Since recent reports have described an association between the NACP-REP1 repeat polymorphism and the risk of alcohol dependence (Bonsch et al. 2005b), we tested this variant in our populations. Distribution of the NACP-REP1 alleles in both populations is shown in Figure 3-3. In contrast to other studied populations, both American Indian populations exhibited reduced variation at the REP1 locus with a very high frequency of the 267 bp repeat allele (80-85%). The 267 bp allele was associated with virtually all haplotypes in both populations, suggesting that no linkage disequilibrium exists between the SNCA SNPs and the REP1 locus; thus, the REP1 locus was analyzed independently of the SNCA SNPs and haplotypes. No association was found between the REP1 locus and any of the four addiction phenotypes in the
two populations (Figure 3-3). The same lack of association was found when each gender was analyzed independently (data not shown).

Discussion

Several recent studies have suggested that α-synuclein may play a role in the development and maintenance of certain addictive disorders (Liang et al. 2003; Mash et al. 2003; Kobayashi et al. 2004; Bonsch et al. 2005b). We assayed 15 SNPs in the α-synuclein gene, SNCA, and one upstream microsatellite repeat (NACP-REP1) in two American Indian populations with a high lifetime prevalence of alcohol and drug dependence and abuse (Table 3-1). The assayed alleles and constructed haplotypes were tested for association with one of four clinical phenotypes, including alcohol dependence and alcohol use disorders (pooled diagnoses of alcohol dependence plus abuse) and drug dependence and drug use disorders (pooled diagnoses of drug dependence plus abuse), as well as two symptom count phenotypes (total number of questions and eight questions diagnostic for alcohol dependence).

Single allele tests revealed significant associations between four SNPs and drug dependence in the SW population. Two of those SNPs plus another two SNPs were found to be associated with drug dependence in SW males only. In the Plains population, a significant association was detected only in males with two SNPs and alcohol use disorders and one SNP and alcohol dependence. An association with alcoholism in males is consistent with an overrepresentation of alcohol dependence in Native American males (66-70%) compared to females (30-53%) (Kinzie et al. 1992; Kunitz et al. 1999; Ehlers et al. 2004b; Gilder, Wall, and Ehlers 2004). However, none of the global p values, calculated to adjust for multiple testing, reached the level of significance. Haplotype analyses did not reveal any association between SNCA and substance abuse or dependence. Furthermore, when corrected for multiple
comparisons, no significant associations were detected when symptom counts were tested against haplotypes or individual SNPs.

The REP1 complex dinucleotide repeat is located ~10 kb upstream of the SNCA transcriptional start site and is polymorphic for alleles of length 265-273 (Chiba-Falek, Touchman, and Nussbaum 2003). Longer REP1 alleles have been associated with increased expression of the SNCA gene, and loss of the REP1 repeat in vitro resulted in a 4-fold reduction in expression of the SNCA gene (Touchman et al. 2001; Chiba-Falek, Touchman, and Nussbaum 2003). Several studies have correlated elevated levels of SNCA mRNA and protein with various phenotypes related to alcohol dependence (Liang et al. 2003; Bonsch et al. 2004; Walker and Grant 2006). In contrast, our results do not support an association between NACP-REP1 and alcohol dependence/use disorders or drug dependence/use disorders in the American Indian populations analyzed here. Lack of association could be explained by the low allelic variability present at this locus in both populations and, in particular, lack of the longer alleles implicated in the previous study (alleles 271 and 273, Bonsch et al. 2005a).

To our knowledge, this is the most exhaustive analysis performed to date of genetic variation at the SNCA locus and possible association with risk of alcohol and drug addiction. The study is strengthened by analysis of two different populations. Although several SNPs initially returned significant p values, none of the results remained significant after correction for multiple testing. It is possible that the low genetic variation in these American Indians may mask a significant association, or actually contribute to SNCA being a non-factor in addiction vulnerability in these particular populations.
It is likely that a search for genes involved in addiction disorders will be complicated by population-specific genetic effects, as well as varying effects of social and environmental factors. Additionally, cannabis is the most frequently abused
drug in our populations, in contrast to previous studies that found a correlation between SNCA and cocaine or methamphetamine addiction. Alcohol craving was not examined in this study, and thus its association with the SNCA gene in these populations cannot be ruled out. Nonetheless, our results in two American Indian populations do not support a role for a genetic variant in the SNCA gene that contributes to alcohol or drug addiction. These results do not preclude a role for this gene, particularly in other populations exhibiting more diversity at NACP-REP1. Altered methylation patterns in the SNCA promoter have already been associated with alcoholism and may contribute to differential expression levels of this gene (Bonsch et al. 2005c). Thus, future research may focus on additional variants in the promoter region of SNCA that could cause the changes in mRNA and protein levels observed in previous studies.

Table 3-1. Demographic and phenotypic characteristics of Southwest (SW) and Plains populations.
Characteristic | SW population (n=514) a | Plains population (n=420) a
Age ± SD (range) | 36.5 ± 13.6 (≥21) | 42 ± 14.1 (≥18)
Gender (% males) | 42.8 | 43.6
Alcohol dependence | 316 | 239
Alcohol use disorder | 348 | 248
Drug dependence | 44 (47% cannabis, 49% amphetamines, 9% cocaine) | 49 (69% cannabis, 36% amphetamines, 15% cocaine)
Drug use disorder | 185 (65% cannabis, 28% amphetamines, 34% cocaine) | 96 (75% cannabis, 40% amphetamines, 17% cocaine)
Controls b | 135 | 159
a Phenotype counts do not equal population sample sizes because individuals had multiple diagnoses. For example, under drug dependence, 42 SW and 46 Plains individuals also had alcohol use disorders and, under drug use disorder, 170 SW and 88 Plains individuals also had alcohol use disorders. Furthermore, percentages of abused drugs do not sum to 100 because several individuals had multiple drug dependence.
b Controls are defined as those individuals with no diagnosis of alcohol dependence, alcohol use disorder, drug dependence, or drug use disorder.
Table 3-2. Frequency of Major Haplotypes in Cases and Controls in Southwest and Plains Populations.
Haplotype | SW AD | SW AUD | SW DD | SW DUD | SW Control a | Plains AD | Plains AUD | Plains DD | Plains DUD | Plains Control a
N b | 312 | 344 | 43 | 184 | 130 | 224 | 233 | 47 | 91 | 147
221212 c | 0.412 | 0.411 | 0.337 | 0.416 | 0.4 | 0.281 | 0.281 | 0.328 | 0.258 | 0.265
222121 c | 0.24 | 0.23 | 0.302 | 0.217 | 0.208 | 0.337 | 0.339 | 0.34 | 0.324 | 0.313
111212 c | 0.164 | 0.166 | 0.186 | 0.152 | 0.204 | 0.152 | 0.15 | 0.192 | 0.154 | 0.119
111221 c | 0.08 | 0.076 | 0.081 | 0.092 | 0.077 | 0.132 | 0.129 | 0.075 | 0.126 | 0.16
% total haplotypes | 0.896 | 0.883 | 0.896 | 0.877 | 0.889 | 0.902 | 0.899 | 0.935 | 0.862 | 0.857
a Controls are defined as those individuals with no diagnosis of alcohol dependence, alcohol use disorder, drug dependence, or drug use disorder.
b Number of individuals in each category differs slightly from Table 3-1 because only individuals typed for a majority of the 6 single-nucleotide polymorphisms (SNPs) were included.
c Haplotypes were constructed from the following SNPs (in order): rs2737020, rs1812923, rs356164, rs356198, rs3775423, rs10033209.
AD = alcohol dependence, AUD = alcohol use disorder, DD = drug dependence, DUD = drug use disorder.
[Figure: SNCA gene structure (109,8 Kb, 5' to 3') with the labeled SNPs rs2583985, rs2583978, rs3775439, rs356163, rs2572324, rs356168, rs10033209, rs2737020, rs1812923, rs356186, rs356164, rs356198, rs356165, and rs3775423, shown above three LD plots.]

Figure 3-1. Relative positions of single nucleotide polymorphisms assessed in the α-synuclein (SNCA) gene. The single-nucleotide polymorphism (SNP) rs920624 is not included because there are no associated frequency data in the HapMap database. The gene structure of SNCA is shown (top), with vertical bars indicating exons. Linkage disequilibrium (LD) structure is presented for the CEU population allele frequencies from the HapMap project (top), the Southwest population (middle), and the Plains population (bottom). Numbers within the diamonds are D' values for the respective SNP pairs. Solid red diamonds represent absolute LD (D'=1); blue diamonds represent strong LD with a low level of significance. Numbers in gray within white diamonds represent a high probability or evidence of historical recombination. Haplotype blocks, as determined with the use of Haploview software, are depicted.
[Figure: six panels (Southwest, Southwest males, Southwest females, Plains, Plains males, Plains females). X-axis: SNPs rs2583985, rs2583978, rs920624, rs2737020, rs1812923, rs3775439, rs356186, rs356163, rs356164, rs356198, rs2572324, rs356168, rs3775423, rs10033209, rs356165. Y-axis: -log p-values (0 to 3). Series: Alc-dep, Alc-use-dis, Drug-dep, Drug-use-dis.]

Figure 3-2. Single-marker analyses representing p values for each marker on a logarithmic scale. A dotted line indicating a p value of 0.05 is represented in each graph, so values above the line correspond to significant associations. Graphs on the left refer to the Southwest population whereas graphs on the right refer to the Plains population. The first row represents the entire data set for each population, while the second row refers to males only and the third row refers to females only.
Figure 3-3. Allelic distribution of the NACP-REP1 microsatellite repeats. Shown for the Southwest and Plains populations. Allele size is depicted on the X-axis while allele frequency is depicted on the Y-axis.
CHAPTER 4
LACK OF ASSOCIATION BETWEEN ADH/ALDH MARKERS AND SUBSTANCE USE DISORDER IN NATIVE AMERICAN POPULATION

Introduction

Native Americans had the highest rate of alcohol-related deaths of all ethnic groups in the United States in 2001 (age-adjusted death rate = 42.1/100,000), which was more than five times higher than the alcohol-related death rate for the general US population (6.9/100,000) (Health, United States, 2006, http://info.ihs.gov/Files/DisparitiesFacts-Jan2006.pdf). Native Americans were also 2.5 times as likely to die of chronic liver disease or cirrhosis as Caucasians (22.7/100,000 vs. 9.2/100,000) (Facts on Indian Health Disparities, http://www.cdc.gov/nchs/data/hus/hus06.pdf#031). In addition, this group had the highest frequency of current drinkers who reported drinking more than four drinks in one day in the past year compared to other ethnic groups (40.9% vs. 32.7% for Caucasians, 24.3% for African Americans, and 20.8% for Asians) (Facts on Indian Health Disparities, http://www.cdc.gov/nchs/data/hus/hus06.pdf#031). It is unclear to what extent this is a result of genetic determinants or of cultural influences such as poverty and lack of health care on reservations.

Many studies have investigated the role of genes in substance abuse, and multiple studies have provided evidence for a locus contributing to alcohol dependence on chromosome 4 (Long et al. 1998; Reich et al. 1998; Williams et al. 1999; Zinn-Justin and Abel 1999; Mulligan et al. 2003; Ehlers et al. 2004a). Several candidate genes proximally located on that chromosome include alpha synuclein (Bonsch et al. 2004; Bonsch et al. 2005a; Bonsch et al. 2005b; Bonsch et al. 2005c; Clarimon et al. 2007), the GABA receptors (Long et al. 1998), and the alcohol dehydrogenase gene family (ADH) (Chen et al. 1996; Osier et al. 1999; Mulligan et al. 2003). The seven ADH genes encode enzymes that convert alcohol to acetaldehyde, which is then processed into acetate by enzymes encoded by the ALDH genes. The ADH genes are located in a cluster (ADH7-ADH1C-ADH1B-ADH1A-ADH6-ADH4-ADH5) on chromosome 4 (Osier et al. 2002b).

Certain variants within the ADH and ALDH genes have been found to protect against alcoholism; in particular, the ADH1B*47His allele (and possibly the ADH1B*369Cys allele) results in increased blood levels of acetaldehyde, leading to an unpleasant flushing response that is proposed to have a protective effect against alcoholism (Thomasson et al. 1991; Goedde et al. 1992; Thomasson et al. 1993; Nakamura et al. 1996). However, the ADH1B*47His allele is only present at polymorphic frequencies in Asian and Jewish populations (Chao et al. 1994; Thomasson et al. 1994; Chen et al. 1996; Nakamura et al. 1996; Tanaka et al. 1996; Shen et al. 1997; Neumark et al. 1998; Osier et al. 1999; Ehlers et al. 2004a), although it has been found at low levels in European, African, and Middle Eastern populations as well (Borras et al. 2000; Ehlers et al. 2001; Osier et al. 2002b). A protective effect has also been found for the ADH1C*349Ile allele in Asians, Native Americans and Europeans (Chao et al. 1994; Chen et al. 1996; Shen et al. 1997; Borras et al. 2000; Konishi et al. 2003; Mulligan et al. 2003), although this may be a result of linkage disequilibrium with ADH1B*47His in Asian populations that carry this allele (Chen et al. 1999; Osier et al. 1999). An additional variant in the mitochondrial ALDH2 gene, ALDH2-2, blocks the conversion of acetaldehyde to acetate, resulting in an accumulation of acetaldehyde and a more pronounced flushing response than the ADH1B*47His allele (Harada, Agarwal, and Goedde 1982; Thomasson et al. 1991; Goedde et al. 1992; Thomasson et al. 1993; Novoradovsky et al. 1995; Peterson et al. 1999). The allele has been found to be protective in Asian populations (Iwahashi et al. 1995; Chen et al. 1996; Chen et al. 1999; Hara, Terasaki, and Okubo 2000; McCarthy et al. 2000). In addition, the combination of the ADH1B*47His and the ALDH2-2*487Glu alleles has been found to drastically increase the risk of alcoholism in Koreans (Kim et al. 2008).

Although ADH1B*47His and ALDH2-2 are absent in most Native American populations, several lines of evidence suggest that genetic variants in or near the ADH genes contribute to alcohol dependence in Native Americans. First, an autosomal genome scan in a Native American population identified this region of chromosome 4 as a possible alcoholism risk locus (Long et al. 1998). Subsequent studies have identified additional polymorphisms within the ADH genes that are present in Native Americans (Wall et al. 1997; Ehlers et al. 1998; Osier et al. 2002a). Two ADH1C variants (including ADH1C*349Ile) and a neighboring microsatellite marker were associated with alcohol dependence in a subset of individuals from a Southwest Native American population (Mulligan et al. 2003).

In order to determine if these alleles contribute to risk of substance abuse in a Native American Plains population, we assayed nine single nucleotide polymorphisms (SNPs) across the ADH1A, ADH1B and ADH1C genes and three markers in the ALDH gene in 359 members of a Plains Native American population. Two different diagnoses for alcohol use were investigated (alcohol dependence and abuse) as well as two continuous measures of alcohol use based on survey responses. The survey consisted of 18 questions concerning the impact of alcohol use on daily life (see Appendix A). The number of positive responses to both the full 18 questions and a subset of eight diagnostic questions was used as the outcome variable in a regression analysis. We also included two measures of drug use (drug dependence and abuse), as a recent study found an association between drug dependence and ADH variants independent of alcohol behavior (Luo et al. 2007).
Materials and Methods

Samples

The sampling methodology was previously described in Mulligan et al. (2003) and is briefly summarized here. Blood samples and clinical data were initially collected from 420 adult members of a Plains Native American population. Ultimately, a subset of 359 individuals who had full diagnostic information was selected for genotyping. Informed consent was obtained under a human subjects research protocol approved by the Plains Tribal Council and the Institutional Review Board (IRB) of the National Institute on Alcohol Abuse and Alcoholism.

Testing Instruments, Interviews, and Psychiatric Diagnoses

Focus groups composed of tribal staff and community members reviewed testing instruments and questionnaires for potential cultural biases and general suitability to the population. Research diagnoses for lifetime alcohol dependence and abuse and lifetime drug dependence and abuse were based on: (1) semi-structured psychiatric interviews using the Schedule for Affective Disorders and Schizophrenia-Lifetime Version (SADS-L) with additional information to enable diagnoses using both the Research Diagnostic Criteria and the Diagnostic and Statistical Manual of Mental Disorders, Third Edition, Revised (DSM-III-R, American Psychiatric Association, 1987), (2) medical, educational, court, and other records, and (3) corroborative information from family members. The SADS-L was administered to all subjects by a psychologist experienced with psychiatric assessment in this tribe and other American Indian populations. Sampling strategy, interview procedure, and diagnosis protocol are summarized from Long et al. (1998), Robin et al. (1998), Mulligan et al. (2003), and Belfer et al. (2006).
Genotyping

Three hundred and fifty-nine individuals were typed for five ADH1C, three ADH1B, one ADH1A and three ALDH polymorphisms. Loci, primers, thermocycling conditions, and restriction enzymes are listed in Table 4-1. PCR amplicons were analyzed by electrophoresis on 2% agarose or 4% Metaphor (FMC BioProducts, Rockland, Me.) gels. Marker ADH1Ain8 BccI was analyzed on a Beckman Coulter CEQ 8000 using the Beckman SNP genotyping kit according to the manufacturer's instructions (Beckman Inc, Fullerton CA).

Statistical Analysis

Haploview (Barrett et al. 2005) was used to visualize linkage disequilibrium (LD) relationships between the assayed ADH markers, and linkage disequilibrium blocks were constructed following the D' method of Gabriel et al. (2002). PHASE ver. 2.1 (Stephens and Donnelly 2003) was used to reconstruct haplotypes from unphased population genotypic data. The program R (R Foundation for Statistical Computing, Vienna, Austria) was used to implement tests for Hardy-Weinberg equilibrium.

A total of four clinical phenotypes were tested: alcohol dependence, alcohol use disorder (diagnoses of alcohol dependence and alcohol abuse were pooled), drug dependence, and drug use disorder (diagnoses of drug dependence and drug abuse were pooled). The phenotypic information is summarized in Table 4-2 (categories were not mutually exclusive). Six continuous variable measures of substance use were also tested for association with the genetic data. Two measures of symptom count data were tested, which were calculated as (1) the total number of affirmative responses to the SADS-L interview questions (Endicott and Spitzer 1978) and (2) the total number of affirmative responses to eight questions used in the diagnosis of alcohol dependence (see Appendix 1 for questions). The smaller subset of questions captured the most severe aspects of alcohol dependence and was tested separately to determine association with genetic markers among the most afflicted individuals. The other four phenotypes were maximum number of drinks ever consumed in one day, maximum drinks ever consumed in one month, age at which regular drinking began, and age at which heavy drinking began.

Genotype and allele frequencies for each marker in each of the four diagnostic groups were compared with the frequencies in the control group. The chi-square test was used for all comparisons as implemented in the program R. For each analysis, males and females were tested separately, then pooled together. Genotype frequencies for each marker were tested for association with the six continuous phenotypes. These data were analyzed in SAS v.9.1 (SAS Institute Inc, Cary NC) using a regression model that incorporated sex and age as well as genotype information. A subset of individuals (n=325) was used in the regression analysis due to missing phenotype information for some individuals.

In addition, frequencies for the haplotypes identified through the statistical analysis described above were compared between all cases (individuals with any of the four diagnoses) and controls, defined as individuals without any of the four diagnoses. Haplotype association tests incorporate information from multiple sites and are possibly more powerful than single-marker tests (Shaid et al. 2004).

Results

Three hundred and fifty seven individuals were genotyped, of which 212 were females and 142 were males. Of the nine ADH markers, four (ADH1BArg47His, ADH1B RsaI, ADH1BArg369Cys and ADH1Ain8 BccI) were not polymorphic in this population and were thus not informative for our analyses. All markers were in HW equilibrium (data not shown). Among the five polymorphic ADH markers, 98% linkage disequilibrium (LD) was determined by Haploview analysis (Figure 4-1). Because of the very high level of LD, only three major haplotypes were present in the population and represented 93% of the diversity. The frequencies of these haplotypes were not significantly different between cases and controls (Table 4-3).

The genotype, allele and haplotype frequencies of each of the seven polymorphic markers were tested for association with the four discrete phenotypes (alcohol dependence, alcohol use disorder, drug dependence, and drug use disorder) and the control individuals (Table 4-4). A marginally significant association (p=0.0421) was found between drug dependence and genotype at marker ADH1C EcoRI (Table 4-4). However, the allele comparison at this locus was not significant, and after Bonferroni correction for multiple testing, the p-value for the genotype association was no longer significant (for each phenotype, adjusted alpha = 0.05/number of comparisons = 0.0035). When males and females were tested separately for association between the four discrete diagnoses and the genotype data, males had lower p-values in general but none were significant after correction for multiple testing (data not shown).

A regression analysis was also performed using six continuous phenotypic categories (total number of affirmative answers on the symptom count questionnaire, number of affirmative responses for a reduced set of eight questions on the same questionnaire, maximum number of drinks ever consumed in one day, maximum drinks ever consumed in one month, age at which regular drinking began, and age at which heavy drinking began) and the genotype for each of the seven markers along with sex and age as additional predictor variables (Table 4-4). For both symptom count variables, only sex was consistently significant after correction for multiple testing (alpha=0.002). For both maximum number of drinks variables, again only sex was consistently significant. For the categories related to age/years of drinking, age was significantly associated, and sex was only occasionally significantly associated with these variables. The genotype data were not significantly associated with any of the continuous phenotype variables.
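The Hardy-Weinberg check reported above was run in R, and the exact test used is not stated. As a minimal sketch of one common choice, a chi-square goodness-of-fit test with one degree of freedom for a biallelic marker can be written in Python as follows. The function name and the genotype counts are hypothetical and are not taken from this study.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test (1 df) for Hardy-Weinberg proportions.

    n_aa, n_ab, n_bb are observed genotype counts at one biallelic marker.
    Returns (statistic, p_value).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    # Expected genotype counts under Hardy-Weinberg equilibrium.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        raise ValueError("marker is monomorphic; the test is undefined")
    observed = (n_aa, n_ab, n_bb)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 degree of freedom, P(chi2 > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical genotype counts for one marker (not data from this study).
stat, p_value = hwe_chi_square(100, 160, 97)
print(f"chi-square = {stat:.3f}, p = {p_value:.3f}")
```

A large p-value here means the observed genotype counts give no evidence of departure from Hardy-Weinberg proportions. The monomorphic markers in this chapter would raise the error above, since the test is undefined when only one allele is present.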
Discussion

The ADH and ALDH genes have been extensively studied in many ethnically diverse populations, and the strongest associations with substance abuse have been found in Asian populations that carry the ADH1B*47His and ALDH2-2 alleles known to protect against alcoholism through toxic accumulation of acetaldehyde (Chao et al. 1994; Thomasson et al. 1994; Chen et al. 1996; Nakamura et al. 1996; Tanaka et al. 1996; Shen et al. 1997; Osier et al. 1999). A previous study found significant association between alcohol dependence and two ADH1C markers (ADH1CHaeIII and ADHCIle349Val), between binge drinking and three markers (ADH1C EcoRI, ADH1C HaeIII, and ADHC Ile349Val), and between flushing and one marker (ALDH2-In6A) in a subset of individuals in a Southwest Native American tribe (Mulligan et al. 2003). Thus, we sought to replicate the study in a different Native American population in order to strengthen support for ADH as a risk locus. We used a continuous measure of alcoholism and information about behavior in a regression model in addition to traditional dichotomous diagnoses, since considering substance abuse as a continuum might enable us to detect associations that would be masked by broad diagnoses that include individuals who abuse substances for social or cultural reasons.

Three hundred and fifty-nine individuals were assayed for nine markers from the ADH1A, ADH1B, and ADH1C genes and three from the ALDH gene. As previously reported (Mulligan et al. 2003), neither of the protective alleles (ADH1B*47His and ALDH2-2) was detected in this population. In contrast to the previous study (Mulligan et al. 2003), neither of the significantly associated markers in the Southwest population was found to be significantly associated with any of the tested phenotypes in this Plains population. Age and/or sex were significant in most of the regression analyses, suggesting that the demographic information is strongly correlated with substance abuse.
It is somewhat surprising that this study did not uncover any significant association between the genetic data and substance use disorder phenotypes, despite the multiple ways in which the genetic data were analyzed and several innovative measures of abuse. However, the previous study investigating the association between the ADH/ALDH genes and alcohol use in a Southwest population found only nominal significance and only in subsets of the study sample (Mulligan et al. 2003). In addition, a study using a whole-genome scan found significance for the ADH locus on chromosome 4 only with two-point linkage analysis, and not with multipoint linkage analysis, which takes into account all marker information from one chromosome (Long et al. 1998). While these results were initially suggestive of ADH/ALDH having genetic determinants linked to alcohol abuse and were the impetus for conducting a similar study in a different Native American population, the results from this study suggest that genetic variants in the ADH and ALDH genes may not be major determinants of substance abuse disorders in Native Americans.

There are several possible reasons for this result. First, during the migration from Asia to the New World, a population bottleneck severely reduced the genetic variability present in the ancestral population (Kolman et al. 1995; Kolman and Bermingham 1997; Ramachandran et al. 2005), which also resulted in higher LD at the ADH genes in Native Americans than in Chinese or Africans (Mulligan et al. 2003). Variants conferring protection/risk at this locus may therefore have been lost in all or some Native American populations. The fact that modest association was found in the Southwest population is consistent with a recent study that found higher diversity in western populations compared with eastern populations in South American Native Americans, suggesting that more ancestral variants might have been retained in western populations (Wang et al. 2007). This pattern could result from an initial coastal migration from Asia, followed by expansion into the interior regions, as has been recently suggested (Anderson and Gillam 2000; Dixon 2001; Fix 2002; Surovell 2003).

Another explanation is that different cultural factors may be influencing substance abuse in the Plains population and may play a larger role than genetic factors. This is supported by the demographic component consistently being the only significantly associated factor. The genetic component of alcoholism in this case may be swamped by the high number of individuals who exhibit substance abuse phenotypes, because many individuals in the affected category may abuse substances for cultural or social reasons but do not have a genetic predisposition, despite the fact that we attempted to use continuous measures of abuse to capture degrees of severity. Native Americans experience much poorer health in general than the rest of the population in the U.S. Their infant death rate is almost double that of Caucasians, they have a 40% higher prevalence of AIDS, and they are more than twice as likely to be diagnosed with diabetes. These statistics may be associated with poorer access to health care as well: 30% of Native Americans had no health coverage in 2005, and 25% of this group lives at the poverty level (Office of Minority Health, 2006). Therefore, it is likely that environmental factors contribute heavily to the high prevalence of alcoholism in this Native American population, which suggests that resources could productively be devoted to addressing the poverty and poor health care in this ethnic group.
Table 4-1. Loci, primers, cycling conditions and restriction enzymes for 12 loci studied.

Marker | Primer Name | Primer Sequence | Thermocycling Conditions | Restriction Enzymes
ADH1CEcoRI | A3EX2DW; A3EcoUP2 | 5'-TTGCACCTCCTAAGGCTC-3'; 5'-TCTAATGCAAATTGATTGTGAAC-3' | 94 °C (15 s), 51 °C (15 s), 72 °C (75 s); 40 cycles | EcoRI
ADH1CHaeIII | A3EX5FOR2; A3EX5REV1 | 5'-TGAGTTTGCACATTAGTTATGG-3'; 5'-TGCTCTCAGTTCTTTCTGGG-3' | 94 °C (40 s), 56 °C (30 s), 72 °C (60 s); 35 cycles | HaeIII
ADH1CArg271Gln | A3EX6FXNFOR1; A3EX6FXNREV3 | 5'-TTGTTTATCTGTGATTTTTTTTGT-3'; 5'-CGTTACTGTAGAATACAAAGC-3' | 94 °C (15 s), 54 °C (15 s), 72 °C (60 s); 35 cycles |
ADHCIle349Val | A3FXNFOR1; A3FXNREV3 | 5'-TTGTTTATCTGTGATTTTTTTTGT-3'; 5'-CGTTACTGTAGAATACAAAGC-3' | 94 °C (15 s), 51 °C (15 s), 72 °C (75 s); 40 cycles | SspI
ADH1CPro351Thr | ADH1CSNPFOR1; A3FXNREV3 | 5'-GTTTTCACTGGATGCACTAATAAC-3'; 5'-CGTTACTGTAGAATACAAAGC-3' | 94 °C (30 s), 51 °C (30 s), 72 °C (75 s); 40 cycles |
ADH1BArg47His | A2FXNFOR; A2FXNREV | 5'-ATTCTAAATTGTTTAATTCAAGAAG-3'; 5'-ACTAACACAGAATTACTGGAC-3' | 95 °C (30 s), 56 °C (30 s), 72 °C (60 s); 35 cycles | MslI
ADH1BRsaI | A2IN3DW3; A2IN3UP2 | 5'-ATATTTATTTTACCCTAAACTTATG-3'; 5'-GAGCTAAAACATACTTTGGATAG-3' | 94 °C (30 s), 60 °C (30 s), 72 °C (30 s); 35 cycles | RsaI
ADH1BArg369Cys | HE39; HE40 | 5'-TGGACTTCACAACAAGCATGT-3'; 5'-TTGATAACATCTCTGAAGAGCTGA-3' | 95 °C (15 s), 58 °C (15 s), 72 °C (60 s); 35 cycles | Alu NI
ADH1Ain8BccI | A1BccIDW; A1IN8UPI; A1BccITUP | 5'-ATTGTCTAGCAGAAAATGAAAAG-3'; 5'-AGTTTCTTTCCCTCCTCAAGAATG-3'; 5'-TTTTTTTTTTTTCTAATTTTTCTCATCCTTCCA-3' | 94 °C (15 s), 54 °C (15 s), 72 °C (60 s); 35 cycles | NA
ALDH2.5' | 5'.for; 5'.rev | 5'-GCAGTGCCGTCTGCCCCATCCATGT-3'; 5'-GGCCCGAGCCAGGGCGACCCTGAGCT-3' | 94 °C (30 s), 602 °C (30 s), 72 °C (30 s); 40 cycles | SacI
ALDH2.In6A | In6A.For; In6A.Rev | 5'-AAATATTGCTCTAGGCCAGGC-3'; 5'-TGGGAATTCTAAATGGGACGG-3' | 94 °C for 10 cycles/89 °C for 30 cycles (30 s), 55 °C (30 s), 72 °C (30 s); 40 cycles | HaeIII
ALDH2.Def | L12; R12 | 5'-TTTGGTGGCTAGAAGATGTC-3'; 5'-CACACTCACAGTTTTCTCTT-3' | 94 °C (30 s), 57 °C (30 s), 72 °C (30 s); 40 cycles | MboII
Table 4-2. Phenotypic characteristics of the dataset.

Phenotype | Females | Males | Total
Alcohol Dependence a | 101 | 105 | 206
Alcohol Abuse a | 106 | 111 | 217
Drug Dependence a | 15 | 32 | 47
Drug Abuse a | 42 | 51 | 92
All Cases b | 113 | 112 | 225
Control b | 99 | 30 | 129
Average Total Symptom Count (18) | 5.5 | 9.6 | 7.1
Average Reduced Symptom Count (8) | 3.3 | 5.2 | 4.1

a Individuals can be part of multiple phenotypes, and therefore the totals do not sum to the total number of individuals (359).
b All cases and controls sum to 359.

Table 4-3. Haplotype frequencies and p-values for comparisons of cases vs. controls.

Haplotype | Frequency All individuals | Frequency Cases | Frequency Controls | p-value
11212 | 0.51 | 0.49 | 0.55 | 0.173
12122 | 0.34 | 0.35 | 0.33 | 0.490
22121 | 0.08 | 0.08 | 0.05 | 0.138

For haplotype designations, the order of the markers is as follows: ADH1CEcoRI, ADH1CHaeIII, ADH1CArg271Gln, ADHCIle349Val, ADH1CPro351Thr. refers to the presence of a restriction site, and refers to the absence of a restriction site.
Table 4-4. Chi-squared and regression p-values for genotype and allele associations for each marker.

Phenotype | P-value | ADH1C EcoRI | ADH1C HaeIII | ADH1C Arg271Gln | ADHC Ile349Val | ADH1C Pro351Thr | ALDH 2.5' | ALDH 2.In6A
Frequency Major Allele | | 0.9 | 0.54 | 0.53 | 0.54 | 0.92 | 0.65 | 0.94

Dichotomous Diagnoses a
Alcohol Dependence | genotype | 0.2725 | 0.1366 | 0.0914 | 0.1682 | 0.2062 | 0.2662 | 0.2775
Alcohol Dependence | allele | 0.19 | 0.168 | 0.204 | 0.195 | 0.091 | 0.146 | 0.184
Alcohol Abuse | genotype | 0.2642 | 0.1005 | 0.0746 | 0.1377 | 0.2062 | 0.3864 | 0.2624
Alcohol Abuse | allele | 0.171 | 0.125 | 0.176 | 0.166 | 0.09 | 0.233 | 0.177
Drug Dependence | genotype | 0.0421 | 0.1136 | 0.05997 | 0.0954 | 0.2436 | 0.2352 | 0.7812
Drug Dependence | allele | 0.588 | 0.565 | 0.44 | 0.523 | 0.365 | 0.107 | 0.567
Drug Abuse | genotype | 0.1525 | 0.2407 | 0.2702 | 0.2966 | 0.1319 | 0.341 | 0.3318
Drug Abuse | allele | 0.121 | 0.188 | 0.417 | 0.327 | 0.257 | 0.174 | 0.142

Continuous Diagnoses b
Symptom Count (18) | genotype | 0.0989 | 0.1485 | 0.136 | 0.1873 | 0.0265 | 0.2377 | 0.3569
Symptom Count (18) | age | 0.9461 | 0.8225 | 0.7142 | 0.7464 | 0.9693 | 0.9464 | 0.9952
Symptom Count (18) | sex | <0.0001 | <0.0001 | <0.0001 | <0.0001 | <0.0001 | <0.0001 | <0.0001
Symptom Count (8) | genotype | 0.0868 | 0.3751 | 0.267 | 0.3551 | 0.0304 | 0.1151 | 0.5301
Symptom Count (8) | age | 0.396 | 0.3553 | 0.2432 | 0.257 | 0.4223 | 0.352 | 0.4559
Symptom Count (8) | sex | <0.0001 | <0.0001 | <0.0001 | <0.0001 | <0.0001 | <0.0001 | <0.0001
Max Drinks/month | genotype | 0.5815 | 0.7387 | 0.9867 | 0.7225 | 0.6649 | 0.835 | 0.5663
Max Drinks/month | age | 0.354 | 0.3983 | 0.4531 | 0.4677 | 0.2355 | 0.3546 | 0.1893
Max Drinks/month | sex | 0.001 | 0.0009 | 0.0028 | 0.0017 | 0.0006 | 0.0013 | 0.0006
Max Drinks/Day | genotype | 0.5815 | 0.6398 | 0.6768 | 0.6014 | 0.0839 | 0.7642 | 0.3218
Max Drinks/Day | age | 0.354 | 0.1074 | 0.1322 | 0.0736 | 0.1048 | 0.117 | 0.0591
Max Drinks/Day | sex | 0.001 | <0.0001 | <0.0001 | <0.0001 | <0.0001 | <0.0001 | <0.0001
Years of Heavy Drinking | genotype | <0.0001 | 0.5548 | 0.515 | 0.6154 | 0.0709 | 0.62 | 0.9289
Years of Heavy Drinking | age | <0.0001 | <0.0001 | <0.0001 | <0.0001 | <0.0001 | <0.0001 | <0.0001
Years of Heavy Drinking | sex | 0.0215 | 0.0448 | 0.0208 | 0.0235 | 0.0434 | 0.0414 | 0.049
Start Heavy Drinking | genotype | 0.0403 | 0.2558 | 0.1969 | 0.2071 | 0.0482 | 0.9888 | 0.7319
Start Heavy Drinking | age | 0.0004 | 0.0004 | 0.0007 | 0.0006 | 0.0003 | 0.001 | 0.0008
Start Heavy Drinking | sex | 0.0542 | 0.0819 | 0.0625 | 0.0823 | 0.0631 | 0.082 | 0.0709
Start regular drinking | genotype | 0.1465 | 0.7951 | 0.7094 | 0.7449 | 0.2044 | 0.9803 | 0.1766
Start regular drinking | age | 0.0056 | 0.006 | 0.0288 | 0.0308 | 0.0055 | 0.01 | 0.0059
Start regular drinking | sex | 0.0006 | 0.0017 | 0.0019 | 0.0018 | 0.0009 | 0.0019 | 0.0009

a All analyses were performed using the chi-squared test.
b These analyses were performed using a regression model with each marker separately, and age and sex included in the model.
Values in bold indicate significance after correction for multiple testing.
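The screening logic behind Table 4-4 can be illustrated with a short sketch. The study ran its chi-square tests in R; the Python below is only an illustration of the same idea, a Pearson chi-square test on a 2x2 table of allele counts followed by a Bonferroni threshold. The function names and the allele counts are hypothetical. The seven p-values are the drug dependence genotype row of Table 4-4, and the figure of 14 comparisons per phenotype (seven markers, each with a genotype and an allele test) is an assumption chosen because 0.05/14 is close to the adjusted alpha of 0.0035 quoted in the Results.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for [[a, b], [c, d]].

    For an allele test: a, b = allele 1 and allele 2 counts in cases,
    c, d = allele 1 and allele 2 counts in controls.
    Returns (statistic, p_value).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(chi2 > stat) = erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))


def bonferroni_threshold(alpha, n_comparisons):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_comparisons


# Hypothetical allele counts in cases and controls (not data from this study).
stat, p = chi_square_2x2(250, 200, 150, 108)
print(f"allele test: chi-square = {stat:.3f}, p = {p:.3f}")

# Genotype p-values for drug dependence, copied from Table 4-4.
drug_dependence_genotype_p = {
    "ADH1C EcoRI": 0.0421,
    "ADH1C HaeIII": 0.1136,
    "ADH1C Arg271Gln": 0.05997,
    "ADHC Ile349Val": 0.0954,
    "ADH1C Pro351Thr": 0.2436,
    "ALDH 2.5'": 0.2352,
    "ALDH 2.In6A": 0.7812,
}

# Assumption: 14 comparisons per phenotype (7 markers x genotype and allele).
threshold = bonferroni_threshold(0.05, 14)
significant = [m for m, pv in drug_dependence_genotype_p.items() if pv < threshold]
print(f"threshold = {threshold:.4f}, markers passing: {significant}")
```

Under this assumed threshold, the nominally significant ADH1C EcoRI genotype result (p=0.0421) does not pass, which matches the conclusion stated in the Results.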
Figure 4-1. Linkage disequilibrium of markers assessed in the ADH gene family. Linkage disequilibrium (LD) structure is presented for the Plains population. Numbers above the red blocks indicate polymorphic assayed markers in ADH1A, B and C: 1=ADH1CEcoRI, 2=ADH1CHaeIII, 3=ADH1CArg271Gln, 4=ADHCIle349Val, 5=ADH1CPro351Thr. Numbers within the diamonds are D' values for the respective marker pairs. Solid red diamonds represent absolute LD (D'=1). One haplotype block was identified with the use of Haploview software.
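The D' values shown in the LD plots come from Haploview. As a hedged illustration of what the statistic measures, the sketch below computes Lewontin's D' for a pair of biallelic markers from one haplotype frequency and the two allele frequencies. The function name and the input frequencies are hypothetical, and Haploview's own estimation from unphased genotypes involves additional steps that are not reproduced here.

```python
def d_prime(p_ab, p_a, p_b):
    """Absolute value of Lewontin's D' for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1
          and allele B at locus 2
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:
        raise ValueError("at least one locus is monomorphic; D' is undefined")
    return abs(d) / d_max


# Hypothetical frequencies (not data from this study).
# Allele B is only ever found with allele A, so only three of the four
# possible haplotypes occur and D' reaches its maximum.
print(d_prime(p_ab=0.5, p_a=0.6, p_b=0.5))

# Haplotype frequency equals the product of the allele frequencies,
# so the loci are independent and D' is zero.
print(d_prime(p_ab=0.25, p_a=0.5, p_b=0.5))
```

D' equals 1 whenever at least one of the four possible two-locus haplotypes is absent, which is the situation the solid red diamonds in the figures denote.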
CHAPTER 5
DYNAMIC AND DISTINCT EVOLUTION OF HIV-1 IN BREASTMILK OVER TWO YEARS POST-PARTUM

Introduction

As of 2007, 33.2 million people were living with the human immunodeficiency virus type 1 (HIV-1) worldwide, and 22.5 million of those people (68%) lived in sub-Saharan Africa. Worldwide, an estimated 420,000 children were infected in 2007, the vast majority through mother-to-child transmission (MTCT) of HIV-1 (WHO 2007). Breastfeeding accounts for one-third to one-half of all MTCT events over 24 months (Dabis et al. 1999; Iliff et al. 2005). In the US, women are counseled by the CDC to replace breastfeeding with formula if infected with HIV-1 (CDC 2007), which (along with anti-retroviral drugs and cesarean sections) resulted in only ~300 infants becoming infected perinatally in 2000 (CDC 2006b). Because of the seemingly obvious reduction of transmission with formula feeding, as demonstrated in studies in developed countries, the World Health Organization recommended that "when replacement feeding is acceptable, feasible, affordable, sustainable, and safe, avoidance of all breastfeeding by HIV-infected mothers is recommended" (WHO 2003). However, formula feeding is impractical for women in resource-poor regions of the world where they do not have consistent access to clean water, formula, and health care, and breastfeeding may be the only practical option. Cultural pressures also make women reluctant to eschew breastfeeding, as this can be seen as a tacit admission of HIV-1 status. The result was an ineffectual policy because women were essentially left with no practical guidelines for reducing the risk of transmission. This situation highlights the critical need for an anthropological perspective in international health policy, which is often based on ideal conditions and Western epidemiology.
A shift in the traditional thinking about breastfeeding was influenced by the supposition that breastfeeding by infected mothers actually reduced infant mortality (Ross and Labbok 2004). Recent observational studies have suggested that exclusive breastfeeding, as opposed to the simultaneous feeding of milk and other foods, may significantly reduce the risk of transmission of HIV-1 (Coutsoudis 2000; Coutsoudis et al. 2001; Coutsoudis et al. 2002; Iliff et al. 2005; Kuhn et al. 2007). The WHO subsequently changed its recommendations to women in developing countries to state that breastfeeding is preferable to artificial feeding in the first six months of life, regardless of the mother's HIV status, as replacement feeding poses a greater risk of death to the infant than breastfeeding from an HIV-infected mother in first months. HIV-infected mothers are advised to wean their infants early to avoid prolonged exposure of the infant and to avoid combining breastfeeding with replacement feeding, which appears to heighten the risk of transmission (WHO 2006). While this revision addresses some of the cultural barriers presented by the former recommendation, there are still problematic aspects. The rapid weaning advocated by the WHO is often painful for the mother and stressful for the infant, often leading to a relapse in breastfeeding after the introduction of other foods. Furthermore, in many African countries the typical duration of breastfeeding lasts well after the infant's first year. Therefore, the recommendation to wean early still presents the mother with the challenges of finding and affording formula and clean water and of facing social stigma. Furthermore, the biological mechanisms underlying the reduction of risk through exclusive breastfeeding have not been clearly elucidated, and the optimal duration of breastfeeding is still debated. Several studies suggest that the risk of transmission actually declines over time, and that exclusive breastfeeding is only necessary for the first four months (Kuhn et al. 2007).
If this can be borne out, women would be able to avoid the painful abrupt weaning process and the problems associated with complete replacement feeding without undue risk of transmission, and the recommendation would be more culturally appropriate. The benefit of the anthropological genetic perspective which I
bring to this study is the ability to use evolutionary analyses to investigate the molecular basis of modulated risk of MTCT, with the goal of advocating a scientifically sound and culturally sensitive breastfeeding management plan to women while eliminating unnecessarily onerous measures.

Background

Human Immunodeficiency Virus Type 1 Infection

There are two major types of HIV: HIV-1 and HIV-2. HIV-1 is categorized into three main groups, M, N, and O, which derive from separate zoonotic events in which the simian immunodeficiency virus (SIV) was transmitted from Pan troglodytes troglodytes (chimpanzee) to humans (Gao et al. 1999). Group O viruses have only been found in people living in or having contact with central Africa (mainly Cameroon and some neighbouring countries) (Gurtler et al. 1994; Loussert-Ajaka et al. 1995), and group N viruses have only been reported from Cameroon (Simon et al. 1998). Within group M, eleven major subtypes of HIV-1 (A-D, F-H, J-K) and at least 16 published circulating recombinant forms (CRFs) make up the majority of infections worldwide (Leitner et al. 2005). Subtypes are specific to geographic regions, reflecting the initial routes and modes of transmission. In the United States, Western Europe, and Australia, subtype B is found almost exclusively, while in sub-Saharan Africa and India, subtype C is most prevalent. Subtypes can differ by up to 25% at the nucleotide level (Perrin, Kaiser, and Yerly 2003). The HIV-1 genome is ~10,000 bp in size and codes for the gag, pol and env genes characteristic of all retroviruses (Steffy and Wong-Staal 1991). Env encodes the glycoproteins gp120 and gp41, which are located on the surface of the lipid membrane surrounding the viral particle. Gp120 binds to the host cell-surface molecule CD4, which is expressed primarily on the surface of T-lymphocytes and macrophages, i.e. leukocytes critical to the immune response to infection (Maddon et al. 1986).
HIV-1 also requires a co-receptor to
enter the host cell, principally either the CCR5 or CXCR4 chemokine receptors, which are differentially expressed on subsets of macrophages and lymphocytes (Broder and Collman 1997). CCR5 is mainly expressed on macrophages and antigen-primed memory T-cells, while CXCR4 is expressed on unactivated naive T-cells and some macrophages. HIV viruses differ in their ability to use each co-receptor, and are categorized as R5 viruses (which use the CCR5 co-receptor), X4 viruses (which use the CXCR4 co-receptor), or R5X4 viruses (which use both) (Berger et al. 1998; Garzino-Demo et al. 2000). Typically, only R5 viruses are found early in infection, while X4 viruses emerge later in the majority of infected individuals (Bjorndal et al. 1997; Connor et al. 1997), and appear to evolve from earlier R5 viruses (Salemi et al. 2007). After attachment to CD4 and the co-receptor, the viral lipid envelope fuses with the target cell lipid membrane, which allows the viral core to enter the cell. The core comprises two copies of single-stranded RNA, proteins, and enzymes. After entry, the viral RNA is reverse-transcribed into double-stranded DNA in the cytoplasm and transported to the nucleus as a pre-integration complex. The viral integrase enzyme integrates the viral DNA into the host DNA through linkage between the long terminal repeats at each end of the viral DNA and the host DNA. The viral DNA is then transcribed into viral RNA, which either remains intact or else is spliced, and is transported to the cytoplasm for translation into regulatory proteins and polyproteins. These proteins, along with the full-length RNA transcripts, are packaged into viral particles that bud from the surface of the infected cells and are enveloped by the host cell membrane. The viral particle must undergo a final maturation step in which the gag polyproteins are cleaved by viral protease (Goodenow et al. 2003). Env gp120 is comprised of five conserved regions (C1-C5) and five variable regions (V1-V5).
The determinants for co-receptor use are localized to the V3 loop of gp120 (Carrillo and
Ratner 1996; Hung, Vander Heyden, and Ratner 1999). In subtype B viruses, the net charge of the amino acids in the V3 loop of gp120 is predictive of co-receptor usage (Briggs et al. 2000). However, in other subtypes, the correlation may not be as strong. Env can differ by up to 5% within an individual (Lamers et al. 1993), and by up to 30% among subtypes. N-linked glycosylation sites are prevalent in gp120, and are involved in the structure and folding of the protein. The glycosylation sites can also form a glycan shield that blocks the neutralizing antibody response (Wei et al. 2003). The V1 and V2 regions display considerable diversity in terms of number of glycosylation sites, length, and amino acid variation over the course of infection in one individual (Hughes, Bell, and Simmonds 1997a; Klevytska et al. 2002; Kitrinos et al. 2003; Nabatov et al. 2004; Ritola et al. 2004; Sagar et al. 2006). Furthermore, functional (Pastore et al. 2006) and phylogenetic (Salemi et al. 2007) studies suggest that X4 sequences evolve according to a consistent program of development that is recapitulated in individual patients and that requires compensatory mutations in V1V2 to occur prior to the emergence of high-charge V3 domains. Thus, analyzing the diversity in gp120 provides both phylogenetic signal and identification of potentially relevant functional changes over time.

Stages of Breastmilk Production

The initial HIV-1 infection in the breastmilk is poorly understood. Breastmilk is a complex composition of cells, proteins, water, ions, fats, vitamins, and minerals. Secretory alveolar epithelial cells in the mammary gland surround multiple lumina, which are storage chambers for the milk. Lactogenesis I refers to the onset of secretion of milk components to form colostrum (i.e. the early milk), which occurs at 16-24 weeks of pregnancy (Arthur, Smith, and Hartmann 1989).
During this phase, the epithelial cells of the alveoli differentiate into secretory cells, which then begin to synthesize lactose, casein, and milk fat triglycerides, which are secreted into the lumen. The alveolar epithelial cells also extract water, vitamins, and minerals from the blood
capillaries surrounding the milk duct. During this stage, the epithelial cells are not tightly joined, which allows elements from the blood such as immune cells and plasma protein to directly enter the lumen via the paracellular pathway (Neville and Neifert 1983), as well as HIV-1 infected cells or viral particles from the plasma. Colostrum contains higher concentrations of sodium, chloride, and proteins than mature milk (Georgeson and Filteau 2000), as well as a higher proportion of leukocytes (white blood cells) (Goldman 1993), and is optimized for the infant's requirements in the first days after birth. All of the milk components remain in the lumina until the suckling infant stimulates release of the hormone oxytocin, which in turn causes the contraction of the alveolar cells and the flow of the milk through the duct system (Fuchs 1991). Lactogenesis II is defined as the initiation of secretion of large amounts of milk, which is triggered by a decrease in progesterone a few days post-partum (Neville et al. 1991). Biochemical changes include an increase in lactose and glucose and a decrease in sodium and protein resulting from the closure of tight junctions between the epithelial cells. This closure is reversed during weaning when the milk volume falls to <400 ml/day, which corresponds to <2 feedings per day and coincides with significant increases in sodium, chloride, and protein and a decrease in lactose (Neville et al. 1991). Inflammation of the breast (mastitis) also causes the opening of the paracellular pathway, and increased sodium and albumin levels are associated with mastitis (Shuster, Kehrli, and Baumrucker 1995; Semba et al. 1999b; Becquart et al. 2000; Rollins et al. 2001). The increased HIV-1 loads associated with both weaning (Thea et al.
2006) and mastitis have been suggested to result from these leaky ducts, in which cell-associated (viral DNA) and cell-free (viral RNA) virus can transfer more efficiently from the plasma to the milk (Kuhn et al. 2007).
Cellular Composition of Breastmilk

The relative composition of cells in the breastmilk has not been consistently reported, due to the changing composition of the milk over time, the storage conditions of the expressed milk, and the use of different measurement techniques (Kourtis et al. 2003). Early studies suggested that macrophages predominate both in the colostrum (Crago et al. 1979) and in mature breastmilk (Ho, Wong, and Lawton 1979; Pitt 1979). However, later studies suggest that polymorphonuclear cells (neutrophils) comprise 80% of all cell types, followed by macrophages (15%) and lymphocytes (5-10%), most of which are T-lymphocytes (Goldman, Chheda, and Garofalo 1998). The leukocyte concentration is highest during early lactation and decreases 5-10 fold by the end of the first week post-partum (Goldman 1993; Georgeson and Filteau 2000). Since macrophages and T-lymphocytes are the primary cells infected by HIV-1, the high frequency of these cells in the breastmilk represents a large target cell population. The phenotype and functional characteristics of milk T cells differ from those of peripheral blood T cells (Bertotto et al. 1990b). Almost all of the breastmilk T-cells are memory cells, as evidenced by high expression of activation markers such as HLA-DR, CD25, and CD45RO (Bertotto et al. 1990b; Wirt et al. 1992; Rivas, el-Mohandes, and Katona 1994; Kourtis et al. 2003). A large percentage of T cells express mucosal homing markers such as CD49f, alpha4beta7 integrin, and CD103, suggesting that the T cells found in milk migrate from other tissues in the body where they were originally activated (Bertotto et al. 1990b; Kourtis et al. 2003). A possible source for the T-cells is the gut-associated lymphoid tissue (GALT) (Manning and Parmely 1980; Kourtis et al. 2003), which is supported by the high frequency of cells carrying the gamma/delta T-cell receptor in both the GALT (Ullrich et al.
1990) and in the breastmilk, but not the plasma (Bertotto et al. 1990a). The GALT contains the majority of the T-cells in the human body, and during initial HIV infection experiences almost complete depletion of T-cells
due to high levels of infection (Douek 2007b; Douek 2007a). If the milk is being populated by GALT-derived T-cells, many of the cells are likely infected with HIV-1 and are the source of infection. Macrophages in the breastmilk also have distinct characteristics from those in the blood. For example, they spontaneously produce granulocyte-macrophage colony-stimulating factor (GM-CSF) and can differentiate into CD1+ dendritic cells in the presence of interleukin-4 (IL-4) alone, in contrast with monocytes among the peripheral blood mononuclear cells (PBMC), which require both GM-CSF and IL-4 (Ichikawa et al. 2003). IL-4 stimulated breastmilk macrophages express DC-SIGN, a dendritic cell-specific receptor for HIV-1. During mastitis, IL-4 is locally produced, which may then up-regulate DC-SIGN expression, which could lead to increased HIV-1 infection of macrophages (Ichikawa et al. 2003). This is an alternative to the leaky duct hypothesis as an explanation for the high levels of viral load in the breastmilk associated with mastitis. Epithelial cells from the mammary gland itself were reported to be present in mature milk at low levels (Xanthou 1997), although another study found that almost 80% of all cells were epithelial (Petitjean et al. 2007). Mammary epithelial cells can become productively infected with HIV-1 (Toniolo et al. 1995), and therefore the infection in the breastmilk could originate either from infected epithelial cells shedding into the lumina, or from cell-free viral particles produced by the epithelial cells that pass into the milk and infect T-cells and macrophages already there.

Compartmentalization of Breastmilk Virus

As discussed above, the origin of the virus in breastmilk is unclear, and the molecular genetic evolution of the virus in breastmilk over time has not been extensively investigated.
One important question that remains to be answered is whether the virus in the breastmilk compartmentalizes after the initial infection, i.e., forms a separate population, initially due to
restricted gene flow with other tissues (Nickle et al. 2003), and accelerated by the high in vivo mutation rate (Saag et al. 1988) and differential selective pressures (Haase et al. 1996; Pilcher et al. 2001). This is an important question because viruses in different compartments may exhibit differential pathogenesis (Donaldson et al. 1994), response to drug therapy (Si-Mohamed et al. 2000; Venturi et al. 2000; Smit et al. 2004) and potentially different transmission rates. Compartmentalization has already been established for the genital tract in men and women (Zhu et al. 1996; Poss et al. 1998; Ping et al. 2000; De Pasquale et al. 2003; Kemal et al. 2003; Philpott et al. 2005; Pillai et al. 2005; Sullivan et al. 2005), which, as the most common route of HIV transmission worldwide (Royce et al. 1997), has tremendous implications for our understanding of the transmission and initial seeding of infection. If particular characteristics of the virus infecting the genital tract can be identified, such as tropism, co-receptor usage, viral epitopes, or structural characteristics, then vaccines and drug interventions could potentially be developed to target these attributes. Because the virus both within and between individuals is so variable, interventions will be more efficient if particular subsets of viruses can be targeted and neutralized, and eradicating the early virus before it can infect other tissues is critical. Distinct viral populations have also been identified in the central nervous system due to the blood-brain barrier, which is important because many anti-retroviral drugs are unable to penetrate the brain, and the proliferation of the virus there can lead to AIDS-associated dementia (Korber et al. 1994; Hughes, Bell, and Simmonds 1997b; Gatanaga et al. 1999; Morris et al. 1999; Shapshak et al. 1999; Staprans et al. 1999; Venturi et al. 2000; Smit et al. 2001; Wang et al. 2001; Ohagen et al. 2003; Langford et al.
2004; Petito 2004; Smit et al. 2004; Thompson et al. 2004; Abbate et al. 2005; Burkala et al. 2005; Ritola et al. 2005; Salemi et al. 2005; Strain et al. 2005; Pillai et al. 2006). Lymphoid, spleen, and lung tissues are also subject to
compartmentalization (Wong et al. 1997; Salemi et al. 2007). The few studies that have investigated compartmentalization in the breastmilk have found conflicting results. One study found compartmentalization of breastmilk virus DNA and RNA with respect to plasma and PBMC virus (Becquart et al. 2002). A second study found that viral variants were similar between plasma and milk (Henderson et al. 2004). However, both studies only considered the virus from a single timepoint in each patient. In addition, the median time since delivery was 3 months in the first study versus 12 months in the second study, which may explain the apparently contradictory results if compartmentalization is not static, but changes over time.

Risk of Transmission via Breast-feeding

Increased transmission has been associated with the mother's health status and the method of feeding. In particular, an increased risk of transmission is correlated with a high viral load in the mother's plasma (John et al. 2001; Fawzi et al. 2002a; Rousseau et al. 2003) and breastmilk (Van de Perre et al. 1993; Lewis et al. 1998; Semba et al. 1999a; Pillay et al. 2000; Rousseau et al. 2003; Rousseau et al. 2004). Breastmilk RNA concentrations are typically 2-3 log10 lower than the plasma viral load, though the two are highly correlated (Rousseau et al. 2004). RNA viral loads were found to be significantly higher in the colostrum than in the mature milk produced 14 days after birth (2.59 log10 copies/ml vs. 2.19 log10 copies/ml) but did not significantly change from 14 days to 15 months (Rousseau et al. 2004). Although this may explain earlier studies suggesting that most transmission events occur during the first 6 weeks after delivery (Miotti et al. 1999), it can be difficult to determine whether an infant was infected via breastfeeding or intra-partum early in life.
Both cell-free (RNA) and cell-associated (DNA) viral loads independently increase the risk, although the two measures are highly correlated (Richardson et al. 2003; Koulinska et al. 2006). A ten-fold increase in breastmilk RNA is associated with a two-fold increase in risk of
transmission, while a ten-fold increase in infected cells (DNA) is associated with a three-fold increase in transmission (Rousseau et al. 2003; Rousseau et al. 2004). In one study, RNA was detected in 57% of all breastmilk samples from HIV-1+ women, and in 74% of all samples from women who transmitted the virus (Koulinska et al. 2006). Cell-associated (DNA) virus was detected in 74% of all breastmilk samples from HIV+ women and in 87% of the samples from women who transmitted via breastfeeding (Koulinska et al. 2006). Interestingly, RNA virus was significantly associated with late transmission of HIV-1 (> 9 months post-partum) while DNA virus had no time-dependent association. Other factors increasing the risk of transmission via breastmilk include a low CD4 count (Leroy et al. 2003; Iliff et al. 2005), mastitis (Van de Perre et al. 1992), and potentially subclinical mastitis (in which the symptoms are not evident), all of which are associated with high viral loads (Semba et al. 1999b; Willumsen et al. 2000). If the mother was infected after delivery, the risk of MTCT may be increased 6-fold due to the high viral loads associated with primary infection (Embree et al. 2000). Maternal nutrition status may also be associated with transmission; vitamin A supplementation was associated with an increased risk, while multivitamin use was associated with a lower risk (Fawzi et al. 2002b). A possible explanation for these results is the recent observation that env gp120 binds to an activated form of the integrin alpha4-beta7 on CD4 T cells, which facilitates infection of neighboring cells. Retinoic acid, which is derived from vitamin A, activates alpha4-beta7 and promotes binding to gp120. Interestingly, the function of alpha4-beta7 is to act as a homing receptor for T-cells migrating to the GALT (Arthos et al. 2008).
Therefore, a possible hypothesis is that vitamin A taken as a supplement by the mother may be passed to the infant through the breastmilk, where it activates alpha4-beta7 in the
gut of the infant, which in turn promotes binding of gp120 to CD4+ cells in the gut, leading to increased infection. The mode of breastfeeding is also associated with differential risk and is the basis for modification of the WHO guidelines to HIV+ mothers concerning breastfeeding. Recent observational studies have found that exclusive breastfeeding appears to lower the risk of transmission relative to mixed feeding, which is defined as providing the infant with any food other than water. A study in Durban, South Africa demonstrated a comparable rate of infant HIV-1 infection by three months among mothers who had never breastfed and whose infants were negative at birth (13.2%) and mothers who had exclusively breastfed up to 3 months (8.3%), compared to a significantly higher infection probability for infants who were mixed fed (19.9%) (Coutsoudis et al. 1999; Coutsoudis et al. 2001). In a follow-up study of the same cohort, the cumulative rate of HIV infection (including intrauterine and intra-partum) by six months was 19.4% for both the non-breastfeeding and the exclusively breastfeeding mothers, compared to 26.1% for the mixed-feeding mothers (Coutsoudis et al. 2001). In another study in South Africa, the cumulative probability of infection at six months for exclusively breastfed infants was 15% compared with 7% for non-breastfed infants and 27% for mixed-fed infants (Coovadia et al. 2007). In a study of >2000 mothers in Zimbabwe, infants who were HIV negative at six weeks and who were exclusively breastfed were significantly less likely to be infected at six months (1.31%) than mixed fed infants (4.4%) (Iliff et al. 2005), while in a study in Zambia, 4% of exclusively breastfed infants who were negative at six weeks were infected by 4 months, compared to 10% of mixed fed infants (Kuhn et al. 2007). The mechanism for the reduced risk via exclusive breastfeeding has not been demonstrated.
One possibility is that damage to the infant's gut mucosa from early introduction of food compromises the intestinal mucosal barrier,
or intestinal immune activation from early introduction of foreign antigens or pathogens facilitates transmission (Smith and Kuhn 2000). Mixed feeding could be associated with suboptimal breastfeeding, subclinical mastitis (in which no swelling of the mammary gland is visible during inflammation), or poor health in general, which leads to higher viral loads (Willumsen et al. 2000), so the association between mixed feeding and transmission is complex (Chisenga et al. 2005). However, the study in Zambia controlled for viral load and maternal health (as measured by CD4 cells) and still found a significant association (Kuhn et al. 2007). The increased risk of transmission upon cessation of exclusive breastfeeding may also be related to the increase in mammary epithelial permeability upon weaning (Kourtis et al. 2003). This hypothesis is supported both by the finding that the mean concentration of plasma albumin was higher in the milk of transmitting mothers (Becquart et al. 1999) and by the increase in breastmilk viral load following weaning (Thea et al. 2006). In addition, the duration of breastfeeding has been suggested to influence the risk of transmission, although the association is not clear. Over a two year period, the risk of transmitting the virus through breastfeeding was estimated at 16% (37% for mothers who did breastfeed, vs. 21% for mothers who did not) (Nduati et al. 2000). A study in Zimbabwe found that more than 2/3 of all postnatal MTCT occurred after six months (Iliff et al. 2005) while a study in Kenya found that the majority of transmission events occurred before six months (Nduati et al. 2000). Other studies have found an approximately constant risk over time (Coutsoudis et al. 2004) or a reduced risk over time (Kuhn et al. 2007). The contradictory results are difficult to interpret, and it is difficult to compare across studies because of the different interpretation of intra-partum vs.
early breastfeeding transmission and the failure to distinguish exclusive from non-exclusive breastfeeding. In a study in Zambia, the risk of HIV-1 transmission
during non-exclusive breastfeeding was shown to be greater during the first four months (~2.4% per month) compared with 1% from 5-12 months, and 0.5% from 12-24 months, suggesting that the greatest benefit of exclusive breastfeeding accrues only until month 4. The risk of continuing to breastfeed non-exclusively was offset by the mortality associated with not breastfeeding, and no difference in HIV-1 free survival was detected between the infants of mothers who stopped breastfeeding at four months and those of mothers who continued up to 16 months (Kuhn et al. 2007; Sinkala et al. 2007). Therefore, while the cumulative probability of HIV-1 transmission may increase over time, the more important measure of HIV-1 free survival may be increased with continued breastfeeding.

Our Study

The current WHO recommendations concerning breastfeeding by HIV+ mothers are based on several observational studies that provide strong evidence for the attenuation of risk of MTCT by exclusive breastfeeding. However, although the WHO advises women to abruptly wean at 6 months, recent data suggest a decreased risk of MTCT over time that is outweighed by the benefits of continued breastfeeding (Kuhn et al. 2007). Furthermore, while exclusive breastfeeding is critical during the first several months post-partum, after 4 months the benefits are less evident (Kuhn et al. 2007). Clearly, a better understanding of the molecular mechanisms underlying the observations is needed in order to provide HIV-1 positive women with the optimal recommendations. HIV-1 rapidly evolves within a patient in response to constantly changing host pressures. Within several months of infection a new viral population may emerge, often with functionally significant differences. New viral populations may have an altered entry, replication, or immune evasion phenotype, which could impact the overall pathogenicity or transmissibility of the virus.
Therefore, investigating the viral evolution over several years can provide important information
about both the characteristics of the virus and the dynamic host environment. It is likely that features of the virus at particular junctures during the course of breastfeeding may impact its ability to initiate a new infection in the infant. Understanding the evolutionary dynamics of the milk virus may elucidate the underlying cause for the attenuated risk of MTCT associated with exclusive breastfeeding as well as the reduced risk of transmission over time. This study is the first to investigate the longitudinal evolution of HIV-1 in breastmilk with a focus on elucidating important factors underlying transmission to the infant. I have posed several questions designed to address aspects of the viral evolution that may be important in transmission. First, I wanted to determine the means by which virus enters the breastmilk. Specifically, I investigated whether breastmilk is infected by virus from the plasma during the initial stages of lactogenesis, or whether the early breastmilk virus is distinct from the virodeme (HIV-1 subpopulations) in the peripheral blood system, suggesting an alternative source of infection. Second, I wanted to learn more about the characteristics of the virus when it is in breastmilk. Specifically, I investigated whether virus in the breastmilk evolves into a separate population distinct from the peripheral virus population, indicating compartmentalization. Related to that question, I also investigated to what degree the virus migrates between tissues. Third, I wanted to learn more about what role natural selection might play in the evolution of the virus. Specifically, I explored what selective pressures are present during the evolution of the breastmilk virus, and whether they differed between the plasma and the milk. Finally, I wanted to know whether the characteristics of the virus in breastmilk change over time.
Specifically, I considered whether particular evolutionary dynamics between four and six months could be associated with either the documented cessation of exclusive breastfeeding or the transmission of the virus to the infant. To address these questions, I analyzed the population of viruses, or virodeme, present in the breast
milk (left and right) and plasma RNA over a two-year period (timepoints at birth, 1, 4, 12, 18, and 24 months) from an HIV-1 positive woman who was documented to have transmitted the virus to her infant via breastfeeding at four months post-partum.

Materials and Methods

Subject

The HIV-1 positive female subject in this study was enrolled in the Zambia Exclusive Breast Feeding Study (ZEBS). The goal of the ZEBS was to determine whether exclusive breastfeeding was associated with a lower risk of transmission (Thea et al. 2004; Kuhn et al. 2007). All women were encouraged to breastfeed exclusively until month 4, at which time women were randomized into two groups: the control group, in which women were encouraged to exclusively breastfeed until six months and then gradually introduce complementary foods, and the intervention group, in which women were encouraged to rapidly wean their infants after four months (Thea et al. 2004; Kuhn et al. 2007). This patient was part of the control group and breastfed until 18 months post-partum. She reported exclusively breastfeeding until four months, and reported mixed feeding at six months, which was defined as providing the infant with any substance other than maternal milk, including water and cow's milk. Her infant had the first positive PCR for HIV at 4 months, and negative PCRs at 1 week, and 1, 2, and 3 months, strongly suggesting that breastmilk was the vehicle of transmission. Breast milk and blood were serially sampled from the mother at 1 week, 1 month, 4 months, 12 months, 18 months, and 24 months (see Figure 5-1).

Viral Isolation, Amplification, and Sequencing

RNA was isolated from breastmilk and plasma samples using a Qiagen RNA Easy kit (Qiagen, Valencia, CA, USA) and 10 µl of RNA extract was used as template in a One-Step RT-PCR kit (Invitrogen, Carlsbad, CA, USA) according to the manufacturer's instructions. To amplify
the V1V5 region of the env gp120 gene, I used first round primers pol-env (5'-GAGCAGAAGACAGTGGCAATGA-3') and 192H (5'-CCATAGTGCTTCCTGCTGCT-3'). Cycling conditions were as follows: 50°C for 30 min and 94°C for 2 min, followed by 35 cycles of 94°C for 15 s, 58°C for 30 s, and 72°C for 2 min, and a final extension step of 72°C for 10 min. A second nested PCR was performed using the GoTaq PCR Supermix (Promega, Madison, WI, USA) according to the manufacturer's instructions using 5 µl of the first round amplification as template and primers D1 (5'-CACAGTCTATTATGGGGTACCTGTGTGGAA-3') and 194G (5'-CTTCTCCAATTGTCCCTCATA-3') with cycling conditions as follows: an initial denaturing step of 95°C for 5 min, 35 cycles of 94°C for 1 min, 58°C for 1 min, and 72°C for 2 min, followed by a final extension step of 72°C for 10 min. In some cases, a third round PCR was necessary using 5 µl of the second round as template, primers Env5 (5'-GGGGATCCGGTAGAACAGATGCATGAGGAT-3') and 194G, and the same cycling conditions as described for the second round PCR. Additional sequences were generated for the plasma 1 week, 12 month, and 24 month samples using 1 µl of RNA template to determine whether larger input volumes of RNA were contributing to PCR-generated recombination. PCR products were ligated into the TOPO TA vector (Invitrogen, Carlsbad, CA, USA) according to the manufacturer's instructions and transformed into TOP10F' cells with 30 min on ice, followed by a 30 s heat shock at 42°C and incubation overnight at 30°C. In sum, approximately 30 clones from each sample were sequenced in both directions by the Genome Sequence Service Laboratory at the University of Florida, yielding 284 independent sequences.

Sequence Analysis and Recombination

The V1V5 sequences were manually aligned and checked for accuracy using BioEdit v7.0 (Hall, 1999) and Mega v4.0 (Tamura et al. 2007). V1 and V2 haplotypes were assigned based on sequence similarity. Length and number of glycosylation sites were calculated manually.
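A manual count of glycosylation sites can be cross-checked programmatically. The following Python sketch assumes that a potential N-linked glycosylation site is the standard sequon N-X-S/T, where X is any amino acid except proline. The function names and the toy peptide are hypothetical and are not software or sequences from this study.

```python
import re

# Sequon for potential N-linked glycosylation: Asn, any residue except Pro,
# then Ser or Thr. The lookahead consumes only the Asn, so overlapping
# sequons (e.g. "NNTS") are each counted.
SEQUON = re.compile(r"N(?=[^P][ST])")


def strip_gaps(aligned_aa: str) -> str:
    """Remove alignment gap characters and normalise to upper case."""
    return aligned_aa.replace("-", "").upper()


def region_length(aligned_aa: str) -> int:
    """Length of a region in amino acids, ignoring alignment gaps."""
    return len(strip_gaps(aligned_aa))


def count_glycosylation_sites(aligned_aa: str) -> int:
    """Count potential N-linked glycosylation sites (N-X-S/T, X != P)."""
    return len(SEQUON.findall(strip_gaps(aligned_aa)))


if __name__ == "__main__":
    toy_peptide = "CNATNPSNNTS"  # hypothetical example, not study data
    print(region_length(toy_peptide))
    print(count_glycosylation_sites(toy_peptide))
```

Applying the two functions to the V1 and V2 portions of each translated clone would give per-sequence values comparable to the manually calculated ones.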
The entire V1V5 region as well as shorter regions of the gp120 gene were evaluated for recombinant sequences based on an algorithm that I helped develop that uses the pairwise homoplasy index (PHI) test in conjunction with phylogenetic networks (Salemi MM, Gray RR, Goodenow MM, unpublished). The PHI test statistic is a modified sum of incompatibility scores calculated for pairs of informative sites in an alignment. The significance of the PHI statistic is then assessed with a normal approximation of a permutation test, and the test has been shown to be a powerful method in simulation studies (Bruen, Philippe, and Bryant 2006). In our method, viral populations were tested for significant population structure using the K*s test (Hudson, Slatkin, and Maddison 1992; Achaz et al. 2004) and divided into smaller datasets by time point if significant population structure was detected. A neighbor-net phylogenetic network, which allows phylogenetic uncertainty to be represented, was inferred for each dataset, and sequences were progressively removed until the PHI test statistic was no longer significant (p>0.05). The smaller datasets were then combined again, and the procedure repeated to detect inter-time point recombinants. All putative recombinants were removed from the dataset for further analysis. Pairwise distance calculations were performed for the non-recombinant alignment using Mega v4.0 with the Kimura 2-Parameter model with pairwise deletions and uniform rates.

Phylogenetic Analyses

Phylogenies were estimated for non-recombinant datasets using a Bayesian analysis with the BEAST software package 1.4 (Drummond et al. 2005; Drummond and Rambaut 2007).
The evolutionary rate was initially estimated using a model assuming a strict clock, the SRD6 model of nucleotide substitution (HKY + Γ, with the 1st and 2nd codon positions considered separately from the 3rd, hereafter referred to as two partitions; Shapiro, Rambaut, and Drummond 2006), and a constant population size, with sampling time information included in the model (Drummond et al. 2006). The rate obtained from this model was then used as a prior in further
analyses that assumed a relaxed clock with log-normally distributed rates with three models of nucleotide substitution: the SRD6 model, the general time-reversible (GTR) model with two partitions, and the GTR model with three partitions (all three codon positions considered separately). The Markov Chain Monte Carlo analysis was run for 100,000,000 generations with sampling every 10,000th generation. The results were visualized in Tracer v.1.3, and convergence of the Markov chain was assessed by calculating the effective sample size (ESS) for each parameter (Drummond et al. 2006). All ESS values were >500, indicating sufficient sampling. The time to the most recent common ancestor (TMRCA) was estimated for each analysis. The marginal likelihoods of each model were compared using the Bayes Factor method, in which a difference of greater than 20 log units between the marginal likelihoods of any two models is considered significant evidence for the alternative model (the more complex model). The correct root of the phylogeny was analyzed by constraining all sequences except particular groups from the first timepoint as the ingroup using the SRD6 model with the relaxed clock and log-normal distribution of rates. This was performed for five sets of outgroup sequences, and the marginal likelihoods were compared using the Bayes Factor test. A 50% consensus tree based on the posterior distribution of trees with a 50% burnin was calculated using MrBayes and manipulated in FigTree v.1.0.

A maximum likelihood (ML) phylogeny was inferred using PAUP v. 4.0 (Swofford 2002). The best-fitting nucleotide substitution model was tested with a hierarchical likelihood ratio test using a neighbor-joining tree with Jukes and Cantor corrected distances. Statistical support for internal branches in the tree was obtained by bootstrapping (1,000 replicates). All rootings of the best ML phylogeny were generated in MacClade (Maddison and Maddison 1989).
The most likely rooting was determined using baseml in the PAML package (Yang 1997) with the clock
hypothesis and sampling information included with the sequences. The best rooted tree was then re-estimated without the clock hypothesis in PAML.

To determine the subtype of this patient, one sequence from each tissue (plasma, left and right breast milk) from week 1 was aligned to representative sequences from each major subtype in macrogroup M obtained from the Los Alamos database (http://www.hiv.lanl.gov/content/sequence/HIV/SUBTYPE_REF/align.html). A neighbor-joining tree was inferred using PAUP v. 4.0 with the HKY model of substitution, an estimated transition/transversion ratio, an estimated gamma distribution of rates, and an estimated proportion of invariable sites. Statistical support for internal branches in the tree was obtained by bootstrapping (1,000 replicates).

Branch Selection Analysis

A branch selection analysis was performed using HYPHY (Pond, Frost, and Muse 2005). The non-synonymous/synonymous rate ratio (dN/dS) was estimated for each branch leading to a major clade, as well as the 95% confidence interval (CI). All branches for which the CI did not include 1.0 were determined to deviate significantly from neutral evolution.

Compartmentalization

Compartmentalization between tissues was investigated using a modified version of the Slatkin-Maddison test (Slatkin and Maddison 1989) implemented in MacClade (Maddison and Maddison 1989) using the State Changes and Stasis option. The number of unambiguous instances in which an ancestral state changed from one tissue to the other, as well as all instances in which no change occurred, were calculated for each phylogeny sampled from the posterior distribution in BEAST with a 25% burnin and for 10,000 random splitting and joining trees. The Kolmogorov-Smirnov nonparametric test was used to compare the distribution of each category (breastmilk to breastmilk, breastmilk to plasma, plasma to plasma, and plasma to breastmilk) for
the estimated and random trees. The nonparametric Wilcoxon Rank Sum test was used to compare the median value for each category between the estimated and random phylogenies.

Results

Subtype Analysis

The neighbor-joining tree using representative sequences from each major subgroup in macro-group M is shown in Figure 5-2. A high bootstrap value (100) supports the placement of this patient into subtype C.

Sequence Analysis

An initial dataset of 184 sequences was obtained and used in the majority of analyses (Table 5-1). Additional sequences were generated for three plasma samples (1W, 12M, 24M) to rule out the possibility of PCR-generated recombination. These sequences were not included in any of the phylogenetic analyses. Additional sequences were also generated for milk samples from month 1. These sequences were included in a subset of phylogenetic analyses as noted below.

Variable regions 1 and 2 sequence analysis

All generated sequences (n=282) were included in the sequence analysis. Because the length variation in the V1V2 region may confound phylogenetic analyses, this variation was analyzed separately. For V1, six different haplotypes were assigned (A-F) based on amino acid (aa) motif, length, and number of glycosylation sites. Representative sequences for each haplotype are shown in Figure 5-3. The length of V1 ranged from 29aa-51aa, and the number of glycosylation sites ranged from 1-5. Haplotypes D, B, and C were most prevalent (in that order), and share similar aa motifs but differ in length. For V2, five haplotypes (A-E) were assigned. Representative sequences for each haplotype are shown in Figure 5-4. The length of V2 ranged from 40aa-51aa, and the number of
glycosylation sites ranged from 1-3. Haplotype B was most prevalent, with the remaining sequences somewhat evenly divided between the other haplotypes. The amino terminus was fairly well conserved among sequences, as was the leu-asp-iso motif that mediates binding of activated integrin α4β7, which mediates migration of CD4 T-cells to the gut (Arthos et al. 2008).

In V1, the plasma shows a trend of increasing diversity over time (number of haplotypes), with the haplotypes present at the earliest timepoints being retained (Table 5-2). The longer haplotypes C and D are present at the first timepoint, while the shorter haplotype B emerges at month 12 at a low frequency. Haplotype B increases to almost 50% by month 24, which suggests a trend of shortening V1 length and loss of glycosylation sites over time. However, because haplotype B is present in the breastmilk at the first time point (BMR1W), it is unclear whether the presence of haplotype B in the plasma by 12 months is the result of a migration from the breastmilk to the plasma, or whether the viral population in the plasma evolved into haplotype B independently (Table 5-2). Haplotype A, which is the most divergent from haplotypes B, C, and D, remains at a low frequency at all timepoints in the plasma. Haplotype E, which has a unique aa motif, emerges only at month 24.

There is less diversity in the breastmilk sequences at most timepoints relative to the plasma. At week 1, the left breast has only a subset of haplotypes found in the right breast. Haplotype B is unique to the breastmilk, and while two of the haplotypes (A and C) are present in both the plasma and the breastmilk, haplotype A is found at very low frequencies in the plasma. By month 1, a new haplotype emerges in the breastmilk (F), which is not found at any other timepoint or tissue, but is found in both breasts.
In fact, the viral population in the left breast seems to have experienced a complete population replacement from haplotype A in week 1 to haplotypes B and F at month 1. In contrast, the viral population in the right breast displays more continuity from week 1 to month 1. This pattern suggests either
limited gene flow from the right to the left breast, or else the left breast is infected with only a subset of the viral population infecting the right breast. Haplotype D, which at week 1 was in the plasma but not the breastmilk, is present in the right breast (but not the left) at month 1. At month 4, haplotype D is the most frequent in both the breastmilk and the plasma, and by the last two timepoints only haplotype D is found in the breastmilk. Interestingly, this is the opposite of the trend in the plasma, in which diversity increased over time. This could suggest that at month 1, the right breast began to experience some gene flow from the plasma, though the left breast remained compartmentalized. By month 4, there was complete gene flow between both breasts and the plasma. By the last timepoints, the gene flow from the plasma subsided (Table 5-2).

In V2, a similar trend of increasing diversity over time in the plasma is apparent (Table 5-2). Again, the haplotypes present at month 24 include all haplotypes seen at previous timepoints with the exception of haplotype E, which was present at month 4. Neither the length nor the number of glycosylation sites appears to increase or decrease over time, although the diversity again increases over time. In the breastmilk, again the haplotypes in the left breast are a subset of those found in the right at both week 1 and month 1, and again a population turnover is apparent in the left breast from week 1 to month 1. At month 4, the same haplotypes are found in the breastmilk and the plasma, and again by 12 and 18 months only one haplotype is found in the breastmilk.

There is no clear association between particular haplotypes in V1 and V2. This may be due to recombination between the two regions or to convergent evolution, as the difference between some of the haplotypes is only a few amino acids.
Overall, there is a clear trend in the plasma of increasing combinations of haplotypes, while the opposite trend is true in the milk.

Variable region 3 loop analysis

All sequences had V3 loop charges of <5, which in subtype B is predictive of CCR5 coreceptor usage (data not shown). However, the association between charge and co-receptor usage
is not as defined in subtype C. Further functional analyses are being performed to determine actual tropism and co-receptor usage.

Recombination Analysis

The presence of population structure was tested for each group of sequences from different tissues and timepoints. All inter-tissue and inter-timepoint comparisons demonstrated significant population structure (p<0.001) except for the three tissues sampled at month four, which showed no population structure (Table 5-4). These sequences were therefore considered as one group for the recombination test.

The V1-V5 alignment was initially tested for the presence of putative recombinants using the PHI/phylogenetic method described above with a dataset of 184 sequences. However, a nonsignificant inter-timepoint dataset could not be obtained using more than half of the sequences. Therefore, the alignment was shortened into four new alignments: V1-V3, V1-V2, C2-V3, and C2-V5 (see Figure 5-5). For the V1-V3 dataset, again a non-recombinant dataset could not be obtained. The full V1-V2 dataset initially suggested the presence of recombinants (p=8.43 x 10-7), and 27 sequences could be removed so that no recombination was detected (Table 5-5). For the C2-V3 dataset, no recombinants were detected. For the C2V5 dataset, initially the dataset was again unable to be resolved. However, the removal of ~10 amino acids at the carboxyl end of V3 as well as all of V4 resulted in a non-recombinant dataset. These results suggested that a hotspot for recombination is located at the amino terminus of the V2 region or the carboxyl terminus of the C2 region of the gp120 gene.

Two alternate datasets could be produced for the C2V5 alignment, depending on which week 1 breastmilk sequences were removed. The network for these sequences is shown in Figure 5-6 (when all sequences were included, p=4.06 x 10-4, indicating a significantly recombinant dataset).
Four distinct groups are evident: Group 1 contains all of the sequences from the left
breast, while the sequences from the right breast form groups 2-4. Two of these groups could be removed separately to result in a non-recombinant dataset: Group 3 plus two additional non-Group 3 sequences (p=0.076) or Group 1 (p=0.376). Because choosing the first alternative resulted in fewer putative recombinants overall (31 vs. 34), and to avoid removing all of the representatives from the left breast, the first alternative (C2V5 [A]) was considered the better of the two options. However, to confirm some of the phylogenetic results, the second alternative (C2V5 [B]) was also used. Other than the breast milk week 1 sequences and the three additional recombinants, the other sixteen recombinants were identical between C2V5 (A) and (B).

In order to confirm that the multitude of recombinant sequences was not due to PCR-generated recombination, additional plasma samples from week 1, month 12, and month 24 were re-amplified using 1/10 the input RNA, as increased template has been suggested to cause in vitro recombination. The new sequences were added to the C2V5 alignment, and the dataset was re-tested for recombinants. Similar variants were recovered for all three timepoints, and again the majority of the plasma recombinants were found in the month 24 sample, suggesting that the recombinant sequences are actually present in vivo.

The breastmilk sequences sampled from month 1 were added to the C2V5 (A) alignment. No additional recombinants were detected among these sequences.

Phylogenetic Analyses

Bayesian tip-date phylogeny

The C2V5 (A) and C2V5 (B) alignments were used to infer a posterior distribution of trees using the program BEAST under three models of nucleotide substitution (SRD6, GTR + 2 partitions, and GTR + 3 partitions) with a relaxed molecular clock, and the SRD6 model assuming a strict molecular clock.
For the C2V5 (A) dataset, the Bayes Factor of the GTR + 2 partition model was >20 log higher than either the SRD6 model or the strict clock model,
suggesting a significantly better fit to the data, while the GTR + 3 model was <20 log higher than the GTR + 2 model, suggesting unnecessary parameterization (Table 6). For the C2V5 (B) dataset, both GTR models resulted in very low ESS values after 100,000,000 generations of the MCMC chain, and thus these models were not considered. The Bayes Factor for the SRD6 model with the relaxed clock was >20 log higher than the strict clock (data not shown), again suggesting that the strict clock could be rejected for both datasets.

The posterior distribution of trees generated under the best model (as identified above) was used to generate a 50% consensus phylogeny for C2V5 (A) (Figure 5-7) and C2V5 (B) (Figure 5-8). Both phylogenies demonstrated similar topologies with strong temporal evolution. In phylogeny (A), breastmilk week 1 samples formed three groups (1, 2, 3) that clustered in separate clades at the base of the tree. Groups 1 and 2 were well supported (>95%), while group 3 (from the right breast) was moderately supported (72%). Groups 2 and 3 showed moderate support for clustering together (79%). In phylogeny (B), breastmilk week 1 Groups 2 and 4 also form well supported clades (95% and 100%, respectively). Group 3 is again moderately supported (73%), but in this case does not cluster with Group 2 and is part of the larger clade of the remaining sequences. In both phylogenies, the week 1 plasma sequences do not cluster with the breastmilk sequences and with one exception are part of the larger clade of remaining sequences. These results indicate that the breastmilk and plasma viruses, as well as the virus in both breasts, are distinct one week after birth. Furthermore, the virus in the right breast is also more diverse than the left, as shown in the V1V2 analysis.
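The model comparison reported at the start of this section follows a simple decision rule: move to a more complex model only when its log marginal likelihood exceeds that of the current model by more than 20 log units. A minimal sketch of that rule is given below. The function names and the numerical values in the usage note are hypothetical placeholders chosen to mirror the pattern described in the text; they are not the marginal likelihoods obtained in this study.

```python
from typing import List, Tuple


def log_bayes_factor(lnml_complex: float, lnml_simple: float) -> float:
    """Difference in log marginal likelihood (complex minus simple)."""
    return lnml_complex - lnml_simple


def select_model(models: List[Tuple[str, float]], threshold: float = 20.0) -> str:
    """Pick a model from a list ordered from simplest to most complex.

    Each entry is (name, log marginal likelihood). A more complex model
    replaces the current choice only if it improves on it by more than
    `threshold` log units; otherwise the extra parameters are rejected.
    """
    best_name, best_lnml = models[0]
    for name, lnml in models[1:]:
        if log_bayes_factor(lnml, best_lnml) > threshold:
            best_name, best_lnml = name, lnml
    return best_name


# Hypothetical values for illustration only (not results from this study).
EXAMPLE_MODELS = [
    ("SRD6, strict clock", -5100.0),
    ("SRD6, relaxed clock", -5060.0),
    ("GTR + 2 partitions, relaxed clock", -5030.0),
    ("GTR + 3 partitions, relaxed clock", -5020.0),
]
```

With these placeholder values, `select_model(EXAMPLE_MODELS)` returns the GTR + 2 partition model, because the three-partition model improves on it by only 10 log units.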
The majority of the month 4 sequences form a moderately well-supported paraphyletic clade (90% in [A] and 84% in [B]), and a well-supported clade containing a subset of month 4 plasma, right and left breastmilk sequences is present within the larger clade (99% [A] and 94%
[B]), consistent with a lack of population structure among these sequences. Thus, at month 4, it appears that no biological barrier is restricting gene flow between the tissues. In general, this large population of month four viruses does not give rise to later viral populations, with the exception of a few month 12 plasma sequences. Furthermore, this group of month 4 sequences appears to evolve from the earlier plasma virus rather than from the breastmilk virus.

The rest of the later timepoint sequences form two major clades that cluster together with moderate support (89% [A] and 83% [B]). In the first major clade, the majority of the plasma sequences from 12 months form a well-supported clade (92% [A] and 82% [B]) which gives rise to the clade of 24 month sequences (93% [A] and 80% [B]). In the second major clade, the few remaining month four sequences from all three tissues cluster with all of the month 12 breastmilk sequences as well as three plasma month 12 sequences with very high support (98% [A] and 81% [B]). In (A), the month 12 breastmilk sequences cluster with the month 18 breastmilk sequences as well as a few plasma month 24 sequences with moderate support (81%). Alternatively, in (B), the month 18 breast milk sequences cluster with the majority of the plasma sequences from months 12 and 24. This suggests that either the virus in the breastmilk recompartmentalizes after month 12 and evolves independently from the plasma virus (A), or that the breast milk was re-infected by the virus in the plasma at month 18 (B). For the breastmilk month 18 sequences, only a truncated region amplified, probably due to low viral load. This may explain the phylogenetic uncertainty as to their true location. Irrespective of the true position of the month 18 sequences, at month 12, there seems to be limited migration of the virus between the breastmilk and the plasma.
Possible explanations for the apparent lack of gene flow include a biological barrier, such as a renewed tight junction between epithelial cells, or limited production of breastmilk.
Rooting the phylogeny

Because the origin of the virus in both tissues is of great interest, obtaining the correct root for the phylogeny is essential before drawing conclusions. Because the model chosen for the Bayesian analyses assumed a relaxed molecular clock, the most likely position of the root is estimated along with possible topologies. To formally test the position of the root, five alternative consensus Bayesian phylogenies were estimated for the C2V5 (A) dataset (Table 5-6). Both the original analysis with no constrained outgroup and the analysis with the breastmilk group 1 sequences as the outgroup had the highest marginal likelihoods, though the marginal likelihoods for all analyses were very similar to each other and no significant difference could be ascertained.

In order to further investigate the correct rooting, a maximum likelihood (ML) phylogeny was estimated with an assumption of no molecular clock, and the most likely root was estimated from the entire set of all possible roots (Figure 5-9). The topology of the correctly rooted ML tree was similar to the Bayesian consensus phylogeny, although most branches were not well supported and the phylogeny is less resolved; specifically, the temporal structure of the virus evolution is much less clear. The root was placed at the breastmilk week 1 sequences, with a bootstrap value of 100. The group 2 breastmilk sequences are still well supported (87%). The group 3 breastmilk sequences were not supported as a clade in the ML rooted phylogeny, however, and appear to give rise to the plasma sequences. The breastmilk month 12 sequences cluster together and appear to give rise to the month 18 breastmilk sequences, with only a few plasma sequences present in this clade. The plasma month 24 sequences also cluster together (along with one week 1 plasma sequence), although not with the month 12 plasma sequences as in the original Bayesian phylogeny.
However, none of these clades are well supported (<50 bootstrap value).
In general, the ML rooted phylogeny supports the conclusions drawn from the Bayesian phylogeny, in particular the distinct groups in the first timepoint breastmilk sequences. The Bayesian outgroup analysis as well as the ML analysis suggest that the breastmilk sequences did not arise from the plasma and that they share a common ancestor well before birth. The TMRCA was estimated for each of the Bayesian outgroup analyses, and the range was between 1053-1171 days before present (present = 24 months). This corresponds to about one year before birth, before the pregnancy and well before the initiation of breastmilk production. This supports a model in which the breastmilk infection is initiated locally, or else is seeded by infected cells which traffic from tissues in the body other than the plasma. By month 4, the breastmilk virus appears to be replaced by a virus that shares an origin with the plasma virus, while by month 12, the breastmilk virus is again compartmentalized. In general, both the Bayesian and maximum likelihood methods and both alternative alignments result in similar topologies, indicating high phylogenetic signal and robust results.

Branch selection analysis

The dN/dS ratio was tested for each of the branches leading to major clades on the best-model Bayesian consensus tree. For several of the branches, the ratio was significantly different from 1, indicating a deviation from neutrality (data not shown). The branch leading to the right breastmilk sequences at the first timepoint was under negative selection, as was the branch leading to the majority of the month four plasma and breastmilk sequences (Figure 5-10). Three branches towards the later part of the phylogeny were under significant positive selection: the branch leading to all of the late breastmilk sequences, the branch leading to the month 18 milk sequences plus some month 24 plasma sequences, and the branch leading to the majority of the plasma month 24 sequences.
These results suggest changing selective pressures in both the plasma and breastmilk later in infection.
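The criterion used in the branch selection analysis (a branch deviates from neutrality when the 95% confidence interval of its dN/dS estimate excludes 1.0) can be written as a short classification rule. The sketch below is illustrative only; the function name `classify_branch` is hypothetical, and the interval bounds would come from the HYPHY output rather than from this code.

```python
def classify_branch(ci_low: float, ci_high: float) -> str:
    """Classify a branch from the 95% CI of its dN/dS estimate.

    Returns "positive" if the whole interval lies above 1,
    "negative" if the whole interval lies below 1, and
    "neutral" if the interval includes 1 (no significant deviation).
    """
    if ci_low > ci_high:
        raise ValueError("ci_low must not exceed ci_high")
    if ci_low > 1.0:
        return "positive"
    if ci_high < 1.0:
        return "negative"
    return "neutral"
```

For example, an interval of (0.1, 0.6) would be classified as negative selection and (1.4, 3.2) as positive selection, while (0.7, 1.3) would be treated as consistent with neutral evolution.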
Inclusion of breastmilk month 1 sequences

Because the characteristics of the week 1 and the month 4 breastmilk viruses appear very different, i.e. a shift from compartmentalization to mixing with the plasma, additional sequences from month 1 were included in the C2V5 (A) alignment. The Bayesian analysis was re-estimated using the SRD6 model of nucleotide substitution, a relaxed clock with a log-normal distribution of rates, and a constant population size, and the 50% consensus tree was calculated (Figure 5-11). The left breastmilk sequences from month 1 cluster together with a moderate probability (82%), suggesting a common origin, while they cluster with the left breastmilk week 1 group 1 sequences with low probability (57%). The initial right breastmilk week 1 virus does not appear to evolve into the month 1 virus, as these sequences do not cluster together. Rather, the right breastmilk virus appears to have been replaced by a different, but similar, viral population. This supports a model in which infected cells traffic to the breastmilk from other tissues and continuously seed a new infection. Only a subset of infected cells may migrate from the tissue to the milk, which would explain the observed pattern of variation. If locally infected mammary cells were the source of the virus, a stronger evolutionary relationship would be expected between the week 1 and month 1 virus. There is still no support for the plasma having seeded the month 1 virus population.

Another piece of information to be learned from the phylogeny is the relative branch lengths. Typically, longer internal branch lengths relative to the terminal branches are indicative of a constant population size. Conversely, longer terminal branches relative to the internal branch lengths can be indicative of exponential growth (Grenfell et al. 2004).
In general, the internal branch lengths leading to the month 1 clades are much longer than those in the plasma clades, which is indicative of a constant population size. This supports a model in which the infection in the breastmilk is a slowly evolving infection, possibly being reseeded with virus from other
tissues. This could also support a model in which a limited number of cells are available for infection in the breastmilk relative to the plasma, which restricts the potential growth of the infection.

Migration Analysis

To test the hypothesis that the virus in the breastmilk is compartmentalized with respect to the plasma virus, a modified version of the Slatkin-Maddison test was used which compares both the distribution and the median number of four possible events (migration from breastmilk to plasma or plasma to breastmilk, and constant state of breastmilk or plasma) for each tree from the posterior distribution and a set of randomly generated trees. The distributions for all four possible scenarios were significantly different (p<0.0001) from the random expectation. The median for each of the four events was also significantly different (p<0.0001) from the random expectation. The minimum, maximum, and average for each event for the actual and random trees are shown in Figure 5-12. These results suggest that significant compartmentalization of the breastmilk does occur over the two-year period. In order to determine if there was migration between the left and right breasts, the analysis was repeated using nine potential events. Again, both the distribution and the median for each of the nine events in the observed trees were significantly greater than expected by chance (p<0.001 for all comparisons). The minimum, maximum, and averages are shown in Figure 5-13. This suggests that each breast functions as a separate compartment for the virus over time.

Discussion

This study represents the first longitudinal analysis of the evolution of HIV in breastmilk and was designed to answer several questions about the evolution of the breastmilk virus over time. The initial question was whether the virus in the breastmilk originates from the plasma during lactogenesis, or is seeded by either a local infection of the mammary cells or infected
cells trafficking from another tissue. The haplotype analysis of V1V2 and the phylogenetic analysis of C2V5 suggest that the milk virus does not derive from the plasma. V1 and V2 haplotypes in the milk are not present in the plasma at the first timepoint, and the phylogenies clearly show well-supported clades for the milk viruses that do not include plasma virus. The month 1 milk sequences show the same pattern, although they are not unequivocally derived from the week 1 virus. This lends more support to the model of trafficking infected cells, which are infected with related viruses but do not represent the full range of variation in the source tissue. Furthermore, the majority of the mutations are located on the internal branches of the milk clades, rather than on the external branches as for the plasma sequences. This suggests a constant size population of virus in the milk, which could be consistent with a low migration rate of trafficking cells. This hypothesis is also consistent with earlier observations that the T-cells and macrophages in the milk are phenotypically different from the cells in the blood (Bertotto et al. 1990b; Ichikawa et al. 2003; Kourtis et al. 2003). Because only two tissues were included in this study, the tissue of origin cannot be determined. However, one likely source of infection could be the gut associated lymphoid tissue (GALT) (G. Aldrovandi, personal communication). The GALT contains the majority of T-cells in the body, which are the primary target cells of HIV-1. During acute infection in the initial stage of disease, HIV-1 targets this tissue, which results in the massive depletion of T-cells (Douek 2007b; Douek 2007a). The high frequency of infected cells in this tissue is consistent with a model in which T-cells trafficking to the breast milk during the first month post-partum initiate a new infection.
Because the GALT is infected early in the course of disease, this scenario would also explain the observation that the milk virus appears to emerge from the root of the tree and shares an ancestor with the plasma virus >1 year prepartum.
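The timing of the common ancestor relative to birth follows from the TMRCA range given in the Results (1053-1171 days before present, where present is the month 24 sample). The arithmetic can be sketched as below, under the assumptions that the month 24 sample was taken 24 calendar months after birth and that a month averages 365.25/12 days. The constant and function names are hypothetical.

```python
DAYS_PER_YEAR = 365.25
DAYS_PER_MONTH = DAYS_PER_YEAR / 12  # assumed average month length


def days_before_birth(days_before_present: float, present_month: float = 24) -> float:
    """Convert a TMRCA in days before the last sample to days before birth.

    Assumes the last sample ("present") was taken `present_month`
    calendar months after birth.
    """
    return days_before_present - present_month * DAYS_PER_MONTH


def years_before_birth(days_before_present: float, present_month: float = 24) -> float:
    """Same conversion, expressed in years."""
    return days_before_birth(days_before_present, present_month) / DAYS_PER_YEAR
```

Under these assumptions, the reported bounds of 1053 and 1171 days before present correspond to 322.5 and 440.5 days before birth, or roughly 0.9 to 1.2 years.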
A second question addressed by this study was whether the onset of weaning was associated with any evolutionary changes. The mother reported exclusive breastfeeding at four months, although by six months she reported mixed feeding. The milk and plasma virus were part of the same population at the month 4 sample, which appeared to have evolved from the earlier plasma virus rather than from either the milk virus or another tissue. Therefore, it seems that the plasma virus infected the milk before the onset of weaning, which is contrary to the initial expectation that weaning would precede panmixia. The opening of the paracellular pathway at the onset of weaning (<2 feedings per day) is associated with increased levels of plasma derived minerals and nutrients, and would provide an opportunity for the plasma virus to migrate to the milk. It is interesting that the infant was infected at 4 months as well, even though the mother was reportedly exclusively breastfeeding. Unfortunately, the precise timing of the events around four months cannot be ascertained, so an exact order of weaning and HIV infection cannot be determined.

A third question was whether the breastmilk virus compartmentalized over time. The earliest virus is clearly derived from a different population than the plasma virus, but at month 4 the two tissues are experiencing nearly complete migration. By month 12, the milk virus appears again compartmentalized with respect to the plasma virus, and appears to derive from the month 4 milk virus. Overall, this does suggest compartmentalization of the virus, and the analysis for migration indicated much greater compartmentalization than expected by chance. By month 12 and 18, the mother was most likely breastfeeding much less frequently than during the few months after birth, and therefore milk production would have been lower and fewer opportunities may have existed for the plasma virus to enter the milk.
In addition, the initial source of infection does not appear to have re-infected the milk, possibly again due to lowered milk production.
These results may also explain the discrepancies reported earlier in the literature concerning whether the breastmilk virus compartmentalized, since those studies considered very different timepoints. This study demonstrates the importance of choosing sampling times carefully before making generalizations about the origin and evolution of breastmilk virus.

Finally, I examined the selective pressures that were acting on the virus over time. My analysis indicated that an episode of negative selection had occurred during the evolution of the right breastmilk virus at week 1, as well as the breastmilk and plasma at month 4. During the later stages of evolution, several independent episodes of positive selection occurred on branches leading to the month 12 and 24 viruses in both tissues. This could suggest a relaxation of host pressure allowing the viral population to acquire new beneficial mutations. This is also consistent with the longer terminal branch lengths in the later samples, which are indicative of exponential growth. In another study, I showed that the diversity and effective population size of the virus increase as host selective pressure relaxes and the immune system begins to fail (as measured by decline of CD4 cells) (Gray et al., in prep). No clinical information was available for the patient in the current study, so this hypothesis could not be tested. However, this is an avenue of investigation for future studies.

The conclusions drawn from this study have implications for the management of breastfeeding by HIV-infected mothers. First, because the milk virus is distinct from the plasma virus until at least month 1 and because it originated in tissues with potentially different selective pressures, the initial milk virus may have different pathogenicity and/or transmissibility than the plasma virus. If so, this could modulate the risk of MTCT during the first few months and impact the recommendations currently given to women.
Second, the population dynamics of the virus clearly change by month 4. Although the mother reported exclusive breastfeeding, she did initiate weaning sometime within the weeks after this sample was collected. Therefore, although it appears that migration of the virus between tissues occurred prior to weaning, it is difficult to confidently conclude that the two are unlinked. Third, this study conclusively demonstrates that the milk virus is compartmentalized throughout much of its production. This again suggests that the milk virus may have evolved different characteristics that modulate transmission. Further, if the infection in the milk maintains a constant population size as suggested by the phylogeny, and is protected from the more rapidly growing plasma virus, fewer infected cells are available to transmit the virus. Determining exactly why the virus is or is not compartmentalized in the milk may aid our understanding of how to lower the risk of MTCT. Lastly, this study demonstrates that the evolution of the virus in the milk is a dynamic process. I am currently studying two additional patients for whom samples are available over a one-year period to determine whether the same dynamics occur in other patients as well. Understanding the complexity of the evolutionary process is of great importance in understanding how exclusive breastfeeding attenuates the risk of transmission on a molecular level, so that more precise recommendations for avoiding MTCT can be made to HIV-1 infected women worldwide.

This study was conducted within an evolutionary anthropological framework on several levels. Traditional analytical techniques used for population genetics were implemented here to investigate the evolution of a human pathogen, although on a much shorter timescale than traditionally considered.
Because the rate of evolution in HIV-1 is many orders of magnitude faster than in other pathogens, the population dynamics of infection, including response to selective pressures, changes in population size, and migration between compartments, can all be measured within one individual, rather than within and between populations of humans as is usually the case in anthropological genetics. This difference highlights the fact that pathogen dynamics are similar in a micro and a macro environment, and lessons from each level can inform the other. In addition, women in third-world countries, who do not enjoy the benefits available in developed countries, cannot safely choose to formula feed their infants to minimize risk of transmission. However, comparatively little research has been conducted on the molecular characteristics of the virus in milk as compared to other tissues. The recent observations that exclusive breastfeeding may attenuate the risk of transmission are extremely important in both a clinical and an anthropological context. The management of breastfeeding practices by the women themselves in developing countries represents an economical, culturally appropriate, and non-invasive method to control transmission to their infants, without intervention by medical staff, drug companies, or governments. The uniquely interdisciplinary approach of anthropology promotes the importance of providing marginalized women with the tools to manage their infants' health, as well as the analytical framework to investigate and understand the molecular mechanisms underlying the recommendations.
Table 5-1. Number of sequences generated for each tissue.
Tissue  Timepoint  Number of Clones  Part of initial dataset
PL      1W         23                Yes
PL      1W         13 a              No
PL      4M         12                Yes
PL      12M        37 a              Yes
PL      12M        22                No
PL      24M        28 a              Yes
PL      24M        22                No
BML     1W         15                Yes
BMR     1W         31                Yes
BML     1M         28 b              No
BMR     1M         13 b              No
BML     4M         11                Yes
BMR     4M         9                 Yes
BMR     12M        14                Yes
BML     18M        4                 Yes
a These sequences were generated using less starting template in the PCR reaction and were not included in any of the phylogenetic analyses.
b These sequences were included in a subset of analyses.

Table 5-2. Sequence characteristics of V1 and V2.
                  V1                                                    V2
Tissue/Timepoint  Haplotypes                         #GlySites  Length  Haplotypes                         #GlySites  L
PL1W              A(.03) C(.11) D(.86)               1-4        24-37   A(.06) B(.94)                      2-3        4
PL4M              A(.08) C(.16) D(.75)               2-4        33-37   B(.84) C(.08) E(.08)               2-3        4
PL12M             A(.03) B(.05) C(.31) D(.61)        2-4        25-37   A(.05) B(.46) C(.05) D(.41) E(.3)  2-3        4
PL24M             A(.04) B(.48) C(.1) D(.22) E(.16)  2-5        22-44   A(.38) B(.32) C(.16) D(.12)        1-3        4
BML1W             A(1)                               1          23      E(1)                               3          5
BMR1W             A(.29) B(.32) C(.39)               1-3        23-36   A(.42) C(.29) E(.29)               2-3        4
BML1M             B(.96) F(.04)                      1-4        27-46   A(.25) C(.75)                      1-2        4
BMR1M             A(.08) B(.23) D(.08) F(.54)        1-4        24-46   A(.15) B(.08) C(.77)               1-2        4
BML/R4M           A(.05) D(.95)                      2-4        29-37   B(.8) C(.1) E(.1)                  2-3        4
BMR12M            D(1)                               4          37      D(1)                               3          4
BML18M            D(1)                               4          37      D(1)                               3          4
Haplotypes were assigned based on aa motif, length, and glycosylation sites. The frequency of each haplotype is given in parentheses.
Table 5-3. Combination of V1 and V2 haplotypes.
V1 HAP A A A B B B C C D D D D D E F
V2 HAP A C E A B C A B A B C D E C C
PLA1W 1 4 1 30
PLA4M 1 2 7 1 1
PLA12M 2 3 18 9 3 24
PLA24M 2 15 9 1 4 1 3 1 6 8
BML1W 15
BMR1W 9 1 9 12
BML1M 7 20 1
BMR1M 2 3 1 7
BML4M 1 9 1 1
BMR4M 7 1
BML12M 14
BMR18M 4
The number of clones displaying each combination of V1 and V2 haplotypes is given for each tissue/timepoint. Haplotypes were defined by length, aa motif, and number of glycosylation sites.

Table 5-4. Hudson test for population structure.
Comparison                    Group 1    Group 2    p-value
intra-timepoint tissues       BML 1W     BMR 1W     p<0.001
intra-timepoint tissues       BML 4M     BMR 4M     p=0.935
intra-timepoint tissues       BML 1W     PLA 1W     p<0.001
intra-timepoint tissues       BMLR 1W    PLA 1W     p<0.001
intra-timepoint tissues       BML/R 4M   PLA 4M     p=0.428
intra-timepoint tissues       BMR 12M    PLA 12M    p<0.001
inter-timepoint, same tissue  PLA 1W     PLA 4M     p<0.001
inter-timepoint, same tissue  PLA 1W     PLA 12M    p<0.001
inter-timepoint, same tissue  PLA 1W     PLA 24M    p<0.001
inter-timepoint, same tissue  PLA 4M     PLA 12M    p<0.001
inter-timepoint, same tissue  PLA 4M     PLA 24M    p<0.001
inter-timepoint, same tissue  PLA 12M    PLA 24M    p<0.001
inter-timepoint, same tissue  BML 1W     BML/R 4M   p<0.001
inter-timepoint, same tissue  BML 1W     BMR 12M    p<0.001
inter-timepoint, same tissue  BMR 1W     BML/R 4M   p<0.001
inter-timepoint, same tissue  BMR 1W     BMR 12M    p<0.001
inter-timepoint, same tissue  BML/R 4M   BMR 12M    p<0.001
inter-timepoint, same tissue  BMR 12M    BML 18M    p<0.001
Table 5-5. Number of putative recombinant clones.
Alignment:  V1V5  V1V3  V1V2  C2V3  C2V5 (A)  C2V5 (B)  C2V5 (A) c  C2V5 (A) d
Initial p-value:  p<10^-19  p<10^-19  p=8.43x10^-7  p=0.191  p=0.002  p=0.002  p=0.027  p=0.006
PL 1wk:  UNDET b  UNDET  3  0  2  2  2  3
PL 1wk a:  UNDET  UNDET  *  0
PL 4m:  UNDET  UNDET  1  0  6  6  6  6
PL 12m:  UNDET  UNDET  0  0  4  5  4  6
PL 12m a:  UNDET  UNDET  *  0
PL 24m:  UNDET  UNDET  10  0  0  2  0  5
PL 24m a:  UNDET  UNDET  *  9
BML 1w:  UNDET  UNDET  0  0  2  15  2  2
BMR 1w:  UNDET  UNDET  12  0  14  0  14  14
BML 4m:  UNDET  UNDET  1  0  2  2  2  2
BMR 4m:  UNDET  UNDET  0  0  2  2  2  2
BMR 12m:  UNDET  UNDET  0  0  0  0  0  0
BML 18m:  UNDET  UNDET  0  0  0  0  0  0
The number of identified clones in each alignment is given for each tissue/timepoint.
a These sequences were generated using 1/10 of the amount of starting template and were only used in this analysis.
b For these alignments, a dataset that did not include recombinants could not be obtained.
c This alignment included the additional breastmilk sequences from month 1 in addition to the non-recombinant C2V5 (A) dataset.
d This alignment included the additional plasma sequences in addition to the non-recombinant C2V5 (A) dataset.

Table 5-6. Marginal likelihoods for models used in the Bayesian analysis.
Model     Outgroup             Marg. Lik.
SRD6, SC  none                 -4362.9985
SRD6, RC  none                 -4321.3865
GTR2, RC  none                 -4278.6483
GTR3, RC  none                 -4270.3296
SRD6, RC  BML 1W (Group 1)     -4321.3865
SRD6, RC  BMR 1W (Group 2)     -4322.9025
SRD6, RC  BMR 1W (Group 3)     -4322.7647
SRD6, RC  PLA 1W (4 seq.)      -4323.1764
SRD6, RC  PLA 1W (2 seq.)      -4322.8887
The definition of each outgroup is given in the text. SRD6 = Shapiro, Rambaut, and Drummond (2006) model. GTR = general time reversible with two or three partitions of the data based on codon position. SC = strict clock assumption. RC = relaxed clock assumption.
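The model comparison summarized in Table 5-6 can be sketched as follows. This is a minimal illustration only, assuming that the tabulated marginal likelihoods are natural-log values, so that the log Bayes factor between two models is simply the difference of their entries. The dictionary keys and function names are hypothetical labels introduced here, and only the rows without an outgroup are included.

```python
# Marginal likelihoods copied from Table 5-6 (rows with no outgroup).
# Assumption: the values are natural-log marginal likelihoods.
MARGINAL_LOG_LIK = {
    "SRD6, SC": -4362.9985,
    "SRD6, RC": -4321.3865,
    "GTR2, RC": -4278.6483,
    "GTR3, RC": -4270.3296,
}


def log_bayes_factor(model_a: str, model_b: str) -> float:
    """Return ln BF of model_a over model_b; positive values favor model_a."""
    return MARGINAL_LOG_LIK[model_a] - MARGINAL_LOG_LIK[model_b]


def best_model() -> str:
    """Return the label of the model with the highest marginal likelihood."""
    return max(MARGINAL_LOG_LIK, key=MARGINAL_LOG_LIK.get)


if __name__ == "__main__":
    # Relaxed clock versus strict clock under the SRD6 model.
    print(round(log_bayes_factor("SRD6, RC", "SRD6, SC"), 4))
    print(best_model())
```

Under this assumption, the relaxed clock is favored over the strict clock for the SRD6 model by about 41.6 log units, and the three-partition GTR model has the highest marginal likelihood of the four.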
Figure 5-1. Sampling times and tissues. Sampling times are on the top of the graph. W=week, M=month. Tissues which were sampled at each time are on the bottom: PL=plasma, BML=left breastmilk, BMR=right breastmilk.

Figure 5-2. Neighbor-joining phylogeny of all subtypes in group M plus this patient. Three sequences from week 1 are in red. Bootstrap values based on 1,000 neighbor-joining replicates are indicated above each major branch. Branch lengths are in units of substitutions/site.
Haplotype  Sequence                                        Length  #Gly  # Clones
A          CTNLA NDTA RRIIAK SMEEEVKNC                     24      1     27
A          CTEIR NITGN NRTIDF SMKGEVQNC                    25      2     2
A          CTEIR NITDGG NRTIDF SMREEIKNC                   26      2     2
A          CTEIR NITGGGTDNNRTIDF SMKGEVKNC                 29      2     1
A          CTEIYNSTSG NSTDGGKDNNRNIDL SMQGEVKNC            34      2     1
B          CTNLNRTFV NYTS SMKEEIKNC                        22      2     2
B          CTNLNRTIA NDTSG I SMQEEIKNC                     24      2     3
B          CTNLNRTIG NDTSD AN GMKEEIKNC                    25      2     1
B          CTNLKNTIV NDTSG SDTS SMKEEIKNC                  27      1     7
B          CTNLNRTIV NDTSG SDTS SMKEEIKNC                  27      2     54
C          CTNLKNTTVNGT SGNGARTIDS NMKGEVKNC               31      2     1
C          CTNLTNTTVNGT GSGNDTKRTVDS SMEGEVKNC             33      3     6
C          CTNLKNTTVNGT SDTSGGNNNNRTIDN SMKEEIKNC          36      3     33
D          CTNLKNTTVNGTVG NGTGGG RTIDS NMKGEVKNC           34      3     2
D          CTNLKNTTVNSTSG TSNGNDKKRTIDS SMKGEVKNC          36      2     3
D          CTNLKNTTVNGTSG TSGGTDNNRTIDF SMKGEVKNC          36      3     1
D          CTNLKNITVNGTSG NSTSGGNENRTIDS SMKGEVKNC         37      4     44
D          CTNLKNITVNGISG NSTGGGTAINRTIDF SMKEEIKNC        38      3     6
D          CTNLKNTTVNGTSG NSTGGGNGNNRTIDL SMKGEVKNC        38      4     61
D          CTNLKNITVNGTSS NSTGDGNDTNRTIDS SMTEEVKNC        38      5     2
D          CTNLQNITVNGTSSNSTSGNSTGGGTDYNRTIDF SMKGEVKNC    43      5     3
E          CEELNGTIVNDTYSNDATFNNNTSGGIKRNKTIVR SMEGEVKNC   44      4     8
F          CTKPSGTIVNDTAGNGTMNDTAG-NGTMNGGDRKIEISMEEEVKNC  46      5     8

Figure 5-3. Haplotype analysis of V1. Haplotypes were assigned based on amino acid (aa) motif, length, and number of glycosylation sites (highlighted in green). Representative sequences for each haplotype are shown. Sequences within a haplotype that differed in aa motif but shared the same length and number of glycosylation sites are not shown.
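The glycosylation-site counts listed in Figure 5-3 can be illustrated with a minimal Python sketch. It assumes that a potential N-linked glycosylation site is the standard sequon N-X-S/T, where X is any residue except proline, and that spaces and dashes in the figure are alignment gaps. Neither assumption is stated in the text, and the function name is a hypothetical one chosen for illustration. Overlapping motifs are counted separately here, which may differ from how adjacent sites were scored in the figure.

```python
import re

# Assumed sequon definition: N, then any residue except P, then S or T.
# The lookahead lets overlapping motifs (e.g. NNTS) be counted separately.
SEQUON = re.compile(r"N(?=[^P][ST])")


def count_glycosylation_sites(aligned_seq: str) -> int:
    """Count potential N-linked glycosylation sites in an amino acid sequence."""
    # Strip alignment gaps (assumed to be whitespace or dashes) before scanning.
    seq = re.sub(r"[\s-]", "", aligned_seq).upper()
    return len(SEQUON.findall(seq))


if __name__ == "__main__":
    # The first two haplotype A sequences from Figure 5-3.
    for seq in ("CTNLA NDTA RRIIAK SMEEEVKNC", "CTEIR NITGN NRTIDF SMKGEVQNC"):
        print(seq.replace(" ", ""), count_glycosylation_sites(seq))
```

Under these assumptions, the two haplotype A sequences shown yield one and two sites respectively, in agreement with the counts given for them in the figure.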
Haplotype  Sequence                                             Length  #Gly  # Clones
A          CSFNTTTEIHDKQQKVHALFYRLDIAQLDKDRND YRLINC            40      1     1
A          CSFNTTTEIHDKQQKVHALFYRLDIAQLDNNNET YRLINC            40      2     20
A          CSFNTTTEIHDKQQKVHALFYRLDIAQLDNDSRT YRLINC            40      2     25
B          CSFKTTTEIRDREQKVHALFYRLDIQPLGNETEEGGT YRLINC         43      1     3
B          CSFNTTTEINDRKQKVHALFYRLDIQPLGNETEEGGT YRLINC         43      2     9
B          CSFNTTTEIRDREQKVHALFYRLDIQPLGNETEENST YRLINC         43      3     71
B          CSFNTTTEVNDRKQKVHALFYRLDIQPLGNETKEGNGT YRLINC        44      3     13
B          CSFNTTTEINDRQQKVRALFYRLDIQPLGNETNKEGNGT YRLINC       45      3     2
B          CSFNTTTEINDRKQKVHALFYRLDIQPLGNETTEGNGTYT YRLINC      46      3     4
C          CSFRANTEKDRKQNVTALFYRLDIAPLD KGNKT YRLINC            40      1     8
C          CSFNTTTEIHDRQQKVHALFYRLDIQQLD KERNDT YRLINC          41      2     3
C          CSFNTTTEIQDKQQKVHALFYRLDIQQLD KEGDDTSET YRLINC       44      1     2
C          CSFNTTTEIQDRQQKVHALFYRLDIQPLD KEGNDTFET YTLINC       44      2     43
D          CSFNTTTEIQDRQQKVHALFYRLDIQPLGNKTTKEGNDTFET YTLINC    48      3     48
E          CSFNTTTEIHDRQQKVHALFYRLDIQPLEEGNKGGNDTTERENGTYTLINC  51      3     29

Figure 5-4. Haplotype analysis of V2. Haplotypes were assigned based on amino acid (aa) motif, length, and number of glycosylation sites (highlighted in green). Representative sequences for each haplotype are shown. Sequences within a haplotype that differed in aa motif but shared the same length and number of glycosylation sites are not shown.
[Diagram of the variable (V) and constant (C) regions with boundary positions 131, 157, 196, 296, 331, 385, 418, 460, and 471; alignment labels include V1V2 and C2V5.]

Figure 5-5. Recombination alignments. Red lines represent alignments for which a non-recombinant dataset could not be obtained. Green lines represent alignments for which non-recombinant datasets were obtained.

Figure 5-6. Network of breast milk sequences from week 1. Breastmilk sequences from week 1 are categorized into four groups based on clustering in the network. Branch lengths are in substitutions/site.
Figure 5-7. Bayesian consensus phylogeny for C2V5 (A). Branches are color coded according to tissue as follows: blue = right breast, red = left breast, green = plasma. Posterior probabilities are given above each major branch. The timepoint at which sequences were sampled is indicated by a symbol (see legend). The three groups of breastmilk sequences at week 1 are labeled as categorized in Figure 5-6.

Figure 5-8. Bayesian consensus phylogeny for C2V5 (B). Branches are color coded according to tissue as follows: blue = right breast, red = left breast, green = plasma. Posterior probabilities are given above each major branch. The timepoint at which sequences were sampled is indicated by a symbol (see legend). The three groups of breastmilk sequences at week 1 are labeled as categorized in Figure 5-6.

Figure 5-9. Best-rooted maximum likelihood phylogeny. Branches are color coded according to tissue as follows: blue = right breast, red = left breast, green = plasma. Bootstrap values >50 based on 1,000 replicates are given above each major branch. The timepoint at which sequences were sampled is indicated by a symbol (see legend). The three groups of breastmilk sequences at week 1 are labeled as categorized in Figure 5-6.

Figure 5-10. Bayesian consensus phylogeny for the C2V5 (A) dataset with branches under significant selection. Branches are color coded according to tissue as follows: blue = right breast, red = left breast, green = plasma. The timepoint at which sequences were sampled is indicated by a symbol (see legend). The three groups of breastmilk sequences at week 1 are labeled as categorized in Figure 5-6. Monophyletic sequences from the same tissue are collapsed. Thick black branches indicate significant negative selection, and thick red branches indicate significant positive selection.

Figure 5-11. Bayesian consensus phylogeny with breast milk month 1 sequences. Branches are color coded according to tissue as follows: blue = right breast, red = left breast, green = plasma. Posterior probabilities are given above each major branch. The timepoint at which sequences were sampled is indicated by a symbol (see legend). The three groups of breastmilk sequences at week 1 are labeled as categorized in Figure 5-6.

Figure 5-12. Migration analysis for two tissues. (a) The migration matrix calculated from the posterior distribution of trees estimated using the C2V5 (A) dataset. Circles are proportional to the minimum (dark blue), average (light blue), and maximum (white) number of events for each category. (b) The migration matrix calculated from 10,000 random trees.

Figure 5-13. Migration analysis for three tissues. (a) The migration matrix calculated from the posterior distribution of trees estimated using the C2V5 (A) dataset. Circles are proportional to the minimum (dark blue), average (light blue), and maximum (white) number of events for each category. (b) The migration matrix calculated from 10,000 random trees.
CHAPTER 6
CONCLUSION

The impact of disease on the human population is of immediate and practical concern. Emerging infectious diseases, defined as "infections that have newly appeared in a population or have existed previously but are rapidly increasing in incidence or geographic range" (Morse 1995), have been significantly increasing over the past 50 years (Jones et al. 2008). Emergence is largely due to increasing urbanization, wars, environmental degradation, and global interconnectivity (Fauci 1998; Stephens et al. 1998a; Desselberger 2000; Pollard and Dobson 2000; Feldmann et al. 2002; Fauci, Touchette, and Folkers 2005). Tuberculosis, malaria, and HIV-1, as well as emerging threats such as SARS, West Nile Virus, and influenza, are major threats to international public health (Fauci 1998; Morens, Folkers, and Fauci 2004; Fauci, Touchette, and Folkers 2005). Infectious diseases are the second leading cause of death in the pre-industrialized world (Fauci, Touchette, and Folkers 2005) and account for >25% of all deaths worldwide (Morse 1995). In addition, complex diseases including cancer, diabetes, and heart disease are responsible for 87% of deaths in high income countries and 43% of deaths in low income countries; 40-80% of these deaths could be avoided with lifestyle changes (WHO 2008a; WHO 2008b), though myriad studies suggest that these diseases have a genetic component as well. By 2030, heart disease and HIV/AIDS are predicted to be the largest components of the burden of human disease (Mathers and Loncar 2006).

Due to the increasing global impact and complicated manner of transmission and inheritance of infectious and complex diseases, a comprehensive approach is essential to diagnose, treat, and eradicate human diseases. My dissertation demonstrates how genetic anthropology can be used to address both anthropological and clinical concerns from three perspectives.
As anthropological geneticists, we are uniquely positioned to use the analytical tools of evolutionary genetics to study the biological mechanisms of diseases, while maintaining a holistic approach that considers the cultural, historical, and demographic factors which influence etiology. We can also use our expertise to influence US and international policy by advocating for culturally sensitive implementation of scientific findings. I believe that lack of dialogue between fields, even those that use similar molecular and bioinformatics techniques, can substantially impede progress towards shared goals of treating and eradicating human disease. However, my study also demonstrates that a conscientious treatment of the clinical and policy implications does not preclude simultaneously addressing more traditional biological anthropological questions involving human evolution and migration.

Both anthropological and clinical implications were addressed for the four projects included in this dissertation. First, with regard to the evolution of treponemal diseases, I found that the genetic variation present within and between treponemal subspecies was largely the result of a particular type of homologous recombination called gene conversion. Furthermore, the largest number of gene conversion events took place in the venereal syphilis genome, which is largely regarded as the most recently evolved of the three subspecies. These recombinant events distort the phylogenetic and molecular relationships among the subspecies, and therefore I conclude that relying on single nucleotide polymorphisms without considering the contextual sequence information is insufficient to discern evolutionary relationships among the treponemes. Second, my molecular data suggest that the three subspecies are molecularly distinct based on high bootstrap values for the phylogenetic branches separating the subspecies, as well as a significant amount of among-subspecies variation.
However, this could be the result of geographic structure and does not necessarily support a classification of three diseases. Third, the molecular data do not appear to support a dramatically older origin of yaws relative to venereal syphilis but instead are consistent with a relatively coincident evolution of the three human treponemal subspecies. Moreover, the venereal syphilis sequences harbor more variation than would be expected under the modified Columbian hypothesis of evolution of venereal syphilis within the past 500 years (Baker and Armelagos 1988). This study was able to compare support for the leading anthropological hypotheses for the evolution of syphilis, as well as provide new genomic regions suitable for distinguishing among the three diseases.

In the second project, I investigated the association between genotypic data from the ADH and ALDH alcohol metabolism genes and both dichotomous and continuous substance abuse phenotypes in a Plains population of Native Americans. In the third project, I genotyped and analyzed genotype data from the SNCA gene in ~1000 individuals from the same Plains population and a second Native American population from the Southwest United States. Despite the extensive genetic data, I found no correlation between any of the ADH, ALDH, or SNCA markers and the substance abuse phenotypes. For the SNCA gene, this may suggest that unassayed promoter polymorphisms are affecting the expression of the gene and risk of substance abuse rather than variants within the gene itself, since excessive mRNA and protein levels have been associated with alcohol use disorders. Thus, future studies should investigate the genetic variation upstream of SNCA for an association with alcohol use. Alternatively, the evolutionary history of Native Americans could explain the lack of association between the assayed genetic markers and substance abuse. Native Americans experienced a severe population bottleneck during migration from Asia to the New World, in which much of the ancestral genetic variability was lost (Mulligan et al. 2004; Ramachandran et al. 2005).
It was known from previous research that the ADH and ALDH alleles already identified as protective against alcoholism in Asians were not present in the tested Native American populations, although the fact that the genes were implicated in the risk of alcoholism motivated my analysis of additional ADH and ALDH alleles. My research demonstrated that the SNCA upstream polymorphism previously associated with alcoholism was not present in the tested populations, consistent with a severe population bottleneck during Native American evolutionary history. Thus, these results represent a fairly comprehensive investigation of the major candidate genes and alleles for association with alcoholism in Native Americans, and the lack of association with such alleles suggests that a non-genetic approach may represent the best strategy for treatment of alcoholism in these populations. I believe that additional resources should be dedicated to substance abuse intervention efforts that target societal and cultural causes, such as poverty, lack of health care, and unemployment.

In the final project, I conducted the first longitudinal study of the evolution of HIV-1 in the breast milk and blood plasma of an HIV-positive mother over a two-year period. The goal of the study was to elucidate the molecular mechanisms responsible for the observation that exclusive breastfeeding reduces the risk of transmission relative to mixed feeding (Coutsoudis et al. 1999; Coutsoudis 2000; Coutsoudis et al. 2001; Coutsoudis et al. 2002; Iliff et al. 2005; Coovadia et al. 2007), as well as to understand the general evolutionary patterns of the virus in different tissues within a single patient over time. I determined that the virus in the breastmilk post-partum is distinct from the virus contemporaneously circulating in the plasma, which suggests a tissue other than the plasma is seeding the milk infection. Because the initial milk virus may have different pathogenicity and/or transmissibility than the plasma virus, future studies should investigate the phenotypic characteristics of the early milk virus.
Intriguingly, the population dynamics of the infection in this patient clearly change by month 4, coinciding with the transmission of the virus to the infant in this case. The mother also reported ceasing exclusive breastfeeding shortly thereafter. However, the exact timing of these events is unclear, and therefore conclusions about a causal link between changes in the breastmilk virus population, weaning, and HIV transmission to the baby remain uncertain. This study also demonstrates conclusively that the milk virus is compartmentalized throughout much of its production, and that the evolution of the virus in the milk is a dynamic process. Understanding the complexity of the evolutionary process is of great importance in determining the relationship between feeding practices and transmission on a molecular level, so that women can be empowered to effectively manage their breastfeeding practices to minimize the risk of transmission. Thus, the uniquely interdisciplinary approach of anthropology promotes the importance of providing marginalized women with the tools to manage their infants' health, as well as the analytical framework to investigate and understand the molecular mechanisms underlying the recommendations.

My dissertation also has broader implications for the field of genetic anthropology. First, I use a model that incorporates three distinct perspectives from which human disease can be approached. Each perspective is temporally and philosophically distinct and includes different levels of human variation. While anthropologists traditionally employ the evolutionary or population perspective, we have the tools and the expertise to inform studies from the clinical perspective as well. Second, I have demonstrated that incorporating studies of pathogen genetics is valuable in understanding the interaction between humans and disease. Pathogens can be studied from a clinical perspective, such as HIV-1 in an individual, as well as an evolutionary perspective, such as the movement of treponemes across continents.
Pathogen evolutionary dynamics appear to be similar across temporally disparate perspectives; one example is the high degree of gene conversion/recombination that appears to be characteristic of both T. pallidum and HIV-1. Uncovering uniform processes recurring in the evolution of infectious diseases will allow us to better understand human-pathogen interactions, and may aid in developing strategies to eradicate infectious diseases. Lastly, because both infectious and complex diseases are increasing concerns to global health, genetic anthropologists should widen their focus to address the clinical and policy implications implicit in studies that primarily address evolutionary questions. Genetic anthropologists are equipped with the analytical tools to study the biological mechanisms of diseases and incorporate information about the underlying population structure and evolutionary history. This unique perspective allows genetic anthropologists to provide clinical and policy recommendations based on a comprehensive evaluation of genetic data. Finally, the multidisciplinary approach employed by anthropologists can be valuable in ensuring that resulting applications of the data are culturally appropriate and provide maximum health benefits to communities in need.
APPENDIX A
LIST OF QUESTIONS FOR SUBSTANCE ABUSE CATEGORIZATION

1. Was there ever a time when, because of your drinking, you often missed work, had trouble on the job, or were unable to take care of household responsibilities?
2. Did you ever lose a job because of your drinking?
3. Did you often have difficulties with your family, friends, or acquaintances because of your drinking?
4. Was there ever a period in your life when you drank too much?
5. Has anyone in your family or anyone else ever objected to your drinking?
6. Was there ever a time when you often couldn't stop drinking when you wanted to?
7. Have you ever had traffic difficulties because of your drinking, like reckless driving, accidents, or speeding?
8. Have you often had tremors that were most likely due to drinking? (i.e., when you stopped or cut down? After not drinking for a few hours or more, did you often drink to keep yourself from getting the shakes or becoming sick?)
9. Was there ever a time when you frequently had a drink before breakfast?
10. Were you ever divorced or separated primarily because of your drinking?
11. Have you ever gone on a bender? (Definition: drinking steadily for 3 or more days, more than a fifth of whiskey daily/24 bottles of beer/3 bottles of wine. Must have occurred 3 or more times.)
12. Have you ever been physically violent while drinking? (Must have occurred on at least 2 occasions.)
13. Have you ever been picked up by the police because of how you were acting while you were drinking (e.g., disturbing the peace, fighting, public intoxication)? (Do not include traffic difficulties.)
14. Have you ever had blackouts? (Definition: memory loss for events that occurred while conscious during a drinking episode.)
15. Have you ever had the DT's? (Definition: confused state following stopping drinking that includes disorientation and illusions or hallucinations.)
16. Did you ever hear voices or see things that weren't really there, soon after you stopped drinking? (Hallucinations must have occurred on at least two separate occasions.)
17. Have you ever had a seizure or fit (non-epileptic) after you stopped drinking?
18. Did a doctor ever tell you that you had developed a physical complication of alcoholism, like gastritis, pancreatitis, cirrhosis, or neuritis? (Include good evidence of Korsakoff's Syndrome: chronic brain syndrome with anterograde amnesia as the predominant feature.)
LIST OF REFERENCES Abbate, I., G. Cappiello, R. Longo, A. Ursitti, A. Spano, S. Calcaterra, F. Dianzani, A. Antinori, and M. R. Capobianchi. 2005. Cell memb rane proteins and quasispecies compartmentalization of CSF and plasma HI V-1 from aids patients with neurological disorders. Infect Genet Evol 5:247-253. Achaz, G., S. Palmer, M. Kearney, F. Maldarelli, J. W. Mellors, J. M. Coffin, and J. Wakeley. 2004. A robust measure of HIV-1 population turnover within chr onically infected individuals. Mol Biol Evol 21:1902-1912. Allen, S. J., A. O'Donnell, N. D. Alexander, M. P. Alpers, T. E. Peto, J. B. Clegg, and D. J. Weatherall. 1997. alpha+-Thalasse mia protects children against disease caused by other infections as well as malaria. Proc Natl Acad Sci U S A 94:14736-14741. Amos, A. F., D. J. McCarty, and P. Zimmet. 1997 The rising global burden of diabetes and its complications: estimates and projec tions to the year 2010. Diabet Med 14 Suppl 5 :S1-85. Anderson, D. G., and J. C. Gillam. 2000. Paleoind ian colonization of the Americas: implications from an examination of physiography, de mography, and artifact distribution. Am Antiquity 65 :43-66. Armelagos, G. J., and J. Dewey. 1975. Evolutionary response to human infectious disease. Bioscience 20:271-275. Armelagos, G. J., K. N. Harper, and P. S. Ocampo. 2005. On the Trail of the Twisted Treponeme: Searching for the Origins of S yphilis. Evolutionary Anthropology: Issues, News, and Reviews 14:240-242. Arthos, J., C. Cicala, E. Martinelli, K. Macleod, D. Van Ryk, D. Wei, Z. Xiao, T. D. Veenstra, T. P. Conrad, R. A. Lempicki, S. McLaughlin, M. Pascuccio, R. Gopaul J. McNally, C. C. Cruz, N. Censoplano, E. Chung, K. N. Reitano, S. Kottilil, D. J. Goode, and A. S. Fauci. 2008. HIV-1 envelope protein binds to and si gnals through integrin alpha(4)beta(7), the gut mucosal homing receptor for peripheral T cells. Nat Immunol. Arthur, P. G., M. Smith, and P. E. Hartmann. 1989. 
M ilk lactose, citrate, and glucose as markers of lactogenesis in normal and diabetic women. J Pediatr Gastroenterol Nutr 9:488-496. Baker, B. J., and G. J. Armelagos. 1988. The orig in and antiquity of syphilis: paleopathological diagnosis and interpretation. Curr Anthropol 29 :703-738. Baldo, L., S. Bordenstein, J. J. Wernegreen, and J. H. Werren. 2006. Widespread recombination throughout Wolbachia genomes. Mol Biol Evol 23:437-449. Barker, D. J., C. N. Hales, C. H. Fall, C. Osmond, K. Phipps, and P. M. Clark. 1993. Type 2 (non-insulin-dependent) diabetes mellitus, hypertension and hyperlipidaemia (syndrome X): relation to reduced fe tal growth. Diabetologia 36:62-67. Barrett, J. C., B. Fry, J. Maller, and M. J. Da ly. 2005. Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics 21:263-265. Barrett, R., C. W. Kuzawa, T. McDade, and G. J. Armelagos. 1998. Emerging and re-emerging infectious diseases: The third epidemiological transition. A nnual Review of Anthropology 27:247-271. Becquart, P., N. Chomont, P. Roques, A. Ayouba, M. D. Kazatchkine, L. Belec, and H. Hocini. 2002. Compartmentalization of HIV-1 between breast milk and blood of HIV-infected mothers. Virology 300:109-117.
Becquart, P., H. Hocini, B. Garin, A. Se pou, M. D. Kazatchkine, and L. Belec. 1999. Compartmentalization of the IgG immune re sponse to HIV-1 in breast milk. Aids 13:1323-1331. Becquart, P., H. Hocini, M. Levy, A. Sepou, M. D. Kazatchkine, and L. Belec. 2000. Secretory anti-human immunodeficiency virus (HIV) an tibodies in colostrum and breast milk are not a major determinant of the protection of ear ly postnatal transmission of HIV. J Infect Dis 181:532-539. Belfer, I., H. Hipp, C. McKnight, C. Evans, B. Buzas, A. Bollettino, B. Albaugh, M. Virkkunen, Q. Yuan, M. Max, D. Goldman, and M. Enoch. 2006. Association of galanin haplotypes with alcoholism and anxiety in two ethnically distinct popu lations. Molecular Psychiatry 11:301-311. Benyshek, D. C., and J. T. Watson. 2006. Explor ing the thrifty genotype's food-shortage assumptions: a cross-cultural comparison of ethnographic accounts of food security among foraging and agricultural soci eties. Am J Phys Anthropol 131:120-126. Berger, E. A., R. W. Doms, E. M. Fenyo, B. T. Korber, D. R. Littman, J. P. Moore, Q. J. Sattentau, H. Schuitemaker, J. Sodroski, and R. A. Weiss. 1998. A ne w classification for HIV-1. Nature 391:240. Bertotto, A., G. Castellucci, G. Fabietti, F. Scalise, and R. Vaccaro. 1990a. Lymphocytes bearing the T cell receptor gamma delta in human breast milk. Arch Dis Child 65 :1274-1275. Bertotto, A., R. Gerli, G. Fabietti, S. Crupi C. Arcangeli, F. Scalise, and R. Vaccaro. 1990b. Human breast milk T lymphocytes display the phenotype and functiona l characteristics of memory T cells. Eur J Immunol 20:1877-1880. Bibbins-Domingo, K., and A. Fernandez. 2007. Bi Dil for heart failure in black patients: implications of the U.S. Food and Drug Administration approval. Ann Intern Med 146:52-56. Bjorndal, A., H. Deng, M. Jansson, J. R. Fiore, C. Colognesi, A. Ka rlsson, J. Albert, G. Scarlatti, D. R. Littman, and E. M. Fenyo. 1997. 
Coreceptor usage of primary human immunodeficiency virus type 1 isolates varies according to biological phenotype. J Virol 71:7478-7487. Blanco-Gelaz, M. A., A. Lopez-Vazquez, S. Garcia-Fernandez, J. Martinez-Borra, S. Gonzalez, and C. Lopez-Larrea. 2001. Genetic variability, molecular evolution, and geographic diversity of HLA-B27. Hum Immunol 62:1042-1050. Bonsch, D., V. Greifenberg, K. Bayerlein, T. Biermann, U. Reulbach, T. Hillemacher, J. Kornhuber, and S. Bleich. 2005a. Alpha-synuclein protein levels are increased in alcoholic patients and are linked to craving. Alcohol Clin Exp Res 29:763-765. Bonsch, D., T. Lederer, U. Reulbach, T. Hothorn, J. Kornhuber, and S. Bleich. 2005b. Joint analysis of the NACP-REP1 marker within the alpha synuclein gene concludes association with alcohol dependence. Hum Mol Genet 14:967-971. Bonsch, D., B. Lenz, J. Kornhuber, and S. Bleich. 2005c. DNA hypermethylation of the alpha synuclein promoter in patients with alcoholism. Neuroreport 16:167-170. Bonsch, D., U. Reulbach, K. Bayerlein, T. Hillemacher, J. Kornhuber, and S. Bleich. 2004. Elevated alpha synuclein mRNA levels are associated with craving in patients with alcoholism. Biol Psychiatry 56:984-986. Borras, E., C. Coutelle, A. Rosell, F. Fernandez-Muixi, M. Broch, B. Crosas, L. Hjelmqvist, A. Lorenzo, C. Gutierrez, M. Santos, M. Szczepanek, M. Heilig, P. Quattrocchi, J. Farres, F. Vidal, C. Richart, T. Mach, J. Bogdal, H. Jornvall, H. K. Seitz, P. Couzigou, and X.
Pares. 2000. Genetic polymorphism of alcohol dehydrogenase in europeans: the ADH2*2 allele decreases the risk for alcoholism and is associated with ADH3*1. Hepatology 31:984-989. Brass, A. L., D. M. Dykxhoorn, Y. Benita, N. Yan, A. Engelman, R. J. Xavier, J. Lieberman, and S. J. Elledge. 2008. Identification of host proteins required for HIV infection through a functional genomic screen. Science 319:921-926. Briggs, D. R., D. L. Tuttle, J. W. Sleasman, and M. M. Goodenow. 2000. Envelope V3 amino acid sequence predicts HIV-1 phenotype (co-receptor usage and tropism for macrophages). Aids 14:2937-2939. Broder, C. C., and R. G. Collman. 1997. Chemokine receptors and HIV. J Leukoc Biol 62:20-29. Bruen, T. C., H. Philippe, and D. Bryant. 2006. A simple and robust statistical test for detecting the presence of recombination. Genetics 172:2665-2681. Burkala, E. J., J. He, J. T. West, C. Wood, and C. K. Petito. 2005. Compartmentalization of HIV-1 in the central nervous system: role of the choroid plexus. Aids 19:675-684. Cann, R. L., M. Stoneking, and A. C. Wilson. 1987. Mitochondrial DNA and human evolution. Nature 325:31-36. Cao, K., A. M. Moormann, K. E. Lyke, C. Masaberg, O. P. Sumba, O. K. Doumbo, D. Koech, A. Lancaster, M. Nelson, D. Meyer, R. Single, R. J. Hartzman, C. V. Plowe, J. Kazura, D. L. Mann, M. B. Sztein, G. Thomson, and M. A. Fernandez-Vina. 2004. Differentiation between African populations is evidenced by the diversity of alleles and haplotypes of HLA class I loci. Tissue Antigens 63:293-325. Caramelli, D., C. Lalueza-Fox, S. Condemi, L. Longo, L. Milani, A. Manfredini, M. de Saint Pierre, F. Adoni, M. Lari, P. Giunti, S. Ricci, A. Casoli, F. Calafell, F. Mallegni, J. Bertranpetit, R. Stanyon, G. Bertorelle, and G. Barbujani. 2006. A highly divergent mtDNA sequence in a Neandertal individual from Italy. Curr Biol 16:R630-632. Carmody, M. S., and J. R. Anderson. 2007.
BiDil (isosorbide dinitrate and hydralazine): a new fixed-dose combination of two older medications for the treatment of heart failure in black patients. Cardiol Rev 15:46-53. Carrillo, A., and L. Ratner. 1996. Cooperative effects of the human immunodeficiency virus type 1 envelope variable loops V1 and V3 in mediating infectivity for T cells. J Virol 70:1310-1316. Cavalli-Sforza, L. L., P. Menozzi, and A. Piazza. 1994. The History and Geography of Human Genes. Princeton University Press, Princeton. CDC. 2006a. Health, United States, 2006: with Chartbook on Trends in the Health of Americans. CDC. 2006b. Mother to Child (Perinatal) HIV Transmission and Infection. Department of Health and Human Services. CDC. 2007. What Women Can Do. Department of Health and Human Services. Centurion-Lara, A., C. Castro, L. Barrett, C. Cameron, M. Mostowfi, W. C. Van Voorhis, and S. A. Lukehart. 1999. Treponema pallidum major sheath protein homologue TprK is a target of opsonic antibody and the protective immune response. J Exp Med 189:647-656. Centurion-Lara, A., C. Godornes, C. Castro, W. C. Van Voorhis, and S. A. Lukehart. 2000a. The tprK gene is heterogeneous among Treponema pallidum strains and has multiple alleles. Infect Immun 68:824-831. Centurion-Lara, A., R. E. LaFond, K. Hevner, C. Godornes, B. J. Molini, W. C. Van Voorhis, and S. A. Lukehart. 2004. Gene conversion: a mechanism for generation of heterogeneity in the tprK gene of Treponema pallidum during infection. Mol Microbiol 52:1579-1596.
Centurion-Lara, A., B. J. Molini, C. Godornes, E. Sun, K. Hevner, W. C. Van Voorhis, and S. A. Lukehart. 2006. Molecular differentiation of Treponema pallidum subspecies. J Clin Microbiol 44:3377-3380. Centurion-Lara, A., E. S. Sun, L. K. Barrett, C. Castro, S. A. Lukehart, and W. C. Van Voorhis. 2000b. Multiple alleles of Treponema pallidum repeat gene D in Treponema pallidum isolates. J Bacteriol 182:2332-2335. Chao, Y. C., M. F. Wang, H. S. Tang, C. T. Hsu, and S. J. Yin. 1994. Genotyping of alcohol dehydrogenase at the ADH2 and ADH3 loci by using a polymerase chain reaction and restriction-fragment-length polymorphism in Chinese alcoholic cirrhotics and nonalcoholics. Proc Natl Sci Counc Repub China B 18:101-106. Chen, C. C., R. B. Lu, Y. C. Chen, M. F. Wang, Y. C. Chang, T. K. Li, and S. J. Yin. 1999. Interaction between the functional polymorphisms of the alcohol-metabolism genes in protection against alcoholism. Am J Hum Genet 65:795-807. Chen, W. J., E. W. Loh, Y. P. Hsu, C. C. Chen, J. M. Yu, and A. T. Cheng. 1996. Alcohol-metabolising genes and alcoholism among Taiwanese Han men: independent effect of ADH2, ADH3 and ALDH2. Br J Psychiatry 168:762-767. Chen, X., H. A. de Silva, M. J. Pettenati, P. N. Rao, P. St George-Hyslop, A. D. Roses, Y. Xia, K. Horsburgh, K. Ueda, and T. Saitoh. 1995. The human NACP/alpha-synuclein gene: chromosome assignment to 4q21.3-q22 and TaqI RFLP analysis. Genomics 26:425-427. Chiba-Falek, O., J. W. Touchman, and R. L. Nussbaum. 2003. Functional analysis of intra-allelic variation at NACP-Rep1 in the alpha-synuclein gene. Hum Genet 113:426-431. Chisenga, M., L. Kasonka, M. Makasa, M. Sinkala, C. Chintu, C. Kaseba, F. Kasolo, A. Tomkins, S. Murray, and S. Filteau. 2005. Factors affecting the duration of exclusive breastfeeding among HIV-infected and -uninfected women in Lusaka, Zambia. J Hum Lact 21:266-275. Clarimon, J., R. R. Gray, L. N. Williams, M. A. Enoch, R. W. Robin, B. Albaugh, A. Singleton, D. Goldman, and C.
J. Mulligan. 2007. Linkage disequilibrium and association analysis of alpha-synuclein and alcohol and drug dependence in two American Indian populations. Alcohol Clin Exp Res 31:546-554. Cockburn, J. 1971. Infectious disease in ancient populations. Current Anthropology 12:45-62. Cohen, M. N., and G. J. Armelagos. 1984. Paleopathology at the Origins of Agriculture. Academic Press, Orlando. Connor, R. I., K. E. Sheridan, D. Ceradini, S. Choe, and N. R. Landau. 1997. Change in coreceptor use correlates with disease progression in HIV-1-infected individuals. J Exp Med 185:621-628. Coovadia, H. M., N. C. Rollins, R. M. Bland, K. Little, A. Coutsoudis, M. L. Bennish, and M. L. Newell. 2007. Mother-to-child transmission of HIV-1 infection during exclusive breastfeeding in the first 6 months of life: an intervention cohort study. Lancet 369:1107-1116. Coutsoudis, A. 2000. Influence of infant feeding patterns on early mother-to-child transmission of HIV-1 in Durban, South Africa. Ann N Y Acad Sci 918:136-144. Coutsoudis, A., F. Dabis, W. Fawzi, P. Gaillard, G. Haverkamp, D. R. Harris, J. B. Jackson, V. Leroy, N. Meda, P. Msellati, M. L. Newell, R. Nsuati, J. S. Read, and S. Wiktor. 2004. Late postnatal transmission of HIV-1 in breast-fed children: an individual patient data meta-analysis. J Infect Dis 189:2154-2166.
Coutsoudis, A., L. Kuhn, K. Pillay, and H. M. Coovadia. 2002. Exclusive breast-feeding and HIV transmission. Aids 16:498-499. Coutsoudis, A., K. Pillay, L. Kuhn, E. Spooner, W. Y. Tsai, and H. M. Coovadia. 2001. Method of feeding and transmission of HIV-1 from mothers to children by 15 months of age: prospective cohort study from Durban, South Africa. Aids 15:379-387. Coutsoudis, A., K. Pillay, E. Spooner, L. Kuhn, and H. M. Coovadia. 1999. Influence of infant-feeding patterns on early mother-to-child transmission of HIV-1 in Durban, South Africa: a prospective cohort study. South African Vitamin A Study Group. Lancet 354:471-476. Crago, S. S., S. J. Prince, T. G. Pretlow, J. R. McGhee, and J. Mestecky. 1979. Human colostral cells. I. Separation and characterization. Clin Exp Immunol 38:585-597. Crosby, A. 1969. The Early History of Syphilis: A Reappraisal. American Anthropologist 71:218-227. Dabis, F., P. Msellati, N. Meda, C. Welffens-Ekra, B. You, O. Manigart, V. Leroy, A. Simonon, M. Cartoux, P. Combe, A. Ouangre, R. Ramon, O. Ky-Zerbo, C. Montcho, R. Salamon, C. Rouzioux, P. Van de Perre, and L. Mandelbrot. 1999. 6-month efficacy, tolerance, and acceptability of a short regimen of oral zidovudine to reduce vertical transmission of HIV in breastfed children in Cote d'Ivoire and Burkina Faso: a double-blind placebo-controlled multicentre trial. DITRAME Study Group. DIminution de la Transmission Mere-Enfant. Lancet 353:786-792. De Pasquale, M. P., A. J. Leigh Brown, S. C. Uvin, J. Allega-Ingersoll, A. M. Caliendo, L. Sutton, S. Donahue, and R. T. D'Aquila. 2003. Differences in HIV-1 pol sequences from female genital tract and blood during antiretroviral therapy. J Acquir Immune Defic Syndr 34:37-44. Desselberger, U. 2000. Emerging and re-emerging infectious diseases. J Infect 40:3-15. Diaz, G. A., B. D. Gelb, N. Risch, T. G. Nygaard, A. Frisch, I. J. Cohen, C. S. Miranda, O. Amaral, I. Maire, L. Poenaru, C. Caillaud, M. Weizberg, P. Mistry, and R. J.
Desnick. 2000. Gaucher disease: the origins of the Ashkenazi Jewish N370S and 84GG acid beta-glucosidase mutations. Am J Hum Genet 66:1821-1832. Dirks, R. 1993. Starvation and Famine: cross-cultural codes and some hypothesis tests. Cross Cult Res 27:28-69. Dixon, E. J. 2001. Human colonization of the Americas: timing, technology and process. Quat Sci Rev 20:277-299. Doeblin, T. D., K. Evans, and G. B. Ingall. 1969. Diabetes and hyperglycemia in Seneca Indians. Human Hered 19:613-627. Donaldson, Y. K., J. E. Bell, J. W. Ironside, R. P. Brettle, J. R. Robertson, A. Busuttil, and P. Simmonds. 1994. Redistribution of HIV outside the lymphoid system with onset of AIDS. Lancet 343:383-385. Douek, D. 2007a. HIV Disease Progression. Top HIV Med 15:114-117. Douek, D. 2007b. HIV disease progression: immune activation, microbes, and a leaky gut. Top HIV Med 15:114-117. Dover, G. 2002. Molecular Drive. Trends Genet 18:587-589. Drouin, G., F. Prat, M. Ell, and G. D. Clarke. 1999. Detecting and characterizing gene conversions between multigene family members. Mol Biol Evol 16:1369-1390. Drummond, A. J., S. Y. Ho, M. J. Phillips, and A. Rambaut. 2006. Relaxed phylogenetics and dating with confidence. PLoS Biol 4:e88.
Drummond, A. J., and A. Rambaut. 2007. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol Biol 7:214. Drummond, A. J., A. Rambaut, B. Shapiro, and O. G. Pybus. 2005. Bayesian coalescent inference of past population dynamics from molecular sequences. Mol Biol Evol 22:1185-1192. Dudbridge, F. 2003. Pedigree disequilibrium tests for multilocus haplotypes. Genet Epidemiol 25:115-121. Duster, T. 2007. Medicalisation of race. Lancet 369:702-704. Ehlers, C. L., C. Garcia-Andrade, T. L. Wall, D. F. Sobel, and E. Phillips. 1998. Determinants of P3 amplitude and response to alcohol in Native American Mission Indians. Neuropsychopharmacology 18:282-292. Ehlers, C. L., D. A. Gilder, L. Harris, and L. Carr. 2001. Association of the ADH2*3 allele with a negative family history of alcoholism in African American young adults. Alcohol Clin Exp Res 25:1773-1777. Ehlers, C. L., D. A. Gilder, T. L. Wall, E. Phillips, H. Feiler, and K. C. Wilhelmsen. 2004a. Genomic screen for loci associated with alcohol dependence in Mission Indians. Am J Med Genet B Neuropsychiatr Genet 129:110-115. Ehlers, C. L., T. L. Wall, M. Betancourt, and D. A. Gilder. 2004b. The clinical course of alcoholism in 243 Mission Indians. Am J Psychiatry 161:1204-1210. Embree, J. E., S. Njenga, P. Datta, N. J. Nagelkerke, J. O. Ndinya-Achola, Z. Mohammed, S. Ramdahin, J. J. Bwayo, and F. A. Plummer. 2000. Risk factors for postnatal mother-child transmission of HIV-1. Aids 14:2535-2541. Endicott, J., and R. L. Spitzer. 1978. A diagnostic interview: the schedule for affective disorders and schizophrenia. Arch Gen Psychiatry 35:837-844. Excoffier, L., G. Laval, and S. Schneider. 2005. Arlequin ver. 3.0: An integrated software package for population genetics data analysis. Evolutionary Bioinformatics Online 1:47-50. Eyre-Walker, A. 1993. Recombination and mammalian genome evolution. Proc Biol Sci 252:237-243. Fauci, A. S. 1998. New and reemerging diseases: the importance of biomedical research.
Emerg Infect Dis 4:374-378. Fauci, A. S., N. A. Touchette, and G. K. Folkers. 2005. Emerging infectious diseases: a 10-year perspective from the National Institute of Allergy and Infectious Diseases. Emerg Infect Dis 11:519-525. Fawzi, W., G. Msamanga, D. Spiegelman, B. Renjifo, H. Bang, S. Kapiga, J. Coley, E. Hertzmark, M. Essex, and D. Hunter. 2002a. Transmission of HIV-1 through breastfeeding among women in Dar es Salaam, Tanzania. J Acquir Immune Defic Syndr 31:331-338. Fawzi, W. W., G. I. Msamanga, D. Hunter, B. Renjifo, G. Antelman, H. Bang, K. Manji, S. Kapiga, D. Mwakagile, M. Essex, and D. Spiegelman. 2002b. Randomized trial of vitamin supplements in relation to transmission of HIV-1 through breastfeeding and early child mortality. Aids 16:1935-1944. Feavers, I. M., A. B. Heath, J. A. Bygraves, and M. C. Maiden. 1992. Role of horizontal genetic exchange in the antigenic variation of the class 1 outer membrane protein of Neisseria meningitidis. Mol Microbiol 6:489-495.
Fee, M. 2006. Racializing narratives: Obesity, diabetes, and the "aboriginal" thrifty genotype. Social Science and Medicine 62:2988-2997. Feil, E. J., E. C. Holmes, D. E. Bessen, M. S. Chan, N. P. Day, M. C. Enright, R. Goldstein, D. W. Hood, A. Kalia, C. E. Moore, J. Zhou, and B. G. Spratt. 2001. Recombination within natural populations of pathogenic bacteria: short-term empirical estimates and long-term phylogenetic consequences. Proc Natl Acad Sci U S A 98:182-187. Feil, E. J., M. C. Maiden, M. Achtman, and B. G. Spratt. 1999. The relative contributions of recombination and mutation to the divergence of clones of Neisseria meningitidis. Mol Biol Evol 16:1496-1502. Feil, E. J., and B. G. Spratt. 2001. Recombination and the population structures of bacterial pathogens. Annu Rev Microbiol 55:561-590. Feldmann, H., M. Czub, S. Jones, D. Dick, M. Garbutt, A. Grolla, and H. Artsob. 2002. Emerging and re-emerging infectious diseases. Med Microbiol Immunol 191:63-74. Fellay, J., K. V. Shianna, D. Ge, S. Colombo, B. Ledergerber, M. Weale, K. Zhang, C. Gumbs, A. Castagna, A. Cossarizza, A. Cozzi-Lepri, A. De Luca, P. Easterbrook, P. Francioli, S. Mallal, J. Martinez-Picado, J. M. Miro, N. Obel, J. P. Smith, J. Wyniger, P. Descombes, S. E. Antonarakis, N. L. Letvin, A. J. McMichael, B. F. Haynes, A. Telenti, and D. B. Goldstein. 2007. A whole-genome association study of major determinants for host control of HIV-1. Science 317:944-947. Fix, A. G. 2002. Colonization models and initial genetic diversity in the Americas. Hum Biol 74:1-10. Fraser, C. M., S. J. Norris, G. M. Weinstock, O. White, G. G. Sutton, R. Dodson, M. Gwinn, E. K. Hickey, R. Clayton, K. A. Ketchum, E. Sodergren, J. M. Hardham, M. P. McLeod, S. Salzberg, J. Peterson, H. Khalak, D. Richardson, J. K. Howell, M. Chidambaram, T. Utterback, L. McDonald, P. Artiach, C. Bowman, M. D. Cotton, C. Fujii, S. Garland, B. Hatch, K. Horst, K. Roberts, M. Sandusky, J. Weidman, H. O. Smith, and J. C. Venter. 1998.
Complete genome sequence of Treponema pallidum, the syphilis spirochete. Science 281:375-388. Fribourg-Blanc, A., H. H. Mollaret, and G. Niel. 1966. [Serologic and microscopic confirmation of treponemosis in Guinea baboons]. Bull Soc Pathol Exot Filiales 59:54-59. Fuchs, A. R. 1991. Physiology and endocrinology of lactation. in S. G. Gabbe, ed. Obstetrics Normal and Problem Pregnancies. Churchill Livingstone, New York. Fujimoto, W. Y. 1996. Overview of non-insulin-dependent diabetes mellitus (NIDDM) in different population groups. Diabet Med 13:S7-10. Gabriel, S. B., S. F. Schaffner, H. Nguyen, J. M. Moore, J. Roy, B. Blumenstiel, J. Higgins, M. DeFelice, A. Lochner, M. Faggart, S. N. Liu-Cordero, C. Rotimi, A. Adeyemo, R. Cooper, R. Ward, E. S. Lander, M. J. Daly, and D. Altshuler. 2002. The structure of haplotype blocks in the human genome. Science 296:2225-2229. Galtier, N. 2003. Gene conversion drives GC content evolution in mammalian histones. Trends Genet 19:65-68. Galtier, N., G. Piganeau, D. Mouchiroud, and L. Duret. 2001. GC-content evolution in mammalian genomes: the biased gene conversion hypothesis. Genetics 159:907-911. Galvani, A. P., and J. Novembre. 2005. The evolutionary history of the CCR5-Delta32 HIV-resistance mutation. Microbes Infect 7:302-309.
Galvani, A. P., and M. Slatkin. 2003. Evaluating plague and smallpox as historical selective pressures for the CCR5-Delta32 HIV-resistance allele. Proc Natl Acad Sci U S A 100:15276-15279. Gao, F., E. Bailes, D. L. Robertson, Y. Chen, C. M. Rodenburg, S. F. Michael, L. B. Cummins, L. O. Arthur, M. Peeters, G. M. Shaw, P. M. Sharp, and B. H. Hahn. 1999. Origin of HIV-1 in the chimpanzee Pan troglodytes troglodytes. Nature 397:436-441. Garzino-Demo, A., A. L. DeVico, K. E. Conant, and R. C. Gallo. 2000. The role of chemokines in human immunodeficiency virus infection. Immunol Rev 177:79-87. Gatanaga, H., S. Oka, S. Ida, T. Wakabayashi, T. Shioda, and A. Iwamoto. 1999. Active HIV-1 redistribution and replication in the brain with HIV encephalitis. Arch Virol 144:29-43. Georgeson, J. C., and S. M. Filteau. 2000. Physiology, immunology, and disease transmission in human breast milk. AIDS Patient Care STDS 14:533-539. Giacani, L., K. Hevner, and A. Centurion-Lara. 2005. Gene organization and transcriptional analysis of the tprJ, tprI, tprG, and tprF loci in Treponema pallidum strains Nichols and Sea 81-4. J Bacteriol 187:6084-6093. Giacani, L., E. S. Sun, K. Hevner, B. J. Molini, W. C. Van Voorhis, S. A. Lukehart, and A. Centurion-Lara. 2004. Tpr homologs in Treponema paraluiscuniculi Cuniculi A strain. Infect Immun 72:6561-6576. Gilder, D. A., T. L. Wall, and C. L. Ehlers. 2004. Comorbidity of select anxiety and affective disorders with alcohol dependence in southwest California Indians. Alcohol Clin Exp Res 28:1805-1813. Goedde, H. W., D. P. Agarwal, G. Fritze, D. Meier-Tackmann, S. Singh, G. Beckmann, K. Bhatia, L. Z. Chen, B. Fang, R. Lisker, et al. 1992. Distribution of ADH2 and ALDH2 genotypes in different populations. Hum Genet 88:344-346. Gogarten, J. P., and L. Olendzenski. 1999. Orthologs, paralogs and genome comparisons. Curr Opin Genet Dev 9:630-636. Goldman, A. S. 1993.
The immune system of human milk: antimicrobial, antiinflammatory and immunomodulating properties. Pediatr Infect Dis J 12:664-671. Goldman, A. S., S. Chheda, and R. Garofalo. 1998. Evolution of immunologic functions of the mammary gland and the postnatal development of immunity. Pediatr Res 43:155-162. Goodenow, M. M., S. L. Rose, D. L. Tuttle, and J. W. Sleasman. 2003. HIV-1 fitness and macrophages. J Leukoc Biol 74:657-666. Gray, R. R., C. J. Mulligan, B. J. Molini, E. S. Sun, L. Giacani, C. Godornes, A. Kitchen, S. A. Lukehart, and A. Centurion-Lara. 2006. Molecular evolution of the tprC, D, I, K, G, and J genes in the pathogenic genus Treponema. Mol Biol Evol 23:2220-2233. Grenfell, B. T., O. G. Pybus, J. R. Gog, J. L. Wood, J. M. Daly, J. A. Mumford, and E. C. Holmes. 2004. Unifying the epidemiological and evolutionary dynamics of pathogens. Science 303:327-332. Gurtler, L. G., P. H. Hauser, J. Eberle, A. von Brunn, S. Knapp, L. Zekeng, J. M. Tsague, and L. Kaptue. 1994. A new subtype of human immunodeficiency virus type 1 (MVP-5180) from Cameroon. J Virol 68:1581-1585. Haase, A. T., K. Henry, M. Zupancic, G. Sedgewick, R. A. Faust, H. Melroe, W. Cavert, K. Gebhard, K. Staskus, Z. Q. Zhang, P. J. Dailey, H. H. Balfour, Jr., A. Erice, and A. S. Perelson. 1996. Quantitative image analysis of HIV-1 infection in lymphoid tissue. Science 274:985-989.
Hackett, C. J. 1963. On the origin of the human treponematoses (pinta, yaws, endemic syphilis and venereal syphilis). Bulletin of the World Health Organization 29:7-41. Hales, C. N., and D. J. Barker. 1992. Type 2 (non-insulin-dependent) diabetes mellitus: the thrifty phenotype hypothesis. Diabetologia 35:595-601. Hales, C. N., and D. J. Barker. 2001. The thrifty phenotype hypothesis. Br Med Bull 60:5-20. Hara, K., O. Terasaki, and Y. Okubo. 2000. Dipole estimation of alpha EEG during alcohol ingestion in males genotypes for ALDH2. Life Sci 67:1163-1173. Harada, S., D. P. Agarwal, and H. W. Goedde. 1982. Mechanism of alcohol sensitivity and disulfiram-ethanol reaction. Subst Alcohol Actions Misuse 3:107-115. Harpending, H., and A. Rogers. 2000. Genetic perspectives on human origins and differentiation. Annu Rev Genomics Hum Genet 1:361-385. Harpending, H., S. T. Sherry, A. R. Rogers, and M. Stoneking. 1993. The genetic structure of ancient human populations. Curr Anthropol 34:483-496. Harpending, H. C. 1994. Signature of ancient population growth in a low-resolution mitochondrial DNA mismatch distribution. Hum Biol 66:591-600. Harpending, H. C., M. A. Batzer, M. Gurven, L. B. Jorde, A. R. Rogers, and S. T. Sherry. 1998. Genetic traces of ancient demography. Proc Natl Acad Sci U S A 95:1961-1967. Harper, K. N., P. S. Ocampo, B. M. Steiner, R. W. George, M. S. Silverman, S. Bolotin, A. Pillay, N. J. Saunders, and G. J. Armelagos. 2008. On the origin of the treponematoses: a phylogenetic approach. PLoS Negl Trop Dis 2:e148. Hasegawa, M., H. Kishino, and T. Yano. 1985. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J Mol Evol 22:160-174. Helzer, J. E., K. K. Bucholz, L. J. Bierut, D. A. Regier, M. A. Schuckit, and S. E. Guth. 2006. Should DSM-V include dimensional diagnostic criteria for alcohol use disorders? Alcohol Clin Exp Res 30:303-310. Henderson, G. J., N. G. Hoffman, L. H. Ping, S. A. Fiscus, I. F. Hoffman, K. M.
Kitrinos, T. Banda, F. E. Martinson, P. N. Kazembe, D. A. Chilongozi, M. S. Cohen, and R. Swanstrom. 2004. HIV-1 populations in blood and breast milk are similar. Virology 330:295-303. Ho, F. C., R. L. Wong, and J. W. Lawton. 1979. Human colostral and breast milk cells. A light and electron microscopic study. Acta Paediatr Scand 68:389-396. Holmes, E. C., R. Urwin, and M. C. Maiden. 1999. The influence of recombination on the population structure and evolution of the human pathogen Neisseria meningitidis. Mol Biol Evol 16:741-749. Hoover, E. L. 2007. There is no scientific rationale for race-based research. J Natl Med Assoc 99:690-692. Howell-Adams, B., and H. S. Seifert. 2000. Molecular models accounting for the gene conversion reactions mediating gonococcal pilin antigenic variation. Mol Microbiol 37:1146-1158. Hudson, E. H. 1965. Treponematosis and man's social evolution. American Anthropologist 67:885-901. Hudson, R. R., M. Slatkin, and W. P. Maddison. 1992. Estimation of levels of gene flow from DNA sequence data. Genetics 132:583-589. Hughes, D. 2000. Co-evolution of the tuf genes links gene conversion with the generation of chromosomal inversions. J Mol Biol 297:355-364.
Hughes, E. S., J. E. Bell, and P. Simmonds. 1997a. Investigation of population diversity of human immunodeficiency virus type 1 in vivo by nucleotide sequencing and length polymorphism analysis of the V1/V2 hypervariable region of env. J Gen Virol 78 (Pt 11):2871-2882. Hughes, E. S., J. E. Bell, and P. Simmonds. 1997b. Investigation of the dynamics of the spread of human immunodeficiency virus to brain and other tissues by evolutionary analysis of sequences from the p17gag and env genes. J Virol 71:1272-1280. Hugot, J. P., M. Chamaillard, H. Zouali, S. Lesage, J. P. Cezard, J. Belaiche, S. Almer, C. Tysk, C. A. O'Morain, M. Gassull, V. Binder, Y. Finkel, A. Cortot, R. Modigliani, P. Laurent-Puig, C. Gower-Rousseau, J. Macry, J. F. Colombel, M. Sahbatou, and G. Thomas. 2001. Association of NOD2 leucine-rich repeat variants with susceptibility to Crohn's disease. Nature 411:599-603. Hung, C. S., N. Vander Heyden, and L. Ratner. 1999. Analysis of the critical domain in the V3 loop of human immunodeficiency virus type 1 gp120 involved in CCR5 utilization. J Virol 73:8216-8226. Ichikawa, M., M. Sugita, M. Takahashi, M. Satomi, T. Takeshita, T. Araki, and H. Takahashi. 2003. Breast milk macrophages spontaneously produce granulocyte-macrophage colony-stimulating factor and differentiate into dendritic cells in the presence of exogenous interleukin-4 alone. Immunology 108:189-195. IHS. 2006. Facts on Indian Health Disparities. Indian Health Services. Iliff, P. J., E. G. Piwoz, N. V. Tavengwa, C. D. Zunguza, E. T. Marinda, K. J. Nathoo, L. H. Moulton, B. J. Ward, and J. H. Humphrey. 2005. Early exclusive breastfeeding reduces the risk of postnatal HIV-1 transmission and increases HIV-free survival. Aids 19:699-708. Iwahashi, K., Y. Matsuo, H. Suwaki, K. Nakamura, and Y. Ichikawa. 1995. CYP2E1 and ALDH2 genotypes and alcohol dependence in Japanese. Alcohol Clin Exp Res 19:564-566. John, G. C., R. W. Nduati, D. A. Mbori-Ngacha, B. A. Richardson, D. Panteleeff, A.
Mwatha, J. Overbaugh, J. Bwayo, J. O. Ndinya-Achola, and J. K. Kreiss. 2001. Correlates of mother-to-child human immunodeficiency virus type 1 (HIV-1) transmission: association with maternal plasma HIV-1 RNA load, genital HIV-1 DNA shedding, and breast infections. J Infect Dis 183:206-212. Johnson, J. E., and C. W. McNutt. 1964. Diabetes mellitus in an American Indian population isolate. Tex Rep Biol Med 22:110-125. Jones, K. E., N. G. Patel, M. A. Levy, A. Storeygard, D. Balk, J. L. Gittleman, and P. Daszak. 2008. Global trends in emerging infectious diseases. Nature 451:990-993. Kemal, K. S., B. Foley, H. Burger, K. Anastos, H. Minkoff, C. Kitchen, S. M. Philpott, W. Gao, E. Robison, S. Holman, C. Dehner, S. Beck, W. A. Meyer, 3rd, A. Landay, A. Kovacs, J. Bremer, and B. Weiser. 2003. HIV-1 in genital tract and plasma of women: compartmentalization of viral sequences, coreceptor usage, and glycosylation. Proc Natl Acad Sci U S A 100:12972-12977. Kim, D.-J., I.-G. Choi, B. L. Park, B.-C. Lee, B.-J. Ham, S. Yoon, J. S. Bae, H. S. Cheong, and H. D. Shin. 2008. Major genetic components underlying alcoholism in Korean population. Hum Mol Genet 17:854-858. Kimura, M., and J. L. King. 1979. Fixation of a deleterious allele at one of two "duplicate" loci by mutation pressure and random drift. Proc Natl Acad Sci U S A 76:2858-2861.
King, R. C., and W. D. Stansfield. 1997. A Dictionary of Genetics. Oxford University Press, New York. Kinzie, J. D., P. K. Leung, J. Boehnlein, D. Matsunaga, R. Johnson, S. Manson, J. H. Shore, J. Heinz, and M. Williams. 1992. Psychiatric epidemiology of an Indian village. A 19-year replication study. J Nerv Ment Dis 180:33-39. Kitrinos, K. M., N. G. Hoffman, J. A. Nelson, and R. Swanstrom. 2003. Turnover of env variable region 1 and 2 genotypes in subjects with late-stage human immunodeficiency virus type 1 infection. J Virol 77:6811-6822. Klevytska, A. M., M. R. Mracna, L. Guay, G. Becker-Pergola, M. Furtado, L. Zhang, J. B. Jackson, and S. H. Eshleman. 2002. Analysis of length variation in the V1-V2 region of env in nonsubtype B HIV type 1 from Uganda. AIDS Res Hum Retroviruses 18:791-796. Knowler, W. C., D. J. Pettitt, M. F. Saad, and P. H. Bennett. 1990. Diabetes mellitus in the Pima Indians: incidence, risk factors and pathogenesis. Diabetes Metab Rev 6:1-27. Kobayashi, H., S. Ide, J. Hasegawa, H. Ujike, Y. Sekine, N. Ozaki, T. Inada, M. Harano, T. Komiyama, M. Yamada, M. Iyo, H. W. Shen, K. Ikeda, and I. Sora. 2004. Study of association between alpha-synuclein gene polymorphism and methamphetamine psychosis/dependence. Ann N Y Acad Sci 1025:325-334. Kolman, C. J., and E. Bermingham. 1997. Mitochondrial and nuclear DNA diversity in the Choco and Chibcha Amerinds of Panama. Genetics 147:1289-1302. Kolman, C. J., E. Bermingham, R. Cooke, R. H. Ward, T. D. Arias, and F. Guionneau-Sinclair. 1995. Reduced mtDNA diversity in the Ngobe Amerinds of Panama. Genetics 140:275-283. Kolman, C. J., N. Sambuughin, and E. Bermingham. 1996. Mitochondrial DNA analysis of Mongolian populations and implications for the origin of New World founders. Genetics 142:1321-1334. Kondrashov, F. A., I. B. Rogozin, Y. I. Wolf, and E. V. Koonin. 2002. Selection in the evolution of gene duplications. Genome Biol 3:RESEARCH0008. Konishi, T., M. Calvillo, A. S. Leng, J. Feng, T.
Lee, H. Lee, J. L. Smith, S. H. Sial, N. Berman, S. French, V. Eysselein, K. M. Lin, and Y. J. Wan. 2003. The ADH3*2 and CYP2E1 c2 alleles increase the risk of alcoholism in Mexican American men. Exp Mol Pathol 74:183-189. Korber, B. T., K. J. Kunstman, B. K. Patterson, M. Furtado, M. M. McEvilly, R. Levy, and S. M. Wolinsky. 1994. Genetic differences between blood- and brain-derived viral sequences from human immunodeficiency virus type 1-infected patients: evidence of conserved elements in the V3 region of the envelope protein of brain-derived sequences. J Virol 68:7467-7481. Koulinska, I. N., E. Villamor, B. Chaplin, G. Msamanga, W. Fawzi, B. Renjifo, and M. Essex. 2006. Transmission of cell-free and cell-associated HIV-1 through breast-feeding. J Acquir Immune Defic Syndr 41:93-99. Kourtis, A. P., S. Butera, C. Ibegbu, L. Belec, and A. Duerr. 2003. Breast milk and HIV-1: vector of transmission or vehicle of protection? Lancet Infect Dis 3:786-793. Krause, J., C. Lalueza-Fox, L. Orlando, W. Enard, R. E. Green, H. A. Burbano, J. J. Hublin, C. Hanni, J. Fortea, M. de la Rasilla, J. Bertranpetit, A. Rosas, and S. Paabo. 2007. The derived FOXP2 variant of modern humans was shared with Neandertals. Curr Biol 17:1908-1912.
Krings, M., C. Capelli, F. Tschentscher, H. Geisert, S. Meyer, A. von Haeseler, K. Grossschmidt, G. Possnert, M. Paunovic, and S. Paabo. 2000. A view of Neandertal genetic diversity. Nat Genet 26:144-146. Krings, M., H. Geisert, R. W. Schmitz, H. Krainitzki, and S. Paabo. 1999. DNA sequence of the mitochondrial hypervariable region II from the neandertal type specimen. Proc Natl Acad Sci U S A 96:5581-5585. Krings, M., A. Stone, R. W. Schmitz, H. Krainitzki, M. Stoneking, and S. Paabo. 1997. Neandertal DNA sequences and the origin of modern humans. Cell 90:19-30. Kruger, R., W. Kuhn, T. Muller, D. Woitalla, M. Graeber, S. Kosel, H. Przuntek, J. T. Epplen, L. Schols, and O. Riess. 1998. Ala30Pro mutation in the gene encoding alpha-synuclein in Parkinson's disease. Nat Genet 18:106-108. Kuhn, L., M. Sinkala, C. Kankasa, K. Semrau, P. Kasonde, N. Scott, M. Mwiya, C. Vwalika, J. Walter, W. Y. Tsai, G. M. Aldrovandi, and D. M. Thea. 2007. High Uptake of Exclusive Breastfeeding and Reduced Early Post-Natal HIV Transmission. PLoS ONE 2:e1363. Kunitz, S. J., K. R. Gabriel, J. E. Levy, E. Henderson, K. Lampert, J. McCloskey, G. Quintero, S. Russell, and A. Vince. 1999. Alcohol dependence and conduct disorder among Navajo Indians. J Stud Alcohol 60:159-167. LaFond, R. E., A. Centurion-Lara, C. Godornes, A. M. Rompalo, W. C. Van Voorhis, and S. A. Lukehart. 2003. Sequence diversity of Treponema pallidum subsp. pallidum tprK in human syphilis lesions and rabbit-propagated isolates. J Bacteriol 185:6262-6268. Lalueza-Fox, C., M. L. Sampietro, D. Caramelli, Y. Puder, M. Lari, F. Calafell, C. Martinez-Maza, M. Bastir, J. Fortea, M. de la Rasilla, J. Bertranpetit, and A. Rosas. 2005. Neandertal evolutionary genetics: mitochondrial DNA data from the Iberian Peninsula. Mol Biol Evol 22:1077-1081. Lamers, S. L., J. W. Sleasman, J. X. She, K. A. Barrie, S. M. Pomeroy, D. J. Barrett, and M. M. Goodenow. 1993.
Independent variation and positive selection in env V1 and V2 domains within maternal-infant strains of human immunodeficiency virus type 1 in vivo. J Virol 67:3951-3960. Langford, D., A. Grigorian, R. Hurford, A. Adame, R. J. Ellis, L. Hansen, and E. Masliah. 2004. Altered P-glycoprotein expressi on in AIDS patients with HIV encephalitis. J Neuropathol Exp Neurol 63:1038-1047. Lathe, W. C., 3rd, and P. Bork. 2001. Evolution of tuf genes: ancient duplication, differential loss and gene conversion. FEBS Lett 502:113-116. Leitner, T., B. Korber, M. Daniels, C. Calef, and B. Foley. 2005. HIV-1 Subtype and Circulating Recombinant Form (CRF) Reference Se quences, 2005. The Human Retroviruses and AIDS 2005 Compendium. Theoretical Biology and Biophysics, Los Alamos. Leroy, V., J. M. Karon, A. Alioum, E. R. Ekpini, P. van de Perre, A. E. Greenberg, P. Msellati, M. Hudgens, F. Dabis, and S. Z. Wiktor. 2003. Postnatal transmission of HIV-1 after a maternal short-course zidovudine peripa rtum regimen in West Africa. Aids 17:14931501. Lewis, P., R. Nduati, J. K. Kreiss, G. C. J ohn, B. A. Richardson, D. Mbori-Ngacha, J. NdinyaAchola, and J. Overbaugh. 1998. Cell-free hum an immunodeficiency virus type 1 in breast milk. J Infect Dis 177:34-39. Liang, T., J. Spence, L. Liu, W. N. Strother, H. W. Chang, J. A. Ellison, L. Lumeng, T. K. Li, T. Foroud, and L. G. Carr. 2003. alpha-Synuclein maps to a quantitativ e trait locus for
alcohol preference and is differentially expressed in alcohol-preferring and nonpreferring rats. Proc Natl Acad Sci U S A 100:4690-4695. Liao, D. 2000. Gene conversion drives within genic sequences: concerted e volution of ribosomal RNA genes in bacteria and archaea. J Mol Evol 51:305-317. Libert, F., P. Cochaux, G. Beckman, M. Sa mson, M. Aksenova, A. Cao, A. Czeizel, M. Claustres, C. de la Rua, M. Ferrari, C. Ferrec, G. Glover, B. Grinde, S. Guran, V. Kucinskas, J. Lavinha, B. Mercier, G. Ogur, L. Peltonen, C. Rosatelli, M. Schwartz, V. Spitsyn, L. Timar, L. Beckman, M. Parmen tier, and G. Vassart. 1998. The deltaccr5 mutation conferring protection against HIV-1 in Caucasian populations has a single and recent origin in Northeastern Europe. Hum Mol Genet 7:399-406. Lindsay, R. S., and P. H. Bennett. 2001. Type 2 diabetes, the thrifty phenot ype an overview. Br Med Bull 60 :21-32. Long, J. C., W. C. Knowler, R. L. Hanson, R. W. Robin, M. Urbanek, E. Moore, P. H. Bennett, and D. Goldman. 1998. Evidence for genetic linkage to alcohol dependence on chromosomes 4 and 11 from an autosome-wide scan in an American Indian population. Am J Med Genet 81:216-221. Loussert-Ajaka, I., M. L. Chaix, B. Korber, F. Letourneur, E. Gomas, E. Allen, T. D. Ly, F. Brun-Vezinet, F. Simon, and S. Saragosti. 1995. Variability of hu man immunodeficiency virus type 1 group O strains isolated from Ca meroonian patients living in France. J Virol 69:5640-5649. Lucotte, G. 2001. Distribution of the CCR5 gene 32-basepair deletion in West Europe. A hypothesis about the possible di spersion of the mutation by th e Vikings in historical times. Hum Immunol 62:933-936. Lukehart, S. A., S. A. Baker-Zander, R. M. Lloyd, and S. Sell. 1980. Characterization of lymphocyte responsiveness in early experimental syphilis. II. Nature of cellular infiltration and Treponema pallidum distribution in testicular lesions. J Immunol 124:461-467. Luo, X., H. R. Kranzler, L. Zuo, S. Wang, N. J. Schork, and J. 
Gelernter. 2007. Multiple ADH genes modulate risk for drug dependence in both Africanand European-Americans. Hum Mol Genet 16:380-390. Lynch, M., and J. S. Conery. 2000. The evolutionary fate and consequences of duplicate genes. Science 290 :1151-1155. Lynch, M., and A. Force. 2000. The probabi lity of duplicate gene preservation by subfunctionalization. Genetics 154:459-473. Macaulay, V., C. Hill, A. Achilli, C. Rengo, D. Clarke, W. Meehan, J. Blackburn, O. Semino, R. Scozzari, F. Cruciani, A. Taha, N. K. Shaari, J. M. Raja, P. Ismail, Z. Zainuddin, W. Goodwin, D. Bulbeck, H. J. Bandelt, S. O ppenheimer, A. Torroni, and M. Richards. 2005. Single, rapid coastal settlement of As ia revealed by analysis of complete mitochondrial genomes. Science 308 :1034-1036. Maddison, W. P., and D. R. Maddison. 1989. Interactive analysis of phylogeny and character evolution using the computer progra m MacClade. Folia Primatol (Basel) 53:190-202. Maddon, P. J., A. G. Dalgleish, J. S. McDougal, P. R. Clapham, R. A. Weiss, and R. Axel. 1986. The T4 gene encodes the AIDS virus receptor and is expressed in the immune system and the brain. Cell 47 :333-348.
Manning, L. S., and M. J. Parmely. 1980. Cellula r determinants of mammary cell-mediated immunity in the rat. I. The migration of radioisotopically labeled T lymphocytes. J Immunol 125:2508-2514. Martin, D. P., C. Williamson, and D. Posada. 2005. RDP2: recombination detection and analysis from sequence alignments. Bioinformatics 21:260-262. Mash, D. C., Q. Ouyang, J. Pablo, M. Basile, S. Izenwasser, A. Lieberman, and R. J. Perrin. 2003. Cocaine abusers have an overexpression of alpha-synuclein in dopamine neurons. J Neurosci 23 :2564-2571. Mathers, C. D., and D. Loncar. 2006. Projections of global mortality and burden of disease from 2002 to 2030. PLoS Med 3:e442. McCarthy, D. J., and P. Zimmet. 2001. P acific Island Populations. Pp. 239-245 in J. M. Ekoe, P. Zimmet, and R. WIlliams, eds. The epidemiol ogy of diabetes mellitus. An international perspective. Wiley, Chichester. McCarthy, D. M., T. L. Wall, S. A. Brown, and L. G. Carr. 2000. Integrating biological and behavioral factors in alcohol use risk: the role of ALDH2 status and alcohol expectancies in a sample of Asian Americans. Exp Clin Psychopharmacol 8:168-175. Meltzer, D. J. 1993. Pleistocene peopling of the Americas. Evol Anthropol 1:157-169. Merriwether, D. A., F. Rothhammer, and R. E. Ferrell. 1995. Distributi on of the four founding lineage haplotypes in Native Americans suggest s a single wave of migration for the New World. Am J Phys Anthropol 98:411-430. Monsalve, M. V., A. Helgason, and D. V. Devine. 1999. Languages, geography and HLA haplotypes in native American and Asian populations. Proc Biol Sci 266 :2209-2216. Moore, C. B., M. John, I. R. James, F. T. Christiansen, C. S. Witt, and S. A. Mallal. 2002. Evidence of HIV-1 adaptation to HLA-restrict ed immune responses at a population level. Science 296 :1439-1443. Morens, D. M., G. K. Folkers, and A. S. Fauci. 2004. The challenge of emerging and reemerging infectious diseases. Nature 430:242-249. Morris, A., M. Marsden, K. Halcrow, E. S. 
Hughes, R. P. Brettle, J. E. Bell, and P. Simmonds. 1999. Mosaic structure of the human immunodefi ciency virus type 1 genome infecting lymphoid cells and the brain: evidence for fre quent in vivo recombin ation events in the evolution of regional populations. J Virol 73 :8720-8731. Morse, S. S. 1995. Factors in the emergence of infectious diseases. Emerg Infect Dis 1:7-15. Mulligan, C. J., K. Hunley, S. Cole, and J. C. Long. 2004. Population genetics, history, and health patterns in native amer icans. Annu Rev Genomics Hum Genet 5:295-315. Mulligan, C. J., S. J. Norris, and S. A. Lukeha rt. 2008. Molecular Studies in Treponema pallidum Evolution: Towards Clarity? PLoS Neg Trop Dis 2:e184. Mulligan, C. J., R. W. Robin, M. V. Osier, N. Sambuughin, L. G. Goldfarb, R. A. Kittles, D. Hesselbrock, D. Goldman, and J. C. Long. 2003. Allelic variation at alcohol metabolism genes ( ADH1B, ADH1C, ALDH2) and alcoho l dependence in an American Indian population. Hum Genet 113:325-336. Nabatov, A. A., G. Pollakis, T. Linnemann, A. Kliphius, M. I. Chalaby, and W. A. Paxton. 2004. Intrapatient alterations in the human immunodeficiency virus type 1 gp120 V1V2 and V3 regions differentially modulate corecept or usage, virus inhibition by CC/CXC chemokines, soluble CD4, and the b12 and 2G12 monoclonal antibodies. J Virol 78:524530.
Nakamura, K., K. Iwahashi, Y. Matsuo, R. Miyatake, Y. Ichikawa, and H. Suwaki. 1996. Characteristics of Japanese alcoholics with the atypical aldehyde dehydrogenase 2*2. I. A comparison of the genotypes of ALDH2, ADH2, ADH3, and cytochrome P-4502E1 between alcoholics and nonalcoho lics. Alcohol Clin Exp Res 20:52-55. Nduati, R., G. John, D. Mbori-Ngacha, B. Ri chardson, J. Overbaugh, A. Mwatha, J. NdinyaAchola, J. Bwayo, F. E. Onyango, J. Hughes, and J. Kreiss. 2000. Effect of Breastfeeding and Formula Feeding on Transmission of HIV1: A Randomized Clinical Trial. Pp. 11671174. Neel, J. V. 1962. Diabetes mellitus: a "thrifty" ge notype rendered detrimental by "progress"? Am J Hum Genet 14:353-362. Neel, J. V. 1999. The "thrifty genotype" in 1998. Nutr Rev 57:S2-9. Neel, J. V. 1982. The thrifty genotype revisited. in J. Kobberling, and J. Tattersall, eds. The genetics of diabetes mellitus. Academic Press, New York. Neumark, Y. D., Y. Friedlander, H. R. Thomasson, and T. K. Li. 1998. Association of the ADH2*2 allele with reduced ethanol consumption in Jewish men in Is rael: a pilot study. J Stud Alcohol 59:133-139. Neville, M. C., J. C. Allen, P. C. Archer, C. E. Casey, J. Seacat, R. P. Keller, V. Lutes, J. Rasbach, and M. Neifert. 1991. Studies in hum an lactation: milk volume and nutrient composition during weaning and l actogenesis. Am J Clin Nutr 54:81-92. Neville, M. C., and M. R. Neifert. 1983. Lact ation: Physiology, Nutrition, and Breastfeeding. Plenum Press, New York. Nickle, D. C., M. A. Jensen, D. Shriner, S. J. Br odie, L. M. Frenkel, J. E. Mittler, and J. I. Mullins. 2003. Evolutionary indicators of human immunodeficiency virus type 1 reservoirs and compartments. J Virol 77:5540-5546. Nielsen, R., I. Hellmann, M. Hubisz, C. Bust amante, and A. G. Clark. 2007. Recent and ongoing selection in the human genome. Nat Rev Genet 8:857-868. Noonan, J. P., J. Grimwood, J. Schmutz, M. Dickson, and R. M. Myers. 2004. 
Gene conversion and the evolution of protocadherin gene cluster diversity. Genome Res 14:354-366. Novoradovsky, A. G., J. Kidd, K. Kidd, and D. Goldman. 1995. Apparent monomorphism of ALDH2 in seven American Indian populations. Alcohol 12:163-167. O'Brien, S. J., and G. W. Nelson. 2004. Hu man genes that limit AIDS. Nat Genet 36:565-574. Ochman, H., J. G. Lawrence, and E. A. Groisma n. 2000. Lateral gene transfer and the nature of bacterial innovation. Nature 405:299-304. Ohagen, A., A. Devitt, K. J. Kunstman, P. R. Gorry, P. P. Rose, B. Korber, J. Taylor, R. Levy, R. L. Murphy, S. M. Wolinsky, and D. Gabuzda 2003. Genetic and functional analysis of full-length human immunodeficiency virus ty pe 1 env genes deri ved from brain and blood of patients with AIDS. J Virol 77:12336-12345. Ohta, T. 1992. The Nearly Neutral Theory of Molecular Evolution. A nnual Review of Ecology and Systematics 23 :263-286. Omran, A. R. 1971. The epidemiologic transition. A theory of the epidemiology of population change. Milbank Mem Fund Q 49:509-538. Ordovas, J., A. Pittas, and A. S. Greenberg. 2003. Might the diabetic environment in utero lead to type 2 diabetes? Lancet 361:1839-1840. Osier, M., A. J. Pakstis, J. R. Kidd, J. F. Lee, S. J. Yin, H. C. Ko, H. J. Edenberg, R. B. Lu, and K. K. Kidd. 1999. Linkage disequilibrium at the ADH2 and ADH3 loci and risk of alcoholism. Am J Hum Genet 64:1147-1157.
Osier, M. V., A. J. Pakstis, D. Goldman, H. J. Edenberg, J. R. Kidd, and K. K. Kidd. 2002a. A proline-threonine substituti on in codon 351 of ADH1C is common in Native Americans. Alcohol Clin Exp Res 26:1759-1763. Osier, M. V., A. J. Pakstis, H. Soodyall, D. Comas, D. Goldman, A. Odunsi, F. Okonofua, J. Parnas, L. O. Schulz, J. Bert ranpetit, B. Bonne-Tamir, R. B. Lu, J. R. Kidd, and K. K. Kidd. 2002b. A global perspective on genetic va riation at the ADH genes reveals unusual patterns of linkage disequilibrium and diversity. Am J Hum Genet 71:84-99. Padidam, M., S. Sawyer, and C. M. Fauquet. 1999. Possible emergence of new geminiviruses by frequent recombination. Virology 265 :218-225. Paradies, Y. C., M. J. Montoya, and S. M. Fu llerton. 2007. Racialized ge netics and the study of complex diseases: the thrifty genotyp e revisited. Perspect Biol Med 50:203-227. Pastore, C., R. Nedellec, A. Ramos, S. P ontow, L. Ratner, and D. E. Mosier. 2006. Human immunodeficiency virus type 1 coreceptor switching: V1/V 2 gain-of-fitness mutations compensate for V3 loss-of-fitness mutations. J Virol 80 :750-758. Perrin, L., L. Kaiser, and S. Yerly. 2003. Travel and the spread of HIV-1 genetic variants. Lancet Infect Dis 3:22-27. Peterson, L. E., J. S. Barnholtz, G. P. Page, T. M. King, M. de Andrade, and C. I. Amos. 1999. A genome-wide search for susceptibility gene s linked to alcohol dependence. Genet Epidemiol 17 Suppl 1:S295-300. Petitjean, G., P. Becquart, E. Tuaillon, Y. Al Tab aa, D. Valea, M. F. Huguet, N. Meda, P. Van de Perre, and J. P. Vendrell. 2007. Isolation and characterization of HI V-1-infected resting CD4+ T lymphocytes in breast milk. J Clin Virol 39:1-8. Petito, C. K. 2004. Human immunodeficiency virus type 1 compartmentalization in the central nervous system. J Neurovirol 10 Suppl 1 :21-24. Philpott, S., H. Burger, C. Tsoukas, B. Foley, K. Anastos, C. Kitchen, and B. Weiser. 2005. 
Human immunodeficiency virus type 1 genomic RNA sequences in the female genital tract and blood: compartm entalization and intrapatie nt recombination. J Virol 79:353363. Pilcher, C. D., D. C. Shugars, S. A. Fiscus, W. C. Miller, P. Menezes, J. Giner, B. Dean, K. Robertson, C. E. Hart, J. L. Lennox, J. J. Eron, Jr., and C. B. Hicks. 2001. HIV in body fluids during primary HIV infection: implicat ions for pathogenesis, treatment and public health. Aids 15:837-845. Pillai, S. K., B. Good, S. K. Pond, J. K. Wong, M. C. Strain, D. D. Richman, and D. M. Smith. 2005. Semen-specific genetic characteristics of human immunodefici ency virus type 1 env. J Virol 79:1734-1742. Pillai, S. K., S. L. Pond, Y. Liu, B. M. Good, M. C. Strain, R. J. Ellis, S. Letendre, D. M. Smith, H. F. Gunthard, I. Grant, T. D. Marcotte, J. A. McCutchan, D. D. Richman, and J. K. Wong. 2006. Genetic attributes of cerebr ospinal fluid-derive d HIV-1 env. Brain 129:1872-1883. Pillay, K., A. Coutsoudis, D. York, L. Kuhn, and H. M. Coovadia. 2000. Cell-free virus in breast milk of HIV-1-seropositive wome n. J Acquir Immune Defic Syndr 24:330-336. Ping, L. H., M. S. Cohen, I. Hoffman, P. Vernazz a, F. Seillier-Moiseiwitsch, H. Chakraborty, P. Kazembe, D. Zimba, M. Maida, S. A. Fiscus, J. J. Eron, R. Swanstrom, and J. A. Nelson. 2000. Effects of genital tract inflammation on human immunodeficiency virus type 1 V3 populations in blood and semen. J Virol 74:8946-8952. Pitt, J. 1979. The milk mononuc lear phagocyte. Pediatrics 64 :745-749.
Plagnol, V., and J. D. Wall. 2006. Possible ancestral structure in human populations. PLoS Genet 2:e105. Pollard, A. J., and S. R. Dobson. 2000. Emerging infectious diseases in the 21st century. Curr Opin Infect Dis 13:265-275. Polymeropoulos, M. H., C. Lavedan, E. Leroy, S. E. Ide, A. Dehejia, A. Dutra, B. Pike, H. Root, J. Rubenstein, R. Boyer, E. S. Stenroos, S. Chandrasekharappa, A. Athanassiadou, T. Papapetropoulos, W. G. Johnson, A. M. Lazzarin i, R. C. Duvoisin, G. Di Iorio, L. I. Golbe, and R. L. Nussbaum. 1997. Mutation in the alpha-synuclein gene identified in families with Parkinson's disease. Science 276 :2045-2047. Pond, S. L., S. D. Frost, and S. V. Muse. 2005. HyPhy: hypothesis testing using phylogenies. Bioinformatics 21:676-679. Posada, D., and K. A. Crandall. 2001. Evaluation of methods for detecting recombination from DNA sequences: computer simulations Proc Natl Acad Sci U S A 98:13757-13762. Posada, D., and K. A. Crandall. 1998. MODELTEST: testing the model of DNA substitution. Bioinformatics 14:817-818. Posada, D., K. A. Crandall, and E. C. Holmes. 2002. Recombination in evolutionary genomics. Annu Rev Genet 36:75-97. Poss, M., A. G. Rodrigo, J. J. Gosink, G. H. Lear n, D. de Vange Panteleeff, H. L. Martin, Jr., J. Bwayo, J. K. Kreiss, and J. Overbaugh. 1998. E volution of envelope sequences from the genital tract and peripher al blood of women infect ed with clade A human immunodeficiency virus type 1. J Virol 72:8240-8251. Powell, M. L., and D. C. Cook. 2005. Treponematosi s: Inquires into the Nature of Protean Disease in M. L. Powell, and D. C. Cook, eds. The Myth of Syphilis: the Natural History of Treponematosis in North America. Univ ersity of Florida Press, Gainesville Prugnolle, F., A. Manica, M. Charpentier, J. F. Guegan, V. Guernier, and F. Balloux. 2005. Pathogen-driven selection and worldw ide HLA class I diversity. Curr Biol 15:1022-1027. Quintana-Murci, L., A. Alcais, L. Abel, and J. L. Casanova. 2007. 
Immunology in natura: clinical, epidemiological and evolutionary ge netics of infectious diseases. Nat Immunol 8:1165-1171. Quintana-Murci, L., O. Semino, H. J. Bandelt, G. Passarino, K. McElreavey, and A. S. Santachiara-Benerecetti. 1999. Genetic evidence of an early exit of Homo sapiens sapiens from Africa through eastern Africa. Nat Genet 23 :437-441. Ramachandran, S., O. Deshpande, C. C. Roseman, N. A. Rosenberg, M. W. Feldman, and L. L. Cavalli-Sforza. 2005. Support from the relationship of genetic and geographic distance in human populations for a serial founder effect originating in Africa. Proc Natl Acad Sci U S A 102:15942-15947. Reich, T., H. J. Edenberg, A. Goate, J. T. Willia ms, J. P. Rice, P. Van Eerdewegh, T. Foroud, V. Hesselbrock, M. A. Schuckit, K. Bucholz, B. Porjesz, T. K. Li, P. M. Conneally, J. I. Nurnberger, Jr., J. A. Tischfield, R. R. Crow e, C. R. Cloninger, W. Wu, S. Shears, K. Carr, C. Crose, C. Willig, and H. Begleiter. 1998. Genome-wide search for genes affecting the risk for alcohol dependence. Am J Med Genet 81:207-215. Relethford, J. H. 2001. Absence of regional affi nities of Neandertal DNA with living humans does not reject multiregional evolution. Am J Phys Anthropol 115:95-98. Richardson, B. A., G. C. John-Stewart, J. P. H ughes, R. Nduati, D. Mbori-Ngacha, J. Overbaugh, and J. K. Kreiss. 2003. Breast-milk infectivity in human immunodeficiency virus type 1infected mothers. J Infect Dis 187:736-740.
Ritola, K., C. D. Pilcher, S. A. Fiscus, N. G. Ho ffman, J. A. Nelson, K. M. Kitrinos, C. B. Hicks, J. J. Eron, Jr., and R. Swanstrom. 2004. Multi ple V1/V2 env variants are frequently present during primary infection with huma n immunodeficiency virus type 1. J Virol 78:11208-11218. Ritola, K., K. Robertson, S. A. Fiscus, C. Hall, and R. Swanstrom. 2005. Increased human immunodeficiency virus type 1 (HIV-1) env compartmentalization in the presence of HIV-1-associated dementia. J Virol 79 :10830-10834. Rivas, R. A., A. A. el-Mohandes, and I. M. Katona. 1994. Mononuclear phagocytic cells in human milk: HLA-DR and Fc gamma R ligand expression. Biol Neonate 66:195-204. Robin, R. W., J. C. Long, J. K. Rasmussen, B. Albaugh, and D. Goldman. 1998. Relationship of binge drinking to alcohol dependence, othe r psychiatric disorders, and behavioral problems in an American Indian tribe. Alcohol Clin Exp Res 22:518-523. Rollins, N. C., S. M. Filteau, A. Coutsoudis, a nd A. M. Tomkins. 2001. Feeding mode, intestinal permeability, and neopterin excretion: a long itudinal study in infants of HIV-infected South African women. J Acquir Immune Defic Syndr 28:132-139. Rothschild, B. 2003. Infectious proce sses around the dawn of civilization in C. L. Greenblatt, and M. Spigelman, eds. Emerging Pathogens : The Archaeology, Ecology, and Evolution of Infectious Disease. Oxford University Press, London. Rousseau, C. M., R. W. Nduati, B. A. Richardson G. C. John-Stewart, D. A. Mbori-Ngacha, J. K. Kreiss, and J. Overbaugh. 2004. Association of levels of HIV-1-infected breast milk cells and risk of mother-to-ch ild transmission. J Infect Dis 190:1880-1888. Rousseau, C. M., R. W. Nduati, B. A. Richards on, M. S. Steele, G. C. John-Stewart, D. A. Mbori-Ngacha, J. K. Kreiss, and J. Over baugh. 2003. Longitudinal analysis of human immunodeficiency virus type 1 RNA in breast milk and of its relationship to infant infection and maternal disease. J Infect Dis 187 :741-747. Royce, R. A., A. 
Sena, W. Cates, Jr., and M. S. Cohen. 1997. Sexual transmission of HIV. N Engl J Med 336:1072-1078. Rozas, J., J. C. Sanchez-DelBarrio, X. Messeguer, and R. Rozas. 2003. DnaSP, DNA polymorphism analyses by the coalescen t and other methods. Bioinformatics 19 :24962497. Saag, M. S., B. H. Hahn, J. Gibbons, Y. Li, E. S. Parks, W. P. Parks, and G. M. Shaw. 1988. Extensive variation of hu man immunodeficiency virus type-1 in vivo. Nature 334 :440444. Saccone, N. L., J. M. Kwon, J. Corbett, A. Goate, N. Rochberg, H. J. Edenberg, T. Foroud, T. K. Li, H. Begleiter, T. Reich, and J. P. Rice. 2000. A genome screen of maximum number of drinks as an alcoholism phenotype. Am J Med Genet 96:632-637. Sagar, M., X. Wu, S. Lee, and J. Overbaugh. 2006. Human immunodeficiency virus type 1 V1V2 envelope loop sequences expand and add glycosylation sites over the course of infection, and these modifications affect antibody neutralization sensitivity. J Virol 80:9586-9598. Salemi, M., B. R. Burkhardt, R. R. Gray, G. Ghaffari, J. W. Sleasman, and M. M. Goodenow. 2007. Phylodynamics of HIV-1 in Lymphoid and Non-Lymphoid Tissues Reveals a Central Role for the Thymus in Emergen ce of CXCR4-Using Quasispecies. PLoS ONE 2:e950. Salemi, M., S. L. Lamers, S. Yu, T. de Oliveira, W. M. Fitch, and M. S. McGrath. 2005. Phylodynamic analysis of human immunodefici ency virus type 1 in distinct brain
compartments provides a model for th e neuropathogenesis of AIDS. J Virol 79:1134311352. Samson, M., F. Libert, B. J. Doranz, J. Rucker, C. Liesnard, C. M. Farber, S. Saragosti, C. Lapoumeroulie, J. Cognaux, C. Forceille, G. Muyldermans, C. Verhofstede, G. Burtonboy, M. Georges, T. Imai, S. Rana, Y. Yi, R. J. Smyth, R. G. Collman, R. W. Doms, G. Vassart, and M. Parmentier. 1996. Resistance to HIV-1 infection in caucasian individuals bearing mutant alleles of the CCR-5 chemokine receptor gene. Nature 382:722-725. Santoyo, G., and D. Romero. 2005. Gene conversi on and concerted evolution in bacterial genomes. FEMS Microbiol Rev 29:169-183. Schaid, D. J. 2004. Evaluating associations of haplotypes with traits. Genet Epidemiol 27:348364. Schimenti, J. C. 1994. Gene conversion and the e volution of gene families in mammals. Soc Gen Physiol Ser 49:85-91. Schneider, S., D. Roessli, and L. Excoffier. 2000. Arlequin: A software for population genetics data analysis. Genetics and Biometry Laboratory, Department of Anthropology, University of Geneva. Schork, N. J. 1997. Genetics of complex disease: approaches, problems, and solutions. Am J Respir Crit Care Med 156 :S103-109. Schroeder, S. A., D. M. Gaughan, and M. Swif t. 1995. Protection against bronchial asthma by CFTR delta F508 mutation: a heterozygote advantage in cystic fibrosis. Nat Med 1:703705. Schwartz, K., L. Carrier, P. Guicheney, and M. Komajda. 1995. Molecular basis of familial cardiomyopathies. Circulation 91:532-540. Seguin, B., B. Hardy, P. A. Singer, and A. S. Daar. 2008. Bidil: recontextualizing the race debate. Pharmacogenomics J. Self, D. W., A. W. McClenahan, D. Beitner-Joh nson, R. Z. Terwilliger, and E. J. Nestler. 1995. Biochemical adaptations in the mesolimbic dopa mine system in res ponse to heroin selfadministration. Synapse 21:312-318. Semba, R. D., N. Kumwenda, D. R. Hoover, T. E. Taha, T. C. Quinn, L. Mtimavalye, R. J. Biggar, R. Broadhead, P. G. Miotti, L. J. Sokoll, L. van der Hoeven, and J. 
D. Chiphangwi. 1999a. Human immunodeficiency vi rus load in breast milk, mastitis, and mother-to-child transmission of human immun odeficiency virus type 1. J Infect Dis 180:93-98. Semba, R. D., N. Kumwenda, T. E. Taha, D. R. Hoover, Y. Lan, W. Eisinger, L. Mtimavalye, R. Broadhead, P. G. Miotti, L. Van Der Hoeve n, and J. D. Chiphangwi. 1999b. Mastitis and immunological factors in breast milk of l actating women in Malawi. Clin Diagn Lab Immunol 6:671-674. Seshadri, R., G. S. Myers, H. Tettelin, J. A. Eisen, J. F. Heidelberg, R. J. Dodson, T. M. Davidsen, R. T. DeBoy, D. E. Fouts, D. H. Haft, J. Selengut, Q. Re n, L. M. Brinkac, R. Madupu, J. Kolonay, S. A. Durkin, S. C. Daugherty, J. Shetty, A. Shvartsbeyn, E. Gebregeorgis, K. Geer, G. Tsegaye, J. Malek, B. Ayodeji, S. Shatsman, M. P. McLeod, D. Smajs, J. K. Howell, S. Pal, A. Amin, P. Vashisth, T. Z. McNeill, Q. Xiang, E. Sodergren, E. Baca, G. M. Weinstock, S. J. Norris, C. M. Fraser, and I. T. Paulsen. 2004. Comparison of the genome of the oral pathogen Treponema denticola with other spirochete genomes. Proc Natl Acad Sci U S A 101 :5646-5651.
Sham, P. C., and D. Curtis. 1995. Monte Carlo test s for associations between disease and alleles at highly polymorphic loci. Ann Hum Genet 59:97-105. Shapiro, B., A. Rambaut, and A. J. Drummond. 2006. Choosing appropriate substitution models for the phylogenetic analysis of protein-coding sequences. Mol Biol Evol 23:7-9. Shapshak, P., D. M. Segal, K. A. Crandall, R. K. Fujimura, B. T. Zhang, K. Q. Xin, K. Okuda, C. K. Petito, C. Eisdorfer, and K. Goodkin. 1999. Independent evolution of HIV type 1 in different brain regions. AI DS Res Hum Retroviruses 15:811-820. Sharp, P. M., E. Bailes, R. R. Chaudhuri, C. M. Rodenburg, M. O. Santiago, and B. H. Hahn. 2001. The origins of acquired immune defici ency syndrome viruses: where and when? Philos Trans R Soc Lond B Biol Sci 356:867-876. Shen, Y. C., J. H. Fan, H. J. Edenberg, T. K. Li, Y. H. Cui, Y. F. Wang, C. H. Tian, C. F. Zhou, R. L. Zhou, J. Wang, Z. L. Zhao, and G. Y. Xia. 1997. Polymorphism of ADH and ALDH genes among four ethnic groups in Chin a and effects upon the risk for alcoholism. Alcohol Clin Exp Res 21:1272-1277. Sherry, S. T., A. R. Rogers, H. Harpending, H. Soodyall, T. Jenkins, and M. Stoneking. 1994. Mismatch distributions of mtDNA reveal recent human population expansions. Hum Biol 66:761-775. Shibasaki, Y., D. A. Baillie, D. St Clair, and A. J. Brookes. 1995. High-resolution mapping of SNCA encoding alpha-synuclein, the non-A beta component of Alzheimer's disease amyloid precursor, to human chromo some 4q21.3-->q22 by fluorescence in situ hybridization. Cytogenet Cell Genet 71:54-55. Shuster, D. E., M. E. Kehrli, Jr., and C. R. Baumrucker. 1995. Relationship of inflammatory cytokines, growth hormone, and insulin-like growth factor-I to reduced performance during infectious disease. Proc Soc Exp Biol Med 210:140-149. Si-Mohamed, A., M. D. Kazatchkine, I. Heard, C. Goujon, T. Prazuck, G. Aymard, G. Cessot, Y. H. Kuo, M. C. Bernard, B. Diquet, J. E. Malkin, L. Gutmann, and L. Belec. 2000. 
Selection of drug-resistant variants in the female genital tract of human immunodeficiency virus type 1-infected women receiving antiretroviral therapy. J Infect Dis 182:112-122. Simon, F., P. Mauclere, P. Roques, I. Loussert-A jaka, M. C. Muller-Trutwin, S. Saragosti, M. C. Georges-Courbot, F. Barre-Si noussi, and F. Brun-Vezinet. 1998. Identification of a new human immunodeficiency virus type 1 dist inct from group M and group O. Nat Med 4:1032-1037. Sinkala, M., L. Kuhn, C. Kankasa, P. Kasonde, C. Vwalika, M. Mwiya, N. Scott, K. Semrau, G. Aldrovandi, D. M. Thea, and Z. E. B. S. Group. 2007. No Benefit of Early Cessation of Breastfeeding at 4 Months on HIV-free Survival of Infants Born to HIV-infected Mothers in Zambia: The Zambia Exclusive Breastf eeding Study Conference on Retroviruses and Opportunistic Infections, San Francisco. Slatkin, M., and W. P. Maddis on. 1989. A cladistic measure of gene flow inferred from the phylogenies of alleles. Genetics 123:603-613. Slightom, J. L., A. E. Blechl, and O. Smithies. 1980. Human fetal G gammaand A gammaglobin genes: complete nucleotide sequen ces suggest that DNA can be exchanged between these duplicated genes. Cell 21:627-638. Smit, T. K., B. J. Brew, W. Tourtellotte, S. Morgello, B. B. Gelman, and N. K. Saksena. 2004. Independent evolution of human immunode ficiency virus (HIV) drug resistance
mutations in diverse areas of the brain in HIV-infected patients, with and without dementia, on antiretroviral treatment. J Virol 78:10133-10148. Smit, T. K., B. Wang, T. Ng, R. Osborne, B. Br ew, and N. K. Saksena. 2001. Varied tropism of HIV-1 isolates derived from different regions of adult br ain cortex discriminate between patients with and without AIDS dementia co mplex (ADC): evidence for neurotropic HIV variants. Virology 279:509-526. Smith, J. L., N. J. David, S. Indgin, C. W. Israel, B. M. Levine, J. Justice, Jr., J. A. McCrary, 3rd, R. Medina, P. Paez, E. Santana, M. Sarkar, N. J. Schatz, M. L. Spitzer, W. O. Spitzer, and E. K. Walter. 1971. Neuro-ophthalmological study of late yaws and pinta. II. The Caracas project. Br J Vener Dis 47 :226-251. Smith, J. M. 1992. Analyzing the mosaic structure of genes. J Mol Evol 34:126-129. Smith, J. M., C. G. Dowson, and B. G. Spratt. 1991. Localized sex in bacteria. Nature 349:29-31. Smith, M. M., and L. Kuhn. 2000. Exclusive breast-f eeding: does it have the potential to reduce breast-feeding transmission of HIV-1? Nutr Rev 58:333-340. Spence, J., T. Liang, T. Foroud, D. Lo, a nd L. Carr. 2005. Expression profiling and QTL analysis: a powerful complementary strate gy in drug abuse research. Addict Biol 10 :4751. Spillantini, M. G., A. Divane, and M. Goeder t. 1995. Assignment of human alpha-synuclein (SNCA) and beta-synuclein (SNCB) genes to chromosomes 4q21 and 5q35. Genomics 27:379-381. Spitzer, R. L., J. Endicott, and E. Robins. 1989. Research Diagnostic Criteria (RDC) for a selected group of psychiatric disorders. Department of Research Assessment and Training, New York Psychiat ric Institute, New York. Staprans, S., N. Marlowe, D. Glidden, T. N ovakovic-Agopian, R. M. Grant, M. Heyes, F. Aweeka, S. Deeks, and R. W. Price. 1999. Time course of cerebrospinal fluid responses to antiretroviral therapy: evidence for vari able compartmentalizati on of infection. Aids 13:1051-1061. Steffy, K., and F. Wong-Staal. 
1991. Genetic regulation of human immunodeficiency virus. Microbiol Rev 55:193-205.
Stephens, D. S., E. R. Moxon, J. Adams, S. Altizer, J. Antonovics, S. Aral, R. Berkelman, E. Bond, J. Bull, G. Cauthen, M. M. Farley, A. Glasgow, J. W. Glasser, H. P. Katner, S. Kelley, J. Mittler, A. J. Nahmias, S. Nichol, V. Perrot, R. W. Pinner, S. Schrag, P. Small, and P. H. Thrall. 1998a. Emerging and reemerging infectious diseases: a multidisciplinary perspective. Am J Med Sci 315:64-75.
Stephens, J. C., D. E. Reich, D. B. Goldstein, H. D. Shin, M. W. Smith, M. Carrington, C. Winkler, G. A. Huttley, R. Allikmets, L. Schriml, B. Gerrard, M. Malasky, M. D. Ramos, S. Morlot, M. Tzetis, C. Oddoux, F. S. di Giovine, G. Nasioulas, D. Chandler, M. Aseev, M. Hanson, L. Kalaydjieva, D. Glavac, P. Gasparini, E. Kanavakis, M. Claustres, M. Kambouris, H. Ostrer, G. Duff, V. Baranov, H. Sibul, A. Metspalu, D. Goldman, N. Martin, D. Duffy, J. Schmidtke, X. Estivill, S. J. O'Brien, and M. Dean. 1998b. Dating the origin of the CCR5-Delta32 AIDS-resistance allele by the coalescence of haplotypes. Am J Hum Genet 62:1507-1515.
Stephens, M., and P. Donnelly. 2003. A comparison of Bayesian methods for haplotype reconstruction from population genotype data. Am J Hum Genet 73:1162-1169.
Strain, M. C., S. Letendre, S. K. Pillai, T. Russell, C. C. Ignacio, H. F. Gunthard, B. Good, D. M. Smith, S. M. Wolinsky, M. Furtado, J. Marquie-Beck, J. Durelle, I. Grant, D. D. Richman, T. Marcotte, J. A. McCutchan, R. J. Ellis, and J. K. Wong. 2005. Genetic composition of human immunodeficiency virus type 1 in cerebrospinal fluid and blood without treatment and during failing antiretroviral therapy. J Virol 79:1772-1788.
Sullivan, S. T., U. Mandava, T. Evans-Strickfaden, J. L. Lennox, T. V. Ellerbrock, and C. E. Hart. 2005. Diversity, divergence, and evolution of cell-free human immunodeficiency virus type 1 in vaginal secretions and blood of chronically infected women: associations with immune status. J Virol 79:9799-9809.
Sun, E. S., B. J. Molini, L. K. Barrett, A. Centurion-Lara, S. A. Lukehart, and W. C. Van Voorhis. 2004. Subfamily I Treponema pallidum repeat protein family: sequence variation and immunity. Microbes Infect 6:725-737.
Surovell, T. A. 2003. Simulating coastal migration in New World colonization. Curr Anthropol 44:580-591.
Swofford, D. L. 2002. PAUP*: Phylogenetic Analysis Using Parsimony (and Other Methods). Sinauer.
Taguchi, Y., T. Koide, T. Shiroishi, and T. Yagi. 2005. Molecular evolution of cadherin-related neuronal receptor/protocadherin(alpha) (CNR/Pcdh(alpha)) gene cluster in Mus musculus subspecies. Mol Biol Evol 22:1433-1443.
Tamura, K., J. Dudley, M. Nei, and S. Kumar. 2007. MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Mol Biol Evol 24:1596-1599.
Tanaka, F., Y. Shiratori, O. Yokosuka, F. Imazeki, Y. Tsukada, and M. Omata. 1996. High incidence of ADH2*1/ALDH2*1 genes among Japanese alcohol dependents and patients with alcoholic liver disease. Hepatology 23:234-239.
Thea, D. M., G. Aldrovandi, C. Kankasa, P. Kasonde, W. D. Decker, K. Semrau, M. Sinkala, and L. Kuhn. 2006. Post-weaning breast milk HIV-1 viral load, blood prolactin levels and breast milk volume. Aids 20:1539-1547.
Thea, D. M., C. Vwalika, P. Kasonde, C. Kankasa, M. Sinkala, K. Semrau, E. Shutes, C. Ayash, W. Y. Tsai, G. Aldrovandi, and L. Kuhn. 2004. Issues in the design of a clinical trial with a behavioral intervention--the Zambia exclusive breast-feeding study. Control Clin Trials 25:353-365.
Thomasson, H. R., D. W. Crabb, H. J. Edenberg, and T. K. Li. 1993. Alcohol and aldehyde dehydrogenase polymorphisms and alcoholism. Behav Genet 23:131-136.
Thomasson, H. R., D. W. Crabb, H. J. Edenberg, T. K. Li, H. G. Hwu, C. C. Chen, E. K. Yeh, and S. J. Yin. 1994. Low frequency of the ADH2*2 allele among Atayal natives of Taiwan with alcohol use disorders. Alcohol Clin Exp Res 18:640-643.
Thomasson, H. R., H. J. Edenberg, D. W. Crabb, X. L. Mai, R. E. Jerome, T. K. Li, S. P. Wang, Y. T. Lin, R. B. Lu, and S. J. Yin. 1991. Alcohol and aldehyde dehydrogenase genotypes and alcoholism in Chinese men. Am J Hum Genet 48:677-681.
Thompson, J. D., T. J. Gibson, F. Plewniak, F. Jeanmougin, and D. G. Higgins. 1997. The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res 25:4876-4882.
Thompson, K. A., M. J. Churchill, P. R. Gorry, J. Sterjovski, R. B. Oelrichs, S. L. Wesselingh, and C. A. McLean. 2004. Astrocyte specific viral strains in HIV dementia. Ann Neurol 56:873-877.
Toniolo, A., C. Serra, P. G. Conaldi, F. Basolo, V. Falcone, and A. Dolei. 1995. Productive HIV-1 infection of normal human mammary epithelial cells. Aids 9:859-866.
Touchman, J. W., A. Dehejia, O. Chiba-Falek, D. E. Cabin, J. R. Schwartz, B. M. Orrison, M. H. Polymeropoulos, and R. L. Nussbaum. 2001. Human and mouse alpha-synuclein genes: comparative genomic sequence analysis and identification of a novel gene regulatory element. Genome Res 11:78-86.
Ueda, K., H. Fukushima, E. Masliah, Y. Xia, A. Iwai, M. Yoshimoto, D. A. Otero, J. Kondo, Y. Ihara, and T. Saitoh. 1993. Molecular cloning of cDNA encoding an unrecognized component of amyloid in Alzheimer disease. Proc Natl Acad Sci U S A 90:11282-11286.
Uhl, G. R., Q. R. Liu, D. Walther, J. Hess, and D. Naiman. 2001. Polysubstance abuse-vulnerability genes: genome scans for association, using 1,004 subjects and 1,494 single-nucleotide polymorphisms. Am J Hum Genet 69:1290-1300.
Ullrich, R., H. L. Schieferdecker, K. Ziegler, E. O. Riecken, and M. Zeitz. 1990. gamma delta T cells in the human intestine express surface markers of activation and are preferentially located in the epithelium. Cell Immunol 128:619-627.
Van de Perre, P., P. Lepage, A. Simonon, C. Desgranges, D. G. Hitimana, A. Bazubagira, C. Van Goethem, A. Kleinschmidt, F. Bex, K. Broliden, et al. 1992. Biological markers associated with prolonged survival in African children maternally infected by the human immunodeficiency virus type 1. AIDS Res Hum Retroviruses 8:435-442.
Van de Perre, P., A. Simonon, D. G. Hitimana, F. Dabis, P. Msellati, B. Mukamabano, J. B. Butera, C. Van Goethem, E. Karita, and P. Lepage. 1993. Infective and anti-infective properties of breastmilk from HIV-1-infected women. Lancet 341:914-918.
Venturi, G., M. Catucci, L. Romano, P. Corsi, F. Leoncini, P. E. Valensin, and M. Zazzi. 2000. Antiretroviral resistance mutations in human immunodeficiency virus type 1 reverse transcriptase and protease from paired cerebrospinal fluid and plasma samples. J Infect Dis 181:740-745.
Verrelli, B. C., J. H. McDonald, G. Argyropoulos, G. Destro-Bisol, A. Froment, A. Drousiotou, G. Lefranc, A. N. Helal, J. Loiselet, and S. A. Tishkoff. 2002. Evidence for balancing selection from nucleotide sequence analyses of human G6PD. Am J Hum Genet 71:1112-1128.
Wagner, A. 1998. The fate of duplicated genes: loss or new function? Bioessays 20:785-788.
Walker, S. J., and K. A. Grant. 2006. Peripheral blood alpha-synuclein mRNA levels are elevated in cynomolgus monkeys that chronically self-administer alcohol. Alcohol 38:1-4.
Wall, T. L., C. Garcia-Andrade, H. R. Thomasson, L. G. Carr, and C. L. Ehlers. 1997. Alcohol dehydrogenase polymorphisms in Native Americans: identification of the ADH2*3 allele. Alcohol Alcohol 32:129-132.
Walsh, J. B. 1995. How often do duplicated genes evolve new functions? Genetics 139:421-428.
Wang, S., C. M. Lewis, M. Jakobsson, S. Ramachandran, N. Ray, G. Bedoya, W. Rojas, M. V. Parra, J. A. Molina, C. Gallo, G. Mazzotti, G. Poletti, K. Hill, A. M. Hurtado, D. Labuda, W. Klitz, R. Barrantes, M. C. Bortolini, F. M. Salzano, M. L. Petzl-Erler, L. T. Tsuneto, E. Llop, F. Rothhammer, L. Excoffier, M. W. Feldman, N. A. Rosenberg, and A. Ruiz-Linares. 2007. Genetic Variation and Population Structure in Native Americans. PLoS Genet 3:e185.
Wang, T. H., Y. K. Donaldson, R. P. Brettle, J. E. Bell, and P. Simmonds. 2001. Identification of shared populations of human immunodeficiency virus type 1 infecting microglia and tissue macrophages outside the central nervous system. J Virol 75:11686-11699.
Wei, X., J. M. Decker, S. Wang, H. Hui, J. C. Kappes, X. Wu, J. F. Salazar-Gonzalez, M. G. Salazar, J. M. Kilby, M. S. Saag, N. L. Komarova, M. A. Nowak, B. H. Hahn, P. D. Kwong, and G. M. Shaw. 2003. Antibody neutralization and escape by HIV-1. Nature 422:307-312.
West, K. M. 1974. Diabetes in American Indians and other native populations of the New World. Diabetes 23:841-855.
WHO. 2003. HIV and Infant Feeding. World Health Organization.
WHO. 2007. Mother-to-child transmission of HIV.
WHO. 2008a. Impact of chronic disease by World Bank income groups: High Income Countries. World Health Organization.
WHO. 2008b. Impact of chronic disease by World Bank income groups: Low Income Countries. World Health Organization.
WHO. 2006. Comprehensive HIV Prevention: 2006 Report on the Global AIDS Epidemic in W. H. Organization, ed.
Williams, J. T., H. Begleiter, B. Porjesz, H. J. Edenberg, T. Foroud, T. Reich, A. Goate, P. Van Eerdewegh, L. Almasy, and J. Blangero. 1999. Joint multipoint linkage analysis of multivariate qualitative and quantitative traits. II. Alcoholism and event-related potentials. Am J Hum Genet 65:1148-1160.
Williams, R. C., W. C. Knowler, D. J. Pettitt, J. C. Long, D. A. Rokala, H. F. Polesky, R. A. Hackenberg, A. G. Steinberg, and P. H. Bennett. 1992. The magnitude and origin of European-American admixture in the Gila River Indian Community of Arizona: a union of genetics and demography. Am J Hum Genet 51:101-110.
Willumsen, J. F., S. M. Filteau, A. Coutsoudis, K. E. Uebel, M. L. Newell, and A. M. Tomkins. 2000. Subclinical mastitis as a risk factor for mother-infant HIV transmission. Adv Exp Med Biol 478:211-223.
Wirt, D. P., L. T. Adkins, K. H. Palkowetz, F. C. Schmalstieg, and A. S. Goldman. 1992. Activated and memory T lymphocytes in human milk. Cytometry 13:282-290.
Wise, P. H. 1976. Diabetes and associated variables in the South Australian Aboriginal. Aust NZ J Med 6:191-196.
Wong, J. K., C. C. Ignacio, F. Torriani, D. Havlir, N. J. Fitch, and D. D. Richman. 1997. In vivo compartmentalization of human immunodeficiency virus: evidence from the examination of pol sequences from autopsy tissues. J Virol 71:2059-2071.
Wooding, S., A. C. Stone, D. M. Dunn, S. Mummidi, L. B. Jorde, R. K. Weiss, S. Ahuja, and M. J. Bamshad. 2005. Contrasting effects of natural selection on human and chimpanzee CC chemokine receptor 5. Am J Hum Genet 76:291-301.
Worobey, M. 2001. A novel approach to detecting and measuring recombination: new insights into evolution in viruses, bacteria, and mitochondria. Mol Biol Evol 18:1425-1434.
Xanthou, M. 1997. Human milk cells. Acta Paediatr 86:1288-1290.
Yancy, C. W., J. K. Ghali, V. M. Braman, M. L. Sabolinski, M. Worcel, W. T. Archambault, and J. A. Franciosa. 2007. Evidence for the continued safety and tolerability of fixed-dose isosorbide dinitrate/hydralazine in patients with chronic heart failure (the extension to African-American Heart Failure Trial). Am J Cardiol 100:684-689.
Yang, Z. 1997. PAML: a program package for phylogenetic analysis by maximum likelihood. Comput Appl Biosci 13:555-556.
Young, T. K., J. Reading, B. Elias, and J. D. O'Neil. 2000. Type 2 diabetes mellitus in Canada's first nations: status of an epidemic in progress. Cmaj 163:561-566.
Zhang, J. R., J. M. Hardham, A. G. Barbour, and S. J. Norris. 1997. Antigenic variation in Lyme disease borreliae by promiscuous recombination of VMP-like sequence cassettes. Cell 89:275-285.
Zhang, J. R., and S. J. Norris. 1998. Genetic variation of the Borrelia burgdorferi gene vlsE involves cassette-specific, segmental gene conversion. Infect Immun 66:3698-3704.
Zhang, Q. Y., D. DeRyckere, P. Lauer, and M. Koomey. 1992. Gene conversion in Neisseria gonorrhoeae: evidence for its role in pilus antigenic variation. Proc Natl Acad Sci U S A 89:5366-5370.
Zhu, T., N. Wang, A. Carr, D. S. Nam, R. Moor-Jankowski, D. A. Cooper, and D. D. Ho. 1996. Genetic characterization of human immunodeficiency virus type 1 in blood and genital secretions: evidence for viral compartmentalization and selection during sexual transmission. J Virol 70:3098-3107.
Zinn-Justin, A., and L. Abel. 1999. Genome search for alcohol dependence using the weighted pairwise correlation linkage method: interesting findings on chromosome 4. Genet Epidemiol 17 Suppl 1:S421-426.
BIOGRAPHICAL SKETCH
I graduated from Great Valley High School in Malvern, Pennsylvania, in 1997. I attended Reed College in Portland, Oregon, from August to December 1997. I attended Millersville University in Millersville, Pennsylvania, from January 1999 to December 2001, and graduated in December 2001 with a Bachelor of Arts degree in anthropology. I began graduate school in anthropology at the University of Florida in August 2002. I received my Master of Arts degree in May 2005 and my Ph.D. in May 2008.