EPIDEMIOLOGICAL MODELS OF INFECTIOUS DISEASES FOR CLINICAL AND PUBLIC HEALTH DECISION SUPPORT

By

JACOB DANIEL BALL

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2018
© 2018 Jacob Daniel Ball
To everyone I have ever been lucky enough to learn from
ACKNOWLEDGMENTS

This dissertation would not be possible if it were not for my fantastic PhD committee. Drs. Chen, Cummings, Prosperi, Kenah and Yang have provided me with incredible mentorship as an epidemiologist and scientist. In particular, Dr. Cummings stepped up and took me on as a student during a tough time in my academic trajectory, and Dr. Chen made sure I met all of my programmatic milestones. I am proud to call all of my committee members my colleagues and friends.

I would not be the scientist I am today without the guidance of Dr. Juliet R.C. Pulliam during the first two years of my doctoral studies, and I thank her for taking a chance on me. She taught me that learning things by pen and paper before taking shortcuts is the only way to truly learn something. She led me on a journey of academic, professional, and personal self-discovery, and I am forever thankful that I can call her a mentor and friend.

I would like to thank the UF Haiti Laboratory and the Christianville Foundation in Haiti for their work to provide quality medical care and diagnostic services to school children in Haiti. Without their tireless work, Chapter 2 would not have been possible. Specifically, I would like to thank Drs. J. Glenn Morris, Jr., Maha El Badry, Sarah White, Valery Madsen Beau de Rochars, and John Lednicky for their work in coordinating data collection, processing blood samples, and performing diagnostic tests for the data in this study. Dr. Taina Telisma, Ms. Sonese Chavannes and Marie Gina Anilis played essential roles in the data collection process. I thank the government of Haiti for permitting us to perform this important work.

I am very thankful for Kyra Grantz, Chuck Vukotich and the rest of the Social Mixing and Respiratory Transmission in Schools study staff for facilitating data access
and for being so generous with their time. I would like to acknowledge the U.S. Centers for Disease Control and Prevention for funding the study, and the schools and school children for participating in the study. Without them, Chapter 3 would not have been possible.

Chapter 4 would not have been possible without the mentorship of Dr. Caitlin Rivers, who is an inspiration and a great friend, and without the financial support of the Science, Mathematics and Research for Transformation Scholarship program funded by the U.S. Department of Defense. I am thankful for my colleagues at Army Public Health Center for their support, and to the military recruits whose data were used in this dissertation for their service.

Lastly, I would like to thank my friends and family for their overwhelming support and love throughout my journey to the doctorate. None of this would have been possible without you, and I am incredibly grateful.
TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER

1 INTRODUCTION
    Public Health Surveillance: The Observation Process
    Models for Clinical Decision Support
    Models for Public Health Decision Support
    Model Validation, Comparison and Selection
    Overview of the Dissertation

2 ALGORITHMS TO PREDICT ARBOVIRAL INFECTIONS USING CLINICAL AND DEMOGRAPHIC INFORMATION DURING CO-INCIDENT OUTBREAKS OF DENGUE, CHIKUNGUNYA AND ZIKA
    Background
    Materials and Methods
        Data
        Analysis
    Results
        Study Sample Characteristics
        Clinical Symptoms by Pathogen
        Multivariate Regression Results
            CHIKV Predictive Models
            DENV-1 Predictive Models
            DENV-4 Predictive Models
    Discussion
        Stepwise Regression for Variable Selection
        Full-Scope Modeling of CHIKV, DENV-1 and DENV-4 To Aid Differential Diagnosis
        Performance of the Predictive Models for CHIKV
        Limitations

3 ESTIMATING THE CONTRIBUTION OF PRIOR INFECTION WITH SPECIFIC RESPIRATORY VIRUSES ON WITHIN-SEASON VIRAL INFECTION HAZARDS
    Background
        Immunological Mechanisms
        Respiratory Diseases
    Materials and Methods
        Surveillance System
        Data Setup
        Analysis
    Results
        Pathogens
        Demographics
        IRR E and Univariate Hazard Estimates for Prior ILI, Prior Infections and Demographic Variables
        Influenza B (Flu B) Outcome
        Influenza A (Flu A) Outcome
        Human Rhinovirus (HRV) Outcome
        Coronavirus (CoV) Outcome
        Multivariate, Mixed Effects Model
            Influenza B (Flu B) outcome
            Influenza A (Flu A) outcome
            Human rhinovirus (HRV) outcome
            Coronavirus (CoV) outcome
    Discussion
        Any Swab Hazard Estimates

4 THE ROLE OF BENZATHINE PENICILLIN G AT PREDICTING AND PREVENTING ALL-CAUSE ACUTE RESPIRATORY DISEASE IN MILITARY RECRUITS
    Background
    Materials and Methods
        Data
        Analysis
    Results
        Prediction
            Cross validation (22.5 years)
            External Validation
        Inference on BPG Effectiveness
    Discussion

5 CONCLUDING THOUGHTS AND FUTURE DIRECTIONS
    Clinical Decision Support
    Public Health Surveillance
    Public Health Intervention Evaluation
    Concluding Thoughts

LIST OF REFERENCES
BIOGRAPHICAL SKETCH
LIST OF TABLES

Table

2-1 Demographics by pathogen
2-2 Prevalence of symptoms by pathogen
2-3 Multivariate stepwise logistic regression for each outcome on full dataset
2-4 Variable importance measures from random forest models on full data set
2-5 Mean and Standard Deviation of Sensitivity, Specificity and AUROC in Validation for CHIKV Models with 0.50 Threshold
2-6 Mean and Standard Deviation of Sensitivity, Specificity and AUROC in Validation for DENV-1 Models with 0.09 Threshold
2-7 Mean and standard deviation of sensitivity, specificity and AUROC in validation for DENV-4 models with 0.09 threshold
2-8 Pathogen outcome by cluster from k-means clustering of symptoms with k=6
3-1 Demographics of Study Population By Infection Status at End of Surveillance Period
3-2 PCR Confirmed Infection by Swab Number
3-3 IRR estimate and univariate Cox regression for each pathogen outcome
3-4 Multivariate, Mixed Effects Cox Regression for Each Pathogen Outcome
4-1 Comparing Prediction Accuracy for 5-Fold Cross Validation by Mean Absolute Error for Random Forest and Poisson Regression
4-2 Variable Importance (% Increase Mean Squared Error when left out) for Random Forest Model (k-fold cross validation) Using Trainee Class Size
4-3 Comparison of variable importance between the cross validation set and the external validation set from RF models
4-4 Incidence Rate Ratios for Independent Variables in the Poisson Regression Model on the Training Data with Population Offset (22.5 years)
LIST OF FIGURES

Figure

2-1 ROC Curve for CHIKV Outcome with 0.50 Threshold
2-2 ROC Curves for DENV-1 Outcome with 0.09 Threshold
2-3 ROC Curves for DENV-4 Outcome with 0.09 Threshold
2-4 k-means clustering of symptoms shown by patients, with k=6
3-1 Heat map of swab timing, order and co-detection status for Flu B
3-2 Heat map of swab timing, order and co-detection status for Flu A
3-3 Heat map of swab timing, order and co-detection status for HRV
3-4 Heat map of swab timing, order and co-detection status for CoV
4-1 Time series of ARD cases from 1991-2017 at Fort Sill, Fort Leonard Wood, Fort Benning, and Fort Jackson.
4-2 Predicted and Observed Values from the 5-Fold Cross Validation
4-3 Predicted and Observed ARD Case Counts in External Validation Set
Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

EPIDEMIOLOGICAL MODELS OF INFECTIOUS DISEASES FOR CLINICAL AND PUBLIC HEALTH DECISION SUPPORT

By

Jacob Daniel Ball

May 2018

Chair: Xinguang Chen
Major: Epidemiology

Data collected by clinicians and infectious disease surveillance systems, when modeled appropriately, can be used for a variety of decision support purposes, including to aid patient diagnostics and treatment; to identify at-risk subpopulations in order to guide targeted surveillance efforts; and to evaluate interventions. This dissertation consists of three examples of fitting epidemiological models to surveillance data to provide guidance to clinicians and public health professionals to treat, survey and control infectious diseases.

Chapter 2 fit and validated a suite of regression and machine learning models to clinical symptom and demographic data from febrile school children in Haiti to predict arbovirus infection. We were able to correctly predict chikungunya virus infection with 72% sensitivity and 63% specificity, based on the presence of arthralgia and grade of the patient. Results from this study can streamline screening practices, as well as guide prescribing pain management medications in areas with co-incident epidemics of dengue, chikungunya and Zika.

Chapter 3 used a survival analysis framework to quantify the extent that prior infection, both in general and with specific respiratory viruses, influences the risk of
subsequent infection with multiple different respiratory viruses in school children during the same influenza season. Results from this study showed that any prior infection was significantly, positively associated with subsequent infection, indicating that surveillance efforts should focus on people who previously had a respiratory infection.

Chapter 4 fit and validated machine learning and regression models to 26 years of weekly counts of all-cause acute respiratory disease among Army trainees in order to evaluate the effectiveness of a high-dose antibiotic prophylactic shot at preventing illness. We found that the antibiotic prophylaxis was significantly protective against disease, but less so than the adenovirus vaccine. The prophylaxis was associated with a 32% reduction in incidence. While these results suggest that the prophylaxis is beneficial at reducing all-cause acute respiratory disease, further studies are needed to examine dosing and distribution.

Fitting and validating epidemiological models to infectious disease data are essential in elucidating the underlying mechanisms and determinants of infection outcomes, and to guide clinical treatment and public health policy.
CHAPTER 1
INTRODUCTION

Public Health Surveillance: The Observation Process

One fundamental role of public health systems regarding infectious diseases is to perform disease surveillance and to implement evidence-based interventions to limit spread at the population level. One major role of the infectious disease epidemiologist has traditionally been to collect data and then apply statistical models to identify risk factors that have a statistically significant association with infection or with mortality from such infection. A primary application of this is to identify subpopulations for targeted interventions, in order to improve health outcomes among those at highest risk of illness or death.

Most infectious disease surveillance is passive, meaning that in order for a case to be detected, the ill patient must present to a medical facility, the medical provider must order the proper diagnostic tests for the suspected etiologic agent, and the test must then yield a positive result. Then, the laboratory that performed the test or the medical provider must report the positive case to the proper health authorities, who would enter the case into a database. The performance of a surveillance system at capturing health information can be affected by many patient-level factors, including social, biological and behavioral factors. For example, different pathogens have different rates of asymptomatic or subclinical infections, whereby a patient is infected but does not experience clinical disease, or does experience clinical disease but not to the extent that would prompt him or her to seek medical care. Additionally, health-seeking behavior is known to vary by socioeconomic
status, health insurance status and other demographic characteristics. If infected individuals do not seek care, then cases are not captured in surveillance systems, and the covariates that would have been captured, such as gender, age, race, and other demographic and behavioral characteristics, are missing. These covariates are essential in the descriptive epidemiological framework, as they are the factors associated with either increased or decreased risk of infection. If demographic, social or biological variables were differentially associated with symptomatic infection outcomes and health seeking, then estimates of the risk factors could be biased, rather than simply error-prone.

Increasingly, and drawing from methods in biostatistics, disease ecology, quantitative population biology, computer engineering and more, epidemiology is transitioning from a largely descriptive discipline to an analytic one in which the goal is to understand the underlying biological and social processes that govern disease emergence and epidemic dynamics (1). Applying these methods to infectious disease surveillance data may help elucidate the mechanisms that drive transmission, and identify opportunities to control outbreaks. Rather than collecting data and then fitting statistical models, the analytic approach consists of building models and then fitting them to data. The goal of this is to evaluate which model is best able to explain the real-world processes that produced the observed data. Dynamic and other non-linear models, including machine learning models, are used to explore emergent properties of the systems that produce the observed data (2,3). For example, with this approach, it is possible to examine higher
order interactions among the components (e.g. variables) of the system and quantify the effects of unmeasured covariates on the outcome of interest (1).

Models for Clinical Decision Support

Clinicians are responsible for the timely diagnosis and treatment of patients. When a patient presents with symptoms, the provider must accurately record those signs and symptoms and form a judgment about the cause of the illness. The provider then begins treatment based on the suspected illness, while he or she waits for a confirmatory diagnosis via tests such as enzyme-linked immunosorbent assays (ELISAs) or polymerase chain reaction (PCR). Common risk factors, which have been identified by epidemiological and biostatistical models, are frequently used by clinicians during differential diagnosis, and can advise the provider on what confirmatory diagnostic tests to order. Precision medicine, which seeks to identify optimal treatment regimens for individual patients, is gaining traction. While these two areas are important in the clinical decision-making process, the former is subject to the ecological fallacy, while the latter is expensive and time-consuming, which makes it impractical in an outpatient scenario.

In many circumstances, multiple pathogens can have overlapping symptoms, which can lead to incorrect differential diagnoses and thus inappropriate treatment. Depending on the pathogen of interest, confirmatory tests can take hours or even days if sent off to a laboratory, and they may or may not arrive at a conclusive result. In some circumstances, providing incorrect treatment from the onset may result in disease progression, or possibly unintended side effects from the incorrect treatment. Predictive
models that combine demographic with clinical and laboratory data have been used to alert medical providers of possible clinical complications and progression of chronic kidney disease (4), mortality from heart failure (5), diagnosing influenza infection (6), and more. The results from these types of studies can provide clinicians and epidemiologists with tools to improve diagnosis and treatment for individual patients, and improve surveillance and estimates of disease burden at the population level.

Models for Public Health Decision Support

Epidemiological models can provide a variety of information that can be used to inform public health policy during epidemics. For example, models can assist decision makers in estimating the number of people that will be infected in the next week, the timing of an epidemic peak, the maximum peak size, and the cumulative incidence. Models can be used to estimate the effects of interventions both on short-term disease outcomes (e.g. incidence) and longer-term dynamics (e.g. endemicity). However, there are a number of issues at the interface of modeling and public health practice that must be addressed in order to ensure the effective utilization of models in guiding public health decisions (7).

All models have underlying assumptions that must be effectively communicated to policymakers because they directly influence projections and other estimates that the models produce (7). For example, data from experimental animal models are frequently used to assume individual trajectories and time course of infections in mathematical models. These values are then used to parameterize population-level models, which tend to assume that humans and the animal in question experience the same underlying susceptibility to disease, the same immunological response kinetics, and that
key components of transmission (i.e. duration of infectiousness and generation intervals) are the same (8).

Whereas mathematical models can be purely mechanistic and based not on data but on assumptions of the biological and social phenomena responsible for disease transmission, statistical models are fit to data generated in the real world by an imperfectly understood observation or generating process. This could mean that assumptions of the statistical model are violated. Furthermore, the data collected are often incomplete in that observations are missing (either at random or systematically) or important variables on the true causal pathway are not properly captured (either due to difficulties or cost in the data collection process, or because elements on the causal pathway have not been discovered yet). As a result, the models could produce error-prone estimates or entirely spurious associations between an intervention and the outcome of interest, which could translate into suboptimal interventions or policies. Communicating these assumptions and the proper interpretation of model estimates to policymakers is vital to ensuring that the science behind evidence-based policies is clear.

Public health responses to outbreaks are reactive and need to have immediate effects, but are also desired to have long-term impacts. The uncertainty in model projections increases with the forecasting window, even if all of the underlying model assumptions are appropriate. The initial conditions (parameterization) of the model often are no longer accurate on longer time scales (7). In fact, public health responses can be a cause of this infidelity in prediction, since policies and interventions can change behavior in ways that result in decreased transmission, and the effectiveness of
interventions is proportional to their acceptability and utilization in the population. For example, during the West African Ebola outbreak, as more Ebola Treatment Units opened up and contact tracing and surveillance improved, cases were discovered earlier and moved to appropriate medical facilities, which resulted in fewer hospital-acquired cases (9). Additionally, an intervention for safe burial practices for deceased Ebola virus disease patients resulted in a decrease in transmission, but uptake of the intervention was first met with resistance from community members. Epidemiological models showed that the timing, pervasiveness and effectiveness of these interventions could have had variable impacts on the Ebola outbreak trajectory (9). In practice, multiple competing models that make various underlying assumptions are used and then formally compared in order to select the one that performs the best.

Model Validation, Comparison and Selection

In order to contextualize and interpret estimates of the effects of covariates on the outcome of interest, it is important to assess the validity of the statistical model that generates those estimates. A model that suggests use of an intervention is associated with a 200 percent reduction in mortality from a disease has positive implications. But, if that model can only explain a very small fraction of the variance in the data, or if the model has poor sensitivity and specificity, then researchers and policymakers should be cautious in making inferences from the results. Models should be tested for internal validity, and then compared against competing models in order to ensure that proper inferences are made. In this section, we review three ways to internally validate models, and then discuss ways to compare and select from competing models.
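One of the validation schemes reviewed in this section, k-fold cross-validation with sensitivity and specificity as performance metrics, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used in this dissertation: the function names (`k_fold_indices`, `sensitivity_specificity`, `fit_threshold_rule`, `cross_validate`) are hypothetical, and the single-predictor threshold rule is a toy stand-in for the regression and machine learning models fit in later chapters.

```python
import random
from statistics import mean


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and partition them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Taking every k-th index assigns each observation to exactly one fold.
    return [idx[i::k] for i in range(k)]


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity); a metric is None if undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sens = tp / (tp + fn) if (tp + fn) > 0 else None
    spec = tn / (tn + fp) if (tn + fp) > 0 else None
    return sens, spec


def fit_threshold_rule(x_train, y_train):
    """Toy classifier (an assumption for illustration): predict 1 when x
    exceeds the midpoint between the two class means in the training data."""
    pos = [x for x, y in zip(x_train, y_train) if y == 1]
    neg = [x for x, y in zip(x_train, y_train) if y == 0]
    cut = (mean(pos) + mean(neg)) / 2
    return lambda x: 1 if x > cut else 0


def cross_validate(x, y, k=5, seed=0):
    """Train on k-1 folds, test on the held-out fold, and average the
    metrics over the folds in which each metric is defined."""
    sens_list, spec_list = [], []
    for test_idx in k_fold_indices(len(x), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(x)) if i not in held_out]
        model = fit_threshold_rule([x[i] for i in train_idx],
                                   [y[i] for i in train_idx])
        preds = [model(x[i]) for i in test_idx]
        sens, spec = sensitivity_specificity([y[i] for i in test_idx], preds)
        if sens is not None:
            sens_list.append(sens)
        if spec is not None:
            spec_list.append(spec)
    return mean(sens_list), mean(spec_list)


if __name__ == "__main__":
    # Simulated data: one noisy predictor that is higher, on average, in cases.
    rng = random.Random(42)
    y = [rng.randint(0, 1) for _ in range(200)]
    x = [rng.gauss(1.0 if yi == 1 else 0.0, 1.0) for yi in y]
    print(cross_validate(x, y, k=5))
```

The same loop structure applies to the other two schemes: a single random split corresponds to using one fold as the test set, and the bootstrap variant replaces the fold partition with a sample drawn with replacement for training and the unsampled observations for testing.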
Perhaps the most common way to quantify the internal validity of a model is to split the data into training and testing sets. The training set is a randomly selected subset of the original data, and is used to build the model, while the test set consists of the remaining observations from the original data. The model is then used to predict the outcome of interest for the observations in the test set, and the predictions are compared against the observed values (10). The percentages of observations with and without the outcome of interest that the model correctly classifies are referred to as the sensitivity and specificity, respectively. There is often a tradeoff between sensitivity and specificity in model performance. This procedure can be repeated many times and then the metrics for model performance can be averaged over all runs.

Another way to validate a model is to create a training dataset by sampling observations with replacement (bootstrapping) from the original data so that the training set has, on average, the same properties as the complete, original dataset. The model is then trained on the bootstrapped sample and tested on the observations not selected by the bootstrap procedure (11,12). This procedure can similarly be repeated many times, and performance metrics can be averaged over all runs.

A third model validation technique is k-fold cross-validation. In this approach, the original dataset is partitioned into k folds or segments. The model is then trained on k-1 folds and tested on the kth fold. The training and testing are then repeated k times so that each fold serves as the test set once. Prediction statistics are calculated for each test fold, and then averaged in order to estimate the overall performance of the model (13,14).

Many different methods are used to select the best model from a set of competing models. First is the principle of parsimony. In situations where multiple
models fit the data similarly well based on the prediction statistics generated by validation, the simplest model should be preferred. Simplicity can refer to the number of parameters that are estimated by the model, or to the model structure itself. For example, if a linear model fits the data as well as a polynomial-based model, the linear model should be selected. Comparing likelihoods, which can be thought of as the probability of the data given a model, is one way to compare two competing models. As a rule of thumb, the model with the maximum likelihood (or minimal negative log likelihood) should be selected. A formal test to compare models is the likelihood ratio test, which utilizes the likelihood functions of the respective models, and includes a penalty term that is a function of the number of parameters or the number of observations used in the models (15). Another formal test is a t test with adjusted degrees of freedom (16). In this approach, performance metrics (e.g. mean absolute error or mean squared error) from multiple validation runs are averaged within each model and then formally tested across models in a difference of means framework.

Overview of the Dissertation

This dissertation consists of four additional chapters. Chapter 2 fits regression and machine learning models to clinical and demographic data from febrile school-aged children in Haiti to predict which, if any, arbovirus they are infected by. It also examines symptom clusters in relation to diagnostic outcomes to characterize the spectrum of clinical illness associated with the respective infections. The results of this study can be used to assist clinicians in differentially diagnosing patients during co-incident epidemics of multiple arboviruses. This work was completed in collaboration with Valery Madsen Beau de Rochars, Maha A. El Badry, Taina Telisma, Sarah K. White, Sonese
Chavannes, Marie Gina Anilis, John Lednicky, J. Glenn Morris, Derek A.T. Cummings, Eben Kenah, Xinguang Chen, Mattia Prosperi and Yang Yang.

Chapter 3 examines within-season epidemiology of respiratory viruses in a cohort of school children from Allegheny County, PA. Specifically, it utilizes Cox proportional hazards models to estimate the effects of prior infection with specific respiratory viruses on the hazard of infection with other viruses during the same respiratory disease season. The results of this study can be used for improved surveillance during the respiratory disease season in order to prevent severe clinical disease. This work was completed in collaboration with the SMART Schools Study investigators and study staff, including but not limited to Shanta Zimmer, Jonathan Read, Charles Vukotich, Mary Lou Schweizer, Kyra Grantz, and Derek A.T. Cummings.

Chapter 4 utilizes regression and machine learning methods to estimate the predictive ability and effectiveness of an antibiotic prophylaxis shot on the burden of all-cause acute respiratory disease in military recruits. The results of this study can be used to help Army Medicine to ensure the medical readiness of the US Army and thus contribute to biosecurity. This work was completed with Alfonza Brown at Army Public Health Center, Caitlin Rivers and Johns Hopkins University Center for Health Security, as well as Derek A.T. Cummings, Xinguang Chen, Mattia Prosperi, Eben Kenah and Yang Yang at the University of Florida.

The dissertation ends with a Conclusion chapter that summarizes the results of each of the studies, discusses the public health impact of the work, and provides insights into possible future directions.
CHAPTER 2
ALGORITHMS TO PREDICT ARBOVIRAL INFECTIONS USING CLINICAL AND DEMOGRAPHIC INFORMATION DURING CO-INCIDENT OUTBREAKS OF DENGUE, CHIKUNGUNYA AND ZIKA

Background

The past few decades have been characterized by the emergence and geographic expansion of arboviruses, including dengue viruses (DENV), chikungunya virus (CHIKV) (17) and, more recently, Zika virus (ZIKV) (18). DENV is the most prevalent arbovirus globally, and has four serotypes (DENV-1, 2, 3 and 4). Infection with one serotype of DENV provides lifelong immunity against reinfection with that serotype. The first infection with DENV is usually subclinical or mild, but severe clinical complications including dengue shock syndrome can develop during subsequent infections with different serotypes. DENV is hyperendemic throughout the Americas, including in Haiti, meaning that multiple serotypes are co-circulating in the population. CHIKV arrived in Haiti in 2014 with a major outbreak (19), while ZIKV arrived one year later in 2015 (20). Serological evidence and surveillance data suggest that there are concurrent outbreaks of these three pathogens in Haiti and other American countries (21,22). All three pathogens are associated with similar symptoms during acute infection (23), which poses a problem for differential diagnosis and thus care for patients, as well as for public health surveillance (24).

Many afflicted areas are resource limited and are unable to perform diagnostic testing within an actionable time period for all patients who present with symptoms associated with arboviral infections, or are unable to perform diagnostic tests altogether. In such circumstances, data on demographic and clinical factors may provide additional information for differential diagnosis, treatment options and monitoring programs for possible complications. Furthermore, if a clinician
misdiagnoses a febrile patient as infected with CHIKV and treats the patient accordingly when the patient actually has DENV, this could increase the chances of the patient developing hemorrhagic or other DENV clinical complications (25,26). Thus, it is important for clinicians to be able to distinguish between suspected arboviral infections at point of care, without having to wait for confirmatory lab results.

Braga and colleagues evaluated multiple possible case definitions for ZIKV and developed their own predictive model to diagnose ZIKV in concurrent ZIKV-CHIKV-DENV epidemics (27). Their model creates a weighted symptom score that uses the presence of a maculopapular rash, a low grade fever, itching and conjunctival hyperemia, and the absence of anorexia and petechiae to predict ZIKV infection. While beneficial in the diagnosis of ZIKV, their model was not built to differentiate between CHIKV and DENV infections among those who tested negative for ZIKV. Here, we utilized both regression and machine learning techniques to test the hypothesis that clinical symptoms, along with patient demographic characteristics, can be used to predict and distinguish between PCR-confirmed arboviral infections in a cohort of children in Haiti.

Materials and Methods

Data

Data for this study come from a cohort of children and affiliates of that cohort, which is part of a clinical surveillance system established by the University of Florida near Gressier, Haiti. Details of the cohort, including participant recruitment and clinical data collection, have been reported elsewhere (28). A total of 268 school children who presented to the Christianville School clinic in the Gressier region of Haiti and reported experiencing a fever prior to the clinic visit from May 2014 to February 2015 had whole
blood samples drawn, which were then tested for DENV-1 through DENV-4, CHIKV and ZIKV via reverse transcriptase polymerase chain reaction (RT-PCR). The Haitian National Institutional Review Board and the Institutional Review Board at the University of Florida approved all study protocols. Written informed consent was obtained from parents or guardians, and assent was obtained from the children.

Analysis

Data cleaning and analysis were carried out in R version 3.2.1 (R Core Team 2014). Descriptive statistics and univariate tests were computed for the study infection status. Forward and backwards stepwise logistic regression models were employed for each pathogen-specific outcome in order to narrow down the list of candidate predictors for use in the three modeling approaches, including logistic regression, classification trees and random forests. The final model was selected when adding or removing selected variables that provided information (i.e. had non-zero, non-infinity point estimates and associated confidence intervals) were included in subsequent analysis.

Classification trees and random forest models were employed using the rpart (29) and randomForest (30) packages, respectively, to classify individuals into those having a virologically confirmed infection of each virus using clinical and demographic data. Random forests are ensembles of classification trees. Classification trees are hierarchical partitions of the bootstrapped sample data, based on random subsets of covariates; this allows for the identification of the most highly discriminative variables in the covariate set (31). Bootstrapped samples of the data are used to build and train the
random forest model, while the remaining observations act as an out-of-sample validation set. Each tree provides a classification for each observation, and the forest chooses the classification that has the most votes over all trees (31-33). In order to minimize bias in selecting variables to split on, each level of the categorical variables selected from the stepwise regression was fed into the decision tree and random forest models as a binary variable (34).

A total of five random forest models of 5,000 classification trees were employed where all covariates from the stepwise regression were tried at each split on the full data set. The first four models predicted PCR-confirmed infection with each of DENV-1, DENV-4, CHIKV and ZIKV. The fifth model used any PCR-confirmed DENV infection (DENV-1 or DENV-4) as the outcome. Variable importance was measured by calculating the percent increase in mean squared error (MSE) for the model performance if that variable were to be removed from the model.

A threshold analysis was then employed whereby the probability threshold for classification of cases was lowered from 50% to 9% in order to improve the likelihood of positive classification of cases, and thus the predictive performance of the models. At times, it is more pertinent for public health surveillance and clinical practice to sacrifice specificity for higher sensitivity.

Then, models were validated by training them on a randomly selected 66% of the data set and testing them against the remaining 34% using the forecast package (35). This procedure was repeated 100 times for each pathogen, so that each type of model for the same pathogen was trained and tested on the same data sets. Receiver operating characteristic (ROC) curves were generated for the average of the 100 replicates for each
model type for each pathogen outcome. Predictive statistics including area under the receiver operating characteristic curve (AUROC), sensitivity and specificity were generated.

Lastly, k-means clustering with k=2 through k=7 was applied to the data, in order to better observe how clinical symptoms cluster with the presence and absence of each different pathogen. The variables included were binary, so that centroid values closer to 1 indicate that individuals who were classified into that cluster more often than not had that symptom.

Results

Study Sample Characteristics

Population demographic characteristics by infection status are presented in Table 2-1. DENV subtypes 2 and 3 were not identified in the samples by RT-PCR. The mean age of the population was 7.8 years old (SD = 4.5 years). 54% of the 26 patients infected with DENV-1 were between 5 and 9 years old, while approximately 15% were under 5 years of age. 54% of the DENV-1 patients were male, and 80% were from the Lassalle school. Nearly 39% of these patients were in primary school, compared to 35% in kindergarten and 14% in secondary school.

Of the 35 DENV-4 patients, 46% were male, with 34% and 31% in the 0-4 years and 5-9 years age categories, respectively. 66% and 26% of these patients were from Lassalle and Tiboukan schools, respectively. Kindergarteners represented 40% of the DENV-4 patient population, compared to 31% in primary school and 26% in secondary school.

Among the 82 CHIKV patients, 48% were male and 15% were under five years old. 44% were aged 5-9 years, while 28% and 13% were aged 10-14 and 15+ years, respectively. 86% of CHIKV patients were from Lassalle, while 11% were from Tiboukan
schools. 25% were in kindergarten, compared to over 50% in primary school and 23% in secondary school.

50% of the eight ZIKV-positive patients were male, with 38% of patients under 5 years, and another 38% between 5 and 9 years old. 75% of the patients came from the Lassalle schools, while the remaining 25% came from Tiboukan. 50% of the patients were in kindergarten, while 25% were each in primary and secondary schools.

Clinical Symptoms by Pathogen

The proportion of positive cases of each pathogen with specific symptoms is presented in Table 2-2. Of note, 65% and 59% of the patients who tested positive for CHIKV reported experiencing arthralgia and myalgia, respectively. Three (37.5%) of the eight ZIKV cases reported experiencing arthralgia compared to four (14.3%) DENV-1 patients and zero DENV-4 patients. CHIKV patients experienced higher rates of various types of pain compared to patients infected with the other pathogens. The one departure from this trend was abdomen pain, where 32% and 40% of the DENV-1 and DENV-4 patients experienced abdomen pain, respectively, compared to 24% and 25% of the CHIKV and ZIKV patients in the cohort.

Multivariate Regression Results

Table 2-3 shows the results from the multivariate stepwise logistic regression. The blacked out cells indicate variables that were not selected in the final model. Of note, no one variable was included in every model, and sex and school were not included in any model. Increased age was significantly associated with odds of DENV-1 infection, with individuals aged 15 years and older having 5.5 times increased odds of PCR-confirmed infection compared to individuals aged 0-4 years old (95%CI: 1.13, 27.05), after controlling for all other covariates. Conversely, individuals aged 15 years
and older had significantly lower odds of DENV-4 infection compared to 0-4 year olds (OR=0.05, 95%CI: 0.002, 0.59). Age was not significantly associated with odds of ZIKV infection, and it was not selected for inclusion in the any DENV model or the CHIKV model. Grade was significantly associated with CHIKV infection and DENV-4 infection, with secondary school students having 4.41 (95%CI: 1.55, 12.76) and 20.54 (95%CI: 2.74, 195.55) times higher odds of infection compared to kindergarteners, respectively. Headache was a significant risk factor for any DENV (OR=2.38, 95%CI: 1.24, 4.55), but protective against CHIKV (OR=0.40, 95%CI: 0.15, 0.96). Conversely, arthralgia was significantly protective against any DENV (OR=0.30, 95%CI: 0.06, 0.99), but a significant risk factor for CHIKV (OR=21.59, 95%CI: 6.76, 87.19).

The variables selected from the stepwise regression were then used in decision tree and random forest models for all of the pathogens on the full data set. Of note, all random forest models had 0% sensitivity except for the CHIKV random forest, which had 65% sensitivity and 94% specificity (data not shown). Variable importance measures were extracted from the random forest models, and are presented in Table 2-4. Arthralgia was the most important symptom in the CHIKV random forest, with an associated mean percent increase in mean squared error of 140.5%, followed by inferior member pain, which was associated with a 48.7% increase in MSE.
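The odds ratios and confidence intervals reported in the regression results above relate to logistic regression coefficients through exponentiation. The following minimal Python sketch (Python is used here only for illustration; the study's analysis was carried out in R) assumes a Wald-type interval, exp(beta ± 1.96 × SE), which the text does not confirm, and the function names are hypothetical. It backs out the coefficient and standard error implied by the reported any DENV headache estimate (OR=2.38, 95%CI: 1.24, 4.55) and then rebuilds the interval.

```python
import math


def odds_ratio_ci(beta, se, z=1.96):
    """Return (OR, lower, upper) for a log-odds coefficient and its SE.

    Assumes a Wald-type interval on the log-odds scale (an assumption,
    not something the source states).
    """
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def implied_beta_se(lower, upper, z=1.96):
    """Back out the coefficient and SE implied by a Wald-type CI for an OR."""
    log_lo, log_hi = math.log(lower), math.log(upper)
    beta = (log_lo + log_hi) / 2.0       # midpoint on the log scale
    se = (log_hi - log_lo) / (2.0 * z)   # half-width divided by z
    return beta, se


# Reported estimate for headache in the any DENV model: OR=2.38 (1.24, 4.55)
beta, se = implied_beta_se(1.24, 4.55)
or_hat, lo, hi = odds_ratio_ci(beta, se)
print(round(or_hat, 2), round(lo, 2), round(hi, 2))
```

Under the Wald assumption the point estimate is the geometric mean of the two interval limits, which here rounds to the reported 2.38.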
Table 2-1. Demographics by pathogen

Variable        DENV-1 (N=26)  DENV-4 (N=35)  CHIKV (N=82)  ZIKV (N=8)  Overall Population (N=268)
Sex
  Male          14 (53.8%)     16 (45.7%)     39 (47.5%)    4 (50.0%)   136
  Female        11 (42.3%)     19 (54.3%)     43 (52.4%)    4 (50.0%)   128
  Unknown       1 (3.8%)       0 (0.0%)       0 (0.0%)      0 (0.0%)    4
Age Group
  0-4 yrs       4 (15.4%)      12 (34.3%)     12 (14.6%)    3 (37.5%)   82
  5-9 yrs       14 (53.8%)     11 (31.4%)     36 (43.9%)    3 (37.5%)   102
  10-14 yrs     3 (11.5%)      9 (25.7%)      23 (28.0%)    0 (0.0%)    54
  15+ yrs       4 (15.4%)      3 (8.6%)       11 (13.4%)    2 (25.0%)   26
  Unknown       1 (3.8%)       0 (0.0%)       0 (0.0%)      0 (0.0%)    4
School
  Lassalle      21 (80.1%)     23 (65.7%)     71 (86.6%)    6 (75.0%)   181
  Jean Jean     1 (3.8%)       0 (0.0%)       0 (0.0%)      0 (0.0%)    13
  Tiboukan      3 (11.5%)      9 (25.7%)      9 (11.0%)     2 (25.0%)   49
  Ticouzen      0 (0.0%)       2 (5.7%)       2 (2.4%)      0 (0.0%)    13
  Unknown       1 (3.8%)       1 (2.9%)       0 (0.0%)      0 (0.0%)    2
Grade
  Kinder        9 (34.6%)      14 (40.0%)     21 (25.6%)    4 (50.0%)   105
  Primary       10 (38.5%)     11 (31.4%)     41 (50.0%)    2 (25.0%)   95
  Secondary     4 (14.2%)      9 (25.7%)      20 (24.3%)    2 (25.0%)   41
  Unknown       3 (11.5%)      1 (2.9%)       0 (0.0%)      0 (0.0%)    31

Note: Pathogens treated independently, so those co-infected are counted twice.
Table 2-2. Prevalence of symptoms by pathogen

Symptom                              DENV-1 (N=26)  DENV-4 (N=35)  CHIKV (N=82)  ZIKV (N=8)
Temperature (Celsius), mean (SD)     38.2 (1.2)     37.5 (0.9)     37.6 (1.0)    37.3 (0.9)
Headache                             9 (34.6%)      14 (40.0%)     30 (36.6%)    2 (25.0%)
Myalgia                              6 (23.1%)      1 (2.9%)       48 (58.5%)    3 (37.5%)
Neck pain                            1 (3.8%)       0 (0.0%)       8 (9.8%)      1 (12.5%)
Phalanges pain                       1 (3.8%)       0 (0.0%)       3 (3.7%)      1 (12.5%)
Elbow pain                           1 (3.8%)       0 (0.0%)       13 (15.9%)    2 (25.0%)
Wrist pain                           1 (3.8%)       0 (0.0%)       23 (28.0%)    2 (25.0%)
Inferior member pain                 3 (11.5%)      0 (0.0%)       37 (45.5%)    3 (37.5%)
Knee pain                            1 (3.8%)       0 (0.0%)       27 (32.9%)    3 (37.5%)
Ankle pain                           1 (3.8%)       0 (0.0%)       18 (22.0%)    1 (12.5%)
Abdomen pain                         9 (34.6%)      17 (48.6%)     20 (24.4%)    2 (25.0%)
Arthralgia                           4 (15.4%)      0 (0.0%)       53 (64.6%)    3 (37.5%)

Note: Pathogens treated independently, so those co-infected are counted twice.

All variables in the DENV-1 random forest were associated with decreases in MSE, suggesting that these variables were not important predictors of DENV-1 infection. Attending secondary school and being aged 15 and older were the two most important variables in the DENV-4 model, with 36.2% and 31.3% increases in MSE, respectively. The any DENV model only included two variables, both of which had relatively low importance scores; headache was associated with a 7.6% increase in MSE, while arthralgia was associated with an 11.2% increase in MSE. Temperature was associated with an 11.7% increase in MSE in the ZIKV model and represented the most important predictor, followed by phalanges pain (6.5% increase in MSE).

CHIKV Predictive Models

A 0.5 threshold was used for the models of RT-PCR-confirmed CHIKV infection. The ROC curves for the three different CHIKV models are shown in Figure 2-1, and the average sensitivity, specificity and AUROC are reported in Table 2-5. The logistic
regression model had the exact same sensitivity as the random forest model, and its specificity was only 1 percentage point lower. The sensitivity for the decision tree model was a full 10 percentage points lower than those for the logistic regression and random forest models. The AUROCs for the logistic regression, classification tree and random forest models were 0.86, 0.82 and 0.86, respectively, for this outcome.

DENV-1 Predictive Models

A threshold of 0.09 was used for the models of RT-PCR-confirmed DENV-1 infection. The decision tree model was unable to distinguish between positive and negative infection status. The logistic regression model performed with 71% sensitivity (SD=0.06), 36% specificity (SD=0.05), and had an AUROC of 0.68 (SD=0.08). In comparison, the random forest model only performed with 59% sensitivity (SD=0.09) and 54% specificity (SD=0.02), with an AUROC of 0.64 (SD=0.10). The ROC curves for DENV-1 are shown in Figure 2-2, and the summary results for DENV-1 prediction metrics are presented in Table 2-6.

DENV-4 Predictive Models

Similar to the DENV-1 model, the DENV-4 models used a threshold of 0.09 in order to classify cases, and the decision tree model was unable to correctly classify any cases. The logistic regression DENV-4 model performed with 70% sensitivity (SD=0.06) and 44% specificity (SD=0.05), with an AUROC of 0.70 (SD=0.06). The random forest model performed marginally worse, with 68% sensitivity (SD=0.06), 48% specificity (SD=0.03), and an AUROC of 0.68 (SD=0.06) (Figure 2-3 and Table 2-7).
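The effect of lowering the classification threshold from 0.5 to 0.09, as used for the DENV models above, can be illustrated with a short Python sketch. This is not the study's R code; the predicted probabilities and labels below are hypothetical values invented for the example, and the function name is an assumption. The sketch shows how a lower threshold trades specificity for sensitivity when the positive class is rare and predicted probabilities are small.

```python
def sensitivity_specificity(probs, labels, threshold):
    """Classify as positive when prob >= threshold, then score against labels."""
    tp = fp = tn = fn = 0
    for p, y in zip(probs, labels):
        predicted_positive = p >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    # Guard against empty classes rather than dividing by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical model outputs for eight patients (1 = PCR-confirmed infection).
probs = [0.02, 0.05, 0.08, 0.10, 0.12, 0.20, 0.30, 0.60]
labels = [0, 0, 0, 0, 1, 0, 1, 1]

at_050 = sensitivity_specificity(probs, labels, 0.50)
at_009 = sensitivity_specificity(probs, labels, 0.09)
print("threshold 0.50:", at_050)
print("threshold 0.09:", at_009)
```

In this toy data set the 0.50 threshold detects one of three cases with no false positives, while the 0.09 threshold detects all three cases at the cost of two false positives.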
Table 2-3. Multivariate stepwise logistic regression for each outcome on full dataset

CHIKV DENV-1 DENV-4 Any DENV ZIKV
Variable OR 95% CI OR 95% CI OR 95% CI OR 95% CI OR 95% CI
Age
  0-4 yrs REF REF REF REF REF REF
  5-9 yrs 3.58 (1.18, 13.42) 0.58 (0.12, 2.17) 0.85 (0.11, 7.27)
  10-14 yrs 1.16 (0.21, 5.61) 0.34 (0.04, 2.00)
  15+ yrs 5.49 (1.13, 27.05) 0.05 (0.003, 0.59) 3.2 (0.34, 30.98)
School
  Lassalle
  Jean Jean
  Tiboukan
  Ticouzen
  Unknown School
Grade
  Kinder REF REF REF REF
  Primary 2.05 (0.88, 4.79) 2.65 (0.65, 14.07)
  Secondary 4.41 (1.55, 12.76) 20.54 (2.74, 194.55)
  Unknown Grade 0.42 (0.02, 2.69)
Sex
  Male
  Female
  Unknown Sex
Table 2-3. Continued

CHIKV DENV-1 DENV-4 Any DENV ZIKV
Variable OR 95% CI OR 95% CI OR 95% CI OR 95% CI OR 95% CI
Clinical
  headache 0.4 (0.15, 0.96) 2.38 (1.24, 4.55)
  myalgia
  neck pain 35.58 (0.49, 4084)
  phalanges pain 0.04 (0.00, 0.58) INF (0, INF) INF (0, INF)
  ankle pain 0.05 (0, 1.72)
  wrist pain 7.43 (0.74, 110.86) 0 (0, INF) 0 (0, INF)
  knee pain 0 (0, INF) 0 (0, INF)
  elbow pain 0.13 (0.01, 1.25) INF (0, INF) INF (0, INF)
  abdominal pain 1.82 (0.81, 4.22)
  inferior member pain 4.09 (0.71, 24.80) 10.97 (1.29, 79.05)
  arthralgia pain 21.59 (6.76, 87.19) 0.3 (0.06, 0.99)
  temperature 0.63 (0.41, 0.94) 1.67 (1.14, 2.47) 0.44 (0.15, 1.06)
Table 2-4. Variable importance measures from random forest models on full data set

CHIKV DENV-1 DENV-4 Any DENV ZIKV
Variable % Inc MSE % Inc MSE % Inc MSE % Inc MSE % Inc MSE
Age 0 4 yrs 1.14 1.98 5 9 yrs 2 1.32 2.69 10 14 yrs 4.68 8.5 15+ yrs 31.27 4.73 School Lassalle Jean Jean Tiboukan Ticouzen Unknown School Grade Kinder 15.36 13.19 Primary 9.76 19.46 Secondary 12.52 36.22 Unknown Grade 40.43 17.5 Sex Male Female Unknown Sex Clinical headache 0.15 7.63 myalgia neck pain 0.12 phalanges pain 25.34 25 ankle pain 6.51 wrist pain 40.38 9.21 knee pain 20.3 elbow pain 7.11 20.38 abdominal pain 4.39 inferior member pain 48.65 4.92 arthralgia pain 140.48 23.65 11.18 temperature 35.46 5.6 11.74
Figure 2-1. ROC Curve for CHIKV Outcome with 0.50 Threshold

Table 2-5. Mean and Standard Deviation of Sensitivity, Specificity and AUROC in Validation for CHIKV Models with 0.50 Threshold

CHIKV Models
Model, Mean (SD)       Sensitivity    Specificity    AUROC
Logistic Regression    0.72 (0.04)    0.62 (0.03)    0.86 (0.04)
Decision Tree          0.62 (0.07)    0.63 (0.05)    0.82 (0.05)
Random Forest          0.72 (0.04)    0.63 (0.02)    0.86 (0.04)
Figure 2-2. ROC Curves for DENV-1 Outcome with 0.09 Threshold

Table 2-6. Mean and Standard Deviation of Sensitivity, Specificity and AUROC in Validation for DENV-1 Models with 0.09 Threshold

DENV-1 Models
Model, Mean (SD)       Sensitivity    Specificity    AUROC
Logistic Regression    0.71 (0.06)    0.36 (0.05)    0.68 (0.08)
Decision Tree          NA             NA             NA
Random Forest          0.59 (0.09)    0.54 (0.02)    0.64 (0.10)
Figure 2-3. ROC Curves for DENV-4 Outcome with 0.09 Threshold

Table 2-7. Mean and Standard Deviation of Sensitivity, Specificity and AUROC in Validation for DENV-4 Models with 0.09 Threshold

DENV-4 Models
Model, Mean (SD)       Sensitivity    Specificity    AUROC
Logistic Regression    0.70 (0.06)    0.44 (0.05)    0.70 (0.06)
Decision Tree          NA             NA             NA
Random Forest          0.68 (0.06)    0.48 (0.03)    0.68 (0.06)

Results from the k-means cluster analysis for all symptoms with k=6 are shown in Figure 2-4, and the cluster assignments by participant infection status are presented in Table 2-8. Each colored line represents a cluster, with the Y axis representing the mean value of the symptoms among participants grouped in that cluster. The black cluster (cluster 1) has all symptoms except phalanges pain, but nearly 100% of study participants in this cluster exhibited arthralgia, myalgia, headache, ankle pain and
inferior member pain (Figure 2-4). All 10 patients in cluster 1 tested positive for CHIKV (Table 2-8), though that only represents approximately 1/8 of all CHIKV cases. Cluster 2 (red) contained 20 CHIKV cases and 3 DENV-1 cases. Myalgia and arthralgia were prevalent in cluster 2 (~80% each), but to a lesser extent than in cluster 1, and approximately one fifth of patients in cluster 2 experienced inferior member pain and abdominal pain. Clusters 3, 4 and 6 contained a few patients from multiple disease outcome categories, with none dominating the cluster. The purple cluster (cluster 5), however, contained 106 uninfected patients, 17 DENV-1 cases, 27 DENV-4 cases, 23 CHIKV cases and at least one in the remaining disease outcome categories. This cluster was characterized by nearly all of the cluster members experiencing both headache and abdominal pain, with almost every other symptom absent.

Table 2-8. Pathogen outcome by cluster from k-means clustering of symptoms with k=6

Outcome                   Cluster 1  Cluster 2  Cluster 3  Cluster 4  Cluster 5  Cluster 6
                          (black)    (red)      (blue)     (green)    (purple)   (orange)
No Infection              0          5          3          2          106        7
ZIKV Only                 0          0          0          0          2          0
CHIKV Only                10         20         8          12         23         4
CHIKV-ZIKV Coinfection    0          0          1          2          2          0
DENV-4 Only               0          0          0          0          27         7
DENV-4-ZIKV Coinfection   0          0          0          0          1          0
DENV-1 Only               0          3          1          1          17         4
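Because the clustered variables are binary, each centroid coordinate is the proportion of cluster members reporting that symptom, which is how the cluster profiles above are read. The following Python sketch illustrates this with a plain implementation of Lloyd's algorithm; it is not the study's R code, the six patient records are hypothetical, k=2 is used only to keep the example small, and the starting centroids are fixed by hand so the result is deterministic.

```python
def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm with caller-supplied starting centroids."""
    centroids = [list(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(
                range(len(centroids)),
                key=lambda j: sum((x - c) ** 2 for x, c in zip(p, centroids[j])),
            )
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(old)  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return labels, centroids


# Hypothetical binary symptom records (1 = symptom reported).
symptoms = ["arthralgia", "myalgia", "headache", "abdominal_pain"]
patients = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
]

labels, centroids = kmeans(patients, [patients[0], patients[3]])
for j, centroid in enumerate(centroids):
    profile = {name: round(value, 2) for name, value in zip(symptoms, centroid)}
    print("cluster", j, profile)
```

In this toy example the first centroid has arthralgia and myalgia at 1.0 and abdominal pain at 0.0, so every member of that cluster reported joint and muscle pain and none reported abdominal pain; the second cluster shows the reverse pattern.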
Figure 2-4. k-means clustering of symptoms shown by patients, with k=6
Discussion

The correct diagnosis of CHIKV and other arboviruses is vital to ensuring patient safety and proper medical follow-up. In particular, it is important to distinguish between CHIKV and other arboviruses, because treatment of a DENV patient with the pain killer used to treat CHIKV could result in severe clinical complications (26). In this study, we fit and validated regression and machine learning models to clinical symptom and demographic data to predict PCR-confirmed arboviral infections among 262 unique febrile school children in Haiti. We further examined how patients with the various arboviral infection outcomes clustered based on their clinical symptoms.

Stepwise Regression for Variable Selection

In order to build parsimonious predictive models, we used forward and backwards stepwise logistic regression, and selected the final covariates to include based on AIC. Sex and school were not selected for inclusion in any of the models. This school-related result is in contrast to prior studies that have found vector-borne diseases to cluster in relatively small spatial scales (36-39). This finding could be due to the small sample size of the data, as well as how the data were aggregated at the end of the season.

Arthralgia and attending secondary school were the two statistically significant risk factors for CHIKV infection, with ORs of 21.6 and 4.4, respectively. Arthralgia is a well-established clinical sign of CHIKV infection (24,40). Additional common clinical sequelae of CHIKV infection include joint pains such as in the elbows, wrists, knees and fingers (40), though none of these were found to be significantly associated with the odds of CHIKV infection in this study.
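The AIC criterion used for covariate selection in the stepwise regression discussed above is AIC = 2k - 2 ln(L), where k is the number of estimated parameters and L is the maximized likelihood; the model with the lowest AIC is preferred. The Python sketch below compares three nested candidate models. It is illustrative only: the log-likelihood values and model labels are hypothetical and are not taken from the study.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: lower values indicate a better trade-off."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical candidates: (maximized log-likelihood, number of parameters).
candidates = {
    "intercept only": (-150.0, 1),
    "+ arthralgia": (-120.0, 2),
    "+ arthralgia + sex": (-119.8, 3),
}

scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)

for name, score in scores.items():
    print(name, round(score, 1))
print("selected:", best)
```

Here adding the second covariate raises the log-likelihood by only 0.2, which does not offset the penalty of 2 for the extra parameter, so the two-parameter model is selected. This mirrors how a covariate such as sex can be dropped by an AIC-based stepwise procedure.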
Interestingly, age was significantly associated with increased odds of DENV-1 infection, but decreased odds of DENV-4 infection. Students aged 15 years and older had 5.5 times the odds of DENV-1 infection compared to students aged 0-4 years, while the same age category exhibited 95% lower odds of DENV-4 infection compared to the same reference group. Primary infection with one strain of DENV is typically subclinical or asymptomatic and provides lifelong immunity against that serotype. However, subsequent infection by a second serotype has been associated with increased risk of developing severe dengue disease (41). Older individuals are likely to have had multiple DENV infections, and thus an increased chance of severe clinical illness resulting in health-seeking behavior. The DENV-4 finding is in contrast to this, but may be the result of the relatively small sample size (N=35).

In the any DENV stepwise regression model, headache was found to be a significant risk factor (OR=2.38), while arthralgia was a borderline significant protective factor (OR=0.30) for infection. Severe headache is a known risk factor for DENV infection, while arthralgia is less common (24,42).

Full Scope Modeling of CHIKV, DENV-1 and DENV-4 To Aid Differential Diagnosis

The CHIKV models were able to predict RT-PCR-confirmed infection with 72% sensitivity and 63% specificity at a 0.5 threshold. Arthralgia proved to be the most important predictor of PCR-confirmed CHIKV infection, with a 140.5% increase in MSE. Inferior member pain and wrist pain were associated with 48.7% and 40.4% increases in MSE, respectively, which is consistent with established clinical signs of CHIKV (40).
Performance of the Predictive Models for CHIKV

The logistic regression, decision tree and random forest models for the CHIKV, DENV-1 and DENV-4 outcomes were then trained on 100 samples of 66% of the original data set and validated against the remaining 34%. Interestingly, the logistic regression model performed on par with or better than the random forest model for all three pathogen outcomes. For CHIKV, both models had an average 72% sensitivity, and specificity in the low 60% range. Furthermore, the logistic regression models employed on the DENV-1 and DENV-4 outcomes with a 0.09 probability threshold outperformed the random forest models in AUROC by 4 and 2 percentage points for DENV-1 and DENV-4, respectively. This suggests that non arbovirus infection status from clinical symptoms.

By further examining the association between clinical symptoms and disease status through k-means clustering, we found that CHIKV cases were spread out across the six clusters, but that they specifically dominated clusters 1 (black) and 2 (red). The black cluster contained patients who experienced high rates of myalgia, arthralgia, inferior member pain, ankle pain and headache, and none of whom had phalanges pain. The red cluster had high rates of arthralgia and myalgia (approximately 80%), with relatively low rates of abdominal pain, headache and inferior member pain. These two clusters could represent a more severe and a less severe clinical presentation of CHIKV infection, respectively. The purple cluster (cluster 5) contained over 40% of all observations and experienced very high rates of abdominal pain and headaches, in the absence of almost every other symptom. It is possible that this result could reflect that those febrile
patients had a gastrointestinal illness or were infected by another pathogen that was not tested here.

It is very important to be able to distinguish between CHIKV infections and non-CHIKV infections, since the pain relief drug of choice for CHIKV, if given to a patient with DENV, could result in an increased likelihood of hemorrhagic symptoms (25). We were similarly able to successfully predict DENV-1 and DENV-4 infection, which could further improve accuracy of differential diagnoses and prescribing of proper pain relief drugs. DENV infection could result in potentially life-threatening clinical illness such as dengue shock syndrome. DENV patients should be carefully monitored in the short term in order to prevent severe outcomes and mortality. CHIKV and ZIKV patients need longer term follow-up for chronic sequelae such as arthritis for CHIKV and Guillain-Barre syndrome for ZIKV. Women of childbearing age should also be monitored due to the association between ZIKV infection in pregnant women and microcephaly in their offspring.

Limitations

This study is subject to a number of limitations. First, the data used in this study come from 268 school-aged children in the Gressier region of Haiti. Care should be used when generalizing the results to other areas. Second, data used in this analysis were collected through a syndromic surveillance program whereby febrile children who presented to a clinic had blood drawn and tested for arboviral infections. The probability of case detection was, in part, dependent on the prevalence, transmission rate and within-host replication rate of each of the arboviruses. Additionally, the potential cross-reactivity of the PCR assays could have resulted in misclassification or false positives.
Furthermore, the clinical symptom data used in this analysis were not collected in a questionnaire format. Rather, upon presenting to the clinic, whichever symptoms patients volunteered were recorded by the attending clinician. Thus, it is possible that less severe or (perceived severe) symptoms were not provided by the patient and thus incorrectly and symptoms of CHIKV, ZIKV, and DENV were not collected. Due to the relatively small sample sizes, full scale modeling analysis could not be conducted for ZIKV and DENV. Last, the data are from one region of Haiti. Thus, care should be taken when generalizing these findings to other age groups, and to other areas affected by coincident epidemics of CHIKV, DENV and ZIKV. This study should be repeated using additional seasons of data, from geographically diverse areas and in both child and adult populations.

Despite these limitations, this study was able to distinguish between individuals with and without CHIKV, DENV-1 and DENV-4 infection, suggesting that there may be clinical signatures unique enough to these pathogens, and that predictive models could be useful in aiding clinicians in differential diagnosis of arboviruses among patients with acute febrile illness. However, there is still a marked need for improved field diagnostic capacity such as rapid diagnostic tests, particularly in rural and underserved contexts.
CHAPTER 3
ESTIMATING THE CONTRIBUTION OF PRIOR INFECTION WITH SPECIFIC RESPIRATORY VIRUSES ON WITHIN SEASON VIRAL INFECTION HAZARDS

Background

Pathogen abundance and co-circulation can depend on competitive or facilitative interactions between pathogens. Competition between pathogens may be mediated by immune responses in hosts (43). Immune-mediated interactions may also be facilitative, where infection by one pathogen reduces immune responses that may fend off other pathogens, thus reducing the hurdle for subsequent pathogens to mount a productive infection. Pathogen-induced mortality can regulate the number of infectious hosts (44,45). Pathogen-induced morbidity can also reduce interactions that can lead to acquisition of new pathogens.

Immunological Mechanisms

Immune response kinetics are relevant to population-level transmission dynamics due to their effect on the duration of infectiousness, the likelihood of seeking care/treatment, the magnitude of infectiousness, and other key parameters used in transmission models (46,47). The adaptive immune response can alter host susceptibility to future infections from the same pathogen, as well as from antigenically similar pathogens (cross-protection). Conversely, immune enhancement can occur, as is the case with dengue virus, where infection with one serotype provides lifelong immunity against that serotype, but subsequent infection from a different serotype can result in more severe clinical disease (48). Furthermore, the adaptive immune response can take a variable amount of time post infection to develop. This acquired immunity can both influence pathogen competition by decreasing the susceptibility of the host to re-infection with the same or antigenically similar pathogen, thus allowing antigenically
different strains or other pathogens to more successfully compete (44,49), as well as influence the probability of case detection either through enhancement, or through cross-protection. In addition, the heightened innate immune response that follows seasonal rhinovirus infection has been suggested to have a protective effect against infection for a short period after recovery (refractory period) (50). This temporary immunity against further infection may impact the success of other pathogens at infecting the host. A better understanding of how specific infections influence risk of subsequent infection is needed in order to improve surveillance efforts and interventions.

Respiratory Diseases

Many viral pathogens can cause respiratory illness including fever, cough, sore throat, runny or stuffy nose, myalgia (muscle aches), fatigue, and headache. Together, this complex of symptoms is often termed influenza-like illness (ILI). Viruses such as influenza virus types A and B (Flu A and Flu B), respiratory syncytial virus types A and B (RSVA and RSVB), human rhinoviruses (HRV), human coronaviruses (CoV), human metapneumovirus (HMPV) and parainfluenza viruses (PIV) are associated with ILI (51). Despite causing similar clinical illness, these viruses are genetically distinct. Flu A and Flu B constitute two different genera of the Orthomyxoviridae family and have different evolutionary histories (52). PIV is a member of the Paramyxoviridae family, and its genome only codes for 6 proteins (53) compared to the 11 in the Flu A and B genome (54). HRV is a member of the Picornaviridae family, while HMPV and RSV (both A and B) are members of Pneumoviridae and CoV is a member of Coronaviridae. One would expect more antigenic similarity among viruses within the same genera and families (e.g. Flu A and
Flu B), which could result in possible antibody-mediated cross-protection or in enhancement. Conversely, there is more antigenic dissimilarity between viruses of different families, which may be expected to result only in non-specific, waning immunity during the refractory period following infection, and not in antibody-mediated cross-protection or enhancement. All of these viruses can exhibit overlapping seasonality and thus may be co-circulating during respiratory disease season. As a result, their ecologies may be intertwined due to complex host immune responses and competition dynamics (55,56).

Not much is known about the within-season ecology and epidemiology of co-circulating respiratory viruses. One study showed that A/H3N2 infection was a protective factor against subsequent infection by A/H1N1 during the same influenza season (57). Cowling and colleagues similarly found that infection by seasonal influenza was protective against laboratory-confirmed pandemic A/H1N1 infection (58). Additionally, a time series analysis showed that Flu A subtypes H3N2 and H1N1 produce cross-immunity against Flu B (59). Conversely, respiratory virus infections, in particular influenza virus and RSV infections, have been shown to be associated with secondary infections such as bacterial pneumonia, suggesting that pathogenesis from the initial infection could increase susceptibility to infection with a second pathogen (60-62).

The US Centers for Disease Control and Prevention (CDC) estimates that there are between 9.2 and 35.6 million influenza infections in the United States annually, resulting in between 140,000 and 710,000 hospitalizations and between 12,000 and 56,000 deaths (63). Additionally, a survey of 4,000
households nationwide estimated the economic impact of non-influenza viral respiratory infections to be approximately $40 billion per year (64).

To improve the public health response to the threat of respiratory diseases, forecasting challenges for influenza and ILI have been developed by the US Centers for Disease Control and Prevention (CDC) and other agencies with the goal of predicting the timing of the start of the epidemic, the timing and magnitude of the epidemic peak, and the cumulative number of cases. A better understanding of the relationship between within-season infection history and infection risk for specific viruses could not only provide insight into targeted surveillance strategies and interventions, but also has the potential to improve forecasting efforts.

In this study, we describe the occurrence of respiratory illness and its etiology in a cohort of school-aged children in Allegheny County, Pennsylvania, US. We fit survival models to PCR-confirmed viral respiratory infections in order to quantify how prior ILI events and prior infections with specific respiratory viruses influence the hazards of infection for other respiratory viruses in the same respiratory disease season.

Materials and Methods

The data used in this analysis come from the Social Mixing and Respiratory Transmission (SMART) Schools Study, which examined ILI-associated absenteeism in nine schools, covering grades K-12, from two school districts between December 2012 and April 2013. Kindergarten had both half-day students (K-H) and full-day students (K). A total of 3,021 students in the two school districts were included in an absenteeism-based ILI surveillance program unless they specifically opted out. Further details on the surveillance system are described below.
that is, any swab that was taken from an ILI case, regardless of PCR outcomes, and then for the four virus infection outcomes with the most PCR-confirmed infections: Flu B, Flu A, HRV and CoV.

Surveillance System

Surveillance began on December 16, 2012 for students in the BHE, BHHS, CEC I and ME schools (a few days prior to the winter holiday break), and on January 3, 2013 in the BME, FSE, MK, SCE and WVE schools. For all schools, the end of the surveillance period was defined as April 25, 2013. When a child was absent from school, his or her parents were contacted to ascertain whether the child met the case definition for ILI: fever of at least 37.8 °C and either cough or sore throat. A child who met the case definition had a nasal swab taken and was asked when he or she first experienced symptoms. A polymerase chain reaction (PCR) multiplex assay [eSensor XT-8 instrument, RVP RUO panel (Luminex, Austin, TX)] was then used to characterize which virus or viruses were the etiologic agent(s) associated with the ILI. The PCR multiplex was capable of detecting infection with 15 different respiratory viruses, including influenza A (Flu A), influenza B (Flu B), respiratory syncytial virus A (RSV A), respiratory syncytial virus B (RSV B), human coronaviruses (HCoV), human metapneumovirus (HMPV), adenoviruses (Adeno) and human rhinoviruses (HRV).

Data Setup

We conducted separate survival analyses in which detection of Flu B, Flu A, HRV and CoV were the outcomes of interest. Time-varying covariates were used to indicate previously having had a swab, and infection with viruses besides the pathogen
that defined the outcome (defined as detection of a pathogen by PCR in a swab obtained upon illness). The date the swab was taken was used to define the time at which these covariates changed status. All students started with time-varying covariate status indicating no prior infection for all covariates. Start time for all students was the start of the surveillance period. For students who did not have any PCR-confirmed infections, the end of the surveillance period was denoted as the stop time and the student was considered right-censored. If a pathogen other than the one of interest was detected in a swab, a new risk period began for that student with the time-varying covariate status updated. If the date the swab was taken ... When more than one pathogen was detected in a single swab, the event was treated as a co-detection, and the time-varying covariate status for both pathogens changed at the date of swab. If one of the co-detected pathogens was the outcome of interest, no person-time with prior infection with the other co-detected pathogen was counted. The non-time-varying covariates attributed to each student were his or her self-reported gender, grade and school.

Analysis

... providing a nasal swab given that the participant had a prior swab taken, regardless of specific prior infections, and prior pathogen-specific ILI. The three infections with the
order to both maximize statistical power and avoid potential bias in searching only for positive interactions.

Incidence rate ratio (IRR) estimates were calculated for prior infection with each pathogen on each outcome by Equation 3-1:

IRR_E = \frac{E_{tvc=1} / T_{tvc=1}}{E_{tvc=0} / T_{tvc=0}}    (3-1)

IRR_E denotes the IRR estimate under an exponential waiting time assumption. E_{tvc=1} and T_{tvc=1} are the number of events and the sum of all person-time with time-varying covariate status equal to one regardless of infection outcome, respectively. Similarly, E_{tvc=0} and T_{tvc=0} are the number of events with time-varying covariate status equal to zero and the sum of all person-time at risk of an event with time-varying covariate status equal to zero, respectively. 95% confidence intervals (CIs) were generated using the method described by Rothman et al. (65).

Univariate Cox proportional hazards models were run using covariates including: time-varying binary variables indicating having had a swab due to prior ILI regardless of outcome, prior infection with each specific respiratory virus, gender, school and grade. Multivariate mixed-effects Cox models were then used with school and grade as random-effects terms, and with gender and prior infection variables as fixed effects, in order to estimate hazard ratios (HRs) and 95% CIs for each outcome. School and grade were ... variables with regard to the goal of the present study. All data cleaning and analysis were performed in R using the survival (66) and coxme (67) packages.
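As a rough illustration of the incidence rate ratio described above, the following Python sketch splits each student's follow-up at the date of a prior swab, sums events and person-time by time-varying covariate status, and takes the ratio of the two rates. It is a minimal sketch under stated assumptions, not the analysis code used in this study (which was written in R): the record layout, the function names, and the toy data are hypothetical, and the log-scale Wald interval is a common textbook choice assumed here for illustration, since the text only cites Rothman et al. for the interval method.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Student:
    """Hypothetical follow-up record for one student and one outcome."""
    start: float                            # day surveillance began
    stop: float                             # day of the outcome event or censoring
    event: bool                             # True if the outcome was detected at `stop`
    prior_swab_day: Optional[float] = None  # day the covariate switches from 0 to 1


def split_person_time(s):
    """Return (T0, T1, E0, E1): person-time and events by covariate status."""
    # No prior swab, or a swab on the outcome day itself (co-detection):
    # all person-time and any event are counted with covariate status 0.
    if s.prior_swab_day is None or s.prior_swab_day >= s.stop:
        return s.stop - s.start, 0.0, int(s.event), 0
    switch = max(s.prior_swab_day, s.start)
    return switch - s.start, s.stop - switch, 0, int(s.event)


def irr_exponential(students, z=1.96):
    """Rate ratio (status 1 vs 0) with an assumed log-scale Wald interval."""
    t0 = t1 = 0.0
    e0 = e1 = 0
    for s in students:
        a, b, c, d = split_person_time(s)
        t0, t1, e0, e1 = t0 + a, t1 + b, e0 + c, e1 + d
    if e0 == 0 or e1 == 0 or t0 == 0 or t1 == 0:
        raise ValueError("rate ratio undefined: empty stratum")
    irr = (e1 / t1) / (e0 / t0)
    se_log = math.sqrt(1.0 / e1 + 1.0 / e0)  # assumption: Wald SE of ln(IRR)
    return irr, irr * math.exp(-z * se_log), irr * math.exp(z * se_log)


# Toy data (hypothetical): days since the start of surveillance.
students = [
    Student(0, 100, False),                    # censored, never swabbed
    Student(0, 50, True),                      # outcome with no prior swab
    Student(0, 80, True, prior_swab_day=30),   # outcome after a prior swab
    Student(0, 100, False, prior_swab_day=40), # prior swab, then censored
    Student(0, 60, True, prior_swab_day=60),   # co-detected on the outcome day
]
irr, lower, upper = irr_exponential(students)
print(f"IRR = {irr:.2f} (95% CI: {lower:.2f}, {upper:.2f})")
```

In this toy example the last student contributes no person-time with covariate status equal to one, which mirrors the co-detection rule described in the Data Setup section.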
Results

Pathogens

There were a total of 341 swabs taken over the course of the surveillance period. A total of 266 students had at least one PCR-confirmed viral infection (Table 3-1), and 29 students had a second PCR-confirmed infection during the surveillance period, with 12 of those representing confirmed Flu B infections.

Demographics

Study population demographics by PCR-confirmed infection status are presented in Table 3-1. There were a total of 266 PCR-confirmed infections during the surveillance period, with 151 (57%) of those infections occurring in males. Males represented 56% of the total cohort. Forty-one (15.4%) confirmed respiratory infections occurred in each of the BME and WVE schools, and 32 (12.0%) occurred in the ME school. In contrast, BME, WVE, and ME represented 10.7%, 8.0% and 7.7% of the total cohort, respectively. Seventy percent of PCR-confirmed infections occurred in primary school children (grades 1-4). Fifty-one (19.2%) PCR-confirmed infections were observed in each of the first and second grades, while first and second graders made up only 11.5% and 12.3% of the non-infected student population, respectively. Kindergarteners (K-H and K) made up 8.7% of the PCR-infected students, but over 12% of the total cohort. Similarly, while fifth graders made up 7.5% of the infected student population, they represented over 16% of the cohort.
Table 3-1. Demographics of Study Population by Infection Status at End of Surveillance Period

            One or More PCR-Positive     All Cohort Members
            Infections (N=266), N (%)    (N=3021), N (%)
Gender
  Male      151 (56.8%)                  1572 (52.0%)
  Female    115 (43.2%)                  1449 (47.9%)
School
  BHE       25 (9.4%)                    310 (10.3%)
  BHHS      12 (4.5%)                    258 (8.5%)
  BME       41 (15.4%)                   302 (10.0%)
  ME        32 (12.0%)                   218 (7.2%)
  CIS       30 (11.3%)                   569 (18.8%)
  SCE       40 (15.0%)                   451 (14.9%)
  FSE       24 (9.0%)                    300 (9.9%)
  MK        21 (7.9%)                    387 (12.8%)
  WVE       41 (15.4%)                   226 (7.5%)
Grade
  KG        1 (0.4%)                     40 (1.3%)
  K         22 (8.3%)                    322 (10.7%)
  1         51 (19.2%)                   368 (12.2%)
  2         51 (19.2%)                   389 (12.8%)
  3         43 (16.2%)                   405 (13.4%)
  4         41 (15.4%)                   404 (13.4%)
  5         20 (7.5%)                    456 (15.1%)
  6         19 (7.1%)                    293 (9.7%)
  7         3 (1.1%)                     47 (1.6%)
  8         3 (1.1%)                     38 (1.3%)
  9         5 (1.9%)                     89 (2.9%)
  10        3 (1.1%)                     90 (3.0%)
  11        4 (1.5%)                     67 (2.2%)
  12        0 (0.0%)                     12 (0.4%)

The breakdown of pathogen detection by swab number is presented in Table 3-2. Of the 341 unique infections, 127 (37%) were positive for Flu B. Of those, 12 occurred on the second infection. Flu A made up 57 (17%) of the total PCR-confirmed ... confirmed infections were CoV, with 6 of them (10% of all CoV infections) occurring in a
sw ... Figures 3-1, 3-2, 3-3 and 3-4 are heat maps that show the daily number of swabs that were positive for each type of pathogen by number of swab and co-detection events.

Table 3-2. PCR-Confirmed Infection by Swab Number

              Swab/Infection Number
Pathogen      First    Second    Total Positive Swabs
Flu A         57       0         57
Flu B         115      12        127
RSV A         22       2         24
RSV B         4        2         6
CoV           55       6         61
Adenovirus    6        0         6
HMPV          3        0         3
HRV           46       6         52
PIV           4        1         5
Total         312      29        341

Figure 3-1. Heat map of swab timing, order and co-detection status for Flu B
Figure 3-2. Heat map of swab timing, order and co-detection status for Flu A

Figure 3-3. Heat map of swab timing, order and co-detection status for HRV
Figure 3-4. Heat map of swab timing, order and co-detection status for CoV

IRR_E and Univariate Hazard Estimates for Prior ILI, Prior Infections and Demographic Variables

IRR_E estimates from Equation 3-1 for the time-varying previous infection covariates are presented in Table 3-3, along with univariate Cox regression results for all covariates for each of the outcomes of interest. There were no statistically significant associations between gender and Flu B, Flu A, HRV or CoV infection in the univariate Cox models. School and grade are not discussed below because they were treated as random effects in the multivariate analysis, though HR estimates for these covariates are presented in Table 3-3.

Influenza B (Flu B) Outcome

Prior ILI was associated with a 4.3 times increased incidence of Flu B infection (95% CI: 3.71, 4.90) from the IRR_E estimation, and 2.3 (95% CI: 1.3, 4.1) times
increased hazard of Flu B infection compared to those who did not have prior ILI from the univariate Cox model. Similarly, prior infection with Flu A was associated with a significantly increased risk of Flu B infection in both the IRR_E estimate (IRR_E = 3.2, 95% CI: 2.4, 4.1) and the Cox model (HR = 2.8, 95% CI: 1.2, 6.40). Prior HMPV infection was also associated with a statistically significant increased hazard of Flu B, with an IRR_E = 20.8 and HR = 20.8 (95% CI: 2.90, 148.90). HRV, RSVA and CoV infections also occurred prior to, and were risk factors for, Flu B infection, but the associations were not statistically significant.

Influenza A (Flu A) Outcome

Flu A was never detected in a second swab (Table 3-2). None of the prior infection variables were significantly associated with subsequent Flu A infection in either the IRR_E calculation or the Cox model (Table 3-3).

Human Rhinovirus (HRV) Outcome

Any prior ILI was associated with a 42.3 times increased hazard of HRV infection from the IRR calculation, and a 43.9 times increased hazard in the Cox model (95% CI: 20.87, 92.24). Prior infections with Flu A and CoV were similarly statistically significantly associated with increased hazard of HRV infection. In the IRR calculation, prior Flu A infection was estimated to increase the hazard of HRV infection by 2.7 times, and by 4.3 times (95% CI: 1.34, 13.84) in the Cox model. Likewise, CoV infection was associated with a 5.6 times and 5.1 times (95% CI: 1.58, 16.50) increased hazard of HRV infection in the IRR calculation and Cox model, respectively.
Coronavirus (CoV) Outcome

Prior ILI was associated with a 41.6 times increased hazard of CoV through the IRR calculation, and a 31.2 times increased hazard in the Cox model (95% CI: 14.65, 66.47). Additionally, prior HMPV infection was associated with a 44.7 times and 70.1 times (95% CI: 9.55, 514.51) increased hazard in the IRR calculation and Cox model, respectively.

Multivariate, Mixed Effects Model

Multivariate, mixed-effects Cox models were then run with sex and infection history as fixed effects, and school and grade as random effects (Table 3-4). Of note, no prior infection had a significant protective effect in any of the models. A summary of model-specific results is presented below.

Influenza B (Flu B) outcome

Prior infection with Flu A remained a statistically significant risk factor for Flu B infection (HR = 2.7, 95% CI: 1.87, 3.55) after controlling for all covariates. Prior HMPV infection similarly remained a statistically significant risk factor for Flu B infection in the multivariate model, but its effect size was substantially attenuated (HR = 10.1, 95% CI: 8.03, 12.07). Prior infection with CoV, RSVA and HRV, as well as gender, remained non-significantly associated with Flu B infection.

Influenza A (Flu A) outcome

None of the prior infection variables were statistically significantly associated with the hazard of Flu A infection. Similarly, gender was not statistically significantly associated with Flu A infection (HR = 1.02, 95% CI: 0.49, 1.55).
59 Table 3 3. IRR estimate and univariate Cox regression for each pathogen outcome Flu B Flu A HRV CoV Variables IRR (95% CI) HR (95% CI) IRR (95% CI) HR (95% CI) IRR (95% CI) HR (95% CI) IRR (95% CI) HR (95% CI) Previous Infections Previous Swab 4.31 (3.71, 4.90) 2.26 (1.25, 4.09) 45.66 (45.40, 45.92) 43.87 (20.87, 92.24) 41.62 (40.87, 42.36) 31.2 (14.65, 66.47) Previous Flu A 3.24 (2.42, 4.06) 2.82 (1.24, 6.40) 4.15 (3.46, 4.84) 4.3 (1.34, 13.84) 2.43 (1.03, 3.85) 2.57 (0.63, 10.52) Previous Flu B 1.29 (0.00, 3.29) 1.37 (0.19, 10.07) 1.11 (0.00, 3.09) 2.8 (0.37, 20.99) Previous CoV 3.15 (2.16, 4.06) 2.45 (0.90, 6.63) 5.32 (4.63, 6.01) 5.11 (1.58, 16.50) Previous HMPV 20.84 (18.87, 22.81) 20.79 (2.90, 148.90) 44.68 (42.70, 46.66) 70.09 (9.55,514.51) Previous RSVA 1.85 (0.00, 3.82) 1.4 (0.20, 10.02) 5.02 (3.02, 7.02) 4.75 (0.65, 34.53) Previous RSVB Previous HRV 1.18 (0.00, 3.15) 1.02 (0.14, 7.29) Previous Adeno Previous PIV Gender Male 1.13 (0.80, 1.61) 1.02 (0.61, 1.71) 1.02 (0.88, 1.16) 1.47 (0.84, 2.56) 1.12 (0.67, 1.86) Female REF REF REF REF
60 Table 3 3. Continued Flu B Flu A HRV CoV Variables IRR (95% CI) HR (95% CI) IRR (95% CI) HR (95% CI) IRR (95% CI) HR (95% CI) IRR (95% CI) HR (95% CI) Grade KG 0.63 (0.21, 1. 87) K 0.38 (0.18, 0.82) 0.64 (0.27, 1.53) 0.14 (0.02, 1.13) 1.08 (0.14, 8.53) 1 REF REF REF REF 2 1.02 (0.60, 1.73) 0.74 (0.33, 1.62) 1.91 (0.82, 4.45) 0.84 (0.32, 2.17) 3 0.98 (0.58, 1.68) 0.19 (0.06, 0.67) 1.14 (0.45, 2.89) 0.60 (0.21 1.69) 4 0.95 (0.55, 1.63) 0.38 (0.15, 1.00) 0.79 (0.29, 2.18) 0.80 (0.31, 2.09) 5 0.15 (0.06, 0.40) 0.36 (0.15, 0.85) 0.17 (0.04, 0.81) 0.39 (0.13, 1.17) 6 0.55 (0.16, 1.83) 1.63 (0.69, 3.85) 7 0.29 (0.04, 2.15) 0.13 (0.03, 0.57) 8 1.29 (0.16, 10.31) 2.25 (0.49, 10.43) 9 0.31 (0.07, 1.31) 0.39 (0.09, 1.75) 0.40 (0.05, 3.14) 10 0.15 (0.02, 1.12) 0.86 (0.18, 4.09) 0.39 (0.05, 3.10) 11 0.53 (0.12, 2.34) 0.57 (0.07, 4.63) 0.53 (0.07, 4.18) 12 School BHE REF REF REF REF BHHS 0.72 (0.17, 3.01) 0.59 (0.18, 1.97) 0.40 (0.11, 1.46) 0.51 (0.13, 1.99) BME 5.28 (2.02, 13.80) 2.88 (1.10, 7.54) 1.70 (0.70, 4.15) 1.44 (0.51, 4.03) CIS 0.54 (0.16, 1.89) 0.61 (0.23, 1.57) 0.18 (0.05, 0.66 ) 1.01 (0.40, 2.54) FSE 2.08 (0.71, 6.07) 1.66 (0.57, 4.87) 1.14 (0.43, 3.02) 0.18 (0.02, 1.47) ME 5.57 (2.08, 14.93) 0.88 (0.29, 2.69) 0.79 (0.26, 2.34) 0.81 (0.24, 2.77) MK 1.13 (0.36, 3.58) 0.36 (0.07, 1.79) 0.55 (0.18, 1.68) 0.98 (0.34, 2.84) SCE 3.52 (1.35, 9.20) 1.10 (0.37, 3.22) 0.47 (0.15, 1.44) 0.60 (0.19, 1.91) WVE 8.16 (3.15, 21.13) 0.94 (0.24, 3.70) 0.37 (0.08, 1.76) 2.89 (1.12, 7.47)
Human rhinovirus (HRV) outcome

As with the Flu B model, prior infection with Flu A remained a significant risk factor for HRV infection in the multivariate model, though the magnitude of its effect decreased after controlling for all covariates (HR = 3.5, 95% CI: 2.27, 4.67). Similarly, prior CoV infection remained a significant risk factor, and was associated with a 4.2 times increased hazard of HRV infection (95% CI: 2.97, 5.43). While prior RSVA infection was not significantly associated with HRV infection in the univariate model, after controlling for all covariates it became a statistically significant risk factor for HRV (HR = 4.02, 95% CI: 2.00, 6.04).

Coronavirus (CoV) outcome

The effect of prior infection with Flu A on the hazard of CoV infection increased from 2.6 in the univariate model to 2.8 in the multivariate model (95% CI: 1.39, 4.21). Additionally, prior HMPV infection remained a statistically significant risk factor with a large effect size in the multivariate model (HR = 52.3, 95% CI: 50.19, 54.31).

Discussion

This study used a survival analysis framework to examine how prior infection with specific respiratory viruses influences the infection hazards for Flu B, HRV and CoV during the same respiratory disease season. During the surveillance period, a total of 29 students had PCR-confirmed viral infections on two separate occasions. Twelve of the 29 second infections were Flu B, and 6 each were HRV and CoV.
62 Table 3 4. Multivariate, Mixed Effects Cox Regression for Each Pathogen Outcome Flu B Flu A HRV CoV Fixed Effects HR 95% CI HR 95 % CI HR 95% CI HR 95% CI Previous Infections Previous Flu A 2.71 (1.87, 3.55) 3.47 (2.27, 4.67) 2.8 (1.39, 4.21) Previous Flu B 0.88 (0.00, 2.94) 2.74 (0.70, 4.78) Previous CoV 1.96 (0.94, 2.98) 4.2 (2.97, 5.43) Previous HMPV 10.05 (8 .03, 12.07) 52.25 (50.19, 54.31) Previous RSVA 1.21 (0.00, 3.20) 4.02 (2.00, 6.04) Previous RSVB Previous HRV 1 (0.00, 3.02) Previous Adenovirus Previous PIV Gender Male 1.01 (0.66, 1.36) 1.02 (0.49, 1.5 5) 1.38 (0.83, 1.93) 1.15 (0.64, 1.66) Female REF REF REF REF REF REF REF REF Random Effects Standard Deviation Variance Standard Deviation Variance Standard Deviation Variance Standard Deviation Variance Grade 0.74 0.55 0.5 0.25 0.5 0.25 0.3 0.09 Scho ol 0.67 0.45 0.36 0.13 0.47 0.22 0.48 0.23
Any Swab Hazard Estimates

Having had a prior swab, regardless of whether or not there was a PCR-confirmed infection, was shown to have a statistically significant association with each of Flu B, HRV and CoV infection in univariate analysis (Table 3-3), but not in the Flu A analysis. There was a large outbreak of Flu A in this population at the beginning of the respiratory disease season, which is a possible explanation for why no second swabs tested positive for Flu A. The positive results in the Flu B, HRV and CoV models could also indicate that past participation in self-reporting surveillance systems predicts future participation, rather than a true association between prior and subsequent infection. An additional explanation could be that prior infections with specific viruses, through antibody-mediated enhancement, were responsible for the observed association. Alternatively, it could suggest that individuals with repeated ILI may have chronic exposure to, or other risk factors for, viral respiratory infections. A more detailed discussion of confounding is under Limitations.

To further examine this general association, we then ran multivariate, mixed-effects Cox models to determine whether the observed association was due to confirmed infection with specific pathogens or was an artifact of the surveillance system. While this surveillance effect could not be eliminated, we were able to determine that having any prior swab (and thus ILI) was predictive of future Flu B, HRV and CoV infections.

IRR_E estimates for infection with Flu B and univariate Cox models suggested that prior infection with Flu A and HMPV were significant risk factors (Table 3-3). These results held in the multivariate model (Table 3-4). Similarly, prior Flu A infection was significantly associated with HRV and CoV infection (HR = 3.5 and 2.8, respectively) in the corresponding multivariate models. Additionally, prior infection with
RSVA was statistically significantly associated with HRV infection (HR = 4.02, 95% CI: 2.00, 6.04), and prior HMPV infection was statistically significantly associated with CoV infection. The significant association between prior Flu A infection and subsequent Flu B infection may be indicative of adaptive immune enhancement, due to their genetic similarities.

Limitations

This study is subject to a number of limitations. First, the data used in this study come from surveillance in two school districts in one county during one respiratory disease season. Care should be taken in extrapolating these findings to different populations, and, if possible, the analysis should be repeated on surveillance data from multiple respiratory disease seasons in order to more precisely describe the relationship between prior infection and the within-season hazard of subsequent infection among this high-risk demographic.

Second, the surveillance system was based on absenteeism. In order for a child to be able to miss school, an adult must be able to stay home with him or her. Thus, children from families of high socioeconomic status, who can afford a nanny or for one parent to miss work or not work altogether, are more likely to be absent given ILI symptoms than children from families of lower socioeconomic status. Information on socioeconomic status was not available, and therefore could not be included in the model. However, public and ... that is associated with socioeconomic status. Schools were treated as random effects in order to focus on prior infection history among the individuals rather than the nested risk at the school level.
Additionally, most of the risk of influenza infection via direct contact or through fomites within a school likely occurs at the classroom level. Because middle and high school students generally change classes throughout the day, while kindergarteners and primary school students likely remain in the same class for most of the day, this information was not included in our models. Measures of social network centrality may be useful in quantifying the true risk of direct transmission in a school setting.

Furthermore, risk of respiratory infection exists outside of the school setting. Households, school buses and public spaces are potential sources of infection that to some ... information on these types of exposures in the outcome and time-varying covariates, the model did not explicitly adjust for possible household or community exposures.

Next, information on seasonal influenza vaccination status was not available for all students and therefore was not included in the analysis. Vaccination against seasonal influenza has been shown to be associated with a significant increase in non-influenza virus infections (68), which may have impacted our results.

Additionally, infections with respiratory pathogens can be subclinical or asymptomatic, which means cases of the various pathogens likely went undetected. Further, if prior infection was asymptomatic or subclinical, it could have a differential effect on clinical disease for subsequent infections compared to symptomatic prior infections. These unmeasured confounders pose a potential problem in extrapolating these results, and represent an avenue for future research. Asymptomatic participants, in addition to those experiencing
symptoms, should be swabbed throughout the respiratory disease season in order to estimate the true baseline hazard of infection. Future studies of prospective respiratory disease surveillance in defined populations would benefit from additional collection of swabs among persons not exhibiting ILI throughout the entire respiratory disease season.

Persons who have sought care for a certain condition may be more likely to subsequently seek care. Thus, it is possible that allowing study or nursing staff in this study to take a swab could be a predictor of allowing a second swab to be taken later in the surveillance period. We attempted to test for this by including ... analysis, and then testing specific pathogens in multivariate models. Indeed, having a prior swab taken was a statistically significant risk factor for infection with each of Flu B, HRV and CoV. By examining the infection outcome associated with those prior swabs, we found that prior infection by specific pathogens, namely Flu A and HMPV, may drive the observed association between prior swab and subsequent Flu B and CoV infection. Similarly, Flu A and CoV may drive the observed association between prior ILI and HRV infection.

Despite these limitations, our results suggest that enhanced surveillance efforts among individuals who have already exhibited respiratory infections during the course of the respiratory disease season may be appropriate for preventing severe clinical disease. While this finding could be the result of many possible immunological or surveillance-based mechanisms, this conclusion depends only on an association between prior infection status and subsequent infection risk.
Parents, teachers, school administrators, and clinicians should use the results of this study to take individual-based control measures to prevent subsequent respiratory infections in children. Clinicians can discuss with parents and children the risk behaviors associated with respiratory infections, but could also (as an extreme example) prescribe antiviral drugs prophylactically to children who have already had one infection during the respiratory ... symptoms in children who have already had one infection and then keep them out of school and take them to the doctor upon first signs of subsequent illness. Teachers and administrators can improve sanitization protocols for classrooms and other public areas in the schools, and can provide hand sanitizers for students before and after lunch and recess periods. They can also stress to parents the importance of influenza vaccination, and the importance of removing a child from school if he or she is sick.
CHAPTER 4
THE ROLE OF BENZATHINE PENICILLIN G AT PREDICTING AND PREVENTING ALL-CAUSE ACUTE RESPIRATORY DISEASE IN MILITARY RECRUITS

Background

Acute respiratory diseases (ARD) represent a significant concern to the US military, and thus national security, as they are responsible for between 12,000 and 27,000 missed training days among recruits annually (69). The US Army defines ARD syndromically: to have ARD, soldiers must present to a medical treatment facility with an oral temperature > 100.5 degrees Fahrenheit, recent signs or symptoms of acute respiratory tract inflammation, and either a limited duty profile or removal from duty altogether. As a result, both viral and bacterial pathogens, as well as non-pathogenic conditions, can result in ARD. Adenoviruses, influenza viruses, respiratory syncytial viruses and human coronaviruses make up the majority of viral contributors to ARD, while Streptococcus pneumoniae, Streptococcus pyogenes and Mycoplasma pneumoniae represent some of the major bacterial etiologic agents of ARD (70,71).

Recruits at basic combat training (BCT) have been shown to be at higher risk of ARD compared to their civilian counterparts (70,71). In the 1960s, before the rollout of the adenovirus vaccine, approximately 80% of trainees would contract a respiratory infection during BCT. The majority of those infections were attributed to adenovirus types 4 and 7 (72). Of those who contracted a respiratory infection, nearly 20% were hospitalized (4). This excess risk has been attributed to crowded barracks (73) and other environmental challenges (74), as well as to stress from physically and psychologically demanding conditions (71,73,75,76).
To address this burden, the oral adenovirus vaccine against types 4 and 7 (AdV 4 and 7) was administered year-round beginning in 1971 (70). The sole manufacturer of AdV 4 and 7 ceased production in 1994 due to low public demand for the product. The final doses were shipped in 1996, and distributed to Army recruits only during the winter months until the stockpile was depleted in 1999 (77). As a result of the phase-out, ARD rates, particularly adenovirus-associated ARD rates, increased at BCT sites (72). The increased disease burden cost the Army an estimated $10-$26 million in medical costs and lost training time annually. In 2011, live, enteric-coated oral vaccines against types 4 and 7 were approved for use in military personnel (78). The subsequent year-round administration of the new vaccines resulted in drastic, sustained declines in ARD at all BCT installations (77,78).

Benzathine penicillin G (BPG) is a single-dose antibiotic injection administered to military recruits upon accession to BCT. Prophylactic use of BPG is intended to reduce the impact of imported bacterial infections on troop health upon entry to BCT in hopes of preventing further spread (79). Multiple studies have suggested that BPG and other antibiotic prophylaxis have positive effects on recruit health and ARD rates (80,81). Injection with 1.2 million units of BPG has been shown to provide protection against infection with strep and other bacterial pathogens for up to six weeks (79), with other studies suggesting the duration may be shorter (82). One such study found that the duration of protection may be less than two weeks in Navy recruits (83). Generally, BPG prophylaxis is given to all Army recruits upon entry to
the 10-week-long BCT program. Additional doses are not administered to trainees after protection wanes. Although it has been Army policy to administer BPG to new recruits since the 1950s, there has been some variability in administration. For reasons that are not clear, Fort Jackson has never given BPG prophylaxis. Also, multiple BPG manufacturing problems resulted in relatively minor and localized supply shortages, and some of these shortages were associated with increases in ARD burden. One of the most extensive BPG shortages occurred in April-July 2016 and affected all BCT installations. Fort Benning chose not to resume routine prophylaxis after the shortage ended, and thus has not administered BPG since March 2016.

The variability in the use of BPG over time, relative to variability in adenovirus vaccine administration at different sites, provides an opportunity to conduct an observational study of the impact of BPG use on ARD. Here we fit, k-fold cross-validated, and externally validated random forest (RF) and Poisson regression (PR) models of weekly counts of ARD in Army recruits at BCT installations from January 1991 to April 2017 in order to evaluate the role of BPG prophylaxis in reducing ARD incidence. We evaluated the performance of the RF relative to the PR in predicting all-cause ARD, and quantified the relative importance of BPG and other covariates in this prediction. We hypothesized that BPG availability would be significantly associated with ARD. Specifically, we hypothesized that BPG would have a moderately protective effect on ARD incidence, and that its effect size would be lower than that of the new adenovirus vaccine.
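The model comparison outlined above (k-fold cross-validated prediction error for two competing models, followed by the percent improvement of one over the other) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions rather than the analysis itself, which was run in R: the two models below are deliberately simple hypothetical stand-ins (a single pooled rate versus a BPG-specific rate), not a random forest or a Poisson regression, and the weekly records are synthetic. Only the fold logic, the mean absolute error (MAE), and the percent-superiority formula 100*[1 - (MAE_A / MAE_B)] follow the description in this chapter.

```python
import random
from statistics import mean


def kfold_indices(n, k, seed=0):
    """Shuffle row indices and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def mae(actual, predicted):
    """Mean absolute error between observed and predicted counts."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def cross_validate(rows, fit, k=5, seed=0):
    """Train on k - 1 folds, predict the held-out fold, return per-fold MAE."""
    fold_mae = []
    for test_idx in kfold_indices(len(rows), k, seed):
        held_out = set(test_idx)
        train = [r for i, r in enumerate(rows) if i not in held_out]
        predict = fit(train)
        actual = [rows[i][1] for i in test_idx]
        preds = [predict(rows[i][0]) for i in test_idx]
        fold_mae.append(mae(actual, preds))
    return fold_mae


def superiority(mae_a, mae_b):
    """Percent improvement of model A over model B: 100 * [1 - MAE_A / MAE_B]."""
    return 100.0 * (1.0 - mae_a / mae_b)


# Hypothetical stand-in models; each returns a prediction function.
def fit_pooled_rate(train):
    rate = sum(y for _, y in train) / sum(x["population"] for x, _ in train)
    return lambda x: rate * x["population"]


def fit_bpg_rate(train):
    pooled = fit_pooled_rate(train)
    rates = {}
    for flag in (True, False):
        group = [(x, y) for x, y in train if x["bpg"] == flag]
        if group:
            rates[flag] = sum(y for _, y in group) / sum(x["population"] for x, _ in group)
    # Fall back to the pooled rate if a BPG status was absent from training.
    return lambda x: rates[x["bpg"]] * x["population"] if x["bpg"] in rates else pooled(x)


# Synthetic weekly records (hypothetical): ARD rate is lower in weeks with BPG.
rows = []
for i in range(40):
    population = 1000 + 10 * i
    bpg = i % 2 == 0
    rows.append(({"population": population, "bpg": bpg},
                 population * (0.02 if bpg else 0.04)))

mae_pooled = cross_validate(rows, fit_pooled_rate, k=5)
mae_bpg = cross_validate(rows, fit_bpg_rate, k=5)
print("percent superiority:", superiority(mean(mae_bpg), mean(mae_pooled)))
```

With k = 5, each iteration trains on 80% of the rows and tests on the remaining 20%, and every row receives exactly one out-of-fold prediction.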
Materials and Methods

Data

As mentioned above, a trainee is said to have ARD if he or she presents to a medical treatment facility with an oral temperature > 100.5 degrees Fahrenheit, recent signs or symptoms of acute respiratory tract inflammation, and either a limited duty profile or removal from duty altogether. Weekly counts of all-cause ARD from January 1991 to April 2017 among recruits at the four BCT installations, recruit population size, and training type were obtained from the Army ARD Surveillance Program (84). Training type was either BCT or a longer program called One Station Unit Training (OSUT), a combined BCT and advanced individual training (AIT) program in which both training programs are completed in succession at the same installation, rather than completing BCT at one installation and traveling to a second installation for AIT. BCT installations are required to report weekly counts of ARD to the Army Public Health Center as part of the ARD Surveillance Program. Data on BPG prophylaxis and adenovirus vaccine availability (both the old and new vaccines) were extracted from the literature and cross-referenced against ARD Surveillance reports and Army memoranda.

Analysis

Data cleaning and analysis were implemented using R version 3.2.1 (R Core Team 2014). Both a random forest (RF) model and a Poisson regression (PR) model were used in a k-fold cross-validation framework (k = 5), with three years of data, one from each adenovirus vaccine period, held out for external validation. In k-fold cross-validation, the data set is partitioned into k folds of approximately equal size. The model is then trained on k - 1 folds and used to predict values in the kth fold. The process is iterative so that each of the folds acts as the test set and a predicted
value is generated for every observation in the original data set. Performance was evaluated by comparing mean absolute error (MAE) across folds within the same model and within each fold across models (14). K = 5 was selected in order to conserve computational power while also providing an 80/20 split between the training and testing sets in each iteration. The superiority of the RF relative to the PR was calculated by

\[ 100 \times \left[ 1 - \frac{\mathrm{MAE}_{RF}}{\mathrm{MAE}_{PR}} \right] \tag{4-1} \]

and statistical significance was assessed using a test adjusted for sample overlap (16). RF procedures and prediction metrics were run using the randomForest (30) and forecast (35) R packages, respectively.

Random forest models are a nonlinear ensemble of decision trees in which bootstrapped samples from the original data set are used for model construction, while the remaining observations constitute the out-of-bag sample (31-33). A single decision tree in the random forest represents a hierarchical partition of the bootstrapped data, grown by considering highly discriminative variables among randomly drawn subsets of the entire covariate set (31). Each level of a categorical variable was fed into the model as a binary variable in order to minimize bias in selecting variables to split on (34). Mean squared error (MSE) was used as a metric for variable importance to assess the relative predictive ability of each covariate in the model, including BPG.

PR was used to estimate incidence rate ratios (IRRs) and corresponding confidence intervals for the association of BPG and each of the covariates with weekly ARD. Trainee population size was included in this model as an offset rather than as a fixed effect in
order to adjust for how the size of the population at risk influences the likelihood of observing more cases. The RF model was then re-run on the external validation set, and variable importance measures were extracted and compared to the average importance measures from the cross-validation. Similarly, a PR model was fit to the external validation set. The MAEs for the RF and PR models were compared using Equation 4-1. We then ran PR separately on the entire training data set (used in the cross-validation) to obtain IRR point estimates and corresponding confidence intervals for BPG and other covariates. These IRR estimates allowed us to test the hypothesis that BPG would have a significant protective effect on the rate of all-cause ARD, and to compare the magnitude of the effect to that of the adenovirus vaccine.

Results

Prediction

Cross-validation (22.5 years)

Figure 4-1 presents the raw time series (1991-2017) for the four BCT installations: Fort Sill (top left), Fort Benning (top right), Fort Leonard Wood (bottom left) and Fort Jackson (bottom right). The predicted values from the random forest and Poisson regression 5-fold cross-validation analyses are overlaid on the original time series in Figure 4-2. Prediction accuracy for the RF and PR k-fold cross-validation analyses is presented in Table 4-1.
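The k-fold procedure described in the Analysis section can be sketched in a few lines. The sketch below is written in Python rather than the R used for the analysis, uses only the standard library, and relies on hypothetical weekly counts and a placeholder model (the training-set mean). The function names are illustrative assumptions, not the code used in this study.

```python
import random
from statistics import mean


def kfold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(y, fit, predict, k=5, seed=0):
    """Return out-of-fold predictions and the MAE within each fold."""
    folds = kfold_indices(len(y), k, seed)
    preds = [None] * len(y)
    fold_mae = []
    for test in folds:
        held_out = set(test)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit(train)                # train on the other k - 1 folds
        for i in test:
            preds[i] = predict(model, i)  # predict the held-out fold
        fold_mae.append(mean([abs(y[i] - preds[i]) for i in test]))
    return preds, fold_mae


# Hypothetical weekly ARD counts (not study data).
counts = [12, 7, 9, 30, 4, 15, 8, 22, 11, 6, 18, 10, 5, 14, 9, 27, 13, 8, 16, 7]


def fit_mean(train):
    """Placeholder 'model': the mean count in the training folds."""
    return mean([counts[i] for i in train])


def predict_mean(model, i):
    """The placeholder model predicts the same value for every week."""
    return model


preds, fold_mae = cross_validate(counts, fit_mean, predict_mean, k=5)
print(fold_mae)
```

With 20 observations and k = 5, each iteration trains on 16 observations and tests on 4, which is the 80/20 split described above, and every observation receives exactly one out-of-fold prediction.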
Figure 4-1. Time series of ARD cases from 1991-2017 at Fort Sill (top left), Fort Leonard Wood (top right), Fort Benning (bottom left), and Fort Jackson (bottom right). The period between the red bars is the adenovirus vaccine indicates BPG prophylaxis shortage. Fort Jackson has never used BPG for prophylaxis.

The RF model explained on average 65.2% of the variance in the data across the folds, and the estimates across testing folds were consistent, ranging from 63.8% to 66.5%. The RF model, on average and for each individual fold, had a lower MAE than the PR: the average MAE was 6.5 (range 6.3-6.7) for the random forest and 7.3 (range 7.1-7.7) for the Poisson regression. From Equation 4-1, the average superiority of the random forest model over the Poisson regression model was estimated at 11.6%, with a range of 10.9%-12.3% across folds. This difference was statistically significant (P < 0.001).
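As a check on Equation 4-1, the fold-level MAEs reported in Table 4-1 can be plugged in directly. The following Python sketch is illustrative only (the function name is an assumption introduced here); it recomputes the per-fold superiority and its average from those published values.

```python
def superiority(mae_rf, mae_pr):
    """Equation 4-1: percent by which the RF's MAE is lower than the PR's."""
    return 100 * (1 - mae_rf / mae_pr)


# Fold-level MAEs as reported in Table 4-1, as (RF, PR) pairs.
fold_maes = [(6.53, 7.33), (6.72, 7.66), (6.44, 7.28), (6.46, 7.31), (6.27, 7.13)]

per_fold = [round(superiority(rf, pr), 2) for rf, pr in fold_maes]
average = round(sum(per_fold) / len(per_fold), 2)
print(per_fold, average)
```

The per-fold values and their mean agree with the last column of Table 4-1.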
Variable importance in the random forest model, measured as the percent increase in mean squared error (MSE) when each independent variable is left out of the model, is summarized across folds in Table 4-2. Of note, the new adenovirus vaccine was the most important variable in all five folds, with an average 133% increase in MSE. Trainee class size in hundreds and the old adenovirus vaccine were the second and third most important variables, with mean increases in MSE of 83% and 75%, respectively. The month of January and training type were the next most important predictors, with increases in MSE of 39% and 35%, respectively. Exclusion of an indicator of BPG use increased MSE by an average of 20.8%.

Figure 4-2. Predicted and Observed Values from the 5-Fold Cross-Validation
Table 4-1. Comparing Prediction Accuracy for 5-Fold Cross-Validation by Mean Absolute Error for Random Forest and Poisson Regression

Fold      % Variance Explained (RF)   MAE (RF)   MAE (PR)   1 - Ratio of MAE (%)
Fold 1    65.37                       6.53       7.33       10.91
Fold 2    66.49                       6.72       7.66       12.27
Fold 3    63.75                       6.44       7.28       11.54
Fold 4    65.83                       6.46       7.31       11.63
Fold 5    64.39                       6.27       7.13       12.06
Average   65.17                       6.48       7.34       11.68

External Validation

Both the PR and RF models were used to predict ARD case counts in an external validation set consisting of three years, one from each adenovirus vaccine period. Figure 4-3 overlays the predicted values from each of the two models on the observed data. The RF model still outperformed the PR by 20.9%, with MAE values of 4.9 and 6.2 for the RF and PR, respectively (data not shown).

Table 4-3 presents the average variable importance measures for the covariates from the cross-validated random forest model, and those obtained when the same random forest model was applied to the external validation set. The five most important variables in the cross-validation analysis were also the five most important variables in the validation analysis: new adenovirus vaccine availability, trainee population size, old adenovirus vaccine availability, the month of January, and training type. In both analyses, BPG availability was ranked as the 9th most important variable.
Table 4-2. Variable Importance (% Increase in Mean Squared Error when left out) for Random Forest Model (k-fold cross-validation) Using Trainee Class Size

Variable                  Mean    SD    Minimum   Maximum
Population in hundreds    83.4    5.4   74.9      89.7
Fort Sill                 22.9    2.9   20.0      26.8
Fort Leonard Wood         18.2    0.8   17.4      19.5
Fort Benning              17.9    1.3   16.4      19.3
Fort Jackson              21.8    1.2   20.0      23.4
January                   39.0    3.4   33.8      42.6
February                  23.0    2.1   20.9      25.7
March                     14.4    4.9   9.4       21.1
April                     9.3     1.6   7.4       11.4
May                       15.3    3.1   11.7      19.6
June                      8.2     3.4   5.2       13.6
July                      10.6    1.5   9.3       12.9
August                    11.4    3.3   6.6       15.2
September                 4.3     3.8   0.5       10.3
October                   3.0     1.2   2.0       4.8
November                  8.6     4.0   3.0       13.6
December                  1.4     7.3   -8.4      9.5
BPG                       20.8    1.4   19.2      22.3
Old Adenovirus Vaccine    75.3    3.2   71.7      80.1
New Adenovirus Vaccine    133.1   5.3   127.6     141.8
BCT/OSUT                  35.1    2.4   32.7      38.7
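To make the "% increase in MSE" metric in Table 4-2 concrete, the following Python sketch computes a permutation-style version of it. It is a simplified illustration under stated assumptions, not the randomForest implementation: the values of one predictor are shuffled (rather than the predictor being dropped and the model refit), the data are hypothetical, and `predict` is a stand-in for a fitted model.

```python
import random


def mse(y, yhat):
    """Mean squared error between observed and predicted values."""
    return sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y)


def pct_increase_mse(predict, X, y, col, seed=0):
    """Percent increase in MSE after shuffling one column of X."""
    base = mse(y, [predict(row) for row in X])
    shuffled = [row[col] for row in X]
    random.Random(seed).shuffle(shuffled)
    X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
    permuted = mse(y, [predict(row) for row in X_perm])
    return 100 * (permuted - base) / base


# Hypothetical data: column 0 drives the outcome, column 1 is pure noise.
rng = random.Random(1)
X = [[rng.uniform(0, 10), rng.uniform(0, 10)] for _ in range(200)]
y = [3 * x0 + rng.gauss(0, 1) for x0, _ in X]


def predict(row):
    """Stand-in for a fitted model that uses only column 0."""
    return 3 * row[0]


for col, name in enumerate(["informative", "noise"]):
    print(name, round(pct_increase_mse(predict, X, y, col), 1))
```

A predictor the model relies on produces a large increase in MSE when its values are scrambled, while a predictor the model ignores produces none. This is the sense in which the new adenovirus vaccine indicator ranks far above BPG in Table 4-2.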
Table 4-3. Comparison of variable importance between the cross-validation (CV) set and the external validation (EV) set from RF models

Variable                              Average % Increase MSE (CV)   % Increase MSE (EV)
New Adenovirus Vaccine                133.09                        131.46
Trainee Population Size in Hundreds   83.398                        82.34
Old Adenovirus Vaccine                75.262                        71.67
January                               38.998                        40.71
Training Type (BCT/OSUT)              35.146                        35.69
February                              23                            25.17
Fort Sill                             22.926                        21.75
Fort Jackson                          21.794                        20.85
BPG Availability                      20.806                        19.96
Fort Leonard Wood                     18.226                        19.51
Fort Benning                          17.928                        16.7
May                                   15.274                        13.75
March                                 14.39                         9.76
August                                11.404                        9.68
July                                  10.56                         9.54
April                                 9.258                         9.26
November                              8.566                         9
June                                  8.16                          7.34
September                             4.262                         4.84
October                               3.002                         3.03
December                              1.424                         1.95

Inference on BPG Effectiveness

IRRs from the Poisson regression model fit to the training data set used in the cross-validation procedure are presented in Table 4-4. BPG prophylaxis availability was found to be significantly protective against all-cause ARD (IRR = 0.68, 95% CI: 0.67, 0.70). Similarly, both the old and new adenovirus vaccines were significantly protective against ARD when compared to the adenovirus vaccine shortage, with IRRs of 0.39 (95% CI: 0.38, 0.39) and 0.11 (95% CI: 0.10, 0.11), respectively. Compared to Fort Jackson, Forts Benning and Leonard Wood were
significant risk factors for ARD (IRR = 1.17, 95% CI: 1.13, 1.20, and IRR = 1.29, 95% CI: 1.26, 1.33, respectively). Fort Sill was significantly protective compared to Fort Jackson (IRR = 0.67, 95% CI: 0.65, 0.69). BCT training was a significant risk factor compared to OSUT.

Figure 4-3. Predicted and Observed ARD Case Counts in External Validation Set
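The structure of a Poisson regression with a population offset, as described in the Analysis section, can be written out as a sketch. The notation below is introduced here for illustration and does not appear in the original analysis: Y_{it} is the weekly ARD count at installation i in week t, N_{it} is the trainee population, x_{it} is the covariate vector (installation, month, BPG availability, adenovirus vaccine availability), and the coefficient on the offset term log N_{it} is fixed at 1.

```latex
\log \mathbb{E}[Y_{it}] = \log N_{it} + \beta_0 + \mathbf{x}_{it}^{\top}\boldsymbol{\beta},
\qquad
\mathrm{IRR}_j = e^{\beta_j}
```

Under this form, each exponentiated coefficient is a ratio of ARD rates per trainee relative to the reference level of that covariate, which is how the IRRs in Table 4-4 are read.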
Table 4-4. Incidence Rate Ratios for Independent Variables in the Poisson Regression Model on the Training Data with Population Offset (22.5 years)

Variable                           IRR    95% CI         P Value
Installation
  Fort Sill                        0.67   (0.65, 0.69)   <0.001
  Fort Benning                     1.17   (1.13, 1.20)   <0.001
  Fort Leonard Wood                1.29   (1.26, 1.33)   <0.001
  Fort Jackson                     REF    REF            REF
Month
  January                          0.55   (0.53, 0.57)   <0.001
  February                         0.81   (0.78, 0.83)   <0.001
  March                            1.06   (1.03, 1.08)   <0.001
  April                            1.07   (1.04, 1.10)   <0.001
  May                              1.11   (1.08, 1.14)   <0.001
  June                             0.93   (0.90, 0.95)   <0.001
  July                             1.02   (0.99, 1.04)   0.211
  August                           1.08   (1.06, 1.11)   <0.001
  September                        REF    REF            REF
  October                          1.03   (1.00, 1.05)   0.031
  November                         1.11   (1.08, 1.14)   <0.001
  December                         1.04   (1.01, 1.07)   0.018
BPG Availability
  Available                        0.68   (0.67, 0.70)   <0.001
  Shortage                         REF    REF            REF
Adenovirus Vaccine Availability
  New Vaccine                      0.11   (0.11, 0.11)   <0.001
  Old Vaccine                      0.39   (0.38, 0.39)   <0.001
  Shortage                         REF    REF            REF

Discussion

In this study, we sought to build a predictive model for all-cause ARD and to assess how important BPG and other covariates are to its performance. Furthermore, we hypothesized that BPG was significantly associated with all-cause ARD; specifically, that BPG would be significantly, moderately predictive of and protective against all-cause ARD. We also hypothesized that the magnitudes of its association and predictive ability would be lower than those of the new adenovirus vaccine.
The random forest model explained on average over 65% of the variance in the data, and outperformed the Poisson regression model in the k-fold cross-validation by 11.6% (Table 4-1). In the external validation, its accuracy was over 20% superior to that of the Poisson regression model.

BPG, in the PR on the full dataset, was associated with a 32% reduction in ARD cases (IRR = 0.68; 95% CI: 0.67, 0.70). In the random forest k-fold cross-validation, BPG was found to be the 9th most important variable, with a 21 percent increase in MSE. This indicates that while BPG availability is not highly predictive of all-cause ARD, it does have a significantly protective effect. This result is consistent with a previous study, which found that BPG prophylaxis was associated with broad and persistent protection against not only group A streptococcal infections, but ARD in general (85).

It is well established that the development of the new adenovirus vaccine was associated with substantial declines in ARD (77,78). Our finding that the new adenovirus vaccine was associated with an 89% lower incidence rate compared to when there was no adenovirus vaccine further supports its effectiveness. The old adenovirus vaccine, which was thought to be effective (86) but not to the same degree as the new vaccine (77), was similarly significantly protective against ARD, with a 61% lower incidence rate compared to weeks when there was no adenovirus vaccine available.

Trainee population size in hundreds was the second most important variable in the random forest model, with an 83% mean increase in MSE across the five folds. This result was expected, since a larger trainee class size means that more people are at risk of contracting ARD. In the Poisson model, log trainee population size in
hundreds was included as an offset in order to constrain its effect on the remaining covariates.

Temporal covariates such as the months of January and February were also important variables, associated with an average of 42% and 25% increase in MSE, respectively. However, January and February were both shown to be statistically significant protective factors (IRR = 0.55 and 0.81, respectively) for ARD. January comes after what is usually an extended winter holiday when trainees are permitted to leave the installation. Leaving the psychologically and physically stressful BCT environment may result in temporarily improved immune function upon their return (76). Furthermore, the New Year could be when large numbers of new recruits first arrive at BCT. As a result, recruits at installations with BPG prophylaxis available would be protected against the common bacterial pathogens responsible for ARD, with protection extending into February.

Fort Leonard Wood and Fort Benning were both found to be significant risk factors for ARD compared to Fort Jackson (IRRs = 1.29 and 1.17, with 95% CIs of 1.26, 1.33 and 1.13, 1.20, respectively). Conversely, Fort Sill was found to be significantly protective against ARD (IRR = 0.67, 95% CI: 0.65, 0.69). Each base may have a different baseline risk for the various pathogens that make up ARD due to varying environmental conditions across sites and the protocol for assigning new recruits to BCT sites. Recruits from different parts of the country may have different immune profiles and thus may be more or less susceptible to infection from pathogens vulnerable to BPG.
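The percent differences quoted in this Discussion follow directly from the IRRs in Table 4-4. A small Python sketch of the conversion is given below; the function name is an assumption introduced for illustration, and the IRR values are those reported in the table.

```python
def pct_change_from_irr(irr):
    """Percent change in incidence rate implied by an IRR (negative = lower)."""
    return round(100 * (irr - 1))


# IRRs as reported in Table 4-4.
irrs = {
    "BPG available": 0.68,
    "New adenovirus vaccine": 0.11,
    "Old adenovirus vaccine": 0.39,
    "Fort Leonard Wood": 1.29,
    "Fort Benning": 1.17,
    "Fort Sill": 0.67,
}

for name, irr in irrs.items():
    print(f"{name}: {pct_change_from_irr(irr):+d}%")
```

For example, an IRR of 0.68 corresponds to a 32% lower rate and an IRR of 1.29 to a 29% higher rate, each relative to the reference category for that variable.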
Limitations

This study is subject to a number of limitations. First, the absence of routine BPG prophylaxis does not mean that BPG was unavailable or was not used reactively to control an ongoing outbreak. Situations where BPG was used to control an outbreak were not explicitly captured in the ARD Surveillance Program. Additionally, only those in the impacted trainee class or barrack would receive the antibiotic, and not necessarily all trainees in the installation. Similarly, adenovirus vaccine availability does not necessarily reflect use. There could be variability in administration in the few years between the final adenovirus vaccine shipments and when the installations' respective stockpiles were depleted.

Second, ARD has a syndromic case definition that encompasses viral and bacterial respiratory pathogens. Since BPG is an antibiotic, it is only effective against specific bacteria, including some gram-positive Streptococci, and Treponema. Meanwhile, it is thought that the majority of ARD in this population is caused by viral pathogens (70,71). BPG effectiveness in a given week is a function of the composition of respiratory pathogens circulating in the recruit population; this is not captured as part of the ARD Surveillance Program. Military treatment facilities are encouraged to test for strep in at least 60% of recruits with ARD, but true strep test compliance rates are around 30%. We opted to use ARD rather than a strep-specific outcome due to low, potentially differential testing compliance rates across installations over time.

Next, the only type of temporal variation taken into account was month, and not week, season or year. Taking into account week of the year would have required estimating an additional 52 parameters, which would have been computationally intensive and would have significantly decreased our statistical power. Year was not
included in the analysis so that the resulting models could then be used to forecast ARD for prospective outbreak detection. Season was not included because within-season variation in weather patterns, as well as overall seasonal trends, should have been captured by the finer temporal scale of month.

The results of this work suggest that BPG prophylaxis reduces the rate of all-cause ARD by 32%. A randomized controlled trial to test the effectiveness of different dosing regimens of this intervention at reducing ARD would provide additional data on the modes of action and overall effectiveness.
CHAPTER 5
CONCLUDING THOUGHTS AND FUTURE DIRECTIONS

This dissertation presented three case studies in how epidemiological models can be fit to data to provide guidance to clinicians, public health practice agencies and public health policymakers. Below we provide an overview of the findings and public health importance of each of the studies, and conclude with future directions for this research.

Clinical Decision Support

Infectious diseases continue to pose a major public health concern in developing nations, as well as in special high-risk subgroups in the United States. Patients who visit a medical clinic due to an illness should receive the best possible medical care, regardless of where they live or their sociodemographic characteristics. Unfortunately, many countries and underserved areas do not have the material or human resources to diagnose and treat, in a timely fashion, every patient who seeks care. One such example is laboratory services. Samples taken from patients must be stored in special media at specific temperatures and then sent to laboratories that run tests on special equipment, and it can take days or weeks for samples to be processed and results returned. Additional resources to enable clinicians to make accurate diagnoses at the point of care are essential to improve individual health outcomes. Epidemiological models that incorporate patient clinical symptoms, medical histories and demographic information can be used to predict health outcomes with reasonable sensitivity and specificity. These methods and the guidance they provide can be used in resource-limited settings to help improve access to quality medical care.
In Chapter 2, we applied regression and machine learning models to clinical symptom and demographic data from febrile children in Haiti to predict arbovirus infection outcomes for DENV, CHIKV and ZIKV. All three arboviruses have similar clinical presentations and require laboratory resources for diagnostic confirmation. Pain management is important in treating all three pathogens, but patients with DENV infection should not be given non-steroidal anti-inflammatory drugs (NSAIDs) because they increase the likelihood of clinical complications. NSAIDs, however, should be used for pain management of CHIKV infection. Our models of CHIKV had 72% sensitivity and 63% specificity, with 86% AUROC. Arthralgia was the most important clinical symptom in predicting CHIKV infection; removing arthralgia from the RF model was associated with a 140% increase in MSE, and arthralgia was associated with 21.6-fold higher odds of CHIKV infection in the multivariate logistic regression analysis (95% CI: 6.8, 87.2). This result indicates that screening for arthralgia may be sufficient for classifying patients with acute febrile illness as CHIKV positive, and thus may be useful in deciding whether or not to prescribe NSAIDs to patients.

We also explored the clinical spectrum of illness among the patients and examined how patients clustered based on their symptomology, and whether or not those patients were PCR positive for one or more arboviruses. We found that CHIKV patients clustered into two main groups: one that experienced high rates of nearly every symptom measured, and another with very low rates of all symptoms except for arthralgia and headache. Thus, the clinical presentation of CHIKV is highly variable, and this should be considered when making a differential diagnosis and developing a pain management plan.
This study should be repeated with data from more patients in different areas of Haiti, as well as in other regions with coincident epidemics of DENV, CHIKV and ZIKV, in order to have more generalizable results.

Public Health Surveillance

Epidemiological models utilize data collected by surveillance systems in order to estimate risk factors for infection or post-infection outcomes. Individuals who are part of high-risk groups can then be targeted for interventions to alleviate some of the disproportionate risk. Syndromic surveillance, such as that for ILI, is limited: without taking samples and performing diagnostic tests, it is impossible to know what etiologic agent is responsible for the observed clinical disease.

In Chapter 3, we used PCR-confirmed infection data from school children who were absent with an ILI in order to quantify the effect of prior infection with specific viruses on the risk of subsequent infection in the same influenza season. We found that having had a nasal swab as a result of an ILI-associated absence was statistically significantly associated with the risk of Flu B, HRV and CoV infection. The results of this study suggest that individuals with histories of ILI should be targeted for surveillance and intervention activities to prevent subsequent infection. By further examining prior infection with specific respiratory viruses, we found that prior Flu A infection was a significant risk factor for subsequent Flu B infection. Similarly, prior HMPV infection was a significant risk factor for subsequent Flu B infection. These results provide evidence in support of possible immune-mediated enhancement, and evidence against the theory of a refractory period in which there is non-specific, ephemeral immunity following infection, respectively.
The analysis presented in Chapter 3 should be repeated for additional respiratory disease seasons in order to increase statistical power, and to determine the extent to which the order of infections matters. In the present study, there was a wave of Flu A that occurred in the first part of the season, followed by a wave of Flu B later on. As our results have suggested, the order of infections may matter, so examining seasons with different pathogen compositions and outbreak timings will be important. Additionally, information on negative swabs and students who had ILI but did not provide a swab should be considered in future models. These students represent the overall burden of ILI experienced in the community, and this information would improve estimates of the role of past ILI in predicting future ILI within the same respiratory disease season.

Public Health Intervention Evaluation

Surveillance data from before and after intervention implementation provide the necessary foundation to evaluate the impact of an intervention on population-level health metrics. Recommendations can then be made for further studies to refine the intervention distribution in order to cost-effectively prevent morbidity and mortality. We saw in Chapter 4 that periods of BPG availability were associated with a statistically significant decrease in the incidence of ARD among military recruits, after controlling for month, installation, training type, and adenovirus vaccine availability. We also saw that this relationship had a smaller effect size than that of either the old or new adenovirus vaccine. It has been suggested that protection from BPG is only weeks long, which is substantially shorter than the duration of BCT. Additional studies are needed, including one using a survival analysis framework similar to Chapter 3, to
examine how the hazard of ARD changes as a function of BPG prophylaxis, and if a of their training. Additionally, surveillance for antimicrobial resistance should be actively pursued, particularly for streptococcal infections, for situational awareness and to further advise ARD control strategies moving forward.

Concluding Thoughts

Statistical and machine learning models are invaluable tools for epidemiologists and public health practitioners as they seek to better understand the drivers of infectious disease transmission in the populations most affected, and then take action to minimize disease burden. Clinical and surveillance data provide detailed information that, if modeled correctly and communicated effectively to clinicians and policymakers, can improve public health practice worldwide.
LIST OF REFERENCES

1. Smith KF, Dobson AP, McKenzie FE, et al. Ecological theory to enhance infectious disease control and public health policy. Front. Ecol. Environ. 2005;3(1):29-37.
2. Westreich D, Lessler J, Funk MJ. Propensity score estimation: neural networks, support vector machines, decision trees (CART), and meta-classifiers as alternatives to logistic regression. J. Clin. Epidemiol. 2010;63(8):826-833.
3. May RM. Simple mathematical models with very complicated dynamics. Nature 1976;261(5560):459.
4. Tangri N, Stevens LA, Griffith J, et al. A Predictive Model for Progression of Chronic Kidney Disease to Kidney Failure. JAMA 2011;305(15):1553-1559.
5. Lee DS, Austin PC, Rouleau JL, et al. Predicting Mortality Among Patients Hospitalized for Heart Failure: Derivation and Validation of a Clinical Model. JAMA 2003;290(19):2581-2587.
6. Monto AS, Gravenstein S, Elliott M, et al. Clinical Signs and Symptoms Predicting Influenza Infection. Arch. Intern. Med. 2000;160(21):3243-3247.
7. Metcalf CJE, Edmunds WJ, Lessler J. Six challenges in modelling for public health policy. Epidemics 2015;10:93-96.
8. Wallinga J, Lipsitch M. How generation intervals shape the relationship between growth rates and reproductive numbers. Proc. R. Soc. Lond. B Biol. Sci. 2007;274(1609):599-604.
9. Rivers CM, Lofgren ET, Marathe M, et al. Modeling the Impact of Interventions on an Epidemic of Ebola in Sierra Leone and Liberia. PLoS Curr. [electronic article]. 2014;6. (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4325479/). (Accessed January 24, 2018)
10. Picard RR, Berk KN. Data Splitting. Am. Stat. 1990;44(2):140-147.
11. Efron B, Tibshirani R. Improvements on Cross-Validation: The .632+ Bootstrap Method. J. Am. Stat. Assoc. 1997;92(438):548-560.
12. Steyerberg EW, Harrell FE, Borsboom GJ, et al. Internal validation of predictive models: efficiency of some procedures for logistic regression analysis. J. Clin. Epidemiol. 2001;54(8):774-781.
13. Fushiki T. Estimation of prediction error by using K-fold cross-validation. Stat. Comput. 2011;21(2):137-146.
14. Arlot S, Celisse A. A survey of cross-validation procedures for model selection. Stat. Surv. 2010;4:40-79.
15. Buckland ST, Burnham KP, Augustin NH. Model Selection: An Integral Part of Inference. Biometrics 1997;53(2):603-618.
16. Bouckaert RR, Frank E. Evaluating the Replicability of Significance Tests for Comparing Learning Algorithms. In: Advances in Knowledge Discovery and Data Mining. Springer, Berlin, Heidelberg; 2004:3-12. (https://link.springer.com/chapter/10.1007/978-3-540-24775-3_3). (Accessed July 1, 2017)
17. GBD 2013 DALYs and HALE Collaborators. Global, regional, and national disability-adjusted life years (DALYs) for 306 diseases and injuries and healthy life expectancy (HALE) for 188 countries, 1990-2013: quantifying the epidemiological transition. Lancet 2015;386:2145-2191.
18. Campos GS, Bandeira AC, Sardi SI. Zika Virus Outbreak, Bahia, Brazil. Emerg. Infect. Dis. 2015;21(10):1885-1886.
19. Fischer M, Staples E. Notes from the Field: Chikungunya Virus Spreads in the Americas - Caribbean and South America, 2013-2014. Morb. Mortal. Wkly. Rep. MMWR 2014;63(22):500-501.
20. Lednicky J, Beau De Rochars VM, El Badry M, et al. Zika Virus Outbreak in Haiti in 2014: Molecular and Clinical Data. PLoS Negl. Trop. Dis. 2016;10(4):e0004687.
21. Iovine NM, Lednicky J, Cherabuddi K, et al. Coinfection With Zika and Dengue-2 Viruses in a Traveler Returning From Haiti, 2016: Clinical Presentation and Genetic Analysis. Clin. Infect. Dis. 2017;64(1):72-75.
22. Rodriguez-Morales AJ, Villamil-Gómez WE, Franco-Paredes C. The arboviral burden of disease caused by co-circulation and co-infection of dengue, chikungunya and Zika in the Americas. Travel Med. Infect. Dis. 2016;14(3):177-179.
23. Beltran-Silva S, Chacon-Hernandez S, Moreno-Palacios E, et al. Clinical and differential diagnosis: Dengue, chikungunya and Zika. Rev. Medica Hosp. Gen. Mex. [electronic article]. 2016;(http://www.sciencedirect.com/science/article/pii/S0185106316301135#bib0210). (Accessed October 4, 2017)
24. Paniz-Mondolfi AE, Rodriguez-Morales AJ, Blohm G, et al. ChikDenMaZika Syndrome: the challenge of diagnosing arboviral infections in the midst of concurrent epidemics. Ann. Clin. Microbiol. Antimicrob. 2016;15:42.
25. WHO. Dengue: Guidelines for diagnosis, treatment, prevention and control. 2009;(http://www.who.int/tdr/publications/documents/dengue-diagnosis.pdf). (Accessed April 15, 2016)
26. Symptoms, Diagnosis, & Treatment | Chikungunya virus. Cent. Dis. Control Prev. 2016;(https://www.cdc.gov/chikungunya/symptoms/index.html). (Accessed February 2, 2018)
27. Braga JU, Bressan C, Dalvi APR, et al. Accuracy of Zika virus disease case definition during simultaneous Dengue and Chikungunya epidemics. PLOS ONE 2017;12(6):e0179725.
28. Beau De Rochars VEM, Alam MT, Telisma T, et al. Spectrum of outpatient illness in a school-based cohort in Haiti, with a focus on diarrheal pathogens. Am. J. Trop. Med. Hyg. 2015;92(4):752-757.
29. Therneau TM, Atkinson B, Ripley B. rpart: Recursive Partitioning and Regression Trees. 2017. (https://CRAN.R-project.org/package=rpart)
30. Liaw A, Wiener M. Classification and Regression by randomForest. R News 2002;2(3):18-22.
31. Breiman L. Random Forests. Mach. Learn. 2001;45(1):5-32.
32. Breiman L. Bagging predictors. Mach. Learn. 1996;24(2):123-140.
33. Cutler DR, Edwards TC, Beard KH, et al. Random Forests for Classification in Ecology. Ecology 2007;88(11):2783-2792.
34. Strobl C, Boulesteix A-L, Zeileis A, et al. Bias in random forest variable importance measures: Illustrations, sources and a solution. BMC Bioinformatics 2007;8:25.
35. Hyndman R. forecast: Forecasting functions for time series and linear models. 2016. (http://github.com/robjhyndman/forecast)
36. Mammen MP Jr, Pimgate C, Koenraadt CJM, et al. Spatial and Temporal Clustering of Dengue Virus Transmission in Thai Villages. PLOS Med. 2008;5(11):e205.
37. Nsoesie EO, Ricketts RP, Brown HE, et al. Spatial and Temporal Clustering of Chikungunya Virus Transmission in Dominica. PLoS Negl. Trop. Dis. 2015;9(8):e0003977.
38. Salje H, Lessler J, Endy TP, et al. Revealing the microscale spatial signature of dengue transmission and immunity in an urban population. Proc. Natl. Acad. Sci. 2012;109(24):9535-9538.
39. Salje H, Cauchemez S, Alera MT, et al. Reconstruction of 60 Years of Chikungunya Epidemiology in the Philippines Demonstrates Episodic and Focal Transmission. J. Infect. Dis. 2016;213(4):604-610.
40. Symptoms, Diagnosis, & Treatment | Chikungunya virus | CDC. (https://www.cdc.gov/chikungunya/symptoms/index.html). (Accessed January 5, 2018)
41. Ranjit S, Kissoon N. Dengue hemorrhagic fever and shock syndromes. Pediatr. Crit. Care Med. 2011;12(1):90-100.
42. Clinical Guidance | Dengue | CDC. (https://www.cdc.gov/dengue/clinicallab/clinical.html). (Accessed January 5, 2018)
43. Fenton A, Perkins SE. Applying predator-prey theory to modelling immune-mediated, within-host interspecific parasite interactions. Parasitology 2010;137(6):1027-1038.
44. Cobey S, Lipsitch M. Pathogen diversity and hidden regimes of apparent competition. Am. Nat. 2013;181(1):12-24.
45. Holt RD, Lawton JH. Apparent competition and enemy-free space in insect host-parasitoid communities. Am. Nat. 1993;142(4):623-645.
46. Reyes-Silveyra J, Mikler AR. Modeling immune response and its effect on infectious disease outbreak dynamics. Theor. Biol. Med. Model. [electronic article]. 2016;13. (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4779228/). (Accessed January 3, 2017)
47. Martcheva M, Tuncer N, Mary CS. Coupling Within-Host and Between-Host Infectious Diseases Models. BIOMATH 2015;4(2):1510091.
48. Goncalvez AP, Engle RE, Claire MS, et al. Monoclonal antibody-mediated enhancement of dengue virus infection in vitro and in vivo and strategies for prevention. Proc. Natl. Acad. Sci. 2007;104(22):9422-9427.
49. Cobey S, Lipsitch M. Niche and neutral effects of acquired immunity permit coexistence of pneumococcal serotypes. Science 2012;335(6074):1376-1380.
50. Haller O, Kochs G, Weber F. The interferon response circuit: induction and suppression by pathogenic viruses. Virology 2006;344:119-130.
51. Tregoning JS, Schwarze J. Respiratory viral infections in infants: causes, clinical symptoms, virology, and immunology. Clin. Microbiol. Rev. 2010;23(1):74-98.
52. Hay AJ, Gregory V, Douglas AR, et al. The evolution of human influenza viruses. Philos. Trans. R. Soc. Lond. Ser. B 2001;356(1416):1861-1870.
53. Henrickson KJ. Parainfluenza Viruses. Clin. Microbiol. Rev. 2003;16(2):242-264.
54. Bouvier NM, Palese P. The biology of influenza viruses. Vaccine 2008;26:D49-D53.
94 55. Lofgren E, Fefferman NH, Naumov YN, et al. Influenza Seasonality: Underlying Causes and Modeling Theories. J. Virol. 2007;81(11):5429 5436. 56. Dowell SF. Seasonal variation in host susceptibility and cycles of certain infectious diseases. Emerg. Infect. Dis. 2001;7(3):369 374. 57. Sonoguchi T, Naito H, Hara M, et al. Cross subtype protection in humans during sequential, overlapping, and/or concurrent epid emics caused by H3N2 and H1N1 influenza viruses. J. Infect. Dis. 1985;151(1):81 88. 58. Cowling BJ, Ng S, Ma ESK, et al. Protective efficacy of seasonal influenza vaccination against seasonal and pandemic influenza virus infection during 2009 in Hong Kon g. Clin. Infect. Dis. Off. Publ. Infect. Dis. Soc. Am. 2010;51(12):1370 1379. 59. Cobey S, Karpova L, Marinich I, et al. Cross immunity in the dynamics of influenza A H3N2, H1N1, and influenza B in humans. 93rd Ecol. Soc. Am. Annu. Meet. 2008;(https://ec o.confex.com/eco/2008/techprogram/P11675.HTM). (Accessed January 17, 2017) 60. Morens DM, Taubenberger JK, Fauci AS. Predominant Role of Bacterial Pneumonia as a Cause of Death in Pandemic Influenza: Implications for Pandemic Influenza Preparedness. J. In fect. Dis. 2008;198(7):962 970. 61. Breese Hall C, Powell KR, Schnabel KC, et al. Risk of secondary bacterial infection in infants hospitalized with respiratory syncytial viral infection. J. Pediatr. 1988;113(2):266 271. 62. Peltola V, McCullers J. Res piratory viruses predisposing to bacterial infections: role of neuraminidase. Pediatr. Infect. Dis. J. 2004;23(1):S87 S97. 63. Disease Burden of Influenza | Seasonal Influenza (Flu) | CDC. 2017;(https://www.cdc.gov/flu/about/disease/burden.htm). (Accesse d February 4, 2018) 64. Fendrick AM, Monto AS, Nightengale B, et al. The Economic Burden of Non Influenza Related Viral Respiratory Tract Infection in the United States. Arch. Intern. Med. 2003;163(4):487 494. 65. Rothman K, Greenland S, Lash T. Modern Epidemiology. 3rd ed. 
Philadelphia, PA: Lippincott Williams & Wilkins; 2008. 66. Therneau TM. Modeling Survival Data: Extending the Cox Model. 2015.(https://CRAN.R project.org/package=survival) 67. Terry M. coxme: Mixed Effects Cox Models. 2015.(https:// CRAN.R project.org/package=coxme)
95 68. Cowling BJ, Fang VJ, Nishiura H, et al. Increased Risk of Noninfluenza Respiratory Virus Infections Associated With Receipt of Inactivated Influenza Vaccine. Clin. Infect. Dis. 2012;54(12):1778 1783. 69. Armed Force s Health Surveillance Center (AFHSC). Surveillance snapshot: Illness and injury burdens among U.S. military recruit trainees, 2014. MSMR 2015;22(4):26. 70. Gray GC, Callahan JD, Hawksworth AW, et al. Respiratory diseases among U.S. military personnel: c ountering emerging threats. Emerg. Infect. Dis. 1999;5(3):379 385. 71. Sanchez JL, Cooper MJ, Myers CA, et al. Respiratory Infections in the U.S. Military: Recent Experience and Control. Clin. Microbiol. Rev. 2015;28(3):743 800. 72. Dudding BA, Top FH, Winter PE, et al. Acute respiratory disease in military trainees: the adenovirus surveillance program, 1966 1971. Am. J. Epidemiol. 1973;97(3):187 198. 73. Breese B, Stanbury J, Upham H. Influence of crowding on respiratory illness in a large naval trai ning station. War Med. 1945;7:143 146. 74. Russell KL. Respiratory infections in military recruits. In: Recruit medicine Washington, DC: TMM Publications; 2006:227 253. 75. Gleeson M. Immune function in sport and exercise. J. Appl. Physiol. 2007;103(2) :693 699. 76. Cohen S, Tyrrell DAJ, Smith AP. Psychological Stress and Susceptibility to the Common Cold. N. Engl. J. Med. 1991;325(9):606 612. 77. Clemmons NS, McCormic ZD, Gaydos JC, et al. Acute Respiratory Disease in US Army Trainees 3 Years after Reintroduction of Adenovirus Vaccine1. Emerg. Infect. Dis. 2017;23(1):95 98. 78. Radin JM, Hawksworth AW, Blair PJ, et al. Dramatic Decline of Respiratory Illness Among US Military Recruits After the Renewed Use of Adenovirus Vaccines. Clin. Infect. Dis. 2014;59(7):962 968. 79. Morris AJ, Rammelkamp Ch Benzathine penicillin G in the prevention of Streptococcic infections J. Am. Med. Assoc. 1957;165(6):664 667. 80. Davis JC, Schmidt WC. Benzathine Penicillin G. N. Engl. J. Med. 1957;256(8):339 342. 81. 
Gray GC, McPhate DC, Leinonen M, et al. Weekly Oral Azithromycin as Prophylaxis for Agents Causing Acute Respiratory Disease. Clin. Infect. Dis. 1998;26(1):103 110.
96 82. Kassem AS, Zaher SR, Shleib HA, et al. Rheumatic Fever Prophylaxis Using Benzath ine Penicillin G (BPG): Two week Versus Four week Regimens: Comparison of Two Brands of BPG. Pediatrics 1996;97(6):992 995. 83. Broderick MP, Hansen CJ, Russell KL, et al. Serum Penicillin G Levels Are Lower Than Expected in Adults within Two Weeks of A dministration of 1.2 Million Units. PLOS ONE 2011;6(10):e25308. 84. Lee S, Eick A, Ciminera P. Respiratory Disease in Army Recruits: Surveillance Program Overview, 1995 2006. Am. J. Prev. Med. 2008;34(5):389 395. 85. Gunzenhauser JD, Brundage JF, McNe il JG, et al. Broad and Persistent Effects of Benzathine Penicillin G in the Prevention of Febrile, Acute Respiratory Disease. J. Infect. Dis. 1992;166(2):365 373. 86. Russell KL, Hawksworth AW, Ryan MAK, et al. Vaccine preventable adenoviral respiratory illness in US military recruits, 1999 2004. Vaccine 2006;24(15):2835 2842.
BIOGRAPHICAL SKETCH

Jacob earned his Bachelor of Science in Public Health from Tulane University in New Orleans, LA, where he became fascinated by infectious disease epidemiology and public health program design, implementation, and evaluation. Immediately afterward, he moved to Gainesville, FL to pursue a Master of Arts in the medical anthropology of infectious diseases at the University of Florida. During his tenure as a master's student, he studied under the guidance of two applied anthropologists, Drs. Rick Stepp and Alyson Young, who taught him the value of applying a holistic, systems-based approach to studying the interactions between humans, pathogens, and the environment. It was during that time that he met Dr. Juliet R.C. Pulliam at the University of Florida Emerging Pathogens Institute, who trained him in quantitative disease ecology, specifically in using mathematical and statistical models to understand the complex drivers of disease transmission across scales. For his thesis, Jacob traveled to Yucatán, Mexico, where he worked with Dr. Eric Dumonteil, a professor at the Autonomous University of Yucatán and Tulane University, to study the sociocultural factors surrounding Chagas disease transmission and barriers to control in rural communities participating in an insect screen distribution program.

After finishing his degree, Jacob began a PhD in Epidemiology in 2014, during which he received two Intramural Research Training Awards from the National Institute of Allergy and Infectious Diseases/U.S. National Institutes of Health to work with Dr. Vincent Munster in the Virus Ecology Unit at Rocky Mountain Laboratories. The goal of those experiences was to gain a better understanding of high- and highest-containment biosafety laboratory procedures, to understand how data from infection experiments are generated, and to learn how to incorporate those data into population-level models of infectious disease transmission. During that time, he also received Biosafety Level 3 Ag training funded by an award from the U.S. Department of Homeland Security through Kansas State University.

The National Institute of General Medical Sciences/U.S. National Institutes of Health funded the first year of his PhD work through award R25GM102149 to Dr. Juliet R.C. Pulliam. In 2015, Jacob received the Science, Mathematics and Research for Transformation Scholarship through the U.S. Department of Defense, which funded the remainder of his PhD program. Following his graduation in May 2018, Jacob joined the Army Public Health Center as an epidemiologist in the Clinical Public Health & Epidemiology Portfolio, Disease Epidemiology Division.