SELF-INJURY RISK FACTORS AND OUTCOMES AMONG ADULTS IN FLORIDA

By

JAE SUN MIN

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2019
© 2019 Jae Sun Min
To my parents and grandparents
ACKNOWLEDGEMENTS

My sincere gratitude goes out to my advisor, Dr. Mattia Prosperi, whose teaching and mentorship have been indispensable to my growth. Thank you for believing in my potential and guiding me to the finish line.

I would also like to thank the members of my dissertation committee. I thank Dr. Kelly K. Gurka for her keen eye for detail and stimulating discussions in the realm of injury research. I thank Dr. Hui Hu for sharing his expertise in environmental epidemiology and data integration, and I thank Dr. Jiang Bian for his frank critiques and guidance on conducting observational studies using administrative data. This dissertation would not have been possible without their support.

My time at the University of Florida was enriched by many caring friends: Rachel Zahigian, Vicki Osborne, Nancy Seraphin, Krishna Gelin, Mirsada Serdarevic, Akemi Wijayabahu, Jiyeon Park, Bin Yu, and others within and outside the Department of Epidemiology. Our late-night talks and food adventures have kept me going over the years.

I thank my parents and family for their encouragement and championship of my education. I would be remiss to leave out my best friends, Danielle Michaud and Gabriela López Compen, from the acknowledgements. Thank you for always believing the best in me. I would also like to express my gratitude to Dr. Shelley Sazer at Baylor College of Medicine, who patiently taught me how to hold a pipette and forever instilled in me a love for science.

Lastly, I would like to acknowledge the patients, health care providers, and staff who have been a part of the Healthcare Cost and Utilization Project for making research like this dissertation possible.
TABLE OF CONTENTS

ACKNOWLEDGEMENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
ABSTRACT

CHAPTER

1 INTRODUCTION
    Epidemiology of Self-Injury
    Definition of Self-Injury
    Gaps in Current Research
    Conceptual Framework
    Specific Aims
        Aim 1: Examine Individual-Level Risk Factors for Self-Injury for Three Age Groups: Younger (18-39 Years of Age), Middle-Aged (40-64 Years of Age), and Older (65+ Years Old) Adults
        Aim 2: Identify Environmental Factors Associated with Self-Injury and Compare the Strengths of Associations with Self-Injury When Accounting for Individual-Level Factors
        Aim 3: Ascertain the Health Outcomes Subsequent to Self-Injury and Their Associations with Environmental Factors
    Potential Implications

2 DESIGN AND METHODS
    Study Design
    Health Care Encounter Data
    Environmental Data
    Methods of Analyses

3 COMPARISON OF KNOWLEDGE-DRIVEN MODEL TO DATA-BASED MACHINE LEARNING MODELS TO STUDY RISK FACTORS FOR SELF-INJURY
    Background
    Aims and Hypotheses
    Methods
        Setting and Study Population
        Variables
        Statistical Analyses
    Results
    Discussion
        Limitations
        Strengths

4 RISK FACTORS FOR SELF-INJURY ACROSS INDIVIDUAL AND ENVIRONMENTAL DOMAINS STUDIED THROUGH MACHINE LEARNING
    Background
    Aims and Hypotheses
    Methods
        Study Population and Case Definition
        Individual-Level Data
        Environmental Data
        Data Linkage
        Statistical Analyses
    Results
    Discussion
        Limitations
        Strengths

5 DESCRIPTIVE STUDY OF HEALTH OUTCOMES FOLLOWING SELF-INJURY
    Background
    Aims and Hypotheses
    Methods
        Study Population and Case Definition
        Individual-Level Data
        Environment-Level Data
        Data Linkage
        Statistical Analyses
    Results
    Discussion
        Limitations
        Strengths

6 CONCLUSION
    Summary of Findings
    Future Research

APPENDIX: SUPPLEMENTARY MATERIAL
LIST OF REFERENCES
BIOGRAPHICAL SKETCH
LIST OF TABLES

Table
1-1 The epidemiologic triad and the corresponding variables used for this dissertation.
2-1 List of data sources and variables used in this dissertation.
3-1 Distribution of cases and controls by their ICD-9-CM case definitions shown as frequencies and percentages.
3-2 Demographic characteristics of the study population. Cases are those with self-injury and controls are those with injury other than self-injury.
3-3 List of variables with an absolute value of the Phi correlation coefficient greater than 0.5.
3-4 Results of bivariate analyses showing top 10 CCS codes with the highest unadjusted odds ratios shown with frequencies among cases and controls.
3-5 List of variables used to build the models.
3-6 Comparison of models used to examine predictors of self-injury.
3-7 Adjusted odds ratios and 95% confidence intervals from logistic regression based on expert knowledge variables model.
3-8 Comparison of domains using the best performing model, LASSO regression.
3-9 Comparison of models used to examine predictors of self-injury stratified by age: younger age (18-39), middle-aged (40-64), and older age (65+) groups.
4-1 Distribution of cases and controls by their ICD-9-CM case definitions.
4-2 Demographic characteristics of the study population. Cases are those with self-injury and controls are those with injury other than self-injury.
4-3 Results from bivariate analyses of environmental variables with self-injury as outcome, shown as unadjusted odds ratios and 95% confidence intervals.
4-4 Comparison of models used to examine individual and environmental predictors of self-injury.
4-5 Comparison of individual and environmental variable domains using the best performing model, random forests.
4-6 Comparison of random forests models used to examine individual and environmental predictors of self-injury stratified by age groups.
5-1 Distribution of individuals in the study by their ICD-9-CM case definitions.
5-2 Demographic characteristics of the self-injury cohort and comparison cohort.
5-3 Frequencies, unadjusted, and multivariate adjusted relative risks of clinical conditions in descending order of adjusted relative risk.
A-1 List of ICD-9-CM codes used to define cases.
A-2 List of ICD-9-CM codes used to define controls.
A-3 Environmental variables used in this study with their sources*.
A-4 Brief list of 2007 North American Industry Classification System (NAICS) designations for health care facilities in County Business Patterns data.
LIST OF FIGURES

Figure
1-1 Epidemiologic triad of self-injury.
2-1
3-1 Flow chart of sample population for inclusion in the first study.
3-2 Receiver Operating Characteristic (ROC) curves of models used to examine risk factors for self-injury.
3-3 Receiver Operating Characteristic (ROC) curves of models used to examine predictors of self-injury with best performing model, LASSO.
4-1 Flow diagram of study population for the second study.
4-2 Number of self-injury cases in this study per 100,000 for each county.
4-3 Unadjusted associations between environment-level variables and self-injury as outcome.
4-4 Receiver Operating Characteristic (ROC) curves of models used to examine individual and environmental predictors of self-injury.
4-5 Receiver Operating Characteristic (ROC) curves of models used to examine predictors of self-injury with best performing model, random forests.
4-6 Top 15 variables with the highest Gini impurity from random forests models for younger age (18-39), middle-aged (40-64), and older age (65+) groups.
5-1 Derivation of study population from HCUP Florida.
5-2 Adjusted relative risks for clinical outcomes after self-injury with 95% confidence intervals (CI).
5-3 Number of psychiatric and physical conditions for cohort with prior self-injury and cohort without prior self-injury.
5-4 Number of psychiatric and physical conditions by age group within prior self-injury cohort.
5-5 Adjusted relative risks showing associations between environmental variables and self-injury recurrence for middle-aged group.
LIST OF ABBREVIATIONS

ACS         American Community Survey
ADI         Area Deprivation Index
AHRQ        Agency for Healthcare Research and Quality
AUROC       Area Under Receiver Operating Characteristic
CCI         Charlson Comorbidity Index
CCS         Clinical Classifications Software
CBP         County Business Patterns
CV          Cross-Validation
E-code      External Causes of Injury Code
ED          Emergency Department
FARA        Food Access Research Atlas
HCUP        Healthcare Cost and Utilization Project
ICD         International Classification of Diseases
ICD-9-CM    International Classification of Diseases, 9th Revision, Clinical Modification
LASSO       Least Absolute Shrinkage and Selection Operator
ZIP Code    Zone Improvement Plan Code
ZCTA        ZIP Code Tabulation Area
Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

SELF-INJURY RISK FACTORS AND OUTCOMES AMONG ADULTS IN FLORIDA

By

Jae Sun Min

August 2019

Chair: Mattia Prosperi
Major: Epidemiology

Injury that has been purposefully inflicted upon oneself, or self-injury, imposes a considerable morbidity and mortality burden. Over 44,000 Americans die of fatal self-injury and half a million are treated in emergency departments for self-injury each year. Although the morbidity burden is highest among adolescents and young adults, the rate of fatal self-injury among middle-aged and older adults has been rising and has not been curbed.

This dissertation leveraged electronic health care data from the Healthcare Cost and Utilization Project to assess individual-level, clinical risk factors for self-injury. Environment-level data from the Census, American Community Survey, County Business Patterns, and Food Access Research Atlas were examined in conjunction. Expert knowledge-based and data-driven machine learning models were tested and compared.

In the first study, of individual-level risk and protective factors associated with self-injury, we found that expert knowledge-based logistic regression was as informative as the data-driven models for younger and middle-aged groups, but not for older adults. We also found that a prior history of unintentional poisoning is a major risk factor for self-injury. In the second study, we found that individuals with self-injury were more likely to have resided in areas of lower average income and reduced access to health care services. Combining individual- and environment-level risk factors yielded the best models. The last study examined health outcomes subsequent to self-injury and found that those with self-injury were twice as likely to have psychiatric outcomes and certain physical conditions such as coma, epilepsy, and injury of all intents. We did not find any environment-level factors to be associated with recurrence of self-injury.

This dissertation illustrated the importance of incorporating environment-level factors when studying risk factors for self-injury but found limited evidence for their use when studying outcomes following self-injury. We also demonstrated that the risk for self-injury varies over the lifespan and that age-specific epidemiological studies are needed to create targeted prevention strategies.
CHAPTER 1
INTRODUCTION

Epidemiology of Self-Injury

Self-injury, or injury that has been inflicted upon oneself, is a considerable burden to society. In 2016, over 44,000 Americans died of suicide or fatal self-injury [1]. That year, suicide was the tenth leading cause of death overall and the second leading cause of death among persons aged 10 to 34 years in the United States. Despite the long history of research on suicide, the suicide rate has increased by 20% in the past 10 years, and by more than 60% in certain age groups [2-4].

In addition, about 500,000 individuals are treated in emergency departments (ED) for self-injury in the US each year [1]. From 2001 to 2017, the rate of self-injury leading to a hospital visit increased from 113 per 100,000 to 157 per 100,000. Moreover, it is estimated that 1.4 million individuals had some form of self-injury in the past year [5]. Since the majority of these self-injuries did not result in an ED visit, counts based on ED visits likely underestimate the true incidence of self-injury.

Nonfatal self-injury is most common among young females, while fatal self-injury is more likely to be incurred by older males in the US [6]. Risk factors for self-injury at the individual level include unemployment, financial strain, sexual minority status, and social disconnectedness [7-11]. Additionally, mental health-related conditions such as depression, bipolar, anxiety, schizophrenia, personality, and substance-related disorders are well-studied risk factors [5,9,12]. Suicidal ideation, defined by the National Institute of Mental Health as thinking about, considering, or planning suicide, has also been linked to self-injury, but only a very small proportion (7.2%) of those with ideation end up sustaining a self-injury [13,14].
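The rise in hospital-treated self-injury quoted above (113 to 157 per 100,000 between 2001 and 2017) can be restated as an absolute and a relative change. The short Python sketch below is illustrative only: the function names are hypothetical and not part of the dissertation, and the only inputs are the two rates given in the text.

```python
def absolute_change(old_rate: float, new_rate: float) -> float:
    """Difference between two rates expressed on the same scale."""
    return new_rate - old_rate


def relative_change_pct(old_rate: float, new_rate: float) -> float:
    """Change from old_rate to new_rate as a percentage of old_rate."""
    return (new_rate - old_rate) / old_rate * 100.0


# Rates per 100,000 taken from the text; nothing else is assumed.
RATE_2001 = 113.0
RATE_2017 = 157.0

abs_change = absolute_change(RATE_2001, RATE_2017)
rel_change = relative_change_pct(RATE_2001, RATE_2017)

print(f"Absolute change: {abs_change:.0f} per 100,000")
print(f"Relative change: {rel_change:.1f}%")
```

On these two figures, the rate rose by 44 per 100,000, or roughly 39% relative to 2001.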
Certain environmental factors, including economic downturn, suicide epidemic, and exposure to community violence, have also been found to be associated with self-injury [9,15]. However, many of these studies examined suicide rates at the state level, not at the individual level [16-19]. Such studies are prone to the ecologic fallacy, in which inferences about individual risk for self-injury are drawn from aggregate group data.

Definition of Self-Injury

Historically, several terms have been used to define injury inflicted upon oneself, such as self-injury, self-inflicted injury, deliberate self-harm, suicidal behavior, and self-directed violence [20]. Self-injuries can occur with or without the intention of suicide and are often called suicidal self-injury (SSI) and non-suicidal self-injury (NSSI), respectively, in American and Canadian literature [21]. A similar but technically different term, deliberate self-harm, has been used to denote all self-injuries regardless of intent in Europe and Australia [21,22]. The World Health Organization (WHO) uses the term in a universal manner to include suicide, suicidal ideation, and suicide attempt [12]. In its International Classification of Diseases (ICD) 9th revision coding system, the WHO attempt. The current version, ICD-10, distinguishes suicide attempt separately from self [23] The heterogeneity in studies across the literature and, by extension, to treat or prevent self-injury [24]

Gaps in Current Research

Younger age and female sex have been associated with nonfatal self-injury, but males, especially older males, are more likely to sustain fatal self-injury [6]. Due to the
majority of nonfatal self-injury being incurred by young adults and adolescents, much of self-injury research has focused on this population [1]. The mean age across studies of self-injury was 21.3 years (standard deviation: 4.41) [11]. Given that risk factors for nonfatal self-injury can vary by age and that the fatal self-injury burden is highest among the middle-aged and older age groups [25,26], there is a need to assess risk factors across all ages. This is especially important since there has been a recent increase in suicides among middle-aged adults (40-65 year olds) in the US [27].

Age is important to consider since previous studies have found that the associations between mental health conditions and self-injury vary by age [28]. For example, the odds ratio for the association between dysthymia (persistent depressive disorder) and self-injury ranges from 6.6 to 20.3 across age groups, with similar variation for anxiety, substance dependence, and personality disorders. Moreover, the prevalence of psychiatric conditions among individuals who sustained a self-injury varies by age: 75.0% among 18-30 year olds, 60.4% among 31-40 year olds, 81.2% among 41-49 year olds, and 49.7% among those aged 50 and older [28].

In addition, several cross-sectional studies have examined comorbid conditions at the time of self-injury but were unable to account for prior medical history [29-32]. Longitudinal studies in the US have studied select variables of interest [33], and most studies examining prior medical history and its association with self-injury have been conducted outside of the US [34,35]. Because suicide and self-injury are relatively rare outcomes, adequately powered, longitudinal research on risk factors in the general population (other than the commonly studied psychiatric population) is scarce [33]. Rather than studying a handful of well-known risk factors, a study examining all prior medical history for new risk factors has the potential to identify high-risk groups for
intervention. Lastly, although certain environmental factors are known to be associated with higher rates of suicide [19,36], their role in self-injury at the individual level and their interactions with individual risk factors are unknown.

Conceptual Framework

The epidemiologic triad was used to conceptualize the causal factors related to self-injury (Figure 1-1). The triad has been frequently used by epidemiologists to show that health conditions are the results of agent, individual, and environmental factors [37]. Agent factors are related to the energy transferred to or withheld from the human body that results in an injury and can be categorized as biologic, physical, or chemical; some examples are firearm (physical) and poisoning (chemical). Individual factors are related to susceptibility to injury and can range from biologic factors like age and sex to behavioral factors like alcohol and substance use. Lastly, environmental factors refer to extrinsic factors that influence the occurrence of injury. This dissertation uses a broad definition of environmental factors to include the physical and social environment. For self-injury, these are largely socioeconomic factors like social support and characteristics of the community in which people live, e.g., area deprivation and poverty.

This study examined a wide range of factors related to self-injury. By linking health care data from emergency department visits, inpatient hospital visits, and outpatient settings to environmental factors at each area of residence, both individual-level and environment-level risk factors were examined. These variables within the context of the epidemiologic triad are shown in Table 1-1. Moreover, medical records of patients who sustained at least one self-injury were used to identify adverse
health outcomes following self-injury and how those outcomes are affected by the environment in which these individuals reside.

The unique availability of individual-level health care data that can be linked across time and the robust set of ecological information collected in Florida make this an ideal setting in which to conduct a study examining individual and environmental factors related to self-injury. Moreover, Florida is the third largest state by population, with an above-average rate of overall injury incidence [38]. Although Florida has a larger proportion of older residents than other states, the age-adjusted mortality rate from injury is still 11% higher in Florida than the national average.

Specific Aims

The objective of this dissertation was to examine risk factors associated with self-injury and identify adverse health outcomes following self-injury among adults in Florida over the lifespan. Specifically, this dissertation had the following aims:

Aim 1: Examine Individual-Level Risk Factors for Self-Injury for Three Age Groups: Younger (18-39 Years of Age), Middle-Aged (40-64 Years of Age), and Older (65+ Years Old) Adults

Hypothesis 1.1: A data-driven machine learning model using all variables will outperform a model built using well-known risk factors in differentiating self-injury from other injury.

Hypothesis 1.2: Age will modify the performance of the models.

Aim 2: Identify Environmental Factors Associated with Self-Injury and Compare the Strengths of Associations with Self-Injury When Accounting for Individual-Level Factors

Hypothesis 2.1: Individuals with self-injury will be more likely to reside in areas of greater poverty, lower income, and reduced access to healthy foods and health care.
Hypothesis 2.2: Individual-level factors will be stronger risk factors for self-injury than environmental factors alone or in combination with individual-level factors.

Hypothesis 2.3: Self-injury is more likely to be associated with environmental factors for younger adults; both individual and environmental factors will be important for middle-aged adults; and only individual factors will be strongly associated with self-injury among older adults.

Aim 3: Ascertain the Health Outcomes Subsequent to Self-Injury and Their Associations with Environmental Factors

Hypothesis 3.1: Middle-aged adults will have a greater number of adverse physical and psychiatric conditions following self-injury compared to young and older adults.

Hypothesis 3.2: Environmental factors related to employment opportunities will be strongly associated with adverse health outcomes following self-injury among middle-aged adults.

Hypothesis 3.3: Individuals living in areas with low access to health care are more likely to have another self-injury than those living in areas with high access to health care.

Potential Implications

Understanding the risk factors for self-injury is essential to preventing physical and psychological harm to patients with self-injury and their close relations [4]. Self-injury increases the risk for suicide [11,33], which takes over 44,000 American lives each year [1]. Suicides also resulted in 895,466 years of potential life lost, $2 billion in health care costs, and $187 billion in work loss [1].
Identification of individuals who are likely to incur fatal or nonfatal self-injury remains a high-stakes priority [39]. Currently, self-injuries are considered behavioral symptoms or manifestations of another illness. However, it has been recognized that individuals with self-injury may potentially be diagnosed in the future with [39] Thus, this dissertation has the potential to add to the scientific knowledge on self-injury by examining its risk factors and outcomes among the general population.

Drawing from the entire population of Floridians who used some form of hospital-based health care from 2005 to 2014, this dissertation has greater generalizability than studies conducted using data from only a single health care system or single payer. The health care data source used in this dissertation is the largest longitudinal, all-payer health care database. Moreover, this dissertation has linked individuals to their environmental characteristics, such as poverty, unemployment, lack of economic resources, and lack of food access, all of which can adversely affect health. These factors have the potential to inform area- or neighborhood-level public health programs to reduce self-injury.

Lastly, this dissertation fills a gap in research that has focused on the young adult and adolescent population. Identifying age-specific risk factors for self-injury and clinical outcomes after a self-injury will be important steps toward creating age-specific prevention strategies and post-injury care plans.
Table 1-1. The epidemiologic triad and the corresponding variables used for this dissertation

Epidemiologic triad    Variables
Agent                  Mechanism of injury (method of injury)
Individual             Sex
                       Age
                       Race/Ethnicity
                       Payer
                       Charlson Comorbidity Index
                       All diagnoses prior to self-injury
                       All diagnoses after self-injury
                       All procedures after self-injury
Environment            Area Deprivation Index
                       Population characteristics
                         Age
                         Race
                         Sex
                         Number of veterans
                         Number of disabled
                       Household characteristics
                         Number of households
                         Average household size
                         Median household income
                         Median value of occupied housing units
                         Number of households living below poverty
                       Economic characteristics
                         Number of businesses
                         Number of employees
                         Number of health care facilities
                         Number of social assistance facilities
                       Food desert designation
Figure 1-1. Epidemiologic triad of self-injury
CHAPTER 2 DESIGN AND METHODS
Study Design
This dissertation was a retrospective, observational study of adults (18 years or older) who utilized health care within the state of Florida between January 1, 2005 and December 31, 2014. A flow diagram of the study population across the three aims is shown in Figure 2-1. For Aims 1 and 2, a case-control design was used: individuals with self-injury were cases, and individuals with an injury other than self-injury, matched 1:2 via risk set sampling, were controls. For Aim 3, a cohort study of individuals with nonfatal injury was carried out to assess adverse health outcomes, including subsequent self-injury and death. This dissertation used individual-level data from health care encounters and environment-level data from multiple federal sources.
Health Care Encounter Data
The Healthcare Cost and Utilization Project (HCUP) by the Agency for Healthcare Research and Quality (AHRQ) is a set of health care databases from multiple states in the United States. HCUP contains information from health care encounters (an emergency department visit, for example) and is the largest longitudinal, all-payer health care database in the US. For this dissertation, HCUP databases from the state of Florida, including the Florida State Emergency Department Database, the Florida State Inpatient Database, and the Florida State Ambulatory Surgery and Services Database, were used. The State Emergency Department Database contained health care encounter information from emergency department (ED) visits that did not require hospital admission, the State Inpatient Database had inpatient care records, and the State
Ambulatory Surgery and Services Database comprised outpatient visits. These databases include detailed information from almost all EDs in the state (99%) and a majority of hospitals for inpatient stays (>95%). 40 This provides a nearly complete census of all the hospitals in Florida. Individuals can be linked across different HCUP databases and years through a HIPAA-compliant anonymous patient identifier called visitLink, affording the opportunity to assemble longitudinal medical histories. 41,42
Records of each encounter in HCUP contained demographic information, such as age, sex, race/ethnicity, payer, and ZIP code (Zone Improvement Plan code) of residence; International Classification of Diseases (ICD), 9th revision, clinical modification (ICD-9-CM) codes specifying clinical diagnoses (up to 10), including the external cause of injury codes or E-codes (up to 3); and medical procedures conducted at each of the visits. The Charlson comorbidity index (CCI) was calculated from the ICD-9-CM codes. 43 ICD codes were mapped to their equivalents in Clinical Classifications Software (CCS) codes to reduce complexity and increase interpretability; ICD-9-CM has 14,000 codes and CCS contains 285 codes. 44 All variables were recoded as binary variables.
We used HCUP databases from years 2005 to 2014. The data from 2015 and 2016 (latest available) were not used due to the vast changes in the coding structure of ICD codes from the 9th (ICD-9-CM) to the 10th (ICD-10-CM) revision. ICD-10-CM was adopted in October 2015. The changes in injury coding between these revisions would have introduced misclassification when harmonizing the ICD codes across years. For example, poisoning, suffocation, and foreign body were previously coded as external causes of injury in ICD-9-CM, which were not mandatory for reimbursement. 45 However,
they became nature of injury codes in ICD-10-CM, which are mandatory for reimbursement, affecting whether these diagnosis codes are present in health care data. Moreover, specifically related to this dissertation, self-injury codes, which were exclusively external cause codes in ICD-9-CM (E950-9), have been split, with some as nature of injury (T36-50, T51-65, T71) and others as external cause (X71-X83) codes in ICD-10-CM. 45 This would have led to a mismatch between case definitions using ICD-9-CM and case definitions using ICD-10-CM. Therefore, HCUP data from 2005 to 2014 were used for uniformity of case definitions and diagnoses.
Environmental Data
In addition to individual-level health care data from HCUP, several environment-level variables from federal sources were used in this dissertation. These included population demographics from the 2010 census 46 and the 2011 American Community Survey (ACS), 47 economic data from the County Business Patterns (CBP), 48 and food accessibility from the US Department of Agriculture Food Access Research Atlas (FARA). 49 These environmental data were linked to individual health care data using the ZIP code of residence. Since the individual data are longitudinal and an individual may have multiple ZIP codes over time, the most frequent ZIP code for each individual was used. All variables are summarized in Table 2-1.
The environmental data sources used various geographical units: ZIP Code Tabulation Areas (ZCTAs) by the census and ACS, ZIP codes by HCUP and CBP, and census tracts by FARA. This required us to translate, or crosswalk, the different geographic units to a single standard unit. In Florida, there were 4,245 census tracts, 1,472 ZIP codes, and 983 ZCTAs in 2010, the midpoint of our HCUP data. The ZCTA unit was designated as the geographic unit of observation since it was the most
parsimonious and did not require extrapolation of missing values. ZCTAs are similar to ZIP codes with the exception of Post Office (PO) boxes. ZIP codes of PO boxes were mapped to their physical, geographic locations and translated to their respective ZCTAs. 50
The 2010 census data by ZIP Code Tabulation Area (ZCTA) were used in this dissertation. These variables included total population, number of females, number of white, black, and other races, number of Hispanics, median age, number of households, and average household size. From the 2011 ACS, we used the number of disabled individuals, number of veterans, median household income, median value of housing, and number of households living below poverty per ZCTA. These variables provide a snapshot of the population demographics in the area of residence.
The economic data on the number of business establishments and employment in each ZIP code were also used in this dissertation. These data can indicate an area's employment opportunities and economic health. We used the CBP data from 2005 to 2014. Specifically, we used the number of business establishments, number of employees, number of health care facilities, and number of social assistance services per ZIP code. Health care facilities included physician, dentist, chiropractor, optometrist, mental health, and occupational therapist offices as well as home health care services, skilled nursing facilities, and continuing care retirement communities. Social assistance services included individual and family social services, vocational rehabilitation, and child day care services. These variables from CBP provide an indication of the availability of health care and social assistance in the neighborhood.
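The linkage step described above (take each individual's most frequent ZIP code, then translate it to a ZCTA) can be sketched as follows. This is a minimal illustration in Python, not the dissertation's R code: the crosswalk entries, the patient identifiers, the function names, and the tie-breaking rule (earliest-seen ZIP wins) are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical ZIP-to-ZCTA crosswalk (illustrative values, not taken from a real file).
# "32604" stands in for a PO-box ZIP that maps to the ZCTA of its physical location.
ZIP_TO_ZCTA = {"32601": "32601", "32604": "32601", "32611": "32611"}


def most_frequent_zip(zips):
    """Return the ZIP code seen most often; ties go to the earliest-seen ZIP (an assumption)."""
    return Counter(zips).most_common(1)[0][0]


def assign_zcta(encounters, crosswalk):
    """Map each patient ID to the ZCTA of their most frequent ZIP code (None if unmapped)."""
    zips_by_patient = {}
    for patient_id, zip_code in encounters:
        zips_by_patient.setdefault(patient_id, []).append(zip_code)
    return {
        patient_id: crosswalk.get(most_frequent_zip(zips))
        for patient_id, zips in zips_by_patient.items()
    }


# Hypothetical encounter records: (anonymous patient ID, ZIP code of residence at the visit).
encounters = [("A", "32604"), ("A", "32604"), ("A", "32611"), ("B", "32611")]
zcta_by_patient = assign_zcta(encounters, ZIP_TO_ZCTA)
```

Collapsing to a single modal ZIP code per person keeps one environmental profile per individual, at the cost of ignoring moves during the study period.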
Food desert areas were obtained at the census tract level from FARA. Data from 2006, 2010, and 2015 were available as of 2019. Food desert areas were defined as areas with low income and low access to supermarkets and grocery stores. A tract was defined as low income if the poverty rate in that tract is over 20% or median income or the A tract was defined as low access if more than 33% or at least 500 individuals in a census tract resided more than 1 mile from a supermarket or large grocery store in an urban area and more than 10 miles in a rural area. A census tract must be both low income and low access to be defined as a food desert. The census tract-level data were translated to ZCTA using a relationship file from the Census Bureau. Because accessibility to supermarkets and grocery stores is important for a healthy diet and health, a dichotomous variable indicating food desert status was included in the analysis.
Lastly, the ZIP code of residence from HCUP data was translated into ZCTA. This allowed us to link the environmental variables to the individual variables using a common geographical unit.
Methods of Analyses
Descriptive statistics were calculated prior to fitting statistical models. For continuous variables, a t-test or a Wilcoxon rank sum test was conducted to compare groups. For categorical variables, a chi-square test was used. Due to the large number of variables included in the analysis, Bonferroni correction was used to adjust for multiple testing. Correlation between CCS codes was tested using Pearson correlation coefficients of the dichotomized CCS codes. A Phi coefficient greater than 0.5 or lower
than -0.5 indicated moderate to strong correlation. The variance inflation factor (VIF) was also used to test for multicollinearity; no variables had a VIF greater than 10.
Several statistical models were used in this dissertation. These methods fell within the broad category of supervised machine learning, which uses the patterns in the data to predict the known dependent variable. First, logistic regression was used to model the relationship between independent variables and a binary outcome. Both univariable logistic regression (with one independent variable) and multivariable logistic regression (with multiple independent variables) were used. A variant of this regression, least absolute shrinkage and selection operator, or LASSO, was also used. LASSO regression adds a penalty to the coefficients of the independent variables, such that some of these coefficients become 0; this removes the independent variables with a 0 coefficient from the model and reduces the number of variables in the model. A decision tree model repeatedly splits the data into subsets to make the best predictions about the probability of an outcome. Lastly, a random forests model averages several decision tree models. The advantage of the last two machine learning methods, decision tree and random forests, is their ability to account for interactions without explicit specification.
Cross-validation (CV) was used to estimate the performance of a statistical model on new data. This involved dividing the whole data set into one part for building the model and another for testing the accuracy of this model. CV increases generalizability and reduces overfitting, i.e., having a model that fits the specific data used to build it well but not any other data. A 10-fold CV framework was used. All analyses were conducted using R.
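The correlation screen described above can be illustrated with a short sketch. For two 0/1 variables, the Pearson correlation reduces to the Phi coefficient computed from the 2x2 table of joint counts. The example below is written in Python for illustration only (the dissertation's analyses were run in R); the variable names `ccs_a`, `ccs_b`, `ccs_c` and their values are hypothetical, and returning 0.0 for a constant column is a convention chosen here, not something stated in the text.

```python
import math


def phi_coefficient(x, y):
    """Phi coefficient between two equal-length 0/1 sequences (Pearson r on binary data)."""
    n11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    n10 = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    n01 = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    n00 = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        return 0.0  # constant column: correlation undefined; 0.0 is an assumed convention
    return (n11 * n00 - n10 * n01) / denom


def correlated_pairs(columns, cutoff=0.5):
    """List (name_a, name_b, phi) for every pair whose |phi| exceeds the cutoff."""
    names = sorted(columns)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            phi = phi_coefficient(columns[a], columns[b])
            if abs(phi) > cutoff:
                flagged.append((a, b, phi))
    return flagged


# Hypothetical dichotomized diagnosis indicators for six individuals.
codes = {
    "ccs_a": [1, 1, 0, 0, 1, 0],
    "ccs_b": [1, 1, 0, 0, 1, 1],
    "ccs_c": [0, 1, 0, 1, 0, 1],
}
flagged = correlated_pairs(codes)
```

Pairs returned by `correlated_pairs` are candidates for removal before model fitting; which member of a pair to drop is a separate decision that this sketch does not make.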
This study used de-identified data from AHRQ and was exempt from review by the University of Florida Institutional Review Board (IRB201701906) as non-human subjects research.
Table 2-1. List of data sources and variables used in this dissertation.
Source   Variable Name    Variable Type   Variable Description
HCUP     Age              Continuous      Age
HCUP     Sex              Dichotomous     Sex (Male/Female)
HCUP     Race/Ethnicity   Categorical     Race/Ethnicity (White, Black, Hispanic, Other)
HCUP     Payer            Categorical     Insurance (Medicare, Medicaid, Private Insurance, Self Pay, Other)
HCUP     ZIP code         Categorical
HCUP     ICD*             Categorical     Diagnoses
HCUP     CCS*             Categorical     Diagnoses
Census   cPop             Continuous      Total population in ZCTA
Census   cFemale          Continuous      Number of females in ZCTA
Census   cWhite           Continuous      Number of whites in ZCTA
Census   cBlack           Continuous      Number of blacks in ZCTA
Census   cOther           Continuous      Number of other races in ZCTA
Census   cHisp            Continuous      Number of Hispanics in ZCTA
Census   cAge             Continuous      Median age of individuals in ZCTA
ACS      aHValue          Continuous      Median value of occupied housing units in ZCTA
ACS      aHSize           Continuous      Average household size in ZCTA
ACS      aInc             Continuous      Median household income in ZCTA
ACS      aVet             Continuous      Number of veterans in ZCTA
ACS      aPov             Continuous      Percentage of households living below poverty in ZCTA
ACS      aVacant          Continuous      Percentage of housing units that are vacant in ZCTA
ACS      aDis             Continuous      Number of disabled individuals in ZCTA
CBP      bEst             Continuous      Number of business establishments per ZIP code
CBP      bEmp             Continuous      Number of employees per ZIP code
CBP      bAmb             Continuous      Number of ambulatory health care services per ZIP code
CBP      bSocial          Continuous      Number of social assistance facilities per ZIP code
FARA     Desert           Dichotomous     Designation as food desert based on income and food access at ZIP code (Yes/No)
* ICD and CCS were dichotomized into binary variables. HCUP=Healthcare Cost and Utilization Project; ICD=International Classification of Diseases; CCS=Clinical Classifications Software; ZCTA=ZIP Code Tabulation Area; ACS=American Community Survey; CBP=County Business Patterns; FARA=Food Access Research Atlas
Figure 2-1.
CHAPTER 3 COMPARISON OF KNOWLEDGE-DRIVEN MODEL TO DATA-BASED MACHINE LEARNING MODELS TO STUDY RISK FACTORS FOR SELF-INJURY
Background
On an average day, more than 1,300 Americans are treated in emergency departments for self-injury. 1 Self-injury, or injury that has been purposefully inflicted upon oneself, places a considerable burden on society and is incurred by an estimated 1.4 million individuals each year. 5 Despite the long history of research and understanding of major risk factors, the rate of self-injury is increasing. 3,4,33 The increase in suicide rate has been the greatest among middle-aged adults aged 40-65 years. 27 Because nonfatal self-injury is most common among adolescents and young adults, research has focused on this population. 11 However, given that risk factors for self-injury can vary by age and fatal self-injury occurs most frequently among the middle-aged and older age groups, 25,26 there is a need to study self-injury across all ages.
Using health care encounter data, we retrospectively examined self-injury and its association with a wide range of risk and protective factors among adults. Health care data can be a good source for studying rare outcomes in the general population since they enable evaluation of multiple risk factors spanning all medical diagnoses. 51 Specifically related to self-injury, about half of the individuals with fatal self-injury utilized primary care in the month prior to injury, and 19% used mental health care within the month before suicide. 52 Health care utilization is greater among those with nonfatal self-injury: 75% had a health care encounter in the month prior to the attempt. 53
Aims and Hypotheses
The aims of this study were to examine individual-level risk factors of self-injury among adults: younger (18-39 years of age), middle-aged (40-64 years of age), and older (65+ years of age). Data from health care encounters were used to compare the performance of a knowledge-driven model to data-driven machine learning models across the adult lifespan. Two hypotheses were tested:
Hypothesis 1: A data-driven machine learning model using all variables will outperform a model built using well-known risk factors in differentiating self-injury from other injury.
Hypothesis 2: Age will modify the performance of the models.
Methods
Setting and Study Population
This case-control study used population-based hospital discharge data from the Healthcare Cost and Utilization Project (HCUP) from January 1, 2005 to December 31, 2014. HCUP data consisting of the State Inpatient Database, State Ambulatory Surgery and Services Database, and State Emergency Department Database from Florida were utilized. These databases cover emergency room, inpatient, and outpatient health care encounters within the state. Patients were linked across databases and over time using a HIPAA-compliant anonymous identifier called visitLink. 42
Individuals were selected as cases if they had a health care visit with an International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) code E950-E958 indicating self-injury (Table A-1) and at least three years of health history in HCUP without the aforementioned codes for self-injury. Thus, individuals who were seen for self-injury without a history of self-injury in the past 3 years were
selected as cases. Controls were selected based on risk set sampling from individuals who had a health care visit in the same year as the case and with an ICD-9-CM code for injury other than self-injury (E000-E949, E960-E999), as shown in Table A-2. Controls were selected in a 1:2 case-to-control ratio to increase statistical power. They also had to have three years of health history within HCUP.
We focused on adults (18 or older) and individuals with non-missing age, sex, and race/ethnicity. Age was missing from 0.005% of the emergency department encounters in 2014; sex (<0.001%) and race/ethnicity (0.78%) also had low frequencies of missing values. Middle-aged adults were individuals who were 40-64 years old, as defined in current self-injury literature and by their roles as working adults with family responsibilities and aging-related physical and psychological developments. 27,54 Younger adults were defined as those less than 40 years old, and older adults were those 65 years of age or older.
Variables
A lookback period of three years was used to extract all ICD-9-CM codes for cases and controls from visits prior to the index visit. There were 736 unique ICD-9-CM codes among the entire study population. These were mapped to Clinical Classifications Software (CCS) codes to reduce the dimensionality and increase clinical relevance. 44 In addition to the diagnosis codes, demographic variables such as age (18-39 years old, 40-64 years old, and 65+ years old), race/ethnicity (white, black, Hispanic, other), payer (Medicare, Medicaid, self-pay, private insurance, and other), and sex (male and female) were included in the analysis. The ICD-9-CM codes were also used to calculate the Charlson comorbidity index (CCI), an indicator for the number of major health conditions related to mortality. 43 Due to the positive skewness of the CCI values in the
Categorical variables were coded into binary variables (1=yes; 0=no). Sex was coded as 1 if female and 0 if male.
Statistical Analyses
Descriptive statistics were computed comparing means and standard deviations for continuous variables using a t-test for the normally distributed variable (age) and a Wilcoxon rank sum test for the non-normally distributed variable (CCI). Categorical variables were compared using chi-square tests. All descriptive statistics were computed with Bonferroni correction for multiple testing, and a two-tailed p-value of 0.05 was used as the significance cutoff. Prior to model building, multicollinearity was assessed with the variance inflation factor (VIF) (<10 as cutoff) and the Phi correlation coefficient (absolute value <0.50 as cutoff).
Bivariate analyses were conducted by testing each of the CCS codes in a univariable logistic regression with self-injury as the outcome. Unadjusted odds ratios (OR) and 95% confidence intervals (CI) were examined.
Four supervised machine learning models were tested. Random forests, decision tree, logistic regression, and least absolute shrinkage and selection operator (LASSO) regression models used all independent variables in the data set. Two additional logistic regression models using select variables were also tested. The first of these regression models was an expert knowledge-based model containing CCS codes related to mental health and suicidal ideation (CCS codes: 650-670), which are known risk factors for self-injury. 55 The second was a feature selection model, which included the top 20 most important CCS codes. Using a feature selector based on chi-square tests of significance for each CCS code in the data, the CCS codes were ranked based on strength of
association with the outcome via This yielded a list of variables which are positively or negatively most strongly associated with the outcome.
These models were tested in a 10-fold cross-validation framework and compared using the Area Under the Receiver Operating Characteristic curve (AUROC), sensitivity, and specificity. The optimal cutoff point for the calculation of sensitivity and specificity was chosen via The best performing model was used to conduct stratified analyses by age. For each age group, the best performing model was refitted with a) all variables, b) only demographic variables (age, sex, race/ethnicity, payer, CCI), c) only clinical variables (CCS codes), d) the expert knowledge-based set of variables, and e) feature-selected variables. As in the analysis of the entire population, a 10-fold cross-validation framework was used, and AUROC, sensitivity, and specificity were used to compare the models. All analyses were conducted in R.
This study used de-identified data and was exempt from review by the University of Florida Institutional Review Board (IRB201701906).
Results
Among over 21 million individuals in the HCUP data from 2005 to 2014, there were 62,140 cases who met the case definition of having a health care visit due to self-injury and 3 years of health care history without prior self-injury (Figure 3-1). As shown in Table 3-1, the majority of cases were seen for self-poisoning (n=46,408; 74.7%). Cutting was the second most common form of self-injury (n=9,770; 15.7%). There were 121,300 controls in the study. Although the risk set sampling would allow any control
who would later have self-injury to become a case in the study, there were no overlaps between the cases and controls due to the large sample size.
As shown in Table 3-2, individuals with self-injury were younger than the controls (mean age: 38.5 vs. 52.2), had fewer comorbidities (mean CCI: 0.93 vs. 1.54), and were more likely to be female (57.3% vs. 52.9%), white (76.9% vs. 63.1%), and self-paying (28.4% vs. 8.6%). All of these demographic variables were significantly different between cases and controls after correction for multiple testing (p<0.05).
No variables had a VIF higher than 10. Five variables had an absolute correlation coefficient greater than 0.50 and were removed prior to analysis (Table 3-3). In bivariate analyses across all 191 CCS codes, poisoning by psychotropic agents yielded the highest unadjusted odds ratio associated with self-injury. Compared to those without self-injury, individuals with self-injury had 9.89 times the odds of prior poisoning by psychotropic agents (95% confidence interval: 8.97-10.94). CCS codes with the 10 highest unadjusted odds ratios are shown in Table 3-4. The CCS code 662, indicating suicide and intentional self-inflicted injury, had the second highest odds ratio. This CCS code corresponded wholly to suicidal ideation, given the case definition of not having a self-injury in the 3 years prior to the index visit. Thus, suicidal ideation had the second highest odds ratio; the cases had 9.78 times the odds of a prior diagnosis of suicidal ideation (95% confidence interval: 9.78-10.24). Ideation was previously diagnosed in 16.8% of the cases (n=10,432 out of n=62,140). Other well-known psychiatric disorders such as personality, mood, substance-related, and alcohol-related disorders were also found among the top predictors of self-injury.
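As a worked illustration of the unadjusted odds ratios reported above, the sketch below computes an odds ratio and a Wald 95% confidence interval from a 2x2 table. This uses the cross-product formula rather than the univariable logistic regression described in the methods; for a single binary predictor the two give the same point estimate and Wald interval. The code is Python for illustration, the function name is an assumption, and the counts are hypothetical, not taken from the study.

```python
import math


def odds_ratio_ci(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval from 2x2 counts.

    All four cells must be positive; z=1.96 gives an approximate 95% interval.
    """
    odds_ratio = (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)
    # Standard error of log(OR): square root of the summed reciprocal cell counts.
    se_log_or = math.sqrt(
        1 / exposed_cases + 1 / unexposed_cases + 1 / exposed_controls + 1 / unexposed_controls
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: 40 of 100 cases and 20 of 200 controls had the prior diagnosis.
odds_ratio, ci_lower, ci_upper = odds_ratio_ci(40, 60, 20, 180)
```

An interval that excludes 1, as in this hypothetical example, corresponds to a statistically significant association at the chosen level.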
Most of these top CCS codes were part of the expert knowledge model (Table 3 model had a substantial overlap with the expert knowledge model.
Figure 3-2A and Table 3-6 show the results of the various models tested. LASSO regression had the highest AUROC of 0.8405 as well as the highest sensitivity (76.9%) and specificity (76.6%). The worst performing model was a decision tree (AUROC: 0.8111; sensitivity: 77.2%; specificity: 72.4%). When compared to LASSO, most models were significantly worse. However, the LASSO had only an incremental gain in performance compared to the logistic regression model based on expert knowledge (AUROC: 0.8279; sensitivity: 75.9%; specificity: 75.3%). This meant that the expert knowledge model with 14 CCS codes and demographic covariates performed almost as well as the LASSO, which retained 186 out of 196 variables. Therefore, we used the expert knowledge model to examine specific risk factors associated with self-injury.
The adjusted odds ratios from the expert knowledge model are shown in Table 3-7. After adjusting for demographic covariates, individuals with self-injury had higher odds of prior diagnosis with anxiety disorders (adjusted OR (aOR): 1.59; CI: 1.54-1.64); attention deficit, conduct, and disruptive behavior disorders (aOR: 1.22; CI: 1.07-1.39); mood disorders (aOR: 2.54; CI: 2.45-2.62); personality disorders (aOR: 1.25; CI: 1.11-1.42); schizophrenia and psychotic disorders (aOR: 1.65; CI: 1.57-1.74); alcohol-related disorders (aOR: 1.55; CI: 1.50-1.61); substance-related disorders (aOR: 1.44; CI: 1.39-1.49); suicide and intentional self-inflicted injury (aOR: 2.50; CI: 2.37-2.64); and other mental health disorders (aOR: 1.11; CI: 1.02-1.21). In contrast, delirium, dementia, and other cognitive disorders were protective (aOR: 0.86; CI: 0.79-0.93).
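The three performance measures used to compare the models can be computed as in the sketch below. It is written in Python for illustration only; the labels, predicted scores, and the 0.5 cutoff are hypothetical, and the cutoff rule is an assumption because the method used to choose the optimal cutoff in this study is not reproduced here. AUROC is computed by its rank interpretation: the probability that a randomly chosen case receives a higher score than a randomly chosen control, with ties counted as one half.

```python
def auroc(labels, scores):
    """AUROC as P(score of a random case > score of a random control); ties count 0.5."""
    case_scores = [s for y, s in zip(labels, scores) if y == 1]
    control_scores = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in case_scores
        for k in control_scores
    )
    return wins / (len(case_scores) * len(control_scores))


def sensitivity_specificity(labels, scores, cutoff):
    """Classify as a case when score >= cutoff, then return (sensitivity, specificity)."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= cutoff)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < cutoff)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < cutoff)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical outcomes (1 = self-injury case, 0 = control) and predicted probabilities.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]

area = auroc(labels, scores)
sens, spec = sensitivity_specificity(labels, scores, cutoff=0.5)  # 0.5 is an assumed cutoff
```

The pairwise computation is quadratic in sample size, which is fine for a small illustration but would be replaced by a rank-based formula for data sets of the size analyzed here.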
From the expert knowledge-based model, we also found that screening and history of mental health and substance abuse was a significant risk factor for self-injury (aOR: 1.41; CI: 1.37-1.45). When the ICD codes corresponding to this CCS code (663) were further examined, we found that most of this screening was limited to tobacco use screening. There were 67,467 individuals diagnosed with tobacco disorder, and 29,409 had a history of tobacco use; 13,167 had both. Other possible screening and history within this CCS code include history of mental health disorder, psychological trauma, history of physical or emotional abuse, and screening for depression, among others. However, none of the cases or controls had mental health, trauma, or abuse-related screenings.
The best performing model, LASSO, was used to test the variable domains for each of the age groups. Figure 3-2B and Table 3-8 show the results from the comparison of domains: demographic domain, clinical domain, and both domains compared to expert knowledge variables. Using all variables yielded the highest AUROC of 0.8407, the highest sensitivity at 77.2%, and the second highest specificity of 76.5%.
The stratified analysis of younger, middle-aged, and older adults is shown in Figure 3-3 and Table 3-9. Among younger adults, using all available variables yielded the highest AUROC (0.7939), second highest sensitivity (73.0%), and highest specificity (72.2%). Similarly, using all the variables yielded the highest AUROCs for the middle-aged and older groups. While the expert knowledge variables performed well for the younger and middle-aged groups (AUROC: 0.7771 and 0.8016, respectively), they had reduced performance for the older age group. Among the older adults, the expert knowledge-based model had an AUROC of 0.6900, sensitivity of 75.2%, and specificity of 52.6%.
Discussion
In this study, we found that the majority of self-injury occurred through self-poisoning, which is consistent with national and state figures. 1 We also found that previous diagnoses of poisoning, suicidal ideation, personality disorders, and mood disorders were highly associated with self-injury. Although the psychiatric disorders are well-studied risk factors for self-injury, poisoning was a novel addition since these poisonings were unintentional poisonings, not intentional self-poisonings. The case definition was an individual with a diagnosis of self-injury and 3 years of health encounter data without prior diagnosis of self-injury. This meant that those with a history of prior self-poisoning would not be eligible to become a case. In fact, all 2,317 cases with poisoning had a prior diagnosis of unintentional poisoning with benzodiazepine-based tranquilizers.
We tested several machine learning models in this study. Although random forests and decision tree models are known for their ability to account for complex interactions and simultaneous analysis of all independent variables, these models did not perform the best overall. This suggested that interactions were not a major factor in this study, although significant interactions may be present and might be found if explicitly specified in regression models. Additionally, the expert knowledge-based model, which had 13 CCS codes related to psychiatric health, performed just as well as a regression model using all 196 variables in the data. Such a model with a smaller number of variables will have greater interpretability and clinical utility.
Limitations
This study has several limitations. Although a case-control study is appropriate for studying self-injury due to its rarity, selection of appropriate controls is a common issue for injury studies. 56 We used risk set sampling, which allowed a control to become a case at any point during the study period. We also selected from a pool of patients who were seen at the hospital for an injury other than self-injury. Thus, the injuries of both cases and controls were severe enough to lead to a hospital visit. However, we are missing individuals who did not have a hospital visit for nonfatal self-injury and those who had fatal self-injury.
In this study, the controls were matched to the cases based on the year of the index visit. This is due to the anonymized nature of the data, which do not contain the exact date or month of the health care visit for privacy. This means that seasonal effects on suicide (suicide rates are highest in late spring and early summer) cannot be accounted for. 57
The data used in this study lack other variables related to self-injury, like bereavement, unemployment, financial issues, genetics, and legal problems, so confounding is also possible. 26 Also, Florida has a long history of accurate coding of injuries via state mandate, 58 but it is possible that there are misclassifications and misdiagnoses present in health care data such as HCUP. Misclassification of the intentionality of an injury or poisoning is a known issue with health care data. 59 Varying ranges of positive predictive values using ICD-9-CM codes for self-injury (4-100%) have been observed. 60
Strengths
The strengths of this study include its large sample size (n=183,440) and its generalizability. HCUP data in Florida have almost complete coverage across the state; 95-99% of hospital emergency departments, outpatient facilities, and inpatient facilities in Florida contribute to HCUP. 40 HCUP is not limited to a single health care system or a single payer and covers the entire population across multiple payers, including uninsured and self-paying individuals. This increases the generalizability of the study. The use of health care data eliminates recall bias.
In this study, we found that prior unintentional poisoning is a risk factor for self-injury. We also found that the expert knowledge-based model was just as informative as data-driven machine learning models for studying risk factors for self-injury. However, we found that such an expert knowledge-based model worked well for the younger and middle-aged groups but had reduced performance for the older age group. A penalized regression model using all of the health care data had the best performance for the older age group. Further research on risk factors for self-injury other than psychiatric disorders is warranted for this age group. Moreover, we found a lack of mental health-related screenings, such as history of physical or emotional abuse, being recorded in these health encounter data. Although this may be driven by a lack of financial incentive to conduct or record such screenings, screening for mental health, trauma, and abuse is especially pertinent for people at risk of self-injury and should be a priority.
Table 3-1. Distribution of cases and controls by their ICD-9-CM case definitions shown as frequencies and percentages.
ICD-9-CM  ICD-9-CM description  n (%)
Cases (n=62140)
E950  Suicide and self-inflicted poisoning by solid or liquid substances  46408 (74.7)
E951  Suicide and self-inflicted poisoning by gases in domestic use  5 (0.0)
E952  Suicide and self-inflicted poisoning by other gases and vapors  381 (0.6)
E953  Suicide and self-inflicted injury by hanging, strangulation, and suffocation  837 (1.3)
E954  Suicide and self-inflicted injury by submersion [drowning]  66 (0.1)
E955  Suicide and self-inflicted injury by firearms, air guns, and explosives  767 (1.2)
E956  Suicide and self-inflicted injury by cutting and piercing instrument  9770 (15.7)
E957  Suicide and self-inflicted injuries by jumping from high place  273 (0.4)
E958  Suicide and self-inflicted injury by other and unspecified means  3633 (5.8)
Controls (n=121300)
E000  External Cause Status  93 (0.1)
E001-E030  Activity  181 (0.1)
E800-E807  Railway Accidents  5 (0.0)
E810-E819  Motor Vehicle Traffic Accidents  8769 (7.2)
E820-E825  Motor Vehicle Nontraffic Accidents  360 (0.3)
E826-E829  Other Road Vehicle Accidents  667 (0.5)
E830-E838  Water Transport Accidents  76 (0.1)
E840-E845  Air And Space Transport Accidents  12 (0.0)
E846-E849  Vehicle Accidents, Not Elsewhere Classifiable  5885 (4.9)
E850-E858  Accidental Poisoning By Drugs, Medicinal Substances, And Biologicals  1103 (0.9)
E860-E869  Accidental Poisoning By Other Solid And Liquid Substances, Gases, And Vapors  170 (0.1)
E870-E876  Misadventures To Patients During Surgical And Medical Care  849 (0.7)
E878-E879  Surgical And Medical Procedures As The Cause Of Abnormal Reaction Of Patient Or Later Complication, Without Mention Of Misadventure At The Time Of Procedure  31556 (26.0)
E880-E888  Accidental Falls  23489 (19.4)
E890-E899  Accidents Caused By Fire And Flames  236 (0.2)
E900-E909  Accidents Due To Natural And Environmental Factors  1277 (1.1)
E910-E915  Accidents Caused By Submersion, Suffocation, And Foreign Bodies  1235 (1.0)
E916-E928  Other Accidents  27633 (22.8)
E929-E929  Late Effects Of Accidental Injury  1193 (1.0)
E930-E949  Drugs, Medicinal And Biological Substances Causing Adverse Effects In Therapeutic Use  12133 (10.0)
E960-E969  Homicide And Injury Purposely Inflicted By Other Persons  2948 (2.4)
E970-E979  Legal Intervention  128 (0.1)
E980-E989  Injury Undetermined Whether Accidentally Or Purposely Inflicted  1293 (1.1)
E990-E999  Injury Resulting From Operations Of War  9 (0.0)
Table 3 2. Demographic characteristics of the study population. Cases are those with self injury and controls are those with injury other than self injury.
Characteristic | Total (n=183440) n (%) | Cases (n=62140) n (%) | Controls (n=121300) n (%) | p value
Age
Mean (SD) | 47.57 (18.5) | 38.46 (14.6) | 52.23 (18.5) | <0.0001
18-39 | 67980 (37.1) | 34992 (56.3) | 32988 (27.2) | <0.0001
40-64 | 76376 (41.6) | 23483 (37.8) | 52893 (43.6) | <0.0001
65+ | 39084 (21.3) | 3665 (5.9) | 35419 (29.2) | <0.0001
Charlson Comorbidity Index
Mean (SD) | 1.33 (2.0) | 0.93 (1.6) | 1.54 (2.1) | <0.0001
 | 90361 (49.3) | 26749 (43.0) | 63612 (52.4) | <0.0001
Sex
Female | 99840 (54.4) | 35617 (57.3) | 64223 (52.9) | <0.0001
Race/Ethnicity
Black | 32621 (17.8) | 6564 (10.6) | 26057 (21.5) | <0.0001
Hispanic | 21422 (11.7) | 6289 (10.1) | 15133 (12.5) | <0.0001
Other | 5070 (2.8) | 1475 (2.4) | 3595 (3.0) | <0.0001
White | 124327 (67.8) | 47812 (76.9) | 76515 (63.1) | <0.0001
Payer
Medicare | 67519 (36.8) | 13670 (22.0) | 53849 (44.4) | <0.0001
Medicaid | 22397 (12.2) | 11341 (18.3) | 11056 (9.1) | <0.0001
Self Pay | 28033 (15.3) | 17623 (28.4) | 10410 (8.6) | <0.0001
Other | 17463 (9.5) | 6204 (10.0) | 11259 (9.3) | 0.0045
Private | 48028 (26.2) | 13302 (21.4) | 34726 (28.6) | <0.0001
SD= Standard Deviation. P values from t tests (if normally distributed), Wilcoxon rank sum tests (if non normally distributed), and chi square tests of proportions with Bonferroni corrections are shown.
Table 3 3. List of variables with an absolute value of the Phi correlation coefficient greater than 0.5.
CCS | CCS description
49 | Diabetes mellitus without complication
53 | Disorders of lipid metabolism
99 | Hypertension with complications and secondary hypertension
157 | Acute and unspecified renal failure
236 | Open wound of extremities
Table 3 4. Results of bivariate analyses showing the top 10 CCS codes with the highest unadjusted odds ratios, shown with frequencies among cases and controls.
CCS | CCS description | Unadjusted OR (95% CI) | Cases (n=62,140) | Controls (n=121,300)
241 | Poisoning by psychotropic agents | 9.89 (8.97-10.94) | 2317 | 473
662 | Suicide and intentional self inflicted injury | 9.78 (9.35-10.24) | 10432 | 2451
658 | Personality disorders | 7.85 (7.06-8.74) | 1688 | 430
652 | Attention deficit, conduct, and disruptive behavior disorders | 5.77 (5.17-6.47) | 1205 | 414
657 | Mood disorders | 5.28 (5.16-5.42) | 23023 | 12156
661 | Substance related disorders | 5.12 (4.98-5.26) | 18856 | 9519
659 | Schizophrenia and other psychotic disorders | 3.82 (3.67-3.97) | 7401 | 4149
660 | Alcohol related disorders | 3.79 (3.69-3.89) | 16290 | 10397
670 | Miscellaneous mental health disorders | 3.77 (3.51-4.04) | 2255 | 1201
651 | Anxiety disorders | 3.25 (3.17-3.32) | 22992 | 18582
OR= Odds Ratio; CI= Confidence Interval
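As a worked check of how the unadjusted odds ratios in Table 3 4 relate to the case and control frequencies, the following Python sketch recomputes the OR for CCS 241 from its 2x2 table. The function name and the Wald (log based) confidence interval are assumptions made for illustration; the dissertation's intervals come from its own model fits, so the interval computed this way may differ slightly from the one reported.

```python
import math

def unadjusted_odds_ratio(exposed_cases, total_cases,
                          exposed_controls, total_controls, z=1.96):
    """Odds ratio and Wald 95% CI from a 2x2 table (illustrative helper)."""
    a = exposed_cases                      # cases with the CCS code
    b = total_cases - exposed_cases        # cases without the code
    c = exposed_controls                   # controls with the code
    d = total_controls - exposed_controls  # controls without the code
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR); assumes all four cells are non-zero.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Frequencies for CCS 241 (Poisoning by psychotropic agents) from Table 3 4.
or_241, lower_241, upper_241 = unadjusted_odds_ratio(2317, 62140, 473, 121300)
print(f"{or_241:.2f}")  # prints 9.89
```

The point estimate reproduces the tabulated value of 9.89, which suggests the reported unadjusted ORs correspond to simple 2x2 comparisons of each CCS code against case status.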
Table 3 5. List of variables used to build the models.
Models | Variables
Random forests | All
Decision tree | All
Logistic regression | All
LASSO regression | All
Logistic regression based on expert knowledge | Demographics: sex, race/ethnicity, payer, CCI
 | CCS 650 Adjustment disorders
 | CCS 651 Anxiety disorders
 | CCS 652 Attention deficit, conduct, and disruptive behavior disorders
 | CCS 653 Delirium, dementia, and amnestic and other cognitive disorders
 | CCS 654 Developmental disorders
 | CCS 657 Mood disorders
 | CCS 658 Personality disorders
 | CCS 659 Schizophrenia and other psychotic disorders
 | CCS 660 Alcohol related disorders
 | CCS 661 Substance related disorders
 | CCS 662 Suicide and intentional self inflicted injury
 | CCS 663 Screening and history of mental health and substance abuse codes
 | CCS 670 Miscellaneous mental health disorders
Logistic regression with feature selection | Demographics: sex, race/ethnicity, payer, CCI
 | CCS 47 Other and unspecified benign neoplasm
 | CCS 50 Diabetes mellitus with complications
 | CCS 55 Fluid and electrolyte disorders
 | CCS 59 Deficiency and other anemia
 | CCS 86 Cataract
 | CCS 96 Heart valve disorders
 | CCS 98 Essential hypertension
 | CCS 101 Coronary atherosclerosis and other heart disease
 | CCS 108 Congestive heart failure; non hypertensive
 | CCS 114 Peripheral and visceral atherosclerosis
 | CCS 117 Other circulatory disease
 | CCS 158 Chronic kidney disease
 | CCS 237 Complication of device; implant or graft
 | CCS 651 Anxiety disorders
 | CCS 657 Mood disorders
 | CCS 659 Schizophrenia and other psychotic disorders
 | CCS 660 Alcohol related disorders
 | CCS 661 Substance related disorders
 | CCS 662 Suicide and intentional self inflicted injury
 | CCS 663 Screening and history of mental health and substance abuse codes
CCI= Charlson Comorbidity Index
Table 3 6. Comparison of models used to examine predictors of self injury using variables listed in Table 3 5. Mean and SD of AUROC, sensitivity, and specificity across 10 cross validations are shown.
Models | AUROC | SD | Sensitivity | Specificity | p value
Random forests | 0.8395 | 0.0016 | 0.7764 | 0.7590 | 0.1691
Decision tree | 0.8111 | 0.0022 | 0.7715 | 0.7241 | <0.0001
Logistic regression | 0.8403 | 0.0013 | 0.7713 | 0.7629 | 0.1427
LASSO regression | 0.8405 | 0.0013 | 0.7691 | 0.7657 | REF
Logistic regression based on expert knowledge | 0.8279 | 0.0015 | 0.7593 | 0.7531 | <0.0001
Logistic regression with feature selection | 0.8337 | 0.0015 | 0.7672 | 0.7541 | <0.0001
AUROC= Area Under Receiver Operating Characteristic; SD= Standard Deviation; P values show t test results comparing each model to the model with the best AUROC.
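The metrics summarized in Table 3 6 can be illustrated with a small pure Python sketch. It computes sensitivity and specificity at a classification threshold, the AUROC as the probability that a randomly chosen case is scored higher than a randomly chosen control, and the mean and SD of AUROC across folds. The toy labels, scores, fold values, and the 0.5 threshold are all made up for illustration; none of them come from the study, and the study's own software and thresholds are not stated in this section.

```python
from statistics import mean, stdev

def sensitivity_specificity(labels, scores, threshold=0.5):
    """Sensitivity and specificity when scores >= threshold are called cases."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

def auroc(labels, scores):
    """AUROC via pairwise comparison of case and control scores (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def summarize_folds(fold_aurocs):
    """Mean and sample SD of per-fold AUROC values."""
    return mean(fold_aurocs), stdev(fold_aurocs)

# Hypothetical predictions: 1 = case (self injury), 0 = control.
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.6]

toy_auroc = auroc(toy_labels, toy_scores)
toy_sens, toy_spec = sensitivity_specificity(toy_labels, toy_scores)
fold_mean, fold_sd = summarize_folds([0.84, 0.86, 0.85])  # hypothetical folds
```

The pairwise definition is quadratic in the sample size, so it is only meant to make the quantity concrete; a rank based implementation would be preferable for data of the size used in this study.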
Table 3 7. Adjusted odds ratios and 95% confidence intervals from the logistic regression based on expert knowledge variables model.
Variable | Adjusted OR (95% CI)
Sex
Female | 1.41 (1.38-1.45)
Male | REF
Race/Ethnicity
Black | 0.29 (0.28-0.30)
Hispanic | 0.67 (0.65-0.70)
Other | 0.72 (0.67-0.77)
White | REF
Payer
Medicare | 1.05 (1.01-1.09)
Medicaid | 1.93 (1.86-2.01)
Self Pay | 3.54 (3.42-3.67)
Other | 1.22 (1.17-1.28)
Private | REF
Age
18-39 | 2.07 (2.02-2.13)
40-64 | REF
65+ | 0.41 (0.39-0.43)
CCI | 0.65 (0.63-0.67)
CCS
650 Adjustment disorders | 0.98 (0.86-1.11)
651 Anxiety disorders | 1.59 (1.54-1.64)
652 Attention deficit, conduct, and disruptive behavior disorders | 1.22 (1.07-1.39)
653 Delirium, dementia, and amnestic and other cognitive disorders | 0.86 (0.79-0.93)
654 Developmental disorders | 1.04 (0.91-1.19)
657 Mood disorders | 2.54 (2.45-2.62)
658 Personality disorders | 1.25 (1.11-1.42)
659 Schizophrenia and other psychotic disorders | 1.65 (1.57-1.74)
660 Alcohol related disorders | 1.55 (1.50-1.61)
661 Substance related disorders | 1.44 (1.39-1.49)
662 Suicide and intentional self inflicted injury | 2.50 (2.37-2.64)
663 Screening and history of mental health and substance abuse | 1.41 (1.37-1.45)
670 Miscellaneous mental health disorders | 1.11 (1.02-1.21)
CCI= Charlson Comorbidity Index; OR= Odds Ratio; CI= Confidence Interval
Table 3 8. Comparison of domains using the best performing model, LASSO regression: 1) clinical variables domain includes all CCS diagnoses codes; 2) demographic variables include age, sex, race/ethnicity, payer, and CCI; 3) expert knowledge variables include demographic variables and CCS 650-670; and 4) feature selected variables include demographic variables and top 20 most significant CCS diagnoses. Mean and SD of AUROC, sensitivity, and specificity across 10 cross validations are shown.
Variable domains | AUROC | SD | Sensitivity | Specificity | p value
All variables | 0.8404 | 0.0017 | 0.7754 | 0.7588 | REF
Clinical variables | 0.8146 | 0.0012 | 0.7513 | 0.7255 | <0.0001
Demographic variables | 0.7699 | 0.0013 | 0.6924 | 0.7032 | <0.0001
Expert knowledge variables | 0.8286 | 0.0014 | 0.7673 | 0.7471 | <0.0001
Feature selected variables | 0.8005 | 0.0010 | 0.7416 | 0.7115 | <0.0001
CCS= Clinical Classifications Software; AUROC= Area Under Receiver Operating Characteristic; SD= Standard Deviation; P values show t test results comparing each model to the model with the best AUROC.
Table 3 9. Comparison of models used to examine predictors of self injury stratified by age: younger age (18-39), middle aged (40-64), and older age (65+) groups. Domains are separated into: 1) clinical variables domain includes all CCS diagnoses codes; 2) demographic variables include age, sex, race/ethnicity, payer, and CCI; 3) expert knowledge variables include demographic variables and CCS 650-670; and 4) feature selected variables include demographic variables and top 20 most significant CCS diagnoses. Mean and SD of AUROC, sensitivity, and specificity across 10 cross validations are shown.
Variable domains | AUROC | SD | Sensitivity | Specificity | p value
Younger age group
All variables | 0.7939 | 0.0024 | 0.7297 | 0.7223 | REF
Clinical variables | 0.7493 | 0.0028 | 0.7339 | 0.6410 | <0.0001
Demographic variables | 0.7110 | 0.0029 | 0.7171 | 0.6020 | <0.0001
Expert knowledge variables | 0.7771 | 0.0026 | 0.7067 | 0.7220 | <0.0001
Feature selected variables | 0.7833 | 0.0021 | 0.7225 | 0.7157 | <0.0001
Middle aged group
All variables | 0.8269 | 0.0029 | 0.7805 | 0.7301 | REF
Clinical variables | 0.7993 | 0.0018 | 0.7729 | 0.6830 | <0.0001
Demographic variables | 0.6909 | 0.0041 | 0.5390 | 0.7175 | <0.0001
Expert knowledge variables | 0.8016 | 0.0030 | 0.7716 | 0.6997 | <0.0001
Feature selected variables | 0.8121 | 0.0031 | 0.7670 | 0.7200 | <0.0001
Older age group
All variables | 0.7304 | 0.0069 | 0.7174 | 0.6200 | REF
Clinical variables | 0.6980 | 0.0075 | 0.7805 | 0.5151 | <0.0001
Demographic variables | 0.5588 | 0.0058 | 0.2403 | 0.8611 | <0.0001
Expert knowledge variables | 0.6900 | 0.0054 | 0.7524 | 0.5257 | <0.0001
Feature selected variables | 0.6686 | 0.0047 | 0.7930 | 0.4391 | <0.0001
CCS= Clinical Classifications Software; AUROC= Area Under Receiver Operating Characteristic; SD= Standard Deviation; P values show t test results comparing each model to the model with the best AUROC.
Figure 3 1. Flow chart of sample population for inclusion in the first study.
Figure 3 2. Receiver Operating Characteristic (ROC) curves of models used to examine risk factors for self injury. A) ROC curves of expert knowledge based and data driven machine learning models for the study population. B) ROC curves of five domain specific models with the best performing model, LASSO, fitted on all variables, demographic variables only, clinical variables only, a set of expert knowledge variables, and a set of feature selected variables.
Figure 3 3. Receiver Operating Characteristic (ROC) curves of models used to examine predictors of self injury with the best performing model, LASSO, fitted on all variables, demographic variables only, clinical variables only, a set of expert knowledge variables, and a set of feature selected variables for A) the younger age group (18-39); B) the middle aged group (40-64); and C) the older age group (65+).
CHAPTER 4
RISK FACTORS FOR SELF INJURY ACROSS INDIVIDUAL AND ENVIRONMENTAL DOMAINS STUDIED THROUGH MACHINE LEARNING
Background
Each year, over half a million individuals are treated in emergency departments for self injury in the US. 1 Over 44,000 die of fatal self injury, and it is estimated that some 1.4 million individuals inflicted self injury in the past year. 5 Self injury, or injury intentionally inflicted upon oneself, is a considerable morbidity and mortality burden on society. Although self injury has been known by various terms including deliberate self harm, self directed violence, suicidal behavior, and self inflicted injury, 20 there has been a long history of research on self injury. Some of the well studied risk factors are mental health conditions, such as schizophrenia, depression and bipolar disorders, suicidal ideation, and substance related disorders. 5,9,10,12,61 Unemployment, financial strain, sexual minority status, and social disconnectedness have also been found to increase the risk of self injury. 7-9,11,33 In addition, environmental factors like economic downturn, a local suicide epidemic, and exposure to community violence have been found to be associated with higher incidence of self injury. 9,15 Variations in self injury incidence by geography also have been observed in the US and elsewhere. 16,62-64 An individual's health is shaped not only by his or her individual health determinants but also by his or her environment. 65
In this study, we aimed to examine both individual and environment level risk factors for self injury using machine learning. Both levels of factors have been found to be important in the context of self injury, but studies incorporating both levels have been limited to certain age groups or select composite indices indicating socioeconomic health. 66-68
Using routinely collected health care encounter data and linking individuals to their environmental characteristics, we examined the associations of demographic, clinical, and environmental risk factors with the outcome of self injury among adults in Florida.
Aims and Hypotheses
We aimed to identify environmental factors associated with self injury and compare the strengths of associations with self injury when accounting for individual level factors. The following hypotheses were tested:
Hypothesis 1: Individuals with self injury will be more likely to reside in areas of greater poverty, lower income, and reduced access to healthy foods and health care.
Hypothesis 2: Individual level factors will be stronger risk factors for self injury than environmental factors alone or in combination with individual level factors.
Hypothesis 3: Self injury is more likely to be associated with environmental factors for younger adults (18-39 years old); both individual and environmental factors will be important for middle aged adults (40-64 years old); and only individual factors will be strongly associated with self injury among older adults (65+ years old).
Methods
Study Population and Case Definition
This retrospective observational study examined adults residing in Florida who utilized health care between January 1, 2005 and December 31, 2014 with at least 3 years of prior history of health care utilization. Cases were those with an International Classification of Diseases 9th revision Clinical Modification (ICD 9 CM) diagnosis for self injury (Table A 1) and controls were those with an ICD 9 CM diagnosis for any injury other than self inflicted (Table A 2). Controls were selected using risk set
sampling based on year of visit for the cases in a 1:2 case to control ratio. Individuals with non missing sex, race/ethnicity, payer status, and valid ZIP code (Zone Improvement Plan code) of residence in Florida were included in this study.
Individual Level Data
Health care encounter data from the Healthcare Cost and Utilization Project (HCUP) databases were used. These databases were the Florida State Emergency Department Databases, State Inpatient Databases, and State Ambulatory Surgery and Services Databases, with visit dates from January 1, 2005 to December 31, 2014. Individuals were linked across the databases and years using a HIPAA compliant anonymous patient identifier, visitLink, to reconstruct their health care utilization across hospitals and time. 42 HCUP data included demographic covariates (sex, race/ethnicity, payer, and ZIP code of residence) and clinical variables (ICD 9 CM diagnoses and procedure codes). ICD 9 CM codes were mapped to their respective Clinical Classifications Software (CCS) codes 44 to reduce the number of codes and increase clinical utility and interpretability. ICD 9 CM has almost 50 times the number of codes in CCS. We used all CCS codes from the 3 years prior to self injury for cases and 3 years prior to any non self injury for controls. The Charlson comorbidity index was calculated using ICD 9 CM diagnoses codes. 43 All variables were dichotomized into binary variables.
Environmental Data
Built and social environment data were obtained from the decennial census, 46 the American Community Survey (ACS), 47 the County Business Patterns (CBP), 48 and the Food Access Research Atlas (FARA). 49 The variables are summarized in Table A 3.
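To make the individual level feature construction described above more concrete (mapping ICD 9 CM codes to CCS categories and dichotomizing them), here is a minimal Python sketch. The three entry lookup table is an illustrative subset typed in by hand, not the official HCUP CCS crosswalk, and the function name and the undotted code format are assumptions.

```python
# Illustrative subset only; NOT the official HCUP CCS crosswalk file.
# Keys are ICD 9 CM codes written without the decimal point (an assumption).
ICD9_TO_CCS = {
    "29620": 657,  # major depressive disorder -> CCS 657 Mood disorders
    "30000": 651,  # anxiety state -> CCS 651 Anxiety disorders
    "V6284": 662,  # suicidal ideation -> CCS 662 Suicide and intentional self inflicted injury
}

def ccs_indicators(icd9_codes, ccs_universe):
    """Return {CCS code: 0/1} marking whether each CCS category appeared
    at least once among the diagnoses in the lookback period."""
    seen = {ICD9_TO_CCS[code] for code in icd9_codes if code in ICD9_TO_CCS}
    return {ccs: int(ccs in seen) for ccs in ccs_universe}

# Hypothetical diagnosis history for one person over the 3 year lookback.
history = ["29620", "29620", "V6284", "99999"]  # "99999" is an unmapped placeholder
features = ccs_indicators(history, ccs_universe=[651, 657, 662])
print(features)  # prints {651: 0, 657: 1, 662: 1}
```

Repeated diagnoses collapse to a single indicator, which matches the statement that all variables were dichotomized; how unmapped codes were handled in the study is not described, so dropping them here is only a placeholder choice.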
The 2010 decennial census provided the demography of the population living in a ZIP Code Tabulation Area (ZCTA). These included the total number of individuals residing in the area and their demographic breakdown by sex, race, ethnicity, and age. From the 2011 ACS, household and housing characteristics were obtained: the median value of occupied housing units, average household size, median household income, percentage of households living below poverty, and percentage of housing units that are vacant. In addition, the number of veterans residing in the ZCTA was obtained. From the 2012 ACS, the number of disabled individuals residing in the ZCTA was obtained (this variable was not available until 2012). For all ACS variables, 5 year estimates, or estimates using 5 years of collected data, were used since they had the greatest reliability and full availability of the ZCTA geographic units compared to 1 or 3 year estimates. 69 For some ZCTAs, the census and ACS did not contain information on housing value, income, poverty status, and/or other variables. These geographic areas were most likely non residential and included the state prison (ZCTA: 32026), naval stations (32212, 32228, 32508), air force bases (32403, 32925), and a university (33620).
Information on the economic characteristics of a neighborhood was gathered from the 2005-2014 CBP datasets. The variables extracted were the number of business establishments overall, the number of employees overall, the number of ambulatory health care services, and the number of social assistance facilities. The last two variables were created based on the categorization of businesses within the CBP. The number of ambulatory health care services was extracted with the North American Industry Classification System (NAICS) designation for ambulatory health care services (621); these
businesses included offices of physicians, dentists, mental health practitioners, and others as listed in Table A 4. The variable for the number of social assistance facilities was extracted under the NAICS code for social assistance (624), which included individual and family services, community food services, vocational rehabilitation, and child day care. The 2007 and 2012 NAICS codes were used within the 2005-2014 time period, but the NAICS codes for health care facilities did not change between these two versions. These variables from CBP were used to indicate the economic health of a neighborhood as well as the accessibility to health care and social assistance. Each of these variables was measured by CBP at the ZIP code level, so they were translated to their respective ZCTAs. 50
Next, food desert designation was extracted from the 2006, 2010, and 2015 FARA datasets. Food deserts are areas of low income and low access to sources of healthy food, such as supermarkets and grocery stores. FARA designates a census tract as low access if more than 33%, or at least 500 persons, reside more than 1 mile from a supermarket or a large grocery store in an urban area or more than 10 miles in a rural area. While FARA provides data at the census tract level, the environment level data for this study were aggregated at the ZCTA level. Thus, we mapped the census tracts from FARA to ZCTAs using a relationship file from the Census Bureau. A ZCTA may contain one or more census tracts. Using the population density, the food desert designation for a ZCTA was based on whether the majority of the population within that ZCTA lived in
food desert designated census tracts. This variable indicating food desert was included to consider the accessibility to a healthy diet. Lastly, an index of deprivation was included for each ZCTA. 70
Data Linkage
The individual level data from HCUP and the environmental data from the census, ACS, CBP, and FARA were linked at the ZCTA level. For each individual in HCUP, the most frequently recorded ZIP code from the 3 year lookback period was used, and the year in which this modal ZIP code most frequently appeared was used as the year of ZIP code extraction. The modal ZIP code was mapped to ZCTAs, which are the standard units in the census and ACS. This translation geocodes Post Office (PO) boxes to their physical locations. 50 Using the ZCTA of residence, cases and controls were linked to the census and ACS characteristics of their environment. Using both the modal ZCTA and year of ZCTA extraction, each individual was linked to the CBP variables by ZCTA and year. Lastly, since FARA data were available only for years 2006, 2010, and 2015, the FARA dataset from the year closest to the year of ZCTA extraction was used.
To reduce the effect of varying scales of the environmental variables, they were categorized into multinomial variables using tertiles: lower than average (1st tertile), around average (2nd or median tertile), and higher than average values (3rd tertile), e.g., lower than average median household income, around average median household income, and higher than average median household income. Individuals with any missing environmental data or residence outside of Florida were removed from the analyses.
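The two linkage steps described above (choosing a modal ZIP code with its year of extraction, and cutting an environmental variable into tertiles) can be sketched in pure Python as follows. This is only an illustration under stated assumptions: the function names, the tie breaking rule (first encountered wins), the tertile cut point method (the default of `statistics.quantiles`), and all example values are hypothetical and not taken from the study.

```python
from collections import Counter
from statistics import quantiles

def modal_zip_and_year(visits):
    """visits: list of (year, zip_code) pairs from the 3 year lookback.
    Returns the most frequent ZIP code and the year in which it appears most.
    Ties go to the value encountered first (an assumption)."""
    zip_counts = Counter(zip_code for _, zip_code in visits)
    modal_zip = zip_counts.most_common(1)[0][0]
    year_counts = Counter(year for year, zip_code in visits if zip_code == modal_zip)
    modal_year = year_counts.most_common(1)[0][0]
    return modal_zip, modal_year

def tertile_labels(values):
    """Label each value as lower / average / higher using tertile cut points."""
    low_cut, high_cut = quantiles(values, n=3)  # two cut points -> three groups
    labels = []
    for v in values:
        if v <= low_cut:
            labels.append("lower")
        elif v <= high_cut:
            labels.append("average")
        else:
            labels.append("higher")
    return labels

# Hypothetical visit history and hypothetical ZCTA level incomes.
visits = [(2008, "32601"), (2009, "32601"), (2009, "32601"), (2010, "32608")]
print(modal_zip_and_year(visits))  # prints ('32601', 2009)
print(tertile_labels([10, 20, 30, 40, 50, 60]))
```

With the categorized variable in hand, the middle ("average") group can serve as the reference category, as in the bivariate analyses of this chapter.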
Statistical Analyses
Descriptive statistics were conducted using t tests or Wilcoxon rank sum tests for continuous variables and chi square tests for categorical variables. Corrections for multiple testing were made using Bonferroni adjustment. Bivariate analyses of each environmental variable with self injury as the outcome were conducted using univariable logistic regression. Since the environmental variables were transformed to multinomial variables with 3 categories, the median tertile was used as the reference group. We reported unadjusted odds ratios (OR) and their 95% confidence intervals (CI) from these bivariate analyses.
Several statistical models were compared in this study. A multivariable logistic regression model with all independent variables and a least absolute shrinkage and selection operator (LASSO) regression were used. The latter adds a penalty to the coefficients that are less informative and yields a regression model with a smaller number of variables. A decision tree model was also used; this methodology repeatedly partitions the data with rules of the form "if A, then B, and if not A, then C", e.g., a branch for those with age over 65 and the probability of having Medicare under this branch. A random forests model combines many decision trees to reduce their variance. 71 Decision tree and random forests models can account for interactions without explicit statement, and random forests tend to yield high accuracy but reduced interpretability. For random forests models in this study, the top 15 variables with the highest variable importance calculated by Gini impurity were reported. 72
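To illustrate the quantity behind the Gini based variable importance mentioned above, the sketch below computes the Gini impurity of a node and the impurity decrease produced by a single binary split. In a random forest, impurity based importance for a variable is typically obtained by accumulating such decreases over all splits that use the variable and averaging across trees; the exact weighting used by the software in this study is not stated, so this is a simplified illustration. The function names and the counts are hypothetical.

```python
def gini_impurity(n_cases, n_controls):
    """Gini impurity of a node holding the given numbers of cases and controls."""
    total = n_cases + n_controls
    p_case = n_cases / total
    return 1.0 - p_case ** 2 - (1.0 - p_case) ** 2

def impurity_decrease(left, right):
    """Decrease in Gini impurity from splitting a parent node into two children.
    left, right: (n_cases, n_controls) tuples for the child nodes."""
    n_left, n_right = sum(left), sum(right)
    n_total = n_left + n_right
    parent = gini_impurity(left[0] + right[0], left[1] + right[1])
    children = (n_left / n_total) * gini_impurity(*left) \
        + (n_right / n_total) * gini_impurity(*right)
    return parent - children

# Hypothetical split on a binary indicator (e.g., a CCS code present vs. absent):
# 40 people with the code (30 cases, 10 controls), 60 without (10 cases, 50 controls).
decrease = impurity_decrease(left=(30, 10), right=(10, 50))
print(round(decrease, 4))  # prints 0.1633
```

A larger decrease means the split separates cases from controls more cleanly, which is why variables that produce such splits rank highly. Importance of this kind says nothing about the direction of the association.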
A 10 fold cross validation (CV) framework was used to estimate the performance of these models. For each CV run, the data for the study were divided into a training set to build the models and a test set to apply the models and test their performance. The models were compared using the means and standard deviations of the Area Under the Receiver Operating Characteristic curve (AUROC) across CV runs with a corrected t test. 73 Sensitivity and specificity were also reported.
To test the discriminatory power of the sets of the variables in this study, further model testing was conducted using the best model from the CV. The variables were divided into domains of a) clinical variables (CCS codes from HCUP), b) demographics (from HCUP), c) individual level variables (clinical & demographic variables combined), and d) environmental variables (from census, ACS, CBP, and FARA).
This study used de-identified data and was exempt from review by the University of Florida Institutional Review Board (IRB201701906).
Results
Study sample selection is shown in Figure 4 1. Among over 21 million individuals in the HCUP data from 2005 to 2014, there were 62,140 who met the case definition and had at least 3 years of prior medical history. Those with a ZIP code of residence outside of Florida or a ZIP code of residence without linkable environmental characteristics (such as those of the state prison, military bases, and university, among others) were excluded. The final sample size was 181,322 with 61,530 cases and 119,792 controls. The majority of cases were seen for self poisoning at the index visit (74.7%) (Table 4 1). Geographic heterogeneity in the self injury rate can be seen in Figure 4 2, which shows the number of cases in our study per 100,000 population by county. As detailed
in Table 4 2, the self injury cases were more likely to be younger in age, with less comorbidity, more likely to be female and white, and more likely to be on Medicaid or uninsured (self paying) than controls.
Results of preliminary analyses examining the associations of environmental variables with the self injury outcome are shown in Table 4 3 and Figure 4 3. Compared to controls, cases were more likely to have previously resided in areas with higher deprivation, smaller population size, a smaller number of Black residents, and a higher median age of residents. In terms of housing characteristics, cases were more likely to reside in areas with lower than average median value of housing units, smaller households, and lower median household incomes. Compared to controls, cases were more likely to reside in areas with both lower and higher than average numbers of veterans. The highest unadjusted ORs were seen for economic variables of the neighborhood; cases had 1.26 times the odds of residing in areas with a smaller number of businesses (CI: 1.23-1.29) and 1.23 times the odds of residing in areas with a smaller number of employees (CI: 1.20-1.26). Interestingly, previous residence in areas with higher numbers of ambulatory health care services and social assistance services was protective, with an OR of 0.97 (CI: 0.95-0.99) and an OR of 0.69 (CI: 0.68-0.71), respectively.
Among the four models tested in this study, the random forests model had the highest AUROC of 0.868, sensitivity of 80.2%, and specificity of 78.4% (Table 4 4 and Figure 4 4 A). LASSO and logistic regressions closely followed behind but were inferior to the random forests model (p<0.05).
Using random forests, the best performing model, we tested how well each of the domains was able to discriminate between cases and controls (Table 4 5 and Figure 4 4 B). The model with all variables had the best AUROC
of 0.868 and highest sensitivity and specificity, 79.9% and 78.6%, respectively. Although a model with only environment level variables had lower AUROC, it was still higher than chance (AUROC: 0.5). Interestingly, the second best performing model was composed of individual level variables (demographic & clinical), and its performance was statistically significantly improved with the addition of environmental variables, as shown in the highest AUROC for the model with all variables.
After stratifying by age groups, younger (18-39 years old), middle aged (40-64 years old), and older (65+ years old), the random forests model was applied to each age group. For all ages, the model with all variables had the highest AUROC (Table 4 6 and Figure 4 5). Among individuals of younger age, individual and environment level variables had similar AUROCs, 0.787 and 0.780, respectively. Overall, the discriminatory ability of environmental variables across all age groups hovered around an AUROC of 0.76-0.78. While the addition of individual variables to the environmental variables improved the model performance for the younger age and middle aged groups, its effect was non significant for the older age group. The environmental variables only model achieved an AUROC of 0.781 while the all variables model had an AUROC of 0.788 (p=0.2587).
We further examined the variables with the highest importance in the age specific random forests models. These variables represent those most important for the model and can be positively or negatively associated with self injury. Across all ages, several CCS codes for mental health disorders were seen (Figure 4 6). The variable with the highest importance, CCS 657, encodes mood disorders: bipolar and depressive disorders. Although the CCS 662 code for suicide and intentional self inflicted injury was
an important variable for all age groups, this code only indicated suicidal ideation (ICD 9 CM: V62.84 Suicidal ideation) in this study due to our case definition. We also found several codes unrelated to mental health to be significant predictors of self injury: CCS 205 (Spondylosis, intervertebral disc disorders, and other back problems), CCS 84 (Headache, including migraine), CCS 53 (Disorders of lipid metabolism), and CCS 257 (Other aftercare).
Discussion
In this study, we tested several hypotheses on the associations of self injury with individual and environment level variables. First, we found that individuals with self injury were more likely to reside in areas with lower income and reduced access to health care services. Contrary to our hypothesis, we found that they were not more likely to reside in areas of greater poverty or in areas with fewer social assistance facilities. Cases were also not more likely to reside in food deserts.
The random forests model outperformed the decision tree, logistic regression, and LASSO regression models in terms of AUROC, sensitivity, and specificity. Using both individual and environment level variables yielded the model with the best performance. When specifically examining the younger age and middle aged groups, we found that using both individual and environmental variables classified cases better than using either set of variables alone. In contrast, we found that a model with only environmental variables had comparable performance to a model with individual and environmental variables for the older age group. However, this is most likely due to a reduction in performance of individual level variables among the older age group, not due to an increase in performance of environmental factors.
Limitations
This study has several limitations. Although our study is longitudinal and examines 3 years of clinical history prior to self injury, we did not test any causal pathways in this machine learning framework. Additionally, although we tested machine learning models which are able to accommodate possible interactions, we did not explicitly test interactions within variable domains or cross level interactions across individual and environmental domains. It is possible that such pathways and interactions are better modeled by other statistical methods. 74,75
Ecologic fallacy, or inference about individuals based on group level characteristics, is possible in studies incorporating environmental variables. For example, an individual may have resided in an area with greater access to ambulatory health care overall, but this individual may not have that greater access due to their rural residence. Thus, concluding that everyone in that same area of residence equally has high access is incorrect. Conversely, there is a possibility for atomistic fallacy, or inference about groups based on individual level characteristics. Deducing that living in an area with many individuals with mood disorders is associated with self injury would be false, for instance. A study of discordance between individual characteristics and environmental factors would be more appropriate to solidify such inference.
In our study, we used ZCTAs as geographic units to merge the individual and environmental level data. For HCUP and CBP, these ZCTAs were derived from ZIP codes. ZIP codes are created for the ease and efficiency of the US postal service, not to serve as statistical or standardized geographical units. 76 Frequent updates are made on a sub annual basis, so our use of the 2010 relationship file may lead to mismatch of ZIP
codes to ZCTAs for years other than 2010 and perhaps even within year 2010. Thus, it is possible that our spatial joins are inaccurate. Moreover, our environmental variables provided a snapshot of the environmental characteristics and do not account for their spatiotemporal variations. Examples of such variations specific to this study include the number of employees recorded for the CBP data (e.g., there may be more employees during end of year holiday seasons) or the number of census tracts over time (i.e., 64,909 census tracts in 2000 vs. 72,365 census tracts in 2010). 49
The exact date of the health care encounter is not included in HCUP due to privacy concerns. Therefore, the year of the health care encounter was used to conduct the risk set sampling as well as for linkage to environmental variables. We were unable to account for seasonal trends in suicide (suicide rates are highest in late spring and early summer). 57 We were also unable to account for genetics, 77 family history, 78 sexual identity, 79 or mental pain, 80 which have been shown to be associated with self injury, so confounding cannot be ruled out for our study.
Strengths
Despite these limitations, our study has a large sample size (n=181,322) with almost complete coverage of the state; 95-99% of hospitals in the state participate in HCUP. 40 Thus, HCUP captures almost the entire population of Florida who visited a hospital during the study period. Although this means that our study is generalizable only to those who have used health care in Florida, this sample population is not biased in terms of geography (state wide coverage), clinical conditions (not limited to individuals in psychiatric treatment or with
mental health illness diagnoses like many self-injury studies), or payer/insurance status (HCUP is all-payer).

Our study was able to test multiple environmental factors spanning demography, housing, regional economy and business, and access to health care, social services, and food. In comparison, other studies combining individual- and environment-level characteristics have been conducted using composite socioeconomic indices, 67,81 tested only among adolescents, 66 or examined fatal self-injury exclusively. 68,82,83 Additionally, our individual-level variables spanned clinical specialties and health care systems and included all diagnoses from the three years prior to the first hospital-presenting self-injury. This allowed us to find pain-related conditions, such as back problems (CCS 205) and headache (CCS 84), and chronic diseases like hypercholesterolemia and hyperlipidemia (CCS 53) as important variables in the machine learning models. Models limited to psychiatric conditions or studies limited to psychiatric patients would have been unable to detect these conditions.

In this study, we simultaneously tested over 180 individual and environmental variables and their association with self-injury through machine learning. Longitudinal, all-payer health care encounter data like those from HCUP have the potential to make a public health impact, 84 and the addition of environmental factors has the potential to reflect a well rounded 85 Moreover, the environmental risk factors can point toward neighborhood- or area-based interventions, which can be more effective than targeting individuals in terms of cost and resources. Further research to understand geographic variations of self-injury and modifiable environmental factors in conjunction with modifiable individual factors is needed.
Table 4-1. Distribution of cases and controls by their ICD-9-CM case definitions.

ICD-9-CM | ICD-9-CM description | Cases (n=61530) n (%) | Controls (n=119792) n (%)
E950 | Suicide and self-inflicted poisoning by solid or liquid substances | 45976 (74.7) |
E951 | Suicide and self-inflicted poisoning by gases in domestic use | 5 (0.0) |
E952 | Suicide and self-inflicted poisoning by other gases and vapors | 379 (0.6) |
E953 | Suicide and self-inflicted injury by hanging, strangulation, and suffocation | 826 (1.3) |
E954 | Suicide and self-inflicted injury by submersion [drowning] | 66 (0.1) |
E955 | Suicide and self-inflicted injury by firearms, air guns, and explosives | 756 (1.2) |
E956 | Suicide and self-inflicted injury by cutting and piercing instrument | 9662 (15.7) |
E957 | Suicide and self-inflicted injuries by jumping from high place | 266 (0.4) |
E958 | Suicide and self-inflicted injury by other and unspecified means | 3594 (5.8) |
E000 | External cause status | | 93 (0.1)
E001-E030 | Activity | | 177 (0.1)
E800-E807 | Railway accidents | | 5 (0.0)
E810-E819 | Motor vehicle traffic accidents | | 8667 (7.2)
E820-E825 | Motor vehicle non-traffic accidents | | 355 (0.3)
E826-E829 | Other road vehicle accidents | | 651 (0.5)
E830-E838 | Water transport accidents | | 75 (0.1)
E840-E845 | Air and space transport accidents | | 12 (0.0)
E846-E849 | Vehicle accidents, not elsewhere classifiable | | 5798 (4.8)
E850-E858 | Accidental poisoning by drugs, medicinal substances, and biologicals | | 1086 (0.9)
E860-E869 | Accidental poisoning by other solid and liquid substances, gases, and vapors | | 170 (0.1)
E870-E876 | Misadventures to patients during surgical and medical care | | 843 (0.7)
E878-E879 | Surgical and medical procedures as the cause of abnormal reaction of patient or later complication, without mention of misadventure at the time of procedure | | 31118 (26.0)
E880-E888 | Accidental falls | | 23202 (19.4)
E890-E899 | Accidents caused by fire and flames | | 232 (0.2)
E900-E909 | Accidents due to natural and environmental factors | | 1269 (1.1)
E910-E915 | Accidents caused by submersion, suffocation, and foreign bodies | | 1219 (1.0)
E916-E928 | Other accidents | | 27285 (22.8)
E929 | Late effects of accidental injury | | 1179 (1.0)
E930-E949 | Drugs, medicinal and biological substances causing adverse effects in therapeutic use | | 12032 (10.0)
E960-E969 | Homicide and injury purposely inflicted by other persons | | 2915 (2.4)
E970-E979 | Legal intervention | | 125 (0.1)
E980-E989 | Injury undetermined whether accidentally or purposely inflicted | | 1275 (1.1)
E990-E999 | Injury resulting from operations of war | | 9 (0.0)
Table 4-2. Demographic characteristics of the study population. Cases are those with self-injury and controls are those with injury other than self-injury.

Characteristic | Total (n=181322) n (%) | Cases (n=61530) n (%) | Controls (n=119792) n (%) | p value
Age
Mean (SD) | 47.54 (18.46) | 38.47 (14.63) | 52.20 (18.47) | <0.0001
[18-39] | 67263 (37.1) | 34621 (56.3) | 32642 (27.2) | <0.0001
[40-64] | 75546 (41.7) | 23279 (37.8) | 52267 (43.6) | <0.0001
[65+) | 38513 (21.2) | 3630 (5.9) | 34883 (29.1) | <0.0001
Charlson Comorbidity Index
Mean (SD) | 1.34 (2.01) | 0.94 (1.65) | 1.54 (2.14) | <0.0001
 | 89498 (49.4) | 26575 (43.2) | 62923 (52.5) | <0.0001
Sex
Female | 98837 (54.5) | 35326 (57.4) | 63511 (53.0) | <0.0001
Race/Ethnicity
Black | 32412 (17.9) | 6515 (10.6) | 25897 (21.6) | <0.0001
Hispanic | 21318 (11.8) | 6258 (10.2) | 15060 (12.6) | <0.0001
Other | 5019 (2.8) | 1466 (2.4) | 3553 (3.0) | 0.0001
White | 122573 (67.6) | 47291 (76.9) | 75282 (62.8) | <0.0001
Payer
Medicare | 66554 (36.7) | 13519 (22.0) | 53035 (44.3) | <0.0001
Medicaid | 22306 (12.3) | 11298 (18.4) | 11008 (9.2) | <0.0001
Self Pay | 27689 (15.3) | 17421 (28.3) | 10268 (8.6) | <0.0001
Other | 17242 (9.5) | 6117 (9.9) | 11125 (9.3) | <0.0001
Private | 47531 (26.2) | 13175 (21.4) | 34356 (28.7) | <0.0001

SD=standard deviation. P values from t-tests (if normally distributed), Wilcoxon rank sum tests (if non-normally distributed), and chi-square tests with Bonferroni corrections are shown.
Table 4-3. Results from bivariate analyses of environmental variables with self-injury as outcome, shown as unadjusted odds ratios and 95% confidence intervals.

Variable | Cutoff values | Unadjusted OR (95% CI)
Area Deprivation Index
Lower deprivation | [ 22.3, 102] | 0.76 (0.75-0.78)
Average deprivation | (102, 108] | REF
Higher deprivation | (108, 127] | 1.12 (1.09-1.14)
Total population
Lower population size | [13, 23119] | 1.18 (1.15-1.21)
Average population size | (23119, 35825] | REF
Higher population size | (35825, 72249] | 0.82 (0.80-0.84)
Number of females
Lower number of females | [7, 12012] | 1.10 (1.07-1.12)
Average number of females | (12012, 18905] | REF
Higher number of females | (18905, 38000] | 0.70 (0.69-0.72)
Number of whites
Lower number of whites | [10, 15730] | 0.83 (0.81-0.85)
Average number of whites | (15730, 25521] | REF
Higher number of whites | (25521, 68021] | 0.79 (0.77-0.81)
Number of blacks
Lower number of blacks | [0, 1774] | 1.19 (1.17-1.22)
Average number of blacks | (1774, 6068] | REF
Higher number of blacks | (6068, 53812] | 0.69 (0.67-0.71)
Number of other races
Lower number of other races | [0, 1457] | 0.87 (0.85-0.89)
Average number of other races | (1457, 3334] | REF
Higher number of other races | (3334, 12430] | 0.76 (0.74-0.78)
Number of Hispanics
Lower number of Hispanics | [0, 1903] | 0.95 (0.93-0.98)
Average number of Hispanics | (1903, 5994] | REF
Higher number of Hispanics | (5994, 68562] | 0.79 (0.77-0.81)
Median age of individuals
Lower median age of individuals | [21.6, 36.6] | 0.85 (0.83-0.87)
Average median age of individuals | (36.6, 42.3] | REF
Higher median age of individuals | (42.3, 82.8] | 1.33 (1.30-1.37)
Median value of occupied housing units
Lower median value of housing units | [$13400, $154300] | 1.18 (1.15-1.21)
Average median value of housing units | ($154300, $206800] | REF
Higher median value of housing units | ($206800, $1000000+] | 0.80 (0.78-0.82)
Average household size
Smaller average household size | [1.26, 2.42] | 1.29 (1.26-1.32)
Around average household size | (2.42, 2.74] | REF
Larger average household size | (2.74, 4.81] | 0.91 (0.89-0.94)
Median household income
Lower median household income | [$9979, $41043] | 1.06 (1.03-1.08)
Average median household income | ($41043, $52091] | REF
Higher median household income | ($52091, $250000+] | 0.83 (0.81-0.85)
Number of veterans
Lower number of veterans | [0, 1799] | 1.07 (1.04-1.09)
Average number of veterans | (1799, 2856] | REF
Higher number of veterans | (2856, 9816] | 1.12 (1.09-1.15)
Percentage of households living below poverty
Lower percentage of households living below poverty | [0%, 10.6%] | 0.95 (0.93-0.98)
Average percentage of households living below poverty | (10.6%, 15.7%] | REF
Higher percentage of households living below poverty | (15.7%, 54.7%] | 0.96 (0.94-0.99)
Percentage of housing units that are vacant
Lower percentage of vacant housing units | [0%, 13.7%] | 0.97 (0.95-1.00)
Average percentage of vacant housing units | (13.7%, 19.5%] | REF
Higher percentage of vacant housing units | (19.5%, 95%] | 1.06 (1.04-1.09)
Number of disabled individuals
Lower number of disabled individuals | [0, 2944] | 1.09 (1.07-1.12)
Average number of disabled individuals | (2944, 4603] | REF
Higher number of disabled individuals | (4603, 11515] | 0.89 (0.87-0.91)
Number of business establishments
Smaller number of businesses | [1, 514] | 1.26 (1.23-1.29)
Average number of businesses | (514, 894] | REF
Higher number of businesses | (894, 4036] | 1.06 (1.04-1.09)
Number of employees
Smaller number of employees | [4, 5439] | 1.23 (1.20-1.26)
Average number of employees | (5439, 11458] | REF
Higher number of employees | (11458, 82131] | 0.88 (0.85-0.90)
Number of ambulatory health care services
Smaller number of ambulatory health care services | [0, 26] | 1.05 (1.02-1.07)
Average number of ambulatory health care services | (26, 72] | REF
Higher number of ambulatory health care services | (72, 372] | 0.98 (0.96-1.00)
Number of social assistance facilities
Smaller number of social assistance facilities | [0, 8] | 0.97 (0.95-0.99)
Average number of social assistance facilities | (8, 16] | REF
Higher number of social assistance facilities | (16, 53] | 0.69 (0.68-0.71)
Food desert designation
Not food desert | 0 | REF
Food desert | 1 | 0.94 (0.91-0.97)

OR=Odds Ratio; CI=Confidence Interval
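As an illustration of how an unadjusted odds ratio and its 95% confidence interval of the kind reported in Table 4-3 can be obtained, the following minimal sketch computes both from a 2x2 table of counts using the standard Wald interval on the log-odds scale. The function name and the counts are hypothetical and are not taken from the study data; the dissertation does not state which routine produced its estimates, so this is one common approach rather than the method actually used.

```python
import math

def odds_ratio_ci(exposed_cases, exposed_controls, ref_cases, ref_controls, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    'exposed' is the tertile of interest (e.g., lower or higher than average);
    'ref' is the referent tertile. All four counts are hypothetical inputs.
    """
    odds_ratio = (exposed_cases * ref_controls) / (exposed_controls * ref_cases)
    # Standard error of log(OR) from the four cell counts
    se_log_or = math.sqrt(
        1 / exposed_cases + 1 / exposed_controls + 1 / ref_cases + 1 / ref_controls
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts, for illustration only
or_, lo, hi = odds_ratio_ci(20, 10, 10, 20)
print(f"OR = {or_:.2f} (95% CI: {lo:.2f}-{hi:.2f})")
```

An interval that excludes 1 corresponds to an association that is statistically significant at the 5% level, which is how the REF rows and the surrounding tertiles in the table can be read.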
Table 4-4. Comparison of models used to examine individual and environmental predictors of self-injury. Mean and SD of AUROC, sensitivity, and specificity across 10 cross-validations are shown.

Models | AUROC | SD | Sensitivity | Specificity | p value
Random forests | 0.8681 | 0.0015 | 0.8024 | 0.7835 | REF
Decision tree | 0.8146 | 0.0020 | 0.7760 | 0.7260 | <0.0001
Logistic regression | 0.8505 | 0.0011 | 0.7847 | 0.7655 | <0.0001
LASSO regression | 0.8522 | 0.0011 | 0.7879 | 0.7642 | <0.0001

AUROC=Area Under Receiver Operating Characteristic; SD=Standard Deviation. P values show t-test results comparing each model to the model with the best AUROC.
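The p values in Table 4-4 come from t-tests on AUROC across the 10 cross-validations. The sketch below shows how such a comparison could be computed from the summary statistics in the table alone. It assumes an unpaired Welch t-test, which is an assumption: the text does not say whether the tests were paired or unpaired, and a paired test would require the per-fold AUROC values, which are not reported. The function name is hypothetical.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    computed from summary statistics (means, SDs, and sample sizes)."""
    v1 = sd1 ** 2 / n1  # squared standard error of the first mean
    v2 = sd2 ** 2 / n2  # squared standard error of the second mean
    t_stat = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_stat, df

# Random forests vs. decision tree, using the AUROC means and SDs from Table 4-4
t_stat, df = welch_t(0.8681, 0.0015, 10, 0.8146, 0.0020, 10)
print(f"t = {t_stat:.1f}, df = {df:.1f}")
```

Because the fold-to-fold SDs are very small relative to the differences in mean AUROC, the resulting statistic is large, which is consistent with the very small p values shown in the table.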
Table 4-5. Comparison of individual and environmental variable domains using the best performing model, random forests. Mean and SD of AUROC, sensitivity, and specificity across 10 cross-validations are shown.

Variable domains | AUROC | SD | Sensitivity | Specificity | p value
All variables | 0.8681 | 0.0013 | 0.7992 | 0.7858 | REF
Clinical variables | 0.8146 | 0.0010 | 0.7530 | 0.7245 | <0.0001
Demographic variables | 0.7707 | 0.0021 | 0.7179 | 0.6777 | <0.0001
Individual-level variables* | 0.8397 | 0.1398 | 0.7795 | 0.7569 | <0.0001
Environment-level variables | 0.7758 | 0.0017 | 0.7136 | 0.7087 | <0.0001

*Individual-level variables contain both demographic and clinical variables. AUROC=Area Under Receiver Operating Characteristic; SD=Standard Deviation. P values show t-test results comparing each model to the model with the best AUROC.
Table 4-6. Comparison of random forests models used to examine individual and environmental predictors of self-injury stratified by age groups: younger age (18-39), middle-aged (40-64), and older age (65+). Mean and SD of AUROC, sensitivity, and specificity across 10 cross-validations are shown.

Variable domains | AUROC | SD | Sensitivity | Specificity | p value
Younger age group
All variables | 0.8404 | 0.0010 | 0.7546 | 0.7758 | REF
Clinical variables | 0.7477 | 0.0017 | 0.7286 | 0.6520 | <0.0001
Demographic variables | 0.7107 | 0.0028 | 0.7137 | 0.6065 | <0.0001
Individual-level variables* | 0.7870 | 0.0019 | 0.7219 | 0.7223 | <0.0001
Environment-level variables | 0.7801 | 0.0012 | 0.7124 | 0.7202 | <0.0001
Middle-aged group
All variables | 0.8479 | 0.0017 | 0.7817 | 0.7618 | REF
Clinical variables | 0.7936 | 0.0025 | 0.7790 | 0.6716 | <0.0001
Demographic variables | 0.6908 | 0.0021 | 0.5956 | 0.6598 | <0.0001
Individual-level variables* | 0.8206 | 0.0020 | 0.7714 | 0.7305 | <0.0001
Environment-level variables | 0.7621 | 0.0030 | 0.6970 | 0.7045 | <0.0001
Older age group
All variables | 0.7881 | 0.0033 | 0.7352 | 0.6917 | REF
Clinical variables | 0.6816 | 0.0063 | 0.7507 | 0.5247 | <0.0001
Demographic variables | 0.5620 | 0.0053 | 0.2604 | 0.8439 | <0.0001
Individual-level variables* | 0.6911 | 0.0055 | 0.7567 | 0.5316 | <0.0001
Environment-level variables | 0.7806 | 0.0063 | 0.7542 | 0.6717 | 0.2587

*Individual-level variables contain both demographic and clinical variables. AUROC=Area Under Receiver Operating Characteristic; SD=Standard Deviation. P values show t-test results comparing each model to the model with the best AUROC.
Figure 4-1. Flow diagram of study population for the second study.

Figure 4-2. Number of self-injury cases in this study per 100,000 for each county.

Figure 4-3. Unadjusted associations between environment-level variables and self-injury as outcome.

Figure 4-4. Receiver Operating Characteristic (ROC) curves of models used to examine individual and environmental predictors of self-injury. A) ROC curves of four models using all variables: random forests, decision tree, logistic regression, and LASSO regression. B) ROC curves of five models with the best performing model, random forests, fitted on all variables, individual-level variables (demographic and clinical), demographic variables only, clinical variables only, and environmental variables only.

Figure 4-5. Receiver Operating Characteristic (ROC) curves of models used to examine predictors of self-injury with the best performing model, random forests, fitted on all variables, individual-level variables (demographic and clinical), demographic variables only, clinical variables only, and environmental variables only for A) the younger age group (18-39); B) the middle-aged group (40-64); and C) the older age group (65+).

Figure 4-6. Top 15 variables with the highest Gini impurity from random forests models for younger age (18-39), middle-aged (40-64), and older age (65+) groups. Demographic characteristics are in light blue, clinical variables are in dark blue, and environmental factors are in fuchsia.
CHAPTER 5
DESCRIPTIVE STUDY OF HEALTH OUTCOMES FOLLOWING SELF-INJURY

Background

Each year, over 1.4 million Americans are estimated to sustain an intentional self-injury. 5 Alarmingly, both fatal and nonfatal self-injury rates have been increasing in the US, and at a faster rate for the middle-aged population specifically. 3,6,27 In addition to the societal costs of health care and productivity loss, 5 nonfatal self-injury increases the risk of death. Any nonfatal self-injury, regardless of the method chosen, increases the risk of another self-injury, especially fatal self-injury. 86 A systematic review found that suicide risk is 100 times higher among this population compared to the general population. 87 Death due to any cause is also 8 times higher in this population. 88 History of self-injury is the biggest risk factor for repetition of self-injury and for mortality due to self-injury and all causes. 89-91

Several studies have examined the risk of repetition and mortality, but few have examined outcomes other than these two. Among the handful of studies examining other adverse outcomes, one study found that individuals who sustained a self-injury at a young age experienced more mental and physical health problems (e.g., metabolic syndrome, functional limitations), were more likely to be unemployed, and needed social support such as welfare benefits at midlife. 92 These adverse conditions following a self-injury may be related to the lack of follow-up care. More than 39% of those who sustained a self-injury did not receive any assistance after the injury. 93 Another study found that only 12% of older adults with self-injury are referred to mental health care. 94 There are no effective treatments to prevent
repetition of self-injury, however; psychotherapy, outreach, and follow-up programs have been ineffective and suffer from substantial dropout. 95-98 In addition, current scales and instruments to predict self 99-102

In conclusion, there is a lack of literature on outcomes subsequent to self-injury other than repetition and mortality, and there are limited treatment and screening tools to prevent another self-injury. Therefore, this exploratory study aimed to identify clinical outcomes among adults who sustained a self-injury and visited a hospital for care. We examined the recurrence of self-injury and the onset of physical and psychiatric conditions over the adult lifespan.

Aims and Hypotheses

We aimed to ascertain the health outcomes subsequent to self-injury and their associations with environmental factors. The following hypotheses were tested:

Hypothesis 3.1: Middle-aged adults (40-64 years of age) will have a greater number of adverse physical and psychiatric conditions following self-injury compared to younger (18-39 years of age) and older adults (65+ years of age).

Hypothesis 3.2: Environmental factors related to lack of employment opportunities will be strongly associated with adverse health outcomes following self-injury among middle-aged adults.

Hypothesis 3.3: Individuals living in areas with low access to health care are more likely to have another self-injury than those living in areas with high access to health care.
Methods

Study Population and Case Definition

This study was a retrospective cohort study of individuals who visited any Florida hospital participating in the Healthcare Cost and Utilization Project (HCUP) between January 1, 2005 and December 31, 2014. Our cohort of interest was those who sustained a self-injury as defined by International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9-CM) diagnoses for self-injury (Table A-1). These codes have been commonly used in the literature to define those with self-injury and are the codes used by the Florida Department of Health. 38,55,103 For each individual with a self-injury, we identified individuals for the comparison cohort who visited a hospital or an emergency department for an injury that was not self-inflicted (codes are listed in Table A-2). We selected 2 comparison individuals for each case; the 1:2 ratio was chosen to increase statistical power. Individuals with at least 3 years of health care utilization in HCUP prior to self-injury and at least 1 follow-up visit subsequent to self-injury were included in this study. Those with missing information on sex, race/ethnicity, or payer status and/or with an invalid ZIP code (Zone Improvement Plan code) of residence (indicating residence outside of Florida or in a non-residential neighborhood) were excluded.

Individual-Level Data

The HCUP data for the Florida State Emergency Department Databases, the State Inpatient Databases, and the State Ambulatory Surgery and Services Databases were used. Individuals were linked across these databases and over time via a HIPAA-compliant, patient-anonymous identifier, visitLink. This identifier was assigned by
and allowed reconstruction of health care utilization across hospitals and over the years. 42 For this study, we used the following variables from HCUP: sex, race/ethnicity, payer, ZIP code of residence, and diagnoses as coded by ICD-9-CM. We mapped ICD-9-CM codes to their respective Clinical Classifications Software (CCS) codes 44 to reduce the number of variables tested and to increase interpretability. CCS is developed and validated by the HCUP provider and has been used in the self-injury literature to make the clinical diagnoses easier to interpret. 55

In order to ascertain outcomes occurring after self-injury, we only considered new diagnoses and excluded diagnoses present before or at the time of self-injury. For example, if an individual had a diagnosis of CCS 659 (Schizophrenia and other psychotic disorders) during the follow-up period that was not present in the 3 years prior to self-injury and was not present at the index visit, then this person was considered to have acquired this new diagnosis. On the other hand, if CCS 659 was present in the prior history or at the index visit, then the diagnosis was not considered to be a new outcome that occurred after self-injury.

To examine psychiatric and physical health burden, we manually calculated the number of psychiatric conditions (CCS 650-670) and the number of physical conditions (all others). For each individual, we used the original ICD-9-CM diagnosis codes to calculate the Charlson Comorbidity Index (CCI) and Injury Severity Score (ISS). 43,104 CCI has been used in the literature to predict mortality risk in the 1 year following hospitalization, but it has also been commonly used to quantify comorbidity. ISS indicates severity of injury;
we used the version implemented by 105 All individual-level variables were dichotomized into indicator variables.

Environment-Level Data

Since socio-environmental factors, such as neighborhood socioeconomic status, have been found to be associated with rates of self-injury, 15,64,106 we wanted to account for and test these environmental factors in our study. We linked the ZIP code of residence from the HCUP data for each individual in the study with environment-level characteristics from the decennial census, 46 the American Community Survey (ACS), 47 the County Business Patterns (CBP), 48 and the Food Access Research Atlas (FARA). 49 The specific variables we used are listed in Table A-3. The observation units in these datasets included ZIP codes, ZIP Code Tabulation Areas (ZCTAs), and census tracts; thus, we standardized them across the datasets into a single unit, the ZCTA.

From the 2010 decennial census, we retrieved information on the demography of the population living within a ZCTA, including the total number of individuals residing in the ZCTA and their demographic breakdown by sex, race, ethnicity, and age. From the 2011 ACS, we obtained household and housing characteristics such as the median value of occupied housing units, average household size (number of individuals in a household), median household income, percentage of households living below poverty, and percentage of housing units that were vacant out of all housing units in the ZCTA. We obtained the number of veterans residing within the ZCTA from the 2011 ACS and the number of disabled individuals living in the ZCTA from the 2012 ACS because the latter variable was not available until 2012. The 5-year estimates, or estimates using 5 years of collected data, were used for all ACS variables due to their greater
reliability and stability compared to 1- or 3-year estimates. 69 For example, the 5-year estimate from 2011 uses all data from 2007-2011.

We obtained the number of business establishments and the number of employees in each ZIP code from the 2005-2014 CBP datasets. Using the CBP datasets, we calculated the number of ambulatory health care services in the ZIP code using the North American Industry Classification System (NAICS) designation for ambulatory health care services (621) (Table A-4). This variable was calculated as the total number of physicians, dentists, mental health practitioners, and other health care professionals in the ZIP code. We also calculated the number of social assistance facilities, which were designated with NAICS code 624. These facilities included individual and family services, community food services, vocational rehabilitation, and day care. These four variables derived from the CBP were used as proxies for economic health indicators as well as for health and social care accessibility.

From the 2006, 2010, and 2015 FARA datasets, we obtained food desert census tracts. Food deserts are areas of low income and low access to healthy food from resources such as supermarkets and grocery stores. Since several census tracts can fall within a ZCTA, we used the number of people living in food desert census tracts to designate a ZCTA as a food desert. For instance, if there were 4 census tracts within a single ZCTA, 2 that were food deserts and 2 that were not, and if the number of people living in the two food desert census tracts was greater than the number of people living in the two non-food desert census tracts, we designated the ZCTA as a food desert.
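The population-based designation rule described above can be sketched as follows. The function name, the input layout, and the tract populations are hypothetical. Ties are treated as not food desert here because the text requires the food desert population to be strictly greater; how ties were handled in the study is not stated, so that choice is an assumption.

```python
def zcta_is_food_desert(tracts):
    """Designate a ZCTA as a food desert by comparing resident counts.

    tracts: list of (population, is_food_desert) pairs, one per census
    tract falling within the ZCTA (hypothetical input layout).
    """
    desert_pop = sum(pop for pop, is_desert in tracts if is_desert)
    other_pop = sum(pop for pop, is_desert in tracts if not is_desert)
    # Strictly greater, following the rule in the text; ties -> not food desert (assumption)
    return desert_pop > other_pop

# Four hypothetical tracts in one ZCTA: two food deserts, two not
example_zcta = [(1200, True), (800, True), (900, False), (700, False)]
print(zcta_is_food_desert(example_zcta))
```

In this example, 2,000 residents live in food desert tracts and 1,600 do not, so the ZCTA would be designated a food desert even though the number of tracts is evenly split.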
In addition to these variables from the Census Bureau and the Department of 70 ADI is a composite ind and higher values indicate greater deprivation.

All of the environment-level variables were categorized into multinomial variables to reduce the effect of outliers and to account for the non-Gaussian distribution of the continuous variables and the variety of scales on which these data are collected. They were constructed as tertiles: lower than average (1st tertile), around average (2nd or median tertile), and higher than average (3rd tertile). The utility of this can be seen with the median value of occupied housing units; the 1st tertile included units valued at $13,400-$154,300, the 2nd tertile included units valued at $154,300-$206,800, and the 3rd tertile included units valued at $206,800 to over $1 million. In comparison to this wide range of values from tens of thousands to over a million, the median age of individuals in a ZCTA ranged from 21.6 to 82.8.

Data Linkage

The individual-level variables from HCUP and the environment-level variables from the census, ACS, CBP, and FARA were linked prior to analysis. The ZIP code of within HCUP was used to map to ZCTAs (observational units in census and ACS). 50 The year in which the modal ZIP code is most frequently reported was also used to link to datasets that spanned multiple years. For example, since CBP was available for every year of the study period, the ZIP code and the year of ZIP code extraction were used to extract the economic characteristics. In contrast, FARA was available for 2006, 2010, and 2015, so
we used the FARA dataset from the year closest to the year of ZIP code extraction to designate the food desert status.

Statistical Analyses

The data used in this study were described using means, medians, and standard deviations (SD), with appropriate tests to compare groups. T-tests, Wilcoxon rank sum tests, and Kruskal-Wallis tests were used with Bonferroni correction for multiple testing. Unadjusted relative risks (RR), adjusted relative risks (aRR), and their 95% confidence intervals (CI) were calculated for outcomes using generalized linear models (GLM) with a log link function. Each clinical diagnosis was used as the dependent variable, and the self-injury indicator was the independent variable. Adjusted RRs were calculated by adding age, sex, race/ethnicity, and payer as covariates. Similarly, the associations between environment-level factors and recurrence of self-injury were assessed using GLM, with the 2nd tertile/median values as the referent groups. All analyses were conducted in R. d (IRB201701906).

Results

Figure 5-1 shows how the cohort of individuals with self-injury and the comparison cohort of those with injury other than self-injury were derived. Out of over 21 million individuals identified in our HCUP data, there were 138,035 who met the case definition, and they were matched to a comparison group in a 1:2 ratio using risk set sampling. After excluding those without 3 years of medical history, with missing demographics, with residence outside of Florida, or without any follow-up, there were 52,877 in the self-injury cohort and 90,895 in the comparison cohort. Most of the self-injury cohort sustained a poisoning injury (75.9%) (Table 5-1).

The description of the cohort is shown in Table 5-2. Those in the self-injury cohort were younger than those in the comparison cohort (mean age: 38.4, SD: 14.5 vs. mean age: 52.4, SD: 18.3, respectively). The self-injury cohort was mostly female (58.5%) and white (77.0%), and 27.2% were without insurance/self-pay. Suicidal ideation was found among 30.2% of the self-injury cohort compared to 2.4% of the comparison group. The mean CCI was lower for the self-injury cohort, but the mean ISS was higher for the self-injury cohort.

When examining outcomes subsequent to self-injury, many in the self-injury cohort were diagnosed with mood disorders (n=12,381 out of 52,877; 23.4%) and anxiety disorders (n=8,811; 16.7%) (Table 5-3). Over 15% had another self-injury. The unadjusted and adjusted RRs were highest for the diagnosis of personality disorders; individuals with self-injury had 14.9 times (CI: 13.1-16.9) the risk of having a personality disorder compared to their controls after adjusting for age, sex, race/ethnicity, and payer. The self-injury cohort was at higher risk of another poisoning injury (aRR: 7.11; CI: 6.3-8.1), of being diagnosed with schizophrenia (aRR: 3.59; CI: 3.4-3.8), and of being diagnosed with attention deficit, conduct, and disruptive behavior disorders (aRR: 3.43; CI: 3.0-4.0) (Table 5-3 and Figure 5-2).

In terms of non-psychiatric conditions, individuals with self-injury had 2.3 times the risk of coma, stupor, or brain damage, 2.0 times the risk of hepatitis, and 1.9 times the risk of epilepsy. Chronic conditions such as chronic obstructive pulmonary disease (16.8%), headache
(11.2%), and hypertension (10.8%) were found at a greater prevalence among the self-injury cohort. Interestingly, self RR: 0.73; CI: 0.6-0.8) and delirium, dementia, and amnestic and other cognitive disorders (RR: 0.5; CI: 0.5-0.5) in bivariate analyses, but it became a risk factor after accounting for the demographic covariates (aRR: 1.7 and aRR: 1.5, respectively). A similar pattern was seen with osteoarthritis (RR=0.9 and aRR=1.3).

Individuals with self-injury had on average 1.07 (SD: 1.21) psychiatric conditions and 7.20 (SD: 7.14) physical conditions based on CCS codes after their self-injury. In comparison, individuals in the comparison cohort (injury other than self-injury) had on average 0.48 (SD: 0.84) psychiatric conditions and 6.88 (SD: 6.63) physical conditions. These differences were significant and are shown in Figure 5-3. Some individuals had as many as 9 psychiatric conditions and 60 physical conditions following their injury. Younger adults had on average 1.1 psychiatric and 6.7 physical conditions, middle-aged adults had 1.1 psychiatric and 7.8 physical conditions, and older adults had 1.1 psychiatric and 7.9 physical conditions (Figure 5-4).

When individuals with self-injury were further examined for recurrence, many of the environment-level factors had no significant association with the recurrence of self-injury (Table 5-4). Even after stratifying by age group (younger, middle-aged, and older), most variables did not increase the risk of another self-injury (Figure 5-5 shows aRR for the middle-aged group). For the younger adults, living in an area with lower median household income (RR: 1.1; CI: 1.0-1.2) and with a higher than average number of veterans (1.1; 1.0-1.2) were associated with increased risk of recurrence. Living in an
area with a higher than average number of Hispanics (RR: 1.2; CI: 1.0-1.3), with smaller average household size (1.2; 1.1-1.3), with a lower number of disabled individuals (1.2; 1.0-1.3), and with reduced access to healthy food (1.2; 1.0-1.3) increased the risk of recurrence among the middle-aged group. Among older adults, living in an area with higher median age of individuals (1.6; 1.0-2.6), with a higher percentage of vacant housing units (1.6; 1.0-2.4), and with a higher number of businesses (1.6; 1.0-2.6) were associated with recurrence.

Discussion

This study compared the clinical outcomes between 52,877 individuals with self-injury and 90,895 individuals with injury other than self-injury. We found that those with self-injury were at higher risk of unintentional poisoning and several psychiatric and physical conditions. They were more likely to be diagnosed with personality disorders, schizophrenia, and other mental health conditions, including substance- and alcohol-related disorders. They were also more likely to have chronic physical outcomes such as coma, stupor, and brain damage, as well as hepatitis, epilepsy, chronic obstructive pulmonary disease, hypertension, and diabetes, among others. The average number of psychiatric conditions affecting the self-injury group was two times that of the comparison group, but the average numbers of physical conditions were similar. Older adults had the highest average numbers of psychiatric and physical conditions.

This study also found that environment-level factors were unlikely to affect recurrence among the self-injury cohort. No environment-level factor was consistently found to be associated with risk of recurrence of self-injury across age groups. Therefore, contrary to our hypotheses, neither employment-related opportunities nor access to health care affected clinical outcomes in this study.
Limitations

There are several limitations to this study. To focus on the first episode of self-injury and its effect, we restricted our analysis to individuals with no health care visit for self-injury in the 3 years prior to the index self-injury visit. However, these individuals may have had a self-injury which did not require a hospitalization or emergency department visit. Most self-injuries do not require a health care visit.5 Our study is also unable to account for those who sustained an injury but were not diagnosed as having an injury. Our study also does not include individuals who sustained a fatal self-injury without hospital-based treatment.

In this work, we did not limit our follow-up time period since self-injury leads to increased lifetime risk of another self-injury.55,107 Thus, it is possible that the outcomes we detected are not immediate consequences of self-injury. Our individual-level data lacked known risk factors for recurrence of self-injury and death such as mental pain,80 physical pain,108 genetics and family history,77 childhood environment and adverse childhood events,109 and sexual identity.79

Although we did not find any environment-level factors that significantly protect against recurrence, inferences about an individual's risk of recurrence should not be drawn from group-level characteristics, since doing so would risk the ecologic fallacy. Issues regarding the use of ZIP codes or ZCTAs as geographic units have been addressed elsewhere.76 These concerns include frequent updates to the geographic boundaries of a ZIP code and mismatch between ZIP code and ZCTA mappings over time. However, ZIP codes from health care data are generally likely to be accurate since they are routinely collected for billing and communication purposes.
We included ISS to account for severity of the injury, but we were unable to produce accurate scores using the computerized algorithm. Most of the individuals in our study had an ISS of 0 (n=119,477 out of 143,772), indicating no injury. Therefore, ISS was of limited utility. The literature has already pointed out limitations in deriving ISS from ICD-9-CM codes.110-112

Strengths

This study also has several strengths. First, the coverage of hospital-based health care encounters is near universal, since 95-99% of hospital emergency departments, inpatient facilities, and outpatient facilities in Florida participate in HCUP.40 In our study, both the self-injury and comparison cohorts had an injury severe enough to lead to a hospital visit. We were not limited to a single hospital or single-payer system and were able to achieve a large sample size (n=143,772 total). And as with all studies using health care data, there is no recall or interviewer bias, which can be found in survey-based self-injury studies.

In this study, we compared the clinical outcomes among individuals with and without self-injury. Using 3 years of prior medical history and at least 1 follow-up visit post injury, we were able to ascertain psychiatric and physical conditions among these individuals. Because much of the literature focuses on recurrence, fatal self-injury, and all-cause mortality, many of these conditions have not been previously examined. Current treatment for self-injury focuses on reducing these aforementioned outcomes, but not any other outcomes.95 Although cognitive behavior therapy and medications like lithium have been shown to reduce recurrence, other social and low-cost approaches such as case management and postcard follow-ups have been found to be ineffective. Even among those with near-fatal self-injury, only a small number of people receive therapy
and only 12% of those who were treated for self-injury were referred to mental health services.94

In our study, we found that administrative/social admission was highly prevalent among the self-injury cohort (14.3%). Largely, these admissions encompass supervision and hospital-based social care, but they can also include homelessness, lack of family able to care for the individual, and psychological stress. Addressing these specific social conditions may yield fruitful results.
Table 5-1. Distribution of individuals in the study by their ICD-9-CM case definitions.

| ICD-9-CM | ICD-9-CM description | Prior self-injury (n=52877) n (%) | No prior self-injury (n=90895) n (%) |
|---|---|---|---|
| E950 | Suicide and self-inflicted poisoning by solid or liquid substances | 40136 (75.9) | |
| E951 | Suicide and self-inflicted poisoning by gases in domestic use | 2 (0.0) | |
| E952 | Suicide and self-inflicted poisoning by other gases and vapors | 296 (0.6) | |
| E953 | Suicide and self-inflicted injury by hanging, strangulation, and suffocation | 606 (1.1) | |
| E954 | Suicide and self-inflicted injury by submersion [drowning] | 53 (0.1) | |
| E955 | Suicide and self-inflicted injury by firearms, air guns, and explosives | 422 (0.8) | |
| E956 | Suicide and self-inflicted injury by cutting and piercing instrument | 8123 (15.4) | |
| E957 | Suicide and self-inflicted injuries by jumping from high place | 226 (0.4) | |
| E958 | Suicide and self-inflicted injury by other and unspecified means | 3013 (5.7) | |
| E000 | External cause status | | 57 (0.1) |
| E001-E030 | Activity | | 108 (0.1) |
| E800-E807 | Railway accidents | | 5 (0.0) |
| E810-E819 | Motor vehicle traffic accidents | | 5906 (6.5) |
| E820-E825 | Motor vehicle nontraffic accidents | | 268 (0.3) |
| E826-E829 | Other road vehicle accidents | | 426 (0.5) |
| E830-E838 | Water transport accidents | | 54 (0.1) |
| E840-E845 | Air and space transport accidents | | 8 (0.0) |
| E846-E849 | Vehicle accidents, not elsewhere classifiable | | 4603 (5.1) |
| E850-E858 | Accidental poisoning by drugs, medicinal substances, and biologicals | | 895 (1.0) |
| E860-E869 | Accidental poisoning by other solid and liquid substances, gases, and vapors | | 123 (0.1) |
| E870-E876 | Misadventures to patients during surgical and medical care | | 677 (0.7) |
| E878-E879 | Surgical and medical procedures as the cause of abnormal reaction of patient or later complication, without mention of misadventure at the time of procedure | | 25267 (27.8) |
| E880-E888 | Accidental falls | | 17012 (18.7) |
| E890-E899 | Accidents caused by fire and flames | | 166 (0.2) |
| E900-E909 | Accidents due to natural and environmental factors | | 886 (1.0) |
| E910-E915 | Accidents caused by submersion, suffocation, and foreign bodies | | 941 (1.0) |
| E916-E928 | Other accidents | | 19859 (21.8) |
| E929 | Late effects of accidental injury | | 928 (1.0) |
| E930-E949 | Drugs, medicinal and biological substances causing adverse effects in therapeutic use | | 9479 (10.4) |
| E960-E969 | Homicide and injury purposely inflicted by other persons | | 2170 (2.4) |
| E970-E979 | Legal intervention | | 67 (0.1) |
| E980-E989 | Injury undetermined whether accidentally or purposely inflicted | | 984 (1.1) |
| E990-E999 | Injury resulting from operations of war | | 6 (0.0) |
Table 5-2. Demographic characteristics of the self-injury cohort and comparison cohort.

| Characteristic | Total (n=143772) n (%) | Prior self-injury (n=52877) n (%) | No prior self-injury (n=90895) n (%) | p value |
|---|---|---|---|---|
| Age | | | | |
| Mean (SD) | 47.3 (18.3) | 38.4 (14.5) | 52.4 (18.3) | <0.0001 |
| [18-39] | 53702 (37.4) | 29712 (56.2) | 23990 (26.4) | <0.0001 |
| [40-64] | 60336 (42.0) | 20162 (38.1) | 40174 (44.2) | <0.0001 |
| [65+) | 29734 (20.7) | 3003 (5.7) | 26731 (29.4) | <0.0001 |
| Sex | | | | |
| Female | 80273 (55.8) | 30942 (58.5) | 49331 (54.3) | <0.0001 |
| Race | | | | |
| Black | 26247 (18.3) | 5629 (10.7) | 20618 (22.7) | <0.0001 |
| Hispanic | 16420 (11.4) | 5320 (10.1) | 11100 (12.2) | <0.0001 |
| Other | 3676 (2.6) | 1199 (2.3) | 2477 (2.7) | <0.0001 |
| White | 97429 (67.8) | 40729 (77.0) | 56700 (62.4) | <0.0001 |
| Payer | | | | |
| Medicare | 54252 (37.7) | 12121 (22.9) | 42131 (46.4) | <0.0001 |
| Medicaid | 19430 (13.5) | 10184 (19.3) | 9246 (10.2) | <0.0001 |
| Self Pay | 21866 (15.2) | 14355 (27.2) | 7511 (8.3) | <0.0001 |
| Other | 13222 (9.2) | 5098 (9.6) | 8124 (8.9) | 0.0002 |
| Private | 35002 (24.4) | 11119 (21.0) | 23883 (26.3) | <0.0001 |
| Pre-injury conditions | | | | |
| Pre-injury suicidal ideation | 18088 (12.6) | 15946 (30.2) | 2142 (2.4) | <0.0001 |
| Pre-injury CCI, Mean (SD) | 1.4 (2.1) | 1.0 (1.7) | 1.7 (2.2) | <0.0001 |
| | 75056 (52.2) | 23840 (45.1) | 51216 (56.4) | <0.0001 |
| Injury Severity Score | | | | |
| Mean (SD) | 0.36 (1.0) | 0.5 (1.2) | 0.2 (0.5) | <0.0001 |

CCI=Charlson Comorbidity Index; SD=Standard Deviation. P values from t-test, Wilcoxon rank-sum test, and chi-square test of proportions with Bonferroni corrections are shown.
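The footnote to Table 5-2 mentions chi-square tests of proportions with Bonferroni corrections. As a rough illustration only, the sketch below applies a Pearson chi-square test (without continuity correction) to the sex row of the table and then a Bonferroni adjustment. The function names are hypothetical, and the number of tests (18, the count of p values listed in the table) is an assumption about how the correction may have been applied, not a detail stated in the text.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the 2x2 table [[a, b], [c, d]].

    Returns the statistic and its p value on 1 degree of freedom.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, the upper-tail probability equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted p value, capped at 1."""
    return min(1.0, p_value * n_tests)


# Counts taken from the "Female" row of Table 5-2.
female_si, total_si = 30942, 52877      # prior self-injury cohort
female_cmp, total_cmp = 49331, 90895    # comparison cohort

chi2, p = chi_square_2x2(
    female_si, total_si - female_si,
    female_cmp, total_cmp - female_cmp,
)

N_TESTS = 18  # assumption: one test per p value listed in Table 5-2
p_adjusted = bonferroni(p, N_TESTS)
print(f"chi-square = {chi2:.1f}, Bonferroni-adjusted p = {p_adjusted:.3g}")
```

Under these assumptions the adjusted p value stays far below 0.0001, which is consistent with the entry reported for sex in the table.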
Table 5-3. Frequencies, unadjusted and multivariate adjusted relative risks of clinical conditions in descending order of adjusted relative risk.

| CCS | CCS Description | Prior Self-Injury n (%) | No Prior Self-Injury n (%) | RR (95% CI) | Adjusted RR (95% CI) |
|---|---|---|---|---|---|
| 658 | Personality disorders | 3407 (6.44) | 274 (0.30) | 21.37 (18.91-24.16) | 14.90 (13.13-16.90) |
| 662 | Suicide and intentional self-inflicted injury | 8128 (15.37) | 1395 (1.53) | 10.02 (9.47-10.59) | 8.83 (8.32-9.37) |
| 241 | Poisoning by psychotropic agents | 1864 (3.53) | 310 (0.34) | 10.34 (9.17-11.65) | 7.11 (6.27-8.07) |
| 659 | Schizophrenia and other psychotic disorders | 4176 (7.90) | 2106 (2.32) | 3.41 (3.24-3.59) | 3.59 (3.39-3.80) |
| 652 | Attention deficit, conduct, and disruptive behavior disorders | 988 (1.87) | 251 (0.28) | 6.77 (5.89-7.77) | 3.43 (2.96-3.96) |
| 255 | Administrative/social admission | 7549 (14.28) | 2634 (2.90) | 4.93 (4.72-5.14) | 3.39 (3.24-3.54) |
| 650 | Adjustment disorders | 966 (1.83) | 587 (0.65) | 2.83 (2.55-3.13) | 3.21 (2.86-3.61) |
| 654 | Developmental disorders | 958 (1.81) | 569 (0.63) | 2.89 (2.61-3.21) | 2.91 (2.59-3.27) |
| 660 | Alcohol-related disorders | 4610 (8.72) | 2543 (2.80) | 3.12 (2.97-3.27) | 2.41 (2.29-2.54) |
| 85 | Coma; stupor; and brain damage | 1656 (3.13) | 1318 (1.45) | 2.16 (2.01-2.32) | 2.30 (2.12-2.50) |
| 661 | Substance-related disorders | 5802 (10.97) | 3457 (3.80) | 2.89 (2.77-3.00) | 1.97 (1.88-2.06) |
| 6 | Hepatitis | 1605 (3.04) | 922 (1.01) | 2.99 (2.76-3.24) | 1.96 (1.79-2.14) |
| 83 | Epilepsy; convulsions | 3386 (6.40) | 2719 (2.99) | 2.14 (2.04-2.25) | 1.90 (1.80-2.01) |
| 235 | Open wounds of head; neck; and trunk | 2072 (3.92) | 2129 (2.34) | 1.67 (1.58-1.78) | 1.88 (1.76-2.02) |
| 129 | Aspiration pneumonitis; food/vomitus | 1576 (2.98) | 2398 (2.64) | 1.13 (1.06-1.20) | 1.75 (1.63-1.88) |
| 657 | Mood disorders | 12381 (23.41) | 12248 (13.47) | 1.74 (1.70-1.78) | 1.75 (1.71-1.80) |
| 79 | Parkinson's disease | 270 (0.51) | 634 (0.70) | 0.73 (0.64-0.84) | 1.72 (1.48-2.01) |
| 10 | Immunizations and screening for infectious disease | 4841 (9.16) | 5049 (5.55) | 1.65 (1.59-1.71) | 1.67 (1.60-1.74) |
| 651 | Anxiety disorders | 8811 (16.66) | 8347 (9.18) | 1.81 (1.76-1.87) | 1.64 (1.59-1.70) |
| 5 | HIV infection | 1312 (2.48) | 1471 (1.62) | 1.53 (1.42-1.65) | 1.63 (1.51-1.77) |
| 8 | Other infections; including parasitic | 1541 (2.91) | 1345 (1.48) | 1.97 (1.83-2.12) | 1.57 (1.45-1.71) |
| 653 | Delirium, dementia, and amnestic and other cognitive disorders | 1308 (2.47) | 4444 (4.89) | 0.51 (0.48-0.54) | 1.50 (1.41-1.60) |
| 236 | Open wounds of extremities | 3161 (5.98) | 3116 (3.43) | 1.74 (1.66-1.83) | 1.48 (1.40-1.56) |
| 152 | Pancreatic disorders (not diabetes) | 1264 (2.39) | 1398 (1.54) | 1.55 (1.44-1.68) | 1.48 (1.36-1.61) |
| 136 | Disorders of teeth and jaw | 2996 (5.67) | 2110 (2.32) | 2.44 (2.31-2.58) | 1.45 (1.37-1.54) |
| 81 | Other hereditary and degenerative nervous system conditions | 590 (1.12) | 728 (0.80) | 1.39 (1.25-1.55) | 1.43 (1.27-1.61) |
| 125 | Acute bronchitis | 3135 (5.93) | 3032 (3.34) | 1.78 (1.69-1.87) | 1.41 (1.34-1.49) |
| 239 | Superficial injury; contusion | 6704 (12.68) | 7899 (8.69) | 1.46 (1.41-1.50) | 1.39 (1.34-1.44) |
| 252 | Malaise and fatigue | 4578 (8.66) | 6520 (7.17) | 1.21 (1.16-1.25) | 1.37 (1.31-1.42) |
| 128 | Asthma | 2985 (5.65) | 3589 (3.95) | 1.43 (1.36-1.50) | 1.35 (1.28-1.42) |
| 94 | Other ear and sense organ disorders | 1869 (3.53) | 2161 (2.38) | 1.49 (1.40-1.58) | 1.34 (1.25-1.44) |
| 127 | Chronic obstructive pulmonary disease and bronchiectasis | 8861 (16.76) | 13026 (14.33) | 1.17 (1.14-1.20) | 1.34 (1.30-1.37) |
| 232 | Sprains and strains | 5220 (9.87) | 5299 (5.83) | 1.69 (1.63-1.76) | 1.32 (1.27-1.38) |
| 92 | Otitis media and related conditions | 1050 (1.99) | 834 (0.92) | 2.16 (1.98-2.37) | 1.32 (1.19-1.45) |
| 245 | Syncope | 3406 (6.44) | 4921 (5.41) | 1.19 (1.14-1.24) | 1.32 (1.25-1.38) |
| 84 | Headache; including migraine | 5943 (11.24) | 6407 (7.05) | 1.59 (1.54-1.65) | 1.32 (1.27-1.37) |
| 93 | Conditions associated with dizziness or vertigo | 3892 (7.36) | 4907 (5.40) | 1.36 (1.31-1.42) | 1.31 (1.25-1.37) |
| 98 | Essential hypertension | 5705 (10.79) | 6542 (7.20) | 1.50 (1.45-1.55) | 1.31 (1.26-1.36) |
| 200 | Other skin disorders | 1859 (3.52) | 1912 (2.10) | 1.67 (1.57-1.78) | 1.29 (1.20-1.38) |
| 139 | Gastroduodenal ulcer (except hemorrhage) | 1720 (3.25) | 2637 (2.90) | 1.12 (1.06-1.19) | 1.27 (1.19-1.36) |
| 210 | Systemic lupus erythematosus and connective tissue disorders | 234 (0.44) | 254 (0.28) | 1.58 (1.33-1.89) | 1.26 (1.04-1.54) |
| 244 | Other injuries and conditions due to external causes | 6177 (11.68) | 8887 (9.78) | 1.19 (1.16-1.23) | 1.26 (1.22-1.31) |
| 168 | Inflammatory diseases of female pelvic organs | 990 (1.87) | 733 (0.81) | 2.32 (2.11-2.55) | 1.26 (1.14-1.40) |
| 49 | Diabetes mellitus without complication | 3704 (7.00) | 5688 (6.26) | 1.12 (1.08-1.17) | 1.25 (1.20-1.31) |
| 203 | Osteoarthritis | 3115 (5.89) | 5806 (6.39) | 0.92 (0.88-0.96) | 1.25 (1.20-1.31) |
| 163 | Genitourinary symptoms and ill-defined conditions | 4601 (8.70) | 7199 (7.92) | 1.10 (1.06-1.14) | 1.24 (1.20-1.30) |
| 138 | Esophageal disorders | 6164 (11.66) | 9539 (10.49) | 1.11 (1.08-1.14) | 1.24 (1.20-1.28) |
| 134 | Other upper respiratory disease | 2188 (4.14) | 2913 (3.20) | 1.29 (1.22-1.36) | 1.24 (1.16-1.31) |
| 198 | Other inflammatory condition of skin | 797 (1.51) | 954 (1.05) | 1.44 (1.31-1.58) | 1.23 (1.11-1.37) |
| 154 | Noninfectious gastroenteritis | 2643 (5.00) | 3348 (3.68) | 1.36 (1.29-1.43) | 1.23 (1.16-1.30) |

Adjustments include demographic characteristics (age, sex, race/ethnicity, and payer) and adjustment for prior condition of same CCS. CCS=Clinical Classifications Software; RR=Relative Risk; CI=Confidence Intervals.
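The unadjusted relative risks in Table 5-3 can be checked directly from the reported counts. The sketch below is a minimal illustration, assuming the standard log-scale (Katz) approximation for the confidence interval; the text does not state which interval method was used, and the function name is hypothetical. It uses the personality disorders row (CCS 658) as the worked case.

```python
import math


def relative_risk(cases_exposed, n_exposed, cases_unexposed, n_unexposed, z=1.96):
    """Unadjusted relative risk with a log-scale (Katz) confidence interval."""
    risk_exposed = cases_exposed / n_exposed
    risk_unexposed = cases_unexposed / n_unexposed
    rr = risk_exposed / risk_unexposed
    # Standard error of log(RR) for two independent binomial proportions.
    se_log_rr = math.sqrt(
        1 / cases_exposed - 1 / n_exposed + 1 / cases_unexposed - 1 / n_unexposed
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# CCS 658 (personality disorders): 3407 of 52877 vs. 274 of 90895.
rr, lower, upper = relative_risk(3407, 52877, 274, 90895)
print(f"RR = {rr:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

With these inputs the sketch gives values that agree with the unadjusted entry in the table (21.37; 18.91-24.16). The adjusted relative risks cannot be reproduced this way, since they depend on the covariates listed in the table footnote.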
Table 5-4. Environment-level factors and their associations with recurrence of self-injury using multivariate log-binomial regression. Relative risks and 95% confidence intervals per age group are shown.

| Environment-level factor | Younger Age RR (95% CI) | Middle Aged RR (95% CI) | Older Age RR (95% CI) |
|---|---|---|---|
| Area Deprivation Index | | | |
| Lower deprivation | 0.96 (0.86-1.06) | 0.98 (0.87-1.12) | 1.34 (0.82-2.21) |
| Average deprivation | REF | REF | REF |
| Higher deprivation | 0.93 (0.86-1.01) | 1.03 (0.92-1.14) | 0.93 (0.56-1.55) |
| Total population | | | |
| Smaller population size | 1.03 (0.83-1.28) | 1.00 (0.74-1.34) | 1.19 (0.34-4.09) |
| Average population size | REF | REF | REF |
| Larger population size | 0.79 (0.63-0.99) | 1.11 (0.84-1.47) | 0.99 (0.30-3.32) |
| Number of females | | | |
| Lower number of females | 1.16 (0.94-1.44) | 0.94 (0.70-1.26) | 1.08 (0.31-3.78) |
| Average number of females | REF | REF | REF |
| Higher number of females | 1.13 (0.90-1.41) | 0.93 (0.71-1.22) | 1.12 (0.36-3.51) |
| Number of whites | | | |
| Lower number of whites | 1.00 (0.91-1.10) | 0.92 (0.81-1.04) | 1.04 (0.61-1.79) |
| Average number of whites | REF | REF | REF |
| Higher number of whites | 1.05 (0.95-1.16) | 0.99 (0.87-1.12) | 1.20 (0.67-2.16) |
| Number of blacks | | | |
| Lower number of blacks | 1.04 (0.96-1.12) | 0.97 (0.88-1.07) | 0.76 (0.50-1.16) |
| Average number of blacks | REF | REF | REF |
| Higher number of blacks | 0.97 (0.90-1.06) | 0.93 (0.83-1.03) | 1.44 (0.88-2.37) |
| Number of other races | | | |
| Lower number of other races | 0.89 (0.80-0.99) | 1.03 (0.90-1.18) | 0.66 (0.36-1.21) |
| Average number of other races | REF | REF | REF |
| Higher number of other races | 1.07 (0.97-1.17) | 1.01 (0.89-1.14) | 0.84 (0.47-1.50) |
| Number of Hispanics | | | |
| Lower number of Hispanics | 1.03 (0.94-1.13) | 0.95 (0.84-1.07) | 0.92 (0.54-1.58) |
| Average number of Hispanics | REF | REF | REF |
| Higher number of Hispanics | 1.03 (0.94-1.12) | 1.16 (1.03-1.31)* | 1.13 (0.64-1.98) |
| Median age of individuals | | | |
| Lower median age of individuals | 1.01 (0.94-1.09) | 1.06 (0.95-1.17) | 1.46 (0.86-2.48) |
| Average median age of individuals | REF | REF | REF |
| Higher median age of individuals | 1.06 (0.98-1.16) | 1.01 (0.91-1.12) | 1.65 (1.03-2.64)* |
| Median value of occupied housing units | | | |
| Lower median value of housing units | 1.02 (0.94-1.11) | 1.05 (0.94-1.17) | 1.00 (0.60-1.68) |
| Average median value of housing units | REF | REF | REF |
| Higher median value of housing units | 0.99 (0.90-1.09) | 0.98 (0.87-1.10) | 1.08 (0.65-1.78) |
| Average household size | | | |
| Smaller average household size | 0.99 (0.92-1.07) | 1.19 (1.07-1.31)* | 0.73 (0.47-1.13) |
| Around average household size | REF | REF | REF |
| Larger average household size | 1.04 (0.96-1.12) | 0.98 (0.89-1.09) | 0.74 (0.45-1.20) |
| Median household income | | | |
| Lower median household income | 1.10 (1.01-1.20)* | 0.89 (0.79-1.00) | 1.05 (0.61-1.81) |
| Average median household income | REF | REF | REF |
| Higher median household income | 1.06 (0.97-1.17) | 1.04 (0.92-1.18) | 1.02 (0.63-1.67) |
| Number of veterans | | | |
| Lower number of veterans | 0.96 (0.89-1.05) | 0.95 (0.85-1.06) | 0.79 (0.50-1.24) |
| Average number of veterans | REF | REF | REF |
| Higher number of veterans | 1.08 (1.00-1.18)* | 0.98 (0.88-1.09) | 0.89 (0.55-1.43) |
| Percentage of households living below poverty | | | |
| Lower percentage of households living below poverty | 1.07 (0.99-1.16) | 0.98 (0.89-1.09) | 0.95 (0.62-1.45) |
| Average percentage of households living below poverty | REF | REF | REF |
| Higher percentage of households living below poverty | 1.06 (0.97-1.16) | 1.07 (0.95-1.20) | 1.09 (0.64-1.88) |
| Percentage of housing units that are vacant | | | |
| Lower percentage of vacant housing units | 1.05 (0.98-1.13) | 1.05 (0.96-1.15) | 1.27 (0.81-1.98) |
| Average percentage of vacant housing units | REF | REF | REF |
| Higher percentage of vacant housing units | 0.98 (0.92-1.06) | 0.99 (0.90-1.09) | 1.58 (1.04-2.42) |
| Number of disabled individuals | | | |
| Lower number of disabled individuals | 1.03 (0.94-1.14) | 1.16 (1.02-1.32)* | 1.33 (0.76-2.35) |
| Average number of disabled individuals | REF | REF | REF |
| Higher number of disabled individuals | 1.05 (0.96-1.14) | 0.91 (0.82-1.02) | 1.11 (0.68-1.80) |
| Number of business establishments | | | |
| Smaller number of businesses | 0.97 (0.87-1.07) | 1.05 (0.92-1.20) | 1.52 (0.85-2.70) |
| Average number of businesses | REF | REF | REF |
| Higher number of businesses | 1.05 (0.96-1.15) | 0.99 (0.88-1.11) | 1.62 (1.00-2.64)* |
| Number of employees | | | |
| Smaller number of employees | 1.02 (0.93-1.12) | 1.01 (0.90-1.14) | 1.06 (0.62-1.80) |
| Average number of employees | REF | REF | REF |
| Higher number of employees | 1.03 (0.95-1.12) | 1.09 (0.98-1.22) | 0.82 (0.52-1.28) |
| Number of ambulatory health care services | | | |
| Smaller number of ambulatory health care services | 1.00 (0.92-1.09) | 0.92 (0.82-1.02) | 1.10 (0.67-1.82) |
| Average number of ambulatory health care services | REF | REF | REF |
| Higher number of ambulatory health care services | 1.05 (0.96-1.14) | 0.97 (0.87-1.08) | 1.26 (0.79-2.02) |
| Number of social assistance facilities | | | |
| Smaller number of social assistance facilities | 1.00 (0.93-1.08) | 0.98 (0.88-1.08) | 1.39 (0.91-2.11) |
| Average number of social assistance facilities | REF | REF | REF |
| Higher number of social assistance facilities | 0.98 (0.91-1.06) | 1.04 (0.93-1.15) | 1.14 (0.72-1.82) |
| Food desert designation | | | |
| Not food desert | REF | REF | REF |
| Food desert | 1.02 (0.94-1.11) | 1.15 (1.04-1.28)* | 1.26 (0.75-2.12) |

* Indicates confidence intervals that do not include 1. RR=Relative Risk; CI=Confidence Intervals.
Figure 5-1. Derivation of study population from HCUP Florida.

Figure 5-2. Adjusted relative risks for clinical outcomes after self-injury with 95% confidence intervals (CI).

Figure 5-3. Number of psychiatric and physical conditions for cohort with prior self-injury and cohort without prior self-injury.

Figure 5-4. Number of psychiatric and physical conditions by age group within prior self-injury cohort. P values from paired Wilcoxon rank-sum tests are shown.

Figure 5-5. Adjusted relative risks showing associations between environmental variables and self-injury recurrence for the middle-aged group.
CHAPTER 6
CONCLUSION

Summary of Findings

This dissertation confirms previous findings on the presence of any mental health disorder as a risk factor for self-injury.33,55,113 This dissertation adds to the literature by identifying unrecognized unintentional poisonings as a possible risk factor for self-injury. Unintentional poisonings and intentional poisonings are difficult to distinguish clinically since both behaviors indicate injury upon oneself, departing from the human nature or drive to survive. While intentional self-poisoning (or self-injury by poisoning) has been recognized to increase the risk of suicide,89 the association with unintentional poisoning has not previously been seen. These accidental overdoses may be related to increased substance use, which has been recognized by the American Association of Suicidology and the Centers for Disease Control and Prevention as a proximal warning sign of self-injury.33

Several statistical and machine learning methods have been applied to health care data to predict self-injury.55,74,114,115 This dissertation found that although certain machine learning methods such as random forest and LASSO were able to yield high discriminatory performance, a simple regression model with known risk factors (13 mental health conditions) and demographic covariates performed just as well. Performance of new prediction models should be compared with a model based on known risk factors.

Recently published machine learning-based prediction models have yielded mixed results in terms of performance and have been limited by their low positive predictive values (PPVs).114 Although PPV will always remain low (<0.01) due to the rare incidence of self-injury, we found that the combination of environment-level factors with
individual-level data, such as health care data, can have better classification accuracy than individual-level factors alone. Incorporation of area- or neighborhood-level factors may aid in creating better-performing prediction models. However, this must be conducted with an improved collection and application of environment data.36 It has been noted that the heterogeneity of associations between area socioeconomic characteristics and self-injury may be due to the lack of consistency in study designs and environment-level data collection. There are no guidelines on how to test environment-level factors with self-injury.

Lastly, this dissertation found that individuals with self-injury suffer adverse outcomes in psychiatric and physical health which are not mitigated by environment-level factors.

Future Research

Self-injury has a long history of research.33 Yet, key definitions and agreement on terminology are lacking among self-injury researchers.20-22 Self-injury can encompass self-injuries with both suicidal and non-suicidal intents, but such a distinction is not always possible since the intention may change over time.60,116 Since non-suicidal self-injury is highly correlated with future suicidal self-injury,117 researchers have proposed that non-suicidal self-injury with suicidal intent.118 Measurement of intent and clarification of vocabulary are critical.

Measurement of self-injury can also be improved. This dissertation relied on ICD codes (E950-E959), which are commonly used in self-injury studies and have a PPV over 82%.55,103 However, a recent systematic review noted that this high PPV is applicable to
self-injury hospitalizations and not to suicides.60 The performance of these ICD codes for less severe self-injury is unknown.

A recurrent goal of the National Action Alliance for Suicide Prevention, and a goal of many others, clinicians or otherwise, is to predict who is at risk of self-injury.119 The poor prediction performance of scales currently used in clinical practice has been recognized.99,101,120 Currently, scales commonly used to quantify self-injurious thoughts99 and recent studies using health care data have been shown to predict self-injury better than clinicians.121 With the increasing availability of health care data and other novel sources of data that reflect our environment and lifestyle, data-driven models may be able to outperform our current abilities.
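The observation in the Summary of Findings that PPV remains low for a rare outcome follows from Bayes' theorem. The sketch below is an illustration only: the sensitivity, specificity, and prevalence values are hypothetical and are not estimates from this dissertation.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """PPV = P(outcome | positive prediction), computed via Bayes' theorem."""
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1 - specificity) * (1 - prevalence)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


# Hypothetical classifier: 80% sensitivity and 90% specificity,
# evaluated at three hypothetical outcome prevalences.
for prevalence in (0.0005, 0.005, 0.05):
    ppv = positive_predictive_value(0.80, 0.90, prevalence)
    print(f"prevalence = {prevalence:.4f} -> PPV = {ppv:.4f}")
```

Even with this fairly accurate hypothetical classifier, the PPV at the lowest prevalence falls below 0.01, because false positives drawn from the very large unaffected population outnumber the true positives.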
APPENDIX
SUPPLEMENTARY MATERIAL

Table A-1. List of ICD-9-CM codes used to define cases.

| ICD-9 Code | Code Description |
|---|---|
| E950 | Suicide and self-inflicted poisoning by solid or liquid substances |
| E951 | Suicide and self-inflicted poisoning by gases in domestic use |
| E952 | Suicide and self-inflicted poisoning by other gases and vapors |
| E953 | Suicide and self-inflicted injury by hanging, strangulation, and suffocation |
| E954 | Suicide and self-inflicted injury by submersion [drowning] |
| E955 | Suicide and self-inflicted injury by firearms, air guns, and explosives |
| E956 | Suicide and self-inflicted injury by cutting and piercing instrument |
| E957 | Suicide and self-inflicted injuries by jumping from high place |
| E958 | Suicide and self-inflicted injury by other and unspecified means |
Table A-2. List of ICD-9-CM codes used to define controls.

| ICD-9 Code | Code Description |
|---|---|
| E000 | External cause status |
| E001-E030 | Activity |
| E800-E807 | Railway accidents |
| E810-E819 | Motor vehicle traffic accidents |
| E820-E825 | Motor vehicle nontraffic accidents |
| E826-E829 | Other road vehicle accidents |
| E830-E838 | Water transport accidents |
| E840-E845 | Air and space transport accidents |
| E846-E849 | Vehicle accidents, not elsewhere classifiable |
| E850-E858 | Accidental poisoning by drugs, medicinal substances, and biologicals |
| E860-E869 | Accidental poisoning by other solid and liquid substances, gases, and vapors |
| E870-E876 | Misadventures to patients during surgical and medical care |
| E878-E879 | Surgical and medical procedures as the cause of abnormal reaction of patient or later complication, without mention of misadventure at the time of procedure |
| E880-E888 | Accidental falls |
| E890-E899 | Accidents caused by fire and flames |
| E900-E909 | Accidents due to natural and environmental factors |
| E910-E915 | Accidents caused by submersion, suffocation, and foreign bodies |
| E916-E928 | Other accidents |
| E929 | Late effects of accidental injury |
| E930-E949 | Drugs, medicinal and biological substances causing adverse effects in therapeutic use |
| E950-E959 | Suicide and self-inflicted injury |
| E960-E969 | Homicide and injury purposely inflicted by other persons |
| E970-E979 | Legal intervention |
| E980-E989 | Injury undetermined whether accidentally or purposely inflicted |
| E990-E999 | Injury resulting from operations of war |
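The code lists in Tables A-1 and A-2 suggest a simple rule for assigning a visit to a cohort from its external-cause (E) codes. The sketch below is a hypothetical illustration, not the procedure used in this dissertation: the function names, the accepted code formats (with or without a decimal point), and the rule that any self-injury code takes precedence over other injury codes are all assumptions.

```python
import re

# Self-injury range per Table A-1 (E950-E958). Elsewhere the text cites
# E950-E959; pass (950, 959) to include late effects of self-inflicted injury.
SELF_INJURY_RANGE = (950, 958)


def ecode_number(code):
    """Return the 3-digit category of an ICD-9-CM E-code, or None if not an E-code.

    Accepts forms such as "E950", "E950.0", and "E9500" (assumed formats).
    """
    match = re.fullmatch(r"E(\d{3})(?:\.?\d)?", code.strip().upper())
    return int(match.group(1)) if match else None


def classify_visit(codes, self_injury_range=SELF_INJURY_RANGE):
    """Classify a visit from its list of diagnosis and external-cause codes."""
    low, high = self_injury_range
    categories = [n for n in map(ecode_number, codes) if n is not None]
    if any(low <= n <= high for n in categories):
        return "self-injury"  # assumption: self-injury takes precedence
    if categories:
        return "other injury"
    return "no external-cause code"


print(classify_visit(["E9500", "9694"]))    # self-poisoning E-code present
print(classify_visit(["E880.1", "E956"]))   # fall plus cutting/piercing
print(classify_visit(["E8859"]))            # fall only
print(classify_visit(["80121"]))            # no E-code at all
```

The precedence rule matters because a single visit can carry several E-codes, and the percentages in Table 5-1 for the comparison cohort sum to more than 100%, which suggests that multiple codes per individual do occur.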
Table A-3. Environmental variables used in this study with their sources*.

| Source | Variable Name | Variable Description |
|---|---|---|
| Census | cPop | Total population |
| Census | cFemale | Number of females |
| Census | cWhite | Number of whites |
| Census | cBlack | Number of blacks |
| Census | cOther | Number of other races |
| Census | cHisp | Number of Hispanics |
| Census | cAge | Median age of individuals |
| ACS | aHValue | Median value of occupied housing units |
| ACS | aHSize | Average household size |
| ACS | aInc | Median household income |
| ACS | aVet | Number of veterans |
| ACS | aPov | Percentage of households living below poverty |
| ACS | aVacant | Percentage of housing units that are vacant |
| ACS | aDis | Number of disabled individuals |
| CBP | bEst | Number of business establishments |
| CBP | bEmp | Number of employees |
| CBP | bAmb | Number of ambulatory health care services |
| CBP | bSocial | Number of social assistance facilities |
| FARA | Desert | Designation as food desert based on income and food access |

* 2010 decennial census; 2011 and 2012 American Community Survey (ACS) 5-year estimates; 2005-2014 County Business Patterns (CBP); and 2006, 2010, and 2015 Food Access Research Atlas (FARA) were used.
Table A-4. Brief list of 2007 North American Industry Classification System (NAICS) designations for health care facilities in County Business Patterns data. Full detailed list can be accessed: https://www.census.gov/cgi-bin/sssd/naics/naicsrch?chart=2007.

| NAICS | NAICS Description |
|---|---|
| 62 | Health care and social assistance |
| 621 | Ambulatory health care services |
| 6211 | Offices of physicians |
| 6212 | Offices of dentists |
| 6213 | Offices of other health practitioners |
| 62131 | Offices of chiropractors |
| 62132 | Offices of optometrists |
| 62133 | Offices of mental health practitioners (except physicians) |
| 62134 | Offices of physical, occupational and speech therapists, and audiologists |
| 62139 | Offices of all other health practitioners |
| 6214 | Outpatient care centers |
| 62141 | Family planning centers |
| 62142 | Outpatient mental health and substance abuse centers |
| 62149 | Other outpatient care centers |
| 621491 | HMO medical centers |
| 621492 | Kidney dialysis centers |
| 621493 | Freestanding ambulatory surgical and emergency centers |
| 621498 | All other outpatient care centers |
| 6215 | Medical and diagnostic laboratories |
| 621511 | Medical laboratories |
| 621512 | Diagnostic imaging centers |
| 6216 | Home health care services |
| 6219 | Other ambulatory health care services |
| 62191 | Ambulance services |
| 62199 | All other ambulatory health care services |
| 621991 | Blood and organ banks |
| 621999 | All other miscellaneous ambulatory health care services |
| 624 | Social assistance |
| 6241 | Individual and family services |
| 62411 | Child and youth services |
| 62412 | Services for the elderly and persons with disabilities |
| 62419 | Other individual and family services |
| 6242 | Community food and housing, and emergency and other relief services |
| 62421 | Community food services |
| 624221 | Temporary shelters |
| 624229 | Other community housing services |
| 62423 | Emergency and other relief services |
| 6243 | Vocational rehabilitation services |
| 6244 | Child day care services |
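Because NAICS codes are hierarchical, the two facility counts in Table A-3 (bAmb and bSocial) can in principle be derived by prefix matching on the codes in Table A-4. The sketch below illustrates one way this might be done; the record layout, ZIP codes, and counts are hypothetical, and it assumes every record is reported at a single level of detail (6-digit codes), since summing across levels of the hierarchy would double count establishments.

```python
from collections import defaultdict


def naics_group(code):
    """Map a NAICS code to a Table A-3 variable name by its 3-digit prefix."""
    code = str(code)
    if code.startswith("621"):
        return "bAmb"      # ambulatory health care services
    if code.startswith("624"):
        return "bSocial"   # social assistance
    return None            # outside the two groups of interest


def count_by_zip(records):
    """Sum establishment counts per ZIP code for each group.

    `records` is an iterable of (zip_code, naics_code, n_establishments)
    tuples; this layout is an assumption made for the illustration.
    """
    counts = defaultdict(lambda: {"bAmb": 0, "bSocial": 0})
    for zip_code, naics_code, n_establishments in records:
        group = naics_group(naics_code)
        if group is not None:
            counts[zip_code][group] += n_establishments
    return dict(counts)


# Hypothetical records, not real County Business Patterns data.
records = [
    ("32601", "621491", 4),   # HMO medical centers
    ("32601", "621512", 6),   # diagnostic imaging centers
    ("32601", "624221", 2),   # temporary shelters
    ("32601", "622110", 3),   # a code outside 621/624, ignored
    ("32603", "621991", 1),   # blood and organ banks
]
print(count_by_zip(records))
```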
LIST OF REFERENCES

1. Centers for Disease Control and Prevention, National Center for Injury Prevention and Control. WISQARS (Web-based Injury Statistics Query and Reporting System). https://www.cdc.gov/injury/wisqars/index.html. Published 2018. Accessed January 31, 2018.
2. U.S. Department of Health and Human Services. Healthy People 2020: Mental Health and Mental Disorders. https://www.healthypeople.gov/2020/data-search/Search-the-Data#topic-area=3498; Published 2018. Accessed October 23, 2018.
3. Curtin SC, Warner M, Hedegaard H. Increase in Suicide in the United States, 1999-2014. Vol 241. Hyattsville, MD: National Center for Health Statistics; 2016. https://www.cdc.gov/nchs/products/databriefs/db241.htm. Accessed January 15, 2018.
4. Nock MK. Self-Injury. Annu Rev Clin Psychol. 2010;6(1):339-363. doi:10.1146/annurev.clinpsy.121208.131258
5. Centers for Disease Control and Prevention. Suicide Facts at a Glance 2015. 2015. https://www.cdc.gov/violenceprevention/pdf/suicide-datasheet-a.pdf. Accessed July 24, 2017.
6. Olfson M, Blanco C, Wall M, et al. National trends in suicide attempts among adults in the United States. JAMA Psychiatry. 2017;74(11):1095-1103. doi:10.1001/jamapsychiatry.2017.2582
7. Johns MM, Lowry R, Andrzejewski J, et al. Transgender Identity and Experiences of Violence Victimization, Substance Use, Suicide Risk, and Sexual Risk Behaviors Among High School Students - 19 States and Large Urban School Districts, 2017. MMWR Morb Mortal Wkly Rep. 2019;68(3):67-71. doi:10.15585/mmwr.mm6803a3
8. Conwell Y, Van Orden K, Caine ED. Suicide in older adults. Psychiatr Clin North Am. 2011;34(2):451-468, ix. doi:10.1016/j.psc.2011.02.002
9. Cramer RJ, Kapusta ND. A Social-Ecological Framework of Theory, Assessment, and Prevention of Suicide. Front Psychol. 2017;8:1756. doi:10.3389/fpsyg.2017.01756
10. Hawton K, Casañas i Comabella C, Haw C, Saunders K. Risk factors for suicide in individuals with depression: A systematic review. J Affect Disord. 2013;147(1-3):17-28. doi:10.1016/j.jad.2013.01.004
11. Fox KR, Franklin JC, Ribeiro JD, Kleiman EM, Bentley KH, Nock MK. Meta-analysis of risk factors for nonsuicidal self-injury. Clin Psychol Rev. 2015;42:156-167. doi:10.1016/j.cpr.2015.09.002
12. WHO. Preventing Suicide: A Global Imperative (World Health Organization, ed.). Geneva: World Health Organization; 2014.
13. Ten Have M, De Graaf R, Van Dorsselaer S, et al. Incidence and course of suicidal ideation and suicide attempts in the general population. Can J Psychiatry. 2009;54(12):824-833. doi:10.1177/070674370905401205
14. National Institute of Mental Health. Suicide. https://www.nimh.nih.gov/health/statistics/suicide.shtml. Published 2019. Accessed May 17, 2019.
15. Milner A, Hjelmeland H, Arensman E, De Leo D. Social-environmental factors and suicide mortality: A narrative review of over 200 articles. Sociol Mind. 2013;3(2):137-148.
16. Cheng D. Higher suicide death rate in rocky mountain states and a correlation to altitude. Wilderness Environ Med. 2010;21(2):177-178. doi:10.1016/j.wem.2010.01.004
17. Haws CA, Gray DD, Yurgelun-Todd DA, Moskos M, Meyer LJ, Renshaw PF. The possible effect of altitude on regional variation in suicide rates. Med Hypotheses. 2009;73(4):587-590. doi:10.1016/j.mehy.2009.05.040
18. Kim N, Mickelson JB, Brenner BE, Haws CA, Yurgelun-Todd DA, Renshaw PF. Altitude, Gun Ownership, Rural Areas, and Suicide. Am J Psychiatry. 2011;168(1):49-54. doi:10.1176/appi.ajp.2010.10020289
19. Crosby AE, Han B, Ortega LAG, Parks SE, Gfroerer J, Centers for Disease Control and Prevention (CDC). Suicidal thoughts and behaviors among adults - United States, 2008-2009. MMWR Surveill Summ. 2011;60(13):1-22. http://www.ncbi.nlm.nih.gov/pubmed/22012169. Accessed October 25, 2018.
20. Turecki G, Brent DA. Suicide and suicidal behaviour. Lancet. 2016;387(10024):1227-1239. doi:10.1016/S0140-6736(15)00234-2
21. Muehlenkamp JJ, Claes L, Havertape L, Plener PL. International prevalence of adolescent non-suicidal self-injury and deliberate self-harm. Child Adolesc Psychiatry Ment Health. 2012;6(1):10. doi:10.1186/1753-2000-6-10
22. Oquendo MA, Baca-Garcia E. Suicidal behavior disorder as a diagnostic entity in the DSM-5 classification system: advantages outweigh limitations. World Psychiatry. 2014;13(2):128-130. doi:10.1002/wps.20116
23. Crosby A, Ortega L, Melanson C. Self Directed Violence Surveillance: Uniform Definitions and Recommended Data Elements, Version 1.0. Atlanta, GA; 2011. https://www.cdc.gov/violenceprevention/pdf/Self Directed Violence a.pdf. Accessed February 15, 2018.
24. Klonsky ED, May AM, Saffer BY. Suicide, Suicide Attempts, and Suicidal Ideation. Annu Rev Clin Psychol 2016;12(1):307 330. doi:10.1146/annurev clinpsy 021815 093204
25. Choi NG, DiNitto DM, Marti CN, Kaplan MS. Older Suicide Decedents: Intent Disclosure, Mental and Physical Health, and Suicide Means. Am J Prev Med 2017;53(6):772 780. doi:10.1016/j.amepre.2017.07.021
26. Stone DM, Holland KM, Schiff LB, McIntosh WL. Mixed Methods Analysis of Sex Differences in Life Stressors of Middle Aged Suicides. Am J Prev Med 2016;51(5):S209 S218. doi:10.1016/J.AMEPRE.2016.07.021
27. Hempstead KA, Phillips JA. Rising suicide among adults aged 40 64 years: the role of job and financial circumstances. Am J Prev Med 2015;48(5):491 500. doi:10.1016/j.amepre.2014.11.006
28. de Raykeer RP, Hoertel N, Blanco C, et al. Effects of Psychiatric Disorders on Suicide Attempt. J Clin Psychiatry 2018;79(6). doi:10.4088/JCP.17m11911
29. Ting SA, Sullivan AF, Boudreaux ED, Miller I, Camargo CA. Trends in US emergency department visits for attempted suicide and self inflicted injury, 1993 2008. Gen Hosp Psychiatry 2012;34(5):557 565. doi:10.1016/J.GENHOSPPSYCH.2012.03.020
30. Bazargan Hejazi S, Ahmadi A, Bazargan M, et al. Profile of Hospital Admissions due to Self Inflicted Harm in Los Angeles County from 2001 to 2010. J Forensic Sci 2017;62(5):1244 1250. doi:10.1111/1556 4029.13416
31. Doshi A, Boudreaux ED, Wang N, Pelletier AJ, Camargo CA. National Study of US Emergency Department Visits for Attempted Suicide and Self Inflicted Injury, 1997 2001. Ann Emerg Med 2005;46(4):369 375. doi:10.1016/J.ANNEMERGMED.2005.04.018
32. Mathews EM, Woodward CJ, Musso MW, Jones GN. Suicide Attempts Presenting to Trauma Centers: Trends across Age Groups Using the National Trauma Data Bank. Am J Emerg Med 2016;34(8):1620 1624. doi:10.1016/j.ajem.2016.06.014
33. Franklin JC, Ribeiro JD, Fox KR, et al. Risk factors for suicidal thoughts and behaviors: A meta analysis of 50 years of research. Psychol Bull 2017;143(2):187 232. doi:10.1037/bul0000084
34. National Action Alliance for Suicide Prevention. A Prioritized Research Agenda for Suicide Prevention: An Action Plan to Save Lives. 2014. http://actionallianceforsuicideprevention.org/sites/actionallianceforsuicideprevention.org/files/Agenda.pdf. Accessed June 9, 2017.
35. Moden B, Ohlsson H, Merlo J, Rosvall M. Risk factors for diagnosed intentional self injury: a total population based study. Eur J Public Health 2014;24(2):286 291. doi:10.1093/eurpub/ckt066
36. Rehkopf DH, Buka SL. The association between suicide and the socio economic characteristics of geographical areas: a systematic review. Psychol Med 2005;36(02):145. doi:10.1017/S003329170500588X
37. Li G, Baker SP. Injury Research: Theories, Methods, and Approaches. Springer; 2012.
38. Florida Department of Health. Injury Prevention. http://www.floridahealth.gov/programs and services/prevention/injury prevention/index.html. Published 2018. Accessed October 24, 2018.
39. American Psychiatric Association. Diagnostic and Statistical Manual of Mental Disorders. In: Diagnostic and Statistical Manual of Mental Disorders. Washington, D.C.: American Psychiatric Association; 2013.
40. Agency for Healthcare Research and Quality. Introduction to the HCUP state inpatient databases (SID). Healthc Cost Util Proj 2012;4287(866). http://www.hcup us.ahrq.gov/db/state/siddist/Introduction_to_SID.pdf.
41. Healthcare Cost and Utilization Project (HCUP). HCUP Supplemental Variables for Revisit Analyses. https://www.hcup us.ahrq.gov/toolssoftware/revisit/revisit.jsp. Published 2019. Accessed June 10, 2019.
42. Metcalfe D, Zogg CK, Haut ER, Pawlik TM, Haider AH, Perry DC. Data resource profile: State Inpatient Databases. Int J Epidemiol July 2019. doi:10.1093/ije/dyz117
43. Quan H, Sundararajan V, Halfon P, et al. Coding algorithms for defining comorbidities in ICD 9 CM and ICD 10 administrative data. Med Care 2005;43(11):1130 1139. http://www.ncbi.nlm.nih.gov/pubmed/16224307. Accessed January 7, 2019.
44. Elixhauser A, Steiner C, Palmer L. Clinical Classifications Software (CCS). http://www.hcup us.ahrq.gov/toolssoftware/ccs/ccs.jsp. Published 2015.
45. Hedegaard H, Crosby A, Holland K. Issues in Developing a Surveillance Case Definition for Nonfatal Suicide Attempt and Intentional Self harm Using International Classification of Diseases, Tenth Revision, Clinical Modification (ICD 10 CM) Coded Data. 2018. https://www.cdc.gov/nchs/data/nhsr/nhsr108.pdf. Accessed July 10, 2018.
46. United States Census Bureau. Decennial Census of Population and Housing. https://www.census.gov/programs surveys/decennial census/decade.2010.html. Published 2019. Accessed April 25, 2019.
47. United States Census Bureau. American Community Survey (ACS). https://www.census.gov/programs surveys/acs/. Published 2019. Accessed April 25, 2019.
48. United States Census Bureau. County Business Patterns (CBP). https://www.census.gov/programs surveys/cbp.html. Published 2019. Accessed April 25, 2019.
49. United States Department of Agriculture. Food Access Research Atlas. https://www.ers.usda.gov/data products/food access research atlas/. Published 2017. Accessed April 25, 2019.
50. American Academy of Family Physicians. ZCTA to ZIPCode Crosswalk. https://www.udsmapper.org/zcta crosswalk.cfm. Published 2019. Accessed April 5, 2019.
51. Casey JA, Schwartz BS, Stewart WF, Adler NE. Using Electronic Health Records for Population Health Research: A Review of Methods and Applications. http://dx.doi.org/101146/annurev publhealth 032315 021353 March 2016. doi:10.1146/ANNUREV PUBLHEALTH 032315 021353
52. Luoma JB, Martin CE, Pearson JL. Contact With Mental Health and Primary Care Providers Before Suicide: A Review of the Evidence. Am J Psychiatry 2002;159(6):909 916. doi:10.1176/appi.ajp.159.6.909
53. Trofimovich L, Skopp NA, Luxton DD, Reger MA. Health care experiences prior to suicide and self inflicted injury, active component, U.S. Armed Forces, 2001 2010. MSMR 2012;19(2):2 6. http://www.ncbi.nlm.nih.gov/pubmed/22372750. Accessed July 13, 2017.
54. Lachman ME. Development in Midlife. Annu Rev Psychol 2004;55(1):305 331. doi:10.1146/annurev.psych.55.090902.141521
55. Barak Corren Y, Castro VM, Javitt S, et al. Predicting suicidal behavior from longitudinal electronic health records. Am J Psychiatry 2017;174(2):154 162. doi:10.1176/appi.ajp.2016.16010077
56. Marshall SW. Injury case Epidemiology 2008;19(2):277 279. doi:10.1097/EDE.0b013e3181632700
57. Postolache TT, Mortensen PB, Tonelli LH, et al. Seasonal spring peaks of suicide in victims with and without prior history of hospitalization for mood disorders. J Affect Disord 2010;121(1 2):88 93. doi:10.1016/J.JAD.2009.05.015
58. Annest JL, Fingerhut LA, Gallagher SS, et al. Strategies to improve external cause of injury coding in state based hospital discharge and emergency department data systems: recommendations of the CDC Workgroup for Improvement of External Cause of Injury Coding. Morb Mortal Wkly Rep 2008;57(RR 1):1 15. http://www.ncbi.nlm.nih.gov/pubmed/18368008.
59. Hunt PR, Hackman H, Berenholz G, McKeown L, Davis L, Ozonoff V. Completeness and accuracy of international classification of disease (ICD) external cause of injury codes in emergency department electronic data. Inj Prev 2007;13(6):422 425. doi:10.1136/ip.2007.015859
60. Walkup JT, Townsend L, Crystal S, Olfson M. A systematic review of validated methods for identifying suicide or suicidal ideation using administrative or claims data. Pharmacoepidemiol Drug Saf 2012;21(S1):174 182. doi:10.1002/pds.2335
61. Hawton K, Sutton L, Haw C, Sinclair J, Deeks JJ. Schizophrenia and suicide: systematic review of risk factors. Br J Psychiatry 2005;187(1):9 20. doi:10.1192/bjp.187.1.9
62. Cheong K S, Choi M H, Cho B M, et al. Suicide rate differences by sex, age, and urbanicity, and related regional factors in Korea. J Prev Med Public Health 2012;45(2):70 77. doi:10.3961/jpmph.2012.45.2.70
63. Cho S E, Na K S, Cho S J, Im J S, Kang S G. Geographical and temporal variations in the prevalence of mental disorders in suicide: Systematic review and meta analysis. J Affect Disord 2016;190:704 713. doi:10.1016/J.JAD.2015.11.008
64. Hempstead K. The geography of self injury: Spatial patterns in attempted and completed suicide. Soc Sci Med 2006;62(12):3186 3196. doi:10.1016/J.SOCSCIMED.2005.11.038
65. Bircher J, Kuruvilla S. Defining health by addressing individual, social, and environmental determinants: New opportunities for health care and public health. J Public Health Policy 2014;35(3):363 386. doi:10.1057/jphp.2014.19
66. Young R, Sweeting H, Ellaway A. Do schools differ in suicide risk? the influence of school and neighbourhood on attempted suicide, suicidal ideation and self harm among secondary school pupils. BMC Public Health 2011;11(1):874. doi:10.1186/1471 2458 11 874
67. Hawton K, Harriss L, Hodder K, Simkin S, Gunnell D. The influence of the economic and social environment on deliberate self harm and suicide: An ecological and person based study. Psychol Med 2001;31(5):827 836. doi:10.1017/S0033291701003993
68. Martikainen P, Mäki N, Blomgren J. The effects of area and individual social characteristics on suicide risk: A multilevel study of relative contribution and effect modification. Eur J Popul 2004;20(4):323 350. doi:10.1007/s10680 004 3807 1
69. US Census Bureau. American Community Survey (ACS): When to Use 1 year, 3 year, or 5 year Estimates. https://www.census.gov/programs surveys/acs/guidance/estimates.html. Published 2018. Accessed April 19, 2019.
70. Singh GK. Area Deprivation and Widening Inequalities in US Mortality, 1969 1998. Am J Public Health 2003;93(7):1137 1143. doi:10.2105/AJPH.93.7.1137
71. Breiman L. Random Forests. Mach Learn 2001;45(1):5 32. doi:10.1023/A:1010933404324
72. Strobl C, Boulesteix A L, Zeileis A, Hothorn T. Bias in random forest variable importance measures: Illustrations, sources and a solution. BMC Bioinformatics 2007;8(1):25. doi:10.1186/1471 2105 8 25
73. Bouckaert RR, Frank E. Evaluating the Replicability of Significance Tests for Comparing Learning Algorithms. https://www.cs.waikato.ac.nz/~eibe/pubs/bouckaert_and_frank.pdf. Accessed April 23, 2019.
74. Lopez Castroman J, Perez Rodriguez M de las M, Jaussent I, et al. Distinguishing the relevant features of frequent suicide attempters. J Psychiatr Res 2011;45(5):619 625. doi:10.1016/J.JPSYCHIRES.2010.09.017
75. Li X, Sundquist S, Johansson S E. Effects of neighbourhood and individual factors on injury risk in the entire Swedish population: a 12 month multilevel follow up study. Eur J Epidemiol 2008;23(3):191 203. doi:10.1007/s10654 007 9219 x
76. Grubesic TH, Matisziw TC. On the use of ZIP codes and ZIP code tabulation areas (ZCTAs) for the spatial analysis of epidemiological data. Int J Health Geogr 2006;5(1):58. doi:10.1186/1476 072X 5 58
77. Maciejewski DF, Creemers HE, Lynskey MT, et al. Overlapping Genetic and Environmental Influences on Nonsuicidal Self injury and Suicidal Ideation. JAMA Psychiatry 2014;71(6):699. doi:10.1001/jamapsychiatry.2014.89
78. Tidemalm D, Runeson B, Waern M, et al. Familial clustering of suicide risk: a total population study of 11.4 million individuals. Psychol Med 2011;41(12):2527 2534. doi:10.1017/S0033291711000833
79. Hatzenbuehler ML. The social environment and suicide attempts in lesbian, gay, and bisexual youth. Pediatrics 2011;127(5):896 903. doi:10.1542/peds.2010 3020
80. Verrocchio MC, Carrozzino D, Marchetti D, Andreasson K, Fulcheri M, Bech P. Mental Pain and Suicide: A Systematic Review of the Literature. Front Psychiatry 2016;7:108. doi:10.3389/fpsyt.2016.00108
81. Qin P, Agerbo E, Mortensen PB. Suicide Risk in Relation to Socioeconomic, Demographic, Psychiatric, and Familial Factors: A National Register Based Study of All Suicides in Denmark, 1981 1997. Am J Psychiatry 2003;160(4):765 772. doi:10.1176/appi.ajp.160.4.765
82. Agerbo E, Sterne JAC, Gunnell DJ. Combining individual and ecological data to determine compositional and contextual socio economic risk factors for suicide. Soc Sci Med 2007;64(2):451 461. doi:10.1016/J.SOCSCIMED.2006.08.043
83. Control Study. Bmj 2002;325(July):1 5. doi:10.1136/bmj.325.7355.74
84. Haegerich TM, Sugerman DE, Annest JL, Klevens J, Baldwin GT. Improving Injury Prevention Through Health Information Technology. Am J Prev Med 2015;48(2):219 228. doi:10.1016/J.AMEPRE.2014.08.018
85. Prosperi M, Min JS, Bian J, Modave F. Big data hurdles in precision medicine and precision public health. BMC Med Inform Decis Mak 2018. doi:10.1186/s12911 018 0719 2
86. Runeson B, Tidemalm D, Dahlin M, Lichtenstein P, Langstrom N. Method of attempted suicide as predictor of subsequent successful suicide: national long term cohort study. BMJ 2010;341(jul13 1):c3222 c3222. doi:10.1136/bmj.c3222
87. Owens D, Horrocks J, House A. Fatal and non fatal repetition of self harm. Br J Psychiatry 2002;181(03):193 199. doi:10.1192/bjp.181.3.193
88. Chen VCH, Tan HKL, Chen C Y, et al. Mortality and suicide after self harm: community cohort study in Taiwan. Br J Psychiatry 2011;198(1):31 36. doi:10.1192/bjp.bp.110.080952
89. Finkelstein Y, Macdonald EM, Hollands S, et al. Risk of Suicide Following Deliberate Self poisoning. JAMA Psychiatry 2015;72(6):570. doi:10.1001/jamapsychiatry.2014.3188
90. Larkin C, Di Blasi Z, Arensman E. Risk Factors for Repetition of Self Harm: A Systematic Review of Prospective Hospital Based Studies. Lai Y H, ed. PLoS One 2014;9(1):e84282. doi:10.1371/journal.pone.0084282
91. Haukka J, Suominen K, Partonen T, Lonnqvist J. Determinants and Outcomes of Serious Attempted Suicide: A Nationwide Study in Finland, 1996 2003. Am J Epidemiol 2008;167(10):1155 1163. doi:10.1093/aje/kwn017
92. Goldman Mellor SJ, Caspi A, Harrington H, et al. Suicide Attempt in Young People. JAMA Psychiatry 2014;71(2):119. doi:10.1001/jamapsychiatry.2013.2803
93. De Leo D, Cerin E, Spathonis K, Burgis S. Lifetime risk of suicide ideation and attempts in an Australian community: Prevalence, suicidal process, and help seeking behaviour. J Affect Disord 2005;86(2 3):215 224. doi:10.1016/J.JAD.2005.02.001
94. Morgan C, Webb RT, Carr MJ, et al. Self harm in a primary care cohort of older people: incidence, clinical management, and risk of suicide and other causes of death. The Lancet Psychiatry 2018;0(0). doi:10.1016/S2215 0366(18)30348 1
95. Hawton K, Witt KG, Salisbury TLT, et al. Psychosocial interventions following self harm in adults: a systematic review and meta analysis. The Lancet Psychiatry 2016;3(8):740 750. doi:10.1016/S2215 0366(16)30070 0
96. Morthorst B, Krogh J, Erlangsen A, Alberdi F, Nordentoft M. Effect of assertive outreach after suicide attempt in the AID (assertive intervention for deliberate self harm) trial: randomised controlled trial. BMJ 2012;345:e4972. doi:10.1136/bmj.e4972
97. Spirito A, Plummer B, Gispert M, et al. Adolescent suicide attempts: Outcomes at follow up. Am J Orthopsychiatry 1992;62(3):464 468. doi:10.1037/h0079362
98. Milner AJ, Carter G, Pirkis J, Robinson J, Spittal MJ. Letters, green cards, telephone calls and postcards: Systematic and meta analytic review of brief contact interventions for reducing self harm, suicide attempts and suicide. Br J Psychiatry 2015;206(3):184 190. doi:10.1192/bjp.bp.114.147819
99. Carter G, Milner A, McGill K, Pirkis J, Kapur N, Spittal MJ. Predicting suicidal behaviours using clinical instruments: Systematic review and meta analysis of positive predictive values for risk scales. Br J Psychiatry 2017;210(06):387 395. doi:10.1192/bjp.bp.116.182717
100. Mann JJ, Apter A, Bertolote J, et al. Suicide Prevention Strategies. JAMA 2005;294(16):2064. doi:10.1001/jama.294.16.2064
101. Gaynes BN, West SL, Ford CA, Frame P, Klein J, Lohr KN. Screening for Suicide Risk in Adults: A Summary of the Evidence for the U.S. Preventive Services Task Force. Ann Intern Med 2004;140(10):822. doi:10.7326/0003 4819 140 10 200405180 00015
102. Steeg S, Quinlivan L, Nowland R, et al. Accuracy of risk scales for predicting repeat self harm and suicide: a multicentre, population level cohort study using routine clinical data. BMC Psychiatry 2018;18(1):113. doi:10.1186/s12888 018 1693 z
103. Callahan ST, Fuchs DC, Shelton RC, et al. Identifying suicidal behavior among adolescents using administrative claims data. Pharmacoepidemiol Drug Saf 2013;22(7):769 775. doi:10.1002/pds.3421
104. Method for Describing Patients with Multiple Injuries and Evaluating Emergency Care. Trauma Injury Infect Crit Care 1974;14(3):187 196. http://journals.lww.com/jtrauma/Citation/1974/03000/The_Injury_Severity_Score__A_Method_for_Describing.1.aspx. Accessed April 12, 2017.
105. Clark DE, Black AW, Skavdahl DH, Hallagan LD. Open access programs for injury categorization using ICD 9 or ICD 10. Inj Epidemiol 2018;5(1):11. doi:10.1186/s40621 018 0149 8
106. Cubbin C, Smith GS. Socioeconomic Inequalities in Injury: Critical Issues in Design and Analysis. Annu Rev Public Health 2002;23(1):349 375. doi:10.1146/annurev.publhealth.23.100901.140548
107. Borges G, Angst J, Nock MK, Ruscio AM, Kessler RC. Risk factors for the incidence and persistence of suicide related outcomes: A 10 year follow up study using the National Comorbidity Surveys. J Affect Disord 2008;105(1 3):25 33. doi:10.1016/J.JAD.2007.01.036
108. Calati R, Laglaoui Bakhiyi C, Artero S, Ilgen M, Courtet P. The impact of physical pain on suicidal thoughts and behaviors: Meta analyses. J Psychiatr Res 2015;71:16 32. doi:10.1016/J.JPSYCHIRES.2015.09.004
109. Dube SR, Anda RF, Felitti VJ, Chapman DP, Williamson DF, Giles WH. Childhood Abuse, Household Dysfunction, and the Risk of Attempted Suicide Throughout the Life Span. JAMA 2001;286(24):3089. doi:10.1001/jama.286.24.3089
110. Fleischman RJ, Mann NC, Dai M, et al. Validating the Use of ICD 9 Code Mapping to Generate Injury Severity Scores. J Trauma Nurs 2017;24(1):4 14. doi:10.1097/JTN.0000000000000255
111. Tohira H, Jacobs I, Mountain D, Gibson N, Yeo A. Systematic review of predictive performance of injury severity scoring tools. Scand J Trauma Resusc Emerg Med 2012;20(1):63. doi:10.1186/1757 7241 20 63
112. Nakahara S, Yokota J. Revision of the International Classification of Diseases to include standardized descriptions of multiple injuries and injury severity. Bull World Health Organ 2011;89(3):238 240. doi:10.2471/BLT.10.078964
113. Hawton K, Arensman E, Townsend E, et al. Deliberate self harm: systematic review of efficacy of psychosocial and pharmacological treatments in preventing repetition. BMJ 1998;317(7156):441 447. doi:10.1136/BMJ.317.7156.441
114. Belsher BE, Smolenski DJ, Pruitt LD, et al. Prediction Models for Suicide Attempts and Deaths. JAMA Psychiatry March 2019. doi:10.1001/jamapsychiatry.2019.0174
115. Walsh CG, Ribeiro JD, Franklin JC. Predicting Risk of Suicide Attempts Over Time Through Machine Learning. Clin Psychol Sci 2017;5(3):457 469. doi:10.1177/2167702617691560
116. suicidal self injury v. attempted suicide: new diagnosis or false dichotomy? Br J Psychiatry 2013;202(05):326 328. doi:10.1192/bjp.bp.112.116111
117. Klonsky ED, May AM, Glenn CR. The relationship between nonsuicidal self injury and attempted suicide: Converging evidence from four samples. J Abnorm Psychol 2013;122(1):231 237. doi:10.1037/a0030278
118. Bryan CJ, Bryan AO, May AM, Klonsky ED. Trajectories of Suicide Ideation, Nonsuicidal Self Injury, and Suicide Attempts in a Nonclinical Sample of Military Personnel and Veterans. Suicide Life Threatening Behav 2015;45(3):315 325. doi:10.1111/sltb.12127
119. Glenn CR, Nock MK. Improving the Short Term Prediction of Suicidal Behavior. Am J Prev Med 2014;47(3):S176 S180. doi:10.1016/J.AMEPRE.2014.06.004
120. Mann JJ, Michel CA. Prevention of Firearm Suicide in the United States: What Works and What Is Possible. Am J Psychiatry 2016;173(10):969 979. doi:10.1176/appi.ajp.2016.16010069
121. Tran T, Luo W, Phung D, et al. Risk stratification using data from electronic medical records better predicts suicide risks than clinician assessments. BMC Psychiatry 2014;14:76. doi:10.1186/1471 244X 14 76
BIOGRAPHICAL SKETCH
Jae Min earned her Bachelor of Arts degree in chemistry from Rice University in 2011. During her undergraduate education, she earned her Emergency Medical Technician certificate and worked part time as an intern for Dr. Shelley Sazer at Baylor College of Medicine. Upon graduation, she continued to conduct cell and molecular biology research full time under the guidance of Dr. Sazer. Jae started her PhD at University of Florida in August 2015 under the mentorship of Dr. Mattia Prosperi. She worked on several phylogenetics projects as well as epidemiological studies using electronic health records and administrative claims data. Jae has served as the doctoral student representative during 2017 2018 Society for Pharmacoepidemiology Student Chapter during 2018 2019. She has received travel awards from University of Florida Graduate Student Council and University of Florida Center for European Studies. She also received an award to study statistical methods for big data at University of Washington in 2017. Jae completed her PhD in epidemiology in the summer of 2019.