Citation
Examining the Influence of Built Environment on Traffic Crashes: A Spatio-Temporal Data Mining Approach

Material Information

Title:
Examining the Influence of Built Environment on Traffic Crashes: A Spatio-Temporal Data Mining Approach
Creator:
Ouyang, Yiqiang
Publisher:
University of Florida
Publication Date:
Language:
English

Thesis/Dissertation Information

Degree:
Doctorate ( Ph.D.)
Degree Grantor:
University of Florida
Degree Disciplines:
Design, Construction, and Planning
Committee Chair:
BEJLERI,ILIR
Committee Co-Chair:
ZWICK,PAUL D
Committee Members:
STEINER,RUTH LORRAINE
SRINIVASAN,SIVARAMAKRISHNAN
Graduation Date:
5/3/2014

Subjects

Subjects / Keywords:
Databases ( jstor )
Land use ( jstor )
Mining ( jstor )
Pedestrian traffic ( jstor )
Physical trauma ( jstor )
Statistics ( jstor )
Streets ( jstor )
Syntactical antecedents ( jstor )
Traffic safety ( jstor )
Transportation ( jstor )
associationrulemining
data
gis
mining
spatiotemporal
Miami-Dade County ( local )
Genre:
Unknown ( sobekcm )

Notes

General Note:
Most current traffic crash analysis research is conducted at either the spot level or the regional level; less attention is paid to the community level. Additionally, the effects of changes in land use, population growth, and transportation networks on community traffic safety have not been systematically studied. Although some studies have defined variables to describe the influence of the built environment on traffic crashes, no research has examined this problem using a comprehensive framework of relevant built environment variables. Moreover, neither the spatial distance and topological relationships among built environment elements nor the temporal influence of the built environment on crashes has been considered. Furthermore, widely used statistical methods in traffic crash analysis, such as Poisson and negative binomial regression, are inherently limited by their assumptions and by pre-defined underlying relationships between dependent and independent variables, and they typically neglect the spatial dependency among crashes. Data mining techniques, which overcome these limitations, provide an alternative perspective for understanding the relationship between the built environment and crashes. This research explores a spatio-temporal association rule data mining method to understand the effects of the built environment on traffic crashes. The census block group is selected as the unit of analysis. The D variables framework from transportation research, which includes the dimensions of density, diversity, design, destination accessibility, and distance to transit, is used to characterize the built environment. Most importantly, the land use mix indicator is used as part of the Design dimension. The study calculates the spatial distances and topological relationships among the participating built environment spatial data elements. Crashes are categorized by type and divided into four time groups based on their statistical distribution over the 24 hours of the day.
The crash and built environment data are processed in a GIS environment. The association rule data mining technique is applied to explore associations between the built environment variables and the frequency of each crash type. The results show that many rules were mined for all crash types except fatal crashes: no pattern shows an influence of the built environment on fatal crashes. The same outcome resulted for pedestrian crashes in time group one and for bike crashes in all time groups except time group three. Composed of a rule antecedent and a rule consequent, each rule directly describes how the built environment (antecedent) influences a certain crash type (consequent). All resulting rules combine D variables, distance variables, and topological variables, which suggests that the interaction of built environment elements influences the occurrence of crashes. The same built environment variable has a different effect when combined with different variables and in different time groups. GIS mapping of the strongest rules can help identify high-risk communities and examine how the influence of the built environment may change across time periods. The results also show that highly mixed land use is associated with a high total number of crashes.

Record Information

Source Institution:
University of Florida
Holding Location:
University of Florida
Rights Management:
Copyright Ouyang, Yiqiang. Permission granted to the University of Florida to digitize, archive and distribute this item for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder.
Embargo Date:
5/31/2016

Full Text

EXAMINING THE INFLUENCE OF BUILT ENVIRONMENT ON TRAFFIC CRASHES: A SPATIO-TEMPORAL DATA MINING APPROACH

By

YIQIANG OUYANG

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2014

© 2014 Yiqiang Ouyang

To my family

ACKNOWLEDGMENTS

First, I would like to express my deepest gratitude to my advisor, Dr. Ilir Bejleri, for his endless help and support during my Ph.D. study at the University of Florida. His creative research ideas and insightful thoughts have inspired me throughout. I would also like to thank my committee members, Dr. Ruth Steiner, Dr. Paul Zwick, and Dr. Sivaramakrishnan Srinivasan. I appreciate their time, effort, support, and invaluable suggestions, which helped me finish my research. I give my special appreciation to Nathaniel Wingfield, Dan Brown, and Marni Fowler of the GeoPlan Center. They gave me a lot of help and encouragement. I also appreciate the help and encouragement of the faculty and staff in the Department of Urban and Regional Planning and the College of Design, Construction and Planning during my Ph.D. study. Last but not least, I am deeply indebted to my parents and sister for their endless love and support. I appreciate my dear wife for her boundless love, encouragement, support, and advice. Without her I could not have finished my Ph.D. study.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
ABSTRACT

CHAPTER

1 INTRODUCTION

2 LITERATURE REVIEW
    The Spatial Nature of a Crash
    D Variables Framework for Built Environment
        Density
        Diversity
        Design
        Destination Accessibility
        Distance to Transit
    Community Level Crash Analysis
    Statistics Methods for Crash Analysis
        Traditional Engineering Statistical Methods
        Spatial Statistics Methods for Traffic Safety Analysis
            Spatial a
            High/low clustering (Getis-Ord general G)
            Cluster and outlier analysis (Anselin l
            Hot spot analysis (Getis-Ord Gi*)
            Advantages and disadvantages of spatial statistics method
    Data Mining for Crash Analysis
        Data Mining
        Association Rule Data Mining
        Spatial Association Rule Data Mining
        Current Application of Data Mining in Crash Analysis

3 RESEARCH METHODOLOGY
    Data
    Study Design
    Tools Used
    Crash Data Aggregation
        Crash Data Aggregation without Considering Crash Time
        Crash Data Aggregation Considering Crash Time
    D Variables
    Spatial Distance and Topological Relationship
        Introduction
        Methods of Spatial Relationship Extraction and Calculation
        Results of Spatial Relationship Extraction and Calculation
    Discretization
        Methods of Discretization
        Result of Discretization
    Association Rule Data Mining

4 RESULTS AND DISCUSSION
    Result Overview
    Spatial Analysis of the Influence of Built Environment on Crashes
        Total Crashes
        PDO Crashes
        Injury Crashes
        Pedestrian Crashes
        Bike Crashes
    Spatio-Temporal Analysis of the Influence of Built Environment on Crashes
        Total Crashes
        PDO Crashes
        Injury Crashes
        Pedestrian Crashes
        Bike Crashes
    Influence of Mixed Land Use on Crashes

5 CONCLUSIONS, RECOMMENDATIONS AND FUTURE RESEARCH

APPENDIX

A SPATIAL DISTANCE RELATIONSHIP CALCULATION AND TOPOLOGICAL RELATIONSHIP EXTRACTION SOURCE CODE SAMPLE

B DISCRETIZATION AND ASSOCIATION RULE MINING SOURCE CODE SAMPLE

LIST OF REFERENCES

BIOGRAPHICAL SKETCH

LIST OF TABLES

Table

2-1   D Variables of built environment used in some current crash research
2-2   Summary of existing statistics models for analyzing crash frequency data
3-1   Number of each crash type
3-2   Number of block group that has no crash
3-3   Summary of the number of crashes by block group for all crash types
3-4   Summary of total crash by time
3-5   Time group of crashes
3-6   Summary of block groups have no crashes for each crash type in different time groups
3-7   Statistics summary of crashes by block group for time group 1
3-8   Statistics summary of crashes by block group for time group 2
3-9   Statistics summary of crashes by block group for time group 3
3-10  Statistics summary of crashes by block group for time group 4
3-11  Summary statistics for D variables by block group
3-12  Layers participate the spatial relationship extraction
3-13  Topological relationships need to be extracted
3-14  Distance relationship need to be extracted
3-15  Summary statistics for spatial distance relationship by block group level
3-16  Summary statistics for spatial topological relationship extracted by block group level
3-17  Summary statistics of all spatial distance variables
3-18  Crashes discretization without considering the occurrence time
3-19  Total crash discretization considering the occurrence time
3-20  PDO crash discretization considering the occurrence time
3-21  Injury crash discretization considering the occurrence time
3-22  Pedestrian crash discretization considering the occurrence time
3-23  Bike crash discretization considering the occurrence time
3-24  D variables discretization
3-25  Distance variables discretization
4-1   Number of rules that mined for each crash type
4-2   Frequent items of top 100 rules for total crashes rules
4-3   Frequent items of top 100 rules for PDO crashes rules
4-4   Frequent items of top 100 rules for injury crashes rules
4-5   Frequent items of top 100 rules for pedestrian crashes rules
4-6   Frequent items of top 45 rules for bike crashes rules
4-7   Top 10 rules for total crash number
4-8   Top 10 rules for PDO crash
4-9   Top 10 rules for injury crash
4-10  Top 10 rules for pedestrian crash
4-11  Top 10 rules for bike crash
4-12  Frequent items for top 100 rules for total crashes in time group 1
4-13  Frequent items for top 100 rules for total crashes in time group 2
4-14  Frequent items for top 100 rules for total crashes in time group 3
4-15  Frequent items for top 100 rules for total crashes in time group 4
4-16  Frequent items for top 100 rules for PDO crashes in time group 1
4-17  Frequent items for top 100 rules for PDO crashes in time group 2
4-18  Frequent items for top 100 rules for PDO crashes in time group 3
4-19  Frequent items for top 100 rules for PDO crashes in time group 4
4-20  Frequent items for top 100 rules for injury crashes in time group 1
4-21  Frequent items for top 100 rules for injury crashes in time group 2
4-22  Frequent items for top 100 rules for injury crashes in time group 3
4-23  Frequent items for top 100 rules for injury crashes in time group 4
4-24  Frequent items for top 100 rules for pedestrian crashes in time group 2
4-25  Frequent items for top 100 rules for pedestrian crashes in time group 3
4-26  Frequent items for top 100 rules for pedestrian crashes in time group 4
4-27  Frequent items for top 100 rules for bike crashes in time group 3
4-29  Top 10 rules for total crash in time group 2
4-30  Top 10 rules for total crash in time group 3
4-31  Top 10 rules for total crash in time group 4
4-33  Top 10 rules for PDO crash in time group 2
4-34  Top 10 rules for PDO crash in time group 3
4-35  Top 10 rules for PDO crash in time group 4
4-36  Top 10 rules for injury crash in time group 1
4-37  Top 10 rules for injury crash in time group 2
4-38  Top 10 rules for injury crash in time group 3
4-39  Top 10 rules for injury crash in time group 4
4-40  Top 10 rules for pedestrian crash in time group 2
4-41  Top 10 rules for pedestrian crash in time group 3
4-42  Top 10 rules for pedestrian crash in time group 4
4-43  Top 10 rules for bike crash in time group 3
4-44  Influence of land use mix on crashes without considering crash time
4-45  Influence of land use mix on crashes considering crash time

LIST OF FIGURES

Figure

2-1   Conceptual Framework Linking the Built Environment and Traffic Safety
3-1   Overview of the Miami-Dade County, FL
3-2   Residential land use
3-4   Institutional land use
3-5   Recreation land use
3-6   Industrial land use
3-7   Agricultural land use
3-8   Built environment and crash map for a block group
3-9   System design
3-10  Distribution of crash by block group
3-11  The distribution of number of crashes along with the time
3-12  Distribution of crash of time group 1 by block group
3-13  Distribution of crash of time group 2 by block group
3-14  Distribution of crash of time group 3 by block group
3-15  Distribution of crash of time group 4 by block group
3-16  Three spatial relationships
3-17  The 9 intersection model represented as a matrix
3-18  Spatial Topological Relationship Extraction and Distance Relationship Calculation Algorithm
3-19  Spatial distance distribution
4-1   Scatter plot of all rules for all crash types without considering crash occurrence time
4-2   Matching block groups for the top 5 rules of total crashes
4-3   Matching block groups for the top 5 rules of PDO crashes
4-4   Matching block groups for the top 5 rules of injury crashes
4-5   Matching block groups for the top 5 rules of pedestrian crashes
4-6   Matching block groups for the top 5 rules of bike crashes
4-7   Scatter plot for total crash in different time group
4-8   Scatter plot for PDO crash in different time group
4-9   Scatter plot for injury crash in different time group
4-10  Scatter plot for pedestrian crash in different time group
4-11  Scatter plot for bike crash in different time group
4-12  Matching block groups for the top 5 rules of total crash in time group 1
4-13  Matching block groups for the top 5 rules of total crash in time group 2
4-14  Matching block groups for the top 5 rules of total crash in time group 3
4-15  Matching block groups for the top 5 rules of total crash in time group 4
4-16  Matching block groups for the top 5 rules of PDO crash in time group 1
4-17  Matching block groups for the top 5 rules of PDO crash in time group 2
4-18  Matching block groups for the top 5 rules of PDO crash in time group 3
4-19  Matching block groups for the top 5 rules of PDO crash in time group 4
4-20  Matching block groups for the top 5 rules of injury crash in time group 1
4-21  Matching block groups for the top 5 rules of injury crash in time group 2
4-22  Matching block groups for the top 5 rules of injury crash in time group 3
4-23  Matching block groups for the top 5 rules of injury crash in time group 4
4-24  Matching block groups for the top 5 rules of pedestrian crash in time group 2
4-25  Matching block groups for the top 5 rules of pedestrian crash in time group 3
4-26  Matching block groups for the top 5 rules of pedestrian crash in time group 4
4-27  Matching block groups for the top 5 rules of bike crash in time group 3

LIST OF ABBREVIATIONS

ANN     Artificial Neural Network
ANFIS   Adaptive Neuro-Fuzzy Inference System
CART    Classification and Regression Trees
DBI     Database Interface
DOT     Department of Transportation
ESRI    Environment System Research Institute
FGDL    Florida Geographic Data Library
FHP     Florida Highway Portal
GIS     Geographic Information System
GWR     Geographically Weighted Regression
MPO     Metropolitan Planning Organization
NHTSA   National Highway Traffic Safety Administration
ORDBMS  Object-relational Database Management System
PDO     Property Damage Only
SCP     Safety Conscious Planning
TAZ     Traffic Analysis Zones
TSP     Transportation Safety Planning

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

EXAMINING THE INFLUENCE OF BUILT ENVIRONMENT ON TRAFFIC CRASHES: A SPATIO-TEMPORAL DATA MINING APPROACH

By Yiqiang Ouyang

May 2014

Chair: Ilir Bejleri
Major: Design, Construction and Planning

Most current traffic crash analysis research is conducted at either the spot level or the regional level; less attention is paid to the community level. Additionally, the effects of changes in land use, population growth, and transportation networks on community traffic safety have not been systematically studied. Although some studies have defined variables to describe the influence of the built environment on traffic crashes, no research has examined this problem using a comprehensive framework of relevant built environment variables. Moreover, neither the spatial distance and topological relationships among built environment elements nor the temporal influence of the built environment on crashes has been considered. Furthermore, widely used statistical methods in traffic crash analysis, such as Poisson and negative binomial regression, are inherently limited by their assumptions and by pre-defined underlying relationships between dependent and independent variables, and they typically neglect the spatial dependency among crashes. Data mining techniques, which overcome these limitations, provide an alternative perspective for understanding the relationship between the built environment and crashes.

This research explores a spatio-temporal association rule data mining method to understand the effects of the built environment on traffic crashes. The census block group is selected as the unit of analysis. The D variables framework from transportation research, which includes the dimensions of density, diversity, design, destination accessibility, and distance to transit, is used to characterize the built environment. Most importantly, the land use mix indicator is used as part of the Design dimension. The study calculates the spatial distances and topological relationships among the participating built environment spatial data elements. Crashes are categorized by type and divided into four time groups based on their statistical distribution over the 24 hours of the day.

The crash and built environment data are processed in a GIS environment. The association rule data mining technique is applied to explore associations between the built environment variables and the frequency of each crash type. The results show that many rules were mined for all crash types except fatal crashes: no pattern shows an influence of the built environment on fatal crashes. The same outcome resulted for pedestrian crashes in time group one and for bike crashes in all time groups except time group three. Composed of a rule antecedent and a rule consequent, each rule directly describes how the built environment (antecedent) influences a certain crash type (consequent). All resulting rules combine D variables, distance variables, and topological variables, which suggests that the interaction of built environment elements influences the occurrence of crashes. The same built environment variable has a different effect when combined with different variables and in different time groups. GIS mapping of the strongest rules can help identify high-risk communities and examine how the influence of the built environment may change across time periods. The results also show that highly mixed land use is associated with a high total number of crashes.
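The idea of a rule whose antecedent is a set of built environment conditions and whose consequent is a crash level can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the dissertation's implementation (the source code samples are in the appendices). The block-group records, the item names such as `land_use_mix=high`, the thresholds, and the function names `support` and `mine_rules` are all hypothetical. The sketch enumerates small antecedents by brute force and scores each candidate rule with the standard support, confidence, and lift measures.

```python
from itertools import combinations

# Hypothetical, hand-made records: one set of discretized items per block group.
# None of these values come from the dissertation's data.
block_groups = [
    {"density=high", "land_use_mix=high", "transit_dist=near", "crashes=high"},
    {"density=high", "land_use_mix=high", "transit_dist=far", "crashes=high"},
    {"density=low", "land_use_mix=low", "transit_dist=far", "crashes=low"},
    {"density=high", "land_use_mix=low", "transit_dist=near", "crashes=low"},
    {"density=low", "land_use_mix=high", "transit_dist=near", "crashes=high"},
    {"density=low", "land_use_mix=low", "transit_dist=near", "crashes=low"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def mine_rules(transactions, consequent, min_support=0.3,
               min_confidence=0.8, max_len=2):
    """Brute-force search for rules 'antecedent -> consequent'.

    Returns tuples (antecedent, consequent, support, confidence, lift),
    strongest first. Thresholds are illustrative assumptions.
    """
    # Candidate antecedent items: everything except the crash outcome itself.
    items = sorted({i for t in transactions for i in t
                    if not i.startswith("crashes=")})
    cons_support = support({consequent}, transactions)
    rules = []
    for k in range(1, max_len + 1):
        for antecedent in combinations(items, k):
            ante_support = support(antecedent, transactions)
            rule_support = support(set(antecedent) | {consequent}, transactions)
            if ante_support == 0 or rule_support < min_support:
                continue
            confidence = rule_support / ante_support
            if confidence >= min_confidence:
                lift = confidence / cons_support
                rules.append((antecedent, consequent,
                              rule_support, confidence, lift))
    # Sort by lift, then support, then antecedent for a stable order.
    return sorted(rules, key=lambda r: (-r[4], -r[2], r[0]))


rules = mine_rules(block_groups, "crashes=high")
for antecedent, consequent, sup, conf, lift in rules:
    print(f"{' & '.join(antecedent)} -> {consequent} "
          f"(support={sup:.2f}, confidence={conf:.2f}, lift={lift:.2f})")
```

On a real dataset the exhaustive enumeration above would be replaced by a frequent-itemset algorithm such as Apriori, and the same search would be repeated per crash type and per time group to compare rules across periods.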

PAGE 16

16 CHAPTER 1 INTRODUCTION According to a report from National H ighway Traffic Safety Administration (NHTSA) 33,808 people died in motor vehicle traffic crashes in the United States in 2009, and an estimated 2.22 million people were injured in motor vehicle traffic crashes in the same year ( National Highway Traffic Safety Administrator 2010 ) The sheer number of crashes comes with a large economic cost. In 2000, NHTSA estimated the annual financial loss related to crashes to be in excess of $230 billion Transportation sa fety is one of the highest priorities for Department of Transportation s (DOT) at all levels. The staggering economic cost of crashes calls for the integration of the safety into transportation planning. The passage of Transportation Equity Act for the 21st Century (TEA 21) in 1998 is a watershed for absorbing the attention of the nation. TEA 21requires every state DOT and metropolitan planning organization (MPO) should integrate safety and security as one of several priority factors into their respective transportation planning processes and activities. Since then, the research and practice of incorporating safety into transportation planning safety conscious planning (SCP) or transportation safety planning (TSP) become a hot topic in U.S. transportation p lanning To reduce the number of traffic crashes and associated deaths and injuries, numerous safety studies have tried to evaluate the influence of factors like characteristics, while few st udies ha ve evaluate d the effects of built environment (also urban form in some studies ) on traffic safety.


Ewing and Dumbaugh (2009) concluded that the built environment affects crash frequency and severity through the mediators of traffic volume and traffic speed. Development patterns of the built environment impact safety primarily through the traffic volumes they generate, and secondarily through the speeds they encourage. Roadway designs impact safety primarily through the traffic speeds they allow, and secondarily through the traffic volumes they generate. Traffic volumes in turn are the primary determinants of crash frequency, while traffic speeds are the primary determinants of crash severity (Handy et al. 2002). Although built environment factors such as land use, population density, job count, and transit accessibility are secondary variables and do not have a direct effect on traffic crashes, their influence is in fact significant (Kim et al. 2006b).

Previous research uses some variables to measure the built environment, but the spatial relationships among the built environment elements are never considered. The spatial layout of the built environment is important for crash analysis because different layouts may have different effects on crashes. In addition to the non-spatial and spatial attributes of the built environment, the time dimension is also very important. The influence of the built environment on crashes may vary over time. For example, it is common knowledge that more crashes occur during rush hour than during other times of the day.

Besides, most of the typical research in this area is done at either the spot level or the regional level, resulting in safety interventions focused primarily on either spot-level engineering improvements or regional education and law enforcement campaigns (Dumbaugh et al. 2011). Additionally, the effects of changes in the built


environment on community safety have not been systematically studied. As Marshall and Garrick (2011) observe, we often fail to pay enough attention to overall community design when it comes to safety.

From the analysis methodology perspective, regression models, especially Poisson and negative binomial regression, have traditionally been widely used in traffic crash analysis. On the one hand, these models have advantages when applied to crash analysis; on the other hand, they are limited by assumptions and by a pre-defined underlying relationship between dependent and independent variables (Chang and Chen 2005), and they usually do not consider the spatial dependency between the dependent and independent variables.

Data mining, defined as the exploration and analysis of large datasets to discover implicit knowledge and rules, has been embraced by researchers in recent years and offers a promising way to understand crashes and their relationships to the built environment from another perspective. The capacity of data mining to store and analyze very large datasets lends itself to the analysis of traffic crashes, which occur in the tens of millions each year throughout the world. Additionally, this capacity could prove useful for this research, which has to deal not only with the large number of crashes but also with the complexity of the built environment.

Some researchers have been using data mining techniques in crash analysis. Most studies focus on exploring the association between crash severity or crash frequency and independent variables such as driver and vehicle characteristics, road conditions, environmental factors, and highway geometric attributes. A few other studies have looked at other aspects such as the economic impact of crashes, data quality


improvement, and traffic risks. From a review of current research, one observation emerges clearly: most of the data mining methods fail to integrate spatial elements. Almost all of them operate on tabular representations and do not capture the spatial attributes of crashes or the spatial contextual elements that may influence crashes, such as road network characteristics or, more broadly, the built environment.

The goal of this research is to explore the spatio-temporal relationship between the built environment and traffic crashes. To accomplish this goal, several questions are addressed:

1. How can the built environment be described comprehensively from both non-spatial and spatial perspectives?
2. How can a spatio-temporal association rule data mining model be designed and implemented to evaluate the influence of the built environment on the frequency of crashes?
3. What is the spatio-temporal effect of the built environment on crashes?
4. What can we learn from the results for better planning in the future?

To answer these questions, Miami-Dade County is selected as the study area.

The research is meaningful to the practice of integrating safety into the planning, design, and implementation of the built environment (which includes the transportation system). This is especially important in the United States, where roadways are classified mainly in terms of their access and mobility functions, as opposed to the European design practice, which begins by examining the developmental context of a roadway, identifying the hazards that are expected to exist in these environments, and then specifying a target design (Lamm et al. 1999; Ewing and Dumbaugh 2009).


CHAPTER 2
LITERATURE REVIEW

This chapter discusses the spatial nature of crashes, followed by a review of the D variables framework used to describe the built environment and of community-level crash analysis. It also reviews the statistical methods used for crash analysis and concludes with an introduction to spatio-temporal association rule data mining.

The Spatial Nature of a Crash

First, crashes are spatial events that occur in the built environment. Urban planners usually use terms like urban design, land use, and the transportation system when referring to the built environment (Handy et al. 2002). According to Handy et al. (2002), urban design refers to the design of the city and the physical elements within it, including both their arrangement and their appearance, and is concerned with the function and appeal of public spaces. Land use refers to the distribution of activities across space, including the location and density of different activities, where activities are grouped into relatively coarse categories such as residential, commercial, office, and industrial. The transportation system includes the physical infrastructure of roads, sidewalks, bike paths, railroad tracks, and bridges, as well as the level of service determined by traffic levels, bus frequencies, and the like.

Second, crashes are influenced by interaction with the components of the built environment. Ewing and Dumbaugh (2009) presented a theoretical framework (Figure 2-1) of how the built environment influences traffic safety and noted that the published literature is generally supportive of this framework. In this framework, the development patterns and road design components of the built environment affect the


crash frequency and severity through three mediators: traffic volumes, conflicts, and speeds.

Third, crashes are spatially dependent. The first law of geography by Waldo Tobler points out that "everything is related to everything else, but near things are more related than distant things" (Tobler 1970). This means that objects in geographic space interact with each other. This phenomenon sometimes results in spatial dependency, and this is the case for crashes. For example, two intersections along the same street (but farther apart) exhibit a stronger influence on each other than two intersections that are geographically close but share no common street. Traditionally, crash count data are analyzed with Poisson and negative binomial (NB) modeling techniques (Miaou 1994; Ivan et al. 2000; Hadayeghi et al. 2003a; Lord et al. 2005; Kim et al. 2006b). The fundamental limitation of Poisson and NB regression modeling is the assumption that the sample observations are independently generated. This may not be true in the case of spatial data.

D Variables Framework for Built Environment

According to the definition from the Transportation Research Board and Institute of Medicine (2005), the built environment is a broad concept that includes land use patterns, the transportation system, and design features that together provide opportunities for travel and physical activity. To study how crashes are affected by the built environment, the first question to answer is how to describe the built environment. Researchers have rigorously studied the relationships between the built environment and travel since 1990, using the D variables to describe the built environment. The D variables have been extended from the original 3Ds of density, diversity, and design (Cervero


and Kockelman 1997) to 5Ds by including destination accessibility and distance to transit (Ewing and Cervero 2010).

Up to now, many studies have integrated some built environment variables in their traffic safety research (Ivan et al. 2000; Sawalha and Sayed 2001; Kim and Yamashita 2002; Hadayeghi et al. 2003b; Kim et al. 2006a; Wedagama et al. 2006; Dumbaugh and Rae 2009; Rifaat and Tay 2009; Dumbaugh and Li 2010; Hadayeghi et al. 2010; Khattak et al. 2010; Kim et al. 2010; Rifaat et al. 2010; Dumbaugh et al. 2011). Because research purposes differ, different variables have been used for each D dimension, and the number of variables under each D dimension also differs. For the same reasons, the influence of the same variable may differ between studies. All these studies have the following in common. First, no study has used the D variables framework to include all the variables of the built environment. Table 2-1 presents a summary of the D variables used in research related to the built environment and crashes. Second, almost all of the studies that included the Diversity dimension used land use type as the variable rather than a land use mix index. Each D dimension is defined below, and a summary of how current research has used the D variables is provided.

Density

Density is usually measured in terms of persons, jobs, or housing units per unit area (Ewing and Cervero 2010). The density variables are the most widely used in the crash analysis literature. They are sometimes called socio-economic and demographic data. Hadayeghi et al. (2003b) used variables including total population, population density, number of households, household density, full-time and part-time employment, and total employment in their research evaluating the safety of urban transportation systems.
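The distinction between a land use type variable and a land use mix index can be made concrete with a small calculation. The sketch below is illustrative only: it assumes an entropy-based mix index, which is one common formulation in the travel behavior literature and is not necessarily the index used in this study. The function names, category names, and numbers are hypothetical.

```python
import math


def density(count, area):
    """Density as persons, jobs, or housing units per unit area."""
    return count / area


def land_use_mix(areas):
    """Entropy-based land use mix in [0, 1] for one analysis unit.

    `areas` maps a land use category to its area (any consistent unit).
    Assumed formulation: normalized Shannon entropy of the area shares,
    giving 0.0 for a single use and 1.0 for perfectly even shares.
    """
    k = len(areas)
    total = sum(areas.values())
    if k < 2 or total <= 0:
        return 0.0
    entropy = 0.0
    for area in areas.values():
        if area > 0:
            share = area / total
            entropy -= share * math.log(share)
    return entropy / math.log(k)


# Hypothetical block group: 1,200 residents on 0.4 square miles.
pop_density = density(1200, 0.4)

# Hypothetical land use areas (acres) for two block groups.
single_use = {"residential": 250.0, "commercial": 0.0, "office": 0.0}
even_mix = {"residential": 80.0, "commercial": 80.0, "office": 80.0}
mix_low = land_use_mix(single_use)
mix_high = land_use_mix(even_mix)
```

A type variable would only record that the first block group is residential, whereas the index separates the single-use block group from the evenly mixed one on a continuous scale.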


Population density has been used to evaluate the influence of urban land use on non-motorized crashes (Wedagama et al. 2006).

Diversity

Diversity measures the number of different land uses in a given area and the degree to which they are represented in land area, floor area, or employment (Ewing and Cervero 2010). Almost all crash research studies use land use types to describe the diversity of the built environment and then examine their influence on certain crashes. The results vary. Ivan et al. (2000) used the number of driveways of various land use types on each highway segment to represent land use effects. Sawalha and Sayed (2001) classified land use into three categories: residential, business, and other. The land use type variable was used along with other variables to evaluate safety on urban arterial roadways, and the results indicate that land use type has a significant effect on accident occurrence. Kim and Yamashita (2002) determined the land use type for each crash, performed a statistical analysis of crash frequencies by land use, and measured crashes per acre of land by land use type.

Design

Design relates to street network characteristics within an area and also refers to sidewalk coverage (Ewing and Cervero 2010). Measures include average block size, proportion of four-way intersections, and number of intersections, to mention a few. Design may also include variables about the network pattern. Street network density and street connectivity, along with street network pattern, have also been considered in research (Marshall and Garrick 2011). Rifaat and Tay (Rifaat and Tay 2009; Rifaat et al. 2010) classified street patterns into four categories: gridiron, warped parallel, loops


and lollipops, and mixed patterns, and used them to study the influence of street pattern on crashes.

Destination Accessibility

Destination accessibility refers to the relative ease of accessing jobs, housing, and other attractions within the region (Ewing and Cervero 2010). Unlike density, diversity, and design, destination accessibility has not been widely used by researchers. Khattak et al. (2010) considered the distance to school, distance to supercenter, and distance to bridge/tunnel in a study of secondary incidents.

Distance to Transit

Distance to transit is usually measured as an average of the shortest street routes from the residences or workplaces in an area to the nearest rail station or bus stop (Ewing and Cervero 2010). Alternatively, it can be measured as transit route density, distance between transit stops, or the number of stations per unit area. The number of bus stops is used as a variable in Kim et al. (2010).

Table 2-1 provides a summary of the built environment variables used in some of the typical built environment and safety research studies. The table shows that variables in the density and design categories are used the most by researchers. Only a few studies looked at the destination accessibility and distance to transit categories.

The effects of land use on two-lane highway crash rates are represented by the number of driveways observed on each highway segment, classified into seven different categories (Ivan et al. 2000). Sawalha and Sayed (2001) built a crash prediction model in which land use was divided into residential, business, and other. Kim and Yamashita (2002) calculated the crash frequencies per acre classified by various types


of land use, which include residential, business, institutional, and other land use categories [...] along with population, job counts, and other measures of economic activity (Kim et al. 2006b). Kim et al. (2010) incorporated an accessibility factor, building on this 2006 research. Wedagama et al. (2006) analyzed the relationship between different land use types and non-motorized transport casualties. Khattak et al. (2010) considered supercenters and schools as two land use types when exploring the factors associated with secondary crashes.

An important observation is that while many researchers consider land use when evaluating its effect on traffic crashes, and almost all of them analyze the relationship between different land use types and crashes, only a few investigate the influence of land use mix on crashes. Mixed land use is important to traffic safety because it reduces trip rates and encourages non-auto travel in statistically significant ways (Cervero and Kockelman 1997), and mixed land use policies can ultimately result in safer local roadways through the use of appropriate designs and slower speeds (Berkovitz 2001). Although the impact of land use mix on travel modes and behaviors has been evaluated, there has been less examination of its impact on traffic safety.

Community Level Crash Analysis

Some studies have examined traffic crashes at the community level (Kim and Yamashita 2002; de Guevara et al. 2004; Kim et al. 2006a; Dumbaugh and Rae 2009; Dumbaugh and Li 2010; Kim et al. 2010; Dumbaugh et al. 2011). The most important factor to consider for community-level crash safety analysis is the choice of a reasonable community scale, since the scale is the very first step and


also is the base for the entire analysis. All the spatial data, including the street network and crash points, and non-spatial data such as demographics must be integrated into the selected analysis scale for further analysis. Usually, there are three choices for the community scale: census geography, traffic analysis zone (TAZ), and user-defined scale.

Ukkusuri et al. (2011) developed a method to predict the frequency of pedestrian crashes at the census tract level with consideration of the built environment. The authors selected the census tract as the analysis unit because the data at this level provide a greater number of explanatory variables and hence can give better insight into the effects of various factors on the number of pedestrian crashes. Dumbaugh used the block group as the community level for his urban design and crash incidence research (Dumbaugh and Rae 2009; Dumbaugh and Li 2010; Dumbaugh et al. 2011). The advantage of the block group is that it is big enough to provide accurate population information but small enough to have relatively homogeneous design characteristics (Dumbaugh and Rae 2009). Since the block group typically uses arterials and thoroughfares as geographic boundaries, it is consistent with the characteristics of crashes that happen along the street. Marshall and Garrick (2011) also conducted their analysis at the block group level.

de Guevara et al. (2004) used the TAZ as the analysis unit to create a model to forecast crashes in Tucson, Arizona. In that research, all of the data, such as demographics and economics, are integrated into the 859 TAZs of the study area.

Kim et al. (Kim et al. 2006a; Kim et al. 2010) used a uniform grid cell with a side of 0.316 mile and an area of 0.1 square mile as the analysis unit to explore the relationship between land use, demographic information, accessibility, economic data, and traffic


crashes. Although this grid cell unit is convenient for spatial and statistical analysis, it has a disadvantage: since crashes occur along the street network, a grid cell may fail to count the crashes that belong to it, which may eventually influence the regression analysis result. This can happen especially when the grid cell boundary is very close to a street. Besides, it is difficult to incorporate accurate information about housing and population, which is census based, into each grid cell. Also, Kim et al. (Kim et al. 2006a; Kim et al. 2010) did not consider the fact that crashes on a street may be affected by all the land use factors adjacent to the street.

Based on the review of the literature, the block group is selected as the analysis unit for this study.

Statistics Methods for Crash Analysis

Traditional Engineering Statistical Methods

Lord and Mannering (2010) made a comparative meta-analysis of the advantages and disadvantages of the methods currently used in crash frequency analysis. Sixteen types of statistical methods were compared. Table 2-2 shows the detailed results.

Another disadvantage common to all of the traditional engineering statistical methods is that they do not consider spatial dependency. All of these statistical models assume that crashes occur independently of each other. According to the first law of geography, however, everything is related to everything else, but near things are more related than distant things (Tobler 1970), so this assumption may not hold in the real world.


Spatial Statistics Methods for Traffic Safety Analysis

The Spatial Statistics toolbox provided by ArcGIS contains statistical tools for analyzing spatial distributions, patterns, processes, and relationships (ESRI). Several of these tools could be used for spatio-temporal crash analysis. Among these, Spatial Autocorrelation (Moran's I) and High/Low Clustering (Getis-Ord General G) under the Analyzing Patterns toolset, and Cluster and Outlier Analysis (Anselin Local Moran's I) and Hot Spot Analysis (Getis-Ord Gi*) under the Mapping Clusters toolset, could be used for spatio-temporal cluster and concentration analysis.

Spatial Autocorrelation (Moran's I)

Moran's I is still used for spatial autocorrelation. Given a set of features and an associated attribute, it evaluates whether the pattern expressed is clustered, dispersed, or random. If the Moran's I value is close to +1.0, the crashes are clustered; if the value is close to -1.0, the crashes are dispersed.

When the tool is used for spatio-temporal cluster analysis, the input is the groups of crashes. First, we need to see whether the z-score is large (or small) enough to fall outside the desired significance level, so that the null hypothesis that there is no spatial clustering can be rejected. Then we examine the Moran's I index. The crashes have a clustered pattern if the value is greater than 0; the pattern is dispersed if the value is less than 0.

Limitation: Moran's I can separate extremely low and extremely high total costs from the rest, but cannot distinguish


between situations (Erdogan 2009). It also cannot determine which specific clusters are the high-value ones (Songchitruksa and Zeng 2010).

High/Low Clustering (Getis-Ord General G)

Moran's I methods indicate clustering of high or low values, but they cannot distinguish between the two situations. The Getis-Ord General G statistic indicates whether it is high values or low values that cluster. The General G statistic shows either hot spots or cold spots in the region. A value of the G statistic larger than expected means that high values are found together (hot spots), and a value smaller than expected means that low values are found together (cold spots).

Limitation: Getis-Ord General G, like Moran's I, is a global measure. It provides a single value of the spatial autocorrelation that checks for clustering in the spatial pattern, but it does not show where the clusters are.

Cluster and Outlier Analysis (Anselin Local Moran's I)

The Local Moran's I signifies cases such as the presence of a cluster of similar values, a cluster of dissimilar values, and a dispersed distribution. A high value of I indicates a clustering of similar values, and a low value indicates a clustering of dissimilar values of a variable. The tool also calculates a z-score for each point; depending on the confidence level, the statistical significance of each point is determined.

Limitation: The family of Moran indices, however, does not discriminate between hot spots and cold spots (Songchitruksa and Zeng 2010).
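The two global measures discussed above can be illustrated with a short calculation. The following is a minimal sketch using the textbook formulas for global Moran's I and the Getis-Ord General G. It is not the ArcGIS implementation, and it omits the z-score and significance testing that the tools report. The four-unit chain of neighbors, the binary contiguity weights, and the crash counts are hypothetical.

```python
def morans_i(values, weights):
    """Global Moran's I for values x_i and spatial weights w_ij."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n) if i != j)
    return (n / s0) * cross / sum(d * d for d in dev)


def general_g(values, weights):
    """Getis-Ord General G and its expected value under no clustering."""
    n = len(values)
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    num = sum(weights[i][j] * values[i] * values[j] for i, j in pairs)
    den = sum(values[i] * values[j] for i, j in pairs)
    s0 = sum(weights[i][j] for i, j in pairs)
    return num / den, s0 / (n * (n - 1))


# Hypothetical example: four units along one street, each adjacent to the next.
W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = [10, 9, 1, 0]     # high counts next to high, low next to low
alternating = [10, 0, 10, 0]  # high and low counts alternate

i_clustered = morans_i(clustered, W)      # positive: similar values are neighbors
i_alternating = morans_i(alternating, W)  # negative: dissimilar values are neighbors
g_observed, g_expected = general_g(clustered, W)  # observed G above expected G
```

In the clustered series Moran's I is positive, but only the comparison of the observed General G with its expected value shows that it is the high counts that concentrate together. Neither number says where the cluster is, which is the limitation of global measures noted above.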


Hot Spot Analysis (Getis-Ord Gi*)

The Getis-Ord Gi* hot spot analysis tool identifies statistically significant spatial clusters of high values (hot spots) and low values (cold spots). It creates a new Output Feature Class with a z-score and p-value for each feature in the Input Feature Class. It also returns the z-score and p-value field names as derived output values for potential use in custom models and scripts.

The first step of the Getis-Ord Gi* hot spot analysis is to use the Collect Events tool from the ArcGIS Spatial Statistics package to turn the crash feature class into a new weighted point feature class with a field ICOUNT that indicates the sum of all the accidents that happened at a unique geographic location. This new feature class is then used as the input for the hot spot analysis tool.

The z-scores and p-values produced by the tool are measures of statistical significance that tell whether or not to reject the null hypothesis, feature by feature. A high z-score and small p-value for a feature indicate a spatial clustering of high values (hot spots). A low negative z-score and small p-value indicate a spatial clustering of low values (cold spots). A z-score near zero indicates no apparent spatial clustering.

Limitation: The Getis-Ord Gi* hot spot analysis tool can tell whether the accident of reference lies in a cluster of high or low values, but it cannot distinguish on a global scale which hot spot clusters have higher values than other hot spot clusters. In other words, the method cannot prioritize hot spots within hot spots.

Advantages and Disadvantages of Spatial Statistics Methods

Advantages of spatial statistics methods include:

1) The spatial dependency of crash data is considered. The crucial advantage spatial statistical methods have over classical statistical methods is the ability to analyze and


model spatial dependence and heterogeneity in spatial data. Spatial autocorrelation analysis in spatial statistics is particularly useful for addressing the problems associated with crashes because every spatial unit bears some quantifiable attributes (e.g., crash severity) and not just location information.

2) The spatial clustering methods can help users understand the patterns of crashes and their clustering distributions.

3) The geographically weighted regression (GWR) model, which explores the spatial variation of associations between dependent variables (crash frequency or severity) and explanatory variables, performs much better than global regression models in terms of model accuracy.

As for the disadvantages, all of the transportation-related studies reviewed used GWR techniques that assume a normally distributed error structure in the calibration of regression models. Such an assumption is not optimal for calibrating regression models of count data such as crash data (Hadayeghi et al. 2003a).

Data Mining for Crash Analysis

Data Mining

Data mining is a process to extract implicit, nontrivial, previously unknown, and potentially useful information (such as knowledge rules, constraints, and regularities) from data in databases (Chen et al. 1996). The explosive growth in data and databases used in business management, government administration, and scientific data analysis has created a need for tools that can automatically transform the processed data into useful information and knowledge. Consequently, data mining has become a research area of increasing importance. Data mining has some distinct features of its own (Geurts et al. 2005).


First, not only can data sets be much larger than in statistics, but data analyses can be applied on a correspondingly larger scale. There are also differences of emphasis in the approach to modeling: compared with statistics, data mining pays less attention to the large-scale asymptotic properties of its inferences and more to the [...] and the computations they require. Furthermore, data mining has tackled problems such as what to do in situations where the number of variables is so large that looking at all pairs of variables is computationally not feasible.

Generally, data mining techniques have the following advantages:

1) Not only can data sets be much larger than in statistics, but data analyses can be applied on a correspondingly larger scale (Geurts et al. 2005).

2) Data mining has tackled problems such as what to do in situations where the number of variables is so large that looking at all pairs of variables is computationally not feasible (Geurts et al. 2005).

3) Data mining does not require the model to be specified in advance, nor the assumption of an additive relationship between risk factors.

Association Rule Data Mining

Association rule data mining is a data mining technique that seeks to discover associations among transactions encoded within a database (Agrawal et al. 1993). An association rule takes the form $A \Rightarrow B$, where $A$ (the antecedent) and $B$ (the consequent) are sets of predicates. For example, consider a database that encodes transactions made at a supermarket.


A rule might state that if a bagel is purchased, then cream cheese is also purchased. This statement may be expressed as:

$$\mathrm{is\_a}(x, \mathrm{bagel}) \wedge \mathrm{is\_purchased}(x) \Rightarrow \mathrm{is\_a}(y, \mathrm{cream\_cheese}) \wedge \mathrm{is\_purchased}(y)$$

The quality of an association rule can be stated by its support and its confidence (Laube et al. 2008). The support is the probability of a record in the database satisfying the set of predicates contained in both the antecedent and the consequent, for instance the probability that a record in the database contains the purchase of a bagel and cream cheese in the example above. The confidence is the probability that a record that contains the antecedent also contains the consequent (Formula 2-1):

$$\mathrm{confidence}(A \Rightarrow B) = \frac{\mathrm{support}(A \cup B)}{\mathrm{support}(A)} \qquad (2\text{-}1)$$

In practice, association rule mining often yields too many rules that satisfy the support and confidence constraints. The lift is a popular measure that can help to further analyze the rules (Formula 2-2). Greater lift values indicate stronger associations.

$$\mathrm{lift}(A \Rightarrow B) = \frac{\mathrm{confidence}(A \Rightarrow B)}{\mathrm{support}(B)} \qquad (2\text{-}2)$$

Spatial Association Rule Data Mining

A spatial association rule is a rule that describes the implication of one or a set of features by another set of features in spatial databases (Koperski and Han 1995). A spatial association rule is a rule in the form of

$$P_1 \wedge \dots \wedge P_m \Rightarrow Q_1 \wedge \dots \wedge Q_n \quad (c\%) \qquad (2\text{-}3)$$


where at least one of the predicates is a spatial predicate and c% is the confidence of the rule, which indicates that c% of the objects satisfying the antecedent of the rule will also satisfy the consequent of the rule.

Various kinds of spatial predicates can be involved in spatial association rules. They may represent topological relationships between spatial objects, such as disjoint, intersects, inside/outside, adjacent to, covers/covered by, and equal. They may also represent spatial direction or ordering, such as left, right, north, and east, or contain some distance information, such as close to and far away.

There are several existing spatial association rule data mining models and methods (Koperski and Han 1995; Appice et al. 2003; Appice et al. 2005; Mennis and Liu 2005; Bogorny et al. 2006), including Weka-GDPM (Bogorny 2006; Bogorny et al. 2006) and ARES (Appice et al. 2003). These two methods treat differently the two most important steps of spatial association rule data mining: spatial relationship extraction and the association rule mining algorithm.

Weka-GDPM is an extension of Weka. It supports dynamic geographic data preprocessing and transformation for mining spatial association rules. Weka is a well-established free and open source toolkit with friendly graphical user interfaces that cover the whole association rule data mining process. Weka is also an open platform, which makes it easy for researchers to develop new modules that enhance its functionality. In Weka-GDPM (Geographic Data Preprocessing Module), spatial relationships are computed with SQL queries based on a GDBMS (Geographic Database Management System). Geo-ontologies are used to optimize the extraction of spatial relationships.
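The support, confidence, and lift measures defined above can be checked on a small example. The sketch below is illustrative only: each record stands for one block group described by a set of predicates, including a distance predicate of the kind a spatial association rule would use. The predicate names and the five records are hypothetical and are not data from this study.

```python
def support(records, items):
    """Share of records that satisfy every predicate in `items`."""
    items = set(items)
    return sum(1 for r in records if items <= r) / len(records)


def confidence(records, antecedent, consequent):
    """Share of records satisfying the antecedent that also satisfy the consequent."""
    both = set(antecedent) | set(consequent)
    return support(records, both) / support(records, antecedent)


def lift(records, antecedent, consequent):
    """Confidence relative to the baseline frequency of the consequent."""
    return confidence(records, antecedent, consequent) / support(records, consequent)


# Hypothetical block groups, each encoded as a set of predicates.
records = [
    {"land_use_mix=high", "close_to(transit_stop)", "crashes=high"},
    {"land_use_mix=high", "crashes=high"},
    {"land_use_mix=high", "close_to(transit_stop)"},
    {"close_to(transit_stop)"},
    {"land_use_mix=high", "crashes=high"},
]

rule_a = {"land_use_mix=high"}
rule_b = {"crashes=high"}

s = support(records, rule_a | rule_b)  # 3 of 5 records satisfy both sides
c = confidence(records, rule_a, rule_b)  # 3 of the 4 antecedent records
l = lift(records, rule_a, rule_b)        # above 1: positive association
```

Here the rule holds in three of the four block groups that satisfy the antecedent, and the lift is above 1, meaning the consequent is more frequent among those block groups than in the data as a whole.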


The extraction result is then converted to the Weka input format, and the association rule algorithm in Weka is used to mine the resulting rules.

ARES (Association Rules Extractor from Spatial data) is a full-fledged system that integrates SPADA (Spatial Pattern Discovery Algorithm) to extract multi-level spatial association rules by exploiting an Inductive Logic Programming (ILP) approach to (multi-)relational data mining (Appice et al. 2003). ARES assists data miners in extracting the units of analysis (i.e., reference objects and task-relevant objects) from a spatial database by means of a complex data transformation process that makes spatial relations explicit. It generates high-level logic descriptions of spatial data by specifying the background knowledge on the application domain (e.g., hierarchies on target-relevant spatial objects or knowledge domain) and defining some form of search bias to filter only association rules that fulfill user expectations.

To carry out the two most important steps of spatial association rule data mining in this research, PostGIS/PostgreSQL will be used to extract the spatial relationships, and the R package arules will be used for the association rule mining algorithm.

Current Application of Data Mining in Crash Analysis

Hardin et al. (2003) created an economic impact index for traffic crashes in the state of Alabama using a data mining approach. The classification and regression trees (CART) methodology was used to find variables that might help predict county-level characteristics associated with the occurrence of a crash.

El Seoud and Elbadrawi (2004) used clustering data mining technology to identify clusters of common accidents and the conditions under which accidents are more likely to cause death or injury. In addition, accidents were profiled in terms of accident


and freeway characteristics using data mining. The study was conducted for the traffic [...]

Chong et al. (2004) used decision trees and neural networks to develop a realistic model of injury severity resulting from an automobile accident, improve the prediction accuracy, and establish the most important factors influencing the severity of the injury.

Chong et al. (2005) applied three machine learning approaches to predict the severity of injuries in accidents: neural networks trained using hybrid learning approaches, decision trees, and a concurrent hybrid model involving decision trees and neural networks. The results reveal that, for the non-incapacitating injury, incapacitating injury, and fatal injury classes, the hybrid approach performed better than neural networks, decision trees, and support vector machines. For the no injury and possible injury classes, the hybrid approach performed better than the neural network. The no injury and possible injury classes could be best modeled by decision trees.

Smith and Wang (2005) proposed two new techniques, called Max Gain (MG) and Sum Max Gain Ratio (SMGR), which are well correlated with existing techniques, to rank the variables of the data in order to remove non-useful, irrelevant, and noisy data in traffic crash analysis. The authors conclude that the data mining method performs better than the traditional statistical approaches.

In Graves et al. (2005), a data mining toolkit was used to examine the possible relationships between roadway data and traffic safety data. The data mining system was developed by the University of


Alabama in Huntsville (UAH) Information Technology and Systems Center, and provides classification, clustering, and association rule mining methods. Chang and Chen (2005) used tree-based data mining models to analyze freeway accident frequency. In the study, a classification and regression tree (CART) model and a negative binomial regression model were developed to establish the empirical relationship between traffic accidents and highway geometric variables, traffic characteristics, and environmental factors. The results show that CART is a good alternative method for analyzing freeway accident frequencies when compared with the negative binomial regression model. Chang and Wang (2006) state that the data mining technique classification and regression tree (CART) could overcome the limitations that statistical regression models have. The study creates a CART model to establish the relationship between injury severity and driver/vehicle characteristics, highway/environment variables, and accident variables. Tseng et al. (2005) applied three data mining techniques to discover the relationship between driver inattention and motor vehicle accidents. The data are first clustered using Kohonen networks. Then the patterns and rules in the data are explored with decision tree and neural network models. Geurts et al. (2005) explored why road accidents tend to cluster in specific road segments by using a data mining technique. A technique called frequent item sets is applied to automatically identify accident circumstances that frequently occur together, for accidents located inside and outside black zones.


Pande and Abdel-Aty (2009) used association analysis, or market basket analysis, to determine which crash characteristics are associated with each other. The ern at first. The authors state that the potential of the data mining technique used may be realized in the form of a decision support tool for traffic safety administrators. Classification trees and association rules were used by Montella et al. (2011b) for exploratory analysis of pedestrian crashes. The result is consistent with previous studies that used other analytical techniques, such as probabilistic models of crash injury severity. The authors consider one of the advantages of data mining techniques to be their ability to detect interdependencies among crash characteristics. Montella et al. (2011c) also used classification trees and association rule discovery methods in the analysis of powered two-wheeler crashes in Italy. The association rule data mining technique was used by Montella in another study to explore relationships between crash contributory factors at urban roundabouts and different crash types (Montella 2011). Kashani and Mohaymany (2011) used classification and regression trees (CART) to find the significant factors influencing injury severity of vehicle occupants (excluding drivers) involved in crashes on two-lane two-way rural roads in Iran. Alikhani et al. (2013) proposed a hybrid clustering-classification method to classify road crash severity, using K-means and self-organizing maps (SOM) as clustering approaches to improve the accuracy of the classification. The study also used the individual classification methods artificial neural network (ANN) and adaptive neuro-fuzzy inference system (ANFIS) for comparison.


In summary, current research applying data mining to crash analysis has the following characteristics and limitations:
1) Most of the research focuses on exploring the association between crash severity or crash frequency and independent variables such as driver and vehicle characteristics, road conditions, environmental factors, and highway geometric attributes. A few other studies have looked at other aspects such as the economic impact of crashes, data quality improvement, and traffic risks.
2) The association rule data mining technique is used in some studies, but none of them used it to explore the influence of the built environment on crashes.
3) Most of the data mining methods fail to integrate spatial elements or to consider the spatial attributes of crashes or the spatial contextual elements that may influence crashes, such as road network characteristics or, more broadly, the built environment.
4) Spatial association rule data mining has never been used in crash analysis.

In addition to the advantages of general data mining techniques, the advantages of using the spatial association rule data mining method to evaluate the influence of the built environment on crashes are as follows:
Not only the regular variables but also the spatial relationships are considered in the mining process.
User-controlled mining process. The user can control the mining process by setting the thresholds of support and confidence that the algorithm should satisfy. Thus, the user


could get much stronger but fewer rules when the support and confidence thresholds are increased.
The result reflects the interdependency of the participating variables and is closer to the real world. The result of spatial association rule data mining is a set of rules, each composed of several predicates in the rule antecedent. One predicate may have totally different effects in two different rules. For example, a plaza in a high-density mixed-use neighborhood may have a negative effect on crash frequency along with other variables, while a plaza close to a shopping mall in a suburban area may have a positive effect on the occurrence of crashes. This sort of result is different from what can be obtained from traditional statistical models and spatial statistical models, since all plazas will have either a negative or a positive effect in those two models.
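The support and confidence thresholds mentioned above can be stated compactly. The block below gives the standard textbook definitions for a rule X ⇒ Y over a set of transactions T (here, one transaction per block group); the symbols T, X, Y and the threshold names are notation assumed for illustration and do not come from the dissertation.

```latex
\[
\operatorname{supp}(X \Rightarrow Y) = \frac{\lvert \{\, t \in T : X \cup Y \subseteq t \,\} \rvert}{\lvert T \rvert},
\qquad
\operatorname{conf}(X \Rightarrow Y) = \frac{\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)}
\]
\[
\text{A rule is reported only if } \operatorname{supp}(X \Rightarrow Y) \ge \mathit{minsup}
\ \text{and}\ \operatorname{conf}(X \Rightarrow Y) \ge \mathit{minconf}.
\]
```

Raising either threshold can only shrink the set of reported rules, which is why higher thresholds yield fewer but stronger rules.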


Table 2-1. D variables of built environment used in some current crash research
Research Study   Density   Diversity   Design   Destination Accessibility   Distance to Transit
Ivan et al. (2000) X
Sawalha and Sayed (2001) X X X
Kim and Yamashita (2002) X
Kim et al. (2006a)
Hadayeghi et al. (2003b) X X
Hadayeghi et al. (2010) X X
Wedagama et al. (2006) X X X
Rifaat and Tay (2009) X X X
Rifaat et al. (2010) X X X X
Dumbaugh and Rae (2009)
Kim et al. (2010) X X X X
Khattak et al. (2010) X X X
Dumbaugh and Li (2010) X X
Dumbaugh et al. (2011) X X
Marshall and Garrick (2011) X
X: One or more variables under the dimension are used


Table 2-2. Summary of existing statistical models for analyzing crash frequency data (Lord and Mannering 2010)


Figure 2-1. Conceptual Framework Linking the Built Environment and Traffic Safety (Ewing and Dumbaugh 2009)


CHAPTER 3
RESEARCH METHODOLOGY

This chapter presents the data used, followed by the system design, which is the core of the methodology of this research.

Data

Miami-Dade County, Florida, is selected as the study area for this research. The county has the largest number of crashes among the 67 counties in Florida. According to the 2009 Florida traffic crash statistics report, 12.5% of all crashes (42,244 of 338,633) in Florida occurred in Miami-Dade County (Florida Department of Highway Safety and Motor Vehicle 2010). Another important reason for selecting Miami-Dade is that it has a good mix of urban and suburban areas. Figure 3-1 shows an overview of the study area.

The data used in the research mainly includes two parts: the crash data and the data used to describe and measure the built environment. These data come from different sources and are all managed in a PostGIS/PostgreSQL database, which is convenient for spatial data processing.

Crash Data: The crash data for years 2005 to 2009 in Miami-Dade County were obtained from the Florida Highway Portal (FHP). The data were geolocated by a GIS address matching process using the Miami-Dade County GIS streets as reference. The crash attributes include crash time, crash severity, pedestrian count, and bike count. There are three levels of crash severity: PDO (property damage only), injury, and fatal crashes. Since this research explores how crashes are influenced by the built environment, crashes that occurred on interstate highways and Florida's Turnpike are


excluded. 14,965 highway crashes were excluded out of a total of 202,428 crashes. Table 3-1 shows the number of crashes by type.

Census Data: This research uses 2010 Census data at the block group level. Attributes include housing and population. The job data is from 2009. There are a total of 1,594 block groups in the study area. Two block groups in the Everglades National Park area were excluded from the study because they cover very large forest areas with only a few other land use types around the boundary, and have very few crashes.

Land Use: The parcel-level land use data came from the Florida Geographic Data Library (FGDL). Six land use types (residential, retail/office, institutional, recreation, industrial, and agricultural) were mainly used for the spatial relationship calculation and extraction. Figure 3-2 through Figure 3-7 show the land use maps for the six land use types.

Street: The street network data was obtained from the Miami-Dade County GIS database system.

Intersection: The intersection data was from Signal Four Analytics, hosted at the Geoplan Center, University of Florida. There are a total of 45,097 intersections in the study area.

Transit Data: Includes both bus stops and bus routes obtained from the Miami-Dade County GIS database system. There are a total of 8,563 bus stops in the study area.

Figure 3-8 shows a map of crashes and land use data that were used to calculate the values of the D variables for the built environment in a block group.

Study Design

This research aims at exploring the spatio-temporal influence of the built environment on crashes at the community level. The crash frequency for different crash


types and in different time groups is used as the measure for crashes. Crashes are organized by type: PDO crashes, injury crashes, fatal crashes, pedestrian crashes, bike crashes, and total crashes. To study both the spatial and temporal influence of the built environment, crashes were also classified into four different groups based on the occurrence time. The built environment is measured through the 5Ds variables and the distance and topological relationships among built environment elements. The block group was selected as the spatial analysis unit for the research. Figure 3-9 shows the system design for the research. It contains five main steps:
1) Crash data aggregation: Crashes organized by type are aggregated spatio-temporally by block group using spatial joins.
2) D variables calculation: This part calculates the 5Ds variables of the built environment for each block group.
3) Spatial relationship extraction: This part includes the distance relationship calculation and topological relationship extraction.
4) Discretization: Since all current association rule mining algorithms can only accept categorical data, discretization of all numerical data is needed for further analysis.
5) Spatio-temporal association rule mining and visualization: Broadly, the above four steps can be regarded as data preparation for this step. The first task in this step is mining the rules, which provide information on the kinds of conditions that result in a large number of crashes. The visualization of the rules helps in understanding them.

Tools Used

Several tools are used in this research. To better explain the steps mentioned in the system design, an introduction of these tools is necessary.


ArcGIS
Esri's ArcGIS is a geographic information system (GIS) for working with maps and geographic information. It is used for creating and using maps; compiling geographic data; analyzing mapped information; sharing and discovering geographic information; using maps and geographic information in a range of applications; and managing geographic information in a database (http://en.wikipedia.org/wiki/Arcgis). ArcGIS is the most popular GIS tool, and some of its toolboxes are used in this research to view and process the GIS data.

QGIS
QGIS (previously known as "Quantum GIS") is a cross-platform, free and open source desktop geographic information system (GIS) application that provides data viewing, editing, and analysis capabilities. QGIS integrates well with the open source GIS package PostGIS, which is used in this research. Its capability can be easily extended using Python-based plugins.

PostgreSQL
PostgreSQL is an open source object-relational database management system (ORDBMS) with an emphasis on extensibility and standards compliance.

PostGIS
PostGIS is a spatial database extender for the PostgreSQL database. It adds support for geographic objects, allowing location queries to be run in SQL. PostGIS is used in this research for: 1) importing and exporting GIS data to and from the PostgreSQL database; 2) using its development library to conduct spatial analysis on GIS data stored in the PostgreSQL database.

R
R is a language and environment for statistical computing and graphics. R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time series analysis, classification, clustering, ...) and graphical techniques, and is


highly extensible (http://www.r-project.org/). In addition to its built-in statistics packages, R has many extension packages covering statistics and data mining that are developed by the open source community. The association rule data mining, visualization, and part of the rule analysis in this research are done using R. The main R packages used in this research include arules, arulesViz, and RPostgreSQL.

RStudio
The RStudio IDE is the user interface for R. The newest RStudio supports debugging. RStudio is used as the development environment for this research.

arules
The R package arules provides a basic infrastructure for creating and manipulating input data sets and for analyzing the resulting itemsets and rules. The package also includes interfaces to two fast mining algorithms, the popular C implementations of Apriori and Eclat by Christian Borgelt. These algorithms can be used to mine association rules. The package is used in this research for manipulating the block-group-based crash and built environment data and for mining and analyzing the association rules among them.

arulesViz
arulesViz implements several known and novel visualization techniques to explore association rules.

RPostgreSQL
This package provides a Database Interface (DBI) compliant driver for R to access PostgreSQL database systems. It is used in the research for linking the rules found to the corresponding block groups, which makes it possible to examine on a map the block groups that satisfy a given rule.

Weka GDPM
Weka GDPM is an interoperable module that supports automatic geographic data preprocessing for spatial data mining. GDPM is implemented in Weka, which is a free and open source classical data mining toolkit that has been


widely used in academic institutions. GDPM follows the Open GIS specifications to support interoperability with Geographic Information Systems. Weka GDPM is used as the development environment to implement the functions for calculating spatial distance variables and extracting spatial topological relationship variables.

Crash Data Aggregation

The purpose of this task was to aggregate crashes by block group according to their spatial location. The frequency of crashes in a block group is considered. Aggregation is accomplished in two ways. First, crashes are spatially aggregated by block group separately for each of the 6 crash types: total, PDO, injury, fatal, pedestrian, and bike crashes. Second, crashes are spatially aggregated by block group with consideration of the crash occurrence time.

Crash Data Aggregation without Considering Crash Time

The following list gives the crash aggregations without considering crash time for the 6 crash types at the block group level.
Frequency of total crashes
Frequency of PDO crashes
Frequency of injury crashes
Frequency of fatal crashes
Frequency of pedestrian crashes
Frequency of bike crashes
One of the most important questions is how to assign crashes to a block group, especially those close to the block group boundaries. The method from previous research (Dumbaugh and Li 2010; Dumbaugh et al. 2011) was used in this research. A 200-foot buffer was created around each block group, and the buffered area was used to calculate the number of crashes inside the buffered block group. The buffer


approach may cause some crashes to be aggregated into two adjacent block groups (double counting), since crashes are geocoded at street centerlines and street centerlines are used as block group boundaries. This is reasonable since these crashes are influenced by the built environment in both adjacent block groups. The aggregation process was accomplished by using the spatial operation ST_Within in PostGIS/PostgreSQL. The final result shows the frequency of each crash type for each block group. In the aggregation of total crashes, five block groups had no crashes. A further look at these five block groups shows that they are either close to or on the seashore and contain no meaningful land use (only water). These block groups have no meaning for the research and were excluded. Thus, 1,587 block groups in total are valid for further analysis. Table 3-2 shows the number of block groups without crashes for each crash type. Table 3-3 is the aggregation result. Figure 3-10 is the plot of the aggregation result.

Crash Data Aggregation Considering Crash Time

The purpose of this task is to classify crashes into groups based on crash time, categorize them by type, and then spatially aggregate them by block group. The groups are determined based on statistics of the crash data distribution by time. Table 3-4 shows the number of total crashes for each hour. The total number of crashes is 184,381. This number is smaller than the total number of crashes considered in the above section because the occurrence time attribute is NA for 3,138 crashes. Figure 3-11 shows the distribution of the number of crashes by time.
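The buffered assignment of crashes to block groups described in this section can be illustrated with a small, pure-Python sketch. It is not the PostGIS ST_Within query used in the research: block groups are simplified to axis-aligned rectangles (an assumption made only so the example is self-contained), and the names `distance_to_rect` and `count_crashes` are hypothetical. A crash is counted for every block group whose boundary lies within 200 feet, so a crash on a shared boundary is counted twice, as the text notes.

```python
from math import hypot

BUFFER_FT = 200.0  # buffer distance stated in the text


def distance_to_rect(x, y, rect):
    """Distance from a point to an axis-aligned rectangle (0 if inside)."""
    xmin, ymin, xmax, ymax = rect
    dx = max(xmin - x, 0.0, x - xmax)
    dy = max(ymin - y, 0.0, y - ymax)
    return hypot(dx, dy)


def count_crashes(crashes, block_groups, buffer_ft=BUFFER_FT):
    """Count crashes falling inside each buffered block group."""
    counts = {bg_id: 0 for bg_id in block_groups}
    for x, y in crashes:
        for bg_id, rect in block_groups.items():
            if distance_to_rect(x, y, rect) <= buffer_ft:
                counts[bg_id] += 1
    return counts


# Two adjacent, hypothetical block groups sharing the boundary x = 1000.
block_groups = {"A": (0, 0, 1000, 1000), "B": (1000, 0, 2000, 1000)}
crashes = [(500, 500), (1000, 400), (1900, 900), (2300, 500)]
counts = count_crashes(crashes, block_groups)
print(counts)
```

In this toy layout the crash at (1000, 400) sits on the shared boundary and is counted for both block groups, while the crash at (2300, 500) is more than 200 feet from either and is counted for neither.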


Based on Table 3-4 and Figure 3-11, the crash data is classified into four groups. Table 3-5 shows the time periods and average number of crashes per hour for each group.
Group 1 is from 12am to 6am. It covers the late night and very early morning time period. It has the lowest number of crashes among the 4 groups due to the least human activity.
Group 2 is from 7am to 11am, for the morning. People have more travel activities in this period than in group 1, especially when people drive to work from 8am to 9am.
Group 3 is from 12pm to 6pm. It has the highest number of crashes compared to the other groups. The total number of crashes reaches the peak of the day from 4pm to 5pm. The average number of crashes per hour in this time group is 12,043, which is much higher than in the other groups.
Group 4 is from 7pm to 11pm. The number of crashes decreases gradually.
The crash aggregation considering crash time used the same method as in the section without considering time. The only difference is that fatal crashes were not considered independently but were combined with injury crashes as one category. One reason is that the number of fatal crashes is small when divided into four time groups; the other is to reduce the number of aggregation categories (number of crash types multiplied by number of time groups) from 24 to 20. Table 3-6 is the summary of the block groups that have no crashes for each crash type in the different time groups. Many block groups have no pedestrian crashes and especially no bike crashes. Group 1 leaves only 263 block groups with bike crashes (16.6%).
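The time grouping and the merged injury/fatal category can be sketched as follows. This is an illustrative Python sketch, not the code used in the research; it assumes the stated periods are inclusive hours of the day (0 to 6, 7 to 11, 12 to 18, 19 to 23), that a pedestrian or bike crash is one with a positive pedestrian or bike count, and that crashes with a missing time are skipped. All function and field names are hypothetical.

```python
from collections import Counter

# Five categories once fatal crashes are merged with injury crashes.
CRASH_TYPES = ("total", "pdo", "injury_fatal", "pedestrian", "bike")
SEVERITY_CATEGORY = {"pdo": "pdo", "injury": "injury_fatal", "fatal": "injury_fatal"}


def time_group(hour):
    """Map an hour of day (0-23) to time group 1-4 (inclusive bounds assumed)."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    if hour <= 6:
        return 1
    if hour <= 11:
        return 2
    if hour <= 18:
        return 3
    return 4


def aggregate(records):
    """Count crashes per (block group, time group, crash category)."""
    counts = Counter()
    for block_group, hour, severity, pedestrians, bikes in records:
        if hour is None:  # occurrence time is NA: not usable here
            continue
        group = time_group(hour)
        categories = ["total", SEVERITY_CATEGORY[severity]]
        if pedestrians > 0:
            categories.append("pedestrian")
        if bikes > 0:
            categories.append("bike")
        for category in categories:
            counts[(block_group, group, category)] += 1
    return counts


records = [
    ("bg1", 2, "pdo", 0, 0),
    ("bg1", 17, "fatal", 1, 0),
    ("bg1", 17, "injury", 0, 0),
    ("bg1", None, "pdo", 0, 0),
]
counts = aggregate(records)
n_categories = len(CRASH_TYPES) * 4  # 5 crash categories x 4 time groups
```

With five crash categories and four time groups, the sketch yields the 20 aggregation categories mentioned in the text.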


Table 3-7, Table 3-8, Table 3-9, and Table 3-10 show the summary statistics of crashes by block group for time groups 1, 2, 3, and 4, respectively. Figure 3-12, Figure 3-13, Figure 3-14, and Figure 3-15 show crashes by block group in each time group.

D Variables

The selected 5Ds variables of the built environment used as the independent variables in this research are as follows:
Density
  Population density (per square mile)
  Housing density (per square mile)
  Job density (per square mile)
Diversity
  Land use mix index
Design
  Street length (feet per square mile)
  Number of intersections (per square mile)
Destination accessibility
  Distance to commercial (feet)
Distance to transit
  Number of bus stops (per square mile)
  Average distance to bus stops (feet)
Population, housing, and job data were aggregated by block group. Distance to commercial and distance to bus stops were calculated as the average distance from all of the residential land uses in the block group to commercial land uses and bus stops.
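The land use mix index listed under Diversity is an entropy measure. The sketch below shows one common normalized form, in which the entropy of the land use area shares is divided by the logarithm of the number of land use types; that normalization and the choice of the six land use types as the denominator are assumptions for illustration. The research implements its calculation in a Java program, so this Python function (`land_use_mix`, a hypothetical name) is only a stand-in.

```python
from math import log

# The six land use types named in the Data section.
LAND_USE_TYPES = (
    "residential", "retail_office", "institutional",
    "recreation", "industrial", "agricultural",
)


def land_use_mix(areas, n_types=len(LAND_USE_TYPES)):
    """Normalized entropy of land use area shares in one block group.

    `areas` maps land use type to area; types with zero area contribute
    nothing (the limit of p * ln(p) as p approaches 0 is 0).
    """
    total = sum(areas.values())
    if total <= 0:
        raise ValueError("block group has no land use area")
    entropy = 0.0
    for area in areas.values():
        if area > 0:
            share = area / total
            entropy -= share * log(share)
    return entropy / log(n_types)


single_use = land_use_mix({"residential": 10.0})
even_mix = land_use_mix({name: 1.0 for name in LAND_USE_TYPES})
```

A block group with a single land use gets an index of zero, and the index grows as the area is spread more evenly over the land use types.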


The number of intersections by block group was calculated by spatially joining intersection points to block groups. Street length by block group was calculated by clipping streets to block group boundaries and summing the length of all the streets inside the block group.

The land use mix was calculated using the entropy measure. Entropy is widely used in the land use literature. The entropy measure was first used to quantify land use balance by Cervero and Kockelman (1997) in looking at suburban employment centers. The formula to calculate the entropy measure is shown below:

$$\text{Land use mix index (entropy)} = -\frac{\sum_{i=1}^{k} p_i \ln(p_i)}{\ln(k)} \qquad (3\text{-}1)$$

where $p_i$ is the proportion of land use area of type $i$ in the sum of all land use area types in a block group, and $k$ is the number of land use types. The entropy calculation is implemented through a Java program. The land use mix index is a value in [0, 1). The bigger the number, the higher the land use mix in the block group. Zero represents a single land use for the entire block group, with no mix at all. Table 3-11 shows the summary statistics for the calculated D variables in the study area.

Spatial Distance and Topological Relationship Extraction

Introduction

As discussed in the spatial association rule data mining literature (Malerba et al. 2003; Ceci et al. 2004; Sharma et al. 2005), there are three kinds of relationships between spatial objects: topological relationships, such as disjoint, intersects, inside/outside, adjacent to, covers/covered by, equal, etc.; direction or orientation or


ordering relationships, such as left, right, north, east, etc.; and distance relationships, such as close to, far away, etc. Since direction relationships are too complicated to describe and are rarely used in built environment and safety analysis, this research only considers the topological and distance relationships. Table 3-12 shows the list of layers chosen to participate in the spatial relationship extraction.

Topological relationships characterize the type of intersection between two spatial features and remain invariant under topological transformations such as rotation and scaling. There are many approaches in the literature to formally define a set of topological relationships among points, lines, and polygons. The most widely used framework for describing the topological relationship between spatial objects is the 9-intersection model proposed by Egenhofer and Franzosa (1991) and Egenhofer and Herring (1994) to categorize binary topological relations in geographic databases. It is independent of the concepts of distance and direction and is based purely upon topological properties. The 9-intersection model applies to spatial objects represented by regions/polygons, lines, and points. It is based on the consideration that for each spatial object it is possible to distinguish three parts: its interior, its boundary, and its exterior. In the case of spatial objects described in Cartesian space, regions have non-empty interiors, lines and points have empty interiors, lines have non-empty boundaries (they coincide with them), and points have empty boundaries.
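For reference, the 9-intersection model is usually written as a 3 x 3 matrix of set intersections between the interior, boundary, and exterior of two objects A and B. The notation below (a superscript circle for interior, the partial-derivative symbol for boundary, a superscript minus for exterior) is the conventional one and is assumed here rather than taken from the dissertation.

```latex
\[
I_9(A, B) =
\begin{pmatrix}
A^{\circ} \cap B^{\circ} & A^{\circ} \cap \partial B & A^{\circ} \cap B^{-} \\
\partial A \cap B^{\circ} & \partial A \cap \partial B & \partial A \cap B^{-} \\
A^{-} \cap B^{\circ} & A^{-} \cap \partial B & A^{-} \cap B^{-}
\end{pmatrix}
\]
```

Each entry is recorded only as empty or non-empty, and the resulting pattern identifies the relation. For two regions, for example, disjoint corresponds to the four interior and boundary intersections in the upper-left 2 x 2 block all being empty, while touch differs only in that the boundary-boundary intersection is non-empty.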


For this research, not all possible topological relationships between these layers are considered. Some of them are excluded because the topological relationship has no practical meaning. For example, the topological relationship between point layers (bus stops, intersections) is not considered because it has no practical meaning. The same situation exists between the point features and the line features: all the point data in this research (crashes, bus stops, and intersections) are on the line features (street network or bus routes). For polygons, since the different land use types never overlap with each other, only the touch and disjoint relationships are considered. For the topological relationship between polylines and polygons, only cross (polyline crosses polygon) and contain (polygon contains polyline) are considered in this research. Table 3-13 shows the topological relationships considered in this research.

The pairs of feature layers that participate in the spatial distance relationship calculation are almost the same as those in the topological relationship extraction process, except that the residential and bus stop pair is only available for the distance relationship. All of the distance values calculated are average distances. Table 3-14 shows the spatial distance relationships relevant to the association rule data mining in this research.

Methods of Spatial Relationship Extraction and Calculation

Figure 3-18 illustrates how the spatial topological relationships are extracted and the distance relationships are calculated. The whole process is implemented in the Weka GDPM (Bogorny et al. 2006) development environment. The most important part of the algorithm is the query of the relationship between a pair of feature layers in a block group. This is accomplished by using PostGIS/PostgreSQL from Java. The results were stored in the database.
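The average-distance relationship between a pair of layers in one block group can be sketched as below. The text does not specify exactly how the average is formed, so this sketch assumes one plausible reading: for each feature of the first layer, take the distance to the nearest feature of the second layer, then average those values. Features are reduced to representative points, which is also an assumption, and `average_nearest_distance` is a hypothetical name; the research itself computes distances through PostGIS queries issued from Java.

```python
from math import hypot


def average_nearest_distance(sources, targets):
    """Mean distance from each source point to its nearest target point.

    Returns None when either layer is absent from the block group, so the
    pair has no distance value.
    """
    if not sources or not targets:
        return None
    total = 0.0
    for sx, sy in sources:
        total += min(hypot(sx - tx, sy - ty) for tx, ty in targets)
    return total / len(sources)


# Hypothetical coordinates in feet within one block group.
residential = [(0.0, 0.0), (6.0, 8.0)]
commercial = [(3.0, 4.0)]
avg_dist = average_nearest_distance(residential, commercial)
no_value = average_nearest_distance(residential, [])
```

Returning None for a missing layer keeps such block groups distinguishable from those with a genuine distance of zero.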


Results of Spatial Relationship Extraction and Calculation

Table 3-15 shows the statistical information for the spatial distance relationship calculated at the block group level the block group has no values for one of both layers The such block groups. Table 3-16 shows the statistical information for the topological relationships extracted at the block group level. The disjoint relationship indicates that both layers are present in the block group but do not touch each other. NA means that one or both layers do not exist in the block group.

Discretization

Association rule data mining can only work with categorical data (Mennis and Liu 2005). One approach to address this problem is to discretize numeric data into ordinal categories and mine the ordinal data for association rules. The data that need to be discretized in this research fall into three categories: the number of crashes by type, the 5Ds variables, and the spatial distance variables. To avoid the complexity of rules resulting from a large number of discretization classes, only three classes are used in this research. The number of crashes and the values of the 5Ds variables are converted to big, intermediate, and small. The distance variables are discretized to far, intermediate, and close.

Methods of Discretization

Discretization methods can be divided into two types: supervised and unsupervised discretization. In supervised discretization, given a numeric attribute to be discretized into several classes, one looks at correlations between this attribute and the


productivity of the class (Lud and Widmer 2000). In contrast, there are no class attributes that could be used in unsupervised discretization. Since association rule data mining is an unsupervised learning method that needs discrete attributes, and no class attributes are available, only unsupervised methods can be used. The R package arules provides four widely used unsupervised discretization methods: "interval" (equal interval width), "frequency" (equal frequency), "cluster" (k-means clustering), and "fixed" (categories specify interval boundaries). The obvious weakness of the equal width method is that in cases where the observations are not distributed evenly, a large amount of important information can be lost after the discretization process (Kotsiantis and Kanellopoulos 2006). Since the distribution of the three categories of data in this study is far from even, equal width discretization is not suitable for this research. The equal frequency method has the disadvantage that many occurrences of the same continuous value could be assigned to different bins (Kotsiantis and Kanellopoulos 2006). The R package arules makes an improvement that assigns duplicate values to one bin only, so the frequency in each bin is not strictly equal. Generally, k-means clustering discretization is a better method when considering the distribution of the data. However, it is very sensitive to the value of k, and an incorrect estimate of k could lead to unsatisfactory results (Vannucci and Colla 2004). Since k in this research is 3, not all variables are suitable for discretization with this method, especially the distances, which vary from very small to very large


values. Block groups containing such distances will become outliers if the k-means method is used.

Since the goal of this research is to find how the built environment influences large numbers of crashes at the block group level, the big class of each crash type is on the right-hand side of the potential rules. Thus the equal frequency method is the best method for this study, because if the cluster or the equal width method is used, it is very likely that the variable for the number of crashes becomes an outlier for those block groups that have a large number of crashes.

As for the D variables, since they are used for all crash types, they need to be discretized using the same thresholds in the association rule mining for different crash types, although the number of block groups in which the crash number is greater than 0 differs. The cluster method may result in outliers, and the equal frequency method is as good as the fixed method at reflecting the distribution of the data. The quartile-based fixed discretization method is a better choice than the other methods.

Before discretization is conducted for the distance variables, basic statistics were computed for all 14,152 distance measures with a value equal to or greater than 0. The result is presented in Table 3-17. Figure 3-19 shows the distribution of the distance values. It shows that most of the values fall within a relatively small distance range. Based on this result, the fixed discretization method is adopted, and the 1st and 3rd quartiles are selected as the thresholds, which means the fixed intervals are: [0, 648.2], (648.2, 1,542], (1,542, 31,310].

After trying different methods of discretization on the values of the 3 categories, the following discretization methods were chosen: equal frequency for the number of


crashes, the quartile-based fixed method for the D variables (except the distance), and the fixed method for all distance variables.

Result of Discretization

The discretization thresholds and results for crash numbers for each crash type without considering the time groups are in Table 3-18. Table 3-19 through Table 3-23 show the discretization results for each crash type in the 4 time groups. Table 3-24 shows the D variables discretization result. Since the quartile-based fixed method is used, most of the intermediate groups contain about double the number of block groups of the small and big groups. The intersection number and bus stop number are exceptions, because many block groups have the same value of the variable at the edge of the threshold. Table 3-25 shows the result for the distance discretization.

Association Rule Data Mining

As introduced in the tools section, the R package arules was used for the association rule mining and analysis, and the arulesViz package was used for rule visualization. Additionally, the RPostgreSQL package was used to further understand the mined rules. This involved finding the corresponding block groups for a rule and writing the block group geometry, crash-related data, D variables, topological relationship information, and distance information for those block groups into the database.

The output of the discretization process serves as the input for the association rule mining process. The discretization results were loaded into R, and then the apriori function of arules was called to mine the potential rules. The apriori function implements the Apriori algorithm with some improvements, such as the addition of a prefix tree and item

PAGE 60

60 sorting. The apriori function has several parameters Table 3 26 lists the parameters The meaning of the parameter also is explained below through a sam ple code. totalrules< apriori(crashplusbetran, parameter = list(support = 0.1, confidence = 0.6, minlen = 2, maxlen=6), appearance = list(rhs = c( total_crash=big ))) From above, the apriori is the function of mining association rules. The crashplusbetran i s the transaction data that were transferred from the discretization data. T ransaction is a n internal data type of arules and all discretized data should be convert ed to it before they can be used by the apriori function. Minimum s upport, minimum confidenc e, minlen (minimal length), and maxlen(max length) are user defined parameter for the rules. Appearance defines how the rules will appear, and it has the function of filtering the ( right hand sides which means only the rules that have the required rhs will be mined. Decision on the values for minimum support, minimum confidence, were guided by previous studies: ( Montella et al. (2011b) Montella et al. (2011a) ) used the threshold 1.5. Diana (2012) used mining r ules with Support > 0.01 and Lift > 1.1 ( Pande and Abdel Aty 2008 ) used support <0.80%, confidence <10% ( Geurts et al. 2003 ) utilzied minsup = 5 % and minconf = 30 % the algorithm obtained 187 829 frequent item sets of maximum size 4 for which 598 584 association rules could be generated. The minimum support used in this study is minsup = 5%, the minimum confidence is minconf = 50%, the minlen =2, and maxlen = 5.
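The quantities behind these parameters can be illustrated with a minimal sketch. It is written in plain Python rather than with the arules package, it enumerates candidate antecedents by brute force (no Apriori pruning, prefix tree, or item sorting), and the four toy transactions are hypothetical. The function name mine_rules and the data are assumptions for illustration only; the default thresholds mirror the values stated for this study.

```python
from itertools import combinations

def mine_rules(transactions, rhs, min_support=0.05, min_confidence=0.5,
               minlen=2, maxlen=5):
    """Brute-force rule search for one fixed right-hand-side item.

    Rule length counts the lhs items plus the single rhs item.
    Illustrative only: no Apriori pruning is performed.
    """
    n = len(transactions)
    rhs_support = sum(rhs in t for t in transactions) / n
    if rhs_support == 0:
        return []
    lhs_items = sorted({item for t in transactions for item in t} - {rhs})
    rules = []
    for size in range(max(minlen - 1, 1), maxlen):
        for lhs in combinations(lhs_items, size):
            lhs_set = set(lhs)
            lhs_count = sum(lhs_set <= t for t in transactions)
            if lhs_count == 0:
                continue
            both_count = sum(lhs_set <= t and rhs in t for t in transactions)
            support = both_count / n               # share of records with lhs and rhs
            confidence = both_count / lhs_count    # P(rhs | lhs)
            if support >= min_support and confidence >= min_confidence:
                rules.append({"lhs": lhs, "rhs": rhs, "support": support,
                              "confidence": confidence,
                              "lift": confidence / rhs_support})
    return sorted(rules, key=lambda r: r["lift"], reverse=True)

# Hypothetical block-group transactions (one set of items per block group).
toy_transactions = [
    {"busstop_num_sqmile=big", "mix_index=big", "total_crash_num=big"},
    {"busstop_num_sqmile=big", "total_crash_num=big"},
    {"busstop_num_sqmile=small", "mix_index=big"},
    {"busstop_num_sqmile=small"},
]
toy_rules = mine_rules(toy_transactions, rhs="total_crash_num=big")
for rule in toy_rules:
    print(rule["lhs"], "=>", rule["rhs"], round(rule["lift"], 2))
```

A lift above 1 means the antecedent makes the big-crash consequent more likely than its baseline frequency, which is why the rules are ranked by lift in the sketch.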

Table 3-1. Number of each crash type

Crash type          Number
Total crash         187,519
PDO crash           155,522
Injury crash         30,850
Fatal crash           1,147
Pedestrian crash      6,015
Bike crash            2,255

Table 3-2. Number of block groups that have no crash

                   Total crash   PDO crash   Injury crash   Fatal crash   Ped crash   Bike crash
No. block groups   0             0           22             605           148         327

Table 3-3. Summary of the number of crashes by block group for all crash types

           Total crash   PDO crash   Injury crash   Fatal crash   Ped crash   Bike crash
Min        1             1           0              0             0           0
1st Qu.    88.5          69.0        16             0             2           1
Median     191           151         33             1             5           2
Mean       255           211.2       42.23          1.517         8.105       3.044
3rd Qu.    342.5         283.5       56             2             11.5        4
Max        3,801         3,091       687            23            57          34

Table 3-4. Summary of total crash by time

Crash hour        Crash count
(23:00, 0:00]     4,039
(0:00, 1:00]      3,194
(1:00, 2:00]      2,610
(2:00, 3:00]      2,403
(3:00, 4:00]      2,508
(4:00, 5:00]      2,743
(5:00, 6:00]      4,436
(6:00, 7:00]      7,423
(7:00, 8:00]      9,566
(8:00, 9:00]      8,290
(9:00, 10:00]     8,230
(10:00, 11:00]    9,176
(11:00, 12:00]    10,486
(12:00, 13:00]    10,702
(13:00, 14:00]    11,700
(14:00, 15:00]    13,190
(15:00, 16:00]    12,981
(16:00, 17:00]    13,309
(17:00, 18:00]    11,931
(18:00, 19:00]    9,024
(19:00, 20:00]    7,992
(20:00, 21:00]    7,401
(21:00, 22:00]    6,057
(22:00, 23:00]    4,990

Table 3-5. Time group of crashes

Group   Time periods   Average crashes per hour
1       [0, 7)         3,133
2       [7, 12)        8,537
3       [12, 19)       12,043
4       [19, 24)       7,093

Table 3-6. Summary of block groups that have no crashes for each crash type in different time groups

          Total crash   PDO crash   Injury crash   Ped crash   Bike crash
Group 1   25            31          169            868         1,324
Group 2   14            19          98             588         912
Group 3   6             8           45             355         629
Group 4   15            23          102            554         1,004
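The averages in Table 3-5 can be recomputed from the hourly counts in Table 3-4. The sketch below is plain Python and rests on one assumption: each hourly bin is keyed by its right endpoint, so the bin (23:00, 0:00] is treated as hour 0. Under that keying the four group averages match the published values; the variable and function names are illustrative.

```python
# Hourly crash counts from Table 3-4, keyed by the right endpoint of each bin
# (assumption: the bin (23:00, 0:00] is keyed as hour 0).
hourly_counts = {
    0: 4039, 1: 3194, 2: 2610, 3: 2403, 4: 2508, 5: 2743, 6: 4436,
    7: 7423, 8: 9566, 9: 8290, 10: 8230, 11: 9176,
    12: 10486, 13: 10702, 14: 11700, 15: 13190, 16: 12981, 17: 13309, 18: 11931,
    19: 9024, 20: 7992, 21: 7401, 22: 6057, 23: 4990,
}

# Time groups from Table 3-5 as half-open hour ranges [start, end).
TIME_GROUPS = {1: range(0, 7), 2: range(7, 12), 3: range(12, 19), 4: range(19, 24)}

def time_group(hour):
    """Return the time group (1 to 4) that an hour key belongs to."""
    for group, hours in TIME_GROUPS.items():
        if hour in hours:
            return group
    raise ValueError(f"hour out of range: {hour}")

def average_per_hour(counts):
    """Average crashes per hour for each time group, rounded to integers."""
    totals = {group: 0 for group in TIME_GROUPS}
    for hour, count in counts.items():
        totals[time_group(hour)] += count
    return {g: round(totals[g] / len(TIME_GROUPS[g])) for g in TIME_GROUPS}

group_averages = average_per_hour(hourly_counts)
print(group_averages)  # {1: 3133, 2: 8537, 3: 12043, 4: 7093}
```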

Table 3-7. Statistics summary of crashes by block group for time group 1

           Total crash   PDO crash   Injury crash   Ped crash   Bike crash
Min        0             0           0              0           0
1st Qu.    9             7           2              0           0
Median     21            15          5              0           0
Mean       30.56         23.67       6.893          0.8746      0.2313
3rd Qu.    40            31          10             1           0
Max        466           344         122            15          8

Table 3-8. Statistics summary of crashes by block group for time group 2

           Total crash   PDO crash   Injury crash   Ped crash   Bike crash
Min        0             0           0              0           0
1st Qu.    20            16          3              0           0
Median     41            33          7              1           0
Mean       57.72         48.5        9.227          1.777       0.7461
3rd Qu.    76            65          12             3           1
Max        927           754         173            16          20

Table 3-9. Statistics summary of crashes by block group for time group 3

           Total crash   PDO crash   Injury crash   Ped crash   Bike crash
Min        0             0           0              0           0
1st Qu.    39            30          7              1           0
Median     85            68          14             2           1
Mean       113.6         95.41       18.16          3.52        1.474
3rd Qu.    151           126.5       25             5           2
Max        1,739         1,457       282            29          14

Table 3-10. Statistics summary of crashes by block group for time group 4

           Total crash   PDO crash   Injury crash   Ped crash   Bike crash
Min        0             0           0              0           0
1st Qu.    17            13          3              0           0
Median     39            31          7              1           0
Mean       49.14         39.96       9.179          1.834       0.5551
3rd Qu.    66            54          13             3           1
Max        648           516         132            15          9

Table 3-11. Summary statistics for D variables by block group

Variable                          Min   1st Qu.   Median    Mean      3rd Qu.   Max
Pop density                       0     4,836     8,042     11,029    12,696    115,583
Housing density                   0     1,738     2,756     5,404     5,411     90,301
Job density                       0     250.3     638.1     2,056.9   1,509     174,618.8
Land use mix index                0     0.3117    0.5035    0.4815    0.6684    0.9992
Street length (feet)              0     85,929    116,018   107,365   134,141   321,173
No. of intersections              0     132.2     212.5     215.0     276.7     2,919.2
No. of bus stops                  0     8.2       24        31.2      43.4      251.7
Dist. residential to bus stop     0     899.8     1,263.7   1,422.5   1,677     13,218.6
Dist. residential to commercial   0     676.4     1,040.5   1,358.2   1,487.3   29,140.7

Table 3-12. Layers participating in the spatial relationship extraction

Layer type      Layers
Point layer     Bus stop, Intersection
Line layer      Street network, Bus routes
Polygon layer   Agricultural, Residential, Commercial, Recreation, Institutional, Industrial

Table 3-13. Topological relationships that need to be extracted

Layer to layer          Relationship group                    Layer name      Layer name
Polygon with polyline   Land use to bus routes                Residential     Bus routes
                                                              Commercial      Bus routes
                                                              Recreation      Bus routes
                                                              Industrial      Bus routes
                                                              Institutional   Bus routes
                        Land use to street                    Residential     Street
                                                              Commercial      Street
                                                              Recreation      Street
                                                              Industrial      Street
                                                              Institutional   Street
                                                              Agricultural    Street
Polygon with polygon    Residential to other land use types   Residential     Agricultural
                                                              Residential     Commercial
                                                              Residential     Recreation
                                                              Residential     Institutional
                                                              Residential     Industrial
                        Commercial to other land use types    Commercial      Institutional
                                                              Commercial      Recreation
                                                              Commercial      Industrial
                        Industrial to recreation              Industrial      Recreation
                        Institutional to recreation           Institutional   Recreation

Table 3-14. Distance relationships that need to be extracted

Layer to layer          Relationship group                    Layer name      Layer name
Polygon with point      Land use to bus stops                 Residential     Bus stops
Polygon with polyline   Land use to bus routes                Residential     Bus routes
                                                              Commercial      Bus routes
                                                              Recreation      Bus routes
                                                              Industrial      Bus routes
                                                              Institutional   Bus routes
                        Land use to street                    Residential     Street
                                                              Commercial      Street
                                                              Recreation      Street
                                                              Industrial      Street
                                                              Institutional   Street
                                                              Agricultural    Street
Polygon with polygon    Residential to other land use types   Residential     Agricultural
                                                              Residential     Commercial
                                                              Residential     Recreation
                                                              Residential     Institutional
                                                              Residential     Industrial
                        Commercial to other land use types    Commercial      Institutional
                                                              Commercial      Recreation
                                                              Commercial      Industrial
                        Industrial to recreation              Industrial      Recreation
                        Institutional to recreation           Institutional   Recreation

Table 3-15. Summary statistics for spatial distance relationships at the block group level

Variable                            Min      1st Qu.    Median     Mean       3rd Qu.     Max          NA's
dist residential to bus stop        0        899.8      1,263.7    1,422.5    1,677       13,218.6     215
dist residential to commercial      0        676.4      1,040.5    1,358.2    1,487.3     29,140.7     413
dist commercial to bus route        0        174.2      517.7      790.9      1,023.6     20,212.2     480
dist recreation to bus route        0        288.1      857.7      1,336.4    1,603.8     18,805       1,332
dist industrial to bus route        11.69    373.5      772.66     1,099.37   1,247.83    25,088.18    1,202
dist institutional to bus route     0        393        809.3      975        1,256.2     27,973.7     824
dist residential to street          0        737.1      1,018.4    1,237.4    1,370.5     20,062.1     15
dist commercial to street           10.67    849.23     1,190.19   1,528.45   1,699.56    29,314.56    408
dist recreation to street           0        732        1,330      1,786      1,972       22,014       1,276
dist industrial to street           357.4    1,033.9    1,293.3    1,862.1    1,889.2     19,379.8     1,180
dist institutional to street        43.74    922.22     1,203.53   1,484.47   1,665.15    17,049.87    794
dist agricultural to street         325.3    1,385.2    2,043.2    2,956.1    3,675       15,567.1     1,403
dist residential to agricultural    0        1,092      1,757      2,656      3,313       15,732       1,405
dist residential to recreation      0        474.6      1,012.2    1,394      1,645       21,063       1,275
dist residential to institutional   0        720.3      1,035      1,292.8    1,446.1     21,886.9     795
dist residential to industrial      133.7    842.4      1,174.4    1,751.1    1,830.1     20,511       1,182
dist commercial to institutional    0        597.1      1,002.6    1,293.4    1,503.1     11,954.5     894
dist commercial to recreation       0        547.7      1,228.9    2,197.1    2,235.3     31,309.5     1,372
dist commercial to industrial       0        616.6      970.8      1,410.5    1,462       29,000.2     1,189
dist industrial to recreation       218.8    888.8      1,900.5    3,345.4    4,016.2     22,702.4     1,507
dist institutional to recreation    0        609.5      1,387.9    1,907.3    2,460.6     25,531.8     1,458
dist residential to bus route       1.523    511.786    871.968    982.6      1,204.199   20,333.605   143

Table 3-16. Summary statistics for spatial topological relationships extracted at the block group level

Variable                              Relationship       Count   Disjoint   NA
topo_agricultural_geostreet           CROSSES/CONTAINS   103     81         1,403
topo_fullresidential_agricultural     TOUCHES            153     29         1,405
topo_fullresidential_busroute         CROSSES/CONTAINS   133     1,236      218
topo_fullresidential_geostreet        CROSSES/CONTAINS   900     672        15
topo_fullresidential_industrial       TOUCHES            277     128        1,182
topo_fullresidential_institutional    TOUCHES            688     104        795
topo_fullresidential_recreation       TOUCHES            271     41         1,275
topo_fullresidential_retailoffice     TOUCHES            1,096   78         413
topo_industrial_busroute              CROSSES/CONTAINS   14      371        1,202
topo_industrial_geostreet             CROSSES/CONTAINS   106     301        1,180
topo_industrial_recreation            TOUCHES            10      70         1,507
topo_institutional_busroute           CROSSES/CONTAINS   26      737        824
topo_institutional_geostreet          CROSSES/CONTAINS   142     651        794
topo_institutional_recreation         TOUCHES            23      106        1,458
topo_recreation_busroute              CROSSES/CONTAINS   10      245        1,332
topo_recreation_geostreet             CROSSES/CONTAINS   168     143        1,276
topo_retailoffice_busroute            CROSSES/CONTAINS   53      1,054      480
topo_retailoffice_geostreet           CROSSES/CONTAINS   287     892        408
topo_retailoffice_industrial          TOUCHES            312     86         1,189
topo_retailoffice_institutional       TOUCHES            383     310        894
topo_retailoffice_recreation          TOUCHES            66      149        1,372

Table 3-17. Summary statistics of all spatial distance variables

Min.   1st Qu.   Median    Mean      3rd Qu.   Max.
0.0    648.2     1,051.0   1,358.0   1,542.0   31,310.0

Table 3-18. Crashes discretization without considering the occurrence time

Variable           Threshold   Number   Threshold    Number   Threshold     Number
total_crash_num    [1, 119)    530      [119, 275)   529      [275, 3801]   528
pdo_crash_num      [1, 95)     534      [95, 227)    527      [227, 3091]   526
injury_crash_num   [1, 22)     539      [22, 49)     522      [49, 687]     509
fatal_crash_num    1           385      [2, 4)       426      [4, 23]       176
ped_crash_num      [1, 5)      578      [5, 11)      430      [11, 57]      436
bike_crash_num     [1, 3)      584      [3, 5)       330      [5, 34]       351

Table 3-19. Total crash discretization considering the occurrence time

Total crash   Threshold   Number   Threshold   Number   Threshold     Number
tgp1          [1, 14)     541      [14, 34)    511      [34, 466]     510
tgp2          [1, 27)     531      [27, 63)    530      [63, 927]     512
tgp3          [1, 52)     529      [52, 124)   526      [124, 1739]   526
tgp4          [1, 25)     541      [25, 56)    513      [56, 648]     518
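The equal frequency idea behind Table 3-18 and Table 3-19 can be sketched as follows. This is a simplified plain-Python illustration, not the routine used in the study: the cut-point rule (taking the sorted value at each one-third position) and the labels small, intermediate and big are assumptions, and the sample counts are hypothetical. With tied values the bins are only approximately equal in size, which is consistent with the slightly uneven counts in the tables.

```python
def equal_frequency_cuts(values, bins=3):
    """Cut points that split the sorted values into roughly equal-sized bins."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(n * k) // bins] for k in range(1, bins)]

def label(value, cuts, labels=("small", "intermediate", "big")):
    """Assign a label using left-closed bins [lower, upper)."""
    index = sum(value >= cut for cut in cuts)
    return labels[index]

# Hypothetical crash counts for nine block groups that have at least one crash.
crash_counts = [1, 2, 3, 4, 5, 6, 7, 8, 9]
cuts = equal_frequency_cuts(crash_counts)
labels = [label(v, cuts) for v in crash_counts]
print(cuts)    # [4, 7]
print(labels)
```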

Table 3-20. PDO crash discretization considering the occurrence time

pdo_crash_num   Threshold   Number   Threshold   Number   Threshold     Number
tgp1            [1, 11)     534      [11, 26)    519      [26, 344]     503
tgp2            [1, 22)     527      [22, 53)    525      [53, 754]     516
tgp3            [1, 42)     529      [42, 103)   527      [103, 1457]   523
tgp4            [1, 20)     532      [20, 45)    513      [45, 516]     519

Table 3-21. Injury crash discretization considering the occurrence time

injury_crash_num   Threshold   Number   Threshold   Number   Threshold   Number
tgp1               [1, 4)      501      [4, 9)      457      [9, 122]    460
tgp2               [1, 6)      567      [6, 11)     437      [11, 173]   485
tgp3               [1, 10)     514      [10, 22)    534      [22, 282]   494
tgp4               [1, 6)      560      [6, 12)     471      [12, 132]   454

Table 3-22. Pedestrian crash discretization considering the occurrence time

ped_crash_num   Threshold   Number   Threshold   Number   Threshold   Number
tgp1            1           400      2           159      [3, 15]     160
tgp2            1           364      [2, 4)      361      [4, 16]     274
tgp3            [1, 3)      504      [3, 6)      365      [6, 29]     363
tgp4            1           374      [2, 4)      384      [4, 15]     275

Table 3-23. Bike crash discretization considering the occurrence time

bike_crash_num   Threshold   Number   Threshold   Number   Threshold   Number
tgp1             1           204      [2, 8]      59
tgp2             1           417      2           148      [3, 20]     110
tgp3             1           394      2           247      [3, 14]     317
tgp4             1           400      [2, 9]      183

Table 3-24. D variables discretization

Variable                    Threshold       Number   Threshold         Number   Threshold       Number
pop_density_sqmile          [-Inf, 4836)    397      [4836, 12696)     793      [12696, Inf]    397
housing_density_sqmile      [-Inf, 1738)    397      [1738, 5411)      793      [5411, Inf]     397
job_density                 [-Inf, 250)     397      [250, 1509)       793      [1509, Inf]     397
mix_index                   [-Inf, 0.312)   397      [0.312, 0.668)    793      [0.668, Inf]    397
street_length_feet_sqmile   [-Inf, 85929)   333      [85929, 134141)   637      [134141, Inf]   295
intersection_num_sqmile     [-Inf, 132.2)   328      [132.2, 276.7)    644      [276.7, Inf]    293
busstop_num_sqmile          [-Inf, 8.2)     349      [8.2, 43.3)       654      [43.3, Inf]     262

Table 3-25. Distance variables discretization

Variable                            Close   Far   Intermediate   NA
dis_fullresidential_busstop         180     418   774            215
dis_fullresidential_retailoffice    265     280   629            413
dis_retailoffice_busroute           623     124   360            480
dis_recreation_busroute             106     69    80             1,332
dis_industrial_busroute             163     71    151            1,202
dis_institutional_busroute          308     118   337            824
dis_fullresidential_geostreet       301     313   958            15
dis_retailoffice_geostreet          189     340   650            408
dis_recreation_geostreet            68      124   119            1,276
dis_industrial_geostreet            23      155   229            1,180
dis_institutional_geostreet         82      239   472            794
dis_agricultural_geostreet          5       129   50             1,403
dis_fullresidential_agricultural    11      111   60             1,405
dis_fullresidential_recreation      94      90    128            1,275
dis_fullresidential_institutional   156     178   458            795
dis_fullresidential_industrial      48      128   229            1,182
dis_retailoffice_institutional      194     168   331            894
dis_retailoffice_recreation         62      90    63             1,372
dis_retailoffice_industrial         111     89    198            1,189
dis_industrial_recreation           11      48    21             1,507
dis_institutional_recreation        33      58    38             1,458
dis_fullresidential_busroute        506     198   740            143
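The fixed discretization of the distance variables can be sketched as a simple threshold lookup. The thresholds 648.2 and 1,542 are the 1st and 3rd quartiles reported in Table 3-17; the assignment of the labels close, intermediate and far to the three intervals, the handling of missing distances as NA, and the function names are assumptions for illustration, written in plain Python.

```python
CLOSE_MAX = 648.2    # 1st quartile of all distance measures (Table 3-17)
FAR_MIN = 1542.0     # 3rd quartile of all distance measures (Table 3-17)

def discretize_distance(distance):
    """Map a distance in feet to close / intermediate / far, or NA if missing."""
    if distance is None:
        return "NA"
    if distance <= CLOSE_MAX:      # [0, 648.2]
        return "close"
    if distance <= FAR_MIN:        # (648.2, 1542]
        return "intermediate"
    return "far"                   # (1542, 31310]

def distance_item(variable, distance):
    """Build an item string such as 'dis_fullresidential_busstop=far'."""
    return f"{variable}={discretize_distance(distance)}"

sample_item = distance_item("dis_fullresidential_busstop", 2000.0)
print(sample_item)  # dis_fullresidential_busstop=far
```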

Table 3-26. Apriori function in arules package

Usage
  apriori(data, parameter = NULL, appearance = NULL, control = NULL)

Arguments
  data         object of class transactions or any data structure which can be coerced into transactions (e.g., a binary matrix or data frame).
  parameter    object of class APparameter or named list. The default behavior is to mine rules with support 0.1, confidence 0.8, and maxlen 10.
  appearance   object of class APappearance or named list. With this argument item appearance can be restricted. By default all items can appear unrestricted.
  control      object of class APcontrol or named list. Controls the performance of the mining algorithm (item sorting, etc.)

Figure 3-1. Overview of Miami-Dade County, FL

Figure 3-2. Residential land use

Figure 3-3. Retail/office land use

Figure 3-4. Institutional land use

Figure 3-5. Recreation land use

Figure 3-6. Industrial land use

Figure 3-7. Agricultural land use

Figure 3-8. Built environment and crash map for a block group

Figure 3-9. System design

Figure 3-10. Distribution of crashes by block group. A) total crash, B) PDO crash, C) injury crash, D) fatal crash, E) ped crash, F) bike crash.

Figure 3-11. The distribution of the number of crashes over time

Figure 3-12. Distribution of crashes of time group 1 by block group. A) total crash, B) PDO crash, C) injury crash, D) ped crash, E) bike crash

Figure 3-13. Distribution of crashes of time group 2 by block group. A) total crash, B) PDO crash, C) injury crash, D) ped crash, E) bike crash

Figure 3-14. Distribution of crashes of time group 3 by block group. A) total crash, B) PDO crash, C) injury crash, D) ped crash, E) bike crash

Figure 3-15. Distribution of crashes of time group 4 by block group. A) total crash, B) PDO crash, C) injury crash, D) ped crash, E) bike crash

Figure 3-16. Three spatial relationships (Bogorny 2006)

Figure 3-17. The 9-intersection model represented as a matrix

Figure 3-18. Spatial Topological Relationship Extraction and Distance Relationship Calculation Algorithm

Figure 3-19. Spatial distance distribution.

CHAPTER 4
RESULTS AND DISCUSSION

The structure of this chapter is as follows: an overview of the rules mined by association rule data mining for different crash types is introduced first; then the rules are discussed separately based on whether the spatial and temporal attributes of crashes were considered in the data mining process. For each category, the rules for 5 crash types are discussed in detail; last, the influence of mixed land use on crashes is discussed and summarized.

Result Overview

Table 4-1 presents the summary of rules mined for each crash type. The result shows that there are no rules for fatal crashes, which means the occurrence of fatal crashes has no relationship pattern with the built environment based on the data used in this research. There are 176 block groups marked as fatal_crash_num=big after the discretization of fatal crash numbers. The same result is presented for pedestrian crashes in time group 1 and bike crashes in time group 2. Additionally, the number of rules for bike crashes in time groups 1 and 4 is marked with NA, which means the mining for these two cases was not conducted. This is because no block groups are classified as bike_crash_num=big after the discretization of the number of bike crashes in these two time groups (see the discretization section in Chapter 3).

A combined analysis of Table 4-1, Table 3-2 (summary of block groups that have no crashes for each crash type) and Table 3-6 (summary of block groups that have no crashes for each crash type in different time groups) indicates that there is no consistent relationship between the number of block groups that have crashes and the number of rules. For example, there are only 45 rules for bike crashes in spatial mode based on the data for 1260 block groups, while there are 650 rules for bike crashes in time group 3 based on the data for 958 block groups. Usually more rules are expected to be mined for a crash type in total than when it is classified into 4 time groups, since there will be more block groups that have crashes, which means more records will participate in the rule mining process.

Result Analysis Method

As shown in Table 4-1, the number of rules for a crash type is usually very large. Practically it is impossible to discuss every rule, and doing so would not be meaningful since most of the rules have a relatively low lift. Thus the analysis focuses on rule summaries and on high-quality rules (i.e., rules with a big lift). For each crash type, the rules are discussed from four aspects: the scatter plot for all rules, the most frequent items for the top 100 rules, the list of top 10 rules, and the matching block groups in a GIS map for the top 5 rules.

Scatter plot for all rules. The plot depicts the support-confidence-lift distribution of all rules for a certain crash type. The x axis is the support of the rules, the y axis is the lift, and the color (dark or light) represents the confidence of the rule. See the example in Figure 4-1.

Most frequent items for the top 100 rules. Statistics of the most frequent items that appear in the antecedents of the top 100 rules identify the most important contributing built environment variables related to a certain crash type.

List of top 10 rules. A list of the top 10 rules for a certain crash type shows how crashes are influenced by the built environment. The top 10 rules for a crash type usually share some of the same predicates in the antecedent.
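The "most frequent items for the top 100 rules" summary described above can be sketched in a few lines. The sketch is plain Python, assumes that the top rules are the ones with the highest lift, and uses three hypothetical rules with made-up lift values; the function names are illustrative.

```python
from collections import Counter

def top_rules(rules, n):
    """Return the n rules with the highest lift (assumed ranking criterion)."""
    return sorted(rules, key=lambda rule: rule["lift"], reverse=True)[:n]

def antecedent_item_counts(rules, n=100):
    """Count how many of the top-n rules contain each antecedent item."""
    counts = Counter()
    for rule in top_rules(rules, n):
        counts.update(set(rule["lhs"]))
    return counts.most_common()

# Hypothetical rules; the lift values are invented for illustration.
example_rules = [
    {"lhs": ("dis_fullresidential_busstop=far",
             "topo_retailoffice_industrial=TOUCHES"), "lift": 2.3},
    {"lhs": ("dis_fullresidential_busstop=far",), "lift": 2.1},
    {"lhs": ("mix_index=big",), "lift": 1.2},
]
item_counts = antecedent_item_counts(example_rules, n=2)
print(item_counts)
```

Counting each item once per rule, rather than once per occurrence, keeps the result comparable to statements such as "44 out of 100 rules".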

Matching block groups for top 5 rules. For each rule, its matching block groups can be identified through its lhs. To help understand how a crash type is influenced by the built environment, the block groups for the top 5 rules of that crash type are identified in the GIS environment and integrated as one layer. The top 5 rules were chosen, rather than more, because 1) the matching block groups for each rule overlap heavily, and 2) the top 5 rules are the strongest rules and embody the pattern of the relationship between the built environment and that crash type. Figure 4-2 to Figure 4-6 are maps that show these results for the 5 crash types.

Spatial Analysis of the Influence of Built Environment on Crashes

In this section, the temporal influence of the built environment is not evaluated, i.e., all crashes of the same type are treated together irrespective of time of occurrence. The influence of the built environment on total crashes, PDO crashes, injury crashes, pedestrian crashes and bike crashes is discussed separately below.

Total Crashes

Figure 4-1 A is the scatter plot for all 19,677 rules for total crashes. Generally, the lower the support of a rule, the higher its lift, and vice versa. Most of the rules have support lower than 0.1 (10%). The distribution of confidence is consistent with the lift, which means a rule that has small confidence usually has small lift. The strongest rules are those in the top left corner.

To further understand the strongest rules, Table 4-2 gives statistics of the most frequent items that appear on the left side of the rules as antecedents, based on the top 100 rules. Of the 100 rules, 44 have dis_retailoffice_busroute=intermediate as one of the antecedents, 36 contain dis_fullresidential_busroute=intermediate, and 35 have the item topo_fullresidential_retailoffice=TOUCHES. The statistics show that the most frequent items relate to distance to transit (number of bus stops per square mile, distance to bus route), except topo_fullresidential_retailoffice=TOUCHES.

Table 4-7 lists the top 10 rules between the built environment variables and the big number of total crashes (total_crash_num=big). If a community has the attributes that the distance from residential land use to the bus stop is far and the retail/office land use touches the industrial land use, then this community has a high probability of crashes, with lift equal to 2.3187. The most frequent items in the top 10 rules are topo_fullresidential_retailoffice=TOUCHES (5 of 10 rules), followed by dis_fullresidential_busstop=far (4 of 10 rules), which means that residential land use is one of the key factors in the occurrence of crashes. This makes sense since most travel activities are home based. A far distance between residences and bus stops makes people drive more, which increases the probability of crashes.

Figure 4-2 shows the matching block groups for the top 5 rules for total crashes. The result shows these block groups cross or are very close to major highways, excluding interstates. These major highways include S Dixie HWY, US 27, SR 826, SR 836, and SR 924. The result also shows that several large block groups in the northwest corner of Miami-Dade County (Doral, Medley, and the Miami International Airport area) and one in the southeast area have a high crash risk. Attributes such as population density and job density vary across these block groups, while they share the attributes dis_retailoffice_geostreet=far, topo_retailoffice_industrial=TOUCHES and total_crash_num=big.
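The block-group matching used for the maps in this chapter can be sketched as a subset test: a block group matches a rule when its items include every item on the rule's lhs, and the layer for the top rules is the union of the matches. The sketch is plain Python; the block group identifiers, the item sets and the function names are hypothetical, and the GIS and database steps are not shown.

```python
def matching_block_groups(rule_lhs, block_group_items):
    """Block group ids whose items contain every item of the rule's lhs."""
    lhs = set(rule_lhs)
    return sorted(bg for bg, items in block_group_items.items() if lhs <= items)

def merged_matches(rule_lhs_list, block_group_items):
    """Union of matching block groups over several rules (one map layer)."""
    merged = set()
    for rule_lhs in rule_lhs_list:
        merged.update(matching_block_groups(rule_lhs, block_group_items))
    return sorted(merged)

# Hypothetical block groups with their discretized items.
block_groups = {
    "bg_001": {"dis_retailoffice_geostreet=far",
               "topo_retailoffice_industrial=TOUCHES", "total_crash_num=big"},
    "bg_002": {"dis_retailoffice_geostreet=far", "total_crash_num=small"},
    "bg_003": {"dis_fullresidential_busstop=far",
               "topo_retailoffice_industrial=TOUCHES", "total_crash_num=big"},
}
rule_a = ("dis_retailoffice_geostreet=far", "topo_retailoffice_industrial=TOUCHES")
rule_b = ("dis_fullresidential_busstop=far", "topo_retailoffice_industrial=TOUCHES")
layer = merged_matches([rule_a, rule_b], block_groups)
print(layer)  # ['bg_001', 'bg_003']
```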

PDO Crashes

Figure 4-1 B shows the distribution of the 20,388 rules for PDO crashes. The distribution is very similar to that of total crashes. The very likely reason is that 82.94% of the total crashes are PDO crashes.

Table 4-3 is the list of frequent items for PDO crashes in the top 100 rules. The list shows that busstop_num_sqmile=big is the most frequent item among all items contained in the top 100 rules, appearing in 64 of them. [...] frequent, and both of them are related to agricultural land use. The topo_agricultural_geostreet= [...] topo_fullresidential_agricultural= [...] residential and agricultural land uses do not coexist in the same block group.

Table 4-8 is the list of the top 10 rules with the consequent pdo_crash_num=big. The list shares 3 residential-related items with the top 10 rules for total crashes: dis_fullresidential_geostreet=intermediate, dis_fullresidential_busroute=intermediate and topo_fullresidential_retailoffice=TOUCHES. While only 3 rules contain the item busstop_num_sqmile=big among the total crash rules, 9 rules contain it for PDO crashes, which means the item is a key factor for the top 10 rules. Additionally, rule 5 is the parent rule of rules 1, 2, 4 and 10. Though it has higher support, its lift is smaller than that of rules 1, 2 and 4.

Figure 4-3 shows the distribution of matching block groups for the top 5 rules for PDO crashes. For total crashes the matching block groups are distributed over different areas of the whole county and include some large block groups; for PDO crashes the matching block groups of the top 5 rules are mainly in the Miami area, with some in Miami Gardens.
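The parent and child relation between rules mentioned above can be expressed as a subset test on the antecedents. In this minimal plain-Python sketch a rule is taken to be a parent of another when both share the same consequent and the parent's lhs is a proper subset of the child's lhs; that definition, the function names and the two example rules are assumptions for illustration. Because every record that satisfies the child's lhs also satisfies the parent's lhs, the parent's support can never be lower than the child's, while its lift may be either higher or lower.

```python
def is_parent(parent, child):
    """True if both rules share the rhs and parent's lhs is a proper subset."""
    return (parent["rhs"] == child["rhs"]
            and set(parent["lhs"]) < set(child["lhs"]))

def children_of(parent, rules):
    """All rules in the list that are children of the given parent rule."""
    return [rule for rule in rules if is_parent(parent, rule)]

# Hypothetical rules for illustration only.
general_rule = {"lhs": ("busstop_num_sqmile=big",), "rhs": "pdo_crash_num=big"}
specific_rule = {"lhs": ("busstop_num_sqmile=big",
                         "topo_fullresidential_retailoffice=TOUCHES"),
                 "rhs": "pdo_crash_num=big"}
found_children = children_of(general_rule, [general_rule, specific_rule])
print(len(found_children))  # 1
```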

Injury Crashes

The scatter plot for all 7,503 rules is shown in Figure 4-1 C. The figure shows that the lift of the strongest rules (lift > 2.4) is a little higher than that of the rules for total and PDO crashes (lift < 2.4). The most frequent items for the top 100 rules in Table 4-4 show that the 3 most frequent items are distance-related items and all have the value far: dis_fullresidential_busstop=far (65), dis_fullresidential_retailoffice=far (48) and dis_retailoffice_geostreet=far (36). Unlike the rules for total and PDO crashes, which contain busstop_num_sqmile=big, here busstop_num_sqmile has the value intermediate.

Table 4-9 lists the top 10 rules for injury crashes. The antecedents of all 10 rules are composed of spatial distance and topological items and contain no D variables. The top rule is dis_retailoffice_geostreet=far, dis_industrial_geostreet=far, topo_retailoffice_industrial=TOUCHES => injury_crash_num=big.

The pattern of matching block groups for injury crashes (Figure 4-4) is similar to that for total crashes. The main difference is that there are fewer block groups in the Glenvar Heights and Miami areas (see the red rectangle in Figure 4-4).

Pedestrian Crashes

There are many more rules for pedestrian crashes than for all other crash types (Figure 4-1 D). This is unexpected since the number of pedestrian crashes (6,015) is much smaller than the number of total crashes (187,519) or PDO crashes (155,522). On the other hand, the strongest rules for pedestrian crashes have much higher lift than those for other crash types: from 2.2 for total and PDO crashes and 2.4 for injury crashes to 3.0 for pedestrian crashes. This means the pattern between the built environment and pedestrian crashes is stronger than the others.

The most frequent item in the antecedents of rules for pedestrian crashes is busstop_num_sqmile=big, which appears in 100 out of 100 rules (Table 4-5). This means a big number of bus stops is the most important factor that results in a big number of pedestrian crashes.

Table 4-10 shows the top 10 rules for pedestrian crashes. A very obvious difference between these rules and the rules for other crash types is that all distance variables have the value intermediate (from 648.2 to 1,542 feet), whereas they usually have the value far for the other crash types. The top 10 rules for pedestrian crashes have relatively higher quality than those for other crash types. The lift varies from 2.8883 to 3.0221.

The map of matching block groups for pedestrian crashes (Figure 4-5) is obviously different from those for total and injury crashes but similar to that for PDO crashes. Compared with the matching map for PDO crashes, the matching block groups for pedestrian crashes are concentrated in the downtown Miami area. The block groups corresponding to pedestrian crashes are small and located in areas that potentially have more people than other places. This is expected since the densities of both walking and driving are much higher here than in other areas. Although the density of people in Miami Beach is supposed to be higher because of visitors, this area is relatively safe for pedestrians. It is likely that fewer cars are driving on the Miami Beach streets. The number of corresponding block groups is also smaller than for other crash types.

Bike Crashes

Unlike all other crash types, the number of rules for bike crashes is very small, only 45 (Figure 4-1 E). Possible reasons include: first, there are far fewer bike crashes (2,255 in total) than crashes of any other type; second, 327 block groups have no bike crashes, and only 351 block groups have the attribute bike_crash_num=big. The scatter plot shows that the quality of the rules for bike crashes is also lower than for the others. The lift is about 2.2 for the strongest rule for bike crashes.

Table 4-6 shows the list of frequent antecedent items for all 45 rules. The D variable item pop_density_sqmile=small (12 of 45) is one of the five most frequent items, and it is exclusive to bike crashes. The item topo_fullresidential_retailoffice=TOUCHES (32 of 45) is the most frequent.

Table 4-11 tabulates the top 10 rules for bike crashes. The strongest rule has 3 items in the antecedent: pop_density_sqmile=small, busstop_num_sqmile=intermediate and topo_fullresidential_retailoffice=TOUCHES. Here the population density is the first density variable to appear in a top 10 rule list. Rule 2 shows that if the attributes of a block group include pop_density_sqmile=small, dis_fullresidential_busstop=far, dis_retailoffice_geostreet=far and topo_fullresidential_retailoffice=TOUCHES, it has a risk of bike crashes with lift 2.2007.

Figure 4-6 shows the map of matching block groups for the top 5 bike crash rules. The pattern is scattered rather than concentrated in certain areas. First, the block groups are mainly distributed along the major roads. Second, there are some block groups in Key Biscayne and Miami Beach, which are usually not included for other crash types.

Spatio-Temporal Analysis of the Influence of Built Environment on Crashes

The spatial influence of the built environment on crashes was discussed in the last section. This section focuses on the spatio-temporal influence of the built environment on crashes by type, and on how the influence may change over time for each crash type.

PAGE 101

101 Total Crashes Figure 4 7 shows the distribution of the rules for total crashes by different time groups. The rules for the 4 time groups have similar distribution, while the 4 th group has the smallest lift and confidence. This means the relationship pattern between built environment and total crashes from 7pm to 11pm is not as stronger as during other times. Table 4 12 through Table 4 15 show the statistics for the most frequ ent items appearing in the left side of the top 100 rules for the total crashes in each group. The two items: topo_fullresidential_busroute=DISJOINT and topo_fullresidential_retailoffice=TOUCHES are frequent items in all of the 4 time groups. The topo_full residential_busroute=DISJOINT and busstop_num_sqmile=big is shared by group 2, 3 and 4. The time group 1 has three special items: dis_industrial_busroute=intermediate topo_industrial_busroute=DISJOINT and dis_retailoffice_busroute=intermediate T able 4 28 through Table 4 31 present a list of top 10 rules for total crash in each time group. An obvious finding that emerges is that among the top 10 rules for each time gro up, the lift of the strongest rules in time group 4 is smaller than the lift of the weakest rules in group 2 and 3, and close to the lift of weakest rule in group 1 Additionally, no D variable related items are contained in the antecedents of top 10 rules in group 1 and all items either spatial distance variables or topological relationship variables. The only D variable appeared in the top 10 rule for group 2, 3 and 4 is busstop_num_sqmile but with different values. Figure 4 12 t hrough Figure 4 15 show the matching block groups for the top 5 rules for total crashes in each time group. The maps show that the pattern is stable in

PAGE 102

some areas, such as Princeton and the southeast area, Miami Lakes, and North Miami Beach, across the time groups (see the red rectangles in Figure 4-12 through Figure 4-15). The differences between the matching block groups in the four time groups are also obvious: first, between time groups 1 and 2 and time groups 3 and 4, more small block groups appear in the Miami and Hialeah area; second, the big blocks in the northwest

PDO Crashes

Figure 4-8 shows the support, lift and confidence distribution of rules for PDO crashes in the different time groups. As with the total crash rules, the maximum lift of rules in time group 4 is smaller than in the other groups.

Table 4-16 through Table 4-19 summarize the frequent items for the top 100 rules for PDO crashes in each time group. No frequent item is shared by all four time groups. Time groups 1 and 2 share no common frequent item, while two frequent items are shared by time groups 2, 3 and 4: busstop_num_sqmile=big and dis_fullresidential_busroute=intermediate. The item busstop_num_sqmile=big is also the most frequent item for time group 2 (76 of 100) and time group 3 (56 of 100). This indicates the important influence of bus stops on PDO crashes from 7am to 6pm.

Table 4-32 through Table 4-35 show the top 10 PDO rules in the four time groups. As with total crashes, first, time group 4 has the weakest pattern between the built environment and PDO crashes: the lift of its strongest rule is smaller than that of the weakest rule in the other three time groups. Second, there are no D-variable items in the top 10 rules in time group 1. A difference from the top 10 rules for total crashes is that the strongest rules for PDO crashes in time groups 2, 3 and 4 contain
the item busstop_num_sqmile=big. Additionally, the item mix_index=big contributes to the top 10 rules in time group 2 (rules 7, 8 and 9).

Figure 4-16 through Figure 4-19 show the change in matching block groups for the top 5 rules for PDO crashes across time groups. First, the pattern from time group 1 to group 2 is stable, and very similar to the matching block groups for total crashes in time group 1. Second, the pattern from time group 3 to group 4 is fairly stable and similar to the matching block groups for PDO crashes without considering crash time. The difference between groups 3 and 4 is that there are more block groups in the downtown Miami area. Third, the pattern change from groups 1 and 2 to groups 3 and 4 is large. All of the matching block groups are small for groups 3 and 4, and they are mainly in the Miami, North Miami and Hialeah area. The large block groups that appeared in groups 1 and 2 in Miami Gardens, North Miami Beach, Doral, Miami International Airport and the southeast corner no longer appear in groups 3 and 4.

Injury Crashes

Figure 4-9 shows the support, lift and confidence distribution of rules for injury crashes in the different time groups.

Table 4-20 through Table 4-23 summarize the frequent items for the association rules of injury crashes in the different time groups. It is very clear that the four groups share three frequent items, and groups 1 and 3 have the same frequent items with different frequencies. The only D-variable item, busstop_num_sqmile=intermediate, is a frequent item shared by groups 2 and 4.

Table 4-36 through Table 4-39 list the top 10 rules for injury crashes in the different time groups. The results show that busstop_num_sqmile=intermediate is the only frequent D-variable item in the top rules for injury crashes. The item
dis_fullresidential_busstop=far is the most frequent item for injury crashes in groups 2, 3 and 4, and is also a frequent item for time group 1.

Figure 4-20 through Figure 4-23 show the matching block groups for the top 5 rules for injury crashes in the different time groups. Generally, the matching block group patterns stay consistent across time groups in most of the area, mainly within the red irregular area in Figure 4-20 through Figure 4-23. The differences between each group and the red irregular area are: time group 1 has the big block group at the northwest corner; time group 2 has some block groups in Key Biscayne; time group 3 has some in Miami Beach; while group 4 with less block groups when pike and I-75.

Pedestrian Crashes

Figure 4-10 shows the distribution of the rules for pedestrian crashes in time groups 2, 3 and 4. It is clear that time group 3 has more rules than the other two groups, and the lifts of its strongest rules are larger than those in groups 2 and 4. This is very likely because there is more pedestrian activity during time group 3.

Table 4-24, Table 4-25 and Table 4-26 show the most frequent items for the top 100 rules for time groups 2, 3 and 4, respectively. The most important finding is that the item busstop_num_sqmile=big is contained in almost all of the top 100 rules for the three time groups: 91 of 100 rules for group 2, and 100 of 100 for groups 3 and 4. Apart from busstop_num_sqmile=big, which is shared by the three time groups, and topo_fullresidential_institutional=TOUCHES, which is shared by groups 2 and 3, all other frequent items are exclusive to their own time group. Additionally, while the D-variable item busstop_num_sqmile=big is the most frequent item
for all groups, another D-variable item, street_length_feet_sqmile=big, is frequent in group 2.

Table 4-40 through Table 4-42 show the top 10 rules for pedestrian crashes in time groups 2 to 4. The result shows that all of the distance-variable items in the three time groups have the value intermediate, which means that some of the intermediate distances play a great role in pedestrian crashes in time groups 2, 3 and 4.

Figure 4-24 through Figure 4-26 show the change in matching block groups for the top 5 rules for pedestrian crashes in time groups 2 to 4. Similar to the matching block group distributions of pedestrian crashes without considering the occurrence time, all of the matching block groups are small and concentrated in areas of high population density. The change in the pattern is obvious: while the matching block groups are mainly in downtown Miami, Hialeah, North Miami and Miami Gardens in time group 2, they are present in downtown Miami in time group 3, with only a few in North Miami. Group 4 has more block groups in the north (Miami Gardens, North Miami) but fewer in downtown Miami.

Bike Crashes

Figure 4-11 shows the distribution of the rules for bike crashes in time group 3. Compared with the rules for all bike crashes, time group 3 shows more rules, but the lift of the strongest rules is smaller. More rules means the pattern of bike crashes in time group 3 is stronger than for all bike crashes.

Table 4-27 shows the frequent items for the top 100 rules for bike crashes in time group 3. Although there are no rules for bike crashes in any other time group, the frequent items for bike crashes in time group 3 and for all bike crashes without considering time are not the same. Three of the five frequent items for bike crashes in time
group 3 are topological variables. The item pop_density_sqmile=small is shared by the rules in time group 3 and the rules without considering crash time, while another D-variable item, street_length_feet_sqmile=intermediate, is exclusive to the rules for bike crashes in time group 3.

Table 4-43 shows the top 10 rules for bike crashes in time group 3. Rule 1 indicates that if a block group has the attributes dis_fullresidential_busstop=far, topo_fullresidential_retailoffice=TOUCHES and topo_retailoffice_institutional=TOUCHES, then it has a high risk of bike crashes from 12pm to 6pm. Seven of the 10 rules contain the item pop_density_sqmile=small, which means that a low-density community has a large number of bike crashes when this attribute is combined with some other built environment attributes in that community.

Figure 4-27 shows the matching block groups for the top 5 rules for bike crashes in time group 3. The pattern is more scattered than centralized. Compared with the matching block groups for the top 5 rules for bike crashes without considering time (Figure 4-6), it has a smaller number of block groups. Additionally, the pattern of bike crashes in time group 3 in the downtown Miami area is not as obvious as in Figure 4-6.

Influence of Mixed Land Use on Crashes

Mixed land use is a very important aspect of planning. Planners advocate that smart growth calls for mixed land use rather than single land use. Is highly mixed land use good from the traffic safety perspective? Table 4-44 and Table 4-45 show the effect of mix_index on all crash types. The labels big, intermediate and small in the tables mean mix_index=big, mix_index=intermediate and mix_index=small, respectively. The two tables show the total number, the maximum value of the rule lift and the mean value of the
rule lift for the rules that contain each of the three antecedents, for each crash type. Here are some findings:

First, no rule contains the antecedent mix_index=small for any crash type (regardless of crash occurrence time). This suggests that low mixed land use has no effect on any crash type.

Second, there are generally fewer rules containing the antecedent mix_index=big than mix_index=intermediate. This is expected, since the latter item occurs about twice as often as the former as a result of the discretization of the land use mix index. The exceptions are the rules for bike crashes and for pedestrian crashes in time group 2. The number of rules containing mix_index=big for bike crashes in time group 3, which is 56, is larger than the 3 rules containing mix_index=intermediate. This indicates that highly mixed land use plays a more important role than intermediate mixed land use in bike crashes in time group 3 (12pm to 6pm). Although only 3 rules contain the item mix_index=big for bike crashes without considering crash time, there are zero rules with the item mix_index=intermediate. For pedestrian crashes in time group 2, the numbers are 75 and 51.

Third, although fewer rules contain the item mix_index=big than mix_index=intermediate, the maximum and mean rule lifts for rules containing mix_index=big are generally larger than for rules containing mix_index=intermediate for total and PDO crashes (except in time group 4). This means that highly mixed land use may more easily result in total and PDO crashes when combined with other built environment variables. On the contrary, the
quality of rules containing mix_index=intermediate is higher than that of rules containing mix_index=big for injury crashes and pedestrian crashes (except in time group 2).

Fourth, the influence of mixed land use on crashes varies over time. There are more rules for time groups 2 and 3 than for time groups 1 and 4, regardless of the crash type. The corresponding rule quality is also higher. This is expected, since travel activity in groups 2 and 3 is greater than in groups 1 and 4.
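
The per-antecedent summary described for Table 4-44 and Table 4-45 (number of rules, maximum lift and mean lift for rules containing each mix_index value) could be computed along the following lines. This is a minimal sketch, not the procedure used in the study: the function name `summarize_by_item` is hypothetical, and the four sample rules are rules 2, 7, 8 and 9 of Table 4-33 only, so the numbers it produces are not those of Table 4-44 or Table 4-45.

```python
from statistics import mean

def summarize_by_item(rules, item):
    """Count the rules whose antecedent contains `item`; report max and mean lift.

    `rules` is a list of (antecedent_items, lift) pairs (an assumed layout).
    """
    lifts = [lift for antecedent, lift in rules if item in antecedent]
    if not lifts:
        return {"count": 0, "max_lift": None, "mean_lift": None}
    return {"count": len(lifts), "max_lift": max(lifts), "mean_lift": mean(lifts)}

# Sample input: rules 2, 7, 8 and 9 of Table 4-33 (PDO crashes, time group 2).
sample_rules = [
    (frozenset({"dis_industrial_geostreet=far",
                "topo_retailoffice_industrial=TOUCHES"}), 2.1977),
    (frozenset({"mix_index=big",
                "topo_retailoffice_busroute=DISJOINT",
                "topo_retailoffice_industrial=TOUCHES"}), 2.1889),
    (frozenset({"mix_index=big",
                "topo_fullresidential_retailoffice=TOUCHES",
                "topo_retailoffice_busroute=DISJOINT",
                "topo_retailoffice_industrial=TOUCHES"}), 2.1858),
    (frozenset({"mix_index=big",
                "topo_industrial_busroute=DISJOINT",
                "topo_retailoffice_busroute=DISJOINT",
                "topo_retailoffice_industrial=TOUCHES"}), 2.1782),
]

summary = {
    value: summarize_by_item(sample_rules, f"mix_index={value}")
    for value in ("big", "intermediate", "small")
}

for value, stats in summary.items():
    print(value, stats)
```

Returning `None` for the lift statistics when no rule contains the item keeps an empty category (such as mix_index=small in the findings above) distinguishable from a category whose lifts happen to be low.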


Table 4-1. Number of rules mined for each crash type

Crash type  No. of rules
Spatial
total_crash_num=big  19,677
pdo_crash_num=big  20,388
injury_crash_num=big  7,503
fatal_crash_num=big  0
ped_crash_num=big  3,336
bike_crash_num=big  45
Spatio-temporal
tgp1_total_crash_num=big  12,960
tgp1_pdo_crash_num=big  13,763
tgp1_injury_crash_num=big  2,853
tgp1_ped_crash_num=big  0
tgp1_bike_crash_num=big  NA
tgp2_total_crash_num=big  14,462
tgp2_pdo_crash_num=big  19,458
tgp2_injury_crash_num=big  6,477
tgp2_ped_crash_num=big  2,506
tgp2_bike_crash_num=big  0
tgp3_total_crash_num=big  19,730
tgp3_pdo_crash_num=big  17,933
tgp3_injury_crash_num=big  9,278
tgp3_ped_crash_num=big  22,849
tgp3_bike_crash_num=big  650
tgp4_total_crash_num=big  13,542
tgp4_pdo_crash_num=big  16,627
tgp4_injury_crash_num=big  2,173
tgp4_ped_crash_num=big  3,755
tgp4_bike_crash_num=big  NA

Table 4-2. Frequent items of top 100 rules for total crashes

Frequent Items  No.
dis_retailoffice_busroute=intermediate  44
dis_fullresidential_busroute=intermediate  36
topo_fullresidential_retailoffice=TOUCHES  35
busstop_num_sqmile=big  33
topo_industrial_busroute=DISJOINT  27


Table 4-3. Frequent items of top 100 rules for PDO crashes

Frequent Items  No.
busstop_num_sqmile=big  64
topo_industrial_busroute=DISJOINT  47
dis_fullresidential_busroute=intermediate  34
topo_agricultural_geostreet=  26
topo_fullresidential_agricultural=  25

Table 4-4. Frequent items of top 100 rules for injury crashes

Frequent Items  No.
dis_fullresidential_busstop=far  65
dis_fullresidential_retailoffice=far  48
dis_retailoffice_geostreet=far  36
topo_fullresidential_retailoffice=TOUCHES  35
busstop_num_sqmile=intermediate  23

Table 4-5. Frequent items of top 100 rules for pedestrian crashes

Frequent Items  No.
busstop_num_sqmile=big  100
dis_fullresidential_industrial=intermediate  56
topo_industrial_geostreet=DISJOINT  40
dis_fullresidential_retailoffice=intermediate  30
topo_fullresidential_busroute=DISJOINT  26

Table 4-6. Frequent items of top 45 rules for bike crashes

Frequent Items  No.
topo_fullresidential_retailoffice=TOUCHES  32
dis_fullresidential_busstop=far  17
dis_retailoffice_geostreet=far  13
pop_density_sqmile=small  12
dis_institutional_geostreet=far  10
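
The frequent-item tables tally how often each item appears in the antecedents of the top-ranked rules. A minimal sketch of such a tally follows. It assumes rules are held as (antecedent, lift) pairs and that "top" means highest lift; the helper name `frequent_items` is hypothetical. The three sample antecedents are rules 1 to 3 of Table 4-11, so the resulting counts illustrate the mechanics only and do not reproduce any table.

```python
from collections import Counter

def frequent_items(rules, top_n=100, k=5):
    """Return the k most frequent antecedent items among the top_n rules by lift."""
    ranked = sorted(rules, key=lambda rule: rule[1], reverse=True)[:top_n]
    counts = Counter(item for antecedent, _lift in ranked for item in antecedent)
    return counts.most_common(k)

# Sample input: rules 1 to 3 of Table 4-11 (bike crashes), as (antecedent, lift).
sample_rules = [
    (("pop_density_sqmile=small",
      "busstop_num_sqmile=intermediate",
      "topo_fullresidential_retailoffice=TOUCHES"), 2.2007),
    (("pop_density_sqmile=small",
      "dis_fullresidential_busstop=far",
      "dis_retailoffice_geostreet=far",
      "topo_fullresidential_retailoffice=TOUCHES"), 2.0370),
    (("pop_density_sqmile=small",
      "dis_fullresidential_busstop=far",
      "topo_fullresidential_retailoffice=TOUCHES"), 1.9808),
]

top_items = frequent_items(sample_rules, top_n=3, k=3)
for item, count in top_items:
    print(f"{item}  {count}")
```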


Table 4-7. Top 10 rules for total crash number

No.  Antecedent  Supp.  Conf.  Lift
1  {dis_fullresidential_busstop=far, topo_retailoffice_industrial=TOUCHES}  0.0510  0.7714  2.3187
2  {dis_fullresidential_busstop=far, topo_fullresidential_retailoffice=TOUCHES, topo_industrial_busroute=DISJOINT}  0.0523  0.7411  2.2274
3  {busstop_num_sqmile=big, dis_fullresidential_retailoffice=intermediate, dis_fullresidential_busroute=intermediate, topo_fullresidential_busroute=DISJOINT}  0.0586  0.7381  2.2185
4  {busstop_num_sqmile=intermediate, dis_fullresidential_busstop=far, dis_fullresidential_geostreet=far, topo_fullresidential_retailoffice=TOUCHES}  0.0529  0.7368  2.2147
5  {dis_retailoffice_geostreet=far, topo_retailoffice_industrial=TOUCHES}  0.0523  0.7345  2.2077
6  {dis_industrial_geostreet=far, topo_retailoffice_industrial=TOUCHES}  0.0510  0.7232  2.1738
7  {busstop_num_sqmile=intermediate, dis_fullresidential_busstop=far, dis_fullresidential_geostreet=far, dis_retailoffice_geostreet=far}  0.0510  0.7232  2.1738
8  {busstop_num_sqmile=big, dis_fullresidential_retailoffice=intermediate, dis_fullresidential_busroute=intermediate, topo_fullresidential_retailoffice=TOUCHES}  0.0636  0.7214  2.1684
9  {busstop_num_sqmile=big, dis_fullresidential_busroute=intermediate, topo_fullresidential_busroute=DISJOINT, topo_fullresidential_retailoffice=TOUCHES}  0.0636  0.7214  2.1684
10  {busstop_num_sqmile=intermediate, dis_fullresidential_geostreet=far, dis_retailoffice_geostreet=far, topo_fullresidential_retailoffice=TOUCHES}  0.0504  0.7207  2.1663
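
The Supp., Conf. and Lift columns can be read against the standard association-rule definitions. The sketch below assumes those standard definitions (support of the whole rule, confidence as the share of antecedent matches that also have the consequent, lift as confidence divided by the consequent's overall share); the source tables do not restate them here. The function name `rule_metrics` and the six toy block groups are hypothetical and are not data from the study.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent => consequent.

    `transactions` is a list of item sets, one per block group (assumed layout).
    """
    n = len(transactions)
    antecedent = set(antecedent)
    n_a = sum(1 for t in transactions if antecedent <= t)
    n_c = sum(1 for t in transactions if consequent in t)
    n_ac = sum(1 for t in transactions if antecedent <= t and consequent in t)
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift

A = "dis_fullresidential_busstop=far"
B = "topo_retailoffice_industrial=TOUCHES"
C = "total_crash_num=big"

# Six made-up block groups, each described by the items that hold for it.
toy_block_groups = [
    {A, B, C},
    {A, B, C},
    {A, B},
    {A, C},
    {B},
    set(),
]

support, confidence, lift = rule_metrics(toy_block_groups, {A, B}, C)
print(f"support={support:.4f} confidence={confidence:.4f} lift={lift:.4f}")
```

A lift above 1 indicates that the consequent is more common among block groups matching the antecedent than among block groups overall.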


Table 4-8. Top 10 rules for PDO crash

No.  Antecedent  Supp.  Conf.  Lift
1  {busstop_num_sqmile=big, dis_fullresidential_geostreet=intermediate, topo_agricultural_geostreet=, topo_industrial_busroute=DISJOINT}  0.0542  0.7478  2.2563
2  {busstop_num_sqmile=big, dis_fullresidential_geostreet=intermediate, topo_fullresidential_agricultural=, topo_industrial_busroute=DISJOINT}  0.0542  0.7478  2.2563
3  {busstop_num_sqmile=big, dis_fullresidential_retailoffice=intermediate, dis_fullresidential_busroute=intermediate, topo_fullresidential_busroute=DISJOINT}  0.0592  0.7460  2.2509
4  {busstop_num_sqmile=big, dis_fullresidential_geostreet=intermediate, topo_fullresidential_retailoffice=TOUCHES, topo_industrial_busroute=DISJOINT}  0.0529  0.7304  2.2038
5  {busstop_num_sqmile=big, dis_fullresidential_geostreet=intermediate, topo_industrial_busroute=DISJOINT}  0.0542  0.7288  2.1989
6  {busstop_num_sqmile=big, dis_fullresidential_retailoffice=intermediate, dis_fullresidential_busroute=intermediate, topo_fullresidential_retailoffice=TOUCHES}  0.0643  0.7286  2.1982
7  {busstop_num_sqmile=big, dis_fullresidential_busroute=intermediate, topo_fullresidential_busroute=DISJOINT, topo_fullresidential_retailoffice=TOUCHES}  0.0643  0.7286  2.1982
8  {busstop_num_sqmile=big, dis_retailoffice_geostreet=intermediate, dis_fullresidential_busroute=intermediate, topo_fullresidential_busroute=DISJOINT}  0.0580  0.7244  2.1856
9  {dis_fullresidential_busstop=far, topo_fullresidential_retailoffice=TOUCHES, topo_industrial_busroute=DISJOINT}  0.0510  0.7232  2.1820
10  {busstop_num_sqmile=big, dis_fullresidential_geostreet=intermediate, dis_retailoffice_geostreet=intermediate, topo_industrial_busroute=DISJOINT}  0.0510  0.7232  2.1820


Table 4-9. Top 10 rules for injury crash

No.  Antecedent  Supp.  Conf.  Lift
1  {dis_retailoffice_geostreet=far, dis_industrial_geostreet=far, topo_retailoffice_industrial=TOUCHES}  0.0503  0.7900  2.4367
2  {dis_industrial_geostreet=far, topo_fullresidential_retailoffice=TOUCHES, topo_retailoffice_industrial=TOUCHES}  0.0503  0.7822  2.4126
3  {dis_fullresidential_busstop=far, topo_retailoffice_industrial=TOUCHES}  0.0522  0.7810  2.4088
4  {dis_industrial_geostreet=far, topo_retailoffice_industrial=TOUCHES}  0.0554  0.7768  2.3960
5  {dis_fullresidential_busstop=far, dis_retailoffice_geostreet=far, dis_industrial_geostreet=far}  0.0503  0.7745  2.3890
6  {dis_retailoffice_geostreet=far, topo_fullresidential_retailoffice=TOUCHES, topo_retailoffice_industrial=TOUCHES}  0.0503  0.7745  2.3890
7  {dis_retailoffice_geostreet=far, topo_retailoffice_industrial=TOUCHES}  0.0554  0.7699  2.3748
8  {dis_fullresidential_busstop=far, dis_fullresidential_retailoffice=far, dis_fullresidential_busroute=far, topo_fullresidential_retailoffice=TOUCHES}  0.0510  0.7692  2.3727
9  {dis_fullresidential_busstop=far, dis_industrial_geostreet=far}  0.0535  0.7636  2.3554
10  {dis_fullresidential_busstop=far, dis_fullresidential_retailoffice=far, dis_retailoffice_geostreet=far, dis_fullresidential_busroute=far}  0.0548  0.7611  2.3475


Table 4-10. Top 10 rules for pedestrian crash

No.  Antecedent  Supp.  Conf.  Lift
1  {busstop_num_sqmile=big, dis_fullresidential_busroute=intermediate, topo_fullresidential_retailoffice=TOUCHES, topo_industrial_busroute=DISJOINT}  0.0506  0.9125  3.0221
2  {busstop_num_sqmile=big, dis_fullresidential_busroute=intermediate, topo_industrial_busroute=DISJOINT}  0.0506  0.9012  2.9848
3  {busstop_num_sqmile=big, dis_fullresidential_industrial=intermediate, topo_fullresidential_retailoffice=TOUCHES, topo_industrial_geostreet=DISJOINT}  0.0533  0.8953  2.9653
4  {busstop_num_sqmile=big, dis_fullresidential_industrial=intermediate, topo_fullresidential_busroute=DISJOINT, topo_industrial_geostreet=DISJOINT}  0.0512  0.8916  2.9528
5  {busstop_num_sqmile=big, dis_fullresidential_busstop=intermediate, dis_fullresidential_industrial=intermediate, topo_industrial_geostreet=DISJOINT}  0.0506  0.8795  2.9129
6  {busstop_num_sqmile=big, dis_fullresidential_industrial=intermediate, topo_industrial_geostreet=DISJOINT}  0.0540  0.8764  2.9026
7  {busstop_num_sqmile=big, dis_fullresidential_industrial=intermediate, topo_industrial_busroute=DISJOINT, topo_industrial_geostreet=DISJOINT}  0.0533  0.8750  2.8979
8  {busstop_num_sqmile=big, dis_fullresidential_industrial=intermediate, topo_fullresidential_busroute=DISJOINT, topo_fullresidential_retailoffice=TOUCHES}  0.0575  0.8737  2.8936
9  {busstop_num_sqmile=big, dis_fullresidential_industrial=intermediate, topo_industrial_geostreet=DISJOINT, topo_retailoffice_busroute=DISJOINT}  0.0519  0.8721  2.8883
10  {busstop_num_sqmile=big, dis_fullresidential_industrial=intermediate, topo_agricultural_geostreet=, topo_industrial_geostreet=DISJOINT}  0.0519  0.8721  2.8883
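
As a reading aid, the confidence and lift columns together imply the overall share of block groups in which the consequent holds, provided lift follows its standard definition (an assumption here; the definition is not restated with the tables). For rule 1 of Table 4-10, with antecedent A and consequent C:

```latex
\[
\mathrm{lift}(A \Rightarrow C) = \frac{\mathrm{conf}(A \Rightarrow C)}{\mathrm{supp}(C)}
\quad\Longrightarrow\quad
\mathrm{supp}(C) = \frac{\mathrm{conf}(A \Rightarrow C)}{\mathrm{lift}(A \Rightarrow C)}
= \frac{0.9125}{3.0221} \approx 0.302
\]
```

Under that assumption, roughly 30% of block groups carry the pedestrian-crash consequent, and a confidence of about 0.91 is about three times that base rate. This value is inferred from the tabulated figures and is not one reported in the source.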


Table 4-11. Top 10 rules for bike crash

No.  Antecedent  Supp.  Conf.  Lift
1  {pop_density_sqmile=small, busstop_num_sqmile=intermediate, topo_fullresidential_retailoffice=TOUCHES}  0.0545  0.6106  2.2007
2  {pop_density_sqmile=small, dis_fullresidential_busstop=far, dis_retailoffice_geostreet=far, topo_fullresidential_retailoffice=TOUCHES}  0.0514  0.5652  2.0370
3  {pop_density_sqmile=small, dis_fullresidential_busstop=far, topo_fullresidential_retailoffice=TOUCHES}  0.0569  0.5496  1.9808
4  {dis_fullresidential_busstop=far, dis_retailoffice_geostreet=far, topo_fullresidential_institutional=TOUCHES, topo_fullresidential_retailoffice=TOUCHES}  0.0506  0.5378  1.9383
5  {dis_fullresidential_busstop=far, topo_fullresidential_retailoffice=TOUCHES, topo_institutional_geostreet=DISJOINT}  0.0577  0.5368  1.9345
6  {dis_fullresidential_retailoffice=far, topo_fullresidential_institutional=TOUCHES, topo_fullresidential_retailoffice=TOUCHES}  0.0522  0.5366  1.9338
7  {dis_fullresidential_busstop=far, dis_retailoffice_geostreet=far, topo_fullresidential_institutional=TOUCHES}  0.0538  0.5354  1.9297
8  {dis_fullresidential_geostreet=far, dis_retailoffice_geostreet=far, dis_institutional_geostreet=far, topo_fullresidential_retailoffice=TOUCHES}  0.0506  0.5333  1.9221
9  {dis_fullresidential_busstop=far, dis_retailoffice_geostreet=far, dis_institutional_geostreet=far, topo_fullresidential_retailoffice=TOUCHES}  0.0514  0.5328  1.9202
10  {pop_density_sqmile=small, housing_density_sqmile=small, dis_fullresidential_busstop=far, topo_fullresidential_retailoffice=TOUCHES}  0.0514  0.5328  1.9202


Table 4-12. Frequent items for top 100 rules for total crashes in time group 1

Frequent Items  No.
dis_industrial_busroute=intermediate  58
topo_industrial_busroute=DISJOINT  48
topo_fullresidential_retailoffice=TOUCHES  35
topo_fullresidential_busroute=DISJOINT  34
dis_retailoffice_busroute=intermediate  25

Table 4-13. Frequent items for top 100 rules for total crashes in time group 2

Frequent Items  No.
dis_fullresidential_busroute=intermediate  59
busstop_num_sqmile=big  44
topo_fullresidential_retailoffice=TOUCHES  32
topo_fullresidential_busroute=DISJOINT  30
topo_retailoffice_industrial=TOUCHES  22

Table 4-14. Frequent items for top 100 rules for total crashes in time group 3

Frequent Items  No.
busstop_num_sqmile=big  74
dis_fullresidential_busroute=intermediate  74
topo_fullresidential_retailoffice=TOUCHES  24
topo_fullresidential_busroute=DISJOINT  20
dis_fullresidential_retailoffice=intermediate  17

Table 4-15. Frequent items for top 100 rules for total crashes in time group 4

Frequent Items  No.
dis_fullresidential_busroute=intermediate  66
busstop_num_sqmile=big  49
topo_fullresidential_retailoffice=TOUCHES  38
dis_fullresidential_busstop=far  16
topo_fullresidential_busroute=DISJOINT  15

Table 4-16. Frequent items for top 100 rules for PDO crashes in time group 1

Frequent Items  No.
dis_industrial_busroute=intermediate  52
topo_fullresidential_retailoffice=TOUCHES  41
topo_fullresidential_busroute=DISJOINT  39
dis_retailoffice_busroute=intermediate  38
topo_retailoffice_industrial=TOUCHES  30
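
Which frequent items are shared across time groups can be checked with set operations on the item lists. The sketch below uses the items of Tables 4-12 through 4-15 (total crashes); the variable names are illustrative only and this is not presented as the procedure used in the study.

```python
# Frequent items for total crashes by time group, from Tables 4-12 to 4-15.
frequent = {
    1: {"dis_industrial_busroute=intermediate",
        "topo_industrial_busroute=DISJOINT",
        "topo_fullresidential_retailoffice=TOUCHES",
        "topo_fullresidential_busroute=DISJOINT",
        "dis_retailoffice_busroute=intermediate"},
    2: {"dis_fullresidential_busroute=intermediate",
        "busstop_num_sqmile=big",
        "topo_fullresidential_retailoffice=TOUCHES",
        "topo_fullresidential_busroute=DISJOINT",
        "topo_retailoffice_industrial=TOUCHES"},
    3: {"busstop_num_sqmile=big",
        "dis_fullresidential_busroute=intermediate",
        "topo_fullresidential_retailoffice=TOUCHES",
        "topo_fullresidential_busroute=DISJOINT",
        "dis_fullresidential_retailoffice=intermediate"},
    4: {"dis_fullresidential_busroute=intermediate",
        "busstop_num_sqmile=big",
        "topo_fullresidential_retailoffice=TOUCHES",
        "dis_fullresidential_busstop=far",
        "topo_fullresidential_busroute=DISJOINT"},
}

# Items frequent in every time group.
shared_by_all = set.intersection(*frequent.values())

# Items frequent in groups 2, 3 and 4 but not in group 1.
shared_by_234_only = (frequent[2] & frequent[3] & frequent[4]) - frequent[1]

# Items frequent in group 1 and in no other group.
only_group_1 = frequent[1] - (frequent[2] | frequent[3] | frequent[4])

print(sorted(shared_by_all))
print(sorted(shared_by_234_only))
print(sorted(only_group_1))
```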


Table 4-17. Frequent items for top 100 rules for PDO crashes in time group 2

Frequent Items  No.
busstop_num_sqmile=big  76
dis_fullresidential_busroute=intermediate  38
topo_industrial_busroute=DISJOINT  35
topo_agricultural_geostreet=  25
topo_fullresidential_agricultural=  24

Table 4-18. Frequent items for top 100 rules for PDO crashes in time group 3

Frequent Items  No.
busstop_num_sqmile=big  56
dis_fullresidential_busroute=intermediate  54
topo_industrial_busroute=DISJOINT  33
topo_fullresidential_busroute=DISJOINT  24
dis_retailoffice_busroute=intermediate  23

Table 4-19. Frequent items for top 100 rules for PDO crashes in time group 4

Frequent Items  No.
dis_fullresidential_busroute=intermediate  96
topo_industrial_geostreet=DISJOINT  38
busstop_num_sqmile=big  26
topo_fullresidential_busroute=DISJOINT  26
topo_fullresidential_retailoffice=TOUCHES  23

Table 4-20. Frequent items for top 100 rules for injury crashes in time group 1

Frequent Items  No.
topo_fullresidential_retailoffice=TOUCHES  50
dis_fullresidential_busstop=far  48
dis_fullresidential_geostreet=far  46
dis_retailoffice_geostreet=far  41
dis_fullresidential_retailoffice=far  27

Table 4-21. Frequent items for top 100 rules for injury crashes in time group 2

Frequent Items  No.
dis_fullresidential_busstop=far  74
topo_fullresidential_retailoffice=TOUCHES  42
dis_retailoffice_geostreet=far  41
dis_fullresidential_geostreet=far  31
busstop_num_sqmile=intermediate  24


Table 4-22. Frequent items for top 100 rules for injury crashes in time group 3

Frequent Items  No.
dis_fullresidential_busstop=far  71
dis_retailoffice_geostreet=far  47
dis_fullresidential_retailoffice=far  44
topo_fullresidential_retailoffice=TOUCHES  32
dis_fullresidential_geostreet=far  25

Table 4-23. Frequent items for top 100 rules for injury crashes in time group 4

Frequent Items  No.
dis_fullresidential_busstop=far  52
dis_fullresidential_retailoffice=far  43
topo_fullresidential_retailoffice=TOUCHES  40
dis_retailoffice_geostreet=far  31
busstop_num_sqmile=intermediate  24

Table 4-24. Frequent items for top 100 rules for pedestrian crashes in time group 2

Frequent Items  No.
busstop_num_sqmile=big  91
topo_fullresidential_institutional=TOUCHES  53
street_length_feet_sqmile=big  34
topo_fullresidential_geostreet=DISJOINT  29
topo_fullresidential_busroute=DISJOINT  19

Table 4-25. Frequent items for top 100 rules for pedestrian crashes in time group 3

Frequent Items  No.
busstop_num_sqmile=big  100
topo_industrial_busroute=DISJOINT  45
topo_fullresidential_industrial=TOUCHES  38
topo_fullresidential_institutional=TOUCHES  37
dis_institutional_geostreet=intermediate  15

Table 4-26. Frequent items for top 100 rules for pedestrian crashes in time group 4

Frequent Items  No.
busstop_num_sqmile=big  100
dis_institutional_busroute=intermediate  99
dis_fullresidential_geostreet=intermediate  19
topo_fullresidential_recreation=  19
topo_recreation_busroute=  19


Table 4-27. Frequent items for top 100 rules for bike crashes in time group 3

Frequent Items  No.
topo_fullresidential_retailoffice=TOUCHES  41
topo_retailoffice_institutional=TOUCHES  38
street_length_feet_sqmile=intermediate  35
topo_retailoffice_busroute=DISJOINT  31
pop_density_sqmile=small  29


Table 4-28. Top 10 rules for total crash in time group 1

No.  Antecedent  Supp.  Conf.  Lift
1  {topo_retailoffice_geostreet=CROSSES/CONTAINS, topo_retailoffice_industrial=TOUCHES}  0.0512  0.7407  2.2687
2  {dis_retailoffice_geostreet=far, topo_retailoffice_industrial=TOUCHES}  0.0512  0.7080  2.1683
3  {dis_retailoffice_busroute=intermediate, topo_fullresidential_busroute=DISJOINT, topo_industrial_busroute=DISJOINT, topo_recreation_geostreet=}  0.0538  0.7059  2.1619
4  {dis_retailoffice_busroute=intermediate, topo_fullresidential_busroute=DISJOINT, topo_fullresidential_recreation=, topo_industrial_busroute=DISJOINT}  0.0538  0.7059  2.1619
5  {dis_retailoffice_busroute=intermediate, topo_fullresidential_busroute=DISJOINT, topo_industrial_busroute=DISJOINT, topo_recreation_busroute=}  0.0538  0.7059  2.1619
6  {dis_retailoffice_busroute=intermediate, topo_fullresidential_busroute=DISJOINT, topo_industrial_busroute=DISJOINT, topo_retailoffice_recreation=}  0.0538  0.7059  2.1619
7  {dis_industrial_geostreet=far, topo_retailoffice_industrial=TOUCHES}  0.0506  0.7054  2.1603
8  {dis_industrial_busroute=intermediate, topo_fullresidential_busroute=DISJOINT, topo_industrial_busroute=DISJOINT, topo_recreation_geostreet=}  0.0512  0.6957  2.1306
9  {dis_industrial_busroute=intermediate, topo_fullresidential_busroute=DISJOINT, topo_fullresidential_recreation=, topo_industrial_busroute=DISJOINT}  0.0512  0.6957  2.1306
10  {dis_industrial_busroute=intermediate, topo_fullresidential_busroute=DISJOINT, topo_industrial_busroute=DISJOINT, topo_recreation_busroute=}  0.0512  0.6957  2.1306
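
The block groups mapped for a given rule can be selected by testing each block group's attributes against the rule's antecedent. The sketch below assumes that "matching" means the block group satisfies every antecedent item; the function name `matching_block_groups`, the block group IDs and the attribute values are all hypothetical. Only the antecedent, rule 1 of Table 4-28, comes from the source.

```python
def matching_block_groups(block_groups, antecedent):
    """Return IDs of block groups whose attributes satisfy every antecedent item.

    Items are "variable=value" strings; a trailing "=" means an empty value.
    """
    conditions = dict(item.split("=", 1) for item in antecedent)
    return [
        bg_id
        for bg_id, attrs in block_groups.items()
        if all(attrs.get(var) == value for var, value in conditions.items())
    ]

# Antecedent of rule 1 in Table 4-28.
rule_1 = [
    "topo_retailoffice_geostreet=CROSSES/CONTAINS",
    "topo_retailoffice_industrial=TOUCHES",
]

# Made-up block groups with made-up attribute values, for illustration only.
toy_block_groups = {
    "BG-A": {"topo_retailoffice_geostreet": "CROSSES/CONTAINS",
             "topo_retailoffice_industrial": "TOUCHES"},
    "BG-B": {"topo_retailoffice_geostreet": "CROSSES/CONTAINS",
             "topo_retailoffice_industrial": "DISJOINT"},
    "BG-C": {"topo_retailoffice_geostreet": "DISJOINT",
             "topo_retailoffice_industrial": "TOUCHES"},
}

matches = matching_block_groups(toy_block_groups, rule_1)
print(matches)
```

The resulting ID list could then be joined to block group geometries in a GIS to draw a map of the kind shown in the matching block group figures.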


Table 4-29. Top 10 rules for total crash in time group 2

No.  Antecedent  Supp.  Conf.  Lift
1  {dis_fullresidential_busstop=far, topo_retailoffice_industrial=TOUCHES}  0.0502  0.7524  2.3115
2  {dis_retailoffice_geostreet=far, topo_retailoffice_industrial=TOUCHES}  0.0528  0.7345  2.2566
3  {busstop_num_sqmile=intermediate, dis_fullresidential_retailoffice=far, dis_fullresidential_geostreet=far}  0.0502  0.7315  2.2473
4  {busstop_num_sqmile=intermediate, dis_fullresidential_busstop=far, dis_fullresidential_geostreet=far, topo_fullresidential_retailoffice=TOUCHES}  0.0528  0.7281  2.2368
5  {dis_industrial_geostreet=far, topo_retailoffice_industrial=TOUCHES}  0.0515  0.7232  2.2219
6  {busstop_num_sqmile=big, dis_fullresidential_retailoffice=intermediate, dis_fullresidential_busroute=intermediate, topo_fullresidential_busroute=DISJOINT}  0.0579  0.7222  2.2189
7  {busstop_num_sqmile=intermediate, dis_fullresidential_geostreet=far, dis_retailoffice_geostreet=far, topo_fullresidential_retailoffice=TOUCHES}  0.0509  0.7207  2.2142
8  {busstop_num_sqmile=intermediate, dis_fullresidential_geostreet=far, topo_fullresidential_retailoffice=TOUCHES}  0.0540  0.7203  2.2131
9  {busstop_num_sqmile=intermediate, dis_fullresidential_busstop=far, dis_fullresidential_geostreet=far, dis_retailoffice_geostreet=far}  0.0509  0.7143  2.1945
10  {dis_fullresidential_busstop=far, dis_fullresidential_geostreet=far, topo_agricultural_geostreet=, topo_fullresidential_retailoffice=TOUCHES}  0.0540  0.7083  2.1762


Table 4-30. Top 10 rules for total crash in time group 3

No.  Antecedent  Supp.  Conf.  Lift
1  {dis_fullresidential_busstop=far, topo_retailoffice_industrial=TOUCHES}  0.0512  0.7714  2.3187
2  {busstop_num_sqmile=big, dis_fullresidential_retailoffice=intermediate, dis_fullresidential_busroute=intermediate, topo_fullresidential_busroute=DISJOINT}  0.0607  0.7619  2.2901
3  {busstop_num_sqmile=big, dis_fullresidential_retailoffice=intermediate, dis_fullresidential_busroute=intermediate, topo_fullresidential_retailoffice=TOUCHES}  0.0670  0.7571  2.2757
4  {busstop_num_sqmile=big, dis_retailoffice_geostreet=intermediate, dis_fullresidential_busroute=intermediate, topo_fullresidential_busroute=DISJOINT}  0.0601  0.7480  2.2484
5  {busstop_num_sqmile=big, dis_fullresidential_retailoffice=intermediate, dis_fullresidential_busroute=intermediate, topo_fullresidential_institutional=TOUCHES}  0.0506  0.7477  2.2473
6  {busstop_num_sqmile=big, dis_fullresidential_retailoffice=intermediate, dis_fullresidential_busroute=intermediate}  0.0670  0.7465  2.2437
7  {busstop_num_sqmile=big, dis_fullresidential_retailoffice=intermediate, dis_retailoffice_geostreet=intermediate, dis_fullresidential_busroute=intermediate}  0.0626  0.7444  2.2373
8  {busstop_num_sqmile=big, dis_fullresidential_busroute=intermediate, topo_fullresidential_busroute=DISJOINT, topo_fullresidential_retailoffice=TOUCHES}  0.0658  0.7429  2.2328
9  {busstop_num_sqmile=big, dis_fullresidential_retailoffice=intermediate, dis_fullresidential_geostreet=intermediate, dis_fullresidential_busroute=intermediate}  0.0639  0.7426  2.2322
10  {busstop_num_sqmile=big, dis_fullresidential_retailoffice=intermediate, dis_fullresidential_busroute=intermediate, topo_agricultural_geostreet=}  0.0651  0.7410  2.2272


Table 4-31. Top 10 rules for total crash in time group 4

No.  Antecedent  Supp.  Conf.  Lift
1  {dis_fullresidential_busstop=far, topo_fullresidential_retailoffice=TOUCHES, topo_industrial_busroute=DISJOINT}  0.0509  0.7143  2.1677
2  {dis_industrial_geostreet=far, topo_retailoffice_industrial=TOUCHES}  0.0503  0.7054  2.1406
3  {dis_retailoffice_geostreet=far, topo_retailoffice_industrial=TOUCHES}  0.0503  0.6991  2.1216
4  {busstop_num_sqmile=big, dis_fullresidential_retailoffice=intermediate, dis_fullresidential_busroute=intermediate, topo_fullresidential_busroute=DISJOINT}  0.0560  0.6984  2.1195
5  {dis_industrial_geostreet=far, topo_fullresidential_retailoffice=TOUCHES, topo_industrial_busroute=DISJOINT}  0.0503  0.6930  2.1030
6  {busstop_num_sqmile=big, dis_fullresidential_retailoffice=intermediate, dis_fullresidential_busroute=intermediate, topo_fullresidential_retailoffice=TOUCHES}  0.0617  0.6929  2.1026
7  {busstop_num_sqmile=big, dis_fullresidential_retailoffice=intermediate, dis_fullresidential_busroute=intermediate, topo_recreation_geostreet=}  0.0585  0.6917  2.0992
8  {busstop_num_sqmile=big, dis_fullresidential_retailoffice=intermediate, dis_fullresidential_busroute=intermediate, topo_fullresidential_recreation=}  0.0585  0.6917  2.0992
9  {busstop_num_sqmile=big, dis_fullresidential_retailoffice=intermediate, dis_fullresidential_busroute=intermediate, topo_recreation_busroute=}  0.0585  0.6917  2.0992
10  {busstop_num_sqmile=big, dis_fullresidential_retailoffice=intermediate, dis_fullresidential_busroute=intermediate, topo_retailoffice_recreation=}  0.0585  0.6917  2.0992


Table 4-32. Top 10 rules for PDO crash in time group 1

No.  Antecedent  Supp.  Conf.  Lift
1  {topo_retailoffice_geostreet=CROSSES/CONTAINS, topo_retailoffice_industrial=TOUCHES}  0.0514  0.7407  2.2914
2  {dis_retailoffice_busroute=intermediate, topo_fullresidential_busroute=DISJOINT, topo_industrial_busroute=DISJOINT, topo_retailoffice_industrial=TOUCHES}  0.0508  0.7315  2.2628
3  {dis_retailoffice_busroute=intermediate, topo_fullresidential_busroute=DISJOINT, topo_retailoffice_industrial=TOUCHES}  0.0521  0.7297  2.2574
4  {dis_retailoffice_busroute=intermediate, topo_fullresidential_busroute=DISJOINT, topo_fullresidential_retailoffice=TOUCHES, topo_retailoffice_industrial=TOUCHES}  0.0521  0.7297  2.2574
5  {dis_retailoffice_busroute=intermediate, topo_fullresidential_busroute=DISJOINT, topo_industrial_busroute=DISJOINT, topo_recreation_geostreet=}  0.0546  0.7143  2.2096
6  {dis_retailoffice_busroute=intermediate, topo_fullresidential_busroute=DISJOINT, topo_fullresidential_recreation=, topo_industrial_busroute=DISJOINT}  0.0546  0.7143  2.2096
7  {dis_retailoffice_busroute=intermediate, topo_fullresidential_busroute=DISJOINT, topo_industrial_busroute=DISJOINT, topo_recreation_busroute=}  0.0546  0.7143  2.2096
8  {dis_retailoffice_busroute=intermediate, topo_fullresidential_busroute=DISJOINT, topo_industrial_busroute=DISJOINT, topo_retailoffice_recreation=}  0.0546  0.7143  2.2096
9  {dis_retailoffice_busroute=intermediate, topo_fullresidential_busroute=DISJOINT, topo_fullresidential_retailoffice=TOUCHES, topo_industrial_busroute=DISJOINT}  0.0578  0.7087  2.1922
10  {dis_retailoffice_busroute=intermediate, topo_fullresidential_busroute=DISJOINT, topo_industrial_busroute=DISJOINT}  0.0591  0.7077  2.1892

PAGE 125

Table 4-33. Top 10 rules for PDO crash in time group 2

No. | Antecedent | Supp. | Conf. | Lift
1 | {busstop_num_sqmile=big, dis_fullresidential_retailoffice=intermediate, dis_fullresidential_busroute=intermediate, topo_fullresidential_busroute=DISJOINT} | 0.0593 | 0.7381 | 2.2429
2 | {dis_industrial_geostreet=far, topo_retailoffice_industrial=TOUCHES} | 0.0517 | 0.7232 | 2.1977
3 | {busstop_num_sqmile=big, dis_fullresidential_geostreet=intermediate, topo_agricultural_geostreet=, topo_industrial_busroute=DISJOINT} | 0.0529 | 0.7217 | 2.1932
4 | {busstop_num_sqmile=big, dis_fullresidential_geostreet=intermediate, topo_fullresidential_agricultural=, topo_industrial_busroute=DISJOINT} | 0.0529 | 0.7217 | 2.1932
5 | {busstop_num_sqmile=big, dis_fullresidential_retailoffice=intermediate, dis_fullresidential_busroute=intermediate, topo_fullresidential_retailoffice=TOUCHES} | 0.0644 | 0.7214 | 2.1922
6 | {busstop_num_sqmile=big, dis_fullresidential_busroute=intermediate, topo_fullresidential_busroute=DISJOINT, topo_fullresidential_retailoffice=TOUCHES} | 0.0644 | 0.7214 | 2.1922
7 | {mix_index=big, topo_retailoffice_busroute=DISJOINT, topo_retailoffice_industrial=TOUCHES} | 0.0542 | 0.7203 | 2.1889
8 | {mix_index=big, topo_fullresidential_retailoffice=TOUCHES, topo_retailoffice_busroute=DISJOINT, topo_retailoffice_industrial=TOUCHES} | 0.0523 | 0.7193 | 2.1858
9 | {mix_index=big, topo_industrial_busroute=DISJOINT, topo_retailoffice_busroute=DISJOINT, topo_retailoffice_industrial=TOUCHES} | 0.0517 | 0.7168 | 2.1782
10 | {dis_fullresidential_busstop=far, dis_fullresidential_geostreet=far, topo_agricultural_geostreet=, topo_fullresidential_retailoffice=TOUCHES} | 0.0548 | 0.7167 | 2.1778

PAGE 126

Table 4-34. Top 10 rules for PDO crash in time group 3

No. | Antecedent | Supp. | Conf. | Lift
1 | {busstop_num_sqmile=big, dis_fullresidential_retailoffice=intermediate, dis_fullresidential_busroute=intermediate, topo_fullresidential_busroute=DISJOINT} | 0.0583 | 0.7302 | 2.2044
2 | {busstop_num_sqmile=big, dis_fullresidential_geostreet=intermediate, topo_agricultural_geostreet=, topo_industrial_busroute=DISJOINT} | 0.0526 | 0.7217 | 2.1790
3 | {busstop_num_sqmile=big, dis_fullresidential_geostreet=intermediate, topo_fullresidential_agricultural=, topo_industrial_busroute=DISJOINT} | 0.0526 | 0.7217 | 2.1790
4 | {busstop_num_sqmile=big, dis_retailoffice_geostreet=intermediate, dis_fullresidential_busroute=intermediate, topo_fullresidential_busroute=DISJOINT} | 0.0576 | 0.7165 | 2.1633
5 | {busstop_num_sqmile=big, dis_fullresidential_retailoffice=intermediate, dis_fullresidential_busroute=intermediate, topo_fullresidential_retailoffice=TOUCHES} | 0.0633 | 0.7143 | 2.1565
6 | {busstop_num_sqmile=big, dis_fullresidential_busroute=intermediate, topo_fullresidential_busroute=DISJOINT, topo_fullresidential_retailoffice=TOUCHES} | 0.0633 | 0.7143 | 2.1565
7 | {busstop_num_sqmile=intermediate, dis_fullresidential_busstop=far, dis_fullresidential_geostreet=far, topo_fullresidential_retailoffice=TOUCHES} | 0.0513 | 0.7105 | 2.1452
8 | {busstop_num_sqmile=big, dis_fullresidential_retailoffice=intermediate, dis_retailoffice_geostreet=intermediate, dis_fullresidential_busroute=intermediate} | 0.0595 | 0.7068 | 2.1338
9 | {dis_fullresidential_busstop=far, topo_fullresidential_retailoffice=TOUCHES, topo_industrial_busroute=DISJOINT} | 0.0500 | 0.7054 | 2.1296
10 | {busstop_num_sqmile=big, dis_fullresidential_geostreet=intermediate, dis_retailoffice_geostreet=intermediate, topo_industrial_busroute=DISJOINT} | 0.0500 | 0.7054 | 2.1296

PAGE 127

Table 4-35. Top 10 rules for PDO crash in time group 4

No. | Antecedent | Supp. | Conf. | Lift
1 | {busstop_num_sqmile=big, dis_fullresidential_retailoffice=intermediate, dis_fullresidential_busroute=intermediate, topo_fullresidential_busroute=DISJOINT} | 0.0563 | 0.6984 | 2.1047
2 | {busstop_num_sqmile=big, dis_fullresidential_retailoffice=intermediate, dis_fullresidential_busroute=intermediate, topo_recreation_geostreet=} | 0.0588 | 0.6917 | 2.0845
3 | {busstop_num_sqmile=big, dis_fullresidential_retailoffice=intermediate, dis_fullresidential_busroute=intermediate, topo_fullresidential_recreation=} | 0.0588 | 0.6917 | 2.0845
4 | {busstop_num_sqmile=big, dis_fullresidential_retailoffice=intermediate, dis_fullresidential_busroute=intermediate, topo_recreation_busroute=} | 0.0588 | 0.6917 | 2.0845
5 | {busstop_num_sqmile=big, dis_fullresidential_retailoffice=intermediate, dis_fullresidential_busroute=intermediate, topo_retailoffice_recreation=} | 0.0588 | 0.6917 | 2.0845
6 | {dis_fullresidential_busroute=intermediate, topo_industrial_geostreet=DISJOINT, topo_retailoffice_busroute=DISJOINT, topo_retailoffice_geostreet=DISJOINT} | 0.0505 | 0.6870 | 2.0701
7 | {dis_fullresidential_busroute=intermediate, topo_fullresidential_retailoffice=TOUCHES, topo_industrial_geostreet=DISJOINT, topo_retailoffice_geostreet=DISJOINT} | 0.0505 | 0.6870 | 2.0701
8 | {busstop_num_sqmile=big, dis_fullresidential_retailoffice=intermediate, dis_fullresidential_busroute=intermediate, topo_fullresidential_retailoffice=TOUCHES} | 0.0614 | 0.6857 | 2.0664
9 | {dis_retailoffice_geostreet=intermediate, dis_fullresidential_busroute=intermediate, topo_industrial_busroute=DISJOINT, topo_industrial_geostreet=DISJOINT} | 0.0505 | 0.6810 | 2.0523
10 | {dis_fullresidential_busroute=intermediate, topo_industrial_geostreet=DISJOINT, topo_retailoffice_geostreet=DISJOINT} | 0.0512 | 0.6780 | 2.0430

PAGE 128

Table 4-36. Top 10 rules for injury crash in time group 1

No. | Antecedent | Supp. | Conf. | Lift
1 | {dis_fullresidential_geostreet=far, dis_industrial_geostreet=far, topo_fullresidential_retailoffice=TOUCHES} | 0.0501 | 0.7978 | 2.4592
2 | {dis_fullresidential_busstop=far, dis_industrial_geostreet=far, topo_fullresidential_retailoffice=TOUCHES} | 0.0515 | 0.7849 | 2.4197
3 | {dis_fullresidential_geostreet=far, dis_retailoffice_geostreet=far, dis_industrial_geostreet=far} | 0.0529 | 0.7813 | 2.4083
4 | {busstop_num_sqmile=intermediate, dis_fullresidential_busstop=far, dis_retailoffice_geostreet=far, topo_fullresidential_institutional=TOUCHES} | 0.0515 | 0.7684 | 2.3687
5 | {dis_fullresidential_busstop=far, topo_fullresidential_retailoffice=TOUCHES, topo_retailoffice_industrial=TOUCHES} | 0.0529 | 0.7653 | 2.3591
6 | {dis_fullresidential_geostreet=far, dis_industrial_geostreet=far} | 0.0550 | 0.7647 | 2.3573
7 | {dis_fullresidential_retailoffice=far, dis_industrial_geostreet=far, topo_fullresidential_retailoffice=TOUCHES} | 0.0522 | 0.7629 | 2.3517
8 | {dis_fullresidential_busstop=far, dis_retailoffice_geostreet=far, dis_industrial_geostreet=far} | 0.0522 | 0.7629 | 2.3517
9 | {busstop_num_sqmile=intermediate, dis_fullresidential_retailoffice=far, dis_fullresidential_geostreet=far, topo_fullresidential_retailoffice=TOUCHES} | 0.0543 | 0.7624 | 2.3501
10 | {dis_fullresidential_retailoffice=far, dis_retailoffice_geostreet=far, dis_industrial_geostreet=far, topo_fullresidential_retailoffice=TOUCHES} | 0.0515 | 0.7604 | 2.3441

PAGE 129

Table 4-37. Top 10 rules for injury crash in time group 2

No. | Antecedent | Supp. | Conf. | Lift
1 | {busstop_num_sqmile=intermediate, dis_fullresidential_busstop=far, dis_institutional_geostreet=far, topo_fullresidential_retailoffice=TOUCHES} | 0.0517 | 0.8105 | 2.4884
2 | {busstop_num_sqmile=intermediate, dis_fullresidential_busstop=far, dis_retailoffice_geostreet=far, topo_fullresidential_institutional=TOUCHES} | 0.0517 | 0.7938 | 2.4371
3 | {dis_fullresidential_busstop=far, dis_retailoffice_geostreet=far, dis_industrial_geostreet=far} | 0.0537 | 0.7921 | 2.4318
4 | {dis_fullresidential_busstop=far, dis_industrial_geostreet=far, topo_fullresidential_retailoffice=TOUCHES} | 0.0504 | 0.7895 | 2.4238
5 | {dis_fullresidential_busstop=far, dis_fullresidential_geostreet=far, dis_institutional_geostreet=far, topo_fullresidential_retailoffice=TOUCHES} | 0.0604 | 0.7826 | 2.4027
6 | {busstop_num_sqmile=intermediate, dis_fullresidential_retailoffice=far, dis_fullresidential_geostreet=far, topo_fullresidential_retailoffice=TOUCHES} | 0.0531 | 0.7822 | 2.4014
7 | {dis_fullresidential_busstop=far, dis_fullresidential_geostreet=far, topo_fullresidential_institutional=TOUCHES, topo_fullresidential_retailoffice=TOUCHES} | 0.0531 | 0.7822 | 2.4014
8 | {dis_fullresidential_busstop=far, dis_retailoffice_institutional=far, topo_fullresidential_institutional=TOUCHES} | 0.0504 | 0.7813 | 2.3985
9 | {dis_fullresidential_busstop=far, dis_institutional_geostreet=far, dis_retailoffice_institutional=far, topo_fullresidential_retailoffice=TOUCHES} | 0.0504 | 0.7813 | 2.3985
10 | {dis_retailoffice_geostreet=far, dis_industrial_geostreet=far, topo_retailoffice_industrial=TOUCHES} | 0.0524 | 0.7800 | 2.3947

PAGE 130

Table 4-38. Top 10 rules for injury crash in time group 3

No. | Antecedent | Supp. | Conf. | Lift
1 | {dis_fullresidential_busstop=far, dis_fullresidential_retailoffice=far, dis_fullresidential_busroute=far, topo_fullresidential_retailoffice=TOUCHES} | 0.0532 | 0.7885 | 2.4611
2 | {busstop_num_sqmile=intermediate, dis_fullresidential_retailoffice=far, dis_fullresidential_geostreet=far, topo_fullresidential_retailoffice=TOUCHES} | 0.0512 | 0.7670 | 2.3941
3 | {busstop_num_sqmile=intermediate, dis_fullresidential_busstop=far, dis_retailoffice_geostreet=far, topo_fullresidential_busroute=DISJOINT} | 0.0551 | 0.7658 | 2.3903
4 | {dis_fullresidential_busstop=far, dis_retailoffice_geostreet=far, dis_industrial_geostreet=far} | 0.0506 | 0.7647 | 2.3870
5 | {dis_fullresidential_busstop=far, dis_fullresidential_retailoffice=far, dis_fullresidential_institutional=far, topo_fullresidential_retailoffice=TOUCHES} | 0.0506 | 0.7647 | 2.3870
6 | {busstop_num_sqmile=intermediate, dis_fullresidential_retailoffice=far, dis_fullresidential_geostreet=far, dis_retailoffice_geostreet=far} | 0.0506 | 0.7647 | 2.3870
7 | {busstop_num_sqmile=intermediate, dis_fullresidential_busstop=far, dis_fullresidential_retailoffice=far, dis_fullresidential_geostreet=far} | 0.0525 | 0.7642 | 2.3853
8 | {dis_fullresidential_busstop=far, topo_retailoffice_industrial=TOUCHES} | 0.0519 | 0.7619 | 2.3783
9 | {dis_fullresidential_busstop=far, dis_retailoffice_geostreet=far, dis_fullresidential_institutional=far, topo_fullresidential_retailoffice=TOUCHES} | 0.0538 | 0.7615 | 2.3769
10 | {dis_fullresidential_busstop=far, dis_fullresidential_retailoffice=far, dis_retailoffice_geostreet=far, dis_fullresidential_busroute=far} | 0.0558 | 0.7611 | 2.3756

PAGE 131

Table 4-39. Top 10 rules for injury crash in time group 4

No. | Antecedent | Supp. | Conf. | Lift
1 | {dis_fullresidential_busstop=far, topo_fullresidential_retailoffice=TOUCHES, topo_retailoffice_industrial=TOUCHES} | 0.0512 | 0.7755 | 2.5366
2 | {dis_fullresidential_busstop=far, topo_retailoffice_industrial=TOUCHES} | 0.0532 | 0.7524 | 2.4610
3 | {dis_fullresidential_busstop=far, dis_retailoffice_geostreet=far, dis_retailoffice_institutional=far, topo_fullresidential_retailoffice=TOUCHES} | 0.0512 | 0.7238 | 2.3675
4 | {dis_fullresidential_busstop=far, dis_retailoffice_institutional=far, topo_fullresidential_retailoffice=TOUCHES} | 0.0525 | 0.7222 | 2.3623
5 | {dis_retailoffice_geostreet=far, dis_retailoffice_institutional=far, topo_fullresidential_retailoffice=TOUCHES, topo_retailoffice_busroute=DISJOINT} | 0.0525 | 0.7222 | 2.3623
6 | {dis_fullresidential_busstop=far, dis_fullresidential_retailoffice=far, dis_fullresidential_busroute=far, topo_fullresidential_retailoffice=TOUCHES} | 0.0505 | 0.7212 | 2.3588
7 | {busstop_num_sqmile=intermediate, dis_fullresidential_retailoffice=far, topo_fullresidential_geostreet=CROSSES/CONTAINS} | 0.0532 | 0.7182 | 2.3491
8 | {busstop_num_sqmile=intermediate, dis_fullresidential_busstop=far, dis_fullresidential_retailoffice=far, dis_fullresidential_geostreet=far} | 0.0512 | 0.7170 | 2.3452
9 | {dis_fullresidential_busstop=far, topo_fullresidential_retailoffice=TOUCHES, topo_industrial_busroute=DISJOINT} | 0.0539 | 0.7143 | 2.3364
10 | {busstop_num_sqmile=intermediate, dis_fullresidential_busstop=far, dis_fullresidential_retailoffice=far, topo_fullresidential_retailoffice=TOUCHES} | 0.0606 | 0.7143 | 2.3364

PAGE 132

Table 4-40. Top 10 rules for pedestrian crash in time group 2

No. | Antecedent | Supp. | Conf. | Lift
1 | {street_length_feet_sqmile=big, busstop_num_sqmile=big, topo_fullresidential_geostreet=DISJOINT, topo_fullresidential_retailoffice=TOUCHES} | 0.0591 | 0.6413 | 2.3382
2 | {street_length_feet_sqmile=big, busstop_num_sqmile=big, topo_fullresidential_busroute=DISJOINT, topo_fullresidential_geostreet=DISJOINT} | 0.0571 | 0.6196 | 2.2589
3 | {busstop_num_sqmile=big, dis_fullresidential_retailoffice=intermediate, topo_fullresidential_geostreet=DISJOINT, topo_fullresidential_institutional=TOUCHES} | 0.0521 | 0.6190 | 2.2570
4 | {street_length_feet_sqmile=big, busstop_num_sqmile=big, topo_fullresidential_geostreet=DISJOINT, topo_retailoffice_busroute=DISJOINT} | 0.0551 | 0.6111 | 2.2281
5 | {street_length_feet_sqmile=big, busstop_num_sqmile=big, topo_fullresidential_busroute=DISJOINT, topo_fullresidential_institutional=TOUCHES} | 0.0561 | 0.6087 | 2.2193
6 | {busstop_num_sqmile=big, dis_fullresidential_retailoffice=intermediate, topo_fullresidential_geostreet=DISJOINT, topo_institutional_geostreet=DISJOINT} | 0.0521 | 0.6047 | 2.2045
7 | {busstop_num_sqmile=big, dis_retailoffice_geostreet=intermediate, topo_fullresidential_geostreet=DISJOINT, topo_fullresidential_institutional=TOUCHES} | 0.0561 | 0.6022 | 2.1954
8 | {street_length_feet_sqmile=big, busstop_num_sqmile=big, topo_fullresidential_geostreet=DISJOINT} | 0.0601 | 0.6000 | 2.1876
9 | {street_length_feet_sqmile=big, busstop_num_sqmile=big, dis_fullresidential_retailoffice=intermediate, topo_fullresidential_busroute=DISJOINT} | 0.0511 | 0.6000 | 2.1876
10 | {busstop_num_sqmile=big, topo_fullresidential_geostreet=DISJOINT, topo_fullresidential_institutional=TOUCHES, topo_fullresidential_retailoffice=TOUCHES} | 0.0701 | 0.5983 | 2.1814

PAGE 133

Table 4-41. Top 10 rules for pedestrian crash in time group 3

No. | Antecedent | Supp. | Conf. | Lift
1 | {busstop_num_sqmile=big, topo_agricultural_geostreet=, topo_fullresidential_busroute=DISJOINT, topo_fullresidential_industrial=TOUCHES} | 0.0528 | 0.8025 | 2.7235
2 | {busstop_num_sqmile=big, topo_fullresidential_agricultural=, topo_fullresidential_busroute=DISJOINT, topo_fullresidential_industrial=TOUCHES} | 0.0528 | 0.8025 | 2.7235
3 | {busstop_num_sqmile=big, topo_fullresidential_busroute=DISJOINT, topo_fullresidential_institutional=TOUCHES, topo_industrial_busroute=DISJOINT} | 0.0593 | 0.8022 | 2.7226
4 | {busstop_num_sqmile=big, topo_fullresidential_busroute=DISJOINT, topo_fullresidential_industrial=TOUCHES, topo_fullresidential_retailoffice=TOUCHES} | 0.0519 | 0.8000 | 2.7152
5 | {busstop_num_sqmile=big, topo_fullresidential_busroute=DISJOINT, topo_fullresidential_institutional=TOUCHES, topo_industrial_geostreet=DISJOINT} | 0.0519 | 0.8000 | 2.7152
6 | {busstop_num_sqmile=big, dis_fullresidential_retailoffice=intermediate, dis_institutional_geostreet=intermediate, topo_industrial_busroute=DISJOINT} | 0.0519 | 0.8000 | 2.7152
7 | {busstop_num_sqmile=big, dis_fullresidential_industrial=intermediate, topo_fullresidential_institutional=TOUCHES, topo_industrial_busroute=DISJOINT} | 0.0511 | 0.7975 | 2.7066
8 | {busstop_num_sqmile=big, topo_fullresidential_institutional=TOUCHES, topo_industrial_geostreet=DISJOINT, topo_recreation_geostreet=} | 0.0511 | 0.7975 | 2.7066
9 | {busstop_num_sqmile=big, topo_fullresidential_institutional=TOUCHES, topo_fullresidential_recreation=, topo_industrial_geostreet=DISJOINT} | 0.0511 | 0.7975 | 2.7066
10 | {busstop_num_sqmile=big, topo_fullresidential_institutional=TOUCHES, topo_industrial_geostreet=DISJOINT, topo_recreation_busroute=} | 0.0511 | 0.7975 | 2.7066

PAGE 134

Table 4-42. Top 10 rules for pedestrian crash in time group 4

No. | Antecedent | Supp. | Conf. | Lift
1 | {busstop_num_sqmile=big, dis_retailoffice_busroute=intermediate, dis_institutional_busroute=intermediate, topo_recreation_geostreet=} | 0.0503 | 0.7222 | 2.7129
2 | {busstop_num_sqmile=big, dis_retailoffice_busroute=intermediate, dis_institutional_busroute=intermediate, topo_fullresidential_recreation=} | 0.0503 | 0.7222 | 2.7129
3 | {busstop_num_sqmile=big, dis_retailoffice_busroute=intermediate, dis_institutional_busroute=intermediate, topo_recreation_busroute=} | 0.0503 | 0.7222 | 2.7129
4 | {busstop_num_sqmile=big, dis_retailoffice_busroute=intermediate, dis_institutional_busroute=intermediate, topo_retailoffice_recreation=} | 0.0503 | 0.7222 | 2.7129
5 | {busstop_num_sqmile=big, dis_institutional_busroute=intermediate, dis_retailoffice_institutional=intermediate, topo_recreation_geostreet=} | 0.0523 | 0.7200 | 2.7046
6 | {busstop_num_sqmile=big, dis_institutional_busroute=intermediate, dis_retailoffice_institutional=intermediate, topo_fullresidential_recreation=} | 0.0523 | 0.7200 | 2.7046
7 | {busstop_num_sqmile=big, dis_institutional_busroute=intermediate, dis_retailoffice_institutional=intermediate, topo_recreation_busroute=} | 0.0523 | 0.7200 | 2.7046
8 | {busstop_num_sqmile=big, dis_institutional_busroute=intermediate, dis_retailoffice_institutional=intermediate, topo_retailoffice_recreation=} | 0.0523 | 0.7200 | 2.7046
9 | {busstop_num_sqmile=big, dis_institutional_busroute=intermediate, dis_fullresidential_geostreet=intermediate, dis_retailoffice_institutional=intermediate} | 0.0513 | 0.7067 | 2.6545
10 | {busstop_num_sqmile=big, dis_institutional_busroute=intermediate, dis_fullresidential_geostreet=intermediate, topo_recreation_geostreet=} | 0.0571 | 0.6941 | 2.6074

PAGE 135

Table 4-43. Top 10 rules for bike crash in time group 3

No. | Antecedent | Supp. | Conf. | Lift
1 | {dis_fullresidential_busstop=far, topo_fullresidential_retailoffice=TOUCHES, topo_retailoffice_institutional=TOUCHES} | 0.0501 | 0.6316 | 1.9087
2 | {dis_fullresidential_busstop=far, topo_retailoffice_institutional=TOUCHES} | 0.0532 | 0.6071 | 1.8348
3 | {pop_density_sqmile=small, topo_agricultural_geostreet=, topo_fullresidential_recreation=, topo_fullresidential_retailoffice=TOUCHES} | 0.0564 | 0.5934 | 1.7933
4 | {pop_density_sqmile=small, topo_fullresidential_agricultural=, topo_fullresidential_recreation=, topo_fullresidential_retailoffice=TOUCHES} | 0.0564 | 0.5934 | 1.7933
5 | {pop_density_sqmile=small, topo_agricultural_geostreet=, topo_fullresidential_retailoffice=TOUCHES, topo_recreation_geostreet=} | 0.0564 | 0.5934 | 1.7933
6 | {pop_density_sqmile=small, topo_fullresidential_agricultural=, topo_fullresidential_retailoffice=TOUCHES, topo_recreation_geostreet=} | 0.0564 | 0.5934 | 1.7933
7 | {pop_density_sqmile=small, topo_agricultural_geostreet=, topo_fullresidential_retailoffice=TOUCHES, topo_retailoffice_recreation=} | 0.0564 | 0.5934 | 1.7933
8 | {pop_density_sqmile=small, topo_fullresidential_agricultural=, topo_fullresidential_retailoffice=TOUCHES, topo_retailoffice_recreation=} | 0.0564 | 0.5934 | 1.7933
9 | {dis_fullresidential_geostreet=far, dis_institutional_geostreet=far, topo_fullresidential_retailoffice=TOUCHES, topo_retailoffice_busroute=DISJOINT} | 0.0532 | 0.5930 | 1.7922
10 | {pop_density_sqmile=small, busstop_num_sqmile=intermediate, topo_fullresidential_retailoffice=TOUCHES, topo_retailoffice_busroute=DISJOINT} | 0.0501 | 0.5926 | 1.7909
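
The Supp., Conf. and Lift columns in the rule tables can be illustrated with a small sketch. It assumes the standard association rule definitions (support as the share of block groups matching both antecedent and consequent, confidence as that count divided by the antecedent count, lift as confidence divided by the consequent share); the function names and the toy block groups are hypothetical and are not taken from the study.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]


def matches(record: Record, items: Record) -> bool:
    """True when the record carries every attribute=value pair in items."""
    return all(record.get(name) == value for name, value in items.items())


def rule_metrics(records: List[Record], antecedent: Record,
                 consequent: Record) -> Tuple[float, float, float]:
    """Return (support, confidence, lift) for antecedent => consequent."""
    n = len(records)
    n_ante = sum(matches(r, antecedent) for r in records)
    n_cons = sum(matches(r, consequent) for r in records)
    n_both = sum(matches(r, antecedent) and matches(r, consequent)
                 for r in records)
    support = n_both / n
    confidence = n_both / n_ante if n_ante else 0.0
    lift = confidence / (n_cons / n) if n_cons else 0.0
    return support, confidence, lift


# Toy block groups, invented for illustration only (not study data).
records = (
    [{"busstop_num_sqmile": "big", "ped_crash_num": "big"}] * 3
    + [{"busstop_num_sqmile": "big", "ped_crash_num": "small"}] * 1
    + [{"busstop_num_sqmile": "small", "ped_crash_num": "big"}] * 2
    + [{"busstop_num_sqmile": "small", "ped_crash_num": "small"}] * 4
)
support, confidence, lift = rule_metrics(
    records,
    antecedent={"busstop_num_sqmile": "big"},
    consequent={"ped_crash_num": "big"},
)
print(support, confidence, lift)
```

Under the same assumed definitions, dividing a rule's confidence by its lift gives the implied share of block groups that satisfy the consequent; for rule 1 of Table 4-43 this is 0.6316 / 1.9087, roughly 0.33.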

PAGE 136

Table 4-44. Influence of land use mix on crashes without considering crash time

Crash type | big: No. | big: Max | big: Mean | intermediate: No. | intermediate: Max | intermediate: Mean | small: No.
total_crash_num=big | 206 | 1.932 | 1.647 | 893 | 1.926 | 1.635 | 0
pdo_crash_num=big | 315 | 2.071 | 1.668 | 804 | 1.933 | 1.638 | 0
injury_crash_num=big | 31 | 1.813 | 1.650 | 248 | 2.091 | 1.678 | 0
fatal_crash_num=big | NA
ped_crash_num=big | 342 | 2.208 | 1.835 | 1,577 | 2.587 | 1.869 | 0
bike_crash_num=big | 3 | 1.859 | 1.844 | 0 | | | 0

PAGE 137

Table 4-45. Influence of land use mix on crashes considering crash time

Crash type | big: No. | big: Max | big: Mean | intermediate: No. | intermediate: Max | intermediate: Mean | small: No.
tgp1_total_crash_num=big | 137 | 1.955 | 1.665 | 456 | 1.885 | 1.651 | 0
tgp2_total_crash_num=big | 261 | 2.161 | 1.704 | 470 | 1.935 | 1.664 | 0
tgp3_total_crash_num=big | 321 | 2.063 | 1.680 | 877 | 1.996 | 1.626 | 0
tgp4_total_crash_num=big | 47 | 1.776 | 1.624 | 541 | 1.857 | 1.621 | 0
tgp1_pdo_crash_num=big | 140 | 2.045 | 1.674 | 591 | 1.944 | 1.676 | 0
tgp2_pdo_crash_num=big | 333 | 2.189 | 1.713 | 751 | 2.018 | 1.648 | 0
tgp3_pdo_crash_num=big | 324 | 2.047 | 1.643 | 600 | 1.955 | 1.648 | 0
tgp4_pdo_crash_num=big | 70 | 1.760 | 1.594 | 762 | 1.889 | 1.629 | 0
tgp1_injury_crash_num=big | 7 | 1.613 | 1.583 | 112 | 1.899 | 1.622 | 0
tgp2_injury_crash_num=big | 17 | 1.783 | 1.652 | 169 | 2.117 | 1.712 | 0
tgp3_injury_crash_num=big | 49 | 1.878 | 1.698 | 184 | 1.951 | 1.636 | 0
tgp4_injury_crash_num=big | 6 | 1.774 | 1.720 | 82 | 2.019 | 1.744 | 0
tgp1_ped_crash_num=big | 0
tgp2_ped_crash_num=big | 75 | 2.130 | 1.938 | 51 | 1.958 | 1.868 | 0
tgp3_ped_crash_num=big | 267 | 2.024 | 1.834 | 497 | 2.215 | 1.873 | 0
tgp4_ped_crash_num=big | 3 | 1.896 | 1.896 | 138 | 2.078 | 1.936 | 0
tgp1_bike_crash_num=big | NA
tgp2_bike_crash_num=big | 0
tgp3_bike_crash_num=big | 56 | 1.690 | 1.572 | 3 | 1.526 | 1.521 | 0
tgp4_bike_crash_num=big | NA
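
A summary of this kind could be tabulated along the following lines. The sketch assumes, as an interpretation not stated in the tables themselves, that No., Max and Mean report the number of mined rules whose antecedent contains a given land use mix level and the maximum and mean lift of those rules. The function name and the toy rules are hypothetical and are not study results.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

Rule = Tuple[Dict[str, str], float]  # (antecedent items, lift)


def summarize_by_level(rules: List[Rule],
                       variable: str = "mix_index") -> Dict[str, Tuple[int, float, float]]:
    """Group rules by the level of `variable` in the antecedent.

    Returns {level: (number of rules, max lift, mean lift)}; rules whose
    antecedent does not mention `variable` are ignored.
    """
    lifts = defaultdict(list)
    for antecedent, lift in rules:
        level = antecedent.get(variable)
        if level is not None:
            lifts[level].append(lift)
    return {level: (len(vals), max(vals), mean(vals))
            for level, vals in lifts.items()}


# Toy rules, invented for illustration only.
rules = [
    ({"mix_index": "big", "topo_retailoffice_industrial": "TOUCHES"}, 2.0),
    ({"mix_index": "big"}, 1.5),
    ({"mix_index": "intermediate"}, 1.75),
    ({"busstop_num_sqmile": "big"}, 2.5),  # no mix_index item, so skipped
]
summary = summarize_by_level(rules)
print(summary)
```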

PAGE 138

Figure 4-1. Scatter plot of all rules for all crash types without considering crash occurrence time (panels A through E)

PAGE 139

Figure 4-2. Matching block groups for the top 5 rules of total crashes

PAGE 140

Figure 4-3. Matching block groups for the top 5 rules of PDO crashes

PAGE 141

Figure 4-4. Matching block groups for the top 5 rules of injury crashes

PAGE 142

Figure 4-5. Matching block groups for the top 5 rules of pedestrian crashes

PAGE 143

Figure 4-6. Matching block groups for the top 5 rules of bike crashes

PAGE 144

Figure 4-7. Scatter plot for total crash in different time groups (panels A through D)

PAGE 145

Figure 4-8. Scatter plot for PDO crash in different time groups (panels A through D)

PAGE 146

Figure 4-9. Scatter plot for injury crash in different time groups (panels A through D)

PAGE 147

Figure 4-10. Scatter plot for pedestrian crash in different time groups (panels A through C)

Figure 4-11. Scatter plot for bike crash in different time groups (panel A)

PAGE 148

Figure 4-12. Matching block groups for the top 5 rules of total crash in time group 1

PAGE 149

Figure 4-13. Matching block groups for the top 5 rules of total crash in time group 2

PAGE 150

Figure 4-14. Matching block groups for the top 5 rules of total crash in time group 3

PAGE 151

Figure 4-15. Matching block groups for the top 5 rules of total crash in time group 4

PAGE 152

Figure 4-16. Matching block groups for the top 5 rules of PDO crash in time group 1

PAGE 153

Figure 4-17. Matching block groups for the top 5 rules of PDO crash in time group 2

PAGE 154

Figure 4-18. Matching block groups for the top 5 rules of PDO crash in time group 3

PAGE 155

Figure 4-19. Matching block groups for the top 5 rules of PDO crash in time group 4

PAGE 156

Figure 4-20. Matching block groups for the top 5 rules of injury crash in time group 1

PAGE 157

Figure 4-21. Matching block groups for the top 5 rules of injury crash in time group 2

PAGE 158

Figure 4-22. Matching block groups for the top 5 rules of injury crash in time group 3

PAGE 159

Figure 4-23. Matching block groups for the top 5 rules of injury crash in time group 4

PAGE 160

Figure 4-24. Matching block groups for the top 5 rules of pedestrian crash in time group 2

PAGE 161

Figure 4-25. Matching block groups for the top 5 rules of pedestrian crash in time group 3

PAGE 162

Figure 4-26. Matching block groups for the top 5 rules of pedestrian crash in time group 4

PAGE 163

Figure 4-27. Matching block groups for the top 5 rules of bike crash in time group 3

PAGE 164

CHAPTER 5
CONCLUSIONS, RECOMMENDATIONS AND FUTURE RESEARCH

This research studied the influence of the built environment on traffic crashes by developing a spatio-temporal association rule data mining approach. The key elements of this approach are:

1) Used the D variables framework to describe and measure the built environment. The framework ensures that the built environment is comprehensively considered in the crash analysis. The land use mix index used for the Diversity dimension has not been used in previous crash analysis research.

2) Integrated the spatial relationships between built environment elements. The spatial relationships considered include spatial distance and spatial topological relationship. In total, 44 spatial relationships were calculated or extracted for this research.

3) Conducted the analysis at the community level. The census block group was selected as the analysis unit, and all the data were aggregated at the block group level. The results are also presented at the block group level.

4) Applied the association rule data mining method to explore the associations between the built environment variables and different crash types. The resulting rules directly show how combinations of the built environment variables (D variables, spatial distance relationship variables and spatial topological relationship variables) influence the occurrence of crash types in communities.

5) Conducted the analysis considering both spatial and spatio-temporal aspects. The 5 years of crashes were divided into 4 groups according to time variation based on

PAGE 165

statistical analysis and the influence of built environment on crashes in different time groups was analyzed.

The analysis shows that the D variables, spatial distance relationship variables and spatial topological relationship variables have a mixed influence on a given crash type. Many rules were mined for each crash type except for three of them (fatal crashes, pedestrian crashes in time group 1 and bike crashes in time group 2). All these rules are a combination of the three sorts of built environment variables (D variables, spatial distance relationship variables and spatial topological relationship variables). The same variable may have a different influence when combined with different other variables for different crash types. The research also shows that mixed land use may bring a large number of crashes to a community when combined with some other built environment variables, and the higher the land use mix index, the stronger the influence.

Implications for Planning

The methodology and the results of this research are meaningful for planning practice. They can be used in safety-conscious planning, land use decisions, and long-range transportation plans, and to proactively apply safety treatments in high-risk block groups.

First, this research provides an alternative perspective for planners to think about the influence of environment. Traditionally, planners have held that the influence of a land use type on crashes is either positive or negative. The results of this research indicate that the influence of a land use varies: it differs when combined with different built environment variables, for different crash types, and at different times.

PAGE 166

Second, this research provides a way to identify the communities that have a relatively high risk of transportation safety issues. Any community that satisfies the antecedents of the top rules for a crash type is a community at high risk of that crash type. Mapping the matching block groups of high-lift rules can help planners visualize the information more easily.

Third, similar to the identification function, this research provides a method to evaluate a plan from the perspective of safety risk. If the condition of a community under suggests that most likely it will not have a big number of crashes, although the results cannot guarantee a complete lack of crashes or very few crashes.

Fourth, mixed land use is not an ideal planning scheme. In recent years, planners have been advocating smart growth principles, promoting mixed land use development to battle urban sprawl problems and build better communities. According to the American Planning Association (2012), mixed use developments include quality housing, varied by type and price, integrated with shopping, schools, community facilities, and jobs. This research shows that highly mixed land use could bring a large number of total and PDO crashes to a community, which may make the community unsafe. This calls for planners and policy makers to balance livability and safety.

Last, since the research used Miami-Dade County in Florida as a case study, the results could be directly used by planners, government officials and transportation engineers in Miami-Dade County. However, the spatio-temporal association rule data

PAGE 167

mining framework to evaluate the influence of built environment on crashes and the tools used in this study could be applied to any place.

Limitations and Future Research

This research looked at a spatio-temporal data mining framework to analyze the influence of built environment on crashes. It developed a comprehensive analysis for all crash types and included crash occurrence time. The results contain extensive information, and further work will be needed to dig deeper into some of the crash types to find out more about what lies behind the results.

The calculation of some of the D variables (density variables, bus stop number per square mile, intersection number per square mile, street length per square mile) is based on the block group area. This research used the full area of the block group. To make the calculation more accurate, the water area in a block group needs to be excluded in future research.

Moreover, due to the large number of columns in the data, a lot of computer memory was required to mine the association rules. Therefore, the support and confidence parameters for this research were set somewhat higher than the values used in other studies so that the mining algorithm could run. This could result in the loss of some rules.

The analysis of this research is based on the block group as the community level unit, and the results are presented at the block group level. Since most of the crashes occurred at intersections, it is meaningful to explore the influence of the built environment on intersection crashes. The analysis level could be a combination of the community level and intersection zones.

Another direction to further expand this research is to develop a system of operational tools to evaluate the influence of built environment on crashes by using the

PAGE 168

association rule data mining method. An integration of the tools developed for this research would make it complete.
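
The identification idea discussed in this chapter, that a block group satisfying the antecedent of a top rule is at high risk of the corresponding crash type, can be sketched as follows. This is a minimal illustration, not the tool developed for the research: the function name and the block group IDs and attributes are hypothetical, while the antecedent is rule 8 of Table 4-40.

```python
from typing import Dict, List


def matching_block_groups(block_groups: Dict[str, Dict[str, str]],
                          antecedent: Dict[str, str]) -> List[str]:
    """Return the sorted IDs of block groups satisfying every antecedent item."""
    return sorted(
        bgid
        for bgid, attrs in block_groups.items()
        if all(attrs.get(name) == value for name, value in antecedent.items())
    )


# Antecedent of rule 8 in Table 4-40 (pedestrian crash, time group 2).
antecedent = {
    "street_length_feet_sqmile": "big",
    "busstop_num_sqmile": "big",
    "topo_fullresidential_geostreet": "DISJOINT",
}

# Hypothetical discretized block groups, invented for illustration only.
block_groups = {
    "BG-A": {"street_length_feet_sqmile": "big",
             "busstop_num_sqmile": "big",
             "topo_fullresidential_geostreet": "DISJOINT"},
    "BG-B": {"street_length_feet_sqmile": "big",
             "busstop_num_sqmile": "small",
             "topo_fullresidential_geostreet": "DISJOINT"},
    "BG-C": {"street_length_feet_sqmile": "big",
             "busstop_num_sqmile": "big",
             "topo_fullresidential_geostreet": "TOUCHES"},
}

flagged = matching_block_groups(block_groups, antecedent)
print(flagged)
```

The same check could be run against the discretized attributes of a proposed plan to see whether any block group would newly satisfy a high-lift rule.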

PAGE 169

APPENDIX A
SPATIAL DISTANCE RELATIONSHIP CALCULATION AND TOPOLOGICAL RELATIONSHIP EXTRACTION SOURCE CODE SAMPLE

/**
 * Gets the topological relationship between two relevant features which
 * intersect the target feature with unique ID.
 *
 * @param targetFeatureName
 * @param targetFeatureID unique ID for target feature
 * @param relativeFeatureOne
 * @param relativeFeatureTwo
 */
private void topologicalBGExtraction(String targetFeatureName, String targetFeatureID,
        String relativeFeatureOne, String relativeFeatureTwo) {
    try {
        Statement smnt = conn.createStatement();

        // Create a temporary table
        String temporaryTopoTableName = targetFeatureName + "_topo_"
                + relativeFeatureOne + "_" + relativeFeatureTwo;
        System.out.println("Creating temporary table for the topological relationship in the block group...");
        try {
            String createstr = "create table " + "wdisj_" + temporaryTopoTableName
                    + "(BGID" + " varchar(50), topo_" + relativeFeatureOne + "_"
                    + relativeFeatureTwo + " varchar(20) )";
            smnt.execute(createstr);
        } catch (SQLException vErro) {
            // The table already exists: drop it and create it again
            String dropstr = "drop table " + "wdisj_" + temporaryTopoTableName;
            String recreatestr = "create table " + "wdisj_" + temporaryTopoTableName
                    + "(BGID" + " varchar(50), topo_" + relativeFeatureOne + "_"
                    + relativeFeatureTwo + " varchar(20) )";
            smnt.execute(dropstr);
            smnt.execute(recreatestr);
        }

        // Collect the IDs of all target features (block groups)
        String strbg = "Select " + targetFeatureID + " from " + targetFeatureName;
        ResultSet bgSet = smnt.executeQuery(strbg);
        ArrayList<String> bglist = new ArrayList<String>();
        while (bgSet.next()) {
            bglist.add(bgSet.getString(1));
        }

PAGE 170

        for (int i = 0; i < bglist.size(); i++) {
            // block group id
            String strbgid = bglist.get(i);

            // Get the SQL command
            // consider the disjoint
            String sAux = pu.getSql("insert_topology_type_bg_nocontain_yesdisjoint");
            // Fill in the SQL template; the placeholder tokens (second argument)
            // appear as empty strings in this listing
            sAux = chSqlMeta(sAux, "", "wdisj_" + temporaryTopoTableName);
            sAux = chSqlMeta(sAux, "", relativeFeatureOne);
            sAux = chSqlMeta(sAux, "", relativeFeatureTwo);
            sAux = chSqlMeta(sAux, "", targetFeatureName);
            sAux = chSqlMeta(sAux, "", targetFeatureID);
            sAux = chSqlMeta(sAux, "", strbgid);
            smnt.execute(sAux);
        }
    } catch (SQLException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
}

PAGE 171

APPENDIX B
DISCRETIZATION AND ASSOCIATION RULE MINING SOURCE CODE SAMPLE

# discretize the crash, D variables and spatial distance variables
# the result is a discretized result file for different crash types
crashbediscretize <- function(timeperiod, target, crashdis, dvardis, disdis) {
  bgresult <- NULL
  address <- NULL
  processaddress <- NULL
  resultaddress <- NULL

  # set the address for export result
  address = paste(target, "crash", sep = "_")
  address = paste(address, crashdis, sep = "_")
  address = paste(address, dvardis, sep = "_")
  address = paste(address, disdis, sep = "_")
  if (timeperiod == "sp") {
    address = paste("sp", address, sep = "_")
  } else {
    address = paste(timeperiod, address, sep = "_")
    address = paste("ti", address, sep = "_")
  }
  address = paste("D:\\Sync\\Dropbox\\Research\\Dissertation Research\\experiment\\result\\discretize\\rf_", address, sep = "")
  cat(address)
  resultaddress = paste(address, "_result.txt", sep = "")
  processaddress = paste(address, "_process.txt", sep = "")
  sink(processaddress)
  cat(processaddress)
  cat("\n")
  cat(resultaddress)
  cat("\n")

  if (timeperiod == "sp") {
    bgresult <- read.csv("D:\\Sync\\Dropbox\\Research\\Dissertation Research\\experiment\\result\\rf_bg_total_crash_wdisj.csv", header = TRUE)
    cat(ncol(bgresult))
  } else {
    bgresult <- read.csv("D:\\Sync\\Dropbox\\Research\\Dissertation Research\\experiment\\result\\rf_ti_bg_all_1587_wdisj.csv", header = TRUE)
    cat(ncol(bgresult))
  }

  # keep the block groups with at least one crash of the target type
  crashcolum <- ""
  if (timeperiod == "sp") {
    crashcolum = paste(target, "crash_num", sep = "_")
    for (i in 1:ncol(bgresult)) {
      if (crashcolum == colnames(bgresult)[i]) {

        # keep only the block groups with at least one crash of this type
        bgresult <- bgresult[bgresult[, i] > 0, ]
        bgresult <- bgresult[, c(1, i, 8:58)]
        break
      }
    }
  } else {
    crashcolum = paste(timeperiod, target, sep = "_")
    crashcolum = paste(crashcolum, "crash_num", sep = "_")
    for (i in 1:ncol(bgresult)) {
      if (crashcolum == colnames(bgresult)[i]) {
        # keep only the block groups with at least one crash of this type
        bgresult <- bgresult[bgresult[, i] > 0, ]
        bgresult <- bgresult[, c(1, i, 22:72)]
        break
      }
    }
  }
  cat("total rows: ")
  cat(nrow(bgresult))
  cat("\n")
  tempresult <- NULL
  tempresult <- as.matrix(as.factor(bgresult[1]))
  nclass <- 3
  # discretize the crash number
  cat("\n crash summary: \n")
  cat("Min. 1st Qu. Median Mean 3rd Qu. Max. \n")
  cat(summary(bgresult[2]))
  cat('\n')
  if (crashdis == "fixed") {
    q <- quantile(bgresult[2], probs = c(0, 0.25, 0.75, 1))
    tempresult <- cbind(tempresult,
                        lapply(discretize(bgresult[2], crashdis,
                                          categories = c(-Inf, q[2] + 0.01, q[3] + 0.01, Inf),
                                          labels = c("small", "intermediate", "big")),
                               as.character))
    cat('\ncol:')
    cat(2)
    cat(' ')
    cat(colnames(bgresult)[2])
    cat('\n')
    disis <- discretize(bgresult[2], crashdis,
                        categories = c(-Inf, q[2] + 0.01, q[3] + 0.01, Inf))
    cat(levels(disis))
    cat('\t')
    cat('\n')
    cat(table(disis))
    cat('\n')
  } else {
    tempresult <- cbind(tempresult,
                        lapply(discretize(bgresult[2], crashdis,
                                          categories = nclass,
                                          labels = c("small", "intermediate", "big")),
                               as.character))

    cat('\n\ncol:')
    cat(2)
    cat(' ')
    cat(colnames(bgresult)[2])
    cat('\n')
    disis <- discretize(bgresult[2], crashdis, categories = nclass)
    cat(levels(disis))
    cat('\t')
    cat('\n')
    cat(table(disis))
    cat('\n')
  }
  # discretize the d variables
  if (dvardis == "fixed") {
    pop <- c(-Inf, 4836.01, 12696.01, Inf)
    housing <- c(-Inf, 1738.01, 5411.01, Inf)
    job <- c(-Inf, 250.31, 1509.01, Inf)
    mix <- c(-Inf, 0.31171, 0.66841, Inf)
    strnum <- c(-Inf, 34.01, 93.01, Inf)
    strlen <- c(-Inf, 11996.01, 32927.01, Inf)
    inter <- c(-Inf, 23.01, 57.01, Inf)
    busstop <- c(-Inf, 2.01, 7.01, Inf)
    dvar <- data.frame(pop, housing, job, mix, strnum, strlen, inter, busstop)
    for (i in 3:10) {
      tempresult <- cbind(tempresult,
                          lapply(discretize(bgresult[, i], dvardis,
                                            categories = dvar[, i - 2],
                                            labels = c("small", "intermediate", "big")),
                                 as.character))
      cat('\ncol:')
      cat(i)
      cat(' ')
      cat(colnames(bgresult)[i])
      cat('\n')
      disis <- discretize(bgresult[, i], dvardis, categories = dvar[, i - 2])
      cat(levels(disis))
      cat('\t')
      cat('\n')
      cat(table(disis))
      cat('\n')
    }
  } else {
    for (i in 3:10) {
      tempresult <- cbind(tempresult,
                          lapply(discretize(bgresult[, i], dvardis,
                                            categories = nclass,
                                            labels = c("small", "intermediate", "big")),
                                 as.character))
      cat('\ncol:')
      cat(i)
      cat(' ')
      cat(colnames(bgresult)[i])
      cat('\n')

      disis <- discretize(bgresult[, i], dvardis, categories = nclass)
      cat(levels(disis))
      cat('\t')
      cat('\n')
      cat(table(disis))
      cat('\n')
    }
  }
  # discretize the spatial distance variables
  if (disdis == "fixed") {
    for (i in 11:32) {
      tempresult <- cbind(tempresult,
                          lapply(discretize(bgresult[, i], disdis,
                                            categories = c(-Inf, 648.21, 1542.01, Inf),
                                            labels = c("close", "intermediate", "far")),
                                 as.character))
      cat('col:')
      cat(i)
      cat(' ')
      cat(colnames(bgresult)[i])
      cat('\n')
      disis <- discretize(bgresult[, i], disdis,
                          categories = c(-Inf, 648.21, 1542.01, Inf))
      cat(levels(disis))
      cat('\t')
      cat('\n')
      cat(table(disis))
      cat('\n\n')
    }
  } else {
    for (i in 11:32) {
      tempresult <- cbind(tempresult,
                          lapply(discretize(bgresult[, i], disdis,
                                            categories = nclass,
                                            labels = c("close", "intermediate", "far")),
                                 as.character))
      cat('col:')
      cat(i)
      cat(' ')
      cat(colnames(bgresult)[i])
      cat('\n')
      disis <- discretize(bgresult[, i], disdis, categories = nclass)
      cat(levels(disis))
      cat('\t')
      cat('\n')
      cat(table(disis))
      cat('\n\n')
    }
  }
  tempresult <- cbind(tempresult, bgresult[33:53])
  tempresult <- as.matrix(tempresult)
  cat("number of result column", ncol(tempresult))
  cat("number of result row", nrow(tempresult))

  colnames(tempresult) <- colnames(bgresult)
  # export the discretization result
  bgtotalcrash <- tempresult[, 1:53]
  bgtotalcrash <- as.matrix(bgtotalcrash)
  write.table(bgtotalcrash, file = resultaddress, sep = "\t", row.names = FALSE)
  sink()
  return(tempresult)
}

# rule mining
rulemining <- function(timeperiod, target, crashdis, dvardis, disdis) {
  basicaddress <- NULL
  processaddress <- NULL
  resultaddress <- NULL
  crashplusbefull <- NULL
  totalrules <- NULL
  # get the base address for the output files
  basicaddress <- getbasicaddress(timeperiod, target, crashdis, dvardis, disdis)
  processaddress <- paste(basicaddress, "_mining_process.txt", sep = "")
  resultaddress <- paste(basicaddress, "_mining_result.csv", sep = "")
  # get the crash+be from the discretized data file
  crashplusbefull <- getcrashplusbe(timeperiod, target, crashdis, dvardis, disdis)
  crashplusbe <- NULL
  crashplusbe <- deletefields(crashplusbefull)
  sink(processaddress)
  ## get the crash number column
  crashcolumn <- colnames(crashplusbe)[1]
  crashcolumn <- paste(crashcolumn, "big", sep = "=")
  crashplusbetran = as(crashplusbe, "transactions")
  totalrules <- NULL
  totalrules <- apriori(crashplusbetran,
                        parameter = list(support = 0.05, confidence = 0.5,
                                         minlen = 2, maxlen = 5),
                        appearance = list(rhs = c(crashcolumn), default = "lhs"))
  # remove redundant rules if the length of the rules is bigger than 0
  if (!is.null(totalrules) && length(totalrules) > 0) {
    totalrules.sorted <- sort(totalrules, by = "support")
    subset.matrix <- is.subset(totalrules.sorted, totalrules.sorted)
    subset.matrix[lower.tri(subset.matrix, diag = T)] <- NA
    redundant <- colSums(subset.matrix, na.rm = T) >= 1
    which(redundant)
    # remove redundant rules
    rules.pruned <- totalrules.sorted[!redundant]
    cat("\n the total rules after removing redundant rules is: ")
    cat(length(rules.pruned))
    print(summary(rules.pruned))
    cat("top 10 rules with highest lift value \n")
    inspect(head(sort(rules.pruned, by = "lift"), n = 10))

    # write all rules
    inspect(rules.pruned)
    # write the top 20 to a CSV file
    totalrules.top20 <- head(sort(rules.pruned, by = "lift"), n = 20)
    write(totalrules.top20, file = resultaddress, quote = TRUE, sep = ",", col.names = TRUE)
  }
  sink()
  return(totalrules)
}
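The apriori call in the listing above keeps only rules whose support is at least 0.05 and whose confidence is at least 0.5, and the surviving rules are ranked by lift. As a minimal sketch of what these three measures compute, the following Java program (Java being the language of the Appendix A listing) discretizes a toy population column with fixed cut points and evaluates one rule with the consequent "crash=big". The class name RuleMetricsSketch, the method names, the five toy block groups and the left-closed interval convention are assumptions made for illustration only; the population cut points 4836.01 and 12696.01 and the two thresholds are the values that appear in the listing. This is not the arules implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class RuleMetricsSketch {

    // Three-class discretization with two fixed cut points.
    // Assumption: intervals are closed on the left, [lowerCut, upperCut).
    public static String discretize(double value, double lowerCut, double upperCut) {
        if (value < lowerCut) return "small";
        if (value < upperCut) return "intermediate";
        return "big";
    }

    // Hypothetical toy data: five block groups, one transaction each.
    public static List<List<String>> toyTransactions() {
        double[] population = {20000, 13000, 3000, 5000, 3500};
        String[] crashClass = {"big", "big", "small", "big", "intermediate"};
        List<List<String>> transactions = new ArrayList<>();
        for (int i = 0; i < population.length; i++) {
            // Cut points taken from the "pop" vector in the R listing.
            String pop = discretize(population[i], 4836.01, 12696.01);
            transactions.add(Arrays.asList("pop=" + pop, "crash=" + crashClass[i]));
        }
        return transactions;
    }

    // Support: share of transactions that contain every item in the item set.
    public static double support(List<List<String>> transactions, List<String> items) {
        long hits = transactions.stream().filter(t -> t.containsAll(items)).count();
        return (double) hits / transactions.size();
    }

    // Confidence of lhs => rhs: support(lhs and rhs) / support(lhs).
    public static double confidence(List<List<String>> transactions,
                                    List<String> lhs, String rhs) {
        List<String> both = new ArrayList<>(lhs);
        both.add(rhs);
        return support(transactions, both) / support(transactions, lhs);
    }

    // Lift of lhs => rhs: confidence(lhs => rhs) / support(rhs).
    public static double lift(List<List<String>> transactions,
                              List<String> lhs, String rhs) {
        return confidence(transactions, lhs, rhs)
                / support(transactions, Collections.singletonList(rhs));
    }

    public static void main(String[] args) {
        List<List<String>> tx = toyTransactions();
        List<String> lhs = Collections.singletonList("pop=big");
        String rhs = "crash=big";
        double s = support(tx, Arrays.asList("pop=big", rhs));
        double c = confidence(tx, lhs, rhs);
        // Same thresholds as the apriori call: support 0.05, confidence 0.5.
        boolean kept = s >= 0.05 && c >= 0.5;
        System.out.printf("support=%.3f confidence=%.3f lift=%.3f kept=%b%n",
                s, c, lift(tx, lhs, rhs), kept);
    }
}
```

A lift above 1 indicates that the antecedent occurs together with "crash=big" more often than would be expected if the two were independent, which is why the listing sorts the pruned rules by lift before exporting them.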


BIOGRAPHICAL SKETCH

Yiqiang Ouyang received his degree in geomatics from China University of Geosciences in 2007. He then furthered his studies in Geographic Information Systems (GIS) at the State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing in Wuhan. Mr. Ouyang started his Ph.D. studies at the Department of Urban and Regional Planning in the College of Design, Construction and Planning at the University of Florida in 2009. With a background in engineering, he applies information technology (including GIS and computer science) to urban planning. He used a combination of GIS and data mining techniques in transportation safety research for his dissertation.