A Data-driven Modeling of Large-scale Mobile Networks

Permanent Link: http://ufdc.ufl.edu/UFE0044975/00001

Material Information

Title: A Data-driven Modeling of Large-scale Mobile Networks Community and Vehicular Mobility
Physical Description: 1 online resource (267 p.)
Language: english
Creator: Thakur, Gautam
Publisher: University of Florida
Place of Publication: Gainesville, Fla.
Publication Date: 2012

Subjects

Subjects / Keywords: bigdata -- datascience -- modeling -- networks -- protocol -- vehicular
Computer and Information Science and Engineering -- Dissertations, Academic -- UF
Genre: Computer Engineering thesis, Ph.D.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, territorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation

Notes

Abstract: This work proposes a framework for data-driven mobility modeling in pedestrian and vehicular networks. Mobility is one of the main factors affecting the design and performance of wireless networks. Mobility modeling has been an active field for the past decade, mostly focusing on matching specific mobility characteristics, with little focus on matching protocol performance. Recently, in pedestrian mobility, new generations of wireless and mobile services have emerged based on communal aspects of mobile societies, which are conventionally not considered in mobility modeling. We first focus on pedestrian mobility. We start our study by investigating the adequacy of existing pedestrian mobility models in capturing various aspects of human mobility behavior (including communal behavior), as well as protocol performance. This is achieved systematically through the introduction of a multi-dimensional mobility metric space (MIDAS) to measure individual, pair-wise (encounter), and group (community) metrics. In addition, a methodical analysis of a range of mobile encounter-based networking protocols is conducted to compare their performance under various mobility models and extensive traces. We use real wireless traces from MIT, USC, UF, Dartmouth, and Infocom, together with the random mobility, TVC (community-based), and SMOOTH (power-law encounter generating) mobility models. Our results indicate significant gaps in several metric dimensions between real traces and existing mobility models. We then introduce COBRA, a new mobility model capable of spanning the mobility metric space to match realistic traces. We further evaluate DTN protocols, including epidemic, spray-wait, Prophet, and Bubble Rap, using various traces and models. Our findings clearly show that COBRA can match the realistic protocol performance, reducing the gap from 80% to 12% and showing the efficacy of our approach based upon metric space matching.
We hope our new mobility model will provide a more realistic and accurate alternative for mobility modeling, to aid in the analysis, simulation, and design of future communication protocols. Realistic design and evaluation of vehicular mobility has been particularly challenging due to a lack of large-scale real-world measurements in the research community. Current mobility models and simulators rely on artificial scenarios and use small and biased samples. To overcome these challenges, we introduce a novel framework for large-scale monitoring, analysis, and modeling of vehicular traffic using freely available online webcams. So far, we have collected 140 million vehicular mobility records from 4,909 cameras located in ten different cities. First, our study shows that driving routes and visiting locations of cities follow a power-law distribution, indicating a planned or recently designed road infrastructure. Second, we represent cities by network graphs in which nodes are camera locations and edges are urban streets that connect the nodes. This representation exhibits small-world properties, with short path lengths and a large clustering coefficient. Third, traffic densities show 80% temporal correlation during several hours of a day. Using a goodness-of-fit test, we find that the vehicular density distribution follows heavy-tail distributions such as Log-gamma, Log-logistic, and Weibull in over 90% of these locations. Moreover, a heavy tail gives rise to long-range dependence and self-similarity, which we studied by estimating the Hurst exponent. Our analysis based on seven different Hurst estimators indicates that the traffic patterns are stochastically self-similar. This is surprising given the assumption of an exponential distribution for traffic modeling. To the best of our knowledge, this is also the largest dataset ever used.
We believe this framework and the dataset provide a much-needed contribution to the research community for realistic and data-driven design and evaluation of vehicular networks.
General Note: In the series University of Florida Digital Collections.
General Note: Includes vita.
Bibliography: Includes bibliographical references.
Source of Description: Description based on online resource; title from PDF title page.
Source of Description: This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Statement of Responsibility: by Gautam Thakur.
Thesis: Thesis (Ph.D.)--University of Florida, 2012.
Local: Adviser: Helmy, Ahmed H.

Record Information

Source Institution: UFRGP
Rights Management: Applicable rights reserved.
Classification: lcc - LD1780 2012
System ID: UFE0044975:00001

Full Text

PAGE 1

A DATA-DRIVEN MODELING OF LARGE-SCALE MOBILE NETWORKS: COMMUNITY AND VEHICULAR MOBILITY
By
GAUTAM S. THAKUR
A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
UNIVERSITY OF FLORIDA
2012

PAGE 2

© 2012 Gautam S. Thakur

PAGE 3

ACKNOWLEDGMENTS

I would like to thank my adviser, Prof. Ahmed Helmy, for his incessant guidance and intellectual contribution through the course of my Ph.D. degree. He is a wonderful mentor who has given me flexibility in pursuing research directions of my choice and gave his utmost contribution toward first-class research. I really enjoyed and learned a lot from his methodical and scientific approaches to problem solving and in-depth analysis.

I would also like to express deep gratitude to my supervisory committee members, including Dr. Pan Hui, Prof. My Tra Thai, Prof. Sartaj Sahni, Prof. Alin Dobra, and Prof. Michael Fang, who helped me significantly in the process of forming my dissertation. Their insight in offering new perspectives and opinions on the problem helped me to write a more thorough and complete dissertation. Foremost, my special thanks go to Dr. Pan Hui, who gave me an opportunity to work under his able guidance at Deutsche Telekom Laboratories, Berlin. It opened a whole new perspective for me to work on cutting-edge problems related to large-scale data analysis and planet-scale vehicular mobility. I was particularly impressed by his novel approaches and fresh ideas, which made the whole process intellectually challenging and rewarding. I also extend my special thanks to Prof. My Tra Thai, who showed me the beauty of adopting a network science approach in solving tough graph data problems and its usability in both the pedestrian and vehicular research domains. I owe a lot to Prof. Sartaj Sahni for helping me devise new algorithms to efficiently process tera-scale data. His active involvement helped me focus more on the analysis, design, and development of actionable results.

I thank Dr. Wei-Jen Hsu for his guidance and insight into forming the research problem and for providing substantial support for tools and initial development of the solution. He was a wonderful colleague whose inputs brought new perspectives and understanding to the problem through many helpful discussions. I will always remember him for giving me the best research work experience. I am also thankful to Udayan Kumar who

PAGE 4

helped secure the necessary computing resources. His efforts made sure that my work was done within a short amount of time. I also enjoyed working with him on problems related to mobility modeling performance; specifically, I appreciate his contribution towards identifying key metrics for developing a systematic experimental methodology for protocol analysis for mobility models. I am thankful to my other colleagues Dr. Sungwook Moon, Jeeyoung Kim, Yibin Wang, Saeed Moghaddam, and Guliz Seray Tuncay for many helpful discussions, suggestions, manuscript reviews, and general critique of my research work. I am also thankful to Hamid Ketabdar and Mohsin Ali for extending their hand in helping me innovate scalable algorithms for processing millions of images in a very short time, and to Damien Fay for bringing a new dimension to the traffic work through causality and time series analysis. I am also grateful to Ms. Aditi Malviya for her literary prowess in rigorously reviewing my manuscripts and for extending her hand at every step of my doctoral studies. I would have found it impossible to attain this level of achievement and depth in research without her incessant support.

I would like to express my deepest gratitude towards my family, who were very supportive and kept their strong belief and encouragement throughout this journey. I am indebted to them for their patience and for giving me advice that made my journey easier and more enjoyable. Last but not least, I am thankful to Mr. Rick Swenson, Ms. Dana Moser, and Mr. Suresh Lokhande, who helped me develop a strong character and taught me a disciplined lifestyle, without which it would have been impossible to carry out this arduous journey towards completing the doctoral dissertation. I am also thankful to hundreds of other researchers who directly or indirectly provided inputs that made this work possible.

PAGE 5

TABLE OF CONTENTS
page
ACKNOWLEDGMENTS ..... 3
LIST OF TABLES ..... 11
LIST OF FIGURES ..... 12
ABSTRACT ..... 16
CHAPTER
1 OVERARCHING INTRODUCTION ..... 18
2 RELATED WORK ..... 21
3 DATASET FOR HUMAN MOBILITY MODELING ..... 24
4 GAUGING BEHAVIORAL AND PROTOCOL PERFORMANCE GAPS ..... 26
  4.1 Introduction ..... 26
  4.2 Mobility Models Studied ..... 29
    4.2.1 Random Direction Model ..... 30
    4.2.2 Time Variant Community Model ..... 30
  4.3 Human Mobility Characteristics ..... 31
    4.3.1 Analysis of Spatio-Temporal Preferences ..... 31
    4.3.2 Analysis of Encounter Statistics ..... 32
    4.3.3 Similarity and Structural Dynamics ..... 34
      4.3.3.1 Similarity ..... 34
      4.3.3.2 Capturing Spatio-Temporal Preferences ..... 35
      4.3.3.3 Characterizing association patterns ..... 35
      4.3.3.4 Calculating similarity ..... 36
      4.3.3.5 Similarity Analysis ..... 38
      4.3.3.6 Modularity ..... 38
      4.3.3.7 Detection of mobile societies ..... 39
      4.3.3.8 Modularity analysis for mobile societies ..... 39
      4.3.3.9 Network analysis for mobile societies ..... 41
      4.3.3.10 Similarity in model - the missing link ..... 41
      4.3.3.11 Construction of TVC model for campuses ..... 42
      4.3.3.12 Similarity evaluation ..... 42
  4.4 Routing Protocol Analysis ..... 45
  4.5 Discussion ..... 45
  4.6 Conclusion ..... 46
5 COBRA: A FRAMEWORK FOR THE ANALYSIS OF REALISTIC MODELS ..... 48
  5.1 Introduction ..... 48

PAGE 6

  5.2 The Mobility Framework ..... 51
    5.2.1 Real measurements: ..... 52
    5.2.2 Building blocks for mobility models: ..... 53
      5.2.2.1 Individual mobility patterns ..... 54
      5.2.2.2 Pairwise mobility patterns ..... 54
      5.2.2.3 Collective mobility behavior patterns ..... 54
    5.2.3 Protocol-independent mobility analysis: ..... 55
    5.2.4 Protocol dependent mobility analysis ..... 55
  5.3 COBRA ..... 56
    5.3.1 Initial idea ..... 57
    5.3.2 Design and Description of COBRA ..... 57
    5.3.3 Parameters and Model Construction of COBRA ..... 61
      5.3.3.1 Determine the structure for location visitation ..... 61
      5.3.3.2 Determine the structure for activity time, epoch, and period ..... 61
      5.3.3.3 Movement speed ..... 62
  5.4 Protocol Independent Analysis ..... 63
    5.4.1 Mobility Models Studied ..... 63
    5.4.2 Analysis of Spatio-Temporal Preferences ..... 64
    5.4.3 Similarity in Mobile Societies ..... 64
    5.4.4 Analysis of Human Encounter Statistics ..... 66
    5.4.5 Community Detection through Modularity Optimization ..... 67
  5.5 Protocol Dependent Analysis ..... 68
    5.5.1 Protocol Evaluation Metrics: ..... 69
    5.5.2 Encounter based Forwarding Protocol Analysis ..... 69
    5.5.3 Community based Forwarding Protocol Analysis ..... 70
  5.6 Conclusion ..... 71
6 MEASUREMENT AND VEHICULAR DENSITY ESTIMATION ..... 72
  6.1 Topology Data of Camera Locations ..... 72
  6.2 Vehicular imagery data collection ..... 72
    6.2.1 Background Subtraction ..... 75
    6.2.2 Algorithm for Traffic Density Estimation ..... 76
    6.2.3 Outlier Detection and Removal ..... 78
    6.2.4 Ground Truth for Validation ..... 80
    6.2.5 Geo-Coordinates Pairing ..... 80
7 SPATIAL AND TEMPORAL ANALYSIS OF PLANET SCALE VEHICULAR DATA ..... 82
  7.1 Introduction ..... 82
  7.2 Related Work ..... 84
  7.3 Spatial Analysis of Cities ..... 85
  7.4 Temporal Analysis of Traffic Densities ..... 89
    7.4.1 Traffic Distribution ..... 90
    7.4.2 Correlation in Traffic Distribution ..... 92
    7.4.3 Traffic Congestion Correlation ..... 93

PAGE 7

  7.5 Spatio-Temporal Analysis ..... 96
  7.6 Conclusion ..... 99
8 MODELING AND CHARACTERIZATION OF VEHICULAR MOBILITY ..... 101
  8.1 Introduction ..... 101
  8.2 Related Work ..... 102
  8.3 Traffic Modeling ..... 106
  8.4 Self-Similarity and Long Range Dependence ..... 112
    8.4.1 Self-Similar Process and Long Range Dependence ..... 113
    8.4.2 Relation between Self-Similarity and Long Range Dependence ..... 114
    8.4.3 Impact of Heavy Tail on Long Range Dependence ..... 114
    8.4.4 Estimation of Hurst Exponent ..... 115
  8.5 Analysis of Self-Similarity ..... 115
  8.6 Conclusion and Future Work ..... 120
9 THE STRUCTURE AND TRAFFIC FLOW ANATOMY OF VEHICLES ..... 121
  9.1 Introduction ..... 121
  9.2 Related Work ..... 124
  9.3 Analysis of Topological Properties ..... 125
    9.3.1 Network of Urban Streets ..... 129
    9.3.2 Analysis of Degree Distribution: Unweighted Case ..... 131
    9.3.3 Analysis of Degree Distribution: Weighted Case ..... 132
    9.3.4 Small World Analysis ..... 133
  9.4 Traffic Modeling and Characterization ..... 134
    9.4.1 Traffic Flow Auto-Correlation ..... 135
    9.4.2 Traffic Modeling and Characterization ..... 136
  9.5 Future Application to Vehicular Networks ..... 137
  9.6 Conclusion ..... 137
10 INFRASTRUCTURE TO ENABLE CITY-WIDE UBIQUITOUS COMPUTING ..... 139
  10.1 Introduction ..... 139
  10.2 Related Work ..... 141
  10.3 Spatio-Temporal Analysis ..... 143
  10.4 Network Analysis ..... 149
    10.4.1 Distance and Time Analysis ..... 151
    10.4.2 Network Theory ..... 152
    10.4.3 Network Theory Analysis ..... 154
  10.5 Social Analysis ..... 156
  10.6 Conclusion ..... 157
11 KNOWLEDGE DISCOVERY AND CAUSALITY IN URBAN CITY TRAFFIC ..... 159
  11.1 Related Work ..... 161
  11.2 Granger Networks ..... 163

PAGE 8

  11.3 Results ..... 167
    11.3.1 Network Analysis ..... 167
    11.3.2 Disaggregation and traffic forecast analysis ..... 171
    11.3.3 Comparison of global city networks ..... 173
  11.4 Conclusion ..... 175
APPENDIX
A COMMUNITY DETECTION IN NETWORKS ..... 177
  A.1 Centrality Measures ..... 178
    A.1.1 Degree Centrality ..... 178
    A.1.2 Eigen Vector Centrality ..... 179
    A.1.3 Betweenness Centrality ..... 179
    A.1.4 Closeness Centrality ..... 180
  A.2 Study of Complex Systems ..... 180
  A.3 Overview ..... 181
  A.4 The Proposed Algorithm ..... 184
    A.4.1 Overview ..... 184
    A.4.2 Algorithm Description ..... 184
    A.4.3 Random Walk ..... 186
    A.4.4 Local Optimization ..... 188
      A.4.4.1 Vertex ranking ..... 189
      A.4.4.2 Community ranking ..... 192
      A.4.4.3 Mutual exchange ..... 193
      A.4.4.4 Dynamic optimization ..... 194
  A.5 Experiments ..... 195
    A.5.1 Benchmarking ..... 195
    A.5.2 Computer Generated Experiments ..... 197
    A.5.3 Application to Real Networks ..... 201
      A.5.3.1 Zachary karate club ..... 202
      A.5.3.2 Neurotransmitter receptor complexes ..... 204
    A.5.4 Study of Wireless Mobile Users ..... 206
  A.6 Further Improvements ..... 208
  A.7 Concluding Remarks ..... 209
B SHIELD - SOCIAL SENSING AND HELP IN EMERGENCY USING DEVICES ..... 210
  B.1 Introduction ..... 210
  B.2 Background ..... 212
    B.2.1 Emergency and Rescue Systems in General ..... 212
    B.2.2 Other Approaches ..... 212
    B.2.3 Understanding User behavioral patterns ..... 212
    B.2.4 Using Human Mobility as a Communication Paradigm ..... 213
  B.3 SHIELD: Rationale and Architectural overview ..... 213
    B.3.1 The Encounter and Duration Matrix ..... 213

PAGE 9

    B.3.2 The Trust Matrix ..... 214
    B.3.3 Advisory Sub-System ..... 214
    B.3.4 Context-aware Adaptive Protocol ..... 215
    B.3.5 Distress Signaling ..... 215
  B.4 Trace Analysis ..... 215
  B.5 Trust Model ..... 216
    B.5.1 Number of Encounters f(i,j) ..... 217
    B.5.2 Duration of Encounters D(i,j) ..... 217
  B.6 Protocol Design ..... 218
    B.6.1 The Scan Engine ..... 219
    B.6.2 Protocol Adaption ..... 219
    B.6.3 Distress Signal Communication ..... 219
    B.6.4 Discovering and Pairing ..... 220
  B.7 Application Prototype ..... 220
  B.8 Testbed Implementation Analysis ..... 221
    B.8.1 Bluetooth Evaluation ..... 221
    B.8.2 Crime Statistics and Mobile User Density ..... 222
  B.9 Concluding Remarks ..... 222
C THE ONE SIMULATOR README ..... 224
  C.1 Running ..... 224
  C.2 Configuring ..... 225
  C.3 Run indexing ..... 226
  C.4 Movement models ..... 227
  C.5 Routing modules and message creation ..... 230
  C.6 Reports ..... 231
  C.7 Host groups ..... 231
  C.8 Settings ..... 232
    C.8.1 Scenario Settings ..... 232
    C.8.2 Interface Settings ..... 232
    C.8.3 Host Group Settings ..... 232
    C.8.4 Report Settings ..... 233
    C.8.5 Event Generator Settings ..... 234
  C.9 Example Configuration ..... 234
  C.10 Toolkit ..... 234
  C.11 Example of USC Configuration ..... 236
D SMOOTH MOBILITY MODEL ..... 239
  D.1 Usage Instructions ..... 239
  D.2 Settings ..... 239
E COBRA SCRIPTS ..... 242

PAGE 10

F SUMMARY OF THEORETICAL DISTRIBUTION ..... 244
  F.1 Hurst Estimators ..... 244
    F.1.1 Absolute Value Method ..... 244
    F.1.2 Aggregated Variance Method ..... 244
    F.1.3 R/S Method ..... 245
    F.1.4 Periodogram Method ..... 245
    F.1.5 Whittle Estimator ..... 245
    F.1.6 Abry-Veitch Method ..... 245
REFERENCES ..... 247
BIOGRAPHICAL SKETCH ..... 267

PAGE 11

LIST OF TABLES
Table page
3-1 Details of wireless measurements ..... 25
4-1 Anonymized WLAN session sample ..... 34
4-2 Network analysis of datasets on three different metrics ..... 39
6-1 Main global webcam dataset ..... 72
6-2 Used global webcam datasets ..... 73
6-3 Summary of regression analysis ..... 79
6-4 Details of geo-coordinate measurements ..... 80
7-1 Clustering analysis ..... 86
8-1 Dominant distribution as best fits [ranked and % deviation] ..... 111
8-2 The average value of Hurst Exponent calculated for all locations. ..... 118
9-1 Parameter and details ..... 127
9-2 Network analysis of vehicular traffic. ..... 128
10-1 Parameter and details ..... 151
10-2 Results for spatial, temporal and network analysis. ..... 151
10-3 FourSquare data. ..... 155
11-1 Summary of topological characteristics for the GN's. ..... 174
F-1 Definition of theoretical distributions. ..... 244

PAGE 12

LIST OF FIGURES
Figure page
4-1 Spatio-temporal preferences of mobile users ..... 31
4-2 TVC model's encounter statistics ..... 32
4-3 Similarity representation through association matrix. ..... 33
4-4 Similarity distribution histogram among user pairs for campuses. ..... 37
4-5 Log Normalized Similarity distribution of all four datasets is shown. ..... 37
4-6 Shown are the structural and spatio-temporal dynamics of mobile societies. ..... 40
4-7 Cumulative distribution function of distances for the similarity of users. ..... 42
4-8 Dendrograms giving visual representation of hierarchical clustering. ..... 43
4-9 Analysis for epidemic routing and related protocols. ..... 44
5-1 Framework for analysis and modeling of human contact networks. ..... 51
5-2 Multi-dimension behavioral metric space for human mobility. ..... 53
5-3 Probability distribution of location visitation and time ..... 56
5-4 COBRA architecture ..... 58
5-5 Spatio-temporal analysis of models. ..... 64
5-6 Distribution of Similarity ..... 65
5-7 Intercontact time distribution ..... 66
5-8 Modularity based clustering distribution for IBM ..... 67
5-9 Results for encounter based forwarding protocol analysis ..... 68
5-10 Results for community based forwarding protocol analysis ..... 69
6-1 Distributed architecture for vehicular imagery collection on planet-scale. ..... 73
6-2 Traffic cameras in London and Sydney. ..... 75
6-3 A set of pictures for an intersection with varying traffic intensities. ..... 77
6-4 Outliers detection and removal. ..... 78
6-5 A comparison of empirical traffic densities with number of cars recorded. ..... 79
7-1 K-medoid Clustering for the distance and time pairing for the cities. ..... 87

PAGE 13

7-2 The error bars show the deviation in the membership of objects for clusters. ..... 88
7-3 Show the Auto-correlation for different types of traffic from four cities. ..... 90
7-4 Show the averaged daily distribution of traffic for four cities. ..... 91
7-5 A 42 days traffic distribution snapshot from four different cameras is shown. ..... 91
7-6 An hourly variability for the traffic ..... 92
7-7 Cumulative distribution function show camera auto-correlation of traffic. ..... 92
7-8 A 42 days hour by hour average correlation of traffic densities with four lags. ..... 93
7-9 The spatio-temporal congestion is modeled as graph. ..... 97
8-1 Histogram of empirical traffic densities of two different locations. ..... 107
8-2 Traffic densities during two different time periods ..... 108
8-3 Curve fitting for locations time-series. ..... 109
8-4 The percentage for the distributions that cover all six regions. ..... 112
8-5 Scaling of stochastic self similar vehicular traffic. ..... 116
8-6 Hurst exponent plots for the time series of one location. ..... 117
8-7 A histogram showing the distributions of Hurst exponent. ..... 119
8-8 The percentage of locations from regions that have self-similarity in traffic. ..... 119
9-1 Radial axis layout show two dimensional representation of degree distribution. ..... 126
9-2 A geo-laid network of urban streets of three regions is shown. ..... 130
9-3 A CDF P(x) and its maximum likelihood power-law fits for two locations. ..... 131
9-4 Analysis of clustering coefficient. ..... 133
9-5 CDF showing correlation of traffic densities between hours of the day. ..... 134
9-6 Traffic with varying densities (A) low, (B) medium, (C) high is shown. ..... 135
9-7 Best fits for six regions. The values in the box show deviation. ..... 136
10-1 Average density for regions of Sydney and London ..... 144
10-2 Kohonen network results. ..... 145
10-3 Several variations in traffic densities across six-weeks traffic monitoring. ..... 146
10-4 A CDF showing the distribution of traffic densities that are correlated. ..... 146

PAGE 14

10-5 A CDF showing the distribution of average auto-correlation. ..... 147
10-6 Similarity and hotspot traffic in Sydney and Toronto region. ..... 147
10-7 A scatter plot for distance vs. time for six regions with linear fit. ..... 149
10-8 Centrality distribution for Sydney ..... 150
10-9 A CCDF of distance and time plots between locations pairs of six regions. ..... 151
10-10 The scatter plots show the distribution of three centralities for six regions. ..... 154
10-11 A CDF plot show the distribution of vehicular traffic density. ..... 155
10-12 A bar plot that compares the traffic densities against FourSquare check-ins. ..... 156
11-1 Granger network for Sydney (all hours, explanation in the text). ..... 168
11-2 The co-integrating factor for Sydney. ..... 168
11-3 A GP model for distribution. ..... 170
11-4 A contour plot of the GP distribution. ..... 171
11-5 The panels shows the state of the network at 14:34 and at 15:04. ..... 172
11-6 A sample forecast versus the actual for the traffic density at camera 16. ..... 173
11-7 The prediction MSE for Sydney morning data (6-steps ahead or 30 minutes). ..... 173
11-8 The increase in forecast error with forecast horizon for Sydney. ..... 174
11-9 A comparison of the distributions of R^2_u values for the 6 cities. ..... 175
A-1 Shown above are four communities generated from Random Walk. ..... 187
A-2 A performance analysis of algorithm for 128-node network (1000 realizations). ..... 196
A-3 Performance of algorithm on 512-node rewired network (1000 realizations). ..... 196
A-4 Plot of dendrogram shown for a generated local community structure. ..... 197
A-5 Plot against number of iterations and average value is shown. ..... 198
A-6 Bar graph showing the efficiency of algorithm on various graph sparseness. ..... 198
A-7 A relative comparison of static and dynamic community values are shown. ..... 199
A-8 Plot showing an effect on value (y-axis) and number of iteration (x-axis). ..... 200
A-9 Comparison of the execution time with varying number of nodes in network. ..... 201
A-10 Actual breakdown of Zachary Karate Club. ..... 202

PAGE 15

A-11 Two local factions in Zachary karate club generated using the algorithm. ... 203
A-12 Karate club convergence graph for the identification of communities. ... 203
A-13 NRC/MASC Analysis for community structure ... 205
A-14 In the study of WLAN mobile users, 31 active communities are identified. ... 207
B-1 SHIELD architecture ... 214
B-2 Analysis of encounter patterns. ... 216
B-3 Application Protocol ... 219
B-4 SHIELD Application Prototype ... 220
B-5 Analysis of bluetooth scanning for devices. ... 221
B-6 A graph show crime log and density distribution of on-campus mobile users ... 222

PAGE 16

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

A DATA-DRIVEN MODELING OF LARGE-SCALE MOBILE NETWORKS: COMMUNITY AND VEHICULAR MOBILITY

By
Gautam S. Thakur
December 2012

Chair: Ahmed Helmy
Major: Computer Engineering

In this work, a framework for data-driven mobility modeling in pedestrian and vehicular networks is proposed to address an overarching need. Mobility is one of the main factors affecting the design and performance of wireless networks.

In this work, we first focus on pedestrian mobility. We start our study by investigating the adequacy of existing mobility models in capturing various aspects of human mobility behavior as well as protocol performance. This is achieved systematically through the introduction of a multi-dimensional mobility metric space (MIDAS) to measure individual, pair-wise and group metrics. In addition, a methodical analysis of a range of mobile encounter-based networking protocols is conducted to compare the performance under various mobility models and extensive traces. Our results indicate significant gaps in several metric dimensions between real traces and existing models. We then introduce COBRA, a new mobility model capable of spanning the mobility metric space to match realistic traces. We further evaluate DTN protocols using various traces and models. Our findings clearly show that COBRA can match the realistic protocol performance, reducing the gap from 80% to 12%, and showing the efficacy of our approach based upon the metric space matching.

Realistic design and evaluation of vehicular mobility has been particularly challenging due to a lack of large-scale real-world measurements. To overcome these challenges, we introduce a novel framework for large-scale monitoring, analysis

PAGE 17

of vehicular traffic using freely available online webcams. So far, we have collected 140 million vehicular mobility records from 4,909 cameras located in ten different regions. We represent cities by network graphs in which nodes are camera locations and edges are urban streets that connect the nodes. Such a representation exhibits small world properties. Traffic densities show 80% temporal correlation during several hours of a day. Using the goodness-of-fit test, we find that the vehicular density distribution follows heavy-tail distributions in over 90% of locations. Our analysis signifies that the traffic patterns are stochastically self-similar. We believe our work provides a much-needed contribution to the research community for realistic and data-driven design and evaluation of pedestrian and vehicular networks.

PAGE 18

CHAPTER 1
OVERARCHING INTRODUCTION

In this work, we have proposed a data-driven framework to model, design, and test large-scale network systems. The two systems on which we have focused our efforts are (i) pedestrian and (ii) vehicular networks. A data-driven approach provides several benefits over traditional statistical and analytical approaches. The main motivation behind using such an approach involves dealing with facts and reality of the underlying paradigm. The evaluation and results carry wider confidence intervals. In addition to that, a realistic data-driven approach provides a way to perform longitudinal analysis, which is effective in studying periodic trends and for forecasting purposes.

This thesis is divided into two main parts based on the pedestrian and vehicular work that we have achieved so far. Our main contributions towards data-driven pedestrian mobility modeling and analysis are as follows:

• We identified the performance and structural gaps between existing pedestrian mobility models and real wireless measurements.
• We introduced a systematic mobility benchmarking method, including a multi-dimensional mobility metric space and a framework for thorough mobility and protocol performance analysis.
• We introduced a new mobility model to capture multiple mobility metrics simultaneously. We also plan to release the model implementation and benchmark scenarios as part of this study.
• We characterized and quantified the metric gaps in existing mobility models.
• We provided a comprehensive evaluation of the performance of DTN protocols over the various models and traces.

Our main contributions towards data-driven vehicular mobility modeling are as follows:

• We provide, to the best of our knowledge, by far the largest and most extensive dataset for future vehicular network analysis. This potentially addresses a severe shortage of such datasets in the community.
• We introduce a new and more practical way to look into urban street networks based on driving routes. A network graph of routes and locations depicts small world properties.

PAGE 19

• We establish heavy-tailed models such as log-logistic and log-gamma distributions as the most suitable fits for modeling vehicular traffic density.
• We show that the data consistently exhibit a high degree of self-similarity over orders of magnitude of time scales for these locations.
• We provide a novel framework to study the cause and effect relationship of traffic causality in urban streets using thousands of on-line web cameras. In future, we also plan to release the dataset to the research community.
• We established that causality on motorways (highways and inter-states) is far more perceivable than in a city's local traffic. This is a distinguishing factor that can be used for profiling the cities.
• We construct an empirical time series model based on real data that considers the interaction between the traffic observed at different points in a city. The models produce forecasts with a typical error (PMSE) of 3-9%. They also reveal that traffic camera networks tend to be disassortative, with many smaller junctions feeding into larger ones. It is envisaged that the models constructed here will form the basis of future studies into vehicular traffic dynamics in cities based on real data.

Our approach has involved the use of generic and systematic techniques that can be scaled and used for any urban or rural settings.
Overall, the thesis is organized as follows. Chapter 3 describes the pedestrian measurements that we have used to model spatio-temporal preferences and to identify gaps in existing models. We have also used these datasets to show that our new mobility model is able to replicate all the underlying statistics. In Chapter 4, we identify and explain the structural and protocol performance gaps in existing mobility models. In Chapter 5, we introduce a framework that includes a multi-dimensional mobility metric space to measure individual, pair-wise (encounter) and group (community) metrics. We then introduce COBRA, a new mobility model capable of spanning the mobility metric space to match realistic traces. We also conduct a methodical analysis of various mobility models (SMOOTH and TVC) and traces (university campuses, offices, and theme parks) using a range of protocol-dependent metrics (epidemic, spray-wait, Prophet, and Bubble Rap) and protocol-independent metrics (modularity). At this point we have focused mainly on the work that we have done in pedestrian mobility

PAGE 20

and modeling. Subsequent chapters focus on the work that we have done in the vehicular networking domain. In Chapter 6, we discuss details of the collected geo-location information of cameras used for the analysis of topological properties, and of the recorded vehicular images captured from these cameras to model and characterize the vehicular traffic. In Chapter 7, we examine the structural dynamics of cities and present a temporal analysis of the vehicular traffic flow in them. In Chapter 8, we conduct two main sets of statistical analyses: the first includes an investigation of the best-fit distribution and goodness-of-fit test using a family of heavy-tail and memoryless models, while the second is a study of the long range dependence (LRD) and self-similarity observed in that data. In Chapter 9, we perform a combined study to learn the structure and connectivity of urban streets and to model and characterize the vehicular traffic densities on them. In Chapter 10, we provide significant insight for enabling a city-wide ubiquitous computing environment. In Chapter 11, we present a time series model employing a co-integrated vector autoregression model, from which traffic forecasts may be produced and regions of the city that are not well observed may be suggested. Finally, we conclude the thesis and give future directions.

PAGE 21

CHAPTER 2
RELATED WORK

Delay Tolerant Networks (DTNs) are essentially opportunistic networks. These types of networks do not demand permanent connectivity between source and destination; instead, they attempt to make best use of any scheme available that can get the message across. Mobility of the nodes is often used for transferring the messages. The design of any communication protocol for DTNs is heavily dependent on how well the underlying mobility is understood [10, 216]. There are two elemental ways to design and test protocols for DTNs, namely trace-based and mobility-model-based [11].

In the case of trace-based design and evaluation, a mobility trace can be downloaded from a limited number of trace repositories [137, 167]. These traces are from the real world and capture real mobility patterns of the users belonging to the traces. For the trace collection environment, testing the protocol on the traces would produce the most realistic results. But there are quite a few drawbacks of using real traces, such as the limited number of traces, not capturing all scenarios, and the inability to generalize the results based on a few traces. Due to these drawbacks, researchers have proposed models that capture key characteristics of human mobility and produce synthetic traces.

Due to the complexity of understanding and modeling human mobility, models are created to reproduce a few characteristics from real traces, such as inter-encounter time [36], regularity [50] and community behavior [36, 65, 91, 109, 138]. In most cases, a synthetic trace is validated by comparing a few key characteristics against the real trace. We think this validation is not the best, as it does not test the application-oriented parameters of the generated trace.

A new paradigm of protocols that relies on human behavioral patterns has gained recent attention in DTN-related research. In these studies, researchers attempt to use social aspects of human mobility to derive new services and protocols [41, 74, 100, 106]. As an example, researchers created a behavior-oriented service, called

PAGE 22

profile-cast, that relies on spatio-temporal similarities between the users [68]. Profile-cast provides a systematic framework to utilize implicit relationships discovered among mobile users for efficient interest-based message forwarding and delivery in DTNs. Participatory sensing [144, 170, 177] provides a service for crowd sourcing through recruiting campaigns that use mobile user profiles [74]. All these works rely on and utilize similarity of mobile user profiles. In all the above scenarios, how can we be sure that a given mobility model can capture all the characteristics needed to test these protocols?

In an attempt to answer the above question, in this work we have taken a novel approach for evaluating mobility models, which is to compare the performance of routing protocols on a synthetic trace and on the real trace whose characteristics were utilized to create or validate the mobility model. This approach allows us to test and create mobility models while keeping the applicability of the generated traces. In this work, as a case study, we consider a complex mobility model, TVC [109] (along with the random direction mobility model [107]). This model generates the non-homogeneous behaviors of mobile users in both space and time. The traces generated by this model show (i) skewed location visiting preferences and (ii) time-dependent periodical reappearance of mobile users as seen in WLAN measurements, along with other encounter statistics such as average node degree and meeting time. It uses several real traces to validate the correctness of its design.

Mobility Modeling: Generally, mobility modeling can be classified as synthetic or trace-based. The most commonly used synthetic models include random models (e.g., random direction, RWP [33]), where nodes are not correlated and their destinations are chosen randomly in the simulation area. The inadequacy of these models in many scenarios is well documented in the literature [13, 108] and we shall use them as a mere reference. Several recent models used trace analysis to guide their design. The truncated form of Levy walks observed in some scenarios of human mobility is developed in [201], and is also used to recreate power-law inter-contact time

PAGE 23

distributions (SLAW [148]). In [108], Hsu et al. proposed a time-variant community model that uses spatio-temporal characteristics of human mobility. SMOOTH [171] is another model that borrows from SLAW and adds flight, contact and pause time distributions. Several other models have also studied the mobility patterns in greater detail, all driven by the fundamental concept of characterizing human behavior.

Behavioral Characterization: For intermittently connected networks, several works have been proposed to utilize human behavioral patterns. On this front, non-homogeneous behaviors in both space and time, commonly found in reality, are proposed in [54, 108], where the location visiting preference and time-dependent mobility behavior were studied. In [40], the authors studied transfer opportunities between wireless devices carried by humans and showed that the distribution of inter-contact time (that is, the time gap separating two contacts of the same pair of devices) exhibits a heavy tail such as that of a power law. These properties had significant impact on the design and development of new mobility models that focus on capturing realistic mobility. In [120, 228], the authors established the presence of spatio-temporal similarity that can be exploited by mobility models for the dissemination of messages with like characteristics.

Routing Protocols in DTNs: Epidemic routing [237] is a well-known protocol that was proposed to broadcast (in time and space) a message to all nodes in a network. On the other hand, some protocols avoid broadcast and rely on statistical properties (e.g., spray-and-wait [215]) or encounter patterns (e.g., Prophet [153]) to guide the message to a destination. In recent works, protocols based on communal behavior in mobile societies were proposed, including Simbet [54], Bubble rap [112], and Profile-cast [120]. In these protocols, messages are guided based on behavioral communities and profiles. These works utilize behavioral characteristics to improve the performance as compared to socially-oblivious routing protocols.

PAGE 24

CHAPTER 3
DATASET FOR HUMAN MOBILITY MODELING

In this section, we discuss the details of the traces used. One set of traces contains WLAN session logs. The other type of traces contains logs of Bluetooth encounters among mobile devices. We have used six different types of measurements from varied sources that include university campuses, offices, conferences, and theme parks. These measurements are shown in Table 3-1 and are publicly available at [137], [122], except for the theme park measurements. These measurements are categorized as:

SPATIO-TEMPORAL MEASUREMENTS. These are WiFi usage measurements that are collected from several university campuses and offices, and they exceed the other measurements in size and number by many orders. They also have a high density of active users and include location information. Also, these datasets have been used in previous studies of mobility modeling [103, 105, 106, 110, 113, 135, 247]. We perform Systematic Random Sampling [205] on the datasets to get an unbiased subset of mobile users from the population.

ENCOUNTER MEASUREMENTS. These measurements are collected using devices with Bluetooth scanning functionality. They capture explicit user-user encounters and their duration (radio contact between two mobile devices). However, they are collected with a small user base and for a shorter time duration (say 2-3 days). This data is collected using Intel's iMote, which communicates over the Bluetooth protocol and logs contact information of all visible Bluetooth-capable devices. Such a record contains three entities (MAC address, start time, end time) that correspond to each encounter between the host and a foreign device. As part of the experiment, these devices were distributed in conference settings to 41 participants for a period of three to four days. We transformed the gathered data to study the inter-meeting time and the duration of meetings among mobile users and to compare the output between model and reality. This dataset is available at [137]. In order to pursue a study on the encounter statistics and dynamic routing in DTN, we need measurements that quantitatively depict the contact (a.k.a. encounter) between mobile users. An encounter occurs between a pair of nodes when they are in radio communication range of each other. This is straightforward for the iMote Bluetooth measurements, which contain precise encounter information. However, the WLAN measurements are accumulated at the access point level and contain usage patterns. So, we need to convert these measurements in a way that yields user encounters while maintaining their spatio-temporal footprints. We consider an encounter in WLAN to occur if two users connect to the same access point and share online session time. For example, if Alice and Bob are both connected to access point AP-1 between 10:00 AM and 02:00 PM, we count an encounter between them. A counter argument can be established by saying that some WLAN devices may miss encounters beyond their

PAGE 25

coverage region of access points, but WLAN measurements have the advantage of providing traces in much larger sizes with richer user presence. They also contain location information, which helps in spatio-temporal analysis. In contrast, a Bluetooth experiment mostly has a small user base and covers a limited time period.

GPS MEASUREMENTS. These measurements log geo-coordinate footprints of guests in theme parks. The GPS location was sampled every two minutes on average, when the satellite signals were available. More details are available in [240] (access to the data is restricted as per agreement with Disney).

We have systematically followed the TRACE process as mentioned in [106]. We start by extracting relevant statistics of mobile user spatio-temporal patterns to be used for studying the structural gaps in mobility models. For each mobile user we obtain a normalized association matrix with a time granularity of one day. On this matrix we apply SVD to extract the dominant trends. Finally, we compute the cosine similarity of all user pairs. We perform this process iteratively for four different time intervals: 1 week, 2 weeks, 3 weeks and 4 weeks. These traces are publicly available at (Kotz, et al., 2005), although we customized them into a format that suits us. Initially, we investigate mobile users' preferential attachment to certain locations and their time-dependent periodic behavior. Later on we investigate structural dynamics and compare them with the output of the mobility models. As a second step, we have used the traces to extract encounter patterns to be later used for opportunistic protocol performance analysis.

Table 3-1. Details of wireless measurements

Campus              #Users   Duration    Settings
Dartmouth           300      Fall 2007   WiFi, Campus
Infocom             41       3 days      Bluetooth, Conference
IBM Watson          1300     Fall 2006   WiFi, Office
Theme Park (1)      825      5 days      GPS, Attractions
Univ. of Florida    700      Fall 2008   WiFi, Campus
USC                 300      Fall 2007   WiFi, Campus

(1) Analysis is done while working at Disney Research, Zurich, 2012.
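The WLAN encounter rule described in this chapter (two users associated with the same access point with overlapping session time) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the record layout `(user, access_point, start, end)`, the function names, and the sample sessions are hypothetical and are not taken from the actual traces or the TRACE tooling.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical session records: (user, access_point, start, end), times in seconds.
sessions = [
    ("alice", "AP-1", 0, 100),
    ("bob", "AP-1", 50, 150),
    ("alice", "AP-1", 300, 400),
    ("bob", "AP-1", 350, 360),
    ("carol", "AP-2", 0, 500),
]


def wlan_encounters(sessions):
    """Map each user pair to the time spans they shared at the same access point."""
    by_ap = defaultdict(list)
    for user, ap, start, end in sessions:
        by_ap[ap].append((user, start, end))
    encounters = defaultdict(list)
    for records in by_ap.values():
        for (u, s1, e1), (v, s2, e2) in combinations(records, 2):
            if u == v:
                continue  # two sessions of the same user are not an encounter
            start, end = max(s1, s2), min(e1, e2)
            if start < end:  # the sessions overlap in time
                encounters[tuple(sorted((u, v)))].append((start, end))
    for spans in encounters.values():
        spans.sort()
    return dict(encounters)


def meeting_durations(spans):
    """Length of each encounter."""
    return [end - start for start, end in spans]


def inter_meeting_times(spans):
    """Gap between the end of one encounter and the start of the next."""
    return [nxt[0] - cur[1] for cur, nxt in zip(spans, spans[1:])]


enc = wlan_encounters(sessions)
pair = ("alice", "bob")
print(enc[pair], meeting_durations(enc[pair]), inter_meeting_times(enc[pair]))
```

The pairwise scan is quadratic in the number of sessions per access point, which is acceptable for a sketch; a sweep over sorted session start times would scale better on large traces.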

PAGE 26

CHAPTER 4
GAUGING BEHAVIORAL AND PROTOCOL PERFORMANCE GAPS

4.1 Introduction

The proliferation of highly capable mobile devices (e.g., laptops, smartphones, tablets) with multi-sensing capabilities greatly facilitates the capture of mobility traces [137, 167] and the direct exchange of information through encounters. Mobility traces can then be used as guidelines for modeling purposes. More realistic mobility models have been created by mimicking encounter statistics [41, 128] or mobile user location visitation preferences [109]. Much of the recent modeling work focused on encounter metrics, such as inter-encounter and hitting time distribution [41], meeting duration [41, 109], or spatio-temporal profiles [109]. These metrics are generally considered important to the operation of mobile networks in general, including DTNs, ad hoc and sensor networks. DTNs are characterized by intermittent connectivity, limited end-to-end connectivity and limited node resources. Future social networks are expected to have classes of applications that are aware of mobile users' behavioral profiles and preferences and are likely to support peer-to-peer mobile networking, including DTNs. A new generation of protocols is emerging, including behavior-aware communication paradigms (such as profile-cast [105]) and service architectures (such as participatory sensing [176, 212]). Such behavior-aware communication paradigms leverage user behavior and preferences to achieve efficient operation in DTNs (e.g., interest-based target message forwarding, encounter-based routing, mobile resource discovery). Accurate models of mobile user behavioral profiles are essential for the analysis, performance evaluation, and simulation of such networking protocols. Hence, there is a compelling need to understand and realistically model mobile users' behavioral profiles, similarity and clustering of user groups.

Earlier work on mobility modeling presented advances in random mobility models (e.g., RWP, RD [34]), synthetic models that attempt to capture spatial correlation

PAGE 27

between nodes (e.g., group models [11]) or temporal correlation and geographic restrictions (e.g., Freeway, Manhattan, Pathway Models [11]). More recent models tend to be trace-driven and some account for location preferences and temporal repetition [109]. However, similarity characteristics between clusters of nodes, which lie at the heart of behavior-aware networking, have not been modeled explicitly by these mobility models. Hence, it is unclear whether (and to which degree) similarity between mobile nodes is captured and, more importantly, how closely such models can be fine-tuned to replicate social structures, such as groups with distinct behavior observed from real traces.

Also, no studies have been conducted to show whether matching metrics is sufficient to accurately reproduce DTN protocol performance. In this study, we thoroughly examine this specific problem, and attempt to show, for the first time, the sufficiency (or lack thereof) of existing encounter and mobility metrics in reproducing realistic effects of mobility on structural dynamics and the performance of networking protocols.

In this chapter, we first analyze spatio-temporal properties and encounter statistics of two realistic wireless measurement traces. We then evaluate the same characteristics on the synthetic traces produced by two different mobility models: the random direction model [202] and the time-variant community (TVC) model [109]. Specifically, we analyze two commonly used encounter statistics, inter-meeting time and meeting duration, in addition to two spatio-temporal metrics, periodic re-appearance and location visitation preference. Our results show that models mimic such statistics if carefully tuned.

Second, we address issues related to mobile user similarity, its definition, analysis and modeling. Similarity, in this study, is defined by mobility preferences, and is meant to reflect the users' interests to the extent that can be captured by wireless measurements of on-line usage. To define similarity, we adopt a behavioral profile based on users' mobility and location preferences using an on-line association matrix representation,

PAGE 28

and then use the cosine product of their weighted Eigen-behaviors to capture similarity between users. This quantitatively compares the major spatio-temporal behavioral trends between mobile network users, and can be used for clustering users into similarity groups or communities. Note that this may not reflect social ties between users or relationships per se, but does reflect mobility-related behavior that will affect connectivity and network topology dynamics in a DTN setting.

We analyze similarity distributions of mobile user populations in two settings. The first analysis aims to establish a deep understanding of realistic similarity distributions in such mobile societies. It is based on real measurements of over 8860 users for a month in four major university campuses: USC [167], IBM Watson, Dartmouth [137] and UF. It may be reasonable to expect some clustering of users that belong to similar affiliations, but quantification of such clustering and its stability over time is necessary for developing accurate similarity models. Furthermore, on-line behavior that reflects the distribution of active wireless devices may not necessarily reflect work or study affiliations or social clustering. For DTNs, on-line activity and mobility preferences translate into encounters that are used for opportunistic message forwarding, and this is the focus of our study rather than social relations per se. The second similarity analysis we conduct aims to investigate whether existing mobility models provide a reasonable approximation of realistic similarity distributions found in the campus traces.

Our results show that among mobile users, we can discover distinct clusters of users that are similar to each other, while dissimilar to other clusters. This is true for all campuses, with the trend being consistent and stable over time. We find an average modularity of 0.64, clustering coefficient of 0.86 and path length of 0.24 among discovered clusters. Surprisingly, however, we find that the existing mobility models do not explicitly capture similarity and result in homogeneous users that are all similar to each other (in one big cluster). This finding generalizes to all other mobility models that produce homogeneous users, not only the mobility models studied in this chapter.
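As a rough illustration of what a modularity value measures for clusters of similar users, the sketch below computes the standard Newman modularity of a given cluster assignment on a similarity graph. It is not the procedure used to obtain the figures reported above: the rule of linking two users when their similarity score reaches a threshold, the threshold value, the toy scores, and the function names are all assumptions made for this example.

```python
from collections import defaultdict


def similarity_graph(similarity, threshold):
    """Keep an undirected edge for every user pair at or above the threshold (assumed rule)."""
    return [pair for pair, score in similarity.items() if score >= threshold]


def modularity(edges, community):
    """Newman modularity: sum over clusters of (inside edges / m) - (degree sum / 2m)^2."""
    m = len(edges)
    if m == 0:
        return 0.0
    inside = defaultdict(int)      # edges with both endpoints in the cluster
    degree_sum = defaultdict(int)  # total degree of the nodes in the cluster
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            inside[community[u]] += 1
    return sum(inside[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Toy pairwise similarity scores for six users (hypothetical values).
similarity = {
    (0, 1): 0.90, (0, 2): 0.80, (1, 2): 0.85,   # first tight group
    (3, 4): 0.90, (3, 5): 0.95, (4, 5): 0.80,   # second tight group
    (2, 3): 0.60, (0, 5): 0.10, (1, 4): 0.20,   # weak or bridging pairs
}
community = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}

edges = similarity_graph(similarity, threshold=0.5)
q = modularity(edges, community)
print(round(q, 4))
```

A value near zero would indicate that the clusters are no denser than expected by chance, which is the situation described above for models that produce one big homogeneous cluster.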

PAGE 29

Thus the richness and diversity of user behavioral patterns is not captured to any degree in the existing models. Our findings strongly suggest that unless similarity is explicitly captured in mobility models, the resulting behavioral patterns are likely to deviate dramatically from reality, sometimes totally missing the richness in the similarity distribution found in the traces. Furthermore, this indicates our current inability to accurately simulate and evaluate similarity-based protocols, services and architectures using mobility models.

Finally, we perform epidemic routing [238] on the synthetic (model-generated) traces and real network traces and compare their network performance. Surprisingly, through systematic analysis, we find that even when mobility models reflect equivalent spatio-temporal and encounter statistics, they exhibit a large DTN routing performance discrepancy with the real scenarios. Furthermore, the results clearly show the insufficiency of existing encounter and preference metrics as a measure of mobility model goodness. Systematically establishing a new set of meaningful mobility metrics should certainly be addressed in future works. This also motivates the need to re-visit mobility modeling to incorporate accurate behavioral models in the future.

The chapter is structured as follows. In section 4.2, we discuss the mobility models used to compare against real-world traces. We study human mobility characteristics in section 4.3 using the dataset discussed in chapter 3. We compare the routing performance in section 4.4. Finally, we discuss the results in section 4.5 and conclude this chapter in section 4.6.

4.2 Mobility Models Studied

In this section, we discuss two mobility models used for evaluation. We use the Random Direction Model [202], which does not possess any spatial or temporal structure in mobility decisions, as an example of typical random mobility models. The lack of spatial and temporal structure leads to faster mixing of the mobile nodes, and sets the lower bound for delay and message delivery overhead. This, as we will show, deviates

PAGE 30

from realistic mobility traces significantly. We further consider the Time Variant Community model [109] as an example of trace-based mobility models, which incorporate realistic mobility characteristics observed in real traces. Our goal is to evaluate whether such realistic mobility models lead to more realistic evaluation of routing performance. In the following text, we briefly describe these models and construct trace-driven DTN scenarios to estimate routing performance.

4.2.1 Random Direction Model

In the random direction model, a mobile node makes random mobility decisions with respect to current time or location, independent of other nodes. A node randomly picks a movement direction, and moves in a straight line in that direction for a given distance. The node then stops for a given pause time before selecting a new direction to move. This model is more stable as compared to other random models and provides a quantitatively even distribution of nodes in the simulation area. We set up this model to investigate the effect of random movements on DTN performance. We use this model in two variants: (1) the baseline random direction model described above; (2) a variant in which we add on/off behavior of mobile nodes (i.e., when a node is 'off', it cannot receive/transmit packets), which corresponds to the fact that mobile devices are not always turned on.

4.2.2 Time Variant Community Model

We choose the TVC model [109] as an example of trace-based mobility models that capture realistic features of human mobility. Specifically, the TVC model allows configurations to capture (1) spatial preference and (2) temporal periodicity in human mobility. With the setting of communities, preferred locations can be designated and mobile nodes visit such locations more often. The visits are further made periodic with the setting of time periods. The TVC model also includes on/off behavior of mobile nodes. It is shown that with careful community and time period setup, the TVC model produces mobility characteristics that match the real mobility traces better. Since the setup of the TVC model is scenario specific, in this chapter we have considered two

PAGE 31

[Figure 4-1 plots. Panel A, Periodic re-appearance: Re-appearance Probability vs. Time Gaps (in Days), for the TVC Model and the IBM Watson Campus WLAN trace. Panel B, Location visiting preferences: Fraction of Visit Time vs. AP Sorted by Visit Time, for the TVC Model and the IBM Watson Campus WLAN trace.]

Figure 4-1. Spatio-temporal preferences of mobile users. (A) Periodic re-appearances of mobile users in the IBM Watson Campus. TVC Model recreates similar preferences as observed in real traces. (B) TVC depicts skewed location visiting preferences as observed in IBM Watson WLAN traces.

instances of TVC model setup. We synthesize mobility traces from different settings of TVC: (i) with location-visiting preferences and periodical visits matching a trace collected at a research lab, and (ii) with encounter statistics matching a trace collected at a conference. It is our goal in this chapter to evaluate whether such improved realism in mobility characteristics translates to higher similarity to real traces in terms of routing performance, when we use TVC as opposed to random models.

4.3 Human Mobility Characteristics

In this section, we analyze a set of metrics used to capture the non-homogeneous behavior of human mobility. It includes spatio-temporal preferences and encounter statistics. Later, we introduce the concept of similarity among mobile users, demonstrate its existence in the real world traces, and discover clusters of such users with a high similarity index. Finally, we also evaluate the ability of current mobility models to capture similarity and clustering effects.

4.3.1 Analysis of Spatio-Temporal Preferences

The non-homogeneous behavior of mobile users in space and time is captured by: (i) skewed location visiting preferences and (ii) periodical reappearances. Studies carried out in [65, 103, 110, 135, 208] tell us that mobile users exhibit preferential attachment and periodic reappearances to few locations in DTNs. We assume that understanding these

PAGE 32

[Figure 4-2 plots. Panel A, Intermeeting time: Probability [Inter-meeting Time] vs. Inter-meeting Time (s), for the TVC Model and Infocom. Panel B, Meeting duration: Probability (Meeting Duration) vs. Meeting Duration (in secs), for the TVC Model and Infocom.]

Figure 4-2. TVC model's encounter statistics. (A) Inter-meeting time between mobile users is similar for the TVC model and the real Infocom traces. (B) TVC Model depicts meeting duration as measured in the real Infocom traces.

distributions aids better message dissemination, prediction of information transmission, and message delivery in opportunistic settings.

We construct the TVC model to generate a month-long synthetic trace for IBM Watson's 1366 nodes. In Figure 4-1 (A) and (B), we see that the TVC model demonstrates realistically close location visiting and periodic reappearance properties. For brevity, periodic re-appearances are plotted for seven days only. The re-appearance of spikes demonstrates that users visit the same location(s) with higher probability in a periodic fashion. A normalized curve of location preferences shows that nodes visit very few locations yet spend a significant amount of their online time there. These two characteristics, when combined, result in better prediction of the mobility and on/off patterns of mobile nodes. Furthermore, they can also help to identify hotspots and to measure an approximate delay in message reception. Next, we analyze the state space of encounters among mobile nodes.

4.3.2 Analysis of Encounter Statistics

In dynamic infrastructure-less mobile networks (like DTNs), routing is performed by data-carrying mobile nodes. The exchange of information takes place when two nodes encounter (a.k.a. meet) each other. Intuitively, we can improve the routing mechanism if we understand the statistics of these encounter patterns. So, we analyze two encounter statistics: (i) Intermeeting Time, which is the time gap that

PAGE 33

Figure 4-3. Similarity representation through association matrix. (A) A prototype of the Association Matrix. The columns represent locations (access point, building, etc.) and rows represent time granularity (days, weeks, etc.). (B) A computed matrix A with 5 locations and time periods. Each entry represents the percentage of online time spent at the corresponding location column.

separates two consecutive mobile encounters; (ii) Meeting Duration, which is the single uninterrupted meeting duration surrounded by intermeeting times. Thus, the two statistics alternate with each other. These statistics, as mentioned in [128], can have important implications for the performance of opportunistic forwarding algorithms in challenged networks.

The TVC model has the ability to generate measurements to analyze encounter statistics. So, we configure it for the Infocom setting to generate individual mobility traces for the same number of 41 nodes and for an equivalent duration of four days. The simulation area is modeled like a conference setting with flexibility to visit hotel rooms and outside locations. We then process the generated traces and plot them alongside the real measurements. The CDF plots in Figure 4-2 show that the model closely matches real encounter statistics. We see that intermeeting time follows a power law distribution up to a characteristic time period, after which it decays exponentially. This led us to believe that TVC can also be used to model encounter patterns for unknown scenarios. We conclude that the TVC model is statistically accurate on these metrics and closely follows observed realities. We hope that TVC will capture structural dynamics and will show close protocol performance as well, which we examine next. In our case, we assume these metrics, if captured via a model, are vital in achieving identical performance in routing

PAGE 34

Table 4-1. Anonymized WLAN session sample

MacID              Location  StartTime  EndTime
aa:bb:cc:dd:ee:ff  Loc-1     64400343   66404567
aa:bb:cc:dd:ee:ff  Loc-2     85895623   86895742
aa:bb:cc:dd:ee:ff  Loc-3     87444343   89404567
aa:bb:cc:dd:ee:ff  Loc-4     98846767   99878766

4.3.3 Similarity and Structural Dynamics

4.3.3.1 Similarity

The congregation of mobile agents with similar characteristic patterns naturally develops mobile societies in wireless networks [66, 115, 247]. Upon reflection it should come as no surprise that these characteristics in particular also have a big impact on the overall behavior of the system [50, 113, 170, 172]. Researchers have long been working to infer these characteristics and to find ways to measure them. One major observation is that people demonstrate periodic reappearances at certain locations [65, 103, 135], which in turn breeds connection among similar instances [163]. Thus, people with similar behavioral principles tie together. This brings out an important aspect: user-location coupling can be used to identify similarity patterns in mobile users. So, for the purpose of our study, to quantify similarity characteristics among mobile agents, we use their spatio-temporal preferences, their preferential attachment to locations, and the frequency and duration of visiting these locations. It is important to study similarity in DTNs to develop a behavioral space for efficient message dissemination [105] and to design behavior-aware trust advisors, among others [144]. For efficient networking, it can help to quantify traffic patterns and to develop new protocols and applications that target social networking. Analysis of similarity can be used to evaluate the network transitivity, which helps to analyze macro-mobility, evolutionary characteristics, and emergent properties. In this section, we introduce an association matrix that captures spatio-temporal preferences and a statistical technique that uses it to measure similarity among mobile users.

PAGE 35

4.3.3.2 Capturing Spatio-Temporal Preferences

We use longitudinal wireless activity sessions to build a mobile user's spatio-temporal profile. An anonymous sample is shown in Table 4-1. Each entry of this measurement trace has the location of association and session time information for that user. The location association coupled with the time dimension provides a good estimate of a user's online mobile activity and physical proximity with respect to other online users [110, 138]. We devise a scalable representation of this information in the form of an association matrix, as shown in Figure 4-3. Each individual column corresponds to a unique location in the trace. Each row is an n-element association vector, where each entry in the vector represents the fraction of online time the mobile user spent at that location during a certain time period (which can be flexibly chosen, such as an hour, a day, etc.). Thus, for n distinct locations and t time periods, we generate a t-by-n association matrix.

Representation Flexibility: The representation of spatio-temporal preferences in the form of an association matrix can be changed to use each column for a building (where a collection of access points represents a building), and the time granularity can be changed to represent hourly, weekly, or monthly behavior. For the purpose of our study, each row represents a day in the trace and each column represents an individual access point.

4.3.3.3 Characterizing association patterns

For a succinct measure of mobile user behavior, we capture the dominant behavioral patterns by using Singular Value Decomposition (SVD) [100] of the association matrix. SVD has several advantages:

- It helps to convert a high-dimensional and highly variable data set to a lower-dimensional space, thereby exposing the internal structure of the original data more clearly.
- It is robust to noisy data and outliers.
- It can easily be programmed for handheld devices, which is our other on-going work.
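The association matrix of Section 4.3.3.2 can be sketched in a few lines of Python. This is only an illustrative sketch: the function name, the (location, start, end) tuple layout, and the assumption that timestamps are non-negative seconds counted from the start of the trace are all hypothetical choices, not details given by the traces.

```python
def association_matrix(sessions, locations, num_periods, period_seconds=86400):
    """Build a t-by-n association matrix for a single user.

    sessions: iterable of (location, start, end) tuples; times are assumed
              to be non-negative seconds since the start of the trace.
    locations: ordered list of the n distinct locations (matrix columns).
    num_periods: number of time periods t (matrix rows); one day each by default.
    Each entry is the fraction of the user's online time in that period
    that was spent at that location.
    """
    column = {loc: j for j, loc in enumerate(locations)}
    online = [[0.0] * len(locations) for _ in range(num_periods)]
    for loc, start, end in sessions:
        t = start
        while t < end:
            period = int(t // period_seconds)
            if period >= num_periods:
                break  # ignore time beyond the requested window
            # A session may cross a period boundary, so split it there.
            chunk_end = min(end, (period + 1) * period_seconds)
            online[period][column[loc]] += chunk_end - t
            t = chunk_end
    matrix = []
    for row in online:
        total = sum(row)
        # Periods with no online time stay as an all-zero row.
        matrix.append([x / total if total > 0 else 0.0 for x in row])
    return matrix
```

Normalizing each row by the user's online time in that period, rather than by the period length, matches the "fraction of online time" wording above. Changing `period_seconds` or mapping access points to buildings before the call gives the other granularities mentioned under Representation Flexibility.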

PAGE 36

The Singular Value Decomposition of a given matrix $A$ can be represented as a product of three matrices: an orthogonal matrix $U$, a diagonal matrix $S$, and the transpose of an orthogonal matrix $V$:

\[ A = U S V^T, \qquad U^T U = I = V^T V. \]

$U$ is a $t$-by-$t$ matrix whose columns are the orthonormal eigenvectors of $A A^T$; $S$ is a $t$-by-$n$ matrix with $r$ non-zero entries on its main diagonal, containing the square roots of the eigenvalues of $A^T A$ in descending order of magnitude; and $V$ is an $n$-by-$n$ matrix whose columns are the orthonormal eigenvectors of $A^T A$. Thus the eigenbehavior vectors $V = \{v_1, v_2, v_3, \ldots, v_n\}$ summarize the important trends in the original matrix $A$. The singular values $S = \{s_1, s_2, s_3, \ldots, s_r\}$ are ordered by magnitude, $s_1 > s_2 > s_3 > \ldots > s_r$. The percentage of power captured by the $i$-th eigenvector of the matrix $A$ is calculated by

\[ w_i = \frac{s_i^2}{\sum_{j=1}^{\mathrm{rank}(A)} s_j^2}, \]

so that the first $k$ eigenvectors together capture $\sum_{i=1}^{k} s_i^2 \, / \sum_{j=1}^{\mathrm{rank}(A)} s_j^2$ of the total power.

It has been shown [103] that SVD achieves great data reduction on the original association matrix, and that 90% or more of the power for most users is captured by five components of the association vectors. From this result, we infer that a user's few top location-visiting preferences are more dominant than the remaining ones.

4.3.3.4 Calculating similarity

We use the eigenvectors of the association matrix $A$ to quantitatively measure the similarity between the behavioral profiles of mobile user pairs. For a pair of users with respective eigenvectors $X = \{x_1, x_2, x_3, \ldots, x_{r_x}\}$ and $Y = \{y_1, y_2, y_3, \ldots, y_{r_y}\}$, the behavior similarity can be calculated as the weighted sum of the pairwise inner products of their eigenvectors:

\[ \mathrm{Sim}(X, Y) = \sum_{i=1}^{\mathrm{rank}(X)} \sum_{j=1}^{\mathrm{rank}(Y)} w_{x_i} \, w_{y_j} \, |x_i \cdot y_j| \]
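A minimal Python sketch of the weighted similarity follows. It assumes the eigenbehavior vectors and singular values of each user have already been obtained from an SVD routine (for example `numpy.linalg.svd`, which is not used here so that the sketch stays dependency-free). The function names and the list-based data layout are hypothetical.

```python
def power_weights(singular_values):
    """Fraction of power captured by each component: s_i^2 / sum_j s_j^2."""
    total = sum(s * s for s in singular_values)
    if total == 0:
        raise ValueError("at least one singular value must be non-zero")
    return [s * s / total for s in singular_values]


def similarity(vectors_x, singular_x, vectors_y, singular_y):
    """Weighted sum of absolute pairwise inner products of two users' eigenvectors.

    vectors_x, vectors_y: lists of unit-length eigenvectors (lists of floats),
                          all of the same dimension n (number of locations).
    singular_x, singular_y: the matching singular values, one per vector.
    """
    weights_x = power_weights(singular_x)
    weights_y = power_weights(singular_y)
    score = 0.0
    for x_i, w_x in zip(vectors_x, weights_x):
        for y_j, w_y in zip(vectors_y, weights_y):
            inner = sum(a * b for a, b in zip(x_i, y_j))
            score += w_x * w_y * abs(inner)
    return score
```

Because each user's weights sum to one and unit vectors have inner products of magnitude at most one, the score stays within the interval from 0 to 1. Two users whose dominant eigenvectors point at the same locations score close to 1, and users with disjoint location preferences score close to 0.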

PAGE 37

Figure 4-4. Similarity distribution histogram among user pairs is shown for the campuses. All four time intervals show nearly consistent user pair counts for a particular similarity score. The lowest similarity score (0.0-0.1) shows that users have very different spatio-temporal preferences. A fraction of the user pairs are also very similar, with a (0.9-1.0) similarity score. [Plots omitted. Panels: (A) Dartmouth, (B) IBM Watson, (C) UF, (D) USC. Axes: Similarity Score vs. User Pairs (log scale). Series: 1-week, 2-weeks, 3-weeks, 4-weeks.]

Figure 4-5. Log normalized similarity distribution of all four datasets is shown. [Plot omitted. Axes: Similarity Score vs. Normalized User Pairs (log scale). Series: Dartmouth, IBM Watson, UF, USC.]

$\mathrm{Sim}(X, Y)$ is a quantitative index that shows the closeness of two users in the spatio-temporal dimension. The value of similarity lies in the range $0 \le \mathrm{Sim}(X, Y) \le 1$. A higher value is derived from users with similar association patterns. In this study, we are the first to investigate the distribution of such a similarity metric among user pairs based on realistic data sets.
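To make the notion of a similarity distribution concrete, the sketch below counts user pairs per similarity-score bin, in the spirit of the 0.1-wide bins of Figure 4-4. The function name and the choice of ten equal-width bins are assumptions made for illustration only.

```python
def similarity_histogram(scores, bins=10):
    """Count user pairs per equal-width similarity bin over [0, 1].

    scores: iterable of pairwise similarity scores, each in [0, 1].
    Returns a list of length `bins`; a score of exactly 1.0 is placed
    in the last bin.
    """
    counts = [0] * bins
    for score in scores:
        if not 0.0 <= score <= 1.0:
            raise ValueError("similarity scores must lie in [0, 1]")
        index = min(int(score * bins), bins - 1)
        counts[index] += 1
    return counts
```

Dividing each count by the total number of pairs would give a normalized distribution that can be compared across traces with different user populations.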

PAGE 38

4.3.3.5 Similarity Analysis

The distribution histogram of similarity scores for the campus data sets is shown in Figure 4-4. The figure shows the number of user pairs as a function of the similarity score, which quantifies the behavioral similarity between mobile users. We observe that: 1) mobile societies are composed of users with mixed behavioral similarities, and 2) for all four time periods there is consistency and stability in the similarity score among mobile user pairs. The low similarity scores (0-0.1) in Figure 4-4 indicate that a substantial portion of users is spatio-temporally very dissimilar. On the other hand, similarity scores of (0.9-1.0) suggest a statistically significant likelihood of high-density ties creating tightly knit groups. The variation in the middle shows partially similar and partially dissimilar user pairs. This is significant and provides an insight into the existence of mobile societies in the network with quite similar location visiting preferences. Overall, the curves show an assortative mixing of user pairs for all possible similarity scores. Figure 4-5 gives a normalized log plot to compare data sets from different campuses, and shows that similarity exists evenly across all the traces. Next, we briefly explain modularity and use a divisive algorithm to discover mobile societies in the traces.

4.3.3.6 Modularity

To understand the underlying structure of mobile societies (or communities), the similarity distribution is not sufficient. Therefore, we use a robust method to segregate user pairs that have a high similarity score into tightly knit groups. To detect such communities in a graph-like structure, a centrality-index-driven method [178] is utilized. This method circumvents the traditional clustering notion of identifying the most central edges. Instead, a divisive algorithm is applied, based on identifying the least central edges, which are the edges that lie between communities (identified via edge betweenness). First, the betweenness score of an edge is calculated as the number of shortest paths between pairs of vertices that run through it. Understandably, tightly knit communities are loosely connected by only a few intergroup edges, and hence shortest paths traverse these edges

PAGE 39

repeatedly, thereby increasing their respective betweenness scores. If such edges are removed, according to a threshold, what we get are groups of tightly knit vertices known as communities. To identify a reasonable threshold value, modularity is used. Modularity is the difference between the fraction of edges falling within communities and the expected fraction in an equivalent network with randomly placed edges [172, 178, 182].

Table 4-2. Network analysis of datasets on three different metrics

Dataset      Clustering Coefficient   Average Path Length   Modularity
             Orig     Rand            Orig     Rand         Orig   Rand
Dartmouth    0.89     0.05            0.10     2.47         0.63   0.2
IBM Watson   0.92     0.05            0.40     2.12         0.79   0.14
UF           0.78     0.051           0.30     2.605        0.67   0.24
USC          0.91     0.05            0.19     2.0          0.46   0.11

Orig = Original Dataset Graph; Rand = Random Graph

4.3.3.7 Detection of mobile societies

Human networks are known to exhibit a multitude of emergent properties that characterize the collective dynamics of a complex system [23, 36, 221]. Their ability to naturally evolve into groups and communities is the reason they show non-trivial clustering. Here, we consider the spatio-temporal preferences and cosine similarity of mobile users as a relative index to generate emergent structures, which we call mobile societies. The network transitivity structures of mobile nodes for various campus data sets are shown in Figure 4-6. We use the mutual similarity score of mobile nodes to produce a connected graph and apply random iterations of the modularity [178, 182] and betweenness algorithms to infer the mobile societies. A set of visibly segregated clusters validates their detection and presence in mobile networks.

4.3.3.8 Modularity analysis for mobile societies

Statistically, modularity greater than 0.4 is considered meaningful in detecting community structure. For our data sets, we also find a high modularity index as compared to an equivalent random graph. The comparison is shown in Table 4-2. Hence, the heterogeneous data sets contain tightly knit mobile societies. This analysis further helped

PAGE 40

us to investigate the possibility of existence of different clusters of users based on their proximity in similarity score values.

Figure 4-6. Shown are the structural and spatio-temporal dynamics of mobile societies. They are shown as a function of weighted cosine score, produced from highly positive values. [Graphs omitted. Panels: (A) Dartmouth, (B) IBM Watson, (C) UF, (D) USC.]
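The modularity measure used in Sections 4.3.3.6 to 4.3.3.8 can be illustrated with a short Python sketch. It computes the standard modularity of a given partition on an unweighted, undirected graph without self-loops; the similarity graphs in this chapter may well be weighted, so the unweighted form, the function name, and the input layout are simplifying assumptions.

```python
def modularity(edges, community_of):
    """Modularity Q of a partition of an unweighted, undirected graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community_of: dict mapping every vertex to its community label.
    Q = sum over communities of
        (fraction of edges inside the community)
        - (fraction of edge endpoints attached to the community) ** 2
    """
    m = len(edges)
    if m == 0:
        raise ValueError("the graph must contain at least one edge")
    internal = {}  # number of edges with both endpoints in the community
    degree = {}    # total degree of the vertices in the community
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree.items()
    )
```

In a divisive scheme of the kind described above, a score like this would be evaluated after each round of edge removals, and the partition with the highest value kept. Placing all vertices in a single community yields a modularity of zero, which is why a clearly positive value is read as evidence of community structure.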

PAGE 41

4.3.3.9 Network analysis for mobile societies

We compute the average clustering coefficient and the mean shortest path length of these clusters. We compare the results with a random graph of the same size to understand the variation and the capacity to depict small-world characteristics. Table 4-2 delineates network properties and average modularity that provide details of the structure of mobile societies against a random graph of the same size. The comparative values in the table clearly show that mobile societies can exhibit small-world characteristics. However, we leave such a small-world study for future work.

Based on the above analysis, we find that similarity not only exists among mobile users, but its distributions seem to be stable for different time periods. Furthermore, this trend is consistent in all four traces, which highlights similarity clustering as an important characteristic to capture using mobility models.

4.3.3.10 Similarity in model - the missing link

In this section, we evaluate existing mobility models and contrast their output against real trace results. Trace-based mobility models [29, 53, 68, 107, 134, 147, 151, 200] are a close approximation of realistic human movements and their non-homogeneous behavior. They focus on vital mobility properties like nodes' on/off behavior, connectivity patterns, spatial preferences under geographical restrictions, contact duration, inter-meeting and pause time, etc. We consider two mobility models: the random direction model (a widely used classic mobility model) and the Time Variant Community (TVC) model [109] (due to its capability to capture spatio-temporal mobility properties). In the ensuing text, we briefly describe the TVC model and use it to generate realistic movements. Finally, we compare its result against the similarity characteristic found in real measurements.

We set up the TVC model for two university campuses (IBM Watson and USC) to statistically evaluate the similarity metric established previously. Our goal is twofold:

PAGE 42

- As proposed by the TVC model, we seek to maintain the skewed location visiting preferences and time-dependent mobility behavior of users.
- We seek to analyze whether the TVC model successfully captures similarity among mobile users and quantitatively simulates the distribution that we have seen in the real measurements.

Figure 4-7. Cumulative distribution function of distances for the similarity of users. Real trace curves show a conformance with user pairs for different values of similarity score, while the TVC and Random Direction models have all user pairs in the 0.9 score range. [Plots omitted. Panels: IBM Watson and USC. Series per panel: Trace 4-weeks, Trace 3-weeks, Trace 2-weeks, Trace 1-week, Model [TVC], Random Direction.]

4.3.3.11 Construction of TVC model for campuses

Initially, we determine the number of communities that nodes should periodically visit. We determine that the top 2-3 communities capture most of the skewed location visiting preferences. Then we employ a weekly time schedule to capture the periodic re-visits to these major communities. To keep a fair comparison against the real measurements, we configure the TVC model with the same number of mobile nodes and generate measurements equivalent to a one-month time period with one-day granularity. Finally, for WLAN measurements we assume mobile users are stationary while being online [109].

4.3.3.12 Similarity evaluation

The TVC model accurately demonstrates location visiting preferences and periodic reappearances for both campuses [225]. Surprisingly, it is unable to accurately capture the richness in similarity distribution on a spatio-temporal basis. For all values of similarity score except 0.9, the TVC and Random Direction models yield no user pairs. Figure 4-7

PAGE 43

shows similarity distribution CDF curves for both campuses. We clearly observe a discrepancy between the curves from actual traces and the two mobility models (TVC and random direction).

Figure 4-8. Dendrograms giving a visual representation of hierarchical clustering. Real traces (Figures a and c) show an incremental build-up of components based on the similarity score strength between mobile users. The TVC model (Figures b and d) outputs only one cluster containing all mobile users. Invariably, TVC treats all mobile users as having the same preferences. [Plots omitted. Panels: (a) IBM Watson Campus, (b) IBM Watson Campus TVC, (c) USC Campus, (d) USC TVC. Axes: Mobile Users vs. Distance.]

In addition, the dendrograms in Figure 4-8 show the result of hierarchical clustering based on users' mutual similarity scores. Here, in real traces we find clusters at different similarity scores. In Figure 4-8(a), the average distance of 2.0 has close to 18 small clusters, and Figure 4-8(c) shows 16 small clusters of mobile users. However, the corresponding TVC dendrograms in Figures 4-8(b) and 4-8(d) show only one cluster of mobile users at a distance of 2.0. A possible explanation is that the community assignment in the TVC model creates a homogeneous user population where all nodes are assigned the same communities. While it captures the location visiting and periodic preferences, it fails to differentiate among mobile nodes with different behaviors. What is missing here is a mechanism to assign different locations as the communities to

PAGE 44

different nodes, in a way that reproduces the social structure (clusters) observed in the traces.

Figure 4-9. Analysis for epidemic routing and related protocols. (A, B) show epidemic routing results for the Infocom settings. (C, D) show epidemic routing results for the IBM Watson settings. As seen, the mobility models largely deviate in their network performance for delay and reachability compared to real measurement results. [Plots omitted. Panels: (A) Delay - Infocom, (B) Reachability - Infocom, (C) Delay - IBM, (D) Reachability - IBM. Axes: Time (in sec) vs. Prob(delay > T) for delay; Hops vs. % Nodes Reached (CDF) for reachability. Series: real trace, TVC-Model, Random Direction (on/off), Random Direction.]

Results in this section show that although the TVC model is able to capture location visiting preferences and periodic reappearances, it does not capture the similarity metric distribution and the clusters with different behaviors in the traces. The random direction model also fails on this front in a similar way. This study makes us realize that current mobility models are not fully equipped to handle the behavioral metrics and community behavior of users that form mobile societies. It compels us to revisit mobility modeling in the attempt to capture both individual and community behavior of mobile users, which is part of our on-going study.

PAGE 45

4.4 Routing Protocol Analysis

In this section, we compare routing protocol performance between real-world traces and traces generated by mobility models. Our implementation of epidemic routing takes time-varying mobile encounter sessions as input. Essentially, they serve as a basis for an intermittently connected dynamic network topology setting, where each encounter is viewed as an opportunity to receive and forward messages. We run epidemic routing against the IBM Watson and Infocom traces to measure the performance on two parameters:

1. Reachability: The percentage of nodes that could be reached in multiple hops by a given source, averaged over all nodes in the scenario.
2. Delay: The percentage of nodes that are reached in a given amount of time.

We plot the routing performance in Figure 4-9 to validate real and synthetic traces in all cases of reachability and delay. Surprisingly, although the models claim to exhibit vital mobility characteristics, they dramatically deviate in network routing performance benchmarks. Alongside, a quantitative report is shown in Table 4. We observe that epidemic routing on the Infocom experiment trace takes an average of 11 hops to deliver a message to all other nodes, while it takes only seven in the case of TVC traces and even fewer in the case of the Random models. Meanwhile, for the delay, there is at least a twofold difference between the real measurements and the synthetic traces. The TVC and Random models take much less time to deliver messages than is observed in the real scenario. We find similar results in the case of the IBM Watson traces as well. The difference in reachability is more than 15 hops between the real and TVC model performance in delivering the message.

4.5 Discussion

Mobility models are designed with a particular scenario in mind. However, in this study we would like to question the efficacy of those metrics that are widely adopted or expected to be vital in closing the performance gap between modeling and reality. Now, there is an immense need to identify them and a perception should be made to

PAGE 46

use them for correct estimation. Our results show that current metrics like spatio-temporal preferences and encounter statistics are inadequate, because models completely miss out on the structure and performance criteria. We believe it is important for researchers to search for fundamental characteristics that drive the dynamics in challenged networks. Not only should we maintain the current characteristics, but we should also look out for structural semblance and topological realism between simulation and similarity. A good research direction would be to look into measures that act globally, in a way similar to how routing decisions are made.

4.6 Conclusion

In this chapter, we show that existing models demonstrate the spatio-temporal and encounter statistics seen in real traces. We analyzed the spatio-temporal behavioral similarity profiles among mobile users. We define mobility profiles based on users' association matrices, and then use an SVD-based weighted cosine similarity index to quantitatively compare these mobility profiles. Analysis of extensive WLAN traces from four major campuses reveals rich similarity distribution histograms suggesting a clustered underlying structure. Application of modularity-based clustering validated and further quantified the clustered behavior in mobile societies. Similarity graphs exhibit an average modularity of 0.64 and a clustering coefficient of 0.86, which indicates potential for further small-world analysis. We also scrutinize mobility models on routing performance benchmarks. We compared the similarity characteristics of the traces to those from existing common and community-based mobility models. Surprisingly, existing models are found to generate a homogeneous community with one cluster and thus deviate dramatically from realistic similarity structures. We also find that although the models capture realistic human behavioral patterns, their routing performance deviates from reality. We used the same synthetic traces to run epidemic routing and measure performance. By doing so, we find that the performance of mobility models is not analogous to reality. The synthetic mobility traces indeed carry no structural similarity.

PAGE 47

These dramatic deviations from realism indicate serious flaws in the existing models and their inadequacy as testbed tools for any kind of performance evaluation. In this chapter, we limit our work to verifying epidemic routing against two well-known mobility models. In future, we are looking to test other routing protocols and models that parameterize. We elaborated on the presence of similarity among mobile users and the detection of collective behavior via community detection in wireless networks. We showed the gap between reality and current mobility models in demonstrating collective behavior. In our on-going work, we are developing a multi-dimensional mobility framework that helps scientists to develop mobility metrics, verify current models against realistic settings, and provide guidelines to develop new models. We are looking into a global perspective of clustering and mobility coefficient and maintaining structural properties and performance by revisiting mobility modeling, which is vital for the evaluation and design of next-generation behavior-aware protocols.

PAGE 48

CHAPTER 5
COBRA: A FRAMEWORK FOR THE ANALYSIS OF REALISTIC MODELS

Mobility is one of the main factors affecting the design and performance of wireless networks. Mobility modeling has been an active field for the past decade, mostly focusing on matching a specific mobility or encounter metric with little focus on matching protocol performance. Recently, new generations of wireless and mobile services have emerged based on communal aspects of mobile societies, which are conventionally not considered in mobility modeling. This study investigates the adequacy of existing mobility models in capturing various aspects of human mobility behavior (including communal behavior), as well as network protocol performance. This is achieved systematically through the introduction of a framework that includes a multi-dimensional mobility metric space to measure individual, pair-wise (encounter) and group (community) metrics. We then introduce COBRA, a new mobility model capable of spanning the mobility metric space to match realistic traces. We conduct a methodical analysis using a range of protocol-dependent metrics (epidemic, spray-wait, Prophet, and Bubble Rap) and protocol-independent metrics (modularity) of various mobility models (SMOOTH and TVC) and traces (university campuses, offices, and theme parks). Our results indicate significant gaps in several metric dimensions between real traces and existing mobility models. Our findings clearly show that COBRA matches the communal aspect and realistic protocol performance, reducing the overhead gap (w.r.t. existing models) from 80% to less than 12%, showing the efficacy of our framework based upon the metric space matching. We hope for our new mobility model to provide a more realistic and accurate alternative for mobility modeling, and to aid in the analysis, simulation, and design of future communication protocols.

5.1 Introduction

Mobility modeling, analysis and simulation are essential to the design and evaluation of mobile networking protocols, services and applications. In the past decade, the area of mobility modeling has been quite active in literature with dozens

PAGE 49

of models proposed and used in research. Many of these proposed models were shown to successfully capture one or more characteristics of mobility, sometimes validated using real traces. What is lacking, however, is a benchmarking framework for mobility evaluation, which can systematically assess comprehensive metrics to aid in the characterization and meaningful comparison of these models. Furthermore, the main purpose of these models is the realistic evaluation of protocol performance, which should be an integral part of the framework. Recently, a new generation of mobile networking protocols has been introduced based on communal and structural congruity aspects of mobile users [54, 112, 120]. Such aspects have not been considered in mobility modeling conventionally, and it is important for the benchmarks to include these new aspects and metrics of human mobility. The main challenges lie in introducing and assessing mobility models that capture all these metrics simultaneously in a realistic (matching traces and protocol performance) and practical manner (amenable to large-scale simulations). Particularly, we attempt to answer the following questions:

i. Which aspects of behavior do current mobility models capture (or fail to capture) and to what degree?
ii. How can a model be purposefully designed to capture the various metrics of mobility at will?
iii. How does capturing mobility metrics reflect in the ability to capture realistic protocol performance?

This study re-visits the area of mobility modeling for the purpose of mobile network evaluation, and introduces a multi-dimensional mobility metric space to accurately characterize and benchmark mobility models. The metrics are classified into individual (e.g., spatio-temporal preferences), pair-wise (e.g., encounter based) and collective (e.g., group, community) metrics. In addition to these protocol-independent metrics, a systematic method is adopted to evaluate and compare protocol performance across models and real traces.

Next, a new mobility model is introduced. The salient feature of the new model is capturing the COllective Behavior based on Realistic Aspects of human mobility

PAGE 50

(COBRA). The construction of this model attempts to explicitly capture individual, pair-wise, and group mobility metrics, while maintaining scalability and manageability during simulations. COBRA is then thoroughly analyzed using framework guidelines.

Once the benchmarking framework is in place and the mobility model is defined, the study passes through two phases. First, a systematic protocol-independent analysis is performed to characterize the mobility metrics of real traces, COBRA, and a set of existing mobility models. Extensive traces from several university campuses, conferences, offices, and theme parks are used. In addition, several mobility models are evaluated, including random direction [33], time-variant community (TVC) [108], and SMOOTH [171]. The latter two models are based on real traces and have been shown to capture several important characteristics of mobility. Second, a protocol-dependent analysis is performed using several key DTN protocols, including epidemic routing [237], spray-and-wait [215] and Prophet [153], to evaluate the accuracy with which the models match the protocol performance over network traces. Furthermore, the support for behavior-aware protocols, such as profile-cast [120], Bubble Rap [112], and SimBet [54], is discussed.

The results of our analyses clearly show the shortcomings of existing mobility models over several mobility metrics. Mainly, none of the existing models can capture the communal structure exhibited by groups of real mobile users. Thus, the richness and diversity of user behavioral patterns is not captured to any degree in the existing models. Also, most models were not designed to capture multi-dimensional metrics, triggering the need for a new model (or set of models). COBRA's design, on the other hand, enables it to capture all of the mobility metrics accurately.

Interestingly, the systematic approach to designing COBRA to account for the various mobility metrics also achieves a very close match in all the protocol performance metrics (for all the evaluated protocols), thus closing the significant gap in mobility and protocol evaluation. Specifically, the results have shown that COBRA is 90% accurate in

PAGE 51

demonstrating the similarity patterns observed in real traces. Also, it closely matches the protocol performance (85%-95%) on all different metrics. Contributions of this work are manifold:

1. introducing a systematic mobility benchmarking method, including a multi-dimensional mobility metric space and a framework for thorough mobility and protocol performance analysis,
2. introducing a new mobility model to capture multiple mobility metrics simultaneously. We also plan to release the model implementation and benchmark scenarios as part of this study.
3. characterizing and quantifying the metric gaps in existing mobility models,
4. providing a comprehensive evaluation of performance of DTN protocols over the various models and traces.

Figure 5-1. Framework for analysis and modeling of human contact networks. [Diagram omitted. Blocks: Real Measurements (GPS, Wireless (WLAN and Bluetooth)); Building Blocks for Mobility Models (Mobility Metric Space: Individual, Pair-wise, and Collective Patterns); Mobility Models (COBRA, TVC, SMOOTH, Random); Protocol Independent Analysis (Structural Dynamics, Structural Metrics); Protocol Dependent Analysis (Protocol Performance, Performance Metrics).]
The rest of this paper is outlined as follows. In Section 5.2, we introduce the mobility framework and the multi-dimensional mobility metric space. In Section 5.3, we introduce our new mobility model, COBRA. In Section 5.4, we extensively perform protocol-independent analysis. In Section 5.5, we perform protocol-dependent analysis, and finally we conclude this paper in Section 5.6.

5.2 The Mobility Framework

Future mobile services, applications, and message dissemination paradigms will be influenced by behavior-driven human mobility. For example, spatio-temporal

PAGE 52

preferences of humans (e.g., going to sports complexes and music concerts) will provide insight into their likings and diurnal activities, which can be used for customized services and advertisements. Also, opportunistic communication techniques rely on varying human mobility characteristics (such as inter-contact time and social structures) to efficiently transfer messages in the network. Since mobility and social dynamics impact the performance of routing protocols, it is of critical importance to examine the constituent factors that identify such characteristics and to evaluate models that use them. As shown in Figure 5-1, the proposed framework consists of four major components: I. Real measurements, II. Building blocks for mobility models, III. Mobility characterization using protocol-independent analysis, and IV. Mobility characterization using protocol-dependent analysis. More details will be provided on each of these blocks throughout this paper. These blocks constitute systematic guidelines for developing future models, for generic evaluation of protocol-independent metrics such as similarity and community structures, and for network protocol analysis. Next we describe the components of the framework.

5.2.1 Real measurements:

Recent years have witnessed the use of portable and wearable computing devices and wireless communication infrastructures. Their usage patterns and the data collected through them have helped capture and understand mobile users' behavioral patterns. These patterns then play a crucial role in developing new mobile services and applications. In order to study the accuracy of mobility models, it is imperative to compare them against real measurements [122, 141]. Many studies are done on a purely analytical and theoretical basis and validated through simulations. However, with the benefit of having the ground truth through real measurements, the examination of current models will prove very effective and factual. Furthermore, once confidence is built, such models can be used in untested environments. In the framework, we use real measurements to analyze the effectiveness of mobility models.

PAGE 53

Figure 5-2. Multi-dimension behavioral metric space for human mobility. [Diagram omitted. Dimensions: Individual Patterns (location visiting preferences, periodic re-appearance); Pair-wise Patterns (encounter statistics, inter-contact patterns); Collective Patterns (community structures, spatio-temporal similarity, small world).]

5.2.2 Building blocks for mobility models:

We propose a multi-dimensional mobility metric space that expresses various aspects of human mobility. Each of the dimensions consists of a set of metrics that help to capture specific features. Next, we discuss this metric space.

Recent studies have shown that behavioral preferences and social structures shape human mobility. Since human mobility impacts message dissemination and the performance of routing protocols in communication networks [13, 40], a major emphasis is currently given to accurately capturing these behavioral preferences and structures. In this sense, mobility models play a pivotal role in demonstrating such characteristics and are expected to show realistic performance. Owing to the complexity of understanding human behavioral preferences, we represent them through Multi-dimensional Mobility Metric Spaces as shown in Figure 5-2. The design


of these dimensions makes way to classify commonly used quantitative metrics of human behavior that follow naturally from the understanding of existing mobile services and protocols. These services and protocols rely on spatio-temporal mobility, location-based services, individual patterns, encounters, and communal behavior. The three dimensions are i. individual mobility, ii. pair-wise mobility, and iii. collective mobility patterns. Next, we discuss these dimensions in detail.

5.2.2.1 Individual mobility patterns

Individual patterns focus on the independent behavior of mobile users over space and time. Two related important metrics have been observed by Hsu et al. in [108]. The first is a spatial metric that captures location visiting preferences, measured by the percentage of time a mobile user spends at a given location. The second is a temporal metric that captures periodic reappearances, measured by the probability of visiting the same location after a time gap. Other metrics include speed and pause time. We evaluate these patterns in Section 5.4.2.

5.2.2.2 Pairwise mobility patterns

Pairwise patterns are observed between two encountering mobile users and reflect various statistical aspects of encounter patterns. They provide an insight into opportunities to exchange messages in encounter-based protocols. In [40], the study showed that contact patterns are essential in the design of effective routing and content distribution schemes. Encounter metrics include the number and duration distribution of encounters, and inter-contact time. We precisely define these patterns and evaluate several mobility models for their presence in Sections 5.4.3 and 5.4.4.

5.2.2.3 Collective mobility behavior patterns

Capturing the manner in which humans congregate and disperse based on communal behavior is important at two levels: i) the correlated formation (and breakage) of links affects the performance of mobile networking protocols, and ii) the similarity of behavior exhibited within an identifiable community can be used to inform the design of


new services (including mobile social networking). To capture the community dynamics, metrics that assess the similarity of users are introduced, in addition to clustering mechanisms based on modularity [92, 179]. Metrics include similarity and cluster size distributions. We evaluate the accuracy of several mobility models in replicating these metrics in Section 5.4.5.

We expect that, for accurate human mobility modeling, current models should explicitly incorporate these dimensions in their design. However, current investigation suggests that many earlier models capture them inadequately, and therefore the models' widespread applicability is questionable [102, 209, 228, 232].

5.2.3 Protocol-independent mobility analysis:

In the context of communicating across wireless networks, recent services, protocols, and models have started to exploit the structural dynamics of human social connectivity [54, 108]. These macroscopic structures (such as communities) have been found favorable for the design of efficient opportunistic protocols [88, 101, 112, 120]. They go beyond the simple one-to-one interaction (inter-contact patterns) to showcase the complex longitudinal patterns of how people meet, how often, and for how long [209]. Thus, desirable models should accurately replicate such structures (via synthetic traces). To this end, we propose a protocol-independent analysis of mobility models that involves examining dynamic properties such as spatio-temporal similarity, clustering, and community structures. They are evaluated through metrics such as modularity [179], similarity scores [228], and clustering coefficient [7].

5.2.4 Protocol dependent mobility analysis

The main purpose of mobility models is to reproduce, in simulation, the performance that protocols and services achieve in reality. We identify two types of routing protocols: i) encounter based and ii) community based forwarding protocols. Encounter based forwarding protocols such as epidemic routing [237] utilize human encounters as an opportunity to transfer messages, while community based forwarding protocols such as Bubble Rap [112]


benefit from structural dynamics (communities, etc.) of human mobility to perform message dissemination. The framework recommends evaluating such protocols through performance metrics such as delivery ratio, latency, etc. [132].

Figure 5-3. Probability distribution of location visitation and time. (A) Visitation probability by location. (B) Visitation time by location.

The discussed mobility analysis framework focuses on the multi-dimensional aspects of human mobility that an earlier model should demonstrate. In the subsequent sections, we will use this framework for evaluating current mobility models and for routing protocol analysis.

5.3 COBRA

Today, mobility models have ventured from replicating features of pure stochastic systems such as random walk to more sophisticated ones, which involve demonstrating realistic human behavior and mobility patterns. Nonetheless, earlier models are expected to be simple, scalable, effortlessly configurable, and mathematically tractable. It is preferable for them to be data-driven and able to generate synthetic traces that are comparable to real measurements [141]. Also, models should demonstrate protocol performance identical to reality. Protocols such as Profile-cast [120] and Bubble-rap [112] harness the underlying structural dynamics of human communal behavior to transmit messages. For that purpose, the models' generated traces should reflect such dynamics, which we evaluate using community detection [179]. However, previous studies show that the vast majority of mobility models are inadequate to depict


realistic patterns. For example, in [102], using contact graphs, the authors have shown that models fail to capture bridging links between communities. In [228, 231, 232], we have shown that even carefully crafted models surprisingly result in structural dynamics and protocol performance that dramatically deviate from reality. In [240], the authors have observed that inter-contact patterns in theme parks are best described by a gamma distribution. Similar studies followed in [99, 209]. On the contrary, some earlier models focus exclusively on demonstrating the power-law and exponential decay dichotomy, which limits their adoption and usability in such situations. These findings strongly motivate the need to revisit mobility modeling to depict accurate human behavioral characteristics and network performance.

5.3.1 Initial idea

We design COllective Behavior based on Realistic Aspects of human mobility (COBRA) keeping in mind the limitation imposed by the predisposition of modeling a specific type of stochastic distribution. Instead, we take a natural approach by attempting to encapsulate realistic human behavioral mobility features through explicit spatio-temporal synchronization of a finite set of key activities. These include an orderly distribution of location visitation, structure in time (pause time, weekday and weekend behavior, and offline/online patterns), movement speeds, and social ties. In general, human mobility can be pictured through a set of locations visited during a particular time. The idea fundamental to COBRA is to explicitly synchronize the events (not done in earlier models) leading up to the visit to these locations at a particular time interval for the respective mobile users. For example, COBRA attempts to model the periodic visitation patterns of mobile users attending lectures in a classroom (location). As a result, this approach naturally helps to model human mobility and to test the discussed metrics.

5.3.2 Design and Description of COBRA

In this section, we describe the design of COBRA in more detail. The block diagram of COBRA is shown in Figure 5-4. The model components involve time


structure and pause time, visitation location, epoch length distribution, event and mobility generator, and a trace generator.

Figure 5-4. COBRA architecture. (Diagram labels: Generators: visiting locations distribution, pause time distribution, epoch length distribution, structure in time (weekday/weekend), locations (cells), movement speeds; Statistics Generator; Node Configuration [1, 2, 3, ... N]; Event Generator; Mobility Generator; Trace Generator; Node Traces [1, 2, 3, ... N].)

The model provides the flexibility to independently configure each node's spatio-temporal patterns, thereby capturing heterogeneous behavioral patterns and mobility at will. This makes COBRA distinct from other models and helps to capture the richness otherwise evident only in real measurements. We start with the location visitation patterns of nodes, that is, the probability distribution of the frequencies of their visits to a set of locations. This approach helps to capture skewed (heavily visited) as well as infrequently visited locations, as shown in Figure 5-3A. For example, a mobile user regularly goes to the office, but once in a while (say, on a weekend) goes to a grocery store. In that sense the probability of visiting the office is much higher than that of visiting stores. In the simulation setting, each location is a square geographical area (cell) with constant edge length. Next, the duration of time a node spends in moving to a location is defined by the epoch length. It starts from the end point of the previous location's epoch and is generated from an exponential distribution equal to the size of location. The offline


behavior of the node is thus defined as the travel time from one epoch to another, measured through a speed and a direction (angle) of movement towards the chosen location. A roaming epoch is also defined for when a node roams around the whole simulation area during some epoch, by assigning an additional location that corresponds to the whole simulation. Basically, a node chooses a new location (by probability) and an epoch, and continues to move in that direction with a chosen speed. After each epoch, the node remains stationary in that location for the pause time drawn from the distribution; an example is shown in Figure 5-3B. As evident from both figures, there are a few locations that are frequently visited with large pause times. In addition to that, we also gather periodicity, which provides the flexibility to create multiple time periods with different locations and variable settings. The time periods are essentially the periodicities that are present in human mobility. For example, a weekly periodicity can be going to work during the weekdays and spending the weekend at home, or attending classroom lectures three times a week. The epoch lengths and pause times therefore depend on the time periodicity. The model is data-driven and generates synthetic traces that can be compared against real measurements using the metrics discussed in the framework section.

We model the time dependent location selection process through Markov chains that maintain the spatio-temporal heterogeneity of individual nodes in the simulation area. Let $I$ be a countable set associated with the location visitation probabilities, where each $i \in I$ is a location state and $I$ is defined as the simulation state space of all locations. Also, $\lambda = (\lambda_i : i \in I)$ is a measure on $I$ if $0 \le \lambda_i < \infty$ for all $i \in I$, and we require $\sum_{i \in I} \lambda_i = 1$. Let the probability space be $(\Omega, \mathcal{F}, \mathbb{P})$ and let a random variable $X$ take values in $I$, $X : \Omega \to I$. We define the distribution of $X$ as $\lambda_i = \mathbb{P}(X = i) = \mathbb{P}(\{\omega : X(\omega) = i\})$. We use $X$ to model the location that a user takes, with value $i$ taken with probability $\lambda_i$. Let the matrix $A = (a_{ij} : i, j \in I)$ have every row $(a_{ij} : j \in I)$ a distribution. Then $(X_n)_{n \ge 0}$ is a Markov chain with initial distribution $\lambda$ and location transition matrix $A$ if, for all $n \ge 0$ and $i_0, \ldots, i_{n+1} \in I$,

(i) $\mathbb{P}(X_0 = i_0) = \lambda_{i_0}$;
(ii) $\mathbb{P}(X_{n+1} = i_{n+1} \mid X_0 = i_0, \ldots, X_n = i_n) = a_{i_n i_{n+1}}$.

Next, we see a discrete


time location selection is essentially a Markov process:

\[ \mathbb{P}(X_0 = i_0, \ldots, X_N = i_N) = \lambda_{i_0} a_{i_0 i_1} \cdots a_{i_{N-1} i_N} \tag{5-1} \]

Proof: Let $(X_n)_{0 \le n \le N}$ be Markov$(\lambda, A)$; then

\[ \mathbb{P}(X_0 = i_0, X_1 = i_1, \ldots, X_N = i_N) = \mathbb{P}(X_0 = i_0)\, \mathbb{P}(X_1 = i_1 \mid X_0 = i_0) \cdots \mathbb{P}(X_N = i_N \mid X_0 = i_0, \ldots, X_{N-1} = i_{N-1}) = \lambda_{i_0} a_{i_0 i_1} \cdots a_{i_{N-1} i_N}. \]

If Eq. 5-1 holds for $N$, then summing both sides over $i_N \in I$ and using $\sum_{j \in I} a_{ij} = 1$ shows that Eq. 5-1 also holds for $N - 1$. By induction, for all $n = 0, \ldots, N$,

\[ \mathbb{P}(X_0 = i_0, \ldots, X_n = i_n) = \lambda_{i_0} a_{i_0 i_1} \cdots a_{i_{n-1} i_n}. \]

In particular $\mathbb{P}(X_0 = i_0) = \lambda_{i_0}$, and for $n = 0, 1, \ldots, N - 1$,

\[ \mathbb{P}(X_{n+1} = i_{n+1} \mid X_0 = i_0, \ldots, X_n = i_n) = \frac{\mathbb{P}(X_0 = i_0, \ldots, X_n = i_n, X_{n+1} = i_{n+1})}{\mathbb{P}(X_0 = i_0, \ldots, X_n = i_n)} = a_{i_n i_{n+1}}. \]

That proves $(X_n)_{0 \le n \le N}$ is Markov$(\lambda, A)$.

Next, we show that the selection of the next location cell is independent of the earlier history and is based only on the probability values. Let $(X_n)_{n \ge 0}$ be Markov$(\lambda, A)$. Then, conditional on $X_m = i$, $(X_{m+n})_{n \ge 0}$ is Markov$(\delta_i, A)$ and is memoryless with respect to the random variables $X_0, \ldots, X_m$.

Proof: For a location selection event $S$ determined by $X_0, \ldots, X_m$, we need

\[ \mathbb{P}(\{X_m = i_m, \ldots, X_{m+n} = i_{m+n}\} \cap S \mid X_m = i) \tag{5-2} \]
\[ = \delta_{i i_m} a_{i_m i_{m+1}} \cdots a_{i_{m+n-1} i_{m+n}}\, \mathbb{P}(S \mid X_m = i) \tag{5-3} \]


by considering $S = \{X_0 = i_0, \ldots, X_m = i_m\}$, we show that

\[ \frac{\mathbb{P}(X_0 = i_0, \ldots, X_{m+n} = i_{m+n} \text{ and } i = i_m)}{\mathbb{P}(X_m = i)} = \delta_{i i_m} a_{i_m i_{m+1}} \cdots a_{i_{m+n-1} i_{m+n}} \frac{\mathbb{P}(X_0 = i_0, \ldots, X_m = i_m \text{ and } i = i_m)}{\mathbb{P}(X_m = i)}, \]

which is proved by Eq. 5-1. A general location selection event $S$ determined by $X_0, \ldots, X_m$ is a disjoint union $S = \bigcup_{k=1}^{\infty} S_k$ of such elementary events. Then Eq. 5-2 follows for every $S$ by adding the respective identities for the $S_k$. Next, we describe model parameters and construction.

5.3.3 Parameters and Model Construction of COBRA

Here, we outline the parameters required to construct the mobility scenario. As a general rule, location information and visitation probability, activity time (pause), epoch length, and movement speeds among locations are needed. The structure in time (activity time, epoch, etc.) is necessary to capture the time dependent behavior.

5.3.3.1 Determine the structure for location visitation

Each node in the simulation area may have skewed location visiting preferences. This skewed pattern is represented by a probability distribution that shows the popularity of visiting a set of certain locations. The number of locations and the probability of visiting them ($L = \{l_1, \ldots, l_n\}$) are decided for all the nodes. We generate this distribution for an individual node by dividing each location's visit frequency $V(l_i)$ by the total visit frequency over all visited locations. Thus, $P(l_i) = V(l_i) / \sum_{j=1}^{n} V(l_j)$, such that $\sum_{i=1}^{n} P(l_i) = 1$.

5.3.3.2 Determine the structure for activity time, epoch, and period

The pause interval (activity time) is the distribution of time ($T(L) = \{t(l_1), t(l_2), \ldots, t(l_n)\}$) during which a mobile user is online doing some activity at visited locations. We generate the distribution of activity time for all the respective locations. This activity time is synchronized across all mobile users and breeds encounters among users with similar location visitation preferences.
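The visitation distribution of Section 5.3.3.1 and the path probability of Eq. 5-1 can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the location names, visit counts, and transition matrix are hypothetical values chosen for the example, and the function names are not taken from the COBRA simulator.

```python
import random


def visitation_probabilities(visit_counts):
    """Normalize per-location visit counts V(l_i) into P(l_i)."""
    total = sum(visit_counts.values())
    if total <= 0:
        raise ValueError("at least one visit is required")
    return {loc: count / total for loc, count in visit_counts.items()}


def path_probability(initial, transition, path):
    """Eq. 5-1: lambda_{i0} * a_{i0 i1} * ... * a_{i(N-1) iN}."""
    prob = initial[path[0]]
    for src, dst in zip(path, path[1:]):
        prob *= transition[src][dst]
    return prob


def sample_path(initial, transition, steps, rng):
    """Draw X_0 from the initial distribution, then follow the chain."""
    locations = list(initial)
    current = rng.choices(locations, weights=[initial[l] for l in locations])[0]
    path = [current]
    for _ in range(steps):
        row = transition[current]
        current = rng.choices(list(row), weights=list(row.values()))[0]
        path.append(current)
    return path


# Hypothetical visit counts for one node (illustrative values only).
counts = {"office": 60, "home": 30, "store": 10}
initial = visitation_probabilities(counts)

# Hypothetical transition matrix A; every row is a distribution.
transition = {
    "office": {"office": 0.5, "home": 0.4, "store": 0.1},
    "home": {"office": 0.7, "home": 0.2, "store": 0.1},
    "store": {"office": 0.2, "home": 0.8, "store": 0.0},
}

path = sample_path(initial, transition, steps=5, rng=random.Random(7))
print(path, path_probability(initial, transition, path))
```

Passing a seeded `random.Random` instance keeps the sampled location sequence reproducible, which is convenient when synthetic traces are compared against a fixed set of real measurements.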


The epoch length is the distribution of time $E(L) = \{e(l_1), e(l_2), \ldots, e(l_n)\}$ during which a node is offline, moving, unable to connect, or not visible to other users. In COBRA, the mobility of a node has various epochs, and when a node chooses to move from one location to another, it starts from the end epoch of the previous location and continues until it reaches the new destination. The epoch length is calculated using the difference of the simulation time and the activity time for the set of locations that the node has visited in the past. In general, a node has a large epoch length when it is offline for most of the time, while a small epoch length shows greater online activity and a higher probability of encountering other nodes. Thus for a total simulation time without activity time $S$, the epoch length for a location $l_i$ is $e(l_i) = \sum_{i=0}^{n} (S - t(l_i))$. In order to calculate the activity time $U(L) = \{u_1, u_2, \ldots, u_n\}$ for the locations of a mobile user, we determine the total amount of time it plans to spend online during the simulation. In reality, human mobility has a structure in time. Humans have a tendency to visit the same places; for example, employees visit the workplace during the daytime, students have fixed class timings, etc. Once this structure is decided, the simulation drives the visitation patterns, activity, and epoch lengths accordingly for those periods.

5.3.3.3 Movement speed

When a node chooses another location, it starts from the end point of the previous activity time and starts a new epoch with the new location and epoch length. The node moves towards the new location with a random speed uniformly distributed in $(V_{min}, V_{max}, V_{avg})$ with a direction towards the destination location.

We have automated the configuration process to help create simulation scenarios and generate traces. The COBRA simulator also generates ONE simulator compatible files for protocol analysis, network graph files to analyze structural dynamics, and Google Earth KML files for crowd simulation. Next, we perform protocol independent analysis.
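The construction steps of Section 5.3.3 can be pictured with a minimal single-node trace generator. This is a simplified sketch, not the COBRA simulator: it assumes each location is represented by the center of its cell, draws pause times from an exponential distribution with a per-location mean, and takes the offline epoch to be the straight-line travel time at a speed drawn uniformly between `v_min` and `v_max`. All names and numeric values are hypothetical.

```python
import math
import random


def generate_trace(cells, visit_prob, mean_pause, v_min, v_max, sim_time, rng):
    """Return a list of (time, x, y, state) records for one node."""
    names = list(cells)
    weights = [visit_prob[n] for n in names]
    current = rng.choices(names, weights=weights)[0]
    x, y = cells[current]
    t = 0.0
    trace = [(t, x, y, "arrive:" + current)]
    while t < sim_time:
        # Activity (pause) time: the node is online at the current location.
        t += rng.expovariate(1.0 / mean_pause[current])
        trace.append((t, x, y, "depart:" + current))
        # Epoch: the node is offline while it travels to the next location.
        nxt = rng.choices(names, weights=weights)[0]
        nx, ny = cells[nxt]
        speed = rng.uniform(v_min, v_max)
        t += math.hypot(nx - x, ny - y) / speed
        current, x, y = nxt, nx, ny
        trace.append((t, x, y, "arrive:" + current))
    return trace


# Hypothetical scenario: cell centers in meters, pause means in seconds.
cells = {"office": (0.0, 0.0), "home": (300.0, 400.0), "store": (600.0, 0.0)}
visit_prob = {"office": 0.6, "home": 0.3, "store": 0.1}
mean_pause = {"office": 900.0, "home": 600.0, "store": 120.0}

trace = generate_trace(cells, visit_prob, mean_pause,
                       v_min=1.0, v_max=2.0, sim_time=3600.0,
                       rng=random.Random(1))
for record in trace[:4]:
    print(record)
```

Running the same generator for several nodes that share pause schedules at common locations is one way to picture the synchronization of activity time described above; weekday and weekend periods would need separate parameter sets, which this sketch omits.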


5.4 Protocol Independent Analysis

In this section, we carry out protocol independent analysis on several mobility models, including COBRA, and on real world measurements. The purpose of this study is to compare the accuracy of models against reality in capturing human structural dynamics, and to validate COBRA and test its superiority over other models. While several metrics are mentioned in the framework, we focus our study on i) spatio-temporal preferences, ii) similarity in mobile societies, iii) encounter statistics, and iv) clustering based on modularity.

5.4.1 Mobility Models Studied

We use currently available mobility models and evaluate them through the metrics proposed in the protocol independent analysis of the framework. These mobility models have been shown to capture human behavioral dynamics such as spatial and temporal preferences. For baseline comparison, we have also used a variant of the random mobility model. The models include: i) the Time-Variant Community (TVC) model, ii) SMOOTH, and iii) the Random Direction model. We set up the TVC model [108] to examine the skewed location visiting preferences and time dependent mobility behavior of users, and to check whether TVC successfully captures protocol independent metrics and, if so, to what extent. SMOOTH is based on the concept of the power-law and exponential decay dichotomy of encounter statistics [171]. Its main feature is the ability to demonstrate inter-contact time, flight duration distribution, and pause times in mobile users. In the random direction model [203], a mobile node makes random mobility decisions with respect to current time or location, independent of other users. We set up this model to investigate the effect of random movements on performance. We configure this model in two ways: i) a base model as described in [33]; ii) a variant that adds on/off behavior to users (i.e., when a node is off, it cannot receive or transmit packets), which corresponds to the fact


that mobile devices are not always turned on. While there are several models, we did our best to select the ones that showed an edge over others [54, 164, 173].

Figure 5-5. Spatio-temporal analysis of models. (A) Location visitation patterns (locations sorted by visit time vs. fraction of visit time). (B) Periodic re-appearance (time in days vs. probability of re-appearing). Series: IBM Trace, TVC, COBRA.

5.4.2 Analysis of Spatio-Temporal Preferences

In reality, mobile users behave non-homogeneously in both space and time. In general, two metrics are important in capturing such behavior: i) location visiting preferences and ii) periodic reappearances [108]. In Figure 5-5 we show the plots for these metrics as exhibited in real measurements as well as in the synthetic measurements of both the TVC and COBRA models. We see that TVC and COBRA were able to accurately capture the above metrics. In addition to that, COBRA was able to capture the average node degree, the hitting time, and the meeting time (not shown for page limits). Since other models are not designed to replicate these characteristics, we are unable to study them at this point in time.

Footnote 0: Analysis is done while working at Disney Research, Zurich, 2012.

5.4.3 Similarity in Mobile Societies

We examine the distribution of similarity values among node pairs as proposed in [120, 228]. This metric can be expanded to generate mobile societies with similar preferences that aid in better dissemination of messages [112, 120]. In this metric, an


on-line association matrix is used to represent users' temporal and location preferences, and a Singular Value Decomposition (SVD) of this matrix captures the dominant behavioral patterns. A cosine product of its weighted eigen-behavior vectors gives a quantitative score that defines similarity among user pairs. For a pair of users with eigen-vectors $X = \{x_1, x_2, x_3, \ldots, x_{r_x}\}$ and $Y = \{y_1, y_2, y_3, \ldots, y_{r_y}\}$, the similarity is calculated as

\[ Sim(X, Y) = \sum_{i=1}^{rank(X)} \sum_{j=1}^{rank(Y)} w_{x_i} w_{y_j} \, |x_i \cdot y_j| \]

$Sim(X, Y)$ is a quantitative measure that represents the closeness of two users in the spatio-temporal dimension. The value of similarity lies in the range $0 \le Sim(X, Y) \le 1$. A value near one shows that a pair of users is very similar.

Figure 5-6. Distribution of similarity. (A) Dartmouth. (B) USC. Axes: user pairs (CDF) vs. similarity score. Series: real trace, COBRA, SMOOTH, TVC, RDP, RDP-Pause.

We show the results of the similarity distribution for Dartmouth and USC in Figure 5-6. We find that a range of similarity scores exists, capturing the heterogeneous behavior among user pairs in real measurements. For example, in Dartmouth, 90% of pairs have a similarity score of less than 0.6, and for USC 85% have a score of less than 0.5. The analysis of user pairs generated from the models shows that COBRA is very accurate in demonstrating a distribution of similarity scores akin to reality. In contrast, TVC and the variants of random models show that 90% of users have a similarity score of 0.85 and 0.8 for Dartmouth and USC respectively, which largely deviates from reality. Since SMOOTH is designed to demonstrate a power-law distribution of inter-contact times, we find this is the main reason that it is


able to distribute the similarity patterns somewhere in the middle, while still deviating from reality.

Figure 5-7. Inter-contact time distribution. (A) Infocom'05, ICT. (B) IBM Watson, ICT. Axes: time vs. P[X > T]. Series: IBM Trace, COBRA, SMOOTH, TVC.

5.4.4 Analysis of Human Encounter Statistics

In [40], Moore et al. proposed an important set of encounter statistic metrics that include inter-contact times (ICT), encounter frequencies, and encounter durations, which are critical in analyzing transfer opportunities between wireless devices carried by humans. They have also established that these statistics exhibit a power-law and exponential decay dichotomy, which helps in studying the impact of human mobility on forwarding protocol performance in opportunistic networks. We have extensively analyzed these metrics in real world measurements and in the models' generated synthetic measurements to study the accuracy of the latter in demonstrating real human mobility patterns. In view of the page limit, we show the ICT results only for the Infocom and IBM Watson measurements in Figure 5-7. In the case of the Infocom measurements, SMOOTH, TVC, and COBRA accurately demonstrate the ICT patterns evident in the real measurements. This makes sense for SMOOTH and TVC, which had previously been shown to capture the power-law and exponential decay dichotomy [108, 171]. However, in the case of the IBM measurements (where the distribution of ICT does not follow the power-law and exponential decay dichotomy), only COBRA reproduces the distribution accurately. This analysis indicates the scalability and benefit of


adopting COBRA where the underlying human encounter statistics do not necessarily follow a power-law and exponential decay dichotomy (e.g., [240]).

Figure 5-8. Modularity based clustering distribution for IBM. Axes: cluster vs. P[X > N]. Series: IBM Trace, COBRA, SMOOTH.

5.4.5 Community Detection through Modularity Optimization

In order for models to imitate reality, it is important that they reproduce the real world social and structural dynamics of human behavioral mobility. We use modularity optimization [179] to detect the structural dynamics of mobile societies (communities) in real traces and in the models' generated synthetic traces. Initially, we represent their social ties through a contact graph [102] and apply a divisive algorithm [92] to compute the modularity score and the number of detected independent communities in the real IBM Watson measurements and in the corresponding synthetic measurements from various models. The distribution of cluster sizes and the probability of mobile users belonging to one of those clusters are compared in Figure 5-8. The divisive algorithm detected eight communities in the IBM Watson measurements and nine in the COBRA and SMOOTH generated synthetic measurements. Furthermore, we find that cluster sizes and membership follow a power-law distribution that is also captured by COBRA. However, in the case of SMOOTH the community memberships are evenly distributed within clusters and largely deviate from reality. In the case of TVC and RDP (not shown in Figure 5-8), we detect only one community.

In this section, we have examined protocol independent metrics that capture human behavioral patterns. Our results indicate that current models largely deviate from the


reality and are inadequate in synthesizing human mobile societies. However, COBRA has accurately replicated all the examined statistics, indicating its easy adoption and scaling to any kind of scenario. Next, we investigate protocol dependent metrics and compare the performance of routing protocols.

Figure 5-9. Results for encounter based forwarding protocol analysis. Each panel compares the real trace with COBRA, SMOOTH, TVC, RDP, and RDP-Pause for Epidemic, Prophet, and Spray and Wait. (A) IBM, delivery probability. (B) IBM, latency. (C) IBM, overhead ratio. (D) Dartmouth, delivery probability. (E) Dartmouth, latency. (F) Dartmouth, overhead ratio. (G) Theme Park, delivery probability. (H) Theme Park, latency. (I) Theme Park, overhead ratio. (J) USC, delivery probability (top 30%). (K) USC, latency. (L) USC, overhead ratio.

5.5 Protocol Dependent Analysis

Here, we compare the network protocol performance of mobility models (including COBRA) to the performance achieved with the real measurements. We divide this


analysis into two parts. First we examine the performance of encounter based forwarding protocols (e.g., epidemic routing) and then of community based forwarding protocols (e.g., Bubble Rap). For a fair analysis, we use the same set of traces that we generated for the protocol independent analysis.

Figure 5-10. Results for community based forwarding protocol analysis. (A) IBM Watson, delivery (top 30%). (B) IBM Watson, latency. (C) IBM Watson, overhead ratio. Series: IBM Trace, COBRA, SMOOTH, TVC, RDP, RDP-Pause.

5.5.1 Protocol Evaluation Metrics:

We benchmark protocol performance on the following three criteria: i) delivery probability, ii) latency, and iii) overhead ratio. The delivery probability is the percentage of successfully received messages, latency is the delay incurred, and overhead ratio is the cost of delivering the messages (hops, control information, etc.). A protocol is expected to have high delivery probability and low latency and overhead. We use the ONE simulator [132] to run this protocol performance analysis.

5.5.2 Encounter based Forwarding Protocol Analysis

We evaluate three different encounter based forwarding protocols: i) Epidemic Routing, ii) Spray and Wait, and iii) Prophet. Epidemic routing extends the general notion of message flooding in the network. This is achieved by exchanging messages among the encountering users in the mobility space [237]. The Spray and Wait routing mechanism sprays a number of copies of the message into the network, and then waits until one of the intermediate mobile users encounters the destination [215]. Prophet uses a probabilistic metric known as delivery predictability, which is set


for each source-destination pair, indicating the forecasted chance of that source user delivering a message to the destination [153].

Analysis: We show the results of the encounter based forwarding protocols for real and model generated measurements of IBM Watson, Dartmouth, the theme park, and USC in Figure 5-9. In broad terms, COBRA is found to perform better than other models on all three metrics. Overall, COBRA's delivery ratio deviates less than 5% from reality, while other models on average deviate more than 40%. Similarly, COBRA's latency differs by less than 6% and its overhead ratio by less than 10% from reality. Surprisingly, other models' latency and overhead on average deviate more than 66% and 80% respectively from reality. Although SMOOTH and TVC show some proximity to the Dartmouth trace, the random models largely deviate and in most cases overestimate the performance. In particular, COBRA's performance in the case of the theme park measurements is worthy of attention, because the encounter statistics in this case follow a gamma distribution (not the typical power-law and exponential decay dichotomy [240]). In general, we find that, first, COBRA outperforms all other models in its category by 80% in realistically replicating the performance and, second, it is adaptable and scalable to any underlying distribution of encounter statistics.

5.5.3 Community based Forwarding Protocol Analysis

We study the performance of the Bubble Rap routing protocol to examine the usefulness of mobility models in utilizing human community dynamics to transfer messages in opportunistic settings [112]. Bubble Rap is a decentralized routing protocol that takes advantage of heterogeneous interactions among individuals and their communities. We set up our simulation such that each user belongs to at least one community.

Analysis

We show the results of the community based forwarding protocol for real and model generated measurements of IBM Watson in Figure 5-10. The protocol performance of


COBRA differs from the real measurements by less than 6% in delivery probability, by less than 2% in latency, and by less than 10% in overhead. On the other hand, other models on average deviate from reality by more than 10% in delivery probability, by more than 60% in latency, and by more than 45% in overhead. These results indicate that COBRA is superior at demonstrating heterogeneous interactions among individuals and their communities for social message forwarding purposes.

5.6 Conclusion

In this paper, we proposed a new framework for the analysis of data-driven human mobility models that included a multi-dimensional mobility metric space to measure individual, pair-wise, and community metrics. In addition, it has systematic guidelines for protocol dependent and independent analysis of mobility models. We also proposed COBRA, a new mobility model that captures the COllective Behavior based on Realistic Aspects of human mobility. We then demonstrated the ability of COBRA to replicate several protocol independent metrics, such as spatio-temporal preferences, akin to the real measurements. Also, COBRA's performance is closer to reality than that of other models of its class for several networking protocols. In particular, COBRA's encounter based protocol performance was better than that of other models by more than 80%. It was also superior to other models at demonstrating community based protocols (less than 10% deviation). In summary, our work showed a need for a systematic mobility testing framework, which we achieved in this work. With COBRA, we were able to bridge the gap between current models and human behavioral mobility modeling. Finally, we hope our work will set a standard for the future development of new mobility models and their systematic testing.


CHAPTER 6
MEASUREMENT AND VEHICULAR DENSITY ESTIMATION

In this section, we give details of the collected geo-location information of cameras used for the analysis of topological properties and recorded vehicular images captured from these cameras to model and characterize the vehicular traffic.

6.1 Topology Data of Camera Locations

Traffic web cameras are deployed on key intersections and highways within every city. Thus, we can assume these locations are representative of urban streets of that city. We start by recording cameras' geo-coordinates and location information to study the topological properties of urban streets. This includes latitude and longitude, zip code, state, directional view, and camera location. Later on, we use this data to create a network graph of urban streets. Figure 6-1 shows the distributed system architecture for vehicular imagery collection on planet-scale. The aggregate dataset collected so far is shown in the Table 6-1 that has more than ten regions around the world along with camera details and total number of vehicular records.

Table 6-1. Main global webcam dataset

Region         # of Cameras   Duration                  Interval   Records         Database Size   Routes
Bangalore      160            30/Nov/10 - 01/Mar/11     180 sec    2.8 million     357 GB
Beaufort       70             30/Nov/10 - 01/Mar/11     30 sec     24.2 million    1150 GB
Connecticut    1120           21/Nov/10 - 20/Jan/11     20 sec     10 million      435 GB          74,801
Georgia        1200           30/Nov/10 - 02/Feb/11     60 sec     32 million      1400 GB
London         182            11/Oct/10 - 22/Nov/10     60 sec     3 million       201 GB          32,580
London (BBC)   400            30/Nov/10 - 01/Mar/11     60 sec     20 million      1050 GB
Newyork        1160           20/Oct/10 - 13/Jan/11     15 sec     26 million      1200 GB
Seattle        121            30/Nov/10 - 01/Mar/11     60 sec     8.2 million     600 GB          7,656
Sydney         67             11/Oct/10 - 05/Dec/10     30 sec     4 million       350 GB          4,422
Toronto        189            21/Nov/10 - 20/Jan/11     30 sec     5 million       325 GB          43,055
Washington     240            30/Nov/10 - 01/Mar/11     60 sec     5 million       400 GB          59,809
Total          4909           -                         -          140.2 million   7468 GB         114,942

6.2 Vehicular imagery data collection

There are thousands, if not millions, of outdoor cameras currently connected to the Internet, which are placed by governments, companies, conservation societies, national parks, universities, and private citizens. We view the connected global network of webcams as a highly versatile platform, enabling an untapped potential to monitor global


trends or changes in the flow of the city, and providing large-scale data to realistically model vehicular, or even human, mobility.

Figure 6-1. Distributed architecture for vehicular imagery collection on a planet scale. (Diagram components: regional camera feeds for Asia, Europe, the Americas, and Australia; a Fast Ethernet switch; a local intranet; media, backup, and file servers.)

Table 6-2. Used global webcam datasets

City         # Cameras  Duration             # Images      # Routes
Connecticut  274        21/Nov/10-20/Jan/11  7.2 million   74,801
London       181        11/Oct/10-22/Nov/10  1 million     32,580
Seattle      121        30/Nov/10-01/Mar/11  8.2 million   7,656
Sydney       67         11/Oct/10-05/Dec/10  2.0 million   4,422
Toronto      208        21/Nov/10-20/Jan/11  1.8 million   43,055
Washington   240        30/Nov/10-01/Mar/11  5 million     59,809
Total        1091       -                    25.2 million  222,323

The majority of these webcams are deployed by a city's Department of Transportation (DoT). Although it is not possible to deploy them at every intersection or highway, they are strategically placed to capture the traffic trends at critical locations. At regular intervals, they capture still pictures of ongoing road traffic and send them in the form of feeds to the DoT's media server. We have developed crawlers that collect vehicular mobility traces from these servers. For the purpose of this study, we have also made agreements with DoTs with large city coverage to collect this vehicular imagery data for several months. We cover cities in North America, Europe, Asia, and Australia. Overall (only six of the ten cities are presented in detail, in Table 6-2), we download 15 gigabytes of imagery data per day from over 2,700 traffic web cameras, with an overall dataset of 7.5 terabytes containing


around 125 million images. Since these cameras provide better imagery during the daytime, we limit our study to only those hours. Table 6-1 gives high-level statistics of the dataset collected so far. Each city has a different number of deployed cameras and a different interval at which images are captured. We believe our study is comprehensive and reflects major trends in traffic movement. Next, we discuss the algorithm to extract traffic information from images.

In Table 6-2, we detail the six regions used in this study and the extent of the data and time span of the sample. The snapshots taken at every camera (at intervals ranging from 20 to 60 seconds) first pass through a background estimation and subtraction phase. These are then used to estimate the traffic density arriving per unit time, as opposed to a car count.

While a car count might seem preferable to a traffic density measure, there are several practical challenges. A car count requires a far greater computational cost due to the effort required to isolate each object. Traffic congestion further complicates matters when cars occlude each other, making it difficult to segregate cars based on edge structures. In addition, vehicles at the far end of the road are small in the image and cannot be detected by these algorithms.¹

¹ Another solution could be to count only cars that are close to the camera. While this is definitely an option for video data, for snapshot data it would result in those distant cars having left the scene before the next snapshot. The net effect is that the maximum observed car count at a junction is truncated, causing problems in the multivariate analysis later on.

Since these cameras do not have night vision, we limit our study to 7 am-6 pm. On average, we download 15 gigabytes of imagery data per day from over 2,700 traffic cameras, with an overall dataset of 7.5 terabytes containing around 125 million images. In this chapter, for a fair comparison, we have selected only six regions with nearly similar time granularity of traffic snapshots, as shown in Table 6-2. In Figure 6-2, we


show a geographical snapshot of the cameras deployed in the regions of London and Sydney, as an example. The area covered by the cameras in London is 950 km², while that in Sydney is 1500 km².

Figure 6-2. Traffic cameras in (A) London and (B) Sydney. The red dots show the locations of the deployed cameras.

6.2.1 Background Subtraction

Background subtraction is a standard method for object localization in image sequences with fixed cameras, where the frame rate is lower than the velocity of the objects to be tracked (i.e., cars typically move out of the scene in less than 1 minute). Background models are based on the observation that the background does not change significantly (in comparison to foreground objects) across time. Any part of an image that does not fit that model is deemed foreground (an object). These foreground regions are then further processed for the detection of desired objects.

The background model used here assumes that the distribution of background pixel values may be modeled as a weighted sum of Gaussian distributions. Our approach closely follows those proposed by [28, 211, 218] because of their reliability and robustness to sensitive changes in the lighting conditions. In our approach, the observed pixel value is modeled by a weighted sum of Gaussian kernels. Let $x_t$ represent a pixel value in the $t$-th frame; then the probability of observing this value is assumed to be:

\[ p(x_t) = \sum_{i=1}^{K} w_i^t \, \mathcal{N}(\mu_i^t, \Sigma_i^t) \tag{6-1} \]


where $\mathcal{N}(\mu_i^t, \Sigma_i^t)$ is the $i$-th kernel with mean $\mu_i^t$ and covariance matrix $\Sigma_i^t$, and $w_i^t$ is the weight applied to that kernel such that $\sum_i w_i^t = 1$. We assume that the RGB channels are uncorrelated; thus the covariance matrix for each kernel is diagonal.² When a new frame arrives, the pixel values are compared to the kernels to determine whether it is likely that the value was drawn from a distribution $\mathcal{N}(\mu_i^t, \Sigma_i^t)$ (using, for example, a 95% confidence interval). If so, $\mu_i^t$, $\Sigma_i^t$ and $w_i^t$ are updated using exponential filters; if not, a new kernel is created and the existing kernel with the lowest $w_i^t$ is eliminated (see [218] for specifics). Short-lived kernels and their associated pixels are deemed to be possibly foreground, producing a binary map. Morphological operations are then applied to this map to remove noise and any blobs with an area smaller than a certain threshold.

² Thus reducing the number of unknown parameters.

The view of most cameras used in this study is along the direction of the road, and this perspective skews the size of objects in an image [78]. To counter this effect, we weigh each foreground pixel with the exponent of its distance from the bottom of the image. Thus a pixel at the bottom of the image will be weighted less than a pixel at the top (objects appear larger at the bottom than at the top). While this weighting is not exact and does produce some warping, as we shall see in the next section, the warping is not excessive, and the method is simple and does not require manually tuning each camera.

6.2.2 Algorithm for Traffic Density Estimation

We aim to estimate traffic density (d) on roads considering the number of vehicles or pedestrians crossing the road. A sample of traffic images is shown in Figure 6-3. We have a sequence of images captured by webcams. For our problem, we have to be able to separate the information we need, e.g., the number of vehicles and pedestrians, from the background image, which is normally the road and the surrounding buildings. The main factor that can distinguish between vehicles and the background image (road, buildings) is the fact that the vehicles are not stationary for a long period of time, however


the background is stationary.

Figure 6-3. A set of pictures for an intersection with varying traffic intensities: (A) d = 2023, 0.28; (B) d = 5400, 0.55; (C) d = 9230, 0.93. This variation is captured by the density parameter d. The first value is the result of background subtraction and the latter is the normalized value.

The solution to the problem then seems to be applying a sort of high-pass filtering over a sequence of images captured by a webcam over time. The high-pass filter removes the stationary part of the images (road, buildings, etc.) and keeps the moving components (mainly vehicles). In order to implement such a high-pass filter, we subtract the result of a low-pass filter over a sequence of images from each still image. This is practically equivalent to implementing a high-pass filter over the sequence of images. In order to obtain the low-pass filtering effect, we run a moving average filter over a time sequence of images obtained from one webcam. The duration of the moving average filter can be adjusted in an ad hoc way. The moving average filter is simply implemented by averaging over the intensity map for several images in a certain duration. At the output of the moving average filter, the intensity of each pixel is obtained by averaging the intensity of the corresponding pixels in the interval. The output of the moving average filter (low-pass filter) is normally the required background image, which is the still part of the image. Therefore, subtracting the output of the low-pass filter from each image gives us the moving components (e.g., vehicles). Having the high-pass component of the image, the vehicles are highlighted against the background. One could then use regular object detection techniques to identify and count the number of vehicles in the high-pass filtered image. However, this is computationally expensive and unnecessary.
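The moving-average filtering described above can be sketched in a few lines. This is only a minimal illustration under stated assumptions, not the implementation used in this study: frames are taken to be small grayscale images stored as nested lists, the window simply contains every frame passed in, and the absolute difference is used for the subtraction. The function names `moving_average_background` and `high_pass` are hypothetical.

```python
def moving_average_background(frames):
    """Low-pass filter: per-pixel mean over a window of grayscale frames."""
    count = len(frames)
    rows, cols = len(frames[0]), len(frames[0][0])
    return [
        [sum(frame[r][c] for frame in frames) / count for c in range(cols)]
        for r in range(rows)
    ]


def high_pass(frame, background):
    """High-pass component: frame minus background (absolute value, by assumption)."""
    return [
        [abs(pixel - bg) for pixel, bg in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


# Three snapshots of an empty road followed by one with a bright vehicle
# in the middle column.
road = [[10, 10, 10], [10, 10, 10]]
with_car = [[10, 200, 10], [10, 200, 10]]
frames = [road, road, road, with_car]

background = moving_average_background(frames)
residual = high_pass(with_car, background)
```

In this toy input the stationary pixels cancel out, and only the column occupied by the vehicle keeps a large value in `residual`. A longer window would pull the background estimate closer to the empty road.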


As an alternative, we simply count the number of active pixels (pixels with a value higher than a certain threshold). This is much faster than detecting and counting objects in an image. At the same time, it is more effective, because we are looking at the traffic density (d), i.e., the percentage of the street (road) which is covered by vehicles (as an indicator of how crowded the street is), rather than the number of vehicles. The number of vehicles is not a good indicator of crowding, as a long vehicle may introduce more traffic than a small one. In addition, our method overcomes the issues that object detection faces in case of severe congestion. Counting the number of active pixels indicates what percentage of the road is covered, no matter how many vehicles are on the road. In many instances, images are duplicated or corrupted, with zero size or with extraneous bytes (noise). We use semi-supervised learning and hierarchical clustering to overcome the challenges of outlier detection and removal.

Figure 6-4. Outlier detection and removal. (A) Outliers present, detected by encircling them. (B) Outliers removed, showing the factual traffic density distribution. (Both panels plot traffic densities against image count.)

6.2.3 Outlier Detection and Removal

Collecting images on such a large scale requires automated processes to manage and extract useful information. As mentioned, different cameras have different refresh rates, so we have to continuously download images at a specific time interval for each camera. To ensure that we do not miss even a single traffic snapshot, we keep our download interval a little shorter than the camera refresh interval. However, this results in a few duplicate images that we filter out as a first step


towards outlier detection and removal.

Table 6-3. Summary of regression analysis

Camera  df   f0 (α = 0.95)   f1 (α = 0.95)  R²      p  ρ
1       100  -1.19 ± 0.046   0.03 ± 0.003   0.7922  0  0.91
2       100  -3.25 ± 0.130   0.09 ± 0.007   0.8579  0  0.92
3       100   8.16 ± 0.045   0.10 ± 0.005   0.9308  0  1.00
4       100   8.16 ± 0.045   0.10 ± 0.005   0.9308  0  1.00
5       100   8.16 ± 0.045   0.10 ± 0.005   0.9308  0  1.00
6       100  -2.13 ± 0.112   0.07 ± 0.008   0.7499  0  0.88

Figure 6-5. A comparison of empirical traffic densities with the number of cars recorded: (A) Camera 5, (B) Camera 11, (C) Camera 15.

Normally, the downloaded dataset contains images that are snapshots of vehicular traffic on the roads. But in many instances, the images are corrupted, with zero size or with extraneous bytes (noise). Next, if the camera is non-functional or has mechanical errors, the traffic monitoring server replaces the current traffic snapshot with an error notification image. The challenge here is to detect all such errors and remove them before modeling and statistical analysis. The analysis becomes more complex because we do not know the underlying distribution, and hence statistical techniques that rely on a particular distribution (box plots, etc.) cannot be used. We used semi-supervised learning and hierarchical clustering based on distances to overcome the challenges of outlier detection and removal in millions of traffic images. Due to limited space, we omit the details; however, the results of outlier detection and removal are shown in Figure 6-4.
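Since the details are omitted here, the following is only an illustrative sketch of what such a pipeline could look like, not the method used in this study. It assumes that empty and duplicate downloads can be found by hashing the raw bytes, and it stands in for the distance-based hierarchical clustering with single-linkage clustering of per-image density values in one dimension, keeping the largest cluster. The semi-supervised step is not shown. The names `drop_empty_and_duplicates`, `largest_gap_cluster`, and `max_gap` are hypothetical.

```python
import hashlib


def drop_empty_and_duplicates(images):
    """Keep (name, data) pairs whose bytes are non-empty and not seen before."""
    seen = set()
    kept = []
    for name, data in images:
        if not data:  # zero-sized download
            continue
        digest = hashlib.sha256(data).hexdigest()
        if digest in seen:  # exact duplicate of an earlier snapshot
            continue
        seen.add(digest)
        kept.append((name, data))
    return kept


def largest_gap_cluster(densities, max_gap):
    """Single-linkage clustering in 1-D, cut at max_gap; return the largest cluster.

    In one dimension, cutting a single-linkage dendrogram at height max_gap is
    the same as splitting the sorted values wherever neighbours differ by more
    than max_gap.
    """
    if not densities:
        return []
    ordered = sorted(densities)
    clusters = [[ordered[0]]]
    for value in ordered[1:]:
        if value - clusters[-1][-1] > max_gap:
            clusters.append([value])
        else:
            clusters[-1].append(value)
    return max(clusters, key=len)


images = [
    ("a.jpg", b"\xff\xd8abc"),
    ("b.jpg", b""),
    ("c.jpg", b"\xff\xd8abc"),
    ("d.jpg", b"\xff\xd8xyz"),
]
kept = drop_empty_and_duplicates(images)

densities = [2100, 2300, 2250, 2400, 35000, 2200, 31000]
inliers = largest_gap_cluster(densities, max_gap=1000)
```

In this toy input, `b.jpg` is dropped as empty and `c.jpg` as a duplicate of `a.jpg`, and the two isolated density values far above the rest fall outside the largest cluster. The cut height `max_gap` would have to be chosen per camera.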


Table 6-4. Details of geo-coordinate measurements

City         # Pairs  Avg. Dis. (km)  Avg. Time (min)  Area
Connecticut  74801    6.4             42               60%
London       32580    1.56            26               53%
Sydney       4422     3.2             33               67%

6.2.4 Ground Truth for Validation

To test the performance of the car density capture, six cameras were selected at random and 102 images from each were examined by hand to produce a ground truth count for the number of cars. This ground truth was then regressed against the measured car density to check that the relationship is linear. The regression from three cameras is shown in Figure 6-5 and shows a reasonable fit. There are some outliers, especially at low levels of traffic, and there also appears to be a slight non-linear relationship between the ground truth and the measured car density due to the warping effect of perspective (discussed above). Table 6-3 shows the summary statistics for the regression analysis, including Spearman's correlation coefficient, ρ, which seems to imply that there is a perfect monotonic (possibly non-linear) correlation for cameras 3 to 5.³ Overall, the analysis shows that while there are some errors, the relationship between the actual and measured number of cars is sufficiently clear to allow analysis at a network level.

³ The other notation in Table 6-3 is standard regression notation: df denotes the degrees of freedom; f0 and f1 are the regression coefficients of the fitted line y = f1 x + f0; R² is the percentage of variance explained; p is the p-value.

6.2.5 Geo-Coordinates Pairing

We collect the physical information of these cameras as reference points for the structural analysis. The physical information of these cameras includes latitude and longitude coordinates, zip code and state, directional view, and camera location. These cameras are installed at critical intersections and highways, so a study involving such locations will give a good estimate of their structural significance. After


gathering the physical information of the cameras, we calculate the driving distances and corresponding driving times for all camera pairs within each city. We use the Google Maps API to gather information about all such pairs of locations. The details for three regions are shown in Table 6-4.
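The pairing step can be sketched as follows. This is a minimal sketch under assumptions: pairs are taken to be ordered (origin, destination) pairs, which is consistent with the pair counts reported for London (181 cameras, 32,580 pairs) and Sydney (67 cameras, 4,422 pairs) but does not match every city exactly. The routing call is deliberately left as a caller-supplied function, `lookup_route`, which is hypothetical and does not represent the Google Maps API. The camera identifiers and coordinates below are made up for illustration.

```python
from itertools import permutations


def ordered_camera_pairs(cameras):
    """Return every ordered (origin, destination) pair of distinct cameras."""
    return list(permutations(cameras, 2))


def pair_table(cameras, lookup_route):
    """Build {(origin_id, destination_id): (distance_km, time_min)} for all pairs.

    lookup_route is a caller-supplied function (hypothetical here) that returns
    the driving distance and driving time between two coordinates.
    """
    return {
        (a["id"], b["id"]): lookup_route(a["coord"], b["coord"])
        for a, b in ordered_camera_pairs(cameras)
    }


# Illustrative cameras; a stub replaces the real routing service.
cameras = [
    {"id": "cam-1", "coord": (51.50, -0.12)},
    {"id": "cam-2", "coord": (51.52, -0.10)},
    {"id": "cam-3", "coord": (51.49, -0.14)},
]
table = pair_table(cameras, lambda origin, destination: (1.0, 2.0))

london_pairs = len(ordered_camera_pairs(range(181)))
sydney_pairs = len(ordered_camera_pairs(range(67)))
```

Keeping the routing lookup as a parameter makes it easy to cache results or swap services, since the number of requests grows quadratically with the number of cameras.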


CHAPTER 7
SPATIAL AND TEMPORAL ANALYSIS OF PLANET SCALE VEHICULAR DATA

7.1 Introduction

Vehicular traffic congestion is becoming an ever-increasing problem around the world. In the latest (2010) urban mobility report [59], congestion caused urban Americans to travel an extra 4.8 billion hours and to purchase an extra 3.9 billion gallons of fuel, for a cost of $115 billion. On average, the yearly peak-period delay caused by traffic congestion for the average commuter was 34 hours, and the cost to the average commuter has increased by 230% in two decades [59]. Congestion affects people not only during the peak period but also at other hours; approximately half of total delays occur at midday and overnight.

In attempting to address this problem, most current approaches focus on isolated efforts to improve conditions as and when problems arise at a few locations. To date, the transportation and engineering sciences have proposed to get as much service as possible in three ways. First, by timing the traffic signals so that more vehicles see green lights, improving road and intersection designs, or adding short sections of roadway. Second, by adding more capacity in critical corridors: new streets and highways, new or expanded public transportation facilities, and larger bus and rail fleets. Third, by changing usage patterns, such as flexible work hours and avoiding travel during rush hours. We believe these approaches are currently insufficient unless coupled with a comprehensive picture of city structure and of the traffic distribution across its key intersections in a collective manner.

To do so, we propose a systematic approach that first analyzes the structural dynamics of these cities and then performs a temporal analysis of the vehicular traffic flow in them. One way to study structural dynamics is to look into driving distances and corresponding times across several intersections. A deviation from the general notion that a small driving distance corresponds to a small driving time, and similarly that large


distances require long times, can be used to identify the critical sections of cities that are prone to congestion. It is evident that narrow roads with few lanes, closely situated intersections, merges, and bifurcations result in longer travel times despite relatively short distances. Furthermore, a temporal analysis of the traffic distribution across several hours of the day, during weekdays and weekends, gives important insight into the distribution and correlation of traffic in that city. Finally, combining the distribution of traffic with the structural dynamics of cities will provide comprehensive knowledge of locations and times, and insight into the efficacy of prior measures.

Recently, several departments of transportation (DoTs) have installed online traffic web cameras at key intersections to monitor current trends in traffic flow. At regular intervals, these cameras capture still pictures of ongoing road traffic and send them in the form of feeds to a media server. We develop an automatic script to acquire images at a finer interval of around 30 seconds per image. We develop a fast background image subtraction algorithm to extract the traffic densities from these images for the purpose of analyzing the traffic on roads. We also use location information (geo-coordinates) of these cameras to calculate driving distances and driving times (using Google services) between all pairs of locations. We perform k-medoid clustering on the camera pairs based on the distances and compare the result with the clusters formed using driving time for the same pairs. The discrepancy between the driving distance and time clusters provides a good basis for identifying locations that are prone to traffic congestion. Later, we use the longitudinal density values extracted from images for these cities to show the distribution of traffic across several hours of the day and across several weeks. In summary, our contributions include:

1. A novel approach to collect vehicular traffic flow and driving information using publicly available traffic cameras.
2. To the best of our knowledge, we provide by far the largest and most extensive library of vehicular density data, based on processing of millions of images. This


addresses a severe shortage of such datasets in the community. The library will be made available to the research community in the future.
3. Application of correlation and spatio-temporal analysis to effectively identify patterns in, and make predictions about, current traffic problems.

This chapter is organized as follows. In Section 7.2, we provide background to this work. In Section 7.3, we explain the structural dynamics of the city and apply clustering to the cities' structural datasets. In Section 7.4, we perform a temporal analysis of traffic densities and discuss the results in detail. Finally, we conclude the chapter in Section 7.6, detailing future directions.

7.2 Related Work

Several measures have been proposed recently to counter traffic congestion and provide better management of traffic throughput. From the transportation and civil engineering point of view, separate efforts have been made to improve junctions, bus and express lanes, and car pooling, to name a few. City planning and urban design [18] practices can have a huge impact on levels of future traffic congestion. In [175], clustering is applied to study vehicular traffic through a sequence of traffic lights on a highway where all signals turn on and off synchronously, and the dynamical behavior of vehicles is clarified by analyzing traffic patterns. The results show that the clustering of vehicles varies with the cycle time of signals and is controlled by varying both the split and the cycle time of signals. Along the same lines, in [27], the authors studied a simple aggregation model that mimics the clustering of traffic on a one-lane roadway and derived an analytical solution for the probability of a single car and an asymptotically exact expression for the joint mass-velocity distribution function. CORSIM [96] and VISSIM [155] are two commonly used simulators for micro-modeling of traffic. In other studies, researchers attempted to model congestion and traffic using mathematical models and to derive closed-form expressions. In [15], Bando et al. proposed a dynamical model of traffic congestion based on the equation of motion of each vehicle; by analyzing the stability of traffic flow, the evolution of


traffic congestion is observed over time. The implications of empirical time headway distributions of traffic flow [241] and the underlying stochastic process have been shown to model fluctuations of traffic flow. However, these approaches also lack a comprehensive study of traffic congestion and its effect on other non-congested segments of the city.

To the best of our knowledge, our approach is the first of its kind to apply data mining on such a large dataset to profile cities and model traffic densities. In the next few sections, we introduce our dataset and explain the algorithm to extract relevant information from the imagery data.

7.3 Spatial Analysis of Cities

In this section, we study the geographical access and city dynamics in terms of driving distance and respective time between a set of cameras, for each city separately. By default, driving time and the corresponding distance are correlated, but the organic growth of cities, the number of traffic signals, and the number of lanes and width of roads, to name a few factors, can affect this correlation. A high correlation value reflects a congestion-free traffic system and normal movement of vehicles. On the other hand, a low to negative correlation indicates proneness to traffic congestion and an uneven distribution of traffic speeds. Knowledge of this kind is important, for example, for hospital emergency vehicles, where patients are much more sensitive to time than to distance differentials [192].

Clustering using K-medoid: K-medoid [18] is a hard partitioning algorithm whose objective is to partition the input data $X = \{x_1, x_2, x_3, \ldots, x_k, \ldots\}$ into $c$ clusters. We used this algorithm with the distance measure that we have from the geo-coordinate information. From the given dataset of camera pairs $X$, the K-medoid algorithm allocates each data point to one of the $c$ clusters, thereby minimizing the intra-cluster sum of squares

\[ \sum_{i=1}^{c} \sum_{k \in A_i} \| x_k - v_i \|^2 \]


where $A_i$ is the set of data points (distances between camera pairs) in the $i$-th cluster and $v_i$ is the center of cluster $i$. This center is the object nearest to the mean of the data in the cluster, so that $V = \{ v_i \in X \mid 1 \le i \le c \}$.

Table 7-1. Clustering analysis

City         D.I.    # Clusters  Deviation
Connecticut  0.001   6           6%
London       0.0002  10          52%
Sydney       0.005   4           5%
Toronto      0.0008  5           10%

Cluster Size

Each city is defined by its own cluster sizes. K-medoid is a hard partitioning method that requires the number of clusters as input; however, this number is not available a priori. We used the Dunn Index [61] (D.I.) to measure the quality of the clustering for different values of $k$. This index is defined as

\[ D(c) = \min_{i \in c} \Big\{ \min_{j \in c,\, i \neq j} \{ W \} \Big\} \tag{7-1} \]

\[ W = \frac{\min_{x \in C_i,\, y \in C_j} \mathrm{dist}(x, y)}{\max_{k \in c} \big\{ \max_{x, y \in C_k} \mathrm{diam}(x, y) \big\}} \tag{7-2} \]

where $\mathrm{dist}(x, y)$ is the distance between points of two different clusters, $\mathrm{diam}(x, y)$ is the diameter of a cluster, and $C_i$, $C_j$ are the clusters.

The main goal of the measure is to maximize the inter-cluster distances and minimize the intra-cluster distances. Therefore, the number of clusters that maximizes $D(c)$ is taken as the optimal number of clusters.

Analysis

We separately apply k-medoid clustering in two dimensions on geo-coordinate pairs of cameras, first clustering them based on driving distance and then on the corresponding driving time between them, for each city. To measure the variation, we compare the deviation in cluster membership. This helps (i) to discover the irregularity caused by spatial


Figure 7-1. K-medoid clustering of the distance and time pairings for the cities: (A) Connecticut (D), (B) London (D), (C) Sydney (D), (D) Toronto (D), (E) Connecticut (T), (F) London (T), (G) Sydney (T), (H) Toronto (T). Panels A-D plot camera pairs against distance and panels E-H plot camera pairs against time. The size and boundaries of the clusters show a non-linear distance and time correspondence.


features around these locations, and (ii) to identify the sets of cameras that are similar according to specific metrics. In order to identify the correct number of clusters, we used the Dunn index [61], since k-medoid clustering is a hard partitioning method.

Figure 7-2. Cluster analysis for (A) Connecticut, (B) London, (C) Sydney, and (D) Toronto, showing cluster sizes for distance and time. The error bars show the deviation in the membership of objects for clusters. Quantitative results of clustering show that Connecticut's geo-spatial distribution is more streamlined.

We sample the distance and time distributions of all camera pairs to investigate whether there is a linear mapping between these metrics. Surprisingly, we find a deviation in these statistics, as shown in Figures 7-1 and 7-2. The bar graphs show the sizes of the clusters generated after running 1000 iterations of the k-medoid algorithm. The error bars are the deviation in the data points (camera pairs) that are present in a distance cluster but not in the corresponding time cluster, and vice versa for time. In Table 7-1, we show this deviation, which is a function of the membership of a pair of coordinates that belongs to one cluster. A correlation exists when a pair of coordinates falls in the same cluster number for both time and distance. In the city of London, we find that road intersections are very close together, and hence a tendency toward slow traffic is inherent in the city


infrastructure. In the state of Connecticut, clusters 1, 4, 5 and 6 have large deviation and have a comparatively larger number of small-distance intersections. For Sydney, the clusters have very low deviation, and an analysis of the clusters shows an even distribution of traffic across all its roads. This further suggests that Sydney is better planned and has less organic growth compared to other cities. We find similar results for other cities as well; however, for brevity we discuss only three of the ten cities. In Table 7-1, the average deviation among cluster memberships shows the inconsistency between distance and time. Our analysis suggests that distance and time variation indicates one of the main features of congestion and slow traffic in the cities.

7.4 Temporal Analysis of Traffic Densities

In the previous section on spatial analysis, we discussed the clustering variation for geodesic driving distance and time among a set of critical intersection points of cities. At any given time, these statistics remain the same unless cities are structurally modified. In this section, we focus more on the temporal aspect of traffic distribution in these cities. We use traffic densities extracted from the imagery dataset of cameras to perform this analysis. In this activity we ask the following questions.

Q.1: What is the nature of the traffic distribution for these cameras? Do all cameras have the same distribution?
Q.2: Is the nature of the distribution predictable over a long period of time?
Q.3: Which events cause deviation in the traffic distribution? How can such events be identified?

In order to get an impression of the traffic distribution, we start with a qualitative analysis of densities in three cities. This step helps to get a high-level representation of, and reasoning about, the behavior of traffic. Later on, we make it a logical base to investigate precise quantitative information.
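One way to make question Q.2 quantitative is to correlate a camera's densities at one hour of the day with its densities a few hours later, across days, which is the kind of lagged correlation examined later in this chapter. The sketch below uses the standard computational form of the Pearson coefficient. The data layout (a dictionary from hour of day to one density value per day), the function names, and the numbers are assumptions made for illustration only.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (computational form).

    Raises ZeroDivisionError if either sequence has zero variance.
    """
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_xx - sum_x ** 2) * sqrt(n * sum_yy - sum_y ** 2)
    return numerator / denominator


def lag_correlation(hourly, hour, lag):
    """Correlate densities at `hour` with those at `hour + lag` across days.

    hourly maps an hour of the day to a list of densities, one per day.
    """
    return pearson(hourly[hour], hourly[hour + lag])


# Made-up densities for one camera over four days.
hourly = {
    7: [1, 2, 3, 4],
    8: [2, 4, 6, 8],   # rises and falls with the 7 AM values
    9: [4, 3, 2, 1],   # moves opposite to the 7 AM values
}
one_hour = lag_correlation(hourly, hour=7, lag=1)
two_hour = lag_correlation(hourly, hour=7, lag=2)
```

With these toy values the one-hour lag is perfectly positively correlated and the two-hour lag perfectly negatively correlated, up to floating-point rounding. Real traces would fall in between.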


Figure 7-3. Auto-correlation function (ACF) for different types of traffic from four cities: (A) low traffic density, (B) high traffic density, (C) periodic traffic density, (D) random traffic density. (Each panel plots the coefficient against lags in days, from 0 to 21.)

7.4.1 Traffic Distribution

To answer the first question, we sample the dataset into hourly representations for a period of 42 days and use a spectral method to generate the relationship between various days. A sampled view taken from three cities shows that cameras have varying traffic distributions, contrary to the popular notion of 'rush hours'. We find that it is difficult to estimate an aggregate statistical parameter that describes all cameras. For example, in Figure 7-5A, we find a very low level of traffic for a period of 42 days, while the opposite holds in Figure 7-5B, with consistently high traffic, indicating a busy market area. One interesting pattern is the reduction during the weekends (days 7, 14, etc.). In Figure 7-5C, the temporal activity reaches its height during the morning and evening hours, with relative smoothness during the afternoon hours. Finally, in Figure 7-5D, we found


that the patterns are not very regular, and we find it really hard to estimate a correct traffic model.

Figure 7-4. Averaged daily distribution of traffic for four cities: (A) low traffic, (B) high traffic, (C) periodic traffic, (D) random traffic. (Each panel plots traffic density against day of the week, Monday to Sunday, over six weeks.)

Figure 7-5. A 42-day traffic distribution snapshot from four different cameras: (A) low traffic density, (B) high traffic density, (C) periodic traffic density, (D) random traffic density. (A) shows relatively mild traffic during various hours of the day, while (B) shows high traffic for the full trace period. In (C), we find regular patterns during the morning and evening hours, when the traffic is relatively higher than in the afternoon. Random traffic is recorded in (D).

This kind of variation in traffic poses challenges in developing an aggregate forecasting mechanism for citywide traffic. It also rejects the popular notion of 'rush


hours', since different cameras seem to possess their own distributions of traffic during varying hours of the day.

Figure 7-6. Hourly variability of the traffic. (The plot shows the percentage of cameras against the correlation coefficient (r) for 1-, 2-, 3-, and 4-hour lags.)

Figure 7-7. Cumulative distribution functions of camera auto-correlation of traffic for (A) Connecticut, (B) London, (C) Sydney, and (D) Toronto. (Each panel plots the CDF (% cameras) against the correlation coefficient (r) for 1-, 2-, 3-, and 4-hour lags.)

7.4.2 Correlation in Traffic Distribution

Level of Traffic Congestion: The traffic congestion is a function of the maximum amount of traffic that a camera has ever experienced in the duration of the trace analysis. Mathematically, it is defined as follows. For a camera $C_p \in C$, $C = \{C_1, C_2, \ldots, C_n\}$, at hour $h$ of day $k$, the traffic density is $d(C^{k}_{1h})$, and the level of congestion is


Figure 7-8. A 42-day hour-by-hour average correlation of traffic densities with four lags. Panels A-D show Connecticut, E-H London, I-L Sydney, and M-P Toronto, for lags of 1, 2, 3, and 4 hours respectively. (Each panel plots the correlation coefficient (r) against the hour of the day.)
\[ T = \frac{d(C^{r}_{pq})}{\sum_{i=1}^{n} \sum_{h=1}^{12} \sum_{d=1}^{m} \max\big(d(C^{k}_{ih})\big)} \tag{7-3} \]

7.4.3 Traffic Congestion Correlation

We use correlation coefficients to measure the strength and direction of the linear relationship between a camera's traffic congestion values at different times. In our case, we use this technique to correlate the traffic change during several hours of the day.

\[ \rho(C^{k}_{1h}, C^{k}_{2h}) = \frac{n \sum d(C^{k}_{1h})\, d(C^{k}_{2h}) - \big(\sum d(C^{k}_{1h})\big)\big(\sum d(C^{k}_{2h})\big)}{\sqrt{n \sum d(C^{k}_{1h})^{2} - \big(\sum d(C^{k}_{1h})\big)^{2}}\; \sqrt{n \sum d(C^{k}_{2h})^{2} - \big(\sum d(C^{k}_{2h})\big)^{2}}} \tag{7-4} \]

We analyze the correlations for 1-4 hour lags for each camera

PAGE 94

with itself. For example, we investigate the correlation between the traffic at 7 AM and 8 AM, 7 AM and 9 AM, etc. We use the following to calculate the correlation coefficient $\rho$. For two cameras $C_1$ and $C_2$ at hour $h$ of day $k$, the traffic densities are $d(C^{k}_{1h})$ and $d(C^{k}_{2h})$, respectively. The correlation of their densities is given in Eq. 7-4.
The value of $\rho$ lies in the range $-1 \le \rho \le +1$. If two hour lags have a strong positive linear correlation in the traffic values, $\rho$ approaches $+1$; for a strong negative linear correlation, $\rho$ is close to $-1$. If there is no linear correlation or only a weak one, $\rho$ is close to zero. A value near zero means that any relationship between the two hours' traffic values is random or nonlinear. To analyze the traffic change, we investigate the correlation of traffic at four different time lags, from one to four hours. To do this, we sample the traffic densities of each camera into hours from 7 A.M. to 7 P.M. We then find the correlations between hours across 42 days. For example, we find the correlation coefficient between the 42 daily traffic densities at 7 A.M. and those at 8 A.M., at 7 A.M. and 9 A.M., and so on. We accumulate the results for each camera separately and compile them into the CDFs shown in Figure 7-7. We see that for Connecticut and Sydney (Figures 7-7A and 7-7C), the hourly traffic change is highly correlated: for almost 80% of cameras, the next hour's traffic is 70% correlated with the current hour. More surprising is the two-hour difference for cameras in these two cities, where for 80% of cameras the traffic two hours ahead is only 50% or less correlated with the current hour, and around 60% of cameras show 30% correlation at a lag of 3-4 hours. In London, the next hour's traffic density for 80% of cameras is close to 60% correlated with the current hour. This drops to 30% for a two-hour lag and to around 15-20% for a 3-4 hour lag. These observations tell us that London traffic fluctuates more than that of Connecticut and Sydney, which are rather smooth. Any prediction model built on this requires more insight into the transitioning traffic for London and Connecticut because of their high variability. After looking into a

PAGE 95

high-level picture of traffic change over several consecutive hours, we move on to analyze the variability that individual hours contribute to the CDFs of Figure 7-7. This helps identify the hours of the day at which traffic changes substantially compared with the preceding hours. We start with Connecticut, where the correlation for the next hour is low at 7 AM and 12 noon, and is also relatively low for 2-4 hour lags. London has a relatively low correlation of around 50% at 7 AM, which drops sharply to less than 1% for 2-4 hour lags. However, afternoon traffic remains about 40% correlated for 1-2 hour lags, and correlation is very high during the later hours of the day, at 60%-80% for 1-3 hour lags. London's hourly traffic fluctuates more during the morning and early evening but is relatively stable around noon and in the evening, which is the opposite of Connecticut, although the average correlation for London is only about 50%-60%. As expected for Sydney, the traffic is about 60%-70% correlated for a one-hour lag and shows a nearly consistent correlation of 40%-55% for 2-4 hour lags. This agrees with the trends seen in the previous section, where the CDFs of Sydney are relatively more stable and highly correlated. This analysis provides a stepping stone toward a model for predicting the next hour's traffic from the current condition. More details are available in [230].
Predictability: An important criterion is the identification of patterns that change the traffic densities over a long period of time. In Figure 7-6, we show the deviation in predicting the traffic given the current density. Using the correlation coefficient, we learn how strongly traffic is correlated across different hours of the day. From the figure we can see that the immediately following hour is highly predictable, while the traffic around four hours from the current hour is not strongly correlated. Thus, any model that bases its prediction of future traffic on the current traffic value alone can produce a wrong estimate (see Figure 7-8). The goal of this study is to investigate the underlying distribution of the time series data and find repeating patterns. We use the auto-correlation function at various lags to observe any

PAGE 96

possible dependence. We compute the autocorrelation between every pair of arrays of size m.
To answer the second question, we perform a stability analysis and investigate the auto-correlation within the data set. In doing so, we find that although the traffic patterns differ from camera to camera, the nature of the traffic remains stable and mostly predictable over the full length of the trace, including weekends. The auto-correlation function measures the variation and shows nearly correlated traffic. Thus, from this result we assume that the traffic mostly follows a similar trend over the trace period. The results are shown in 7-3.
In the case of the third activity, we find that working hours and weekdays show the most traffic, with a periodic dip during weekends and holidays. This is a reasonable finding and gives a set of events that are inter-related. The results are shown in Figure 7-4.
In summary, we find that it is hard to generate an aggregate model that can identify the traffic on a citywide basis. However, spectral analysis can be used to classify cameras into classes such as morning rush hours, daytime rush hours, or evenly distributed traffic. Second, the traffic is very predictable with itself and demonstrates high correlation across days. Finally, the traffic has a lot of regularity with itself.
7.5 Spatio-Temporal Analysis
In this section, we use the findings from the spatial and temporal activity to analyze the spatio-temporal patterns of traffic. We model the camera locations of each city as a graph $G = (V, E)$, where $V$ is the set of vertices representing camera locations and $E$ is the set of edges (roads), with the width of each connecting edge representing the correlation of traffic congestion between the two cameras. The size of a vertex represents the amount of congestion experienced by that camera. By modeling in this way, we get a weighted graph view of the spatio-temporal activity pattern at the scale of the city. It in

PAGE 97

[Figure 7-9 panels: A Connecticut, B London, C Sydney, D Toronto; axes: G(T) and G(L).]
Figure 7-9. The spatio-temporal congestion is modeled as a graph.
turn provides qualitative reasoning about the locations and their interconnecting roads (as edges) that experience more traffic than usual. A thin link represents congestion levels that are uncorrelated at the two locations, while a thick link shows congestion levels that are highly correlated at the two locations. Let us discuss them now.
On a plane, we use a geographical mapping of coordinates to represent the locations, and the distance between them as the length of the edges interconnecting these locations, for all four cities. The actual geographical coordinates are transformed (G(T) and G(L) in Figure 7-9) to better visualize the links and locations.
Analysis: To evaluate each city, we perform the analysis at various hours and analyze the traffic congestion trends. We systematically sampled 42 days of traffic congestion into hours for each camera and calculated a correlation coefficient between all camera pairs. In the graph, we input this correlation as the width of the link interconnecting the two locations. To calculate the traffic congestion at the level of each camera, we simply took the correlation coefficient of the traffic over the 42 days. This is input to

PAGE 98

the graph as vertices of varying size representing the level of congestion experienced at each point in the graph.
The results of this analysis are shown in Figure 7-9. To make the picture clearer, we filtered the edges and nodes with 0.7 T. The sparseness of a graph is also a measure of how dense the congestion is in that city. Our results show that in London traffic is present on all links and during many hours of the day, although only a few nodes experience more traffic than usual. This corroborates our previous results from the temporal analysis, where we found that London traffic varies in nature and is relatively more congested than in the other cities. Using this methodology, we can also visually inspect specific sections of a city with different traffic congestion estimates. Connecticut shows similar statistics for its congestion trends. Internally, we observed that the total time during which traffic is present on the links is smaller for Connecticut than for London. Thus, the latter experiences traffic for longer hours.
As expected, Sydney has less traffic as well as less congestion than the other cities. Not only are the links less congested, but individual locations also experience very little traffic. We found similar results for Toronto, with very few links experiencing congested traffic and only a few routes that are prone to congestion; otherwise the city has good traffic management.
As a side note, once a threshold is set, the level of sparseness in a graph also indicates the nature of the congestion experienced by that city. In London and Connecticut, the traffic on the roads is higher than in Sydney and Toronto. Not only do the locations experience congestion, but the correlation among the locations, that is, the road network, also reflects heavier traffic.
In summary, we carried out a spatio-temporal analysis of traffic cameras and traffic flow. We started our analysis by spatially mining the cameras to model distance and time. We find that the spatial analysis has shown the difference between cities' structural

PAGE 99

dynamics. We find that London is more prone to congestion. We can infer that in London, even for small distances, the driving time is relatively large. One of the reasons is small roads and a large number of intersections within relatively short distances. Second, from our temporal analysis we also find that London traffic is not only less correlated but also fluctuates widely, which makes its traffic values harder to predict. The spatio-temporal analysis has shown that many links are congested. On the other hand, for Connecticut and Sydney we find that spatial and temporal correlations are higher and show a relatively stable traffic distribution. The traffic of Toronto falls between these two extremes: it is busy during a very small portion of the day and otherwise shows stable behavior in the temporal and spatio-temporal mining.
7.6 Conclusion
The congestion problem has become widespread in recent years. In this work, we introduced a novel data collection method for vehicular traffic using globally available webcams, and a novel method to mine vehicular imagery data for measuring the level of traffic congestion. We studied four cities to analyze the spatio-temporal distribution of traffic. The spatial analysis gave an impression of the internal road networks of each city and their mapping to driving time. A variation in such statistics indicates a city that is highly susceptible to congestion, which results in non-uniform traffic. In the second step, we used the traffic estimates from 42 days of measurement to analyze their patterns across days and weeks. The correlation results for different hour lags indicated the stability of the traffic. The temporal analysis shows that traffic congestion is variable in nature and can be selective and different across time scales. We find that London traffic is very weakly correlated and hence difficult to predict. Our results show that: (i) high correlation between driving time and distance indicates congestion-free traffic; (ii) traffic follows certain patterns that are stable for a long time (42 days); (iii) traffic congestion shows high correlation (80%) for 1-2 hour lags and then decreases significantly to 25-30% for a four-hour lag. In the future, we look forward to developing

PAGE 100

a framework for real-time analysis of traffic and to offering it as a data mining tool for the community.

PAGE 101

CHAPTER 8
MODELING AND CHARACTERIZATION OF VEHICULAR MOBILITY
8.1 Introduction
Research in the area of vehicular networks has increased dramatically in recent years. With the proliferation of mobile networking technologies and their integration with the automobile industry, various forms of vehicular networks are being realized. These networks include vehicle-to-vehicle, vehicle-to-roadside, and vehicle-to-roadside-to-vehicle architectures. Realistic modeling, simulation, and informed design of such networks face several challenges, mainly due to the lack of large-scale, community-wide libraries of vehicular data measurements and of representative models of vehicular mobility.
Earlier studies in this area have clearly established a direct link between vehicular macro-mobility based on density distribution and the performance of vehicular network primitives and mechanisms [32, 213], including broadcast and geocast protocols [12]. Although good initial efforts have been made to capture realistic vehicular density distributions, such efforts were limited by the availability of sensed vehicular data. Hence, there is a real need to conduct vehicular modeling and characterization using larger-scale and more comprehensive data sets. Furthermore, commonly used assumptions, such as the exponential distribution [244], have been used to derive many theories and conduct several analyses, the validity of which bears further investigation.
In this study, we systematically examine the modeling and characterization of vehicular mobility using a family of heavy-tail and memoryless theoretical distributions. To avoid the limitations of sensed vehicular data, we instead utilize the existing global infrastructure of tens of thousands of video cameras providing a continuous stream of street images from half a dozen regions around the world [229, 233]. We processed millions of images, captured from publicly available traffic web cameras, using a novel density estimation algorithm to help investigate and understand the traffic patterns of cities and major highways. Our algorithm employs simple, scalable, and effective

PAGE 102

background subtraction techniques to process the images and build an extensive library of spatio-temporal vehicular density data [227]. The resulting data set of 25 million records contains traffic density time series from 819 locations belonging to six major metropolitan regions around the world.
As the first step towards realistic vehicular network modeling, we aim to provide a comprehensive view of the fundamental statistical characteristics of the vehicular traffic density exhibited by the data. We conducted two main sets of statistical analyses: the first is an investigation of the best-fit distribution and goodness-of-fit test using a family of heavy-tail and memoryless models, while the second is a study of the long range dependence (LRD) and self-similarity observed in the data. Our analysis shows two main results: (i) the empirical data of vehicular densities at most of the locations follow heavy-tail distributions such as Log-gamma, Log-logistic, and Weibull; (ii) the data consistently show a high degree of self-similarity over orders of magnitude of time scales for these locations. This may suggest a long-range-dependent process governing the vehicular arrival process in many realistic scenarios. Such a result is in sharp contrast to the assumption of memoryless processes commonly used for modeling vehicular mobility.
The rest of the chapter is organized as follows: Section 8.2 discusses related work. Statistical analysis of the measurements and modeling is presented in Section 8.3. Vehicular time series following a self-similar process are discussed in Section 8.5. Finally, we conclude the chapter in Section 8.6.
8.2 Related Work
In this section, we start by discussing current challenges in vehicular mobility trace acquisition and give insight into the traces that we use in this study. Second, we give background on the image processing that our traces require in order to extract essential vehicular traffic information. Third, we discuss the current set of simulator tools and models for vehicular network analysis. Fourth, we outline the current efforts in

PAGE 103

modeling arrival patterns and headway distributions of vehicular traffic that is heavy tailed. Finally, we point out studies that have examined the self-similar nature of vehicular traffic.
Longitudinal and large-scale vehicular datasets are very important in the study of traffic and telematics applications, but collecting them is challenging and usually expensive [30, 114, 217]. In some cases, commercial vendors log the number of vehicles, GPS coordinates, speed, and movement traces. However, there are three downsides to this. First, these traces are not publicly available to the research community. Second, they contain only particular vehicles with vendor-specific hardware. Third, they come from individual vehicles with short driving distances and, in some cases, non-repetitive journeys. Invariably, these issues undermine the efficacy of using them for any kind of longitudinal analysis. In this chapter, we use the dataset that we proposed in [229], an inexpensive method to collect global-scale vehicular mobility traces using thousands of freely available traffic webcams. These cameras provide continuous and fine-grained monitoring of the traffic in the form of snapshot images.
Image Processing
Central to the data collection process is the image processing, designed to be efficient for such a large dataset. Many studies have looked into aspects of both background subtraction and object detection [43, 44, 152, 193, 218]. In background subtraction methods, the difference between the current and reference frames is used to identify objects, while in detection approaches, object features (shape, etc.) are learned to detect and classify them [69, 224]. In this chapter, we use temporal methods for background subtraction to calculate a relative numerical value instead of counting cars [227]. We find that background subtraction is faster and more scalable than object detection, as discussed in later sections. Previous attempts have been made to use videos and computer vision techniques in vehicle tracking and traffic surveillance to decrease congestion on freeways and solve problems associated with existing detectors [49].

PAGE 104

They are also used to estimate traffic speed [52] and for automatic vehicle guidance such as lane finding and distance measurement [130]. However, these techniques work on videos, they are not widely available, and they require far more computing resources and processing time. Our work involves the use of millions of snapshots taken by traffic web cameras; our techniques and tools are therefore highly scalable, robust to outliers, and can be applied to any kind of image.
Modeling and Simulating Vehicular Mobility
There is a large body of work in this area, but for brevity we limit our discussion to a few simulators and models. Simulation tools like CORSIM [96], SUMO [142], and VISSIM [156] are geared toward modeling specific scenarios for planning future traffic conditions at the micro-mobility and small-scale level. These tools lack features for causality-based time series analysis and for relating traffic density distributions to roadway capacity. Our current work helps to fill this gap by analyzing the rich scenarios of traffic patterns evident in world cities. Several studies have been carried out in the past to examine vehicular mobility patterns. They were geared towards arrival patterns and headway distributions of vehicles at the individual level. In this regard, the exponential model was long suggested to describe arrival patterns [2]. Along these lines, Poisson processes have been widely adopted for modeling low to moderate traffic, which is characterized by smooth, memoryless, and independent arrivals [86, 186]. However, they were insufficient for modeling heavy traffic conditions, as theoretically justified by Daou, who proposed the lognormal distribution for modeling headway traffic patterns [56]. The lognormal distribution in turn failed several goodness-of-fit tests, and the complexity of applying it to real traffic led to the introduction of the gamma distribution. Later, the gamma distribution was also deemed inadequate for studying fluctuating and biased traffic arrival patterns [157]. Recently, Bai et al. found that the spatio-temporal arrival patterns of cars have exponential distributions [12].

PAGE 105

Over the last decade, a wealth of studies have shown the presence of heavy-tail distributions in file sizes and connection arrival patterns in internet measurements, the duration of telephone calls, microarray analysis, and financial domains. Heavy-tailed distributions can have infinite variance (for a tail index below 2), reflecting the extremely high variability that they capture. This is particularly important in the case of vehicular traffic distributions, where it can lead to the identification of emerging congestion on roads and to its mitigation. In [253], the authors observed that 20% of the road segments carry 80% of the traffic, which can be modeled using the Pareto distribution. The temporal and statistical properties of human activities have led to heavy-tailed distributions in the occurrence of car accidents. Several studies in vehicular networks have observed heavy-tail distributions in vehicular density and in connection lifetime for message dissemination [6, 126].
Self-Similarity for Traffic Characterization
Previous evidence suggests that heavy-tail distributions are widespread and are an underlying cause of self-similarity in many kinds of networks. Self-similar traffic shows scale invariance, with the important consequence that the whole and its parts are not clearly distinguishable. Although a plethora of studies in internet measurement examine long range dependence and self-similarity, to the best of our knowledge, little work has investigated self-similarity in vehicular networks [3, 150, 189]. In [174], the authors studied the dynamic behavior of a single vehicle moving through a sequence of traffic lights on a single-lane highway, which demonstrated self-similar behavior. In [165], Meng et al. examined the quantitative characteristics of the self-similar vehicle arrival pattern on highways and the headway distribution in traffic data provided by the Texas Department of Transportation. Using a cellular automata model, Campari et al. showed that highway traffic exhibits self-similarity in both car density and flow. They also concluded that the fractal dimension increases from free flow to congested flow [35].

PAGE 106

Although these studies strongly indicate that modeling arrival patterns is a challenging task, they have several shortcomings. First, the datasets used were limited to a few regions. Second, they were sampled from a single type of traffic condition (either highways or urban streets). Third, they focused on the arrival patterns of individual vehicles without considering lane capacities and traffic densities, which are important for studying vehicular congestion and future aspects of vehicular networking. In this chapter, we specifically focus on examining the distribution of traffic densities at a planet-scale level, which helps to study road capacity and investigate the causes of widespread congestion in major metropolitan areas. Our studies align with macroscopic models, which are suited to large-scale and network-wide applications such as model-based estimation, forecasting, and traffic flow analysis [217]. We examine a total of 800+ traffic datasets totaling 25 million records, collected from several regions across Europe, North America, and Australia. In addition, the traffic data collection spans 3-5 months, which provides ample ground for a longitudinal study in both space and time that has not been done before.
8.3 Traffic Modeling
In this section, we perform modeling and characterization of vehicular traffic densities. We show that memoryless models such as the exponential distribution do not capture the traffic trends; instead, heavy-tail distributions such as Weibull are better at describing the empirical traffic data. We use goodness-of-fit tests to support the use of such distributions for traffic modeling purposes.
Evaluation Approach
Earlier, we showed that the traffic density at each location is linearly correlated with the number of vehicles at that location. The next step is to study the underlying statistical patterns through a sequence of observations. We achieve this by modeling the empirical vehicle traffic densities using a family of heavy-tail and memoryless distributions. A heavy-tail distribution, such as Pareto, is characterized by a density

PAGE 107

[Figure 8-1 panels A and B: histograms; x-axis: Normalized Density; y-axis: Frequency.]
Figure 8-1. Histogram of empirical traffic densities of two different locations.
function that decays less rapidly than an exponential function. For a random variable with a heavy-tailed waiting time, the longer the time already waited, the lower the likelihood of an arrival in a given future time interval. For memoryless processes such as the exponential, subsequent events are completely independent of previous events. The distribution models we consider are the Exponential, Log-gamma, Log-logistic, Normal, Poisson, and Weibull distributions. For the analysis, we follow the approach suggested by Clauset et al. to ensure that the parameters of the theoretical models are not estimated from the observed data [47]. First, we estimate the distribution parameters using Maximum Likelihood Estimation (MLE). Second, we validate the significance level of the estimated model parameters using graphical properties and goodness-of-fit measures based on statistical theory. We use the Kolmogorov-Smirnov test (KS test) and evaluate its D-statistic (the maximum absolute difference between the empirical and theoretical distributions) on the CDFs of the fitted models and of the empirical vehicle densities. We rank models by their significance and accuracy in modeling the empirical data. We also report models within 3% and 5% of conservative deviation in order to show the efficacy of more than one distribution in modeling the empirical data. Finally, we compute an aggregate statistic for more than 1000 locations that shows the widespread applicability of typical heavy-tail distributions in modeling empirical data.
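The two-step procedure described above (MLE, then the KS D-statistic) can be sketched with the Python standard library. This is only an illustrative sketch under stated assumptions: it covers just the Exponential and Weibull models, assumes strictly positive densities of moderate magnitude (for example normalized to (0, 1]), and the function names and the bisection solver are hypothetical choices, not the tooling used in this study.

```python
import math
import random

def fit_exponential(xs):
    """MLE of the exponential rate: 1 / sample mean."""
    return len(xs) / sum(xs)

def fit_weibull(xs, lo=0.01, hi=50.0, iters=60):
    """MLE of the Weibull (shape, scale), location fixed at 0.

    The shape solves sum(x^k ln x)/sum(x^k) - 1/k - mean(ln x) = 0; the
    left-hand side increases with k, so bisection on [lo, hi] suffices.
    """
    n = len(xs)
    logs = [math.log(x) for x in xs]
    mean_log = sum(logs) / n

    def g(k):
        pw = [x ** k for x in xs]
        return sum(p * l for p, l in zip(pw, logs)) / sum(pw) - 1.0 / k - mean_log

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            hi = mid
        else:
            lo = mid
    shape = 0.5 * (lo + hi)
    scale = (sum(x ** shape for x in xs) / n) ** (1.0 / shape)
    return shape, scale

def ks_statistic(xs, cdf):
    """Maximum absolute gap between the empirical CDF and a model CDF."""
    s = sorted(xs)
    n = len(s)
    d = 0.0
    for i, x in enumerate(s, start=1):
        f = cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def rank_models(xs):
    """Fit each model by MLE and rank by ascending D-statistic."""
    rate = fit_exponential(xs)
    shape, scale = fit_weibull(xs)
    models = {
        "Exponential": lambda x: 1.0 - math.exp(-rate * x),
        "Weibull": lambda x: 1.0 - math.exp(-((x / scale) ** shape)),
    }
    scores = {name: ks_statistic(xs, cdf) for name, cdf in models.items()}
    return sorted(scores.items(), key=lambda kv: kv[1])
```

Applied to one location's density series, `rank_models` returns the models ordered from smallest to largest D. Extending the dictionary with the remaining candidate models would follow the same pattern, each needing its own MLE routine and CDF.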

PAGE 108

[Figure 8-2 panels: A Sydney (Nov '10), B Sydney (Dec '10); x-axis: Time (in min); y-axis: Normalized Densities.]
Figure 8-2. Traffic densities during two different time periods.
Data Preparation
We prepare the traffic density data of each location as an individual time series. As shown in Table 6-2, we model more than 1000 time series from six regions over several months. Specifically, we define $y_i(t)$ as the time series of vehicular traffic densities associated with the $i$-th location at time $t$. Note that $y_i(t)$ is linearly related to the number of vehicles at that traffic location, as described in the previous section. For each $y_i(t)$, we systematically calculate distribution parameters using maximum likelihood estimation and estimate the goodness-of-fit between the empirical data of each location and all the theoretical models using the Kolmogorov-Smirnov (KS) test [162].
Skewed Distribution
We start by looking at the histograms of traffic densities of two different locations, shown in Figure 8-1. Each fairly smooth histogram is skewed right, with a large frequency of traffic occurring in the first half of the density distribution (average density value of 0.45). For both locations, the frequency mean is centered around 30 and the median bin is at 0.50. The spread of the empirical values shows that a wide range of traffic is present at these locations. We have recorded similar observations for other traffic locations as well. Previous studies have shown that right-skewed data are better modeled using Weibull-like distributions that have

PAGE 109

[Figure 8-3 panels: A Weibull, B Log-gamma, C Log-logistic; x-axis: Time; y-axis: Probability Density; curves: Measured Data, Exponential, Log-gamma, Log-logistic, Normal, Weibull.]
Figure 8-3. Curve fitting for the time series of three locations. (A) Weibull, (B) Log-gamma, and (C) Log-logistic best model the empirical data.
two parameters (shape and scale), unlike the exponential distribution, which has only one parameter (rate) [95]. Next, we use the methods proposed in [48] to check stationarity, where the mean, variance, and autocorrelation of the density distribution are all constant over time. In Figure 8-2, we show the distribution of traffic over two months for a location that, despite a few spikes, looks stationary. This is an important step, as the examination of fractal behavior in the traffic requires the stationarity criterion to be fulfilled.
Curve Fitting
Next, we consider univariate distribution fitting of our theoretical models to the empirical traffic densities. In Figure 8-3, we show the empirical probability density function (PDF) of three different locations' traffic densities, each plotted with its best-fitting distribution together with the five other theoretical models. In Figure 8-3A, the Weibull distribution fits the empirical data quite well and agrees with the empirical PDF. In Figure 8-3B, the Log-gamma distribution has the least deviation and agrees with the empirical PDF of traffic densities for that location. Figure 8-3C shows that the fitted Log-logistic distribution models the empirical densities of the final location very well. Although one may reason that these three samples merely corroborate the efficacy of heavy-tail models, the point of the analysis is that memoryless distributions such as the exponential deviate substantially from the empirical data. It can very well be said that the model parameters of the exponential

PAGE 110

distribution have underestimated the empirical data in all three cases, while the normal distribution has overestimated the right-skewed section of the empirical data. Thus, this initial curve-fitting analysis indicates that models with heavy-tail properties such as Log-gamma, Log-logistic, and Weibull are better at describing the empirical data.
Goodness-of-Fit Test
We extend our modeling study to all the locations and perform a goodness-of-fit test as explained previously. The goodness-of-fit test ranks the distributions by the deviation of the fitted curves and the values of the estimators. We have observed that the traffic at individual locations can vary a lot, but in general the Log-gamma, Log-logistic, and Weibull distributions capture the key trends. In Table 8-1, we rank the top three distributions that best describe the empirical data. The Log-logistic distribution yields the best fit for four out of six regions, while most of the traffic locations in London and Seattle are best described by the Log-gamma and Weibull distributions, respectively. The goodness-of-fit results also show that the empirical traffic data of 91% of Connecticut, 70% of Sydney, and 80% of Washington D.C. locations are best modeled using the Log-logistic distribution. In Table 8-1, we also show the dominant distributions at 3% and 5% deviation from the empirical data distribution, with most cases showing that heavy-tail models are better suited for characterizing the empirical data. For example, in Connecticut, 50% of locations' traffic can be modeled using the Log-logistic distribution at 3% deviation, and 93% at 5% deviation. These results strongly indicate that the traffic at several locations in those six regions lasts for a long time, which can lead to congestion-like situations.
In Figure 8-4, we show the aggregate results of the goodness-of-fit criterion for all 1000 locations. Our results show that distributions with heavy-tail properties are prominent in modeling the empirical data.
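The per-region percentages discussed above can be tabulated from per-location D-statistics along the following lines. This is a hedged sketch: reading "3% deviation" as D <= 0.03 and "5% deviation" as D <= 0.05 is an assumption made for illustration, and `summarize_fits` and its input layout are hypothetical names, not part of the original analysis.

```python
from collections import Counter

def summarize_fits(d_by_location, thresholds=(0.03, 0.05)):
    """Summarize KS results over a set of locations.

    d_by_location: {location: {model_name: D-statistic}} (assumed layout).
    Returns (best_share, within_share), both in percent of locations:
      best_share[model]       share of locations where the model has the smallest D
      within_share[t][model]  share of locations where the model has D <= t
    """
    n = len(d_by_location)
    best = Counter()
    within = {t: Counter() for t in thresholds}
    for scores in d_by_location.values():
        best[min(scores, key=scores.get)] += 1
        for t in thresholds:
            for model, d in scores.items():
                if d <= t:
                    within[t][model] += 1
    best_share = {m: 100.0 * c / n for m, c in best.items()}
    within_share = {
        t: {m: 100.0 * c / n for m, c in counts.items()}
        for t, counts in within.items()
    }
    return best_share, within_share
```

Because several models can fall under the same threshold at one location, the threshold columns need not sum to 100%, which is consistent with the layout of Table 8-1.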

PAGE 111

Table 8-1. Dominant distributions as best fits [ranked and % deviation]
Region          | 1st Best Fit | 2nd Best Fit | 3rd Best Fit | ≤ 3%                               | ≤ 5%
Connecticut     | L [91%]      | G [5%]       | W [4%]       | L [50%], W [2%], G [1%]            | L [93%], W [13%], G [10%], E [5%]
London          | G [38%]      | L [29%]      | W [26%]      | G [20%], L [15%], W [10%], N [8%]  | G [55%], L [51%], W [44%], N [23%]
Sydney          | L [70%]      | G [17%]      | W [14%]      | L [65%], G [22%], W [8%]           | G [49%], W [37%], N [6%]
Toronto         | L [40%]      | G [27%]      | W [26%]      | G [18%], W [17%], L [9%], E [3%]   | W [72%], L [69%], G [63%], E [24%], N [1%]
Washington D.C. | L [80%]      | W [11%]      | G [7%]       | L [60%], W [8%], G [6.54%], E [4%] | L [91%], W [35%], G [30%], E [14%]
Seattle         | W [36%]      | L [34%]      | G [29%]      | W [16%], G [14%], L [4%]           | G [55%], W [47%], L [35%]
E = Exponential, G = Log-gamma, L = Log-logistic, N = Normal, W = Weibull

PAGE 112

[Figure 8-4: bar chart; x-axis: Distribution Model (Exponential, Log-gamma, Log-logistic, Normal, Poisson, Weibull); y-axis: Avg. % [1st Best Fit]; box annotations as extracted: <3%=4%, <5%=5%; <3%=17%, <5%=24%; <3%=2%, <5%=4%; <3%=54%, <5%=41%; <3%=23%, <5%=26%.]
Figure 8-4. The percentage for the distributions that cover all six regions. The values in the boxes show the average percentage estimates of deviation from the empirical data.
In general, our results show that memoryless distributions such as the exponential are insufficient to explain the empirical distribution of traffic densities. We find that the empirical values are better modeled using heavy-tail distributions such as Log-gamma, Log-logistic, and Weibull. The Log-logistic distribution in particular yields the best fit for 57% of all aggregated locations, compared with Log-gamma and Weibull, which are both around the 20% mark. The Log-logistic also has less deviation at both the 3% and 5% error levels than all other models. These results imply that most urban traffic patterns are bursty in nature and that traditional memoryless distributions such as the exponential are inadequate for realistically capturing them. Previous studies have shown that the presence of heavy-tail distributions indicates self-similar behavior with noticeable bursts over a wide range of time scales [3, 189]. Next, we examine this conjecture for all traffic time series and estimate the value of the Hurst exponent using various estimators.
8.4 Self-Similarity and Long Range Dependence
In the previous section, we saw that theoretical distributions with heavy tails are better at modeling the empirical data. Such distributions exhibit extreme variability in sampling. For example, sampling from such distributions yields a large number of very small values and a few extremely large values. This type of behavior is known to cause long-range dependence and self-similarity in the

PAGE 113

network traffic [3, 189]. In this section, we examine the self-similar nature of traffic in a vehicular setting and estimate the Hurst exponent.

8.4.1 Self-Similar Process and Long Range Dependence

Self-similar processes exhibit fractal-like behavior, first observed by Mandelbrot [159]. A time series with a deterministic self-similar nature is invariant to scaling in both space and time. In practice, however, we use stochastic self-similarity, which admits a level of nondeterminism. In most cases, burstiness is captured by second-order statistics, and the shape of the auto-correlation function, if preserved across rescaled time, is used to ascertain the phenomenon. In general, long range dependence refers to an auto-correlation function that decays polynomially, as opposed to exponentially, as a function of time. Consider a time series $X(t)$, $t \in \mathbb{Z}$, of measured traffic density at time $t$, with an aggregated time series denoted by $Y(t)$ such that $X(t) = Y(t) - Y(t-1)$. We define an aggregated process $X^{(m)}$ at aggregation level $m$ by

$$X^{(m)}(i) = \frac{1}{m} \sum_{t=m(i-1)+1}^{mi} X(t)$$

Thus, $X(t)$ is divided into non-overlapping blocks, each of size $m$. Let us now denote the auto-covariance function of $X^{(m)}$ by $r^{(m)}(k)$. Then $X(t)$ is asymptotically self-similar with Hurst exponent $H$ ($1/2 < H < 1$) if

$$\lim_{m \to \infty} r^{(m)}(k) = \frac{\sigma^2}{2}\left((k+1)^{2H} - 2k^{2H} + (k-1)^{2H}\right), \quad \forall k \ge 1$$

The asymptotically self-similar form accommodates this nondeterminism and is important for modeling traffic densities. Let $r(k)$, normalized by $\sigma^2$, denote the autocorrelation function. Then, for $0 < H < 1$,

$$r(k) \sim H(2H-1)\,k^{2H-2} \quad \text{for } k \to \infty.$$

Now if $1/2 < H < 1$, then $r(k)$ behaves as $c k^{-\beta}$, where $0 < \beta < 1$, $c > 0$ is a constant, and $\beta = 2 - 2H$, so that

$$\sum_{k=-\infty}^{\infty} r(k) = \infty \qquad (8\text{-}1)$$

Here, we have shown that $r(k)$ decays hyperbolically over a long time. When the autocorrelation function decays hyperbolically such that (8-1) holds, the corresponding stationary process $X(t)$ is long range dependent.
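The aggregation step and its link to the Hurst exponent can be sketched in a few lines. The following is a minimal illustration, not the procedure used in this study: it builds $X^{(m)}$ from non-overlapping blocks and estimates $H$ from the slope of $\log \mathrm{Var}(X^{(m)})$ against $\log m$, using the standard relation $\mathrm{Var}(X^{(m)}) \sim m^{2H-2}$. The function names and the synthetic white-noise input are assumptions made for the example.

```python
import math
import random

def aggregate(x, m):
    """Return X^(m): the means of non-overlapping blocks of size m."""
    n_blocks = len(x) // m
    return [sum(x[i * m:(i + 1) * m]) / m for i in range(n_blocks)]

def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def hurst_aggregate_variance(x, levels):
    """Estimate H from the slope of log Var(X^(m)) versus log m.

    For a self-similar series Var(X^(m)) ~ m^(2H - 2), so H = 1 + slope / 2.
    """
    pts = [(math.log(m), math.log(variance(aggregate(x, m)))) for m in levels]
    mean_x = sum(px for px, _ in pts) / len(pts)
    mean_y = sum(py for _, py in pts) / len(pts)
    slope = (sum((px - mean_x) * (py - mean_y) for px, py in pts)
             / sum((px - mean_x) ** 2 for px, _ in pts))
    return 1.0 + slope / 2.0

# Synthetic, uncorrelated input (an assumption for the example, not thesis data).
random.seed(7)
white_noise = [random.gauss(0.0, 1.0) for _ in range(20000)]
h_est = hurst_aggregate_variance(white_noise, levels=[1, 2, 4, 8, 16, 32, 64])
print(round(h_est, 2))  # uncorrelated noise should land near 0.5
```

A long range dependent series would instead give a shallower variance decay and therefore an estimate between 0.5 and 1.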

PAGE 114

8.4.2 Relation between Self-Similarity and Long Range Dependence

In general, self-similarity and long range dependence do not guarantee one another: the presence of one does not imply that the other holds as well. However, for an asymptotically self-similar process with $1/2 < H < 1$, a long range dependent process is self-similar and vice versa.

8.4.3 Impact of Heavy Tail on Long Range Dependence

In order to establish that a heavy tail indicates a self-similar process, let us start with a random variable $Z$, which has a heavy tail distribution if

$$P\{Z > x\} \sim c\,x^{-\alpha}, \quad x \to \infty$$

where $0 < \alpha < 2$ is the tail index, such that the tail decays asymptotically hyperbolically, unlike memoryless processes such as the exponential distribution. For an unbounded mean, $0 < \alpha \le 1$, such distributions also give rise to extremely large values with very small probability and a bulk of small values with large probability. This sampling behavior of heavy tail distributions slows down the convergence rate of the sample mean to the population mean.

Now consider an on/off model with $N$ independent traffic sources $X_i(t)$, $i \in [1, N]$. The cumulative process $Y_N(Tt)$ is defined as

$$Y_N(Tt) = \int_0^{Tt} \left( \sum_{i=1}^{N} X_i(s) \right) ds$$

where $T > 0$ is a scale parameter. Then $Y_N(Tt)$ statistically behaves as

$$\frac{E(\mathrm{on})}{E(\mathrm{on}) + E(\mathrm{off})}\,NTt + C N^{1/2} T^{H} B_H(t) \qquad (8\text{-}2)$$

for large $T$, $N$, where $H = (3 - \alpha)/2$, $B_H(t)$ is fractional Brownian motion (FBM) with parameter $H$, and $C > 0$ is a quantity depending only on the distributions of the on and off periods. Thus, when suitably normalized, $Y_N(Tt)$ behaves like an FBM fluctuating around $NTt\,E(\mathrm{on})/(E(\mathrm{on}) + E(\mathrm{off}))$. It demonstrates long range dependence for $1/2 < H < 1$ if and only if $1 < \alpha < 2$, which means that the on-period distribution is heavy-tailed; otherwise, it behaves as a memoryless process. This case of on and off periods provides an essential component for generating long range dependence in the aggregated time series.
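As a small worked illustration of the relation $H = (3 - \alpha)/2$ and of the hyperbolic tail $P\{Z > x\} \sim c\,x^{-\alpha}$, the sketch below maps a tail index to a Hurst exponent and checks the tail of a Pareto sample empirically. The Pareto sampler, the choice $\alpha = 1.5$, and all function names are assumptions made for the example; they are not taken from the measurements in this chapter.

```python
import random

def hurst_from_tail_index(alpha):
    """Map a tail index 1 < alpha < 2 to H = (3 - alpha) / 2."""
    if not 1.0 < alpha < 2.0:
        raise ValueError("H = (3 - alpha) / 2 is applied here only for 1 < alpha < 2")
    return (3.0 - alpha) / 2.0

def pareto_sample(alpha, x_min, rng):
    """Draw one Pareto value by inverse transform: P(Z > x) = (x_min / x)^alpha."""
    # 1 - rng.random() lies in (0, 1], which avoids raising zero to a negative power.
    return x_min * (1.0 - rng.random()) ** (-1.0 / alpha)

rng = random.Random(5)
alpha = 1.5  # hypothetical tail index for the on-period distribution
draws = [pareto_sample(alpha, 1.0, rng) for _ in range(100000)]
tail_fraction = sum(1 for z in draws if z > 10.0) / len(draws)

print(hurst_from_tail_index(alpha))   # 0.75
print(tail_fraction, 10.0 ** -alpha)  # empirical tail versus (1/10)^alpha
```

With $\alpha = 1.5$ the implied Hurst exponent is 0.75, which falls inside the long range dependent regime $1/2 < H < 1$.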

PAGE 115

A stationary process has heavy tail characteristics if the process $X_t$ satisfies $\lim_{k \to \infty} \rho(k) / (c_p k^{-\alpha}) = 1$ for a real number $\alpha \in (0, 1)$ and a constant $c_p > 0$, where $\rho(k)$ is the sample correlation at lag $k$. The Hurst exponent ($H$), whose value lies in the interval $1/2 < H < 1$, is widely used to characterize long range dependence and self-similar processes. Several estimators are used to estimate the value of the Hurst exponent, each of which measures how slowly the correlations decay to zero. In this chapter, we evaluate seven estimators for robustness in our results.

8.4.4 Estimation of Hurst Exponent

We benchmark the estimation of the Hurst exponent using seven different estimators to ensure that the time series of traffic densities are self-similar in nature and have long range dependence. The estimators we have used are: (i) Absolute Value Method, (ii) Aggregate Variance Method, (iii) Variance of Residuals Method, (iv) R/S Method, (v) Periodogram, (vi) Whittle Method, and (vii) Abry-Veitch Method. In general, the Hurst parameter is estimated asymptotically, which makes it prone to some statistical uncertainty and errors. By applying many estimators we take a holistic approach in analyzing the self-similar behavior. Detailed information about the working of these estimators is given in Appendix F.1.

8.5 Analysis of Self-Similarity

In the previous section, we saw that theoretical distributions with heavy-tail properties are better at modeling the empirical traffic densities. Such distributions exhibit extreme variability in sampling. For example, sampling such distributions results in large quantities of very small values and a few samples with extremely large values. This type of behavior is known to cause long-range dependence and self-similarity of the network traffic [3, 189]. Here, we examine this self-similar nature in a vehicular setting and estimate the Hurst exponent.

We benchmark the estimation of the Hurst exponent using seven different estimators to study whether the observed traffic density time series of all 819 locations is

PAGE 116

self-similar in nature. The estimators we have used are: (i) Absolute Value Method, (ii) Aggregate Variance Method, (iii) Variance of Residuals Method, (iv) R/S Method, (v) Periodogram, (vi) Whittle Method, and (vii) Abry-Veitch Method. In general, the Hurst exponent is estimated asymptotically, which makes it prone to statistical uncertainty and errors. By applying many estimators we take a comprehensive approach in analyzing the self-similar behavior. Detailed information about these estimators is given in [3, 129, 189].

[Figure 8-5. Scaling of stochastic self-similar vehicular traffic. Panels (a) to (d) plot density against chronological time at granularities of 1 min (scale $10^0$), 10 min (scale $10^1$), 100 min (scale $10^2$), and 1000 min (scale $10^3$).]

The first indication that the traffic time series follow a stochastic self-similar process is visually depicted in the series of plots in Figure 8-5. In Figure 8-5 (a) the traffic trace is plotted at a time granularity of 1 minute. Figure 8-5 (b) is the same traffic time series aggregated by a factor of 10 (i.e., the time scale is compressed to 10 minutes). The subsequent plots in Figure 8-5 (c) and (d) are aggregated by increasing the granularity by two more orders of magnitude. These plots look very similar across scales, as expected under long range dependence, and are invariant to the chosen time granularity.
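Of the estimators listed above, the R/S method is perhaps the simplest to sketch. The code below is a minimal illustration under stated assumptions, not the Selfis implementation: for each block size $n$ it averages the rescaled range $R/S$ over non-overlapping blocks and takes the slope of $\log(R/S)$ against $\log n$ as the Hurst estimate. The function names, block sizes, and synthetic input are hypothetical.

```python
import math
import random

def rescaled_range(block):
    """R/S statistic of one block, or None if the block has zero spread."""
    mean = sum(block) / len(block)
    std = math.sqrt(sum((v - mean) ** 2 for v in block) / len(block))
    if std == 0.0:
        return None
    running = low = high = 0.0
    for v in block:
        running += v - mean          # cumulative deviation from the block mean
        low = min(low, running)
        high = max(high, running)
    return (high - low) / std

def least_squares_slope(points):
    """Slope of the ordinary least-squares line through (x, y) points."""
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    num = sum((x - mean_x) * (y - mean_y) for x, y in points)
    den = sum((x - mean_x) ** 2 for x, _ in points)
    return num / den

def hurst_rs(series, block_sizes):
    """Estimate H as the slope of log(mean R/S) versus log(block size)."""
    points = []
    for n in block_sizes:
        stats = [rescaled_range(series[i:i + n])
                 for i in range(0, len(series) - n + 1, n)]
        stats = [s for s in stats if s is not None]
        points.append((math.log(n), math.log(sum(stats) / len(stats))))
    return least_squares_slope(points)

# Synthetic, uncorrelated input (an assumption for the example, not thesis data).
random.seed(11)
noise = [random.gauss(0.0, 1.0) for _ in range(8192)]
h_rs = hurst_rs(noise, block_sizes=[16, 32, 64, 128, 256, 512])
print(round(h_rs, 2))
```

On uncorrelated input the estimate sits somewhat above 0.5 because R/S is biased for small blocks, which is one reason to compare several estimators rather than rely on a single one.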

PAGE 117

[Figure 8-6. Hurst exponent plots for the time series of one location. Each panel shows the measurement densities with a linear fit. Panels: A Absolute Moment (H=0.69); B Aggregate Variance (H=0.64); C Variance of Residuals (H=0.79); D Periodogram (H=0.76); E R/S (H=0.73).]

We use the Selfis tool [129] to investigate the value of the Hurst exponent using seven different estimators at a 95% confidence interval for all 819 time series. A Hurst exponent at (or very close to) 0.5 indicates a lack (or weakness) of long range dependence and suggests a short range memory process, while an exponent higher than 0.5 and closer to 1 indicates strong long-range dependence and suggests a self-similar process. In Figure 8-6, we plot five estimated Hurst exponents for one time series. The plot clearly indicates a self-similar process. The linear fit shows that the decay is very slow for all

PAGE 118

estimators. The Whittle and Abry-Veitch estimators, which cannot be plotted, are estimated at H=0.77 and H=0.79 respectively.

In Figure 8-7, we show bar plots of the distributions of the Hurst exponent estimated by all seven estimators for all six regions. As evident from these plots, the Hurst estimators consistently produce results well exceeding 0.5. These trends support the adequacy of self-similar processes in modeling the vehicular traffic time series over various time scales. This result is also in line with our previous findings that Log-gamma, Log-logistic, and Weibull (considered in the family of heavy-tail distributions) provide a better distribution fit for observed traffic densities, and it also supports the failure of memoryless distributions. In Table 8-2, we show the average value (mean H) of the Hurst exponent with its deviation (dev. H) for the locations' time series that exhibit the self-similar process. As evident, the average Hurst exponent is greater than 0.5 with low deviation at a 95% confidence interval. Finally, we show the aggregate statistics for all self-similar time series in Figure 8-8. This result indicates that the traffic at more than 70% of the locations in London, Connecticut, and Toronto is self-similar, while less than 50% of Sydney and Washington D.C. traffic is self-similar.

Table 8-2. The average value of the Hurst exponent calculated for all locations. Rows labeled mean H give the average value of the Hurst exponent; rows labeled dev. H give the deviation from the mean value.

Region         Stat     Absolute  Aggregate  Abry-    Periodogram  Variance of  R/S   Whittle
                        Moments   Variance   Veitch                Residuals
Connecticut    mean H   0.81      0.76       0.85     0.85         0.85         0.72  0.85
               dev. H   0.04      0.06       0.03     0.03         0.03         0.03  0.03
London         mean H   0.80      0.79       0.83     0.85         0.85         0.57  0.85
               dev. H   0.07      0.07       0.07     0.03         0.03         0.03  0.03
Sydney         mean H   0.78      0.78       0.83     0.85         0.85         0.54  0.81
               dev. H   0.05      0.05       0.06     0.03         0.03         0.04  0.06
Toronto        mean H   0.82      0.84       0.85     0.85         0.85         0.65  0.85
               dev. H   0.04      0.04       0.03     0.03         0.03         0.03  0.03
Washington DC  mean H   0.55      0.79       0.78     0.76         0.76         0.79  0.80
               dev. H   0.04      0.06       0.06     0.04         0.03         0.04  0.06
Seattle        mean H   0.74      0.76       0.75     0.75         0.75         0.58  0.75
               dev. H   0.06      0.06       0.03     0.03         0.03         0.04  0.03

Overall, our analysis from Sections 8.3 and 8.5 shows that the current traffic trends are better modeled using heavy-tail distributions. In general, the Log-gamma, Log-logistic, and Weibull distributions are better at modeling empirical data of traffic densities, as

PAGE 119

[Figure 8-7. A histogram showing the distributions of Hurst exponent. Panels: A Connecticut; B London; C Sydney; D Toronto; E Washington DC; F Seattle. Horizontal axis: Hurst estimator (0.5 to 0.9); vertical axis: Frequency (%). Legend: Absolute Value, Agg. Variance, Abry-Veitch, Periodogram, Residuals, RsPlot, Whittle.]

[Figure 8-8. The percentage of locations from regions that have self-similarity in traffic. Horizontal axis: City (Connecticut, London, Sydney, Toronto, Washington DC, Seattle); vertical axis: % Locations (0.5 < H < 1).]

PAGE 120

shown by the goodness-of-fit test. Also, the traffic time series exhibit a self-similar process.

8.6 Conclusion and Future Work

In this chapter, we have shown that the observed values of vehicular traffic densities are better modeled using heavy-tail distributions. Since a heavy-tail distribution also indicates a self-similar process, an investigation in that direction showed self-similarity in the time series of traffic densities. The dataset used in this study was extracted from six metropolitan regions around the world. In all, we examined vehicular traffic density data for 819 locations (spread across those six regions), containing more than 25 million records. Our first analysis on modeling and characterization indicates that heavy-tail distributions such as Log-gamma, Log-logistic, and Weibull better model these observed traffic densities. The goodness-of-fit test showed that, using heavy-tail distributions, the measured distances are statistically significant for more than 90% of the locations. In the second analysis, we found that the time series of traffic densities are self-similar, as estimated by seven different estimators for the Hurst exponent. These results suggest that the traditional notion of using memoryless models for traffic modeling purposes should be revisited. We believe that our study will provide new insight into the development of future vehicular networks and infrastructure design. In the future, we also plan to make this dataset available to the research community.

PAGE 121

CHAPTER 9
THE STRUCTURE AND TRAFFIC FLOW ANATOMY OF VEHICLES

A data-driven, realistic design and evaluation of vehicular mobility has been particularly challenging due to a lack of large-scale real-world measurements in the research community. Current research methodologies rely on artificial scenarios and random connectivity, and use small and biased samples. In this paper, we perform a combined study to learn the structure and connectivity of urban streets and to model and characterize the vehicular traffic densities on them. Our dataset is a collection of more than 222 thousand routes and 25 million vehicular mobility images from 1091 online web cameras located in six different regions of the world. Our results center on four major observations: i. driving routes and visiting locations of regions demonstrate a power-law distribution, indicating a planned or recently designed road infrastructure; ii. we represent regions by network graphs in which nodes are camera locations and edges are urban streets that connect the nodes. Such a representation exhibits small world properties, with short path lengths and a large clustering coefficient; iii. traffic densities show 80% temporal correlation during several hours of a day; iv. modeling traffic densities against known theoretical distributions shows less than 5% deviation for heavy-tailed models such as log-logistic and log-gamma distributions. We believe this work will provide a much-needed contribution to the research community for the design and evaluation of future vehicular networks and smart cities.

9.1 Introduction

Research in the area of vehicular networks has increased dramatically in recent years. With the proliferation of mobile networking technologies and their integration with the automobile industry, various forms of vehicular networks are being realized. These networks include vehicle-to-vehicle [30], vehicle-to-roadside [123], and vehicle-to-roadside-to-vehicle architectures. Realistic modeling, simulation, and

PAGE 122

informed design of such networks face several challenges, mainly due to the lack of two main factors: i. Underlying topology, and ii. Large-scale community-wide libraries of vehicular data measurement.

Topological understanding is important in accurately modeling vehicular mobility. It involves intersections, roads, and their connectivity. It comes as no surprise that topological constraints like speed limits, direction, etc. impact traffic congestion, density, scenario generation, and mobility, which in turn affect the performance of any network communication protocol [14]. Thus, for accurate evaluation of a vehicular network, one should have a better knowledge of its topology.

Earlier studies in mobility modeling have clearly established a direct link between vehicular density distribution and the performance of vehicular networks' primitives and mechanisms, including broadcast and geocast protocols [12]. Initial efforts to capture realistic vehicular density distributions were limited by a lack of availability of sensed vehicular data [245]. Hence, there is a need to collect and conduct vehicular density modeling using larger scale and more comprehensive datasets. Furthermore, commonly used assumptions, such as Exponential distributions [12, 244], have been used to derive many theories and conduct several analyses, the validity of which bears further investigation.

In this paper, we first study the structure and connectivity of the urban streets of six major regions and second, perform a large-scale, data-driven, systematic analysis and modeling of vehicular traffic density distributions.

Recently, the departments of transportation of several regions (e.g., London, Sydney) have started to deploy traffic web cameras at critical intersections and highways to study traffic patterns. We collected two different kinds of data: i. We collected the geographical coordinates of these locations and created a graph $G(V, E)$ as mentioned before. ii. To avoid the limitations of sensed vehicular data, we also utilize the existing global infrastructure of tens of thousands of video cameras providing a continuous stream

PAGE 123

of street images from dozens of regions around the world. Millions of such images captured from these available traffic web cameras are processed using a novel density estimation algorithm to build an extensive measurement dataset of spatio-temporal vehicular traffic densities.

We perform a comprehensive analysis of these data to study the structure and connectivity of urban streets and to characterize the underlying statistical patterns of traffic density at individual intersections and highways of major cities. Our results show that i. Visits to locations follow a power-law distribution; ii. Road networks have short path lengths and a large clustering coefficient, indicating small-world properties; iii. Temporal correlations of vehicular traffic density for individual camera locations are nearly 80% between consecutive hours, but go down to 30% for a 3-4 hour lag difference. We also investigate traffic modeling by comparing the frequencies observed in the empirical density distribution to the expected frequencies of the theoretical distribution. The result of this activity shows that the empirical values closely follow (less than 3% deviation on the KS-test) heavy-tailed models such as `Log-logistic' and `Weibull' distributions. The contributions of this work are:

- We provide, to the best of our knowledge, by far the largest and most extensive dataset for future vehicular network analysis. This potentially addresses a severe shortage of such datasets in the community.
- We introduce a new and more practical way to look into urban street networks based on driving routes. A network graph of routes and locations depicts small world properties.
- We establish heavy-tailed models such as log-logistic and log-gamma distributions as the most suitable fits for modeling vehicular traffic density.

We believe our work helps `fill a gap' between the expected and realized necessity for the `design and evaluation of realistic and data-driven models' for future generations of vehicular networks.

In section 9.2, we discuss related work. In section 9.3, we discuss the topological analysis of urban street maps for six different regions. In section 9.4, we statistically model vehicular traffic

PAGE 124

and characterize it. In section 9.5, we show the impact and challenges on vehicular networks. Finally, we conclude in section 9.6 with future work.

9.2 Related Work

In this section, we discuss the related work, which is categorized into data collection and pre-processing, network analysis for urban streets, and vehicular networks. In the first category, we discuss the inadequacies of existing repositories of vehicular mobility data. Next, techniques used to process image data are examined. In the past, efforts have been made to collect vehicular mobility records by GPS traces, via loop detectors, and by radio sensors [131, 143, 251]. However, these datasets are generally not publicly available and are limited in their scope, size, and geographic spread. In addition to their small timeline (typically only a few days) [footnote 1], which limits their use for longitudinal analysis, the methods applied were also specific to these datasets and cannot be scaled for other purposes. We believe that, similar to the pedestrian trace dataset in [140], a comprehensive record of vehicular mobility is vital for research in future vehicular networks. In contrast to the datasets described above, our dataset covers six regions, for periods of several months, at hundreds of locations (see the Measurement section for specifics).

[Footnote 1: Specifically, [12] uses 3 days of data, [131] uses traces ranging from 30 hours to a total of 400 hours from 4 to 100 cars (sensing points in that context), while [165] uses a longer sample, 30 days, but only at 5 locations.]

Second, central to the data collection process is the image processing, designed to be computationally efficient for such a large dataset. Many studies [43] have been carried out that look into aspects of both background subtraction [44, 193, 219] and object detection [152]. In background subtraction methods [69], the difference between the current and reference frame is used to identify objects. In detection-based approaches [224], learned object features (shape, size, etc.) are used to detect

PAGE 125

and classify them. In this research, we use temporal methods for background subtraction to estimate a relative numerical value instead of counting cars. We find that background subtraction is much faster, robust to outliers, universally applicable, and more scalable than object detection (see the Measurement section for more details).

Third, attempts have been made to examine the structure and topological features of vehicular networks using applied graph measures such as centrality. In this regard, Cardillo et al. [38] performed a 1-square-mile project on several cities across the world and studied local and global properties of the graphs to categorize their organic versus planned growth. They also studied the backbone of a city by deriving spanning trees based on edge betweenness and edge information. Several other studies have also found that, for sustainable urban design, centrality, self-organized structures, and scaling are driving forces [25, 51, 243]. In [196, 197], the authors examined the relationship between street centrality and densities of commercial and service activities in the cities of Bologna and Barcelona. We take the approach of studying the features of a region by extracting motorways that are frequently used and connected with major locations. This way, we isolate the issues of congestion and connectivity among zones of these regions.

Finally, in-car and out-car computing has shown promising results in terms of driverless cars, control interfaces for future cars that will have minimal visual demands, and sophisticated traffic management systems, such as those incorporating dynamic traffic assignments [57, 90, 168, 248]. While further examination of these cutting-edge technologies is beyond the scope of this research, we believe our approach provides a planet-scale system for data collection, providing invaluable data for the development of future vehicular services and applications.

9.3 Analysis of Topological Properties

In this section, we examine the degree distribution and small world properties of six different regions and states in order to study the structure and connectivity of their

PAGE 126

[Figure 9-1. Radial axis layouts showing a two-dimensional representation of degree distribution. Panels: A Connecticut; B London; C Sydney; D Toronto; E Seattle; F Washington D.C. Each radiating axis (spar) groups nodes of similar degree $\langle k \rangle$. The clockwise change in dot color from blue to red denotes increasing $\langle k \rangle$ for the nodes. The varying sizes of the dots are the respective average weighted degrees $\langle \hat{k} \rangle$; a larger dot means more weight. For Connecticut and Sydney, the distribution of $\langle \hat{k} \rangle$ shows a power-law distribution with $2 < \alpha < 3$, and Toronto shows the same for its $\langle k \rangle$ distribution. London is an old city with lots of small streets and intersections (hence more than one way to reach a destination) and shows no power-law distribution ($\alpha = 3.5$).]

PAGE 127

Table 9-1. Parameters and details

Abbr.                        Details
$G$, $\hat{G}$               Unweighted, weighted graph
$V$                          Total number of nodes (camera locations)
$E$                          Total number of edges (streets)
$k$                          Degree of a node
$\langle k \rangle$          Average degree per node
$k_m$                        Largest degree
$\langle \hat{k} \rangle$    Average weighted degree per node
$\hat{k}_m$                  Largest weighted degree
$\alpha$                     Power law exponent
$\rho$                       Correlation coefficient
$d$                          Traffic density
$L$                          Characteristic path length
$C$                          Clustering coefficient
$L_r$                        Random graph characteristic path length
$C_r$                        Random graph clustering coefficient
P                            Exponential Distribution
M                            Gamma Distribution
LL                           Log-logistic Distribution
N                            Normal Distribution
W                            Weibull Distribution

PAGE 128

Table 9-2. Network analysis of vehicular traffic. Report of degree distribution, power-law exponent, path length, clustering coefficient, and model fitting of traffic.

Topology:

City             V     E     <k>    k_m   <k^>   k^_m    alpha(G)  alpha(G^)  L      L_r    C      C_r
Connecticut      274   2128  15     51    8994   32119   3.5       2.41       3.6    2.33   0.52   0.05
London           181   1252  13     32    2180   7089    3.5       3.5        3.04   2.23   0.5    0.07
Sydney           67    319   9.5    26    322    985     3.5       2.98       2.73   2.05   0.56   0.137
Toronto          208   1128  10     44    7435   21323   2.8       3.5        5.02   2.5    0.6    0.05
Seattle          121   513   9.8    21    1235   3376    3.5       3.5        3.3    2.27   0.56   0.087
Washington D.C.  240   3089  26.8   92    3530   15824   3.5       2.8        2.34   1.9    0.537  0.11

Dominant distributions as best fits (by ranking):

City             1st Best Fit  2nd Best Fit  3rd Best Fit
Connecticut      LL [87%]      M [11%]       P [0.5%]
London           LL [42%]      M [39%]       W [16%]
Sydney           LL [62%]      M [32%]       N [2%]
Toronto          M [46%]       W [31%]       LL [21%]
Seattle          W [36%]       LL [34%]      G [29%]
Washington D.C.  LL [80%]      W [11%]       G [7%]

Dominant distributions as best fits (by % deviation, KS-test):

City             3% deviation                             5% deviation
Connecticut      LL [62%], M [15%], W [3%]                LL [94%], M [44%], W [19%]
London           M [34%], LL [34%], W [10%], N [0.5%]     LL [82%], M [70%], W [47%], N [7%]
Sydney           LL [88%], M [61%], W [4%], N [2%]        LL [98%], M [88%], W [44%], N [18%]
Toronto          M [75%], W [58%], LL [34%]               M [94%], W [88%], LL [87%], P [4%], N [1%]
Seattle          W [16%], G [14%], LL [4%]                G [55%], W [47%], LL [35%]
Washington D.C.  LL [60%], W [8%], G [6.54%], E [4%]      LL [91%], W [35%], G [30%], E [14%]

PAGE 129

urban street network. We represent this network by a graph $G = (V, E)$, where $V$ is the set of camera locations as nodes and $E$ is the set of driving segments as edges, inter-connecting the nodes of set $V$ of the network graph $G$. The degree $k_i$ of a node $i$ in $G$ is the number of edges incident with the node. In an undirected and unweighted $G$ (weight = 1), the degree can be written in terms of the adjacency matrix $A$ as

$$k_i = \sum_{j=1}^{n} A_{ij}$$

The weighted degree of each node $i$ in the undirected graph $\hat{G}$ is $\hat{k}_i$, and can be written in terms of the weighted adjacency matrix $W$ as

$$\hat{k}_i = \sum_{j=1}^{n} W_{ij}$$

Next, we explain the graph generation process for the urban street networks of the regions using Google Maps, then analyze their degree distributions for the unweighted and weighted cases, and finally examine their small world properties.

9.3.1 Network of Urban Streets

SEGMENT. A path (edge in the network graph) that directly connects two locations.

ROUTE. A route is a set of segments appearing in the order of increasing distance from source to destination.

As mentioned before, we generate a network graph with camera locations acting as nodes and driving routes connecting them as the set of edges. These routes are basically the driving segments between the locations, as returned by the Google Maps API. In order to generate this graph, we start by taking the geo-coordinates of a pair of camera locations and calculating the driving distance between them. Next, we check for a possible subset of other camera locations that might lie en route from source to destination. All such locations are inserted in order of their occurrence and connected through intermediate segments (as edges). For example, driving from New York to San Francisco, we drive through Iowa City, Omaha, Salt Lake City, and Sacramento, in that order. If no such locations exist, the source and destination are directly connected by an edge. We iterate this process for all pairs of camera locations, $V(V-1)$ in total. While

PAGE 130

doing so, we also maintain a between count for the traversed edges, each connecting two nearest locations. Every time an edge is traversed, we increase its between count by one. Finally, we end up with a weighted graph showing the locations and the segments taken, with the latter's weight equal to the number of times they have appeared on any route. For example, the most frequently taken edge segment has the largest between count.

[Figure 9-2. A geo-laid network of urban streets of three regions is shown. Panels: A Connecticut; B London; C Sydney; D Washington D.C.; E Toronto; F Seattle. The nodes are camera locations and edges are routes connecting these locations. We have shown Sydney with its weighted degree $\langle \hat{k} \rangle$ network.]

For simplicity, we assume each street allows bi-directional traffic. In general, locations are connected by a maximum of 3-5 roadways; in our case, we ease this assumption for investigating the connectivity patterns. In Figure 9-2, we show the example of a weighted graph of Sydney generated with 67 camera locations. The underlying process of generating this network graph is computationally expensive [226]; nonetheless, it

PAGE 131

has many benefits: i. We use the Google Maps API to calculate all possible routes and intersections; today, anyone planning to travel accesses maps via Google or similar services. ii. We are assured that the resultant graph filters out non-frequent routes, which helps to better explain the cause of traffic congestion on frequently taken routes and locations. iii. Recommendations can be made to generate dynamic routes that divert traffic away from already congested segments. There are several variations of the between count.

Unweighted graph: We baseline the between score of a street (edge) to one if it has ever appeared.

Weighted graph by distance: The weights on the edges can be replaced by the actual driving distance. Thus, a recommendation can be made in case a shortest path is available between a pair of source and destination.

Weighted graph by distance and between score: The weights on the edges are a combination of distance and between score. This helps to discover overhead and congested segments in the network.

[Figure 9-3. A CDF $P(x)$ and its maximum likelihood power-law fits for two locations. Panels: A Connecticut ($\langle \hat{k} \rangle$, $\alpha = 2.41$); B Toronto ($\langle k \rangle$, $\alpha = 2.8$).]

9.3.2 Analysis of Degree Distribution: Unweighted Case

We study the number of connections that camera locations have with one another. This helps to analyze the connectivity and the probability of taking alternate routes to the destinations. A network analysis of Toronto's cameras shows that their degree distribution follows a power-law distribution. As evident in Figure 9-1, the radial axis graph of Toronto clearly shows only 2-3 locations with large degree. In Figure 9-3,

PAGE 132

an exponent value of $\alpha = 2.8$ shows that the maximum likelihood power-law fit for the degrees of its few locations has a long right-side tail of values that are above the mean. A value $x$ obeys a power law if it is drawn from a probability distribution $p$ such that

$$p(x) \propto x^{-\alpha}$$

where $\alpha$ is a constant known as the exponent parameter. The usual value of the exponent lies in the range $2 < \alpha < 3$, with some exceptions.

The above results indicate that such locations have much higher connectivity with the rest of the locations that are one hop away. On the other hand, if they are removed from the network, the average path length will increase, some location pairs will become disconnected, and traveling between them will become impossible.

9.3.3 Analysis of Degree Distribution: Weighted Case

The weighted degree of a camera location is calculated based on the frequency with which its connected edges have appeared between any pair of source and destination. Using Google Maps, we have calculated the shortest path between all pairs of locations and the list of locations that lie en route. Therefore, it is possible that a few locations have been traversed more often than others, making them the most visited locations. In our study, we find that the locations belonging to Sydney and Connecticut demonstrate a power-law distribution, which means they create an hourglass model, forcing most of the traffic to pass through a few locations. It also makes them susceptible to traffic congestion and closures. In Figure 9-1, we see the distribution of node sizes representing weighted degrees for Connecticut and Sydney, with power-law exponents $\alpha = 2.41$ and $2.98$ respectively in Table 9-2. In Figure 9-3, a cumulative distribution function for the maximum likelihood fit for Connecticut and Toronto is shown.

Thus, while Toronto is skewed on connectivity, Connecticut and Sydney are skewed on visiting the same locations again and again. We can say that traffic congestion in Toronto appears because of the geometry of locations, while for Connecticut and Sydney it is because specific routes have been traversed. The city of London appears to have even

PAGE 133

distribution for both metrics, as evident in its radial layout in Figure 9-1 and Table 9-2. We can say that the London network is more resilient than the other regions, with a lot of small and inter-connecting streets, exhibiting properties of a historic city's growth.

[Figure 9-4. Analysis of clustering coefficient. Panels: A Connecticut; B London; C Toronto; D Sydney; E Seattle; F Washington D.C. The CDFs of $C$ and $C_r$ show large values of clustering coefficient for the regions' networks compared to random graphs, indicating that the former's network structure exhibits small world properties.]

9.3.4 Small World Analysis

We find that the networks of urban streets of all six regions clearly exhibit small world properties. In general, a small world network should have a small average path length ($L < 6$) and a large clustering coefficient ($0.4 < C < 1$). We make a basis

PAGE 134

for a fair comparison by using the Erdos-Renyi $G(n, M)$ model [71] to generate a random graph for each city separately, with $n = V$ and $M = E$. To ascertain our structure, we examine $C$ against $C_r$ for each city, requiring $C$ to be extreme in that distribution and greater than the ninety-fifth percentile. Next, we calculate the average path length ($L$) and clustering coefficient ($C$) of the six regions' networks and compare them against $L_r$ and $C_r$ of the random graphs respectively, as shown in Table 9-2. We find that the networks of all six regions have a small average path length ($\forall L < 6$) and a large clustering coefficient ($\forall C \le 1$, $C \gg C_r$), with Toronto having the largest value of clustering, $C = 0.6$. The CDFs of the clustering coefficient shown in Figure 9-4 reveal: i. a quick convergence for the random graphs, indicating very small clustering coefficient values of $C_r$; ii. all values of $C > 0.3$ and large gaps between the curves, indicating that the networks of the regions exhibit strong small world properties.

[Figure 9-5. CDF showing correlation of traffic densities between hours of the day. Panels: A London; B Sydney. Horizontal axis: correlation coefficient; one curve per hour lag.]

9.4 Traffic Modeling and Characterization

Having studied the connectivity of urban streets, we now turn to modeling and characterizing the traffic density on these streets. We will see how the traffic is correlated with itself over several hours of the day. Later, we will use known theoretical distributions to model traffic densities.
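The small world comparison of Section 9.3.4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the tooling used in this study: it computes the average clustering coefficient and average shortest-path length of an undirected graph and compares them with an Erdos-Renyi $G(n, M)$ graph of the same size. The ring-lattice toy network stands in for a street graph and, like all function names here, is hypothetical.

```python
import random
from collections import deque

def clustering_coefficient(adj):
    """Average local clustering coefficient of an undirected graph."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue  # nodes with fewer than two neighbors contribute 0
        nb = list(nbrs)
        links = sum(1 for i in range(k) for j in range(i + 1, k)
                    if nb[j] in adj[nb[i]])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)

def average_path_length(adj):
    """Mean shortest-path length over connected ordered pairs, via BFS."""
    total, pairs = 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else float("inf")

def random_graph_gnm(n, m, rng):
    """Erdos-Renyi G(n, M): n nodes and m distinct edges chosen uniformly."""
    adj = {i: set() for i in range(n)}
    all_pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    for i, j in rng.sample(all_pairs, m):
        adj[i].add(j)
        adj[j].add(i)
    return adj

def ring_lattice(n, reach):
    """Toy stand-in for a street graph: each node links to `reach` neighbors per side."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, reach + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj

rng = random.Random(3)
streets = ring_lattice(30, 2)
edges = sum(len(nb) for nb in streets.values()) // 2
reference = random_graph_gnm(len(streets), edges, rng)  # n = V, M = E

c, l = clustering_coefficient(streets), average_path_length(streets)
c_r, l_r = clustering_coefficient(reference), average_path_length(reference)
print(f"C={c:.3f} C_r={c_r:.3f} L={l:.3f} L_r={l_r:.3f}")
```

A single random graph is used here for brevity; comparing $C$ against a percentile of $C_r$, as described above, would require generating many such graphs per city.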

PAGE 135

[Figure 9-6. Traffic with varying densities is shown: (A) low, $d$ = 2023, 0.28; (B) medium, $d$ = 5400, 0.55; (C) high, $d$ = 9230, 0.93. The first value is the result of background subtraction and the latter is the normalized value.]

9.4.1 Traffic Flow Auto-Correlation

We investigate correlation coefficients ($\rho$) to measure the degree to which the traffic from a camera is linearly associated with itself over 42 days. In our case, we use this to analyze the change in traffic densities. We analyze the correlations at 1-4 hour lags for each camera against itself during 12 hours of the day, from 7 AM to 6 PM. For example, we investigate the correlation between the traffic at 7 AM and 8 AM (1-hour lag), 1 PM and 3 PM (2-hour lag), etc. In Figure 9-5, we show the CDF for various hour lags of the day. For the city of Sydney, the hourly traffic change is highly correlated: for almost 80% of cameras, the next hour's traffic is 70% correlated with the current hour. For the next two hours from the current hour, the traffic for 80% of the cameras is only 50% or less correlated, and around 60% of cameras have only 30% correlation for a time lag of 3-4 hours. In the case of the city of London, the next hour's traffic density for 80% of cameras is close to 60% correlated with the current hour. It goes further down to 30% for the next two hours and to around 15-20% for a 3-4 hour difference. Thus, vehicular traffic has temporal richness, which in turn affects the mobility of vehicles and therefore has an impact on the performance of routing protocols [14]. Similar trends are observed in other regions, but are omitted here for brevity.
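The lagged correlation described above can be illustrated with a short sketch. The exact pairing of samples used in this study is not spelled out in the text, so the version below makes an explicit assumption: for one camera, the density at a given hour is paired with the density `lag` hours later on the same day, and the Pearson coefficient is taken across days. The data layout, the function names, and the three-day toy record are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # correlation is undefined for a constant series; report 0
    return cov / math.sqrt(var_a * var_b)

def lag_correlation(daily, hour, lag):
    """Correlate density at `hour` with density at `hour + lag` across days.

    `daily` is a list with one dict per day, mapping hour of day to density.
    """
    now = [day[hour] for day in daily]
    later = [day[hour + lag] for day in daily]
    return pearson(now, later)

# Hypothetical densities for one camera over three days (hours 7, 8, 9).
daily = [
    {7: 10, 8: 21, 9: 5},
    {7: 20, 8: 41, 9: 7},
    {7: 30, 8: 61, 9: 6},
]
r_one_hour = lag_correlation(daily, hour=7, lag=1)
r_two_hour = lag_correlation(daily, hour=7, lag=2)
print(round(r_one_hour, 3), round(r_two_hour, 3))  # 1.0 0.5
```

Repeating this for every camera and every starting hour from 7 AM to 6 PM, and then taking the CDF of the resulting coefficients per lag, would give curves of the kind summarized in Figure 9-5.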

PAGE 136

[Figure 9-7. Best fits for six regions. The values in the box show deviation. Axes: distribution model (Exponential, Log-gamma, Log-logistic, Normal, Poisson, Weibull) vs. average % of locations with that model as 1st best fit. Box values in extraction order: <3%=4%, <5%=5%; <3%=17%, <5%=24%; <3%=2%, <5%=4%; <3%=54%, <5%=41%; <3%=23%, <5%=26%.]

9.4.2 Traffic Modeling and Characterization

Here, we focus on modeling the arrival process of traffic (the traffic density value) in equal intervals of time against known theoretical distributions. In Figure 9-6, we show three traffic scenarios of varying intensity, from a low-traffic to a fully congested location, captured by the density parameter (d). The objective of this study is to help understand the underlying statistical patterns. We have already filtered the data so that the statistical patterns can be selected and identified without much deviation. To ensure validity, we also performed several goodness-of-fit tests, using maximum likelihood estimation (MLE) and the Kolmogorov-Smirnov (KS) test, to measure the average deviation and compare the values in the density vector to a known distribution. We systematically model the empirical traffic density distribution of individual locations against well-known theoretical ones. For the matching, we use five theoretical distributions: Exponential, Gamma, Log-logistic, Normal and Weibull. We find that traffic at individual cameras can vary a lot, but in general the log-logistic, Gamma and Weibull distributions can capture some of the key features. We rank these distributions (based on KS tests) in Table 9-1: for four out of six regions, the individual locations have log-logistic as the 1st best fit, while Toronto has the Gamma distribution. In Table 9-1, we show the dominant distributions at 3% and 5% deviation using the KS test. In Figure 9-7, the results show the dominance of distributions for all the locations from all six regions. Overall, the empirical data closely matches the log-logistic and Gamma distributions. We find that even

PAGE 137

on regions' aggregate traffic levels, the log-logistic distributions provide a good estimate of the empirical data. These results reflect realistic scenarios and can be used as input for simulators to evaluate the performance of vehicular routing protocols.

9.5 Future Application to Vehicular Networks

The experience gained from the analysis and modeling of traffic densities can potentially aid the future design and evaluation of vehicular networks. Today, most simulation tools take generic or random scenarios as input and disregard the challenges brought by mobility in vehicular networks [14, 217, 249]. In our case, the urban street analysis, the large dataset of realistic traces, and the modeling results prove to be very helpful in developing rich scenarios for testing protocols, network dynamics, scalability of traffic, topology size estimation, and the analysis of traffic patterns. Data-driven, realistic simulation tools and mobility models are necessary for accurate evaluation of vehicular routing protocols and services. However, our analysis shows that traffic characterization and communication network analysis tools (e.g., ns2) are developed separately and therefore lack tight integration [217, 194]. Gathering and analyzing real traffic data can aid in identifying metrics (e.g., spatio-temporal density) for developing data-driven mobility models and simulators. The unique challenges (e.g., high speed, intermittent connectivity) in inter-vehicle [30] and car-to-roadside [123] communication require the development of robust and efficient routing protocols. We can use the cameras' geo-coordinates and their traffic density distributions to develop and test new performance metrics and protocols. In the future, we aim to focus on developing realistic and data-driven models. We also plan to make this dataset available to the research community and to extend our existing work to study centrality measures for all the cities.

9.6 Conclusion

We know that topological properties (like directions and lanes) impact the movement of vehicular traffic on roads. In this paper, first we have discussed an approach to create

PAGE 138

a network of urban streets from driving directions and, second, the use of vehicular snapshot images from freely available online cameras for traffic analysis. Our results have shown that for three regions (Connecticut, Sydney, and Toronto), during several trips, visits to their locations and streets exhibit a power-law distribution. A temporal auto-correlation of 80% is evident for traffic densities in those three cities for consecutive hours (1-2 hours) of the day. London shows a high and variable traffic pattern. We have observed a stable periodicity of traffic density over many days (42 days), corresponding to weekdays and weekends. This is an important result and can aid in developing future traffic prediction models. We have also found that empirical traffic densities closely follow (with less than 3% deviation) theoretical distributions like log-logistic and Weibull. We believe our work will provide a much needed contribution to the research community.
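The goodness-of-fit ranking behind the distribution results of this chapter can be illustrated with a small sketch. It is a simplified illustration, not the fitting code used in this work: it assumes the reported deviation corresponds to the Kolmogorov-Smirnov distance, it covers only two candidate families whose maximum-likelihood parameters have closed forms (exponential and normal), and all names are hypothetical. Other families, such as log-logistic or Weibull, would need a numerical MLE step that is not shown.

```python
import math


def ks_statistic(sample, cdf):
    """Kolmogorov-Smirnov distance between a sample and a candidate CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # compare with the empirical CDF just after and just before x
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def exponential_cdf(sample):
    """Exponential CDF with the rate set by maximum likelihood (1 / mean)."""
    rate = len(sample) / sum(sample)
    return lambda x: 1.0 - math.exp(-rate * x) if x > 0 else 0.0


def normal_cdf(sample):
    """Normal CDF with maximum-likelihood mean and standard deviation."""
    n = len(sample)
    mu = sum(sample) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in sample) / n)
    return lambda x: 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2))))


def rank_fits(sample, candidates):
    """Return (name, KS distance) pairs, best (smallest distance) first."""
    scores = [(name, ks_statistic(sample, make(sample)))
              for name, make in candidates.items()]
    return sorted(scores, key=lambda item: item[1])
```

With one density vector per camera, `rank_fits(vector, {"exponential": exponential_cdf, "normal": normal_cdf})` returns the candidates ordered by KS distance; a threshold such as 0.03 or 0.05 on that distance could then play the role of the 3% and 5% deviation levels, if deviation is read that way.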

PAGE 139

CHAPTER 10
INFRASTRUCTURE TO ENABLE CITY-WIDE UBIQUITOUS COMPUTING

10.1 Introduction

Ubiquitous computing is deemed vital for the development of environment-friendly and sustainable smart cities. Its use has been realized, for example, in electronic road payment systems, computer-driven mass transit, smart postal services, and mobile networks [26, 149]. On the other hand, the urban infrastructure, such as geographical spread and road networks, will influence the design and deployment of such ubiquitous computing systems on an urban scale [149, 252]. Hence, this tight coupling calls for a thorough understanding of urban infrastructure for the realization of a city-wide ubiquitous computing effort. There are several factors that impact the understanding of urban infrastructure. In general, the topological features used to study the infrastructure of an urban setting primarily involve its geographical spread and area, network of motorways, structures (such as buildings and dams) and human population density. In this regard, many studies have examined a stark difference between self-organized cities that evolved through some historical process and those that are the result of a single plan, mostly producing a grid-like structure. The historical cities have an observably more densely packed network of intersections and small motorways, and a less fragmented and decentralized geographic expansion [38, 51, 251]. They are also more populated, inhabited, and demonstrate more complex and dynamic eco-systems. On the other hand, man-made designed cities are more structured, with evenly distributed spatial flows and sparse landscaping. Other meta-physical factors such as globalization, socio-economic and financial viability, technological advancement, and politics also affect our understanding of urban infrastructure. Moreover, these diversities bring a principal challenge to the design and development of practical tools [76] for measuring and quantifying their interacting effects, popularly known as emergent properties. In addition to this, the evaluation criterion should be generic enough to apply to any setting (urban

PAGE 140

infrastructure). Essentially, a study of these activities will provide significant insight for enabling a city-wide ubiquitous computing environment. For instance, the impact of the aforementioned features is well documented for Singapore and Korea in [26]. Several other studies have also shown idiosyncrasies in deployed systems based on the structure and function of urban infrastructure [80, 81].

Recently, Departments of Transportation (DOTs) across several metropolitan areas have started to deploy traffic web-cameras. These cameras are strategically located and positioned towards motorways to enable the monitoring of vehicular traffic. At a constant interval, they capture snapshots of traffic conditions, which are then available for viewing on the DOTs' media servers. We have collected and processed more than 25 million such images to generate a longitudinal time series dataset of traffic densities for more than 800 locations spread over six regions (Connecticut, London, Seattle, Sydney, Toronto, and Washington D.C.) around the world. In this chapter, we harness the power of these cameras and use this dataset to study vehicular traffic conditions, and use the cameras' geographical spread to analyze the topological properties of these regions. In the current scenario, Online Social Networks (OSNs) such as Facebook and FourSquare are tightly integrated with our lifestyle. They provide numerous ways to share our diurnal patterns, presence and movements with the rest of the world. In order to study the human dynamics in these six regions, we have collected anonymous spatio-temporal footprints of human activities through FourSquare. In this work, we integrate these two different datasets (vehicular and human) and use generic tools such as network centrality to comprehensively study and reason about the urban infrastructure of these six regions. These regions are a mix of planned man-made cities and self-organized historical cities, as discussed before.

In our approach, we first examine the nature of vehicular traffic. We study traffic patterns (regular or random), and evaluate traffic correlations across all location pairs and their stability (auto-correlation) in their region. We also locate hotspots (congestion

PAGE 141

prone zones) and locations with similar traffic in individual regions. Second, we examine the spatial features of these locations. We correlate travel distance and time among these locations and reason about reachability. Then we turn the geographical map of these regions into network graphs. We employ various centrality measures [184] in order to assess the structure and connectivity, which have a big impact on the behavior (topology) of the system. Finally, we examine the human dynamics together with vehicular traffic at these locations and study their correlation in urban settings.

To summarize, our contributions are:

- We propose a novel technique that uses the global infrastructure of traffic cameras to perform a longitudinal study of urban infrastructure.
- We provide a comprehensive study of the topological features by integrating diversified data on vehicular traffic, urban streets, and human dynamics. In future, we plan to release this dataset to the research community.
- Our approach involves the use of generic and systematic techniques that can be scaled and used for any urban or rural setting.

The rest of the chapter is organized as follows: first we summarize related work, second we provide details of our dataset and processing techniques, third we examine traffic density distribution patterns, fourth we perform a network evaluation of urban street maps, fifth we study human dynamics together with vehicular density patterns, and then we conclude our work.

10.2 Related Work

In this section, we discuss the related work, which is categorized into data collection and pre-processing, network analysis for urban streets, and ubiquitous computing for vehicular networks. In the first category, we discuss the inadequacies of existing repositories of vehicular mobility data. Next, techniques used to process image data are examined. In the past, efforts have been made to collect vehicular mobility records by GPS traces, via loop detectors, and by radio sensors [131, 143, 251]. However, these datasets are generally not publicly available and are limited in their scope, size, and

PAGE 142

geographic spread. In addition to their short timeline (typically only a few days; see footnote 1), which limits their use for longitudinal analysis, the methods applied were also specific to these datasets and cannot be scaled for other purposes. We believe that, similar to the pedestrian trace dataset in [140], a comprehensive record of vehicular mobility is vital for research in future vehicular networks. In contrast to the datasets described above, our dataset covers six regions, for periods of several months, at hundreds of locations (see the Measurement section for specifics).

[Footnote 1: Specifically, [12] uses 3 days of data, [131] uses traces ranging from 30 hours to a total of 400 hours from 4 to 100 cars (sensing points in that context), while [165] uses a longer sample, 30 days, but only at 5 locations.]

Second, central to the data collection process is the image processing, designed to be computationally efficient for such a large dataset. Many studies [43] have looked into aspects of both background subtraction [44, 193, 218] and object detection [152]. In background subtraction methods [69], the difference between the current frame and a reference frame is used to identify objects. In detection-based approaches [224], learned object features (shape, size, etc.) are used to detect and classify objects. In this research, we use temporal methods for background subtraction to estimate a relative numerical value instead of counting cars. We find that background subtraction is much faster, robust to outliers, universally applicable, and more scalable than object detection (see the Measurement section for more details).

Third, attempts have been made to examine the structure and topological features of vehicular networks using applied graph measures such as centrality. In this regard, Cardillo et al. [38] performed a 1-square-mile project on several cities across the world and studied local and global properties of the graphs to categorize their organic versus planned growth. They also studied the backbone of a city by deriving spanning trees based on edge betweenness and edge information. Several other studies have also

PAGE 143

found that for sustainable urban design, centrality, self-organized structures, and scaling are driving forces [25, 51, 243]. In [196, 197], the authors examined the relationship between street centrality and the densities of commercial and service activities in the cities of Bologna and Barcelona. We take the approach of studying the features of a region by extracting motorways that are frequently used and connected with major locations. This way, we isolate the issues of congestion and connectivity among zones of these regions.

Finally, in-car and out-car ubiquitous computing has shown promising results in terms of driverless cars, control interfaces for future cars that will have minimal visual demands, and sophisticated traffic management systems, such as those incorporating dynamic traffic assignments [57, 90, 168, 248]. While further examination of these cutting-edge technologies is beyond the scope of this research, we believe our approach provides a planet-scale system for data collection, providing invaluable data for the development of future vehicular services and applications.

10.3 Spatio-Temporal Analysis

In this section the characteristics of the traffic data are analyzed across time and location; a spatio-temporal analysis. To begin, Figure 10-1 shows the average density over all cameras for two cities, Sydney and London. As can be seen, there is an expected diurnal pattern: a morning and an evening rush hour. However, the time series also exhibits a high variance, as can be seen from the 95% confidence intervals (dashed line). This underlines the fact that while a strong diurnal pattern is evident on average days, this may not be the case for some particular days. Comparing the two cities, it is also interesting to note that Sydney has a significantly lower average than London but a higher variance. This is contrary to typical time series, where a higher mean is usually accompanied by a higher variance. The most likely explanation is that in a city with high congestion there is little room for maneuver; the city is quite simply congested every day and so the traffic density every day looks broadly similar. The implication of this type of behavior

PAGE 144

is that with future efforts to relieve congestion comes increased difficulty in predicting congestion.

[Figure 10-1. Average density for regions of Sydney and London. Axes: hour vs. traffic density; series: Sydney, London.]

The next step examines the daily patterns in the average density for a city to see if the large variability observed in Figure 10-1 can be explained. The data is decomposed into a 30 x 12 matrix of 30 daily patterns, each 12 hours long. A Kohonen neural network is then used to classify these 30 daily patterns into groups called day-types. A Kohonen neural network is composed of a grid of output points (in this case a 2-D grid), where each grid point has an associated pattern and the patterns in adjacent grid points are similar. When a pattern is presented as input to the network, it is compared to the grid patterns and the closest match is declared the winner. The training algorithm consists of beginning with random patterns on the grid points and adjusting the (at first random) winner and its neighbors until the data has been sifted into its constituent groups. The specific algorithm used here is explained in detail in [111]. Figure 10-2A shows the resulting map constructed from the Sydney dataset. As can be seen, there are two very distinct day-types in the data, each covering approximately half the data. The corresponding patterns at those two grid locations are also shown in Figure 10-2B; these are the archetypal day-types. These show an intriguing result: for day-type one (solid black), the evening rush is roughly the same as the morning rush with the

PAGE 145

expected lull in the middle; for the second day-type, however, the morning rush is dominant, with a larger peak than expected during the afternoon and an evening peak that is much smaller than the morning rush. The second day-type can be partially explained by the weekends, but not completely so (as it accounts for almost half the data); thus the traffic in this dataset is not as predictable as may originally have been assumed. For traffic management it is obviously important to know the different day-types that exist in the network and when they are likely to occur.

[Figure 10-2. Kohonen network results. (A) Kohonen map for Sydney data showing 2 day-types, (B) the 2 patterns associated with the two peaks in (A).]

In general, it has been observed that vehicular traffic follows certain patterns and shows periodicity. In order to enable ubiquitous computing, it is important to study and quantify them. In particular, we ask the following questions:

Q.1: What does the traffic distribution look like across several hours for multiple days?
Q.2: How is the traffic distributed across several locations of a region?
Q.3: How is the traffic correlated with itself? Is the traffic predictable?

In order to answer the first question, we sample the traffic dataset hourly for a period of 42 days and study the density distribution patterns. The results indicate that cameras have varying traffic distributions, contrary to the popular notion of 'rush hours'. In Figure 10-3, we show the traffic density distribution from four sampled cameras. It is

PAGE 146

[Figure 10-3. Several variations in traffic densities across six weeks of traffic monitoring. Panels: (A) Low Traffic Density, (B) High Traffic Density, (C) Periodic Traffic Density, (D) Random Traffic Density. (A) shows relatively mild traffic during various hours of the day, while (B) shows high traffic recorded for the full trace period. In (C) we find a regular pattern during the morning and evening hours, when the traffic is relatively higher than in the afternoon intervals. A random traffic characterization is recorded in the last panel.]

[Figure 10-4. A CDF showing the distribution of traffic densities that are correlated. Axes: correlation score vs. CDF (location pairs); series: Connecticut, London, Seattle, Sydney, Toronto, Washington D.C.]

PAGE 147

[Figure 10-5. A CDF showing the distribution of average auto-correlation. Axes: ACF score vs. CDF (locations); series: Connecticut, London, Seattle, Sydney, Toronto, Washington D.C.]

[Figure 10-6. Similarity and hotspot traffic in the Sydney and Toronto regions. (A) Similarity in traffic for Sydney. (B) Hotspots in Toronto.]

evident that Figure 10-3A has very low traffic throughout the 12-hour period for all 42 days. In Figure 10-3B, consistently high traffic is recorded for a street in London, with relatively less traffic during the weekends (days 7, 14, and 21). We also find periodicity in traffic during the morning and evening hours in the case of Sydney, as shown in Figure 10-3C. Here the temporal activity reaches its maximum value during the morning and evening hours, while it is low during the afternoon hours. Finally, some random patterns are observed in Figure 10-3D. In general, these results reject the notion of one-size-fits-all and provide essential input for deploying ubiquitous systems that conform to periodicity or randomness.
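The day-type classification with a Kohonen map, described earlier in this section, can be illustrated with a compact sketch. It is a generic self-organizing map written under stated assumptions, not the specific algorithm of [111]: the grid size, the linearly decaying learning rate, the Gaussian neighborhood, and all function names are illustrative choices.

```python
import math
import random


def train_som(patterns, rows=3, cols=3, epochs=200, seed=0):
    """Train a small 2-D Kohonen map; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(patterns[0])
    lo = min(min(p) for p in patterns)
    hi = max(max(p) for p in patterns)
    # start from random patterns on every grid point
    grid = {(r, c): [rng.uniform(lo, hi) for _ in range(dim)]
            for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        frac = epoch / epochs
        rate = 0.5 * (1.0 - frac)                              # decaying step size
        radius = 0.5 + (max(rows, cols) / 2.0) * (1.0 - frac)  # shrinking neighborhood
        for p in rng.sample(patterns, len(patterns)):          # random presentation order
            wr, wc = winner(grid, p)
            for (r, c), w in grid.items():
                d2 = (r - wr) ** 2 + (c - wc) ** 2
                pull = rate * math.exp(-d2 / (2.0 * radius ** 2))
                for i in range(dim):
                    w[i] += pull * (p[i] - w[i])
    return grid


def winner(grid, pattern):
    """Grid point whose stored pattern is closest (squared Euclidean)."""
    return min(grid, key=lambda k: sum((a - b) ** 2
                                       for a, b in zip(grid[k], pattern)))


def day_types(grid, patterns):
    """Map each winning grid point to the indices of the days it captures."""
    groups = {}
    for day, p in enumerate(patterns):
        groups.setdefault(winner(grid, p), []).append(day)
    return groups
```

Applied to a 30 x 12 matrix of daily patterns, `day_types(train_som(matrix), matrix)` would list which days fall on which grid point; grid points that collect many days, together with their weight vectors, correspond to the archetypal day-types.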

PAGE 148

To answer the second question, we perform a correlation analysis of the traffic time series for all pairs of locations in a region. Our results in Figure 10-4 indicate that the traffic distribution across 50% of Toronto's locations is 75% correlated and that across 60% of Seattle's locations is 50% correlated. It makes sense that the correlations are high, since many cameras deployed in these two regions are on highways, which generate consistent traffic patterns. In the case of Sydney and London, we find that the deployed cameras are within city limits (business places and residential areas), and we believe they have uncorrelated traffic distribution patterns. These results provide an important insight into the categorization of various motorways based on how strongly their traffic distributions are correlated with each other. In Figure 10-6, we point out similar traffic patterns (same color bulbs) for Sydney and hotspot (high intensity) traffic locations for the Toronto region.

Next, to answer the third question, we sample the traffic of each location and calculate the auto-correlation function to examine the variability in the patterns across several weekdays. The result of this analysis is shown in Figure 10-5. We find that for Seattle, Toronto, and Connecticut the traffic is highly auto-correlated. For London, we have registered some variation during the weekdays, and the least auto-correlated are Sydney and Washington D.C., where the traffic is nearly 70% auto-correlated at 40% of their individual locations (region-wise).

These findings are very interesting for Seattle and Toronto, where the distribution of traffic is not only correlated among locations, but also highly correlated with itself at individual locations (more predictable). Sydney and London, in contrast, demonstrate a lot of variance in their auto-correlations and in the correlations across locations. Overall, we believe this study will provide a lot of insight into the deployment of ubiquitous systems such as self-driving vehicles and transit systems.
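The auto-correlation step used for the third question can be sketched as follows. This is a minimal illustration, assuming one hourly density series per location and the standard sample auto-correlation estimator; the helper names are hypothetical, and the choice of lag (for example 12 samples for one day of 12 sampled hours) is an assumption rather than a documented setting.

```python
def autocorrelation(series, lag):
    """Sample auto-correlation of one location's density series at `lag`."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    if var == 0:
        return 0.0  # constant series: no variation to correlate
    cov = sum((series[t] - mean) * (series[t + lag] - mean)
              for t in range(n - lag))
    return cov / var


def empirical_cdf(values, x):
    """Fraction of values that are less than or equal to x."""
    return sum(v <= x for v in values) / len(values)
```

Computing `autocorrelation` for every location of a region and passing the resulting scores to `empirical_cdf` gives one point of a CDF curve such as those in Figure 10-5; the same `empirical_cdf` helper applies to the pairwise correlation scores behind Figure 10-4.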

PAGE 149

[Figure 10-7. A scatter plot of distance vs. time for six regions with linear fit. Panels: (A) Connecticut, (B) London, (C) Seattle, (D) Sydney, (E) Toronto, (F) Washington D.C. Axes: normalized time and normalized distance.]

10.4 Network Analysis

In this section, we perform a spatial analysis of motorways and intersections to study the geographical spread, connectivity, and city dynamics. The details of the routes used are shown in Table 6-2. First, we examine travel distance and time among all locations of individual regions. Second, we turn the urban street map of each region into a network graph and use centrality measures to study the structure and function of the networks. Here, we want to emphasize the use of travel distance and driving time

PAGE 150

[Figure 10-8. Centrality distribution for Sydney. Panels: (A) Betweenness, (B) Closeness, (C) PageRank. Node labels are location numbers.]

PAGE 151

among location pairs, which helps us to focus our analysis only on the motorways that are frequently taken. This provides much more insight into the realistic nature of traffic movements than examining entire cities with many infrequent routes [38, 51].

Table 10-1. Parameters and details (parameter symbols lost in extraction).
Driving distance between two locations
Driving time between two locations
Betweenness score
Closeness score
PageRank score
Average
Std. deviation
Correlation

Table 10-2. Results for spatial, temporal and network analysis (column values fused in extraction).
Connecticut 604403421942230.8822593679124550.0030.001
London 15605973726120.81838121198360.0050.002
Seattle 311261642723100.8831647752220.0090.004
Sydney 329372102634160.7812717945160.0150.005
Toronto 12559614819275930.971879265965330.00480.0018
Washington D.C. 1461190401780.747811082186700.0040.001

[Figure 10-9. A CCDF of distance and time between location pairs of six regions. Panels: (A) Distance, (B) Time. Series: Connecticut, London, Seattle, Sydney, Toronto, Washington D.C.]

10.4.1 Distance and Time Analysis

In general, the travel distance and time of commuters are significantly influenced by the city size and the interconnection of its motorway networks [118]. In case of slow connectivity and congestion, movement shifts to carpooling and ride-sharing

PAGE 152

approaches. From a network point of view, this reveals the structure and function of the city's infrastructure.

A first glimpse of the distribution of travel distance and corresponding time is shown in Figure 10-7. For a perfect correlation between travel distance and time across all locations of a region, the scatter plot should be centered around the linear fit, as is visible for Toronto, whose correlation coefficient is 0.97, as shown in Table 10-2. A good correlation is also found for Connecticut and Seattle, where cameras are mostly deployed on highways that have constant-speed traffic over long distances. Most of the cameras deployed inside the cities of Sydney and Washington D.C., in contrast, are likely to be near more signals and business spots, which tend to have low speed limits and therefore long travel times for short distances; we expect traffic congestion to occur where slow and fast stretches meet in these networks. We also quantify the cross-correlation between travel time and distance for the six regions; the results are shown in Table 10-2. The table also provides an insight into the average travel distances and times for these regions. Many statistics can be read from this table, for example the average deviation in distance and time. In Figure 10-9, we show the CCDF of the travel distance and time for the six regions. We find that, except for Toronto, all regions have short travel distances, while in Toronto the average distance is 5 km. In the case of travel time, all journeys take less than 100 minutes, with the exception of Toronto. These results indicate that locations with low correlation are prone to traffic congestion.

10.4.2 Network Theory

We examine the structure of motorways using network theory. We represent the locations of a region by the vertices of a graph, and the motorways connecting these locations by the corresponding edges of the graph. In order to study the influence of locations and certain motorways, we count the number of times a motorway is taken and assign the equivalent weight to the corresponding edge in the

PAGE 153

network graph. Such a representation helps to identify critical junctions and motorways that are prone to congestion, infrequently taken routes, and the overall structure of motorway networks.

Network theory has been used in many different fields, where relationships are modeled as a network graph and algorithms are applied on top of it in order to examine the connectivity, structure, and function of such networks. In this chapter, we focus our analysis on centrality measures, which tell us which locations are most central to the network. We use three main measures: (i) Betweenness, (ii) Closeness, and (iii) PageRank. Betweenness centrality measures the extent to which a location lies on a route when moving among other locations. The most visited locations have a high betweenness score and have considerable influence on the connectivity of the road network. Such locations, when congested or closed, may cause considerable disruption of traffic across the motorway network. In this study, we use betweenness centrality to discover locations that are highly visited and are deemed important for efficient route discovery between pairs of source and destination. Closeness centrality measures the mean distance from one location to the others. In our case, this centrality helps to identify locations and motorway routes that are infrequently taken and hence can serve as alternate routes in case of congestion at the locations with the highest betweenness. These routes are important for evacuation route planning and for identifying alternate routes in a city where a crisis can bring the traffic to a halt on major routes. We use PageRank centrality to examine the nearest locations that contribute traffic as well as experience the traffic load from nearby locations and motorways. Since the formation of congestion is an emergent process, PageRank centrality can help to identify the tipping points in the network that have the potential to disrupt the traffic at the locations and motorways with the highest betweenness. For more information, interested readers can refer to [184].
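The three centrality measures can be illustrated with a short sketch. It is a simplified illustration under stated assumptions: the road network is treated as an undirected, unweighted graph stored as a dictionary of neighbor lists (the edge weights from route counts described above are not used), closeness is reported in the common form of the inverse mean hop distance, and the function names are hypothetical. Betweenness follows Brandes' accumulation scheme and PageRank uses plain power iteration.

```python
from collections import deque


def bfs(adj, source):
    """Hop distances, shortest-path counts, and predecessors from source."""
    dist, sigma, preds, order = {source: 0}, {source: 1}, {source: []}, []
    queue = deque([source])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj[v]:
            if w not in dist:
                dist[w], sigma[w], preds[w] = dist[v] + 1, 0, []
                queue.append(w)
            if dist[w] == dist[v] + 1:
                sigma[w] += sigma[v]
                preds[w].append(v)
    return dist, sigma, preds, order


def betweenness(adj):
    """Brandes' algorithm for an undirected, unweighted graph (unnormalized)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        _, sigma, preds, order = bfs(adj, s)
        delta = {v: 0.0 for v in order}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    return {v: b / 2.0 for v, b in score.items()}  # each pair was counted twice


def closeness(adj):
    """Inverse of the mean hop distance to every other reachable location."""
    result = {}
    for s in adj:
        dist, _, _, _ = bfs(adj, s)
        total = sum(dist.values())
        result[s] = (len(dist) - 1) / total if total else 0.0
    return result


def pagerank(adj, damping=0.85, iterations=100):
    """Power iteration; an undirected edge is treated as two directed links."""
    n = len(adj)
    rank = {v: 1.0 / n for v in adj}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in adj if not adj[v])
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in adj}
        for v in adj:
            for w in adj[v]:
                new[w] += damping * rank[v] / len(adj[v])
        rank = new
    return rank
```

On a star-shaped toy network, for instance, the hub obtains the highest score under all three measures, which mirrors the role of a junction through which every route must pass.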

PAGE 154

[Figure 10-10. The scatter plots show the distribution of three centralities for six regions. Panels: (A) Connecticut, (B) London, (C) Seattle, (D) Sydney, (E) Toronto, (F) Washington D.C. Axes: betweenness, closeness, PageRank.]

10.4.3 Network Theory Analysis

We start our analysis by looking at the distribution of the various centralities for the city of Sydney in Figure 10-8. We find that locations 31 and 21 are the most visited locations, and their betweenness scores are higher than those of the other locations. It turns out that location 21 is the Sydney bridge that connects two different islands, and 31 serves as a major highway (M2) that provides entry and exit points inside the city. In Figure 10-8B

PAGE 155

the locations that are less frequently visited are the end points of the graphs. Finally, using PageRank centrality, we are able to identify locations that can contribute traffic to the locations with the highest betweenness. One evident example is location 64, which is directly connected to 21 and provides entry and exit to the south-eastern part of Sydney. We find similar results for the other regions as well. In Table 10-2, we provide the quantitative numbers for the centralities. In general, we find large deviations from the average values, indicating that the networks have skewed connectivity and traffic distribution. We show the distribution of locations over all three centralities in Figure 10-10. Our results show that in all cities there are at least 2-3 locations that have very high centralities and hence are critical in maintaining the connectivity of the network.

This analysis provides a lot of insight into the connectivity of motorways and locations. The results indicate that the Sydney and London regions have high betweenness scores and are prone to congestion. Along the same lines, the results for PageRank centrality show that traffic is emergent in nature, and the traffic present at high-betweenness locations can be attributed to the traffic that has passed through locations with high PageRank centrality. We believe our analysis will open new ways to study the traffic patterns of future cities and aid in the deployment of ubiquitous systems.

[Figure 10-11. A CDF plot showing the distribution of vehicular traffic density. Panels: (A) London, (B) Sydney, (C) Washington D.C. Axes: locations vs. % density; series: FourSquare, traffic density.]

Table 10-3. FourSquare data.
Region            Number of Venues   # Check-ins
London            9055               1765181
Sydney            3350               20274
Washington D.C.   11496              1982339

PAGE 156

[Figure 10-12. A bar plot that compares the traffic densities against FourSquare check-ins. Panels: (A) London, (B) Sydney, (C) Washington D.C. Axes: location vs. normalized densities; series: vehicular traffic, FourSquare.]

10.5 Social Analysis

In this section, we study correlations between the density distribution of pedestrians (humans), obtained through Online Social Networks (OSNs), and vehicular traffic. It can be argued that vehicular traffic is a function of human activity in many places such as business centers, museums, and downtowns. Also, human crowd aggregation can aid in developing better prediction models for vehicular traffic congestion. To obtain data on human activity, we use the FourSquare OSN. FourSquare is a location-based social networking website for mobile device users. FourSquare provides users a facility to perform check-ins (mark spatio-temporal presence) at venues (locations they visit, such as restaurants, museums, etc.) in order to help keep up with friends and discover nearby places. We count the anonymous check-ins that have occurred at venues near the deployed camera locations. This helps to quantify the number of humans present in the vicinity of camera locations.

We study the distribution of check-ins and vehicular traffic density only for those locations where human activity can take place. In other words, we skip areas such as highways and city outskirts, where monitoring web cameras are deployed but the probability of pedestrian or human presence is very low. After filtering, we found that three regions, London, Sydney, and Washington D.C., are better suited for this analysis, as most of their cameras are situated within city limits and near business locations. In Table 10-3, we give the total number of venues and check-ins recorded

PAGE 157

for these regions. For a fair comparison, the time period of the check-ins matches the time period of the recorded vehicular densities. We use the FourSquare Venue API, which returns a list of venues near the current location. Every venue record has a unique string identifier and a total count of check-in events. These check-in events represent the total number of humans that visited that venue.

Analysis: We show the CDF of vehicular density and the corresponding FourSquare check-ins at camera locations of all three regions in Figure 10-11. Our results indicate that traffic density and FourSquare check-ins are positively correlated for all three regions. Although in the case of London locations around 110 and 150 show some deviation in correlation values, and in Sydney some deviation occurs at locations 18 and 60, in general the aggregate results support the assumption that FourSquare check-ins are correlated with the traffic densities in the selected urban areas of all three regions. The histogram in Figure 10-12 gives a distribution of traffic densities against the FourSquare check-ins. The results show that human activity is highly correlated (80% for London) with vehicular traffic.

10.6 Conclusion

In this chapter, we study the urban infrastructure to enable city-wide ubiquitous computing. We have used the power of global traffic web-cameras, urban street maps and human dynamics to quantify the urban settings of six metropolitan regions around the world. Our vehicular dataset has more than 25 million traffic density records, the urban street dataset has more than 200 thousand routes, and the human dynamics data comprises more than 2 million spatio-temporal check-ins. In this regard, our findings are: (i) Urban traffic shows a multitude of traffic patterns beyond the normal rush hour concept. We found regions that initially have no traffic but end up with heavy traffic, and vice versa. We also find that vehicular traffic is relatively stable and predictable during weekdays. Historical cities like London show a large deviation in travel distances and time, indicating an uneven distribution of traffic speed and a relatively higher number

PAGE 158

of signals and shorter routes with several connections. (ii) The network analysis of urban streets indicates that the centrality measures are able to detect frequently visited locations and routes that are prone to traffic congestion. We are also able to detect locations that contribute to emergent traffic congestion. (iii) We find a high correlation between the spatio-temporal activity of humans and the corresponding vehicular traffic in urban regions such as London and Sydney. We believe our studies will provide significant insight into the systematic study of urban infrastructure for enabling city-wide ubiquitous computing. They help to realize future ubiquitous systems, for example those that enable seamless vehicle-to-vehicle and vehicle-to-roadside communication, and to identify where the communication bottlenecks are for a better deployment of computing systems. In future, we want to expand our studies to more regions. We also look forward to developing a simulator and providing engineering approaches to ubiquitous computing systems based on our experience.

PAGE 159

CHAPTER 11
KNOWLEDGE DISCOVERY AND CAUSALITY IN URBAN CITY TRAFFIC

The increase in the number of vehicles has created problems in many cities across the globe. Building a comprehensive knowledge base about global city dynamics and traffic distribution is a key step toward a fundamental solution to these problems. In this chapter, we examine a readily available data source: the existing infrastructure of traffic cameras around the world. We have collected real-time traffic data from 2,700 public online traffic cameras distributed across 10 cities in four continents for a duration of six months. Our platform allows us to automatically search public cameras, collect and process imagery data, remove outliers, and extract traffic density from those images in a highly scalable way. A time series model employing a co-integrated vector autoregression model is presented, from which traffic forecasts may be produced and regions of the city that are not well observed may be suggested. In addition, a topological comparison of six of these networks is presented.

The increase in the number of vehicles has created a problem of traffic congestion in many world cities. In attempting to solve this problem, isolated approaches like improving the design of roads and intersections and changing usage patterns have been considered. Although these have helped to some extent, the root cause of such aggregation still persists. We believe these approaches should be augmented with a comprehensive picture of cities' structural dynamics and the traffic distribution across their key intersections in a collective manner, and more global cities should be sampled to build a rich knowledge base for such studies.

In this chapter, we utilize for the first time a readily available information source: the existing global infrastructure of thousands of video cameras, providing a continuous stream of street images from dozens of cities around the world. We introduce a novel monitoring, analysis and prediction framework, which consists of a network of planet

PAGE 160

scale public webcams, a fast and scalable traffic density estimation algorithm, and statistical models for traffic forecasting.

Our dataset consists of 125 million images from over 2,700 traffic web cameras in 10 cities/states for six months, with an overall size of 7.5 terabytes. These regions are spread across North America, Europe, Asia, and Australia. In this chapter, we have selected six cities with similar time granularity for a fair comparison.

Our algorithm to estimate traffic density employs a scalable and effective background subtraction technique to process millions of traffic camera images and build an extensive library of spatio-temporal vehicular density data. Based on the mixture of Gaussians, this algorithm is robust to outliers (camera errors) and sensitive to frequently changing lighting conditions. A comparison with ground truth (the number of cars) shows a near-linear correlation, which allows analysis at a network level.

Moreover, we have constructed a time series model for the traffic data using a co-integrated vector autoregression model, to reveal the extent to which the traffic observed at a point in the network is explained by that observed at another location. We employ a Granger network, a network in which causal links are identified, to provide a sparse representation and also to reveal the major pathways in the network. The major contribution here is the actual model itself, which is also the building block for understanding city dynamics.

To this end, our contributions are:
- We provide a novel framework to study the cause and effect relationship of traffic causality in urban streets using thousands of online web cameras. In future, we also plan to release the dataset to the research community,
- we establish that causality on motorways (highways and interstates) is far more perceivable than in a city's local traffic. This is a distinguishing factor that can be used for profiling cities, and
- we construct an empirical time series model based on real data that considers the interaction between the traffic observed at different points in a city. The models produce forecasts with a typical error (PMSE) of 3-9%. They also reveal that traffic

PAGE 161

camera networks tend to be disassortative: many smaller junctions feed into larger ones. It is envisaged that the models constructed here will form the basis of future studies into vehicular traffic dynamics in cities based on real data.

In Section 11.1, we discuss the related work. We provide some theoretical background and the construction of a Granger network for traffic analysis in Section 11.2, present the results in Section 11.3, and finally Section 11.4 concludes this chapter.

11.1 Related Work

Longitudinal and large-scale vehicular datasets are very important in the study of traffic causality and telematics applications, but collecting them is challenging and usually expensive [30, 114, 217]. In some cases, commercial vendors log the number of vehicles, GPS coordinates, speed and movement traces. However, there are three downsides to this. First, these traces are not publicly available to the research community. Second, they contain only particular vehicles with vendor-specific hardware. Third, they are from individual vehicles with short driving distances and, in most cases, non-repetitive journeys. Invariably, these issues undermine the efficacy of using them for any kind of longitudinal analysis. In this chapter, we use the dataset that we proposed in [229], an inexpensive method to collect global-scale vehicular mobility traces using thousands of freely available webcams that provide continuous and fine-grained monitoring of vehicular traffic in the form of traffic snapshot images.

Central to the data collection process is the image processing, designed to be efficient for such a large dataset. Many studies [43] have been carried out that look into aspects of both background subtraction [44, 193, 219] and object detection [152]. In background subtraction methods [69], the difference between the current and reference frame is used to identify objects, while in detection approaches [224], learned object features (shape, size, etc.) are used to detect and classify them. In this chapter, we use temporal methods for background subtraction [227] to calculate a relative numerical value instead of counting cars. We find background subtraction is much faster and more scalable than object detection [227], which is discussed in detail in later sections.
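To make the background subtraction idea concrete, the following is a minimal sketch of a frame-differencing density estimate. It is an illustration only and not the algorithm used in this work: it compares each snapshot with a single fixed reference frame rather than with a temporal mixture-of-Gaussians background model, and the function name `relative_density`, the threshold of 30 gray levels and the toy 3x4 frames are all assumptions made for the example.

```python
def relative_density(frame, reference, threshold=30):
    """Fraction of pixels that differ from the reference (background) frame.

    frame, reference: equally sized 2-D lists of gray levels (0-255).
    threshold: hypothetical cut-off for calling a pixel "foreground".
    Returns a relative value in [0, 1], not a vehicle count.
    """
    if len(frame) != len(reference):
        raise ValueError("frames must have the same number of rows")
    changed = 0
    total = 0
    for row, ref_row in zip(frame, reference):
        if len(row) != len(ref_row):
            raise ValueError("frames must have the same number of columns")
        for pixel, ref_pixel in zip(row, ref_row):
            total += 1
            if abs(pixel - ref_pixel) > threshold:
                changed += 1
    return changed / total if total else 0.0


# Toy example: an empty road (reference) and the same view with a bright blob.
empty_road = [[100, 100, 100, 100],
              [100, 100, 100, 100],
              [100, 100, 100, 100]]
with_cars = [[100, 220, 220, 100],
             [100, 220, 220, 100],
             [100, 105, 100, 100]]

density = relative_density(with_cars, empty_road)
print(density)
```

The result is a relative value per snapshot, which matches the intent stated above of producing a numerical density instead of counting cars; small lighting changes (the pixel at 105) stay below the assumed threshold and are ignored.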

PAGE 162

Previous attempts have been made to use videos and computer vision techniques in vehicle tracking and traffic surveillance to decrease congestion on freeways and solve problems associated with existing detectors [49]. They are also used to estimate traffic speed [52] and for automatic vehicle guidance such as lane finding and distance measurement [130]. However, first, these techniques work on videos; second, they are not widely available; and third, they require far more resources and processing time. Our work involves a study of global traffic patterns using millions of snapshots taken by web cameras; for that matter, our techniques and tools are highly scalable, robust to outliers and can be applied to any kind of images.

Simulation tools like CORSIM [96] and VISSIM [156] are geared to model specific scenarios for planning future traffic conditions at a micro-mobility and small-scale level. But these tools lack features to perform causality-based time series analysis, or to analyze traffic density distribution and its relationship to the capacity of roadways. Thus, our current work helps to fill this gap by analyzing the rich scenarios of traffic patterns evident in world cities. For a long time, various studies have been carried out to understand the induced and causal traffic patterns on highways of various road networks. Hansen and Huang [98] empirically estimated relationships between lane-miles of highway capacity and vehicle-miles of travel (VMT). Their results showed strong support for causality between highway capacity and increase in VMT. A similar study using the Granger test on the Mid-Atlantic region of the United States [87] indicated that changes in lane-miles precede changes in travel, further bolstering the significant relationship between the level of highway capacity, as measured by lane-miles, and the level of travel, measured by daily VMT. In the past, approaches have also been developed for forecasting traffic flow conditions in an urban network using Granger causality relationships among loop detectors according to the state of the traffic [63, 64]. In general, Granger Causality is a well-known statistic, which measures the extent to which one time series can predict

PAGE 163

another, beyond the latter's ability to predict itself [94]. It is primarily used in time-series analysis [206], where bi-variate distributions are used to estimate uni-variate distributions of variables, and is fundamentally a linear prediction model [127]. In this chapter, we use a modern adaptation of Granger causality to networks of time series, called a Granger network. Work in this area has focused mainly on neural systems: EEG and fMRI analysis [20].

11.2 Granger Networks

At this point the data has been distilled so that each camera node has an associated time series. Specifically, define $y_i(t)$ as the time series associated with the $i$-th node at time $t$. Note, $y_i(t)$ is a time series of traffic density at each traffic cam and is linearly related to the number of vehicles at that traffic camera, as described in Section 6.2.4. However, from $y_i(t)$ alone, it is not possible to tell the number of cars that have actually passed through the junction. A composite view of the network can, however, be formed by looking at all the junctions together by examining the empirical behavior^1 of the time series.

There is a strong correlation between all the time series in this study. Indeed, there is even a strong correlation between the time series of different cities. However, these are spurious correlations and are caused by factors such as the traffic increasing in the lead-up to rush hour, regardless of location. Thus standard correlation analysis is misleading in this case. What is of interest is the movement of traffic around the city: the flow of one time series into another. Note that this flow is directional and requires identifying the causations in the network; the type of network thus described is known as a Granger Network (GN).

The first step in constructing a GN is an appropriate multivariate model of the time series. A Vector Auto-regressive model (VAR) is one in which the dynamic interactions

^1 i.e., no underlying model of physical transport is assumed in this study.

PAGE 164

between a set of variables are modeled jointly [97]. Regressors are also included up to an order, denoted $p$, so as to include the effect a regressed variable has on itself and on other variables. Specifically, a VAR($p$) model may be described as:

$$Z_t = c + b_1 Z_{t-1} + b_2 Z_{t-2} + \dots + b_p Z_{t-p} + \epsilon_t \qquad (11\text{-}1)$$

where $Z_t$ is an $n \times 1$ vector of time series, i.e. $[z_1(t)\; z_2(t) \dots z_n(t)]^T$ at time $t$, $c$ is an $n \times 1$ vector of constant terms in the model, $b_i$ is an $n \times n$ matrix of autoregressive coefficients for $i = 1, 2, \dots, p$ and $\epsilon_t$ is an $n \times 1$ vector of iid noise terms with zero mean and covariance matrix $\Sigma_n$.

The VAR described in Equation 11-1 is applicable when each of the time series $z_i(t)$ is stationary and there are no co-integrating factors. For non-stationary time series, typically a first order difference produces a stationary time series.^2 Co-integration is a phenomenon that occurs when a linear combination of non-stationary time series is stationary. An example often quoted is the drunken woman walking her dog [169]: the walk taken by the woman is non-stationary (specifically a random walk is assumed), as is the walk taken by the dog (being the same as the woman's plus a wandering term for the dog). While the first order difference of both time series is stationary, the difference between their walks is also stationary. This implies that there is correlation in the error terms of the first order differenced VAR of the two walks. As shown by Engle et al. [70], ignoring the correlation in the error terms can result in spurious correlations, and it must be taken into account by adding a term to the VAR to include the co-integrating relationships [190]:

$$Z_t = \alpha r_{t-1} + b_1 Z_{t-1} + b_2 Z_{t-2} + \dots + b_p Z_{t-p} + \vartheta(L)\,\epsilon_t \qquad (11\text{-}2)$$

^2 These types of time series are known as $I(1)$ as they can be transformed to stationarity using a single differencing.

PAGE 165

where $Z_t = \Delta Y_t$ is now the first order difference of $Y_t$, $r_{t-1} = \phi Y_{t-1}$ is the set of $r$ co-integrating relationships, $\phi \in \Re^{r \times n}$ is the linear co-integrating coefficient matrix, $\vartheta(L)$ is a lag operator on the errors in the model (which is here assumed to be simply the identity matrix and so is ignored hereafter [190]) and $\alpha$ is the matrix of coefficients relating $r_{t-1}$ to the dependent (differenced) time series. There is one co-integrating factor taken into account in the current data, which arises from the diurnal pattern of traffic (i.e. rush hours, lunch etc.). The coefficients of the model are estimated using the maximum likelihood estimator described originally by Johansen [124]. Finally, the model described in Equation 11-2 is often called an eVAR($p$) model.

Given an appropriate model that relates the effect each time series has on the others, the Granger causality may now be estimated. For two time series, $u$ and $v$,^3 $v$ is said to Granger Cause $u$, denoted $v \to u$, if we are better able to predict $u$ using $v$ than we could using the history of $u$ alone [94]. In the current scenario it is also informative to include a set of exogenous time series, denoted $q$, so as to take into account the effect of other time series on $u$; i.e. we consider $u$ conditioned on $q$. The standard measure of Granger Causality is an F-statistic formed from the log ratio of the residual variance of the model without and with $v$ [21]:

$$\hat{F}_{v \to u \mid q} = \ln \frac{\hat{\sigma}^2_{u \mid q}}{\hat{\sigma}^2_{u \mid v,q}} \qquad (11\text{-}3)$$

where $\hat{\sigma}^2_{u \mid v,q}$ is the variance of the residual of a model for $u$ that includes $v$, while $\hat{\sigma}^2_{u \mid q}$ is the variance of the residual of a model that does not include $v$. Note that $\hat{F}_{v \to u \mid q} \geq 0$, since the ratio inside the logarithm is at least 1: inclusion of an extra variable cannot increase the in-sample residual variance. $\hat{F}_{v \to u \mid q}$ is known to have an asymptotic $\chi^2$ distribution under the null hypothesis that $F_{v \to u \mid q} = 0$ and a non-central $\chi^2$ distribution under the alternative hypothesis $F_{v \to u \mid q} > 0$. Thus $\hat{F}_{v \to u \mid q}$ may be used as a test of Granger Causality.

^3 The time subscript is here suppressed for clarity.
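As a rough illustration of the Granger causality statistic, the sketch below fits two nested least-squares models and takes the log ratio of their residual variances. It is a simplification under stated assumptions, not the estimator used in this chapter: it uses a single lag, no conditioning set $q$, ordinary least squares in place of the Johansen maximum likelihood fit of the eVAR($p$) model, and synthetic data in which `v` drives `u` by construction. The helper names (`solve`, `residual_variance`, `granger_f`) are hypothetical.

```python
import math
import random


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def residual_variance(target, regressors):
    """Least-squares fit of target on the regressors plus an intercept."""
    rows = [[1.0] + list(r) for r in zip(*regressors)]
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(rows, target)) for i in range(k)]
    beta = solve(xtx, xty)
    residuals = [y - sum(b * x for b, x in zip(beta, row))
                 for row, y in zip(rows, target)]
    return sum(e * e for e in residuals) / len(residuals)


def granger_f(u, v):
    """ln(residual variance without v / residual variance with v), one lag."""
    target = u[1:]
    u_lag, v_lag = u[:-1], v[:-1]
    restricted = residual_variance(target, [u_lag])
    full = residual_variance(target, [u_lag, v_lag])
    return math.log(restricted / full)


# Synthetic example (assumed data): v is white noise and u follows v by one step.
rng = random.Random(0)
v = [rng.gauss(0.0, 1.0) for _ in range(500)]
u = [0.0] + [0.8 * v[t - 1] + 0.2 * rng.gauss(0.0, 1.0) for t in range(1, 500)]

f_v_to_u = granger_f(u, v)  # v drives u, so this should be clearly positive
f_u_to_v = granger_f(v, u)  # u does not drive v, so this should be near zero
print(f_v_to_u, f_u_to_v)
```

Because the two models are nested, the statistic cannot be negative beyond rounding error, and the asymmetry between the two directions is what makes the resulting network directed.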

PAGE 166

With reference to the eVAR($p$) model used in this chapter (Equation 11-2), we wish to test the null hypothesis:

$$H_0: \{z_j(t-1) \dots z_j(t-p)\} \not\to z_i(t) \mid Z^-_t \qquad (11\text{-}4)$$

against the alternative:

$$H_1: \{z_j(t-1) \dots z_j(t-p)\} \to z_i(t) \mid Z^-_t \qquad (11\text{-}5)$$

where $Z^-_t$ is $Z_t$ with the $j$-th time series removed. In the notation used in Equation 11-3, $v = \{z_j(t-1) \dots z_j(t-p)\}$, $u = z_i(t)$ and $q = Z^-_t$.

A Granger network is a weighted directed graph $G = (V, E)$, where $V$ is the set of vertices (nodes) and $E$ is the set of edges (links). The adjacency matrix of $G$, $A(G)$, has a non-zero entry if $v \to u$, and zero otherwise:

$$A(G)(u, v) = \begin{cases} w_{u,v} & \text{if } v \to u \mid q \\ 0 & \text{if } v \not\to u \mid q \end{cases} \qquad (11\text{-}6)$$

where $w_{u,v}$ is the strength of the connection between $u$ and $v$, as now explained. That $v$ Granger causes $u$ does not mean that the strength of the causality is strong; it merely means that it is consistent. The strength of the connection is instead measured by the coefficient of variance explained [125] by inclusion of a (Granger) causal variable:

$$R^2_{v \to u} = \frac{\hat{\sigma}^2_{u \mid q} - \hat{\sigma}^2_{u \mid v,q}}{\hat{\sigma}^2_u} \qquad (11\text{-}7)$$

where $\hat{\sigma}^2_u$ is the variance of $u$. In addition, the total variance explained at a node, denoted $R^2_u$, is the increase in variance explained in $u$ by inclusion of all the information from its neighbors.

In summary, an eVAR model is constructed using the time series data and Hypothesis (11-4) is tested. If Hypothesis (11-4) is rejected then a causal link is said

PAGE 167

to exist between $y_i(t)$ and $y_j(t)$ whose strength, $w_{i,j}$, is the increase in $R^2_{v \to u}$ of $z_i(t)$ by inclusion of $z_j(t)$.

11.3 Results

11.3.1 Network Analysis

Figure 11-1 shows the Granger network constructed using all the data from the Sydney traffic system. This figure shows quite a bit of information and is now explained. The X and Y axes are local X-Y co-ordinates constructed from GPS co-ordinates using the Carlson and Clay model [39]. The green nodes are located at the actual local X-Y co-ordinates for each camera. The size of the green node is proportional to the total variance explained at that node (given all the incoming nodes), i.e. $R^2_u$, as shown in the legend. The only edges shown are those for which the alternative hypothesis is accepted (Equation 11-5). The weight on an edge from $u$ to $v$, i.e. $R^2_{v \to u}$ (Equation 11-7), is represented by orange circles proportional to the weight (shown in the legend), located at the tail incident on $v$; that is, they represent the variance explained flowing into a node from the source. As can be seen from the figure, the network has a sparse structure with links existing mainly between physically proximate nodes or, in the case of exterior nodes, pointing inward towards the city centre. Specifically, of a total of $N^2 = 4{,}356$ possible links, only 119 are found to be significant.

Figure 11-2 shows the co-integrating factor for the Sydney network averaged by each 15 minute interval. This may be thought of as the average state of the network and shows clearly the morning and evening rush hours. It is interesting to note that just before the evening rush hour there is a lull in the traffic and, in addition, quite a large traffic volume during lunch. Later (Section 11.3.2) this fact will be used to disaggregate the data by hour of the day into three regions.

Returning to Figure 11-1, it can be noted that adjacent nodes tend to have similar total $R^2_u$ values. This is not surprising; a well observed location (in the Granger causal sense) is likely to have many neighbors which reinforce each other. However, it is

PAGE 168

Figure 11-1. Granger network for Sydney (all hours, explanation in the text). Axes: local x co-ordinate/km, local y co-ordinate/km.

Figure 11-2. The co-integrating factor for Sydney (blue: smoothed; green: 95% confidence intervals). x-axis: time of day.

interesting to ask if there are regions in which the $R^2_u$ values are lower than expected and at which an extra camera might be beneficial. The analysis carried out here employs a Gaussian Process (GP) to interpolate the $R^2_u$ values. A GP is the appropriate interpolation to use here as it meets the required constraints on the problem: the function should tend to zero in the absence of cameras, and the degree to which one camera affects another is unknown and needs to be estimated from the data. This is based on the simplifying assumption that the total variances explained at the nodes are samples

PAGE 169

from a random process which is spatially correlated with isotropic correlation^4:

$$C_{x_1 x_2} = E\big[[R^2_{x_1} - \bar{R}^2_u][R^2_{x_2} - \bar{R}^2_u]\big] = f\{\|x_1 - x_2\|\} \qquad (11\text{-}8)$$

where $C_{x_1 x_2}$ is the covariance between two locations $x_1 \in \Re^2$ and $x_2 \in \Re^2$, $E[\cdot]$ denotes the expectation operator, $\bar{R}^2_u$ is the average value of the random process (to be estimated), $f$ denotes a function and $\|x_1 - x_2\|$ is the physical (Euclidean) distance between $x_1$ and $x_2$. In addition, it is assumed that the process tends to zero; i.e. that far from a camera no information is known. A Matérn Kernel^5 is used to represent $f\{\|x_1 - x_2\|\}$ and consists of two parameters, $\rho$ and $\nu$ (to be estimated from the data), with $\rho$ controlling the scale and $\nu$ the shape of the kernel (see [204] for a description of the effect of varying these parameters). A full description of a GP is beyond the scope of this chapter; what is important in the current context is the width and shape of the optimized kernel and of course the interpolation itself.^6

Figure 11-3 shows the interpolation of $R^2_u$ across the sample domain. Overall, the interpolated values are quite high in the city centre and drift towards the process mean ($\hat{\bar{R}}^2_u = 0.2$) quite quickly. In addition, the inset in Figure 11-3 shows the Matérn Kernel which is estimated from the data ($\{\rho = 0.02, \nu = 21\}$). This kernel is quite 'peaked' such that the effect that a sample (i.e. an $R^2_u$ value) has on another

^4 That is, the correlation between the observed $R^2_u$ values at two points depends on distance alone and not the direction.

^5 The Matérn Kernel is a commonly used kernel for interpolation and is similar to a Gaussian kernel in that it dies away monotonically with a decay rate that can be adjusted using two parameters; an excellent description may be found in [204].

^6 The interested reader is invited to reference [185], which fully describes the same GP as employed here. [199], [161] and [204] provide excellent background material. In brief, $\rho$ and $\nu$ are estimated first using Maximum Likelihood estimation (for specifics see [185] and [199]). The process parameters are estimated using Bayesian conjugate analysis applied to a hierarchical GP (see [161]) and finally a maximum a-posteriori estimate is used (see [161]) to produce the interpolation.

PAGE 170

Figure 11-3. A GP model for distribution. This is of $R^2_u$ (Sydney all hours; the Matérn Kernel is shown in the inset, $\{\rho = 0.02, \nu = 21\}$). Axes: local x co-ordinate/km, local y co-ordinate/km.

location is mostly local. However, the kernel is quite wide at the bottom, showing that a weighting of approximately 0.2-0.4 is still felt up to approximately ten kilometers away. Figure 11-4 shows the same information as a contour plot; this is more suited to identifying interesting regions.

In Figure 11-4 there are three regions, labeled A, B and C, in which the interpolation appears to be lower than expected (these were selected subjectively). An inspection of the street map around C shows its proximity to a major highway (M7). We suspect that region C contributes to the traffic observed on the M7. Only three cameras are deployed along the M7 and none of them is in C's vicinity. It is possible that inclusion of a few cameras around this region^7 would increase the information available to the network. B is located in a park area covering 32 square kilometers called Georges national park (which explains the low coverage), while A is located in central Sydney on a major highway, again the M7, which has low coverage. A similar case occurs in all the networks observed in this study.

^7 Particularly at the Vardys Road and Sunnyholt Road intersection (above King Parks road).
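The isotropy assumption behind the GP interpolation can be illustrated with a small covariance computation. The sketch below is hedged in several ways: it uses the closed-form Matérn kernel with smoothness 3/2, which is a special case chosen because it needs no Bessel functions, not the two-parameter kernel fitted in this chapter; the length scale of 5 km and the three camera positions are made-up values; and the function names are hypothetical.

```python
import math


def matern32(distance, length_scale):
    """Matern covariance with smoothness 3/2 (closed form), unit variance."""
    s = math.sqrt(3.0) * distance / length_scale
    return (1.0 + s) * math.exp(-s)


def covariance_matrix(points, length_scale):
    """Isotropic covariance: each entry depends only on Euclidean distance."""
    return [[matern32(math.dist(p, q), length_scale) for q in points]
            for p in points]


# Hypothetical camera positions in local x-y co-ordinates (km).
cameras = [(0.0, 0.0), (3.0, 4.0), (-3.0, -4.0)]
cov = covariance_matrix(cameras, length_scale=5.0)
for row in cov:
    print([round(value, 3) for value in row])
```

The second and third cameras lie in opposite directions from the first but at the same distance, so their covariance with the first camera is identical; the pair that is twice as far apart has a smaller covariance, which is the monotone decay described for the kernel.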

PAGE 171

Figure 11-4. A contour plot of the GP distribution. This is of $R^2_u$ with locations of possible interest labeled A, B and C (Sydney all hours). Axes: local x co-ordinate/km, local y co-ordinate/km.

11.3.2 Disaggregation and traffic forecast analysis

The aim of this section is to demonstrate a set of forecasts from the eVAR($p$) models. The data is first disaggregated into three regions: the morning (7 am to 10 am), noon (10:15 am to 3 pm) and the afternoon (3:15 pm to 7 pm). In addition, within each dataset 2/3 is used to fit the model and 1/3 is kept back as an out-of-sample test set. Results are reported as a prediction mean squared error (PMSE) at various forecast horizons from 1 to 6 steps ahead (1 step = 5 minutes). The data is normalized to between 0 and 1, where 1 is the maximum traffic density seen at a junction. This makes it easy for the reader to gauge the amplitude of the forecast errors.

Figure 11-5 presents a panel in which a single 6-step-ahead forecast is shown for the network as a whole. The software demonstrates how trouble spots (congestion) in the future can be identified using the current state of the network. The regression in the bottom left-hand panel, however, shows that there is still a significant forecast error in some cases.

Figure 11-6 shows a sample forecast of the traffic at camera 16 (a city centre camera) for Sydney over the entire duration of the dataset. The forecast horizon here is 6 steps ahead. As can be seen, the forecasts look reasonable, but no more can be said without looking at the residual statistics.
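The evaluation protocol described in this section (time-of-day disaggregation, normalization, a 2/3 and 1/3 split, and PMSE by horizon) can be sketched as follows. This is an assumed reading of the protocol rather than the software used here: the split is taken to be chronological, the sample densities are invented, and a naive persistence forecaster stands in for the eVAR($p$) model, so the printed errors say nothing about the real forecasts. All function names are hypothetical.

```python
def period_of(hour, minute):
    """Assign a timestamp to the morning, noon or afternoon dataset."""
    minutes = 60 * hour + minute
    if 7 * 60 <= minutes <= 10 * 60:
        return "morning"
    if 10 * 60 + 15 <= minutes <= 15 * 60:
        return "noon"
    if 15 * 60 + 15 <= minutes <= 19 * 60:
        return "afternoon"
    return None  # outside the three windows


def normalize(series):
    """Scale densities so that 1 is the maximum seen at the junction."""
    peak = max(series)
    return [value / peak for value in series] if peak > 0 else list(series)


def split_train_test(series):
    """First 2/3 for fitting, last 1/3 held out (chronological split assumed)."""
    cut = (2 * len(series)) // 3
    return series[:cut], series[cut:]


def pmse(actual, forecast):
    """Prediction mean squared error over the held-out samples."""
    errors = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return sum(errors) / len(errors)


def persistence_forecast(series, horizon):
    """Stand-in forecaster: predict the value seen `horizon` steps earlier."""
    return series[horizon:], series[:-horizon]


# Invented raw densities for one junction, one sample per 5-minute step.
densities = normalize([12, 15, 30, 45, 60, 58, 40, 33, 20, 18, 25, 50])
train, test = split_train_test(densities)
for horizon in range(1, 4):
    actual, forecast = persistence_forecast(test, horizon)
    print(horizon, round(pmse(actual, forecast), 4))
```

Because the data is scaled to the junction maximum, the square root of a PMSE value can be read directly as a fraction of peak density, which is how the 3-9% figures in this chapter are interpreted.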

PAGE 172

Figure 11-5. The panels show the state of the network at 14:34 and at 15:04. The top row shows the actual states (A) and (B) while (C) shows the change between 14:34 and 15:04. The size and color of the circles reflect the traffic density as a % of the maximum and are located at the local X-Y coordinates of the junctions. The bottom panels show the forecasts: first the errors against the actual (D), then the forecast itself (E) and then the forecast change (F). (Sydney afternoon dataset, 6-step-ahead forecast or 30 minutes.) Panel titles: Now: 14:34; Horizon actual: 15:04; dY: 15:04; Forecast vs actual 15:04; Forecast 15:04; Forecast-actual.

Figure 11-7 shows the square root of the PMSE for the morning dataset in Sydney. There are 66 cameras and so 66 values to report, which are sorted lexically (i.e. by camera number) and also in ascending order (bottom panel). These vary from 0.03 to 0.09, meaning that the expected value of an error is about 3-9%.^8

^8 The standard error and the expected error are not identical.

PAGE 173

Figure 11-6. A sample forecast versus the actual for the traffic density at camera 16 (Sydney morning dataset, 6-step-ahead forecast or 30 minutes). Axes: time, traffic density.

Figure 11-7. The prediction MSE for Sydney morning data (6 steps ahead or 30 minutes). Panels: by camera number (top) and cameras in ascending order of error (bottom).

As the forecast horizon increases, the forecast error is expected to increase, as is shown in Figure 11-8. There is no discernible difference between the three time periods.

11.3.3 Comparison of global city networks

This section compares the topological characteristics of the Granger networks for the six cities involved in this research. As the networks are weighted directed graphs, weighted metrics are required for the comparison (see [19] for definitions of the metrics used below).

The average weight of a link for each city, $\bar{w}$, shows a marked difference in each network. While Beaufort, and to a lesser degree WDC and London, tend to have links

PAGE 174

Figure 11-8. The increase in forecast error with forecast horizon for Sydney (morning, noon and afternoon; averaged across all cameras). x-axis: forecast horizon.

Table 11-1. Summary of topological characteristics for the GNs.

City          $\bar{w}$   $r_k$     $k_{nn}$   $k^w_{nn}$
Sydney        0.10        0.0015    23.19      22.18
Beaufort      0.77        0.0132    48.60      42.10
Connecticut   0.02        0.0019     1.37       1.18
GTA           0.08        0.0015     3.46       2.81
London        0.16        0.0013    34.34      28.36
WDC           0.24        0.0033    29.75      16.67

that convey more information, the networks for Sydney, Connecticut and London exhibit links that do not on average convey much information, suggesting that either these networks could benefit from more cameras or that the information arriving at a junction is incoming from many directions (a typical case in a dense city environment). The average weighted clustering coefficient, $r_k$, which measures the local cohesiveness, confirms this hypothesis, with Beaufort and WDC again scoring highest.

For the average nearest neighbors degree we find that $k^w_{nn} < k_{nn}$, indicating that edges with larger weights tend not to point to neighbors with larger degree; the implication being that on average the information flowing into a junction need not come from high degree neighbors. In addition, as $k^w_{nn}$ and $k_{nn}$ are both positive, the networks are found to be disassortative: high degree nodes tend to be connected to low degree nodes, indicating that important junctions tend to be meeting points for many smaller junctions. It is notable that in the case of GTA and Connecticut this is not significant as

PAGE 175

Figure 11-9. A comparison of the distributions of $R^2_u$ values for the 6 cities (Sydney, Beaufort, Connecticut, GTA, London, WDC).

these networks are composed of mainly motorways in which each junction is mainly a link in a long chain along the network.

Figure 11-9 compares the distribution of $R^2_u$ values at each node/junction. Again, it is seen that for WDC and Beaufort this distribution is higher, with Sydney and Connecticut having distributions centered around the lowest means.

11.4 Conclusion

Vehicular traffic congestion is becoming a critical problem in major cities throughout the world. In order to address that problem, we have proposed a framework for the systematic monitoring, measurement, analysis and prediction of urban traffic density at a global scale. The proposed framework is highly scalable and can be used for any city that utilizes online traffic web cameras for traffic analysis. So far, we have analyzed five cities and a state to examine the cause and effect of traffic causality that is contributing to widespread congestion. The time series analysis in this chapter produced some compelling results. Firstly, the location of traffic cameras may not be optimal, and there are regions in all the networks we surveyed in which there is a lack of camera coverage, which affects the information in the network as a whole. One possibility is the introduction of new cameras in those areas or possibly the integration of information from alternative sources such as smartphone data (or even to alert smartphones to the use of their data at a particular location). While only three cases were focused on

PAGE 176

in the analysis in Section 11.3.1, further investigation is envisaged. The disaggregate forecasts produced show that reasonable forecasts of the city traffic may be created up to 30 minutes ahead. The PMSE of these forecasts varied widely (from 0.03 to 0.09) and this in itself allows the city operators to see which cameras are not providing adequate information. We envisage that these forecasts can be improved in future analyses with the aid of a Bayesian framework, especially in taking into account invalid camera readings that degrade the results.

The Granger networks themselves showed a high degree of information exchange between the nodes, while still presenting a sparse representation of the network as a whole. This sparse representation is important, as it reveals the main information pathways in the network, which apart from being statistically important also provides a succinct summary of the network to traffic managers. While comparing these networks across the different cities, two types of cities emerged: those based mainly on motorway road networks, which exhibit a higher degree of cohesion, and those based on a city street type topology. The differing clustering coefficients in each network present an interesting dilemma: given finite resources, is it best to pool cameras such that they reinforce each other's information (as in the case of Beaufort) or to spread the cameras evenly around the city (as in the case of London)?

PAGE 177

APPENDIX A
COMMUNITY DETECTION IN NETWORKS

Networks are everywhere and evolve in various forms, following certain characteristic patterns such as small-world and scale-free characteristics. The basis of these forms is not the random addition of nodes and edges to the system under study, but a systematic process of mutual evolution. This makes many aspects of such systems worth studying. Researchers have spent many years interpreting such systems, for instance how biological networks evolve and why humans form societies, to name a few. Thus, central to the behavior of these systems is the way internal components are connected to each other. These connections breed structure that exemplifies the nature and behavior of the overall system.

Advancements in genomics have generated an enormous amount of data, gathered by biologists and now available to the scientific community at large. This data naturally falls into various classes, ranging from protein sequences to complex molecules. Thus the challenge lies in understanding the properties and dynamics of such complex systems rather than gaining insight into their individual components. Fortunately, statistical physics and network theory offer a host of methods and tools for analyzing large data that can be used in biology. Modeling interactions between proteins, cells, and other genomic data as network graphs and applying statistical techniques to them can greatly help in scrutinizing the intricacies that exist in these datasets. These methods not only simplify the representations but also aid in searching for general principles of organization and evolution of biological structures. For example, small-world analysis is used to explain protein-protein interaction networks, and scale-free invariance properties exist for the in silico evolution of metabolic networks, which encode an increasing amount of information about their environment. The question often looms as to which properties of such systems are worth studying and how the emergent behaviors actually affect the overall functionality of such systems.

PAGE 178

Visualization is a potential tool that is used as a first step to display the connectivity patterns between a system's various components and describe the structure as a whole. Another set of measurements are the various centrality indices that quantify the connectivity of such systems. For example, the connectivity in genetic interactions affects the pairing of genes, and graph models of protein interactions can help predict the three-dimensional structure of a protein from its amino acid sequence. In the following text, we discuss basic measures that explain various types of such connectivity patterns in terms of the centrality measures often used to study complex systems.

A.1 Centrality Measures

To quantify a network [45, 62, 242] it is important to standardize various measures that capture the internal structure of the system. In this section, we discuss some of the important measures that can be used to analyze complex systems. To keep the treatment general, we consider each system as a graph $G = (V, E)$, where $V$ is a set of vertices and $E$ is a set of edges that represent the interactions between the vertices. One advantage of using such notation is that we can compare different types of datasets on the same scale and apply the same algorithm to discover features and patterns that are common to all of them. One such measure that is widely used in the literature is centrality. Centrality defines the importance of a vertex in such a way that it dominates the overall structure of the system. Some important centrality measures widely used in network theory [1, 183] include degree centrality, betweenness, eigenvector centrality, and closeness [84].

A.1.1 Degree Centrality

The degree centrality is the number of edges connected to a vertex. While an undirected graph has only one degree centrality, a directed graph has both in-degree and out-degree. A high degree centrality indicates that a vertex is highly reachable and can access information from many sources. For example, the railway connectivity of Berlin is the highest among European cities, which

PAGE 179

shows that it is easier to catch a train from Berlin and reach a significant part of the European nations. The degree centrality is measured as:

\[ D_v = \frac{d(v)}{|V|} \]

where d(v) is the degree of vertex v and |V| is the number of vertices in graph G.

A.1.2 Eigenvector Centrality

A relative measure where a vertex's importance is measured by its connections to other vertices, which are themselves important. Thus, the centrality score is assigned as a cumulative total of the scores of a vertex's neighbors. Initially, we calculate the centrality score of one vertex at a time, then calculate the centrality of each vertex as the weighted sum of the centralities of all vertices in its neighborhood. We then divide each centrality value by the largest value present in the network and continue this process until the relative centrality values do not change in successive iterations. For a vertex v, its centrality score is proportional to the cumulative total of the scores of its neighbors, i.e.

\[ v'_i = \frac{1}{\lambda} \sum_{j=1}^{N} M_{ij} v_j \]

where M_ij is an entry of the graph's adjacency matrix, N is the total number of vertices in the graph G, and 1/λ is the constant of proportionality.

A.1.3 Betweenness Centrality

This centrality measures the extent of connectivity a vertex provides in a graph. Basically, it is a measure that counts how many times a vertex is visited while traversing the graph [85]. Theoretically, over the set of all shortest paths that are present between all pairs of vertices, the betweenness centrality of a vertex v is the number of paths that pass through v:


\[ B_v = \sum \frac{sp(v)}{sp(G)} \]

where sp(v) is the number of shortest paths that pass through vertex v and sp(G) is the total number of shortest paths present in graph G. A downside of using betweenness centrality is the amount of resources it takes to compute the betweenness of each vertex: it takes O(n^2) space and O(n^3) computation time, where n is the number of vertices in the graph G. A faster implementation is available in [31] for interested readers. An important study of betweenness centrality in large complex networks is done in [22].

A.1.4 Closeness Centrality

The concept is based on the average distance between two vertices. A vertex with a low average distance has a higher value of closeness centrality. Thus, a vertex with a large number of direct neighbors has high closeness with them and is assumed to have better access to these vertices than to others that are a large number of hops away. In social networks, this measure plays a key role in assessing the closeness between two entities. In [187], the authors approximate values of closeness centrality and present new algorithms to rank vertices based on this measure. However, there are a few problems with using this measure, which are described by Newman in [180].

A.2 Study of Complex Systems

The interactions between entities spur the development of emergent properties, visually depicted in the form of connectivity networks. A mathematical study of such networks using network theory helps in understanding the internal structures of the Internet, the World Wide Web, social networks and various forms of biological structures. However, a common problem with these networks is their enormous size and complexity, which make them difficult to study. Recent developments in statistical methods to quantify such large networks address these issues in the form of small-world analysis [236], scale-free networks [16, 17], percolation theory [220] and preferential attachment [37]. A natural way to study such a system is thus to break down the large system into smaller
sub-systems. From a theoretical point of view, the density distribution of vertices in a network sub-graph is higher than expected in an equivalent network with edges placed at random. A concept widely used to understand these emergent properties, and the one we study here, is the investigation of groups or communities in networks. The process by which a network evolves into components tells how the internal organization is built up and helps us understand how the system is structured. Identification of interaction patterns in complex networks via community structures has gathered a lot of attention in recent research studies. Local community structures provide a better measure to understand and visualize the nature of interaction when global knowledge of the network is unavailable. Recent research on local community structures [9, 46, 234], however, lacks the ability to adjust itself in dynamic networks and depends heavily on the source vertex position. In this chapter, we discuss a novel approach to identify local communities in biological networks based on iterative agglomeration and local optimization. In each iteration the algorithm strengthens the local community measure by agglomerating the best possible set of vertices, such that the proposed vertex and community rank criteria are suitable for dynamic networks where the interactions among vertices may change over time. An extensive set of experiments and benchmarking on computer-generated networks as well as real-world social and biological networks shows that the proposed algorithm can identify local communities, irrespective of the source vertex position, with more than 92% accuracy in the synthetic as well as in the real-world networks.

A.3 Overview

Identifying patterns of interaction has gained a lot of recent attention in finding community structures in complex networks [72, 79, 93]. Such networks are modeled as a graph G = (V, E), where V is a set of vertices and E is a set of edges representing the interaction among the vertices. A community in a network is defined as a group of vertices having denser edge connectivity among themselves than with vertices outside the group. Thus
a relative degree distribution among the members of a community is greater than with non-community members. We identify a community as a functional grouping of entities exhibiting certain generic characteristics. For instance, a community in a citation network can represent literature that belongs to the protein folding problem, and a food web can represent feeding relationships between species within an ecosystem. Thus, widespread applicability has made the study of community structures a mainstay research topic today.

Researchers from various disciplines analyze community structure using eigenvectors and sparse matrix formulations [198], algebraic connectivity [77], partitioning techniques [133], the small-world effect [236], parameterized linear programming [73] and many other methods. These approaches provide better results, but are computationally expensive as they require global information about the network.

A recent work [46] has introduced the concept of local community structure, which detects a community given a source vertex and the degree distribution of its immediate neighbor vertices (local information). Subsequently, several methods have been proposed to reduce the complexity of this task; however, they suffer in one or more ways. For instance, the methods proposed in [5, 17] are specific to particular cases of either portal browsing or they require a minimal connected initial topology to promote the growth of a local community structure. The measure of modularity defined in [46] considers those vertices that reside on the boundary of a subgraph. The methods proposed in [75, 188, 214] expect some degree of global information to ascertain partitions that are not easily distinguishable. The shell-spreading method mentioned in [9] performs well only if the source vertex is in the middle of the enclosed community. An important issue in all approaches is the lack of ability to self-adjust with the evolving community structure when vertices are added and removed. Furthermore, it is not possible to capture changes that occur in highly dynamic networks where patterns of interaction change constantly.


In this chapter, we explain a novel algorithm [234] to detect evolving local community structure using iterative agglomeration and local optimization. This algorithm is self-adjusting in nature and performs well on static as well as dynamic networks. Based on the degree distribution of vertices, it optimizes the community structure by selecting the vertices that are close enough to form a clique. As described in the later sections, the measure of cliqueness is calculated using the available local information of vertex degrees and inter-community rankings.

We evaluate the efficiency of this algorithm on computer-generated dynamic and sparse networks with different source vertex positions. The analysis of the results shows that the algorithm is very effective in addressing a wide variety of social [60] and biological network problems [117, 136]. We also evaluate the performance of our algorithm on three real-world networks: Zachary Karate Club, NRC/MASC receptor complexes and MIT wireless mobile users. In Zachary Karate Club, 94% of club members, regardless of source vertex position (including border vertices), are correctly identified within their original communities. These results are better than the results in [9, 93], which constrain the source vertex to lie in the middle of an enclosed community. Another experiment, to understand protein interaction mapping in NRC/MASC receptor complexes, yields two big clusters enclosing local communities of motifs with a 92% accuracy level in core MASC protein functional elements. In the study of wireless mobile users, we found 31 active communities consisting of 890 users based on their preferential attachment to locations.

The rest of the chapter is organized as follows. Section A.4 outlines this algorithm. The evaluation and benchmarking on computer-generated and real-world networks is done in Section A.5. Section A.6 discusses some suggestions to further optimize the discovered communities. Finally, Section A.7 concludes the chapter.


A.4 The Proposed Algorithm

A.4.1 Overview

The local community structure can be formally defined as follows: Given an undirected graph G = (V, E) and a source vertex s ∈ V, in the absence of global knowledge, the graph G is explored one vertex at a time to form a community G' = (V', E'), where G' is a subgraph of G and s is more highly connected within V' than to any vertex in V \ V'.

The quantitative measure of local community structure is independent of global topology knowledge. Based on the available local information, the algorithm starts crawling the graph G from the source vertex s, first visiting immediate neighboring vertices. The method attempts to quantify the degree distribution of neighbor interactions that result in the formation of a clique. The optimization function ranks a subset of vertices, from a large group of unknown size, that are more closely related to each other than to the remaining vertices of the group.

The proposed algorithm has three main steps:

1. Initially, a series of random walks is performed to generate an initial set of communities C.
2. The set of communities and their vertices obtained in step 1 are ranked using local optimization functions, which are defined later in this section.
3. Based on these ranks, the vertices among the communities are exchanged to optimize the formation of the local community structure. The algorithm stops when the local optimization function cannot improve any further or if the size of the highest ranked community reaches a user-defined value k. Finally, the algorithm returns the highest ranked community as the solution.

A.4.2 Algorithm Description

The algorithm begins with a series of random walks from the source vertex s and explores unknown portions of the graph G to generate an initial set of communities. These communities are formed as subsets of the vertices and edges of a graph G of unknown size. Each vertex of a community is assigned a value r, which is defined as its
degree divided by the number of its immediate neighbors present in the same community. Within a community, the enclosed set of vertices are positioned with respect to r. This implies that a vertex with larger r has a higher probability to discover immediate neighbors in other communities; such vertices are called active vertices. It should be noted that, because of the inherent nature of the random walk algorithm, a vertex can be included in more than one community but with different r values.

Similarly, each community is assigned the value Γ, which is calculated as a cumulative average of the r values of its enclosed set of vertices. These communities are positioned with respect to their Γ value. This reflects two important aspects: (i) communities with a large Γ value are more active to form a local community structure; (ii) communities with a small Γ value are proclaimed local community structures or they are insufficient in terms of the basic requirement. Thus, a community C should either siphon off its active vertices or acquire the immediate neighbors of its active vertices to yield a local community structure.

In order to accomplish this, we perform iterative agglomeration on these communities in decreasing order of their Γ values. The agglomeration takes place through the exchange of active vertices among the higher ranked communities. It is now apparent that the set of vertices that belongs to these communities has a higher probability to discover their immediate neighbors in other communities. This exchange results in a change of a vertex's neighborhood and subsequently its r value changes. This in turn changes the ranking of these communities. As a result of successive iterations, vertices come closer to satisfying their neighbor requirements. In doing so, their respective r values will decrease, which further contributes towards the creation of a local community structure. This process continues until the algorithm has agglomerated k (given as input) vertices or the algorithm has discovered an entire local community structure. Fundamentally, the algorithm involves a process of building a community structure by clustering together a pattern of highly interactive vertices.


In the following subsections, we first discuss the random walk algorithm that generates an initial set of communities. Then we discuss a method to find vertex and community positions and present formulae to calculate their respective rankings. We also discuss the underlying rules to exchange active vertices among the higher ranked communities. Finally, the dynamic optimization to achieve community formation in dynamic networks is presented.

A.4.3 Random Walk

There exists a variety of random walk algorithms based on different parameters such as Euclidean time distance [82], reversible Markov chains [58], timing parameters [154] and available global information [146]. However, these methods do not meet the requirements to compute a local community structure for two main reasons: (i) the source vertex position to initialize the local community structure and (ii) the minimum convergence time to form a community based on the available local information. Hence, we propose a random walk based on vertex degree distribution, which meets all the basic requirements to form a local community. The basic assumption of the proposed random walk is that the probability to explore a vertex is proportional to its degree distribution.

The random walk on the graph G is performed in order to generate an initial set of communities. At first, the given source vertex s is labeled and added to a community C_i. Then, one of its immediate neighboring vertices v is selected with a probability of P_s = 1/d(s), where d(s) is the degree of source vertex s. The selected vertex v is labeled and added to the community C_i. Continuing in this fashion, subsequent vertices are added to the same community until a previously labeled vertex is encountered (hence, a backward walk is not possible).

Iteratively, such random walks are performed to generate a set C of initial communities. Let C = {C_1, C_2, ...} be the set of initial communities generated through a series of random walks, where |C_i| ≤ l for every community C_i. There are two different ways of
deciding the total number of communities to generate: (1) a user-provided input value for the number of communities; and (2) stopping once the same random walk is generated twice. The first option is useful when the user wants to restrict the number of generated communities, while with the second option the algorithm maintains unique hashes (e.g., MD5 hashes) of the performed random walks. A hash is generated from the unique set of nodes and edges encountered in that walk. If the same hash is generated twice, the random walk algorithm is terminated and the number of communities generated up to that iteration is taken as the total number of communities. Figure A-1 shows a number of generated communities with the user input option for the source vertex s = 34.

Figure A-1. Shown above are four communities generated from the random walk. As shown, vertex 34 is the source vertex s, used in Zachary Karate Club.

Furthermore, this finite Markov chain random walk also has the capability to reach significantly obscure locations of a graph. Vertices that are not part of any
community are considered to be leftover vertices. Later on, these leftover vertices form a community, which is checked against a classified local community for any possible exchange of vertices. Pseudo-code for the user input option is given in Algorithm 1.

Algorithm 1: Random Walk Algorithm
Input: number of communities to generate; s - source vertex
Output: initial solution of communities
for i ← 1 to (number of communities) do
    Initialize new community C_i
    add s to C_i
    set v ← s
    set j ← 0
    while j ≠ l do
        if flag_status(v) == true then
            break
        end
        srand(time(NULL))
        set neib_id ← rand() % d(v)
        add neib_id to C_i
        set flag(v) ← true
        set v ← neib_id
        j++
    end
end

Algorithm 1 can be modified to implement the hash-based generation of initial communities. Instead of a user-provided number of communities, the algorithm maintains hashes of the random walks generated. After every walk the new hash is compared with the existing hashes and, if a match is found, the algorithm terminates.

A.4.4 Local Optimization

The classification to partition a graph into local community structure can be interpreted as optimizing a quantitative notion of a community structure [46]. In the case of local community structures, the biggest challenge is the lack of global knowledge of the network. Therefore, any optimization must rely only on the available local information. Here, the local information of a community is the edge connectivity of its enclosed set of vertices and the knowledge of their immediate neighbors. Thus, from
edge connectivity (adjacency matrix) it can be determined how many immediate neighbors of a vertex are within its own community and how many of them are in some different community.

Furthermore, if two or more such communities share their local information, it can be inferred that a local community structure can be discovered. During this process, a vertex will look out for its immediate neighbors in other communities; then it will either join the community of one of its immediate neighbors or ask its immediate neighbors to join its own community. This results in the formation of a local community structure. Thus, local optimization is defined as a process to discover community structure based on the exchange of available local information between communities.

To further understand local optimization, we define two formulae: 1) vertex ranking; and 2) community ranking. In vertex ranking, the vertices of a community are ranked on the basis of their available local information related to their neighbor degree requirements. In community ranking, a community is ranked based on its state and its capability to transform itself into a local community structure. Let us define these formulae.

A.4.4.1 Vertex ranking

From the discussion it is clear that the vertex exchange between communities takes place on a one-on-one basis. So, we can define some criteria to optimize the processing time and reduce the vertex-to-vertex comparisons. Note that the purpose of maximizing the local community structure is to identify a set of vertices that are close enough to form a clique. However, identifying a clique, even if we know the network's global information, is an NP-complete problem [89].

To solve this problem, we first define a vertex rank function to rank all the vertices explored during the random walk. The rank of a vertex decides its position within its community and is the basis on which the vertices in general are agglomerated with a high probabilistic coefficient to form a clique.


Algorithm 2: Pseudo-code to find local community structure
Input: number of communities, s, l, n, k
Output: Community with k vertices
PerformRandomWalk((number of communities) − 1, s, l)
Add one more solution of non-picked vertices
for i ← 1 to (number of communities) − 1 do
    for j ← 1 to l_i do
        Compute(r_ji)
    end
    sortVertices(C_i)
    Compute(Γ_i)
end
sortCommunities()
set cflag ← true
while cflag ≠ false do
    set cflag ← false
    for i ← 1 to (number of communities) − 1 do
        set exchange ← do_exchange(C_i, C_{i+1})
        if (exchange == true) then
            set cflag ← true
        end
    end
    for i ← 1 to (number of communities) do
        remove_empty_communities(C_i)
    end
    Compute Γ
    sortCommunities()
end
Let C be the fittest community.
do_crossover(C, C)
if |C| > k then
    remove |C| − k vertices from C
end
return C
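The first step of Algorithm 2, the random walk that produces the initial communities, could be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the implementation used in the experiments: the adjacency-list dictionary `adj`, the function names `random_walk_community` and `initial_communities`, the `seed` parameter and the toy graph are all hypothetical. It also assumes that labels are kept per walk (so a vertex may appear in several communities) and that a previously labeled vertex ends the walk without being added again.

```python
import random


def random_walk_community(adj, source, max_len, rng):
    """One walk from `source`; returns the visited vertices in order."""
    community = [source]
    labeled = {source}           # vertices already labeled in this walk (assumption: per walk)
    current = source
    for _ in range(max_len):     # at most `max_len` steps, playing the role of l
        neighbors = adj[current]
        if not neighbors:
            break
        # Each neighbor is picked with probability 1/d(current).
        nxt = rng.choice(neighbors)
        if nxt in labeled:
            break                # previously labeled vertex encountered: stop the walk
        community.append(nxt)
        labeled.add(nxt)
        current = nxt
    return community


def initial_communities(adj, source, count, max_len, seed=0):
    """Run `count` independent walks from the same source vertex."""
    rng = random.Random(seed)
    return [random_walk_community(adj, source, max_len, rng)
            for _ in range(count)]


# Hypothetical toy graph: a 4-cycle 0-1-2-3 with a chord 0-2.
adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2]}
communities = initial_communities(adj, source=0, count=3, max_len=4)
```

Every community returned this way starts at the source vertex and follows existing edges, which is the only property the later ranking steps rely on.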


Algorithm 3: Pseudo-code to compute possible exchange of vertices
Input: C_1, C_2
Output: Exchange Communities
remove_common_vertices(C_1, C_2)
set flag ← false
for i ← 1 to l_1 do
    for j ← 1 to l_2 do
        if r_i > r_j then
            set val1 ← r_i
            Compute new Γ_2 by adding v_i to its enclosing community. Let new value v_i be r'_i
            if val1 > r'_i then
                add(C_2, v_i)
                remove(C_1, v_i)
                Compute(Γ_1)
                Compute(Γ_2)
                set i ← 1
                set j ← i
                set flag ← true
                continue
            end
        else if r_j > r_i then
            set val1 ← r_j
            Compute new Γ_1 by adding v_j to its enclosing community. Let new value v_j be r'_j
            if val1 > r'_j then
                add(C_1, v_j)
                remove(C_2, v_j)
                Compute(Γ_1)
                Compute(Γ_2)
                set i ← 1
                set j ← i
                set flag ← true
                continue
            end
        else
            continue
        end
    end
end
if (flag == true) then
    return true
end
return false


Let C = {C_1, C_2, ...} be the set of communities generated from the random walk. Each community C_i contains a set of vertices V_{C_i} = {v_1, v_2, ..., v_l}. The vertex rank is defined as

\[ r_{ji} = \frac{2 d_{ji}}{k_{ji} (k_{ji} - 1)} \]

where j is a vertex in a community C_i, d_ji is the degree of vertex j and k_ji is the number of its immediate neighbors that are present in the same community. An active vertex within a community is a vertex that has the greatest r value. This implies that such an active vertex has the highest probability to find more neighbors if it is transferred to some other community. This further explains the exchange of vertices from one community to another as a way to mutually decrease the value r, which in turn brings all the neighboring vertices together to form a local community structure.

Pseudo-code of this process is given in Algorithms 2 and 3. The algorithm performs multiple iterations to agglomerate a set of vertices enclosed within a local community structure. Using multiple iterations, the algorithm self-adjusts the community by adding neighboring vertices from several other communities and by siphoning off vertices that may belong to some other community. Finally, the algorithm outputs an assorted set of vertices that are close enough to form a clique.

A.4.4.2 Community ranking

A vertex's neighbor requirement is very much local to its community. For a vertex to look out for its potential neighbors, it is required to have knowledge of other communities. But from an optimization point of view, exploring all the generated communities is infeasible. Hence, we should derive some quantitative notion of a community to selectively perform neighbor lookup for its enclosed set of vertices.

The rank of a community is defined as the average r value of its enclosed set of vertices. Thus the community rank Γ_i of a community C_i is defined as

\[ \Gamma_i = \frac{\sum_{v_{ji} \in C_i} r_{ji}}{|C_i|} \]
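To make the two rank definitions concrete, the following Python sketch evaluates r and Γ on a small graph. It is an illustration only: the function names `vertex_rank` and `community_rank`, the adjacency-list dictionary and the toy graph are hypothetical, and returning infinity when a vertex has fewer than two neighbors inside the community is an assumption, since the formula is undefined in that case and the text does not say how it is handled.

```python
import math


def vertex_rank(adj, community, v):
    """r = 2d / (k(k - 1)) for vertex v inside `community` (a set)."""
    d = len(adj[v])                                # degree of v in the whole graph
    k = sum(1 for u in adj[v] if u in community)   # neighbors of v inside the community
    if k < 2:
        return math.inf   # assumption: r is undefined for k < 2, treated as "very active"
    return 2 * d / (k * (k - 1))


def community_rank(adj, community):
    """Gamma = average r over the vertices of the community."""
    return sum(vertex_rank(adj, community, v) for v in community) / len(community)


# Hypothetical toy graph: a 4-clique {0, 1, 2, 3} plus a pendant vertex 4 attached to 0.
adj = {0: [1, 2, 3, 4], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2], 4: [0]}
clique = {0, 1, 2, 3}
r0 = vertex_rank(adj, clique, 0)      # d = 4, k = 3, so r = 8/6
r1 = vertex_rank(adj, clique, 1)      # d = 3, k = 3, so r = 6/6
gamma = community_rank(adj, clique)   # (4/3 + 1 + 1 + 1) / 4 = 13/12
```

In this example vertex 0 has the largest r because one of its edges leaves the community, which matches the description of an active vertex.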


From the above equation it is clear that the Γ value of a community is directly proportional to the sum of the r values of all the vertices enclosed in that community. From the discussion on vertex rank, we can conclude further that a community is relatively more active if it encloses more active vertices. Such active communities are preferred for neighbor lookup and possible exchange of their vertices.

A.4.4.3 Mutual exchange

The mutual exchange of vertices among communities is an act to mutually decrease the Γ value of these communities. At any point in time, the community with the lowest Γ value is the fittest community and is considered a candidate to become the local community structure. The fundamental act of mutual exchange of vertices between a pair of active communities is performed in decreasing order of their Γ values.

So, when a vertex is added to a new community, it has more neighbors than it had in its previous community. Thus this vertex is closer to satisfying its immediate neighbor requirement. It is very likely that this vertex was originally a part of this new community. During the exchange, if common edges match, both vertex and community rankings decrease with the movement of vertices from one community to another. So, the vertices are grouped together and their new rankings are recalculated. Similarly, the ranks of the communities are recalculated with respect to their new sets of vertices.

The mutual exchange criteria within a pair of communities are defined as follows: Given two top-ranked communities C_A and C_B with ranks Γ_A and Γ_B, a vertex v_a ∈ C_A and its value r_a:

1. An exchange of v_a exists if r'_a

A.4.4.4 Dynamic optimization

The previously published work on local community structure does not allow any change in the already formed intermediate communities [5, 17, 46, 75, 181, 188, 214]. Although it is an inexpensive computational approach, it might not produce an optimized community.

In the current approach, using dynamic optimization, whenever a vertex is exchanged between two communities, its ranking is re-calculated with respect to its immediate neighbors (if any) in the new community. Thus the iteration always concludes with the most recent value of r for the enclosed set of vertices belonging to that community.

As illustrated in Algorithm 2, in each iteration the algorithm calculates the maximum average reduction for the individual communities C_A = {v_a1, v_a2, ..., v_al_a}, C_B = {v_b1, v_b2, ..., v_bl_b} and the vertex to be exchanged. The algorithm compares their respective averages before the transfer of their vertices based on the conditions mentioned in the mutual exchange scheme. When a vertex v ∈ C_A is transferred to community C_B, its r value changes, which in turn causes a change in the Γ value of both C_A and C_B. The changed values are calculated as follows:

1. For vertex v, the changed ranking value is calculated as:

\[ r'_v = \frac{2 d_v}{k'_v (k'_v - 1)} \]

where k'_v is the number of neighbors enclosed in the new community. This is true for all other vertices.

2. The changed ranking value of C_A is calculated as:

\[ \Gamma'_A = \frac{\sum_{v_{ai} \in C_A} r_{ai}}{l'_A} \]

where l'_A is the number of vertices after v_a is removed.

3. The changed ranking value of C_B is calculated as:

\[ \Gamma'_B = \frac{\sum_{v_{bi} \in C_B} r_{bi}}{l'_B} \]

where l'_B is the number of vertices after v_a is added.
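The three update rules above can be traced on a small example. The Python sketch below moves one vertex between two communities and recomputes r and Γ from scratch afterwards. It is a sketch under assumptions rather than the original code: the helper names, the graph and the starting communities are hypothetical, the move is accepted when the vertex's rank decreases (the `val1 > r'` test of Algorithm 3), and r is taken as infinite when a vertex has fewer than two neighbors inside its community.

```python
import math


def vertex_rank(adj, community, v):
    d = len(adj[v])
    k = sum(1 for u in adj[v] if u in community)
    # Assumption: r is treated as infinite when k < 2 (undefined in the formula).
    return math.inf if k < 2 else 2 * d / (k * (k - 1))


def community_rank(adj, community):
    return sum(vertex_rank(adj, community, v) for v in community) / len(community)


def try_move(adj, c_from, c_to, v):
    """Move v from c_from to c_to if its rank would decrease (r' < r).

    Returns (moved, new_c_from, new_c_to); the input sets are not modified.
    """
    r_old = vertex_rank(adj, c_from, v)
    r_new = vertex_rank(adj, c_to | {v}, v)
    if r_new < r_old:
        return True, c_from - {v}, c_to | {v}
    return False, c_from, c_to


# Hypothetical graph: two 4-cliques {0,1,2,3} and {4,5,6,7} joined by edges 2-4 and 3-4.
adj = {
    0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3, 4], 3: [0, 1, 2, 4],
    4: [2, 3, 5, 6, 7], 5: [4, 6, 7], 6: [4, 5, 7], 7: [4, 5, 6],
}
c_a, c_b = {0, 1, 2, 3, 4}, {5, 6, 7}      # vertex 4 starts in the "wrong" community
gamma_a, gamma_b = community_rank(adj, c_a), community_rank(adj, c_b)

moved, new_a, new_b = try_move(adj, c_a, c_b, 4)
new_gamma_a, new_gamma_b = community_rank(adj, new_a), community_rank(adj, new_b)
```

In this example the rank of vertex 4 drops from 5 to 5/3 when it joins its own clique, and the Γ values of both communities decrease, which is the behavior the mutual exchange is meant to produce.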


A.5 Experiments

A.5.1 Benchmarking

In this section, we use the benchmarking criteria illustrated in [8] to provide an objective comparison with the existing methods of finding local community structures. Initially, a Barabási-Albert graph G = (V, E) with |V| = 512 and e_0 = 8 is created, which is then randomly partitioned into four reference communities containing a nearly equal number of vertices. Every vertex in these communities has an average degree z = z_in + z_out = 16. Clearly, a small z_out displays a strong community structure. Finally, vertices are rewired through a pair of edges within the same communities without changing their degree distribution.

Evaluation

The algorithm generates a community C and a non-community V \ C of vertices. For comparison purposes, the reference partition is termed P_R = {C_R, C'_R} and the found partition P_F = {C_F, C'_F}. The evaluation criteria use the Normalized Mutual Information (NMI) [55, 83, 222] to measure the similarity between P_R and P_F. In NMI, a confusion matrix N is created, with its columns corresponding to the found communities and its rows to the reference communities. The similarity measure is defined as follows:

\[ I(P_R, P_F) = \frac{-2 \sum_{i=1}^{C_R} \sum_{j=1}^{C_F} N_{ij} \log\left( \frac{N_{ij} N}{N_{i.} N_{.j}} \right)}{\sum_{i=1}^{C_R} N_{i.} \log\left( \frac{N_{i.}}{N} \right) + \sum_{j=1}^{C_F} N_{.j} \log\left( \frac{N_{.j}}{N} \right)} \]

where N_ij is the number of nodes in the reference community i that appear in the found community j, C_R is the number of reference communities and C_F is the number of found communities. N_i. is the sum over row i and N_.j is the sum over column j. Thus, I(P_R, P_F) gives a quantitative measure of how similar the found community is to the reference community. A score of 1 shows that both communities are identical and a score of 0 that they are totally independent.
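The NMI formula can be checked numerically with a few lines of Python. The sketch below is illustrative only: the function name `normalized_mutual_information` is hypothetical, N is taken to be the sum of all entries of the confusion matrix (the usual definition, not stated explicitly above), empty cells are skipped, and returning 1 when the denominator vanishes is an assumed convention for the degenerate case where both partitions consist of a single group.

```python
import math


def normalized_mutual_information(confusion):
    """NMI from a confusion matrix (rows: reference, columns: found)."""
    row_sums = [sum(row) for row in confusion]
    col_sums = [sum(col) for col in zip(*confusion)]
    total = sum(row_sums)          # N, taken here as the sum of all entries

    numerator = 0.0
    for i, row in enumerate(confusion):
        for j, n_ij in enumerate(row):
            if n_ij > 0:           # empty cells contribute nothing
                numerator += n_ij * math.log(n_ij * total / (row_sums[i] * col_sums[j]))
    numerator *= -2.0

    denominator = sum(n * math.log(n / total) for n in row_sums if n > 0)
    denominator += sum(n * math.log(n / total) for n in col_sums if n > 0)
    if denominator == 0.0:
        return 1.0                 # assumed convention: both partitions are a single group
    return numerator / denominator


# Found partition identical to the reference partition.
identical = normalized_mutual_information([[2, 0], [0, 2]])
# Found partition statistically independent of the reference partition.
independent = normalized_mutual_information([[1, 1], [1, 1]])
```

The two calls reproduce the end points described in the text: the identical partitions score 1 and the independent ones score 0, up to floating-point rounding.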


Figure A-2. A performance analysis of the algorithm for a 128-node network (1000 realizations). The algorithm performs well for low z_out, indicating strongly separated communities, and decreases in accuracy for higher values, meaning the community separation blurs. The error bars show the deviation from the mean value. (Axes: z_out (x), I(P_R, P_F) (y).)

Figure A-3. Performance of the algorithm on a 512-node rewired network (1000 realizations). For a large number of rewirings the algorithm performs well, but it gradually decreases in accuracy as fewer edges are exchanged. The error bars show the deviation from the mean value. (Axes: number of rewirings (x), I(P_R, P_F) (y).)


Figure A-4. Plot of the dendrogram for a generated local community structure. A community with a set of 30 vertices is identified correctly for a mean value of Γ = .86. Numbers at the bottom denote the identifiers of the vertices that belong to that community. The single line at the top agrees with the formation of a complete community. (Axes: vertex number (x), average value (y).)

As shown in Figures A-2 and A-3, the algorithm has performed very well and comparably to the results illustrated in [8]. The algorithm is able to correctly identify the communities for small values of z_out, with a decrease in accuracy as the values of z_in and z_out become equal. In Figure A-3, the algorithm performs very well with a large number of rewirings, and its accuracy decreases as fewer edges are rewired, which indicates that the communities are not sharply separated with a lower rewiring count.

A.5.2 Computer Generated Experiments

The following section illustrates the experimental setup and the application of the proposed algorithm to problems related to socio-physics and computational biology. To verify that the algorithm can identify and recognize the boundaries of a local community, we first apply the proposed algorithm to computer-generated graphs [239] that have well-known community structures. As previously mentioned, the proposed algorithm can also be used in dynamic networks. To verify this scenario, we artificially add and subtract vertices on the fly during the execution of the algorithm. A graph is constructed for n = 120 vertices, divided into four local community structures known already. Edges are placed independently at random between vertex pairs such that the expected average
degree of each vertex is equal to 10. We can tune the sparseness of the graph by varying the average degree of vertices.

Figure A-5. Plot of the average Γ value against the number of iterations. When the number of iterations of the algorithm reaches 30, the value Γ = .86 signifies that a local community structure is formed with all vertices classified correctly. (Axes: number of iterations (x), average Γ value (y).)

Figure A-6. Bar graph showing the efficiency of the algorithm on various levels of graph sparseness. The algorithm performed quite well while the average degree of the community is d = 6. (Axes: average degree of vertices (x), % correctly classified community (y).)

Figure A-4 shows a dendrogram generated from this algorithm. It contains a community of n = 30 vertices. As can be seen, all the vertices are identified with a major split at the top of the tree. Figure A-5 shows the corresponding value of Γ against the number of iterations required to achieve it. Here we cannot put a bound on the lower limiting value of Γ as it is related to the average degree of all vertices. However, our experiments have shown that any value 0 < Γ ≤ 1 is an indication of a good community.
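A synthetic graph of the kind used in this section (120 vertices in four known communities, edges placed independently at random, expected degree 10) could be generated along the following lines. This is a hedged sketch, not the generator used for the reported experiments: the function name `planted_partition`, the equal group sizes, the `seed` parameter and in particular the split of the expected degree into `z_in = 8` and `z_out = 2` are assumptions, since the text only fixes their sum.

```python
import random


def planted_partition(n=120, groups=4, z_in=8.0, z_out=2.0, seed=0):
    """Random graph with `groups` equal-sized planted communities.

    Each vertex has on average z_in neighbors inside its own group and
    z_out neighbors outside it, so the expected degree is z_in + z_out.
    """
    rng = random.Random(seed)
    size = n // groups
    label = [v // size for v in range(n)]   # known community of every vertex
    p_in = z_in / (size - 1)                # edge probability inside a group
    p_out = z_out / (n - size)              # edge probability between groups
    adj = {v: [] for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            p = p_in if label[u] == label[v] else p_out
            if rng.random() < p:            # edges are placed independently at random
                adj[u].append(v)
                adj[v].append(u)
    return adj, label


adj, label = planted_partition()
average_degree = sum(len(neighbors) for neighbors in adj.values()) / len(adj)
```

Lowering `z_in` relative to `z_out`, or lowering their sum, gives sparser and less sharply separated communities, which is one way to vary the sparseness studied in Figure A-6.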


Figure A-7. A relative comparison of static and dynamic community values of Γ. The two curves show the average value of Γ (y-axis) against the number of vertices in the communities (x-axis). As expected, the dynamic community (squares) shows spikes in its Γ value when vertices are added on the fly. Also, the static community (circles), which is based on initial random solutions, shows a steady decrease in its Γ value.

In Figure A-6, the result of an experiment that involves graph sparseness is shown. A typical bar chart illustrates the percentage of vertices of a community identified correctly against a varying average vertex degree. The algorithm performed quite well for an average degree of d = 6 and onwards, identifying almost 92% of vertices correctly. For a low average degree (d ≤ 4), the results deteriorate to a level of 75%. This result indeed clarifies an important aspect of the random walk. The initial solution generated from the random walk contains a significant number of vertices that contribute to other communities. As the sparseness level of a graph increases, the random walk traverses these non-localized parts of the community.

The dynamic property of our algorithm was also evaluated, as shown in Figure A-7. For a known community of n = 43 vertices, we feed the algorithm with vertices on the fly during its execution. These externally added vertices are not part of any initial solution generated from the random walk. The spikes (squares) show a temporary increase in the value of Γ when a set of vertices is fed to the network. A comparison is made against the static nature of the community (circles) with the same set of vertices. In the static community, all
the vertices were part of the initial solution generated from the random walk. At the end of execution, the value of Γ merges to a common point for both communities. This is expected, given that the set of vertices eventually remains the same in both communities.

Figure A-8. Plot showing the effect of the number of iterations (x-axis) on the value of Γ (y-axis) for source vertices s = 32, 16, 8, 26 and 21. The merge at the bottom right denotes convergence to a local community structure with nearly the same Γ value. Almost 95% of vertices are correctly identified despite varying source vertices.

Figure A-8 presents a comparative study of local community convergence by varying the position of the source vertex. Several iterations of the proposed algorithm on a constant set of n = 120 vertices are carried out for a known community of n = 43 vertices. In each iteration, a different source vertex s is chosen to carry out the community formation. For the purpose of clarity, only five different source vertices are illustrated with their respective values of Γ against the number of iterations, before they converge to a local community. As revealed in Figure A-8, for all source vertices, the final communities converge to an average value Γ = .82, signifying good quality of local structures, assuming all communities contain the same set of vertices.

Figure A-9 presents an execution time comparison of the various local and global community based algorithms that are referenced in this chapter. As can be seen, local information based algorithms have good execution time performance over global
[Plot: x-axis: Algorithm; y-axis: Execution Time (sec); series: n = 512, n = 2048, n = 5000]
Figure A-9. Comparison of the execution time with a varying number of nodes in the network. The algorithms used are in the following order of references: [5, 17, 75, 188], the proposed algorithm, and [214]. As can be seen, local information based community structure algorithms perform far better than global information based algorithms. Also, among local algorithms, the proposed method performs better than the other referenced local community structure algorithms.

information based algorithms. Also, among local community structure methods, the proposed method performs well compared with the other local methods.

From these results, we conclude that the proposed algorithm effectively identifies the local community structure in both static and dynamic networks.

A.5.3 Application to Real Networks

In this section, we evaluate the proposed algorithm on real-world social and biological networks. The first example is taken from the Zachary karate club, in which a factional division led to a formal separation of the karate club into two organizations. It was earlier studied by [9, 250] with different classes of hierarchical clustering methods. We also consider a biological example that maps protein interactions to create a network representation of neurotransmitter receptor complexes. Finally, with the advancements

PAGE 202

Figure A-10. Actual breakdown of Zachary Karate Club.

in wireless technology we can continuously log mobile users' interactions and behavioral patterns in a social system. On one such dataset, we apply our algorithm to discover communities based on mobile users' spatio-temporal preferences.

A.5.3.1 Zachary karate club

This example is built on the Zachary Karate Club [250], which presents data from a university-based karate club with n = 34 vertices, in which a factional division led to a formal separation of the club into two organizations.

The input consists of 34 vertices representing members of a big community. A disagreement developed between the administrator and the instructor of the club, which resulted in the instructor leaving and starting a new club, taking about half of the original club members. We apply the proposed algorithm in an attempt to identify the factions and the members involved in the split. Figure A-10 shows the original karate club, where circles represent members associated with the administrator's faction and squares represent the instructor's faction.

PAGE 203

Figure A-11. Two local factions in the Zachary karate club generated using the algorithm. Circles represent the community with s = 34 and squares represent the community with s = 1.

[Plot: x-axis: Number of Iterations; y-axis: Average Γ value; series: s = 34, s = 22]
Figure A-12. Karate club convergence graph for the identification of communities. The graph shows the convergence of communities taking s = 1 and s = 34, for [...] = 25. The x-axis shows the number of times the iterative agglomeration is performed.

PAGE 204

To justify the correctness of the proposed algorithm, we run the experiment for a distinct pair of source vertices, one from each local faction. Considering s = 1 as a border vertex and s = 34 as a center vertex with [...] = 25, we fed this network to the proposed algorithm. Figure A-11 shows the resulting communities. The algorithm correctly identifies the communities except for vertices 9 and 10; this can be justified, as both vertices are equally linked to both factions, with the same number of edges outward to both communities. The graph in Figure A-12 shows a steady convergence in the Γ values of the formed communities. It can also be interpreted that border vertices bring a sharp decrease in the value Γ of the community structure. With an average Γ = .51 for s = 1 and Γ = .65 for s = 34, an accuracy of 94% is achieved, which is superior to the previous results of [9, 93].

A.5.3.2 Neurotransmitter receptor complexes

Mapping protein interactions to create a network representation of the complexes helps reveal many important biological functions. Here we explore the organization and function of the N-methyl-D-aspartate receptor complex/MAGUK associated signaling complex (NRC/MASC) [195] using the local community approach. Using the information on the function, interaction patterns, and phylogeny of each protein, we are able to develop certain inferences on the structural and functional aspects of synapse complexes.

The annotation study carried out before only considers the list of components and does not take into consideration their interaction patterns and organization. When represented as an undirected graph, the input consisted of 101 protein vertices connected by 246 interaction edges. This constitutes the core functional elements of MASC proteins. It links together all glutamate receptors and a high proportion of the signal transduction machinery responsible for the reception and integration of calcium and G-protein-coupled synaptic signaling. Given 13 different sparse and dense collections of network communities, we focused our work on the identification

PAGE 205

Figure A-13. NRC/MASC analysis for community structure. Communities generated for neurotransmitter receptor complexes using [210].

of ionotropic glutamate receptor proteins that also contain a number of PDZ/DHR/GLGF scaffolding molecules and the Ser/Thr kinase PKA, known as an integrator of signals in synaptic plasticity, containing some traces of tyrosine kinases and SH2 motif proteins. Figure A-13 shows the results of the application, which forms two big clusters of local communities of motifs. The algorithm is able to identify more than 92% of them correctly, for an average Γ = .81, as compared to the original community structure mentioned in [195]. The two clusters in Figure A-13 show that the proteins perform fast information sharing and response coordination, first among themselves and then with other module members. As inferred, the clusters of proteins around ionotropic glutamate receptors

PAGE 206

and tyrosine protein kinase form the primary sites for signal reception, regulating effector mechanisms and vesicular trafficking [195]. They are similar in nature and influence a particular set of functional processes specific to their structure and individual behavior. Thus, the formed grouping shows communities with significant overlap in function and phenotypic annotations. This conveys an important result: irrespective of the type of network, the proposed algorithm has performed consistently well across a wide range of implementations.

A.5.4 Study of Wireless Mobile Users

Recently, the wide applicability of hand-held mobile devices (for example, iPhones) equipped with multi-sensor (like Bluetooth) capability has given rise to a whole new world of social analysis [116, 121]. The tight coupling [104] between user and device has opened a new direction for sensing users' spatio-temporal activities. The continuous logging of such activities provides an insight into the behavioral patterns, likes and dislikes, and preferential attachment to locations of the mobile users. Traditionally, most network research was done without specific assumptions about users' presence or other environmental factors. However, with the adoption of wireless systems, there is now a need to develop protocols and network infrastructure that consider user mobility, surroundings and communication medium. Specifically, for better utilization of available resources like bandwidth and limited battery power, we need protocols and services that are behavior oriented and leverage the underlying patterns of user activity and their mutual interaction. For example, a study of the density distribution of mobile users on a university campus can help investigate which locations contribute most network connectivity and during which time of the day. Thus, an analysis of such data collected from wireless access point connections can tell us whether network administrators need to provide more bandwidth at these locations as compared to other, sparse locations.

PAGE 207

Figure A-14. In the study of WLAN mobile users, 31 active communities are identified. For more clarity we removed inter-connections among users of different communities. Community structure modeling is done using [24].

To understand the macro mobility of mobile users, in this experiment we obtain a dataset consisting of 890 mobile users spread across the MIT campus [145, 246]. We analyzed their month-long movements and approximated their preferential attachment to certain campus locations [139]. To understand the interaction patterns among these users, we ran our algorithm to identify communities of users based on spatio-temporal similarity. Thus, two users are deemed closest if they visit similar locations most of the time during a one-month period.
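The notion of two users being "closest" when they visit similar locations can be illustrated with a small sketch. The text does not state which similarity measure was used, so the cosine similarity below is an assumption, and the user names and visit counts are hypothetical values chosen only for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length visit-count vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a user with no recorded visits is similar to nobody
    return dot / (norm_a * norm_b)

def closest_user(target, visits):
    """Return the user whose location-visit profile is most similar to target's."""
    others = [user for user in visits if user != target]
    return max(others, key=lambda u: cosine_similarity(visits[target], visits[u]))

# Hypothetical one-month visit counts at three campus locations (not real trace data).
visits = {
    "user_a": [20, 5, 0],
    "user_b": [18, 6, 1],
    "user_c": [0, 2, 25],
}
```

With these made-up profiles, `closest_user("user_a", visits)` selects `user_b`, whose visits concentrate on the same location as `user_a`.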

PAGE 208

As shown in Figure A-14, our algorithm finds 31 unique communities, illustrating richness in the users' interaction. For more clarity we removed inter-connections among users of different communities, which are clustered based on individual location visiting preferences. The most dominant clusters show locations that are highly visited and hence contribute to a large set of mobile users. The largest cluster consists of 25.73% of users, followed by 14.16% and 7.3% of mobile users, combining in all more than 50% of users located on university campuses. We also find 23 small clusters where the user population is less than 5% of the total. These results provide a very interesting insight into the distribution of mobile users and help in understanding and characterizing the structure of such community patterns, which demonstrate similarity in their behavioral patterns. Invariably, concentrating only on three or four locations, we can identify more than 50% of the population in the dataset. Furthermore, such analysis is very important in developing group-aware services, user modeling, understanding network throughput from a macro mobility level and discovering the social perspective of mobile users associated with certain locations.

A.6 Further Improvements

Once the agglomeration process is complete, additional activities can be performed to further optimize the discovered community. Successive merging of the communities is an important aspect that involves grouping small communities to form a big enclosed community (when l < k). By performing this process, we are also able to avoid possible local minima that are formed during community generation. Secondly, when agglomeration does not optimize top-ranked communities, an exchange operation between complementary ranked communities can be performed instead.

Community Merging: A set of small enclosed communities (with l < k) can represent a big community, though a single vertex exchange among them does not optimize the solution. Using the basis of vertex and community rank, the algorithm successively merges these communities together to form a bigger community.
Complementary Exchange: As discussed before, the top-ranked communities are considered for vertex exchange. Sometimes communities with a high value Γ may not necessarily generate an optimized solution. They remain divided

PAGE 209

in themselves and look for those vertices which exist in comparably stable communities. During the course of iterations, if an exchange does not generate a better solution, the algorithm marks such communities so that they will not take part in future iterations, to save degenerative counts. To further optimize the solution, the algorithm performs an exchange operation among complementary ranked communities with respect to their adjacency and value Γ.

A.7 Concluding Remarks

In this chapter, we introduced a novel approach to find local community structure when the complete information of a network is unknown. Using a random walk based on degree centrality, we explored one vertex at a time and added it to the initial set of communities. We then introduced the concept of vertex and community rankings. This helped the iterative agglomeration technique to improve the initial set of communities. During the course of iterative agglomeration, vertices are transferred from one community to another on the basis of rankings, in search of immediate neighbors. This process brings the high-ranked communities closer to forming a clique. Each iteration optimizes the current set of communities until k vertices are explored or a local community structure is discovered. An extension to this algorithm can be made for the identification of global community structure with different source vertex positions. Computer-simulated experiments demonstrated the effectiveness of the algorithm in both static and dynamic networks. Using Normalized Mutual Information, we benchmark the proposed algorithm to quantify its accuracy in discovering the local communities. Experiments performed on the Zachary Karate Club, Neurotransmitter Receptor Complexes and wireless mobile user datasets show the applicability of the proposed algorithm in a wide range of applications.
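As a rough illustration of how Normalized Mutual Information can score a discovered community assignment against a known one, the sketch below computes NMI from two label lists. Several normalizations exist and the chapter does not say which was used; dividing by the arithmetic mean of the two entropies is an assumption here, and the function names are illustrative.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (natural log) of a community labeling."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def mutual_information(a, b):
    """Mutual information between two labelings of the same vertices."""
    n = len(a)
    joint = Counter(zip(a, b))
    count_a, count_b = Counter(a), Counter(b)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log( p(x,y) / (p(x) p(y)) )
        mi += (c / n) * math.log(c * n / (count_a[x] * count_b[y]))
    return mi

def nmi(a, b):
    """NMI normalized by the mean entropy (an assumed variant): 2 I / (H_a + H_b)."""
    if len(a) != len(b):
        raise ValueError("labelings must cover the same vertices")
    h_a, h_b = entropy(a), entropy(b)
    if h_a + h_b == 0:
        return 1.0  # both labelings put every vertex in a single community
    return 2 * mutual_information(a, b) / (h_a + h_b)
```

The score depends only on how vertices are grouped, not on the label names, so a discovered partition that matches the known one up to renaming scores 1, while a partition independent of the known one scores 0.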

PAGE 210

APPENDIX B
SHIELD - SOCIAL SENSING AND HELP IN EMERGENCY USING DEVICES

School and college campuses face a perceived threat of violent crimes and require a realistic plan against unpredictable emergencies and disasters. Existing emergency systems (e.g., 911, campus-wide alerts) are quite useful, but provide delayed response (often tens of minutes) and do not utilize proximity or locality. There is a need to exploit proximity-based help for immediate response and to deter any crime. In this paper, we propose SHIELD, an on-campus emergency rescue and alert management service. It is a fully distributed, infrastructure-less platform based on proximity-enabled trust and cooperation. It relies on nearby localized responses sent using Bluetooth and/or WiFi to achieve minimal response time and maximal availability, thereby augmenting the traditional notion of centralized emergency services. Analysis of campus crime statistics and WLAN traces surprisingly shows a strong positive correlation (over 55%) between on-campus crime statistics and the spatio-temporal density distribution of on-campus mobile users. This result is promising for developing a platform based on mutual trust and cooperation. Finally, we also show a prototype application to be used in such scenarios.

B.1 Introduction

Current emergency, alert and public safety systems take centralized approaches and do not tap any available local rescue service. For example, 911 and Emergency Blue Towers connect to a centralized Public Safety Answering Point (PSAP), which then sends rescuers to the crime site. Also, they require pre-established links and infrastructure, which may not be available everywhere, especially in areas affected by earthquakes and floods. On the other hand, decentralized and distributed approaches using small handheld devices with short-range communication (Bluetooth, WiFi) give a new dimension to expressing human activities, never seen before in terms of personal safety and rescue. They have helped realize the great potential of service localization, proximity, participatory sensing and message relaying in a multitude of ways, for example: inferring shared

PAGE 211

interest [160] and friendship networks [67], identifying social structure, and human behavior based message forwarding [120]. In a novel way, here we extend their underlying capabilities to augment current emergency rescue and alert response management systems. We propose ideas to develop (1) trust from mobile user encounters and (2) context-aware service localization and signaling of historical crime log statistics, all as measures to provide a preemptive response in averting the possibility of an incident occurring, via a system we call SHIELD. As a reaction, to minimize the average response time to an event that has already occurred, SHIELD maximizes the use of available local help in the vicinity of the incident. SHIELD achieves operational independence and robustness in the process of distress signaling by providing a set of guidelines to ensure privacy, identify trusted entities of the network and increase the cooperation among them, hence regulating the flow of information and messages in a controlled manner. We develop trust and cooperation in the network based on (1) the number of Bluetooth encounters and (2) the duration of Bluetooth encounters. Then, we propose a comprehensive trust model built on these and other contextual features. The trust model plays a vital role in privacy preservation of mobile users and seeks to increase cooperation inside the network. We also propose a context-aware, energy-efficient protocol that takes input from the trust model and historical crime log statistics. Keeping in mind the limited resources of mobile devices, the protocol is adaptive and adjusts its parameter settings to better serve the nature of the emergency and alert scenario. Finally, the proposed system can easily augment existing services (like 911) and bridge the gap to deliver available localized services of first responders as quickly as possible. In all, our design goals for developing such an infrastructure-less distributed system include: (1) maximum availability, (2) minimum response time, (3) reliability of communication via network trust generation, (4) scalability and (5) cross-platform functionality to mitigate response irrespective of the device manufacturers.

PAGE 212

B.2 Background

B.2.1 Emergency and Rescue Systems in General

As mentioned, most of the existing systems are either centrally controlled or require third-party support (cell towers etc.). For example, university campuses deploy Blue Towers and standard text messaging systems like CampusED, e2Campus and PanicnPoket to alert students and faculty. However, they have some shortcomings: Blue Towers are not available everywhere, and SMS text services are expensive, passive and consume a lot of resources. Instead of targeting affected users, these SMS messages are sent in thousands, which overloads the central system and affects other voice-data services with delayed throughput and an unacceptable percentage loss of total messages sent. Furthermore, affected mobile users lack any network-driven trust and cooperation, and need to coordinate on their own. Finally, these systems cannot be used in disaster situations such as earthquakes, where infrastructure collapses.

B.2.2 Other Approaches

The development of a robust and responsive system is critical to emergency management. Several prototypes have been proposed in the past. The authors in [158] proposed a dynamic data-driven application framework that uses wireless call data to measure abnormal movement patterns in the population. The need for reliable communication and the interoperability challenges among rescue teams from technological, sociological and organizational points of view are discussed in [160]. The barriers to technology adoption in emergency management and user capabilities are discussed in [235, 254], which give deployment-level intricacies of systems. Finally, real-time test beds and simulators are modeled in [119, 223], to help develop a system that can actually react in reality.

B.2.3 Understanding User Behavioral Patterns

Currently, numerous attempts are being made to understand user behavioral patterns from machine-sensed measurements [122, 137]. They try to discover mobile

PAGE 213

users' social structures, periodic routines and spatio-temporal profiles. Detailed studies on various aspects of human patterns are done in [4, 166, 191]. Along the same lines, the authors in [106, 108] investigate social structures and community formation, and derive expressions for cooperation in the network based on similarity and density distribution. These works motivate us to develop a framework that uses behavioral patterns in the context of developing trust and cooperation from multi-sensing handheld devices.

B.2.4 Using Human Mobility as a Communication Paradigm

Uncovering user behavioral patterns is not enough; we need some compelling reasons to develop a communication paradigm based on these patterns. An important work in [120] proposes a mobility protocol that uses human behavior to transfer messages. In [42, 207], the authors consider the impact of mobility in designing communication protocols and provide ways to develop an effective communication system. These rationales give an important motivation that a system can be developed from user behavioral patterns. It can use those characteristic features as a medium to establish secure and timely communication in a DTN-like environment. Next, we discuss the SHIELD architecture.

B.3 SHIELD: Rationale and Architectural Overview

In this section, we discuss the SHIELD architecture as shown in Figure B-1. Here, we assume mobile users are carrying handheld devices equipped with RF-communication capability. The main components of SHIELD are:

B.3.1 The Encounter and Duration Matrix

Mobile encounters are the discovery of the Bluetooth devices present in the vicinity. Initially, we build two matrices: 1) an Encounter matrix that contains the number of encounters with other users, and 2) a Duration matrix that contains the duration of these encounters with other users. We also record the timestamp and the location of

PAGE 214

Figure B-1. SHIELD architecture

encounter. The location is derived from access point sniffing and used to co-locate a user with the incident location and time.

B.3.2 The Trust Matrix

The trust model discussed later uses these encounters and builds a wrapper of trust for the mobile host in the form of a trust matrix. It first maps encounters in the spatio-temporal dimension and then assigns them to various classes of trust that identify other users' influence in emergency rescue and alert. From the analysis of real traces and human surveys, we find that large numbers of encounters and longer encounter durations belong to known persons like friends, colleagues and spouses.

B.3.3 Advisory Sub-System

To provide an optimal level of safety and self-preparation, we analyzed historical on-campus crime logs. We created an online database of these crime statistics and ranked various campus locations. The location ranking and vulnerability assessment is done using the time and nature of the crime. The system gives a statutory threat warning

PAGE 215

to mobile users while they visit a location at a particular time. The location is derived from the nearest access point. For example, the system flashes a cautionary signal to students passing a parking garage at night, if it has some recent crime history. As shown in Figure B-1, this data may be stored on the device, or accessed from a server using a WiFi connection.

B.3.4 Context-aware Adaptive Protocol

We introduce a context-aware adaptive protocol to complement the trust model and advisory sub-system. Its main task is to perform efficient routing and transmission of the distress signal. For example, during a critical time like passing a parking garage, the protocol increases the Bluetooth scanning frequency to identify nearby trusted devices and notify them of its existence. However, in normal operation it tries to save resources (battery power etc.) by reducing the scanning frequency.

B.3.5 Distress Signaling

In emergency situations the mobile user can use all available modes of communication to let nearby trusted nodes know of the situation. A user can select the classes of trust to which the distress signal is sent (automatically) based on their availability, and also a category of individuals who provide specialized services, like doctors, security and rescue personnel, night-time vigil guards etc. In the following text, we describe the details.

B.4 Trace Analysis

An important aspect of ad hoc network research is the careful logging of mobility traces and empirical ways to understand large systems. In the past few years, we saw a significant effort by several universities [122, 137] to collect large-scale measurements that log Bluetooth encounters and WLAN users' network usage with spatio-temporal information. The TRACE framework as mentioned in [103] helps to further refine and generate encounter matrices for our analysis. Table 1 shows the measurements used for our study. The above measurements are collected from the main

PAGE 216

[Plots: x-axis: Encountered User; y-axes: Duration of Encounter, Number of Encounters; panel titles: Number of Encounters, Duration of Encounters]
Figure B-2. Analysis of encounter patterns. The distributions show that users who know each other have a larger frequency and duration of encounters compared to strangers.

university campus of the University of Florida. Bluetooth encounters are collected from 135 students in Fall 2009 on Nokia N810 devices. To understand density, we also use WiFi measurements from access point (WLAN) connections. The crime log received from the Police Department of the University of Florida contains ten years of on-campus incidents, detailing the type, time, date and location of each crime.

B.5 Trust Model

Today mobile users frequently carry handheld devices (e.g., iPhones) that can be used to reflect their personality. Using the multi-sensor capability of these devices (e.g., Bluetooth, GPS and WiFi sniffing), we can capture vital statistics like the frequency and duration of time spent at particular locations. Statistical analysis of historical mobility logs shows that user behavioral patterns exhibit location visiting preferences, periodic reappearances and preferential attachments. Another perspective in the study of behavioral patterns is the analysis of similarity, which helps develop inter-connections between mobile users. In this regard, fundamental work is done in [36, 163], which provides significant evidence of socio-demographic and spatio-temporal regularity and social structures as a basis for developing homophilous relations and propinquity among users. It also shows that people who know each other form a cohesive cluster with a small average shortest path length and a large clustering coefficient. Using this rationale, we conducted an experiment to analyze user encounter patterns on the University of

PAGE 217

Florida campus. For a period of ten weeks in Fall 2009, we distributed Nokia N810 and OpenMoko devices to 135 students. The devices were equipped to sniff nearby active Bluetooth devices in a range of up to 50 meters, localized by WiFi access point information. The analysis, shown in Figure B-2, indicates that a large number and duration of meetings belong to users who know each other very well (validated by the students carrying out the experiments). The curves decrease from mobile users with known faces to complete strangers. Characteristic similarity brings people of the same nature together. Thus, information relevant to one mobile user is more likely to be of interest to another mobile user of the same circle. Conversely, the same circle can be called upon in the event of emergency and alert. Finally, such a circle of trust and friendship can be derived from the mobile encounters. Using these results we can say that known mobile users with frequent encounters and large meeting durations are the ones most effectively trusted in the first place. Furthermore, a cooperation network based on these two metrics can be developed and very well be used in the event of emergency and alert management. The formation of trust and cooperation between two mobile users i and j is defined as:

B.5.1 Number of Encounters f(i, j)

We define the number of encounters (n) as the frequency of encounter (i.e., coming within radio range) between two mobile users i and j, that is, the number of repeated meetings per unit time:

$$f(i,j) = \sum_{i,j=1}^{n} \epsilon(i,j)$$

B.5.2 Duration of Encounters D(i, j)

We define the duration of encounter as the amount of time spent by mobile users together. While the number of encounters provides an important criterion to quantify the active mobility of users in the network, it does not provide a minimum threshold time required to establish a connection and successfully transfer the messages. For example, say two mobile users often meet, but only for a fraction of a minute; despite a successful encounter, it is impossible for them to communicate effectively. Duration

PAGE 218

of encounters provides the requisite stability factor in a dynamic network environment. Qualitatively, it also defines the closeness between two mobile users, as a dimension to measure trustworthiness and an expected level of cooperation in emergency and alert situations. We define the duration of encounter between two mobile user nodes i and j as:

$$D_f(i,j) = \sum_{i,j=1}^{n} d(\epsilon(i,j))$$

where $d(\epsilon(i,j))$ is the individual duration of a meeting between i and j while they encounter each other. These two metrics provide a foundation that led us to implement a comprehensive trust model. It effectively optimizes the distress signal transmission with trusted nodes.

The trust model uses a rule-based classifier that recognizes Bluetooth encounters and assigns them to various classes of trust. These classes of trust define the social proximity of a user in emergency and alert situations. The rule-based classifier consists of a rule set $R = \{r_1, r_2, r_3, \ldots, r_n\}$ such that each classification rule is of the form $r_j : (\text{Encounter Condition}) \Longrightarrow C_l(y)$. Each rule consists of a condition statement that defines attributes pertinent to the encounter. The term $C_l(y)$ shows that node y is assigned to class $C_l(y) \in C$, such that $C = \{C_1, C_2, C_3, \ldots, C_m\}$. These rules may not be mutually exclusive, and sometimes more than one rule can apply to an encounter. The following condition statements are used to build the classifier from environment-sensed emergent properties:

- Location and vicinity information of the Bluetooth encounter.
- Tags that define the level of trust with an encountered device. These tags are similar to the rank and status of a person, e.g., doctors, security personnel.
- Duration, frequency and clock time of the encounter.
- Activity-based encounters, which describe the circumstances in which the Bluetooth encounter happened.

This classifier is easy to interpret and can be incrementally built on the existing rules.

B.6 Protocol Design

The stack defines a process to send distress signals to a few trusted nodes. It achieves its goal by managing activities, sensing the emergent properties from the

PAGE 219

Figure B-3. Application Protocol

location of the incident and data on historical events, and intelligently choosing the most effective form of available communication. The protocol stack, shown in Figure B-3, is divided into four main components:

B.6.1 The Scan Engine

The top component contains the scanner and profiler for mobile user encounters. The trust model generates a user's behavioral profile and aggregates the classification of its Bluetooth measurements into encounter matrix form. The device sensor provides detail on the historical crime log, location of the incident, duration and time to maximize the efficiency of the signal.

B.6.2 Protocol Adaptation

To provide an optimal level of service and to respect the limitations imposed by the mobile device, the protocol adapts to the environment by selectively changing the parameter space based on environment-sensed input and crime log statistics.

B.6.3 Distress Signal Communication

This component is responsible for deciding the level of trust. Based on the type of incident, the trust magnitude and severity are decided before sending the distress signal. We define various trust magnitudes: Friends, Strangers, Acquaintances, Tagged data. A distress signal can be bifurcated into emergencies and alerts. An emergency

PAGE 220

[Panels: A, B, C]
Figure B-4. SHIELD Application Prototype

situation can be a burglary or a heart attack, while alerts might involve hurricane warnings or earthquakes. The application decides the lifespan of the message, the number of hops, the type of forwarding method etc.

B.6.4 Discovering and Pairing

The lowermost component is responsible for message transmission to other mobile user devices and also performs important operations like pairing, discovering and relaying to other devices. This module is attached to the hardware, which can use any of the available communication modes to send the distress signal.

B.7 Application Prototype

We developed a prototype application on the Android simulator based on the SHIELD architecture. As shown in Figure B-4, this prototype can use Bluetooth to collect user encounters and to transmit the distress signal. The core is the trust model, which is responsible for classifying and extracting the level of trust based on mobile user encounters and other machine-sensed vicinity data. User interfaces provide graphical input capability to record the victim's response to a distress signal message. Finally, the protocol and its adaptation module are used along with the message to send the distress signal.
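The trust model at the core of the prototype builds on the two encounter metrics of Section B.5 (number and duration of encounters) and a rule-based classifier. A minimal sketch of that pipeline is given below. It is not the prototype's code: the record format, function names and thresholds are assumptions made for illustration, and the first-match rule order is a simplification, since the text notes that rules need not be mutually exclusive.

```python
from collections import defaultdict

def encounter_stats(log):
    """Aggregate a log of (user_i, user_j, duration_seconds) records.

    Returns (f, D): the number of encounters and the summed encounter
    duration for each unordered pair of users over the logged period.
    """
    f = defaultdict(int)
    D = defaultdict(float)
    for i, j, duration in log:
        pair = tuple(sorted((i, j)))  # an encounter is symmetric
        f[pair] += 1
        D[pair] += duration
    return f, D

# Hypothetical rule conditions; the actual thresholds are not given in the text.
RULES = [
    ("Friends", lambda count, total: count >= 10 and total >= 3600),
    ("Acquaintances", lambda count, total: count >= 3),
]

def classify_trust(count, total_duration):
    """Assign a trust class using the first matching rule (a simplification)."""
    for label, condition in RULES:
        if condition(count, total_duration):
            return label
    return "Strangers"
```

The class names follow the trust magnitudes listed in Section B.6.3; location, tag and activity conditions from the rule list could be added as further fields of each record and further entries in `RULES`.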

PAGE 221

[Plots: x-axis: Distance (in mtrs); y-axis: Time (in sec); panel A: Connection Time; panel B: Scanner]
Figure B-5. Analysis of Bluetooth scanning for devices. The Bluetooth scanning time varies with distance. (b) Connection and transfer time for Bluetooth.

B.8 Testbed Implementation Analysis

In this section, first we evaluate our test-bed results for Bluetooth performance using the Nokia N810. We find that an average of 15-20 seconds is a good estimate for sending a distress signal to trusted nodes. Then, we analyze over ten years of on-campus crime incidents against the density distribution of on-campus mobile users. Here, we find that most of the incidents happen when mobile users are active. This correlation of 55% suggests that incidents can be averted if proper coordination and trust exist in the network.

B.8.1 Bluetooth Evaluation

Since Bluetooth is a primary mode for distress signaling, it is very important to first evaluate its performance and effectiveness in communicating the message. We used the Nokia N810 to measure Bluetooth performance in terms of scanning and connection time, delivery and message size. To ensure the quickest level of communication, we optimized the Bluetooth capability of these devices. The results for average scanning and connection time are shown in Figure B-5. A scan time of six to ten seconds is optimal for finding trusted nodes within a 50-meter radius. As the emergency is relaxed, we can spend extra time scanning and tracing the trusted nodes. We also define an efficient and useful message format. We analyzed the connection and transfer time taken for a one-hop transfer of a message of 184 bytes. As shown in Figure B-5, we found that

PAGE 222

Figure B-6. A graph showing the crime log and density distribution of on-campus mobile users

low transfer times range between 0-60 meters. These results show great promise for using Bluetooth communication in designing rescue applications.

B.8.2 Crime Statistics and Mobile User Density

Next, we analyzed the past ten years of crime log statistics of the University of Florida to understand the spatio-temporal distribution of the incidents that happened on campus and also the density distribution of active mobile users. As shown in Figure B-6, the crime statistics are high around midnight and then increase as the daytime progresses. There is a positive correlation between the incidents and the number of active mobile users. Thus, these incidents can very well be averted, given that proper preparedness exists for the mobile users.

B.9 Concluding Remarks

In this paper, we propose a novel method to utilize handheld devices in emergency rescue and alert scenarios. We introduce SHIELD to establish spatio-temporal trust and cooperation for use in localized emergency alerts. An important sub-system of SHIELD is the proximity-enabled trust generation based on the number and duration of encounters among mobile users. Our analysis shows that a large number of encounters and high meeting duration occur among users who know each other very well. Then, we introduce a context-aware adaptive protocol that is both energy efficient and social

PAGE 223

aware for signaling distress messages. Our statistical analysis reveals a positive correlation (55%) between on-campus crime incidents and the density distribution of users. The results indicate a need for a system based on mutual trust and cooperation to avert incidents and help control the flow of information during alerts. To aid this, we also proposed an application prototype for iPhones and other handheld devices. We hope that SHIELD will augment the current safety infrastructure and that its deployment will help make a safe environment on school and university campuses.
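For readers who want to reproduce this kind of comparison on their own data, the sketch below computes a correlation coefficient between two equal-length series, such as per-hour incident counts and per-hour active user counts. The appendix does not name the statistic behind the 55% figure; the Pearson coefficient used here is an assumption, and no values from the study are reproduced.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)
```

A result near +1 means the two series rise and fall together, a result near -1 means they move in opposite directions, and a result near 0 means no linear relationship.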

PAGE 224

APPENDIX C
THE ONE SIMULATOR README

The ONE is an Opportunistic Network Environment simulator which provides a powerful tool for generating mobility traces, running DTN messaging simulations with different routing protocols, and visualizing both simulations interactively in real-time and results after their completion. We have used this simulator to test several DTN protocols, including epidemic routing, spray and wait, PRoPHET and Bubble Rap, to validate the accuracy of existing and proposed mobility models against real wireless measurements. In this appendix, we explain the use of the ONE simulator with a few examples and configuration parameters.

C.1 Running
ONE can be started using the included one.bat (for Windows) or one.sh (for Linux/Unix) script. The following examples assume you're using the Linux/Unix script (just replace "./one.sh" with "one.bat" for Windows).

Synopsis:
    ./one.sh [-b runcount] [conf-files]

Options:
    -b  Run simulation in batch mode. Doesn't start the GUI but prints information about the progress to the terminal. The option must be followed by the number of runs to perform in the batch mode or by a range of runs to perform, delimited with a colon (e.g., value 2:4 would perform runs 2, 3 and 4). See section "Run indexing" for more information.

Parameters:
    conf-files: The configuration file names where simulation parameters are read from. Any number of configuration files can be defined and they are read in the order given in the command line. Values in the later config files override values in earlier config files.
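The override order described above (values in later configuration files win over values in earlier ones) can be illustrated with a short Python sketch. It is not part of the ONE; `merge_settings` is a hypothetical helper, and each configuration file is assumed to have been parsed into a dictionary of key-value strings already.

```python
def merge_settings(layers):
    """Merge parsed configuration files in command-line order.

    `layers` is a list of dicts, one per configuration file, in the order
    the files were given. Later files override earlier ones.
    """
    merged = {}
    for layer in layers:
        merged.update(layer)  # later value replaces any earlier value
    return merged


# Hypothetical example: a common file followed by a scenario-specific file.
common = {"Scenario.endTime": "43k", "Group.router": "EpidemicRouter"}
specific = {"Group.router": "SprayAndWaitRouter"}
effective = merge_settings([common, specific])
```

Settings that only appear in the earlier file are kept, so common values need to be written only once.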

PAGE 225

C.2 Configuring
All simulation parameters are given using configuration files. These files are normal text files that contain key-value pairs. The syntax for most of the variables is:
    Namespace.key = value
I.e., the key is (usually) prefixed by a namespace, followed by a dot, and then the key name. Key and value are separated by an equals sign. Namespaces start with a capital letter and both namespaces and keys are written in CamelCase (and are case sensitive). The namespace defines (loosely) the part of the simulation environment where the setting has an effect. Many, but not all, namespaces are equal to the class name where they are read. Especially movement models, report modules and routing modules follow this convention.

Numeric values use '.' as the decimal separator and can be suffixed with a kilo (k), mega (M) or giga (G) suffix. Boolean settings accept "true", "false", "0", and "1" as values.

Many settings define paths to external data files. The paths can be relative or absolute but the directory separator must be '/' in both Unix and Windows environments.

Some variables contain comma-separated values, and for them the syntax is:
    Namespace.key = value1, value2, value3, etc.
For run-indexed values the syntax is:
    Namespace.key = [run1value; run2value; run3value; etc]
I.e., all values are given in brackets and the values for different runs are separated by semicolons. Each value can also be a comma-separated value. For more information about run indexing, go to section "Run indexing".

Setting files can contain comments too. A comment line must start with the "#" character. The rest of the line is skipped when the settings are read. This can also be useful for disabling settings easily.
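The syntax rules above can be sketched as a small Python reader. This is an illustration only, not the simulator's own parser: `parse_settings` and `parse_number` are hypothetical names, and the suffixes are assumed to be decimal multiples (k = 1000, M = 10^6, G = 10^9).

```python
# Assumption: suffixes are decimal multiples.
SUFFIXES = {"k": 1e3, "M": 1e6, "G": 1e9}


def parse_number(text):
    """Convert '2.5', '43k' or '0.5M' to a float ('.' is the decimal separator)."""
    text = text.strip()
    if text and text[-1] in SUFFIXES:
        return float(text[:-1]) * SUFFIXES[text[-1]]
    return float(text)


def parse_settings(text):
    """Read 'Namespace.key = value' lines; lines starting with '#' are comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment (also a disabled setting)
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key-value pair
        settings[key.strip()] = value.strip()
    return settings


example = """
# Scenario.endTime = 10k   (disabled by the comment character)
Scenario.endTime = 43k
Group.speed = 0.5, 1.5
"""
parsed = parse_settings(example)
end_time = parse_number(parsed["Scenario.endTime"])
speeds = [parse_number(v) for v in parsed["Group.speed"].split(",")]
```

Values are kept as strings until a caller knows whether a setting is numeric, a list, or a path, which mirrors the way the text describes each setting as having its own interpretation.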

PAGE 226

Some values (scenario and report names at the moment) support "value filling". With this feature, you can construct e.g., the scenario name dynamically from the setting values. This is especially useful when using run indexing. Just put setting key names in the value part, prefixed and suffixed by two percent (%) signs. These placeholders are replaced by the current setting value from the configuration file. See the included snw_comparison_settings.txt for an example.

The file "default_settings.txt", if it exists, is always read, and the other configuration files given as parameters can define more settings or override some (or even all) settings in the previous files. The idea is that you can define in the earlier files all the settings that are common for all the simulations and run different, specific, simulations using different configuration files.

C.3 Run indexing
Run indexing is a feature that allows you to run a large number of different configurations using only a single configuration file. The idea is that you provide an array of settings (using the syntax described above) for the variables that should be changed between runs. For example, if you want to run the simulation using five different random number generator seeds for movement models, you can define in the settings file the following:
    MovementModel.rngSeed = [1; 2; 3; 4; 5]
Now, if you run the simulation using the command:
    ./one.sh -b 5 my_config.txt
you would run first using seed 1 (run index 0), then another run using seed 2, etc. Note that you have to run it using batch mode (-b option) if you want to use different values. Without the batch mode flag, the first parameter (if numeric) is the run index to use when running in GUI mode.
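The run-indexed array syntax can be sketched in Python as follows. The helper names `parse_run_array` and `value_for_run` are hypothetical, and the wrap-around lookup (run index modulo array length) follows the README's description of run indexes; this is a sketch, not the simulator's code.

```python
def parse_run_array(value):
    """Split '[a; b; c]' into ['a', 'b', 'c']; a plain value gives a one-element list."""
    value = value.strip()
    if value.startswith("[") and value.endswith("]"):
        return [item.strip() for item in value[1:-1].split(";")]
    return [value]


def value_for_run(value, run_index):
    """Pick the value used in a given run; indexes wrap around the array."""
    options = parse_run_array(value)
    return options[run_index % len(options)]


# Five seeds, five runs: run index 0 uses seed 1, run index 1 uses seed 2, ...
seeds = [value_for_run("[1; 2; 3; 4; 5]", run) for run in range(5)]

# Two arrays of sizes 2 and 3 over six runs give every combination once.
combos = [(value_for_run("[1; 2]", run), value_for_run("[a; b; c]", run))
          for run in range(6)]
```

Because 2 and 3 share no common factor, the six runs in `combos` cover all six pairings; with array sizes 2 and 4, only four distinct pairings would ever appear.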

PAGE 227

Run indexes wrap around: the used value is the value at index (runIndex % arrayLength). Because of wrapping, you can easily run a large number of permutations. For example, if you define two key-value pairs:
    key1 = [1; 2]
    key2 = [a; b; c]
and run the simulation using run-index count 6, you would get all permutations of the two values (1,a; 2,b; 1,c; 2,a; 1,b; 2,c). This naturally works with any number of arrays. Just make sure that the array sizes are pairwise coprime, i.e., no two sizes share a common divisor greater than 1 (e.g., use arrays whose sizes are distinct primes), unless you don't want all permutations but some values should be paired.

C.4 Movement models
Movement models govern the way nodes move in the simulation. They provide coordinates, speeds and pause times for the nodes. The basic installation contains 5 movement models: random waypoint, map based movement, shortest path map based movement, map route movement and external movement. All models, except external movement, have configurable speed and pause time distributions. Minimum and maximum values can be given, and the movement model draws uniformly distributed random speeds within the given range. The same applies to pause times. In the external movement model the speeds and pause times are interpreted from the given data.

When a node uses the random waypoint movement model (RandomWaypoint), it is given a random coordinate in the simulation area. The node moves directly to the given destination at constant speed, pauses for a while, and then gets a new destination. This continues throughout the simulation and nodes move along these zig-zag paths.

Map-based movement models constrain the node movement to predefined paths. Different types of paths can be defined and one can define valid paths for all node

PAGE 228

groups. This way e.g., cars can be prevented from driving indoors or on pedestrian paths.

The basic map-based movement model (MapBasedMovement) initially distributes the nodes between any two adjacent (i.e., connected by a path) map nodes and then nodes start moving from one adjacent map node to another. When a node reaches the next map node, it randomly selects the next adjacent map node but chooses the map node where it came from only if that is the only option (i.e., avoids going back to where it came from). Once a node has moved through 10-100 map nodes, it pauses for a while and then starts moving again.

The more sophisticated version of the map-based movement model (ShortestPathMapBasedMovement) uses Dijkstra's shortest path algorithm to find its way through the map area. Once a node reaches its destination, and has waited for the pause time, a new random map node is chosen and the node moves there using the shortest path that can be taken using only valid map nodes.

For the shortest path based movement models, map data can also contain Points Of Interest (POIs). Instead of selecting any random map node for the next destination, the movement model can be configured to give a POI belonging to a certain POI group with a configurable probability. There can be an unlimited number of POI groups and all groups can contain any number of POIs. All node groups can have different probabilities for all POI groups. POIs can be used to model e.g., shops, restaurants and tourist attractions.

The route based movement model (MapRouteMovement) can be used to model nodes that follow certain routes, e.g., bus or tram lines. Only the stops on the route have to be defined, and then the nodes using that route move from stop to stop using shortest paths and stop on the stops for the configured time.

All movement models can also decide when the node is active (moves and can be connected to) and when not. For all models, except for the external movement, multiple

PAGE 229

simulation time intervals can be given and the nodes in that group will be active only during those times.

All map-based models get their input data using files formatted with a subset of the Well Known Text (WKT) format. The LINESTRING and MULTILINESTRING directives of WKT files are supported by the parser for map path data. For point data (e.g., for POIs), the POINT directive is also supported. Adjacent nodes in a (MULTI)LINESTRING are considered to form a path, and if some lines contain some vertex(es) with exactly the same coordinates, the paths are joined from those places (this is how you create intersections). WKT files can be edited and generated from real world map data using any suitable Geographic Information System (GIS) program. The map data included in the simulator distribution was converted and edited using the free, Java based OpenJUMP GIS program.

Different map types are defined by storing the paths belonging to different types in different files. Points Of Interest are simply defined with the WKT POINT directive, and POI groups are defined by storing all POIs belonging to a certain group in the same file. All POIs must also be part of the map data so they are accessible using the paths. Stops for the routes are defined with LINESTRING and the stops are traversed in the same order they appear in the LINESTRING. One WKT file can contain multiple routes and they are given to nodes in the same order as they appear in the file.

The experimental movement model that uses external movement data (ExternalMovement) reads timestamped node locations from a file and moves the nodes in the simulation accordingly. See the javadocs of the ExternalMovementReader class from the input package for details of the format. A suitable, experimental converter script (transimsParser.pl) for TRANSIMS data is included in the toolkit folder.

The movement model to use is defined per node group with the "movementModel" setting. The value of the setting must be a valid movement model class name from the movement package. Settings that are common for all movement models are read in the

PAGE 230

MovementModel class, and movement model specific settings are read in the respective classes. See the javadoc documentation and example configuration files for details.

C.5 Routing modules and message creation
Routing modules define how the messages are handled in the simulation. Six different active routing modules (First Contact, Epidemic, Spray and Wait, Direct delivery, PRoPHET and MaxProp) and also a passive router for external routing simulation are included in the package. The active routing modules are implementations of the well known routing algorithms for DTN routing. See the classes in the routing package for details.

The passive router is made especially for interacting with other (DTN) routing simulators or running simulations that don't need any routing functionality. The router doesn't do anything unless commanded by external events. These external events are provided to the simulator by a class that implements the EventQueue interface.

The current release includes two classes that can be used as a source of message events: ExternalEventsQueue and MessageEventGenerator. The former can read events from a file that can be created by hand, with a suitable script (e.g., the createCreates.pl script in the toolkit folder), or by converting e.g., dtnsim2's output to a suitable form. See the StandardEventsReader class from the input package for details of the format. MessageEventGenerator is a simple message generator class that creates uniformly distributed message creation patterns with configurable message creation interval, message size and source/destination host ranges.

The toolkit folder contains an experimental parser script (dtnsim2parser.pl) for dtnsim2's output (there used to be a more capable Java-based parser but it was discarded in favor of this more easily extendable script). The script requires a few patches to dtnsim2's code and those can be found in the toolkit/dtnsim2patches folder.
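The idea of a uniformly distributed message creation pattern can be sketched in Python as below. This is not the MessageEventGenerator implementation: `generate_messages` is a hypothetical function, the tuple layout is chosen for illustration only, and treating the host range as minimum-inclusive and maximum-exclusive is an assumption.

```python
import random


def generate_messages(end_time, interval, size, hosts, prefix="M", seed=1):
    """Return (time, message id, source, destination, size) tuples.

    interval: (min, max) seconds between consecutive creations
    size:     (min, max) message size in bytes
    hosts:    (min, max) host address range; max is exclusive (assumption)
    """
    lo_host, hi_host = hosts
    if hi_host - lo_host < 2:
        raise ValueError("need at least two host addresses")
    rng = random.Random(seed)
    events, now = [], 0.0
    while True:
        now += rng.uniform(interval[0], interval[1])  # uniform creation gap
        if now > end_time:
            return events
        src = rng.randrange(lo_host, hi_host)
        dst = rng.randrange(lo_host, hi_host)
        while dst == src:  # a message needs a different destination
            dst = rng.randrange(lo_host, hi_host)
        nbytes = rng.randint(size[0], size[1])
        events.append((now, f"{prefix}{len(events) + 1}", src, dst, nbytes))


# One simulated hour, one message every 25 to 35 seconds, 75 hosts.
events = generate_messages(3600, (25, 35), (1000, 2000), (0, 75))
```

Fixing the seed makes the pattern repeatable, so the same message workload can be replayed under different routers or mobility traces.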

PAGE 231

The routing module to use is defined per node group with the setting "router". Not all routers can interact properly (e.g., the PRoPHET router can only work with other PRoPHET routers), so usually it makes sense to use the same (or a compatible) router for all groups.

C.6 Reports
Reports can be used to create summary data of simulation runs, detailed data of connections and messages, files suitable for post-processing using e.g., Graphviz (to create graphs), and also to interface with other programs. See the javadocs of the report-package classes for details.

There can be any number of reports for any simulation run, and the number of reports to load is defined with the "Report.nrofReports" setting. Report class names are defined with the "Report.reportN" setting, where N is an integer value starting from 1. The values of the settings must be valid report class names from the report package. The output directory of all reports (which can be overridden per report class with the "output" setting) must be defined with the Report.reportDir setting. If no "output" setting is given for a report class, the resulting report file name is "ReportClassName_ScenarioName.txt".

All reports have many configurable settings which can be defined using the ReportClassName.settingKey syntax. See the javadocs of the Report class and specific report classes for details (look for "setting id" definitions).

C.7 Host groups
A host group is a group of hosts (nodes) that share movement and routing module settings. Different groups can have different values for the settings, and this way they can represent different types of nodes. Base settings can be defined in the "Group" namespace, and different node groups can override these settings or define new settings in their specific namespaces (Group1, Group2, etc.).
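The lookup rule for host groups (a group-specific namespace first, then the shared "Group" namespace) can be sketched as follows. `group_setting` is a hypothetical helper working on a flat dictionary of settings; it only illustrates the fallback order described above.

```python
def group_setting(settings, group_index, key):
    """Look up a host-group setting, falling back to the base Group namespace."""
    specific = f"Group{group_index}.{key}"
    if specific in settings:
        return settings[specific]       # overridden for this group
    return settings[f"Group.{key}"]     # shared base value


# Hypothetical settings: two groups share a router but differ in host count.
settings = {
    "Group.router": "EpidemicRouter",
    "Group.nrofHosts": "40",
    "Group2.nrofHosts": "10",
}
hosts_group1 = group_setting(settings, 1, "nrofHosts")
hosts_group2 = group_setting(settings, 2, "nrofHosts")
router_group2 = group_setting(settings, 2, "router")
```

A key that is defined in neither namespace raises `KeyError` here, which makes a missing mandatory setting visible immediately.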

PAGE 232

C.8 Settings
There are plenty of settings to configure; more than is meaningful to present here. See the javadocs of especially the report, routing and movement model classes for details. See also the included settings files for examples. Perhaps the most important settings are the following.

C.8.1 Scenario Settings
Scenario.name: Name of the scenario. All report files are by default prefixed with this.
Scenario.simulateConnections: Should connections be simulated. If you're only interested in movement modeling, you can disable this to get a faster simulation. Usually you want this to be on.
Scenario.updateInterval: How many seconds are stepped on every update. Increase this to get a faster simulation, but then you'll lose some precision. Values from 0.1 to 2 are good for simulations.
Scenario.endTime: How many simulated seconds to simulate.
Scenario.nrofHostGroups: How many host groups are present in the simulation.

C.8.2 Interface Settings
type: What class (from the interfaces directory) is used for this interface. The remaining settings are class-specific. They can be, for example:
transmitRange: Range (meters) of the interface.
transmitSpeed: Transmit speed of the interface (bytes per second).

C.8.3 Host Group Settings
groupID: Group's identifier (a string or a character). Used as the prefix of host names that are shown in the GUI and reports. A host's full name is groupID + networkAddress.
nrofHosts: Number of hosts in this group.
nrofInterfaces: Number of interfaces the nodes of this group use.
interfaceX: The interface that should be used as the interface number X.

PAGE 233

movementModel: The movement model all hosts in the group use. Must be a valid class (one that is a subclass of the MovementModel class) name from the movement package.
waitTime: Minimum and maximum (two comma-separated decimal values) of the wait time interval (seconds). Defines how long nodes should stay in the same place after reaching the destination of the current path. A new random value within the interval is used on every stop. Default value is 0,0.
speed: Minimum and maximum (two comma-separated decimal values) of the speed interval (m/s). Defines how fast nodes move. A new random value is used on every new path. Default value is 1,1.
bufferSize: Size of the nodes' message buffer (bytes). When the buffer is full, a node can't accept any more messages unless it drops some old messages from the buffer.
router: Router module which is used to route messages. Must be a valid class (subclass of the MessageRouter class) name from the routing package.
activeTimes: Time intervals (comma-separated simulated time value tuples: start1, end1, start2, end2, ...) when the nodes in the group should be active. If no intervals are defined, nodes are active all the time.
msgTtl: Time To Live (simulated minutes) of the messages created by this host group. Nodes (with an active routing module) check once every minute whether some of their messages' TTLs have expired and drop such messages. If no TTL is defined, infinite TTL is used.

C.8.4 Report Settings
Report.nrofReports: How many report modules to load. Module names are defined with the settings "Report.report1", "Report.report2", etc. The following report settings can be defined for all reports (using the Report namespace) or just for certain reports (using ReportN namespaces).

PAGE 234

Report.reportDir: Where to store the report output files. Can be an absolute path or relative to the path where the simulation was started. If the directory doesn't exist, it is created.
Report.warmup: Length of the warm up period (simulated seconds from the start). During the warm up the report modules should discard the new events. The behavior is report module specific, so check the (java) documentation of the different report modules for details.

C.8.5 Event Generator Settings
Events.nrof: How many event generators are loaded for the simulation. Event generator specific settings (see below) are defined in EventsN namespaces (so Events1.settingName configures a setting for the 1st event generator, etc.).
EventsN.class: Name of the generator class to load (e.g., ExternalEventsQueue or MessageEventGenerator). The class must be found in the input package.
For the ExternalEventsQueue you must at least define the path to the external events file (using the setting "filePath"). See the input.StandardEventsReader class' javadocs for information about different external events.

C.9 Example Configuration
    Events.nrof = 1
    Events1.class = DTN2Events
    Report.nrofReports = 1
    Report.report1 = DTN2Reporter
    DTN2.configFile = cla.conf

C.10 Toolkit
The simulation package includes a folder called "toolkit" that contains scripts for generating input and processing the output of the simulator. All (currently included) scripts are written in Perl (http://www.perl.com/), so you need to have it installed before

PAGE 235

running the scripts. Some post processing scripts use gnuplot (http://www.gnuplot.info/) for creating graphics. Both of the programs are freely available for most of the Unix/Linux and Windows environments. For the Windows environment, you may need to change the path to the executables for some of the scripts.

getStats.pl: This script can be used to create bar plots of various statistics gathered by the MessageStatsReport report module. The only mandatory option is "-stat", which is used to define the name of the statistics value that should be parsed from the report files (e.g., "delivery_prob" for message delivery probabilities). The rest of the parameters should be MessageStatsReport output file names (or paths). The script creates three output files: one with values from all the files, one with the gnuplot commands used to create the graphics, and finally an image file containing the graphics. One bar is created in the plot for each input file. The title for each bar is parsed from the report file name using the regular expression defined with the "-label" option. Run getStats.pl with the "-help" option for more help.

ccdfPlotter.pl: Script for creating Complementary (/Inverse) Cumulative Distribution Function plots (using gnuplot) from reports that contain time-hitcount tuples. The output file name must be defined with the "-out" option and the rest of the parameters should be (suitable) report file names. The "-label" option can be used for defining a label-extracting regular expression (similar to the one for the getStats script) for the legend.

createCreates.pl: The message creation pattern for the simulation can be defined with an external events file. Such a file can be simply created with any text editor, but this script makes it easier to create a large number of messages. Mandatory options are the number of messages ("-nrof"), time range ("-time"), host address range ("-hosts") and message size range ("-sizes"). The number of messages is simply an integer, but the ranges are given with two integers with a colon (:) between them. If hosts should reply to the messages that they receive, the size range of the reply messages can be defined with the "-rsizes" option. If a certain random number generator seed should be used, that can be

PAGE 236

defined with the "-seed" option. All random values are drawn from a uniform distribution with inclusive minimum value and exclusive maximum value. The script outputs commands that are suitable for an external events file's contents. You probably want to redirect the output to some file.

dtnsim2parser.pl and transimsParser.pl: These two (quite experimental) parsers convert data from other programs to a form that is suitable for ONE. Both take two parameters: input and output file. If these parameters are omitted, stdin and stdout are used for input and output. With the "-h" option a short help is printed. dtnsim2parser converts dtnsim2's (http://watwire.uwaterloo.ca/DTN/sim/) output (with verbose mode 8) to an external events file that can be fed to ONE. The main idea of this parser is that you can first create a connectivity pattern file using ONE and ConnectivityDtnsim2Report, feed that to dtnsim2 and then observe the results visually in ONE (using the output converted with dtnsim2parser as the external events file). transimsParser can convert TRANSIMS' (http://transims-opensource.net/) vehicle snapshot files to external movement files that can be used as an input for node movement. See the ExternalMovement and ExternalMovementReader classes for more information.

C.11 Example of USC Configuration
    Scenario.nrofHostGroups = 1
    Scenario.name = USC-Real-%%Group.router%%
    Group.router = SprayAndWaitRouter
    SprayAndWaitRouter.nrofCopies = 10
    SprayAndWaitRouter.binaryMode = true
    Scenario.endTime = 647710
    Scenario.simulateConnections = false
    Group.movementModel = StationaryMovement
    Group.nrofHosts = 75

PAGE 237

    Group.nodeLocation = 10,10
    # How many event generators
    Events.nrof = 2
    # External traces
    Events1.class = StandardEventsReader
    Events1.filePath = simTraceList_usc.txt
    #Events2.class = StandardEventsReader
    #Events2.filePath = msg_list.txt
    ## Message creation parameters
    Events2.class = MessageEventGenerator
    # (following settings are specific for the MessageEventGenerator class)
    # Creation interval in seconds (one new message every 25 to 35 seconds)
    Events2.interval = 25,35
    # Message sizes (1 kB - 2 kB)
    Events2.size = 1k,2k
    # range of message source/destination addresses
    Events2.hosts = 0,74
    # Message ID prefix
    Events2.prefix = M
    Report.nrofReports = 25
    Report.warmup = 0
    Report.granularity = 10
    #MessageLocationReport.messages = 1,30
    Report.reportDir = reports/USC/Real/SprayAndWaitRouter/ver1/

PAGE 238

    #reports/USC-Real/SprayAndWaitRouter/
    Report.report1 = AdjacencyGraphvizReport
    Report.report2 = ConnectivityDtnsim2Report
    Report.report3 = ConnectivityONEReport
    Report.report4 = ContactsDuringAnICTReport
    Report.report5 = ContactsPerHourReport
    Report.report6 = ContactTimesReport
    Report.report7 = CreatedMessagesReport
    Report.report8 = DeliveredMessagesReport
    Report.report9 = DistanceDelayReport
    Report.report10 = DTN2Reporter
    Report.report11 = EncountersVSUniqueEncountersReport
    Report.report12 = UniqueEncountersReport
    Report.report13 = EventLogReport
    Report.report14 = InterContactTimesReport
    Report.report15 = MessageDelayReport
    Report.report16 = MessageDeliveryReport
    Report.report17 = MessageGraphvizReport
    Report.report18 = TotalEncountersReport
    Report.report19 = MessageReport
    Report.report20 = MessageStatsReport
    Report.report21 = MovementNs2Report
    Report.report22 = PingAppReporter
    Report.report23 = TotalContactTimeReport
    Report.report24 = EnergyLevelReport
    Report.report25 = MessageLocationReport
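The scenario name in the configuration above uses value filling (a setting key wrapped in two percent signs on each side). A minimal Python sketch of that substitution is given below; `fill_values` is a hypothetical helper and not the simulator's own code.

```python
import re


def fill_values(template, settings):
    """Replace every %%Setting.key%% placeholder by that setting's value."""
    return re.sub(r"%%([^%]+)%%",
                  lambda match: settings[match.group(1)],
                  template)


settings = {"Group.router": "SprayAndWaitRouter"}
scenario_name = fill_values("USC-Real-%%Group.router%%", settings)
```

Combined with run indexing, a single template such as this one gives every run its own scenario name, and therefore its own report file names.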

PAGE 239

APPENDIX D
SMOOTH MOBILITY MODEL

In addition to being realistic, a mobility model should be easy to understand and use. Unfortunately, most of the simple mobility models proposed thus far are not realistic, and most of the realistic mobility models proposed thus far are not simple to use. The main contribution of this work is to present SMOOTH, a new mobility model that is realistic (e.g., SMOOTH is based on several known features of human movement) and is simple to use (e.g., SMOOTH does not have any complex input parameters). SMOOTH provides researchers with a tool that allows them to leverage the statistical features present in real human movement in a simple and easy to understand manner.

D.1 Usage Instructions
Extract the smooth folder and compile using:
    $ make clean
    $ make
The smooth-bin file is an executable that can be used in the following manner:
    $ ./smooth-bin 102000020000100117201.451140001.51040200
number of nodes - say we have 300.
simulation area - 20000 20000
clusters - they are defined as the locations. In its own case, locations form communities, and some of them are visited more often than others. Thus, the notion of communities is derived from the visiting preferences to locations.
duration - usually taken in hours, so for 30 days it is 720.
alpha, fmin and fmax show the flight length distribution; we took, say, 1.45114000
beta, pmin and pmax show the pause time distribution; we took, say, 1.510040200

D.2 Settings
How long does a node stay at a location?

PAGE 240

Whenever a node arrives at a location L1 (see CASE 3 in location_map.c), it will pause here for some time (denoted by p_time); thus, the time for which the node will be seen at this location is given by p_time. So, you can say the node arrived at L1 at the current time (denoted by t) and left L1 (i.e., started moving to another location L2) at time (t + p_time). The time it will take for the node to get from L1 to L2 is denoted by the flight time (flight_time) in location_map.c. When a mobile node is travelling from one location to another, you can track its current location via the variable crt_xy.

How to generate encounter statistics (contact duration, inter-contact time and number of encounters) from the mobility traces generated by SMOOTH?

For ICTs, uncomment the following piece of code in location_map.c:
    printf("ict is: %f\n", ict1);
This will print the ICT for every pair of users, whenever a user reaches its location and scans the neighbors to see if it has gone from a disconnected to a connected state (stored in the status array).

For CDs, uncomment the following piece of code in location_map.c:
    printf("duration is: %f\n", ct[try1][try2]);
This will print the CD for every pair of users, whenever a user reaches its location and scans the neighboring users to see if it has gone from a connected to a disconnected state (represented by the status array).

For CNs, uncomment the following piece of code in smooth.c:
    if (cn[try1][try2] > 0) {
        printf("contact number is: %f\n", cn[try1][try2]);
    }
It prints the number of encounters made by every pair of nodes at the end of the simulation.
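The three statistics above all follow from the transitions of a pair's connection status. The Python sketch below illustrates that bookkeeping for a single pair of nodes; it is not taken from the SMOOTH sources, and `encounter_stats` and its input format (time-ordered status samples) are assumptions made for the example.

```python
def encounter_stats(samples):
    """Compute contact durations, inter-contact times and the contact count.

    `samples` is a time-ordered list of (time, connected) observations for
    one pair of nodes, where `connected` is True or False.
    """
    durations, inter_contacts = [], []
    encounters = 0
    contact_start = None   # time of the last disconnected -> connected change
    last_end = None        # time of the last connected -> disconnected change
    previous = False
    for time, connected in samples:
        if connected and not previous:
            encounters += 1
            if last_end is not None:
                inter_contacts.append(time - last_end)  # ICT
            contact_start = time
        elif previous and not connected:
            durations.append(time - contact_start)      # CD
            last_end = time
        previous = connected
    return {"cd": durations, "ict": inter_contacts, "cn": encounters}


# Toy status sequence for one pair (times in seconds).
stats = encounter_stats([(0, False), (10, True), (25, False),
                         (60, True), (70, False)])
```

A contact that is still open when the trace ends is counted as an encounter but contributes no duration, since its end time is unknown.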

PAGE 241

The movement traces generated by SMOOTH represent the (x, y) coordinates visited by the mobile node at the current timestamp. You can change this to a contact trace (if you want); you can also modify the script to print only the locations visited by the mobile node.

E.g., the following piece of code (in smooth.c) prints the current location of the node (i.e., at the timestamp t):
    printf("%d %d %f %f\n", t, i, crt_xy[i][0], crt_xy[i][1]);
You can print an ns2 compatible trace file by printing the following values in smooth.c (the format string shown follows the usual ns-2 setdest line and may need adapting to the variable types):
    printf("$ns_ at %d \"$node_(%d) setdest %f %f %f\"\n", (t - t_gap), i, crt_xy[i][0], crt_xy[i][1], v_velocity);
Based on your current needs, you would only need to make small changes in either smooth.c or location_map.c.

PAGE 242

APPENDIX E
COBRA SCRIPTS

    #!/bin/bash
    # This program is a post-processing step once we have
    # generated traces from the COBRA model. It uses a Java
    # program, "tvcprofilecast", for processing and generating
    # data. Some key goals: generate similarity scores,
    # ONE files, Google Earth files, encounter details,
    # GraphML or Gephi files, a MATLAB file for hierarchical
    # clustering, configuration for SMOOTH, and other behavioral
    # statistics such as time, frequency, etc.

    # COPY REAL SAMPLED TRACES FROM STAGE_2 TO SESSION_TRACES
    ./copy_sample.sh
    cd java_program/tvcprofilecast/
    # RUN TVC JAVA PROGRAM LEVEL 0
    ./run0
    cd ../../
    # RUN SORTING SHELL SCRIPT
    ./SORT_AFTER_CONVERSION.sh
    cd java_program/tvcprofilecast/
    # GENERATE ASSOCIATION MATRIX, SIMILARITY AMONG NODES
    ./run1
    # ANALYSIS OF SIMILARITY, DISTRIBUTION OF SIMILARITY VALUES
    ./run2
    # CALCULATE FREQUENCY AND DURATION SPENT BY NODES AT THE LOCATIONS
    ./run3
    # GENERATE ENCOUNTER DETAILS, MAYBE A DATE IS SPECIFIED
    ./run4

PAGE 243

    # GETTING THE NUMBERS FOR SMOOTH TO RUN, CHECK THE WIKI
    # FOR LATEST VALUES AND HOW TO RUN
    ./run5
    # GENERATE .GEXF, .GRAPHML FILE FOR STRUCTURAL ANALYSIS
    ./run6
    # GENERATE MATLAB FORMATTED FILE FOR HIERARCHICAL CLUSTERING
    ./run7
    cd ../..

PAGE 244

APPENDIX F
SUMMARY OF THEORETICAL DISTRIBUTION

Table F-1. Definition of theoretical distributions.
Distribution    Mean $E(x)$    Probability density $p(x)$    Cumulative probability $f(x)$
Exponential    $1/\lambda$    $\lambda e^{-\lambda x}$    $1 - e^{-\lambda x}$
Gamma    $k\theta$    $\frac{1}{\Gamma(k)\theta^{k}} x^{k-1} e^{-x/\theta}$    $\frac{1}{\Gamma(k)} \gamma(k, x/\theta)$
Log-logistic    $\frac{\alpha\pi/\beta}{\sin(\pi/\beta)}$    $\frac{(\beta/\alpha)(x/\alpha)^{\beta-1}}{[1 + (x/\alpha)^{\beta}]^{2}}$    $\frac{1}{1 + (x/\alpha)^{-\beta}}$
Normal    $\mu$    $\frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}}$    $\frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{x-\mu}{\sqrt{2\sigma^{2}}}\right)\right]$
Weibull    $\lambda\Gamma(1 + 1/k)$    $\frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1} e^{-(x/\lambda)^{k}}$    $1 - e^{-(x/\lambda)^{k}}$

F.1 Hurst Estimators
We have used seven estimators to characterize the value of the Hurst exponent. For a distribution to have self-similar characteristics, the value of the Hurst exponent must satisfy $0.5 < H < 1$. Given a time series $X_i$ of size $N$, the corresponding aggregated series is
$$X^{(m)}(k) = \frac{1}{m} \sum_{i=(k-1)m+1}^{km} X_i, \qquad k = 1, 2, 3, \ldots, [N/m] \qquad \text{(F-1)}$$

F.1.1 Absolute Value Method
For the aggregated series (F-1), the absolute value method calculates the sum of the absolute values of the aggregated series,
$$AM^{(m)} = \frac{1}{N/m} \sum_{k=1}^{N/m} \left| X^{(m)}(k) - \bar{X} \right|,$$
where $\bar{X}$ is the mean of the complete series.

F.1.2 Aggregated Variance Method
For the aggregated series (F-1), the aggregated variance method calculates its sample variance
$$\widehat{\mathrm{Var}}\, X^{(m)} = \frac{1}{N/m} \sum_{k=1}^{N/m} \left( X^{(m)}(k) - \bar{X} \right)^{2}$$
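The aggregated series and the two statistics above can be sketched in plain Python as follows. The final step, fitting a least squares line to log variance against log m and reading H from the slope via H = 1 + slope/2, is the standard way the aggregated variance method is applied and is an assumption added here; the function names and the block sizes are illustrative only.

```python
import math
import random


def aggregate(series, m):
    """Aggregated series X^(m)(k): means of consecutive blocks of size m."""
    blocks = len(series) // m
    return [sum(series[k * m:(k + 1) * m]) / m for k in range(blocks)]


def absolute_moment(series, m):
    """Absolute value statistic AM^(m) around the mean of the whole series."""
    mean = sum(series) / len(series)
    agg = aggregate(series, m)
    return sum(abs(v - mean) for v in agg) / len(agg)


def aggregated_variance(series, m):
    """Sample variance of the aggregated series around the overall mean."""
    mean = sum(series) / len(series)
    agg = aggregate(series, m)
    return sum((v - mean) ** 2 for v in agg) / len(agg)


def hurst_from_variance(series, block_sizes):
    """Least squares slope of log variance vs. log m; H = 1 + slope / 2 (assumption)."""
    xs = [math.log(m) for m in block_sizes]
    ys = [math.log(aggregated_variance(series, m)) for m in block_sizes]
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return 1.0 + slope / 2.0


# Independent Gaussian noise has no long-range dependence, so H should be near 0.5.
rng = random.Random(42)
noise = [rng.gauss(0.0, 1.0) for _ in range(20000)]
h_noise = hurst_from_variance(noise, [10, 20, 50, 100, 200])
```

Estimates from short series or from very large block sizes are noisy, because only a few blocks remain to estimate each variance.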

PAGE 245

F.1.3 R/S Method
The $R/S$ method estimates the Hurst parameter by fitting a least squares line to the statistic computed at different levels of $X_m$. For the discrete time series (F-1), with partial sum $Y(t) = \sum_{i=1}^{t} X(i)$ and variance $S^{2}(n)$, the $R/S$ ratio is
$$\frac{R}{S}(n) = \frac{1}{S(n)} \left[ \max_{0 \le t \le n} \left( Y(t) - \frac{t}{n} Y(n) \right) - \min_{0 \le t \le n} \left( Y(t) - \frac{t}{n} Y(n) \right) \right]$$

F.1.4 Periodogram Method
The periodogram method estimates the power or intensity spectrum $S(w)$ of a stationary process (F-1) in terms of a single realization of a finite segment of the process. The auto-correlation and spectral density for such a process are defined as $R(k) = E[X(t)X(t+k)]$ and $S(w) = \sum_{k} R(k) e^{-jkw}$. For a process that is ergodic in correlation, the auto-correlation is given by
$$\hat{R}_N(k) = \frac{1}{N} \sum_{n=0}^{N-1} X(n+k) X(n)$$
The spectral density of a process defined at discrete time instances can be estimated by a Fourier series over the full time period $N$ as
$$I_N(w) = \frac{1}{2\pi N} \left| \sum_{k=1}^{N} x_k e^{jkw} \right|^{2}$$

F.1.5 Whittle Estimator
In this estimator, we assume that the form of the density $S(w, H)$ is known but the Hurst parameter is unknown. It can be shown that $H$ can be estimated by finding the value of $H$ that minimizes
$$\sum_{w=2\pi/N}^{2\pi} \frac{I_N(w)}{S(w, H)}$$
This method also produces a sample variance, which is used to estimate the confidence interval,
$$\mathrm{Var}(\hat{H}) = 4\pi \left[ \sum_{w} \left( \frac{\partial \log S(w)}{\partial H} \right)^{2} \right]^{-1}$$
The Whittle estimator is important in the sense that it provides a particular form of a self-similar process and also estimates the Hurst parameter with confidence intervals.

F.1.6 Abry-Veitch Method
The estimator is based on multi-resolution analysis and the discrete wavelet transformation to estimate the value of the Hurst parameter. This method is very robust and accurate in estimating the scaling behavior of a time series over various $X_m$. This frequency domain

PAGE 246

analysis exploits the scaling effect, transforms to the much simpler wavelet domain, and calculates the scaling factor for the Hurst parameter. The scaling behavior is estimated from the plot of
$$\log_2 \left( \frac{1}{n_j} \sum_{k} \left| d_x(j, k) \right|^{2} \right)$$
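For comparison with the wavelet-based estimate, the R/S statistic of Section F.1.3 is simple enough to compute directly. The Python sketch below evaluates it for one series of length n; `rescaled_range` is a hypothetical name, and using the population form (division by n) for the variance is an assumption, since the text does not say which form is meant.

```python
import math


def rescaled_range(x):
    """R/S statistic: range of the detrended partial sums divided by S(n)."""
    n = len(x)
    mean = sum(x) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in x) / n)  # population form (assumption)
    if s == 0:
        raise ValueError("constant series: R/S is undefined")
    partial = [0.0]                      # Y(0) = 0
    for v in x:
        partial.append(partial[-1] + v)  # Y(t) = X(1) + ... + X(t)
    deviations = [partial[t] - (t / n) * partial[n] for t in range(n + 1)]
    return (max(deviations) - min(deviations)) / s


rs_small = rescaled_range([1, 2, 3, 4])
```

To obtain a Hurst estimate, this statistic would be computed for several values of n and a least squares line fitted to log(R/S) against log(n), as described in Section F.1.3.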

PAGE 247

REFERENCES

[1] A. Abraham, A. E. Hassanien, and V. Snášel. Computational social network analysis: Trends, tools and research advances. Springer, 2009.
[2] W F Adams. Road traffic considered as a random series. (includes plates). Journal of the ICE, 4(1):121-130, January 1936.
[3] Robert J. Adler, Raisa E. Feldman, and Murad S. Taqqu, editors. A practical guide to heavy tails: statistical techniques and applications. Birkhauser Boston Inc., Cambridge, MA, USA, 1998.
[4] Sherif Akoush and Ahmed Sameh. Mobile user movement prediction using bayesian learning for neural networks. In Proceedings of the 2007 international conference on Wireless communications and mobile computing, IWCMC '07, pages 191-196, New York, NY, USA, 2007. ACM.
[5] Rodrigo B. Almeida and Virgílio A. F. Almeida. Local community identification through user access patterns. CoRR, cs.IR/0212045, 2002.
[6] M. M. Artimy, W. Robertson, and W. J. Phillips. Connectivity in inter-vehicle ad hoc networks. In Electrical and Computer Engineering, 2004. Canadian Conference on, volume 1, pages 293-298 Vol. 1, May 2004.
[7] Lars Backstrom, Paolo Boldi, Marco Rosa, Johan Ugander, and Sebastiano Vigna. Four degrees of separation. CoRR, abs/1111.4570, 2011.
[8] James P Bagrow. Evaluating local community methods in networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(05):P05001 (16pp), 2008.
[9] James P. Bagrow and Erik M. Bollt. Local method for detecting communities. Physical Review E (Statistical, Nonlinear, and Soft Matter Physics), 72(4):046108, 2005.
[10] F Bai, N Sadagopan, and A Helmy. Important: a framework to systematically analyze the impact of mobility on performance of routing protocols for adhoc networks. Ad Hoc Networks, 1(4):825-835, 2003.
[11] Fan Bai and Ahmed Helmy. Chapter 1 in wireless adhoc networks. Wireless Ad hoc and Sensor Networks, 2(5):130, 2006.
[12] Fan Bai and Bhaskar Krishnamachari. Spatio-temporal variations of vehicle traffic in vanets: facts and implications. In Proceedings of the sixth ACM international workshop on VehiculAr InterNETworking, VANET '09, pages 43-52, New York, NY, USA, 2009. ACM.

PAGE 248

[13] Fan Bai, Narayanan Sadagopan, and Ahmed Helmy. The important framework for analyzing the impact of mobility on performance of routing protocols for adhoc networks. Ad Hoc Networks, 1(4):383-403, 2003.
[14] Fan Bai, Narayanan Sadagopan, and Ahmed Helmy. The important framework for analyzing the impact of mobility on performance of routing protocols for adhoc networks. Ad Hoc Networks, 1(4):383-403, 2003.
[15] M. Bando, K. Hasebe, A. Nakayama, A. Shibata, and Y. Sugiyama. Dynamical model of traffic congestion and numerical simulation. Physical Review, 51:1035-1042, February 1995.
[16] Albert-Laszlo Barabasi. Scale-Free Networks: A Decade and Beyond. Science, 325(5939):412-413, 2009.
[17] Valmir C. Barbosa, Raul Donangelo, and Sergio R. Souza. Emergence of scale-free networks from local connectivity and communication trade-offs. Physical Review E, 74:016113, 2006.
[18] Jonathan Barnett. An introduction to urban design. Harper & Row, New York, 1st edition, 1982.
[19] A. Barrat, M. Barthélemy, R. Pastor-Satorras, and A. Vespignani. The architecture of complex weighted networks. Proceedings of the National Academy of Sciences of the United States of America, 101(11):3747-3752, March 2004.
[20] A. B. Barrett and A. K. Seth. Practical measures of integrated information for time-series data. PLoS Computational Biology, 1(7):e1001052, January 2011.
[21] Adam B. Barrett, Lionel Barnett, and Anil K. Seth. Multivariate Granger causality and generalized variance. Physical Review E, 81(4):041907+, April 2010.
[22] Marc Barthelemy. Betweenness centrality in large complex networks. Eur. Phys. J. B, 38(2):163-168, May 2004.
[23] Mathieu Bastian, Sebastien Heymann, and Mathieu Jacomy. Gephi: An open source software for exploring and manipulating networks. American Journal of Sociology, 2(2):361-362, 2009.
[24] Mathieu Bastian, Sebastien Heymann, and Mathieu Jacomy. Gephi: An open source software for exploring and manipulating networks. AAAI, 2009.
[25] M Batty. Network geography: Relations, interactions, scaling and spatial processes in gis. RGIS, 2003.
[26] Genevieve Bell and Paul Dourish. Yesterday's tomorrows: notes on ubiquitous computing dominant vision. PUC, 2007.

PAGE 249

[27]E.Ben-Naim,P.L.Krapivsky,andS.Redner.Kineticsofc lusteringintrafcows. Phys.Rev.E ,50(2):822829,Aug1994. [28]Y.Benezeth,P.M.Jodoin,B.Emile,H.Laurent,andC.Rosenberger.Review andevaluationofcommonly-implementedbackgroundsubtractionalgorithms.In PatternRecognition,2008.ICPR2008.19thInternationalConferenceon ,pages1 4,dec.2008. [29]StudentsDeepankarBhattacharjee,AshwinRao,ChintanShah,MananShah, andFacultyAhmedHelmy. Empiricalmodelingofcampus-widepedestrian mobility .ACMPress,2004. [30]J.J.Blum,A.Eskandarian,andL.J.Hoffman.Challengesofintervehicleadhoc networks. IntelligentTransportationSystems,IEEETransactionson ,5(4):347 351,dec.2004. [31]UlrikBrandes.Afasteralgorithmforbetweennesscentrality. JournalofMathematicalSociology ,25:163177,2001. [32]L.Briesemeister,L.Schafers,andG.Hommel.Disseminatingmessagesamong highlymobilehostsbasedoninter-vehiclecommunication.In IntelligentVehicles Symposium,2000.IV2000.ProceedingsoftheIEEE ,pages522527,2000. [33]TracyCamp,JeffBoleng,andVanessaDavies.Asurveyofmobilitymodelsfor adhocnetworkresearch. WirelessCommunicationsandMobileComputing 2(5):483502,2002. [34]TracyCamp,JeffBoleng,andVanessaDavies.Asurveyofmobilitymodelsfor adhocnetworkresearch. WirelessCommunicationsandMobileComputing 2(5):483502,2002. [35]E.G.CampariandG.Levi.Self-similarityinhighwaytrafc. TheEuropean PhysicalJournalB-CondensedMatterandComplexSystems ,25:245251, 2002.10.1140/epjb/e20020028. [36]JCandia,MCGonzlez,PWang,TSchoenharl,GMadey,andALBarabsi. Uncoveringindividualandcollectivehumandynamicsfrommobilephonerecords. JournalofPhysicsA:MathematicalandTheoretical ,41(22):16,2007. [37]A.Capocci,V.D.P.Servedio,F.Colaiori,L.S.Buriol,D.Donato,S.Leonardi, andG.Caldarelli.Preferentialattachmentinthegrowthofsocialnetworks:The internetencyclopediawikipedia. Phys.Rev.E ,74(3):036116,Sep2006. [38]AlessioCardillo,SalvatoreScellato,VitoLatora,andSergioPorta.Structural propertiesofplanargraphsofurbanstreetpatterns. PRE ,73,2006. 
[39]C.G.CarlsonandD.E.Clay.Theearthmodelcalculatingeldsizeanddistances betweenpointsusinggpscoordinates. SiteSpecicManagementGuidelines series-11,PotashandPhosphateInstitute(PPI) ,1999. 249

PAGE 250

[40]A.Chaintreau,P.Hui,J.Crowcroft,C.Diot,R.Gass,and J.Scott.Impactof humanmobilityonthedesignofopportunisticforwardingalgorithms.In INFOCOM 2006.25thIEEEInternationalConferenceonComputerCommunications. Proceedings ,pages113,april2006. [41]AugustinChaintreau,PanHui,JonCrowcroft,ChristopheDiot,RichardGass,and JamesScott.Impactofhumanmobilityonopportunisticforwardingalgorithms, 2007. [42]AugustinChaintreau,PanHui,JonCrowcroft,ChristopheDiot,RichardGass,and JamesScott.Impactofhumanmobilityonopportunisticforwardingalgorithms. IEEETransactionsonMobileComputing ,6(6):606620,June2007. [43]RobertE.Chandler,RobertHerman,andElliottW.Montroll.Trafcdynamics: Studiesincarfollowing. OPERATIONSRESEARCH ,6(2):165184,1958. [44]Sen-ChingS.CheungandChandrikaKamath.Robustbackgroundsubtraction withforegroundvalidationforurbantrafcvideo. EURASIPJ.Appl.Signal Process. ,2005:23302340,January2005. [45]S.Cho,S.G.Park,d.o..H.Lee,andB.C.Park.Protein-proteininteraction networks:frominteractionstonetworks. JournalofBiochemicalMolecular Biology ,37(1):4552,January2004. [46]AaronClauset.Findinglocalcommunitystructureinnetworks. PhysicalReviewE 72:026132,2005. [47]AaronClauset,CosmaRohillaShalizi,andM.E.J.Newman.Power-law distributionsinempiricaldata. SIAMReview ,51(4):661703,2009. [48]RichardClegg.Apracticalguidetomeasuringthehurstparameter.In Proceedingsof21stUKPerformanceEngineeringWorkshop,SchoolofComputing Science,TechnicalRepo ,pagespp.6878.N.Thomas,N.Thomas,2005. [49]BenjaminCoifman,DavidBeymer,PhilipMcLauchlan,andJitendraMalik.A real-timecomputervisionsystemforvehicletrackingandtrafcsurveillance. TransportationResearchPartC:EmergingTechnologies ,6(4):271288,1998. [50]PCosta,CMascolo,MMusolesi,andGPPicco.Socially-awareroutingfor publish-subscribeindelay-tolerantmobileadhocnetworks,2008. [51]PaoloCrucitti,VitoLatora,andSergioPorta.Centralitymeasuresinspatial networksofurbanstreets. PRE ,2006. [52]D.J.Dailey,F.W.Cathey,andS.Pumrin.Analgorithmtoestimatemeantrafc speedusinguncalibratedcameras. 
IntelligentTransportationSystems,IEEE Transactionson ,1(2):98107,jun2000. 250

PAGE 251

[53]EMDalyandMHaahr.Socialnetworkanalysisforinformat ionowin disconnecteddelay-tolerantmanets,2009. [54]ElizabethM.DalyandMadsHaahr.Socialnetworkanalysisforroutingin disconnecteddelay-tolerantmanets.In MobiHoc07 ,MobiHoc'07,pages3240, NewYork,NY,USA,2007.ACM. [55]LeonDanon,AlbertDaz-Guilera,JordiDuch,andAlexArenas.Comparing communitystructureidentication. JournalofStatisticalMechanics:Theoryand Experiment ,2005(09):P09008,2005. [56]A.Daou.Onowwithinplatoons.In AustralianJournalofStatics ,pages94116, 1966. [57]N.K.DaveandV.B.Vaghela. VehicularTrafcControl:AUbiquitousComputing Approach .SSRN,2009. [58]JamesAllenFillDavidAldous.Reversiblemarkovchainsandrandomwalkson graphs-chapter9:Asecondlookatgeneralmarkovchains,2002. [59]TimLomaxDavidSchrankandShawnTurner.Ttis2010urbanmobilityreport poweredbyinrixtrafcdata. TexasTransportationInstitute,TheTexasA&M UniversitySystem,December2010. [60]D.J.deSollaPrice.NetworksofScienticPapers. Science ,149:510515,July 1965. [61]J.C.Dunn.Well-separatedclustersandoptimalfuzzypartitions. Journalof Cybernetics ,4(1):95,1974. [62]JenniferA.Dunne,RichardJ.Williams,andNeoD.Martinez.Food-web structureandnetworktheory:Theroleofconnectanceandsize. PNAS 99(20):1291712922,2002. [63]NicolasDurandandGeraudGranger.Atrafccomplexityapproachthrough clusteranalysis.In ATM2003 ,2003. [64]FrancisT.DursoandCarolA.Manning.Airtrafccontrol. ReviewsofHuman FactorsandErgonomics ,4(1):195244,2008. [65]NathanEagleandAlexSandyPentland.Eigenbehaviors:identifyingstructurein routine. BehavioralEcologyandSociobiology ,63(7):10571066,2009. [66]NathanEagle,AlexSandyPentland,andDavidLazer.Inferringfriendshipnetwork structurebyusingmobilephonedata. ProceedingsoftheNationalAcademyof SciencesoftheUnitedStatesofAmerica ,106(36):1527415278,2009. 251

PAGE 252

[67]NathanEagle,Alex(Sandy)Pentland,andDavidLazer.In ferringfriendship networkstructurebyusingmobilephonedata. ProceedingsoftheNational AcademyofSciences ,106(36):1527415278,2009. [68]FransEkman,AriKernen,JouniKarvo,andJrgOtt.Workingdaymovement model. Proceedingofthe1stACMSIGMOBILEworkshoponMobilitymodels MobilityModels08 ,page33,2008. [69]AhmedElgammal,DavidHarwood,andLarryDavis.Non-parametricmodelfor backgroundsubtraction.InDavidVernon,editor, ComputerVisionECCV2000 volume1843of LectureNotesinComputerScience ,pages751767.Springer Berlin/Heidelberg,2000. [70]RobertF.EngleandCliveW.J.Granger.Co-integrationanderrorcorrection: Representation,estimation,andtesting. Econometrica ,55(2):25176,1987. [71]P.ErdosandA.Renyi.Ontheevolutionofrandomgraphs. Publ.Math.Inst.Hung. Acad.Sci ,5:1761,1960. [72]Glooret.al.Visualizationofinteractionpatternsincollaborativeknowledge networksformedicalapplications. ProceedingsofHuman-ComputerInteraction2003 ,2003. [73]WilliamY.C.Chenet.al.Checkingthereliabilityofanewapproachtowards detectingcommunitystructuresinnetworksusinglinearprogramming. IET SystemsBiology ,1(5):286291,2007. [74]KevinFall.Adelay-tolerantnetworkarchitectureforchallengedinternets. Proceedingsofthe2003conferenceonApplicationstechnologiesarchitecturesand protocolsforcomputercommunicationsSIGCOMM03 ,25:27,2003. [75]V.Farutin,K.Robison,E.Lightcap,V.Dancik,A.Ruttenberg,S.Letovsky, andJ.Pradines.Edge-countprobabilitiesfortheidenticationoflocalprotein communitiesandtheirorganization. Proteins ,62(3):800818,March2006. [76]AFatahgenSchieck,VKostakos,APenn,EO'Neill,TKindberg,DStanton Fraser,andTJones.Designtoolsforpervasivecomputinginurbanenvironments, 2006. [77]M.Fiedler.Albegraicconnectivityofgraphs. Czech.MathJournal ,23:298305, 1973. [78]DavidA.ForsythandJeanPonce. ComputerVision:AModernApproach PrenticeHallProfessionalTechnicalReference,2002. [79]SantoFortunato,VitoLatora,andMassimoMarchiori.Methodtondcommunity structuresbasedoninformationcentrality. 
PhysicalReviewE ,70(5):056104,Nov 2004. 252

PAGE 253

[80]M.Foth,L.Forlano,C.Satchell,andM.Gibbs. F romSocialButterytoEngaged Citizen .MITPress,2011. [81]MarcusFoth.Urbaninformatics,ubiquitouscomputingandsocialmediafor healthycities.In MCLC ,2011. [82]FrancoisFouss,AlainPirotte,andMarcoSaerens.Anovelwayofcomputing similaritiesbetweennodesofagraph,withapplicationtocollaborative recommendation. WI'05:Proceedingsofthe2005IEEE/WIC/ACMInternationalConferenceonWebIntelligence ,pages550556,2005. [83]AnaL.N.FredandAnilK.Jain.Robustdataclustering. ComputerVisionand PatternRecognition,IEEEComputerSociety ,02:128,2003. [84]L.Freeman.Centralityinsocialnetworksconceptualclarication. Social Networks ,1(3):215239,1979. [85]L.C.Freeman. TheDevelopmentofSocialNetworkAnalysis:AStudyinthe SociologyofScience .EmpiricalPress,2004. [86]JonD.FrickerandRobertK.Whitford. FundamentalsofTransportationEngineering:AMultimodalSystemsApproach .PrenticeHall,2004. [87]LMFulton,RBNoland,DJMeszler,andJVThomas.Astatisticalanalysisof inducedtraveleffectsintheU.S.mid-Atlanticregion. JournalofTransportation andStatistics ,3(1):114,2000. [88]WeiGao,QinghuaLi,BoZhao,andGuohongCao.Multicastingindelaytolerant networks:asocialnetworkperspective.In MobiHoc ,2009. [89]M.R.GareyandD.S.Johnson. ComputersandIntractability:AGuidetothe TheoryofNP-Completeness(SeriesofBooksintheMathematicalSciences) .W. H.Freeman,January1979. [90]BurnettGaryandJMarkPorter.Ubiquitouscomputingwithincars:designing controlsfornonvisualuse. IJHCS ,2001. [91]MGirvanandMEJNewman.Communitystructureinsocialandbiological networks. ProceedingsoftheNationalAcademyofSciencesoftheUnitedStates ofAmerica ,99(12):78217826,2002. [92]MGirvanandMEJNewman.Communitystructureinsocialandbiological networks. PNAS ,2002. [93]MichelleGirvanandM.E.J.Newman.Communitystructureinsocialand biologicalnetworks. PROC.NATL.ACAD.SCI.USA ,99:7821,2002. [94]CWJGranger.InvestigatingCausalRelationsbyEconometricModelsand Cross-SpectralMethods. Econometrica ,37(3):424438,1969. 253

PAGE 254

[95]R.A.Groeneveld.Skewnessfortheweibullfamily. S tatisticaNeerlandica 40(3):135140,1986. [96]A.Halati,H.Lieu,andS.Walker.CORSIM-corridortrafcsimulationmodel. In ProceedingsoftheTrafcCongestionandTrafcSafetyinthe21stCentury Conference ,pages570576,1997. [97]J.D.Hamilton. Timeseriesanalysis .PrincetonUniversityPress,Princeton,NJ, 1994. [98]MarkHansenandYuanlinHuang.Roadsupplyandtrafcincaliforniaurban areas. TransportationResearchPartA:PolicyandPractice ,31(3):205218,May 1997. [99] OlafurHelgason,SylviaT.Kouyoumdjieva,andGunnarKarlsson.Doesmobility matter?In WONS ,2010. [100]RAHornandCRJohnson. MatrixAnalysis ,volume169.CambridgeUniversity Press,1985. [101]TheusHossmann,ThrasyvoulosSpyropoulos,andFranckLegendre.Knowthy neighbor:towardsoptimalmappingofcontactstosocialgraphsfordtnrouting.In INFOCOM,2010. [102]TheusHossmann,ThrasyvoulosSpyropoulos,andFranckLegendre.Putting contactsintocontext:Mobilitymodelingbeyondinter-contacttimes.In ACM MobiHoc ,2011. [103]Wei-jenHsu,DebojyotiDutta,andAhmedHelmy.Miningbehavioralgroups inlargewirelesslans. MobiCom07Proceedingsofthe13thannualACM internationalconferenceonMobilecomputingandnetworking ,Montr 02bce:14, 2006. [104]Wei-jenHsu,DebojyotiDutta,andAhmedHelmy.Prole-cast:behavior-aware mobilenetworking. SIGMOBILEMob.Comput.Commun.Rev. ,12(1):5254, 2008. [105]Wei-jenHsu,DebojyotiDutta,andAhmedHelmy.Csi:Aparadigmfor behavior-orientedprole-castservicesinmobilenetworks. AdHocNetworks pages114,2011. [106]Wei-jenHsuandAHelmy. OnModelingUserAssociationsinWirelessLAN TracesonUniversityCampuses .IEEE,2006. [107]Wei-jenHsu,KashyapMerchant,Haw-weiShu,Chih-hsinHsu,andAhmed Helmy.Weightedwaypointmobilitymodelanditsimpactonadhocnetworks. ACMSIGMOBILEMobileComputingandCommunicationsReview ,9(1):5963, 2005. 254

PAGE 255

[108]Wei-JenHsu,T.Spyropoulos,K.Psounis,andA.Helmy.M odelingspatialand temporaldependenciesofusermobilityinwirelessmobilenetworks. Networking, IEEE/ACMTransactionson ,17(5):15641577,oct.2009. [109]Wei-jenHsu,ThrasyvoulosSpyropoulos,KonstantinosPsounis,andAhmed Helmy.Modelingspatialandtemporaldependenciesofusermobilityinwireless mobilenetworks. IEEE/ACMTransactionsonNetworking ,17(5):14,2008. [110]Wei-jenHsuWei-jenHsuandAHelmy.Onnodalencounterpatternsinwireless lantraces,2006. [111]Yuan-YihHsuandChien-ChuenYang.Designofarticialneuralnetworksfor short-termloadforecasting. GTD ,1991. [112]PanHui,JonCrowcroft,andEikoYoneki.Bubblerap:social-basedforwardingin delaytolerantnetworks.In MobiHoc ,MobiHoc'08,pages241250,NewYork,NY, USA,2008.ACM. [113]PanHui,JonCrowcroft,andEikoYoneki.Bubblerap:social-basedforwardingin delaytolerantnetworks. MobiHoc ,pages241250,2008. [114]PanHui,RichardMortier,MichalPi orkowski,TristanHenderson,andJon Crowcroft.Planet-scalehumanmobilitymeasurement.In Proceedingsofthe 2ndACMInternationalWorkshoponHotTopicsinPlanet-scaleMeasurement HotPlanet'10,pages1:11:5,NewYork,NY,USA,2010.ACM. [115]PanHui,EikoYoneki,ShuYanChan,andJonCrowcroft.Distributedcommunity detectionindelaytolerantnetworks. ProceedingsofrstACMIEEEinternational workshoponMobilityintheevolvinginternetarchitectureMobiArch07 ,Kyoto, Jap:1,2007. [116]PanHui,EikoYoneki,ShuYanChan,andJonCrowcroft.Distributedcommunity detectionindelaytolerantnetworks.In MobiArch'07:Proceedingsof2nd ACM/IEEEinternationalworkshoponMobilityintheevolvinginternetarchitecture pages18,NewYork,NY,USA,2007.ACM. [117]TakashiIto,TomokoChiba,RitsukoOzawa,MikioYoshida,MasahiraHattori,and YoshiyukiSakaki.Acomprehensivetwo-hybridanalysistoexploretheyeast proteininteractome. PNAS ,98(8):45694574,2001. [118]OdedIzraeliandThomasR.McCarthy.Variationsintraveldistance,traveltime andmodelchoiceamongsmsas. JournalofTransportEconomicsandPolicy 1985. 
[119]SanjayJainandC.McLean.Aframeworkformodelingandsimulationfor emergencyresponse.In SimulationConference,2003.Proceedingsofthe2003 Winter ,volume1,pages10681076Vol.1,dec.2003. 255

PAGE 256

[120]WeijenHsu,DebojyotiDutta,andAhmedHelmy.CSI:Apar adigmfor behavior-orientedprole-castservicesinmobilenetworks. AdHocNetworks InPress,2011. [121]WeijenHsuandA.Helmy.Onnodalencounterpatternsinwirelesslantraces.In ModelingandOptimizationinMobile,AdHocandWirelessNetworks,20064th InternationalSymposiumon ,pages110,apr.2006. [122]WeijenHsuandAhmedHelmy.CRAWDADdatasetusc/mobilib(v.2008-07-24). Ubicomp ,July2008. [123]JJiruandDEilers.Cartoroadsidecommunicationusingieee802.11p technology. IndustrialEthernetBookIssue ,2010. [124]SorenJohansen.Statisticalanalysisofcointegrationvectors. JournalofEconomic DynamicsandControl ,12(2-3):231254,1988. [125]R.A.JohnsonandD.W.Wichern. AppliedMultivariateStatisticalAnalysis .Pearson PrenticeHall,2007. [126]NicolasJozefowiez,FrdricSemet,andEl-GhazaliTalbi.Targetaimingpareto searchanditsapplicationtothevehicleroutingproblemwithroutebalancing. JournalofHeuristics ,13:455469,2007.10.1007/s10732-007-9022-6. [127]YiannisKamarianakisandPoulicosPrastacos.ForecastingTrafcFlowConditions inanUrbanNetwork:ComparisonofMultivariateandUnivariateApproaches. TransportationResearchRecord:JournaloftheTransportationResearchBoard 1857(-1):7484,January2003. [128]TKaragiannis,JYLeBoudec,andMVojnovic 0301.Powerlawandexponential decayofintercontacttimesbetweenmobiledevices,2010. [129]ThomasKaragiannis,MichalisFaloutsos,andMartMolle.Auser-friendly self-similarityanalysistool. SIGCOMMComput.Commun.Rev. ,33:8193, July2003. [130]VKastrinaki,MZervakis,andKKalaitzakis.Asurveyofvideoprocessing techniquesfortrafcapplications. ImageandVisionComputing ,21(4):359381, 2003. [131]S.Kaul,M.Gruteser,V.Rai,andJ.Kenney.Onpredictingandcompressing vehicularGPStraces.In ICC ,2010. [132]AriKer anen,J orgOtt,andTeemuK arkk ainen.TheONESimulatorforDTN ProtocolEvaluation.In SIMUTools ,2009. [133]B.WKernighanandS.Lin.Anefcientheuristicprocedureforpartitioninggraph. BellSystemTechnicalJournal ,49:291307,1970. 256

PAGE 257

[134]MKim,DKotz,andSKim.Extractingamobilitymodelfrom realusertraces. ProceedingsIEEEINFOCOM200625THIEEEInternationalConferenceon ComputerCommunications ,00(c):113,2006. [135]MinkyongKimandDavidKotz.Periodicpropertiesofusermobilityand access-pointpopularity. PersonalandUbiquitousComputing ,11(6):465479, 2006. [136]JonM.Kleinberg,RaviKumar,PrabhakarRaghavan,SridharRajagopalan,and AndrewS.Tomkins.TheWebasagraph:Measurements,modelsandmethods. LectureNotesinComputerScience ,1627:117,1999. [137]DKotzandTHenderson.Crawdad:Acommunityresourceforarchivingwireless dataatdartmouth. IeeePervasiveComputing ,4(4):2122,2006. [138]DavidKotzandKobbyEssien.Analysisofacampus-widewirelessnetwork. WirelessNetworks ,11(1-2):115133,2005. [139]DavidKotzandKobbyEssien.Analysisofacampus-widewirelessnetwork. WirelessNetworks ,11(12):115133,January2005. [140]DavidKotzandTristanHenderson.CRAWDAD:ACommunityResourcefor ArchivingWirelessDataatDartmouth. IEEEPervasiveComputing ,2005. [141]DavidKotz,TristanHenderson,andIlyaAbyzov.CRAWDADdataset dartmouth/campus,December2004. [142]DanielKrajzewicz,GeorgHertkorn,ChristianRssel,andPeterWagner. SUMO (SimulationofUrbanMObility)-anopen-sourcetrafcsimulation ,pages183187. sn,2002. [143]JohnKrumm.Trajectoryanalysisfordriving.In ComputingwithSpatialTrajectories .Springer,2011. [144]UdayanKumar,GautamThakur,andAhmedHelmy.Protect:Proximity-based trust-advisorusingencountersformobilesocieties. Proceedingsofthe6th InternationalWirelessCommunicationsandMobileComputingConferenceon ZZZ ,abs/1004.4:636 2013645,2010. [145]NOMADSLab.Mobilib:Community-widelibraryofmobilityandwirelessnetworks measurements. [146]M.LatapyandP.Pons.Computingcommunitiesinlargenetworksusingrandom walks. ArXivCondensedMattere-prints ,2004. [147]JongKLeeandJenniferCHou. Modelingsteady-stateandtransientbehaviorsof usermobility:formulation,analysis,andapplication .ACM,2006. 257

PAGE 258

[148]KyunghanLee,SeongikHong,SeongJoonKim,InjongRhee ,andSongChong. Slaw:Anewmobilitymodelforhumanwalks.In INFOCOM'09 ,pages855863, 2009. [149]Sang-HoLee,TanYigitcanlar,Jung-HoonHan,andYoun-TaikLeem.Ubiquitous urbaninfrastructure:InfrastructureplanninganddevelopmentinKorea. IMPP 2008. [150]WillE.Leland,MuradS.Taqqu,WalterWillinger,andDanielV.Wilson.Onthe self-similarnatureofethernettrafc(extendedversion). IEEE/ACMTrans.Netw. 2:115,February1994. [151]DanLelescu,UlasCKozat,RaviJain,andMahadevanBalakrishnan. ModelT++: anempiricaljointspace-timeregistrationmodel .ACM,2006. [152]R.LienhartandJ.Maydt.Anextendedsetofhaar-likefeaturesforrapidobject detection.In ImageProcessing.2002.Proceedings.2002InternationalConferenceon ,volume1,pagesI900I903vol.1,2002. [153]AndersLindgren,AvriDoria,andOlovSchel en.Probabilisticroutingin intermittentlyconnectednetworks. SIGMOBILEMob.Comput.Commun.Rev. 7:1920,July2003. [154]LszlLovsz. Combinatorics,PaulErdosiseighty .JnosBolyaiMathematical Society,Hungary,1993. [155]NicholasE.LownesandRandyB.Machemehl.Vissim:amulti-parameter sensitivityanalysis.In Proceedingsofthe38thconferenceonWintersimulation WSC'06,pages14061413.WinterSimulationConference,2006. [156]NicholasE.LownesandRandyB.Machemehl.Vissim:amulti-parameter sensitivityanalysis.In Proceedingsofthe38thconferenceonWintersimulation WSC'06,pages14061413.WinterSimulationConference,2006. [157]R.TapioLuttinen. Statisticalanalysisofvehicletimeheadways .PhDthesis, HelsinkiUniversityofTechnology,Finland,1996. [158]GregoryR.Madey,GborSzab,andAlbert-LszlBarabsi.Wiper:Theintegrated wirelessphonebasedemergencyresponsesystem.In InternationalConference onComputationalScience(3)'06 ,pages417424,2006. [159]BenoitB.Mandelbrot. TheFractalGeometryofNature .W.H.FreedmanandCo., NewYork,1983. [160]B.S.ManojandAlexandraHubenkoBaker.Communicationchallengesin emergencyresponse. Commun.ACM,50(3):5153,March2007. 258

PAGE 259

[161]R.Martinez-Cantin,N.deFreitas,A.Doucet,andJ.Cas tellanos.Activepolicy learningforrobotplanningandexplorationunderuncertainty.In Proceedingsof Robotics:ScienceandSystems ,Atlanta,GA,USA,June2007. [162]Jr.Massey,FrankJ.Thekolmogorov-smirnovtestforgoodnessoft. Journalof theAmericanStatisticalAssociation ,46(253):pp.6878,1951. [163]MillerMcPherson,LynnSmith-Lovin,andJamesMCook.Birdsofafeather: Homophilyinsocialnetworks. AnnualReviewofSociology ,27(1):415444,2001. [164]AlessandroMeiandJulindaStefa.Swim:Asimplemodeltogeneratesmall mobileworlds. ComputingResearchRepository ,abs/0809.2:21062113,2008. [165]QiangMengandHooiLingKhoo.Self-similarcharacteristicsofvehiclearrival patternonhighways. JournalofTransportationEngineering ,135(11):864872, 2009. [166]AndrewG.Miklas,KiranK.Gollu,KelvinK.W.Chan,StefanSaroiu,KrishnaP. Gummadi,andEyalDeLara.Exploitingsocialinteractionsinmobilesystems. In Proceedingsofthe9thinternationalconferenceonUbiquitouscomputing UbiComp'07,pages409428,Berlin,Heidelberg,2007.Springer-Verlag. [167]SMoghaddamandAHelmy. Internetusagemodelingoflargewirelessnetworks usingself-organizingmaps .IEEE,2010. [168]MatthewM.MooreandBeverlyLu.AutonomousVehiclesforPersonalTransport. SSRNeLibrary ,2011. [169]MurrayM.P.Adrunkandherdog:anillustrationofcointegrationanderror correction. TheAmericanStatistician ,pages4837,February1994. [170]AbderrahmenMtibaa,MartinMay,ChristopheDiot,andMostafaAmmar. Peoplerank:Socialopportunisticforwarding. 2010ProceedingsIEEEINFOCOM,SanDiego,:15,2010. [171]AartiMunjal,TracyCamp,andWilliamC.Navidi.Smooth:asimplewaytomodel humanwalks. SIGMOBILEMob.Comput.Commun.Rev. ,14(4):3436,November 2010. [172]MircoMusolesi,PanHui,CeciliaMascolo,andJonCrowcroft.Writingonthe cleanslate:Implementingasocially-awareprotocolinhaggle. 2008International SymposiumonaWorldofWirelessMobileandMultimediaNetworks ,pages16, 2008. [173]MircoMusolesiandCeciliaMascolo.Acommunitybasedmobilitymodelforad hocnetworkresearch.In REALMAN ,2006. 
[174]TakashiNagatani.Self-similarbehaviorofasinglevehiclethroughperiodictrafc lights. PhysicaA:StatisticalMechanicsanditsApplications ,347:673682,2005. 259

PAGE 260

[175]TakashiNagatani.Clusteringandmaximalowinvehicu lartrafcthrougha sequenceoftrafclights. PhysicaA:StatisticalMechanicsanditsApplications 377(2):651660,2007. [176]FawadNazir,HelmutPrendinger,andArunaSeneviratne.Participatorymobile socialnetworksimulationenvironment. ProceedingsoftheSecondInternational WorkshoponMobileOpportunisticNetworkingMobiOpp10 ,Pisa,Ital:171,2010. [177]SamuelCNelson,AlbertFHarris,andRobinKravets.Event-driven,role-based mobilityindisasterrecoverynetworks. Proceedingsofthesecondworkshopon ChallengednetworksCHANTSCHANTS07 ,page27,2007. [178]MEJNewman.Modularityandcommunitystructureinnetworks. ProceedingsoftheNationalAcademyofSciencesoftheUnitedStatesofAmerica 103(23):85778582,2006. [179]M.E.J.Newman.Modularityandcommunitystructureinnetworks. Proceedings oftheNationalAcademyofSciences ,103(23):85778582,2006. [180]M.E.J.Newman. Networks:anintroduction .OxfordUniversityPress,Oxford; NewYork,2010.ID:456837194. [181]M.E.J.NewmanandM.Girvan.Findingandevaluatingcommunitystructurein networks. PhysicalReviewE ,69:026113,2004. [182]MarkNewman,Albert-LaszloBarabasi,andDuncanJWatts. Thestructureand dynamicsofnetworks ,volume107.PrincetonUniversityPress,2006. [183]MarkNewman,Albert-LaszloBarabasi,andDuncanJ.Watts. TheStructureand DynamicsofNetworks:(PrincetonStudiesinComplexity) .PrincetonUniversity Press,1edition,April2006. [184]M.E.J.Newman. Networks:AnIntroduction .OxfordUniversityPress,2010. [185]MindaugasNorkus,DamienFay,MaryJ.Murphy,FrankBarry,GearoidOLaighin, andLiamKilmartin.Ontheapplicationofactivelearningandgaussianprocesses inpost-cryopreservationcellmembraneintegrityexperiments. IEEE/ACM TransactionsonComputationalBiologyandBioinformatics ,99(PrePrints),2011. [186]KennethOgden. Trafcengineeringpractice .Dept.ofCivilEngineeringMonash University,ClaytonVic.Australia,4thed.edition,1989. 
[187]KazuyaOkamoto,WeiChen,andXiang-YangLi.Rankingofclosenesscentrality forlarge-scalesocialnetworks.In FAW'08:Proceedingsofthe2ndannual internationalworkshoponFrontiersinAlgorithmics ,pages186195,Berlin, Heidelberg,2008.Springer-Verlag. 260

PAGE 261

[188]GergelyPalla,ImreDerenyi,IllesFarkas,andTamasVi csek.Uncoveringthe overlappingcommunitystructureofcomplexnetworksinnatureandsociety. Nature ,435:814,2005. [189]KihongParkandWalterWillinger. Self-SimilarNetworkTrafcandPerformance Evaluation .JohnWiley&Sons,Inc.,NewYork,NY,USA,1stedition,2000. [190]K.Patterson. AnIntroductiontoAppliedEconometrics:ATimeSeriesApproach MacMillanPress,London,1994. [191]Alex(Sandy)Pentland.Automaticmappingandmodelingofhumannetworks. PhysicaA:StatisticalMechanicsanditsApplications ,378(1):5967,May2007. [192]CiaranS.PhibbsandHaroldS.Luft.Correlationoftraveltimeonroadsversus straightlinedistance. MedicalCareResearchandReview ,52(4):532542,1995. [193]M.Piccardi.Backgroundsubtractiontechniques:areview.In Systems,Manand Cybernetics,2004IEEEInternationalConferenceon ,volume4,pages3099 3104vol.4,oct.2004. [194]M.Pi orkowskiandet.al.Trans:realisticjointtrafcandnetworksimulatorfor vanets. SigmobileCCR ,pages864872,2008. [195]A.J.Pocklington,J.D.Armstrong,andS.G.N.Grant.Organizationofbrain complexitysynapseproteomeformandfunction. BriefFunctGenomicProteomic 5(1):6673,2006. [196]SergioPorta,VitoLatora,FahuiWang,SalvadorRueda,EmanueleStrano, SalvatoreScellato,AlessioCardillo,EugenioBelli,FranciscoCrdenas,Berta Cormenzana,andLauraLatora.Streetcentralityandlocationofeconomic activitiesinbarcelona. UrbanStudies ,2011. [197]SergioPorta,VitoLatora,FahuiWang,EmanueleStrano,AlessioCardillo, SalvatoreScellato,ValentinoIacoviello,andRobertoMessora.Streetcentrality anddensitiesofretailandservicesinbologna,italy. EPD ,2009. [198]AlexPothen,HorstD.Simon,andKan-PuLiou.Partitioningsparsematrices witheigenvectorsofgraphs. SIAMJournalonMatrixAnalysisandApplications 11(3):430452,1990. [199]C.E.RasmussenandC.K.I.Williams. GaussianProcessesforMachine Learning .MITpress.,2006. [200]IRhee,MShin,SHong,KLee,andSChong.Onthelevy-walknatureofhuman mobility. IEEEINFOCOMThe27thConferenceonComputerCommunications (2008) ,19(3):924932,2008. 261

PAGE 262

[201]InjongRhee,MinsuShin,SeongikHong,KyunghanLee,an dSongChong.Onthe levy-walknatureofhumanmobility.In INFOCOM2008.The27thConferenceon ComputerCommunications.IEEE ,pages924932,april2008. [202]EMRoyer,PMMelliar-Smith,andLEMoser.Ananalysisoftheoptimumnode densityforadhocmobilenetworks. ICC2001IEEEInternationalConferenceon CommunicationsConferenceRecordCatNo01CH37240 ,3:857861,2001. [203]E.M.Royer,P.M.Melliar-Smith,andL.E.Moser.Ananalysisoftheoptimumnode densityforadhocmobilenetworks.In ICC ,2001. [204]T.J.Santner,B.Williams,andW.Notz. TheDesignandAnalysisofComputer Experiments .Springer-Verlag,2003. [205]RussellKSchutt. InvestigatingtheSocialWorld:TheProcessandPracticeof Research ,volume5.PineForgePress,2006. [206]DarrenM.Scott.OvercomingTrafcCongestion:ADiscussionofReduction StrategiesandBehavioralResponsesfromaNorth-AmericanPerspective. EuropeanJournalofTransportandInfrastructureResearch ,2(3-4):317338, 2003. [207]J.Scott,P.Hui,J.Crowcroft,andC.Diot.Haggle:aNetworkingArchitecture DesignedAroundMobileUsers.In IFIPWONS,Jan,2006 ,January2006. [208]JamesScott,PanHui,JonCrowcroft,andChristopheDiot.Haggle:anetworking architecturedesignedaroundmobileusers. Online ,2006:7886,2008. [209]MukundSeshadri,SridharMachiraju,AshwinSridharan,JeanBolot,Christos Faloutsos,andJureLeskove.Mobilecallgraphs:beyondpower-lawand lognormaldistributions.In Proceedingofthe14thACMSIGKDDinternational conferenceonKnowledgediscoveryanddatamining ,KDD'08,pages596604, NewYork,NY,USA,2008.ACM. [210]PShannon,AMarkiel,OOzier,NSBaliga,JTWang,DRamage,NAmin, BSchwikowski,andTIdeker.Cytoscape:asoftwareenvironmentforintegrated modelsofbiomolecularinteractionnetworks. GenomeRes ,13(11):24982504, November2003. [211]Y.SheikhandM.Shah.Bayesianmodelingofdynamicscenesforobject detection. PatternAnalysisandMachineIntelligence,IEEETransactionson 27(11):17781792,2005. [212]KShilton,NRamanathan,SReddy,VSamanta,JBurke,DEstrin,MHansen, andMSrivastava.Participatorydesignofsensingnetworks:Strengthsand challenges. 
Design ,http://www:285288,2008. 262

PAGE 263

[213]J.P.Singh,N.Bambos,B.Srinivasan,andD.Clawin.Wir elesslanperformance undervariedstressconditionsinvehiculartrafcscenarios.In VehicularTechnologyConference,2002.Proceedings.VTC2002-Fall.2002IEEE56th ,volume2, pages743747vol.2,2002. [214]V.SpirinandL.A.Mirny.Proteincomplexesandfunctionalmodulesinmolecular networks. ProcNatlAcadSciUSA ,100(21):1212312128,October2003. [215]ThrasyvoulosSpyropoulos,KonstantinosPsounis,andCauligiS.Raghavendra. Sprayandwait:anefcientroutingschemeforintermittentlyconnectedmobile networks.In Proceedingsofthe2005ACMSIGCOMMworkshoponDelaytolerantnetworking ,WDTN'05,pages252259,NewYork,NY,USA,2005. ACM. [216]ThrasyvoulosSpyropoulos,KonstantinosPsounis,andCauligiSRaghavendra. Performanceanalysisofmobility-assistedrouting. Proceedingsoftheseventh ACMinternationalsymposiumonMobileadhocnetworkingandcomputing MobiHoc06 ,Florence,:49,2006. [217]RazvanStanica,EmmanuelChaput,andAndr e-LucBeylot.Simulationof vehicularad-hocnetworks:Challenges,reviewoftoolsandrecommendations. Comput.Netw. ,55:31793188,October2011. [218]C.StaufferandW.E.L.Grimson.Adaptivebackgroundmixturemodelsfor real-timetracking.In ComputerVisionandPatternRecognition,1999.IEEE ComputerSocietyConferenceon. ,volume2,pages2vol.(xxiii+637+663),1999. [219]C.StaufferandW.E.L.Grimson.Adaptivebackgroundmixturemodelsfor real-timetracking.In IEEECVPR ,pages1761,1999. [220]Dietrich.StaufferandAmnon.Aharony. Introductiontopercolationtheory/Dietrich StaufferandAmnonAharony .TaylorandFrancis,London:,2nded.edition,1992. [221]KarstenSteinhaeuserandNiteshVChawla.Communitydetectioninalarge real-worldsocialnetwork. SocialComputingBehavioralModelingandPrediction http://www:168175,2008. [222]AlexanderStrehlandJoydeepGhosh.Clusterensemblesaknowledgereuse frameworkforcombiningmultiplepartitions. JournalonMachineLearning Research(JMLR) ,3:583617,December2002. [223]ThomasJSullivan.Modelingandsimulationforemergencyresponse. Lawrence LivermoreNationalLaboratory ,May1985. 
[224]ZehangSun,GeorgeBebis,andRonaldMiller.On-roadvehicledetection: Areview. IEEETransactionsonPatternAnalysisandMachineIntelligence 28:694711,2006. 263

PAGE 264

[225]GautamThakur,AhmedHelmy,andWei-JenHsu.Similarit yanalysisand modelinginmobilesocieties:Themissinglink. Proceedingsofthe5thACM workshoponChallengednetworks ,page8,2010. [226]GautamS.Thakur.GeneratingUrbanStreetNetworkusingGoogleMapsAPI. CISETechReport ,2012. [227]GautamS.Thakur,MohsenAli,PanHui,andAhmedHelmy.Comparing backgroundsubtractionalgorithmsandmethodofcarcounting. ArXive-prints January2012. [228]GautamS.Thakur,AhmedHelmy,andWei-JenHsu.Similarityanalysisand modelinginmobilesocieties:themissinglink.In Proceedingsofthe5thACM workshoponChallengednetworks ,CHANTS'10,pages1320,NewYork,NY, USA,2010.ACM. [229]GautamS.Thakur,PanHui,HamedKetabdar,andAhmedHelmy.Spatial andtemporalanalysisofplanetscalevehicularimagerydata.In DataMining Workshops(ICDMW),2011IEEE11thInternationalConferenceon ,pages905 910,dec.2011. [230]GautamS.Thakur,PanHui,HamedKetabdar,andAhmedHelmy.Towards realisticvehicularnetworkmodelingusingplanet-scalepublicwebcams. CoRR abs/1105.4151,2011. [231]GautamS.Thakur,UdayanKumar,AhmedHelmy,andWei-JenHsu.Onthe efcacyofmobilitymodelingforDTNevaluation:Analysisofencounterstatistics andspatio-temporalpreferences.In IEEEIWCMC ,NewYork,NY,USA,2011. [232]GautamS.Thakur,UdayanKumar,Wei-JenHsu,andAhmedHelmy.Gauging humanmobilitycharacteristicsanditsimpactonmobileroutingperformance. Int. J.Sen.Netw. ,11(3):179191,April2012. [233]G.S.Thakur,PanHui,andA.Helmy.Modelingandcharacterizationofurban vehicularmobilityusingwebcameras.In ComputerCommunicationsWorkshops (INFOCOMWKSHPS),2012IEEEConferenceon ,pages262267,march2012. [234]G.S.Thakur,R.Tiwari,M.T.Thai,S.-S.Chen,andA.W.M.Dress.Detectionof localcommunitystructuresincomplexdynamicnetworkswithrandomwalks. IET SystemsBiology ,3(4):266278,2009. [235]K.TierneyandJ.Sutton.Costandculture:Barrierstotheadoptionoftechnology inemergencymanagement. RESCUEResearchHighlights ,June2005. [236]JeffreyTraversandStanleyMilgram.Anexperimentalstudyofthesmallworld problem. Sociometry ,32(4):425443,1969. 
[237]AminVahdatandDavidBecker.Epidemicroutingforpartially-connectedadhoc networks. CoRR ,2000. 264

[238] Amin Vahdat and David Becker. Epidemic routing for partially-connected ad hoc networks. CoRR, 2000.
[239] VGJ. Drawing graphs with visualizing graphs with java, April 1998. VGJ, Visualizing Graphs with Java, is a tool for graph drawing and graph layout. The experimental random graphs are generated using VGJ and Matlab Bioinformatics toolbox.
[240] Vladimir Vukadinovic and Stefan Mangold. Opportunistic wireless communication in theme parks: a study of visitors mobility. In Proceedings of the 6th ACM workshop on Challenged networks, CHANTS '11, pages 3-8, New York, NY, USA, 2011. ACM.
[241] Peter Wagner. Modeling traffic flow fluctuations. Journal of the ICE, 4:121-130, 1936.
[242] S. Wasserman and K. Faust. Social Network Analysis. Cambridge University Press, Cambridge, 1994.
[243] A. G. Wilson. Complex spatial systems: modeling foundations of urban and regional analysis. PH, 2000.
[244] N. Wisitpongphan, Fan Bai, P. Mudalige, V. Sadekar, and O. Tonguz. Routing in sparse vehicular ad hoc wireless networks. Selected Areas in Communications, IEEE Journal on, 25(8):1538-1556, Oct. 2007.
[245] Jihwang Yeo, David Kotz, and Tristan Henderson. Crawdad: a community resource for archiving wireless data at dartmouth. SIGCOMM Comput. Commun. Rev., 36:21-22, April 2006.
[246] Jihwang Yeo, David Kotz, and Tristan Henderson. CRAWDAD: A Community Resource for Archiving Wireless Data at Dartmouth. ACM SIGCOMM Computer Communication Review, 36(2):21-22, April 2006. Project overview.
[247] Eiko Yoneki, Pan Hui, and Jon Crowcroft. Visualizing community detection in opportunistic networks, volume Montreal. ACM Press, 2007.
[248] Saitoh Yoshihiro, Kondoh Minoru, and Komatsu Yukio. Driverless car traveling guide system. USPT, 1989.
[249] S. Yousefi et al. Vehicular ad hoc networks (VANETs): Challenges and perspectives. In Vehicular Network, pages 864-872, 2006.
[250] W. W. Zachary. An information flow model for conflict and fission in small groups. Journal of Anthropological Research, 33:452-473, 1977.
[251] Daqing Zhang, Nan Li, Zhi Zhou, Chao Chen, Lin Sun, and Shijian Li. iBAT: detecting anomalous taxi trajectories from GPS traces. In Ubicomp, 2011.

[252] Yu Zheng, Yanchi Liu, Jing Yuan, and Xing Xie. Urban computing with taxicabs. In UbiComp, pages 89-98, 2011.
[253] H. Zhu, Y. Zhu, M. Li, and L. M. Ni. Seer: Metropolitan-scale traffic perception based on lossy sensory data. In INFOCOM 2009, IEEE, pages 217-225, April 2009.
[254] H. Zimmermann. Availability of technologies versus capabilities of users. International ISCRAM Conference, May 2006.

BIOGRAPHICAL SKETCH

Gautam S. Thakur received his Ph.D. from the Computer and Information Science and Engineering department at the University of Florida in the fall of 2012. He was advised by Prof. Ahmed Helmy. He was a research assistant in the Mobile Wireless Networks Design and Testing Group (NOMADS) laboratory. He collaborated closely with the Intelligent Networks and Management of Distributed Systems group at Deutsche Telekom Laboratories, Berlin, under the supervision of Dr. Pan Hui and Dr. Hamed Ketabdar. His work allows for research in realistic and data-driven human (pedestrian) and vehicular activity (mobility) modeling in large wireless networks. He also studies the structure, function, and science of such networks (and online social networks) using a combination of empirical methods and visualization techniques. During the summer of 2012, Gautam worked as a Lab Associate at Disney Research, Zurich (ETH, Zurich), on activity and mobility modeling of theme park guests. This work was closely aligned with his Ph.D. work on pedestrian mobility modeling. Currently, he is involved in two partially funded NSF projects on HumoNets and PlanetSensing.