Community Structure and Its Applications in Dynamic Complex Networks

Permanent Link: http://ufdc.ufl.edu/UFE0045408/00001

Material Information

Title: Community Structure and Its Applications in Dynamic Complex Networks
Physical Description: 1 online resource (185 p.)
Language: English
Creator: Nguyen, Nam P
Publisher: University of Florida
Place of Publication: Gainesville, Fla.
Publication Date: 2013

Subjects

Subjects / Keywords: adaptive-algorithm -- community-detection -- dynamic-complex-network -- forwarding-routing-strategy -- mobile-network -- online-social-network -- stable-community -- worm-containment
Computer and Information Science and Engineering -- Dissertations, Academic -- UF
Genre: Computer Engineering thesis, Ph.D.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, territorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation

Notes

Abstract: In this dissertation, we focus on analyzing and understanding the organizational principles, assessing the structural vulnerability, and exploring practical applications of dynamic complex networks. In particular, we propose two adaptive frameworks for identifying the nonoverlapping and overlapping community structure in dynamic networks. Our approaches can not only update the network communities quickly and efficiently, but also trace the evolution of those communities over time. We also suggest a detection method based on nonnegative matrix factorization which can work on weighted and directed networks. Subsequently, we study the discovery of stable communities in the networks, i.e., communities which are tightly connected and remain strong even over a long period of time. Furthermore, we investigate the structural vulnerability of the network community structure by identifying key nodes that play an important role in maintaining the normal function of the whole system. This is a new research direction on the cyber-infrastructure that we have recently introduced. To certify the effectiveness of our suggested frameworks and algorithms, we extensively test them not only on synthesized networks but also on real-world dynamic traces. Finally, we demonstrate the wide applicability of our algorithms via realistic applications, such as limiting misinformation spread in Online Social Networks as well as social-based forwarding and routing strategies and worm containment in mobile networks.
General Note: In the series University of Florida Digital Collections.
General Note: Includes vita.
Bibliography: Includes bibliographical references.
Source of Description: Description based on online resource; title from PDF title page.
Source of Description: This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Statement of Responsibility: by Nam P Nguyen.
Thesis: Thesis (Ph.D.)--University of Florida, 2013.
Local: Adviser: Thai, My Tra.
Electronic Access: RESTRICTED TO UF STUDENTS, STAFF, FACULTY, AND ON-CAMPUS USE UNTIL 2015-05-31

Record Information

Source Institution: UFRGP
Rights Management: Applicable rights reserved.
Classification: lcc - LD1780 2013
System ID: UFE0045408:00001


Full Text

PAGE 1

COMMUNITY STRUCTURE AND ITS APPLICATIONS IN DYNAMIC COMPLEX NETWORKS

By NAM P. NGUYEN

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2013

PAGE 2

© 2013 Nam P. Nguyen

PAGE 3

ACKNOWLEDGMENTS

I would like to express my deepest appreciation to my committee chair, Professor My T. Thai, for always being a great advisor throughout my Ph.D. journey. She continually and convincingly conveyed a spirit of adventure in regard to research and scholarship, and an excitement in regard to teaching. Her wisdom, support and advice have guided me through all of my difficult moments, not only in doing research but also in my life. Without her guidance and persistent help this dissertation would not have been possible. Also, I am grateful to have excellent labmates who have provided extremely helpful resources to my study.

I would like to thank my committee members, Professor Sanjay Ranka, Professor Panos Pardalos, Professor Tamer Kahveci and Professor Prabhat Mishra, who have been very supportive of my dissertation. Their encouragement and advice have helped me a lot, not only in my Ph.D. study but also in my future career.

Financial support for my Ph.D. program was provided by the University of Florida, NSF CAREER Award Grant number 0953284 and the DTRA Grant number HDTRA1-08-10.

PAGE 4

TABLE OF CONTENTS

ACKNOWLEDGMENTS ........ 3
LIST OF TABLES ........ 7
LIST OF FIGURES ........ 8
ABSTRACT ........ 10

CHAPTER

1 INTRODUCTION ........ 11
  1.1 Community Detection in Dynamic Complex Networks ........ 11
  1.2 Nonnegative Matrix Factorization for Community Detection ........ 12
  1.3 Applications of The Network Community Structure ........ 14
  1.4 The Identification of Stable Communities ........ 15
  1.5 The Assessment of Network Community Structure Vulnerability ........ 16
  1.6 Literature Review ........ 17
  1.7 Dissertation outline ........ 22
2 NONOVERLAPPING COMMUNITY STRUCTURE DETECTION ........ 25
  2.1 Problem Definition ........ 25
  2.2 Algorithm Description ........ 26
    2.2.1 New Node ........ 28
    2.2.2 New Edge ........ 30
    2.2.3 Node Removal ........ 36
    2.2.4 Edge Removal ........ 36
  2.3 Experimental Results ........ 41
    2.3.1 Results on Synthesized Networks ........ 42
    2.3.2 Results on Real World Traces ........ 44
3 OVERLAPPING COMMUNITY STRUCTURE DETECTION ........ 51
  3.1 Problem Formulation ........ 51
    3.1.1 Basic Notations ........ 51
    3.1.2 Dynamic Network Model ........ 51
    3.1.3 Density Function ........ 52
    3.1.4 Objective Function ........ 53
    3.1.5 Problem Definition ........ 54
  3.2 Basic Community Structure Detection ........ 54
    3.2.1 Locating Local Communities ........ 55
    3.2.2 Combining Overlapping Communities ........ 58
    3.2.3 Revisiting Unassigned Nodes ........ 60
  3.3 Detecting Evolving Network Communities ........ 61

PAGE 5

    3.3.1 Handling a New Node ........ 63
    3.3.2 Handling a New Edge ........ 65
    3.3.3 Removing an Existing Node ........ 67
    3.3.4 Removing an Existing Edge ........ 68
    3.3.5 Remarks ........ 70
    3.3.6 Complexity ........ 70
  3.4 Experimental Results ........ 71
    3.4.1 Choosing the Overlapping Threshold ........ 73
    3.4.2 Reference to Static Methods ........ 74
    3.4.3 Reference to Other Dynamic Methods ........ 75
4 COMMUNITY STRUCTURE DETECTION USING NONNEGATIVE MATRIX FACTORIZATION ........ 79
  4.1 Problem Definition and Properties ........ 79
    4.1.1 Motivation for NMF in Community Detection ........ 79
    4.1.2 Problem Definitions ........ 81
    4.1.3 Properties of iSNMF and iANMF factorizations ........ 81
  4.2 The Update Rule for iSNMF ........ 84
    4.2.1 Multiplicative Update Rule ........ 84
    4.2.2 Quasi-Newton Method for iSNMF ........ 88
  4.3 Update Rules for iANMF ........ 89
    4.3.1 Multiplicative Update Rules ........ 89
  4.4 Experimental Results ........ 94
    4.4.1 Empirical Results on Synthesized Networks ........ 95
    4.4.2 Results on Real Networks ........ 100
5 SOCIAL-AWARE ROUTING STRATEGIES IN MOBILE AD-HOC NETWORKS ........ 102
  5.1 A Message Forwarding and Routing Strategy Employing QCA ........ 102
    5.1.1 Setup ........ 103
    5.1.2 Results ........ 105
  5.2 A Message Forwarding and Routing Strategy Employing AFOCS ........ 106
    5.2.1 Message Forwarding Strategy ........ 106
    5.2.2 Setup ........ 107
    5.2.3 Results ........ 109
6 SOLUTIONS FOR WORM CONTAINMENT IN ONLINE SOCIAL NETWORKS ........ 111
  6.1 An Application of QCA in Containing Worms in OSNs ........ 113
    6.1.1 Setup ........ 113
    6.1.2 Results ........ 115
  6.2 Containing Worms with Overlapping Communities Detected by AFOCS ........ 118
    6.2.1 Setup ........ 118
    6.2.2 Results ........ 119

PAGE 6

7 STABLE COMMUNITY DETECTION IN ONLINE SOCIAL NETWORKS ........ 122
  7.1 Basic Notations ........ 123
  7.2 Link Stability Estimation ........ 124
    7.2.1 Link Reciprocity Prediction ........ 125
    7.2.2 Link Stability Estimation ........ 127
  7.3 Stable Community Detection ........ 129
    7.3.1 Lumped Markov Chain ........ 129
    7.3.2 The Connection to Network Topology ........ 131
    7.3.3 Detecting Communities ........ 132
      7.3.3.1 Formulation ........ 132
      7.3.3.2 Resolution limit analysis ........ 133
      7.3.3.3 Connection to stability estimation ........ 134
      7.3.3.4 A greedy algorithm for SCD problem ........ 135
  7.4 Experimental Results ........ 137
    7.4.1 Datasets ........ 137
    7.4.2 Metric ........ 139
    7.4.3 Effect of Link Stability Estimation ........ 139
    7.4.4 General Community Structure Detection ........ 141
    7.4.5 Results on Stable Community Detection ........ 142
  7.5 Conclusion ........ 144
8 ASSESSING NETWORK COMMUNITY STRUCTURE VULNERABILITY ........ 145
  8.1 Introduction ........ 145
  8.2 Problem Definition ........ 146
  8.3 Analysis of NMI Measure ........ 148
    8.3.1 NMI Formulation ........ 148
    8.3.2 Minimizing NMI in a Disjoint Community Structure ........ 150
      8.3.2.1 Minimizing NMI within a community ........ 150
      8.3.2.2 Minimizing NMI in a general disjoint community structure ........ 151
    8.3.3 Minimizing NMI in an Overlapped Community Structure ........ 153
  8.4 A Solution to CSV Problem ........ 154
  8.5 Experimental Results ........ 158
    8.5.1 Results on Synthesized Networks ........ 161
      8.5.1.1 Solution quality ........ 161
      8.5.1.2 The Number of Communities and Their Sizes ........ 163
    8.5.2 Results on Real World Traces ........ 164
  8.6 An Application in DTNs ........ 167
9 CONCLUSIONS ........ 172

REFERENCES ........ 173
BIOGRAPHICAL SKETCH ........ 185

PAGE 7

LIST OF TABLES

Table

8-1 Statistic of social traces ........ 164

PAGE 8

LIST OF FIGURES

Figure

1-1 The general framework for our adaptive community detection algorithm A. ........ 13
1-2 The classification of community detection algorithms in complex networks. ........ 17
2-1 Possible behaviors of the network community structure during evolution. ........ 28
2-2 NMI scores on synthesized networks with known communities ........ 41
2-3 Modularity values on synthesized networks with known communities ........ 42
2-4 Simulation results on Enron email network. ........ 45
2-5 Simulation results on arXiv e-print citation network. ........ 46
2-6 Simulation results on Facebook social network. ........ 47
3-1 Overlapped vs. non-overlapped community structures. ........ 52
3-2 Locating and merging local communities. ........ 55
3-3 A possible scenario when a new node is introduced. ........ 63
3-4 Possible scenarios when a new edge is introduced. ........ 65
3-5 Possible scenarios when an existing node is removed. ........ 67
3-6 Possible scenarios when an existing edge is removed. ........ 69
3-7 NMI scores for different values of . N = 5000 (top), N = 1000 (bottom), = 0.1 (left), = 0.3 (right). ........ 71
3-8 Comparison among AFOCS, COPRA and CFinder methods. N = 5000 (top), N = 1000 (bottom), = 0.1 (left), = 0.3 (right). ........ 72
3-9 Comparison among AFOCS, iLCD, FacetNet and OSLOM dynamic methods. ........ 76
3-10 The number of communities obtained by AFOCS, iLCD, FacetNet and OSLOM and OSLOMs methods. ........ 77
4-1 An illustrative example motivating NMF in community detection ........ 80
4-2 The partial derivative matrix of $HH^T$ with respect to $H_{ab}$. ........ 85
4-3 The partial derivative matrix of $HSH^T$ with respect to $H_{ab}$. ........ 91
4-4 Normalized Mutual Information scores on synthesized networks ........ 96
4-5 Number of communities on synthesized networks ........ 97

PAGE 9

4-6 Running Time on synthesized networks ........ 99
4-7 The number of communities, Internal density and Overlapping ratio of Enron email and Facebook-like datasets ........ 100
5-1 Experimental results on the Reality Mining dataset ........ 104
5-2 Experimental results on the Reality Mining dataset ........ 108
6-1 A general worm containment strategy. ........ 112
6-2 Infection rates on static network with k = 150 clusters ........ 114
6-3 Infection rates on dynamic network with k = 200 clusters ........ 115
6-4 OverCom patching scheme. ........ 119
6-5 Infection rates between four methods. ........ 120
7-1 Illustrations of stability function. ........ 128
7-2 Results on synthesized networks with different community criteria. ........ 138
7-3 Performance of SCD in detecting stable communities on real social traces. ........ 140
8-1 Comparison among different node selection strategies on synthesized networks with N = 2500 nodes ........ 159
8-2 Comparison among different node selection strategies on synthesized networks with N = 5000 nodes ........ 160
8-3 Results obtained by AFOCS on networks with N = 2500 nodes and N = 2500 nodes. ........ 162
8-4 NMI scores on Reality mining data, Foursquare and Facebook networks obtained by AFOCS (k = 50...1000) ........ 165
8-5 Simulation results on HAGGLE dataset. ........ 169
8-6 NMI measure on Haggle dataset. ........ 170

PAGE 10

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

COMMUNITY STRUCTURE AND ITS APPLICATIONS IN DYNAMIC COMPLEX NETWORKS

By Nam P. Nguyen

May 2013

Chair: My T. Thai
Major: Computer Engineering

In this dissertation, we focus on analyzing and understanding the organizational principles, assessing the structural vulnerability, and exploring practical applications of dynamic complex networks. In particular, we propose two adaptive frameworks for identifying the nonoverlapping and overlapping community structure in dynamic networks. Our approaches can not only update the network communities quickly and efficiently, but also trace the evolution of those communities over time. We also suggest a detection method based on nonnegative matrix factorization which can work on weighted and directed networks. Subsequently, we study the discovery of stable communities in the networks, i.e., communities which are tightly connected and remain strong even over a long period of time. Furthermore, we investigate the structural vulnerability of the network community structure by identifying key nodes that play an important role in maintaining the normal function of the whole system. This is a new research direction on the cyber-infrastructure that we have recently introduced. To certify the effectiveness of our suggested frameworks and algorithms, we extensively test them not only on synthesized networks but also on real-world dynamic traces. Finally, we demonstrate the wide applicability of our algorithms via realistic applications, such as limiting misinformation spread in Online Social Networks as well as social-based forwarding and routing strategies and worm containment in mobile networks.

PAGE 11

CHAPTER 1
INTRODUCTION

1.1 Community Detection in Dynamic Complex Networks

Many complex systems in reality exhibit community structure [37][85], i.e., they naturally divide into groups of vertices with denser connections inside each group and fewer connections crossing groups, where vertices and connections represent network users and their social interactions, respectively. Members of each community in a social network usually share things in common, such as interests in photography, movies, music or discussion topics, and thus tend to interact more frequently with each other than with members outside of their community. Community detection in a network is the gathering of network vertices into groups in such a way that nodes in each group are densely connected to one another and sparsely connected to the rest of the network.

It is worth differentiating between community detection and graph clustering. These two problems share the same objective of partitioning network nodes into groups; however, the number of clusters is predefined or given as part of the input in graph clustering, whereas the number of communities is typically unknown in community detection.

Detecting communities in a network provides meaningful insights into its internal structure as well as its organizational principles. Furthermore, knowing the structure of network communities can also offer helpful perspectives on some uncovered parts of the network, and thus helps in preventing potential networking diseases such as virus or worm propagation. Studies on community detection in static networks can be found in an excellent survey [58] as well as in the work of [76][6][78][8] and references therein.

Real-world complex networks, however, are not always static. In fact, most complex systems in reality (such as Facebook, Bebo and Twitter among OSNs) evolve and witness an expansion in size and space as their users increase, and thus lend themselves to the field of dynamic networks. A dynamic network is a special type of evolving complex

PAGE 12

networks in which changes are frequently introduced over time. In an online social network, such as Facebook, Twitter or Flickr, changes are usually introduced by users joining or withdrawing from one or more groups or communities, by friends of friends connecting together, or by new people making friends with each other. On the one hand, any one of these events seems to have only a small effect on the local structure of the network; on the other hand, the dynamics of the network over a long period of time may lead to a significant transformation of the network community structure, which raises a natural need for reidentification. However, the rapidly and unpredictably changing topology of a dynamic social network makes this an extremely complicated and challenging problem.

Although one can run any of the widely available static community detection methods [76][6][78][17] to find the new community structure whenever the network is updated, doing so has some disadvantages that cannot be neglected: (1) the long running time of a specific static method on large networks, (2) the trap of local optima, and (3) the almost identical reaction to a small change in some local part of the network. A better, much more efficient and less time-consuming way to accomplish this expensive task is to adaptively update the network communities from the previously known structures, which avoids the hassle of recomputing from scratch. This adaptive approach is the main focus of our study in this paper.

In Figure 1-1, we briefly generalize the idea of dynamic network community structure adaptation. Here, the network evolves from time $t$ to $t+1$ under the change $\Delta G_t$. The adaptive algorithm $A$ quickly finds the new community structure $C(G_{t+1})$ based on the previous structure $C(G_t)$ together with the changes $\Delta G_t$.

1.2 Nonnegative Matrix Factorization for Community Detection

Community identification on complex networks is a well-established field and many efficient graph-based methods have been introduced in the literature (see [32] for an excellent survey). Unfortunately, these methods exhibit a strong dependence on some local parts of the network topology as well as the implicit meaning and interpretation

PAGE 13

Figure 1-1. The general framework for our adaptive community detection algorithm A.

from the detected overlapping communities. Recently, NMF-based algorithms for detecting network communities have gained great attention due to their meaningful interpretation [102]. In general, given a nonnegative matrix $X \in \mathbb{R}^{m \times n}$ and a number $k$ bounded by $\min\{m, n\}$, an NMF problem asks for nonnegative matrices $W \in \mathbb{R}^{m \times k}$ and $H \in \mathbb{R}^{k \times n}$ such that $\|X - WH\|$ is minimized, where $\|\cdot\|$ is a cost function (usually the Frobenius distance or I-divergence). One notable property of NMF is its close relationship to K-means clustering and graph partitioning [67][24], which also closely relate to community identification.

A few attempts have been made along this line. Lin et al. proposed MetaFac [72], an NMF-based method for extracting community structure through relational hypergraphs. This method, however, is not capable of identifying overlapped structures. In [90], Psorakis et al. recently proposed an approach for finding overlapping communities using a Bayesian NMF based on hyperparameters. This method has the advantages of automatically determining the number of communities and not suffering from the resolution limit. Unfortunately, its built-in estimate of the number of communities could mislead the factorization into returning a bad solution. In [103], Wang et al. proposed NMF methods on the Frobenius norm with the capability of extracting overlapped structures. However, we find that these approaches do not appear to perform well on weighted directed networks, as shown in the experiments.

To overcome the above limitations, we introduce two NMF approaches, namely iSNMF and iANMF, for effectively identifying social network communities with meaningful

PAGE 14

interpretations. In particular, we are interested in approximating $X \approx HSH^T$ since this factorization provides $H$ as the community indicator matrix and $S$ as the community-interaction strength matrix, respectively. This factorization, as a result, nicely reflects the overlap of network communities and promises a meaningful community interpretation that is independent of the network topology.

1.3 Applications of The Network Community Structure

Detecting the community structure of a dynamic social network is of considerable use. To give a sense of it, consider the routing problem in a communication network where nodes and links represent people and mobile communications, respectively. Due to node mobility and the unstable link properties of the network, designing an efficient routing scheme is extremely challenging. However, since people have a natural tendency to form groups of communication, there exist groups of nodes in the underlying MANET which are more densely connected inside than outside, and these groups therefore form community structure in that MANET. An effective routing algorithm, as soon as it discovers the network community structure, can directly route or forward messages to nodes in the same community as the destination (or in a related one). In this way, we can avoid unnecessary message forwarding through nodes in different communities, and thus lower the number of duplicate messages as well as reduce the overhead information, both of which are essential in MANETs.

Another great example is worm containment in cellular networks [110] or in OSNs [81][82]. Nowadays, many social applications such as Facebook, Twitter and Foursquare are able to run on open-API enabled mobile devices like PDAs and iPhones. However, if such an application is infected with malicious software, such as worms or viruses, this openness will also make it easier for them to propagate. A possible solution to prevent worms from spreading more widely is to send patches to critical users and let them redistribute the patches to the others. Intuitively, the smaller the set of important users for sending patches, the better. But how can we effectively choose that set of minimal

PAGE 15

size? This is where community structure comes into the picture and helps. In particular, we show that selecting users in the boundaries of the overlapped nodes gives a tighter and more efficient set of influential users, and thus significantly lowers the number of sent patches as well as the overhead information, both of which are essential in cellular networks and OSNs.

1.4 The Identification of Stable Communities

OSNs in reality are highly dynamic, as social interactions on them tend to come and go quickly. Consequently, their communities are also dynamic and evolve heavily as the networks change over time. However, Palla et al. observe in their seminal work that some communities in social networks are tightly connected and remain strong even over a long period of time [86]. The authors also point out that large communities with high internal densities and fewer external distractions tend to remain stable during the network evolution, which intuitively agrees with the findings reported in [49]. These observations resemble the concept of stable communities in OSNs. For example, stable communities on Facebook can be visualized as groups of users who devote themselves to one particular interest such as movies, music or photography. Likewise, a stable community on Twitter can be illustrated by a group of users who may follow many accounts but are loyal only to a specific celebrity. From a different perspective, stable communities in a citation network may refer to well-established research topics in the field, whereas unstable communities may represent topical or recently arising research directions.

The discovery of these stable communities, as a consequence, will provide valuable insights into the core properties and characteristics of not only each community but also the network as a whole. This knowledge can further benefit information retrieval in OSNs, as searches can be redirected to stable communities sharing the most similar characteristics to the queries for more meaningful answers. For instance, well-established research topics in a citation network can be mined more effectively when one looks at its stable rather than unstable communities, as

PAGE 16

discussed above. However, the large-scale and nonreciprocal topologies of OSNs in reality make the detection of stable community structure an extremely challenging yet topical problem.

1.5 The Assessment of Network Community Structure Vulnerability

Complex systems, despite their diversity in physical infrastructures and underlying interactions, are extremely vulnerable to node attacks. In some scenarios, the failures of only a few key nodes are enough to bring the whole network operation down to its knees [25]. More importantly, this vulnerability can further be propagated to a wider population, leading to much more devastating consequences. In order to develop a comprehensive understanding of this type of attack, it is therefore important to understand not only the impact of node failures on the network components but also the inner dependencies and interdependencies among those components [88]. In particular, it is crucial to explore how the failure of a single node, or a set of nodes in general, can significantly change the structure of the network components, as well as how these components would affect each other in cases of attacks. However, the large scale and dynamic properties of complex systems in practice make this a complicated problem.

To tackle this problem, we introduce the use of network modules to study both the impact of node failures and the network component interdependency. There are several reasons and benefits behind this approach. First of all, investigating the interdependencies based on the topology of the underlying network structures is a major aspect that must be considered to understand the behavior of structural vulnerability [88]. Secondly, most complex networks commonly exhibit the modular property, or in other words, they contain community structure in their underlying organizations. That is, the network nodes can be gathered into groups in such a way that each group is densely connected internally and sparsely connected externally [38][75]. Nodes in each community usually share similar functions and characteristics that distinguish them from the others. In a broader view, communities display the whole network

PAGE 17

Figure 1-2. The classification of community detection algorithms in complex networks.

structure at a compact and more understandable level where a community may represent an entity or a functional group in the system. At this level, element failures in one community can have a profound impact which can consequently lead to changes in other communities. Therefore, identifying network elements that are essential to its community structure is a fundamental and extremely important issue. To the best of our knowledge, this research direction has not been addressed so far in the literature.

1.6 Literature Review

Community detection in dynamic networks

Community detection in complex networks is a well-established field and a tremendous number of identification methods have been proposed in the literature. Some notable directions include classical graph clustering algorithms [4][73], dynamic approaches [92], modularity optimization methods [75], statistical inference [79] and random walks for community detection [28] (see [32] and references therein for an excellent survey). In general, community structure detection algorithms for complex networks can be classified in different ways: either by nonoverlapping or overlapping detection

PAGE 18

algorithms, by static or dynamic algorithms, or by algorithms for directed and undirected networks, etc. Figure 1-2 describes a detailed classification of 16 different types of identification algorithms.

Community detection on static networks has attracted a lot of attention and many efficient methods have been proposed for this type of network. Detecting community structure on dynamic networks, however, has so far been an untrodden area. A recent work of Palla et al. [85] proposed an innovative method for detecting communities on dynamic networks based on the k-clique percolation technique. This approach can detect overlapping communities; however, it is time consuming, especially on large-scale networks. Another recent work of Zhang et al. [109] proposed a detection method based on contradicting the network topology and the topology-based propinquity, where propinquity is the probability of a pair of nodes being involved in a community. The work in [98] presented a parameter-free methodology for detecting clusters on time-evolving graphs based on mutual information and entropy functions from information theory. Hui et al. [48] proposed a distributed method for community detection in which modularity was used as a measure instead of an objective function. Apart from that, [44] attempted to track the evolution of communities over time, using a few static network snapshots. In [99], the authors present a framework for identifying dynamic communities with a constant factor approximation. However, this method does not seem practical for real-world social networks since it requires some predefined penalty costs which are generally unknown on dynamic networks.

In a recent work [26], Thang et al. proposed a social-aware routing strategy in MANETs which also makes use of a modularity-based procedure named MIEN for quickly updating the network structure. In particular, MIEN tries to compose and decompose network modules in order to keep up with the changes and uses the fast modularity algorithm [76] to update the network modules. However, this method performs slowly on large-scale dynamic networks due to the high complexity of [76].
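Several of the methods reviewed above rely on modularity, either as a quality measure or as the quantity being optimized. As a reminder of what that quantity is, the following is a minimal sketch, assuming the standard Newman-Girvan definition for an undirected, unweighted graph without self-loops; the cited methods may use weighted or extended variants. The function name `modularity` and the edge-list input format are illustrative choices, not taken from the dissertation.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman-Girvan modularity Q of a disjoint partition.

    edges        -- list of (u, v) pairs, each undirected edge listed once
    community_of -- dict mapping every node to its community label
    (Both input formats are assumptions made for this sketch.)
    """
    m = len(edges)
    if m == 0:
        return 0.0

    internal = defaultdict(int)    # number of edges inside each community
    degree_sum = defaultdict(int)  # total degree of the nodes of each community

    for u, v in edges:
        degree_sum[community_of[u]] += 1
        degree_sum[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += 1

    # Q = sum over communities c of  (l_c / m) - (d_c / 2m)^2
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Two triangles joined by a single bridge edge (2, 3).
example_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {node: "a" for node in range(6)}

q_split = modularity(example_edges, two_groups)
q_merged = modularity(example_edges, one_group)
```

In this toy graph the split into the two triangles scores higher than placing every node in one community, which is the behavior a modularity-based method exploits. Recomputing this value from scratch after every change is the cost that adaptive methods try to avoid.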


In [70], Lin et al. proposed FacetNet, a framework for analyzing communities in dynamic networks based on the optimization of snapshot costs. FacetNet is guaranteed to converge to a locally optimal solution; however, its convergence is slow, and it requires the number of network communities as input, which is usually unknown in practice. In [27], Duan et al. proposed Stream-Group, an incremental method to solve the community mining problem and detect change points in weighted dynamic graphs. This method is modularity-based and thus may inherit the resolution limit when discovering network communities. In another attempt, Kim et al. [52] suggested a particle-and-density based clustering method for dynamic networks, based on the extended modularity and the concepts of nano-community and l-quasi-clique-by-clique. Apart from that, the work of Cazabet et al. [9] proposed the iLCD method to find overlapping network communities by adding edges and then merging similar communities. However, this model might not be sufficient to handle the dynamic behaviors of the network when new nodes are introduced or removed, or when existing edges are removed from the network. In [60], the authors presented OSLOM, a framework for testing the statistical significance of a cluster with respect to a global null model (e.g., a random graph). To expand a community, OSLOM locally computes the value r for each neighbor node and tries to include that node in the current community.

Nonnegative matrix factorization for community detection

Community detection on complex networks is a mature research area and, besides NMF-based algorithms, many effective graph-based or topology-based algorithms have been proposed for this purpose. In general, detection methods can be classified into non-overlapping (disjoint) and overlapping algorithms. Traditional non-overlapping algorithms [75][17][77] may return good community identification results; however, they are not able to reveal overlapped network structures, particularly on social networks. In the other category, algorithms for graph-based and topology-based detection of overlapping communities have also been proposed in the literature. Most of them


are based on clique percolation [84] or clique extension [63] techniques, on the extended modularity [62][83], on a specific fitness function [56], on label propagation [40], or on link-based techniques [1]. See [32] and references therein for an excellent survey of these detection methods. Although the success of the aforementioned algorithms has been theoretically and empirically verified, they still expose the following limitations: (1) a strong dependence on some local parts of the network topology, e.g., the clique-percolation method depends on some dense subnetworks in order to percolate, a link-based technique relies on potential links with highest degrees, a modularity-based technique depends on the network hierarchy in order to maximize modularity, etc.; and (2) the implicit meaning and interpretation of the detected overlapping communities, e.g., what is the contribution of an overlapped node to these percolated cliques, or why would it even be there? These shortcomings drive the need for a better approach with a more meaningful interpretation.

Stable community detection

The discovery of stable communities, on the contrary, is still an untrodden area in which only a few attempts have been suggested [23][59][68]. This special property of network communities was perhaps first observed by Palla et al. in their seminal work [86], where they point out that tight-knit communities with high internal densities and fewer external distractions tend to remain strong over time, which resembles the concept of community stability. Delvenne et al. [23] extend this general concept to propose a measure, called the stability of the clustering r(t, H), to quantify how stable a given cluster (or community structure) H is at a specific time step t based on the Markov autocovariance model. Under this notation, a cluster H is stable at time t if a high value of r(t, H) is observed. This quantity, however, is more appropriate for verification than for identification of stable network communities, since it requires the time step t to be specified a priori.


In a different approach, Lancichinetti et al. [59] investigate the consensus of community detection methods. The authors report that, given a particular algorithm A, the consensus on communities found by A after multiple runs dramatically improves the quality of the detection, and hence suggest that those communities are candidates for stable structures. This is a very interesting approach; however, it might suffer from (1) expensive computational cost and running time, and (2) the lack of a convergence guarantee for the whole iterative process. In a recent attempt, Yanhua et al. [68] utilize the concept of mutual links and suggest a spectral-clustering-based identification method that tries to maximize the total mutual connections in order to find stable communities. However, some mutual links may be of low magnitude and thus do not significantly contribute to the overall stability at the community level.

Structural vulnerability assessment of community structure

Community structure and complex network vulnerability are two major and well-developed areas of networking research. Surveys on community structure detection algorithms as well as methods for assessing network vulnerabilities can be found in the work of Fortunato et al. [32] and Grubesic et al. [41], respectively. However, assessing the vulnerability of network community structure has so far been an untrodden area. A large body of work has been devoted to finding node roles within a community by a link-based technique together with a modification of node degree [95], by using the spectrum of the graph [105], by using a within-module degree and the participation coefficient [42], or by the detection of key nodes, overlapping communities, and date and party hubs [54]. However, none of these approaches discusses how the community structure would change upon the failure of those important nodes, especially in terms of the NMI measure. The vulnerability of network function and structure has been examined under node centrality metrics, such as high degree and betweenness centrality, as well as


under the average shortest path, which tries to signify the lengths of shortest distances between node pairs [41], under the pairwise connectivity metric, whose goal is to break the network's pairwise connectivity down to a certain level [25], or under the available number of compromised s-t flows [74], etc. However, there is an even more crucial risk that could dramatically affect the normal network functionality and that has not been addressed so far: the transformation or restructuring of the network community structure. Due to its vital role in the network, any significant restructuring or transformation of the community structure, resulting from the removal of important nodes, can potentially change the entire network organization and consequently lead to a malfunction or unpredictable corruption of the whole network.

1.7 Dissertation outline

In chapter 2, we propose QCA, a fast and adaptive method for efficiently identifying the nonoverlapping community structure of a dynamic social network. Our approach takes into account the discovered structures and processes network changes only, and thus significantly reduces computational cost and processing time. We study the dynamics of a social network and prove theoretical results regarding its communities' behaviors over time, which are the bases of our method. We extensively evaluate our algorithms on both synthesized and real dynamic social traces. Experimental results show that QCA achieves not only competitive modularity scores but also high quality community structures in a timely manner. We apply the QCA method to the worm containment problem in OSNs. Simulation results show that QCA outperforms currently available methods and confirm its applicability to social network problems.

In chapter 3, we suggest AFOCS, a two-phase adaptive framework for not only detecting and updating the overlapping network communities but also tracing their evolution over time. Theoretical analyses show that AFOCS partially achieves more than 74% of the internal density of the optimal solution. Second, we evaluate AFOCS on both synthesized and real traces in comparison to both the state-of-the-art and the


most popular static detection methods, COPRA and CFinder, as well as to the recent adaptive methods FacetNet, iLCD and OSLOM. Empirical results show that AFOCS achieves both competitive results and high quality community structures in a timely manner. Finally, with AFOCS, we suggest a community-based forwarding strategy for communication networks that reduces overhead information by up to 11x while maintaining competitive delivery time and ratio. We also propose a new social-aware patching scheme for containing worms in OSNs, which helps reduce the infection rates by up to 7x on the Facebook dataset.

We analyze two NMF approaches in chapter 4, namely iSNMF and iANMF, for effectively identifying social network communities with meaningful interpretations. In particular, we are interested in approximating $X \approx HSH^T$, since this factorization provides us with H as the foundation feature matrix and S as the feature interaction matrix. Alternatively, H and S can also be thought of as the community indicator and inter-community strength matrices, whose row elements can further be interpreted as probabilities of nodes belonging to different communities. This factorization, as a result, nicely reflects the overlap of network communities and promises a meaningful community interpretation that is independent of the network topology.

From an application perspective, we illustrate the practical applications of the network community structure via two emerging problems in social and mobile computing, namely the worm spread containment problem on online social networks (chapter 5) and the forwarding and routing strategy on mobile networks (chapter 6). We demonstrate that methods and strategies employing QCA and FOCS as community detection cores obtain a significant improvement in terms of performance and solution quality. These realistic applications highlight the wide applicability of the network community structure to many problems enabled by complex networks.

In chapter 7, we suggest an estimation which provides helpful insights into the stability of links in the input network. Based on that, we propose SCD, a framework


to identify community structure in directional OSNs with the advantage of community stability. We next explore an essential connection between the persistence probability of a community at the stationary distribution and its local topology, which is the fundamental mathematical theory supporting the SCD framework. To certify the efficiency of our approach, we extensively test SCD on both synthesized datasets with embedded communities and real-world social traces, including the NetHEPT and NetHEPT WC collaboration networks as well as Facebook social networks, in reference to the consensus of other state-of-the-art detection methods. Highly competitive empirical results confirm the quality and efficiency of SCD in identifying stable communities in OSNs.

In chapter 8, we introduce the CSV problem to assess the impact of node failures on the network community structure. To the best of our knowledge, this is the first attempt in this line of research. We analyze possible conditions that can lead to the minimization of NMI on network community structures. We suggest the concept of generating edges of a community and provide an optimal solution for finding a MGES. We propose genEdge, a node selection strategy for CSV based on the MGES solution. We conducted experiments on both synthesized data with known community structures and real-world traces. Empirical results reveal that genEdge outperforms other node selection strategies in terms of solution quality as well as in reference to different underlying community detection algorithms. From an application perspective, we demonstrate the critical importance of CSV via the forwarding and routing strategies in delay tolerant networks (DTNs), where the failures of some important devices significantly degrade the entire system's performance.

Finally, we summarize our contributions and conclude the dissertation in chapter 9.


CHAPTER 2
NONOVERLAPPING COMMUNITY STRUCTURE DETECTION

In this chapter, we present QCA, our proposed algorithm for detecting nonoverlapping community structure in a dynamic complex network. We first introduce the preliminaries in section 2.1 and then describe our QCA method in detail in section 2.2. Finally, the empirical evaluations of QCA on both synthesized and real datasets are presented in section 2.3.

2.1 Problem Definition

We first present the notation, the objective function, and the dynamic graph model representing a social network that we will use throughout this section.

(Notation) Let $G = (V, E)$ be an undirected unweighted graph with N nodes and M links representing a social network. Let $\mathcal{C} = \{C_1, C_2, .., C_k\}$ denote a collection of disjoint communities, where $C_i \in \mathcal{C}$ is a community of G. For each vertex u, denote by $d_u$, $C(u)$ and $NC(u)$ its degree, the community containing u, and the set of its adjacent communities, respectively. Furthermore, for any $S \subseteq V$, let $m_S$, $d_S$ and $e^u_S$ be the number of links inside S, the total degree of vertices in S, and the number of connections from u to S, respectively. The pairs of terms community and module, node and vertex, as well as edge and link are used interchangeably.

(Dynamic social network) Let $G^s = (V^s, E^s)$ be a time-dependent network snapshot recorded at time s. Denote by $\Delta V^s$ and $\Delta E^s$ the sets of vertices and links to be introduced (or removed) at time s, and let $\Delta G^s = (\Delta V^s, \Delta E^s)$ denote the change in terms of the whole network. The next network snapshot $G^{s+1}$ is the current one together with the changes, i.e., $G^{s+1} = G^s \cup \Delta G^s$. A dynamic network $\mathcal{G}$ is a sequence of network snapshots evolving over time: $\mathcal{G} = (G^0, G^1, .., G^s)$.

(Objective function) In order to quantify the goodness of a network community structure, we take into account the most widely accepted measure called modularity Q


[78], which is defined as
\[ Q = \sum_{C \in \mathcal{C}} \left( \frac{m_C}{M} - \frac{d_C^2}{4M^2} \right). \]
Basically, Q is the fraction of all links within communities minus the expected value of the same quantity in a graph whose nodes have the same degrees but whose links are distributed randomly; the higher the modularity Q, the better the network community structure. Therefore, our objective is to find a community assignment for each vertex in the network such that Q is maximized. Modularity, just like other quality measures for community identification, has certain disadvantages such as its non-locality and scaling behavior [8], or its resolution limit [35]. However, it is still very well regarded due to its robustness and usefulness, which closely agree with intuition on a wide range of real-world networks.

Problem Definition: Given a dynamic social network $\mathcal{G} = (G^0, G^1, .., G^s)$, where $G^0$ is the original network and $G^1, G^2, .., G^s$ are the network snapshots obtained through $\Delta G^1, \Delta G^2, .., \Delta G^s$, we need to devise an adaptive algorithm to efficiently detect and identify the network community structure at any time point, utilizing the information from the previous snapshots, as well as to trace the evolution of the network community structure.

2.2 Algorithm Description

Let us first discuss how changes to the evolving network topology affect the structure of its communities. We use the term intra-community links to denote edges whose two endpoints belong to the same community, and the term inter-community links to denote those whose endpoints connect different communities. For each community C, the connections linking C with other communities are much fewer than those within C itself, i.e., nodes in C are densely connected inside and less densely connected outside. Intuitively, adding intra-community links inside, or removing inter-community links between, communities of G will strengthen those communities and make the structure of G clearer. Conversely, removing intra-community links and inserting inter-community links will loosen the structure of G. The community updating process, as a result, is


challenging, since an insignificant change in the network topology can possibly lead to an unexpected transformation of its community structure. We will discuss in detail possible behaviors of dynamic network communities in Figure 2-1.

2-1A: New edge (u, v): u and v are first checked and memberships are then tested on X and Y.
2-1B: (a) The original community. (b) After the dotted edge is removed, two smaller communities arise.
2-1C: (a) The original four communities. (b) After the central node is removed, the leftover nodes join different modules, forming three new communities.
2-1D: (a) The original community. (b) When g is removed, a 3-clique is placed at a to discover b, c, d and e; f is assigned as a singleton afterwards.

In order to reflect changes introduced to the social network, its underlying graph is constantly updated by either inserting or removing a node or a set of nodes, or by either introducing or deleting an edge or a set of edges. In fact, the introduction or removal of a set of nodes (or edges) can be decomposed into a sequence of node (or edge) insertions (or removals), in which a single node (or a single edge) is introduced (or removed) at a time. This observation helps us treat network changes as a collection of simple events, where a simple event can be one of newNode, removeNode, newEdge, removeEdge, whose details are as follows:

newNode ($V \cup \{u\}$): A new node u with its associated edges is introduced. u could come with zero or more new edges.
removeNode ($V \setminus \{u\}$): A node u and its adjacent edges are removed from the network.
newEdge ($E \cup \{e\}$): A new edge e connecting two existing nodes is introduced.
removeEdge ($E \setminus \{e\}$): An existing edge e in the network is removed.

Our approach first requires an initial community structure $\mathcal{C}_0$, which we call the basic structure, in order to process further. Since the input model is restricted to an undirected unweighted network, this initial community structure can be obtained by performing any of the available static community detection methods [76][6][17]. To


Figure 2-1. Possible behaviors of the network community structure during evolution (panels A, B, C, D).

obtain a good basic structure, we choose the method proposed by Blondel et al. in [6], which produces a good network community structure in a timely manner [58].

2.2.1 New Node

Let us consider the first case, in which a new node u and its associated connections are introduced. Note that u may come with no adjacent edges or with many of them connecting one or more communities. If u has no adjacent edge, we create a new community for it and leave the current structure intact. The interesting case happens, and it usually does, when u comes with edges connecting one or more existing


communities. In this latter situation, we need to determine which community u should join in order to maximize the gained modularity. There are several local methods introduced for this task, for instance the algorithms of [76][17]. Our method is inspired by a physical approach proposed in [107], in which each node is influenced by two forces: $F^C_{in}$ (the force that keeps u inside community C) and $F^C_{out}$ (the force a community C exerts in order to bring u to C), defined as follows:
\[ F^C_{in}(u) = e^u_C - \frac{d_u (d_C - d_u)}{2M}, \]
and
\[ F^S_{out}(u) = \max_{S \in NC(u)} \left\{ e^u_S - \frac{d_u \, d^{out}_S}{2M} \right\}, \]
where $d^{out}_S$ has the opposite meaning of $d_S$. Taking into account the above two forces, a node v can actively determine its best community membership by computing those forces and either joining the community S having the highest $F^S_{out}(v)$ (if $F^S_{out}(v) > F^{C(v)}_{in}(v)$) or staying in the current community C(v) otherwise. By Theorem 2.1, we bridge the connection between those forces and the objective function, i.e., joining the new node to the community with the highest outer force will maximize the local gained modularity. The process is presented in Alg. 1.

Theorem 2.1. Let C be the community having the maximum $F^C_{out}(u)$ when a new node u with degree p is added to G; then joining u to C gives the maximal gained modularity.

Proof. Let D be a community of G with $D \ne C$. We show that joining u to D contributes less modularity than joining u to C. The overall modularity Q when u joins C is
\[ Q_1 = \frac{m_C + e^u_C}{M+p} - \frac{(d_C + e^u_C + p)^2}{4(M+p)^2} + \frac{m_D}{M+p} - \frac{(d_D + e^u_D)^2}{4(M+p)^2} + A, \]


Algorithm 1 New Node
Input: New node u with associated links; current structure $\mathcal{C}_t$.
Output: An updated structure $\mathcal{C}_{t+1}$.
1: Create a new community of only u;
2: for $v \in N(u)$ do
3:   Let v determine its best community;
4: end for
5: for $C \in NC(u)$ do
6:   Find $F^C_{out}(u)$;
7: end for
8: if $\max_C F^C_{out}(u) > F^{C_u}_{in}(u)$ then
9:   Let $C_u \leftarrow \arg\max_C \{F^C_{out}(u)\}$;
10:  Update $\mathcal{C}_{t+1}$: $\mathcal{C}_{t+1} \leftarrow (\mathcal{C}_t \setminus C_u) \cup (C_u \cup u)$;
11: end if

where A is the summation of the other modularity contributions. Similarly, joining u to D gives
\[ Q_2 = \frac{m_C}{M+p} - \frac{(d_C + e^u_C)^2}{4(M+p)^2} + \frac{m_D + e^u_D}{M+p} - \frac{(d_D + e^u_D + p)^2}{4(M+p)^2} + A, \]
and
\[ Q_1 - Q_2 = \frac{1}{M+p} \left( e^u_C - e^u_D + \frac{p\,(d_D - d_C + e^u_D - e^u_C)}{2(M+p)} \right). \]
Now, since C is the community that gives the maximum $F^C_{out}(u)$, we obtain
\[ e^u_C - \frac{p\,(d_C + e^u_C)}{2(M+p)} > e^u_D - \frac{p\,(d_D + e^u_D)}{2(M+p)}, \]
which implies
\[ e^u_C - e^u_D + \frac{p\,(d_D - d_C + e^u_D - e^u_C)}{2(M+p)} > 0. \]
Hence $Q_1 - Q_2 > 0$, and thus the conclusion follows.

2.2.2 New Edge

When a new edge e = (u, v) connecting two existing vertices u, v is introduced, we divide it further into two subcases: e is an intra-community link (totally inside a community C) or an inter-community link (connecting two communities C(u) and C(v)). If e is inside a community C, its presence will strengthen the inner structure of C


according to Lemma 1. Furthermore, by Lemma 2, we know that adding e should not split the current community C into smaller modules. Therefore, we leave the current network structure intact in this case. The interesting situation occurs when e is a link connecting communities C(u) and C(v), since its presence could possibly make u (or v) leave its current module and join the new community. Additionally, if u (or v) decides to change its membership, it can advertise its new community to all its neighbors, and some of them might eventually want to change their memberships as a consequence. By Lemma 3, we show that should u (or v) ever change its community assignment, C(v) (or C(u)) is the best new community for it. But how can we quickly decide whether u (or v) should change its membership in order to form a better community structure with higher modularity? To this end, we provide a criterion to test for membership changes of u and v in Theorem 2.2. Here, if both $\Delta q_{u,C,D}$ and $\Delta q_{v,C,D}$ fail to satisfy the criterion, we can safely preserve the current network community structure (Corollary 1). Otherwise, we move u (or v) to its new community and consequently let its neighbors determine their best modules to join, using local search and swapping to maximize the gained modularity. Figure 2-1A describes the procedure for this latter case. The detailed algorithm is described in Alg. 2.

Lemma 1. For any $C \in \mathcal{C}$, if $d_C \le M - 1$ then adding an edge within C will increase its modularity contribution.

Proof. The portion $Q^1_C$ that community C contributes to the overall modularity Q is
\[ Q^1_C = \frac{m_C}{M} - \frac{d_C^2}{4M^2}. \]
When a new edge comes in, the new modularity contribution $Q^2_C$ is
\[ Q^2_C = \frac{m_C + 1}{M + 1} - \frac{(d_C + 2)^2}{4(M+1)^2}. \]
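The two contributions written out above, before and after an intra-community edge is added, can be compared numerically. The following is only an illustrative sketch, not part of QCA: the function names are hypothetical, exact rational arithmetic is used to avoid rounding, and the brute-force range assumes $m_C \le d_C / 2$ in addition to the lemma's condition $d_C \le M - 1$.

```python
from fractions import Fraction


def contribution(m_c, d_c, M):
    """Modularity contribution m_C/M - d_C^2/(4 M^2) of one community."""
    return Fraction(m_c, M) - Fraction(d_c * d_c, 4 * M * M)


def gain_from_intra_edge(m_c, d_c, M):
    """Q^2_C - Q^1_C when one edge is added inside C.

    Adding the edge raises m_C by 1, d_C by 2 and M by 1.
    """
    return contribution(m_c + 1, d_c + 2, M + 1) - contribution(m_c, d_c, M)


def lemma1_holds_up_to(max_M):
    """Brute-force check (hypothetical helper) over small parameter values.

    Covers every M <= max_M with d_C <= M - 1 and m_C <= d_C / 2 and
    reports whether the gain is ever negative.
    """
    for M in range(1, max_M + 1):
        for d_c in range(0, M):                  # d_C <= M - 1
            for m_c in range(0, d_c // 2 + 1):   # m_C <= d_C / 2
                if gain_from_intra_edge(m_c, d_c, M) < 0:
                    return False
    return True
```

For instance, `gain_from_intra_edge(3, 7, 7)` evaluates the difference for a community with 3 internal links and total degree 7 in a 7-link network.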


Algorithm 2 New Edge
Input: Edge {u, v} to be added; current structure $\mathcal{C}_t$.
Output: An updated structure $\mathcal{C}_{t+1}$.
1: if (u and v $\notin V$) then
2:   $\mathcal{C}_{t+1} \leftarrow \mathcal{C}_t \cup \{u, v\}$;
3: else if $C(u) \ne C(v)$ then
4:   if $\Delta q_{u,C(u),C(v)} < 0$ and $\Delta q_{v,C(u),C(v)} < 0$ then
5:     return $\mathcal{C}_{t+1} \equiv \mathcal{C}_t$;
6:   else
7:     $w = \arg\max \{\Delta q_{u,C(u),C(v)}, \Delta q_{v,C(u),C(v)}\}$;
8:     Move w to the new community;
9:     for $t \in N(w)$ do
10:      Let t determine its best community;
11:    end for
12:    Update $\mathcal{C}_{t+1}$;
13:  end if
14: end if

Now, taking the difference between $Q^2_C$ and $Q^1_C$ gives
\begin{align*}
\Delta Q_C = Q^2_C - Q^1_C &= \frac{4M^3 - 4 m_C M^2 - 4 d_C M^2 - 4 m_C M + 2 d_C^2 M + d_C^2}{4(M+1)^2 M^2} \\
&\ge \frac{4M^3 - 6 d_C M^2 - 2 d_C M + 2 d_C^2 M + d_C^2}{4(M+1)^2 M^2} \quad \left(\text{since } m_C \le \frac{d_C}{2}\right) \\
&= \frac{(2M^2 - 2 d_C M - d_C)(2M - d_C)}{4(M+1)^2 M^2} \ge 0.
\end{align*}
The last inequality holds since $d_C \le M - 1$ implies $2M^2 - 2 d_C M - d_C \ge 0$ (and $2M - d_C > 0$).

Lemma 2. If C is a community in the current snapshot of G, then adding any intra-community link to C should not split it into smaller modules.

Proof. Assume the contrary, i.e., C should be divided into smaller modules when an edge is added into it. Let $X_1, X_2, .., X_k$ be disjoint subsets of C representing these modules. Let $d_i$ and $e_{ij}$ be the total degree of vertices inside $X_i$ and the number of links going from $X_i$ to $X_j$, respectively. Assume, w.l.o.g., that when an edge is added inside


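C, it is added to $X_1$. We will show that $\sum_{i \ne j} \frac{d_i d_j}{2M}$ [...] $\sum_{i=1}^{k} Q_{X_i}$, or equivalently,
\[ \frac{m_C}{M} - \frac{d_C^2}{4M^2} > \sum_{i=1}^{k} \left( \frac{m_i}{M} - \frac{d_i^2}{4M^2} \right). \]
Since $X_1, X_2, .., X_k$ are disjoint subsets of C, it follows that $d_C = \sum_{i=1}^{k} d_i$ and $m_C = \sum_{i=1}^{k} m_i + \sum_{i<j} e_{ij}$ [...] $\frac{d_C^2}{4M^2} - \sum_{i=1}^{k} \frac{d_i^2}{4M^2}$, or $\sum_{i<j} e_{ij} > \sum_{i<j} \frac{d_i d_j}{2M}$ [...]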

i.e.,Q2C

decreases for all $S \notin \{C, D\}$.
\begin{align*}
F^D_{out}(u)_{new} - F^D_{out}(u)_{old} &= \left( e^u_D + 1 - \frac{(d_u + 1)(d^{out}_D + 1)}{2(M+1)} \right) - \left( e^u_D - \frac{d_u d^{out}_D}{2M} \right) \\
&= \frac{2M + d_u d^{out}_D}{2M} - \frac{d_u d^{out}_D + d^{out}_D + d_u + 1}{2(M+1)} \\
&\ge \frac{2M + d_u d^{out}_D}{2(M+1)} - \frac{d_u d^{out}_D + d^{out}_D + d_u + 1}{2(M+1)} > 0,
\end{align*}
and thus $F^D_{out}(u)$ is strengthened when (u, v) is introduced. Furthermore, for any community $S \in \mathcal{C}$ with $S \notin \{C, D\}$, we have
\begin{align*}
F^S_{out}(u)_{new} - F^S_{out}(u)_{old} &= \left( e^u_S - \frac{(d_u + 1)\, d^{out}_S}{2(M+1)} \right) - \left( e^u_S - \frac{d_u d^{out}_S}{2M} \right) \\
&= d^{out}_S \left( \frac{d_u}{2M} - \frac{d_u + 1}{2(M+1)} \right) < 0,
\end{align*}
which implies $F^S_{out}(u)$ is weakened when (u, v) is connected. Hence, the conclusion follows.

Theorem 2.2. Assume that a new edge (u, v) is added to the network. Let $C \equiv C(u)$ and $D \equiv C(v)$. If
\[ \Delta q_{u,C,D} \equiv 4(M+1)(e^u_D + 1 - e^u_C) + e^u_C (2 d_D - 2 d_u - e^u_C) - 2(d_u + 1)(d_u + 1 + d_D - d_C) > 0 \]
then joining u to D will increase the overall modularity.

Proof. Node u should leave its current community C and join D if $Q_{D+u} + Q_{C-u} > Q_C + Q_D$, or equivalently,
\begin{align*}
& \frac{m_D + e_D + 1}{M+1} - \frac{(d_D + d_u + 2)^2}{4(M+1)^2} + \frac{m_C - e_C}{M+1} - \frac{(d_C - d_u - e_C)^2}{4(M+1)^2} \\
& \quad > \frac{m_D}{M+1} - \frac{(d_D + 1)^2}{4(M+1)^2} + \frac{m_C}{M+1} - \frac{(d_C + 1)^2}{4(M+1)^2}.
\end{align*}


The above inequality is equivalent to
\[ 4(M+1)(e_D + 1 - e_C) + e_C (2 d_D - 2 d_u - e_C) - 2(d_u + 1)(d_u + 1 + d_D - d_C) > 0, \]
which concludes the theorem.

Corollary 1. If the condition in Theorem 2.2 is not satisfied, then neither u nor its neighbors should be moved to D.

Proof. The proof follows from Theorem 2.2.

2.2.3 Node Removal

When an existing node u in a community C is removed, all of its adjacent edges are removed as a result. This case is challenging in the sense that the resulting community is very complicated: it can be either unchanged or broken into smaller pieces, and could probably be merged with other communities. Let us consider two extreme cases: when a single-degree node is removed, and when a node with the highest degree in a community is removed. If a single-degree node is removed, it leaves the resulting community unchanged (Lemma 5). However, when a highest-degree vertex is removed, the current community might be disconnected and broken into smaller pieces, which are then merged into other communities as depicted in Figure 2-1C. Therefore, identifying the leftover structure of C is a crucial part once a vertex in C is removed. To quickly and efficiently handle this task, we utilize the clique percolation method presented in [85]. In particular, when a vertex u is removed from C, we place a 3-clique at one of its neighbors and let the clique percolate until no more vertices in C are discovered (Figure 2-1D). We then let the remaining communities of C choose their best communities to merge into. The detailed algorithm is presented in Alg. 3.

2.2.4 Edge Removal

In the last case, when an edge e = (u, v) is removed, we divide further into four subcases: (1) e is a single edge connecting only u and v, (2) either u or v has


Algorithm 3 Node Removal
Input: Node $u \in C$ to be removed; current structure $\mathcal{C}_t$.
Output: An updated structure $\mathcal{C}_{t+1}$.
1: $i \leftarrow 1$;
2: while $N(u) \ne \emptyset$ do
3:   $S_i$ = {Nodes found by a 3-clique percolation on $v \in N(u)$};
4:   if $S_i == \emptyset$ then
5:     $S_i \leftarrow \{v\}$;
6:   end if
7:   $N(u) \leftarrow N(u) \setminus S_i$;
8:   $i \leftarrow i + 1$;
9: end while
10: Let each singleton in N(u) consider its best communities;
11: Let each $S_i$ consider its best communities as in [6];
12: Update $\mathcal{C}_t$;

degree one, (3) e is an inter-community link connecting C(u) and C(v), and (4) e is an intra-community link. If e is a single edge, its removal will result in the same community structure plus two singletons, u and v themselves. The same reaction applies to the second subcase, when either u or v has degree one, due to Lemma 5, and thus results in the prior network structure plus u (or v). When e is an inter-community link, the removal of e will strengthen the current network communities (Lemma 4), and hence we make no change to the overall network structure. The last but most complicated case happens when an intra-community link is deleted. As depicted in Figure 2-1B, removing this kind of edge often leaves the community unchanged if the community itself is densely connected; however, the target module will be divided if it contains substructures which are less attractive or loosely connected to each other. Therefore, the problem of identifying the structure of the remaining modules is important. Theorem 2.3 provides us a convenient tool to test for community bi-division when an intra-community link is removed from the host community C. However, it requires an intensive look at all subsets of C, which may be time consuming when C is big. Note that prior to the removal of (u, v), the community C hosting this link should contain dense connections within itself, and thus the removal


of (u, v) should leave some sort of `quasi-clique' structure [85] inside C. Therefore, we find all maximal quasi-cliques within the current community and have them (as well as leftover singletons) determine their best communities to join. The detailed procedure is described in Alg. 4.

Algorithm 4 Edge Removal
Input: Edge (u, v) to be removed; current structure $\mathcal{C}_t$.
Output: An updated clustering $\mathcal{C}_{t+1}$.
1: if (u, v) is a single edge then
2:   $\mathcal{C}_{t+1} = (\mathcal{C}_t \setminus \{u, v\}) \cup \{u\} \cup \{v\}$;
3: else if either u (or v) is of degree one then
4:   $\mathcal{C}_{t+1} = (\mathcal{C}_t \setminus C(u)) \cup \{u\} \cup \{C(u) \setminus u\}$;
5: else if $C(u) \ne C(v)$ then
6:   $\mathcal{C}_{t+1} = \mathcal{C}_t$;
7: else
8:   % Now (u, v) is inside a community C %
9:   L = {Maximal quasi-cliques in C};
10:  Let the singletons in $C \setminus L$ consider their best communities;
11: end if
12: Update $\mathcal{C}_{t+1}$;

Lemma 4. If C and D are two communities of G, then the removal of an inter-community link connecting them will strengthen the modularity contributions of both C and D.

Proof. Let $Q^1_C$ (resp. $Q^1_D$) and $Q^2_C$ (resp. $Q^2_D$) be the modularities of C (resp. D) before and after the removal of that link. We show that $Q^2_C > Q^1_C$ (and similarly, $Q^2_D > Q^1_D$), and thus C and D contribute higher modularities to the network.
\begin{align*}
Q^2_C - Q^1_C &= \left( \frac{m_1}{M-1} - \frac{(d_1 - 1)^2}{4(M-1)^2} \right) - \left( \frac{m_1}{M} - \frac{d_1^2}{4M^2} \right) \\
&= m_1 \left( \frac{1}{M-1} - \frac{1}{M} \right) + \frac{1}{4} \left( \frac{d_1}{M} - \frac{d_1 - 1}{M-1} \right) \left( \frac{d_1}{M} + \frac{d_1 - 1}{M-1} \right).
\end{align*}
Since all terms are positive, $Q^2_C - Q^1_C > 0$. The same technique applies to show that $Q^2_D > Q^1_D$.

Lemma 5. The removal of (u, v) inside a community C where only u or v is of degree one will not separate C.
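The case analysis at the top of Alg. 4 can be sketched as a small dispatch routine. This is a hedged illustration only: the function name, the case labels, and the adjacency-set and community-map data layout are assumptions made for the example, "single edge" is read as both endpoints having degree one, and the quasi-clique search of the last case is not implemented.

```python
def classify_edge_removal(adj, community_of, u, v):
    """Return which subcase of Alg. 4 applies to removing edge (u, v).

    adj          -- dict: node -> set of neighbours, BEFORE the removal
    community_of -- dict: node -> community label
    The returned strings are hypothetical labels, checked in the same
    order as the branches of Alg. 4.
    """
    if v not in adj[u]:
        raise ValueError("edge (u, v) is not in the graph")
    du, dv = len(adj[u]), len(adj[v])
    if du == 1 and dv == 1:
        # (u, v) is a single edge: u and v both become singletons.
        return "single_edge"
    if du == 1 or dv == 1:
        # Only the degree-one endpoint is split off as a singleton.
        return "degree_one_endpoint"
    if community_of[u] != community_of[v]:
        # Inter-community link: the structure is left unchanged.
        return "inter_community"
    # Intra-community link: the host community must be re-examined
    # (maximal quasi-cliques in Alg. 4; not implemented here).
    return "intra_community"


def build_adjacency(edges):
    """Helper for the example: build adjacency sets from an edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj
```

A caller would typically build `adj` once with `build_adjacency`, classify the edge, and only then delete it from the graph, since the degrees are read before the removal.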


Proof. Assume the contrary, i.e., after the removal of (u, v) where $d_u = 1$, C is broken into smaller communities $X_1, X_2, ..., X_k$ which contribute higher modularity: $Q_{X_1} + ... + Q_{X_k} > Q_C$. W.l.o.g., suppose u was connected to $X_1$ prior to its removal. It follows that $Q_{X_1 + u} > Q_{X_1}$ and thus $Q_{X_1 + u} + ... + Q_{X_k} > Q_C$, which raises a contradiction since C is originally a community of $\mathcal{C}$.

Lemma 6. (Separation of a community) Let $C_1 \subseteq C$ and $C_2 = C \setminus C_1$ be two disjoint subsets of C. $(\mathcal{C} \setminus C) \cup \{C_1, C_2\}$ is a community structure with higher modularity when an edge crossing $C_1$ and $C_2$ is removed, i.e., C should be separated into $C_1$ and $C_2$, if and only if $e_{12}$ [...] $> \frac{e_{12} - 1}{M - 1}$
\begin{align*}
&\Leftrightarrow \frac{(d_1 + d_2 - 2)^2}{4(M-1)^2} - \frac{(d_1 - 1)^2}{4(M-1)^2} - \frac{(d_2 - 1)^2}{4(M-1)^2} > \frac{m_1 + m_2 + e_{12} - 1}{M-1} - \frac{m_1}{M-1} - \frac{m_2}{M-1} \\
&\Leftrightarrow \frac{m_1}{M-1} - \frac{(d_1 - 1)^2}{4(M-1)^2} + \frac{m_2}{M-1} - \frac{(d_2 - 1)^2}{4(M-1)^2} > \frac{m_1 + m_2 + e_{12} - 1}{M-1} - \frac{(d_1 + d_2 - 2)^2}{4(M-1)^2} \\
&\Leftrightarrow q_1 + q_2 > q_C.
\end{align*}
Thus, the conclusion follows.

Theorem 2.3.
(Communitybi-division)ForanycommunityC,letandbethelowestandthesecondhighestdegreeofverticesinC,respectively.AssumethatanedgeeisremovedfromC.IftheredonotexistsubsetsC1CandC2CnC1suchthateiscrossingC1andC2andminf(dC)]TJ /F15 7.97 Tf 6.59 0 Td[(),(dC)]TJ /F15 7.97 Tf 6.58 0 Td[()g 2M

Proof. From Lemma 6, it follows that in order to really benefit the overall modularity we must have $\frac{d_1 d_2}{2M}$

Figure 2-2. NMI scores on synthesized networks with known communities. Panels: (A) N = 1000, μ = 0.1; (B) N = 1000, μ = 0.3; (C) N = 5000, μ = 0.1; (D) N = 5000, μ = 0.3.

2.3 Experimental Results

In this section, we first validate our approaches on different synthesized networks with known ground truths, and then present our findings on real-world traces including the Enron email [98], arXiv e-print citation [22], and Facebook social networks [100]. To certify the performance of our algorithms, we compare QCA to other adaptive community detection methods including (1) the MIEN algorithm proposed by Thang et al. [26], (2) the FacetNet framework proposed by Lin et al. [70], and (3) the OSLOM method suggested by Lancichinetti et al. [60].
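Since the evaluation scores each detected structure by the modularity Q of Section 2.1, a minimal sketch of that computation may help when reproducing such numbers. It assumes an undirected, unweighted edge list and a node-to-community map; the function name and data layout are illustrative choices, not the implementation used in the experiments.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Q = sum over communities C of m_C / M - d_C^2 / (4 M^2).

    edges        -- list of undirected edges (u, v), each listed once
    community_of -- dict: node -> community label (disjoint communities)
    """
    M = len(edges)
    if M == 0:
        return 0.0
    m = defaultdict(int)  # m_C: number of links with both endpoints in C
    d = defaultdict(int)  # d_C: total degree of the vertices in C
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        d[cu] += 1
        d[cv] += 1
        if cu == cv:
            m[cu] += 1
    return sum(m[c] / M - d[c] ** 2 / (4 * M * M) for c in d)


# Example input: two triangles joined by a single bridge edge.
example_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {n: "all" for n in range(6)}
```

Calling `modularity(example_edges, two_groups)` scores the natural split into the two triangles, while `modularity(example_edges, one_group)` scores the trivial single-community assignment.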

PAGE 42

Figure 2-3. Modularity values on synthesized networks with known communities. Panels: (A) $N = 1000$, $\mu = 0.1$; (B) $N = 1000$, $\mu = 0.3$; (C) $N = 5000$, $\mu = 0.1$; (D) $N = 5000$, $\mu = 0.3$.

2.3.1 Results on Synthesized Networks

Of course, the best way to evaluate our approaches is to validate them on real networks with known community structures. Unfortunately, we often do not know those structures beforehand, or such structures cannot be easily mined from the network topology. Although synthesized networks might not reflect all the statistical properties of real ones, they provide us with known ground truths via planted communities, and the ability to vary other parameters such as sizes, densities, and overlapping levels. Testing community detection methods on generated data has become a usual practice that is widely accepted in the field [58]. Hence, comparing QCA with other dynamic methods

PAGE 43

on synthesized networks not only certifies its performance but also gives us confidence in its behavior on real-world traces.

Setup. We use the well-known LFR benchmark [58] to generate 40 networks with 10 snapshots. Parameters are: the number of nodes $N = \{1000, 5000\}$ and the mixing parameter $\mu = \{0.1, 0.3\}$ controlling the overall sharpness of the community structure. In order to quantify the similarity between the identified communities and the ground truth, we adopt a well-known measure in Information Theory called Normalized Mutual Information (NMI). NMI has been proven to be reliable and is currently used in testing community detection algorithms [58]. Basically, NMI(U, V) equals 1 if structures U and V are identical and equals 0 if they are totally separated, and the higher the NMI the better.

Results. The NMI and modularity values are reported in Figures 2-2 and 2-3. As depicted in their subfigures, the NMI values and modularities achieved by our QCA method are, in general, very high and competitive with those of OSLOM, while they are much better than those produced by the MIEN and FacetNet methods. On these generated networks, we observe that MIEN and FacetNet perform well when the mixing parameter $\mu$ is small, i.e., when the network community structures are clear; however, their performance degrades dramatically when these structures become less clear as $\mu$ gets larger. Particularly, MIEN's and FacetNet's NMI scores and modularities in all test cases are fairly low, usually from 10% to 50% and 5% to 15% worse than those produced by QCA. This implies that the network communities revealed by these methods are not as similar to the ground truth as those found by the QCA algorithm. On the generated networks, the OSLOM algorithm performs very well, as suggested by its high NMI scores and modularity values. In particular, OSLOM tends to perform better than QCA in the first couple of network snapshots; however, it is overtaken by QCA as the networks evolve over time, especially at the end of the evolution, where OSLOM reveals big gaps in similarity to the planted network communities (note that the higher the NMI score at the end of the evolution, the better the final detected community structure). This

PAGE 44

concludes that the network communities discovered by QCA are the most similar to those planted in the ground truth in comparison with the other methods.

2.3.2 Results on Real World Traces

We next present the results of the QCA algorithm on real-world dynamic social networks including the Enron email [98], arXiv e-print citation [22], and Facebook networks [100]. Due to the lack of appropriate communities corresponding to these traces, we report the performance of the aforementioned algorithms in reference to the static method proposed by Blondel et al. [6]. In particular, we show the following quantities: (1) modularity values, (2) the quality of the identified network communities through NMI scores, and (3) the processing time of QCA in comparison with the other methods. The above networks appear to contain strong community structures, as indicated by their high modularities, which was the main reason for choosing them.

For each network, time information is first extracted and a portion of the network data (usually the first snapshot) is then collected to form the basic network community structure. Our QCA method (as well as MIEN and OSLOM) takes that basic community structure into account and runs on the network changes, whereas the static method has to be performed on the whole network snapshot at each time point. In this experiment, the FacetNet method does not appear to complete the tasks in a timely manner, and is thus excluded from the plots.

Enron email network

Data. The Enron email network contains email message data from about 150 users, mostly senior management of Enron Inc., from January 1999 to July 2002 [98]. Each email address is represented by a unique ID in the dataset and each link corresponds to a message between the sender and the receiver. After a data refinement process, we choose 50% of the total links to form a basic community structure of the network with 7 major communities, and simulate the network evolution via a series of 21 growing snapshots.
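The NMI measure used throughout these experiments can be sketched in a few lines. The version below is the standard variant for disjoint labelings, NMI(U, V) = 2 I(U; V) / (H(U) + H(V)); whether the experiments use exactly this variant or the overlapping extension distributed with the LFR benchmark is an assumption here, and the function name `nmi` is illustrative only.

```python
import math
from collections import Counter


def nmi(labels_u, labels_v):
    """NMI between two disjoint labelings of the same node list.

    labels_u[i] and labels_v[i] are the community labels of node i
    in the two structures being compared.
    """
    n = len(labels_u)
    if n == 0 or n != len(labels_v):
        raise ValueError("labelings must be non-empty and of equal length")

    count_u = Counter(labels_u)
    count_v = Counter(labels_v)
    joint = Counter(zip(labels_u, labels_v))

    # Mutual information I(U; V) from the joint label counts.
    mutual = 0.0
    for (a, b), n_ab in joint.items():
        mutual += (n_ab / n) * math.log(n_ab * n / (count_u[a] * count_v[b]))

    # Entropies H(U) and H(V).
    h_u = -sum((c / n) * math.log(c / n) for c in count_u.values())
    h_v = -sum((c / n) * math.log(c / n) for c in count_v.values())

    if h_u + h_v == 0.0:
        # Both labelings put every node in one community: identical.
        return 1.0
    return 2.0 * mutual / (h_u + h_v)
```

Relabeling the communities does not change the score, so two structures that group the nodes identically score 1 regardless of the label names, while statistically independent groupings score 0.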

PAGE 45

Figure 2-4. Simulation results on the Enron email network. Panels: (A) Modularity; (B) Number of communities; (C) Running time (s); (D) NMI.

Results. We first evaluate the modularity values computed by the QCA, MIEN, OSLOM, and Blondel methods. As shown in Figure 2-4A, our QCA algorithm achieves competitively higher modularities than the static method, a little lower than those of MIEN, and far better than those obtained by OSLOM. Moreover, QCA also succeeds in maintaining the same numbers of communities as the other two methods, MIEN and Blondel, while OSLOM's are vague (Figure 2-4B). In particular, the modularity values produced by QCA approximate those found by the static method very well, with less variation. There are reasons for that. Recall that our QCA algorithm takes into account the basic community structures detected by the static method (at the first snapshot) and processes network changes only. Knowing the basic network community structure

PAGE 46

Figure 2-5. Simulation results on the arXiv e-print citation network. Panels: (A) Modularity; (B) Number of communities; (C) Running time (s); (D) NMI.

is a great advantage of our QCA algorithm: it can avoid the hassle of searching and computing from scratch to update the network with changes. In fact, QCA uses the basic structure for finding and quickly updating the local optimal communities to adapt to changes introduced during the network evolution.

The running times of QCA and the static method on this small network are relatively close: the static method requires one second to complete each of its tasks while QCA needs less than one (Figure 2-4C). On this dataset, MIEN and OSLOM require a little more time (1.5 and 2.4 seconds on average, respectively) to complete their tasks. Time and computational cost are significantly reduced in QCA

PAGE 47

Figure 2-6. Simulation results on the Facebook social network. Panels: (A) Modularity; (B) Number of communities; (C) Running time (s); (D) NMI.

since our algorithm only takes into account the network changes while the static method has to work on the whole network every time. As reported in Figure 2-4D, the NMI scores of both our method and MIEN are very high and relatively close to 1, while those obtained by OSLOM fall short and are far from stable. These results indicate that on this Enron email network, both the QCA and MIEN algorithms are able to identify high-quality community structure with high modularity and similarity; however, only our method significantly reduces the processing time and computational requirement.

PAGE 48

arXiv e-print citation network

Data. The arXiv e-print citation network [22] has become an essential means of assessing research results in various areas including physics and computer science. This network contained more than 225K articles from January 1996 to May 2003. In our experiments, citation links of the first two years, 1996 and 1997, were used to form the basic community structure for our QCA method. In order to simulate the network evolution, a total of 30 time-dependent snapshots are created on a regular two-month basis from January 1998 to January 2003.

Results. We compare the modularity results obtained by the QCA algorithm at each network snapshot to those of Blondel as well as those of the MIEN and OSLOM methods. Figure 2-5A reveals that the modularities returned by QCA are very close to those obtained by the static method, are much more stable, and are far higher than those obtained by OSLOM and MIEN. In particular, the modularity values produced by the QCA algorithm range from 94% up to 100% of those of the Blondel method, are from 6% to 10% higher than MIEN's, and are at least 1.5x better than OSLOM's. In this citation network, the numbers of communities detected by OSLOM take off to more than 1200, whereas those found by the QCA, MIEN and Blondel methods are relatively small (Figure 2-5B). Our QCA method discovers more communities than both Blondel and MIEN as the network evolves, and this can be explained by the resolution limit of modularity [35]: the static method might disregard some small communities and tend to combine them in order to maximize the overall network modularity.

A second observation on the running time shows that QCA outperforms the static method as well as its competitor MIEN: QCA takes at most 2 seconds to complete updating the network structure while the Blondel method requires more than triple that amount of time, and MIEN and OSLOM ask for more than 5 times as much (Figure 2-5C). In addition, the higher NMI scores of QCA compared with MIEN's and especially OSLOM's scores (Figure 2-5D) imply that the network communities identified by our approach are not only

PAGE 49

of high similarity to the ground truth but also more precise than those detected by MIEN, while the computational cost and the running time are significantly reduced.

Facebook social network

Data. This dataset contains friendship information among the New Orleans regional network on Facebook [100], spanning from September 2006 to January 2009, with more than 60K nodes (users) connected by more than 1.5 million friendship links. In our experiments, nodes and links from September 2006 to December 2006 are used to form the basic community structure of the network, and a network snapshot is recorded every month from January 2007 to January 2009, for a total of 25 network snapshots.

Results. The evaluation depicted in Figure 2-6A reveals that the QCA algorithm achieves competitive modularities in comparison with the static method, and again far better than those obtained by the MIEN and OSLOM methods, especially in comparison with OSLOM, whose performance was good on synthesized networks. In the general trend, the line representing QCA results closely approximates that of the static method, with much more stability. Moreover, the two final modularity values at the end of the experiment are nearly the same, which means that our adaptive method performs competitively with the static method running on the whole network.

Figure 2-6C describes the running time of the three methods on the Facebook dataset. As one can see from this figure, QCA takes at least 3 seconds and at most 4.5 seconds to successfully compute and update every network snapshot, whereas the static method, again, requires more than triple that processing time. The MIEN and OSLOM methods really suffer on this large-scale network, requiring more than 10x and 11x the running time of QCA, respectively. In conclusion, high NMI and modularity scores together with decent execution times on all test cases confirm the effectiveness of our adaptive method, especially when applied to real-world social networks where a

PAGE 50

centralized algorithm, or other dynamic algorithms, may not be able to detect a good network community structure in a timely manner.

However, there is a limitation of the QCA algorithm that we observe on this large network and want to point out here: as the network evolution lasts longer (i.e., the number of network snapshots increases), our method tends to divide the network into smaller communities to maximize the local modularity, which results in an increasing number of communities and decreasing NMI scores. Figures 2-6B and 2-6D illustrate this observation. For instance, at snapshot 12 (a year after December 2006), the NMI score is approximately 1/2 and continues decaying after this time point. This implies that a refresh of the network community structure is required at this time, after a long enough duration. This is reasonable since activities on an online social network, especially on Facebook, tend to come and go rapidly, and local adaptive procedures are not enough to reflect the whole network topology over a long period of time.

PAGE 51

CHAPTER 3
OVERLAPPING COMMUNITY STRUCTURE DETECTION

In this chapter, we present AFOCS, an adaptive framework to discover and trace the evolution of network communities in dynamic complex systems. In section 3.1, we first state the problem definition including basic notations and the dynamic network model. Next, we present the procedure to detect the basic community structure in section 3.2, and then our AFOCS framework to update and trace the community structure evolution over time in section 3.3. Finally, we demonstrate the empirical results in section 3.4.

3.1 Problem Formulation

3.1.1 Basic Notations

Let $G = (V, E)$ be an undirected unweighted graph representing the network, where $V$ is the set of $N$ nodes and $E$ is the set of $M$ connections. Denote by $\mathcal{C} = \{C_1, C_2, \ldots, C_k\}$ the network community structure, i.e., a collection of subsets of $V$ where each $C_i \in \mathcal{C}$ and its induced subgraph form a community of $G$. In contrast with the disjoint community structure, we allow $C_i \cap C_j \neq \emptyset$ so that network communities can overlap with each other. For a node $u \in V$, let $d_u$, $N(u)$ and $Com(u)$ denote its degree, its neighbors and its set of community labels, respectively. For any $C \subseteq V$, let $C^{in}$ and $C^{out}$ denote the set of links having both endpoints in $C$ and the set of links having exactly one endpoint in $C$, respectively. Finally, the terms node-vertex as well as edge-link-connection are used interchangeably.

3.1.2 Dynamic Network Model

Let $G^0 = (V^0, E^0)$ be the original input network and $G^t = (V^t, E^t)$ be a time-dependent network snapshot recorded at time $t$. Denote by $\Delta V^t$ and $\Delta E^t$ the sets of nodes and edges to be added to or removed from the network at time $t$. Furthermore, let $\Delta G^t = (\Delta V^t, \Delta E^t)$ describe this change in terms of the whole network. The network snapshot at the next time step $t + 1$ is expressed as a combination of the previous one

PAGE 52

Figure 3-1. Overlapped vs. non-overlapped community structures.

together with the change, i.e., $G^{t+1} = G^t \cup \Delta G^t$. Finally, a dynamic network $\mathcal{G}$ is defined as a sequence of network snapshots changing over time: $\mathcal{G} = (G^0, G^1, G^2, \ldots)$.

3.1.3 Density Function

In order to quantify the goodness of an identified community, we use the popular density function $\Psi$ [33] defined as
$$\Psi(C) = \frac{2|C^{in}|}{|C|(|C| - 1)},$$
where $C \subseteq V$. Unlike the case of disjoint community structure, in which the number of connections crossing communities should be less than the number inside them, our objective does not take into account the number of outgoing links from each community. To understand the reason, let us consider a simple example pictured in Figure 3-1. From the overlapping community structure point of view, it is clear that every clique should form a community on its own, and each community shares exactly one node with the central clique. However, from the disjoint community structure point of view, any vertex in the central clique has $n$ internal and $2n$ external connections, which violates the concept of a community in the strong sense. Furthermore, the internal connectivity of the central clique is also dominated by its external density, which implies that the concept of a community in the weak sense is also violated. (A community $C$ is a community in the weak sense if $|C^{in}| > |C^{out}|$, and in the strong sense if any node in $C$ has more links inward than outward of $C$ [91].)
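As a small illustration of the density function and of the weak-sense test quoted above, the following sketch counts internal and outgoing links of a node set. It assumes an undirected graph given as a list of edges with each edge listed once and no self-loops; the function names are hypothetical and chosen for this example only.

```python
def link_counts(nodes, edges):
    """Return (|C_in|, |C_out|) for the node set `nodes`.

    C_in counts edges with both endpoints in the set, C_out counts
    edges with exactly one endpoint in the set.
    """
    nodes = set(nodes)
    inner = outer = 0
    for u, v in edges:
        hits = (u in nodes) + (v in nodes)
        if hits == 2:
            inner += 1
        elif hits == 1:
            outer += 1
    return inner, outer


def internal_density(nodes, edges):
    """Density 2|C_in| / (|C| (|C| - 1)); 0.0 for sets of fewer than 2 nodes."""
    size = len(set(nodes))
    if size < 2:
        return 0.0
    inner, _ = link_counts(nodes, edges)
    return 2.0 * inner / (size * (size - 1))


def is_weak_community(nodes, edges):
    """Weak-sense test: more internal links than outgoing links."""
    inner, outer = link_counts(nodes, edges)
    return inner > outer
```

A clique always has density 1 under this definition, even when it has many outgoing links and therefore fails the weak-sense test, which is the situation of the central clique in Figure 3-1.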

PAGE 53

In order to set up a threshold on the internal density that suffices for a set of nodes $C$ to be a local community, we propose a function $\tau(C)$ defined as follows:
$$\tau(C) = \frac{\sigma(C)}{\binom{|C|}{2}}, \quad \text{where} \quad \sigma(C) = \binom{|C|}{2}^{1 - \frac{1}{\binom{|C|}{2}}}.$$
Here $\sigma(C)$ is the threshold on the number of inner connections that suffices for $C$ to be a local community. Particularly, a subgraph induced by $C$ is a local community iff $\Psi(C) \geq \tau(C)$ or, equivalently, $|C^{in}| \geq \sigma(C)$. Several functions with the same purpose have been introduced in the literature, for instance, in the work of [56][62], and it is worth noting the main differences between them and ours. First and foremost, our functions $\sigma(C)$ and $\tau(C)$ process locally on the candidate community $C$ only and require neither predefined thresholds nor user-input parameters. Secondly, by Proposition 3.1, $\tau(C)$ and $\sigma(C)$ are increasing functions and closely approach $C$'s full connectivity as well as its maximal density. That makes $\sigma(C)$ and $\tau(C)$ relaxed versions of the traditional density function, yet useful ones, as we shall see in the experiments.

Proposition 3.1. The function $f(n) = n^{1 - \frac{1}{n}}$ is strictly increasing for $n \geq 4$ and $\lim_{n \to \infty} f(n) = n$.

3.1.4 Objective Function

Our objective is to find a community assignment for the set of nodes $V$ which maximizes the overall internal density function $\Psi(\mathcal{C}) = \sum_{C \in \mathcal{C}} \Psi(C)$, since the higher the internal density of a community is, the clearer its structure would be. Although our objective puts more focus on the internal edges and less on the external edges, these external edges are not completely ignored but are considered in the following sense: they will be tested later for the formation of another community if the number of edges suffices. Only when these external edges are really sparse will they not be considered.
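The two threshold functions of Section 3.1.3 depend only on the size of the candidate set, so they are easy to evaluate numerically. The sketch below assumes the reading in which the threshold on inner connections is p^(1 - 1/p) with p = |C|(|C| - 1)/2 the number of node pairs, and the density threshold is that value divided by p; this reading reproduces the values 0.794 (five nodes) and 0.74 (four nodes) quoted elsewhere in this chapter. The function names are illustrative.

```python
import math


def sigma(size):
    """Threshold on the number of inner connections for a set of `size` nodes."""
    if size < 2:
        raise ValueError("a candidate community needs at least two nodes")
    pairs = math.comb(size, 2)          # number of node pairs, |C| choose 2
    return pairs ** (1.0 - 1.0 / pairs)


def tau(size):
    """Threshold on the internal density for a set of `size` nodes."""
    return sigma(size) / math.comb(size, 2)


def is_local_community(size, inner_edges):
    """Local-community test: the inner edge count reaches sigma."""
    return inner_edges >= sigma(size)
```

For example, a five-node set needs at least 8 of its 10 possible links (since sigma(5) is about 7.94), so one missing link is tolerated but three are not.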

PAGE 54

3.1.5 Problem Definition

Given a dynamic network $\mathcal{G} = (G^0, G^1, G^2, \ldots)$ where $G^0$ is the input network and $G^1, G^2, \ldots$ are network snapshots obtained through a collection of network topology changes $\Delta G^1, \Delta G^2, \ldots$ over time, the problem asks for an adaptive framework to efficiently detect and update the network overlapping community structure $\mathcal{C}^t$ at any time point $t$ by only utilizing the information from the previous snapshot $\mathcal{C}^{t-1}$, as well as tracing the evolution of the network communities.

In the next section, we present our main contribution: an adaptive framework for (1) identifying the basic overlapped community structure in a network snapshot and (2) updating as well as tracing the evolution of the network communities in a dynamic network model. First, we describe FOCS, a procedure to identify the basic communities in a static network, and then discuss in great detail how AFOCS adaptively updates these basic communities to cope with the evolution of the dynamic network.

3.2 Basic Community Structure Detection

We describe FOCS, the first phase of our framework, which quickly discovers the basic overlapping network community structure. In general, FOCS works toward the classification of network nodes into different groups by first locating all possible densely connected parts of the network (3.2.1), and then combining those which highly overlap with each other, i.e., those which share a significant substructure (3.2.2). Finally, a final refinement to group unassigned nodes into different communities is conducted in (3.2.3). In FOCS, $\beta$ (the input overlapping threshold) defines how much substructure two communities can share. Note that FOCS fundamentally differs from [1] in the way it allows $|C_i \cap C_j| \geq 2$ for any subsets $C_i, C_j$ of $V$, and consequently allows network communities to overlap not only at a single vertex but also at a part of the whole community.

PAGE 55

Figure 3-2. Locating and merging local communities. (A) A local community $C$ defined by a link $(u, v)$; here $\Psi(C) = 0.9 > \tau(C) = 0.794$. (B) Merging two local communities sharing a significant substructure (OS score $= 1.027 > \beta = 0.8$).

3.2.1 Locating Local Communities

Local communities are connected parts of the network whose internal densities are greater than a certain level. In FOCS, this level is automatically determined based on the function $\tau(\cdot)$ and the size of each corresponding part. Particularly, a local community is defined based on a connection $(u, v)$ when the number of internal connections in the subgraph induced by $C \equiv \{u, v\} \cup (N(u) \cap N(v))$ exceeds $\sigma(C)$, or in other words, when $\Psi(C) \geq \tau(C)$, as illustrated in Figure 3-2A. However, there is a problem that might eventually arise: the containment of subcommunities in an actual bigger one. Intuitively, one would like to detect a bigger community unified by smaller ones if the bigger community is itself densely connected. In order to filter this undesired case, we impose $\Psi\left(\bigcup_{i=1}^{s} C_i\right) < \tau\left(\bigcup_{i=1}^{s} C_i\right)$ $\forall s = 1 \ldots |\mathcal{C}|$ (note that some of these unifications do not contain all the nodes). In addition, we allow this locating procedure to skip over tiny communities of size less than 4. This condition follows from Proposition 3.1. This makes sense in terms of mobile or social networks, where a group of mobile devices or a social community usually has size larger than 3, and intuitively agrees with the findings of [34][66]. Thus, the condition $|C| \geq 4$ is

PAGE 56

imposed for any community $C$ we discuss hereafter. The tiny communities will then be identified later. Alg. 6 describes this procedure.

Algorithm 6 Locating local communities
Input: $G = (V, E)$
Output: A collection of raw communities $\mathcal{C}_r$.
 1: $\mathcal{C}_r \leftarrow \emptyset$;
 2: for ($(u, v) \in E$) do
 3:   if ($Com(u) \cap Com(v) = \emptyset$) then
 4:     $C \leftarrow \{u, v\} \cup (N(u) \cap N(v))$;
 5:     if ($|C^{in}| \geq \sigma(C)$ and $|C| \geq 4$) then
 6:       Check $C$'s connectivity if $|C| = 5$;
 7:       Define $C$ a local community;
 8:       /* Include $C$ into the raw community structure */
 9:       $\mathcal{C}_r \leftarrow \mathcal{C}_r \cup \{C\}$;
10:     end if
11:   end if
12: end for

Lemma 7. All local communities $C$ detected by Alg. 6 satisfy $\Psi(C) \geq \tau(4) \approx 0.74$. Furthermore, other communities satisfying these conditions will also be detected by Alg. 6.

Proof. Alg. 6 will examine every edge $(u, v) \in E$ (except those whose endpoints are already in the same community), and by this greedy nature, any local community it detects has $|C| \geq 4$ and $\Psi(C) \geq \tau(C) \geq \tau(4) \approx 0.74$. We now show that any community $C$ satisfying $|C| \geq 4$ and $\Psi(C) \geq \tau(C) \geq \tau(4)$ will also be detected by Alg. 6. Suppose otherwise, that is, there exists a community $C$ satisfying these two conditions that is not detected by Alg. 6. To prove that this is not the case, we do the following: (1) construct a community $D$ which is not detected by Alg. 6 with $|D| = n \equiv |C|$ and $\Psi(D)$ maximized, and (2) show that $\Psi(D) < \tau(D)$. Because $|D| = |C|$, it implies $\tau(D) = \tau(C)$. However, since $\Psi(D)$ is maximized, $\Psi(D) \geq \Psi(C)$, which in turn implies $\Psi(C) \leq \Psi(D) < \tau(D) = \tau(C)$. This raises a contradiction to our original assumption, and thus concludes the proof.

PAGE 57

To construct $D$, we do as follows: (i) make $D$ a clique of size $n$, and (ii) remove edges from $D$ one by one until $D$ cannot be detected by Alg. 6. By doing so, $\Psi(D)$ is maximized iff the number of removed edges is minimized. It is easy to find that the least number of edges we have to remove from $D$ is $n/2$ if $n$ is even and $(n - 1)/2$ if $n$ is odd. Therefore, $m_D = n(n - 1)/2 - n/2$ if $n$ is even, and $m_D = n(n - 1)/2 - (n - 1)/2$ if $n$ is odd. Now, $\Psi(D) < \tau(D)$ iff
$$m_D < \left(\frac{n(n - 1)}{2}\right)^{1 - \frac{2}{n(n - 1)}}.$$
Let $f(n)$ be the difference between the left and the right hand sides; we show that $f(n) < 0$ as $n$ increases. Taking the derivative of $f(n)$ gives $f(4) < 0$ and $f'(n) < 0$ for even $n \geq 4$, and $f(7) < 0$ and $f'(n) < 0$ for odd $n \geq 7$. When $n = 5$, $f(5) > 0$, but this is the only exception and thus can be handled easily in line 6 of Alg. 6. Therefore, we have $\Psi(D) < \tau(D)$, and hence, the conclusion follows.

Theorem 3.1. The local community structure $\mathcal{C}_r$ detected by Alg. 6 satisfies $\Psi(\mathcal{C}_r) \geq \tau(4) \times \Psi(OPT)$, where $OPT$ is the optimal dense community assignment satisfying $\Psi(S) \geq \tau(4)$ for any $S \in OPT$.

Proof. Let $\mathcal{C}_r$ be the local community structure returned by Alg. 6, and $OPT$ be the optimal solution of the dense community assignment satisfying $\Psi(S) \geq \tau(4)$ for any $S \in OPT$. Let $k = |OPT|$. Clearly $\Psi(OPT) \leq k$. By Lemma 7, we know that Alg. 6 can detect as many communities as $OPT$ but probably with less internal density. Moreover, since Alg. 6 only skips over edges in a community, it ensures that no real community is a substructure of a bigger one. Hence, we have $\Psi(\mathcal{C}_r) \geq \tau(4) \times k \geq 0.74 \times \Psi(OPT)$. This also implies that Alg. 6 is a 0.74-approximation algorithm for finding local densely connected communities.

Lemma 8. The time complexity of Alg. 6 is $O(dM)$ where $d = \max_{v \in V} d_v$.

Proof. The time to examine an edge $(u, v)$ is $|N(u)| + |N(v)| = d_u + d_v$. However, when $u$ and $v$ are in the same community, $(u, v)$ will be skipped. Therefore, the total time complexity is upper bounded by $d \sum_{u \in V} d_u = O(dM)$.
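The edge scan of Alg. 6 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the graph is an undirected edge list without self-loops, the special connectivity check for five-node candidates (line 6 of Alg. 6) is omitted, and the names `sigma` and `locate_local_communities` are hypothetical.

```python
import math
from itertools import combinations


def sigma(size):
    """Inner-edge threshold p ** (1 - 1/p) with p = size choose 2."""
    pairs = math.comb(size, 2)
    return pairs ** (1.0 - 1.0 / pairs)


def locate_local_communities(edges):
    """Scan edges once and collect dense candidate sets as raw communities."""
    # Build the neighbor sets N(u).
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    labels = {u: set() for u in adj}    # Com(u): indices of communities of u
    communities = []

    for u, v in edges:
        # Skip edges whose endpoints already share a community.
        if labels[u] & labels[v]:
            continue
        # Candidate: both endpoints plus their common neighbors.
        cand = {u, v} | (adj[u] & adj[v])
        if len(cand) < 4:
            continue
        inner = sum(1 for a, b in combinations(cand, 2) if b in adj[a])
        if inner >= sigma(len(cand)):
            index = len(communities)
            communities.append(cand)
            for w in cand:
                labels[w].add(index)
    return communities
```

On two 4-cliques that share a single node, the scan reports each clique as its own raw community, with the shared node carrying both labels. Counting the inner edges of each candidate by enumerating node pairs is a simplification chosen for readability and does not reflect the cost analysis of Lemma 8.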

PAGE 58

3.2.2 Combining Overlapping Communities

After Alg. 6 finishes, the raw network community structure is pictured as a collection of (possibly overlapped) dense parts of the network together with outliers. As some of those dense parts can possibly share significant substructures, we need to merge them if they are highly overlapped. To this end, we introduce the overlapping score of two communities, defined as follows:
$$OS(C_i, C_j) = \frac{|I_{ij}|}{\min\{|C_i|, |C_j|\}} + \frac{|I_{ij}^{in}|}{\min\{|C_i^{in}|, |C_j^{in}|\}},$$
where $I_{ij} = C_i \cap C_j$. Basically, $OS(C_i, C_j)$ values how important the common nodes and links shared between $C_i$ and $C_j$ are to the smaller community. In comparison with the distance metric suggested in [63], our overlapping score not only takes into account the fraction of common nodes but also values the fraction of common connections, which is crucial in order to combine network communities. Furthermore, $OS(\cdot, \cdot)$ is symmetric and scales well with the size of any community, and the higher the overlapping score, the more those communities in consideration should be merged. In this merging process, we combine communities $C_i$ and $C_j$ if $OS(C_i, C_j) \geq \beta$ (Figure 3-2B).

Algorithm 7 Combining local communities
Input: Raw community structure $\mathcal{C}_r$
Output: A refined community structure $\mathcal{D}$.
 1: $\mathcal{D} \leftarrow \mathcal{C}_r$;
 2: Done $\leftarrow$ false;
 3: while (!Done) do
 4:   Done $\leftarrow$ true;
 5:   Order $(C_i, C_j)$'s by their $OS(C_i, C_j)$ scores;
 6:   for ($C_i, C_j \in \mathcal{D}$) do
 7:     if ($OS(C_i, C_j) > \beta$ and) then
 8:       $C \leftarrow$ Combine $C_i$ and $C_j$;
 9:       /* Update the current structure */
10:       $\mathcal{D} \leftarrow (\mathcal{D} \setminus \{C_i, C_j\}) \cup \{C\}$;
11:       Done $\leftarrow$ false;
12:     end if
13:   end for
14: end while
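The overlapping score and the merging loop can be sketched as below. Several choices here are assumptions made for the illustration: communities are Python sets, the graph is an undirected edge list, the pair with the highest score is merged first (one way to honor the ordering step of Alg. 7), and the comparison against the threshold `beta` is non-strict, as in the prose, whereas the listing of Alg. 7 uses a strict comparison.

```python
def inner_edges(nodes, edges):
    """Set of edges with both endpoints in `nodes` (orientation ignored)."""
    return {frozenset(e) for e in edges if e[0] in nodes and e[1] in nodes}


def overlap_score(ci, cj, edges):
    """Shared-node fraction plus shared-edge fraction, each taken
    relative to the smaller of the two communities."""
    shared = ci & cj
    node_part = len(shared) / min(len(ci), len(cj))
    smaller_in = min(len(inner_edges(ci, edges)), len(inner_edges(cj, edges)))
    if smaller_in == 0:
        return node_part
    return node_part + len(inner_edges(shared, edges)) / smaller_in


def merge_communities(communities, edges, beta):
    """Repeatedly merge the best-scoring pair until no pair reaches beta."""
    comms = [set(c) for c in communities]
    merged = True
    while merged:
        merged = False
        best = None
        for i in range(len(comms)):
            for j in range(i + 1, len(comms)):
                score = overlap_score(comms[i], comms[j], edges)
                if score >= beta and (best is None or score > best[0]):
                    best = (score, i, j)
        if best is not None:
            _, i, j = best
            comms[i] |= comms[j]    # combine C_i and C_j
            del comms[j]            # j > i, so index i stays valid
            merged = True
    return comms
```

Two 4-cliques that share three nodes score 3/4 + 3/6 = 1.25 and are merged for a threshold of 0.8, while two triangles sharing a single node score 1/3 and stay separate.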

PAGE 59

The time complexity of Alg. 7 is $O(N_0^2)$, where $N_0$ is the number of local communities. Clearly, $N_0 \leq M$ and thus it can be $O(M^2)$. However, when the intersection of two communities is upper bounded, by Lemma 9 we know that the number of local communities is also upper bounded by $O(N)$, and thus the time complexity of Alg. 7 is $O(N^2)$. In our experiments, we observe that the running time of this procedure is, indeed, much less than $O(N^2)$.

Lemma 9. The number of raw communities detected in Alg. 6 is $O(N)$ when the number of nodes in the intersection of any two communities is upper bounded by a constant.

Proof. For each $C_i \in \mathcal{C}$, decompose it into overlapped and non-overlapped parts, denoted by $C_i^{ov}$ and $C_i^{nov}$. We have $C_i = C_i^{ov} \cup C_i^{nov}$ and $C_i^{ov} \cap C_i^{nov} = \emptyset$. Therefore, $|C_i| = |C_i^{ov}| + |C_i^{nov}|$. Now, $\sum_{C_i \in \mathcal{C}} |C_i| = \sum_{C_i \in \mathcal{C}} \left(|C_i^{ov}| + |C_i^{nov}|\right) \leq N + \sum_{i \ldots}$
PAGE 60

communities afterwards. Otherwise, they can be merged with other dense parts to become new, bigger communities. As a result, if the communities are highly overlapped, some of them can potentially grow to very large sizes at the end of the merging process, besides the small cliques detected in the first place. Larger dense quasi-cliques, though rare in many networks, will surely be detected by FOCS, as we observed in Theorem 3.1.

3.2.3 Revisiting Unassigned Nodes

Even when the above two procedures are executed, there would still exist leftover nodes or edges due to their weak attraction to the rest of the network. Because of its size constraint, the first procedure skips over tiny communities of sizes less than four and thus may leave some nodes unlabeled. These nodes will not be touched in the second phase since they do not belong to any local communities, and consequently will remain unassigned afterwards. Moreover, they are mostly nodes with few connections to the rest of the network, and thus are very likely supplementary nodes of their adjacent communities. Therefore, we need to revisit those nodes to either group them into appropriate communities or classify them as outliers based on their connectivity structures.

Algorithm 8 Revisit Unassigned Nodes
Input: The refined community structure $\mathcal{D} = \{D_1, D_2, \ldots, D_t\}$
Output: The basic community structure $\mathcal{C} = \{C_1, C_2, \ldots, C_k\}$
 1: $\mathcal{C} \leftarrow \mathcal{D}$;
 2: for ($u \in V$ and $Com(u) == \emptyset$) do
 3:   $NC(u) \leftarrow \{C_j \in \mathcal{C} \mid u$ is adjacent to $C_j\}$;
 4:   for ($C_j \in NC(u)$) do
 5:     if ($F_{C_j \cup \{u\}} \geq F_{C_j}$) then
 6:       $C_j \leftarrow C_j \cup \{u\}$;
 7:       $Com(u) \leftarrow Com(u) \cup \{j\}$;
 8:     end if
 9:   end for
10:   if ($Com(u) == \emptyset$) then
11:     Classify $u$ as an outlier;
12:   end if
13: end for
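A compact sketch of this revisiting step is given below. It uses the fitness function of this section, the number of internal links divided by twice the internal links plus the outgoing links, and lets an unassigned node join every adjacent community whose fitness does not decrease. The adjacency-dictionary input and the function names are assumptions made for the example.

```python
def fitness(nodes, adj):
    """Fitness |S_in| / (2 |S_in| + |S_out|) of the node set `nodes`."""
    twice_inner = 0     # every internal link is seen from both endpoints
    outer = 0
    for u in nodes:
        for v in adj[u]:
            if v in nodes:
                twice_inner += 1
            else:
                outer += 1
    inner = twice_inner // 2
    denom = 2 * inner + outer
    return inner / denom if denom else 0.0


def assign_leftover_nodes(communities, adj):
    """Attach unlabeled nodes to adjacent communities or mark them as outliers.

    `adj` maps every node of the network to its set of neighbors.
    Returns the updated communities and the set of outliers.
    """
    comms = [set(c) for c in communities]
    labeled = set().union(*comms) if comms else set()
    outliers = set()
    for u in adj:
        if u in labeled:
            continue
        joined = False
        for c in comms:
            # Only adjacent communities are candidates for u.
            if adj[u] & c and fitness(c | {u}, adj) >= fitness(c, adj):
                c.add(u)
                joined = True
        if not joined:
            outliers.add(u)
    return comms, outliers
```

In a network made of a 4-clique plus a node attached to two clique members, the extra node raises the fitness of the clique from 6/14 to 8/16 and is therefore absorbed, while an isolated node ends up as an outlier.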

PAGE 61

Alternatively, this process can be thought of as a community trying to hire adjacent unassigned nodes which are similar to the host community. However, the internal density function might be too strict for them to be included in any community (which was also the reason why they were left unassigned). To this end, we need a community fitness function in order to quantify the similarity between a node $u$ and a neighbor community $C$. We find that the fitness function $F_S = \frac{|S^{in}|}{2|S^{in}| + |S^{out}|}$ (where $S \subseteq V$) commonly used in [56][39][63] performs competitively on both synthesized and real-world datasets. Taking into account this fitness function, a community $C$ will keep hiring any unassigned adjacent vertex of maximum similarity in a greedy manner, provided the newly joined vertex does not shrink the community's current fitness value. If there is no such node, $C$ is defined as a final network community. Nodes that remain unlabeled through this last procedure are identified as outliers. This algorithm is presented in Alg. 8.

3.3 Detecting Evolving Network Communities

We describe AFOCS, the second phase and also the main focus of our detection framework. In particular, we use AFOCS to adaptively update and trace the network communities, which were previously initialized by FOCS, as the dynamic network evolves over time. Note that FOCS is executed only once on $G^0$; after that, AFOCS will take over and handle all changes introduced to the network.

Let us first discuss the various behaviors of the community structure when the network topology evolves over time. Suppose $G = (V, E)$ and $\mathcal{C} = \{C_1, C_2, \ldots, C_n\}$ are the current network and its corresponding overlapping community structure, respectively. We use the term intra links to denote edges whose two endpoints belong to the same community, inter links to denote those with endpoints connecting different disjoint communities, and the term hybrid links to stand for the others. For each community $C$ of $G$, the number of connections joining $C$ with the others is less than the number of connections within $C$ itself, by definition.

PAGE 62

Intuitively, the addition of intra links or removal of inter links between communities of $G$ will strengthen them and consequently will make the structure of $G$ clearer. Similarly, removing intra links from or introducing inter links to a community of $G$ will decrease its internal density and, as a result, loosen its internal structure. However, when two communities have less distraction to each other, adding or removing links makes them more attractive to each other and therefore leaves a possibility that they can overlap with each other or can be combined to form a new community. The updating process, as a result, is very complicated and challenging, since any insignificant change in the network topology could possibly lead to an unpredictable transformation of the network community structure.

In order to reflect these changes to a complex network, its underlying graph model is frequently updated by either inserting or removing a node or a set of nodes, or an edge or a set of edges. A closer look at these events reveals that the introduction or removal of a set of nodes (or edges) can furthermore be decomposed into a collection of node (or edge) insertions (or removals), in which only a node (or only an edge) is inserted (or removed) at a time. Therefore, changes to the network at each time step can be viewed as a collection of simpler events whose details are as follows:

newNode ($V + u$): A new node $u$ and its adjacent edge(s) are introduced.
removeNode ($V - u$): A node $u$ and its adjacent edge(s) are removed from the network.
newEdge ($E + e$): A new edge $e$ connecting two existing nodes is introduced.
removeEdge ($E - e$): An edge $e$ in the network is removed.

As we mentioned earlier, our adaptive framework initially requires a basic community structure $\mathcal{C}^0$. To obtain this basic structure, we apply the FOCS algorithm to the first network snapshot, i.e., we execute FOCS on the network $G^0$ and then let AFOCS adaptively handle this structure as the network evolves.
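The decomposition of a batch change into the four simple events can be sketched as follows. The event order chosen here (edge removals, node removals, node insertions, edge insertions) and the `Event` record are assumptions made for this illustration; the text only states that a batch can be split into single-node and single-edge events.

```python
from collections import namedtuple

# kind is one of "newNode", "removeNode", "newEdge", "removeEdge".
Event = namedtuple("Event", ["kind", "payload"])


def decompose_change(added_nodes, removed_nodes, added_edges, removed_edges):
    """Flatten one batch of topology changes into a list of simple events."""
    events = []

    # Edges incident to a removed node disappear with that node,
    # so only the other removed edges need their own event.
    gone = set(removed_nodes)
    for u, v in removed_edges:
        if u not in gone and v not in gone:
            events.append(Event("removeEdge", (u, v)))
    for u in removed_nodes:
        events.append(Event("removeNode", u))

    # A new node arrives together with its edges to nodes that are
    # already present at that moment.
    pending = set(added_nodes)
    remaining = list(added_edges)
    for u in added_nodes:
        pending.discard(u)
        mine = [e for e in remaining
                if u in e and not (set(e) - {u}) & pending]
        remaining = [e for e in remaining if e not in mine]
        events.append(Event("newNode", (u, mine)))

    # Whatever is left connects two nodes that both already existed.
    for e in remaining:
        events.append(Event("newEdge", e))
    return events
```

An edge between two newly added nodes is attached to whichever of the two is introduced second, so that every event refers only to nodes that exist when it is processed.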

PAGE 63

Figure 3-3. A possible scenario when a new node is introduced.

3.3.1 Handling a New Node

Let us discuss the first case, when a new node $u$ and its associated links are introduced to the network. The possibilities are that (1) $u$ may come with no adjacent edge or (2) with many of them connecting to one or more possibly overlapped communities. If $u$ has no adjacent edge, we simply add $u$ to the set of outliers and preserve the current community structure. The interesting case happens, and it usually does, when $u$ comes with multiple links connecting to one or more existing communities. Since network communities can overlap each other, we need to determine which ones $u$ should join in order to maximize the gained internal density. But how can we quickly and effectively do so? In Lemma 10, we give a condition under which a new node joins an existing community, i.e., our algorithm will join node $u$ to $C_i$ if the number of connections $u$ has to $C_i$ suffices:
$$d_u^i > \max\left\{\frac{2|C_i^{in}|}{|C_i| - 1},\; f(|C_i| + 1) - |C_i^{in}|\right\}.$$
However, failing to satisfy this condition does not necessarily imply that $u$ should not belong to $C_i$, since it can potentially gather some substructure of $C_i$ to form a new community (Figure 3-3). Thus, we also need to handle this possibility. Alg. 9 presents the algorithm.

Lemma 10. Suppose $u$ is a newly introduced node with $d_u^i$ connections to each adjacent community $C_i$. $u$ will join $C_i$ if $d_u^i > \max\left\{\frac{2|C_i^{in}|}{|C_i| - 1},\; f(|C_i| + 1) - |C_i^{in}|\right\}$.
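The joining condition of Lemma 10 can be checked with a few lines of code. The sketch below rests on one interpretive assumption: the term written f(|C_i| + 1) is read as the inner-edge threshold of a community with |C_i| + 1 nodes, i.e., p^(1 - 1/p) with p the number of node pairs among |C_i| + 1 nodes. The function names are illustrative.

```python
import math


def sigma(size):
    """Inner-edge threshold p ** (1 - 1/p) with p = size choose 2."""
    pairs = math.comb(size, 2)
    return pairs ** (1.0 - 1.0 / pairs)


def should_join(size, inner_edges, links_to_community):
    """Condition of Lemma 10 for a new node u and a community C_i.

    size               -- |C_i| before u joins (at least 2)
    inner_edges        -- |C_i_in| before u joins
    links_to_community -- number of links from u into C_i
    """
    # The density strictly increases iff d > 2 |C_in| / (|C| - 1).
    density_bound = 2.0 * inner_edges / (size - 1)
    # The enlarged community must still reach its inner-edge threshold.
    threshold_bound = sigma(size + 1) - inner_edges
    return links_to_community > max(density_bound, threshold_bound)
```

For a five-node community with 8 internal links, a newcomer needs 5 links: with 4 links the density stays at 0.8 and the enlarged set would have 12 inner links, short of the six-node threshold of about 12.52. A full clique can never accept a newcomer under this condition, because its density of 1 cannot strictly increase; such nodes are left to the second branch of Alg. 9.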

PAGE 64

Algorithm 9 Handling a new node u
Input: The current community structure $\mathcal{C}^{t-1}$
Output: An updated structure $\mathcal{C}^t$.
 1: $C_1, C_2, \ldots, C_k \leftarrow$ Adjacent communities of $u$;
 2: for $i = 1$ to $k$ do
 3:   if ($d_u^i > \max\left\{\frac{2|C_i^{in}|}{|C_i| - 1},\; f(|C_i| + 1) - |C_i^{in}|\right\}$) then
 4:     $C_i \leftarrow C_i \cup \{u\}$;
 5:   else
 6:     $C \leftarrow N(u) \cap C_i$;
 7:     if ($\Psi(C) \geq \tau(C)$ and $|C| \geq 4$) then
 8:       $C_i \leftarrow C_i \cup \{u\}$;
 9:     end if
10:   end if
11: end for
12: /* Checking new communities formed from outliers */
13: for ($v \in N(u)$ and $Com(v) \cap Com(u) = \emptyset$) do
14:   $C \leftarrow N(u) \cap N(v)$;
15:   if ($\Psi(C) \geq \tau(C)$ and $|C| \geq 4$) then
16:     Define $C$ a new community;
17:   end if
18: end for
19: Merging overlapping communities on $C_1, C_2, \ldots, C_k$;
20: Update $\mathcal{C}^t$;

Proof. Prior to $u$ joining $C_i$, the internal density is $\Psi(C_i) = \frac{2|C_i^{in}|}{|C_i|(|C_i| - 1)}$. Similarly, after $u$ joins $C_i$, the density function is $\Psi(C_i \cup \{u\}) = \frac{2|C_i^{in}| + 2 d_u^i}{|C_i|(|C_i| + 1)}$. Taking the difference between these two quantities gives $\Psi(C_i \cup \{u\}) > \Psi(C_i) \Longleftrightarrow d_u^i > \frac{2|C_i^{in}|}{|C_i| - 1}$. Moreover, $u$ should also satisfy $\Psi(C_i \cup \{u\}) \geq \tau(C_i \cup \{u\})$, which in turn implies $d_u^i \geq f(|C_i| + 1) - |C_i^{in}|$. Therefore, $d_u^i > \max\left\{\frac{2|C_i^{in}|}{|C_i| - 1},\; f(|C_i| + 1) - |C_i^{in}|\right\}$.

The analysis of Alg. 9 is shown by Lemma 11. In particular, we show that this procedure achieves at least 74% of the internal density of the optimal assignment for $u$, given the prior community structure.

Lemma 11. Alg. 9 produces a community assignment that, prior to the community combination process, achieves $\Psi(\mathcal{C}^t) \geq \tau(4) \times \Psi(OPT(u)^t)$, where $OPT(u)^t$ is the optimal community assignment for $u$ at time $t$, given the prior community structure $\mathcal{C}^{t-1}$.

PAGE 65

Figure 3-4. Possible scenarios when a new edge is introduced.

Proof. Let $C_1, C_2, \ldots, C_k$ be the communities (including the newly formed ones) in $\mathcal{C}^t$ that Alg. 9 assigns the new node $u$ to. Note that in the optimal solution $OPT(u)^t$, the number of communities $u$ belongs to should not exceed $k$, since each $C_i$ is also a candidate for $OPT(u)^t$ (of course, $OPT(u)^t$ could possibly rearrange nodes differently). Therefore, the optimal internal density gained is upper bounded by $k$. On the other hand, Alg. 9 makes sure that each community $C_i$ that $u$ joins has $\Psi(C_i) \geq \tau(C_i) \geq \tau(4)$ since $|C_i| \geq 4$. Thus, Alg. 9 will achieve at least $\tau(4) \times k \geq 0.74 \times \Psi(OPT(u)^t)$.

3.3.2 Handling a New Edge

In the case where a new edge $e = (u, v)$ connecting two existing vertices $u$ and $v$ is introduced, we divide it further into four smaller cases: (1) $e$ is solely inside a single community $C$, (2) $e$ is within the intersection of two (or more) communities, (3) $e$ is joining two separated communities, and (4) $e$ is crossing overlapped communities. If $e$ is totally inside a community $C$, its presence will strengthen $C$'s internal density and, by Lemma 12, we know that adding $e$ should not split the current community $C$ into smaller substructures. In the second subcase, the introduction of the new edge might increase the density of some part of $C$, and it is reasonable to think of that part (say $D$) as a new separated community. However, since $D$ originally shared a significant substructure with $C$, the

merging process will then combine $C$ and $D$ (if they were separated) into a bigger community, thus yielding the same community as if $C$ had been kept intact. The same reasoning applies in the second subcase, when $e$ is within the intersection of two communities, since their inner densities are both increased. Thus, in these first two cases, we leave the current network structure intact.

Algorithm 10 Handling a new edge $(u, v)$
Input: The current community structure $\mathcal{C}_{t-1}$.
Output: An updated community structure $\mathcal{C}_t$.
    if $(u, v) \in$ a single community OR $(u, v) \in C_u \cap C_v$ then
        $\mathcal{C}_t \leftarrow \mathcal{C}_{t-1}$;
    else if $Com(u) \cap Com(v) = \emptyset$ then
        $C \leftarrow N(u) \cap N(v)$;
        if $\Psi(C) \ge \tau(C)$ then
            Define $C$ a new community;
            Check for combining on $Com(u)$, $Com(v)$ and $C$;
        else
            for $D \in Com(u)$ (or $D \in Com(v)$) do
                if $\Psi(D \cup \{v\}) \ge \tau(D \cup \{v\})$ (or $\Psi(D \cup \{u\}) \ge \tau(D \cup \{u\})$) then
                    $D \leftarrow D \cup \{v\}$ (or $D \leftarrow D \cup \{u\}$);
                end if
            end for
            Merge overlapping communities for the $D$'s (or $D$);
        end if
        Update $\mathcal{C}_t$;
    end if

Handling the last two subcases is complicated, since either of them can have no effect on the current network structure or can unpredictably form a new network community, which furthermore can overlap or merge with the others (Figure 3-4). In particular, there is still a possibility that the introduction of this new link, together with some substructure of $C_u$ or $C_v$, suffices to form a new community that can overlap not only with $C_u$ and $C_v$ but also with some of the others. The other subcases can be handled similarly. Alg. 10 describes this procedure.

Lemma 12. If a new edge $(u, v)$ is introduced solely inside a community $C$, it should not split $C$ into smaller substructures.

Figure 3-5. Possible scenarios when an existing node is removed.

Proof. Suppose otherwise, that is, $C$ is divided into smaller parts $C_1$ and $C_2$. Prior to the introduction of $(u, v)$, we have $\Psi(C) = \Psi(C_1 \cup C_2) \ge \tau(C) = \tau(C_1 \cup C_2)$. Now, when $C_1$ and $C_2$ are formed, they imply that $\Psi(C_1 \cup C_2 + (u, v)) < \tau(C_1 \cup C_2 + (u, v))$. Putting everything together, we have $\tau(C_1 \cup C_2 + (u, v)) = \tau(C_1 \cup C_2) > \Psi(C_1 \cup C_2 + (u, v)) > \Psi(C) \ge \tau(C_1 \cup C_2)$, which is a contradiction. Thus, the conclusion follows.

3.3.3 Removing an Existing Node

When an existing node $u$ is about to be removed from the network, all of its adjacent edges will also be removed as a consequence. If $u$ is an outlier, we can simply exclude $u$ and its corresponding links from the current structure and safely keep the network communities unchanged. In the unfortunate situation where $u$ is not an outlier, the problem becomes very challenging in the sense that the resulting community is complicated: it can either be unchanged, or be broken into smaller communities, or could be further merged with other communities. To give a sense of this effect, consider the two examples illustrated in Figure 3-5. In the first example, when $C$ is almost a full clique, the removal of any node will not break it apart. However, if we remove a node that tends to connect the others within a community, the leftover module is broken into a smaller one together with a node that will later be merged into one of its nearby communities. Therefore, identifying the leftover structure of $C$ is a crucial task once a vertex $u$ in $C$ is removed.
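The two situations described above can be reproduced on toy graphs. The sketch below is purely illustrative (the graphs, node labels and helper names are assumptions, not taken from Figure 3-5): it measures the internal density of what is left of a community after a node is deleted, once for a clique and once for a community held together by a single hub.

```python
from itertools import combinations


def internal_density(nodes, edges):
    """Fraction of node pairs inside `nodes` joined by an (undirected) edge."""
    nodes = set(nodes)
    if len(nodes) < 2:
        return 0.0
    inside = sum(1 for a, b in edges if a in nodes and b in nodes)
    return 2.0 * inside / (len(nodes) * (len(nodes) - 1))


def remove_node(nodes, edges, u):
    """Drop node u together with all of its incident edges."""
    kept_nodes = set(nodes) - {u}
    kept_edges = [(a, b) for a, b in edges if u not in (a, b)]
    return kept_nodes, kept_edges


# Example 1: a 5-clique stays a clique after losing any node.
clique_nodes = set(range(5))
clique_edges = list(combinations(range(5), 2))

# Example 2: node 0 is a hub connecting the others; only (1, 2) bypasses it.
hub_nodes = set(range(5))
hub_edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2)]

clique_left = internal_density(*remove_node(clique_nodes, clique_edges, 0))
hub_left = internal_density(*remove_node(hub_nodes, hub_edges, 0))
print(clique_left, hub_left)
```

In the second example the leftover density collapses, which is the situation in which the remaining nodes have to be regrouped into smaller modules.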

To quickly handle this task, we first examine the internal density of $C$ excluding the removed node $u$. If the number of internal connections still suffices, i.e., $\Psi(C \setminus \{u\}) \ge \tau(C \setminus \{u\})$, we can safely keep the current network community structure intact because $C$ is still tightly connected with a sufficient internal density. Otherwise, this community is weak and shall be broken into smaller ones. These substructures might further be merged with other communities if $C$ originally overlapped with them. To efficiently detect these new substructures, we apply Alg. 6 on the subgraph induced by $C \setminus \{u\}$ to quickly identify the leftover modules in $C$, and then let these modules hire a set of unassigned nodes (in $C$) that help them increase their inner densities. Finally, we locally check for community combination, if any, by using an algorithm similar to Alg. 7. Alg. 11 presents the procedure.

Algorithm 11 Removing a node $u$
Input: The current community structure $\mathcal{C}_{t-1}$.
Output: An updated structure $\mathcal{C}_t$.
    for $C \in Com(u)$ with $\Psi(C \setminus \{u\}) < \tau(C \setminus \{u\})$ do
        $LC \leftarrow$ local communities found by Alg. 6 on $C \setminus \{u\}$;
        for $C_i \in LC$ with $|C_i| \ge 4$ do
            $S_i \leftarrow$ nodes such that $\Psi(C_i \cup S_i) \ge \tau(C_i \cup S_i)$;
            $C_i \leftarrow C_i \cup S_i$;
        end for
        Merge overlapping communities on $LC$;
    end for
    Update $\mathcal{C}_t$;

3.3.4 Removing an Existing Edge

In the last situation, when an edge $e = (u, v)$ is about to be removed, we divide it further into four subcases similar to those of a new edge: (1) $e$ is between two disjoint communities, (2) $e$ is inside a sole community, (3) $e$ is within the intersection of two (or more) communities, and finally (4) $e$ is crossing overlapping communities. In the first subcase, when $e$ is crossing two disjoint communities, its removal will make the network structure clearer (since we now have fewer connections between groups), and thus the current communities should be kept unchanged. When $e$ is

totally within a sole community $C$, handling its removal is complicated since this can lead to an unpredictable transformation of the host module: $C$ could either be unchanged or be broken into smaller modules if it contains substructures which are less attractive to each other, as depicted in Figure 3-6. Therefore, the problem of identifying the structure of the remaining module becomes the central part not only of this case but also of the others.

Figure 3-6. Possible scenarios when an existing edge is removed.

Algorithm 12 Removing an edge $(u, v)$
Input: The current structure $\mathcal{C}_{t-1}$.
Output: An updated community structure $\mathcal{C}_t$.
    if $(u, v)$ is an isolated edge then
        $\mathcal{C}_t = (\mathcal{C}_{t-1} \setminus \{u, v\}) \cup \{u\} \cup \{v\}$;
    else if $d_u = 1$ (or $d_v = 1$) then
        $\mathcal{C}_t = (\mathcal{C}_{t-1} \setminus C(u)) \cup \{u\} \cup C(v)$;
    else if $C \equiv C(u) \cap C(v) = \emptyset$ then
        $\mathcal{C}_t = \mathcal{C}_{t-1}$;
    else if $\Psi(C \setminus (u, v)) < \tau(C \setminus (u, v))$ then /* Here $C \neq \emptyset$ */
        $LC \leftarrow$ local communities found by Alg. 6 on $C \setminus (u, v)$;
        Define each $L \in LC$ a local community of $\mathcal{C}_{t-1}$;
        Merge overlapping communities on the $L$'s;
    end if
    Update $\mathcal{C}_t$;

To quickly handle these tasks, we first verify the inner density of the remaining module and again utilize the local community location method (Alg. 6) to locally identify the leftover substructures. Next, we check for community combination since

these structures can possibly overlap with existing network communities. The detailed procedure is described in Alg. 12.

3.3.5 Remarks

Note that the ultimate goal of our framework is to adaptively detect and update the community structure as the network evolves, i.e., to deal mainly with the dynamics of a mobile network. As a result, we mainly put our focus on AFOCS. Although FOCS, the first detection phase, appears to be a centralized algorithm, it is executed only once on the very first network snapshot, whereas AFOCS stays up and locally handles all changes as the network evolves over time. That is, we do not execute FOCS again. Furthermore, AFOCS can be run independently of FOCS, i.e., one can use any localized detection algorithm to identify a basic community structure in the first phase. Thus, AFOCS can be easily applied to solve mobile network problems.

3.3.6 Complexity

Our main algorithm consists of two parts: (1) finding the basic community structure and (2) updating the network community structure through changes introduced at every time step. The complexity of quickly unfolding the basic network community structure has been claimed to be linear in the number of nodes and links, $O(M + N)$ [58]. To handle the case of a new node of degree $p$ coming in, our algorithm computes the $p$ forces this new node applies to its neighbors, which results in linear time complexity $O(p)$. When a new edge connecting nodes $u$ and $v$ is introduced to the network, our algorithm simply computes the forces applied to the adjacent nodes of the communities, which takes $O(|C(u)| + |C(v)|)$ in the best case and $O(k \max\{|C(u)|, |C(v)|\})$ in the worst case when some nodes in a module are pulled out to form new communities (where $k$ is the number of communities in $G$). The time taken to handle the last two cases is essentially the time complexity of the clique percolation, which is roughly $O(|C(u)|^3)$ in the worst case. Although the time complexity is cubic in the number of nodes, the total number of nodes inside a single community is relatively small in comparison with the total number

of vertices $N$, and thus does not affect the actual running time. Experimental results in Section 4 show that our algorithm performs quickly and smoothly in large online social networks.

Figure 3-7. NMI scores for different values of $\beta$. $N = 5000$ (top), $N = 1000$ (bottom), $\mu = 0.1$ (left), $\mu = 0.3$ (right).

3.4 Experimental Results

In this section, we first present the empirical results of AFOCS in comparison with two static detection methods: CFinder, the most popular method [84], and COPRA, the most effective method [40]. We next compare the performance of AFOCS with other dynamic methods including OSLOM [60], FacetNet [71] and iLCD [9].

Data Sets: We use networks generated by the well-known LFR overlapping benchmark [58], the de facto standard for evaluating overlapping community detection algorithms. Generated networks follow power-law degree distributions and contain embedded overlapping communities (the ground truth) of varying sizes that capture the internal characteristics of real-world networks.

Figure 3-8. Comparison among AFOCS, COPRA and CFinder methods. A) Number of communities. B) NMI scores. $N = 5000$ (top), $N = 1000$ (bottom), $\mu = 0.1$ (left), $\mu = 0.3$ (right).

Setup: To compare fairly with COPRA and to avoid bias, we keep the parameters close to those of [40]: the minimum and maximum community sizes are $c_{min} = 10$ and $c_{max} = 50$, and each vertex belongs to at most two communities, $om = 2$. The network sizes are $N = 1000$ and $N = 5000$, and the mixing rates are $\mu = 0.1$ and $\mu = 0.3$. The overlapping fraction $\gamma$, which determines the fraction of overlapped nodes, ranges from 0 to 0.5. Since COPRA is nondeterministic, we run it 10 times on each instance and select the best result.

Metrics: We evaluate the following metrics. (1) The generalized Normalized Mutual Information (NMI) [56], specially built for overlapping communities. NMI scores the similarity between the detected network communities and the ground truth. This is a standardized measure since $NMI(U, V) = 1$ if structures $U$ and $V$ are identical and 0 if they are totally separated. (2) The number of communities, ignoring singleton communities and unassigned nodes. A good community detection method should produce roughly the same number of communities as the known ground truth.

3.4.1 Choosing the Overlapping Threshold

The overlapping threshold $\beta$ is the only input parameter required by our framework, and thus determining its appropriate value plays an important role in assessing AFOCS's performance. To best determine this threshold, we run AFOCS on generated networks with different values of $\beta$, and record the similarities between the detected communities and the ground truth via NMI scores (Figure 3-7). Higher NMI scores indicate better values of $\beta$. As a threshold parameter, $\beta$ controls how much substructure communities can have in common. Smaller values of $\beta$ imply that we allow network communities to overlap more with each other, and vice versa. Similarly, $\beta$ can be thought of as the zooming scale of the network structure, where lower values of $\beta$ reveal the coarser structure and higher values reveal the finer structure. As depicted in Figure 3-7, the best values for $\beta$ range from 0.67 to 0.80, among which $\beta = 0.70$ yields the best community similarity (NMI scores are

ranging from 0.8 to 1) in all of the generated networks. Therefore, we fix the overlapping threshold in AFOCS to be $\beta = 0.70$ hereafter.

3.4.2 Reference to Static Methods

We show our results in groups of four. For each case we vary the overlapping fraction $\gamma$ from 0 to 0.5 and analyze the results found by AFOCS, CFinder, COPRA and the static OSLOM method (OSLOMs). We only present results for the parameters that give top performance for CFinder (clique size $k = 4, 5$) and COPRA (maximum communities per vertex $v = 3, 6$). Figure 3-8A shows the number of communities found by AFOCS, COPRA, CFinder and OSLOMs together with the ground truth. This figure reveals that the numbers of communities found by AFOCS, marked with squares, are the closest and almost identical to the ground truth as the overlapping fraction gets higher. There is an exception when $N = 1000$ and $\mu = 0.3$, which we will discuss later.

In terms of NMI scores, as one can infer from Figure 3-8B, AFOCS achieves the highest performance among all methods and is much more stable. A common trend in this test is that the performance of all methods degrades (1) when the mixing rate $\mu$ increases, i.e., when the community structure becomes more ambiguous, or (2) when the size of the network decreases while the mixing rate stays the same. AFOCS is less competitive only when both negative factors occur, in the bottom-right chart with $N = 1000$ and $\mu = 0.3$; it is in general the best performer. OSLOMs, the static version of the OSLOM method, does not appear to perform well on these synthesized data, as its NMI scores are low and degrade quickly when the network communities become more stochastic. The NMI scores of AFOCS, on the other hand, remain high and stable even when the network community structure becomes unclear as the overlapping fraction increases. A significant gap is observed when the mixing rate gets higher ($\mu = 0.3$) and the network size gets smaller ($N = 1000$). AFOCS provides fewer communities

than the ground truth but with much higher overlapping rates. The reason is that with a larger mixing rate, a node will have more edges connecting to vertices in other communities, which increases the chance that AFOCS will merge highly overlapped communities. Hence, AFOCS creates fewer but larger communities. We note that this 'weakness' of AFOCS is debatable since, when the mixing rate increases, the ground truth does not necessarily coincide with the structure implied by the network's topology. Extensive experiments show the ability of AFOCS to identify high quality overlapping communities. In addition, we found that AFOCS runs substantially faster than the other competitors: on the Facebook regional network [100] containing 63K nodes, AFOCS is 150x faster than COPRA while CFinder is unable to finish its tasks.

3.4.3 Reference to Other Dynamic Methods

We next observe the performance of AFOCS in reference to three dynamic methods: FacetNet, iLCD and OSLOM. Since the ground-truth communities are known on synthesized datasets, fair comparisons among the methods can be obtained via their NMI scores and running times. The higher the NMI scores and the lower the running time, the better a method appears to be. Each synthesized dynamic network is simulated via 5 snapshots, in which the basic communities are formed by using 50% of the network data, with approximately 10% of the network evolution (node/edge additions and removals) added to each growing snapshot at a time. Since FacetNet requires the number of communities a priori, we give this method the actual number as mined from the ground truth. For the iLCD and OSLOM methods, we keep the default settings as provided in their distributions.

We first evaluate the objective function, i.e., the total internal density obtained by all methods, in Figure 3-9A. Although internal density is not necessarily the objective of the other methods, this metric gives an idea of how strong the community structure detected by each approach is. As revealed in Figure 3-9A, AFOCS obtains among the highest internal densities in all tests and lags behind only the iLCD approach.

Figure 3-9. Comparison among AFOCS, iLCD, FacetNet and OSLOM dynamic methods. A) Objective values. B) NMI scores.

Figure 3-10. The number of communities obtained by the AFOCS, iLCD, FacetNet, OSLOM and OSLOMs methods.

The NMI scores of the four methods are presented in Figures 3-9B and 3-10. These subfigures reveal that the NMI scores of AFOCS are higher overall than those of FacetNet, iLCD and OSLOM. In particular, the NMI scores of AFOCS lag only about 5-7% behind those of OSLOM and iLCD in the first 2 network snapshots, while they are much better than the others at the end of the evolution. OSLOM's NMI values are very high at the very beginning; however, they tend to decrease quickly as more connections and nodes are introduced. The NMI scores of iLCD and FacetNet tend to fluctuate and also decrease significantly at the last snapshot. AFOCS, in contrast, keeps its NMI scores high and stable, especially at the end of the network evolution. This implies that communities discovered by AFOCS are of higher similarity to the ground truth than

those of the other dynamic methods, especially in the long run. The numbers of communities found by all methods are reported in Figure 3-10. The closer these detected numbers of communities are to the ground truth, the better the method is believed to be. As revealed in the subfigures of Figure 3-10, the quantities discovered by AFOCS tend to closely approach the actual numbers, even when the mixing rates are high (right figures). The close match between these numbers of communities is possibly the best explanation for the high NMI scores of AFOCS over the other competitors.

We next take a look at the running time of all methods on these synthesized networks. AFOCS requires at most 5 seconds to finish updating each network snapshot, whereas FacetNet needs more than 25 seconds (5x more time) in networks with just 5000 nodes. iLCD and OSLOM also perform fast on these generated datasets; however, the similarity between their detected communities and the ground truth is surprisingly poor, as revealed by the results. Therefore, in terms of dynamic approaches, we strongly believe that AFOCS achieves competitive community detection results in a timely manner. These results also give us confidence when applying AFOCS to analyze real-world networks.

CHAPTER 4
COMMUNITY STRUCTURE DETECTION USING NONNEGATIVE MATRIX FACTORIZATION

In this chapter, we analyze two approaches, namely iSNMF and iANMF, for effectively identifying social network communities using Nonnegative Matrix Factorization (NMF) with the I-divergence (Kullback-Leibler divergence) as the cost function. Our approaches work by iteratively factorizing a nonnegative input matrix through derived multiplicative update rules and the Quasi-Newton method. By doing so, we can not only extract meaningful overlapping communities via the soft community assignments produced by NMF but also handle both directed and undirected networks, with or without weights. We give the complete multiplicative update rules for factorizing $X \approx HH^T$ (the iSNMF problem) and $X \approx HSH^T$ (the iANMF problem) to effectively identify overlapping communities on social networks. These approaches are topology-independent and their solutions can be easily interpreted. We provide in detail the foundational properties as well as the proofs of correctness and convergence for both the iSNMF and iANMF problems. We also propose the Quasi-Newton method to speed up the iSNMF update rule. Furthermore, we validate the performance of our approaches through extensive experiments on not only synthesized datasets but also real-world networks. Empirical results show that iSNMF is among the most efficient detection methods on undirected networks, while iANMF outperforms currently available methods on directed networks, especially in terms of detection quality.

4.1 Problem Definition and Properties

4.1.1 Motivation for NMF in Community Detection

Let us first get some insight into how NMF can be helpful in detecting network communities, especially overlapping ones. Consider the toy network $G = (V, E)$ pictured in Figure 4-1. This network contains clear communities $C_1$ and $C_2$ having node 4 in common. The adjacency matrix $X$ of this ideal network can be represented as

$$X = \begin{pmatrix} S_1 & 0 \\ 0 & S_2 \end{pmatrix},$$
where $S_1$ and $S_2$ are $4 \times 4$ and $5 \times 5$ square matrices corresponding to $C_1$ and $C_2$, respectively.

Figure 4-1. An illustrative example motivating NMF in community detection.

This adjacency matrix $X$ summarizes all the network information and is the only thing we have. So, how can we recover the appropriate communities (or the community indicators) from this matrix alone? This is where NMF comes into the picture. In particular, the special NMF factorization $X \approx HSH^T$ gives us $H$ and $S$ as the community indicator and the community internal-strength indicator matrices, respectively. In this example, the factorization $X \approx HSH^T$ yields $S = I_2$ and
$$H^T = \begin{pmatrix} 1 & 1 & 1 & 0.87 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0.89 & 0.99 & 0.99 & 0.99 & 0.99 \end{pmatrix}.$$
Matrix $H$ clearly indicates that nodes 1-3 should be in one community and nodes 5-8 should belong to another one. $H$ also suggests that node 4 should be an overlapping node due to its significant contribution to both communities. These assignments indeed reflect the true node labels. In addition, matrix $S$ indicates that each detected community attains its perfect internal strength, which intuitively agrees with the original clique structures. This illustrative example, though simple, motivates the application

of the NMF factorization $X \approx HSH^T$ in community detection. Note that when $X$ is symmetric (i.e., the network is undirected), $S$ is also symmetric and thus can be further absorbed into $H$ by the assignment $H \leftarrow HS^{1/2}$. Hence, the problem reduces to $X \approx HH^T$ only when $X$ is symmetric.

4.1.2 Problem Definitions

In order to quantify the goodness of the approximation, we use the I-divergence (Kullback-Leibler (KL) divergence) between two nonnegative matrices $A$ and $B$ suggested by [64]:
$$D(A\|B) = \sum_{ij} \left( A_{ij} \log \frac{A_{ij}}{B_{ij}} - A_{ij} + B_{ij} \right).$$
Due to the inequality $x \log x \ge x - 1\ \forall x > 0$, it is easy to see that $D(A\|B)$ is lower bounded by zero and vanishes if and only if $A = B$. However, unlike the Euclidean distance, this function is not symmetric in $A$ and $B$, so we refer to it as the divergence from $A$ to $B$. The smaller the divergence between $A$ and $B$, the more similar they are. Therefore, our objectives seek the factorizations $X \approx HH^T$ and $X \approx HSH^T$ such that $D(X\|HH^T)$ and $D(X\|HSH^T)$ are minimized. Formally, the problems we are interested in can be stated as follows (here the little "i" comes from the I-divergence).

Problem 1 (iSNMF). Given a nonnegative symmetric matrix $X$, find a matrix $H \ge 0$ that minimizes $D_X(HH^T) \equiv D(X\|HH^T)$.

Problem 2 (iANMF). Given a nonnegative asymmetric matrix $X$, find matrices $H, S \ge 0$ that minimize $D_X(HSH^T) \equiv D(X\|HSH^T)$.

4.1.3 Properties of iSNMF and iANMF Factorizations

By Lemma 13, we give important properties of iSNMF and iANMF: the divergences $D_X(HH^T)$ and $D_X(HSH^T)$ are convex in $S$ only or in $H$ only; however, they are not convex in both variables together. Although the same observations have been proposed for the general NMF problem on both Frobenius and I-divergence cost functions [64], no

claim has been made particularly for the iSNMF and iANMF problems, especially on the I-divergence function.

Lemma 13. The divergences $D_X(HH^T)$ and $D_X(HSH^T)$ in iSNMF and iANMF are convex in $H$ or $S$ only, but not in both $S$ and $H$ together.

Proof. (Convexity in $S$) Suppose $H$ is a fixed matrix. For any numbers $\alpha, \beta \in [0, 1]$ with $\alpha + \beta = 1$, we have
$$D_X(H(\alpha S_1 + \beta S_2)H^T) \le \alpha D_X(HS_1H^T) + \beta D_X(HS_2H^T)$$
if and only if
$$-X_{ij} \log\left(\alpha[HS_1H^T]_{ij} + \beta[HS_2H^T]_{ij}\right) \le -\alpha X_{ij} \log[HS_1H^T]_{ij} - \beta X_{ij} \log[HS_2H^T]_{ij}$$
for any matrices $S_1, S_2 \ge 0$. The latter inequality holds due to the convexity of the $-\log(\cdot)$ function and Jensen's inequality. Thus, $D_X(HSH^T)$ is convex in $S$ when $H$ is fixed.

(Convexity in $H$) Assume $S$ is a fixed matrix. Rewrite
$$D_X(HSH^T) = \sum_{ij} X_{ij}(\log X_{ij} - 1) - \sum_{ij} X_{ij} \log[HSH^T]_{ij} + \sum_{ij} [HSH^T]_{ij}.$$
Since the first term is a constant and $-\log(\cdot)$ is a convex function, we need to show that the last term is also convex in $H$. Let $f(H) = \sum_{ij}[HSH^T]_{ij}$. Now,
$$\alpha f(H_1) + \beta f(H_2) - f(\alpha H_1 + \beta H_2) = \alpha\beta \sum_{ij} \left( [H_1SH_1^T]_{ij} - [H_2SH_1^T]_{ij} - [H_1SH_2^T]_{ij} + [H_2SH_2^T]_{ij} \right) = \alpha\beta \sum_{ij} [(H_1 - H_2)S(H_1 - H_2)^T]_{ij} \ge 0$$
(since $S \ge 0$ and $\sum_{ij}[AA^T]_{ij} \ge 0$ for any matrix $A$). This implies the convexity in $H$ of $D_X(HSH^T)$. The convexity in $H$ for iSNMF is derived similarly by taking $S$ to be the identity matrix $I$. The nonconvexity in both $S$ and $H$ follows from the general NMF case [64].
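As a numerical illustration of the convexity in $S$ (not a substitute for the proof), the sketch below evaluates the I-divergence for a fixed $H$ and checks the defining inequality at one convex combination of two random matrices $S_1$ and $S_2$. The matrix sizes, the random seed, the value of $\alpha$ and all helper names are assumptions chosen only for the example.

```python
import math
import random


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def i_divergence(A, B):
    """D(A||B) = sum_ij (A_ij log(A_ij / B_ij) - A_ij + B_ij), with 0 log 0 = 0."""
    total = 0.0
    for row_a, row_b in zip(A, B):
        for a, b in zip(row_a, row_b):
            if a > 0:
                total += a * math.log(a / b)
            total += b - a
    return total


def reconstruct(H, S):
    """Return H S H^T."""
    return matmul(matmul(H, S), transpose(H))


def random_matrix(rows, cols, rng):
    # Strictly positive entries keep every logarithm well defined.
    return [[rng.uniform(0.1, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
n, K, alpha = 5, 2, 0.3
X = random_matrix(n, n, rng)
H = random_matrix(n, K, rng)
S1 = random_matrix(K, K, rng)
S2 = random_matrix(K, K, rng)
S_mix = [[alpha * s1 + (1 - alpha) * s2 for s1, s2 in zip(r1, r2)]
         for r1, r2 in zip(S1, S2)]

lhs = i_divergence(X, reconstruct(H, S_mix))
rhs = (alpha * i_divergence(X, reconstruct(H, S1))
       + (1 - alpha) * i_divergence(X, reconstruct(H, S2)))
print(lhs <= rhs + 1e-12)
```

Because $HSH^T$ is linear in $S$, the inequality holds for every choice of $S_1$, $S_2$ and $\alpha$; the single instance above is only a sanity check of the implementation.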

The above properties are nontrivial since they tell us that it is unrealistic to solve either the iSNMF or the iANMF problem for global minima, and consequently suggest using other techniques such as the Projected Gradient [14], Quasi-Newton [108] or particularly the Alternating Least Squares (ALS) [15] methods to quickly find a local minimum. However, by Lemma 14, we show that for the iSNMF and iANMF problems, employing the traditional ALS does not provide any speedup, since we cannot independently update the columns of $S$ or $H$ at the same time, which prevents the use of this technique for our problems.

Lemma 14. Employing the ALS method does not provide any speedup to either iSNMF or iANMF.

Proof. Let us first review the ALS method's working mechanism on the general NMF problem $X \approx WH$. Given $X \ge 0$, the ALS method performs the following steps [5]:

1. Randomly initialize $W^1_{ia} \ge 0$, $H^1_{bj} \ge 0$, $\forall i, a, b, j$.
2. For $k = 1, 2, \ldots$, alternately update $W^{k+1}$ and $H^{k+1}$ by $W^{k+1} = \arg\min_{W \ge 0} D_X(WH^k)$ and $H^{k+1} = \arg\min_{H \ge 0} D_X(W^{k+1}H)$.

The main idea of the ALS method is to solve each minimization problem as a collection of several independent nonnegative least squares problems, owing to the uncorrelated relationship between $W$ and $H$. For instance, one can write $H^{k+1} = \arg\min_{H \ge 0} D(X\|W^{k+1}H)$ as: the $j$th column of $H^{k+1}$ equals $\arg\min_{h \ge 0} D(x\|W^{k+1}h)$, where $x$ is the $j$th column of $X$ and $h$ is a column vector of appropriate size. Therefore, each sub-minimization problem requires only the values of a specific column and consequently can be done in a parallel manner. Since $H$ and $H^T$ are strongly related, it is inappropriate to apply the ALS method to the iSNMF problem. For the iANMF problem, we first note that $[HSH^T]_{ij} = \sum_{tk} H_{ik}H_{jt}S_{kt}$, which implies that an entry in $HSH^T$ already requires all values of $S$ even when $H$ is fixed. Therefore, should one try to update a single column of $S$ independently, as suggested in the $S$-phase of the ALS method, one has to repeatedly

solve for all elements $S_{kt}$, which may incur even more computational requirements. Thus, the conclusion follows.

4.2 The Update Rule for iSNMF

4.2.1 Multiplicative Update Rule

We present our solution for iSNMF when the input matrix $X$ is symmetric. Formally, given a nonnegative symmetric matrix $X$ of size $n \times n$ and an integer $K \le n$, we need to find a nonnegative matrix $H$ of size $n \times K$ such that $D_X(HH^T) \equiv D(X\|HH^T)$ is minimized. We solve this problem using the Karush-Kuhn-Tucker (KKT) [13] conditions. In particular, we introduce the Lagrange multipliers $\lambda_{ij}$ for the constraints $H_{ij} \ge 0$ and consider the objective function $J = D(X\|HH^T) - \sum_{ij} \lambda_{ij}H_{ij}$, or
$$J = \sum_{ij} \left( X_{ij} \log \frac{X_{ij}}{[HH^T]_{ij}} - X_{ij} + [HH^T]_{ij} \right) - \sum_{ij} \lambda_{ij}H_{ij}.$$
The KKT conditions require $\frac{\partial J}{\partial H_{ab}} = 0$ (or $\frac{\partial D_X(HH^T)}{\partial H_{ab}} = \lambda_{ab}$) as the optimality condition and $\lambda_{ab}H_{ab} = 0$ as the complementary slackness condition for any $H_{ab}$. For ease of computation, we construct the derivative matrix of $HH^T$ with respect to $H_{ab}$ in Figure 4-2. For each position $(a, b)$, this derivative matrix is zero everywhere except for the $a$th column and $a$th row, whose elements are from the $b$th column of $H$. Using this matrix, we obtain
$$\frac{\partial D_X(HH^T)}{\partial H_{ab}} = 2\left( \sum_k H_{kb} - \sum_k H_{kb} \frac{X_{ak}}{[HH^T]_{ak}} \right). \quad (4\text{-}1)$$
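The closed-form derivative above can be compared against central finite differences on a small instance. This is a hedged sanity check only: the $4 \times 4$ symmetric matrix $X$, the $4 \times 2$ matrix $H$, the step size and the helper names are all made up for the example, and entries are kept strictly positive so that every logarithm and quotient is defined.

```python
import math


def hht(H):
    """Return H H^T as a list of rows."""
    return [[sum(a * b for a, b in zip(hi, hj)) for hj in H] for hi in H]


def divergence(X, H):
    """D(X || H H^T) for strictly positive X and H H^T."""
    R = hht(H)
    n = len(X)
    return sum(X[i][j] * math.log(X[i][j] / R[i][j]) - X[i][j] + R[i][j]
               for i in range(n) for j in range(n))


def gradient_entry(X, H, a, b):
    """Closed form: 2 (sum_k H_kb - sum_k H_kb X_ak / [HH^T]_ak), X symmetric."""
    R = hht(H)
    return 2.0 * sum(H[k][b] * (1.0 - X[a][k] / R[a][k]) for k in range(len(H)))


def numeric_entry(X, H, a, b, step=1e-6):
    """Central finite difference of the divergence with respect to H_ab."""
    plus = [row[:] for row in H]
    minus = [row[:] for row in H]
    plus[a][b] += step
    minus[a][b] -= step
    return (divergence(X, plus) - divergence(X, minus)) / (2.0 * step)


X = [[2.0, 1.0, 0.5, 0.2],
     [1.0, 2.0, 0.3, 0.4],
     [0.5, 0.3, 2.0, 1.0],
     [0.2, 0.4, 1.0, 2.0]]
H = [[1.0, 0.2], [0.8, 0.3], [0.3, 0.9], [0.2, 1.1]]

max_gap = max(abs(gradient_entry(X, H, a, b) - numeric_entry(X, H, a, b))
              for a in range(4) for b in range(2))
print(max_gap)
```

The factor 2 in the closed form relies on the symmetry of $X$; for an asymmetric input the row and column contributions would have to be kept separate.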

Figure 4-2. The partial derivative matrix of $HH^T$ with respect to $H_{ab}$.

Hence, the optimality condition implies
$$\lambda_{ab} = 2\left( \sum_k H_{kb} - \sum_k H_{kb} \frac{X_{ak}}{[HH^T]_{ak}} \right),$$
and thus the complementary slackness condition requires
$$2\left( \sum_k H_{kb} - \sum_k H_{kb} \frac{X_{ak}}{[HH^T]_{ak}} \right) H_{ab} = 0,$$
which suggests the following update rule:
$$H_{ab} \leftarrow H_{ab} \frac{\sum_k H_{kb} X_{ak} / [HH^T]_{ak}}{\sum_t H_{tb}}. \quad (4\text{-}2)$$
In terms of the projected gradient method, the rule above can be obtained by using the update rule
$$H_{ab} \leftarrow H_{ab} - \eta_{ab} \frac{\partial D_X(HH^T)}{\partial H_{ab}},$$
with the magnitude $\eta_{ab}$ set to some appropriate small positive number. Here, setting $\eta_{ab} = \frac{H_{ab}}{2\sum_t H_{tb}}$

leads to the same update rule as (4-2). The iSNMF community detection algorithm is described in Alg. 13. Here, $n_0$ is the maximum number of iterations, $\theta$ is the allowed threshold for the quality of the iSNMF approximation and $\delta$ is a given scale to determine community memberships. We assume that $K$, the number of communities, is predetermined or given as part of the input. The choice of $\delta$ will be described later.

Algorithm 13 iSNMF for community detection
Input: Undirected, unweighted (or weighted) adjacency matrix $X$, $K$, $n_0$, $\theta$, $\delta$.
Output: Community indicator matrix $H$.
    Initialize $H$ to be a random nonnegative matrix;
    $iter \leftarrow 0$;
    while ($iter \le n_0$) and ($D_X(HH^T) > \theta$) do
        Update $H_{ab} \leftarrow H_{ab} \frac{\sum_k H_{kb} X_{ak} / [HH^T]_{ak}}{\sum_t H_{tb}}$;
        $iter \leftarrow iter + 1$;
    end while
    % Inferring community labels from H %
    $C_b \leftarrow \emptyset\ \forall b = 1 \ldots K$;
    $P \leftarrow normalized(H)$;
    for $b \leftarrow 1 \ldots K$ do
        if $P(a, b) \ge \delta \max(P(a, :))$ then
            $C_b \leftarrow C_b \cup \{a\}$;
        end if
    end for

Remark. As with the update rules found in [64], we have shown an important fact: these rules can be derived similarly for this special case. However, our multiplicative update rule (4-2) is not trivial in the sense that we can obtain a convergence proof for our proposed rule, whereas it is inappropriate to adapt the proofs of [64] and [16], which assumed absolutely no correlation between $W$ and $H$.

Analysis. We provide the convergence analysis for our proposed update rule (4-2) using an auxiliary function defined as follows:

(Auxiliary function) $G(h, \tilde{h})$ is an auxiliary function for $F(h)$ if the conditions $G(h, \tilde{h}) \ge F(h)$ and $G(h, h) = F(h)$ are satisfied.

Lemma 15. [64] If $G$ is an auxiliary function, then $F$ is nonincreasing under the update $h^{t+1} = \arg\min_h G(h, \tilde{h})$.

To prove the convergence of the proposed multiplicative update rule, we construct an auxiliary function $G(H, \tilde{H})$ of $F(H) \equiv D_X(HH^T)$ as follows:
$$G(H, \tilde{H}) = \sum_{ij} \left( X_{ij} \log X_{ij} - X_{ij} + [HH^T]_{ij} \right) - \sum_{ijk} X_{ij} \frac{H_{ik}\tilde{H}_{jk}}{\sum_t H_{it}\tilde{H}_{jt}} \left( \log H_{ik}H_{jk} - \log \frac{H_{ik}\tilde{H}_{jk}}{\sum_t H_{it}\tilde{H}_{jt}} \right).$$

Theorem 4.1. The divergence $D_X(HH^T)$ is nonincreasing under the update rule (4-2) and is invariant when $H$ is at a stationary point of the divergence.

Proof. When $\tilde{H} = H$, it is easy to verify that $G(H, H) = F(H)$, thus we only need to check $G(H, \tilde{H}) \ge F(H)$. Now, $G(H, \tilde{H}) \ge F(H)$ iff
$$-\sum_{ijk} X_{ij} \frac{H_{ik}\tilde{H}_{jk}}{\sum_t H_{it}\tilde{H}_{jt}} \left( \log H_{ik}H_{jk} - \log \frac{H_{ik}\tilde{H}_{jk}}{\sum_t H_{it}\tilde{H}_{jt}} \right) \ge -\sum_{ij} X_{ij} \log[HH^T]_{ij} = -\sum_{ij} X_{ij} \log\left( \sum_k H_{ik}H_{jk} \right)$$
$$\iff -\sum_{ijk} X_{ij} \frac{H_{ik}\tilde{H}_{jk}}{\sum_t H_{it}\tilde{H}_{jt}} \log \frac{H_{ik}H_{jk}\sum_t H_{it}\tilde{H}_{jt}}{H_{ik}\tilde{H}_{jk}} \ge -\sum_{ij} X_{ij} \log\left( \sum_k H_{ik}H_{jk} \right).$$
To prove the above inequality, we apply Jensen's inequality to the convex function $-\log\left(\sum_k H_{ik}H_{jk}\right)$, yielding
$$-\log \sum_k \alpha_k \frac{H_{ik}H_{jk}}{\alpha_k} \le -\sum_k \alpha_k \log \frac{H_{ik}H_{jk}}{\alpha_k},$$
where $\alpha_k \equiv \alpha_{ijk} = \frac{H_{ik}\tilde{H}_{jk}}{\sum_t H_{it}\tilde{H}_{jt}}$. It is obvious that the $\alpha_k$'s are nonnegative and sum up to unity. Thus, we have $G(H, \tilde{H}) \ge F(H)$. Taking the derivative of $G(H, \tilde{H})$ with respect to $H$ also gives the same update rule (4-2).
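The multiplicative rule and the label-inference step of Alg. 13 can be sketched in a few lines. The code below is an assumption-laden illustration rather than the implementation used in the experiments: `normalized(H)` is taken to mean row normalization, the scale value 0.9 is arbitrary, zero entries of $X$ are simply skipped, and the function names are hypothetical. It checks the invariance part of Theorem 4.1 (an exact factorization $X = HH^T$ is left unchanged by one sweep) and applies the label rule to the indicator matrix of the toy example in Section 4.1.1.

```python
def hht(H):
    """Return H H^T as a list of rows."""
    return [[sum(a * b for a, b in zip(hi, hj)) for hj in H] for hi in H]


def isnmf_update(X, H):
    """One sweep of H_ab <- H_ab * (sum_k H_kb X_ak / [HH^T]_ak) / sum_t H_tb.

    Assumes [HH^T]_ak > 0 wherever X_ak > 0; zero entries of X contribute 0.
    """
    n, K = len(H), len(H[0])
    R = hht(H)
    col_sum = [sum(H[t][b] for t in range(n)) for b in range(K)]
    return [[H[a][b]
             * sum(H[k][b] * X[a][k] / R[a][k] for k in range(n) if X[a][k] > 0)
             / col_sum[b]
             for b in range(K)]
            for a in range(n)]


def infer_communities(H, delta):
    """Node a joins community b when P(a, b) >= delta * max_b P(a, b).

    P is H with each row scaled to sum to one (an assumed normalization).
    """
    K = len(H[0])
    communities = [set() for _ in range(K)]
    for a, row in enumerate(H):
        total = sum(row)
        if total == 0:
            continue  # node with no membership weight stays unassigned
        P = [h / total for h in row]
        best = max(P)
        for b in range(K):
            if P[b] >= delta * best:
                communities[b].add(a)
    return communities


# Invariance at an exact factorization X = H0 H0^T.
H0 = [[1.0, 0.2], [0.9, 0.1], [0.5, 0.5], [0.1, 1.0]]
X0 = hht(H0)
H1 = isnmf_update(X0, H0)

# Label inference on the toy indicator matrix (nodes are 0-indexed here).
H_toy = [[1, 0], [1, 0], [1, 0], [0.87, 0.89],
         [0, 0.99], [0, 0.99], [0, 0.99], [0, 0.99]]
labels = infer_communities(H_toy, delta=0.9)
print(labels)
```

With this choice of scale, the fourth node of the toy example is placed in both communities, matching the overlapping node discussed in Section 4.1.1.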

4.2.2 Quasi-Newton Method for iSNMF

One of the problems with the multiplicative update rule is its slow convergence: it does converge to a (possible) stationary point but may be slow, taking more iterations and time, and it can easily get trapped in local minima [5]. One way to speed up the convergence is to adjust the learning rate in a sequential manner, using a second-order estimate of the objective function, e.g., the Quasi-Newton method. In [108], the authors already addressed this method for the general NMF $X \approx WH$, but with the assumption of an uncorrelated relationship between $W$ and $H$. Obviously, that assumption does not hold when $X \approx HH^T$ and hence, it is not trivial to derive a proper Quasi-Newton formulation for the iSNMF problem. In fact, we show that the second-order, or Hessian, matrix $\mathbf{H}^{(H)}_{D_X}$ of iSNMF is much different from that of the general NMF. The general Quasi-Newton method, when applied to the iSNMF problem, takes the form
$$H \leftarrow \max\left\{ H - [\mathbf{H}^{(H)}_{D_X}]^{-1} \frac{\partial D_X}{\partial H},\ \varepsilon \right\}, \quad (4\text{-}3)$$
where $D_X$ is short for $D_X(HH^T)$, $\frac{\partial D_X}{\partial H}$ is the $n \times K$ first-order derivative matrix of $D_X(HH^T)$ with respect to $H$, $\mathbf{H}^{(H)}_{D_X}$ is the $nK \times nK$ second-order derivative (or Hessian) matrix of $D_X$ with respect to $H$, and $\varepsilon$ is a small nonnegative number to enforce the nonnegativity of $H$. Thanks to equation (4-1), the first-order derivative matrix $\frac{\partial D_X}{\partial H}$ can be found as
$$\frac{\partial D_X}{\partial H} = 2\left( \mathbf{1} - X\, ./\, HH^T \right) H,$$
where $\mathbf{1}$ is an $n \times n$ matrix of all 1's and $./$ is the Hadamard (element-wise) division. For any pair $(i, j)$, the Hessian matrix $\mathbf{H}^{(H)}_{D_X}$ can be found by
$$[\mathbf{H}^{(H)}_{D_X}]_{ij,ab} = \frac{\partial^2 D_X}{\partial H_{ij}\,\partial H_{ab}} = \begin{cases} 2\left( 1 - \frac{X_{ii}([HH^T]_{ii} - 2H_{ij}^2)}{[HH^T]_{ii}^2} + \sum_{k \ne i} \frac{H_{kj}^2 X_{ik}}{[HH^T]_{ik}^2} \right) & a = i,\ b = j \\ 2\left( \frac{2H_{ib}H_{ij}X_{ii}}{[HH^T]_{ii}^2} + \sum_{k \ne i} \frac{H_{kb}H_{kj}X_{ik}}{[HH^T]_{ik}^2} \right) & a = i,\ b \ne j \\ 2\left( 1 - \frac{X_{ai}([HH^T]_{ai} - H_{ij}H_{aj})}{[HH^T]_{ai}^2} \right) & a \ne i,\ b = j \\ 2\, \frac{H_{ib}H_{aj}X_{ai}}{[HH^T]_{ai}^2} & a \ne i,\ b \ne j \end{cases} \quad (4\text{-}4)$$

There are two important differences between the Hessian $\mathbf{H}_{NMF}$ for the general case [108] and $\mathbf{H}^{(H)}_{D_X}$. Firstly, the elements of $\mathbf{H}_{NMF}$ are zero everywhere except when $a = i$, $b = j$, whereas $\mathbf{H}^{(H)}_{D_X}$ takes values for all combinations of $a$, $b$, $i$ and $j$. Secondly, due to its sparseness, $\mathbf{H}_{NMF}$ can be written in matrix block form while $\mathbf{H}^{(H)}_{D_X}$ might not be, particularly when it is a full matrix. Therefore, updating $H$ in the iSNMF problem is much more complicated than usual, since finding $[\mathbf{H}^{(H)}_{D_X}]^{-1}$ in (4-3) requires more numerical computation. The authors in [108] also proposed a numerical technique to overcome the ill-conditioned Hessian matrix and to speed up the computing process, which we find useful when applied to our problems. Here, we briefly state their technique so that the paper is self-contained (note that (4-5) and (4-6) are not our equations): to reduce the computational cost, the inversion of the Hessian is replaced with the Q-less QR factorization computed by LAPACK. The final form of the Quasi-Newton method is
$$H \leftarrow \max\left\{ H - \gamma R_H \backslash W_H,\ \varepsilon \right\}, \quad (4\text{-}5)$$
$$W_H = Q_H^T \frac{\partial D_X}{\partial H}, \qquad Q_H R_H = \mathbf{H}^{(H)}_{D_X} + \lambda I_H, \quad (4\text{-}6)$$
where $I_H$ is the $nK \times nK$ identity matrix, and $\lambda = 10^{-12}$ and $\gamma = 0.9$ are the small fixed regularization and the relaxation parameters, respectively. The $\backslash$ operator in (4-5) denotes Gaussian elimination.

4.3 Update Rules for iANMF

4.3.1 Multiplicative Update Rules

In this section, we present our solution for the iANMF problem when $X$ is not symmetric. Formally, given a nonnegative asymmetric matrix $X$ of size $n \times n$, we find nonnegative matrices $H$ and $S$ of size $n \times K$ and $K \times K$, respectively, such that $D_X(HSH^T) \equiv D(X\|HSH^T)$ is minimized. We again solve this problem using the KKT conditions by introducing the Lagrange multipliers $\lambda_{ij}$ and $\mu_{ij}$ for the constraints

$H_{ij} \ge 0$ and $S_{ij} \ge 0$, respectively, and then consider the objective function $J = D(X\|HSH^T) - \sum_{ij} \lambda_{ij}H_{ij} - \sum_{ij} \mu_{ij}S_{ij}$. Equivalently, $J$ can be written as
$$J = \sum_{ij} \left( X_{ij} \log \frac{X_{ij}}{[HSH^T]_{ij}} - X_{ij} + [HSH^T]_{ij} \right) - \sum_{ij} \lambda_{ij}H_{ij} - \sum_{ij} \mu_{ij}S_{ij}.$$
The KKT conditions require $\frac{\partial J}{\partial H_{ab}} = 0$ and $\frac{\partial J}{\partial S_{ab}} = 0$, or equivalently, $\frac{\partial D_X(HSH^T)}{\partial H_{ab}} = \lambda_{ab}$ and $\frac{\partial D_X(HSH^T)}{\partial S_{ab}} = \mu_{ab}$ as the optimality conditions, as well as $\lambda_{ab}H_{ab} = 0$ and $\mu_{ab}S_{ab} = 0$ as the complementary slackness conditions for any $H_{ab}$ and $S_{ab}$. For ease of computation, we construct the matrix for finding the derivative of an entry $[HSH^T]_{ij}$ with respect to any $H_{ab}$ in Figure 4-3. Here $r(A, i)$ and $c(B, j)$ denote the $i$th row of $A$ and the $j$th column of $B$, respectively. The elements of this matrix are zero everywhere except for the $a$th column and $a$th row. Using this partial derivative matrix, we obtain
$$\frac{\partial \sum_{ij} X_{ij} \log[HSH^T]_{ij}}{\partial H_{ab}} = \sum_k \frac{X_{ka}[HS]_{kb}}{[HSH^T]_{ka}} + \sum_k \frac{X_{ak}[SH^T]_{bk}}{[HSH^T]_{ak}}$$
and
$$\frac{\partial \sum_{ij} [HSH^T]_{ij}}{\partial H_{ab}} = \sum_k \left( [HS]_{kb} + [SH^T]_{bk} \right).$$

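Figure 4-3. The partial derivative matrix of $HSH^T$ with respect to $H_{ab}$.

Therefore,
$$\frac{\partial D_X(HSH^T)}{\partial H_{ab}} = \frac{\partial\left( -\sum X_{ij} \log[HSH^T]_{ij} + \sum [HSH^T]_{ij} \right)}{\partial H_{ab}} = -\sum_k \frac{X_{ka}[HS]_{kb}}{[HSH^T]_{ka}} - \sum_k \frac{X_{ak}[SH^T]_{bk}}{[HSH^T]_{ak}} + \sum_k \left( [HS]_{kb} + [SH^T]_{bk} \right). \quad (4\text{-}7)$$
The optimality condition $\frac{\partial D_X(HSH^T)}{\partial H_{ab}} = \lambda_{ab}$ and the complementary slackness condition $\lambda_{ab}H_{ab} = 0$ together give the following update rule for $H_{ab}$:
$$H_{ab} \leftarrow H_{ab} \left( \frac{\sum_k X_{ka}[HS]_{kb} / [HSH^T]_{ka}}{\sum_t \left( [HS]_{tb} + [SH^T]_{bt} \right)} + \frac{\sum_k X_{ak}[SH^T]_{bk} / [HSH^T]_{ak}}{\sum_t \left( [HS]_{tb} + [SH^T]_{bt} \right)} \right). \quad (4\text{-}8)$$
Alternatively, this update rule can also be obtained by using the projected gradient method, in particular by updating
$$H_{ab} \leftarrow H_{ab} - \eta_{ab} \frac{\partial D_X(HSH^T)}{\partial H_{ab}}$$
with the magnitude $\eta_{ab} = \frac{H_{ab}}{\sum_t \left( [HS]_{tb} + [SH^T]_{bt} \right)}$.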


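Now we give the multiplicative update rule for any $S_{ab}$. The partial derivative of $D_X(HSH^T)$ with respect to $S_{ab}$ is

\[ \frac{\partial D_X(HSH^T)}{\partial S_{ab}} = \sum_{st} H_{sa} H_{tb} - \sum_{st} \frac{X_{st} H_{sa} H_{tb}}{[HSH^T]_{st}}. \]

The KKT conditions $\partial D_X(HSH^T) / \partial S_{ab} = \beta_{ab}$ and $\beta_{ab} S_{ab} = 0$ (with $\beta_{ab}$ the Lagrange multiplier of the constraint $S_{ab} \ge 0$) together imply the following update rule for $S_{ab}$:

\[ S_{ab} \leftarrow S_{ab} \, \frac{\sum_{st} H_{sa} H_{tb} \left( X_{st} / [HSH^T]_{st} \right)}{\sum_{st} H_{sa} H_{tb}}. \]

Alternatively, this rule can be derived by the projected gradient method

\[ S_{ab} \leftarrow S_{ab} - \xi_{ab} \frac{\partial D_X(HSH^T)}{\partial S_{ab}} \quad \text{with step size} \quad \xi_{ab} = \frac{S_{ab}}{\sum_{st} H_{sa} H_{tb}}. \]

The iANMF community detection procedure is presented in Alg. 14. The parameters and their meanings in this case are similar to those described in the SNMF case.

Algorithm 14 iANMF for community detection
Input: Directed, unweighted (weighted) adjacency matrix $X$, $K$, $n_0$, $\varepsilon$, $\delta$;
Output: Matrices $H$ and $S$ and the inferred community labels;
  Initialize $H$ and $S$ to be random nonnegative matrices;
  $iter \leftarrow 0$;
  while ($iter \le n_0$) and ($D_X(HSH^T) > \varepsilon$) do
    Update $H_{ab}$ using the multiplicative update rule for $H$;
    Update $S_{ab}$ using the multiplicative update rule for $S$;
    $iter \leftarrow iter + 1$;
  end while
  % Inferring community labels from H %
  $C_b \leftarrow \emptyset$ for all $b = 1 \ldots K$;
  $P \leftarrow normalized(H)$;
  for $b \leftarrow 1 \ldots K$ do
    if $P(a, b) \ge \delta \cdot \max(P(a, :))$ then
      $C_b \leftarrow C_b \cup \{a\}$;
    end if
  end for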
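The two multiplicative rules and the label-inference step of Alg. 14 can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `ianmf` and `i_divergence`, the row-wise normalization used for `normalized(H)`, the membership threshold `delta`, and the small constant `eps` that guards divisions are all choices made here.

```python
import numpy as np


def i_divergence(X, Y, eps=1e-12):
    """I-divergence D(X || Y) for nonnegative arrays (terms with X_ij = 0 contribute Y_ij)."""
    mask = X > 0
    return float(np.sum(X[mask] * np.log(X[mask] / (Y[mask] + eps))) - X.sum() + Y.sum())


def ianmf(X, K, n_iter=400, tol=1e-6, delta=1.0, seed=0, eps=1e-12):
    """Approximate X by H S H^T with multiplicative updates, then infer labels from H.

    delta and the row normalization below are assumptions of this sketch.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    H = rng.random((n, K)) + eps
    S = rng.random((K, K)) + eps
    for _ in range(n_iter):
        # H update: R[i, j] = X_ij / [HSH^T]_ij; HSt[k, b] equals [S H^T]_{bk}.
        R = X / (H @ S @ H.T + eps)
        HS, HSt = H @ S, H @ S.T
        H *= (R.T @ HS + R @ HSt) / (HS.sum(axis=0) + HSt.sum(axis=0) + eps)
        # S update: numerator is H^T R H, denominator is the outer product of column sums of H.
        R = X / (H @ S @ H.T + eps)
        col = H.sum(axis=0)
        S *= (H.T @ R @ H) / (np.outer(col, col) + eps)
        if i_divergence(X, H @ S @ H.T) <= tol:
            break
    # Label inference: node a joins community b if P[a, b] >= delta * max_b P[a, b].
    P = H / (H.sum(axis=1, keepdims=True) + eps)
    row_max = P.max(axis=1)
    communities = [set(np.flatnonzero(P[:, b] >= delta * row_max).tolist()) for b in range(K)]
    return H, S, communities
```

With `delta = 1.0` every node is placed in the community of its largest entry; smaller values of `delta` allow a node to appear in several communities, which is how overlaps would arise in this sketch.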


Summary. With the multiplicative update rules for $H_{ab}$ and $S_{ab}$, we give the complete steps for iteratively solving the iANMF problem with respect to the I-divergence. These rules differ from those found in prior studies and, to our knowledge, have not yet been derived in the literature. Thus, they are our contributions in this paper.

Analysis. We first show the following result.

Theorem 4.2. At a stationary point $(H, S)$ of $D_X(HSH^T)$, the KKT conditions imply that

\[ \sum_{st} X_{st} = \sum_{st} [HSH^T]_{st}. \]

Proof. We show that the condition $S_{ab} \, \partial D_X(HSH^T) / \partial S_{ab} = 0$ implies the above equality. In particular, this KKT condition is equivalent to

\[ S_{ab} \sum_{st} H_{sa} H_{tb} = S_{ab} \sum_{st} \frac{X_{st} H_{sa} H_{tb}}{[HSH^T]_{st}}. \]

Summing the left-hand side over all $a$ and $b$ gives

\[ \sum_{ab} S_{ab} \sum_{st} H_{sa} H_{tb} = \sum_{st} \sum_{ab} H_{sa} S_{ab} H_{tb} = \sum_{st} [HSH^T]_{st}. \]

Similarly, summing the right-hand side over all $a$ and $b$ gives

\[ \sum_{ab} S_{ab} \sum_{st} \frac{X_{st} H_{sa} H_{tb}}{[HSH^T]_{st}} = \sum_{st} \frac{X_{st} [HSH^T]_{st}}{[HSH^T]_{st}} = \sum_{st} X_{st}. \]

Therefore, the equality follows.

We next analyze the convergence of our proposed update rules for $H$ and $S$. By using appropriate auxiliary functions $G(S, \tilde{S})$ and $G(H, \tilde{H})$, one can show the following.

Theorem 4.3. The divergence $D(X \| HSH^T)$ is nonincreasing under the update rules for $H$ and $S$, and is invariant if and only if $S$ and $H$ are at their stationary points in the divergence.


Proof. The proof of convergence for the two update rules is similar to that of Theorem 4.1. Let us first define two functions

\[ G(S, \tilde{S}) = \sum_{ij} X_{ij} (\log X_{ij} - 1) + \sum_{ij} [HSH^T]_{ij} - \sum_{ijuv} X_{ij} \phi_{ijuv} \left( \log H_{iv} S_{vu} H_{ju} - \log \phi_{ijuv} \right), \]

and

\[ G(H, \tilde{H}) = \sum_{ij} X_{ij} (\log X_{ij} - 1) + \sum_{ij} [HSH^T]_{ij} - \sum_{ijuv} X_{ij} \psi_{ijuv} \left( \log H_{iv} S_{vu} H_{ju} - \log \psi_{ijuv} \right), \]

where

\[ \phi_{ijuv} = \frac{H_{iv} \tilde{S}_{vu} H_{ju}}{\sum_{st} H_{it} \tilde{S}_{ts} H_{js}}, \qquad \psi_{ijuv} = \frac{H_{iv} S_{vu} \tilde{H}_{ju}}{\sum_{st} H_{it} S_{ts} \tilde{H}_{js}}. \]

It is clear that, for each pair $(i, j)$, the $\phi_{ijuv}$ and the $\psi_{ijuv}$ are positive and sum to unity over $(u, v)$. We now prove the convergence of the update rule for $S$ when the matrix $H$ is fixed. Let $F(S) = D_X(HSH^T)$. We show that $G(S, \tilde{S})$ defined above is an auxiliary function for $F(S)$. When $\tilde{S} = S$, one can verify that $G(S, \tilde{S}) = F(S)$; thus we only need to check that $G(S, \tilde{S}) \ge F(S)$. This inequality is equivalent to

\[ -\sum_{ij} X_{ij} \log [HSH^T]_{ij} \le -\sum_{ijuv} X_{ij} \phi_{ijuv} \left( \log H_{iv} S_{vu} H_{ju} - \log \phi_{ijuv} \right). \]

By the definition of $\phi_{ijuv}$, it suffices that for every pair $(i, j)$

\[ -\log \sum_{uv} \phi_{ijuv} \frac{H_{iv} S_{vu} H_{ju}}{\phi_{ijuv}} \le -\sum_{uv} \phi_{ijuv} \log \frac{H_{iv} S_{vu} H_{ju}}{\phi_{ijuv}}, \]

which holds by Jensen's inequality and the convexity of the $-\log(\cdot)$ function. Now, taking the derivative of $G(S, \tilde{S})$ with respect to $S$ and setting it to zero gives the update rule for $S$. The proof for $H$ can be obtained in a very similar manner and is thus omitted here.

4.4 Experimental Results

In this section, we first validate our approaches on different synthesized networks with known ground-truths, and then present our findings on real-world traces including the Enron email [98] and Facebook social network [87]. To certify our performance, we


compare the results to two NMF methods proposed in [102] (i.e., wSNMF and wANMF), and the recently suggested Bayesian NMF [90] (i.e., the Bayesian method). Our methods require the number of communities $K$ as an input parameter. We stress that determining this quantity is not the main focus of NMF-based detection methods, since almost all of them rely on a predefined $K$ to discover the network communities, as commonly observed in [102][67][70]. Therefore, this quantity $K$ is predetermined using a procedure suggested in [80], which has been shown to predict the number of network communities well and in a timely manner. We also use this value as input for wSNMF and wANMF. For the Bayesian method, we keep the default settings as provided in its deliverable.

4.4.1 Empirical Results on Synthesized Networks

Of course, the best way to evaluate our approaches is to validate them on real-world networks with known community structures. Unfortunately, we often do not know those structures beforehand, or such structures cannot be easily mined from the network topologies. Although synthesized networks might not reflect all the statistical properties of real ones, they provide known ground-truths via planted communities and the ability to vary other network parameters such as sizes, densities and overlapping levels. Testing community detection methods on generated data has become a usual practice that is widely accepted in the field [58]. Therefore, running iSNMF and iANMF on synthesized networks not only certifies their performance but also gives us confidence in their behavior when applied to real-world traces.

Setup: We use the well-known LFR overlapping benchmark [57] to generate 22 weighted directed and undirected testbeds. Generated networks follow the power-law degree distribution and contain embedded overlapping communities of varying sizes that capture the internal characteristics of real-world networks. Parameters are: the number of nodes $N = 1000$, the mixing parameter $\mu = 0.1$ and $0.3$ controlling the overall sharpness of the community structure, the weight mixing parameter $\mu_w = 0.1$ and $0.3$, the minimum


and maximum community sizes $c_{\min} = 10$ and $c_{\max} = 50$, the maximum number of memberships of a node $o_m = 2$, and the overlapping fraction in $[0, 0.5]$, which measures the fraction of nodes with memberships in more than one community. We set the number of iterations to 400 in all methods and run 22 tests 100 times for consistency.

Figure 4-4. Normalized Mutual Information scores on synthesized networks

Metric: To measure the similarity between the detected communities and the embedded ground-truth, we evaluate the Generalized Normalized Mutual Information (NMI) [56]. NMI(U, V) is 1 if structures U and V are identical and is 0 if they are totally separated. This is the most important metric for a community detection algorithm because it indicates how good the algorithm is in comparison with the true communities. The higher the NMI value with respect to the ground-truth, the better.

Detection quality: As depicted in Figure 4-4, our approaches iSNMF and iANMF achieve the most stable and competitive (if not the best) NMI scores on both


weighted directed and undirected networks. In particular, on undirected networks (top two figures), the NMI scores produced by iSNMF are highly competitive with those of wSNMF and are up to 84% better than those returned by the Bayesian method. Moreover, its NMI scores remain high and balanced as the overlapping ratio increases. This means the communities discovered by iSNMF are consistently of high similarity to the ground-truth even when more and more network communities overlap with each other. wSNMF also displays these properties on undirected networks; however, its performance degrades significantly on directed weighted networks, as we will discuss shortly. The Bayesian method, on the other hand, produces very low NMI values that tend to decrease quickly as the overlapping fraction increases. This implies that the communities detected by this method do not coincide well with the embedded ones, especially when they highly overlap with each other.

Figure 4-5. Number of communities on synthesized networks


There is a close relationship between the number of communities and the identification capacity, which we observed in the case of undirected networks in Figure 4-5. As revealed in its top figures, the input numbers of communities for iSNMF and wSNMF are almost identical to the ground-truth when the mixing parameter $\mu$ is 0.1 and deviate slightly from it when $\mu$ is 0.3, while those of the Bayesian method are far away from the baseline. This close relationship, as a result, helps iSNMF and wSNMF to determine a proper number of basic features and, consequently, to indicate more appropriate community labels. However, this observation does not appear to hold for wANMF on directed networks, since it performs poorly whereas our approach iANMF still performs excellently on this type of network (Figure 4-4, bottom figures). The big gap between the Bayesian method and the ground-truth implies that its built-in estimate of the number of communities could mislead the factorization and thus result in its low NMI scores.

The superiority of our iANMF approach becomes more visible on directed weighted networks (Figure 4-4, bottom figures). In these figures, iANMF returns the best and most stable NMI values, and they remain high even as the overlapping fraction grows, i.e., when strongly overlapped communities appear. In particular, the NMI scores returned by iANMF are more than twice those of wANMF and are up to 10% better than those of the Bayesian method. The performance of wANMF, surprisingly, reduces to no more than half of its prior achievement even when fed with a relatively close number of true communities (bottom figures of Figure 4-5). This in turn indicates that the communities discovered by wANMF deviate heavily from, and are of very low similarity to, the ground-truth. The Bayesian method's performance is roughly the same on these directed networks, with average NMI scores that tend to decrease quickly in the long run. This comparison among the three NMF factorizations reveals that iSNMF and iANMF are the best methods for effectively recovering overlapped network community structures, especially on weighted and directed networks.
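To make the NMI scores discussed above more concrete, the sketch below computes the standard normalized mutual information between two disjoint labelings. This is a deliberate simplification: the evaluation in this section relies on the generalized NMI for overlapping covers [56], which is more involved. The function name `nmi` and the normalization by the sum of the two entropies are assumptions of this illustration.

```python
import math
from collections import Counter


def nmi(labels_u, labels_v):
    """Normalized mutual information 2*I(U;V) / (H(U) + H(V)) for two disjoint labelings."""
    n = len(labels_u)
    count_u = Counter(labels_u)
    count_v = Counter(labels_v)
    count_uv = Counter(zip(labels_u, labels_v))
    entropy_u = -sum(c / n * math.log(c / n) for c in count_u.values())
    entropy_v = -sum(c / n * math.log(c / n) for c in count_v.values())
    mutual_info = sum(
        c / n * math.log(c * n / (count_u[a] * count_v[b]))
        for (a, b), c in count_uv.items()
    )
    if entropy_u + entropy_v == 0:
        # Both labelings put every node in a single group, so they are identical.
        return 1.0
    return 2 * mutual_info / (entropy_u + entropy_v)
```

Identical partitions (up to a renaming of the labels) score 1, and statistically independent partitions score 0, matching the two extreme cases described for the metric.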


Figure 4-6. Running Time on synthesized networks

We next compare the running time of the three methods. As reported in Figure 4-6, the running times of iSNMF and wSNMF on undirected networks are fairly similar to each other (at most 2s difference) and are much less than the huge time requirement of the Bayesian method. On average, the Bayesian method requires almost 200s to finish the test, whereas iSNMF and wSNMF only need roughly 16s and 14s, respectively. On directed networks, iANMF requires nearly the same amount of time as the Bayesian method and much more time than wANMF. Note that this time consumption of iANMF is quite understandable, because each update of $S_{ab}$ based on the I-divergence already takes $O(n^2)$ time. However, the superiority of its NMI scores over those of its competitors makes iANMF a promising approach, especially suited for those who strive to discover excellent network community structures.


Figure 4-7. The number of communities, Internal density and Overlapping ratio of Enron email and Facebook-like datasets

In summary, comparisons among the three algorithms on generated networks show that (1) iSNMF is among the best NMF methods for efficiently identifying high quality overlapping communities in weighted and unweighted undirected networks, and (2) iANMF is the best among the three methods for analyzing weighted and directed networks containing highly overlapped communities, despite its long running time. More importantly, the performance of both approaches remains stable even when more and more overlapping communities are introduced. These results give us strong confidence when applying iSNMF and iANMF to analyze the real-world traces.

4.4.2 Results on Real Networks

We next utilize iANMF and iSNMF to analyze real network datasets and present our findings on their overlapping structures. In particular, we choose the Enron email dataset and the Facebook-like social network [87]. The Enron email network contains email message data from about 150 users, mostly senior management of Enron Inc., from Jan 1999 to July 2002 [98]. Each email address is represented by a unique identification number in the dataset, and each link corresponds to a message sent between the sender and the receiver. The Facebook-like social network is collected from students of the University of California, Irvine. The dataset contains 20296 messages sent and received among 1899 users. The numbers of communities input for the Enron email and Facebook-like datasets are set to 8 and 18, respectively.


We are interested in understanding their overlapping structures and what the overlapping nodes really mean to them, particularly in the top 5 biggest communities. As revealed in Figure 4-7, the numbers of members in the top 5 communities of the Facebook-like network are, not surprisingly, much bigger than those of the Enron email network. However, the internal density, i.e., the inner structure, of those top 5 communities in the Enron email network is much stronger than that of the Facebook network. Indeed, the density values of the Enron email communities are more than twice those of the Facebook network. This can be explained by the fact that email communication in a workplace among managers occurs much more frequently than messages in a social environment like the Facebook network.

We next investigate the overlapping substructures of these real networks, i.e., we want to know how much they overlap and what the overlapped nodes mean to the communities. As described in Figure 4-7, all 5 top communities of the Facebook network are highly overlapped, whereas just 3 top communities of the Enron email network appear to have this property. Moreover, overlapped nodes in the Facebook network tend to be active users who eagerly participate in multiple communities at the same time, i.e., they send messages to multiple friends in different groups. Overlapped nodes in the Enron email network, though fewer, potentially play vital roles in the company, since most of them communicate frequently with many other members in all of the communities.


CHAPTER 5
SOCIAL-AWARE ROUTING STRATEGIES IN MOBILE AD-HOC NETWORKS

In this chapter, we demonstrate the applicability of our proposed detection algorithms QCA and AFOCS as the community identification cores in forwarding and routing strategies in mobile dynamic networks. In the following paragraphs, we first present the application of QCA and then describe how AFOCS can help to improve the performance of this practical application.

5.1 A Message Forwarding and Routing Strategy Employing QCA

In a broad view, a MANET is a dynamic wireless network with or without an underlying infrastructure, in which each node can move freely in any direction and organize itself in an arbitrary manner. Due to node mobility and the unstable nature of links in a MANET, designing an efficient routing scheme has become one of the most important and challenging problems on MANETs. Recent research has shown that MANETs exhibit the properties of social networks [46][19][10] and that social-aware algorithms for network routing are of great potential. This is due to the fact that people have a natural tendency to form groups or communities in communication networks, where individuals inside each community communicate with each other more frequently than with people outside. This social property is nicely reflected in the underlying MANETs by the existence of groups of nodes where each group is more densely connected inside than outside. This resembles the idea of community structure in Mobile Ad hoc Networks.

Multiple routing strategies [19]-[45] based on the discovery of network community structures have provided significant enhancement over traditional methods. However, the community detection methods utilized in those strategies are not applicable to dynamic MANETs, since they have to recompute the network structure whenever changes to the network topology are introduced, which results in significant computational costs and processing time. Therefore, employing an adaptive community structure


detection algorithm as a core will provide a speedup as well as robustness to routing strategies in MANETs.

We evaluate five routing strategies: (1) WAIT: the source node waits until it meets the destination node; (2) MCP: a node keeps forwarding the messages until they reach the maximum number of hops; (3) LABEL: a node forwards or sends the messages to all members in the destination community [46]; (4) QCA: a LABEL version utilizing QCA as the dynamic community detection method; and lastly, (5) MIEN: a social-aware routing strategy on MANETs [26]. Even though the WAIT and MCP algorithms are very simple and straightforward to understand, they provide us with helpful information about the lower and upper bounds on the message delivery ratio, time redundancy and message redundancy. The LABEL forwarding strategy works as follows: it first finds the community structure of the underlying MANET, assigns the same label to all nodes of each community, and then exclusively forwards messages to destinations, or to next-hop nodes having the same labels as the destinations. The MIEN forwarding method utilizes the MIEN algorithm as a subroutine. The QCA routing strategy, instead of using a static community detection method, employs the QCA algorithm for adaptively updating the network community structure and then uses the newly updated structure to inform the routing strategy for forwarding messages.

5.1.1 Setup

We choose the Reality Mining dataset [29] provided by the MIT Media Lab to test our proposed algorithm. The Reality Mining dataset contains communication, proximity, location, and activity information from 100 students at MIT over the course of the 2004-2005 academic year. In particular, the dataset includes call logs, Bluetooth devices in proximity, cell tower IDs, application usage, and phone status (such as charging and idle) of the participating students over 350,000 hours (40 years). In this paper, we take into account the Bluetooth information to form the underlying MANET and evaluate the performance of the above five routing strategies.


Figure 5-1. Experimental results on the Reality Mining dataset. A) Delivery Ratio. B) Average Duplicate Message. C) Average Delivery Time.


5.1.2 Results

For each routing method, we evaluate the following: (1) Delivery ratio: the portion of successfully delivered messages over the total number of messages; (2) Average delivery time: the average time for a message to be delivered; (3) Average number of duplicated messages for each sent message. In particular, a total of 1000 messages are created and uniformly distributed during the experiment duration, and each message cannot exist longer than a threshold time-to-live. The experimental results are shown in Figures 5-1A, 5-1B and 5-1C.

Figure 5-1A describes the delivery ratio as a function of time-to-live. As revealed by this figure, QCA achieves a much better delivery ratio than MIEN as well as LABEL, and a far better one than WAIT. This means that the QCA routing strategy successfully delivers many more messages from the source nodes to the destinations than the others. Moreover, as time-to-live increases, the delivery ratio of QCA tends to approach the ratio of MCP, the strategy with the highest delivery ratio.

The comparison on delivery time shows that QCA requires less time and gets messages delivered faster than LABEL, as depicted in Figure 5-1C. It even requires less delivery time in comparison with the social-aware method MIEN. This can be explained as follows: the static community structures in LABEL can get a message forwarded to a wrong community when the destinations eventually change their communities during the experiment. Both QCA and MIEN, on the other hand, capture and update the community structures on-the-fly as changes occur, and thus achieve better results.

The numbers of duplicate messages presented in Figure 5-1B indicate that both QCA and MIEN achieve the best results. The numbers of duplicated messages of the MCP method are substantially higher than those of the others and are not plotted. In fact, the results of QCA and MIEN are relatively close and tend to approach each other as time-to-live increases.
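The three quantities evaluated in this section can be computed from per-message logs as sketched below. The record layout (`MessageRecord` with `created_at`, `delivered_at` and `duplicates`) and the function name `routing_metrics` are hypothetical and introduced only for this illustration; the experiments may have logged results differently.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class MessageRecord:
    created_at: float              # time the message was generated
    delivered_at: Optional[float]  # None if the message expired (time-to-live) undelivered
    duplicates: int                # number of extra copies created while forwarding


def routing_metrics(records: Sequence[MessageRecord]) -> Tuple[float, float, float]:
    """Return (delivery ratio, average delivery time, average duplicates per message)."""
    total = len(records)
    delivered = [r for r in records if r.delivered_at is not None]
    ratio = len(delivered) / total if total else 0.0
    # The average delivery time is taken over delivered messages only.
    avg_time = (
        sum(r.delivered_at - r.created_at for r in delivered) / len(delivered)
        if delivered
        else float("nan")
    )
    avg_duplicates = sum(r.duplicates for r in records) / total if total else 0.0
    return ratio, avg_time, avg_duplicates
```

Averaging the delivery time over delivered messages only is one possible convention; counting expired messages at their time-to-live would give a different, larger figure.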


In conclusion, QCA is the best social-aware routing algorithm among the five routing strategies, since its delivery ratio, delivery time, and redundancy outperform those of the other methods and are only below MCP, while its number of duplicate messages is much lower. QCA also shows a significant improvement over the naive LABEL method, which uses a static community detection method, and thus confirms the applicability of our adaptive algorithm to routing strategies in MANETs.

5.2 A Message Forwarding and Routing Strategy Employing AFOCS

We present a practical application where the detection of overlapping network communities plays a vital role in forwarding strategies in communication networks. With the helpful knowledge of the network community structure discovered by AFOCS, we propose a new community-based forwarding algorithm that significantly reduces the number of duplicate messages while maintaining competitive delivery times and ratios, which are essential factors of a forwarding strategy.

5.2.1 Message Forwarding Strategy

Let us first discuss how our new forwarding algorithm works in practice and then how AFOCS helps it to overcome the above limitations. We use AFOCS to detect overlapping communities and keep them up-to-date as the network changes. Each node in a community is assigned the same label, and each overlapped node $u$ has a set of corresponding labels $Com(u)$. During the network operation, if a device $u$ carrying the message meets another device $v$ that shares more community labels with the destination than $u$ does, i.e., $|Com(v) \cap Com(dest)| > |Com(u) \cap Com(dest)|$, then $u$ will forward the message to $v$. The same actions then apply to $v$ as well as to the devices that $v$ meets. The intuition behind this strategy is that if $v$ shares more communities with the destination node, it is likely that $v$ will have more chances to deliver the message to the destination. In this way, we not only have higher chances of correctly forwarding the messages but also generate far fewer duplicate messages. Due to its


adaptive nature and its ability to identify overlapping communities, AFOCS helps our algorithm to overcome the above shortcomings naturally. This explains why our forwarding algorithm can significantly reduce the number of duplicate messages while maintaining very competitive delivery times and ratios.

5.2.2 Setup

We compare six forwarding strategies: (1) MIEN: a recently proposed social-aware routing strategy on MANETs [26]; (2) LABEL: a node forwards the messages to another node if it is in the same community as the destination [46]; (3) WAIT: the source node waits and keeps the message until it meets the destination; (4) MCP: a node keeps forwarding the messages until they reach the maximum number of hops; (5) QCA: a LABEL version utilizing QCA [81] as the adaptive disjoint community detection method; and lastly (6) AFOCS: our newly proposed forwarding algorithm equipped with AFOCS as a community detection and update core. The results of the WAIT and MCP algorithms provide us with the lower and upper bounds of important factors: message delivery ratio, time redundancy and message redundancy.

Our experiments are performed on the Reality Mining dataset provided by the MIT Media Lab [29]. This dataset contains communication, proximity, location, and activity information from 100 students at MIT over the course of the 2004-2005 academic year. In particular, we take into account the Bluetooth information to construct the underlying communication network and evaluate the performance of the above six routing strategies.

In each experiment, 500 message sending requests are randomly generated and distributed over different time points. To control the forwarding process, we use the hop-limit, time-to-live, and max-copies parameters. A message cannot be forwarded more than hop-limit hops in the network or exist in the process longer than time-to-live; otherwise it will be automatically discarded. Moreover, the maximum number of copies of the same message


Figure 5-2. Experimental results on the Reality Mining dataset. A) Average Duplicate Message. B) Delivery Ratio. C) Average Delivery Time.


a device can forward to the others is restricted by max-copies. Experiments are repeated and the results are averaged for consistency.

5.2.3 Results

Our results are presented in Figures 5-2A, 5-2B and 5-2C. The first observation is that our proposed forwarding algorithm achieves the lowest number of duplicate messages, as depicted in Figure 5-2A, far better even than the second best method, QCA. On average, only 46.5 duplicate messages are generated by AFOCS during the evaluation process, in contrast with 212.2 for QCA, 274.2 for MIEN, 496.4 for LABEL and the huge 1071.0 overhead messages of MCP. Thus, on the number of duplicate messages, AFOCS strikingly achieves improvement factors of 4.5x, 5x, 11x and 23x over these strategies, respectively. This extremely low overhead strongly implies the efficiency of AFOCS in communication networks.

Figures 5-2B and 5-2C present our results on the other two important factors, the message delivery ratios and delivery times. These figures indicate that AFOCS achieves competitive results on both of these vital factors. In general, AFOCS is the second best strategy, with almost no noticeable difference between itself and the leading method LABEL. On average, AFOCS gets 33% of the total messages delivered in 3569.2s and lags only slightly behind MCP (34% in 3465.3s) and LABEL (slightly over 33% in 3462.7s), and is far better than MIEN (32% in 3537.6s) and QCA (32% in 3572.2s). This can be explained by the advantages of knowing the overlapping community structure: the disjoint network communities in QCA and MIEN can have messages forwarded to the wrong communities when the destination changes its membership. With the ability to quickly update the network structure, AFOCS can efficiently cope with this change and thus can still provide the most up-to-date forwarding information.

In summary, AFOCS helps our forwarding strategy to reduce the number of duplicate messages by up to 11x while keeping a good average delivery ratio and time. These


experimental results are highly competitive and confirm the effectiveness of AFOCS and of our new routing algorithm on communication networks.
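The forwarding rule of Section 5.2.1, $|Com(v) \cap Com(dest)| > |Com(u) \cap Com(dest)|$, can be sketched as follows. This is a toy illustration under stated assumptions: the names `should_forward` and `simulate_forwarding`, the representation of contacts as an ordered list of device pairs, and the omission of the hop-limit, time-to-live and max-copies controls are all simplifications made here.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple


def should_forward(com_u: Set, com_v: Set, com_dest: Set) -> bool:
    """True if v shares strictly more community labels with the destination than u."""
    return len(com_v & com_dest) > len(com_u & com_dest)


def simulate_forwarding(
    source: Hashable,
    dest: Hashable,
    contacts: Iterable[Tuple[Hashable, Hashable]],
    com: Dict[Hashable, Set],
) -> Tuple[bool, int]:
    """Replay pairwise contacts in order; return (delivered, number of copies made)."""
    carriers = {source}
    copies = 0
    for a, b in contacts:
        # A contact is symmetric: either endpoint may be the current carrier.
        for u, v in ((a, b), (b, a)):
            if u not in carriers or v in carriers:
                continue
            if v == dest:
                return True, copies
            if should_forward(com[u], com[v], com[dest]):
                carriers.add(v)
                copies += 1
    return False, copies
```

In this sketch a forwarded message is copied rather than handed over, so `copies` plays the role of the duplicate-message count; a hand-over variant would remove `u` from `carriers` after forwarding.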


CHAPTER 6
SOLUTIONS FOR WORM CONTAINMENT IN ONLINE SOCIAL NETWORKS

In this chapter, we present another practical application of our proposed algorithms: the worm containment problem in OSNs. We first suggest a solution based on QCA, and then describe how AFOCS can help to improve the performance of this solution for this practical problem in complex networks.

Since their introduction, popular social network sites such as Facebook, Twitter, Bebo, and MySpace have attracted millions of users worldwide, many of whom have integrated those sites into their everyday lives. On the bright side, OSNs are ideal places for people to keep in touch with friends and colleagues, to share their common interests, or just simply to socialize online. On the other side, however, social networks are also fertile grounds for the rapid propagation of malicious software (such as viruses or worms) and false information. Facebook, one of the most famous social sites, experienced a wide propagation of a trojan worm named Koobface in late 2008. Koobface made its way not only through Facebook but also through the Bebo, MySpace and Friendster social networks [31][53]. Once a user's machine is infected, this worm scans through the current user's profile and sends out fake messages or wall posts to everyone in the user's friend list, with titles or comments that appeal to people's curiosity. If one of the user's friends, attracted by the comments and without a shadow of doubt, clicks on the link and installs the fake flash player, his computer will be infected and Koobface's life cycle will then continue on this newly infected machine.

The worm containment problem becomes more and more pressing in OSNs, as this kind of network evolves and changes rapidly over time. The dynamics of social networks thus give worms more chances to spread faster and wider, as they can flexibly switch between existing and new users in order to propagate. Therefore, containing worm propagation on social networks is extremely challenging, in the sense that a good solution at the previous time step might not be sufficient or effective at the next time


step. Although one can recompute a new solution each time the network changes, doing so would result in heavy computational costs, be time consuming, and allow worms to spread wider during the recomputing process. A better solution should quickly and adaptively update the current containment strategy based on changes in the network topology, and thus avoid the hassle of recomputation.

Figure 6-1. A general worm containment strategy.

There are many proposed methods for worm containment on computer networks, using either a multi-resolution approach [97], a simplification of the Threshold Random Walk scan detector [106], or fast and efficient worm signature generation [51]. There are also several methods proposed for cellular and mobile networks [104][7]. However, these approaches fail to take into account the community structure as well as the dynamics of social networks, and thus might not be appropriate for our problem. A recent work [110] proposed a social-based patching scheme for worm containment on cellular networks. However, this method encounters the following limitations on a real social network: (1) its clustered partitions do not necessarily reflect the natural network communities, (2) it requires that the number of clusters $k$ (which is generally unknown for social networks) be specified beforehand, and (3) it exposes weaknesses when dealing with the network's dynamics.


6.1 An Application of QCA in Containing Worms in OSNs

6.1.1 Setup

To overcome these limitations, our approach first utilizes QCA to identify the network community structure and adaptively keeps this structure updated as the network evolves. Once network communities are detected, our patch distribution procedure selects the most influential users from different communities in order to send patches. These users, as soon as they receive patches, will apply them to first disinfect the worm and then redistribute them to all friends in their communities. These actions will contain worm propagation to only some communities and prevent it from spreading out to a larger population. To this end, a quick and precise community detection method will definitely help the network administrator to select a more sufficient set of critical users to send patches to, thus lowering the number of sent patches as well as the overhead information over the social network.

Algorithm 15 Patch Distribution
Input: $G = (V, E)$ and its community structure $C = \{C_1, C_2, \ldots, C_p\}$
Output: The set of influential users $P$.
  $P = \emptyset$;
  for $C_i \in C$ do
    while $\exists u$ unvisited in $C_i$ satisfying $\max_{u \in C_i} \{ e^{C_i}_{out}(u) \} > 0$ do
      Let $v \leftarrow \arg\max_{u \in C_i} \{ e^{C_i}_{out}(u) \}$;
      $P = P \cup \{v\}$;
      Mark $v$ as visited in $C_i$;
    end while
  end for
  Send patches to users in $P$;

We next describe our patch distribution. This procedure takes into account the identified network communities and selects a set of influential users from each community in order to distribute patches. Influential users of a community are the ones having the most relationships or connections to other communities. From an adversary's point of view, these influential users are potentially vulnerable since they not only interact actively within their communities but also with people outside, and thus, they


can easily fool (or be fooled by) people both inside and outside of their communities. From the other point of view, these users are also the best candidates for the network defender to distribute patches to, since they can easily announce and forward patches to other members and non-members.

Figure 6-2. Infection rates on static network with k = 150 clusters. A) Alarm threshold = 2%. B) Alarm threshold = 10%. C) Alarm threshold = 20%.

In Alg. 15, we present a quick algorithm for selecting the set of most influential users in each community. This algorithm starts by picking the user whose number of social connections to outside communities is the highest, and temporarily disregards this user from the community under consideration. This process repeats until no connections crossing between communities exist. This set of influential users is the candidate set for the network defender for distributing patches.
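The selection loop of Alg. 15 can be sketched as below. The sketch reads the algorithm literally: within each community it repeatedly picks the unvisited user with the largest number of links leaving the community, as long as that number is positive. The function name `select_influential_users`, the adjacency-dictionary input and the tie-breaking by smallest node identifier are assumptions of this illustration, and communities are assumed to be disjoint, as produced by QCA.

```python
from typing import Dict, Hashable, List, Sequence, Set


def select_influential_users(
    adj: Dict[Hashable, Set[Hashable]],
    communities: Sequence[Set[Hashable]],
) -> List[Hashable]:
    """Return influential users, community by community, most outside links first."""
    selected: List[Hashable] = []
    for comm in communities:
        # e_out(u): number of neighbors of u that lie outside u's community.
        out_links = {u: sum(1 for w in adj[u] if w not in comm) for u in comm}
        visited: Set[Hashable] = set()
        while True:
            candidates = [
                u for u in sorted(comm) if u not in visited and out_links[u] > 0
            ]
            if not candidates:
                break
            # max() keeps the first maximum, so ties go to the smallest identifier.
            v = max(candidates, key=out_links.get)
            selected.append(v)
            visited.add(v)
    return selected
```

Read this way, the procedure returns every user with at least one link to another community, ordered by how many such links the user has; a defender with a limited patch budget could send patches to a prefix of that ordering.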


Figure 6-3. Infection rates on dynamic network with k = 200 clusters. A) Alarm threshold = 2%. B) Alarm threshold = 10%. C) Alarm threshold = 10%.

6.1.2 Results

We present the results of our QCA method on the Facebook network dataset [100] and compare them with the social-based method (Zhu's method [110]) via a weighted version of our algorithms. The worm propagation model in our experiments mimics the behavior of the famous Koobface worm. The probability of activating the worm is proportional to the communication frequency between the victim and his friends. The time taken for worms to spread from one user to another is inversely proportional to the communication frequency between this user and his particular friend. Finally, when a worm has successfully infected a user's computer, it will start propagating as soon as


this computer connects to a specific social network (Facebook in this case). When the fraction of infected users reaches a threshold, the detection system raises an alarm and patches will automatically be sent to the most influential users selected by Alg. 15. Once a user receives the patch, he will first apply it to disinfect the worm and will then have the option to forward it to all friends in his community. Each experiment is seeded with 0.02% of users initially infected by worms.

We compare the infection rates of Zhu's social-based method and ours. The infection rate is computed as the fraction of the remaining infected users over all infected ones. The number of clusters $k$ in Zhu's method is set to 150 in static and 200 in dynamic networks, and for each value of $k$, the alarm threshold is set to 2%, 10%, and 20%, respectively. Each experiment is repeated 1000 times for consistency.

Figures 6-2 and 6-3 show the results of our experiments for the different values of $k$ and of the alarm threshold. We first observe that the longer we wait (the higher the alarm threshold is), the higher the number of users we need to send patches to in order to achieve the desired infection rate. For example, with $k = 150$ clusters and an expected infection rate of 0.3, we need to send patches to less than 10% of the users when the alarm threshold is 2%, to more than 15% of the users when it is 10%, and to nearly 90% of the total influential users when it is 20%.

A second observation is that our approach achieves better infection rates than Zhu's social-based method in a static version of the social network, as depicted in Figure 6-2. In particular, the infection rates obtained by our method are from 5% to 10% better than those of Zhu's. When the network evolves as new users join and new social relationships are introduced, we resize the number of clusters $k$ and recompute the infection rates of the social-based method with $k = 200$ clusters and alarm thresholds of 2% and 10%, respectively. As depicted in Figure 6-3, our method, with the power of quickly and adaptively updating the network community structure, achieves better infection rates than Zhu's method while the computational costs and


running time are significantly reduced. As discussed, detecting and updating the network community structure is the crucial part of a social-based patching scheme: a good and up-to-date network community structure will provide the network defender with a tighter set of vulnerable users, and thus will help to achieve lower infection rates. Our adaptive algorithm, instead of recomputing the network structure every time changes are introduced, quickly and adaptively updates the network communities on-the-fly. Thanks to this frequently updated community structure, our patch distribution procedure is able to select a better set of influential users, and thus helps in reducing the number of infected users.

We further look into the behavior of Zhu's method when the number of clusters $k$ varies. We compute the infection rates on the Facebook dataset for various $k$ ranging from 1K to 2.5K and compare them with our approach. We initially expected that the more predefined clusters there are, the better the infection rates the clustered partitioning method would achieve. However, the experimental results reveal the opposite. In particular, with a fixed alarm threshold of 10% and 60% patched nodes, the infection rates achieved by Zhu's method do not decrease but stay near 28%, while ours are far better (20%) with much less computational time.

Finally, a comparison of the running times of the two approaches shows that the time taken by Zhu's method is much more than that of our community updating procedure, and hence may prevent this method from completing in a timely manner. In particular, our approach takes only 3 seconds to obtain the basic community structure and at most 30 seconds to complete all the tasks, whereas [110] requires more than 5 minutes to divide the communication network into modules and select the vertex separators. During that delay, the worm may spread to a larger population, and thus the solution may not be effective. These experimental results confirm the robustness and efficiency of our approach on social networks.

6.2 Containing Worms with Overlapping Communities Detected by AFOCS

We show another application of AFOCS to the worm containment problem on OSNs. OSNs are good places for people to socialize online or to stay in touch with friends and colleagues. However, when some of the users are infected with malicious software, such as viruses or worms, OSNs are also fertile grounds for their rapid propagation. Since mobile devices can now access online social applications, worms and viruses can target both computers [81] and mobile devices [110]. Recently, community structure-based methods have proven to be effective solutions for preventing worms from spreading further, not only on social networks [81][82] but also on cellular networks [110]. Because interactions are frequent inside communities and infrequent between them, worms spread more quickly within a community than between communities. Therefore, an appropriate reaction should first confine worms to the infected communities, and then prevent them from getting outside. This strategy can be accomplished by patching the most influential members, who are well connected not only to members of their own community but also to people in other communities.

6.2.1 Setup

In our experiments, we use the Facebook network dataset collected in [100]. This dataset contains friendship information and wall posts from the New Orleans regional network, spanning from Sep 2006 to Jan 2009. The dataset contains more than 63.7K nodes (users) connected by more than 1.5 million friendship links. We keep the other parameters, as well as the Koobface worm propagation model, the same as in [82] for ease of comparison.

With the advantage of knowing the overlapping communities, we are able to develop a better and more efficient patching scheme. In particular, we enhance the patching scheme presented in [82] to take advantage of the overlap regions: nodes on the boundary of overlapped regions are selected for patching (Figure 6-4A). Alg. 16 details the adjusted scheme.
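The selection rule just described (patch the neighbors of nodes that sit in an overlap between two communities) can be illustrated with a minimal sketch. The function name, the adjacency-dictionary representation and the toy network are assumptions made for illustration only; they are not taken from the AFOCS implementation.

```python
from itertools import combinations


def overcom_influential_nodes(adjacency, communities):
    """Collect the neighbors of every node shared by two communities.

    adjacency   -- dict: node -> set of neighboring nodes (assumed format)
    communities -- list of node sets, possibly overlapping
    """
    influential = set()
    for c_i, c_j in combinations(communities, 2):
        # Nodes in the overlap of the two communities.
        for u in c_i & c_j:
            influential |= adjacency.get(u, set())
    return influential


# Hypothetical toy network: node 3 belongs to both communities.
adjacency = {
    1: {2, 3},
    2: {1, 3},
    3: {1, 2, 4, 5},
    4: {3, 5},
    5: {3, 4},
}
communities = [{1, 2, 3}, {3, 4, 5}]
patched = overcom_influential_nodes(adjacency, communities)
print(sorted(patched))
```

In this toy case the only overlapped node is 3, so its four neighbors form the set of nodes that would receive patches first.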

Figure 6-4. OverCom patching scheme. A) Influential users selection. B) Number of patched nodes.

Algorithm 16 OverCom Patching Scheme
Input: G = (V, E) and C = {C1, C2, ..., Ck} detected by AFOCS
Output: A set of patched nodes IS.
  IS ← ∅;
  for (Ci, Cj ∈ C) do
    if (Ci ∩ Cj ≠ ∅) then
      % Choose the neighbors of overlapped nodes as influential ones %
      IS ← IS ∪ N(u) ∀u ∈ Ci ∩ Cj;
    end if
  end for
  % Patch distribution procedure %
  for (u ∈ IS) do
    Send patches to u;
    Let u redistribute patches to w ∈ IS \ N(u);
  end for

6.2.2 Results

We compare the OverCom patching scheme on overlapping communities found by AFOCS with patching schemes that use disjoint communities: the method of Blondel et al. [6], QCA by Nguyen et al. [81], and the clustering-based method suggested by Zhu et al. [110]. The number of patched nodes is shown in Figure 6-4B. Both the number of patched nodes and the infection rates decline remarkably. In particular, the number of nodes that must receive patches with AFOCS is about half of that required by Blondel's, QCA and Zhu's methods: only 1725 of the 63K nodes in the network need

Figure 6-5. Infection rates between four methods. A) α = 2%. B) α = 10%. C) α = 20%.

to be patched by the OverCom patching scheme, while the other schemes require nearly twice as many (3,300 nodes). The reason for this improvement lies in the nature of our AFOCS framework: the neighbors of the overlapped nodes are not too far away from the center of each community, so they can easily redistribute the patches once received.

We next present the infection rates achieved with alarm thresholds (the fraction of infected nodes over all nodes) α = 2%, 10% and 20%, respectively. The distribution process is triggered as soon as the infection rate exceeds α. The results are reported in Figures 6-5A, 6-5B and 6-5C, respectively. In general, the higher α is (i.e., the longer we wait), the more nodes we have to send patches to and the higher the infection rate. OverCom with AFOCS achieves the lowest infection rates in almost all the experiments and lags only slightly behind when α = 10%. In particular, when α = 2%, AFOCS helps OverCom reduce the infection rates by factors of 1.6x to 4.3x relative to QCA, 2.6x to 4x relative to Blondel's method, and 3.2x to 7x relative to Zhu's method. When α = 10%, AFOCS + OverCom achieves average improvements of 9% over QCA, 5% over Blondel's method and 43% over Zhu's method. When α = 20%, the average improvements are 12%, 23% and 53%, respectively.

Due to the nature of the event handling processes, the neighbors of overlapped nodes are not located far away from the rest of their communities. As a result, they can help distribute patches to more users in the communities, and hence help lower the infection rates of AFOCS. These improvement factors again confirm the effectiveness of our proposed method.

CHAPTER 7
STABLE COMMUNITY DETECTION IN ONLINE SOCIAL NETWORKS

A large body of work in the literature has been devoted to finding general communities (i.e., without the concept of stability) on both directed and undirected networks [32]. In contrast, only very few approaches have been suggested to identify stable communities [59][68], especially on directed and weighted networks. The main source of difficulty is the inconsistency of community members in a general structure: while they might appear to be in a community at one time, they may not commit to that particular community in the long run. One possible approach, therefore, is to find a consensus of a specific algorithm after multiple runs and use this core as the stable communities [59]. However, doing so is computationally expensive and time consuming, and lacks convergence guarantees. In [68], the authors estimate the mutual links between pairs of users and suggest a detection method that optimizes the total mutual connection on the whole network. While the idea of mutual connection is quite interesting, we find that it might not be sufficient, because some estimated mutual links are of low magnitude and thus may not reflect the correct concept of stability at the community level.

In general, a stable community is often characterized either by its tight and strong internal relationships, represented by the mutual connections among its users [68], or by internal links that possess a high tendency to remain within the community over a long period of time [23]. In other words, stable communities in the network are commonly characterized by stable connections among their members. Motivated by these observations, we suggest SCD (short for Stable Community Detection), a framework to effectively identify stable communities in directed OSNs that accommodates both of the above intuitions. In the big picture, SCD works by first enriching the input network with a stability estimate for every link in the network, and then discovering communities via stable connections using the lumped Markov chain model. Our approach is

mathematically supported by a key connection between the persistence probability of a community at the stationary distribution and its local topology. One notable advantage of SCD is that it requires only a single iteration, which significantly reduces the running time. Furthermore, since our method intrinsically accounts for stability, the discovered communities should be stable without the need for a separate statistical analysis.

In summary, we suggest an estimation that provides helpful insights into the stability of links in the input network. Based on that, we propose SCD, a framework to identify community structure in directed OSNs with the advantage of community stability. We next explore an essential connection between the persistence probability of a community at the stationary distribution and its local topology, which is the fundamental mathematical theory supporting the SCD framework. To certify the efficiency of our approach, we extensively test SCD on both synthesized datasets with embedded communities and real-world social traces, including the NetHEPT and NetHEPT WC collaboration networks as well as Facebook social networks, in reference to the consensus of other state-of-the-art detection methods. Highly competitive empirical results confirm the quality and efficiency of SCD in identifying stable communities in OSNs.

7.1 Basic Notations

We introduce the basic notations representing the underlying social network that we will use throughout this chapter.

(Graph notation) Let $G = (V, E, w)$ be a directed and weighted graph representing a social network, where $V$ is the set of $n$ network users (or nodes), $E$ is the set of $m$ directed relationships (or edges), and $w$ (or, precisely, $w_{uv}$) is the weight function on each edge $(u,v) \in E$ representing the communication frequency between users $u$ and $v$ in the social network. Without loss of generality, we assume that all edge weights are normalized, i.e., $\sum_{(u,v) \in E} w_{uv} = 1$ and $w_{uv} \ge 0$. For each edge $(u,v) \in E$ with $(v,u) \notin E$, we say that the backwards edge $(v,u)$ is missing, and we will use the notation

$(v;u)$ and $w_{v;u}$ to denote the mutual link of edge $(u,v)$, if $(v,u)$ should indeed exist in $E$, and its weight, respectively. Furthermore, we will use the notation $st(u,v,t)$ to denote the stability estimate of edge $(u,v)$ at time step (or hop) $t$. These notations will be described in detail in the next section.

(Community notation) Denote by $\mathcal{C} = \{C_1, C_2, \ldots, C_q\}$ the network community structure, i.e., a collection of $q$ subsets of $V$ satisfying $\bigcup_{i=1}^{q} C_i = V$ and $C_i \cap C_j = \emptyset$ for all $i \ne j$. We say that each $C_i \in \mathcal{C}$ and its induced subgraph form a community of $G$. For a node $u \in V$, let $N_u^+$, $N_u^-$ and $N_u$ denote the set of outgoing neighbors, the set of incoming neighbors, and the set of all neighbors adjacent to $u$, respectively. Furthermore, let $k_u^+$ (or $w_u^+$), $k_u^-$ (or $w_u^-$) and $k_u$ (or $w_u$) be the corresponding cardinalities (or total weights) of these sets. For any $C \subseteq V$, let $C^{in}$ and $C^{out}$ denote the set of links having both endpoints in $C$ and the set of links heading out from $C$, respectively. In addition, let $m_C = |C^{in}|$ (resp. $w_C = w(C^{in})$) and $k_C^+ = \sum_{u \in C} k_u^+$ (resp. $w_C^+ = \sum_{u \in C} w_u^+$). Finally, the terms node and vertex, as well as edge, link and connection, are used interchangeably.

7.2 Link Stability Estimation

We describe our first step towards the identification of stable communities in the network: the link stability estimation process. Intuitively, a stable community is often characterized either by its tight and strong internal relationships, represented by the mutual connections among its users [68], or by internal links that possess a high tendency to remain within the community over a long period of time [23]. In other words, stable communities in the network are commonly characterized by stable connections among their members. Motivated by these observations, in this section we suggest a procedure for estimating the stability of each link in the network that accommodates both of the above intuitions. Our estimation procedure consists of two stages: in the first stage, the reciprocity of each link in the network is predicted, and based on that, its stability is evaluated in the second stage.

7.2.1 Link Reciprocity Prediction

When dealing with large scale OSNs, it is possible that some backwards edges between individuals are missing. This lack of information may be due to an imperfect data collection process, or arise because these backwards edges are not yet reflected in the underlying network although they should be, given the strong relationships between local network users. For instance, Leskovec et al. [65] observe that friends of friends in social networks tend to become friends with each other in the near future, i.e., with high probability there should be dual connections between friends of friends even if they are not yet friends with each other. Therefore, predicting the existence of these backwards edges allows a more complete and comprehensive detection of stable communities by increasing the internal density of strongly connected components, which are potential candidates for network communities.

Link reciprocity prediction is a well-studied problem and many methods have been proposed in the literature [69][3][30]. In this chapter, we utilize a method called friends-measure suggested in [30]. The intuition behind this measure is that, when looking at two users in the social network, one can assume that the more connections their neighbors have with each other, the higher the chance that the two users are actually connected. Originally, the friends-measure between two users $u$ and $v$ is formulated as

$$\text{friends-measure}(u,v) = \sum_{x \in N_u} \sum_{y \in N_v} \delta(x,y)$$

where $\delta(x,y) = 1$ if either $x = y$ or $(x,y) \in E$ or $(y,x) \in E$, and $\delta(x,y) = 0$ otherwise. This measure has been extensively verified among other topological features and has been shown to be promising in comparison with other metrics [30]. However, in directed networks, different link topologies can share a common friends-measure value. Therefore, we need to modify the above formula so that it reflects the true relationship between the network users and, furthermore, copes with edge weights in the network.
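As a small illustration of the original (unweighted) friends-measure above, the following sketch counts the neighbor pairs that coincide or are linked in either direction. The edge-set representation, the helper names and the toy network are assumptions made for this example; neighbor sets ignore edge direction, which is one reading of $N_u$ as the set of all neighbors.

```python
def build_neighbors(edges):
    """Map each node to all nodes adjacent to it, ignoring edge direction."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return neighbors


def friends_measure(edges, u, v):
    """Count pairs (x, y), x in N(u), y in N(v), that coincide or are linked."""
    neighbors = build_neighbors(edges)
    score = 0
    for x in neighbors.get(u, set()):
        for y in neighbors.get(v, set()):
            if x == y or (x, y) in edges or (y, x) in edges:
                score += 1
    return score


# Hypothetical toy network given as a set of directed edges.
toy_edges = {(1, 3), (1, 4), (2, 4), (2, 5), (3, 5)}
score_12 = friends_measure(toy_edges, 1, 2)
print(score_12)  # prints 2
```

Here users 1 and 2 share neighbor 4 (one pair with x = y), and neighbor 3 of user 1 is linked to neighbor 5 of user 2, which gives the value 2.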

In order to better handle directed and weighted graphs, we attempt to predict the existence of backwards edges of unidirectional links. For example, if $(u,v) \in E$ and $(v,u) \notin E$, we try to determine whether we should enrich the network by inserting $(v,u)$ into $E$. To this end, we first relax the direction of the edge between $u$ and $v$, and next compute the likelihood that a backwards edge should exist between $u$ and $v$ by using the modified formula

$$\gamma(u,v) = \frac{\sum_{x \in N_v} \sum_{y \in N_u} \theta(x,y)}{W_v W_u} \qquad (7)$$

where $\theta(x,y) = w_{vx} w_{xy} w_{yu}$ is the total possibility of the backwards path starting from $v$, passing through neighbor nodes $x$ and $y$, and ending at $u$. When the network is unweighted, $W_u = d_u$ and $W_v = d_v$, and thus $\gamma(u,v)$ counts the (normalized) number of paths of lengths two and three joining the two users $u$ and $v$, which intuitively agrees with the aforementioned friends-measure formula. Proposition 7.1 shows that $\gamma(u,v)$, a weighted generalization of friends-measure$(u,v)$ that depends only on the nodes' topology, lies between 0 and 1. Hence, $\gamma(u,v)$ can be regarded as the estimated probability that the backwards connection indeed exists, i.e., we set $w_{v;u} = \gamma(u,v)$.

Proposition 7.1. For any $(u,v) \in E$ with $(v,u) \notin E$, $0 \le \gamma(u,v) \le 1$.

Proof. We first prove this for unweighted graphs. The proof for weighted graphs can be extended straightforwardly. It is obvious that $0 \le \gamma(u,v)$. Now we show $\gamma(u,v) \le 1$. For any $x, y$ such that $\delta(x,y) = 1$, if $x = y$, they contribute just one connection to the summation. Otherwise, they can make at most $d_u$ (or $d_v$) dual connections at each vertex. Taking these facts into account, we have $\gamma(u,v)\, d_v d_u = \sum_{x \in N_v} \sum_{y \in N_u} \theta(x,y) \le d_v d_u$. Thus, the inequalities follow. The left equality holds when there are no connections from $u$ to $v$ and vice versa. The right equality holds when every path of length 2 from $u$ to $v$ (or from $v$ to $u$) is contained in the corresponding path of length 3.
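A minimal sketch of the modified reciprocity measure follows. The text leaves several details open, so the conventions used here are assumptions rather than the dissertation's definitions: the first intermediate node ranges over the out-neighbors of v and the second over the in-neighbors of u, the normalizers are the total outgoing weight of v and the total incoming weight of u, and a pair with x = y is treated as the two-hop path v → x → u. All names and the toy weights are hypothetical.

```python
def reciprocity_score(weights, u, v):
    """Estimate how likely the missing backwards edge (v, u) is.

    weights -- dict: directed edge (a, b) -> weight in [0, 1] (assumed format)
    """
    out_v = {b: wt for (a, b), wt in weights.items() if a == v}  # edges v -> x
    in_u = {a: wt for (a, b), wt in weights.items() if b == u}   # edges y -> u
    total_out_v = sum(out_v.values())
    total_in_u = sum(in_u.values())
    if total_out_v == 0 or total_in_u == 0:
        return 0.0  # no backwards path can exist

    path_weight = 0.0
    for x, w_vx in out_v.items():
        for y, w_yu in in_u.items():
            if x == y:
                # Assumed convention: two-hop path v -> x -> u.
                path_weight += w_vx * w_yu
            else:
                # Three-hop path v -> x -> y -> u (zero if x -> y is absent).
                path_weight += w_vx * weights.get((x, y), 0.0) * w_yu
    return path_weight / (total_out_v * total_in_u)


# Hypothetical toy network: edge (u, v) exists, edge (v, u) is missing.
toy_weights = {
    ("u", "v"): 0.5,
    ("v", "a"): 0.4,
    ("v", "b"): 0.6,
    ("a", "u"): 0.5,
    ("b", "c"): 0.5,
    ("c", "u"): 0.5,
}
score = reciprocity_score(toy_weights, "u", "v")
print(round(score, 4))
```

Under these conventions each term of the double sum is at most the product of the first and last edge weights, so the score stays within [0, 1], in line with Proposition 7.1.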

Proposition 7.2. Let $n_0$ be the number of unidirectional links in the input network. The time complexity for estimating the mutual connections for these links is $O(n_0 M)$.

Proof. The total time required for estimating the possibility of a backward connection at an edge $(u,v)$ is $d_u + d_v + \sum_{x \in N_v^+} \sum_{y \in N_u^-} \min\{d_x^+, d_y^-\}$. Thus, for all $n_0$ links, the total time complexity is upper bounded by $n_0(2M) + n_0 M = O(n_0 M)$.

7.2.2 Link Stability Estimation

After the reciprocity of each link in the network has been estimated, the input network is enriched with more information about the backwards edges. While the presence of these dual edges is helpful in characterizing the mutual relationships between pairs of network users, it might not be sufficient to evaluate the stability of all network connections, as some of the backwards edges may be of low magnitude and thus may not indicate the stability of the connection. Therefore, we need to further estimate the stability of a network link given its predicted reciprocity. To do so, we define the stability of an edge $(u,v) \in E$ at $t$ time steps (or $t$ hops) as follows:

$$st(u,v,t) = \sum_{|P| = t} w(P)$$

where $P$ is a path going from $v$ to $u$ ($v$ and $u$ are excluded) of length $|P| = t$, and $w(P) = \prod_{(a,b) \in P} w_{ab}$ is the total weight of path $P$. Finally, we define the stability $st(u,v)$ of a link $(u,v) \in E$ as the total stability over up to $T_0$ time steps, where $T_0$ is a predefined parameter (the upper bound on the number of hops):

$$st(u,v) = \sum_{t=1}^{T_0} st(u,v,t). \qquad (7)$$

The intuition behind our stability function $st(u,v)$ is as follows: since stable communities are commonly recognized by a high density of stable edges, it is reasonable to expect that such edges form cycles. In directed and weighted networks, the stronger the cycles an edge $(u,v)$ lies on, the more stable it is believed to be.
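The stability function can be sketched as a bounded depth-first enumeration of simple paths from v back to u, multiplying edge weights along each path. The indexing used here is an assumption: a path is measured by its number of edges, so length 1 is the direct backwards edge (v, u). The function name, the edge-dictionary format and the toy weights are hypothetical.

```python
def link_stability(weights, u, v, max_hops):
    """Sum the weights of all simple paths from v to u with <= max_hops edges.

    weights -- dict: directed edge (a, b) -> weight (assumed format)
    Returns (total stability, list of per-length contributions).
    """
    successors = {}
    for (a, b), wt in weights.items():
        successors.setdefault(a, []).append((b, wt))

    per_length = [0.0] * (max_hops + 1)  # index = number of edges in the path

    def walk(node, weight, length, visited):
        for nxt, wt in successors.get(node, []):
            if nxt == u:
                # A path closing the cycle through edge (u, v) has been found.
                per_length[length + 1] += weight * wt
            elif nxt not in visited and length + 1 < max_hops:
                walk(nxt, weight * wt, length + 1, visited | {nxt})

    walk(v, 1.0, 0, {v})
    return sum(per_length[1:]), per_length[1:]


# Hypothetical toy network around the link (u, v).
toy_weights = {
    ("u", "v"): 0.6,
    ("v", "u"): 0.45,
    ("v", "x"): 0.5,
    ("x", "u"): 0.2,
    ("x", "y"): 0.1,
    ("y", "u"): 0.2,
}
total, by_length = link_stability(toy_weights, "u", "v", max_hops=3)
print(round(total, 4), [round(s, 4) for s in by_length])
```

The enumeration is exponential in the worst case, which is acceptable for a small hop bound; a production implementation would likely bound or cache the search differently.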

Figure 7-1. Illustrations of stability function.

On the contrary, edges that connect communities will hardly be part of many cycles, and eventually have low stability. Figure 7-1 illustrates the stability estimates for link $(u,v)$ at 0, 1 and 2 hops: a) $st(u,v,0) = 0.45 = w_{v;u}$, b) $st(u,v,1) = 0.5 \times 0.2 = 0.1$, c) $st(u,v,2) = 0.5 \times 0.1 \times 0.2 = 0.05$.

As a local measure, our suggested stability function has the following advantages: (1) it puts more focus on the existence of the mutual link of any link $(u,v)$ by preserving the original strength of the backwards edge. This intuitively agrees with the finding in [68] that stable clusters are usually made of bidirectional links. Moreover, our formula further takes into account the strength of cycles containing the current link; (2) the more time (or number of hops) we allow, the higher the stability of a link becomes. Nevertheless, links that really belong to a stable community are more likely to have strong stability, whereas those connecting communities are of low stability. These advantages support the intuitions of stable communities that we discussed above. The performance of our stability estimation is evaluated in more detail in Section 7.4.

In summary, our link stability estimation first predicts the potential of the dual link of any link $(u,v) \in E$ such that $(v,u) \notin E$ by using the modified reciprocity measure in equation (7). Next, it evaluates the stability of every link in the given network, enriched by the first stage, by using the stability function in equation (7), and utilizes these stability values as new weights for links in the network. The resulting network is then passed as the input network to our main process: the identification of stable communities.

7.3 Stable Community Detection

In this section, we present our main contribution: the stable community identification process. Given the input network enriched with link stability information, we discover the stable communities by exploring an important connection between the persistence probability of each community and its local network topology. In the following paragraphs, we first review the concept of the lumped Markov chain [50][89], and then establish our key connection between this Markov chain and the local network topology. Finally, we describe in detail our last but most important process: stable community detection.

7.3.1 Lumped Markov Chain

A Markov chain [96] is a mathematical system representing transitions from one state of a system to another, among a finite number of predefined states. In terms of social networks, a state can be either a user (a node in the graph) or a group of tightly connected users (a community) in the network, whereas transitions can be regarded as the user-to-user or group-to-group communication tendencies. An $n$-state Markov chain corresponding to an $n$-node network is commonly represented by the transition $\pi_{t+1} = \pi_t P$, where $\pi_t = (\pi_{1,t}, \pi_{2,t}, \ldots, \pi_{n,t})$, $\pi_{u,t}$ is the probability of being at node $u$ at time $t$, and $P = (p_{uv})$ is the transition matrix. In particular, this $n$-state Markov chain can be associated with the input network by setting the probability of transiting from a node $u$ to a neighbor node $v$ to

$$p_{uv} = \frac{w_{uv}}{\sum_j w_{uj}} = \frac{w_{uv}}{w_u^+}.$$

Basically, $p_{uv}$ is the probability that a random walker jumps from node $u$ to node $v$ given the network topology. A Markov chain is said to be at its stationary state distribution if $\pi$ satisfies the equation $\pi = \pi P$. As shown in [43], when the network is connected, $P$ is irreducible, and thus the equation $\pi = \pi P$ has a unique solution that is strictly positive ($\pi_u > 0\ \forall u \in V$) and corresponds to the stationary state distribution of the Markov chain. When the network is undirected, $\pi$ can be computed exactly as $\pi = \frac{1}{2W_0}(w_1, w_2, \ldots, w_n)$, where $W_0$ is the total edge weight. However, we do not have an

exact form for the stationary distribution of a general directed network, and thus $\pi$ has to be computed numerically.

As our ultimate goal is to detect the stable network community structure, we seek a good partitioning of $V$ in which each part remains cohesive over time. In the light of the Markov chain method, this corresponds to finding a collection of communities $\mathcal{C} = \{C_1, C_2, \ldots, C_q\}$ where a random walker would spend most of the time walking inside a community and less time wandering among communities. By defining this partition $\mathcal{C}$ of $q$ communities, we introduce a so-called $q$-state meta-network where each community in the network becomes a meta-state. However, at this aggregate level, a Markovian description of the dynamics of a random walker walking among communities is in general not possible, because the Markovian property may not be preserved [50]. Nevertheless, this $q$-state community-to-community transition can still be defined using the lumped Markov chain, which correctly describes the random walker at this scale provided that the stochastic process is started at the stationary distribution [43]. This lumped Markov chain is defined via the $q \times q$ matrix $U$ as in [89]:

$$U = [\mathrm{diag}(\pi H)]^{-1} H^T \mathrm{diag}(\pi) P H$$

where $H$ is an $n \times q$ binary matrix representing the partitioning $\mathcal{C}$. One of the notable advantages of the lumped Markov chain $\Pi_{t+1} = \Pi_t U$ defined on $U$ is that it shares the same stationary distribution with the original Markov chain, i.e., the new stationary distribution defined by $\Pi = \pi H$ satisfies the equation $\Pi = \Pi U$. Moreover, the difference between $\Pi_{t+1} = \Pi_t U$, starting at $\Pi_0 = \pi_0 H$, and the original $\pi_t H$ tends exponentially to zero if the two chains are regular. These advantages make the community-based lumped Markov chain defined by $\Pi_t U$ a very good approximation of the original $n$-node network.

We stress that the ability of the lumped Markov chain to describe the random walk dynamics only at stationarity is not a limitation for the detection of stable communities. Indeed, this stationarity requirement evaluates the random walk

dynamics of all nodes at their stable states, and hence perfectly supports the concept of stable communities.

In terms of interpretation, each entry $u_{cd}$ of $U$ denotes the chance that a random walker in community $c$ at time $t$ wanders to another community $d$ at time $t+1$. As a result, the diagonal elements $u_{CC}$ (or $u_C$ for short) of $U$ indicate the persistence probabilities, i.e., the probabilities that a random walker keeps walking within a particular community $C$. Of course, large values of $u_C$ are expected for meaningful communities. It is also shown in [89] that in directed and weighted graphs, $u_C$ can be computed as

$$u_C = \frac{\sum_{i,j \in C} \pi_i p_{ij}}{\sum_{i \in C} \pi_i} \qquad (7)$$

Note that $\sum_{i,j \in C} \pi_i p_{ij}$ is the fraction of time a random walker spends on the links inside a community $C$. Hence, $u_C$ is indeed the ratio between the amount of time a random walker spends on links inside $C$ and the amount of time it spends on nodes in $C$. In undirected networks, one can verify that

$$u_C = \frac{\sum_{i,j \in C} w_{ij}}{\sum_{i \in C} w_i} = \frac{2 w_C}{2 w_C + w(C^{out})}.$$

7.3.2 The Connection to Network Topology

At this stage, one might try to optimize $u_C$ for all communities $C \in \mathcal{C}$ in order to maximize their persistence probabilities. However, doing so requires solving for the stationary distribution values $\pi_i$ (as in the expression for $u_C$ in equation (7)), which may be extremely costly, especially in large scale directed networks. So, how can we effectively optimize the persistence probability $u_C$ of each community without solving for that costly exact stationary distribution? As an answer to this challenging question, we present in Proposition 7.3 a connection between the persistence probability of a community $C$ and its local topology. In particular, we show that $u_C$ is bounded from below by a quantity that involves only the local topology of $C$. Therefore, optimizing $u_C$

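can be shifted to the optimization of these local components, which are inexpensive and easy to derive.

Proposition 7.3. For any community $C \in \mathcal{C}$, at the stationary distribution, we have the following inequality:

$$u_C = \frac{\sum_{i,j \in C} \pi_i p_{ij}}{\sum_{i \in C} \pi_i} \ge \frac{w_C}{w_C^+}.$$

Proof. It is easy to see that

$$u_C = \frac{\sum_{i,j \in C} \pi_i p_{ij}}{\sum_{i \in C} \pi_i} = \frac{\sum_{i \in C} \pi_i \frac{w_{i,C}}{w_i^+}}{\sum_{i \in C} \pi_i},$$

where $w_{i,C} = \sum_{j \in C} w_{ij}$. Next, we rewrite $\sum_{i \in C} \pi_i$ in the form $\sum_{i \in C} \pi_i = \pi^T e_C$, where $e_C = (e_i)_{N \times 1}$ with $e_i = 1$ if $i \in C$ and $0$ otherwise. Since $\pi$ is the stationary distribution of the Markov chain, we have $\pi = \pi P$. Thus

$$\pi^T e_C = \pi^T P e_C = \sum_{i \in C} \pi_i \Big( \sum_{j:(i,j) \in E} \frac{1}{w_i^+} \Big).$$

Now we have

$$\sum_{i \in C} \pi_i \times w_C = \sum_{i \in C} \pi_i \Big( \sum_{j:(i,j) \in E} \frac{1}{w_i^+} \Big) \times w_C \le \sum_{i \in C} \pi_i \frac{w_{i,C}}{w_i^+} \Big( \sum_{t \in C} w_t^+ \Big) = \sum_{i \in C} \pi_i \frac{w_{i,C}}{w_i^+} \times w_C^+.$$

Hence, the conclusion follows. The equality holds when all $\pi_i$ are equal to each other and $w_C = w_C^+$. This happens when $C$ is a fully dually connected clique that is disconnected from the rest of the network.

7.3.3 Detecting Communities

7.3.3.1 Formulation

Proposition 7.3 establishes the connection between the persistence probability of a random walker staying within a community $C$ and the local network topology. As a result, if we can maximize the latter quantity, we obtain a guaranteed lower bound on the quantity we actually wish to optimize. Taking into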

account this intuition, we propose Stable Community Detection (SCD) as an optimization problem defined as follows: given a directed, weighted network $G = (V, E, w)$, find a community structure $\mathcal{C} = \{C_1, C_2, \ldots, C_q\}$ such that the overall total persistence probability is maximized:

$$\max\ R = \sum_{C \in \mathcal{C}} \frac{w_C}{w_C^+}$$
$$\text{subject to}\quad C_i \cap C_j = \emptyset \quad \forall i \ne j \in \{1, 2, \ldots, q\}, \qquad \bigcup_{i=1}^{q} C_i = V.$$

Note that in our SCD formulation, the number of communities $q$ is determined by optimizing the objective function $R$ and is not an input parameter. Indeed, optimizing $R$ yields a value of $q$ that is a very good estimate of the actual number of communities, as we will show in Section 7.4.

7.3.3.2 Resolution limit analysis

Perhaps one of the most important properties that a metric suggested for identifying community structure should satisfy is the ability to overcome the resolution limit [35], i.e., the metric should be able to detect network communities even at different scales. In this subsection, we analyze the resistance of our proposed function $R$ to the resolution limit by looking particularly at the condition under which two communities should be merged. In what follows, we simplify the situation by considering undirected networks. Let us consider two communities $C_1$ and $C_2$, and let $m_{12}$ be the number of edges connecting $C_1$ and $C_2$. In order to merge $C_1$ and $C_2$ into a bigger community, $m_{12}$ should satisfy:

$$\frac{m_{C_1}}{d_{C_1}^+} + \frac{m_{C_2}}{d_{C_2}^+} \le \frac{m_{C_1} + m_{C_2} + m_{12}}{d_{C_1}^+ + d_{C_2}^+}$$

The above condition is equivalent to:

$$m_{C_1} \frac{d_{C_2}^+}{d_{C_1}^+} + m_{C_2} \frac{d_{C_1}^+}{d_{C_2}^+} \le m_{12}$$

which in turn implies $2\sqrt{m_{C_1} m_{C_2}} \le m_{12}$. Without loss of generality, we can assume that $m_{C_1} \le m_{C_2}$, and thus $2 m_{C_1} \le m_{12}$. This violates the condition of even a weak community. Moreover, this inequality implies that the sufficient condition to merge two adjacent communities depends only on the local structure of the two communities, regardless of the rest of the network. This observation indicates that our proposed metric $R$ strongly resists the resolution limit.

7.3.3.3 Connection to stability estimation

We next verify the following properties of network communities identified by optimizing our suggested metric $R$: (1) links within a community are of high stability and (2) links connecting communities are of low stability. These two observations are shown in Proposition 7.4.

Proposition 7.4. Let $\mathcal{C} = \{C_1, C_2, \ldots, C_k\}$ be a community structure detected by optimizing $R$. Then links within each $C_i$ are of strong stability and those connecting communities are of weak stability.

Proof. For any node $p \in V$ and subset $A \subseteq V$, let $w_{p,A}$ be the total weight of all links that $p$ has towards $A$ and vice versa. By this definition, we obtain $w_p = w_{p,A} + w_{p,V \setminus A}$. For any community $C \in \mathcal{C}$, $s \in C$ and $p \notin C$, since $p$ is not a member of $C$, we have

$$\frac{w_C}{w_C^+} > \frac{w_C + w_{p,C}}{w_C^+ + w_p} = \frac{w_C + w_{p,C}}{w_C^+ + w_{p,C} + w_{p,V \setminus C}},$$

because otherwise joining $p$ to $C$ would give a better value of $R$. This inequality is equivalent to $\frac{w_{p,C}}{w_p} < \frac{w_C}{w_C^+}$.
Similarly, for any node $s \in C$, we have

$$\frac{w_C}{w_C^+} > \frac{w_C - w_{s,C}}{w_C^+ - w_s} = \frac{w_C - w_{s,C}}{w_C^+ - w_{s,C} - w_{s,V \setminus C}},$$

because otherwise excluding $s$ from $C$ would give a better $R$. This inequality is equivalent to $\frac{w_{s,C}}{w_s} > \frac{w_C}{w_C^+}$, which in turn implies that the stability contribution of the internal links of $C$ is significant in comparison to $C$ as a whole.

7.3.3.4 A greedy algorithm for SCD problem

Analyzing the theoretical hardness of the SCD problem is beyond the scope of this chapter. In fact, the NP-hardness of the SCD problem can be shown by a reduction similar to that for MODULARITY in [8] (see also [101] and [36] for a comprehensive survey on similar graph clustering problems). Given its NP-hardness, a heuristic approach that can provide a good solution in a timely manner is more desirable. In this section, we describe a greedy algorithm for the SCD problem consisting of community growing, strengthening and refinement phases, described as follows.

Growing phase. This phase is responsible for discovering raw communities in the input network. Initially, all nodes are unassigned and do not belong to any community. Next, a random node is selected as the first member (or the seed) of a new community $C$, and then new members who help to maximize the persistence probability of $C$ are gradually admitted into $C$. When no remaining node can improve the objective of the current community, another new community is formed and the whole process is repeated in the very same manner on this newly formed community.

Strengthening phase. We further rearrange nodes into more appropriate communities. Since new members are admitted into a community $C$ in a random order, the objective value of $C$ could be further improved by the absence of some of its members, as they can be

obstacles to the total stability. This requires the reevaluation of all members of C. Therefore, in this phase, we exclude any node that reduces the persistence probability of a community and let such nodes be singleton communities. The removal of such nodes creates more cohesive communities, i.e., communities with higher internal stability.

Algorithm 17 SCD Algorithm
Input: A directed weighted graph G = (V, E, w)
Output: Community structure 𝒞
Growing Phase:
  𝒞 ← ∅
  A ← V
  while ∃ unassigned node u ∈ A do
    C ← {u}
    A ← A \ {u}
    while ∃ v ∈ A such that u_{C∪{v}} > u_C do
      v ← argmax_{v∈A} {u_{C∪{v}}}
      C ← C ∪ {v}
      A ← A \ {v}
    end while
    𝒞 ← 𝒞 ∪ {C}
  end while
Strengthening Phase:
  for C ∈ 𝒞 do
    while ∃ u ∈ C such that u_C < u_{C\{u}} do
      C ← C \ {u}
      𝒞 ← 𝒞 ∪ {{u}}
    end while
  end for
Refining Phase:
  while ∃ C1, C2 ∈ 𝒞 such that u_{C1∪C2} > u_{C1} + u_{C2} do
    (C1, C2) ← argmax_{C1,C2∈𝒞} {u_{C1∪C2} − u_{C1} − u_{C2}}
    𝒞 ← (𝒞 \ {C1, C2}) ∪ {C1 ∪ C2}
  end while
Return 𝒞

Refining phase. In the last phase, the global stability of the whole network is reevaluated. In particular, this last refinement phase looks at the merging of

two adjacent communities in order to improve the overall objective function. If two communities have a great number of mutual connections between them, it is more stable to merge them into one community. The final algorithm, which we call the SCD algorithm, is presented in Alg. 17.

7.4 Experimental Results

In this section, we present our results on the discovery of network communities on both synthesized networks with known ground truths and real-world social traces, including the NetHEPT and NetHEPT WC collaboration networks and Facebook networks. We evaluate the following aspects of our proposed SCD framework: (1) the effectiveness of our link stability estimation process, (2) the ability to identify the general network community structure without the concept of community stability, i.e., how similar our detected communities are to the ground truths, and (3) the ability to identify stable communities in reference to the consensus of other state-of-the-art methods, including Blondel's [6], Infomap [93] and OSLOM [61], after multiple executions of each.

7.4.1 Datasets

(Synthesized networks) Of course, the best way to evaluate our approaches is to validate them on real-world networks with known community structures. Unfortunately, we often do not know those structures beforehand, or such structures cannot be easily mined from the network topologies. Although synthesized networks might not reflect all the statistical properties of real ones, they provide known ground truths via planted communities and the ability to vary other network parameters such as sizes, densities and overlapping levels. Testing community detection methods on generated data has also become a usual practice that is widely accepted in the field [55].

We use the well-known LFR benchmark [55] to generate 190 weighted and directed testbeds. Generated data follow a power-law degree distribution and contain embedded

communities of varying sizes that capture characteristics of real-world networks. The parameters are: the number of nodes N = 1000 and 5000; the mixing parameter µ = [0.1...1], controlling the overall sharpness of the community structure; and the minimum (minC) and maximum (maxC) community sizes, set to (25, 50) for small-size and (50, 100) for big-size communities as in the standard settings. Each test is averaged over 100 runs for consistency.

Figure 7-2. Results on synthesized networks with different community criteria. A) Networks with minC, maxC unconstrained. B) Networks with minC = 25, maxC = 50 (small-size). C) Networks with minC = 50, maxC = 100 (big-size).

(NetHEPT and NetHEPT WC) The NetHEPT traces are widely used datasets for testing social-aware detection methods [11][12]. These traces contain information, mostly on academic collaboration, from arXiv's High Energy Physics - Theory section, where nodes stand for authors and links represent coauthorships. As distributed, the NetHEPT networks contain 15233 nodes and 31398 links, and weights on edges are assigned either uniformly at random (for NetHEPT data) or by weighted cascade (for NetHEPT WC data), where $w_{uv} = 1/d_{in}(v)$ and $d_{in}(v)$ is the in-degree of node $v$.

(Facebook) This dataset contains friendship information from the New Orleans regional network on Facebook, spanning from September 2006 to January 2009 [100]. The data contains more than 63K nodes (users) connected by more than 1.5 million friendship links, with an average node degree of 23.5. In our experiments, the weight of each link between users $u$ and $v$ is proportional to the communication frequency between them, normalized over the whole network.

7.4.2 Metric

To measure the quality of the detected communities in comparison with the embedded ground truths, we evaluate the Generalized Normalized Mutual Information (NMI) [55]. Basically, the NMI(U, V) value of two structures U and V is 1 if U and V are identical and 0 if they are completely independent. This is the most important metric for a community detection algorithm because it indicates how well the detected communities match the planted ones. A better community detection algorithm is expected to attain higher NMI values.

7.4.3 Effect of Link Stability Estimation

We first evaluate the effect of our link stability estimation on the detection of network communities by comparing the NMI values of SCD and its version with No Link stability Prediction (SCD-NLP). Due to space limits, the results of SCD and SCD-NLP are reported in Figure 7-2 together with those on general community structure detection.
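The comparisons in this section rest on NMI. As a minimal sketch, the standard NMI of two disjoint partitions can be computed from label lists as shown below. This is the ordinary partition form, normalized by the sum of the two entropies; it is not the generalized variant of [55], which also handles overlapping covers. The function name and the label-list input format are assumptions for illustration.

```python
from collections import Counter
from math import log


def nmi(labels_a, labels_b):
    """Normalized mutual information of two disjoint partitions.

    labels_a[i] and labels_b[i] are the community labels of node i.
    """
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))

    entropy_a = -sum(c / n * log(c / n) for c in count_a.values())
    entropy_b = -sum(c / n * log(c / n) for c in count_b.values())
    mutual = sum(
        c / n * log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )
    if entropy_a + entropy_b == 0:
        return 1.0  # both partitions put every node in a single community
    return 2 * mutual / (entropy_a + entropy_b)


# Identical partitions up to relabeling, and two independent partitions.
same = nmi([0, 0, 1, 1], [1, 1, 0, 0])
independent = nmi([0, 0, 1, 1], [0, 1, 0, 1])
print(round(same, 6), round(independent, 6))
```

The two toy calls reproduce the boundary cases mentioned in Section 7.4.2: a value of 1 for identical structures and 0 for independent ones.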

Figure 7-3. Performance of SCD in detecting stable communities on real social traces.

In general, SCD-NLP performs very competitively even without the preprocessing step: on synthesized networks with no community size constraint (Figure 7-2A), its discovered communities are almost perfectly similar to the embedded ones (NMI values approximately 1) for µ = [0...0.5], whereas the quality drops quickly when µ is above 0.5. We note that this drop in detection quality is debatable and does not necessarily imply bad performance, since networks with µ > 0.5 are considered very stochastic and thus may not contain a clear community structure. Nevertheless, with the help of the stability estimation, the performance of SCD is boosted significantly: the detection quality is very high even for µ > 0.65 (N = 1000) and µ > 0.75 (N = 5000), and only drops when the networks are extremely stochastic (µ > 0.8).

We next look at the cases where networks are constrained to small-sized (Figure 7-2B) and big-sized communities (Figure 7-2C). We observe that, when community sizes are constrained, SCD-NLP performs much better than before and even overcomes its prior limit of µ = 0.5. In particular, the performance of SCD-NLP closely approaches that of SCD, especially in large networks (N = 5000). However, SCD-NLP appears to be sensitive to big-size communities in small networks, as its quality drops quickly in Figure 7-2C (left), and seems to favor small-size communities, as its plots nearly coincide with those of SCD (Figure 7-2B). The detection quality of SCD, thanks to the stability estimation, stays high in all test cases.


In summary, these results indicate that (1) without the stability estimation process, our suggested metric R appears to be a very good one for detecting community structure in general directed and weighted networks, and (2) when the community size is constrained, link stability estimation has little effect on the community detection quality. However, in real-world social network settings where community sizes are typically unknown, and therefore unconstrained, the stability estimation has a significant effect on the detection of network communities. These experiments also confirm the efficacy of our proposed stability estimation procedure.

7.4.4 General Community Structure Detection

We next investigate SCD's ability to identify general network community structure, i.e., without community stability, in comparison with the aforementioned state-of-the-art detection methods. Results are reported in Figure 7-2. In general, the performance of our SCD framework on synthesized networks appears to be better than that of the Blondel and Infomap methods, and only lags behind Oslom's when the networks are heavily stochastic. When the community size is unconstrained, the detection quality of SCD and the other methods, except for Blondel's, stays nearly perfect on $\mu = [0...0.65]$ (N = 1000) and $\mu = [0...0.8]$ (N = 5000) and then degrades quickly. Among the three methods, Infomap's performance appears to be sensitive to a certain mixing threshold, as its NMI values tend to drop directly to 0, whereas those of Oslom and SCD drop more slowly. On average, the NMI values of SCD are about 8% and 3% better than those of the Blondel and Infomap methods, and lag about 2% behind those of the Oslom method. Blondel's method, on the other hand, does not attain good performance, with low NMI values even at a low range of mixing values. A possible explanation for this behavior of Blondel's method is the effect of the resolution limit, as we shall discuss below. When the embedded communities are constrained to small and large community sizes, we observe nearly the same behavior of the SCD, Oslom and Infomap methods as


depicted in Figures 7-2B and 7-2C. Blondel's method improves significantly in these cases, where its performance is close to that of the others. As we discussed above, one possible reason for the poor behavior of Blondel's method is the resolution limit of the modularity objective function [35]. When the community size is unconstrained, this resolution limit can mislead Blondel's method into merging communities that are small in comparison to the rest of the network, which results in low NMI values. On the other hand, this resolution limit does not take effect when size constraints are imposed, hence the significant improvement. Our SCD framework, as shown in section 7.3.3.2, can withstand this scaling limit, as it obtains highly competitive results. Moreover, the differences between SCD and the other methods are insignificant on average, which indicates that all methods are able to detect network communities with high quality. This is not a surprising result, since Blondel, Oslom and Infomap are currently state-of-the-art methods, but it is a great motivation and reward for our SCD framework.

7.4.5 Results on Stable Community Detection

In order to compare our results to the consensus of other detection methods, we adopt a strategy recently proposed in [59]. In particular, given a specific community detection method A, its consensus (or stable) communities can be determined as follows:
(i) execute A on G $n_p$ times to obtain $n_p$ partitions;
(ii) find the matrix $D = (D_{ij})$, where $D_{ij}$ is the probability that vertices $i$ and $j$ of G are assigned to the same cluster among the $n_p$ partitions;
(iii) disregard all $D_{ij}$'s that are below a threshold;
(iv) apply A on D $n_p$ times to create $n_p$ partitions; and
(v) if all partitions are equal, stop (the resulting matrix would be block diagonal); otherwise, go to step (ii).

As suggested in [59], the resulting communities are ideal candidates for stable structures, as members commit to their communities. We also compute the Jaccard index $J(U, V) = \frac{|U \cap V|}{|U \cup V|}$ to better evaluate the quality of the detected stable communities. Results are presented in Figure 7-3.
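As an illustration of the Jaccard index and of the co-assignment matrix D used in steps (ii) and (iii), the following Python sketch may help. It is only a sketch under stated assumptions: the function names `jaccard` and `co_assignment_matrix`, the representation of a partition as a list of node sets, and the example threshold of 0.5 are hypothetical choices made here, not details taken from [59] or from the SCD implementation.

```python
from itertools import combinations


def jaccard(u, v):
    """Jaccard index |U ∩ V| / |U ∪ V| of two node sets."""
    u, v = set(u), set(v)
    union = u | v
    # Two empty sets are treated as identical (an assumption of this sketch).
    return len(u & v) / len(union) if union else 1.0


def co_assignment_matrix(partitions, threshold=0.0):
    """Return D as a dict {(i, j): fraction of partitions placing i and j together}.

    Pairs are stored with i < j. Entries below `threshold` are dropped,
    mirroring step (iii) of the consensus procedure.
    """
    counts = {}
    for partition in partitions:              # one partition = list of node sets
        for cluster in partition:
            for i, j in combinations(sorted(cluster), 2):
                counts[(i, j)] = counts.get((i, j), 0) + 1
    n_p = len(partitions)
    return {pair: c / n_p for pair, c in counts.items() if c / n_p >= threshold}


# Three hypothetical runs of a detection method on nodes 0..4.
runs = [
    [{0, 1, 2}, {3, 4}],
    [{0, 1}, {2, 3, 4}],
    [{0, 1, 2}, {3, 4}],
]
D = co_assignment_matrix(runs, threshold=0.5)
```

In this toy input, node 2 switches sides in one of the three runs, so its pairings with nodes 3 and 4 fall below the example threshold and are discarded, while the pairs (0, 1) and (3, 4) are kept with weight 1.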


As illustrated by the subfigures, even a single run of SCD is able to obtain very high NMI scores and Jaccard indices in comparison with the consensus of other methods after multiple runs. In particular, community structures discovered by SCD on NetHEPT and NetHEPT WC obtain nearly 70% similarity in comparison with the Blondel, Infomap and Oslom methods, while the Jaccard indices indicate that, on average, almost 66% of the nodes are found in common between SCD and the core structure of the other competitors. This shows that communities discovered by SCD indeed highly overlap with the core community structures identified by other detection methods, which in turn implies that the clusters found by SCD are stable with high confidence. Surprisingly, in both the NetHEPT and NetHEPT WC networks, we observe high similarity among the consensus of the Blondel, Infomap and Oslom methods even with different edge weight distributions. This observation indicates that the communities identified by SCD are, in fact, stable in these networks.

Even on Facebook, a large network with real social interactions, the consensus communities discovered by other methods and those found by SCD remain highly similar, with nearly 60% and 50% similarity to those found by the Blondel and Infomap methods, and with over 50% overlap in the stable partitions as indicated by the Jaccard indices. The achieved NMI values in comparison with the Oslom method are relatively low, as their core communities do not appear to overlap highly (Jaccard index of only 35%). We note that this low similarity does not indicate instability of the community structure found by our SCD framework, since communities detected by Oslom can overlap with each other, while SCD works towards a disjoint community structure. Nevertheless, given that they come from just a single run, the above competitive results in reference to other state-of-the-art methods confirm the efficacy and quality of our method in detecting stable network communities in OSNs.


7.5 Conclusion

In this work, we investigate community structures in directed OSNs with a focus on community stability. As an effort towards understanding stable communities, we suggest an estimation procedure which provides helpful insights into the stability of links in the input network. Based on that, we propose SCD, a framework to identify community structure in directed OSNs with the advantage of community stability. We explore an essential connection between the persistence probability of a community at the stationary distribution and its local topology, which is the fundamental point backing our SCD framework. Finally, we certify the efficiency of our approach on both synthesized datasets with embedded communities and real-world social traces, including the NetHEPT collaboration and Facebook social networks, in reference to the consensus of other state-of-the-art detection methods. Highly competitive empirical results confirm the quality and efficiency of SCD in identifying stable communities in OSNs.


CHAPTER 8
ASSESSING NETWORK COMMUNITY STRUCTURE VULNERABILITY

8.1 Introduction

In this paper, we take a first step towards assessing the vulnerability of network community structure by studying how the failures of crucial nodes in the network affect its community structure. Particularly, we are interested in identifying network nodes whose removal triggers a significant restructuring of the current community structure. Formally, given the input network and a positive number k, we introduce the Community Structure Vulnerability (CSV) problem, which aims to find a set S of k nodes whose removal maximally transforms the current network community structure into a totally different one, i.e., the new community structure resulting from the removal of S is of least similarity to the original one, as evaluated via the Normalized Mutual Information [20] measure.

Knowledge about this crucial vulnerability of network community structure is of considerable use, especially for social-aware methods in mobile ad-hoc and online social networks (OSNs). To give a sense of its effects, consider message forwarding in DTNs. Since social-based forwarding strategies in DTNs rely on the highest ranked nodes in each community to forward the message [47][80], knowledge of this vulnerability can help either to design routing algorithms that do not overload those crucial devices, if they are the highly ranked ones in a community, or to design an effective backup plan for when some of them fail at the same time. In worm containment applications in OSNs [82][110], this knowledge can provide helpful insights into the protection of those sensitive nodes, if they are indeed highly influential users, once worms spread out in the network. As a result, the identification of nodes whose removal triggers a massive restructuring of the community structure is extremely important for the network's regular operation.

However, under a minor structural change when a node is excluded from a community, this particular community can either stay intact if the


removed node is less important, or be broken down into smaller subcommunities, which can further be merged into other communities, if the node is of great importance to the community. This unpredictable transformation of network communities, together with their large scale in reality, makes the assessment of community structure vulnerability a fundamental yet challenging problem.

8.2 Problem Definition

In this section, we first define the graph notations that will be used throughout this paper. We then describe Normalized Mutual Information (NMI) [20], a concept in Information Theory, as a metric to assess the difference between community structures before and after the removal of important nodes. Finally, we formally define the Community Structure Vulnerability problem, our main focus in this paper.

(Notations) Let $G = (V, E)$ be an undirected unweighted graph representing a network, where $V$ is the set of $|V| = N$ nodes (e.g., users) and $E$ is the set of $|E| = M$ links. For any node $u \in V$ and a set $C \subseteq V$, let $N(u)$, $d_u$ and $d^C_u$ be the set of all neighbors of $u$, its degree in $G$ and its degree in $C$, respectively. Furthermore, let $n_C = |C|$ be the number of nodes and $m_C$ be the number of internal edges in $C$.

(Community structure) Denote by $\mathcal{A}$ the specific community detection algorithm that will be applied on $G$, and by $X = \{X_1, X_2, ..., X_{c_X}\}$, $Y = \{Y_1, Y_2, ..., Y_{c_Y}\}$ the two (possibly overlapped) community structures of $c_X$ and $c_Y$ communities detected by $\mathcal{A}$ before and after the removal of a set $S$ of $k$ nodes in $G$, respectively. Mathematically, $X$ and $Y$ are represented as $X = \mathcal{A}(G)$ and $Y = \mathcal{A}(G[V \setminus S])$, where $G[V \setminus S]$ is the subgraph induced by $V \setminus S$ on $G$. For any index $i = 1, ..., c_X$ and $j = 1, ..., c_Y$, let $x_i = |X_i|$, $y_j = |Y_j|$, and $n_{ij} = |X_i \cap Y_j|$. Finally, let $x = \sum_{i=1}^{c_X} x_i$, $y = \sum_{j=1}^{c_Y} y_j$ and $n = \sum_{i=1}^{c_X} \sum_{j=1}^{c_Y} n_{ij}$ be the total sizes of the communities in $X$ and $Y$, and the total number of common nodes shared between $X$ and $Y$, respectively.

(Normalized Mutual Information) In order to evaluate how much the network community structure changes before and after the removal of important nodes, we


utilize the concept of Normalized Mutual Information suggested in [20]. Basically, given two structures $X$ and $Y$, $NMI(X, Y)$ is 1 if $X$ and $Y$ are identical and is 0 if $X$ and $Y$ are totally separated, and the higher the NMI score, the more similar $X$ and $Y$ are. As a result, NMI is a well-suited metric for certifying the quality of community structures discovered by different detection algorithms. The effectiveness of this widely accepted measure has also been extensively verified in the literature [55]. Formally, $NMI(X, Y)$ is defined as

\[ NMI(X, Y) = \frac{2 I(X, Y)}{H(X) + H(Y)}, \]

where $H(X)$, $H(Y)$ and $I(X, Y)$ are the entropies of structures $X$ and $Y$, and the Mutual Information conveyed between them, respectively. More details about the NMI formulation are elaborated in our analysis.

(Problem definition) Finally, the Community Structure Vulnerability (CSV) problem is formulated as follows.

Definition 1. Given a network represented by an undirected and unweighted graph $G$, a specific community detection algorithm $\mathcal{A}$, and a positive integer $k \le N$, we seek a subset $S \subseteq V$ such that

\[ S = \arg\min_{T \subseteq V,\ |T| = k} \left\{ NMI\big(\mathcal{A}(G), \mathcal{A}(G[V \setminus T])\big) \right\}. \]

In other words, the CSV problem seeks a subset $S \subseteq V$ of $k$ nodes whose removal results in the maximum difference between the initial community structure $X$ and the new community structure $Y$ detected by $\mathcal{A}$ on $G[V \setminus S]$. We call $S$ the Node-Vulnerability set of $G$, since its removal maximally transforms the network communities of $G$ into different structures.

Remark. The formulation of CSV requires the community detection algorithm $\mathcal{A}$ as an input parameter. Because there is not yet a universal agreement on, or accepted definition of, a network community, this input is necessary in the sense that different


algorithms with different objective functions might favor different sets of nodes, and thus a good solution set for one community detection algorithm may not be good for the others. However, when there is a clear objective function for finding community structure, such as maximizing Modularity Q [55] or the total internal density [80], this requirement can be lifted. Nevertheless, a node selection strategy that relies more on the input network and less on the community detection algorithm is always desirable.

8.3 Analysis of NMI Measure

In this section, we investigate the possible conditions on the sizes and the number of communities that can potentially lead to either the global or local minimization of $NMI(X, Y)$. We stress that these conditions are by no means universal or exhaustive, since some of them might not hold simultaneously, given the input parameters. Indeed, what we hope is that these conditions provide key insights into the selection of important nodes to maximally separate $X$ and $Y$. In the coming paragraphs, we first discuss the NMI formulation in greater detail, and then analyze it in terms of both disjoint and overlapping community structures.

8.3.1 NMI Formulation

To evaluate $NMI(X, Y)$ [20], where $X = \{X_1, X_2, ..., X_{c_X}\}$ and $Y = \{Y_1, Y_2, ..., Y_{c_Y}\}$, we start by considering community assignments $X_i$ and $Y_j$, where $X_i$ and $Y_j$ indicate the community labels of a node $t$ in $X$ and $Y$, respectively. Without loss of generality, we can also assume that the labels $X_i$ and $Y_j$ are values of two random variables $X$ and $Y$ (here we reuse the notations $X$ and $Y$ to denote the two random variables), with joint distribution

\[ P(X_i, Y_j) = P(X = X_i, Y = Y_j) = \frac{n_{ij}}{N - k}, \]

and individual distributions

\[ P(X_i) = P(X = X_i) = \frac{x_i}{N}, \qquad P(Y_j) = P(Y = Y_j) = \frac{y_j}{N - k}. \]


The entropy (or uncertainty) of $X$ and $Y$ is defined as [18]

\[ H(X) = -\sum_{i=1}^{c_X} P(X_i) \log P(X_i) = -\sum_{i=1}^{c_X} \frac{x_i}{N} \log \frac{x_i}{N}, \]

\[ H(Y) = -\sum_{j=1}^{c_Y} P(Y_j) \log P(Y_j) = -\sum_{j=1}^{c_Y} \frac{y_j}{N - k} \log \frac{y_j}{N - k} = \frac{1}{N - k} \Big( y \log(N - k) - \sum_{j=1}^{c_Y} y_j \log y_j \Big). \]

Note that in the CSV problem, $X$ can be derived straightforwardly from $\mathcal{A}$ and $G$, and thus the quantities $x_i$ can also be inferred from these input parameters. Therefore, we simply consider the $x_i$'s and $H(X)$ as constants in this paper. The Mutual Information $I(X, Y)$ [18] of two random variables $X$ and $Y$ is defined as

\[ I(X, Y) = \sum_{i=1}^{c_X} \sum_{j=1}^{c_Y} P(X_i, Y_j) \log \frac{P(X_i, Y_j)}{P(X_i) P(Y_j)} = \sum_{i=1}^{c_X} \sum_{j=1}^{c_Y} \frac{n_{ij}}{N - k} \log \frac{N n_{ij}}{x_i y_j}. \]

This measure is symmetric, and it tells us how much we know about variable (or structure) $Y$ if we already know variable $X$, and vice versa. However, as indicated in [20][55], Mutual Information itself is not ideal as a global similarity metric, since any subpartition of a given community structure $X$ would result in the same mutual information with $X$, even though they can be very different from each other. As a result, [20] introduces the Normalized Mutual Information, which overcomes that limitation. Formally, the NMI of two random variables $X$ and $Y$ is defined as

\[ NMI(X, Y) = \frac{2 I(X, Y)}{H(X) + H(Y)} \qquad (8) \]

In terms of our notations, $NMI(X, Y)$ can be written as

\[ \frac{2 \sum_{i=1}^{c_X} \sum_{j=1}^{c_Y} n_{ij} \log \frac{N n_{ij}}{x_i y_j}}{(N - k) H(X) + y \log(N - k) - \sum_{j=1}^{c_Y} y_j \log y_j} \qquad (8) \]
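To make the formulation concrete, the short Python sketch below evaluates $NMI(X, Y)$ directly from the distributions defined in this subsection, with $P(X_i) = x_i/N$, $P(Y_j) = y_j/(N-k)$ and $P(X_i, Y_j) = n_{ij}/(N-k)$. It is an illustrative sketch rather than the code used in the experiments: the function name `nmi`, the representation of a community structure as a list of Python sets, and the convention of returning 1.0 when both entropies are zero are assumptions made here.

```python
import math


def nmi(X, Y, N, k):
    """NMI of structures X (over N nodes) and Y (over the N - k remaining nodes).

    X and Y are lists of node sets; natural logarithms are used throughout.
    """
    remaining = N - k
    # Entropies H(X) and H(Y) from the individual distributions.
    h_x = -sum(len(c) / N * math.log(len(c) / N) for c in X if c)
    h_y = -sum(len(c) / remaining * math.log(len(c) / remaining) for c in Y if c)
    # Mutual information from the joint distribution n_ij / (N - k).
    mutual = 0.0
    for x_i in X:
        for y_j in Y:
            n_ij = len(x_i & y_j)
            if n_ij:  # terms with n_ij = 0 contribute nothing
                mutual += n_ij / remaining * math.log(N * n_ij / (len(x_i) * len(y_j)))
    denominator = h_x + h_y
    # Assumed convention: two trivial (zero-entropy) structures count as identical.
    return 2 * mutual / denominator if denominator > 0 else 1.0


X = [{0, 1, 2}, {3, 4, 5}]
same = nmi(X, [{0, 1, 2}, {3, 4, 5}], N=6, k=0)         # identical structures
crossed = nmi(X, [{0, 3}, {1, 4}, {2, 5}], N=6, k=0)    # every Y_j cuts across X
```

With no nodes removed (k = 0), the identical pair evaluates to 1 and the crossed pair to 0, matching the two extreme cases described for the measure.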


8.3.2 Minimizing NMI in a Disjoint Community Structure

When network communities are disjoint from each other, we have $X_i \cap X_s = \emptyset$, $\bigcup_{i=1}^{c_X} X_i = V$, $Y_j \cap Y_t = \emptyset$, and $\bigcup_{j=1}^{c_Y} Y_j = V \setminus S$ for all distinct $i, s = 1, ..., c_X$ and all distinct $j, t = 1, ..., c_Y$. As a result, the following equalities hold: $x = \sum_{i=1}^{c_X} x_i = N$, $y = \sum_{j=1}^{c_Y} y_j = N - k$ and $n = \sum_{ij} n_{ij} = N - k$ $(\ast)$.

8.3.2.1 Minimizing NMI within a community

We first investigate the behavior of $NMI(X, Y)$ in a special case where only one specific community of $X$ is affected by the removal of the set $S$ of $k$ nodes, while the other communities stay intact. We can assume that $X_1$ is the targeted community, which is further divided into $p$ smaller subcommunities of sizes $s_1, s_2, ..., s_p$ satisfying $\sum_{j=1}^{p} s_j = x_1 - k$. In this case

\[ H(Y) = \sum_{j=1}^{p} \frac{s_j}{N - k} \log \frac{N - k}{s_j} + \sum_{i=2}^{c_X} \frac{x_i}{N - k} \log \frac{N - k}{x_i} = \frac{(x_1 - k) \log(N - k) + \sum_{i=2}^{c_X} x_i \log \frac{N - k}{x_i} - \sum_{j=1}^{p} s_j \log s_j}{N - k}, \]

and

\[ I(X, Y) = \sum_{j=1}^{p} \frac{s_j}{N - k} \log \frac{N}{x_1} + \sum_{i=2}^{c_X} \frac{x_i}{N - k} \log \frac{N}{x_i} = \frac{x_1 - k}{N - k} \log \frac{N}{x_1} + \sum_{i=2}^{c_X} \frac{x_i}{N - k} \log \frac{N}{x_i}. \]

Thus, $NMI(X, Y)$ is minimized when $\sum_{j=1}^{p} s_j \log s_j$ is minimized. Since the function $s \log s$ is strictly convex for any $s > 0$, we apply Jensen's inequality [18] to this summation and get

\[ \frac{1}{p} \sum_{j=1}^{p} s_j \log s_j \ge \frac{\sum_{j=1}^{p} s_j}{p} \log \frac{\sum_{j=1}^{p} s_j}{p} = \frac{x_1 - k}{p} \log \frac{x_1 - k}{p}, \]

with equality when all $s_j$'s are equal to each other. This inequality reveals that, in order to further minimize the RHS quantity, one can try to break $X_1$ into as many smaller communities of relatively the same size as possible (i.e., to enlarge


$p$ as much as possible while ensuring the $s_j$'s are all equal). This intuition makes sense, since a new structure of $X_1$ with all singleton communities incurs $\sum_{j=1}^{p} s_j \log s_j = 0$, and hence maximizes $H(Y)$ and in turn minimizes $NMI(X, Y)$. However, since the new structure of $X_1$ depends on the community detection algorithm $\mathcal{A}$, the all-singleton scenario might not always occur. Furthermore, will this crucial observation hold in a general disjoint and overlapping community structure? We tend to lean towards the affirmative answer through our analysis in the coming subsections.

8.3.2.2 Minimizing NMI in a general disjoint community structure

In a general disjoint community structure, the equalities $x = N$, $y = N - k$ and $n = N - k$ $(\ast)$ help to simplify $NMI(X, Y)$ (eq. 8) to

\[ \frac{2 \sum_{i=1}^{c_X} \sum_{j=1}^{c_Y} n_{ij} \log \frac{N n_{ij}}{x_i y_j}}{(N - k) H(X) + (N - k) \log(N - k) - \sum_{j=1}^{c_Y} y_j \log y_j}. \]

In order to minimize the above ratio, one would seek conditions under which the numerator of $NMI(X, Y)$ is minimized while its denominator is maximized. To maximize the latter quantity, we need to minimize $\sum_{j=1}^{c_Y} y_j \log y_j$. Applying Jensen's inequality to this sum gives

\[ \frac{1}{c_Y} \sum_{j=1}^{c_Y} y_j \log y_j \ge \frac{y}{c_Y} \log \frac{y}{c_Y} = \frac{N - k}{c_Y} \log \frac{N - k}{c_Y}, \]

and thus $\sum_{j=1}^{c_Y} y_j \log y_j$ attains its minimum of $(N - k) \log \frac{N - k}{c_Y}$, with equality when all $y_j$'s are equal to each other. As $N$ and $k$ are input parameters, $\log \frac{N - k}{c_Y}$ can further be minimized by making $c_Y$ as large as possible, while requiring the $y_j$'s to be equal to each other. Mathematically, this is achieved when $Y$ contains exactly $c_Y = N - k$ singleton communities. However, since our problem depends on the detection algorithm, this inequality advises that the new community structure $Y$ should contain as many communities of relatively the same size as possible. We take this observation into account, as it plays a key role in our important-node selection process. This observation also coincides with what was inferred in the prior special case, and intuitively agrees with the concepts of Critical Node Detection (CND) [25] and Balanced Graph


Partitioning (BGP) [2], whose goals are to delete nodes and cut the input graph into $p$ connected components of relatively the same size. However, CSV fundamentally differs from these problems in the sense that connected components in BGP and CND do not necessarily reflect network communities.

In order to minimize the numerator, we rewrite it as

\[ I(X, Y) = \frac{1}{N - k} \Big( \sum_{ij} n_{ij} \log \frac{N n_{ij}}{y_j} - \sum_{ij} n_{ij} \log x_i \Big). \]

Applying the Log Sum Theorem [18] to the first sum gives

\[ I(X, Y) \ge \frac{1}{N - k} \Big( n \log \frac{N n}{c_X y} - \sum_{ij} n_{ij} \log x_i \Big) = \log \frac{N}{c_X} - \frac{1}{N - k} \sum_{i} (x_i - l_i) \log x_i, \]

because $n = y = N - k$ and $\sum_{j=1}^{c_Y} n_{ij} = x_i - l_i$ for all $i = 1, ..., c_X$, where $l_i$ is the number of deleted (or lost) nodes in community $X_i$, and the $l_i$'s satisfy $\sum_{i=1}^{c_X} l_i = k$. Equality holds when $n_{ij}/y_j$ is a constant, say $\alpha \ge 0$, for all $i = 1, ..., c_X$ and $j = 1, ..., c_Y$. If we assume that this is the case, then $\sum_{j=1}^{c_Y} n_{ij} = \alpha \sum_{j=1}^{c_Y} y_j = \alpha (N - k)$, which in turn implies $N - k = \sum_{ij} n_{ij} = c_X \alpha (N - k)$. Hence, $\alpha = 1/c_X$ and thus $l_i = x_i - (N - k)/c_X$.

Therefore, to minimize the second sum, the equation $l_i = x_i - (N - k)/c_X$ advises that we should put more focus on (i.e., remove more nodes in) big-sized communities $X_i$ of $X$ to break them into smaller modules. This breaking down of big-sized communities partially supports the prior observation that communities of $Y$ should have relatively the same size. Note that in this analysis, we have assumed that $n_{ij}/y_j$ is a constant for all pairs of $i$ and $j$. In practice, this might not always be the case, since real communities can be distributed differently depending on the underlying detection algorithm. Nevertheless, we find this observation helpful, as it suggests a general guideline for selecting important nodes in the network.
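The two size-related observations of this subsection can be checked numerically on a toy example. The Python sketch below is only an illustration: the helper names `sum_y_log_y` and `balanced_removals` and all community sizes are hypothetical values chosen here, not data from the experiments.

```python
import math


def sum_y_log_y(sizes):
    """Return sum_j y_j * log(y_j) for a list of community sizes."""
    return sum(y * math.log(y) for y in sizes if y > 0)


def balanced_removals(x_sizes, k):
    """Evaluate l_i = x_i - (N - k) / c_X for each original community size x_i."""
    n_total, c_x = sum(x_sizes), len(x_sizes)
    target = (n_total - k) / c_x   # equal share of the remaining N - k nodes
    return [x - target for x in x_sizes]


# 12 remaining nodes split into 3 communities: balanced vs. unbalanced sizes.
balanced = sum_y_log_y([4, 4, 4])
unbalanced = sum_y_log_y([8, 2, 2])
# The same 12 nodes split into more (still equal-sized) communities.
finer = sum_y_log_y([2] * 6)

# Original sizes 10, 6 and 4 (N = 20) with a budget of k = 5 removed nodes.
# A negative entry signals that the equal-share condition cannot be met exactly.
removals = balanced_removals([10, 6, 4], k=5)
```

In this example the balanced split gives a smaller sum than the unbalanced one, and the finer split a smaller sum still, in line with the Jensen bound. The removal counts concentrate on the largest community, and the negative entry for the smallest one shows why the constant-ratio assumption cannot always hold in practice.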


8.3.3 Minimizing NMI in an Overlapped Community Structure

The minimization of the $NMI(X, Y)$ measure is much more complicated when network communities can overlap with each other. In particular, the conditions $\bigcup_{i=1}^{c_X} X_i = V$ and $\bigcup_{j=1}^{c_Y} Y_j = V \setminus S$ still hold in this case; however, $X_i \cap X_s$ and $Y_j \cap Y_t$ might not be empty for some $i, s = 1, ..., c_X$ and $j, t = 1, ..., c_Y$. These facts indicate that $x = \sum_{i=1}^{c_X} x_i \ge N$, $y = \sum_{j=1}^{c_Y} y_j \ge N - k$ and $n = \sum_{ij} n_{ij} \ge N - k$. Our analysis strategy in this case is similar to the prior one, as we also strive to maximize the denominator while minimizing the numerator of $NMI(X, Y)$ (eq. 8). Because $n \ge N - k$, the minimization of the top term $I(X, Y)$ no longer depends only on the $x_i$'s. One way to work around this issue is to investigate the relative correlation between the total community size $y$ and the number of communities $c_Y$. Let $A = y/c_Y$ be the ratio between these two quantities, or in other words, the average community size. The denominator of $NMI(X, Y)$ is evaluated as

\[ y \log(N - k) - \sum_{j=1}^{c_Y} y_j \log y_j \le y \log(N - k) - \frac{y \log(y/c_Y)}{c_Y} = y \log(N - k) - A \log A, \]

with equality when all $y_j$'s are equal to each other. To further maximize this denominator, we need $y$ to be as large as possible while keeping $A$ as small as possible, i.e., the new community structure $Y$ should contain more and more communities so as to increase $c_Y$ as well as to lower $A$. Due to the dependence on the specific detection algorithm $\mathcal{A}$, this optimization of the correlation between $y$ and $c_Y$ might not be globally achieved. However, a coarse analysis of $y$ and $c_Y$ can be conducted in the following sense: if we assume that $y$ is within a constant factor of the total number of actual nodes $(N - k)$, i.e., $y \le a_0 (N - k)$ for some constant $a_0 > 1$, we can then increase the value of the RHS by breaking up as many communities as possible while keeping them the same size (i.e., enlarge $c_Y$ and keep the $y_j$'s all the same), which helps to reduce the impact of


$A \log A$. This observation, though relative, agrees with what we obtained in the case of disjoint community structure. In the unfortunate case where $y$ is not known to be within any constant factor of $(N - k)$, the observation might not hold, since both $y$ and $c_Y$ can be arbitrarily large and thus $A \log A$ could still be relatively small.

Next, applying the Log Sum Theorem to the numerator yields

\[ I(X, Y) = \sum_{ij} n_{ij} \log \frac{N n_{ij}}{x_i y_j} \ge n \log \frac{N n}{x y}, \]

with equality when $\frac{N n_{ij}}{x_i y_j}$ is a constant for all $i = 1, ..., c_X$ and $j = 1, ..., c_Y$. Thus, one can try to minimize $I(X, Y)$ by deleting nodes in such a way that $n$ is maximized and $y$ is minimized while making sure that $\frac{N n_{ij}}{x_i y_j}$ is a constant. As a result, this minimization of $I(X, Y)$ is a multi-objective optimization problem which may not have a feasible solution. However, if we assume that the latter condition is imposed, i.e., $\frac{N n_{ij}}{x_i y_j} = A$ for some constant $A > 0$ (reusing the symbol $A$ for this constant), then $n_{ij} = \frac{A x_i y_j}{N}$, and thus $n = \frac{A}{N} x y$. This reduces the above inequality to

\[ I(X, Y) \ge \frac{x}{N} A y \log A. \]

The RHS of the inequality advises that, in order to minimize $I(X, Y)$, the total size of network communities should not be too large, while the overlapping ratios of all communities should be equal to each other and be as small as possible. This is a different criterion from the disjoint community structure point of view.

8.4 A Solution to CSV Problem

In the following paragraphs, we consider the scenario where maximizing the internal density [80] is the objective function for finding network communities, i.e., communities of $G$ are assumed to have optimized internal densities. In this manner, we present genEdge, an algorithm for solving the CSV problem that is independent of the underlying community detection algorithm $\mathcal{A}$. Our solution strategy tries to break larger communities into as many smaller ones as possible while aiming for them to have relatively the same size with small overlapping ratios. The idea of our strategy is based


on the following intuition: since communities in $X$ are optimized for their internal density, they are likely to contain strong, tightly connected substructures which form the cores of these communities. As a result, the removal of crucial nodes in a core might break the community into smaller modules. Moreover, as nodes in a core are tightly connected, there should be some edge that generates them, i.e., all nodes in the core are adjacent to both endpoints of this edge. Inspired by this intuition, our strategy works towards identifying these generating edges of a community, and then seeks the minimum set of generating edges that composes the original community.

Let $D$ be a subset of $V$. Denote by $\Psi(D) = \frac{2 m_D}{n_D (n_D - 1)}$ the internal density of $D$ and by $\tau(D) = \left( \frac{n_D (n_D - 1)}{2} \right)^{-\frac{2}{n_D (n_D - 1)}}$ the threshold function on the internal density of $D$, respectively. For any nodes $u, v \in D$, if edge $(u, v)$ is not in $E$, we call it a missing edge in $D$. In addition, we call an edge in $D$ negative if it is incident to a missing edge in $D$, and positive otherwise. We define the concept of generating edges of $D$ as follows.

Definition 2. (Generating edge) For any edge $(u, v)$ in $D$, if $D = (D \cap N(u) \cap N(v)) \cup \{u, v\}$ and $\Psi(D) \ge \tau(D)$, we call $(u, v)$ a generating edge of $D$. We further call $D$ a local core generated by $(u, v)$ and write $gen(u, v) = D$.

For any community $C$ of $G$, a set $L \subseteq E$ is called a generating edge set of $C$ if $\bigcup_{(u, v) \in L} gen(u, v) = C$. Since $C$ can be generated by different generating edge sets and we are constrained by the node budget, we intuitively seek the generating edge set of minimal cardinality.

Definition 3. (Minimum Generating Edge Set) Given a community $C$ of $G$, the MGES problem seeks a generating edge set $L$ of $C$ with the smallest cardinality.

The cores generated by edges in a MGES of a community $C$ of $G$ are tightly connected and together compose $C$. As a result, if we delete an endpoint of every edge in a MGES, $C$ will be broken into smaller modules, with the number of modules being at least the number of edges in the MGES (Lemma 16). Since our goal is to break the current community structure $X$ into as many new communities as possible,


the removal of crucial nodes defined by edges in a MGES is a good heuristic for this purpose. But first and foremost, we need to characterize all MGESs in the current community structure $X$ based only on the input network $G$. Lemma 17 locates the generating edge(s) of a local core in a community $C$: they have to be adjacent to nodes with the highest degree in $C$. Based on this result, we present in Alg. 18 a procedure that correctly finds the MGES of a given community $C$ (Theorem 8.1).

Algorithm 18 An optimal algorithm for finding the MGES
Input: Network $G = (V, E)$ and a community $C \in X$;
Output: Minimum generating edge set $L$ of $C$;
0. Mark all nodes as unassigned and set $L = \emptyset$.
1. Remove all negative edges in $C$. If any edges survive, they are candidates for generating edges in their corresponding communities; include them in $L$ and go to step 2. Else, go to step 3.
2. Reconstruct local cores based on the generating edges found in step 1. Mark all nodes in those communities as assigned. Discard generating edges in $L$ that fall into any newly constructed communities. Return if all edges are assigned.
3. Find the set $U$ as in Lemma 17. Find the edge in $NE(U)$ that can generate a local community of the largest size. Include this edge in $L$ and mark all nodes in the new local community as assigned. Ties are broken randomly. Return if all edges are assigned.
4. If there are still unassigned nodes, say the set $I \subseteq C$, construct $G_1 = G[(I \cup N(I)) \cap C]$. Go back to step 1.

Lemma 16. Let $L$ be a MGES of a community $C$. The removal of an endpoint of every edge of $L$ will break $C$ into at least $|L|$ subcommunities.

Proof. Clearly, the removal of an endpoint of every edge in $L$ degrades the internal density of each core, since the endpoint of the generating edge is of full degree in its core. Now, if the number of subcommunities resulting from the node removal is less than $|L|$, then at least two cores are merged together. That is, there are cores $c_1$ and $c_2$ that are merged together even with lower internal density. This should not be the case, since otherwise they would have been identified as a single core in the first place.


Their combination, as a result, implies that $C$ has a MGES of size less than $|L|$, which contradicts the assumption that $L$ is a MGES of $C$.

Lemma 17. Let $C$ be a subset of $V$, $U = \{u \in C \mid d^C_u \text{ is the highest in } C\}$ and $NE(U) = \{(u, v) \mid u \in U \text{ or } v \in U \text{ but not both}\}$. Then, $|NE(U) \cap L| \ge 1$.

Proof. After each refreshment in step 2, let $u$ be the node with the highest in-degree in $C$. After step 1 of Alg. 18, all negative edges are deleted, since they do not contribute to the actual generating set $L$. As such, edges incident to $u$ are not negative. This in turn implies that they are candidates for generating edges. Now, iterate through all edges incident to $u$ and choose the one that generates the biggest-sized core. This edge should be in the list $L$.

Theorem 8.1. Let $d_C$ be the maximum in-degree of a node in $C$. Alg. 18 takes $O(d_C |C|)$ time in the worst case and returns an optimal solution for the MGES problem.

Proof. Since Lemma 17 ensures that at least one edge is added to $L$ each time, and the procedure terminates when no edges are left, Alg. 18 terminates. Moreover, it is verifiable that Alg. 18 takes time at most the number of edges in $C$, which is $O(d_C |C|)$. Also, due to the intense internal density of a core, every time an edge is added to $L$, that edge actually generates the largest core possible. The proof follows from this fact, Lemma 17 and the exhaustive property of Alg. 18.

Algorithm 19 genEdge - A node selection strategy for CSV based on generating edges
Input: Network $G = (V, E)$, $X = \mathcal{A}(G)$;
Output: A set $S \subseteq V$ of $k$ nodes;
1. Use Alg. 18 to find $L_{X_i}$ for all communities $X_i$ in $X$.
2. Sort all communities $X_i$ in $X$ by the sizes of their MGESs.
3. Sort all nodes in $G$ by the number of generating edges that they are incident to in $X_i$. If there is a tie, sort them by their degrees in $G$.
4. Return the top $k$ nodes from step 3.

With the optimal solution of MGES taken into account, we next suggest a heuristic for selecting important nodes following the guidelines suggested previously. In


particular, our heuristic selects nodes in a greedy manner, starting from communities that have large MGESs. Moreover, in the MGES of each community $C$, we give priority to nodes that are incident to more generating edges, since their removal will break $C$ into more subcommunities.

8.5 Experimental Results

In this section, we show the empirical results of our node selection strategy for CSV on both synthesized networks with known community structures and real-world social traces, including the Reality Mining cellular dataset [29] and the Facebook [100] and Foursquare [21] social networks. In order to certify the performance of our approach, we compare the results obtained by the following methods: High degree centrality (highDeg) selects the top $k$ nodes in $G$ with the highest degrees; betweenness centrality (betweeness) selects the top $k$ nodes in $G$ with the highest betweenness (where the betweenness of a node $u$ is the number of shortest paths in $G$ that pass through $u$); Generating edges (genEdge) is our strategy described in Alg. 19; and finally, Node Importance (nodeImp) [105] selects the top $k$ nodes by their importance to the community structure.

We first examine the effect of the underlying community detection methods by comparing results obtained by the AFOCS [80], Blondel [6] and Oslom [61] algorithms to the embedded ground truths. In particular, we set $X$ to be the ground-truth community structure, and when $S$ is removed from the network, $NMI(X, Y)$ is reported, where $Y = AFOCS(G[V \setminus S])$, $Y = Blondel(G[V \setminus S])$ and $Y = Oslom(G[V \setminus S])$, respectively. These methods have been empirically certified in the literature to be among the best algorithms for finding non-overlapping and overlapping community structure [55]. Verifying our strategy on synthesized networks not only certifies its performance but also gives us confidence in its behavior when applied to real-world traces. We next report the following quantities: (1) the NMI differences between community structures before and after the node removal, which is our main objective function, (2) the number of


Figure 8-1. Comparison among different node selection strategies on synthesized networks with N = 2500 nodes. A) NMI scores by AFOCS. B) NMI scores by Blondel. C) NMI scores by Oslom.


Figure 8-2. Comparison among different node selection strategies on synthesized networks with N = 5000 nodes. A) NMI scores on AFOCS. B) NMI scores on Blondel. C) NMI scores on Oslom.


communities in the new structure, and (3) the average size of the network communities in the new structure.

8.5.1 Results on Synthesized Networks

Setup: We use the well-known LFR overlapping benchmark [55] to generate test networks. The numbers of nodes are N = 2500 and 5000, the mixing parameter is $\mu = 0.15$, and the community sizes are $c_{min} = 10$ and $c_{max} = 50$ for N = 2500, and $c_{min} = 30$ and $c_{max} = 100$ for N = 5000. Each time $k$ nodes are removed from the network, the network community structure is re-identified and compared to the original embedded one (the ground truth). The overlapping threshold in AFOCS is set to 0.7, and all tests are averaged over 100 runs for consistency.

8.5.1.1 Solution quality

We first evaluate the performance of all aforementioned node selection strategies on the community detection algorithms AFOCS, Blondel and Oslom, respectively. Because the ground-truth communities of synthesized networks are given a priori, comparisons through NMI scores among these strategies as well as among detection algorithms are valid: the lower the NMI score a strategy obtains, the more effective it seems to be. In addition, the higher the remaining NMI measure a detection algorithm obtains after the node removal, the more resistant to node vulnerability it seems to be.

The quality of the node selection solutions is reported in Figures 8-1 and 8-2. As a general trend, NMI scores tend to drop quickly as more nodes are removed from the network when N = 2500; however, they degrade much more slowly in networks with N = 5000. The first observation revealed in those figures is that our approach genEdge achieves the best (lowest) NMI scores in almost all test cases. On average, on networks with 2500 nodes, genEdge is 14% better than both highDeg and betweeness, and is 12% better than nodeImp on the AFOCS algorithm; and it is 19%, 11% and 5% better than highDeg, betweeness, and nodeImp on the Blondel algorithm (Figures 8-1A, 8-1B). On the Oslom algorithm,


genEdge differs insignificantly from highDeg and betweenness, being 1.5% and 1.4% better, and trails only nodeImp, by 3% in NMI score. On networks with 5000 nodes, genEdge still outperforms the other strategies, with 12% lower NMI scores than the others with the AFOCS algorithm, with 23%, 8% and 6% lower NMI scores than highDeg, betweenness and nodeImp with the Blondel algorithm, and finally, with 7%, 10% and 8% lower NMI scores than the others with the Oslom algorithm (Figure 8-2). These results imply that the genEdge node selection strategy performs consistently well across different community detection algorithms in comparison with the other strategies.

Figure 8-3. Results obtained by AFOCS on networks with N = 2500 nodes and N = 5000 nodes. A) N = 2500. B) N = 2500. C) N = 5000. D) N = 5000.

The second observation we obtain from Figures 8-1 and 8-2 is that the top-of-the-list node seems to be essential to the network community structure. The removal of only


this node from the network brings the NMI scores down to 0.7-0.8 with AFOCS (Figures 8-1A and 8-2A), to 0.58-0.6 with the Blondel algorithm (Figures 8-1B and 8-2B), and to 0.7 with the Oslom algorithm. Furthermore, the top 15-20 nodes are also vital to the network community structure detected by Oslom and Blondel, since their removal brings the NMI scores down to 0.5, the threshold where the community structure becomes stochastic and fuzzy to recognize. The NMI values with the AFOCS algorithm, on the other hand, do not suffer as much from this removal, as they only come close to 0.5 when almost k = 50 nodes are removed from the networks with N = 2500 nodes (Figure 8-1A).

Finally, the last observation from Figures 8-1 and 8-2 is that, among the three community detection algorithms, AFOCS retains the highest NMI values when the same number of nodes is removed from the networks. In other words, AFOCS was able to detect the community structure most similar to the ground-truth communities. As discussed above, this observation implies that AFOCS seems to be more resistant to node vulnerability than the other algorithms. Therefore, we employ AFOCS as the main community detection algorithm to further analyze network communities of real-world traces.

8.5.1.2 The Number of Communities and Their Sizes

We next examine the number of communities and their sizes when k important nodes are removed from the network. As discussed in Section 8.4, our selection strategy gives priority to breaking the current community structure into more communities while keeping their sizes relatively equal in order to minimize the NMI measure. The results are presented in Figure 8-3. As reported in these figures, the numbers of new communities generated by genEdge tend to increase as more nodes are excluded; they differ insignificantly from those of the other methods on small networks of 2500 nodes (Figure 8-3A), but the differences become more visible on larger networks of 5000 nodes (Figure 8-3C). In particular, the


number of communities generated by genEdge is the second highest when N = 5000 (only below the betweenness method), while the average sizes of communities are roughly equal to those of the other methods (Figures 8-3B and 8-3D). One might question why the NMI scores returned by genEdge are still better (lower), given that its number of communities and average community size are roughly the same as those of the other methods. One possible reason is that new communities formed by the other strategies might be subcommunities or parts of the original structure, which in turn results in high similarity to the ground truth. Our strategy, on the other hand, makes sure that once a node incident to the most generating edges is excluded, the subcommunity structure is broken and the new community structure has little similarity to the original one; hence the lower NMI measures.

8.5.2 Results on Real-World Traces

We further present the empirical results of CSV on real-world networks, including the Reality Mining mobile phone data [29] and the Facebook [100] and Foursquare [21] datasets. An overview of these datasets is given in Table 8-1.

Table 8-1. Statistics of social traces
Data         N       M      Avg. Deg   Max. Com. Size
Reality      100     3100   62         35
Facebook     63731   1.5M   23.50      33425
Foursquare   47260   1.1M   49.13      30381

The Reality Mining dataset was provided by the MIT Media Lab. This dataset contains communication, proximity, location, call, and activity information from 100 students at MIT over the course of the 2004-2005 academic year.

The Facebook dataset contains friendship information (i.e., who is friends with whom, and wall posts) among the New Orleans regional network on Facebook, spanning from Sep 2006 to Jan 2009. To collect the information, the authors created several Facebook accounts, joined each


Figure 8-4. NMI scores on Reality Mining data, Foursquare and Facebook networks obtained by AFOCS (k = 50...1000). A) Reality. B) Foursquare. C) Facebook.


to the regional network, started crawling from a single user and visited all friends in a breadth-first-search fashion.

The Foursquare dataset contains the locations and activities of 47260 users on the Foursquare social network from May 2011 to Jul 2011. To collect the data, we created several Foursquare accounts, joined the network, started crawling from a single user and visited all friends, also in a breadth-first-search fashion.

On the Reality Mining dataset, we set k = 1...20 and report the results in Figure 8-4A. This figure reveals that the community structure in this dataset is extremely vulnerable to node attacks, since the removal of only 2 nodes found by genEdge is enough to make the new community structure differ significantly from the original one, bringing the NMI value down to 0.4. In comparison with the other node selection methods, genEdge still performs well and is about 14%-17% better than the others. We note that the first node identified by genEdge is indeed crucial to the community structure of this network, since its removal immediately brings the NMI score down to 0.6, while the other methods do not seem to discover this important feature. Furthermore, when too many nodes are removed from the network, the network does not seem to contain communities anymore, or the community structure becomes extremely fuzzy, as the NMI values converge to around 0.2. This is understandable since this dataset is small and has a very high average node degree.

On the larger Facebook and Foursquare networks, we vary k from 50 to 1000 nodes (only 2.1% and 1.5% of the nodes of the Foursquare and Facebook networks, respectively) with a 50-node increment at a time. The numerical results are reported in Figure 8-4. In general, the NMI values of all methods degrade quickly on the Foursquare network and tend to decrease more slowly on the Facebook network. As more nodes are excluded from the network, genEdge still achieves the best performance on both networks, with significantly lower NMI values than the other methods. Specifically, on Foursquare, with its high average degree and internal community density, the removal of nodes incident to the most generating edges leads to a significant separation of the network community structure, as NMI scores drop to 0.2 under genEdge. On the Facebook network, the


similarity between the original and new community structures seems to remain fairly high even after all 1000 nodes are removed, whereas the new structure of the ArXiv network is at the edge of the stochastic threshold since the NMI measure is around 0.5. This implies that the community structure in the Foursquare network is also extremely vulnerable to node removal attacks, while the mature Facebook network does not seem to suffer from this threat. One possible reason is that Facebook contains a giant community with low average degree, and it therefore requires much more effort to break that giant community apart.

In summary, the experiments on both synthesized and real-world social networks confirm the effectiveness of our proposed method based on generating edges. The empirical results also confirm that genEdge outperforms the other heuristic methods across community detection methods such as the AFOCS, Blondel and Oslom algorithms.

8.6 An Application in DTNs

We present a practical application where the detection of overlapping network communities plays a vital role in forwarding strategies in communication networks. In order to evaluate the impact of community restructuring in complex networks, we compare the set of critical nodes identified by our community structure vulnerability algorithm to the sets of nodes selected by the aforementioned algorithms. Furthermore, in order to evaluate which of the critical node sets is the most critical, we study how the removal of each critical node set influences the performance of routing in Pocket Switched Networks (PSNs), in terms of average message delivery ratio and delivery time. PSNs are a particular case of DTNs, where the nodes of the network correspond to actual people who are equipped with portable devices (e.g., mobile phones) and who use these portable devices to communicate. Because of the high degree of mobility of this type of network, a path between a source and a destination seldom exists; therefore, most approaches to routing in this kind of environment adopt a store-carry-and-forward approach. In store-carry-and-forward approaches, messages


are stored locally and, depending on the approach, forwarded or replicated to the encountered nodes when an opportunity occurs. In this manner, a node is important if it serves as a hub to forward messages to other devices. As a result, the failure of these important nodes should degrade the message delivery ratio while incurring more duplicate messages and longer delivery time.

We use the HAGGLE dataset [94]. This trace was collected at the Infocom conference in 2006 in Barcelona. 70 students and researchers attending the workshop were equipped with iMote devices that registered their encounters for the duration of the conference (3 days). In addition to the 70 mobile participants, approximately 20 static, long-range iMotes were deployed throughout the conference area. A total of 1000 messages are created and uniformly distributed over the experiment duration, and each message cannot exist longer than a threshold time-to-live.

In our evaluation we focus on the PSN routing algorithm inspired by BubbleRap [47]. While we expect the performance of a routing protocol to deteriorate upon the removal of important nodes, we expect the performance of BubbleRap to deteriorate more quickly because of the reliance of the protocol on the community structure. Because BubbleRap relies on knowledge of the community structure to route messages, and because different algorithms that attempt to find the community structure use different objective functions that may be more susceptible to the removal of nodes, we evaluate the average delivery ratio, the average delivery time and the average number of copied messages.

Results: As the removal of 10 nodes in the Haggle dataset is enough to make its original community structure become stochastic (Figure 8-6), we fix k = 10 and report the results as a function of time-to-live (the amount of time a message can exist). The performance of all methods is presented in Figure 8-5. As reported in subfigures 8-5A, 8-5B and 8-5C, the removal of nodes selected by the genEdge approach significantly


Figure 8-5. Simulation results on HAGGLE dataset. A) Avg. delivered messages. B) Avg. delivery time. C) Avg. number of copied messages.


degrades the performance of the BubbleRap forwarding and routing system in terms of not only delivered messages and delivery time but also the number of copied messages.

Figure 8-6. NMI measure on Haggle dataset.

As depicted in subfigure 8-5A, the average number of messages delivered by BubbleRap under genEdge with time-to-live = 450s is only two, whereas those under highDeg, betweenness and nodeImp are four, three and three. Relative to genEdge, this is a gap of (4 - 2)/2 = 100% and (3 - 2)/2 = 50%, respectively, when only 10 nodes are excluded from the network. This also means that the nodes selected by genEdge play an important role in maintaining the normal operation of the whole network.

Furthermore, when nodes are removed from the network, one expects the delivery time to increase as a consequence, because participants now have fewer chances to communicate with each other, and thus it should take longer for participating devices to forward the carried messages. This intuition is reflected in Figure 8-5B. As reported in this subfigure, the average amount of time required to deliver carried messages increases significantly as time-to-live increases (note that from 0-100s no message was delivered, and thus the delivery time was 0). In terms of delivery time, the removal of nodes under genEdge forces the system to spend considerably more time delivering messages than under the other methods. In particular, the system delivery time under genEdge is


about 1.25x, 1.7x and 1.21x higher than that under betweenness, nodeImp and highDeg when time-to-live = 450. Moreover, the number of copied messages under the genEdge approach is also the highest among all methods. This means that the genEdge heuristic indeed selects appropriate nodes, whose removal significantly reduces the system performance as measured by the three evaluated factors.
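The evaluation protocol of Section 8.5 (select a node set S, remove it, re-detect communities on G[V \ S], and report the NMI against the ground truth) can be illustrated with a minimal sketch. This is not the dissertation's implementation. It assumes disjoint communities and uses the standard NMI for partitions rather than the overlapping variant, it uses a connected-components routine as a toy stand-in for AFOCS, Blondel or Oslom, and it shows only the highDeg baseline, not genEdge. All function names and the toy network are hypothetical.

```python
import math
from collections import Counter


def nmi(labels_x, labels_y):
    """Normalized mutual information of two disjoint labelings (same node order)."""
    n = len(labels_x)
    count_x = Counter(labels_x)
    count_y = Counter(labels_y)
    count_xy = Counter(zip(labels_x, labels_y))
    h_x = -sum(c / n * math.log(c / n) for c in count_x.values())
    h_y = -sum(c / n * math.log(c / n) for c in count_y.values())
    mi = sum(
        c / n * math.log(c * n / (count_x[a] * count_y[b]))
        for (a, b), c in count_xy.items()
    )
    if h_x + h_y == 0.0:
        return 1.0  # both labelings consist of a single community
    return 2.0 * mi / (h_x + h_y)


def top_k_by_degree(adj, k):
    """highDeg baseline: the k nodes with the highest degree (ties broken by id)."""
    return sorted(adj, key=lambda v: (-len(adj[v]), v))[:k]


def induced_subgraph(adj, removed):
    """Adjacency of G[V \\ S] for a removed node set S."""
    removed = set(removed)
    return {v: {u for u in nbrs if u not in removed}
            for v, nbrs in adj.items() if v not in removed}


def connected_components(adj):
    """Toy detector (assumption): every connected component is one community."""
    label = {}
    for start in adj:
        if start in label:
            continue
        label[start] = start
        stack = [start]
        while stack:
            v = stack.pop()
            for u in adj[v]:
                if u not in label:
                    label[u] = start
                    stack.append(u)
    return label


def nmi_after_removal(adj, truth, removed, detect):
    """NMI(X, Y) with X the ground truth and Y = detect(G[V \\ S])."""
    sub = induced_subgraph(adj, removed)
    found = detect(sub)
    nodes = sorted(sub)
    return nmi([truth[v] for v in nodes], [found[v] for v in nodes])


# Hypothetical toy network: community A is a star centered at node 0,
# community B is a triangle.
edges = [(0, 1), (0, 2), (0, 3), (4, 5), (5, 6), (4, 6)]
adj = {v: set() for v in range(7)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)
truth = {0: "A", 1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}

before = nmi_after_removal(adj, truth, [], connected_components)
selected = top_k_by_degree(adj, 1)
after = nmi_after_removal(adj, truth, selected, connected_components)
print(selected, round(before, 3), round(after, 3))
```

In this toy case, removing the single hub node breaks community A into three singletons, and the NMI against the ground truth falls from 1.0 to about 0.72. This mirrors, on a very small scale, the effect that the experiments measure when a removed node splits a community into several subcommunities.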


CHAPTER 9
CONCLUSIONS

In this dissertation, we establish fundamental knowledge on the following aspects of complex network science: (1) the network organizational principles via the discovery of its dynamic community structure, (2) the assessment of community structure vulnerability, and (3) social-based solutions for practical applications enabled by complex systems, such as in online social networks and mobile networks.

We suggested two adaptive frameworks for discovering the dynamic network community structure and analyze theoretical results that guarantee their performance. From an execution perspective, our methods are adaptive and thus scalable to very large networks, with very competitive experimental results.

To investigate the vulnerability of the network community structure, we introduce the new problem of identifying key nodes whose removal can maximally reform the current network communities. Those nodes are important in maintaining the normal functioning of the whole system, such as in the case of DTNs (in a mobile network) or lung cancer (in a biological network). Our work presents first and preliminary yet important insights, in terms of both theoretical results and heuristic algorithms, into the vulnerability assessment of the network community structure.

From an application perspective, our work in this dissertation focuses on proposing novel community-structure-based solutions for the following emerging problems: the forwarding and routing strategy in mobile networks, the worm containment problem in social networks, and limiting misinformation spread in online social networks. Our suggested strategies provide a significant improvement in solution quality for those problems, and promise a wider range of applications enabled by dynamic complex networks.


REFERENCES

[1] Ahn, Yong-Yeol, Bagrow, James P., and Lehmann, Sune. Link communities reveal multi-scale complexity in networks. Nature 466 (2010): 761+. URL http://arxiv.org/abs/0903.3178
[2] Andreev, K. and Racke, H. Balanced graph partitioning. Proceedings of the sixteenth annual ACM symposium on Parallelism in algorithms and architectures. SPAA '04. New York, NY, USA: ACM, 2004, 120.
[3] Backstrom, Lars and Leskovec, Jure. Supervised random walks: predicting and recommending links in social networks. Proceedings of the fourth ACM international conference on Web search and data mining. WSDM '11. New York, NY, USA: ACM, 2011, 635. URL http://doi.acm.org/10.1145/1935826.1935914
[4] Barnes, Earl R. An Algorithm for Partitioning the Nodes of a Graph. SIAM Journal on Algebraic and Discrete Methods 3 (1982).4: 541. URL http://link.aip.org/link/?SML/3/541/1
[5] Berry, Michael W., Browne, Murray, Langville, Amy N., Pauca, V. Paul, and Plemmons, Robert J. Algorithms and applications for approximate nonnegative matrix factorization. Computational Statistics and Data Analysis. 2006, 155.
[6] Blondel, Vincent D, Guillaume, Jean-Loup, Lambiotte, Renaud, and Lefebvre, Etienne. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment 2008 (2008).10: P10008. URL http://stacks.iop.org/1742-5468/2008/i=10/a=P10008
[7] Bose, Abhijit and Shin, Kang G. Proactive security for mobile messaging networks. Proceedings of the 5th ACM workshop on Wireless security. WiSe '06. New York, NY, USA: ACM, 2006, 95. URL http://doi.acm.org/10.1145/1161289.1161307
[8] Brandes, Ulrik, Delling, Daniel, Gaertler, Marco, Gorke, Robert, Hoefer, Martin, Nikoloski, Zoran, and Wagner, Dorothea. On Modularity Clustering. IEEE Trans. on Knowl. and Data Eng. 20 (2008).2: 172. URL http://dx.doi.org/10.1109/TKDE.2007.190689
[9] Cazabet, R., Amblard, F., and Hanachi, C. Detection of Overlapping Communities in Dynamical Social Networks. Social Computing (SocialCom), 2010 IEEE Second International Conference on. 2010, 309.


[10] Chaintreau, Augustin, Hui, Pan, Crowcroft, Jon, Diot, Christophe, Gass, Richard, and Scott, James. Impact of Human Mobility on Opportunistic Forwarding Algorithms. Mobile Computing, IEEE Transactions on 6 (2007).6: 606.
[11] Chen, Wei, Wang, Chi, and Wang, Yajun. Scalable influence maximization for prevalent viral marketing in large-scale social networks. Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining. KDD '10. New York, NY, USA: ACM, 2010, 1029. URL http://doi.acm.org/10.1145/1835804.1835934
[12] Chen, Wei, Yuan, Yifei, and Zhang, Li. Scalable Influence Maximization in Social Networks under the Linear Threshold Model. Proceedings of the 2010 IEEE International Conference on Data Mining. ICDM '10. Washington, DC, USA: IEEE Computer Society, 2010, 88. URL http://dx.doi.org/10.1109/ICDM.2010.118
[13] Cichocki, A., Lee, H., Kim, Y-D, and Choi, S. Non-negative matrix factorization with α-divergence. Pattern Recognition Letters ('08).
[14] Cichocki, A. and Zdunek, R. Multilayer nonnegative matrix factorization using projected gradient approaches. Proc. 13th International Conference on Neural Information Processing ('07).
[15] Cichocki, A., Zdunek, R., and Amari, S.-i. Nonnegative Matrix and Tensor Factorization [Lecture Notes]. Signal Processing Magazine, IEEE 25 (2008).1: 142.
[16] Cichocki, Andrzej, Zdunek, Rafal, Phan, Anh Huy, and Amari, Shun-ichi. Nonnegative Matrix and Tensor Factorizations: Applications to Exploratory Multi-way Data Analysis and Blind Source Separation. Wiley Publishing, 2009.
[17] Clauset, Aaron, Newman, M. E. J., and Moore, Cristopher. Finding community structure in very large networks. Physical Review E 70 (2004).6: 066111+. URL http://dx.doi.org/10.1103/PhysRevE.70.066111
[18] Cover, T. M. and Thomas, J. A. Elements of Information Theory. Wiley-Interscience, 1991.
[19] Daly, Elizabeth M. and Haahr, Mads. Social network analysis for routing in disconnected delay-tolerant MANETs. Proceedings of the 8th ACM international symposium on Mobile ad hoc networking and computing. MobiHoc '07. New York, NY, USA: ACM, 2007, 32. URL http://doi.acm.org/10.1145/1288107.1288113


[20] Danon, L., Diaz-Guilera, A., Duch, J., and Arenas, A. Comparing community structure identification. Journal of Statistical Mechanics: Theory and Experiment 2005 (2005).09: P09008.
[21] Data, Foursquare. sites.google.com/site/namnpuf/original foursquare.7z. Collected data. 2012, 0.
[22] dataset, ArXiv. http://www.cs.cornell.edu/projects/kddcup/datasets.html. KDD Cup 2003 (2003).
[23] Delvenne, J-CC., Yaliraki, S. N., and Barahona, M. Stability of graph communities across time scales. Proceedings of the National Academy of Sciences of the United States of America 107 (2010).29: 12755. URL http://dx.doi.org/10.1073/pnas.0903215107
[24] Ding, Chris, Li, Tao, and Peng, Wei. On the equivalence between Non-negative Matrix Factorization and Probabilistic Latent Semantic Indexing. Comput. Stat. Data Anal. 52 (2008).8: 3913. URL http://dx.doi.org/10.1016/j.csda.2008.01.011
[25] Dinh, Thang N., Xuan, Ying, Thai, My T., Pardalos, Panos M., and Znati, Taieb. On new approaches of assessing network vulnerability: hardness and approximation. IEEE/ACM Trans. Netw. 20 (2012).2: 609.
[26] Dinh, T. N., Xuan, Ying, and Thai, M. T. Towards social-aware routing in dynamic communication networks. Performance Computing and Communications Conference (IPCCC), 2009 IEEE 28th International. 2009, 161.
[27] Duan, Dongsheng, Li, Yuhua, Jin, Yanan, and Lu, Zhengding. Community mining on dynamic weighted directed graphs. Proceedings of the 1st ACM international workshop on Complex networks meet information & knowledge management. CNIKM '09. New York, NY, USA: ACM, 2009, 11. URL http://doi.acm.org/10.1145/1651274.1651278
[28] E, Weinan, Li, Tiejun, and Vanden-Eijnden, Eric. Optimal partition and effective dynamics of complex networks. Proceedings of the National Academy of Sciences of the United States of America 105 (2008).23: 7907. URL http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2786939&tool=pmcentrez&rendertype=abstract
[29] Eagle, Nathan and (Sandy) Pentland, Alex. Reality mining: sensing complex social systems. Personal Ubiquitous Comput. 10 (2006).4: 255. URL http://dx.doi.org/10.1007/s00779-005-0046-3


[30] Fire, Michael, Tenenboim, Lena, Lesser, Ofrit, Puzis, Rami, Rokach, Lior, and Elovici, Yuval. Link Prediction in Social Networks Using Computationally Efficient Topological Features. SocialCom/PASSAT. IEEE, 2011, 73. URL http://dblp.uni-trier.de/db/conf/socialcom/socialcom2011.html#FireTLPRE11
[31] Foobface. facebook virus turns your computer into a zombie.html, http://www.pcworld.com/article/155017/. PC World. 2008, 1.
[32] Fortunato, S. Community detection in graphs. Physics Reports 486 (2010).3-5: 75-174.
[33] Fortunato, S. and Castellano, C. Community Structure in Graphs. eprint arXiv:0712.2716 (2007).
[34] Fortunato, Santo. Community detection in graphs. Physics Reports 486 (2010): 75.
[35] Fortunato, Santo and Barthelemy, Marc. Resolution limit in community detection. Proceedings of the National Academy of Sciences 104 (2007).1: 36. URL http://www.pnas.org/content/104/1/36.abstract
[36] Garey, Michael R. and Johnson, David S. Computers and Intractability; A Guide to the Theory of NP-Completeness. New York, NY, USA: W. H. Freeman & Co., 1990.
[37] Girvan, M. and Newman, M. E. J. Community structure in social and biological networks. Proceedings of the National Academy of Sciences 99 (2002).12: 7821. URL http://dx.doi.org/10.1073/pnas.122653799
[38] Girvan, M. and Newman, M. E. J. Community structure in social and biological networks. PNAS 99 (2002).
[39] Goldberg, M., Kelley, S., Magdon-Ismail, M., Mertsalov, K., and Wallace, A. Finding Overlapping Communities in Social Networks. Social Computing (SocialCom), 2010 IEEE Second International Conference on. 2010, 104.
[40] Gregory, Steve. Finding overlapping communities in networks by label propagation. New Journal of Physics 12 (2010).10: 103018.
[41] Grubesic, T. H., Matisziw, T. C., Murray, A. T., and Snediker, D. Comparative approaches for assessing network vulnerability. Inter. Regional Sci. Review 31 (2008).
[42] Guimera, Roger and Amaral, Luis A. Nunes. Functional cartography of complex metabolic networks. Nature 433 (2005).7028: 895.


[43] Hoffmann, K H and Salamon, P. Bounding the lumping error in Markov chain dynamics. Applied Mathematics Letters 22 (2009).9: 1471. URL http://www.google.com/search?client=safari&rls=en-us&q=Bounding+the+lumping+error+in+Markov+chain+dynamics&ie=UTF-8&oe=UTF-8
[44] Hopcroft, John, Khan, Omar, Kulis, Brian, and Selman, Bart. Tracking evolving communities in large linked networks. Proceedings of the National Academy of Sciences 101 (2004): 5249. URL http://www.pnas.org/cgi/content/full/101/suppl_1/5249
[45] Hui, Pan, Crowcroft, J., and Yoneki, E. BUBBLE Rap: Social-Based Forwarding in Delay-Tolerant Networks. Mobile Computing, IEEE Transactions on 10 (2011).11: 1576.
[46] Hui, Pan and Crowcroft, Jon. How Small Labels Create Big Improvements. Pervasive Computing and Communications Workshops, 2007. PerCom Workshops '07. Fifth Annual IEEE International Conference on. 2007, 65.
[47] Hui, Pan, Crowcroft, Jon, and Yoneki, Eiko. Bubble rap: social-based forwarding in delay tolerant networks. Proceedings of the 9th ACM international symposium on Mobile ad hoc networking and computing. MobiHoc '08. New York, NY, USA: ACM, 2008, 241.
[48] Hui, Pan, Yoneki, Eiko, Chan, Shu Yan, and Crowcroft, Jon. Distributed community detection in delay tolerant networks. Proceedings of 2nd ACM/IEEE international workshop on Mobility in the evolving internet architecture. MobiArch '07. New York, NY, USA: ACM, 2007, 7:1:8. URL http://doi.acm.org/10.1145/1366919.1366929
[49] Jamali, Mohsen, Haffari, Gholamreza, and Ester, Martin. Modeling the Temporal Dynamics of Social Rating Networks Using Bidirectional Effects of Social Relations and Rating Patterns. Proceedings of the 2010 IEEE International Conference on Data Mining Workshops. ICDMW '10. Washington, DC, USA: IEEE Computer Society, 2010, 344. URL http://dx.doi.org/10.1109/ICDMW.2010.103
[50] Kemeny, John G., Hazleton, Mirkil, J. Laurie, Snell, and Gerald L., Thompson. Finite Mathematical Structures. 1st edition. Englewood Cliffs, N.J.: Prentice-Hall, Inc (1959).
[51] Kim, Hyang-Ah and Karp, Brad. Autograph: toward automated, distributed worm signature detection. Proceedings of the 13th conference on USENIX Security Symposium - Volume 13. SSYM '04. Berkeley, CA, USA: USENIX Association, 2004, 19.


URL http://dl.acm.org/citation.cfm?id=1251375.1251394
[52] Kim, Min-Soo and Han, Jiawei. A particle-and-density based evolutionary clustering method for dynamic networks. Proc. VLDB Endow. 2 (2009).1: 622. URL http://dl.acm.org/citation.cfm?id=1687627.1687698
[53] Koobface. http://news.cnet.com/koobface-virus-hits-facebook/. CNET. 2008, 1.
[54] Kovacs, Istvan A., Palotai, Robin, Szalay, Mate S., and Csermely, Peter. Community Landscapes: An Integrative Approach to Determine Overlapping Network Module Hierarchy, Identify Key Nodes and Predict Network Dynamics. PLoS ONE 5 (2010).9: e12528.
[55] Lancichinetti, A. and Fortunato, S. Community detection algorithms: A comparative analysis. Phys. Rev. E 80 (2009).5: 056117.
[56] Lancichinetti, A., Fortunato, S., and János, K. Detecting the overlapping and hierarchical community structure in complex networks. New Journal of Physics 11 (2009).3: 033015.
[57] Lancichinetti, Andrea and Fortunato, Santo. Benchmarks for testing community detection algorithms on directed and weighted graphs with overlapping communities. Phys. Rev. E 80 (2009).1: 016118. URL http://pre.aps.org/abstract/PRE/v80/i1/e016118
[58] Lancichinetti, Andrea and Fortunato, Santo. Community detection algorithms: A comparative analysis. Phys. Rev. E 80 (2009): 056117. URL http://link.aps.org/doi/10.1103/PhysRevE.80.056117
[59] Lancichinetti, Andrea and Fortunato, Santo. Consensus clustering in complex networks. Scientific Reports 2 (2012). URL http://dx.doi.org/10.1038/srep00336
[60] Lancichinetti, Andrea, Radicchi, Filippo, Ramasco, José J., and Fortunato, Santo. Finding Statistically Significant Communities in Networks. PLoS ONE 6 (2011).4: e18961. URL http://dx.doi.org/10.1371%2Fjournal.pone.0018961
[61] Lancichinetti, Andrea, Radicchi, Filippo, Ramasco, José J., and Fortunato, Santo. Finding Statistically Significant Communities in Networks. PLoS ONE 6 (2011).4: e18961. URL http://dx.doi.org/10.1371%2Fjournal.pone.0018961
[62] Lazar, A., Abel, D., and Vicsek, T. Modularity measure of networks with overlapping communities. EPL (Europhysics Letters) 90 (2010).1: 18001.


URL http://stacks.iop.org/0295-5075/90/i=1/a=18001
[63] Lee, C., Reid, F., McDaid, A., and Hurley, N. Detecting highly overlapping community structure by greedy clique expansion. Proceedings of the 4th Workshop on Social Network Mining and Analysis (2010).
[64] Lee, Daniel D. and Seung, H. Sebastian. Algorithms for Non-negative Matrix Factorization. In NIPS. MIT Press, 2000, 556.
[65] Leskovec, Jure, Huttenlocher, Daniel, and Kleinberg, Jon. Predicting positive and negative links in online social networks. Proceedings of the 19th international conference on World wide web. WWW '10. New York, NY, USA: ACM, 2010, 641. URL http://doi.acm.org/10.1145/1772690.1772756
[66] Leskovec, Jure, Lang, Kevin J., Dasgupta, Anirban, and Mahoney, Michael W. Statistical properties of community structure in large social and information networks. Proceedings of the 17th international conference on World Wide Web. WWW '08. New York, NY, USA: ACM, 2008, 695. URL http://doi.acm.org/10.1145/1367497.1367591
[67] Li, Tao and Ding, Chris. The Relationships Among Various Nonnegative Matrix Factorization Methods for Clustering. Proceedings of the Sixth International Conference on Data Mining. ICDM '06. Washington, DC, USA: IEEE Computer Society, 2006, 362. URL http://dx.doi.org/10.1109/ICDM.2006.160
[68] Li, Yanhua, Zhang, Zhi-Li, and Bao, Jie. Mutual or Unrequited Love: Identifying Stable Clusters in Social Networks with Uni- and Bi-directional Links. Algorithms and Models for the Web Graph. eds. Anthony Bonato and Jeannette Janssen, vol. 7323 of Lecture Notes in Computer Science. Springer Berlin Heidelberg, 2012. 113. URL http://dx.doi.org/10.1007/978-3-642-30541-2_9
[69] Liben-Nowell, David and Kleinberg, Jon. The link prediction problem for social networks. Proceedings of the twelfth international conference on Information and knowledge management. CIKM '03. New York, NY, USA: ACM, 2003, 556. URL http://doi.acm.org/10.1145/956863.956972
[70] Lin, Yu-Ru, Chi, Yun, Zhu, Shenghuo, Sundaram, Hari, and Tseng, Belle L. Facetnet: a framework for analyzing communities and their evolutions in dynamic networks. Proceedings of the 17th international conference on World Wide Web. WWW '08. New York, NY, USA: ACM, 2008, 685.


URL http://doi.acm.org/10.1145/1367497.1367590
[71] Lin, Yu-Ru, Chi, Yun, Zhu, Shenghuo, Sundaram, Hari, and Tseng, Belle L. Analyzing communities and their evolutions in dynamic social networks. ACM Trans. Knowl. Discov. Data 3 (2009).2: 8:1:31. URL http://doi.acm.org/10.1145/1514888.1514891
[72] Lin, Yu-Ru, Sun, Jimeng, Castro, Paul, Konuru, Ravi, Sundaram, Hari, and Kelliher, Aisling. MetaFac: community discovery via relational hypergraph factorization. Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining. KDD '09. New York, NY, USA: ACM, 2009, 527. URL http://doi.acm.org/10.1145/1557019.1557080
[73] Luxburg, Ulrike. A tutorial on spectral clustering. Statistics and Computing 17 (2007).4: 395. URL http://dx.doi.org/10.1007/s11222-007-9033-z
[74] Matisziw, Timothy C. and Murray, Alan T. Modeling s-t path availability to support disaster vulnerability assessment of network infrastructure. Comput. Oper. Res. 36 (2009).1: 16.
[75] Newman, M E J. The Structure and Function of Complex Networks. SIAM Review 45 (2003).2: 167. URL http://link.aip.org/link/SIREAD/v45/i2/p167/s1&Agg=doi
[76] Newman, M. E. J. Fast algorithm for detecting community structure in networks. Phys. Rev. E 69 (2004): 066133. URL http://link.aps.org/doi/10.1103/PhysRevE.69.066133
[77] Newman, M. E. J. Finding community structure in networks using the eigenvectors of matrices. Physical Review E 74 (2006).3: 036104.
[78] Newman, M. E. J. and Girvan, M. Finding and evaluating community structure in networks. Physical Review E 69 (2004). 026113.
[79] Newman, M E J and Leicht, E A. Mixture models and exploratory analysis in networks. Proceedings of the National Academy of Sciences of the United States of America 104 (2007).23: 9564. URL http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1887592&tool=pmcentrez&rendertype=abstract
[80] Nguyen, Nam P., Dinh, Thang N., Tokala, Sindhura, and Thai, My T. Overlapping communities in dynamic networks: their detection and mobile applications.


Proceedings of the 17th annual international conference on Mobile computing and networking. MobiCom '11. New York, NY, USA: ACM, 2011, 85. URL http://doi.acm.org/10.1145/2030613.2030624
[81] Nguyen, N. P., Dinh, T. N., Xuan, Ying, and Thai, M. T. Adaptive algorithms for detecting community structure in dynamic social networks. INFOCOM, 2011 Proceedings IEEE. 2011, 2282.
[82] Nguyen, N. P., Xuan, Ying, and Thai, M. T. A novel method for worm containment on dynamic social networks. MILITARY COMMUNICATIONS CONFERENCE, 2010 - MILCOM 2010. 2010, 2180.
[83] Nicosia, V., Mangioni, G., Carchiolo, V., and Malgeri, M. Extending the definition of modularity to directed graphs with overlapping communities. J. Stat. Mech.: Theory and Experiment 2009 (2009).03: P03024.
[84] Palla, G., Derenyi, I., Farkas, I., and Vicsek, T. Uncovering the overlapping community structure of complex networks in nature and society. Nature 435 (2005).10.
[85] Palla, G., Pollner, P., Barabasi, A., and Vicsek, T. Social Group Dynamics in Networks. Adaptive Networks (2009).
[86] Palla, Gergely, Barabasi, Albert-Laszlo, and Vicsek, Tamas. Quantifying social group evolution. Nature 446 (2007).7136: 664. URL http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=17410175
[87] Panzarasa, P., Opsahl, T., and Carley, K. M. Patterns and dynamics of users' behavior and interaction: Network analysis of an online community. J. American Soc. of Info. Sci. Tech. 60(5) ('09).
[88] Peters, Karsten, Buzna, Lubos, and Helbing, Dirk. Modelling of cascading effects and efficient response to disaster spreading in complex networks. IJCIS 4 (2008).1/2: 46. URL http://dblp.uni-trier.de/db/journals/ijcritis/ijcritis4.html#PetersBH08
[89] Piccardi, Carlo. Finding and Testing Network Communities by Lumped Markov Chains. PLoS ONE 6 (2011).11: e27028. URL http://dx.doi.org/10.1371%2Fjournal.pone.0027028
[90] Psorakis, I., Roberts, S., and Ebden, M. Overlapping community detection using Bayesian non-negative matrix factorization. Phys. Rev. E. 83 (11).


[91] Radicchi, Filippo, Castellano, Claudio, Cecconi, Federico, Loreto, Vittorio, and Parisi, Domenico. Defining and identifying communities in networks. Proceedings of the National Academy of Sciences of the United States of America 101 (2004).9: 2658. URL http://www.pnas.org/content/101/9/2658.abstract
[92] Reichardt, Joerg and Bornholdt, Stefan. Detecting fuzzy community structures in complex networks with a Potts model. Physical Review Letters 93 (2004).21: 218701. URL http://arxiv.org/abs/cond-mat/0402349
[93] Rosvall, Martin and Bergstrom, Carl T. Mapping Change in Large Networks. PLoS ONE 5 (2010).1: e8694. URL http://dx.doi.org/10.1371%2Fjournal.pone.0008694
[94] Scott, James, Gass, Richard, Crowcroft, Jon, Hui, Pan, Diot, Christophe, and Chaintreau, Augustin. CRAWDAD trace cambridge/haggle/imote/infocom2006 (v. 2009-05-29). Downloaded from http://crawdad.cs.dartmouth.edu/cambridge/haggle/imote/infocom2006, 2009.
[95] Scripps, Jerry, Tan, Pang-Ning, and Esfahanian, Abdol-Hossein. Node roles and community structure in networks. Proceedings of the 9th WebKDD and 1st SNA-KDD 2007 workshop on Web mining and social network analysis. WebKDD/SNA-KDD '07. New York, NY, USA: ACM, 2007, 26.
[96] Sean P., Meyn and Richard L., Tweedie. Markov Chains and Stochastic Stability. 2nd edition. Cambridge University Press (2009).
[97] Sekar, Vyas, Xie, Yinglian, Reiter, Michael K., and Zhang, Hui. A Multi-Resolution Approach for Worm Detection and Containment. Proceedings of the International Conference on Dependable Systems and Networks. DSN '06. Washington, DC, USA: IEEE Computer Society, 2006, 189. URL http://dx.doi.org/10.1109/DSN.2006.6
[98] Sun, Jimeng, Faloutsos, Christos, Papadimitriou, Spiros, and Yu, Philip S. GraphScope: parameter-free mining of large time-evolving graphs. Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining. KDD '07. New York, NY, USA: ACM, 2007, 687. URL http://doi.acm.org/10.1145/1281192.1281266
[99] Tantipathananandh, Chayant and Berger-Wolf, Tanya. Constant-factor approximation algorithms for identifying dynamic communities. Proceedings


of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining. KDD '09. New York, NY, USA: ACM, 2009, 827. URL http://doi.acm.org/10.1145/1557019.1557110
[100] Viswanath, Bimal, Mislove, Alan, Cha, Meeyoung, and Gummadi, Krishna P. On the evolution of user interaction in Facebook. Proceedings of the 2nd ACM workshop on Online social networks. WOSN '09. New York, NY, USA: ACM, 2009, 37. URL http://doi.acm.org/10.1145/1592665.1592675
[101] Šíma, Jiří and Schaeffer, Satu Elisa. On the NP-Completeness of some graph cluster measures. Proceedings of the 32nd conference on Current Trends in Theory and Practice of Computer Science. SOFSEM '06. Berlin, Heidelberg: Springer-Verlag, 2006, 530. URL http://dx.doi.org/10.1007/11611257_51
[102] Wang, Fei, Li, Tao, Wang, Xin, Zhu, Shenghuo, and Ding, Chris. Community discovery using nonnegative matrix factorization. Data Min. Knowl. Discov. 22 (2011).3: 493. URL http://dx.doi.org/10.1007/s10618-010-0181-y
[103] Wang, Fei, Li, Tao, Wang, Xin, Zhu, Shenghuo, and Ding, Chris. Community discovery using nonnegative matrix factorization. Data Min. Knowl. Discov. 22 (2011).3: 493. URL http://dx.doi.org/10.1007/s10618-010-0181-y
[104] Wang, Pu, Gonzalez, Marta C., Hidalgo, Cesar A., and Barabasi, Albert-Laszlo. Understanding the Spreading Patterns of Mobile Phone Viruses. Science 324 (2009).5930: 1071. URL http://www.barabasilab.com/pubs/CCNR-ALB_Publications/200904-02_ScienceExpr-PhoneViruses/200904-02_ScienceExpr-PhoneViruses
[105] Wang, Yang, Di, Zengru, and Fan, Ying. Identifying and Characterizing Nodes Important to Community Structure Using the Spectrum of the Graph. PLoS ONE 6 (2011).11: e27418.
[106] Weaver, Nicholas, Staniford, Stuart, and Paxson, Vern. Very fast containment of scanning worms. Proceedings of the 13th conference on USENIX Security Symposium - Volume 13. SSYM '04. Berkeley, CA, USA: USENIX Association, 2004, 3. URL http://dl.acm.org/citation.cfm?id=1251375.1251378
[107] Ye, Zhenqing, Hu, Songnian, and Yu, Jun. Adaptive clustering algorithm for community detection in complex networks. Phys. Rev. E 78 (2008): 046115.

PAGE 184

URL http://link.aps.org/doi/10.1103/PhysRevE.78.046115 [108] Zdunek,RafalandCichocki,Andrzej.Non-negativematrixfactorizationwithquasi-newtonoptimization.Proceedingsofthe8thinternationalconferenceonArticialIntelligenceandSoftComputing.ICAISC'06.Berlin,Heidelberg:Springer-Verlag,2006,870.URL http://dx.doi.org/10.1007/11785231_91 [109] Zhang,Yuzhou,Wang,Jianyong,Wang,Yi,andZhou,Lizhu.Parallelcommunitydetectiononlargenetworkswithpropinquitydynamics.Proceedingsofthe15thACMSIGKDDinternationalconferenceonKnowledgediscoveryanddatamining.KDD'09.NewYork,NY,USA:ACM,2009,997.URL http://doi.acm.org/10.1145/1557019.1557127 [110] Zhu,Zhichao,Cao,Guohong,Zhu,Sencun,Ranjan,S.,andNucci,A.ASocialNetworkBasedPatchingSchemeforWormContainmentinCellularNetworks.INFOCOM2009,IEEE.2009,1476. 184

BIOGRAPHICAL SKETCH

Nam P. Nguyen is currently a fourth-year Ph.D. student in the Department of Computer and Information Science and Engineering (CISE), University of Florida, and a member of the Optima Network Science Lab under the guidance of Professor My T. Thai. Prior to his Ph.D. study, Nam received his B.S. and M.S. degrees, both in applied mathematics, from Vietnam National University (2007) and Ohio University (2009). His research interests include dynamic complex network problems, such as non-overlapping and overlapping network community structure, worm and virus containment, social networks, cascading failures, combinatorial optimization and approximation algorithms. In particular, his current research focuses on designing adaptive algorithms for effectively identifying communities in dynamic networks such as mobile or online social networks, as well as their applications in different aspects of networking problems. In addition, he is also interested in effective methods to stop the propagation of viruses, worms and misinformation on large-scale dynamic networks, in terms of both approximation and social-based algorithms.

During his Ph.D. study, Nam has published many papers in top-tier conferences and journals including INFOCOM, MOBICOM, WEBSCI, IEEE Transactions on Mobile Computing and IEEE Transactions on Networking. In 2011, Nam spent his summer as an intern in the CCS-3 division, Los Alamos National Laboratory, where he conducted research and published a paper on containing the spread of misinformation in large-scale online social networks. Nam is also the recipient of many awards, such as the Student Travel Grants of the MILCOM '10, MOBICOM '11 and SIGWEB '12 conferences, and the Travel Grant of the College of Engineering, University of Florida, in 2011.