How Do User Behaviors Affect Information Propagation in Twitter?


Material Information

Title:
How Do User Behaviors Affect Information Propagation in Twitter?
Physical Description:
1 online resource (43 p.)
Language:
English
Creator:
Syu, Yu-Song
Publisher:
University of Florida
Place of Publication:
Gainesville, Fla.
Publication Date:

Thesis/Dissertation Information

Degree:
Master's ( M.S.)
Degree Grantor:
University of Florida
Degree Disciplines:
Computer Engineering, Computer and Information Science and Engineering
Committee Chair:
Thai, My Tra
Committee Members:
Mishra, Prabhat
Kahveci, Tamer

Subjects

Subjects / Keywords:
socialnetwork
Computer and Information Science and Engineering -- Dissertations, Academic -- UF
Genre:
Computer Engineering thesis, M.S.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, territorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation

Notes

Abstract:
Online social networks (OSNs) have become an important channel for information propagation and influence, such as advertising products in viral marketing or convincing voters in a political campaign. An intrinsic understanding of this phenomenon in OSNs reveals not only user behaviors and their mutual impacts, but also key insights for designing better advertisement strategies. However, most related work has focused only on the analysis of specific propagation mechanisms in certain networks and has largely ignored two important factors: the propagation behaviors between users in general social networks, and the crucial factors affecting information propagation. In this paper, we demonstrate that: 1) multiple retweets have a non-negligible effect on enlarging the size of diffusion cascades and extending propagation chains. If we extract cascades with respect to topics rather than messages, based on the operating mechanism of Twitter, we find a considerable number of cascades with ranges larger than 10 hops. Multiple retweets also keep the news fresh by allowing people to share new and different information about one topic, so that news propagation is a dynamic process rather than a static story; 2) the time interval between two consecutive retweets of one user clearly indicates whether this user will tend to retweet about the same topic many times; and 3) the time interval between the first two levels in a cascade helps us forecast whether this cascade will have a long lifetime.
General Note:
In the series University of Florida Digital Collections.
General Note:
Includes vita.
Bibliography:
Includes bibliographical references.
Source of Description:
Description based on online resource; title from PDF title page.
Source of Description:
This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Statement of Responsibility:
by Yu-Song Syu.
Thesis:
Thesis (M.S.)--University of Florida, 2012.
Local:
Adviser: Thai, My Tra.
Electronic Access:
RESTRICTED TO UF STUDENTS, STAFF, FACULTY, AND ON-CAMPUS USE UNTIL 2013-12-31

Record Information

Source Institution:
UFRGP
Rights Management:
Applicable rights reserved.
Classification:
lcc - LD1780 2012
System ID:
UFE0044690:00001




Full Text

PAGE 1

HOW DO USER BEHAVIORS AFFECT INFORMATION PROPAGATION IN TWITTER

By
YU-SONG SYU

A THESIS PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE

UNIVERSITY OF FLORIDA
2012

PAGE 2

© 2012 Yu-Song Syu

PAGE 3

I dedicate this thesis to my dearest mother. Wish her the best in heaven.

PAGE 4

ACKNOWLEDGMENTS

Thanks for all the help I have received in writing and learning about this thesis. Thanks to Dr. Thai for the advice during the two years. Thanks to Yilin, who always helps me when discussing, doing simulation, and writing articles. Also thanks to all the lab members. Finally, greatest thanks to my family. Their support is my best power to keep going on the way of pursuing the degree in the States.

PAGE 5

TABLE OF CONTENTS

ACKNOWLEDGMENTS .... 4
LIST OF TABLES .... 6
LIST OF FIGURES .... 7
ABSTRACT .... 8

CHAPTER
1 INTRODUCTION .... 9
2 INFORMATION DIFFUSION IN TWITTER .... 12
  2.1 Datasets .... 12
  2.2 The Operating Mechanism and Information Diffusion Model .... 12
  2.3 Building Cascades of Information .... 14
  2.4 The Selection of Topics .... 16
3 THE EFFECT OF MULTIPLE RETWEETS IN TWITTER .... 20
  3.1 Longer Range Than Expected .... 20
  3.2 Every Retweet Matters in Twitter .... 22
  3.3 Timeliness of News in Twitter .... 22
4 TIME INTERVAL BETWEEN MULTIPLE RETWEETS .... 25
5 TIME INTERVAL BETWEEN CASCADE LEVELS .... 28
  5.1 Distribution of Time Interval between Cascade Levels .... 28
  5.2 Informed Users at Each Level .... 30
  5.3 Time Interval between Cascade Levels .... 30
6 USER ACTIVENESS .... 32
7 DISCUSSION AND FUTURE WORK .... 34
  7.1 More on Time Needed to be Retweeted .... 34
  7.2 Probability of Being Retweeted .... 34
  7.3 Future Work .... 35
8 RELATED WORK .... 37
9 CONCLUSION .... 38

REFERENCES .... 39
BIOGRAPHICAL SKETCH .... 43

PAGE 6

LIST OF TABLES

Table
2-1 Dataset Details .... 12
2-2 Details of Topics .... 17
5-1 Time Interval between the 1st and 2nd Level (hr) .... 31

PAGE 7

LIST OF FIGURES

Figure
2-1 The UIIS Model .... 14
2-2 An Example of a Cascade .... 15
2-3 The Direction of Information Diffusion .... 16
2-4 Number of Times to Retweet and to be Retweeted of a User .... 19
3-1 Distribution of Range .... 21
3-2 The Average Influence of Each Retweet Per User .... 23
3-3 The Newly Influenced Followers by Each Retweet .... 24
3-4 Time Needed to be Retweeted .... 24
4-1 Time Interval between Retweets .... 27
5-1 Distribution of Time Interval Between Levels .... 29
5-2 Accumulative Size of Audience by Each Hop .... 30
6-1 Activeness of Users .... 33
7-1 The Distribution of Time Needed to be Retweeted .... 35
7-2 The Probability of Being Retweeted .... 36

PAGE 8

Abstract of Thesis Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Master of Science

HOW DO USER BEHAVIORS AFFECT INFORMATION PROPAGATION IN TWITTER

By
Yu-Song Syu
December 2012

Chair: My T. Thai
Major: Computer Engineering

Online social networks (OSNs) have become an important channel for information propagation and influence, such as advertising products in viral marketing or convincing voters in a political campaign. An intrinsic understanding of this phenomenon in OSNs reveals not only user behaviors and their mutual impacts, but also key insights for designing better advertisement strategies. However, most related work has focused only on the analysis of specific propagation mechanisms in certain networks and has largely ignored two important factors: the propagation behaviors between users in general social networks, and the crucial factors affecting information propagation. In this paper, we demonstrate that: 1) multiple retweets have a non-negligible effect on enlarging the size of diffusion cascades and extending propagation chains. If we extract cascades with respect to topics rather than messages, based on the operating mechanism of Twitter, we find a considerable number of cascades with ranges larger than 10 hops. Multiple retweets also keep the news fresh by allowing people to share new and different information about one topic, so that news propagation is a dynamic process rather than a static story; 2) the time interval between two consecutive retweets of one user clearly indicates whether this user will tend to retweet about the same topic many times; and 3) the time interval between the first two levels in a cascade helps us forecast whether this cascade will have a long lifetime.

PAGE 9

CHAPTER 1
INTRODUCTION

The rapid growth of Online Social Networks (OSNs), such as Facebook, Twitter, and Google+ [1, 2, 6], has made them one of the most important media for fast information propagation [15, 20]. According to the statistics, there are 2.9 million followers following Britney Spears in Google+, and more than 526 million users log in to Facebook every day [7, 8]. Using this popular channel, many individuals and companies can spread their own messages or advertise their products by leveraging the power of others' influence. For instance, Twitter drew 456 tweets per second about the death of Michael Jackson [5]; the 2009 presidential election in Iran reached 221,744 tweets per hour on June 16th [4]. Therefore, there is an urgent need for an intrinsic understanding of information propagation in OSNs and of how it is affected by user behaviors.

Recently, many works have investigated the phenomenon of message spreading in OSNs, focusing on the characteristics (e.g., range, scale, and speed) of information diffusion and its relations to network structure and traffic characteristics. Among them, Gomez et al. [16] studied the reactions in the Slashdot social network. Based on the radial tree generated by the nesting of comments, some interesting properties were found, such as self-similarity within the different nesting levels of a discussion. However, a user can rate a post only once in Slashdot, which does not necessarily hold in other social networks. Later on, Cha et al. [13] explored the properties of information diffusion in Flickr and concluded that the propagation cascade is neither wide nor tall; that is, both the size of the audience and the number of propagation hops are small. Yet, their construction of cascades does not guarantee the information flow. In the Digg network, Steeg et al. [29] further analyzed why propagations stop, and first introduced the concept of saturation. Particularly, when a user knows the information through a lot of friends, the network is

PAGE 10

called saturated and, therefore, no further information spreading is needed. Again, the assumption that a user can forward a topic only once makes the results unrealistic and less convincing. Therefore, the above works focused only on some mechanisms of message dissemination in specific networks, yet neglected the propagation behaviors between users in other, more general social networks (i.e., Twitter and Facebook) and the crucial factors in user behaviors that heavily affect information propagation.

In this paper, we focus on the analysis of message traces in Twitter, in which user behaviors are more general and practical. In particular, after reading a post, a user can use either the retweet or the mention mechanism to spread it further to other users. Also, a topic can be posted multiple times, and a post can be retweeted or mentioned more than once. In order to propose a successful campaign strategy, our analysis focuses on the impact of user behaviors on information propagation, mainly addressing the following three questions: (1) how can we predict the propagation tendency of a tweet in social networks at its early stage? (2) how likely is a post to be retweeted, and if it is, how long will it take? (3) will a user retweet a specific topic frequently if he is usually not active in social networks?

First, we compile the traces in Twitter by using two datasets: one collected by Yang et al. [32], consisting of 476 million tweets (including contents) by 17 million users in 2009; and the other from Kwak et al. [20], which describes the network topology with 41.7 million users and 1.47 billion social relations (the following relationships between users). To understand the information diffusion in Twitter, we select two hot topics from the Top Twitter Trends in 2009 [5] to demonstrate our observations. Our findings can be summarized as follows:

• The influence of multiple retweets: The distribution of cascade ranges does not follow the 2 to 6 hops assumption, which has been widely verified and accepted, when we extract cascades with respect to topics rather than messages. Multiple retweets about one particular topic do not cause information saturation in Twitter

PAGE 11

but enlarge the influence and extend the cascade size. Besides, multiple retweets can shorten the time needed to be retweeted by other users.

• The interval between the multiple retweets that one user makes and its relation to information propagation: Contrary to our intuition, if the interval between the first two retweets is small, we find that this user is very likely to make more retweets regarding the same topic. That is to say, the time interval between the first two retweets indicates whether the user tends to make more retweets or not. In addition, after the third retweet, the time intervals among the rest of the retweets are almost equal.

• The interval between levels of a cascade and its relation to information propagation: The quicker the first-level audience picks up the news from the seeds and forwards it, the larger the cascade range will be. This finding tells us that the time interval between the first two levels can serve as an indicator to predict whether this seed can generate a large cascade or not.

• The influence of the interests of users: Considering a post on a specific topic, a user will retweet it only if he is active and keeps retweeting on all kinds of topics in OSNs. That is, there are very few users who are interested in only one topic and retweet only the posts containing this topic.

The rest of the paper is organized as follows. We describe our measurement methodology, introduce the datasets we use, and explain the selection of the two popular news topics for analysis in Chapter 2. We describe our analysis and findings in Chapters 3, 4, 5, and 6. Chapter 7 discusses the insights from our findings and possible future work. We summarize related work in Chapter 8 and conclude in Chapter 9.

PAGE 12

CHAPTER 2
INFORMATION DIFFUSION IN TWITTER

In this chapter, we describe the datasets we use, delineate the operating mechanism and diffusion model in Twitter, and then introduce our methodology to construct the information cascades during the diffusion process.

2.1 Datasets

In this paper, we use two datasets [20, 32] for the analyses of user behavior and information diffusion in Twitter. The first dataset [32], collected by Yang et al., consists of 467 million tweets posted by 17 million users from June to December 2009. This dataset provides the time, author, and content of each tweet, but it does not provide any information about the relationships between users, i.e., who follows whom on Twitter. Thus, we also use another dataset [20] in order to construct the network topology. In [20], Kwak et al. collected the 1.47 billion following relationships between 41.7 million users on Twitter.

2.2 The Operating Mechanism and Information Diffusion Model

Here we first delineate the operating mechanism in Twitter, and then introduce the information diffusion model that we use in this paper. On Twitter, people can post a short message (within 140 characters) on their own page or reply to other users' tweets. Besides tweeting and replying, there are three important user-user interactions that one user may take on Twitter, namely follow, mention, and retweet. These are the three important actions that make information propagate in Twitter. Users interact in Twitter by first becoming followers of one another, after which they can see all the posts made by those they follow on their own Twitter page. When users read a piece of news, they may want to forward it to their followers.

Table 2-1. Dataset Details

Dataset   #Tweets       #Users         #Edges (Followings)   Directed/Undirected
Yang      467 million   17 million     -                     -
Kwak      106 million   41.7 million   1.47 billion          Directed

PAGE 13

By typing "RT @username" or "via @username" followed by the content, they can indicate the source of this post; similarly, the post will show on all their followers' pages. A user can specify who should see the post by using a mention. Note that a tweet can be both a retweet and a mention. See the second tweet in Figure 2-3 for an example.

Twitter's operating mechanism is different from that of some other OSNs (e.g., Slashdot and Digg). After reading one tweet, a user can retweet it several times. Such a mechanism is also used in other OSNs, such as Facebook and Google+. In fact, it is more general, not only because it is more widely used in trending OSNs, but also because we can consider the mechanism that Slashdot and Digg use as a special case of Twitter's. Such generality is important because it largely affects the propagation model, and the performance would be different.

Based on the operating mechanism, we would like to analyze users' behaviors in Twitter. Because a user can forward (retweet) on a topic more than once, it is not applicable to use the widely used inactive-active and Susceptible-Infected-Recovered models, in which a user has only one chance to influence his followers to prolong the diffusion process by one more hop. Thus, we propose the Unknown-Informed-Influential-Stopped (UIIS) model to represent user behaviors in Twitter. During the diffusion process on one topic in Twitter, we find that a user has four states:

1. Unknown: the initial state, at which the user does not know the news yet.
2. Informed: the user knows the news by reading a tweet or being mentioned by other users, but has not retweeted the news yet.
3. Influential: the state in which the user has forwarded this news.
4. Stopped: the final state, in which the user has stopped retweeting and will not retweet anymore.

Figure 2-1 demonstrates the transitions between these four states. It is important to have knowledge of this transition model because it is what our analyses in the following sections rely on.
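The four UIIS states can be read as a small state machine. The following is only an illustrative sketch, not the thesis's implementation: the event names ("read", "retweet", "stop"), the self-loop that keeps a user influential across repeated retweets, and the direct Informed-to-Stopped transition are assumptions made for the example.

```python
from enum import Enum


class State(Enum):
    UNKNOWN = "unknown"
    INFORMED = "informed"
    INFLUENTIAL = "influential"
    STOPPED = "stopped"


# Hypothetical transition table. The event names, the INFLUENTIAL self-loop
# (multiple retweets) and the INFORMED -> STOPPED edge are assumptions.
TRANSITIONS = {
    (State.UNKNOWN, "read"): State.INFORMED,
    (State.INFORMED, "retweet"): State.INFLUENTIAL,
    (State.INFLUENTIAL, "retweet"): State.INFLUENTIAL,
    (State.INFORMED, "stop"): State.STOPPED,
    (State.INFLUENTIAL, "stop"): State.STOPPED,
}


def step(state, event):
    """Return the next state; events that do not apply leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


def run(events, start=State.UNKNOWN):
    """Apply a sequence of events to a user starting in the given state."""
    state = start
    for event in events:
        state = step(state, event)
    return state
```

For instance, under these assumptions `run(["read", "retweet", "retweet"])` leaves the user in the influential state, which is the situation that single-activation models cannot express.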

PAGE 14

Figure 2-1. The UIIS model. This represents the four possible states in which a user in Twitter can be. At the beginning, the unknown user has not known the news yet. He then becomes informed when he knows the news by reading a tweet or being mentioned by others. If he finds this topic interesting and retweets it, he then becomes influential. The user switches to the final state, stopped, once he decides not to forward anymore about this topic.

2.3 Building Cascades of Information

After a user tweets about one topic, the information diffusion begins when another user reads this tweet and then retweets it, and continues when new users see the retweets and then further retweet them. We define a cascade by collecting all users that have ever participated in the diffusion process and the flows of the information, starting from the very first user that posts the earliest tweet.

To illustrate the flow of each cascade, initialized by a single user, we build a hierarchical tree based on [17] and modify it to fulfill our need. We call such a tree a cascade. In a cascade, all nodes are connected and each node has only one parent, except the root, which does not have a parent. In such a tree, we name the root the initiator, who is the first user in this cascade. Starting from the initiator, the existence of each edge i → j represents the information flow from i to j. We then name the leaves the receivers, and all other nodes the spreaders, as Figure 2-2 illustrates.

The methodology that we use to construct a cascade is as follows. First we have to identify the tweets related to one topic. In Twitter, users are allowed to explicitly identify topics when tweeting by using hashtags, words starting with # (e.g., #H1N1,

PAGE 15

Figure 2-2. An example of a cascade of a piece of news. An arrow i → j means that the information flows from i to j. We mark the initiator as a star, the spreaders as circles, and the receivers as squares. The dashed arrow 2 → 5 indicates that although 2 tells 5 about the news, this has no effect because 5 has already known it from 1. This cascade terminates when no node further spreads the topic. In this example, the range of this cascade is 3.

#michaeljackson). This makes the topic identification applicable. However, the majority of users do not use hashtags (only about 1/10 of tweets have hashtags [10]). Thus, we need to determine topics for tweets without hashtags. We first identify the set of keywords by surveying news websites and the tweets in our dataset. We then identify relevant tweets by searching for the keywords in the tweets.

For one topic, the next step is to find all the cascades. We first find all initiators, who have never retweeted any other users or been mentioned by other users before posting the first tweet about the topic. Starting from each initiator, we then find the edges to other users. In our construction of cascades, the existence of an edge i → j represents that: 1) i mentions j, 2) j retweets i, or 3) i makes a post and j follows i. Thus we have the first level of users in this cascade. We then find the edges from such users to other users by the same method, until we cannot find a new participating user anymore.

One may ask: Why is the direction of diffusion different between retweeting and mentioning? When i mentions j, it explicitly indicates that i posts an article and hopes

PAGE 16

that Twitter will make j see it, so the information flows from i to j; when i retweets j, it implicitly means that i reads a previous post on j's page and then forwards it (to i's followers), so the information flow is from j to i, as Figure 2-3 shows.

One may then ask: What if there are multiple users that provide information to the same user about the same topic? Our answer is that, under this circumstance, we select the earliest edge. Note that, although a hierarchical tree has a single root (initiator), there may be multiple initiators who share the same news concurrently. We then build a tree for each initiator. For the completeness of the analysis on the tree structure and the behaviors of the initiators, we make it possible for a node to appear in multiple trees. Moreover, in this paper, the range indicates the maximum level of a hierarchical tree, and the audience of one topic means all the users that appear in the hierarchical trees about this topic, except the initiators.

Figure 2-3. The direction of information diffusion. The information about the concert flows from A to B when B retweets A; the information about the video by D flows from D to C when D mentions C.

2.4 The Selection of Topics

We then choose representative topics for case study. In 2009, the year that our datasets cover, there are 70 most popular topics discussed on Twitter [5]. Among them, we find it feasible to roughly divide them into two genres. One is the more serious hot-news genre (e.g., iranelection, michaeljackson, swineflu), and one is the more leisure

PAGE 17

entertaining genre (e.g., Transformers 2, A-Rod, Lakers). We choose one topic from each genre, and first make preliminary observations on their characteristics w.r.t. information diffusion. The chosen topics are the election in Iran and the 2009 National Basketball Association (NBA) champion, the L.A. Lakers. For simplicity, in this paper, we use iranelection and lakers to represent these two topics respectively.

These two topics differ in many aspects. First, iranelection is a serious political topic; lakers is a leisure sports topic. Second, iranelection happens in Asia and spreads all around the world; lakers happens in the States and is mainly discussed in the areas where basketball is popular. Third, iranelection has new events updated consecutively, including the result, protests, and arrests; lakers happened on June 14th, and then no major issues come out because of the end of the season.

We first summarize the numbers of tweets, retweets, and mentions about both topics in Table 2-2.

Table 2-2. Details of Topics

Topic          #Tweets   #RTs            #Mentions
iranelection   408347    97685 (23.9%)   30163 (7.3%)
lakers         74051     20918 (28.3%)   25989 (35.1%)

Besides the difference in their scales, their user behaviors during the diffusion process are also different. iranelection draws many more retweets than mentions (3.2 times), while for lakers, the number of mentions is moderately (24%) larger than the number of retweets. This can be explained by the nature of the topics: for more serious news, one may intend to simply retweet an article (rather than identifying a specific user) so that all followers can see it; while for a more leisure topic, because the interested audience varies with different topics, one may want to mention a particular friend who is known to be interested in that topic.

For both topics, the number of retweets is non-negligible (around 1/4 of the total tweets). Thus, we would like to understand the retweeting pattern of users in Twitter. We show the distribution of the number of times that users retweet and are retweeted in

PAGE 18

Figure 2-4. In both 2-4A and 2-4B, the distributions roughly follow a long-tail shape. That is, most users retweet and are retweeted very few times about one topic, and most of the retweets refer to and are posted by very few users. Moreover, these figures also show how the topics #iranelection and #lakers differ from each other in the traffic amount of retweets (by roughly an order of magnitude).

In this paper, we focus on the information diffusion process about popular topics on Twitter. Specifically, we study the relation between the range of a cascade and user behavior during the diffusion process. We believe our analysis demonstrates how user behaviors affect information spread through Twitter. We then present our observations in the following sections.
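The cascade construction of Section 2.3 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code used in the thesis: it assumes the mention, retweet, and follow relations have already been converted into time-stamped directed edges (time, src, dst) with information flowing from src to dst, and the function names `build_cascade` and `cascade_range` are hypothetical.

```python
def build_cascade(initiator, edges):
    """Build one cascade tree rooted at `initiator`.

    edges: iterable of (time, src, dst) tuples, information flowing src -> dst
    (an assumed input format). Returns {node: parent}; the initiator maps to None.
    """
    parent = {initiator: None}
    # Process edges in chronological order so that the earliest edge wins.
    for _time, src, dst in sorted(edges, key=lambda edge: edge[0]):
        # Attach dst only if its source is already part of the tree and
        # dst has not been informed earlier by someone else.
        if src in parent and dst not in parent:
            parent[dst] = src
    return parent


def cascade_range(parent):
    """Range of a cascade: the number of hops on the longest root-to-leaf path."""
    def depth(node):
        hops = 0
        while parent[node] is not None:
            node = parent[node]
            hops += 1
        return hops

    return max(depth(node) for node in parent)
```

With the hypothetical edges `[(1, "s", "a"), (2, "s", "b"), (3, "a", "b"), (4, "b", "c"), (5, "c", "d")]` and initiator `"s"`, the later edge from `"a"` to `"b"` has no effect because `"b"` was already informed by `"s"`, and the resulting range is 3. Building one tree per initiator, as the text describes, amounts to calling `build_cascade` once for each initiator over the same edge list.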

PAGE 19

Figure 2-4. Number of times to retweet and to be retweeted of a user, in terms of the Complementary Cumulative Distribution Function (CCDF). A) Retweets. B) Retweeted. Both roughly show long-tail shapes.
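As a reminder of how a CCDF such as the one in Figure 2-4 is obtained from per-user counts, here is a small sketch. It assumes the convention P[X >= x]; the thesis does not state which convention it uses, and the function name `ccdf` is hypothetical.

```python
from collections import Counter


def ccdf(counts):
    """Return sorted (x, P[X >= x]) pairs for a list of per-user counts."""
    total = len(counts)
    frequency = Counter(counts)
    remaining = total  # number of users whose count is >= the current x
    points = []
    for x in sorted(frequency):
        points.append((x, remaining / total))
        remaining -= frequency[x]
    return points
```

Plotting these points on log-log axes is the usual way to make a long-tail shape visible.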

PAGE 20

CHAPTER 3
THE EFFECT OF MULTIPLE RETWEETS IN TWITTER

As mentioned in Section 2.2, multiple retweets made by one user on a specific topic are a feature of the operating mechanism of Twitter. In this chapter, we present the effect of this mechanism on information propagation in Twitter.

3.1 Longer Range Than Expected

We first observe the length of information propagation in Twitter. It is widely accepted and applied as an assumption in previous work that information mostly propagates for 2 to 6 hops in OSNs [16]. We first check whether such a phenomenon can be observed in Twitter. We use the method described in Section 2.3 to extract cascades for the two topics #iranelection and #lakers selected in Section 2.4 and see how their ranges distribute. The range of a cascade is the length of the longest path from the initiator to the receivers in it (i.e., from root to leaf). We then use Figure 3-1 to show the distribution of cascade ranges about these two topics.

To our surprise, the range distributions follow the 2 to 6 hops assumption [16] only in part. When the cascade ranges are less than 8, the frequency drops as the range increases; however, the frequency rises with the range when the range is larger than 10 hops. This is especially apparent in topic #iranelection, as shown in Figure 3-1A. Although #lakers does not clearly show this phenomenon, we can still find a considerable number of cascades with a range of up to 15 in Figure 3-1B.

Observing this curious phenomenon, we are interested in why it happens in Twitter. After comparing with other OSNs such as Flickr, Digg, and Slashdot, we find that the operating mechanism of Twitter could be one explanation. Rather than having only one chance to vote for a picture or story, users in Twitter can retweet and mention several times to make consecutive discussions on one topic. This mechanism is also used in many other social

PAGE 21

Figure 3-1. Distribution of Range. A) iranelection. B) lakers.

networks such as Facebook and Google+, where a user can respond to a message as many times as he wishes, as long as he is still interested in it.

One may then wonder whether such sequential responses really make a difference to the information diffusion process in the underlying social network. Intuitively, we may assume that if a user retweets more about one topic, it is more likely that his followers are influenced and then extend the propagation chain, which would explain the phenomenon above. However, one has to be aware of another problem, information saturation, which is defined by Steeg et al.: despite multiple opportunities for infection within a social group, people are less likely to become

PAGE 22

spreaders of information with repeated exposure [29]. To examine whether this happens along with multiple retweets in Twitter, we present our investigation in the next section.

3.2 Every Retweet Matters in Twitter

We first look at the relation between the total number of retweets and the average number of followers influenced by a user when he posts a tweet about one topic. This shows how differently users influence their followers when they retweet more about the same topic. As Figure 3-2 shows, the influence does not increase with the number of retweets of a user as we assumed. That is, we cannot find a bottom-left to top-right trend in either topic. In fact, a top-left to bottom-right trend is observed.

Then, we wonder whether the followers' willingness to become influential instead drops as a user's number of retweets increases. If this is the case, we can expect that it drops to zero as the number of retweets reaches a certain value. To see this, we count the total number of retweets that each user has made, and then calculate the average number of newly influenced followers caused by every retweet. As presented in Figure 3-3, a user's first retweet influences nearly 2.5 followers with respect to the topic #iranelection. (Note the difference between Figures 3-2 and 3-3.) Such influence drops slightly after the second retweet, and oscillates between 1 and 2 from the third retweet on. The same phenomenon exists w.r.t. the topic #lakers.

Therefore, we find that multiple retweets explain the longer ranges of cascades observed in Twitter. Furthermore, although the influence of the first two retweets differs to a certain degree, saturation does not obviously exist, since the number of newly activated users does not drop to zero with more retweets.

3.3 Timeliness of News in Twitter

Besides the range, multiple retweets in Twitter also affect the timeliness of the propagation of news, which is one of the most important features for propagating the

PAGE 23

Figure 3-2. The average influence of each retweet per user. A) iranelection. B) lakers. The average number of a user's followers influenced per retweet decreases when this user retweets more times.

news, because quicker propagation could help news stay fresh and thus influence more people during the diffusion process. To demonstrate the effect, we group the users by the total number of retweets that they have made, and show the relation between the number of retweets and the time that a tweet needs to be retweeted by another user in Figure 3-4, where people with more retweets are retweeted in a relatively shorter time for both topics. These results show that multiple retweets shorten the time needed to be retweeted by others, i.e., a quicker propagation can be expected.

PAGE 24

Figure 3-3. The newly influenced followers caused by each retweet of a user. The first retweet of a user plays the most important role in influencing followers to further retweet the topic. The repeated exposure (more than 3 times) does increase the followers' willingness to become a spreader at the influenced state, but the increment converges to a certain range, rather than to zero or a large number.

Figure 3-4. Time needed to be retweeted. A retweet takes a longer time to be retweeted if the author does not retweet much about the same topic.

PAGE 25

CHAPTER 4
TIME INTERVAL BETWEEN MULTIPLE RETWEETS

In Chapter 3, it is shown that multiple retweets have a non-negligible effect on enlarging the size of diffusion cascades and extending propagation chains. We would now like to figure out the factors that raise users' willingness to make multiple retweets. As mentioned previously, timeliness plays an important role in information diffusion, so we try to find a predictor to forecast who is more likely to retweet more by checking and comparing the intervals between consecutive retweets that a user makes. This predictor should be determined in the early stage of the propagation process.

For each user, we examine how the time interval between each two consecutive retweets affects his performance in terms of retweeting. We first define the time interval $I_R^n$ as the time interval between a user's $n$-th and $(n+1)$-th retweets about one topic. Then we choose the first 15 retweets of each user to study (i.e., calculating $I_R^1, I_R^2, \ldots, I_R^{15}$), for the reasons that: 1) although the number of retweets one user makes ranges widely from 1 to 350 regarding one topic, 98% of users make fewer than 15; and 2) the majority of the other 2% of the users are robots, which automatically retweet on random or specific topics, and we have to eliminate these accounts.

Additionally, to alleviate the impact of personal differences, we group the users by their number of retweets. This is similar to what we did in the previous section, but this time we divide the users into five groups: the 2-retweets group, the 3-4 retweets group, the 5-10 retweets group, the 11-20 retweets group, and the 21+ retweets group, to conduct a more detailed analysis. Results are shown in Figure 4-1. Note that for topic #lakers we combine the last 2 groups because there are only a few users in the 21+ group (as Figure 2-4A shows). Two important findings are listed in the following.

The first finding is the decrement and convergence of the time interval as the number of retweets increases.

PAGE 26

For the topic #iranelection, we find that the time intervals between the first three retweets, i.e., $I_R^1$ and $I_R^2$, are different from the later ones. The time interval for making more retweets drops markedly before the fourth retweet, but later on, the time needed to make more retweets oscillates within a certain range. This phenomenon truly reflects the diffusion process in reality. When a fresh news story comes out, people may spend time verifying its authenticity, whether this news is related to their lives, etc. Once they feel interested in it, they tend to forward more frequently, and once users are fully engaged in this topic afterwards, they continue retweeting news about it at a stable pace. However, we do not see the decrement on the topic #lakers, while the convergence is still observed. This may be because of its smaller scale and earlier saturation. We will discuss the latter conjecture in the next subsection.

Secondly, we find a relationship between the number of a user's retweets and the time interval between these retweets. Comparing the time that these five groups spend on making each next retweet suggests that the time one user spends between the first two retweets, $I_R^1$, can serve as a clear predictor to forecast the potential of making multiple retweets at an early stage. That is to say, if users spend less time on sending the first several retweets, then they have a relatively high chance of talking about the specific topic by making multiple retweets in a short time. We attribute this to users' interests, since users would like to quickly identify and pick up news that they are interested in. Once they are fully engaged in this topic, the development of the events draws their attention, so that they have more chances to make more retweets.
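The interval definition and the grouping used in this chapter can be sketched as follows. This is an illustration only: it assumes each user's retweets on one topic are available as a list of numeric timestamps, and the function names and the dictionary-based input are hypothetical rather than taken from the thesis.

```python
from collections import defaultdict


def retweet_intervals(timestamps, max_retweets=15):
    """Intervals between consecutive retweets, using at most the first 15 retweets."""
    ordered = sorted(timestamps)[:max_retweets]
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def retweet_group(num_retweets):
    """Map a user's retweet count to one of the five groups named in the text."""
    if num_retweets < 2:
        return None  # a single retweet defines no interval
    if num_retweets == 2:
        return "2"
    if num_retweets <= 4:
        return "3-4"
    if num_retweets <= 10:
        return "5-10"
    if num_retweets <= 20:
        return "11-20"
    return "21+"


def mean_intervals_by_group(user_timestamps):
    """user_timestamps: {user: [retweet times]}. Returns {group: {n: mean n-th interval}}."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for times in user_timestamps.values():
        group = retweet_group(len(times))
        if group is None:
            continue
        for n, interval in enumerate(retweet_intervals(times), start=1):
            sums[group][n] += interval
            counts[group][n] += 1
    return {g: {n: sums[g][n] / counts[g][n] for n in sums[g]} for g in sums}
```

Comparing the value for n = 1 across groups corresponds to the comparison of the first interval described above; filtering out automated accounts, which the text says is necessary, is not covered by this sketch.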

PAGE 27

Figure 4-1. Time interval between retweets. A) iranelection. B) lakers. This figure shows the time that a user needs to make the next retweet. We find two phenomena here: 1) people retweet more frequently when they gain interest in a topic; this is apparent in iranelection, but not in lakers; 2) users tend to retweet more if their first few retweets are made more frequently.

CHAPTER 5
TIME INTERVAL BETWEEN CASCADE LEVELS

So far we have observed the attributes of users. In this chapter, we continue by discussing the impact of time intervals at each level in a cascade. We find that: 1) the behaviors of the direct friends of the initiators are different from those of other users; 2) the information reaches the majority of the users that it can reach at the early stages; and 3) the interval spent by the direct friends of initiators determines the range of the cascades.

5.1 Distribution of Time Interval between Cascade Levels

Since timeliness plays a significant role in the diffusion process, as we have found in the previous sections, and also because the cascades are built based on the order of informed time, it is tempting to assume that users who retweet earlier and quicker will gain a larger audience and thus enlarge the whole cascade as a result; that is, the time that the information needs to be propagated from one level to the next should be longer at the higher levels (i.e., farther from the initiator) of the cascade. To check the correctness of this assumption, we collect all cascades for topics #iranelection and #lakers, and draw the distribution of the interval between each two consecutive levels. We use $I_n^L$ to represent the interval between the time that a user at level n is informed and the time that he retweets about this topic, so that his followers at level n+1 are informed. The distribution is shown in Figure 5-1.

From Figure 5-1 we have two observations. First, we can see the difference between the first few levels and the others. Take iranelection for example: the first three levels can still be distinguished from one another, but from the fifth level on, the curves are all interleaved. Topic lakers has a similar trend, yet only the first and second levels are distinguishable from the others. Second, by looking at the steepest section of each curve representing levels 4 to 10, we can still point out that $I_n^L$ increases with n. This verifies the previous assumption and suggests that it takes longer to propagate the information if the user is at a higher level in a cascade. However, the differences between these curves are so small that one cannot tell clearly. What leads to such findings? We take one step deeper into this problem.

Figure 5-1. Distribution of time interval between levels. A) iranelection. B) lakers. Only the curves of the first few levels can be distinguished from one another. After the fifth level, the time difference distribution does not change much from level to level. This explains why only the interval between the first few levels more clearly indicates the cascade range.
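As a rough illustration of how the per-level distribution of $I_n^L$ could be computed, the sketch below groups intervals by cascade level and evaluates an empirical cumulative distribution. The record layout `(level, informed_time, retweet_time)` and the function names are assumptions made for this example; the thesis does not specify its data format, and whether Figure 5-1 plots a cumulative distribution is also assumed here.

```python
from bisect import bisect_right
from collections import defaultdict


def level_intervals(records):
    """Map level n to the sorted list of I_n^L values.

    Each record is (level, informed_time, retweet_time), and
    I_n^L = retweet_time - informed_time for a user at level n.
    """
    by_level = defaultdict(list)
    for level, informed_time, retweet_time in records:
        by_level[level].append(retweet_time - informed_time)
    return {level: sorted(values) for level, values in by_level.items()}


def empirical_cdf(sorted_values, x):
    """Fraction of the values that are less than or equal to x."""
    return bisect_right(sorted_values, x) / len(sorted_values)
```

Evaluating `empirical_cdf` over a grid of x values for each level gives one curve per level, which can then be compared across levels.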

5.2 Informed Users at Each Level

To this end, we calculate the cumulative percentage of informed users at each level relative to the overall number of users ever informed, as Figure 5-2 shows. For #iranelection, 80% of the people in the cascades are already informed about this topic before the fifth level; for #lakers, 80% of the people that can eventually be reached by this topic are informed within 3 levels of propagation. This explains why information propagation tends to slow down after a certain number of hops (because of the difficulty of informing new friends); it also explains why the curves in Figure 5-1 tend to be indistinguishable after a certain number of hops (because of the small difference in newly informed users between different levels).

Figure 5-2. Accumulative size of audience by each hop. Every news story has its limitation on informing users. On average, #iranelection reaches 80% of its limitation at the 5th level, while #lakers does at the 2nd level.

5.3 Time Interval between Cascade Levels

As we can see in Figure 5-1, the time interval at the first level, $I_1^L$, diverges greatly. Thus, we wonder whether its value affects the range of the propagation cascade. Here we divide the cascades into groups based on the range of the cascades. For the cascades in each group, we compute the average time between when a user is first informed about this topic (see the definition in Chapter 2) and when he first retweets the news, i.e., the averaged value of all $I_1^L$'s. We find that the time interval between the first and second level is a very important indicator of the range of a cascade: the sooner the first-level audience forwards the news, the larger the cascade range will be. This is shown in Table 5-1.

Table 5-1. Time interval between the 1st and 2nd level (hr)

Cascade range    <5       5 to 9    10 to 14    >15
iranelection     24.296.31
lakers           21.08    13.82     7.40        2.21
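The two computations in Sections 5.2 and 5.3 can be sketched as follows: the cumulative share of informed users per level, and the mean $I_1^L$ per cascade-range group. The input shapes and function names are hypothetical; the bucket labels are taken from Table 5-1, and placing a range of exactly 15 in the last bucket is an assumption, since the table's labels do not cover that value.

```python
from itertools import accumulate
from statistics import mean


def cumulative_share(new_users_per_level):
    """new_users_per_level[i] is the number of users first informed at level i + 1."""
    total = sum(new_users_per_level)
    return [running / total for running in accumulate(new_users_per_level)]


def first_level_reaching(shares, threshold=0.8):
    """First level whose cumulative share meets the threshold, or None."""
    for level, share in enumerate(shares, start=1):
        if share >= threshold:
            return level
    return None


def range_bucket(cascade_range):
    """Assign a cascade range (in hops) to one of the groups of Table 5-1."""
    if cascade_range < 5:
        return "<5"
    if cascade_range <= 9:
        return "5 to 9"
    if cascade_range <= 14:
        return "10 to 14"
    return ">15"  # assumption: a range of exactly 15 is also placed here


def mean_first_interval_by_range(cascades):
    """cascades: iterable of (cascade_range, list of I_1^L values in hours)."""
    pooled = {}
    for cascade_range, first_level_intervals in cascades:
        pooled.setdefault(range_bucket(cascade_range), []).extend(first_level_intervals)
    return {bucket: mean(values) for bucket, values in pooled.items() if values}
```

A decreasing mean from the shortest-range group to the longest-range group would correspond to the trend reported for #lakers in Table 5-1.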

CHAPTER 6
USER ACTIVENESS

In this chapter, we discuss the effect of the activeness of a user in Twitter. We define the activeness of a user as the average number of tweets that he posts hourly (#tweets/hr). Besides, as we can assume, users' interests could contribute to the diffusion process on the underlying network: people would like to broadcast more news about the topics that they are interested in, compared with other news. In order to see this, we calculate the activeness of a user in general, and compare it with the interest of the user in a specific topic. We measure the user's interest in a topic by the total number of retweets by the user about the topic. We then plot the relation between these two measures in Figure 6-1.

In Figure 6-1, by observing the relation of activeness and interest under different values of the x-axis and y-axis, we summarize our observations as follows:

1. The lower parts of both figures are dense, and they do not apparently tend to the left corner or right corner. This shows that a lot of users help propagate the popular topics regardless of their activeness, even if they are not very interested in that topic.
2. More specifically, some active users retweet about a topic that they are not interested in. They retweet simply because it is popular. The right-bottom corners show this.
3. If a user is interested in one topic, he will be active in general. This is shown by the empty left-top corner of both subfigures of Figure 6-1. That is, a user would not be interested in a specific topic only; he must also be active in general.
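The two measures compared in this chapter are simple enough to sketch directly. In the example below, the observation window `observed_hours`, the cut-off values passed to `corner`, and all function names are hypothetical; the thesis does not state thresholds for what counts as an active or interested user. The corner names assume activeness on the x-axis and interest on the y-axis, which is how the observations above read.

```python
def activeness(tweet_count, observed_hours):
    """Average number of tweets posted per hour (#tweets/hr)."""
    if observed_hours <= 0:
        raise ValueError("observed_hours must be positive")
    return tweet_count / observed_hours


def interest(retweet_topics, topic):
    """Total number of the user's retweets that are about the given topic."""
    return sum(1 for retweet_topic in retweet_topics if retweet_topic == topic)


def corner(activeness_value, interest_value, activeness_cut, interest_cut):
    """Name the region of the scatter plot that a user falls in.

    Both cut-offs are illustrative parameters, not values from the thesis.
    """
    horizontal = "right" if activeness_value >= activeness_cut else "left"
    vertical = "top" if interest_value >= interest_cut else "bottom"
    return f"{horizontal}-{vertical}"
```

Counting the users that land in each corner would give a numeric check of the third observation, namely that the left-top corner is empty.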

Figure 6-1. Activeness of users. A) iranelection. B) lakers. The scatter plots reflect the relation between users' interest in specific topics and their activeness in general.

CHAPTER 7
DISCUSSION AND FUTURE WORK

In this paper we discuss many measures for users, tweets, and cascades in Twitter, and discuss their relationships with one another. In this chapter, we show two interesting phenomena related to one factor, and discuss two possible explanations of them. Then, we discuss our future work.

We would like to see the effect of the selection of topic. Thus, we examine its influence on two measures: the time needed to be retweeted, and the probability of being retweeted.

7.1 More on Time Needed to be Retweeted

In Section 3.3, we discuss one factor that influences the time needed for a tweet made by a user to be retweeted: the total number of retweets of that user. We would like to see whether the selection of topic affects this measure, too. Thus, regardless of the authors, for all the retweets made about each topic, we show the distribution of the time needed to be retweeted in Figure 7-1. As shown in the figure, about 80% of the tweets are retweeted by the followers within 48 hours (2 days), and there are only a few tweets that are first retweeted after a week. Another interesting finding is the very similar distributions of this measure for both topics, i.e., the selection of topic does not affect the tendency of a tweet to be retweeted early.

Figure 7-1. The distribution of time needed to be retweeted. The two topics show very similar distributions: once a tweet is retweeted, the time it takes to be retweeted is not affected by the selection of topics.

7.2 Probability of Being Retweeted

The above analysis is on the articles that are retweeted. One may then ask another question: how many of the articles are retweeted by others? We would like to see whether frequent retweets and different topics lead to higher chances of being retweeted. Thus, we group the users by the number of retweets about one topic, and examine each user's average probability of being retweeted, as shown in Figure 7-2. As one can see, the probability increases when a user retweets more about #iranelection, but for #lakers we cannot find such a correlation at all. We discuss two possible explanations below.

The first reason could be the topics' difference in nature. Compared to #iranelection, #lakers is of smaller scale and more interest-oriented, as argued in Section 2.4. Thus, not only is the overall probability of being retweeted smaller, but people also tend to retweet based on their interest in #lakers, rather than on the influence of others.

We then argue that this is because of their different need for expertise. After scanning the most frequently retweeted articles, we find that for #iranelection such tweets are usually written by journalists or media columnists. However, this is not the case for #lakers: it is relatively easier to write a comment in depth, regardless of the number of retweets about the same topic (which reflects the user's expertise).

Figure 7-2. The relationship between the number of retweets of users and the probability of being retweeted. The existence of the relationship depends on the topic. They are positively correlated for #iranelection, but nearly uncorrelated for #lakers.

7.3 Future Work

Although a lot of effort has already been put into information diffusion on Online Social Networks (OSNs), previous work either makes assumptions too strong to fit a general diffusion pattern of OSNs, or has limited performance in fitting real-world data. To the best of our knowledge, a general and well-fitting information diffusion model is still lacking. In this paper, we proposed a UIIS model to represent the users' states during the diffusion process; however, we still need to take individual differences into consideration in order to fit the model quantitatively. The transitions between the four states are crucial and in need of clear identifiers: a tweet about the topic posted by someone a user follows indicates the transition from Unknown to Informed; a further retweet indicates the follower's transition from Informed to Influential; the only unclear transition is that between Influential and Stopped. The observations we have presented provide insight into monitoring the transition between the last two states, yet need further mathematical evaluation.

Moreover, to determine whether two tweets are about the same topic, like [12], here we use a simple method that determines whether they contain common keywords. This can be improved by outsourcing to the OpenCalais service [3, 10], which extracts tags from input articles. We leave these for future work.
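The common-keyword test for deciding whether two tweets are about the same topic could look like the sketch below. The text only states that the method checks for common keywords, so the tokenization rule, the restriction to a supplied set of topic keywords, and the function names are all assumptions made for illustration.

```python
import re


def keywords(text):
    """Lower-cased word and hashtag tokens of a tweet (assumed tokenization)."""
    return set(re.findall(r"#?\w+", text.lower()))


def same_topic(tweet_a, tweet_b, topic_keywords):
    """True when both tweets share at least one of the given topic keywords."""
    wanted = {keyword.lower() for keyword in topic_keywords}
    return bool(keywords(tweet_a) & keywords(tweet_b) & wanted)
```

For example, two tweets that both carry the hashtag #iranelection would be treated as belonging to the same topic, while a tweet tagged #lakers and one tagged #iranelection would not.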

CHAPTER 8
RELATED WORK

Throughout the paper, we have discussed the references that closely relate to our work. To give a thorough view of the background literature, we briefly review related work on the network structure of online social networks, the investigation of information diffusion, and the modeling of the diffusion process.

In the last decade, a number of research efforts have been put into understanding the network structure, user interactions, and traffic characteristics [9, 11, 14, 18, 19, 22, 26, 28]. Several works have studied information diffusion over OSNs using empirical approaches [12, 13, 16, 21, 27], and some used statistical methods to predict the characteristics (e.g., range, scale, and speed) of the information diffusion process on OSNs [33]. Recently, some researchers have analyzed the process of information diffusion on OSNs [29, 31]. Sun et al. [30] argued that the assumption that a few nodes start long chain reactions, resulting in large-scale cascades, does not hold for social media networks. Besides, Wang et al. [31] proposed a diffusive logistic model to predict information diffusion over a time period in Digg.

The work most related to ours is by Steeg et al. [29]. They used real-world data to argue against the widely used models: information diffusion in OSNs is fundamentally different from other contagion processes, in that people do not become more likely to further spread information with repeated exposure. Such a saturating phenomenon was discussed as the main reason that stops information diffusion. By their selection of diffusion model, however, they neglected the user behaviors in more general social networks (e.g., Twitter, Facebook, and Google+). In this work, we have studied the diffusion process on a more general diffusion model, and found some crucial factors that heavily affect information propagation, yet are neglected in the literature.

CHAPTER 9
CONCLUSION

In this paper, we have demonstrated that multiple retweets have a non-negligible effect on the diffusion process in Twitter. There is a considerable number of cascades with range larger than 10 hops when we extract them with respect to topics rather than messages. Multiple retweets enable people to keep talking and sharing information about the development of an event, thus allowing us to see dynamic news rather than a static story. In addition, the time interval between multiple retweets plays an important role in predicting whether a user will make more retweets. Furthermore, we can also evaluate the time interval between the first two levels in the cascades to forecast whether a source can generate larger cascades.

REFERENCES

[1] Facebook. http://www.facebook.com, 2004.
[2] Twitter. http://www.twitter.com, 2006.
[3] OpenCalais. http://www.opencalais.com/, 2008.
[4] Mashable. http://mashable.com/2009/06/17/iranelection-crisis-numbers/, 2009.
[5] Twitter blog. http://blog.twitter.com/, 2009.
[6] Google+. https://plus.google.com/, 2011.
[7] Newsroom. http://newsroom.fb.com, 2012.
[8] Social Statistics. http://socialstatistics.com, 2012.
[9] Anagnostopoulos, Aris, Kumar, Ravi, and Mahdian, Mohammad. Influence and correlation in social networks. Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining. KDD '08. New York, NY, USA: ACM, 2008, 7. URL http://doi.acm.org/10.1145/1401890.1401897
[10] Ardon, Sebastien, Bagchi, Amitabha, Mahanti, Anirban, Ruhela, Amit, Seth, Aaditeshwar, Tripathy, Rudra M., and Triukose, Sipat. Spatio-Temporal Analysis of Topic Popularity in Twitter. CoRR abs/1111.2904 (2011).
[11] Benevenuto, Fabrício, Rodrigues, Tiago, Cha, Meeyoung, and Almeida, Virgílio. Characterizing user behavior in online social networks. Proceedings of the 9th ACM SIGCOMM conference on Internet measurement conference. IMC '09. New York, NY, USA: ACM, 2009, 49. URL http://doi.acm.org/10.1145/1644893.1644900
[12] Cha, Meeyoung, Haddadi, Hamed, Benevenuto, Fabricio, and Gummadi, Krishna P. Measuring User Influence in Twitter: The Million Follower Fallacy. 2010.
[13] Cha, Meeyoung, Mislove, Alan, and Gummadi, Krishna P. A measurement-driven analysis of information propagation in the Flickr social network. Proceedings of the 18th international conference on World wide web. WWW '09. New York, NY, USA: ACM, 2009, 721.
[14] Chun, Hyunwoo, Kwak, Haewoon, Eom, Young-Ho, Ahn, Yong-Yeol, Moon, Sue, and Jeong, Hawoong. Comparison of online social relations in volume vs interaction: a case study of cyworld. Proceedings of the 8th ACM SIGCOMM conference on Internet measurement. IMC '08. New York, NY, USA: ACM, 2008, 57.

[15] De Longueville, Bertrand, Smith, Robin S., and Luraschi, Gianluca. OMG, from here, I can see the flames!: a use case of mining location based social networks to acquire spatio-temporal data on forest fires. Proceedings of the 2009 International Workshop on Location Based Social Networks. LBSN '09. New York, NY, USA: ACM, 2009, 73. URL http://doi.acm.org/10.1145/1629890.1629907
[16] Gomez, Vicenc, Kaltenbrunner, Andreas, and Lopez, Vicente. Statistical analysis of the social network and discussion threads in slashdot. Proceedings of the 17th international conference on World Wide Web. WWW '08. New York, NY, USA: ACM, 2008, 645. URL http://doi.acm.org/10.1145/1367497.1367585
[17] Hanneman, Robert A. and Riddle, Mark. Introduction to social network methods. University of California, Riverside, 2005. URL http://faculty.ucr.edu/~hanneman/nettext/
[18] Jiang, Jing, Wilson, Christo, Wang, Xiao, Huang, Peng, Sha, Wenpeng, Dai, Yafei, and Zhao, Ben Y. Understanding latent interactions in online social networks. Proceedings of the 10th annual conference on Internet measurement. IMC '10. New York, NY, USA: ACM, 2010, 369. URL http://doi.acm.org/10.1145/1879141.1879190
[19] Kossinets, Gueorgi, Kleinberg, Jon, and Watts, Duncan. The structure of information pathways in a social communication network. Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining. KDD '08. New York, NY, USA: ACM, 2008, 435. URL http://doi.acm.org/10.1145/1401890.1401945
[20] Kwak, Haewoon, Lee, Changhyun, Park, Hosung, and Moon, Sue. What is Twitter, a social network or a news media? WWW '10: Proceedings of the 19th international conference on World wide web. New York, NY, USA: ACM, 2010, 591.
[21] Lerman, K. and Ghosh, R. Information contagion: An empirical study of the spread of news on Digg and Twitter social networks. 2010. URL http://scholar.google.de/scholar.bib?q=info:ITFPFHlz1aAJ:scholar.google.com/&output=citation&hl=de&as_sdt=0&ct=citation&cd=32
[22] Leskovec, Jure and Horvitz, Eric. Planetary-scale views on a large instant-messaging network. Proceedings of the 17th international conference on World Wide Web. WWW '08. New York, NY, USA: ACM, 2008, 915.

[23] Leskovec, Jure, Lang, Kevin J., Dasgupta, Anirban, and Mahoney, Michael W. Statistical properties of community structure in large social and information networks. Proceedings of the 17th international conference on World Wide Web. WWW '08. New York, NY, USA: ACM, 2008, 695. URL http://doi.acm.org/10.1145/1367497.1367591
[24] Liben-Nowell, David and Kleinberg, Jon. Tracing information flow on a global scale using Internet chain-letter data. Proceedings of the National Academy of Sciences 105 (2008).12: 4633. URL http://dx.doi.org/10.1073/pnas.0708471105
[25] Nazir, Atif, Raza, Saqib, and Chuah, Chen-Nee. Unveiling facebook: a measurement study of social network based applications. Proceedings of the 8th ACM SIGCOMM conference on Internet measurement. IMC '08. New York, NY, USA: ACM, 2008, 43. URL http://doi.acm.org/10.1145/1452520.1452527
[26] Nazir, Atif, Raza, Saqib, Gupta, Dhruv, Chuah, Chen-Nee, and Krishnamurthy, Balachander. Network level footprints of facebook applications. Proceedings of the 9th ACM SIGCOMM conference on Internet measurement conference. IMC '09. New York, NY, USA: ACM, 2009, 63. URL http://doi.acm.org/10.1145/1644893.1644901
[27] Rodrigues, Tiago, Benevenuto, Fabrício, Cha, Meeyoung, Gummadi, Krishna, and Almeida, Virgílio. On word-of-mouth based discovery of the web. Proceedings of the 2011 ACM SIGCOMM conference on Internet measurement conference. IMC '11. New York, NY, USA: ACM, 2011, 381. URL http://doi.acm.org/10.1145/2068816.2068852
[28] Schneider, Fabian, Feldmann, Anja, Krishnamurthy, Balachander, and Willinger, Walter. Understanding online social network usage from a network perspective. Proceedings of the 9th ACM SIGCOMM conference on Internet measurement conference. IMC '09. New York, NY, USA: ACM, 2009, 35. URL http://doi.acm.org/10.1145/1644893.1644899
[29] Steeg, Greg Ver, Ghosh, Rumi, and Lerman, Kristina. What Stops Social Epidemics? 2011.
[30] Sun, Eric, Rosenn, Itamar, Marlow, Cameron, and Lento, Thomas. Gesundheit! Modeling Contagion through Facebook News Feed. 2009.

[31] Wang, Feng, Wang, Haiyang, and Xu, Kuai. Diffusive Logistic Model Towards Predicting Information Diffusion in Online Social Networks. CoRR abs/1108.0442 (2011).
[32] Yang, Jaewon and Leskovec, Jure. Patterns of Temporal Variation in Online Media. 2011. URL http://ilpubs.stanford.edu:8090/984/
[33] Yang, Jiang and Counts, Scott. Predicting the Speed, Scale, and Range of Information Diffusion in Twitter. 2010. URL http://www.aaai.org/ocs/index.php/ICWSM/ICWSM10/paper/view/1468

BIOGRAPHICAL SKETCH

Yu-Song Syu received his bachelor's degree and master's degree from the Department of Computer Science at National Tsing Hua University in Taiwan. He then came to the United States for his second master's degree in the Department of Computer and Information Science and Engineering at the University of Florida. Yu-Song's research interests are broadly distributed among DTV interfaces, vocal signal processing, and social networking. Among his technical skills, he is most accustomed to coding in Java. Yu-Song's extracurricular activities include cooking, exercise, and music. He loves Chinese and Japanese food the best. He plays softball and basketball casually and listens to all kinds of music.