Palantir

Permanent Link: http://ufdc.ufl.edu/UFE0045118/00001

Material Information

Title: Palantir Crowdsourced Newsification on Twitter
Physical Description: 1 online resource (72 p.)
Language: English
Creator: Venkat Raj, Prithvi Raj
Publisher: University of Florida
Place of Publication: Gainesville, Fla.
Publication Date: 2012

Subjects

Subjects / Keywords: syndication -- twitter
Computer and Information Science and Engineering -- Dissertations, Academic -- UF
Genre: Computer Engineering thesis, M.S.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, territorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation

Notes

Abstract: People today generate and consume several exabytes of content. A significant portion of this is on social media and microblogging sites like Twitter. The popularity of these services encourages people to share real world developments and experiences online. The value of such sharing is very apparent during times of duress, when people take to posting important news on social media. While this enlightens the outside world, it lacks the coherence and clarity that might make a bigger impact. In addition, there is no obvious way of communicating back to these people, to blend, assimilate and stimulate the flow of information. In this work, we develop and evaluate a collaboration system, Palantir, which is designed for people who glean information from Twitter during an event, and consolidate the information into stories, which allows them to capture a snapshot of how things were at the time of the event. Palantir is designed so that people can easily annotate Tweets from mobile clients, use a web client to track Tweets, and finally consolidate these Tweets into stories which can later be published.
General Note: In the series University of Florida Digital Collections.
General Note: Includes vita.
Bibliography: Includes bibliographical references.
Source of Description: Description based on online resource; title from PDF title page.
Source of Description: This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Statement of Responsibility: by Prithvi Raj Venkat Raj.
Thesis: Thesis (M.S.)--University of Florida, 2012.
Local: Adviser: Helal, Abdelsalam A.

Record Information

Source Institution: UFRGP
Rights Management: Applicable rights reserved.
Classification: lcc - LD1780 2012
System ID: UFE0045118:00001


Full Text

PAGE 1

PALANTIR: CROWDSOURCED NEWSIFICATION USING TWITTER

By

PRITHVI RAJ VENKAT RAJ

A THESIS PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE

UNIVERSITY OF FLORIDA

2012

PAGE 2

© 2012 Prithvi Raj Venkat Raj

PAGE 3

To my family

PAGE 4

ACKNOWLEDGMENTS

I would like to convey my sincere gratitude to my advisor, Dr. Helal, for his excellent counsel, support, and encouragement in pursuing research in this exciting field. I would also like to thank Dr. Thai and Dr. Xia for serving on my supervisory committee.

At the same time, I wish to thank the members of the online forum Turker Nation who provided me valuable feedback on some aspects of my evaluation methodology, and workers on Amazon Mechanical Turk without whose labor I couldn't have actual human generated results.

I would especially like to thank my family for their persistent encouragement and belief in me.

PAGE 5

TABLE OF CONTENTS

page

ACKNOWLEDGMENTS ..... 4
LIST OF TABLES ..... 7
LIST OF FIGURES ..... 8
ABSTRACT ..... 9

CHAPTER

1 OVERVIEW ..... 10
1.1 Introduction ..... 10
1.1.1 Motivation and Current Problems ..... 10
1.1.2 Thesis Objective ..... 11
1.1.3 Thesis Organization ..... 11
2 RELATED WORK ..... 13
2.1 Commercial Products on Twitter ..... 13
2.1.1 Storify ..... 13
2.1.2 Vibe ..... 13
2.1.3 Dataminr ..... 13
2.2 Commercial Products Supporting Citizen Journalism ..... 14
2.2.1 Wikinews ..... 14
2.2.2 CNN iReport ..... 16
2.3 Overview of Related Academic Research ..... 16
2.4 Microblogging ..... 17
2.4.1 Overview ..... 17
2.4.2 Providers ..... 17
2.4.3 Twitter ..... 17
2.4.3.1 Overview ..... 17
2.4.3.2 Research on Twitter ..... 18
3 OVERALL APPROACH ..... 20
3.1 A Brief Incursion ..... 20
3.2 An Outline of Palantir ..... 21
3.3 Challenges ..... 21
4 ARCHITECTURE ..... 23
4.1 Palantir Architecture ..... 23
4.2 Tweet Tagging and Annotation Services ..... 23
4.3 Tags in Palantir ..... 26
4.3.1 How Users Tag Tweets ..... 26

PAGE 6

4.3.2 Tag Recommender ..... 26
4.3.2.1 Text Transformation ..... 28
4.3.2.2 Geographic Location ..... 28
4.3.2.3 Tweetopic ..... 28
4.3.2.4 Tag co-occurrence and Prefix matching ..... 30
4.3.2.5 Tag Ranking ..... 30
4.4 User Interface ..... 33
4.5 What Palantir Uses Tags ..... 33
4.5.1 Create an User Interest Profile ..... 33
4.5.2 Searching, Topic Following ..... 34
4.5.3 Tag Consolidation ..... 34
5 EXPERIMENTATION AND EVALUATION ..... 35
5.1 Crowdsourcing ..... 35
5.1.1 Amazon Mechanical Turk (AMT) ..... 35
5.1.1.1 Basic Terminology ..... 36
5.1.1.2 Related Work ..... 39
5.2 Experiments ..... 40
5.2.1 Experiment 1: Palantir Baseline ..... 41
5.2.1.1 Experimental Data ..... 42
5.2.1.2 Analysis ..... 42
5.2.2 Experiment 2: Unguided Human Baseline ..... 43
5.2.2.1 Experimental Results ..... 43
5.2.2.2 Analysis ..... 46
5.2.3 Experiment 3: Heated Palantir ..... 47
5.2.3.1 Experimental Results ..... 48
5.2.3.2 Analysis ..... 51
5.2.4 Experiment 4: AMT Synonym Detection ..... 52
5.2.4.1 Experimental Results ..... 53
5.2.4.2 Analysis ..... 53
5.2.5 Summary of Results ..... 54
6 CONCLUSION AND FUTURE WORK ..... 55
6.0.6 Conclusion ..... 55
6.0.7 Future Work ..... 55
6.0.7.1 Content Syndication ..... 55
6.0.7.2 Survey Creation ..... 56

APPENDIX: WORD LISTS ..... 57
REFERENCES ..... 67
BIOGRAPHICAL SKETCH ..... 72

PAGE 7

LIST OF TABLES

Table page

A-1 Filter Terms ..... 57

PAGE 8

LIST OF FIGURES

Figure page

2-1 Storify ..... 14
4-1 Palantir Usage patterns ..... 24
4-2 Palantir Architecture ..... 25
4-3 Palantir Tag Recommender ..... 27
4-4 Tweetopic Algorithm ..... 29
4-5 Tweet Entry Screen ..... 31
4-6 Tag Suggestion Screen ..... 32
5-1 Experiment 1: Palantir baseline ..... 41
5-2 Experiment 1: Results ..... 44
5-3 Experiment 1: Results ..... 45
5-4 Experiment 2: Tweets tagged using only using AMT ..... 45
5-5 Experiment 2: Results ..... 46
5-6 Experiment 2: Results ..... 47
5-7 Experiment 3: Tweets tags recommended by Palantir and validated by AMT ..... 48
5-8 Experiment 3: Results ..... 49
5-9 Experiment 3: Results ..... 50
5-10 Experiment 3: Histograms showing variation in usefulness ..... 50
5-11 Experiment 4: Synonym detection on AMT ..... 52
5-12 Experiment 4: Similar Words ..... 53

PAGE 9

Abstract of Thesis Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Master of Science

PALANTIR: CROWDSOURCED NEWSIFICATION USING TWITTER

By

Prithvi Raj Venkat Raj

December 2012

Chair: Abdelsalam (Sumi) Helal
Major: Computer Engineering

People today generate and consume several exabytes of content. A significant portion of this is on social media and microblogging sites like Twitter. The popularity of these services encourages people to share real world developments and experiences online. The value of such sharing is very apparent during times of duress, when people take to posting important news on social media. While this enlightens the outside world, it lacks the coherence and clarity that might make a bigger impact. In addition, there is no obvious way of communicating back to these people, to blend, assimilate and stimulate the flow of information.

In this work, we develop and evaluate a collaboration system, Palantir, which is designed for people who glean information from Twitter during an event, and consolidate the information into stories, which allows them to capture a snapshot of how things were at the time of the event.

Palantir is designed so that people can easily annotate Tweets from mobile clients, use a web client to track Tweets, and finally consolidate these Tweets into stories which can later be published.

PAGE 10

CHAPTER 1
OVERVIEW

1.1 Introduction

1.1.1 Motivation and Current Problems

People have been consuming news in newspapers since the 16th century. In 2011, 46% of Americans turned to the Internet for news at least 3 times a week, as opposed to the 40% of people who got their news from newspapers [1]. The study also finds that 84% of Americans own mobile devices, 47% of whom consume news on these devices. While these numbers show that many people today are moving towards digital news sources, they do not take into account news that finds people by means of social networks. News of almost every major earthquake in the last three years broke out on Twitter before catching the attention of mainstream media. An article [2] by Megan Garber of Nieman Lab comments that most mainstream news organizations that are on the social media bandwagon use Twitter as a glorified RSS feed. A PEJ study shows that less than 2% of Tweets by 13 of America's popular news organizations used Twitter as a conversational medium to gather information from people. Oftentimes, transient events of local importance are Tweeted promptly, long before mainstream media is aware. In her article [3], Gina Chen highlights how people supported each other by publishing alternative routes that could be taken to avoid a multi-car pileup. In cases like these, there is much value to be had in getting real time information that is immediately useful. Twitter also played a pivotal role in the Iran election protests of 2009, the Egyptian revolution of 2011, and currently, the Occupy movement. Twitter, however useful, is more a transient medium. As Tweets age, they slip out of context, and what was once important rapidly vanishes. We believe that Tweets, a broadly adopted medium, can be leveraged to provide a far more powerful experience by sparking news tidbits and circulating timely news. We imagine that contextual snapshots of Tweets would have the potency to stay relevant long after an event has elapsed.

PAGE 11

1.1.2 Thesis Objective

We describe a collaborative tool, Palantir, designed for people to glean information from Twitter during an event, and enable them to consolidate Tweets into stories as and when they happen. The fundamental idea of Palantir is tapping into the social network effect of mobile Twitter users who are intimately following the development of a topic. Such users, if given the appropriate tools, can naturally channel their passion and energy to write a more organized view of lower level tweets. Palantir makes organizing tweets by tagging simpler by providing tag recommendations which are influenced by both content and geospatial context. These tags are exploited to create profiles for individual users of this system, who form a crowd that can be categorized and contacted for information. Palantir allows for querying relevant portions of this crowd and aggregating their results. These aspects make Palantir a valuable tool which can be used for real time remote reporting. By following topics coded by tags, filling in missing data, and publishing consumable storylines, people are enabled to productively create content based on real time information from Tweets, and share it with others. Fundamental to the working of Palantir is the concept of recommending tags, which are text annotations applied to a microblogging post. While there are studies about tagging content, none focus on humans tagging short texts so that they are more discoverable for others. We test whether Sen et al.'s findings [4] could be applied to the domain of microblogs. In this thesis, we concern ourselves with the following questions:

• Do tag recommendations affect the cognitive load of people applying tags?
• Do people find value in applying tags to microblog posts?
• Is there a faster convergence of tag vocabulary when tags are recommended?
• Does the quality and quantity of tags improve when tags are recommended?

1.1.3 Thesis Organization

This thesis is organized into 6 chapters. Chapter 2 provides insight about related work in the industry, and provides detail about academic research work covered in

PAGE 12

this thesis. Chapter 3 presents the overall approach taken by Palantir, while Chapter 4 provides an in-depth discussion of Palantir architecture, design considerations, and implementation details. Chapter 5 covers validation and evaluation of results using a crowdsource based approach. Chapter 6 concludes with thoughts about future work.

PAGE 13

CHAPTER 2
RELATED WORK

2.1 Commercial Products on Twitter

2.1.1 Storify

Storify [5] is a social media curation service that launched in late 2011 [6]. Users of Storify search social networks, and assemble selected individual elements into stories. Figure 2-1 shows a typical article written with the service. Storify allows users to import social media elements like Tweets and images into their story, and allows users to supply their own textual content to maintain the flow of their story. Storify has been used to provide political coverage [7], and even document meetings and workshops [8]. However, Storify does not make finding these social media elements easier, nor does it provide a way to collaborate over stories.

2.1.2 Vibe

Vibe [9] is a messaging application that allows users to post anonymous short messages that are pinned down to a certain geographical region, and have an expiry time after which they are deleted. Any user with the Vibe application can view messages within a radius from the current location of the user. The current version of this application allows users to set a large radius (12000 miles). Vibe differs from Twitter and other services in that it does not require users to sign up. While the application isn't popular with everyday Twitter users, it was beneficial to people participating in the Wall Street protests, providing them with an electronic channel of communication with the anonymous crowd around them [10]. It is of interest to note that people are making use of applications that pin down posts at specific locations; we feel that with a social network like Twitter, messages, and by extension tags, should have geographical affinity.

2.1.3 Dataminr

Dataminr [11] is a real-time social media analytics engine that listens to every public post on Twitter to mathematically determine events and microtrends. Dataminr claimed

PAGE 14

Figure 2-1. Storify

to have informed clients about Osama Bin Laden's death before it was reported by social media outlets [12].

2.2 Commercial Products Supporting Citizen Journalism

2.2.1 Wikinews

Wikinews [13] is a collaborative journalism platform that was established in Nov 2004. An interesting facet of Wikinews is that it allows original work in addition to work that has been sourced. Wikinews is valuable in covering news of large events affecting a large population of people who can report about it from different viewpoints.

PAGE 15

The major detraction for Wikinews is the perceived inability to preserve a neutral point of view. Some of the more complex issues lie at the heart of what news is, i.e., delivering information in a timely manner, and providing that information with a captivating narrative. Andrew Lih, a noted authority on Wikipedia, feels that it is difficult to get two or more people to write in the same style, as evidenced by the failed project 'A Million Penguins' by Penguin Publishing.

Wikinews has a model similar to Wikipedia where submitted articles are reviewed by trusted users. Where it differs is that the purpose of news is to capture a snapshot in time. If a new development takes place, a news article makes a reference to the previous article, recaps the story, and then takes it forward. This is in strong contrast with how Wikipedia works, where such a change would have caused someone to edit an existing article. Another key difference is that Wikipedia sticks to a formulaic style like the strict inverted pyramid. An article in a Wikipedia page might start like "On 14 November, event x happened at location y, z people were affected". Articles give an overall view before drilling into details, and most articles on Wikipedia strictly follow such a style.

A problem that plagues community reviewed sites is the quest for perfectionism. This leads to instruction creep, which, in the case of Wikinews, gradually caused output of news articles to fall from 6-8 articles per day to the same number per week. As an artifact of maintaining a snapshot of history in time, Wikinews imposes a time limit before which an article meeting their standards must be written. Many authors are unable to get timely help meeting these requirements, and thus might not have their articles published on the main site.

Palantir, on the other hand, is designed for people who are tuned into social media like Twitter, and have a need to quickly catalyze information flow during an ongoing event. While a Wikinews model works well for reporting after an event, we feel that Palantir is suited more for live reporting.

PAGE 16

2.2.2 CNN iReport

iReport [14] is a tool that citizen journalists can use to submit their news articles to CNN. iReport is similar to Wikinews in that it allows people to submit photos, videos or articles, but it does not allow for collaboration as Wikinews does. When a story is posted to iReport, it immediately appears on the CNN iReport site. CNN has a staff of reporters who comb through these stories for ones that are interesting and can be used on their main site. These selected stories are then verified, and used. Verified stories are badged "Vetted by CNN". People submit stories to CNN because there is a possibility that their story might appear on the main site, and in 2011, CNN recognized good contributions to iReport by holding the first iReport Awards, further fostering community effort.

2.3 Overview of Related Academic Research

We review work done on microblogging, tagging systems, topic detection algorithms, temporal exploration interfaces and crowdsourcing marketplaces. Twitter, a microblogging service, is growing at a rapid pace, and is spurring research. Ehrlich and Shami [15] found microblogs being used as real-time information sources, but highlighted concerns about the volume of data, noise and relevancy issues. We draw on research on human guided unstructured tagging systems, folksonomies, by Adam Mathes [16], Marieke Guy and Emma Tonkin [17]. The advantage folksonomies have over formal classification systems is that the terms used in informal tagging systems may be imprecise. Adam Mathes' [16] paper contains good discussion about how tags are distributed, and allays fears of single use tags dominating others. Sen et al. [4] provide a good treatment of how recommending tags to people affects the tags that they choose for tagging. Palantir's tag recommendation system is based on the TweeTopic algorithm that has been described by Bernstein et al. in [18]. This algorithm makes use of an online Internet search engine to assign topics to Tweets or other short pieces of text.

PAGE 17

In Section 2.4, we provide a discussion on microblogging, with a focus on Twitter. Crowdsourcing systems are covered in Subsubsection 5.1.1.2.

2.4 Microblogging

2.4.1 Overview

Microblogging is an emerging form of broadcast communication that has been gaining popularity over the past few years. Microblogging services allow users to post short content online [19, 20]. A majority of this content is text, but users also link to other types of content like audio, video, images, or other web resources.

2.4.2 Providers

Microblogging services are provided by many organizations like Twitter, Plurk, Yammer, Jaiku, Pownce, Social.Net, App.Net, etc. In this work, we will be concentrating on Twitter, which was among the first services to launch in May 2006, and had 100 million active users in September 2011 [21], and is believed to have more than 500 million registered users as of April 2012 [22].

2.4.3 Twitter

2.4.3.1 Overview

While Twitter is a microblogging site, it has some social networking semantics. Twitter supports and encourages the actions below.

Following: The act of a user subscribing to the updates of another user.
Followers: The subscribers for a specific user.
@reply: A user can use @username to mention other users in a post. Others can view this if the account of the poster is public.
Direct Message (DM): A private message that can be sent to a follower.

The social network model of Twitter is different from many other social networks. Specifically, in Twitter relations between users are directed. A user can follow another without requiring the other user to follow him/her back. It has been noted that only 22% of follows are mutual [23]. The Twitter feed of a given user contains Tweets (Messages

PAGE 18

posted on Twitter) from all users that the current user follows, arranged in a reverse chronological manner.

Twitter allows short 140 character messages to be posted using its service, which can be seen immediately by other people on the service. This allows for Twitter to be a real time stream of information. The real time nature of Twitter has been exploited to spread news. Twitter has been the preferred medium of communication in the Arab Spring [24], the Iran elections [25], and the Occupy Wall Street movement. In certain events like the US Airways Flight 1549 crashing into the Hudson river [26], the death of pop idol Michael Jackson [27], the terrorist attacks in Mumbai [28], the death of terrorist Osama Bin Laden [29] and for every earthquake in the past 3 years [30], Twitter provided news quicker than other news media. As such, Twitter acts as a source of information, allowing users to discover new and interesting content on the Internet.

Although Twitter is a good source of information, there isn't an easy way to organize Tweets, or retrieve Tweets corresponding to a topic of interest. While Twitter introduced hashtags, which are word tags in the body of the Tweet having a syntax of #[word], e.g. #OWS (Occupy Wall Street), they are inline with the Tweet, usually contained at the end, and eat into the 140 character limit, forcing people to use tags that are already short, and limiting the number of tags used per Tweet. In a study using a sampling of Tweets from 2009, researchers from Microsoft [31] found that only 5% of Tweets contain a hashtag. In addition, not all Tweets have hashtags that are pertinent to the content in the Tweet. Currently, there isn't a way to retrieve all the Tweets by a specific user on a specific topic.

2.4.3.2 Research on Twitter

Kwak et al. [23] crawl the Twitter network as of 2009, study network structure, determine influential users and information diffusion, and conclude that Twitter takes after an information sharing network, rather than a social network. Of particular interest to us is the quantitative comparison between CNN Headline News and trending topics on

PAGE 19

Twitter. It was found that though CNN Headlines had better coverage, news of a live broadcasting nature broke out first on Twitter. The authors of [32] study the intent of Twitter users, which includes daily chatter, conversations, sharing information/news, etc. Andre et al. [33] analysed the contents of over 43,000 Tweets to find that 36% of the rated tweets are worth reading, 25% are not, and 39% are middling. Though this is just a sampling of Tweets, it shows that there is significant room for improvement which can be achieved by presenting to users Tweets that have content that they value. In fact, the authors argue that taking a social intervention approach by informing users about content value, audience reaction and emerging norms while leaving the user in control has potential to improve the microblogging experience. While this seems like leaving the human value out of the equation, i.e. one can't speak to people solely about things the listener is interested in, we feel that such an approach might be needed for media like Twitter where some researchers [34] are of the opinion that about 40% of Tweets might be pointless babble.

A contributing factor to the large body of research done in the past few years was the openness of the Twitter platform. Twitter supported academia by allowing unrestricted access to Tweets, and follower/followee information. Recently (September, 2012), Twitter has changed its business model to showing advertisements in the Tweet stream, and is now preventing people from having the same level of access they had before. An article [35] describes how these changes affect research on Twitter.

Higashinaka et al. [36] show that the majority of conversations on Twitter are composed of just two tweets and that this is sufficient to model conversation. Researchers from Yahoo [37] find that Twitter is a very homophilous network, where information diffusion occurs primarily in the same community that the information originated in. A line of research on Twitter focuses on finding the most influential people on the network.

PAGE 20

CHAPTER 3
OVERALL APPROACH

3.1 A Brief Incursion

Online communities are larger today than at any point in history. Services such as Twitter, Jaiku, Facebook, Tumblr, Reddit, etc. have fostered communities passionate about diverse topics. However, every online community suffers from the 1% rule [38], which states that only 1% of the user base actually create content, 9% edit content, and the remaining 90% of a virtual community only consume content. Bill and Mikolaj [39] randomly sampled the Tweets of 300,000 users in 2009 to find that 10% of prolific users create 90% of the Tweets. While these empirical results concerning participation inequality on the Internet seem extreme, we are familiar with the Pareto principle, or the law of the vital few, which states that, for many events, roughly 80% of the effects come from 20% of the causes [40].

Another influence for the design of Palantir is the existence and the rise of citizen journalism. Tools like CNN iReport, Fox uReport, etc. allow citizen journalists to write their own articles. In the case of CNN iReport, these articles have their own separate section easily accessible from the main navigation bar on CNN's website, where users can read through all such contributed articles. It is important to note that iReport allows anonymous articles, and posts all articles posted by every user onto their website, some of which are subsequently verified by CNN reporters, and their website amended to include this information. While CNN iReport has some guidelines that advise people to post articles that are known to be true, Fox uReport provides no such guidance, and does not have a concept of verifying postings. It however includes a section titled editor picks. A common note among these platforms is the inability to mobilize people into creating information that one needs for a report in a timely manner, which we feel is a significant drawback, one that could have the most impact if solved.

PAGE 21

With Palantir, our goal is to stimulate people online to create timely content that could be useful, reducing the skew between content producers and consumers. The basis of this idea is to find sufficiently interested people who are motivated about creating content that is relevant to issues on hand and catalyze them into action. It is designed as the evolution of citizen journalism and reporting from a mostly one sided affair to a dialog.

3.2 An Outline of Palantir

To enable Palantir, we draw on work by Maslow [41] and tap into humans' natural need for esteem and recognition. Palantir is a collaboration platform, where users post about topics they are interested in, while explicitly mentioning topics that their post mentions. The selection of topic(s) for the post is guided by Palantir to ensure that existing tags that are similar are considered first by the user, promoting reuse and borrowing over reinvention of tags. Users are also allowed to apply tags to posts that aren't their own.

The use of tags by a user determines topics that the user is interested in posting about. We use this information to form communities which can then be used to create a flow of information on a particular topic. This is done by having people ask questions which are then posted to the relevant interested communities.

3.3 Challenges

At the heart of Palantir are the methods used to recommend topics to users. Sen et al. [4, 42] show that tag recommendations significantly influence the tags that users choose. As Palantir runs on Twitter, we face the following complexities:

• Modest character limit for posts. Twitter posts (Tweets) are limited to 140 characters, and thus have only about 15 words [39]. Traditional topic modelling algorithms have been designed to work well with large text corpuses. An algorithm like Latent Dirichlet Allocation (LDA) [43] requires as input a text corpus and the number of topics to mine. Algorithms like LDA rely on inferring word distributions in a document to split into topics. When documents contain a small number of words, the resulting performance is poor. In addition to this, the number of topics is assumed to be known a priori.

PAGE 22

• Variances in language. Twitter users use creative shortened words to convey what they mean within the 140 character limit. Many of these words may be nouns not having a canonical form, thus creating problems where, for the same intent, different words are used.

• Twitter API. While Twitter was conducive towards developers when they started (circa 2006), they did not have a clear monetization strategy, and were relying on investor money for their expenses, while spending time and resources to build a great product. However, going forward, Twitter feels that their biggest asset is the data that they have on the system and the eyeballs of the people using Twitter. As of 2012, Twitter monetizes this data stream by providing bulk raw access to Tweets to companies at a pricing that isn't affordable for most individual developers. Twitter severely restricts API use and added new terms of service which prevent sharing collections of Tweets. This year, 2012, Twitter has added further restrictions on the API, which would make scholarly research on Twitter even more difficult. The difficulty faced is the unpredictability of the licensing terms and the pace at which they are revised. Unfortunately, managing risk in a landscape of vacillating and unpredictable licensing poses immense challenges in any research depending on third party services and data.
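The first two challenges above (very short posts, and different words for the same intent) can be made concrete with a minimal sketch. It is not part of Palantir: the two example posts are invented, and the helper names `tokens` and `jaccard` are hypothetical. It simply counts the words in each post and measures how many words the two posts share, which is the kind of word-level evidence that corpus-based topic models depend on.

```python
import re


def tokens(text):
    """Lowercase word tokens; punctuation and digits are dropped."""
    return re.findall(r"[a-z]+", text.lower())


def jaccard(a, b):
    """Jaccard similarity of the word sets of two texts (0.0 to 1.0)."""
    sa, sb = set(tokens(a)), set(tokens(b))
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


# Two made-up posts describing the same event in different shorthand.
tweet_a = "Huge quake just hit downtown, bldgs shaking!!"
tweet_b = "Earthquake in the city centre, buildings are swaying"

for t in (tweet_a, tweet_b):
    print(len(t), "chars,", len(tokens(t)), "words")
print("word overlap:", jaccard(tweet_a, tweet_b))
```

Both posts fit comfortably within 140 characters and describe the same event, yet they share no words at all, so a method that relies only on word co-occurrence has nothing to connect them. This is one way to see why human-applied tags are attractive for such short texts.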

PAGE 23

CHAPTER 4
ARCHITECTURE

4.1 Palantir Architecture

Figure 4-1 shows typical uses of Palantir. Palantir is an abstraction which allows authors to make use of information available on Twitter, while enabling them to turn a trickle of information into a gush by asking questions. People interact with Palantir in the following ways:

1. Submit Tweets and Tags: Using the UI, people can submit Tweets along with tags pertinent to those Tweets. A recommender system aids Palantir users in selecting good tags based on the topic they are contributing to.
2. Search and Follow tags: As Tweets are organized by tags, people search for Tweets they like by following tags that are interesting to them.
3. Correct, Consolidate and Vote: People see the tags that others have applied, and depending on whether they agree with a tag or not, they vote for or against it. Palantir users may also consolidate similar tags and Tweets into a bundle.
4. Write articles using permanent references: A set of Tweets might contain substantial information that is coherent together, and on identifying this, Palantir users use them to form and contribute articles based on them, adding value to these ordinary Tweets.
5. Survey Responses: Sometimes, there might not be enough information on a particular tag, and people could create surveys to gather data. Replies to these requests are composed by Palantir users on mobile devices.

Though the Palantir architecture provides for several broad and useful interactions, the scope of this thesis is limited to the design and evaluation of a tag recommender system, and an implementation that allows for submitting Tweets and tags, and searching and following tags.

4.2 Tweet Tagging and Annotation Services

As mentioned earlier, Andre et al. [33] analyzed over 43,000 volunteer rated Tweets to find that nearly 36% of the rated Tweets are worth reading, 25% are not, and 39% are middling. Though this is just a sampling of Tweets, it shows that there is significant room for improvement which can be achieved by presenting to users Tweets that have content

PAGE 24

they value. Other studies [34] mention that up to 40% of Tweets might just be pointless babble. Palantir relies on Tweets having reasonably accurate tags to support selective exploration and topic following. Palantir also exploits these tags to find pertinent users for surveys and polls. As of 2011, users of Twitter produce on average 1620 Tweets per second, with a record high of 6939 Tweets per second. Each of these Tweets is just 140 characters, and therefore techniques like Latent Dirichlet Allocation [43] and Probabilistic Latent Semantic Analysis [44] are difficult to apply successfully. Palantir partially shifts the onus of determining the topics mentioned in a Tweet to users, by having them tag Tweets when they post them. Users are aided by topic recommendations provided by Palantir, which ultimately shape the vocabulary used in the tag set.

Figure 4-1. Palantir usage patterns

PAGE 25

Figure 4-2. Palantir Architecture

PAGE 26

4.3 Tags in Palantir

We use tags to mark Tweets so that other users may retrieve them easily when they are searching for a specific topic. In addition, tags allow users and their circles to create their own niche tags, which could aid collaboration further. It may be argued that Twitter hashtags serve the same purpose, but they are not as effective when we do not know exactly which topic a Tweet belongs to and want to add multiple tags, in which case hashtags eat into content length. We encourage users to enter tags that are already in the system using prefix matching, or tags that are suggested by the Tweetopic algorithm. The rationale behind this is to reduce the number of tags with different spelling variations.

4.3.1 How Users Tag Tweets

The fundamental contribution of Palantir is the idea of harnessing the activity of people submitting and reading Tweets to allocate topics, called tags, to Tweets. Palantir allows users to tag their own Tweets, or Tweets that have been posted by others. Palantir also allows people to have niche tags, which can only be found by knowing the name of the tag beforehand. These niche tags are not suggested by the system, and may be considered private to a user or to a group of users who know the tag.

4.3.2 Tag Recommender

The motivation for providing recommendations is derived from Sen et al.'s [4] work, which shows that when tags are recommended to people, they tend to select factual tags as opposed to personal or subjective tags. This approach also optimizes user experience on a mobile device in that the user does not have to type out tags when they are recommended correctly. In addition, it reduces the number of single-use tags and tags that are misspelled or have different punctuation, so the quality of tags is maintained.

The Tag Recommender system operates when the user is inputting data into Palantir, and analyzes partial user input to produce candidate tags. Figure 4-3 outlines

PAGE 27

Figure 4-3. Palantir Tag Recommender

PAGE 28

the working of this system. Palantir uses three distinct mechanisms, Geographic Binning, Tweetopic, and Tag Co-occurrence, to generate candidate tags, which are ranked to produce a weighted recommendation list.

4.3.2.1 Text Transformation

We note that search engines give better results when the search query consists of short, specific words that characterize the pages we are searching for. To achieve this, we remove Twitter-specific idiosyncrasies like RT and @username replies. We then use a maximum entropy part-of-speech tagger [45] to identify all the noun phrases in the Tweet.

4.3.2.2 Geographic Location

The intuition behind binning is that we can rewrite noisy latitude and longitude coordinates using a grid system, which allows for a simpler search for nearby neighbors. Palantir collects Tweets and tags submitted by the users along with their physical coordinates. Each Tweet can have a location associated with it, and can be tagged with multiple tags. We define the tag location as the list of locations of the Tweets that use the tag. We map the raw latitude and longitude values into the Universal Transverse Mercator (UTM) grid system [46], which uses a two-dimensional Cartesian coordinate system based on an ellipsoidal model of Earth to give locations on the surface of the earth. Topics are stored indexed by their grid identifier. This allows for easy retrieval of topics that have been used in a specific grid. Further, such indexing makes it easier to look at topics that have been used in nearby grids.

In Palantir, the geographic binning recommender takes in a latitude and longitude, and outputs a set of candidate tags that are nearby.

4.3.2.3 Tweetopic

This algorithm, as described in [18] and shown in Figure 4-4, uses data from the user's Tweet along with data from a search engine to provide prospective tags. In short, the algorithm formulates queries for a search engine and mines the results.
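The geographic binning idea in Section 4.3.2.2 can be sketched in a few lines of Python. This is a minimal illustration rather than Palantir's implementation: it swaps the UTM grid for a plain latitude/longitude grid so that it needs no projection library, and the names `GeoTagIndex` and `grid_id`, the 0.1-degree cell size, and the one-cell search radius are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict

CELL_DEG = 0.1  # assumed cell size in degrees; the thesis uses UTM grid cells instead


def grid_id(lat, lon, cell=CELL_DEG):
    """Map a noisy coordinate to the integer id of the grid cell that contains it."""
    return (math.floor(lat / cell), math.floor(lon / cell))


class GeoTagIndex:
    """Tags indexed by grid cell, so nearby tags can be found by cell lookup."""

    def __init__(self, cell=CELL_DEG):
        self.cell = cell
        self.bins = defaultdict(Counter)  # grid id -> Counter of tag usage

    def add(self, lat, lon, tags):
        """Record the tags of one geotagged Tweet in its grid cell."""
        self.bins[grid_id(lat, lon, self.cell)].update(tags)

    def nearby_tags(self, lat, lon, radius=1):
        """Return tags used in the cell of (lat, lon) and in cells within
        `radius` cells of it, most used first (ties broken alphabetically)."""
        row, col = grid_id(lat, lon, self.cell)
        totals = Counter()
        for d_row in range(-radius, radius + 1):
            for d_col in range(-radius, radius + 1):
                # .get avoids creating empty bins while probing neighbours
                totals.update(self.bins.get((row + d_row, col + d_col), Counter()))
        return sorted(totals, key=lambda tag: (-totals[tag], tag))


index = GeoTagIndex()
index.add(40.7128, -74.0060, ["debate", "nyc"])  # lower Manhattan
index.add(40.7306, -73.9866, ["debate"])         # a neighbouring cell
index.add(34.0522, -118.2437, ["lakers"])        # Los Angeles, outside the radius
print(index.nearby_tags(40.75, -73.99))
```

Because lookups touch only a fixed number of cells, the cost of a recommendation does not grow with the total number of stored Tweets, which is the practical appeal of indexing topics by grid identifier.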

PAGE 29

TWEETOPIC[nounPhrase]
  results ← search nounPhrase on Google
  if results.length < 10
    then tokNouns = TOKENIZE(nounPhrase)
      for each noun in tokNouns
        do noun.result ← no. of results by searching only for that noun
      sort tokNouns based on noun.result
      resultsMax ← NIL
      resultsMin ← NIL
      while resultsMax.length < 10
        do
          resultsMax ← search tokNouns − tokNouns.minimum
          tokNouns ← tokNouns − tokNouns.minimum
      while resultsMin.length < 10
        do
          resultsMin ← search tokNouns − tokNouns.maximum
          tokNouns ← tokNouns − tokNouns.maximum
      results ← resultsMax ∪ resultsMin
  for each url in results
    do keywords ← keywords ∪ TF-IDF(url.Text, 20)
  sort keywords by number of occurrences of words
  return top 5 unique keywords

Figure 4-4. Tweetopic Algorithm

Query a Search Engine

The noun phrases obtained from the previous step are sent to a search engine. An iterative back-off is used to adjust the query until at least 10 results are obtained.

Identify Popular Terms in the Results

TF-IDF is used to identify about 20 keywords per page. Key terms that occur more than 5 times are considered valid descriptors for the Tweet.
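The last step of Tweetopic, pooling the per-page keywords and keeping the terms that recur, can be sketched as follows. This is only an illustration under stated assumptions: the per-page TF-IDF keyword lists are taken as already computed, and the function name `popular_terms`, the lower-casing of terms, and the alphabetical tie-break are choices made for the example rather than details given in the thesis.

```python
from collections import Counter


def popular_terms(page_keywords, min_occurrences=5, top_n=5):
    """Pick Tweet descriptors from per-page keyword lists.

    page_keywords: one list of keywords per result page (assumed to be the
    roughly 20 TF-IDF keywords already extracted for that page).
    A term qualifies when it occurs more than `min_occurrences` times overall.
    """
    counts = Counter(
        term.lower() for keywords in page_keywords for term in keywords
    )
    qualifying = [term for term, n in counts.items() if n > min_occurrences]
    # Most frequent first; alphabetical order breaks ties deterministically.
    qualifying.sort(key=lambda term: (-counts[term], term))
    return qualifying[:top_n]


# Made-up keyword lists for nine result pages.
pages = [["Obama", "debate", "tax"]] * 4 + [["obama", "debate"]] * 3 + [["romney"]] * 2
print(popular_terms(pages))
```

In this toy input only "debate" and "obama" occur more than five times across pages, so they are the only terms returned.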

PAGE 30

The Tweetopic algorithm does not learn directly from the tags entered into the system, and cannot determine which tags are already in the system. Palantir uses prefix matching to display preexisting tags when the user types in a tag not shown by Tweetopic. An advantage of using Tweetopic is that the system does not have a cold start, and provides tag recommendations even for Tweets which do not yet have corresponding tags in the system.

4.3.2.4 Tag Co-occurrence and Prefix Matching

Tag co-occurrence grades pairs of tags based on their association with each other. We assume that tags are similar and share traits if they occur within a similar context, i.e. they are used to describe the same Tweet.

Each tag $t$ is numbered, and is represented as a sparse co-occurrence vector in multidimensional space, $w = (f_1, f_2, \ldots, f_N)$, where $f_i$ indicates how often $t$ occurs with tag $t_i$. The similarity of two tags is measured by the proximity of their vectors. We use cosine similarity to measure proximity, which is given by

$$\cos(\vec{x}, \vec{y}) = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}}$$

The top $n$ similar tags are chosen to be presented to the user.

Prefix matching works by considering the tag entered by the user as a prefix, and fetching all tags already in the database which have the same prefix.

4.3.2.5 Tag Ranking

We adopt the notion that a tag suggested by several different mechanisms is more important. To place such tags first, we pool the tags identified by all the modules, count how often each tag occurs, and discard duplicates. Tags with a frequency greater than one are presented in decreasing order of frequency. For tags that have a frequency of one, we compare them with the set of tags the user has used before and present those first, followed by the remaining tags, ordered by their global frequency counts.
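A rough sketch of the cosine similarity measure and the ranking rule from Sections 4.3.2.4 and 4.3.2.5 is given below. The function names, the dictionary representation of sparse vectors, and the tie-breaking rules (global count, then alphabetical order) are assumptions for illustration; the thesis does not specify how ties are resolved.

```python
import math
from collections import Counter


def cosine_similarity(x, y):
    """Cosine similarity of two sparse co-occurrence vectors stored as dicts
    mapping tag -> co-occurrence count (absent keys count as zero)."""
    dot = sum(value * y.get(tag, 0) for tag, value in x.items())
    norm_x = math.sqrt(sum(v * v for v in x.values()))
    norm_y = math.sqrt(sum(v * v for v in y.values()))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # assumed convention for a tag with no co-occurrences
    return dot / (norm_x * norm_y)


def rank_tags(candidates_by_mechanism, user_tags, global_counts):
    """Order candidate tags: tags proposed by several mechanisms first (most
    votes first), then single-vote tags the user has used before, then the
    rest by global frequency."""
    votes = Counter()
    for candidates in candidates_by_mechanism:
        votes.update(set(candidates))  # each mechanism votes once per tag

    def sort_key(tag):
        if votes[tag] > 1:
            return (0, -votes[tag], 0, 0, tag)
        previously_used = 0 if tag in user_tags else 1
        return (1, 0, previously_used, -global_counts.get(tag, 0), tag)

    return sorted(votes, key=sort_key)


# Hypothetical output of the three mechanisms for one Tweet.
mechanisms = [
    ["debate", "obama", "nyc"],   # e.g. geographic binning
    ["debate", "tax"],            # e.g. Tweetopic
    ["obama", "debate", "jobs"],  # e.g. tag co-occurrence
]
ranked = rank_tags(mechanisms, user_tags={"jobs"}, global_counts={"tax": 10, "nyc": 3, "jobs": 1})
print(ranked)
```

Here "debate" and "obama" lead because more than one mechanism proposed them, "jobs" follows because the user has applied it before, and "tax" precedes "nyc" on global frequency.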

PAGE 31

Figure 4-5. Tweet Entry Screen

PAGE 32

Figure 4-6. Tag Suggestion Screen

PAGE 33

4.4 User Interface

Tweets posted through Palantir are tagged by users, aided by the tag recommender. When users compose Tweets on the mobile client, tags are computed periodically after the input of n words, and are then displayed on the screen, allowing the user to select some or enter new ones. If the system is able to recommend more than 5 tags, users can get to the next set of results by flicking the result bar. To enter a tag that is not recommended, the user clicks on "Add New" and is allowed to enter a custom tag. When users are in the process of entering custom tags, they are shown tags that match by prefix, to further enable them to pick a tag that is already in the system. However, they are not restricted to any vocabulary, and are free to complete their tags in any way that they see fit.

Palantir provides immediate feedback to users tagging Tweets by allowing them to view Tweets described by similar tags, and by giving them an opportunity to retroactively change the tags that they selected for the Tweet they submitted.

Palantir allows people to browse others' Tweets, possibly constrained by a geographic location or by tags. While doing so, users are allowed to add tags to or remove tags from Tweets. This information is stored to allow Palantir to determine the membership of a tag in a particular Tweet, which is the ratio of users who don't remove the tag.

4.5 How Palantir Uses Tags

4.5.1 Create a User Interest Profile

For each user, we compute the top k most frequently used tags. This forms the feature set for a particular user; we call it frequentSet. We incorporate feedback given by other users (by virtue of removing tags) by computing the percentage of tags that were removed by other users of the system, and a top-k list of these tags, called disputedSet. These sets are recomputed whenever the user makes a post. In particular, the set difference of frequentSet and disputedSet gives us an idea of the topics posted

PAGE 34

by this user that are accepted by other users. This set, called acceptedSet, is the computed user profile, which is used to determine which polls to target at this user.

4.5.2 Searching, Topic Following

Using the web interface, users would follow Tweets by searching for topics that interest them. The results would be updated whenever people post new Tweets. This interface allows users to reference Tweets and consolidate them into stories. The system makes use of tag co-occurrence to show Tweets from related topics as well.

4.5.3 Tag Consolidation

When the number of unique tags in the system reaches a specified size, we compute the semantic similarity of the words in the tags. Groups of words are presented to users when they search for topics in the web application, and users are allowed to give input on whether these words should be merged. When there is strong agreement between users, the tags are merged and saved.
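The interest profile of Section 4.5.1 reduces to a few set operations. The sketch below is one possible reading, not the thesis implementation: it assumes that the "percentage of tags removed" is computed per tag as removals divided by uses, and the names `interest_profile` and `top_k` and the sample data are invented for the example.

```python
from collections import Counter


def top_k(scores, k):
    """Return the k highest-scoring keys as a set (ties broken alphabetically)."""
    ordered = sorted(scores, key=lambda tag: (-scores[tag], tag))
    return set(ordered[:k])


def interest_profile(used_tags, removed_tags, k=3):
    """Compute (frequentSet, disputedSet, acceptedSet) for one user.

    used_tags: every tag the user has applied, with repeats.
    removed_tags: one entry for each time another user removed one of them.
    """
    used = Counter(used_tags)
    removed = Counter(removed_tags)
    frequent_set = top_k(used, k)
    # Assumed reading of "percentage removed": removals / uses, per tag.
    removal_rate = {tag: removed[tag] / used[tag] for tag in used if removed[tag] > 0}
    disputed_set = top_k(removal_rate, k)
    accepted_set = frequent_set - disputed_set
    return frequent_set, disputed_set, accepted_set


used = ["politics"] * 5 + ["debate"] * 4 + ["lol"] * 3 + ["nyc"]
removed = ["lol", "lol", "nyc"]
frequent, disputed, accepted = interest_profile(used, removed, k=3)
print(sorted(accepted))
```

With this sample, "lol" is frequent but disputed, so only "debate" and "politics" remain in the accepted set that would drive poll targeting.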

PAGE 35

CHAPTER 5
EXPERIMENTATION AND EVALUATION

Palantir relies on user input for its tasks, namely tag recommendation, polling and fact consolidation. One method of evaluating it is to publicize Palantir and find a group of volunteers willing to create data that we could use for evaluation. This is a tough undertaking, mainly because of the effort involved in recruiting volunteers and iterating over algorithms and experiments.

We believe that a labor market for micro-tasks, like Amazon Mechanical Turk (AMT), is well suited for our experiments. Subsubsection 5.1.1.2 explores crowdsourcing with a focus on AMT, existing research that makes use of that platform, our reasons for choosing AMT, and finally the experiments we ran on AMT, along with a discussion of the results.

5.1 Crowdsourcing

Crowdsourcing, a term coined by Jeff Howe in 2006, was described by him as "the process by which the power of many can be leveraged to accomplish feats that were once the province of a specialized few" [47]. Howe used examples of people contributing to Wikipedia and uploading videos to YouTube to demonstrate the concept. Today, the term crowdsourcing more commonly refers to online services where publishers post tasks that are completed by a group of people to fulfill the requirements of the publisher. The tasks are diverse, and marketplaces like Amazon Mechanical Turk [48] have abundant workers to ensure timely completion of tasks.

5.1.1 Amazon Mechanical Turk (AMT)

Amazon Mechanical Turk (AMT) [48] is an online labor marketplace that focuses on assisting developers in using human input for their programs. Typically, the human input required is for simple tasks that cannot yet be done algorithmically by computers in a cost- and time-efficient manner. Some of these problems are easily solved by humans after minimal training. An example of such a microtask might be to look at two photos of people taken under different conditions to determine whether they

PAGE 36

are the same person. AMT uses the tagline Artificial Artificial Intelligence to brand their service, stemming from the fact that AMT could be used to perform tasks that Artificial Intelligence cannot.

5.1.1.1 Basic Terminology

Workers
Workers are humans who select and complete one or multiple microtasks on AMT. Workers are paid with rewards that are deposited into Amazon Payments accounts, and may be withdrawn as cash. The reward or wage is determined individually for each task group by requesters, and payment is subject to approval by requesters.

Requesters
Requesters are people who publish tasks which are to be completed by workers. The requester designs the tasks and the steps needed for their completion, and decides the reward and the acceptance criteria for each task. Requesters may also limit the visibility of tasks depending on the qualifications of workers and the range of their previous acceptance ratio.

HIT
A Human Intelligence Task (HIT) is a job posted on AMT. Tasks that are easy and well defined produce good results. Tasks range from selecting good pictures of storefronts and identifying addresses to writing product descriptions.

HIT Type
The HIT Type refers to the characteristics of the HIT, viz. the title of the HIT, the requester who created the HIT, the reward being offered, the number of HITs of this type, the time for completion of the HIT, the auto-approval time after which workers get paid automatically, the qualifications of workers who can accept the HIT, and the HIT expiry time.

HIT Group

PAGE 37

A HIT Group comprises HITs of the same type. This enables workers to easily find similar HITs. Workers prefer HIT groups with a large number of tasks because they do not have to retrain, and can better their skill while picking up jobs.

Assignment
AMT supports having multiple workers working on a replica of the same HIT. Each of these replicas is called an assignment. AMT ensures that workers can only complete a single assignment for a HIT. This allows requesters to evaluate the quality of submissions by looking at how other workers have completed the same task.

Qualifications
These are requirements the worker must meet to work on HITs. Qualifications can be auto-generated by AMT or can be created by requesters. Auto-generated qualifications include criteria like the approval percentage of the worker, the number of assignments completed successfully, etc., while custom qualifications are those that are created by requesters, and are usually time-bound tasks that need to be completed according to the requester's specification to be granted. Requesters are also able to grant qualifications to workers who have previously worked for them. Sometimes multiple qualifications may be required for a given HIT.

Reward
Reward is the wage paid to a worker for completing a HIT. On approval of a HIT by a requester, rewards are automatically transferred from the prepaid Amazon Payments account of the requester to the account of the worker.

Lifecycle of a HIT
For a requester, the first step is to register an account with Amazon payment services using a US-based credit/debit card. The role of the requester is creating

PAGE 38

HITs which are subsequently put on the AMT marketplace. A HIT may be created in one of three ways, viz. using the Requester User Interface (RUI), using the AMT Application Programming Interface (API) or using the AMT Command Line Tools (CLT). Regardless of the technique used to create a HIT, the requester needs to provide a title, reward, time for completion of the HIT, number of assignments, auto-approval time, qualifications, and HIT expiry time. The HIT itself can be hosted on Amazon Web Services (AWS) or hosted externally. For a HIT submitted using the RUI and hosted on AWS, the requester provides a HIT template with placeholders for data, which are then filled from a data file before being sent out to workers. Requesters need to prepay the maximum amount which can be consumed by their HIT groups before they appear in the marketplace. The minimum reward for a HIT is $0.01, with Amazon charging a 10% commission on all payments made using its platform. The minimum commission charged is $0.005. Once HITs have been posted, they appear on the worker interface, which is by default sorted by the most recent time a HIT has been posted or updated. Workers then select one or multiple jobs that are of interest, and complete the tasks. When the worker is done with one task in a HIT group, he/she is given the option to complete another task with the same HIT type. One way to control this is to have a HIT group with a number of assignments, where each worker is limited to submitting a single assignment. Once HITs have been completed by workers, requesters can review submissions and approve or deny payments. At this point, requesters may optionally present a bonus to a worker, or set of workers. For workers on AMT, the ratio of approved to submitted assignments is calculated and displayed along with their profiles. Requesters have the option of restricting HITs to only workers meeting a certain ratio. To deal with workers whose performance is consistently unsatisfactory, AMT allows requesters to block workers. AMT does not provide any such metrics for requesters, leaving workers without a way to rate requesters.
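As a worked example of the pricing rules just quoted (a $0.01 minimum reward, a 10% commission, and a $0.005 minimum commission, all as of 2012), the amount a requester would need to prepay can be estimated as below. The function name `hit_cost` is hypothetical, and applying the minimum commission per assignment is an assumption; the text does not say at which granularity the minimum applies.

```python
from decimal import Decimal

# Figures quoted in the text for AMT in 2012; they may no longer hold.
MIN_REWARD = Decimal("0.01")
COMMISSION_RATE = Decimal("0.10")
MIN_COMMISSION = Decimal("0.005")


def hit_cost(reward, assignments=1):
    """Estimate the prepayment in dollars for one HIT with the given reward
    per assignment. Assumes the minimum commission applies per assignment."""
    reward = Decimal(str(reward))
    if reward < MIN_REWARD:
        raise ValueError("reward is below the minimum of $0.01")
    commission = max(reward * COMMISSION_RATE, MIN_COMMISSION)
    return (reward + commission) * assignments


# A HIT paying 25 cents with 2 assignments.
print(hit_cost("0.25", assignments=2))
# A one-cent HIT, where the minimum commission dominates.
print(hit_cost("0.01"))
```

Decimal arithmetic is used so that sub-cent commissions are not distorted by binary floating point.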

PAGE 39

Community forums like Turker Nation [49] and Turkopticon [50] step in to provide ratings and guidance about requesters. In general, workers prefer to work with requesters who have well-defined HITs with clear acceptance criteria, and prompt approval and payment practices. If a requester has ambiguous practices that hurt the interests of workers, they may be blacklisted on a forum like Turker Nation [49], causing them a lot of difficulty in getting work done using AMT. A HIT stops appearing on the worker interface either when all HITs posted in that group have been completed, or when the time assigned for the expiry of the HIT has passed. The expiry time and other parameters can be changed by a requester while the HIT is still running, which causes the HIT to be updated and listed at the top of the worker interface.

5.1.1.2 Related Work

Crowdsourcing is a relatively new concept that researchers have been toying with. In 2004, Luis von Ahn and Laura Dabbish came up with the ESP Game [51], which made use of crowd wisdom to annotate images. They estimated that 5000 people playing the game for 31 days would assign labels to all images indexed by Google. In a novel attempt, Michael Denton [52] used crowdsourcing to create art on a public sidewalk. More traditional uses of crowdsourcing are data collection and sensing. Apple collects Wi-Fi signal strength along with GPS coordinates to make their mapping services accurate. Waze collects velocity information to provide real-time traffic data to motorists. Google enriches their maps with user-submitted photos pinned down to specific coordinates.

Yahoo researchers Mason and Suri [53] delve deeply into using Amazon Mechanical Turk for behavioral research. They examine a plethora of work comparing AMT to traditional offline tests and to other tests administered online, and summarize the results. They notice that results from well-designed studies on AMT are qualitatively and quantitatively the same as those conducted in lab settings. Worker

PAGE 40

demographics given by Suri and Watts [54] show that the average age of a worker is 30 years, with 55% being female and 45% being male, and with the majority of turkers from the USA and India. They also detail their experience conducting synchronous experiments in AMT using waiting rooms.

AMT has been put to a variety of creative uses by developers and researchers. Bernstein et al. [55] rely on TurKit's [56] AMT algorithms to iteratively refine text inside Microsoft Word. Their system, Soylent, provides human-powered text shortening, proofing, and an interface for requesting arbitrary word processing tasks. CrowdDB [57] presents an open-world database model where queries for missing information are transparently converted into AMT HITs, and crowd results are aggregated into the results presented to the user. VizWiz [58] empowers people with limited visual acuity to perform visual search by harnessing the cognitive skills of humans on AMT. VizWiz on a mobile device allows the user to snap a picture, verbally ask a question about it, and get a human to reply in minutes.

5.2 Experiments

This section presents results from experiments run with Palantir on AMT using data from Twitter. Using the Twitter Streaming API, we collected about 270,000 Tweets from Twitter during the second presidential debate, between 9:00 PM and 10:30 PM EST on October 16, 2012; these form our base dataset. We found that a significant percentage of these Tweets were retweets, which were eliminated. To ensure that we captured only English Tweets related to politics, we use the word list provided in Table A-1. We then constrain the Tweets to those that originated from the state of New York. This process reduced the number of Tweets to 245, which were used for running 455 HITs on Mechanical Turk between October 16 and October 24, 2012. We designed experiments to mimic ways in which Palantir could be used, and to evaluate user behavior when using Palantir's keystone, its tag recommendation system. We also vary parameters specific to AMT, viz. price and jobs per HIT, and report on the time taken to complete tasks, and the

PAGE 41

quality of responses. Workers who had an approval rating of at least 98% and could demonstrate a basic understanding of Twitter concepts and terminology were chosen for this task.

Figure 5-1. Experiment 1: Palantir baseline. (A) Stage 1. (B) Stage 2.

We note that we are using the same Tweet data corpus for the different experiments so that comparisons may be drawn. However, the functioning of AMT cannot be stringently controlled, and results derived using AMT are dependent on the properties of AMT at the time of the experiments. Specifically, the portion of the crowd doing the experiments determines the results. This might cause variance in the results every time an experiment is run with a different portion of the crowd. In addition, the design of the specific tasks and the assignment of HITs significantly affect results. Despite this, we believe that we can get some interesting insights from these experiments.

5.2.1 Experiment 1: Palantir Baseline

Our initial evaluation seeks to establish the cold baseline performance of Palantir's tag recommender system. We run this experiment to see how well Palantir's tag recommender works when there is no previous data in Palantir.

PAGE 42

This experiment is run in two stages. In the first stage, we run Palantir's tag recommender on every Tweet in our corpus, and set the number of tags to 5. As there are no existing tags in the system, the results of this experiment are solely contributed by our implementation of the Tweetopic algorithm, as described in Figure 4-4. In stage two, to get an idea of the qualitative performance of this algorithm, results were sent to AMT (5 jobs per HIT, 2 assignments, 25 cents per HIT), asking workers to pick out useful tags from the ones generated by Palantir. For each Tweet, workers were asked to input the usefulness of every tag on a 5-point scale. The workflow for this experiment is shown in Figure 5-1. To take part in this experiment, workers had to have an approval rating of above 98% and demonstrate adequate understanding of Twitter and tagging, as determined by a custom qualification.

5.2.1.1 Experimental Data

A histogram depicting the distribution of scores is shown in Figure 5-2B. We use a tag rating of 0 to signify that the tag is blank, while ratings 1 to 5 range from 'not useful' to 'most useful'. We found that just 7.1% of tags supplied by Palantir were marked as most useful, and 6.2% were marked as useful. The tag cloud corresponding to the 338 unique tags generated by Palantir is shown in Figure 5-2A, while the popularity of tags is shown in Figure 5-3A.

Workers took about 19 hours to complete this job, with an average completion time per HIT of 5.9 minutes.

5.2.1.2 Analysis

We find that Palantir performs inadequately at recommending tags to users when it is started without any data. We note that a majority of turkers strongly felt that tags were unrelated to the Tweets presented. We posit that the rather long time to complete this set of HITs was because we were new requesters on the AMT marketplace. Workers are cautious about new requesters, as requesters have the ability to arbitrarily reject work done by workers, which impacts a worker's approval rating negatively. In fact, the

PAGE 43

first results for this batch of HITs started coming in only after we introduced the HITs on Turker Nation [49], providing a short description of what they are used for and how we would evaluate them. Some workers are apprehensive about submitting work to requesters who evaluate their work using majority rules, perhaps due to a perceived threat that their work could be rejected even if they are correct.

5.2.2 Experiment 2: Unguided Human Baseline

This experiment, depicted in Figure 5-4, is used to determine the baseline performance of AMT for tagging a Tweet. Our goal is to examine the behavior of people annotating Tweets without any guidance or recommendations. We are interested in finding out the quantity and quality of these tags, and how useful the community perceives these tags to be. This experiment runs similarly to Experiment 1, but with a different first stage. As we are benchmarking the performance of the crowd, the first stage is changed so that tags are generated by turkers. This HIT provides a Tweet, and asks the turker to suggest up to 5 tags for it. The methodology of stage two of this experiment is identical to that of the first experiment. We hope to measure community acceptance of tags generated by turkers.

5.2.2.1 Experimental Results

A histogram of worker ratings is presented in Figure 5-5B, and the tag cloud is shown in Figure 5-5A. We find that, on average, 14.94% of the tags generated by the turker population working on the first stage were marked as most useful, and 10.16% were marked as useful, by turkers working on the second stage. It is also interesting to note that 19.36% of the spaces for tags were left blank, indicating that fewer than five tags were required for some Tweets. For stage one, when we paid the turkers a wage of 27 cents per job, the batch was completed in 19 hours, with the average completion time per Tweet being 7 minutes. This batch was submitted to AMT along with the first experiment. For the second stage, when we paid the turkers a wage of 27 cents per job, the batch was completed in 1 hour and 40 minutes, with the average completion time per Tweet

PAGE 44

Figure 5-2. Experiment 1: Results. (A) Tag cloud. (B) Usefulness of tags for E1: Palantir baseline (x-axis: usefulness, 0 to 5; y-axis: number of tags).

PAGE 45

Figure 5-3. Experiment 1: Results. (A) Tag popularity for E1: Palantir baseline (x-axis: tags tested; y-axis: popularity, logarithmic scale).

Figure 5-4. Experiment 2: Tweets tagged using only AMT. (A) Stage 1. (B) Stage 2.

PAGE 46

being 14.7 minutes. This differs significantly from the time taken by the second stage of experiment one, by +8.8 minutes.

Figure 5-5. Experiment 2: Results. (A) Tag cloud. (B) Usefulness of tags for E2: Human baseline (x-axis: usefulness, 0 to 5; y-axis: number of tags).

5.2.2.2 Analysis

We see that, for a good part, humans in the AMT community agree about the quality of tags applied by their peers. Only 3.1% of tags were marked as not useful. There were 303 unique tags generated by humans in this experiment, similar to what the Palantir baseline generated. However, there is a significant difference in the multiplicity of the tags, as revealed by the respective tag clouds in Figure 5-5A and Figure 5-2A. We

PAGE 47

also compare the tag popularity of these two experiments, as shown in Figure 5-3A and Figure 5-6A, to note this difference. We see that while the number of unique tags is similar, there are a few tags which are considerably more popular than their alternatives. The fact that these alternative tags aren't discoverable proves unfavorable to them, effectively burying them. The additional time of about 9 minutes taken per job was puzzling to us until we communicated with a worker, who informed us that when workers accept multiple jobs, the timer signifying that a job was started is triggered (even if they did not start working on the job). This shows that relying on job completion time as a metric for AMT experiments should be a well-considered decision.

Figure 5-6. Experiment 2: Results. (A) Tag popularity for E2: Human baseline (x-axis: tags tested; y-axis: popularity, logarithmic scale).

5.2.3 Experiment 3: Heated Palantir

The goal of this experiment is to measure the quality of tags produced by Palantir's tag recommender when it has prior data about tags that have been used. It also

PAGE 48

illustrates the influence of guidance, showing us the number of times people pick a tag that exists as opposed to creating their own tags.

Figure 5-7. Experiment 3: Tweet tags recommended by Palantir and validated by AMT

We use the results of the previous experiment, described in Subsection 5.2.2, to populate the data structures used by Palantir. To realistically model Palantir's use case, we run this experiment in a single stage, with Palantir providing tag recommendations. The experiment is depicted in Figure 5-7. This HIT was run with 1 job per assignment, with each job containing 5 Tweets.

5.2.3.1 Experimental Results

PAGE 49

Figure 5-8. Experiment 3: Results. (A) Tag cloud. (B) Usefulness of tags for E3: Heated Palantir (x-axis: usefulness, 0 to 5; y-axis: number of tags).

PAGE 50

Figure 5-9. Experiment 3: Results. (A) Tag popularity for E3: Heated Palantir (x-axis: tags tested; y-axis: popularity, logarithmic scale).

Figure 5-10. Experiment 3: Histograms showing the variation in the usefulness distribution of Tweets when only the first 5, 4, or 3 tags presented by Palantir are chosen. (A) All tags. (B) First four tags. (C) First three tags. (x-axis: usefulness, 0 to 5; y-axis: number of tags.)

PAGE 51

Figure 5-8B shows the distribution of worker ratings for the tags generated by Palantir when it has access to a tag corpus. The tag cloud corresponding to this experiment is shown in Figure 5-8A. This tag cloud is generated by taking into account overrides by turkers. In cases where a tag recommended by Palantir is replaced by the turker, we use the replacement tag. In the histogram, we see that 19.7% of tags are rated as most useful by workers, while 10.8% are rated as useful. The percentage of tags marked as not useful is 18.26%, which is much better than the initial performance of Palantir in experiment 1, where we saw this metric was as high as 36.9%. Turkers were paid 42 cents, and took 19 hours and 30 minutes to complete this batch. The average response time was 4.85 minutes.

5.2.3.2 Analysis

At first glance, it is surprising to note that the number of tags marked as most useful has increased compared to the previous experiment, which involved humans rating tags created by other humans. We feel that this can be attributed to the fact that while humans put in only the most relevant tags pertaining to a Tweet, Palantir recommends a higher number of tags, providing a chance for more tags to be relevant. This also causes the percentage of tags that are marked as 'not useful' to rise to 18.2%. We believe that this is a good tradeoff compared to having to type in more tags, the cost of which would be more pronounced on a space-constrained mobile device. Figure 5-10 explores the behavior of Palantir's tag recommender system when we constrain it to five, four and three tags. We find that tags are somewhat equally distributed across all bins, and that this does not significantly change the distribution. The tag cloud in Figure 5-8A and the tag distribution in Figure 5-9A show that there is a larger set of popular tags compared to the previous experiment. We feel that recommending tags makes people aware of the options they have before they invent their own tags, which may not have mainstream appeal. This is evidenced by the fact that only 217 unique tags were applied to Tweets in

PAGE 52

this experiment, as opposed to 303 unique tags in experiment 2, and 338 unique tags in experiment 1.

Figure 5-11. Experiment 4: Synonym detection on AMT

5.2.4 Experiment 4: AMT Synonym Detection

This experiment is different from the preceding experiments in that we don't ask turkers to generate new tags. Instead, we ask them to split tags that have been applied to a specific Tweet into buckets. The goal of this experiment is to determine how many of the tags that have been applied to a single Tweet are words that might be considered synonyms by the community. It is important to note that the conventional method of

PAGE 53

using a synonym dictionary like WordNet [59] fails here, as some of the words might not be considered synonyms without context, or don't have synonyms. For example, one tagging community might consider football and rugby to be equivalent, while another community considers football and soccer to mean the same thing. An advantage of using AMT for this is that turkers also do entity resolution, consolidating tags like Samsung galaxy s3 and sgs3.

Figure 5-12. Experiment 4: Similar Words. (A) Synonym groups distribution (x-axis: synonym groups; y-axis: Tweets). (B) Tags distribution (x-axis: tags; y-axis: synonym groups).

This HIT is structured as shown in Figure 5-11, wherein a Tweet is presented with the tags applied to it. These tags have been consolidated per Tweet from previous experiments. The interface presents workers with 5 buckets where they can input tags. Tags may be repeated in different buckets.

5.2.4.1 Experimental Results

Figure 5-12 describes the result of this experiment. We see that there is an average of 2.58 tag groups per Tweet, having on average 2.21 tags per group. When we paid 35 cents for this task, workers took 10 hours to complete it, with the average response time being 14.5 minutes.

5.2.4.2 Analysis

As seen from the results of this experiment, the same tagging community may use multiple words which have similar meanings to tag Tweets. The fact that the community

PAGE 54

is able to partition tags into buckets, on an average of 2.58 buckets per Tweet, suggests that when people apply more than 3 tags to a Tweet, they are using multiple synonyms to tag the same Tweet. This may be brought down by ensuring that synonyms are not automatically suggested by the system, as they happen to be now.

5.2.5 Summary of Results

We see that the performance of Palantir, as measured by the percentage of tags people rated as 'most useful' and 'useful', rises dramatically once prior tag data is loaded, while the percentage of tags rated as 'not useful' falls to about half of what it is when there is no data loaded. In addition, we note that there are only 217 unique tags when tagging using Palantir, as compared to 303 when unguided humans tag. While we aren't able to comment on the time taken by the AMT community to apply these tags, we expect that tagging using Palantir would be quicker compared to using an unguided approach.
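The percentages compared across the experiments are shares of a usefulness histogram over ratings 0 (blank) to 5 (most useful). A small sketch of that bookkeeping is shown below; the function name `rating_shares` and the sample ratings are made up for illustration and are not the thesis data.

```python
from collections import Counter


def rating_shares(ratings):
    """Return the percentage of ratings falling in each bin 0..5, where 0
    means the tag slot was left blank and 5 means 'most useful'."""
    if not ratings:
        raise ValueError("no ratings supplied")
    counts = Counter(ratings)
    total = len(ratings)
    return {score: 100 * counts[score] / total for score in range(6)}


# Made-up worker ratings for ten tag slots.
sample = [5, 5, 4, 1, 0, 3, 3, 3, 2, 1]
shares = rating_shares(sample)
print(shares[5], shares[1], shares[0])
```

Comparing the share in bin 5 or bin 1 between two runs is the kind of comparison the summary above makes between the cold and the pre-populated recommender.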


CHAPTER 6
CONCLUSION AND FUTURE WORK

6.0.6 Conclusion

In this thesis, we described Palantir, an architecture that uses collective human intelligence in microblogging as a means to achieve coherent snapshots of real world events. We studied the feasibility of recommending tags to an online microblog community, and measured how useful the tags were for the community. We noticed that when humans are left to tag microblog posts without any guidance on the content of tags, they select tags that are either very general or too specific, indicating trouble in selecting a tag with the right degree of selectivity. Picking a tag that is highly specific to a Tweet is a setback to the tag's popularity, as it cannot be applied to most other Tweets. On the other hand, there are cases where tags specific to a user have helped that user retrieve Tweets that are of interest to that user alone. Palantir assists users by showing candidate tags which may be easily applied. For users new to the system, this guidance could be important in getting them to engage directly and immediately. Palantir, even when inaccurate, passively increases the user's knowledge of the tags existing in the system, contributing to serendipitous discoveries of tags and interests.

6.0.7 Future Work

Palantir was conceptualized with the goal of persuading people to participate in content creation during times when such content is most crucial. While the tag recommender system allows people to join in on conversations around a particular tag, and to discover new tags to contribute to, Palantir can go much farther by enabling the other scenarios summarized in the beginning of Chapter 4. We highlight some of these possibilities below.

6.0.7.1 Content Syndication

People using Palantir are already plugged into a stream of information generated by Twitter, which has been filtered by topics that are of interest to them and topics they contribute to. We can foster citizen journalism by enabling Palantir users to form ad hoc groups to report on a specific event. By constructing Palantir as a service, we could create clients on a variety of devices. People using Palantir on different devices may use it for different purposes. On a device like a desktop, which has a big screen and a specialized text input device, a user may comb through Tweets of interest to pick ones that could be woven into a story. A mobile user may improve Palantir by applying tags to Tweets and reporting on events. A symbiotic relationship may be established between these groups of people, where users with mobile devices report from the site of the event while those on more powerful machines channel these tidbits into information that can be readily digested by outsiders.

6.0.7.2 Survey Creation

A way to make the previous approach more robust would be to ask more people for information, as supplemental viewpoints may paint a holistic picture. Further, mobile devices are great for content consumption, and people can use them to read through such articles and tag, comment on and rate them. Because Palantir already knows users' locations and the tags that they have contributed to most, it might be possible to target these questions to the subset of people who are likely to be more interested in them. With the advent of push notifications, high speed data, and the computation power available in today's portable mobile devices, we believe that we are no longer shackled by hardware capabilities. However, we would need to develop sophisticated ways of managing the reputation of these users in the system, and provide human stewardship to sustain and grow the community. Of significant importance is the ability to filter out misinformation from Tweets. A reputation management system that provides good reach to new people constructively using the system, while simultaneously preventing established users from twisting current facts, would work well for this purpose. It would be an interesting challenge to build a trusted system that is expressive while not being overly constrained.
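The survey-targeting idea in this section, choosing recipients by location and by the tags they contribute to most, can be illustrated with a small ranking sketch. Everything here is an assumption made for illustration: the `User` record, the 50 km radius, the doubling of the score for nearby users, and the function names are hypothetical and are not part of Palantir as described.

```python
from dataclasses import dataclass, field
from math import asin, cos, radians, sin, sqrt

@dataclass
class User:
    name: str
    lat: float
    lon: float
    tag_counts: dict = field(default_factory=dict)  # tag -> number of contributions

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(a))

def interest_score(user, event_tags, event_lat, event_lon, radius_km=50.0):
    """Tag contributions on the event's tags, doubled for users near the event."""
    tag_score = sum(user.tag_counts.get(tag, 0) for tag in event_tags)
    nearby = haversine_km(user.lat, user.lon, event_lat, event_lon) <= radius_km
    return tag_score * (2 if nearby else 1)

def pick_recipients(users, event_tags, event_lat, event_lon, k=10):
    """Return up to k user names with a positive score, highest score first."""
    scored = [(interest_score(u, event_tags, event_lat, event_lon), u.name)
              for u in users]
    ranked = sorted((s for s in scored if s[0] > 0),
                    key=lambda s: s[0], reverse=True)
    return [name for _, name in ranked[:k]]

users = [
    User("alice", 29.65, -82.32, {"debate": 5}),
    User("bob", 40.71, -74.00, {"debate": 5}),
    User("carol", 29.65, -82.32, {"sports": 3}),
]
recipients = pick_recipients(users, ["debate"], 29.65, -82.32, k=2)
```

In this toy data, users who never contributed to the event's tags are skipped entirely, and proximity only breaks ties between equally active contributors. A real deployment would also need the reputation weighting discussed above, which this sketch does not attempt.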


APPENDIX
WORD LISTS

Table A-1. Filter Terms

S.no  Word
1     47
2     5trillion
3     6studies
4     abort
5     abortion
6     absurd
7     accurate
8     afghanistan
9     akin
10    alaska
11    ambassador
12    anderson
13    anti
14    apploause
15    approval
16    arafat
17    arithmetic
18    assessment
19    bain
20    benjamin
21    bibi
22    biden
23    biggovernment
24    billionaires
25    binladen
26    bipartisan
27    bird
28    bowles
29    brilliance
30    broad-minded
31    budget
32    buffet
33    bush
34    business
35    canada
36    candidates
37    candy
38    cbo
39    charlie
40    cheers
41    cheny
42    china
43    chinese
44    chouces
45    civilrights
46    class
47    college
48    colorado
49    commander
50    commission
51    companies
52    congressman
53    controversialissues
54    cooper
55    corporate
56    credibility
57    crippling
58    crist
59    criticial
60    crowley
61    daily
62    debate
63    democrat
64    denver
65    depression
66    detroit
67    differences
68    different
69    dishonest
70    dodge
71    domestic
72    donald
73    dreamact
74    earth
75    economy
76    education
77    egypt
78    eisenhower
79    elk
80    energy
81    environmentalpolicy
82    exxonmobil
83    factcheck
84    federaldeficit
85    finland
86    foreignpolicy
87    fraud
88    fundamentalist
89    gallup
90    gates
91    gay
92    gaymarriage
93    giuliani
94    gop
95    governer
96    governing
97    government
98    graduate
99    green
100   growth
101   half
102   healthcare
103   healthcarereform
104   hempstead
105   hispanic
106   hisses
107   hofstra
108   homelandsecurity
109   benghazi
110   incentives
111   inclusive
112   independent
113   intelligence
114   intolerant
115   iran
116   iraq
117   israel
118   israelis
119   jet
120   jim
121   jobs
122   kill
123   korans
124   kosher
125   language
126   latino
127   left
128   lehrer
129   liar
130   libya
131   linda
132   lying
133   malarkey
134   marine
135   martha
136   math
137   mcmahon
138   medicaid
139   medicare
140   michelle
141   mideastpolicy
142   middleeasternpolicy
143   military
144   mitt
145   morris
146   mubarak
147   mullahs
148   multi-cultural
149   netanyahu
150   newshour
151   nominee
152   nuclear
153   ny
154   obama
155   obamacare
156   obsolete
157   ohio
158   oil
159   opportunity
160   osama
161   oval
162   overseas
163   overwhelming
164   palestine
165   pbs
166   peace
167   peaceful
168   perception
169   philipmorris
170   pickering
171   plutocrat
172   polluter
173   pollution
174   potus
175   president
176   principal
177   priority
178   prochoice
179   proenvironment
180   progressive
181   queada
182   raddatz
183   rape
184   reagan
185   regressive
186   resilience
187   right-wing
188   roby
189   roev.wade
190   roll
191   romney
192   romneycare
193   roughly
194   rudy
195   russia
196   russian
197   ryan
198   sanctions
199   satan
200   school
201   scotus
202   sensata
203   shipping
204   silent
205   simpson
206   sixstudies
207   slowest
208   small
209   smallgovernment
210   smoking
211   socialsecurity
212   socialservices
213   socialist
214   soup
215   souza
216   speciecs
217   stewart
218   stockman
219   stunt
220   subjects
221   taces
222   taibbi
223   taliban
224   taxcode
225   ticket
226   todd
227   tolerant
228   training
229   trickle-down
230   tripoli
231   trump
232   tsa
233   unbalanced
234   uninsured
235   veracity
236   vietnam
237   vp
238   waivers
239   war
240   warren
241   wealth
242   wealthy
243   weapon
244   women
245   work
246   yassir
247   york


REFERENCES

[1] "State of the news media," 2012. [Online]. Available: http://www.stateofthemedia.org
[2] M. Garber, "Twitter, the conversation-enabler? actually, most news orgs use the service as a glorified rss feed," 2011. [Online]. Available: goo.gl/tWrMs
[3] G. Chen, "Breaking-news situations require a breaking-news approach," 2012. [Online]. Available: http://www.niemanlab.org/2012/01/gina-chen-breaking-news-situations-require-a-breaking-news-approach/
[4] S. Sen, S. K. Lam, A. M. Rashid, D. Cosley, D. Frankowski, J. Osterhouse, F. M. Harper, and J. Riedl, "tagging, communities, vocabulary, evolution," in Proceedings of the 2006 20th anniversary conference on Computer supported cooperative work, ser. CSCW '06. New York, NY, USA: ACM, 2006, pp. 181-190. [Online]. Available: http://doi.acm.org/10.1145/1180875.1180904
[5] "Storify," 2012. [Online]. Available: http://storify.com/
[6] "Storify: About us," 2012. [Online]. Available: http://storify.com/about
[7] M. J. Tenore, "25 ways to use facebook, twitter storify to improve political coverage," 2011. [Online]. Available: http://www.poynter.org/how-tos/digital-strategies/151883/25-ways-to-use-facebook-twitter-storify-to-improve-election-coverage/
[8] E. Zak, "How journalists can use storify to cover any type of meeting," 2012. [Online]. Available: http://www.mediabistro.com/10000words/how-to-use-storify-to-cover-a-meeting-workshop-or-event_b9068
[9] "App store: Vibe," 2012. [Online]. Available: http://itunes.apple.com/us/app/vibe/id433067417?mt=8
[10] J. Wortham, "Messaging app grows with wall street protests," 2011. [Online]. Available: http://bits.blogs.nytimes.com/2011/10/12/anonymous-messaging-app-vibe-gets-boost-from-occupy-wall-street/
[11] "Dataminr," 2012. [Online]. Available: http://www.dataminr.com/
[12] T. C. Sottek, "Dataminr analyzes over 340 million tweets a day to track and predict global events," 2012. [Online]. Available: http://www.theverge.com/2012/4/9/2936816/dataminr-twitter-data-predict-events
[13] "Wikinews," 2012. [Online]. Available: http://en.wikinews.org
[14] "About cnn ireport," 2012. [Online]. Available: http://ireport.cnn.com/about.jspa
[15] K. Ehrlich and N. Shami, "Microblogging inside and outside the workplace," in Proceedings of the 4th International AAAI Conference on Weblogs and Social Media, 2010, (ICWSM 2010), AAAI Publications. [Online]. Available: http://www.cs.cornell.edu/~sadats/icwsm2010.pdf


[16] A. Mathes, "Folksonomies - cooperative classification and communication through shared metadata," December 2004. [Online]. Available: http://www.adammathes.com/academic/computer-mediated-communication/folksonomies.html
[17] M. Guy and E. Tonkin, "Folksonomies: Tidying up tags?" D-Lib Magazine, vol. 12, no. 1, January 2006. [Online]. Available: http://www.dlib.org/dlib/january06/guy/01guy.html
[18] M. S. Bernstein, B. Suh, L. Hong, J. Chen, S. Kairam, and E. H. Chi, "Eddi: Interactive topic-based browsing of social status streams," Fortune, pp. 303-312, 2010. [Online]. Available: http://portal.acm.org/citation.cfm?id=1866077
[19] A. M. Kaplan and M. Haenlein, "The early bird catches the news: Nine things you should know about micro-blogging," Business Horizons, vol. 54, no. 2, pp. 105-113, March 2011. [Online]. Available: http://ideas.repec.org/a/eee/bushor/v54yi2p105-113.html
[20] "Wikipedia: Microblogging," 08 2012. [Online]. Available: http://en.wikipedia.org/wiki/Microblogging
[21] "Twitter blog: One hundred million voices," 2012. [Online]. Available: http://blog.twitter.com/2011/09/one-hundred-million-voices.html
[22] "Twitter to surpass 500 million users," 2012. [Online]. Available: http://www.mediabistro.com/alltwitter/500-million-registered-users_b18842
[23] H. Kwak, C. Lee, H. Park, and S. Moon, "What is twitter, a social network or a news media?" in Proceedings of the 19th international conference on World wide web, ser. WWW '10. New York, NY, USA: ACM, 2010, pp. 591-600. [Online]. Available: http://doi.acm.org/10.1145/1772690.1772751
[24] Z. Papacharissi and M. de Fatima Oliveira, "Affective news and networked publics: The rhythms of news storytelling on egypt," Journal of Communication, vol. 62, no. 2, pp. 266-282, 2012. [Online]. Available: http://dx.doi.org/10.1111/j.1460-2466.2012.01630.x
[25] L. Grossman, "Iran protests: Twitter, the medium of the movement," Time Magazine, vol. 17, 2009.
[26] C. Beaumont, "New york plane crash: Twitter breaks the news, again," 2009. [Online]. Available: http://www.telegraph.co.uk/technology/twitter/4269765/New-York-plane-crash-Twitter-breaks-the-news-again.html
[27] J. Wortham, "Michael jackson tops the charts on twitter," 2009. [Online]. Available: http://bits.blogs.nytimes.com/2009/06/25/michael-jackson-tops-the-charts-on-twitter/


[28] C. Beaumont, "Mumbai attacks: Twitter and flickr used to break news," 2008. [Online]. Available: http://www.telegraph.co.uk/news/worldnews/asia/india/3530640/Mumbai-attacks-Twitter-and-Flickr-used-to-break-news-Bombay-India.html
[29] J. O'Dell, "One twitter user reports live from osama bin laden raid," 2011. [Online]. Available: http://mashable.com/2011/05/02/live-tweet-bin-laden-raid/
[30] T. Sakaki, M. Okazaki, and Y. Matsuo, "Earthquake shakes twitter users: real-time event detection by social sensors," in Proceedings of the 19th international conference on World wide web, ser. WWW '10. New York, NY, USA: ACM, 2010, pp. 851-860. [Online]. Available: http://doi.acm.org/10.1145/1772690.1772777
[31] D. Boyd, S. Golder, and G. Lotan, "Tweet, tweet, retweet: Conversational aspects of retweeting on twitter," in System Sciences (HICSS), 2010 43rd Hawaii International Conference on, jan. 2010, pp. 1-10.
[32] A. Java, X. Song, T. Finin, and B. Tseng, "Why we twitter: understanding microblogging usage and communities," in Proceedings of the 9th WebKDD and 1st SNA-KDD 2007 workshop on Web mining and social network analysis, ser. WebKDD/SNA-KDD '07. New York, NY, USA: ACM, 2007, pp. 56-65. [Online]. Available: http://doi.acm.org/10.1145/1348549.1348556
[33] P. André, M. S. Bernstein, and K. Luther, "Who Gives A Tweet? Evaluating Microblog Content Value," in Proceedings of CSCW 2012, Feb. 2012. [Online]. Available: http://www.cs.cmu.edu/~pandre/pubs/whogivesatweet-cscw2012.pdf
[34] D. Boyd, "Twitter: "pointless babble" or peripheral awareness?" 2009. [Online]. Available: http://www.zephoria.org/thoughts/archives/2009/08/16/twitter_pointle.html
[35] A. Watters, "How recent changes to twitter's terms of service might hurt academic research," 2011.
[36] R. Higashinaka, N. Kawamae, K. Sadamitsu, Y. Minami, T. Meguro, K. Dohsaka, and H. Inagaki, "Building a conversational model from two-tweets," in Automatic Speech Recognition and Understanding (ASRU), 2011 IEEE Workshop on, dec. 2011, pp. 330-335.
[37] S. Wu, J. M. Hofman, W. A. Mason, and D. J. Watts, "Who says what to whom on twitter," in Proceedings of the 20th international conference on World wide web, ser. WWW '11. New York, NY, USA: ACM, 2011, pp. 705-714. [Online]. Available: http://doi.acm.org/10.1145/1963405.1963504
[38] C. Arthur, "What is the 1% rule?" 2006. [Online]. Available: http://www.guardian.co.uk/technology/2006/jul/20/guardianweeklytechnologysection2
[39] M. P. Bill Heil, "New twitter research: Men follow men and nobody tweets," 2009. [Online]. Available: http://blogs.hbr.org/cs/2009/06/new_twitter_research_men_follo.html


[40] M. Newman, "Power laws, pareto distributions and zipf's law," Contemporary Physics, vol. 46, no. 5, pp. 323-351, 2005. [Online]. Available: http://www.tandfonline.com/doi/abs/10.1080/00107510500052444
[41] A. Maslow, "A theory of human motivation," Psychological Review, vol. 50, pp. 370-396, 1943. [Online]. Available: http://psychclassics.yorku.ca/Maslow/motivation.htm
[42] S. Sen, J. Vig, and J. Riedl, "Tagommenders: connecting users to items through tags," in Proceedings of the 18th international conference on World wide web, ser. WWW '09. New York, NY, USA: ACM, 2009, pp. 671-680. [Online]. Available: http://doi.acm.org/10.1145/1526709.1526800
[43] D. M. Blei, A. Y. Ng, and M. I. Jordan, "Latent dirichlet allocation." Journal of Machine Learning Research, vol. 3, pp. 993-1022, 2003. [Online]. Available: http://dblp.uni-trier.de/db/journals/jmlr/jmlr3.html#BleiNJ03
[44] T. Hofmann, "Probabilistic latent semantic analysis." in UAI, K. B. Laskey and H. Prade, Eds. Morgan Kaufmann, 1999, pp. 289-296. [Online]. Available: http://dblp.uni-trier.de/db/conf/uai/uai1999.html#Hofmann99
[45] K. Toutanova, D. Klein, C. D. Manning, and Y. Singer, "Feature-rich part-of-speech tagging with a cyclic dependency network," in NAACL '03: Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology. Morristown, NJ, USA: Association for Computational Linguistics, 2003, pp. 173-180. [Online]. Available: http://portal.acm.org/citation.cfm?id=1073445.1073478
[46] Defense Mapping Agency, "The universal grids: Universal Transverse Mercator (UTM) and Universal Polar Stereographic (UPS)," Defense Mapping Agency, Hydrographic/Topographic Center, Fairfax, VA, USA, Tech. Rep. TM8358.2, 1989. [Online]. Available: http://earth-info.nga.mil/GandG/publications/
[47] J. Howe, Crowdsourcing: Why the Power of the Crowd Is Driving the Future of Business, 1st ed. Crown Business, August 2008. [Online]. Available: http://www.worldcat.org/isbn/0307396207
[48] "Wikipedia: Amazon mechanical turk," 2012. [Online]. Available: http://en.wikipedia.org/wiki/Amazon_Mechanical_Turk
[49] "mturk forum: Turker nation," 2012. [Online]. Available: www.turkernation.com
[50] "Turkopticon," 2012. [Online]. Available: http://turkopticon.differenceengines.com/
[51] L. von Ahn and L. Dabbish, "Labeling images with a computer game," in Proceedings of the SIGCHI conference on Human factors in computing systems, ser. CHI '04. New York, NY, USA: ACM, 2004, pp. 319-326. [Online]. Available: http://doi.acm.org/10.1145/985692.985733


[52] M. Denton, "Crowdsourcing the production of public art," Master's thesis, Massey, 2010. [Online]. Available: http://mro.massey.ac.nz/handle/10179/1345
[53] W. Mason and S. Suri, "Conducting behavioral research on amazon's mechanical turk," Behavior Research Methods, vol. 44, pp. 1-23, 2012. [Online]. Available: http://dx.doi.org/10.3758/s13428-011-0124-6
[54] S. Suri and D. J. Watts, "Cooperation and contagion in web-based, networked public goods experiments," SIGecom Exch., vol. 10, no. 2, pp. 3-8, Jun. 2011. [Online]. Available: http://doi.acm.org/10.1145/1998549.1998550
[55] M. S. Bernstein, G. Little, R. C. Miller, B. Hartmann, M. S. Ackerman, D. R. Karger, D. Crowell, and K. Panovich, "Soylent: a word processor with a crowd inside," in Proceedings of the 23nd annual ACM symposium on User interface software and technology, ser. UIST '10. New York, NY, USA: ACM, 2010, pp. 313-322. [Online]. Available: http://doi.acm.org/10.1145/1866029.1866078
[56] G. Little, "Turkit: Tools for iterative tasks on mechanical turk," in Visual Languages and Human-Centric Computing, 2009. VL/HCC 2009. IEEE Symposium on, sept. 2009, pp. 252-253.
[57] M. J. Franklin, D. Kossmann, T. Kraska, S. Ramesh, and R. Xin, "Crowddb: answering queries with crowdsourcing," in Proceedings of the 2011 international conference on Management of data. New York, NY, USA: ACM, 2011, pp. 61-72. [Online]. Available: http://doi.acm.org/10.1145/1989323.1989331
[58] J. P. Bigham, C. Jayant, H. Ji, G. Little, A. Miller, R. C. Miller, A. Tatarowicz, B. White, S. White, and T. Yeh, "Vizwiz: nearly real-time answers to visual questions." in W4A, C. Asakawa, H. Takagi, L. Ferres, and C. C. Shelly, Eds. ACM, 2010, p. 24. [Online]. Available: http://dblp.uni-trier.de/db/conf/w4a/w4a2010.html#BighamJJLMMTWWY10
[59] "Wordnet," 2012. [Online]. Available: http://wordnet.princeton.edu/


BIOGRAPHICAL SKETCH

Prithvi Raj was born in Chennai, India. He attended Crescent Engineering College, Chennai and graduated with a bachelor's degree in computer science and engineering from Anna University, Chennai in 2010.

He joined the Department of Computer and Information Science and Engineering at the University of Florida in Fall 2010. His interests include crowd computing, human computer interaction, and information visualization.