Citation
Privacy-Preserving Data Analytics for Big Data Applications

Material Information

Title:
Privacy-Preserving Data Analytics for Big Data Applications
Creator:
Gong, Yanmin
Place of Publication:
[Gainesville, Fla.]
Florida
Publisher:
University of Florida
Publication Date:
Language:
English
Physical Description:
1 online resource (164 p.)

Thesis/Dissertation Information

Degree:
Doctorate (Ph.D.)
Degree Grantor:
University of Florida
Degree Disciplines:
Electrical and Computer Engineering
Committee Chair:
Fang, Yuguang
Committee Co-Chair:
Shea, John Mark
Committee Members:
Sahni, Sartaj Kumar
Chen, Shigang
Graduation Date:
8/6/2016

Subjects

Subjects / Keywords:
Customers ( jstor )
Data encryption ( jstor )
Datasets ( jstor )
Detection ( jstor )
Mobile devices ( jstor )
Munchausen syndrome by proxy ( jstor )
Preliminary proxy material ( jstor )
Proxy reporting ( jstor )
Proxy statements ( jstor )
Statistics ( jstor )
Electrical and Computer Engineering -- Dissertations, Academic -- UF
bigdata -- iot -- privacy -- security
Genre:
bibliography ( marcgt )
theses ( marcgt )
government publication (state, provincial, territorial, dependent) ( marcgt )
born-digital ( sobekcm )
Electronic Thesis or Dissertation
Electrical and Computer Engineering thesis, Ph.D.

Notes

Abstract:
In today's information age, companies are increasingly acquiring and storing vast amounts of data about their users and their users' activities using information and communication technologies. Advances in big data analytics enable companies to examine these data to uncover hidden patterns, correlations, and other revealing information, improving the quality of their services. However, with massive data and advanced data analysis techniques, far more information can be inferred than most people anticipated at the time of data collection or publication, as evidenced by privacy leakage incidents such as the AOL search log scandal and the de-anonymization of the Netflix Prize data. Traditional privacy-preserving techniques are either insufficient against such new privacy attacks (e.g., anonymization and privacy notices) or prevent reasonable data usage (e.g., data encryption and data deletion). The need for a secure and privacy-preserving solution that allows people to learn information as it was intended and stops people from learning information in ways it was not has motivated my research. My research attempts to provide a systematic view and an in-depth understanding of the security and privacy issues in data-intensive applications, and to design practical, secure, and privacy-preserving protocols for performing tasks widely used in big data analytics.
Specifically, my research is focused on formalizing and addressing security and privacy problems with emphasis on the following domains: (1) mobile health (mHealth), which uses emerging mobile telecommunication and network technologies to deliver healthcare services such as remote health monitoring, remote data collection, and diagnostic and patient-group support; (2) smart grid, which is a modernized power grid that uses information and communication technologies to improve the efficiency, reliability, economics, and sustainability of power systems; (3) mobile cloud computing, which enables resource-constrained mobile devices to utilize the computational resources of varied cloud-based resources such as proximate mobile computing entities; and (4) mobile crowdsourcing, which utilizes the advanced sensing, computing, and communication capabilities of mobile devices to provide crowdsourcing services. These applications generate massive datasets of medical, metering, and mobile data, respectively. A major challenge throughout these applications is to design secure and privacy-preserving mechanisms that can handle the volume, velocity, and variety of the involved data. To address this challenge, my research approach is first to identify the data analysis goals in these applications, then to rigorously model and analyze the security and privacy issues, and finally to devise solutions that achieve the data analysis goals with rigorous and provable security and privacy guarantees. Throughout my projects, I integrate computational, information-theoretic, and cryptographic techniques to construct simple, efficient, and practical solutions. ( en )
General Note:
In the series University of Florida Digital Collections.
General Note:
Includes vita.
Bibliography:
Includes bibliographical references.
Source of Description:
Description based on online resource; title from PDF title page.
Source of Description:
This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Thesis:
Thesis (Ph.D.)--University of Florida, 2016.
Local:
Adviser: Fang, Yuguang.
Local:
Co-adviser: Shea, John Mark.
Statement of Responsibility:
by Yanmin Gong.

Record Information

Source Institution:
UFRGP
Rights Management:
Copyright Gong, Yanmin. Permission granted to the University of Florida to digitize, archive and distribute this item for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder.
Classification:
LD1780 2016 ( lcc )

Full Text

PAGE 1

PRIVACY-PRESERVING DATA ANALYTICS FOR BIG DATA APPLICATIONS

By

YANMIN GONG

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2016

PAGE 2

© 2016 Yanmin Gong

PAGE 3

To my parents and my husband, for their endless support and unconditional love. I love you all dearly.

PAGE 4

ACKNOWLEDGMENTS

Completion of this doctoral dissertation was possible with the support of several people. I would like to express my sincere gratitude to all of them.

First of all, I am extremely grateful to my advisor, Dr. Yuguang Fang, for his invaluable guidance and consistent support during my years in the Wireless Networks Laboratory (WINET). Dr. Fang guided my research with his knowledge and experience, and more importantly, aided my personal growth with his patience and thoughtfulness in the past few years. This dissertation would not be possible without his support and encouragement. He showed me how to be an enthusiastic, supportive, and influential advisor, and I hope I can be as good in my future career.

I also would like to thank Dr. Sartaj Sahni, Dr. Shigang Chen, and Dr. John Shea for serving on my supervisory committee. I am especially grateful to them for their great help and support in my job hunting despite their busy schedules.

I would like to extend my thanks to all my fantastic colleagues in WINET for providing me a family-like environment and for their collaboration and insightful advice. I would like to specially acknowledge Dr. Jinyuan Sun, Dr. Chi Zhang, Dr. Miao Pan, Dr. Pan Li, Dr. Ming Li, Dr. Kaihe Xu, Dr. Huang Lin, Dr. Hao Yue, Dr. Linke Guo, Dr. Ying Cai, Dr. Pengbo Si, Dr. Hongning Li, Dr. Lingbo Wei, Dr. Yan Long, Dr. Bai Du, Dr. Chaonong Xu, Morteza Shahriari Nia, Haichuan Ding, Jianqing Liu, Yaodan Hu, Yawei Pang, Shunrong Jiang, and Xuanheng Li for many valuable discussions and all the good memories.

Special thanks to my family and my husband, Yuanxiong Guo, for their continual support and love. Their love motivates me to keep moving forward no matter what obstacles have come my way.

PAGE 5

TABLE OF CONTENTS

page

ACKNOWLEDGMENTS ... 4
LIST OF TABLES ... 8
LIST OF FIGURES ... 9
ABSTRACT ... 11

CHAPTER

1 INTRODUCTION ... 13
1.1 Privacy-Preserving Mining of Medical Data ... 14
1.2 Using Smart Meter Data Privately in Demand Response Programs ... 15
1.3 Privacy and Quality Control of Mobile Data ... 17
1.4 Protection of Location Data in Mobile Crowdsourcing ... 18
1.5 Future Research Directions ... 19
2 PRIVACY-PRESERVING LEARNING OF MEDICAL DATA ... 21
2.1 System Model ... 24
2.1.1 System Architecture ... 24
2.1.2 Threat Model ... 25
2.1.3 Design Goals ... 26
2.1.4 Logistic Regression ... 27
2.2 Private Predictive Model Training Via Distributed Computation ... 29
2.2.1 Basics of ADMM ... 29
2.2.2 Horizontally Partitioned Data ... 31
2.2.3 Vertically Partitioned Data ... 36
2.3 Private Aggregation of Local Regression Parameters ... 40
2.3.1 Private Computation at the mHealth Server ... 40
2.3.2 Provable Privacy ... 44
2.4 Performance Evaluation ... 46
2.4.1 Results on Horizontally Partitioned Dataset ... 47
2.4.2 Results on Vertically Partitioned Dataset ... 49
2.5 Related Work ... 53
3 SMART METER DATA PRIVACY IN DEMAND RESPONSE PROGRAMS ... 56
3.1 Cryptographic Primitives ... 58
3.2 System Model ... 60
3.2.1 Background ... 60
3.2.2 Components ... 61
3.2.3 System Flow ... 62
3.2.4 Design Goals ... 63

PAGE 6

3.3 Basic Protocol Design ... 64
3.3.1 Registration Process ... 64
3.3.2 Metering and Querying Processes ... 65
3.3.3 Settlement Process ... 66
3.3.4 Revocation Process ... 67
3.4 Practical Considerations and Extensions ... 68
3.4.1 Cloaking Mechanism ... 68
3.4.2 Pseudonym Update ... 69
3.4.3 Re-identification ... 70
3.5 Security and Privacy Analysis ... 71
3.5.1 Data Integrity ... 71
3.5.2 Privacy ... 72
3.5.3 Unlinkability Between Pseudo Accounts and Identifiable Accounts ... 73
3.6 Performance Analysis ... 74
3.7 Related Work ... 76
4 PRIVACY AND QUALITY CONTROL IN MOBILE CLOUD COMPUTING ... 78
4.1 Background ... 81
4.1.1 Differential Privacy ... 81
4.1.2 Private Spatial Decomposition ... 82
4.2 Problem Formulation ... 83
4.2.1 System Model ... 84
4.2.2 Task and Mobile Server Characteristics ... 85
4.2.3 Threat Model and Assumptions ... 86
4.2.4 Performance Metrics ... 87
4.3 Constructing the R-PSD ... 88
4.3.1 Constructing Private Spatial Decompositions ... 88
4.3.2 Constructing Reputation-Based Private Spatial Decomposition ... 89
4.3.3 Authentication of Mobile Server Trust Level ... 91
4.4 Task Allocation ... 92
4.4.1 Acceptance Rate Characterization ... 93
4.4.2 Service Quality Characterization ... 94
4.4.3 Geocast Region Construction with PSD ... 95
4.4.4 Geocast Region Construction with R-PSD ... 97
4.5 Geocast Communication Process ... 99
4.6 Performance Evaluation ... 100
4.6.1 Experimental Setup ... 100
4.6.2 Experimental Results ... 102
4.7 Related Work ... 104
5 PROTECTION OF LOCATION DATA IN MOBILE CROWDSOURCING ... 107
5.1 The Proposed Framework ... 112
5.1.1 System Model ... 112
5.1.2 Design Goals ... 115

PAGE 7

5.2 Optimization Model for Task Selection ... 116
5.2.1 Definitions ... 116
5.2.2 Trade-Offs among Utility, Privacy and Efficiency ... 117
5.2.3 Optimization Problem Formulation ... 117
5.3 Algorithms for the Optimization Problem ... 120
5.3.1 Utility Maximization Algorithm ... 120
5.3.2 Joint Utility and Efficiency Optimization Algorithm ... 123
5.4 Privacy-Preserving Statistics Collection ... 125
5.4.1 Problem Overview ... 125
5.4.2 Computation of Worker Statistics Based on Counting ... 128
5.4.3 Distributed Differentially-Private Counting Procedure ... 129
5.4.4 Coin Generation and Noise Addition ... 132
5.4.5 Analysis of Design Goals ... 135
5.4.6 Practical Considerations ... 136
5.5 Performance Evaluation ... 137
5.6 System Overhead ... 142
5.7 Related Work ... 144
6 CONCLUSION AND FUTURE PLAN ... 147
REFERENCES ... 151
BIOGRAPHICAL SKETCH ... 164

PAGE 8

LIST OF TABLES

Table page

3-1 Number of pairing and exponentiation operations ... 75
5-1 Computational overhead of our algorithms ... 142

PAGE 9

LIST OF FIGURES

Figure page

2-1 Architecture of Model Training based on Biomedical Sensing Data ... 24
2-2 Illustration of horizontally partitioned data ... 31
2-3 Illustration of vertically partitioned data ... 32
2-4 Aggregation of Private User Data ... 42
2-5 Convergence of the logistic regression parameters obtained from our distributed approach on the horizontally partitioned dataset ... 49
2-6 Comparison of the objective of our approach (solid line) and the optimal objective (dashed line) on the horizontally partitioned dataset ... 49
2-7 Testing errors for our distributed approach and the local approach on the horizontally partitioned dataset ... 50
2-8 Convergence of the logistic regression parameters obtained from our distributed approach on the vertically partitioned dataset ... 52
2-9 Comparison of the objective of our approach (solid line) and the optimal objective (dashed line) on the vertically partitioned dataset ... 52
2-10 Testing errors for our distributed approach and the local approach on the vertically partitioned dataset ... 53
3-1 Electricity Market for Incentive-Based Demand Response (IDR) ... 60
3-2 Example Baseline and Performance Measurement for Demand Response Asset [1] ... 61
3-3 Registration Process ... 64
3-4 Metering and Querying Process ... 65
3-5 Settlement Process ... 66
3-6 Anonymity for different cloaking mechanisms ... 74
3-7 Evaluation of Cloaking Mechanisms ... 75
4-1 Privacy-preserving framework for task allocation in MCC ... 84
4-2 Example of an adaptive grid ... 90
4-3 Example of R-PSD with two reputation levels ... 91
4-4 Illustration of a geocast region with R-PSD ... 97
4-5 Real-world datasets for experiments ... 101

PAGE 10

4-6 Effect of privacy budget ε when AR_k = 0.9 ... 103
4-7 Comparison on varying acceptance rate threshold AR_k with ε = 0.5 ... 104
4-8 Effects of reputation level requirement ... 104
5-1 Basic system model for task recommendation in MC ... 112
5-2 Context generalization methods ... 114
5-3 Taxonomy of worker activities ... 115
5-4 Illustration of the answer vector for worker k ... 129
5-5 Aggregation process of answer vectors from M workers ... 130
5-6 Schematics of our privacy-preserving counting process ... 131
5-7 Expected commission performance with varying context generalization levels ... 137
5-8 Expected commission of Algorithm 5 with varying efficiency and privacy ... 138
5-9 Performance of our approximation Algorithm 5 ... 139
5-10 Optimizing the weighted sum of utility and efficiency for Algorithm 6 ... 140
5-11 Trade-off between ε and accuracy of statistics ... 141
5-12 Running time of recommending tasks for Algorithm 5 ... 142

PAGE 11

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

PRIVACY-PRESERVING DATA ANALYTICS FOR BIG DATA APPLICATIONS

By

Yanmin Gong

August 2016

Chair: Yuguang Fang
Major: Electrical and Computer Engineering

In today's information age, companies are increasingly acquiring and storing vast amounts of data about their users and their users' activities using information and communication technologies. Advances in big data analytics enable companies to examine these data to uncover hidden patterns, correlations, and other revealing information, improving the quality of their services. However, with massive data and advanced data analysis techniques, far more information can be inferred than most people anticipated at the time of data collection or publication, as evidenced by privacy leakage incidents such as the AOL search log scandal and the de-anonymization of the Netflix Prize data. Traditional privacy-preserving techniques are either insufficient against such new privacy attacks (e.g., anonymization and privacy notices) or prevent reasonable data usage (e.g., data encryption and data deletion). The need for a secure and privacy-preserving solution that allows people to learn information as it was intended and stops people from learning information in ways it was not has motivated my research. My research attempts to provide a systematic view and an in-depth understanding of the security and privacy issues in data-intensive applications, and to design practical, secure, and privacy-preserving protocols for performing tasks widely used in big data analytics.

Specifically, my research is focused on formalizing and addressing security and privacy problems with emphasis on the following domains: (1) mobile health (mHealth), which uses emerging mobile telecommunication and network technologies

PAGE 12

to deliver healthcare services such as remote health monitoring, remote data collection, and diagnostic and patient-group support; (2) smart grid, which is a modernized power grid that uses information and communication technologies to improve the efficiency, reliability, economics, and sustainability of power systems; (3) mobile cloud computing, which enables resource-constrained mobile devices to utilize the computational resources of varied cloud-based resources such as proximate mobile computing entities; and (4) mobile crowdsourcing, which utilizes the advanced sensing, computing, and communication capabilities of mobile devices to provide crowdsourcing services. These applications generate massive datasets of medical, metering, and mobile data, respectively. A major challenge throughout these applications is to design secure and privacy-preserving mechanisms that can handle the volume, velocity, and variety of the involved data. To address this challenge, my research approach is first to identify the data analysis goals in these applications, then to rigorously model and analyze the security and privacy issues, and finally to devise solutions that achieve the data analysis goals with rigorous and provable security and privacy guarantees. Throughout my projects, I integrate computational, information-theoretic, and cryptographic techniques to construct simple, efficient, and practical solutions.

PAGE 13

CHAPTER 1
INTRODUCTION

Advances in computing and electronic communication technologies have enabled ubiquitous collection of high-volume, high-velocity, and/or high-variety information assets, leading to the era of big data. Big data has driven innovation, productivity, efficiency, and growth in many domains, creating enormous benefits for the global economy. However, with massive data and advanced data analysis techniques, far more information can be inferred than most people anticipated at the time of data collection or publication, as evidenced by privacy leakage incidents such as the AOL search log scandal and the de-anonymization of the Netflix Prize data. Traditional privacy-preserving techniques are either insufficient against such new privacy attacks (e.g., anonymization and privacy notices) or prevent reasonable data usage (e.g., data encryption and data deletion). The need for a secure and privacy-preserving solution that allows people to learn information as it was intended and stops people from learning information in ways it was not has motivated my research. My research attempts to provide a systematic view and an in-depth understanding of the security and privacy issues in data-intensive applications, and to design practical, secure, and privacy-preserving protocols for performing tasks widely used in big data analytics.

A major challenge in data-intensive applications is to design secure and privacy-preserving mechanisms that can handle the volume, velocity, and variety of the involved data. To address this challenge, my research approach is first to identify the data analysis goals in these applications, then to rigorously model and analyze the security and privacy issues, and finally to devise solutions that achieve the data analysis goals with rigorous and provable security and privacy guarantees. Throughout my projects, I integrate computational, information-theoretic, and cryptographic techniques to construct simple, efficient, and practical solutions.

PAGE 14

The presentation of this dissertation is broken up into four main parts according to different application areas.

1.1 Privacy-Preserving Mining of Medical Data

In Chapter 2 we discuss the problem of protecting the privacy of biomedical sensing data in mobile health monitoring. In recent years, the government has put a strong emphasis on preventive healthcare through a wide array of new initiatives and funding. An important component of preventive healthcare is mobile health monitoring, which continuously assesses the physiology, behavior, and environmental exposure of an individual to make health-related decisions, such as timely intervention or alerting a caregiver. Mobile health monitoring programs require new computational models that relate the observable variables to the quantities of interest. Although researchers are making progress in designing such models, significant work is still needed to make them reliable enough for real-world use. It is envisioned that collaborative learning with large-volume, time-series biomedical sensing data from a diverse sample of participants will provide a basis for reliable real-time decisions.

However, sharing biomedical sensing data poses serious privacy challenges in the following respects. First, the biomedical sensing data contain privacy-revealing information such as health status, addictive behaviors, and movement patterns. Smart algorithms can fuse these behaviors with digital footprints (that is, information from other sensors and publicly available information) to construct a near real-time virtual biography of previously private behavior and lifestyle patterns. Second, due to continuous and long-term data collection, the biomedical sensing data are much larger than traditional medical data collected in clinics, and may contain more privacy-revealing information. Third, mHealth systems are open in the sense that they can be run by multiple parties (e.g., doctors, insurance companies, diet advisers, athletic coaches, or home-care providers) with complex trust relationships and varying technical competence. Ensuring privacy in such a setting is difficult in practice, where concerns of usability, cost, legacy, and conflicting interests intrude.

PAGE 15

Due to the above aspects, sharing biomedical sensing data raises severe privacy concerns. To encourage user participation in collaborative learning, such privacy concerns must be addressed. Thus, we have designed a privacy-preserving scheme for collaborative learning of a specific computational model, i.e., logistic regression. Although computation-intensive security protocols may serve this purpose, they are infeasible in practice due to the limited computing resources of mobile devices. On the other hand, efficient and lightweight protocols, such as partially homomorphic encryption, can only perform simple computation tasks that do not satisfy the requirements of data analysis. In order to utilize the efficient and lightweight protocols, we leveraged the intrinsic structure of the logistic regression model and decomposed the collaborative learning problem into multiple subproblems that can be solved locally, i.e., each user can use his own data to compute locally optimal results for these subproblems. The data sharing process only takes place when aggregating these local results. This process reveals less sensitive information and can be protected by efficient partially homomorphic encryption protocols. Through the combination of a distributed algorithm and a modified version of homomorphic encryption, we gave a scalable and practical solution for privacy-preserving learning in mHealth. Experimental results based on real datasets showed that our approach is highly efficient and could be scaled to a large number of mHealth users. This work has been accepted by IEEE/ACM Transactions on Computational Biology and Bioinformatics [2].

1.2 Using Smart Meter Data Privately in Demand Response Programs

In Chapter 3 we investigate privacy issues of smart meter data. We identify the unique privacy issue raised in incentive-based demand response programs, and design a framework that enables the demand response provider to learn the contribution of consumers in a demand response event without compromising their privacy.

Demand response is an important component of smart grids, which incentivizes customers to reduce or shift their electricity usage patterns during peak hours, and

PAGE 16

thus improves grid reliability, lowers generation cost, and facilitates renewable energy integration. Demand response can be achieved through incentive-based demand response (IDR) programs, which reward participating customers for reducing their electricity usage at demand response requests. In IDR programs, a party named the demand response provider (DRP) needs to profile, reward, and provide feedback to customers based on fine-grained metering data. These data, unavailable in the traditional power grid, can now be collected by the advanced metering infrastructure in the smart grid. In IDR programs, the time interval of metering measurements varies from hours to seconds based on different trigger conditions. However, data at this granularity pose serious privacy issues to customers. It has been shown that power usage profiles at a granularity of 15 minutes may reveal whether a child is left alone at home, and power usage profiles at a finer granularity may reveal information on health status and favorite TV programs. The unique privacy challenge of IDR programs lies in the fact that the meter measurements should be both attributable and fine-grained, which excludes some popular privacy-preserving approaches that address privacy issues of other smart grid operations. Customer privacy concerns have already jeopardized the mandatory deployment of smart meters in the Netherlands despite their promising benefits.

We have proposed a scheme which provides fine-grained metering data to the DRP for basic demand response operations, ensuring data integrity throughout the processes. The scheme protects customer privacy by separating the real identity and the fine-grained metering data, i.e., the DRP can only learn either the real identity or the fine-grained metering data at a time but cannot link them together. In the case when re-identification is required, the linkage between real identity and metering data can be easily restored. We designed detailed protocols for the registration, metering, querying, settlement, and revocation processes of the IDR programs with a combination of several cryptographic primitives. Individual metering data are signed with a special technique such that the authenticity can be verified without revealing the real identity of the signer. When customers want

PAGE 17

to inquire about their metering data or claim their demand response rewards, they prove their eligibility to the DRP but reveal no additional information about themselves. With these techniques combined, the anonymity of customers is guaranteed throughout the processes of the IDR. We further analyzed information leakage in customer behaviors (bill payment or reward withdrawal) associated with the real identity which may lead to de-anonymization, and devised cloaking mechanisms that help customers reduce privacy leakage. We have shown that the proposed scheme achieves the security goals of integrity and privacy. Our scheme provides an integrated solution for privacy-aware IDR programs, which can promote the acceptance of IDR programs. As far as we know, we are the first to identify and address privacy issues for IDR programs in the smart grid. The preliminary version of the work in Chapter 3 has appeared in [3].

1.3 Privacy and Quality Control of Mobile Data

In Chapter 4 we investigate privacy issues in mobile cloud computing and design a scheme that ensures both privacy and service quality in the mobile cloud. Mobile cloud computing can leverage the advanced sensing, computing, and communication capabilities of mobile devices (i.e., mobile computing entities) to provide mobile cloud computing services, such as epidemic monitoring, traffic monitoring, image/video capturing, and price checking for mobile clients.

Despite these promising applications, there are two major challenges in utilizing proximate mobile computing resources. First, since mobile users engaged in mobile cloud computing are fluid and diverse, it is difficult to control the quality of the collected data. Specifically, mobile users may leave the mobile cloud without completing the task, submit arbitrary answers without actually completing the specific task, or provide low-quality sensing data due to the limited resources of their mobile devices. Second, the security and privacy of mobile computing entities are a critical concern. In order to allocate tasks and provide effective services, mobile computing entities need to share their location data with the cloud computing provider, which could reveal a lot of personal information such as their identities, health status, personal activities, and

PAGE 18

political views. Hence, it is mandatory to provide privacy guarantees in order to engage more mobile computing resources in the mobile cloud.

We have proposed a framework for task allocation in mobile cloud computing which addresses both challenges. To protect the location privacy of the mobile computing entities, we only allow the cloud computing provider to learn sanitized location information. Data sanitization is performed by the cellular service provider, because it has already earned the trust of mobile computing entities when they subscribed to its cellular service. We designed a new data structure based on private spatial decomposition, which contains both reputation and location distribution information, and we further provided a solution to sanitize the information contained in the data structure with a differential privacy guarantee, which ensures privacy protection against adversaries with arbitrary background information. The cloud computing provider can use geocast techniques to allocate tasks based on the sanitized dataset. In order to ensure high service quality for the mobile cloud computing service, we further developed an efficient search strategy that finds the optimal geocast region. We have conducted extensive experiments based on real-world datasets to demonstrate the effectiveness of the proposed framework. The preliminary version of this work has been accepted by IEEE Transactions on Emerging Topics in Computing [4].

1.4 Protection of Location Data in Mobile Crowdsourcing

In Chapter 5 we discuss the problem of protecting location privacy in mobile crowdsourcing (MC). MC is the combination of crowdsourcing and mobile technologies that leverages the advanced sensing, computing, and communication capabilities of mobile devices to provide crowdsourcing services. In MC, a large number of mobile users are engaged to provide pervasive and cost-effective services of data collecting, processing, and computing. MC enables large-scale data collection, and thus has been used in many applications such as search and rescue (e.g., locating Flight 370), land preservation,

PAGE 19

large-scale urban sensing, health surveillance, and disaster planning and response.

Since mobile users engaged in MC are fluid and diverse, it is difficult to control the quality of the collected data. Specifically, mobile users may leave the MC system without completing the task, submit arbitrary answers without actually completing the specific task, or provide low-quality sensing data due to the limited resources of their mobile devices. Therefore, it is challenging to ensure data quality with a diverse pool of workers. To take advantage of the diversity and ensure data quality, MC servers tend to allocate MC tasks to mobile users based on their context information extracted from their interactions and smartphone sensors. Depending on the application scenario, the context of a mobile user can be defined in multiple dimensions, including geographical, temporal, activity, and profile (e.g., gender). These contexts contain private and sensitive information that may be used to uniquely identify an individual, reveal his/her health status, or track his/her daily routines, thus raising severe privacy concerns. There is an inherent conflict between data quality and privacy in task allocation. Finding a solution that ensures privacy while guaranteeing data quality is thus a major challenge for MC systems.

We identify fundamental trade-offs among three metrics (utility, privacy, and efficiency) in an MC system and propose a flexible optimization framework that can be adjusted to any desired trade-off point with joint efforts of the MC platform and workers. We also use an efficient aggregation approach to collect worker feedback while providing differential privacy guarantees. Both numerical evaluations and performance analysis are conducted to demonstrate the effectiveness and efficiency of the proposed framework. The preliminary version of the work in Chapter 5 has appeared in [5] and [6].

1.5 Future Research Directions

Chapter 6 summarizes my past and current research and describes some future directions. The primary goal of my research is to provide security and privacy solutions

in big data analytics. In the short term, I plan to focus on designing secure and privacy-preserving mHealth systems. In parallel, I will build the groundwork for a long-term research agenda, where I plan to extend my focus on utilizing big data analytics to solve security and privacy issues in data-intensive applications and cyber-physical systems.

CHAPTER 2
PRIVATE DATA ANALYTICS ON BIOMEDICAL SENSING DATA VIA DISTRIBUTED COMPUTATION

Mobile health (mHealth) technologies, including remote monitoring, wearable devices, and embedded sensors, have grown rapidly in the past years and shown great potential to improve the quality and efficiency of healthcare. In mHealth, long-term and continuous health monitoring is enabled by mobile devices that wirelessly connect biomedical sensors. The biomedical sensors can be manufactured to be light, durable, and comfortable at low cost and can sense a large variety of biomedical signals or physical activities, such as electrocardiogram, glucose concentration, breathing rate, pulse rate, blood pressure, peripheral oxygen saturation, and body motion [7, 8]. An example of such biomedical sensors is the "biostamp" designed by a company called MC10, which is quarter-size, waterproof, and breathable, and costs just tens of cents under batch production [9]. The sensed data can be transmitted to a remote mHealth server, which conducts analysis on the biomedical data and returns timely advice to the sensed subject. Health monitoring through biomedical sensors enables timely intervention and better management of individual health status, thus significantly improving healthcare quality.

Biomedical sensing data collected in health monitoring have attracted much research interest. First, the subjects of biomedical sensing include both patients and healthy people. The data of healthy people are not available in traditional healthcare because medical data are only collected when patients visit clinics. However, biomedical data from healthy people can be used as positive samples for training predictive models and will add important insights into disease prevention and prediction. Second, since biomedical sensors can monitor the human body day and night over a long time span, the data collected by biomedical sensors have much larger volume than traditional medical data. Data collected at this scale enable fine-grained diagnosis and treatment such as personalized medicine, and may largely improve healthcare quality and efficiency [10]. Due to the huge potential of biomedical sensing data in healthcare, researchers from the Institute of System Biology

have initiated a project called 100K Wellness Project, which aims to intensely monitor 100,000 healthy individuals and observe their physiology for 25 years [11]. It is envisioned that analysis on large-scale biomedical sensing data will reveal the earliest harbingers of killer diseases such as cancer and heart disease.

In this chapter, we focus on logistic regression, a classic machine learning technique which is appropriate for predicting dichotomous outcomes and thus widely used for making decisions in medical diagnosis and prognosis [12]. For example, logistic regression can be used for calculating the probability that a patient will suffer cardiovascular disease [13], diabetes [14], and postpartum depression [15]; and it is also used for predicting the mortality probability in blunt trauma [16] and after heart surgery [17]. Due to the diversity of human physiology, classifiers trained on individual datasets may not be robust over a wide range of input data. The availability of large-scale biomedical sensing data paves the way to collaborative learning [18], which overcomes this limitation by utilizing multiple user datasets with enough diversity. In collaborative learning, multiple individuals confide their data to a centralized party (e.g., a cloud server or a research institution, hereafter called the mHealth server) as training samples [19-21]. The centralized party then constructs mathematical models based on the data. For mHealth applications, the collaborative learning may engage patients with the same disease, patients under similar treatment, or patients carrying certain genetic patterns. For ease of presentation, we use the term "Patient" to represent the subject of sensing, including both healthy persons and patients. Note that the first character of the term is capitalized to remind readers of its broad meaning.

Although collaborative learning based on biomedical sensing data can be effective in predictive model training, it also raises serious privacy concerns. Medical data have always been private in nature. However, privacy issues in mHealth are especially prominent in multiple aspects. First, the mHealth server collects a wide range of health information including both physiological and physical activity data. While physiological data reflect

health status of Patients and are private in nature, physical activity data may reveal sensitive information about lifestyles and activities of Patients. Second, mHealth devices usually collect user data continuously over a long time period, and thus the sensing data contain more private information than medical data collected in traditional clinic visits. Third, mHealth applications can be run by a wide range of parties. Thus the data may be learned not only by healthcare providers, but also by insurance companies, diet advisers, athletic coaches, or home-care providers. In such a setting, Patients may not trust the mHealth server with their private data. Hence, to incentivize mHealth users to contribute their data for model construction, we should guarantee their data privacy.

In this chapter, we develop a privacy-preserving collaborative learning scheme that utilizes continuous sensing data from multiple Patients towards training logistic regression models in mHealth. The scenario we consider here is that the training samples are private while the resulting models are publicly available. We innovatively combine a distributed algorithm with a modified version of homomorphic encryption and give a scalable and practical solution for private model training in mHealth without an active third party. Unlike previous approaches, we leverage the intrinsic structure of the logistic regression model and decompose the collaborative learning problem into multiple subproblems that can be locally solved. An aggregate classifier is then computed by averaging locally trained parameters. The local training and averaging steps are repeated for multiple rounds until the aggregate classifier converges. Specifically, we consider two different cases of distributed sensing data:

• Horizontally partitioned data: All Patients have a database of sensing data that are sensed by the same set of sensors. A typical setting is data collected through mobile health monitoring programs such as fitness tracking applications, where Patients' activities and sleep patterns are collected through a certain type of wearable device.
• Vertically partitioned data: Each Patient only owns a few sensors and has a database sensed with partial sensors. With collaborative learning, we try to exploit these partial sensing data to find a common health pattern among the users. A

typical setting is the analysis of group therapy, where each Patient in the group senses her own data during every group meeting.

Figure 2-1. Architecture of Model Training based on Biomedical Sensing Data.

Our scheme is highly efficient and incurs low computational and communication overhead for each Patient, and is thus scalable to a large number of Patients.

The remainder of this chapter is organized as follows. We first present the system model in Section 2.1. Then we develop distributed algorithms to decompose the logistic regression model in Section 2.2. We further present a secure summation protocol that enhances the privacy of the distributed algorithms in Section 2.3. Section 2.4 demonstrates experimental results and the performance analysis. Section 3.7 reviews the related work.

2.1 System Model

In this section, we first outline the system architecture for private predictive modeling and describe the threat model and design goals. We then cover background on logistic regression and present the motivating scenarios in which private computation on Patient data is desirable.

2.1.1 System Architecture

We focus on patient-centered mHealth systems where Patients share their private sensing data with an mHealth server for model training. The system architecture is shown in Fig. 2-1. As shown in the figure, the Patient is continuously monitored by multiple sensors, generating a large volume of data such as heart rates, hydration levels, activity

levels, and glucose levels. These sensors are wirelessly connected to a mobile device, which collects and stores the sensed data. The raw sensing data in a certain time period may be preprocessed and transformed into a feature vector. The task for the mHealth server is to construct a logistic regression model which enables computation of the outcome's probability given a new feature vector. The model needs to be collaboratively learned using datasets from multiple Patients.

In order to tune the logistic regression model based on the distributed datasets, we design an iterative algorithm which decomposes the original logistic regression model training problem into small subproblems. In each iteration, Patients use their own private data to construct intermediate local classifiers, which are aggregated later at the mHealth server. Since local classifiers are trained based on the data of each individual Patient, they may contain sensitive information about Patients. To prevent privacy leakage in local classifiers, we decompose the logistic regression problem in a way such that in each iteration, the mHealth server only needs to perform a simple average operation over local classifiers. We further protect the information of local classifiers by layering an efficient secure summation protocol onto the distributed algorithm so that the mHealth server learns nothing other than the aggregated result in each iteration. Individual Patient data are thus masked out in the aggregates, which are safe to release.

2.1.2 Threat Model

We have the following security assumptions. The mHealth server is assumed to be Honest-but-Curious (HbC). On one hand, the mHealth server will honestly follow the protocol and is trustworthy to correctly compute predictive models. This is reasonable since it will be in the best interest of the mHealth server to obtain an accurate and unbiased model. The mHealth server has no incentive to tamper with intermediate aggregates or prevent/delay the convergence of the algorithms. On the other hand, the mHealth server is curious about the private feature vectors of Patients which they do not want to share. To this end, the mHealth server may (passively) attempt to infer private inputs

of Patients or collude with some of the Patients to infer private information about other honest Patients.

We also assume that Patients are HbC. This means that Patients will faithfully follow the protocols, but they may collude with the mHealth server or some Patients to infer the inputs of other Patients. Nevertheless, we assume that only a small fraction of Patients will collude with the mHealth server. Note that it is possible that some Patients are not HbC and are incentivized to bias the computation results. However, when these Patients send largely biased data for model training, the uncommon data may be detected through signal processing techniques [22]. Thus, to avoid detection, they are assumed to only send slightly biased data, which are masked out in the aggregates of data from a large population of Patients and cannot have much influence on the accuracy of the computation results. Hence, we argue that in our setting HbC security is sufficient.

We do not consider outsider attacks because such attacks can be mitigated with system level protection and standard network security techniques. Specifically, we assume that the mobile device at each Patient is secure (i.e., the data stored at the mobile devices are protected from intrusion), and all the communication channels are reliable, encrypted, and authenticated. Due to limited resources of biomedical sensors, lightweight cryptography schemes should be employed for data transmission from sensors to mobile devices, such as the one introduced in [23].

2.1.3 Design Goals

Our goal is to provide a practical privacy-preserving solution for real world model training under the aforementioned security assumptions. To this end, we identify three key properties for a practical privacy-preserving solution.

Privacy. Since the mHealth server is not trusted to learn the training samples, they should be kept locally at the Patient's side. How can we design a collaborative learning scheme based on distributed data? We will answer this question with distributed approaches which iteratively train and aggregate local classifiers. Second, even if data can

be locally trained, the resulting local classifiers still need to be aggregated at the mHealth server in each iteration. These local classifiers are trained based on private personal data and could reveal sensitive information about Patients [24]. How can we ensure no private information is leaked during the aggregation process?

Scalability. Our scheme should be scalable to a large number of mHealth users so that the training samples provide sufficient diversity for training robust classifiers. The main factor that influences scalability is the computational complexity. Considering the limited computational resources and battery lifetime of mobile devices, we need to keep the computational and communication overhead at the Patient side low even with a large number of participating Patients.

Efficiency. Our training scheme must be efficient for time-series sensing data. In mHealth monitoring, samples are continuously generated over multiple time periods. The predictive model may be periodically updated when new training samples arrive. This observation motivates us to design a scheme with low amortized computational overhead (i.e., the average computation cost for each time period).

2.1.4 Logistic Regression

Logistic regression is a classic machine learning technique that is commonly used in predicting dichotomous outcomes in medical diagnosis and prognosis [12]. Here we briefly discuss the basics of logistic regression. Without loss of generality, we focus on binary class logistic regression, but our solution is directly applicable to the case of k-class logistic regression.

Consider a supervised learning task with a set of labeled training samples $\{(x_i, y_i),\ i = 1, \ldots, N\}$, where $x_i \in \mathbb{R}^n$ denotes a feature vector and $y_i \in \{-1, +1\}$ denotes the corresponding binary class label. The $\ell_1$ regularized logistic regression problem [25] is defined as

$$\min \; \sum_{i=1}^{N} \log\left(1 + \exp\left(-y_i\left(w^T x_i + v\right)\right)\right) + \lambda \|w\|_1, \tag{2-1}$$

where the weight vector $w \in \mathbb{R}^n$ and intercept $v \in \mathbb{R}$ are the parameters of the logistic regression model, and $\lambda > 0$ is the regularization parameter. With the trained regularized logistic regression classifier $(w, v)$, logistic regression models the conditional probability distribution of the class label $y \in \{-1, 1\}$ given a feature vector $x \in \mathbb{R}^n$ as follows.

$$\Pr(y = 1 \mid x) = \frac{1}{1 + \exp\left(-(w^T x + v)\right)}, \tag{2-2}$$

$$\Pr(y = -1 \mid x) = \frac{1}{1 + \exp\left(w^T x + v\right)}. \tag{2-3}$$

The resulting classifier can predict the class label of a new feature vector, and thus is particularly suitable for disease state prediction (healthy or unhealthy) and decision making (yes or no) in medical diagnosis and prognosis. For instance, in [14], Tabaei and Herman conduct a diabetes study which screens for diabetes based on logistic regression classifiers. In their study, each Patient generates a private feature vector, which consists of age (years), sex (0 = male and 1 = female), body mass index (BMI), postprandial time (PT), and random capillary plasma glucose level (RPG). Each feature vector is associated with a label $y_i$, which is an indicator of fasting plasma glucose (FPG) and plasma glucose 2 h after a 75 g oral glucose load (2-hPG), both indicating the risk of having diabetes. Specifically, when FPG $\geq$ 140 mg/dl or 2-hPG $\geq$ 200, $y_i = 1$; otherwise, $y_i = -1$. The mHealth server collects data from 1,032 Patients and uses the data to train a logistic regression classifier. The resulting classifier is $(w, v)$, where $w = [0.0331, 0.0308, 0.2500, 0.5620, 0.0346]$ and $v = -10.0382$. Given a feature vector $x$, the classifier can predict the probability that FPG $\geq$ 140 mg/dl or 2-hPG $\geq$ 200 according to (2-2) and (2-3).

Although it has many benefits in the medical field, logistic regression poses significant threats to user privacy since it involves the usage of private sensing data such as blood level, activity, and age. Therefore, a scheme to preserve user privacy while not sacrificing utility of the medical sensing data is needed.
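The prediction rule in (2-2) and (2-3) can be sketched in a few lines. This is a minimal illustration rather than the study's implementation: the function name `predict_proba` and the feature vector `x_example` are hypothetical, and the sketch simply assumes that the components of `x` are ordered to match the weights quoted above, which the text does not spell out.

```python
import math

def predict_proba(w, v, x):
    """Return (Pr(y=+1|x), Pr(y=-1|x)) following (2-2) and (2-3)."""
    margin = sum(wi * xi for wi, xi in zip(w, x)) + v
    # Numerically stable evaluation of 1 / (1 + exp(-margin)).
    if margin >= 0:
        p_pos = 1.0 / (1.0 + math.exp(-margin))
    else:
        e = math.exp(margin)
        p_pos = e / (1.0 + e)
    # Pr(y=-1|x) = 1 / (1 + exp(margin)) = 1 - Pr(y=+1|x).
    return p_pos, 1.0 - p_pos

# Classifier parameters quoted in the text.
W = [0.0331, 0.0308, 0.2500, 0.5620, 0.0346]
V = -10.0382

# Hypothetical feature vector, assumed to be ordered like W.
x_example = [50.0, 120.0, 2.0, 1.0, 28.0]
p_pos, p_neg = predict_proba(W, V, x_example)
print(f"Pr(y=+1|x) = {p_pos:.3f}, Pr(y=-1|x) = {p_neg:.3f}")
```

The two probabilities always sum to one, so only the margin $w^T x + v$ needs to be computed for a new feature vector.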

2.2 Private Predictive Model Training Via Distributed Computation

In this section, we describe a practical privacy-preserving scheme that enables collaborative model learning over distributed data. During collaborative model training, each Patient contributes a set of training samples. Each training sample is associated with a label ("+1" or "-1" for binary class). The training samples are considered private as they may reveal sensitive information such as health status and unusual activities of individuals.

Our scheme is based on an algorithm called alternating direction method of multipliers (ADMM). The algorithm provides a possible way to decompose the logistic regression model into smaller subproblems that can be locally computed. In this section, we first give some background on ADMM. Then we use ADMM to design privacy-aware distributed algorithms that solve logistic regression under two cases of Patient data: horizontally partitioned data and vertically partitioned data, which correspond to different application scenarios.

2.2.1 Basics of ADMM

In the following, we describe the basics of ADMM. ADMM is a distributed algorithm that solves a large-scale optimization problem by decomposing it into smaller subproblems that are easier to solve. ADMM was first introduced by Glowinski, Marroco, Gabay, and Mercier [26, 27] in 1976 and has found applications in many areas since then [28]. The algorithm solves problems in the following form:

$$\begin{aligned} \underset{x,\, z}{\text{minimize}} \quad & f(x) + g(z) \\ \text{subject to} \quad & Ax + Bz = c, \\ & x \in X,\ z \in Z, \end{aligned} \tag{2-4}$$

where $x \in \mathbb{R}^n$, $z \in \mathbb{R}^m$, $A \in \mathbb{R}^{p \times n}$, $B \in \mathbb{R}^{p \times m}$, and $c \in \mathbb{R}^p$. We assume that functions $f$ and $g$ are convex, and $X$ and $Z$ are non-empty polyhedral sets. The variables are split into two parts $x$ and $z$, and the objective function is separable across the splitting.

We can form the augmented Lagrangian for (2-4) as

$$L_\rho(x, z, y) = f(x) + g(z) + y^T(Ax + Bz - c) + (\rho/2)\|Ax + Bz - c\|_2^2, \tag{2-5}$$

where $\rho > 0$ is the penalty parameter and the last term is the regularization term. We can view the augmented Lagrangian as the Lagrangian associated with the following problem

$$\begin{aligned} \underset{x,\, z}{\text{minimize}} \quad & f(x) + g(z) + (\rho/2)\|Ax + Bz - c\|_2^2 \\ \text{subject to} \quad & Ax + Bz = c,\ x \in X,\ z \in Z. \end{aligned} \tag{2-6}$$

Since the regularization term equals zero for any feasible $x$ and $z$, the above problem is equivalent to problem (2-4). The introduced regularization term ensures that $L_\rho$ is strictly convex even when $f$ and $g$ are affine and helps to improve the convergence property of the algorithm.

ADMM consists of three steps in each iteration $k$:

1. $x$-minimization with $z$ and $y$ fixed:

$$x^{k+1} := \arg\min_{x \in X} L_\rho(x, z^k, y^k). \tag{2-7}$$

2. $z$-minimization with $x$ and $y$ fixed:

$$z^{k+1} := \arg\min_{z \in Z} L_\rho(x^{k+1}, z, y^k). \tag{2-8}$$

3. Dual variable $y$ update:

$$y^{k+1} := y^k + \rho\left(Ax^{k+1} + Bz^{k+1} - c\right), \tag{2-9}$$

where the step size equals the penalty parameter $\rho$.

Note that in ADMM, $x$ and $z$ are updated sequentially instead of jointly as in the dual ascent algorithm. The order of the $x$-update step and the $z$-update step can be reversed, leading to a variation on ADMM. The optimality and convergence of the ADMM algorithm is given by the following theorem, whose proof can be found in [29].
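The three steps (2-7) to (2-9) can be traced on a toy problem. The sketch below is an illustration under stated assumptions, not part of the scheme in this chapter: it takes $f(x) = \tfrac{1}{2}(x - a)^2$, $g(z) = \lambda |z|$, $A = 1$, $B = -1$, $c = 0$ with scalar variables, so that both minimizations have closed forms. The function names and the choice $\rho = 1$ are hypothetical.

```python
def soft_threshold(t, kappa):
    """Minimizer over z of kappa*|z| + 0.5*(z - t)**2."""
    if t > kappa:
        return t - kappa
    if t < -kappa:
        return t + kappa
    return 0.0

def admm_scalar_lasso(a, lam, rho=1.0, iters=100):
    """ADMM for: minimize 0.5*(x - a)**2 + lam*|z| subject to x - z = 0."""
    x = z = y = 0.0
    for _ in range(iters):
        # Step 1 (2-7): x-minimization with z and y fixed.
        # Setting d/dx [0.5*(x-a)^2 + y*(x-z) + (rho/2)*(x-z)^2] = 0.
        x = (a - y + rho * z) / (1.0 + rho)
        # Step 2 (2-8): z-minimization with x and y fixed.
        z = soft_threshold(x + y / rho, lam / rho)
        # Step 3 (2-9): dual update with step size rho.
        y = y + rho * (x - z)
    return x, z, y

x_opt, z_opt, y_opt = admm_scalar_lasso(a=3.0, lam=1.0)
print(x_opt, z_opt, y_opt)
```

For this toy problem the minimizer is known in closed form (the soft-thresholded value of $a$), which makes it easy to confirm that the iterates settle on a point where $x = z$.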

Figure 2-2. Illustration of horizontally partitioned data.

Theorem 2.1 ([29]). Assume that the optimal solution set of (2-4) is non-empty, and either $X$ is bounded or $A^T A$ is nonsingular. Then a sequence $\{x^k, z^k, y^k\}$ generated by the iterations (2-7), (2-8), and (2-9) is bounded, and every limit point of $\{x^k, z^k\}$ is an optimal solution of (2-4).

In practice, ADMM usually converges to modest accuracy within a few tens of iterations.

2.2.2 Horizontally Partitioned Data

The term "horizontally partitioned data" was initially used in databases where the data are partitioned based on rows. In our mHealth scenario, Patients have the same set of biomedical sensors and each Patient generates sensing data with the same number of features, as shown in Fig. 2-2. Each Patient stores several rows of sensing data with each row containing the sensing results collected in a single sampling period.

A motivating scenario that generates such data is mobile health monitoring for diabetes management. Consider that a research institute wants to study the risk factors that influence the glucose level and construct a predictive model that predicts whether the glucose level will be normal or not given these risk factors. The institute recruits a group of Patients for its

study. As part of the study, a Patient wears a mobile device provided by the institute that continuously monitors factors including medication (e.g., insulin intake), physical activity (e.g., light exercise), food intake (e.g., carbohydrates), and other biological (e.g., dawn phenomenon) and environmental (e.g., altitude) factors. The Patient also records his/her blood glucose levels at fixed frequencies (e.g., three times per day) and labels the blood glucose levels as either "positive" or "negative" by comparing them with a safety threshold. Both feature vectors and labels are sent to the institute, which then trains a model that predicts whether the blood glucose level is above or below the safety threshold. Such a model will help diabetics better monitor their blood glucose levels and reduce the frequency of unpleasant blood tests.

Figure 2-3. Illustration of vertically partitioned data.

In the case of horizontally partitioned data, Patients are equipped with the same set of sensors and each Patient obtains a set of feature vectors. Specifically, each Patient $i$ has a local set of training samples $D_i := \{(a_{ij}, b_{ij}),\ j = 1, \ldots, m_i\}$, where $a_{ij} \in \mathbb{R}^n$ is a feature vector, $b_{ij} \in \{-1, +1\}$ is the corresponding label of the outcome variable, and $m_i$ is the number of training samples owned by Patient $i$. The label is owned by the Patient and assumed to be private. The $\ell_1$ regularized logistic regression problem becomes the

following: Given a set of labeled training samples $\bigcup_{i=1}^{N} D_i$ from $N$ Patients, solve

$$\min \; \sum_{i=1}^{N} \sum_{j=1}^{m_i} \log\left(1 + \exp\left(-b_{ij}\left(w^T a_{ij} + v\right)\right)\right) + \lambda \|w\|_1, \tag{2-10}$$

where $w \in \mathbb{R}^n$ and $v \in \mathbb{R}$. The above problem cannot be directly solved by ADMM since the objective function is not separable over two sets of variables. To address this challenge, we introduce a set of auxiliary variables $\{(w_i, v_i)\},\ i = 1, \ldots, N$ and reformulate the optimization problem as

$$\begin{aligned} \min \quad & \sum_{i=1}^{N} \sum_{j=1}^{m_i} \log\left(1 + \exp\left(-b_{ij}\left(w_i^T a_{ij} + v_i\right)\right)\right) + \lambda \|w\|_1 \\ \text{s.t.} \quad & w_i = w,\ v_i = v,\ i = 1, \ldots, N. \end{aligned} \tag{2-11}$$

It is obvious that the new problem (2-11) is equivalent to the original problem (2-10). Note that the objective function in problem (2-11) is now separable over two sets of variables $\{(w_i, v_i),\ i = 1, \ldots, N\}$ and $(w, v)$. We can view $(w_i, v_i)$ as the copy of regression parameters at Patient $i$ and $(w, v)$ as the copy of regression parameters at the mHealth server side. These two sets of variables are connected through equality constraints. In the following, we demonstrate that through these auxiliary variables the problem can be decomposed into several subproblems.

For simplicity of notation, we define $\theta := \{(w, v)\}$ and $\omega := \{(w_i, v_i),\ i = 1, \ldots, N\}$. Following the framework of ADMM, we formulate the augmented Lagrangian of (2-11) as

$$\begin{aligned} L_\rho(\theta, \omega, \gamma) = {} & \sum_{i=1}^{N} \sum_{j=1}^{m_i} \log\left(1 + \exp\left(-b_{ij}\left(w_i^T a_{ij} + v_i\right)\right)\right) + \lambda \|w\|_1 \\ & + \sum_{i=1}^{N} \left((w_i - w)^T \gamma_{i,w} + \gamma_{i,v}(v_i - v)\right) \\ & + \sum_{i=1}^{N} (\rho/2)\left((w_i - w)^T (w_i - w) + (v_i - v)^2\right), \end{aligned} \tag{2-12}$$

where $\gamma := \{(\gamma_{i,w}, \gamma_{i,v}),\ i = 1, \ldots, N\}$ are the dual variables corresponding to the constraints in (2-11).

We then solve the problem by updating $\theta$, $\omega$, and $\gamma$ sequentially. Specifically, at the $(k+1)$-th iteration, the $\theta$-minimization step involves solving the following problem:

$$\min \; \lambda \|w\|_1 + (N\rho/2)\, w^T\left(w - 2\bar{w}^k - 2\bar{\gamma}_w^k/\rho\right) + (N\rho/2)\, v\left(v - 2\bar{v}^k - 2\bar{\gamma}_v^k/\rho\right), \tag{2-13}$$

where the overline notation $\overline{(\cdot)}$ denotes the average of a vector over $i = 1, \ldots, N$. A closed-form solution of the above problem can be computed using subdifferential calculus [30]. Specifically, the optimal solution is given by

$$w^{k+1} := \left[\bar{w}^k + \bar{\gamma}_w^k/\rho - \lambda/(\rho N)\right]_+ - \left[-\bar{w}^k - \bar{\gamma}_w^k/\rho - \lambda/(\rho N)\right]_+, \tag{2-14a}$$

$$v^{k+1} := \bar{v}^k + \bar{\gamma}_v^k/\rho, \tag{2-14b}$$

where the operator $[\cdot]_+$ means taking the maximum of zero and the argument inside.

After obtaining $\theta^{k+1}$ from the $\theta$-minimization step, the $\omega$-minimization step consists of solving the following:

$$\begin{aligned} \min \quad & \sum_{i=1}^{N} \sum_{j=1}^{m_i} \log\left(1 + \exp\left(-b_{ij}\left(w_i^T a_{ij} + v_i\right)\right)\right) \\ & + \sum_{i=1}^{N} (\rho/2)\, w_i^T\left(w_i - 2w^{k+1} + 2\gamma_{i,w}^k/\rho\right) + \sum_{i=1}^{N} (\rho/2)\, v_i\left(v_i - 2v^{k+1} + 2\gamma_{i,v}^k/\rho\right), \end{aligned} \tag{2-15}$$

which is decomposable over all Patients. Effectively, each Patient $i$ only needs to independently solve the following subproblem:

$$\min \; \sum_{j=1}^{m_i} \log\left(1 + \exp\left(-b_{ij}\left(w_i^T a_{ij} + v_i\right)\right)\right) + (\rho/2)\, w_i^T\left(w_i - 2w^{k+1} + 2\gamma_{i,w}^k/\rho\right) + (\rho/2)\, v_i\left(v_i - 2v^{k+1} + 2\gamma_{i,v}^k/\rho\right). \tag{2-16}$$
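The server update (2-14) and the per-Patient subproblem (2-16) can be combined into a small end-to-end sketch. Several things here are assumptions made for illustration only: the toy dataset `PATIENTS`, the parameter values, all function names, the use of plain gradient descent (instead of Newton's method or conjugate gradient) for the local subproblem, and the dual step, which applies the generic ADMM update (2-9) to the constraints $w_i = w$, $v_i = v$. Everything runs in one process, so no privacy protection is modeled.

```python
import math

def dot(u, x):
    return sum(ui * xi for ui, xi in zip(u, x))

def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def log1pexp(t):
    """Numerically stable log(1 + exp(t))."""
    return t + math.log1p(math.exp(-t)) if t > 0 else math.log1p(math.exp(t))

def soft_threshold(t, kappa):
    """[t - kappa]_+ - [-t - kappa]_+, the scalar form of (2-14a)."""
    return max(t - kappa, 0.0) - max(-t - kappa, 0.0)

def objective(patients, w, v, lam):
    """Pooled l1-regularized logistic loss, as in (2-10)."""
    loss = sum(log1pexp(-b * (dot(w, a) + v))
               for samples in patients for a, b in samples)
    return loss + lam * sum(abs(wk) for wk in w)

def local_update(samples, w_i, v_i, w, v, g_w, g_v, rho, steps=100):
    """Approximately solve the per-Patient subproblem (2-16) by gradient descent."""
    n = len(w)
    # Step size 1/L, where L upper-bounds the curvature of the subproblem.
    L = rho + 0.25 * sum(dot(a, a) + 1.0 for a, _ in samples)
    for _ in range(steps):
        # Gradient of the quadratic terms: rho*(w_i - w) + gamma_i.
        grad_w = [rho * (w_i[k] - w[k]) + g_w[k] for k in range(n)]
        grad_v = rho * (v_i - v) + g_v
        for a, b in samples:
            c = -b * sigmoid(-b * (dot(w_i, a) + v_i))  # logistic loss slope
            for k in range(n):
                grad_w[k] += c * a[k]
            grad_v += c
        w_i = [w_i[k] - grad_w[k] / L for k in range(n)]
        v_i = v_i - grad_v / L
    return w_i, v_i

def distributed_admm(patients, n, lam=0.1, rho=1.0, iters=60):
    N = len(patients)
    w_loc = [[0.0] * n for _ in range(N)]
    v_loc = [0.0] * N
    g_w = [[0.0] * n for _ in range(N)]
    g_v = [0.0] * N
    w, v = [0.0] * n, 0.0
    for _ in range(iters):
        # Server: average local parameters and duals, then apply (2-14).
        for k in range(n):
            t = sum(w_loc[i][k] + g_w[i][k] / rho for i in range(N)) / N
            w[k] = soft_threshold(t, lam / (N * rho))
        v = sum(v_loc[i] + g_v[i] / rho for i in range(N)) / N
        # Patients: solve (2-16) locally, then update the dual variables.
        for i in range(N):
            w_loc[i], v_loc[i] = local_update(
                patients[i], w_loc[i], v_loc[i], w, v, g_w[i], g_v[i], rho)
            g_w[i] = [g_w[i][k] + rho * (w_loc[i][k] - w[k]) for k in range(n)]
            g_v[i] += rho * (v_loc[i] - v)
    return w, v

# Hypothetical toy data: three Patients, each with four (feature vector, label) pairs.
PATIENTS = [
    [([2.0, 1.0], 1), ([1.5, -0.5], 1), ([-1.0, 0.5], -1), ([0.5, 1.0], -1)],
    [([1.0, 0.0], 1), ([-2.0, 1.0], -1), ([-1.5, -1.0], -1), ([-0.5, 0.2], 1)],
    [([2.5, -1.0], 1), ([-1.0, -1.0], -1), ([0.8, 0.3], 1), ([-0.2, 0.9], -1)],
]
w_hat, v_hat = distributed_admm(PATIENTS, n=2)
print("w =", w_hat, "v =", v_hat)
print("objective:", objective(PATIENTS, w_hat, v_hat, 0.1))
```

In this sketch the server only ever combines the local quantities through their averages, which is the property the later secure summation step relies on.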

This per-Patient subproblem has a much smaller scale and uses the Patient's own private information. Standard methods such as Newton's method or the conjugate gradient method can be applied to solve the subproblem efficiently.

After we obtain $\theta^{k+1}$ and $\omega^{k+1}$, the dual update is as follows:

$$\gamma_{i,w}^{k+1} := \gamma_{i,w}^{k} + \rho\left(w_i^{k+1} - w^{k+1}\right), \tag{2-17a}$$

$$\gamma_{i,v}^{k+1} := \gamma_{i,v}^{k} + \rho\left(v_i^{k+1} - v^{k+1}\right). \tag{2-17b}$$

The entire procedure of our algorithm is described in Algorithm 1. Our problem meets the conditions in Theorem 2.1, and the proposed algorithm converges to the optimal solution. At the end of the algorithm, each Patient $i$ will learn the globally optimal classifier parameters $w$ and $v$ without sending his/her local training set to others. Therefore, this system can preserve user privacy without sacrificing the utility of the learning function.

Algorithm 1. Distributed Algorithm for Horizontally Partitioned Data
1: The mHealth server initializes $k \leftarrow 0$, $w^0 \leftarrow 0$, $v^0 \leftarrow 0$.
2: Each Patient $i$ initializes $k \leftarrow 0$, $\gamma_{i,w}^0 \leftarrow 0$, and $\gamma_{i,v}^0 \leftarrow 0$.
3: repeat
4: The mHealth server gathers $(w_i^k, v_i^k)$ and $(\gamma_{i,w}^k, \gamma_{i,v}^k)$ from all Patients $i \in N$ and averages them to get $\bar{w}^k$, $\bar{v}^k$, $\bar{\gamma}_w^k$, and $\bar{\gamma}_v^k$. Then it updates $w^{k+1}$ and $v^{k+1}$ according to (2-14) and broadcasts them to all Patients.
5: After receiving $w^{k+1}$ and $v^{k+1}$, each Patient $i$ solves the per-Patient subproblem (2-16) independently using his/her own training set and then independently updates the dual variables according to (2-17).
6: Each Patient sends the optimal solution $(w_i^{k+1}, v_i^{k+1})$ and $(\gamma_{i,w}^k, \gamma_{i,v}^k)$ to the mHealth server.
7: $k \leftarrow k + 1$
8: until convergence criteria are met

end

2.2.3 Vertically Partitioned Data

The term "vertically partitioned data" refers to partitioning data based on columns in databases. In vertically partitioned datasets, each row represents data sampled at the same time slot or under the same context, and the data in each row are collaboratively sensed by all Patients as shown in Fig. 2-3. Instead of exploiting the diversity of individuals for robustness as in the horizontally partitioned case, we try to involve more features of sensing data. We assume that Patients own disjoint subsets of sensors, and we aim to fit models with sensing data from the union of all these sensors.

A motivating scenario is the analysis of group therapy. Consider a therapy group where a therapist treats a group of Patients. After each group meeting, the therapist will evaluate the group meeting as effective ("+1") or ineffective ("-1"), which is the label of this meeting. Each participant of the group is equipped with sensors to sense his/her own biological and emotional status during the group meeting. The feature vector of a group meeting is the sensing data from all Patients. A series of group meetings will be held during some period, resulting in a vertically partitioned distributed dataset.

The vertically partitioned case is particularly useful when we need to monitor the data of all members in a group for group effect evaluation, as in the group therapy case. It is also very useful in a high dimensional data setting where the number of features (i.e., sensed metrics or risk factors of a disease) is very large. In this case, it would be a great commitment for an individual to use all sensors on his/her body, especially under a continuous sensing environment, because it is cumbersome and inconvenient. Thus an individual may hesitate to participate in such projects. Moreover, in many cases the training data are by-products of other health monitoring programs where Patients only use a small set of sensors that are closely related to their own health statuses.

In the case of vertically partitioned data, each Patient $i$ is equipped with a subset of sensors and obtains partial feature vectors over a specified time interval. Specifically, each

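Patient $i$ possesses a set of training samples $\hat{D}_i := \{(\hat{a}_{ij}, \hat{b}_j),\ j = 1, \ldots, m\}$, where $\hat{a}_{ij} \in \mathbb{R}^{n_i}$ is a partial feature vector, $\hat{b}_j \in \{-1, 1\}$ is the corresponding label of the outcome variable, $m$ is the number of training samples owned by each user, and $\sum_{i=1}^{N} n_i = n$. As in other papers [31] in the literature, the labels $\{\hat{b}_j,\ j = 1, \ldots, m\}$ are assumed to be known by all Patients and not private. Then, the $\ell_1$ regularized logistic regression problem becomes the following: Given a set of labeled training samples $\bigcup_{i=1}^{N} \hat{D}_i$, solve

$$\min \; \sum_{j=1}^{m} \log\left(1 + \exp\left(-\hat{b}_j\left(\sum_{i=1}^{N} w_i^T \hat{a}_{ij} + v\right)\right)\right) + \lambda \sum_{i=1}^{N} \|w_i\|_1, \tag{2-18}$$

where $w_i \in \mathbb{R}^{n_i},\ i = 1, \ldots, N$ and $v \in \mathbb{R}$. To solve the above problem with ADMM, we first introduce a set of auxiliary variables $\{z_{ij},\ i = 1, \ldots, N,\ j = 1, \ldots, m\}$ and reformulate the optimization problem as

$$\begin{aligned} \min \quad & \sum_{j=1}^{m} \log\left(1 + \exp\left(-\hat{b}_j\left(\sum_{i=1}^{N} z_{ij} + v\right)\right)\right) + \lambda \sum_{i=1}^{N} \|w_i\|_1, \\ \text{s.t.} \quad & w_i^T \hat{a}_{ij} - z_{ij} = 0,\ i = 1, \ldots, N,\ j = 1, \ldots, m. \end{aligned} \tag{2-19}$$

It is obvious that the new problem (2-19) is equivalent to the original problem (2-18). The objective function now is separable over two sets of variables $\{(v, z_{ij}),\ i = 1, \ldots, N,\ j = 1, \ldots, m\}$ and $\{w_i,\ i = 1, \ldots, N\}$. In the following, we demonstrate that through these auxiliary variables the problem can be decomposed.

For simplicity of notation, we define $\hat{\theta} := \{w_i,\ i = 1, \ldots, N\}$ and $\hat{\omega} := \{(v, z_{ij}),\ i = 1, \ldots, N,\ j = 1, \ldots, m\}$. Following the framework of ADMM, we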

formulate the augmented Lagrangian as

$$\begin{aligned} L_\rho(\hat{\theta}, \hat{\omega}, \hat{\gamma}) = {} & \sum_{j=1}^{m} \log\left(1 + \exp\left(-\hat{b}_j\left(\sum_{i=1}^{N} z_{ij} + v\right)\right)\right) + \lambda \sum_{i=1}^{N} \|w_i\|_1 \\ & + \sum_{j=1}^{m} \sum_{i=1}^{N} \hat{\gamma}_{ij}\left(w_i^T \hat{a}_{ij} - z_{ij}\right) + \sum_{j=1}^{m} \sum_{i=1}^{N} (\rho/2)\left(w_i^T \hat{a}_{ij} - z_{ij}\right)^2, \end{aligned}$$

where $\hat{\gamma} := \{\hat{\gamma}_{ij},\ i = 1, \ldots, N,\ j = 1, \ldots, m\}$ are the dual variables corresponding to the constraints in (2-19).

Our algorithm works as follows. At the $(k+1)$-th iteration, the $\hat{\theta}$-minimization step involves solving the following problem for each Patient $i$ in parallel:

$$\min \; \lambda \|w_i\|_1 + (\rho/2) \sum_{j=1}^{m} w_i^T \hat{a}_{ij}\left(w_i^T \hat{a}_{ij} - 2z_{ij}^k + 2\hat{\gamma}_{ij}^k/\rho\right). \tag{2-20}$$

After obtaining $\hat{\theta}^{k+1}$ from the $\hat{\theta}$-minimization step, the $\hat{\omega}$-minimization step consists of solving the following:

$$\begin{aligned} \min \quad & \sum_{j=1}^{m} \log\left(1 + \exp\left(-\hat{b}_j\left(\sum_{i=1}^{N} z_{ij} + v\right)\right)\right) \\ & + (\rho/2) \sum_{j=1}^{m} \sum_{i=1}^{N} z_{ij}\left(z_{ij} - 2(w_i^{k+1})^T \hat{a}_{ij} - 2\hat{\gamma}_{ij}^k/\rho\right). \end{aligned} \tag{2-21}$$

The $\hat{\omega}$-minimization problem can be further simplified as follows. Let $\bar{z}_j$ denote the average of $z_{ij}$ across all $i$. The $\hat{\omega}$-update problem can be rewritten as

$$\min \; \sum_{j=1}^{m} \log\left(1 + \exp\left(-\hat{b}_j\left(N\bar{z}_j + v\right)\right)\right) + (\rho/2) \sum_{j=1}^{m} \sum_{i=1}^{N} z_{ij}\left(z_{ij} - 2(w_i^{k+1})^T \hat{a}_{ij} - 2\hat{\gamma}_{ij}^k/\rho\right) \tag{2-22a}$$

$$\text{s.t.} \quad \bar{z}_j = (1/N) \sum_{i=1}^{N} z_{ij}. \tag{2-22b}$$

Note that in the above problem, minimizing over $z_{ij},\ \forall i$ with $\bar{z}_j$ fixed has the solution

$$z_{ij} = (w_i^{k+1})^T \hat{a}_{ij} + \hat{\gamma}_{ij}^k/\rho + \bar{z}_j - (1/N) \sum_{i'=1}^{N} \left(\hat{\gamma}_{i'j}^k/\rho + (w_{i'}^{k+1})^T \hat{a}_{i'j}\right). \tag{2-23}$$

Therefore, Problem (2-22) can be computed by solving the following unconstrained optimization problem:

$$\min \; \sum_{j=1}^{m} \left( \log\left(1 + \exp\left(-\hat{b}_j\left(N\bar{z}_j + v\right)\right)\right) + (N\rho/2)\,\bar{z}_j^2 - \rho\,\bar{z}_j \sum_{i=1}^{N} \left(\hat{\gamma}_{ij}^k/\rho + (w_i^{k+1})^T \hat{a}_{ij}\right) \right) \tag{2-24}$$

and then applying (2-23) to obtain $z_{ij}$. Substituting (2-23) for $z_{ij}^{k+1}$ in the dual update equation gives

$$\hat{\gamma}_{ij}^{k+1} := \rho \left( (1/N) \sum_{i'=1}^{N} \left(\hat{\gamma}_{i'j}^k/\rho + (w_{i'}^{k+1})^T \hat{a}_{i'j}\right) - \bar{z}_j^{k+1} \right), \tag{2-25}$$

which does not depend on $i$. Therefore, the dual variables $\hat{\gamma}_{ij}^{k+1},\ i = 1, \ldots, N$ are all equal and can be replaced by a single dual variable $\hat{\gamma}_j^{k+1}$.

In summary, by substituting $\hat{\gamma}_j$ and (2-23) into the $\hat{\theta}$-minimization, $\hat{\omega}$-minimization, and dual variable update equations, our final algorithm consists of the following iterations:

$$w_i^{k+1} := \arg\min_{w_i} \; \lambda \|w_i\|_1 + (\rho/2) \sum_{j=1}^{m} \left(w_i^T \hat{a}_{ij}\right)^2 - \rho \sum_{j=1}^{m} w_i^T \hat{a}_{ij} \left( (w_i^k)^T \hat{a}_{ij} + \bar{z}_j^k - \frac{\hat{\gamma}_j^k}{\rho} - \frac{1}{N} \sum_{i'=1}^{N} (w_{i'}^k)^T \hat{a}_{i'j} \right) \tag{2-26}$$

$$\hat{\omega}^{k+1} := \arg\min_{\bar{z},\, v} \; \sum_{j=1}^{m} \left( \log\left(1 + \exp\left(-\hat{b}_j\left(N\bar{z}_j + v\right)\right)\right) - \hat{\gamma}_j^k N \bar{z}_j + \frac{N\rho}{2}\,\bar{z}_j^2 - \rho\,\bar{z}_j \sum_{i=1}^{N} (w_i^{k+1})^T \hat{a}_{ij} \right) \tag{2-27}$$

$$\hat{\gamma}_j^{k+1} := \hat{\gamma}_j^k + \rho \left( \frac{1}{N} \sum_{i=1}^{N} (w_i^{k+1})^T \hat{a}_{ij} - \bar{z}_j^{k+1} \right). \tag{2-28}$$
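The two algebraic claims used above, that the $z_{ij}$ given by (2-23) average to $\bar{z}_j$ and that the resulting dual variables in (2-25) do not depend on $i$, can be checked numerically. The sketch below is only a sanity check for one fixed sample index $j$; the random values stand in for $(w_i^{k+1})^T \hat{a}_{ij}$ and $\hat{\gamma}_{ij}^k$, and all variable names and parameter values are hypothetical.

```python
import random

random.seed(0)
N, rho = 5, 2.0

# Stand-ins for one fixed sample index j.
u = [random.uniform(-1.0, 1.0) for _ in range(N)]      # (w_i^{k+1})^T a_ij
gamma = [random.uniform(-1.0, 1.0) for _ in range(N)]  # dual variables gamma_ij^k
z_bar = 0.3                                            # any fixed average value

# c_i = (w_i^{k+1})^T a_ij + gamma_ij^k / rho
c = [u[i] + gamma[i] / rho for i in range(N)]
c_bar = sum(c) / N

# (2-23): z_ij = c_i + z_bar - mean(c)
z = [c[i] + z_bar - c_bar for i in range(N)]

# Generic dual update for the constraint w_i^T a_ij - z_ij = 0.
gamma_next = [gamma[i] + rho * (u[i] - z[i]) for i in range(N)]

# (2-25): the common value rho * (mean(c) - z_bar).
gamma_common = rho * (c_bar - z_bar)

print("mean of z:", sum(z) / N)
print("updated duals:", gamma_next)
print("value from (2-25):", gamma_common)
```

Because every updated dual variable coincides with the single value from (2-25), the server only needs to track one dual variable per sample, as stated above.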

The entire procedure of our algorithm is described in Algorithm 2. At the end of the algorithm, Patients will learn the globally optimal regression parameters $w$ and $v$ without disclosing their local training set to others.

Algorithm 2. Distributed Algorithm for Vertically Partitioned Data
1: Initialization: $k \leftarrow 0$, $w_i^0 \leftarrow 0$, $\bar{z}_j^0 \leftarrow 0$, and $\hat{\gamma}_j^0 \leftarrow 0$.
2: repeat
3: Each Patient $i$ solves the per-Patient subproblem (2-26) independently using his/her own training set to obtain the optimal solution $w_i^{k+1}$ and then sends $\{(w_i^{k+1})^T \hat{a}_{ij},\ j = 1, \ldots, m\}$ to the mHealth server.
4: After gathering $\{(w_i^{k+1})^T \hat{a}_{ij},\ j = 1, \ldots, m\}$ from all Patients $i = 1, \ldots, N$, the mHealth server averages them to obtain $(1/N) \sum_{i=1}^{N} (w_i^{k+1})^T \hat{a}_{ij},\ j = 1, \ldots, m$. Then it updates $v^{k+1}$ and $\{\bar{z}_j^{k+1},\ j = 1, \ldots, m\}$ according to (2-27), and the dual variables $\{\hat{\gamma}_j^{k+1},\ j = 1, \ldots, m\}$ according to (2-28).
5: The mHealth server broadcasts $\bar{z}_j^{k+1}$, $\hat{\gamma}_j^{k+1}$, and $(1/N) \sum_{i=1}^{N} (w_i^{k+1})^T \hat{a}_{ij}$ to all Patients.
6: $k \leftarrow k + 1$
7: until convergence criteria are met
end

2.3 Private Aggregation of Local Regression Parameters

In this section, we describe a secure summation protocol that enhances the privacy of the distributed algorithms. The protocol computes the sum over encrypted values such that only the sum is learned by the mHealth server.

2.3.1 Private Computation at the mHealth Server

In the distributed algorithm for horizontally partitioned data, each Patient $i$ sends his/her local optimal solution $(w_i^{k+1}, v_i^{k+1})$ and $(\gamma_{i,w}^k, \gamma_{i,v}^k)$ to the mHealth server (Line 4, Algorithm 1). However, these local regression parameters are trained on individual private data and may leak sensitive information about Patients [24]. We observe that

in each iteration, the mHealth server only needs to know the average of these local optimal solutions, i.e., $\bar w^k$, $\bar v^k$, $\bar\lambda_w^k$, and $\bar\lambda_v^k$. A similar observation can be made for the distributed algorithm for vertically partitioned data, where the mHealth server gathers $\{(w_i^T)^{k+1}\hat a_{ij}, j=1,\ldots,m\}$ from all Patients $i=1,\ldots,N$ but only needs to know the averages $(1/N)\sum_{i=1}^{N}(w_i^T)^{k+1}\hat a_{ij}, j=1,\ldots,m$ (Algorithm 2, Line 4). Hence, the privacy issues in both algorithms can be mitigated if the mHealth server can calculate the average (or sum) without knowing the individual values.

We will first describe a naive approach that enables secure summation over distributed private values but leaks private values under collusion attacks. Then we will present a solution that mitigates such attacks with low computational and communication overhead. For simplicity we only describe how the mHealth server can obtain $\bar v^k$ from distributed values $v_i^{k+1}, i=1,\ldots,N$ without learning them. Without loss of generality, we assume that $v_i^{k+1}$ is an integer. Averaging over other types of distributed values (i.e., real numbers or vectors) can be calculated in a similar way: (i) When $v_i^k$ is a real number, a given precision is chosen in advance, and real numbers at that precision can be scaled by the corresponding factor to make them integers for encoding, as described in [32]; (ii) Averaging vectors can be treated as aggregating scalars at each component of the vectors. When the context is clear, we omit the superscripts $k$ and $k+1$.

Naive approach. A naive solution to averaging distributed private values is the secure summation protocol proposed by Clifton et al. [33]. Using their protocol, Patients are arranged in a unidirectional ring with one Patient acting as the protocol initiator. The protocol initiator selects a random number and adds the number to his/her own data, then the sum is passed along the ring, with Patients along the ring adding their own data to the sum. When the protocol initiator receives the sum again, he/she subtracts the random number from the sum and obtains the accurate sum of all Patients' data. The average can be directly computed by dividing the sum by $N$. Since the values passed between Patients are masked by a random value, which is only known by the protocol initiator, these

values are kept private. However, this approach is not secure against collusion: If the two neighbors of a Patient collude, they can infer the private value of the Patient. Moreover, this protocol requires Patients to interact with each other whenever a sum needs to be calculated, which is undesirable for computation over multiple iterations as required in our algorithms.

Figure 2-4. Aggregation of Private User Data.

Modified approach. To overcome the two aforementioned shortcomings, we use a homomorphic approach that is robust against collusion attacks and highly efficient for computation over multiple iterations [34]. With this approach, secure summation can be achieved without any active trusted third parties. Moreover, this approach has low amortized computational overhead and is thus efficient for our iterative algorithms.

The overview of this approach is shown in Fig. 2-4. At the beginning of the aggregation process, each Patient $i$ has a secret key $sk_i$ and the mHealth server has a secret key $sk_0$, where $\sum_{i=0}^{N} sk_i = 0$. The Patient encrypts his/her private data $v_i$ as $v_i + sk_i$ and sends the ciphertext to the mHealth server. The mHealth server sums all the ciphertexts and decrypts the sum as $\sum_{i=1}^{N} v_i = \sum_{i=1}^{N}(v_i + sk_i) + sk_0$, which follows from $\sum_{i=1}^{N} sk_i = -sk_0$. Since the mHealth server does not know the secret values of $sk_i$, the individual ciphertexts are meaningless random numbers from the view of the mHealth server. This scheme prevents collusion attacks because

each private value is randomized by a separate secret key and will only be revealed if the mHealth server colludes with all other Patients, which is unlikely.

In order to guarantee that $\sum_{i=0}^{N} sk_i = 0$, secret keys should be collaboratively generated in every iteration. This process requires an active trusted third party and is not practical for our iterative algorithms. To overcome this challenge, we rely on a hash function $H$ that maps an integer to an appropriate mathematical group. In the $k$th iteration of our algorithms, each Patient $i$ computes $H(k)^{sk_i}$ and the mHealth server computes $H(k)^{sk_0}$. From $\sum_{i=0}^{N} sk_i = 0$, we have $\prod_{i=0}^{N} H(k)^{sk_i} = 1$, which can be leveraged to generate secret keys without interactive communication in each iteration. We summarize the protocol below.

Let $G$ denote a cyclic group of prime order $p$ for which the Decisional Diffie-Hellman problem is hard. Let $H: \mathbb{Z} \rightarrow G$ denote a hash function.

Key generation: A trusted authority chooses a random generator $g \in G$ and random secrets $sk_1, \ldots, sk_N \in \mathbb{Z}_p$. The public parameter is $g$. Each user $i$ obtains a private key $sk_i$, and the mHealth server obtains its private key $sk_0 = -(sk_1 + \ldots + sk_N)$.

Encryption: During iteration $k$, Patient $i$ encrypts his/her private value $v_i$ as follows: $c_i \leftarrow g^{v_i} H(k)^{sk_i}$.

Decryption: Given the ciphertexts $c_1, c_2, \ldots, c_N$, compute $P \leftarrow H(k)^{sk_0} \prod_{i=1}^{N} c_i$, where $P = H(k)^{sk_0}\prod_{i=1}^{N} c_i = H(k)^{\sum_{i=0}^{N} sk_i}\, g^{\sum_{i=1}^{N} v_i} = g^{\sum_{i=1}^{N} v_i}$. The sum of the $v_i$ can then be calculated by computing the discrete log of $P$ base $g$.

The scheme allows the untrusted mHealth server to periodically estimate the sum $\sum_{i=1}^{N} v_i$ without knowing each individual value $v_i, i=1,2,\ldots,N$. The average $\bar v$ can then be readily calculated by dividing the sum by the total number of Patients $N$. Since Patients do not need to communicate with each other for sharing secrets after the initial key generation process, we only require a passive trusted authority during initialization.
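The key generation, encryption, and decryption steps summarized above can be sketched in Python as follows. This is a toy illustration under stated assumptions, not the implementation evaluated in this chapter: the small safe prime `P_MOD`, the subgroup order `Q_ORDER`, the generator `G_GEN`, the square-of-a-hash construction in `hash_to_group`, and the brute-force discrete log are all hypothetical choices made so the example runs instantly. A real deployment would use a group in which the Decisional Diffie-Hellman problem is actually hard and a proper hash-to-group function.

```python
import hashlib
import secrets

# Toy parameters (assumptions for illustration only, far too small to be secure).
P_MOD = 2039    # safe prime: P_MOD = 2 * Q_ORDER + 1
Q_ORDER = 1019  # prime order of the subgroup of squares modulo P_MOD
G_GEN = 4       # 4 = 2^2 is a square different from 1, so it generates that subgroup


def hash_to_group(k: int) -> int:
    """Map the iteration number k to a subgroup element (toy construction)."""
    digest = int.from_bytes(hashlib.sha256(str(k).encode()).digest(), "big")
    base = 2 + digest % (P_MOD - 3)   # base lies in [2, P_MOD - 2]
    return pow(base, 2, P_MOD)        # squaring lands in the order-Q_ORDER subgroup


def keygen(num_patients: int):
    """Trusted setup: Patient keys sk_1..sk_N and server key sk_0 = -(sum of sk_i)."""
    patient_keys = [secrets.randbelow(Q_ORDER) for _ in range(num_patients)]
    server_key = (-sum(patient_keys)) % Q_ORDER
    return patient_keys, server_key


def encrypt(value: int, sk: int, k: int) -> int:
    """c_i = g^{v_i} * H(k)^{sk_i} in the group."""
    return (pow(G_GEN, value, P_MOD) * pow(hash_to_group(k), sk, P_MOD)) % P_MOD


def aggregate(ciphertexts, server_key: int, k: int) -> int:
    """Server side: P = H(k)^{sk_0} * prod(c_i) = g^{sum v_i}; recover the sum."""
    acc = pow(hash_to_group(k), server_key, P_MOD)
    for c in ciphertexts:
        acc = (acc * c) % P_MOD
    # Brute-force discrete log; feasible only because the sum is small here.
    power = 1
    for candidate in range(Q_ORDER):
        if power == acc:
            return candidate
        power = (power * G_GEN) % P_MOD
    raise ValueError("sum is outside the searchable range")


if __name__ == "__main__":
    values = [3, 7, 12, 5]            # private integer values v_i
    keys, sk0 = keygen(len(values))   # done once, before the first iteration
    for k in (1, 2):                  # the same keys are reused in every iteration
        cts = [encrypt(v, sk, k) for v, sk in zip(values, keys)]
        print(k, aggregate(cts, sk0, k) / len(values))
```

Note that the keys are generated once and reused for every iteration number `k`; only `hash_to_group(k)` changes between rounds, which is what removes the need for per-iteration interaction.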

Hence, the computational overhead of secret keys does not increase with the iteration process, achieving low amortized computational overhead.

The computational overhead for the aggregation process in each iteration comes from two parts: encryption and secret key generation. The encryption operation in the construction includes one hash, one multiplication in a Diffie-Hellman group, and two modular exponentiations. The two modular exponentiations take much longer than the other operations and thus dominate the running time. According to the benchmarking report of the eBACS project [35], it takes around 0.3 ms to compute a modular exponentiation using high-speed elliptic curves on a modern 64-bit computer. Hence, the construction is practical and imposes low computational overhead on the Patients in mHealth applications.

2.3.2 Provable Privacy

We prove the privacy of our approach from the following two aspects:

First, we show that our computation protocol leaks no information beyond the intermediate and final aggregated regression parameters. We note that the homomorphic approach we use in our scheme is "aggregator oblivious" in the sense that the aggregator (i.e., the mHealth server) learns only the sum for each time period and nothing more. A detailed proof of this property can be found in [34]. Basically, their proof is based on the assumption that the Decisional Diffie-Hellman problem is computationally infeasible for probabilistic polynomial-time adversaries. The "aggregator oblivious" property of the homomorphic approach guarantees that the mHealth server cannot learn any unintended information other than what can be deduced from its auxiliary knowledge and the revealed computation results. From the mHealth server's view, the input data of the aggregation protocol, i.e., Patients' intermediate regression parameters, are indistinguishable from data uniformly chosen at random from the plaintext space.

Second, we show that the information leakage during the iterations of our algorithms is bounded. Intuitively, we can view the sum of local parameters revealed in each iteration as "global" information and thus privacy-preserving in common practice. We can provide

a strong privacy guarantee, $(\epsilon,\delta)$-differential privacy (DP) [36, 37], which ensures that the privacy risk of a user does not substantially increase if the user participates in collaborative learning, regardless of the auxiliary knowledge of adversaries. Formally speaking,

Definition 1. $\forall \epsilon, \delta \geq 0$, a randomized algorithm $F$ gives $(\epsilon,\delta)$-DP if for any two datasets $D_1$ and $D_2$ which differ in only one element, and $\forall O \subseteq \mathrm{range}(F)$, the following inequality holds:

$$\ln \frac{\Pr[F(D_1) \in O] - \delta}{\Pr[F(D_2) \in O]} \leq \epsilon. \tag{2-29}$$

Here the parameter $\epsilon$ bounds the ratio of the output probability distributions on two datasets differing in at most one element, and $\delta$ relaxes the strict relative shift at events that are not especially likely.

Most solutions that achieve differential privacy are based on perturbing the response with additive noise [36, 38] or perturbing the computation with external randomness [24, 39]. This can also be achieved in our scheme. Note that we want to add the proper amount of noise to the aggregated result rather than to each individual local parameter, because the noise for the latter case would be larger under the same privacy requirement. Meanwhile, we need to ensure that the aggregated results are perturbed before they are decrypted by the mHealth server. A feasible way to achieve this is to introduce a proxy, which can be another server at the mHealth service provider, and let the server and the proxy collaboratively add noise before decrypting the aggregated results, as proposed in [34]. Since the aggregation process in our algorithms is repeated iteratively, the noise added in each iteration would accumulate. However, due to the good convergence properties of our algorithms (they usually converge within a few tens of iterations), the total added noise can be kept at a low level, ensuring the accuracy of the final results.

Furthermore, we note that the process of aggregation has already incorporated randomness, thus providing certain privacy protection itself. In fact, Duan [40] has provided a rigorous proof that differential privacy can be achieved by aggregating vectors

from a large number of entities under certain constraints, as summarized in the following theorem:

Theorem 2.2 ([40]). Let $f$ be the sum of $N$ $n$-vectors $w_i, i=1,\ldots,N$, $w_i \in [0,1]^n$. Assuming $w_1, w_2, \ldots, w_N$ are i.i.d. with $E[w_i]=\mu$, $E[w_i w_i^T] - \mu\mu^T = V < \infty$, the summation is $(\epsilon,\delta)$-DP if $N$ is sufficiently large and

$$\lambda_{\min}(V) > \frac{2n^2 \log(2n/\delta)}{(N-1)\,\epsilon^2}, \tag{2-30}$$

where $\lambda_{\min}(V)$ is the smallest eigenvalue of matrix $V$.

This theorem provides a theoretical basis for achieving differential privacy in the aggregation process. Even if, for a given pair $(\epsilon,\delta)$, the constraint (2-30) is not satisfied, i.e., the aggregation process cannot provide enough privacy protection, we may still achieve $(\epsilon,\delta)$-DP with the perturbation approach, and the amount of noise required in the perturbation approach may be further reduced with Theorem 2.2.

2.4 Performance Evaluation

In this section, we evaluate the performance of our approach based on real-world datasets. All the simulations are conducted in MATLAB using a notebook with a 1.6 GHz CPU and 4 GB of memory. To provide benchmarks for the performance of our distributed approach, we compare it with the following two baselines:

Centralized approach: In this approach, the mHealth server has access to all Patient data and solves the logistic regression problem centrally. Although this algorithm can obtain the optimal performance, it violates Patient privacy and thus is privacy-oblivious.

Local approach: In this approach, each Patient trains the logistic regression model solely based on his/her own local data. Since the performance of the model highly depends on the size of the training set, the local approach has lower performance than the centralized

approach. In other words, the local approach protects Patient privacy at the cost of utility or accuracy.

2.4.1 Results on Horizontally Partitioned Dataset

We first demonstrate the performance of our distributed approach for horizontally partitioned data (Algorithm 1). We test our approach on the dataset for the physiological data modeling contest at the International Conference on Machine Learning in 2004 [41]. The dataset was collected from users using BodyMedia wearable body monitors. We use a subset of the dataset to classify two states of activities, which includes 4413 positive samples (context 1) and 98172 negative samples (context 2). Each sample contains 9-dimensional physiological data and 3 characteristics (denoted as "char 1", "char 2", and sex) of the users. Thus, we construct a $102585 \times 12$ matrix from the monitoring data. The label for each row is the context of the user when the sample is collected: When the user is under context 1, we label it as $1$; otherwise, we label it as $-1$. We aim to train a logistic regression model that predicts the label based on a new sample. In each experimental trial, we randomly select 14000 training samples (4000 positive samples, 10000 negative samples) and 1413 testing samples (413 positive samples, 1000 negative samples). The test error rates of the algorithms are averaged over 10 experimental trials.

We implement our distributed algorithm and observe good convergence properties for different numbers of Patients $N$. Since the convergence properties for different $N$ are similar, we only demonstrate the convergence results for $N=1000$. The convergence property of our distributed algorithm is depicted in Fig. 2-5, which shows the change of the logistic regression parameters w.r.t. the iteration number $k$. The x-axis of the plot represents the number of iterations $k$, and the y-axis of the plot represents the norm of the distance between the global optimal regression parameters and the intermediate regression parameters in each iteration. We can see from the figure that the logistic regression parameters obtained by our approach converge fast (around 40 iterations) to the optimal ones. To demonstrate the accuracy loss of the distributed approach w.r.t. the centralized

approach, we show in Fig. 2-6 the change of the objective function value w.r.t. iteration number $k$. The solid line indicates the objective value obtained by our approach, and the dashed line denotes the global optimal objective value obtained by the centralized approach. As shown in the figure, the objective value of our approach decreases very fast in the first few iterations and finally approaches the optimal value after 40 iterations. This indicates that our distributed approach can achieve the same accuracy as the centralized one, thus preserving privacy at no cost in accuracy.

As we have shown in Fig. 2-5 and Fig. 2-6, the model computed from the centralized approach is the same as that from our distributed approach. Therefore, we only need to compare the testing error rates of models computed by the distributed approach and the local approach. Since for the local approach the performance depends on the size of the training set for each user, we compare them under different user numbers $N$ to see the influence of local data size on the error rates. Note that when $N=1$, the performance of the two approaches is the same since both are identical to the centralized approach. We set $N=100, 200, \ldots, 1000$ and then randomly divide the original datasets into smaller training sets, respectively. The testing error rates of our distributed approach and the local approach are shown in Fig. 2-7. On one hand, the error rate of the local approach increases as $N$ increases (i.e., sample size per user decreases) due to the lack of diversity through data sharing. On the other hand, the error rate of our approach does not change w.r.t. $N$ because our algorithm always converges to the global optimal solution. This shows the advantage of collaborative learning by utilizing datasets sensed by multiple users.

The maximum computation time for any user at each iteration in our algorithm is 0.007 sec. The total time for the distributed approach to converge is around 0.12 sec. Therefore, our approach converges fast to the global optimal solution and incurs small computation overhead for each user.

Figure 2-5. Convergence of the logistic regression parameters obtained from our distributed approach on the horizontally partitioned dataset.

Figure 2-6. Comparison of the objective of our approach (solid line) and the optimal objective (dashed line) on the horizontally partitioned dataset.

2.4.2 Results on Vertically Partitioned Dataset

In this section, we demonstrate the performance of our distributed approach for vertically partitioned data (Algorithm 2). Since no vertically partitioned medical database is readily available, we utilize dataset I of the Brain-Computer Interface Competition III

(BCI-III-I) [10, 42, 43] to simulate our scenario.

Figure 2-7. Testing errors for our distributed approach and the local approach on the horizontally partitioned dataset.

The dataset records electrocorticography (ECoG) data of epileptic patients. In the original setting, each patient is sensed by 64 implanted electrodes covering certain locations of the cortex. However, here we assume that data for each trial is collaboratively sensed by 64 patients with 1 electrode implanted in each patient. A total of 278 trials are performed for data collection. Each trial starts with a cue of an imagery task (tongue or finger movement), and patients are required to mentally follow the cue. Their ECoG data during a 3-second imagination phase are sampled at a sampling rate of 10 Hz, resulting in $30 \times 64$ data points or a 1920-dimensional feature vector per trial. The data for all 278 trials form a $278 \times 1920$ matrix. Each row of the matrix represents data sampled with the same imagery cue by all 64 patients and has the same label ($+1$ or $-1$ for the two different imagery tasks, respectively). The number of available training points is relatively small compared to the dimensionality of the data signal, which is a common case for vertically partitioned databases and can be effectively addressed by our distributed approach. Note that we use data sensed by a single patient to simulate data sensed by 64 patients, thus the training result may deviate from the original vertically partitioned setting. However, the goal of this experiment is to test the

performance of our distributed approach rather than to obtain the regression parameters, thus this deviation does not invalidate our conclusion. In each experimental trial, we randomly select 200 training samples (100 positive, 100 negative) and 78 testing samples (36 positive, 36 negative) with each sample collaboratively sensed by 64 users. The testing error rates are averaged over 10 experimental trials.

Once again, we observe good convergence properties of our algorithms for different numbers of Patients $N$ and therefore, for simplicity, we only show the convergence results of our distributed algorithm for $N=64$. Fig. 2-8 illustrates the change of the logistic regression parameters w.r.t. the iteration number $k$ when the dataset is vertically partitioned into $N=64$ subsets. The figure shows that the difference between the regression parameters obtained by our algorithm and the optimal parameters converges to zero within tens of iterations. Fig. 2-9 shows the change of the objective function value w.r.t. iteration number. The solid line indicates the objective value obtained by the distributed approach, and the dashed line denotes the optimal objective value obtained by the centralized approach. As shown in the figure, the objective value of our approach decreases fast in the first few iterations and finally approaches the global minimum after 50 iterations. Therefore, our distributed approach can achieve the same accuracy as the centralized one.

Next, we compare the error rates of models computed by the distributed approach and the local approach. Since for the local approach the performance depends on the size of the training set for each user, we compare these two approaches under different user numbers $N$ to see the influence of local data size on the error rates. When $N=1$, the performance of the two approaches is the same since both are identical to the centralized approach. We set $N=2, 4, 8, 16, 32, 64$ and randomly partition the original datasets into smaller training sets, respectively. Fig. 2-10 shows the error rates of our distributed approach and the local approach. We can see that the error rate of the local approach increases as $N$ increases (i.e., sample size per user decreases) due to the lack of diversity

through data sharing. The performance of our distributed approach does not depend on $N$ since it always converges to the optimal solution after tens of iterations. The figure shows the benefit of data sharing in the vertically partitioned data.

Figure 2-8. Convergence of the logistic regression parameters obtained from our distributed approach on the vertically partitioned dataset.

Figure 2-9. Comparison of the objective of our approach (solid line) and the optimal objective (dashed line) on the vertically partitioned dataset.

Figure 2-10. Testing errors for our distributed approach and the local approach on the vertically partitioned dataset.

The maximum computation time for any user at each iteration in our algorithm is 0.023 sec and the total time spent until convergence is 1.15 sec. Therefore, our approach is highly efficient even with 1920-dimensional data.

2.5 Related Work

There are a number of papers on private computation on medical data, but most of them focus on simple computations such as searching on encrypted medical data [44], computing statistical functions such as sum and variance [45], or performing predictive analysis tasks on encrypted data [32]. Few papers consider private model learning based on large-scale medical data despite its great potential for healthcare quality and efficiency improvements. There are, however, several approaches for private model learning in general, as summarized below.

Anonymization: One of the most popular ways for privacy-preserving learning is to anonymize the data by hiding the identity of the data source [46, 47]. However, it is possible to re-identify the data source. Narayanan and Shmatikov design a linkage attack that identifies personal information by linking two or more separate datasets [48]. A recent

study in medical data demonstrates that individuals with detailed medical profiles can be re-identified among anonymized medical data [49].

Perturbation-Based Approach: Another approach is to perturb the data content before transmitting it to the centralized party [50–53]. Fong and Weber-Jahnke [50] transform the original training samples into unreal data samples and use the unreal data samples for decision tree learning. However, perturbation always introduces error in the modeling process, trading accuracy for privacy. A modern privacy definition related to this approach is differential privacy, which requires that the output of a computation be roughly equally likely with or without any single input record [37]. The most common way to achieve differential privacy is through adding random noise [38]. In [54], McSherry and Mironov design a privacy-preserving scheme for training a recommendation system by adding differentially private noise to user data. Our approach is orthogonal to differential privacy due to differences in threat models. Differential privacy protects private information contained in the final computational results by injecting noise into the results, while we aim to protect private information during the computation process such that the party who performs the computation learns nothing more than the computational results.

Secure Multi-Party Computation: The secure multi-party computation-based approach is a conventional approach to training classifiers based on private data owned by multiple parties. A combination of cryptographic techniques is used to compute a function of their private data [55–57]. This approach usually guarantees that no party can learn anything beyond what is contained in the final result. However, the cryptographic techniques used in secure multi-party computation usually incur high computation cost, which is impractical for mHealth applications due to the limited computing resources of mobile devices.

Homomorphic Encryption: Gentry [58] provides a fully homomorphic encryption solution for privacy-preserving computation, which avoids the need for two non-colluding parties. However, logistic regression involves a large number of both multiplication and

addition steps. In this situation, current solutions for fully homomorphic encryption are not quite efficient [45, 59]. Although Lauter et al. [45] mention that their fully homomorphic encryption scheme can be used for regression, they do not show its performance. Graepel et al. train encrypted classifiers on encrypted training data using leveled homomorphic encryption [59]; however, the efficiency of their approaches degrades rapidly when the size of the training data increases.

For model learning based on large-scale biomedical sensing data, it is important that the training algorithms scale well as the number of Patients increases. Most of the aforementioned cryptographic solutions incur high computation or communication load at the Patient side, and thus cannot be directly applied to our scenario. We address the scalability problem by decomposing the centralized optimization problem into subproblems such that the computation cost per Patient does not greatly increase with the number of Patients. Specifically, the decomposition algorithm we use is based on ADMM, which has been previously used for decomposing support vector machines (SVM) in [60]. Due to the decomposition, only the average of the locally optimal parameters is needed by the mHealth server. Thus we can utilize a simple secure summation protocol with low amortized computational cost to protect private intermediate results.

This paper is an extension of its conference version [61], with a new solution for vertically partitioned healthcare data, more in-depth explanations of our approach, and a more extensive experimental evaluation.

CHAPTER 3
A PRIVACY-PRESERVING SCHEME FOR INCENTIVE-BASED DEMAND RESPONSE IN THE SMART GRID

The smart grid is a modernized power grid that uses information and communication technologies to improve the efficiency, reliability, economics, and sustainability of the generation, transmission, distribution, and consumption of electricity. In the smart grid, a full measurement and collection system called the advanced metering infrastructure (AMI) replaces traditional electromechanical meters. The AMI collects fine-grained, time-based information and transmits it to various parties through a communication network, enabling the integration of demand-side resources into the wholesale market and hence the demand response (DR). According to the U.S. Department of Energy, DR refers to "changes in electric use by demand-side resources from their normal consumption patterns in response to the varying electricity price, or to incentive payments designed to reduce electricity use when wholesale market prices are high or when system reliability is jeopardized" [62].

In the power grid, generation and consumption should be balanced instantaneously. The load-following strategy, where a power plant adjusts its power supply to match the fluctuating demand, has been dominant in traditional power grid operations. However, this strategy incurs a high cost in terms of environment, grid reliability, and operational efficiency. In contrast, the smart grid places great emphasis on the DR strategy where consumers shape their power demand to match the supply [63]. DR supports high penetration of renewable energy generation by shaping the demand to match the intermittent and unpredictable power output of renewable generation, and it also brings other benefits such as peak shaving, reliability enhancement, and generation cost reduction.

Generally speaking, there are two types of DR programs: price-based demand response (PDR) programs that motivate customers to change their consumption patterns according to time-varying electricity prices and incentive-based demand response (IDR) programs that reward participating customers for reducing their electricity usage in

response to DR requests. Although more utilities offer some types of PDR programs to customers than IDR programs, PDR accounts for just a small part of the total DR resource base [64]. Since IDR programs can be tailored to specific operational goals such as localized load reduction during transmission congestion, they diversify the ways in which demand-side management contributes to reliable and efficient grid operations.

In IDR programs, the time interval of measurements varies from hours to seconds based on different trigger conditions [1], which poses a serious threat to customer privacy [65, 66]. It has been shown that power usage profiles at a granularity of 15 minutes may reveal whether a child is left alone at home, and profiles at a finer granularity may reveal the daily routines of customers [67]. Despite its importance, the privacy issues in IDR programs have never been addressed before. The unique challenge of IDR programs lies in the fact that the meter measurements should be both attributable and fine-grained, which excludes some popular privacy-preserving approaches that address privacy issues in PDR programs. In IDR programs, there is a new party called the demand response provider (DRP), who aggregates demand-side resources of customers and rewards customers based on their demand curtailments in DR events. The DRP can either be the electric utility company or a third party, and it collects fine-grained metering measurements in order to calculate the customer baseline (CBL) and hence the demand curtailments.

In this paper, we aim at preserving customer privacy for IDR programs in the smart grid. We propose a scheme that enables the DRP to profile, reward, and provide feedback to customers in IDR programs without violating customer privacy. The DRP is able to analyze fine-grained metering data to calculate CBLs, schedule demand curtailments, and correctly reward customers, but it cannot link the real identity of a customer to the fine-grained metering data. Our scheme is constructed from cryptographic primitives. Individual metering data are signed with a special technique such that the authenticity can be verified without revealing the real identity of the signer. When customers want to inquire about their metering data or claim their DR rewards, they prove their eligibility to

the DRP but reveal no additional information about themselves. With these techniques combined, the anonymity of customers is guaranteed throughout the IDR processes. As far as we know, we are the first to address the privacy issues in IDR programs.

The rest of the paper is organized as follows. Section 3.1 presents the cryptographic primitives used in our scheme. We provide some background on IDR programs and describe the components, system flow, and design goals of our scheme in Section 3.2. Section 3.3 elaborates on the proposed scheme, where we design privacy-preserving protocols for different processes in IDR programs. Practical considerations and useful extensions are presented in Section 3.4. Section 3.5 and Section 3.6 analyze the security and the efficiency of the proposed scheme, respectively. Section 3.7 reviews the related work.

3.1 Cryptographic Primitives

This section gives an introduction to the cryptographic primitives used as the building blocks in our scheme.

Identity-Committable Signature (ICS). The identity-based signature scheme [68] avoids the use of certificates in conventional public key infrastructure by deriving the public key of a signer from his public identity information such as email address and telephone number. The scheme designed in [69] makes use of bilinear pairings on elliptic curves, a popular technique in identity-based public key cryptography. Let $G$ be an additive group with generator $P$, and $G_T$ be a multiplicative group. A mapping $\hat e: G \times G \rightarrow G_T$ is called a bilinear pairing if it satisfies the following:

Bilinearity: $\hat e(aP, bQ) = \hat e(P,Q)^{ab}$ for all $a, b \in \mathbb{Z}_p$ and $P, Q \in G$.
Non-degeneracy: If $P$ is a generator of $G$, then $\hat e(P,P) \neq 1$.
Computability: There exists an efficient algorithm to compute $\hat e(P,Q)$ for all $P, Q \in G$.

Based on the identity-based signature scheme, Chu and Tzeng [70] construct an ICS scheme which allows a signer to sign a message on behalf of an organization or a group.
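The three pairing properties listed above can be checked numerically on a deliberately insecure toy map. The sketch below is an assumption-laden illustration, not the elliptic-curve pairing used in [69] or [70]: it takes $G$ to be the additive group of integers modulo `Q_ORDER`, takes $G_T$ to be the order-`Q_ORDER` subgroup of integers modulo `P_MOD`, and defines the hypothetical map `pairing(X, Y) = g^(X*Y)`. Discrete logs in this $G$ are trivial, so the map is useful only for seeing what bilinearity means.

```python
# Toy parameters (assumptions for illustration only; not a secure pairing).
Q_ORDER = 1019  # prime order of both toy groups
P_MOD = 2039    # prime with P_MOD = 2 * Q_ORDER + 1
GT_GEN = 4      # generator of the order-Q_ORDER subgroup of integers mod P_MOD
P_GEN = 1       # generator "P" of the additive group Z_{Q_ORDER}


def scalar_mul(a: int, point: int) -> int:
    """Compute aP in the additive toy group G = Z_{Q_ORDER}."""
    return (a * point) % Q_ORDER


def pairing(x: int, y: int) -> int:
    """Toy bilinear map e(X, Y) = GT_GEN^(X*Y), an element of G_T."""
    return pow(GT_GEN, (x * y) % Q_ORDER, P_MOD)


if __name__ == "__main__":
    a, b = 5, 7
    q_point = scalar_mul(3, P_GEN)  # an arbitrary second point Q

    # Bilinearity: e(aP, bQ) == e(P, Q)^(ab)
    lhs = pairing(scalar_mul(a, P_GEN), scalar_mul(b, q_point))
    rhs = pow(pairing(P_GEN, q_point), a * b, P_MOD)
    print("bilinear:", lhs == rhs)

    # Non-degeneracy: e(P, P) is not the identity of G_T
    print("non-degenerate:", pairing(P_GEN, P_GEN) != 1)
```

A verification equation such as $\hat e(Q, P_X) = \hat e(Q', P)$ relies on exactly this property: scalars can be moved between the two arguments without changing the value of the pairing.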

The scheme is set up as follows. The private key generator (PKG) chooses a master secret key $(x,y): x, y \in_R \mathbb{Z}_p$ and three hash functions $H_1: \{0,1\}^* \rightarrow G$, $H_2: \{0,1\}^* \times G \rightarrow \mathbb{Z}_p$, and $H_2': \{0,1\}^* \times G \times G \rightarrow \mathbb{Z}_p$. Then it computes $P_X = xP$, $P_Y = yP$ and publishes $(G, G_T, \hat e, P, P_X, P_Y, H_1, H_2, H_2')$ as the public parameters. For identity $I$, the DRP calculates $Q_I = H_1(I)$, $Q_I' = xQ_I$, and $S_I = xyQ_I$. The public and private key pairs for the user are $Q_I$ and $(Q_I', S_I)$, respectively. To generate an ICS on message $m$, the signer randomly selects a value $r \in \mathbb{Z}_p$, computes $h = H_2(m, U)$, and generates $U_I = rQ_I'$, $V_I = (r+h)S_I$. The signer then chooses a secret $\kappa \in \mathbb{Z}_p \setminus \{1\}$ and computes $Q = \kappa Q_I$, $Q' = \kappa Q_I'$, $U = \kappa U_I$, $V = \kappa V_I$. The ICS on message $m$ is $IC = (Q, Q', U, V)$. To verify the signature, the verifier calculates $h = H_2'(m, Q, U)$ and accepts the signature if and only if $\hat e(Q, P_X) = \hat e(Q', P)$ and $\hat e(U, P_Y) = \hat e(V, P)\,\hat e(Q', -P_Y)^h$ hold.

Zero-knowledge Proof (ZKP). The notion of ZKP is introduced by Goldwasser et al. [71], in which the prover takes interactive input from the verifier and responds based on this input. With the Fiat-Shamir heuristic [72], the ZKP can be transformed into a non-interactive form where interaction is not needed between the verifier and the prover. We follow the notation introduced by Camenisch and Stadler [73] to describe the ZKP protocols and let $PK\{\cdot\}$ denote the zero-knowledge proof of a statement. For instance, $PK\{(\alpha): C = g^{\alpha}\}$ is used to prove the knowledge of the discrete logarithm of $C$ with base $g$. This is equivalent to the knowledge of an $\alpha$ that satisfies the expression on the right side of the colon.

Partially Blind Signature. Commitment schemes enable one to commit to a chosen value without revealing it. A well-known commitment scheme is the Pedersen Commitment [74]. Let $G$ be a group of prime order $p$, and $g$ and $h$ be generators of $G$. To commit to a value $x \in \mathbb{Z}_p$, the committer randomly chooses $r \in \mathbb{Z}_p$, computes $C = g^x h^r$, and outputs $C$ as the commitment. To reveal $x$, the committer discloses $x, r$. The verifier can verify if $C = g^x h^r$. Multiple values can be committed in a single commitment. For

example, the commitment for $x_1, x_2$ is $C = g_1^{x_1} g_2^{x_2} h^r$, where $g_1, g_2$ are generators of $G$. We denote the Pedersen Commitment on message $x$ as $CM(x)$.

An application of commitment schemes is the BBS+ signature designed in [75] and [76]. The construction of the BBS+ signature is partially blinded: the signer can sign messages in a commitment without knowing their values. Let $G$, $G_T$ be two cyclic groups of prime order $p$, and $\hat e: G \times G \rightarrow G_T$ be a bilinear pairing function. Let $g, g_0, g_1, g_2 \in G$ be generators of $G$, which are public parameters. The signer randomly chooses $\gamma \in \mathbb{Z}_p$ as his secret key and computes $\omega = g^{\gamma}$ as his public key. To sign messages $m_1, m_2$, the signer randomly chooses $c, z \in \mathbb{Z}_p$, computes $A = (g\,g_0^{z}\,g_1^{m_1}\,g_2^{m_2})^{1/(c+\gamma)}$, and outputs $(A, c, z)$ as the signature. One can verify a BBS+ signature by checking if $\hat e(A, \omega g^{c}) = \hat e(g\,g_0^{z}\,g_1^{m_1}\,g_2^{m_2}, g)$ holds.

Figure 3-1. Electricity Market for Incentive-Based Demand Response (IDR).

3.2 System Model

We describe the system model of the proposed privacy-preserving scheme in this section.

3.2.1 Background

As shown in Fig. 3-1, the electricity market for IDR involves three entities: the market operator, the DRP, and the customers. The market operator manages the electricity market and triggers DR events based on the status of the power grid. When a DR event is triggered, the DRP schedules load curtailment among customers and aggregates these demand-side resources. Participating customers reduce their load

during a DR event as scheduled. The market operator pays the DRP for its aggregate curtailment, and the DRP then allocates the reward among participating customers. Since customers have unequal contributions to the aggregate load curtailment, they should be rewarded based on their contributions so that active customers are encouraged. Individual customer contribution is calculated as the difference between his real-time power consumption and his CBL, which represents the "behave-as-usual" usage pattern of a customer. Calculation of the CBL is among the most important factors in an IDR program because it should neither reward nor penalize a customer for his natural load variances. Fig. 3-2 (depicted by [1]) gives an example of a CBL, where the initial baseline is adjusted according to the actual load on that day so that the effort of demand reduction of the customer can be fairly estimated. In order to mimic the dynamic shape of the customer load, CBL calculation algorithms of the DRP take as input an extensive dataset including both fine-grained historical meter measurements and peripheral data (e.g., weather and time of the day) [1, 77]. However, these fine-grained metering data as required by the DRP in the CBL and curtailment calculation raise serious customer privacy concerns.

Figure 3-2. Example Baseline and Performance Measurement for Demand Response Asset [1].

3.2.2 Components

To address these privacy concerns, we propose a scheme which enables the DRP to perform all the required operations without linking customer identity and fine-grained


metering data. The scheme involves four components, i.e., smart meters, the proxy, the DRP, and customer devices.

Smart Meters. The utility company installs smart meters at customer premises, one for each customer. Smart meters are assumed to be tamper-resistant and able to perform elementary cryptographic operations, but they cannot store long-term metering data or perform CBL calculation due to limited storage and computation capabilities.

The Proxy. The proxy plays the role of an anonymizer which hides the static IP addresses of smart meters. It can be either the gateway or a trusted third party. The proxy is semi-trusted, meaning curious but not malicious: it may attempt to learn private customer information, but it will faithfully relay the metering data and hide the smart meter IP address from the DRP. From now on, when we refer to an "anonymous channel", we mean an anonymous communication channel established by the proxy.

The DRP. The DRP schedules DR events among customers, records customer performance in DR events, and calculates their corresponding rewards. The DRP is semi-trusted, meaning that it may attempt to learn private customer information, but it will faithfully follow protocol specifications.

Customer Devices. Customers query the DRP to learn their own metering data and claim DR rewards through customer devices (e.g., personal computers or smartphones). Customers are assumed to be curious and potentially malicious. They may impersonate other customers or collude with the DRP to learn the power usage profiles of other customers, or cheat to gain undeserved rewards.

There may also be external attackers who launch denial-of-service or man-in-the-middle attacks, or who eavesdrop. However, addressing these attacks is beyond the scope of this work.

3.2.3 System Flow
The scheme includes the following processes. In the registration process, the DRP creates two accounts for a customer, one associated with his real identity and the other associated with his pseudonym. The real identity can be any information that


uniquely identifies the customer, such as the account number or telephone number. Since a customer can only enroll in a single IDR program at a time, the DRP needs to make sure that a customer does not register multiple pseudonyms. This is achieved with an anonymous ticket: the customer obtains a ticket when he registers his real identity and presents it to the DRP when he anonymously registers the pseudonym. In the metering process, the smart meter collects metering data, constructs signatures on them, and sends them together with its pseudonym to the DRP through the anonymous channel. The DRP stores the data by pseudonym in the database and analyzes the data for operational and settlement purposes. The ICS signature ensures the authenticity of the metering data, while the ZKP ensures that adversaries cannot change the pseudonym in the message. The ZKP also enables customers to prove ownership of their pseudonyms when making personal inquiries for CBLs or metering data in the querying process. In the settlement process, customers claim rewards with a partially blinded signature (BBS+), which hides the real identity but ensures integrity. The pseudo accounts of customers are revoked in the revocation process when customers leave the DRP programs.

3.2.4 Design Goals
We intend to design a scheme that guarantees privacy, integrity, and availability.

Privacy. Customers need to register their real identities for security reasons. However, they want to remain anonymous when querying their metering data or claiming their rewards. We guarantee this by allowing no party other than the customer himself to know his fine-grained power usage profile.

Integrity. We also need to ensure the integrity of the scheme. Misbehaviors such as falsifying the metering data or double spending should be detected immediately.

Availability. Guaranteeing the availability of IDR programs means that all the features required by IDR programs are fulfilled and that efficiency is guaranteed. Specifically, the DRP can gather information to profile, reward, and provide feedback to customers, while customers can learn their DR performance and claim their rewards.


Figure 3-3. Registration Process.

Moreover, since the metering data should be transmitted and processed with low latency, the metering process should have low computation and communication overhead.

3.3 Basic Protocol Design
We describe the basic protocols in this section. Due to the page limit, we leave the detailed construction of the protocols and ZKPs to our technical report [78]. The DRP plays the role of the private key generator (PKG) and sets up the master key and public parameters for the ICS scheme and the BBS+ scheme as described in Section 3.1.

3.3.1 Registration Process
Fig. 3-3 describes the registration process. The customer reveals his identity $I$ to the DRP for registration. After verifying his eligibility, the DRP computes and sends the public/private key pair $(Q'_I, S_I)$ (for the ICS signature) to the smart meter of the customer. Moreover, the customer commits to a random secret $s$ and sends the commitment $CM(s)$ to the DRP. The DRP then creates and returns $RG_s$, a BBS+ signature on $s$ tagged with "RG". The value of $s$ remains hidden during the process. After the customer receives $RG_s$, he stores $(RG_s, s)$ as the registration ticket for his pseudonym.

The customer also registers a pseudonym. To this end, the customer selects a random number $\alpha_I$ as his secret and computes his pseudonym $P_I$ as $P_I = g_4^{\alpha_I}$. After a random


Figure 3-4. Metering and Querying Process.

delay, the customer sends $P_I$ and $(RG_s, s)$ to the DRP through an anonymous channel. He proves to the DRP that (1) $RG_s$ is a valid signature on $s$, and (2) $P_I = g_4^{\alpha_I}$, with a ZKP PK1:

\[ PK1\{(\alpha_I, A, c, z, I, z') : P_I = g_4^{\alpha_I} \wedge \hat{e}(A, \omega g^{c}) = \hat{e}(g g_0^{z} g_1^{I} g_3^{s}, g)\}. \tag{3-1} \]

If the DRP successfully verifies the ZKP, it establishes a pseudo account associated with $P_I$. To initialize the balance in the pseudo account, the customer randomly selects a new value of $s$, sends the commitment $CM(I, B, s)$ to the DRP via the anonymous channel, and obtains $BL_s$, a BBS+ signature on $(I, B, s)$. Here, $B$ denotes the balance and is initialized to 0. The customer stores $(BL_s, I, B, s)$ as the balance ticket. Note that $s$ is updated every time, and hence a customer cannot use the same ticket twice. After the registration of pseudonym $P_I$, the customer inputs the pseudonym into the smart meter. The smart meter stores the pseudonym locally and only uses the pseudonym for metering purposes.

3.3.2 Metering and Querying Processes
Fig. 3-4 describes the metering and querying processes. At each reporting cycle $t$, the smart meter collects metering data $m_t$ and generates an ICS signature $IC$ on the metering data. It then attaches the pseudonym of the customer to the message and sends the entire


Figure 3-5. Settlement Process.

message $(m_t, t, P_I, IC)$ to the DRP through the anonymous channel. Upon receiving the message, the DRP checks the validity of $IC$. If $IC$ passes the verification, the DRP stores $(m_t, IC)$ as the metering record at time $t$ for the pseudo account $P_I$. Otherwise, the DRP discards the message. The metering records associated with $P_I$ can be used to calculate the individual CBL and allocate DR rewards.

In the querying process, the customer proves his knowledge of the secret key $\alpha_I$ of pseudo account $P_I$ with a ZKP PK2:

\[ PK2\{\alpha_I : P_I = g_4^{\alpha_I}\}. \tag{3-2} \]

The customer sends a querying request together with PK2 to the DRP via the anonymous channel. If PK2 is correctly constructed, the DRP locates the requested data in the database and sends it back over the established anonymous channel. Otherwise, the request is rejected.

3.3.3 Settlement Process
The DR rewards are allocated to pseudo accounts by the DRP based on individual curtailments. Customers may claim their rewards in two steps, as described in Fig. 3-5. First, a customer adds the reward to his balance ticket through the anonymous channel. Suppose his old balance ticket is $(BL_{\tilde{s}}, I, \tilde{B}, \tilde{s})$. To transfer reward $d$ from the pseudo account, the customer first checks if the reward in his pseudo account is larger than $d$.


If yes, he selects a random secret $s$ and sends the commitment $CM(s, I, B, \tilde{B})$ to the DRP, together with the following ZKP PK3:

\[ PK3\{(\alpha_I, \tilde{A}, \tilde{c}, I, \tilde{z}, \tilde{B}, \tilde{s}) : P_I = g_4^{\alpha_I} \wedge B - d > 0 \wedge C = g_0^{z'} g_1^{I} g_2^{\tilde{B}} g_3^{s} \wedge \hat{e}(\tilde{A}, \omega g^{\tilde{c}}) = \hat{e}(g g_0^{\tilde{z}} g_1^{I} g_3^{\tilde{s}}, g)\}, \tag{3-3} \]

which shows that his pseudonym is $P_I$, the new balance is positive, and the balance ticket is correctly formed. The DRP then verifies that $\tilde{s}$ has never been shown before and that PK3 holds. If so, it replies with a new BBS+ signature $BL_s$ on the tuple $(I, B, s)$. The customer stores $(BL_s, I, B, s)$ as the new balance ticket.

Second, the customer redeems the reward from the balance ticket with his real identity. The customer selects a new $s$ and sends the balance ticket $BL_{\tilde{s}}$, the withdrawal amount $d$, and a ZKP PK4, which is a combination of PK2 and PK3, to the DRP. The DRP then verifies the validity of PK4 and checks that $\tilde{s}$ has never been used before. If both checks pass, it returns a new BBS+ signature $BL_s$ on $(I, B, s)$, and the customer stores $(BL_s, I, B, s)$ as the new balance ticket.

3.3.4 Revocation Process
When the customer quits an IDR program, the DRP needs to ensure that both the identifiable account and the pseudo account of the customer are closed. This is guaranteed through a revocation ticket. When the customer revokes the pseudo account through the anonymous channel, he obtains a revocation ticket $RV_s$ from the DRP. The revocation ticket contains a BBS+ signature on $(I, s)$, with $s$ being the random secret selected by the customer. After a random period, the customer presents his real identity, the revocation ticket, and a ZKP PK5 together to the DRP, where

\[ PK5\{(A, c, z, I, z') : \hat{e}(A, \omega g^{c}) = \hat{e}(g g_0^{z} g_1^{I} g_3^{s}, g)\}. \]

This ticket proves the revocation of the pseudo account associated with customer $I$. The DRP can then complete the rest of the revocation process.
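The two building blocks that recur in the protocols above, the commitment CM(·) and the proof of knowledge of a pseudonym secret (PK2), can be illustrated with a minimal sketch. It is not the construction of the technical report [78]: it assumes a toy subgroup of prime order 1019 inside the integers modulo 2039 instead of a pairing-friendly group, hypothetical generator values, and a non-interactive Schnorr proof obtained with the Fiat-Shamir heuristic. The function names are illustrative only.

```python
import hashlib
import secrets

# Toy parameters (assumption): P = 2*Q + 1 with P and Q prime, so the squares
# modulo P form a subgroup of prime order Q. Far too small for real use.
P, Q = 2039, 1019
# Hypothetical generators of the order-Q subgroup (all are squares modulo P).
# In practice their mutual discrete logarithms must be unknown.
G1, G2, H, G4 = 4, 9, 25, 49


def commit(x1, x2, r):
    """Pedersen commitment C = g1^x1 * g2^x2 * h^r mod P with blinding r."""
    return (pow(G1, x1, P) * pow(G2, x2, P) * pow(H, r, P)) % P


def _challenge(*values):
    """Fiat-Shamir challenge: hash the public transcript into Z_Q."""
    data = "|".join(str(v) for v in values).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q


def prove_pseudonym(secret, pseudonym):
    """Schnorr proof of knowledge of `secret` such that pseudonym = g4^secret."""
    k = secrets.randbelow(Q)          # one-time nonce
    t = pow(G4, k, P)                 # prover's commitment to the nonce
    c = _challenge(G4, pseudonym, t)  # challenge derived from the transcript
    s = (k + c * secret) % Q          # response
    return t, s


def verify_pseudonym(pseudonym, proof):
    """Accept iff g4^s == t * pseudonym^c, which holds when s = k + c*secret."""
    t, s = proof
    c = _challenge(G4, pseudonym, t)
    return pow(G4, s, P) == (t * pow(pseudonym, c, P)) % P
```

For example, a customer holding secret 123 would publish `pow(G4, 123, P)` as the pseudonym and attach `prove_pseudonym(123, pseudonym)` to a query. The commitments are additively homomorphic, so the product of two commitments opens to the sum of the committed values, which is the property that lets a signer work on committed values without seeing them.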


3.4 Practical Considerations and Extensions
In this section, we discuss some practical issues and provide useful extensions to solve them.

3.4.1 Cloaking Mechanism
In the metering process, all the metering records of a customer are associated with the same pseudonym, which enables the DRP to link the metering data and perform basic operations. In theory, the DRP only knows the pseudonym of the power usage profiles, and thus the real identity of the customer is hidden. In practice, however, the DRP may still infer the real identity of the customer by mining the relationships between rewards and withdrawals. For example, if a customer withdraws all the available rewards in his account every settlement cycle, the withdrawals will equal the rewards; the DRP can then use the rewards as a quasi-identifier to find the real identity associated with the pseudo account.

To avoid such a linkage, customers can use cloaking mechanisms when they withdraw from the balance tickets. In general, the cloaking rules hide the relationship between withdrawals and rewards by reducing the withdrawal amount and frequency. Ideally, if a customer withdraws once per year and leaves some balance unredeemed, the DRP can only learn an estimate of his total reward through the year. This information does not reveal the relationship between the real identity and the pseudonym since it applies to many customers. However, customers usually want to use rewards whenever they are available, and redeeming rewards motivates them to be more active in future DR events. Hence, we need to balance privacy and timeliness.

In the following, we propose two cloaking mechanisms, i.e., the floor function withdrawal (FFW) mechanism and the partition and random selection (PRS) mechanism. Without loss of generality, we assume that withdrawal decisions are made once per settlement cycle.

The FFW divides the range of rewards into non-overlapping intervals. Each customer falls into one interval based on the remaining balance in his balance ticket. At the end


of a settlement cycle, a customer withdraws the floor value of his interval. In this way, customers in the same interval are indistinguishable because they withdraw the same amount of rewards. The achieved anonymity is determined by the interval size: larger intervals may include more customers and therefore provide a stronger anonymity guarantee. Intervals do not need to be of the same size. For intervals which contain a dense population, the size can be chosen smaller, while for other intervals, the size should be larger.

The PRS defines a set of cells, say $\{5, 10, 20, 40\}$. Customers first partition the rewards into cells. Then they select each cell with probability $p$ and withdraw an amount equal to the sum of the selected cells. For example, if the reward is 45, the customer may divide it into $\{5, 10, 10, 20\}$, choose a subset of it with selection probability 0.8, and finally select cells $\{5, 10, 10\}$. The withdrawal is the sum of these cells, which is 25.

3.4.2 Pseudonym Update
Cloaking schemes can reduce the information leaked to the DRP. However, in the long run, the DRP can still gain enough information for de-anonymization. Suppose that Alice receives rewards $R_1, R_2, \ldots, R_N$ and withdraws $W_1, W_2, \ldots, W_N$ in the first $N$ settlement cycles. The DRP learns that

\[ \sum_{k=1}^{n} W_k \le \sum_{k=1}^{n} R_k, \quad n = 1, 2, \ldots, N, \tag{3-4} \]

where $W_k$ is the $k$-th withdrawal and $R_k$ is the $k$-th reward of Alice. If the rewards of customer Bob are also consistent with these withdrawals in the sense of (3-4), that is,

\[ \sum_{k=1}^{n} W_k \le \sum_{k=1}^{n} R'_k, \quad n = 1, 2, \ldots, N, \tag{3-5} \]

where $R'_k$ is the $k$-th reward of Bob, then Alice and Bob are indistinguishable from the DRP side. However, as $N$ becomes large, it becomes harder to find a "Bob" who is indistinguishable from Alice. With cloaking mechanisms customers can slow down


this process but not stop it. Hence, Alice needs to update the pseudonym after a few settlement cycles. Updating the pseudonym includes revocation of the current pseudonym (Sec. 3.3.4) and registration of the new one (Sec. 3.3.1). To avoid linkage of the two pseudonyms, after revoking the old one, the customer waits a certain period before registering the new pseudonym.

3.4.3 Re-identification
There are scenarios where the customer needs to provide his power usage profiles to others. For example, a customer may use his power usage profile as evidence in a legal dispute. This is especially important when the DRP is not a third party, but the utility company itself. In this case, the DRP should enable the customer to prove ownership of his profile. In other words, the customer should be able to prove to the DRP or other parties that the power usage profile is linked to his real identity. This feature can be provided through the ICS scheme.

To prove his ownership of a metering record, the customer presents the secret $\alpha_I$ together with the metering data, the corresponding ICS signatures, and the real identity $I$ to the verifier. The verifier parses the ICS of the metering data in the record as $IC = (Q, Q', U, V)$, computes the public key for the customer as $Q_I = H_1(I)$, and checks if $Q_I = \alpha_I^{-1} Q$ holds. If so, the verifier is convinced that the signed metering data were generated by the customer with identity $I$. Hence, the customer with identity $I$ is re-identified as the owner of the power usage profile. Since the DRP already knows the linkage between the pseudonym $P_I$ and the metering data, it can now readily link the real identity to the pseudonym. If a customer wants to keep his future power usage profile hidden after the re-identification process, he needs to update his pseudonym following the update protocol in Section 3.4.2.
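The two cloaking mechanisms of Section 3.4.1 can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the simulation code used in this chapter: the interval boundaries are hypothetical, the partition step uses a greedy largest-cell-first rule (the text does not fix a partition rule, and its own example partitions 45 differently), and the function names are illustrative.

```python
import random


def ffw_withdrawal(balance, boundaries):
    """Floor function withdrawal: withdraw the lower bound of the interval
    that contains `balance`. `boundaries` are hypothetical interval floors."""
    floor = 0
    for b in sorted(boundaries):
        if b <= balance:
            floor = b
    return floor


def partition_into_cells(amount, cell_values):
    """Greedy partition (assumption): take the largest cells first. Returns the
    chosen cells and the remainder that fits no cell and stays on the ticket."""
    cells = []
    for value in sorted(cell_values, reverse=True):
        while amount >= value:
            cells.append(value)
            amount -= value
    return cells, amount


def prs_withdrawal(cells, p, rng):
    """Partition and random selection: keep each cell independently with
    probability p and withdraw the sum of the kept cells."""
    selected = [c for c in cells if rng.random() < p]
    return sum(selected), selected
```

With the cell set {5, 10, 20, 40} and a reward of 45, the greedy rule yields the cells [40, 5]; calling `prs_withdrawal([5, 10, 10, 20], 0.8, random.Random())` on the partition from the text returns a random subset such as the {5, 10, 10} of the example. Whatever is not withdrawn remains in the balance ticket for later cycles.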


3.5 Security and Privacy Analysis
In this section, we show that the proposed scheme achieves the security goals of integrity and privacy, and we analyze the unlinkability achieved through our cloaking mechanisms.

3.5.1 Data Integrity
Data integrity of the proposed scheme is guaranteed in the following aspects.

Defending Against Cheating Customers. We model the interaction between a cheating user A and an honest DRP C as a game. Let the DRP C record the balance B remaining in the account of A. A wins the game if A can obtain a negative balance B. We use a reduction argument for our proof. Specifically, we show that if adversary A can win the game, then a forgery attack can be constructed against the BBS+ signature. However, the BBS+ signature has been shown to be unforgeable in [76]. Thus no such adversary A exists. In other words, cheating customers cannot win in our system. Due to the space limit, we leave the detailed formal proof to our technical report [78].

Authenticity. The authenticity of the metering data, ensured by the ICS scheme, allows the DRP to verify that the data are generated by a genuine and registered smart meter. No attacker can forge or tamper with the metering data, since the ICS cannot be forged under an adaptively chosen message attack [70]. Tickets are used in our scheme for the registration, revocation, and settlement processes. The tickets, i.e., BBS+ signatures, have been proven to be secure against existential forgery [76]. Hence attackers cannot forge tickets to gain monetary benefits. Besides, attackers are not able to replay used tickets, as the secret $s$ in a ticket is updated whenever the ticket is shown to the DRP, and the DRP can easily determine whether a ticket has been used by looking it up in the database.

Confidentiality. In addition to the protocols we have described, standard asymmetric and symmetric encryption schemes are used to provide confidentiality. For normal communication, either type of encryption scheme is suitable. For anonymous communication,


asymmetric encryption schemes are required. For example, in the metering process, smart meters need asymmetric encryption to ensure confidentiality. To this end, they encrypt messages with the public key of the DRP, which then decrypts them with its private key. This ensures end-to-end confidentiality of the metering data.

3.5.2 Privacy
We preserve customer privacy by ensuring the anonymity of fine-grained metering data. Each customer registers both a real identity and a pseudonym, and only pseudonyms are attached to metering data. The DRP can neither infer the real identities from the metering data in the metering process nor link real identities and pseudonyms in other processes.

Privacy in Registration, Querying, Settlement, and Revocation. We use a game to model user privacy in registration, querying, settlement, and revocation. Let the curious DRP interact with C, who acts on behalf of two users. User privacy is provided in these processes if the DRP cannot tell which of the two users is responsible for a particular interaction, under the condition that all other interactions have been identified by the curious DRP. The particular interaction could occur during pseudonym registration, settlement, querying, or revocation, but not during real identity registration, because the real identity is necessarily known in that step. Our definition also guarantees that the interactions cannot be linked. Due to the space limit, we include the details of the proof in our technical report [78]. In the following, we provide an intuitive description of the privacy guarantee provided by our scheme.

In the registration process, BBS+ signatures are used to hide the relationship between the real identity and the pseudonym. A customer obtains a BBS+ signature (i.e., the registration ticket in Sec. 3.3) after he registers his real identity. In a separate communication session, the customer uses this signature to prove his eligibility for enrollment and to register his pseudonym. Since the BBS+ signature hides the value of the real identity, the DRP does not know his real identity when registering the pseudonym


and thus cannot link these two identities. The same conclusion holds for the settlement process and the revocation process. In the querying process, customers inquire about their data through pseudonyms, and no real identity information is involved. Since information involving pseudonyms is sent through a proxy that hides the static physical address of a smart meter from the DRP, anonymity is also ensured in the physical layer.

Anonymity in Metering. During the metering process, smart meters only attach pseudonyms to metering data. To allow verification of the authenticity of the metering data, they sign the metering data with ICS. The secret value of the ICS is stored locally at the smart meter, and the DRP does not know it. Hence, due to the anonymity property of the ICS, the DRP can verify the data source, but it cannot identify the customer [70].

3.5.3 Unlinkability Between Pseudo Accounts and Identifiable Accounts
In Sec. 3.4, we show that the relationship between rewards and withdrawals may compromise anonymity, and we propose two cloaking mechanisms to mitigate the attack. The cloaking mechanisms divide customers into several sets, and customers in the same set are indistinguishable. Denote such a set as $S$. A larger set provides stronger anonymity.

Suppose the DRP has 500 subscribed customers and customers withdraw money once per settlement cycle (e.g., a month). We assume that DR rewards follow a Gaussian distribution with mean 50 and variance 20. We simulate the withdrawal behaviors of customers with the FFW mechanism and with the PRS mechanism under two parameter settings, i.e., PRS with cells $\{1, 5, 10, 50\}$ and probability $p = 0.2$, and PRS with cells $\{1, 5, 10, 50\}$ and probability $p = 0.5$. We show the ratio of customers for different sizes of $S$ in Fig. 3-6. Overall, most of the customers are indistinguishable from at least 9 others. However, as the DRP gradually gains more information, the sets become smaller, and customers need to update their pseudonyms. We compare the average undrawn amounts on a monthly basis in Fig. 3-7. We can see that the FFW mechanism requires the smallest amount of undrawn rewards among the three approaches, while the


Figure 3-6. Anonymity for A) FFW, B) PRS with p = 0.2, and C) PRS with p = 0.5.

PRS mechanism with selection probability $p = 0.5$ requires the most. This illustrates the trade-off between privacy and timeliness: if you want better privacy, you should withdraw less frequently.

3.6 Performance Analysis
In this section, we analyze the efficiency and cost of the proposed scheme. The computation cost comes mainly from pairings and exponentiations in the signature schemes (ICS, BBS+) and the ZKPs (PK1-PK5). We summarize the number of these two operations in the basic protocols for smart meters, customers, and the DRP in Table 3-1.


Figure 3-7. Evaluation of Cloaking Mechanisms.

Table 3-1. Number of pairing and exponentiation operations.
Column groups: Registration (Customer, DRP); Metering (Customer, Smart Meter, DRP); Querying (Customer, DRP); Settlement (Customer, DRP).
Group G exponentiation (pre-processed): 2214000214821
Group G exponentiation (direct): 160000139
Group G_T exponentiation (pre-processed): 46000051316
Group G_T exponentiation (direct): 1200001235
Pairing (one parameter is constant): 320050162
Pairing (both parameters are not constant): 200000020

From this table, we can see that smart meters do not need to perform either of the two operations. The most time-consuming operation performed by smart meters is ICS generation in the metering process, which involves no pairing or exponentiation operations and can be handled efficiently by existing smart meters. The customer devices and DRP servers are assumed to be powerful enough to conduct computation-intensive operations. Nonetheless, some of the operations, such as the exponentiations that have a constant base (e.g., $g_0^{z'}$) and the pairings with one of the parameters being constant (e.g., $\hat{e}(g g_0^{z} g_1^{I} g_3^{s}, g)$), can be preprocessed, which expedites the calculation greatly.

Based on the simulation results of [79], which uses similar cryptographic tools, we can estimate the computation time of our algorithm. If we use an HTC Desire HD smartphone with a QSD8255 1 GHz CPU and 1.5 GB ROM to simulate the customer device, the


registration time for the customer is less than 3 seconds, and the settlement time is less than 6 seconds. If we use a desktop with a Q6600 2.4 GHz CPU and 3 GB RAM as the server of the DRP, the registration time for the DRP is 0.2 seconds, the metering time is 0.05 seconds, and the settlement time is 0.3 seconds. Since the registration and billing processes happen at low frequency, a processing time on the order of a few seconds is insignificant. Since smart meters are not involved in any computation-intensive operations such as exponentiation or pairing, our proposed protocols can be implemented efficiently.

3.7 Related Work
While instrumental to the implementation of DR, fine-grained metering data collected by the AMI can be used to determine occupant activities, raising serious privacy concerns [80]. Research studies on non-intrusive load monitoring (NILM) [66, 81, 82] have shown the possibility of deducing appliance usage patterns from fine-grained metering data. The appliance usage patterns can be further analyzed to learn the health status, daily routines, or unusual behaviors such as "you slept late at night" and "your child is left alone at home" [67]. Hence, a growing number of research activities have been carried out to address privacy issues in the AMI.

The approaches to addressing privacy issues in the AMI can be divided into three categories. The first category proposes to aggregate individual metering data before sending them out to utility companies, since most benefits of the smart grid can be achieved with the aggregate data. Aggregation can be either performed at a central point [83-85] or distributed in the network [86]. The second category uses cryptographic tools to hide sensitive information, mainly adopted for private billing purposes in PDR programs [67, 84, 87]. These works intend to calculate bills at the customer side and ask customers to prove the correctness of their bills to utilities. The third category uses anonymity to protect user privacy [47]. The aforementioned approaches share a common assumption: metering data for operational purposes do not need to be attributable to


a specific customer, and metering data for billing purposes do not need to be of fine granularity. This assumption, however, no longer holds for IDR programs. Fine-grained metering data are required when the DRP schedules demand curtailments, calculates CBLs, and allocates DR rewards for individual customers. Hence both fine granularity and attributability are required for IDR programs. In this case, the aforementioned privacy protection mechanisms such as aggregation are not applicable, and a new approach is needed.

Anonymous credentials have been proposed to preserve privacy in [88, 89]. Specifically, a user obtains a credential from an issuer and demonstrates the possession of the credential to a verifier who only has the public information of the issuer. The verifier can learn nothing beyond the fact that the user owns a credential granted by the issuer, even when it colludes with the issuer. "ANONIZE" [90] is an anonymous survey system where users can create an anonymous single-use token (or credential) and use it to answer a given survey. However, a single-use token fails to serve the billing purpose in our scenario, where the balance should be inherited and updated repeatedly across different settlement cycles. On the other hand, "OPAAK" [91] provides an authentication framework with privacy-preserving and single sign-on properties. This is similar to our solution; however, besides identity management, our scenario also requires management of a running balance and authentication of metering data, which cannot be addressed by their work.


CHAPTER 4
PROTECTING LOCATION PRIVACY FOR TASK ALLOCATION IN AD HOC MOBILE CLOUD COMPUTING

Nowadays, mobile devices such as smartphones and tablets have gained tremendous popularity. These devices are often equipped with a variety of sensors such as cameras, microphones, GPS, accelerometers, gyroscopes, and compasses. The data (e.g., position, speed, temperature, and heart rate) generated by these sensors enable many useful applications with mobile devices, including location-based services [92, 93], mobile sensing [94], and mobile crowdsourcing [95, 96]. Although greatly improved over the past several years, mobile devices are still resource constrained, mainly due to limited battery lifetime. On the other hand, cloud computing has widely been regarded as the next-generation computing paradigm, which can provide "unlimited" cloud resources to end users in an on-demand fashion. The rich resources in cloud computing can be exploited to increase, enhance, and optimize the capabilities of mobile devices, leading to the concept of mobile cloud computing (MCC). According to [97], MCC integrates cloud computing technologies with mobile devices to make the mobile devices more capable in terms of computational power, memory, storage, energy, and context awareness.

There are generally two types of mobile clouds in MCC: infrastructure-based and ad-hoc [97]. The infrastructure-based mobile cloud consists of stationary computing resources and provides services to mobile users via the Internet. Alternatively, in the ad-hoc mobile cloud, a collection of mobile devices (hereafter referred to as "mobile servers") acts as cloud resources and provides access to local or Internet-based cloud services to other mobile users (hereafter referred to as "mobile clients"). In this chapter, we focus on the second case, namely, the ad-hoc mobile cloud. The main benefits of utilizing ad-hoc mobile cloud resources are their distributed and context-aware features. As explained in [98-101], incentivized by the mobile cloud computing platform (CCP), individual mobile users contribute their mobile devices as mobile servers in the ad-hoc mobile cloud, and these mobile servers can be used to perform location-dependent tasks


such as epidemic monitoring, traffic monitoring, image/video capturing, and price checking for mobile clients.

Despite many promising applications, ad-hoc mobile clouds pose several challenges. First, mobile cloud resources in an ad-hoc mobile cloud are dynamic and diverse. As a result, some mobile servers may drop the task they are performing and leave the cloud. Some mobile servers may be "spammers" that only want to collect rewards and submit arbitrary answers without looking at the specific task. Moreover, some mobile servers may not be powerful enough to provide sensing data at the required accuracy. Therefore, allocating tasks so as to ensure the quality of the service provided by these dynamic mobile servers is challenging. Second, as pointed out by [98], the security and privacy of mobile devices acting as service providers is a critical concern in the ad-hoc mobile cloud. In order to allocate tasks and provide effective services, mobile servers in an ad-hoc mobile cloud need to share their location data with the CCP, which could reveal a lot of personal information such as a user's identity, health status, personal activities, and political views [102]. Hence, it is essential to provide privacy guarantees in order to engage more mobile devices in the cloud. Finally, there is an inherent conflict between quality of service (i.e., utility) and privacy in task allocation. If an ad-hoc mobile cloud ensures the privacy of mobile servers, it is difficult to guarantee the utility of the provided MCC service. Finding a solution that can ensure privacy while guaranteeing utility for task allocation is a major challenge in such systems.

Several solutions to privacy issues in mobile applications have been proposed. For example, aggregation is a common approach to hiding individual sensitive information when only statistics of users are required [103]. However, this approach only calculates statistics and thus cannot be used to select mobile servers in an ad-hoc mobile cloud. Another approach is used in location-based services, where the true locations are obfuscated in location-based queries, and the service provider returns results based on the obfuscated query [104, 105]. In our scenario, however, the private information is no


longer part of a location-based query, but the result of a location-based query regarding the task. Some papers [106, 107] consider queries on private locations in an outsourced database, but they only protect private data from an intermediate service provider while assuming a trust relationship between the data owner and the querying entity. This is not true in our scenario because mobile servers and the CCP may not share an inherent trust relationship. A recent work [108] has been proposed to protect the location privacy of crowdsourcing workers in spatial crowdsourcing. However, their solution does not consider worker reputation, and thus cannot provide any quality control over the final result. Therefore, it cannot be easily applied to our scenario in mobile cloud computing, which has to ensure service quality.

In this chapter, we propose a framework that provides solutions to the above challenges, where both location privacy and service quality are considered. In our framework, the CCP only has access to location data of mobile servers that have been sanitized according to differential privacy (DP). Since every mobile server is subscribed to a Cellular Service Provider (CSP) with which it already has a trust relationship, the CSP can integrate mobile server location and reputation information, and provide the data to the CCP in noisy form according to DP. To generate the noisy mobile server data, we adapt the Private Spatial Decomposition (PSD) approach proposed in [108, 109], and construct a new structure called Reputation-based PSD (R-PSD). Since fake points need to be created in the DP model, geocast is used to disseminate tasks to mobile servers to prevent the CCP from identifying these points. To summarize, our main contributions are as follows:

1. We identify the specific challenges for task allocation in ad-hoc mobile clouds, and propose a framework that can achieve differential privacy for mobile server location data while providing high service quality.

2. We introduce a new structure called R-PSD that partitions the space based on both reputation and location information, and develop an efficient search strategy that finds appropriate R-PSD partitions to ensure high quality of service.


3. We use a geocast mechanism when disseminating tasks to mobile servers to overcome the restrictions imposed by DP, and the overhead during this process is incorporated into the design of the search strategy.

4. We conduct extensive experiments based on real-world datasets to show the effectiveness of the proposed framework.

The remainder of this chapter is organized as follows. We present background on several techniques we use in Section 4.1. In Section 4.2, we describe the system model for the proposed framework. Sections 4.3 and 4.4 describe the detailed solutions, namely the process of R-PSD generation and task allocation based on R-PSD, respectively. Thereafter, we discuss the experimental results and analyze the system overhead in Section 4.6. Section 4.7 reviews the related work.

4.1 Background
In this section, we introduce background on differential privacy (DP) and Private Spatial Decomposition (PSD).

4.1.1 Differential Privacy
The privacy guarantee provided in our framework is $\epsilon$-differential privacy [36, 38]. DP provides protection of datasets against adversaries with arbitrary background information. By sanitizing the data, DP prevents an adversary from knowing whether a certain individual record is present or not in the database. Formally, we have the following definition.

Definition 2. A randomized algorithm $F$ satisfies $\epsilon$-DP if for any two datasets $D_1$ and $D_2$ which differ in only one element, and $\forall O \subseteq Range(F)$, the following inequality holds:

\[ \ln \frac{\Pr[F(D_1) \in O]}{\Pr[F(D_2) \in O]} \le \epsilon. \tag{4-1} \]

In the definition, the parameter $\epsilon$ bounds the ratio of the output probability distributions for two datasets differing in at most one element. It specifies the amount of privacy protection, and a smaller value of $\epsilon$ indicates better protection. We call this parameter the privacy budget.
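Definition 2 can be checked numerically for a concrete mechanism. The sketch below assumes the standard Laplace mechanism applied to a counting query, with noise scale equal to the sensitivity divided by $\epsilon$, and evaluates the log-ratio of output densities for two neighboring counts over a grid of outputs. Bounding the density ratio pointwise implies the bound in (4-1) for every output set. The function names and the output grid are illustrative assumptions.

```python
import math


def laplace_log_pdf(x, scale):
    """Log-density of a zero-mean Laplace distribution with the given scale."""
    return -abs(x) / scale - math.log(2.0 * scale)


def max_log_ratio(count1, count2, epsilon, sensitivity=1.0):
    """Largest absolute log-ratio of output densities when the true count is
    count1 versus count2, for Laplace noise of scale sensitivity / epsilon."""
    scale = sensitivity / epsilon
    outputs = [o / 10.0 for o in range(-200, 401)]  # hypothetical output grid
    return max(
        abs(laplace_log_pdf(o - count1, scale) - laplace_log_pdf(o - count2, scale))
        for o in outputs
    )
```

For two neighboring counts such as 10 and 11 and $\epsilon = 0.5$, the largest log-ratio over the grid equals $\epsilon$ up to floating-point error, so the bound of Definition 2 is met with equality for outputs outside the interval between the two counts.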


In order to achieve $\epsilon$-DP in a dataset, the raw data is sanitized by adding random noise to the released query set $QS$. The amount of noise is determined by the sensitivity of $QS$, which is defined as follows:

Definition 3. Given any two datasets $D_1$ and $D_2$ which differ in one element, the sensitivity of the released query set $QS$ is

\[ \sigma(QS) = \max_{D_1, D_2} \| QS_{D_1} - QS_{D_2} \|_1. \tag{4-2} \]

Given the sensitivity, a sufficient condition to achieve $\epsilon$-DP is to add to each query result random Laplace noise with zero mean and scale $\lambda = \sigma(QS)/\epsilon$ [110].

The results from a database usually involve several stages of analyses $M_i$. The privacy level of the composition of several stages can be computed by the following results [54]:

Theorem 4.1 (Sequential composition). If $M_i$ are a set of analyses, each providing $\epsilon_i$-DP, then their sequential composition satisfies $(\sum_i \epsilon_i)$-DP.

Theorem 4.2 (Parallel composition). If $M_i$ are a set of analyses, each providing $\epsilon_i$-DP, then their parallel composition satisfies $\max_i(\epsilon_i)$-DP.

These theorems enable us to calculate the privacy level of an aggregated result based on the privacy level of each individual result.

4.1.2 Private Spatial Decomposition
The PSD approach was first introduced in [109] to construct a spatial dataset that achieves DP. A PSD is a spatial index where each index node is associated with a spatial region, and the value for each node is the noisy count of data points (mobile servers in our scenario) in that region. The data structure for the spatial index can be grids, k-d trees, or quad-trees [111]. The choice of data structure and its parameters (fan-out and height) can heavily influence the accuracy of the PSD.

In space-based partitioning PSDs such as grids and quadtrees, the splitting positions of the space are independent of mobile server (MS) locations. Thus the privacy budget is only consumed when calculating the noisy count of mobile servers. Typically, index nodes


at the same level cover non-overlapping extents, resulting in a low sensitivity of 2 (i.e., the location change of a single MS affects at most 2 cells in a level). The privacy budget is distributed across levels according to the geometric allocation strategy in [109], where leaf nodes are allocated more budget than higher-level nodes. Space-based PSDs are easy to construct, but they can become unbalanced when mobile servers are not uniformly distributed in space. On the other hand, object-based structures such as k-d trees [109] split space based on the locations of mobile servers. Since location data are used both for calculating splitting positions and computing noisy counts, the privacy budget should be split between the two processes as well. Object-based structures are expected to be more balanced than space-based PSDs; however, they are not very robust in the sense that their accuracy may decrease abruptly with a slight change of the PSD parameters or input dataset distributions. The work in [112] proposes an adaptive grid (AG) approach with two-level grids. The first-level grid is uniformly divided, and the granularity of the second-level grid depends on the noisy counts obtained in the first level. AG is a hybrid approach that inherits the simplicity and robustness of the space-based approach, but still utilizes some data-dependent information when choosing the granularity for the second-level grid. In this chapter, we adapt their approach to construct our PSD.

4.2 Problem Formulation
There are various types of mobile cloud computing applications, and each of them may have a different system model. In this chapter, we consider an emerging MCC system that is considered by several recent studies [100, 101, 113]. A unique feature of this system is that various mobile devices such as smartphones, tablets, and handheld computing devices play the role of servers based on cloud computing principles. Benefits of this system include the proximity of the mobile resources to mobile clients and context-awareness of the mobile resources [98]. In the following, we present the system


Figure 4-1. Privacy-preserving framework for task allocation in MCC.

model in Section 4.2.1, describe the mobile server characteristics in Section 4.2.2, outline our threat model and assumptions in Section 4.2.3, and discuss our performance metrics in Section 4.2.4.

4.2.1 System Model
The system model for our proposed privacy-preserving framework is shown in Fig. 4-1. There are mainly four parties in the system: the CCP, the CSP, mobile servers, and mobile clients. The CCP has a pool of mobile servers as its cloud resources, and assigns a reputation score to each of these mobile servers based on its historical task completion performance. Mobile clients send requests to the CCP to utilize its cloud resources for completing various tasks. The CSP is a trusted third party that guarantees privacy of mobile servers while enabling efficient MCC services.

Our system works as follows. First of all, mobile servers send their locations extracted from their GPS sensors, and reputation scores learned from the CCP, to the CSP (Step 1), who then collects updates and releases a Reputation-based Private Spatial Decomposition (R-PSD) according to the privacy budget agreed upon with mobile servers (Step 2). When the CCP receives a task request from mobile clients (Step 3), it uses the R-PSD to decide a geocast region that contains mobile servers in close proximity to the task and


with high reputation level. Then, the CCP initiates a geocast communication process [114] to all mobile servers within the geocast region (Step 4). Geocast is used in our framework to keep the CCP from directly contacting mobile servers. Note that when sanitizing a dataset based on DP, it is required to create some fake locations in the R-PSD. If allowed to contact mobile servers directly, the CCP can easily identify these fake points, and therefore breach privacy whenever it fails to establish a communication channel with some mobile servers. After receiving the task, a mobile server decides whether to accept the task or not. If the mobile server decides to accept the task, it replies with a message confirming its availability to the CCP (Step 5). Otherwise, it does not reply and remains invisible to the CCP.

In such an MCC system, we focus on protecting location privacy of mobile servers. The most challenging issues in the proposed framework are the following:
1. How to incorporate the reputation into the traditional PSD to construct the R-PSD?
2. How to choose an appropriate geocast region in order to ensure high service quality given the uncertain nature of the R-PSD?
3. How to disseminate tasks to mobile servers in the chosen geocast region with low overhead?
As shown later, we propose novel and efficient approaches to address all of these challenges.

4.2.2 Task and Mobile Server Characteristics
Tasks considered in the system are location-dependent, i.e., they must be performed at specific locations. Typical examples include sensing tasks and those in location-based services. In many cases, the mobile server needs to travel physically to the location associated with the task. Therefore, most mobile servers that perform a task will be located in close proximity to the task location. Furthermore, it is not uncommon that some tasks need to be performed by more than one mobile server.


Mobile servers are mobile devices that have diverse computing, communication, and sensing capabilities. They could be either voluntary resources that are donated by mobile users [115], similar to those in participatory sensing, or recruited by the CCP using some incentives, similar to those in crowdsourcing [116]. Due to their heterogeneity, mobile servers may perform differently when completing a task. We use reputation as a means for evaluating the quality of results received from mobile servers. It reflects the trustworthiness of the returned results. Here, we assume that the CCP has already implemented a reputation system that assigns a trust level to each mobile server. Several reputation systems [117-120] have been proposed in the literature, and they can be used in our system. Similar to [119], the reputation score of a mobile server is a number between 0 and 1. A high trust level means that a particular server has been providing reliable results in the past. Note that the trust level of a mobile server is updated periodically by the CCP based on its past performance.

4.2.3 Threat Model and Assumptions
We aim to protect the location privacy of mobile servers before they accept a task. Note that once a mobile server accepts a task, it may by itself reveal its information to the CCP or the client who requests the task. However, information disclosure at this stage is out of our scope. We only focus on privacy leakage before the mobile server accepts a task for two reasons. First, before accepting a task, all mobile servers are candidates for real-time task allocation, and thus their location information is monitored continuously. The volume and time scale of information exposure make privacy protection mechanisms a necessity. On the other hand, only a few mobile servers will accept the task and expose their information at a later stage. Second, a mobile server explicitly consents to reveal its location information after accepting a task, which is unavoidable in our scenario. The influence of such leakage can be mitigated by hiding its identity, as is done in Tor [121]. On the other hand, few papers consider the more challenging problem of preserving privacy at the early stage.


We consider the CCP not trustworthy when protecting servers' privacy. Any information learned by the CCP may be leaked to adversaries when it is compromised. On the other hand, the CSP has already established a trust relationship with mobile servers through cellular service contracts and mutually agreed rules regarding information disclosure. Moreover, the CSP already has access to the locations of mobile servers through localization techniques such as cell tower triangulation. Hence, reporting mobile server locations discloses no additional information. Although the CSP is trusted to learn mobile server locations, we cannot rely on it when allocating tasks among mobile servers. The reason is that task allocation involves multiple issues such as profile management and mobile client categorization, and the CSP has no expertise or incentive to get involved with such services. Therefore, in our framework, the CSP is considered as a trusted third party that helps protect the privacy of mobile servers. We also assume secure, reliable, and authenticated communication channels among the CCP, the CSP, and mobile servers.

4.2.4 Performance Metrics
This section presents a task allocation model that effectively allocates tasks among mobile servers in the MCC system while providing differential location privacy for mobile servers. Adding privacy protection to task allocation greatly complicates the problem since the CCP can no longer allocate a task among mobile servers based on their exact locations. Due to the uncertain nature of DP, it is possible that there is no mobile server in a geocast region, even if the noisy count is positive. Thus the task may not be completed because no mobile servers, or an insufficient number of them, are actually notified. Also, if the task is allocated mostly to mobile servers with low reputation scores, the result may not satisfy the quality-of-service requirement for the task. Finally, the CCP may need to contact more mobile servers than what is needed in the privacy-oblivious case, increasing system overhead. Therefore, we focus on the following performance metrics for the proposed framework:


Acceptance Rate. Due to the uncertainty of R-PSD, the CCP may fail to find enough mobile servers for a task. The acceptance rate of task allocation captures the ratio of accepted tasks to all task requests.

Service Quality. Even if a task is accepted by some mobile servers based on their location information, they may not fulfill the task successfully to meet the quality-of-service requirement for the task. High service quality could be achieved by selecting only mobile servers with high reputation score; however, this may result in low acceptance rate. Therefore, one needs to balance service quality and acceptance rate when choosing the geocast region.

System Overhead. Additional system overhead is incurred due to privacy protection, as more mobile servers would be contacted when the CCP does not know their precise locations. We use the average number of notified mobile servers to quantify the communication overhead of notifying mobile servers in the task dissemination process and the computational overhead of geocast region determination.

4.3 Constructing the R-PSD
In this section, we solve the first challenging issue, that is, how to incorporate the reputation to build the R-PSD. We first describe the method of building a PSD without considering any reputation information and then add reputation levels into the previous method to construct an R-PSD.

4.3.1 Constructing Private Spatial Decompositions
To build a PSD based on the locations of mobile servers, we follow the state-of-the-art Adaptive Grid (AG) approach proposed in [108, 112] due to the advantages mentioned in Section 4.1.2. The AG approach overlays a two-level grid onto a region. At the first level, the location domain is uniformly partitioned into $m_1 \times m_1$ cells. To minimize errors due to DP partition, the level-1 granularity $m_1$ is chosen as
$$m_1 = \max\left(10, \left\lceil \frac{1}{4} \sqrt{\frac{N \epsilon}{k_1}} \right\rceil \right), \tag{4-3}$$
where $N$ is the total number of mobile server locations, $\epsilon$ is the total privacy budget, and $k_1$ is a small constant depending on the datasets. This heuristic method is data-independent, and thus does not consume any privacy budget. Extensive experiments in [112] suggest that $k_1 = 10$ works well for different sizes of datasets and different privacy budgets.
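As a rough illustration of the level-1 step, the sketch below evaluates (4-3) and maps a location to its cell in the uniform $m_1 \times m_1$ grid. The function names, the bounding-box representation, and the example inputs are assumptions made for this sketch only. Points are assumed to lie inside the bounding box.

```python
import math

def level1_granularity(n_points, epsilon, k1=10.0):
    """Level-1 grid size m1 following (4-3); k1 = 10 is the value suggested in the text."""
    return max(10, math.ceil(0.25 * math.sqrt(n_points * epsilon / k1)))

def cell_index(x, y, bounds, m1):
    """Map a point to its (row, col) cell in a uniform m1 x m1 grid.

    `bounds` is a hypothetical (x_min, y_min, x_max, y_max) bounding box.
    Points on the upper boundary are clamped into the last cell.
    """
    x_min, y_min, x_max, y_max = bounds
    col = min(m1 - 1, int((x - x_min) / (x_max - x_min) * m1))
    row = min(m1 - 1, int((y - y_min) / (y_max - y_min) * m1))
    return row, col

# Hypothetical inputs: one million locations with a total budget of 0.5.
m1 = level1_granularity(1_000_000, 0.5)
print(m1, cell_index(0.55, 0.25, (0.0, 0.0, 1.0, 1.0), 10))
```

For small datasets or small budgets the floor of 10 in (4-3) dominates, so the level-1 grid never becomes coarser than 10 by 10.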


After the level-1 grid is determined, the CSP issues a noisy count query for each level-1 cell using a small portion of the privacy budget: $\epsilon_1 = \alpha \epsilon$, where the parameter $\alpha \in (0, 1)$ decides how the privacy budget is divided between the two levels. Then the CSP further partitions each level-1 cell into $m_2 \times m_2$ level-2 cells, where $m_2$ is chosen as:
$$m_2 = \left\lceil \sqrt{\frac{N' \epsilon_2}{k_2}} \right\rceil, \tag{4-4}$$
where $\epsilon_2 = \epsilon - \epsilon_1$ is the remaining privacy budget for the noisy count in each level-2 cell, $N'$ is the noisy count of mobile server locations in the level-1 cell, and $k_2$ is a constant depending on datasets. In [108], To et al. argue that the PSD count should be small to provide fine granularity, but should be larger than the standard deviation of the added noise. Here we set $k_2 = \sqrt{2}$, which provides a good balance between these two considerations according to [108].

An example of an adaptive grid is illustrated in Fig. 4-2. There are 4 level-1 cells, $A_1, A_2, A_3, A_4$, in the level-1 grid, which is determined by (4-3). The noisy count of each level-1 cell is calculated by adding random Laplace noise with scale $1/\epsilon_1$ to the actual count. These level-1 cells are further divided into level-2 cells based on the noisy count of mobile servers in each cell. The number of level-2 cells in each level-1 cell is $m_2 \times m_2$, where $m_2$ is calculated in (4-4). In the figure, the noisy count for the level-1 cell $A_1$ is 200 and $A_1$ is divided into a $3 \times 3$ grid. In the same way, the other level-1 cells, $A_2, A_3, A_4$, are all divided into $2 \times 2$ level-2 grids. The AG approach finally gives noisy counts of mobile servers for each level-2 cell, which are calculated by adding random Laplace noise with scale $1/\epsilon_2$. Finally the CSP publishes these noisy counts together with the structure of the AG. The AG approach provides an $\epsilon$-DP guarantee according to Theorem 4.1.

4.3.2 Constructing Reputation-Based Private Spatial Decomposition
Note that the PSD constructed before does not contain the reputation information of mobile servers, and therefore the CCP cannot control the quality of the answers. Integrating reputation information into our framework, however, is non-trivial. First,


Figure 4-2. Example of an adaptive grid.

incorporating reputation into the PSD adds a new data dimension, which increases the difficulty of applying DP. Specifically, the CSP should both provide a noisy count of mobile servers following DP and reveal the reputation associated with each mobile server. Thus we design a new data structure, which is a unique feature of our framework. Second, to ensure service quality, the CCP prefers mobile servers with high reputation score. However, MCC resources may be scarce, and it may take a long time to wait for a mobile server with high reputation score. This is impractical for many time-critical tasks. As a result, the CSP should balance between acceptance rate of task allocation and service quality.

In the following, we first propose a new data structure, R-PSD, that integrates location and reputation information without compromising mobile server location privacy. Task allocation based on the proposed R-PSD will be detailed in Section 4.4.

The R-PSD consists of multiple layers of sub-PSDs. The range of mobile server reputation scores is first divided into several reputation levels, and mobile servers whose reputation scores fall into the same level are grouped together. The number of levels


Figure 4-3. Example of R-PSD with two reputation levels.

reflects the granularity of reputation that a specific task needs. Finer granularity leads to better quality control, but incurs higher system overhead in computing the R-PSD, choosing the geocast region, and contacting mobile servers. A sub-PSD is constructed, in the same way as constructing a PSD, for each group of mobile servers with the same reputation level. The only difference between a sub-PSD and a PSD is the number of data points. We label each sub-PSD based on its corresponding reputation level. For example, the sub-PSD for mobile servers with reputation level $i$ is named the $i$-th sub-PSD. Fig. 4-3 shows an example of R-PSD with two reputation levels. The privacy budget allocated to each sub-PSD is set to be $\epsilon$. Since sub-PSDs are calculated based on disjoint subsets of mobile servers, the privacy budget for the combination of sub-PSDs is the maximum of the privacy budget consumed in each sub-PSD according to Theorem 4.2, which is equal to $\epsilon$. The resulting cells in each sub-PSD are different from each other when the reputation level is not uniformly distributed among mobile servers.

4.3.3 Authentication of Mobile Server Trust Level
The CSP gathers location and trust level information of mobile servers. The location information can be easily obtained since the CSP already has access to the locations


of mobile servers. On the other hand, the trust level is calculated by the CCP. A straightforward solution is to ask the CCP to share this data. However, since a mobile server may have several pseudonyms for different CCPs, and each CCP will compute a trust level, it is difficult for the CSP to match these pseudonyms with the unique identifier of a mobile server. Therefore, we let mobile servers report their own trust levels to the CSP. However, a mobile server may report a fake trust level to have a higher chance of being notified. To ensure the authenticity of the trust level data, a signature is required from the CCP. When the CCP calculates the trust level $tr$ for a mobile server, it sends the following message to the mobile server:
$$\text{CCP} \rightarrow \text{Mobile Server}: \; tr, \; \mathrm{SIG}_{CCP}(psy \,\|\, tr \,\|\, t), \tag{4-5}$$
where $psy$ is the pseudonym of the MS when it interacts with the CCP, $\mathrm{SIG}_{CCP}$ denotes the signature using the CCP's private key, and $t$ is the timestamp used to prevent replay attacks. Mobile servers need to send this message to the CSP, who can authenticate the message by verifying the signature and ensuring message integrity:
$$\text{MS} \rightarrow \text{CSP}: \; psy, \; tr, \; \mathrm{SIG}_{CCP}(psy \,\|\, tr \,\|\, t). \tag{4-6}$$

Remark 1. If the CSP is not trusted to learn the worker trust level, we can use a zero-knowledge proof to ensure that the CSP can verify the validity of $tr_j$ but cannot learn the value of $tr_j$.

4.4 Task Allocation
In this section, we solve the second challenging issue as explained in Section 4.2. To allocate a task among mobile servers, the CCP queries the R-PSD and computes a geocast region. All mobile servers in the region are notified of the task. The goal of the CCP when determining the geocast region is to achieve a high acceptance rate of task allocation while meeting the quality-of-service requirement of the task and reducing the system overhead.


4.4.1 Acceptance Rate Characterization
The mobile cloud in our scenario consists of proximate mobile computing entities instead of distant cloud-based resources. Considering the cost incurred by traveling or communication, a mobile server is more likely to accept a nearby task than a distant one [122]. We model the probability of accepting a task for a mobile server as a function of the distance between the mobile server and the task. For simplicity, we use a linear model to characterize the relationship between individual acceptance rate and mobile server-task distance. Let $p_a$ denote the probability for a mobile server to accept a notified task and $d$ denote the distance between the mobile server and the task. Then we have
$$p_a = p_a^0 - \beta d, \quad 0 \le d \le d_{max}, \tag{4-7}$$
where $p_a^0$ denotes the acceptance rate of a server within the same level-2 cell as the task, $\beta$ is a positive parameter, and $d_{max}$ is a threshold over which the acceptance rate of a server is zero. In this model, as the distance increases, the acceptance rate decreases linearly. Note that other models depending on different applications could be used in our framework as well.

Now we propose an analytical model that enables the CCP to estimate the expected acceptance rate of task allocation $AR_k$, i.e., the probability that at least $k$ mobile servers accept the task in a given geocast region. It is determined by the acceptance rate of each individual mobile server and the number of mobile servers in the geocast region. First, assume we have $m$ independent mobile servers in a single level-2 cell of the PSD (hence each server has the same $p_a$), and the task requires at least $k$ mobile servers to perform the task. The overall acceptance rate $AR_k$ is calculated as follows:
$$AR_k = 1 - \sum_{i=0}^{k-1} \binom{m}{i} (p_a)^i (1 - p_a)^{m-i}. \tag{4-8}$$
Next, if a geocast region contains multiple level-2 cells, the overall acceptance rate can be calculated iteratively: We first start with an empty set and initialize $AR_k$ to 0. Then a


new cell is added to the set, and the value of $AR_k$ is updated. The process continues until we have added all cells in the geocast region. Let $AR_{old}(j)$ denote the probability that exactly $j$ mobile servers accept the received task in the original region, $AR_{new}(j)$ denote the probability that exactly $j$ mobile servers accept the received task in the new region, and $AR_{+}(j)$ denote the probability that exactly $j$ mobile servers accept the received task in the added region. We have the following update equation:
$$AR_{new}(j) = \sum_{n=0}^{j} \left( AR_{old}(n) \, AR_{+}(j - n) \right). \tag{4-9}$$
Then the probability that at least $k$ mobile servers accept a task in the geocast region is calculated as:
$$AR_k = 1 - \sum_{j=0}^{k-1} AR_{new}(j). \tag{4-10}$$
Note that each task can have a minimum requirement on its acceptance rate. When constructing the geocast region, the CCP needs to ensure that the region can satisfy this threshold $\overline{AR}_k$.

4.4.2 Service Quality Characterization
Besides the acceptance rate of task allocation, the CCP also needs to ensure the quality of results. Intuitively, the quality of results depends on the reputation of participating servers. However, it is hard to learn which servers will participate in the task allocation process, since the CCP can only disseminate a task to a region and wait for mobile servers to reply. If a task needs $k$ mobile servers to perform, the CCP will allocate the task to the first $k$ mobile servers who reply. This makes it difficult to directly control the quality of results. Our proposed R-PSD provides noisy counts of mobile servers with different reputation levels in a region, which enables the CCP to determine the cells in which more trustworthy mobile servers are located. Therefore the problem of selecting reliable mobile servers can be transformed into the problem of determining a geocast region that contains enough trustworthy mobile servers. Note that in our system, multiple mobile servers can perform


the same task, and their results need to be aggregated together to get the final result. Suppose we have servers with $l$ different reputation levels, and the number of servers for each reputation level is $w_i$, $i = 1, 2, \ldots, l$. The quality of results can be captured by a function $\phi(w)$ where $w = [w_1, w_2, \ldots, w_l]$ denotes the reputation distribution. Intuitively, $\phi(\cdot)$ is a positive-valued function, and its value increases monotonically when the portion of servers with high reputation level increases. The function $\phi(\cdot)$ quantifies different aspects of the result under different application scenarios. In applications that monitor the value of certain metrics, such as epidemic monitoring, $\phi(\cdot)$ may represent the accuracy of estimation. On the other hand, in applications that require decision making, such as forecasting an epidemic outbreak, $\phi(\cdot)$ may denote the detection and false alarm probabilities. Here, we assume that the minimum service quality indicated by the task is represented as a threshold $\bar{\phi}$, which must be satisfied by the CCP.

4.4.3 Geocast Region Construction with PSD
Given a task, the geocast region should be constructed to balance three goals: (1) the acceptance rate of task allocation should be close to 100%, (2) the number of notified mobile servers should be small, and (3) the aggregate result of participating mobile servers should be trustworthy. However, it is not hard to see that optimizing these three design goals simultaneously may be impossible in practice. For instance, to get a high acceptance rate of task allocation and a small number of notified mobile servers, the CCP has to select mobile servers whose individual acceptance rate is high (i.e., nearby mobile servers) regardless of their trust levels, which makes the service quality suboptimal. On the other hand, if we want to improve the service quality, the CCP should notify mobile servers with high reputation levels, which increases the difficulty of getting enough mobile servers. In this case, either the acceptance rate is reduced, or the number of notified mobile servers is increased.

For ease of presentation, we first present in this section an algorithm of geocast region construction which only considers the first two goals. The algorithm takes task $t$


and the PSD of mobile servers as input, and outputs the geocast region to which task $t$ is allocated. The basic idea of our algorithm is to first initialize the geocast region with the level-2 cell that covers task $t$, and determine the acceptance rate $AR$ of this cell by (4-8). At each step, the cell that produces the largest increase of acceptance rate is added. Considering the relationship between acceptance rate and distance, we first choose cells that are closer to task $t$. The geocast region stops expanding either when $AR$ exceeds the threshold $\overline{AR}_k$, or when the size of the geocast region is larger than $2 d_{max}$, since $AR$ cannot increase any further beyond this range. Algorithm 3 gives the detailed steps of the proposed greedy algorithm.

Algorithm 3. Greedy Algorithm with PSD
Input: Task $t$, $d_{max}$, $\overline{AR}_k$, $k$
Output: Geocast region $GR$
1: Initialize $GR = \emptyset$, $AR_k = 0$;
2: Let $U$ denote the square of length $2 d_{max}$ centered at the task location;
3: Let $AR(GR)$ denote the overall acceptance rate $AR_k$ of a region $GR$;
4: $Q \leftarrow$ {the level-2 cell that covers task $t$};
5: repeat
6:   if $Q = \emptyset$ then
7:     return $GR$;
8:   else
9:     $c \leftarrow \arg\max_{c \in Q} AR(GR \cup c)$;
10:     $Q \leftarrow Q \setminus \{c\}$;
11:     $GR \leftarrow GR \cup \{c\}$;
12:     $AR_k \leftarrow AR(GR)$;
13:     $S \leftarrow (\{\text{neighbors of } c\} \setminus GR) \cap U$;
14:     $Q \leftarrow Q \cup S$;
15:   end if


Figure 4-4. Illustration of a geocast region with R-PSD.

16: until $AR_k \ge \overline{AR}_k$
17: return $GR$;
end

In summary, Algorithm 3 expands the geocast region by adding the cell with the maximum marginal $AR$ at each step, which depends on the cell position and the mobile server distribution in that cell based on (4-8).

4.4.4 Geocast Region Construction with R-PSD
Now we present the greedy algorithm that considers all three goals and determines a geocast region based on the R-PSD. Note that in the R-PSD, each sub-PSD partitions the space in its own way, and thus cells in different sub-PSDs may overlap. The geocast region in this case is a combination of cells in all sub-PSDs of the R-PSD. An example of a geocast region is illustrated in Fig. 4-4.

Algorithm 4. Greedy Algorithm with R-PSD
Input: Task $t$, R-PSD with $l$ sub-PSDs, $d_{max}$, $\overline{AR}_k$, $\bar{\phi}$, $k$
Output: Geocast region $GR$


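1: Initialize $GR = \emptyset$, $AR_k = 0$, $w_i = 0$, $i = 1, 2, \ldots, l$;
2: Let $U$ denote the square of length $2 d_{max}$ centered at the task location;
3: Let $AR(GR)$ denote the overall acceptance rate $AR_k$ of a region $GR$;
4: Let $\phi(w_1, w_2, \ldots, w_l)$ denote the service quality given $w_1, w_2, \ldots, w_l$;
5: for $i = 1, 2, \ldots, l$ do
6:   $Q_i \leftarrow$ {level-2 cell in the $i$-th sub-PSD that covers task $t$};
7: end for
8: repeat
9:   if $Q_i = \emptyset, \forall i$ then
10:     return $GR$;
11:   end if
12:   for $i = 1, 2, \ldots, l$ do
13:     $c_i \leftarrow \arg\max_{c_i \in Q_i} AR(GR \cup c_i)$;
14:   end for
15:   Sort $c_i$, $i = 1, 2, \ldots, l$ in decreasing order of $AR(GR \cup \{c_i\})$;
16:   Compute $\phi(w_1, w_2, \ldots, w_l)$ for $GR \cup \{c_i\}$ in the previously computed order until we meet a $c_j$ that satisfies $\phi(w_1, w_2, \ldots, w_l) \ge \bar{\phi}$;
17:   $AR_k \leftarrow AR(GR \cup \{c_j\})$;
18:   $Q_j \leftarrow Q_j \setminus \{c_j\}$;
19:   $GR \leftarrow GR \cup \{c_j\}$;
20:   for $i = 1, 2, \ldots, l$ do
21:     $S_i = (\{\text{neighbors of } c_i\} \setminus GR) \cap U$;
22:     $Q_i = Q_i \cup S_i$;
23:   end for
24: until $(AR_k \ge \overline{AR}_k) \wedge (\phi(w_1, w_2, \ldots, w_l) \ge \bar{\phi})$
25: return $GR$;
end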
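The $AR(\cdot)$ evaluation that Algorithms 3 and 4 call at every step follows (4-7) through (4-10) and can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the implementation used in the experiments: the default values for `p0`, `slope`, and `d_max`, the representation of a region as a list of `(noisy_count, distance_to_task)` pairs, and the rounding of noisy counts to non-negative integers are all hypothetical choices made for illustration.

```python
from math import comb

def accept_prob(d, p0=0.9, slope=0.1, d_max=5.0):
    """Linear model (4-7): p0 - slope * d inside [0, d_max], zero outside.

    p0, slope and d_max are hypothetical example values.
    """
    if d < 0 or d > d_max:
        return 0.0
    return max(0.0, p0 - slope * d)

def cell_pmf(m, p):
    """Probability that exactly j of m independent servers accept, j = 0..m."""
    return [comb(m, j) * p**j * (1 - p)**(m - j) for j in range(m + 1)]

def combine(pmf_old, pmf_added):
    """Update (4-9): convolve the current region's pmf with the added cell's pmf."""
    out = [0.0] * (len(pmf_old) + len(pmf_added) - 1)
    for n, a in enumerate(pmf_old):
        for r, b in enumerate(pmf_added):
            out[n + r] += a * b
    return out

def at_least_k(pmf, k):
    """Equation (4-10): one minus the probability of fewer than k acceptances."""
    return 1.0 - sum(pmf[:k])

def region_ar(cells, k):
    """AR_k of a region given as (noisy_count, distance_to_task) pairs."""
    pmf = [1.0]  # empty region: zero acceptances with probability 1, so AR_k = 0
    for count, dist in cells:
        m = max(0, round(count))  # assumption: clamp and round the noisy count
        pmf = combine(pmf, cell_pmf(m, accept_prob(dist)))
    return at_least_k(pmf, k)

# Hypothetical region: the task's own cell plus one neighbouring cell.
print(region_ar([(3, 0.0), (4.2, 1.5)], k=2))
```

Keeping the full distribution of the number of acceptances, rather than only $AR_k$, is what allows a candidate cell to be evaluated with a single convolution when the greedy step compares $AR(GR \cup c)$ across candidates.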


The input to the algorithm is task $t$, the R-PSD with $l$ sub-PSDs, and parameters $d_{max}$, $\overline{AR}_k$, $\bar{\phi}$, and $k$. The variable $w_i$ represents the noisy count of servers included in the geocast region $GR$ that belong to the $i$-th sub-PSD, $i = 1, 2, \ldots, l$. In addition to the constraint $AR_k \ge \overline{AR}_k$ considered in Algorithm 3, we add a new constraint $\phi(\{w_i\}_1^l) \ge \bar{\phi}$, which guarantees the service quality of the chosen mobile servers. The geocast region $GR$ is first initialized to an empty set and then expanded iteratively. In each iteration, a new cell that both best improves $AR_k$ and ensures $\phi(\{w_i\}_1^l) \ge \bar{\phi}$ is selected and added to $GR$. The geocast region stops expanding when no new cells within distance $d_{max}$ can be added or when $AR_k$ exceeds $\overline{AR}_k$. The algorithm is a greedy approach that always chooses the cell with the highest acceptance rate while guaranteeing service quality at each iteration.

4.5 Geocast Communication Process
To solve the third challenge as explained in Section 4.2, task dissemination in the selected geocast region can be implemented with the infrastructure of the CSP. The CSP either directly sends a message to each mobile server in the region, or sends the message to several mobile servers and lets the servers relay the message hop-by-hop. The communication cost for the former approach is proportional to the average number of notified mobile servers, which may be high when a large number of mobile servers should be notified. Hence it is suitable only when servers are sparsely distributed. On the other hand, the hop-by-hop approach reduces the overhead for the CSP. An efficient way to deliver packets in a geographic region is using geographic routing, i.e., geocast. Geocast has the advantage of lower overhead and faster response to dynamics over ad-hoc routing protocols when a message needs to be broadcast to a geographic area. Geocast in our protocol is slightly different from that in previous papers [114, 123] in that it adds a new dimension, the reputation level. Only mobile servers that satisfy both the reputation and the location requirements of the geocast region will be notified. This is possible since the CSP already has access to locations and reputation scores of mobile servers when


calculating the R-PSD. In this chapter, we use the average number of notified mobile servers to measure the system overhead, which includes the geocast overhead.

4.6 Performance Evaluation
In this section, we evaluate the performance of our proposed framework using real-world datasets.

4.6.1 Experimental Setup
We use two real-world datasets: Gowalla [124] and CrowdFlower [125]. The Gowalla dataset, which contains a total of 6,442,890 check-ins on a location-based social networking website from Feb. 2009 to Oct. 2010, is used to simulate the spatial distribution of mobile servers in our experiments. We use the check-in history of Gowalla users as the task allocation history of mobile servers. We consider Gowalla users as the mobile servers. We assume all check-ins of a Gowalla user, except the latest one, are tasks that have been completed by him/her, and the latest check-in location is treated as his/her current location. Due to data sparsity, there are no data points in some areas of the dataset. Hence we overlay the dataset with a set of uniformly distributed mobile servers. The resulting mobile server distribution is shown in Fig. 4-5A, where each cross in the figure represents the current location of a mobile server.

We extract the reputation scores of mobile servers based on a study carried out in [126], which asks participants to report traffic events in Dublin. They created and assigned approximately 4000 tasks and calculated reputation scores of participants based on the ground truth of the tasks and historical performance in CrowdFlower. We randomly assign the reputation scores to Gowalla users so that we get a dataset which contains both task performance history and reputation scores of servers. The reputation distribution is given in Fig. 4-5B.

As we presented in Section 4.4, the service quality of the task can be captured by the function $\phi(\cdot)$. In our experiment, we use the results in [127] to estimate the service quality of the task when $k$ mobile servers with potentially different reputation levels perform


Figure 4-5. (a) Current locations of mobile servers; (b) Reputation distribution of mobile servers.

the same task. In their paper, the error rate of completing a task, denoted as $ER$, is the metric to quantify the service quality. Suppose we use majority voting to aggregate the results of mobile servers for a task. It is proved in [127] that the error rate $ER$, the required number of servers $k$, and the collective quality $Q$ satisfy the following inequality:
$$k \frac{Q^2}{4} \ge \ln \frac{1}{ER}. \tag{4-11}$$
The collective quality $Q$ is calculated in the following way. Define $X$ as a random variable to describe the event that a mobile server submits a correct answer. We have $\Pr(X = \text{True}) = p_r$ and $\Pr(X = \text{False}) = 1 - p_r$, where $p_r$ is the reputation score of a mobile server in our scenario. If the reputation scores of mobile servers are independent and identically distributed, we have
$$Q = E\left[(2 p_r - 1)^2\right], \tag{4-12}$$
where the expectation is taken with respect to the distribution of reputation scores. Therefore, given an error rate requirement $ER$ for a task and the number of required mobile servers $k$, we can deduce a corresponding requirement on the reputation score distribution in the geocast region.
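For a concrete reading of (4-12), the sketch below estimates the collective quality from a finite list of reputation scores by replacing the expectation with a sample mean. Both the function name and the use of a sample mean are assumptions of this sketch. The chapter itself works with the distribution of scores rather than a sample.

```python
def collective_quality(scores):
    """Sample estimate of Q = E[(2 * p_r - 1)^2] from reputation scores in [0, 1]."""
    if not scores:
        raise ValueError("at least one reputation score is required")
    return sum((2.0 * p - 1.0) ** 2 for p in scores) / len(scores)

# Hypothetical scores: servers near 0.5 contribute almost nothing to Q.
print(collective_quality([0.9, 0.8, 0.55, 0.5]))
```

Because the term $(2 p_r - 1)^2$ vanishes at $p_r = 0.5$, a region dominated by servers whose answers are close to random has a collective quality near zero regardless of how many servers it contains.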


In our experiments, we suppose that mobile servers are divided into two groups whose reputation scores fall into $[0, 0.5]$ and $(0.5, 1]$, respectively. The number of servers in each group is $w_1$ and $w_2$, respectively. For a given geocast region, the collective quality depends on the ratio of the number of mobile servers in each group, i.e., $w_1 / w_2$. If the reputation score in each reputation level follows a uniform distribution, the collective quality $Q$ can be calculated from (4-12) as
$$Q = E\left[(2 p_r - 1)^2\right] = \int_0^{0.5} \frac{w_1}{w_1 + w_2} (2x - 1)^2 \frac{1}{0.5} \, dx + \int_{0.5}^{1} \frac{w_2}{w_1 + w_2} (2x - 1)^2 \frac{1}{0.5} \, dx = \frac{w_1}{3 (w_1 + w_2)}. \tag{4-13}$$
Given a requirement on $ER$ and the number of mobile servers $k$, we can deduce a lower bound for $Q$ and further calculate a requirement on $w_1$ and $w_2$. When constructing the geocast region, the CCP needs to ensure that the region can satisfy this requirement.

We randomly generate 1,000 tasks which are uniformly distributed in an area, and use our algorithms to calculate GR regions for each task. We also implement a baseline algorithm that is privacy-oblivious. The baseline algorithm has access to exact locations of all servers and always adds the nearest server to a set until the acceptance rate of the set surpasses the acceptance threshold $\overline{AR}_k$.

4.6.2 Experimental Results

Evaluation of system overhead for achieving privacy
We first evaluate the system overhead (i.e., the average number of notified mobile servers) incurred by our privacy-preserving method. Fig. 4-6 presents the system overhead for our private algorithm (the greedy algorithm based on PSD) and the baseline algorithm when varying the privacy budget $\epsilon$. As $\epsilon$ increases, which means mobile servers are less sensitive to their privacy breach, the PSD provides more accurate data for geocast, and


Figure 4-6. Effect of privacy budget when $\overline{AR}_k = 0.9$.

the geocast overhead decreases as well. Additionally, we can observe that compared with the baseline, our private algorithm does not significantly increase the system overhead, especially when the privacy budget is larger than 0.3. This shows the ability of our algorithm to choose nearby mobile servers for a task.

Effect of varying acceptance rate threshold on system overhead
We evaluate the performance of Algorithm 3 and Algorithm 4 by varying the threshold of the expected acceptance rate $\overline{AR}_k$. We use the privacy budget $\epsilon = 0.5$ for both PSD and R-PSD. Fig. 4-7 illustrates the impact of increasing $\overline{AR}_k$. As one would expect, for a higher $\overline{AR}_k$, a larger GR region should be selected, and thus the overhead will increase.

Acceptance rate and reliability
Fig. 4-8 compares Algorithm 3 and Algorithm 4 with different ratios of mobile servers with different reputation levels. We can see that as the threshold decreases, i.e., more mobile servers with good reputation are needed, the advantage of Algorithm 4 over Algorithm 3 becomes larger. Note that we use the reputation distribution in Fig. 4-5B for our experiment, where a large portion of mobile servers have high reputation scores.


Figure 4-7. Comparison on varying acceptance rate threshold $\overline{AR}_k$ with $\epsilon = 0.5$.

Figure 4-8. Effect of reputation threshold on the number of tasks that reach the threshold of service quality when (a) $\overline{AR}_k = 0.7$, $\epsilon = 0.5$; (b) $\overline{AR}_k = 0.9$, $\epsilon = 0.5$.

However, in practice, highly reliable servers are usually scarce, and thus the advantage of Algorithm 4 will become more prominent.

4.7 Related Work
Mobile cloud computing (MCC) extends the concept of cloud computing into the mobile domain. There are generally two types of mobile clouds in MCC: infrastructure-based and ad-hoc [97]. The infrastructure-based mobile cloud consists of stationary computing


resources and provides services to the mobile users via the Internet. Research in this direction generally focuses on reducing the cost and complexity of using cloud resources. Chun et al. [128] propose CloneCloud, which clones the mobile platform into a cloud VM and enables the remote server to execute the computation-intensive part of a job. In [129], Zhang et al. partition a job into multiple small tasks which are least dependent on each other. These small tasks are executed either locally or remotely based on the resource intensity.

Alternatively, in the ad-hoc mobile cloud, a collection of mobile devices acts as cloud resources and provides access to local or Internet-based cloud services to other mobile users. In this chapter, we focus on the second case. There are only a few papers along this line [98-101]. The work in [101] exploits the resources of nearby mobile devices to perform intense computation jobs and designs a scheme to alleviate frequent disconnections of mobile servers. On the other hand, Huerta-Canepa and Lee in [100] use an ad-hoc cluster of nearby mobile devices to enhance computing capabilities of a mobile device with minimum network latency and traffic. Their system continuously monitors traces of mobile servers and establishes peer-to-peer connections among them. The work in [98] provides a preliminary framework that pays nearby smartphones to run resource-intensive tasks. As emphasized in [98], security and privacy of mobile servers is a critical concern in the ad-hoc mobile cloud. Moreover, it is also challenging to ensure the quality of the service provided by these dynamic mobile servers. There is an inherent conflict between quality of service (i.e., utility) and privacy in task allocation, which complicates the problem. None of the previous works focus on the privacy issue of mobile servers. In this chapter, we propose a framework that provides solutions to the above challenges, where both location privacy and service quality are considered.

Several solutions to privacy issues in mobile applications have been proposed. For example, aggregation is a common approach to hide individual sensitive information when

only statistics of users are required. In [103], a user can learn traffic conditions based on the reports of other mobile users, and thus the location information of all the users is learned by the service provider. To protect location privacy, individual sensitive information is hidden through aggregation, and only statistics of users are provided. However, this approach only calculates statistics and thus cannot be used to select mobile servers in an ad-hoc mobile cloud. Another approach is used in location-based services, where the true locations are obfuscated in location-based queries, and the service provider returns results based on the obfuscated query. In [104], Duckham and Kulik provide a mechanism to calculate location that balances location privacy and the need for high-quality location-based service. With this mechanism, the location-based service provider learns only the minimum information needed to provide service of the required quality. In [105], Shokri et al. propose a game-theoretic framework that provides an optimal location-privacy protection mechanism and considers the adversarial knowledge during mechanism design. In our scenario, however, the private information is no longer part of a location-based query, but the result of a location-based query regarding the task. Some papers [106, 107] consider queries on private locations in an outsourced database, but they only protect private data from an intermediate service provider while assuming a trust relationship between the data owner and the querying entity. This is not true in our scenario because mobile servers and the CCP may not share an inherent trust relationship.

CHAPTER 5
OPTIMAL TASK RECOMMENDATION FOR MOBILE CROWDSOURCING WITH PRIVACY PROTECTION

Mobile crowdsourcing (MC) is the combination of crowdsourcing and mobile technologies that leverages the sensing, computing, and communication capabilities of mobile devices to provide crowdsourcing services. According to the statistics given by Statista [130], the number of smartphones sold to end users worldwide was 1.2 billion in the single year of 2014. New smartphones are usually equipped with various sensors including GPS units, accelerometers, gyroscopes, and touch screen sensors, which can provide a wide range of sensing data. The ubiquity and advanced sensing capabilities of mobile devices enable mobile users to gather rich information anywhere and anytime, and therefore to perform tasks that can hardly be completed by web-based crowdsourcing. In MC, a crowd of mobile users is engaged to provide pervasive and cost-effective services of data collection, processing, and computing. These mobile users have shifted from the traditional role of service consumers to the new role of service providers, and they usually collect a small fee (or other forms of reward) for providing services.

The applications of mobile crowdsourcing have developed rapidly. Existing commercial MC applications include traffic monitoring (e.g., Waze [131]), ride sharing (e.g., Uber [132]), environmental monitoring (e.g., Stereopublic [133]), and wireless coverage mapping (e.g., OpenSignal [134]). Nonetheless, MC is still in its infancy, and there is much ongoing research exploring applications such as epidemic monitoring and prediction [135] and urban sensing [136].

In MC, a spatio-temporal task is outsourced to a group of mobile users (i.e., workers) who perform the task within a deadline, and only workers under certain contexts are qualified for the task. However, it is quite inefficient for workers to select tasks by themselves when there is a huge number of crowdsourcing tasks, especially on a mobile device with its limited screen and keyboard. Hence, MC platforms must provide task recommendation services which proactively push a task to qualified workers. In current
solutions, workers have to reveal their exact contexts to MC platforms in order to receive personalized task recommendations. Depending on the application scenario, the context of a worker can be defined along multiple dimensions, including geographical (e.g., along a street), temporal (e.g., within hours), activity (e.g., moving speed), and profile (e.g., gender) [137]. These contexts contain private and sensitive information that may be used to uniquely identify an individual, reveal his/her health status, or track his/her daily routines. However, the MC platforms are potentially untrustworthy in the sense that they may be operated by various organizations and companies and may also be compromised by malicious adversaries. Hence, allowing the MC platforms to learn exact contexts may put worker privacy at risk [138]. It is imperative to protect worker privacy in order to enable the large-scale deployment of mobile crowdsourcing applications.

An MC system has three components that may reveal private worker information: 1) offline statistics collection to learn recommendation rules based on worker contexts and historical task completion performance, 2) online task selection to select the most suitable tasks for a worker based on his current context, and 3) task completion for a worker to accept and perform a task, and to return the result. Each component exposes worker contexts and raises privacy concerns in different ways. In this chapter, we focus on the privacy issue of the first two components for the following reasons. First, in terms of disclosure volume, context disclosure is more severe during task recommendation, which consists of the first two components, since all workers are candidates for task recommendation, while only a few workers are eventually involved in the task completion component. Second, in the task completion component, a worker sends explicit consent to perform a task, and context disclosure is unavoidable since merely performing a task indicates that he/she is in the required context. Note that identity protection during task completion could be provided through anonymous routing (such as Tor) or pseudonyms, and it is not the focus of this chapter.

In this chapter, we propose a framework for protecting the privacy of worker contexts while enabling effective task recommendation in MC systems. Specifically, our proposed framework contains two main components that may operate in parallel: "privacy-aware online task selection", which selects the best MC tasks for workers based on their current noisy contexts, and "privacy-preserving offline statistics collection", which aggregates the historical information about worker contexts and task completion activities needed for task selection while preserving worker privacy.

Privacy-Aware Online Task Selection. Current MC systems select tasks by collecting personal data at a server. Workers have to reveal their exact context information to the server in order to participate. To address the privacy concerns of such server-only recommendation, an alternative approach would be worker-only, where workers' mobile devices keep their own personal context information and perform the recommendation. Indeed, this has been proposed for personalization in mobile advertising systems [139]. The problem with this approach is the huge computation and communication overhead for resource-constrained mobile devices. Thus, some recent papers propose hybrid solutions that jointly consider both sides to address privacy issues in mobile systems [140–143]. For example, in [140], the server returns a superset of the results and lets end users filter useful information by themselves. These solutions have a variety of optimization goals, which motivates us to consider the fundamental trade-offs in these mobile systems.

In this chapter, we formulate the task selection from an MC server to a worker as an optimization problem that considers three criteria: (1) privacy, which is related to the amount of the worker's context information shared with the MC server, (2) utility, which represents the benefits of recommending the tasks in terms of relevance or revenue, and (3) efficiency, which measures the communication and computation overhead imposed on the worker's mobile device by recommending a certain number of tasks. We show in Section 5.2 that these three criteria cannot be optimized simultaneously. Note that the
aforementioned solutions only present some discrete trade-off points: recommendation only at the server side provides the best efficiency and utility at the cost of privacy, while recommendation at the worker side guarantees privacy and utility at the cost of efficiency. In contrast, we propose an optimization model that can be adjusted to any desirable trade-off point.

In the proposed optimization framework, a worker can decide how much information about his context to share with the MC server. Based on this limited information, the MC server selects and sends a set of tasks to the worker. The size of the task set is pre-defined by the worker considering the associated communication and computation overhead. After the worker receives the task set, he picks and completes the best task based on his private information. The most challenging part of the whole process is to select the task set sent by the MC server so that it maximizes the total expected utility of the MC server given constraints on privacy and efficiency. There are also other trade-off points we can consider, such as jointly optimizing utility and efficiency given a constraint on privacy. Since the priorities of the privacy and efficiency criteria can be arbitrarily selected by the worker, the framework is quite flexible and can be used in different MC systems.

Privacy-Preserving Offline Statistics Collection. Recommended tasks are chosen based on statistics including both the historical performance of workers and the distribution of their contexts. These statistics are collected offline and are used to calibrate the online task selection component. However, estimating these statistics often poses a privacy challenge: workers may be unwilling to reveal the required information, such as their exact contexts and the tasks that they have completed successfully. Therefore, we need to provide a privacy-preserving solution that can estimate these statistics from distributed worker data. Some previous works propose to address privacy issues in statistical queries by anonymizing data; however, data owners may be de-anonymized with auxiliary information [144, 145]. Another approach, differential privacy, adds noise to the query results of statistical databases so that even with auxiliary information, one cannot infer the presence or absence of individuals.

Although differential privacy has become popular recently, most of the solutions are proposed for a centralized database, which is not suitable for our case, in which data are distributed among workers. On the other hand, existing distributed solutions [34, 36, 146, 147] are impractical for large systems. For example, the computation cost per user in [36] is $O(N)$, where $N$ is the number of users, which becomes prohibitive for a large population of users. The computation cost is reduced to $O(1)$ in [34, 147], but they use an expensive secret sharing protocol that is not scalable to a large group of workers. Additionally, the solution in [148] uses two servers to collaboratively calculate statistics and handle user dynamics. However, none of these solutions can prevent a single malicious user from greatly distorting the query results.

In this chapter, we provide a privacy-preserving statistics collection protocol to reliably compute the required statistics from a dynamic set of workers who are potentially malicious. Our solution relies on a semi-honest third party, a proxy, which guarantees differential privacy by adding blind noise to the encrypted worker data. Worker data are encrypted by workers with the public key of the MC server and are sanitized by the proxy. Hence neither the proxy nor the MC server can learn the accurate statistics or individual worker data.

In summary, the main contributions of this chapter are as follows.

• We identify the specific privacy challenges of task recommendation in MC systems, and we propose a framework that protects worker context privacy. To the best of our knowledge, this is the first work to study privacy in task recommendation for MC.

• We propose an optimization model for task selection that explores fundamental trade-offs among three design criteria (privacy, utility, and efficiency) in MC systems, and we present efficient approximation algorithms to solve it.

• We introduce an efficient statistics collection protocol that preserves differential privacy in a distributed setting and tolerates malicious or dynamic workers.

• We conduct both numerical evaluations and performance analysis to show the effectiveness and efficiency of our proposed framework.

The remainder of this chapter is organized as follows. We present our framework in Section 5.1. Next, we represent the task selection process as a constrained optimization problem in Section 5.2. Section 5.3 develops an approximation algorithm to solve the optimization problem. A privacy-preserving protocol for statistics collection is presented in Section 5.4. We discuss the experimental results and analyze the system overhead in Section 5.5 and Section 5.6, respectively. Section 5.7 summarizes related work.

5.1 The Proposed Framework

In this section, we describe the basic system model for task recommendation in MC systems and the design goals.

5.1.1 System Model

Figure 5-1. Basic system model for task recommendation in MC.

Fig. 5-1 shows the basic model of the proposed framework, consisting of the following two components:

• Statistics Collection. In this component, the server collects various statistics from workers periodically in the background. A semi-honest third party (to be elaborated later in Section 5.4) is employed to protect the private context information of participating workers.

• Task Selection. In this component, based on the statistics collected in the statistics collection component and the worker's current context, the server selects
and delivers a set of tasks to the worker. Note that we allow workers to decide how much private information they are willing to share with the server. The server selects a set of tasks, where the set size is constrained by a bounded communication overhead, based on this limited information and sends them to the worker. The worker¹ then chooses the most relevant one to complete based on all his private information and returns the answer to task requesters.

¹ For brevity, we use "he" to refer to the worker without meaning any distinction about the worker's gender in the rest of the paper.

Privacy Guarantee. Our framework can protect worker privacy in both online task selection and offline statistics collection. Note that task selection and statistics collection use private worker contexts in different ways, and therefore require different privacy-preserving techniques.

In task selection, a single worker's current context is used, and we ensure worker privacy through limited information disclosure, as used in many mobile systems [142, 149, 150]. We allow the worker to share a generalized context with the server rather than his exact context. The generalization of worker context is done according to a predefined hierarchy. Quantifiable contexts such as location can simply be divided into different intervals based on their values. For instance, location information represented by latitude and longitude with a total of 6 decimal digits can be generalized by keeping $6 - a$ decimal digits for level-$a$ generalization. An example of level-4 generalization is shown in Fig. 5-2A. A worker can also choose different (i.e., adaptive) levels of generalization for different intervals of contexts with existing approaches [140], as illustrated in Fig. 5-2B. With the adaptive generalization approach, the worker can protect his location privacy at a relatively low cost of utility. If a context is not quantifiable, such as activities, it can be described in a tree taxonomy. Fig. 5-3 shows an example describing activities with different precisions. In this case, if a worker dining in a restaurant chooses level-2 generalization, he would only tell the server that he is static. In this chapter, we focus on the challenge of recommending tasks based on the generalized context rather than
how to generalize different contexts. Interested readers may refer to [151] for details about different generalization methods.

Figure 5-2. A) A basic context generalization approach; B) An adaptive context generalization approach.

In statistics collection, historical context and task completion information from workers is used. A worker can choose whether or not to participate in the statistics collection protocol. If he decides to participate, we ensure that no other party, except the worker himself, can learn his private information during the statistics collection process. Moreover, we give the worker a differential privacy guarantee [36], which ensures that the resulting statistics do not significantly change with the presence or absence of a single
worker. Therefore, an adversary with arbitrary background knowledge cannot trace or de-anonymize a worker from multiple runs of the statistics collection protocol.

Figure 5-3. Taxonomy of worker activities.

5.1.2 Design Goals

We aim to provide good privacy, utility, and efficiency in the proposed framework. Since task selection and statistics collection differ in nature, we describe their design goals separately.

Goals for Task Selection. We have three design goals for task selection: privacy, utility, and efficiency.

• Privacy. Worker contexts are needed for task recommendation, and they may be leveraged by the server to uniquely identify an individual worker. To reduce the risk of being identified, the worker limits the information shared with the server. Instead of providing an exact context, the worker provides a generalized context which obfuscates privacy-sensitive information such as location and activity.

• Utility. Utility represents the value of a set of recommended tasks. From the perspective of the server, the utility is the expected commission of the recommended tasks. From the perspective of the worker, the utility is reflected in the payment he would obtain from completing the recommended tasks. The utility for both stakeholders is related to the payment of the task that is selected and completed successfully by the worker.

• Efficiency. When a worker receives a set of recommended tasks, he tries to select the best task from the set. A larger set takes more time to select from, which contradicts the intention of recommendation. Thus the efficiency of task recommendation is directly related to the set size. The recommendation system should recommend a reasonable number of tasks at a time to ensure the efficiency of task selection by the worker.
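As a concrete illustration of the generalized context mentioned under the privacy goal, the level-a location generalization of Section 5.1.1 can be sketched in Python. This is only a minimal sketch: the text says that 6 − a decimal digits are kept, and the sketch assumes this means truncation toward zero (not rounding). The function names `generalize_coordinate` and `generalize_location` and the sample coordinates are hypothetical.

```python
from decimal import Decimal, ROUND_DOWN

TOTAL_DIGITS = 6  # decimal digits of an exact coordinate, as in the text


def generalize_coordinate(value: float, level: int) -> float:
    """Keep (6 - level) decimal digits, truncating toward zero (assumption)."""
    if not 0 <= level <= TOTAL_DIGITS:
        raise ValueError("level must be between 0 and 6")
    # Quantization step, e.g. 0.01 for level 4 and 1 for level 6.
    step = Decimal(1).scaleb(-(TOTAL_DIGITS - level))
    return float(Decimal(str(value)).quantize(step, rounding=ROUND_DOWN))


def generalize_location(lat: float, lon: float, level: int) -> tuple:
    """Apply the same generalization level to latitude and longitude."""
    return (generalize_coordinate(lat, level), generalize_coordinate(lon, level))


# Level-4 generalization keeps two decimal digits of each coordinate.
coarse = generalize_location(29.651634, -82.324826, 4)
```

A higher level hides more digits, so all exact locations that share the kept prefix map to the same generalized context.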

Goals for Statistics Collection. There are three main design goals for statistics collection:

• Privacy. Worker privacy should be protected during statistics collection.

• Robustness. The final statistics should not be significantly distorted by a small portion of malicious workers.

• Scalability. The system should be scalable to a large population of workers, which means that the proposed protocols should be highly efficient.

5.2 Optimization Model for Task Selection

In this section, we investigate fundamental trade-offs among the three design goals and formulate two optimization problems to model them in the task selection component.

5.2.1 Definitions

Before proceeding further, we define the notation used in the rest of the paper as follows.

Definition 4. Contexts and Tasks

• Denote by $C = \{c : c = 1, 2, \ldots, |C|\}$ the set of all exact contexts. Each worker has an exact context $c$.

• Denote by $\hat{C} = \{\hat{c} : \hat{c} = 1, 2, \ldots, |\hat{C}|\}$ the set of all generalized contexts. Each exact context is mapped into a generalized context, and a generalized context may correspond to multiple exact contexts.

• Denote by $\mathcal{T} = \{t : t = 1, 2, \ldots, |\mathcal{T}|\}$ the set of all tasks. For simplicity of notation, we treat tasks that have the same requirements for worker contexts and the same payment as one task. Each task may have multiple instances. The payment for successfully completing a task $t$ is denoted as $\rho_t$.

Definition 5. Complete-and-Approve Rate (CAR): Both workers and the MC platform can earn some money when tasks are completed successfully (i.e., answers are approved by task requesters). This can be characterized by the complete-and-approve rate (CAR), which is calculated as $N_1$, the total number of workers with context $c$ who have successfully completed task $t$, divided by $N_2$, the total number of workers with context $c$, i.e., $\mathrm{CAR}(t|c) = N_1 / N_2$.
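Definition 5 can be made concrete with a small Python sketch that computes CAR(t|c) = N1/N2 from raw worker records. The record layout (a dict with `context` and `approved_tasks` fields) and the function name are assumptions made only for illustration; the sketch works on plaintext records and ignores the privacy-preserving collection discussed later in the chapter.

```python
def complete_and_approve_rate(workers, task, context):
    """Return CAR(task | context) = N1 / N2, or None if no worker has the context.

    workers: list of dicts with a 'context' label and a set 'approved_tasks'
    (hypothetical record layout).
    """
    in_context = [w for w in workers if w["context"] == context]
    n2 = len(in_context)  # N2: workers with context c
    if n2 == 0:
        return None  # the rate is undefined without any worker in this context
    # N1: workers with context c whose answer to the task was approved
    n1 = sum(1 for w in in_context if task in w["approved_tasks"])
    return n1 / n2


workers = [
    {"context": "c1", "approved_tasks": {"t1", "t2"}},
    {"context": "c1", "approved_tasks": {"t1"}},
    {"context": "c1", "approved_tasks": {"t1"}},
    {"context": "c1", "approved_tasks": set()},
    {"context": "c2", "approved_tasks": {"t1"}},
]
rate = complete_and_approve_rate(workers, "t1", "c1")  # 3 of 4 workers
```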

5.2.2 Trade-Offs among Utility, Privacy and Efficiency

The optimization model of task selection specifies how to choose tasks based on limited information about a worker. There are three conflicting design goals in this model: utility, privacy, and efficiency. These three goals cannot be optimized simultaneously. First, suppose that privacy and efficiency are optimized, which means that the worker provides no context about himself to the system and expects to receive a single task tailored for him. In this case, as long as the utility of tasks varies across different contexts, it is impossible for the recommendation server to choose a task that is of high utility for the worker. Second, consider the case in which efficiency and utility are optimized. In order to find a task that has the highest utility for the worker, the recommendation server needs to know the exact worker context, compromising his privacy. Finally, if we want to ensure the optimality of utility and privacy, the recommendation server needs to recommend, without any prior knowledge of the worker context, a set of tasks within which the worker can find one that maximizes his utility. In this case, the efficiency becomes suboptimal since the recommended task set would be very large. If any of the above three goals is dropped, it is trivial to optimize the other two. Therefore, in practice, we have to find a good trade-off among these three goals.

5.2.3 Optimization Problem Formulation

In our framework, the worker first decides the amount of information about his private context to share with the server. Based on this limited information, the server selects $L$ tasks $T \subseteq \mathcal{T}$ and sends them to the worker. Here, $L$ determines the efficiency. Then the worker selects a task from the recommended $L$ tasks, completes it, and returns the result to the server or task requester. Therefore, the task is selected jointly by the server and the worker in our framework.

As mentioned before, there are three conflicting goals. Although these goals cannot be optimized simultaneously, there are several candidate objective functions that optimize the goals from different aspects. In the following, we choose an optimization objective
function representing the utility and model the other two goals as constraints. In other words, we optimize the utility while allowing the worker to determine the efficiency and privacy requirements. Alternative objective functions are also discussed.

Computation at the worker side

Given a set of recommended tasks $T$, the worker selects one to complete. The worker is assumed to be rational. In other words, the worker with exact context $c$ would select the task that maximizes his own revenue, which can be modeled as

$$t^* = \arg\max_{t \in T} \rho_t \cdot \mathrm{CAR}(t|c), \tag{5-1}$$

where $\rho_t$ is the payment for successfully completing task $t$.

Computation at the server side

Since the worker knows his own context, he can easily make the selection by maximizing his revenue. This is not true for the server, as it can only select tasks based on the limited information provided by the worker. To increase the relevance between recommended tasks and the worker, the server needs to recommend multiple tasks at the same time. Assume that the server already has prior knowledge of the context-dependent complete-and-approve rates, $\mathrm{CAR}(t|c)$, and the probability distribution over contexts. From the perspective of the server, its utility (i.e., commission) depends on the task that the worker chooses, and we use the expected commission of the set of tasks to quantify it. Since the server does not know the exact context $c$ of the worker, it considers the probability of each of the exact contexts that generalize into $\hat{c}$ and calculates the expected commission of the set of tasks $T$ as follows:

$$\mathbb{E}[\mathrm{Commission}(T|\hat{c})] = \sum_{c: c \to \hat{c}} \Pr[c|\hat{c}] \cdot \max_{t \in T} \alpha \rho_t \cdot \mathrm{CAR}(t|c), \tag{5-2}$$

where $\alpha$ is the portion of revenue that the platform can obtain for each successful transaction. Let $L$ denote the size of the task set. The server needs to select $L$ tasks
that maximize the expected commission given a generalized context $\hat{c}$, i.e.,

$$T^* = \arg\max_{T \subseteq \mathcal{T}: |T| = L} \mathbb{E}[\mathrm{Commission}(T|\hat{c})]. \tag{5-3}$$

Alternative Objectives

The above optimization model contains the extreme cases in which task selection is performed solely at the server side ($L = 1$) or solely at the worker side ($L = |\mathcal{T}|$). For the former case, if the server recommends a single task based on a very generalized context provided by the worker, it is likely that the recommendation has a low utility. For the latter case, the server sends all the available tasks to the worker. The selection becomes inefficient, and the recommendation service is meaningless. Hence, the parameter $L$ should be selected cautiously.

Instead of setting $L$ as a predefined parameter, we can also include it as one of the design variables. This can be done by substituting $\mathbb{E}[\mathrm{Commission}(T|\hat{c})] - \lambda L$ for the original objective $\mathbb{E}[\mathrm{Commission}(T|\hat{c})]$ in (5-3), where $\lambda$ is the weight of the efficiency metric $L$ in the total objective function. As a result, the server selects a set of tasks that maximizes the new objective, i.e.,

$$T^* = \arg\max_{T \subseteq \mathcal{T}: |T| = L} \mathbb{E}[\mathrm{Commission}(T|\hat{c})] - \lambda L. \tag{5-4}$$

In this way, the efficiency and the utility can be optimized jointly.

There are other options to model the utility as well. For example, we can incorporate the cost of a task, such as the time or other resources needed for completing it, into the objective. In this case, the selection process among a set of tasks for the worker becomes more complicated. A possible formulation might be $\max_{t \in T} (\rho_t - \mathrm{cost}_{t,c}) \cdot \mathrm{CAR}(t|c)$, where $\rho_t$ is the payment for task $t$ and $\mathrm{cost}_{t,c}$ denotes the cost to complete task $t$ by workers with context $c$. In addition, there might be a reservation wage $w_r$ [152] below which the worker would not pick the task. Considering this, the process of task selection for a worker can be modeled as $\max_{t \in T} (\rho_t - \mathrm{cost}_{t,c}) \cdot \mathbf{1}\{\rho_t - \mathrm{cost}_{t,c} \geq w_r\} \cdot \mathrm{CAR}(t|c)$.
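The worker-side rule (5-1) and the server-side objective (5-2) and (5-3) can be evaluated directly on a toy instance. The Python sketch below is illustrative only: the dictionary-based data layout, the function names, and all numbers are hypothetical, and the exhaustive search over task sets is feasible only for very small instances.

```python
from itertools import combinations


def worker_choice(task_set, payment, car, c):
    """Worker-side rule (5-1): pick the task with the highest expected payment."""
    return max(task_set, key=lambda t: payment[t] * car[(t, c)])


def expected_commission(task_set, prior, payment, car, alpha):
    """Server-side objective (5-2) for one generalized context.

    prior maps each exact context c that generalizes to c_hat to Pr[c | c_hat].
    """
    return sum(
        p * max(alpha * payment[t] * car[(t, c)] for t in task_set)
        for c, p in prior.items()
    )


def best_task_set(all_tasks, L, prior, payment, car, alpha):
    """Exhaustive search for (5-3); exponential cost, for toy sizes only."""
    return max(
        combinations(all_tasks, L),
        key=lambda s: expected_commission(s, prior, payment, car, alpha),
    )


# Hypothetical toy instance: two exact contexts behind one generalized context.
tasks = ["A", "B", "C"]
payment = {"A": 1.0, "B": 1.0, "C": 1.0}
prior = {"c1": 0.5, "c2": 0.5}
car = {
    ("A", "c1"): 0.9, ("A", "c2"): 0.1,
    ("B", "c1"): 0.1, ("B", "c2"): 0.9,
    ("C", "c1"): 0.6, ("C", "c2"): 0.6,
}
alpha = 1.0

best_single = best_task_set(tasks, 1, prior, payment, car, alpha)
best_pair = best_task_set(tasks, 2, prior, payment, car, alpha)
picked = worker_choice(best_pair, payment, car, "c2")
```

In this toy instance the best single task is C, while the best pair is {A, B}, so the optimal sets for different values of L need not be nested.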

5.3 Algorithms for the Optimization Problem

In this section, we propose algorithms for the server and the worker to optimize their objectives efficiently. We first consider the specific scenario which optimizes the objective of utility as in (5-2) and then discuss how to jointly optimize utility and efficiency as in (5-4).

5.3.1 Approximation Algorithm for Optimizing the Utility

The optimization part for the worker can be easily solved since he only needs to choose a task among $L$ tasks, where $L$ is usually designed to be a small number. However, it is nontrivial for the server to select $L$ tasks from the entire task space $\mathcal{T}$. In fact, we have the following result:

Proposition 5.1. Given a generalized context $\hat{c}$, it is NP-hard to find a set of tasks $T^*$ such that

$$T^* = \arg\max_{T \subseteq \mathcal{T}: |T| = L} \sum_{c: c \to \hat{c}} \Pr[c|\hat{c}] \cdot \max_{t \in T} \alpha \rho_t \cdot \mathrm{CAR}(t|c). \tag{5-5}$$

Proof. The proof is based on the hardness of the maximum coverage problem. The maximum coverage problem is defined as follows: given a collection of sets $S = \{S_1, S_2, \ldots, S_m\}$ over some finite universe $U$ and a number $L$ as input, we need to select at most $L$ of these sets such that the union of the selected sets has the maximum cardinality. Consider the following setting of our task recommendation problem: the set of contexts that generalize to $\hat{c}$ equals the universe $U$. For each context $c \in U$, we set $\Pr[c|\hat{c}] = 1/|U|$. For each set $s$ in the collection $S$, we define a task $t$ with category $a$ such that $\mathrm{CAR}(t|c) = 1$ for all elements $c \in s$ and $0$ for all other elements. Moreover, we set the payment $\rho_t = 1, \forall t$, and the commission portion $\alpha = 1$. Then, for a given set $T$ of $L$ tasks, the expected commission is

$$\sum_{c: c \to \hat{c}} \frac{1}{|U|} \max_{t \in T} \mathrm{CAR}(t|c). \tag{5-6}$$

It can be viewed as the total number of elements covered by the corresponding sets in $S$ divided by the universe size. By using the above transformation, any instance of the maximum coverage problem can be reduced to an instance of our task set selection
problem in polynomial time. Therefore, the task recommendation problem is at least as hard as the maximum coverage problem. Since the maximum coverage problem is known to be NP-hard, our problem is also NP-hard.

Since the problem (5-5) is NP-hard, we propose a greedy algorithm as shown in Algorithm 5 below.

Algorithm 5. Greedy Algorithm for Profit Maximization
Input: $\mathcal{T}$, $\hat{c}$, $L$
Output: $T$
  // initialization
  $T \leftarrow \emptyset$;
  $F(T) \leftarrow \mathbb{E}[\mathrm{Commission}(T|\hat{c})]$;
  repeat
    $t^* \leftarrow \arg\max_{t \in \mathcal{T}} F(T \cup \{t\}) - F(T)$;
    $T \leftarrow T \cup \{t^*\}$;
  until $|T| = L$
  return $T$
end

By repeatedly choosing a task that maximizes the utility improvement, the greedy algorithm can be proved to approximate the optimal value within a factor of $1 - 1/e$. Note that in [153], a greedy algorithm that solves the maximum coverage problem provides the same approximation ratio. However, in their problem, a set either fully includes an element or does not include it at all, while in our problem a task can partially match the context, which complicates the problem and requires additional analysis. The proof of this approximation ratio for our approximation algorithm is given below.

Proposition 5.2. The greedy algorithm approximates the optimal solution within a factor of $1 - 1/e$.
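The greedy selection of Algorithm 5 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the data layout, the function names, and the toy numbers are hypothetical, the objective of the empty set is taken to be zero, and the candidate search is restricted to tasks not yet selected so that the set grows by one task per round.

```python
import math


def expected_commission(task_set, prior, payment, car, alpha):
    """Objective F(T) from (5-2); an empty set is assumed to earn nothing."""
    if not task_set:
        return 0.0
    return sum(
        p * max(alpha * payment[t] * car[(t, c)] for t in task_set)
        for c, p in prior.items()
    )


def greedy_select(all_tasks, L, objective):
    """Add, L times, the task with the largest marginal gain in the objective."""
    selected = []
    remaining = list(all_tasks)
    while len(selected) < L and remaining:
        base = objective(selected)
        best = max(remaining, key=lambda t: objective(selected + [t]) - base)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy instance with two equally likely exact contexts.
tasks = ["A", "B", "C"]
payment = {"A": 1.0, "B": 1.0, "C": 1.0}
prior = {"c1": 0.5, "c2": 0.5}
car = {
    ("A", "c1"): 0.9, ("A", "c2"): 0.1,
    ("B", "c1"): 0.1, ("B", "c2"): 0.9,
    ("C", "c1"): 0.6, ("C", "c2"): 0.6,
}


def objective(task_set):
    return expected_commission(task_set, prior, payment, car, alpha=1.0)


greedy_set = greedy_select(tasks, 2, objective)
greedy_value = objective(greedy_set)
optimal_value = objective(["A", "B"])  # best pair for this toy instance
```

Here the greedy algorithm picks C first, because it is the best single task, and therefore ends with a pair worth 0.75 instead of the optimal 0.9. The gap stays within the $1 - 1/e$ factor stated in Proposition 5.2.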

Proof. Define the marginal utility function of adding a set $T'$ to a set $T$ as

$$M(T, T') = \mathbb{E}[\mathrm{Commission}(T \cup T'|\hat{c})] - \mathbb{E}[\mathrm{Commission}(T|\hat{c})].$$

This function is submodular. Denote the tasks selected by the greedy algorithm as $\{t_1, t_2, \ldots, t_L\}$ and the optimal set of tasks as $\{t_1^*, t_2^*, \ldots, t_L^*\}$. Let $m_i$ denote the marginal gain when adding task $t_i$ and $m(l)$ denote the sum of the marginal gains of the first $l$ tasks, i.e., $m(l) = \sum_{i=1}^{l} m_i$. Then, $m_i^*$ and $m^*(l)$ are defined similarly on the optimal set $\{t_1^*, t_2^*, \ldots, t_L^*\}$. First, we prove that for each $1 \leq l \leq L$,

$$m_l \geq \frac{m^*(L) - m(l-1)}{L}. \tag{5-7}$$

Adding the set $\{t_1^*, t_2^*, \ldots, t_L^*\}$ to $\{t_1, t_2, \ldots, t_{l-1}\}$, we have

$$M(\{t_1, \ldots, t_{l-1}\}, \{t_1^*, \ldots, t_L^*\}) \geq m^*(L) - m(l-1).$$

Note that $M(\{t_1, \ldots, t_{l-1}\}, \{t_1^*, \ldots, t_L^*\})$ can be expressed as $\sum_{i=1}^{L} M(\{t_1, \ldots, t_{l-1}\} \cup \{t_1^*, \ldots, t_{i-1}^*\}, \{t_i^*\})$. By the averaging argument, there exists an $i$ such that

$$M(\{t_1, \ldots, t_{l-1}\} \cup \{t_1^*, \ldots, t_{i-1}^*\}, \{t_i^*\}) \geq \frac{m^*(L) - m(l-1)}{L}.$$

Therefore, by the property of submodularity, there exists an $i$ such that

$$M(\{t_1, \ldots, t_{l-1}\}, \{t_i^*\}) \geq \frac{m^*(L) - m(l-1)}{L}.$$

Since the task selected in the $l$-th round of the approximation algorithm is the one that maximizes the marginal utility $M(\{t_1, \ldots, t_{l-1}\}, \cdot)$, we have

$$M(\{t_1, \ldots, t_{l-1}\}, \{t_l\}) \geq M(\{t_1, \ldots, t_{l-1}\}, \{t_i^*\}) \geq \frac{m^*(L) - m(l-1)}{L}.$$

Recalling the definition of $m_l$, we obtain that

$$m_l \geq \frac{m^*(L) - m(l-1)}{L}.$$

Using (5-7), we can prove by induction that

$$m(l) \geq \left(1 - (1 - 1/L)^l\right) m^*(L), \quad \forall l \in \{1, 2, \ldots, L\}.$$

Setting $l = L$ in the above inequality, we obtain that

$$m(L) \geq \left(1 - (1 - 1/L)^L\right) m^*(L) \geq (1 - 1/e)\, m^*(L),$$

which completes the proof.

5.3.2 Approximation Algorithm for Jointly Optimizing the Utility and Efficiency

As mentioned before, there are alternative objectives for the optimization problem. We now discuss how we can jointly optimize the utility and efficiency in (5-4). As we show below, this is also an NP-hard problem.

Proposition 5.3. Given a generalized context $\hat{c}$, it is NP-hard to find a set of tasks $T^*$ such that

$$T^* = \arg\max_{T \subseteq \mathcal{T}: |T| = L} \sum_{c: c \to \hat{c}} \Pr[c|\hat{c}] \cdot \max_{t \in T} \alpha \rho_t \cdot \mathrm{CAR}(t|c) - \lambda L. \tag{5-8}$$

Proof. In Proposition 5.1, we have proved the NP-hardness of finding $L$ tasks that maximize the expected commission defined in (5-2). This is actually a special case of the new optimization problem (5-8). If one could solve the new optimization problem in polynomial time, and the resulting list of tasks were of size $L'$, then one could solve the optimization problem defined in Proposition 5.1 with $L = L'$ in polynomial time, which contradicts Proposition 5.1. Hence, the new optimization problem is also NP-hard.

Below, we describe Algorithm 6, which approximately solves the above optimization problem (5-8) in polynomial time, and give the analysis of the approximation ratio in Proposition 5.4.

Algorithm 6. Greedy Algorithm for Joint Utility and Efficiency Optimization
Input: $\mathcal{T}$, $\hat{c}$, $\lambda$, $L_{\max}$
Output: $T^*$
  // initialization
  $L \leftarrow 1$, $F^* \leftarrow 0$, $T^* \leftarrow \emptyset$;
  while $L \leq L_{\max}$ do
    $T \leftarrow \emptyset$;
    $F(T) \leftarrow \mathbb{E}[\mathrm{Commission}(T|\hat{c})] - \lambda L$;
    repeat
      $t^* \leftarrow \arg\max_{t \in \mathcal{T}} F(T \cup \{t\}) - F(T)$;
      $T \leftarrow T \cup \{t^*\}$;
    until $|T| = L$
    if $F^* < F(T)$ then
      $F^* \leftarrow F(T)$;
      $T^* \leftarrow T$;
    end if
    $L \leftarrow L + 1$;
  end while
  return $T^*$
end

Proposition 5.4. The greedy algorithm approximates the optimal solution within a factor of $1 - 1/e$.

Proof. Let $F^g$ denote the greedy solution of $F(T)$ in Algorithm 6 and $F^*$ denote the optimal solution of $F(T)$ in Algorithm 6, respectively. In the following, we will prove that

$$F^g \geq F^* (1 - 1/e). \tag{5-9}$$

For ease of presentation, let $F^g_L$ denote the greedy solution given by Algorithm 5 for a fixed $L$, and $F^*_L$ denote the optimal solution of $F(T)$ in Algorithm 5 for a fixed $L$, respectively. As shown in the proof of Proposition 5.2, we have

$$F^g_L \geq F^*_L (1 - 1/e). \tag{5-10}$$

Suppose that the optimal solution $F^*$ is achieved at $L = l'$. Then $F^*$ should also be the optimal solution for Algorithm 5 at $L = l'$, i.e., $F^*_{l'} = F^*$. Substituting $l'$ into (5-10), we have $F^g_{l'} \geq F^*_{l'} (1 - 1/e)$. According to the way that the greedy solution $F^g$ is chosen, we have $F^g \geq F^g_L$, $\forall L = 1, \ldots, L_{\max}$. Therefore, we have the following relationship:

$$F^g \geq F^g_{l'} \geq F^*_{l'} (1 - 1/e). \tag{5-11}$$

Since $F^* = F^*_{l'}$, we have $F^g \geq F^* (1 - 1/e)$, which completes the proof.

5.4 Privacy-Preserving Statistics Collection

In the previous sections, we have assumed that the server has prior information about worker statistics in the task selection component, such as $\Pr[c|\hat{c}]$ and $\mathrm{CAR}(t|c)$. In this section, we describe how to obtain these statistics while achieving the design goals described in Section 5.1.2.

5.4.1 Problem Overview

There are three parties in the offline statistics collection component: the MC server, workers, and a semi-honest third party (proxy). The server makes statistics queries and collects the results. Workers locally store their historical contexts as well as performance records, and answer queries. The proxy plays a mediation role between the server and the workers in order to guarantee worker privacy. The idea of using a semi-honest proxy to ensure distributed differential privacy has been used previously in different applications [108, 148, 154]. Here, we use it for privacy-preserving statistics collection in our framework.

Threat Model

The server is assumed to be potentially malicious in the sense that it intends to violate worker privacy. The server may attempt to use the statistics collection protocol to learn private information about workers, or deploy its own workers and manipulate their answers. Moreover, the server may also publish its collected worker statistics.

Workers are also assumed to be potentially malicious in the sense that they may distort the final statistics learned by the server by submitting false or illegitimate answers.

The proxy is assumed to be semi-honest or "honest-but-curious", which means that it will faithfully follow the specified protocol, but may attempt to exploit additional information learned in executing the protocol. The proxy does not collude with other parties. In practice, as suggested in [154], the server may pay the proxy to execute the statistics collection protocol. Such a proxy has been used in previous papers [155, 156], and this kind of relationship between the proxy and the MC server already exists in industry today and usually does not lead to collusion. For example, pharmaceutical companies pay independent organizations that evaluate the safety, quality, or performance of their products and may give results unfavorable to the pharmaceutical companies. Therefore, we believe that it is reasonable to have such a semi-honest proxy in our protocol.

Note that some distributed differential privacy designs [34, 36, 147] do not require such a proxy. However, they have a high computation or communication cost, which is impractical in most applications. With the semi-honest proxy, we can provide a differential privacy guarantee in a distributed setting with much higher efficiency.

Assumptions

We assume that workers have correct public keys for the server and the proxy, that the server and the proxy have correct public keys for each other, and that all the corresponding private keys are securely kept. We also assume secure, reliable, and authenticated communication channels among the server, the proxy, and workers.


Workers are assumed to be dynamic, which means that they may quit in the middle of the statistics collection process due to an unstable wireless connection or power saving. Moreover, the computation and communication resources of worker devices are assumed to be limited.

Privacy Definition

Our framework allows workers to choose whether to participate in statistics collection, and protects the privacy of participating workers. We consider the privacy risk for participating workers from two aspects. We first guarantee that no other party, except the worker himself, would know his private information during statistics collection. This can be achieved through data encryption as shown in [83, 157, 158]. Moreover, we also consider privacy leakage that cannot be solved by data encryption. A potential privacy leakage is due to multiple runs of statistics collection when a worker does not participate in all runs, e.g., because he has reached home. Hence, we should protect every worker from an adversary (with arbitrary background knowledge) who tries to trace or de-anonymize a user between several runs of the statistics collection protocol. To this end, we adopt the privacy notion of $(\epsilon, \delta)$-differential privacy [36], which ensures that the result of our protocol does not significantly change with the presence or absence of a single worker. The formal definition of $(\epsilon, \delta)$-differential privacy is given as follows [36]:

Definition 6. A statistics collection algorithm $F$ satisfies $(\epsilon, \delta)$-differential privacy if, for all datasets $D_1$ and $D_2$ differing on one record, and for all outputs $O \subseteq \mathrm{range}(F)$, the following inequality holds:

$$\Pr[F(D_1) \in O] \le \exp(\epsilon) \Pr[F(D_2) \in O] + \delta. \tag{5-12}$$

In other words, if the statistics are differentially private, the adversary would not significantly change his probabilistic belief about an individual even with knowledge of the published statistics and any other auxiliary information.
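Inequality (5-12) can be checked numerically for a simple counting mechanism. The sketch below is illustrative only: it assumes the noise is a sum of n unbiased coins (the Binomial noise used by the counting procedure later in this chapter), and the function name `min_delta` is hypothetical. It relies on the standard fact that, for discrete outputs, the worst-case set O collects every outcome where one probability exceeds exp(ε) times the other.

```python
from math import comb, exp


def min_delta(n, eps):
    """Smallest delta for which a count perturbed by Binomial(n, 1/2) noise
    satisfies inequality (5-12) when the true count differs by one."""
    pmf = [comb(n, k) / 2 ** n for k in range(n + 1)]
    # Align both output distributions on a common support: the second
    # dataset has a true count that is larger by one, so its outputs
    # are shifted by one position.
    p1 = pmf + [0.0]
    p2 = [0.0] + pmf
    # Worst-case output set: all outcomes where one probability exceeds
    # exp(eps) times the other. Both directions are checked because the
    # definition must hold for every ordered pair of neighboring datasets.
    gap12 = sum(max(0.0, a - exp(eps) * b) for a, b in zip(p1, p2))
    gap21 = sum(max(0.0, b - exp(eps) * a) for a, b in zip(p1, p2))
    return max(gap12, gap21)


if __name__ == "__main__":
    for eps in (0.1, 0.5, 1.0):
        print(eps, min_delta(200, eps))
```

For a fixed number of coins, a larger ε tolerates a larger probability ratio and therefore needs a smaller δ, which matches the reading of ε as the parameter that weakens the guarantee as it grows.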


The strong guarantee provided by differential privacy is not free. Privacy is enforced by adding noise to the outcome of the algorithm. There are two privacy parameters, $\epsilon$ and $\delta$. The first parameter $\epsilon$ bounds the ratio of the output distributions on inputs $D_1$ and $D_2$, with a higher $\epsilon$ representing a weaker privacy guarantee. The latter parameter $\delta$ relaxes the relative shift at events that are unlikely to happen.

5.4.2 Computation of Worker Statistics Based on Counting

Based on a differentially private counting procedure, the statistics collection protocol gathers responses from workers and transforms the responses into the statistics $\Pr[c \mid \hat{c}]$ and $\mathrm{CAR}(t \mid c)$. We first describe how these statistics can be computed over contexts with a counting procedure. We will give the details of the counting procedure in Section 5.4.3.

Calculating $\Pr[c \mid \hat{c}]$. The statistic $\Pr[c \mid \hat{c}]$ is calculated as the number of workers with context $c$ divided by the number of workers with generalized context $\hat{c}$. Hence, the MC server should count the numbers of workers with context $c$ and generalized context $\hat{c}$, respectively. To this end, the MC server constructs a statistics query which asks two questions: (1) "Is your private context $c$?" and (2) "Is your generalized context $\hat{c}$?". Both questions expect binary answers: "yes" (represented by 1) or "no" (represented by 0). The answer from each worker $k$ is a vector $(b_{1k}, b_{2k})$ that consists of two bits, each corresponding to a question. An example of the answer vector is shown in Fig. 5-4. Therefore, given a privacy-preserving counting procedure, we can aggregate answers to these two questions from workers, and calculate $\Pr[c \mid \hat{c}]$ in a privacy-preserving manner.

Calculating $\mathrm{CAR}(t \mid c)$. The statistic $\mathrm{CAR}(t \mid c)$, as defined in Section 5.2.1, is calculated as the total number of workers with context $c$ who have completed task $t$ divided by the total number of workers with context $c$. The MC server also generates a query that consists of two questions: (1) "Is your context $c$?" and (2) "If your context is $c$, have you successfully completed task $t$?". The answer to these two questions is contained in a two-bit vector as well. If the context of the worker is not $c$, the answer of the worker would be $[0, 0]$; if the context of the worker is $c$ and he has successfully completed the task, his answer would be $[1, 1]$; if the context of the worker is $c$ but he does not complete the task, his answer would be $[0, 1]$. Here, the first bit of the answer vector indicates that the worker satisfies both context $c$ and completion of task $t$, while the second bit indicates whether the worker's context is $c$, as shown in Fig. 5-4. Similarly, we can compute $\mathrm{CAR}(t \mid c)$ using a counting procedure over answers to these questions from workers.

Figure 5-4. Illustration of the answer vector for worker k.

In practice, the MC server first sets $M$ and $\epsilon$, where $M$ indicates the number of workers that need to be queried and $\epsilon$ is the privacy budget that controls the amount of noise. The queries and the parameters $M$ and $\epsilon$ are then broadcast to workers, whose answers will be added bit by bit, as shown in Fig. 5-5, by a privacy-preserving counting procedure explained in the next section.

Figure 5-5. Aggregation process of answer vectors from M workers.

5.4.3 Distributed Differentially-Private Counting Procedure

We now describe our differentially private counting procedure, which is the key part of the statistics collection protocol as explained before. The counting procedure takes answers from workers as the input data, and outputs a noisy sum with a differential privacy guarantee, i.e., a sum that does not significantly change with the presence or absence of a single worker. It is trivial to achieve this privacy goal in a centralized setting, where the data owner fully possesses the data and can easily add noise to the sum before revealing it to others. However, in our distributed setting, where the data are owned by the workers themselves, it is non-trivial to add the noise to the distributed data. There are a few works which provide differential privacy in a distributed setting [34, 36, 147]. However, they either have a high computation cost on each user [36] or require users to be online during the whole computation process [34, 147], rendering them impractical for a large-scale setting such as our scenario.

To ensure the scalability of our protocol, we employ a semi-honest proxy to achieve differential privacy in the distributed setting. The proxy will aggregate answers from workers and add noise to the sum; however, it is unable to learn the value of the answers or their sum. Moreover, since the final statistics are open to the public, the proxy must not learn the added noise. Otherwise, it could subtract the noise from the noisy sum and obtain the accurate count. This requires that the proxy can only "blindly" add the differentially private noise. As depicted in Fig. 5-6, the counting protocol works as follows:

Step 1: Depending on the type of statistics it wants to calculate, the server formulates a query request and specifies the number of queried workers $M$ and the privacy parameter $\epsilon$ for this query.


Figure 5-6. Schematics of our privacy-preserving counting process.

Step 2: The proxy has access to a worker list which records the historical general contexts $\hat{c}$ of workers, their current privacy loss, and the timestamps when they last connected to the proxy. For a query with a specific context $c$, the proxy sends it to $M$ workers from the worker list whose historical general contexts may cover context $c$ and whose privacy loss has not exceeded a predefined threshold. After workers return their answers, the proxy will add $\epsilon$ and $\delta$ to the privacy loss of the queried workers in the worker list.

Step 3: After a worker receives the query, he constructs an answer vector as described in Section 5.4.2. The worker needs to prevent the proxy from learning his answer, so he encrypts his answer with the public key of the MC server and sends the ciphertext to the proxy. For instance, if the answer is $(0, 1)$, the encrypted answer would be $e(0) \,\|\, e(1)$, where $e(\cdot)$ is the encryption operation under the public key of the MC server. For ease of presentation, we call the encrypted binary bit $e(\cdot)$ a coin. We will explain the cryptosystem used at this step in Section 5.4.4.


Step 4: The proxy aggregates the coins into buckets, with each bucket corresponding to a bit in the answer vector. The proxy also adds binomial noise to each bucket based on the privacy budget, where the noise consists of coins with random binary values to be specified in Section 5.4.4. For example, if the amount of noise is $n$, then $n$ coins are added to each bucket.

Step 5: The proxy forwards the buckets to the MC server.

Step 6: After the MC server receives the buckets for a certain query, it first decrypts all the coins in each bucket with its private key and sums up the decrypted values in each bucket. The final results are calculated as $\sum_k b_{1k} - n/2$ and $\sum_k b_{2k} - n/2$, where $n/2$ is used to cancel the added noise. Since the MC server cannot tell who constructs the coins, the identities of workers are anonymized.

Note that when workers communicate with the proxy, standard cryptographic protocols such as TLS are used. Therefore, only the proxy can learn the content of messages sent from workers. Similarly, standard cryptographic protocols are used during the communication between the proxy and the MC server.

5.4.4 Coin Generation and Noise Addition

In this section, we give the details about coin generation, that is, encryption of binary data. We use the Goldwasser-Micali (GM) cryptosystem [159] due to its high efficiency in encrypting binary values and its XOR-homomorphic property. For completeness, we briefly describe the GM encryption process as follows.

Key Generation: Let $N = pq$, where $p$ and $q$ are two large prime numbers independent of each other. Choose a non-residue $x$ such that the Legendre symbols satisfy $\left(\frac{x}{p}\right) = \left(\frac{x}{q}\right) = -1$, and hence the Jacobi symbol $\left(\frac{x}{N}\right) = +1$. The public key is $(N, x)$, and the private key is $(p, q)$.

Encryption: Let $b$ be the message we want to encrypt. Choose a non-zero random number $r \in \mathbb{Z}_N^*$. The ciphertext $e$ is given by

$$e \equiv r^2 x^b \pmod{N}. \tag{5-13}$$

Decryption: Given the ciphertext $e$, the receiver uses the prime factorization $(p, q)$ to check whether $e$ is a quadratic residue. In order to do this, the receiver first computes $e_p = e \pmod{p}$ and $e_q = e \pmod{q}$. If both $e_p^{(p-1)/2} \equiv 1 \pmod{p}$ and $e_q^{(q-1)/2} \equiv 1 \pmod{q}$ hold, then $e$ is a quadratic residue. The receiver sets $b = 0$ if $e$ is a quadratic residue, and sets $b = 1$ otherwise.

Differentially Private Noise

The amount of noise required to achieve $(\epsilon, \delta)$-differential privacy is calculated in [36] and described as follows.

Proposition 5.5. Let $n$ be the number of unbiased coins added in a bucket, i.e., the amount of Binomial noise. The statistics collection algorithm achieves $(\epsilon, \delta)$-differential privacy if

$$n \ge \frac{64 \ln(2/\delta)}{\epsilon^2}. \tag{5-14}$$

Proof. Let $a$ denote the accurate output before perturbation, i.e., $a = \sum_{b \in D} b$, so the statistics collection algorithm $F(D)$ outputs $f = a + \mathrm{noise}$. If two datasets $D_1$ and $D_2$ differ in one row, $a$ changes by at most 1. Bounding the ratio of probabilities that $f$ occurs is equivalent to bounding the ratio of probabilities that $\mathrm{noise} = y$ and $\mathrm{noise} = y + 1$ for a range of possible values of $y$. We first compute the largest $y$ that satisfies $\Pr[\mathrm{noise} = y] / \Pr[\mathrm{noise} = y + 1] \le \exp(\epsilon)$, and then find the parameter of the Binomial distribution that limits the probability of a privacy breach. The probability at $n/2 + y$ for a Binomial distribution is

$$\Pr[n/2 + y] = \binom{n}{n/2 + y} \left(\frac{1}{2}\right)^n. \tag{5-15}$$

Thus, the ratio of probabilities at two consecutive values is

$$\frac{\Pr[n/2 + y]}{\Pr[n/2 + y + 1]} = \frac{n/2 + y + 1}{n/2 - y}. \tag{5-16}$$

As long as $y$ is less than $\epsilon n / 8$, this ratio is bounded by $1 + \epsilon$. Since $1 + \epsilon \le \exp(\epsilon)$, the ratio is also bounded by $\exp(\epsilon)$.


When $y$ exceeds $\epsilon n / 8$, we want to bound the tail of the distribution function. The upper bound of the upper tail can be derived using Chernoff's inequality, i.e.,

$$\Pr[y > n/2 + \epsilon n/8] \le \exp\left(-\frac{\epsilon^2 n}{64}\right). \tag{5-17}$$

To achieve $(\epsilon, \delta)$-differential privacy, this upper bound should be less than $\delta/2$. Hence, $n$ should be at least $64 \ln(2/\delta) / \epsilon^2$.

The parameters $\delta$ and $M$ are selected by the server. Suppose that any query of each person is sensitive; then $\delta > 1/M$ indicates the disclosure of at least one person's privacy. Therefore, $\delta$ is selected to be smaller than $1/M$. With this constraint, the amount of noise added into each bucket should satisfy

$$n \ge \frac{64 \ln(2M)}{\epsilon^2}. \tag{5-18}$$

Blind Noise Generation

Note that the proxy should collaborate with workers to generate unbiased and blind coins. If the proxy generates the unbiased coins by itself, it would know the accurate value of the noise. When the server publishes the obfuscated statistics later, the proxy could recover the accurate statistics. On the other hand, if the workers are trusted to generate the noise coins, they may intentionally distort the final statistics by generating biased noise coins. To address this issue, following a flipping approach proposed in [154], we let the workers generate $n$ coins first, which are then flipped by the semi-honest proxy. This is made possible by the XOR-homomorphic property of GM encryption, where the result of the XOR operation on plaintexts can be obtained by decrypting the product of the ciphertexts. Note that for any $b, b' \in \{0, 1\}$, we have

$$e(b) \, e(b') \equiv e(b \oplus b') \pmod{N}, \tag{5-19}$$

where $e(\cdot)$ is the encryption operator. With this homomorphic property, two parties can collaboratively generate an encrypted value of either 0 or 1 while no single party can know or control the final result. As long as one of the two parties is unbiased, the final result is unbiased. Hence, multiplying a coin generated by workers and an unbiased coin generated by the proxy always results in a flipped coin that is both unbiased and hidden from the proxy. In this way, we can generate a pool of unbiased coins for noise addition.

5.4.5 Analysis of Design Goals

We have listed three design goals in Section 5.1.2: privacy, scalability, and robustness. In the following, we analyze how we achieve these goals in the proposed protocol.

Privacy. Our protocol ensures differential privacy for all workers. Whenever a worker participates in the statistics collection procedure, he reveals some information about himself. This kind of privacy loss is quantified by the privacy budget [36, 160]. The privacy loss is accumulated across queries until it surpasses the worker's privacy budget. Then the worker stops contributing any data to the statistics collection procedure. This provides the best privacy for the worker. However, it influences the lifetime of our protocol, because after all existing workers reach their privacy budgets, the statistics can only be learned from new workers. Note that in our framework, the database is adaptive for the following reasons. First, worker context may change over time. Second, the set of workers that answer the same query would change. If a worker stops contributing his data, the influence of his data on the final result will decrease over time. With such observations, we can treat the privacy loss as a worst-case measurement. In practice, the proxy maintains a worker list which records the latest login time of each worker. If a worker does not contribute data for a long time, his privacy loss can be set back to 0.

Scalability. To achieve the scalability goal, we should first ensure a low per-worker computation cost, so that even when the number of workers is large, the cost for an individual worker does not change much. In our protocol, the cost per worker is $O(1)$. When the number of workers is large, it is hard to ensure that all workers are online during the process. Hence, we should ensure that the statistics can be computed even when some workers leave in the middle of the computation process. In our protocol, workers only need to submit answers once, and no further communication is required after that. Therefore, our protocol allows workers to leave after they submit their answers.

Robustness. With GM encryption, we are able to bound the error introduced by malicious workers for the following reasons. First, the data encrypted by GM encryption is guaranteed to be either 0 or 1. This can be done by checking the Jacobi symbols of ciphertexts at the proxy (the Jacobi symbol of a legitimate ciphertext is "+1") or by checking the decrypted value at the MC server. Second, each worker can only add a single coin to each bucket. Therefore, a malicious worker would be unable to distort the final sum by more than 1. If an adversary tries to substantially distort the final sum, it has to employ a large number of malicious workers, which is difficult in practice. If 1% of workers are malicious, the error introduced by malicious workers would be less than 1%.

5.4.6 Practical Considerations

Utility of the Counting Result. When the percentage of malicious workers is small, the main factor that influences the utility of the result is the differentially private noise. In our protocol, the proxy adds $n$ noisy coins to the final result, and the server adjusts the noise by subtracting the average noise $n/2$ from the noisy count. The actual noise introduced in this process follows a binomial distribution, which can be approximated by the normal distribution $\mathcal{N}(0, n/4)$. We can then conclude from the "three-sigma rule of thumb" that the statistics calculated by the server will quite probably lie within three standard deviations of the mean, with the standard deviation equal to $\sqrt{n}/2$. For example, when the number of queried workers is 100,000 and the privacy parameter $\epsilon$ is 1.0, from (5-18), the probability that the noisy answer deviates from the actual answer by no more than 38 is 99.7%, and the probability that the noisy answer deviates from the actual answer by no more than 24 is 95%. Hence, the utility of the aggregated answer remains high even with a conservative differential privacy guarantee.

5.5 Performance Evaluation

To evaluate the performance of the proposed optimization algorithms, we generate a synthetic dataset to simulate the statistics $\Pr(c)$ and $\mathrm{CAR}(t \mid c)$. Without loss of generality, we assume the frequency of worker contexts is uniformly distributed. The dataset includes 2048 exact contexts and 10000 different tasks. The detailed contexts can be generalized at four different levels. There are 512 level-1 generalized contexts denoted as "G1", 128 level-2 generalized contexts denoted as "G2", 8 level-3 generalized contexts denoted as "G3", and 2 level-4 generalized contexts denoted as "G4". The statistic $\mathrm{CAR}(t \mid c)$ is generated in such a way that the closer two exact contexts are, the more similar the distributions of $\mathrm{CAR}(t \mid c)$ would be. The $\mathrm{CAR}(t \mid c)$ of a task $t$ for workers with the same exact context $c$ follows a uniform distribution. The payments of tasks are set as a random value between 0 and 10, and the ratio of commission is chosen to be 0.1.

Figure 5-7. Expected commissions of recommendation algorithms with different context information.


Firstly, we test the effectiveness of the task recommendation model. To this end, we compare our proposed algorithm (Algorithm 5) with two baseline algorithms, "baseline 1" and "baseline 2". The first baseline algorithm uses the exact worker context as the input. With the exact worker context, the algorithm directly chooses the task that maximizes the commission gained by the MC platform. Worker privacy is compromised in this algorithm in exchange for utility and efficiency. On the contrary, in the second baseline algorithm, no context information is used, and therefore worker privacy is maximized. This algorithm does not consider the difference of worker contexts and recommends the tasks that have the highest payments.

Fig. 5-7 shows the expected commission of the MC platform as the size of the recommended task set $L$ is adjusted. We run the experiments using six different algorithms, including the two baseline algorithms and Algorithm 5 with four different levels of generalized contexts. Intuitively, the two baseline algorithms serve as an upper bound and a lower bound of the other algorithms, respectively, which is clearly shown in the figure. The expected commission of Algorithm 5 increases when more context information is used. For a specific level of generalization, the expected commission increases with $L$. For example, when the generalization level is 3 (which corresponds to "G3" in the figure), the commission increases from 0.986 to 0.991 as $L$ increases from 2 to 10. Note that the performances of the two baseline algorithms do not change with $L$ because they always select the task that maximizes the expected commission regardless of $L$.

Figure 5-8. Expected commission of Algorithm 5 with varying efficiency and privacy.

Fig. 5-8 illustrates the trade-offs among utility, efficiency, and privacy. We can observe that the deficiency in privacy can be compensated by increasing the size of the recommendation set $L$. This aligns with the fact that with a larger $L$, a worker receives more tasks and is more likely to get a suitable task. Moreover, when $L$ is large, there are already enough tasks in the recommendation set. Thus, the marginal improvement in commission becomes smaller, which means that increasing $L$ would not improve the utility much after $L$ approaches a reasonably large number. We can also see the impact of varying privacy levels on the expected commission. It is expected that as the requirement for privacy increases, the difference between the commissions obtained by using the exact context and the generalized context would increase, which can be observed from the figure. Since the differences are always within a small range, we conclude that the expected commission of our privacy-preserving approach is close to that of the privacy-oblivious one.

Figure 5-9. Performance of our approximation Algorithm 5.

Secondly, we evaluate the performance of the proposed approximation algorithms. Due to the NP-hardness of the original optimization problem, the optimal solution becomes intractable in practice when either $L$ or the task space is large. Therefore, we use a reduced dataset for this experiment (i.e., 100 tasks and $L = 1, 2, 3$). Fig. 5-9 compares the performances of Algorithm 5 and the optimal algorithm. We see that there is little difference between the two algorithms for $L = 1, 2, 3$ and $|T| = 100$. The difference between the two algorithms may grow as $L$ becomes larger, but we have proved in previous sections that our approximation algorithm has an approximation ratio of $1 - 1/e$.

Figure 5-10. Optimizing the weighted sum of utility and efficiency for Algorithm 6.

Thirdly, we show the performance of Algorithm 6, which jointly optimizes utility and efficiency. Fig. 5-10 plots the weighted sum of utility and efficiency with the weight coefficient ranging from 0.005 to 0.1. For each value of the weight coefficient, the x-axis represents the level of context generalization, and the y-axis represents the weighted sum of utility and efficiency. As in Fig. 5-7, the weighted sum decreases as the level of generalization increases, which shows a clear trade-off between utility and privacy. As the weight coefficient increases, the optimized weighted sum decreases. This is reasonable because it is shown in (5-4) that for the same list of recommended tasks, the weighted sum decreases as the weight coefficient increases. As a result, the optimal weighted sum is expected to decrease as well.

Figure 5-11. Trade-off between $\epsilon$ and accuracy of statistics.

Lastly, we analyze the privacy and accuracy of the statistics collection module. Since noise needs to be added to provide $(\epsilon, \delta)$-differential privacy, privacy is achieved at the cost of accuracy. We illustrate the trade-off between the privacy parameter $\epsilon$ and the accuracy of the statistics in Fig. 5-11. We can see from the figure that, as the total number of queried workers $M$ increases, higher accuracy is obtained. To achieve an acceptable accuracy of, for example, 0.8, the value of $\epsilon$ should be within the range of $(0.1, 1)$ for most $M$. We need to keep such a trade-off between accuracy and privacy in mind when we choose the privacy parameter.

Table 5-1. Running time (in seconds) of recommending L tasks from a set of 100 tasks (L = 2 or 3)

      |       G1        |       G2        |       G3        |       G4
  L   | Optimal  Greedy | Optimal  Greedy | Optimal  Greedy | Optimal  Greedy
  2   |   13.0     8.1  |   13.3     2.0  |   14.0     0.1  |   24.9    0.04
  3   | 1125.2    15.7  | 1267.4     3.8  | 1275.6     0.3  | 1838.1    0.10

Figure 5-12. Running time of recommending tasks for Algorithm 5.

5.6 System Overhead

In this section, we analyze the system overhead of the proposed framework, including both the task selection and statistics collection components.
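The statistics collection component spends its time on three Goldwasser-Micali operations from Section 5.4.4: encrypting a bit into a coin, multiplying two coins (homomorphic XOR), and decrypting a coin. The sketch below illustrates these operations together with the blind coin flipping and the noise size from (5-18). It is a minimal illustration, not the implementation evaluated in this chapter: the toy primes 1019 and 1031 are placeholders with no security, and all function names are hypothetical.

```python
import math
import random


def jacobi(a, n):
    """Jacobi symbol (a/n) for odd n > 0."""
    a %= n
    result = 1
    while a:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0


def keygen(p=1019, q=1031):
    """Toy key generation; p and q are tiny illustrative primes."""
    x = 2
    # Find x that is a non-residue modulo both p and q (Euler's criterion).
    while pow(x, (p - 1) // 2, p) != p - 1 or pow(x, (q - 1) // 2, q) != q - 1:
        x += 1
    return (p * q, x), (p, q)


def encrypt(pub, bit, rng=random):
    """Coin for one bit: r^2 * x^bit mod N, as in (5-13)."""
    n, x = pub
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = rng.randrange(1, n)
    return (r * r * pow(x, bit, n)) % n


def decrypt(priv, coin):
    """Return 0 if the coin is a quadratic residue mod p and q, else 1."""
    p, q = priv
    residue = (pow(coin % p, (p - 1) // 2, p) == 1
               and pow(coin % q, (q - 1) // 2, q) == 1)
    return 0 if residue else 1


def xor_coins(pub, coin_a, coin_b):
    """Homomorphic XOR: multiply the two ciphertexts modulo N, as in (5-19)."""
    return (coin_a * coin_b) % pub[0]


def noise_size(m, eps):
    """Smallest integer n satisfying (5-18) for M queried workers."""
    return math.ceil(64 * math.log(2 * m) / eps ** 2)


def noisy_count(pub, priv, bits, n_noise, rng=random):
    """One bucket: worker coins plus n_noise blindly flipped noise coins."""
    bucket = [encrypt(pub, b, rng) for b in bits]
    for _ in range(n_noise):
        worker_coin = encrypt(pub, rng.randrange(2), rng)  # may be biased
        proxy_coin = encrypt(pub, rng.randrange(2), rng)   # unbiased flip
        bucket.append(xor_coins(pub, worker_coin, proxy_coin))
    # Server side: decrypt every coin, sum, and subtract the mean noise n/2.
    return sum(decrypt(priv, c) for c in bucket) - n_noise / 2


if __name__ == "__main__":
    pub, priv = keygen()
    answers = [1] * 30 + [0] * 70
    n = noise_size(len(answers), 5)
    print(n, noisy_count(pub, priv, answers, n))
```

In this sketch the proxy only ever multiplies ciphertexts, so it never sees a plaintext bit other than its own flip, and the server's estimate differs from the true count by at most n/2.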


We list the estimated running time for the task selection component in Table 5-1. Since the time for the optimal algorithm grows exponentially as $L$ grows, we only run this algorithm at $L = 2$ or 3 with a small dataset where the number of tasks is 100. We can see that the time to get an optimal solution grows rapidly with $L$, while the time for the proposed greedy algorithm is linear with respect to $L$.

Fig. 5-12 further illustrates the running time for Algorithm 5. The running time depends on both the size of the recommendation set $L$ and the level of context generalization. Each line in the figure represents a level of context generalization. We can clearly see that for a fixed level of context generalization, the running time increases almost linearly with respect to $L$, which is important for practical systems. In fact, we can also infer from the figure that the running time increases linearly with respect to the number of generalized contexts (the numbers of generalized contexts for "G1", "G2", "G3", and "G4" are 512, 128, 8, and 2, respectively).

In the following, we analyze the computation, storage, and communication overhead for statistics collection. Firstly, we analyze the computation overhead of the GM cryptosystem. With a 1024-bit key length, a smartphone running Android 2.2 with a 1 GHz processor can execute more than 800 encryptions within one second [154]. Since workers only need to execute the encryption process once for each query request, the computation cost is negligible for them. The proxy is implemented with Apache Tomcat 6.0.33, which can execute more than 15,000 GM encryptions or 123,000 homomorphic XORs per second, and the server is implemented with Java source code, which can execute more than 6000 GM decryptions per second. Consider a normal setting where there are 5000 workers with 100 different exact contexts which generalize to the same generalized context, and there are 90 tasks relevant to this generalized context. Suppose 10% of the workers participate in the statistics collection process. Then the proxy needs to execute 18 encryptions and 18 homomorphic XORs for a single statistic query when the privacy parameter $\epsilon$ is set to 5, according to (5-18). In order to calculate all the statistics needed for the task selection model, the proxy needs to execute $18 \times 27200$ encryptions and $18 \times 27200$ homomorphic XORs, which take 31 seconds and 4 seconds, respectively. For the same setting, the server needs to decrypt a total of $(500 + 18) \times 27200$ coins, which takes 36 minutes. Note that the statistics can be calculated offline and are reusable among workers with similar contexts. By contrast, if the protocol is implemented with the Paillier system, in order to calculate statistics for a task selection model, it takes the mobile worker, the proxy, and the server 4 seconds, 139 minutes, and 4500 minutes, respectively. Therefore, the GM cryptosystem used in our framework is highly efficient.

Next, we discuss the storage and communication bandwidth requirements. Since a worker transmits no more than 3 coins for each statistics collection query, plus a periodically generated coin for noise addition, the storage requirement for him is on the order of kB. Considering that workers can selectively respond to the requests, the storage overhead is quite acceptable. Supposing the coins should be sent out within one second, the bandwidth requirement would be around 1 kB/s. As for the proxy, since it needs to store all queried coins and noise coins before sending them to the server, which is about $518 \times 27200$ coins in total in the above setting, the storage overhead would be about 1.7 GB. Since the statistics collection process is performed beforehand, we assume the maximum transmission time is 30 minutes. Therefore, the bandwidth for sending these data is 1 MB/s. Note that although the storage requirement for computing a statistic is not small, in practice the statistic only needs to be computed once and is then updated at a low frequency. The overheads for the proxy to update the statistics are of the same order as the overhead for workers.

5.7 Related Work

In this section, we review some works in the literature related to our problem.

Privacy issues in mobile applications. Previous works on privacy issues of mobile applications mainly focus on location privacy in location-based services, and they use either obfuscation to hide true locations [104, 105] or aggregation to hide individual sensitive information [103]. However, none of them discusses how to recommend tasks in the absence of accurate private information. In this chapter, we consider the fundamental trade-offs among privacy, utility, and efficiency, and provide a flexible framework to select tasks at different trade-off points.

Privacy issues in statistics computation. There are multiple approaches addressing privacy issues in statistics computation, such as k-anonymity and ℓ-diversity. k-anonymity [161] ensures that each individual is indistinguishable from at least $k - 1$ other individuals, and thus cannot be uniquely identified. ℓ-diversity [162] requires that there are at least ℓ well-represented values for every sensitive attribute. However, these two privacy notions can only guarantee syntactic properties of the released data, and cannot protect against an adversary with certain background knowledge. On the other hand, differential privacy [37] makes no assumption about the adversary and is a very strong guarantee. However, the traditional notion of differential privacy is designed for centralized databases and cannot be directly used in a distributed setting. A few approaches have been proposed to provide differential privacy over distributed data [34, 36, 146, 147], but they are impractical for a large-scale setting. For example, the computation cost per user in [36] is $O(N)$, where $N$ is the number of users, which becomes prohibitive for a large population of users. The computation cost is reduced to $O(1)$ in [34, 147], but they use an expensive secret sharing protocol that is not scalable to a large group of workers. Additionally, the solution in [148] uses two servers to collaboratively calculate statistics and handle user dynamics. However, a single malicious user could greatly distort the querying result in this approach. Our approach provides a practical solution for computing differentially private statistics over distributed data that is both scalable and robust against malicious workers.

Task assignment in crowdsourcing. There are a few works on task recommendation for web-based crowdsourcing applications. Ho and Vaughan [163] address the scenario where heterogeneous tasks are assigned to workers with unknown skill sets with an exploration-exploitation trade-off. Yuen et al. [164] utilize performance history and task search history to model user preference and recommend tasks for a user based on his/her preference. Ambati et al. [165] implicitly model user skills and interests, and recommend tasks based on user preference. However, these works have not addressed the specific privacy concerns in MC scenarios where tasks should be recommended to workers based on private, sensitive information.


CHAPTER6CONCLUSIONANDFUTUREPLANAdvancesincomputingandelectroniccommunicationtechnologieshaveenabledubiquitouscollectionofhighvolume,velocity,and/orvarietyinformationassets,leadingtotheeraofbigdata.Bigdatahasdriveninnovation,productivity,eciency,andgrowthinmanydomains,creatingenormousbenetsfortheglobaleconomy.However,ndingwaystoutilizethedataforinnovationwhileprotectingdataprivacyisatrulymulti-disciplinarychallengethatrequiresthejointeortofmultipleentitieswithcompetinginterests.Toachievethisweidentifyexamplesofprivacyviolations,proposeprivacy-preservingschemes,andanalyzethetrade-osbetweenutilityandprivacyinbigdataapplications.Inthisdissertation,wehaveproposedaprivacy-preservingschemeforlearningalogisticregressionmodelbasedondistributedbiomedicalsensingdata.OurschemeenablesmHealthuserstocontroltheirrawdataandonlysharenecessaryintermediateresultsduringthetrainingprocess.Wehavefurtherprovidedasolutiontoprotectingtheprivateinformationofintermediateresultsduringtheaggregationprocess.Experimentalresultsonreal-worlddatasetsshowthattheproposedapproachesconvergequicklyandprovideperformancecloselytotheoptimalresult.Ourschemeshavelowcomputationaloverheadforeachuserevenwhenthenumberofusersislarge,andarethuspracticalformHealthmonitoringscenarios.Wehavefocusedonthelogisticregressionprobleminthispaper.However,ourschememaybegeneralizedtootherclassicationproblems(e.g.,support-vectormachine)inmHealthapplicationswhichconstitutesourfuturework.Wehavealsoidentiedandaddressedtheuniqueprivacyissuesinincentive-baseddemandresponse(IDR)programs.Specically,wehaveproposedaschemewhichprovidesne-grainedmeteringdatatothedemandresponseprovider(DRP)forbasicoperations,ensuringdataintegritythroughoutalltheprocesses.Theschemeprotectscustomerprivacybyseparatingtherealidentityandthene-grainedmeteringdata,i.e.,theDRPcanonlylearneithertherealidentityorthene-grainedmeteringdataatatimebut 147

PAGE 148

cannot link them together. When re-identification is required, the linkage between real identity and metering data can be easily restored. Hence, our scheme provides an integrated solution for privacy-aware IDR programs, which promotes their acceptance.

In ad hoc mobile cloud computing, it is challenging to ensure service quality and location privacy at the same time. To address this challenge, we have proposed a framework that protects the location privacy of mobile servers when allocating mobile cloud computing tasks. Considering the dynamic and diverse nature of mobile servers, we have designed a new data structure, R-PSD, and developed an efficient search strategy that finds appropriate R-PSD partitions to ensure high service quality. We have conducted extensive experiments based on real-world datasets to demonstrate the effectiveness of our proposed framework.

We have also analyzed the trade-offs among privacy, utility, and efficiency in mobile crowdsourcing, and proposed a task recommendation framework that recommends mobile crowdsourcing tasks without violating worker privacy. The proposed framework comprises two components: a task selection component and a statistics collection component. In the task selection component, we have developed a privacy-aware optimization model of task selection that considers the intrinsic trade-offs among utility, privacy, and efficiency and selects tasks based on limited information about worker context. Workers choose how much private information they are willing to share with the server. In the statistics collection component, we have proposed a protocol that gathers necessary statistics about worker contexts while guaranteeing differential privacy. We have evaluated the effectiveness and efficiency of the proposed framework and analyzed the system overhead.

User privacy is now the subject of heated discussion. With the rapid development of big data applications, the privacy of individuals becomes harder to protect, and the question of how to balance privacy against its cost in terms of utility, security, and efficiency will only grow in importance. In the short term, I plan to focus on privacy in medical data such as biomedical sensing data and genomic data. In parallel, I will build the groundwork for a long-term research agenda, in which I plan to extend my focus to ensuring security and privacy in cyber-physical systems and to utilizing big data analytics to solve security and privacy issues in data-intensive applications.

Genomic Privacy. Despite the massive influx of data created by rapid advances in genomic technologies and the increasing collection of clinical data, we have a very incomplete picture of disease at the system level. Although the impact of genomics on cancer has already been dramatic and has engendered targeted therapies, there is still a long way to go before big health data can be used for proactive, predictive, and personalized healthcare. With personalized healthcare, diagnosis and treatment would be tailored to the genetic profile of individuals, enabling early diagnosis of serious disease and reducing the probability of adverse outcomes.

Cyber-Physical Systems Security and Privacy. Cyber-physical systems (CPS) are engineered systems that seamlessly integrate information and communication technologies and distributed physical components. CPS offer numerous benefits, such as great efficiency and fast responsiveness, and are widely used in critical infrastructures. However, the increased complexity and connectivity accompanying CPS make them more vulnerable to cybersecurity threats, which could place the Nation's security, economy, public safety, and health at great risk. Existing security approaches, either cyber or system-theoretic, cannot provide perfect solutions against these threats due to the tight coupling between information and communication technologies and physical systems. We need to redesign system models, attack models, and system requirements to gain a complete picture of CPS. In addition, cyber and system-theoretic countermeasures each have their own drawbacks: current system-theoretic methods cannot deter any attack until it acts on the physical system, whereas cyber countermeasures alone are insufficient to guarantee the security of the smart grid. To overcome these drawbacks and provide a higher level of security for CPS, I plan to design countermeasures that integrate both cyber and system-theoretic approaches.

Providing Security and Privacy through Big Data Analytics. Although big data may exacerbate security and privacy issues, the technologies it drives may also provide solutions to them. For example, a common challenge in data-intensive applications is real-time security/compliance monitoring. Current solutions for real-time security monitoring generate too many false-positive alerts, most of which are ignored or "clicked away." Big data technologies allow fast processing and analytics over massive amounts and various types of data, which could fundamentally improve real-time security analytics and thus provide an opportunity for faster and better decisions in security/compliance monitoring. In the long term, I am interested in applying big data technologies to defend against security and privacy attacks in data-intensive applications, solving practical security and privacy issues such as anomaly detection, anomaly analysis, prediction of the consequences of an attack, and detection of anomalous retrieval of personal information.
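To make the last of these goals concrete, the following is a minimal sketch of how anomalous retrieval of personal information might be flagged. It is only an illustration under stated assumptions, not a method developed in this dissertation: the function name `flag_anomalous_access`, the per-user access counts, and the monitoring window are all hypothetical, and the detector is the standard modified z-score based on the median absolute deviation, with the commonly used constant 0.6745 and a rule-of-thumb threshold of 3.5.

```python
from statistics import median


def flag_anomalous_access(counts, threshold=3.5):
    """Return the user IDs whose retrieval count is unusually high.

    counts: dict mapping a (hypothetical) user ID to the number of personal
    records that user retrieved during one monitoring window.
    threshold: cutoff on the modified z-score; 3.5 is a common rule of thumb.
    """
    values = list(counts.values())
    med = median(values)
    # Median absolute deviation: a spread estimate that a few extreme
    # values cannot inflate, unlike the standard deviation.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # At least half of the users have identical counts, so the score
        # is undefined; this sketch then flags nobody.
        return []
    flagged = []
    for user, value in counts.items():
        # Modified z-score; only the high side is checked because the
        # concern here is excessive retrieval.
        score = 0.6745 * (value - med) / mad
        if score > threshold:
            flagged.append(user)
    return flagged


if __name__ == "__main__":
    window = {"alice": 12, "bob": 9, "carol": 11, "dave": 10, "eve": 240}
    print(flag_anomalous_access(window))
```

With the sample window above, only "eve" is flagged. A median-based score is used because a single heavy user would inflate a mean and standard deviation enough to mask itself; a production monitor would also need per-role baselines and streaming updates, which this sketch leaves out.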

BIOGRAPHICAL SKETCH

Yanmin Gong received the Bachelor of Engineering degree in electronics and information engineering from Huazhong University of Science and Technology, China, in 2009, the Master of Engineering degree in electrical engineering from Tsinghua University, China, in 2012, and the Doctor of Philosophy degree in electrical and computer engineering from the University of Florida in the summer of 2016. She is currently an assistant professor at the School of Electrical and Computer Engineering, Oklahoma State University. Her research interests include security and privacy in big data applications, with emphasis on healthcare, mobile systems, and energy.