Efficient Implementation of Multi-Dimensional Co-Clustering

Material Information

Title:
Efficient Implementation of Multi-Dimensional Co-Clustering
Physical Description:
1 online resource (43 p.)
Language:
english
Creator:
Gao, Xiaoyang
Publisher:
University of Florida
Place of Publication:
Gainesville, Fla.
Publication Date:
2011

Thesis/Dissertation Information

Degree:
Master's (M.S.)
Degree Grantor:
University of Florida
Degree Disciplines:
Computer Engineering, Computer and Information Science and Engineering
Committee Chair:
Ranka, Sanjay
Committee Members:
Chen, Shigang
Helmy, Ahmed H.

Subjects

Subjects / Keywords:
Computer and Information Science and Engineering -- Dissertations, Academic -- UF
Genre:
Computer Engineering thesis, M.S.
bibliography (marcgt)
theses (marcgt)
government publication (state, provincial, territorial, dependent) (marcgt)
born-digital (sobekcm)
Electronic Thesis or Dissertation

Notes

Abstract:
Co-clustering is an important data mining operation that can automatically cluster along two or more dimensions. Most of the work in the literature focuses on co-clustering in two dimensions. In this thesis, we develop extensions of ITCC (Information Theoretical Co-Clustering) for multi-dimensional data. We first extend the approach to more than two dimensions. We then develop parallel algorithms for the resulting approach. Our experimental results show that our algorithms and implementations scale well to large datasets on both sequential and parallel machines. The Multi-Dimensional ITCC has been used to analyze multi-dimensional wireless data records and uncover hidden models of user activity.
General Note:
In the series University of Florida Digital Collections.
General Note:
Includes vita.
Bibliography:
Includes bibliographical references.
Source of Description:
Description based on online resource; title from PDF title page.
Source of Description:
This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Statement of Responsibility:
by Xiaoyang Gao.
Thesis:
Thesis (M.S.)--University of Florida, 2011.
Local:
Adviser: Ranka, Sanjay.

Record Information

Source Institution:
UFRGP
Rights Management:
Applicable rights reserved.
Classification:
lcc - LD1780 2011
System ID:
UFE0043454:00001



Full Text

EFFICIENT IMPLEMENTATION OF MULTI-DIMENSIONAL CO-CLUSTERING

By

XIAOYANG GAO

A THESIS PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE

UNIVERSITY OF FLORIDA

2011

© 2011 Xiaoyang Gao

To my Mom and Dad, and everyone that helped me finish this thesis

ACKNOWLEDGMENTS

Of the many people who have been enormously helpful in the preparation of this thesis, I am especially and heartily thankful to my supervisor, Dr. Sanjay Ranka. This thesis could not have been written without him; he not only served as my supervisor but also encouraged and challenged me throughout my academic program. His patience in answering my various questions and his instructive guidance taught me useful methods for analyzing and solving a problem, and a good attitude toward finishing a job well.

I would like to warmly acknowledge Dr. Ahmed Helmy for his support and guidance in the interesting project whose real wireless network data helped greatly in finishing this thesis; I always learned something from my meetings with him. Also, I would like to thank Dr. Shigang Chen for his instruction, which stimulated my interest in computer networks and gave me a good mastery of networking knowledge, as well as for his input on my thesis defense. In addition, special thanks to Saeed Moghaddam and Clint P. George for their necessary support in analyzing the co-clustering results.

Most especially, I thank my family and all my friends. Their consideration, motivation, and encouragement enabled me to complete this thesis.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER
1 INTRODUCTION
2 MULTI-DIMENSIONAL INFORMATION THEORETICAL CO-CLUSTERING
3 DATA REPRESENTATION
  3.1 Original Data Cube
    3.1.1 Dense Cube
    3.1.2 Sparse Cube
  3.2 Clustered Data Cube
  3.3 Marginal Distribution
4 SERIAL IMPLEMENTATION
  4.1 Preprocessing
  4.2 Serial Implementation and Optimization for Dense Cube
  4.3 Serial Implementation and Optimization for Sparse Cube
5 PARALLEL IMPLEMENTATION
  5.1 Parallel Implementation and Optimization for Dense Cube
    5.1.1 Parallel Reduction on CUDA Platform
    5.1.2 Multi-Item Calculation on CUDA Platform
    5.1.3 Optimization
  5.2 Parallel Implementation and Optimization for Sparse Cube
6 EXPERIMENTS
  6.1 Data Set, Environment and Measurement Details
  6.2 Performance and Discussion
  6.3 Co-Clustering Algorithm Results
7 CONCLUSION

REFERENCES
BIOGRAPHICAL SKETCH

LIST OF TABLES

5-1 Example of sorting indexes in threads and blocks
5-2 Example of shared memory
6-1 Information on datasets

LIST OF FIGURES

2-1 Multi-dimensional information theoretic co-clustering algorithm
3-1 Storage for dense cube in memory
3-2 Storage for sparse cube in memory
3-3 Storage for jagged 2D array in memory
4-1 Flow of implementation
4-2 Procedure of computing distances in iteration on sparse cube
5-1 Tree-based reduction [2]
5-2 Example of 2D thread mapping for marginal distribution computation
5-3 Reduction of repeated communication between host and device
5-4 Mapping of threads in parallel implementation for sparse cube
6-1 Trends of the loss of mutual information
6-2 Comparison of loss of mutual information
6-3 Performance results
6-4 Data points distribution before and after co-clustering
6-5 Co-clustering result of domain names of 2D dataset

Abstract of Thesis Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Master of Science

EFFICIENT IMPLEMENTATION OF MULTI-DIMENSIONAL CO-CLUSTERING

By

Xiaoyang Gao

August 2011

Chair: Sanjay Ranka
Major: Computer Engineering

Co-clustering is an important data mining operation that can automatically cluster along two or more dimensions. Most of the work in the literature focuses on co-clustering in two dimensions. In this thesis, we develop extensions of ITCC (Information Theoretical Co-Clustering) for multi-dimensional data. We first extend the approach to more than two dimensions. We then develop parallel algorithms for the resulting approach. Our experimental results show that our algorithms and implementations scale well to large datasets on both sequential and parallel machines. The Multi-Dimensional ITCC has been used to analyze multi-dimensional wireless data records and uncover hidden models of user activity.

CHAPTER 1
INTRODUCTION

In the era of data explosion, large amounts of data are generated every day. However, the utility of the data fails to keep up with its growing volume, and much of the knowledge hidden in it goes undiscovered. Clustering is a fundamental tool in data mining. It automatically groups similar objects into clusters in an unsupervised way, helping people discover knowledge that can hardly be found by observation based on common sense or current knowledge.

Data in the real world almost always has more than one attribute, and clustering along only one dimension cannot discover knowledge that involves all the attributes. Co-clustering addresses this by automatically and simultaneously clustering the data along two or more dimensions. This technique is already widely used in many areas, including text, web-log, bioinformatics, and wireless network data analysis and modeling.

Researchers have tried different measures of similarity between objects to analyze data from different aspects, and various co-clustering algorithms have been presented in the literature for different applications. Most of them focus on two-dimensional data. However, there is great demand for clustering data in multiple dimensions, since real-world data often has more than two dimensions. For example, traffic records from a wireless network have attributes such as users, domains, time, and locations. It is therefore often desirable to co-cluster on all of them and discover knowledge across all the dimensions. As the number of data dimensions increases, efficient implementations of high-dimensional co-clustering algorithms are also needed in practice to handle the huge amount of real-world data.

Generally speaking, we can treat multi-dimensional data as a contingency table. Information theory provides a principled quantity, mutual information, to measure the mutual dependence of random variables, and thereby a good way to measure whether a co-clustering is optimal. Based on mutual information, Dhillon et al. [1] presented the Information Theoretical Co-Clustering (ITCC) algorithm, an efficient co-clustering algorithm. It treats the optimal co-clustering as the one that preserves the largest mutual information between the clustered random variables; equivalently, it minimizes the difference between the mutual information of the original random variables and that of the clustered ones. For 2D data, it intertwines row and column clustering at all stages. The algorithm has been proved to monotonically decrease the difference in mutual information between the clustered variables and the corresponding original variables, finally reaching a locally optimal co-clustering determined by the initialization of the cluster assignments. Fortunately, according to the literature [1], ITCC extends to co-clustering on multi-dimensional data without introducing much cost in efficiency. In this thesis, we use the Multi-Dimensional Information Theoretical Co-Clustering (MDITCC) algorithm in our implementation.

Due to the large scale of high-dimensional data, it is necessary to construct a more efficient implementation of the algorithm to make co-clustering faster without losing precision. Parallelizing the algorithm is an ideal way to improve performance and capacity. NVIDIA provides CUDA (Compute Unified Device Architecture) [3], a parallel computing architecture that enables dramatic increases in computing performance by harnessing the power of the GPU (Graphics Processing Unit) for general-purpose computing. CUDA successfully decreases the cost per GFLOPS and provides a developer-friendly environment for constructing parallel programs. All of this makes CUDA an ideal platform for parallelizing the Multi-Dimensional ITCC. We develop parallel algorithms based on Multi-Dimensional ITCC on the NVIDIA CUDA platform to improve the efficiency and throughput of our implementation.

In this thesis, we present a novel and efficient implementation of the Multi-Dimensional Co-Clustering algorithm, which is based on Information Theoretical Co-Clustering and performs efficiently on large multi-dimensional data, especially sparse and high-dimensional data. We first describe the Multi-Dimensional ITCC and prove its key formulas in multiple dimensions. Then we present the new data representations used to store the various data involved in the computation, including the original data, clustered data, marginal distributions, and some auxiliary data; separate data structures for sparse and dense data are presented for different applications. Next come the optimized serial implementations for sparse and dense data, as well as the parallel ones on the CUDA platform. We also run experiments that show the performance improvement our implementation provides, along with the co-clustering results. We demonstrate that our implementation works correctly and efficiently on large-scale high-dimensional data by presenting the results of co-clustering high-dimensional wireless network data. The results also show that the parallel implementation gives an obvious performance improvement, especially on large-scale data.

CHAPTER 2
MULTI-DIMENSIONAL INFORMATION THEORETICAL CO-CLUSTERING

The Information Theoretical Co-Clustering algorithm is described for two-dimensional data, but it is easily extended to multi-dimensional space, as mentioned in the original literature [1]. To outline the approach of Multi-Dimensional ITCC, we first prove the key formulas of ITCC in multi-dimensional space and then describe the algorithm itself.

In multi-dimensional space, we assume the variables in each dimension are independent of each other and treat the input data as a multi-dimensional contingency table. The key is to express the loss of mutual information in multi-dimensional space, so a new formulation of this loss for a multi-dimensional contingency table is necessary. Based on the above assumptions, we can write the loss of mutual information in multi-dimensional space as follows.

Lemma 1. For a fixed co-clustering $(C_{D_1}, C_{D_2}, \ldots, C_{D_n})$, we can write the loss of mutual information as

$$I(D_1; D_2; \ldots; D_n) - I(\hat{D}_1; \hat{D}_2; \ldots; \hat{D}_n) = D\big(p(D_1, D_2, \ldots, D_n) \,\|\, q(D_1, D_2, \ldots, D_n)\big) \tag{2-1}$$

where $D(\cdot \| \cdot)$ denotes the Kullback-Leibler (KL) divergence, also known as relative entropy, and $q(D_1, D_2, \ldots, D_n)$ is the distribution of the form

$$q(d_1, d_2, \ldots, d_n) = p(\hat{d}_1, \hat{d}_2, \ldots, \hat{d}_n) \prod_{i=1}^{n} p(d_i \mid \hat{d}_i). \tag{2-2}$$

Proof of Lemma 1. Using

$$p(\hat{d}_1, \hat{d}_2, \ldots, \hat{d}_n) = \sum_{d_1 \in \hat{d}_1} \sum_{d_2 \in \hat{d}_2} \cdots \sum_{d_n \in \hat{d}_n} p(d_1, d_2, \ldots, d_n),$$

we have

$$\begin{aligned}
&I(D_1; \ldots; D_n) - I(\hat{D}_1; \ldots; \hat{D}_n) \\
&= \sum_{\hat{d}_1, \ldots, \hat{d}_n} \sum_{d_1 \in \hat{d}_1} \cdots \sum_{d_n \in \hat{d}_n} p(d_1, \ldots, d_n) \log \frac{p(d_1, \ldots, d_n)}{p(d_1) \cdots p(d_n)} \\
&\quad - \sum_{\hat{d}_1, \ldots, \hat{d}_n} \Big( \sum_{d_1 \in \hat{d}_1} \cdots \sum_{d_n \in \hat{d}_n} p(d_1, \ldots, d_n) \Big) \log \frac{p(\hat{d}_1, \ldots, \hat{d}_n)}{p(\hat{d}_1) \cdots p(\hat{d}_n)} \\
&= \sum_{\hat{d}_1, \ldots, \hat{d}_n} \sum_{d_1 \in \hat{d}_1} \cdots \sum_{d_n \in \hat{d}_n} p(d_1, \ldots, d_n) \log \frac{p(d_1, \ldots, d_n)}{p(\hat{d}_1, \ldots, \hat{d}_n) \, \frac{p(d_1)}{p(\hat{d}_1)} \cdots \frac{p(d_n)}{p(\hat{d}_n)}} \\
&= \sum_{\hat{d}_1, \ldots, \hat{d}_n} \sum_{d_1 \in \hat{d}_1} \cdots \sum_{d_n \in \hat{d}_n} p(d_1, \ldots, d_n) \log \frac{p(d_1, \ldots, d_n)}{q(d_1, \ldots, d_n)},
\end{aligned}$$

which is exactly $D(p \,\|\, q)$.

Some simple but useful equalities between $p$ and $q$, which highlight properties of $q$ that make it a desirable approximation of $p$, are also presented.

Proposition 2.1.

$$q(\hat{d}_1, \hat{d}_2, \ldots, \hat{d}_n) = p(\hat{d}_1, \hat{d}_2, \ldots, \hat{d}_n), \qquad q(d_i, \hat{d}_i) = p(d_i, \hat{d}_i) \tag{2-3}$$
$$q(d_i) = p(d_i), \qquad q(\hat{d}_i) = p(\hat{d}_i) \tag{2-4}$$
$$p(d_i \mid \hat{d}_i) = q(d_i \mid \hat{d}_i) \tag{2-5}$$
$$p(\hat{d}_1, \ldots, \hat{d}_{i-1}, \hat{d}_{i+1}, \ldots, \hat{d}_n \mid \hat{d}_i) = q(\hat{d}_1, \ldots, \hat{d}_{i-1}, \hat{d}_{i+1}, \ldots, \hat{d}_n \mid \hat{d}_i) \tag{2-6}$$

for all $d_i, \hat{d}_i$, $1 \le i \le n$. Further, if $\hat{d}_i = C_{D_i}(d_i)$, then

$$q(d_1, \ldots, d_{i-1}, d_{i+1}, \ldots, d_n \mid \hat{d}_i) = q(\hat{d}_1, \ldots, \hat{d}_{i-1}, \hat{d}_{i+1}, \ldots, \hat{d}_n \mid \hat{d}_i) \prod_{k \ne i} q(d_k \mid \hat{d}_k). \tag{2-7}$$

Proof of Proposition 2.1. Equations 2-3 through 2-6 are simple to show and follow from Equation 2-2. Equation 2-7 follows from

$$\begin{aligned}
q(d_1, \ldots, d_{i-1}, d_{i+1}, \ldots, d_n \mid \hat{d}_i)
&= q(d_1, \ldots, d_{i-1}, d_{i+1}, \ldots, d_n, \hat{d}_1, \ldots, \hat{d}_{i-1}, \hat{d}_{i+1}, \ldots, \hat{d}_n \mid \hat{d}_i) \\
&= \frac{q(d_1, \ldots, d_{i-1}, d_{i+1}, \ldots, d_n, \hat{d}_1, \ldots, \hat{d}_n)}{q(\hat{d}_i)} \\
&= \frac{\sum_{d_i \in \hat{d}_i} p(\hat{d}_1, \ldots, \hat{d}_n) \prod_{k=1}^{n} p(d_k \mid \hat{d}_k)}{q(\hat{d}_i)} \\
&= \frac{q(\hat{d}_1, \ldots, \hat{d}_{i-1}, \hat{d}_{i+1}, \ldots, \hat{d}_n)}{q(\hat{d}_i)} \prod_{k \ne i} q(d_k \mid \hat{d}_k) \\
&= q(\hat{d}_1, \ldots, \hat{d}_{i-1}, \hat{d}_{i+1}, \ldots, \hat{d}_n \mid \hat{d}_i) \prod_{k \ne i} q(d_k \mid \hat{d}_k).
\end{aligned}$$

Algorithm Co_Clustering$(n, p, l_1, l_2, \ldots, l_n, C_{D_1}, C_{D_2}, \ldots, C_{D_n})$
Input: the joint probability distribution $p(D_1, D_2, \ldots, D_n)$; $l_1, l_2, \ldots, l_n$, the desired number of clusters in each dimension.
Output: the partition functions $C_{D_1}, C_{D_2}, \ldots, C_{D_n}$.

1. Initialization: set $t = 0$. Start with some initial partition functions $C^{(0)}_{D_1}, C^{(0)}_{D_2}, \ldots, C^{(0)}_{D_n}$. Compute $q^{(0)}(\hat{D}_1, \hat{D}_2, \ldots, \hat{D}_n)$, $q^{(0)}(D_1 \mid \hat{D}_1), q^{(0)}(D_2 \mid \hat{D}_2), \ldots, q^{(0)}(D_n \mid \hat{D}_n)$, and the distributions $q^{(0)}(D_2, D_3, \ldots, D_n \mid \hat{d}_1)$, $1 \le \hat{d}_1 \le l_1$.

2. Iterate on each dimension $k$ from 1 to $n$:
   (a) Compute the dimension-$k$ clusters: for each $d_k$, find its new cluster index as
   $$C^{(t+k)}_{D_k}(d_k) = \arg\min_{\hat{d}_k} D\big(p(D_1, \ldots, D_{k-1}, D_{k+1}, \ldots, D_n \mid d_k) \,\|\, q(D_1, \ldots, D_{k-1}, D_{k+1}, \ldots, D_n \mid \hat{d}_k)\big),$$
   resolving ties arbitrarily. Let $C^{(t+k)}_{D_j} = C^{(t+k-1)}_{D_j}$ for $j \ne k$.
   (b) Compute the distributions $q^{(t+k)}(\hat{D}_1, \ldots, \hat{D}_n)$, $q^{(t+k)}(D_1 \mid \hat{D}_1), \ldots, q^{(t+k)}(D_n \mid \hat{D}_n)$, and the distributions $q^{(t+k)}(D_1, \ldots, D_{k-1}, D_{k+1}, \ldots, D_n \mid \hat{d}_k)$, $1 \le \hat{d}_k \le l_k$.

3. Stop and return $C_{D_1} = C^{(t+n)}_{D_1}, \ldots, C_{D_n} = C^{(t+n)}_{D_n}$ if the change in the objective function value, that is,
   $$D\big(p(D_1, \ldots, D_n) \,\|\, q^{(t)}(D_1, \ldots, D_n)\big) - D\big(p(D_1, \ldots, D_n) \,\|\, q^{(t+n)}(D_1, \ldots, D_n)\big),$$
   is small. Else set $t = t + n$ and go to step 2.

Figure 2-1. Multi-dimensional information theoretic co-clustering algorithm

We can now describe Multi-Dimensional ITCC, shown in Figure 2-1. The algorithm starts with an initial cluster assignment for every element in each dimension. Depending on the initialization, the algorithm converges to a local minimum of the loss of mutual information; it cannot guarantee a global minimum.
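To make step 2(a) concrete, the following C++ sketch shows what one reassignment pass over a single dimension might look like once the conditional distributions have been materialized as flat arrays. The function names and the dense representation are illustrative simplifications under that assumption, not the exact code of our implementation.

#include <cmath>
#include <limits>
#include <vector>

// KL divergence D(p || q) over a flattened conditional distribution.
// Zero p-entries contribute nothing; q is assumed positive where p > 0.
static double kl_divergence(const std::vector<double>& p,
                            const std::vector<double>& q) {
    double d = 0.0;
    for (size_t i = 0; i < p.size(); ++i)
        if (p[i] > 0.0) d += p[i] * std::log(p[i] / q[i]);
    return d;
}

// One reassignment pass (step 2a) for dimension k:
// pCond[d] = p(D_1,...,D_{k-1},D_{k+1},...,D_n | d_k = d), flattened;
// qCond[c] = q(D_1,...,D_{k-1},D_{k+1},...,D_n | cluster c), flattened.
// Returns the new cluster index for every element d of dimension k.
std::vector<int> reassign_dimension(
        const std::vector<std::vector<double>>& pCond,
        const std::vector<std::vector<double>>& qCond) {
    std::vector<int> assignment(pCond.size(), 0);
    for (size_t d = 0; d < pCond.size(); ++d) {
        double best = std::numeric_limits<double>::infinity();
        for (size_t c = 0; c < qCond.size(); ++c) {
            double dist = kl_divergence(pCond[d], qCond[c]);
            if (dist < best) { best = dist; assignment[d] = (int)c; }
        }
    }
    return assignment;
}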

CHAPTER 3
DATA REPRESENTATION

Data representation plays an important role in the implementation. We treat the multi-dimensional data as a data cube. Analysis of the algorithm shows that the original data cube, the clustered data cube, and the marginal distributions of the data cubes are the three most important types of data. The data structures used in the implementation should generically support data in any number of dimensions. They are also designed for efficient access, low space overhead, and parallel-communication friendliness, meaning the data can be extracted into a one-dimensional array with minimal time and space overhead, since most parallel communication operations work best on arrays of basic types. The following sections describe the data structures used for the original data cube, the clustered data cube, and the marginal distributions in detail.

3.1 Original Data Cube

Based on the proportion of zeros in the cube, original data cubes can be divided into two types. One is the sparse cube, which is populated primarily with zeros. The other is the dense cube, in which the majority of elements are non-zero. For these two types of cubes, two different data structures are designed.

3.1.1 Dense Cube

The data structure for the dense cube uses a one-dimensional array to store all the elements in the cube. The elements are stored sequentially from the logically first element to the last one. To access one element in the cube, we use a converter that turns the multi-dimensional indexes into the index of this one-dimensional array, as well as another converter for the opposite conversion. The complexity of accessing an element is O(k), where k is the number of dimensions. Since k is always small, in most cases less than 10, we can treat the time complexity of accessing one element as a constant. Figure 3-1 shows an example of 2D data stored in the dense cube structure.

Figure 3-1. Storage for dense cube in memory
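The two converters can be realized as in the following sketch, which assumes the row-major layout of Figure 3-1; the struct and its names are illustrative rather than the exact code of the implementation.

#include <vector>

// Row-major index converters for a dense k-dimensional cube.
struct DenseCubeIndexer {
    std::vector<int> dims;  // number of elements in each dimension

    // Multi-dimensional indexes -> offset into the 1D storage array. O(k).
    long long toLinear(const std::vector<int>& idx) const {
        long long lin = 0;
        for (size_t i = 0; i < dims.size(); ++i)
            lin = lin * dims[i] + idx[i];
        return lin;
    }

    // Offset into the 1D storage array -> multi-dimensional indexes. O(k).
    std::vector<int> fromLinear(long long lin) const {
        std::vector<int> idx(dims.size());
        for (size_t i = dims.size(); i-- > 0; ) {
            idx[i] = (int)(lin % dims[i]);
            lin /= dims[i];
        }
        return idx;
    }
};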

3.1.2 Sparse Cube

One of the most common data structures for storing a sparse cube is the coordinate list, in which each record contains the multi-dimensional coordinate of an element and its value. The coordinate list includes all the non-zero elements in the cube. An interesting characteristic of the original data cube is that it is unnecessary to visit the elements in any specific sequence; visiting them in any order is acceptable. At the same time, we do not need to perform any update operations on the cube. These reasons make the coordinate list format an ideal way to represent the sparse data.

Again, one-dimensional arrays are used for the storage. Specifically, suppose the number of non-zero elements is n and the number of dimensions is k. An array of n x k entries stores the coordinates of all the elements, with each run of k consecutive entries representing the coordinate of one element. Another array of n entries stores the values of the elements. The elements appear in the same order in both arrays: the i-th run of k consecutive entries in the first array is the coordinate of the i-th element, while the i-th entry in the second array is its value. It is worth noting that no random-access interface is provided for the sparse cube, mainly because the algorithm has no such demand.

Figure 3-2. Storage for sparse cube in memory

Figure 3-2 shows an example of 2D data stored in the sparse cube structure. The time complexity of accessing all the elements in the cube is O(l), where l is the number of non-zero elements in the sparse cube. In a sparse cube, the number of non-zeros is much smaller than in a dense cube.
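A minimal sketch of this coordinate-list layout, with the two flat arrays described above (names are illustrative), might look like:

#include <vector>

// Coordinate-list (COO) storage for a sparse k-dimensional cube:
// coords holds n * k dimension indexes, one k-tuple per non-zero element,
// and values holds the n corresponding counts/probabilities.
struct SparseCube {
    int k;                       // number of dimensions
    std::vector<int> coords;     // length n * k
    std::vector<double> values;  // length n

    int numNonZeros() const { return (int)values.size(); }

    // Dimension index j of the i-th non-zero element.
    int coordOf(int i, int j) const { return coords[(long long)i * k + j]; }
};

// Typical traversal: every non-zero is visited once, in any order, O(l).
double totalMass(const SparseCube& c) {
    double sum = 0.0;
    for (int i = 0; i < c.numNonZeros(); ++i) sum += c.values[i];
    return sum;
}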

3.2 Clustered Data Cube

Analysis of the algorithm reveals some characteristics of the clustered data cube that are helpful for designing its data structure:

- The clustered cube is always dense;
- The data in the clustered cube changes frequently during runtime;
- The elements are always randomly accessed during the execution;
- The size of the clustered cube is always small enough that even storing all of its elements does not consume much memory.

The dense cube structure described in the Original Data Cube section satisfies all the demands above. Therefore, we use the dense cube for the representation of the clustered data cube.

3.3 Marginal Distribution

The marginal distributions are frequently accessed during the execution. Because the numbers of elements in different dimensions vary, the most space-efficient structure for storing them is a jagged two-dimensional array, indexed by the dimension number in the first dimension and by the element's index within that dimension in the second. In our implementation, we prefer one-dimensional arrays, so we use a flat array for the marginal distributions. This one-dimensional array stores all the elements of the jagged array sequentially, from the elements of the first dimension to those of the last. For fast access to a specific element, instead of computing an element's position by adding up the lengths of all preceding dimensions, the starting offset of each dimension is stored in an auxiliary array. It is worth noting that the same structure is also used for the cluster assignments, which store the cluster index of each element in each dimension. Figure 3-3 shows an example of 4D marginal distributions stored in this data structure.

Figure 3-3. Storage for jagged 2D array in memory
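One possible realization of this flattened jagged array, assuming an auxiliary offsets array as described above (names are illustrative), is:

#include <vector>

// Jagged 2D array flattened into one 1D array, as used for marginal
// distributions and cluster assignments: data stores the per-dimension
// rows back to back, and offsets[d] gives where dimension d begins.
struct FlatJaggedArray {
    std::vector<double> data;
    std::vector<int> offsets;   // offsets.size() == numDims + 1

    // Build from the number of elements in each dimension.
    explicit FlatJaggedArray(const std::vector<int>& dimSizes) {
        offsets.assign(1, 0);
        for (int s : dimSizes) offsets.push_back(offsets.back() + s);
        data.assign(offsets.back(), 0.0);
    }

    // O(1) access to element e of dimension d, no per-access summation.
    double& at(int d, int e) { return data[offsets[d] + e]; }
};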

CHAPTER 4
SERIAL IMPLEMENTATION

The serial implementation provides the basic program structure for the algorithm. For the different types of data, a sparse version and a dense version are implemented and optimized separately. The serial implementation also provides the fundamental workflow for the parallel implementation, which mainly focuses on parallelizing the computation-intensive parts of the algorithm. Figure 4-1 shows the flow structure of the whole program.

We can divide the computations in the algorithm into several small basic operations:

- Calculating the clustered data cube;
- Calculating the marginal distributions of a cube (both the original one and the clustered one);
- Calculating the distance between one element and the corresponding cluster, $D\big(p(D_1, \ldots, D_{k-1}, D_{k+1}, \ldots, D_n \mid d_k) \,\|\, q(D_1, \ldots, D_{k-1}, D_{k+1}, \ldots, D_n \mid \hat{d}_k)\big)$, and the corresponding $q^{(t+k)}(d_1, \ldots, d_{k-1}, d_{k+1}, \ldots, d_n \mid \hat{d}_k)$, $1 \le \hat{d}_k \le l_k$;
- Finding the minimum of all the distances, $C^{(t+k)}_{D_k}(d_k) = \arg\min_{\hat{d}_k} D\big(p(D_1, \ldots, D_{k-1}, D_{k+1}, \ldots, D_n \mid d_k) \,\|\, q(D_1, \ldots, D_{k-1}, D_{k+1}, \ldots, D_n \mid \hat{d}_k)\big)$.

These basic operations are the key parts of the implementation. The following sections first introduce the preprocessing stage of the program, which converts the input data into a form friendlier to these operations while saving time and space. Then the implementation of the computation-intensive operations is discussed separately for the dense form and the sparse form. Some implementation-level optimizations are also provided to achieve better performance.

Figure 4-1. Flow of implementation

4.1 Preprocessing

Preprocessing is indispensable for making the implementation work correctly and efficiently. In the input data, the elements of a dimension might be of any type besides integers. Even when they are integers, the values may be spread discretely over a large range, which would create huge unused regions in multi-dimensional storage and more redundant computation.

Preprocessing counts the number of elements in every dimension and assigns each element a new, sequential ID in its dimension. In this way, elements with no non-zero records in any dimension are eliminated; such elements would otherwise hurt the performance of the algorithm and waste limited memory. Repeated records, which have the same attributes, are aggregated.
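A sketch of this ID remapping for one dimension might look as follows, assuming the raw attribute values arrive as strings; the helper is hypothetical and omits the aggregation of repeated records.

#include <string>
#include <unordered_map>
#include <vector>

// Assigns each distinct raw attribute value a new sequential ID, so that
// elements with no non-zero records never occupy space in the cube.
struct IdRemapper {
    std::unordered_map<std::string, int> toId;
    std::vector<std::string> toValue;   // inverse mapping, for reporting

    int idFor(const std::string& raw) {
        auto it = toId.find(raw);
        if (it != toId.end()) return it->second;
        int id = (int)toValue.size();
        toId.emplace(raw, id);
        toValue.push_back(raw);
        return id;                       // IDs are dense: 0, 1, 2, ...
    }
};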

4.2 Serial Implementation and Optimization for Dense Cube

Most of the computation can be done while iterating over all the elements in the cube. The operations behave as follows:

- Computing the clustered data cube and the corresponding marginal distributions: the program calculates the corresponding indexes in the clustered cube from each element's index. The entries of the corresponding marginal distributions and the clustered data cube can then be accessed through these indexes, and the value of the iterated element is added into the corresponding fields.
- Computing the intermediate q values: while iterating over all the elements in the original cube, the program calculates the q values following the equation
  $$q^{(t+k)}(d_1, \ldots, d_{k-1}, d_{k+1}, \ldots, d_n \mid \hat{d}_k) = p(\hat{d}_1, \ldots, \hat{d}_{k-1}, \hat{d}_{k+1}, \ldots, \hat{d}_n \mid \hat{d}_k) \prod_{i \ne k} p(d_i \mid \hat{d}_i), \quad 1 \le \hat{d}_k \le l_k.$$
- Computing the distances between elements and clusters: the program iterates over all pairs of element and cluster and calculates the distances following the definition of the distance, the Kullback-Leibler (KL) divergence. The shortest distance and the corresponding cluster are stored. In detail, the calculation follows
  $$D\big(p(\cdot \mid d_k) \,\|\, q(\cdot \mid \hat{d}_k)\big) = \sum_{d_1} \cdots \sum_{d_{k-1}} \sum_{d_{k+1}} \cdots \sum_{d_n} p(d_1, \ldots, d_{k-1}, d_{k+1}, \ldots, d_n \mid d_k) \log \frac{p(d_1, \ldots, d_{k-1}, d_{k+1}, \ldots, d_n \mid d_k)}{q(d_1, \ldots, d_{k-1}, d_{k+1}, \ldots, d_n \mid \hat{d}_k)}.$$

An optimization can be adopted in the distance computation. Heavy computation takes place in the repetitive calculation of indexes and index conversions, which costs a huge amount of time. To reduce it, instead of computing the distance separately for each pair of element and cluster, we calculate all the $p \log \frac{p}{q}$ values in a single iteration over all the elements, which avoids repeatedly recomputing the same indexes, and then add each value to the corresponding distances. Experiments show this optimization removes a large amount of calculation time and greatly improves the performance of the whole program; a sketch of this single-pass accumulation is given at the end of this chapter.

4.3 Serial Implementation and Optimization for Sparse Cube

Because the data representation of the sparse cube differs from that of the dense cube, the serial implementation for the sparse cube solves the problem in a different way. However, much of the implementation follows the same principles as the dense one, including the computation of the clustered cube and the marginal distributions. The most complicated parts, the distance computation and the q computation, are likewise done in one iteration over all the elements of the sparse cube. While iterating over one element, we can visit each dimension index of the element. If it is the dimension currently being clustered, the q value is multiplied by $q(\hat{d}_1, \ldots, \hat{d}_{i-1}, \hat{d}_{i+1}, \ldots, \hat{d}_n \mid \hat{d}_i)$; otherwise it is multiplied by $q(d_i \mid \hat{d}_i)$. Figure 4-2 shows an example of the basic procedure for computing distances in this implementation.

Figure 4-2. Procedure of computing distances in iteration on sparse cube
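The single-pass accumulation described in Section 4.2 can be sketched as follows, assuming the per-element $p$ contributions and the matching $q$ values have already been gathered into flat arrays; all names are illustrative, not the exact code of the implementation.

#include <cmath>
#include <vector>

// One pass over the non-zero elements accumulates every KL contribution
// p * log(p / q) into dist[d * numClusters + c], so the expensive index
// conversions are done once per element instead of once per (element,
// cluster) pair. dkIndex[i] is the d_k coordinate of non-zero i; pCond[i]
// is its p(... | d_k) contribution; qCond[i][c] is the matching q value
// for candidate cluster c.
std::vector<double> accumulateDistances(
        const std::vector<int>& dkIndex,
        const std::vector<double>& pCond,
        const std::vector<std::vector<double>>& qCond,
        int numElements, int numClusters) {
    std::vector<double> dist((size_t)numElements * numClusters, 0.0);
    for (size_t i = 0; i < pCond.size(); ++i) {
        double p = pCond[i];
        if (p <= 0.0) continue;
        for (int c = 0; c < numClusters; ++c)
            dist[(size_t)dkIndex[i] * numClusters + c] +=
                p * std::log(p / qCond[i][c]);
    }
    return dist;  // argmin over c of dist[d][c] gives the new cluster of d
}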

CHAPTER 5
PARALLEL IMPLEMENTATION

The parallel implementations, for both the dense cube and the sparse one, focus on parallelizing the core, computation-intensive parts of the algorithm. This chapter presents the parallel implementation for the dense cube first, followed by the one for the sparse cube.

5.1 Parallel Implementation and Optimization for Dense Cube

By analyzing the operations on the dense cube in the algorithm, we can divide all the computation into different types of abstract operations:

- Reduction. This appears in the calculation of the cluster distribution, the marginal distributions, the distances between each element and the corresponding clusters in the corresponding dimension, and the cluster assignments.
- Multi-item calculation. This type of calculation evaluates many items with the same formula but with different data sources. It appears mostly in the intermediate q value calculation and the distances.

Some of the calculations, such as the distance calculation, belong to both types. These two types compose the most intensive computation in the algorithm, so our parallel implementation mainly focuses on them.

5.1.1 Parallel Reduction on CUDA Platform

Parallel reduction on the CUDA platform is similar to general parallel reduction, which is widely used with MPI on clusters. It is mainly a tree-based reduction, from which we can extract maximum parallelism. Implementing parallel reduction on CUDA requires considering a few more things, including the thread and block concepts and the shared memory in each thread block. Some optimizations can also be adopted, such as loop unrolling or even complete unrolling. Researchers at NVIDIA presented an optimized parallel reduction in their work [2], which takes the above factors into account; Figure 5-1 shows the main idea of this parallel reduction algorithm. In our implementation, we use most of their ideas for the reduction operation.

Figure 5-1. Tree-based reduction [2]

Because the CUDA platform has no global synchronization, the data is divided into several blocks, each corresponding to a thread block. The reduction is executed separately within each thread block, but simultaneously across the blocks. After loading the elements, each thread in the thread block first reduces at least two elements, which halves the number of idle threads. Moreover, each thread sums up as many elements as necessary if the element count is more than twice the total number of threads in all thread blocks. Theoretically, if the number of elements is $\log_2 n$ times twice the number of threads, this parallel reduction implementation is cost-optimal.

The reduction uses shared memory as a buffer for the elements and results. In the first step of the reduction, each thread loads the corresponding data from global memory into the shared memory of its block. All the following reduction steps take place in shared memory, which reduces the data access time during the reduction.

The reduction also uses sequential addressing in shared memory in order to reduce shared memory bank conflicts. In each step, each thread sums its own element with the one in the second half of the block, and the threads in the second half become idle. In this way, the elements used for the next reduction step lie in a sequential address space, and the time spent accessing different shared memory banks is reduced.

Looping is another time-consuming part of the whole reduction computation. When the number of active threads drops below the number of threads in one warp, we can unroll the loop and avoid the cost of synchronizing the threads. Because of the limit on the number of threads in one block, we can even completely unroll the loop using the C++ template feature, so the branches are eliminated at compile time. To make this work correctly, we need to limit the number of threads in each block to a power of 2. With the above optimizations, the reduction is efficient and theoretically cost-optimal, and it provides a fundamental primitive for the computation in our implementation.

As described above, each type of calculation takes different source data. For example, the original distribution is used multiple times by different kinds of calculation, but each kind uses only part of the original data; the marginal distribution and the cluster distribution, for instance, need only the elements related to one element or to one specific block. The data needed by such a calculation is not always sequential in memory. When we distribute the calculation across threads, we need to make sure each thread takes its part of the data uniquely, correctly, and completely. In the parallel implementation, we build mapping functions for each type of calculation, which map from the thread ID to the location of the data and from the location of the data back to the thread ID. Shared memory is also heavily used when computing the mapped address of an element: auxiliary variables for computing the mapping, such as the number of elements in each dimension, are stored in shared memory. Figure 5-2 shows an example of mapping different areas to sequential threads.
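A condensed version of this reduction pattern, following the ideas of [2], is sketched below. It keeps the first-add-during-load and sequential-addressing optimizations but omits the warp-level unrolling for brevity; it is an illustration rather than the exact kernel of our implementation.

// Tree-based block reduction with sequential addressing; each thread first
// sums two global elements while loading, halving the number of idle
// threads. blockDim.x must be a power of two. One partial sum per block is
// written out; the partial sums are then reduced again (or on the host).
__global__ void reduceSum(const float* in, float* blockSums, unsigned int n) {
    extern __shared__ float sdata[];
    unsigned int tid = threadIdx.x;
    unsigned int i = blockIdx.x * (blockDim.x * 2u) + tid;

    float v = 0.0f;
    if (i < n) v = in[i];
    if (i + blockDim.x < n) v += in[i + blockDim.x];
    sdata[tid] = v;
    __syncthreads();

    // Sequential addressing keeps the active elements contiguous and
    // avoids shared-memory bank conflicts.
    for (unsigned int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) sdata[tid] += sdata[tid + s];
        __syncthreads();
    }
    if (tid == 0) blockSums[blockIdx.x] = sdata[0];
}
// Launch example:
// reduceSum<<<blocks, threads, threads * sizeof(float)>>>(d_in, d_partial, n);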

Figure 5-2. Example of 2D thread mapping for marginal distribution computation. A) Example of mapping dimension 0. B) Example of mapping dimension 1.

5.1.2 Multi-Item Calculation on CUDA Platform

Another typical calculation in the algorithm evaluates multiple similar items with the same formula but different data sources. The most typical case is the calculation of the intermediate q values within the distance calculation. Depending on which q value is being computed, different data are read from the original cube, the clustered cube, and the marginal distributions of the original and clustered cubes. In our implementation, we use an output data partition method to distribute the calculation tasks across threads: each thread calculates one q value by deriving the indexes corresponding to its thread ID and loading the corresponding data from global memory. Shared memory is again used for auxiliary data.
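The output data partition style can be illustrated with the following sketch, in which each thread computes one q value of Equation 2-2 by decoding its thread ID into multi-dimensional indexes. The flattened parameter layout is an assumed simplification, not the exact kernel of our implementation.

// Output data partition: thread t computes exactly one q value of
// Equation 2-2, q(d_1,...,d_n) = p(^d_1,...,^d_n) * prod_i p(d_i | ^d_i).
__global__ void computeQ(const float* pHatJoint,    // p(^d_1,...,^d_n), row-major
                         const float* pCond,        // p(d_i | ^d_i), dims back to back
                         const int* condOffsets,    // start of each dim in pCond
                         const int* assign,         // element -> cluster, dims back to back
                         const int* assignOffsets,  // start of each dim in assign
                         const int* dims,           // elements per dimension
                         const int* hatDims,        // clusters per dimension
                         int nDims, long long total, float* qOut) {
    long long t = blockIdx.x * (long long)blockDim.x + threadIdx.x;
    if (t >= total) return;

    long long rem = t, hatLin = 0, hatStride = 1;
    float q = 1.0f;
    // Decode the thread ID into (d_1,...,d_n), last dimension fastest.
    for (int i = nDims - 1; i >= 0; --i) {
        int di = (int)(rem % dims[i]);
        rem /= dims[i];
        int hat = assign[assignOffsets[i] + di];   // ^d_i = C_{D_i}(d_i)
        q *= pCond[condOffsets[i] + di];           // multiply p(d_i | ^d_i)
        hatLin += hat * hatStride;                 // linearize (^d_1,...,^d_n)
        hatStride *= hatDims[i];
    }
    qOut[t] = q * pHatJoint[hatLin];
}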

5.1.3 Optimization

Using GPUs for calculation requires frequent basic memory operations, such as allocating device memory, freeing device memory, and copying between host and device. Memory must first be allocated and the data copied from the host into device memory; after the calculation finishes, the results are copied back from the graphics device to host memory. These operations are very time-consuming when the amount of data is large. In our parallel implementation, only some parts of the computation are parallelized, and the intermediate results and original read-only data should not be copied between host and device multiple times. We optimize the implementation by leaving the read-only data, such as the original data and its marginal distributions, and the intermediate data in device memory for reuse. We also reuse previously allocated address space to avoid repeated allocation and freeing.

Figure 5-3. Reduction of repeated communication between host and device

5.2 Parallel Implementation and Optimization for Sparse Cube

Instead of the parallel tree-based reduction used in the implementation for the dense cube, atomic operations are heavily used in the parallel implementation for the sparse cube. Tree-based reduction does not work well in this situation, because we can hardly find a way to map the data being reduced onto sequential threads. In addition, these atomic operations do not introduce much overhead in complexity: blocking happens only when two threads write the same location in memory.

The most computation-intensive part is the computation of the distances, which is the key part for parallelization. The parallel implementation of the distance computation, however, is a little tricky. First, we still treat the computation for one non-zero element as an individual task, and we map it to one thread on the GPU. The reason we put the computation of all distances related to an element into one task is that, normally, the size of the clustered cube is small while the number of non-zero elements is quite large, exceeding the number of threads that can execute simultaneously on the GPU, sometimes by several times. Dividing the task into smaller granularity would not improve performance. Although we could perform a tree-based reduction to find the shortest distance among the distances, that would require synchronization and introduce large overhead in extra memory cost and in warp exchanges during execution. Instead of a tree-based reduction, as we calculate the distances we simply rewrite the same shared memory field whenever the newer value is smaller, which is more efficient. Figure 5-4 shows an example of the mapping between tasks and threads. Each thread then atomically adds its value to the corresponding memory field in device memory.

Figure 5-4. Mapping of threads in parallel implementation for sparse cube

During the iteration over all the coordinates, we compute all the distances between each element and each cluster in the specific dimension. However, the number of distances is usually very large, and each of these fields is frequently read and written during the algorithm. This creates a problem: frequent atomic operations on global memory cost a huge amount of time, since many of the atomic operations execute in sequence and each atomic write access to global memory costs hundreds of cycles. Such an approach becomes inefficient and can hardly be used in practice.

Shared memory is a good way to reduce the time spent on atomic write accesses, but two problems remain. First, the set of distances is too large, while in CUDA the shared memory in each block is quite small: just 16 KB on devices with Compute Capability 1.x and 48 KB on devices with Compute Capability 2.x. It is impossible to place the distances of even a normal-size co-clustering problem into shared memory. Second, a large number of atomic operations reduces parallelism: most of the operations execute sequentially, which cancels the advantage of parallelization.

We use a simple trick to solve the above problems. Even though the total number of distances is often very large, the maximal number of threads in one block is small: 512 on devices with Compute Capability 1.x and 1024 on devices with Compute Capability 2.x. (We use the devices with Compute Capability 1.x for the following explanation.) In the worst case, the total number of distances generated in one thread block is 512 multiplied by the number of clusters in the dimension, which is still too large for the amount of shared memory; if the number of clusters exceeds 8, the total shared memory needed for distances may exceed the limit. To solve this, we add another preprocessing procedure that generates multiple copies of the coordinate list. All the coordinate lists represent the same cube, but the orders of the coordinates differ: each copy is sorted in ascending order of the indexes in its corresponding dimension. The preprocessing is time- and space-consuming; however, it is a one-time procedure, so we only need to execute it once and place the results in device memory.

Through this preprocessing, we reduce the number of distinct elements of the clustering dimension seen by one block. Although the worst case is unchanged, it rarely happens in practice. From the statistics of our experiments, no more than 40 different elements appear in one block, and in most situations the number is around 10. In this way, we can place the distances that will be calculated in one block in its shared memory. Not only is the access time reduced, but the number of atomic operations on global memory is also reduced.

Table 5-1. Example of sorting indexes in threads and blocks

Thread block | Record index in threads
0 | 0 0 0 0 0 0 1 1
1 | 1 1 1 2 2 3 3 3
2 | 3 3 4 4 4 4 4 4
3 | 5 5 5 5 6 6 6 7

Table 5-2. Example of shared memory

Block | Element | Distance to cluster
0 | 0 | 0.14 0.14 0.23 0.23
0 | 1 | 0.14 0.12 0.34 0.24
1 | 1 | 0.24 0.14 0.23 0.53
1 | 2 | 0.23 0.14 0.34 0.24
1 | 3 | 0.55 0.53 0.14 0.50
2 | 3 | 0.24 0.34 0.33 0.60
2 | 4 | 0.06 0.05 0.35 0.50
3 | 5 | 0.40 0.44 0.35 0.24
3 | 6 | 0.32 0.33 0.53 0.14
3 | 7 | 0.36 0.24 0.42 0.11

Atomic operations on shared memory are fast compared with those on global memory, and copying from shared memory to global memory is straightforward. Sequential writes happen only for the fields of elements that appear in two blocks; in the worst case, the number of such fields equals the number of blocks, so the total time spent on sequential operations drops. In this way, we solve the problem of the computation of distances. The other parts, such as the computation of the clustered cube and finding the assignments from the distances, use the same parallelization approach as described above.
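Putting the pieces together, a simplified sketch of the sparse distance accumulation might look as follows. It assumes the coordinate list is already sorted on the clustered dimension, that the per-record $p \log(p/q)$ contributions are precomputed, and that the device supports floating-point atomics (Compute Capability 2.x); MAX_LOCAL and all other names are illustrative.

#define MAX_LOCAL 40   // distinct d_k elements per block; 40 was the observed maximum

__global__ void sparseDistances(const int* dk,       // d_k of each non-zero, sorted ascending
                                const float* plogpq, // p*log(p/q) per (non-zero, cluster)
                                int nnz, int numClusters,
                                float* dist) {       // global: numElements x numClusters
    extern __shared__ float sdist[];                 // MAX_LOCAL * numClusters floats
    int base = blockIdx.x * blockDim.x;
    int i = base + threadIdx.x;
    int firstElem = dk[min(base, nnz - 1)];          // smallest element this block sees

    // Clear this block's shared-memory distance fields.
    for (int j = threadIdx.x; j < MAX_LOCAL * numClusters; j += blockDim.x)
        sdist[j] = 0.0f;
    __syncthreads();

    if (i < nnz) {
        int local = dk[i] - firstElem;               // small, because the list is sorted
        for (int c = 0; c < numClusters; ++c)
            atomicAdd(&sdist[local * numClusters + c],
                      plogpq[(long long)i * numClusters + c]);
    }
    __syncthreads();

    // Flush to global memory. Only elements straddling two blocks still
    // contend for the same global fields.
    for (int j = threadIdx.x; j < MAX_LOCAL * numClusters; j += blockDim.x)
        if (sdist[j] != 0.0f)
            atomicAdd(&dist[(long long)firstElem * numClusters + j], sdist[j]);
}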

CHAPTER 6
EXPERIMENTS

This chapter provides evidence of the benefit of the Multi-Dimensional ITCC and its parallel implementation. In particular, we apply the implementation to randomly generated data for the performance evaluation, and to real wireless records for the evaluation of the co-clustering results. We show that the algorithm works well on multi-dimensional data and that the parallel implementation has an obvious speedup over the serial implementation on the same datasets.

6.1 Data Set, Environment and Measurement Details

The datasets we use to evaluate the performance of the implementations are randomly generated 3D data. The size of the cube is 200 x 200 x 200. In total we generated 10 datasets, with 10000, 20000, 40000, 80000, ..., 5120000 records respectively. The datasets we use to evaluate the co-clustering results are wireless data records. There are 2 datasets; Table 6-1 shows the details of each. In the table, uid stands for User ID, did for Domain ID, and lid for Location ID.

Table 6-1. Information on datasets

                                  | Dataset 1  | Dataset 2
Number of Dimensions              | 2          | 3
Number of Non-0 Elements          | 305464     | 18808
Names of Dims                     | uid, did   | uid, did, lid
Number of Elements per Dimension  | 22816, 100 | 1800, 100, 68
Number of Clusters                | 15, 15     | 10, 10, 10

The evaluation takes place on a Tesla node of the University of Florida High Performance Computing Center. The host has 4 Intel E5462 cores running at 2.8 GHz, 16 GB of RAM, and an NVIDIA Tesla (C1060) GPU with 4 GB of RAM. Both the parallel implementation and the serial implementation run on the same machine. To exclude other factors that may affect the result, such as preprocessing and output, we only measure the time for the computation in each loop, which is the core part of the parallelization. To reduce the uncertainty of the experimental results, each per-loop time measurement is derived from the time consumed by several loops divided by the number of loops.

6.2 Performance and Discussion

The performance of the implementations depends on many factors, including the number of dimensions, the number of elements in each dimension, the number of non-zero elements, and the number of clusters in the results. The number of iterations varies with the input data and its initialization. In our experiments, we set the threshold to $10^{-6}$, which means the co-clustering stops when the change in the loss of mutual information is less than $10^{-6}$. Figure 6-1 shows the trend of the loss of mutual information as the co-clustering proceeds. We can see that the loss of mutual information decreases rapidly at first and more slowly after each loop.

Figure 6-1. Trends of the loss of mutual information

We have also compared the loss of mutual information before and after co-clustering across 20 executions on the same datasets. Figure 6-2 shows the changes. It shows the algorithm is able to decrease the loss of mutual information no matter what initialization is applied to the original data.

Figure 6-2. Comparison of loss of mutual information

For the performance evaluation, we apply our implementations to the 10 randomly generated datasets. Figure 6-3 shows the detailed performance results. Generally speaking, the parallel implementations improve on the serial implementations for both the sparse cube and the dense cube. For the same dataset, as the number of clusters grows, the time consumption grows, because the number of distances that must be calculated grows.

Figure 6-3. Performance results. A) Performance results on sparse cube. B) Performance results on dense cube.

Comparing the implementation for dense data with the one for sparse data, the former performs better on the dense datasets, while the latter performs better on sparse cubes. The data shows that, for the serial implementations, the dense implementation performs better when the density of non-zero records is larger than 8%. For the parallel implementations, we find that the parallelism extracted from the implementation for sparse data is much greater than that from the implementation for dense data. The time consumption of the sparse cube implementation grows linearly with the number of records, for both the serial and the parallel implementation. In sum, the implementations for the dense cube show greater performance on dense data, while the implementations for the sparse cube do better on sparse data; the parallel implementations both have an obvious speedup over the corresponding serial implementations.

6.3 Co-Clustering Algorithm Results

To show the co-clustering results of the Multi-Dimensional ITCC, we applied the implementation to the real 3D wireless data records and visualized the distribution of data points in a 3D space. Figure 6-4 shows the distribution of the data points before and after the co-clustering algorithm runs; in the figure, the IDs in the same cluster are grouped together. From the figure, we can clearly see that data points with similar properties are clustered together after the algorithm executes.

Figure 6-4. Data points distribution before and after co-clustering. A) Data points distribution after initialization. B) Data points distribution after co-clustering.

The co-clustering results depend heavily on the initialization of the cluster assignments of the original elements; in our experiments, we use random initialization for these datasets. Here we show one of the domain co-clustering results. Figure 6-5 shows the clustered domain names. From this result, we can empirically find some interesting clusters that group together the domains that certain groups of users usually visit together. For instance, microsoftoffice2007, mcafee, and windowsmedia all belong to cluster 5, which may consist of the websites Windows users often visit. yahoo and yimg both belong to Yahoo and are likely always visited together. hotmail, live, and go all belong to Microsoft Live services.

Figure 6-5. Co-clustering result of domain names of 2D dataset
Cluster 0: netflix flyingcroc wsj
Cluster 1: google veoh lexis
Cluster 2: about adrevolver brightcove cnn contextweb digg river doubleclick ebay ebayrtm fastclick gridserver imageshack itunes microsoft mozilla panthercdn secureserver theplanet tribalfusion typepad webtrendslive wikimedia xpc-mii youtube virtualearth tmcs ebayimg imeem myspace bankofamerica coremetrics americanidol ha-hosting msnbc sports ltdomains nih nytimes bigcharts
Cluster 3: cbsi gln travelpn
Cluster 4: ilike aol
Cluster 5: infoave usc windowsmedia hackerwatch mcafee microsoftoffice2007
Cluster 6: comcast harvard hamachi ucsb digg
Cluster 7: facebook llnw mediaplex msn tfbnw xlhost apmebf 247realmedia Level3
Cluster 8: go net live hotmail gotomypc
Cluster 9: aster fastres fastwebnet smartbro bodoglife torrentbox qwest
Cluster 10: apple
Cluster 11: bluehost rr yahoo yimg
Cluster 12: drmisnotforsale softlayer quiettouch westlaw
Cluster 13: co steadfast socialmedia lokm
Cluster 14: cnet washingtonpost earthlink mac opendns orb

Some potential knowledge might also be extracted from the results. For example, we might guess that Mac users like washingtonpost more, because mac and washingtonpost belong to one cluster. In other words, the results from multi-dimensional co-clustering are valuable and instructive for knowledge discovery from large amounts of new data. However, this is just the local optimal result from one initialization; different initializations can give quite different results. According to our experiments, the results from random initializations are quite unstable. A feasible solution to this instability is to use the co-clustering results from another algorithm to initialize the ITCC; in this way, we obtain a result with better mutual information than the one used for the initialization.

CHAPTER 7
CONCLUSION

The parallel and serial implementations of the Multi-Dimensional Information Theoretic Co-Clustering algorithm have shown that it works well with data of any dimensionality, especially sparse and high-dimensional data. The parallel versions of the implementations, for both the sparse and the dense cube, show an obvious performance improvement over the corresponding serial versions. The implementations for the sparse cube adapt better to large-scale data, which makes them more useful than the dense versions in most situations. The parallel implementation for the sparse cube extracts more parallelism from the algorithm and thus shows a more obvious speedup over its serial counterpart. The Multi-Dimensional ITCC can successfully co-cluster multi-dimensional data and extract potential knowledge hidden in the data, which is helpful for knowledge discovery from real-world data.

REFERENCES

[1] I. S. Dhillon, S. Mallela, and D. S. Modha. Information-theoretic co-clustering. In Ninth ACM SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining (KDD '03), pages 89-98, 2003.

[2] M. Harris. Optimizing parallel reduction in CUDA. Technical report, NVIDIA, 2007.

[3] NVIDIA Corporation. NVIDIA CUDA C Programming Guide. http://developer.download.nvidia.com/compute/cuda/3_2/toolkit/docs/CUDA_C_Programming_Guide.pdf, 2010.

BIOGRAPHICAL SKETCH

Xiaoyang Gao is a Master of Science candidate in computer engineering in the Department of Computer and Information Science and Engineering at the University of Florida. While studying at the University of Florida, he did research in parallel computing and data mining under Dr. Sanjay Ranka's supervision and applied it to modeling wireless data. He received his bachelor's degree in Computer Science and Technology from Huazhong University of Science and Technology, Wuhan, P.R. China.