Combinatorial Optimization Techniques in Data Mining

Permanent Link: http://ufdc.ufl.edu/UFE0021157/00001

Material Information

Title: Combinatorial Optimization Techniques in Data Mining
Physical Description: 1 online resource (109 p.)
Language: english
Creator: Busygin, Stanislav
Publisher: University of Florida
Place of Publication: Gainesville, Fla.
Publication Date: 2007

Subjects

Subjects / Keywords: algorithm, analysis, biclustering, classification, clique, clustering, data, graph, optimization
Industrial and Systems Engineering -- Dissertations, Academic -- UF
Genre: Industrial and Systems Engineering thesis, Ph.D.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, territorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation

Notes

Abstract: My research analyzes the role of combinatorial optimization in data mining research and proposes a collection of new practically efficient data mining techniques based on combinatorial optimization algorithms. A variety of addressed data mining problems include supervised clustering (classification), biclustering, dimensionality reduction, and outlier detection. The recent advances and trends in biclustering are surveyed and the major challenges for its further development are outlined. Similarly to many other data mining methodologies, one of them is the lack of mathematical justification for the significance of purported results. I address this issue with the development of the notion of consistent biclustering. The significance of consistent biclustering is mathematically justified by the conic separation theorem establishing simultaneous delineation of both sample and attribute classes by convex cones. This required property of the obtained biclustering serves as a powerful tool for selecting those attributes of the data which are relevant to a particular studied phenomenon. As an example of such an application, several well-known DNA microarray data sets are considered with the consistent biclustering results obtained for them. To further advance the application of mathematically well-justified optimization methods to major data mining problems, I developed a new optimization based data classification framework which relies upon the same criteria of class separation that serve as the objectives in unsupervised clustering methods, but utilizing them instead as the constraints on feature selection based upon the available training set of samples. The reliability and robustness of the methodology is also empirically confirmed with computational experiments on DNA microarray data. Next, I discuss the prominent role of graph models in data analysis with the emphasis on data analysis applications of the maximum clique/independent set problem. 
The great variety of real-world problems that can be tackled with the graph-based models is surveyed along with the employed methodologies of information retrieval. Finally, I present a practically efficient maximum clique heuristic QUALEX-MS. It utilizes a new simple generalization of the Motzkin-Straus theorem for the maximum weight clique problem. This generalization, representing quite a significant theoretical result itself, maximally preserves the form of the original Motzkin-Straus formulation and is proved directly, without the use of mathematical induction. QUALEX-MS employs a new trust region heuristic based upon this new quadratic programming formulation. In contrast to usual trust region methods, it takes into account not only the global optimum of a quadratic objective over a sphere, but also a set of other stationary points. The developed method has complexity O(n^3), where n is the number of vertices of the graph. Computational experiments indicate that QUALEX-MS is exact on small graphs and very efficient on the DIMACS benchmark graphs and various random maximum weight clique problem instances. QUALEX-MS was utilized for optimization of classification and regression trees of databases.
General Note: In the series University of Florida Digital Collections.
General Note: Includes vita.
Bibliography: Includes bibliographical references.
Source of Description: Description based on online resource; title from PDF title page.
Source of Description: This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Statement of Responsibility: by Stanislav Busygin.
Thesis: Thesis (Ph.D.)--University of Florida, 2007.
Local: Adviser: Pardalos, Panagote M.
Electronic Access: RESTRICTED TO UF STUDENTS, STAFF, FACULTY, AND ON-CAMPUS USE UNTIL 2008-08-31

Record Information

Source Institution: UFRGP
Rights Management: Applicable rights reserved.
Classification: lcc - LD1780 2007
System ID: UFE0021157:00001


Full Text

PAGE 1

COMBINATORIAL OPTIMIZATION TECHNIQUES IN DATA MINING

By

STANISLAV BUSYGIN

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2007

PAGE 2

© 2007 Stanislav Busygin

PAGE 3

To all the people of good will who helped me along the path

PAGE 4

ACKNOWLEDGMENTS

First of all, I would like to express my gratitude to Dr. Panos M. Pardalos for his support and guidance during my PhD studies at the University of Florida. I am grateful to the members of my supervisory committee Dr. Stan Uryasev, Dr. Joseph Geunes and Dr. William Hager for their time and good judgement. I am also very grateful to my collaborators and friends Dr. Sergiy Butenko, Dr. Vladimir Boginski, Dr. Artyom Nahapetyan, and Dr. Oleg Prokopyev for their valuable contributions to our joint research.

PAGE 5

TABLE OF CONTENTS

page

ACKNOWLEDGMENTS ... 4
LIST OF TABLES ... 7
LIST OF FIGURES ... 8
ABSTRACT ... 9

CHAPTER

1 INTRODUCTION ... 11
  1.1 General Overview ... 11
  1.2 Data Mining Problems and Optimization ... 12
2 BICLUSTERING IN DATA MINING ... 14
  2.1 The Main Concept ... 14
  2.2 Formal Setup ... 15
  2.3 Visualization of Biclustering ... 16
  2.4 Relation to SVD ... 16
  2.5 Methods ... 20
    2.5.1 "Direct Clustering" ... 20
    2.5.2 Node-Deletion Algorithm ... 20
    2.5.3 FLOC Algorithm ... 22
    2.5.4 Biclustering via Spectral Bipartite Graph Partitioning ... 23
    2.5.5 Matrix Iteration Algorithms for Minimizing Sum-Squared Residue ... 27
    2.5.6 Double Conjugated Clustering ... 31
    2.5.7 Information-Theoretic Based Co-Clustering ... 32
    2.5.8 Biclustering via Gibbs Sampling ... 35
    2.5.9 Statistical-Algorithmic Method for Bicluster Analysis (SAMBA) ... 38
    2.5.10 Coupled Two-way Clustering ... 39
    2.5.11 Plaid Models ... 40
    2.5.12 Order-Preserving Submatrix (OPSM) Problem ... 42
    2.5.13 OP-Cluster ... 43
    2.5.14 Supervised Classification via Maximal δ-valid Patterns ... 44
    2.5.15 CMonkey ... 45
  2.6 Discussion and Concluding Remarks ... 45
3 CONSISTENT BICLUSTERING VIA FRACTIONAL 0–1 PROGRAMMING ... 47
  3.1 Consistent Biclustering ... 47
  3.2 Supervised Biclustering ... 49
  3.3 Fractional 0–1 Programming ... 51
  3.4 Algorithm for Biclustering ... 53

PAGE 6

  3.5 Computational Results ... 56
    3.5.1 ALL vs. AML data set ... 56
    3.5.2 HuGE Index data set ... 57
  3.6 Conclusions and Future Research ... 57
4 AN OPTIMIZATION-BASED APPROACH FOR DATA CLASSIFICATION ... 61
  4.1 Basic Definitions ... 61
  4.2 Optimization Formulation and Classification Algorithm ... 65
  4.3 Computational Experiments ... 67
    4.3.1 ALL vs. AML Data Set ... 67
    4.3.2 Colon Cancer Data Set ... 67
  4.4 Conclusions ... 68
5 GRAPH MODELS IN DATA ANALYSIS ... 69
  5.1 Cluster Cores Based Clustering ... 71
  5.2 Decision-Making under Constraints of Conflicts ... 72
  5.3 Conclusions ... 75
6 A NEW TRUST REGION TECHNIQUE FOR THE MAXIMUM WEIGHT CLIQUE PROBLEM ... 76
  6.1 Introduction ... 76
  6.2 The Motzkin–Straus Theorem for Maximum Clique and Its Generalization ... 78
  6.3 The Trust Region Problem ... 85
  6.4 The QUALEX-MS Algorithm ... 89
  6.5 Computational Experiment Results ... 94
  6.6 Remarks and Conclusions ... 101

REFERENCES ... 102
BIOGRAPHICAL SKETCH ... 109

PAGE 7

LIST OF TABLES

Table ... page

3-1 HuGE index biclustering ... 60
6-1 DIMACS maximum clique benchmark results ... 98
6-2 Performance of QUALEX-MS vs. PBH on random weighted graphs ... 100

PAGE 8

LIST OF FIGURES

Figure ... page

2-1 Partitioning of samples and features into 3 clusters ... 17
2-2 Coclus H1 algorithm ... 29
2-3 Coclus H2 algorithm ... 30
2-4 Gibbs biclustering algorithm ... 37
3-1 Feature selection heuristic ... 55
3-2 ALL vs. AML heatmap ... 58
3-3 HuGE index heatmap ... 59
4-1 Data classification algorithm ... 66
5-1 Cluster cores based clustering algorithm ... 73
5-2 Example of two CaRTs for a database ... 73
6-1 New-best-in weighted heuristic ... 94
6-2 NBIW-based graph preprocess algorithm ... 95
6-3 Meta-NBIW algorithm ... 95
6-4 QUALEX-MS algorithm ... 96

PAGE 9

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

COMBINATORIAL OPTIMIZATION TECHNIQUES IN DATA MINING

By

Stanislav Busygin

August 2007

Chair: Panos M. Pardalos
Major: Industrial and Systems Engineering

My research analyzes the role of combinatorial optimization in data mining research and proposes a collection of new practically efficient data mining techniques based on combinatorial optimization algorithms. A variety of addressed data mining problems include supervised clustering (classification), biclustering, dimensionality reduction, and outlier detection. The recent advances and trends in biclustering are surveyed and the major challenges for its further development are outlined. Similarly to many other data mining methodologies, one of them is the lack of mathematical justification for the significance of purported results. I address this issue with the development of the notion of consistent biclustering. The significance of consistent biclustering is mathematically justified by the conic separation theorem establishing simultaneous delineation of both sample and attribute classes by convex cones. This required property of the obtained biclustering serves as a powerful tool for selecting those attributes of the data which are relevant to a particular studied phenomenon. As an example of such an application, several well-known DNA microarray data sets are considered with the consistent biclustering results obtained for them. To further advance the application of mathematically well-justified optimization methods to major data mining problems, I developed a new optimization based data classification framework which relies upon the same criteria of class separation that serve as the objectives in unsupervised clustering methods, but utilizing them instead

PAGE 10

as the constraints on feature selection based upon the available training set of samples. The reliability and robustness of the methodology is also empirically confirmed with computational experiments on DNA microarray data. Next, I discuss the prominent role of graph models in data analysis with the emphasis on data analysis applications of the maximum clique/independent set problem. The great variety of real-world problems that can be tackled with the graph-based models is surveyed along with the employed methodologies of information retrieval. Finally, I present a practically efficient maximum clique heuristic QUALEX-MS. It utilizes a new simple generalization of the Motzkin-Straus theorem for the maximum weight clique problem. This generalization, representing quite a significant theoretical result itself, maximally preserves the form of the original Motzkin-Straus formulation and is proved directly, without the use of mathematical induction. QUALEX-MS employs a new trust region heuristic based upon this new quadratic programming formulation. In contrast to usual trust region methods, it takes into account not only the global optimum of a quadratic objective over a sphere, but also a set of other stationary points. The developed method has complexity $O(n^3)$, where $n$ is the number of vertices of the graph. Computational experiments indicate that QUALEX-MS is exact on small graphs and very efficient on the DIMACS benchmark graphs and various random maximum weight clique problem instances. QUALEX-MS was utilized for optimization of classification and regression trees of databases.

PAGE 11

CHAPTER 1
INTRODUCTION

1.1 General Overview

Due to recent technological advances in such areas as IT and biomedicine, researchers face ever-increasing challenges in extracting relevant information from the enormous volumes of available data. The so-called data avalanche is created by the fact that there is no concise set of parameters that can fully describe a state of real-world complex systems studied nowadays by biologists, ecologists, sociologists, economists, etc. On the other hand, modern computers and other equipment are able to produce and store virtually unlimited data sets characterizing a complex system, and with the help of available computational power there is a great potential for significant advances in both theoretical and applied research. That is why in recent years there has been a dramatic increase in the interest in sophisticated data mining and machine learning techniques utilizing not only statistical methods but also a wide spectrum of computational methods associated with large-scale optimization, including algebraic methods and neural networks.

This dissertation presents new combinatorial optimization models and algorithms for data mining problems with application to biomedicine. Organizationally, the dissertation is divided into two major parts. The first part, which consists of Chapters 2 and 3, is concerned with biclustering models and methods. In particular, Chapter 2 reviews the ongoing development in research on biclustering and its applications and emphasizes the theoretical tools that are seemingly necessary for constructing robust and efficient biclustering algorithms. Chapter 3 presents a novel concept of consistent biclustering, theoretical justification of its robustness, and fractional 0–1 programming models for supervised consistent biclustering with corresponding heuristic algorithms.

The second part of the dissertation (Chapters 4, 5 and 6) is dedicated to other models for data analysis. Chapter 4 presents mathematical programming models for data classification whose constraints are related to the objectives of known unsupervised learning

PAGE 12

methods. Next, Chapters 5 and 6 are concerned with graph models in data mining and a new polynomial-time maximum clique heuristic algorithm that, in particular, can be utilized for extracting large groups of closely related data samples and selecting optimal classification and regression models for databases. Chapter 5 discusses the modeling of data sets with graphs. Chapter 6 presents an efficient $O(n^3)$ maximum weight clique heuristic QUALEX-MS using the Motzkin-Straus quadratic programming formulation of the problem.

1.2 Data Mining Problems and Optimization

Data mining is a broad area covering a variety of methodologies for analyzing and modeling large data sets. Generally speaking, it aims at revealing a genuine similarity in data profiles while discarding the diversity irrelevant to a particular investigated phenomenon. The problems associated with data mining tasks mainly fall into the following categories:

- Clustering: partition of a given set of samples into classes according to a certain similarity relevant to the purpose of analysis;
- Dimensionality reduction: projection of a high-dimensional data set onto a low-dimensional space facilitating the data exploration;
- Reduction of noise: correcting or removing inaccurate measurements and atypical samples from a data set.

In particular, problems of the first category may correspond to either unsupervised or supervised clustering. In the latter case (also called classification), the researcher is given a so-called training set of samples whose classes are known, and this a priori information is supposed to be used to classify the test set of samples. Next, we should mention that there exists an important special case of dimensionality reduction called feature selection, where the new low-dimensional space is obtained by dropping a subset of coordinates of the original space. Finally, the special case of reduction of noise, where the samples with properties atypical for their classes are identified, is called outlier detection.

PAGE 13

All these problems are interrelated and usually represent certain stages of the whole data mining procedure. For instance, reduction of noise may be performed to refine the data set before a dimensionality reduction technique is applied to it, and, finally, classification or supervised clustering may be used to obtain the desired result.

The data mining problems can be naturally treated as optimization problems. Indeed, whether one decides how to partition data into similar groups or how to construct a low-dimensional space and project the data onto it or how to preprocess the data to reduce the noise, the objective of this task can be expressed as a certain mathematical function that needs to be maximized or minimized subject to proper constraints. Moreover, as the data always come as a sequence of observed samples, these optimization problems normally involve discrete variables associated with the samples, each of which represents a certain decision regarding one of the samples. Therefore, one may infer that such a theoretical area as combinatorial optimization finds an important application in data mining.

In this work we present some new combinatorial optimization techniques for the data mining problems. While having clustering as the main goal, these techniques are also able to handle dimensionality and noise reduction within the same optimization task due to the developed sophisticated mathematical models for the data mining problems.

PAGE 14

CHAPTER 2
BICLUSTERING IN DATA MINING

2.1 The Main Concept

The problems of partitioning objects into a number of groups can be met in many areas. For instance, the vector partition problem, which consists in partitioning $n$ $d$-dimensional vectors into $p$ parts, has broad expressive power and arises in a variety of applications ranging from economics to symbolic computation [7, 47, 55]. However, the most abundant area for the partitioning problems is definitely data mining. Data mining is a broad area covering a variety of methodologies for analyzing and modeling large data sets. Generally speaking, it aims at revealing a genuine similarity in data profiles while discarding the diversity irrelevant to a particular investigated phenomenon.

To analyze patterns existing in data, it is often desirable to partition the data samples according to some similarity criteria. This task is called clustering. There are many clustering techniques designed for a variety of data types – homogeneous and non-homogeneous numerical data, categorical data, 0–1 data. Among them one should mention hierarchical clustering [57], k-means [65], self-organizing maps (SOM) [60], support vector machines (SVM) [36, 87], logical analysis of data (LAD) [15, 16], etc. A recent survey on clustering methods can be found in [92].

However, working with a data set, there is always a possibility to analyze not only properties of samples, but also of their components (usually called attributes or features). It is natural to expect that each associated part of samples recognized as a cluster is induced by properties of a certain subset of features. With respect to these properties we can form an associated cluster of features and bind it to the cluster of samples. Such a pair is called a bicluster and the problem of partitioning a data set into biclusters is called a biclustering problem.

PAGE 15

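2.2 Formal Setup

Let a data set of $n$ samples and $m$ features be given as a rectangular matrix $A = (a_{ij})_{m \times n}$, where the value $a_{ij}$ is the expression of the $i$-th feature in the $j$-th sample. We consider classification of the samples into classes
\[ S_1, S_2, \ldots, S_r, \quad S_k \subseteq \{1 \ldots n\}, \quad k = 1 \ldots r, \]
\[ S_1 \cup S_2 \cup \ldots \cup S_r = \{1 \ldots n\}, \]
\[ S_k \cap S_\ell = \emptyset, \quad k, \ell = 1 \ldots r, \quad k \ne \ell. \]
This classification should be done so that samples from the same class share certain common properties. Correspondingly, a feature $i$ may be assigned to one of the feature classes
\[ F_1, F_2, \ldots, F_r, \quad F_k \subseteq \{1 \ldots m\}, \quad k = 1 \ldots r, \]
\[ F_1 \cup F_2 \cup \ldots \cup F_r = \{1 \ldots m\}, \]
\[ F_k \cap F_\ell = \emptyset, \quad k, \ell = 1 \ldots r, \quad k \ne \ell, \]
in such a way that features of the class $F_k$ are "responsible" for creating the class of samples $S_k$. Such a simultaneous classification of samples and features is called biclustering (or co-clustering).

Definition 1. A biclustering of a data set is a collection of pairs of sample and feature subsets $B = ((S_1, F_1), (S_2, F_2), \ldots, (S_r, F_r))$ such that the collection $(S_1, S_2, \ldots, S_r)$ forms a partition of the set of samples, and the collection $(F_1, F_2, \ldots, F_r)$ forms a partition of the set of features. A pair $(S_k, F_k)$ will be called a bicluster.

It is important to note here that in some of the biclustering methodologies a direct one-to-one correspondence between classes of samples and classes of features is not required. Moreover, the number of sample and feature classes is allowed to be different. This way we may consider not only pairs $(S_k, F_k)$, but also other pairs $(S_k, F_\ell)$, $k \ne \ell$.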

PAGE 16

Such pairs will be referred to as co-clusters. Another possible generalization is to allow overlapping of co-clusters.

The criteria used to relate clusters of samples and clusters of features may have different nature. Most commonly, it is required that the submatrix corresponding to a bicluster either is overexpressed (i.e., mostly includes values above average), or has a lower variance than the whole data set, but in general, biclustering may rely on any kind of common patterns among elements of a bicluster.

2.3 Visualization of Biclustering

One popular tool for visualizing data sets is heatmaps. A heatmap is a rectangular grid composed of pixels, each of which corresponds to a data value. The color of a pixel ranges between bright green or blue (lowest values) and bright red (highest values), visualizing the corresponding data value. This way, if the samples and/or features of the data set are ordered with respect to some pattern in the data, the pattern becomes obvious to observe visually. When one constructs a reasonable biclustering of a data set and then reorders samples and features by cluster numbers, the heatmap is supposed to show a "checkerboard" pattern, as diagonal blocks show biclusters that are the distinguished submatrices according to the used biclustering method. Figure 2-1 is an example of a data set with 3 biclusters of overexpressed values visualized as a heatmap (in a black-and-white diagram darker pixels correspond to higher values).

2.4 Relation to SVD

Singular value decomposition (SVD) is a remarkable matrix factorization which generalizes eigendecomposition of a symmetric matrix providing the orthogonal basis of eigenvectors. SVD is applicable to any rectangular matrix $A = (a_{ij})_{m \times n}$. It delivers orthogonal matrices $U = (u_{ik})_{m \times p}$ and $V = (v_{jk})_{n \times p}$ (the columns of the matrices are orthogonal to each other and have the unit length) such that
\[ U^T A V = \mathrm{diag}(\sigma_1, \ldots, \sigma_p), \quad p = \min(m, n). \tag{2-1} \]

PAGE 17

Figure 2-1. Partitioning of samples and features into 3 clusters
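The reordering behind the checkerboard pattern of Figure 2-1 can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `reorder_by_clusters`, the use of NumPy, 0-based indices, and the representation of a biclustering by two integer label vectors (one cluster number per feature and per sample) are not taken from the dissertation.

```python
import numpy as np

def reorder_by_clusters(A, feature_labels, sample_labels):
    """Permute rows (features) and columns (samples) of A so that
    members of the same cluster become contiguous.

    feature_labels[i] and sample_labels[j] are hypothetical integer
    cluster numbers for feature i and sample j.
    """
    A = np.asarray(A)
    # A stable sort keeps the original relative order inside each cluster.
    row_order = np.argsort(feature_labels, kind="stable")
    col_order = np.argsort(sample_labels, kind="stable")
    return A[np.ix_(row_order, col_order)], row_order, col_order

# Toy data: an entry is 1 when the feature and the sample share a cluster.
feature_labels = np.array([1, 0, 1, 0])
sample_labels = np.array([0, 1, 0, 1])
A = (feature_labels[:, None] == sample_labels[None, :]).astype(float)

B, row_order, col_order = reorder_by_clusters(A, feature_labels, sample_labels)
print(B)  # the overexpressed entries now form diagonal blocks
```

Passing the reordered matrix to any heatmap plotting routine would then display the diagonal blocks that the text describes.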

PAGE 18

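The numbers $\sigma_1 \ge \sigma_2 \ge \ldots \ge \sigma_p \ge 0$ are called singular values, the columns of $U$ are called left singular vectors and the columns of $V$ are called right singular vectors of $A$. This way, left singular vectors provide an orthonormal basis for columns of $A$, and right singular vectors provide an orthonormal basis for rows of $A$. Moreover, these bases are coupled so that
\[ A v_k = \sigma_k u_k, \quad A^T u_k = \sigma_k v_k, \]
where $u_k$ is the $k$-th left singular vector, and $v_k$ is the $k$-th right singular vector of the matrix. The singular values of $A$ are precisely the lengths of the semi-axes of the hyperellipsoid $E = \{Ax : \|x\|_2 = 1\}$.

The SVD provides significant information about properties of the matrix. In particular, if $\sigma_r$ is the last nonzero singular value (i.e., $\sigma_{r+1} = \ldots = \sigma_p = 0$), then
\[ \mathrm{rank}(A) = r, \]
\[ \mathrm{null}(A) = \mathrm{span}\{v_{r+1}, \ldots, v_n\}, \]
\[ \mathrm{ran}(A) = \mathrm{span}\{u_1, \ldots, u_r\}, \]
where $\mathrm{span}\{x_1, \ldots, x_k\}$ denotes the linear subspace spanned by the vectors $x_1, \ldots, x_k$, $\mathrm{null}(A) = \{x : Ax = 0\}$ is the null space of the matrix, and $\mathrm{ran}(A)$ is the linear subspace spanned by the columns of $A$. It is easy to see from these properties that the SVD is a very useful tool for dimensionality reduction in data mining. Taking also into account that the Frobenius norm of the matrix satisfies
\[ \|A\|_F^2 = \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij}^2 = \sum_{k=1}^{r} \sigma_k^2, \]
one can obtain the best (in the sense of the Frobenius norm) low-rank approximation of the matrix by equating all singular values after some $\ell$ to zero and considering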

PAGE 19

\[ \tilde{A} = \sum_{k=1}^{\ell} \sigma_k u_k v_k^T. \]
Such a low-rank approximation may be found in principal component analysis (PCA) with the $\ell$ first principal components considered. PCA applies SVD to the data matrix after certain preprocessing (centralization or standardization of data samples) is performed. We refer the reader to a linear algebra text [46] for more theoretical consideration of the SVD properties and algorithms.

One may relate biclustering to the SVD via consideration of an idealized data matrix. If the data matrix has a block-diagonal structure (with all elements outside the blocks equal to zero), it is natural to associate each block with a bicluster. On the other hand, it is easy to see that each pair of singular vectors will designate one such bicluster by nonzero components in the vector. More precisely, if the data matrix is of the form
\[ A = \begin{pmatrix} A_1 & 0 & \ldots & 0 \\ 0 & A_2 & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \ldots & A_r \end{pmatrix}, \]
where $\{A_k\}$, $k = 1 \ldots r$ are arbitrary matrices, then for each $A_k$ there will be a singular vector pair $(u_k, v_k)$ such that nonzero components of $u_k$ correspond to rows occupied by $A_k$ and nonzero components of $v_k$ correspond to columns occupied by $A_k$. In a less idealized case, when the elements outside the diagonal blocks are not necessarily zeros but diagonal blocks still contain dominating values, the SVD is able to reveal the biclusters too as dominating components in the singular vector pairs.

Hence, the SVD represents a handy tool for biclustering algorithms. Below we show that many biclustering methods either use the SVD directly or have a certain association with the SVD concept.
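The properties above can be checked numerically on a small idealized matrix. The following is a minimal sketch, assuming NumPy's `numpy.linalg.svd`; the helper names `low_rank_approximation` and `singular_pair_support`, the tolerance, and the toy matrix are illustrative choices, not part of the original text. The toy blocks were picked so that all singular values are distinct, which keeps each singular vector pair tied to a single block.

```python
import numpy as np

def low_rank_approximation(A, ell):
    """Keep the ell largest singular triplets of A.

    Returns the rank-ell approximation and the full vector of singular values.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Scale the first ell left singular vectors by their singular values.
    A_tilde = (U[:, :ell] * s[:ell]) @ Vt[:ell, :]
    return A_tilde, s

def singular_pair_support(A, k, tol=1e-10):
    """Row and column indices (0-based) where the k-th singular vector
    pair has components larger than tol in absolute value."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    rows = np.flatnonzero(np.abs(U[:, k]) > tol).tolist()
    cols = np.flatnonzero(np.abs(Vt[k, :]) > tol).tolist()
    return rows, cols

# Hypothetical idealized data matrix with two diagonal blocks.
A = np.zeros((5, 4))
A[:2, :2] = [[3.0, 1.0], [1.0, 3.0]]               # block A_1
A[2:, 2:] = [[2.0, 0.5], [0.5, 2.0], [1.0, 1.0]]   # block A_2

A_tilde, s = low_rank_approximation(A, ell=2)
squared_error = np.linalg.norm(A - A_tilde, "fro") ** 2

# Squared Frobenius norm equals the sum of squared singular values.
print(np.isclose(np.sum(A ** 2), np.sum(s ** 2)))
# The squared approximation error equals the sum of the discarded sigma_k^2.
print(np.isclose(squared_error, np.sum(s[2:] ** 2)))
# Each leading singular vector pair is supported on one block.
print(singular_pair_support(A, 0), singular_pair_support(A, 1))
```

When two blocks share a singular value, the corresponding singular vectors are no longer unique and may mix blocks, so the support test is only meaningful for well-separated singular values.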

PAGE 20

2.5 Methods

2.5.1 "Direct Clustering"

Apparently the earliest biclustering algorithm that may be found in the literature is so-called direct clustering by Hartigan [51] (also known as Block Clustering). This approach relies on statistical analysis of submatrices to form the biclusters. Namely, the quality of a bicluster $(S_k, F_k)$ is assessed by the variance
\[ \mathrm{VAR}(S_k, F_k) = \sum_{i \in F_k} \sum_{j \in S_k} (a_{ij} - \mu_k)^2, \]
where $\mu_k$ is the average value in the bicluster:
\[ \mu_k = \frac{\sum_{i \in F_k} \sum_{j \in S_k} a_{ij}}{|F_k| |S_k|}. \]
A bicluster is considered perfect if it has zero variance, so biclusters with lower variance are considered to be better than biclusters with higher variance. This, however, leads to an undesirable effect: single-row, single-column submatrices become ideal biclusters as their variance is zero. The issue is resolved by fixing the number of biclusters and minimizing the objective
\[ \mathrm{VAR}(S, F) = \sum_{k=1}^{r} \sum_{i \in F_k} \sum_{j \in S_k} (a_{ij} - \mu_k)^2. \]
Hartigan mentioned that other objective functions may be used to find biclusters with other desirable properties (such as minimizing variance in rows, variance in columns, or biclusters following certain patterns).

2.5.2 Node-Deletion Algorithm

A more sophisticated criterion for constructing patterned biclusters was introduced by Y. Cheng and G. M. Church [30]. It is based on minimization of so-called mean squared residue. To formulate it, let us introduce the following notation. Let
\[ r_{ik} = \frac{1}{|S_k|} \sum_{j \in S_k} a_{ij} \tag{2-2} \]

PAGE 21

be the mean of the $i$-th row in the sample cluster $S_k$,
\[ c_{jk} = \frac{1}{|F_k|} \sum_{i \in F_k} a_{ij} \tag{2-3} \]
be the mean of the $j$-th column in the feature cluster $F_k$, and
\[ \mu_k = \frac{\sum_{i \in F_k} \sum_{j \in S_k} a_{ij}}{|F_k| |S_k|} \]
be the mean value in the bicluster $(S_k, F_k)$. The residue of element $a_{ij}$ is defined as
\[ r_{ij} = a_{ij} - r_{ik} - c_{jk} + \mu_k, \tag{2-4} \]
$i \in F_k$, $j \in S_k$. Finally, the mean squared residue score of the bicluster $(S_k, F_k)$ is defined as
\[ H_k = \sum_{i \in F_k} \sum_{j \in S_k} (r_{ij})^2. \]
This value is equal to zero if all columns of the bicluster are equal to each other (that would imply that all rows are equal too). A bicluster $(S_k, F_k)$ is called a $\delta$-bicluster if $H_k \le \delta$. Cheng and Church proved that finding the largest square $\delta$-bicluster is NP-hard. So, they used a greedy procedure starting from the entire data matrix and successively removing columns or rows contributing most to the mean squared residue score. The brute-force deletion algorithm testing the deletion of each row and column would be still quite expensive in the sense of time complexity as it would require $O((m+n)mn)$ operations. However, the authors employed a simplified search for columns and rows to delete, choosing a column with maximal
\[ d_j = \frac{1}{|F_k|} \sum_{i \in F_k} r_{ij}^2, \]
a row with maximal
\[ d_i = \frac{1}{|S_k|} \sum_{j \in S_k} r_{ij}^2, \]
or subsets of columns or rows for which $d(j)$ or $d(i)$ exceeds a certain threshold above the current mean squared residue of the bicluster. They proved that any such deletion can only decrease the current mean squared residue. These deletions are performed until a $\delta$-bicluster is obtained. Then, as the constructed co-cluster may not be maximal (some of the previously removed columns or rows can be added without violating the $\delta$-bicluster condition), the authors used a column and row addition algorithm. Namely, they proved that adding any column (row) with $d(j)$ ($d(i)$) below the current mean squared residue does not increase it. Therefore, successive addition of such columns and rows leads to a maximal $\delta$-bicluster.

A software implementation of the method as well as some test data sets are available at [31]. K. Bryan et al. improved the node-deletion algorithm of Cheng and Church by applying a simulated annealing technique. They reported a better performance on a variety of data sets in [20].

2.5.3 FLOC Algorithm

J. Yang et al. generalized the definition of residue used in the node-deletion algorithm to allow missing data entries (some $a_{ij}$ may be unknown) [93, 94]. For a bicluster $(S_k, F_k)$, they introduced the notion of alpha-occupancy, meaning that for each sample $j \in S_k$ the number of known data entries $a_{ij}$, $i \in F_k$, is greater than $\alpha |F_k|$, and for each feature $i \in F_k$ the number of known data entries $a_{ij}$, $j \in S_k$, is greater than $\alpha |S_k|$. They also defined the volume of a bicluster as the number of known data entries in the bicluster, and the average values $r_{ik}$, $c_{jk}$ and $\mu_k$ are calculated with respect to the known data entries only.

The authors developed a heuristic algorithm FLOC (flexible overlapped clustering) to find $r$ biclusters with low average residue. First, the biclusters are generated randomly with a chosen probability for each sample and feature to be included in a bicluster. Then, for each feature and sample, and for each bicluster, it is calculated how much the addition of this feature/sample, if it is currently not in the bicluster, or its removal, if it is currently
in the bicluster, reduces the residue of the bicluster. If at least one of such actions reduces the residue, the one achieving the largest reduction is performed. When no further residue reduction is possible, the method stops. It is easy to show that the computational complexity of the method is $O((m+n)mnrp)$, where $p$ is the number of iterations till termination. The authors claim that in the computational experiments they performed $p$ is of the order of 10. The FLOC algorithm is also able to take into account various additional constraints on biclusters by eliminating certain feature/sample additions/removals from consideration.

2.5.4 Biclustering via Spectral Bipartite Graph Partitioning

In [38] I. S. Dhillon proposed the following method of biclustering. Represent each sample and each feature of a data set as a vertex of a graph $G(V, E)$, $|V| = m + n$. Between the vertex corresponding to sample $j = 1 \ldots n$ and the vertex corresponding to feature $i = 1 \ldots m$ introduce an edge with weight $a_{ij}$. The graph has no edges between vertices representing samples, as well as between vertices representing features. Thus, the graph is bipartite with $F$ and $S$ representing its color classes. The graph $G$ has the following weighted adjacency matrix

\[
M = \begin{pmatrix} 0 & A \\ A^T & 0 \end{pmatrix}. \tag{2-5}
\]

Now, a partition of the set of vertices into $r$ parts $V_1, V_2, \ldots, V_r$,

\[
V = V_1 \cup V_2 \cup \ldots \cup V_r, \quad V_k \cap V_\ell = \emptyset, \quad k \ne \ell, \quad k, \ell = 1 \ldots r,
\]

will provide a biclustering of the data set. Define the cost of the partition as the total weight of edges cut by it:

\[
\mathrm{cut}(V_1, \ldots, V_r) = \sum_{k=1}^{r-1} \sum_{\ell=k+1}^{r} \sum_{i \in V_k} \sum_{j \in V_\ell} m_{ij}. \tag{2-6}
\]
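To make the construction concrete, the following minimal sketch builds the adjacency matrix of the bipartite graph and evaluates the cut value of a given vertex partition. It is only an illustration under our own assumptions: the function names `bipartite_adjacency` and `cut_value`, the vertex numbering (features first, then samples) and the toy matrix are hypothetical and are not taken from [38].

```python
def bipartite_adjacency(A):
    """Build M = [[0, A], [A^T, 0]] for a data matrix A (list of rows).

    Assumed numbering: vertices 0..m-1 stand for features (rows of A),
    vertices m..m+n-1 stand for samples (columns of A).
    """
    m, n = len(A), len(A[0])
    M = [[0.0] * (m + n) for _ in range(m + n)]
    for i in range(m):
        for j in range(n):
            M[i][m + j] = float(A[i][j])
            M[m + j][i] = float(A[i][j])
    return M


def cut_value(M, parts):
    """Total weight of edges whose endpoints lie in different parts."""
    part_of = {v: k for k, part in enumerate(parts) for v in part}
    total = 0.0
    for u in range(len(M)):
        for v in range(u + 1, len(M)):  # each undirected edge is counted once
            if part_of[u] != part_of[v]:
                total += M[u][v]
    return total


# Toy data (illustrative only): 2 features x 3 samples.
A = [[5, 4, 0],
     [0, 1, 6]]
M = bipartite_adjacency(A)

# Feature 0 grouped with samples 0 and 1; feature 1 grouped with sample 2.
good_cut = cut_value(M, [[0, 2, 3], [1, 4]])
# Moving sample 1 to the other part cuts the heavier edge of weight 4.
worse_cut = cut_value(M, [[0, 2], [1, 3, 4]])
```

In this toy case the first partition cuts only the edge of weight 1, while the second cuts the edge of weight 4, so the first grouping keeps the large entries inside the biclusters.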

When we are looking for a biclustering maximizing in-class expression values (thus creating dominating submatrices of biclusters), it is natural to seek minimization of the defined cut value. Besides, we should be looking for biclusters that are rather balanced in size, as otherwise the cut value is most probably minimized with all but one bicluster containing one sample-feature pair only. This problem can be tackled with an SVD-related algorithm. Let us introduce the following

Definition 2. The Laplacian matrix $L_G$ of $G(V, E)$ is a $|V| \times |V|$ symmetric matrix, with one row and one column for each vertex, such that

\[
L_{ij} = \begin{cases}
\sum_k m_{ik}, & \text{if } i = j, \\
-m_{ij}, & \text{if } i \ne j \text{ and } (i, j) \in E, \\
0, & \text{otherwise.}
\end{cases}
\]

Let a partition $V = V_1 \cup V_2$ of the graph be defined via a $\pm 1$ vector $p = (p_i)_{i = 1 \ldots |V|}$ such that

\[
p_i = \begin{cases}
+1, & i \in V_1, \\
-1, & i \in V_2.
\end{cases}
\]

The Laplacian matrix is connected to the weight of a cut through the following

Theorem 1. Given the Laplacian matrix $L_G$ of $G$ and a partition vector $p$, the Rayleigh quotient

\[
\frac{p^T L p}{p^T p} = \frac{4}{|V|}\, \mathrm{cut}(V_1, V_2).
\]

By this theorem, the cut is obviously minimized with the trivial solution, i.e., when all $p_i$ are either $-1$ or $1$. So, to achieve a balanced partition we need to modify the objective function. Let us assign a positive weight $w_i$ to each vertex $i \in V$, and let $W = \mathrm{diag}(w_1, w_2, \ldots, w_{|V|})$ be the diagonal matrix of these weights. We denote

\[
\mathrm{weight}(V_\ell) = \sum_{i \in V_\ell} w_i.
\]
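The relation between the Laplacian and the cut stated in Theorem 1 can be checked numerically on a small graph. The sketch below is only a sanity check under our own assumptions: the helper names `laplacian` and `rayleigh_quotient` and the 5-vertex toy graph (two feature vertices followed by three sample vertices) are hypothetical.

```python
def laplacian(M):
    """Laplacian of a weighted graph given by a symmetric adjacency
    matrix M with zero diagonal: degrees on the diagonal, -m_ij elsewhere."""
    size = len(M)
    L = [[-M[i][j] for j in range(size)] for i in range(size)]
    for i in range(size):
        L[i][i] = sum(M[i])  # weighted degree of vertex i
    return L


def rayleigh_quotient(L, p):
    """Compute (p^T L p) / (p^T p)."""
    Lp = [sum(L[i][j] * p[j] for j in range(len(p))) for i in range(len(p))]
    return sum(x * y for x, y in zip(p, Lp)) / sum(x * x for x in p)


# Bipartite adjacency matrix of the toy data matrix [[5, 4, 0], [0, 1, 6]].
M = [[0, 0, 5, 4, 0],
     [0, 0, 0, 1, 6],
     [5, 0, 0, 0, 0],
     [4, 1, 0, 0, 0],
     [0, 6, 0, 0, 0]]
L = laplacian(M)

# Partition vector: vertices 0, 2, 3 in the first part, vertices 1, 4 in the second.
p = [+1, -1, +1, +1, -1]
cut = sum(M[u][v] for u in range(5) for v in range(u + 1, 5) if p[u] != p[v])

lhs = rayleigh_quotient(L, p)      # left-hand side of Theorem 1
rhs = 4.0 * cut / len(p)           # right-hand side of Theorem 1
trivial = rayleigh_quotient(L, [1, 1, 1, 1, 1])  # all vertices in one part
```

Both sides agree for the chosen partition, and the trivial one-part vector gives a zero quotient, which is exactly why the balancing weights are needed.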

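Now, the following objective function allows us to achieve balanced clusters:

\[
Q(V_1, V_2) = \frac{\mathrm{cut}(V_1, V_2)}{\mathrm{weight}(V_1)} + \frac{\mathrm{cut}(V_1, V_2)}{\mathrm{weight}(V_2)}.
\]

Let us denote $\nu_\ell = \sum_{i \in V_\ell} w_i$ and introduce the generalized partition vector with elements

\[
q_i = \begin{cases}
+\sqrt{\nu_2 / \nu_1}, & i \in V_1, \\
-\sqrt{\nu_1 / \nu_2}, & i \in V_2.
\end{cases}
\]

The following theorem generalizes Theorem 1.

Theorem 2.

\[
\frac{q^T L q}{q^T W q} = \frac{\mathrm{cut}(V_1, V_2)}{\mathrm{weight}(V_1)} + \frac{\mathrm{cut}(V_1, V_2)}{\mathrm{weight}(V_2)}. \tag{2-7}
\]

Minimizing the expression (2-7) is NP-hard. However, a relaxed version of this problem can be solved via a generalized eigendecomposition (notice that $q^T W e = 0$, where $e$ is the all-one vector).

Theorem 3. The problem

\[
\min_{x \ne 0} \frac{x^T L x}{x^T W x} \tag{2-8}
\]
\[
\text{s.t. } x^T W e = 0,
\]

is solved when $x$ is the eigenvector corresponding to the second smallest eigenvalue $\lambda_2$ of the generalized eigenvalue problem

\[
L z = \lambda W z. \tag{2-9}
\]

We can solve this problem for the bipartite graph case via the SVD. Choosing the weight matrix $W$ to be equal to the degree matrix, we have

\[
L = \begin{pmatrix} D_1 & -A \\ -A^T & D_2 \end{pmatrix}
\quad \text{and} \quad
W = \begin{pmatrix} D_1 & 0 \\ 0 & D_2 \end{pmatrix},
\]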
where $D_1$ and $D_2$ are diagonal matrices such that $D_1(i, i) = \sum_j a_{ij}$ and $D_2(j, j) = \sum_i a_{ij}$. Then (2-9) becomes

\[
\begin{pmatrix} D_1 & -A \\ -A^T & D_2 \end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix}
= \lambda
\begin{pmatrix} D_1 & 0 \\ 0 & D_2 \end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix},
\]

or, denoting $u = D_1^{1/2} x$ and $v = D_2^{1/2} y$,

\[
D_1^{-1/2} A D_2^{-1/2} v = (1 - \lambda) u,
\]
\[
D_2^{-1/2} A^T D_1^{-1/2} u = (1 - \lambda) v,
\]

which precisely defines the SVD of the normalized matrix $\hat{A} = D_1^{-1/2} A D_2^{-1/2}$. So, the balanced cut minimization problem can be solved by finding the second largest singular value of this normalized matrix and the singular vector pair corresponding to it, which can be used to obtain the biclustering into two classes. In case of multiclass partitioning, Dhillon used $\ell = \lceil \log_2 r \rceil$ singular vectors $u_2, u_3, \ldots, u_{\ell+1}$ and $v_2, v_3, \ldots, v_{\ell+1}$ to form the $\ell$-dimensional data set

\[
Z = \begin{pmatrix} D_1^{-1/2} U \\ D_2^{-1/2} V \end{pmatrix},
\]

where $U = (u_2, \ldots, u_{\ell+1})$ and $V = (v_2, \ldots, v_{\ell+1})$. After such a significant dimensionality reduction is performed, the rows of the matrix $Z$ (which represent both samples and features of the original data set) are clustered with a simple k-means algorithm [65]. Dhillon reports encouraging computational results for text mining problems.

Very similar spectral biclustering routines for microarray data have been suggested by Y. Kluger et al. [59]. In addition to working with the singular vectors of $\hat{A}$, they considered two other normalization methods that can be used before applying the SVD. The first one is bistochastization. It makes all row sums equal and all column sums equal too (generally, to a different constant). It is known from Sinkhorn's theorem that under quite general conditions on the matrix $A$ there exist diagonal matrices $D_1$ and $D_2$ such that
$D_1 A D_2$ achieves bistochastization [6]. The other approach is applicable if sample/feature subvectors within a bicluster are expected to be shifted by a constant with respect to each other (i.e., vectors $a$ and $b$ are considered similar if $a \approx b + \alpha e$, where $\alpha$ is the constant and $e$ is the all-one vector). When similar data are expected to be scaled by different constants (i.e., $a \approx \alpha b$), the desirable property can be achieved by applying a logarithm to all data entries. Then, defining

\[
\bar{a}_{i \cdot} = \frac{1}{n} \sum_{j=1}^{n} a_{ij}, \quad
\bar{a}_{\cdot j} = \frac{1}{m} \sum_{i=1}^{m} a_{ij}, \quad \text{and} \quad
\bar{a}_{\cdot \cdot} = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij},
\]

the normalized data are obtained as

\[
b_{ij} = a_{ij} - \bar{a}_{i \cdot} - \bar{a}_{\cdot j} + \bar{a}_{\cdot \cdot}.
\]

After computing the singular vectors, it is decided which of them contain the relevant information about the optimal data partition. To extract partitioning information from the system of singular vectors, each of them is examined by fitting it to a piecewise constant vector. That is, the entries of a singular vector are sorted and all possible thresholds between classes are considered. Such a procedure is equivalent to searching for good optima in a one-dimensional k-means problem. Then a few of the best singular vectors can be selected to run k-means on the data projected onto them.

2.5.5 Matrix Iteration Algorithms for Minimizing Sum-Squared Residue

H. Cho et al. proposed a co-clustering algorithm minimizing the sum-squared residue throughout all co-clusters [33]. Thus, this approach does not take into account any correspondence between clusters of samples and clusters of features, but considers all the submatrices formed by them. The algorithm is based on algebraic properties of the matrix of residues.

For a given clustering of features $(F_1, F_2, \ldots, F_q)$, introduce a feature cluster indicator matrix $F = (f_{ik})_{m \times q}$ such that $f_{ik} = |F_k|^{-1/2}$ if $i \in F_k$ and $f_{ik} = 0$ otherwise. Also, for a given clustering of samples $(S_1, S_2, \ldots, S_r)$, introduce a sample cluster indicator matrix $S = (s_{jk})_{n \times r}$ such that $s_{jk} = |S_k|^{-1/2}$ if $j \in S_k$ and $s_{jk} = 0$ otherwise. Notice that these matrices are orthonormal, that is, all columns are orthogonal to each other and have unit length.

Now, let $H = (h_{ij})_{m \times n}$ be the residue matrix. There are two choices for the definition of $h_{ij}$. It may be defined similarly to (2-4):

\[
h_{ij} = a_{ij} - r_{ik} - c_{j\ell} + \mu_{k\ell}, \tag{2-10}
\]

where $i \in F_\ell$, $j \in S_k$, $r$ and $c$ are defined as in (2-2) and (2-3), and $\mu_{k\ell}$ is the average of the co-cluster $(S_k, F_\ell)$:

\[
\mu_{k\ell} = \frac{\sum_{i \in F_\ell} \sum_{j \in S_k} a_{ij}}{|F_\ell|\,|S_k|}.
\]

Alternatively, $h_{ij}$ may be defined just as the difference between $a_{ij}$ and the co-cluster average:

\[
h_{ij} = a_{ij} - \mu_{k\ell}. \tag{2-11}
\]

By direct algebraic manipulations it can be shown that

\[
H = A - F F^T A S S^T \tag{2-12}
\]

in case of (2-11) and

\[
H = (I - F F^T) A (I - S S^T) \tag{2-13}
\]

in case of (2-10). The method tries to minimize $\|H\|^2$ using an iterative process such that on each iteration the current co-clustering is updated so that $\|H\|^2$, at least, does not increase. The authors point out that finding the global minimum of $\|H\|^2$ over all possible co-clusterings would lead to an NP-hard problem. There are two types of clustering updates used: batch, when all samples or features may be moved between clusters at one
time, and incremental, when one sample or one feature is moved at a time. In case of (2-11) the batch algorithm works as defined in Algorithm 2-2.

    Input: data matrix $A$, number of sample clusters $r$, number of feature clusters $q$
    Output: clustering indicators $S$ and $F$
    Initialize $S$ and $F$;
    objval $\leftarrow \|A - F F^T A S S^T\|^2$;
    $\Delta \leftarrow 1$, $\tau \leftarrow 10^{-2} \|A\|^2$ {Adjustable};
    while $\Delta > \tau$ do
        $A^S \leftarrow F F^T A S$;
        for $j \leftarrow 1$ to $n$ do
            assign $j$-th sample to cluster $S_k$ with smallest $\|A_{\cdot j} - |S_k|^{-1/2} A^S_{\cdot k}\|^2$;
        end
        update $S$ with respect to the new clustering;
        $A^F \leftarrow F^T A S S^T$;
        for $i \leftarrow 1$ to $m$ do
            assign $i$-th feature to cluster $F_k$ with smallest $\|A_{i \cdot} - |F_k|^{-1/2} A^F_{k \cdot}\|^2$;
        end
        update $F$ with respect to the new clustering;
        oldobj $\leftarrow$ objval, objval $\leftarrow \|A - F F^T A S S^T\|^2$;
        $\Delta \leftarrow |\text{oldobj} - \text{objval}|$;
    end

Figure 2-2. Coclus H1 algorithm

In case of (2-10), it becomes Algorithm 2-3, which is similar but uses slightly different matrix manipulations.

    Input: data matrix $A$, number of sample clusters $r$, number of feature clusters $q$.
    Output: clustering indicators $S$ and $F$.
    Initialize $S$ and $F$;
    objval $\leftarrow \|(I - F F^T) A (I - S S^T)\|^2$;
    $\Delta \leftarrow 1$, $\tau \leftarrow 10^{-2} \|A\|^2$ {Adjustable};
    while $\Delta > \tau$ do
        $A^S \leftarrow (I - F F^T) A S$, $A^P \leftarrow (I - F F^T) A$;
        for $j \leftarrow 1$ to $n$ do
            assign $j$-th sample to cluster $S_k$ with smallest $\|A^P_{\cdot j} - |S_k|^{-1/2} A^S_{\cdot k}\|^2$;
        end
        update $S$ with respect to the new clustering;
        $A^F \leftarrow F^T A (I - S S^T)$, $A^P \leftarrow A (I - S S^T)$;
        for $i \leftarrow 1$ to $m$ do
            assign $i$-th feature to cluster $F_k$ with smallest $\|A^P_{i \cdot} - |F_k|^{-1/2} A^F_{k \cdot}\|^2$;
        end
        update $F$ with respect to the new clustering;
        oldobj $\leftarrow$ objval, objval $\leftarrow \|(I - F F^T) A (I - S S^T)\|^2$;
        $\Delta \leftarrow |\text{oldobj} - \text{objval}|$;
    end

Figure 2-3. Coclus H2 algorithm

To describe the incremental algorithm, we first note that in case of (2-11) $H$ is defined as in (2-12), and, since $F$ and $S$ have orthonormal columns, $\|H\|^2 = \|A\|^2 - \|F^T A S\|^2$, so minimization of $\|H\|^2$ is equivalent to maximization of $\|F^T A S\|^2$. So, suppose we would like to improve the objective function by moving a sample from cluster $S_k$ to cluster $S_{k'}$. Denote $F^T A$ by $\bar{A}$ and the new sample clustering indicator matrix by $\tilde{S}$. As $S$ and $\tilde{S}$ differ only in columns $k$ and $k'$, the change of the objective can be written as

\[
\|\bar{A} \tilde{S}_{k'}\|^2 - \|\bar{A} S_{k'}\|^2 + \|\bar{A} \tilde{S}_k\|^2 - \|\bar{A} S_k\|^2. \tag{2-14}
\]

So, the inner loop of the incremental algorithm looks through all possible one-sample moves and chooses the one increasing (2-14) most. A similar expression can be derived for features. Next, it can be shown that in case of (2-10), when $H$ is defined as in (2-13), the objective can be reduced to

\[
\|A \tilde{S}_{k'}\|^2 - \|A S_{k'}\|^2 + \|A \tilde{S}_k\|^2 - \|A S_k\|^2 - \|\bar{A} \tilde{S}_{k'}\|^2 + \|\bar{A} S_{k'}\|^2 - \|\bar{A} \tilde{S}_k\|^2 + \|\bar{A} S_k\|^2, \tag{2-15}
\]

so the incremental algorithm just uses (2-15) instead of (2-14).

Notice the direct relation of the method to the SVD. Maximization of $\|F^T A S\|^2$, if $F$ and $S$ were just constrained to be orthonormal matrices, would be solved by $F = U$ and $S = V$, where $U$ and $V$ are as in (2-1). $F$ and $S$ have the additional constraint on their structure (being a clustering indicator). However, the SVD helps to initialize the clustering indicator matrices and provides a lower bound on the objective as the sum of squares of the singular values. Software with the implementation of both cases of this method is available at [34].
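The algebra behind the residue matrix can be verified on a toy example. The sketch below, written under our own assumptions, checks that $F F^T A S S^T$ replaces every entry by its co-cluster average, and that $\|A - F F^T A S S^T\|^2 = \|A\|^2 - \|F^T A S\|^2$ when $F$ and $S$ are normalized cluster indicator matrices. The helper names (`matmul`, `transpose`, `indicator`, `frob2`), the data and the cluster labels are hypothetical and do not come from [33] or [34].

```python
import math


def matmul(X, Y):
    """Plain matrix product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def indicator(labels, n_clusters):
    """Normalized cluster indicator matrix: entry (i, k) equals
    |cluster k|^(-1/2) if item i is in cluster k and 0 otherwise."""
    sizes = [labels.count(k) for k in range(n_clusters)]
    return [[1.0 / math.sqrt(sizes[k]) if lab == k else 0.0
             for k in range(n_clusters)] for lab in labels]


def frob2(X):
    """Squared Frobenius norm."""
    return sum(x * x for row in X for x in row)


# Toy data (illustrative only): 3 features x 4 samples.
A = [[1, 3, 10, 12],
     [5, 7, 14, 16],
     [2, 2, 8, 8]]
F = indicator([0, 0, 1], 2)      # features {0, 1} and {2}
S = indicator([0, 0, 1, 1], 2)   # samples {0, 1} and {2, 3}

# P = F F^T A S S^T holds the co-cluster average at every position.
P = matmul(matmul(matmul(F, transpose(F)), A), matmul(S, transpose(S)))
# Residue matrix with entries a_ij minus the co-cluster average.
H = [[A[i][j] - P[i][j] for j in range(4)] for i in range(3)]

FtAS = matmul(matmul(transpose(F), A), S)
objective = frob2(H)
via_projection = frob2(A) - frob2(FtAS)
```

Here the co-cluster averages are 4, 13, 2 and 8, the sum-squared residue is 40, and the same value is obtained as $\|A\|^2 - \|F^T A S\|^2 = 916 - 876$, which illustrates why lowering the residue amounts to raising $\|F^T A S\|^2$ in this setting.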

2.5.6 Double Conjugated Clustering

Double conjugated clustering (DCC) is a node-driven biclustering technique that can be considered a further development of such clustering methods as k-means [65] and self-organizing maps (SOM) [60]. The method was developed by S. Busygin et al. [23]. It operates in two spaces, the space of samples and the space of features, applying in each of them either k-means or SOM training iterations. Meanwhile, after each one-space iteration its result updates the other map of clusters by means of a matrix projection.

The method works as follows. Introduce a matrix $C = (c_{ik})_{m \times r}$, which will be referred to as samples nodes or samples map, and a matrix $D = (d_{jk})_{n \times r}$, which will be referred to as features nodes or features map. This designates $r$ nodes for samples and $r$ nodes for features that will be used for one-space clustering iterations such as k-means or SOM (in the latter case, the nodes are to be arranged with respect to a certain topology that will determine node neighborhoods). We start from the samples map, initialize it with random numbers and perform a one-space clustering iteration (for instance, in case of k-means we assign each sample to the closest node and then update each node, storing in it the centroid of the assigned samples). Now the content of $C$ is projected to form $D$ with a matrix transformation:

\[
D := B(A^T C),
\]

where $B(M)$ is the operator normalizing each column of matrix $M$ to unit length. The matrix multiplication that transforms nodes of one space to the other can be justified with the following argument. The value $c_{ik}$ is the weight of the $i$-th feature in the $k$-th node. So, the $k$-th node of the features map is constructed as a linear combination of the features such that $c_{ik}$ is the coefficient of the $i$-th feature in it. The unit normalization keeps the magnitude of node vectors constrained. Next, after the projection, the features map is updated with a similar one-space clustering iteration, and then the backwards projection
is applied:

\[
C := B(A D),
\]

which is justified in a similar manner using the fact that $d_{jk}$ is the weight of the $j$-th sample in the $k$-th node. This cycle is repeated until no samples and features are moved anymore, or stops after a predefined number of iterations.

To be consistent with unit normalization of the projected nodes, the authors have chosen to use the cosine metric for one-space iterations, which is not affected by differences in magnitudes of the clustered vectors. This also prevents undesirable clustering of all low-magnitude elements into a single cluster, which often happens when a node-driven clustering is performed using the Euclidean metric.

The DCC method has a close connection to the SVD that can be observed in its computational routine. Notice that if one "forgets" to perform the one-space clustering iterations, then DCC executes nothing else but the power method for the SVD [46]. In such a case all samples nodes would converge to the dominating left singular vector and all features nodes would converge to the dominating right singular vector of the data matrix. However, the one-space iterations prevent this from happening by moving the nodes towards centroids of different sample/feature clusters. This acts similarly to re-orthogonalization in the power method when not only the dominating but also a number of subsequent singular vector pairs are sought. This way DCC can be seen as an alteration of the power method for the SVD, relaxing the orthogonality requirement for the iterated vectors but making them more appealing to groups of similar samples/features of the data.

2.5.7 Information-Theoretic Based Co-Clustering

In this method, developed by I. Dhillon et al. in [39], we treat the input data set $(a_{ij})_{m \times n}$ as a joint probability distribution $p(X, Y)$ between two discrete random variables $X$ and $Y$, which can take values in the sets $\{x_1, x_2, \ldots, x_m\}$ and $\{y_1, y_2, \ldots, y_n\}$, respectively.

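Formally speaking, the goal of the proposed procedure is to cluster $X$ into at most $k$ disjoint clusters $\hat{X} = \{\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_k\}$ and $Y$ into at most $l$ disjoint clusters $\hat{Y} = \{\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_l\}$. Put differently, we are looking for mappings $C_X$ and $C_Y$ such that

\[
C_X : \{x_1, x_2, \ldots, x_m\} \longrightarrow \{\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_k\},
\]
\[
C_Y : \{y_1, y_2, \ldots, y_n\} \longrightarrow \{\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_l\},
\]

i.e., $\hat{X} = C_X(X)$ and $\hat{Y} = C_Y(Y)$, and a tuple $(C_X, C_Y)$ is referred to as a co-clustering.

Before we proceed with a description of the technique, let us recall some definitions from probability and information theory. The relative entropy, or the Kullback-Leibler (KL) divergence, between two probability distributions $p_1(x)$ and $p_2(x)$ is defined as

\[
D(p_1 \| p_2) = \sum_x p_1(x) \log \frac{p_1(x)}{p_2(x)}.
\]

The Kullback-Leibler divergence can be considered as a "distance" of a "true" distribution $p_1$ to an approximation $p_2$.

The mutual information $I(X; Y)$ of two random variables $X$ and $Y$ is the amount of information shared between these two variables. In other words, $I(X; Y) = I(Y; X)$ measures how much $X$ tells about $Y$ and, vice versa, how much $Y$ tells about $X$. It is defined as

\[
I(X; Y) = \sum_y \sum_x p(x, y) \log \frac{p(x, y)}{p(x) p(y)} = D(p(x, y) \| p(x) p(y)).
\]

Now, we are looking for an optimal co-clustering, which minimizes the loss in mutual information

\[
\min_{\hat{X}, \hat{Y}} \; I(X; Y) - I(\hat{X}; \hat{Y}). \tag{2-16}
\]

Define $q(X, Y)$ to be the following distribution

\[
q(x, y) = p(\hat{x}, \hat{y})\, p(x | \hat{x})\, p(y | \hat{y}), \tag{2-17}
\]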
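where $x \in \hat{x}$ and $y \in \hat{y}$. Obviously, $p(x | \hat{x}) = p(x) / p(\hat{x})$ if $\hat{x} = C_X(x)$ and $0$ otherwise. The following result states an important relation between the loss of information and the distribution $q(X, Y)$ [5]:

Lemma 1. For a fixed co-clustering $(C_X, C_Y)$, we can write the loss in mutual information as

\[
I(X; Y) - I(\hat{X}; \hat{Y}) = D(p(X, Y) \| q(X, Y)). \tag{2-18}
\]

In other words, finding an optimal co-clustering is equivalent to finding a distribution $q$ defined by (2-17), which is close to $p$ in KL divergence.

Consider the joint distribution of $X$, $Y$, $\hat{X}$ and $\hat{Y}$ denoted by $p(X, Y, \hat{X}, \hat{Y})$. Following the above lemma and (2-17), we are looking for a distribution $q(X, Y, \hat{X}, \hat{Y})$, an approximation of $p(X, Y, \hat{X}, \hat{Y})$, such that

\[
q(x, y, \hat{x}, \hat{y}) = p(\hat{x}, \hat{y})\, p(x | \hat{x})\, p(y | \hat{y}),
\]

and $p(X, Y)$ and $q(X, Y)$ are considered as two-dimensional marginals of $p(X, Y, \hat{X}, \hat{Y})$ and $q(X, Y, \hat{X}, \hat{Y})$, respectively. The next lemma lies at the core of the proposed algorithm from [39].

Lemma 2. The loss in mutual information can be expressed as

(i) a weighted sum of the relative entropies between row distributions $p(Y | x)$ and "row-lumped" distributions $q(Y | \hat{x})$,

\[
D(p(X, Y, \hat{X}, \hat{Y}) \| q(X, Y, \hat{X}, \hat{Y})) = \sum_{\hat{x}} \sum_{x : C_X(x) = \hat{x}} p(x)\, D(p(Y | x) \| q(Y | \hat{x}));
\]

(ii) a weighted sum of the relative entropies between column distributions $p(X | y)$ and "column-lumped" distributions $q(X | \hat{y})$, that is,

\[
D(p(X, Y, \hat{X}, \hat{Y}) \| q(X, Y, \hat{X}, \hat{Y})) = \sum_{\hat{y}} \sum_{y : C_Y(y) = \hat{y}} p(y)\, D(p(X | y) \| q(X | \hat{y})).
\]

Due to Lemma 2 the objective function can be expressed only in terms of the row-clustering, or the column-clustering. Starting with some initial co-clustering $(C_X^0, C_Y^0)$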
and distribution $q^0$ we iteratively obtain new co-clusterings $(C_X^1, C_Y^1)$, $(C_X^2, C_Y^2)$, $\ldots$, using the column-clustering in order to improve the row-clustering as

\[
C_X^{t+1}(x) = \arg\min_{\hat{x}} D(p(Y | x) \| q^t(Y | \hat{x})), \tag{2-19}
\]

and, vice versa, using the row-clustering to improve the column-clustering as

\[
C_Y^{t+2}(y) = \arg\min_{\hat{y}} D(p(X | y) \| q^{t+1}(X | \hat{y})). \tag{2-20}
\]

Obviously, after each step (2-19) or (2-20) we need to recalculate the necessary distributions $q^{t+1}$ and $q^{t+2}$. It can be proved that the described algorithm monotonically decreases the objective function (2-16), though it may converge only to a local minimum [39].

Software with the implementation of this method is available at [34].

In [5] the described alternating minimization scheme was generalized to Bregman divergences, which include the KL divergence and the Euclidean distance as special cases.

2.5.8 Biclustering via Gibbs Sampling

The Bayesian framework can be a powerful tool to tackle problems involving uncertainty and noisy patterns. Thus it comes as a natural choice to apply it to data mining problems such as biclustering. Q. Sheng et al. proposed a Bayesian technique for biclustering based on a simple frequency model for the expression pattern of a bicluster and on Gibbs sampling for parameter estimation [81]. This approach not only finds samples and features of a bicluster but also represents the pattern of a bicluster as a probabilistic model defined by the posterior distribution for the data values within the bicluster. The choice of Gibbs sampling also helps to avoid local minima in the Expectation-Maximization procedure that is used to obtain and adjust the probabilistic model.

Gibbs sampling is a well-known Markov chain Monte Carlo method [29]. It is used to sample random variables $(x_1, x_2, \ldots, x_k)$ when the marginal distributions of their joint distribution are too complex to sample from directly, but the conditional distributions
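can be easily sampled. Starting from initial values $(x_1^0, x_2^0, \ldots, x_k^0)$, the Gibbs sampler draws values of the variables from the conditional distributions:

\[
x_i^{t+1} \sim p(x_i \,|\, x_1^{t+1}, \ldots, x_{i-1}^{t+1}, x_{i+1}^t, \ldots, x_k^t),
\]

$i = 1 \ldots k$, $t = 0, 1, 2, \ldots$. It can be shown that the distribution of $(x_1^t, x_2^t, \ldots, x_k^t)$ converges to the true joint distribution $p(x_1, x_2, \ldots, x_k)$ and the distributions of the sequences $\{x_1^t\}$, $\{x_2^t\}$, $\ldots$, $\{x_k^t\}$ converge to the true marginal distributions of the corresponding variables.

The biclustering method works with $m + n$ 0-1 values $f = (f_i)_{i = 1 \ldots m}$ (for features) and $s = (s_j)_{j = 1 \ldots n}$ (for samples) indicating which features and samples are selected for the bicluster. These indicators are considered Bernoulli random variables with parameters $\lambda_f$ and $\lambda_s$, respectively. The data are discretized and modeled with multinomial distributions. The background data (i.e., all the data that do not belong to the bicluster) are considered to follow one single distribution $\phi = (\phi_1, \phi_2, \ldots, \phi_\ell)$, $0 \le \phi_k \le 1$, $\sum_k \phi_k = 1$, $k = 1, \ldots, \ell$, where $\ell$ is the total number of bins used for discretization. It is assumed that within the bicluster all features should behave similarly, but the samples are allowed to have different expression levels. That is, for the data values of each sample $j$ within the bicluster we assume a different distribution $(\theta_{1j}, \theta_{2j}, \ldots, \theta_{\ell j})$, $0 \le \theta_{kj} \le 1$, $\sum_k \theta_{kj} = 1$, $k = 1, \ldots, \ell$, and it is independent of the other samples. The probabilities $\lambda_f$, $\lambda_s$, $\{\phi_k\}$ and $\{\theta_{kj}\}$ are parameters of this Bayesian model, and therefore we need to include their conjugate priors in the model. Typically for Bayesian models, one chooses the Beta distribution for the conjugate priors of Bernoulli random variables and the Dirichlet distribution for the conjugate priors of multinomial random variables:

\[
\phi \sim \mathrm{Dirichlet}(\alpha), \quad \theta_{\cdot j} \sim \mathrm{Dirichlet}(\beta_j), \quad \lambda_f \sim \mathrm{Beta}(\xi_f), \quad \lambda_s \sim \mathrm{Beta}(\xi_s),
\]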
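where $\alpha$ and $\beta_j$ are parameter vectors of the Dirichlet distributions, and $\xi_f$ and $\xi_s$ are parameter vectors of the Beta distributions.

Denote the subvector of $s$ with the $j$-th component removed by $s_{\bar{j}}$ and the subvector of $f$ with the $i$-th component removed by $f_{\bar{i}}$. To derive the full conditional distributions, one can use the relations between distributions

\[
p(f_i \,|\, f_{\bar{i}}, s, D) \propto p(f_i, f_{\bar{i}}, s, D) = p(f, s, D)
\]

and

\[
p(s_j \,|\, f, s_{\bar{j}}, D) \propto p(f, s_j, s_{\bar{j}}, D) = p(f, s, D),
\]

where $D$ is the observed discretized data. The distribution $p(f, s, D)$ can be obtained by integrating $\theta$, $\phi$, $\lambda_f$ and $\lambda_s$ out of the likelihood function $L(\theta, \phi, \lambda_f, \lambda_s \,|\, f, s, D)$:

\[
L(\theta, \phi, \lambda_f, \lambda_s \,|\, f, s, D) = p(f, s, D \,|\, \theta, \phi, \lambda_f, \lambda_s) = p(D \,|\, f, s, \theta, \phi)\, p(f \,|\, \lambda_f)\, p(s \,|\, \lambda_s).
\]

Using these conditional probabilities, we can perform the biclustering with Algorithm 2-4.

    Initialize randomly vectors $f$ and $s$;
    repeat
        for $i \leftarrow 1$ to $m$ do  // each feature
            $p_i \leftarrow p(f_i = 1 \,|\, f_{\bar{i}}, s, D)$;
            assign $f_i \leftarrow 1$ with probability $p_i$ and $f_i \leftarrow 0$ otherwise;
        end
        for $j \leftarrow 1$ to $n$ do  // each sample
            $p_j \leftarrow p(s_j = 1 \,|\, f, s_{\bar{j}}, D)$;
            assign $s_j \leftarrow 1$ with probability $p_j$ and $s_j \leftarrow 0$ otherwise;
        end
    until the number of iterations exceeds a predetermined number;

Figure 2-4. Gibbs biclustering algorithm

To obtain the biclustering, the probabilities $p_i$ and $p_j$ are averaged over all iterations, and a feature/sample is selected for the bicluster if the average probability corresponding to it is above a certain threshold. More than one bicluster can be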
constructed by repeating the procedure while the probabilities corresponding to previously selected samples and features are permanently assigned to zero.

2.5.9 Statistical-Algorithmic Method for Bicluster Analysis (SAMBA)

Consider a bipartite graph $G(F, S, E)$, where the set of data features $F$ and the set of data samples $S$ form two independent sets, and there is an edge $(i, j) \in E$ between feature $i$ and sample $j$ iff the expression level of feature $i$ changes significantly in sample $j$. Obviously, a bicluster $B' = (S', F')$ should correspond to a subgraph $H(F', S', E')$ of $G$. Next, assign some weights to the edges and non-edges of $G$ in such a way that the statistical significance of a bicluster matches the weight of the respective subgraph. Hence, in this setup biclustering is reduced to a search for heavy subgraphs in $G$. This idea is a cornerstone of the statistical-algorithmic method for bicluster analysis (SAMBA) developed by Tanay et al. [83, 85]. Some additional details on the construction of a bipartite graph $G(F, S, E)$ corresponding to features and samples can be found in the supporting information of [84].

The idea behind one of the possible schemes for edge weight assignment from [85] works as follows. Let $p_{f,s}$ be the fraction of bipartite graphs with the same degree sequence as in $G$ such that the edge $(f, s) \in E$. Suppose that the occurrence of an edge $(f, s)$ is an independent Bernoulli random variable with parameter $p_{f,s}$. In this case, the probability of observing subgraph $H$ is given by

\[
p(H) = \Bigg( \prod_{(f,s) \in E'} p_{f,s} \Bigg) \Bigg( \prod_{(f,s) \notin E'} (1 - p_{f,s}) \Bigg). \tag{2-21}
\]

Next, consider another model, where edges between vertices from different partitions of a bipartite graph $G$ occur independently with constant probability $p_c > \max_{(f,s) \in (F,S)} p_{f,s}$. Assigning weights $\log \frac{p_c}{p_{f,s}}$ to edges $(f, s) \in E'$ and $\log \frac{1 - p_c}{1 - p_{f,s}}$ to $(f, s) \notin E'$, we can observe that the log-likelihood ratio for a subgraph $H$

\[
\log L(H) = \sum_{(f,s) \in E'} \log \frac{p_c}{p_{f,s}} + \sum_{(f,s) \notin E'} \log \frac{1 - p_c}{1 - p_{f,s}} \tag{2-22}
\]
is equal to the weight of the subgraph $H$. If we assume that we are looking for biclusters with the features behaving similarly within the set of samples of the respective bicluster, then heavy subgraphs should correspond to "good" biclusters.

In [85] the algorithm for finding heavy subgraphs (biclusters) is based on the procedure for solving the maximum bounded biclique problem. In this problem we are looking for a maximum weight biclique in a bipartite graph $G(F, S, E)$ such that the degree of every feature vertex $f \in F$ is at most $d$. It can be shown that the maximum bounded biclique problem can be solved in $O(n 2^d)$ time. At the first step of SAMBA, for each vertex $f \in F$ we find the $k$ heaviest bicliques containing $f$. During the next phase of the algorithm we try to improve the weight of the obtained subgraphs (biclusters) using a simple local search procedure. Finally, we greedily filter out biclusters with more than $L\%$ overlap.

A SAMBA implementation is available as a part of EXPANDER, a gene expression analysis and visualization tool, at [79].

2.5.10 Coupled Two-way Clustering

Coupled two-way clustering (CTWC) is a framework that can be used to build a biclustering on the basis of any one-way clustering algorithm. It was introduced by G. Getz, E. Levine and E. Domany in [43]. The idea behind the method is to find stable clusters of samples and features such that using one of the feature clusters results in a stable clustering of samples and vice versa. The iterative procedure runs as follows. Initially the entire set of samples $S_0^0$ and the entire set of features $F_0^0$ are considered stable clusters. $F_0^0$ is used to cluster samples and $S_0^0$ is used to cluster features. Denote by $\{F_i^1\}$ and $\{S_j^1\}$ the obtained clusters (which are considered stable with respect to $F_0^0$ and $S_0^0$). Now every pair $(F_i^s, S_j^t)$, $t, s \in \{0, 1\}$, corresponds to a data submatrix, which can be clustered in a similar two-way manner to obtain clusters of the second order $\{F_i^2\}$ and $\{S_j^2\}$. Then the process is repeated again with each pair $(F_i^s, S_j^t)$ not used earlier to obtain the clusters of
the next order, and so on until no new cluster satisfying certain criteria is obtained. The criteria used can impose constraints on cluster size, some statistical characteristics, etc.

Though any one-way clustering algorithm can be used within the described iterative two-way clustering procedure, the authors chose a hierarchical clustering method SPC [12, 40]. The justification of this choice comes from the natural measure of relative cluster stability delivered by SPC. The SPC method originates from a physical model associating a breakup of a cluster with a certain temperature at which this cluster loses stability. Therefore, it is easy to designate more stable clusters as those requiring a higher temperature for further partitioning.

An online implementation of CTWC is available at [88].

2.5.11 Plaid Models

Consider the perfect idealized biclustering situation. We have $K$ biclusters along the main diagonal of the data matrix $A = (a_{ij})_{m \times n}$ with the same values of $a_{ij}$ in each bicluster $k$, $k = 1, \ldots, K$:

\[
a_{ij} = \mu_0 + \sum_{k=1}^{K} \mu_k \rho_{ik} \kappa_{jk}, \tag{2-23}
\]

where $\mu_0$ is some constant value ("background color"), $\rho_{ik} = 1$ if feature $i$ belongs to bicluster $k$ ($\rho_{ik} = 0$ otherwise), $\kappa_{jk} = 1$ if sample $j$ belongs to bicluster $k$ ($\kappa_{jk} = 0$ otherwise), and $\mu_k$ is the value which corresponds to bicluster $k$ ("color" of bicluster $k$), i.e., $a_{ij} = \mu_0 + \mu_k$ if feature $i$ and sample $j$ belong to the same bicluster $k$. We also require that each feature and each sample must belong to exactly one bicluster, that is,

\[
\forall i \; \sum_{k=1}^{K} \rho_{ik} = 1 \quad \text{and} \quad \forall j \; \sum_{k=1}^{K} \kappa_{jk} = 1, \tag{2-24}
\]

respectively.

In [61] Lazzeroni and Owen introduced a more complicated plaid model as a natural generalization of the idealization (2-23), (2-24). In this model, biclusters are allowed to overlap, and are referred to as layers. The values of $a_{ij}$ in each layer are represented as

\[
a_{ij} = \theta_{ij0} + \sum_{k=1}^{K} \theta_{ijk} \rho_{ik} \kappa_{jk}, \tag{2-25}
\]
where the value of $\theta_{ij0}$ corresponds to a background layer and $\theta_{ijk}$ can be expressed as $\mu_k$, $\mu_k + \alpha_{ik}$, $\mu_k + \beta_{jk}$, or $\mu_k + \alpha_{ik} + \beta_{jk}$ depending on a particular situation.

We are looking for a plaid model such that the following objective function is minimized:

\[
\min \sum_{i=1}^{m} \sum_{j=1}^{n} \Bigg( a_{ij} - \theta_{ij0} - \sum_{k=1}^{K} \theta_{ijk} \rho_{ik} \kappa_{jk} \Bigg)^2. \tag{2-26}
\]

In [61] the authors developed an iterative heuristic algorithm for solving (2-26). Next we briefly describe the main idea of the approach.

Suppose we have $K - 1$ layers and we are looking for the $K$-th layer such that the objective function in (2-26) is minimized. Let

\[
Z_{ij} = Z_{ij}^{K-1} = a_{ij} - \theta_{ij0} - \sum_{k=1}^{K-1} \theta_{ijk} \rho_{ik} \kappa_{jk}. \tag{2-27}
\]

Substituting $\theta_{ijK}$ by $\mu_K + \alpha_{iK} + \beta_{jK}$, the objective function from (2-26) can be rewritten in terms of $Z_{ij}$ as

\[
\sum_{i=1}^{m} \sum_{j=1}^{n} \big( Z_{ij} - (\mu_K + \alpha_{iK} + \beta_{jK}) \rho_{iK} \kappa_{jK} \big)^2. \tag{2-28}
\]

Let $\rho_{iK}^0$ and $\kappa_{jK}^0$ be some starting values of our iteration algorithm. At each iteration step $s = 1, 2, \ldots, S$ we update the values of $\rho_{iK}^s$, $\kappa_{jK}^s$ and $\theta_{ijK}^s$ applying the following simple procedure. The value of $\theta_{ijK}^s$ is obtained from $\rho_{iK}^{s-1}$ and $\kappa_{jK}^{s-1}$, then the values $\rho_{iK}^{s-1}$ and $\kappa_{jK}^{s-1}$ are updated to $\rho_{iK}^s$ and $\kappa_{jK}^s$ using $\theta_{ijK}^s$ and $\kappa_{jK}^{s-1}$, or $\theta_{ijK}^s$ and $\rho_{iK}^{s-1}$, respectively. The variables $\rho_{iK}^s$ and $\kappa_{jK}^s$ are relaxed, i.e., they can take values between 0 and 1. We fix them to be in $\{0, 1\}$ during one of the last iterations of the algorithm.

More specifically, given $\rho_{iK}$ and $\kappa_{jK}$, the value of $\theta_{ijK} = \mu_K + \alpha_{iK} + \beta_{jK}$ is updated as follows:

\[
\mu_K = \frac{\sum_i \sum_j \rho_{iK} \kappa_{jK} Z_{ij}}{\big( \sum_i \rho_{iK}^2 \big) \big( \sum_j \kappa_{jK}^2 \big)},
\]
\[
\alpha_{iK} = \frac{\sum_j (Z_{ij} - \mu_K \rho_{iK} \kappa_{jK}) \kappa_{jK}}{\rho_{iK} \sum_j \kappa_{jK}^2},
\]
\[
\beta_{jK} = \frac{\sum_i (Z_{ij} - \mu_K \rho_{iK} \kappa_{jK}) \rho_{iK}}{\kappa_{jK} \sum_i \rho_{iK}^2}.
\]
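The update of the layer mean and of the row and column effects can be sketched in a few lines. This is a minimal illustration under our own assumptions, not the implementation from [61] or [62]: the function name `update_layer_effects`, the toy residual matrix, and in particular the convention of setting an effect to zero when the corresponding membership is zero (where the update formula would divide by zero) are hypothetical choices.

```python
def update_layer_effects(Z, rho, kappa):
    """One update of the layer mean mu, feature effects alpha and sample
    effects beta for the newest layer, given the residual matrix Z and
    (possibly fractional) memberships rho (features) and kappa (samples)."""
    m, n = len(Z), len(Z[0])
    rho2 = sum(r * r for r in rho)
    kappa2 = sum(k * k for k in kappa)

    mu = sum(rho[i] * kappa[j] * Z[i][j]
             for i in range(m) for j in range(n)) / (rho2 * kappa2)

    # Assumption: a feature or sample with zero membership gets a zero effect,
    # because the update formula is undefined (division by zero) for it.
    alpha = [
        sum((Z[i][j] - mu * rho[i] * kappa[j]) * kappa[j] for j in range(n))
        / (rho[i] * kappa2) if rho[i] != 0 else 0.0
        for i in range(m)
    ]
    beta = [
        sum((Z[i][j] - mu * rho[i] * kappa[j]) * rho[i] for i in range(m))
        / (kappa[j] * rho2) if kappa[j] != 0 else 0.0
        for j in range(n)
    ]
    return mu, alpha, beta


# Toy residuals (illustrative only): one additive layer on rows {0, 1} and columns {0, 1}.
Z = [[5.0, 6.0, 0.0],
     [7.0, 8.0, 0.0],
     [0.0, 0.0, 0.0]]
rho = [1.0, 1.0, 0.0]
kappa = [1.0, 1.0, 0.0]

mu, alpha, beta = update_layer_effects(Z, rho, kappa)
# Fitted layer values mu + alpha_i + beta_j on the selected 2 x 2 block.
fitted = [[mu + alpha[i] + beta[j] for j in range(2)] for i in range(2)]
```

Because the toy block is exactly additive, a single update already reproduces it: the layer mean is 6.5, the feature effects are -1 and 1, and the sample effects are -0.5 and 0.5.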

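Given $\theta_{ijK}$ and $\kappa_{jK}$, or $\theta_{ijK}$ and $\rho_{iK}$, we update $\rho_{iK}$ or $\kappa_{jK}$ as

\[
\rho_{iK} = \frac{\sum_j \theta_{ijK} \kappa_{jK} Z_{ij}}{\sum_j \theta_{ijK}^2 \kappa_{jK}^2}
\quad \text{or} \quad
\kappa_{jK} = \frac{\sum_i \theta_{ijK} \rho_{iK} Z_{ij}}{\sum_i \theta_{ijK}^2 \rho_{iK}^2},
\]

respectively.

For more details of this technique, including the selection of the starting values $\rho_{iK}^0$ and $\kappa_{jK}^0$, stopping rules and other important issues, we refer the reader to [61].

Software with the implementation of the discussed method is available at [62].

2.5.12 Order-Preserving Submatrix (OPSM) Problem

In this model, introduced by Ben-Dor et al. [9, 10], given the data set $A = (a_{ij})_{m \times n}$, the problem is to identify a $k \times \ell$ submatrix (bicluster) $(F', S')$ such that the expression values of all features in $F'$ increase or decrease simultaneously within the set of samples $S'$. In other words, in this submatrix we can find a permutation of columns such that in every row the values corresponding to the selected columns are increasing. More formally, let $F'$ be a set of row indices $\{f_1, f_2, \ldots, f_k\}$. Then there exists a permutation of $S'$, which consists of column indices $\{s_1, s_2, \ldots, s_\ell\}$, such that for all $i = 1, \ldots, k$ and $j = 1, \ldots, \ell - 1$ we have that $a_{f_i, s_j} < a_{f_i, s_{j+1}}$.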

A partial model $\theta = \{\langle s_1, \ldots, s_c \rangle, \langle s_{\ell-d+1}, \ldots, s_\ell \rangle, \ell\}$ of a complete model $(S', \pi)$ is given by the column indices of the $c$ smallest elements $\langle s_1, \ldots, s_c \rangle$, the column indices of the $d$ largest elements $\langle s_{\ell-d+1}, \ldots, s_\ell \rangle$ and the size $\ell$. We say that $\theta$ is a partial model of order $(c, d)$. Obviously, a model of order $(c, d)$ becomes complete if $c + d = \ell$. The idea of the algorithm from [9, 10] is to increase $c$ and $d$ in the partial model until we get a good quality complete model. The total number of partial models of order $(1, 1)$ in the matrix with $n$ columns is $n(n - 1)$. At the first step of the algorithm we select the $t$ best partial models of order $(1, 1)$. Next we try to derive partial models of order $(2, 1)$ from the selected partial models of order $(1, 1)$, and pick the $t$ best models of order $(2, 1)$. At step two we try to extend them to partial models of order $(2, 2)$. We continue this process until we get $t$ models of order $(\lceil \ell/2 \rceil, \lceil \ell/2 \rceil)$. The overall complexity of the algorithm is $O(t n^3 m)$ [9, 10].

2.5.13 OP-Cluster

The order preserving cluster model (OP-Cluster) was proposed by J. Liu and W. Wang in [63]. This model is similar to the OPSM model discussed above and can be considered, in some sense, as its generalization. It aims at finding biclusters where the features follow the same order of values in all the samples. However, when two feature values in a sample are close enough, they are considered indistinguishable and allowed to be in any order in the sample. Formally, if features $i, i+1, \ldots, i + \Delta i$ are ordered in a non-decreasing sequence in a sample $j$ (i.e., $a_{ij} \le a_{i+1,j} \le \ldots \le a_{i+\Delta i,j}$) and a user-specified grouping threshold $\delta > 0$ is given, the sample $j$ is called similar on these attributes if $a_{i+\Delta i,j} - a_{ij} <$

choice $G(\delta, a_{ij}) = \delta a_{ij}$. Next, a sequence of features is said to show an UP pattern in a sample if it can be partitioned into groups so that the pivot point of each group is not smaller than the preceding value in the sequence. Finally, a bicluster is called an order-preserving cluster (OP-Cluster) if there exists a permutation of its features such that they all show an UP pattern.

The authors presented an algorithm for finding OP-Clusters with no less than the required number of samples $n_s$ and number of features $n_f$. The algorithm essentially searches through all ordered subsequences of features existing in samples to find maximal common ones, but due to a representation of feature sequences in a tree form allowing for an efficient pruning technique, the algorithm is sufficiently fast in practice to apply to real data.

2.5.14 Supervised Classification via Maximal $\delta$-valid Patterns

In [27] the authors defined a $\delta$-valid pattern as follows. Given a data matrix $A = (a_{ij})_{m \times n}$ and $\delta > 0$, a submatrix $(F', S')$ of $A$ is called a $\delta$-valid pattern if

\[
\forall i \in F' \quad \max_{j \in S'} a_{ij} - \min_{j \in S'} a_{ij} < \delta. \tag{2-29}
\]

The $\delta$-valid pattern is called maximal if it is not a submatrix of any larger submatrix of $A$ which is also a $\delta$-valid pattern. Maximal $\delta$-valid patterns can be found using the SPLASH algorithm [26]. The idea of the classification algorithm is to find an optimal set of $\delta$-patterns such that they cover the set of samples. It can be done using a greedy approach selecting first the most statistically significant and most covering patterns. Finally, this set of $\delta$-patterns is used to classify the test samples (samples with unknown classification). For a more detailed description of the technique we refer the reader to [27].
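The pattern condition of this subsection is easy to state in code. The sketch below is only an illustration under our own assumptions and is unrelated to the SPLASH implementation of [26]: the function names `is_delta_valid` and `is_maximal` and the toy matrix are hypothetical. The maximality test relies on the observation that every submatrix of a valid pattern is itself valid, so a valid pattern is maximal exactly when no single row or single column can be added to it.

```python
def is_delta_valid(A, rows, cols, delta):
    """True if, in every selected row, the selected entries span less than delta."""
    return all(
        max(A[i][j] for j in cols) - min(A[i][j] for j in cols) < delta
        for i in rows
    )


def is_maximal(A, rows, cols, delta):
    """True if the pattern is valid and cannot be extended by one row or column.

    Validity is inherited by submatrices, so checking single-row and
    single-column extensions is sufficient.
    """
    if not is_delta_valid(A, rows, cols, delta):
        return False
    for i in range(len(A)):
        if i not in rows and is_delta_valid(A, rows + [i], cols, delta):
            return False
    for j in range(len(A[0])):
        if j not in cols and is_delta_valid(A, rows, cols + [j], delta):
            return False
    return True


# Toy data (illustrative only): 3 features x 3 samples.
A = [[1.0, 1.2, 5.0],
     [3.0, 3.1, 0.0],
     [9.0, 2.0, 2.0]]

valid_small = is_delta_valid(A, [0, 1], [0, 1], 0.5)     # row ranges are about 0.2 and 0.1
valid_wide = is_delta_valid(A, [0, 1], [0, 1, 2], 0.5)   # adding column 2 breaks both rows
maximal_pair = is_maximal(A, [0, 1], [0, 1], 0.5)        # no row or column can be added
maximal_single = is_maximal(A, [0], [0, 1], 0.5)         # row 1 can still be added
```

In this toy matrix the pattern on rows {0, 1} and columns {0, 1} is valid and maximal for a threshold of 0.5, whereas the pattern consisting of row 0 alone on the same columns is valid but not maximal.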


2.5.15 CMonkey

CMonkey is another statistical method for biclustering that has been recently introduced by Reiss et al. [77]. The method is developed specifically for genetic data and works at the same time with gene sequence data, gene expression data from a microarray, and gene network association data. It constructs one bicluster at a time with an iterative procedure. First, the bicluster is created either randomly or from the result of some other clustering method. Then, on each step, for each sample and feature it is decided whether it should be added to or removed from the bicluster. For this purpose, the probabilities of the presence of the considered sample or feature in the bicluster with respect to the current structure of the bicluster at the three data levels are computed, and a simulated annealing formula is used to make the decision about the update on the basis of the computed probabilities. This way, even when these probabilities are not high, the update has a nonzero chance to occur, which allows escapes from local optima as in any other simulated annealing technique for global optimization. The actual probability of the update also depends on the chosen annealing schedule, so earlier updates normally have a higher probability of acceptance, while the later steps become almost identical to local optimization. We refer the reader to [77] for the detailed description of the work of Reiss et al.

2.6 Discussion and Concluding Remarks

In this chapter we reviewed the most widely used and successful biclustering techniques and their related applications. Generally speaking, many of the approaches rely on arguments that are not mathematically strict, and there is a lack of methods to justify the quality of the obtained biclusters. Furthermore, additional efforts should be made to connect properties of the biclusters with phenomena relevant to the desired data analysis. Therefore, future development of biclustering should involve more theoretical studies of biclustering methodology and formalization of its quality criteria. More specifically, as we observed that the biclustering concept has remarkable interplay with algebraic notion


of the SVD, we believe that biclustering methodology should be further advanced in the direction of algebraic formalization. This should allow effective utilization of classical algebraic algorithms. In addition, a more formal setup for desired class separability can be achieved with establishing new theoretical results on the properties of domains confining all samples/features of one bicluster.

The number of biclustering applications can be also extended with other areas, where simultaneous clustering of data samples and features (attributes) makes a lot of sense. For example, one of the promising directions may be biclustering of stock market data. This way clustering of equities may reveal groups of companies whose performance is dependent on the same (but possibly hidden) factors, while clusters of trading days may reveal unknown patterns of stock market returns.

To summarize, we should emphasize that further successful development of biclustering theory and techniques is essential for the progress in data mining and its applications (text mining, computational biology, etc.).


CHAPTER 3
CONSISTENT BICLUSTERING VIA FRACTIONAL 0-1 PROGRAMMING

3.1 Consistent Biclustering

Let each sample be already assigned somehow to one of the classes $S_1, S_2, \ldots, S_r$. Introduce a 0-1 matrix $S = (s_{jk})_{n \times r}$ such that $s_{jk} = 1$ if $j \in S_k$, and $s_{jk} = 0$ otherwise. The sample class centroids can be computed as the matrix $C = (c_{ik})_{m \times r}$:

$$C = A S (S^T S)^{-1},$$ (3-1)

whose $k$-th column represents the centroid of the class $S_k$.

Consider a row $i$ of the matrix $C$. Each value in it gives us the average expression of the $i$-th feature in one of the sample classes. As we want to identify the checkerboard pattern in the data, we have to assign the feature to the class where it is most expressed. So, let us classify the $i$-th feature to the class $\hat{k}$ with the maximal value $c_{i\hat{k}}$ (see footnote 1):

$$i \in F_{\hat{k}} \Rightarrow \forall k = 1 \ldots r,\ k \ne \hat{k}: \quad c_{i\hat{k}} > c_{ik}.$$ (3-2)

Now, provided the classification of all features into classes $F_1, F_2, \ldots, F_r$, let us construct a classification of samples using the same principle of maximal average expression and see whether we will arrive at the same classification as the initially given one. To do this, construct a 0-1 matrix $F = (f_{ik})_{m \times r}$ such that $f_{ik} = 1$ if $i \in F_k$ and $f_{ik} = 0$ otherwise. Then, the feature class centroids can be computed in form of matrix $D = (d_{jk})_{n \times r}$:

$$D = A^T F (F^T F)^{-1},$$ (3-3)

Footnote 1. Taking into account that in real-life data mining applications all data are fractional values whose accuracy is not perfect, we may disregard the case when this maximum is not unique. However, for the sake of theoretical purity we further assume that if the ambiguity in classification occurs, we apply a negligible perturbation to the data set values and start the procedure anew.


whose $k$-th column represents the centroid of the class $F_k$. The condition on sample classification we need to verify is

$$j \in S_{\hat{k}} \Rightarrow \forall k = 1 \ldots r,\ k \ne \hat{k}: \quad d_{j\hat{k}} > d_{jk}.$$ (3-4)

Let us state now the definition of biclustering and its consistency formally.

Definition 1. A biclustering of a data set is a collection of pairs of sample and feature subsets $\mathcal{B} = ((S_1, F_1), (S_2, F_2), \ldots, (S_r, F_r))$ such that the collection $(S_1, S_2, \ldots, S_r)$ forms a partition of the set of samples, and the collection $(F_1, F_2, \ldots, F_r)$ forms a partition of the set of features.

Definition 2. A biclustering $\mathcal{B}$ will be called consistent if both relations (3-2) and (3-4) hold, where the matrices $C$ and $D$ are defined as in (3-1) and (3-3).

We will also say that a data set is biclustering-admitting if some consistent biclustering for it exists. Furthermore, the data set will be called conditionally biclustering-admitting with respect to a given (partial) classification of some samples and/or features if there exists a consistent biclustering preserving the given (partial) classification.

Next, we will show that a consistent biclustering implies separability of the classes by convex cones. Further we will denote the $j$-th sample of the data set by $a_{\cdot j}$ (which is the $j$-th column of the matrix $A$), and the $i$-th feature by $a_{i \cdot}$ (which is the $i$-th row of the matrix $A$).

Theorem 4. Let $\mathcal{B}$ be a consistent biclustering. Then there exist convex cones $P_1, P_2, \ldots, P_r \subseteq \mathbb{R}^m$ such that all samples from $S_k$ belong to the cone $P_k$ and no other sample belongs to it, $k = 1 \ldots r$. Similarly, there exist convex cones $Q_1, Q_2, \ldots, Q_r \subseteq \mathbb{R}^n$ such that all features from $F_k$ belong to the cone $Q_k$ and no other feature belongs to it, $k = 1 \ldots r$.

Proof. Let $P_k$ be the conic hull of the samples of class $S_k$, that is, a vector $x \in P_k$ if and only if it can be represented as

$$x = \sum_{j \in S_k} \gamma_j a_{\cdot j},$$


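where all $\gamma_j \ge 0$. Obviously, $P_k$ is convex and all samples of the class $S_k$ belong to it. Now, suppose there is a sample $\hat{j} \in S_\ell$, $\ell \ne k$, that belongs to the cone $P_k$. Then there exists a representation

$$a_{\cdot \hat{j}} = \sum_{j \in S_k} \gamma_j a_{\cdot j},$$

where all $\gamma_j \ge 0$. Next, consistency of the biclustering implies that in the matrix of feature centroids $D$, the component $d_{\hat{j}\ell} > d_{\hat{j}k}$. This implies

$$\frac{\sum_{i \in F_\ell} a_{i\hat{j}}}{|F_\ell|} > \frac{\sum_{i \in F_k} a_{i\hat{j}}}{|F_k|}.$$

Plugging in $a_{i\hat{j}} = \sum_{j \in S_k} \gamma_j a_{ij}$, we obtain

$$\frac{\sum_{i \in F_\ell} \sum_{j \in S_k} \gamma_j a_{ij}}{|F_\ell|} > \frac{\sum_{i \in F_k} \sum_{j \in S_k} \gamma_j a_{ij}}{|F_k|}.$$

Changing the order of summation,

$$\sum_{j \in S_k} \gamma_j \frac{\sum_{i \in F_\ell} a_{ij}}{|F_\ell|} > \sum_{j \in S_k} \gamma_j \frac{\sum_{i \in F_k} a_{ij}}{|F_k|},$$

or

$$\sum_{j \in S_k} \gamma_j d_{j\ell} > \sum_{j \in S_k} \gamma_j d_{jk}.$$

On the other hand, for any $j \in S_k$, the biclustering consistency implies $d_{j\ell}$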

known a priori, and classification of additional samples, constituting the test set, has to be performed. That is, a supervised classification method consists of two routines, the first of which derives classification criteria while processing the training samples, and the second one applies these criteria to the test samples. In genomic and proteomic data analysis, as well as in other data mining applications, where only a small subset of features is expected to be relevant to the classification of interest, the classification criteria should involve dimensionality reduction and feature selection. In this chapter, we handle such a task utilizing the notion of consistent biclustering. Namely, we select a subset of features of the original data set in such a way that the obtained subset of data becomes conditionally biclustering-admitting with respect to the given classification of training samples.

Assuming that we are given the training set $A = (a_{ij})_{m \times n}$ with the classification of samples into classes $S_1, S_2, \ldots, S_r$, we are able to construct the corresponding classification of features according to (3-2). Now, if the obtained biclustering is not consistent, our goal is to exclude some features from the data set so that the biclustering with respect to the residual feature set is consistent.

Formally, let us introduce a vector of 0-1 variables $x = (x_i)_{i=1 \ldots m}$ and consider the $i$-th feature selected if $x_i = 1$. The condition of biclustering consistency (3-4), when only the selected features are used, becomes

$$\frac{\sum_{i=1}^m a_{ij} f_{i\hat{k}} x_i}{\sum_{i=1}^m f_{i\hat{k}} x_i} > \frac{\sum_{i=1}^m a_{ij} f_{ik} x_i}{\sum_{i=1}^m f_{ik} x_i}, \quad \forall j \in S_{\hat{k}},\ \hat{k}, k = 1 \ldots r,\ \hat{k} \ne k.$$ (3-5)

We will use the fractional relations (3-5) as constraints of an optimization problem selecting the feature set. It may incorporate various objective functions over $x$, depending on the desirable properties of the selected features, but one general choice is to select the maximal possible number of features in order to lose the minimal amount of information provided by the training set. In this case, the objective function is

$$\max \sum_{i=1}^m x_i.$$ (3-6)
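To make the constraints (3-5) concrete, the following is a minimal sketch that checks whether a given 0-1 selection vector $x$ yields a consistent biclustering for a labeled training set. It only verifies feasibility for a fixed $x$ and does not solve the optimization problem (3-6), (3-5). The function names, the list-of-rows matrix layout, the zero-based class labels, and the toy data are all assumptions made for illustration.

```python
def centroids(rows, labels, r):
    """For each row, average its entries over the columns of each class.

    With rows = features and labels = sample classes this gives C of (3-1);
    with rows = samples and labels = feature classes it gives D of (3-3).
    Assumes every class 0..r-1 occurs at least once in labels.
    """
    result = []
    for row in rows:
        sums, counts = [0.0] * r, [0] * r
        for value, k in zip(row, labels):
            sums[k] += value
            counts[k] += 1
        result.append([s / c for s, c in zip(sums, counts)])
    return result


def classify_features(A, sample_labels, r):
    """Rule (3-2): each feature goes to the class where its centroid is largest."""
    C = centroids(A, sample_labels, r)
    return [max(range(r), key=lambda k: row[k]) for row in C]


def is_consistent(A, sample_labels, r, x=None):
    """Check (3-5) for the features with x[i] == 1 (all features if x is None)."""
    m, n = len(A), len(A[0])
    x = [1] * m if x is None else x
    feature_labels = classify_features(A, sample_labels, r)
    selected = [i for i in range(m) if x[i] == 1]
    kept_labels = [feature_labels[i] for i in selected]
    if any(k not in kept_labels for k in range(r)):
        return False  # an empty feature class makes a denominator in (3-5) zero
    samples = [[A[i][j] for i in selected] for j in range(n)]  # transposed view
    D = centroids(samples, kept_labels, r)
    return all(D[j][sample_labels[j]] > D[j][k]
               for j in range(n) for k in range(r) if k != sample_labels[j])


# Toy data (hypothetical): 3 features, 4 samples, classes {0,1} and {2,3}.
A = [[5, 6, 1, 0], [0, 1, 6, 5], [0, 30, 14, 14]]
labels = [0, 0, 1, 1]
print(is_consistent(A, labels, 2))             # the third feature spoils consistency
print(is_consistent(A, labels, 2, [1, 1, 0]))  # dropping it restores consistency
```

In this toy instance the third feature is assigned to class 0 by (3-2) yet has large values on the class-1 samples, which is exactly the kind of feature that the selection problem is meant to exclude.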


The optimization problem (3-6), (3-5) is a specific type of fractional 0-1 programming problem, which we discuss in the next section.

3.3 Fractional 0-1 Programming

The fractional 0-1 programming problem (or hyperbolic 0-1 programming problem) is defined as follows:

$$\max_{x \in \{0,1\}^m} f(x) = \sum_{j=1}^n \frac{\alpha_{j0} + \sum_{i=1}^m \alpha_{ji} x_i}{\beta_{j0} + \sum_{i=1}^m \beta_{ji} x_i},$$ (3-7)

where it is usually assumed that for all $j$ and $x \in \{0,1\}^m$ the denominators in (3-7) are positive, i.e. $\beta_{j0} + \sum_{i=1}^m \beta_{ji} x_i > 0$. Problem (3-7) is known to be NP-hard [75]. For more information on complexity issues of fractional 0-1 programming problems we refer the reader to [75, 76].

Applications of constrained and unconstrained versions of problem (3-7) arise in numerous areas including but not limited to scheduling [78], query optimization in databases and information retrieval [49], and p-choice facility location [86].

Generally, in the framework of fractional 0-1 programming we consider problems where we optimize a multiple-ratio fractional 0-1 function of type (3-7) subject to a set of linear constraints. Algorithms for solving problem (3-7) include linearization techniques [76, 86, 90], branch and bound methods [86], network-flow [74] and approximation [50] approaches.

In this chapter we define a new class of fractional 0-1 programming problems, where fractional terms are not in the objective function, but in constraints, i.e. we optimize a linear objective function subject to fractional constraints. More formally, we define the following problem:

$$\max_{x \in \{0,1\}^m} g(x) = \sum_{i=1}^m w_i x_i$$ (3-8)

$$\text{s.t.} \quad \sum_{j=1}^{n_s} \frac{\alpha^s_{j0} + \sum_{i=1}^m \alpha^s_{ji} x_i}{\beta^s_{j0} + \sum_{i=1}^m \beta^s_{ji} x_i} \ge p_s, \quad s = 1, \ldots, S,$$ (3-9)


where $S$ is the number of fractional constraints, and we also assume that for all $s$, $j$ and $x \in \{0,1\}^m$ the denominators in (3-9) are positive, i.e. $\beta^s_{j0} + \sum_{i=1}^m \beta^s_{ji} x_i > 0$.

This problem is clearly NP-hard since linear 0-1 programming is a special class of problem (3-8)-(3-9) if $\beta^s_{ji} = 0$ and $\beta^s_{j0} = 1$ for $j = 1, \ldots, n_s$, $i = 1, \ldots, m$ and $s = 1, \ldots, S$.

A typical approach for solving problem (3-7) is to reformulate it as a linear mixed 0-1 programming problem, which can be addressed using standard linear programming solvers like CPLEX [35]. For more detailed information on possible linearization methods for fractional 0-1 programming problems we refer the reader to [76, 86, 90]. Fortunately, a similar technique can be also applied to problem (3-8)-(3-9). The linearization approach discussed next is based on a very simple idea:

Theorem 5. A polynomial mixed 0-1 term $z = xy$, where $x$ is a 0-1 variable, and $y$ is a continuous variable taking any positive value, can be represented by the following linear inequalities: (1) $y - z \le M - Mx$; (2) $z \le y$; (3) $z \le Mx$; (4) $z \ge 0$, where $M$ is a large number greater than $y$.

A simple proof of this result can be found in [90].

Next define a set of new variables $y^s_j$ such that

$$y^s_j = \frac{1}{\beta^s_{j0} + \sum_{i=1}^m \beta^s_{ji} x_i},$$ (3-10)

where $j = 1, \ldots, n_s$, and $s = 1, \ldots, S$. Since we assume that all denominators are positive, condition (3-10) is equivalent to

$$\beta^s_{j0} y^s_j + \sum_{i=1}^m \beta^s_{ji} x_i y^s_j = 1.$$ (3-11)

In terms of new variables $y^s_j$ problem (3-8)-(3-9) can be rewritten as

$$\max_{x \in \{0,1\}^m} g(x) = \sum_{i=1}^m w_i x_i$$ (3-12)


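$$\text{s.t.} \quad \sum_{j=1}^{n_s} \alpha^s_{j0} y^s_j + \sum_{j=1}^{n_s} \sum_{i=1}^m \alpha^s_{ji} x_i y^s_j \ge p_s, \quad s = 1, \ldots, S,$$ (3-13)

$$\beta^s_{j0} y^s_j + \sum_{i=1}^m \beta^s_{ji} x_i y^s_j = 1, \quad j = 1, \ldots, n_s,\ s = 1, \ldots, S.$$ (3-14)

In order to obtain a linear mixed 0-1 formulation, the nonlinear terms $x_i y^s_j$ in (3-13) and (3-14) can be linearized by introducing additional variables $z^s_{ij}$ and applying the results of Theorem 5. The number of new variables $y^s_j$ and $z^s_{ij}$ is $(m+1) \sum_{s=1}^S n_s$.

3.4 Algorithm for Biclustering

To linearize the fractional 0-1 program (3-6), (3-5), we should introduce according to (3-10) the variables

$$y_k = \frac{1}{\sum_{i=1}^m f_{ik} x_i}, \quad k = 1 \ldots r.$$ (3-15)

Since $f_{ik}$ can take values only zero or one, equation (3-15) can be equivalently rewritten as

$$\sum_{i=1}^m f_{ik} x_i \ge 1, \quad k = 1 \ldots r,$$ (3-16)

$$\sum_{i=1}^m f_{ik} x_i y_k = 1, \quad k = 1 \ldots r.$$ (3-17)

In terms of the new variables $y_k$, condition (3-5) is replaced by

$$\sum_{i=1}^m a_{ij} f_{i\hat{k}} x_i y_{\hat{k}} > \sum_{i=1}^m a_{ij} f_{ik} x_i y_k \quad \forall j \in S_{\hat{k}},\ \hat{k}, k = 1 \ldots r,\ \hat{k} \ne k.$$ (3-18)

Next, observe that the term $x_i y_k$ is present in (3-18) if and only if $f_{ik} = 1$, i.e., $i \in F_k$. So, there are in total only $m$ such products in (3-18), and hence we can introduce $m$ variables $z_i = x_i y_k$, $i \in F_k$, to linearize the system by Theorem 5. Obviously, the parameter $M$ can be set to 1. So, instead of (3-17) and (3-18), we have the following constraints:

$$\sum_{i=1}^m f_{ik} z_i = 1, \quad k = 1 \ldots r,$$ (3-19)

$$\sum_{i=1}^m a_{ij} f_{i\hat{k}} z_i > \sum_{i=1}^m a_{ij} f_{ik} z_i \quad \forall j \in S_{\hat{k}},\ \hat{k}, k = 1 \ldots r,\ \hat{k} \ne k,$$ (3-20)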


$$y_k - z_i \le 1 - x_i, \quad z_i \le y_k, \quad z_i \le x_i, \quad z_i \ge 0, \quad i \in F_k.$$ (3-21)

Unfortunately, while the linearization by Theorem 5 works nicely for small-size problems, it often creates instances where the gap between the integer programming and the linear programming relaxation optimum solutions is very big for larger problems. As a consequence, the instance cannot be solved in a reasonable time even with the best techniques implemented in modern integer programming solvers. Hence, we have developed an alternative approach to solving the problem (3-6), (3-5) via mixed 0-1 programming, which is similar by the main idea to the method for solving specific fractional 0-1 programming problems described in [74].

Consider the meaning of variables $z_i$. We have introduced them so that

$$z_i = \frac{x_i}{\sum_{\ell=1}^m f_{\ell k} x_\ell}, \quad i \in F_k.$$ (3-22)

Thus, for $i \in F_k$, $z_i$ is the reciprocal of the cardinality of the class $F_k$ after the feature selection, if the $i$-th feature is selected, and 0 otherwise. This suggests that $z_i$ is also a binary variable by nature as $x_i$ is, but its nonzero value is just not set to 1. That value is not known unless the optimal sizes of feature classes are obtained. However, knowing $z_i$ is sufficient to define the value of $x_i$, and the system of constraints with respect only to the continuous variables $0 \le z_i \le 1$ constitutes a linear relaxation of the biclustering constraints (3-5). Furthermore, it can be strengthened by the system of inequalities connecting $z_i$ to $x_i$. Indeed, if we know that no more than $m_k$ features can be selected for class $F_k$, then it is valid to impose:

$$x_i \le m_k z_i, \quad x_i \ge z_i, \quad i \in F_k.$$ (3-23)

We can prove

Theorem 6. If $x^*$ is an optimal solution to (3-6), (3-5), and $m_k = \sum_{i=1}^m f_{ik} x^*_i$, then $x^*$ is also an optimal solution to (3-6), (3-19), (3-20), (3-23).


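Proof. Obviously, $x^*$ is a feasible solution to the new program, so we just have to show that (3-6), (3-19), (3-20), (3-23) cannot have a better solution. Assume such a solution $x^{**}$ exists. Then,

$$\sum_{i=1}^m x^{**}_i > \sum_{i=1}^m x^*_i,$$

and, therefore, at least for one $k \in \{1 \ldots r\}$,

$$\sum_{i=1}^m f_{ik} x^{**}_i > \sum_{i=1}^m f_{ik} x^*_i.$$

On the other hand, $x^{**}_i \le m_k z^{**}_i$, and in conjunction with (3-19) it implies that

$$\sum_{i=1}^m f_{ik} x^{**}_i \le \sum_{i=1}^m m_k f_{ik} z^{**}_i = m_k = \sum_{i=1}^m f_{ik} x^*_i.$$

We have obtained a contradiction and, therefore, $x^*$ is an optimal solution to (3-6), (3-19), (3-20), (3-23).

Hence, we can utilize Algorithm 3-1 as the iterative heuristic of feature selection.

    for i ← 1 to m do
        x_i ← 1;
    end
    repeat
        m_k ← Σ_{i=1}^m f_ik x_i for all k = 1...r;
        solve the mixed 0-1 programming formulation using the inequalities (3-23) instead of (3-21);
    until m_k = Σ_{i=1}^m f_ik x_i for all k = 1...r;

Figure 3-1. Feature selection heuristic

Another modification of the program (3-6), (3-5) that may result in the improvement of quality of the feature selection is strengthening of the class separation by introduction of a coefficient greater than 1 for the right-hand side of the inequality (3-5). In this case, we improve (3-5) by the relation

$$\frac{\sum_{i=1}^m a_{ij} f_{i\hat{k}} x_i}{\sum_{i=1}^m f_{i\hat{k}} x_i} \ge (1 + t) \frac{\sum_{i=1}^m a_{ij} f_{ik} x_i}{\sum_{i=1}^m f_{ik} x_i},$$ (3-24)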


where $t > 0$ is a constant that becomes a parameter of the method (notice also that doing this we have also replaced the strict inequalities by non-strict ones and made the feasible domain closed). In the mixed 0-1 programming formulation, it is achieved by replacing (3-20) by

$$\sum_{i=1}^m a_{ij} f_{i\hat{k}} z_i \ge (1 + t) \sum_{i=1}^m a_{ij} f_{ik} z_i \quad \forall j \in S_{\hat{k}},\ \hat{k}, k = 1 \ldots r,\ \hat{k} \ne k.$$ (3-25)

After the feature selection is done, we perform classification of test samples according to (3-4). That is, if $b = (b_i)_{i=1 \ldots m}$ is a test sample, we assign it to the class $F_{\hat{k}}$ satisfying

$$\frac{\sum_{i=1}^m b_i f_{i\hat{k}} x_i}{\sum_{i=1}^m f_{i\hat{k}} x_i} > \frac{\sum_{i=1}^m b_i f_{ik} x_i}{\sum_{i=1}^m f_{ik} x_i}, \quad k = 1 \ldots r,\ \hat{k} \ne k.$$

3.5 Computational Results

3.5.1 ALL vs. AML data set

We applied supervised biclustering to a well-researched microarray data set containing samples from patients diagnosed with acute lymphoblastic leukemia (ALL) and acute myeloid leukemia (AML) diseases [45]. It has been the subject of a variety of research papers, e.g. [8, 11, 89, 91]. This data set was also used in the CAMDA data contest [28]. It is divided into two parts: the training set (27 ALL, 11 AML samples), and the test set (20 ALL, 14 AML samples). According to the described methodology, we performed feature selection for obtaining a consistent biclustering using the training set, and the samples of the test set were subsequently classified choosing for each of them the class with the highest average feature expression. The parameter of separation $t = 0.1$ was used. The algorithm selected 3439 features for class ALL and 3242 features for class AML. The obtained classification contains only one error: the AML-sample 66 was classified into the ALL class. To provide the justification of the quality of this result, we should mention that the support vector machines (SVM) approach delivers up to 5 classification errors on the ALL vs. AML data set depending on how the parameters of the method are tuned [89]. Furthermore, the perfect classification was obtained only with one specific set of values of the parameters.


The heatmap for the constructed biclustering is presented in Figure 3-2.

3.5.2 HuGE Index data set

Another computational experiment that we conducted was on feature selection for consistent biclustering of the Human Gene Expression (HuGE) Index data set [54]. The purpose of the HuGE project is to provide a comprehensive database of gene expressions in normal tissues of different parts of the human body and to highlight similarities and differences among the organ systems. We refer the reader to [53] for the detailed description of these studies. The data set consists of 59 samples from 19 distinct tissue types. It was obtained using oligonucleotide microarrays capturing 7070 genes. The samples were obtained from 49 human individuals: 24 males with median age of 63 and 25 females with median age of 50. Each sample came from a different individual except for the first 7 BRA samples, which were from different brain regions of the same individual, and the 5th LI sample, which came from that individual as well. We applied Algorithm 3-1 to the data set with the parameter of separation $t = 0.1$.

The obtained biclustering is summarized in Table 3-1 and its heatmap is presented in Figure 3-3. The distinct block-diagonal pattern of the heatmap evidences the high quality of the obtained feature classification. We also mention that the original studies of the HuGE Index data set in [53] were performed without 6 of the available samples: 2 KI samples, 2 LU samples, and 2 PR samples were excluded because their quality was too poor for the statistical methods used. Nevertheless, we may observe that none of them distorts the obtained biclustering pattern, which confirms the robustness of our method.

3.6 Conclusions and Future Research

We have developed a new optimization framework to perform supervised biclustering with feature selection. It has been proved that the obtained partitions of samples and features of the data set satisfy a conic separation criterion of classification. Though the constructed fractional 0-1 programming formulation may be hard to tackle with direct solving methods, it admits a good linear continuous relaxation. Preliminary computational


Figure 3-2. ALL vs. AML heatmap


Figure 3-3. HuGE index heatmap


Table 3-1. HuGE index biclustering

Tissue type    Abbreviation   # samples   # features selected
Blood          BD             1           472
Brain          BRA            11          614
Breast         BRE            2           902
Colon          CO             1           367
Cervix         CX             1           107
Endometrium    ENDO           2           225
Esophagus      ES             1           289
Kidney         KI             6           159
Liver          LI             6           440
Lung           LU             6           102
Muscle         MU             6           532
Myometrium     MYO            2           163
Ovary          OV             2           272
Placenta       PL             2           514
Prostate       PR             4           174
Spleen         SP             1           417
Stomach        ST             1           442
Testes         TE             1           512
Vulva          VU             3           186

results show that tightening it iteratively with valid inequalities linking the continuous and 0-1 variables, we are able to obtain a good heuristic solution providing a reliable feature selection and the test set classification based on it. We also note that in contrast to many other data mining methodologies the developed algorithm involves only one parameter that should be defined by the user.

Further research work should reveal more properties relating solutions of the linear relaxation to solutions of the original fractional 0-1 programming problem. This should allow for more grounded choices of the class separation parameter $t$ for feature selection and better solving methods. It is also interesting to investigate whether the problem (3-6) subject to (3-5) itself is NP-hard.


CHAPTER 4
AN OPTIMIZATION-BASED APPROACH FOR DATA CLASSIFICATION

4.1 Basic Definitions

A data set is normally given in form of a rectangular matrix $A = (a_{ij})_{m \times n}$. The columns of this matrix represent $n$ data samples, while the rows correspond to $m$ features of these samples. A matrix element $a_{ij}$ gives us the expression of the $i$-th feature in the $j$-th sample. If the set of samples is partitioned into $r$ classes, we will denote the $k$-th class by $S_k \subseteq \{1 \ldots n\}$, $k = 1 \ldots r$. Next, we introduce a 0-1 matrix $S = (s_{jk})_{n \times r}$ such that $s_{jk} = 1$ if $j \in S_k$, and $s_{jk} = 0$ otherwise. We will also consider centroids of those classes. Each class centroid will be represented as a column of matrix $C = (c_{ik})_{m \times r}$:

$$C = A S (S^T S)^{-1}.$$

The function

$$J(S) = \sum_{k=1}^r \sum_{j=1}^n s_{jk} \| a_{\cdot j} - c_{\cdot k} \|^2,$$ (4-1)

where $a_{\cdot j}$ denotes the $j$-th column of the matrix $A$, $c_{\cdot k}$ denotes the $k$-th column of the matrix $C$ and $\| \cdot \|$ denotes the Euclidean norm, will be called the sum-of-squares (or k-means) objective of the clustering given by the matrix $S$. The smaller the value $J(S)$, the tighter are the clusters, as the sum of distances from cluster members to the corresponding centroid decreases.

We will say that a clustering satisfies the sum-of-squares (or k-means) criterion if, for each sample $j$, the distance from it to the centroid $c_{\cdot \hat{k}}$ of the class $S_{\hat{k}} \ni j$ is not greater than the distance to any other class centroid:

$$\| a_{\cdot j} - c_{\cdot \hat{k}} \| \le \| a_{\cdot j} - c_{\cdot k} \|, \quad k = 1 \ldots r.$$ (4-2)

The criterion (4-2) implies that the clustering is terminal for the k-means algorithm, that is, if it is given as the input to the algorithm, no samples will be moved from one class to another, and the algorithm will terminate immediately.
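The objective (4-1) and the criterion (4-2) are easy to evaluate directly, and a small sketch may help separate the two notions. The representation of a clustering as a list of point lists and all function names below are assumptions chosen for illustration; the data are the three one-dimensional points 0, 5 and 8 discussed in the remark that follows.

```python
def centroid(points):
    """Componentwise mean of a non-empty list of points."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def sq_dist(u, v):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def sum_of_squares(clusters):
    """Objective (4-1): squared distances of all points to their own centroid."""
    return sum(sq_dist(p, centroid(c)) for c in clusters for p in c)


def satisfies_kmeans_criterion(clusters):
    """Criterion (4-2): no point is strictly closer to another class centroid."""
    cents = [centroid(c) for c in clusters]
    for k, cluster in enumerate(clusters):
        for p in cluster:
            own = sq_dist(p, cents[k])
            if any(own > sq_dist(p, other) for other in cents):
                return False
    return True


before = [[[0.0], [5.0]], [[8.0]]]   # clustering {0, 5}, {8}
after = [[[0.0]], [[5.0], [8.0]]]    # the point 5 moved to the second class

print(sum_of_squares(before), satisfies_kmeans_criterion(before))
print(sum_of_squares(after), satisfies_kmeans_criterion(after))
```

Comparing squared distances in `satisfies_kmeans_criterion` is equivalent to comparing the distances in (4-2), since both sides are non-negative.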


Remark. Note, however, that the criterion (4-2) does not always guarantee that the clustering delivers even a local minimum to the objective (4-1). Indeed, let us consider, for example, three samples in the one-dimensional space represented by points 0, 5, and 8, with the clustering $(\{0, 5\}, \{8\})$ provided. This clustering is not locally optimal with respect to the k-means objective, as moving 5 from the first class to the second decreases $J(S)$ from 12.5 to 4.5. However, it is easy to verify that this clustering satisfies the k-means criterion (4-2).

Next, we will say that a clustering satisfies the pairwise threshold criterion if the distance between any two samples that belong to the same class is always not greater than any distance between two samples from different classes. This can be expressed by $\frac{n(n-1)}{2}$ inequalities of the form

$$\| a_{\cdot j_1} - a_{\cdot j_2} \|_2 \le D_{\mathrm{int}}$$ (4-3)

if samples $j_1$ and $j_2$ are from the same class, and

$$\| a_{\cdot j_1} - a_{\cdot j_2} \|_2 \ge D_{\mathrm{ext}}$$ (4-4)

if samples $j_1$ and $j_2$ are from different classes, with one additional inequality

$$D_{\mathrm{int}} \le D_{\mathrm{ext}}.$$ (4-5)

This way, $D_{\mathrm{int}}$ is an upper bound on the distance between samples in the same class and $D_{\mathrm{ext}}$ is a lower bound on the interclass distance. We will call the inequalities (4-3)-(4-5) the pairwise threshold constraints.

The following property establishes a link between the sum-of-squares objective and the pairwise threshold criterion:

Theorem 1. The sum of squares of distances between all samples of a class and the centroid of the class can be expressed via pairwise distances between the samples in the following way:

$$\sum_{j \in S_k} \| a_{\cdot j} - c_{\cdot k} \|^2 = \frac{1}{|S_k|} \sum_{j_1 \in S_k} \sum_{\substack{j_2 \in S_k \\ j_2 > j_1}} \| a_{\cdot j_1} - a_{\cdot j_2} \|^2.$$ (4-6)


Proof. By the Cosine Theorem,

$$\| a_{\cdot j_1} - a_{\cdot j_2} \|^2 = \| a_{\cdot j_1} - c_{\cdot k} \|^2 + \| a_{\cdot j_2} - c_{\cdot k} \|^2 - 2 \| a_{\cdot j_1} - c_{\cdot k} \| \, \| a_{\cdot j_2} - c_{\cdot k} \| \cos(a_{\cdot j_1} - c_{\cdot k},\ a_{\cdot j_2} - c_{\cdot k})$$

$$= \| a_{\cdot j_1} - c_{\cdot k} \|^2 + \| a_{\cdot j_2} - c_{\cdot k} \|^2 - 2 (a_{\cdot j_1} - c_{\cdot k}) \cdot (a_{\cdot j_2} - c_{\cdot k}).$$

So,

$$\sum_{j_1 \in S_k} \sum_{j_2 \in S_k} \| a_{\cdot j_1} - a_{\cdot j_2} \|^2 = \sum_{j_1 \in S_k} \sum_{j_2 \in S_k} \left( \| a_{\cdot j_1} - c_{\cdot k} \|^2 + \| a_{\cdot j_2} - c_{\cdot k} \|^2 \right) - 2 \sum_{j_1 \in S_k} \sum_{j_2 \in S_k} (a_{\cdot j_1} - c_{\cdot k}) \cdot (a_{\cdot j_2} - c_{\cdot k}).$$

The last term is zero, since

$$\sum_{j_1 \in S_k} \sum_{j_2 \in S_k} (a_{\cdot j_1} - c_{\cdot k}) \cdot (a_{\cdot j_2} - c_{\cdot k}) = \sum_{j_1 \in S_k} (a_{\cdot j_1} - c_{\cdot k}) \cdot \left( \sum_{j_2 \in S_k} (a_{\cdot j_2} - c_{\cdot k}) \right)$$

and

$$\sum_{j_2 \in S_k} (a_{\cdot j_2} - c_{\cdot k}) = \sum_{j_2 \in S_k} a_{\cdot j_2} - |S_k| c_{\cdot k} = |S_k| c_{\cdot k} - |S_k| c_{\cdot k} = 0.$$

Thus,

$$\sum_{j_1 \in S_k} \sum_{j_2 \in S_k} \| a_{\cdot j_1} - a_{\cdot j_2} \|^2 = \sum_{j_1 \in S_k} \sum_{j_2 \in S_k} \left( \| a_{\cdot j_1} - c_{\cdot k} \|^2 + \| a_{\cdot j_2} - c_{\cdot k} \|^2 \right) = 2 |S_k| \sum_{j \in S_k} \| a_{\cdot j} - c_{\cdot k} \|^2,$$

which implies (4-6), because the double sum on the left counts every unordered pair of samples twice.

Corollary 1. If a clustering $S$ satisfies the pairwise threshold criterion, then

$$J(S) \le \frac{(n - r) D_{\mathrm{int}}^2}{2}.$$ (4-7)

Proof. Using Theorem 1, we obtain

$$J(S) = \sum_{k=1}^r \frac{1}{|S_k|} \sum_{j_1 \in S_k} \sum_{\substack{j_2 \in S_k \\ j_2 > j_1}} \| a_{\cdot j_1} - a_{\cdot j_2} \|^2$$


$$\le \sum_{k=1}^r \frac{1}{|S_k|} \cdot \frac{|S_k| (|S_k| - 1)}{2} D_{\mathrm{int}}^2 = \frac{(n - r) D_{\mathrm{int}}^2}{2}.$$

Remark. Usually the pairwise threshold criterion is stronger than the k-means criterion, though a small example when it does not guarantee a local minimum to the objective (4-1) can be given. Consider samples $(0, -1)$, $(0, 1)$, $(2, 0)$, $(4, 0)$, $(4, 0)$, $(4, 0)$ in the two-dimensional space. If the first two samples represent one class and the rest represent the other class, the pairwise threshold constraint is satisfied with $D_{\mathrm{int}} = 2$, $D_{\mathrm{ext}} = \sqrt{5}$, and $J(S) = 5$. However, if we move the sample $(2, 0)$ to the first class, we have $J(S) = 4\frac{2}{3}$ despite the fact that the pairwise threshold is violated (the distance between samples of the same class $(0, 1)$ and $(2, 0)$ is $\sqrt{5}$ but the distance between samples from different classes $(2, 0)$ and $(4, 0)$ is 2).

Let us also introduce a vector of variables $x = (x_i)_{i=1 \ldots m}$ bounded between 0 and 1 representing chosen feature weights:

$$0 \le x_i \le 1, \quad i = 1 \ldots m.$$ (4-8)

If $x_i = 0$, then the $i$-th feature is disregarded during the test set classification. If $x_i > 0$, then the value of $x_i$ is the weight of the $i$-th feature, which we will use in our classification routine. This way, the vector $x$ represents a feature selection made in the data set.

Taking into account the feature selection, we may rewrite the sum-of-squares criterion as

$$\sum_{i=1}^m (a_{ij} - c_{i\hat{k}})^2 x_i \le \sum_{i=1}^m (a_{ij} - c_{ik})^2 x_i, \quad k = 1 \ldots r,$$ (4-9)

where $\hat{k}$ is such that $j \in S_{\hat{k}}$. The pairwise threshold criterion will be defined by inequalities

$$\sum_{i=1}^m (a_{ij_1} - a_{ij_2})^2 x_i \le D_{\mathrm{int}},$$ (4-10)


if samples $j_1$ and $j_2$ are from the same class,

$$\sum_{i=1}^m (a_{ij_1} - a_{ij_2})^2 x_i \ge D_{\mathrm{ext}}$$ (4-11)

if samples $j_1$ and $j_2$ are from different classes, and

$$D_{\mathrm{int}} \le D_{\mathrm{ext}}.$$ (4-12)

4.2 Optimization Formulation and Classification Algorithm

Given a classification of the training set of samples, we will be performing feature selection in it utilizing the introduced clustering criteria. We will formulate an optimization problem over the variables $x$, where the objective is either to maximize the class separation or to minimize the information loss, and the constraints are one of the formulated clustering criteria. If the k-means criterion is employed, the objective used will be

$$\max_x \sum_{i=1}^m x_i,$$ (4-13)

which maximizes the total weight of selected features, that is, omits the minimum amount of information from the data set. With the pairwise threshold criterion we will use the objective

$$\max_x \ D_{\mathrm{ext}} - D_{\mathrm{int}},$$ (4-14)

managing the maximum separation between classes.

If this optimization problem has only the trivial solution $x = 0$, there is no possibility to satisfy the clustering criterion for the training set irrespective of the feature selection. In such cases we will relax the criterion by dropping some of the constraints. In order to decide what constraints should be dropped, we will analyze the Lagrangian multipliers (dual variables) corresponding to the trivial solution. We know that if the dual variable corresponding to a constraint is nonzero, this constraint is active and keeps the optimal solution from improvement. So, as long as $x = 0$ is the only feasible solution to the problem, we iteratively remove constraints with corresponding nonzero dual variables


unless we obtain the opportunity to improve the solution. If this procedure leads to removal of all constraints, we conclude that the given feature selection problem is not suitable for the chosen clustering criterion.

After the feature selection is performed, we will assign a test sample $b = (b_i)_{i=1 \ldots m}$

- to the class $S_{\hat{k}}$ having the nearest centroid $c_{\cdot \hat{k}}$ with respect to the weights $x$, that is, for all $k = 1 \ldots r$,

$$\sum_{i=1}^m (b_i - c_{i\hat{k}})^2 x_i \le \sum_{i=1}^m (b_i - c_{ik})^2 x_i,$$ (4-15)

  if we use the k-means criterion;

- to the class $S_{\hat{k}} \ni \hat{j}$ of the nearest neighbor $\hat{j}$ from the training set with respect to the weights $x$, that is, for all $j = 1 \ldots n$,

$$\sum_{i=1}^m (b_i - a_{i\hat{j}})^2 x_i \le \sum_{i=1}^m (b_i - a_{ij})^2 x_i,$$ (4-16)

  if we use the pairwise threshold criterion.

In Figure 4-1 we describe our data classification algorithm formally.

    Form the linear program (4-13), (4-9), (4-8) if using the k-means criterion,
    or (4-14), (4-10), (4-11), (4-12), (4-8) if using the pairwise threshold criterion;
    repeat
        solve the linear programming problem;
        if x = 0 then
            drop constraints with |λ_k| > ε, which are not the bound constraints (4-8),
            from the LP, where λ_k is the dual variable corresponding to the k-th constraint;
        end
    until x ≠ 0 or (4-8) are the only constraints remaining;
    if x = 0 then
        exit; // no admissible feature selection exists with the chosen criterion;
    end
    Classify each test sample using (4-15) if using the k-means criterion
    or using (4-16) if using the pairwise threshold criterion;

Figure 4-1. Data classification algorithm
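The final classification step of Figure 4-1 can be sketched in a few lines once the weights $x$ are known. The sketch below covers only the rules (4-15) and (4-16); it does not form or solve the linear program, and the weights are simply given as input. The function names, the storage of centroids and training samples as lists of vectors, and the toy numbers are assumptions for illustration.

```python
def weighted_sq_dist(u, v, x):
    """Sum of x_i * (u_i - v_i)^2, the weighted squared distance in (4-15), (4-16)."""
    return sum(w * (a - b) ** 2 for a, b, w in zip(u, v, x))


def nearest_centroid_class(b, class_centroids, x):
    """Rule (4-15): index of the class whose centroid is nearest to b under weights x."""
    return min(range(len(class_centroids)),
               key=lambda k: weighted_sq_dist(b, class_centroids[k], x))


def nearest_neighbor_class(b, train_samples, train_labels, x):
    """Rule (4-16): class of the training sample nearest to b under weights x."""
    j_hat = min(range(len(train_samples)),
                key=lambda j: weighted_sq_dist(b, train_samples[j], x))
    return train_labels[j_hat]


# Toy data (hypothetical): two features, two classes.
class_centroids = [[0.0, 0.0], [10.0, 1.0]]
b = [9.0, 0.1]

# With both features weighted equally, b is nearest to the second centroid.
print(nearest_centroid_class(b, class_centroids, [1.0, 1.0]))
# With the first feature switched off (x_1 = 0), only the second feature matters.
print(nearest_centroid_class(b, class_centroids, [0.0, 1.0]))
```

The toy example shows why the weights matter: the same test sample changes class once a feature is removed from the distance computation. Ties are broken in favor of the lowest index here, which is an arbitrary choice not specified in the text.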


We should note here that the algorithm has computational complexity comparable with solving the linear program, since whenever the linear program is solved more than once, its size decreases with the constraints removed.

In the next section we discuss computational results on two DNA microarray data sets. For solving linear programming problems we used CPLEX [35].

4.3 Computational Experiments

4.3.1 ALL vs. AML Data Set

The feature selection program with the k-means criterion (4-13), (4-9) delivered the optimum value 7069.3582, which means that almost all features were selected with weights close to 1. The subsequent classification of the test set by (4-15) gave two misclassifications: the AML-sample 64 and AML-sample 66 were classified into the ALL class.

The pairwise threshold program (4-14), (4-10), (4-11), (4-12) selected 1457 features with nonzero weights. The subsequent classification of the test set was perfect: all ALL and AML test samples were classified into appropriate classes.

To provide justification of the quality of this result, we should mention that the support vector machines (SVM) approach delivers up to 5 classification errors on the ALL vs. AML data set depending on how the parameters of the method are tuned [89]. Furthermore, the perfect classification was obtained only with one specific set of values of the parameters.

4.3.2 Colon Cancer Data Set

A colon cancer microarray data set including expression profiles of 2000 genes from 22 normal tissues and 40 tumor samples was published in [3]. We randomly selected 11 normal and 20 tumor samples into the training set. The other half of the samples were used as the test set.

The feature selection program with the k-means criterion (4-13), (4-9) delivered the optimum value 1903.045. The number of features selected with nonzero weights was 1901.


The classification errors were as follows: 4 Normal samples (8, 12, 34, 36) are classified into Tumor class, and 2 Tumor samples (..., 36) are classified into Normal class.

The pairwise threshold constraints allowed for a feasible solution only after two iterations of exclusion of active constraints, and after that only 32 features were selected with nonzero weights. The misclassified samples are 5 Normal (..., 8, 12, 34, 36), and 2 Tumor (..., 36).

4.4 Conclusions

We have developed an optimization approach to handling data classification problems, which uses a unified methodology for feature selection and classification with the possibility of outlier detection. It has a very natural connection to the concepts of unsupervised clustering. Since the used unsupervised clustering criteria are not fixed, the methodology is highly flexible and potentially may be used to process data of arbitrary nature. The fact that the practically important data mining problems can be represented as optimization problems allows us to use standard optimization software packages to solve them. This direction gives us a promise for more efficient treatment of real-world problems, whose original formulation is normally quite fuzzy. The good performance on known microarray data sets confirms reliability of the applied methodology.


CHAPTER 5
GRAPH MODELS IN DATA ANALYSIS

One of the most important aspects of data analysis is finding an efficient way of summarizing and visualizing the data that would allow one to obtain useful information about the properties of the data. As the data normally come as a sequence of samples and the crucial information characterizing the data most often lies in relations between the samples, graph models treating the samples as vertices and the relations as edges between them come as a handy tool to represent and process the data.

One of such remarkable graph models can be found in the area of telecommunication. The call graph $G_C(V, E)$ is defined as a graph whose vertices $V$ correspond to telephone numbers, and two of them are connected by an edge $(u, v) \in E$ if a call was ever made from one number to the other. Abello et al. [2] used data from AT&T telephone billing records to construct this graph. This way, massive telecommunication traffic data were represented in a form suitable for information retrieval. Nevertheless, the call graph can be so large that the conventional methods of data analysis are not able to process it directly. Indeed, Abello et al. reported that considering a 20-day period, one obtains the call graph of about 290 million vertices and 4 billion edges. The analyzed one-day call graph had 53,767,087 vertices and over 170 million edges. This graph had 3,667,448 connected components, most of which had just two vertices connected by an edge. Only 302,468 of the components (or 8%) had more than 3 vertices. It was also observed that the graph had a giant connected component of 44,989,297 vertices. Large cliques in the call graph may represent telephone service subscribers forming close groups, so finding them is useful for a number of data mining objectives such as efficient marketing or detecting suspicious activities. Hence, the authors pursued finding large cliques as well as even larger dense subgraphs in the giant connected component using sophisticated heuristics.

A similar graph model has been used to analyze connections among Internet hosts and the network of links of the World Wide Web. In the Internet graph $G_I(V, E)$, the set


of vertices $V$ corresponds to the set of routers navigating packets of data throughout the Internet, while an edge $(u, v) \in E$ represents a physical connection between the router $u \in V$ and the router $v \in V$ (which can be either created by a cable or can be wireless). Specific data and research performed on the Internet graph can be found on the "Internet Mapping Project" web page [32]. In the Web graph $G_W(V, E)$, the set of vertices $V$ represents the web pages existing in the World Wide Web, and an edge $(u, v) \in E$ signifies a link from the page $u \in V$ to the page $v \in V$. Notice that in contrast to the graphs described earlier, the Web graph is genuinely directed, as each link in the World Wide Web has only one direction. However, for many purposes of data analysis the direction can be disregarded to simplify the model and make applicable the methods proven to be efficient on other graph models of similar type.

The next remarkable graph model of similar spirit is the market graph introduced by Boginski et al. [13]. In such a graph $G_W(V, E)$ each vertex $v \in V$ corresponds to a particular financial security. Then we compute the correlation matrix $C$ (or the matrix of another similarity measure) for the set of securities $V$ on the basis of observed prices over a certain period of time. Further, given some threshold $c_0$, we will consider an edge $(u, v)$ to be present in the graph $G$ if and only if the correlation $c_{uv}$ between securities $u$ and $v$ is not less than $c_0$. Naturally, cliques in the market graph correspond to groups of securities showing similar behavior (thus, such securities are closely related to each other due to some structural properties of the economic system). On the other hand, independent sets suggest well diversified portfolios of securities whose behavior is less related to each other under the existing economic conditions.

Finally, the last but not the least application of the graph representation of pairwise relations comes from biomedicine. The human brain is one of the most complex systems ever studied by scientists, and the enormous number of neurons and the dynamic nature of connections between them make the analysis of brain function especially challenging. However, it is important to gain a certain understanding of such dynamics, at least,

for more efficient treatment of the widespread neurological disorders, such as epilepsy. During the last decade the extensive use of electroencephalograms (EEG) allowed for a significant progress in quantitative representation of the brain function. In short, EEG simultaneously records the electric signal from a number of prespecified spots on the brain, thus providing a collection of synchronized time series. Certain measures of similarity between these series (such as T-index) can be computed and observed in their evolution throughout the time of recording (see, e.g., [56]). Then we can define the brain graph $G_B(V,E)$ whose vertices $V$ correspond to the electrodes recording EEG and, similarly to the market graph, an edge $(u,v) \in E$ between two of them is introduced if their similarity exceeds a certain threshold. Cliques in the brain graph correspond to groups of functional areas entrained in a synchronous activity, which may clearly designate the current state of the brain or some existing pathological condition.

5.1 Cluster Cores Based Clustering

It comes as no surprise that the notion of large cliques is fruitful for clustering applications. Recently, Y. D. Shen et al. developed so-called cluster cores-based clustering for high-dimensional data [80]. The main advantages of this novel approach over conventional clustering methods consist in extending the applicable similarity measures between samples to semantic-based ones, and linear scalability with respect to the number of samples. In fact, the similarity criterion for samples may be communicated by the user of the data mining application via a set of rules after the data set is already formed and made available for the analysis. For instance, if data samples represent different people and there are attributes expressing their occupation, age and income, the user may choose to take into account all of them, disregard one of them (say, age) as irrelevant to a particular research goal, or to pick just one of them (say, income) as the only one of these three attributes significant for the goal. Such interactivity helps to avoid undesirable influence of a priori irrelevant attributes on the outcome of the data mining procedure and, at the same time, to fight the "curse of dimensionality" arising from the enormous amount of

features that may characterize the samples of the data set, but which are not important to certain goals. In fact, the cores-based clustering avoids working with the features completely by operating on the similarity graph defined for samples alone. This graph has the set of vertices representing the samples of the data set, and an edge between two vertices is introduced if and only if these two samples are recognized as similar by the user. The cluster cores are defined as maximal cliques in the similarity graph exceeding a certain size:

Definition 3. A subset of samples $C^r$ is a cluster core if it satisfies the following three conditions:
1. $|C^r| \ge \alpha$;
2. Any two samples from $C^r$ are similar;
3. There exists no $C'$ such that $C^r \subset C'$ and $C'$ satisfies 2.

Then, a cluster is defined as a natural extension of the core allowing additional samples to have no less than $\theta|C^r|$ similar samples in $C^r$:

Definition 4. A subset of samples $C$ is a cluster with respect to selected threshold $\theta$ if it contains every sample of the data set such that the sample
1. either is a member of the core $C^r$,
2. or has at least $\theta|C^r|$ samples similar to it in $C^r$, $0 < \theta \le 1$.

Note that a cluster core $C^r$ and a threshold $0 < \theta \le 1$ define a unique cluster. Y. D. Shen et al. utilized the defined notions to construct clustering Algorithm 5-1.

5.2 Decision-Making under Constraints of Conflicts

Another fruitful application of graph models to data analysis comes from the applications where a set of possible decisions is considered under the constraints of pairwise conflicts between them. In such a setup, each decision is modeled as a vertex of a graph, while an edge between two vertices is introduced if the two corresponding decisions

have conflicts. Then, any admissible set of decisions corresponds to an independent set of the graph, and following a certain objective, we can associate appropriate weights with the graph vertices to obtain the correspondence between optimal sets of decisions and the maximum weight independent sets of the graph. The maximum weight independent set problem is equivalent to the maximum weight clique problem for the complementary graph, hence it can be also tackled with efficient maximum clique heuristics.

    Input: similarity graph $G(V,E)$, threshold parameter $0 < \theta \le 1$
    Output: set of clusters $\{C_i\}$
    $i \leftarrow 1$;
    while $V \ne \emptyset$ do
        find a cluster core $C^r \subseteq V$;
        construct the cluster $C_i$ as defined by $C^r$ and $\theta$;
        remove all vertices of $C_i$ from $G(V,E)$;
        $i \leftarrow i+1$;
    end

Figure 5-1. Cluster cores based clustering algorithm

    age   salary    assets    credit
    20    30,000    25,000    poor
    25    76,000    75,000    good
    30    90,000    200,000   good
    40    100,000   175,000   poor
    50    110,000   250,000   good
    60    50,000    150,000   good
    70    35,000    125,000   poor
    75    15,000    100,000   poor

Figure 5-2. Example of two CaRTs for a database

One remarkable application of this technique is modeling data sets/databases with a set of classification and regression trees (CaRTs) [17]. Let us illustrate the concept of CaRT with a simple example considered in [4] (Figure 5-2). The table presented is an excerpt from a database table with 4 attributes and 8 records. Suppose that we consider

insignificant an error of magnitude 2 for age, 5,000 for salary and 25,000 for assets. The figure depicts two possible decision trees for this data set: the first (classification) tree predicts the credit attribute with salary as the predictor attribute, the second (regression) tree predicts the assets with salary and age as the predictor attributes.

When we model a data set/database via CaRTs, we omit predicted attributes in all but the outlier samples/records, and store the corresponding CaRTs instead of it. The value of such representation is twofold. First, the CaRTs tell us a lot about the patterns existing in the data and have a powerful prediction ability for the properties of new samples (records) that may potentially be stored in the future. Second, the omission of predicted attributes is an opportunity to reduce the size of storage space required to store the data, which is the vital necessity in today's IT world overloaded with immense amounts of data and data flows. Virtually all industrial databases are maintained today without any compression, even when they are data warehouses of many terabytes in size (as an illustration of the emerging trends see, e.g., media coverage of "Top Ten Biggest, Baddest Databases" at http://www.eweek.com/article2/0,4149,1410944,00.asp).

Finding the optimal set of CaRTs minimizing size of the compressed table is a very challenging combinatorial optimization problem. In fact, there is not only the exponential number of possible CaRT-based models to choose from, but also constructing CaRTs (to estimate their compression benefits) is a computationally intensive task. Therefore, employment of sophisticated techniques from knowledge discovery and combinatorial optimization areas is crucial.

As the authors point out in [4], selection of the optimal CaRT models for a database table can be performed via solving the maximum weight independent set problem. Indeed, each CaRT can be treated as a decision representing a vertex in the graph. Two CaRTs have a conflict between each other if they either predict the same attribute, or one of them utilizes an attribute predicted by the other CaRT. The appropriate weight to assign to a CaRT is either some quantification of its prediction value

for the data analysis purposes, or the amount of storage that will be saved if the CaRT is used for the database compression purposes.

5.3 Conclusions

We have discussed a number of models and applications utilizing graphs and networks for the data analysis purposes. Ranging from IT and telecommunication to biomedicine and finance, these methodologies provide a handy tool to grasp the most important characteristics of the data and visualize them. As the technological progress continues, it may be expected that more and more practical fields will become a source of various massive data sets, for which the graph models will be the efficient approach to apply. The remarkable role of cliques and independent sets in the constructed graphs gives the additional motivation to come up with more practically efficient algorithms for the maximum clique/independent set problem, which is of a great importance for the computational complexity theory being an NP-hard problem with one of the simplest formulations. Hence, the significance of this problem for modeling data sets and complex systems cannot be overestimated.
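
The reduction used in Section 5.2 can be illustrated with a short sketch. It is a brute-force illustration under assumptions of our own: the function names, the four-decision conflict graph and its weights are hypothetical and are not taken from [4]. A maximum weight independent set of the conflict graph is found as a maximum weight clique of the complementary graph. Exhaustive enumeration is workable only for tiny graphs; for realistic instances a clique heuristic would take its place.

```python
from itertools import combinations


def max_weight_clique(n, edges, weights):
    """Exhaustively find a maximum weight clique (tiny graphs only)."""
    adjacent = {frozenset(e) for e in edges}
    best, best_weight = set(), 0
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            # A subset is a clique if every pair of its vertices is adjacent.
            if all(frozenset(p) in adjacent for p in combinations(subset, 2)):
                total = sum(weights[v] for v in subset)
                if total > best_weight:
                    best, best_weight = set(subset), total
    return best, best_weight


def max_weight_independent_set(n, conflicts, weights):
    """Solve the independent set problem on the complementary graph."""
    conflicting = {frozenset(e) for e in conflicts}
    complement = [p for p in combinations(range(n), 2)
                  if frozenset(p) not in conflicting]
    return max_weight_clique(n, complement, weights)


# Hypothetical example: four decisions whose conflicts form a path 0-1-2-3.
conflicts = [(0, 1), (1, 2), (2, 3)]
weights = [3, 5, 4, 3]
chosen, value = max_weight_independent_set(4, conflicts, weights)
print(chosen, value)
```

In this toy instance the admissible sets of two decisions are {0, 2}, {0, 3} and {1, 3}, and the sketch selects {1, 3} with total weight 8.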

CHAPTER 6
A NEW TRUST REGION TECHNIQUE FOR THE MAXIMUM WEIGHT CLIQUE PROBLEM

6.1 Introduction

Let $G(V,E)$ be a simple undirected graph, $V = \{1,2,\ldots,n\}$. The adjacency matrix of $G$ is a matrix $A_G = (a_{ij})_{n \times n}$, where $a_{ij} = 1$ if $(i,j) \in E$, and $a_{ij} = 0$ if $(i,j) \notin E$. The set of vertices adjacent to a vertex $i \in V$ will be denoted by $N(i) = \{j \in V : (i,j) \in E\}$ and called the neighborhood of the vertex $i$. A subgraph $G'(V',E')$, $V' \subseteq V$ will be called induced by the vertex subset $V'$ if $(i,j) \in E'$ whenever $i \in V'$, $j \in V'$, and $(i,j) \in E$, and $E'$ includes no other edges. A clique $Q$ is a subset of $V$ such that any two vertices of $Q$ are adjacent. It is called maximal if there is no other vertex in the graph connected with all vertices of $Q$. Similarly, an independent set $S$ is a subset of $V$ such that any two vertices of $S$ are not adjacent, and $S$ is maximal if any other vertex of the graph is connected with at least one vertex of $S$. A graph is called complete multipartite if its vertex set can be partitioned into maximal independent sets (parts) and any two vertices from different parts are adjacent. Obviously, a clique is a complete multipartite graph, whose all parts are single vertices. The maximum clique problem asks for a clique of maximum cardinality. This cardinality is called the clique number of the graph and is denoted by $\omega(G)$.

Next, we associate with each vertex $i \in V$ a positive number $w_i$ called the vertex weight. This way, along with the adjacency matrix $A_G$, we consider the vector of vertex weights $w \in \mathbb{R}^n$. The total weight of a vertex subset $S \subseteq V$ will be denoted by
$$W(S) = \sum_{i \in S} w_i.$$
The maximum weight clique problem asks for a clique $Q$ of the maximum $W(Q)$ value. We denote this value by $\omega(G,w)$.

Both the maximum cardinality and the maximum weight clique problems are NP-hard [42], so it is considered unlikely that an exact polynomial time algorithm for

them exists. Approximation of large cliques is also hard. It was shown in [52] that unless NP = ZPP no polynomial time algorithm can approximate the clique number within a factor of $n^{1-\epsilon}$ for any $\epsilon > 0$. Recently this margin was tightened to $n/2^{(\log n)^{1-\epsilon}}$ [58]. Hence there is a great need in practically efficient heuristic algorithms. For an extensive survey of developed methods, see [14]. The approaches offered include such common combinatorial optimization techniques as sequential greedy heuristics, local search heuristics, methods based on simulated annealing, neural networks, genetic algorithms, tabu search, etc. Among the most recent and promising combinatorial algorithms are the augmentation algorithm based on edge projection by Mannino and Stefanutti [66] and the decomposition method with penalty evaporation heuristic suggested by St-Louis, Ferland, and Gendron [82]. Finally, there are methods utilizing various formulations of the clique problem as a continuous (nonconvex) optimization problem. The most recent methods of this kind include PBH algorithm by Massaro, Pelillo, and Bomze [67], and Max-AO algorithm by Burer, Monteiro, and Zhang [21]. The first one is based on linear complementarity formulation of the clique problem, while the second one employs a low-rank restriction upon the primal semidefinite program computing the Lovász number ($\vartheta$-function) of a graph.

In this chapter we present a continuous maximum weight clique algorithm named QUALEX-MS (QUick ALmost EXact Motzkin–Straus-based). It follows the idea of finding stationary points of a quadratic function over a sphere for guessing near-optimum cliques exploited in QUALEX and QSH algorithms [22 25]. However, QUALEX-MS is based on a new generalized version of the Motzkin–Straus quadratic programming formulation for the maximum weight clique, and we attribute its better performance to specific properties of its optima also discussed in this chapter. A software package implementing QUALEX-MS is available at [25].

The chapter is organized as follows. In Section 6.2 we revise the Motzkin–Straus theorem to use the quadratic programming formulation for the maximum weight clique

problem. Section 6.3 reviews the trust region problem and finding its stationary points. In Section 6.4 we provide a theoretical result connecting the trust region stationary points with maximum clique finding and formulate the QUALEX-MS method itself. Section 6.5 describes computational experiments with the algorithm and their results. In the final Section 6.6 we make some conclusions and outline further research work.

6.2 The Motzkin–Straus Theorem for Maximum Clique and Its Generalization

In 1965, Motzkin and Straus formulated the maximum clique problem as a certain quadratic program over a simplex [70].

Theorem 7 (Motzkin–Straus). The global optimum value of the quadratic program
$$\max f(x) = \frac{1}{2} x^T A_G x \tag{6-1}$$
subject to
$$\sum_{i \in V} x_i = 1, \quad x \ge 0 \tag{6-2}$$
is
$$\frac{1}{2}\left(1 - \frac{1}{\omega(G)}\right). \tag{6-3}$$

See [1] for a recent direct proof. We formulate a simple generalization of this result for the maximum weight clique problem and prove it similarly to [1]. In contrast to the generalization established in [44], this one does not require any reformulation of the maximum clique quadratic program to another minimization problem. It maximally preserves the form of the original Motzkin–Straus result.

Let $w_{\min}$ be the smallest vertex weight existing in the graph. We introduce a vector $d \in \mathbb{R}^n$ such that
$$d_i = 1 - \frac{w_{\min}}{w_i}.$$
Consider the following quadratic program:
$$\max f(x) = x^T \left(A_G + \mathrm{diag}(d_1,\ldots,d_n)\right) x \tag{6-4}$$

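subject to
$$\sum_{i \in V} x_i = 1, \quad x \ge 0. \tag{6-5}$$

First, we formulate a preliminary lemma.

Lemma 1. Let $x'$ be a feasible solution of the program (6-4)–(6-5) and $(i,j) \notin E$ be a non-adjacent vertex pair such that $x'_i > 0$, $x'_j > 0$, and without loss of generality $\frac{\partial f}{\partial x_i}(x') \ge \frac{\partial f}{\partial x_j}(x')$. Then the point $x''$, where
$$x''_i = x'_i + x'_j, \quad x''_j = 0, \quad x''_k = x'_k, \; k \in V, \; i \ne k \ne j, \tag{6-6}$$
is also a feasible solution of (6-4)–(6-5) and $f(x'') \ge f(x')$. The equality $f(x'') = f(x')$ holds if and only if $w_i = w_j = w_{\min}$ and $\sum_{k \in N(i)} x'_k = \sum_{k \in N(j)} x'_k$.

Proof. It is easy to see that $x''$ satisfies the constraints (6-5) and hence it is a feasible solution. Now we show that this solution is at least as good as $x'$. Since $(i,j) \notin E$, $a_{ij} = 0$ and there is no $x_i x_j$ term in the objective $f(x)$. So, we can partition $f(x)$ into terms dependent on $x_i$, terms dependent on $x_j$, and the other terms:
$$f_i(x) = d_i x_i^2 + 2 x_i \sum_{k \in N(i)} x_k,$$
$$f_j(x) = d_j x_j^2 + 2 x_j \sum_{k \in N(j)} x_k,$$
$$\bar f_{ij}(x) = f(x) - f_i(x) - f_j(x).$$
The partial derivatives of $f(x)$ with respect to $x_i$ and $x_j$ are:
$$\frac{\partial f}{\partial x_i} = \frac{\partial f_i}{\partial x_i} = 2 d_i x_i + 2 \sum_{k \in N(i)} x_k, \qquad \frac{\partial f}{\partial x_j} = \frac{\partial f_j}{\partial x_j} = 2 d_j x_j + 2 \sum_{k \in N(j)} x_k.$$
We have $\bar f_{ij}(x'') = \bar f_{ij}(x')$ and $f_j(x'') = 0$, so to compare $f(x'')$ to $f(x')$ we should evaluate $f_i(x'')$ and compare it to $f_i(x') + f_j(x')$. In these computations we take into account that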

$d_i$ and $d_j$ are always nonnegative:
$$\begin{aligned}
f_i(x'') &= d_i (x'_i + x'_j)^2 + 2 (x'_i + x'_j) \sum_{k \in N(i)} x'_k \\
&= f_i(x') + 2 d_i x'_i x'_j + d_i (x'_j)^2 + 2 x'_j \sum_{k \in N(i)} x'_k \\
&= f_i(x') + x'_j \Big( 2 d_i x'_i + 2 \sum_{k \in N(i)} x'_k \Big) + d_i (x'_j)^2 \\
&= f_i(x') + x'_j \frac{\partial f_i}{\partial x_i}(x') + d_i (x'_j)^2 \\
&\ge f_i(x') + x'_j \frac{\partial f_j}{\partial x_j}(x') + d_i (x'_j)^2 \\
&\ge f_i(x') + x'_j \frac{\partial f_j}{\partial x_j}(x') \\
&= f_i(x') + 2 d_j (x'_j)^2 + 2 x'_j \sum_{k \in N(j)} x'_k \\
&\ge f_i(x') + d_j (x'_j)^2 + 2 x'_j \sum_{k \in N(j)} x'_k \\
&= f_i(x') + f_j(x').
\end{aligned} \tag{6-7}$$
Hence $f(x'') \ge f(x')$. Next, we observe that all the $\ge$-relations in (6-7) become equalities if and only if $d_i = d_j = 0$ and $\frac{\partial f_i}{\partial x_i}(x') = \frac{\partial f_j}{\partial x_j}(x')$. The first immediately implies $w_i = w_j = w_{\min}$, and together with the second it means that $\sum_{k \in N(i)} x'_k = \sum_{k \in N(j)} x'_k$. This completes the proof of the lemma. □

Now we are ready to establish the generalized version of the Motzkin–Straus theorem.

Theorem 8. The global optimum value of the program (6-4)–(6-5) is
$$1 - \frac{w_{\min}}{\omega(G,w)}. \tag{6-8}$$
For each maximum weight clique $Q^*$ of the graph $G(V,E)$ there is a global optimum $x^*$ of the program (6-4)–(6-5) such that
$$x^*_i = \begin{cases} w_i/\omega(G,w), & \text{if } i \in Q^* \\ 0, & \text{if } i \in V \setminus Q^*. \end{cases} \tag{6-9}$$

Proof. Let us define the support of a feasible solution $x'$ as the set of indices of nonzero variables $V' = \{i \in V : x'_i > 0\}$. From Lemma 1 it follows that the program (6-4)–(6-5) has a global optimum whose support is a clique. Indeed, if $x'$ is a global optimum such that for some non-adjacent vertex pair $(i,j) \notin E$, $x'_i > 0$ and $x'_j > 0$, then the point $x''$ defined in (6-6) is also a global optimum. Using this property we can always obtain a global optimum $x^*$ whose support is a clique $Q$. Now we show that $Q$ is necessarily a maximum weight clique.

Indeed, in the subspace $\{x_i : i \in Q\}$ we have the program:
$$\max f(x) = \sum_{i \in Q} d_i x_i^2 + \sum_{i \in Q} \sum_{j \in Q,\, j \ne i} x_i x_j \tag{6-10}$$
subject to
$$\sum_{i \in Q} x_i = 1.$$
The objective may be transformed to
$$\Big( \sum_{i \in Q} x_i \Big)^2 - \sum_{i \in Q} \frac{w_{\min} x_i^2}{w_i}.$$
The first term is equal to 1 due to the constraint, so we may consider an equivalent program:
$$\sum_{i \in Q} \frac{x_i^2}{w_i} \to \min.$$
The Lagrangian of the program is
$$\sum_{i \in Q} \frac{x_i^2}{w_i} + \lambda \Big( \sum_{i \in Q} x_i - 1 \Big).$$
It is easy to see it has the only stationary point
$$x_i = \frac{w_i}{W(Q)}, \; i \in Q, \qquad \lambda = -\frac{2}{W(Q)},$$
and this point is the minimum. So, $x^*_i = w_i/W(Q)$, $i \in Q$. Evaluate the objective $f(x^*)$. It is
$$1 - \sum_{i \in Q} \frac{w_{\min} (x^*_i)^2}{w_i} = 1 - \sum_{i \in Q} \frac{w_{\min} w_i^2}{w_i W(Q)^2} = 1 - \frac{w_{\min}}{W(Q)}.$$
This value is largest when $W(Q)$ is largest, so the objective attains a global optimum when $Q$ is a maximum weight clique. Therefore,
$$\max f(x) = f(x^*) = 1 - \frac{w_{\min}}{\omega(G,w)}.$$

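Finally, it is easy to see that for any maximum weight clique $Q^*$ the point $x^*$ defined as
$$x^*_i = \begin{cases} w_i/\omega(G,w), & \text{if } i \in Q^* \\ 0, & \text{if } i \in V \setminus Q^* \end{cases}$$
provides the objective value (6-8). So, each maximum weight clique has a global optimum of the program (6-4)–(6-5) corresponding to it as claimed. □

We extend Theorem 8 by the following result characterizing global optima of (6-4)–(6-5):

Theorem 9. Let $x^*$ be a global optimum of the program (6-4)–(6-5) and $G^*(V^*,E^*)$ be the subgraph induced by the support $V^* = \{i \in V : x^*_i > 0\}$ of $x^*$. Then $G^*$ is a complete multipartite graph, whose any part may have more than one vertex only if all vertices of this part have the same weight $w_{\min}$, and any maximal clique of $G^*$ is a maximum weight clique of the graph $G(V,E)$.

Proof. First we prove that if the subgraph $G^*$ includes a non-adjacent vertex pair $(i,j) \notin E^*$, then the vertices $i$ and $j$ necessarily have in it the same neighborhood. Lemma 1 necessitates the conditions $w_i = w_j = w_{\min}$ and $\sum_{k \in N(i)} x^*_k = \sum_{k \in N(j)} x^*_k$. Suppose there is some $\ell \in V^*$ such that $(i,\ell) \in E^*$ while $(j,\ell) \notin E^*$. Then Lemma 1 also necessitates $w_\ell = w_{\min}$ and $\sum_{k \in N(\ell)} x^*_k = \sum_{k \in N(j)} x^*_k$. Obtain the point $\tilde x$ from $x^*$ by altering only two coordinates: $\tilde x_i = x^*_i + x^*_j/2$ and $\tilde x_j = x^*_j/2$. Obviously, $\tilde x$ is also a global optimum as because of the above mentioned conditions the sum of terms of the objective dependent on $x_i$ or $x_j$ remains the same. But now
$$\sum_{k \in N(\ell)} \tilde x_k = \sum_{k \in N(\ell)} x^*_k + x^*_j/2 = \sum_{k \in N(j)} x^*_k + x^*_j/2 > \sum_{k \in N(j)} \tilde x_k,$$
so the value $f(\tilde x)$ can be improved by increasing $x_\ell$ while further decreasing $x_j$. Hence neither $\tilde x$ nor $x^*$ is a global optimum, and we have obtained a contradiction. Similarly, we can show that there is no $\ell \in V^*$ such that $(j,\ell) \in E^*$ while $(i,\ell) \notin E^*$. Thus, $i$ and $j$ have the same neighborhood in $G^*$.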

Now it is easy to see that maximal independent sets of $G^*$ do not intersect, and hence it is a complete multipartite graph. As it can have a non-adjacent pair of vertices only if both vertices of this pair have the weight $w_{\min}$, we obtain that $G^*$ cannot have multivertex parts with vertices of another weight. Next, using the transformation (6-6) for each non-adjacent vertex pair in $G^*$, we can arrive at another global optimum whose support is an arbitrary maximal clique of $G^*$. As we have shown in the proof of Theorem 8, it implies that this clique is a maximum weight one. Therefore, all maximal cliques of $G^*$ are maximum weight cliques of $G$. □

Theorem 9 evidences that all global optima of the Motzkin–Straus program are equally useful for solving the clique problem, and there is no need to drive out "spurious" optima not corresponding directly to cliques. Even more, a global optimum whose support includes non-adjacent vertices provides more information as it reveals immediately a family of optimum cliques.

One may observe that the program (6-4)–(6-5) has similar correspondence of its local optima to other maximal cliques of the graph. Hence it is complicated to arrive at an optimum clique applying gradient-based optimization methods to the Motzkin–Straus program. So, in our work we explore another approach.

For the development of our method we will use a rescaled form of the quadratic program (6-4)–(6-5). First of all, for the graph $G(V,E)$ with the vertex weights $w$ define the weighted adjacency matrix $A^{(w)}_G = (a^{(w)}_{ij})_{n \times n}$ such that
$$a^{(w)}_{ij} = \begin{cases} w_i - w_{\min}, & \text{if } i = j \\ \sqrt{w_i w_j}, & \text{if } (i,j) \in E \\ 0, & \text{if } i \ne j \text{ and } (i,j) \notin E. \end{cases} \tag{6-11}$$
Obviously, it is the ordinary adjacency matrix when all vertex weights are ones. Next, we introduce the vector of vertex weight square roots $z \in \mathbb{R}^n$:
$$z_i = \sqrt{w_i}. \tag{6-12}$$
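
As a numerical illustration of (6-11) and (6-12), the following sketch builds the weighted adjacency matrix and the vector $z$ with NumPy and evaluates the quadratic form $x^T A^{(w)}_G x$ at the point with entries $z_i/W(Q)$ on a clique $Q$ and zeros elsewhere. The helper names and the 4-vertex example graph are assumptions made for illustration only; they are not part of the QUALEX-MS package.

```python
import numpy as np


def weighted_adjacency(edges, w):
    """Return (A_w, z) as defined in (6-11) and (6-12); vertices are 0..n-1."""
    w = np.asarray(w, dtype=float)
    A = np.zeros((len(w), len(w)))
    for i, j in edges:
        A[i, j] = A[j, i] = np.sqrt(w[i] * w[j])   # adjacent vertices
    A[np.diag_indices(len(w))] = w - w.min()        # diagonal: w_i - w_min
    return A, np.sqrt(w)


def scaled_clique_point(clique, w, z):
    """Point with entries z_i / W(Q) on the clique Q and zero elsewhere."""
    x = np.zeros(len(z))
    idx = sorted(clique)
    x[idx] = z[idx] / sum(w[i] for i in idx)
    return x


# Hypothetical graph: triangle {0, 1, 2} plus the edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
w = [1.0, 2.0, 3.0, 10.0]
A_w, z = weighted_adjacency(edges, w)

x = scaled_clique_point({2, 3}, w, z)   # {2, 3} is the heaviest clique here
value = x @ A_w @ x                     # quadratic form x^T A_w x
print(value, 1 - min(w) / 13.0)         # compare with 1 - w_min / W(Q)
```

For this example $W(Q) = 13$ and $w_{\min} = 1$, so both printed numbers agree with $1 - 1/13$, the value (6-8) of Theorem 8; the point also satisfies $z^T x = 1$.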

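The rescaled formulation is given in the following corollary of Theorem 8.

Corollary 1. The global optimum value of the quadratic program
$$\max f(x) = x^T A^{(w)}_G x \tag{6-13}$$
subject to
$$z^T x = 1, \quad x \ge 0 \tag{6-14}$$
is
$$1 - \frac{w_{\min}}{\omega(G,w)}.$$
For each maximum weight clique $Q^*$ of the graph $G(V,E)$ there is a global optimum $x^*$ of (6-13)–(6-14) such that
$$x^*_i = \begin{cases} z_i/\omega(G,w), & \text{if } i \in Q^* \\ 0, & \text{if } i \in V \setminus Q^*. \end{cases} \tag{6-15}$$

Proof. Perform the variable scaling $x_i \to \sqrt{w_i}\, x_i$ in the formulation of Theorem 8. The corollary is obtained immediately. □

A useful property of the rescaled formulation is that optima corresponding to all maximum weight cliques are located at the same distance from the origin. Now we state this fact formally.

Definition 5. An indicator of a clique $Q \subseteq V$ is a vector $x^Q \in \mathbb{R}^n$ such that
$$x^Q_i = \begin{cases} z_i/W(Q), & \text{if } i \in Q \\ 0, & \text{if } i \in V \setminus Q. \end{cases}$$

Proposition 1. All cliques of the same weight $\sigma$ have indicators located at the same distance $1/\sqrt{\sigma}$ from the origin.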

Proof. It follows immediately that the indicator of a clique $Q \subseteq V$ with the weight $W(Q) = \sigma$ is a vector of the length
$$\sqrt{\sum_{i \in Q} (z_i/\sigma)^2} = \sqrt{W(Q)}/\sigma = 1/\sqrt{\sigma}.$$ □

We may notice here that cliques of larger weight have indicators located closer to the origin. The indicators of the maximum weight cliques have the smallest radius, namely, $1/\sqrt{\omega(G,w)}$.

The idea of our method is to replace the nonnegativity constraint $x \ge 0$ in (6-14) by a ball constraint $x^T x \le r^2$ of a radius $r \approx 1/\sqrt{\omega(G,w)}$ and to regard the stationary points of this new program as vectors significantly correlating with the maximum weight clique indicators. In the next section we outline polynomial time finding of stationary points of a quadratic on a sphere. In our case this technique can be used after the objective is orthogonally projected onto the hyperplane $z^T x = 1$, so this equality may be removed from the constraints. In the subsequent section we give a substantiation of the used constraint replacement proving a particular case when a spherical stationary point is exactly an optimum of the program (6-13)–(6-14) and formulate the algorithm itself.

6.3 The Trust Region Problem

The trust region problem is the optimization of a quadratic function subject to a ball constraint. The term originates from a nonlinear programming application of this problem. Namely, to improve a feasible point, a small ball (trust region) around the point is introduced and a quadratic approximation of the objective is optimized in it. Then, if the objective approximation is good enough within this locality, the ball optimum of the quadratic is very close to the optimum of the objective there, and it may be taken as the next improved feasible solution. This technique is very attractive in many cases since the optimization of a quadratic function over a sphere is polynomially solvable in contrast to general nonconvex programming [95]. There is a vast range of other sources describing

theoretical and practical results on the trust region problem [41 48 64 69]. Here we outline the complete diagonalization method deriving not only the global optimum at a given sphere radius, but all stationary points corresponding to particular radii we want to consider. That is, the radius value remains non-fixed up to a final step when it appears as a parameter of a univariate equation determining the stationary points. We note that for our application we are interested in hyperbolic objectives only, so interior stationary points never exist.

Thus, consider finding the stationary points for the problem
$$f(x) = x^T A x + 2 b^T x \tag{6-16}$$
$$\text{s.t.} \quad \sum_{i=1}^n x_i^2 = r^2,$$
where $A$ is a given real symmetric $n \times n$ matrix, $b \in \mathbb{R}^n$ is a given vector, and $x \in \mathbb{R}^n$ is the vector of variables. First, we diagonalize the quadratic form in (6-16) performing eigendecomposition of $A$:
$$A = Q\, \mathrm{diag}(\lambda_1,\ldots,\lambda_n)\, Q^T,$$
where $Q$ is the matrix of eigenvectors (stored as columns) and the eigenvalues $\{\lambda_i\}$ have nondecreasing order. In the eigenvector basis, (6-16) is
$$f(y) = \sum_{i=1}^n \lambda_i y_i^2 + 2 \sum_{i=1}^n c_i y_i, \tag{6-17}$$
$$\sum_{i=1}^n y_i^2 = r^2, \tag{6-18}$$
and the following relations hold:
$$x = Qy, \quad y = Q^T x, \quad b = Qc, \quad c = Q^T b. \tag{6-19}$$
The Lagrangian of (6-17)–(6-18) is
$$L(y,\mu) = \sum_{i=1}^n \lambda_i y_i^2 + 2 \sum_{i=1}^n c_i y_i - \mu \Big( \sum_{i=1}^n y_i^2 - r^2 \Big). \tag{6-20}$$

$\mu$ is the Lagrangian multiplier of the spherical constraint here. We take it with negative sign for the sake of convenience. The stationary conditions are
$$\frac{\partial L}{\partial y_i} = 0, \quad \frac{\partial L}{\partial \mu} = 0.$$
So,
$$\frac{\partial L}{\partial y_i} = 2(\lambda_i - \mu) y_i + 2 c_i = 0,$$
and assuming $\mu \ne \lambda_i$,
$$y_i = \frac{c_i}{\mu - \lambda_i}. \tag{6-21}$$
Substituting (6-21) into the spherical constraint (6-18), we get
$$\sum_{i=1}^n \frac{c_i^2}{(\mu - \lambda_i)^2} - r^2 = 0. \tag{6-22}$$
The left-hand side of (6-22) is a univariate function consisting of $n+1$ continuous and convex pieces. As all the numerators are positive, in each piece between two successive eigenvalues of $A$ it may intersect the $\mu$-axis twice (determining two stationary points on the sphere), touch it once (determining one stationary point), or be over the axis (no stationary point corresponds to these $\mu$ values). That depends on the chosen radius $r$: the greater the radius, the more cases of two spherical stationary points within one continuous piece of (6-22). Two outermost continuous pieces are $(-\infty, \lambda_1)$ and $(\lambda_n, +\infty)$. In each of them (6-22) always has one and only one root. The root in the first piece is the global minimum, the root in the second piece is the global maximum.

A degenerate case when $\mu = \lambda_i$ for some $i$ is possible if $c_i = 0$. Then, if $\lambda_i$ is a multiple eigenvalue of $A$, all $c_j$ corresponding to $\lambda_j = \lambda_i$ must be equal to zero to cause the degeneracy. Then all $y_j$ such that $\mu \ne \lambda_j$ should be computed by (6-21), and if the sum of their squares is not above $r^2$, any combination of the rest entries of $y$ obeying (6-18) gives a stationary point. Formally, we have a cluster of $k$ equal eigenvalues

$$\lambda_i = \lambda_{i+1} = \ldots = \lambda_{i+k-1} \quad \text{with} \quad c_i = c_{i+1} = \ldots = c_{i+k-1} = 0. \tag{6-23}$$
If
$$r_0^2 = \sum_{j=1}^{i-1} y_j^2 + \sum_{j=i+k}^{n} y_j^2 \le r^2, \tag{6-24}$$
where the values $y_j$ are computed by (6-21) with $\mu = \lambda_i$, then any $y_i, y_{i+1}, \ldots, y_{i+k-1}$ such that
$$\sum_{j=i}^{i+k-1} y_j^2 = r^2 - r_0^2$$
provide a stationary point. So, it is possible then that the number of stationary points is infinite. In our method we will consider, in the degenerate case, only such points that all but one of the entries $y_i, y_{i+1}, \ldots, y_{i+k-1}$ are zero. There are $2k$ cases:
$$\begin{aligned}
& y_i = \pm\sqrt{r^2 - r_0^2}, \; y_{i+1} = 0, \; \ldots, \; y_{i+k-1} = 0; \\
& y_i = 0, \; y_{i+1} = \pm\sqrt{r^2 - r_0^2}, \; \ldots, \; y_{i+k-1} = 0; \\
& \ldots \\
& y_i = 0, \; y_{i+1} = 0, \; \ldots, \; y_{i+k-1} = \pm\sqrt{r^2 - r_0^2},
\end{aligned} \tag{6-25}$$
so an eigenvalue of multiplicity $k$ gives $2k$ points to consider.

Finally, we note that the total complexity of the above procedure is $O(n^3)$ if we derive $O(n)$ stationary points and it takes not more than $O(n^2)$ time to get one $\mu$ value. Indeed, the eigendecomposition may be computed up to any fixed precision in $O(n^3)$ time [72], and each basis conversion in (6-19) takes quadratic time, so generally we have one $O(n^3)$ computation at the beginning of the procedure, and $O(n)$ computations of $O(n^2)$ complexity each afterwards.
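
The procedure of this section can be sketched in a few lines of NumPy for the simplest non-degenerate situation. The sketch below computes only the root of (6-22) in the outermost piece $(\lambda_n, +\infty)$, that is, the global maximum of (6-16) on the sphere. It assumes $c_n \ne 0$, and it uses plain bisection in place of a more refined root finder. The function name and the random test data are hypothetical.

```python
import numpy as np


def sphere_global_max(A, b, r, iterations=200):
    """Global maximizer of x^T A x + 2 b^T x on the sphere ||x|| = r.

    Assumes A is symmetric and that the component of b along the
    eigenvector of the largest eigenvalue is nonzero (non-degenerate case).
    """
    lam, Q = np.linalg.eigh(A)          # eigenvalues in nondecreasing order
    c = Q.T @ b                         # c = Q^T b, as in (6-19)

    def phi(mu):                        # left-hand side of (6-22)
        return np.sum(c**2 / (mu - lam)**2) - r**2

    # On (lam_n, +inf) phi decreases monotonically, and
    # phi(lo) >= 0 >= phi(hi) holds for the following bracket.
    lo = lam[-1] + abs(c[-1]) / r
    hi = lam[-1] + np.linalg.norm(c) / r
    for _ in range(iterations):         # plain bisection
        mid = 0.5 * (lo + hi)
        if phi(mid) > 0:
            lo = mid
        else:
            hi = mid
    mu = 0.5 * (lo + hi)
    y = c / (mu - lam)                  # stationary point by (6-21)
    return Q @ y, mu                    # back to the original basis: x = Q y


rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = (M + M.T) / 2                       # random symmetric test matrix
b = rng.standard_normal(5)
x, mu = sphere_global_max(A, b, r=0.5)
print(np.linalg.norm(x), mu)
```

The returned pair satisfies the stationarity condition $Ax + b = \mu x$ together with $\|x\| = r$. The remaining stationary points would require scanning the inner pieces between successive eigenvalues, which this sketch deliberately leaves out.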

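6.4 The QUALEX-MS Algorithm

Thus, we will work with the program
$$\max f(x) = x^T A^{(w)}_G x \tag{6-26}$$
$$\text{s.t.} \quad z^T x = 1, \quad x^T x \le r^2,$$
where $r$ is a parameter not fixed a priori. We designate now a particular case, when a stationary point of the program (6-26) is an optimum of the program (6-13)–(6-14). It happens when for each vertex outside a maximum weight clique the weight sum of adjacent vertices in the clique is constant. Namely, the following theorem holds.

Theorem 10. Let $Q \subseteq V$ be a maximal clique of the graph $G(V,E)$ such that
$$\forall v \in V \setminus Q: \; W(N(v) \cap Q) = C,$$
where $C$ is some fixed value. Then the indicator $x^Q$ of $Q$
$$x^Q_i = \begin{cases} z_i/W(Q), & \text{if } i \in Q \\ 0, & \text{if } i \in V \setminus Q \end{cases}$$
is a stationary point of the program (6-26) when the parameter $r = 1/\sqrt{W(Q)}$.

Proof. Consider the Lagrangian of the program (6-26). It is
$$L(x^Q, \mu_1, \mu_2) = (x^Q)^T A^{(w)}_G x^Q + \mu_1 (z^T x^Q - 1) + \mu_2 \big( (x^Q)^T x^Q - r^2 \big).$$
Its partial derivatives are
$$\frac{\partial L}{\partial x^Q_i} = 2 \sum_{j \in V} a^{(w)}_{ij} x^Q_j + z_i \mu_1 + 2 x^Q_i \mu_2 = 2 z_i \Big( z_i x^Q_i + \sum_{j \in N(i)} z_j x^Q_j \Big) - 2 w_{\min} x^Q_i + z_i \mu_1 + 2 x^Q_i \mu_2.$$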

Let $i \in Q$. Then it gives
$$\frac{\partial L}{\partial x^Q_i} = 2 z_i \sum_{j \in Q} \frac{w_j}{W(Q)} - \frac{2 w_{\min} z_i}{W(Q)} + z_i \mu_1 + \frac{2 z_i \mu_2}{W(Q)} = z_i \left( 2 - \frac{2 w_{\min}}{W(Q)} + \mu_1 + \frac{2 \mu_2}{W(Q)} \right).$$
Conversely, if $i \in V \setminus Q$,
$$\frac{\partial L}{\partial x^Q_i} = 2 z_i \sum_{j \in N(i) \cap Q} z_j x^Q_j + z_i \mu_1 = z_i \left( \frac{2C}{W(Q)} + \mu_1 \right).$$
We may see that in both cases the partial derivative is the same for each $i$ up to a nonzero multiplier $z_i$. So, the stationary point criterion system $\partial L / \partial x^Q_i = 0$ is reduced to two equations over two variables $\mu_1$ and $\mu_2$. The second equation directly gives
$$\mu_1 = -\frac{2C}{W(Q)}.$$
Substituting this into the first equation, we obtain
$$\mu_2 = C + w_{\min} - W(Q).$$
So, there are values of the Lagrangian multipliers satisfying the stationary point criterion. Therefore, $x^Q$ is a stationary point of the program (6-26). □

We notice that the obtained $\mu_2$ value is negative unless the weight of the clique $Q$ can be increased by a one-to-one vertex exchange. This means that in the stationary points we are interested in the gradient of the objective is directed outside the constraining sphere. It is consistent with the fact that we look for a maximum of the objective. We note a special case of Theorem 10 corresponding to the maximum cardinality clique problem.

Corollary 2. Let $Q \subseteq V$ be a maximal clique of the graph $G(V,E)$ such that
$$\forall v \in V \setminus Q: \; |N(v) \cap Q| = C,$$

where $C$ is some fixed value, and all vertex weights $w_i$ equal 1. Then the indicator $x^Q$ of $Q$
$$x^Q_i = \begin{cases} 1/|Q|, & \text{if } i \in Q \\ 0, & \text{if } i \in V \setminus Q \end{cases}$$
is a stationary point of the program (6-26) when the parameter $r = 1/\sqrt{|Q|}$.

Generally, optima of (6-13)–(6-14) cannot be found directly as stationary points of (6-26). However, we accept the assumption that if the parameter $r$ is close to $1/\sqrt{\omega(G,w)}$, then the stationary points of (6-26), where the objective gradient is directed outside, provide significant information about maximum weight clique indicators. This may be supported by the fact that the conjunction of three imposed requirements (maximization of a quadratic form whose matrix is nonnegative, positive dot product with the positive vector $z$, and a rather small norm of the sought vector $x$) suggests that the vector $x$ should have rather been composed of positive entries. Thus, we heuristically expect that violation of the nonnegativity constraint is not very dramatic.

As the next step, we show how to reduce the program (6-26) to a trust region problem projecting orthogonally the objective onto the hyperplane $z^T x = 1$. First, we move the origin into a new point
$$x^0 = z/W(V). \tag{6-27}$$
This point is the orthogonal projection of the origin onto the hyperplane $z^T x = 1$. That is, we introduce new variables $\hat x = x - z/W(V)$. This way we obtain a new program equivalent to (6-26):
$$\max g(\hat x) = \hat x^T A^{(w)}_G \hat x + 2 (x^0)^T A^{(w)}_G \hat x \tag{6-28}$$
$$\text{s.t.} \quad z^T \hat x = 0, \quad \hat x^T \hat x \le \hat r^2,$$
where $\hat r^2 = r^2 - 1/W(V)$ (here we took into account that $(x^0)^T x^0 = 1/W(V)$). Now the constraining equality determines a linear subspace. The orthogonal projector onto it is a

matrix $P = (p_{ij})_{n \times n}$, where
$$p_{ij} = \begin{cases} 1 - w_i/W(V), & \text{if } i = j \\ -\sqrt{w_i w_j}/W(V), & \text{if } i \ne j. \end{cases}$$
Thus, the program (6-28) may be reformulated as
$$\max g(\hat x) = \hat x^T \hat A \hat x + 2 \hat b^T \hat x \tag{6-29}$$
$$\text{s.t.} \quad \hat x^T \hat x \le \hat r^2,$$
where $\hat A = P A^{(w)}_G P$ and $\hat b^T = (x^0)^T A^{(w)}_G P$. This is a trust region problem (optimization of a quadratic subject to a single ball constraint). Direct matrix manipulations show that $\hat A$ and $\hat b$ can be computed by the formulas
$$\hat a_{ij} = a^{(w)}_{ij} - x^0_j \tilde w_i - x^0_i \tilde w_j + x^0_i x^0_j D \tag{6-30}$$
and
$$\hat b_i = \frac{\tilde w_i - x^0_i D}{W(V)}, \tag{6-31}$$
where
$$\tilde w_i = \sqrt{w_i} \Big( w_i - w_{\min} + \sum_{j \in N(i)} w_j \Big) \tag{6-32}$$
(which are vertex degrees in the unweighted case), and
$$D = \sum_{j \in V} w_j (w_j - w_{\min}) + \sum_{(j,k) \in E} w_j w_k \tag{6-33}$$
(the second sum counts each edge in both orders, so that $D = z^T A^{(w)}_G z$).

Thus, if $Q$ is a maximum weight clique obeying Theorem 10 conditions, its indicator may be recognized by the trust region procedure described in the previous section.

Generally, we will handle the maximum weight clique problem in the following way allowing us to preserve the total complexity of the method in an $O(n^3)$ time. Before applying the trust region technique, we find a possibly best clique $Q$ by a fast greedy procedure. To improve it, we will try to search for cliques weighing at least

$W(Q) + w_{\min}$ using the stationary points of the program (6-26). It follows from Proposition 1 that we should be interested in those points, where
$$\hat r^2 = \frac{1}{W(Q) + w_{\min}} - \frac{1}{W(V)} \tag{6-34}$$
or less. In our method we consider the stationary points having this $\hat r^2$ value, plus those corresponding to $\mu$ values minimizing the left-hand side of (6-22) in each continuous section. Since cliques of larger weights correspond to lesser radii, we have a chance to correct the "shallowness" of the formula (6-34) considering the minimum possible radii. Besides, to find stationary points at any fixed radius, we need to find those minimizing $\mu$ values anyway to determine how many roots (6-22) has on each continuous section. If the left-hand side minimum on a continuous section is negative, there are two roots and each of them is bracketed between the minimizing point and one of the section bounds. Both univariate minimization and univariate root finding when a root is bracketed may be efficiently performed by Brent's method [18].

Next, each of the obtained stationary points is passed to a greedy heuristic as a new vertex weight vector and the found clique is compared to the best clique known at this moment (initially it is the clique found at the preliminary stage). The algorithm result is the best clique obtained upon completion of this process.

The greedy heuristic used in our method to process the stationary points is a generalization of the New-Best-In sequential degree heuristic. It runs in $O(n^2)$ time. The usual version of this algorithm is when the input vector $x$ is the vertex weight vector $w$. Within our trust region technique we submit to this routine the obtained spherical stationary points.

Before anything else we apply a preprocessing able to reduce the input graph in some instances. It is clear that removing of too low connected vertices and preselection of too high connected vertices (when these operations do not lead to missing of the exact solution) are desirable as the Theorem 10 condition may be violated because of

such vertices most. Thus, we iteratively remove vertices, whose weight together with the neighborhood weight is below the clique weight derived by Algorithm 6-1, and preselect any vertex non-adjacent only to a set weighing not more than the vertex itself. It is easy to see that one 2.6–2.11 cycle takes not more than an $O(n^2)$ time and is repeated only if at least one vertex is removed from the graph. As well, there are not more than $n$ calls of the Algorithm 6-1. Hence, the preprocessing complexity is in $O(n^3)$.

    Input: graph $G(V,E)$, vector $x \in \mathbb{R}^n$
    Output: a maximal clique $Q$
    construct vector $y \in \mathbb{R}^n$ such that $y_i = x_i + \sum_{j \in N(i)} x_j$;
    $V_1 \leftarrow V$, $k \leftarrow 1$, $Q \leftarrow \emptyset$;
    while $V_k \ne \emptyset$ do
        choose a vertex $v_k \in V_k$ such that $y_{v_k}$ is greatest;
        $Q \leftarrow Q \cup \{v_k\}$;
        $V_{k+1} \leftarrow V_k \cap N(v_k)$;
        foreach $j \in V_{k+1}$ do
            $y_j \leftarrow y_j - \sum_{\ell \in (V_k \setminus V_{k+1}) \cap N(j)} x_\ell$;
        end
        $k \leftarrow k+1$;
    end

Figure 6-1. New-best-in weighted heuristic

The preliminary greedy heuristic we use to derive a first approximation of the maximum weight clique calls Algorithm 6-1 $n$ times starting from each of the vertices as chosen a priori. Obviously, the complexity of Algorithm 6-3 is $n \times O(n^2) = O(n^3)$. It does not exceed the trust region procedure complexity, so this process does not increase the total complexity of the method. Thus, we propose Algorithm 6-4 for the maximum weight clique problem.

6.5 Computational Experiment Results

The goal of the first computational experiment was to find a smallest maximum clique instance, where QUALEX-MS cannot find an exact solution. We used the program geng available at [68] to generate all non-isomorphic to each other graphs up to 10 vertices inclusive. QUALEX-MS successfully found exact solutions to all those instances.
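
The greedy routine of Figure 6-1 can be sketched in Python as follows. This is an illustrative transcription, not the code of the QUALEX-MS package: the function name, the dictionary-of-sets graph representation and the small example graph are assumptions of this sketch.

```python
def new_best_in(adj, x):
    """Greedy maximal clique following Figure 6-1.

    adj maps each vertex to the set of its neighbors; x maps each vertex
    to the value that plays the role of its weight.
    """
    # y_i = x_i + sum of x_j over the neighbors j of i
    y = {i: x[i] + sum(x[j] for j in adj[i]) for i in adj}
    candidates = set(adj)                    # V_k
    clique = set()                           # Q
    while candidates:
        v = max(candidates, key=lambda i: y[i])
        clique.add(v)
        remaining = candidates & adj[v]      # V_{k+1}
        dropped = candidates - remaining     # V_k \ V_{k+1}, includes v
        for j in remaining:
            # discount the neighbors of j that are no longer candidates
            y[j] -= sum(x[l] for l in dropped & adj[j])
        candidates = remaining
    return clique


# Hypothetical example: triangle {0, 1, 2} plus the edge (2, 3).
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
weights = {0: 1.0, 1: 2.0, 2: 3.0, 3: 10.0}
print(new_best_in(adj, weights))
```

With the vertex weights passed as $x$, the sketch first picks vertex 2 (its $y$ value is 16), then vertex 3, and returns the clique {2, 3} of weight 13, which is the maximum weight clique of this small graph.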

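    Input: graph $G(V,E)$, its vertex weight vector $w$
    Output: reduced graph $G(V,E)$, preselected vertex subset $Q_0$, clique $Q$
    $Q_0 \leftarrow \emptyset$, $B \leftarrow 0$;
    repeat
        assign $Q$ the result of Algorithm 6-1 for $G(V,E)$ with $w$;
        if $W(Q) \le B$ then
            break;
        end
        $B \leftarrow W(Q)$;
        flag $\leftarrow$ false;
        compose set $R$ of vertices $i \in V$ such that $w_i + \sum_{j \in N(i)} w_j$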

Input: graph $G(V,E)$, its vertex weight vector $w$
Output: maximal clique $Q$
    execute Algorithm 6-2; store the preselected vertex set $Q_0$ and the clique $Q$;
    if $V = \emptyset$ then $Q \leftarrow Q \cup Q_0$; exit; end
    execute Algorithm 6-3 and store the result $\hat{Q}$;
    compute $z$ by (6–12), $x_0$ by (6–27), $w$ by (6–32), and $D$ by (6–33);
    compute $\hat{A}$ by (6–30) and $\hat{b}$ by (6–31);
    perform the eigendecomposition $\hat{A} = R \,\mathrm{diag}(\lambda_1, \ldots, \lambda_n)\, R^T$;
    compute the vector $c = R^T \hat{b}$;
    compute $r^2$ as $\hat{r}^2$ by (6–34) for $W(\hat{Q})$;
    foreach $\mu > 0$ minimizing the left-hand side of (6–22) in a continuous interval or satisfying (6–22) do
        compute $y$ by (6–21);
        $x \leftarrow Ry + x_0$;
        rescale $x_i \leftarrow z_i x_i$, $i \in V$;
        execute Algorithm 6-1 with the vector $x$ and rewrite the result in $\hat{Q}$ if it is a better solution;
    end
    foreach eigenvalue cluster $\lambda_i = \ldots = \lambda_{i+k-1} > 0$ satisfying (6–23) do
        compute all $y_j$, $j \in V \setminus \{i, \ldots, i+k-1\}$ by (6–21);
        compute $r_0^2$ by (6–24);
        if $r_0^2 \le r^2$ then
            foreach combination of $y_j$, $j \in \{i, \ldots, i+k-1\}$ defined by (6–25) do
                $x \leftarrow Ry + x_0$;
                rescale $x_i \leftarrow z_i x_i$, $i \in V$;
                execute Algorithm 6-1 with the vector $x$ and rewrite the result in $\hat{Q}$ if it is a better solution;
            end
        end
    end
    if $\hat{Q}$ is a better solution than $Q$ then $Q \leftarrow \hat{Q}$; end
    $Q \leftarrow Q \cup Q_0$;
Figure 6-4. QUALEX-MS algorithm
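The step of Figure 6-4 that scans one continuous section (find the minimizer of the left-hand side, then bracket a root on each side of it) can be sketched in Python. Equation (6–22) is not reproduced in this part of the text, so the sketch assumes the standard secular form $\sum_i c_i^2 / (\lambda_i - \mu)^2 = r^2$ from Forsythe and Golub [41]. Golden-section search and bisection stand in here for Brent's method [18] to keep the example dependency-free, and all function names are hypothetical.

```python
import math


def secular(mu, lam, c):
    """Assumed left-hand side of (6-22): sum_i c_i^2 / (lam_i - mu)^2."""
    return sum(ci * ci / (li - mu) ** 2 for li, ci in zip(lam, c))


def golden_min(f, a, b, tol=1e-9):
    """Golden-section search for the minimiser of a unimodal f on [a, b]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    x1, x2 = b - g * (b - a), a + g * (b - a)
    f1, f2 = f(x1), f(x2)
    while b - a > tol:
        if f1 < f2:                      # minimum lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = b - g * (b - a)
            f1 = f(x1)
        else:                            # minimum lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + g * (b - a)
            f2 = f(x2)
    return 0.5 * (a + b)


def bisect_root(f, lo, hi, tol=1e-12):
    """Bisection on [lo, hi]; f(lo) and f(hi) must have opposite signs."""
    flo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        fmid = f(mid)
        if (fmid > 0) == (flo > 0):
            lo, flo = mid, fmid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def section_roots(lo, hi, lam, c, r2, eps=1e-9):
    """Roots of secular(mu) = r2 strictly between adjacent eigenvalues lo < hi.

    Assumes the c-components belonging to both bounding eigenvalues are
    nonzero, so the function blows up at both ends of the section.
    """
    width = hi - lo
    a, b = lo + eps * width, hi - eps * width
    g = lambda mu: secular(mu, lam, c) - r2
    if g(a) <= 0 or g(b) <= 0:
        raise ValueError("expected poles at both section bounds")
    m = golden_min(g, a, b)              # the section minimiser
    if g(m) >= 0:
        return []                        # minimum not negative: no two roots
    # One root is bracketed on each side of the minimiser.
    return [bisect_root(g, a, m), bisect_root(g, m, b)]


# Hypothetical data: two eigenvalues, unit coefficients, radius squared 4.
lam, c = [1.0, 3.0], [1.0, 1.0]
print(section_roots(1.0, 3.0, lam, c, 4.0))
```

On each open interval between adjacent eigenvalues the assumed left-hand side is convex, which is why a single minimizer separates at most two roots there. A production code would use Brent's method for both subproblems, as the text prescribes.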

Though it cannot be excluded that with another vertex numbering in one of them the exact solution would have been lost, we consider this result to be strong evidence that counterexamples to the algorithm do not exist at least up to 11-vertex graphs. Unfortunately, there are too many non-isomorphic 11-vertex graphs to continue the experiment the same way, so it has not been completed.

Next, we tested QUALEX-MS on all 80 DIMACS maximum clique instances (available at ftp://dimacs.rutgers.edu/pub/challenge/graph/) and compared the results against our earlier algorithms QSH and QUALEX 2.0 [22, 25]. All three programs were run on a Pentium IV 1.4 GHz computer under OS Linux RedHat. However, the QUALEX-MS package makes use of a new eigendecomposition routine DSYEVR from LAPACK, which involves Relatively Robust Representations to compute eigenpairs after the matrix is reduced to a tridiagonal form [37]. This explains the improvement of the average running time versus the two other programs. As a BLAS implementation, the platform-specific prebuilt ATLAS library (available at http://www.netlib.org/atlas/archives/) was used. Exact or best known solutions were found by QUALEX-MS in 57 instances. This is significantly better than the 39 exact or best known solutions by QSH and an advance compared to the 51 exact or best known solutions by QUALEX 2.0. For the rest of the DIMACS graphs QUALEX-MS obtained good approximate solutions. The results are presented in Table 6-1.

The last computational experiment performed with QUALEX-MS was finding maximum weight cliques. Since there are no widely accepted maximum weight clique test suites, we followed the approach accepted in [67] and tested the algorithm against normal and irregular random graphs with various edge densities, comparing it to the algorithm PBH, which is another recent continuous-based heuristic. To generate the irregular random graphs, Algorithm 4.1 from [67] was used. Vertex weights were evenly distributed

Table 6-1. DIMACS maximum clique benchmark results

Instance             n  density  ω(G)            QSH     QUALEX 2.0      QUALEX-MS
                                        found   time   found   time   found   time
brock200_1         200    0.745    21      21      1      21      1      21      1
brock200_2         200    0.496    12      12     <1      12     <1      12     <1
brock200_3         200    0.605    15      15     <1      15      1      15      1
brock200_4         200    0.658    17      17      1      17     <1      17     <1
brock400_1         400    0.748    27      27      4      27      4      27      2
brock400_2         400    0.749    29      29      4      29      4      29      3
brock400_3         400    0.748    31      31      4      31      5      31      2
brock400_4         400    0.749    33      33      4      33      4      33      2
brock800_1         800    0.649    23      17     37      23     36      23     18
brock800_2         800    0.651    24      24     38      24     35      24     18
brock800_3         800    0.649    25      25     38      25     37      25     18
brock800_4         800    0.650    26      26     37      26     35      26     18
C125.9             125    0.898    34      31     <1      33     <1      34     <1
C250.9             250    0.899    44      42      1      43      1      44      1
C500.9             500    0.900    57      52      8      53      8      55      4
C1000.9           1000    0.901    68      62    103      63     71      64     27
C2000.5           2000    0.500    16      13   1593      16   1547      16    278
C2000.9           2000    0.900    77      67   1545      72   1519      72    215
C4000.5           4000    0.500    18      15  16198      17  15558      17   2345
c-fat200-1         200    0.077    12      12     <1      12     <1      12     <1
c-fat200-2         200    0.163    24      24     <1      24     <1      24     <1
c-fat200-5         200    0.426    58      58     <1      58     <1      58     <1
c-fat500-1         500    0.036    14      14      5      14      4      14      1
c-fat500-2         500    0.073    26      26      5      26      3      26      2
c-fat500-5         500    0.186    64      64      2      64      2      64      2
c-fat500-10        500    0.374   126     126      3     126      3     126      2
DSJC500.5          500    0.500    13      11      9      13      8      13      5
DSJC1000.5        1000    0.500    15      13     85      14     74      14     36
gen200_p0.9_44     200    0.900    44      37      1      39      1      42     <1
gen200_p0.9_55     200    0.900    55      55     <1      55     <1      55      1
gen400_p0.9_55     400    0.900    55      48      4      50      4      51      2
gen400_p0.9_65     400    0.900    65      63      4      65      4      65      2
gen400_p0.9_75     400    0.900    75      75      4      75      4      75      2
hamming6-2          64    0.905    32      32     <1      32     <1      32     <1
hamming6-4          64    0.349     4       4     <1       4     <1       4     <1
hamming8-2         256    0.969   128     128      1     128      1     128     <1
hamming8-4         256    0.639    16      16      1      16      1      16      1
hamming10-2       1024    0.990   512     512     72     512     61     512     38
hamming10-4       1024    0.829    40      36     70      36     62      36     45

Table 6-1. Continued

Instance             n  density  ω(G)            QSH     QUALEX 2.0      QUALEX-MS
                                        found   time   found   time   found   time
johnson8-2-4        28    0.556     4       4     <1       4     <1       4     <1
johnson8-4-4        70    0.768    14      14     <1      14     <1      14     <1
johnson16-2-4      120    0.765     8       8     <1       8     <1       8     <1
johnson32-2-4      496    0.879    16      16      5      16      5      16      8
keller4            171    0.649    11      11     <1      11     <1      11      1
keller5            776    0.751    27      23     22      26     19      26     16
keller6           3361    0.818    59      48   6095      51   5721      53   1291
MANN_a9             45    0.927    16      16     <1      16     <1      16     <1
MANN_a27           378    0.990   126     125      2     126      2     125      1
MANN_a45          1035    0.996   345     342     70     342     61     342     17
MANN_a81          3321    0.999  1100    1096   6671    1096   6057    1096    477
p_hat300-1         300    0.244     8       7      1       8      2       8      1
p_hat300-2         300    0.489    25      24      2      24      1      25      1
p_hat300-3         300    0.744    36      33      1      35      2      35      1
p_hat500-1         500    0.253     9       9      9       9      9       9      3
p_hat500-2         500    0.505    36      33      8      36      9      36      4
p_hat500-3         500    0.752    50      46      8      48      9      48      4
p_hat700-1         700    0.249    11       8     23      11     24      11     10
p_hat700-2         700    0.498    44      42     24      43     26      44     12
p_hat700-3         700    0.748    62      59     24      61     24      62     11
p_hat1000-1       1000    0.245    10       9     82      10     76      10     28
p_hat1000-2       1000    0.490    46      43     85      45     79      45     34
p_hat1000-3       1000    0.744    68      62     83      65     76      65     32
p_hat1500-1       1500    0.253    12      10    458      12    489      12     95
p_hat1500-2       1500    0.506    65      62    453      64    507      64    111
p_hat1500-3       1500    0.754    94      85    465      91    486      91    108
san200_0.7_1       200    0.700    30      30     <1      30     <1      30      1
san200_0.7_2       200    0.700    18      18      1      18     <1      18     <1
san200_0.9_1       200    0.900    70      70     <1      70      1      70     <1
san200_0.9_2       200    0.900    60      60     <1      60     <1      60      1
san200_0.9_3       200    0.900    44      35      1      40     <1      40     <1
san400_0.5_1       400    0.500    13       9      3      13      4      13      2
san400_0.7_1       400    0.700    40      40      4      40      4      40      3
san400_0.7_2       400    0.700    30      30      4      30      4      30      2
san400_0.7_3       400    0.700    22      16      4      17      4      18      2
san400_0.9_1       400    0.900   100     100      3     100      4     100      2
san1000           1000    0.502    15      10     76      15     69      15     25
sanr200_0.7        200    0.697    18      15      1      17      1      18      1
sanr200_0.9        200    0.898    42      37     <1      41     <1      41     <1
sanr400_0.5        400    0.501    13      11      4      12      4      13      2
sanr400_0.7        400    0.700    21      18      4      20      5      20      2

Table 6-2. Performance of QUALEX-MS vs. PBH on random weighted graphs

                           QUALEX-MS                              PBH
                   Normal         Irregular          Normal         Irregular
   n  density    Avg.R  St.D.     Avg.R  St.D.     Avg.R  St.D.     Avg.R  St.D.
 100     0.10  100.00%   0.00   100.00%   0.00    97.95%   0.15    98.44%   0.13
 100     0.20  100.00%   0.00    99.88%   0.05    97.73%   0.16    98.64%   0.12
 100     0.30   99.87%   0.05    99.89%   0.04    97.25%   0.17    98.84%   0.11
 100     0.40   99.48%   0.18    99.75%   0.05    95.04%   0.23    98.53%   0.12
 100     0.50   99.45%   0.19    99.81%   0.04    94.61%   0.24    98.74%   0.12
 100     0.60   99.18%   0.21    99.93%   0.02    94.71%   0.23    99.64%   0.06
 100     0.70   98.02%   0.32    99.84%   0.03    96.10%   0.20    98.94%   0.11
 100     0.80   98.54%   0.29    99.99%   0.00    93.13%   0.26    98.56%   0.12
 100     0.90   98.43%   0.27    99.99%   0.00    94.29%   0.24    99.56%   0.07
 100     0.95   98.72%   0.20   100.00%   0.00    96.49%   0.19    99.75%   0.05
 200     0.10  100.00%   0.00    99.97%   0.04
 200     0.20   99.55%   0.19    99.86%   0.04
 200     0.30   99.33%   0.29    99.45%   0.16
 200     0.40   99.08%   0.45    99.36%   0.35
 200     0.50   98.34%   0.46    99.32%   0.14
 200     0.60   98.00%   0.35    99.61%   0.10
 200     0.70   96.99%   0.64    99.54%   0.12
 200     0.80   96.21%   0.55    99.71%   0.10

random integer numbers from 1 to 10. Due to the significantly better speed of QUALEX-MS compared to the heuristics considered in [67] and the availability of a highly optimized exact maximum weight clique solver, cliquer by P. Östergård and S. Niskanen [71], we were able to perform the tests not only on 100-vertex graphs but also on 200-vertex graphs up to the edge density 0.8. As well, we increased the number of tested graphs in each group from 20 to 50. The running time of QUALEX-MS on all those instances is within 1 second, so it may be considered negligible. However, similar testing on larger graphs is unfortunately difficult because of significant slowing down of the exact solver. Table 6-2 presents the results of this computational experiment. The measured value is the percentage of the found clique weights to the optimum clique weights, averaged through all graphs of a group (Avg.R columns). The second result columns represent standard deviations of these values (St.D. columns). The obtained figures show that our method

strictly outperforms the algorithm PBH, and the difference between maximum weight cliques and those found by QUALEX-MS is rather negligible. Apart from these computational experiments, Babu et al. report that QUALEX-MS has always been able to find the exact solutions of maximum weight clique problem instances constructed to optimize classification and regression for databases, which were considered in their database compression experiments [4].

6.6 Remarks and Conclusions

We have presented a new fast heuristic method for the maximum weight clique problem. It has been shown empirically that the method is exact on a considerable range of instances. Among them are the Brockington–Culberson graphs from the DIMACS test suite [19] (brock*), which are exceptionally hard for all other types of heuristics that may be found in the literature. Besides, we have specified theoretically a non-trivial class of instances where the used trust region formulation may directly deliver a maximum weight clique indicator (Theorem 10).

As the next step of QUALEX-MS development, it should be investigated whether it is possible to express Motzkin–Straus optima as a function of a particular subset of the spherical stationary points. It may lead to a generalization of Theorem 10 expanding the class of maximum weight clique instances where the optimum is directly computable by the presented trust region procedure. A case theoretically seeming to be the worst for the described technique is when there are multiple eigenvalues causing the trust region problem degeneracy. It may be supposed that a special submethod dealing with such instances should be developed.

REFERENCES

[1] J. Abello, S. Butenko, P.M. Pardalos, M.G.C. Resende, Finding independent sets in a graph using continuous multivariable polynomial formulations, J. Global Optim. 21 (2001) 111–137.
[2] J. Abello, P.M. Pardalos, M.G.C. Resende, On maximum clique problem in very large graphs, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, vol. 50, AMS, Providence, RI, 1999, pp. 119–130.
[3] U. Alon, N. Barkai, D.A. Notterman, K. Gish, S. Ybarra, D. Mack, A.J. Levine, Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays, Proceedings of the National Academy of Sciences 96 (1999) 6745–6750.
[4] S. Babu, M. Garofalakis, R. Rastogi, SPARTAN: A model-based semantic compression system for massive data tables, in: Proceedings of the 2001 ACM International Conference on Management of Data (SIGMOD), 2001, pp. 283–295.
[5] A. Banerjee, I.S. Dhillon, J. Ghosh, S. Merugu, D.S. Modha, Generalized maximum entropy approach to Bregman co-clustering and matrix approximations, in: Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2004, pp. 509–514.
[6] R.B. Bapat, T.E.S. Raghavan, Non-negative Matrices and Applications (Chapter 6), Cambridge University Press, Cambridge, UK, 1997.
[7] E.R. Barnes, A.J. Hoffman, U.G. Rothblum, Optimal partitions having disjoint convex and conic hulls, Math. Program. 54 (1992) 69–86.
[8] A. Ben-Dor, L. Bruhn, I. Nachman, M. Schummer, Z. Yakhini, Tissue classification with gene expression profiles, J. Comput. Biol. 7 (2000) 559–584.
[9] A. Ben-Dor, B. Chor, R. Karp, Z. Yakhini, Discovering local structure in gene expression data: the order-preserving submatrix problem, in: Proceedings of the 6th Annual International Conference on Computational Biology (RECOMB), 2002, pp. 49–57.
[10] A. Ben-Dor, B. Chor, R. Karp, Z. Yakhini, Discovering local structure in gene expression data: the order-preserving submatrix problem, J. Comput. Biol. 10 (3–4) (2003) 373–384.
[11] A. Ben-Dor, N. Friedman, Z. Yakhini, Class discovery in gene expression data, in: Proceedings of 5th Annual International Conference on Computational Molecular Biology (RECOMB), 2001, pp. 31–38.
[12] M. Blatt, S. Wiseman, E. Domany, Data clustering using a model granular magnet, Neural Comput. 9 (1997) 1805–1842.

[13] V. Boginski, S. Butenko, P.M. Pardalos, On structural properties of the market graph, in: A. Nagurney (Ed.), Innovations in Financial and Economic Networks, Edward Elgar Publishers, Cheltenham, UK–Northampton, MA, USA, 2003, pp. 28–45.
[14] I.M. Bomze, M. Budinich, P.M. Pardalos, M. Pelillo, The maximum clique problem, in: D.-Z. Du and P.M. Pardalos (Eds.), Handbook of Combinatorial Optimization (Supplement Volume A), Kluwer Academic, Dordrecht, 1999, pp. 1–74.
[15] E. Boros, P. Hammer, T. Ibaraki, A. Kogan, Logical analysis of numerical data, Math. Program. 79 (1997) 163–190.
[16] E. Boros, P. Hammer, T. Ibaraki, A. Kogan, E. Mayoraz, I. Muchnik, An implementation of logical analysis of data, IEEE Transactions on Knowledge and Data Engineering 12 (2000) 292–306.
[17] L. Breiman, J.H. Friedman, R.A. Olshen, C.J. Stone, Classification and Regression Trees, Chapman & Hall/CRC Press, 1984.
[18] R.P. Brent, Algorithms for Minimization without Derivatives, Prentice-Hall, Englewood Cliffs, NJ, 1973.
[19] M. Brockington, J.C. Culberson, Camouflaging independent sets in quasi-random graphs, in: D. Johnson and M.A. Trick (Eds.), Cliques, Coloring and Satisfiability, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, vol. 26, AMS, Providence, RI, 1996, pp. 75–88.
[20] K. Bryan, P. Cunningham, N. Bolshakova, Biclustering of expression data using simulated annealing, in: Proceedings of the 18th IEEE Symposium on Computer-Based Medical Systems (CBMS), 2005, pp. 383–388.
[21] S. Burer, R.D.C. Monteiro, Y. Zhang, Maximum stable set formulations and heuristics based on continuous optimization, Math. Program. 94 (2002) 137–166.
[22] S. Busygin, S. Butenko, P.M. Pardalos, A heuristic for the maximum independent set problem based on optimization of a quadratic over a sphere, Journal of Comb. Optim. 6 (2002) 287–297.
[23] S. Busygin, G. Jacobsen, E. Kramer, Double conjugated clustering applied to leukemia microarray data, SIAM Data Mining Workshop on Clustering High Dimensional Data and Its Applications, 2002.
[24] S. Busygin, O.A. Prokopyev, P.M. Pardalos, Feature selection for consistent biclustering via fractional 0–1 programming, J. Comb. Optim. 10 (2005) 7–21.
[25] S. Busygin (1998), Stas Busygin's NP-completeness page, http://www.busygin.dp.ua/npc.html

[26] A. Califano, SPLASH: Structural pattern localization analysis by sequential histograms, Bioinformatics 16 (2000) 341–357.
[27] A. Califano, S. Stolovitzky, Y. Tu, Analysis of gene expression microarrays for phenotype classification, in: Proceedings of the 8th International Conference on Intelligent Systems for Molecular Biology (ISMB), 2000, pp. 75–85.
[28] CAMDA Conference Website (2001), http://www.camda.duke.edu/camda01.html, last accessed April 2007.
[29] G. Casella, E.I. George, Explaining the Gibbs sampler, The American Statistician 46 (1992) 167–174.
[30] Y. Cheng, G.M. Church, Biclustering of expression data, in: Proceedings of the 8th International Conference on Intelligent Systems for Molecular Biology (ISMB), 2000, pp. 93–103.
[31] Y. Cheng, G.M. Church, Biclustering of expression data (Supplementary information), http://arep.med.harvard.edu/biclustering/, last accessed April 2007.
[32] W. Cheswick, H. Burch, Internet Mapping Project, http://www.cs.bell-labs.com/who/ches/map/, last accessed April 2007.
[33] H. Cho, I.S. Dhillon, Y. Guan, S. Sra, Minimum sum-squared residue co-clustering of gene expression data, in: Proceedings of the 4th SIAM International Conference on Data Mining (SDM), 2004, pp. 114–125.
[34] H. Cho, Y. Guan, S. Sra, Co-clustering software, version 1.1, http://www.cs.utexas.edu/users/dml/Software/cocluster.html, last accessed April 2007.
[35] ILOG Inc., CPLEX 9.0 User's Manual, 2004.
[36] N. Cristianini, J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-based Learning Methods, Cambridge University Press, Cambridge, UK, 2000.
[37] I.S. Dhillon, A new O(n^2) algorithm for the symmetric tridiagonal eigenvalue/eigenvector problem, Computer Science Division Technical Report No. UCB//CSD-97-971, UC Berkeley, 1997.
[38] I.S. Dhillon, Co-clustering documents and words using bipartite spectral graph partitioning, in: Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2001, pp. 269–274.
[39] I.S. Dhillon, S. Mallela, D.S. Modha, Information-theoretic co-clustering, in: Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2003, pp. 89–98.

[40] E. Domany, Super-paramagnetic clustering of data, Physica A 263 (1999) 158–169.
[41] G.E. Forsythe, G.H. Golub, On the stationary values of a second degree polynomial on the unit sphere, SIAM J. Appl. Math. 13 (1965) 1050–1068.
[42] M. Garey, D. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, Freeman & Co., New York, 1979.
[43] G. Getz, E. Levine, E. Domany, Coupled two-way clustering analysis of gene microarray data, Proceedings of the National Academy of Sciences 97 (2000) 12079–12084.
[44] L.E. Gibbons, D.W. Hearn, P.M. Pardalos, M.V. Ramana, Continuous characterizations of the maximum clique problem, Math. Oper. Res. 22 (1997) 754–768.
[45] T.R. Golub, D.K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J.P. Mesirov, H. Coller, M.L. Loh, J.R. Downing, M.A. Caligiuri, C.D. Bloomfield, E.S. Lander, Molecular classification of cancer: class discovery and class prediction by gene expression monitoring, Science 286 (1999) 531–537.
[46] G.H. Golub, C.F. Van Loan, Matrix Computations, The Johns Hopkins University Press, Baltimore, MD, 1996.
[47] D. Granot, U.G. Rothblum, The Pareto set of the partition bargaining game, Games and Economic Behavior 3 (1991) 163–182.
[48] W. Hager, Minimizing a quadratic over a sphere, SIAM J. Optim. 12 (2001) 188–208.
[49] P. Hansen, M. Poggi de Aragão, C.C. Ribeiro, Hyperbolic 0–1 programming and query optimization in information retrieval, Math. Program. 52 (1991) 256–263.
[50] S. Hashizume, M. Fukushima, N. Katoh, T. Ibaraki, Approximation algorithms for combinatorial fractional programming problems, Math. Program. 37 (1987) 255–267.
[51] J.A. Hartigan, Direct clustering of a data matrix, J. Amer. Stat. Assoc. 67 (1972) 123–129.
[52] J. Håstad, Clique is hard to approximate within $n^{1-\epsilon}$, in: Proceedings of 37th Annual IEEE Symposium on the Foundations of Computer Science (FOCS), 1996, pp. 627–636.
[53] L-L. Hsiao, F. Dangond, T. Yoshida, R. Hong, R.V. Jensen, J. Misra, W. Dillon, K.F. Lee, K.E. Clark, P. Haverty, Z. Weng, G. Mutter, M.P. Frosch, M.E. MacDonald, E.L. Milford, C.P. Crum, R. Bueno, R.E. Pratt, M. Mahadevappa, J.A. Warrington, G. Stephanopoulos, G. Stephanopoulos, S.R. Gullans, A compendium of gene expression in normal human tissues, Physiol. Genomics 7 (2001) 97–104.

[54] HuGE Index.org Website, http://www.hugeindex.org, last accessed April 2007.
[55] F.K. Hwang, S. Onn, U.G. Rothblum, Linear shaped partition problems, Oper. Res. Lett. 26 (2000) 159–163.
[56] L.D. Iasemidis, P.M. Pardalos, J.C. Sackellares, D.S. Shiau, Quadratic binary programming and dynamical system approach to determine the predictability of epileptic seizures, J. Comb. Optim. 5 (2001) 9–26.
[57] S.C. Johnson, Hierarchical clustering schemes, Psychometrika 32 (1967) 241–254.
[58] S. Khot, Improved inapproximability results for maxclique, chromatic number and approximate graph coloring, in: Proceedings of 42nd Annual IEEE Symposium on the Foundations of Computer Science (FOCS), 2001, pp. 600–609.
[59] Y. Kluger, R. Basri, J.T. Chang, M. Gerstein, Spectral biclustering of microarray data: coclustering genes and conditions, Genome Res. 13 (2003) 703–716.
[60] T. Kohonen, Self-Organizing Maps, Springer-Verlag, Berlin-Heidelberg, 1995.
[61] L. Lazzeroni, A. Owen, Plaid models for gene expression data, Statistica Sinica 12 (2002) 61–86.
[62] L. Lazzeroni, A. Owen, Plaid models, for microarrays and DNA expression, http://www-stat.stanford.edu/~owen/plaid/, last accessed April 2007.
[63] J. Liu, W. Wang, OP-cluster: Clustering by tendency in high dimensional space, in: Proceedings of the 3rd IEEE International Conference on Data Mining (ICDM), 2003, pp. 187–194.
[64] S. Lyle, M. Szularz, Local minima of the trust region problem, J. Optim. Theory Appl. 80 (1994) 117–134.
[65] J.B. MacQueen, Some methods for classification and analysis of multivariate observations, in: Proceedings of the 5th Symposium on Mathematics and Probability, 1967, pp. 281–297.
[66] C. Mannino, E. Stefanutti, An augmentation algorithm for the maximum weighted stable set problem, Comput. Optim. Appl. 14 (1999) 367–381.
[67] A. Massaro, M. Pelillo, I.M. Bomze, A complementary pivoting approach to the maximum weight clique problem, SIAM J. Optim. 12 (2002) 928–948.
[68] B.D. McKay (1984), The nauty page, http://cs.anu.edu.au/~bdm/nauty/, last accessed April 2007.
[69] J.J. Moré, D.S. Sorensen, Computing a trust region step, SIAM J. Sci. Statist. Comput. 4 (1983) 553–572.

[70] T.S. Motzkin, E.G. Straus, Maxima for graphs and a new proof of a theorem of Turán, Canad. J. Math. 17 (1965) 533–540.
[71] P. Östergård, S. Niskanen (2002), Cliquer – routines for clique searching, http://users.tkk.fi/~pat/cliquer.html, last accessed April 2007.
[72] V.Y. Pan, Z.Q. Chen, The complexity of the matrix eigenproblem, in: Proceedings of the 31st Annual ACM Symposium on Theory of Computing (STOC), 1999, pp. 507–516.
[73] P.M. Pardalos, S. Busygin, O.A. Prokopyev, On biclustering with feature selection for microarray data sets, in: R. Mondaini (Ed.), BIOMAT 2005 – International Symposium on Mathematical and Computational Biology, World Scientific, 2006, pp. 367–378.
[74] J.-C. Picard, M. Queyranne, A network flow solution to some nonlinear 0–1 programming problems, with applications to graph theory, Networks 12 (1982) 141–159.
[75] O.A. Prokopyev, H.-X. Huang, P.M. Pardalos, On complexity of unconstrained hyperbolic 0–1 programming problems, Oper. Res. Lett. 33 (2005) 312–318.
[76] O.A. Prokopyev, C. Meneses, C.A.S. Oliveira, P.M. Pardalos, On multiple-ratio hyperbolic 0–1 programming problems, Pacific J. Optim. 1 (2005) 327–345.
[77] D.J. Reiss, N.S. Baliga, R. Bonneau, Integrated biclustering of heterogeneous genome-wide datasets for the inference of global regulatory networks, BMC Bioinformatics 7 (2006) 280.
[78] S. Saipe, Solving a (0,1) hyperbolic program by branch and bound, Naval Res. Logist. Quarterly 22 (1975) 497–515.
[79] R. Shamir, EXPANDER: A gene expression analysis and visualization software, http://www.cs.tau.ac.il/~rshamir/expander/expander.html, last accessed April 2007.
[80] Y.D. Shen, Z.Y. Shen, S.M. Zhang, Q. Yang, Cluster cores-based clustering for high dimensional data, in: Proceedings of 4th IEEE International Conference on Data Mining (ICDM), 2004, pp. 519–522.
[81] Q. Sheng, Y. Moreau, B. De Moor, Biclustering microarray data by Gibbs sampling, Bioinformatics 19 (2003) ii196–ii205.
[82] P. St-Louis, J.A. Ferland, B. Gendron (2006), A penalty-evaporation heuristic in a decomposition method for the maximum clique problem, Technical Report, Département d'informatique et de recherche opérationnelle, Université de Montréal, http://www.iro.umontreal.ca/~gendron/publi.html, last accessed April 2007.

[83] A. Tanay, Computational analysis of transcriptional programs: function and evolution, PhD thesis, 2005, http://www.cs.tau.ac.il/~rshamir/theses/amos_phd.pdf, last accessed April 2007.
[84] A. Tanay, R. Sharan, M. Kupiec, R. Shamir, Revealing modularity and organization in the yeast molecular network by integrated analysis of highly heterogeneous genomewide data, Proceedings of the National Academy of Sciences 101 (2004) 2981–2986.
[85] A. Tanay, R. Sharan, R. Shamir, Discovering statistically significant biclusters in gene expression data, Bioinformatics 18 (2002) S136–S144.
[86] M. Tawarmalani, S. Ahmed, N. Sahinidis, Global optimization of 0–1 hyperbolic programs, J. Global Optim. 24 (2002) 385–416.
[87] V. Vapnik, The Nature of Statistical Learning Theory, Springer-Verlag, Berlin-Heidelberg, 1999.
[88] Weizmann Institute of Science (2000), The coupled two way clustering algorithm, http://ctwc.weizmann.ac.il/, last accessed April 2007.
[89] J. Weston, S. Mukherjee, O. Chapelle, M. Pontil, T. Poggio, V. Vapnik, Feature selection for SVMs, in: S.A. Solla, T.K. Leen, Klaus-Robert Müller (Eds.), Advances in Neural Information Processing Systems, MIT Press, 2001.
[90] T.-H. Wu, A note on a global approach for general 0–1 fractional programming, European J. Oper. Res. 101 (1997) 220–223.
[91] E.P. Xing, R.M. Karp, CLIFF: Clustering of high-dimensional microarray data via iterative feature filtering using normalized cuts, Bioinformatics Discovery Note 1 (2001) 1–9.
[92] R. Xu, D. Wunsch, Survey of clustering algorithms, IEEE Transactions on Neural Networks 16 (2005) 645–648.
[93] J. Yang, W. Wang, H. Wang, P. Yu, δ-clusters: Capturing subspace correlation in a large data set, in: Proceedings of the 18th IEEE International Conference on Data Engineering (ICDE), 2002, pp. 517–528.
[94] J. Yang, W. Wang, H. Wang, P. Yu, Enhanced biclustering on expression data, in: Proceedings of the 3rd IEEE Conference on Bioinformatics and Bioengineering (BIBE), 2003, pp. 321–327.
[95] Y. Ye, A new complexity result on minimization of a quadratic function with a sphere constraint, in: C.A. Floudas, P.M. Pardalos (Eds.), Recent Advances in Global Optimization, Princeton University Press, Princeton, NJ, 1992, pp. 19–31.

BIOGRAPHICAL SKETCH

Stanislav Busygin was born on July 14, 1974, in Dzhankoy, Crimea (Ukraine). He received his Specialist Degree in applied mathematics from Dnipropetrovsk National University (Ukraine) in 1996. From 1996 to 2003, he worked as a software engineer and a scientific consultant in a number of technologically innovative companies in Ukraine and Western Europe. In 2003, Stanislav Busygin entered the graduate program in industrial and systems engineering at the University of Florida. He received his M.S. degree in industrial and systems engineering from the University of Florida in April 2005. Stanislav Busygin is an author of almost a dozen peer-reviewed scientific research papers and surveys.