A Comparative Study on Biological Networks


Material Information

Title:
A Comparative Study on Biological Networks Alignment and Structural Properties
Physical Description:
1 online resource (218 p.)
Language:
English
Creator:
Ay, Ferhat
Publisher:
University of Florida
Place of Publication:
Gainesville, Fla.
Publication Date:

Thesis/Dissertation Information

Degree:
Doctorate (Ph.D.)
Degree Grantor:
University of Florida
Degree Disciplines:
Computer Engineering, Computer and Information Science and Engineering
Committee Chair:
Kahveci, Tamer
Committee Members:
Dobra, Alin
Sahni, Sartaj
Banerjee, Arunava
De Crecy-Lagard, Valerie

Subjects

Subjects / Keywords:
alignment -- biological -- mapping -- metabolic -- network -- pathway -- regulatory -- subnetwork
Computer and Information Science and Engineering -- Dissertations, Academic -- UF
Genre:
Computer Engineering thesis, Ph.D.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, terriorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation

Notes

Abstract:
Biological networks encapsulate invaluable information about the roles of different biochemical entities and their interactions with each other. Analyzing these networks is essential in order to comprehend the machinery of a cell and to reveal evolutionary differences between different cells and organisms. Three main types of biological networks are protein interaction networks, metabolic networks (or pathways) and regulatory networks. In the literature, the terms "network" and "pathway" are used interchangeably for metabolic interaction data. An important type of analysis for biological networks is comparative analysis, which aims at identifying functionally similar components of these networks that are shared among different species. Analogous to sequence alignment, which identifies sequence similarity, network alignment reveals similar connectivity patterns such as alternative paths and subnetworks. In addition to comparative analysis, examining solely the topological structure of biological networks has also led to interesting observations such as the modular organization, repeating connectivity patterns, steady states and specific degree distributions that these networks exhibit. In this thesis, we introduce (i) alignment algorithms for metabolic networks that account for heterogeneous network elements, connected subnetwork mappings and the scalability problem in network alignment; (ii) an algorithm that predicts functional similarity between reactions based on metabolic flux analysis; (iii) efficient methods that identify steady states of Boolean regulatory networks using binary decision diagrams (BDDs) and graph partitioning; and (iv) an algorithm that identifies the dynamic modular structure of regulatory networks.
General Note:
In the series University of Florida Digital Collections.
General Note:
Includes vita.
Bibliography:
Includes bibliographical references.
Source of Description:
Description based on online resource; title from PDF title page.
Source of Description:
This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Statement of Responsibility:
by Ferhat Ay.
Thesis:
Thesis (Ph.D.)--University of Florida, 2011.
Local:
Adviser: Kahveci, Tamer.

Record Information

Source Institution:
UFRGP
Rights Management:
Applicable rights reserved.
Classification:
lcc - LD1780 2011
System ID:
UFE0043283:00001




Full Text

A COMPARATIVE STUDY ON BIOLOGICAL NETWORKS: ALIGNMENT AND STRUCTURAL PROPERTIES

By

FERHAT AY

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2011

© 2011 Ferhat Ay

ACKNOWLEDGMENTS

First and foremost I would like to express my sincere gratitude to my Ph.D. advisor, Dr. Tamer Kahveci. Without his mentorship, motivation and support from the very first day of my graduate studies this thesis would not have been possible. I am also thankful for the excellent example he has provided as a successful researcher and an outstanding advisor.

I would like to acknowledge the members of my thesis committee, Dr. Valerie de Crecy-Lagard, Dr. Alin Dobra, Dr. Arunava Banerjee and Dr. Sartaj Sahni, who have been very helpful and understanding during the years I spent at the University of Florida. I am very grateful to Dr. Manolis Kellis from MIT for giving me the opportunity to join his lab as a visiting researcher in summer 2010 and for helping me explore new research directions. I am also thankful to the professors of our department whose classes I have greatly benefited from, especially Dr. My Tra Thai, Dr. Anand Rangarajan and Dr. Meera Sitharam. I would like to thank all my teachers and professors, from primary school until the end of my Ph.D., who helped me learn all that I know now.

I cannot thank enough all my friends who have been with me all these years and with whom I created and shared the best memories of my life. I consider myself very lucky to have had an amazing group of friends all through my life, from elementary school to high school and from my undergraduate years in Turkey to my Ph.D. years in Florida. I especially want to thank all my friends who helped me a lot during my first months at the University of Florida.

Most importantly, I want to thank my family for all the love and support they gave me for years. My mom and my dad are the best parents that someone can ever have and ever wish for. I want to thank them also for giving me a great brother, whom I love with all my heart, even though I had to wait eleven years for him. I also want to say that I am grateful to all the members of my extended family for their love and for believing in me and supporting me in every step I take in my life.

Nothing in the world can make me happier than seeing the pride in the eyes of my family for the things that I achieve. To make them proud and to put a smile on their faces has been the biggest motivation in everything that I have done so far, and it will be like that for the rest of my life.

Finally, I want to thank the person with whom I spent my last two years and with whom I want to spend the rest of my life: my love, my soulmate, my lovely girlfriend Ilksen Ece Icyuz. She has been everything to me during the years that we spent together, even my nurse when I was bedbound with my broken leg. Thank you.

And thanks to everyone who has ever had the slightest bit of role in my journey.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER

1 INTRODUCTION
  1.1 Metabolic Network Alignment
  1.2 Functional Similarities of Entity Sets
  1.3 Steady States of Boolean Regulatory Networks
  1.4 Dynamic Modular Structure of Regulatory Networks
  1.5 Outline

2 ALIGNMENT OF METABOLIC NETWORKS WITHOUT ABSTRACTION
  2.1 Background
  2.2 Model
  2.3 Algorithm
    2.3.1 Pairwise Similarity of Entities
    2.3.2 Similarity of Topologies
    2.3.3 Combining Homology and Topology
    2.3.4 Extracting the Mapping of Entities
    2.3.5 Similarity Score of Networks
    2.3.6 Complexity Analysis
  2.4 Results and Discussion
    2.4.1 Effects of Homology and Topology Information
    2.4.2 Identification of Alternative Enzymes and Paths
    2.4.3 Phylogenic Reconstruction
    2.4.4 Top-k Queries in Network Databases
    2.4.5 Effect of Consistency
    2.4.6 Error Tolerance
    2.4.7 Running Time
    2.4.8 Statistical Significance
    2.4.9 Discussion

3 SUBNETWORK MAPPINGS IN ALIGNMENT OF NETWORKS
  3.1 Our Algorithm: SubMAP
    3.1.1 Enumeration of Connected Subnetworks
    3.1.2 Homological Similarity of Subnetworks
    3.1.3 Topological Similarity of Subnetworks
    3.1.4 Combining Homology and Topology
    3.1.5 Extracting Subnetwork Mappings
  3.2 Results and Discussion
    3.2.1 Alternative Subnetworks
    3.2.2 Number of Connected Subnetworks
    3.2.3 One-to-many Mappings within and across Major Clades
    3.2.4 Evaluation of Running Time and Memory Utilization
    3.2.5 Discussion

4 LARGE SCALE METABOLIC NETWORK ALIGNMENT BY COMPRESSION
  4.1 Compression Phase
    4.1.1 Minimum Degree Selection (MDS) Method for Compression
    4.1.2 Optimality Analysis for MDS
  4.2 Alignment Framework
    4.2.1 Overview of SubMAP
    4.2.2 Alignment Phase
    4.2.3 Refinement Phase
    4.2.4 Complexity Analysis
    4.2.5 How Much Should We Compress?
  4.3 Results and Discussion
    4.3.1 Evaluation of Compression Rates
    4.3.2 Evaluation of Running Time and Memory Utilization
    4.3.3 Accuracy of the Alignment Results
    4.3.4 Discussion

5 FUNCTIONAL SIMILARITIES OF REACTION SETS IN METABOLIC NETWORKS
  5.1 Background
  5.2 Modeling Functional Similarity
  5.3 Computing Functional Similarity
    5.3.1 Finding the EFMs of Metabolic Networks
    5.3.2 Extracting EFMs of Impacts
    5.3.3 Calculating the Similarity Score
  5.4 Results and Discussion
    5.4.1 Identification of Functional Similarities
    5.4.2 Prediction of Essential Reaction Sets
    5.4.3 Discussion

6 STEADY STATES OF REGULATORY NETWORKS
  6.1 Methods
    6.1.1 State Transition Model
    6.1.2 Segregation of States Using BDDs
    6.1.3 Extracting Cyclic Steady States
  6.2 Results and Discussion
    6.2.1 Cell Cycles of Budding Yeast and Fission Yeast
    6.2.2 Performance Evaluation
    6.2.3 Co-Expressed Gene Pairs in Human Hedgehog Network
    6.2.4 Accuracy of Estimators
    6.2.5 Discussion

7 DYNAMIC MODULAR STRUCTURE OF REGULATORY NETWORKS
  7.1 Methods
    7.1.1 State Transitions
    7.1.2 Construction of Functional Networks
    7.1.3 Identification of Dynamic Modules
  7.2 Results and Discussion
    7.2.1 Qualitative Evaluation
    7.2.2 Quantitative Evaluation
    7.2.3 Discussion

8 CONCLUSIONS

REFERENCES
BIOGRAPHICAL SKETCH

LIST OF TABLES

2-1 Commonly used symbols in this chapter
2-2 Alternative enzymes that catalyze the formation of a common product
2-3 Full names and two levels of NCBI taxonomy of 73 organisms
2-4 Comparison of the phylogenic trees predicted by different methods
3-1 Extended table of frequently used symbols
3-2 Alternative subnetworks that produce same or similar output compounds
3-3 Characteristics of mappings in between and across three major clades
4-1 Commonly used symbols in this chapter
4-2 Summary of compression rates
4-3 Correlation of the mapping scores
5-1 Essential reactions of E. coli
6-1 Comparison of our algorithm with Genysis
7-1 Top 20 modules found by our method with the highest support

LIST OF FIGURES

1-1 A subnetwork of human lysine biosynthesis network
1-2 Alignment of two hypothetical networks
1-3 The effect of abstraction for metabolic networks
1-4 A portion of an alignment of Lysine biosynthesis networks
1-5 A toy example with two metabolic networks
2-1 Graph representation of metabolic networks
2-2 Consistency of an alignment
2-3 Calculation of information content enzyme similarity
2-4 Calculation of similarity vector and its normalization
2-5 Calculation of the support matrix for running example
2-6 Effect of parameter
2-7 Identification of alternative paths
2-8 Phylogenic trees for 73 organisms
2-9 Our tree predictions within major clades
2-10 Average correct classification percentages of our algorithm
2-11 Effects of consistency restriction and the entity ordering on alignment score
2-12 Effects of homology and topology errors on alignment score
2-13 Running time comparison and analysis
2-14 The distribution of the observed alignment scores
2-15 The Z-score distribution for a random ensemble of alignments
3-1 An illustrative example for the reduction from the MWIS problem
3-2 Illustration of conflict graph
3-3 Visual representations of subnetwork mappings
3-4 The number of enumerated subnetworks for different networks
3-5 The average running time of SubMAP
3-6 The average memory utilization of SubMAP
4-1 Network alignment with and without compression
4-2 One compression step of our compression method
4-3 The average running time of our framework
5-1 Illustration of the impact of a reaction set
5-2 Metabolic network of Glycolysis and Gluconeogenesis
5-3 Pictorial description of functional similarity
5-4 Minimum enclosing ball (MEB) of a polytope
5-5 Comparison of different similarity measures
5-6 Statistical significance of our essentiality prediction
6-1 States of a hypothetical network with three genes
6-2 Summary of the traversal process
6-3 Regulatory network of the cell cycle of budding yeast
6-4 Regulatory network of the cell cycle of fission yeast
6-5 Convergence of the estimators for the steady state profiles of the genes
7-1 A portion of the human coagulation cascade network
7-2 A hypothetical network with two modules
7-3 Two functional networks of human coagulation cascade at two consecutive time steps
7-4 Average modularity value for each patient over all dynamic steps
7-5 Average NMI of CNM results and the compact representation
7-6 Cumulative running time for increasing number of patients

Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy

A COMPARATIVE STUDY ON BIOLOGICAL NETWORKS: ALIGNMENT AND STRUCTURAL PROPERTIES

By

Ferhat Ay

August 2011

Chair: Tamer Kahveci
Major: Computer Engineering

Biological networks encapsulate invaluable information about the roles of different biochemical entities and their interactions with each other. Analyzing these networks is essential in order to comprehend the machinery of a cell and to reveal evolutionary differences between different cells and organisms. Three main types of biological networks are protein interaction networks, metabolic networks (or pathways) and regulatory networks. In the literature, the terms "network" and "pathway" are used interchangeably for metabolic interaction data. An important type of analysis for biological networks is comparative analysis, which aims at identifying functionally similar components of these networks that are shared among different species. Analogous to sequence alignment, which identifies sequence similarity, network alignment reveals similar connectivity patterns such as alternative paths and subnetworks. In addition to comparative analysis, examining solely the topological structure of biological networks has also led to interesting observations such as the modular organization, repeating connectivity patterns, steady states and specific degree distributions that these networks exhibit.

In this thesis, we introduce (i) alignment algorithms for metabolic networks that account for heterogeneous network elements, connected subnetwork mappings and the scalability problem in network alignment; (ii) an algorithm that predicts functional similarity between reactions based on metabolic flux analysis; (iii) efficient methods that identify steady states of Boolean regulatory networks using binary decision diagrams (BDDs) and graph partitioning; and (iv) an algorithm that identifies the dynamic modular structure of regulatory networks.

CHAPTER 1
INTRODUCTION

With the recent advances in high-throughput technology, a significant amount of research has been done in the last decades on the identification and reconstruction of biological networks such as regulatory [1-3], protein interaction [4] and metabolic networks [5, 6]. These networks are compiled in public databases, such as KEGG (Kyoto Encyclopedia of Genes and Genomes) [7], EcoCyc (Encyclopedia of E. coli K-12 Genes and Metabolism) [8], PID (the Pathway Interaction Database) [9] and DIP (Database of Interacting Proteins) [10]. These databases maintain information both in textual and in graphical form. Many researchers are working on analyzing different aspects of these data by means of computational methods. Insights gathered from these efforts are of great use in important applications such as drug target identification [11, 12], metabolic engineering [13] and genetic engineering [14].

In this introduction, we briefly describe the computational methods that we developed to analyze different aspects of these biological networks. We focus on four different problems and the solutions we develop for them. First, we give an overview of the methods we develop for metabolic network alignment (Section 1.1). Second, we discuss an algorithm that predicts functional similarity between reactions based on metabolic flux analysis (Section 1.2). Third, we introduce an efficient method to identify steady states of Boolean regulatory networks (Section 1.3). Last, we explain an algorithm that reveals the dynamic modular structure of regulatory networks (Section 1.4). We conclude this chapter with an outline of the rest of this thesis (Section 1.5).

1.1 Metabolic Network Alignment

An important type of biological network is the metabolic network. In metabolic networks, the relationships between different biochemical reactions, together with their inputs, outputs and catalyzers, are organized as networks. Analyzing these networks is necessary to capture the valuable information they carry. An essential type of analysis is comparative analysis, which aims at identifying similarities between the metabolisms of different organisms. Finding these similarities provides insights for drug target identification [11], metabolic reconstruction of newly sequenced genomes [5] and phylogeny reconstruction [15, 16]. A sample metabolic network can be seen in Figure 1-1.

To identify similarities between two networks, it is necessary to find a mapping of their entities. Figure 1-2 shows an alignment of two networks in which only one-to-one node mappings are allowed. Not every node needs to be mapped in a network alignment. Similar to sequence alignment, such unmapped nodes are called insertions or deletions (indels).

In the literature, alignment is often considered as finding one-to-one mappings between the molecules of two networks. In this case, the global and local network alignment problems are GI-complete (Graph Isomorphism) and NP-complete, respectively, as the graph and subgraph isomorphism problems can be reduced to them in polynomial time [17]. Hence, even for the case described above, efficient methods are needed to solve the network alignment problem for large-scale networks.

Figure 1-1. A subnetwork of human lysine biosynthesis network. In this network, rectangles represent enzymes that catalyze reactions. Each enzyme is labeled with its Enzyme Commission (EC) number. The circles represent input and output compounds for the reactions.
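To make the combinatorial cost of one-to-one alignment concrete, the following Python sketch exhaustively enumerates every partial one-to-one mapping between two tiny networks and keeps the one that conserves the most edges. It is not one of the thesis's algorithms: the networks, node names and the "count conserved edges" score are hypothetical choices made only for illustration. Unmapped nodes play the role of indels, and the `tried` counter shows how quickly the search space grows even for three and four nodes.

```python
from itertools import combinations, permutations


def conserved_edges(mapping, edges_a, edges_b):
    """Count edges (u, v) of network A whose images (mapping[u], mapping[v]) are edges of B."""
    edge_set_b = {frozenset(e) for e in edges_b}
    count = 0
    for u, v in edges_a:
        if u in mapping and v in mapping and frozenset((mapping[u], mapping[v])) in edge_set_b:
            count += 1
    return count


def best_alignment(nodes_a, nodes_b, edges_a, edges_b):
    """Brute-force search over all partial one-to-one mappings (unmapped nodes act as indels)."""
    best, best_score, tried = {}, -1, 0
    for k in range(min(len(nodes_a), len(nodes_b)) + 1):
        for subset_a in combinations(nodes_a, k):          # which A-nodes get mapped
            for image_b in permutations(nodes_b, k):       # where they are mapped, in order
                tried += 1
                mapping = dict(zip(subset_a, image_b))
                score = conserved_edges(mapping, edges_a, edges_b)
                if score > best_score:
                    best, best_score = mapping, score
    return best, best_score, tried


# Hypothetical undirected networks: a 3-node path and a 4-node path.
nodes_a = ["A1", "A2", "A3"]
nodes_b = ["B1", "B2", "B3", "B4"]
edges_a = [("A1", "A2"), ("A2", "A3")]
edges_b = [("B1", "B2"), ("B2", "B3"), ("B3", "B4")]

mapping, score, tried = best_alignment(nodes_a, nodes_b, edges_a, edges_b)
print(mapping, score, tried)  # {'A1': 'B1', 'A2': 'B2', 'A3': 'B3'} 2 73
```

Even here 73 candidate mappings must be scored. The count is the sum over k of C(n, k) times m!/(m-k)!, which explodes for realistic network sizes.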

A number of studies have been done to systematically align different types of biological networks. For metabolic networks, Pinter et al. [18] devised an algorithm that aligns query networks with specific topologies by using a graph theoretic approach. Tohsato et al. proposed two algorithms for metabolic network alignment, one relying on the Enzyme Commission (EC) [19] numbers of enzymes and the other considering the chemical structures of the compounds of the query networks [20, 21]. Later, Cheng et al. developed a tool, MetNetAligner, for metabolic network alignment that allows a certain number of insertions and deletions of enzymes [22]. These methods focus on a single type of molecule, and the alignment is driven by the similarities of these molecules (e.g., enzyme similarity or compound similarity). Also, some of these methods limit the query networks to certain topologies, such as trees, non-branching paths or limited cycles. These limitations degrade the applicability of these methods to complex networks. One way to avoid this is to combine both the topological features and the homological similarity of pairwise molecules using a heuristic method. This approach has been successfully applied to find alignments of protein interaction networks [23, 24]. Another advantage of this combination is that it improves the accuracy of the alignment algorithm without restricting the topologies of the query networks.

Figure 1-2. An alignment of two hypothetical networks. The dashed lines represent the mapping between nodes of the two networks. In this alignment, the nodes A1, A2, A3 and A5 are mapped to the nodes B1, B2, B3 and B5, respectively. Note that not all the nodes need to be mapped in an alignment. The nodes A4 and B4 are not mapped in this alignment.
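The heuristic combination of homology and topology mentioned above can be sketched with a power iteration in the style of the IsoRank formulation of Singh et al. [23]. In that formulation, the score of a node pair (i, j) is an α-weighted mix of two terms: the scores of its neighbor pairs, each divided by the neighbors' degrees, and a normalized pairwise homology prior. This is a minimal illustration under stated assumptions and not the thesis's own model. The two networks, the homology values and α = 0.6 are hypothetical. The graphs are assumed undirected with no isolated nodes. The greedy extraction step is a simplification, since the thesis extracts mappings with maximum weight bipartite matching.

```python
def isorank_like(adj_a, adj_b, homology, alpha=0.6, max_iter=200, tol=1e-12):
    """Power iteration for R = alpha * A R + (1 - alpha) * E (IsoRank-style sketch).

    adj_a, adj_b: dict mapping each node to the set of its neighbors (undirected,
    no isolated nodes). homology: dict (node_a, node_b) -> nonnegative score.
    """
    pairs = [(i, j) for i in adj_a for j in adj_b]
    total = sum(homology.get(p, 0.0) for p in pairs) or 1.0
    prior = {p: homology.get(p, 0.0) / total for p in pairs}   # normalized homology E
    scores = {p: 1.0 / len(pairs) for p in pairs}              # uniform starting vector
    for _ in range(max_iter):
        new = {}
        for i, j in pairs:
            # Topology term: neighbor-pair scores, each split by the neighbors' degrees.
            topo = sum(scores[(u, v)] / (len(adj_a[u]) * len(adj_b[v]))
                       for u in adj_a[i] for v in adj_b[j])
            new[(i, j)] = alpha * topo + (1 - alpha) * prior[(i, j)]
        # Without isolated nodes the topology operator preserves the total score,
        # so the scores keep summing to 1 and no renormalization is needed.
        delta = sum(abs(new[p] - scores[p]) for p in pairs)
        scores = new
        if delta < tol:
            break
    return scores


def greedy_mapping(scores):
    """Simplified extraction: repeatedly take the highest-scoring pair whose nodes are both free."""
    used_a, used_b, mapping = set(), set(), {}
    for (i, j), _ in sorted(scores.items(), key=lambda kv: -kv[1]):
        if i not in used_a and j not in used_b:
            mapping[i] = j
            used_a.add(i)
            used_b.add(j)
    return mapping


# Hypothetical example: two 3-node paths, with homology favoring a1-b1, a2-b2, a3-b3.
adj_a = {"a1": {"a2"}, "a2": {"a1", "a3"}, "a3": {"a2"}}
adj_b = {"b1": {"b2"}, "b2": {"b1", "b3"}, "b3": {"b2"}}
homology = {("a1", "b1"): 1.0, ("a2", "b2"): 1.0, ("a3", "b3"): 1.0}

scores = isorank_like(adj_a, adj_b, homology)
mapping = greedy_mapping(scores)
print(mapping)  # maps a1 -> b1, a2 -> b2, a3 -> b3
```

Topology alone cannot tell a1 -> b1 from a1 -> b3, because the two paths are symmetric. The homology prior breaks that tie, which shows why the two sources of information complement each other.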

In the following chapters (Chapters 2, 3 and 4), we discuss fundamental algorithms that we develop for aligning metabolic networks. We describe in detail some of the existing methods aimed at solving this problem in the Background section of Chapter 2. Before we formulate the problem of network alignment or its solutions, we discuss major challenges inherent in the alignment of networks in general and of metabolic networks in particular.

Challenge 1. A common pitfall of a number of algorithms for metabolic network alignment is to use a model that focuses on only one type of entity and ignores the others. This simplification converts metabolic networks to graphs with only compatible nodes. The word compatible is used for entities that are of the same type. For example, in metabolic networks two entities are compatible if they are both reactions, both enzymes or both compounds. The transformations that reduce metabolic networks to graphs with only compatible entities are referred to as abstraction. Reaction-based [15], compound-based [20] and enzyme-based [18, 25] abstractions are used for modeling metabolic networks. Figure 1-3 illustrates the problems with the enzyme-based abstraction used by Pinter et al. [18] and Koyuturk et al. [25]. In the top portion of Figure 1-3A, enzymes E1 and E2 interact on two different paths. Abstraction loses this information and merges these two paths into a single interaction, as seen in the bottom figure. After the abstraction, an alignment algorithm aligning the E1-E2 interactions in Figures 1-3A and 1-3B cannot determine through which of the two alternative paths the enzymes E1 and E2 are aligned. It is important to note that the amount of information lost due to abstraction grows exponentially with the number of branching entities.

Figure 1-3. The effect of abstraction for metabolic networks. The top figures in (A) and (B) illustrate two hypothetical metabolic networks, with enzymes and compounds represented by the letters E and C, respectively. The bottom figures in (A) and (B) show the same networks after abstraction when the compounds are ignored. In (A), the two different paths between E1 and E2 in the top figure are combined when compounds are ignored.

Challenge 2. Many of the existing methods limit the possible molecule mappings to one-to-one mappings only. As also pointed out by Deutscher et al. [26], considering each molecule one by one fails to reveal its functions in complex networks. This restriction prevents many methods from identifying biologically relevant mappings when different organisms perform the same function through a varying number of steps. As an example, there are alternative paths for LL-2,6-Diaminopimelate production in different organisms [27, 28]. LL-2,6-Diaminopimelate is a key intermediate compound, since it lies at the intersection of different paths in the synthesis of L-Lysine. Figure 1-4 illustrates two paths, both producing LL-2,6-Diaminopimelate starting from 2,3,4,5-Tetrahydrodipicolinate. The upper path represents the shortcut used by plants and Chlamydia to synthesize L-Lysine. This shortcut is not an option, for example, for E. coli or H. sapiens, due to the lack of the gene encoding LL-DAP aminotransferase (EC 2.6.1.83). E. coli and H. sapiens have to use a three-step process, shown with the gray path in Figure 1-4, to perform this transformation. Thus, a meaningful alignment should map the two paths when, for instance, the lysine biosynthesis networks of human and a plant are aligned. However, since these two paths have different numbers of reactions, traditional alignment methods that are limited to one-to-one mappings fail to identify this mapping.

Challenge 3. Aligning large-scale networks is a computationally challenging problem due to the underlying subgraph isomorphism problem that has to be solved to find the alignment that maximizes the similarity between the query networks. Existing methods restrict the topologies and/or the sizes of the query networks to avoid prohibitive computational costs. For instance, the method of Pinter et al. [18] took around one minute per alignment on a dataset with only small networks ranging from 2 to 41 nodes. Another method, QNet, works only if the smaller query network is a multi-source tree, and its running time is feasible only when this network has up to 12 nodes. The SubMAP method we develop in this thesis (Chapter 3) places no limitations on the query topologies and allows mappings of node sets that are connected (i.e., subnetworks). However, allowing subnetworks comes at the cost of increased running time, because the number of all connected subnetworks up to a given size can be exponential in the size of the network. For a network of size 70 and subnetwork sizes up to 3, SubMAP takes around 2 minutes and 200 MB of memory on average per alignment with a database of 50 networks with sizes ranging from 2 to 57. Therefore, improving the running time and memory utilization of these methods is necessary to enable the alignment of larger-scale networks, especially when subnetwork mappings are allowed.

Here, we present alignment algorithms for metabolic networks that address the above challenges. Our algorithms identify biologically relevant alignments by combining the homological and topological similarities of the query networks. The first method we propose uses a comprehensive model that allows aligning metabolic networks without any abstraction, i.e., without ignoring compounds and enzymes (Chapter 2). Our second method, SubMAP, allows mapping entity sets to each other by relaxing the restriction to one-to-one mappings. This allows us to identify biologically relevant alignments that cannot be identified by previous methods, but comes at an increased computational cost and with a number of additional challenges (Chapter 3). The third method we present introduces a framework that significantly improves the scale of the networks that can be aligned in practical time using existing network alignment methods (Chapter 4). For this framework, we develop a scalable graph compression technique that we use to perform the network alignment in three major phases, namely the compression phase, the alignment phase and the refinement phase. We use the toy example in Figure 1-5 to explain the different steps of these methods in the corresponding chapters.

Figure 1-4. A portion of an alignment of Lysine biosynthesis networks from two different organisms. Each reaction is represented by the Enzyme Commission (EC) number of the enzyme that catalyzes it. Circles represent compounds (intermediate compounds are not shown). E. coli and H. sapiens (human) use the path colored gray with three reactions, whereas plants and Chlamydia achieve this transformation directly through the path with a single reaction, shown in white.

Figure 1-5. A toy example with two hypothetical metabolic networks that will be used throughout the next chapters. For simplicity, we only display the reactions of the networks.

1.2 Functional Similarities of Entity Sets

The methods discussed above for metabolic network alignment all consider the homological and/or topological similarities of networks. Other than homology- and topology-based methods, another common way to analyze metabolic networks is to identify their metabolic capabilities in terms of their steady states. A steady state of a network is a feasible flux distribution that represents a possible long-term outcome of that network. The steady states of a network determine the set of functions it can perform. These states define a polyhedral cone in a high-dimensional space in which the fluxes of the network correspond to the dimensions. Models such as elementary flux modes (EFMs) and extreme pathways (EPs) define the boundaries of this metabolic flux cone. However, the contributions of subsets of network components to this flux cone have so far not been characterized mathematically. Also, the functional similarities of different component sets (e.g., sets of reactions) have not been expressed as a function of the steady states of metabolic networks.

In Chapter 5, we develop a model to fill this gap by quantifying the impact of a set of components on the steady states of a network using EFMs. At a high level, we model the impact of a given component set as the change in the flux cone when all the elements of that set are inhibited. Furthermore, given two sets of components from different networks, we measure their functional similarity as the similarity of their impacts on the corresponding networks. Computing the functional similarity is a computationally challenging task, as it requires finding the volumes of the intersection and the union of two polyhedral cones in a high-dimensional space. These volumes cannot be expressed in closed form. In this work, we first transform the polyhedral cones to polytopes and then use minimum enclosing balls to calculate this intersection efficiently.
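As a rough illustration of the minimum enclosing ball (MEB) idea, the sketch below implements the well-known Bădoiu-Clarkson iteration. It approximates the MEB of a finite point set, for example the vertices of a polytope, by repeatedly moving the center a shrinking step toward the currently farthest point. After T iterations the center is within about r/sqrt(T) of the optimal center, where r is the optimal radius. The text above does not say which MEB algorithm Chapter 5 uses, so treat this as a generic stand-in. The point set (a unit square) is hypothetical.

```python
import math


def approx_meb(points, iterations=1000):
    """Approximate minimum enclosing ball (center, radius) via Badoiu-Clarkson.

    Start at any input point; at step i move the center a fraction 1/(i+1)
    of the way toward the farthest point.
    """
    center = list(points[0])
    for i in range(1, iterations + 1):
        farthest = max(points, key=lambda p: math.dist(p, center))
        step = 1.0 / (i + 1)
        center = [c + step * (f - c) for c, f in zip(center, farthest)]
    radius = max(math.dist(p, center) for p in points)  # enclose every point
    return center, radius


# Hypothetical 2-D polytope: the unit square, whose exact MEB has center
# (0.5, 0.5) and radius sqrt(0.5) ~= 0.7071.
square = [(0, 0), (1, 0), (0, 1), (1, 1)]
center, radius = approx_meb(square)
print([round(c, 3) for c in center], round(radius, 3))
```

The returned radius always encloses all points, because it is measured from the final center. This makes the ball a conservative over-approximation that can only shrink toward the optimum as the number of iterations grows.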

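To make the flux cone of Section 1.2 tangible, the sketch below checks whether a flux vector v is a steady state under the standard constraint-based formulation S v = 0, with nonnegative fluxes for irreversible reactions. It also models the inhibition of a reaction set by forcing those fluxes to zero. The four-reaction network is hypothetical and all of its reactions are assumed irreversible. The exact integer comparison also assumes integer or rational fluxes, and a tolerance would be needed for floats. The example shows two alternative routes, R2 and R3, from A to B. Their sum is again a steady state, because the feasible set is a cone. Inhibiting R3 removes every steady state that uses it, which is the kind of change in the cone that the impact of a reaction set describes.

```python
# Hypothetical toy network; all reactions assumed irreversible.
#   R1: -> A,   R2: A -> B,   R3: A -> B (alternative route),   R4: B ->
STOICH = [
    [1, -1, -1, 0],   # metabolite A: produced by R1, consumed by R2 and R3
    [0, 1, 1, -1],    # metabolite B: produced by R2 and R3, consumed by R4
]


def is_steady_state(flux, stoich=STOICH, inhibited=()):
    """True if S v = 0, v >= 0, and every inhibited reaction (by index) carries zero flux."""
    if any(v < 0 for v in flux):
        return False
    if any(flux[r] != 0 for r in inhibited):
        return False
    return all(sum(s * v for s, v in zip(row, flux)) == 0 for row in stoich)


mode_via_r2 = [1, 1, 0, 1]   # A is converted to B through R2
mode_via_r3 = [1, 0, 1, 1]   # A is converted to B through R3
mixed = [a + b for a, b in zip(mode_via_r2, mode_via_r3)]  # nonnegative combination

print(is_steady_state(mixed))                        # True: the cone is closed under sums
print(is_steady_state(mixed, inhibited=[2]))         # False: R3 (index 2) is inhibited
print(is_steady_state(mode_via_r2, inhibited=[2]))   # True: the R2 route survives
```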
PAGE 21

1.3SteadyStatesofBooleanRegulatoryNetworks Regulatorynetworksconsistofbindingrelationsbetweengeneproductse.g., proteinsandDNAsegmentsthatregulategeneexpressione.g.,promoters.These regulatoryinteractionscanbeofdifferenttypessuchasinhibitionoractivation. Interactionstakeplaceonlywhentheconcentrationofregulatoryelementsreacha certainlevel.Whenaninteractionbecomesactiveittriggersasetofothereventsinthe regulatorynetwork.Therefore,regulatorynetworksshowdynamicbehaviorwhichhasto beanalyzedinordertounderstandtheirroleindifferentprocessesofalivingorganism. Onewaytocharacterizearegulatorynetworkistoidentifyitssteadystates.Steady statesrepresentthelongtermbehaviorofthenetwork.Characterizingthislongterm behavioriscriticalinunderstandinghowbiologicalfunctionstakeplaceinorganisms. Forinstance,identicationofsteadystatesofBRNsisusedinseveralapplications suchasthetreatmentofvarioushumancancers[29,30]e.g.leukemia,glioblastoma andgeneticengineering[14].Additionally,thesteadystateanalysishasproventobe successfultoexplaintheowermorphogenesisof Arabidopsisthaliana [3133],the mechanismofTcellreceptorsignaling[34]andthecellcyclesofyeasttypes[35,36]. InChapter6,weproposeamethodforidentifyingallthesteadystatesofregulatory networkswithBooleanstates 0 inactive, 1 active.Webuildamathematicalmodel thatallowspruningalargeportionofthestatespacequicklywithoutcausinganyfalse dismissals.Fortheremainingstatespace,whichistypicallyverysmallcomparedto thewholestatespace,wedeveloparandomizedtraversalmethodthatextractsthe steadystates.Weestimatethenumberofsteadystates,andtheexpectedbehaviorof individualgenesandgenepairsinsteadystatesinanonlinefashion.Also,weformulate astoppingcriterionthatterminatesthetraversalassoonasusersuppliedpercentage oftheresultsarereturnedwithhighcondence.Asanimprovementoverthismethod, weproposeagraphpartitioningstrategythatrstpartitionstheregulatorynetworkinto componentsthatareweaklyconnectedtoeachother.Itthenndsthesteadystateof 21

PAGE 22

each component and combines them to obtain the overall steady states of the original regulatory network.

1.4 Dynamic Modular Structure of Regulatory Networks

Another important characteristic of regulatory networks is their modular structure. Groups of genes in regulatory networks, called modules, often work collaboratively on similar functions. Mathematically, a module in a network has often been thought of as a group of nodes that interact with each other significantly more than with the rest of the network. Finding such modules is one of the fundamental problems in understanding gene regulation. Since regulatory networks are dynamic (i.e., the activity levels of the network elements change in time), it is necessary to track the changes in modular structure over time.

To track the dynamics of modular structures in regulatory networks, in Chapter 7 we develop an algorithm that extends existing community structure identification methods to the case of dynamic regulatory networks. Unlike existing methods, our method recognizes that there are different types of interactions (activation, inhibition), that these interactions have directions, and that they take place only if the activity levels of the activating or inhibiting genes are above certain thresholds. Furthermore, it considers that, as a result of these interactions, the activity levels of the genes change over time even in the absence of external perturbations. This way, we address both the dynamic behavior of gene activity levels and the different interaction types with an incremental algorithm that is scalable to very large regulatory networks with many dynamic steps.

1.5 Outline

We present our algorithm that finds consistent alignments of metabolic networks without abstraction in Chapter 2. Chapter 3 discusses our method that allows subnetwork mappings in network alignment. Chapter 4 describes the scalable compression technique we develop and the framework that utilizes compression to provide significant
running time gains over existing alignment methods. The algorithm that uses constraint-based modeling of metabolic networks to infer functional similarities is presented in Chapter 5. We introduce a method that computes all the steady states of Boolean regulatory networks in Chapter 6. Chapter 7 contains the details of how we identify the dynamic modular structure of regulatory networks. Chapter 8 contains the conclusions.
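The sketch below makes the notion of a steady state from Section 1.3 concrete. It is not the Chapter 6 method, since it does no pruning, randomized traversal or partitioning. It simply enumerates every state of a tiny synchronous Boolean network and keeps the fixed points, i.e., the states x with f(x) = x. The three-gene network, its update rules and the name `steady_states` are hypothetical. Cyclic attractors are deliberately left out.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple

State = Tuple[int, ...]

def steady_states(update_fns: Sequence[Callable[[State], int]]) -> List[State]:
    """Return every fixed point of a synchronous Boolean network by exhaustive search.

    update_fns[g](state) gives the next value (0 = inactive, 1 = active) of gene g.
    All 2**n states are visited, so this only suits very small networks.
    Cyclic attractors are not reported; only states with f(x) == x are.
    """
    n = len(update_fns)
    return [
        state
        for state in product((0, 1), repeat=n)
        if all(f(state) == state[g] for g, f in enumerate(update_fns))
    ]

# Hypothetical network: gene A sustains itself, A activates B, and B inhibits C.
rules = [
    lambda s: s[0],      # A' = A
    lambda s: s[0],      # B' = A
    lambda s: 1 - s[1],  # C' = NOT B
]

print(steady_states(rules))  # [(0, 0, 1), (1, 1, 0)]
```

The exhaustive loop grows as 2^n. This is exactly the cost that the pruning and partitioning strategies outlined in Section 1.3 are designed to avoid.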


CHAPTER 2
ALIGNMENT OF METABOLIC NETWORKS WITHOUT ABSTRACTION

In this chapter, we present a network alignment algorithm that addresses the first challenge described in Section 1.1 [37, 38]. Our method considers all the biological entities in metabolic networks, namely compounds, enzymes and reactions. We pursue the intuition that both the pairwise similarities of entities and the similarities of their neighborhoods are crucial in network alignment. Hence, our algorithm accounts both for the effect of pairwise similarities (homology) and for the effect of the organization of the network topology. In a previous study, Singh et al. [23] combined homology and topology for pairwise protein interaction network alignment. They also extended this approach to the multiple alignment of protein interaction networks [24]. A similar approach was used by Kleinberg [39] to discover authoritative information sources on the World Wide Web. In the case of protein interaction networks, the alignment problem is mapped to a single eigenvalue problem, since all nodes are of the same type and the interactions between them are assumed to be undirected. The algorithm proposed by Singh et al., however, cannot be trivially extended to metabolic networks, as these networks contain entities of varying types and the directions of the interactions are important.

For metabolic network alignment, we first create three eigenvalue problems: one for compounds, one for reactions and one for enzymes. We also consider the directions of the interactions. We solve these eigenvalue problems using the power method. The principal eigenvector of each of these problems defines a weighted bipartite graph. We then extract reaction mappings using maximum weight bipartite matching on the corresponding bipartite graph. After that, to ensure the consistency of the alignment, we prune the edges in the bipartite graphs of compounds and enzymes that lead to alignments inconsistent with the reaction mappings. Finally, we find the enzyme and compound mappings using maximum weight bipartite matching. We report the
extracted mappings of entities as an alignment, together with a similarity score that we devise for measuring the similarity between the aligned networks. Furthermore, we measure the unexpectedness of the resulting alignment by calculating a Z-score.

Our experiments on the KEGG Pathway database show that our algorithm successfully identifies functionally similar entities and alternative paths in networks of different organisms. We validate the significance of our alignment results by comparing them to biologically identified results. Furthermore, using the alignment scores of 28 common networks, we create a phylogenetic tree for 73 organisms and compare it to the predictions of Heymans et al. [16] and Clemente et al. [40], using the NCBI [41] tree as a gold standard. We verify with two different tree distance measures [42, 43] that the phylogenetic tree created by our algorithm is more similar to the NCBI tree than the trees created by the other methods. Moreover, our algorithm produced the phylogenetic tree 13 times faster than the algorithm of Heymans et al.

Our technical contributions can be summarized as follows. We integrate the graph model that we devised earlier [11] into the context of network alignment. Using this model, we develop an algorithm to align networks when there is no abstraction. Unlike existing graph models, this model is a nonredundant representation of networks without any abstraction. We introduce the concept of consistency for the alignment of networks with different entity types by constructing reachability sets. We develop an algorithm that aligns networks with different types of entities while enforcing consistency. We propose a pairwise alignment algorithm that works for any network topology. Having no topology restriction, our method, unlike existing algorithms, is applicable to 100% of metabolic networks. We devise a similarity score and a Z-score for measuring the similarity between two metabolic networks, and we verify the significance of these scores.

The rest of this chapter is organized as follows. Section 2.1 discusses the related work. Section 2.2 describes the network model used by the algorithm. Section 2.3 outlines the algorithm. Section 2.4 illustrates the experimental results.


2.1 Background

Efforts on the comparative analysis of biological networks have mainly focused on the alignment of two types of networks, namely metabolic and protein-protein interaction (PPI) networks. In this thesis, we discuss in detail the algorithms we develop for the alignment of metabolic networks. In this section, we aim to provide the interested reader with pointers to other network alignment algorithms. First, we briefly mention a number of methods developed for aligning PPI networks. We then provide short summaries of the metabolic network alignment algorithms that are not discussed in other sections.

In order to identify conserved interaction paths and complexes between two PPI networks, Kelley et al. developed a method called PathBLAST [44, 45]. This method is an analog of the sequence alignment algorithm BLAST, hence the name. It searches for high-scoring alignments between two protein interaction paths by pairing orthologous proteins, one from each path, according to the sequence similarity between them and their order in the corresponding paths. PathBLAST identified a large number of conserved paths and duplication events between two distantly related species, S. cerevisiae and H. pylori [44]. In an effort to generalize the topology of the alignment from paths to general graph structures, Koyuturk et al. modeled the alignment as a graph optimization problem and proposed efficient algorithms to solve it [25, 46, 47]. They based their framework, MAWISH, on the duplication/divergence model to capture the evolutionary behavior of PPI networks. MAWISH first constructs an alignment graph with insertions and deletions allowed. Then, it finds the maximum weight induced subgraph of this graph and reports it as the resulting alignment. In another work, Berg et al. used simulated annealing to align two PPI networks [48, 49]. They devised a scoring function for motifs derived from families of similar, but not necessarily identical, patterns. They developed a graph search algorithm that aims to find the maximum-scoring alignments of the query networks. The method proposed by Narayanan et al. relies on splitting the
networks recursively and matching the subnetworks to construct the overall alignment of the query PPI networks [50]. Their algorithm gave provable guarantees on the correctness of the alignment as well as on its efficiency. Dutkowski et al. [51] built ancestral networks guided by the evolutionary history of proteins, together with a stochastic model of interaction emergence, loss and conservation. The application of their method to the PPI networks of different species revealed that the most probable conserved ancestral interactions are often related to known protein complexes. QNet, developed by Dost et al., employed the color coding technique to decrease the number of iterations necessary to find the highest scoring alignment between two query networks [52]. It extended their earlier method, QPath [53], which works only for linear queries, and allowed non-exact matches. QNet limited the query networks to trees and performed efficient alignments with up to nine proteins. Singh et al. proposed a new framework that avoids any type of topology restriction on query networks [23, 24]. This framework, IsoRank, inspired by Google's PageRank algorithm, provides an efficient heuristic that defines a similarity score capturing both the sequence similarities of the proteins and the topological similarity of the query networks. Mapping the alignment problem to the graph isomorphism problem, it reports the alignment as the highest scoring common subgraph between the query networks. Recently, they extended this framework to include protein clusters and to allow multiple networks (IsoRankN [54]). One last method that aligns PPI networks, NetworkBLAST, is based on an earlier algorithm of Sharan et al. [55, 56]. NetworkBLAST constructs a network alignment graph from the queries and uses a heuristic seed-extension method to search for conserved paths or cliques in this alignment graph. It has been extended to NetworkBLAST-M, which allows the alignment of multiple PPI networks and relies on an efficient representation that is only linear in the size of the query networks.

The alignment of metabolic networks is motivated by its importance in understanding the evolution of different organisms, reconstructing the metabolic networks of newly sequenced genomes, and identifying drug targets and missing enzymes. To the best of our knowledge,
the first systematic method to comparatively analyze the metabolic networks of different organisms was proposed by Dandekar et al. [57]. They focused on the glycolytic pathway and combined elementary flux mode analysis with network alignment to reveal novel aspects of glycolysis. They identified alternative enzymes (i.e., isoenzymes) as well as several potential drug targets in different species. Shortly after, Ogata et al. proposed a heuristic graph comparison algorithm and applied it to metabolic networks to detect functionally related enzyme clusters [58]. This method allowed the extraction of functionally related enzyme clusters from the comparison of the complete metabolic networks of 10 microorganisms. Relying on the hierarchy of Enzyme Commission (EC) numbers, Tohsato et al. developed an alignment method for comparing multiple metabolic networks [21]. Their idea was to express reaction similarities through the similarities between the EC numbers of the enzymes of the respective reactions. They devised a new similarity score for enzymes, the information content enzyme similarity score, and a dynamic programming algorithm that maximizes this score while aligning the query networks. Chen and Hofestadt developed an algorithm, and its web tool named PathAligner, that is only capable of aligning linear subgraphs of metabolic networks [59, 60]. It provided a framework for the prediction and reconstruction of metabolic networks using alignment.

In 2005, Pinter et al. published a key method in metabolic network alignment under the name MetaPathwayHunter [18]. This dynamic programming method was based on an efficient pattern matching algorithm for labeled graphs. They considered the scenario where one of the input networks is the query and the other is the database (or text). They showed that their approximate labeled subtree homeomorphism algorithm works in polynomial time when the query networks are limited to multi-source trees (i.e., no cycles). Wernicke and Rasche proposed a fast and simple algorithm that does not limit the topologies of the query networks [61]. This algorithm relied on what they called the local diversity property of metabolic networks. Exploiting this property, they searched for the maximum-score embedding of the query network into the database networks. An
alternative approach to network alignment, using integer quadratic programming (IQP), was developed by Li et al. [62]. They used a similarity score that combines the pairwise similarities of molecules and the topological similarities of networks. They formulated the problem as the search for a feasible solution that maximizes this similarity between the query networks. For cases where the IQP can be relaxed to the corresponding quadratic programming (QP) problem, this method almost always guarantees an integer solution, and hence the alignment problem becomes tractable without any approximation.

One of the more recent methods for metabolic network alignment, by Tohsato and Nishimura [20], used the similarity of the chemical structures of the compounds of a network. Realizing the problems with their earlier work caused by discrepancies in the EC hierarchy of enzymes [21], the authors focused solely on the chemical formulas of compounds instead of the EC numbers of enzymes. They employed a fingerprint-based similarity score, which uses the presence or absence of important atoms or molecular substructures of the compounds to define their similarity. The alignment phase of the algorithm uses a simple dynamic programming formulation. Yet another similarity score and alignment algorithm for metabolic networks was proposed by Li et al. [63, 64]. They aimed at seeking diversity and alternatives in highly conserved metabolic networks. Their similarity score combined the functional similarity of enzymes with their sequence similarity. Taking reaction directions into account, they first constructed all building blocks of the alignment and then sequentially matched and scored them to find the best alignment exhaustively. Later, Cheng et al. developed a tool, MetNetAligner, that allows a certain number of insertions and deletions of enzymes in metabolic network alignment [22, 65]. It used an enzyme-to-enzyme functional similarity score, with the goal of identifying and filling metabolic network holes using the resulting alignments. MetNetAligner limited one of the query networks to a directed graph with restricted cyclic structure, whereas the other query graph is allowed to have arbitrary topology.


All these methods model the input networks as graphs with interactions between entities of a single type. This abstraction can cause significant information loss, as seen in Figure 1-3. In this work, we avoid any type of abstraction on networks. We use a comprehensive graph model that is not biased toward one entity type. By refusing abstraction and using an efficient iterative algorithm, our tool outperforms existing methods for metabolic network alignment in many aspects.

2.2 Model

The first step in developing effective computational techniques that leverage networks is to develop an accurate model for representing networks. A number of models have already been developed, such as Boolean [66, 67] and logical networks [68]. These models often denote proteins, enzymes and compounds using vertices and interactions using edges. They consider a node as an AND or an OR operator and assign Boolean values to edges. Dual models can be built by assigning Boolean values to vertices and AND/OR operators to edges. Although they are effective in modeling protein interactions, these models are not sufficient for metabolic networks, as reactions can take place at fractional rates. Stochastic [69, 70] and stoichiometric models [71, 72] address this problem by representing reactions as stoichiometric events. Hatzimanikatis et al. [73] proposed to integrate enzyme chemistry with metabolite structure to understand metabolic networks better. Some of the other commonly used models are Bayesian networks [74, 75] and hypergraphs [76-78]. Sharon et al. [79] used Markov networks to model DNA-protein interactions. Ma et al. [80] represented metabolic networks using graphs with reactions as vertices.

However, these existing graph models are not sufficient for representing all the interactions between the different entity types present in metabolic networks. Figure 1-3 emphasizes the importance of the modeling scheme for network alignment. As discussed in Section 2.1, abstractions in modeling reduce the alignment accuracy. Existing alignment methods use network models that only focus on a single type of
entity, as stated in Challenge 1. This simplification converts metabolic networks to graphs with only compatible nodes. We use the word compatible for entities that are of the same type. For metabolic networks, two entities are compatible if they are both reactions, both enzymes or both compounds. The algorithm we present in this chapter uses a network model that considers all types of entities and the interactions between them. Next, we describe this model.

Let P be a metabolic network, and let $\mathcal{R} = \{R_1, R_2, \ldots, R_{|\mathcal{R}|}\}$, $\mathcal{C} = \{C_1, C_2, \ldots, C_{|\mathcal{C}|}\}$ and $\mathcal{E} = \{E_1, E_2, \ldots, E_{|\mathcal{E}|}\}$ denote the sets of reactions, compounds and enzymes of this network, respectively. Using this notation, the definition below formalizes the graph model employed by this algorithm:

Definition 1. The directed graph $P = (V, E)$ for representing the metabolic network P is constructed as follows. The node set, $V = [\mathcal{R}, \mathcal{C}, \mathcal{E}]$, is the union of the reactions, compounds and enzymes of P. The edge set, E, is the set of interactions between different nodes. An interaction is represented by a directed edge drawn from a node x to another node y if and only if one of the following three conditions holds:
(1) x is an enzyme that catalyzes reaction y.
(2) x is an input compound of reaction y.
(3) x is a reaction that produces compound y.

Figure 2-1 illustrates the conversion of a KEGG metabolic network to the graph model described above. As the figure suggests, this model is capable of representing metabolic networks without losing any type of entity or any interaction between these entities, and therefore avoids any kind of abstraction in alignment. Moreover, this model is nonredundant, since it avoids repeating the same entity. In Figure 2-1A, the enzyme 1.2.4.1 is shown twice to represent two different reactions, whereas in our model, shown in Figure 2-1B, it is represented as a single node.


Figure 2-1. Graph representation of metabolic networks. A) A portion of the reference network of Alanine and aspartate metabolism from the KEGG database. B) Our graph representation corresponding to this portion. Reactions are shown by rectangles, compounds by circles and enzymes by triangles.
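The three edge rules of Definition 1 fit in a few lines of code. The sketch below rests on several assumptions. Each reaction is given as sets of input compounds, output compounds and enzymes. Entity names are unique across types. The `Reaction` and `build_graph` names are hypothetical, and the two-reaction pathway is invented for illustration rather than taken from KEGG. Keying nodes by name is what makes a shared enzyme appear once, in line with the nonredundancy property discussed above.

```python
from typing import Dict, FrozenSet, List, NamedTuple, Set, Tuple

class Reaction(NamedTuple):
    """A reaction described by its input compounds, output compounds and catalysing enzymes."""
    name: str
    inputs: FrozenSet[str]
    outputs: FrozenSet[str]
    enzymes: FrozenSet[str]

def build_graph(reactions: List[Reaction]) -> Tuple[Dict[str, str], Set[Tuple[str, str]]]:
    """Build the directed graph of Definition 1.

    Returns (nodes, edges): nodes maps every entity name to its type, and edges holds
    directed (x, y) pairs. Because nodes is a dict keyed by name, an enzyme or compound
    shared by several reactions appears only once (the nonredundancy property).
    """
    nodes: Dict[str, str] = {}
    edges: Set[Tuple[str, str]] = set()
    for r in reactions:
        nodes[r.name] = "reaction"
        for e in r.enzymes:          # rule (1): enzyme -> reaction it catalyses
            nodes[e] = "enzyme"
            edges.add((e, r.name))
        for c in r.inputs:           # rule (2): input compound -> reaction
            nodes[c] = "compound"
            edges.add((c, r.name))
        for c in r.outputs:          # rule (3): reaction -> compound it produces
            nodes[c] = "compound"
            edges.add((r.name, c))
    return nodes, edges

# Hypothetical two-step pathway in which one enzyme, E1, catalyses both reactions.
pathway = [
    Reaction("R1", frozenset({"C1"}), frozenset({"C2"}), frozenset({"E1"})),
    Reaction("R2", frozenset({"C2"}), frozenset({"C3"}), frozenset({"E1"})),
]
nodes, edges = build_graph(pathway)
print(len(nodes), len(edges))  # 6 6
```

E1 is a single node with two outgoing edges, one to each reaction. This mirrors how enzyme 1.2.4.1 is drawn once in Figure 2-1B.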


2.3 Algorithm

This section formalizes the problem of aligning compatible entities of two metabolic networks with only one-to-one mappings. We first define an alignment and the consistency of an alignment. Then, we give the formal definition of the problem.

Let P and $\bar{P}$ stand for the two query metabolic networks, represented by the graphs $P = (V, E)$ and $\bar{P} = (\bar{V}, \bar{E})$, respectively. Using the graph formalization given in Section 2.2, we replace V with $[\mathcal{R}, \mathcal{C}, \mathcal{E}]$, where $\mathcal{R}$ denotes the set of reactions, $\mathcal{C}$ the set of compounds and $\mathcal{E}$ the set of enzymes of P. Similarly, we replace $\bar{V}$ with $[\bar{\mathcal{R}}, \bar{\mathcal{C}}, \bar{\mathcal{E}}]$.

Definition 2. An alignment of two metabolic networks $P = (V, E)$ and $\bar{P} = (\bar{V}, \bar{E})$ is a mapping $\varphi: V \to \bar{V}$.

Before discussing the consistency of an alignment, we introduce the concept of reachability for entities. Given two compatible entities $v_i, v_j \in V$, $v_j$ is reachable from $v_i$ if and only if there is a directed path from $v_i$ to $v_j$ in the graph representation of P. As a shorthand, $v_i \rightsquigarrow v_j$ denotes that $v_j$ is reachable from $v_i$.

Using the definition and notation above, a consistent alignment is defined as follows:

Definition 3. An alignment of two networks $P = (V, E)$ and $\bar{P} = (\bar{V}, \bar{E})$ defined by the mapping $\varphi: V \to \bar{V}$ is consistent if and only if all the conditions below are satisfied:
(1) For all $\varphi(v) = \bar{v}$, where $v \in V$ and $\bar{v} \in \bar{V}$, v and $\bar{v}$ are compatible.
(2) $\varphi$ is one-to-one.
(3) For all $\varphi(v_i) = \bar{v}_i$, there exists $\varphi(v_j) = \bar{v}_j$ such that either $v_i \rightsquigarrow v_j$ and $\bar{v}_i \rightsquigarrow \bar{v}_j$, or $v_j \rightsquigarrow v_i$ and $\bar{v}_j \rightsquigarrow \bar{v}_i$, where $v_i, v_j \in V$ and $\bar{v}_i, \bar{v}_j \in \bar{V}$.

The first condition in Definition 3 filters out mappings between different entity types. The second condition ensures that no entity is mapped to more than one entity. The last condition restricts the mappings to those that are supported by at least
one other mapping. That is, it eliminates the nonsensical mappings that may cause inconsistency, as described in Figure 2-2.

Figure 2-2. Consistency of an alignment. Panels (a) and (b) are graph representations of two query networks. Enzymes are not displayed for simplicity. Suppose that the alignment algorithm mapped reaction R1 to R1' and R2 to R2'. In this scenario, a consistent mapping is C1-C1'. An example of a nonsensical mapping that causes inconsistency is C2'-C5, since it conflicts with the given mapping of reactions.

Now, let $Sim_P: (P, \bar{P}) \to [0, 1]$ be a pairwise network similarity function induced by the mapping $\varphi$. The maximum score (i.e., $Sim_P = 1$) is achieved when the two networks are identical. In Section 2.3.5, we describe in detail how $Sim_P$ is computed after $\varphi$ is created. To restate the problem, it is only necessary to know that such a similarity function exists.

In light of the above definitions and formalizations, the problem considered in this section is stated as follows:

Definition 4. Given two metabolic networks $P = (V, E)$ and $\bar{P} = (\bar{V}, \bar{E})$, the alignment problem is to find a consistent mapping $\varphi: V \to \bar{V}$ that maximizes $Sim_P(P, \bar{P})$.

In the following sections, we describe the metabolic network alignment algorithm.
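Definition 3 can be checked mechanically for a given mapping. Reachability is computed by breadth-first search, and the three conditions are tested in order. The sketch below only verifies a candidate mapping. It does not construct one the way the algorithm does, by pruning bipartite edges. It rests on one labelled assumption. Reachability is taken in the full graph, so a mapping may be supported by a mapping of another entity type, following the Figure 2-2 example in which a compound mapping is judged against the reaction mappings. The helper names and the toy networks are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, Mapping, Set

def reachable(adj: Mapping[str, Iterable[str]], src: str) -> Set[str]:
    """All nodes reachable from src by a directed path of length >= 1 (breadth-first search)."""
    seen: Set[str] = set()
    queue = deque([src])
    while queue:
        for nxt in adj.get(queue.popleft(), ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def is_consistent(phi: Dict[str, str], adj, adj_bar, kind, kind_bar) -> bool:
    """Check the three conditions of Definition 3 for an alignment phi (dict v -> v_bar).

    adj / adj_bar are adjacency dicts of the two graphs; kind / kind_bar give each
    node's entity type. Assumption: reachability is taken in the full graph, so a
    mapping may be supported by a mapping of another entity type (cf. Figure 2-2).
    """
    # (1) compatibility: only entities of the same type are mapped to each other
    if any(kind[v] != kind_bar[vb] for v, vb in phi.items()):
        return False
    # (2) one-to-one: no two entities share an image
    if len(set(phi.values())) != len(phi):
        return False
    reach = {v: reachable(adj, v) for v in phi}
    reach_bar = {vb: reachable(adj_bar, vb) for vb in phi.values()}
    # (3) each mapping needs another mapping that agrees in reachability direction
    for vi, vbi in phi.items():
        if not any((vj in reach[vi] and vbj in reach_bar[vbi])
                   or (vi in reach[vj] and vbi in reach_bar[vbj])
                   for vj, vbj in phi.items() if vj != vi):
            return False
    return True

# Hypothetical linear networks C1 -> R1 -> C2 -> R2 -> C3 and their primed copies.
adj = {"C1": ["R1"], "R1": ["C2"], "C2": ["R2"], "R2": ["C3"]}
adj_bar = {k + "'": [x + "'" for x in vs] for k, vs in adj.items()}
kind = {"C1": "compound", "C2": "compound", "C3": "compound",
        "R1": "reaction", "R2": "reaction"}
kind_bar = {k + "'": t for k, t in kind.items()}

good = {"R1": "R1'", "R2": "R2'", "C1": "C1'"}
bad = {"R1": "R1'", "R2": "R2'", "C3": "C1'"}   # C3 -> C1' has no supporting mapping
print(is_consistent(good, adj, adj_bar, kind, kind_bar))  # True
print(is_consistent(bad, adj, adj_bar, kind, kind_bar))   # False
```

In the `bad` mapping, C3 is a sink in P while C1' is a source in $\bar{P}$. As a result, no other mapping can reach, or be reached from, both sides in the same direction. This is the kind of conflict Figure 2-2 illustrates with C2'-C5.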


2.3.1 Pairwise Similarity of Entities

Metabolic networks are composed of different entities, namely enzymes, compounds and reactions. The degree of similarity between pairs of entities of two networks is usually a good indicator of the similarity between the networks. A number of similarity measures have been devised in the literature for each type of entity. In the rest of this section, we describe the similarity functions the algorithm uses for enzyme and compound pairs. We also discuss the similarity function the authors developed for reaction pairs. All pairwise similarity scores are normalized to the interval [0, 1] to ensure compatibility between the similarity scores of different entities.

Enzymes: An enzyme similarity function is of the form $Sim_E: \mathcal{E} \times \bar{\mathcal{E}} \to [0, 1]$. Two commonly used enzyme similarity measures are:

Hierarchical enzyme similarity score [21] depends only on the Enzyme Commission (EC) [19] numbers, which are made up of four numbers. Starting from the leftmost numbers of the two enzymes, it adds 0.25 to the similarity score for each common number until the two EC numbers differ. For instance, $Sim_E(6.1.2.4, 5.2.2.4) = 0$, since the leftmost numbers are different, whereas $Sim_E(6.1.2.4, 6.2.3.4) = 0.25$ and $Sim_E(6.1.2.4, 6.1.2.103) = 0.75$.

Information content enzyme similarity score [18] uses the EC numbers of enzymes together with the information content of this numbering scheme. It is defined as $-\log_2(h / |\mathcal{E}|)$, where h is the number of enzymes in the smallest common subtree containing both enzymes and $|\mathcal{E}|$ is the number of all enzymes in the database. To maintain compatibility, this score is normalized by dividing it by $\log_2 |\mathcal{E}|$. Figure 2-3 shows how to calculate the information content enzyme similarity score for enzymes 6.1.2.4 and 6.1.2.103.

Compounds: For compounds, the similarity scores are of the form $Sim_C: \mathcal{C} \times \bar{\mathcal{C}} \to [0, 1]$. Unlike enzymes, compounds have no hierarchical numbering system. Therefore, using a hierarchy is not an option in this case. A better approach for compounds
is to consider the similarity of their chemical structures. Two methods are commonly used for compound similarity:

Identity score for compounds sets the similarity score to 1 if the two compounds are identical and to 0 otherwise.

SIMCOMP similarity score for compounds is defined by Hattori et al. [81]. This score is computed by mapping the chemical structures of compounds to graphs and then measuring the similarity between these graphs. This algorithm uses the loose compound similarity scoring scheme of SIMCOMP.

Figure 2-3. Calculation of $Sim_E(6.1.2.4, 6.1.2.103)$ using the information content enzyme similarity measure.

Reactions: The authors define a similarity score for reactions using the similarity scores for enzymes and compounds. An accurate similarity score for reactions should account for the process performed by the reaction rather than for its label. This is because reactions catalyzed by enzymes affect the state of the network by transforming
a set of input compounds into a set of output compounds. The similarity score for reactions therefore depends on the similarities of the enzymes and compounds that take part in this process.

The similarity function for reactions is of the form $Sim_R: \mathcal{R} \times \bar{\mathcal{R}} \to [0, 1]$. It employs the maximum weight bipartite matching technique, briefly described below:

Definition 5 (Maximum Weight Bipartite Matching). Let A and B be two disjoint node sets and W be an $|A| \times |B|$ matrix representing the edge weights between all possible pairs with one element from A and one element from B, where existing edges correspond to nonzero entries in W. The maximum weight bipartite matching of A and B is a one-to-one mapping of nodes such that the sum of the edge weights between the elements of the mapped pairs is maximum. We denote this sum by $MWBM(A, B, W)$.

Let $r_i$ and $\bar{r}_j$ be two reactions from $\mathcal{R}$ and $\bar{\mathcal{R}}$, respectively. The reaction $r_i$ is a combination of input compounds, output compounds and enzymes, denoted by $[I_i, O_i, \mathcal{E}_i]$, where $I_i, O_i \subseteq \mathcal{C}$ and $\mathcal{E}_i \subseteq \mathcal{E}$. Similarly, define $\bar{r}_j$ as $[\bar{I}_j, \bar{O}_j, \bar{\mathcal{E}}_j]$. Additionally, compute the edge weight matrices $W_I$ and $W_O$ using the selected compound similarity score and $W_E$ using the selected enzyme similarity score. The similarity score of $r_i$ and $\bar{r}_j$ is computed as:

$$Sim_R(r_i, \bar{r}_j) = \gamma_I \, MWBM(I_i, \bar{I}_j, W_I) + \gamma_O \, MWBM(O_i, \bar{O}_j, W_O) + \gamma_E \, MWBM(\mathcal{E}_i, \bar{\mathcal{E}}_j, W_E)$$

Here, $\gamma_I$, $\gamma_O$ and $\gamma_E$ are real numbers in the interval [0, 1]. They denote the relative weights of input compounds, output compounds and enzymes in reaction similarity, respectively. Typical values for these parameters are $\gamma_I = \gamma_O = 0.3$ and $\gamma_E = 0.4$. These values were determined empirically after a number of experiments. One more factor that defines reaction similarity is the choice of the $Sim_E$ and $Sim_C$ functions. Since there are two
options for each, there are four different options in total for reaction similarity, depending on the choices of $Sim_E$ and $Sim_C$.

Next, we create the pairwise similarity vectors $H^R_0$, $H^C_0$ and $H^E_0$ for reactions, compounds and enzymes, respectively. Since the calculation of these vectors is very similar for each entity type, we only describe the one for reactions.

Figure 2-4 displays the step-by-step computation of the homology vector from the homological similarity matrix of our toy example. The entry $H^R_0[(i-1)|\bar{\mathcal{R}}| + j]$ of the $H^R_0$ vector stands for the similarity score between $r_i \in \mathcal{R}$ and $\bar{r}_j \in \bar{\mathcal{R}}$, where $1 \le i \le |\mathcal{R}|$ and $1 \le j \le |\bar{\mathcal{R}}|$. We use the notation $H^R_0[i, j]$ for this entry, since $H^R_0$ can be viewed as an $|\mathcal{R}| \times |\bar{\mathcal{R}}|$ matrix. Note that the $H^R_0$, $H^C_0$ and $H^E_0$ vectors must have unit norm. As we clarify in Section 2.3.3, this normalization is crucial for the stability and convergence of the algorithm. Therefore, we compute an entry of $H^R_0$ as:

$$H^R_0[i, j] = \frac{Sim_R(r_i, \bar{r}_j)}{\| H^R_0 \|_1}$$

where $\| H^R_0 \|_1$ is the 1-norm of the vector before normalization (5.0 in Figure 2-4).

Figure 2-4. Calculation of (A) the similarity matrix, (B) the similarity vector and (C) the normalized similarity vector for our running example.
(A) Similarity matrix:
          r̄1    r̄2    r̄3    r̄4
    r1    0.8   0.3   0.1   0.5
    r2    0.1   0.9   0.2   0.7
    r3    0     0.3   0.7   0.4
(B) Similarity vector: (r1,r̄1) 0.8, (r2,r̄1) 0.1, (r3,r̄1) 0, (r1,r̄2) 0.3, (r2,r̄2) 0.9, (r3,r̄2) 0.3, (r1,r̄3) 0.1, (r2,r̄3) 0.2, (r3,r̄3) 0.7, (r1,r̄4) 0.5, (r2,r̄4) 0.7, (r3,r̄4) 0.4
(C) Normalized similarity vector: (r1,r̄1) 0.16, (r2,r̄1) 0.02, (r3,r̄1) 0, (r1,r̄2) 0.06, (r2,r̄2) 0.18, (r3,r̄2) 0.06, (r1,r̄3) 0.02, (r2,r̄3) 0.04, (r3,r̄3) 0.14, (r1,r̄4) 0.1, (r2,r̄4) 0.14, (r3,r̄4) 0.08

In a similar fashion, all entries of $H^C_0$ and $H^E_0$ are created using the $Sim_C$ and $Sim_E$ functions. These three vectors carry the homology information throughout the algorithm.
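The pairwise-similarity stage of Section 2.3.1 can be sketched end to end. The sketch rests on several assumptions. It uses the hierarchical EC score for enzymes and the identity score for compounds, with SIMCOMP and the information content score omitted. Maximum weight bipartite matching is brute-forced over all one-to-one assignments, which is only viable for tiny sets; a polynomial-time method such as the Hungarian algorithm would be used in practice. The weights default to the typical values quoted above. All function names are hypothetical. `homology_vector` flattens row by row, as in the index $(i-1)|\bar{\mathcal{R}}| + j$, whereas Figure 2-4 lists the same entries column by column.

```python
from itertools import permutations
from typing import Callable, List, Sequence, Tuple

def sim_e_hierarchical(ec_a: str, ec_b: str) -> float:
    """Hierarchical EC score: add 0.25 per leading EC field the two numbers share."""
    score = 0.0
    for x, y in zip(ec_a.split("."), ec_b.split(".")):
        if x != y:               # stop at the first differing field
            break
        score += 0.25
    return score

def sim_c_identity(c_a: str, c_b: str) -> float:
    """Identity compound score: 1 for identical compounds, 0 otherwise."""
    return 1.0 if c_a == c_b else 0.0

def mwbm(a: Sequence[str], b: Sequence[str], sim: Callable[[str, str], float]) -> float:
    """Weight of a maximum weight bipartite matching between a and b.

    Brute force over all one-to-one assignments; exponential, so tiny sets only.
    """
    if len(a) <= len(b):
        candidates = (list(zip(a, p)) for p in permutations(b, len(a)))
    else:
        candidates = (list(zip(p, b)) for p in permutations(a, len(b)))
    return max(sum(sim(x, y) for x, y in pairs) for pairs in candidates)

ReactionSets = Tuple[List[str], List[str], List[str]]   # (inputs, outputs, enzymes)

def sim_r(r: ReactionSets, r_bar: ReactionSets,
          g_in: float = 0.3, g_out: float = 0.3, g_enz: float = 0.4) -> float:
    """Reaction similarity as the weighted sum of three maximum weight matchings."""
    return (g_in * mwbm(r[0], r_bar[0], sim_c_identity)
            + g_out * mwbm(r[1], r_bar[1], sim_c_identity)
            + g_enz * mwbm(r[2], r_bar[2], sim_e_hierarchical))

def homology_vector(scores: List[List[float]]) -> List[float]:
    """Flatten an |R| x |R_bar| score matrix row by row and scale it to unit 1-norm."""
    flat = [s for row in scores for s in row]
    total = sum(flat)
    return [s / total for s in flat]

print(sim_e_hierarchical("6.1.2.4", "6.1.2.103"))   # 0.75
r = (["C1"], ["C2"], ["6.1.2.4"])                  # hypothetical reactions
r_bar = (["C1"], ["C3"], ["6.1.2.103"])
print(round(sim_r(r, r_bar), 6))                    # 0.6
h0 = homology_vector([[0.8, 0.3, 0.1, 0.5],         # similarity matrix of Figure 2-4
                      [0.1, 0.9, 0.2, 0.7],
                      [0.0, 0.3, 0.7, 0.4]])
print([round(x, 2) for x in h0])
# [0.16, 0.06, 0.02, 0.1, 0.02, 0.18, 0.04, 0.14, 0.0, 0.06, 0.14, 0.08]
```

The normalized values are those of Figure 2-4C, read row by row instead of column by column.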


Section 2.3.3 describes how these homology vectors are combined with topology information to produce an alignment.

2.3.2 Similarity of Topologies

Previously, we discussed why and how pairwise similarities of entities are used. Although pairwise similarities are necessary, they are not sufficient: the induced topologies of the aligned entities should also be similar. To account for topological similarity, this section describes the notion of neighborhood for each compatibility class. The method then creates support matrices, which allow the use of neighborhood information.

To be consistent with the reachability definition, the neighborhood relations follow the directions of interactions. In other words, these definitions distinguish between the backward and forward neighbors of an entity. Let BN(x) and FN(x) denote the backward and forward neighbor sets of an entity x. The construction of these sets starts by defining the neighborhood of reactions, which forms a backbone for the topologies of the networks. Then, using this backbone, the neighborhoods of compounds and enzymes are defined.

Consider two reactions $r_i$ and $r_u$ of the network P. If an output compound of $r_i$ is an input compound of $r_u$, then $r_i$ is a backward neighbor of $r_u$ and $r_u$ is a forward neighbor of $r_i$. The algorithm constructs the forward and backward neighbor sets of each reaction in this manner. For instance, in Figure 1-5A, reaction $r_3$ is a forward neighbor of $r_2$, and $r_1$ is a backward neighbor of $r_2$.

A more general version of the neighborhood definition could include not only immediate neighbors but also neighbors of neighbors, and so on. However, this is not necessary, since the method considers the support of indirect neighbors, as we describe in Section 2.3.3.

As stated before, the neighborhood definitions of compounds and enzymes depend on the topology of reactions. Let $c_s$ and $c_t$ be two compounds, and let $r_i$ and $r_u$ be two reactions
of the network P. If $r_i \in BN(r_u)$, $c_s$ is an input (output) compound of $r_i$, and $c_t$ is an input (output) compound of $r_u$, then $c_s \in BN(c_t)$ and $c_t \in FN(c_s)$. For example, in Figure 2-1B, Pyruvate and Lipoamide-E are neighbors, since they are inputs of two neighboring reactions, namely R00014 and R03270. For enzymes, the neighborhood construction is done similarly.

Using the above neighborhood definitions, the algorithm creates a support matrix for each compatibility class. These matrices represent the combined topological information of the network pair. Each entry of a support matrix represents the support given by the pair of entities that indexes the column to the pair of entities that indexes the row. Here, we only describe how to calculate the support matrix for reactions. The calculations for enzymes and compounds are similar.

Definition 6. Let $P = ([\mathcal{R}, \mathcal{C}, \mathcal{E}], E)$ and $\bar{P} = ([\bar{\mathcal{R}}, \bar{\mathcal{C}}, \bar{\mathcal{E}}], \bar{E})$ be two metabolic networks. The support matrix for the reactions of P and $\bar{P}$ is an $|\mathcal{R}||\bar{\mathcal{R}}| \times |\mathcal{R}||\bar{\mathcal{R}}|$ matrix denoted by $S_R$. An entry of the form $S_R[(i-1)|\bar{\mathcal{R}}| + j][(u-1)|\bar{\mathcal{R}}| + v]$ gives the fraction of the total support provided by the $(r_u, \bar{r}_v)$ mapping to the $(r_i, \bar{r}_j)$ mapping. Let $N(u, v) = |BN(r_u)||BN(\bar{r}_v)| + |FN(r_u)||FN(\bar{r}_v)|$ denote the number of possible mappings of the neighbors of $r_u$ and $\bar{r}_v$. Each entry of $S_R$ is computed as:

$$S_R[(i-1)|\bar{\mathcal{R}}| + j][(u-1)|\bar{\mathcal{R}}| + v] = \begin{cases} \frac{1}{N(u, v)} & \text{if } r_i \in BN(r_u) \text{ and } \bar{r}_j \in BN(\bar{r}_v), \text{ or } r_i \in FN(r_u) \text{ and } \bar{r}_j \in FN(\bar{r}_v) \\ 0 & \text{otherwise} \end{cases}$$

After all entries are filled, the zero columns of $S_R$ are replaced with the $|\mathcal{R}||\bar{\mathcal{R}}| \times 1$ vector $[\frac{1}{|\mathcal{R}||\bar{\mathcal{R}}|}, \frac{1}{|\mathcal{R}||\bar{\mathcal{R}}|}, \ldots, \frac{1}{|\mathcal{R}||\bar{\mathcal{R}}|}]^T$. This way, the support of the mapping indicated by a zero column is uniformly distributed to all mappings.

We now describe the calculation of the entries of a support matrix on our running example in Figure 1-5. Let us focus on the support given by the mapping $(r_2, \bar{r}_2)$ to the mappings of their neighbors. We see that $|FN(r_2)| = 1$, $|FN(\bar{r}_2)| = 2$, $|BN(r_2)| = 1$ and $|BN(\bar{r}_2)| = 1$. Hence, the support of mapping $r_2$ to $\bar{r}_2$ should be equally distributed
among its 3 (i.e., $1 \cdot 1 + 1 \cdot 2$) possible neighbor mapping combinations. This is achieved by assigning 1/3 to the corresponding entries of the $S_R$ matrix. The entries corresponding to the support given by the $(r_2, \bar{r}_2)$ mapping are illustrated in Figure 2-5.

Figure 2-5. Calculation of the support matrix for our running example. Only the entries representing the support from the $(r_2, \bar{r}_2)$ mapping to other mappings are shown; there are only three non-zero entries.
                 (r1,r̄1)   ...   (r3,r̄3)   (r3,r̄4)   ...
    (r2,r̄2)      1/3       0      1/3       1/3      ...

We use $S_R$, $S_C$ and $S_E$ to denote the support matrices for reactions, compounds and enzymes, respectively. The strength of these support matrices is that they distribute the support of a mapping to other mappings according to the distances between them. This distribution is crucial for favoring mappings whose neighbors can also be matched. In the following section, we describe an iterative process for appropriately distributing the mapping scores to the neighborhood mappings.

2.3.3 Combining Homology and Topology

Both the pairwise similarities of entities and the organization of these entities and their interactions provide valuable information about the functional correspondence and evolutionary similarity of metabolic networks. Hence, an accurate alignment strategy needs to combine these factors carefully. In this section, we describe a strategy to achieve this combination.

From the previous sections, we have the $H^R_0$, $H^C_0$ and $H^E_0$ vectors containing the pairwise similarities of entities, and the $S_R$, $S_C$ and $S_E$ matrices containing the topological similarities of the networks. Using these vectors and matrices, together with a weight parameter $\alpha \in [0, 1]$ that adjusts the relative effect of topology and homology, the method transforms the problem into three eigenvalue problems as follows:

$$H^R_{k+1} = \alpha S_R H^R_k + (1 - \alpha) H^R_0$$
$$H^C_{k+1} = \alpha S_C H^C_k + (1 - \alpha) H^C_0$$
$$H^E_{k+1} = \alpha S_E H^E_k + (1 - \alpha) H^E_0$$
for $k \ge 0$. To ensure the convergence of these iterations, $H^R_k$, $H^C_k$ and $H^E_k$ are normalized before each iteration.

Lemma 1. $S_R$, $S_C$ and $S_E$ are column stochastic matrices.

Proof: We give the proof for the matrix $S_R$ only. The proofs for $S_C$ and $S_E$ are similar.

To prove that the $|\mathcal{R}||\bar{\mathcal{R}}| \times |\mathcal{R}||\bar{\mathcal{R}}|$ matrix $S_R$ is column stochastic, we need to show that all entries of $S_R$ are nonnegative and that the entries in each column sum to 1. The nonnegativity of each entry of $S_R$ is ensured by Definition 6. Now, let c be an arbitrary column of $S_R$ and $T_c$ be the sum of the entries of that column. Then there exist $u \in [1, |\mathcal{R}|]$ and $v \in [1, |\bar{\mathcal{R}}|]$ such that $c = (u-1)|\bar{\mathcal{R}}| + v$, where u and v are the indices of reactions $r_u$ and $\bar{r}_v$, respectively. If c was a zero column before the replacement step of Definition 6, it now holds the uniform vector, whose $|\mathcal{R}||\bar{\mathcal{R}}|$ entries sum to 1. Otherwise, by Definition 6, each entry of column c is either 0 or $\frac{1}{N(u, v)}$, where $N(u, v) = |BN(r_u)||BN(\bar{r}_v)| + |FN(r_u)||FN(\bar{r}_v)|$. The $((i-1)|\bar{\mathcal{R}}| + j)$-th entry of column c is $\frac{1}{N(u, v)}$ if and only if $r_i$ and $\bar{r}_j$ are both backward neighbors or both forward neighbors of $r_u$ and $\bar{r}_v$, respectively. Since there are N(u, v) such neighbor pairs, $T_c = \sum_{t=1}^{N(u, v)} \frac{1}{N(u, v)} = 1$. Since c is an arbitrary column, $T_x = 1$ for all $x \in [1, |\mathcal{R}||\bar{\mathcal{R}}|]$. Therefore, each column of $S_R$ sums to 1, and thus $S_R$ is column stochastic.

Lemma 2. Every column stochastic matrix has an eigenvalue $\lambda_1$ with $|\lambda_1| = 1$.

Proof: Let A be any column stochastic matrix. Then $A^T$ is a row stochastic matrix and $A^T e = e$, where e is a column vector with all entries equal to 1. Hence, 1 is an eigenvalue of $A^T$. Since A and $A^T$ have the same characteristic polynomial, 1 is also an eigenvalue of A.

Lemma 3. Let A and E be $N \times N$ column stochastic matrices. Then, for any $\alpha \in [0, 1]$, the matrix M defined as
$$M = \alpha A + (1 - \alpha) E$$
is also a column stochastic matrix.

Proof: Let $C_i$, $D_i$ and $T_i$ be the sums of the i-th columns of A, E and M, respectively. Since A and E are column stochastic, $C_i = D_i = 1$ for all $i \in [1, N]$. Also, since $\alpha \in [0, 1]$, all entries of M are nonnegative, and $\alpha C_i + (1 - \alpha) D_i = \alpha (C_i - D_i) + D_i = 1$. Hence, $T_i = \alpha C_i + (1 - \alpha) D_i = 1$ for all $i \in [1, N]$, and M is column stochastic.

Lemma 4. Let A be an $N \times N$ column stochastic matrix and E be an $N \times N$ matrix such that $E = H e^T$, where H is a nonnegative N-vector with $\|H\|_1 = 1$ and e is an N-vector with all entries equal to 1. For any $\alpha \in [0, 1]$, define the matrix M as:
$$M = \alpha A + (1 - \alpha) E$$
The maximal eigenvalue of M satisfies $|\lambda_1| = 1$. The second largest eigenvalue of M satisfies $|\lambda_2| \le \alpha$.

Proof: The proof is omitted. See Haveliwala et al. [82].

Using an iterative technique called the power method, the algorithm aims to find the stable state vectors of the three iterations for $H^R$, $H^C$ and $H^E$ above. Lemma 1 shows that $S_R$, $S_C$ and $S_E$ are column stochastic matrices. By the construction of $H^R_0$, $H^C_0$ and $H^E_0$, we have $\|H^R_0\|_1 = \|H^C_0\|_1 = \|H^E_0\|_1 = 1$. With the following theorem, we show that stable state vectors for these three equations exist and are unique.

Theorem 1. Let A be an $N \times N$ column stochastic matrix and $H_0$ be a nonnegative N-vector with $\|H_0\|_1 = 1$. For any $\alpha \in [0, 1]$, there exists a stable state vector $H_s$ that satisfies the equation:
$$H = \alpha A H + (1 - \alpha) H_0$$
Furthermore, if $\alpha \in [0, 1)$, then $H_s$ is unique.

Proof:
Existence: Let $e$ be the $N$-vector with all entries equal to 1. Then $e^T H = 1$, since $H$ is nonnegative and $\|H\|_1 = 1$ after normalization. The equation above can therefore be rewritten as
$$H = \alpha A H + (1-\alpha) H_0 = \alpha A H + (1-\alpha) H_0 e^T H = \left(\alpha A + (1-\alpha) H_0 e^T\right) H = M H$$
where $M = \alpha A + (1-\alpha) H_0 e^T$. Let $E$ denote $H_0 e^T$. Then $E$ is a column stochastic matrix, since its columns are all equal to $H_0$ and $\|H_0\|_1 = 1$. By Lemma 3, $M$ is column stochastic. Then, by Lemmas 2 and 4, $\lambda_1 = 1$ is the eigenvalue of $M$ with the largest modulus. Hence, there exists an eigenvector $H_s$ corresponding to the eigenvalue $\lambda_1$, which satisfies the equation $\lambda_1 H_s = M H_s$.

Uniqueness: Applying Lemma 4 to the matrix $M$ defined in the existence part, we have $|\lambda_1| = 1$ and $|\lambda_2| \le \alpha$. If $\alpha \in [0, 1)$, then $|\lambda_1| > |\lambda_2|$. This implies that $\lambda_1$ is the principal eigenvalue of $M$ and that $H_s$ is the unique (normalized) eigenvector corresponding to it.

The convergence rate of the power method for the three update equations is determined by the eigenvalues of the corresponding $M$ matrices, as defined in the existence proof above. For each equation, the error shrinks per iteration by a factor proportional to $|\lambda_2| / |\lambda_1|$, that is, $O(\alpha)$. Therefore, the choice of $\alpha$ not only adjusts the relative importance of homology and topology, but also affects the running time of the algorithm. Since $S^R$, $S^C$ and $S^E$ are usually large matrices, every iteration of the power method is costly. Reducing the number of iterations by choosing a small $\alpha$ is problematic, because it degrades the accuracy of the alignment. The algorithm performs well and converges quickly with $\alpha = 0.7$.

Before the first iteration of the power method, we only have pairwise similarity scores. After the first iteration, an entry $H^R_1[i,j]$ of the
vector $H^R_1$ is set to $(1-\alpha)$ times the pairwise similarity score of $(r_i, \bar{r}_j)$, plus $\alpha$ times the total support supplied to the $(r_i, \bar{r}_j)$ mapping by all mappings $(r_u, \bar{r}_v)$ such that both $(r_i, r_u)$ and $(\bar{r}_j, \bar{r}_v)$ are first-degree neighbors. Intuitively, the first iteration combines the pairwise similarities of entities with their topological similarity by considering their first-degree neighbors. Generalizing this to the $k$th iteration, the weight of the pairwise similarity score remains $(1-\alpha)$, whereas the weight of the total support given by the $(k-t)$th-degree neighbors of $(r_i, \bar{r}_j)$ is $\alpha^{k-t}(1-\alpha)$. In this way, when the equation system converges, the neighborhood topologies of the mappings are thoroughly utilized without ignoring the effect of the initial pairwise similarity scores. As a result, stable state vectors calculated in this manner are a good mixture of homology and topology, and using these vectors to extract the entity mappings gives an accurate alignment of the query networks.

2.3.4 Extracting the Mapping of Entities

Having combined the homological and topological similarities of the query metabolic networks, it only remains to extract the mapping of entities. However, since the algorithm restricts its consideration to consistent mappings, this extraction is still challenging. Figure 2-2 points out the importance of maintaining the consistency of an alignment. Aligning the compounds C2' and C5 in the figure creates an inconsistency, since this mapping is not supported by the backbone of the alignment created by the reaction mappings.

An alignment described by the mapping $\varphi$ gives the individual mappings of entities. Let us denote $\varphi$ as $\varphi = [\varphi_R, \varphi_C, \varphi_E]$, where $\varphi_R$, $\varphi_C$ and $\varphi_E$ are the mappings for reactions, compounds and enzymes, respectively.
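Returning briefly to the power method of Section 2.3.3, the following Python sketch illustrates the iteration on a small dense matrix. It is a minimal pure-Python illustration written for this discussion, not the C implementation used in this work; the function names `power_method` and `is_column_stochastic`, the tolerance `tol` and the stopping rule (the L1 change between successive iterates) are our own assumptions. It iterates $H_{k+1} = \alpha S H_k + (1-\alpha) H_0$, renormalizes to unit L1 norm, and reports how many iterations were needed.

```python
def is_column_stochastic(S, eps=1e-12):
    """True if S (a list of rows) is square, nonnegative and every column sums to 1."""
    n = len(S)
    if any(len(row) != n for row in S):
        return False
    nonneg = all(S[i][j] >= 0 for i in range(n) for j in range(n))
    sums_ok = all(abs(sum(S[i][j] for i in range(n)) - 1.0) < eps for j in range(n))
    return nonneg and sums_ok


def power_method(S, h0, alpha=0.7, tol=1e-10, max_iter=1000):
    """Iterate H_{k+1} = alpha * S H_k + (1 - alpha) * H_0 to a stable state.

    S must be column stochastic and h0 nonnegative with L1 norm 1.
    Returns (stable_vector, iterations_used).
    """
    n = len(h0)
    h = list(h0)
    for it in range(1, max_iter + 1):
        # Dense matrix-vector product S @ h (the costly step for large S).
        sh = [sum(S[i][j] * h[j] for j in range(n)) for i in range(n)]
        new = [alpha * sh[i] + (1.0 - alpha) * h0[i] for i in range(n)]
        # Renormalize to unit L1 norm; entries stay nonnegative.
        total = sum(new)
        new = [x / total for x in new]
        # Stop once successive iterates differ by less than tol in L1 norm.
        diff = sum(abs(a - b) for a, b in zip(new, h))
        h = new
        if diff < tol:
            return h, it
    return h, max_iter
```

For example, with the symmetric column-stochastic matrix $S = [[0, 0.5, 0.5], [0.5, 0, 0.5], [0.5, 0.5, 0]]$ and $H_0 = (1, 0, 0)$, the fixed point is $\left(\frac{2-\alpha}{2+\alpha}, \frac{\alpha}{2+\alpha}, \frac{\alpha}{2+\alpha}\right)$, and a run with $\alpha = 0.3$ needs fewer iterations than one with $\alpha = 0.7$, in line with the $O(\alpha)$ convergence rate discussed above.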
There are three conditions that $\varphi$ should satisfy to be consistent. The first one is trivially satisfied for any $\varphi$ of the form $[\varphi_R, \varphi_C, \varphi_E]$, since the algorithm has already distinguished each entity type. For the second condition, it is sufficient to create one-to-one mappings for each entity type. Maximum weight bipartite matching creates one-to-one mappings $\varphi_R$, $\varphi_C$ and $\varphi_E$, which in turn implies that $\varphi$ is one-to-one, since the intersections of compatibility classes are empty.
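To make the one-to-one extraction step concrete, the sketch below computes a maximum weight bipartite matching between the entities of two networks from a matrix of stable-state scores. It is an illustrative assumption, not the implementation used in this work: for readability it enumerates all injective assignments with `itertools.permutations`, which is feasible only for a handful of entities; a practical version would use the Hungarian algorithm or an equivalent assignment solver. The function name `max_weight_matching` and the row/column encoding of entities are hypothetical, and weights are assumed to be nonnegative, so a maximum matching can always cover the smaller side.

```python
from itertools import permutations


def max_weight_matching(W):
    """Maximum weight one-to-one matching between rows and columns of W.

    W is a list of rows of nonnegative scores (e.g. stable-state similarities
    of entity pairs). Returns (total_weight, sorted list of (row, col) pairs).
    Brute force over all injective assignments: tiny inputs only.
    """
    transposed = len(W) > len(W[0])
    if transposed:
        # Enumerate assignments of the smaller side into the larger side.
        W = [list(col) for col in zip(*W)]
    n_rows, n_cols = len(W), len(W[0])
    best_total, best_pairs = float("-inf"), []
    for cols in permutations(range(n_cols), n_rows):
        total = sum(W[i][c] for i, c in enumerate(cols))
        if total > best_total:
            best_total, best_pairs = total, list(enumerate(cols))
    if transposed:
        # Undo the transpose so pairs are (original row, original column).
        best_pairs = [(c, r) for r, c in best_pairs]
    return best_total, sorted(best_pairs)
```

The matrix `[[0.9, 0.8], [0.85, 0.1]]` shows why a global matching is used rather than greedily taking the best pair first: a greedy choice takes (0, 0) and then (1, 1) for a total of 1.0, whereas the maximum weight matching pairs (0, 1) and (1, 0) for a total of 1.65.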


The difficult part of finding a consistent mapping is combining the mappings of reactions, enzymes and compounds without violating the third condition. For that purpose, the method chooses a specific order for extracting the entity mappings. Among the different ordering options, creating the mapping $\varphi_R$ comes first; a discussion of the reasons for this Reactions First ordering can be found in Ay et al. [38]. The $\varphi_R$ mapping is extracted by using maximum weight bipartite matching on the bipartite graph constructed from the edge weights in the $H^R_s$ vector. Then, using the aligned reactions and the reachability sets, the algorithm prunes the edges from the bipartite graph of compounds (enzymes) for which the corresponding compound (enzyme) pairs are inconsistent with the reaction mapping. In other words, the algorithm prunes the edge between two compounds (enzymes) $x$ and $\bar{x}$ if there exists no other compound (enzyme) pair $y$, $\bar{y}$ such that either $x$ is reachable from $y$ and $\bar{x}$ is reachable from $\bar{y}$, or $y$ is reachable from $x$ and $\bar{y}$ is reachable from $\bar{x}$. Pruning these edges guarantees that for any $\varphi_C$ and $\varphi_E$ extracted from the pruned bipartite graphs, $\varphi = [\varphi_R, \varphi_C, \varphi_E]$ is consistent.

Recall that the aim of the method is to find a consistent alignment which maximizes the similarity score SimP. The $\varphi$ defined above satisfies the consistency criteria. The next section describes SimP and then shows that the algorithm finds the mapping that maximizes this score.

2.3.5 Similarity Score of Networks

As presented in the previous section, the algorithm is guaranteed to find a consistent alignment represented by the mappings of entities. One can discuss the accuracy and biological significance of the alignment by looking at the individual mappings reported. However, this requires a solid background in the specific metabolism of different organisms. To computationally evaluate the degree of similarity between networks, it is necessary to devise an accurate similarity score. Using the pairwise similarities of aligned entities, an overall similarity score between two query networks, SimP, is defined as follows:


Definition 7. Let $P = ([\mathcal{R}, \mathcal{C}, \mathcal{E}], E)$ and $\bar{P} = ([\bar{\mathcal{R}}, \bar{\mathcal{C}}, \bar{\mathcal{E}}], \bar{E})$ be two metabolic networks. Given a mapping $\varphi = [\varphi_R, \varphi_C, \varphi_E]$ between the entities of $P$ and $\bar{P}$, the similarity of $P$ and $\bar{P}$ is calculated as
$$\mathrm{SimP}(P, \bar{P}) = \frac{\beta}{|\varphi_C|} \sum_{(c_i, \bar{c}_j) \in \varphi_C} \mathrm{SimC}(c_i, \bar{c}_j) + \frac{1-\beta}{|\varphi_E|} \sum_{(e_i, \bar{e}_j) \in \varphi_E} \mathrm{SimE}(e_i, \bar{e}_j)$$
where $|\varphi_C|$ and $|\varphi_E|$ denote the cardinalities of the corresponding mappings and $\beta \in [0, 1]$ is a parameter that adjusts the relative influence of compounds and enzymes on the alignment score.

Calculated as above, SimP gives a score between 0 and 1, and a larger score implies a better alignment between the networks. Using $\beta = 0.5$ prevents a bias in the score towards enzymes or compounds. One can set $\beta = 0$ to obtain an enzyme-based similarity score or $\beta = 1$ to obtain a compound-based similarity score. Reactions are not considered while calculating this score, since reaction similarity scores are already determined by enzyme and compound similarity scores.

Having defined the network similarity score, it remains to show that the consistent mapping $\varphi = [\varphi_R, \varphi_C, \varphi_E]$ found in the previous section is the one that maximizes this score. This follows from the fact that the algorithm uses maximum weight bipartite matching on the pruned bipartite graphs of enzymes and compounds. In other words, since the maximality of the total edge weights of $\varphi_C$ and $\varphi_E$ is already assured by the extraction technique, their weighted sum is guaranteed to give the maximum SimP value for a fixed $\beta$.

2.3.6 Complexity Analysis

Given $P = ([\mathcal{R}, \mathcal{C}, \mathcal{E}], E)$ and $\bar{P} = ([\bar{\mathcal{R}}, \bar{\mathcal{C}}, \bar{\mathcal{E}}], \bar{E})$, three steps contribute to the time complexity of the method. First, the algorithm calculates pairwise similarity scores for entities in $O(|\mathcal{R}||\bar{\mathcal{R}}| + |\mathcal{C}||\bar{\mathcal{C}}| + |\mathcal{E}||\bar{\mathcal{E}}|)$ time, since calculating the similarity
score of a pair takes constant time. Second, it creates three support matrices and uses the power method to find the stable state vectors. Both the creation phase and a single iteration of the power method take $O(|\mathcal{R}|^2|\bar{\mathcal{R}}|^2 + |\mathcal{C}|^2|\bar{\mathcal{C}}|^2 + |\mathcal{E}|^2|\bar{\mathcal{E}}|^2)$ time. In all experiments, the power method converges in a small number of iterations (fewer than 15) for each entity type. Hence, the total cost of finding the stable state vectors is also $O(|\mathcal{R}|^2|\bar{\mathcal{R}}|^2 + |\mathcal{C}|^2|\bar{\mathcal{C}}|^2 + |\mathcal{E}|^2|\bar{\mathcal{E}}|^2)$. The last step is extracting the mappings from the stable state vectors by using maximum weight bipartite matching. This step takes $O(\min(|\mathcal{R}|, |\bar{\mathcal{R}}|)|\mathcal{R}||\bar{\mathcal{R}}| + \min(|\mathcal{C}|, |\bar{\mathcal{C}}|)|\mathcal{C}||\bar{\mathcal{C}}| + \min(|\mathcal{E}|, |\bar{\mathcal{E}}|)|\mathcal{E}||\bar{\mathcal{E}}|)$ time in total. Therefore, the complexity of the algorithm is dominated by the power method, which results in an overall time complexity of $O(|\mathcal{R}|^2|\bar{\mathcal{R}}|^2 + |\mathcal{C}|^2|\bar{\mathcal{C}}|^2 + |\mathcal{E}|^2|\bar{\mathcal{E}}|^2)$.

2.4 Results and Discussion

Dataset: In our experiments, we used the metabolic networks in the KEGG Pathway database [7], which currently has 91,732 networks generated from 372 reference networks. This database contains 7,645 reactions, 5,022 enzymes and 15,217 compounds. Of these reactions, 462 (about 6%) are catalyzed by multiple enzymes and 6,677 (about 87%) have multiple input or output compounds. We converted the metabolic networks in KEGG to our graph model.

Implementation: We implemented our algorithm in the C programming language. It compiles and runs on any Unix-based operating system. We ran all the tests reported here on a desktop computer running Ubuntu 8.04 with one Intel Pentium 4 3.20 GHz processor and 2 GB of RAM. We obtained the source code for phylogenetic tree creation using metabolic networks from Heymans et al. [16]. We also downloaded the executable of the metabolic network alignment tool designed by Pinter et al. [18].

Parameters: Our algorithm has a number of adjustable parameters. We discuss the parameter $\alpha$ in detail in Section 2.4.1. The parameter $\beta$ adjusts the weights of compound and enzyme similarities in the alignment score; as in Definition 7, these weights are $\beta$ and $1-\beta$, respectively. The value of $\beta$ has no effect on the alignment computed by our algorithm, only on the alignment score. We use the value $\beta = 0.5$.
This value gives equal weights to enzyme and compound similarities. The parameters $\gamma_{C_{in}}$, $\gamma_{C_{out}}$ and $\gamma_E$ are the relative weights of the corresponding components in the reaction similarity calculation. We set $\gamma_{C_{in}} = 0.3$, $\gamma_{C_{out}} = 0.3$ and $\gamma_E = 0.4$ to balance the effect of compounds and enzymes on reaction similarity. The flexibility of adjusting these parameters makes our method suitable for different scenarios. For instance, if we are interested only in enzyme similarities, it is enough to set $\gamma_E = 1.0$ and $\beta = 0$. Another source of flexibility is the different options for enzyme and compound similarities discussed in Section 2.3.1. In our experiments, we used information content similarity for enzymes and the identity score for compounds.

Comparison Criteria: To evaluate the performance of our algorithm, we compared our results to available biological results and to the results of previous computational methods. We measured the biological significance of our alignments by comparing our results to related biological studies [28, 41, 83]. For quantitative evaluation of our method, we used alignment score, recall rate, running time and Z-score. We compared our implementation to recent methods for metabolic network analysis [16, 18, 40].

2.4.1 Effects of Homology and Topology Information

An important feature of our algorithm is that it combines the homology and topology similarities of networks. We employ the parameter $\alpha$ to balance the relative weights of these two factors. The choice of $\alpha$ is crucial, since it has a significant effect on both the alignment score and the running time of our algorithm. Choosing a very small $\alpha$ underestimates the importance of topology, whereas choosing $\alpha = 1$ ignores the node labels, and hence the biological meaning of the networks.

We analyze the influence of $\alpha$ on the recall rate and the running time. First, we find 10 different organisms with very similar Pyruvate metabolisms. Then, we introduce 63 different error percentages (excluding 0%) into copies of the 10 original networks. Finally, we put the 10 original and 630 error-introduced networks together to create our database.


Figure 2-6. Effect of the $\alpha$ parameter (A) on recall rate and (B) on running time. The results are obtained by aligning the Pyruvate metabolism of Homo sapiens against a database of 640 networks for different $\alpha$ values.


The 630 networks with a nonzero amount of error are considered false positives, and the 10 original networks are considered true positives.

We then query this dataset with one of the 10 original networks. We measure the recall rate as the percentage of true positives observed in the top 5% (that is, the top 32) of the 640 alignments. Figure 2-6A illustrates that the highest recall rate is achieved for $\alpha \in [0.6, 0.7]$. When $\alpha$ is around 0, the recall percentage is low, since the alignment score becomes highly affected by the errors in node labels (homology). On the other hand, when $\alpha$ approaches 1, the recall rate drops sharply, since label matchings become random in this case.

Figure 2-6B presents the running time analysis of the same experiment. As described in Section 2.3.3, increasing $\alpha$ slows the convergence of our algorithm and increases the running time. The increase in running time is not the main concern in the choice of $\alpha$, since even for $\alpha = 1$ the average running time for a pairwise alignment is only 75 milliseconds. Therefore, we choose $\alpha = 0.7$, which produced the highest recall rate, for the rest of our experiments unless otherwise stated.

2.4.2 Identification of Alternative Enzymes and Paths

An accurate alignment should reveal functionally similar entities or paths between different networks. More specifically, it is desirable to match entities that can substitute for each other or paths that serve similar functions. Identifying these functionally similar parts of networks is useful for various applications, such as metabolic reconstruction of newly sequenced organisms [5], identification of network holes [84] and identification of drug targets [11, 85]. In this experiment, we used our method to align network pairs that are known to contain entities or paths that are functionally similar but not identical.

Alternative Enzymes: Two enzymes are called alternative enzymes if they catalyze two reactions with different input compounds that produce a specific target
compound. Similarly, we call these reactions alternative reactions and their inputs alternative compounds.

We tested our tool by searching for the well-known alternative enzymes presented in Kim et al. [83]. Table 2.4.9 illustrates four cases in which our algorithm successfully found the alternative enzymes with the corresponding reaction mappings. Furthermore, the resulting compound matchings are consistent with the alternative compounds proposed in Kim et al.

For instance, as seen in Table 2.4.9, there are two different reactions that generate Asparagine (Asn) from Aspartate (Asp). One is catalyzed by aspartate-ammonia ligase (EC:6.3.1.1) and uses ammonia (NH3) directly, whereas the other is catalyzed by a glutamine-dependent amidotransferase (EC:6.3.5.4) that transfers the amide nitrogen from Glutamine (Gln). We compared the Alanine and Aspartate pathways of two organisms that use the two different routes. Our algorithm aligned the alternative reactions, enzymes and compounds correctly. Our alignment results for the other three examples in Table 2.4.9 are also consistent with the results in [83].

Alternative Paths: As metabolic networks are analyzed experimentally, it has been discovered that different organisms may produce the same compounds through entirely different paths. Experimental identification provides well-documented examples of such alternative paths. We used our algorithm to mine these known alternative paths in metabolic networks.

It has been shown that two alternative paths for Isopentenyl-PP production exist in different organisms [28]. Figure 2-7A illustrates these paths and the entity mappings found by our algorithm. Although the EC numbers of the aligned enzymes are entirely different, which means that their initial pairwise similarity scores are 0, our algorithm still aligned these functionally similar paths, since it also accounts for the topological similarities of the networks.

Since our method finds one-to-one mappings, only four of the seven enzymes in the Non-mevalonate path are mapped to four enzymes of the Mevalonate path. Future work could relax the restriction that entity mappings be one-to-one, so that alternative paths with different numbers of entities could be aligned as wholes rather than through individual entity mappings.

Figure 2-7. Identification of alternative paths. Dashed lines show the resulting matchings of entities. (A) A portion of the metabolic network of steroid biosynthesis. H. sapiens produces Isopentenyl-PP via the lower path, which is called the Mevalonate Path. E. coli instead uses an entirely different path, called the Non-mevalonate Path (shown in bold), for producing Isopentenyl-PP. Our algorithm identifies these alternative paths when the Steroid biosynthesis pathways of H. sapiens and E. coli are aligned. (B) A portion of the Lysine biosynthesis pathway. Starting from Tetrahydrodipicolinate, there are two different paths to produce 6-Diaminopimelate. E. coli uses the path in bold with three enzymes, whereas A. thaliana carries out this production directly with a single enzyme (EC:2.6.1.83). Our method finds the corresponding alternative paths.

2.4.3 Phylogenetic Reconstruction

Traditionally, phylogenetic reconstruction is done by analyzing genomic data such as the DNA sequences of organisms. The increase in available biological interaction data has motivated the use of metabolic similarities for inferring phylogeny. In previous studies,
Heymans et al. [16] and Clemente et al. [40] used the alignment scores of the metabolisms of different organisms to create phylogenetic trees.

Here, we use our algorithm to create phylogenetic trees from the similarities of 28 common metabolic networks for 73 organisms. Table 2-3 lists the complete names and NCBI classifications of these 73 organisms.

To create the phylogenetic trees, we used two different implementations of the Neighbor Joining [86] method, namely Match7 (http://www.itu.dk/people/sestoft/bsa.html) and the Phylip package [43]. The Neighbor Joining implementation in Phylip requires the user to select an organism as the outgroup of the tree, whereas Match7 does not. Using these two methods, we created four different phylogenetic trees: two from the distance matrix reported by our algorithm and two from the one we obtained using the algorithm of Heymans et al. Since we could not obtain the source code of Clemente et al., we used the tree given in [87], which was created from the Glycolysis pathway only.

We first create the complete tree for the 73 organisms in Figure 2-8D using our method. The tree in Figure 2-8A displays the NCBI taxonomy of these organisms. Comparing the NCBI tree with our prediction, we observe a high similarity between the groupings of organisms. For instance, the 8 organisms classified as Eukaryota are grouped together in our tree. Even if we take the classification one level further, our tree still captures with 100% accuracy the groups (hsa, dme, rno, cel, mmu), (sce, spo) and (ath), which are classified as Bilateria, Ascomycota and Viridiplantae, respectively. The tree in Figure 2-8C, created by Heymans et al., also captures the grouping of Eukaryota. However, at the second level of classification, it groups ath, which belongs to Viridiplantae, with (hsa, dme, rno, cel, mmu), which belong to Bilateria. On the other hand, the phylogeny prediction of Clemente et al., shown in Figure 2-8B, fails even at the first level of classification: it groups only 5 of the 8 Eukaryota organisms together. Another example is the 10 Bacteria classified as Firmicutes in NCBI. Our tree captures the
grouping of these 10 organisms in a group of 13 organisms; the other 3 organisms are clustered as singleton groups according to the first two levels of classification. The tree of Clemente et al. captures these 10 organisms together only in a cluster of size 21. In the tree of Heymans et al., 8 of the 10 Firmicutes are clustered in a group of 10 and the other 2 are placed far away from them.

We compared the trees in Figure 2-8 quantitatively using two different tree distance measures, namely the Symmetric Difference of Robinson and Foulds [88], provided by the Phylip package [43], and the Updown Distance [42]. The Symmetric Difference uses the tree topologies and counts the number of partitions that are in one tree but not in the other. The Updown Distance is computed from the differences between the up and down hop counts of all possible organism pairs in the two trees.

We took the NCBI tree as the gold standard and calculated the distances of the other three trees to it, as shown in Table 2-4. For all combinations of tree creation methods and tree distance measures, the tree produced by our method is closer to the NCBI taxonomy than the ones produced by Heymans et al. and by Clemente et al. Another advantage of our method compared to the method of Heymans et al. is that our algorithm runs significantly faster. The 73,584 pairwise alignments ($\frac{73 \times 72}{2} \times 28$) needed to create the distance matrices from 28 common networks took 2.25 days (54 hours) with the method of Heymans et al., whereas our algorithm performed them in only 4.3 hours. We analyze the running time of our algorithm in detail in Section 2.4.7.

After creating the complete tree, we analyzed the accuracy of our phylogeny prediction within the major clades (Archaea, Eukaryota, Bacteria). Figure 2-9 shows the resulting trees for four different sub-clades. To assess how much resolution our algorithm provides for tree prediction, we compare these trees to the corresponding branches of the NCBI tree in Figure 2-8A. For instance, our prediction for Euryarchaeota correctly clusters the organism pairs (tac, tvo), (mma, mac) and (pfu, pab). However, the linking of these clusters to each other differs from the NCBI tree. Also, the tree for Firmicutes in Figure 2-9B shows that our method accurately clusters all the organism pairs. The only difference from the NCBI taxonomy is the placement of the (spn, lla) pair: our method groups this pair together with six other organisms, whereas in the NCBI classification the (spn, lla) pair is at the same level as these six organisms. Similar observations hold for the clusterings of the other trees.

Figure 2-8. Phylogenetic trees for 73 organisms. (A) NCBI phylogeny (our gold standard). (B) Phylogenetic tree created by Clemente et al. using average-link hierarchical clustering on the distance matrix produced from the Glycolysis pathway. (C) Phylogenetic tree created by Heymans et al. using Neighbor Joining (Match7) on the distance matrix produced from 28 common networks. (D) Phylogenetic tree created by our method using Neighbor Joining (Match7) on the distance matrix produced from 28 common networks.

Figure 2-9. Our tree predictions within major clades. (A) Euryarchaeota, (B) Firmicutes, (C) and (D) two classes of Proteobacteria.

2.4.4 Top-k Queries in Network Databases

A fundamental query type in databases is the top-k query, also known as the k-nearest neighbor query. In our case, this query returns the k networks in a given database that are closest to the query network, where closeness is measured by the similarity score we devised for metabolic networks. In this experiment, we evaluate the accuracy of our algorithm for top-k queries as described below.

To evaluate top-k queries, we conducted recall experiments on each major clade of the phylogenetic tree. For this purpose, we picked one organism from each major clade, namely mma from Archaea, hsa from Eukaryota and eco from Bacteria. These organisms have 28 common metabolic networks. We created a database query for each of these 28 networks of each of the three organisms (84 queries in total). The database consists of all 6,119 networks of the 73 organisms given in Table 2-3. We extracted the top-k results of each query for appropriate values of k for the different levels of the phylogenetic hierarchy. Then, we calculated the correct classification percentage for each level of each query. We compute this number as the percentage of organism-network pairs in the top-k results whose organism is in the same hierarchical group as the query organism and whose network number equals the network number of the query pair. We considered three levels of hierarchy for Figure 2-10. The first level is the major clades (Archaea, Eukaryota, Bacteria). The second level is the sub-groups of the major clades (Thermoprotei, Ascomycota, Firmicutes, etc.). The third level is the sub-groups of the second-level groups, such as Bacilli and Clostridia for the second-level group Firmicutes. The detailed tree with all levels is available at www.ncbi.nlm.nih.gov/Taxonomy/. We choose the value of k for a given query as the number of networks in the corresponding hierarchical group multiplied by 1.5 (i.e., 50% more than the size of the true result set).

Figure 2-10. Average correct classification percentages of our algorithm on three different organisms (mma, hsa, eco), in descending order of accuracy. The three lines represent the three levels of phylogeny, with Level 3 being the deepest (most specific) level. The x-axis shows the different query networks.

The whole experiment took 6.79 hours for a total of 513,996 pairwise alignments. This gives an average of 4.84 minutes per query and 0.047 seconds per pairwise alignment.

Figure 2-10 shows the average correct classification percentages of the three organisms for different depths of phylogeny. We average over the three organisms, one from each major clade, to remove the bias introduced by the size of a specific clade. As seen in the figure, the correct classification rates vary across levels: 68% for Level 1, 58% for Level 2 and 73% for Level 3. Together with our tree prediction for 73 organisms, this variation across levels points out an important property of our algorithm: our method performs well in distinguishing between the major clades and even better in clustering the deepest levels of the hierarchy. The former is due to the high dissimilarity between organisms in different clades, and the latter is due to the high similarity between closely related organisms. Between these two levels, our hierarchy predictions may deviate from the NCBI taxonomy. One reason for this deviation is that the tree creation method we use, like any other known method, is not exact.

2.4.5 Effect of Consistency

There are three types of entities in metabolic networks, namely reactions, enzymes and compounds. Our algorithm finds mappings for each entity type, and we call these three mappings together our overall alignment. Ensuring consistency is necessary for filtering out nonsensical mappings that degrade the accuracy of the alignment. When extracting mappings for different entity types, we want to make sure that none of the mappings for an entity type is inconsistent with any other mapping of the same or a different type. Among mappings of the same entity type, we achieve this by restricting alignments to one-to-one mappings. Among different types of entities, however, we need to specify an ordering between entity types to ensure consistency. As discussed in Section 2.3.4, we first extract the mapping of reactions and then use it as the backbone of our alignment. Using this backbone, we construct reachability sets and apply the described pruning technique to the weighted bipartite graphs of enzymes and compounds.

Here, we analyze the effect of the consistency restriction and of the different choices of extraction ordering. We do this by computing an upper bound that is independent of the extraction ordering; this upper bound is computed by ignoring the pruning phase described in Section 2.3.4. We then use three possible extraction orderings with the necessary pruning, namely Reactions First, Enzymes First and Compounds First.

Figure 2-11. Effects of the consistency restriction and the entity ordering on alignment score. Each dot represents an actual alignment score (x-axis) and the upper bound for this alignment score (y-axis). The 14,894 alignment scores were gathered by (A) mapping the reactions first and (B) taking the maximum of the scores of three possible orderings (Reactions First, Enzymes First, Compounds First).

Figure 2-11 demonstrates the effect of the consistency restriction on alignment score. When reaction mappings are extracted first, as in Figure 2-11A, the scores of 93% of the alignments are not less than 90% of the upper bound, and the scores of 98.4% are not less than 80% of the upper bound.

Figure 2-11B is plotted by taking the maximum score of the three extraction orderings as the alignment score. As seen, there is no significant improvement in alignment score compared to the Reactions First approach. In Figure 2-11B, 95% of the alignment scores are not less than 90% of the upper bound, and 99.5% of them are not less than 80% of the upper bound. Among the 14,894 alignments, the maximum score is achieved by the Reactions First ordering 13,967 times (93.8%), by the Enzymes First ordering 512 times (3.4%) and by the Compounds First ordering 415 times (2.8%).

The above discussion of Figure 2-11 points to two important conclusions. First, the loss of similarity score due to the consistency restriction is not significant. Second, the Reactions First extraction ordering is considerably better than the other orderings; even if the best of all three orderings is used, the improvement is not significant.


Figure 2-12. Effects of homology and topology errors on alignment score.

2.4.6 Error Tolerance

Interaction data gathered from biological experiments is prone to different types of errors, such as the misclassification of an entity or the misidentification of an interaction. On top of that, currently available network data is not complete: network databases contain many network holes and even lack some networks for some organisms [89]. Therefore, it is important for an alignment strategy to be able to tolerate some amount of error or missing data in networks.

We measured the error tolerance of our algorithm by the influence of two error types, namely homology and topology errors, on the alignment score. We systematically introduced these errors into the original networks with different error percentages. These two types of errors are important for simulating the different kinds of errors that can occur in metabolic network construction.


We create homology error by exchanging the node labels of entities with random node labels of the same type, with a probability equal to the given error percentage. This error simulates the experimental misclassification of entities when networks are constructed. For example, if the homology error percentage is 30%, the expected number of entities with a changed label is 30% of the total number of entities.

The other type of error is introduced in the topology of the interaction graph. We delete or insert an interaction with a probability equal to the given error percentage. Deletion stands for experimentally missed (false negative) interactions, whereas insertion stands for false positive ones. To simulate network holes, we delete an entity together with all its interactions.

In this experiment, we used the Glycolysis pathways of 14 different organisms. We introduced 12 different error percentages, ranging from 0% to 90%, for both topology and homology. Using all 144 possible combinations of errors for each of the 14 organisms, we created a database of 2,016 networks, 14 of which are the original Glycolysis pathways. We then queried this database with the Glycolysis pathway of Anabaena sp. For each error combination, we calculated the average alignment score of the query network to the erroneous networks.

Figure 2-12 illustrates the change in the alignment scores for different error values. As seen, homology error degrades the score much faster than topology error. This is consistent with the real scenario, since the misclassification of an entity influences the network more than a misidentified or misplaced interaction. Returning to the experimental results, the average score for the 14 original networks is 0.8624. For a combined error of 15% topology and 8% homology, the average score drops to 0.7754, which is approximately 90% of the average score for the original networks. However, 15% homology error alone reduces the average to 0.7230. Therefore, we can argue that our algorithm is tolerant to topology errors of around 15% and homology errors of around 10%.
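The error model described above can be sketched as follows. This is an illustrative Python version under our own assumptions, not the code used for these experiments: a network is represented as a dictionary mapping node identifiers to (entity type, label) pairs plus a set of directed edges; an interaction selected for perturbation is either deleted or accompanied by a new random interaction, with equal probability; and inserted edges connect arbitrary distinct nodes, whereas a faithful version would connect only compatible entity types (for example, a reaction and a compound). The function names `add_homology_error`, `add_topology_error` and `remove_entity` are hypothetical.

```python
import random


def add_homology_error(labels, label_pool, p, rng):
    """Relabel each entity with probability p.

    labels: {node: (entity_type, label)}; label_pool: {entity_type: [labels]}.
    A relabeled entity draws a random label of its own type.
    """
    noisy = {}
    for node, (etype, label) in labels.items():
        if rng.random() < p:
            label = rng.choice(label_pool[etype])
        noisy[node] = (etype, label)
    return noisy


def add_topology_error(nodes, edges, p, rng):
    """Perturb each interaction with probability p.

    A perturbation deletes the interaction (false negative) or, with equal
    chance, inserts a random new interaction between two distinct nodes
    (false positive). Entity types are ignored here for simplicity.
    """
    node_list = sorted(nodes)
    noisy = set(edges)
    for edge in sorted(edges):
        if rng.random() >= p:
            continue
        if rng.random() < 0.5:
            noisy.discard(edge)
        else:
            u, v = rng.sample(node_list, 2)
            noisy.add((u, v))  # may coincide with an existing edge
    return noisy


def remove_entity(labels, edges, node):
    """Simulate a network hole: drop a node together with all its interactions."""
    kept_labels = {n: lv for n, lv in labels.items() if n != node}
    kept_edges = {(u, v) for (u, v) in edges if node not in (u, v)}
    return kept_labels, kept_edges
```

Passing an explicit `random.Random(seed)` instance as `rng` keeps each erroneous copy reproducible, which is convenient when building a database from every combination of homology and topology error percentages.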


2.4.7 Running Time

So far we have discussed the accuracy of our alignment algorithm. To be practical, an algorithm needs to be efficient as well. We calculated the convergence rate of our method in Section 2.3.3 and showed in Section 2.4.3 that it runs about 13 times faster than the algorithm of Heymans et al., even though their abstraction in modeling significantly decreases the network size.

In this section, we examine the running time of our tool for networks of different sizes. We also compare the performance of our tool to a recent metabolic network alignment tool designed by Pinter et al. [18].

Similar to the abstraction of Heymans et al., Pinter et al. discard all the compounds and reactions and use only enzymes to model the networks as graphs. Since we avoid any kind of abstraction, the graph for the same network is considerably larger in our model. For example, the Folate biosynthesis pathway of E. coli has 12 enzymes. Their simplified model represents this network as a graph with 12 nodes and 11 edges, whereas in our graph model the same network is represented by 55 nodes (22 reactions, 12 enzymes, 21 compounds) and 84 edges. When network size is measured by the number of enzymes, however, these two graphs are considered to be of the same size.

Furthermore, the algorithm of Pinter et al. runs in polynomial time only for multi-source trees; for networks with cycles, their method takes exponential time. For this reason, they partition networks into non-cyclic connected sub-pathways. A sub-pathway of this type with $n$ enzymes is mapped to a graph with $n$ nodes and $n-1$ edges.

In this experiment, we use the database of 100 E. coli and 88 S. cerevisiae networks. Since our method works for graphs with any topology, we use all 188 networks as they are. For the method of Pinter et al., however, the database is broken into 268 non-cyclic networks for E. coli and 155 for S. cerevisiae. Each network in the database is queried against the whole database. Figure 2-13A shows that, even though our algorithm has significantly larger inputs for the same number of enzymes, it still runs considerably faster for all network sizes. The total running time of this experiment is 3.66 hours for the method of Pinter et al. and 24.3 minutes for our method. It is also worth noting that the gap between the running times increases as the network size grows.

Using the same database, we measured the effect of the total number of entities on running time, as shown in Figure 2-13B. Since our model converts networks into graphs with reactions, enzymes and compounds as nodes, the total number of entities is the sum of the numbers of these three entity types. As seen in Figure 2-13B, for networks with fewer than 100 entities in total, the alignment time against the database of 188 networks is less than 20 seconds, which gives an average of around 0.1 seconds per pairwise alignment. Moreover, even for the largest network available in KEGG (Purine metabolism of E. coli, with 92 reactions, 56 enzymes and 69 compounds), the total query time of our algorithm is around 2 minutes.

Figure 2-13. Running time comparison and analysis. (A) Total time for each query network against the whole database, including I/O operations and unexpectedness (Z-score, p-score) calculations. (B) Effect of the total number of entities of the query network on running time.

2.4.8 Statistical Significance

To measure the unexpectedness of the alignments, we devised a Z-score. We calculate the Z-score using the distribution of alignment scores of random networks generated from the query networks. We generate the random networks by shuffling the labels of the entities of the query networks. Label shuffling corresponds to randomly permuting the rows of the support matrices of each entity type without changing the topology of the network. Since the alignment algorithm is fixed, this Z-score measures how unexpected the similarity of the query networks is compared to random networks of the same topology.

In Figure 2-14, we plotted the distribution of alignment scores of randomly generated networks and verified that it is similar to a normal distribution. This shows that the proposed Z-score, calculated as the number of standard deviations between the actual score and the average of random alignment scores, is statistically meaningful.

Figure 2-14. The distribution of the observed alignment scores of 1,000 randomly generated networks from the Valine, Leucine and Isoleucine degradation pathways of H. sapiens and D. melanogaster, and the normal distribution that has the same mean and variance as the observed distribution.

We used the dataset described in Section 2.4.7 to observe the distribution of Z-scores on a real dataset and to analyze the biological meaning of the statistically significant alignments. The distribution in Figure 2-15 indicates that alignments with scores significantly greater than those of random alignments constitute a small percentage. This fits the real-world scenario, since each network represents a functional module that is different from the others. However, if we look at the alignments with a significant Z-score (> 2.0), we observe that they are either the same functional module in different organisms or different functional modules with similar topologies and common entities.

Some of the biologically meaningful alignments with a Z-score greater than 2.0 are listed below:

PAGE 68

Figure2-15.TheZ-scoredistributionof17,578pairwisealignmentsgatheredfrom188 networks E.Coli and88 S.Cerevisiae usedinSection2.4.7. Arginineandprolinemetabolismof sce withUreacycleandmetabolismofamino groups sce ,Z-score=3.57. Glycolysis/Gluconeogenesismetabolismof eco withPyruvatemetabolismof sce Z-score=2.60. ReductivecarboxylatecycleCO2xationof eco withCitratecycleTCAcycleof eco ,Z-score=2.51. Alanineandaspartatemetabolismof eco withbeta-Alaninemetabolismof eco Z-score=2.35. 2.4.9Discussion Inthischapter,weconsideredthepairwisealignmentproblemformetabolic networks.Wedevelopedanalgorithmwhichdoesnotrestrictthenetworktopologies. Weusedacomprehensiveandnon-redundantgraphmodelforrepresentingnetworks. Usingthismodel,webuiltatoolthatalignsreactions,compoundsandenzymesof metabolicnetworks.Inouralgorithm,weconsideredboththepairwisesimilarities ofentitieshomologyandtheorganizationofnetworkstopology.Weadjustedthe 68

PAGE 69

relativeweightsofhomologyandtopologybyaparameterandexaminedtheresults.We usedthemaximumweightbipartitematchingtoextractalignedentities.Wediscussed theinuenceofdifferentorderingsofextractingentitymappingsonthealignment accuracy.Wedenedreachabilitysetsofnotyetalignedentitiesusingthealigned entities.Weenforcedtheconsistencyofthealignmentbyusingthereachabilitysetsof entitiesandemployingapruningtechniquethatltersoutnonsensicalmappings.We devisedasimilarityscoreandaZ-scoretoassessthesignicanceofouralignments. Weevaluatedouralgorithm'sperformancebothqualitativelyandquantitativelyby comparingourresultstoexperimentallydeterminedandcomputationallyobtained results.Ourexperimentsshowedthat,ourmethodgivesaccuratealignmentsanditis usefulinseveralbiologicalapplications.Therunningtimeanalysisdisplayedthatour algorithmisrunningfastandisalsoscalablewiththenetworksize. 69
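The Z-score of Section 2.4.8 can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not the tool's implementation: `score_fn` is a hypothetical stand-in for running the full alignment of one relabeled query network against the fixed target, and shuffling a list of labels plays the role of permuting entity labels while the topology (the positions) stays fixed.

```python
import random
import statistics
from typing import Callable, List, Sequence


def z_score(actual: float, random_scores: Sequence[float]) -> float:
    """Number of standard deviations between the actual alignment score
    and the mean score of the randomized alignments."""
    mu = statistics.mean(random_scores)
    sigma = statistics.stdev(random_scores)  # sample standard deviation
    if sigma == 0:
        raise ValueError("random scores have zero variance")
    return (actual - mu) / sigma


def shuffled_scores(labels: Sequence[str],
                    score_fn: Callable[[List[str]], float],
                    trials: int = 1000,
                    seed: int = 0) -> List[float]:
    """Score `trials` random networks obtained by permuting entity labels.
    Only the labels move; the positions (i.e., the topology) are unchanged."""
    rng = random.Random(seed)
    scores = []
    for _ in range(trials):
        permuted = list(labels)
        rng.shuffle(permuted)
        scores.append(score_fn(permuted))
    return scores


if __name__ == "__main__":
    # Hypothetical toy score: number of positions whose label matches the target.
    target = list("ABCDEFGHIJ")
    score = lambda labels: float(sum(a == b for a, b in zip(labels, target)))
    null = shuffled_scores(target, score)
    print(round(z_score(score(target), null), 2))
```

With 1,000 shuffles (as in Figure 2-14) the null distribution is estimated empirically, so no distributional form has to be assumed to compute the score itself; the normality check only justifies reading it as a Z-score.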


Table 2-1. Commonly used symbols in this chapter.

Symbol                                Description
$P$, $\bar{P}$                        Query metabolic networks
$R$, $C$, $E$                         Sets of all reactions, compounds and enzymes in a metabolic network
$r_i$, $\bar{r}_j$                    Reactions of query networks
$I_i$, $O_i$, $E_i$                   Sets of input compounds, output compounds and enzymes of reaction $r_i$
$\varphi$                             Relation which represents the alignment of the entities of query networks
$S_R$, $S_C$, $S_E$                   Support matrices of reactions, compounds and enzymes
$H_R$, $H_C$, $H_E$                   Homological similarity vectors of reactions, compounds and enzymes
$\alpha$                              Parameter adjusting the relative weights of homology and topology
$\gamma_I$, $\gamma_O$, $\gamma_E$    Relative weights of the similarities of input compounds, output compounds and enzymes

Table 2-2. Alternative enzymes that catalyze the formation of a common product using different compounds.

Id(a)  Org(b)  R.Id(c)  Enzyme(d)   Compounds(e, f)
620    sau     01257    1.1.1.96    MAL + FAD → OAA + FADH2
       hsa     00342    1.1.1.37    MAL + NAD → OAA + NADH
620    ath     00345    4.1.1.31    OAA + Pi → PEP + CO2
       sau     00341    4.1.1.49    OAA + ATP → PEP + CO2 + ADP
252    chy     00578    6.3.5.4     Asp + ATP + Gln → Asn + AMP + PPi
       cpv     00483    6.3.1.1     Asp + ATP + NH3 → Asn + AMP + Glu
860    sau     06895    1.3.99.22   CPP + O2 → PPHG + CO2
       hsa     03220    1.3.3.3     CPP + SAM → PPHG + CO2 + Met

(a) Networks: 00620, Pyruvate metabolism; 00252, Alanine and aspartate metabolism; 00860, Porphyrin and chlorophyll metabolism.
(b) Organism pairs that are compared.
(c) KEGG numbers of aligned reaction pairs.
(d) EC numbers of aligned enzyme pairs.
(e) Aligned compound pairs are put in the same column.
(f) Abbreviations of compounds: MAL, malate; FAD, flavin adenine dinucleotide; OAA, oxaloacetate; NAD, nicotinamide adenine dinucleotide; Pi, orthophosphate; PEP, phosphoenolpyruvate; Asp, L-aspartate; Asn, L-asparagine; Gln, L-glutamine; PPi, pyrophosphate; Glu, L-glutamate; AMP, adenosine 5'-monophosphate; CPP, coproporphyrinogen III; PPHG, protoporphyrinogen; SAM, S-adenosylmethionine; Met, L-methionine.


Table 2-3. Full names and two levels of NCBI taxonomy of the 73 organisms used in the phylogenic reconstruction experiments (A: Archaea, E: Eukaryota, B: Bacteria).

Level 1  Level 2             Org.  Name
A        Euryarchaeota       tac   T. acidophilum
                             tvo   T. volcanium
                             hal   Halobacterium
                             mac   M. acetivorans
                             mma   M. mazei
                             mja   M. jannaschii
                             afu   A. fulgidus
                             mth   M. thermo.
                             pab   P. abyssi
                             pfu   P. furiosus
         Thermoprotei        ape   A. pernix
                             pai   P. aerophilum
                             sso   S. solfataricus
                             sto   S. tokodaii
E        Ascomycota          sce   S. cerevisiae
                             spo   S. pombe
         Bilateria           hsa   H. sapiens
                             rno   R. norvegicus
                             dme   D. melanogaster
                             cel   C. elegans
                             mmu   M. musculus
         Viridiplantae       ath   A. thaliana
B        Firmicutes          sau   S. aureus-N315
                             sav   S. aureus-Mu50
                             bha   B. halodurans
                             bsu   B. subtilis
                             lmo   L. monocytogenes
                             lin   L. innocua
                             cac   C. acetobutylicum
                             tte   T. tengcongensis
                             lla   L. lactis
                             spn   S. pneumoniae
         Chlamydiaceae       ctr   C. trachomatis
                             cmu   C. muridarum
                             cpn   C. pneumoniae
                             cpj   C. pneumoniae-J
                             cpa   C. pneumoniae-A
         γ-proteobacteria    xfa   X. fastidiosa
                             xcc   X. campestris
                             pae   P. aeruginosa
                             hin   H. influenzae
                             pmu   P. multocida
                             vch   V. cholerae
                             sty   S. typhi
                             stm   S. typhimurium
                             ece   E. coli-O157
                             ecs   E. coli-O157J
                             eco   E. coli
                             ecj   E. coli-J
                             ype   Y. pestis
         α-proteobacteria    atc   A. tumefaciens
                             atu   A. tumefaciens
                             sme   S. meliloti
                             bme   B. melitensis
                             mlo   M. loti
                             rpr   R. prowazekii
                             ccr   C. crescentus
         β-proteobacteria    rso   R. solanacearum
                             nme   N. menin-Z2491
                             nma   N. menin-MC58
         ε-proteobacteria    cje   C. jejuni
                             hpy   H. pylori-J99
                             hpj   H. pylori-26695
         Actinobacteria      mtc   M. tuberculosis
                             mtu   M. tuberculosis
                             mle   M. leprae
                             sco   S. coelicolor
         Cyanobacteria       syn   Synechocystis
                             ana   Anabaena sp.
         Thermotogae         tma   T. maritima
         D-Thermus           dra   D. radiodurans
         Aquificae           aae   A. aeolicus
         Fusobacteria        fnu   F. nucleatum


Table 2-4. Comparison of the phylogenic trees predicted by different methods. Distance values between the NCBI tree for the 73 organisms in Table 2-3 and the trees created by different methods for the same organisms.

                                            Distances to NCBI tree
Tree evaluation       Tree construction     Clemente   Heymans   Our method
Symmetric difference  Phylip-NJ             75         82        63
                      Match7-NJ             -          59        57
Updown distance       Phylip-NJ             17,896     23,134    17,852
                      Match7-NJ             -          14,496    12,908


CHAPTER 3
SUBNETWORK MAPPINGS IN ALIGNMENT OF NETWORKS

The algorithm we discussed in the previous chapter addresses the first challenge in Section 1.1. In this chapter, we present a new algorithm called SubMAP, which addresses Challenge 2 as well as Challenge 1. In other words, the algorithm we describe in this chapter considers subnetwork mappings of all types of biological entities in metabolic networks. This chapter begins by extending the notation defined in Chapter 2. Then, we formally state the network alignment problem by considering the second challenge and describe the SubMAP algorithm in detail.

Extended notation. In addition to the notation we used so far (Table 2-1), we define a subnetwork of a network. A subnetwork is defined as a reaction subset of a network such that the induced undirected graph of this subset is connected. Let $R_i \subseteq V$ be one such subnetwork of $P$. Let $\mathcal{R}_k$ be the set of all subnetworks of $P$ that have at most $k$ reactions, denoted as $\mathcal{R}_k = \{R_1, R_2, \ldots, R_{N_k}\}$, where $|R_i| \le k$ for all $i \in [1, N_k]$. Here, $|R_i|$ denotes the cardinality of the reaction set $R_i$. For instance, for the network $P$ in our running example, the sets of subnetworks for $k = 1, 2, 3, 4$ are:

$$\mathcal{R}_1 = \{\{r_1\}, \{r_2\}, \{r_3\}, \{r_4\}\}$$
$$\mathcal{R}_2 = \mathcal{R}_1 \cup \{\{r_1, r_2\}, \{r_2, r_3\}, \{r_2, r_4\}\}$$
$$\mathcal{R}_3 = \mathcal{R}_2 \cup \{\{r_1, r_2, r_3\}, \{r_1, r_2, r_4\}, \{r_2, r_3, r_4\}\}$$
$$\mathcal{R}_4 = \mathcal{R}_3 \cup \{\{r_1, r_2, r_3, r_4\}\}$$

Using this notation, we define a binary relation that maps a reaction of a query network to a subnetwork of the other as follows:

Definition 8. Let $P$ and $\bar{P}$ be two networks and $k$ be a positive integer. Also, let $\mathcal{R}_k = \{R_1, R_2, \ldots, R_{N_k}\}$ and $\bar{\mathcal{R}}_k = \{\bar{R}_1, \bar{R}_2, \ldots, \bar{R}_{M_k}\}$ be the sets of subnetworks with size at most $k$ of $P$ and $\bar{P}$, respectively. We define a binary relation $\Phi_k$ between $\mathcal{R}_k$ and $\bar{\mathcal{R}}_k$ that allows one-to-many reaction mappings as:

$$\Phi_k = (\mathcal{R}_1 \times \bar{\mathcal{R}}_k) \cup (\mathcal{R}_k \times \bar{\mathcal{R}}_1)$$
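The subnetwork sets of the running example can be reproduced with a short sketch of the level-by-level extension described in Section 3.1.1: every connected subnetwork of size $j$ can be obtained by adding one adjacent reaction to a connected subnetwork of size $j - 1$ (remove a leaf of any spanning tree to see why). This is a minimal Python illustration, not SubMAP's code; the undirected edges $r_1$-$r_2$, $r_2$-$r_3$ and $r_2$-$r_4$ are our assumption, inferred from the sets listed above.

```python
from typing import Dict, FrozenSet, Hashable, Set

Subnet = FrozenSet[Hashable]


def connected_subnetworks(adj: Dict[Hashable, Set[Hashable]], k: int) -> Set[Subnet]:
    """All reaction subsets of size <= k whose induced undirected graph is connected.

    adj maps each reaction to the set of reactions it shares an edge with
    (edge direction is ignored).
    """
    level = {frozenset([r]) for r in adj}  # subnetworks of size exactly 1
    result = set(level)
    for _ in range(k - 1):
        bigger = set()
        for sub in level:
            # reactions adjacent to the subnetwork but not yet in it
            frontier = set().union(*(adj[r] for r in sub)) - sub
            for r in frontier:
                bigger.add(sub | {r})  # adding a neighbour keeps it connected
        result |= bigger
        level = bigger  # only the newest (largest) subnetworks are extended next
    return result


# Running example (edges inferred from the listed sets): r1-r2, r2-r3, r2-r4
adj = {"r1": {"r2"}, "r2": {"r1", "r3", "r4"}, "r3": {"r2"}, "r4": {"r2"}}
for k in range(1, 5):
    print(k, len(connected_subnetworks(adj, k)))  # prints 1 4, 2 7, 3 10, 4 11 (one per line)
```

Using a set of frozensets removes duplicates automatically, since the same subnetwork can be reached by adding its reactions in different orders.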


We denote the numbers of reactions of $P$ and $\bar{P}$ by $n$ and $m$, respectively. The number of all possible one-to-many mappings between $P$ and $\bar{P}$ is:

$$|\Phi_k| = nM_k + mN_k - nm$$

Here, $nM_k$ and $mN_k$ count the pairs in $\mathcal{R}_1 \times \bar{\mathcal{R}}_k$ and $\mathcal{R}_k \times \bar{\mathcal{R}}_1$, and the $nm$ one-to-one pairs, which belong to both sets, are subtracted once. The alignment of $P$ and $\bar{P}$ is a binary relation $\varphi \subseteq \Phi_k$ that is consistent. Since allowing subnetwork mappings introduces new types of conflicting mappings, in the following we extend the definition of conflict and redefine the consistency of an alignment.

Recall that for a mapping $(R_i, \bar{R}_j) \in \varphi$, one of $R_i$ or $\bar{R}_j$ can contain more than one reaction. Reporting this mapping as part of our alignment implies that all the reactions of the subnetwork with multiple reactions are aligned to a single reaction of the other. To have a consistent alignment, none of the reactions of these subnetworks can be included in any other mapping. In a network alignment that allows one-to-many mappings, we define the term conflict as follows:

Definition 9. Let $\varphi$ be a binary relation, $R_i, R_u \in \mathcal{R}_k$ and $\bar{R}_j, \bar{R}_v \in \bar{\mathcal{R}}_k$. The distinct pairs $(R_i, \bar{R}_j) \in \varphi$ and $(R_u, \bar{R}_v) \in \varphi$ conflict if and only if $(R_i \cap R_u) \cup (\bar{R}_j \cap \bar{R}_v) \neq \emptyset$.

We also need to reconsider the similarity measure presented in Section 2.3.1 for subnetworks. In Section 3.1.4, we describe the generalization of our scoring scheme to account for one-to-many mappings. Next, we state the alignment problem with subnetworks formally.

Problem formulation: Given $k$ and two networks $P$ and $\bar{P}$, let $\mathcal{R}_k$ and $\bar{\mathcal{R}}_k$ be the sets of subnetworks with size at most $k$ of $P$ and $\bar{P}$, respectively. We want to find a consistent binary relation (i.e., the alignment) $\varphi \subseteq (\mathcal{R}_1 \times \bar{\mathcal{R}}_k) \cup (\mathcal{R}_k \times \bar{\mathcal{R}}_1)$ that maximizes the summation of the similarity scores of the aligned subnetworks.

In the following, we present the SubMAP algorithm that addresses this alignment problem. Section 3.1 describes our algorithm. Section 3.2 presents experimental results.
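Definitions 8 and 9 translate directly into code. The sketch below (illustrative Python; the helper names are ours) represents subnetworks as frozensets of reaction names, builds $\Phi_k$ by keeping every pair in which at least one side is a single reaction, checks the count $nM_k + mN_k - nm$, and tests two mappings for conflict. As an assumption for the example, the second network is a renamed copy of the running example, with $\bar{r}_i$ written as `s_i`.

```python
from itertools import product
from typing import FrozenSet, Iterable, Set, Tuple

Subnet = FrozenSet[str]
Mapping = Tuple[Subnet, Subnet]


def all_mappings(Rk: Iterable[Subnet], Rk_bar: Iterable[Subnet]) -> Set[Mapping]:
    """Phi_k = (R_1 x Rbar_k) U (R_k x Rbar_1): one side must be a single reaction."""
    Rk, Rk_bar = set(Rk), set(Rk_bar)
    R1 = {s for s in Rk if len(s) == 1}
    R1_bar = {s for s in Rk_bar if len(s) == 1}
    return set(product(R1, Rk_bar)) | set(product(Rk, R1_bar))


def conflict(a: Mapping, b: Mapping) -> bool:
    """Definition 9: distinct mappings conflict if they share a reaction on either side."""
    (Ri, Rj), (Ru, Rv) = a, b
    return a != b and bool((Ri & Ru) | (Rj & Rv))


def subnets(prefix: str) -> Set[Subnet]:
    # R_2 of the running example: four single reactions and three connected pairs
    sets = [["1"], ["2"], ["3"], ["4"], ["1", "2"], ["2", "3"], ["2", "4"]]
    return {frozenset(prefix + x for x in s) for s in sets}


P2, Pbar2 = subnets("r"), subnets("s")
phi = all_mappings(P2, Pbar2)
n = m = 4    # reactions per network
N2 = M2 = 7  # subnetworks of size <= 2 per network
print(len(phi), n * M2 + m * N2 - n * m)  # both 40
```

Because the union is built from Python sets, the $nm$ one-to-one pairs that appear in both products are stored only once, which is exactly the correction term in the formula.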


3.1 Our Algorithm: SubMAP

In this section, we present our algorithm for pairwise metabolic network alignment that allows one-to-many molecule mappings. We first explain how we enumerate the subnetworks of query networks in Section 3.1.1. We then describe the calculation of homology and topology similarities in Sections 3.1.2 and 3.1.3, respectively. Sections 3.1.4 and 3.1.5 discuss the eigenvalue formulation and the extraction of the alignment with subnetwork mappings.

3.1.1 Enumeration of Connected Subnetworks

The first step of SubMAP is to create the sets of all connected subnetworks of size at most $k$ for each query network. Here, we describe the enumeration process for a single query network. Let $G = (V, E)$ represent a network and $k$ be a positive integer. We construct the set of subnetworks $\mathcal{R}_k$ as follows. For $k = 1$, $\mathcal{R}_k = \mathcal{R}_1$ consists of the single reactions in $V$. For $k > 1$, we define $\mathcal{R}_k$ recursively using $\mathcal{R}_{k-1}$. At each recursive step, we check for each reaction in $V$ whether it can be added to the already enumerated subnetworks of size $k - 1$ to create a new connected subnetwork of size $k$. This way, the $k$th recursive step takes $O(|V| \cdot (|\mathcal{R}_{k-1}| - |\mathcal{R}_{k-2}|))$ time, since $|\mathcal{R}_{k-1}| - |\mathcal{R}_{k-2}|$ is the number of subnetworks of size exactly $k - 1$.

The size of the set $\mathcal{R}_k$ can be exponential in $k$ when $G$ is dense. However, metabolic networks are usually sparse; on average, there are 2.5 forward neighbors per reaction. We observe that the number of subnetworks of real metabolic networks is around $5|V|$ for $k = 3$ and around $10|V|$ for $k = 4$ on average. In Section 3.2.2, we provide a detailed discussion of how $|\mathcal{R}_k|$ changes with different network sizes and different $k$ values.

3.1.2 Homological Similarity of Subnetworks

Recall that the relation $\varphi$ maps a reaction to a subnetwork that can contain multiple reactions. This necessitates computing the similarity between reaction sets. In Section 2.3.1, we describe how to compute the similarity between two reactions using the similarities between their compounds ($SimC$) and enzymes ($SimE$). Here we describe how SubMAP extends this similarity to account for reaction sets. To do this, the method first constructs three sets for both reaction sets. These are the unions, over the reactions in each subnetwork $R_i$, of:

1. the input compounds ($I_i$);
2. the output compounds ($O_i$);
3. the enzymes ($E_i$).

For instance, in Figure 1-4, if we take the bottom path as the subnetwork $R_i$, then $E_i = \{2.3.1.117, 2.6.1.17, 3.5.1.8\}$.

Next, SubMAP computes the similarity of each of these three set pairs and combines them using weights (i.e., non-negative real numbers) to calculate the homological similarity of the two reaction sets. Let $MWBM(A, B, SimX)$ denote the similarity between two sets $A$ and $B$ with respect to the similarity score $SimX$ (i.e., $SimE$ or $SimC$), where $MWBM$ is calculated as the sum of the similarities of the pairs returned by their maximum weight bipartite matching. Similar to Chapter 2, let $\gamma_I$, $\gamma_O$ and $\gamma_E$ denote the relative weights of the similarities of input compounds, output compounds and enzymes, respectively. The similarity of the reaction sets $R_i$ and $\bar{R}_j$ is

$$SimRSet(R_i, \bar{R}_j) = \gamma_I \, MWBM(I_i, \bar{I}_j, SimC) + \gamma_O \, MWBM(O_i, \bar{O}_j, SimC) + \gamma_E \, MWBM(E_i, \bar{E}_j, SimE)$$

The algorithm calculates $SimRSet$ for all possible one-to-many mappings between the subnetworks of two networks, so it performs this calculation $|\Phi_k|$ times in total. In this way, the homological similarities between all possible subnetwork mappings are assessed. Even though this score can be considered a good measure of similarity for subnetwork pairs, relying solely on it ignores the topological similarity. The next section focuses on this issue.
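$SimRSet$ can be illustrated with a minimal Python sketch under simplifying assumptions: each reaction is stored as a hypothetical `(inputs, outputs, enzymes)` triple, the functions standing in for $SimC$ and $SimE$ are passed in as callables, and $MWBM$ is computed exactly with a bitmask dynamic program, which is only practical for the small sets that arise from subnetworks of at most $k$ reactions. The default equal weights are placeholders, not the values used in this thesis.

```python
from functools import lru_cache
from typing import Callable, Dict, Iterable, Set, Tuple

Reaction = Tuple[Set[str], Set[str], Set[str]]  # (inputs, outputs, enzymes)


def mwbm(A: Iterable[str], B: Iterable[str], sim: Callable[[str, str], float]) -> float:
    """Total similarity of a maximum weight bipartite matching between A and B.
    Exact, but exponential in |B|: intended for small sets only."""
    A, B = list(A), list(B)

    @lru_cache(maxsize=None)
    def best(i: int, used: int) -> float:
        if i == len(A):
            return 0.0
        score = best(i + 1, used)  # leave A[i] unmatched
        for j, b in enumerate(B):
            if not (used >> j) & 1:  # B[j] is still free
                score = max(score, sim(A[i], b) + best(i + 1, used | (1 << j)))
        return score

    return best(0, 0)


def union_sets(subnet: Iterable[str], reactions: Dict[str, Reaction]):
    """Union of input compounds, output compounds and enzymes over a subnetwork."""
    ins, outs, enz = set(), set(), set()
    for r in subnet:
        i, o, e = reactions[r]
        ins |= i
        outs |= o
        enz |= e
    return ins, outs, enz


def sim_rset(Ri, Rj, reac_p, reac_q, sim_c, sim_e, gamma=(1 / 3, 1 / 3, 1 / 3)) -> float:
    """SimRSet = g_I * MWBM(I) + g_O * MWBM(O) + g_E * MWBM(E)."""
    Ii, Oi, Ei = union_sets(Ri, reac_p)
    Ij, Oj, Ej = union_sets(Rj, reac_q)
    g_in, g_out, g_enz = gamma
    return (g_in * mwbm(Ii, Ij, sim_c)
            + g_out * mwbm(Oi, Oj, sim_c)
            + g_enz * mwbm(Ei, Ej, sim_e))


# Hypothetical toy case: one reaction A -> D versus a three-step path A -> B -> C -> D
same = lambda x, y: 1.0 if x == y else 0.0
p = {"p1": ({"A"}, {"D"}, {"e1"})}
q = {"q1": ({"A"}, {"B"}, {"e2"}), "q2": ({"B"}, {"C"}, {"e3"}), "q3": ({"C"}, {"D"}, {"e4"})}
print(sim_rset({"p1"}, {"q1", "q2", "q3"}, p, q, same, same))  # about 0.667
```

In the toy case the inputs match (A with A) and the outputs match (D with D) while no enzyme matches, so two of the three equally weighted terms contribute. An exact matching matters: for weights a-1 = 3, a-2 = 2, b-1 = 2, b-2 = 0, greedily taking a-1 yields 3, whereas the optimum a-2 plus b-1 yields 4.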


3.1.3 Topological Similarity of Subnetworks

The SubMAP algorithm follows the intuition of the algorithm in Chapter 2 and of the IsoRank algorithm [24] that if the subnetwork $R_i$ is to be mapped to $\bar{R}_j$, then their neighbors in the corresponding networks should also be similar. With this motivation, it uses topological similarity to favor mappings of subnetworks that induce similar topologies. SubMAP does this by extending the formulation of the algorithm described in Chapter 2 to account for subnetwork mappings. It first expands the neighborhood definition of reactions to reaction subnetworks. Then, it introduces the notion of support between two subnetwork mappings.

Definition 10. Let $R_i, R_u \in \mathcal{R}_k$. Then, $R_u$ is a forward neighbor of $R_i$ ($R_u \in FN(R_i)$) if and only if there exist $r_a \in R_i$ and $r_b \in R_u$ such that $r_b$ is a forward neighbor of $r_a$, or $R_i \cap R_u \neq \emptyset$. $R_i$ is a backward neighbor of $R_u$ ($R_i \in BN(R_u)$) if and only if $R_u$ is a forward neighbor of $R_i$.

Definition 11. Let $R_i, R_u \in \mathcal{R}_k$ and $\bar{R}_j, \bar{R}_v \in \bar{\mathcal{R}}_k$. The mapping $(R_i, \bar{R}_j)$ supports the mapping $(R_u, \bar{R}_v)$ if and only if both $R_u \in FN(R_i)$ and $\bar{R}_v \in FN(\bar{R}_j)$, or both $R_u \in BN(R_i)$ and $\bar{R}_v \in BN(\bar{R}_j)$.

Definition 12. Let $P$, $\bar{P}$ be two networks and $\mathcal{R}_k$, $\bar{\mathcal{R}}_k$ be the sets of their subnetworks that have at most $k$ reactions. The support matrix $S$ is a $|\Phi_k| \times |\Phi_k|$ matrix with each entry $S[(i,j)][(u,v)]$ identifying the fraction of the total support provided by the $(R_u, \bar{R}_v)$ mapping to the $(R_i, \bar{R}_j)$ mapping. Let $N_{u,v} = |BN(R_u)||BN(\bar{R}_v)| + |FN(R_u)||FN(\bar{R}_v)|$ denote the number of all possible mappings between the backward neighbors of $R_u$ and $\bar{R}_v$ plus the number of those between their forward neighbors. Then, each entry of $S$ is computed as:

$$S[(i,j)][(u,v)] = \begin{cases} 1/N_{u,v} & \text{if } (R_u, \bar{R}_v) \text{ supports } (R_i, \bar{R}_j) \\ 0 & \text{otherwise} \end{cases}$$


Definition 10 defines forward and backward neighbors for subnetworks. Definition 11 states that the mapping of $R_i$ to $\bar{R}_j$ favors all possible mappings of forward (backward) neighbors of $R_i$ to those of $\bar{R}_j$. Finally, Definition 12 describes how to distribute the support of a mapping to neighborhood mappings. To see this on our running example, let us consider the case $k = 2$ and focus on the mapping $(\{r_2\}, \{\bar{r}_2, \bar{r}_3\})$. By Definition 10 we have:

$BN(\{r_2\}) = \{\{r_1\}, \{r_1, r_2\}\}$, $|BN(\{r_2\})| = 2$
$BN(\{\bar{r}_2, \bar{r}_3\}) = \{\{\bar{r}_1\}, \{\bar{r}_1, \bar{r}_2\}\}$, $|BN(\{\bar{r}_2, \bar{r}_3\})| = 2$
$FN(\{r_2\}) = \{\{r_3\}, \{r_2, r_3\}\}$, $|FN(\{r_2\})| = 2$
$FN(\{\bar{r}_2, \bar{r}_3\}) = \{\{\bar{r}_3\}, \{\bar{r}_4\}, \{\bar{r}_2, \bar{r}_4\}\}$, $|FN(\{\bar{r}_2, \bar{r}_3\})| = 3$

Then, by Definition 12, we distribute the support of the mapping $(\{r_2\}, \{\bar{r}_2, \bar{r}_3\})$ to $2 \cdot 2 + 2 \cdot 3 = 10$ other mappings by placing $1/10$ in the corresponding entries of $S$. Namely, these mappings are $(\{r_1\}, \{\bar{r}_1\})$, $(\{r_1\}, \{\bar{r}_1, \bar{r}_2\})$, $(\{r_1, r_2\}, \{\bar{r}_1\})$, $(\{r_1, r_2\}, \{\bar{r}_1, \bar{r}_2\})$, $(\{r_3\}, \{\bar{r}_3\})$, $(\{r_3\}, \{\bar{r}_4\})$, $(\{r_3\}, \{\bar{r}_2, \bar{r}_4\})$, $(\{r_2, r_3\}, \{\bar{r}_3\})$, $(\{r_2, r_3\}, \{\bar{r}_4\})$ and $(\{r_2, r_3\}, \{\bar{r}_2, \bar{r}_4\})$.

There can be cases when one mapping does not provide support to any others. In such cases, its support is distributed equally to all $|\Phi_k|$ possible mappings. Notice that, by construction, the entries in each column of $S$ sum up to 1. We will not recount the properties of the support matrix, as they are listed in Chapter 2.

At this point, the support matrix calculation can become an issue if not properly handled. The trivial way of creating the support matrix is to check each mapping against all the others to calculate the support values. However, such an exhaustive strategy requires computing a huge matrix $S$ of size $|\Phi_k| \times |\Phi_k|$. Since creating $S$ explicitly would incur prohibitive computational costs, SubMAP does not construct this matrix literally. Instead, for each mapping $(R_i, \bar{R}_j)$, it uses the sets $FN(R_i)$, $FN(\bar{R}_j)$ and $BN(R_i)$, $BN(\bar{R}_j)$ to generate only the pairs supported by $(R_i, \bar{R}_j)$. In other words, it maintains the support matrix $S$ in sparse form.
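The sparse strategy can be illustrated on the running example. The sketch below (illustrative Python; the function and variable names are ours) takes the neighbor sets of a mapping $(R_u, \bar{R}_v)$ as given, here copied from the $k = 2$ example above, and produces only the non-zero entries of its column of $S$, each equal to $1/N_{u,v}$. Exact fractions make the column sum easy to check. For a mapping without neighbors, the caller supplies the full set $\Phi_k$ over which its support is spread uniformly.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product
from typing import Dict, FrozenSet, Iterable, List, Optional, Tuple

Subnet = FrozenSet[str]
Mapping = Tuple[Subnet, Subnet]


def support_column(bn_u: List[Subnet], bn_v: List[Subnet],
                   fn_u: List[Subnet], fn_v: List[Subnet],
                   all_mappings: Optional[Iterable[Mapping]] = None) -> Dict[Mapping, Fraction]:
    """Non-zero entries of the column of S that belongs to the mapping (R_u, Rbar_v)."""
    supported = list(product(bn_u, bn_v)) + list(product(fn_u, fn_v))
    if not supported:
        # no neighbours: spread the support uniformly over every mapping in Phi_k
        pool = list(all_mappings or [])
        return {m: Fraction(1, len(pool)) for m in pool}
    n_uv = len(supported)  # N_{u,v} = |BN||BN| + |FN||FN|
    column: Dict[Mapping, Fraction] = defaultdict(Fraction)
    for mapping in supported:
        column[mapping] += Fraction(1, n_uv)  # accumulate, so the column always sums to 1
    return dict(column)


def fs(*names: str) -> Subnet:
    return frozenset(names)


# Neighbour sets of ({r2}, {r2bar, r3bar}) for k = 2, copied from the text ("b" marks a bar)
bn_u = [fs("r1"), fs("r1", "r2")]
bn_v = [fs("r1b"), fs("r1b", "r2b")]
fn_u = [fs("r3"), fs("r2", "r3")]
fn_v = [fs("r3b"), fs("r4b"), fs("r2b", "r4b")]
col = support_column(bn_u, bn_v, fn_u, fn_v)
print(len(col), sum(col.values()))  # 10 entries, each 1/10, summing to 1
```

Storing one such dictionary per mapping keeps memory proportional to the number of supported pairs rather than to $|\Phi_k|^2$.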


3.1.4 Combining Homology and Topology

As discussed in Chapter 2, both the homological similarities of subnetworks and their topological organization are important factors in an accurate alignment. Here we describe how SubMAP combines these two similarities; the procedure is very similar to the one described in Section 2.3.3.

Let $k$ be a given parameter and $P$, $\bar{P}$ be two networks with connected subnetwork sets $\mathcal{R}_k = \{R_1, R_2, \ldots, R_{N_k}\}$ and $\bar{\mathcal{R}}_k = \{\bar{R}_1, \bar{R}_2, \ldots, \bar{R}_{M_k}\}$, respectively. The column vector $H$ of size $|\Phi_k|$ denotes the homological similarity of all subnetwork pairs. Each entry of $H$ denotes the homological similarity between two subnetworks, one from each network, which corresponds to a mapping.

Let $S$ be the $|\Phi_k| \times |\Phi_k|$ support matrix constructed as described in Section 3.1.3. Given a parameter $\alpha \in [0, 1]$ that adjusts the relative weights of homology and topology, the two are combined through power method iterations as follows:

$$H_{i+1} = \alpha S H_i + (1 - \alpha) H_0$$

SubMAP iterates this equation until $H_{i+1} = H_i$ (i.e., until it converges). Please refer to Section 2.3.3 for more details of this process.

3.1.5 Extracting Subnetwork Mappings

Recall that SubMAP aims to find an alignment that maximizes the summation of the similarity scores defined by $H_i$ while preserving the consistency between different mappings. Since SubMAP allows one-to-many mappings, extracting a set of mappings that maximizes the alignment score is NP-hard. Here we first show that it is NP-hard by a reduction from the maximum weight independent set (MWIS) problem in bounded degree graphs. The MWIS problem is NP-hard even for graphs with largest degree 3 [90], and in general graphs there is no constant factor approximation to the optimal solution in polynomial time [91, 92]. We then describe how to construct a conflict graph from the alignment problem and employ a vertex-selection heuristic in order to extract the set of mappings that generates the alignment.

Figure 3-1. An illustrative example for the reduction from the MWIS problem to the metabolic network alignment problem. (A) A vertex-weighted graph $G$ that is an input to the MWIS problem. Four vertices labeled from 1 to 4 have weights from $w_1$ to $w_4$, respectively. The edges are labeled from a to e and are unweighted and undirected. (B) Two networks $P$ and $\bar{P}$ constructed from $G$. We create one vertex for each vertex of $G$ in both networks $P$ and $\bar{P}$. Then, for $P$ we add a vertex labeled with a letter for each edge of $G$ and add edges from it to the vertices at both of its ends in $G$. In order to simplify the figure, we match the label of each vertex in $P$ with that of its corresponding vertex or edge in $G$. Similarly, we match the label of each vertex in $\bar{P}$ with that of its corresponding vertex in $G$. (C) The assignment of similarity scores for subnetwork pairs, one from $P$ and the other from $\bar{P}$. Each vertex here shows a subnetwork from $P$ or $\bar{P}$. The label of each vertex lists the vertices contained in that subnetwork. For instance, label 1ab indicates the subnetwork of $P$ that consists of the three vertices labeled 1, a and b. The edge weights show the similarity of the two subnetworks corresponding to the two vertices at their endpoints.

Theorem 2 (Finding the best alignment is NP-hard). Let $P$ and $\bar{P}$ be two networks with reaction sets $\{r_1, \ldots, r_n\}$ and $\{\bar{r}_1, \ldots, \bar{r}_m\}$, respectively. Let $\mathcal{R} = \{R_1, R_2, \ldots, R_N\}$ and $\bar{\mathcal{R}} = \{\bar{R}_1, \bar{R}_2, \ldots, \bar{R}_M\}$ be all possible reaction subsets of $P$ and $\bar{P}$ with size at most a given positive integer $k$. Also, let $w: \mathcal{R} \times \bar{\mathcal{R}} \rightarrow [0, 1] \cup \{-\infty\}$ be a similarity function defining the score of each mapping, and let the conflicts between mappings be defined according to Definition 9. Then, finding a set of mappings (i.e., an alignment) that maximizes the sum of mapping scores (i.e., the alignment score) and has no conflicting pairs is NP-hard.

Proof: We prove the NP-hardness of finding the best alignment by a reduction from the MWIS problem in bounded degree graphs. Let $G = (V, E, w)$ be a vertex-weighted undirected graph with largest degree $k - 1$, i.e., $k = \max_{i = 1, \ldots, |V|} \deg(v_i) + 1$. Let us set $n = |V| + |E|$ and $m = |V|$. We will construct two hypothetical networks $P$ and $\bar{P}$ through a polynomial time reduction such that their best alignment is equivalent to the MWIS of $G$.

REDUCTION. Let us denote the networks $P$ and $\bar{P}$ as $P = (V_1, I_1)$ and $\bar{P} = (V_2, I_2)$. Here, $V_i$ and $I_i$ ($i \in \{1, 2\}$) denote the sets of reactions (vertices) and interactions (edges), respectively.

We initialize $V_1$, $V_2$, $I_1$ and $I_2$ as the empty set. We then insert a vertex $r_i$ in $V_1$ for each $v_i \in V$. Similarly, we insert a vertex $\bar{r}_i$ in $V_2$ for each $v_i \in V$. At this point we have completed the construction of $\bar{P}$ but not of $P$. We continue by inserting a new vertex in $V_1$ for each edge $e \in E$. Thus, each vertex in $V_1$ corresponds to either a vertex or an edge in $G$, while each vertex in $V_2$ corresponds to a vertex in $G$.

Next, we populate $I_1$. Let $r_i$ and $r_j$ be two vertices in $V_1$ which correspond to a vertex $v$ in $G$ and to an edge $e$ in $G$, respectively. We include an edge between $r_i$ and $r_j$ in $I_1$ if $e$ has $v$ at one of its ends in $G$. This completes the construction of $P$.
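The construction can be made concrete with a small sketch (illustrative Python, not part of SubMAP). It builds $P$ and $\bar{P}$ from a vertex-weighted graph, assigns a finite similarity only to the pairs $(R_v, \{\bar{r}_v\})$ that the remainder of the proof defines (all other pairs implicitly score $-\infty$ and are simply omitted), and finds the best non-conflicting alignment by brute force. The edge endpoints used for Figure 3-1A are a hypothetical assignment consistent with the text (vertex 1 carries edges a and b, and with equal weights the MWIS is $\{1, 3\}$); the figure itself is not reproduced here.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Hashable, List, Tuple

Node = Tuple[str, Hashable]  # ("v", vertex) or ("e", edge label)


def reduce_mwis(vertices: List[Hashable], edges: Dict[str, Tuple[Hashable, Hashable]],
                weight: Dict[Hashable, float]):
    """Build P = (V1, I1), Pbar = (V2, I2 = {}) and the finite similarity scores."""
    V1 = [("v", v) for v in vertices] + [("e", e) for e in edges]
    I1 = [(("v", x), ("e", e)) for e, ends in edges.items() for x in ends]
    V2 = [("v", v) for v in vertices]  # Pbar has no interactions
    sim: Dict[Tuple[FrozenSet[Node], FrozenSet[Node]], float] = {}
    for v in vertices:
        # R_v: the reaction of v plus the reactions of all edges incident to v
        R_v = frozenset([("v", v)] + [("e", e) for e, ends in edges.items() if v in ends])
        sim[(R_v, frozenset([("v", v)]))] = weight[v]
    return V1, I1, V2, sim


def best_alignment(sim):
    """Brute force over subsets: best set of pairwise non-conflicting pairs (Definition 9)."""
    pairs = list(sim)
    best, best_score = (), 0.0
    for size in range(1, len(pairs) + 1):
        for chosen in combinations(pairs, size):
            ok = all(not (a[0] & b[0]) and not (a[1] & b[1])
                     for a, b in combinations(chosen, 2))
            score = sum(sim[p] for p in chosen)
            if ok and score > best_score:
                best, best_score = chosen, score
    return best, best_score


# Hypothetical edge endpoints consistent with Figure 3-1 as described in the text
edges = {"a": (1, 2), "b": (1, 4), "c": (2, 3), "d": (2, 4), "e": (3, 4)}
V1, I1, V2, sim = reduce_mwis([1, 2, 3, 4], edges, {v: 1.0 for v in (1, 2, 3, 4)})
chosen, score = best_alignment(sim)
print(sorted(next(iter(pbar))[1] for _, pbar in chosen), score)  # [1, 3] 2.0
```

Two subnetworks $R_u$ and $R_v$ share a reaction exactly when $u$ and $v$ are joined by an edge, so the conflict test in `best_alignment` is the adjacency test of the MWIS instance. The brute force is exponential and serves only to check the equivalence on a toy graph.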


Figure 3-1 illustrates the construction of the two hypothetical networks $P$ and $\bar{P}$ from a simple instance of $G$. Figure 3-1A shows a sample graph $G$ with four vertices. Figure 3-1B presents the resulting networks. We do not show the weights of the vertices of $G$ in order to simplify Figure 3-1A.

It follows from the construction that, for each vertex of $G$, there is a connected subnetwork in $P$ that contains the reactions corresponding to that vertex and its edges in $G$. For instance, in Figure 3-1A, the vertex labeled 1 has two edges labeled a and b. In Figure 3-1B, there are three reactions that have the labels 1, a and b, and they make up a connected subnetwork. Thus, the set of subnetworks of $P$ with size up to $k$ is guaranteed to contain each vertex of $G$ jointly with all of its edges. Figure 3-1C demonstrates this. The vertices on the left side are the subnetworks of $P$, and those on the right side are the subnetworks of $\bar{P}$.

We complete the reduction by assigning similarities to subnetwork pairs, one from $P$ and the other from $\bar{P}$. The similarities we assign correspond to the entries of the column vector $H$ described in Section 3.1.4.

Let $R_i$ be a subnetwork of $P$ that corresponds to a vertex $v$ in $G$ and all edges of $G$ which have $v$ at one end. In Figure 3-1B, the subnetwork that contains the reactions labeled 1, a and b is an example of such an $R_i$. Also, let $\bar{R}_i$ be the subnetwork in $\bar{P}$ that corresponds to vertex $v$ as well. We assign the similarity of $R_i$ and $\bar{R}_i$ as the weight of vertex $v$, i.e., $w(v)$. We repeat this process for all $v \in V$. We assign the similarity between all remaining pairs of subnetworks $(R_i, \bar{R}_j)$ as $-\infty$. Figure 3-1C depicts the assignment of similarities for the networks in Figure 3-1B.

CORRECTNESS OF REDUCTION. We need to address two issues to prove the correctness of the reduction.

Cost of reduction. For a given vertex-weighted graph $G$, we can reduce the MWIS problem on $G$ to the best alignment problem in polynomial time. This is because we create one reaction for each edge and two reactions for each vertex in $G$. We also create two interactions for each edge in $G$. Thus, we conclude that the reduction is polynomial in the size of $G$.

Equivalence of the result. Next, we prove that the optimal alignment of $P$ and $\bar{P}$ produces the optimal solution to the MWIS problem on $G$. An alignment is a subset of subnetwork pairs, one from $P$ and the other from $\bar{P}$. By construction, $\bar{P}$ contains only subnetworks of size one. Each subnetwork of $\bar{P}$ corresponds to a vertex in $G$. We claim that the vertices of $G$ corresponding to the subnetworks of $\bar{P}$ in the optimal alignment constitute the MWIS of $G$.

Clearly, the optimal alignment cannot contain a subnetwork pair whose similarity is $-\infty$, because choosing any single pair with a finite score instead yields a higher alignment score. Also, the optimal alignment cannot contain two overlapping subnetworks from the same network, as they would conflict with each other. By construction, two subnetworks from $P$ conflict only if they share a common reaction for the same vertex or edge in $G$. For instance, the subnetworks labeled 1ab and 2acd in Figure 3-1C conflict, since they both contain a in Figure 3-1B.

Figure 3-2. Illustration of the conflict graph. (A) Each row corresponds to a possible mapping between subnetworks from two hypothetical metabolic networks. The first column is the unique label of each mapping. The second and third columns are the reactions in the two subnetworks that can be mapped. The last column is the similarity between the two subnetworks. (B) The conflict graph $G_c$ for the mappings in (A).


Thesubnetworksintheoptimalalignmentcannotcontainsuchreactions.Hence, theoptimalalignmentconstituteanindependentsetin G .Thescoreofthe alignmentisthesumoftheweightsofthecorrespondingverticesin G .Therefore, weconcludethatbymaximizingthealignmentscore P and P ,weoptimallysolve theMWISproblemfor G WedemonstratetheresultonthehypotheticalexampleinFigure3-1.Assume thatalltheverticesinFigure3-1Ahavethesameweight w .TheMWISofthegraph inFigure3-1Ais f 1 ; 3 g .Thisisbecausenodes 2 and 4 conictwithalltheremaining nodes.Theoptimalalignmentof P and P containsthefollowingsetofsubnetworkpairs f ab, 1 ,ce, 3 g andhasanalignmentscoreof2 w .Thisisbecausetheremaining subnetworkpairseitherconictwitheachotherorhaveaminusinnity score. Sincetheoptimalalignmentcontainsnodes 1 and 3 ,itsuggeststhattheMWISof G is f 1,3 g SinceextractingthebestalignmentisNP-hard,SubMAPusesaheuristicstrategy totacklethisproblem.Intherststep,thescoresofmappingsrepresentedby H i andthedenitionofconictDenition9createsavertexweightedundirectedgraph G c = V c ;I c ;w ,namelytheconictgraphasfollows.Eachsubnetworkpair R i ; R j correspondstoavertexin V c .Theweightofeachvertex a = R i ; R j 2 V c i.e., w a is thesimilaritybetween R i and R j ascomputedin H i .Anundirectededgebetweentwo vertices a = R i ; R j and b = R u ; R v existsif R i R u [ R j R v 6 = ? i.e., a and b conict.Forinstance,inFigure3-2thereisanedgebetween a and b representingthe factthattheyconictsincereaction r 1 iscommontoboth a and b Thesecondstepisthegreedyvertex-selectionstrategywhichisadoptedfrom Sakai etal. 
[93]inordertoextracttheMWISof G c asthealignment.Let N v denote thesetofverticesthatareconnectedto v 2 V c .Ateachiterationofthisalgorithm, thisheuristicpicksavertex v thatmaximizes f v = P 8 u i 2 N v w v w u i .Thisstrategy impliesthatavertexismorelikelytobepickedifthemappingitrepresentshaslarge similarityscoreandconictswithasmallnumberofothermappingswithsmallsimilarity scores.Afterpickingavertex v ,itputs v intotheresultingsetandremoves v andallthe 84

PAGE 85

verticesconnectedtoitfrom G c i.e., v [ N v .Italsoremovesalltheedgesincident toatleastoneoftheremovedvertices.Whentherearenomoreverticestoremove from G c ,theresultsetcontainsamaximalweightindependentset.Forthealignment problem,thisvertexsetcorrespondstoalistofnon-conictingsubnetworkmappings. Asanexample,inFigure3-2, d istherstvertextobepicked.Then,weremove d and e 2 N d fromthegraphandinsert d totheresultset.Next,wepickthevertex b as f b = 0 : 6 0 : 7 >f a = 0 : 7 0 : 6+0 : 4 >f c = 0 : 4 0 : 7 .Weremove b and a 2 N b andinclude b in theresultset.Finally,only c isleftandtakingitintotheresultset,thealignmentisthe mappings b = r 1 ; r 2 c = r 3 ; r 1 and d = r 4 ; f r 3 ; r 4 ; r 5 g 3.2ResultsandDiscussion Inthissection,weexperimentallyevaluatetheperformanceofSubMAP. Dataset: Weusethemetabolicnetworksof20organismstakenfromtheKEGG database.Ourdatasetcontains1,842networksintotal.Theaveragenumberof reactionspernetworkis21andthelargestnetworkhas72reactions. 3.2.1AlternativeSubnetworks Differentorganismscanperformthesamefunctionthroughdifferentsubnetworks. Wenamesuchalteredpartsthathavesimilarfunctionsasalternativesubnetworks.An accuratealignmentshouldrevealalternativesubnetworksindifferentnetworks.Inour rstexperimentweevaluatewhetherSubMAPcanndtheminrealmetabolicnetworks. Wealignthenetworkpairswhichareknowntocontainfunctionallysimilarpartswith differentreactionsetsandtopologies.Table3.2.5presentsasubsetofmappingsthat arefoundbyouralgorithm. TherstrowofTable3.2.5correspondstoalternativesubnetworksinFigure1-4. ThereactionR07613representstheupperpathinFigure1-4thatplantsandChlamydia usetoproduceLL-2,6-Diaminopimelatefrom2,3,4,5-Tetrahydrodipicolinate.Thispath isdiscoveredandreportedasashortcutontheL-Lysinesynthesispathforplantsand Chlamydiawhichisnotpresentinhumans[27,28].Watanabe etal. [27]alsosuggest 85

PAGE 86

thatsincehumanslackthispathandhencethecatalyzerofthereactionR07613, namelyLL-DAPaminotransferaseEC:2.6.1.83,thisisanattractivetargetforthe developmentofnewdrugsantibioticsandherbicides.WhenwealigntheLysine biosynthesispathwaysof H.sapiens and A.thaliana aplant,ouralgorithmmapped thereactionR07613of A.thaliana tothethreereactionsthat H.sapiens hastouseto transform2,3,4,5-TetrahydrodipicolinatetoLL-2,6-DiaminopimelateR02734,R04365, R04475.Inotherwords,SubMAPsuccessfullyidentiedthealternativesubnetworksof differentsizefor A.thaliana and3for H.sapiens thatperformthesamefunction. Anotherinterestingexampleisthesecondrowthatisextractedfromthesame alignmentdescribedabove.Inthiscase,thethreereactionsthatcanproduceL-Lysine for A.thaliana arealignedtotheonlyreactionthatproducesL-Lysinefor H.sapiens R00451iscommontobothorganismsanditutilizesmeso-2,6-Diaminopimelateto produceL-Lysine.ThereactionsR00715andR00716takeplaceandproduceL-Lysine in A.thaliana inthepresenceofL-Saccharopine[94]. ForthealignmentofPyruvatemetabolismsof E.coli and H.sapiens ,thethirdand fourthrowsshowtwomappingsthatarefoundbySubMAP.Therstonemapsthetwo stepprocessin E.coli thatrstconvertsPyruvatetoOrthophosphateR00199andthen OrthophosphatetoOxaloacetateR00345tothesinglereactionthatdirectlyproduces OxaloacetatefromPyruvateR00344in H.sapiens .Thesecondoneshowsanother mappinginwhichasinglereactionof E.coli isreplacedbytworeactionsof H.sapiens ThersttworowsforCitratecyclealsoreportsimilarmappingsforotherorganismpairs. Notethatalltheaboveexamplesareone-to-manyreactionmappingsandhence ameritofthenewalgorithmweproposehere.OuralgorithmSubMAPalsoreports one-to-onemappings.ThelastmappingofFigure3-3isanexampleinwhichone reactionofanorganismisreplacedbyexactlyonereactionofanotherorganism. AligningCitratecyclesof H.sapiens and A.tumefaciens revealsthateventhoughboth theinputandoutputcompoundsoftworeactionsR00709andR00362aredifferent 86

PAGE 87

Figure3-3.VisualrepresentationsofsubnetworkmappingsreportedinTable3.2.5. Figuresathroughjcorrespondtorows1through10ofTable3.2.5. EnzymesarerepresentedbytheirEnzymeCommissionECnumbers[19]. 87

PAGE 88

SubMAPmapsthesereactions.Also,ifwelookattheECnumbersoftheenzymes catalyzingthesereactions.1.1.41and4.1.3.6theirinformationcontentenzyme similarityiszero.Ifweweretoconsideronlythehomologicalsimilarities,thesetwo reactionscouldnothavebeenmappedtoeachother.However,boththesereactions aretheneighborsoftwootherreactionsR01325andR01900thatarepresentinboth organisms.ThemappingsofR01325toR01325andR01900toR01900support themappingoftheirneighborsR00709toR00362.Therefore,byincorporatingthe topologicalsimilarityouralgorithmisabletondmeaningfulmappingswithsimilar topologiesanddistincthomologies.Analgorithmnotconsideringnetworktopologies wouldfailtoidentifysuchmappings. Theseresultssuggest:iByallowingone-to-manymappings,ourmethodidenties functionallysimilarsubnetworkseveniftheyhavedifferentnumberofreactions.iiThe incorporationoftopologicalsimilaritymakesitpossibletondmappingsthatcanbe missedbyonlyconsideringhomologicalsimilarity. 3.2.2NumberofConnectedSubnetworks Giventheparameter k ,ouralgorithmenumeratesallconnectedreactionsubnetworks ofsizeatmost k foreachquerynetwork.Onequestionthatweneedtoansweris:How manysuchsubnetworksexist?Figure3-4plotsthisnumberforallthenetworksinour dataset.When k =1,thegureshowsthenumberofreactionsineachnetwork.For k> 1 theresultsdemonstratethatthenumberofsubnetworksincreaseexponentially with k .However,theincreaseissignicantlylowerthanthetheoreticalworstcase P k i =1 )]TJ/F25 7.9701 Tf 5.479 -4.379 Td [(n i i.e., n choose i .Forinstance,thelargestnumberofsubnetworksweobtained for n =72and k =5isaround750timeslessthanthetheoreticalworstcase. Thegurealsosuggeststhatthenumberofsubnetworksincreaselinearlywith thesizeofthenetwork.Thisismainlybecausetheaveragenumberofedgesi.e., neighborsofanodei.e.,subnetworkremainsroughlysameasthesizeofthenetwork increases.Asaresult,weconcludethatfor k 4,wecanenumerateandstoreallthe 88
subnetworks in our dataset. The number of subnetworks for k = 5 is still small enough to handle. However, in practice it is unlikely for a single reaction to replace a subnetwork with such a large number of reactions. We expect that k ≤ 4 is sufficient to find most of the alternative subnetworks. Hence, we use k ≤ 4 in our experiments.

Figure 3-4. The number of subnetworks with at most k nodes for networks of different sizes.

3.2.3 One-to-many Mappings within and across Major Clades

In Section 3.2.1, we demonstrated on a number of examples that our algorithm can find alternative subnetworks. An obvious question follows: How frequent are such alternative subnetworks, and what are their characteristics? In other words, is there really a need to allow one-to-many mappings in alignment? In this experiment, we aim to answer these questions.

We conduct the experiment as follows. We first pick 9 different organisms, 3 from each major phylogenetic clade. These organisms are T. acidophilum, Halobacterium sp. and M. thermoautotrophicum from Archaea; H. sapiens, R. norvegicus and M. musculus
from Eukaryota; and E. coli, P. aeruginosa and A. tumefaciens from Bacteria. We then extract 10 common networks for these 9 organisms from KEGG. For each of these common networks, we take all possible pairs of the 9 organisms ($\binom{9}{2} = 36$) and align that network for every organism pair. We exclude self-alignments and alignments with parameter k = 1 from these alignments, because both would bias the results toward one-to-one mappings. We computed all possible alignments (10 × 36 = 360) for k = 2, 3 and 4, which gives 360 × 3 = 1,080 alignments in total. Finally, we counted the mappings of each of the four possible types: 1-to-1, 1-to-2, 1-to-3 and 1-to-4. Our hypothesis is that organisms within a clade tend to perform a metabolic function through the same or similarly sized sets of reactions, while organisms in different clades perform it through alternative subnetworks of varying sizes.

Table 3-3 summarizes the results of this experiment. Each row of the table gives the percentage of each mapping type for one pair of clades. The first three rows correspond to alignments within a clade, and the last three represent alignments across two different clades. An important outcome is that there is a considerably large number of one-to-many mappings between organisms of different clades. In the extreme case (last row), nearly half of the mappings are one-to-many. The results also support our hypothesis that one-to-one mappings are more frequent for alignments within clades than across clades, due to the high similarity between organisms of the same clade. For instance, in both the first and the last rows, one side of the query set is Eukaryota. However, going from the first row to the last, we see around a 40% decrease in the number of one-to-one mappings, and increases of 250%, 850% and 450% in the numbers of 1-to-2, 1-to-3 and 1-to-4 mappings, respectively. Archaea are single-celled microorganisms (e.g., Halobacteria), while Eukaryota are complex organisms whose cells have membrane-bound nuclei (e.g., animals and plants). Given this, the jumps in the number of one-to-many mappings suggest that individual reactions in Archaea are replaced
by a number of reactions in Eukaryota. These results have two major implications. (i) One-to-many mappings are frequent in nature, so biologically meaningful alignments must allow such mappings. (ii) The characteristics of the alternative subnetworks can help infer the phylogenetic relationships among different organisms.

3.2.4 Evaluation of Running Time and Memory Utilization

SubMAP allows one-to-many mappings in order to find biologically relevant alignments. However, this comes at the expense of increased computational cost. In theory, the increase can be exponential in k in the worst case, which happens when the network is highly connected. Metabolic networks, however, are sparse, and their connectivity follows a power-law distribution [95]. To understand the capabilities and limitations of our method, we examine its running time and memory usage on real datasets.

We evaluate the performance of our method for querying a database of networks as follows. We create a query set by selecting 50 networks of varying sizes from the dataset described at the beginning of this section. We then select another 50 networks of different sizes to serve as the database set for this experiment. We pick these 50 networks so that the average number of reactions per network is 21.4, which is very close to that of the entire database. We then align each query network with each database network, one at a time, for different values of k. We measure the average running time and the average memory usage for each combination of query network and k value. We do not compare performance against existing methods, because they do not allow one-to-many mappings. However, our results for k = 1 show the performance of our algorithm when it is restricted to one-to-one mappings.

Figure 3-5 shows the average running time of SubMAP for query networks with increasing numbers of reactions. When k = 1 (i.e., only one-to-one mappings, as in existing methods), it runs in less than 0.2 seconds even for the largest query network in our query set. As k increases, the running time increases as well. This is because
the number of subnetworks and the average numbers of forward and backward neighbors of subnetworks increase with k. However, we observe that our method can perform alignments in practical time even when k = 4. It aligns networks with around 50 reactions in less than one minute for k = 3 and in less than 20 minutes for k = 4. For k = 3, it runs in less than 15 minutes for the largest query network in our query set.

Figure 3-5. The average running time of SubMAP when a query network is aligned with all the networks in a network database. The selected network database contains 50 networks. The x-axis is the number of reactions of the query networks.

We also measure the actual memory usage of our algorithm for real networks of varying sizes and k values (Figure 3-6). For k = 1 or 2, the memory usage is negligible (… MB or less) for all networks. Although the memory usage increases with k, it remains feasible for k = 4 even for query networks with around 50 reactions. When k = 3, our algorithm uses less than 300 MB for the largest query. For two query networks that both have around 50 reactions and k = 4, the memory requirement is around
600 MB. Thus, our algorithm can run on a standard computer when aligning real-sized metabolic networks.

Figure 3-6. The average memory utilization of SubMAP when a query network is aligned with all the networks in a network database. The selected network database contains 50 networks. The x-axis is the number of reactions of the query networks.

3.2.5 Discussion

In this chapter, we considered the problem of aligning two metabolic networks. What distinguishes our work from the literature is that we allow a molecule of one network to be mapped to a set of molecules of the other. To address this problem, we developed the SubMAP algorithm. Given two metabolic networks $P$ and $\bar{P}$ and an upper bound k on the size of the connected subnetworks, SubMAP finds the consistent mapping of the subnetworks of $P$ and $\bar{P}$ with the maximum similarity. We transformed the alignment problem into an eigenvalue problem. The solution to this eigenvalue problem produced a good mixture of homological and topological similarities of the subnetworks. Using these similarity values, we constructed a vertex-weighted
graph in which an edge connects each pair of conflicting mappings. This transformed our alignment problem into finding a maximum weight independent set of this graph. We employed a heuristic method for the maximum weight independent set problem. The result of this method gave us an alignment with no conflicting pair of mappings (i.e., a consistent alignment). Our experiments on real datasets suggested that our method can identify biologically relevant alignments of alternative subnetworks that traditional methods miss. Furthermore, even though SubMAP does not restrict the topologies of query networks, it remains scalable for real-size metabolic networks when reaction subsets of size at most four are considered.
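The conflict-resolution step summarized above can be illustrated with a small sketch. The text does not specify which maximum weight independent set heuristic SubMAP employs. The code below therefore assumes a simple greedy rule: repeatedly keep the heaviest remaining mapping and discard every mapping that conflicts with it. It also assumes that two mappings conflict when they reuse a reaction on either side. The names `conflict_graph` and `greedy_mwis`, and the mappings and weights in the example, are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical representation: a candidate mapping pairs a subnetwork of P with a
# subnetwork of P-bar, each given as a frozenset of reaction identifiers.
Mapping = Tuple[FrozenSet[str], FrozenSet[str]]


def conflict_graph(mappings: List[Mapping]) -> Dict[int, Set[int]]:
    """Vertex i is mapping i; an edge joins two mappings that reuse a reaction
    on either side (our assumed notion of conflict)."""
    adj: Dict[int, Set[int]] = {i: set() for i in range(len(mappings))}
    for i, j in combinations(range(len(mappings)), 2):
        (r1, s1), (r2, s2) = mappings[i], mappings[j]
        if r1 & r2 or s1 & s2:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def greedy_mwis(weights: List[float], adj: Dict[int, Set[int]]) -> List[int]:
    """Greedy maximum weight independent set heuristic: keep the heaviest
    remaining mapping, then discard every mapping that conflicts with it."""
    remaining = set(range(len(weights)))
    chosen: List[int] = []
    while remaining:
        best = max(remaining, key=lambda v: (weights[v], -v))  # ties: lower index
        chosen.append(best)
        remaining -= adj[best] | {best}
    return sorted(chosen)


if __name__ == "__main__":
    fs = frozenset
    mappings = [
        (fs({"r1"}), fs({"s1", "s2"})),   # 0: a one-to-many mapping
        (fs({"r1"}), fs({"s1"})),         # 1: conflicts with 0 (shares r1)
        (fs({"r2"}), fs({"s3"})),         # 2
        (fs({"r2", "r3"}), fs({"s4"})),   # 3: conflicts with 2 (shares r2)
    ]
    weights = [0.9, 0.7, 0.4, 0.6]        # hypothetical similarity scores
    print(greedy_mwis(weights, conflict_graph(mappings)))  # [0, 3]
```

The result is always consistent, since no two kept mappings share a reaction. Greedy selection offers no optimality guarantee, however, which is why the text describes this step as heuristic.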


Table 3-1. Extended table of frequently used symbols.

Symbol | Description
k | Parameter for the largest subnetwork size
$R_i$, $\bar{R}_j$ | Subnetworks of query networks
$\mathcal{R}_k$, $\bar{\mathcal{R}}_k$ | Sets of all subnetworks with size at most k
n, m | Numbers of reactions in query networks
$N_k$, $M_k$ | Numbers of all subnetworks of size at most k
$I_i$, $O_i$, $E_i$ | Sets of input compounds, output compounds and enzymes of subnetwork $R_i$
… | Relation which represents the alignment with subnetwork mappings
…$_k$ | Set of all possible one-to-many mappings for a given k
$\lvert \ldots_k \rvert$ | Number of all possible one-to-many mappings for a given k
S | Support matrix for reaction subnetwork mappings
$G_c$ | Conflict graph

Table 3-2. Alternative subnetworks that produce the same or similar output compounds from the same or similar input compounds in different organisms.

Network | Organisms | Input comp.^a | Output comp.^b | Reaction mappings^c
Lysine biosynthesis | A. thaliana / E. coli | 2,3,4,5-Tetrahydrodipico. | LL-2,6-Diaminopimelate | R07613 ↔ R02734+R04365+R04475
Lysine biosynthesis | A. thaliana / E. coli | L-Saccharopine / meso-2,6-Di. | L-Lysine | R00451+R00715+R00716 ↔ R00451
Pyruvate metabolism | E. coli / H. sapiens | Pyruvate | Oxaloacetate | R00199+R00345 ↔ R00344
Pyruvate metabolism | E. coli / H. sapiens | Oxaloacetate | Phosphoenolpyruvate | R00341 ↔ R00431+R00726
Pyruvate metabolism | T. acidophilum / A. tumefaciens | Pyruvate | Acetyl-CoA | R01196 ↔ R00472+R00216+R01257
Glycine, serine, threonine metabolism | H. sapiens / R. norvegicus | Glycine / L-Threonine | Serine | R00945 ↔ R00751+R00945+R06171
Fructose and mannose metabolism | E. coli / H. sapiens | L-Fucose | L-Fucose 1-p / L-Fuculose 1-p | R03163+R03241 ↔ R03161
Citrate cycle | S. aureus N315 / S. aureus COL | Isocitrate | 2-Oxoglutarate | R00268+R01899 ↔ R00709
Citrate cycle | H. sapiens / A. tumefaciens | Succinate | Succinyl-CoA | R00432+R00727 ↔ R00405
Citrate cycle | H. sapiens / A. tumefaciens | Isocitrate / Citrate | 2-Oxoglutarate / Oxaloacetate | R00709 ↔ R00362

^a Main input compound utilized by the given set of reactions. ^b Main output compound produced by the given set of reactions. ^c Reaction mappings that correspond to alternative paths. Reactions are represented by their KEGG identifiers.
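The counts $N_k$ and $M_k$ of Table 3-1, discussed in Section 3.2.2, can be made concrete with a short sketch. It enumerates all connected reaction subnetworks of size at most k and compares their number with the worst-case bound $\sum_{i=1}^{k}\binom{n}{i}$. Two choices here are assumptions made for illustration, not necessarily how SubMAP enumerates subnetworks: connectivity is tested on the undirected version of the reaction graph, and subnetworks are grown one neighboring reaction at a time with de-duplication. The function names and the toy network are hypothetical.

```python
from math import comb
from typing import Dict, FrozenSet, Set


def connected_subnetworks(adj: Dict[str, Set[str]], k: int) -> Set[FrozenSet[str]]:
    """Return every set of at most k reactions whose induced subgraph is connected.

    `adj` maps each reaction to its neighbours with edge directions ignored.
    Every connected set of size s + 1 is a connected set of size s plus one
    adjacent reaction, so growing sets one neighbour at a time reaches all of them.
    """
    level: Set[FrozenSet[str]] = {frozenset([v]) for v in adj}  # size-1 subnetworks
    found = set(level)
    for _ in range(k - 1):
        nxt: Set[FrozenSet[str]] = set()
        for sub in level:
            frontier = set().union(*(adj[v] for v in sub)) - sub
            for u in frontier:
                nxt.add(sub | {u})  # frozenset | set -> frozenset
        found |= nxt
        level = nxt
    return found


def worst_case(n: int, k: int) -> int:
    """Theoretical worst case: sum over i = 1..k of (n choose i)."""
    return sum(comb(n, i) for i in range(1, k + 1))


if __name__ == "__main__":
    # Hypothetical 4-reaction pathway a - b - c - d (directions ignored).
    path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
    print(len(connected_subnetworks(path, 3)), worst_case(4, 3))  # 9 14
```

For reference, `worst_case(72, 5)` evaluates to 15,082,602. This is the bound against which the text compares its largest observed count for n = 72 and k = 5.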


Table 3-3. Characteristics of mappings within and across three major clades. Percentages of 1-to-1, 1-to-2, 1-to-3 and 1-to-4 mappings within and across three major clades (A: Archaea, E: Eukaryota, B: Bacteria).

Clades | 1-to-1 | 1-to-2 | 1-to-3 | 1-to-4
E-E | 89.6 | 8.8 | 1.1 | 0.5
B-B | 80.1 | 16.0 | 3.1 | 0.8
A-A | 78.3 | 15.7 | 4.7 | 1.3
B-E | 69.1 | 23.1 | 6.3 | 1.5
A-B | 60.5 | 28.3 | 8.5 | 2.7
A-E | 55.8 | 31.0 | 10.4 | 2.8
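As a quick check of the relative changes quoted in Section 3.2.3, the following sketch recomputes them from the E-E (first) and A-E (last) rows of Table 3-3. Rounding to whole percentages is our choice. The helper name `relative_change` is hypothetical.

```python
# Percentages from Table 3-3; columns are 1-to-1, 1-to-2, 1-to-3, 1-to-4.
E_E = [89.6, 8.8, 1.1, 0.5]    # Eukaryota vs Eukaryota (first row)
A_E = [55.8, 31.0, 10.4, 2.8]  # Archaea vs Eukaryota (last row)


def relative_change(old: float, new: float) -> float:
    """Percentage change when going from `old` to `new`."""
    return 100.0 * (new - old) / old


changes = [round(relative_change(o, n)) for o, n in zip(E_E, A_E)]
one_to_many_share = round(100.0 - A_E[0], 1)  # share of one-to-many mappings in A-E

if __name__ == "__main__":
    print(changes)            # [-38, 252, 845, 460]
    print(one_to_many_share)  # 44.2
```

These values are consistent with the approximate figures in the text: a decrease of around 40%, increases of roughly 250%, 850% and 450%, and nearly half of the A-E mappings being one-to-many.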


CHAPTER 4
LARGE SCALE METABOLIC NETWORK ALIGNMENT BY COMPRESSION

Aligning large scale networks is computationally challenging because finding the alignment that maximizes the similarity between the query networks requires solving an underlying subgraph isomorphism problem. Chapters 2 and 3 present two metabolic network alignment algorithms that tackle the first two challenges in Section 1.1. However, the running time and memory utilization of alignment methods can still be prohibitive for large query networks, especially when subnetworks are considered in alignment as in SubMAP. For instance, for a database of 50 networks with sizes ranging from 2 to 57, SubMAP takes around 2 minutes and 200 MB of memory per alignment on average. Therefore, the running time and memory utilization of existing alignment methods must be improved to enable the alignment of larger-scale networks, especially when subnetwork mappings are allowed.

In this chapter, we develop a framework that significantly increases the scale of networks that can be aligned with existing algorithms. Our framework has three major phases: the compression phase, the alignment phase and the refinement phase. For the first phase, we develop a compression method that reduces the size of the input metabolic networks by a desired rate. In other words, we transform the query networks from their original domain (Figure 4-1A) to a compressed domain (Figure 4-1D). A single node in the compressed domain corresponds to a set of connected nodes, and the edges between them, in the original domain. We call each node of the compressed network a supernode. For instance, Figure 4-1D depicts the compressed networks of the two input networks in Figure 4-1A when each supernode may contain up to two nodes (i.e., only one level of compression is allowed). In the second phase, we carry out the alignment in the compressed domain using an existing network alignment algorithm. Here we use SubMAP as our base alignment method. Our framework can also be used with other alignment methods, since the performance
gain is an inherent property of compression for any base alignment algorithm, as long as the query networks can be compressed. Once the compressed networks are aligned, we consider each mapping of supernodes found in the alignment phase individually. Each such mapping defines a smaller instance of network alignment. Figure 4-1F shows an example with two such instances. For each of these mappings, we solve the alignment problem using the base algorithm. At the end of this refinement phase, the final alignments of reactions are extracted (Figure 4-1G).

The need for our framework is best described with an example. Figure 4-1 illustrates the difference between aligning two metabolic networks in the compressed domain and aligning them in the original domain without compression. If we use a base alignment algorithm such as SubMAP or IsoRank, the time and space complexity of the algorithm is determined by the size of a data structure called the support matrix. Conceptually, this data structure governs the topological similarities between every pair of reaction tuples. Each reaction tuple contains one reaction from each of the two query metabolic networks. A detailed description of this matrix can be found in earlier articles describing the IsoRank [24] and SubMAP [96] methods. The size of this support matrix is quadratic in both n and m (i.e., $O(n^2 m^2)$) for IsoRank, and also for SubMAP when only subnetworks of size one are allowed. Figures 4-1B and 4-1E illustrate the support matrices required for alignment starting from the networks shown in Figures 4-1A and 4-1D, respectively. With only one level of compression, the size of the matrix we need to create drops from 20 × 20 to 6 × 6. This translates into more than a factor of 10 improvement in the performance of the base method.

When we compress the network more (i.e., increase the number of compression levels), the compressed network has fewer nodes, and we can align the compressed networks faster. However, this comes at the price of two drawbacks. Both arise because each supernode tends to contain many nodes from the original domain. First, once we find a mapping for the supernodes
in the compressed domain, we still need to align the nodes of each supernode pair. For example, after mapping the supernodes (a, b) and (a′, b′) shown in Figure 4-1F, we need to align the two subnetworks induced by these two supernodes. Thus, as the supernodes grow (i.e., as we compress for more levels), the smaller problem instances grow as well, and the resource utilization bottleneck shifts from the alignment phase to the refinement phase. Second, when we use compression, the resulting alignment may differ from the one found by the original algorithm. For example, one of the four mappings in Figure 4-1G ($e - c'$) differs from the result of the base algorithm shown in Figure 4-1C ($e - e'$). We calculate the accuracy as the correlation between two sets of scores. The first set is the scores our framework computes in the compressed domain for each possible mapping. The second set is the scores the base method computes for the same mappings in the original domain. Higher compression rates generally mean less similarity between the results of the two methods (i.e., lower accuracy).

Figure 4-1. Aligning two metabolic networks with and without compression. The top panels (a-c) illustrate the steps of alignment without compression. The bottom panels (d-g) demonstrate the different phases of alignment with compression using our framework. (a) Two hypothetical metabolic networks with 5 and 4 reactions, respectively. Directed edges represent the neighborhood relations between the reactions. (b) The 20 × 20 support matrix needed for the alignment if compression is not used. We only show the non-zero entries of the single row that corresponds to the topological support given by the $b - b'$ mapping to possible mappings of its backward and forward neighbors. Five such mappings, supported equally, are denoted by 1/5 in the matrix: the $a - a'$ mapping for the backward neighbors, and the $c - c'$, $c - d'$, $d - c'$ and $d - d'$ mappings for the forward neighbors. (c) The resulting reaction mappings of alignment without compression. (d) The query networks shown in (a), in the compressed domain after one level of compression. (e) The 6 × 6 support matrix needed for the alignment with compression. We only show the entries for the mappings supported by the $(a, b) - (a', b')$ mapping. (f) The resulting mappings from the alignment in the compressed domain. (g) The resulting reaction mappings after the refinement phase of our framework.

Several key questions follow from these observations.
1. How far is our compression method from an optimal compression that produces the compressed network with the minimum number of nodes?
2. What is the right amount of compression? That is, when does compression minimize the running time of our overall framework?
3. How does compression affect the alignment accuracy with respect to the base network alignment method?
In the rest of the chapter, we address each of these questions in detail.

Our experiments on metabolic networks extracted from the KEGG pathway database [7] demonstrate that our compression method reduces the numbers of vertices and edges by almost half at each level of compression (Section 4.3.1). As a result of this reduction, the running time and memory utilization of our earlier alignment algorithm SubMAP improve significantly (Section 4.3.2). Lastly, we analyze the accuracy of our framework as compared to the base alignment algorithm. The results suggest that the alignment obtained with only one level of compression captures the original
alignment results with very high accuracy, and that the accuracy decreases with further levels of compression (Section 4.3.3).

Our technical contributions can be summarized as follows. We devise an efficient framework for the network alignment problem. It employs a scalable compression method that shrinks the given networks while respecting their topology. We prove the optimality of our compression method under certain conditions, and we bound how far its results can deviate from the optimal solution in the worst case. We provide a mathematical formulation that serves as a guideline for selecting an optimal number of compression levels, depending on the input characteristics of the alignment.

The rest of this chapter is organized as follows. Section 4.1 presents the method we propose for compressing the metabolic networks. Section 4.2 describes the remaining phases of our framework, which perform the alignment in the compressed domain, and analyzes their complexity. Section 4.3 reports our experimental results on a dataset of metabolic networks.

4.1 Compression Phase

In this section, we describe the method we develop to compress the query networks. Throughout this chapter, we use a reaction-based model to represent metabolic networks. Formally, we represent a metabolic network as P = (V, E), where V is the set of all reactions of the network and E is the set of directed edges between them. An edge $e_{ij} \in E$ exists if and only if reaction $v_i$ has at least one output compound that is an input of reaction $v_j$. In the following, we first describe our network compression method (Section 4.1.1). We use the shorthand MDS (minimum degree selection) for this method in the rest of the chapter. We then prove the optimality of MDS under certain conditions and provide an upper bound for the
number of compressions that this method can miss with respect to the optimal compression (Section 4.1.2).

4.1.1 Minimum Degree Selection (MDS) Method for Compression

Let P = (V, E) be the reaction-based representation of a metabolic network, and let c denote the user-specified parameter for the desired level of compression. For x = 1, …, c, we denote the compressed form of P after x compression levels as $P^x = (V^x, E^x)$. To simplify our notation, we let $P^0 = P$. We construct $P^x$ from $P^{x-1}$ for each x = 1, …, c. Each $v \in V^x$ is either a node from $V^{x-1}$ or a supernode that contains two nodes of $V^{x-1}$. We construct $V^x$ from $V^{x-1}$ in a number of consecutive steps. At each step, we choose a pair of connected nodes in $V^{x-1}$ that have not been compressed in earlier steps of the current compression level. We then merge this node pair into a supernode and add it to $V^x$. We repeat these steps until no such node pair remains in $V^{x-1}$. Let t be the number of such steps for compression level x. We denote the state of the network after the i-th step of the x-th compression level as $P^x_i = (V^x_i, E^x_i)$ (Figure 4-2B). Note that $V^x_t = V^x$, and that $V^x_i \subseteq V^{x-1} \cup V^x$ for each i = 1, …, t, because the nodes of $V^x_i$ are either singleton nodes from $V^{x-1}$ or supernodes from $V^x$.

We now describe how we compress $P^{x-1}$ to obtain $P^x$. We define the degree of a non-compressed node v in a given network as deg(v) = indeg(v) + outdeg(v). Here indeg(v) is the number of incoming edges from non-compressed nodes, and outdeg(v) is the number of outgoing edges to non-compressed nodes. Two nodes in a network are neighbors if they are connected by at least one edge. We denote the set of neighbors of a node v as N(v). We start the compression by initializing $V^x_0 = V^{x-1}$ and $E^x_0 = E^{x-1}$. Then, while there exists a non-compressed node with degree greater
than zero in the current state of the network, say $P^x_{i-1}$, we apply the next (i-th) step of compression to obtain $P^x_i$ from $P^x_{i-1}$. Figure 4-2 depicts the states of an example network before (Figure 4-2A) and after (Figure 4-2B) the i-th step of compression. We start
the i-th step by selecting a node with minimum positive degree among the nodes in $V^x_{i-1}$. If there is more than one such node, we select the first one. In our example in Figure 4-2A, the node with minimum degree is unique and is shown as $v_a$. We use the term minimum degree as shorthand for minimum positive degree, which excludes singleton nodes from the selection. This ensures that $\deg(v_a) > 0$ and that $N(v_a)$ is non-empty. We select one neighbor from $N(v_a)$, say $v_b$. (In Figure 4-2A, the only node in $N(v_a)$ is denoted $v_b$.) We then merge $v_a$ with $v_b$ to form the supernode $v_{ab} = \{v_a, v_b\}$. Figure 4-2B illustrates this newly created node $v_{ab}$. This is the only compression performed in the i-th compression step. Next, we create the new node set $V^x_i = (V^x_{i-1} \cup \{v_{ab}\}) - \{v_a, v_b\}$. To create the edge set $E^x_i$, we first initialize it to $E^x_{i-1}$ and remove all incoming and outgoing edges of $v_a$ and $v_b$. Then, for each node in $V^x_{i-1} - \{v_a, v_b\}$ that has an outgoing edge to either $v_a$ or $v_b$ in $E^x_{i-1}$, we insert an incoming edge to $v_{ab}$ from that node. We insert outgoing edges from $v_{ab}$ to other nodes in the same manner. Figure 4-2 illustrates the changes in the edge set after creating $v_{ab}$. For each i = 1, …, t, the set $V^x_i$ contains a mixture of nodes and supernodes. After each step, the number of nodes decreases by one and the number of edges decreases by at least one. For instance, in Figure 4-2 the number of nodes drops from five to four and the number of edges drops from six to five. The compression of $P^{x-1}$ into $P^x$ continues with further compression steps until no non-compressed node with degree greater than zero remains.
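The single-level procedure above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions. Each (super)node is a frozenset of reaction identifiers. Edges are a set of directed pairs. The order of the input list is used as the tie-breaking order for "first". Degrees are recomputed naively at every step, which is acceptable only for small examples. Instead of updating $E^x_i$ after every step, the sketch re-points all edges once at the end of the level, which yields the same edge set. The names `mds_level` and `mds_compress` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Node = FrozenSet[str]          # a (super)node: the set of reactions it contains
Edge = Tuple[Node, Node]       # directed edge u -> v


def _degree(v: Node, edges: Set[Edge], alive: Set[Node]) -> int:
    """indeg + outdeg of v, counting only edges to/from non-compressed nodes."""
    return sum(1 for (u, w) in edges
               if u != w and ((u == v and w in alive) or (w == v and u in alive)))


def mds_level(nodes: List[Node], edges: Set[Edge]) -> Tuple[List[Node], Set[Edge]]:
    """One level of MDS compression; the list order of `nodes` breaks ties."""
    uncompressed = list(nodes)
    merged: Dict[Node, Node] = {}
    while True:
        alive = set(uncompressed)
        positive = [(d, v) for v in uncompressed
                    if (d := _degree(v, edges, alive)) > 0]
        if not positive:
            break                                   # no non-compressed node has degree > 0
        d_min = min(d for d, _ in positive)
        va = next(v for d, v in positive if d == d_min)  # first minimum-degree node
        nbrs = ({w for (u, w) in edges if u == va and w in alive}
                | {u for (u, w) in edges if w == va and u in alive})
        vb = next(v for v in uncompressed if v in nbrs and v != va)  # first neighbour
        merged[va] = merged[vb] = va | vb           # supernode v_ab = {v_a, v_b}
        uncompressed.remove(va)
        uncompressed.remove(vb)

    def remap(v: Node) -> Node:
        return merged.get(v, v)

    new_nodes: List[Node] = []
    for v in nodes:
        if remap(v) not in new_nodes:
            new_nodes.append(remap(v))
    # Re-point edges to supernodes and drop edges internal to a supernode.
    new_edges = {(remap(u), remap(w)) for (u, w) in edges if remap(u) != remap(w)}
    return new_nodes, new_edges


def mds_compress(nodes: List[Node], edges: Set[Edge], c: int) -> Tuple[List[Node], Set[Edge]]:
    """Apply c levels; after level x every supernode holds between 1 and 2**x reactions."""
    for _ in range(c):
        nodes, edges = mds_level(nodes, edges)
    return nodes, edges


if __name__ == "__main__":
    r = {x: frozenset([x]) for x in "abcde"}       # hypothetical 5-reaction network
    E = {(r["a"], r["b"]), (r["b"], r["c"]), (r["c"], r["d"]),
         (r["d"], r["e"]), (r["b"], r["d"])}
    print([sorted(v) for v in mds_level(list(r.values()), E)[0]])
    # [['a', 'b'], ['c', 'd'], ['e']]
```

In the example, reaction a has degree one and is merged with b first. Then c, whose only remaining non-compressed neighbor is d, is merged with d. Reaction e is left as a singleton because its only neighbor is already compressed. (The walrus operator requires Python 3.8 or later.)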
The discussion above describes the intermediate steps that the MDS method uses to perform a single level of compression on a given network. Given a compression level c, for each level x = 1, …, c, we apply the same compression steps to $P^{x-1} = (V^{x-1}, E^{x-1})$, initially treating $P^{x-1}$ as a non-compressed network with no supernodes. As a result, after the x-th level of compression is finished, the number of reactions that each node of $V^x$ can contain is guaranteed to lie in the
interval $[1, 2^x]$. This limit on the number of reactions in each node allows the MDS method to respect, and largely preserve, the initial topology of the query networks. This is very important for the alignment, which makes significant use of the network topologies. Additionally, the bound on the number of reactions in each supernode yields a uniform compression for both networks. This limits the sizes of the smaller alignment problems we can encounter in the refinement phase (Section 4.2.3), and so keeps the complexity and running time of the refinement phase of our alignment framework under control.

Figure 4-2. One compression step of the MDS method on a hypothetical metabolic network P. Small circles represent reactions, and big circles represent supernodes that result from earlier compression steps. A solid arrow represents an edge between two non-compressed nodes in the current compression level. A dashed arrow denotes an edge between a supernode and another node in the network. Only the solid arrows are taken into account when calculating the degrees of the non-compressed nodes. (a) The state of network P during compression level x, before the i-th intermediate step (i.e., $P^x_{i-1}$). The node with the minimum degree is denoted $v_a$, and its first neighbor is denoted $v_b$. (b) The state of this network after the i-th compression step (i.e., $P^x_i$). The node resulting from the compression at this step is denoted $v_{ab}$.

4.1.2 Optimality Analysis for MDS

In the previous section, we described in detail the compression method MDS that we use in our framework. Ideally, the given network should be compressed as much as possible at each compression level, because a smaller network usually means less time and memory for the alignment. We say that a compression is optimal
if the resulting compressed network contains the smallest number of nodes among all possible compressions, under the restriction that each non-compressed node can be merged with at most one other non-compressed node at each compression level. We call the hypothetical optimal compression method that achieves the best possible compression rate OPT. In the rest of this section, we analyze the optimality of our MDS method under different conditions. We first consider each connected component of the input network separately, since each is compressed separately. We then integrate the per-component results to generalize our analysis to networks with arbitrary topologies.

We start by introducing the notation for handling networks with more than one connected component. Let P be a metabolic network with r connected components. We denote these components by $C_1 = (\hat{V}_1, \hat{E}_1)$, $C_2 = (\hat{V}_2, \hat{E}_2)$, …, $C_r = (\hat{V}_r, \hat{E}_r)$, so that $P = (\bigcup_{j=1}^{r} \hat{V}_j, \bigcup_{j=1}^{r} \hat{E}_j)$. Let $C = (\hat{V}, \hat{E})$ be an arbitrary component of P. Let $\star^x$ represent the compressed form of C after x levels of compression, using either the MDS method or the OPT method, which achieves the optimal compression. We use the star as a generic symbol to avoid introducing a new symbol for each compressed component where only its size matters. We use $\mathrm{MDS}(C, x)$ and $\mathrm{OPT}(C, x)$ to denote the total number of compression steps that the corresponding method performs to transform C into its compressed form after x levels of compression. Recall that each compression step reduces the network size by one. Thus, the larger the values $\mathrm{MDS}(C, x)$ and $\mathrm{OPT}(C, x)$, the better the compression rate. The first argument of this notation can be any state of a connected component or network at any point during the compression. For instance, $\mathrm{OPT}(C^x_i, x)$ denotes the number of compression steps OPT takes from the (i+1)-th intermediate step of the x-th level until the x-th level of compression is completed.

In the following, we first prove that MDS chooses optimally which two nodes to compress at each compression step, provided that a node with degree one exists in the current state of the component. We then show that if no
node with degree one exists, a compression step taken by MDS can increase the size of the compressed component by at most one compared to the one found by OPT. Finally, by aggregating the results over all components, we develop an upper bound, for a given metabolic network P and compression level c, on the size of the compressed network obtained by MDS relative to the size of the network obtainable by the optimal method.

Lemma 5. Let $C = (\hat{V}, \hat{E})$ denote a connected component of a given metabolic network P. Let $C^x_i = (\hat{V}^x_i, \hat{E}^x_i)$ denote the state of C after the i-th step of the x-th compression level. If there exists a node in $\hat{V}^x_i$ with degree one, then the compression step taken by the MDS method to create the next state $C^x_{i+1}$ is optimal. Formally,
$\mathrm{OPT}(C^x_i, x) = 1 + \mathrm{OPT}(C^x_{i+1}, x)$.

Proof: We prove Lemma 5 by contradiction, in two parts:
Part 1. $\mathrm{OPT}(C^x_i, x) \not< 1 + \mathrm{OPT}(C^x_{i+1}, x)$
Part 2. $\mathrm{OPT}(C^x_i, x) \not> 1 + \mathrm{OPT}(C^x_{i+1}, x)$

The first part (i.e., $\not<$) is straightforward. Taking the MDS step and then compressing $C^x_{i+1}$ optimally performs $1 + \mathrm{OPT}(C^x_{i+1}, x)$ compressions starting from $C^x_i$. If this number exceeded $\mathrm{OPT}(C^x_i, x)$, then $\mathrm{OPT}(C^x_i, x)$ would not be optimal. This contradiction proves Part 1.

To prove the second part (i.e., $\not>$), recall how the MDS method proceeds from a state $C^x_i$ in which at least one node $v_a$ has $\deg(v_a) = 1$. The method picks $v_a$. Since $v_a$ has exactly one non-compressed neighbor, say $v_b$, MDS merges the two to create the supernode $v_{ab}$ (Figure 4-2). We complete the proof by considering two cases. In the first case, the OPT method merges $v_a$ and $v_b$ while compressing $C^x_i$. We can then assume that OPT takes this as its next step in compressing $C^x_i$, since the same compressed network is obtained under any reordering of the intermediate steps. Therefore, if $v_a$ and $v_b$ are compressed at
any point in the optimal method, then the optimal solution for $C^x_{i+1}$, which is created by applying the MDS method to $C^x_i$, has exactly $\mathrm{OPT}(C^x_i, x) - 1$ compressions. Hence, $\mathrm{OPT}(C^x_i, x) = 1 + \mathrm{OPT}(C^x_{i+1}, x)$, and in particular $\mathrm{OPT}(C^x_i, x) \not> 1 + \mathrm{OPT}(C^x_{i+1}, x)$.

In the second case, $v_a$ and $v_b$ are not merged in the optimal solution. Since $\deg(v_a) = 1$, this case implies that $v_a$ remains a singleton at the end of the x-th level. Removing $v_a$ and all edges connected to it therefore leaves a network that can have at most $\mathrm{OPT}(C^x_i, x)$ compressions until the end of the x-th level; otherwise, the optimality of OPT would be contradicted. Moreover, in this case $v_b$ is either a singleton or merged with some other neighbor $v_c$. Replacing the pair $(v_b, v_c)$ with $(v_a, v_b)$, or simply adding $(v_a, v_b)$ if $v_b$ is a singleton, gives a compression with at least as many steps in which $v_a$ and $v_b$ are merged. Hence the number of compressions achievable when $v_a$ is left as a singleton cannot exceed $1 + \mathrm{OPT}(C^x_{i+1}, x)$. Thus, $\mathrm{OPT}(C^x_i, x) \not> 1 + \mathrm{OPT}(C^x_{i+1}, x)$. Combining this with the first part (i.e., $\not<$), we get $\mathrm{OPT}(C^x_i, x) = 1 + \mathrm{OPT}(C^x_{i+1}, x)$.

Lemma 6. Let $C = (\hat{V}, \hat{E})$ denote a connected component of a given metabolic network P. Let $C^x_i = (\hat{V}^x_i, \hat{E}^x_i)$ denote the state of C after the i-th step of the x-th compression level. Suppose the node with minimum degree in $\hat{V}^x_i$ has degree greater than one. Then the compression step taken by the MDS method to create the next state $C^x_{i+1}$ can increase the size of the optimal compressed network obtainable from the state $C^x_i$ by at most one. Formally,
$\mathrm{OPT}(C^x_i, x) \le 2 + \mathrm{OPT}(C^x_{i+1}, x)$.

Proof: Let $v_a$ be the first node in the list of minimum-degree nodes in $\hat{V}^x_i$. By assumption, $\deg(v_a) > 1$. Hence $v_a$ has at least one non-compressed neighbor $v_b$, and $\deg(v_b) > 1$ as well. Without loss of generality, assume that the MDS method merges $v_a$ and $v_b$ into the supernode $v_{ab}$ in the step from $C^x_i$ to $C^x_{i+1}$. This step can prevent at most one neighbor of $v_a$, say $v_c$, and at most one neighbor of $v_b$, say $v_d$, from being merged with the corresponding node in later steps. Note that $v_c$ and $v_d$ are not necessarily distinct. The MDS algorithm may also merge $v_c$ and $v_d$ in later steps if they are neighbors, although this is not known for sure
at this point. Consequently, MDS performs either one or two compressions using only the four nodes $v_a$, $v_b$, $v_c$ and $v_d$. Next, we count the compression steps the OPT method can take on these four nodes. There are three cases to consider:

Case 1. The OPT method merges $v_a$ with $v_b$ at some point during the x-th level of compression. This case is equivalent to MDS merging $v_a$ with $v_b$ in the next step and OPT then compressing the rest of the network. In other words, MDS already takes the optimal compression step. Hence, $\mathrm{OPT}(C^x_i, x) = 1 + \mathrm{OPT}(C^x_{i+1}, x) \le 2 + \mathrm{OPT}(C^x_{i+1}, x)$.

Case 2. The OPT method merges $v_a$ with $v_c$ at some point during the x-th level of compression. The worst-case scenario for the MDS method here is that $v_c$ is not connected to $v_d$ and the OPT method merges $v_b$ with $v_d$ in a later step. In this scenario, OPT optimally compresses the four nodes into two supernodes, $v_{ac}$ and $v_{bd}$. MDS, on the other hand, creates a single supernode, $v_{ab}$, and $v_c$ and $v_d$ remain singletons. Even in this worst case, however, MDS prevents only one compression step relative to OPT. Hence, $\mathrm{OPT}(C^x_i, x) \le 2 + \mathrm{OPT}(C^x_{i+1}, x)$.

Case 3. The OPT method merges $v_b$ with $v_d$ at some point during the x-th level of compression. This case is proved symmetrically to Case 2.

Using Lemmas 5 and 6, Theorem 3 develops an upper bound on the number of compressions that MDS can miss with respect to the optimal compression.

Theorem 3 (Optimality Bound for MDS). Let P be a metabolic network with r connected components $C_1 = (\hat{V}_1, \hat{E}_1)$, …, $C_r = (\hat{V}_r, \hat{E}_r)$ such that $P = \bigcup_{j=1}^{r} C_j$, and let c be a positive integer giving the desired number of compression levels. Let $C = (\hat{V}, \hat{E})$ denote an arbitrary connected component of P. Also, let s be the number of intermediate steps at which the MDS method finds no non-compressed node with degree one while compressing P into $P^c$. Then each of the following statements holds:
1. $\mathrm{OPT}(C^{x-1}, x) \le 2\,\mathrm{MDS}(C^{x-1}, x)$ for x = 1, …, c
2. $\mathrm{OPT}(P, c) \le s + \mathrm{MDS}(P, c)$
3. $\mathrm{OPT}(P, c) \le \min\{2\,\mathrm{MDS}(P, c),\; s + \mathrm{MDS}(P, c)\}$

Proof: 1. This part follows from Lemmas 5 and 6. Lemma 5 covers the case in which the MDS method is equivalent to OPT. Lemma 6 gives an upper bound on the number of compression steps that MDS can miss. The worst case is when the boundary condition of Lemma 6 holds for every step of the $x$th compression level for $C^{x-1}$. In this case, the number of steps taken by the OPT method while compressing $C^{x-1}$ is twice the number taken by the MDS method.

2. This part also follows from Lemmas 5 and 6. Throughout the compression of the entire network $P$ by $c$ levels, each step of the MDS method that satisfies the condition in Lemma 6 can decrease the number of possible merge operations by one with respect to OPT. By simply counting these steps, at the end of the execution of the MDS method we obtain the upper bound $s + \mathrm{MDS}(P, c)$ on the number of optimal compressions $\mathrm{OPT}(P, c)$.

3. Part 2 shows that $\mathrm{OPT}(P, c) \le s + \mathrm{MDS}(P, c)$, so it only remains to show that $\mathrm{OPT}(P, c) \le 2\,\mathrm{MDS}(P, c)$. Part 1 proves this result for a single connected component $C$ at the $x$th compression level. Before the first level of compression, $P$ is given as $\bigcup_{j=1}^{r} C_j$. By Part 1, $\mathrm{OPT}(C_j, 1) \le 2\,\mathrm{MDS}(C_j, 1)$. Summing this over all $j$ from 1 to $r$ gives $\mathrm{OPT}(P, 1) \le 2\,\mathrm{MDS}(P, 1)$. This inequality holds for each compression level $x$ from 1 to $c$. Summation over $x$ gives $\sum_{x=1}^{c} \mathrm{OPT}(P^{x-1}, x) \le 2 \sum_{x=1}^{c} \mathrm{MDS}(P^{x-1}, x)$. Hence, $\mathrm{OPT}(P, c) \le 2\,\mathrm{MDS}(P, c)$.

Another way to interpret Theorem 3 is to turn it into an upper bound on the size of the compressed network generated by MDS, expressed in terms of the size of the network that OPT can obtain. This transformation answers the first question we posed in the introduction: "How far is our compression method from the optimal compression?" We proceed as follows. Let $P$ be a network of size $n$. For a given compression level $c$, the OPT method takes $\mathrm{OPT}(P, c)$ compression steps. Let $n_{\mathrm{OPT}}$ and $n_{\mathrm{MDS}}$ denote the sizes of the compressed networks obtained by the OPT and MDS methods, respectively. By the bound in Theorem 3, $\mathrm{MDS}(P, c) \ge \lceil \mathrm{OPT}(P, c)/2 \rceil$. Therefore, $n_{\mathrm{OPT}} = n - \mathrm{OPT}(P, c)$ and $n_{\mathrm{MDS}} \le n - \lceil \mathrm{OPT}(P, c)/2 \rceil$. Also, by definition, $\mathrm{OPT}(P, c) \le \sum_{x=1}^{c} \lfloor n/2^x \rfloor$.
Using this inequality, we get:
$$n_{\mathrm{OPT}} \ge n - \sum_{x=1}^{c} \left\lfloor \frac{n}{2^x} \right\rfloor, \qquad n_{\mathrm{MDS}} \le n - \left\lceil \sum_{x=1}^{c} \left\lfloor \frac{n}{2^{x+1}} \right\rfloor \right\rceil.$$
For $c = 1$, the ratio $n_{\mathrm{MDS}}/n_{\mathrm{OPT}}$ satisfies $n_{\mathrm{MDS}}/n_{\mathrm{OPT}} \le 3/2$ for arbitrary $n$ (details omitted). So after one level of compression, the compressed network found by our method is at most 1.5 times the size of the optimally compressed network. After $x$ levels ($x = 1, 2, \ldots, c$), this bound on the ratio grows in proportion to $1.5^x$. The bound on the number of compression steps in the second statement of Theorem 3 yields a similar upper bound on the size of the compressed network found by MDS. The tighter of these two upper bounds can be calculated during the execution of the MDS method and reported as an indicator of how much room is left for improving the compression.

4.2 Alignment Framework

Section 4.1 described the first phase, the compression phase, in detail. Here, we first summarize SubMAP [96, 97], the base alignment method we use in our framework (Section 4.2.1). Then, we explain the two remaining phases of our framework, the alignment phase and the refinement phase. The alignment phase follows the compression phase and uses the base method to find an alignment in the compressed domain (Section 4.2.2). The refinement phase applies the base method to the mappings found in the previous phase to further refine the alignment results (Section 4.2.3). After describing all the phases, we analyze the complexity of each phase and combine these results into the complexity of the entire framework (Section 4.2.4). Last, we provide a guideline for selecting the compression level that is expected to give our framework the best performance gain over the base alignment method (Section 4.2.5).
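The way the alignment and refinement phases fit together can be illustrated with a minimal Python sketch for the case $k = 1$, which the Figure 4-1F walk-through in Section 4.2.3 also uses. This is not SubMAP. A greedy one-to-one matcher (`greedy_align`) stands in for the base alignment method, the score of two supernodes is assumed to be the mean similarity of the reactions they contain, and the compression phase is assumed to have already produced the supernodes, encoded here as nested tuples. All function names and similarity values are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple, Union

# Hypothetical encoding: an original reaction is a str; a supernode is a
# tuple holding its one or two child nodes from the previous level.
Node = Union[str, tuple]
Score = Callable[[Node, Node], float]


def reactions(node: Node) -> List[str]:
    """All original reactions contained in a (super)node."""
    if isinstance(node, str):
        return [node]
    return [r for child in node for r in reactions(child)]


def decompress(node: Node) -> List[Node]:
    """Undo one compression level: a supernode is replaced by its children."""
    return [node] if isinstance(node, str) else list(node)


def greedy_align(left: List[Node], right: List[Node], score: Score) -> List[Tuple[Node, Node]]:
    """Stand-in base aligner (NOT SubMAP): greedy one-to-one matching by
    decreasing score, so each node appears in at most one mapping."""
    candidates = sorted(
        ((score(a, b), i, j) for (i, a), (j, b) in product(enumerate(left), enumerate(right))),
        reverse=True,
    )
    used_left, used_right = set(), set()
    mappings: List[Tuple[Node, Node]] = []
    for s, i, j in candidates:
        if s <= 0:
            break  # remaining pairs cannot increase the alignment score
        if i not in used_left and j not in used_right:
            used_left.add(i)
            used_right.add(j)
            mappings.append((left[i], right[j]))
    return mappings


def align(left: List[Node], right: List[Node], score: Score) -> List[Tuple[str, str]]:
    """Alignment phase on the given (compressed) nodes, followed by the
    refinement phase: each mapping that still involves a supernode is
    decompressed by one level and aligned recursively until only original
    reactions (c = 0) remain."""
    result: List[Tuple[str, str]] = []
    for a, b in greedy_align(left, right, score):
        if isinstance(a, str) and isinstance(b, str):
            result.append((a, b))
        else:
            result.extend(align(decompress(a), decompress(b), score))
    return result


# Hypothetical reaction similarities; unlisted pairs score 0.
SIM: Dict[Tuple[str, str], float] = {
    ("a", "a'"): 1.0, ("b", "b'"): 1.0, ("d", "d'"): 1.0,
    ("e", "c'"): 0.8, ("c", "c'"): 0.3,
}


def mean_similarity(x: Node, y: Node) -> float:
    """Assumed supernode score: mean similarity over contained reaction pairs."""
    pairs = list(product(reactions(x), reactions(y)))
    return sum(SIM.get(p, 0.0) for p in pairs) / len(pairs)


# One level of compression, loosely following the Figure 4-1F example.
left = [("a", "b"), ("e", "d"), "c"]
right = [("a'", "b'"), ("c'", "d'")]
print(align(left, right, mean_similarity))
```

With these toy scores, the top-level alignment maps $(a, b)$ to $(a', b')$ and $(e, d)$ to $(c', d')$. The refinement phase then resolves these into $a - a'$, $b - b'$, $d - d'$ and $e - c'$, and $c$ stays unmapped.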

4.2.1 Overview of SubMAP

Here, we take a short detour to explain SubMAP, a recent method for aligning two metabolic networks when they are not compressed. SubMAP can be replaced with another method with almost no changes to our framework. We chose SubMAP for its high accuracy and biological relevance, since it considers subnetworks of the given networks during the alignment. A subnetwork of a network is a subset of the reactions of that network such that the induced undirected graph of this subset is connected. Given two metabolic networks $P = (V, E)$ and $\bar{P} = (\bar{V}, \bar{E})$ and a positive integer $k$, SubMAP aims to find the set of mappings between the reactions of $P$ and $\bar{P}$ with the largest similarity score such that (i) each reaction in $P$ ($\bar{P}$) can map to a subnetwork of $\bar{P}$ ($P$) with at most $k$ reactions, and (ii) each reaction of $P$ and $\bar{P}$ can appear in at most one mapping. The first condition allows one-to-many mappings of reactions, which is essential for capturing functionally similar parts of $P$ and $\bar{P}$ that differ in size. The second condition enforces consistency of the mappings found by the alignment: no two mappings in the alignment may map one node of a network to two different nodes of the other.

The first step of SubMAP is to create the set of all possible subnetworks of size at most $k$ for each query network. We denote the numbers of these subnetworks for $P$ and $\bar{P}$ by $N_k$ and $M_k$, respectively. The second step of SubMAP calculates the pairwise similarity of every pair of these subnetworks (one from $P$ and one from $\bar{P}$). We refer the reader to Chapter 3 for the details of the pairwise similarity score. The third step dominates the time and space complexity of SubMAP. Its aim is to create a similarity score that combines the pairwise similarities with the topological similarity of the networks. For this purpose, SubMAP builds a data structure called the support matrix. The size of this matrix is quadratic in the numbers of subnetworks of both query networks. In other words, the support matrix requires $O(N_k^2 M_k^2)$ space. This complexity is very important, as it is the dominating factor in the
overall time and space complexity of SubMAP, and it also determines the complexity of the next two steps.

The fourth step uses the power method to iteratively distribute the pairwise similarity scores of the subnetwork mappings to their neighboring mappings, so that neighboring mappings support each other's alignment. The fifth and last step of SubMAP constructs a vertex-weighted conflict graph from the similarity scores found in the fourth step and finds a maximum weight independent set (MWIS) of this graph. The maximum weight corresponds to the largest alignment score, and the independence of the set ensures the consistency of the mappings reported by SubMAP. The resulting set is a set of subnetwork mappings between $P$ and $\bar{P}$.

4.2.2 Alignment Phase

The SubMAP method described above aligns the networks $P = (V, E)$ and $\bar{P} = (\bar{V}, \bar{E})$ in their original form. Our framework first compresses each of these networks to reduce their sizes and then aligns the compressed networks instead of $P$ and $\bar{P}$. In this section, we explain how we use SubMAP, with a given parameter $k$, to align the compressed networks $P^c$ and $\bar{P}^c$ in the compressed domain of level $c$.

Let us first consider $P^c = (V^c, E^c)$. Each node $v_a$ in $V^c$ is a supernode of reactions in $V$. By the way our compression method works, each supernode $v_a$ contains at most $2^c$ reactions. An edge from node $v_a$ to node $v_b$ exists in $E^c$ if and only if at least one reaction in $v_a$ has an edge in $E$ to a reaction in $v_b$. The same holds for the other network, $\bar{P}^c$. To align these compressed networks, we treat their nodes, which are supernodes of reactions, as if they were the reactions of the metabolic networks $P^c$ and $\bar{P}^c$. This lets us apply SubMAP directly. As far as the operation of SubMAP is concerned, this is no different from aligning two identical networks in the original domain. The difference lies in how the intermediate steps are interpreted and in the form of the mappings found by the alignment. For instance,
in the first step of SubMAP, the original domain enumerates reaction subnetworks of size at most $k$. In the compressed domain, we instead enumerate subnetworks of supernodes, where each supernode can contain more than one reaction and each subnetwork contains at most $k$ supernodes. Similarly, we calculate the pairwise similarity, the support matrix and the conflict graph for subnetworks of supernodes (i.e., nodes of $V^c$) instead of subnetworks of reactions (i.e., nodes of $V$).

The resulting alignment gives us a set of mappings between the subnetworks of $P^c$ and $\bar{P}^c$. These mappings can be thought of as a high-level view of the alignment between $P$ and $\bar{P}$. For instance, Figure 4-1F shows at once that the resulting alignment will map node $a$ either to node $a'$ or to node $b'$. These are the only options for node $a$, as imposed by the higher-level supernode mapping $(a, b) - (a', b')$. In the next phase, we treat each of these supernode mappings as a smaller instance of the alignment problem and solve it to obtain a more refined alignment of $P$ and $\bar{P}$.

4.2.3 Refinement Phase

Each mapping found by the alignment phase is a pair of subnetworks, one from $P^c$ and the other from $\bar{P}^c$. The mappings found by SubMAP can have up to $k$ nodes in one subnetwork and only one node in the other. If we denote a subnetwork of $P^c$ by $R^c_i$ and a subnetwork of $\bar{P}^c$ by $\bar{R}^c_j$, the alignment phase produces mappings of the form $R^c_i - \bar{R}^c_j$. Without loss of generality, we can assume for this pair that $R^c_i$ contains up to $k$ nodes of $P^c$ and $\bar{R}^c_j$ contains a single node of $\bar{P}^c$. Each node in either subnetwork is a supernode that contains, from the previous level of compression (the $(c-1)$th level), either one node or two nodes and the edge between them. For both $R^c_i$ and $\bar{R}^c_j$, we decompress their nodes by one level, retrieving the connectivity between these nodes at the $(c-1)$th compression level that was encapsulated at the $c$th level. This decompression yields at most $2k$ nodes from the $(c-1)$th level for $R^c_i$ and at most 2 nodes from the $(c-1)$th level for $\bar{R}^c_j$. We then recursively
align these smaller networks generated from $R^c_i$ and $\bar{R}^c_j$ using SubMAP until the original domain (i.e., $c = 0$) is reached. At the $(c-x)$th recursive step, the two networks to be aligned contain at most $k \cdot 2^x$ and $2^x$ reactions from the original domain, respectively.

Figure 4-1F illustrates this on a concrete example. The network on the left has two supernodes, $(a, b)$ and $(e, d)$, each containing two nodes with an edge between them. It also has one supernode, $c$, that contains only one node from the previous level of compression. The network on the right has two supernodes with two nodes each. To see how decompression by one level works, consider the supernode mapping $(e, d) - (c', d')$, found at compression level one. Decompression can be thought of as removing the circles around these supernodes to recover the connectivity among their nodes at the previous compression level. Here, this yields two small networks: one with nodes $d$ and $e$, and one with nodes $c'$ and $d'$. We align these small networks recursively using SubMAP. Since the compression level is only one, a single recursive call produces their final alignment. Also, since $k = 1$ in this example, the networks on each side of the recursive call from $c = 1$ contain at most 2 nodes from the original domain, as Figure 4-1F shows (i.e., $k \cdot 2^c = 2^c = 2$ for $k = c = 1$).

4.2.4 Complexity Analysis

With all three phases described, we can now analyze the overall complexity of our framework. We start with the first phase, the compression of the input networks $P$ and $\bar{P}$ by $c$ levels, and first calculate the complexity of the first compression level for the network $P$ of size $n$. At each compression step, MDS first searches for a minimum-degree node. Once it finds this node, it picks one of its neighbors and merges the two nodes. After the merge, it updates the degrees of all neighbors of both merged nodes. The first two operations take $O(\log n)$ time if proper data structures are used, and the last one can take $O(n)$ time in the worst
case. Since the size of network $P$ is $n$, the first level of compression has at most $\lfloor n/2 \rfloor$ compression steps. Hence, the first compression level costs $O(n^2)$. Because the input of this level is larger than the input of any later level, we can safely assume that each later level also takes $O(n^2)$ time, so compression by $c$ levels costs $O(cn^2)$. This bound is not tight, but it suffices here because the complexity of the next two phases dominates it. Since we compress both networks, the overall complexity of the compression phase is
$$O\left(c\,(n^2 + m^2)\right).$$

For the analysis of the next phases, we make two assumptions, both supported by experimental evidence on the topological properties of metabolic networks. The first assumption is that each level of compression halves the network size. In other words, if the query networks have sizes $n$ and $m$, then after $c$ levels of MDS compression the compressed networks have sizes $n_{\mathrm{MDS}} = \lceil n/2^c \rceil$ and $m_{\mathrm{MDS}} = \lceil m/2^c \rceil$, respectively. This is mainly because metabolic networks contain many low-degree nodes [95]. Our experiments on a large dataset of metabolic networks, summarized in Table 4-2, also support this assumption. The second assumption is that, for small values of $k$, the number of subnetworks is a constant multiple of the network size. In other words, $N_{\mathrm{MDS}} = \alpha_k n_{\mathrm{MDS}}$ and $M_{\mathrm{MDS}} = \beta_k m_{\mathrm{MDS}}$, where $\alpha_k$ and $\beta_k$ depend on $k$ but not on $n$ and $m$, respectively. Our earlier analysis in Chapter 3 showed that, for a large set of metabolic networks, the number of subnetworks for $k = 3$ (the largest $k$ value we use here) is on the order of $5|V|$.

We can now analyze the complexity of the second phase, the alignment phase. By the first assumption, the sizes of $P^c$ and $\bar{P}^c$ are $n_{\mathrm{MDS}} = \lceil n/2^c \rceil$ and $m_{\mathrm{MDS}} = \lceil m/2^c \rceil$, respectively. By the second, these networks have $N_{\mathrm{MDS}} = \alpha_k n_{\mathrm{MDS}}$ and $M_{\mathrm{MDS}} = \beta_k m_{\mathrm{MDS}}$ subnetworks for a given $k$. We also know that the complexity of SubMAP is quadratic in both $N_{\mathrm{MDS}}$ and $M_{\mathrm{MDS}}$
(Section 4.2.1). Therefore, the complexity of the second phase is
$$O\left(\frac{\alpha_k^2 \beta_k^2 n^2 m^2}{2^{4c}}\right).$$

The complexity of the refinement phase has two factors. The first is the number of mappings found by the alignment phase. Since SubMAP reports each node of either network in at most one mapping, the number of possible mappings has a trivial upper bound in terms of $n$ and $m$. The largest number of mappings occurs when all subnetworks of both networks are singletons, in which case the number of reported mappings is the minimum of $n$ and $m$. We can assume without loss of generality that $n$

4.2.5 How Much Should We Compress?

In this section, we give a guideline for choosing the compression level $c$ that minimizes the running time of our framework when aligning the query networks for a given $k$. The proof of the theorem below relies heavily on the complexity results of Section 4.2.4. The theorem gives the optimal $c$ for a given value of $k$ and two query networks of sizes $n$ and $m$. It answers the question "What is the right amount of compression to use in order to minimize the running time of our framework?"

Theorem 4 (Optimal level of compression). Let $P = (V, E)$ and $\bar{P} = (\bar{V}, \bar{E})$ be two metabolic networks with sizes $n$ and $m$, respectively, and let $k$ be a given positive integer. Assume without loss of generality that $n$

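given $k$ value as
$$\frac{\alpha_k^2 \beta_k^2 n^2 m^2}{2^{4c}} + \alpha_k^2 \beta_k^2 n k^2\, 2^{4c}.$$
Our aim is to maximize, with respect to $c$, the difference between the cost of alignment without compression and this cost. We know that this difference is negative (i.e., alignment in the compressed domain is costlier) when $c$ … $n$, assuming $n$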

get: $c \approx 1.37$. Rounding this to the nearest integer, Equation 4 suggests using only one level of compression for this alignment problem. The same calculation for another set of inputs, $n = m = 80$ and $k = 2$, gives about 2.15, which suggests that two levels of compression are likely to give the best running-time improvement for that instance.

4.3 Results and Discussion

In this section, we evaluate the performance of our framework experimentally. First, we measure the compression rates achieved for different values of $c$. We also compare the compression rates of the MDS method, which selects the first node with minimum degree at each step, with those of several compressions obtained by randomizing this node selection (Section 4.3.1). Then, we analyze the gains in running time and memory utilization that our framework achieves for different values of $c$ and $k$ (Section 4.3.2). Last, we examine the accuracy of the alignments found by our framework. We measure accuracy as Pearson's correlation coefficient between the mapping scores from alignment in the compressed domain and those from alignment of the uncompressed networks (Section 4.3.3).

Dataset: We use the metabolic networks from the KEGG pathway database [7]. We downloaded all metabolic networks with at least 10 reactions for 20 different organisms, giving 620 networks in total with sizes ranging from 10 to 97. To obtain larger networks, we selected 10 of these organisms and, for each of them, combined all metabolic networks belonging to carbohydrate metabolism into one larger network. We similarly combined the cofactor and vitamin metabolism networks of each of these 10 organisms. These 20 combined networks have between 59 and 279 reactions. In total, our dataset contains 640 metabolic networks with sizes in the interval [10, 279].

Implementation and system details: We implemented our compression and alignment algorithms in C++. We ran all experiments on a desktop computer running Ubuntu 10.10 with 4 GB of RAM and two 2.66 GHz processors.

4.3.1 Evaluation of Compression Rates

The efficiency of our alignment framework depends on how much the query metabolic networks can be compressed. In this experiment, we therefore measure the numbers of nodes and edges of the networks in our dataset before and after compression. Recall that at each intermediate step, the MDS method selects the first node from the list of minimum-degree nodes and compresses it with the first neighbor in its neighbor list. To evaluate the stability of our compression method, we generated several different compressed networks for each network in our dataset by randomizing the minimum-degree selection step. Below, we examine how much compression the MDS method achieves and discuss its stability.

Table 4-2 summarizes the compression rates our method achieves for networks of different sizes. We divide the metabolic networks in our dataset into six groups according to their number of reactions (i.e., network size). The first column of Table 4-2 lists the network size interval of each group. Each row shows the numbers of nodes and edges averaged over all networks in the group, before and after compression. The two columns with $c = 0$ give the average numbers of nodes and edges, respectively, of the uncompressed networks. For $c \in \{1, 2, 3\}$, each row corresponding to an interval is split into two. The upper part gives the average node and edge counts of the networks compressed by the MDS method. The lower part gives the counts obtained when node selection is randomized. That is, at each compression step with more than one minimum-degree node, we select one of them at random, and we repeat this process 10 times to obtain different compressions. Each value in the
lower portion of the rows in Table 4-2 is the average of the corresponding value over these 10 compression runs.

One conclusion from Table 4-2 is that our compression method performs well in practice regardless of network size. On average, a single level of compression reduces a network to 62%, 68% and 77% of its size at the previous level for $c = 1$, 2 and 3, respectively. In other words, our method compresses the networks of the entire dataset down to 62%, 42% and 33% of their original sizes for $c = 1$, 2 and 3, respectively. These values suggest that our framework has great potential for scaling network alignment to large metabolic networks. As an example, consider the row for the interval [80, 100) in Table 4-2. Instead of aligning a network with 88 nodes and 309 edges, we can first apply three levels of compression and then align a much smaller network with only 20 nodes and 60 edges. Another observation is that most of the reduction in network size comes from the first compression level. Our method compresses the networks aggressively for $c = 1$, reducing them to 62% of their original size, which is close to half. As $c$ increases, each additional level compresses less.

Another result of this experiment is that the MDS method is stable. In other words, it is not affected by the choice of node to compress, as long as that node is chosen from among the minimum-degree nodes. In the row for the interval [80, 100), for $c = 1, 2, 3$, the differences between the two versions are less than one for the number of nodes and less than three for the number of edges. Some other rows show slightly larger differences (e.g., the row for the interval [100, 100+]), but none of them are significant. From these results, we conclude that our compression method is stable and serves as an efficient first phase of our alignment framework, since it achieves good compression rates on a large dataset of metabolic networks.
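The compression step recalled above, together with the randomized tie-breaking used for the stability test, can be sketched in a few lines of Python. This is a minimal illustration, not the authors' C++ implementation. The node representation (each supernode is the frozenset of reactions it contains), the helper names and the convention that a node's degree counts only neighbors not yet merged at the current level are assumptions. For clarity, the sketch recomputes degrees at every step instead of using the data structures assumed in the complexity analysis of Section 4.2.4.

```python
import random
from typing import Dict, FrozenSet, Iterable, Optional, Set, Tuple

Node = FrozenSet[str]          # a (super)node = the set of reactions it contains
Graph = Dict[Node, Set[Node]]  # undirected adjacency sets


def build_graph(edges: Iterable[Tuple[str, str]]) -> Graph:
    """Uncompressed network (level 0) from reaction-reaction edges."""
    g: Graph = {}
    for a, b in edges:
        u, v = frozenset([a]), frozenset([b])
        g.setdefault(u, set()).add(v)
        g.setdefault(v, set()).add(u)
    return g


def mds_level(g: Graph, rng: Optional[random.Random] = None) -> Graph:
    """One level of minimum-degree-selection compression. Repeatedly take
    the first minimum-degree node among those not yet merged at this level
    (a random one among ties if rng is given) and merge it with its first
    not-yet-merged neighbour."""
    order = list(g)                 # fixed node order defines "first"
    free = set(g)                   # nodes not merged at this level yet
    owner: Dict[Node, Node] = {}    # node -> supernode that absorbs it
    while True:
        # Assumption: degree counts only neighbours that are still free.
        deg = {v: sum(u in free for u in g[v]) for v in order if v in free}
        cands = [v for v, d in deg.items() if d > 0]
        if not cands:
            break
        dmin = min(deg[v] for v in cands)
        ties = [v for v in cands if deg[v] == dmin]
        v = rng.choice(ties) if rng is not None else ties[0]
        u = next(w for w in order if w in free and w in g[v])
        free -= {u, v}
        owner[u] = owner[v] = u | v
    for v in order:
        owner.setdefault(v, v)      # unmerged nodes stay as singletons
    # Supernodes are adjacent iff some pair of their members was adjacent.
    new: Graph = {s: set() for s in owner.values()}
    for v, nbrs in g.items():
        for w in nbrs:
            if owner[v] != owner[w]:
                new[owner[v]].add(owner[w])
    return new


def compress(g: Graph, c: int, rng: Optional[random.Random] = None) -> Graph:
    """Apply c compression levels, P -> P^c."""
    for _ in range(c):
        g = mds_level(g, rng)
    return g


def size(g: Graph) -> Tuple[int, int]:
    """(number of nodes, number of undirected edges)."""
    return len(g), sum(len(nbrs) for nbrs in g.values()) // 2


# Toy stability check on a five-reaction path: deterministic version vs.
# 10 runs with random tie-breaking, analogous to the randomized rows of Table 4-2.
path = build_graph([("r1", "r2"), ("r2", "r3"), ("r3", "r4"), ("r4", "r5")])
for c in (1, 2):
    runs = [size(compress(path, c, random.Random(seed))) for seed in range(10)]
    avg = tuple(sum(col) / len(runs) for col in zip(*runs))
    print(c, size(compress(path, c)), avg)
```

Because each node merges at most once per level, every supernode at level $c$ holds at most $2^c$ reactions, matching the property used in Section 4.2.2. On this toy path, the deterministic and randomized variants give identical node and edge counts at both levels.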

4.3.2 Evaluation of Running Time and Memory Utilization

To understand the capabilities and limitations of our framework, we examine its running time and memory utilization on a set of networks from our KEGG-derived dataset. We used different combinations of $k$ and $c$ values and networks whose sizes span a broad range. When $c$ is zero, a single run of SubMAP performs the entire alignment without any compression. This lets us measure how much our compression-based framework gains over SubMAP without compression.

To measure how resource utilization changes with $k$, $c$ and network size, we generated a large number of alignments between the networks in our dataset. We created a query set by selecting networks of varying sizes from the interval [20, 279]. As our database set, we selected 10 networks whose sizes are very close to the multiples of ten from 10 to 100. The average network size of this set is 55, more than double the average size of the overall dataset (26.66). For each query network and each combination of $k$ and $c$ values, we align the query with every network in the database set. This gives twelve combinations in total, for $k = 1, 2, 3$ and $c = 0, 1, 2, 3$.

Figure 4-3A shows the average running time of our framework for query networks with increasing numbers of reactions when $k = 1$. We plot the results for all four compression values and draw fitting curves to show the trend in running time more clearly. The figure shows that, for $k = 1$, each additional compression level improves the running time over the previous one for all query sizes. The first level of compression gives the largest gain in running time. This is expected, since the first level achieves the largest compression rate, as Table 4-2 shows. The second compression level improves the running time by a smaller factor compared
to the first level and by a larger factor compared to the third level. For $k = 1$, we could plot all points for all $c$ values, because even the largest query network (size 279) runs in a practical time, about 100 seconds, with no compression ($c = 0$).

Figure 4-3B is constructed like Figure 4-3A, but with $k = 2$. Most of the alignments in the original domain ($c = 0$) did not finish within a cutoff time, which we set to one hour. This is because the number of subnetworks grows significantly when $k$ is increased to two. We therefore did not plot the running times for $c = 0$. To simplify the figure, we also omitted the plot for $c = 1$, since it is very similar to the results for $c = 2$. Whether a point is plotted depends on the percentage of alignment queries that finished before the cutoff. If at least eight (i.e., 80%) of the ten alignments for a query network finish before the cutoff, we plot the running time for that query size and $c$ value. For a given network size in Figure 4-3B, compressing by three levels instead of two decreases the running time significantly. For instance, for the network size closest to 200, the average running time of a query is about 20 minutes for $c = 2$ and drops to almost 2 minutes for $c = 3$. In addition, the points missing from Figure 4-3B because of the one-hour cutoff suggest that, with the right amount of compression, our framework can align networks that the base method (SubMAP in our case) could not. We believe this is an important step toward larger-scale network alignments, since these give a more complete picture of the functional similarities and evolutionary differences between the metabolic networks of two or more organisms.

Because of space constraints, we omit the details and figures for memory utilization. In brief, the memory required for alignment in the compressed domain is, on average, about 30% of that needed for alignment without compression using SubMAP. Therefore, our framework demonstrates great potential
to provide significant improvements in both the running time and the memory utilization of the base alignment method. This lets us align large networks that existing methods could not align on the same hardware.

Figure 4-3. The average running time of our framework when each query network is aligned with all the networks in the selected database set: (A) $k = 1$ and (B) $k = 2$. The x-axis is the network size in number of reactions. $c = 0$ denotes alignments performed with no compression; $c = 1, 2, 3$ denote the results of our framework, which compresses both the query and the database networks by $c$ levels before aligning them.

4.3.3 Accuracy of the Alignment Results

We conclude our experimental results by answering the last open question from the introduction: how does compression affect alignment accuracy? To answer it, we calculate the correlation between the score of each possible mapping in the compressed domain and the score that the original SubMAP method gives the same mapping. We consider the scores of all possible subnetwork mappings of compressed nodes found by our framework. Since the mappings found by SubMAP have a different form from the mappings in the compressed domain, we derive a score for each compressed-domain mapping from the scores of the mappings found by SubMAP. This gives us two sets of scores (one from SubMAP and one from our framework) for the same set of mappings. We use Pearson's correlation coefficient between these two sets of scores as an indicator of how similar the results of the two methods are.

Before presenting the correlation values, we describe how a compressed-domain mapping is scored from the mappings of SubMAP. Let $P^1$ and $\bar{P}^1$ denote the one-level compressed forms of two metabolic networks. Let $v_1 - \{\bar{v}_1, \bar{v}_2\}$ denote a mapping in the compressed domain, where $v_1$ is a subnetwork of $P^1$ and $\{\bar{v}_1, \bar{v}_2\}$ is a subnetwork of $\bar{P}^1$. Also, let $v_1 = \{r_1, r_2\}$, $\bar{v}_1 = \{\bar{r}_1, \bar{r}_2\}$ and $\bar{v}_2 = \{\bar{r}_3\}$. The edge that maps these two subnetworks has a mapping score in the compressed domain, which we denote by $|e^1|$ for $c = 1$. We want to compute, from the mappings in the original domain, a score $|e|$ for $v_1 - \{\bar{v}_1, \bar{v}_2\}$ that is comparable to $|e^1|$. In the original domain, this compressed-domain mapping contains six possible mappings: $r_1 - \bar{r}_1$, $r_1 - \bar{r}_2$, $r_1 - \bar{r}_3$, $r_2 - \bar{r}_1$, $r_2 - \bar{r}_2$ and $r_2 - \bar{r}_3$. Let $|e_i|$, for $i = 1, 2, \ldots, 6$, denote the original-domain scores of these mappings in the order listed. We then compute the mapping score as
$$|e| = \frac{1}{6} \sum_{i=1}^{6} |e_i|.$$
This score is a conservative choice among the possible scoring options, because the average can include SubMAP scores of subnetwork pairs with very low similarity in the original domain. This can underestimate the correct mapping score $|e|$ and hence lower the correlation between compressed-domain and original-domain scores. In general, for each compressed-domain mapping with score $|e^c|$, we compute the corresponding original-domain score $|e|$ as this average.

Table 4-3 summarizes the correlation values obtained from a total of 3600 alignments. For each of the nine combinations of $k = 1, 2, 3$ and $c = 1, 2, 3$, we ran 400 alignments in the compressed domain and correlated each with the original-domain alignment (i.e., $c = 0$) that has the same $k$ value. Table 4-3 reports the average correlation over these 400 alignments for each combination of $k$ and $c$. The first column shows that the alignment found with only one compression level is highly similar to the alignment found by the base method (average correlation 0.86). Together with the running-time gain in Figure 4-3A for $c = 1$, this strongly suggests that one level of compression not only improves running time significantly but also closely reproduces the original alignment results. Measured by correlation, accuracy drops to 0.57 on average with the second level of compression and to 0.51 with the third. These results suggest that one level of compression can almost always be used to gain much performance without losing much alignment accuracy. For $c = 2$ and $c = 3$, the results are significantly better than random, but these levels should be used with caution when alignment accuracy is the main concern.

4.3.4 Discussion

In this chapter, we considered the problem of aligning two metabolic networks, particularly when both are too large for existing methods. To solve this problem, we developed a framework that significantly increases the size of the metabolic networks that existing methods can align. The framework is generic: it can improve the scalability of any existing network alignment method. It has three major phases: the compression phase, the alignment phase and the refinement phase. For the first phase, we developed an algorithm that transforms the given metabolic networks into a compressed domain, where they are summarized with far fewer nodes (termed supernodes) and interactions. In the second phase, we carried out the alignment in the compressed domain using an existing method, SubMAP, as the base alignment algorithm. In the refinement phase, we considered each mapping of supernodes individually. Each such mapping corresponds to a smaller instance of network alignment, which we solved using SubMAP as our base method. Our experiments on metabolic networks extracted from the KEGG pathway database show that our compression method reduces the number of reactions by almost half at each level of compression. As a result, SubMAP combined with our framework can align networks at least twice as large as its original version can with the same resources. Our results also suggest that alignment with only one level of compression gains significant performance while capturing the original alignment results with very high accuracy. We believe that this method is an important step toward scaling metabolic network alignment to real-sized networks, and that it will therefore help make existing computational network alignment methods useful for domain scientists.

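Table 4-1. Commonly used symbols in this chapter.

| Symbol | Description |
|---|---|
| $P = (V, E)$, $\bar{P} = (\bar{V}, \bar{E})$ | Query metabolic networks |
| $V$, $\bar{V}$ | Sets of all reactions of the query networks |
| $r_i \in V$, $\bar{r}_j \in \bar{V}$ | Reactions of the query networks |
| $n = \lvert V \rvert$, $m = \lvert \bar{V} \rvert$ | Sizes of the query networks |
| $c$, $2^c$ | Compression level and compression rate |
| $P^c = (V^c, E^c)$ | $P$ after $c$ levels of compression |
| $C_i = (\hat{V}_i, \hat{E}_i)$ | A connected component of network $P$ |
| $N(v_a)$, $\deg(v_a)$ | The set of neighbors and the degree of node $v_a$ |
| $\lvert v_a \rvert$ | Number of reactions contained in $v_a$ |
| $v_{ab}$ | A supernode containing the nodes $v_a$ and $v_b$ |
| $k$ | Parameter for the largest subnetwork size |
| $\mathcal{R}_k$, $\bar{\mathcal{R}}_k$ | Sets of all subnetworks of size at most $k$ |
| $R_i$, $\bar{R}_j$ | Subnetworks of the query networks |
| $N_k$, $M_k$ | Numbers of all subnetworks of size at most $k$ |

Table 4-2. Summary of compression rates for all the networks in our dataset. We create six intervals according to the number of reactions in these networks. Each interval shows the average numbers of nodes and edges before compression (i.e., c = 0) and after compression of different levels (i.e., c = 1, 2, 3), both by the MDS method (top entries, "MDS") and by its randomized version averaged over 10 different runs (bottom entries, "Randomized").

| Network size interval | Version | Nodes, c=0 | Nodes, c=1 | Nodes, c=2 | Nodes, c=3 | Edges, c=0 | Edges, c=1 | Edges, c=2 | Edges, c=3 |
|---|---|---|---|---|---|---|---|---|---|
| [10, 20) | MDS | 14.05 | 8.85 | 6.37 | 5.25 | 18.37 | 9.22 | 4.80 | 1.95 |
| | Randomized | | 8.86 | 6.43 | 5.17 | | 9.21 | 4.85 | 2.18 |
| [20, 40) | MDS | 26.74 | 15.91 | 10.55 | 8.10 | 46.32 | 25.83 | 15.44 | 6.82 |
| | Randomized | | 15.91 | 10.72 | 8.05 | | 25.83 | 15.66 | 7.72 |
| [40, 60) | MDS | 47.95 | 30.81 | 21.64 | 17.67 | 76.76 | 45.07 | 32.00 | 20.24 |
| | Randomized | | 30.80 | 21.76 | 16.99 | | 44.91 | 33.18 | 21.52 |
| [60, 80) | MDS | 69.90 | 40.10 | 26.00 | 19.50 | 198.30 | 113.70 | 74.40 | 45.90 |
| | Randomized | | 40.16 | 26.15 | 18.89 | | 113.96 | 75.84 | 48.41 |
| [80, 100) | MDS | 88.25 | 47.75 | 27.75 | 19.88 | 309.00 | 165.63 | 98.13 | 59.63 |
| | Randomized | | 47.79 | 27.69 | 19.36 | | 165.83 | 99.38 | 56.83 |
| [100, 100+] | MDS | 173.44 | 98.44 | 61.31 | 47.44 | 1619.88 | 930.81 | 515.44 | 248.19 |
| | Randomized | | 98.32 | 61.33 | 45.51 | | 924.46 | 518.18 | 276.47 |
| All | MDS | 26.66 | 16.06 | 10.83 | 8.56 | 78.82 | 44.22 | 25.46 | 12.48 |
| | Randomized | | 16.07 | 10.94 | 8.39 | | 44.05 | 25.75 | 13.72 |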

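Table 4-3. Correlation of the mapping scores found by SubMAP and by our framework.

| k/c | 1 | 2 | 3 | Average |
|---|---|---|---|---|
| 1 | 0.89 | 0.56 | 0.53 | 0.66 |
| 2 | 0.85 | 0.58 | 0.50 | 0.64 |
| 3 | 0.84 | 0.57 | 0.49 | 0.64 |
| Average | 0.86 | 0.57 | 0.51 | 0.65 |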

CHAPTER 5
FUNCTIONAL SIMILARITIES OF REACTION SETS IN METABOLIC NETWORKS

The biological functions of two reaction sets can be equivalent even when their size, connectivity, intermediate products and catalyzing enzymes are different, as we have seen in the previous chapters. Therefore, there can be cases in which neither the topological features (e.g., centrality) nor the homological similarities (e.g., enzyme similarity, compound similarity) provide sufficient information for identifying such functional similarities. Besides homology- and topology-based methods, another common way to analyze metabolic networks is to identify their metabolic capabilities in terms of their steady states. A steady state of a network is a feasible flux distribution that represents a possible long-term outcome of that network. These states define a polyhedral cone in a high dimensional space in which the fluxes of the network correspond to dimensions. Figure 5-1 depicts an example of a metabolic flux cone of a hypothetical network with three fluxes.

Figure 5-1. An illustration of the impact of a reaction set R on network P. P has five elementary flux modes (EFMs) denoted by e_1, ..., e_5. After inhibiting all reactions in R, only four of these remain feasible, namely e_1, e_2, e_4 and e_5. The impact of R on P is the change in the flux cone, represented by the shaded polyhedral cone bounded by e_2, e_3 and e_4.

A number of models have been proposed to analyze the metabolic capabilities of a network, such as elementary flux modes (EFMs) [98], extreme pathways (EPs) [99], flux balance analysis (FBA) [100] and minimal metabolic behaviors (MMBs) [101]. All these models are different interpretations of the flux cone. They compute this flux cone from the stoichiometric constraints of the reactions of the underlying network, in terms of its extreme rays emanating from the origin of the high dimensional flux space. We elaborate on these models and on some important properties of the metabolic flux cone in Section 5.1. At this point, we explain the concept of an EFM on a real example, as it is an integral part of the rest of the chapter.

Example 1. Figure 5-2 illustrates a metabolic network with 11 reactions and 7 compounds. Schuster et al. [102] identified six meaningful EFMs of this network. Each EFM is an 11-dimensional vector (i.e., one dimension per reaction). These EFMs are e_1 = [1 0 1 1 0 1 1 0 0 1 0], e_2 = [1 0 1 1 0 1 1 1 0 0 1], e_3 = [0 1 −1 0 1 −1 0 1 1 −1 0], e_4 = [0 1 −1 0 1 −1 0 0 1 0 −1], e_5 = [0 0 0 0 0 0 0 1 0 −1 1] and e_6 = [0 0 0 0 0 0 1 0 1 1 −1]. Here, each e_i represents an EFM, and 0, 1 and −1 denote the flux values for the reactions from r_1 (leftmost) to r_11 (rightmost). For reversible reactions, such as r_3, r_6, r_10 and r_11, −1 indicates that the reaction proceeds from right to left (i.e., the inverse reaction). Biologically, e_1 corresponds to the glycolytic pathway; e_2 represents the anaplerotic path from glucose (Glu) to oxaloacetate (OAA); e_3 and e_4 are gluconeogenesis pathways starting from pyruvate (Pyr) and oxaloacetate (OAA) respectively; e_5 represents the conversion from pyruvate to oxaloacetate; and e_6 is on the route to the synthesis of several amino acids in a glucose-poor environment [102]. Each of these EFMs is non-decomposable (i.e., minimal) in the sense that if any of the nonzero flux values of e_i is set to zero, then e_i is no longer an EFM.

Figure 5-2. Simplified metabolic network of Glycolysis and Gluconeogenesis taken from Schuster et al. [102]. Glucose is assumed to be an external compound. This network contains 11 reactions denoted by r1, r2, ..., r11. Here, r10 and r11 are exchange reactions.

Each component (i.e., enzyme, compound, reaction) of a network can contribute to the set of possible steady states of that network. The above models (EFMs, EPs, etc.) are key tools for examining the overall steady state behavior. They, however, do not provide any information on how much each component or component set contributes to these steady states. Analyzing and quantifying the impacts of these components provides a better understanding of the network. When the reactions of a network are of interest, several existing approaches measure the impact of a reaction as the number of its neighbors (centrality) [103], the number of compounds it uniquely produces or consumes (UP/UC) [104] and the number of EFMs that it participates in (participation) [105, 106]. However, none of these methods characterizes the biological functions of a reaction set as a function of the steady states of its network. Characterizing the impact of a reaction set on the overall metabolic capabilities of a network is an important task, as it can be used to model the outcomes of different perturbations in metabolic engineering applications.

In this chapter, we develop a systematic way to characterize and compute the functional similarity between two given reaction sets of metabolic networks. We achieve this goal in two steps, which we elaborate on later in this section and in the rest of the chapter.

Step 1. Given a metabolic network P and a subset R of its reactions, we calculate the impact of R on the steady states of P.

Step 2. Given networks P, P̄ and subsets of their reactions R and R̄ respectively, we compute the functional similarity of R and R̄ in terms of their impacts on P and P̄ respectively.

At a high level, we model the impact of a reaction set R in terms of the steady states of its network P. Specifically, we compute it as the portion of the flux cone of the original network that cannot be achieved without the reactions of R. To do this, we
inhibit all the reactions of R and compute the new flux cone. The difference between the old and new flux cones is the impact of R on network P. Figure 5-1 demonstrates the notion of impact on a hypothetical example. The grey portion illustrates the flux values that are reachable in steady state before inhibiting the reactions in R but not reachable after inhibiting them. If this grey area contains a steady state flux distribution that corresponds to a key biological function (e.g., optimal production of a certain compound that is essential for survival), then the perturbation harms this function, as this state can no longer be achieved. Intuitively, this suggests that R has an important role in that specific function.

Example 2. Here, we illustrate on a real example how the impacts of different components of a network can be significantly different. We use the same network as in Example 1 (Figure 5-2). Consider the impacts of two irreversible reactions, r_1 and r_7. When we inhibit r_1, the two EFMs e_1 and e_2 are no longer feasible. This implies that the perturbed network can use neither the glycolytic pathway nor the anaplerotic path from glucose to oxaloacetate. On the other hand, it can still synthesize the necessary amino acids without using glucose through e_6. However, if we inhibit r_7, the EFM e_6 also becomes infeasible, in addition to e_1 and e_2. In this case, the perturbed network can synthesize the necessary amino acids neither with nor without glucose. Hence, inhibiting r_7 is more likely to be lethal than inhibiting r_1. We can see how this translates into our impact model by taking a closer look at EFMs e_1, e_2 and e_6: e_1 and e_2 differ in only 3 dimensions (i.e., only 3 reactions are active in exactly one of them), whereas e_6 differs from e_1 and e_2 in 6 and 7 dimensions respectively. This implies that the shift in the flux cone caused by r_7, which eliminates e_6 on top of e_1 and e_2, affects 7 more dimensions than the shift caused by r_1. Therefore, in our model, the impact of r_7 on this network is significantly larger than the impact of r_1. We discuss how the impacts of reaction subsets computed in this manner can be used to predict essential reactions in Section 5.4.
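The EFM bookkeeping behind Examples 1 and 2 can be reproduced in a few lines. The sketch below is a minimal illustration, not the thesis implementation: it encodes the six EFMs listed in Example 1, reports which EFMs are lost when a reaction is inhibited, and counts the dimensions in which two EFMs differ. Reading "differ in k dimensions" as the number of reactions that are active in exactly one of the two EFMs is our assumption; under that reading the counts 3, 6 and 7 quoted above are reproduced. The helper names (`support`, `lost_efms`, `support_difference`) are hypothetical.

```python
# EFMs of the Figure 5-2 network as listed in Example 1 (Schuster et al.).
# Index 0 is r1 and index 10 is r11; -1 marks the inverse reaction direction.
EFMS = {
    "e1": [1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0],
    "e2": [1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1],
    "e3": [0, 1, -1, 0, 1, -1, 0, 1, 1, -1, 0],
    "e4": [0, 1, -1, 0, 1, -1, 0, 0, 1, 0, -1],
    "e5": [0, 0, 0, 0, 0, 0, 0, 1, 0, -1, 1],
    "e6": [0, 0, 0, 0, 0, 0, 1, 0, 1, 1, -1],
}


def support(efm):
    """1-based indices of the reactions that carry nonzero flux in an EFM."""
    return {i + 1 for i, v in enumerate(efm) if v != 0}


def lost_efms(efms, inhibited):
    """Names of the EFMs that use at least one inhibited reaction (1-based)."""
    blocked = set(inhibited)
    return sorted(name for name, e in efms.items() if support(e) & blocked)


def support_difference(a, b):
    """Number of reactions that are active in exactly one of the two EFMs."""
    return len(support(a) ^ support(b))


if __name__ == "__main__":
    print(lost_efms(EFMS, [1]))  # ['e1', 'e2']
    print(lost_efms(EFMS, [7]))  # ['e1', 'e2', 'e6']
    for x, y in [("e1", "e2"), ("e6", "e1"), ("e6", "e2")]:
        print(x, y, support_difference(EFMS[x], EFMS[y]))  # 3, then 6, then 7
```

The lost EFMs are exactly the generators of the impact cone, so inhibiting r_7 removes a strictly larger part of the flux cone than inhibiting r_1.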

Utilizing the impacts as modeled above, we characterize the functional similarity between two reaction sets from potentially different networks. We do this by first lifting the flux cones of both networks to a higher dimensional flux space that is the union of the flux spaces of these networks. The term lifting denotes the transformation of a geometric object to a higher dimensional space. We explain why we use the union of the flux spaces in Section 5.2. Here, it is only necessary to know that this process preserves the EFMs of the flux cones. In other words, by lifting an EFM of an original network to the new flux space, we get an EFM of that network extended with additional fluxes. We then represent the functional similarity of two reaction subsets as a function of the ratio of the volume of the intersection of the regions corresponding to their impacts on their networks in the new flux space to the volume of their union.

Functional similarity modeled in this fashion is an accurate indicator of the similarity of the biological roles of different reaction sets, as it represents the ratio of the common steady states that are not possible without the contribution of these reaction sets in their networks. The next step is to devise an efficient method that quantifies this notion. Here, we propose a novel method that utilizes the EFMs of metabolic networks and converts the functional similarity calculation into a high dimensional geometric problem. The set of EFMs generates the space of steady state flux distributions of a network. For metabolic networks with only irreversible fluxes, the set of EFMs that delineates the flux cone is unique. Additionally, the flux cone is polyhedral in this case and therefore it is finitely generated [101]. In other words, for a given network P, its flux cone C_P is uniquely defined by the set of EFMs E_P = {e_1, e_2, ..., e_n}. To compute the EFMs of a network, our method uses an existing implementation called Metatool [98]. First, we compute the EFMs of the query networks, say P and P̄, by using Metatool. Each set of EFMs generates the original flux cone of the corresponding network. Then, for a given reaction subset R of a network P, we first remove all the reactions of R from this network. Biologically, this corresponds to inhibiting the enzymes that catalyze these
reactions. Then, we compute the region that represents the impact of R on P. Due to the non-decomposability property of EFMs, the impact can also be represented as a polyhedral cone and has a set of EFMs that defines it. We denote this set by E_R. Similarly, we compute E_R̄ for the impact of R̄ on P̄.

At this point, we have two sets of EFMs in hand that define two polyhedral cones. Now, the problem of finding the functional similarity of R and R̄ is equivalent to the purely geometric problem of finding the intersection and the union of these two cones. This, however, is computationally difficult, as there is no closed form solution to this problem. To tackle this, we first transform the polyhedral cones into polytopes by taking their intersections with a hyperplane in the positive quadrant (i.e., all flux values are non-negative). We show that this transformation preserves the ratio between the intersection and the union of the flux cones. Then, we compute the intersection of these polytopes as the intersection of their minimum enclosing balls (MEBs). Finding the MEB is a well-studied computational geometry problem, and efficient solutions exist for problems with up to a few hundred dimensions. We use an efficient implementation by Fischer et al. [107] to compute MEBs. Finally, we normalize this intersection and convert it to a functional similarity score. We elaborate on each step of our method in Section 5.3.

Our experiments on real metabolic networks show that the functional similarity we propose here is useful for identifying reaction sets that perform biologically similar functions. Moreover, we observe that our definition of impact can provide biologically and statistically significant predictions of essential reaction sets.

The following summarizes our technical contributions. We build a mathematical model for the impact of a set of reactions on the steady state space of a metabolic network. We characterize the functional similarity of two sets of reactions in terms of their impacts on the metabolic capabilities of their corresponding networks. We develop an efficient method to compute the functional similarity of component sets from different networks.

The rest of the chapter is organized as follows. Section 5.1 summarizes existing approaches for analyzing metabolic networks. Section 5.2 describes how we model the impacts of reaction sets as well as the functional similarity between them. Section 5.3 presents our method that computes the functional similarity score. We report our experimental results in Section 5.4.

5.1 Background

There are several existing methods that model metabolic networks by means of their steady states. Here, we briefly describe four commonly used models, namely elementary flux modes (EFMs) [98], extreme pathways (EPs) [99], flux balance analysis (FBA) [100] and minimal metabolic behaviors (MMBs) [101]. FBA differs from the other three models in that it aims to find a steady state that maximizes a given objective function, whereas the other models define the flux cone of a network, which contains all of its possible steady states. In order to use FBA, we need to know the objective of the cells that we are analyzing. This objective is often maximizing biomass production in single-cell organisms. However, in complex organisms different cells have different objectives, and it is usually not possible to represent these objectives as a well-defined mathematical function. Furthermore, FBA does not identify suboptimal states, which provide a better understanding of the steady state flux distribution of perturbed networks [108].

A relatively newer model, MMBs, defines the flux cone using a constraint-based approach. This method uses an outer description of the flux cone, which is more compact than an inner description. Instead of finding the extreme rays of a flux cone, MMBs identify the sets of constraints that define the minimal proper faces of the cone. These sets of constraints are all subsets of the non-negativity constraints of irreversible reactions, and they form the MMBs of the network. MMBs together with the reversible metabolic space (RMS) uniquely determine the flux cone. This model also provides a test for determining whether a given flux distribution belongs to the cone. However, it does not provide a means of generating all the steady state flux distributions.

Two popular and closely related models that use an inner description of a flux cone are EFMs and EPs. An EFM is a minimal set of reactions that can operate at steady state with all irreversible reactions having non-negative rates. An EP is an EFM that corresponds to a steady state which defines an extreme ray of the flux cone. For any metabolic network, the set of EPs is a subset of the set of EFMs, and both of these sets generate the flux cone. Klamt et al. [109] point out that these two sets are often equal in realistic applications. In fact, they state that if all exchange reactions in a network are irreversible, then the sets of EFMs and EPs coincide. Another very useful property of EFMs is that reconfiguring a network by splitting each of its reversible reactions into two irreversible reactions does not change the set of its EFMs. By using these two properties, for a network with no reversible exchange reactions, we always get a flux cone in the first quadrant of the high dimensional flux space that is generated by a set of extreme rays emanating from the origin. The set of these extreme rays is equivalent to the set of EFMs as well as to the set of EPs. The non-negative (conic) combinations of the elements of this set generate all possible steady state flux distributions.

In our method, we use EFMs to characterize the flux cones. One reason is that we need a model that can represent all the metabolic capabilities of a network rather than only the optimal steady state. Another reason is that the metabolic networks we consider have no reversible exchange fluxes, so the set of EFMs coincides with the set of EPs; we use the name EFMs for this set only to avoid confusion. Furthermore, the non-decomposability property of EFMs (i.e., no non-empty, proper subset of the reactions of an EFM can lead to a steady state) is useful when considering different perturbations of the network. By using the set of EFMs of a network and analyzing the effects of different perturbations on this set, we devise a method that computes the impacts and the functional similarities of different reaction sets.
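The splitting property above can be made concrete. The sketch below is only an illustration with hypothetical names (it is not how Metatool represents networks): it rewrites a signed EFM, such as e_3 of Example 1, over the split network, in which every reversible reaction has a forward and a backward irreversible flux. Every entry of the result is non-negative, so the vector lies in the first quadrant. Appending the backward fluxes after the original columns is an arbitrary ordering choice of ours.

```python
# Reversible reactions of the Figure 5-2 network (Example 1).
REVERSIBLE = [3, 6, 10, 11]


def split_reversible(efm, reversible):
    """Rewrite a signed EFM for the network in which every reversible
    reaction is split into a forward and a backward irreversible flux.

    efm        -- flux values, one per reaction (index 0 is r1)
    reversible -- 1-based indices of the reversible reactions
    Returns a non-negative vector: the forward part of every reaction,
    followed by one backward flux per reversible reaction (in the given order).
    """
    rev = set(reversible)
    for i, v in enumerate(efm, start=1):
        if v < 0 and i not in rev:
            raise ValueError(f"irreversible reaction r{i} has a negative flux")
    forward = [max(v, 0) for v in efm]
    backward = [max(-efm[r - 1], 0) for r in reversible]
    return forward + backward


# e3 of Example 1: gluconeogenesis starting from pyruvate.
e3 = [0, 1, -1, 0, 1, -1, 0, 1, 1, -1, 0]
print(split_reversible(e3, REVERSIBLE))
# [0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0]
```

The original signed EFM is recovered by subtracting each backward flux from the forward flux of the same reaction, so no information is lost by the split.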

5.2 Modeling Functional Similarity

In this section, we describe how we mathematically interpret the impact of a set of reactions on the metabolic capabilities of a given network. We also discuss how we model the functional similarity between different reaction sets.

Consider a metabolic network P with n reactions U_P = {r_1, r_2, ..., r_n} and d fluxes F_P = {f_1, f_2, ..., f_d}. Let S be the stoichiometric matrix of P, which has one column for each flux f_i ∈ F_P that identifies the input and output compounds of that flux. Also, let v = [v_1, v_2, ..., v_d]^T be a flux vector that represents the state v in which each v_i is the value realized by flux f_i. Then, Sv gives the rate of change of the concentration of each compound in state v. The solution space of the equation system Sv = 0 is the set of all states in which the compound concentrations do not change, i.e., the steady states. Together with the non-negativity constraints on the fluxes, this set has infinitely many solutions, which form a cone in a high dimensional space. We can write this flux cone as the conic span of the EFMs of the network. If E_P = {e_1, e_2, ..., e_t} is the set of EFMs of network P, then the flux cone C_P is:

$$C_P = \mathrm{span}(E_P) = \left\{ v \;\middle|\; v = \sum_{i=1}^{t} c_i e_i,\; c_i \ge 0 \right\}$$

We want to consider the impact of a reaction set R on the original flux cone of network P. In other words, we want to obtain a mathematical expression for the change in C_P when all the reactions in R are inhibited. We denote the network perturbed in this manner by P − R. Also, we denote the flux cone and the EFMs of P − R by C_{P−R} and E_{P−R} respectively. Below is a formal statement of impact:

Definition 13 (IMPACT). Let P be a metabolic network with reaction set U_P. The impact of a reaction subset R ⊆ U_P is:

$$\mathrm{Impact}(R, P) = \mathrm{Span}(E_P) - \mathrm{Span}(E_{P-R})$$

where E_P and E_{P−R} are the sets of EFMs of P and P − R respectively.
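The flux cone definition above can be checked numerically. The sketch below uses a hypothetical four-reaction toy network of our own (it is not one of the networks studied in this chapter), whose two EFMs we derived by hand: r1 takes up compound A, r2 converts A to B, r3 exports B and r4 exports A. It forms a non-negative combination of the EFMs, i.e., a point of C_P, and verifies that it satisfies Sv = 0. The matrix, the EFMs and the helper names are assumptions made for illustration.

```python
# Hypothetical toy network: r1: -> A, r2: A -> B, r3: B ->, r4: A ->
# Rows are the internal compounds A and B; columns are the fluxes f1..f4.
S = [
    [1, -1, 0, -1],  # A: produced by r1, consumed by r2 and r4
    [0, 1, -1, 0],   # B: produced by r2, consumed by r3
]
E_P = [
    [1, 1, 1, 0],  # take up A, convert it to B, export B
    [1, 0, 0, 1],  # take up A and export it directly
]


def mat_vec(m, v):
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def is_steady_state(s, v):
    """True if S v = 0, i.e. no internal compound accumulates or depletes."""
    return all(x == 0 for x in mat_vec(s, v))


def cone_point(efms, coeffs):
    """Non-negative combination sum_i c_i e_i, i.e. a point of C_P."""
    if any(c < 0 for c in coeffs):
        raise ValueError("cone coefficients must be non-negative")
    return [sum(c * e[j] for c, e in zip(coeffs, efms))
            for j in range(len(efms[0]))]


v = cone_point(E_P, [2, 3])
print(v, is_steady_state(S, v))  # [5, 2, 2, 3] True
```

Inhibiting r2 (setting f2 to zero) leaves only the second EFM, so under Definition 13 the impact of {r2} is the part of this cone that needs the first EFM.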

Inhibiting reactions can only shrink the flux cone. Furthermore, from the non-decomposability property of EFMs, we know that all EFMs of P − R are also EFMs of P, that is, E_{P−R} ⊆ E_P. As a result, for metabolic networks with only irreversible fluxes, Impact(R, P) is also a polyhedral cone and has a set of EFMs that generates it. When reversible fluxes are present, we split each of them into two fluxes in opposite directions, and this property still holds. The set of EFMs that defines Impact(R, P) is a subset of the EFMs of the original network, and we can construct it by checking, for each e_i ∈ E_P, whether it has a nonzero flux value for a reaction that is an element of R. If e_i has a nonzero entry for at least one reaction of R, then e_i is an EFM of Impact(R, P). The set of these EFMs defines the flux cone of this impact. Using this model of impact, we are now ready to present our characterization of functional similarity between different reaction sets.

Let P, P̄ be two metabolic networks. Clearly, these two networks can have different fluxes as well as common ones. If not all the fluxes are common to both networks, then C_P and C_P̄ lie in different spaces. To be able to compare these two cones and the impacts of perturbations on them, we need them to be in the same high dimensional flux space. One way to do this is to keep only the common fluxes and project the sets of EFMs that generate the flux cones onto them. However, again by the non-decomposability property of EFMs, the projected EFMs may no longer be feasible. This results in erroneous steady states that do not reflect the actual capabilities of the corresponding network. In order to bring the two flux cones to the same flux space without affecting the steady states they represent, we lift them to a higher dimensional space defined by the union of the fluxes of the two networks. We do this by simply extending the EFMs with zeros for the non-common fluxes. The lifting process guarantees that the EFMs of both networks remain EFMs in the new flux space and that the new flux cones reflect the metabolic capabilities correctly. We use the notation Impact(R, P | P̄) to represent the impact of R on network P in the new flux space, which is the union of the fluxes of P and P̄. We define the
functional similarity between the reaction sets R of P and R̄ of P̄ as the unexpectedness of the size of the intersection of their impacts in this new flux space. The formal definition is as follows.

Definition 14 (FUNCTIONAL SIMILARITY). Let P and P̄ be two metabolic networks. Let R, R̄ be two given subsets of the reactions of P and P̄ respectively. Also, let Impact(R, P | P̄) and Impact(R̄, P̄ | P) denote the impacts of R and R̄ respectively in the flux space defined by the union of the fluxes of P and P̄. If s is an arbitrary flux distribution in this space, then we define the functional similarity of R and R̄ as:

$$\mathrm{Sim}(R, \bar{R} \mid P, \bar{P}) = -\log\left(1 - \Pr\left(s \in \mathrm{Impact}(R, P \mid \bar{P}) \wedge s \in \mathrm{Impact}(\bar{R}, \bar{P} \mid P) \;\middle|\; s \in \mathrm{Impact}(R, P \mid \bar{P}) \vee s \in \mathrm{Impact}(\bar{R}, \bar{P} \mid P)\right)\right)$$

Here, the symbols ∧ and ∨ denote the logical AND and OR operators respectively.

Intuitively, the above definition states that two reaction sets from different networks serve similar functions if the sets of steady states they contribute to their own networks have a significant intersection. The probability value (i.e., Pr) increases linearly in the interval [0, 1] with the ratio of the common steady states of the impacts of R and R̄. Its value becomes 1 when the two impacts are identical. Consequently, Sim is 0 for impacts with no common steady states and grows without bound as the two impacts become identical.

Computing the functional similarity as formulated above is a nontrivial problem. It requires finding the intersection and the union of two polyhedral cones in a high dimensional space. Neither of these problems has a closed form solution. In the next section, we propose an efficient method that allows us to compute the functional similarities of reaction sets for real-sized networks. In Section 5.4, we present experimental results that illustrate how our method performs on real metabolic networks.
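Lifting and the score of Definition 14 are both simple to state in code. The sketch below is illustrative only (the helper names are hypothetical and the ratio Pr is taken as an input, since computing it is the subject of Section 5.3): it builds the union flux space, pads EFMs with zeros for fluxes a network does not have, and converts an intersection/union ratio into −log(1 − Pr). The flux sets and EFMs are those of the STEP I example in Section 5.3.3.

```python
import math


def union_space(fluxes_p, fluxes_q):
    """Fluxes of P, followed by the fluxes of P-bar that P does not have."""
    return list(fluxes_p) + [f for f in fluxes_q if f not in fluxes_p]


def lift(efms, fluxes, union_fluxes):
    """Lift EFMs written over `fluxes` into `union_fluxes`, inserting a zero
    for every flux that the network does not have."""
    position = {f: i for i, f in enumerate(fluxes)}
    return [[e[position[f]] if f in position else 0 for f in union_fluxes]
            for e in efms]


def similarity_from_ratio(ratio):
    """Sim = -log(1 - Pr) from Definition 14, given Pr in [0, 1]."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    if ratio == 1.0:
        return math.inf  # identical impacts
    return -math.log1p(-ratio)


# Flux sets and impact EFMs of the STEP I example in Section 5.3.3.
F_P = ["f1", "f2", "f3", "f4"]
F_PBAR = ["f1", "f3", "f5"]
U = union_space(F_P, F_PBAR)
impact_R = [[1, 0, 1, 0], [0, 0, 1, 1], [0, 1, 0, 0]]
print(U)                           # ['f1', 'f2', 'f3', 'f4', 'f5']
print(lift(impact_R, F_P, U))      # [[1, 0, 1, 0, 0], [0, 0, 1, 1, 0], [0, 1, 0, 0, 0]]
print(similarity_from_ratio(0.5))  # approximately 0.693, i.e. log 2
```

Padding with zeros, rather than dropping non-common fluxes, is what keeps each lifted vector a valid EFM of its own network.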

5.3 Computing Functional Similarity

Computing the functional similarity between two reaction sets R of P and R̄ of P̄ requires solving several challenging problems. First of all, we need to identify the impacts of both reaction sets on their networks. Then, we need to find a high dimensional flux space in which the EFMs of the original networks are preserved. In this new flux space, we need to compute the hyper-volume of the intersection of two polyhedral cones as well as the hyper-volume of their union. After solving all these problems, we can calculate the functional similarity between R and R̄ in terms of their impacts on the metabolic capabilities of their networks.

Algorithm 1 outlines our solution. Before going into the details of each step of this algorithm, we explain our solution on a visual example. Figure 5-3 illustrates the cross-sections of two flux cones belonging to two hypothetical networks, both in a three dimensional flux space. The cross-sections are obtained by intersecting C_P and C_P̄ with a two dimensional plane. Here, C_P and C_P̄ are generated by 5 and 6 EFMs respectively. The dashed areas of these cross-sections denote the impacts of R and R̄ on their networks, as labeled. Both Impact(R, P | P̄) and Impact(R̄, P̄ | P) are generated by 3 EFMs. The double dashed area represents the intersection of these two impacts. The bigger this intersection is, the more similar the impacts, and hence the functions, of R and R̄. In order for the functional similarities of different reaction set pairs to be comparable, we normalize the size of this intersection by the size of the union of the impacts. In other words, what we aim to compute in Figure 5-3 is the ratio of the size of the double dashed region to the size of the union of all dashed regions. For a three dimensional flux space, the problem is to find the intersection and the union of two areas in two dimensions. In general, if two flux cones lie in a d-dimensional space, then we need to find the intersection of two (d − 1)-dimensional polytopes. Next, we elaborate on each step of Algorithm 1.
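Before the individual steps, the geometric core of step 3 (sub-steps II to IV, detailed in Section 5.3.3) can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the thesis implementation: the hyperplane is fixed to sum(x) = 1 (normal n = (1, ..., 1), c = −1), and in place of the exact MEB method of Fischer et al. [107] it uses a simple Bădoiu–Clarkson style iteration that moves the centre towards the farthest point, which only approximates the minimum enclosing ball. The function names, the tolerance and the corner sets are hypothetical; the exponent d is the dimension of the flux space, following Step IV.

```python
import math


def hyperplane_intersection(e, n, c):
    """Lemma 7: the point t*e on the hyperplane n.x + c = 0, with t = -c / (n.e)."""
    denom = sum(a * b for a, b in zip(n, e))
    if denom == 0:
        raise ValueError("EFM is parallel to the hyperplane")
    t = -c / denom
    return [t * x for x in e]


def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def approx_meb(points, iterations=1000):
    """Approximate minimum enclosing ball (Badoiu-Clarkson style iteration):
    repeatedly step the centre towards the currently farthest point.
    The returned radius always encloses every input point."""
    centre = list(points[0])
    for i in range(1, iterations + 1):
        far = max(points, key=lambda p: dist(centre, p))
        step = 1.0 / (i + 1)
        centre = [c + step * (f - c) for c, f in zip(centre, far)]
    return centre, max(dist(centre, p) for p in points)


def meb_ratio(corners_a, corners_b, iterations=1000, tol=1e-9):
    """Approximate |A intersect B| / |A union B| as (r_int / r_un) ** d."""
    centre_a, r_a = approx_meb(corners_a, iterations)
    centre_b, r_b = approx_meb(corners_b, iterations)
    union = corners_a + corners_b
    # Corners of either polytope that lie inside both balls.
    inter = [p for p in union
             if dist(p, centre_a) <= r_a + tol and dist(p, centre_b) <= r_b + tol]
    if not inter:
        return 0.0
    _, r_un = approx_meb(union, iterations)
    if r_un == 0:
        return 1.0
    _, r_int = approx_meb(inter, iterations)
    d = len(union[0])
    return (r_int / r_un) ** d


# Hyperplane sum(x) = 1, i.e. n = (1, ..., 1) and c = -1 (our choice).
n, c = [1, 1, 1], -1
A = [hyperplane_intersection(e, n, c) for e in ([1, 0, 0], [2, 2, 0])]
B = [hyperplane_intersection(e, n, c) for e in ([0, 0, 3], [0, 1, 1])]
print(A)                # [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0]]
print(meb_ratio(A, A))  # 1.0
print(meb_ratio(A, B))  # 0.0
```

The resulting ratio would then be turned into a score with −log(1 − ratio), as in Definition 14.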

Figure 5-3. Pictorial description of functional similarity. The pentagon on the left is a cross-section of the flux cone of network P. Similarly, the hexagon on the right represents the flux cone of P̄. The dashed areas denote the impacts of reaction sets R on P and R̄ on P̄. The functional similarity of these impacts is determined by their intersection, which is indicated by the arrow.

Algorithm 1. Computing functional similarity
Input: Two reaction sets, R of P and R̄ of P̄
Output: Sim(R, R̄ | P, P̄)
1. Compute the EFMs of C_P and C_P̄ (i.e., E_P and E_P̄).
2. Find the EFMs of Impact(R, P) and Impact(R̄, P̄).
3. Compute

$$-\log\left(1 - \frac{\left|\mathrm{Impact}(R, P \mid \bar{P}) \cap \mathrm{Impact}(\bar{R}, \bar{P} \mid P)\right|}{\left|\mathrm{Impact}(R, P \mid \bar{P}) \cup \mathrm{Impact}(\bar{R}, \bar{P} \mid P)\right|}\right)$$

   as follows:
   I. Lift the EFMs of the impacts into a higher dimensional flux space.
   II. Transform the polyhedral cones in this new flux space into polytopes by taking their intersection with a hyperplane.
   III. Bound the hyper-volumes of the created polytopes by finding their minimum enclosing balls (MEBs).
   IV. Find the hyper-volumes of the intersection and union of these MEBs and return the similarity score.

5.3.1 Finding the EFMs of Metabolic Networks

In order to compute the functional similarity of two reaction sets as a function of the steady states of the networks, we first need to find the set of EFMs that generates the metabolic flux cone of a network. Identification of EFMs is a well-studied problem, and a number of algorithms as well as their implementations are available [98, 110-112]. In this chapter, we use a recent implementation by von Kamp et al. called Metatool [98].
We choose this tool as it is an efficient implementation and is commonly used in the literature. Metatool uses the stoichiometric constraints and the reversibility of the reactions of a network in order to compute its EFMs. However, finding EFMs is a computationally expensive problem, and even the most efficient algorithms cannot scale to networks with more than 100 reactions. Therefore, this part becomes the bottleneck when we want to compute functional similarities of reaction sets of very large networks.

5.3.2 Extracting EFMs of Impacts

After finding the EFMs of the original networks P and P̄, the next step is to compute the impacts of the reaction sets R and R̄ on the corresponding flux cones of P and P̄ in terms of the EFMs. In other words, we want to find the sets of EFMs E_R and E_R̄ that represent the changes in C_P and C_P̄ when all the reactions in R and R̄, respectively, are inhibited. We next discuss how we compute the change in C_P due to the inhibition of R; the computation for C_P̄ and R̄ is analogous. As we explained in Section 5.2, the impact of a reaction set defines a polyhedral cone, and the set of its EFMs is a subset of the EFMs of the original network. We construct this set of EFMs by checking, for each EFM of the original network, whether it is still feasible after the inhibition of the corresponding reaction set. If an EFM does not remain feasible after inhibiting the reactions of R, then it is a member of the generating set of the polyhedral cone that represents the impact of R on P.

Here, we demonstrate on a hypothetical example how we extract the EFMs of the impact of a reaction set R of a given network P. Let U_P = {r_1, r_2, r_3}, F_P = {f_1, f_2, f_3, f_4} and E_P = {e_1, e_2, e_3, e_4, e_5} denote the set of reactions, the set of fluxes and the set of EFMs of network P respectively. Also, let r_1 and r_2 be irreversible reactions corresponding to fluxes f_1 and f_2 respectively, and let r_3 be a reversible reaction that is split into two irreversible fluxes, f_3 and f_4. Let the flux values of the five EFMs of this network be given as:

        f_1   f_2   f_3   f_4
e_1      1     0     0     1
e_2      0     1     0     1
e_3      0     1     1     0
e_4      1     0     1     0
e_5      1     1     0     0

Consider the impact of the reaction r_1. The EFMs e_1, e_4 and e_5 are not feasible after inhibiting r_1, since f_1 has a nonzero flux value in these EFMs. Therefore, the set E_{r_1} = {e_1, e_4, e_5} generates the polyhedral cone that represents the impact of r_1 on P. For r_3, there are two corresponding fluxes, f_3 and f_4. Hence, the impact of inhibiting r_3 is generated by the EFMs that have a nonzero value for either f_3 or f_4, i.e., E_{r_3} = {e_1, e_2, e_3, e_4}.

Extracting the EFMs of the impacts in this manner is more efficient than computing them from scratch for every different reaction set. This important reduction in the computational cost of our method is due to the non-decomposability property of EFMs. These EFMs generate all the steady states that the reaction set under consideration contributes to its metabolic network. Next, we describe how we use these steady states to define the functional similarity between two such reaction sets.

5.3.3 Calculating the Similarity Score

Once we extract the EFMs of the impacts of R and R̄, it is conceptually easy to describe how we calculate the functional similarity between them. We measure the similarity as the unexpectedness of the size of the intersection of the cones representing these impacts. However, finding this similarity requires solving several computationally challenging problems. In the following, we summarize the steps we take in our solution to tackle these problems.

STEP I. As two networks P and P̄ can have different fluxes, we first need to find a common flux space in which the metabolic flux cones of these networks are comparable.
At this point, the first attempt would be to find the fluxes that are common to both networks and project the flux cones onto that space. As we explained in Section 5.2, this approach results in incorrect EFMs that do not generate the correct flux cones of the networks. Therefore, instead of projection, we use lifting, which extends the EFMs with zero entries to lift the flux cones into a higher dimensional space.

Here, we explain on a hypothetical example how the flux cones are lifted to the new flux space. Let P, P̄ be two metabolic networks with fluxes F_P = {f_1, f_2, f_3, f_4} and F_P̄ = {f_1, f_3, f_5}. We denote the new flux space, which is the union of F_P and F_P̄, as F_{P+P̄} = {f_1, f_2, f_3, f_4, f_5}. Let {e_1 = [1 0 1 0], e_2 = [0 0 1 1], e_3 = [0 1 0 0]} be the set of EFMs that generate the cone representing the impact of R on P. We lift this cone to the flux space F_{P+P̄} by adding a zero for f_5 to each EFM. The new set {e'_1 = [1 0 1 0 0], e'_2 = [0 0 1 1 0], e'_3 = [0 1 0 0 0]} is the set of EFMs that generate the impact of R in the new flux space. Similarly, we can lift the impact of R̄ on P̄ to this new space. The impacts of any two reaction sets from P and P̄ are comparable in the flux space F_{P+P̄}.

STEP II. Now, we have two polyhedral cones in the same flux space, each representing the impact of a reaction set on its network. What we need at this point is the ratio of the hyper-volume of the intersection of these two cones to that of their union. As we mentioned earlier, there is no closed form expression for either of these hyper-volumes. In order to avoid computing these hyper-volumes directly, we transform the polyhedral cones into polytopes in our high dimensional flux space. We do this by intersecting the cones with a hyperplane in the positive quadrant. If d is the dimension of the flux space F_{P+P̄}, then the intersection of a (d − 1)-dimensional hyperplane with a polyhedral cone is a (d − 1)-dimensional polytope in the d-dimensional flux space. By construction, this polytope is convex and is easier to deal with than a polyhedral cone.

This transformation clearly changes the hyper-volumes of the regions that correspond to the impacts of different reaction sets. However, what we aim to find is neither these hyper-volumes nor the exact hyper-volumes of the intersection and union
of these impacts. Instead, our goal is to calculate the ratio of the hyper-volume of the intersection to that of the union (Algorithm 1). Theorem 5 proves that our transformation of the flux cones into polytopes preserves this ratio.

Lemma 7. Let e be an EFM of network P with d fluxes. Also, let S: n · x + c = 0 denote a hyperplane, where n and x are both d-dimensional vectors representing the normal and a point on this hyperplane respectively. If e intersects S, then this point is of the form t·e for some t ∈ ℝ. Solving for t, we get the intersection of the EFM e with hyperplane S as

$$T_S(\vec{e}) = -\frac{c}{\vec{n} \cdot \vec{e}}\,\vec{e}$$

Proof: Substituting x = t·e into n · x + c = 0 gives t(n · e) + c = 0, hence t = −c/(n · e), which is well defined whenever n · e ≠ 0.

Theorem 5. Let C_R, C_R̄ be two polyhedral cones in a d-dimensional space that are generated by the EFMs E_R = {e_1, ..., e_a} and E_R̄ = {ē_1, ..., ē_b} respectively. Let S be any hyperplane in this space, and let H_S(R) = {T_S(e_1), ..., T_S(e_a)} and H_S(R̄) = {T_S(ē_1), ..., T_S(ē_b)} denote the polytopes created by the intersection of C_R and C_R̄ with S respectively. Then,

$$\frac{\mathrm{Volume}(C_R \cap C_{\bar{R}})}{\mathrm{Volume}(C_R \cup C_{\bar{R}})} = \frac{\mathrm{Volume}(H_S(R) \cap H_S(\bar{R}))}{\mathrm{Volume}(H_S(R) \cup H_S(\bar{R}))}$$

Proof: Let e_i be an EFM of E_R. Consider all the hyperplanes S' that are parallel to S. The normal vector of any S' is the same as the normal of S. This implies that the intersection point of e_i with S (i.e., T_S(e_i)) is a constant multiple t of its intersection point with S' (i.e., T_{S'}(e_i)). This constant is the same for each e_i, as S' ∥ S, and T_S(e_i) = t·T_{S'}(e_i) for all i ∈ [1, a]. The intersection points of E_R with a hyperplane define the corners of a convex polytope on that hyperplane. Therefore, H_S(R) and H_{S'}(R) denote two convex polytopes that are scaled versions of each other by a factor of t. Taking the ratio of the hyper-volumes of two such polytopes cancels out the scaling factor. Hence,

$$\frac{\mathrm{Volume}(H_S(R) \cap H_S(\bar{R}))}{\mathrm{Volume}(H_S(R) \cup H_S(\bar{R}))} = \frac{\mathrm{Volume}(H_{S'}(R) \cap H_{S'}(\bar{R}))}{\mathrm{Volume}(H_{S'}(R) \cup H_{S'}(\bar{R}))}$$
when S' ∥ S.

Now, if we integrate the hyper-volumes of the slices over all S' that are parallel to S, we obtain the volumes of the polyhedral cones C_R and C_R̄:

$$\mathrm{Volume}(C_R) = \int_{S' \parallel S} \mathrm{Volume}(H_{S'}(R))\, dS', \qquad \mathrm{Volume}(C_{\bar{R}}) = \int_{S' \parallel S} \mathrm{Volume}(H_{S'}(\bar{R}))\, dS'$$

Using the above formulations:

$$\mathrm{Volume}(C_R \cap C_{\bar{R}}) = \int_{S' \parallel S} \mathrm{Volume}(H_{S'}(R) \cap H_{S'}(\bar{R}))\, dS'$$

We compute Volume(C_R ∪ C_R̄) similarly and take the ratio. Let λ(S') denote the common factor by which every hyper-volume on S' differs from the corresponding hyper-volume on S (it depends only on the scaling factor t between S' and S). Then:

$$\frac{\mathrm{Volume}(C_R \cap C_{\bar{R}})}{\mathrm{Volume}(C_R \cup C_{\bar{R}})} = \frac{\int_{S' \parallel S} \mathrm{Volume}(H_{S'}(R) \cap H_{S'}(\bar{R}))\, dS'}{\int_{S' \parallel S} \mathrm{Volume}(H_{S'}(R) \cup H_{S'}(\bar{R}))\, dS'} = \frac{\mathrm{Volume}(H_S(R) \cap H_S(\bar{R})) \int_{S' \parallel S} \lambda(S')\, dS'}{\mathrm{Volume}(H_S(R) \cup H_S(\bar{R})) \int_{S' \parallel S} \lambda(S')\, dS'} = \frac{\mathrm{Volume}(H_S(R) \cap H_S(\bar{R}))}{\mathrm{Volume}(H_S(R) \cup H_S(\bar{R}))}$$

Theorem 5 states that we can convert our problem into a more familiar form without sacrificing accuracy. Our problem of calculating the ratio of the size of the intersection of the two polyhedral cones to that of their union is now transformed into finding this ratio for two convex polytopes.

STEP III. Even after transforming the two flux cones that represent the impacts into convex polytopes, computing the hyper-volumes of their intersection and union is still a difficult problem. We approximate the intersection and the union of these two polytopes by the intersection and the union of their minimum enclosing balls (MEBs). Figure 5-4 illustrates an example MEB that bounds a polytope in three dimensional space. In our case, we compute the MEBs of both polytopes in a high dimensional space by using the method of Fischer et al. [107]. This method computes the MEBs of point sets instead of polytopes. Therefore, we treat each corner of our polytopes as a point and then compute the MEB of these points. This MEB is the same as the MEB of the actual polytope, as this polytope is convex by construction. Computing MEBs for a large number of points can be done efficiently in spaces with up to a few hundred dimensions [107]. The MEBs we compute provide a simple parametric representation of the impacts of reaction sets in terms of a center and a radius in the high dimensional flux space.

Figure 5-4. Minimum enclosing ball (MEB) of a polytope in three dimensional space.

STEP IV. By using the parametric representations of the MEBs, we find the intersection of the impacts of the reaction sets as the points of the polytopes that lie in both MEBs. The union is trivial: it is the set of all points of the first polytope plus those of the second. We compute the MEBs of these point sets to get the radii of the MEBs of the intersection and the union of the impacts. Let r_int and r_un denote these radii respectively. Then, we calculate the ratio of the volume of the MEB of the intersection of the impacts to the volume of the MEB of their union as

$$\frac{c_d\, r_{int}^{d}}{c_d\, r_{un}^{d}}$$

where d is the number of dimensions and c_d is a constant factor depending on d. Since the constants cancel out, we use the ratio (r_int / r_un)^d as our approximation to

$$\frac{\left|\mathrm{Impact}(R, P \mid \bar{P}) \cap \mathrm{Impact}(\bar{R}, \bar{P} \mid P)\right|}{\left|\mathrm{Impact}(R, P \mid \bar{P}) \cup \mathrm{Impact}(\bar{R}, \bar{P} \mid P)\right|}$$

Finally, we compute the
functional similarity by subtracting this ratio from 1 and taking the negative logarithm of the result.

5.4 Results and Discussion

In this section, we experimentally evaluate the performance of our method. First, we measure the accuracy of our method in identifying reaction sets of different networks that are functionally equivalent, and we compare our functional similarity score with several existing similarity scores in this setting. Then, we test whether our method can be utilized to predict the essential reactions of a given metabolic network. We modify our functional similarity score to define an essentiality score, and we calculate its statistical significance.

Dataset: We use metabolic networks taken from the KEGG pathway database. We use 12 different networks of E. coli K-12 MG1655. The average number of reactions per network is 20.75. The largest network in our dataset (Pyrimidine metabolism) has 69 reactions, 60 compounds and 1,007 EFMs.

Environment: We run all the experiments on a desktop computer running Ubuntu 9.10 with one processor and 2 GB of RAM.

5.4.1 Identification of Functional Similarities

Our motivation for identifying functionally similar reaction sets is that the metabolisms of different organisms can perform the same function through different sets of reactions. Here, we evaluate the accuracy of the similarity measure described in Section 5.3 for identifying reaction sets with similar biological functions.

In this experimental setting, we compare three existing similarity measures with the functional similarity we propose. The first similarity measure, the degree similarity, takes into account only the topological features of the networks [103]. This measure considers two reaction sets from two possibly different networks as similar if the in- and out-degrees of the reactions in these sets are similar. This suggests that reaction sets with central reactions in one network tend to be similar to the reaction sets that
contain reactions with high centrality in the other network. The second existing measure is the UP/UC similarity [104]. UP/UC assumes that the number of compounds uniquely produced or consumed by a reaction is a good indicator of the biological role of that reaction. If this number is similar for the reactions of the two reaction sets we compare, the UP/UC measure considers the two sets to have similar functions. The last existing approach is reaction participation similarity [105, 106]. Reaction participation uses the number of EFMs that a reaction participates in to measure the importance of that reaction for the network. Unlike the first two approaches, reaction participation similarity uses the steady-state information of the metabolic networks. However, it treats every EFM as equally important and only counts the number of EFMs that a reaction participates in. As a result, it can overestimate or underestimate the contributions of different EFMs to the metabolic capabilities of a network.

To compare the existing measures fairly with the one we propose, we introduce functionally equivalent reaction sets by perturbing a given network. We randomly combine two neighboring reactions of the original network with a given probability. We do this for all reaction pairs to obtain a perturbed instance of the network. In this way, we isolate the effects of other factors (e.g., network size, number of EFMs) and can create reaction sets in the perturbed network that have precisely the same function as another reaction set in the original network. A reaction in the perturbed instance that combines two reactions of the original network is functionally equivalent to the set of those two reactions. We take each reaction of a perturbed network that represents such a combination and compare it with each connected reaction set of size at most two in the original network. We then calculate the recall of each method as the percentage of the functionally equivalent reaction sets it identifies among the top-k scoring reaction sets.

We compare the average recall values of the different similarity measures over 100 perturbed instances of the energy metabolism of E. coli. This metabolism
is a combination of four different networks in the KEGG database, namely reductive carboxylate cycle, methane metabolism, nitrogen metabolism and sulfur metabolism. The overall energy metabolism of E. coli has 47 reactions, 87 compounds and 37 EFMs. The number of connected reaction sets of size at most two in this network is 130. Figure 5-5 plots the average recall values for the four similarity measures. The functional similarity we propose in this chapter has the highest recall for every value of k. In numbers, when we consider the top 20% of all the reaction sets (i.e., the top 26 of 130), our similarity measure achieves about 76% recall, whereas the recall of the second-best method (reaction participation similarity) stays at 57%. Also, our functional similarity reaches 95% recall within the top 37% of all the reaction sets, while the other methods achieve this recall only at 64%, 77% and 90%. The results of this experiment suggest that functional similarity is more accurate than the existing measures at identifying reaction sets that play similar biological roles in different networks.

Figure 5-5. Comparison of different similarity measures for identifying functionally similar reaction sets. We combine two connected reactions with probability 0.05 to introduce functionally equivalent but perturbed instances of the energy metabolism of E. coli. We measure recall as the percentage of functionally equivalent parts of the original and 100 perturbed networks identified within the top-k most similar reaction sets.
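The recall computation described above can be sketched in a few lines. This is a minimal illustration and not the evaluation code used in the thesis. The function name `recall_at_k` and the input layout are assumptions made for this example: each perturbed query reaction is represented by one score dictionary, together with the identifier of its truly equivalent reaction set.

```python
def recall_at_k(queries, k):
    """Percentage of queries whose functionally equivalent reaction set
    appears among the k highest-scoring candidate reaction sets.

    queries: list of (scores, true_id) pairs, where scores maps each
    candidate reaction set of the original network to its similarity
    with one combined reaction of a perturbed network (hypothetical layout).
    """
    if not queries:
        raise ValueError("no queries")
    hits = 0
    for scores, true_id in queries:
        # Rank candidates by decreasing similarity; ties keep input order.
        ranked = sorted(scores, key=scores.get, reverse=True)
        if true_id in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(queries)


# Toy data (invented): two queries over three candidate reaction sets.
queries = [
    ({"r1": 0.9, "r2": 0.1, "r3": 0.5}, "r1"),
    ({"r1": 0.2, "r2": 0.8, "r3": 0.9}, "r2"),
]
print(recall_at_k(queries, 1))  # 50.0
print(recall_at_k(queries, 2))  # 100.0
```

In the energy metabolism experiment, k = 26 corresponds to the top 20% of the 130 candidate reaction sets.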

5.4.2 Prediction of Essential Reaction Sets

An important and well-studied problem in biology is finding the essential components of a metabolism. The essentiality of a gene of an organism is often determined by examining its knockout phenotype. Essentiality can take binary values (i.e., essential or non-essential) as well as categorical or continuous values. This notion of essentiality can also be extended to other components of the metabolism, such as reactions, enzymes and compounds.

Here, we use our characterization of the impact of a reaction set to determine its essentiality for a metabolic network. Our intuition is that, of two reaction sets of a network, the one with the bigger impact is more essential. To test this hypothesis, we use different networks of E. coli that contain reactions essential for the organism. We construct the set of all essential reactions using the essential genes listed in the online database PEC (Profiling of E. coli Chromosome) [113]. We extract 341 essential reactions in E. coli from the 302 essential genes listed in PEC. Table 5-1 lists eight metabolic networks that contain different fractions of essential reactions. For each of these networks, we first enumerate all its connected reaction sets of size at most three. The term connected indicates that each reaction in the set is a neighbor of at least one other reaction in that set. Then, for each connected reaction set $R$, we compute its essentiality in $P$ as the fraction of the steady states that become unreachable after inhibiting the reactions in $R$. Formally, we compute the essentiality using our functional similarity as follows:

$$\mathrm{Essentiality}(R, P) = \mathrm{Sim}(R, P \mid P, P)$$

That is, we calculate the essentiality as the functional similarity of $R$ and the set of all reactions of $P$, given the network $P$.

After computing the essentiality of each reaction set in this manner, we sort the reaction sets in decreasing order of their essentiality scores. We assess the statistical significance of our ranking by calculating its p-value. More specifically, let $p$ be the probability that a randomly selected reaction is essential, and let $t$ and $n$ denote the number of
essential reactions and of all reactions in the top 10% of connected reaction sets according to our ranking, respectively. The p-value of our prediction for this network is

$$\text{p-value}(p, t, n) = \sum_{i=t}^{n} \binom{n}{i} p^{i} (1-p)^{n-i}$$

We report p-values for the eight networks of E. coli listed in Table 5-1. Consider the Pyrimidine metabolism, which is the first row of this table. The probability of a reaction being essential in this network is $p = 10/69 = 0.145$. Using this probability, the expected number of appearances of essential reactions in the top 10% of connected reaction sets for this network is 14.21, whereas according to our ranking we observe 111 appearances of essential reactions in this top 10%. This translates to a p-value on the order of $10^{-15}$, which suggests strong statistical significance. The p-values of the first six rows of Table 5-1 are all statistically significant (i.e., p-value < 0.05). For the last two networks the p-value is greater than 0.05; however, our method still reports more than the expected number of essential reactions.

Figure 5-6. Statistical significance of our essentiality prediction for Pyrimidine metabolism of E. coli.

Next, we take a closer look at how the statistical significance of our essentiality score changes when we consider different percentages of the highest-scoring reaction sets. For this purpose, we use the Pyrimidine metabolism of E. coli and compute
Z-scores for different percentages $k$. Figure 5-6 plots the Z-scores for our method. We observe that the Z-score reaches significant values even when we consider only a small percentage of the top-k results. We achieve a Z-score of 10 for the top 10%, which means our result is 10 standard deviations away from the random result and is therefore statistically significant. The Z-score peaks at the top 26.4%, marked by the dashed line in Figure 5-6. At this point the Z-score is 18.24, and of all the reaction sets up to that point, approximately 88% (228/259) contain at least one essential reaction. Since only 10 of the 69 reactions are essential, this 88% shows that our scoring scheme successfully extracts reaction sets with essential reactions by assigning them larger scores than the other sets. The results of this section indicate that the functional similarity measure we propose can be extended to define an essentiality score that accurately identifies essential reactions and reaction sets of metabolic networks.

5.4.3 Discussion

Understanding the functional role of a set of components of a metabolic network has been an important problem in molecular biology. Homological and topological features of the networks have often been used for this purpose. However, the biological functions of two reaction sets can be equivalent even when their sizes, connectivity, intermediate products and catalyzing enzymes differ. Therefore, neither the topological features (e.g., centrality) nor the homological similarities (e.g., enzyme similarity, compound similarity) provide sufficient information for identifying such functional similarities.

In this chapter, we developed a systematic way to characterize and compute the functional similarity between two given reaction subsets in metabolic networks. We built a mathematical model that explains the impact of a reaction set in terms of the set of possible steady states of its network. Specifically, our model computes the impact as the portion of the flux cone of the original network that cannot be achieved without the
reactions in the set under consideration. Using this model, we characterized the functional similarity of two reaction sets from potentially different networks. We achieved this by first lifting the flux cones of the networks into a higher-dimensional flux space that is the union of the flux spaces of these networks. We then represented the functional similarity of two reaction subsets as the ratio of two volumes: the volume of the intersection of the regions corresponding to their impacts on their networks, divided by the volume of the union of these regions. We developed a novel method that computes this ratio as follows. It first computes the Elementary Flux Modes (EFMs) of the network with and without the given reaction set. It then transforms the polyhedral cones into polytopes by intersecting them with a hyperplane in the positive orthant of the new flux space. Lastly, it approximates the intersection of these polytopes by the intersection of their minimum enclosing balls (MEBs). Our experiments on real metabolic networks demonstrated that our method identifies biological similarities accurately. Moreover, we observed that our definition of impact provides biologically and statistically significant predictions of essential reaction sets.

Characterizing the function of a set of components (e.g., reactions) mathematically has great value in numerous applications of computational biology. Thus, we believe that the ideas developed in this chapter have great potential to lay the foundations for many advances in understanding and comparing complex biological networks.
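The last steps of the procedure summarized above can be sketched as follows: compute an MEB radius, then turn the volume ratio into a similarity score. The thesis computes MEBs with the method cited as [107]. Purely as a stand-in, this sketch uses the simple Bădoiu–Clarkson iteration, which repeatedly steps the center toward the farthest point and yields an approximate MEB. The function names `approx_meb` and `functional_similarity` are hypothetical. The point sets for the intersection and the union are assumed to be already available in the lifted flux space.

```python
import math


def approx_meb(points, iterations=1000):
    """Approximate minimum enclosing ball (center, radius) of a point set.

    Uses the Badoiu-Clarkson iteration: move the center a shrinking step
    toward the current farthest point. Illustrative stand-in only, not
    necessarily the MEB algorithm used in the thesis.
    """
    center = list(points[0])
    for i in range(1, iterations + 1):
        far = max(points, key=lambda p: math.dist(center, p))
        step = 1.0 / (i + 1)
        center = [c + step * (f - c) for c, f in zip(center, far)]
    radius = max(math.dist(center, p) for p in points)
    return center, radius


def functional_similarity(r_int, r_un, d):
    """-log(1 - r_int^d / r_un^d): the MEB volume ratio (the constant c_d
    cancels) turned into a similarity score."""
    ratio = (r_int / r_un) ** d
    if ratio >= 1.0:
        return math.inf  # intersection ball as large as the union ball
    return -math.log(1.0 - ratio)


square = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
_, r = approx_meb(square)
print(round(r, 2))                         # close to 1.0 (the exact MEB radius)
print(functional_similarity(1.0, 2.0, 1))  # log(2), about 0.693
```

A similarity of 0 means the two impacts share no volume, and the score grows without bound as the intersection ball approaches the union ball.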

Table 5-1. Essential reactions in eight different metabolic networks of E. coli

P. Id (a)  Network name                          #E. (b)  #R. (c)  p-value (d)
00240      Pyrimidine metabolism                 10       69       7.99E-15
00860      Porphyrin and chlorophyll metabolism  9        21       3.02E-06
00564      Glycerophospholipid metabolism        7        20       1.24E-05
00010      Glycolysis/Gluconeogenesis            4        26       6.71E-03
00670      One carbon pool by folate             6        16       1.23E-02
00540      Lipopolysaccharide biosynthesis       15       19       4.86E-02
00760      Nicotinate metabolism                 4        16       2.41E-01
00300      Lysine biosynthesis                   8        15       2.45E-01

(a) KEGG identifier of the network.
(b) Number of essential reactions of the network according to the PEC classification [113].
(c) Number of reactions of the network.
(d) Probability of randomly obtaining at least as many essential reactions as our method predicts in the top 10% of all connected reaction sets of size at most three.
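The p-value column above is the upper tail of a binomial distribution, as in the formula given earlier. The sketch below implements that formula together with a Z-score. The Z-score used here, $(t - np)/\sqrt{np(1-p)}$, is the standard binomial form and is an assumption, because the text does not spell out its formula. The counts $t$ and $n$ behind each row are not listed, so the example checks the formulas on small toy values rather than reproducing the table.

```python
from math import comb, sqrt


def binomial_p_value(p, t, n):
    """P[X >= t] for X ~ Binomial(n, p): the chance of seeing at least t
    essential reactions among n if each is essential with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(t, n + 1))


def binomial_z_score(p, t, n):
    """Number of standard deviations by which t exceeds the binomial mean
    n*p (assumed Z-score definition)."""
    return (t - n * p) / sqrt(n * p * (1 - p))


p = 10 / 69                          # Pyrimidine metabolism: 10 of 69 reactions essential
print(round(p, 3))                   # 0.145
print(binomial_p_value(0.5, 4, 4))   # 0.0625
print(binomial_z_score(0.5, 3, 4))   # 1.0
```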

CHAPTER 6
STEADY STATES OF REGULATORY NETWORKS

Analyzing biological networks is essential for understanding the machinery of living organisms, which has been a main goal for scientists [37, 38]. Gene regulatory networks and signaling pathways are two important network types that play a role in every process of living organisms [114]. In the last decade, a significant amount of research has been done on reconstructing these networks from experimental data [2, 3, 115–120]. The amount of regulatory data produced by these methods is large enough to motivate research on automated tools that analyze various aspects of these networks. We use the term biological regulatory networks (BRN) to cover both gene regulatory networks and signal transduction pathways.

To capture the biological meaning of BRNs, it is necessary to characterize their long-term behavior. A common way to achieve this is to identify the steady states of the dynamic system defined by a BRN. Identifying the steady states of BRNs is crucial in several applications, such as the treatment of various human cancers [29, 30] (e.g., leukemia, glioblastoma) and genetic engineering [14]. Additionally, steady-state analysis has proven successful in explaining the flower morphogenesis of Arabidopsis thaliana [31–33], the differentiation process of T-helper cells [121–123], the mechanism of T cell receptor signaling [34] and the cell cycles of yeast types [35, 36].

We use Boolean values for the states of the genes (ON or OFF, meaning high or low activity), since this representation has been used successfully in the literature for BRNs [31, 35, 36, 121, 123]. Recently, several methods have used categorical values (e.g., low, medium and high activity) for gene states in their models [32, 124, 125]. The steady states extracted by these methods agreed closely with those found using Boolean models. The naive approach to steady-state identification in Boolean networks is to search the state space exhaustively. However, the number of possible states of a BRN is exponential in the number of its genes. Therefore, exhaustive methods are
computationally infeasible for even moderately sized BRNs. To address this problem, existing methods use finite-state Markov chains [126], binary decision diagrams (BDD) [121, 122], constraint programming [127], probabilistic Boolean networks [128], linear programming [129], relational programming [130] and module networks [131, 132].

Independently of the choice of computational method, there are two commonly used alternatives for modeling the state transitions: the synchronous and the asynchronous model. Both are used in the literature [121, 122, 127, 130]. Synchronous models assume that the activity levels of all the genes change simultaneously. Hence, the next state is deterministically decided by the current state. Asynchronous models, on the other hand, consider time in small intervals, such that only one gene can change its state in an interval and a state change is equally likely for all genes [122]. For an n-gene BRN, the state space of the synchronous model has $2^n$ states and $2^n$ state transitions. For the asynchronous model, the number of states is still $2^n$, but the number of possible transitions can go up to $n \cdot 2^n$. The advantages and disadvantages of these models, together with their effect on the running time of steady-state identification algorithms, are discussed in the literature [122, 133, 134]. The synchronous model makes strong assumptions, such as that all genes change their state at the same time and all have equal response times to these changes. It is therefore arguably more of an abstraction of the biological process than the asynchronous model. We use the asynchronous model in our discussion here; however, it is important to note that our method works for the synchronous model as well.

A state of a BRN is the combination of the states of its genes at a certain time. The state of a gene can change over time due to internal regulation or external stimuli. Steady states are the states in which the dynamic system of that BRN stabilizes. The remaining states of the network are called transient states, and they are usually not
of interest from a biological viewpoint. We follow the steady-state definition of Garg et al. [121].

Definition 15. Let $S$ be a set of states. Each $s_i \in S$ is steady if and only if the following conditions are satisfied:
- The set of the successor states of all the states in $S$ is equal to $S$.
- For each $s_i \in S$, once it is visited, the probability of revisiting $s_i$ within a finite number of state transitions is equal to 1.

This definition implies that there are two types of possible steady states: self loops (e.g., Figure 6-1a) and simple loops (e.g., Figure 6-1b), as named in [121]. If a set of states creates a complex loop, then all the states of this set are transient, since at least one of the states does not satisfy the second condition of the above definition. For instance, in Figure 6-1c the state [010] is not revisited with probability 1 in a finite number of steps, because the system can loop forever through the other four states, which form a loop. Garg et al. likewise call such sets of states transient. Figure 6-1 illustrates all the state types discussed above.

Our Contributions: In this work, we develop an algorithm that identifies all the steady states of BRNs accurately and efficiently.

To express this problem mathematically, we define three types of states according to the number of possible outgoing transitions from them. We call a state Type 0 if it has no outgoing transition to any state other than itself (the self-loop state [110] in Figure 6-1a). A state with exactly one outgoing transition to another state is of Type 1 (all the states in Figure 6-1b). States with more than one outgoing transition are Type 2 states (state [110] in Figure 6-1c). Using this notation, we observe the following:
- All Type 0 states are steady (self loops).
- All Type 2 states are transient.
- All the states of a simple loop are of Type 1.
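The three state types depend only on how many distinct successor states a state has. The sketch below classifies states over an explicitly stored transition graph to make the definition concrete. This is feasible only for toy examples, and the method itself never builds this graph. The function `classify_states` and the toy graph are hypothetical.

```python
def classify_states(successors):
    """Map each state to its type (0, 1 or 2) by counting outgoing
    transitions to states other than itself.

    successors: dict state -> iterable of successor states (self loops allowed).
    """
    types = {}
    for state, nexts in successors.items():
        others = {v for v in nexts if v != state}
        if not others:
            types[state] = 0   # only a self loop: steady
        elif len(others) == 1:
            types[state] = 1   # one way out: steady or transient
        else:
            types[state] = 2   # two or more ways out: transient
    return types


# Hypothetical 3-gene transition graph (not the one drawn in Figure 6-1).
toy = {
    "000": ["001"],
    "001": ["011"],
    "011": ["011"],          # self loop only
    "010": ["000", "011"],   # two outgoing transitions
}
print(classify_states(toy))  # {'000': 1, '001': 1, '011': 0, '010': 2}
```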

Figure 6-1. States of a hypothetical network with three genes. The binary values correspond to the activation levels of these genes. (a) The three states on the left are transient and of Type 1. The state with the self loop is steady and of Type 0. (b) The four states in the simple loop are cyclic steady states of Type 1. (c) The leftmost state is transient and of Type 1. Even though only [110] is of Type 2 (the others are of Type 1), the remaining five states create a complex loop, and thus they are transient.

It is important to note that all the above observations are one-sided (i.e., "if" conditions). For instance, the second observation means that if a state is of Type 2, then it is transient. However, a transient state does not have to be a Type 2 state. We call the steady states of Type 1 cyclic steady states (i.e., simple loops). Our method first divides the whole state space into three types (Type 0, 1 and 2) without materializing the exponential state space graph. Then, we extract the cyclic steady states from the Type 1 states using a randomized traversal method. The cyclic steady states, together with the Type 0 states, constitute all the steady states of the BRN under consideration.

We use the Boolean network model proposed by Kauffman et al. [135]. We build a hypothetical state transition graph from the interactions in a BRN. We develop a mathematical model that uses the binary decision diagram (BDD) data structure [136] to
classify each state into one of the three classes, namely Type 0, Type 1 and Type 2. Type 0 and Type 2 states are guaranteed to be steady and transient (i.e., not steady), respectively. Type 1 states can be either. To further classify the Type 1 states as transient or steady, we develop a randomized traversal method. It samples random seed states from the Type 1 states and classifies the states visited during the traversal from each seed state. While sampling, we calculate estimators for the number of steady states, for the expected steady-state distribution of individual genes and for the joint steady-state distributions of gene pairs. We derive a stopping criterion from the statistical information of the explored states. This criterion allows sampling to terminate early once a user-defined percentage of the steady states has been found with high confidence.

Our technical contributions can be summarized as follows. We build a mathematical model that quickly prunes a very large portion of the state space without losing any steady states. We develop a randomized traversal method that computes, in an online fashion, estimators for the number of steady states and for the fraction of these states in which individual genes and gene pairs are active. Our algorithm is guaranteed to find all the steady states after a sufficient number of iterations. We formulate a stopping criterion that uses the information of the classified states to terminate the algorithm once a sufficient percentage of the steady states has been extracted with a given confidence value.

6.1 Methods

This section discusses our algorithm for identifying all the steady states of Boolean BRNs. First, we describe the mathematical model for expressing the states and state transitions. Then, we discuss our method for segregating the state space into three subspaces. Finally, we present our randomized traversal method, which extracts the Type 1 steady states. We also give the formulation of a stopping criterion that terminates the traversal when a sufficient number of steady states has been reported with high confidence.

6.1.1 State Transition Model

To identify the steady states of a BRN, we first need a mathematical model that describes its states and how the network moves from one state to another. Let $X_i(t) \in \{\text{true}, \text{false}\}$ denote the state of the $i$th gene at time $t$. Here, true denotes that the $i$th gene is active and false denotes that it is inactive. We write $X_i$ instead of $X_i(t)$ for simplicity wherever appropriate.

We summarize as follows the interactions that determine the next state of the $i$th gene from the activity values at time $t$. The $i$th gene becomes inactive if at least one of its suppressors is active. If all the suppressors of the $i$th gene are inactive and at least one of its activators is active, then it becomes active in the next time step. In all other situations, the state of the $i$th gene remains unchanged. Although the assumption that a single inhibitor can suppress all activators may seem questionable, it is commonly observed in biological networks. Wu et al. [137] named this the strong inhibition model and showed that it produces the same results as the threshold network model [138] for the fission yeast cell cycle network. Garg et al. also adopted it as a modeling decision [121, 122]. However, it is important to note that our method does not depend on this assumption.

The following equation summarizes how the next state of the $i$th gene is determined:

$$X_i(t+1) = \big(X_i(t) \vee p_A(t)\big) \wedge \neg p_S(t)$$

In this equation, the symbols $\vee$ and $\wedge$ denote the logical OR and AND operators, and $p_A(t)$ and $p_S(t)$ represent predicates for the activators and the suppressors of the $i$th gene at time $t$, respectively. We compute these predicates as $p_A(t) = \bigvee_{j \in A} X_j(t)$ and $p_S(t) = \bigvee_{j \in S} X_j(t)$, where $A$ and $S$ are the sets of indices of the activators and the suppressors of the $i$th gene.

An important observation is that, even though the next state of the $i$th gene is calculated deterministically, there can be multiple next states for the whole network
since we use the asynchronous model. A state of a given BRN is defined by the states of its individual genes. Let $u = [X_1, \ldots, X_n]$ denote a state of the network. The network can move from state $u$ to state $v = [X_1, \ldots, X_{i-1}, \neg X_i, X_{i+1}, \ldots, X_n]$ only if the $i$th gene is one of the genes that can change state. The individual genes that can change state at a given state determine the possible next states of the network.

We model the changes in the states of a BRN using an abstract graph representation. In this graph, each vertex corresponds to a possible state of the BRN. Thus, if there are $n$ genes in a BRN, the corresponding graph contains $2^n$ vertices. There is an edge from vertex $u$ to vertex $v$ if the BRN can move from the state represented by $u$ to the state represented by $v$ by changing the state of only a single gene. There can be up to $n \cdot 2^n$ edges between these states. This graph is hypothetical, as we use it only for building our mathematical model; we never materialize this exponential graph in our method.

We classify the vertices of this graph into three classes based on the number of their outgoing edges. Figure 6-1 provides visual examples of all three state types listed below:
- Type 0: The vertices that have no outgoing edges except self cycles. These vertices correspond to steady states, as the state of the network cannot change once one of them is visited (Figure 6-1a).
- Type 1: The vertices that have exactly one outgoing edge. The states for these vertices can be steady or transient (Figure 6-1b).
- Type 2: The vertices that have two or more outgoing edges. All Type 2 states are transient (Figure 6-1c).

In the following section, we describe our method for segregating the state space into the above three types.

6.1.2 Segregation of States Using BDDs

As discussed in the previous section, we never generate the state transition graph of the input network. A simple observation about our state transition model allows
us to segregate the states without this materialization. This segregation not only identifies all Type 0 steady states immediately, but also eliminates a huge portion of the states by classifying them as transient.

For instance, for the T-helper cell network with 23 genes and 8,388,608 ($2^{23}$) possible states, our segregation method classifies 1,321 states as Type 0 and 8,364,757 (≈ $2^{23}$) states as Type 2 in only 0.08 seconds. The remaining 22,530 states are labeled as Type 1. Thus, we need to explore only a small percentage (about 0.27%) of the whole state space.

Here, we describe how we construct the BDDs for all Type 0 states and all Type 1 states, namely $Z_0$ and $Z_1$. We first define a predicate that will be handy in this discussion:

$$C_i = X_i(t+1) \oplus X_i(t)$$

Here, $\oplus$ denotes the logical XOR operator. $C_i$ evaluating to true at time $t$ means that gene $i$ will change its state from $X_i$ to $\neg X_i$ at time $t+1$. Otherwise, the gene preserves its current state. The following equations give the formulas of the BDDs representing Type 0 and Type 1 states:

$$Z_0 = \bigwedge_i \neg C_i \qquad \text{and} \qquad Z_1 = \bigvee_i \Big( C_i \wedge \bigwedge_{j \neq i} \neg C_j \Big)$$

The states with $Z_0$ = true do not satisfy any of the $C_i$ conditions (i.e., none of the genes changes state). The states with $Z_1$ = true satisfy exactly one of the $C_i$ conditions (i.e., exactly one gene changes state). The states that are not included in the two BDDs above are called Type 2, and they are all transient states. The BDD for these states can be constructed similarly. However, we simply eliminate these states, since they do not reflect the long-term behavior of the system. By doing this without materialization, we quickly reduce the state space of the problem to a significantly smaller one. In the next section, we describe how we extract the steady states of Type 1.
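For a handful of genes, the update rule of Section 6.1.1 and the $Z_0$/$Z_1$ segregation can be checked by brute force. The sketch below replaces the BDDs with an explicit loop over all $2^n$ states. It therefore illustrates the logic only and does not scale. The three-gene network and all function names are hypothetical.

```python
from itertools import product

# Hypothetical 3-gene network: gene index -> (activators, suppressors).
NETWORK = {
    0: ({2}, set()),
    1: ({0}, {2}),
    2: (set(), {1}),
}


def next_value(state, i, network):
    """X_i(t+1) = (X_i(t) or p_A(t)) and not p_S(t) (strong inhibition)."""
    activators, suppressors = network[i]
    p_a = any(state[j] for j in activators)
    p_s = any(state[j] for j in suppressors)
    return (state[i] or p_a) and not p_s


def changing_genes(state, network):
    """Indices i for which C_i = X_i(t+1) XOR X_i(t) is true."""
    return [i for i in range(len(state)) if next_value(state, i, network) != state[i]]


def successors(state, network):
    """Asynchronous model: flip exactly one gene whose C_i is true."""
    return [tuple(not x if j == i else x for j, x in enumerate(state))
            for i in changing_genes(state, network)]


def segregate(network):
    """Split all 2^n states into Z0 (no C_i), Z1 (exactly one C_i) and Type 2."""
    z0, z1, z2 = [], [], []
    for state in product((False, True), repeat=len(network)):
        count = len(changing_genes(state, network))
        (z0 if count == 0 else z1 if count == 1 else z2).append(state)
    return z0, z1, z2


z0, z1, z2 = segregate(NETWORK)
print(len(z0), len(z1), len(z2))                   # 4 2 2
print(successors((False, False, True), NETWORK))   # [(True, False, True)]
```

In this toy network, both Type 1 states lead directly into a Type 0 state, so it has no cyclic steady states.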

6.1.3 Extracting Cyclic Steady States

In this section, we develop a randomized traversal strategy that identifies the steady states of Type 1. We call these states cyclic steady. An example is the cycle of four states in Figure 6-1b. At the end of each traversal, we remove the traversed states from the state space using the BDD difference operator. In other words, our method avoids redundant enumeration of the states. After traversing a portion of the vertices, we estimate the total number of steady states, the probability of each gene being active and the joint probability of gene pairs being co-expressed in steady states. Note that our traversal method never traverses a state more than once. Hence, if it runs long enough, it labels all the Type 1 states as steady or transient. Algorithm 2 briefly describes how we traverse the Type 1 states. Next, we elaborate on the individual steps of this algorithm.

Algorithm 2. Randomized traversal of Type 1 states
1. Randomly get an unobserved vertex from the Type 1 set.
2. Follow the outgoing edge to traverse the graph until reaching one of the following vertices:
   (i) A vertex that was labeled as transient or steady in previous iterations.
   (ii) A vertex that was already traversed in this iteration.
3. Label all the traversed vertices as transient or steady and update the estimators.
4. Stop if the number of steady states observed so far is sufficient.

Step 1. Selecting a random seed state:

We obtain a random seed state among the untraversed satisfying assignments of the BDD for Type 1 states. We do this by traversing the BDD from the root node to the leaf level. At each step of the traversal, we randomly pick a child node of the currently visited node. When we reach the leaf level of the BDD, the states of all the genes are determined, and hence so is our seed state for the whole BRN.

Step 2. Traversal starting from the seed state:

Once we choose an unobserved seed state, the next step is to determine whether or not a new steady state can be reached from this state. To do this, we traverse the state transition graph starting from this vertex by following the edges.

Figure 6-2. Summary of the traversal process for a randomly picked state a from the unobserved Type 1 states. If the path starting from a ends at b, c, d or e, then all the states on this path are transient (Step 2(i) of Algorithm 2). If the path starting from a ends at a state like f, then all the states on the path from a to f are transient (excluding f) and all the states on the cycle from f back to f are steady.

Since the seed state is of Type 1, by definition it has only one outgoing edge. Thus, we can easily find the next state as the state that satisfies the transition condition. We continue the traversal by applying the same principle. Figure 6-2 summarizes the cases that can occur during this traversal. Starting from an unobserved state, if we traverse one of the following three paths, then all the states visited on this path are transient:
- A path ending in a Type 0 state
- A path ending in a Type 2 state
- A path ending in a state that was observed in previous iterations

Notice that all three cases correspond to Step 2(i) of our traversal method. The next case produces both cyclic steady and transient states:
- A path leading to a cycle of states visited in the current iteration

In this case, we label all the states on the cycle as steady and the other states on the path as transient. For instance, if the traversal starts from the [001] state in Figure 6-1b, then [001] is transient and the other four states are Type 1 steady states.

Step 3. Calculating estimators:

At each iteration, we traverse a path in the state transition graph and label each state on this path as transient or steady. We call the set of vertices visited in each such traversal an observation. Using these observations, we develop estimators for the total number and the profile of the steady states. The profile of the steady states is the vector whose $i$th entry is the expected fraction of the steady states in which the $i$th gene is active. For example, if the second entry of the profile is 0.95, we expect the second gene to be active in 95% of the steady states. We also compute estimators for the joint expression (co-expression) fractions of gene pairs. Computing these estimates is important, as they can lead to an early prediction of the steady-state profile.

Here, we describe in detail the calculation and the analysis of the estimator of the total number of Type 1 steady states. First, we prove that it is an unbiased estimator. Then, we discuss how to minimize the variance of this estimator. For the other estimators, we only give the formulations.

First, let us introduce some notation that we use throughout this section:
- $N_0$, $N_1$: Number of Type 0 and Type 1 states, respectively. We calculate these numbers at the initial segregation step.
- $O_i = (s_i, t_i)$: The $i$th observation. $s_i$ and $t_i$ are the numbers of steady and transient states traversed in this observation.
- $S_i$, $T_i$, $U_i$: Total number of observed steady states, observed transient states and unobserved states after the first $i$ observations, respectively.

From the definitions above, we can calculate $U_i = N_1 - S_i - T_i$, $S_i = \sum_{j=1}^{i} s_j$ and $T_i = \sum_{j=1}^{i} t_j$. Now, we introduce a 0/1 random variable $B_i$ for each observation $O_i$. At a
given time, $B_i = 1$ means that the current iteration results in observation $O_i$. We model our sampling by assuming that, at any time, exactly one of the $B_i$'s is 1. In other words, $E[B_i B_j] = 0$ for any $i \neq j$. Notice that $E[B_i] = E[B_i^n] = \frac{s_i + t_i}{N_1}$ for observation $O_i$. We formulate the estimator of the total number of Type 1 steady states at the $i$th iteration as:

$$F_i = \sum_{k=0}^{i} B_k \frac{s_k N_1}{s_k + t_k}$$

Lemma 8. The estimator $F_i$ is an unbiased estimator.

Proof: We prove this by showing that the expected value of $F_i$ is equal to the total number of Type 1 steady states. Taking expectations of both sides and replacing $E[B_k]$ with $\frac{s_k + t_k}{N_1}$:

$$E[F_i] = E\Big[\sum_{k=0}^{i} B_k \frac{s_k N_1}{s_k + t_k}\Big] = \sum_{k=0}^{i} E\Big[B_k \frac{s_k N_1}{s_k + t_k}\Big] = \sum_{k=0}^{i} s_k$$

After defining the estimator, the next step is to calculate its variance.

Lemma 9. The variance of $F_i$ is

$$\mathrm{Var}[F_i] = \sum_{j=0}^{i} \frac{s_j^2 N_1}{s_j + t_j} - \Big[\sum_{j=0}^{i} s_j\Big]^2$$

Proof: We know that $\mathrm{Var}[F_i] = E[F_i^2] - E^2[F_i]$. We first compute $F_i^2$:

$$F_i^2 = \Big(\sum_{j=0}^{i} B_j \frac{s_j N_1}{s_j + t_j}\Big)\Big(\sum_{k=0}^{i} B_k \frac{s_k N_1}{s_k + t_k}\Big) = \sum_{j \neq k} B_j B_k \frac{s_j s_k N_1^2}{(s_j + t_j)(s_k + t_k)} + \sum_{j=0}^{i} B_j^2 \frac{s_j^2 N_1^2}{(s_j + t_j)^2}$$
When we take the expected value of $F_i^2$, the first term vanishes, since $E[B_j B_k] = 0$ for any $j \neq k$. Hence, the variance of $F_i$ can be computed as:

$$\mathrm{Var}[F_i] = E[F_i^2] - E^2[F_i] = E\Big[\sum_{j=0}^{i} B_j^2 \frac{s_j^2 N_1^2}{(s_j + t_j)^2}\Big] - \big(E[F_i]\big)^2 = \sum_{j=0}^{i} \frac{s_j^2 N_1}{s_j + t_j} - \Big[\sum_{j=0}^{i} s_j\Big]^2$$

There are many ways to build an estimator from the $F_j$'s. However, it is desirable to build an estimator with a small variance, as it converges to the true solution faster. The following lemma gives the estimator with minimum variance.

Lemma 10. The estimator with the smallest variance is $T = \sum_j \frac{1}{V_j \sum_i \frac{1}{V_i}} F_j$.

Proof: We now discuss how to combine the estimators $F_1, F_2, \ldots, F_n$, with variances $V_1, V_2, \ldots, V_n$, so as to minimize the overall variance of our estimate. In other words, we want to find weights $\alpha_1, \alpha_2, \ldots, \alpha_n$ such that $\sum_i \alpha_i = 1$ and the variance of the estimator for the total number of Type 1 steady states is minimized. Let us denote this new estimator as $T = \sum_i \alpha_i F_i$. Then, treating the $F_i$ as uncorrelated,

$$\mathrm{Var}(T) = \sum_i \alpha_i^2 V_i$$

Mathematically, our aim is to minimize $\sum_i \alpha_i^2 V_i$ subject to $\sum_i \alpha_i = 1$. We formulate this problem with a Lagrange multiplier $\lambda$ as follows:

$$L = \sum_i \alpha_i^2 V_i - \lambda \Big(\sum_i \alpha_i - 1\Big)$$

Taking the partial derivative of $L$ with respect to each $\alpha_i$ and applying the constraint, we get:

$$2 \alpha_i V_i - \lambda = 0, \qquad \lambda = \frac{1}{\sum_i \frac{1}{2 V_i}}$$
Solving these equations, we get the $\alpha_j$ values that minimize $\mathrm{Var}(T)$:

$$\alpha_j = \frac{1}{V_j \sum_i \frac{1}{V_i}}$$

Thus, using these values of $\alpha_j$, the estimator with the smallest variance is

$$T = \sum_j \frac{1}{V_j \sum_i \frac{1}{V_i}} F_j$$

Next, we give the estimators for the fraction of steady states in which each gene is active and in which each gene pair has the same activity level. First, we formulate our estimator for the fraction of cyclic steady states in which a gene is active. Let $n_{k,i}$ be the number of steady states in the $i$th observation in which the $k$th gene is active. An estimator for the $k$th gene after the $i$th iteration is then:

$$G_{k,i} = \sum_{j=1}^{i} n_{k,j} / S_i$$

Let $n_{a \leftrightarrow b, i}$ denote the number of steady states in the $i$th observation in which gene $a$ and gene $b$ are both active or both inactive. We calculate the estimator of the joint probability that two genes have the same activity level in a steady state as:

$$J_{a \leftrightarrow b, i} = \sum_{j=1}^{i} n_{a \leftrightarrow b, j} / S_i$$

Step 4. Stopping criteria:

When our method finishes traversing all Type 1 states (Steps 1 to 3), it has found all the steady states. However, in some applications it might be sufficient to find a predetermined percentage of the steady states. We develop a statistical criterion that terminates the algorithm quickly once a sufficient portion of the Type 1 states has been explored. Our method still guarantees, with high confidence, that the desired percentage of the results has been found. More precisely, when the user supplies a parameter $\rho$ (e.g., 0.9), we compute at each iteration a confidence $c \in [0, 1]$ such that at least $100\rho$
percent of the steady states are found with probability at least $c$. This is desirable, as the user can terminate the loop when $c$ is large enough for the underlying application.

Now, let us describe how the stopping criterion works. Let $A$ denote the actual total number of Type 1 steady states. If we knew the value of $A$, we could stop sampling with a confidence value of $c = 1$ when $A$ … $U_i + S_i$, we just stop sampling with $c = 1$. In that case, even if all the unobserved states were steady, the reported ones would constitute at least $100\rho$ percent of the Type 1 steady states. Otherwise, we calculate the confidence value in the $i$th iteration as the probability that we would have observed at least $S_i$ steady states in our observations so far if there were $A_i$ unobserved steady states. Formally, we compute the confidence as:

$$C(A_i) = \sum_{k=S_i}^{S_i + T_i} \binom{S_i + T_i}{k} q_i^{k} (1 - q_i)^{S_i + T_i - k}$$

$q_i$ in Equation 6 is the fraction of steady states if there were $A_i$ steady states among the Type 1 states, i.e., $q_i = A_i / N_1$. Each term of the summation is the probability of getting exactly $k$ steady states among the $S_i + T_i$ currently observed states if the probability of a state being steady is $q_i$.

Lemma 11 shows that the confidence value reported when we stop sampling is never an overestimate.

Lemma 11. The confidence value given in Equation 6 using $A_i$ does not lead to false dismissal.

Proof: Here, we have three cases to consider.
Case 1: $A > A_i$. Then $q = A/N_1 > A_i/N_1 = q_i$. Since the confidence value is calculated as the area under the right-hand tail of the probability distribution (i.e., the complementary CDF), $c$ will be larger for a larger value of $q$. Hence, $C(A) > C(A_i)$. That means that whenever we stop sampling, the confidence we report is conservative.
Case 2: $A = A_i$. Trivially, $C(A) = C(A_i)$ when we terminate the sampling.
Case 3: $A < A_i$. [...]
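The minimum-variance combination and the stopping confidence $C(A_i)$ can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, the sketch assumes that $T_i$ counts the observed Type 1 states that turned out not to be steady (so that $S_i + T_i$ is the number of states observed so far, as the summation in Equation 6 suggests), and the reported $\mathrm{Var}(T)$ additionally assumes that the individual estimates $F_j$ are independent.

```python
import math


def min_variance_combination(estimates, variances):
    """Combine unbiased estimates F_j that have variances V_j.

    The weight of F_j is (1/V_j) / sum_i (1/V_i), which minimizes Var(T).
    Returns the combined estimate T, the weights, and Var(T) = 1 / sum_i (1/V_i),
    which holds when the F_j are independent (an assumption of this sketch).
    """
    inverse = [1.0 / v for v in variances]
    total = sum(inverse)
    weights = [w / total for w in inverse]
    combined = sum(w * f for w, f in zip(weights, estimates))
    return combined, weights, 1.0 / total


def stopping_confidence(steady_seen, others_seen, assumed_steady, type1_total):
    """C(A_i): probability of at least S_i steady states among the S_i + T_i
    observed states when each state is steady with probability q_i = A_i / N_1
    (the upper tail of a binomial distribution)."""
    q = assumed_steady / type1_total
    n = steady_seen + others_seen
    return sum(math.comb(n, k) * q**k * (1 - q) ** (n - k)
               for k in range(steady_seen, n + 1))


if __name__ == "__main__":
    T, w, var_T = min_variance_combination([1.0, 4.0], [1.0, 3.0])
    print(T, w, var_T)  # up to rounding: 1.75, [0.75, 0.25], 0.75
    # Case 1 of Lemma 11: assuming more steady states gives a larger confidence.
    print(stopping_confidence(3, 2, 4, 10), stopping_confidence(3, 2, 6, 10))
```

The two confidence values printed at the end, about 0.317 and 0.683, illustrate Case 1 of Lemma 11: assuming more steady states ($A_i = 6$ instead of 4 out of $N_1 = 10$) can only increase $C(A_i)$.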

gap between the second and fourth phases ($G_2$). The cell division is completed at the fourth phase, named $M$. The two new cells then enter the $G_1$ phase again, which completes the cycle. The state corresponding to the $G_1$ phase is the steady state that is observed most often in the yeast life cycle.

Figure 6-3. Regulatory network of the cell cycle of budding yeast. Red arrows with pointed heads represent activation, black arrows with bar heads represent inhibition, and yellow arrows indicate self-degradation.

Li et al. [36] studied the Boolean network model of budding yeast (Figure 6-3) and identified the Boolean states visited during a complete cell cycle, together with seven steady states of the network corresponding to the fixed points of the dynamic system. Similarly, Davidich et al. [35] found thirteen different steady states for the Boolean model of the fission yeast cell cycle (Figure 6-4).

Here we compare the steady states reported by our method with those found by the methods of Li et al. and Davidich et al. For this, we use vector notation to represent the activity levels of an ordered gene set. In this notation, 0 means the corresponding gene is inactive, 1 means it is active, and X means it can be either. For instance, for the gene set $\{g_1, g_2, g_3\}$, the vector [01X] represents two states, namely [010] and [011].
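The X notation can be expanded mechanically. The following Python sketch, built around a hypothetical helper `expand`, lists every concrete state that a pattern denotes; each X doubles the number of states, so a pattern with $r$ X entries stands for $2^r$ states.

```python
from itertools import product


def expand(pattern):
    """Return all concrete states denoted by a pattern over {'0', '1', 'X'}.

    'X' stands for "either 0 or 1"; states are returned in lexicographic order.
    """
    if set(pattern) - {"0", "1", "X"}:
        raise ValueError(f"invalid symbol in pattern {pattern!r}")
    choices = [("0", "1") if symbol == "X" else (symbol,) for symbol in pattern]
    return ["".join(bits) for bits in product(*choices)]


print(expand("01X"))               # ['010', '011']
print(len(expand("0000X000X00")))  # 4: two X entries give 2**2 states
```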

Figure 6-4. Regulatory network of the cell cycle of fission yeast. Red arrows with pointed heads represent activation, black arrows with bar heads represent inhibition, and yellow arrows indicate self-degradation.

The budding yeast cell cycle network in Figure 6-3 is the same as the one analyzed by Li et al. [36]. We use the order {Cln3, MBF, SBF, Cln1-2, Cdh1, Swi5, Cdc20, Clb5-6, Sic1, Clb1-2, Mcm1} for the vector representation of the states of the eleven genes in this network. Following Li et al., we exclude cell size from the gene set and the state representation. Li et al. reported seven steady states for this network, one of which corresponds to the $G_1$ phase of the cell cycle. We identified eight different steady states, six of Type 0 and two of Type 1. The six Type 0 steady states we found are the [0000X000X00] states and the [0100X000100] states, and all of them are also reported by Li et al. In particular, our method correctly labeled the [00001000100] state, which corresponds to the $G_1$ phase, as steady. The two Type 1 steady states, which visit each other in a cycle, are $SS_1$ = [00100000000] and $SS_2$ = [00110000000]. $SS_1$ is the state in which SBF is the only active gene in the network. $SS_1$ is followed by $SS_2$ since, in the next time step, SBF also activates Cln1-2. Due to the self-degradation of Cln1-2 in state $SS_2$, this state goes back to $SS_1$. The method of Li et al. labels $SS_2$ as steady, whereas it does not report $SS_1$.
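The set bookkeeping in this comparison can be checked mechanically. The sketch below, which uses a small hypothetical helper `expand` for the X notation, confirms that the two Type 0 patterns denote six distinct states, that the $G_1$ state [00001000100] is among them, and which genes are active in $G_1$, $SS_1$, and $SS_2$ under the gene order above. It only manipulates the reported vectors; it does not simulate the network.

```python
from itertools import product

# Gene order used for the budding yeast state vectors (cell size excluded).
GENES = ["Cln3", "MBF", "SBF", "Cln1-2", "Cdh1", "Swi5",
         "Cdc20", "Clb5-6", "Sic1", "Clb1-2", "Mcm1"]


def expand(pattern):
    """All 0/1 states denoted by a pattern in which 'X' means either value."""
    choices = [("0", "1") if s == "X" else (s,) for s in pattern]
    return {"".join(bits) for bits in product(*choices)}


def active_genes(state):
    """Names of the genes whose entry is 1 in the state vector."""
    return [gene for gene, bit in zip(GENES, state) if bit == "1"]


type0 = expand("0000X000X00") | expand("0100X000100")
print(len(type0))                   # 6
print("00001000100" in type0)       # True: the G1 state is among them
print(active_genes("00001000100"))  # ['Cdh1', 'Sic1']
print(active_genes("00100000000"))  # SS1: ['SBF']
print(active_genes("00110000000"))  # SS2: ['SBF', 'Cln1-2']
```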

For the states of the fission yeast cell cycle in Figure 6-4, we use the ordered gene set {Start, SK, Cdc2/Cdc13, Ste9, Rum1, Slp1, Cdc2/Cdc13*, Wee1/Mik1, Cdc25, PP}. Our method reports fifteen different steady states, all of Type 0. These states are the [0001X00XX0] states, the [0000100XX0] states, the [00000001X0] states, and [0000000000]. The first set of states contains the steady state [0001100100], which corresponds to $G_1$, the most stable phase of the cell cycle. This state, together with twelve other steady states we found, matches exactly the ones found by Davidich et al. [35]. The two additional steady states that we found, beyond those of Davidich et al., are [0000000100] and [0000000110]. The first corresponds to high activation of only the Wee1/Mik1 genes, and the second is the state in which Cdc25 is also active together with Wee1/Mik1. The reason for this difference is that Davidich et al. manually set a negative threshold for Cdc2/Cdc13 activation. Cdc2/Cdc13 degrades Wee1/Mik1, which prevents their system from visiting the two steady states that we found without setting any threshold manually.

These two examples suggest that our method can accurately identify the steady states of BRNs.

6.2.2 Performance Evaluation

Here, we compare the performance of our method to that of Garg et al. [121, 122]. We used the asynchronous state transition model for both algorithms in this experiment. We compared the running times for a number of real BRNs as well as for randomly generated networks. We compiled the real BRNs from the pathway database PID [9] and other published work [35, 36, 121, 122]. Table 6-1 reports the running times for Garg et al.'s method, named Genysis, and for our algorithm with different parameter settings. For small real networks, such as the yeast cell cycles and the T-Helper network, the running times of both methods are about one second or less, with Genysis usually running slightly faster than our method. However, for bigger real networks our method's running time is significantly smaller than that of Genysis. As the authors also stated in their work, Genysis
might need an extensive amount of running time when using the asynchronous model, due to its heuristics for selecting seed states from the state space. The row corresponding to the p38 MAPK signaling pathway is a good example of this scenario. For the same network, our algorithm can identify 90% of the steady states with 90% confidence in only 11.3 seconds. Additionally, the running times on four randomly generated networks indicate that Genysis does not scale well with growing network size, whereas our algorithm can still find a large portion of the steady states in at most about an hour (Table 6-1). It is worth noting that both Genysis and our algorithm have exponential time and space complexity in the worst case. This is a direct consequence of using the BDD data structure, which has exponential worst-case complexity.

We also compared the steady states found by both algorithms for the two yeast cell cycles. As discussed in the previous section, the steady states of these two networks are reported in Li et al. [36] and Davidich et al. [35]. For the budding yeast cell cycle in Figure 6-3, Genysis was able to identify only the trivial steady state in which all the genes are inactive. For fission yeast, Genysis labeled the state that corresponds to the $G_1$ phase of the cell cycle (in which only the genes Ste9, Rum1, and Wee1/Mik1 are active) as transient. As reported in Davidich et al. [35], $G_1$ is the most stable phase of this cycle, and our method correctly classifies this state as steady.

The above results support that our algorithm is more scalable and practical than Genysis. Furthermore, the steady states we report for the yeast cell cycles agree better with previous findings.

6.2.3 Co-Expressed Gene Pairs in Human Hedgehog Network

We calculate the fraction of steady states in which two genes are both in the active state. Biologically, this fraction corresponds to the co-expression of the two genes. Revealing co-expressed genes is of great significance for the discovery of conserved genetic modules [131, 141, 142] and the identification of differentially expressed genes [143].
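A minimal sketch of this co-activity fraction is shown below, assuming that the full list of steady states is available as 0/1 vectors (our algorithm instead estimates these fractions online while sampling). The function names are hypothetical. `rank_pairs` produces a decreasing ordering of gene pairs of the kind compared with COXPRESdb in this section, and the optional `same_level` flag computes the "both active or both inactive" variant that the estimator $J_{a \leftrightarrow b,i}$ targets.

```python
from itertools import combinations


def pair_fractions(steady_states, same_level=False):
    """Fraction of steady states in which each gene pair is active together.

    steady_states: list of 0/1 state vectors of equal length.
    same_level=True instead counts states in which the two genes are both
    active or both inactive.
    """
    n_genes = len(steady_states[0])
    total = len(steady_states)
    fractions = {}
    for a, b in combinations(range(n_genes), 2):
        if same_level:
            hits = sum(1 for s in steady_states if s[a] == s[b])
        else:
            hits = sum(1 for s in steady_states if s[a] == 1 and s[b] == 1)
        fractions[(a, b)] = hits / total
    return fractions


def rank_pairs(fractions):
    """Gene pairs sorted by decreasing fraction (ties broken by pair index)."""
    return sorted(fractions, key=lambda pair: (-fractions[pair], pair))


states = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0]]
print(pair_fractions(states))              # {(0, 1): 0.5, (0, 2): 0.25, (1, 2): 0.25}
print(rank_pairs(pair_fractions(states)))  # [(0, 1), (0, 2), (1, 2)]
```

For a network with 17 genes, `combinations` yields the 136 gene pairs mentioned below.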

Here, we compare the co-expression values for gene pairs found by our algorithm with the values reported in the gene co-expression database COXPRESdb [144]. For this purpose, we use the Hedgehog signaling network of Homo sapiens given in the KEGG Pathway Database [7]. This network consists of 17 genes and hence 136 possible gene pairs. We sorted the gene pairs according to their co-expression values in decreasing order and compared our ordering with the one in COXPRESdb. We picked the top 20 gene pairs from our list and searched for the indices of these pairs in the ordering of COXPRESdb. Here, for different values of $k$, we report the largest index $l$ among the COXPRESdb indices of our top $k$ pairs.

For $k = 1$ we have $l = 1$, which means that the most highly co-expressed gene pair in our ordering, GLI1-SMO, is also the top-scoring pair in COXPRESdb. For $k = 5$ we have $l = 6$, meaning that the five gene pairs with the highest ranks in our ordering, GLI1-SMO, GSK3B-FBXW11, RAB23-GAS1, GLI1-IHH, and SUFU-SMO, are among the top 6 pairs in the ranking of COXPRESdb. For $k = 10$ and $k = 15$, the $l$ values are 16 and 35, respectively. Hence, the gene pairs that our method finds to be active together in the steady states are likely to be co-expressed.

The above results suggest that our algorithm is useful for predicting the co-expression of genes by utilizing the steady-state information of BRNs.

6.2.4 Accuracy of Estimators

To evaluate the quality of our sampling-based estimators, we measured their correctness and convergence rate. Correctness means that the estimates eventually converge to the correct value. For the convergence rate, a good estimator should approximate the correct value after a small fraction of the state space is explored. We use a portion of the p53 network of Homo sapiens taken from KEGG [7] in this experiment. At each iteration of our algorithm, we measure for each gene the estimated number of steady states in which that gene is active. Our algorithm traverses the entire
space of Type 1 states in about 2,500 iterations for this network.

Figure 6-5. Convergence of the estimators for the steady-state profiles of the genes. These genes are a selected subset of the genes of the p53 network of Homo sapiens [7]. The y-axis shows, for each gene, the fraction of steady states in which the gene is in the active state.

Figure 6-5 shows the results for seven different genes. We plot these genes because they have different steady-state profiles. In other words, they vary in the fraction of steady states in which they are active (e.g., CHK1 is active whereas p21 is suppressed in most of the steady states). The results show that our estimators converge to the correct ratio for all genes in less than 500 iterations. This rapid convergence suggests that our algorithm approximates the correct profile of gene activity levels at steady states without traversing the whole space of Type 1 states. Hence, equipped with the stopping criterion we devised, our algorithm is also practical and accurate for BRNs with a large number of Type 1 states, since early termination of the algorithm does not lead to a significant deviation from the correct steady-state profile.

6.2.5 Discussion

In this chapter, we proposed a scalable algorithm that can identify all the steady states of Boolean BRNs accurately and efficiently. Our method uses asynchronous
state transitions to model the changes in the states of BRNs. We employed the BDD data structure to perform an initial segregation of the exponential state space without materializing it. This initial segregation, together with the stopping criterion we formulated, sped up the algorithm significantly. Additionally, our algorithm estimates, in an online fashion, the number of steady states and the expected behavior of individual genes and pairs of genes in the steady states.

Table 6-1. Comparison of our algorithm with an existing method on real and random networks.

Network name                   Genes  Interactions  Genysis(a)  Our algo.(b)  Our algo.(c)
Fission yeast cell cycle [35]     10            27       0.21s         0.18s         0.17s
Budding yeast cell cycle [36]     12            35       0.13s         0.25s         0.22s
T-Helper cells [121]              23            35       0.23s         1.14s         0.43s
p38 MAPK signaling [9]            26            28      545.3m         11.3s          2.1s
T-cell receptor [122]             40            58       20.7m         14.2s         2.11s
randomNet1                        20            32       10.4m        2.282s          0.3s
randomNet2                        30            48           -        5.923s         3.13s
randomNet3                        40            64           -          4.7m          3.4m
randomNet4                        50            80           -         68.7m         15.3m

(a) We used a cut-off time of 24 hours; "-" indicates that the method could not find all steady states within this time. s denotes seconds and m denotes minutes.
(b) Running time of our algorithm when 90% of the steady states are found with 90% confidence.
(c) Running time of our algorithm when 80% of the steady states are found with 80% confidence.

CHAPTER 7
DYNAMIC MODULAR STRUCTURE OF REGULATORY NETWORKS

The distribution of interactions between genes in regulatory and signaling networks, i.e., Biological Regulatory Networks (BRNs), is not random. They form statistically significant connected subnetworks corresponding to various biological functions. Such groups of genes are also called modules. Recent studies have shown that biological networks exhibit modularity [80, 142, 145–148]. In these networks, the interactions and the entities of a module collectively describe how a certain biological function is performed in an organism.

Numerous methods have been proposed to identify modules for different types of biological networks, such as metabolic networks [80, 97], protein-protein interaction networks [142, 145, 146], and BRNs [147, 148]. These methods, however, have two important drawbacks when they are applied to BRNs. We first elaborate on these two drawbacks and then discuss how we address these problems.

Ignoring interaction types and directions
Existing methods often consider a biological network as an undirected graph, where each molecule maps to a node and each interaction between a pair of molecules maps to an edge. In the resulting network, all the edges are undirected and assumed to be of the same type. In the rest of this chapter, we call networks constructed in this manner simplified networks. This modeling strategy causes a loss of biological context for BRNs, where interactions have directions and different types (activation/inhibition). Figure 7-1 shows an example of how this simplification degrades the accuracy of module identification. Consider the tissue factor pathway inhibitor TFPI. It inhibits four different coagulation factors, namely F3, F5, F7, and F10. Therefore, it acts as an anticoagulant when it is active. Indeed, it is annotated as a lipoprotein-associated coagulation inhibitor [7]. However, when these inhibitions are reduced to undirected edges as in Diao et al. [148], TFPI is grouped with most of the coagulation factors,
which act as coagulation activators. In other words, this simplification hinders the identification of the correct functional decomposition of the BRN.

Figure 7-1. A portion of the human coagulation cascade network from the KEGG Pathway Database [7]. Pointed arrowheads represent positive regulation (activation) and arrowheads with bars represent negative regulation (inhibition). The two divisions show the two modules found after simplifying the network as in existing methods.

Ignoring activity level changes of genes over time
The second weakness of existing methods is that they ignore the fact that the activity levels of the genes of a BRN can change over time due to interactions between genes. Existing methods often work only on static snapshots of BRNs. Consider the example in Figure 7-1. Here, if THBD is active at the current state, it will activate PROC in the next state, which in turn will inhibit F2, F5, and F8. These state changes create different snapshots of the BRN. The traditional approach to dealing with such dynamic networks is to identify the modules from scratch for each snapshot [149–155]. However, treating network snapshots independently may lead to substantial variation in the modules obtained for consecutive states, resulting in inconsistent modular structures. Several recent methods point out the importance of tracking the evolution of the modular structure
through dynamic steps in other contexts [156–158]. Furthermore, incrementally updating the modular structure is computationally more efficient than dealing with each snapshot separately.

In this work, we design an algorithm that considers both the interaction types and the directions. It also allows incremental updates of modules when the underlying BRN changes its state over time. This incremental strategy allows us to keep track of the evolution of individual modules and improves the running time. Below is the formal statement of the problem that we address in this work:

Problem definition: Given a BRN and an initial network state $S_0$ that consists of the state of each gene (active or inactive), identify the sequence of module structures $C_0, C_1, \ldots, C_t$ dynamically as the state of the network changes as $S_0 \rightarrow S_1 \rightarrow \cdots \rightarrow S_t$ over time, where $C_i$ is the partitioning of the genes of the input BRN into modules according to state $S_i$.

Our contributions: We develop a novel method to find the dynamic modules of a BRN when the state of the BRN can change over time. Our approach differs from existing ones from the very beginning, i.e., the modeling phase. Instead of finding modules of the simplified network (i.e., ignoring edge types and directions), we first create a new network, named the functional network, from the underlying BRN for each different state. The nodes of a functional network are the genes in the original BRN. The edges between two nodes represent the functional similarity of these genes at the corresponding state of the BRN. Starting from an initial state $S_0$, we use a state transition function to compute the next states of the BRN, and at each state we update the functional network by considering the state changes of the genes. We observe that the functional networks at two consecutive states often show high similarity. Following this observation, we develop an incremental algorithm that computes the modules of the BRN at a new state using its modules at the previous state and the changes in the functional network. To further reduce the computational cost, we develop a compact
representation of the network in which the module information is embedded and the modular structure is preserved.

Our technical contributions can be summarized as follows. We introduce the concept of the functional network, which represents the functional similarities between genes at a given state of a BRN. We propose an algorithm that incrementally identifies the modular structure of dynamic networks such as BRNs. We build a compact representation of the modular structure of BRNs. This allows our method to scale to large networks with many dynamic steps.

The rest of this chapter is organized as follows: Section 7.1 describes how we address the issues of existing methods and gives a detailed theoretical analysis of our algorithm. The experimental results are presented in Section 7.2.

7.1 Methods

This section discusses the algorithm we developed for identifying dynamic modules of BRNs. Briefly, this section is organized as follows. Section 7.1.1 discusses the simulation of the dynamic behavior of BRNs using a state transition model. Section 7.1.2 describes how we construct functional networks from the original BRNs for a given state of the network. Section 7.1.3 presents our algorithm that incrementally updates the modular structure of a BRN using functional networks at different time steps.

7.1.1 State Transitions

The dynamic behavior of BRNs stems from the alterations in gene activity levels. These alterations are determined by the states of interacting genes. We denote the state of the $i$th gene at time $t$ as $X_i(t)$, where $X_i(t) = 1$ means that the $i$th gene is active (high activity level) and $X_i(t) = 0$ means that it is inactive (low activity level). We denote the state of a BRN with $n$ genes at time $t$ using a vector with $n$ entries, $x(t) = [X_1(t), \ldots, X_i(t), \ldots, X_n(t)]$.

The states of the genes can change over time due to internal regulation. Let $A_i$ and $I_i$ be the sets of activators and inhibitors of the $i$th gene, respectively. An activator or inhibitor is active when its state is 1. The following equation computes the next state of the $i$th gene from the activity values of the genes at state $x(t)$:

$$X_i(t+1) = \begin{cases} 0 & \text{if } \sum_{j \in A_i} X_j(t) - \sum_{j \in I_i} X_j(t) < 0 \\ 1 & \text{if } \sum_{j \in A_i} X_j(t) - \sum_{j \in I_i} X_j(t) > 0 \\ X_i(t) & \text{if } \sum_{j \in A_i} X_j(t) - \sum_{j \in I_i} X_j(t) = 0 \end{cases}$$

7.1.2 Construction of Functional Networks

Recall that we define modules as sets of genes that collectively serve a certain biological function. To find such modules, the first question to be addressed is: How can we model the functional similarity between two genes? Here we build a biologically and statistically sound approach. We say that two genes have similar functions at a given state of the BRN if their impacts on the state of that BRN are similar.

For a given state of a BRN with $n$ genes, we construct an undirected, weighted graph, which we call the functional network. Each node of this network corresponds to a gene in the BRN. The weight of an edge shows the functional similarity of the two genes connected by that edge. We build this functional network as follows. We first calculate the impact of each gene on the given state. We represent the impact of each gene by an $n \times 1$ vector. Then, for each gene pair, we calculate the similarity of their impact vectors. We elaborate on each of these steps next.

Calculation of Impact Vectors: We denote the impact of the $i$th gene on the network at time $t$ by an $n \times 1$ vector $Imp_i(t)$, named the impact vector. We compute this vector as follows. Let $x(t) = [X_1(t), \ldots, X_n(t)]$ denote the state of a given BRN at time $t$. Also, let $y(t) = [X_1(t), \ldots, X_{i-1}(t), 1 - X_i(t), \ldots, X_n(t)]$ denote the state obtained by flipping only the state of the $i$th gene. We compute the next states of both $x(t)$ and $y(t)$ by applying our
state transition rule and denote them by $x(t+1)$ and $y(t+1)$, respectively. The state of the $i$th gene at time $t$ can be either 0 or 1. We then compute the impact vector of the $i$th gene as

$$Imp_i(t) = (-1)^{X_i(t)} \left( y(t+1) - x(t+1) \right).$$

Intuitively, we compute the difference between the two state vectors at time $t+1$ obtained when the $i$th gene is active and when it is inactive at time $t$. The $j$th entry of $Imp_i(t)$ is 1 if genes $i$ and $j$ undergo the same kind of state change (i.e., $0 \rightarrow 1$ or $1 \rightarrow 0$). This entry is 0 when flipping the state of the $i$th gene does not affect the state of the $j$th gene. If $i$ and $j$ undergo opposite state changes, the $j$th entry of $Imp_i(t)$ is $-1$. Thus, the impact vector shows the set of genes whose activity levels change when only the $i$th gene is altered, and how their activity levels change. For instance, let $x(t+1) = [0, 0, 1, 1]$ and $y(t+1) = [1, 1, 0, 1]$ for the first gene of a hypothetical network with four genes; then $Imp_1(t) = (-1)^0 [1, 1, -1, 0] = [1, 1, -1, 0]$. This shows that the second gene is activated and the third gene is inhibited by the activation of the first gene. The fourth entry of $Imp_1(t)$ is zero, meaning that the first gene has no state-changing effect on the fourth gene at time $t$. Biologically, the nonzero entries of $Imp_i(t)$ show the genes whose states are sensitive to the activity level of the $i$th gene at time $t$.

Calculation of Impact Similarities: Having constructed the impact vectors of all genes in the network for a specific state, we now describe how we use these vectors to calculate the similarities between the functions of genes (i.e., the edge weights in the functional network). For this purpose, we calculate the statistical significance of the similarity of their impact vectors as follows. Consider genes $i$ and $j$ at time $t$. Let $a$ and $b$ denote the numbers of nonzero entries of the $n \times 1$ impact vectors $Imp_i(t)$ and $Imp_j(t)$, respectively. Let $c$ denote the number of positions at which the two vectors have the same nonzero value. If both impact vectors have an equal value at position $k$, both gene $i$ and gene $j$ can alter the state of gene $k$ in the same direction at the current state. We compute the impact similarity as the negative log probability of the number of such commonly affected genes. Formally, let $K$ be the
random variable that denotes the number of common nonzero entries in $Imp_i(t)$ and $Imp_j(t)$, assuming that the nonzero entries are uniformly distributed in these vectors. Without loss of generality, we assume $a \geq b$. Then, we calculate the impact similarity of genes $i$ and $j$ at time $t$ as

$$Sim(i, j) = -\log \left[ \Pr(K \geq c) \right],$$

where $\Pr(K \geq c)$ denotes the probability that the number of common nonzero entries is greater than or equal to the observed value (i.e., $c$), calculated as

$$\Pr(K \geq c) = \sum_{x=c}^{b} \frac{\binom{n-a}{b-x} \binom{a}{x}}{\binom{n}{b}}.$$

Formal Definition of Functional Network: We conclude this section by formally defining the functional network. Given a BRN and its initial state $S_0$, the functional network of that BRN at state $S_t$ is the undirected weighted graph $G_t = (V, E_t)$, where each gene of the BRN corresponds to a node in $V$, and $E_t$ is the set of edges in $G_t$. We compute the state $S_t$ from $S_0$ using the given state transition function. Then, we calculate the edge weight between the $i$th and $j$th nodes (i.e., $Sim(i, j)$) as their impact similarity at state $S_t$.

7.1.3 Identification of Dynamic Modules

As a BRN moves from one state to the next, its functional network often changes only slightly. In this section, we develop an algorithm that exploits this observation. It computes the modular structure of the BRN at its new state from the modules of its old state instead of recomputing it from scratch. To develop our incremental algorithm, we first build a compact representation of the modular structure.

Compact Representation of a Network: Let $G = (V, E)$ be the functional network of a BRN at a certain state, together with the impact similarity function $w: E \rightarrow \mathbb{R}$ as explained in Section 7.1.2. Let $C = \{C_1, C_2, \ldots, C_k\}$ be the modular structure of $G$, where $C_i$ represents the $i$th module in $G$, $\bigcup_{i=1}^{k} C_i = V$, and $C_i \cap C_j = \emptyset$ for all $i \neq j$. We construct a new network that has a modular structure equivalent to that of $G$ but contains a significantly smaller
number of nodes and edges. We use this network in place of $G$ to reduce the running time of module identification as $G$ changes over time.

Figure 7-2. A hypothetical network with two modules (a) before compacting and (b) after compacting. The module boundaries are shown as dashed ovals. For simplicity, the weights of all the edges in (a) are one.

We build the compact representation $G' = (V', E')$ of $G = (V, E)$ as follows. Let us define the function $\delta$ over a module $C_i$ as $\delta(C_i) = \sum_{u, v \in C_i} w(u, v)$, where the sum runs over ordered pairs, so that each edge inside $C_i$ is counted twice. For each module $C_i$ in $G$, if $C_i$ contains only one node, we create one node $x_i$ in $G'$. If $C_i$ contains more than one node, we create two nodes $x_i, y_i$ in $G'$ and an edge between them with an edge weight equal to $\frac{1}{2}\delta(C_i)$. For each pair of modules $C_i, C_j$ in $G$, we insert four edges, one for each pair of nodes in opposing modules, $(x_i, x_j)$, $(x_i, y_j)$, $(y_i, x_j)$, $(y_i, y_j)$, each with an edge weight equal to one fourth of the sum of the edge weights between $C_i$ and $C_j$. Figure 7-2 illustrates this construction on a simple example.

Here we show that this construction embeds the module information of $G$ and preserves the modular structure. First, let us recall the well-known modularity score $Q$ of Newman et al. [159]. Given a modular decomposition $C = \{C_1, C_2, \ldots, C_k\}$,

$$Q(C) = \sum_{i} \left[ \frac{\delta(C_i)}{2m} - \left( \frac{\mathrm{vol}(C_i)}{2m} \right)^2 \right],$$
where $m$ is the total weight of all edges in the network and $\mathrm{vol}(C_i)$ is the sum of the weighted degrees of the nodes in $C_i$ (i.e., the total weight of the edges incident to nodes in $C_i$, with edges inside $C_i$ counted twice). We prove in Theorem 6 that the modular structure is preserved in the compact representation, using the lemmas below. We omit the proofs of the lemmas due to space limitations.

Lemma 12. (Conservation of modularity score) Let $G = (V, E)$ be a network and $C$ its modular structure. Then, the compact representation $G' = (V', E')$ has a modular structure $C'$ such that $Q(C') = Q(C)$.

Lemma 13. (Conservation of module membership) Let $G' = (V', E')$ be the compact representation of the network $G$. Also, let $x_i$ and $y_i$ ($x_i, y_i \in V'$) be the two nodes constructed to represent the module $C_i$ of $G$ while compressing $G$ to $G'$. Then, for any modular decomposition $C''$ of $G'$ in which $x_i$ and $y_i$ belong to two different modules, there exists a modular structure of $G'$ with a modularity score greater than $Q(C'')$ in which $x_i$ and $y_i$ belong to the same module.

Theorem 6. (Conservation of modular structure) Given a functional network $G = (V, E)$ and its optimal modular structure $C = \{C_1, C_2, \ldots, C_k\}$ (i.e., $Q(C)$ is maximum), let $G' = (V', E')$ be the compact representation of $G$ and $C' = \{C'_1, C'_2, \ldots, C'_k\}$ be its modular structure computed as described in this section. Then, $Q(C')$ is maximum over all possible modular decompositions, and there is a one-to-one correspondence between $C$ and $C'$.

Proof. Lemma 12 states that $Q(C) = Q(C')$. Here we first show that there exists no other modular decomposition $C'' = \{C''_1, C''_2, \ldots, C''_t\}$ of $G'$ such that $C'' \neq C'$ and $Q(C'') > Q(C')$. Since $C'' \neq C'$, there exist at least two nodes $x_i, y_i$ such that $x_i, y_i \in C'_i$ but $x_i \in C''_a$ and $y_i \in C''_b$, where $C''_a \neq C''_b$. By Lemma 13, we know that there exists another modular decomposition in which $x_i, y_i$ belong to the same module and the modularity score is greater than $Q(C'')$. Applying this argument iteratively, we get that $Q(C')$ is maximum over all possible modular decompositions. Also, again by Lemma 13, each $C'_i = \{x_i, y_i\}$
and corresponds to the original module $C_i$ of $G$. Hence, there exists a one-to-one correspondence between $C$ and $C'$.

Incremental Identification of Dynamic Modular Structure:
Here we describe our algorithm that identifies the modules of BRNs at each state incrementally (i.e., without recomputing them from scratch) by utilizing the compact representation we devised. Formally, let $S_0$ be the initial state of a given BRN and $G_t = (V, E_t)$ denote the functional network of this BRN at state $S_t$, where $w_t: E_t \rightarrow \mathbb{R}$ is a non-negative weight function. Also, let $C_t = \{C_{t,1}, C_{t,2}, \ldots, C_{t,k}\}$ be the modular structure of $G_t$, for all $t \geq 0$. Algorithm 3 computes $C_{t+1}$ using $C_t$ and $G_t \oplus G_{t+1}$, where $\oplus$ denotes the symmetric difference. At a high level, our method identifies the modules of dynamic BRNs by employing an external algorithm $\mathcal{A}$ to calculate the modular structure of the compact representation at each state. For this, we use the well-known CNM algorithm [160] as $\mathcal{A}$.

We give a description of our incremental method in Algorithm 3. First, we compute the initial modular structure $C_0$ and its compact representation $C_0^c$ from the functional network $G_0$ at state $S_0$. After that, until the network reaches a steady state [161] or visits a user-defined number of states, for each state $S_t$ we apply the following three main steps:

(1) Combine the changes relative to the previous state with the previous modular structure to update the compact representation of $G_t$ (Algorithm 3, lines 5-16);
(2) Apply algorithm $\mathcal{A}$ to the compact representation to obtain its modular structure;
(3) Refine the obtained modular structure of the compact representation and decompress it to get the actual modular structure of the network.

In step (1), we update the compact representation as follows. Consider the set of nodes $\Delta V$ that are incident to updated edges. The nodes of the set $\Delta V$ may change their memberships. To allow these nodes to leave their previous modules and join new ones as the network state changes, we move all nodes in $\Delta V$ out of
their modules and treat each node as a singleton module. In other words, for each node $u \in \Delta V$, we create a new node $x_u$ as a new singleton module in the compact representation. For each new node $x_u$, if $u$ is adjacent to any module $C_i$, we create two new edges, $(x_u, x_i)$ and $(x_u, y_i)$, to connect this singleton module to $C_i$, and we assign to each of them half of the sum of the edge weights between these two communities. For any $C_i$ containing $u$, we also adjust the weight of edge $(x_i, y_i)$ by subtracting the sum of the edge weights from $u$ to the other nodes in $C_i$, to reflect the removal of $u$ from $C_i$.

After incorporating the changes into the compact representation, in step (2), we run algorithm $\mathcal{A}$ again on the new compact representation to obtain its modules. Note that the modular structure obtained at this step is in compact form. This reduces the running time of this step compared to running $\mathcal{A}$ on the uncompressed network.

Algorithm 3 Incremental Algorithm for Identifying Dynamic Modules of a BRN
1: Compute the functional network $G_0$ using $S_0$
2: Find the modular structure $C_0$ of $G_0$ using algorithm $\mathcal{A}$
3: Compute the compact modular structure $C_0^c$ of $G_0$
4: for each state $S_{t-1}$, $t > 0$ do
5:   Initialize $C_t^c$ of $G_t$ to $C_{t-1}^c$
6:   Use the state transition function to compute $S_t$
7:   Let $\Delta E = \{(u, v) \mid (u, v) \in E_{t-1} \oplus E_t\}$
8:   for all $e \in \Delta E$ do
9:     Update edge weights between modules of $C_t^c$
10:  end for
11:  Let $\Delta V = \{u \mid \exists v, (u, v) \in E_{t-1} \oplus E_t\}$
12:  for all $v \in \Delta V$ do
13:    Extract $v$ from its module and remove its edges
14:    Create a new singleton module that contains only $v$
15:    Calculate the edge weights between modules of $C_t^c$ and the new module
16:  end for
17:  Find the modular structure $C_t^c = \{C^c_{t,1}, C^c_{t,2}, \ldots, C^c_{t,m}\}$ of $G_t^c$ using algorithm $\mathcal{A}$
18:  Refine the modular structure as stated in Lemma 13
19:  Decompress $C_t^c$ to get $C_t$
20: end for
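To make the constructions of Sections 7.1.1 to 7.1.3 concrete, the Python sketch below implements the next-state rule, impact vectors, the hypergeometric impact similarity, the weighted modularity score $Q$, and the compact representation, and checks Lemma 12 on a toy graph. It is an illustration under stated assumptions, not the Java implementation used in this thesis, and it does not implement the incremental updates of Algorithm 3 or the CNM algorithm. All function names are hypothetical; edges are stored as a dictionary from two-node frozensets to weights (no self-loops); and for singleton modules, where the text does not spell out the edge set, we assume that the inter-module weight is split evenly over the available node pairs (one fourth per edge when both modules have two representative nodes).

```python
import math
from collections import defaultdict


def next_state(x, act, inh):
    """Threshold rule of Section 7.1.1; act[i]/inh[i] list the activators/inhibitors of gene i."""
    out = []
    for i, xi in enumerate(x):
        s = sum(x[j] for j in act[i]) - sum(x[j] for j in inh[i])
        out.append(1 if s > 0 else 0 if s < 0 else xi)  # keep X_i(t) when s == 0
    return out


def impact_vector(x, i, act, inh):
    """Imp_i(t) = (-1)^{X_i(t)} (y(t+1) - x(t+1)), where y(t) flips only gene i."""
    y = list(x)
    y[i] = 1 - y[i]
    x_next, y_next = next_state(x, act, inh), next_state(y, act, inh)
    sign = -1 if x[i] == 1 else 1
    return [sign * (b - a) for a, b in zip(x_next, y_next)]


def impact_similarity(u, v):
    """Sim(i, j) = -log Pr(K >= c), with K hypergeometric as in Section 7.1.2."""
    n = len(u)
    a, b = sum(e != 0 for e in u), sum(e != 0 for e in v)
    c = sum(p != 0 and p == q for p, q in zip(u, v))  # same nonzero value at a position
    a, b = max(a, b), min(a, b)  # the text assumes a >= b
    tail = sum(math.comb(a, k) * math.comb(n - a, b - k) for k in range(c, b + 1))
    return -math.log(tail / math.comb(n, b))


def modularity(edges, modules):
    """Weighted Newman modularity: sum over modules of L_c/m - (d_c/(2m))^2.

    L_c is the internal weight of module c (delta(C)/2) and d_c the sum of
    the weighted degrees of its nodes.
    """
    m = sum(edges.values())
    label = {v: c for c, mod in enumerate(modules) for v in mod}
    inside, degree = defaultdict(float), defaultdict(float)
    for e, w in edges.items():
        u, v = tuple(e)
        degree[label[u]] += w
        degree[label[v]] += w
        if label[u] == label[v]:
            inside[label[u]] += w
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in range(len(modules)))


def compact(edges, modules):
    """Compact representation: nodes ('x', c), ('y', c) per module, only ('x', c) for singletons."""
    label = {v: c for c, mod in enumerate(modules) for v in mod}
    inside, between = defaultdict(float), defaultdict(float)
    for e, w in edges.items():
        u, v = tuple(e)
        if label[u] == label[v]:
            inside[label[u]] += w
        else:
            between[frozenset((label[u], label[v]))] += w
    reps = {c: [("x", c)] + ([("y", c)] if len(mod) > 1 else []) for c, mod in enumerate(modules)}
    new_edges = defaultdict(float)
    for c, w in inside.items():
        new_edges[frozenset(reps[c])] += w  # edge (x_c, y_c) with weight delta(C_c)/2
    for pair, w in between.items():
        a, b = tuple(pair)
        links = [(p, q) for p in reps[a] for q in reps[b]]
        for p, q in links:
            # assumption: split evenly; one fourth per edge when both modules have two nodes
            new_edges[frozenset((p, q))] += w / len(links)
    return dict(new_edges), [set(reps[c]) for c in range(len(modules))]


if __name__ == "__main__":
    # Hypothetical 4-gene network reproducing the impact-vector example of Section 7.1.2.
    act, inh = [[], [0], [], []], [[], [], [0], []]
    print(impact_vector([0, 0, 1, 1], 0, act, inh))  # [1, 1, -1, 0]
    # Two unit-weight triangles joined by one edge, split into their natural modules.
    E = {frozenset(p): 1.0 for p in [(1, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 6), (3, 4)]}
    mods = [{1, 2, 3}, {4, 5, 6}]
    E2, mods2 = compact(E, mods)
    print(modularity(E, mods), modularity(E2, mods2))  # equal, about 0.357 (= 5/14)
```

Because the compact graph keeps each module's internal weight on a single edge and spreads each inter-module weight over the representative pairs, both the total weight $m$ and each module's degree sum are unchanged, which is why the two modularity values printed at the end coincide.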

In step (3), we further refine this modular structure based on Lemma 13 as follows. If $x_i$ and $y_i$ are assigned to different modules, we either move $x_i$ to the module containing $y_i$ or move $y_i$ to the module containing $x_i$, depending on which one increases the modularity more. This refinement ensures that $x_i$ and $y_i$ are always assigned to the same module, and, as stated in Lemma 13, doing so increases the modularity value. Finally, we decompress the compact modules to obtain the actual modules of the BRN at the current state.

Our incremental method described in this section has important advantages over traditional methods, such as [145] and [148], which assume that BRNs are static networks and calculate the modular structure based on this assumption. First, we introduce the concept of the functional network, which takes into account both the different interaction types and their directions. Functional networks also allow us to simulate the dynamic behavior of the BRN through state transitions. Second, the incremental calculation of modular structures at different states makes it possible to track the evolution of modules (i.e., membership changes and the creation of new modules). Tracking module evolution is difficult when the modular structure is computed independently at each state. Also, the compact representation we use improves the running time of our algorithm, since only the symmetric difference between consecutive states needs to be processed. For very large networks, or networks with many dynamic steps, the compact representation shrinks the problem so that it can be handled efficiently by module identification algorithms.

7.2 Results and Discussion

In this section, we evaluate the accuracy and the performance of our method on real BRNs. We first compare the biological relevance of our results with those of existing methods that use simplified networks (i.e., the edge directions and types are ignored) [145, 148]. We use the CNM method [160] both as the module identification method for these simplified networks and as the external algorithm that we employ to find the modular structure of the compact versions of the functional networks that we generate.
The goal of this experimental setup is to see the merits of building functional networks instead of simplifying the network as traditional methods do.

In the second set of experiments (Section 7.2.2), we compare the effect of compressing the networks on the modularity score and the running time when the network has a number of dynamic steps. For this purpose, we apply the CNM method to the network of each dynamic step individually and compare it with the results we obtain when we use CNM as our external algorithm to find the modular structures incrementally. It is important to note that the aim of this experiment is not to determine whether our algorithm or CNM is better. Instead, we analyze the effect of network compression in dynamic module identification.

Datasets: We use all the regulatory and signaling networks available in the KEGG pathway database for H. sapiens [7]. In total, there are 2103 genes and 9188 interactions in KEGG for H. sapiens. We use gene expression data of breast cancer patients from the literature and compile four different datasets containing a total of 722 patients [162–165]. We use these gene expression values to define the initial state of the network for each patient.

Environment: We run all the experiments on a desktop computer running Ubuntu 8.04 with one Intel Pentium 4 3.20 GHz processor and 2 GB of RAM. We implement all the algorithms in Java.

7.2.1 Qualitative Evaluation

Here we evaluate the significance of the functional networks that we devised. We first discuss the results on a specific real example in detail. We then present the modules most frequently observed by our method for the human regulatory network, using the gene expression data of 722 breast cancer patients.

Coagulation cascade network: A case study. Recall that in Figure 7-1, we have shown that the existing methods may fail to identify the biologically significant modules.


Here, we revisit the same network (the human coagulation cascade) and evaluate whether our functional network can overcome this drawback on a real example.

Figure 7-3. Two functional networks induced from the human coagulation cascade at two consecutive time steps (panels A and B).

Figure 7-3 illustrates two modular structures that our method identifies for two consecutive time steps of the human coagulation cascade. Figure 7-3A shows that our method perfectly separates the genes into two modules that serve coagulation and anti-coagulation functions. The anti-coagulation module has eight members. Alpha-2-macroglobulin (A2M) is a protease inhibitor; it inhibits thrombin, which has an adverse effect on clotting. PROC encodes protein C, a vitamin K-dependent plasma glycoprotein that is a key component of the anticoagulant system. PROS1 has an anticoagulant effect too, as it is a cofactor of activated protein C. THBD activates PROC by binding to thrombin, which results in the degradation of the activated forms of coagulation factors F5 and F8. The TFPI gene encodes a protease inhibitor that inhibits the activated coagulation factors F10 and F7 in an autoregulatory loop. The other three genes of the anti-coagulation module are the three members of the SERPIN family, and they act as inhibitors of thrombin through different mechanisms in different conditions.

The coagulation module consists of coagulation factors whose names start with F. Coagulation starts when F7 comes in contact with tissue factor and forms an active complex that activates F9 and F10. F10 and its cofactor F5 form a complex that activates prothrombin to thrombin, which in turn activates other components of the coagulation cascade (F8, F9, F11, etc.). Figure 7-3A shows that these coagulation factors are grouped together as a module, and hence we call it the coagulation module. Thus, we observe that by utilizing functional networks our algorithm can separate the genes of a BRN into biologically meaningful modules.

Our algorithm can update the modules incrementally as the state of the BRN changes. Figure 7-3 shows how the functional networks and the modules of the human coagulation cascade network evolve. The modules of the functional network change as the states of the genes change. In the first state (Figure 7-3A), the activity level of F10 is low. In this case, the tissue factor pathway inhibitor (TFPI) can suppress the activating effect of F3 on F10. By keeping the activity level of a coagulation factor low, TFPI is functionally similar to the other anticoagulants, such as PROC, PROS1 and the SERPINs, in this snapshot. However, in the next state (Figure 7-3B), F10 becomes highly active and TFPI's inhibiting effect on F10 decreases. In this case, TFPI cannot show an anticoagulant effect; hence, it is no longer a member of the anti-coagulation module. This change results in a restructuring of the modules, and our algorithm is able to identify the new modular structure reflecting the changes.

Evaluation on the entire human regulatory network: We apply our method to the human regulatory network extracted from KEGG [7], and we use published gene expression data of 722 different breast cancer patients [162-165] to determine the initial states of the genes for each patient. While traditional methods assume that every patient has the same snapshot of the regulatory network, we use gene expression data to construct patient-specific functional networks. By accounting for the variations in the gene expression levels of different patients, our method allows us to identify the most frequent modules observed in a set of breast cancer patients.

We define the support of a module as the percentage of patients whose functional network, created from the initial gene expression values, contains that module. In Table 7-1, we list the top 20 modules with the largest support. For the existing methods that use simplified networks, the topology of the entire network is independent of the expression levels of the genes. As a result, the modules are exactly the same for all patients and there is only one simplified network. Here we measure the precision of these methods as follows. For each significant module X from our method, we first find the module Y from the simplified network that contains X, if there is any. Let |X| and |Y| be the number of genes in X and Y. We measure the precision for X as 100|X|/|Y| %. Thus, 100% means that CNM identified the same module, and 50% means that it identified the module X inside a module that is twice the size of X. 0% implies that CNM does not group the genes of X in the same module.

Next, we discuss the biological relevance of several of these modules. The first module in Table 7-1 contains three genes from the MAPK signaling pathway. FOS takes part in several other networks as well, whereas the other two genes play a role only in the MAPK pathway according to the KEGG database. In this pathway, ELK4 can form a ternary nucleoprotein complex with the serum response factor (SRF) and SRF accessory protein 1 (SAP1) to activate FOS. As a result, these genes collectively serve cellular proliferation/differentiation. Proliferation is a well-conserved biological function among multicellular organisms [166]. The genes listed in the second module in Table 7-1 are all from the SMAD family. They jointly appear in the TGF-beta signaling pathway, though several of them take part in other pathways as well. In this pathway, they collectively play a critical role in a number of activities including cell growth, apoptosis, morphogenesis, development and immune responses [167]. The genes in the third module appear in the Neurotrophin signaling pathway. Studies have shown that the expression levels of these genes correlate significantly for patients that have a damaged hippocampus as well as for healthy patients [168]. Our algorithm identifies JAK3, STAT3 and POMC together as another module with high support. The genes in this module affect the hypothalamo-pituitary-adrenal axis, among a number of other functions. JAK3 increases the activity of STAT3 through phosphorylation. STAT3 then indirectly activates POMC. However, the simplified network approach fails to group these three genes together, suggesting that their functional similarity is not significant.

The results of our qualitative evaluation have two important implications:
(i) Functional networks are useful in decomposing BRNs into modules that contain functionally similar genes.
(ii) Using gene expression data to create patient-specific networks allows the identification of significant modules that are missed when the simplified network approach is used.

7.2.2 Quantitative Evaluation

In this section, we evaluate the performance of our method quantitatively. We want to see the effect of using the compact representation on the quality of the modular structure and on the running time of the method. We start by testing whether our algorithm sacrifices modularity value (Q) as it uses a compact representation of the modular structure. Figure 7-4 plots the average modularity for each patient over all dynamic steps. The modularity of our method is close to that of CNM, with a negligible difference of below 3%. Hence, the loss of modularity score due to compression is not significant.

Figure 7-4. Average modularity value for each patient over all dynamic steps. "CNM Algorithm" denotes using CNM on functional networks from scratch at each step. "Our Approach" denotes using the compact representation on functional networks and incrementally updating the modular structure.

We further use the normalized mutual information (NMI) [169], an information-theoretical approach, to measure the similarity between the modular structures found by directly applying the CNM algorithm to the functional networks and by first compressing these networks and then applying CNM. NMI takes a value in the range [0, 1]. A large NMI value implies that the two modular structures are similar. The results show that the NMI value is very close to 1 (mean NMI is 0.95), confirming the high similarity of the two modular structures. It is important to note that the significant difference between the modular structures in the two columns of Table 7-1 is due to the comparison of two different network types (functional network vs. simplified network). Here, the inputs are the functional networks and their compact forms. As Figure 7-5 suggests, the module memberships of the genes in this case are highly similar. Therefore, Figures 7-4 and 7-5 together imply that for very large networks with many dynamic steps, where running time is an issue, using the compact representation is a practical option.

Figure 7-5. Average normalized mutual information (NMI) of using the CNM algorithm on the original networks and on their compact representations, as we do in our method.

Next, we measure the running time of the CNM method when it is applied to each dynamic step from scratch and when it is combined with our compact representation to find dynamic modules incrementally by only considering the differences between two steps. Figure 7-6 shows the cumulative running time of each method as we compute the modular structures for 300 consecutive states for each patient. We see that using CNM on the compressed network, as in our method, is significantly advantageous over using it from scratch at each step on the original network. The total running time of the latter approach for all patients is more than 1.5 hours, while our approach requires only a few minutes.

The gain in running time is due to the smaller size of the compact representation compared to the original network. In our experiments, the size of the compact graph (number of vertices + number of edges) is on average 10 times smaller than that of the original networks (results omitted). Therefore, our incremental method with compression uses significantly less space as well as less running time. These results imply that our method improves running time and space utilization considerably without a significant change in the identified modular structure.

Figure 7-6. Cumulative running time for an increasing number of patients. For legends, see Figure 7-4.

7.2.3 Discussion

In this chapter, we proposed a new approach to identify dynamic modules in BRNs. Unlike existing methods, we considered the types and directions of the interactions between genes. We created a new network to represent the functional similarities of genes at a given state. In this functional network, an edge between two genes represents the similarity of their impacts on the network state. Using this network, our algorithm identifies the modules more accurately than traditional methods. Additionally, our algorithm captures the dynamic behavior of BRNs as the activity levels of the genes change over time due to their interactions with each other. It incrementally updates the current modular structure to find the modular structure in the next state rather than computing it from scratch. By keeping a compact representation of the network through the dynamic steps, we also benefited from the fact that the difference between two consecutive states is often very small. Our experiments suggested that our method can efficiently find biologically meaningful modules that are missed by traditional approaches. Additionally, the running times showed that our approach is significantly more scalable for large-scale applications than previous approaches [170].
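The scalability argument above rests on consecutive states differing in only a few genes. The sketch below illustrates that idea under stated assumptions: it uses a synchronous threshold update rule (a common Boolean-network convention, not necessarily the transition rule used in this chapter), reads the symmetric difference of consecutive states as the set of genes whose activity flips, and uses hypothetical helper names. Only the functional-network edges touching those genes would need to be re-examined.

```python
def next_state(state, interactions):
    """One synchronous update of a Boolean network under an assumed threshold rule.

    state: dict gene -> bool (True = active).
    interactions: list of (source, target, sign), sign +1 for activation, -1 for inhibition.
    A gene becomes active if the signed sum over its active regulators is positive,
    inactive if it is negative, and keeps its current value if the sum is zero.
    """
    score = {gene: 0 for gene in state}
    for source, target, sign in interactions:
        if state[source]:
            score[target] += sign
    return {gene: state[gene] if score[gene] == 0 else score[gene] > 0
            for gene in state}


def changed_genes(previous, current):
    """Genes whose activity differs between two consecutive states."""
    return {gene for gene in previous if previous[gene] != current[gene]}


def edges_to_revisit(edges, changed):
    """Functional-network edges incident to at least one changed gene."""
    return [(u, v) for u, v in edges if u in changed or v in changed]


if __name__ == "__main__":
    s0 = {"A": True, "B": False, "C": False}
    rules = [("A", "B", +1), ("B", "C", +1), ("A", "C", -1)]
    s1 = next_state(s0, rules)   # B switches on; C stays off because A inhibits it
    s2 = next_state(s1, rules)   # C's inputs cancel, so nothing changes
    print(changed_genes(s0, s1), changed_genes(s1, s2))  # {'B'} set()
```

In this toy run only gene B flips between the first two states, so only edges incident to B would be candidates for updating; once no gene flips, the network has reached a fixed point.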

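The normalized mutual information used in Section 7.2.2 compares two module assignments of the same genes. The sketch below uses one common normalization, NMI = 2 I(A;B) / (H(A) + H(B)); whether reference [169] uses exactly this form is an assumption, and the function name `nmi` is hypothetical.

```python
import math
from collections import Counter


def nmi(labels_a, labels_b):
    """Normalized mutual information 2*I(A;B) / (H(A) + H(B)) of two partitions.

    labels_a, labels_b: lists giving the module of each gene, in the same gene order.
    Returns 1.0 for identical partitions (up to relabeling), 0.0 for independent ones.
    """
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    mutual = sum(c / n * math.log(c * n / (count_a[a] * count_b[b]))
                 for (a, b), c in joint.items())
    h_a = -sum(c / n * math.log(c / n) for c in count_a.values())
    h_b = -sum(c / n * math.log(c / n) for c in count_b.values())
    if h_a + h_b == 0:
        return 1.0  # both partitions place every gene in a single module
    return 2 * mutual / (h_a + h_b)
```

Module labels only need to be consistent within each partition, so the modules found on the original network and on its compact form can be numbered independently.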

Table 7-1. Top 20 modules found by our method with the highest support from the 722 patients.

Rank | Support (%) | Genes of the module | Precision (%) with simplified networks [145,148]
1 | 100.0 | ELK4, FOS, SRF | 8.6
2 | 97.1 | SMAD2, SMAD3, SMAD4, SMAD1, SMAD5, SMAD9 | 75.0
3 | 93.8 | NGF, NTF3, NTF4, NTRK1, NTRK2, BDNF | 100.0
4 | 92.7 | IL24, IL20, IL22RA1 | 60.0
5 | 92.7 | MAP2K3, MAP2K6 | 2.9
6 | 92.4 | FRAP1, RPS6KB1, RPS6KB2, RPS6 | 50.0
7 | 91.0 | CHUK, IKBKB, IKBKG, NFKBIA, NFKBIB, NFKBIE | 5.7
8 | 90.6 | CAMK4, CREBBP, EP300 | 5.7
9 | 88.6 | NFKB1, RELA, BCL2 | 2.9
10 | 87.0 | ADCY3, GNAL, PRKG1 | 2.5
11 | 86.8 | TNF, TNFRSF1A, TRADD | 2.9
12 | 86.7 | PRKAA1, PRKAA2, PRKAG2, PRKAG3, PRKAB1, PRKAB2, PRKAG1, SLC2A4 | 42.1
13 | 86.1 | PARD3, IGSF5, F11R, JAM2, JAM3 | 4.2
14 | 85.9 | JAK2, STAT3, POMC | 0.0
15 | 85.7 | WASF2, BAIAP2, ARPC5, ARPC1B, ARPC2, ARPC5L | 0.0
16 | 85.2 | FIGF, VEGFC, FLT4 | 3.4
17 | 85.0 | CHEK2, ATM | 7.5
18 | 83.9 | RHOA, DIAPH1, PFN3, PFN4, PFN1, PFN2 | 0.0
19 | 83.4 | ADCY3, GNAL, PRKG2 | 2.5
20 | 83.1 | RASGRP1, SOS1, SOS2 | 3.4
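The Support and Precision columns of Table 7-1 follow the definitions given in Section 7.2.1. The sketch below assumes that each patient's modular structure is a list of gene lists and that a functional network "contains" a module when one of its modules has exactly the same gene set; both the data layout and the function names are hypothetical.

```python
def support(module, patient_partitions):
    """Percentage of patients whose modular structure contains `module` exactly."""
    target = frozenset(module)
    hits = sum(1 for modules in patient_partitions
               if target in {frozenset(m) for m in modules})
    return 100.0 * hits / len(patient_partitions)


def precision(module, simplified_modules):
    """100*|X|/|Y| for the simplified-network module Y that contains X, else 0.

    Modules of one partition are disjoint, so at most one Y can contain X.
    """
    x = set(module)
    for y in simplified_modules:
        if x <= set(y):
            return 100.0 * len(x) / len(y)
    return 0.0
```

As a consistency check on the table, a precision of 42.1% for the eight-gene module at rank 12 matches a 19-gene simplified-network module (100 * 8 / 19 is about 42.1).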


CHAPTER 8
CONCLUSIONS

The focus of this thesis was the comparative analysis of different biological networks to uncover biologically relevant and interesting patterns. We mainly developed alignment algorithms for metabolic networks that cover different aspects of the alignment problem. We believe that the methods developed here, taken together, form a framework that addresses existing challenges in metabolic network alignment. Furthermore, the algorithms we developed for analyzing the dynamic behavior of regulatory networks revealed structural properties that are important for understanding the regulatory mechanisms of different organisms.

We can summarize the contributions of this thesis as follows.

We developed an algorithm that aligns metabolic networks without abstraction and reveals mappings of different types of entities in a consistent way. Our results indicated that considering different entity types increases the accuracy of the alignment for metabolic networks. Experimental results on metabolic networks gathered from KEGG showed that our method has a practical running time and is tolerant to errors in network topology and node labels. We observed that our method can be used efficiently for capturing well-known alternative entities, predicting phylogeny from metabolic network similarity and answering top-k queries in network databases. Our method is generic enough to be applied to any graph alignment problem where the graphs have heterogeneous nodes and interactions between them.

We developed an algorithm that allows mapping one node of one network to a set of nodes of the other (i.e., it allows one-to-many mappings). Our experiments on metabolic networks suggested that our method can identify biologically relevant alignments of alternative subnetworks that are missed by traditional methods. This flexibility of allowing subnetworks is a desirable property in network alignment, since there is a significant number of cases where the same or a similar function is carried out by a different number of steps in different organisms. The SubMAP algorithm has a potentially exponential running time due to the inherent subset enumeration problem. However, it is still scalable for real-size metabolic networks when reaction subsets of size at most three or four are considered.

In order to enable the alignment of larger metabolic networks with or without subnetwork mappings, we developed a framework that uses a scalable compression technique to improve the resource utilization of existing alignment methods. Our experiments showed that this framework provides significant speedup and reduced memory utilization compared to SubMAP. We observed that a single level of compression can provide a 10 times speedup and that the resulting alignments are very similar to the results without compression. We suggest using higher levels of compression with caution, as they can decrease the alignment accuracy depending on the sizes and topologies of the query networks.

To understand the functional role of a set of components in a metabolic network, we developed a mathematical formalization that calculates the impact of a reaction set in terms of the set of possible steady states of the metabolic network it belongs to. Specifically, our model computes the impact as the portion of the flux cone of the original network that cannot be achieved without the reactions in the set under consideration. Using this model, we characterized the functional similarity of two reaction sets from potentially different networks. We observed that our definition of impact can provide biologically and statistically significant predictions of essential reactions. Since characterizing the function of a set of components mathematically has great value in numerous applications of computational biology, we believe that the ideas developed in this method have great potential to lay the foundations for better understanding and comparing complex biological networks.

Steady states of regulatory and signaling networks determine the activity levels of individual entities in the long run. Identifying all the steady states of these networks is difficult due to the state space explosion problem. We built a mathematical model that allows pruning a large portion of the state space quickly without causing any false dismissals. For the remaining state space, which is typically very small compared to the whole state space, we developed a randomized traversal method that extracts the steady states. We estimated the number of steady states, and the expected behavior of individual genes and gene pairs in steady states, in an online fashion. Also, we formulated a stopping criterion that terminates the traversal as soon as a user-supplied percentage of the results is returned with high confidence.

Regulatory networks are known to exhibit modular structure when all existing interactions are assumed to be functional. However, these networks are dynamic (i.e., their nodes change state over time), and further analysis of their modularity must take these state changes into account. For tracking the dynamics of modular structures in regulatory networks, we developed an algorithm that extends existing community structure identification methods to the case of dynamic networks. By keeping a compact representation of the network through the dynamic steps, we benefited from the fact that the difference between two consecutive states is often very small. This observation allowed our method to scale to very large networks with many dynamic steps.
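To make the state space explosion concrete, the sketch below enumerates the fixed-point steady states of a small synchronous Boolean network by brute force over all 2^n states. It is only the naive baseline whose cost motivates the pruning and randomized traversal summarized above, not the BDD-based method of this thesis; it ignores cyclic attractors, and encoding each gene's rule as a Python function is an assumption made for illustration.

```python
from itertools import product


def fixed_points(genes, rules):
    """Enumerate all fixed-point steady states of a synchronous Boolean network.

    genes: list of gene names; rules: dict gene -> function(state dict) -> bool.
    Exhaustive over 2**len(genes) states, so it is only usable for tiny networks.
    """
    result = []
    for values in product([False, True], repeat=len(genes)):
        state = dict(zip(genes, values))
        # A steady state maps to itself under every gene's update rule.
        if all(rules[g](state) == state[g] for g in genes):
            result.append(state)
    return result


if __name__ == "__main__":
    # Mutual repression (toggle switch): A is on iff B is off, and vice versa.
    toggle = {"A": lambda s: not s["B"], "B": lambda s: not s["A"]}
    print(fixed_points(["A", "B"], toggle))
    # [{'A': False, 'B': True}, {'A': True, 'B': False}]
```

Adding one gene doubles the number of states checked, which is why exhaustive enumeration breaks down for genome-scale regulatory networks.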


REFERENCES

[1] Margolin AA, Wang K, Lim WK, Kustagi M, Nemenman I, et al. Reverse engineering cellular networks. Nature Protocols 1: 662.
[2] Akutsu T, Miyano S, Kuhara S. Identification of genetic networks from a small number of gene expression patterns under the Boolean network model. In: Pacific Symposium on Biocomputing (PSB), volume 4, pp. 17.
[3] Wong SL, Zhang LV, Tong AH, Li Z, Goldberg DS, et al. Combining biological networks to predict genetic interactions. Proceedings of the National Academy of Sciences (PNAS) 101: 15682.
[4] Wu X, Zhu L, Guo J, Zhang DY, Lin K. Prediction of yeast protein-protein interaction network: insights from the Gene Ontology and annotations. Nucleic Acids Research 34: 2137.
[5] Francke C, Siezen RJ, Teusink B. Reconstructing the metabolic network of a bacterium from its genome. Trends in Microbiology 13: 550.
[6] Cakmak A, Ozsoyoglu G. Mining biological networks for unknown pathways. Bioinformatics 23: 2775.
[7] Ogata H, Goto S, Sato K, Fujibuchi W, Bono H, et al. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Research 27: 29.
[8] Keseler IM, Collado-Vides J, Gama-Castro S, Ingraham J, Paley S, et al. EcoCyc: a comprehensive database resource for Escherichia coli. Nucleic Acids Research 33: 334.
[9] Schaefer CF, Anthony K, Krupa S, Buchoff J, Day M, et al. PID: The Pathway Interaction Database. Nucleic Acids Research 37: 674.
[10] Salwinski L, Miller CS, Smith AJ, Pettit FK, Bowie JU, et al. The Database of Interacting Proteins: 2004 update. Nucleic Acids Research 32: 449.
[11] Sridhar P, Kahveci T, Ranka S. An iterative algorithm for metabolic network-based drug target identification. In: Pacific Symposium on Biocomputing (PSB), volume 12, pp. 88.
[12] Campillos M, Kuhn M, Gavin AC, Jensen LJ, Bork P. Drug Target Identification Using Side-Effect Similarity. Science 321: 263.
[13] Stephanopoulos G. Metabolic engineering. Current Opinions in Biotechnology 5: 196.
[14] Ostergaard S, Olsson L, Johnston M, Nielsen J. Increasing galactose consumption by Saccharomyces cerevisiae through metabolic engineering of the GAL gene regulatory network. Nature Biotechnology 18: 1283.


[15] Clemente J, Satou K, Valiente G. Finding Conserved and Non-Conserved Regions Using a Metabolic Pathway Alignment Algorithm. Genome Informatics 17: 46.
[16] Heymans M, Singh A. Deriving phylogenetic trees from the similarity analysis of metabolic pathways. Bioinformatics 19: 138.
[17] Damaschke P. Graph-Theoretic Concepts in Computer Science. Lecture Notes in Computer Science 484: 72.
[18] Pinter RY, Rokhlenko O, Yeger-Lotem E, Ziv-Ukelson M. Alignment of metabolic pathways. Bioinformatics 21: 3401.
[19] Webb EC. Enzyme nomenclature 1992. Academic Press.
[20] Tohsato Y, Nishimura Y. Metabolic Pathway Alignment Based on Similarity of Chemical Structures. Information and Media Technologies 3: 1910.
[21] Tohsato Y, Matsuda H, Hashimoto A. A Multiple Alignment Algorithm for Metabolic Pathway Analysis Using Enzyme Hierarchy. In: Intelligent Systems for Molecular Biology (ISMB), pp. 376.
[22] Cheng Q, Harrison R, Zelikovsky A. MetNetAligner: a web service tool for metabolic network alignments. Bioinformatics 25: 1989.
[23] Singh R, Xu J, Berger B. Pairwise Global Alignment of Protein Interaction Networks by Matching Neighborhood Topology. In: International Conference on Research in Computational Molecular Biology (RECOMB), pp. 16.
[24] Singh R, Xu J, Berger B. Global alignment of multiple protein interaction networks with application to functional orthology detection. Proceedings of the National Academy of Sciences (PNAS) 105: 12763.
[25] Koyuturk M, Grama A, Szpankowski W. An efficient algorithm for detecting frequent subgraphs in biological networks. In: Intelligent Systems for Molecular Biology (ISMB), pp. 200.
[26] Deutscher D, Meilijson I, Schuster S, Ruppin E. Can single knockouts accurately single out gene functions? BMC Systems Biology 2: 50.
[27] Watanabe N, Cherney MM, van Belkum MJ, Marcus SL, Flegel MD, et al. Crystal structure of LL-diaminopimelate aminotransferase from Arabidopsis thaliana: a recently discovered enzyme in the biosynthesis of L-lysine by plants and Chlamydia. Journal of Molecular Biology 371: 685.
[28] McCoy AJ, Adams NE, Hudson AO, Gilvarg C, Leustek T, et al. L,L-diaminopimelate aminotransferase, a trans-kingdom enzyme shared by Chlamydia and plants for synthesis of diaminopimelate/lysine. Proceedings of the National Academy of Sciences (PNAS) 103: 17909.


[29] Hupp TR, Lane DP, Ball KL. Strategies for manipulating the p53 pathway in the treatment of human cancer. Biochemical Journal 352: 1.
[30] Lane DP. Exploiting the p53 pathway for cancer diagnosis and therapy. British Journal of Cancer 80: 1.
[31] Mendoza L, Thieffry D, Alvarez-Buylla ER. Genetic control of flower morphogenesis in Arabidopsis thaliana: A logical analysis. Bioinformatics 15: 593.
[32] Demongeot J, Morvan M, Sene S. Impact of fixed boundary conditions on the basins of attraction in the flower's morphogenesis of Arabidopsis thaliana. In: Advanced Information Networking and Applications, pp. 782.
[33] Alvarez-Buylla ER, Chaos A, Aldana M, Benitez M, Cortes-Poza Y, et al. Floral morphogenesis: stochastic explorations of a gene network epigenetic landscape. PLoS ONE 3: e3626.
[34] Saez-Rodriguez J, Simeoni L, Lindquist JA, Hemenway R, Bommhardt U, et al. A logical model provides insights into T cell receptor signaling. PLoS Computational Biology 3: 1580.
[35] Davidich MI, Bornholdt S. Boolean network model predicts cell cycle sequence of fission yeast. PLoS ONE 3: e1672.
[36] Li F, Long T, Lu Y, Ouyang Q, Tang C. The yeast cell-cycle network is robustly designed. Proceedings of the National Academy of Sciences (PNAS) 101: 4781.
[37] Ay F, Kahveci T, de Crecy-Lagard V. Consistent alignment of metabolic pathways without abstraction. In: Computational Systems Bioinformatics Conference (CSB), volume 7, pp. 237.
[38] Ay F, Kahveci T, Crecy-Lagard V. A fast and accurate algorithm for comparative analysis of metabolic pathways. Journal of Bioinformatics and Computational Biology 7: 389.
[39] Kleinberg JM. Authoritative sources in a hyperlinked environment. Journal of the ACM 46: 604.
[40] Clemente JC, Satou K, Valiente G. Phylogenetic reconstruction from non-genomic data. Bioinformatics 23: 110.
[41] Wheeler DL, Chappey C, Lash AE, Leipe DD, Madden TL, et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Research 28: 10.
[42] Wang J, Shan H, Shasha D, Piel WF. Fast Structural Search in Phylogenetic Databases. Evolutionary Bioinformatics 1: 37.


[43] Felsenstein J. PHYLIP - Phylogeny Inference Package. Cladistics 5: 164.
[44] Kelley B, Sharan R, Karp RM, Sittler T, Root DE, et al. Conserved pathways within bacteria and yeast as revealed by global protein network alignment. Proceedings of the National Academy of Sciences (PNAS) 100: 11394-11399.
[45] Kelley B, Yuan B, Lewitter F, Sharan R, Stockwell BR, et al. PathBLAST: a tool for alignment of protein interaction networks. Nucleic Acids Research 32: 83-88.
[46] Koyuturk M, Grama A, Szpankowski W. Pairwise Local Alignment of Protein Interaction Networks Guided by Models of Evolution. In: International Conference on Research in Computational Molecular Biology (RECOMB), pp. 48.
[47] Koyuturk M, Kim Y, Topkara U, Subramaniam S, Szpankowski W, et al. Pairwise alignment of protein interaction networks. Journal of Computational Biology 13: 182.
[48] Berg J, Lassig M. Local graph alignment and motif search in biological networks. Proceedings of the National Academy of Sciences (PNAS) 101: 14689.
[49] Berg J, Lassig M. Cross-species analysis of biological networks by Bayesian alignment. Proceedings of the National Academy of Sciences (PNAS) 103: 10967-72.
[50] Narayanan M, Karp RM. Comparing Protein Interaction Networks via a Graph Match-and-Split Algorithm. Journal of Computational Biology 14: 892.
[51] Dutkowski J, Tiuryn J. Identification of functional modules from conserved ancestral protein-protein interactions. Bioinformatics 23: 149.
[52] Dost B, Shlomi T, Gupta N, Ruppin E, Bafna V, et al. QNet: A Tool for Querying Protein Interaction Networks. In: International Conference on Research in Computational Molecular Biology (RECOMB), pp. 1-15.
[53] Shlomi T, Segal D, Ruppin E, Sharan R. QPath: A Method for Querying Pathways in a Protein-Protein Interaction Network. BMC Bioinformatics 7.
[54] Liao C, Lu K, Baym M, Singh R, Berger B. IsoRankN: spectral methods for global alignment of multiple protein networks. Bioinformatics 25: 253.
[55] Sharan R, Suthram S, Kelley RM, Kuhn T, McCuine S, et al. Conserved patterns of protein interaction in multiple species. Proceedings of the National Academy of Sciences (PNAS) 102: 1974.


[56] Kalaev M, Smoot M, Ideker T, Sharan R. NetworkBLAST: comparative analysis of protein networks. Bioinformatics 24: 594-6.
[57] Dandekar T, Schuster S, Snel B, Huynen M, Bork P. Pathway alignment: application to the comparative analysis of glycolytic enzymes. Biochemistry Journal 343: 115.
[58] Ogata H, Fujibuchi W, Goto S, Kanehisa M. A heuristic graph comparison algorithm and its application to detect functionally related enzyme clusters. Nucleic Acids Research 28: 4021-8.
[59] Chen M, Hofestadt R. PathAligner: metabolic pathway retrieval and alignment. Appl Bioinformatics 3: 241-52.
[60] Chen M, Hofestadt R. Prediction and alignment of metabolic pathways. Bioinformatics of Genome Regulation and Structure II: 355-365.
[61] Wernicke S, Rasche F. Simple and fast alignment of metabolic pathways by exploiting local diversity. Bioinformatics 23: 1978-85.
[62] Li Z, Zhang S, Wang Y, Zhang XS, Chen L. Alignment of molecular networks by integer quadratic programming. Bioinformatics 23: 1631-1639.
[63] Li Y, Ridder D, de Groot MJL, Reinders MJT. Metabolic Pathway Alignment (M-Pal) Reveals Diversity and Alternatives in Conserved Networks. In: Asia Pacific Bioinformatics Conference (APBC), pp. 273-286.
[64] Li Y, Ridder D, de Groot MJL, Reinders MJT. Metabolic pathway alignment between species using a comprehensive and flexible similarity measure. BMC Systems Biology 2: 111.
[65] Cheng Q, Berman P, Harrison R, Zelikovsky A. Fast Alignments of Metabolic Networks. In: IEEE International Conference on Bioinformatics and Bioengineering (BIBE), pp. 147-152.
[66] RRS, Sniegoski CA. Modeling the complexity of genetic networks: understanding multigene and pleiotropic regulation. Complexity 1: 45-63.
[67] Kauffman SA. The Origins of Order: Self-Organization and Selection in Evolution. Oxford University Press.
[68] Thomas R, Thieffry D, Kaufman M. Dynamical behaviour of biological regulatory networks - I. Biological role of feedback loops and practical use of the concept of the loop-characteristic state. Bulletin of Mathematical Biology 57: 247.
[69] Arkin A, Ross J, McAdams HH. Stochastic kinetic analysis of developmental pathway bifurcation in phage lambda-infected Escherichia coli cells. Genetics 149: 1633.


[70] Gillespie DT. Exact Stochastic Simulation of Coupled Chemical Reactions. Journal of Physical Chemistry 81: 2340.
[71] Edwards JS, Palsson BO. Metabolic Flux Balance Analysis And The In Silico Analysis Of Escherichia Coli K-12 Gene Deletions. BMC Bioinformatics 1.
[72] Imielinski M, Belta C, Halasz A, Rubin H. Investigating Metabolite Essentiality Through Genome Scale Analysis Of E. Coli Production Capabilities. Bioinformatics 21: 2008.
[73] Hatzimanikatis V, Li C, Ionita JA, Broadbelt LJ. Metabolic Networks: Enzyme Function And Metabolite Structure. Current Opinion In Structural Biology 14: 300.
[74] Friedman N, Linial M, Nachman I, Pe'er D. Using Bayesian networks to analyze expression data. Journal of Computational Biology 7: 601.
[75] Pearl J. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann.
[76] Krishnamurthy L, Nadeau J, Ozsoyoglu G, Ozsoyoglu ZM, Schaeffer G, et al. Pathways database system: an integrated set of tools for biological pathways. Bioinformatics 19: 930.
[77] Dwyer T, Rolletschek H, Schreiber F. Representing Experimental Biological Data in Metabolic Networks. In: Asia Pacific Bioinformatics Conference (APBC), pp. 13.
[78] Xiong H, He X, Ding C, Zhang Y, Kumar V, et al. Identification of Functional Modules in Protein Complexes via Hyperclique Pattern Discovery. In: Pacific Symposium on Biocomputing (PSB), volume 10, pp. 221.
[79] Sharon E, Segal E. A Feature-Based Approach to Modeling Protein-DNA Interactions. In: International Conference on Research in Computational Molecular Biology (RECOMB), pp. 77.
[80] Ma HW, Zhao XM, Yuan YJ, Zeng AP. Decomposition Of Metabolic Network Into Functional Modules Based On The Global Connectivity Structure Of Reaction Graph. Bioinformatics 20: 1870.
[81] Hattori M, Okuno Y, Goto S, Kanehisa M. Development of a chemical structure comparison method for integrated analysis of chemical and genomic information in the metabolic pathways. Journal of the American Chemical Society (JACS) 125: 11853.
[82] Haveliwala TH, Kamvar SD. The Second Eigenvalue of the Google Matrix. Stanford University Technical Report.
[83] Kim J, Copley SD. Why Metabolic Enzymes Are Essential or Nonessential for Growth of Escherichia coli K12 on Glucose. Biochemistry 46: 12501-11.


[84] Green ML, Karp PD. A Bayesian method for identifying missing enzymes in predicted metabolic pathway databases. BMC Bioinformatics 5: 76.
[85] Sridhar P, Song B, Kahveci T, Ranka S. Mining metabolic networks for optimal drug targets. In: Pacific Symposium on Biocomputing (PSB), pp. 291.
[86] Saitou N, Nei M. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Molecular Biology and Evolution 4: 406.
[87] Clemente JC, Satou K, Valiente G. Reconstruction of Phylogenetic Relationships from Metabolic Pathways Based on the Enzyme Hierarchy and the Gene Ontology. Genome Informatics 16: 45.
[88] Robinson DR, Foulds LR. Comparison of phylogenetic trees. Mathematical Biosciences 53: 131-47.
[89] Romero P, Wagg J, Green ML, Kaiser D, Krummenacker M, et al. Computational prediction of human metabolic pathways from the complete human genome. Genome Biology 6: R2.
[90] Lovasz L. Stable set and polynomials. Discrete Mathematics 124: 137.
[91] Austrin P, Khot S, Safra M. Inapproximability of Vertex Cover and Independent Set in Bounded Degree Graphs. In: IEEE Conference on Computational Complexity, pp. 74.
[92] Berman P, Karpinski M. On some tighter inapproximability results. Lecture Notes in Computer Science.
[93] Sakai S, Togasaki M, Yamazaki K. A note on greedy algorithms for the maximum weighted independent set problem. Discrete Applied Mathematics 126: 313.
[94] Saunders PP, Broquist HP. Saccharopine, an intermediate of aminoadipic acid pathway of lysine biosynthesis. Journal of Biological Chemistry 241: 3435.
[95] Jeong H, Tombor B, Albert R, Oltvai ZN, Barabasi AL. The large-scale organization of metabolic networks. Nature 407: 651.
[96] Ay F, Kellis M, Kahveci T. SubMAP: Aligning metabolic pathways with subnetwork mappings. Journal of Computational Biology (JCB) 18: 1-17.
[97] Ay F, Kahveci T. SubMAP: Aligning metabolic pathways with subnetwork mappings. In: International Conference on Research in Computational Molecular Biology (RECOMB), volume LNCS-6044, pp. 15.


[98] Schuster S, Dandekar T, Fell DA. Detection of elementary flux modes in biochemical networks: a promising tool for pathway analysis and metabolic engineering. Trends in Biotechnology 17: 53.
[99] Steven BL, Palsson BO. Expa: a program for calculating extreme pathways in biochemical reaction networks. Bioinformatics 21: 1739.
[100] Stelling J, Klamt S, Bettenbrock K, Schuster S, Gilles ED. Metabolic network structure determines key aspects of functionality and regulation. Nature 420: 190.
[101] Larhlimi A, Bockmayr A. A new constraint-based description of the steady-state flux cone of metabolic networks. Discrete Applied Mathematics 157: 2257.
[102] Schuster S, Hilgetag C. On elementary flux modes in biochemical reaction systems at steady state. Journal of Biological Systems 2: 165.
[103] Ma HW, Zeng AP. The connectivity structure, giant strong component and centrality of metabolic networks. Bioinformatics 19: 1423.
[104] Samal A, Singh S, Giri V, Krishna S, Raghuram N, et al. Low degree metabolites explain essential reactions and enhance modularity in biological networks. BMC Bioinformatics 7: 118.
[105] Conradi C, Flockerzi D, Raisch J, Stelling J. Subnetwork analysis reveals dynamic features of complex biochemical networks. Proceedings of the National Academy of Sciences (PNAS) 104: 19175.
[106] Papin JA, Price ND, Palsson BO. Extreme pathway lengths and reaction participation in genome-scale metabolic networks. Genome Research 12: 1889.
[107] Fischer K, Gartner B, Kutz M. Fast Smallest-Enclosing-Ball Computation in High Dimensions. In: European Symposium on Algorithms (ESA), pp. 630.
[108] Segre D, Vitkup D, Church GM. Analysis of optimality in natural and perturbed metabolic networks. Proceedings of the National Academy of Sciences (PNAS) 99: 15112.
[109] Klamt S, Stelling J. Two approaches for metabolic pathway analysis? Trends in Biotechnology 21: 64.
[110] Schuster S, Hilgetag C, Woods JH, Fell DA. Reaction routes in biochemical reaction systems: algebraic properties, validated calculation procedure and example from nucleotide metabolism. Journal of Mathematical Biology 45: 153-81.


[111] Terzer M, Stelling J. Large scale computation of elementary flux modes with bit pattern trees. Bioinformatics 24: 2229.
[112] Trinh CT, Wlaschin A, Srienc F. Elementary mode analysis: a useful metabolic pathway analysis tool for characterizing cellular metabolism. Applied Microbiology and Biotechnology 81: 813.
[113] Kato J, Hashimoto M. Construction of consecutive deletions of the Escherichia coli chromosome. Molecular Systems Biology 3: 132.
[114] Karlebach G, Shamir R. Modelling and analysis of gene regulatory networks. Nature Reviews Molecular Cell Biology 9: 770.
[115] Basso K, Margolin AA, Stolovitzky G, Klein U, Dalla-Favera R, et al. Reverse engineering of regulatory networks in human B cells. Nature Genetics 4: 382.
[116] Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, et al. ARACNE: An algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics 7: S7.
[117] Hirose O, Nariai N, Tamada Y, Bannai H, Imoto S, et al. Estimating gene networks from expression data and binding location data via Boolean networks. Lecture Notes in Computer Science 3482: 349.
[118] Akutsu T, Kuhara S, Maruyama O, Miyano S. Identification of genetic networks by strategic gene disruptions and gene overexpressions under a Boolean model. Theoretical Computer Science 298: 235.
[119] Miyano S. Inference, modeling and simulation of gene networks. In: Computational Methods in Systems Biology, pp. 207.
[120] Hirose O, Yoshida R, Imoto S, Yamaguchi R, Higuchi T, et al. Statistical inference of transcriptional module-based gene networks from time course gene expression profiles by using state space models. Bioinformatics 24: 932.
[121] Garg A, Xenarios I, Mendoza L, De Micheli G. An efficient method for dynamic analysis of gene regulatory networks and in silico gene perturbation experiments. In: International Conference on Research in Computational Molecular Biology (RECOMB), pp. 62.
[122] Garg A, Di Cara A, Xenarios I, Mendoza L, De Micheli G. Synchronous versus asynchronous modeling of gene regulatory networks. Bioinformatics 24: 1917-25.
[123] Mendoza L. A network model for the control of the differentiation process in Th cells. Biosystems 84: 101.

PAGE 214

[124]GargA,MendozaL,XenariosI,DeMicheliGModelingofmultiplevalued generegulatorynetworks.In:IEEEEngineeringinMedicineandBiologySociety. pp.1398. [125]MendozaL,XenariosIAmethodforthegenerationofstandardized qualitativedynamicalsystemsofregulatorynetworks.TheoreticalBiologyand MedicalModelling3:13. [126]HachtelGD,MaciiE,PardoA,SomenziFMarkoviananalysisoflargenite statemachines.IEEETransactionsonComputer-AidedDesignofIntegrated CircuitsandSystems15:1479. [127]DevlooV,HansenP,LabbeMIdenticationofallsteadystatesinlarge networksbylogicalanalysis.BulletinofMathematicalBiology65:102551. [128]ShmulevichI,DoughertyER,KimS,ZhangWProbabilisticBoolean Networks:Arule-baseduncertaintymodelforgeneregulatorynetworks. Bioinformatics18:261. [129]ShlomiT,BerkmanO,RuppinERegulatoryon/offminimizationof metabolicuxchangesaftergeneticperturbations.ProceedingsoftheNational AcademyofSciencesPNAS102:7695. [130]SchaubMA,HenzingerTA,FisherJQualitativenetworks:Asymbolic approachtoanalyzebiologicalsignalingnetworks.BMCSystemsBiology1:4. [131]SegalE,Pe'erD,RegevA,KollerD,FriedmanNLearningmodule networks.JournalofMachineLearningResearch6:557. [132]SegalE,MichaelS,RegevA,Pe'erD,DavidB,etal.ModuleNetworks: Discoveringregulatorymodulesandtheirconditionspecicregulatorsfromgene expressiondata.NatureGenetics34:166. [133]AlbertI,ThakarJ,LiS,ZhangR,AlbertRBooleannetworksimulationsfor lifescientists.SourceCodeforBiologyandMedicine3:16. [134]AlbertR,OthmerHGThetopologyoftheregulatoryinteractionspredicts theexpressionpatternofthesegmentpolaritygenesin Drosophilamelanogaster JournalofTheoreticalBiology223:1. [135]KauffmanSHomeostasisanddifferentiationinrandomgeneticcontrol networks.Nature224:177. [136]NielsenJL.BUDDY-ABinaryDecisionDiagramPackage.TechReport TechnicalUniversityofDenmark,http://www.itu.dk/research/buddy. [137]WuY,ZhangX,YuJ,OuyangQIdenticationofatopologicalcharacteristic responsibleforthebiologicalrobustnessofregulatorynetworks.PLoS ComputationalBiology5:e1000442. 214

PAGE 215

[138]BornholdtSBooleannetworkmodelsofcellularregulation:Prospectsand limitations.JournaloftheRoyalSocietyInterface5Suppl1:85. [139]TysonJJ,Csikasz-NagyA,NovakBThedynamicsofcellcycleregulation. Bioessays24:1095. [140]TysonJJ,ChenK,NovakBNetworkdynamicsandcellphysiology.Nature ReviewsMolecularCellBiology2:908. [141]StuartJM,SegalE,KollerD,KimSKAgene-coexpressionnetworkfor globaldiscoveryofconservedgeneticmodules.Science302:240. [142]ZotenkoE,GuimaraesKS,JothiR,PrzytyckaTMDecompositionof overlappingproteincomplexes:Agraphtheoreticalmethodforanalyzingstatic anddynamicproteinassociations.BMCAlgorithmsforMolecularBiology1:7. [143]OldhamMC,HorvathS,GeschwindDHConservationandevolutionof genecoexpressionnetworksinhumanandchimpanzeebrains.Proceedingsof theNationalAcademyofSciencesPNAS103:17973. [144]ObayashiT,HayashiS,ShibaokaM,SaekiM,OhtaH,etal.COXPRESdb: Adatabaseofcoexpressedgenenetworksinmammals.NucleicAcidsResearch 36:77. [145]ChenJ,YuanBDetectingfunctionalmodulesintheyeastprotein-protein interactionnetwork.Bioinformatics22:2283. [146]WangC,DingC,YangQ,HolbrookSRConsistentdissectionofprotein interactionnetworkbycombiningglobalandlocalmetrics.GenomeBiology8: R271. [147]HallinanJSClusteranalysisofthep53geneticregulatorynetwork: topologyandbiology.In:IEEESymposiumonComputationalIntelligencein BioinformaticsandComputationalBiologyCIBCB.volume7,pp.1. [148]DiaoY,LiM,FengZ,YinJ,PanYThecommunitystructureofhuman cellularsignalingnetwork.JournalofTheoreticalBiology247:608. [149]TantipathananandhC,Berger-WolfTY,KempeDAframeworkfor communityidenticationindynamicsocialnetworks.In:InternationalConference onKnowledgeDiscoveryandDataMiningACMSIGKDD.pp.717. [150]Berger-WolfTY,SaiaJAframeworkforanalysisofdynamicsocial networks.In:InternationalConferenceonKnowledgeDiscoveryandData MiningACMSIGKDD.pp.523. [151]YangB,LiuDYIncrementalalgorithmfordetectingcommunitystructure indynamicnetworks.In:InternationalConferenceonMachineLearningand CyberneticsICMLC.volume4,pp.2284. 215

PAGE 216

[152]BackstromL,HuttenlocherD,KleinbergJ,LanXGroupformationinlarge socialnetworks:membership,growth,andevolution.In:InternationalConference onKnowledgeDiscoveryandDataMiningACMSIGKDD.pp.44. [153]PallaG,BarabasiAL,VicsekTQuantifyingsocialgroupevolution.Nature 446:664. [154]PollnerP,PallaG,VicsekTPreferentialattachmentofcommunities:the sameprinciple,butahigherlevel.EurophysicsLetters73:478. [155]SunJ,PapadimitriouS,YuPS,FaloutsosCGraphscope:Parameter-free miningoflargetime-evolvinggraphs.ConferenceonKnowledgeDiscoveryin DataKDD:687. [156]SpiliopoulouM,NtoutsiI,TheodoridisY,SchultRMonic:modelingand monitoringclustertransitions.In:InternationalConferenceonKnowledge DiscoveryandDataMiningACMSIGKDD.pp.706. [157]KumarR,NovakJ,TomkinsAStructureandevolutionofonlinesocial networks.In:InternationalConferenceonKnowledgeDiscoveryandDataMining ACMSIGKDD.pp.611. [158]LinYR,ChiY,ZhuS,SundaramH,TsengBLFacetnet:aframeworkfor analyzingcommunitiesandtheirevolutionsindynamicnetworks.In:International ConferenceonWorldWideWeb.pp.685. [159]NewmanMEJ,GirvanMFindingandevaluatingcommunitystructurein networks.PhysicalReviewE69:026113. [160]ClausetA,NewmanMEJ,MooreCFindingcommunitystructureinvery largenetworks.PhysicsReviewE70. [161]AyF,XuF,KahveciTScalableSteadyStateAnalysisofBooleanBiological RegulatoryNetworks.PLoSONE4:e7992. [162]LauraJVV,HongyueD,MarcJ,VijverVDGeneexpressionproling predictsclinicaloutcomeofbreastcancer.Nature415:530. [163]WangY,KlijnJG,ZhangYGene-expressionprolestopredictdistant metastasisoflymph-node-negativeprimarybreastcancer.Lancet38:1289. [164]DesmedtC,PietteF,LoiSStrongtimedependenceofthe76-gene prognosticsignaturefornode-negativebreastcancerpatientsinthetransbig multicenterindependentvalidationseries.ClinicalCancerResearch13:3207. [165]YudiP,JudithBGeneexpressionprolingsparesearlybreastcancer patientsfromadjuvanttherapy:derivedandvalidatedintwopopulation-based cohorts.BreastCancerResearch7:953. 216

PAGE 217

[166]XiaK,XueH,DongD,ZhuS,WangJ,etal.Identicationofthe proliferation/differentiationswitchinthecellularnetworkofmulticellularorganisms. PLoSComputationalBiology2:1482. [167]HeldinCH,MiyazonoK,tenDijkePTGF-betasignallingfromcell membranetonucleusthroughSMADproteins.Nature390:465. [168]MathernGW,BabbTL,MicevychPE,BlancoCE,PretoriusJKGranule cellmRNAlevelsforBDNF,NGF,andNT-3correlatewithneuronlossesor supragranularmossybersproutinginthechronicallydamagedandepileptic humanhippocampus.Molecularandchemicalneuropathology30:53. [169]DanonL,DuchJ,ArenasA,Diaz-GuileraAComparingcommunity structureidentication.JournalofStatisticalMechanics:TheoryandExperiment 9008. [170]AyF,DinhT,ThaiM,KahveciTDynamicmodularstructureofregulatory networks.In:IEEEInternationalConferenceonBioinformaticsandBioengineering BIBE.pp.136. [171]modENCODEConsortiumT,RoyS,ErnstJ,KharchenkoP,KheradpourP,etal. IdenticationofFunctionalElementsandRegulatoryCircuitsbyDrosophila modENCODE.Science330:1787. [172]AyF,GulsoyG,KahveciTFindingsteadystatesoflargescaleregulatory networksthroughpartitioning.In:GenomicSignalProcessingandStatistics GENSIPS,2010IEEEInternationalWorkshopon.pp.1. 217

PAGE 218

BIOGRAPHICAL SKETCH

Ferhat Ay received his PhD in Computer Science from the University of Florida in the summer of 2011. He worked under the supervision of Dr. Tamer Kahveci. He finished his undergraduate studies at Middle East Technical University (METU) in Turkey, receiving his B.S. degree in Computer Engineering and his second B.S. degree in Mathematics in 2007.

Ferhat's main research interests are bioinformatics and computational biology. His research focuses on comparative and topological analysis of metabolic and regulatory networks. He has worked on designing algorithms for metabolic network alignment, identifying steady states of Boolean regulatory networks, and mining biological networks for structural patterns. He also contributed to the modENCODE project while he was a visiting researcher in the Kellis laboratory at MIT during the summer of 2010.