
New in silico approaches for metabolic engineering

Permanent Link: http://ufdc.ufl.edu/UFE0042323/00001

Material Information

Title: New in silico approaches for metabolic engineering
Physical Description: 1 online resource (177 p.)
Language: english
Creator: Song, Bin
Publisher: University of Florida
Place of Publication: Gainesville, Fla.
Publication Date: 2010

Subjects

Subjects / Keywords: genetic, heuristic, linear, metabolic
Computer and Information Science and Engineering -- Dissertations, Academic -- UF
Genre: Computer Engineering thesis, Ph.D.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, territorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation

Notes

Abstract: This thesis focuses on in silico methods for metabolic engineering. Metabolic engineering studies methods to manipulate the metabolism toward a given goal, e.g. increasing the cells' production of a certain substance by modifying genetic and regulatory processes. It is widely applied in drug discovery, the food industry and cosmetics. To change the metabolism to a desired state, e.g. to increase the production of a certain compound, one metabolic engineering method is to use chemical compounds (i.e. drugs) to inhibit a set of enzymes. When an enzyme is inhibited, it cannot catalyze the reactions it is responsible for. As a result, the production of some compounds within the system may increase and that of others may decrease. The amount or the production rate of all the compounds determines the state of the metabolism. The enzymatic target identification problem aims to identify the set of enzymes whose knockouts bring the state of the system close to the goal. There are several models in the literature that describe the state of the metabolism. In this dissertation we address the enzymatic target identification problem for a broad set of mathematical models. One of these models is the Boolean model. We develop a scalable iterative method for the problem under the Boolean model. This method consists of two phases: the Iteration Phase and the Fusion Phase. Experiments on the E. coli metabolic network show that the average accuracy of the Iteration Phase alone deviates from that of the exhaustive search by only 0.02%. The Iteration Phase is highly scalable. It can solve the problem for the entire metabolic network of Escherichia coli in less than 10 seconds. The Fusion Phase improves the accuracy of the Iteration Phase by 19.3%. The linear model describes metabolic networks as a system of linear equations rather than Boolean predicates.
We prove that finding the enzyme knockout strategy with the OptKnock framework is NP-hard and present methods that account for multiple enzyme associations. Although many articles study enzyme knockout strategies, few of them consider enzyme associations. We present enzyme knockout strategies on FBA that handle the case in which a reaction is catalyzed by multiple enzymes. For each type of enzyme association, that is, "AND", "OR" or a combination of them, we provide a binary method and a continuous method. Our experiments suggest that enzyme associations strongly influence the performance of the linear programming method. We observe that our binary method runs much faster than the continuous method. For the pathways of H. sapiens from KEGG, our binary method runs in less than one second for the entire metabolism. Therefore, our binary method is practical for biological applications. Non-linear models can simulate the whole cell system and describe complex interactions within a metabolic network that cannot be explained using linear equations. We design two algorithms that solve the enzymatic target identification problem under non-linear as well as linear models. The first is a traversal approach that explores possible solutions systematically using a branch and bound method. The second uses a genetic algorithm to iteratively derive good solutions from a set of alternative solutions. Unlike the former, it can run on very large pathways. Our experiments show that our algorithms' results agree with in vitro results reported in the literature for a number of applications. They also show that the traversal method is a good approximation of the exhaustive search algorithm and is up to 11 times faster. The traversal method runs efficiently for pathways with up to 30 enzymes. For large pathways, our genetic algorithm finds good solutions in less than 10 minutes.
All three models above (Boolean, linear and non-linear) suggest that the metabolism reaches a steady state over time by changing its state dynamically. The sequence of all these states is the "dynamic state" of the system. Next, we consider the dynamic state of the system in the enzymatic target identification problem. We aim to find the set of enzyme knockouts that produces a dynamic state similar to a given goal pattern. To compare two dynamic states meaningfully, we propose three distance functions: the Euclidean distance, the time-warping distance and the pattern distance. The Euclidean distance restricts the solution space to the exact goal state. The time-warping distance allows stretching of the goal pattern in the time domain. The pattern distance allows scaling and shifting of the goal flux in addition to stretching in the time domain. We provide a branch and bound method to solve this problem. We also develop a partitioning strategy to reduce the running time of our method. This strategy avoids constructing the entire dynamic state by computing a lower bound on the distance between two dynamic states when the entire dynamic state is not available. Our experiments on the purine metabolism show that our method is accurate. They also show that our partitioning strategy reduces the number of time intervals computed for dynamic states by a factor of 2 to 6. Once we identify which set of enzymes should be inhibited, the next step is to select chemical compounds (i.e. drugs) that alter the activity of these enzymes. One popular compound selection method is to screen, in silico, libraries of small compounds for their ability to bind to biological targets such as receptors and enzymes. We develop two novel computational methods that rank a given set of compounds for a given target protein or enzyme.
The major difference between our first method and traditional in silico screening methods is that we consider additional proteins and enzymes while ranking compounds, whereas existing strategies often focus on the target protein alone. A drug compound can alter the state of the metabolic network. Our second method considers the impact of the drug compounds on the metabolic network by integrating the interactions among proteins in metabolic networks with the docking results. Experiments on the pharmacologic chaperones of misfolded rhodopsin show that our method is more accurate than traditional methods that focus only on rhodopsin. Our results are in the top 5.7% of all possible rankings. For the same dataset, the traditional method's results are in the top 81% of all possible rankings.
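The abstract names a Euclidean distance and a time-warping distance for comparing dynamic states. As a rough illustration only, the sketch below uses the textbook forms of both measures on plain lists of numbers; the function names, the absolute-difference local cost and the sample series are assumptions made for this example and are not taken from the dissertation, whose exact definitions may differ.

```python
import math

def euclidean_distance(x, y):
    """Pointwise distance; both series must cover the same time steps."""
    if len(x) != len(y):
        raise ValueError("series must have equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def time_warping_distance(x, y):
    """Textbook dynamic time warping with |a - b| as the local cost."""
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] holds the cheapest alignment of x[:i] with y[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(x[i - 1] - y[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # repeat y[j-1]
                                    cost[i][j - 1],      # repeat x[i-1]
                                    cost[i - 1][j - 1])  # advance both
    return cost[n][m]

# Hypothetical goal pattern and a copy stretched in time by a factor of two.
goal = [0.0, 1.0, 2.0, 1.0]
stretched = [0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 1.0, 1.0]
```

With these toy series, the warping distance treats the stretched copy as identical to the goal, while the Euclidean distance cannot compare them at all because their lengths differ. This matches the abstract's remark that time warping allows stretching of the goal pattern in the time domain.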
General Note: In the series University of Florida Digital Collections.
General Note: Includes vita.
Bibliography: Includes bibliographical references.
Source of Description: Description based on online resource; title from PDF title page.
Source of Description: This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Statement of Responsibility: by Bin Song.
Thesis: Thesis (Ph.D.)--University of Florida, 2010.
Local: Adviser: Kahveci, Tamer.
Local: Co-adviser: Ranka, Sanjay.

Record Information

Source Institution: UFRGP
Rights Management: Applicable rights reserved.
Classification: lcc - LD1780 2010
System ID: UFE0042323:00001


Full Text

PAGE 1

NEW IN SILICO APPROACHES FOR METABOLIC ENGINEERING

By

BIN SONG

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2010

PAGE 2

© 2010 Bin Song

PAGE 3

To my parents, Zhenlong Song and Meilu An
To my little sister, Rui Song

PAGE 4

ACKNOWLEDGMENTS

First, I thank to my supervising committee chair, Associate Professor, Dr. Tamer Kahveci. As a nice professor, he gave me helpful and useful advices and suggestions when I was pursuing my Ph.D. Before I worked with him, I had no idea about how to pursue my Ph.D and how to do the research. After I joined his group, he was patient to guide me in my graduate study. He advised me on how to select research topics, solve the problems and present the papers. I am greatly influenced by his idea that "Do things in a smart way". Everything has rules and people should follow these rules. For example, for research, he thinks it is much more important to learn the way to do research than doing the research itself. During pursuing Ph.D., I should become an independent researcher to find the problem and solve the problem. I learned a lot from him for my future career. Moreover, Dr. Kahveci is also a helpful friend. I met tough difficulties in my life when I was in school. He encouraged me a lot to overcome these difficulties. He inspired me to continue my research, finish my graduate study and create a bright future. He spent a lot of time to talk with me and illuminate the hope in my heart. He is really a nice professor and friend in my life. Without his help, I cannot finish my graduate study.

Second, I am also thankful to Professor Sanjay Ranka as my committee cochair. He is also patient to guide my research topics and give me helpful suggestions in techniques. He provided useful ideas to organize my Ph.D. dissertation and research topics. I learned a lot from him in the technique parts. With his help, my research went smoothly. I have the courage to go further in my future career.

Thanks are also due to Professor Sartaj Sahni, Professor Jih-Kwon Peir and Professor Raymond Booth as my Ph.D. committee members. They gave me beneficial suggestions in my research topics and works. They fulfilled my Ph.D. dissertation.

Also, I am thankful to my officemates and friends who were with me in my graduate study. Ph.D. life is tough and hard. With my officemates and friends, my life became colorful. Thank

PAGE 5

Xu Zhang, Padmavati Sridhar, Jayendra Venkateswaran, Lixia Chen, Yuchu Tong, Fei Xu, Leqian Zhang, Hechen Liu, Wenjie Yuan, Xiao Li, Ming Li, Gang Liu, Yan Li, Xuelian Xiao, Ferhat Ay, Nirmalya Bandyopadhyay, Gunhan Gulsoy and so on.

Finally, I greatly appreciated my family. When I felt frustrated in my life, my parents always accompanied me. My parents visited me when I was in the hard times. Their warm hearts and loves give me the courage and energy. Without their loves, I cannot imagine that I am able to finish my dissertation. My mother is a great mother and she always says that "Everything will be well". My father is a wonderful listener. He can solve all of my troubles. My little sister stands with me and supports me. With my little sister, I do not feel lonely in this world. Thanks to my family!

PAGE 6

TABLE OF CONTENTS

page

ACKNOWLEDGMENTS .... 4
LIST OF TABLES .... 9
LIST OF FIGURES .... 11
ABSTRACT .... 13

CHAPTER

1 INTRODUCTION .... 17
  1.1 Overview of the Thesis .... 18
  1.2 Overview of the Related Work .... 21
  1.3 Overview of the Contributions .... 27
  1.4 Outline of the Thesis .... 30
2 ENZYMATIC TARGET IDENTIFICATION USING BOOLEAN MODEL BY HEURISTIC METHODS .... 33
  2.1 Motivation and Problem Definition .... 34
  2.2 Methods .... 35
    2.2.1 The Iteration Phase .... 35
    2.2.2 The Fusion Phase .... 41
  2.3 Results .... 44
    2.3.1 Evaluation of the Damage Model on Real Drugs .... 44
    2.3.2 Evaluation of the Iteration Phase .... 46
    2.3.3 Evaluation of the Fusion Phase .... 47
  2.4 Discussion .... 48
3 ENZYMATIC TARGET IDENTIFICATION USING LINEAR MODEL FOR MULTIPLE ENZYMES CATALYZE THE SAME REACTION .... 53
  3.1 Motivation and Problem Definition .... 53
  3.2 Computational Complexity .... 55
  3.3 Methods .... 60
    3.3.1 Single Enzyme .... 60
    3.3.2 Multiple Enzymes .... 61
      3.3.2.1 Substitute enzyme .... 61
      3.3.2.2 Collaborate enzyme .... 63
      3.3.2.3 Enzyme complex .... 64
  3.4 Results .... 65

PAGE 7

    3.4.1 Datasets .... 65
    3.4.2 Results .... 65
      3.4.2.1 Effect of single and multiple enzymes .... 65
      3.4.2.2 Evaluation of binary method and continuous method .... 66
      3.4.2.3 Biological application .... 66
  3.5 Discussion .... 67
4 ENZYMATIC TARGET IDENTIFICATION USING NON-LINEAR MODEL BY MANIPULATING THE STEADY STATE .... 70
  4.1 Motivation and Problem Definition .... 71
  4.2 Computing the Steady State .... 77
    4.2.1 Computing the Steady State Using Flux .... 77
    4.2.2 Theoretical Yield Computation .... 79
    4.2.3 Concentration Computation .... 82
  4.3 Methods .... 83
    4.3.1 State-Distance .... 83
    4.3.2 Traversal Method .... 86
      4.3.2.1 Predicting the value of SD .... 88
      4.3.2.2 Filtering strategy .... 88
      4.3.2.3 Prioritization strategy .... 89
      4.3.2.4 Multiple optimal solutions .... 90
    4.3.3 Genetic Algorithm .... 90
    4.3.4 Use of Traversal Algorithm in Crossover and Performance Hiccups .... 95
  4.4 Results .... 101
    4.4.1 Evaluation of the Biological Significance .... 102
      4.4.1.1 Metabolic engineering of Glycolysis/Gluconeogenesis pathway .... 102
      4.4.1.2 Application on the production of Phenylalanine .... 103
      4.4.1.3 Increasing the production of cGMP .... 103
      4.4.1.4 Metabolic engineering of the Butanoate metabolism .... 104
      4.4.1.5 Acetate reduction in E. coli .... 104
      4.4.1.6 Purine metabolism on Generalized Mass Action (GMA) model .... 105
    4.4.2 Quantitative Analysis of the Proposed Methods .... 105
      4.4.2.1 Evaluation of the traversal method .... 105
      4.4.2.2 Evaluation of the genetic algorithm .... 107
      4.4.2.3 Comparison to an existing genetic algorithm method .... 110
  4.5 Discussion .... 111
5 ENZYMATIC TARGET IDENTIFICATION WITH DYNAMIC STATES .... 119
  5.1 Motivation and Problem Definition .... 119
  5.2 Related Work .... 123
  5.3 Calculating the Distance between Transient Paths .... 124

PAGE 8

    5.3.1 Notation .... 125
    5.3.2 Exact Distance .... 126
    5.3.3 Time-warping Distance .... 127
    5.3.4 Pattern Distance .... 129
  5.4 Methods .... 132
  5.5 Experimental Results .... 134
    5.5.1 Evaluation of the Performance .... 136
    5.5.2 Evaluation of the Accuracy .... 138
  5.6 Discussion .... 139
6 INTEGRATING STRUCTURAL PROPERTIES OF PROTEINS AND BIOLOGICAL NETWORKS FOR COMPOUND SELECTION .... 143
  6.1 Motivation and Problem Definition .... 143
  6.2 Methods .... 147
    6.2.1 Protein Selection .... 148
    6.2.2 Ranking Compounds .... 149
      6.2.2.1 Ranking based on affinities .... 150
      6.2.2.2 Integrating networks in ranking .... 152
  6.3 Results .... 156
    6.3.1 Accuracy in Estimation .... 160
    6.3.2 Accuracy in Ranking .... 161
  6.4 Discussion .... 162
7 CONCLUSION .... 166

REFERENCES .... 168
BIOGRAPHICAL SKETCH .... 177

PAGE 9

LIST OF TABLES

Table  page

2-1 Iterative Steps for Figure fig/iterEg .... 51
2-2 Metabolic networks used in our experiments .... 51
2-3 Comparison of the average damage values of solutions determined by the Iteration Phase versus that determined by the exhaustive search algorithm .... 51
2-4 Comparison of the damage values found by the Iteration Phase and the Fusion Phase focusing on the queries in which the Fusion Phase improved the results of the Iteration Phase .... 52
3-1 Running time in seconds of our binary method for the whole metabolism of H. sapiens from KEGG .... 67
4-1 An example showing the generation of a child Ch from two hypothetical parents F and M for a pathway that contains nine enzymes. ? denotes an undecided value .... 112
4-2 The expected number of undecided enzymes for different pathways. #E denotes the number of enzymes .... 113
4-3 Metabolic pathways from KEGG that are used in our experiments in this chapter .... 113
4-4 Comparison of the truncated traversal method (maximum number of levels = 20) and the genetic algorithm for different pathways. ? denotes that no SD values can be computed within one day .... 116
4-5 The average SD value and the running time of the truncated traversal algorithm (with maximum number of levels = 23) and the Genetic Algorithm. The running time is reported in seconds .... 117
4-6 The average running time of our Genetic Algorithm for large metabolism. The running time is reported in seconds .... 117
4-7 The average SD values with different mutation rates  in several pathways .... 117
4-8 The estimated  values and the cost time for different levels for the Glycolysis/Gluconeogenesis pathway. The average number of knocked out enzymes in the optimal solution of this pathway is 2.2 .... 117
4-9 The average SD value and the running time of Patil's method and our Genetic Algorithm. The running time is reported in seconds .... 118
5-1 Comparison of the average percentage of the intervals our algorithm generates for the exact and time-warping distance with and without splitting the dynamic states .... 142

PAGE 10

5-2 Accuracy of our algorithm using the three distance measures on datasets with different characteristics. The accuracy values are reported in terms of percentage of success .... 142
6-1 Dataset information needed for the application to the small molecule pharmacologic chaperones of misfolded rhodopsin .... 164
6-2 The accuracy of the activity level predictions of four strategies. Correlation coefficient shows the correlation between the actual and the predicted activity levels. Ob_p and Ob_a denote the predicted and the actual activity levels .... 164
6-3 The ranking accuracy of three strategies .... 164

PAGE 11

LIST OF FIGURES

Figure  page

1-1 A structure of this thesis .... 31
1-2 Graph representation of a metabolic pathway with four reactions R1, R2, R3 and R4, three enzymes E1, E2 and E3, and five compounds C1 to C5 .... 32
2-1 A graph constructed for a metabolic network with four reactions R1 to R4, three enzymes E1, E2 and E3, and five compounds C1 to C5. C1 is the target compound .... 49
2-2 Evaluation of the Iteration Phase. (a) Average execution time in milliseconds. (b) Average number of iterations .... 50
3-1 Redraw Protein-reaction associations by Reed et al. .... 68
3-2 The average running time in seconds for the networks with single enzyme and multiple enzymes (substitute and collaborate) .... 68
3-3 The average running time in seconds of binary and continuous methods for the networks whose multiple enzymes are substitute ones .... 69
3-4 The average running time in seconds of binary and continuous methods for the networks whose multiple enzymes are collaborate .... 69
4-1 Graph representation of a metabolic pathway with three reactions R1, R2, R3, two enzymes E1 and E2, and nine compounds C1 to C9. Dashed lines show the impact of knocking out enzyme E1 .... 112
4-2 Flux distribution of a hypothetical pathway .... 112
4-3 Upper bound to the expected number of undecided enzymes for different  and pathway sizes N = |E|. The vertical bars show the standard deviation in each direction .... 113
4-4 Glycolysis/Gluconeogenesis for E. coli. The enzymes highlighted in green are the enzymes that exist in E. coli .... 114
4-5 Phenylalanine, tyrosine and tryptophan biosynthesis for E. coli. The enzymes highlighted in green are the enzymes that exist in E. coli .... 115
4-6 Average SD values of the traversal method and exhaustive search for different pathways over multiple queries .... 115
4-7 The average running time in seconds of the traversal method and exhaustive search over multiple queries for different pathways .... 116
4-8 Distribution of the number of enzymes for Urea cycle and metabolism of amino groups .... 116

PAGE 12

5-1 The patterns P1 and P2 show the sequence of concentration of a compound resulting from two alternative manipulations to the metabolic network .... 140
5-2 Three transformations, stretch, scale and shift applied on a hypothetical dynamic state .... 140
5-3 An illustration of two dynamic states .... 141
5-4 An illustration of how partitioning the dynamic state X into shorter intervals improves the running time .... 141
6-1 A framework for three compound selection strategies denoted by A, B, and C .... 163
6-2 Distribution of binding affinity for rhodopsin (PDB: 1F88) and all the 1,990 compounds in diversity set .... 163
6-3 Retinol metabolism and Starch and sucrose metabolism. The dark solid circle shows the enzymes in the selected protein set .... 165

PAGE 13

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

NEW IN SILICO APPROACHES FOR METABOLIC ENGINEERING

By

Bin Song

December 2010

Chair: Tamer Kahveci
Cochair: Sanjay Ranka
Major: Computer Engineering

This thesis focuses on in silico methods for metabolic engineering. Metabolic engineering studies methods to manipulate the metabolism toward a given goal, e.g. increasing the cells' production of a certain substance by modifying genetic and regulatory processes. It is widely applied in drug discovery, the food industry and cosmetics.

To change the metabolism to a desired state, e.g. to increase the production of a certain compound, one metabolic engineering method is to use chemical compounds (i.e. drugs) to inhibit a set of enzymes. When an enzyme is inhibited, it cannot catalyze the reactions it is responsible for. As a result, the production of some compounds within the system may increase and that of others may decrease. The amount or the production rate of all the compounds determines the state of the metabolism. The enzymatic target identification problem aims to identify the set of enzymes whose knockouts bring the state of the system close to the goal. There are several models in the literature that describe the state of the metabolism. In this dissertation we address the enzymatic target identification problem for a broad set of mathematical models. One of these models is the Boolean model. We develop a scalable iterative method for the problem under the Boolean model. This method consists of two phases: the Iteration Phase and the Fusion Phase. Experiments on the E. coli metabolic network show that the average accuracy of the Iteration Phase alone deviates from that of the exhaustive search by only 0.02%. The Iteration Phase is

PAGE 14

highly scalable. It can solve the problem for the entire metabolic network of Escherichia coli in less than 10 seconds. The Fusion Phase improves the accuracy of the Iteration Phase by 19.3%.

The linear model describes metabolic networks as a system of linear equations rather than Boolean predicates. We prove that finding the enzyme knockout strategy with the OptKnock framework is NP-hard and present methods that account for multiple enzyme associations. Although many articles study enzyme knockout strategies, few of them consider enzyme associations. We present enzyme knockout strategies on FBA that handle the case in which a reaction is catalyzed by multiple enzymes. For each type of enzyme association, that is, "AND", "OR" or a combination of them, we provide a binary method and a continuous method. Our experiments suggest that enzyme associations strongly influence the performance of the linear programming method. We observe that our binary method runs much faster than the continuous method. For the pathways of H. sapiens from KEGG, our binary method runs in less than one second for the entire metabolism. Therefore, our binary method is practical for biological applications.

Non-linear models can simulate the whole cell system and describe complex interactions within a metabolic network that cannot be explained using linear equations. We design two algorithms that solve the enzymatic target identification problem under non-linear as well as linear models. The first is a traversal approach that explores possible solutions systematically using a branch and bound method. The second uses a genetic algorithm to iteratively derive good solutions from a set of alternative solutions. Unlike the former, it can run on very large pathways. Our experiments show that our algorithms' results agree with in vitro results reported in the literature for a number of applications. They also show that the traversal method is a good approximation of the exhaustive search algorithm and is up to 11 times faster. The traversal method runs efficiently for pathways with up to 30 enzymes. For large pathways, our genetic algorithm finds good solutions in less than 10 minutes.
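The genetic algorithm mentioned above derives good knockout sets iteratively from a population of candidate solutions. A minimal generic sketch of that idea follows, assuming a knockout set is encoded as a list of bits (1 = enzyme knocked out) and that a lower distance to the goal state is better. The function `genetic_search`, its parameters, the one-point crossover and the toy distance function are all hypothetical choices for illustration; the dissertation's own crossover, mutation and state-distance definitions are not reproduced here.

```python
import random

def genetic_search(num_enzymes, distance, pop_size=20, generations=50,
                   mutation_rate=0.05, seed=0):
    """Return the knockout vector with the lowest distance found."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(num_enzymes)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=distance)
        parents = population[: pop_size // 2]  # keep the better half
        children = []
        while len(parents) + len(children) < pop_size:
            father, mother = rng.sample(parents, 2)
            cut = rng.randrange(1, num_enzymes)  # one-point crossover
            child = father[:cut] + mother[cut:]
            # Flip each bit with a small probability (mutation).
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = parents + children
    return min(population, key=distance)

# Toy stand-in for a state distance: Hamming distance to a hidden optimum.
TARGET = [1, 0, 0, 1, 0, 1, 0, 0]

def toy_distance(knockouts):
    return sum(a != b for a, b in zip(knockouts, TARGET))

best = genetic_search(len(TARGET), toy_distance)
```

Because the better half of each generation is carried over unchanged, the best distance in the population never gets worse from one generation to the next. In a real setting, `toy_distance` would be replaced by a function that simulates the metabolic model under the given knockouts and compares the resulting state with the goal.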

All three of the above models, Boolean, linear and non-linear, assume that the metabolism reaches a steady state over time by changing its state dynamically. The sequence of all these states is the dynamic state of the system. Next, we consider the dynamic state of the system when we study the enzymatic target identification problem. We aim to find the set of enzyme knockouts that will produce a dynamic state similar to a given goal pattern. In order to compare two dynamic states meaningfully, we propose three distance functions. These are the Euclidean distance, the time-warping distance and the pattern distance. Euclidean distance restricts the solution space to the exact goal state. Time-warping distance allows for stretching of the goal pattern in the time domain. Pattern distance allows scaling and shifting of the goal flux in addition to stretching in the time domain. We provide a branch and bound method to solve this problem. We also develop a partitioning strategy to reduce the running time of our method. This strategy avoids constructing the entire dynamic state by computing a lower bound to the distance between two dynamic states when the entire dynamic state is not available. Our experiments on the Purine metabolism show that our method is accurate. They also show that our partitioning strategy reduces the number of time intervals computed for dynamic states by a factor of 2 to 6.

Once we identify which set of enzymes should be inhibited, the next step is to select chemical compounds (i.e., drugs) to alter the activity of these enzymes. One of the popular compound selection methods is to screen libraries of small compounds in silico for their ability to bind to biological targets such as receptors and enzymes. We develop two novel computational methods that rank a given set of compounds for a given target protein or enzyme. The major difference between our first method and traditional in silico screening methods is that we consider additional proteins and enzymes while ranking compounds, whereas existing strategies often focus on the target protein alone. A drug compound can alter the state of the metabolic network. Our second method considers the impact of the drug compounds on the metabolic network by integrating the interactions among proteins in metabolic networks with the docking results. Experiments on
the pharmacologic chaperones of misfolded rhodopsin show that our method has better accuracy than the traditional methods that focus only on rhodopsin. Our results are in the top 5.7% of all possible rankings. For the same dataset, the traditional method's results are in the top 81% of all possible rankings.

CHAPTER 1
INTRODUCTION

Metabolic engineering studies methods to manipulate the metabolism toward a given goal, e.g., increasing the cells' production of a certain substance by altering the genetic and regulatory processes [20, 97, 104, 112]. It is widely applied in drug discovery, the food industry and cosmetics. For example, the fatty acid biosynthesis pathway produces fatty acids that are used in the cosmetic industry in creams and lotions [105, 115]. Butanoate metabolism produces poly-hydroxybutyrate, which is essential for producing plastics [11]. The mevalonic acid pathway and the MEP/DOXP pathway produce carotenoids that are often used as anti-oxidants in the food industry [83]. The metabolisms of many organisms, such as bacteria, algae and plants, naturally produce these compounds. A common practice is to extract them from these organisms. By manipulating the pathways of these organisms through metabolic engineering, the production of these compounds can be increased significantly. For example, we can cut the consumption of the desired compound by the underlying organism.

Metabolic engineering plays a significant role in the drug discovery field too [10, 53]. A healthy metabolism keeps the status of the organism at certain values. External effects or genetic mutations can change the production rate of a set of enzymes. They can even modify the structure of the produced enzymes. Such unexpected enzyme behaviors can lead to aberrations in the metabolism. Low or missing activity of an enzyme may result in the blockage of the pathway. Furthermore, this can propagate to other parts of the pathway that need the compounds produced in the blocked part of the pathway. As a result, the production of some compounds may be increased and that of others may be decreased [66, 84, 111]. Such aberrations in the metabolism can lead to severe diseases such as mental retardation, seizures, decreased muscle tone, organ failure and blindness [27, 89]. Thus, the metabolism needs to be changed back to a desired level, and a metabolic engineering method is needed to alter the status of the metabolism.

Currently, in silico methods are widely used in metabolic engineering [8, 22, 24, 25, 72, 77]. Computational methods have the following advantages. First, they have low costs. With computational prediction and filtering, biologists do not need to set up unnecessary in vivo or in vitro experiments [45, 60, 98]. Second, they save large amounts of time. For example, if we want to select drug candidates from a large compound library by real screening, that is, by setting up an experiment for each compound, it will take quite a few years for a library with 10000 compounds. However, if we select them by virtual screening, that is, by screening these compounds by computer, we can test thousands of compounds in a day and finish a whole library of 10000 compounds in only several days. Third, with the help of computers, we may design a drug compound that has not been synthesized yet.

Figure 1-1 shows the structure of this thesis, which discusses the problems of metabolic engineering in detail in the following chapters.

1.1 Overview of the Thesis

Enzymes catalyze biological reactions [33, 91]. Reactions transform a set of compounds into another set of compounds. To change the metabolism to the desired one, e.g., to increase the production of a certain compound, one metabolic engineering method is to use chemical compounds (i.e., drugs) to inhibit a set of enzymes. When an enzyme is inhibited, it cannot catalyze the reactions it is responsible for. As a result, the production of some compounds in the system may increase and that of others may decrease. The enzymatic target identification problem is how to identify the set of enzymes whose knockouts lead to a system status close to the goal. The size and the complex structure of the metabolic pathways make the enzymatic target identification problem computationally challenging. In order to change the metabolism to the desired one, we first evaluate the state of the metabolism. Depending on the model, the state of the metabolism can be expressed in different ways, such as the yield of each compound, the flux of each compound or the concentration of each compound. The steady state is the state that remains
unchanged over time. If we only consider the steady state of the system, there exist three popular models: the Boolean model, the linear model and the non-linear model. The first part of this thesis discusses the enzymatic target identification problem for steady state analysis with these three models. This thesis also describes methods for the enzymatic target identification problem with dynamic states. Finally, the thesis presents approaches to select compounds to inhibit enzymes.

Enzymatic target identification using Boolean network models. Sridhar et al. and Song et al. considered a Boolean model of the enzymatic target identification problem [94–96]. In their version, each entry of the state denotes whether the corresponding compound is present or not. For this simplified version, the goal is to identify the set of enzymes whose knockouts eliminate all the targeted compounds while incurring minimum damage. Sridhar et al. provided an optimal algorithm for this model [96]. Chapter 2 discusses a heuristic algorithm for the enzymatic target identification problem using the Boolean model.

Enzymatic target identification using the linear model when multiple enzymes catalyze the same reaction. The Boolean model is too simple to describe the system. Flux Balance Analysis, FBA [7, 23, 29, 48, 57, 69, 71], is a popular linear model that describes the metabolism as a linear program. Wet-lab experiments show that it is a reasonable model for the flux distribution. For the linear model, we can use integer linear programming to tackle the enzymatic target identification problem. OptKnock is one example of the algorithms in this class [9]. However, OptKnock cannot deal with the situation in which a reaction is catalyzed by multiple enzymes rather than only one. Chapter 3 provides a method for the enzymatic target identification problem using the linear model in the case that multiple enzymes catalyze the same reaction.

Enzymatic target identification using the non-linear model by manipulating the steady state. Though the linear model works well in some cases, there exist more complex non-linear models to describe the metabolism. These non-linear models can simulate the cell system better than
the linear model. For example, S-systems [85, 86, 107] and the GMA model [15, 43, 73, 107] are two popular non-linear models for cell system simulation. For these non-linear models, previous methods such as linear programming cannot be applied to the enzymatic target identification problem directly. Therefore, Chapter 4 describes methods for the enzymatic target identification problem using the non-linear model.

Enzymatic target identification with dynamic states. Chapter 5 discusses the case in which the dynamic state is used to evaluate the system status. The process between two steady states is significant. If there exist two different paths from the start state to the final state, the influence of these two paths on the whole biological system may be significantly different. For example, if we want to increase the blood sugar concentration of a biological system, one path is to raise the blood sugar concentration gradually. Another is to raise the blood sugar concentration along a sharp curve and then decrease it to the goal concentration. The first path may take a long time to reach the goal, whereas the second may bring a dangerous side-effect: the sugar concentration may reach a seriously high level and cause the organism to die. Thus, it is necessary to consider the dynamic process when we identify the enzyme set that changes the state of the biological system. Therefore, Chapter 5 describes methods for the enzymatic target identification problem with dynamic states.

Integrating structural properties of proteins and biological networks for compound selection. Once we identify which set of enzymes should be inhibited, the next step is to select chemical compounds (i.e., drugs) to alter the activity of these enzymes. One of the popular compound selection methods is to screen libraries of small compounds in silico for their ability to bind to biological targets such as receptors and enzymes [38]. This process is also known as docking [59]. Despite some success in compound prediction in several applications, recent validation studies show that docking methods perform poorly in compound selection [110]. In this thesis, we consider the compound selection problem. We define this problem as follows. Assume
that we are given a target protein or enzyme and a library of compounds. The compound selection problem aims to identify the compounds from this library that will bind to the target at a high rate and change the activity level of the target. More specifically, we develop a ranking algorithm that sorts the compounds in the compound library according to their probability of altering the activity of the target protein or enzyme.

1.2 Overview of the Related Work

Metabolic engineering has been studied for decades. However, in silico approaches for metabolic engineering are still a new field. In order to find new in silico approaches for metabolic engineering, the first step is to obtain a number of databases for metabolism. Fortunately, more and more public databases are becoming available. Some examples are KEGG [46], EcoCyc [52] and ENZYME [3]. There are also some bionetwork and interaction databases such as aMAZE [58], DIP [55] and BIND [2]. Enzymatic target identification is a new field in metabolic engineering. Sridhar et al. and Song et al. introduced the enzymatic target identification problem [94–96]. Considering the steady state of the system, we briefly summarize three classic models and the corresponding methods for the enzymatic target identification problem.

For the enzymatic target identification problem using the Boolean model, Sridhar et al. and Song et al. considered a simplified version of the Boolean model [94–96]. In the Boolean model, compounds are separated into two groups, Target compounds and Non-Target compounds. Target compounds are the ones that we aim to remove and Non-Target compounds are the remaining ones. Every compound has two states, present and not present. We term the side-effects of inhibiting a given set of enzymes as the damage caused to the metabolic network. Formally, we define the damage of inhibiting a set of enzymes as the number of non-target compounds whose production is stopped due to the inhibition of those enzymes. Then, the enzymatic target identification problem based on the Boolean model seeks the set of enzymes whose inhibition eliminates all the target compounds and inflicts minimum damage on the rest of the network.
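The damage definition above can be illustrated with a small exhaustive search. The following Python snippet is only a sketch under stated assumptions: the wiring in `REACTIONS` is hypothetical and was chosen so that the damages agree with those quoted in this section for Figure 1-2 (three for inhibiting E1, one for inhibiting E2 and E3); the actual figure may be connected differently. The function names are illustrative, and this is neither OPMET nor the heuristic of Chapter 2.

```python
from itertools import combinations

# Hypothetical toy network: reaction -> (enzyme, substrates, products).
# An empty substrate set means the reaction is fed from outside the network.
REACTIONS = {
    "R1": ("E1", {"C5"}, {"C1"}),
    "R2": ("E1", set(), {"C2", "C3", "C4"}),
    "R3": ("E2", set(), {"C5"}),
    "R4": ("E3", set(), {"C5"}),
}


def present_compounds(inhibited):
    """Return the compounds still produced when `inhibited` enzymes are off."""
    present = set()
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for enzyme, substrates, products in REACTIONS.values():
            if enzyme in inhibited or not substrates <= present:
                continue  # reaction cannot fire
            if not products <= present:
                present |= products
                changed = True
    return present


def damage(inhibited, targets):
    """Number of non-target compounds lost, or None if a target survives."""
    before = present_compounds(set())
    after = present_compounds(set(inhibited))
    if targets & after:
        return None
    return len((before - after) - targets)


def exhaustive_search(targets):
    """Try every non-empty enzyme subset; exponential in the number of enzymes."""
    enzymes = sorted({enzyme for enzyme, _, _ in REACTIONS.values()})
    best = None
    for size in range(1, len(enzymes) + 1):
        for subset in combinations(enzymes, size):
            d = damage(subset, targets)
            if d is not None and (best is None or d < best[1]):
                best = (subset, d)
    return best


if __name__ == "__main__":
    print(exhaustive_search({"C1"}))
```

The loop over all subsets is what makes the exhaustive approach impractical for large networks, which motivates the branch and bound and heuristic methods discussed in this thesis.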

Figure 1-2 shows an example for the Boolean model. Inhibiting $E_1$ knocks out the reactions $R_1$ and $R_2$ and the compounds $C_1$, $C_2$, $C_3$ and $C_4$. Suppose $C_1$ is the target compound. Inhibiting $E_1$ stops three non-target compounds, $C_2$, $C_3$ and $C_4$, so the damage is three. If we inhibit $E_2$ and $E_3$, we remove the reactions $R_3$, $R_4$ and $R_1$. Then we remove the target compound $C_1$ and one non-target compound, $C_5$, so the damage is one. This raises the question of which enzyme set is better and how to select the enzyme set with the minimum damage. The number of enzyme set combinations to evaluate is exponential.

Sridhar et al. used a branch and bound strategy, named OPMET, to solve this problem for pathways with up to 32 enzymes in less than an hour [96]. OPMET expresses the search space as a binary tree. In the root node, all the enzymes are present in the network. OPMET starts from the root node and traverses the tree by visiting first the current node, then the left child, then the right child. When OPMET visits the current node, it computes the current damage and maintains a global cut-off threshold. Based on the current damage and the global cut-off threshold, OPMET decides whether to activate the pruning strategy, which prunes nodes that do not need to be visited and thus saves computation time. OPMET also provides a prioritization strategy to order the enzymes to visit. The main idea is to push the good solutions to the top levels of the tree, so that more nodes are pruned by the pruning strategy. OPMET explores the search space dynamically. OPMET guarantees an optimal solution with two filtering strategies, the pruning strategy and the prioritization strategy. They compute an upper bound to the number of target compounds eliminated and a lower bound to the side-effect respectively. However, OPMET is an exponential method in the worst case. It cannot deal with large networks, e.g., networks with more than 32 enzymes.

For the enzymatic target identification problem using the linear model, integer linear programming is one approach to tackle this problem. OptKnock is one example of this approach [9]. These strategies simulate the metabolism using Flux Balance Analysis,
FBA [7, 29, 48]. At a high level, they represent each flux as a variable and optimize a linear objective subject to linear constraints on these variables as follows.

Maximize or minimize: objective function
Subject to: steady state constraints

This formulation represents the metabolism using a stoichiometric matrix $S$ [116], where the rows and the columns correspond to compounds and fluxes respectively. Assume that $x = [x_1, x_2, \ldots, x_n]'$ denotes the flux vector for a network with $n$ fluxes. The objective function is typically to maximize a variable or a linear combination of a set of variables. Thus an objective function is typically $\sum_i c_i x_i$, where the $c_i$ are given constants. The constraints define the steady state using the stoichiometric model. The solution to the equation $Sx = 0$ is the set of all steady states in this model. Assume that $y = [y_1, y_2, \ldots, y_m]'$ denotes the vector of enzyme activities, where $y_j$ is a binary variable which is equal to 0 if the enzyme is knocked out and 1 otherwise. Then there exist the constraints $\min_j \cdot y_j \le x_j \le \max_j \cdot y_j$, where $\max_j$ and $\min_j$ are the maximum and minimum possible flow corresponding to flux $j$. These constraints express that the enzyme $y_j$ catalyzes the reaction $x_j$: when $y_j = 0$, the flux $x_j$ is forced to zero.

There exist several non-linear models to express the metabolism. For example, S-systems [85, 86, 107] define the steady state as the solution to the equation system

$$\dot{X}_i = \alpha_i \prod_j X_j^{g_{ij}} - \beta_i \prod_j X_j^{h_{ij}} = 0, \quad \forall i.$$

Here, the variable $X_i$ represents the concentration of the $i$th molecule and $\dot{X}_i$ is the time derivative of $X_i$. The constants $\alpha_i$, $\beta_i$ and $g_{ij}$, $h_{ij}$ denote the rate of the reaction and the rate at which each molecule contributes to a reaction. Clearly, the constraints are non-linear in the S-system equations. Taking the logarithm of both sides of $\alpha_i \prod_j X_j^{g_{ij}} = \beta_i \prod_j X_j^{h_{ij}}$ linearizes the constraints as follows. Define $Y_i = \log X_i$. The constraints become

$$\log \alpha_i + \sum_j Y_j g_{ij} - \log \beta_i - \sum_j Y_j h_{ij} = 0, \quad \forall i.$$
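To make the logarithmic linearization of the S-system concrete, the following Python snippet is a minimal sketch, assuming a hypothetical two-variable system whose parameter values are invented for illustration. It solves the linear system $\sum_j (g_{ij} - h_{ij}) Y_j = \log \beta_i - \log \alpha_i$ by Gauss-Jordan elimination and then checks that the resulting concentrations make the original non-linear rates vanish. The function names are illustrative and not taken from the thesis.

```python
import math


def solve_linear(a, b):
    """Solve a * y = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: no unique steady state")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def s_system_steady_state(alpha, beta, g, h):
    """Steady-state concentrations X obtained from the log-linear form."""
    n = len(alpha)
    a = [[g[i][j] - h[i][j] for j in range(n)] for i in range(n)]
    b = [math.log(beta[i]) - math.log(alpha[i]) for i in range(n)]
    return [math.exp(y) for y in solve_linear(a, b)]


def s_system_rates(x, alpha, beta, g, h):
    """Right-hand side of the S-system: production minus consumption."""
    n = len(x)
    return [
        alpha[i] * math.prod(x[j] ** g[i][j] for j in range(n))
        - beta[i] * math.prod(x[j] ** h[i][j] for j in range(n))
        for i in range(n)
    ]


# Hypothetical parameters (assumption, for illustration only):
# X1 is produced under inhibition by X2; X2 is produced from X1.
ALPHA = [2.0, 1.0]
BETA = [1.0, 0.5]
G = [[0.0, -0.5], [0.5, 0.0]]
H = [[0.5, 0.0], [0.0, 1.0]]

if __name__ == "__main__":
    x = s_system_steady_state(ALPHA, BETA, G, H)
    print(x, s_system_rates(x, ALPHA, BETA, G, H))
```

The same trick does not carry over to models whose rate equations are sums of several power-law terms, since the logarithm of a sum does not separate into linear terms.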

The GMA model [15, 43, 73, 107] is another non-linear model. This model considers each reaction that a compound is a part of separately and represents the steady state using the following equations:

$$\dot{X}_i = \sum_h \gamma_{ih} \prod_j X_j^{f_{ij}} = 0, \quad \forall i.$$

Here, the constants $\gamma_{ih}$ and $f_{ij}$ denote the rate of the reaction and the rate at which each molecule contributes to a reaction. In this model, not only are the constraints non-linear, but the summation of the multiplicative terms also makes it impossible to linearize the constraints by taking the logarithm. Because these models are non-linear, we cannot use the existing linear methods for them.

Most of the existing methods are well suited for linear models or the Boolean model. Thus they do not work when these more complex models are used to compute the steady state of the metabolic network.

Klamt et al. introduced the minimal cut set problem, which aims to find a minimal set of reactions whose deletion leads to no feasible balanced flux distribution in the objective reaction [54]. The authors described several algorithms to solve the minimal cut set problem. The aim of this model is to block the objective reaction, which results in the removal of the synthesis of the objective metabolite. It cannot be used when the aim is to partly decrease or increase the objective metabolites.

Extreme Pathway Analysis [78] uses FBA to find the path in a pathway that maximizes or minimizes the production of a given compound. This problem is similar to a special case of the enzymatic target identification problem considered in this chapter. De et al., for example, consider the extreme pathway analysis problem [18]. In order to reduce the yield of a compound in a pathway, De et al. use FBA to compute the optimal pathway so that the yield of the target metabolite is minimum. They then change the concentration of the enzymes in other paths so that all paths except the optimal one are inactive. This method has two major drawbacks.
First, it requires changing the concentration of many enzymes. In practice, changing the enzyme concentration is a costly process. Therefore, the number of enzymes whose concentrations are altered should be kept low. Second, the alterations that change the production of a compound can affect the production of other compounds in that pathway. Thus, the solution found by this method can have significant side-effects. In addition to these drawbacks, extreme pathway analysis cannot solve the enzymatic target identification problem for the non-linear model.

Patil et al. presented an evolutionary programming method for finding optimal gene deletion strategies [72]. Their method generates a population of random solutions and uses a genetic algorithm to improve this population. Their method can be applied to non-linear models as well as linear models. However, it has several drawbacks. The enzymatic target identification problem looks for a set of enzymes that are connected over a complex network and interact through reactions over compounds. Patil et al.'s method ignores these interactions while constructing the population of solutions as well as while creating new generations of solutions using crossover. They instead create these solutions randomly. The search space of the enzymatic target identification problem is exponential in the number of enzymes. As a result, their method fails to converge to a good solution. Furthermore, the solutions found by their method suggest knocking out an unnecessarily large number of enzymes.

All the above-mentioned methods consider only the steady state of the metabolism. They ignore the sequence of states the underlying network visits while reaching the steady state. As a result, although their solution may be optimal at the steady state, the intermediate states of their solutions can be undesirable.

There are several models that simulate the dynamic state of a given metabolic network. For example, Dynamic Flux Balance Analysis (DFBA) extends the traditional FBA to describe the rate of change of the fluxes over a period of time [62, 64]. DFBA incorporates a time parameter, which allows it to predict the metabolite concentrations. It considers the entire time period and builds a
non-linear programming problem. It separates the time into several intervals. For each interval it employs a linear programming method to estimate the flux values during that interval. Integrated dynamic FBA (idFBA) simulates the integrated system including signaling, metabolic and regulatory networks [56]. Similar to DFBA, idFBA separates the time into several intervals. For each interval, it applies FBA to compute the flux values. From these values, it decides which reactions will take place during the next interval. The Integrated FBA (iFBA) model builds a dynamic simulation among metabolic, regulatory and signaling networks [16] along the same lines as DFBA and idFBA. It first separates the time into several intervals. It then applies ordinary differential equations (ODEs) [75] and a Boolean regulatory model to constrain the FBA linear programming problem. It updates the biomass and external metabolite concentrations for use in subsequent time steps. All these methods aim to find the dynamic state of a given metabolic network. They do not, however, consider the dynamic enzymatic target identification problem, which is the focus of Chapter 5.

For the compound selection problem, one of the popular methods is to screen libraries of small compounds for their ability to bind to biological targets such as receptors and enzymes [38]. This process is also known as docking. Docking algorithms estimate how two molecules can bind with each other to form a stable complex [59]. DOCK [26], Glide [30] and GOLD [41] are a few examples of existing docking software. These tools predict the binding affinity between each small molecule and the target protein. Once the docking software computes the affinity of each small molecule in a library of molecules, the next step is often to pick the ones that have high predicted affinity values and test them in the lab. This process has been successful in several applications. For example, Lyne et al. used the FlexX-Pharm [34] docking software to search about 200,000 compounds and identified four novel classes of inhibitors for Chk1 kinase [63]. Kellenberger et al. searched about 44,000 compounds with the docking software GOLD and Surflex [40, 49] and found novel non-peptide ligands for the GPCR CCR5 [50].

1.3 Overview of the Contributions

The following briefly summarizes our main contributions in this thesis.

Enzymatic target identification using Boolean network models. In this model, each compound has a Boolean state, present or not. The goal is to identify the set of enzymes whose knockouts eliminate all the targeted compounds while incurring minimum damage. Targeted compounds are the ones whose production needs to be stopped. Damage is defined as the number of non-targeted compounds that are eliminated because of the knockouts. Minimum damage is the minimum number of non-targeted compounds eliminated from the metabolism among all possible ways of eliminating the targeted compounds. Therefore, this model considers toxicity, whereas traditional drug development approaches have often focused more on the efficacy of drugs than on toxicity.

Contribution: We developed a scalable iterative method which computes a sub-optimal solution within reasonable time bounds. The method consists of two phases: the Iteration and Fusion Phases. The experiments on the E. coli metabolic network show that the average accuracy of the Iteration Phase alone deviates from that of the exhaustive search only by 0.02%. The Iteration Phase is highly scalable. It can solve the problem for the entire metabolic network of Escherichia coli in less than 10 seconds. The Fusion Phase improves the accuracy of the Iteration Phase by 19.3%.

Enzymatic target identification using the linear model when multiple enzymes catalyze the same reaction. Flux Balance Analysis, FBA [7, 29, 48], is a popular linear model that describes the metabolism as a linear program. Integer linear programming is a popular method to tackle the enzymatic target identification problem. OptKnock is one example of the algorithms in this class [9]. At a high level, they represent each flux as a variable and optimize a linear objective subject to linear constraints on these variables as follows.

Maximize or minimize: objective function
Subject to: steady state constraints

However, OptKnock cannot deal with the situation in which a reaction is catalyzed by multiple enzymes rather than only one. In fact, multiple enzymes may carry out the same reaction. For example, with substitute enzymes, only one of the enzymes needs to be present for the reaction to occur. There exists an OR association among these enzymes. With collaborating enzymes, all of the enzymes have to be expressed for the reaction to occur. There exists an AND association among these enzymes. Moreover, a reaction may be catalyzed by a set of enzymes whose association is a combination of OR and AND. Therefore, it is necessary to study how to apply the linear programming method to multiple enzyme associations.

Contribution: We prove that finding the enzyme knockout strategy with the OptKnock framework is NP-hard. This is consistent with the experimental result that the running time of the OptKnock framework increases exponentially as the network size increases. We provide a binary method and a continuous method for the AND and OR associations. Experiments show that the enzyme associations strongly influence the performance of the linear programming method. We observe that our binary method runs much faster than the continuous method. For the pathways of H. sapiens from KEGG, our binary method runs in less than one second for the whole metabolism. Therefore, our binary method is useful for biological applications.

Enzymatic target identification using the non-linear model by manipulating the steady state. The state of a metabolic pathway can be expressed as a vector, which denotes the yield of the compounds [106] or the flux [71] in the pathway at a given time. The yield of a compound is the amount of product obtained in the chemical reaction [106]. The flux of a reaction is the rate at which each compound is produced or consumed by that reaction [71]. The steady state is the state that remains unchanged over time. The enzymatic target identification problem in this model aims to identify the set of enzymes whose knockouts lead to a steady state of the metabolic pathway that is as close to a user-supplied goal state as possible.
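The goal-state formulation above can be sketched in a few lines of Python. This is an illustration under explicit assumptions, not the algorithm of Chapter 4: the steady-state table `TOY_STEADY_STATES` is invented, the Euclidean distance is assumed as the closeness measure, and in practice the steady state of each knockout set would come from simulating the chosen model rather than from a lookup.

```python
import math
from itertools import combinations

# Hypothetical steady states (one vector entry per compound) for each
# knockout set of a two-enzyme pathway. Invented values, for illustration.
TOY_STEADY_STATES = {
    frozenset(): [1.0, 1.0],
    frozenset({"E1"}): [0.2, 1.1],
    frozenset({"E2"}): [1.0, 0.3],
    frozenset({"E1", "E2"}): [0.1, 0.2],
}


def euclidean(a, b):
    """Euclidean distance between two state vectors of equal length."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def closest_knockout(enzymes, steady_state, goal):
    """Return the knockout set whose steady state is nearest to `goal`.

    `steady_state` is any callable mapping a frozenset of enzymes to a
    state vector. The search is exhaustive, hence exponential.
    """
    best_set = frozenset()
    best_dist = euclidean(steady_state(best_set), goal)
    for size in range(1, len(enzymes) + 1):
        for subset in combinations(sorted(enzymes), size):
            candidate = frozenset(subset)
            dist = euclidean(steady_state(candidate), goal)
            if dist < best_dist:
                best_set, best_dist = candidate, dist
    return best_set, best_dist


if __name__ == "__main__":
    goal = [0.2, 1.0]
    print(closest_knockout({"E1", "E2"}, TOY_STEADY_STATES.__getitem__, goal))
```

Passing the model as a callable keeps the search independent of whether the steady state comes from a Boolean, linear or non-linear model.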

Contribution: We develop two algorithms to find the enzyme set with minimal deviation from the goal state. The first one is a traversal approach that explores possible solutions in a systematic way using a branch and bound method. The second one uses genetic algorithms to derive good solutions from a set of alternative solutions iteratively. Unlike the former, the latter can run for very large pathways. Our experiments show that our algorithms' results agree with those obtained in vitro in the literature for a number of applications. They also show that the traversal method is a good approximation of the exhaustive search algorithm and is up to 11 times faster than the exhaustive one. This algorithm runs efficiently for pathways with up to 30 enzymes. For large pathways, our genetic algorithm can find good solutions in less than 10 minutes.

Enzymatic target identification with dynamic states. The flux of a metabolic network changes from one steady state to another due to the presence of external inhibitions. The sequence of intermediate states, called the dynamic path, shows the pattern that the underlying network follows to reach the steady state. Understanding this pattern is crucial for metabolic engineering, as some of the intermediate states can be undesirable.

Contribution: We consider the problem of enzymatic target identification in metabolic networks. Unlike existing strategies, we consider the dynamic behavior of the state changes of the networks. Given a goal pattern for the fluxes of a given network, we aim to find the set of enzyme knockouts that will produce a dynamic state similar to that goal pattern. We consider three distance functions to measure the difference between two dynamic states. These are the Euclidean distance, the time-warping distance and the pattern distance. Euclidean distance restricts the solution space to the exact goal state. Time-warping distance allows for stretching of the goal pattern in the time domain. Pattern distance allows scaling and shifting of the goal flux in addition to stretching in the time domain. We provide a branch and bound method to solve this problem. We also develop a partitioning strategy to reduce the running time of our method.
This strategy avoids constructing the entire dynamic state by computing a lower bound to the distance between two dynamic states when the entire dynamic state is not available. Our experiments on the Purine metabolism show that our method is accurate. They also show that our partitioning strategy reduces the number of time intervals computed for dynamic states by a factor of 2 to 6.

Integrating structural properties of proteins and biological networks for compound selection. In silico compound selection, the process of finding a new candidate drug from large libraries of compounds with the aid of computers, plays a significant role in modern drug discovery. One of the popular compound selection methods is to screen libraries of compounds for their ability to bind to biological targets such as receptors and enzymes. This process is also known as docking. Recent validation studies show that docking methods perform poorly in compound selection.

Contribution: We develop two novel computational methods that rank a given set of compounds for a given target protein or enzyme. The major difference between our first method and traditional in silico screening methods is that we consider additional proteins and enzymes while ranking compounds, whereas existing strategies often focus on the target protein alone. A drug compound can alter the state of the metabolic network. Our second method considers the impact of the drug compounds on the metabolic network by integrating the interactions among proteins in metabolic networks with the docking results. Experiments on the pharmacologic chaperones of misfolded rhodopsin show that our method has better accuracy than the traditional methods that focus only on rhodopsin. Our results are in the top 5.7% of all possible rankings. For the same dataset, the traditional method's results are in the top 81% of all possible rankings.

1.4 Outline of the Thesis

An introduction to metabolic engineering and a sketch of its methods are presented in Chapter 1. Chapter 2 provides heuristic methods for enzymatic target identification using the Boolean model.
Chapter 3 discusses enzymatic target identification with multiple enzyme associations using the linear model. Chapter 4 designs methods for enzymatic target identification by manipulating the steady state. Chapter 5 discusses the enzymatic target identification problem with dynamic states. Chapter 6 presents a compound selection method that integrates structural properties of proteins and biological networks. Chapter 7 contains the conclusions.

Figure 1-1. The structure of this thesis

Figure 1-2. Graph representation of a metabolic pathway with four reactions $R_1$, $R_2$, $R_3$ and $R_4$, three enzymes $E_1$, $E_2$ and $E_3$, and five compounds $C_1, \ldots, C_5$

CHAPTER 2
ENZYMATIC TARGET IDENTIFICATION USING BOOLEAN MODEL BY HEURISTIC METHODS

Drug discovery aims at finding molecules that manipulate enzymes in order to increase or decrease the production of desired compounds while incurring minimum side-effects. An important part of this problem is the identification of the target enzymes, i.e., the enzymes that will be inhibited by the drug molecules. Finding the right set of target enzymes is essential for developing a successful drug. The relationship between enzymes and compounds through reactions is defined using metabolic networks. Finding the best set of target enzymes requires a careful analysis of the underlying metabolic network.

This chapter presents the problem of finding the set of enzymes whose inhibition stops the production of a given set of target compounds while eliminating a minimal number of non-target compounds. Here, target compounds are the compounds whose presence causes the underlying disorder. The non-target compounds are all the remaining compounds. We call this problem Target Identification by Enzymes (TIE). An exhaustive evaluation of all possible enzyme combinations in the metabolic network to find the optimal solution is computationally infeasible for very large metabolic networks. We developed a scalable iterative method which computes a sub-optimal solution within reasonable time bounds. The method consists of two phases: the Iteration and Fusion Phases. The Iteration Phase is based on the intuition that a good solution can be found by tracing backward from the target compounds. It initially evaluates the immediate precursors of the target compounds and iteratively moves backwards to identify the enzymes whose inhibition incurs fewer side-effects. This phase converges to a sub-optimal solution after a small number of iterations. The Fusion Phase takes the union of a set of sub-optimal results found in the Iteration Phase. Each of these results is a potential solution. It then enlarges this set by inserting a small, randomly chosen subset of the remaining enzymes. The size of the final set is bounded by the time allowed for the exhaustive search. The Fusion Phase exhaustively searches the final
settondtheoptimalsubsetofenzymesfromthisset.It,then,recursivelycreatesanewsetby insertingrandomenzymestotheoptimalsolutionfoundsofarandexhaustivelysearchesthisset againuntilapredenednumberofiterationsareperformed. Theexperimentsonthe E.coli metabolicnetworkshowthattheaverageaccuracyofthe IterationPhasealonedeviatesfromthatoftheexhaustivesearchonlyby0.02%.TheIteration Phaseishighlyscalable.Itcansolvetheproblemfortheentiremetabolicnetworkof Escherichia coli inlessthan10seconds.TheFusionPhaseimprovestheaccuracyoftheIterationPhaseby 19.3%. 2.1MotivationandProblemDenition Traditionaldrugdevelopmentapproacheshaveoftenfocusedmoreontheefcacyofdrugs thantheirtoxicityuntowardsideeffects.Lackofpredictivemodelsthataccountfortheinterrelationshipsbetweenthemetabolicprocessesoftenleadstodrugdevelopmentfailures.Toxicity and/orlackofefcacycanresultifmetabolicnetworkcomponentsotherthantheintended targetareaffected.Thisiswell-illustratedbythefailureofTolcaponeanewdrugdevelopedfor Parkinson'sdiseaseduetoobservedhepatictoxicityinsomepatients[19].Post-genomicdrug researchfocusesmoreontheidenticationofspecicbiologicaltargetsgeneproducts,such asenzymesorproteinsfordrugs,whichcanbemanipulatedtoproducethedesiredeffectof curingadiseasewithminimumdisruptiveside-effects[92,100].Themainphasesinsuchadrug developmentstrategyaretargetidentication,validationandleadinhibitoridentication[21]. Enzymescatalyzereactions,whichproducemetabolitescompoundsinthemetabolic networksoforganisms.Enzymemalfunctionsthatresultintheaccumulationofasetofcompoundsmaycausediseases.WetermsuchcompoundsastheTargetCompoundsandthe remainingcompoundsastheNon-Targetcompounds.Forinstance,themalfunctionofthe enzyme phenylalaninehydroxylase causesbuildupoftheaminoacid,phenylalanine,resultingin phenylketonuria[99],adiseasethatcausesmentalretardation.Hence,itisessentialtoidentify 34

the optimal enzyme set that can be manipulated by drugs to prevent the excess production of target compounds while incurring minimal side-effects. We term the side-effects of inhibiting a given set of enzymes as the damage caused to the metabolic network. Formally, we define the damage of inhibiting a set of enzymes as the number of non-target compounds whose productions are stopped due to the inhibition of those enzymes. Note that it is trivial to extend this definition to non-uniform costs for different compounds. We use the uniform damage assumption for different compounds in this chapter for simplicity.

Problem statement. Given a metabolic network and a set of target compounds, the problem of Target Identification by Enzymes (TIE) seeks the set of enzymes whose inhibition eliminates all the target compounds and inflicts minimum damage on the rest of the network.

Evaluating all enzyme combinations is not feasible for metabolic networks with a large number of enzymes. This is because the number of enzyme combinations, i.e., $2^{|E|} - 1$, increases exponentially with the number of enzymes. Efficient and precise heuristics are needed to tackle this problem.

Note that different enzymes and compounds may have varying levels of importance in the metabolic network. Our model simplistically considers all the enzymes and compounds to be of equal importance. It can be extended by assigning weights to enzymes and compounds based on their roles in the network. However, we do not discuss these extensions in this chapter as these extensions can be solved by making trivial modifications to the algorithm that solves the problem described in this chapter.

2.2 Methods

2.2.1 The Iteration Phase

This section presents the Iteration Phase of our method. The Iteration Phase quickly produces a set of suboptimal solutions to the TIE problem. The Iteration Phase is based on the intuition that we can arrive at a solution close to the optimal one by tracing the metabolic network

backwards starting from the target compounds. We evaluate the immediate precursors of the target compounds and iteratively move backwards to identify the enzymes whose inhibition stops the production of the target compounds while incurring minimum damage. This phase consists of an initialization step followed by iterations, until a convergence criterion is satisfied.

Let $E$, $R$ and $C$ denote the sets of enzymes, reactions and compounds of the metabolic network respectively. Let $|E|$, $|R|$ and $|C|$ denote the number of enzymes, reactions and compounds respectively. The Iteration Phase uses three primary data structures, namely an enzyme vector $V_E = [e_1, e_2, \ldots, e_{|E|}]$, a reaction vector $V_R = [r_1, r_2, \ldots, r_{|R|}]$, and a compound vector $V_C = [c_1, c_2, \ldots, c_{|C|}]$. Each value $e_i$ in $V_E$ denotes the damage of inhibition of enzyme $E_i \in E$. Each value $r_i^j$ in $V_R$ denotes the damage incurred by stopping the reaction $R_i \in R$ in the $j$th iteration. Each value $c_i^j$ in $V_C$ denotes the damage incurred by stopping the production of the compound $C_i \in C$ in the $j$th iteration. The terms $r_i^0$ and $c_i^0$ represent the values $r_i$ and $c_i$ at the initialization step.

Initialization: The initialization step computes the initial values of $V_E$, $V_R$, and $V_C$. It first computes the vector $V_E$, then $V_R$, and finally $V_C$. It computes each entry in these vectors by simply eliminating the vertex corresponding to that entry from the metabolic network. For instance, in order to compute the value of $e_1$, it removes the vertex corresponding to enzyme $E_1$ from the graph. In other words, it does not consider combinations of enzymes at this step. It only aims to find an initial local solution. Next, we elaborate the initialization of vectors $V_E$, $V_R$ and $V_C$.

Enzyme vector: The value of $e_i$, $\forall i, 1 \le i \le |E|$, is computed as the damage incurred after inhibiting $E_i$. This value is computed by doing a breadth-first traversal of the metabolic network starting from $E_i$. The traversal is performed as follows. All the edges that are traversed are removed from the graph. If a vertex visited during the traversal corresponds to a reaction, that vertex is removed as well, and its outgoing edges are inserted into a queue to be traversed later.

If a vertex visited during the traversal corresponds to a compound, that vertex is removed only if it has no other incoming edges. If a compound vertex is removed, its outgoing edges are inserted into a queue to be traversed later. Otherwise, its outgoing edges are not traversed.

The damage $e_i$ associated with every enzyme $E_i \in E$, $\forall i, 1 \le i \le |E|$, is calculated separately and stored at position $i$ in the enzyme vector $V_E$.

Reaction vector: The damage $r_j^0$ is computed as the minimum of the damages of the enzymes that catalyze $R_j$, $\forall j, 1 \le j \le |R|$. Let $E_1, E_2, \ldots, E_k$ be the enzymes that catalyze $R_j$. The damage $r_j^0$ is computed as $r_j^0 = \min_{i=1}^{k} \{e_i\}$. This computation is intuitive since a reaction can be disrupted by inhibiting any of its catalyzers. The value of $r_j^0$ associated with every reaction $R_j \in R$, $\forall j, 1 \le j \le |R|$, is computed and stored at position $j$ in the reaction vector $V_R$. Let $E^0(R_j)$ denote the set of enzymes that produced the damage $r_j^0$. The set $E^0(R_j)$ is also stored along with $r_j^0$.

Compound vector: The damage $c_k^0$, $\forall k, 1 \le k \le |C|$, is computed by considering the reactions that produce $C_k$. Let $R_1, R_2, \ldots, R_j$ be the reactions that produce $C_k$. First, the set of enzymes $E^0(C_k)$ for $C_k$ is computed as $E^0(C_k) = E^0(R_1) \cup E^0(R_2) \cup \cdots \cup E^0(R_j)$. Next, the damage $c_k^0$ is computed as the damage incurred after the inhibition of all the enzymes in $E^0(C_k)$. This computation is based on the observation that a compound disappears from the system only if all the reactions that produce it stop. The value of $c_k^0$ associated with every compound $C_k \in C$, $1 \le k \le |C|$, is calculated and stored at position $k$ in the compound vector $V_C$. The set $E^0(C_k)$ is also stored along with $c_k^0$.

Table 2-1 shows the iteration steps for Figure 2-1. $I_0$ is the initialization step; $I_1$ and $I_2$ are the iterations. $V_R$ and $V_C$ represent the damage values of reactions and compounds respectively computed at each iteration. $V_E = [3, 0, 0]$ in all iterations. $flag = 1$ indicates that the inhibition of the enzymes corresponding to that vector entry in $E_R$ or $E_C$ eliminates all the target compounds. Column $I_0$ in Table 2-1 shows the initialization of the vectors for the

network in Figure 2-1. The damage $e_1$ of $E_1$ is three as inhibiting $E_1$ stops the production of three non-target compounds $C_2$, $C_3$ and $C_4$. Since the disruption of $E_2$ or $E_3$ alone does not stop the production of any non-target compound, their damage values are zero. Hence, $V_E = [3, 0, 0]$. The damage values for reactions are computed as the minimum of their catalyzers: $r_1^0 = r_2^0 = e_1$ and $r_3^0 = r_4^0 = e_2$. Hence, $V_R = [3, 3, 0, 0]$. The damage values for compounds are computed from the reactions that produce them. For instance, $R_1$ and $R_2$ produce $C_2$, and $E^0(R_1) = E^0(R_2) = \{E_1\}$. Therefore, $c_2^0 = e_1$. Similarly, $c_5^0$ is equal to the damage of inhibiting the set $E^0(R_3) \cup E^0(R_4) = \{E_2, E_3\}$. Thus, $c_5^0 = 1$.

Iterative steps: The iterations refine the damage values in vectors $V_R$ and $V_C$ by considering the precursors of the corresponding reaction and compound. Thus, at the $n$th iteration, the vertices from which a reaction or a compound vertex is reachable on a path of length up to $n$ are considered. In this chapter, the length of a path on the graph constructed for a metabolic network is defined as the number of reactions on that path.

Definition 1. In a given metabolic network, the length of a path from an enzyme $E_i$ to a reaction $R_j$ or compound $C_k$ is defined as the number of unique reactions on that path.

The enzyme vector, $V_E$, remains unchanged at the iteration steps since the enzymes are not affected by the reactions or the compounds. Next, we describe the actions taken to update $V_R$ and $V_C$ at each iteration. We discuss the stopping criteria for the iterations later.
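
The damage computation used in the initialization step can be sketched as follows. This is a minimal illustration, not the implementation evaluated in this chapter. It replaces the queue-based breadth-first traversal with an equivalent fixed-point loop, and it assumes a reaction stops as soon as any one of its catalyzers is inhibited or any one of its input compounds disappears. The names `damage`, `NETWORK` and `TARGETS` are hypothetical, and the toy network is an assumed topology chosen only so that it reproduces the values $V_E = [3, 0, 0]$ and $c_5^0 = 1$ from the example; it is not claimed to be the exact graph of Figure 2-1.

```python
from collections import defaultdict


def damage(reactions, inhibited, targets):
    """Return (damage, removed) after inhibiting the enzymes in `inhibited`.

    reactions: dict mapping a reaction name to a tuple
               (catalysts, inputs, outputs), each a set of names.
    damage:    number of removed compounds that are not target compounds.
    """
    producers = defaultdict(set)          # compound -> reactions producing it
    for name, (_, _, outputs) in reactions.items():
        for compound in outputs:
            producers[compound].add(name)

    stopped = set()                       # reactions that no longer run
    removed = set()                       # compounds that are no longer produced
    changed = True
    while changed:                        # repeat until nothing else propagates
        changed = False
        for name, (catalysts, inputs, _) in reactions.items():
            if name in stopped:
                continue
            # Assumption: losing any catalyzer or any input stops the reaction.
            if catalysts & inhibited or inputs & removed:
                stopped.add(name)
                changed = True
        for compound, prods in producers.items():
            # A compound disappears only if all reactions producing it stopped.
            if compound not in removed and prods <= stopped:
                removed.add(compound)
                changed = True
    return len(removed - targets), removed


# Hypothetical toy network (assumed topology, see the lead-in).
NETWORK = {
    "R1": ({"E1"}, {"C5"}, {"C1", "C2"}),
    "R2": ({"E1"}, set(), {"C2", "C3", "C4"}),
    "R3": ({"E2"}, set(), {"C5"}),
    "R4": ({"E3"}, set(), {"C5"}),
}
TARGETS = {"C1"}
```

Under these assumptions, `damage(NETWORK, {"E1"}, TARGETS)` counts $C_2$, $C_3$ and $C_4$, while inhibiting $E_2$ and $E_3$ together removes only $C_5$ in addition to the target.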
Reaction vector: Let $C_1, C_2, \ldots, C_t$ be the compounds that are consumed by $R_j$. We update the damage $r_j^n$ as

$$r_j^n = \min\left\{ r_j^{n-1}, \; \min_{i=1}^{t} \{ c_i^{n-1} \} \right\}.$$

The first term of the min function is the damage value calculated for $R_j$ during the previous iteration (i.e., the $(n-1)$th iteration). The second term is the damage of the input compound with the minimum damage found in the previous iteration (i.e., the $(n-1)$th iteration). This computation is intuitive since a reaction can be disrupted by stopping the production of any of its input

compounds. The damage of all the input compounds is already computed in the $(n-1)$th iteration. Therefore, at the $n$th iteration, the second term of the min function considers the impact of the reactions and compounds that are away from $R_j$ by $n$ edges in the graph for the metabolic network. Let $E^n(R_j)$ denote the set that contains the enzymes that produced the new damage $r_j^n$. Along with $r_j^n$, we also store $E^n(R_j)$. We update all $r_j^n \in V_R$ using the same strategy. Note that the values $r_j^n$ can be computed in any order, i.e., the result does not depend on the order in which the reactions are considered.

Compound vector: The damage $c_k^n$, $\forall k, 1 \le k \le |C|$, is updated by considering the damage computed for $C_k$ in the previous iteration and the damages of the reactions that produce $C_k$. Let $R_1, R_2, \ldots, R_j$ be the reactions that produce $C_k$. We first compute a set of enzymes as $E^{n-1}(R_1) \cup E^{n-1}(R_2) \cup \cdots \cup E^{n-1}(R_j)$. Here, $E^{n-1}(R_t)$, $1 \le t \le j$, is the set of enzymes computed for $R_t$ after the reaction vector is updated in the current iteration. We update the damage value $c_k^n$ as

$$c_k^n = \min\left\{ c_k^{n-1}, \; \mathrm{damage}\left( \bigcup_{i=1}^{j} E^{n-1}(R_i) \right) \right\}.$$

The first term here denotes the damage value computed for $C_k$ in the previous iteration. The second term shows the damage computed for all the precursor reactions in the current step. Along with $c_k^n$, we also store $E^n(C_k)$, the set of enzymes which provides the current minimum damage $c_k^n$. Similar to the reaction vector, the entries $c_k^n$ in the compound vector can be computed in any order as they do not depend on each other.

Condition for convergence: At each iteration, each value in $V_R$ and $V_C$ either remains the same or decreases by an integer amount. This is because each iteration applies a min function to update each value as the minimum of the current value and a function of its precursors. Therefore, the values of $V_R$ and $V_C$ do not increase. Furthermore, a damage value is always a nonnegative integer since it denotes the number of deleted non-target compounds. Iterative

refinement steps stop when the vectors $V_R$ and $V_C$ do not change in two consecutive iterations. This is justified because, if these two vectors remain the same after an iteration, the damage values in $V_R$ and $V_C$ cannot be reduced any more using our refinement strategy.

Columns $I_1$ and $I_2$ in Table 2-1 show the iterative steps to update the values of the vectors $V_R$ and $V_C$. In $I_1$, we compute the damage $r_1^1$ for $R_1$ as the minimum of its current damage (three) and the damage of its precursor compound, $c_5^0 = 1$. Hence, $r_1^1$ is updated to 1 and its associated enzyme set is changed to $\{E_2, E_3\}$. The other values in $V_R$ remain the same. When we compute the values for $V_C$, $c_1$ is updated to 1, as its new associated enzyme set is $\{E_2, E_3\}$ and the damage of inhibiting both $E_2$ and $E_3$ together is 1. Hence, $V_R = [1, 3, 0, 0]$ and $V_C = [1, 3, 3, 3, 1]$. In $I_2$, we find that the values in $V_R$ and $V_C$ do not change any more. Hence, we stop our iterative refinement and report the enzyme combination $\{E_2, E_3\}$ as the best solution observed in the iterative algorithm.

An interesting observation that follows from the iterative algorithm is that it can produce multiple solutions to the TIE problem. This can be explained as follows. Each iteration produces at least one feasible solution (i.e., an enzyme set that eliminates all the targets). This is because each entry in the vector $E^n(C)$ denotes a set of enzymes whose inhibition eliminates its corresponding compound, where $n$ is the iteration number. The union of all the entries of $E^n(C)$ corresponding to the target compounds is a feasible solution. In addition to this, each set $E^n(R)$ or $E^n(C)$ is a feasible solution if the inhibition of the enzymes in that set eliminates all the target compounds.

Theorem 1. Let $V_R = [r_1, r_2, \ldots, r_{|R|}]$ and $V_C = [c_1, c_2, \ldots, c_{|C|}]$ be the reaction and compound vectors respectively (see Section 2.2.1). Let $n$ be the length of the preceding path (see Definition 1) of reaction $R_j$ or compound $C_k$. The value $r_j$ or $c_k$ remains constant after at most $n$ iterations.

Proof: See Song et al.'s paper [94].

The complexity analysis follows.
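
The initialization and refinement rules above can be put together in a short sketch. It is a hedged illustration under stated assumptions rather than the Java implementation used in the experiments: the function names `removed_compounds` and `iteration_phase` are hypothetical, every reaction is assumed to have at least one catalyzer, compounds with no producing reaction are given an infinite damage so that they are never selected, and the toy network is an assumed topology chosen to be consistent with Table 2-1.

```python
import math


def removed_compounds(reactions, inhibited):
    """Compounds whose production stops once `inhibited` enzymes are removed."""
    produced = {c for _, _, outs in reactions.values() for c in outs}
    stopped, removed = set(), set()
    changed = True
    while changed:
        changed = False
        for r, (cats, ins, _) in reactions.items():
            if r not in stopped and (cats & inhibited or ins & removed):
                stopped.add(r)
                changed = True
        for c in produced - removed:
            if all(r in stopped for r, (_, _, outs) in reactions.items() if c in outs):
                removed.add(c)
                changed = True
    return removed


def iteration_phase(reactions, compounds, targets, max_iters=100):
    def dmg(enzymes):
        return len(removed_compounds(reactions, set(enzymes)) - targets)

    def compound_candidate(c):
        # Union of the enzyme sets currently stored for the producers of c.
        prods = [r for r, (_, _, outs) in reactions.items() if c in outs]
        if not prods:                      # assumption: sources cannot be removed
            return math.inf, frozenset()
        union = frozenset().union(*(r_set[r] for r in prods))
        return dmg(union), union

    # Initialization: V_E, then V_R (cheapest single catalyzer), then V_C.
    enzymes = sorted({e for cats, _, _ in reactions.values() for e in cats})
    e_val = {e: dmg({e}) for e in enzymes}
    r_set = {r: frozenset({min(sorted(cats), key=e_val.get)})
             for r, (cats, _, _) in reactions.items()}
    r_val = {r: e_val[next(iter(s))] for r, s in r_set.items()}
    c_val, c_set = {}, {}
    for c in compounds:
        c_val[c], c_set[c] = compound_candidate(c)

    for _ in range(max_iters):
        before = (dict(r_val), dict(c_val))
        prev_val, prev_set = dict(c_val), dict(c_set)
        # Reaction update: min of old value and cheapest input compound.
        for r, (_, ins, _) in reactions.items():
            for c in ins:
                if prev_val.get(c, math.inf) < r_val[r]:
                    r_val[r], r_set[r] = prev_val[c], prev_set[c]
        # Compound update: min of old value and damage of the producers' union.
        for c in compounds:
            val, union = compound_candidate(c)
            if val < c_val[c]:
                c_val[c], c_set[c] = val, union
        if (r_val, c_val) == before:       # no change: converged
            break
    return r_val, c_val, r_set, c_set


# Hypothetical toy network (assumed topology, see the lead-in).
NETWORK = {
    "R1": ({"E1"}, {"C5"}, {"C1", "C2"}),
    "R2": ({"E1"}, set(), {"C2", "C3", "C4"}),
    "R3": ({"E2"}, set(), {"C5"}),
    "R4": ({"E3"}, set(), {"C5"}),
}
COMPOUNDS = ["C1", "C2", "C3", "C4", "C5"]
r_val, c_val, r_set, c_set = iteration_phase(NETWORK, COMPOUNDS, {"C1"})
```

On this toy input the sketch ends with the enzyme set stored for the target compound equal to $\{E_2, E_3\}$, matching the walk-through of columns $I_1$ and $I_2$.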

Space Complexity: The number of elements in the reaction and compound vectors is $|R| + |C|$. For each element, we store an associated set of enzymes. Hence, the space complexity is $O((|R| + |C|)\,|E|)$.

Time Complexity: From Theorem 1, the number of iterations of the algorithm is $O(|R|)$. The computation time per iteration is $O(G\,(|R| + |C|))$, where $G$ is the size of the graph, $G$ = number of vertices + number of edges. Hence, the time complexity is $O(|R|\,G\,(|R| + |C|))$.

2.2.2 The Fusion Phase

The Iteration Phase computes a set of feasible solutions to the TIE problem quickly. However, these results are not necessarily optimal. In other words, the enzyme set with minimum damage is not guaranteed to be in the solution set. Usually, it is desirable to spend more time if better results can be found. This section presents the Fusion Phase which optimizes the results of the Iteration Phase.

The Fusion Phase creates a subset of enzymes by taking the union of a number of sub-optimal results found at the Iteration Phase. Recall that each sub-optimal solution is a set of enzymes whose inhibition eliminates all the target compounds. It then grows this set by inserting a small number of the remaining enzymes. The Fusion Phase exhaustively searches the enzymes in this set to find the subset of enzymes from this set whose inhibition stops the production of all the target compounds and has the smallest damage. It, then, recursively creates a new set by inserting random enzymes from the upstream subnetwork of the target compounds to the optimal solution found so far and exhaustively searches this set again until a predefined number of iterations are performed. The complexity of the exhaustive search is exponential in the number of enzymes. Therefore, the size of the enzyme set at each step is determined by the amount of time the user is willing to spend on the exhaustive search.

Algorithm 2.1 presents a brief overview of the Fusion Phase. First, we select a set of sub-optimal results found in the Iteration Phase. Recall that, in the Iteration Phase, we record the

candidate enzyme sets $E^n(R_k)$ and $E^n(C_k)$ in each iteration. Only the candidate sets that can inhibit all the target compounds are potential solutions. We compute boolean values $flag^n(R_k)$ and $flag^n(C_k)$ to distinguish enzyme sets that are potential solutions from the ones that are not. In detail, if the inhibition of $E^n(R_k)$ eliminates all the target compounds, $flag^n(R_k) = 1$. Otherwise, $flag^n(R_k) = 0$. Similarly, $flag^n(C_k) = 1$ only if $E^n(C_k)$ stops the production of all the target compounds.

Algorithm 2.1 The Fusion Phase
INPUT:
- A metabolic network.
- $E$, $R$, $C$: the sets of enzymes, reactions and compounds in the metabolic network respectively.
- $min$: the minimum number of enzymes selected from the Iteration Phase.
- $max$: the maximum number of enzymes that can be exhaustively searched.
- $T$: a set of target compounds.
OUTPUT: A subset of enzymes whose inhibition eliminates all the target compounds and inflicts minimum damage on the network.
1: Select the best sub-optimal results found in the Iteration Phase so that their union contains $K$ enzymes, where $min \le K \le max$. Store the union of these enzymes as the potential solution set $E'$.
2: for $i = 1$ to $m$ do
3:    Grow the potential solution set $E'$ by inserting enzymes into $E'$ randomly from the upstream subnetwork of the target compounds until $E'$ contains $max$ enzymes.
4:    Exhaustively search the enzymes in $E'$ to find the subset of enzymes in $E'$ that eliminates all the target compounds and has the smallest damage.
5:    Update the potential solution set $E'$ as the set of enzymes in the result of the exhaustive search.
6: end for
7: Report $E'$ as the result.

We maintain a list of top candidate sets with the smallest damage through the iterations. This list is empty prior to the Iteration Phase. At each iteration, we check the damage values in vectors $V_R$ and $V_C$ as well as the corresponding $flag^n(R_k)$ and $flag^n(C_k)$ values. These vectors produce candidate solutions in two different ways.

Case 1: Each entry $flag^n(R_k) = 1$ or $flag^n(C_k) = 1$ indicates that $E^n(R_k)$ or $E^n(C_k)$ is a candidate solution.

Case 2: The union of all the sets $E^n(C_k)$, where $C_k$ is a target compound, is a candidate solution even if $flag^n(C_k) = 0$ for some of the $C_k$.

The former case is correct since the flag indicates that the corresponding enzyme set eliminates all the targets. The latter case holds since each enzyme set $E(C_k)$ is guaranteed to eliminate compound $C_k$. Thus, their union is guaranteed to eliminate all the target compounds.

At each iteration, we evaluate all the candidate solutions obtained at that iteration and insert the top solutions if there are fewer than $min$ enzymes in the top candidates list. Otherwise, we replace the existing solution in the list that has the largest damage with the new solution if the damage of the new solution is less than that of the existing one. Table 2-1 shows the $flag$ values and the union of the top candidates in each iteration for Figure 2-1. $V_R$, $V_C$, $E_R$ and $E_C$ are as described for Table 2-1 above. In this example, $min = 3$. In iteration $I_0$, the top solution is $E^0(C_5) = \{E_2, E_3\}$ since $flag^0(C_5) = 1$ and the damage value of $C_5$ ($c_5^0$) is the smallest damage observed so far. Therefore, we initialize the top candidate list to the enzymes in $E^0(C_5)$. There are fewer than $min$ enzymes in this list. Therefore, we grow this list by inserting the next best solution among the solutions found so far until we have at least $min$ enzymes or no other solution exists. All the remaining solutions at this iteration have the same damage value (i.e., damage = 3). Thus, we pick one of them arbitrarily, say $E^0(C_1)$, and insert the enzymes in it into the top candidate list.

Then we improve the current solution as follows. We grow the set $E'$ by inserting random enzymes from the upstream subnetwork of the target compounds until $E'$ contains $max$ enzymes. The reason is that the optimal enzyme set must reside in the upstream subnetwork of the target compounds. Therefore, we only consider the enzymes in this subnetwork. Step 4 exhaustively searches the subsets of enzymes in the enzyme set $E'$ to find the optimal solution in $E'$. The algorithm repeats a prespecified number of times, and reports the best result.
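
The grow-and-search loop of the Fusion Phase can be sketched as below. This is an illustrative outline, not the C implementation used in the experiments. The names `best_subset` and `fusion_phase` and their parameters are hypothetical, the feasibility test and the damage function are passed in as callables so that the sketch does not depend on a particular network representation, and the seed set is assumed to be feasible already (as the candidates from the Iteration Phase are).

```python
import random
from itertools import combinations


def best_subset(candidates, is_feasible, damage):
    """Exhaustively search all non-empty subsets of `candidates`.

    Returns the feasible subset with the smallest damage, or (None, None)
    if no subset eliminates all the target compounds.
    """
    best, best_damage = None, None
    pool = sorted(candidates)
    for size in range(1, len(pool) + 1):
        for subset in combinations(pool, size):
            chosen = frozenset(subset)
            if not is_feasible(chosen):
                continue
            d = damage(chosen)
            if best is None or d < best_damage:
                best, best_damage = chosen, d
    return best, best_damage


def fusion_phase(seed, upstream, is_feasible, damage, max_size, rounds, rng=None):
    """Repeatedly grow the current solution with random upstream enzymes
    and re-run the exhaustive search on the grown set."""
    rng = rng or random.Random(0)          # fixed seed only for reproducibility
    best = frozenset(seed)
    best_damage = damage(best)
    for _ in range(rounds):
        pool = set(best)
        extras = sorted(set(upstream) - pool)
        rng.shuffle(extras)
        # Insert random enzymes until the set holds max_size enzymes.
        pool.update(extras[:max(0, max_size - len(pool))])
        found, found_damage = best_subset(pool, is_feasible, damage)
        if found is not None and found_damage <= best_damage:
            best, best_damage = found, found_damage
    return best, best_damage
```

Because the exhaustive search visits $2^{max} - 1$ subsets per round, `max_size` is the knob that trades running time for solution quality, which mirrors the role of $max$ in Algorithm 2.1.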

2.3 Results

Experimental setup. We extracted the metabolic network of Escherichia coli (E. coli) from KEGG [47] (ftp://ftp.genome.jp/pub/kegg/pathways/eco/). The metabolic network in KEGG has been hierarchically classified into smaller networks according to their functionalities. We performed experiments at different levels of hierarchy of the metabolic network including the entire metabolic network, that is, an aggregation of all the functional subnetworks. We devised a uniform labeling scheme for the networks based on the number of enzymes they contain. According to this scheme, a network label begins with `N' and is followed by the number of enzymes in the network. For instance, `N20' indicates a network with 20 enzymes. Table 2-2 shows the metabolic networks used in our experiments along with their identifiers and the number of compounds (C), reactions (R) and edges (Ed). The networks are downloaded from the KEGG database. Id, C, R and Ed denote the network identifier used in this chapter, the number of compounds, reactions and edges (interactions) respectively. The edges represent the interactions in the network. For each network, we constructed three query sets. Each query in these sets consists of one, two and four target compounds respectively. For each network in our dataset, we selected target compounds in the query sets randomly among the compounds in that network. Each query set contains 10 queries.

We implemented the Iteration Phase, the Fusion Phase and an exhaustive search algorithm [96] which determines the optimal enzyme combination. We implemented the Iteration Phase and the exhaustive search algorithms in Java. We implemented the Fusion Phase in C. We ran our experiments on an Intel Pentium 4 processor with 2.8 GHz clock speed and 1 GB main memory, running the Linux operating system.

2.3.1 Evaluation of the Damage Model on Real Drugs

We first evaluate how well the proposed cost model reflects the biological process. We do this by querying well-studied drugs in the literature using double iterative optimization.

KEGG contains a database of known drug molecules along with the enzymes they inhibit and their therapeutic category. We use the drugs in this database as our benchmarks. Due to space limitation, we report only four of them. The value in parentheses that starts with the letter D, C, or E (e.g., D02562) is the unique identifier assigned to the corresponding drug, compound, or enzyme respectively in KEGG.

Benoxaprofen (D03080). This drug inhibits arachidonate 5-lipoxygenase (E1.13.11.34) which appears in several networks including the arachidonic acid metabolism network (hsa00590). In pharmacology, 5-lipoxygenase inhibitors will decrease the biosynthesis of LTB4 (C02165) and the cysteinyl-containing leukotrienes LTC4 (C02166), LTD4 (C05951) and LTE4 (C05952). According to our graph model, the removal of 5-lipoxygenase eliminates three of these compounds (LTB4, LTC4 and LTD4) in the arachidonic acid metabolism network. Inhibition of this enzyme also eliminates five more compounds, namely 5(S)-HPETE (C05356), 5-HETE (C04805), LTA4 (C00909) and 20-OH-LTB4 (C04853). These compounds can be considered as damage in our model.

Running double iterative optimization with LTB4, LTC4, LTD4 and LTE4 as the target compounds finds LTA4H (E3.3.2.6) and LTC4 synthase (E4.4.1.20) as the optimal enzyme set. The inhibition of these enzymes eliminates only one non-target compound, 20-OH-LTB4 (C04853). Double iterative optimization potentially finds a better solution in this experiment than the existing drug as the same compound is eliminated by the existing drug in addition to four other compounds. Indeed, recent research supports our model since the anti-inflammatory effect of the levels of LTA4H [81] and LTC4 [101] has been observed.

Rasagiline (D02562). This is an antiparkinsonian drug. It inhibits amine oxidase (E1.4.3.4) which appears in several metabolic networks. In the histidine metabolism network (hsa00340), the removal of amine oxidase eliminates the compounds Methylimidazole acetaldehyde (C05827) and Methylimidazoleacetic acid (C05828) according to our graph model. Levels of

pros-methylimidazoleacetic acid correlate with the severity of Parkinson's disease in patients [5, 76]. This demonstrates that our model can predict the intended target well. When double iterative optimization is run on the same network with methylimidazoleacetic acid and methylimidazole acetaldehyde as the target compounds, it finds amine oxidase as the optimal target. This implies that Rasagiline is targeting the optimal enzyme according to our model.

For Ozagrel (D01683) and Erythromycin acistrate (D02523), running double iterative optimization finds the same target enzyme as the actual drug (details omitted).

2.3.2 Evaluation of the Iteration Phase

Evaluation of accuracy: In order to evaluate the accuracy of the Iteration Phase, we compared the damage value of the best result found at this phase with that of the exhaustive algorithm. Note that the exhaustive algorithm guarantees to find the optimal result since it considers all possibilities. Table 2-3 shows the results. We present the results only up to 32-enzyme networks (i.e., $N_{32}$), since the exhaustive search algorithm took longer than one day to finish for larger networks. We can see that the damage values of our method exactly match the damage values of the exhaustive search for all the networks except $N_{24}$. For $N_{24}$, the average damage differs from that of the exhaustive solution by only 0.02%. This shows that the Iteration Phase is a good approximation of the exhaustive search algorithm which computes an optimal solution. The slight deviation in damage is the tradeoff for achieving the scalability of the Iteration Phase described next.

Evaluation of scalability: Figure 2-2(a) plots the average execution time of our Iteration Phase for increasing sizes of metabolic networks. The running time increases slowly with the network size. As the number of enzymes increases from 8 to 537, the running time increases from roughly 1 to 10 seconds. The largest network, $N_{537}$, consists of 537 enzymes, and hence, an exhaustive evaluation inspects $2^{537} - 1$ combinations, which is computationally infeasible. Thus, our results show that the Iteration Phase scales well for networks of increasing sizes. This

property makes our method an important tool for identifying the right enzyme combination for eliminating target compounds, especially for those networks for which an exhaustive search is not feasible.

Figure 2-2(b) shows the plot of the average number of iterations for increasing sizes of metabolic networks. The Iteration Phase reaches a steady state within 10 iterations in all cases. The various parameters (see Table 2-2) that influence the number of iterations are the number of enzymes, compounds, reactions and especially the number of interactions in the network (represented by edges in the network graph). A larger number of interactions increases the number of iterations considerably, as can be seen for networks $N_{22}$, $N_{48}$, $N_{96}$ and $N_{537}$, where the number of iterations is greater than 5. This shows that, in addition to the number of enzymes, the number of compounds and reactions in the network and their interactions also play a significant role in determining the number of iterations. Our results show that the Iteration Phase can reliably reach a steady state and terminate, for networks as large as the entire metabolic network of E. coli.

2.3.3 Evaluation of the Fusion Phase

This section evaluates how the accuracy and the running time of the proposed method change when the Fusion Phase is run after the Iteration Phase. We use the entire network, $N_{537}$, in this experiment. We ran 83 queries, where each query contains one, two, or four randomly chosen compounds as the target compounds. We limited the number of iterations of the Fusion Phase to six since we did not observe any improvement in accuracy after that point in our experiments. The number of enzymes taken from the Iteration Phase is no more than $min = 25$. We use $max = 32$ as the maximum number of enzymes to be searched by the exhaustive search.

Evaluation of accuracy. In 16 out of 83 queries, the Fusion Phase improved the result found by the Iteration Phase. The results remained unchanged for the remaining queries. Table 2-4 lists the queries for which the results have improved. Here, the identifier for a compound is the unique identifier assigned to that compound in the KEGG database. According to the definition

of TIE, the smaller the damage, the better the result. For example, when the target compounds were C00048 (Glyoxylate) and C00010 (CoA), the Iteration Phase could find an enzyme subset whose damage value is 20, while the Fusion Phase improved this solution to the damage value of 16. In addition, if the target compounds are C00255 (Riboflavin) and C00536 (Triphosphate), the enzyme subset that the Fusion Phase gets is {E2.5.1.9, E2.7.1.26, E2.7.4.1}. According to our graph model, the removal of these three enzymes eliminates six non-target compounds, C00013, C00009, C04332, C04732, C00061 and C00016. However, the Iteration Phase finds an enzyme subset {E2.5.1.9, E2.7.4.1, E3.1.3.2} which eliminates one more non-target compound, namely C00147 (Adenine), besides the above six compounds. Adenine is a purine with a variety of roles in biochemistry including cellular respiration. Thus, if we eliminate Adenine, some important biological functions would be influenced. Therefore, the Fusion Phase could find better results than the Iteration Phase. On average, the Fusion Phase improved the accuracy of the Iteration Phase by 16/83 = 19.3%, i.e., it found a solution with lower damage in 16 of the 83 queries.

Evaluation of the running time. The average execution time of the Fusion Phase is 432 seconds, which is longer than that of the Iteration Phase (less than 10 seconds). This is because the Fusion Phase uses the Iteration Phase as a subroutine and performs the exhaustive search on a small set of enzymes. However, 432 seconds is reasonably small for biological computation. Biologists focus more on accurate results than on execution time. This means that it is useful and beneficial to employ the Fusion Phase as the tool to address the TIE problem on large sets of metabolic networks.

2.4 Discussion

Efficient computational strategies are needed to identify the enzymes (i.e., drug targets) whose inhibition will achieve the required effect of eliminating a given set of target compounds while incurring minimal side-effects. An exhaustive evaluation of all possible enzyme combinations to find the optimal subset is computationally infeasible for large metabolic networks. We

proposed a double iterative optimization method with two phases, the Iteration Phase and the Fusion Phase. The Iteration Phase is based on the intuition that a good solution can be found by tracing backward from the target compounds. It initially evaluates the immediate precursors of the target compounds and iteratively moves backwards to identify the enzymes whose inhibition incurs less side-effects. This phase converges to a sub-optimal solution after a small number of iterations. The Fusion Phase takes a set of sub-optimal results based on the Iteration Phase as the potential solution set. Then, we extend the potential solution set by randomly inserting some remaining enzymes and apply an exhaustive search on it recursively until a predefined number of iterations are performed. In our experiments on the E. coli metabolic network, the accuracy of a solution computed by the Iteration Phase deviated from that found by an exhaustive search only by 0.02%. Our Iteration Phase is highly scalable. It solved the problem for even the entire metabolic network of E. coli in less than 10 seconds. The Fusion Phase improved the damage value of the Iteration Phase by 19.3% spending only 432 seconds.

The problem addressed in this chapter introduces a number of exciting computational problems. First, the damage model can be improved to a continuous function where the quantity of each target and non-target compound is taken into consideration after inhibition of the target enzymes. Second, the goal of the drug can include increasing some of the target compounds. Computable objective functions representing these problems need to be defined.

Figure 2-1. A graph constructed for a metabolic network with four reactions $R_1, \ldots, R_4$, three enzymes $E_1$, $E_2$ and $E_3$, and five compounds $C_1, \ldots, C_5$. $C_1$ is the target compound.

Figure 2-2. Evaluation of the Iteration Phase. (a) Average execution time in milliseconds. (b) Average number of iterations.

Table 2-1. Iterative steps for Figure 2-1.

          I_0                                  I_1                                       I_2
V_E       [3, 0, 0]                            [3, 0, 0]                                 [3, 0, 0]
V_R       [3, 3, 0, 0]                         [1, 3, 0, 0]                              [1, 3, 0, 0]
V_C       [3, 3, 3, 3, 1]                      [1, 3, 3, 3, 1]                           [1, 3, 3, 3, 1]
E_R       [{E1}, {E1}, {E2}, {E3}]             [{E2,E3}, {E1}, {E2}, {E3}]               [{E2,E3}, {E1}, {E2}, {E3}]
E_C       [{E1}, {E1}, {E1}, {E1}, {E2,E3}]    [{E2,E3}, {E1}, {E1}, {E1}, {E2,E3}]      [{E2,E3}, {E1}, {E1}, {E1}, {E2,E3}]
flag_R    [1, 1, 0, 0]                         [1, 1, 0, 0]                              [1, 1, 0, 0]
flag_C    [1, 1, 1, 1, 1]                      [1, 1, 1, 1, 1]                           [1, 1, 1, 1, 1]
E'        [E1, E2, E3]                         [E1, E2, E3]                              [E1, E2, E3]

Table 2-2. Metabolic networks used in our experiments.

Id      Metabolic Network                 C      R      Ed
N08     Polyketide biosynthesis           11     11     33
N13     Xenobiotics biodegradation        47     58     187
N14     Citrate or TCA cycle              21     35     125
N17     Galactose                         38     50     172
N20     Pentose phosphate                 26     37     129
N22     Glycan Biosynthesis               54     51     171
N24     Glycerolipid                      32     49     160
N28     Glycine, serine and threonine     36     46     151
N32     Pyruvate                          21     51     163
N42     Other amino acid                  69     63     208
N48     Lipid                             134    196    654
N52     Purine                            67     128    404
N59     Energy                            72     82     268
N71     Nucleotide                        102    217    684
N96     Vitamins and Cofactors            145    175    550
N170    Amino acid                        54     378    1210
N180    Carbohydrate                      247    501    1659
N537    Entire Network                    988    1790   5833

Table 2-3. Comparison of the average damage values of solutions determined by the Iteration Phase versus that determined by the exhaustive search algorithm.

Pathway Id           N14     N17     N20     N24     N28     N32
Iterative Damage     2.51    8.73    1.63    3.39    1.47    0.59
Exhaustive Damage    2.51    8.73    1.63    3.17    1.47    0.59

Table 2-4. Comparison of the damage values found by the Iteration Phase and the Fusion Phase focusing on the queries in which the Fusion Phase improved the results of the Iteration Phase.

                                      Damage
Target                                Iteration Phase    Fusion Phase
C00033, C00052                        22                 21
C00051, C00334                        25                 24
C00149, C00010                        18                 14
C00255, C00536                        7                  6
C00048, C00010                        20                 16
C00221, C00334                        22                 21
C00075, C00058                        9                  8
C00255, C00058                        8                  7
C04184, C00013, C00010, C00052        28                 24
C00624, C00075, C00080, C01024        15                 14
C00158, C00227, C00334, C00311        21                 20
C00334, C00147, C00005, C00267        35                 32
C00647, C00624, C04184, C00334        26                 25
C00010, C00900, C00052, C05993        26                 22
C00018, C00024, C01228, C00044        51                 48
C00627, C00147, C00005, C00267        23                 19
AVERAGE                               22.25              20.06
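
The summary figures attached to Table 2-4 can be re-derived from its rows with a few lines of arithmetic. The sketch below is only a cross-check: the function name `summarize` and the dictionary keys are hypothetical, and the "relative reduction" it computes is a derived quantity that the chapter itself does not report.

```python
# Damage values copied row by row from Table 2-4.
iteration = [22, 25, 18, 7, 20, 22, 9, 8, 28, 15, 21, 35, 26, 26, 51, 23]
fusion = [21, 24, 14, 6, 16, 21, 8, 7, 24, 14, 20, 32, 25, 22, 48, 19]


def summarize(before, after, total_queries):
    """Average damage before/after and the share of queries that improved."""
    n = len(before)
    mean_before = sum(before) / n
    mean_after = sum(after) / n
    return {
        "mean_before": mean_before,                 # AVERAGE row, Iteration Phase
        "mean_after": mean_after,                   # AVERAGE row, Fusion Phase
        "mean_reduction": mean_before - mean_after,
        # Derived here, not stated in the chapter: reduction relative to the
        # Iteration Phase damage, over the improved queries only.
        "relative_reduction": (sum(before) - sum(after)) / sum(before),
        # Share of all queries in which the Fusion Phase found a better result.
        "fraction_improved": n / total_queries,
    }


stats = summarize(iteration, fusion, total_queries=83)
```

The two means agree with the AVERAGE row (22.25 and 20.0625, the latter rounded to 20.06 in the table), and `fraction_improved` is the 16/83 ratio quoted in Section 2.3.3.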

CHAPTER 3
ENZYMATIC TARGET IDENTIFICATION USING LINEAR MODEL FOR MULTIPLE ENZYMES CATALYZE THE SAME REACTION

Although many articles study the enzyme knockout strategy, few of them consider the enzyme association. This chapter presents enzyme knockout strategies on FBA which can deal with the situation that a reaction is catalyzed by multiple enzymes. Considering the enzyme association, that is, AND, OR or a combination of them, we provide a binary method and a continuous method for each association. OptKnock is a popular framework for the enzyme knockout strategy. We prove that finding the enzyme knockout strategy by the OptKnock framework is NP-hard. This is consistent with the experimental results that when the network size increases, the running time of the OptKnock framework increases exponentially. Our experiment section shows that the enzyme association strongly influences the performance of the linear programming method. We observe that our binary method runs much faster than the continuous method. For the pathways of H. sapiens from KEGG, our binary method runs in less than one second for the whole metabolism. Therefore, our binary method is useful for biological applications.

3.1 Motivation and Problem Definition

Enzymes play a significant role in metabolism. Enzymes catalyze the chemical reactions. Reactions transform a set of substrates into products. In fact, multiple enzymes may carry out the same reaction. For example, substitute enzymes denote that only one of these enzymes needs to be present for the reaction to occur. There exist OR associations among these enzymes. Collaborate enzymes denote that all these enzymes have to be expressed for the reaction to occur. There exist AND associations among these enzymes. Moreover, a reaction may be catalyzed by a set of enzymes whose association is a combination of OR and AND. Reed et al. [82] presented examples to describe the enzyme and reaction association, which are shown in Figure 3-1. For the D-Xylose ABC Transporter, enzymes XylF, XylG and XylH catalyze the reaction XYLabc

with AND association, meaning that all have to be expressed for the reaction to occur. For Glyceraldehyde 3-Phosphate Dehydrogenase, enzymes GapC and GapA catalyze the reaction GAPD with OR association, meaning that only one of these enzymes needs to be present for the reaction to occur.

Increasing or reducing the production of products in a metabolism is essential for many industrial applications such as cosmetics and the food industry. One candidate method is to knock out a set of enzymes. When a set of enzymes is deleted, some reactions may not be active. As a result, the final products may be influenced.

In order to find the enzyme knockout strategy in silico, it is necessary to build computational models for metabolism. The Boolean network model is a simple way to describe the metabolism [94-96]. Each node in this model has only two states, present or not. Flux balance analysis (FBA) is a popular model for metabolism [7, 29, 48] which is based on linear programming. S-systems [86, 107] and Generalized Mass Action (GMA) [73, 107] models are another way to describe the metabolism, based on the power-function representation.

Currently, the existing methods of finding the enzyme knockout strategy do not focus on multiple enzymes. Sridhar et al. and Song et al. considered algorithms for finding the enzyme knockout strategy on Boolean network models [94-96]. They used branch-and-bound and iterative techniques to obtain the enzyme knockout strategy. However, their methods can only be applied to the situation that each reaction is catalyzed by a single enzyme. OptKnock is a popular method to find the enzyme knockout strategy based on the FBA model [9]. In fact, OptKnock provides a bilevel programming strategy. The first level optimizes the bioengineering objective, that is, which enzymes should be knocked out. The second level optimizes the cellular objective, that is, how to distribute the flux in order to obtain the maximal chemical production targets. Similarly, OptKnock cannot deal with the situation that a reaction is catalyzed by multiple enzymes rather than only one. For S-systems [86, 107] and Generalized Mass Action (GMA) [73, 107]


models, no corresponding enzyme knockout strategy has been presented so far. Therefore, it is necessary to study the enzyme knockout strategy when a reaction is catalyzed by multiple enzymes rather than only one.

Contribution: In this chapter, we first prove that finding the enzyme knockout strategy by the OptKnock framework is NP-hard. This is consistent with the experimental results: when the network size increases, the running time of the OptKnock framework increases exponentially.

Second, we overcome the limitation of OptKnock. We present an enzyme knockout strategy on FBA that can deal with the situation in which a reaction is catalyzed by multiple enzymes. Considering the enzyme association, that is, AND, OR or a combination of them, we provide a binary method and a continuous method for each association. For the binary method, we introduce new binary variables to deal with the association. For the continuous method, we do not add new binary variables but real variables for the association.

Our experiments using the synthetic and biological datasets validate that the enzyme topology strongly influences the performance of the enzyme knockout strategy. A topology in which a reaction may be catalyzed by multiple enzymes costs much more time than one in which each reaction is catalyzed by a single enzyme. We observe that our binary method runs 60% to 1100% faster than the continuous method.

The rest of this chapter is organized as follows. Section 3.2 proves that finding the enzyme knockout strategy by the OptKnock framework is NP-hard. Section 3.3 describes the proposed methods for the enzyme topology in which a reaction may be catalyzed by multiple enzymes. Section 3.4 discusses experimental results. Section 3.5 concludes this chapter.

3.2 Computational Complexity

OptKnock is a bilevel framework for finding the enzyme knockout strategy. It considers maximizing a bioengineering objective given a maximized cellular objective subject to constraints on the fluxes for a steady state metabolic network comprising a set $N = \{1, \ldots, N\}$ of


metabolites, and a set $M = \{1, \ldots, M\}$ of metabolic reactions fueled by a glucose substrate. This formulation is provided by Burgard et al. [9].

The decision variables of OptKnock are given as follows:
$v_j$: the flux of reaction $j$;
$y_j$: binary variable which is equal to 0 if an enzyme is knocked out, and 1 otherwise;
$v_{chemical}$: yield corresponding to chemical;
$v_{biomass}$: yield corresponding to biomass;
$v_{pts}$: the uptake of glucose through the phosphotransferase system;
$v_{glk}$: glucokinase;
$v_{glk\_uptake}$: the basis glucose uptake scenario;
$v_{atp}$: flux corresponding to ATP;
$v_{atp\_main}$: the non-growth-associated ATP maintenance requirement.

Other relevant parameters used in this problem are:
$S_{ij}$: stoichiometric matrix;
$target_{biomass}$: a minimum level of biomass production;
$\mathrm{min}_j$: minimum possible flow corresponding to flux $j$;
$\mathrm{max}_j$: maximum possible flow corresponding to flux $j$;
$h_j$: cost of blocking enzyme $j$ corresponding to reaction $j$;
$w_j$: weight corresponding to the value of flux $j$.

$\mathrm{min}_j$ and $\mathrm{max}_j$ are identified by minimizing and maximizing every reaction flux subject to the constraints from the OptKnock framework given below.

With these variables and parameters, the integer programming formulation for OptKnock with costs on the enzymes follows.

[OptKnock]
\[ \text{Maximize} \quad \sum_{j \in M} w_j v_j - \sum_{j \in M} h_j y_j \]


subject to:
\[ \sum_{j=1}^{M} S_{ij} v_j = 0, \quad \forall i \in N \]
\[ v_{pts} + v_{glk} = v_{glk\_uptake} \]
\[ v_{atp} \ge v_{atp\_main} \]
\[ v_{biomass} = OPT_{Primal} \]
\[ \mathrm{min}_j \, y_j \le v_j \le \mathrm{max}_j \, y_j, \quad \forall j \in M \]
\[ \sum_{j \in M} y_j \le K \]
\[ y_j \in \{0, 1\}, \quad \forall j \in M. \]

To prove that finding the enzyme knockout strategy by OptKnock with fixed costs is NP-hard, we show that the uncapacitated fixed charge network flow problem, which is NP-hard, is a special case of OptKnock with fixed costs.

Let $G = (V, A)$ be a directed graph, where $V$ is the set of nodes and $A$ is the set of arcs. Each arc $(i, j)$ is associated with two costs: a fixed cost $f_{ij}$ for selecting $(i, j)$ and a variable cost $c_{ij}$ for routing flow on $(i, j)$. Then the uncapacitated fixed charge network flow problem (UFNF) is to find a set of arcs that allows a supply node to send resources to a set of demand nodes, such that the sum of fixed and variable costs is minimized. UFNF can be formulated by the following mixed-integer program:
\[ \min \sum_{(i,j) \in A} c_{ij} X_{ij} + \sum_{(i,j) \in F} f_{ij} Z_{ij} \]
subject to:
\[ \sum_{(i,k) \in A} X_{ik} - \sum_{(k,j) \in A} X_{kj} =
\begin{cases}
-\sum_{t \in T} d_t & \text{if } k = s; \\
d_k & \text{if } k \in T; \\
0 & \text{if } k \in V \setminus \{T \cup s\}
\end{cases} \]


\[ X_{ij} \le \mu Z_{ij} \quad \forall (i,j) \in F \]
\[ X_{ij} \ge 0 \quad \forall (i,j) \in A \]
\[ Z_{ij} \in \{0, 1\} \quad \forall (i,j) \in F. \]

The objective function minimizes the sum of the fixed costs associated with selecting arc $(i, j)$ and the variable costs for sending flow through $(i, j)$. The first set of constraints are classical flow conservation constraints, while the constraints $X_{ij} \le \mu Z_{ij}$ ensure that there cannot be any flow if $Z_{ij}$ is 0 and that the maximum flow can be at most $\mu$ if $Z_{ij}$ is 1. The remaining constraints are the non-negativity and binary restrictions.

THEOREM: Finding the enzyme knockout strategy by OptKnock is NP-hard.

PROOF: Let us rewrite OptKnock by considering only the basic constraints, namely the flux balance, flux bound and binary constraints; i.e., by ignoring the other constraints we investigate a special case of OptKnock with costs on the enzymes in the objective function. In our metabolic pathway network graph $G'$, we can define each metabolite $i \in N$ as a node, and each reaction $k \in M$ as an arc using metabolite $i$ to produce metabolite $j$. Therefore in our graph $G'$ we have $M$ arcs, one for each reaction, and $N$ nodes, one for each metabolite. Define a new variable $x_{ij}$ as the flux corresponding to reaction $k$, which uses metabolite $i$ to produce metabolite $j$. Let $S$ be the set of source nodes (external metabolites that are imposed on the pathway), and $T$ be the set of sink nodes (metabolites that are not going to be used within the pathway after being produced). We also specialize OptKnock further by considering that the flux for the source and sink metabolites in the metabolic pathway is given as a constant $b_i$. Thus we can write the flux balance constraint as
\[ \sum_{(i,j) \in M} S_{ij} x_{ij} =
\begin{cases}
-b_i & \text{if } i \in S; \\
b_i & \text{if } i \in T; \\
0 & \text{if } i \in V \setminus \{T \cup s\}
\end{cases} \]


for each $i \in N$. Then we redefine the stoichiometric matrix $S_{ij}$, say $S'_{ij}$, as follows:
\[ S'_{ij} =
\begin{cases}
1 & \text{if } (i,j) \in M; \\
-1 & \text{if } (j,i) \in M; \\
0 & \text{otherwise.}
\end{cases} \]
Note that $S'_{ij}$ has entries $1$, $-1$, and $0$, and thus is a special case of $S_{ij}$. By using the stoichiometric matrix $S'_{ij}$, the flux balance constraint can be written as
\[ \sum_{(i,k) \in M} x_{ik} - \sum_{(k,j) \in M} x_{kj} =
\begin{cases}
-b_i & \text{if } i \in S; \\
b_i & \text{if } i \in T; \\
0 & \text{if } i \in V \setminus \{T \cup s\}
\end{cases} \]
for each $i \in N$. Define a variable $z_{ij}$ for each variable $y_k$, and costs $c_{ij}$ and $f_{ij}$ such that $c_{ij} = -w_k$ and $f_{ij} = h_k$, and a constant $\mu$ as $\mu = \mathrm{max}_k$, and set $\mathrm{min}_k = 0$ for each reaction $k \in M$, which is defined by the arc $(i, j)$. Then, the flux bound constraint can be rewritten as
\[ 0 \le x_{ij} \le \mu z_{ij} \quad \forall (i,j) \in M. \]
Then, the following formulation is a special case of OptKnock:
\[ \min \sum_{(i,j) \in M} c_{ij} x_{ij} + \sum_{(i,j) \in M} f_{ij} z_{ij} \]


subject to:
\[ \sum_{(i,k) \in M} x_{ik} - \sum_{(k,j) \in M} x_{kj} =
\begin{cases}
-b_i & \text{if } i \in S; \\
b_i & \text{if } i \in T; \\
0 & \text{if } i \in V \setminus \{T \cup s\}
\end{cases}
\quad \forall i \in N \]
\[ x_{ij} \le \mu z_{ij} \quad \forall (i,j) \in M \]
\[ x_{ij} \ge 0 \quad \forall (i,j) \in M \]
\[ z_{ij} \in \{0, 1\} \quad \forall (i,j) \in M. \]

The special case of OptKnock given above is a UFNF. Since UFNF is NP-hard [67], OptKnock with fixed charges is also NP-hard.

3.3 Methods

Based on the stoichiometric model, the mass balance constraints are formulated as
\[ S \cdot v = 0, \]
where $S$ is the stoichiometric matrix and $v$ is a vector of reaction flux rates. The remaining constraints are formulated based on the enzyme type.

3.3.1 Single Enzyme

Each reaction is catalyzed by a single enzyme. In this case there might be a one-to-one correspondence between enzymes and reactions, or one enzyme might catalyze several reactions. In the OptKnock formulation, it is assumed that each reaction is catalyzed by only one enzyme, and we enforce bound restrictions on each flux as a function of the enzymes; i.e., if the enzyme corresponding to a reaction is knocked out, then that reaction cannot carry any flux. On the other hand, lower bounds enforce that if there is a flux, then there should be an enzyme catalyzing the reaction and forcing a lower bound on the amount of flux produced by the related reaction. Note that in OptKnock the enzymes are independent of each other.


The constraints enforcing bounds on the flux are given as follows:
\[ \mathrm{min}_i \, T_i \le v_i \le \mathrm{max}_i \, T_i, \]
where $T_i$ is the binary variable that takes value 0 if the enzyme is knocked out, and 1 otherwise, so that a knocked-out enzyme forces $v_i = 0$.

3.3.2 Multiple Enzymes

In the presence of multiple enzymes, one reaction might be catalyzed by one or more enzymes. In the following subsections, we investigate the correlation between enzymes catalyzing the reactions. The enzymes might be substitute, collaborative, both substitute and collaborative, or independent of each other. We define a function $F_i$ which represents the relationship between the enzymes. Then we can rewrite the bounding constraints on the fluxes as follows:
\[ \mathrm{min}_i \, F_i(T_i) \le v_i \le \mathrm{max}_i \, F_i(T_i). \]

3.3.2.1 Substitute enzyme

Let $S_{v_i}$ denote the set of enzymes corresponding to reaction $i$, which produces flux $v_i$, and let an enzyme $E_{ij}$ belong to the set $S_{v_i}$, where $j$ is the index of the enzyme among all enzymes catalyzing reaction $i$, and $|S_{v_i}| = n$. When reaction $i$ is catalyzed by a set of substitute enzymes in $S_{v_i}$, only one of these enzymes is necessary for the reaction to occur. Let $F_i$ represent the relationship between the enzymes that can substitute each other. Then we can write $F_i$ as follows:
\[ F_i = \max_{E_{ij} \in S_{v_i}} \{E_{ij}\} = \operatorname{OR}_{E_{ij} \in S_{v_i}} E_{ij}. \]
Then for the multiple enzyme case in which enzymes can substitute each other, we rewrite the bounding constraint as follows:
\[ \mathrm{min}_i \max_{E_{ij} \in S_{v_i}} \{E_{ij}\} \le v_i \le \mathrm{max}_i \max_{E_{ij} \in S_{v_i}} \{E_{ij}\}. \]


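The constraint above imposes nonlinearity on our integer program. Here we can resolve the nonlinearity by performing a variable transformation and obtain a completely linear integer program, which can be solved by commercial solvers such as CPLEX and AMPL. Our linearization technique considers lower and upper bounds separately.

We linearize the lower bounding constraints given by the inequality $\mathrm{min}_i \max_{E_{ij} \in S_{v_i}} \{E_{ij}\} \le v_i$ as follows:
\[ \mathrm{min}_i \, E_{ij} \le v_i \quad \forall E_{ij} \in S_{v_i}. \]

Linearization of the upper bounding constraints given by the inequality $v_i \le \mathrm{max}_i \max_{E_{ij} \in S_{v_i}} \{E_{ij}\}$ is more complicated than that of the lower bound. For the linearization, we consider two methods. In the first method we define $F_i$ as binary variables. Denoting this method as binary, we provide the following linear constraints representing the nonlinear constraint $v_i \le \mathrm{max}_i \max_{E_{ij} \in S_{v_i}} \{E_{ij}\}$:
\[ F_i \ge \frac{\sum_{j=1}^{n} E_{ij}}{n} \quad \forall i \qquad \text{(a)} \]
\[ F_i \le \sum_{j=1}^{n} E_{ij} \quad \forall i \qquad \text{(b)} \]
\[ F_i \in \{0, 1\} \quad \forall i \qquad \text{(c)} \]

In the second method, we define $F_i$ as continuous variables, and denote this approach as the continuous method. Then the upper bounding constraint $v_i \le \mathrm{max}_i \max_{E_{ij} \in S_{v_i}} \{E_{ij}\}$ is represented by the following linear constraints:
\[ F_i \le \sum_{j=1}^{n} E_{ij} \quad \forall i \qquad \text{(a)} \]
\[ F_i \le 1 \quad \forall i \qquad \text{(b)} \]
\[ F_i \ge E_{ij} \quad \forall i, \; \forall E_{ij} \in S_{v_i} \qquad \text{(c)} \]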
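The binary method for substitute enzymes can be checked by brute force on small cases. The sketch below reads the binary-method constraints as $F_i \ge \frac{1}{n}\sum_j E_{ij}$, $F_i \le \sum_j E_{ij}$ and $F_i \in \{0, 1\}$ (an assumed reading of the relation signs), enumerates every 0/1 assignment of $n$ enzymes, and confirms that the only feasible $F_i$ is the OR of the enzymes. The function names are illustrative and are not taken from the dissertation.

```python
from itertools import product


def feasible_or_values(enzymes):
    """Return the values of F in {0, 1} allowed by the binary-method constraints.

    Assumed reading of the constraints (illustrative, not the author's code):
      (a) F >= sum(E) / n   (written as F * n >= sum(E) to stay in integers)
      (b) F <= sum(E)
      (c) F in {0, 1}
    """
    n = len(enzymes)
    total = sum(enzymes)
    return [f for f in (0, 1) if f * n >= total and f <= total]


def check_or_linearization(n):
    """True if the constraints force F = max(E) for every 0/1 vector of length n."""
    return all(
        feasible_or_values(e) == [max(e)]
        for e in product((0, 1), repeat=n)
    )


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, check_or_linearization(n))
```

Constraint (a) rules out $F_i = 0$ as soon as one enzyme is present, and constraint (b) rules out $F_i = 1$ when none is, so the feasible set always collapses to a single value.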


3.3.2.2 Collaborate enzyme

In the collaboration case, a reaction $i$ is catalyzed by a set of collaborate enzymes in $S_{v_i}$, and all of these enzymes are necessary for the reaction to occur. Let $C_i$ represent the relationship between the enzymes that collaborate. Then we can write $C_i$ as follows:
\[ C_i = \min_{E_{ij} \in S_{v_i}} \{E_{ij}\} = \operatorname{AND}_{E_{ij} \in S_{v_i}} E_{ij}. \]
For the collaborate enzyme case, we can rewrite the bounding constraint as follows:
\[ \mathrm{min}_i \min_{E_{ij} \in S_{v_i}} \{E_{ij}\} \le v_i \le \mathrm{max}_i \min_{E_{ij} \in S_{v_i}} \{E_{ij}\}. \]
As in the substitute enzyme case, this constraint introduces nonlinearity into our problem. Here we also consider a linearization technique in which we define new variables and obtain a linear integer program. As in the previous case, we consider lower and upper bounds separately.

We linearize the upper bounding constraints given by the inequality $v_i \le \mathrm{max}_i \min_{E_{ij} \in S_{v_i}} \{E_{ij}\}$ as follows:
\[ v_i \le \mathrm{max}_i \, E_{ij} \quad \forall E_{ij} \in S_{v_i}. \]

Now we consider the linearization of the lower bounding constraints given by the inequality $\mathrm{min}_i \min_{E_{ij} \in S_{v_i}} \{E_{ij}\} \le v_i$. Analogous to the substitute enzyme case, we consider both binary and continuous methods, where we define $F_i$ as a binary variable for the first and a continuous variable for the second method. For the binary method, we provide the following linear constraints representing the nonlinear constraint $\mathrm{min}_i \min_{E_{ij} \in S_{v_i}} \{E_{ij}\} \le v_i$:
\[ F_i \le \frac{\sum_{j=1}^{n} E_{ij}}{n} \quad \forall i \qquad \text{(a)} \]
\[ F_i \ge \sum_{j=1}^{n} E_{ij} - (n - 1) \quad \forall i \qquad \text{(b)} \]
\[ F_i \in \{0, 1\} \quad \forall i \qquad \text{(c)} \]


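In the continuous method the lower bounding constraint $\mathrm{min}_i \min_{E_{ij} \in S_{v_i}} \{E_{ij}\} \le v_i$ is represented by the following linear constraints:
\[ F_i \le E_{ij} \quad \forall i, \; \forall E_{ij} \in S_{v_i} \qquad \text{(a)} \]
\[ F_i \ge \sum_{j=1}^{n} E_{ij} - (n - 1) \quad \forall i \qquad \text{(b)} \]
\[ F_i \ge 0 \quad \forall i \qquad \text{(c)} \]

3.3.2.3 Enzyme complex

In this case, the reaction is associated with a Boolean equation representing its dependency on the presence of one or multiple enzymes. For example, $F_i = (E_{i1} \text{ AND } E_{i2}) \text{ OR } (E_{i3} \text{ AND } E_{i4})$. In fact, all Boolean equations can be transformed into the following form:
\[ F = (E_{11} \text{ AND } E_{12} \text{ AND } \ldots \text{ AND } E_{1 i_1}) \text{ OR } (E_{21} \text{ AND } E_{22} \text{ AND } \ldots \text{ AND } E_{2 i_2}) \text{ OR } \ldots \text{ OR } (E_{n1} \text{ AND } E_{n2} \text{ AND } \ldots \text{ AND } E_{n i_n}). \]
Defining a new variable $Z_j$ corresponding to each clause of the OR, we rewrite $F$ as follows:
\[ F = Z_1 \text{ OR } Z_2 \text{ OR } \ldots \text{ OR } Z_n, \]
where
\[ Z_j = E_{j1} \text{ AND } E_{j2} \text{ AND } \ldots \text{ AND } E_{j i_j}. \]
We linearize $F = Z_1 \text{ OR } Z_2 \text{ OR } \ldots \text{ OR } Z_n$ and $Z_j = E_{j1} \text{ AND } E_{j2} \text{ AND } \ldots \text{ AND } E_{j i_j}$ by using the methods described in Section 3.3.2.1 and Section 3.3.2.2, respectively.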
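The two-stage linearization of an enzyme complex can be illustrated with a small sketch. It assumes the continuous-method constraints read $Z \le E_j$, $Z \ge \sum_j E_j - (n - 1)$, $Z \ge 0$ for an AND clause and $F \le \sum_j Z_j$, $F \le 1$, $F \ge Z_j$ for the OR, and it assumes the example rule is grouped as $(E_{i1} \text{ AND } E_{i2}) \text{ OR } (E_{i3} \text{ AND } E_{i4})$. For 0/1 enzyme values, each set of constraints leaves an interval for the new variable, and the code checks that this interval collapses to the Boolean value. All names are hypothetical.

```python
from itertools import product


def and_interval(values):
    """Interval [lower, upper] left for Z by the continuous AND constraints:
    Z <= E_j for all j, Z >= sum(E) - (n - 1), Z >= 0."""
    n = len(values)
    lower = max(0, sum(values) - (n - 1))
    upper = min(values)
    return lower, upper


def or_interval(values):
    """Interval [lower, upper] left for F by the continuous OR constraints:
    F <= sum(Z), F <= 1, F >= Z_j for all j."""
    lower = max(values)
    upper = min(1, sum(values))
    return lower, upper


def dnf_value(clauses, present):
    """Evaluate F = OR over clauses of (AND over enzymes) via the intervals.

    clauses: list of lists of enzyme names (hypothetical labels).
    present: dict mapping each enzyme name to 0 (knocked out) or 1 (present).
    """
    z_values = []
    for clause in clauses:
        lower, upper = and_interval([present[name] for name in clause])
        if lower != upper:
            raise ValueError("AND constraints do not pin a single value")
        z_values.append(lower)
    lower, upper = or_interval(z_values)
    if lower != upper:
        raise ValueError("OR constraints do not pin a single value")
    return lower


if __name__ == "__main__":
    rule = [["E1", "E2"], ["E3", "E4"]]
    for bits in product((0, 1), repeat=4):
        state = dict(zip(["E1", "E2", "E3", "E4"], bits))
        print(bits, dnf_value(rule, state))
```

Because the enzyme variables are binary, no extra binary variable is needed for $Z_j$ or $F$ in this sketch; the linear bounds alone determine them.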


3.4 Results

In this section, we evaluate our algorithms on synthetic and biological datasets. We evaluate their performance quantitatively by using the running time in seconds.

3.4.1 Datasets

Synthetic data: We build our synthetic data to evaluate the performance of our methods in the following way. We randomly generate ten networks for each test case. For example, for test case (25, 100), the number of compounds is 25 and the number of reactions is 100. We create these networks so that they satisfy the power law distribution. That is, the probability of the number of reactions each compound joins decreases exponentially with the number of reactions. Similarly, the probability of the number of enzymes catalyzing the same reaction decreases exponentially with the number of enzymes. We evaluate the performance of our methods by running time.

Biological data: We use the metabolic pathway information of H. sapiens from KEGG as the input dataset. We combine all the metabolic pathways as the whole metabolism. We apply our binary method to this metabolism. We evaluate the performance of our binary method by the running time in seconds.

3.4.2 Results

In this section, we evaluate our methods based on linear programming on synthetic and biological networks. Section 3.4.2.1 describes the influence of the enzyme topology on the linear programming method. We compare our binary method with the continuous method on the synthetic networks in Section 3.4.2.2. Section 3.4.2.3 shows the biological application of our method.

3.4.2.1 Effect of single and multiple enzymes

In this section, we evaluate the influence of the enzyme topology on the linear programming performance. We create a set of synthetic networks as described in Section 3.4.1. These synthetic networks are separated into groups. For the first group, each reaction is catalyzed by only one


enzyme. For the second group, the topology of reactions and compounds is exactly the same as in the former group of networks. The only difference between these two groups is that 40% of the reactions are catalyzed by multiple enzymes in this new network. For multiple enzymes, the number of enzymes satisfies the power law distribution. That is, the probability of the number of enzymes catalyzing the same reaction decreases exponentially with the number of enzymes. There are two cases of enzyme association. One is the substitute case, in which there is an OR relationship among all these enzymes. The other is the collaborate case, in which there is an AND relationship among all these enzymes.

Figure 3-2 shows the running time on these two groups of networks for the binary method. The result shows that the multiple enzyme topology costs much more time than the single enzyme topology. Thus, the enzyme topology strongly influences the performance of the linear programming algorithm. The running time for multiple enzymes is 2 to 16 times that for a single enzyme. As the size of the network increases, the running time increases exponentially. Therefore, the method based on linear programming is not practical for large networks.

3.4.2.2 Evaluation of binary method and continuous method

Section 3.3 provides two methods for the multiple enzyme knockout problem. In this section, we evaluate the performance of these two methods, the binary method and the continuous method, by running time. Figure 3-3 shows the average running time of the binary method and the continuous method for the networks whose multiple enzymes are substitute ones. The binary method runs 60% to 1100% faster than the continuous method. Similarly, in Figure 3-4, all the multiple enzymes have the collaborate association. The binary method still runs much faster than the continuous method.

3.4.2.3 Biological application

We use the metabolic pathway information of H. sapiens from KEGG as the input dataset. We combine all the metabolic pathways as the whole metabolism. Table 3-1 shows the running


time in seconds for the whole metabolism. #E, #R and #C denote the number of enzymes, reactions and compounds respectively in the metabolism. `AND' means that the association of all the multiple enzymes is AND. `OR' means that the association of all the multiple enzymes is OR. From Table 3-1, our binary method runs in less than one second for the whole metabolism. Therefore, our binary method is useful for biological applications.

3.5 Discussion

Although many articles study the enzyme knockout strategy, few of them consider the enzyme association. This chapter presents enzyme knockout strategies on FBA that can deal with the situation in which a reaction is catalyzed by multiple enzymes. Considering the enzyme association, that is, AND, OR or a combination of them, we provide a binary method and a continuous method for each association.

OptKnock is a popular framework for the enzyme knockout strategy. We prove that finding the enzyme knockout strategy by the OptKnock framework is NP-hard. This is consistent with the experimental results: when the network size increases, the running time of the OptKnock framework increases exponentially.

Our experiments show that the enzyme association strongly influences the performance of the linear programming method. We observe that our binary method runs much faster than the continuous method. For the pathways of H. sapiens from KEGG, our binary method runs in less than one second for the whole metabolism. Therefore, our binary method is useful for biological applications.

Table 3-1. Running time in seconds of our binary method for the whole metabolism of H. sapiens from KEGG.

KEGG               #E    #R     #C     AND    OR
whole metabolism   640   1176   1067   0.41   0.14


Figure 3-1. Protein-reaction associations, redrawn from Reed et al.

Figure 3-2. The average running time in seconds for the networks with single enzyme and multiple enzymes (substitute and collaborate).


Figure 3-3. The average running time in seconds of binary and continuous methods for the networks whose multiple enzymes are substitute ones.

Figure 3-4. The average running time in seconds of binary and continuous methods for the networks whose multiple enzymes are collaborate.


CHAPTER 4
ENZYMATIC TARGET IDENTIFICATION USING NON-LINEAR MODEL BY MANIPULATING THE STEADY STATE

Metabolic pathways show the complex interactions among enzymes that transform chemical compounds. The state of a metabolic pathway can be expressed as a vector, which denotes the yield of the compounds or the flux in that pathway at a given time. The steady state is a state that remains unchanged over time. Altering the state of the metabolism is very important for many applications such as biomedicine, bio-fuels, food industry and cosmetics. The goal of the enzymatic target identification problem is to identify the set of enzymes whose knockouts lead the metabolism to a state that is close to a given goal state. Given that the size of the search space is exponential in the number of enzymes, the target identification problem is very computationally intensive.

In this chapter, we develop efficient algorithms to solve the enzymatic target identification problem. Unlike existing algorithms, our method works for a broad set of metabolic network models. We measure the effect of the knockouts of a set of enzymes as a function of the deviation of the steady state of the pathway after their knockouts from the goal state. We develop two algorithms to find the enzyme set with minimal deviation from the goal state. The first one is a traversal approach that explores possible solutions in a systematic way using a branch and bound method. The second one uses genetic algorithms to derive good solutions from a set of alternative solutions iteratively. Unlike the former one, this one can run for very large pathways.

Our experiments show that our algorithms' results follow those obtained in vitro in the literature from a number of applications. They also show that the traversal method is a good approximation of the exhaustive search algorithm and it is up to 11 times faster than the exhaustive one. This algorithm runs efficiently for pathways with up to 30 enzymes. For large pathways, our genetic algorithm can find good solutions in less than 10 minutes.


4.1 Motivation and Problem Definition

Metabolic pathways are one of the most important data resources in biology. A metabolic pathway is a complex network of chemical reactions occurring within a cell. Enzymes catalyze these reactions. Reactions transform a set of compounds into another set of compounds. Note that the term "network" is also used in the literature to denote the union of all pathways of an organism or large pathways. To keep the notation consistent, we will use the term "pathway" instead of "network" regardless of the pathway size in this chapter. The state of a metabolic pathway can be expressed as a vector, which denotes the yield of the compounds [106] or the flux [71] in the pathway at a given time. The yield of a compound is the amount of product obtained in the chemical reaction [106]. The flux of a reaction is the rate at which each compound is produced or consumed by that reaction [71]. Steady state is the state that remains unchanged over time.

Many applications require altering the steady state of a given pathway. For example, external factors or genetic mutations can change the production rate of a set of enzymes. They can even modify the structure of produced enzymes. Low or missing activity of an enzyme may result in the blockage of the pathway. Furthermore, this can propagate to other parts of the pathway that need the compounds produced in the blocked part of the pathway. As a result, the production of compounds may increase or decrease. Such aberrations in the state of the metabolism can lead to severe diseases. Examples include mental retardation, seizures, decreased muscle tone, organ failure and blindness [27, 89]. Thus, changing the state of the metabolism back to a desired level is often needed.

Increasing or reducing the production of certain compounds in a metabolism is essential for many industrial applications such as cosmetics and food industry. For example, the fatty acid biosynthesis pathway converts fatty acids that are used in the cosmetic industry in creams and lotions [105, 115]. Butanoate metabolism produces poly-hydroxybutyrate, which is essential for producing plastics [11]. The mevalonic acid pathway and the MEP/DOXP pathway produce carotenoid


that are often used as anti-oxidant in food industry [83]. The metabolism of many organisms, such as bacteria, algae and plants, naturally produces these compounds. A common practice is to extract such compounds from these organisms. By manipulating the pathways of these organisms, the production of these compounds can be increased significantly.

One way to change the state of a pathway to one close to the desired state is to knock out a set of enzymes. When an enzyme is knocked out, it cannot catalyze the reactions it is responsible for. As a result, some entries in the steady state of the pathway may increase and some may decrease. The Enzymatic Target Identification Problem, the problem addressed in this chapter, aims to identify the set of enzymes whose knockouts lead to a steady state of the metabolic pathway that is as close to a user supplied goal state as possible. In order to improve the generality of this problem, we allow the goal state to be transient as well as steady. In other words, it may be impossible to reach the goal state exactly. However, there can still be steady states that are close to the goal state.

The size and the complex structure of the metabolic pathways along with the large size of the solution state space make the enzymatic target identification problem computationally challenging. We can compute the steady state of a metabolic pathway after knocking out a given set of enzymes in polynomial time [107] (see the later section for a discussion of the steady state computation). However, enzymatic target identification aims to solve the inverse problem of finding the set of enzymes to achieve a steady state that is close to a given goal state. We have shown that a simplified version of the enzyme identification problem is an NP-complete problem [44]. Thus, exhaustive search is impractical for the typical pathways that contain hundreds of enzymes and thousands of compounds and reactions. Assuming that finding a steady state of a pathway takes on the order of 10 milliseconds, it will require nearly a year of computational time to test all solutions with up to four enzyme combinations on a metabolic pathway with 500 enzymes.
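The running-time estimate for exhaustive search can be reproduced with a short calculation. The sketch below assumes, as in the text, about 10 milliseconds per steady state computation and counts every non-empty knockout set of at most four enzymes out of 500; the function names are illustrative.

```python
from math import comb


def candidate_sets(num_enzymes, max_knockouts):
    """Number of non-empty enzyme sets with at most max_knockouts members."""
    return sum(comb(num_enzymes, k) for k in range(1, max_knockouts + 1))


def exhaustive_search_days(num_enzymes, max_knockouts, seconds_per_state=0.01):
    """Days needed to evaluate every candidate set once.

    seconds_per_state is the assumed cost of one steady state computation.
    """
    seconds = candidate_sets(num_enzymes, max_knockouts) * seconds_per_state
    return seconds / 86400.0


if __name__ == "__main__":
    print(candidate_sets(500, 4))
    print(round(exhaustive_search_days(500, 4), 1))
```

Under these assumptions there are 2,593,864,875 candidate sets, which at 10 ms each corresponds to roughly 300 days of computation.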


There are several algorithms in the literature which aim to address the enzymatic target identification problem. These algorithms, however, make strong assumptions about the metabolism or the problem domain. As a result, they work only for special settings and fail to address this problem when these assumptions do not hold.

Limitations of integer linear programming-based methods. One class of algorithms uses integer linear programming to tackle the enzymatic target identification problem. OptKnock is one example of the algorithms in this class [9]. These strategies simulate the metabolism using Flux Balance Analysis, FBA [7, 29, 48]. At a high level, they represent each flux as a variable and solve a linear equation with linear constraints on these variables as follows.

Maximize (or minimize) Objective function
Subject to steady state constraints

This formulation represents the metabolism using a stoichiometric matrix $S$ where the rows and the columns correspond to compounds and fluxes respectively. Assume that $x = [x_1 \; x_2 \; \cdots \; x_n]'$ denotes the flux vector for a network with $n$ fluxes. The objective function is typically to maximize a variable or a linear combination of a set of variables. Thus an objective function is typically $\sum_i c_i x_i$ where $c_i$ are given constants. The constraints define the steady state using the stoichiometric model. The solution to the equation $Sx = 0$ is the set of all steady states in this model, and this equation defines the constraints of the integer linear programming problem.

Integer linear programming-based methods have serious drawbacks. First, they are limited to the linear models. In other words they require that both the objective function and constraints are represented as linear equations or inequalities. However, many models for metabolic networks do not satisfy this requirement. For example, S-systems [86, 107] define the steady state as the solution to the equation system
\[ \dot{X}_i = \alpha_i \prod_j X_j^{g_{ij}} - \beta_i \prod_j X_j^{h_{ij}} = 0, \quad \forall i. \]


Here, the variable $X_i$ represents the concentration of the $i$th molecule. $\dot{X}_i$ is the derivative of $X_i$. The constants $\alpha_i$, $\beta_i$ and $g_{ij}$, $h_{ij}$ denote the rate of the reaction and the rate at which each molecule contributes to a reaction. Clearly, the constraints are non-linear in the S-systems of equations. Taking the logarithm of the constraints linearizes the constraints as follows. Define $Y_i = \log(X_i)$. The constraints become
\[ \Big( \log \alpha_i + \sum_j Y_j g_{ij} \Big) - \Big( \log \beta_i + \sum_j Y_j h_{ij} \Big) = 0, \quad \forall i. \]
However, the objective function transforms into the nonlinear form $\sum_i c_i e^{Y_i}$. Therefore, integer linear programming fails to solve the enzymatic target identification problem.

The GMA model [73, 107] is even more problematic for integer linear programming-based methods. This model considers each reaction that a compound is a part of separately and represents the steady state using the following equations.
\[ \dot{X}_i = \sum_h \gamma_{ih} \prod_j X_j^{f_{ij}} = 0, \quad \forall i. \]
Here, the constants $\gamma_{ih}$ and $f_{ij}$ denote the rate of the reaction and the rate at which each molecule contributes to a reaction. In this model, not only are the constraints non-linear, but the summation of the multiplicative terms also makes it impossible to take the logarithm of the constraints.

Limitations of the methods for Boolean network models. Sridhar et al. and Song et al. considered a simplified version of the enzymatic target identification problem [94-96]. In their version, each entry of the state denotes whether the corresponding compound is present or not. For this simplified version, the goal, then, is to identify the set of enzymes whose knockouts eliminate all the targeted compounds while incurring minimum damage. Targeted compounds are the ones whose production needs to be stopped. They have defined damage as the number of non-targeted compounds that are eliminated because of knockouts. Minimum damage is the minimum number of non-targeted compounds eliminated from the metabolism


while eliminating the targeted compounds among all possible ways of eliminating the targeted compounds. Sridhar et al. used a branch and bound strategy to solve this problem for pathways with up to 32 enzymes in less than an hour [96]. Sridhar et al. and Song et al. developed an iterative heuristic method to scale their solutions to large pathways [94, 95]. The binary model described in these works, however, is limited. It cannot address partial production of the compounds. In addition, it ignores the change in flux or yield due to the knockouts of a set of enzymes.

Figure 4-1 illustrates the limitations of the binary model used by Sridhar et al. and Song et al. [95, 96]. Inhibiting $E_1$ knocks out compounds $C_2$, $C_4$ and $C_5$. Suppose $C_4$ is the targeted compound. Inhibiting $E_1$ stops two non-targeted compounds, $C_2$ and $C_5$. The damage is two using the binary model. There are several drawbacks. One is that the knockout of $E_1$ accumulates $C_1$. The binary model, however, ignores this. Furthermore, although the knockout of $E_1$ does not fully eliminate $C_7$, it influences the production of $C_7$. The binary model disregards this influence as well.

Klamt et al. introduced a minimal cut set problem, which aimed to find a minimal set of reactions whose deletion leads to no feasible balanced flux distribution in the objective reaction [54]. The authors described several algorithms to solve the minimal cut set problem. However, the minimal cut set model has drawbacks similar to the one used by Sridhar et al. and Song et al. The aim of this model is to block the objective reaction function, which results in the removal of the objective metabolite synthesis. It cannot be used when the aim is to partly decrease or increase the objective metabolites.

Limitations of randomized methods. Extreme Pathway Analysis [78] uses FBA to find the path in a pathway that maximizes or minimizes the production of a given compound. This problem is similar to a special case of the enzyme target identification problem considered in this chapter. De et al., for example, consider the extreme pathway analysis problem [18]. In order to


reduce the yield of a compound in a pathway, De et al. use FBA to compute the optimal pathway so that the yield of the target metabolite is minimum. They, then, change the concentration of the enzymes in other paths so that these paths are inactive except that optimal one. This method has two major drawbacks. First, it requires changing the concentration of many enzymes. In practice, changing the enzyme concentration is a costly process. Therefore, the number of enzymes whose concentrations are altered should be kept low. Second, the alterations that change the production of a compound can affect the production of other compounds in that pathway. Thus, the solution found by this method can have significant side-effects. In addition to these drawbacks, extreme pathway analysis cannot solve the enzymatic target identification problem considered in this chapter. Here, we develop methods to overcome the above-mentioned disadvantages and solve a more generic problem.

Limitations of evolutionary programming methods. Patil et al. presented an evolutionary programming method for finding optimal gene deletion strategies [72]. Their method generates a population of random solutions and uses a genetic algorithm to improve this population. Their method can be applied to non-linear models as well as linear models. However, it has several drawbacks. The enzymatic target identification problem looks for a set of enzymes that are connected over a complex network and interact through reactions over compounds. Patil et al.'s method ignores these interactions while constructing the population of solutions as well as creating a new generation of solutions using crossover. They instead create these solutions randomly. The search space of the enzyme target identification problem is exponential in the number of enzymes. As a result, their method fails to converge to a good solution. Furthermore, the solutions found by their method suggest knocking out an unnecessarily large number of enzymes. We experimentally evaluate their method in Section 4.4.2.3 and defer further discussion to that section.


From all of the above, we conclude that existing algorithms work only under a limited, specific set of assumptions about the metabolism. Since these assumptions do not hold in practice most of the time, more generic methods that can address the enzymatic target identification problem for realistic metabolic network models are needed.

4.2 Computing the Steady State

The developed algorithms require computing the steady state of the metabolic pathway for each enzyme vector (i.e., candidate solution) that is considered during the search. Since our algorithms evaluate many enzyme vectors, we need to answer the following question: Can we compute the steady state of the pathway efficiently? This question has been studied thoroughly in the literature. Here, we briefly discuss two alternative ways to compute the steady state based on two alternative, yet similar, steady state definitions (Section 4.2.1 and 4.2.2).

4.2.1 Computing the Steady State Using Flux

The flux of a reaction shows the speed at which each compound is produced or consumed by that reaction. As the reactions progress, each flux can increase or decrease other fluxes. The pathway reaches a steady state when all the flux values remain unchanged over time.

Figure 4-2 shows a hypothetical pathway along with the fluxes that operate on it. Let $v_1$, $v_2$, $v_3$ denote the internal flux for reactions $R_1$, $R_2$ and $R_3$. $b_1, b_2, \ldots, b_8$ are the external fluxes. The state of the pathway shows the current internal and external fluxes, which relate to the other pathways or compounds. Assume the reactions in the pathway are as follows.

$R_1$: $C_1 \rightarrow C_2 + 2 C_7$
$R_2$: $2 C_2 + C_3 \rightarrow C_4 + C_5$
$R_3$: $C_6 \rightarrow C_7 + C_8 + C_9$

One way to compute the steady state of this pathway is to employ Flux Balance Analysis (FBA) [7, 29, 48]. FBA creates a matrix, $A$, that shows how each flux operates on each compound. Each row of this matrix corresponds to a compound and each column corresponds


to a flux. For example, for the pathway in Figure 4-2, $A$ has nine rows and 11 columns since there are nine compounds and 11 fluxes. Consider the flux $v_1$ which corresponds to reaction $R_1$ above. Assume that the first column of $A$ corresponds to $v_1$. The values of $A$ in this column are $A[1,1] = -1$, $A[2,1] = 1$, $A[7,1] = 2$ and all others are zero. This is because $v_1$ consumes one unit of $C_1$ to produce one unit of $C_2$ and two units of $C_7$. Let $v$ denote the flux vector (i.e., each entry of this vector corresponds to a flux). The product $Av$, then, shows the amount of change in the concentration of each of the compounds. In order to compute the steady state, FBA makes several assumptions. One of them is that the total influx of each compound is equal to the total outflux of that compound. Thus, FBA computes the set of all possible steady states as the solution space of $v$ to the equation
\[ \frac{dA}{dt} = Av = 0. \]

Typically, the number of variables in this equation is more than the number of constraints. As a result there are infinitely many possibilities for $v$. There are many ways to narrow down the solution space. One of them is that a flux cannot be a negative value (i.e., $v_i \ge 0$ for all $i$). Another assumption is that the maximal rate of incoming flux (the flux that enters the metabolism from external sources) is limited, e.g. $\le 1$. Another way is to include an objective function, such as maximizing the biomass or the production of a specific compound or energy.

Example A.I. Consider the pathway in Figure 4-2. Assume that all the enzymes in this pathway are present in the metabolism and they are not knocked out. Also, assume that as the objective function, we maximize the total output flux. In other words, we want to find the steady state that maximizes $b_4 + b_5 + b_6 + b_7 + b_8$. Thus, the problem of solving $Av = 0$ translates into the following one: Maximize $b_4 + b_5 + b_6 + b_7 + b_8$ subject to the constraints
$b_1 - v_1 = 0$
$v_1 - 2 v_2 = 0$


$b_3 - v_2 = 0$
$v_2 - b_4 = 0$
$v_2 - b_5 = 0$
$b_2 - v_3 = 0$
$2v_1 + v_3 - b_6 = 0$
$v_3 - b_7 = 0$
$v_3 - b_8 = 0$
$b_1, b_2, b_3 \leq 1$
$v_1, v_2, v_3, b_1, b_2, b_3, b_4, b_5, b_6, b_7, b_8 \geq 0$

The solution is $v_1 = 1.0$, $v_2 = 0.5$, $v_3 = 1.0$, $b_1 = 1.0$, $b_2 = 1.0$, $b_3 = 0.5$, $b_4 = 0.5$, $b_5 = 0.5$, $b_6 = 3.0$, $b_7 = 1.0$, $b_8 = 1.0$. We define the steady state as all the fluxes in the pathway. That is, $[v_1, v_2, v_3, b_1, b_2, b_3, b_4, b_5, b_6, b_7, b_8] = [1, 0.5, 1, 1, 1, 0.5, 0.5, 0.5, 3, 1, 1]$. □

Example A.II. Now assume that $E_1$ is knocked out in the pathway used in Example A.I. Since $E_1$ catalyzes the reaction $R_1$, $R_1$ becomes inactive. Then $v_1$ becomes zero in the flux computation of Example A.I. Therefore, we add one more constraint, $v_1 = 0$, to the above flux computation. The steady state under these constraints is $[0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1]$. □

4.2.2 Theoretical Yield Computation

FBA computes the flux at the steady state. It does not, however, describe the concentration of each compound when the metabolism reaches a steady state. The amount of each compound at this state is crucial information for many applications. For example, when the dopamine concentration in the brain falls below a certain level, the motor system nerves become unable to control movement and coordination.

We can compute the amount of each compound in the metabolism as its theoretical yield. The theoretical yield of a compound is the amount of that compound produced by the underlying reactions [106]. Compared to the flux computation, which describes the process of the reactions,


the yield computation only depicts the outcomes of the reactions. We can compute the yield of each compound in a pathway for a given initial state using the stoichiometry of that pathway. We use the same matrix $A$ that is used in Section 4.2.1 of the Appendix to denote the stoichiometry. Recall that the stoichiometric coefficients of a reaction show the degree to which a chemical species participates in a reaction.

In order to compute the yield of each compound in the steady state, we need the initial state of the pathway. For simplicity, we assume that the initial yields of the compounds which are not produced in that pathway are one unit (e.g., one mol). For all other compounds, we set them to zero. Note that the process of computing the steady state is orthogonal to that of choosing the initial state. Thus, one can replace our strategy of selecting the initial state with a different one. We then simulate the reactions in the pathway. We assume that all the reactions take place simultaneously, given that the enzymes and input compounds required for them are present in the metabolism. For simplicity, we will assume that all reactions take the same amount of time in our discussion. Computing the yield for varying reaction speeds is similar.

Example B.I. Assume the reactions in the pathway are the same as those in Section 4.2.1 of the Appendix. Also, assume that all the enzymes are present in the metabolism, similar to Example A.I. In Figure 4-2, compounds $C_1$, $C_3$ and $C_6$ are not produced in the pathway. They are external inputs to the pathway. Thus, the initial state of the pathway is $[1, 0, 1, 0, 0, 1, 0, 0, 0]$. In other words, there is one unit of each of $C_1$, $C_3$ and $C_6$, and none of the other compounds exist initially.

Initially, the reactions $R_1$ and $R_3$ are active. This is because the yields of $C_1$ and $C_6$ are non-zero in the initial state. However, the reaction $R_2$ is inactive because of the lack of $C_2$. $R_1$ consumes one mol of $C_1$, and generates one mol of $C_2$ and two mol of $C_7$. $R_3$ consumes one mol of $C_6$ and produces one mol of $C_7$, one mol of $C_8$ and one mol of $C_9$. The current state after these reactions take place is $[0, 1, 1, 0, 0, 0, 3, 1, 1]$. At this moment, $R_2$ is active because the yields of both $C_2$ and $C_3$ are non-zero. $R_1$ and $R_3$ are inactive because the yields of $C_1$ and $C_6$ are zero. Then, $R_2$


consumes one mol of $C_2$ and $\frac{1}{2}$ mol of $C_3$, and generates $\frac{1}{2}$ mol of $C_4$ and $\frac{1}{2}$ mol of $C_5$. Then, $\frac{1}{2}$ mol of $C_3$ is left. Thus, the state of the pathway is $[0, 0, 0.5, 0.5, 0.5, 0, 3, 1, 1]$. Since the yields of $C_1$, $C_2$ and $C_6$ become zero, $R_1$, $R_2$ and $R_3$ are inactive. A steady state is reached at this point as the yield does not change. □

Example B.II. Assume that $E_1$ is knocked out in the pathway, as in Example A.II. This makes $R_1$ inactive as $E_1$ catalyzes $R_1$. For consistency, assume the initial state of the pathway is also $[1, 0, 1, 0, 0, 1, 0, 0, 0]$, i.e., the same as that in Example B.I. Then, only $R_3$ is active, because both the input compound $C_6$ and the enzyme $E_3$ are present. $R_3$ consumes one mol of $C_6$ and generates one mol each of $C_7$, $C_8$ and $C_9$. The current state is $[1, 0, 1, 0, 0, 0, 1, 1, 1]$. At this moment, all the reactions become inactive. Therefore, the metabolism comes to a steady state. As compared to the steady state when all the enzymes are present, the yield of $C_1$ increases and the yield of $C_7$ decreases. □

There are four important observations that follow from the discussion of steady state for the theoretical yield computation.

We assume that all reactions take place simultaneously as long as all the input compounds and the enzymes are present in the metabolism. We also assume that each reaction takes place until there are not sufficient compounds left for it to consume.

Theoretically, the state transition of a pathway may lead to an infinite loop over a sequence of more than one state. For example, the transitions $a \rightarrow b \rightarrow c \rightarrow a$ show a loop of three states $a$, $b$ and $c$. Such loops are considered a sequence of steady states. One can use the average, max, min or other functions to summarize such states in a single state. We do not discuss this in detail as it is beyond the scope of this chapter.

The steady state depends on the initial state. One can show that the metabolism is guaranteed to reach a steady state or a sequence of steady states and that the steady state depends on the initial state. The proof of the former follows from the synchronous transition model. The state


transition moves each state to a unique next state. Thus, a transition either creates a new state, leaves the state unchanged, or recreates one of the previously visited states. The latter two cases correspond to steady states. The dependence of the steady state on the initial state can be shown by simply setting the value of $C_1$ to zero in the initial state of Example B.I. When all the enzymes are present, the steady state after this modification is $[0, 0, 1, 0, 0, 0, 1, 1, 1]$, which is different from the one in that example.

There is a strong correlation between the steady state of the flux and the steady state of the yield. As the flux reaches a steady state, the total influx and outflux of each compound remain unchanged. Thus, the derivative of the yield of each compound remains unchanged.

The yield computation is a good approximation of the state of the pathway, and this value can be measured in the lab. It is computationally desirable since, unlike concentration or flux, it does not require functional or kinetic information. This information is not available for the majority of the existing pathway databases. We use yield to denote state in this chapter. Furthermore, this chapter is orthogonal to the state computation.

4.2.3 Concentration Computation

The following equations show the relationship between the concentrations of the molecules and their rates of change for Figure 4-2. For example, $\frac{d}{dt}C_1$ denotes the concentration change of compound $C_1$ per time unit. It equals the incoming flux $b_1$ minus the outgoing flux $v_1$.

$\frac{d}{dt}C_1 = b_1 - v_1$
$\frac{d}{dt}C_2 = v_1 - 2v_2$
$\frac{d}{dt}C_3 = b_3 - v_2$
$\frac{d}{dt}C_4 = v_2 - b_4$
$\frac{d}{dt}C_5 = v_2 - b_5$
$\frac{d}{dt}C_6 = b_2 - v_3$
$\frac{d}{dt}C_7 = 2v_1 + v_3 - b_6$


$\frac{d}{dt}C_8 = v_3 - b_7$
$\frac{d}{dt}C_9 = v_3 - b_8$

With more biological information, e.g., enzyme kinetics or metabolic control analysis, the above equations can be used to compute the temporal evolution of the concentrations starting from arbitrarily given values. For example, assume $b_1 = 0$ and $v_1 = k$. Then,

\[ \frac{d}{dt}C_1 = 0 - k, \qquad C_1(t) = C_1(0) - kt. \]

Then, we can get the concentration of each compound as time $t \to \infty$. We define the steady state as all the concentrations in the pathway as $t \to \infty$.

If enzyme $E_1$ is inhibited, the reaction $R_1$ will be changed and the effective substrate concentration will also be altered. Based on biochemical knowledge, we can add more equations to the above concentration computation. In this way, we can compute the concentration of each compound when $E_1$ is inhibited, which yields the steady state of the concentrations.

4.3 Methods

In this section, we present in silico methods for the enzymatic target identification problem. We first provide a measure to compute the distance between the current steady state and the goal state (Section 4.3.1). We then describe two algorithms, the traversal and the genetic algorithms (Sections 4.3.2 and 4.3.3), to solve the enzymatic target identification problem.

4.3.1 State-Distance

The first task that needs to be addressed is to measure the distance between a given goal state and the steady state of the pathway after knocking out a set of enzymes. In order to achieve this, we first compute the steady state of the pathway after a set of enzymes is knocked out. We then measure the distance between this state and the goal state. We call this measure the State-Distance (SD).

The state of a metabolic pathway is a vector that indicates its current status. There are alternative ways to define and compute the state of a given pathway in the literature. These alternatives are similar in spirit. Each entry of the state vector denotes the yield of a compound or a


flux in the pathway. The yield of a compound is the amount of product obtained in the chemical reaction [106]. The flux of a reaction is the rate at which each compound is produced or consumed by that reaction [71]. We compute the yield of each compound in the steady state using the reaction parameters, such as the rate at which it is consumed or produced by each reaction.

We apply FBA [7, 29, 48] or solve S-systems of equations to get the flux of the pathway in the steady state. Usually, FBA produces a space of steady states that contains infinitely many possible steady states. To select a unique steady state, FBA optimizes an objective function over the solution space. The objective function of FBA often maximizes biomass [24] or the production of ATP [80]. Since the literature contains detailed discussion on the steady state computation, we defer the discussion of the steady state computation to the Appendix. In the rest of this chapter, each entry of the steady state denotes the yield of a compound unless otherwise stated. Thus, in our notation, the steady state vector for a pathway has as many entries as the number of compounds in that pathway.

Given a goal state for the underlying pathway, we next discuss how we compute the distance, SD, between the goal state and the current steady state. Note that SD is a generic measure that can be used to represent a broad set of objective functions, including the ones used in the literature.

We first present the notation to formally define SD. Assume that the number of compounds in the pathway is $m$. We denote the goal state as $V_G = [g_1, g_2, \ldots, g_m]$, where $g_i$ is the ideal value for the $i$th entry of the steady state. Let $N$ denote the number of enzymes. The enzyme vector shows the knockout status of the enzymes. We denote it with $V_E = [e_1, e_2, \ldots, e_N]$, where $e_i \in \{0, 1\}$; $e_i = 1$ if $E_i$ is knocked out, otherwise $e_i = 0$. Let $[r_1, r_2, \ldots, r_m]$ be the steady state of that pathway based on the enzyme vector $V_E$. Let $\alpha_i$ be a real number that shows the importance of the $i$th entry of the steady state. We discuss the parameter $\alpha_i$ later in this section.
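The mapping from an enzyme vector $V_E$ to a yield-based steady state $[r_1, \ldots, r_m]$ can be illustrated with a minimal sketch of the synchronous yield simulation of Section 4.2.2. The reaction table encodes the three hypothetical reactions $R_1$ to $R_3$; the function names, the assumption that $E_2$ catalyzes $R_2$, and the rule that each active reaction runs until its limiting substrate is exhausted are our reading of the worked examples, not a definitive implementation. The sketch also assumes that no two simultaneously active reactions compete for the same substrate, and it does not handle loops of states.

```python
from fractions import Fraction

# (enzyme index, {substrate: coefficient}, {product: coefficient});
# compounds C1..C9 are stored at indices 0..8, enzymes E1..E3 at indices 0..2.
REACTIONS = [
    (0, {0: 1}, {1: 1, 6: 2}),        # R1: C1 -> C2 + 2 C7 (catalyzed by E1)
    (1, {1: 2, 2: 1}, {3: 1, 4: 1}),  # R2: 2 C2 + C3 -> C4 + C5 (E2 is assumed)
    (2, {5: 1}, {6: 1, 7: 1, 8: 1}),  # R3: C6 -> C7 + C8 + C9 (catalyzed by E3)
]

def step(state, knocked_out):
    """Apply one synchronous transition to the yield vector."""
    new_state = list(state)
    for enzyme, substrates, products in REACTIONS:
        if knocked_out[enzyme]:
            continue  # the reaction is inactive without its enzyme
        # The reaction proceeds until its limiting substrate is used up.
        extent = min(state[c] / k for c, k in substrates.items())
        for c, k in substrates.items():
            new_state[c] -= k * extent
        for c, k in products.items():
            new_state[c] += k * extent
    return new_state

def steady_state(initial, knocked_out, max_steps=1000):
    """Iterate transitions until the yield vector stops changing."""
    state = [Fraction(x) for x in initial]
    for _ in range(max_steps):
        nxt = step(state, knocked_out)
        if nxt == state:
            return state
        state = nxt
    raise RuntimeError("no fixed point reached (possibly a loop of states)")

initial = [1, 0, 1, 0, 0, 1, 0, 0, 0]
print([float(x) for x in steady_state(initial, [0, 0, 0])])
print([float(x) for x in steady_state(initial, [1, 0, 0])])
```

Exact rational arithmetic (`Fraction`) is used so that the fixed-point test is not affected by floating-point rounding.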


We will use the variable $d_i$ to represent the distance contribution of the $i$th entry of the steady state. We develop two alternative definitions for $d_i$, namely exact and fuzzy distance.

Exact distance: This measure is useful when the exact value of the entry in the goal state is known. For the $i$th entry, we want to approach the goal state as closely as possible. We compute the distance as

\[ d_i = \alpha_i \, |g_i - r_i|. \]

Fuzzy distance: This measure is useful for extreme pathway analysis or when we do not know the exact values for some entries in the goal state. In this case, we, however, know a lower or upper bound for such entries. In other words, we want to increase (or decrease) the value of that entry to at least (or at most) a given value. Thus, we have two possibilities.

Case 1 (decrease the production of a compound). We want to minimize the $i$th value of the steady state with a threshold of $g_i$. In other words, $r_i$ should be smaller than $g_i$. The smaller the better:

\[ d_i = \begin{cases} \infty & \text{if } g_i \leq r_i \\ \dfrac{\alpha_i}{g_i - r_i} & \text{for } g_i > r_i \end{cases} \]

Case 2 (increase the production of a compound). We want to maximize the $i$th value of the steady state with a threshold of $g_i$. In other words, $r_i$ should be bigger than $g_i$. The bigger the better:

\[ d_i = \begin{cases} \infty & \text{if } g_i \geq r_i \\ \dfrac{\alpha_i}{r_i - g_i} & \text{for } g_i < r_i \end{cases} \]

We compute SD as the largest of the distances of all the entries of the state vector, i.e., $SD = \max_i \{d_i\} = \|d\|_\infty$. Note that one can also define it as $SD = \sum_i d_i$. Other combinations of the distance measures discussed above are orthogonal to the rest of this chapter. Therefore, we do not discuss them further.

Example 1. Consider the pathway in Figure 4-1. Assume that the goal state is $V_G = [0, 0, 1, 0, 0, 0, 1, 1, 1]$. That is, we want one unit of each of the compounds $C_3$, $C_7$, $C_8$ and $C_9$, and none of the remaining compounds. Also, assume that none of the enzymes are knocked out in this pathway, i.e., $V_E = [0, 0, 0]$. The steady state of this pathway is $s_0 = [0, 0, 0.5, 0.5, 0.5, 0, 3, 1, 1]$ (see Example B.I of the Appendix). The state distance under this condition is $SD(V_E) = \|s_0 - V_G\|_\infty = 2$.

Now assume that $E_1$ is knocked out, i.e., $V_E = [1, 0, 0]$. We compute the steady state after knocking out $E_1$ as $s_1 = [1, 0, 1, 0, 0, 0, 1, 1, 1]$ (see Example B.II of the Appendix). Applying the state distance, we get $SD(V_E) = \|s_1 - V_G\|_\infty = 1$. Thus, knocking out $E_1$ brings the steady state closer to the goal state. □

Example 2. Consider the pathway in Figure 4-1 when no enzyme is knocked out. Assume that the goal state is the same as that in Example 1, with the difference that we want to maximize the yield of the compound $C_7$ and obtain a yield of at least one unit of it. In this case, we use the fuzzy distance for $C_7$ with a minimum goal of 1. The steady state of the pathway is the same as that in Example 1. Thus, the state distance is $\max\{0, 0, 0.5, 0.5, 0.5, 0, 0.5, 0, 0\} = 0.5$. □

4.3.2 Traversal Method

Given a metabolic pathway and a goal state, we aim to find the set of enzymes whose knockouts lead to a steady state with the lowest value of SD. One way to solve this problem is to exhaustively examine the SD value after knockouts of all possible subsets of enzymes. However, this is not feasible because the number of subsets is exponential in the number of enzymes. In this section, we develop a traversal algorithm and two optimization strategies to accelerate it.
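The SD values that the traversal minimizes can be reproduced with a short sketch of the state-distance of Section 4.3.1. The function names, the `mode` labels and the default weights of 1 are assumptions made for illustration; the fuzzy cases follow our reading of the definitions, in which the distance is infinite when the threshold is not met and shrinks as the entry moves further past it.

```python
import math

def entry_distance(g, r, alpha=1.0, mode="exact"):
    """Distance contribution d_i of one steady-state entry."""
    if mode == "exact":      # hit the goal value as closely as possible
        return alpha * abs(g - r)
    if mode == "decrease":   # Case 1: r should fall below the threshold g
        return alpha / (g - r) if g > r else math.inf
    if mode == "increase":   # Case 2: r should rise above the threshold g
        return alpha / (r - g) if g < r else math.inf
    raise ValueError(f"unknown mode: {mode}")

def state_distance(goal, steady, modes=None, alphas=None):
    """SD as the largest per-entry distance (infinity norm)."""
    modes = modes or ["exact"] * len(goal)
    alphas = alphas or [1.0] * len(goal)
    return max(
        entry_distance(g, r, a, md)
        for g, r, a, md in zip(goal, steady, alphas, modes)
    )

goal = [0, 0, 1, 0, 0, 0, 1, 1, 1]
s0 = [0, 0, 0.5, 0.5, 0.5, 0, 3, 1, 1]   # no enzyme knocked out
s1 = [1, 0, 1, 0, 0, 0, 1, 1, 1]         # E1 knocked out

print(state_distance(goal, s0))           # Example 1, V_E = [0, 0, 0]
print(state_distance(goal, s1))           # Example 1, V_E = [1, 0, 0]

fuzzy = ["exact"] * 9
fuzzy[6] = "increase"                     # fuzzy distance for C7 only
print(state_distance(goal, s0, modes=fuzzy))  # Example 2
```

Replacing `max` with `sum` in `state_distance` gives the alternative definition $SD = \sum_i d_i$ mentioned in the text.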


We traverse the search space using a branch-and-bound strategy. We consider the search space as a binary tree. Each node of this tree corresponds to a potential solution. Each node records four items: (i) the set of enzymes that are knocked out, (ii) the set of enzymes that are not knocked out, (iii) the set of enzymes that have not been considered so far, and (iv) the SD value of the pathway after all the enzymes in the first set are knocked out. In the root node, the first and the second sets are empty. Therefore, all the enzymes of the pathway are in the third set. We recursively visit the nodes using an in-order traversal method [37]. After visiting the current node, we visit the left and right child. Moving from a parent to a child node means considering a new enzyme on top of the enzymes considered in the parent node. The left child denotes that the new enzyme is knocked out and the right child denotes that it is not knocked out. We then compute the current SD. If the current SD is less than the best result seen so far, we update the value of the best result with the new one. We propose to improve the performance of this algorithm through two different optimizations:

Optimization 1. In many cases, the knockouts of some uninspected enzymes cannot improve the SD. We set the values in the enzyme vector for such enzymes to zero (i.e., not knocked out). This process, called Filtering, can improve the performance of the algorithm as it skips many levels of the search tree.

Optimization 2. The selection of the enzyme when we move from a parent to a child impacts the performance of the algorithm. This is because if the nodes at the top levels of the search tree have small SD, the chance of filtering the nodes in their subtrees becomes large. We call this the Prioritization strategy.

We discuss these two optimizations later in this section. In order to implement these two optimizations, we first discuss how we quickly predict SD incrementally when a new enzyme is knocked out.
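The basic tree traversal described above, before either optimization is applied, can be sketched as follows. This is an unpruned illustration under stated assumptions: `traverse` and `toy_sd` are hypothetical names, and `toy_sd` is a stand-in for the real SD computation (which would require a steady-state computation per node). Enzymes are considered in index order, and the right child reuses the parent's SD because its knocked-out set is unchanged.

```python
import math

def traverse(n_enzymes, sd_of):
    """Visit the binary search tree and return (best SD, best knockout set).

    sd_of maps a frozenset of knocked-out enzyme indices to an SD value.
    """
    best_sd, best_set = math.inf, frozenset()

    def visit(knocked, next_enzyme, sd):
        nonlocal best_sd, best_set
        if sd < best_sd:                 # update the best result seen so far
            best_sd, best_set = sd, knocked
        if next_enzyme == n_enzymes:     # no undecided enzymes remain
            return
        left = knocked | {next_enzyme}   # left child: knock the enzyme out
        visit(left, next_enzyme + 1, sd_of(left))
        # Right child: the enzyme stays active, so the SD is unchanged.
        visit(knocked, next_enzyme + 1, sd)

    root = frozenset()
    visit(root, 0, sd_of(root))
    return best_sd, best_set

# Hypothetical stand-in for SD: distance from an arbitrary target set {E1, E3}.
def toy_sd(knocked):
    return len(knocked ^ frozenset({0, 2}))

print(traverse(4, toy_sd))
```

Without filtering, the sketch evaluates every one of the $2^N$ subsets, which is the exponential cost that the two optimizations aim to reduce.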


4.3.2.1 Predicting the value of SD

Computing the SD value of a given enzyme vector requires computing the steady state of the pathway. An effective prediction strategy can help avoid computing the steady states of a large number of nodes if their SD values are greater than the current best.

We conjecture that the steady state after knockouts of enzymes $E_i$ and $E_j$ simultaneously is close to the average of the steady states after knockouts of $E_i$ and $E_j$ separately. Note that similar conjectures have been made in the literature [13, 61]. This is intuitive, because the influence of knockouts of two enzymes should be correlated with the total influence of knockouts of the individual enzymes. We tested this conjecture by randomly sampling 50 enzyme pairs in each of ten randomly selected metabolic pathways of Homo sapiens (H. sapiens). The average correlation coefficient [28] between the actual and predicted steady state values of these 500 random samples was 0.91.

It is worth mentioning that our conjecture above may not hold if $E_i$ and $E_j$ are dependent. The dependency can arise in several ways. For example, multiple enzymes may carry out the same reaction. In some cases, the presence of only one of these enzymes suffices for the reaction to occur, while in other cases all these enzymes have to be expressed for the reaction to occur. These dependencies may cause over- or underpredictions if the two enzymes create antagonism or synergism, respectively. Then, how much can we trust this conjecture? In order to answer this question, we performed another experiment as follows. We selected more than 100 pairs of dependent enzymes. The two enzymes in each pair are within three interactions of each other. We then measured the estimated and actual steady state values after deleting them. The average correlation coefficient between them was still 0.91. This high correlation supports our conjecture.

4.3.2.2 Filtering strategy

We filter some uninspected nodes as follows. If the predicted SD values of these nodes are bigger than the current minimum SD, we filter these nodes. For a given node in the search tree,


let $A$, $B$ and $C$ denote the set of enzymes that are knocked out, the set of enzymes that are not knocked out, and the set of enzymes that are not yet considered, respectively. For each enzyme in set $C$, we predict the SD value after that enzyme is knocked out in addition to the enzymes in $A$. If the predicted value for an enzyme in $C$ is worse than the best SD value found so far, we move that enzyme to set $B$. Moving $h$ enzymes from set $C$ to set $B$ is equivalent to filtering $h$ levels of the search subtree rooted at the current node. We predict the steady state for a single enzyme during filtering in time proportional to the size of the state vector by precomputing the steady state after knockout of each enzyme alone.

It is worth noting that we are using each enzyme independently to predict the steady state under a combination of enzymes. This process is error prone and can lead to pruning of potentially useful enzymes when using the filtering strategy. We address this problem by giving each enzyme several chances before we filter it. Let $K$ denote the number of chances, where $K$ is a positive integer. We call this the $K$-chance strategy. To incorporate this strategy, we keep a vector, where each entry of this vector denotes the number of times that an enzyme has tested positive for filtering. We filter an enzyme only if that enzyme uses all of its $K$ chances.

4.3.2.3 Prioritization strategy:

We select the most promising enzyme in set $C$ to consider for knockout. An enzyme is promising if its knockout in addition to the enzymes in set $A$ has a small SD with high probability. This increases the chance of filtering the nodes in its subtree. Our method works as follows. For each of the uninspected enzymes, we predict the steady state after knockout of that enzyme in addition to the already deleted enzymes.

We then compute the SD between that state and the goal state. We move the enzyme with the smallest predicted SD to set $A$ to create the next child node.
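The filtering, $K$-chance and prioritization steps can be combined in one pass over set $C$, as the following sketch suggests. It rests on several assumptions that the text does not fix: the prediction is taken to be the element-wise average of the current node's steady state and the precomputed single-knockout steady state, SD is the exact distance with unit weights, and all names (`predict_state`, `filter_and_prioritize`, `strikes`) are hypothetical.

```python
def predict_state(current_state, single_ko_state):
    """Predict the steady state after one more knockout (averaging conjecture)."""
    return [(a + b) / 2 for a, b in zip(current_state, single_ko_state)]

def filter_and_prioritize(current_state, undecided, single_ko_states,
                          goal, best_sd, strikes, k_chances):
    """Split set C into enzymes to keep (ordered by promise) and to filter.

    strikes[e] counts how often enzyme e has tested positive for filtering;
    it is updated in place so that chances accumulate across calls.
    """
    kept, filtered = [], []
    for e in undecided:
        predicted = predict_state(current_state, single_ko_states[e])
        sd = max(abs(g - r) for g, r in zip(goal, predicted))
        if sd > best_sd:                 # predicted to be worse than the best
            strikes[e] += 1
        if strikes[e] >= k_chances:      # all K chances used: move to set B
            filtered.append(e)
        else:
            kept.append((sd, e))
    kept.sort()                          # smallest predicted SD first
    return [e for _, e in kept], filtered

goal = [0, 0]
current = [2, 0]
single = {0: [0, 0], 1: [4, 0]}          # toy single-knockout steady states
strikes = {0: 0, 1: 0}
print(filter_and_prioritize(current, [0, 1], single, goal, 1.5, strikes, 2))
print(filter_and_prioritize(current, [0, 1], single, goal, 1.5, strikes, 2))
```

In the toy run, enzyme 1 is predicted to be worse than the current best on both calls, so with $K = 2$ it survives the first call and is filtered on the second.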


4.3.2.4 Multiple optimal solutions:

Our traversal algorithm can find multiple solutions (say $t$ solutions) as follows. We store the top $t$ solutions, i.e., the $t$ solutions with the smallest SD values found so far. As we traverse the search space, if we find a solution better than the $t$th best solution so far, we replace the worst solution in our list with the new one. Also, when filtering the search space, we use the SD value of the $t$th best solution so far rather than that of the top solution.

4.3.3 Genetic Algorithm

The time complexity of the traversal method remains exponential, though the filtering and prioritization strategies reduce the search space significantly. In our experiments, the method described in the previous section was only practical for up to 30-35 enzymes. In this section, we propose a genetic algorithm to solve the target identification problem for large pathways. The genetic algorithm exploits the traversal method as a building block. The main idea of the genetic algorithm is to generate a population of candidate solutions and improve these solutions through crossover and mutation operations. The algorithm stops after a predefined number of epochs or iterations. Next, we describe the genetic algorithm in detail. The genetic algorithm uses the following data structure.

Population: The population $P$ is a set of candidate solutions $\{S_1, S_2, \ldots, S_{num\_seed}\}$. Here, each solution $S_i$ is a vector that shows which enzymes are knocked out and which enzymes are not. Let $N$ be the number of enzymes in the underlying pathway. We represent a solution with $S_i = [s_{i1}, s_{i2}, \ldots, s_{iN}]$, where $s_{ij} = 0$ means that the enzyme $E_j$ is not knocked out. Similarly, $s_{ij} = 1$ means that $E_j$ is knocked out.

Algorithm 4.1 summarizes our solution. We discuss the details of this algorithm next.

Initialize population. This step generates the initial population, $P$, which contains candidate solutions. Ideally, a good candidate resembles the optimal solution in terms of both the number and the selection of enzymes that are knocked out by it. However, we do not know the optimal


solution at this step. To address this problem, we need to answer two questions. (i) How many enzymes are knocked out in each candidate? (ii) How do we decide which enzymes are knocked out in each candidate? To address the first question, we employ our traversal algorithm in Section 4.3.2 as follows. We estimate the number of removed enzymes in good solutions, i.e., solutions with low SD. Let $\mu$ denote this estimate. We run the traversal method for the top several levels of the search space. We search 10 levels to limit this traversal time. We then select a set of solutions with the smallest SD (say, the 20 best solutions). We estimate $\mu$ as the average number of removed enzymes in these solutions. The filtering and prioritization strategies push the results with small SD to the top levels of the search tree with high probability. Thus, limiting the search to only a few levels of the tree does not degrade the accuracy of $\mu$ greatly.

Algorithm 4.1 GeneticAlgorithm(Epochs)
INPUT: Epochs = number of epochs the genetic algorithm runs
  Initialize population.
  for i = 1 to Epochs do
    Generate children using crossover.
    Perform selection.
    Perform mutation.
    Shrink each solution to minimal subset.
  end for
  Report the solution with minimal SD.

Once we compute $\mu$, the next problem is to decide which enzymes will be knocked out. We use a binomial distribution to compute the knockout probability of each enzyme. Ideally, the probability of knocking out an enzyme should be high if that enzyme has a high potential to contribute to a good solution. We predict this potential of each enzyme by considering the SD value obtained after knocking out that enzyme. Suppose that $SD(E_i)$ denotes the value of


SD when only enzyme $E_i$ is knocked out. We conjecture that if $SD(E_i)$ is small, then $E_i$ has a high probability to contribute to a good solution. Let $p_i$ denote the probability of knocking out enzyme $E_i$ in a good solution. There is a negative correlation between $p_i$ and $SD(E_i)$. We use the following equation to estimate $p_i$:

\[ p_i = \frac{\beta}{1 + SD(E_i)}. \]

In this equation, the parameter $\beta$ is a normalizing constant. We compute the value of $\beta$ from the observation that the expected value of the total number of removed enzymes is $\mu$. We do this using the following equation.

\[ \mu = \sum_i p_i = \beta \sum_i \frac{1}{1 + SD(E_i)}. \]

From these two equations, we compute the probability of knocking out enzyme $E_i$ as:

\[ p_i = \frac{\mu}{\left(1 + SD(E_i)\right) \sum_j \frac{1}{1 + SD(E_j)}}. \]

Now, we are ready to generate candidate solutions. We create each candidate by knocking out enzyme $E_i$ with probability $p_i$, $\forall i$. In practice, we generate 100 candidate solutions to create the initial population.

Generate children using crossover. This step computes a child population from the current population. It aims to combine two existing solutions to create a better one. Generating a child solution involves the following steps: selection of two parent solutions from the existing population and crossover of these two parents.

We first discuss how we pick two parent solutions from the current population. We conjecture that good parents can generate good children with high probability. We thus randomly choose each parent using a biased distribution that prefers parents with small SD. Suppose the probability of choosing the solution $S_i$ as a parent is $x_i$. Based on our conjecture, there is an


inverse relation between $x_i$ and $SD(S_i)$. Assume that $\gamma$ is the coefficient of that correlation. We can write $x_i$ as:

\[ x_i = \frac{\gamma}{1 + SD(S_i)}. \]

We need the parameter $\gamma$ to compute the probability $x_i$. We can compute $\gamma$ from the observation that the selection probabilities of all the solutions in the current population add up to one. Formally, $\sum_i x_i = 1$. Thus, we have

\[ 1 = \sum_i x_i = \gamma \sum_i \frac{1}{1 + SD(S_i)}. \]

We get the value of $\gamma$ from this second equation and use it in the first one to compute $x_i$ as:

\[ x_i = \frac{1}{\left(1 + SD(S_i)\right) \sum_j \frac{1}{1 + SD(S_j)}}. \]

So far, we have discussed the selection of parents. Next, we discuss the creation of a single child solution from two parent solutions. We denote the parent solutions with $F$ and $M$ ($F, M \in P$), where $F = [f_1, f_2, \ldots, f_N]$ and $M = [m_1, m_2, \ldots, m_N]$. We denote the child of $F$ and $M$ with $Ch = [ch_1, \ldots, ch_N]$. We use the following crossover method to produce the child. We first set the enzymes in the child vector for which both parents have the same value. Formally, if $f_j = m_j$, then we set $ch_j = f_j = m_j$. We decide the values of the remaining enzymes using our traversal method in Section 4.3.2. This strategy favors children that have small SD values, as the traversal strategy seeks solutions with small SD.

Table 4-1 demonstrates this process on an example. In this example, $f_2 = m_2 = 1$, so $ch_2 = 1$. We set the values $ch_5$, $ch_6$ and $ch_9$ similarly. The parents do not agree on the values of $ch_1$, $ch_3$, $ch_4$, $ch_7$ and $ch_8$. We use the traversal method to find their values. To do this, we initialize the root of the search tree as follows: the set of removed enzymes is $\{E_2, E_5\}$, the set of enzymes that are not knocked out is $\{E_6, E_9\}$, and the rest of the enzymes are undecided. The traversal algorithm then traverses the search space defined by the undecided enzymes to determine the values that minimize SD.
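The parent selection and the first half of the crossover can be sketched in a few lines. The names below are hypothetical, and the two parent vectors are made-up values chosen only to reproduce the agreement pattern described for Table 4-1 (the table itself is not reproduced here). `random.choices` samples with replacement, so this sketch may occasionally pick the same solution as both parents; whether the original method allows that is not stated.

```python
import random

def selection_probabilities(sds):
    """x_i proportional to 1 / (1 + SD(S_i)), normalized to sum to one."""
    weights = [1.0 / (1.0 + sd) for sd in sds]
    total = sum(weights)
    return [w / total for w in weights]

def pick_parents(population, sds, rng=random):
    """Draw two parents, biased toward solutions with small SD."""
    probs = selection_probabilities(sds)
    return rng.choices(population, weights=probs, k=2)

def crossover_template(f, m):
    """Copy entries on which the parents agree; mark the rest undecided (None)."""
    return [fj if fj == mj else None for fj, mj in zip(f, m)]

# Hypothetical parents that agree on E2, E5 (knocked out) and E6, E9 (kept).
F = [1, 1, 0, 1, 1, 0, 0, 1, 0]
M = [0, 1, 1, 0, 1, 0, 1, 0, 0]
child = crossover_template(F, M)
print(child)

undecided = [j + 1 for j, value in enumerate(child) if value is None]
print("undecided enzymes:", undecided)

print(selection_probabilities([0.0, 1.0, 3.0]))
```

The `None` entries are the positions that would be handed to the traversal method of Section 4.3.2, with the agreed entries fixed at the root of its search tree.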


We repeatedly select two parents and create a child until the number of children is equal to the total number of solutions in the initial population.

Perform selection. At the end of the previous step, we have a set of already existing solutions, $P$, and a set of child solutions. Thus, the total number of solutions is double the size of $P$, i.e., it is $2 \times num\_seed$. In this step, we update $P$ as the $num\_seed$ solutions with the smallest SD among the union of $P$ and the child solutions.

Perform mutation. The previous steps repeatedly improve the solutions in the initial population $P$. There is, however, the risk that these steps get stuck in a local minimum or a plateau. This step aims to avoid such local minima or plateaus. To do this, we alter the solutions in $P$ by mutation, i.e., changing the knockout status of some of the enzymes. We mutate each solution in $P$ except the one with the minimum value of SD. For a given mutation rate (we used a rate of 0.04), we change the value of each $s_{ij}$ to $1 - s_{ij}$ with probability equal to that rate.

Shrink each solution to minimal subset. In practical applications, solutions that knock out fewer enzymes are desirable. This is because knocking out an enzyme is a costly task. One way to achieve this is to knock out fewer enzymes than given in each solution without changing the SD value of that solution. This step improves the solutions in the population by shrinking the set of removed enzymes. In detail, we iteratively test each removed enzyme of a solution. For each such enzyme, we update its state to unremoved. If the SD of the solution does not increase after this modification, we remove this enzyme from the set of removed enzymes permanently. Otherwise, we keep it. We repeat this iteration until we have tested all the enzymes. In fact, there can be multiple minimal subsets with the same SD values. We, however, do not see how to compute the SD value of a reduced set of enzymes without actually computing the steady state after reducing the enzyme set. Thus, the number of reduced sets tried while shrinking becomes the bottleneck of this step. Selecting one enzyme at a time for removal guarantees a small upper bound on the running time of the shrinking step. Therefore, we use the above strategy. Note that one can


randomly shuffle the enzymes and try to find alternative reduced sets using different enzymes as seeds. However, to keep the focus of the chapter, we do not provide experiments with alternative implementations of the shrinking step.

Finding multiple solutions: An obvious question that follows the genetic algorithm is whether it can find multiple alternative solutions rather than a single solution. Our genetic algorithm can return alternative solutions in two different ways. First, recall that each generation contains a population of solutions. Thus, each member of each population is a solution itself. Keeping the top $t$ solutions with the smallest SD values will return $t$ alternative solutions. Second, the shrinking phase of our genetic algorithm can be altered to return alternative shrunk sets of enzymes with the same SD values. This can be done by shrinking the enzymes in random order multiple times, as discussed above. Each random ordering can produce another alternative minimal subset of enzymes that has the same SD value, if such alternative minimal subsets exist.

4.3.4 Use of Traversal Algorithm in Crossover and Performance Hiccups

The crossover step (Step 3) of our genetic algorithm uses our traversal algorithm to create child solutions with small SD values. We, however, advocated the use of our genetic algorithm over our traversal algorithm for large pathways because the traversal algorithm is not scalable. Thus, we need to show why our genetic algorithm is scalable even though it uses the traversal algorithm for each child at each iteration.
The scalability of the traversal algorithm used in the genetic algorithm follows from the observation that it is used only for the undecided enzymes, i.e., the enzymes for which the two parents disagree. Using the notation we defined in this section, we can compute the probability that enzyme $E_i$ is undecided as $2p_i(1 - p_i)$. This is because one parent knocks out $E_i$ while the other does not. Since the knockout status of each enzyme in a parent is decided independently, the expected number of undecided enzymes is $2\sum_i p_i(1 - p_i)$, or simply $2\mu - 2\sum_i p_i^2$. The actual value of this expectation depends on the probability values $p_i$. Table 4-2 lists the expected


number of undecided enzymes for four pathways of varying sizes. It shows that the expected value is small, and thus the traversal algorithm can be used without affecting the scalability of the genetic algorithm in practice.

One can argue that the pathways in Table 4-2 might have a small expected number of undecided enzymes due to their topology or size, and thus the search space may be large for larger pathways or pathways with different topologies. To complete our discussion on scalability, we need to compute the expected number of undecided enzymes under the worst possible distribution of the probabilities $p_i$. The following theorem computes the worst expectation and the corresponding standard deviation.

Theorem 2. Let $N$ denote the number of enzymes in a given pathway. Let $\mu$ be the expected number of removed enzymes in a solution of the population of solutions generated for that pathway by our genetic algorithm. The expected number of undecided enzymes in a child solution is at most $2\mu - \frac{2\mu^2}{N}$. The standard deviation for the worst scenario is

\[ \sigma = \sqrt{\frac{2\mu (N - \mu)\left(2\mu^2 - 2\mu N + N^2\right)}{N^3}}. \]

Proof: We denote the expected value of a random variable with $E[\cdot]$. We will first prove the mean in the first part of the proof. In the second part, we will prove the standard deviation.

Part 1 of proof: The worst scenario analysis of the expected number of undecided enzymes. Let $F = [f_1, f_2, \ldots, f_N]$ and $M = [m_1, m_2, \ldots, m_N]$ denote two arbitrary solutions taken from the population of solutions. In order to compute the expected number of undecided enzymes during the crossover of $F$ and $M$, we first estimate the number of decided enzymes, that is, the number of enzymes for which $f_i = m_i$. We define the random variable $X_i$ as:

\[ X_i = \begin{cases} 1 & \text{if } f_i = m_i \\ 0 & \text{otherwise} \end{cases} \]

$X_i = 1$ in two cases: $f_i = m_i = 1$ and $f_i = m_i = 0$. Let $p_i$ be the probability that the $i$th enzyme in a solution is knocked out. Then, the expected value of $X_i$ is


\[ E[X_i] = p_i p_i + (1 - p_i)(1 - p_i) = 2p_i^2 - 2p_i + 1. \]

We would like to minimize $E[\sum_i X_i]$ subject to the constraint that $\sum_i p_i = \mu$. The random variables $X_i$ and $X_j$ are independent for $i \neq j$. Therefore, we compute the expected number of decided enzymes as

\[ E\Big[\sum_i X_i\Big] = \sum_i E[X_i] = \sum_i \left(2p_i^2 - 2p_i + 1\right) = 2\sum_i p_i^2 - 2\sum_i p_i + N. \]

From the definition of the probabilities $p_i$, we have

\[ \sum_i p_i = \mu \;\Rightarrow\; \frac{\mu}{N} = \frac{\sum_i p_i}{N} \leq \sqrt{\frac{\sum_i p_i^2}{N}} \quad \text{(follows from the power mean inequality)} \;\Rightarrow\; \frac{\mu^2}{N} \leq \sum_i p_i^2. \]

Using the inequality we obtained above, we get

\[ E\Big[\sum_i X_i\Big] = 2\sum_i p_i^2 - 2\mu + N \geq \frac{2\mu^2}{N} - 2\mu + N. \]


By the power mean inequality, the expectation attains its minimum value (i.e., the equality case) when $p_i = p_j$ for all $i$, $j$. Thus, we conclude that $p_i = \frac{\mu}{N}$ for all $i$. Using this value for $p_i$, we compute the minimum value of $E[\sum_i X_i]$ as

\[ E\Big[\sum_i X_i\Big] = \frac{2\mu^2}{N} - 2\mu + N. \]

The expected number of undecided enzymes is $N - E[\sum_i X_i]$. Therefore, the maximum expected number of undecided enzymes is

\[ N - \left(\frac{2\mu^2}{N} - 2\mu + N\right) = 2\mu - \frac{2\mu^2}{N}. \]

Part 2 of proof: The standard deviation of the worst scenario of the expected number of undecided enzymes. Here, we analyze the standard deviation $\sigma$ for the undecided enzymes in the worst scenario for the mean.

Assume that

\[ X = X_1 + X_2 + \ldots + X_N. \]

From the definition of the standard deviation, $\sigma$ is

\[ \sigma = \sqrt{E\big[(X - E[X])^2\big]} = \sqrt{E[X^2] - (E[X])^2}. \]

We proved in Part 1 of our proof of Theorem 2 that $E[X] = \frac{2\mu^2}{N} - 2\mu + N$ in the worst scenario, and that this scenario happens when $p_i = \frac{\mu}{N}$. Let $q$ be the probability of $X_i = 1$ in the worst scenario. Then,

\[ q = p_i p_i + (1 - p_i)(1 - p_i) = \left(\frac{\mu}{N}\right)^2 + \left(1 - \frac{\mu}{N}\right)^2 = 1 - \frac{2\mu}{N} + \frac{2\mu^2}{N^2}. \]

Before we compute $E[X^2]$, we consider $E[X_i^2]$. The value of $X_i$ is either 0 or 1. Therefore,

\[ E[X_i] = \Pr(X_i = 1) = \Pr(X_i^2 = 1) = E[X_i^2]. \]

The multiplication $X_i X_j$ ($i \neq j$) evaluates to one only when $X_i = X_j = 1$. Assume that $k$ of the $X_i$s have a value of one and the remaining $N - k$ of the $X_i$s have a value of zero. Then, exactly $k(k-1)$ of the $X_i X_j$ multiplications evaluate to one. Let $Z_k = k(k-1)$ denote this. Now, we are ready to consider $E[X^2]$:

\[
\begin{aligned}
E[X^2] &= E\big[(X_1 + X_2 + \cdots + X_N)^2\big] \\
&= E\Big[\sum_i X_i^2 + \sum_{i \neq j} X_i X_j\Big] \\
&= E\Big[\sum_i X_i^2\Big] + E\Big[\sum_{i \neq j} X_i X_j\Big] \\
&= E\Big[\sum_i X_i\Big] + E\Big[\sum_{i \neq j} X_i X_j\Big] \quad \text{by } E[X_i] = E[X_i^2] \\
&= E\Big[\sum_i X_i\Big] + \sum_{i \neq j} \Pr(X_i = 1, X_j = 1) \\
&= E\Big[\sum_i X_i\Big] + \sum_{k \ge 2} \Pr(X = k)\, Z_k \\
&= E\Big[\sum_i X_i\Big] + \sum_{k \ge 2} \binom{N}{k} q^k (1 - q)^{N-k} Z_k \\
&= E\Big[\sum_i X_i\Big] + \sum_{k \ge 2} \frac{N!}{k!\,(N-k)!}\, q^k (1 - q)^{N-k}\, k(k-1) \\
&= E\Big[\sum_i X_i\Big] + \sum_{k \ge 2} \frac{N!}{(k-2)!\,(N-k)!}\, q^k (1 - q)^{N-k}
\end{aligned}
\]

\[
\begin{aligned}
E[X^2] &= E\Big[\sum_i X_i\Big] + N(N-1)\, q^2 \sum_{k \ge 2} \binom{N-2}{k-2} q^{k-2} (1 - q)^{N-k} \\
&= E\Big[\sum_i X_i\Big] + N(N-1)\, q^2\, (q + 1 - q)^{N-2} \\
&= E\Big[\sum_i X_i\Big] + N(N-1)\, q^2 \\
&= \frac{2\mu^2}{N} - 2\mu + N + N(N-1)\left(1 - \frac{2\mu}{N} + \frac{2\mu^2}{N^2}\right)^2 \quad \text{by } q = 1 - \frac{2\mu}{N} + \frac{2\mu^2}{N^2}
\end{aligned}
\]

Finally, we compute the standard deviation.

\[
\begin{aligned}
\sigma &= \sqrt{E[X^2] - (E[X])^2} \\
&= \sqrt{\frac{2\mu^2}{N} - 2\mu + N + N(N-1)\left(1 - \frac{2\mu}{N} + \frac{2\mu^2}{N^2}\right)^2 - \left(\frac{2\mu^2}{N} - 2\mu + N\right)^2} \\
&= \sqrt{\frac{2\mu\,(N - \mu)\,(2\mu^2 - 2\mu N + N^2)}{N^3}}
\end{aligned}
\]

Figure 4-3 plots the expected number of undecided enzymes in the worst case for varying pathway sizes and $\mu$. The vertical lines show the standard deviation in each direction. We plot the expectation for up to $\mu = 10$, since we observed $\mu < 10$ in practice. The figure shows that for practical values of $\mu$, the expected number of undecided enzymes remains small enough to make the genetic algorithm efficient. Furthermore, as the pathway size grows eightfold, the upper bound for the expectation grows by only a few enzymes. Thus, we conclude that our algorithm is scalable to large pathways.
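The closed forms derived in the proof can be checked numerically. The following is a minimal sketch, not the implementation used in the experiments: the function names are hypothetical, and it assumes the worst-case setting of the proof, where every enzyme has $p_i = \mu/N$ so that the number of decided enzymes is binomial with parameters $N$ and $q$.

```python
import math


def worst_case_undecided(n_enzymes, mu):
    """Return (mean, std) of the number of undecided enzymes when p_i = mu / N.

    Hypothetical helper; follows the closed forms of the proof above.
    """
    # q = Pr(X_i = 1): both parents agree on enzyme i.
    q = 1 - 2 * mu / n_enzymes + 2 * mu**2 / n_enzymes**2
    mean = n_enzymes * (1 - q)  # equals 2*mu - 2*mu**2 / N
    std = math.sqrt(n_enzymes * q * (1 - q))
    return mean, std


def second_moment_by_sum(n_enzymes, q):
    """E[X^2] for X ~ Binomial(N, q), summed term by term as a cross-check."""
    return sum(
        math.comb(n_enzymes, k) * q**k * (1 - q) ** (n_enzymes - k) * k * k
        for k in range(n_enzymes + 1)
    )


if __name__ == "__main__":
    # Pathway size grows eightfold while mu stays fixed (mu = 5 is an assumed value).
    for n in (10, 20, 40, 80):
        mean, std = worst_case_undecided(n, 5.0)
        print(f"N={n:3d}  mean={mean:.3f}  std={std:.3f}")
```

Because the mean is bounded above by $2\mu$ for every $N$, the printed means stay below 10 for the assumed $\mu = 5$, which is the behavior the paragraph above describes for Figure 4-3.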

4.4 Results

In this section, we evaluate our algorithms on real datasets. We evaluate their biological significance on real applications (Section 4.4.1). We also evaluate their performance quantitatively (Section 4.4.2) using the following two measures:

SD: The SD value is the distance from the goal state. A small SD value indicates a better result.
Execution time: This indicates the total time (in seconds) taken by our algorithms.

We use the metabolic pathway information of H. sapiens and E. coli from KEGG as the input dataset. We present results for eight pathways of varying sizes in our experiments. Details of these datasets are shown in Table 4-3. KEGG Id is the unique identifier of each pathway in the KEGG database. #E, #R and #C denote the number of enzymes, reactions and compounds respectively in the pathway. The symbol - means that no KEGG Id exists for this metabolism.

We implemented the developed algorithms in C++. In our experiments, we set the default values of the parameters as follows. In the computation of SD, the per-entry coefficient is set to 1 for all the entries for simplicity. From our experience, the filtering strategy does a good job with two chances. The genetic algorithm works well when the mutation rate is 0.04. To estimate $\mu$, we search the first 10 levels of the search space to get the top 20 solutions, which balances efficiency and accuracy. We ran our experiments on a system with dual 2.2 GHz AMD Opteron processors, 4 gigabytes of RAM, and a Linux operating system.

We use the following strategy to set the initial state of the metabolic networks in our experiments. Some of the compounds used in the metabolic pathway are produced within the same pathway while some are supplied from external sources. This information is available in the existing metabolic pathway databases. In our experiments, we assume the same amount (one mol) of each compound that is externally supplied. This is because we do not want to artificially put any bias towards any externally supplied compound. We also assume that the internally produced compounds initially do not exist in the system. This is because it is realistic to let

the reactions decide which one is produced more rather than artificially setting some values. That said, it is important to realize that our algorithms can work for any initial state as they do not benefit from these assumptions.

4.4.1 Evaluation of the Biological Significance

We first evaluate the biological significance of our algorithms using flux or yield. We do this by comparing our results to known results in the literature.

4.4.1.1 Metabolic engineering of Glycolysis/Gluconeogenesis pathway

The Glycolysis pathway of T. pallidum produces Phosphoenolpyruvate. De et al. [18] studied the path that maximizes the production of this compound. T. pallidum, however, has a simple pathway that contains only four enzymes. We considered the Glycolysis/Gluconeogenesis pathway of E. coli. This organism has a significantly larger and more complex pathway that contains all the enzymes of T. pallidum and many more. We ran our algorithm by setting the goal state to maximization of Phosphoenolpyruvate (i.e., goal > 0 for Phosphoenolpyruvate). Our results suggest knocking out the set of enzymes {phosphoglucomutase, aldose 1-epimerase, fructose-1,6-bisphosphatase II, phosphoglycerate kinase, pyruvate kinase}; see the red, crossed out enzymes in Figure 4-4. When all these enzymes are knocked out, the pathway from alpha-D-Glucose 1-phosphate to Phosphoenolpyruvate is:

alpha-D-Glucose 1-phosphate → alpha-D-Glucose → alpha-D-Glucose 6-phosphate ↔ beta-D-Glucose-6P ↔ beta-D-Fructose 6P → beta-D-Fructose-1,6P2 ↔ Glycerone-P ↔ Glyceraldehyde-3P ↔ Glycerate-1,3P2 → Glycerate-3P ↔ Glycerate-2P ↔ Phosphoenolpyruvate.

This is the same path found for T. pallidum [18]. This can have several explanations. The enzyme fructose-bisphosphatase takes part in several pathways of E. coli such as the Fructose and mannose metabolism. This enzyme, however, slows down the production of beta-D-Fructose-1,6P2, which is needed for production of Phosphoenolpyruvate. Our results suggest that knocking this enzyme out will accelerate the production of Phosphoenolpyruvate. This enzyme, however, does not appear in T. pallidum.

Therefore, the production of Phosphoenolpyruvate goes through the same path in T. pallidum without interruption of other enzymes. Similarly, pyruvate kinase converts Phosphoenolpyruvate to pyruvate in E. coli, but not in T. pallidum. Thus it reduces the amount of Phosphoenolpyruvate in E. coli. This example shows that we can successfully identify these factors and optimize the production of the desired compounds.

4.4.1.2 Application on the production of Phenylalanine

L-Phenylalanine is widely used in the manufacture of aspartame and in parenteral nutrition [1]. Industrial applications need an efficient process to generate L-Phenylalanine. Considering the commercial application, Backman et al. selected E. coli as the production organism [1]. Thus, we studied the Phenylalanine, tyrosine and tryptophan biosynthesis pathway of E. coli. We ran our algorithm by setting the goal state to maximization of Phenylalanine. Our results suggest knocking out the set of enzymes {phenylalanine translase, chorismate lyase, aspartate aminotransferase, tyrosyl-tRNA synthetase}; see the red, crossed out enzymes in Figure 4-5. Usually, phenylalanine is synthesized from glucose and ammonia in common bacterial systems. Inhibiting phenylalanine translase stops the Phe-tRNA synthesis from phenylalanine. Thus, phenylalanine accumulates. Inhibiting chorismate lyase cuts the flux to other biosynthesis. It, then, reduces the by-products and efficiently uses the raw materials.

4.4.1.3 Increasing the production of cGMP

The compound 3',5'-Cyclic GMP (cGMP) plays an important role in heart failure. One treatment is to increase cGMP [12, 42, 74]. cGMP appears in the Purine metabolism. We consider this pathway in H. sapiens in this experiment. We design the experiment as follows. We first compute the steady state when all the enzymes are active. We consider this steady state as the goal state except the cGMP entry. For the cGMP entry, we set our goal to maximize its production. Our algorithm on the pathway suggests knocking out the enzymes PDE and guanase. The literature supports this result. A PDE inhibitor is a kind of drug that blocks the

subtypes of PDE; it then increases the concentration of cAMP or cGMP or both. It can treat heart failure [12, 32]. We predict that the side effects may be smaller if a PDE inhibitor and a guanase inhibitor are used together rather than knocking out PDE alone.

4.4.1.4 Metabolic engineering of the Butanoate metabolism

Poly-hydroxybutyrate is an essential compound for producing plastics. Thus, increasing its production is critical for many industrial applications. Butanoate metabolism produces this compound. We run our algorithm on the Butanoate pathway of E. coli with the goal of maximizing this compound. Our results predict that the knockouts of phosphotransbutyrylase and BHBD increase the entry value of poly-hydroxybutyrate while incurring minimum damage to the rest of the metabolism. In fact, Vazque et al. show evidence of an association between poly-hydroxybutyrate and phosphotransbutyrylase [103].

4.4.1.5 Acetate reduction in E. coli

In this experiment, we show how our method performs when it is applied on a metabolism using the linear constraints of the classical flux balance analysis.

Yang et al. show several strategies for acetate reduction in the central metabolic pathways of E. coli [113]. One method is to directly influence the formation of acetate. When we run our algorithms on the E. coli Glycolysis/Gluconeogenesis pathway by setting the goal to minimization of the yield of acetate, the resulting enzyme is acyl-activating enzyme. This result is consistent with the biological discovery. Inhibition of acyl-activating enzyme cuts down the acetyl-CoA-acetate pathway. The destruction of this pathway results in a low level of acetate [113]. The results found by our algorithms suggest knocking out dihydrolipoamide acetyltransferase and pyruvate decarboxylase, which destroys the pathway from Pyruvate to Acetyl-CoA. These results are consistent with the metabolic engineering methods, e.g., the destruction or complete elimination of the portion of the reaction pathway at the acetyl-CoA node [113].

4.4.1.6 Purine metabolism on Generalized Mass Action (GMA) model

In this example, we demonstrate the results found by our method when it is applied on a metabolism using non-linear constraints. One of the significant products in Purine metabolism is Uric acid. When the concentration of Uric acid in the human body is above the normal range, it results in diseases such as hyperuricemia and gout. We apply our algorithms to Purine metabolism on the Generalized Mass Action (GMA) model. The GMA model considers the concentration of each compound and it is a non-linear system (see Section 6.1). We utilize the GMA model on Purine metabolism in Chapter 10 in Voit's book [108]. We run our genetic algorithm on this metabolism by setting the goal to minimize the concentration of uric acid and keep the concentration of other compounds unchanged. The resulting enzyme set is {xanthine dehydrogenase, adenosine deaminase, Oxidoreductases}. From the literature, we know that inhibiting xanthine dehydrogenase or oxidase is widely used for treatment of hyperuricemia and gout [70]. When uric acid production is blocked by inhibiting xanthine oxidase, it results in an increase in hypoxanthine and xanthine. Thus, inhibiting adenosine deaminase can decrease the concentration of hypoxanthine and xanthine.

4.4.2 Quantitative Analysis of the Proposed Methods

In this section we quantitatively evaluate the performance of our traversal and genetic algorithms.

4.4.2.1 Evaluation of the traversal method

The goal of this experiment is to evaluate the performance of the traversal method. For each pathway, we construct the goal state as the steady state of that pathway when no enzyme is knocked out. We, then, modify the pathway by eliminating an enzyme from that pathway and search the resulting pathway to find the solution that is closest to the goal state. For each pathway, we create one query for each enzyme. Thus, we have as many queries as the number of enzymes. We report the average values over all queries for each pathway.

The experiment described above is based on the following biological intuition. An organism is often healthy if all of its enzymes function well. This corresponds to the case when no enzymes are knocked out. Thus, the corresponding steady state is the goal state. An organism may suffer from a disease if an enzyme malfunctions. In this case the query will correspond to a pathway that has malfunctioning enzymes. We aim to find an enzyme set whose knockout leads it back as close to the healthy state as possible.

We did not compare the traversal algorithm to exhaustive search for bigger pathways as the exhaustive search requires weeks to months even for a single query. For example, the exhaustive method will cost days when the number of enzymes is more than 30. In Figure 4-6 and Figure 4-7, 'E' followed by a number shows the number of enzymes in the pathway. Figure 4-6 shows the average SD of the solutions computed by the traversal method and those of the exhaustive search algorithm. The results show that the SD values of the traversal method and those of the exhaustive solution are almost identical. Thus, the traversal method is a good approximation to the optimal solution. Figure 4-7 presents the average running time of the traversal method as compared to the exhaustive search algorithm. The running time of exhaustive search is 1.2 to 11 times that of the traversal method. However, it is also clear that the running time of the traversal method increases exponentially with the number of enzymes. For example, in Glycine, serine and threonine metabolism, where the number of enzymes is 32, the running time of the traversal method is more than two days. Therefore, it is impractical for very large pathways, although it can be used to solve larger sized problems than the exhaustive search.

Our final experiment in this section evaluates the variation of the number of enzymes knocked out in the top results found by our traversal algorithm. We measure this as follows. We use the Urea cycle and metabolism of amino groups in this experiment. We run a query after deleting each of the 21 enzymes in this network. We find the top ten solutions for each of the 21 queries (i.e., 210 solutions in total). Figure 4-8 shows the distribution of the number of enzymes

knocked out in these solutions. The results show that the number of enzymes deleted can vary. However, they cluster around several values (in this case two values). This resembles a mixed Gaussian distribution.

4.4.2.2 Evaluation of the genetic algorithm

This experiment evaluates the performance of our genetic algorithm. Similar to the evaluation of the traversal method, we design the experiment with pathways up to 84 enzymes. We obtain the pathways that have more than 52 enzymes by combining multiple pathways. For example, Pathway 00230+00790 denotes the pathway obtained by combining 00230 and 00790. For each of these pathways, we created one query for each enzyme similar to the previous section. We report the average values over all queries for each pathway.

The traversal algorithm is impractical for such large pathways due to the exponential time complexity. We, therefore, implement a truncated version of the traversal algorithm. This version truncates all the nodes deeper than $L$ levels, where $L$ is a given parameter. Choosing an appropriate value for $L$ makes the execution time of the truncated method suitable for pathways with a large number of enzymes. For example, the number of enzymes in the Glycolysis/Gluconeogenesis pathway is 27. In the worst case it generates $2^{27}$ nodes, requiring an execution time of 15+ hours. In order to bound the running time of our traversal algorithm for large pathways, we truncated the depth of the search after a fixed depth $L$. We chose $L$ to be 20 or 23 as the running time quickly becomes impractical for larger $L$.

Table 4-4 presents the SD value and the running time of the traversal method and the genetic algorithm for different pathways. $D_{20}$ and $T_{20}$ denote the average SD of the best solution and the average time requirement for multiple queries using the truncated method. $D_{tral}$ and $T_{tral}$ denote the average SD and the average running time for the traversal method. $D_{ga}$ and $T_{ga}$ denote the average SD and the average running time for the genetic algorithm respectively. All times are in seconds. $D_0$ denotes the average SD between the steady state of the query pathway

and the goal state. $\infty$ denotes that the running time is more than one day. ? denotes that no SD values can be computed within one day. Figure 4-6 had already demonstrated that our traversal algorithm finds near optimal solutions for small sized pathways. The results in this table show that the performance of the genetic algorithm is comparable to the traversal method. Thus, it finds accurate results. For example, in Metabolism of xenobiotics by cytochrome P450, SD values and running time are similar for both methods. In Urea cycle and metabolism of amino groups, the genetic algorithm runs much faster while it obtains worse SD values than the traversal method. When the size of pathways increases, the performance of the genetic algorithm is much better than the traversal method.

In Table 4-4, the difference between the initial damage $D_0$ and $D_{20}$ shows how much the truncated traversal algorithm reduces the distance between the initial and the goal state. The difference between $D_{20}$ and $D_{ga}$ shows the amount of improvement obtained by the genetic algorithm over the truncated traversal method. These results show that the genetic algorithm generates significantly better solutions (lower damage values) as compared to the truncated traversal method for all the cases. The genetic algorithm found solutions that have SD values that are 2% to 39% lower than those found by the truncated traversal algorithm. In addition, the time requirements of the genetic algorithm were comparable to or better than those of the truncated traversal method. For pathway 00010, the genetic algorithm generated on average an 8.06% improvement over the traversal method. The maximum improvement in these experiments was 39%. We have similar comparative gains for other pathways.

One can argue that the effectiveness of the traversal method can be limited due to the limited number of levels. To evaluate the tradeoff between the accuracy and running time of the truncated algorithm better, we increase the depth of the traversal algorithm until it spends at least as much time as the genetic algorithm for all the pathways. For this purpose, we tested the truncated algorithm for 23 levels. Table 4-5 shows the comparison between the genetic

algorithm and the traversal method with 23 levels for the largest pathways. From Table 4-5, the running time of the truncated algorithm with the additional 3 levels is up to an order of magnitude higher. However, the accuracy only improved by a small amount and is much lower than that of the genetic algorithm. It is worth noting that the genetic algorithm required considerably less time for most of these cases. Therefore, the genetic algorithm is a good choice for the enzymatic target identification problem for very large pathways.

We apply our genetic algorithm to the large metabolism. For the energy metabolism and amino acid metabolism, our genetic algorithm takes less than one hour. Even for the whole metabolism, our genetic algorithm runs in less than twelve hours. Details are in Table 4-6.

The genetic algorithm has several parameters. First, we test the impact of the mutation rate on the SD values. Table 4-7 presents the average SD values with different mutation rates in several pathways. In Table 4-7, the average SD values do not vary much with different mutation rates. Thus, we select a mutation rate of 0.04 as we observed slightly better SD values with this choice in our experiments.

In Step 1 of the genetic algorithm, to estimate $\mu$, we search the first 10 levels of the search space to get the top 20 solutions. Table 4-8 presents the estimated $\mu$ values and the time cost for different levels for the Glycolysis/Gluconeogenesis pathway. As we increase the number of levels the value of $\mu$ increases monotonically. This is because each level suggests knocking out new enzymes on top of already knocked out ones. However, although the number of solutions increases exponentially as we look at deeper levels, the estimated value of $\mu$ grows slowly. Between levels 8 and 10 we estimate the actual value of $\mu$ correctly.

Table 4-8 also shows that traversing more levels increases the running time exponentially. In practice, we have observed that there is no improvement in the SD value found by our genetic algorithm as we traverse more levels to estimate $\mu$. Also the running time grows exponentially

with the number of levels. Therefore, in order to balance efficiency and accuracy, we traverse only 10 levels for practical purposes.

4.4.2.3 Comparison to an existing genetic algorithm method

Our last experiment compares the performance of our genetic algorithm to a recent work. Patil et al. presented an evolutionary programming method for finding optimal gene deletion strategies [72]. Their method uses a genetic algorithm to generate a population of random solutions. This method can be applied to our enzymatic target identification problem directly. We implemented Patil's method in C++. We compare our genetic algorithm with Patil's method in accuracy. Table 4-9 shows the SD value of Patil's method and our genetic algorithm for seven pathways and large metabolism. $D_{Pat}$ and $D_{ga}$ denote the average SD for Patil's and our method respectively. Similarly, $NE_{Pat}$ and $NE_{ga}$ denote the average number of enzymes knocked out using Patil's method and our genetic algorithm respectively.

Our method consistently outperforms Patil et al.'s method in all test cases. The SD value of Patil et al.'s method is 40% to 3000% more than that of our method. One obvious question is whether our method finds better SD values at the expense of knocking out more enzymes than Patil et al.'s method. The last two columns of Table 4-9 show that Patil et al.'s method is knocking out up to 20.9 times more enzymes than our algorithm on the average. We conclude that our algorithm is superior both in terms of finding a solution close to the goal state and the cost in doing that.

The reason behind the success of our method over Patil et al.'s is that our method randomly knocks out each enzyme using a different probability distribution. This distribution depends on the likelihood that each enzyme is a part of a good solution. Furthermore, unlike Patil et al.'s method, our crossover strategy optimizes the child solution created at each generation.

4.5 Discussion

The goal of the enzymatic target identification problem is to identify the set of enzymes whose knockouts lead to a steady state of the metabolic pathway that is close to a goal or desired state. We develop a novel distance measure, State-distance (SD), that measures the damage of the knockouts of a set of enzymes as a function of the deviation of the entry in the steady state after their knockouts from that in the goal state. Using this measure, we develop two algorithms that are based on search space traversal and genetic algorithms.

Experiments using the metabolic pathways of H. sapiens and E. coli from KEGG show that our algorithms can be useful for numerous applications including metabolic engineering and biomedicine. Our traversal method is effective for pathways with up to 30-35 enzymes. Our genetic algorithm is effective for arbitrarily large pathways.

In Section 4.3, we discuss the enzyme deletion strategies for the enzymatic target identification problem. However, genetic manipulations can lead to partial reduction of the enzyme concentrations rather than knocking them out entirely. One way to deal with this is to represent an enzyme status by an integer variable in {0, 1, 2, ..., $m$} that shows the level of enzyme activity. Here, the enzyme can exist in $m + 1$ statuses. 0 means that the enzyme is inactive. 1 to $m$ represent the activity level of the enzyme, with $m$ being the highest activity. In this situation, our methods, the traversal method and the genetic algorithm, can still work. The genetic algorithm can be applied to this situation directly. For the traversal method, the search space is no longer a binary tree. Each node has $m + 1$ children, which denote the $m + 1$ enzyme statuses. The filtering and prioritization strategy can also be applied to this situation. We expect that this will hurt the running time of the traversal method significantly.
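The growth of the search space under multi-level enzyme statuses can be illustrated with a short sketch. It is only an illustration under stated assumptions: the function names are hypothetical, and it assumes that each level of the search tree fixes the status of one further enzyme, which may differ from the tree layout used by the traversal method.

```python
def children(partial_assignment, m):
    """Extend a partial assignment by one enzyme, once for each status 0..m.

    partial_assignment is a tuple of statuses already fixed (hypothetical
    representation); 0 means inactive and m means full activity.
    """
    return [partial_assignment + (status,) for status in range(m + 1)]


def leaf_count(n_enzymes, m):
    """Number of complete assignments when every enzyme has m + 1 statuses."""
    return (m + 1) ** n_enzymes


if __name__ == "__main__":
    print(children((1,), 2))
    # m = 1 is the knockout-only (binary) case.
    for m in (1, 2, 3):
        print(m, leaf_count(27, m))
```

With $m = 1$ the count reduces to the $2^N$ leaves of the binary tree, so any $m > 1$ enlarges the space by a factor of $((m+1)/2)^N$, which is why the traversal method is expected to slow down.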

Figure 4-1. Graph representation of a metabolic pathway with three reactions $R_1$, $R_2$, $R_3$, two enzymes $E_1$ and $E_2$, and nine compounds $C_1, \ldots, C_9$. Dashed lines show the impact of knocking out enzyme $E_1$.

Figure 4-2. Flux distribution of a hypothetical pathway.

Table 4-1. An example showing the generation of a child Ch from two hypothetical parents F and M for a pathway that contains nine enzymes. ? denotes an undecided value.

      f1   f2   f3   f4   f5   f6   f7   f8   f9
F     0    1    1    0    1    0    0    1    0
      m1   m2   m3   m4   m5   m6   m7   m8   m9
M     1    1    0    1    1    0    1    0    0
      ch1  ch2  ch3  ch4  ch5  ch6  ch7  ch8  ch9
Ch    ?    1    ?    ?    1    0    ?    ?    0
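The child in Table 4-1 can be reproduced with a few lines of code. This is a minimal sketch with a hypothetical function name: it only copies the entries on which both parents agree and marks the rest as undecided; how the undecided entries are resolved afterwards is not shown here.

```python
def make_child(father, mother):
    """Copy entries where both parents agree; mark the others '?' (undecided)."""
    if len(father) != len(mother):
        raise ValueError("parents must cover the same enzymes")
    return "".join(f if f == m else "?" for f, m in zip(father, mother))


if __name__ == "__main__":
    child = make_child("011010010", "110110100")
    print(child)             # ?1??10??0, as in Table 4-1
    print(child.count("?"))  # 5 undecided enzymes
```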

Figure 4-3. Upper bound to the expected number of undecided enzymes for different $\mu$ and pathway sizes ($N = |E|$). The vertical bars show the standard deviation in each direction.

Table 4-2. The expected number of undecided enzymes for different pathways. #E denotes the number of enzymes.

Metabolic pathway                               #E    Exp. num.
Valine, leucine and isoleucine degradation      24    3.40
Glycolysis/Gluconeogenesis                      27    2.69
Glycine, serine and threonine metabolism        32    7.15
Purine metabolism                               52    1.72

Table 4-3. Metabolic pathways from KEGG that are used in our experiments in this chapter.

KEGG Id   Metabolic pathway                               #E     #R     #C
00980     Metabolism of xenobiotics by cytochrome P450    7      49     57
00670     One carbon pool by folate                       17     23     9
00220     Urea cycle and metabolism of amino groups       21     22     27
00310     Lysine degradation                              21     25     28
00280     Valine, leucine and isoleucine degradation      24     33     32
00010     Glycolysis/Gluconeogenesis                      27     31     25
00260     Glycine, serine and threonine metabolism        32     33     37
00230     Purine metabolism                               52     92     65
-         Energy Metabolism                               50     46     60
-         Amino Acid Metabolism                           195    317    305
-         whole metabolism                                640    1176   1067

Figure 4-4. Glycolysis/Gluconeogenesis for E. coli. The enzymes highlighted in green are the enzymes that exist in E. coli.

Figure 4-5. Phenylalanine, tyrosine and tryptophan biosynthesis for E. coli. The enzymes highlighted in green are the enzymes that exist in E. coli.

Figure 4-6. Average SD values of the traversal method and exhaustive search for different pathways over multiple queries.

Figure 4-7. The average running time (in seconds) of the traversal method and exhaustive search over multiple queries for different pathways.

Figure 4-8. Distribution of the number of enzymes for Urea cycle and metabolism of amino groups.

Table 4-4. Comparison of the truncated traversal method (maximum number of levels = 20) and the genetic algorithm for different pathways. ? denotes that no SD values can be computed within one day. $\infty$ denotes a running time of more than one day.

                        SD                                   Time (sec)
Pathways                D_0      D_20     D_tral   D_ga      T_20    T_tral   T_ga
00980 (#E 7)            0.808    0.75     0.75     0.75      0.83    0.83     1
00220 (#E 21)           0.970    0.882    0.53     0.88      28      40       2
00280 (#E 24)           2.611    2.596    2.581    2.581     20      265      2
00010 (#E 27)           2.687    2.556    2.454    2.370     166     8386     52
00260 (#E 32)           0.837    0.819    ?        0.797     162     ∞        38
00230 (#E 52)           11.218   1.160    ?        1.156     145     ∞        153
00230+00790 (#E 61)     6.848    1.497    ?        0.920     58      ∞        573
00230+00030 (#E 67)     8.590    3.682    ?        3.159     465     ∞        410
00230+00340 (#E 67)     5.504    1.274    ?        0.902     122     ∞        168
00230+00260 (#E 84)     6.342    1.322    ?        0.996     121     ∞        494

Table 4-5. The average SD value and the running time of the truncated traversal algorithm (with maximum number of levels = 23) and the Genetic Algorithm. The running time is reported in seconds.

                        SD                  Time
Pathways                D_23  | D_ga        T_23 | T_ga
00280 (#E 24)           2.581 | 2.581       144  | 2
00010 (#E 27)           2.525 | 2.370       987  | 52
00260 (#E 32)           0.819 | 0.797       1042 | 38
00230 (#E 52)           1.568 | 1.084       1366 | 153
00230+00790 (#E 61)     1.368 | 0.920       335  | 573
00230+00030 (#E 67)     3.552 | 3.159       3310 | 410
00230+00340 (#E 67)     1.242 | 1.064       828  | 168
00230+00260 (#E 84)     1.271 | 0.996       974  | 494

Table 4-6. The average running time of our Genetic Algorithm for large metabolism. The running time is reported in seconds.

Metabolism                          Time
Energy Metabolism (#E 50)           90
Amino Acid Metabolism (#E 195)      2794
whole metabolism (#E 640)           43262

Table 4-7. The average SD values with different mutation rates in several pathways.

            Mutation rate
Pathways    0.02     0.04     0.06
00010       2.370    2.370    2.370
00230       1.131    1.084    1.102

Table 4-8. The estimated $\mu$ values and the time cost for different levels for the Glycolysis/Gluconeogenesis pathway. The average number of knocked out enzymes in the optimal solution of this pathway is 2.2.

levels    estimated $\mu$ value    time (sec)
6         1.55                     19
8         2.19                     21
10        2.52                     29
12        3.08                     64
14        3.53                     119

Table 4-9. The average SD value and the average number of knocked-out enzymes (NE) of Patil's method and our Genetic Algorithm.

                                    SD                   NE
Pathways                            D_Pat  | D_ga        NE_Pat | NE_ga
00280 (#E 24)                       3.572  | 2.545       9.18   | 1.59
00010 (#E 27)                       4.207  | 2.370       10.59  | 2.19
00260 (#E 32)                       1.039  | 0.797       16.22  | 1.21
00230 (#E 52)                       5.968  | 1.084       19.57  | 4.47
00230+00030 (#E 67)                 13.996 | 3.159       28.47  | 3.83
00230+00340 (#E 67)                 4.698  | 1.064       28.31  | 3.59
00230+00260 (#E 84)                 5.754  | 0.996       35.72  | 3.27
Amino Acid Metabolism (#E 195)      2.408  | 0.737       95.1   | 1.3
whole metabolism (#E 640)           60.904 | 1.962       315.5  | 5.25

CHAPTER 5
ENZYMATIC TARGET IDENTIFICATION WITH DYNAMIC STATES

In Chapter 2, Chapter 3 and Chapter 4, we discuss strain optimization based on the steady state. That is, we discuss the problem of identifying the enzyme set whose knockouts lead the final steady state to satisfy the optimality constraints. In detail, in Chapter 4, we discuss the problem of identifying the set of enzymes whose knockouts lead the metabolism to a state that is close to a given goal state. In Chapter 3, we discuss the problem of identifying the enzyme set whose knockouts lead the final state to obtain the optimal objective function values. In Chapter 2, we discuss the problem of finding the set of enzymes whose knockouts lead the final state to stop the production of a given set of target compounds, while eliminating a minimal number of non-target compounds.

However, the biological system needs a considerable amount of time (e.g., several minutes, hours, or even days) to move from one steady state to another. The process between two steady states is significant. If there exist two different paths from the start state to the final state, the influence of these two paths on the whole biological system may be significantly different. If we want to increase the blood sugar concentration of a biological system, one path is to raise the blood sugar concentration gradually. Another way is to raise the blood sugar concentration along a sharp curve and then decrease it to the goal concentration. The first method may take a long time to reach the goal, whereas the second method may bring dangerous side effects. For example, the sugar concentration may reach a dangerously high level and cause the organism to die. Thus, it is necessary to consider the transient path when we identify the enzyme set to change the state of the biological system.

5.1 Motivation and Problem Definition

Metabolic networks show how enzymes and compounds interact through reactions. Enzymes catalyze reactions that transform a set of compounds into another set of compounds. The state of a metabolic pathway can be expressed as a vector, where each entry denotes the

concentration of a compound [106] or a flux [71] in the network at a given time. Steady state is the state that remains unchanged over time. A number of methods have been developed to compute the steady state of a given metabolism for different network models (see Voit's book for an overview of these models and methods [107]).

When an enzyme is inhibited, it cannot catalyze the reactions it is responsible for. As a result, the production of a set of compounds can change. Following from this observation, the enzymatic target identification problem aims to identify the set of enzymes whose knockouts lead the steady state of the metabolism close to a given goal state [93]. In the literature, this problem has been considered for a number of network models including Boolean [93, 94, 96] and stoichiometric [93] models.

The enzymatic target identification problem above and the existing solutions for this problem have a serious shortcoming. This is because they consider only the steady state of the given metabolic network. However, the biological system reaches the steady state over a period of time, after a sequence of changes to its current state. In other words, reaching a steady state is a process that involves observing many other states. Inhibition of two different sets of enzymes can lead to the same steady state in two different ways.

Figure 5-1 illustrates this on a hypothetical example by focusing on the concentration of a single compound. Both manipulations lead to the same steady state in the same time; however, they do this through different sequences of states. The area between the two plots shows the difference between two dynamic states $P_1$ and $P_2$ of one compound. The pattern of states observed while reaching a steady state is critical information that is ignored by current enzymatic target identification methods. For instance, if the blood sugar concentration increases or decreases too rapidly or if it goes above or below a threshold, this pattern may cause dangerous side effects. Thus, it is necessary to consider the dynamic process while solving the enzymatic target identification problem.
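The point made with Figure 5-1 can be illustrated numerically. The sketch below is purely hypothetical: the two trajectory functions, their rate constants, and the threshold value are assumptions chosen for illustration and do not come from any metabolic model in this work. Both trajectories settle at the same goal concentration, but only one of them crosses the threshold on the way.

```python
import math


def gradual(t, goal, rate=0.5):
    """Hypothetical trajectory that approaches the goal monotonically."""
    return goal * (1 - math.exp(-rate * t))


def overshooting(t, goal, rate=0.5, bump=1.5):
    """Hypothetical trajectory with a transient peak that decays over time."""
    return gradual(t, goal, rate) + bump * goal * t * math.exp(-rate * t)


def exceeds_threshold(trajectory, threshold):
    """True if any observed state of the trajectory is above the threshold."""
    return max(trajectory) > threshold


if __name__ == "__main__":
    times = range(40)
    p1 = [gradual(t, 1.0) for t in times]
    p2 = [overshooting(t, 1.0) for t in times]
    print("final states:", round(p1[-1], 6), round(p2[-1], 6))
    print("P1 exceeds 1.2:", exceeds_threshold(p1, 1.2))
    print("P2 exceeds 1.2:", exceeds_threshold(p2, 1.2))
```

A method that compares only the final entries of `p1` and `p2` would treat the two manipulations as equivalent, which is the shortcoming described above.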

We use the term dynamic state to describe the sequence of states observed over time. We consider the dynamic state as a vector where the $i$th entry denotes the state of the network at the $i$th time instance. There are several models to describe the dynamic state of the biological system. We discuss them in Section 5.2. However, to the best of our knowledge, there is no published study that considers the dynamic states for the enzymatic target identification problem. Following from this observation, we focus on the following problem in this chapter.

Problem statement (dynamic enzymatic target identification problem). Given a goal dynamic state of a set of compounds in a metabolic network, we aim to identify the set of enzymes whose inhibition leads to a dynamic state of these compounds that is as close to the goal dynamic state as possible.

Contributions: We address the dynamic enzymatic target identification problem in this chapter. In order to solve this problem, it is necessary to provide a measure to evaluate the difference between two dynamic states. We consider three alternative measures to cover a broad range of distance functions. The first one measures the area between the curves defined by the two dynamic states. For example, in Figure 5-1, the shaded region corresponds to the distance between dynamic states $P_1$ and $P_2$. This measure makes the simplifying assumption that the metabolism reaches the steady states in the same amount of time when different enzyme sets are inhibited. Our second measure eliminates this assumption by allowing the dynamic states to stretch along the time dimension by arbitrary amounts.

Figure 5-2 depicts this on a simple pattern representing a hypothetical dynamic state. We build a dynamic programming solution to compute this distance function. Our third measure further generalizes the distance measure by allowing scaling and shifting of dynamic states (see Figure 5-2). This generalization is motivated by the fact that two different patterns can have similar trends while having significantly different values. We build an iterative algorithm that

PAGE 122

finds the distance between two dynamic states when one of them is allowed to stretch along the time domain as well as shift/scale on the dimensions corresponding to the flux values.

We develop a branch and bound strategy that uses these distance measures to solve the dynamic enzymatic target identification problem. The basic search strategy follows the standard OPMET algorithm for Boolean networks [96]. It considers the search space as a hierarchical tree where each node of this tree corresponds to a set of enzymes to be inhibited. The fundamental difference is that each node now corresponds to a dynamic state. This difference introduces additional computational cost on the basic OPMET algorithm in two different ways. First, computing the dynamic state can be significantly harder than just computing the steady state. Second, the time complexity of computing the distance between two dynamic states can be quadratic in the number of time instances, while that for steady states alone is constant. We deal with these two challenges by developing a partitioning strategy as follows. Instead of creating the entire dynamic state, we quickly create a short prefix of it. We then use this prefix to compute a lower bound to the distance between the entire dynamic states. This bound helps us prune unpromising solutions in the search space. We extend this prefix by computing more values in the dynamic state if needed. As a longer prefix of the dynamic state becomes available, we improve the lower bound using the new values for further pruning of the search space.

Our experiments demonstrate that our method is 85-100% accurate when a single enzyme is inhibited. It is 65-75% accurate when two enzymes are inhibited. Our partitioning strategy improves the running time of our algorithm over the basic OPMET strategy by a factor of 4.4 to 14.

The rest of the chapter is organized as follows. Section 5.2 discusses the related work. Section 5.3 defines three dynamic distance functions to measure the difference between two dynamic states. Section 5.4 discusses how we search the set of possible solutions and our partitioning strategy. Section 5.5 presents experimental results. Section 5.6 concludes the chapter.

PAGE 123

5.2 Related Work

Existing methods for enzymatic target identification use three models to explain how metabolism works, namely Boolean, linear and non-linear models. A detailed discussion of these methods is available in Song et al. [93]. We briefly summarize some of them here. Sridhar et al. and Song et al. considered a Boolean model of the enzymatic target identification problem [94, 96]. In their version, each entry of the state denotes whether the corresponding enzyme is present or not. Flux Balance Analysis (FBA) [7, 29, 48, 90] uses a set of linear equations to describe a given metabolic network. Methods that use this system often use Linear Programming (LP), Integer Linear Programming (ILP) or genetic algorithms to solve the enzymatic target identification problem. OptKnock [9] and the method by Patil et al. [72] are two examples of the algorithms in this class [9]. Though the linear model works well for some cases, there exist more complex non-linear models to describe the metabolism. These non-linear models can simulate the cell system better than the linear model. For example, S-systems [86, 107] and the Generalized Mass Action (GMA) model [73, 107] belong to this category. Most of the existing methods are well suited to linear models. Thus they do not work when these more complex models are used to compute the steady state of the metabolic network. Song et al. proposed a genetic algorithm solution for non-linear models [93]. All the above-mentioned methods consider only the steady state of the metabolism. They ignore the sequence of states the underlying network visits while reaching the steady state. As a result, although their solution may be optimal at the steady state, the intermediate states of their solutions can be undesirable.

There are several models that simulate the dynamic state of a given metabolic network. For example, Dynamic Flux Balance Analysis (DFBA) extended the traditional FBA to describe the rate of change of the fluxes over a period of time [62, 64]. DFBA incorporates a time parameter, which allows it to predict metabolite concentrations. It considers the entire time period and builds a non-linear programming problem. It separates the time into several intervals. For each interval it

PAGE 124

employs a linear-programming method to estimate the flux values during that interval. Integrated dynamic FBA (idFBA) simulates the integrated system including signaling, metabolic and regulatory networks [56]. Similar to DFBA, idFBA separates the time into several intervals. For each interval, it applies FBA to compute the flux values. From these values, it decides which reactions will take place during the next interval. The Integrated FBA (iFBA) model builds a dynamic simulation among metabolic, regulatory and signaling networks [16] along the same lines as DFBA and idFBA. It first separates the time into several intervals. It then applies ordinary differential equations (ODEs) and a Boolean regulatory model to constrain the FBA linear programming problem. It updates the biomass and external metabolite concentrations for use in subsequent time steps. All these methods aim to find the dynamic state of a given metabolic network. They, however, do not consider the dynamic enzymatic target identification problem, which is the focus of this paper.

In this paper, we use the Generalized Mass Action (GMA) model [73, 107] to model metabolic networks, as this is one of the most accurate models. It is a generalization of the S-systems of equations. We employ DFBA to compute the dynamic state of the given metabolic network. It is worth noting that one can replace these with another mathematical model for computing the dynamic state with little or no change to the rest of this paper.

5.3 Calculating the Distance between Transient Paths

In order to solve the dynamic enzymatic target identification problem, the first task is to measure the distance between two dynamic states. We first present the basic notation we use in this paper in Section 5.3.1. We then describe three distance measures. Section 5.3.2 discusses the exact distance. Section 5.3.3 discusses how we can measure the distance when we allow flexibility in the amount of time it takes for the metabolic network to reach the steady state. Section 5.3.4 describes the pattern distance, which allows the dynamic state to be scaled or shifted by arbitrary amounts.

PAGE 125

5.3.1 Notation

We start by describing the notation that will be used in the rest of the paper. Assume that a given metabolic network contains $M$ fluxes. Let us denote the fluxes in the given metabolic network with $X_1, X_2, \dots, X_M$. Let us denote the amount of instantaneous change in $X_i$ with $\dot{X}_i$. Assume that $X_i$ takes part in $N$ reactions. The GMA model expresses each flux $X_i$ using the equation

$$\dot{X}_i = \sum_{k=1}^{N} \gamma_{i,k} \prod_{j=1}^{M} X_j^{f_{i,j,k}}.$$

Here, the constant $\gamma_{i,k}$ is the speed of the $k$th reaction and the constant $f_{i,j,k}$ is the rate of contribution of the $j$th flux to the $k$th reaction. For a given set of equations of this form, when the initial values (i.e., the values at time zero) of all the $X_i$s are provided, we can compute the value of each $X_i$ at a given time $t$ by simulating this process or by solving these equations.

We represent the dynamic state of $X_i$ using two vectors $X_{i,T}$ and $X_{i,V}$. The first one, $X_{i,T} = [xt_{i,1}, xt_{i,2}, \dots, xt_{i,n}]$, shows the time points at which we compute the value of $X_i$ until the metabolic network reaches the steady state, $\forall j\; xt_{i,j}$
PAGE 126

5.3.2 Exact Distance

Our first distance function measures the area between the two curves defined by the two given dynamic states. We name this measure the exact distance. The area of the shaded region in Figure 5-1 shows the exact distance on a hypothetical example.

Given a goal dynamic state $(G_V, G_T)$ and the dynamic state $(X_V, X_T)$, this measure assumes that $G_T$ and $X_T$ have exactly the same entries. Although this is a strong assumption, we ensure that it is satisfied as follows. For all the values in $G_T$ that are missing in $X_T$, we insert those values in $X_T$ and compute the value of $X_V$ at those new time points using the GMA model as described in Section 5.3.1. For all the values in $X_T$ that are missing in $G_T$, we insert those values in $G_T$. We follow a different procedure to compute the $G_V$ values for the new time points, as there is no metabolic network available for the goal state. We simply use linear interpolation of the existing time points and values in $G_T$ and $G_V$ to compute the new entries for $G_V$.

Once we ensure the equality of the vectors $G_T$ and $X_T$, we compute the exact distance between the two dynamic states as:

$$\sum_i \frac{\left(|gv_i - xv_i| + |gv_{i+1} - xv_{i+1}|\right)\,|xt_{i+1} - xt_i|}{2}.$$

Geometrically, this equation approximates the area between the two curves by splitting the region between them into trapezoids between consecutive time points. Each summation term in the above formulation corresponds to the area of one trapezoid.

The exact distance measure can be computed efficiently in $O(n)$ time, where $n$ is the number of time points. It is, however, restrictive, as it returns a small value only if the two dynamic states have similar values at similar times. In the following sections, we will discuss how we relax this constraint.
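The trapezoid computation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this work: the function names (`interpolate`, `exact_distance`) are hypothetical, and, for simplicity, missing values of both curves are filled in by linear interpolation, whereas the text recomputes the missing values of X with the GMA model and interpolates only the goal state.

```python
from bisect import bisect_left


def interpolate(times, values, t):
    """Linearly interpolate a sampled curve at time t (clamped at both ends)."""
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    k = bisect_left(times, t)  # first index with times[k] >= t
    if times[k] == t:
        return values[k]
    w = (t - times[k - 1]) / (times[k] - times[k - 1])
    return values[k - 1] + w * (values[k] - values[k - 1])


def exact_distance(g_t, g_v, x_t, x_v):
    """Sum of trapezoid areas between two curves on their merged time grid.

    Assumption: X is interpolated here; the text instead re-simulates X
    at the missing time points using the GMA model.
    """
    grid = sorted(set(g_t) | set(x_t))
    g = [interpolate(g_t, g_v, t) for t in grid]
    x = [interpolate(x_t, x_v, t) for t in grid]
    total = 0.0
    for i in range(len(grid) - 1):
        left = abs(g[i] - x[i])
        right = abs(g[i + 1] - x[i + 1])
        total += (left + right) * (grid[i + 1] - grid[i]) / 2.0
    return total
```

The loop visits each pair of consecutive time points once, which matches the linear running time stated above once the two time vectors have been merged.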

PAGE 127

5.3.3 Time-warping Distance

The exact distance function requires the two states to have similar flux values for the same period of time. This restriction, however, will fail to find the similarities between two pathways under the following scenario. It is possible to have two different metabolisms that have similar sets of reactions but differ in the speed at which the reactions take place. In such scenarios, the dynamic states of the two metabolisms can have similar values, but the time it takes to reach these values will differ. We say that such dynamic states are stretched along the time axis. The left-most dynamic states in Figure 5-2 illustrate a pair of dynamic states with one stretched.

Our second distance function uses the time-warping distance function to address problems caused by metabolic manipulations that change the speed of the reactions. The dynamic time warping technique is widely used in data mining [4, 51, 114], gesture recognition [31], robotics [87] and speech processing [79]. Here, we apply the dynamic time warping technique to evaluate the distance between two dynamic states. Assume that we are given a goal dynamic state $G = (G_V, G_T)$ and the dynamic state $X = (X_V, X_T)$ as described in Section 5.3.1. The time-warping distance aligns the two dynamic states to find a mapping between their time points. Figure 5-3 illustrates the alignment of two hypothetical dynamic states and how the time-warping distance stretches them to bring their similar values close to each other. Once the two dynamic states are aligned, this measure computes the distance as the area between their curves after the dynamic state $(X_V, X_T)$ is stretched along the time dimension to match its time points to those of the goal dynamic state.

We use a dynamic programming method to align the two dynamic states as follows. Let us denote the distance between the $i$th value of $G$ and the $j$th value of $X$ with $d(G, i, X, j)$. We discuss the computation of $d(G, i, X, j)$ later in this section.

Let us also denote the distance between the first $i$ values of $G$ and the first $j$ values of $X$ after their optimal alignment (i.e., the alignment with the smallest distance) with $\Delta(i, j)$. We compute

PAGE 128

$\Delta(i, j)$ as

$$d(G, i, X, j) + \min\{\Delta(i-1, j-1),\ \Delta(i-1, j),\ \Delta(i, j-1)\}.$$

The first of the three scenarios in the $\min$ function corresponds to the case that the first $i-1$ values of $G$ are aligned to the first $j-1$ values of $X$. The second scenario corresponds to the case that the first $i-1$ values of $G$ are aligned to the first $j$ values of $X$. In other words, $X$ is stretched along the time axis to match its $j$th entry to both the $i$th and $(i-1)$th entries of $G$. The last scenario corresponds to the case that the first $i$ values of $G$ are aligned to the first $j-1$ values of $X$. That is, $X$ is contracted along the time axis to match both of its $(j-1)$th and $j$th entries to the $i$th entry of $G$. We compute the time-warping distance between $G$ and $X$ as

$$Dis(G, X) = \sqrt{\Delta(m, n)}.$$

We skipped two important details in the computation of the time-warping distance so far. The first one is the distance between two time points of the two dynamic states, $d(G, i, X, j)$. We compute this value as

$$(gv_i - xv_j)^2 \left( \frac{gt_{i+1} - gt_{i-1}}{2\,(gt_m - gt_1)} + \frac{xt_{j+1} - xt_{j-1}}{2\,(xt_n - xt_1)} \right) / 2.$$

Briefly, this function approximates the area between the two curves when the $j$th interval of $X$ is moved to the $i$th interval of $G$ at that time interval. It approximates this area as the sum of two triangles. The first triangle defines the area in the time interval $[gt_{i-1}, gt_{i+1}]$. The second one defines that in the time interval $[xt_{j-1}, xt_{j+1}]$.

The second detail we omitted is the initialization of the $\Delta(i, j)$ matrix. We initialize this matrix for the cases when at least one of the two dynamic states contains no values as follows.

Case 1: $\Delta(0, 0) = 0$
Case 2: $\Delta(i, 0) = \Delta(i-1, 0) + gv_i^2\, \dfrac{gt_{i+1} - gt_{i-1}}{4\,(gt_m - gt_1)}$
Case 3: $\Delta(0, j) = \Delta(0, j-1) + xv_j^2\, \dfrac{xt_{j+1} - xt_{j-1}}{4\,(xt_n - xt_1)}$
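The recurrence and its three initialization cases can be sketched as follows. This is an illustrative Python sketch rather than the implementation used in this work. The names `half_width`, `local_cost` and `time_warping_distance` are hypothetical, the cumulative alignment table is stored in `acc`, and the handling of the first and last time points (where the neighbouring time point does not exist) is an assumption: the missing neighbour is replaced by the end point itself. Each state is assumed to span a non-zero time range.

```python
import math


def half_width(times, i):
    """Normalized width (t[i+1] - t[i-1]) / (2 * (t[last] - t[first])).

    i is 0-based. Assumption: at the ends the missing neighbour is
    clamped to the end point, which the text does not specify.
    """
    last = len(times) - 1
    nxt = times[min(i + 1, last)]
    prv = times[max(i - 1, 0)]
    return (nxt - prv) / (2.0 * (times[last] - times[0]))


def local_cost(g_t, g_v, i, x_t, x_v, j):
    """Cost of matching the i-th value of G to the j-th value of X."""
    diff = g_v[i] - x_v[j]
    return diff * diff * (half_width(g_t, i) + half_width(x_t, j)) / 2.0


def time_warping_distance(g_t, g_v, x_t, x_v):
    m, n = len(g_v), len(x_v)
    # acc[i][j]: cost of the best alignment of the first i values of G
    # with the first j values of X (row 0 / column 0 = empty state).
    acc = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):  # Case 2: X is empty
        acc[i][0] = acc[i - 1][0] + g_v[i - 1] ** 2 * half_width(g_t, i - 1) / 2.0
    for j in range(1, n + 1):  # Case 3: G is empty
        acc[0][j] = acc[0][j - 1] + x_v[j - 1] ** 2 * half_width(x_t, j - 1) / 2.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            best = min(acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1])
            acc[i][j] = local_cost(g_t, g_v, i - 1, x_t, x_v, j - 1) + best
    return math.sqrt(acc[m][n])
```

Filling the table takes time proportional to the product of the two state lengths, which is the quadratic cost referred to earlier in the chapter. A state whose values are repeated along the time axis, such as [0, 0, 1, 1, 0] against [0, 1, 0], aligns at zero cost under this sketch.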

PAGE 129

The first case indicates that both dynamic states have no value. The last two cases indicate that exactly one of the two dynamic states has values. In this case, the distance is the area under the nonempty curve.

5.3.4 Pattern Distance

We have discussed how to deal with the differences in the dynamic states due to differences in the reaction speeds using the time-warping distance. Two similar networks, however, can have different dynamic states even under the time-warping distance when the values of one dynamic state are shifted or scaled. The dynamic states in the middle and on the right of Figure 5-2 illustrate this event. This kind of alteration often happens due to external factors, such as increasing or decreasing the concentrations of a set of compounds. It can also be caused by inaccuracies in measurements. Our last distance measure computes the smallest distance between two dynamic states when one of them is shifted, scaled and stretched to make it as close to the other as possible. We call this measure the pattern distance.

Assume that we are given a goal dynamic state $G = (G_V, G_T)$ and the dynamic state $X = (X_V, X_T)$ as described in Section 5.3.1. Let $\alpha$ and $\beta$ be two real numbers. Scaling the goal state with $\alpha$ moves $G$ to $(\alpha G_V, G_T)$. Similarly, shifting $G$ by $\beta$ moves $G$ to $(G_V + \beta, G_T)$.

If we are given the optimal scaling and shifting coefficients $\alpha$ and $\beta$, we can find the distance between $G$ and $X$ as the time-warping distance between $X$ and $(\alpha G_V + \beta, G_T)$ using the dynamic programming method in Section 5.3.3. This, however, is not feasible, as the values of $\alpha$ and $\beta$ are not available. Conversely, given an alignment of the time points of $G$ and $X$, we can compute, through algebraic manipulations, the scaling and shifting coefficients $\alpha$ and $\beta$ that minimize the distance between them. We discuss how this can be done later in this section. However, this is not feasible either, as the alignment is unknown.

We develop an iterative algorithm to solve this problem (see Algorithm 5.1). Each iteration of our algorithm contains two phases. In the first phase, we fix the values of $\alpha$ and $\beta$ and optimize

PAGE 130

Algorithm 5.1 Iterative Algorithm for Pattern Distance
Input: Dynamic states $G$ and $X$
Initialize $\alpha = 1$ and $\beta = 0$
Phase 1: Use the dynamic programming method of Section 5.3.3 to find the alignment of $G$ and $X$
Phase 2: Compute the $\alpha$ and $\beta$ values that minimize the distance for the current alignment. Update $G_V$ as $\alpha G_V + \beta$
Go to Phase 1 until a given number of iterations is reached or the distance no longer improves.

the alignment. In the second phase, we fix the alignment and compute the optimal $\alpha$ and $\beta$ values for that alignment. Each phase guarantees that the resulting distance is less than or equal to that in the previous phase/step. Thus, the distance value our algorithm computes never increases across iterations. Since the distance is also bounded below by zero, the algorithm is guaranteed to converge to a (possibly local) minimal distance. Next, we focus on Phase 2 of our algorithm and explain how to compute the optimal values of $\alpha$ and $\beta$.

At the end of Phase 1 of Algorithm 5.1, we have an alignment between $G$ and $X$. Recall that this alignment can map one time point of one dynamic state to several consecutive time points of the other. For instance, Figure 5-3 shows an illustration of two dynamic states. Each square or circle shows a time point. The time points of one of the dynamic states are labeled with the numbers 1 to 6. Those of the other are labeled with the letters a to e. One of the dynamic states is shifted down slightly to make it easier to visualize. Dashed lines in the top figure show the actual pattern of the dynamic states. Solid lines show the time-warping alignment of the two states. In the bottom figure, the time points that are aligned with multiple time points from the other state are duplicated and stretched to match them along the time axis. For example, time point a is mapped to two time points of the other dynamic state. To keep our notation for the rest of this section simple, we duplicate such time points (see the bottom figure in Figure 5-3). Let us call the resulting dynamic states $G' =$

PAGE 131

$(G'_V, G'_T)$ and $X' = (X'_V, X'_T)$, where $G'$ and $X'$ both have the same number, $n$, of time points. In other words, the $i$th time point of $G'$ is aligned to the $i$th time point of $X'$. Formally, we will use the following notation for $G'$ and $X'$:

$X'_V = [xv'_1, xv'_2, \dots, xv'_n]$
$X'_T = [xt'_1, xt'_2, \dots, xt'_n]$
$G'_V = [gv'_1, gv'_2, \dots, gv'_n]$
$G'_T = [gt'_1, gt'_2, \dots, gt'_n]$

We define two functions over the time intervals of $G'$ and $X'$ as

$$\delta(gt'_i) = \frac{gt'_{i+1} - gt'_{i-1}}{2\,(gt'_n - gt'_1)} \quad\text{and}\quad \delta(xt'_i) = \frac{xt'_{i+1} - xt'_{i-1}}{2\,(xt'_n - xt'_1)}.$$

The pattern distance between $G$ and $X$ is

$$Dis(G, X) = \sqrt{\sum_i \left(\alpha\, gv'_i + \beta - xv'_i\right)^2 \left(\delta(gt'_i) + \delta(xt'_i)\right) / 2}.$$

We solve the following equations to get the values of $\alpha$ and $\beta$:

$$\frac{\partial Dis(G, X)}{\partial \alpha} = \frac{\partial Dis(G, X)}{\partial \beta} = 0.$$

We skip the individual algebraic steps of how we solve these equations and present the final result here. The values of $\alpha$ and $\beta$ are

$$\alpha = \frac{C - B\,D}{A - B^2} \quad\text{and}\quad \beta = D - \alpha B,$$

where

$$A = \sum_i \left(\delta(gt'_i) + \delta(xt'_i)\right) gv'^{\,2}_i, \qquad B = \sum_i \left(\delta(gt'_i) + \delta(xt'_i)\right) gv'_i,$$

PAGE 132

$$C = \sum_i \left(\delta(gt'_i) + \delta(xt'_i)\right) xv'_i\, gv'_i, \quad\text{and}\quad D = \sum_i \left(\delta(gt'_i) + \delta(xt'_i)\right) xv'_i.$$

5.4 Methods

So far we have discussed how to compute the distance between two dynamic states. The next step is to identify the set of enzymes whose elimination leads to a dynamic state that is as close to the goal dynamic state as possible with respect to this distance function. One way to solve this problem is to exhaustively traverse all possible subsets of enzymes, examine the distance value of each and pick the subset with the lowest one. However, this is not feasible, as the number of subsets is exponential in the number of enzymes.

OPMET solves this problem using a branch and bound algorithm for a simplified Boolean model of metabolic networks [96]. In this paragraph, we take a detour and summarize the OPMET algorithm, as we use it in this paper. Unlike the problem considered in this paper, OPMET considers only the steady state (i.e., the last value of the dynamic state) instead of the entire dynamic state. It systematically searches subsets of enzymes. Each node in the branch-and-bound search tree is a candidate solution (i.e., a set of enzymes to be inhibited). The root node contains the empty set. As the search space is traversed, OPMET keeps the node with the minimum distance so far as the current best solution and the associated distance as the global cut-off threshold. If the current node has a distance less than the current threshold, it saves that node as the new best solution and updates the global threshold with the current distance value. It then selects another enzyme to insert into the current set of inhibited enzymes if the insertion improves the current solution. Otherwise, it backtracks to the previous solution.

We use the OPMET algorithm to search subsets of enzymes systematically to find the one with the closest dynamic state to the goal state. Using the dynamic state, however, introduces a new challenge that does not exist in OPMET. The computational cost for computing the

PAGE 133

dynamic state $X$ of a metabolic network often dominates the time it takes to compute the distance between two dynamic states. The gap between the two depends on the number of equations and variables that explain the fluxes of the metabolic network as well as the dependencies between those variables. Therefore, it is essential that we avoid computing the dynamic state for the nonpromising enzyme subsets. We achieve this for each subset of enzymes as follows. We do not compute the entire dynamic state $X$. Instead, we compute only a short prefix of the values in $X$. Using this prefix, we compute a lower bound to the distance between $G$ and the entire $X$. Computation of the lower bound varies for the distance measures we defined in Section 5.3. We discuss how to compute the lower bound for each of the three distance measures next.

Exact distance. Our solution follows from the observation that this distance function is additive over the time domain. That is, the exact distance between two dynamic states over a time period is simply the sum of their distances over the shorter time intervals that make up that time period.

We split the entire time span until the metabolic network reaches the steady state into nonoverlapping intervals. We compute the dynamic state $X$ only for the first time interval. We compute its distance to $G$ inside this time interval by considering only those values of $G$ that are within this interval, as described in Section 5.3.2. If the distance is greater than the global cut-off threshold, we filter that enzyme set. Otherwise, we compute the values of $X$ for the next time interval and update the distance with these new values. As the distance is additive and nonnegative, it monotonically increases with the number of time intervals. We repeat updating the distance with new intervals until the distance exceeds the threshold or we compute the entire $X$. This way, we can avoid computing the entire $X$ if the distance grows beyond the cutoff quickly.

Time-warping/pattern distance. Both the time-warping and pattern distances use dynamic programming to compute the alignment of two dynamic states. As a result, these distances are not additive. We develop a different strategy for them. Next, we explain only the time-warping distance, as both distances use the same dynamic programming method.
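The interval-by-interval pruning described above for the exact distance can be sketched as follows. This is a hedged illustration, not the code used in this work: it assumes the goal state and the candidate state share one time grid, and `simulate` is a hypothetical callback standing in for the (expensive) computation of the values of X on one interval. The names `interval_bounds` and `pruned_exact_distance` are likewise made up for this example.

```python
def interval_bounds(num_points, num_intervals):
    """Split the trapezoids 0..num_points-2 into contiguous index ranges."""
    steps = num_points - 1
    bounds = []
    for k in range(num_intervals):
        lo = k * steps // num_intervals
        hi = (k + 1) * steps // num_intervals
        if hi > lo:
            bounds.append((lo, hi))
    return bounds


def pruned_exact_distance(times, goal_v, simulate, num_intervals, cutoff):
    """Accumulate the exact distance interval by interval.

    simulate(lo, hi) is a hypothetical callback returning the values of X
    at times[lo..hi] (inclusive). Returns (distance_so_far,
    intervals_computed, pruned).
    """
    total = 0.0
    computed = 0
    for lo, hi in interval_bounds(len(times), num_intervals):
        x_v = simulate(lo, hi)  # only this interval of X is computed
        computed += 1
        for i in range(lo, hi):
            left = abs(goal_v[i] - x_v[i - lo])
            right = abs(goal_v[i + 1] - x_v[i + 1 - lo])
            total += (left + right) * (times[i + 1] - times[i]) / 2.0
        if total > cutoff:
            # The partial sum is a lower bound on the full distance,
            # so the enzyme set can be filtered without simulating more.
            return total, computed, True
    return total, computed, False
```

Because every trapezoid area is nonnegative, the running total can only grow as further intervals are added, which is what makes the early exit safe. For instance, with five time points, a goal state that is zero everywhere and a candidate that is one everywhere, a cutoff of 1.5 stops the computation after two of four intervals.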

PAGE 134

Figure 5-4 illustrates our strategy. In the figure, solid arrows show the entries in the dynamic programming matrix that can contribute to the computation of entry $(i, j)$. The dashed arrow shows the path corresponding to the optimal alignment of $G$ and $X$. $d_{min}$ (i.e., the value of the shaded entry) is the minimum value in each row. If the value of $d_{min}$ at the end of an interval is more than the cutoff, we do not need to compute the rest of the dynamic state $X$. Similar to the exact distance, we compute the dynamic state $X$ only for a short prefix of $X$. Assume that this prefix contains $m_1$ values. We compute the values of $\Delta(m_1, j)$ for all $j \in [1, n]$. If the smallest value among $\Delta(m_1, j)$, $\forall j$, is greater than the global cut-off threshold, we filter that enzyme set. Otherwise, we compute the values of $X$ for the next time interval and continue filling the dynamic programming matrix. We repeat this process until the minimum distance in a row of the dynamic programming matrix exceeds the threshold or we compute the entire $X$. This way, we can avoid computing the entire $X$ if the minimum distance grows beyond the cutoff quickly.

5.5 Experimental Results

In this section, we evaluate our algorithms on real datasets.

Dataset: We test our three distance functions and the search on the Purine metabolism, as all the reaction coefficients needed to compute the dynamic states are available in the literature for this network. Purine metabolism is an important metabolic network. It synthesizes and breaks down purines. We use Voit's computational model for Purine metabolism [107]. We compute the dynamic state of the Purine metabolism by solving the GMA system equations. The kinetic orders and rate constants of the corresponding GMA system equations are available in the literature [107]. Briefly, this dataset contains 36 fluxes and 16 ODEs that describe the relationship between different variables.

Query sets: We created four query sets, namely $Q_1$, $Q_2$, $Q_3$ and $Q_4$, from the Purine metabolism dataset [107]. Each query set contains 10 goal dynamic states. We created each query in $Q_1$ as follows. We randomly selected one enzyme from the Purine metabolism. We then computed the

PAGE 135

dynamic state of that metabolism after inhibiting that enzyme and used it as the goal state. Similarly, we created each query in $Q_2$, $Q_3$ and $Q_4$ by inhibiting two, three and four randomly selected enzymes respectively. This ensures that there is a solution to the dynamic enzymatic target identification problem for each query with zero distance. In order to simulate different kinds of mutations on the goal dynamic state, we also created three more query sets for each of $Q_1$, $Q_2$, $Q_3$ and $Q_4$ as follows.

Stretch. The first one simulates stretching in the time domain. For this, we stretched each query by shifting its time value at each time point by a random amount. We denote these query sets by adding the prefix S- to the query set (e.g., S-$Q_1$).

Shift/Scale. This query set simulates scaling and shifting mutations. We obtained this by scaling and shifting the dynamic states in the original query sets by random $\alpha$ and $\beta$ values. We denote these query sets by adding the prefix SS- to the query set (e.g., SS-$Q_1$).

Shift/Scale/Stretch. This query set simulates scaling and shifting mutations as well as stretching in the time domain. We obtained this by scaling and shifting the dynamic states in the stretched query sets by random $\alpha$ and $\beta$ values. We denote these query sets by adding the prefix SSS- to the query set (e.g., SSS-$Q_1$).

Implementation and system details: We implemented the developed algorithms in C++ and MATLAB. In our experiments, we apply the ordinary differential equation functions (e.g., ode23t, ode45) of MATLAB to compute the GMA system equations. Then, we created a C++ shared library from the MATLAB M-file for dynamic state computation. Thus, we can use our traversal method to find the results. We ran our experiments on a system with an Intel Core i7-920 2.66 GHz processor, 4 gigabytes of RAM, and a 64-bit Windows 7 operating system.

Experimental setup: We evaluate the speed and the accuracy of our method for each of the three distance measures in terms of several metrics.

PAGE 136

Execution time: This indicates the average total time taken by our algorithms to find the enzyme set whose inhibition leads the metabolic network to the closest dynamic state.

Percentage of time intervals: This indicates the percentage of the dynamic state our method computes after splitting it into smaller intervals. A small percentage is desirable, as it often indicates a lower running time.

Percentage of success: Each goal dynamic state in our query set has a matching enzyme subset that has the smallest distance. This metric reports the percentage of queries for which our algorithm could identify the optimal result. It is worth noting that this measure is biased against our method. This is because even when our method does not find the optimal enzyme subset, the result it finds can be very close to the optimal one. This metric considers those results as failed results. However, as we present later in this section, our method has high accuracy for this metric despite this bias.

5.5.1 Evaluation of the Performance

Our first experiment evaluates the performance of our method with and without splitting the dynamic states in the time domain. Recall that, while searching the possible enzyme sets, our algorithm splits the dynamic state into short intervals and incrementally computes these intervals. The first questions that we need to answer are: (i) What is the typical running time of our algorithm without splitting, and (ii) what is the impact of splitting on the performance of our algorithm? To answer these questions, we ran queries without splitting and with splitting, when we split the dynamic state into 2, 4 and 8 intervals.

Table 5-1 shows the performance results for the exact and the time-warping distance. It presents the comparison of the average percentage of the intervals our algorithm generates for the exact and time-warping distance with and without splitting the dynamic states for query sets $Q_2$, $Q_3$ and $Q_4$ into different numbers of intervals. 1 interval indicates that the dynamic state is

PAGE 137

not split. $K$ intervals ($K > 1$) indicates that the dynamic state is split into $K$ equal-sized nonoverlapping intervals along the time axis. When the underlying distance is the exact distance, the percentage of intervals generated tends to decrease as we split the dynamic state into smaller pieces. This indicates that our algorithm can often filter an enzyme subset after considering a small prefix of it. Thus, by splitting each dynamic state into a larger number of shorter intervals, we avoid computing a large portion of the dynamic state. We observe the largest improvement after splitting the dynamic states into two intervals. Further splitting gradually improves the percentage of intervals generated. As the number of intervals grows beyond four, we observed that the performance gain is negligible. The largest average running time we observed in our experiments was around two minutes, when the number of intervals is 1 and we use the $Q_4$ query set. The improvement in the running time as we increase the number of intervals changes greatly from one query to another depending on the complexity of the set of ordinary differential equations and the corresponding time interval. On average, we observed the best running times for four to six intervals.

The results for the time-warping distance follow a similar pattern. The percentage of intervals generated drops as we split the dynamic state into shorter intervals. As the number of intervals increases to four, we filter enzyme subsets after considering a small fraction (around 27%) of the values of the dynamic state. As we split the dynamic state further into smaller intervals, the amount of additional intervals filtered grows very slowly. Similar to the exact distance, the largest average running time we observed in our experiments was around two minutes, when the number of intervals is 1 and we use the $Q_4$ query set. Filtering time intervals improved the average running time of our algorithm by a factor of two to four. However, it is worth mentioning that the running time does not necessarily drop linearly with the number of time intervals, as we discussed in the previous paragraph. In summary, we observe that partitioning the dynamic state into a small number of intervals (such as four intervals) improves

PAGE 138

the running time. Further splitting does not help significantly. It can even increase the running time.

5.5.2 Evaluation of the Accuracy

This experiment evaluates the accuracy of our method using the three distance functions we defined in Section 5.3 on query sets with different characteristics. We measure the accuracy in terms of the percentage of success, i.e., the percentage of queries for which our method finds the optimal result correctly. We used all four classes of query sets we generated for this purpose. Table 5-2 presents the results.

A set of interesting observations follows from these results.

When the query dynamic state can be achieved without any stretch/shift/scale transformation, our method has similar accuracies for all three distance measures. This is expected, as this is a special case of the transformations (e.g., $\alpha = 1$ and $\beta = 0$).

As the goal states are transformed through stretching along the time domain, the accuracy of the exact distance drops rapidly, while that of the other two distance functions remains identical. This justifies the need for the dynamic programming solution of the time-warping distance. This is because even small perturbations in the network can alter the result of the exact distance greatly.

When the goal state is shifted or scaled, both the exact and the time-warping distance measures fail, while the pattern distance continues to have high accuracy. The high accuracy values of the pattern distance suggest that our iterative algorithm for optimizing the pattern distance is highly accurate.

We observe that the best accuracy values in each row of this table can be less than 100% even when the query states are not transformed (e.g., consider the results for $Q_2$). This is caused by the heuristic filtering strategy of OPMET [96] while searching possible

PAGE 139

enzyme subsets. This problem can be alleviated by using a more stringent filtering strategy at the expense of increased running time.

We conclude from these observations that the pattern distance is the most promising distance measure among the three. The exact distance is faster than the pattern distance. However, the gap between their accuracies when goal states are transformed justifies their small performance difference.

5.6 Discussion

The classic enzymatic target identification problem aims to identify the set of enzymes whose knockouts lead the steady state of the metabolism close to a given goal state. This definition is problematic, for a biological system reaches the steady state over a period of time, after a sequence of changes to its current state. This sequence of states, called the dynamic state, can be crucial, as it can lead to serious side effects. We addressed a new variant of the enzymatic target identification problem, named the dynamic enzymatic target identification problem, in this paper. Unlike the existing problem, we considered the entire trajectory the given network's state follows to reach the steady state. To the best of our knowledge, this problem has not been considered in the literature so far.

We considered three alternative distance measures to compute the dissimilarity between two dynamic states, namely the exact, time-warping and pattern distances. The first one measures the area between the curves defined by the two dynamic states. The second one allows the dynamic states to stretch along the time dimension by arbitrary amounts. The last measure further generalizes the distance by allowing scaling and shifting of dynamic states. We exploited the OPMET algorithm to develop a branch and bound strategy that uses these distance measures to solve the dynamic enzymatic target identification problem. In order to improve the running time of this algorithm for the dynamic states, we developed a partitioning strategy as follows. Instead of creating the entire dynamic state, we quickly created a short prefix of it. We then used this

PAGE 140

prefix to compute a lower bound to the distance between the entire dynamic states. If this lower bound exceeds the distance of the best solution found so far, we prune that solution without generating the rest of the dynamic state. Our experiments demonstrated that our method is 85-100% accurate when a single enzyme is inhibited. It was 65-75% accurate when two enzymes are inhibited. Furthermore, our partitioning strategy reduced the number of time intervals computed for dynamic states by a factor of 2 to 6.

Figure 5-1. The patterns $P_1$ and $P_2$ show the sequence of concentrations of a compound resulting from two alternative manipulations to the metabolic network.

Figure 5-2. Three transformations, stretch, scale and shift, applied on a hypothetical dynamic state.

PAGE 141

Figure 5-3. An illustration of two dynamic states.

Figure 5-4. An illustration of how partitioning the dynamic state $X$ into shorter intervals improves the running time.


Table 5-1. Comparison of the average percentage of the intervals our algorithm generates for the exact and time-warping distance with and without splitting the dynamic states.

Exact distance
Number of intervals    Q2      Q3      Q4
1                      100     100     100
2                      72.5    60.7    64.3
4                      58.7    43.9    49.0
6                      58.0    40.7    47.5
8                      54.9    36.8    43.7

Time-warping distance
Number of intervals    Q2      Q3      Q4
1                      100     100     100
2                      52.3    51.8    52.1
4                      27.6    27.8    27.7
6                      18.7    19.4    19.0
8                      14.9    15.1    15.1

Table 5-2. Accuracy of our algorithm using the three distance measures on datasets with different characteristics. The accuracy values are reported in terms of percentage of success.

                 Distance measure
Query set    Exact    Time-warping    Pattern
Q1           100      100             100
Q2           70       75              75
S Q1         10       100             100
S Q2         20       75              75
SS Q1        5        5               85
SS Q2        10       10              75
SSS Q1       0        0               85
SSS Q2       0        5               65


CHAPTER 6
INTEGRATING STRUCTURAL PROPERTIES OF PROTEINS AND BIOLOGICAL NETWORKS FOR COMPOUND SELECTION

In silico compound selection, that is, finding a new candidate drug in large libraries of compounds with computer aid, plays a significant role in modern drug discovery. One of the popular compound selection methods is to screen libraries of compounds for their ability to bind to biological targets such as receptors and enzymes. This process is also known as docking. Recent validation studies show that docking methods perform poorly in compound selection. Often, the compounds that have the highest affinity to the targeted protein according to a docking algorithm do not lead to the highest activity levels when they are tested in vitro or in vivo. Thus, there is a great need for accurate in silico compound selection methods. In this chapter, we develop two novel computational methods that rank a given set of compounds for a given target protein or enzyme. The major difference between our first method and traditional in silico screening methods is that we consider additional proteins and enzymes while ranking compounds, whereas existing strategies often focus on the target protein alone. A drug compound can alter the state of the metabolic network. Our second method considers the impact of the drug compounds on the metabolic network by integrating the interactions among proteins in metabolic networks with the docking results. Experiments on the pharmacologic chaperones of misfolded rhodopsin show that our method has better accuracy than the traditional methods that focus only on rhodopsin. Our results are in the top 5.7% of all possible rankings. For the same dataset, the traditional method's results are in the top 81% of all possible rankings.

6.1 Motivation and Problem Definition

In silico compound selection, that is, finding a new candidate drug in large libraries of compounds with computer aid, plays a significant role in modern drug discovery. Selecting candidate drugs, i.e., compounds that satisfy the pharmacological properties, i.e. absorption, distribution,


metabolism, excretion and toxicity, will largely reduce the cost and time of the rest of the drug discovery process, e.g. in vitro and in vivo screens, preclinical testing and clinical testing.

One of the popular compound selection methods is to screen libraries of small compounds for their ability to bind to biological targets such as receptors and enzymes [38]. This process is also known as docking. Docking algorithms estimate how two molecules can bind with each other to form a stable complex [59]. DOCK [26], Glide [30] and GOLD [41] are a few examples of existing docking software. These tools predict the binding affinity between each small molecule and the target protein. Once the docking software computes the affinity of each small molecule in a library of molecules, the next step is often to pick the ones that have high predicted affinity values and test them in the lab. This process has been successful in several applications. For example, Lyne et al. used the FlexX-Pharm [34] docking software to search about 200,000 compounds and identified four novel classes of inhibitor for Chk1 kinase [63]. Kellenberger et al. searched about 44,000 compounds with the docking software GOLD and Surflex, and found novel non-peptide ligands for the GPCR CCR5 [50].

Despite some success in compound prediction in several applications, recent validation studies show that docking methods perform poorly in compound selection [110]. Warren et al. evaluated 10 popular docking programs and 37 scoring functions for eight proteins. The results showed that none of the docking programs or scoring functions can predict a useful binding affinity [109]. Often, the compounds that have the highest affinity to the targeted protein do not necessarily show the highest activity levels (the observed results of in vitro or in vivo assays) when they are tested in the lab. To verify this, we have ranked the compounds from a real application using docking algorithms. We have also ranked the same compounds according to their actual activities in in vivo experiments. The two rankings were significantly different (details in Section 6.3). Furthermore, the compounds that produce high activity in vitro are frequently toxic, making them practically useless.


There are many factors that cause docking programs to fail. A major factor that causes deviation between the predicted affinities and observed activities, which is also the central problem tackled in this chapter, is that existing in silico methods rank compounds solely based on their affinities to the targeted proteins. Clearly, this is not a realistic strategy, as the drug molecule can bind to proteins other than the targeted one. This reduces the chance that it will bind to the target protein. Furthermore, the proteins and enzymes do not necessarily work independently of each other. They often interact over a complex biological network. If some of these proteins are enzymes, the reactions catalyzed by them may be influenced. Thus, the concentration of the substrates or products in the metabolism may be altered. As a result, the metabolism may be altered or toxicity may occur.

Another significant factor is the inaccuracy within the docking algorithm. For example, most docking software packages lack proper treatment of protein flexibility and solvation. Treating the receptor as too rigid or too flexible also results in incorrect binding affinity. In some cases, the incorrect choice of protonation or tautomerisation states contributes a significant error in scoring [110]. New versions of the existing docking software have already been developed to alleviate these problems.

Problem definition: In this chapter, we consider the compound selection problem. We define this problem as follows. Assume that we are given a target protein or enzyme and a library of compounds. The compound selection problem aims to identify the compounds from this library that will bind to the target at a high rate and change the activity level of the target. More specifically, we develop a ranking algorithm that sorts the compounds in the compound library according to their probability of altering the activity of the target protein or enzyme.

Contributions: The central hypothesis tested in this chapter is that in silico compound selection will be more accurate if it considers a rich set of proteins and enzymes in the metabolism, in addition to the targeted protein. We develop two novel in silico compound selection methods.


The major difference between our methods and traditional in silico screening methods is that we consider additional proteins and enzymes for ranking drug compounds, while existing strategies often focus on the target protein alone. We also consider the interactions among proteins by integrating metabolic networks with the docking results.

Given a target protein, we first select a set of proteins besides the targeted one. This set contains the proteins that are structurally similar to the target protein and the proteins that are close to the target protein in the metabolic network. The former proteins are the ones that have a high chance of binding to the same drug molecule as the target. The latter proteins are the ones whose inhibition can alter the metabolism greatly if they are inhibited along with the target. We then predict the binding affinity values between these selected proteins and candidate compounds using a docking tool.

Our first method uses linear regression on the predicted affinity values and learns the parameters of the ranking function that best explain the historical experiment database. In order to avoid the overfitting problem, it recursively eliminates the protein that has the lowest correlation with the activity level observed using in vitro or in vivo experiments until all the remaining proteins have high correlation.

Our second method computes the binding probability of the compounds with each of the proteins from the predicted affinities. Based on these probabilities, it uses Monte Carlo simulation to compute the expected impact of each compound on the metabolic network. The impact of a compound on a network is the amount of change in the steady state of that network when that compound is included. This strategy applies linear regression on these impact values and learns the parameters that best explain the historical experiment database.

Figure 6-1 illustrates the three compound selection strategies discussed above. Strategy 'A' is the traditional in silico screening method. This method considers the affinity of the compounds to target proteins only. Here, the docking algorithm computes the affinity between a given compound


and a protein. 'B' denotes our first ranking method. This method uses the affinity of compounds to a superset of the target proteins. 'C' denotes our second ranking method. This one estimates the impact of compounds on metabolic networks from their affinities to the enriched set of proteins.

Experiments on the prediction of pharmacologic chaperones of misfolded rhodopsin show that both of our methods have better accuracy than the traditional methods. Our second method has the highest accuracy among the three. Our results are in the top 5.7% of all possible rankings. For the same dataset, the traditional method's results are in the top 81% of all possible rankings.

In summary, the technical contributions of this chapter are as follows.
- We develop two methods that use multiple proteins for ranking compounds. These methods use the affinities of the proteins to the compounds and integrate the impact of compounds on metabolic networks.
- We develop a Monte Carlo method for computing the expected impact of a compound on a metabolic network.
- We experimentally verify our method on a real problem, retinitis pigmentosa, both in silico and in vitro. The results show that our algorithm ranks compounds significantly better than the traditional method [50, 63, 68].

The rest of the chapter is organized as follows. Section 6.2 describes our in silico compound selection method. Section 6.3 presents the experimental results. Section 6.4 concludes the chapter with a discussion.

6.2 Methods

In this section, we discuss our in silico compound selection method. Our method employs existing docking software as a building block. It also uses a small database of historical activity results observed in vitro or in vivo, together with in silico affinity values obtained from docking software. Briefly, our method works as follows.


- It selects a set of proteins of interest besides the target one and predicts the binding affinity values between these selected proteins and candidate compounds (Section 6.2.1).
- The basic algorithm ranks the candidate compounds based on these predicted affinity values (Section 6.2.2.1).
- We incorporate the affinity values and the impact of the compounds on the metabolic networks into the ranking function (Section 6.2.2.2).

6.2.1 Protein Selection

The central hypothesis tested in this chapter is that in silico compound selection will be more accurate if it considers a rich set of proteins and enzymes in the metabolism, in addition to the targeted protein. The obvious first step then would be to compute the affinity between all compound-protein pairs before selecting a compound. This strategy, however, is not feasible for several reasons.
- The size of the protein and the compound databases makes it impractical. It takes more than five minutes to compute the affinity of a compound to a single protein. There are 10 million compounds in PubChem. Even if we restrict the compound database to 0.1% of PubChem, it will take more than five thousand years to compare them against all of the 58,000 proteins in the Protein Data Bank (PDB) using a single CPU (10,000 compounds x 58,000 proteins x 5 minutes is about 2.9 x 10^9 minutes, or roughly 5,500 years).
- Most of the compound-protein affinities would be useless for compound selection, as a significant portion of the pairs will have no affinity.
- The datasets that contain actual in vitro experimental results on actual compound-protein bindings are very limited. As a result, using the affinity values of many proteins to learn a function that estimates drug activity will result in overfitting. In other words, the model that can be learned using any machine learning method will explain the observed data well, but it will fail to generalize to unobserved compounds.

In order to avoid these problems, we select a small set of relevant proteins. Note that this selection can be assisted by expert analysis. In order to automate the compound selection


process, we develop a generalized framework that can work without expert guidance using the following two criteria.

Structural properties. Clearly, a feasible drug compound needs to be chosen so that it binds to the target protein strongly. This means that there is a high probability that the same compound also binds to the proteins that are structurally similar to the target protein. Thus, we add all the proteins that are in the same superfamily as the target protein to the protein set. We use the SCOP database [65] to determine the superfamily of the proteins. Note that one can also use a structural alignment algorithm such as CE [39] or Dali [36] for this purpose and choose the proteins that align well with the target protein.

Spatial properties. Proteins interact with each other over a complex biological network. As a result, when a compound binds to a protein, its effect is not limited to the function of that protein. It can affect a large subset of the metabolism. This kind of influence usually magnifies greatly when the same compound binds to multiple proteins that are close by in the interaction network. To better take the impact of the compounds into account, we choose all the proteins that are in the neighborhood of the targeted protein in the metabolic network. We say that a protein is in the neighborhood of the target protein if it appears in the same network or in a network that shares at least one common compound with a network in which the target protein appears. We use the networks provided in the KEGG database [47] to determine whether two proteins are neighbors. Note that one can use other network databases as well for this purpose.

6.2.2 Ranking Compounds

Given a target protein or enzyme, so far we have discussed how we select a set of proteins that will help in ranking the compounds in a large library of compounds. We develop two ranking methods in this section.
- The first one uses affinity values of all the selected proteins computed by a docking program (Section 6.2.2.1).


- The second one computes the expected impact of each compound on the metabolic network (Section 6.2.2.2).

Both methods apply linear regression to learn the parameters that best explain the historical experiment database.

Our algorithms learn the ranking function from a historical query workload, i.e., a training dataset. The training dataset is a population of compounds whose actual activity levels on the target protein are measured by in vivo or in vitro experiments. These experiments require many hours for each compound. For example, the pharmacologic chaperone experiments for misfolded rhodopsin need half a month for 24 compounds. Therefore, the training dataset typically contains a small number of compounds. As the size of the training dataset grows, the accuracies of our ranking algorithms improve.

We will use the following notation in the rest of this chapter. We will denote the selected proteins (see Section 6.2.1) with $P_1, P_2, \ldots, P_n$. We will denote the compounds in the training dataset with $C_1, C_2, \ldots, C_m$. We will represent the actual activity level of each training compound $C_i$ with $A_i$. We discuss our ranking methods next.

6.2.2.1 Ranking based on affinities

Our first ranking method is based on the affinities of the compounds to the selected proteins. This method follows from the hypothesis that a linear correlation exists between the affinities and the activity levels.

Using docking software, we first compute the binding affinity value for each <training compound, protein> pair. We will denote the affinity between the compound $C_i$ and protein $P_j$ with $a_{ij}$. Thus, we build an affinity matrix $A = \{a_{ij}\}$, for $1 \le i \le m$ and $1 \le j \le n$. We use linear regression to learn the ranking function, which shows the relationship between the dependent variable and one or more independent ones. In this problem, the independent variables are the predicted affinity values; the dependent variable is the actual activity level. Formally, we


compute the linear function

$$A_i = \beta_0 + \beta_1 a_{i1} + \beta_2 a_{i2} + \cdots + \beta_n a_{in} + \epsilon_i$$

that holds for all $A_i$, $i \in \{1, 2, \ldots, m\}$. Here, $\epsilon_i$ denotes a noise term, which is a random variable. We assume that $\epsilon_i$ follows the standard normal distribution, that is, $\epsilon_i \sim \mathrm{Normal}(0, 1)$. We compute the parameters of this equation using least-squares estimation [14].

Recursive feature elimination. One potential problem with using linear regression on the affinity matrix is that this matrix typically has a small number of rows. This is because it is very costly in practice to obtain the experimental data that give the value of the dependent variable. As a result, often only a few dozen experimental results are available. When the number of selected proteins (i.e., independent variables, or columns in the matrix) is large, there is a danger that the linear regression will overfit the data. That is, the independent variables can explain the dependent one perfectly on the observed data, but the model will fail on unobserved data. Therefore a computational approach is needed to address this problem.

In order to avoid the overfitting problem, we reduce the number of independent variables by recursive feature elimination. This method recursively removes one protein from the protein set: the one that is least significant in determining the dependent variable. We declare an independent variable (say the $j$th variable) insignificant if its absolute coefficient value $|\beta_j|$ is close to zero or its P-value is larger than 0.05. If there is more than one such protein, we remove the one with the largest P-value or the smallest $|\beta_j|$ value. Once we remove the least significant protein, we apply the linear regression again and eliminate another protein iteratively. This elimination process stops when all the remaining proteins are significant. Algorithm 6.1 presents the pseudocode for recursive feature elimination.

Ranking unobserved data. Once we eliminate all the redundant proteins, the next step is to build a ranking function that predicts the activity of the unobserved compounds from their


affinities to the remaining proteins. We build this ranking function using linear regression as well. Assume that the number of non-redundant proteins after feature elimination is $n'$ ($n' \le n$). We then use linear regression to learn the coefficients $\beta_i$ ($0 \le i \le n'$) as discussed at the beginning of this section. The ranking function is then

$$\mathrm{Rank}[x_1, x_2, \ldots, x_{n'}] = \beta_0 + \sum_{i=1}^{n'} \beta_i x_i$$

where $x_i$ is the affinity between the unobserved compound and the $i$th non-redundant protein computed by the underlying docking software. The larger the value of this function, the higher the predicted activity of the compound.

Algorithm 6.1 Recursive feature elimination
1. Use linear regression on the affinity matrix $A$ and the observed activities $[A_1, A_2, \ldots, A_m]$ to compute coefficients $\beta_j$ and $\epsilon_i$, $\forall i, j$, $1 \le i \le m$ and $1 \le j \le n$.
2. Among all the features (i.e., proteins) with an insignificant coefficient value (i.e., $|\beta_j| < 0.01$) or with a P-value larger than 0.05, select the one with the largest P-value.
3. If no protein is selected in the previous step, then all the features are significant. Thus, return the current protein set and stop feature elimination.
4. Otherwise, there is at least one insignificant feature in the feature set. Remove the protein chosen in Step 2 from the protein set. Remove the corresponding column from the affinity matrix $A$.
5. If the feature set is not empty, go to Step 1.

6.2.2.2 Integrating networks in ranking

The ranking method we developed in the previous section assumes that there is a linear correlation between the affinities of the compounds to selected proteins and the activity levels of those compounds. The relationship between the two, however, may not be a simple linear function. This is because the proteins and enzymes interact with each other over a complex metabolic network. When a compound binds to a protein or enzyme, it may affect the rest of the metabolism even if it does not bind to other proteins or enzymes. For example, the reactions


catalyzed by the binding enzymes may be influenced. Thus, the concentration of substrates or products in the metabolism may be altered. As a result, the function of the metabolism may be altered or toxicity may occur.

In this section, we develop a new ranking function that integrates the metabolic networks with the predicted affinity values. Unlike the previous ranking function, it computes the impact of the compound on the metabolic networks and uses these values to determine a rank for each compound. This method works as follows.
1. Given a compound and a set of target proteins, we first select a richer set of proteins as explained in Section 6.2.1. We then compute the affinity matrix $A$ as discussed in Section 6.2.2.1.
2. Using the affinity values in $A$, we compute a binding matrix $B$. The binding matrix has the same dimensions as the affinity matrix. Each entry $b_{i,j}$ ($1 \le i \le m$, $1 \le j \le n$) of $B$ shows the probability that the $i$th drug compound will bind to the $j$th selected protein.
3. We compute the expected impact of each compound on each metabolic network using Monte Carlo simulations.
4. We derive a ranking function based on the expected impact values.

Sections 6.2.1 and 6.2.2.1 discussed the first step above. We elaborate on the remaining steps next.

Step 2: The binding matrix. The binding of a compound in a cell to a protein or enzyme is a probabilistic event that depends on a number of factors. One important factor is their affinity. In order to compute the impact of a compound on the metabolism, we need to model this probabilistic event. We derive this model using the affinity matrix $A$.

We use the random variable $B_j$ to denote the event that a given compound binds to the $j$th protein ($B_j = 1$ if it binds, $B_j = 0$ otherwise). We estimate the probability that $B_j = 1$ for a compound from its affinity value $a_{i,j}$. To do this, we first analyze the distribution of the


affinity values of a large number of randomly selected compounds to a given protein. Figure 6-2 shows the distribution of binding affinity for rhodopsin (PDB:1F88) on the diversity set of NCI/DTP [35]. This set contains 1,990 compounds. The affinity values seem to follow a normal distribution. The mean $\mu$ and the variance $\sigma^2$ of this distribution can be estimated from several hundred randomly selected compounds.

Smaller affinity values imply that the compound will bind to that protein with a higher chance. Since the compound is competing with other compounds to bind to a protein, the probability that it binds to that protein also depends on the affinity values of the competing compounds. From these observations, we derive two alternative ways to measure the probability that a compound binds to the $j$th protein.

All compounds are competing. In this case we assume that all compounds compete for the same protein. The probability then becomes the cumulative distribution function of the distribution we found. More precisely,

$$\mathrm{Prob}(B_j = 1 \mid \text{Affinity} = a) = \frac{1}{2} + \frac{1}{\sqrt{\pi}} \int_0^{\frac{\mu - a}{\sigma\sqrt{2}}} e^{-t^2}\, dt$$

All the natural ligands are competing. Proteins often have one or more natural ligands that bind strongly to them. Thus a new compound has to compete with them in order to replace them. Assume that the $j$th protein has $g$ ligands. Also assume that the affinity of the $k$th ligand is $L_k$. Then, the probability that a compound with affinity $a$ wins over the $k$th ligand is

$$p_k = \frac{1}{2} + \frac{1}{\sqrt{\pi}} \int_0^{\frac{L_k - a}{\sigma\sqrt{2}}} e^{-t^2}\, dt.$$

Thus, the probability that this compound wins over at least one of the natural ligands is

$$\mathrm{Prob}(B_j = 1 \mid \text{Affinity} = a) = 1 - \prod_{k=1}^{g} (1 - p_k).$$
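As a rough illustration of the two binding-probability models above, the following Python sketch evaluates them with the error function, using the identity $\frac{1}{2} + \frac{1}{\sqrt{\pi}}\int_0^{z/\sqrt{2}} e^{-t^2}dt = \frac{1}{2}[1 + \mathrm{erf}(z/\sqrt{2})]$. The function names and the numeric values of the mean, standard deviation and ligand affinities are hypothetical and chosen only for demonstration; they are not taken from the experiments in this chapter.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_all_competing(a, mu, sigma):
    """Binding probability when all compounds compete.

    Smaller (more negative) affinity a gives a larger probability,
    because the upper integration limit is (mu - a) / (sigma * sqrt(2)).
    """
    return normal_cdf((mu - a) / sigma)

def prob_natural_ligands(a, ligand_affinities, sigma):
    """Binding probability when only the natural ligands compete:
    the chance of winning over at least one ligand."""
    lose_to_all = 1.0
    for l_k in ligand_affinities:
        p_k = normal_cdf((l_k - a) / sigma)  # wins over the k-th ligand
        lose_to_all *= 1.0 - p_k
    return 1.0 - lose_to_all

# Hypothetical numbers: fitted mean -6.0, standard deviation 1.5.
print(prob_all_competing(-9.0, mu=-6.0, sigma=1.5))
print(prob_natural_ligands(-9.0, [-9.0, -9.0], sigma=1.5))
```

A compound whose affinity equals the mean receives probability 0.5 under the first model, which matches the intuition that half of the competing compounds bind more strongly than it does.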


Now, we are ready to formally define the binding matrix $B$. Each entry of this matrix shows the probability that a compound binds to a protein in the protein set. We compute this value as

$$b_{i,j} = \mathrm{Prob}(B_j = 1 \mid \text{Affinity} = a_{i,j}).$$

Step 3: The impact of compounds on networks. Once we compute the binding matrix, we estimate the impact of each compound on the metabolic networks. We elaborate on the impact computation later in this section. We consider the metabolic networks that contain at least one of the selected proteins/enzymes. Thus, if the selected set of proteins appears in $k$ different networks, then we compute $k$ impact values for each compound. This is desirable since the number of networks is much smaller than the number of enzymes or proteins. In other words, keeping impact values rather than affinities naturally reduces the number of features, and thus the risk of overfitting. We discuss the computation of impact on a network next.

When a drug compound binds to an enzyme, it inhibits that enzyme and slows down its reaction. This can affect the rest of the network, as other reactions may consume the compounds produced by the reaction catalyzed by that enzyme or produce the compounds that serve as substrates of that reaction. We define the impact of inhibiting a set of enzymes of a metabolic network as the change in the steady state of that network.

In order to discuss the concept of impact, we first need to understand the steady state of a network. The steady state of a network is the flux of all the reactions when the flux does not change anymore. An alternative definition is the yield of all the compounds in that network when the yield does not change anymore. One way to compute the steady state of the metabolic network is to use Flux Balance Analysis (FBA) [7, 29, 48]. In our earlier work, we have also defined an algorithm for computing the steady state [93]. We refer the reader to these papers for the details of the steady state computation. Let us denote the steady state of a network before and after inhibiting a given set of enzymes $S$ with vectors $u = [u_1, u_2, \ldots, u_c]$ and


$v = [v_1, v_2, \ldots, v_c]$. Here, $u_i$ and $v_i$ are the yields of the $i$th compound in the network before and after the inhibition, and the number of compounds is $c$. We compute the impact as

$$\mathrm{Impact}(S) = \sum_{i=1}^{c} |u_i - v_i|.$$

Now, we know how to compute the impact when a given set of enzymes is inhibited. We still need to tackle one more problem. The process of binding a compound to a protein or enzyme is a probabilistic event. Therefore, we do not know in advance which enzymes are inhibited.

Mathematically, the expected impact of each compound on a network can be computed as follows. Let us denote the set of selected proteins with $P$. We denote the probability that the $i$th compound binds exactly to the set of proteins in $S$ ($S \subseteq P$) with $\mathrm{Prob}(S \mid i\text{th compound})$. The expected impact of the $i$th compound is then

$$\sum_{S \subseteq P} \mathrm{Impact}(S)\, \mathrm{Prob}(S \mid i\text{th compound}).$$

Computing this summation is difficult, as the number of subsets $S$ is exponential in the size of $P$. We therefore compute the expected impact of a given compound using Monte Carlo simulation in Algorithm 6.2. Once we compute the expected impact values, we use linear regression on these values to learn the ranking function for the actual activity levels, similar to Section 6.2.2.1.

6.3 Results

We have tested the algorithms described in the previous section on the prediction of ongoing experiments on deriving pharmacologic chaperones of misfolded rhodopsin. In this section, we present the performance of our algorithms on the dataset that is available to date.

Application: Retinitis pigmentosa (RP) is a prevalent genetic eye disease [6] that is caused by a mutation in rhodopsin in nearly 20% to 25% of patients with autosomal dominant RP [68]. In the United States, approximately one third of such cases result from the P23H mutation [68].
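The expected-impact sum of Section 6.2.2.2 and its Monte Carlo estimate (Algorithm 6.2) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in this chapter: the three-enzyme network, its yields and the function `toy_steady_state` are hypothetical stand-ins for a real steady-state computation such as FBA, only a single network is considered, and binding events for different proteins are assumed independent (as the Bernoulli trials of Algorithm 6.2 suggest).

```python
import itertools
import random

def impact(u, v):
    """Impact(S): sum over compounds of |u_i - v_i|."""
    return sum(abs(ui - vi) for ui, vi in zip(u, v))

# Hypothetical toy network: yield of each compound without inhibition,
# and the set of enzymes each compound needs. Not real biological data.
BASE_YIELD = [4.0, 2.0, 1.0]
REQUIRED = [{0}, {0, 1}, {2}]

def toy_steady_state(inhibited):
    """Stand-in for FBA: a yield drops to 0 if a needed enzyme is inhibited."""
    return [0.0 if req & inhibited else y
            for y, req in zip(BASE_YIELD, REQUIRED)]

def exact_expected_impact(bind_prob, steady_state):
    """Sum Impact(S) * Prob(S) over all 2^n subsets S (exponential cost)."""
    u = steady_state(set())
    total = 0.0
    for bits in itertools.product([0, 1], repeat=len(bind_prob)):
        prob = 1.0
        for b, p in zip(bits, bind_prob):
            prob *= p if b else 1.0 - p
        subset = {j for j, b in enumerate(bits) if b}
        total += prob * impact(u, steady_state(subset))
    return total

def monte_carlo_expected_impact(bind_prob, steady_state, n_samples=500, rng=None):
    """Average the impact over randomly sampled sets of inhibited enzymes."""
    rng = rng or random.Random()
    u = steady_state(set())
    total = 0.0
    for _ in range(n_samples):
        # One Bernoulli trial per protein decides whether the compound binds.
        subset = {j for j, p in enumerate(bind_prob) if rng.random() < p}
        total += impact(u, steady_state(subset))
    return total / n_samples

row_of_B = [0.5, 0.2, 0.1]  # hypothetical binding probabilities b_{i,j}
print(exact_expected_impact(row_of_B, toy_steady_state))
print(monte_carlo_expected_impact(row_of_B, toy_steady_state, rng=random.Random(0)))
```

On a network this small the exact sum is cheap, which makes it a convenient check on the sampled estimate; for realistic protein sets only the sampling version remains practical.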


Algorithm 6.2 Expected impact by Monte Carlo simulation
INPUT: Metabolic networks, binding matrix $B$, compound number $i$
OUTPUT: Expected impact
1. Create a sample state for all the selected proteins as follows. For each protein $j$ in the selected protein set, randomly decide whether compound $i$ binds to protein $j$ using a Bernoulli distribution. The binding probability for the $j$th protein is $b_{i,j}$.
2. Remove the proteins/enzymes for which the Bernoulli trial is successful in the sample state. Compute the impact of removing these enzymes on each network using the impact equation of Section 6.2.2.2 and store these values.
3. Repeat steps 1 and 2 many times (a few hundred times is sufficient in practice) to dramatically reduce the variance of the estimate. Report the average of all impact values computed in these iterations.

One strategy to deal with RP is to find compounds which act as pharmacologic chaperones, binding and stabilizing mutant proteins. Thus, these compounds change the function of mutant proteins and potentially arrest RP [68, 88, 102]. We apply our in silico compound selection method to identify the small molecule pharmacologic chaperones of misfolded rhodopsin.

Selected proteins. In addition to rhodopsin (PDB:1F88), we computed the affinity to rhodopsin kinase (EC:2.7.11.14, PDB:3C4W) because it catalyzes the reaction ATP:rhodopsin phosphotransferase that produces rhodopsin. Rhodopsin comes from retinol metabolism. Thus, the enzymes of retinol metabolism can influence the generation of rhodopsin. The KEGG database [47] shows the retinol metabolism of Homo sapiens. There are tens of enzymes in retinol metabolism. We limit our focus to the enzymes that have a known 3-D structure. These are alcohol dehydrogenase (EC:1.1.1.1, PDB:1U3T), unspecific monooxygenase (EC:1.14.14.1, PDB:2FDU), retinal dehydrogenase (EC:1.2.1.36, PDB:1BI9) and glucuronosyltransferase (EC:2.4.1.17, PDB:2Q6V).


Starch and sucrose metabolism (hsa00500) can also influence the retinol metabolism. Specifically, ketodase (EC:3.2.1.31, PDB:1BHG) catalyzes the reaction which consumes glucuronide. Retinol metabolism influences the generation of glucuronide. Therefore, when the function of ketodase is affected, the function of retinol metabolism may be influenced. We therefore also added ketodase to the list.

Docking software. We compute the affinity between a compound and a protein based on the docking energy. A number of software packages are available for this purpose. Some of the popular ones are DOCK [26], Glide [30] and GOLD [41]. Studies by Cummings et al. concluded that Glide provided the most consistent results on their test cases [17]. Therefore, we chose Glide as our docking software in this chapter.

Selected metabolic networks. As described in the previous section, one of our methods also considers the interactions among proteins by integrating metabolic networks. Rhodopsin works on retinol metabolism and connects to starch and sucrose metabolism. Thus, we consider these two metabolic networks. Figure 6-3 shows the metabolic networks considered in this chapter.

In vitro and in vivo data. We have measured the actual activity levels of 23 compounds as pharmacologic chaperones of misfolded rhodopsin through in vitro experimentation in the lab. In order to ensure that our experiments reflect the actual protein activity, we have also carried out in vivo experiments on bovine cell culture. In these experiments, the cell culture contains most of the proteins and enzymes of this organism and the corresponding pathways. Thus, we expect that our wet-lab results will be similar to those that one can observe on similar organisms.

In this section, our goal is to predict the observed activity level of these compounds using in silico data collected for docking with rhodopsin and other proteins. Table 6-1 provides the list of compounds that have been tested so far. 'Obs' denotes the observed results of in vivo or in vitro assays. hsa00830 is the retinol metabolism and hsa00500 is the starch and sucrose


metabolism of Homo sapiens. The values in the 'Affinity to proteins' columns show the binding affinity between compounds and proteins. The values in the 'Impact on pathways' columns show the impact values for the compounds on the networks. 'Rho' denotes rhodopsin. The total experimental time required for measuring the activity levels of these compounds in the lab was around one man-month.

Experiment design: We used five-fold cross validation for comparing the different methods. For this, we randomly decomposed the dataset in Table 6-1 into five groups such that each group has roughly the same number of compounds. For each fold, we treated one of the groups as the testing dataset and the other four groups as the training set. The four methods that were compared are as follows.

Target only: This is the traditional screening method used in the literature [50, 63, 68]. This method focuses on the target protein alone, that is, rhodopsin in our biological application. To replicate this process, we fit a linear regression model to the binding affinity of rhodopsin and the observed activity.

Affinity: This is the ranking method described in Section 6.2.2.1. It uses the binding affinity information of additional proteins.

Random-Affinity: An interesting question is to what extent the protein selection helps in our algorithm. To answer this question, we have implemented a randomized version of our affinity algorithm. This algorithm uses the same number of proteins and works the same as our affinity method. The difference is that it uses randomly selected proteins.

Impact: This is the ranking method discussed in Section 6.2.2.2. It estimates the impact of compounds on metabolic networks. Preliminary experiments showed that the binding probability that assumes all compounds are competing gave better results than the one that considers only the natural ligands. Thus, we assume that all the compounds are competing and apply the corresponding equation of Section 6.2.2.2 to compute the probability that a compound binds to the protein.
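The cross-validation protocol above can be sketched for the simplest of the four methods, Target only, which regresses activity on a single affinity value. The sketch below is illustrative only: the function names, the random seed and the synthetic data are assumptions, and the mean absolute error is used as the score because it is one of the accuracy criteria reported in Section 6.3.1.

```python
import random

def five_fold_indices(n_items, n_folds=5, seed=0):
    """Randomly split item indices into folds of roughly equal size."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    return [idx[k::n_folds] for k in range(n_folds)]

def fit_line(xs, ys):
    """Least-squares fit of y = b0 + b1 * x for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1

def cross_validate(affinity, activity, n_folds=5, seed=0):
    """Average, over the folds, of the mean absolute error on the held-out group."""
    folds = five_fold_indices(len(affinity), n_folds, seed)
    fold_errors = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(affinity)) if i not in held_out]
        b0, b1 = fit_line([affinity[i] for i in train_idx],
                          [activity[i] for i in train_idx])
        errors = [abs(b0 + b1 * affinity[i] - activity[i]) for i in test_idx]
        fold_errors.append(sum(errors) / len(errors))
    return sum(fold_errors) / n_folds

# Synthetic, noise-free data for 23 hypothetical compounds.
affinity = [-float(i) for i in range(23)]
activity = [2.0 - 0.5 * a for a in affinity]
print(cross_validate(affinity, activity))
```

With 23 compounds, the split produces three groups of five and two groups of four, so every compound is held out exactly once.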


6.3.1 Accuracy in Estimation

We test the accuracy of each method as follows. For each method, we built a model using the training set. We then used this model to predict the activities of the samples in the test set. We measured the accuracy of these predictions using three criteria. The first one is the correlation coefficient between the predicted and the observed activity levels. The second and the third ones are the average absolute distance and the average squared distance between the predicted and actual activity levels, respectively. We report the average over the five runs for the five training datasets.

Table 6-2 reports the accuracies of the four methods. The results show that our impact method has the best accuracy, both in terms of a better fit and a lower error in estimation. This indicates that considering the impact of the compounds on the metabolic networks is more promising than considering the affinity between the individual proteins and the compounds. This is because altering the activity level of a set of proteins can affect the activity of the remaining proteins and enzymes in the metabolic network through interactions, even when the drug compound has little or no affinity to them. Our Impact method exploits this observation successfully to rank the compounds.

Our affinity method has a significantly better correlation coefficient than the Random-Affinity method. This supports our hypothesis that proteins should be carefully selected. Interestingly, the traditional method performs better than the Random-Affinity method. This means that simply using more proteins to select compounds does not help in ranking compounds. Irrelevant proteins may instead lower the quality of the final ranking, as they may introduce noise. Thus, the additional proteins need to be relevant to the target proteins.

Finally, we observe that the prediction accuracies of the affinity method and the traditional method are comparable. Therefore, one might think that these two methods have the same


value in selecting compounds. The next section carries out another experiment that shows that the traditional method performs significantly worse than both of our methods.

6.3.2 Accuracy in Ranking

The eventual goal of a compound selection algorithm is to rank compounds according to their expected activity levels. In this section, we evaluate the ability of each of the algorithms to correctly rank the test set. To do this, we compute the ranking error of each algorithm as follows. We first rank the test compounds based on our prediction. We then rank the same compounds based on their actual activity levels. We then compute the sum of |predicted rank - actual rank| over all the compounds in the test set.

We also generated a distribution of the ranking error by randomly permuting the compounds in the test set a large number of times. We used this distribution to derive the expected value of the sum for a random ranking. We also computed the probability that the null hypothesis (that a method is generating a random ranking) is correct, based on the sum derived using that method. For instance, if this probability value is 10% for a given ranking, then 10% of the random rankings will produce a lower ranking error than that ranking. A lower value of this probability implies that the chances are high that the method can better predict the relative ranking of compounds.

Table 6-3 shows the results. $R_p$ denotes the rank of the predicted result. $R_a$ denotes the rank of the actual observed result. Percentage denotes the percentage of random rankings for which the value of $\sum |R_p - R_a|$ is less than the value obtained by the method. According to the results, the impact method predicts the ranks with the smallest distance from the actual ranks. This coincides with our results in Table 6-2. Fewer than 6% of the random rankings are better than the sum generated by the impact method. Also, just using affinity for the target protein only is not significantly better or worse than random ranking. We observe that our affinity method is significantly better than the traditional method in selecting compounds although they have the same prediction error (see

results in Table 6-2. This implies that small alterations in the predicted activity levels can lead to significant changes in the ranking error. Finally, the extremely high probability value (81.3%) for the target-only method implies that the traditional methods will fail to distinguish compounds that have high affinity to the target protein. Thus, from Tables 6-2 and 6-3, we conclude that considering metabolic networks improves the ranking of compounds significantly.

6.4 Discussion

In this chapter, we develop two novel in silico compound selection methods. The major difference between our methods and traditional in silico screening methods is that we consider additional proteins and enzymes for ranking drug compounds, while existing strategies often focus on the target protein alone. We consider the interactions among enzymes by integrating metabolic networks with the docking results. In particular, our metabolic network based method uses Monte Carlo simulations to compute the impact of each compound on the metabolic network. The impact of a compound on a network is the amount of change in the steady state of the network when that compound is included.

Experiments on the prediction of pharmacologic chaperones of misfolded rhodopsin show that our in silico compound selection method has better performance than the docking method that only uses rhodopsin. Also, just using affinity for rhodopsin alone is not significantly better or worse than random ranking.

Our methods can select compounds based on learning from previous results. We can do this as follows. First, we build a training dataset, which is a population of compounds whose results are measured by in vivo or in vitro experiments. As the size of the training dataset grows, the accuracy improves. In fact, the training dataset contains a small number of compounds, because each experimental result usually requires many hours. We provide two methods for ranking the compounds: ranking based on affinities, and integrating networks in ranking. We estimate the parameters of these two methods based on the training dataset. Thus an active learning process

can be created that continuously refines the prediction as more data is collected. Clearly, this has to be used carefully to minimize the potential pitfalls of multi-hypothesis testing.

Figure 6-1. A framework for three compound selection strategies denoted by A, B, and C.

Figure 6-2. Distribution of binding affinity for rhodopsin (PDB: 1F88) and all the 1,990 compounds in the diversity set.
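The three accuracy criteria of Section 6.3.1 (correlation coefficient, average absolute distance, and average squared distance between predicted and actual activity levels) can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in this work: the function names and the toy activity values are hypothetical, and the correlation is assumed to be the standard Pearson coefficient.

```python
import math


def correlation_coefficient(predicted, actual):
    """Pearson correlation between predicted and actual activity levels."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    # Undefined (division by zero) if either list is constant.
    return cov / math.sqrt(var_p * var_a)


def mean_abs_distance(predicted, actual):
    """Average of |Ob_p - Ob_a| over all test compounds."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def mean_squared_distance(predicted, actual):
    """Average of (Ob_p - Ob_a)^2 over all test compounds."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted)


if __name__ == "__main__":
    # Hypothetical activity levels for a four-compound test set.
    predicted = [0.9, 0.2, 0.6, 0.1]
    actual = [1.0, 0.3, 0.8, 0.0]
    print("correlation:", correlation_coefficient(predicted, actual))
    print("avg |Ob_p - Ob_a|:", mean_abs_distance(predicted, actual))
    print("avg (Ob_p - Ob_a)^2:", mean_squared_distance(predicted, actual))
```

In a five-fold setting such as the one described in Section 6.3.1, these three values would be computed per run and then averaged.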

Table 6-1. Dataset information needed for the application to the small molecule pharmacologic chaperones of misfolded rhodopsin.

                       Affinity to proteins                                                         Impact on pathways
                       Rho        3.2.1.31   2.7.11.14  1.14.14.1  2.4.1.17   1.2.1.36   1.1.1.1
Compound        Obs    1F88       1BHG       3C4W       2FDU       2Q6V       1BI9       1U3T       00830      00500
NSC723          0.98   -8.58      -4.32      -6.35      -7.49      -5.25      -7.52      -7.54      4.49       4.88
NSC163936       0.3    -5.47      -4.8       -4.64      -5.05      -4.43      -5.03      -5.84      0.75       7.24
damascone       0.8    -5.13      -4.58      -4.94      -5.39      -3.69      -5.11      -6.48      0.85       6.40
Pseudoionone    0      -4.99      -3.26      -4.2       -4.58      -2.82      -4.2       -4.13      0.29       0.71
dihydro ionone  0.7    -4.24      -4.05      -4.58      -5.01      -4.01      -4.18      -6.03      0.32       3.85
NSC4883         -0.14  -7.63      -4.56      -5.82      -6.95      -5.13      -5.67      -5.88      2.53       6.09
NSC34560        0.9    -5.69      -3.89      -4.7       -5.04      -4.47      -4.38      -5.99      0.40       2.77
NSC218          -0.08  -5.3       -5.89      -5.73      -6.07      -4.18      -5.93      -5.47      2.19       9.47
NSC26718        0.2    -4.5       -4.41      -3.76      -5.01      -4.36      -4.1       -4.52      0.29       5.48
ionone          1      -5.55      -3.73      -5.43      -5.4       -4.44      -3.98      -5.57      0.31       1.86
irone           1      -5.71      -3.97      -4.64      -5.02      -3.37      -4.25      -5.24      0.35       3.09
ionol           0.8    -6.12      -4.51      -5.16      -6.12      -4.76      -4.27      -6.17      0.76       6.20
NSC45012        0.4    -6.65      -4.96      -5.42      -6.19      -4.9       -5.57      -6         1.75       7.92
NSC163995       0.5    -4.69      -4.03      -5.79      -6.15      -4.31      -4.21      -5.96      0.88       3.79
NSC170691       0.3    -6.48      -4.96      -6.33      -5.5       -4.63      -5.25      -5.22      1.15       7.74
NSC4777         0.5    -5.32      -5.25      -5.32      -5.52      -4.58      -5.06      -4.8       0.94       8.74
NSC3975         -0.01  -8.4       -5.78      -5.74      -6.99      -6.49      -6.62      -6.39      3.61       9.48
NSC49193        0.2    -3.39      -4.38      -2.75      -3.83      -3.15      -2.27      -3.57      0.06       5.53
NSC92799        0.1    -6.26      -4.69      -3.8       -6.49      -4.44      -5.48      -5.38      1.79       6.70
NSC415          -0.2   -5.26      -5.36      -5.73      -5.87      -6.16      -4.56      -6.07      1.03       9.06
NSC74159        0.15   -4.84      -4.79      -4.51      -4.55      -3.66      -4.49      -5.26      0.26       7.16
NSC47520        0.45   -6.53      -5.08      -5.9       -6.68      -4.42      -5.96      -6.05      2.33       8.37
NSC2280         0.03   -7.86      -4.17      -4.48      -8.68      -4.29      -6.73      -6.4       4.45       4.42

Table 6-2. The accuracy of the activity level predictions of four strategies. Correlation coefficient shows the correlation between the actual and the predicted activity levels. $Ob_p$ and $Ob_a$ denote the predicted and the actual activity levels.

                  Correlation    Average distance
Methods           coefficient    $(Ob_p - Ob_a)^2$    $|Ob_p - Ob_a|$
Target-only       0.65           0.17                 0.35
Random-Affinity   0.56           0.26                 0.36
Affinity          0.65           0.17                 0.34
Impact            0.68           0.16                 0.33

Table 6-3. The ranking accuracy of three strategies.

Methods           $\sum |R_p - R_a|$    Percentage (%)
Target-only       38                    81.3
Affinity          32                    40.9
Impact            24                    5.69
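The ranking error and the Percentage column of Table 6-3 follow the procedure of Section 6.3.2: rank the compounds by predicted and by actual activity, sum the absolute rank differences, and compare that sum against the sums obtained from random rankings. The Python sketch below illustrates this under stated assumptions; it is not the code used to produce the table. The function names, the number of random trials, the fixed seed, and the tie-breaking rule (ties are broken by input order) are all hypothetical choices.

```python
import random


def ranks(values):
    """Rank 1 goes to the highest activity level; ties broken by input order."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def ranking_error(predicted, actual):
    """Sum of |R_p - R_a| over all compounds in the test set."""
    return sum(abs(rp - ra) for rp, ra in zip(ranks(predicted), ranks(actual)))


def random_better_percentage(predicted, actual, trials=10000, seed=0):
    """Percentage of random rankings whose error is below the method's error."""
    rng = random.Random(seed)
    actual_ranks = ranks(actual)
    observed = ranking_error(predicted, actual)
    permutation = list(range(1, len(actual) + 1))
    better = 0
    for _ in range(trials):
        rng.shuffle(permutation)  # one random ranking of the test compounds
        error = sum(abs(rp - ra) for rp, ra in zip(permutation, actual_ranks))
        if error < observed:
            better += 1
    return 100.0 * better / trials


if __name__ == "__main__":
    # Hypothetical activity levels for a five-compound test set.
    predicted = [0.9, 0.2, 0.6, 0.1, 0.4]
    actual = [1.0, 0.3, 0.8, 0.0, 0.2]
    print("ranking error:", ranking_error(predicted, actual))
    print("% random rankings that are better:",
          random_better_percentage(predicted, actual))
```

A low percentage means that few random rankings beat the method, which is the sense in which the Impact row of Table 6-3 is the strongest of the three.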

Figure 6-3. Retinol metabolism and Starch and sucrose metabolism. The dark solid circle shows the enzymes in the selected protein set.

CHAPTER 7
CONCLUSION

The focus of this thesis is on enzymatic target identification and compound selection. The theoretical and practical findings of our work can be summarized as follows.

We developed a scalable iterative method which computes a sub-optimal solution within reasonable time bounds for the enzymatic target identification problem under Boolean network models. The method consists of two phases: the Iteration and Fusion Phases. The experiments on the E. coli metabolic network showed that the average accuracy of the Iteration Phase alone deviates from that of the exhaustive search by only 0.02%. The Iteration Phase is highly scalable. It can solve the problem for the entire metabolic network of Escherichia coli in less than 10 seconds. The Fusion Phase improves the accuracy of the Iteration Phase by 19.3%.

We proved that finding the enzyme knockout strategy under the OptKnock framework is NP-hard. We provided a binary method and a continuous method for the enzymatic target identification problem with multiple-enzyme association. Experiments showed that the enzyme association strongly influences the performance of the linear programming method. We observed that our binary method runs much faster than the continuous method. For the pathways of H. sapiens from KEGG, our binary method runs in less than one second for the whole metabolism. Therefore, our binary method is useful for biological applications.

We developed two algorithms for the enzymatic target identification problem by manipulating the steady state, namely a traversal approach and a genetic algorithm. The traversal approach explores possible solutions in a systematic way, using filtering and prioritization strategies to reduce the search space. The genetic algorithm derives good solutions from a set of alternative solutions iteratively, and it can run for very large pathways.

Our experiments showed that our algorithms' results follow those obtained in vitro in the literature from a number of applications. They also showed that the traversal method is a good approximation of the exhaustive search algorithm and it is up to 11 times faster than the

exhaustive one. This algorithm runs efficiently for pathways with up to 30 enzymes. For large pathways, our genetic algorithm can find good solutions in less than 10 minutes.

We addressed a new variant of the enzymatic target identification problem, named the dynamic enzymatic target identification problem. Unlike the existing problem, we considered the entire trajectory that the given network's state follows to reach the steady state. We considered three alternative distance measures to compute the dissimilarity between two dynamic states, namely exact, time warping, and pattern distance. We exploited the OPMET algorithm to develop a branch and bound strategy that uses these distance measures to solve the dynamic enzymatic target identification problem. In order to improve the running time of this algorithm for the dynamic states, we developed a partitioning strategy.

Our experiments demonstrated that our method is 85-100% accurate when a single enzyme is inhibited. It was 65-75% accurate when two enzymes are inhibited. Furthermore, our partitioning strategy reduced the number of time intervals computed for dynamic states by a factor of 2 to 6.

We developed two novel computational methods that rank a given set of compounds for a given target protein or enzyme for the compound selection problem, integrating structural properties of proteins and biological networks. The first method considered additional proteins and enzymes while ranking compounds, whereas existing strategies often focus only on the target protein. A drug compound can alter the state of the metabolic network. Our second method considered the impact of the drug compounds on the metabolic network by integrating the interactions among proteins in metabolic networks with the docking results. Experiments on the pharmacologic chaperones of misfolded rhodopsin showed that our method has better accuracy than the traditional methods that focus only on rhodopsin. Our results are in the top 5.7% of all possible rankings. For the same dataset, the traditional method's results are in the top 81% of all possible rankings.


BIOGRAPHICAL SKETCH

Bin Song was a Ph.D. student in Computer Science at the University of Florida, CISE Department, where she was also a member of the Database Center. Bin Song worked under the supervision of Dr. Tamer Kahveci and Dr. Sanjay Ranka. Bin Song's research interests were metabolic engineering, compound selection, and algorithms.

Bin Song received the BS and MS degrees in computer science from the University of Science and Technology of China, Hefei, China, in 2002 and 2005. She received a Ph.D. degree in computer science at the University of Florida in 2010.