Group Title: Department of Computer and Information Science and Engineering Technical Reports
Title: OPMET : a metabolic network-based algorithm for optimal drug target identification
CITATION PDF VIEWER THUMBNAILS PAGE IMAGE ZOOMABLE
Full Citation
STANDARD VIEW MARC VIEW
Permanent Link: http://ufdc.ufl.edu/UF00095639/00001
 Material Information
Title: OPMET : a metabolic network-based algorithm for optimal drug target identification
Alternate Title: Department of Computer and Information Science and Engineering Technical Report
Physical Description: Book
Language: English
Creator: Sridhar, Padmavati
Kahveci, Tamer
Ranka, Sanjay
Publisher: Department of Computer and Information Science and Engineering, University of Florida
Place of Publication: Gainesville, Fla.
Copyright Date: 2006
 Record Information
Bibliographic ID: UF00095639
Volume ID: VID00001
Source Institution: University of Florida
Holding Location: University of Florida
Rights Management: All rights reserved by the source institution and holding location.

Downloads

This item has the following downloads:

2006009 ( PDF )


Full Text













OPMET: A metabolic network-based


algorithm for optimal drug target identification*


Padmavati Sridhar, Tamer Kahveci and Sanjay Ranka

Department of Computer and Information Science and Engineering,

University of Florida, Gainesville, FL, USA, 32611

{psridhar, tamer, ranka}@cise.ufl.edu



Abstract

Recent advances in bioinformatics promise rational drug-design meth-
ods which will reduce serious side-effects through the identification of en-
zymatic targets. Efficient computational methods are required to identify
the optimal enzyme-combination (i.e., drug targets) whose inhibition will
achieve the required effect of eliminating a given target set of compounds
while incurring minimal side-effects. An exhaustive evaluation of all possi-
ble enzyme combinations may become computationally infeasible for very
large metabolic networks.
*Work supported partially by ORAU (Award No: 00060845).














We formulate the optimal enzyme-combination identification problem as

an optimization problem on metabolic networks. We define a graph based

computational model of the network which encapsulates the impact of en-

zymes onto compounds. We propose a branch-and-bound algorithm, named

OPMET, to explore the search space. We develop a cost model and two en-

zyme prioritization strategies, Static OPMET and Dynamic OPMET, based

on it. Static OPMET priorities enzymes according to their impacts, such

that the most promising enzymes are inspected first for possible inclusion

in the optimal subset. Dynamic OPMET dynamically updates the priorities

as the search space is explored. We also develop two filtering strategies to

prune the search space while still guaranteeing an optimal solution. They

compute an upper bound to the number of target compounds eliminated and

a lower bound to the side-effect respectively. Our experiments on E.Coli

metabolic network show that OPMET can reduce the total search time by

several orders of magnitude as compared to the exhaustive search.



1 Introduction

In pharmaceutics, the development of every drug mainly involves target identi-

fication, validation and lead inhibitor identification [5]. Traditional drug discovery

approaches focused more on the efficacy of drugs than their toxicity (untoward

side effects). Lack of predictive models that account for the complexity of the

inter-relationships between the metabolic processes often leads to drug develop-

ment failures. Toxicity and/or lack of efficacy can result if metabolic network

components other than the intended target are affected. This is well-illustrated by














the example of the recent failure of Tolcapone (a new drug developed for Parkin-

son's disease) due to observed hepatic toxicity in some patients [7]. Post-genomic

advances has seen a perspective shift in drug research. The current focus is on the

identification of biological targets (gene products, such as enzymes or proteins)

for drugs, which can be manipulated to produce the desired effect (of curing a

disease) with minimum disruptive side-effects [22, 25].

Enzymes catalyze reactions, which produce metabolites (compounds) in the

metabolic networks of organisms. Enzyme malfunctions can result in the ac-

cumulation of certain compounds which may result in diseases. We term such

compounds as Target Compounds and the remaining compounds as Non-Target

compounds. For instance, the malfunction of enzyme phenylalanine hydroxy-

lase causes buildup of the amino acid, phenylalanine, resulting in phenylketonuria

[24], a disease that causes mental retardation. It is therefore intuitive to locate the

optimal enzyme set which can be manipulated by drugs to prevent the excess pro-

duction of target compounds, with minimal side-effects. We term the side-effects

of inhibiting a certain enzyme combination as the damage caused to the metabolic

network.

Given a metabolic network and a set of target compounds, we consider the

problem of identifying the set of enzymes whose inhibition eliminates the tar-

get compounds and incurs minimum damage. Evaluating all enzyme combina-

tions is not feasible as the number of such combinations increases exponentially

with the number of enzymes. Hence, more efficient computational methods are

needed. We propose OPMET, an Optimal enzyme drug target identification algo-













rithm based on Metabolic networks, to solve this problem. The main contributions

of our paper are as follows:

1. We define a graph based computational model of the metabolic network

that encapsulates the impact of enzymes onto compounds. We formulate

the optimal enzyme combination identification problem as an optimization

problem on this graph. We develop a cost model that takes both the observed

and potential damage (i.e, side-effects) resulting from the inhibition of an

enzyme set into account, for the cost computation.

2. We propose a branch-and-bound algorithm, named OPMET, to explore the

search space. We develop two enzyme prioritization strategies, Static OP-

MET and Dynamic OPMET based on the cost model. Static OPMET prior-

itizes enzymes according to the impact of their inhibition on the production

of compounds in the metabolic network. It inspects the most promising en-

zymes first, for possible inclusion in the optimal subset. Dynamic OPMET

dynamically updates the priorities as the search space is explored.

3. We develop two filtering strategies for OPMET, namely target filter and

non-targetfilter, to prune the search space while still guaranteeing an opti-

mal solution. The target filter eliminates a subspace when it is proven that

there is no combination of enzymes in this space that can stop the produc-

tion of all the target compounds (i.e., there is no useful drug target). The

non-target filter prunes subspaces where there is no solution with a damage

less than that of the optimal solution found so far.













Our experiments on E.Coli metabolic network (data extracted from KEGG [18,

19]) show that our methods can reduce the total search time by several orders of

magnitude as compared to the exhaustive search. Dynamic OPMET with a com-

bination of the target and non-target filters prunes 91.6 % of the search space

of the exhaustive search (232 combinations) on average. Dynamic OPMET with

combined filters generates the optimal enzyme combination within the exploration

0.005 % of the search space on average.

The rest of the paper is organized as follows. Section 2 formally defines

the problem and describes our proposed network model. Section 3 presents the

proposed OPMET algorithm with static and dynamic prioritization and filtering

strategies. Section 4, discusses experimental results. Section 5 discusses the re-

lated work. Section 6 gives a brief discussion of the contribution of the paper.


2 Problem Definition

In this section, we propose a formal definition of the optimal enzyme combi-

nation identification problem and describe our model of the metabolic network.

We develop a graph based model that captures the interactions between reac-

tions, compounds, and enzymes. Our model is a variation of the boolean network

model [23, 20]. Let R, C, and E denote the set of reactions, compounds, and

enzymes respectively. The vertex set consists of all the members of R U C U E.

A vertex is labeled as reaction, compound, or enzyme based on the entity it refers

to. Let VR, Vc, and VE denote the set of vertices from R, C, and E. A unique

identifier is also stored along with each vertex. A directed edge from vertex x to



































----------. Disrupted pathway
(a)









--- -- Disrupted Pathway
(b)

Figure 1: A graph constructed for a metabolic network with three reactions R1, R2, and
R3, two enzymes E1 and E2, and nine compounds C1, C2, C9. Circles, rectangles,
and triangles denote compounds, reactions, and enzymes respectively. Here, C4 (shown
by double circle) is the target compound. Dotted lines indicate the subgraph removed due
to inhibition of an enzyme. (a) Effect of inhibiting E2. (b) Effect of inhibiting Ei.













vertex y is then drawn if one of the following three conditions holds: (1) x rep-

resents an enzyme that catalyzes the reaction represented by y. (2) x corresponds

to a substrate for the reaction represented by y. (3) x represents a reaction that

produces the compound mapped to y. Notice that an edge is drawn only between

a vertex in VR and a vertex in Vc U VE.

Figure 1 illustrates a small hypothetical metabolic network. A directed edge

from an enzyme to a reaction implies that the enzyme catalyzes the reaction (i.e.,

El catalyzes RI and E2 catalyzes R2 and R3). A directed edge from a compound

to a reaction implies that the compound is a reactant. A directed edge from a

reaction to a compound implies that the compound is a product. In this figure, C4

is the target compound (i.e., the production of C4 should be stopped). In order

to stop the production of C4, R2 has to be prevented from taking place. This can

be achieved in two ways. One way is by disrupting one of its catalyzing enzymes

(E2 in this case). Another is by stopping the production of one of its reactant

compounds (C2 or C3 in this case). If we choose to stop the production of C2,

we need to recursively look for the enzyme which is indirectly responsible for its

production (El in this case). Thus, the production of the target compound can be

stopped by manipulating either El or E2.

Figure 1(a) shows the disruption of E2 and its effect on the network. Inhibiting

E2 results in the knock out of compounds C5, Cs and C9 in addition to the target

compound, C4. Note that the production of C7 is not stopped since it is produced

by R1 even after the inhibition of E2. We term the effect of disrupting non-target

compounds as the side-effect of manipulating E2. We define the number of non-













target compounds knocked out as the damage, the manipulation of an enzyme set

causes to the metabolic network. In this case, the damage of inhibiting E2 is 3 (i.e.,

C5, Cs and Cg). Figure l(b) shows the inhibition of El and its effect on the same

network. In this case, the damage is 2 (i.e., C2 and C5). The important observation

is that El and E2 both achieve the effect of disrupting the target compound, C4.

Hence, El and E2 are both potential drug targets. However, El is a better drug-

target than E2 since it causes lesser damage.

We formally state the optimal enzyme combination identification problem as:

Given a set of target compounds T (T c C), find the set of enzymes

X (X C E) with minimum damage, whose inhibition stops the pro-

duction of all the compounds in T.

Our graph model accurately reflects the impacts of gene manipulations. Inhi-

bition of a set of enzymes, for example, is simulated by deletion of the vertices

for the enzymes in that set. This modification then propagates to the vertices that

are reachable from the deleted vertices.

Lemke et. al [13, 15] defined the damage of inhibition of an enzyme as the

number of compounds whose production stops after the inhibition of that enzyme

(all the compounds are assumed to be of equal importance to the network). Our

definition of damage is similar to that of Lemke in principle. We differ by ex-

cluding the target compounds from the damage computation. At a high level, we

consider damage as the side-effects of inhibiting an enzyme set.














3 Methods


The number of possible subsets of enzymes that need to be considered for

finding the optimal drug target is exponential in an exhaustive search. Clearly, this

is not feasible beyond small sized metabolic networks. In this section, we propose

OPMET, a branch and bound algorithm that considerably reduces the number of

possible combinations to be searched while still guaranteeing to find an optimal

solution. We first describe the state space that represents all possible solutions

using a tree and the backtracking mechanism which guarantees that the search

is conducted till an optimal solution is found. As part of any effective branch

and bound strategy it is important to find a good solution quickly. This calls for

effective filtering (or pruning) of subspaces that can be guaranteed not to have

better solution that the best found so far. Our prioritization and filtering strategies

which achieve this goal are described in detail in the following subsections.

3.1 State Space and Basic Strategy

In this subsection we discuss our modeling of the search space and the basic

underlying strategy of the OPMET algorithm. We develop a systematic and flexi-

ble enumeration of the search space and use it to build our efficient search strategy

to find optimal result.

Let Ei, Vi, 1 < i < m denote the set of enzymes for a metabolic network. The

search space is modeled as a tree structure. Every node of this tree corresponds to

a state in the search space and it is represented by a 4-tuple ([en, e ,, - e m], k,

d, remove). Here, 7r1, ., 7m is a permutation of 1, 2, *- m. The first parameter














corresponds to the state of all the enzymes (i.e., e, corresponds to enzyme Ei).

e~ = 1 if Ei is inhibited. Otherwise, e, = 0. The parameter k indicates that

the first k enzymes are considered at that search state. The decision to inhibit or

not inhibit has been fixed for enzymes from 1 to k 1. We now set e, = 1

and e, = 0, Vi, k < i < m. The damage incurred due to inhibited enzymes at

that state is represented by d. The final parameter, remove, is a boolean variable.

It takes value True if the inhibited enzymes stop the production of all the target

compounds. Otherwise, it is set to False. We call a node with remove = True

as a true node, and a node with remove = False as a false node. Figure 2 shows

part of the search tree for a hypothetical 4-enzyme network. The tuple for node

ni indicates that enzyme El is inhibited and it is a true node with damage d = 5.

k = 1 since the decision to inhibit or not has been fixed only for one enzyme.

OPMET Algorithm: We start with a root node whose 4-tuple is ([0, 0, * 0],

0, 0, False) indicating that all enzymes are present in the network. As the search

space is traversed, we keep the true node with the minimum damage found so far

as the current true solution and store the associated damage value as D, the global

cut-off threshold. D is initialized to the number of compounds in the network. At

any point, we have an active set of nodes A, stored in a stack structure. A contains

the nodes currently being considered. Let node N = ([e, e,,, .* e ], k, d,

remove) be the node on top of this stack (i.e., the node to be evaluated). We check

if the damage of the current node N, d < D and whether it is a true node. There

are three cases:

Case 1: N is a true node with damage d < D. In this case, we save N as the

10



















[0001,


[0110, 3, 2, True]


Figure 2: The basic OPMET strategy for a hypothetical 4-enzyme network. Enzymes
are ordered as El, E2, E3, E4. no, nl, " -, n4 are the nodes generated. The initial global
cut-offthreshold D = 10 initializedd to the total number of compounds in the network).
ni is a true solution (shown by double circle) with damage d = 5. Since d < D, D is
updated to 5. n1 is saved and the subtree rooted at n1 is pruned. The method backtracks
to no. n2, n3 are false nodes generated along the search with damage d < D. ns is a true
node with d = 2. As d < D, D is updated to 2 and the method backtracks to search the
unexplored space for solutions with d < D (indicated by the dashed edge).

current true solution and update D with the damage value of N. We then

backtrack.


Case 2: N is a true or false node with damage d > D. In this case we prune

the subtree rooted at N. We then backtrack.

Case 3: N is a false node with damage d < D. In this case, we insert N in

the active set A for backtracking purposes. We then create a new node N'

by setting e+ = 1 in N (i.e., we inhibit the enzyme Ek+ ). The resulting

node is N' = ([e,, e,2, * e,], k + 1, d', remove'). The node N' is

evaluated in the next step similarly.

Backtracking involves following steps. First we pick the top node from the ac-













tive nodes stack A. Let N' = ([e,, e/, e], k', d', remove') denote this

node. We then set e,, = 0 (indicating the node we are backtracking from) and

e, 2 1 in N' (i.e., we inhibit the enzyme e,,, k). The resulting node becomes

the node to be evaluated in the next step. The first two cases above stop expanding

the tree at the current node. The former one implies that the current node is a pos-

sible solution (node ni in Figure 2). The latter one implies that the current node

incurs too much damage to lead to a possible solution. The third case happens

when the current node does not stop production of all the target compounds, but

the damage is lower than the damage of the current best solution (nodes n2 and

n3 in Figure 2). Such nodes may produce a possible solution (node n4 in Figure

2) with the inhibition of more enzymes. Thus, they need to be explored further

to ensure that we find an optimal solution. The search terminates when there are

no more nodes to explore. At this stage, the current true solution is the optimal

solution.

3.2 OPMET Prioritization Strategies

In this subsection we present our cost model as the basis for enzyme evalua-

tion and our resulting enzyme prioritization strategies. The purpose of ordering

enzymes by a priority measure is to test the combinations with a high likelihood

of deleting the target compounds first. We develop two enzyme prioritization

strategies: 1) Static OPMET: We pre-order the enzymes according to a priority

measure and apply the OPMET algorithm. 2) Dynamic OPMET: We dynami-

cally select the next enzyme to inhibit based on the partial solution. These are













described in more detail in the rest of the section.

3.2.1 Static OPMET

The simplest way to order the enzymes is to sort them in ascending order of

damage. However, this is inaccurate for several reasons. First, since damage is de-

fined in terms of non target compounds, it does not reflect an enzyme's impact on

the target compounds. Second, the inhibition of an enzyme can make a compound

more vulnerable even if the compound is not removed entirely. For example, in

Figure 1, the production of C7 is not stopped after the inhibition of El or E2 indi-

vidually. However, C7 is removed if El is inhibited as well as E2. Thus, damage

values of individual enzymes are not good indicators of the combined damage of

these enzymes.

Cost Model: We develop a cost model as the basis for enzyme ordering in OP-

MET. This cost model takes both the observed and potential damage (side-effects)

resulting from the inhibition of an enzyme set into the cost computation. For each

enzyme Ei c E, the set of enzymes in the network, we compute a weight W(Ei)

as W(Ei) = 0 if Ei inhibited and W(Ei) = 1 otherwise. We assign rational

weights (fractions between 0 and 1) to the reaction and compound nodes and the

edges. Intuitively, the weight of a node or edge denotes the rate at which that node

or edge appears in the network. We then use these weights to compute the cost of

Ei. The weight of each node depends on the weights of the incoming edges and

the type of the node (i.e., reaction or compound):

Cost Rule 1: Let Rj be a reaction node. Let , 1 < i < k, denote the

weights of the incoming edges to Rj. We compute the weight of Rj as:























.......... Disrupted Pathway

Figure 3: Effect of deletion of enzyme El on the weights of reactions and com-
pounds in the network.


W(Rj) =min,lt .}.

This computation is intuitive since a reaction takes place only if all the in-

puts are present.

Cost Rule 2: Let Cj be a compound node. Let "t 1 < i < k, denote the

weights of the incoming edges to Cj. We compute the weight of Cj as:

W(cj) =k {t}.

This computation captures the property that a compound disappears from

the system only if all the reactions that produce it stop.

We define the weight of an edge as the weight of the node for which it is the

outgoing edge.

In order to compute the cost of Ei, we set the weight of Ei to zero (i.e.,

W(Ei) = 0). The weights of all the reaction and compound nodes are as-

signed progressively by a breadth-first search, according to the above scheme.

The weights of all the nodes and edges which can be reached from Ei are recom-

puted to reflect the change. The effect of deleting El on the weights of reactions













and compounds in the network is shown in Figure 3. We define an impact vector

for each enzyme based on the effects of its inhibition.

Definition 1 Given a network i/ ith n compounds, Cj, 1 < j < n. Let Wi(Cj)

denote the weight of the node corresponding to Cj after the inhibition of enzyme

Ei. We define the impact vector of E as I(E,) = [WW(C1), Wi(C2), Wi(Cn)]

We term Wi(Cj) as the impact of Ei on Cj, Vj. U

The impact vector of an enzyme approximates the amount of each compound

that remains after the inhibition of that enzyme. Every entry of the impact vector

is a fractional number between 0 and 1, where 0 indicates that the corresponding

compound does not exist after inhibition of the corresponding enzyme. We define

the cost of inhibiting an enzyme as follows:

Definition 2 Given a network i, ith n compounds, Cj, 1 < j < n. Assume that

the compounds Cj, Vj, 1 < j < k < n constitute the set of target compounds.

Assume that the remaining compounds Cj, Vj, k + 1 < j < n constitute the non-

target compounds. Let I(Ei) = [Wi(CI), *- Wi(C,)] denote the impact vector

of Ei. We define the cost of inhibiting Ei as


cost(Ei) = I(Ei) VT,


where V [vl, . v,] is the normalization vector: = k for 1 < i < k, and

( (-k )fork

Each target compound contributes a positive value and each non-target com-

15














pound contributes a negative value to the cost of an enzyme. The magnitude of the

contribution of a compound increases linearly with the amount of that compound

that diminishes from the system after the inhibition of the corresponding enzyme.

In other words, if a target compound is eliminated, the cost is small. On the other

hand, if a non-target compound is eliminated, the cost is large. This is justified

since the cost promotes removal of target compounds and demotes the removal of

non-target compounds.

In order to benefit from the pruning power of OPMET cases 1 and 2 (see

Section 3.1), we need to compute the permutation 7r1, .. 7m carefully. The

earlier we place the enzymes in the optimal solution in this permutation, the better,

as OPMET reaches the optimal solution earlier under such an ordering. Reaching

the optimal solution early increases the chances of pruning the remaining nodes

of the search tree due to OPMET Case 2. We sort the enzymes in the ascending

order of cost (see Definition 2). This is justified since a small cost value indicates

that the corresponding enzyme has a large impact on target compounds and low

impact on non-target compounds.

3.2.2 Dynamic OPMET

Static OPMET priorities enzymes based on their individual potential to be

part of the solution. However, this model does not fully account for the combined

damage of a set of enzymes. Finding the combined damage of a set of enzymes

is non trivial. This is because the combined damage depends on the overlap of

the reactions catalyzed by these enzymes as well as their individual damages. En-

zymes that catalyze almost the same set of reactions have less combined damage















1. Let N= ([e, c, e2, e*,] k, d, remove) be the node currently being evaluated.
2. Let R = [rl, r2, *- rn] /* r, E [0, 1] corresponds to the
remaining fraction of compound Ci, Vi */
3. For every candidate enzyme ei E ({e,, e 1 e, }
(a) Ri= R I(Ei) /* Compute new ratio */
(b) Cost(Ei) = Ri VT /* Cost of inhibiting E, */
4. Letj = argminj {Cost(Ej)}
5. R Rj /* Update the ratio after inhibition
of the enzyme with least cost */
6. Select Ej as the next enzyme to inhibit.



Figure 4: Dynamic OPMET procedure for enzyme evaluation.


compared to enzymes that catalyze disjoint reaction sets. This is due to the overlap

of the damage of the enzymes in the former case. We develop a dynamic strat-

egy, Dynamic OPMET, which sorts the enzymes on the fly, based on the partial

solution. At a high level, this strategy evaluates the current state at every step and

picks the enzyme-to-inhibit for the next step based on this evaluation.

Our goal is to dynamically decide which enzyme to inhibit next based on the

combined impact of all the enzymes inhibited so far. We predict the remaining

fraction of all the compounds after inhibition of each enzyme Ei, with the help of

its impact vector, I(Ei) (see Definition 1). This procedure is described in Figure

4. Let R [r, r2, ,, r,,] denote the remaining fractions of compounds (step

1). Here, ri E [0, 1] corresponds to compound Ci, Vi. We initialize ri = 1 VC

indicating that all compounds are being produced without any disruption in the

metabolic network. Let V be the normalization vector (see Definition 2).













At every step of the Dynamic OPMET algorithm, let N = ([e,, e2, ,

e_], k, d, remove) be the node currently being evaluated (i.e., the decision to

inhibit or not inhibit has been fixed for e W, e2,, e. ) (step 2). We now

need to decide which enzyme has to be evaluated next. In Step 3a, for every

enzyme in the remaining enzyme set (e,, Vi, k < i < m), we compute the new

remaining fractions of compounds (Ri). This is done by a Vector Direct Product

of R and the impact vector of Ei (I(Ei)). Vector direct product is defined as

X @ Y = [xIyil,x2y, - x.,,], where X = [xi, - -, x,] and Y = [y ,

y,]. The resulting vector Ri is an approximation to the impact of inhibition of
the enzyme Ei in addition to already inhibited enzymes. This is justified since

the quantity of a compound eliminated by a combination including Ei will be at

least as much as the quantity eliminated by Ei alone. A good candidate enzyme

at this step is one which ensures that lesser of the target compounds remain after

its inhibition. Also, it should ensure that the non target compounds suffer the

minimum possible damage. Our cost model satisfies these requirements. In Step

3b, we compute the cost of each enzyme Ei as the dot product of Ri and V. In

step 4, we pick the enzyme with the minimum cost (Ej). We update R with the

amounts of remaining compounds in vector Rj. Ej is chosen as the next enzyme

to inhibit. Thus, this strategy dynamically chooses the the next best enzyme to

inhibit.

The cost of finding the best enzyme at each step is O(mn), where m and n

denote the number of enzymes and compounds in the metabolic network respec-

tively. This is because a vector direct product costs O(n), and O(m) such products













are carried out.


3.3 OPMET Filtering Strategies

The static and dynamic OPMET strategies perform a cost-based ordering of

the enzymes in order to arrive at the optimal set of enzymes quickly. However,

the method still needs to inspect a large portion of the search space, in order to

ensure that the result is the optimal one. A good filtering strategy is essential to

ensure optimality within a reasonable amount of time. We propose two strategies

to eliminate large portions of the search space quickly. We start by introducing the

following theorem (proof given at the end of this subsection) which establishes a

relationship between the impact of enzymes and their damage.


Theorem 1 Let E {El, E2, *- Er} be a set of enzymes. Let Cj be a com-

pound in the metabolic network. Let di(Cj), 1 < i < r, denote the impact of Ei

on Cj. If the inhibition of all the enzymes in E stops the production of Cj, then

Ei 1( di(Cj)) > 1. t

Next, we describe our filtering strategies.

Target Filter: We traverse the state space to locate the optimal combination of

enzymes to delete a given target compound set. Considerable effort is spent evalu-

ating combinations which might yield a solution as the encountered damage value

is less than D, the global cut-off threshold (see Subsection 3.1). If none of these

combinations ultimately delete the target compound set, the effort spent is wasted.

Finding such combinations will improve the performance of the search vastly.














This is the motivation behind our Target filter.

The target filter eliminates a bulk of the search space when it is proven that

there is no combination of enzymes in this space that can stop the production of all

the target compounds (i.e., there is no useful drug target). This filtering strategy is

based on Theorem 1. Formally, let node N = ([e,, e,2, ., el], k, d, False) be

a node in the search space. Let T denote the set of target compounds. Backtrack

if
k nm
-(1 di(C))e + Y (1 d(C)) < 1,3C c T.
i 1 i=k+1

In this inequality, the first term indicates the impact of enzymes, which are

currently part of the solution set, on the target compounds. The second term

represents the impact of the remaining enzymes on the target compounds. Thus,

this filter examines not only the next enzyme being considered, but all the enzymes

which might eventually be considered as part of the solution.

Non-target Filter: We traverse the state space to locate the optimal combination

of enzymes to delete a given target compound set. The motivation behind this

filtering strategy is to quickly determine if there is any solution in the subtree with

a damage d < D, the global cut-off threshold (see Subsection 3.1). This strategy

eliminates a large chunk of the search space where all the possible combinations

have damage more D. This filter utilizes Theorem 1 similar to the target filter.

The idea is as follows. At a given node N, for each target compound, C, we find














the minimum number of enzymes, n such that


(1 d(C))e '> 1.
i= 1

This gives us the minimum number of enzymes needed to delete C. Let n,ax be

the maximum value of n for any target compound (i.e, we will need at least nmax

enzymes to delete the entire target compound set). Now, we sort the remaining

enzymes (enzymes not considered so far) in the ascending order of their damage

values. Let dmax be the damage of the enzyme at index nmax. If dmax in addition

to the damage incurred so far is greater than D, the global cut-off threshold, we

prune the sub tree rooted at N.

Proof of Theorem 1: Let E = {E1, E2, * Er} be a set of enzymes. Let Cj be

a compound in the metabolic network. Let di, 1 < i < r, denote the impact of Ei

on Cj (see Definition 1). Let di(Rk), 1 < i < r, denote the impact of Ei on Rk.

We need to show that the following rules hold:

Rule 1: If the inhibition of all the enzymes in E stops the production of Cj, then


r
Y(1 d (C)) > 1.
i 1





Rule 2: If the inhibition of all the enzymes in E stops a reaction Rk, then














r

i 1
We first consider the case when a reaction Rk is stopped and prove that Rule

2 holds, given that Rule 1 is correct.

Case 1: Ej cEE such that Ej catalyzes Rk. Hence, the inhibition of Ej stops

Rk. The impact of Ej on Rk is 1 dj(Rk) = 1. Therefore, i(1 di(Rk)) > 1.

Rule 2 holds.

Case 2. No enzyme Ej E E catalyzes Rk. This implies that E removes Ci,

which is an input to Rk. From Rule 1, -i(1 di(Ci)) > 1. From the Cost

Model (see Cost Rule 1 in subsection 3.2), di(Rk) = mindi(C')}, C' is the set

of all input compounds of Rk. Thus, 1 di(Rk) = max{1 di(C')}. Hence,

i 1(1- di(Rk)) > i ,( di(Ci)) > i.
Rule 2 holds.

We now consider the case when the production of a compound Cj stops and

prove that Rule 1 holds, given that Rule 2 is correct. If E removes Cj, it means

that all the reactions that produce Cj have been stopped. Let R1, R2, Rt be

the reactions that produce Cj. For each reaction Rk, 1 < k < t, one of the the

following two cases has to be true.

Case 1: 3 Ej E E such that Ej catalyzes Rk. Hence, the inhibition of Ej

stops Rk. The impact of Ej on Rk is 1 dj (Rk) = 1. From the Cost Model (see

Cost Rule 2 in subsection 3.2), 1 dj(Cj) > 1. Since all the t input reactions of

Cj have been stopped,













Er-(1- di(Cj)) > x t > i.
Rule 1 holds.

Case 2: No enzyme Ej E E directly catalyzes Rk. But since Rk is stopped by

the inhibition of enzymes in E, 'L(1 di(Rk)) > 1 (Rule 2). Also given that

t reactions produce Cj,

(1- d(Cj,)) 1- (I
t
Zr ( rdC)) r Z 1 Zt E (1 d (R))
1xt>

Rule 1 holds. U


4 Experimental Results

In this section, we evaluate the performance of the OPMET algorithm using

the following three criteria:

1. Number of nodes generated: It represents the total number of enzyme combi-

nations tested to complete the search. The lesser the number of nodes generated,

the better the performance of the method.

2. Optimal node rank: This indicates the number of nodes explored before

the method arrives at the optimal solution. A small optimal node rank indicates

that the strategy efficiently tested the optimal combination as soon as the search

started.

3. Execution time: This indicates the total time taken by the method to finish the

search and conclude that it found an optimal solution.

We implemented the OPMET algorithm with a random enzyme ordering, Sta-













tic OPMET and Dynamic OPMET (see Subsection 3.2). We implemented two

filtering strategies, Non-target filter and Target filter (see Subsection 3.3), to pro-

vide pruning capabilities to the OPMET algorithm. We compare the methods we

implemented to a hypothetical exhaustive search which enumerates and tests all

possible combinations of enzymes (2", where n is the number of enzymes).

We extracted the metabolic network information of Escherichia Coli from

KEGG [18, 19] (ftp: // ftp. genome. jp/pub/kegg/pathways/eco/).

The metabolic network in KEGG has been divided into smaller networks accord-

ing to their specific functions. We chose six of these networks for our experi-

ments, based on the number of enzymes. We devised a labeling scheme for the

networks which begins with 'N' and is followed by the number of enzymes in

the network. For instance, 'N20' indicates a network with 20 enzymes. Table 1

shows the metabolic networks chosen, along with their identifiers and the number

of compounds (C) and reactions (R). For each network, we constructed query sets

of sizes one, two and four target compounds, by randomly choosing compounds

from that network. Each query set contains 10 queries each.

We ran our experiments on an Intel Pentium 4 processor with 2.8 GHz clock

speed and 1-GB main memory running Linux operating system.

4.1 Evaluation of OPMET prioritization strategies

In this section, our goal is to evaluate the performance of our enzyme priori-

tization strategies. Here, we compare our static and dynamic OPMET algorithms

with a random ordering of enzymes and an exhaustive search. We do not include














Table 1: Metabolic networks from KEGG with identifier (Id). C and R denote the number of
compounds and reactions respectively.

Id Metabolic Network C R
N14 Citrate or TCA cycle 21 35
N17 Galactose 35 50
N20 Pentose phosphate 26 37
N24 Glycerolipid 32 49
N28 Glycine, serine and threonine 36 46
N32 Pyruvate 21 51


Table 2: Average number of nodes generated (#N) and Optimal Node Rank (ONR) of the differ-
ent ordering strategies, namely, Random Order (RO), Static OPMET (SO) and Dynamic OPMET
(DO) compared with hypothetical exhaustive search (ES).

Id ES RO SO DO
#N 16384 4190 3859 3273
N14
ONR 1220 3.65 3.77
#N 131072 58379 46411 38973
N17
ONR 22303 140.87 2.54
#N 1048576 147605 82643 78257
N20
ONR 100845 440.48 11.36


our filtering strategies here as the goal is to select the best possible enzyme order-

ing. We have shown the results only up to a network of size 20 enzymes. This is

because, beyond this, the search space grows rapidly, necessitating the application

of filtering strategies.

Table 2 shows the average number of nodes generated and the average opti-

mal node rank of the random ordering, static OPMET and dynamic OPMET, as

compared to an exhaustive search. The average number of nodes results show that

Dynamic OPMET is the best strategy for all the tested networks. Dynamic OP-













MET generates only 7 % of the nodes compared to an exhaustive search for N20.

Whereas a random enzyme ordering generates 14 % of the nodes compared to an

exhaustive search for the same network. Dynamic OPMET generates 30 % of the

nodes for N17. This is still lesser than static priority (35 %) and random ordering

(45 %). Larger values for N17 are expected because the number of reactions and

compounds of this network is much larger than the other networks, resulting in

more interactions in the network (see Table 1).

The optimal node rank results show that Dynamic OPMET has the lowest

optimal node rank on average. On average, it arrives at the optimal solution within

the generation of 0.008 % of the number of nodes possible in an exhaustive search.

This is significantly better than the random ordering, which arrives at the optimal

solution within the generation of 11 % of the nodes on average and Static OPMET,

which arrives at the optimal solution within generation of 0.05 % of the nodes on

average.

We conclude that Dynamic OPMET is the best enzyme prioritization strategy

and hence use this strategy in the rest of the experiments.

4.2 Evaluation of OPMET filtering strategies

Though Dynamic OPMET arrives at the optimal solution early, it tests a large

number of combinations before concluding that the solution found is optimal. In

this section, we evaluate how much our filtering strategies reduce the search space.

Figure 5 shows the results for Dynamic OPMET with target, non-target filters and

a combination of both filters compared to Dynamic OPMET without filters.














Figure 5(a) shows the average number of nodes generated by the four meth-

ods. The combined filters show the best pruning. On an average, the combined

filters prune 91.5 % of the nodes generated in the method without filters. We also

see that most of this benefit is obtained from the target filter (it filters 91.4 % of

the nodes generated by the method without filters). The combined filters gener-

ate only 12700 nodes for N24 (0.004 % of an exhaustive search of 2.68 x 108

combinations).

Figure 5(b) shows the average execution time. We see that the average execu-

tion time shows a trend similar to Figure 5(a). This establishes that the execution

time is proportional to the number of nodes generated. The combined filters ef-

fectively reduce the execution time to one-tenth of the method without filters on

average.

Figure 5(c) shows the average optimal node rank. All the methods have the

same optimal node rank for networks except N24. This suggests that Dynamic

OPMET yielded the optimal solution as early as possible for these networks. The

results for N24 show that filtering strategies can also lead to advancement in find-

ing the optimal solution. For N24, Target filter arrives at the optimal solution 99 %

earlier and the combined filters arrive at the optimal solution 99.9 % earlier than

the method without filters (the additional 0.9 % improvement is obtained from the

non-target filter).

We observe that the target filter is more efficient than the non-target filter.

We conclude that Dynamic OPMET with combined filters is the most effective

strategy for pruning vast amounts of the search space.














4.3 Scalability in number of targets

In this section, we show the scaling of performance of our method with target

size. The target sets compared are of sizes one, two and four. We evaluate Dy-

namic OPMET with combined filters in this subsection, as it is the most effective

combination. Figure 6 shows the results for networks of different sizes.

Figure 6(a) shows the average number of nodes generated for the three target

sets. The network topology determines how the target set size affects the number

of nodes generated. The target compound set can contain highly correlated com-

pounds (deleted by the inhibition of almost the same set of enzymes) or unrelated

compounds (located in different parts of the network). If the target set consists

of correlated compounds, an increase in the target set size decreases the average

number of nodes generated. This can be seen for N14, N17, N20 and N24 (from 2

to 4 target compound queries). The number of nodes generated for the two-target

sets is 62 % lesser than single target sets on an average. For the four-target sets,

this value is 75 % lesser.

On the other hand, if the target set consists of unrelated compounds, an in-

crease in the target set size increases the average number of nodes generated. The

average number of nodes increases by 1.2 times when we go from single target to

two-target sets and by 2.2 times when we go from the two-target to the four-target

sets.

Figure 6(b) shows the average execution time for the three target sets. We see

that the average execution time shows a trend similar to Figure 6(a). It indicates













that the execution time is proportional to the number of nodes generated.

Figure 6(c) shows that the average optimal node rank increases sub linearly

with the number of target compounds. On an average, The two-target queries

arrive at the optimal solution after generation of 1.8 times more nodes than the

single target queries. Similarly, the four-target queries generate 3.8 times more

nodes than the single target queries before arriving at the optimal solution. This

suggests that as the target set size increases, the number of enzyme combinations

that need to be tested before we find the optimal solution increases.


5 Related Work

The classical drug discovery approach involves incorporating large number of

hypothetical targets into in-vitro or cell-based assays and performing automated

high throughput screening (HTS) of vast chemical compound libraries [26, 5]. The

developments in the post-genomic era promise to provide rational drug-design

methods and reduction of serious side-effects [6, 4, 3]. The advances in bioinfor-

matics have led to the concept of reverse pharmacology [25]. The first step in this

approach is the identification of protein targets that may be critical intervention

points in a disease process [22, 1]. The reverse approach is driven by the mech-

anistic basis of the disease and hence is expected to be more efficient than the

classical approach [25].

Rapid identification of optimal protein targets needs a thorough understand-

ing of the underlying metabolic network of the organism affected by a disease.

The availability of fully sequenced genomes has enabled researchers to integrate













the available genomic information to reconstruct and study metabolic networks

[16, 11]. These studies have revealed important properties of these networks

[8, 2, 14]. The potential of an enzyme to be an effective drug target is considered to

be related to its essentiality in the corresponding metabolic network [12]. Lemke

et. al proposed the measure enzyme damage as an indicator of enzyme essentiality

[13, 15]. Recently, a computational approach to prioritize potential drug targets

for antimalarial drugs was developed [17]. A chokepoint analysis of Pfalciparcum

was performed to identify essential enzymes which are potential drug targets. The

protein kinase targets that hold potential to reduce insulin resistance and normal-

ize blood glucose are discussed in [21]. These studies show the effectiveness of

computational techniques in reverse pharmacological approaches.

A combination of microarray time-course data and gene-knockout data is used

to provide an accurate gene network to study the effects of a chemical compound

on the network [10]. An investigation of metabolite essentiality is carried out with

the help of stoichiometric analysis in [9]. These approaches reveal the importance

of studying the role of compounds metabolitess) during the pursuit of computa-

tional solutions to pharmacological problems.


6 Conclusion

Efficient computational methods are required to identify the optimal enzyme-

combination (i.e., drug targets) whose inhibition will achieve the required effect of

eliminating a given target set of compounds while incurring minimal side-effects.

In this paper, we formulated the optimal enzyme-combination identification prob-













lem as an optimization problem on metabolic networks. We defined a graph based

computational model of the network that encapsulates the impact of enzymes onto

compounds. We proposed OPMET, a branch-and-bound algorithm to explore the

search space. We developed a cost model and two enzyme prioritization strate-

gies, Static OPMET and Dynamic OPMET based on it. We also developed two

filtering strategies to prune the search space while still guaranteeing an optimal

solution. The filters compute an upper bound to the number of target compounds

deleted and a lower bound to the side-effect respectively.

Our experiments on E.Coli metabolic network show that our methods reduce

the total search time by several orders of magnitude as compared to the exhaustive

search. The optimal solution is reached by Dynamic OPMET within the explo-

ration of 0.005 % of the total search space on average, proving that our methods

are effective in approximating the impact of enzymes on compounds. Dynamic

OPMET with combined filters pruned 91.6 % of the search space on average.

In this paper, we have developed a foundation for developing effective solu-

tions to pharmacological problems, through analysis of metabolic networks. We

are currently working to improve the accuracy of our network and cost models

in modeling the metabolic network's behavior. We are also developing methods

which will efficiently provide solutions to the same problem for larger metabolic

networks.


References

[1] 'Proteome Mining' can zero in on Drug Targets. Duke University medical














news, Aug 2004.


[2] M Arita. The metabolic world of Escherichia coli is not small. PNAS,

101(6):1543-1547, Feb 2004.

[3] S. Broder and J. C. Venter. Sequencing the Entire Genomes of Free-Living

Organisms: The Foundation of Pharmacology in the New Millennium. An-

nual Review ofPharmacology and Toxicology, 40:97-132, Apr 2000.

[4] S. K. Chanda and J. S. Caldwell. Fulfilling the promise: drug discovery in

the post-genomic era. Drug Discovery Today, 8(4): 168-174, Feb 2003.


[5] J Drews. Drug Discovery: A Historical Perspective. Science,

287(5460):1960-1964, Mar 2000.

[6] Davidov et. al. Advancing drug discovery through systems biology. Drug

Discovery Today, 8(4):175-183, Feb 2003.

[7] Deane et. al. Catechol-o-methyltransferase inhibitors versus active compara-

tors for levodopa-induced complications in parkinson's disease. Cochrane

Database of Systematic Reviews, 4, 2004.

[8] Hatzimanikatis et. al. Metabolic networks: enzyme function and metabolite

structure. Current Opinion in Structural Biology, (14):300-306, 2004.

[9] Imielinski et. al. Investigating metabolite essentiality through genome scale

analysis of E. coli production capabilities. Bioinformatics, Jan 2005.














[10] Imoto et. al. Computational Strategy for Discovering Druggable Gene Net-

works from Genome-Wide RNA Expression Profiles. In PSB 2006 Online

Proceedings, 2006.

[11] Jeong et. al. The large-scale organization of metabolic networks. letters to

NATURE, 407:651-654, Oct 2000.

[12] Jeong et. al. Prediction of Protein Essentiality Based on Genomic Data.

ComPlexUs, 1:19-28, 2003.

[13] Lemke et. al. Essentiality and damage in metabolic networks. Bioinformat-

ics, 20(1):115-119, Jan 2004.

[14] Ma et. al. Decomposition of metabolic network into functional modules

based on the global connectivity structure of reaction graph. Bioinformatics,

20(12):1870-1876, 2004.

[15] Mombach et. al. Bioinformatics analysis of mycoplasma metabolism: Im-

portant enzymes, metabolic similarities, and redundancy. Computers in Bi-

ology and Medicine, 2005.

[16] Teichmann et. al. The Evolution and Structural Anatomy of the Small Mole-

cule Metabolic Pathways in Escherichia coli. JMB, 311:693-708, 2001.

[17] Yeh et. al. Computational Analysis ofPlasmodium falciparum Metabolism:

Organizing Genomic Information to Facilitate Drug Discovery. Genome

Research, 14:917-924, 2004.













[18] M Kanehisa. A database for post-genome analysis. Trends in Genetics,

13(9):375-376, Sep 1997.

[19] M Kanehisa and S Goto. KEGG: kyoto encyclopedia of genes and genomes.

Nucleic Acids Res., 28(1):27-30, Jan 2000.

[20] S.A. Kauffman. The Origins of Order: Self-Organization and Selection in

Evolution. Oxford University Press, 1993.

[21] C. M Rondinone. Serine Kinases as New Drug Targets for the Treatment of

Type 2 Diabetes Current Medicinal Chemistry Immunology, Endocrine

andMetabolic Agents, 5(6):529-536, Dec 2005.

[22] C Smith. Hitting the target. Nature, 422:341-347, Mar 2003.

[23] R. Somogyi and C.A. Sniegoski. Modeling the complexity of genetic net-

works: understanding multigene and pleiotropic regulation. Complexity,

1:45-63, 1996.

[24] R. Surtees and N. Blau. The neurochemistry of phenylketonuria. European

Journal ofPediatrics, 159:109-13, 2000.

[25] Takenaka T. Classical vs reverse pharmacology in drug discovery. BJU

International, 88(2):7-10, Sep 2001.

[26] G Wess. How to escape the bottleneck of medicinal chemistry. Drug Dis-

covery Today, 7(10):533-535, May 2002.




















4 8 2
Pathway Identifier
SNoFilter Non-TargetFiiter OTargetFilter CombinedFiters


(a) Average number of nodes generated


II I I III
I I II III
Pathway Identifier
IINoFilter *Non-TargetFilter TargetFilter B CombinedFilters

(b) Average execution time in milliseconds


Pathway dentifier
i NoFiter Non-TargetFiter OTargetFilter :CombinedFiIters


(c) Average otmal node rank


Figure 5: Comparison of OPMET filtering strategies
Figure 5: Comparison of OPMET filtering strategies























iilil
N14 N17 WO N24 N28 M2
Pathway Identifier
*al-Target M2-Target 04-Target|


(a) Average number of nodes generated


N14 N17 N20O N24 N28 N32
Pathway Identfier
l-Target Targe t 2- et4-Target


(b) Average execution time in milliseconds


N14 N17 N20 N24 N28 N32
Pathway Identifier
[l-Target M2-Target 04-Target


(c) Average oglDnal node rank


Figure 6: Scaling of Dynamic OPMET with combined filters with respect to the target compound
set size




University of Florida Home Page
© 2004 - 2010 University of Florida George A. Smathers Libraries.
All rights reserved.

Acceptable Use, Copyright, and Disclaimer Statement
Last updated October 10, 2010 - - mvs