OPMET: A metabolic networkbased
algorithm for optimal drug target identification*
Padmavati Sridhar, Tamer Kahveci and Sanjay Ranka
Department of Computer and Information Science and Engineering,
University of Florida, Gainesville, FL, USA, 32611
{psridhar, tamer, ranka}@cise.ufl.edu
Abstract
Recent advances in bioinformatics promise rational drugdesign meth
ods which will reduce serious sideeffects through the identification of en
zymatic targets. Efficient computational methods are required to identify
the optimal enzymecombination (i.e., drug targets) whose inhibition will
achieve the required effect of eliminating a given target set of compounds
while incurring minimal sideeffects. An exhaustive evaluation of all possi
ble enzyme combinations may become computationally infeasible for very
large metabolic networks.
*Work supported partially by ORAU (Award No: 00060845).
We formulate the optimal enzymecombination identification problem as
an optimization problem on metabolic networks. We define a graph based
computational model of the network which encapsulates the impact of en
zymes onto compounds. We propose a branchandbound algorithm, named
OPMET, to explore the search space. We develop a cost model and two en
zyme prioritization strategies, Static OPMET and Dynamic OPMET, based
on it. Static OPMET priorities enzymes according to their impacts, such
that the most promising enzymes are inspected first for possible inclusion
in the optimal subset. Dynamic OPMET dynamically updates the priorities
as the search space is explored. We also develop two filtering strategies to
prune the search space while still guaranteeing an optimal solution. They
compute an upper bound to the number of target compounds eliminated and
a lower bound to the sideeffect respectively. Our experiments on E.Coli
metabolic network show that OPMET can reduce the total search time by
several orders of magnitude as compared to the exhaustive search.
1 Introduction
In pharmaceutics, the development of every drug mainly involves target identi
fication, validation and lead inhibitor identification [5]. Traditional drug discovery
approaches focused more on the efficacy of drugs than their toxicity (untoward
side effects). Lack of predictive models that account for the complexity of the
interrelationships between the metabolic processes often leads to drug develop
ment failures. Toxicity and/or lack of efficacy can result if metabolic network
components other than the intended target are affected. This is wellillustrated by
the example of the recent failure of Tolcapone (a new drug developed for Parkin
son's disease) due to observed hepatic toxicity in some patients [7]. Postgenomic
advances has seen a perspective shift in drug research. The current focus is on the
identification of biological targets (gene products, such as enzymes or proteins)
for drugs, which can be manipulated to produce the desired effect (of curing a
disease) with minimum disruptive sideeffects [22, 25].
Enzymes catalyze reactions, which produce metabolites (compounds) in the
metabolic networks of organisms. Enzyme malfunctions can result in the ac
cumulation of certain compounds which may result in diseases. We term such
compounds as Target Compounds and the remaining compounds as NonTarget
compounds. For instance, the malfunction of enzyme phenylalanine hydroxy
lase causes buildup of the amino acid, phenylalanine, resulting in phenylketonuria
[24], a disease that causes mental retardation. It is therefore intuitive to locate the
optimal enzyme set which can be manipulated by drugs to prevent the excess pro
duction of target compounds, with minimal sideeffects. We term the sideeffects
of inhibiting a certain enzyme combination as the damage caused to the metabolic
network.
Given a metabolic network and a set of target compounds, we consider the
problem of identifying the set of enzymes whose inhibition eliminates the tar
get compounds and incurs minimum damage. Evaluating all enzyme combina
tions is not feasible as the number of such combinations increases exponentially
with the number of enzymes. Hence, more efficient computational methods are
needed. We propose OPMET, an Optimal enzyme drug target identification algo
rithm based on Metabolic networks, to solve this problem. The main contributions
of our paper are as follows:
1. We define a graph based computational model of the metabolic network
that encapsulates the impact of enzymes onto compounds. We formulate
the optimal enzyme combination identification problem as an optimization
problem on this graph. We develop a cost model that takes both the observed
and potential damage (i.e, sideeffects) resulting from the inhibition of an
enzyme set into account, for the cost computation.
2. We propose a branchandbound algorithm, named OPMET, to explore the
search space. We develop two enzyme prioritization strategies, Static OP
MET and Dynamic OPMET based on the cost model. Static OPMET prior
itizes enzymes according to the impact of their inhibition on the production
of compounds in the metabolic network. It inspects the most promising en
zymes first, for possible inclusion in the optimal subset. Dynamic OPMET
dynamically updates the priorities as the search space is explored.
3. We develop two filtering strategies for OPMET, namely target filter and
nontargetfilter, to prune the search space while still guaranteeing an opti
mal solution. The target filter eliminates a subspace when it is proven that
there is no combination of enzymes in this space that can stop the produc
tion of all the target compounds (i.e., there is no useful drug target). The
nontarget filter prunes subspaces where there is no solution with a damage
less than that of the optimal solution found so far.
Our experiments on E.Coli metabolic network (data extracted from KEGG [18,
19]) show that our methods can reduce the total search time by several orders of
magnitude as compared to the exhaustive search. Dynamic OPMET with a com
bination of the target and nontarget filters prunes 91.6 % of the search space
of the exhaustive search (232 combinations) on average. Dynamic OPMET with
combined filters generates the optimal enzyme combination within the exploration
0.005 % of the search space on average.
The rest of the paper is organized as follows. Section 2 formally defines
the problem and describes our proposed network model. Section 3 presents the
proposed OPMET algorithm with static and dynamic prioritization and filtering
strategies. Section 4, discusses experimental results. Section 5 discusses the re
lated work. Section 6 gives a brief discussion of the contribution of the paper.
2 Problem Definition
In this section, we propose a formal definition of the optimal enzyme combi
nation identification problem and describe our model of the metabolic network.
We develop a graph based model that captures the interactions between reac
tions, compounds, and enzymes. Our model is a variation of the boolean network
model [23, 20]. Let R, C, and E denote the set of reactions, compounds, and
enzymes respectively. The vertex set consists of all the members of R U C U E.
A vertex is labeled as reaction, compound, or enzyme based on the entity it refers
to. Let VR, Vc, and VE denote the set of vertices from R, C, and E. A unique
identifier is also stored along with each vertex. A directed edge from vertex x to
. Disrupted pathway
(a)
  Disrupted Pathway
(b)
Figure 1: A graph constructed for a metabolic network with three reactions R1, R2, and
R3, two enzymes E1 and E2, and nine compounds C1, C2, C9. Circles, rectangles,
and triangles denote compounds, reactions, and enzymes respectively. Here, C4 (shown
by double circle) is the target compound. Dotted lines indicate the subgraph removed due
to inhibition of an enzyme. (a) Effect of inhibiting E2. (b) Effect of inhibiting Ei.
vertex y is then drawn if one of the following three conditions holds: (1) x rep
resents an enzyme that catalyzes the reaction represented by y. (2) x corresponds
to a substrate for the reaction represented by y. (3) x represents a reaction that
produces the compound mapped to y. Notice that an edge is drawn only between
a vertex in VR and a vertex in Vc U VE.
Figure 1 illustrates a small hypothetical metabolic network. A directed edge
from an enzyme to a reaction implies that the enzyme catalyzes the reaction (i.e.,
El catalyzes RI and E2 catalyzes R2 and R3). A directed edge from a compound
to a reaction implies that the compound is a reactant. A directed edge from a
reaction to a compound implies that the compound is a product. In this figure, C4
is the target compound (i.e., the production of C4 should be stopped). In order
to stop the production of C4, R2 has to be prevented from taking place. This can
be achieved in two ways. One way is by disrupting one of its catalyzing enzymes
(E2 in this case). Another is by stopping the production of one of its reactant
compounds (C2 or C3 in this case). If we choose to stop the production of C2,
we need to recursively look for the enzyme which is indirectly responsible for its
production (El in this case). Thus, the production of the target compound can be
stopped by manipulating either El or E2.
Figure 1(a) shows the disruption of E2 and its effect on the network. Inhibiting
E2 results in the knock out of compounds C5, Cs and C9 in addition to the target
compound, C4. Note that the production of C7 is not stopped since it is produced
by R1 even after the inhibition of E2. We term the effect of disrupting nontarget
compounds as the sideeffect of manipulating E2. We define the number of non
target compounds knocked out as the damage, the manipulation of an enzyme set
causes to the metabolic network. In this case, the damage of inhibiting E2 is 3 (i.e.,
C5, Cs and Cg). Figure l(b) shows the inhibition of El and its effect on the same
network. In this case, the damage is 2 (i.e., C2 and C5). The important observation
is that El and E2 both achieve the effect of disrupting the target compound, C4.
Hence, El and E2 are both potential drug targets. However, El is a better drug
target than E2 since it causes lesser damage.
We formally state the optimal enzyme combination identification problem as:
Given a set of target compounds T (T c C), find the set of enzymes
X (X C E) with minimum damage, whose inhibition stops the pro
duction of all the compounds in T.
Our graph model accurately reflects the impacts of gene manipulations. Inhi
bition of a set of enzymes, for example, is simulated by deletion of the vertices
for the enzymes in that set. This modification then propagates to the vertices that
are reachable from the deleted vertices.
Lemke et. al [13, 15] defined the damage of inhibition of an enzyme as the
number of compounds whose production stops after the inhibition of that enzyme
(all the compounds are assumed to be of equal importance to the network). Our
definition of damage is similar to that of Lemke in principle. We differ by ex
cluding the target compounds from the damage computation. At a high level, we
consider damage as the sideeffects of inhibiting an enzyme set.
3 Methods
The number of possible subsets of enzymes that need to be considered for
finding the optimal drug target is exponential in an exhaustive search. Clearly, this
is not feasible beyond small sized metabolic networks. In this section, we propose
OPMET, a branch and bound algorithm that considerably reduces the number of
possible combinations to be searched while still guaranteeing to find an optimal
solution. We first describe the state space that represents all possible solutions
using a tree and the backtracking mechanism which guarantees that the search
is conducted till an optimal solution is found. As part of any effective branch
and bound strategy it is important to find a good solution quickly. This calls for
effective filtering (or pruning) of subspaces that can be guaranteed not to have
better solution that the best found so far. Our prioritization and filtering strategies
which achieve this goal are described in detail in the following subsections.
3.1 State Space and Basic Strategy
In this subsection we discuss our modeling of the search space and the basic
underlying strategy of the OPMET algorithm. We develop a systematic and flexi
ble enumeration of the search space and use it to build our efficient search strategy
to find optimal result.
Let Ei, Vi, 1 < i < m denote the set of enzymes for a metabolic network. The
search space is modeled as a tree structure. Every node of this tree corresponds to
a state in the search space and it is represented by a 4tuple ([en, e ,,  e m], k,
d, remove). Here, 7r1, ., 7m is a permutation of 1, 2, * m. The first parameter
corresponds to the state of all the enzymes (i.e., e, corresponds to enzyme Ei).
e~ = 1 if Ei is inhibited. Otherwise, e, = 0. The parameter k indicates that
the first k enzymes are considered at that search state. The decision to inhibit or
not inhibit has been fixed for enzymes from 1 to k 1. We now set e, = 1
and e, = 0, Vi, k < i < m. The damage incurred due to inhibited enzymes at
that state is represented by d. The final parameter, remove, is a boolean variable.
It takes value True if the inhibited enzymes stop the production of all the target
compounds. Otherwise, it is set to False. We call a node with remove = True
as a true node, and a node with remove = False as a false node. Figure 2 shows
part of the search tree for a hypothetical 4enzyme network. The tuple for node
ni indicates that enzyme El is inhibited and it is a true node with damage d = 5.
k = 1 since the decision to inhibit or not has been fixed only for one enzyme.
OPMET Algorithm: We start with a root node whose 4tuple is ([0, 0, * 0],
0, 0, False) indicating that all enzymes are present in the network. As the search
space is traversed, we keep the true node with the minimum damage found so far
as the current true solution and store the associated damage value as D, the global
cutoff threshold. D is initialized to the number of compounds in the network. At
any point, we have an active set of nodes A, stored in a stack structure. A contains
the nodes currently being considered. Let node N = ([e, e,,, .* e ], k, d,
remove) be the node on top of this stack (i.e., the node to be evaluated). We check
if the damage of the current node N, d < D and whether it is a true node. There
are three cases:
Case 1: N is a true node with damage d < D. In this case, we save N as the
10
[0001,
[0110, 3, 2, True]
Figure 2: The basic OPMET strategy for a hypothetical 4enzyme network. Enzymes
are ordered as El, E2, E3, E4. no, nl, " , n4 are the nodes generated. The initial global
cutoffthreshold D = 10 initializedd to the total number of compounds in the network).
ni is a true solution (shown by double circle) with damage d = 5. Since d < D, D is
updated to 5. n1 is saved and the subtree rooted at n1 is pruned. The method backtracks
to no. n2, n3 are false nodes generated along the search with damage d < D. ns is a true
node with d = 2. As d < D, D is updated to 2 and the method backtracks to search the
unexplored space for solutions with d < D (indicated by the dashed edge).
current true solution and update D with the damage value of N. We then
backtrack.
Case 2: N is a true or false node with damage d > D. In this case we prune
the subtree rooted at N. We then backtrack.
Case 3: N is a false node with damage d < D. In this case, we insert N in
the active set A for backtracking purposes. We then create a new node N'
by setting e+ = 1 in N (i.e., we inhibit the enzyme Ek+ ). The resulting
node is N' = ([e,, e,2, * e,], k + 1, d', remove'). The node N' is
evaluated in the next step similarly.
Backtracking involves following steps. First we pick the top node from the ac
tive nodes stack A. Let N' = ([e,, e/, e], k', d', remove') denote this
node. We then set e,, = 0 (indicating the node we are backtracking from) and
e, 2 1 in N' (i.e., we inhibit the enzyme e,,, k). The resulting node becomes
the node to be evaluated in the next step. The first two cases above stop expanding
the tree at the current node. The former one implies that the current node is a pos
sible solution (node ni in Figure 2). The latter one implies that the current node
incurs too much damage to lead to a possible solution. The third case happens
when the current node does not stop production of all the target compounds, but
the damage is lower than the damage of the current best solution (nodes n2 and
n3 in Figure 2). Such nodes may produce a possible solution (node n4 in Figure
2) with the inhibition of more enzymes. Thus, they need to be explored further
to ensure that we find an optimal solution. The search terminates when there are
no more nodes to explore. At this stage, the current true solution is the optimal
solution.
3.2 OPMET Prioritization Strategies
In this subsection we present our cost model as the basis for enzyme evalua
tion and our resulting enzyme prioritization strategies. The purpose of ordering
enzymes by a priority measure is to test the combinations with a high likelihood
of deleting the target compounds first. We develop two enzyme prioritization
strategies: 1) Static OPMET: We preorder the enzymes according to a priority
measure and apply the OPMET algorithm. 2) Dynamic OPMET: We dynami
cally select the next enzyme to inhibit based on the partial solution. These are
described in more detail in the rest of the section.
3.2.1 Static OPMET
The simplest way to order the enzymes is to sort them in ascending order of
damage. However, this is inaccurate for several reasons. First, since damage is de
fined in terms of non target compounds, it does not reflect an enzyme's impact on
the target compounds. Second, the inhibition of an enzyme can make a compound
more vulnerable even if the compound is not removed entirely. For example, in
Figure 1, the production of C7 is not stopped after the inhibition of El or E2 indi
vidually. However, C7 is removed if El is inhibited as well as E2. Thus, damage
values of individual enzymes are not good indicators of the combined damage of
these enzymes.
Cost Model: We develop a cost model as the basis for enzyme ordering in OP
MET. This cost model takes both the observed and potential damage (sideeffects)
resulting from the inhibition of an enzyme set into the cost computation. For each
enzyme Ei c E, the set of enzymes in the network, we compute a weight W(Ei)
as W(Ei) = 0 if Ei inhibited and W(Ei) = 1 otherwise. We assign rational
weights (fractions between 0 and 1) to the reaction and compound nodes and the
edges. Intuitively, the weight of a node or edge denotes the rate at which that node
or edge appears in the network. We then use these weights to compute the cost of
Ei. The weight of each node depends on the weights of the incoming edges and
the type of the node (i.e., reaction or compound):
Cost Rule 1: Let Rj be a reaction node. Let , 1 < i < k, denote the
weights of the incoming edges to Rj. We compute the weight of Rj as:
.......... Disrupted Pathway
Figure 3: Effect of deletion of enzyme El on the weights of reactions and com
pounds in the network.
W(Rj) =min,lt .}.
This computation is intuitive since a reaction takes place only if all the in
puts are present.
Cost Rule 2: Let Cj be a compound node. Let "t 1 < i < k, denote the
weights of the incoming edges to Cj. We compute the weight of Cj as:
W(cj) =k {t}.
This computation captures the property that a compound disappears from
the system only if all the reactions that produce it stop.
We define the weight of an edge as the weight of the node for which it is the
outgoing edge.
In order to compute the cost of Ei, we set the weight of Ei to zero (i.e.,
W(Ei) = 0). The weights of all the reaction and compound nodes are as
signed progressively by a breadthfirst search, according to the above scheme.
The weights of all the nodes and edges which can be reached from Ei are recom
puted to reflect the change. The effect of deleting El on the weights of reactions
and compounds in the network is shown in Figure 3. We define an impact vector
for each enzyme based on the effects of its inhibition.
Definition 1 Given a network i/ ith n compounds, Cj, 1 < j < n. Let Wi(Cj)
denote the weight of the node corresponding to Cj after the inhibition of enzyme
Ei. We define the impact vector of E as I(E,) = [WW(C1), Wi(C2), Wi(Cn)]
We term Wi(Cj) as the impact of Ei on Cj, Vj. U
The impact vector of an enzyme approximates the amount of each compound
that remains after the inhibition of that enzyme. Every entry of the impact vector
is a fractional number between 0 and 1, where 0 indicates that the corresponding
compound does not exist after inhibition of the corresponding enzyme. We define
the cost of inhibiting an enzyme as follows:
Definition 2 Given a network i, ith n compounds, Cj, 1 < j < n. Assume that
the compounds Cj, Vj, 1 < j < k < n constitute the set of target compounds.
Assume that the remaining compounds Cj, Vj, k + 1 < j < n constitute the non
target compounds. Let I(Ei) = [Wi(CI), * Wi(C,)] denote the impact vector
of Ei. We define the cost of inhibiting Ei as
cost(Ei) = I(Ei) VT,
where V [vl, . v,] is the normalization vector: = k for 1 < i < k, and
( (k )fork
Each target compound contributes a positive value and each nontarget com
15
pound contributes a negative value to the cost of an enzyme. The magnitude of the
contribution of a compound increases linearly with the amount of that compound
that diminishes from the system after the inhibition of the corresponding enzyme.
In other words, if a target compound is eliminated, the cost is small. On the other
hand, if a nontarget compound is eliminated, the cost is large. This is justified
since the cost promotes removal of target compounds and demotes the removal of
nontarget compounds.
In order to benefit from the pruning power of OPMET cases 1 and 2 (see
Section 3.1), we need to compute the permutation 7r1, .. 7m carefully. The
earlier we place the enzymes in the optimal solution in this permutation, the better,
as OPMET reaches the optimal solution earlier under such an ordering. Reaching
the optimal solution early increases the chances of pruning the remaining nodes
of the search tree due to OPMET Case 2. We sort the enzymes in the ascending
order of cost (see Definition 2). This is justified since a small cost value indicates
that the corresponding enzyme has a large impact on target compounds and low
impact on nontarget compounds.
3.2.2 Dynamic OPMET
Static OPMET priorities enzymes based on their individual potential to be
part of the solution. However, this model does not fully account for the combined
damage of a set of enzymes. Finding the combined damage of a set of enzymes
is non trivial. This is because the combined damage depends on the overlap of
the reactions catalyzed by these enzymes as well as their individual damages. En
zymes that catalyze almost the same set of reactions have less combined damage
1. Let N= ([e, c, e2, e*,] k, d, remove) be the node currently being evaluated.
2. Let R = [rl, r2, * rn] /* r, E [0, 1] corresponds to the
remaining fraction of compound Ci, Vi */
3. For every candidate enzyme ei E ({e,, e 1 e, }
(a) Ri= R I(Ei) /* Compute new ratio */
(b) Cost(Ei) = Ri VT /* Cost of inhibiting E, */
4. Letj = argminj {Cost(Ej)}
5. R Rj /* Update the ratio after inhibition
of the enzyme with least cost */
6. Select Ej as the next enzyme to inhibit.
Figure 4: Dynamic OPMET procedure for enzyme evaluation.
compared to enzymes that catalyze disjoint reaction sets. This is due to the overlap
of the damage of the enzymes in the former case. We develop a dynamic strat
egy, Dynamic OPMET, which sorts the enzymes on the fly, based on the partial
solution. At a high level, this strategy evaluates the current state at every step and
picks the enzymetoinhibit for the next step based on this evaluation.
Our goal is to dynamically decide which enzyme to inhibit next based on the
combined impact of all the enzymes inhibited so far. We predict the remaining
fraction of all the compounds after inhibition of each enzyme Ei, with the help of
its impact vector, I(Ei) (see Definition 1). This procedure is described in Figure
4. Let R [r, r2, ,, r,,] denote the remaining fractions of compounds (step
1). Here, ri E [0, 1] corresponds to compound Ci, Vi. We initialize ri = 1 VC
indicating that all compounds are being produced without any disruption in the
metabolic network. Let V be the normalization vector (see Definition 2).
At every step of the Dynamic OPMET algorithm, let N = ([e,, e2, ,
e_], k, d, remove) be the node currently being evaluated (i.e., the decision to
inhibit or not inhibit has been fixed for e W, e2,, e. ) (step 2). We now
need to decide which enzyme has to be evaluated next. In Step 3a, for every
enzyme in the remaining enzyme set (e,, Vi, k < i < m), we compute the new
remaining fractions of compounds (Ri). This is done by a Vector Direct Product
of R and the impact vector of Ei (I(Ei)). Vector direct product is defined as
X @ Y = [xIyil,x2y,  x.,,], where X = [xi,  , x,] and Y = [y ,
y,]. The resulting vector Ri is an approximation to the impact of inhibition of
the enzyme Ei in addition to already inhibited enzymes. This is justified since
the quantity of a compound eliminated by a combination including Ei will be at
least as much as the quantity eliminated by Ei alone. A good candidate enzyme
at this step is one which ensures that lesser of the target compounds remain after
its inhibition. Also, it should ensure that the non target compounds suffer the
minimum possible damage. Our cost model satisfies these requirements. In Step
3b, we compute the cost of each enzyme Ei as the dot product of Ri and V. In
step 4, we pick the enzyme with the minimum cost (Ej). We update R with the
amounts of remaining compounds in vector Rj. Ej is chosen as the next enzyme
to inhibit. Thus, this strategy dynamically chooses the the next best enzyme to
inhibit.
The cost of finding the best enzyme at each step is O(mn), where m and n
denote the number of enzymes and compounds in the metabolic network respec
tively. This is because a vector direct product costs O(n), and O(m) such products
are carried out.
3.3 OPMET Filtering Strategies
The static and dynamic OPMET strategies perform a costbased ordering of
the enzymes in order to arrive at the optimal set of enzymes quickly. However,
the method still needs to inspect a large portion of the search space, in order to
ensure that the result is the optimal one. A good filtering strategy is essential to
ensure optimality within a reasonable amount of time. We propose two strategies
to eliminate large portions of the search space quickly. We start by introducing the
following theorem (proof given at the end of this subsection) which establishes a
relationship between the impact of enzymes and their damage.
Theorem 1 Let E {El, E2, * Er} be a set of enzymes. Let Cj be a com
pound in the metabolic network. Let di(Cj), 1 < i < r, denote the impact of Ei
on Cj. If the inhibition of all the enzymes in E stops the production of Cj, then
Ei 1( di(Cj)) > 1. t
Next, we describe our filtering strategies.
Target Filter: We traverse the state space to locate the optimal combination of
enzymes to delete a given target compound set. Considerable effort is spent evalu
ating combinations which might yield a solution as the encountered damage value
is less than D, the global cutoff threshold (see Subsection 3.1). If none of these
combinations ultimately delete the target compound set, the effort spent is wasted.
Finding such combinations will improve the performance of the search vastly.
This is the motivation behind our Target filter.
The target filter eliminates a bulk of the search space when it is proven that
there is no combination of enzymes in this space that can stop the production of all
the target compounds (i.e., there is no useful drug target). This filtering strategy is
based on Theorem 1. Formally, let node N = ([e,, e,2, ., el], k, d, False) be
a node in the search space. Let T denote the set of target compounds. Backtrack
if
k nm
(1 di(C))e + Y (1 d(C)) < 1,3C c T.
i 1 i=k+1
In this inequality, the first term indicates the impact of enzymes, which are
currently part of the solution set, on the target compounds. The second term
represents the impact of the remaining enzymes on the target compounds. Thus,
this filter examines not only the next enzyme being considered, but all the enzymes
which might eventually be considered as part of the solution.
Nontarget Filter: We traverse the state space to locate the optimal combination
of enzymes to delete a given target compound set. The motivation behind this
filtering strategy is to quickly determine if there is any solution in the subtree with
a damage d < D, the global cutoff threshold (see Subsection 3.1). This strategy
eliminates a large chunk of the search space where all the possible combinations
have damage more D. This filter utilizes Theorem 1 similar to the target filter.
The idea is as follows. At a given node N, for each target compound, C, we find
the minimum number of enzymes, n such that
(1 d(C))e '> 1.
i= 1
This gives us the minimum number of enzymes needed to delete C. Let n,ax be
the maximum value of n for any target compound (i.e, we will need at least nmax
enzymes to delete the entire target compound set). Now, we sort the remaining
enzymes (enzymes not considered so far) in the ascending order of their damage
values. Let dmax be the damage of the enzyme at index nmax. If dmax in addition
to the damage incurred so far is greater than D, the global cutoff threshold, we
prune the sub tree rooted at N.
Proof of Theorem 1: Let E = {E1, E2, * Er} be a set of enzymes. Let Cj be
a compound in the metabolic network. Let di, 1 < i < r, denote the impact of Ei
on Cj (see Definition 1). Let di(Rk), 1 < i < r, denote the impact of Ei on Rk.
We need to show that the following rules hold:
Rule 1: If the inhibition of all the enzymes in E stops the production of Cj, then
r
Y(1 d (C)) > 1.
i 1
Rule 2: If the inhibition of all the enzymes in E stops a reaction Rk, then
r
i 1
We first consider the case when a reaction Rk is stopped and prove that Rule
2 holds, given that Rule 1 is correct.
Case 1: Ej cEE such that Ej catalyzes Rk. Hence, the inhibition of Ej stops
Rk. The impact of Ej on Rk is 1 dj(Rk) = 1. Therefore, i(1 di(Rk)) > 1.
Rule 2 holds.
Case 2. No enzyme Ej E E catalyzes Rk. This implies that E removes Ci,
which is an input to Rk. From Rule 1, i(1 di(Ci)) > 1. From the Cost
Model (see Cost Rule 1 in subsection 3.2), di(Rk) = mindi(C')}, C' is the set
of all input compounds of Rk. Thus, 1 di(Rk) = max{1 di(C')}. Hence,
i 1(1 di(Rk)) > i ,( di(Ci)) > i.
Rule 2 holds.
We now consider the case when the production of a compound Cj stops and
prove that Rule 1 holds, given that Rule 2 is correct. If E removes Cj, it means
that all the reactions that produce Cj have been stopped. Let R1, R2, Rt be
the reactions that produce Cj. For each reaction Rk, 1 < k < t, one of the the
following two cases has to be true.
Case 1: 3 Ej E E such that Ej catalyzes Rk. Hence, the inhibition of Ej
stops Rk. The impact of Ej on Rk is 1 dj (Rk) = 1. From the Cost Model (see
Cost Rule 2 in subsection 3.2), 1 dj(Cj) > 1. Since all the t input reactions of
Cj have been stopped,
Er(1 di(Cj)) > x t > i.
Rule 1 holds.
Case 2: No enzyme Ej E E directly catalyzes Rk. But since Rk is stopped by
the inhibition of enzymes in E, 'L(1 di(Rk)) > 1 (Rule 2). Also given that
t reactions produce Cj,
(1 d(Cj,)) 1 (I
t
Zr ( rdC)) r Z 1 Zt E (1 d (R))
1xt>
Rule 1 holds. U
4 Experimental Results
In this section, we evaluate the performance of the OPMET algorithm using
the following three criteria:
1. Number of nodes generated: It represents the total number of enzyme combi
nations tested to complete the search. The lesser the number of nodes generated,
the better the performance of the method.
2. Optimal node rank: This indicates the number of nodes explored before
the method arrives at the optimal solution. A small optimal node rank indicates
that the strategy efficiently tested the optimal combination as soon as the search
started.
3. Execution time: This indicates the total time taken by the method to finish the
search and conclude that it found an optimal solution.
We implemented the OPMET algorithm with a random enzyme ordering, Sta
tic OPMET and Dynamic OPMET (see Subsection 3.2). We implemented two
filtering strategies, Nontarget filter and Target filter (see Subsection 3.3), to pro
vide pruning capabilities to the OPMET algorithm. We compare the methods we
implemented to a hypothetical exhaustive search which enumerates and tests all
possible combinations of enzymes (2", where n is the number of enzymes).
We extracted the metabolic network information of Escherichia Coli from
KEGG [18, 19] (ftp: // ftp. genome. jp/pub/kegg/pathways/eco/).
The metabolic network in KEGG has been divided into smaller networks accord
ing to their specific functions. We chose six of these networks for our experi
ments, based on the number of enzymes. We devised a labeling scheme for the
networks which begins with 'N' and is followed by the number of enzymes in
the network. For instance, 'N20' indicates a network with 20 enzymes. Table 1
shows the metabolic networks chosen, along with their identifiers and the number
of compounds (C) and reactions (R). For each network, we constructed query sets
of sizes one, two and four target compounds, by randomly choosing compounds
from that network. Each query set contains 10 queries each.
We ran our experiments on an Intel Pentium 4 processor with 2.8 GHz clock
speed and 1GB main memory running Linux operating system.
4.1 Evaluation of OPMET prioritization strategies
In this section, our goal is to evaluate the performance of our enzyme priori
tization strategies. Here, we compare our static and dynamic OPMET algorithms
with a random ordering of enzymes and an exhaustive search. We do not include
Table 1: Metabolic networks from KEGG with identifier (Id). C and R denote the number of
compounds and reactions respectively.
Id Metabolic Network C R
N14 Citrate or TCA cycle 21 35
N17 Galactose 35 50
N20 Pentose phosphate 26 37
N24 Glycerolipid 32 49
N28 Glycine, serine and threonine 36 46
N32 Pyruvate 21 51
Table 2: Average number of nodes generated (#N) and Optimal Node Rank (ONR) of the differ
ent ordering strategies, namely, Random Order (RO), Static OPMET (SO) and Dynamic OPMET
(DO) compared with hypothetical exhaustive search (ES).
Id ES RO SO DO
#N 16384 4190 3859 3273
N14
ONR 1220 3.65 3.77
#N 131072 58379 46411 38973
N17
ONR 22303 140.87 2.54
#N 1048576 147605 82643 78257
N20
ONR 100845 440.48 11.36
our filtering strategies here as the goal is to select the best possible enzyme order
ing. We have shown the results only up to a network of size 20 enzymes. This is
because, beyond this, the search space grows rapidly, necessitating the application
of filtering strategies.
Table 2 shows the average number of nodes generated and the average opti
mal node rank of the random ordering, static OPMET and dynamic OPMET, as
compared to an exhaustive search. The average number of nodes results show that
Dynamic OPMET is the best strategy for all the tested networks. Dynamic OP
MET generates only 7 % of the nodes compared to an exhaustive search for N20.
Whereas a random enzyme ordering generates 14 % of the nodes compared to an
exhaustive search for the same network. Dynamic OPMET generates 30 % of the
nodes for N17. This is still lesser than static priority (35 %) and random ordering
(45 %). Larger values for N17 are expected because the number of reactions and
compounds of this network is much larger than the other networks, resulting in
more interactions in the network (see Table 1).
The optimal node rank results show that Dynamic OPMET has the lowest
optimal node rank on average. On average, it arrives at the optimal solution within
the generation of 0.008 % of the number of nodes possible in an exhaustive search.
This is significantly better than the random ordering, which arrives at the optimal
solution within the generation of 11 % of the nodes on average and Static OPMET,
which arrives at the optimal solution within generation of 0.05 % of the nodes on
average.
We conclude that Dynamic OPMET is the best enzyme prioritization strategy
and hence use this strategy in the rest of the experiments.
4.2 Evaluation of OPMET filtering strategies
Though Dynamic OPMET arrives at the optimal solution early, it tests a large
number of combinations before concluding that the solution found is optimal. In
this section, we evaluate how much our filtering strategies reduce the search space.
Figure 5 shows the results for Dynamic OPMET with target, nontarget filters and
a combination of both filters compared to Dynamic OPMET without filters.
Figure 5(a) shows the average number of nodes generated by the four meth
ods. The combined filters show the best pruning. On an average, the combined
filters prune 91.5 % of the nodes generated in the method without filters. We also
see that most of this benefit is obtained from the target filter (it filters 91.4 % of
the nodes generated by the method without filters). The combined filters gener
ate only 12700 nodes for N24 (0.004 % of an exhaustive search of 2.68 x 108
combinations).
Figure 5(b) shows the average execution time. We see that the average execu
tion time shows a trend similar to Figure 5(a). This establishes that the execution
time is proportional to the number of nodes generated. The combined filters ef
fectively reduce the execution time to onetenth of the method without filters on
average.
Figure 5(c) shows the average optimal node rank. All the methods have the
same optimal node rank for networks except N24. This suggests that Dynamic
OPMET yielded the optimal solution as early as possible for these networks. The
results for N24 show that filtering strategies can also lead to advancement in find
ing the optimal solution. For N24, Target filter arrives at the optimal solution 99 %
earlier and the combined filters arrive at the optimal solution 99.9 % earlier than
the method without filters (the additional 0.9 % improvement is obtained from the
nontarget filter).
We observe that the target filter is more efficient than the nontarget filter.
We conclude that Dynamic OPMET with combined filters is the most effective
strategy for pruning vast amounts of the search space.
4.3 Scalability in number of targets
In this section, we show the scaling of performance of our method with target
size. The target sets compared are of sizes one, two and four. We evaluate Dy
namic OPMET with combined filters in this subsection, as it is the most effective
combination. Figure 6 shows the results for networks of different sizes.
Figure 6(a) shows the average number of nodes generated for the three target
sets. The network topology determines how the target set size affects the number
of nodes generated. The target compound set can contain highly correlated com
pounds (deleted by the inhibition of almost the same set of enzymes) or unrelated
compounds (located in different parts of the network). If the target set consists
of correlated compounds, an increase in the target set size decreases the average
number of nodes generated. This can be seen for N14, N17, N20 and N24 (from 2
to 4 target compound queries). The number of nodes generated for the twotarget
sets is 62 % lesser than single target sets on an average. For the fourtarget sets,
this value is 75 % lesser.
On the other hand, if the target set consists of unrelated compounds, an in
crease in the target set size increases the average number of nodes generated. The
average number of nodes increases by 1.2 times when we go from single target to
twotarget sets and by 2.2 times when we go from the twotarget to the fourtarget
sets.
Figure 6(b) shows the average execution time for the three target sets. We see
that the average execution time shows a trend similar to Figure 6(a). It indicates
that the execution time is proportional to the number of nodes generated.
Figure 6(c) shows that the average optimal node rank increases sub linearly
with the number of target compounds. On an average, The twotarget queries
arrive at the optimal solution after generation of 1.8 times more nodes than the
single target queries. Similarly, the fourtarget queries generate 3.8 times more
nodes than the single target queries before arriving at the optimal solution. This
suggests that as the target set size increases, the number of enzyme combinations
that need to be tested before we find the optimal solution increases.
5 Related Work
The classical drug discovery approach involves incorporating large number of
hypothetical targets into invitro or cellbased assays and performing automated
high throughput screening (HTS) of vast chemical compound libraries [26, 5]. The
developments in the postgenomic era promise to provide rational drugdesign
methods and reduction of serious sideeffects [6, 4, 3]. The advances in bioinfor
matics have led to the concept of reverse pharmacology [25]. The first step in this
approach is the identification of protein targets that may be critical intervention
points in a disease process [22, 1]. The reverse approach is driven by the mech
anistic basis of the disease and hence is expected to be more efficient than the
classical approach [25].
Rapid identification of optimal protein targets needs a thorough understand
ing of the underlying metabolic network of the organism affected by a disease.
The availability of fully sequenced genomes has enabled researchers to integrate
the available genomic information to reconstruct and study metabolic networks
[16, 11]. These studies have revealed important properties of these networks
[8, 2, 14]. The potential of an enzyme to be an effective drug target is considered to
be related to its essentiality in the corresponding metabolic network [12]. Lemke
et. al proposed the measure enzyme damage as an indicator of enzyme essentiality
[13, 15]. Recently, a computational approach to prioritize potential drug targets
for antimalarial drugs was developed [17]. A chokepoint analysis of Pfalciparcum
was performed to identify essential enzymes which are potential drug targets. The
protein kinase targets that hold potential to reduce insulin resistance and normal
ize blood glucose are discussed in [21]. These studies show the effectiveness of
computational techniques in reverse pharmacological approaches.
A combination of microarray timecourse data and geneknockout data is used
to provide an accurate gene network to study the effects of a chemical compound
on the network [10]. An investigation of metabolite essentiality is carried out with
the help of stoichiometric analysis in [9]. These approaches reveal the importance
of studying the role of compounds metabolitess) during the pursuit of computa
tional solutions to pharmacological problems.
6 Conclusion
Efficient computational methods are required to identify the optimal enzyme
combination (i.e., drug targets) whose inhibition will achieve the required effect of
eliminating a given target set of compounds while incurring minimal sideeffects.
In this paper, we formulated the optimal enzymecombination identification prob
lem as an optimization problem on metabolic networks. We defined a graph based
computational model of the network that encapsulates the impact of enzymes onto
compounds. We proposed OPMET, a branchandbound algorithm to explore the
search space. We developed a cost model and two enzyme prioritization strate
gies, Static OPMET and Dynamic OPMET based on it. We also developed two
filtering strategies to prune the search space while still guaranteeing an optimal
solution. The filters compute an upper bound to the number of target compounds
deleted and a lower bound to the sideeffect respectively.
Our experiments on E.Coli metabolic network show that our methods reduce
the total search time by several orders of magnitude as compared to the exhaustive
search. The optimal solution is reached by Dynamic OPMET within the explo
ration of 0.005 % of the total search space on average, proving that our methods
are effective in approximating the impact of enzymes on compounds. Dynamic
OPMET with combined filters pruned 91.6 % of the search space on average.
In this paper, we have developed a foundation for developing effective solu
tions to pharmacological problems, through analysis of metabolic networks. We
are currently working to improve the accuracy of our network and cost models
in modeling the metabolic network's behavior. We are also developing methods
which will efficiently provide solutions to the same problem for larger metabolic
networks.
References
[1] 'Proteome Mining' can zero in on Drug Targets. Duke University medical
news, Aug 2004.
[2] M Arita. The metabolic world of Escherichia coli is not small. PNAS,
101(6):15431547, Feb 2004.
[3] S. Broder and J. C. Venter. Sequencing the Entire Genomes of FreeLiving
Organisms: The Foundation of Pharmacology in the New Millennium. An
nual Review ofPharmacology and Toxicology, 40:97132, Apr 2000.
[4] S. K. Chanda and J. S. Caldwell. Fulfilling the promise: drug discovery in
the postgenomic era. Drug Discovery Today, 8(4): 168174, Feb 2003.
[5] J Drews. Drug Discovery: A Historical Perspective. Science,
287(5460):19601964, Mar 2000.
[6] Davidov et. al. Advancing drug discovery through systems biology. Drug
Discovery Today, 8(4):175183, Feb 2003.
[7] Deane et. al. Catecholomethyltransferase inhibitors versus active compara
tors for levodopainduced complications in parkinson's disease. Cochrane
Database of Systematic Reviews, 4, 2004.
[8] Hatzimanikatis et. al. Metabolic networks: enzyme function and metabolite
structure. Current Opinion in Structural Biology, (14):300306, 2004.
[9] Imielinski et. al. Investigating metabolite essentiality through genome scale
analysis of E. coli production capabilities. Bioinformatics, Jan 2005.
[10] Imoto et. al. Computational Strategy for Discovering Druggable Gene Net
works from GenomeWide RNA Expression Profiles. In PSB 2006 Online
Proceedings, 2006.
[11] Jeong et. al. The largescale organization of metabolic networks. letters to
NATURE, 407:651654, Oct 2000.
[12] Jeong et. al. Prediction of Protein Essentiality Based on Genomic Data.
ComPlexUs, 1:1928, 2003.
[13] Lemke et. al. Essentiality and damage in metabolic networks. Bioinformat
ics, 20(1):115119, Jan 2004.
[14] Ma et. al. Decomposition of metabolic network into functional modules
based on the global connectivity structure of reaction graph. Bioinformatics,
20(12):18701876, 2004.
[15] Mombach et. al. Bioinformatics analysis of mycoplasma metabolism: Im
portant enzymes, metabolic similarities, and redundancy. Computers in Bi
ology and Medicine, 2005.
[16] Teichmann et. al. The Evolution and Structural Anatomy of the Small Mole
cule Metabolic Pathways in Escherichia coli. JMB, 311:693708, 2001.
[17] Yeh et. al. Computational Analysis ofPlasmodium falciparum Metabolism:
Organizing Genomic Information to Facilitate Drug Discovery. Genome
Research, 14:917924, 2004.
[18] M Kanehisa. A database for postgenome analysis. Trends in Genetics,
13(9):375376, Sep 1997.
[19] M Kanehisa and S Goto. KEGG: kyoto encyclopedia of genes and genomes.
Nucleic Acids Res., 28(1):2730, Jan 2000.
[20] S.A. Kauffman. The Origins of Order: SelfOrganization and Selection in
Evolution. Oxford University Press, 1993.
[21] C. M Rondinone. Serine Kinases as New Drug Targets for the Treatment of
Type 2 Diabetes Current Medicinal Chemistry Immunology, Endocrine
andMetabolic Agents, 5(6):529536, Dec 2005.
[22] C Smith. Hitting the target. Nature, 422:341347, Mar 2003.
[23] R. Somogyi and C.A. Sniegoski. Modeling the complexity of genetic net
works: understanding multigene and pleiotropic regulation. Complexity,
1:4563, 1996.
[24] R. Surtees and N. Blau. The neurochemistry of phenylketonuria. European
Journal ofPediatrics, 159:10913, 2000.
[25] Takenaka T. Classical vs reverse pharmacology in drug discovery. BJU
International, 88(2):710, Sep 2001.
[26] G Wess. How to escape the bottleneck of medicinal chemistry. Drug Dis
covery Today, 7(10):533535, May 2002.
4 8 2
Pathway Identifier
SNoFilter NonTargetFiiter OTargetFilter CombinedFiters
(a) Average number of nodes generated
II I I III
I I II III
Pathway Identifier
IINoFilter *NonTargetFilter TargetFilter B CombinedFilters
(b) Average execution time in milliseconds
Pathway dentifier
i NoFiter NonTargetFiter OTargetFilter :CombinedFiIters
(c) Average otmal node rank
Figure 5: Comparison of OPMET filtering strategies
Figure 5: Comparison of OPMET filtering strategies
iilil
N14 N17 WO N24 N28 M2
Pathway Identifier
*alTarget M2Target 04Target
(a) Average number of nodes generated
N14 N17 N20O N24 N28 N32
Pathway Identfier
lTarget Targe t 2 et4Target
(b) Average execution time in milliseconds
N14 N17 N20 N24 N28 N32
Pathway Identifier
[lTarget M2Target 04Target
(c) Average oglDnal node rank
Figure 6: Scaling of Dynamic OPMET with combined filters with respect to the target compound
set size
