January 16, 2008 12:35 Proceedings Trim Size: 9in x 6in
NP-COMPLETENESS FOR OPTIMAL ENZYME COMBINATION
IDENTIFICATION
TAMER KAHVECI*
Computer and Information Science and Engineering,
University of Florida,
Gainesville, FL, 32611
Email: tamerC . .. ,ri .. i,i
We prove that the problem of finding the optimal set of enzymes is NP-complete. The exact
cover by 3-sets (X3S) can be reduced to the drug target identification problem in polynomial
time. We first state the X3S problem, which is NP-complete.
1. Problem definition
We develop a graph-based representation that captures the interactions between reactions,
compounds, and enzymes. Our graph representation is a variation of the boolean
network model [3,1]. $R$, $C$, and $E$ denote the sets of reactions, compounds, and enzymes,
respectively. The vertex set consists of all the members of $R \cup C \cup E$. A vertex is
labeled as reaction, compound, or enzyme based on the entity it refers to. Let $V_R$, $V_C$,
and $V_E$ denote the sets of vertices from $R$, $C$, and $E$. A directed edge from vertex $x$ to
vertex $y$ is drawn if one of the following three conditions holds: (1) $x$ represents an
enzyme that catalyzes the reaction represented by $y$; (2) $x$ corresponds to a substrate for
the reaction represented by $y$; (3) $x$ represents a reaction that produces the compound
mapped to $y$.
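The three edge rules can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the input format (a dictionary with 'enzymes', 'substrates', and 'products' per reaction), the function name build_graph, and the two-reaction toy network are all hypothetical and are not taken from Figure 1.

```python
def build_graph(reactions):
    """Build the directed graph from a reaction table (hypothetical input format).

    `reactions` maps a reaction name to a dict with the keys
    'enzymes', 'substrates', and 'products'.
    Returns (labels, edges): labels maps each vertex to its kind,
    edges is a set of directed (x, y) pairs.
    """
    labels = {}
    edges = set()
    for r, spec in reactions.items():
        labels[r] = "reaction"
        for e in spec["enzymes"]:
            labels[e] = "enzyme"
            edges.add((e, r))  # condition (1): enzyme catalyzes reaction
        for c in spec["substrates"]:
            labels[c] = "compound"
            edges.add((c, r))  # condition (2): compound is a substrate
        for c in spec["products"]:
            labels[c] = "compound"
            edges.add((r, c))  # condition (3): reaction produces compound
    return labels, edges


# Toy network, invented for illustration only.
toy = {
    "R1": {"enzymes": ["E1"], "substrates": ["C1"], "products": ["C2"]},
    "R2": {"enzymes": ["E2"], "substrates": ["C2", "C3"], "products": ["C4"]},
}
labels, edges = build_graph(toy)
```

Storing the vertex kind next to the edge set keeps the three vertex classes distinguishable without separate containers.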
Figure 1 illustrates a small hypothetical metabolic network. In this figure, C4 is
the target compound (i.e., the production of C4 should be stopped). In order to stop
the production of C4, R2 has to be prevented from taking place. The obvious solution
is to disrupt one of its catalyzing enzymes (E2 in this case). Another is to stop
the production of one of its reactant compounds (C2 or C3 in this case). If we stop
the production of C2, we need to recursively look for the enzyme that is indirectly
responsible for its production (E1 in this case). Thus, the production of the target
compound can be stopped by manipulating either E1 or E2.
Figure 1 shows the disruption of E2 and its effect on the network. Inhibiting E2
results in the knockout of compounds C5, C8, and C9 in addition to the target compound,
C4. Note that the production of C7 is not stopped since it is produced by R1 even after
*to whom correspondence should be addressed
ETI'NP
Figure 1. A graph constructed for a hypothetical metabolic network with three reactions R1, R2, and R3,
three enzymes E1, E2, and E3, and nine compounds C1, ..., C9. Circles, rectangles, and triangles denote
compounds, reactions, and enzymes respectively. Here, C4 (shown by double circle) is the target compound.
Dotted lines indicate the subgraph removed due to inhibition of enzyme E2 (the disrupted pathway).
the inhibition of E2. We define the damage that the manipulation of an enzyme set causes
to the metabolic network as the number of non-target compounds knocked out. In this
case, the damage of inhibiting E2 is 3 (i.e., C5, C8, and C9). The damage of inhibiting
E1 is 2 (i.e., C2 and C5). The important observation is that E1 and E2 both achieve
the effect of disrupting the target compound, C4. Hence, E1 and E2 are both potential
drug targets. However, E1 is a better drug target than E2 since it causes less damage.
Formally, the optimal enzyme combination identification problem is: "Given a set
of target compounds $T$ ($T \subseteq C$), find the set of enzymes $X$ ($X \subseteq E$) with minimum
damage, whose inhibition stops the production of all the compounds in $T$."
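To make the problem statement concrete, the following Python sketch computes the damage of an enzyme set by propagating knockouts and then searches all enzyme subsets exhaustively. It is an illustration, not the method of this paper. It assumes that a reaction stops when any of its enzymes is inhibited or any of its substrates is knocked out, that a compound is knocked out only when every reaction producing it has stopped, and that compounds produced by no reaction are always available. The function names and the toy network are hypothetical and do not reproduce Figure 1.

```python
from itertools import combinations


def knocked_out(reactions, inhibited):
    """Return the compounds no longer produced once `inhibited` enzymes are disrupted.

    reactions: name -> (enzymes, substrates, products).
    Compounds that no reaction produces are treated as always available (assumption).
    """
    produced = {c for _, _, prods in reactions.values() for c in prods}
    stopped = {r for r, (enz, _, _) in reactions.items() if set(enz) & set(inhibited)}
    gone = set()
    changed = True
    while changed:  # iterate until no new reaction or compound is affected
        changed = False
        for c in produced - gone:
            producers = [r for r, (_, _, prods) in reactions.items() if c in prods]
            if all(r in stopped for r in producers):
                gone.add(c)
                changed = True
        for r, (_, subs, _) in reactions.items():
            if r not in stopped and set(subs) & gone:
                stopped.add(r)
                changed = True
    return gone


def best_enzyme_set(reactions, targets):
    """Exhaustive search (exponential time): return (damage, enzyme set) or None."""
    targets = set(targets)
    enzymes = sorted({e for enz, _, _ in reactions.values() for e in enz})
    best = None
    for size in range(1, len(enzymes) + 1):
        for subset in combinations(enzymes, size):
            gone = knocked_out(reactions, subset)
            if targets <= gone:
                damage = len(gone - targets)  # non-target compounds knocked out
                if best is None or damage < best[0]:
                    best = (damage, set(subset))
    return best


# Toy network, invented for illustration only.
net = {
    "R1": (["E1"], ["C1"], ["C2"]),
    "R2": (["E2"], ["C2", "C3"], ["C4", "C5"]),
    "R3": (["E3"], ["C5"], ["C6"]),
}
result = best_enzyme_set(net, {"C4"})
```

In this toy network, inhibiting E1 also removes C2, so E2 alone is the cheaper choice. The exhaustive search is only practical for very small networks.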
For simplicity, we assume that the input compounds to all reactions are present in
the network and that there are no external inputs. Different enzymes and compounds
may have varying levels of importance in the metabolic network. We consider all the
enzymes and compounds to be of equal importance. This assumption can be relaxed
by assigning weights to enzymes and compounds based on their role in the network.
Also, we do not incorporate backup enzyme activities [2] in this paper. This can be
achieved by creating vertices for sets of enzymes in our graph representation. However,
we do not discuss these extensions in this paper.
2. NP-completeness of the problem
We prove that the problem of finding the optimal set of enzymes is NP-complete.
The exact cover by 3-sets (X3S) can be reduced to the drug target identification problem
in polynomial time. We first state the X3S problem, which is NP-complete.
X3S: Given a set of $n$ items $X = \{x_1, x_2, \ldots, x_n\}$, where $n = 3m$ is a
multiple of 3, and given 3-sets (i.e., sets with three items) $c_i$, $1 \le i \le k$, such that
$c_i \subseteq X$ and $\bigcup_{i=1}^{k} c_i = X$, the X3S problem asks whether there exist $m$
3-sets $c_i$ whose union is $X$.
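The X3S question can be checked directly on small instances. The Python sketch below is a brute-force illustration (exponential in the number of 3-sets) and is not part of the reduction; the function name and the example instance are hypothetical. Because $n = 3m$, any $m$ 3-sets whose union is $X$ are necessarily pairwise disjoint, so testing the union is enough.

```python
from itertools import combinations


def exact_cover_by_3sets(items, triples):
    """Return indices of m triples whose union equals `items`, or None if none exist."""
    items = set(items)
    m, rem = divmod(len(items), 3)
    if rem:
        return None  # n must be a multiple of 3
    for chosen in combinations(range(len(triples)), m):
        if set().union(*(triples[i] for i in chosen)) == items:
            return chosen
    return None


# Hypothetical instance with n = 6 items (m = 2) and k = 4 3-sets.
X = {1, 2, 3, 4, 5, 6}
c = [{1, 2, 3}, {3, 4, 5}, {4, 5, 6}, {1, 2, 6}]
cover = exact_cover_by_3sets(X, c)
```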
Figure 2 shows how the X3S problem can be reduced to the drug target identification problem in
polynomial time. Given an instance of the X3S problem, we map each of the 3-sets $c_i$ to
a compound $C_i$. We map each of the items $x_j$ to a reaction $R_j$. We draw a hypothetical
edge from $C_i$ to $R_j$ if the 3-set $c_i$ contains $x_j$, indicating that the compound $C_i$ is used
by the reaction $R_j$. We draw an edge from each of the $R_j$ to a single hypothetical
target compound, indicating that the target compound is produced by all the $n$ reactions.
In the hypothetical metabolic network, each compound $C_i$ corresponding to a 3-set is
produced by a single reaction that also produces $k + 1$ other compounds. Each such
reaction is catalyzed by a single and unique enzyme.
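The mapping can be sketched in Python to show that the constructed network has polynomial size. This is an illustration under assumptions: the vertex names P (producing reactions), D (extra compounds), and T (target compound) are hypothetical labels, items are represented by the integers 1 to $n$, and the reactions $R_j$ are given no enzymes because the construction does not assign any to them.

```python
def reduce_x3s(n, triples):
    """Map an X3S instance to a hypothetical metabolic network.

    n: number of items x_1..x_n; triples: list of k 3-sets over 1..n.
    Returns (reactions, target), where reactions maps a reaction name to
    (enzymes, substrates, products).
    """
    k = len(triples)
    reactions = {}
    # One producing reaction per compound C_i, with its own unique enzyme E_i
    # and k + 1 additional non-target products.
    for i in range(1, k + 1):
        extras = [f"D{i}_{t}" for t in range(1, k + 2)]
        reactions[f"P{i}"] = ([f"E{i}"], [], [f"C{i}"] + extras)
    # One reaction R_j per item x_j; it uses C_i whenever c_i contains x_j
    # and produces the single target compound T.
    for j in range(1, n + 1):
        subs = [f"C{i}" for i, c in enumerate(triples, start=1) if j in c]
        reactions[f"R{j}"] = ([], subs, ["T"])
    return reactions, "T"


# Hypothetical instance: n = 6 items, k = 4 3-sets.
network, target = reduce_x3s(6, [{1, 2, 3}, {3, 4, 5}, {4, 5, 6}, {1, 2, 6}])
```

The network contains $k + n$ reactions, $k$ enzymes, and $k(k + 2) + 1$ compounds, all polynomial in the size of the X3S instance.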
The set of enzymes whose inhibition eliminates the target compound with minimum
damage in the hypothetical network produces the answer to the X3S problem. The
sketch of the proof is as follows: In order to eliminate the target compound, all the
reactions $R_j$, $1 \le j \le n$, need to be stopped. This can only be done by eliminating
at least one of the input compounds $C_i$ for each $R_j$. Eliminating each input compound
stops three reactions. Thus, the set of compounds that needs to be eliminated should
cover the entire reaction set $R_j$, $1 \le j \le n$. The minimality of the set of compounds
that needs to be eliminated is dictated by the definition of the damage. Each compound
$C_i$ can be eliminated by stopping the reaction that produces it. This can only be done
by inhibiting the corresponding enzyme. Stopping each such reaction incurs a damage
of $k + 2$. This is because the reaction that produces $C_i$ also produces $k + 1$ additional
non-target compounds. Thus, if the solution set to the drug target identification problem
contains $q$ enzymes, it incurs a damage of $q(k + 2)$. There exists a solution to the X3S
problem if and only if $q = m$. Therefore, since X3S is an NP-complete problem, the
drug target identification problem is NP-complete too.
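The counting step behind the "if and only if" can be written out explicitly. The following is a worked restatement of the argument, with $q$ denoting the number of inhibited enzymes in a minimum-damage solution; it introduces no new result.

```latex
\begin{align*}
  3q &\ge n = 3m \;\Longrightarrow\; q \ge m
     && \text{(each eliminated $C_i$ stops exactly three of the $n$ reactions)} \\
  \mathrm{damage}(q) &= q\,(k+2) \;\ge\; m\,(k+2)
     && \text{(each inhibited enzyme removes $C_i$ and $k+1$ other compounds)}
\end{align*}
```

Equality $q = m$ holds exactly when the $m$ eliminated compounds stop $3m = n$ distinct reactions, that is, when the corresponding 3-sets are pairwise disjoint and their union is $X$.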
References
1. S.A. Kauffman. The Origins of Order: Self-Organization and Selection in Evolution. Oxford
University Press, 1993.
2. M.T.A. Ocampo, W. Chaung, et al. Targeted deletion of mNth1 reveals a novel DNA repair
enzyme activity. Mol Cell Biol., 22(17):6111-21, Sep 2002.
3. R. Somogyi and C.A. Sniegoski. Modeling the complexity of genetic networks: Understanding
multigene and pleiotropic regulation. Complexity, 1:45-63, 1996.
[Figure 2 diagram: compounds C1, C2, ..., Ck; reactions R1, R2, ..., Rn; target compound.]
Figure 2. Polynomial time mapping of the X3S problem to the drug target identification problem using
metabolic networks. Each black circle denotes a set of k + 1 compounds.
