Ay et al.
Scalable Steady State Analysis of Boolean Biological Regulatory
Networks
Ferhat Ay, Fei Xu, Tamer Kahveci
Computer and Information Science and Engineering, University of Florida, Gainesville, FL 32611, USA
Email: {fay,feixu,tamer}@cise.ufl.edu
Abstract
Computing the long term behavior of regulatory and signaling networks is critical in understanding how
biological functions take place in organisms. Steady states of these networks determine the activity levels
of individual entities in the long run. Identifying all the steady states of these networks is difficult as
it suffers from the state space explosion problem. In this paper, we propose a method for identifying
all the steady states of boolean regulatory and signaling networks accurately and efficiently. We build
a mathematical model that allows pruning a large portion of the state space quickly without causing
any false dismissals. For the remaining state space, which is typically very small compared to the whole
state space, we develop a randomized traversal method that extracts the steady states. We estimate the
number of steady states, and the expected behavior of individual genes and gene pairs in steady states in
an online fashion. Also, we formulate a stopping criterion that terminates the traversal as soon as user
supplied percentage of the results are returned with high confidence. We show that our algorithm can
identify all the steady states accurately. Furthermore, our method is scalable to virtually any large scale
boolean biological regulatory network.
Availability. Source code of this work is available at http://bioinformatics.cise.ufl.edu/palSteady.html
Author Summary
Importance of the biological interaction data stems
from the fact that the driving forces behind organ
isms' functions are described by these interactions.
Analyzing these interactions can reveal significant
information that is impossible to gather by analyz
ing individual entities. The interactions between
genes and regulatory elements of the cell are rep
resented as gene regulatory networks and signal
ing pathways. Here, we analyze the steady state
behavior of these networks. We have extracted
the steady states of the cell cycle networks of two
yeast types. We have observed that our method
successfully identifies the steady state that corre
sponds to the most stable phase of the cell cycle
(Gi) for both yeast types. Additionally, we have
used the steady state 1pi.1'. . of the genes to find
coexpressed genes in Hedgehog network of Homo
Sapiens and observed that our findings match well
with the database of coexpressed genes (COX
PRESdb [1]).
Introduction
Analyzing biological networks is essential in under
standing the machinery of living organisms which
has been a main goal for scientists. Gene regulatory
networks and signaling pathways are two important
network types that play role in every process of liv
ing organisms [2]. In the last decade, significant
amount of research has been done on reconstruction
of these networks from experimental data [310].
The amount of regulatory data produced by these
methods is sufficient enough to trigger the research
on automated tools to analyze various aspects of
these networks. We use the term biological regu
latory networks (BRN) to combine gene regulatory
networks and signal transduction pathways.
To capture the biological meaning of BRNs, it
is necessary to characterize their long term behav
ior. A common way to achieve this is to identify
the steady states of the dynamic system defined by
a BRN. Identification of steady states of BRNs is
crucial in several applications such as the treatment
Ay et al.
of various human cancers [11, 12] (e.g. leukemia,
glioblastoma) and genetic engineering [13]. Addi
tionally, the steady state analysis has proven to
be successful to explain the flower morphogenesis
of Arabidopsis thaliana [1416], the differentiation
process of Thelper cells [1719], the mechanism of
T cell receptor signaling [20] and the cell cycles of
yeast types [21,22].
We use boolean values for the states of the genes
("ON" or "OFF" meaning high or low activity)
since it is successfully used in the literature for
BRNs [14, 17, 19, 21, 22]. Recently, several meth
ods have used categorical values (e.g., low, medium,
high activity) for gene states in their model. The
steady states extracted by these methods showed
high parallelism with the ones found using boolean
models [15, 23, 24]. The naive approach to steady
state identification in boolean networks is to ex
haustively search the state space. However, the
number of possible states of a BRN is exponen
tial in the number of its genes. Therefore, ex
haustive methods are computationally infeasible for
even moderately sized BRNs. To address this prob
lem, some existing methods use finitestate Markov
chains [25], binary decision diagrams (BDD) [17,18],
constraint programming I'], probabilistic boolean
networks [27],linear programming I"], relational
programming [29] and module networks [30,31].
Orthogonal to the selection of the computational
method, there are two commonly used alternatives
for modeling the state transitions. These are syn
chronous and asynchronous models and both are
used in the literature [17, 18, 26, 29]. Synchronous
model assumes that the activity levels of all the
genes change simultaneously. Hence, the next state
is deterministically decided by the current state.
On the other hand, asynchronous model considers
time in small intervals, such that only one gene can
change its state at an interval and state change
is equally likely for all genes [18]. For an n gene
BRN, the state space of synchronous model has 2"
states and 2" state transitions. For asynchronous
model, the number of states is still 2" but the num
ber of possible transitions can go up to n2". The
advantages/disadvantages of these models together
with their effect on running time of steady state
identification algorithms are discussed in the litera
ture [18,32,33]. Due to its strong assumptions, such
as all genes change their state at the same time and
all have equal response times to these changes, syn
chronous model is arguably more of an abstraction
of the biological process compared to asynchronous
model. We use the asynchronous model in our dis
cussion here, however, it is important to note our
method works for the synchronous model as well.
A state of a BRN is the union of the states of
its genes at a certain time. The state of a gene can
change over the time due to internal regulations or
external stimulants. Steady states are the states
in which the dynamic system of that BRN stabi
lizes. Rest of the states of the network are called
transient states and they are usually not of inter
est from biological viewpoint. We follow the steady
state definition of Gary et al. [17].
Definition 1 Let S be a set of states. Each si E S
is steady if and only if the following conditions are
satisfied:
The set of the successor states of all the states
in S is equal to S
For each si E S once it is visited the probability
of revisiting si is equal to 1 in finite number of
state transitions.
This definition suggests that there are two types
of possible steady states, self loops (e.g., Fig
ure l(a)) and simple loops (e.g., Figure l(b)) as
named in [17]. If a set of states create a complex
loop, then all the states of this set are transient since
at least one of the states does not satisfy the second
condition of the above definition. For instance, in
Figure 1(c) the state [010] is not revisited with prob
ability equal to 1 in finite steps since the system can
loop forever through other four states which create
a loop. Similarly, Garg et al. name such sets of
states as transient states. Figure 1 exemplifies all
the state types discussed above.
Our Contributions: In this paper, we develop
an algorithm that identifies all the steady states of
BRNs accurately and efficiently.
To mathematically express this problems clearly,
we define three types of states according to the num
ber of possible outgoing transitions from them. We
name a state Type 0 if it has no outgoing transi
tions to another state except itself (self loop) (state
[110] in Figure l(a)). A state with exactly one out
going transition to another state is Type 1 (all the
states in Figure l(b)). States with more than one
Ay et al.
outgoing transitions are Type 2 states (state [110]
in Figure l(c)). Using this notation, we observed
the following:
All the states of Type 0 are steady (self loops).
All the states of Type 2 are transient.
All the states of a simple loop are of Type 1.
We name the steady states of Type 1 as cyclic steady
states (i.e, simple loops). Our method first divides
the whole state space into three types (Type 0, 1
and 2) without materializing the exponential state
space graph. Then, we extract the cyclic steady
states from Type 1 states by using a randomized
traversal method. Cyclic steady states together
with the Type 0 states constitute all the steady
states of the BRN of consideration.
We use the boolean network model proposed by
Kauffman et al. [34]. We build a hypothetical state
transition graph using the interactions in a BRN.
We develop a mathematical model that uses binary
decision diagram (BDD) data structure [35] to clas
sify each state into one of the three classes, namely
Type 0, Type 1 and Type 2. Type 0 and Type
2 states are guaranteed to be steady and transient
(i.e. not steady), respectively. Type 1 states can be
either one. To further classify the Type 1 states as
transient or steady, we develop a randomized traver
sal method which samples random seed states from
Type 1 states and classifies the visited states dur
ing the traversal from this seed state. While sam
pling, we calculate the estimators for the number of
steady states, expected steady state distribution of
individual genes and jointsteady state distributions
of gene pairs. We calculate a stopping criterion
from the statistical information of explored states.
This criterion allows early termination of sampling
when the user defined percentage of steady states
are found with high confidence. In summary, our
technical contributions are:
We build a mathematical model for pruning a
very large portion of state space quickly with
out losing any steady states.
We develop a randomized traversal method
that computes estimators for the number of
steady states and the fraction of individual
genes and gene pairs being active in these
states in an online fashion. Our algorithm
guarantees to find all the steady states after
sufficient number of iterations.
We formulate a stopping criterion which uses
the information of classified states to termi
nate the algorithm when sufficient percentage
of steady states are extracted with a given con
fidence value.
Results and Discussion
Cell Cycles of Budding Yeast and Fis
sion Yeast
To evaluate the accuracy of the results reported by
our algorithm, we compared the steady states that
we found to the steady states that are reported in
the literature. For this purpose, we use the cell
cycle networks of two yeast types, namely Saccha
romyces cerevisiae (budding yeast) and Schizosac
charomyces pombe (fission yeast). We consider the
key regulatory genes of these networks since the
core process of these two cell cycles are well an
alyzed in the literature by both differential equa
tion models [36, 37] and boolean network mod
els [21,22, 38, 39].
The cell cycles of both yeasts go through four
main phases. In the first phase the yeast cell grows
till its size reaches a certain amount (Gi). The
second phase is when the DNA is synthesized and
chromosomes are replicated (S). Third phase is a
transition gap between the second and fourth (G2).
The cell division is completed at the fourth phase
named M. The two new cells then enter the Gi
phase again which completes the cycle. The state
corresponding to Gi phase is a steady state that is
observed the most in the yeast life cycle.
Li et al. [22] studied the boolean network model
of the budding yeast (Figure 2a) and identified the
boolean states visited during a complete cell cycle
together with seven steady states of the network
corresponding to the fixed points of the dynamic
system. Similarly, Davidich et al. [21] found thir
teen different steady states for the boolean model
of the cell cycle of fission yeast (Figure 2b).
Here we compare the steady states reported by
our method with the ones from the methods of Li et
al. and Davidich et al. For this we use vector no
tation to represent the activity levels of an ordered
Ay et al.
gene set. In this notation, 0 means the correspond
ing gene is inactive, 1 means its active and X means
it can be either one. For instance, for a gene set of
{g1, 92, 93}, the [01X] vector represents two states,
namely [010] and [011].
The budding yeast cell cycle network in Figure
2a is the same as the one analyzed by Li et al. [22].
We use the order { (I MBF, SBF, Clnl2, Cdhl,
Swi5, Cdc20, Clb56, Sicl, Clbl2, Mcml} for the
vector representation of the states of eleven genes
in this network. We follow Li et al. by exclud
ing Cell size from the gene set and the state rep
resentation. Li et al. reported seven steady states
for this network one of which corresponds to the
GI phase of the cell cycle. We identified eight dif
ferent steady states, six of which are Type 0 and
the other two are of Type 1. Six Type 0 steady
states we found are [0100X000100] (4 states) and
[0000X000X00] (2 states) and they all are also re
ported by Li et al. Also, our method accurately
labeled the [00001000100] state that corresponds
to G1 phase as steady. The two Type 1 steady
states which visit each other in a cycle are SSi
[00100000000] and SS2 [00110000000]. SSi is
when SBF is the only active gene in the network.
SS1 is followed by SS2 since in the next time step
SBF also activates Clnl2. Due to self degradation
of Clnl2 in state SS2, this state goes back to the
SSi again. The method of Li et al. labels SS2 as
steady whereas it does not report SS1.
For the states of the fission yeast cell cycle in
Figure 2b, we use the ordered gene set {Start,
SK, Cdc2/Cdcl3, Ste9, Ruml, Slpl, Cdc2/Cdcl3*,
Weel/Mikl, C.I. '. PP}. Our method re
ports fifteen different steady states which are
[0001XOOXXO] (8 states), [0000100XXO] (4 states),
[00000001X0] (2 states) and [0000000000]. The first
set of states contains the steady state [0001100100]
that corresponds to most stable phase (Gi) of the
cell cycle. This state together with twelve other
steady states we found matches exactly the ones
found by Davidich et al. [21] The two additional
steady states that we found different than Da
vidich et al. are [0000000100] and [0000000110].
The first state corresponds to high activation
level of only Weel/Mikl genes and the second
state is when Cdc25 is also active together with
Weel/Mikl. The reason of this difference is that
Davidich et al. manually sets a negative threshold
for Cdc2/Cdcl3 activation. Cdc2/Cdcl3 degrades
Weel/Mikl which prevents their system from visit
ing the two steady states we found without setting
any threshold manually.
These two examples demonstrate that our method
can accurately identify the steady states of BRNs.
Performance Evaluation
Here, we compare the performance of our method
to that of Garg et al. [17,18]. We used the asyn
chronous state transition model for both algorithms
in this experiment. We compared the running times
for a number of real BRNs as well as for randomly
generated networks. We compiled the real BRNs
from the pathway database PID [40] and other pub
lished work [17,18, 21, 22]. Table 1 reports the run
ning times for Garg et al.'s method named Genysis
and our algorithm with different parameter settings.
For real networks of small size such as yeast cell
cycles and THelper network, the running times for
both methods are around one second with Geny
sis running slightly faster than our method. How
ever, for bigger real networks our method's running
time is significantly smaller than Genysis. As au
thors also stated in their work, Genysis might need
extensive amount of running time when using asyn
chronous model due to their heuristics to select seed
states from the state space. The row corresponding
to p38 MAPK signaling pathway constitutes a good
example for this scenario. For the same network our
algorithm can identify the i'. of the steady states
with l'. confidence in only 11.3 seconds. Addi
tionally, the running times on four randomly gener
ated networks indicated that Genysis can not scale
well with the growing network size whereas our al
gorithm can still find large portion of the steady
states in a few minutes.
We also compared the steady states found by
both algorithms for the two yeast cell cycles. As
discussed in previous section, the steady states of
these two networks are reported in Li et al. [22] and
Davidich et al. [21]. For the budding yeast cell cycle
in Figure 2a, Genysis was able to identify only the
trivial steady state when all the genes are inactive.
For the fission yeast, Genysis labeled the state that
corresponds to the Gi phase (only the genes Ste9,
Ruml and Weel/Mikl are active) of cell cycle as
transient. As reported in Davidich et al. [21], G1 is
the most stable phase of this cycle and our method
Ay et al.
correctly classifies this state as steady.
The above results showed that our algorithm is
more scalable and practical compared to Genysis.
Furthermore, the steady states we reported for yeast
cell cycles match better with the previous findings.
CoExpressed Gene Pairs in Human
Hedgehog Network
We calculate the fraction of steady states in which
two genes are in active state together. Biologically
this fraction corresponds to the coexpression of the
two genes. Revealing coexpressed genes has great
significance in discovery of conserved genetic mod
ules [30, 41, 42] and identification of differentially
expressed genes [43].
In here, we compare the coexpression values for
gene pairs found by our algorithm with the values
reported in the gene coexpression database, COX
PRESdb [1]. For this purpose, we use Hedgehog
signaling network of Homo Sapiens given in KEGG
Pathway Database [44]. This network consists of 17
genes and hence, 136 possible gene pairs. We sorted
the gene pairs according to their coexpression val
ues in decreasing order and compared our ordering
with the one in COXPRESdb. We picked top 20
gene pairs from our list and searched for the in
dices of these pairs in the ordering of COXPRESdb.
Here, we report the largest index, 1, among these k
indices for different values of k.
For k = 1 we have 1 = 1, which means that
the highest coexpressed gene pair (GL1b.10O) in
our ordering is also the top scoring pair in COX
PRESdb. For k 5 we have 1 6, meaning that
the five gene pairs (GL1b.1O, GSK3BFBXW11,
RAB23GAS1, GLI1IHH and SUFUb.!O) with
the highest ranks in our ordering are in between
the top 6 pairs in the ranking of COXPRESdb. For
the other values of k = 10 and k 15, the 1 values
are 16 and 35 respectively. Hence, the gene pairs
reported by our method that are found to be ac
tive together in the steady states suggest that there
is a coexpression between these two genes. This
shows that our algorithm is useful in predicting co
expression of genes by utilizing the the steady state
information of BRNs.
Accuracy of Estimators
To evaluate the quality of our samplingbased esti
mators, we measured their correctness and conver
gence rate. Correctness means that the estimates
will eventually converge to the correct value. For
the convergence rate, a good estimator should ap
proximate the correct value after a small fraction of
the state space is explored.
We use a portion of p53 network of Homo Sapiens
taken from KEGG [44] in this experiment. We mea
sure the estimated number of steady states at which
a gene is active for each gene at each iteration of our
algorithm. Our algorithm traverses the entire space
of Type 1 states in about 2,500 iterations for this
network. Figure 4 shows the results for seven differ
ent genes. We plot these genes as they have different
steady state 1I..."' In other words, they vary in
the fraction of steady states in which they are active
(e.g. CHK1 is active whereas p21 is suppressed in
most of the steady states). The results show that
our estimators converge to the correct ratio for all
genes in less than 500 iterations. The rapid conver
gence suggests that our algorithm approximates the
correct profile of gene levels at steady states without
traversing the whole space of Type 1 states. This
suggests that, equipped with the stopping criterion
we devised, our algorithm is also practical and accu
rate for BRNs with large number of Type 1 states
since early termination of the algorithm does not
lead to significant deviation from the correct steady
state profile.
Methods
This section discusses our algorithm for identifying
all the steady states of boolean BRNs. First we de
scribe the mathematical model for expressing the
states and state transitions. Then, we discuss our
method to segregate the state space into three sub
spaces. Finally, we present our randomized traver
sal method that extracts Type 1 steady states. We
also give the formulation of a stopping criterion that
terminates the traversal when sufficient amount of
steady states are reported with high confidence.
State Transition Model
In order to identify the steady states of a BRN,
we first need to build a mathematical model that
Ay et al.
explains its states and how the network moves from
one state to another.
Let Xi(t) true/false denote the state of the
ith gene at time t. Here true denotes that ith gene
is "active" and false denotes that it is "inactive".
We use Xi instead of Xi(t) for simplicity wherever
appropriate.
We summarize the interactions that determine
the next state of the ith gene from the activity
values at time t as follows. The ith gene will be
inhibited if at least one of its suppressors is ac
tive. If all the suppressors of the ith gene are in
active and at least one of its activators is active,
then it becomes active in the next time step. In
all other situations the state of the ith gene re
mains unchanged. Even though the assumption
that one inhibitor can suppress all activators seems
questionable, it is commonly observed in biological
networks. Wu et al. [39] named this as 1 I... i inhi
bition" model and showed that it produces the same
results as threshold network model [38] for fission
yeast cell cycle network. Also, it has been used as a
modeling decision by Garg et al. [17,18]. However,
it is important to note that our method does not
depend on this assumption.
The following equation summarizes how the next
state of ith gene is determined:
x,(t + 1): (Xi(t) pA(t)) A ps(t) (1)
In this equation, the symbols V and A denote the
logical "AND" and "OR" operators, pA(t) andps(t)
represent predicates for the activators and the sup
pressors of the ith gene at time t, respectively. We
compute these predicates as pA(t) VjeAXj(t)
and ps(t) = VjsXj(t), where A and S are the
sets of indices for activators and the suppressors of
the ith gene.
An important observation is that, even though
the next state of the ith gene is deterministically
calculated, there can be multiple next states for the
whole network since we use asynchronous model.
A state of a given BRN is defined by the states of
individual genes. Let u = [X1 ... X,] denote a
state of the network. The network can move from
state u to state v = [X1 . Xi1 X,i Xi+l .
X,] only if the ith gene is one of the genes that
can have a state change. Individual genes that can
issue a state change at a given state determines the
possible next states of the network.
We model the changes in the states of a BRN us
ing an abstract graph representation. In this graph,
each vertex corresponds to a possible state of the
BRN. Thus, if there are n genes in a BRN, then the
corresponding graph contains 2" vertices. There is
an edge from vertex u to vertex v, if it is possible
to change the state of the BRN from the state rep
resented by u to the state represented by v by only
changing the state of a single gene. There can be
up to n2" edges between these states. This graph
is hypothetical as we use it only for building our
mathematical model. We never materialize this ex
ponential graph in our method.
We classify the vertices of this graph into three
classes based on the number of their outgoing edges.
Figure 1 provides visual examples for all three state
types listed below:
Type 0: The vertices that have no outgoing
edges (except self cycles). These vertices cor
respond to steady states as the state of the net
work cannot change once one of them is visited.
(Figure l(a))
Type 1: The vertices that have exactly one out
going edge. The states for these vertices can be
steady or transient. (Figure l(b))
Type 2: The vertices that have two or more
outgoing edges. These are all transient states.
(Figure 1(c))
In the following section, we describe our method
for segregating the state space into the above three
types.
Segregation of States using BDDs
As we discussed in the previous section, we never
generate the state transition graph of the input net
work. A simple observation on our state transition
model allows us to segregate the states without this
materialization. This segregation results in not only
the immediate identification of all Type 0 steady
states, but also eliminates a huge portion of states
by classifying them as transient.
For instance, for THelper cell network with 23
genes and 8, 388, 608 (223) possible states, our seg
regation method classifies 1,321 states as Type 0
and 8, 364, 757(~ 223) states as Type 2 in only 0.08
seconds. Remaining 22,530 states are labeled as
Ay et al.
Type 1. Thus, we need to explore only a small per
centage (~ (I 2'.'.) of the whole state space.
Here, we describe how we construct the BDDs
for all Type 0 states and all Type 1 states, namely
Zo and Z1. We first define a predicate that will be
handy in this discussion.
Here, D denotes the logical "XOR" operator. C,
evaluating to true at time t means that gene i will
change its state from X, to Xi at time t + 1. Oth
erwise, it preserves its current state. The following
equations, show the formulas of BDDs representing
Type 0 and Type 1 states:
Zo : A\ and Z : V(C, A (A/\ ).
i i jji
Zo = "True" represents the states that do not sat
isfy any of the C, conditions (i.e. none of the genes
change state). The states in Z1 "True" satisfy ex
actly one of the C, conditions (i.e. exactly one gene
changes state). The states which are not included in
the two BDDs above are called Type 2 and they are
all transient states. The BDD for these states can
be constructed similarly. However, we simply elim
inate these states since they do not reflect the long
term behavior of the system. By doing this without
materialization, we quickly reduce the state space
of the problem to a significantly smaller one. In the
next section, we describe how we extract the steady
states of Type 1.
Extracting Cyclic Steady States
In this section, we develop a randomized traversal
strategy that identifies the steady states of Type 1.
We call these states "cyclic 1 .. I An example for
this is the cycle of four states in Figure l(b). Af
ter traversing a portion of the vertices, we estimate
the total number of steady states, the probability of
each gene being active and the joint probability of
gene pairs being coexpressed in steady states. It is
worth mentioning that our traversal method never
traverses a state more than once. Hence, if it runs
for enough time it labels all the Type 1 states as
steady or transient. Algorithm 1 briefly describes
how we traverse the Type 1 states. Next, we elab
orate on different steps of this algorithm.
Step 1. Selecting a random seed state
C : Xi(t + 1) ( Xi(t)
Algorithm 1 RANDOMIZED TRAVERSAL OF TYPE
1 STATES
1. Randomly get an unobserved vertex from the
Type 1 set.
2. Follow the out going edge to traverse the graph
until seeing one of the following vertices
(i) A vertex that is labeled as transient or
steady in previous iterations.
(ii) A vertex that is traversed in this iteration.
3. Label all the traversed vertices as transient or
steady and update the estimators.
4. Stop if number of steady states observed so far
is sufficient.
We obtain a random seed state among the untra
versed satisfying assignments of the BDD for Type
1 states. We do this by traversing the BDD from
root node to the leaf level. At each step of the
traversal, we randomly pick a child node of the cur
rently visited node. When we reach the leaf level of
the BDD, the states of all the genes are determined
and hence, our seed state for the whole BRN.
Step 2. Traversal starting from the seed state
Once we choose an unobserved seed state, the
next step is to understand whether or not we can
reach to a new steady state from this state. To do
this, we traverse the state transition graph starting
from this vertex by following the edges.
Since the seed state is of Type 1, by definition,
it has only one outgoing edge. Thus, we can eas
ily find the next state as the state that satisfies the
transition condition. We continue traversal by ap
plying the same principle. Figure 3 summarizes the
possible cases that can occur during this traversal.
Starting from an unobserved state if we traverse
one of the following three paths then all the states
visited on this path are transient:
A path ending in a Type 0 state
A path ending in a Type 2 state
A path ending in a state that is observed in
previous iterations
Ay et al.
Notice that all three cases correspond to Step 2(i)
of our traversal method. The next case produces
both cyclic steady and transient states:
A path leading to a cycle of states visited in
current iteration
In this case, we label all the states on the cycle as
steady and the other states on the path as tran
sient. For instance, if the traversal starts from the
[001] state in Figure l(b), then [001] is transient
and other four states are Type 1 steady states.
Step 3. Calculating Estimators
At each iteration, we traverse a path in the state
transition graph and label each state on this path
as transient or steady. We name the set of ver
tices visited in each such traversal as an observa
tion. Using these observations, we develop estima
tors for the total number and the ,./. ," of steady
states. The profile of the steady states is the vector
where the ith entry is the expected fraction of the
steady states at which the ith gene is active. For
example, if the second entry of the profile is 0.95, it
means that we expect that the second gene is active
in *'.'. of the steady states. We also compute the
estimators for the joint expression (coexpression)
fractions of gene pairs. Computing these estimates
is important as they can lead to early prediction of
the steady state profile.
Here, we describe in detail the calculation and
the analysis of the estimator of the total number of
Type 1 steady states. First of all, we prove that
it is an unbiased estimator. Then, we discuss how
to minimize the variance of this estimator. For the
other estimators we only give the formulations.
First, let us introduce some notation we use
throughout this section:
No, Ni: Number of Type 0 and Type 1 states,
respectively. We calculate these numbers at
the initial segregation step.
0O = (s, ti): ith observation. si and ti are
the number of observed steady and transient
states traversed in this observation.
Si, Ti, Ui: Total number of observed steady
states, observed transient states and unob
served states after first i observations, respec
tively.
From the definitions above, we can calculate Ui
N S, T, S, E=I sj and T, E jltj.
Now, we introduce a 0/1 random variable Bi for
each observation Oi. At a given time Bi = 1 means
the current iteration results in observation Oi. We
simulate our sampling by assuming at any time one
and only one of the Bi's can be 1. In other words,
E[BiBj] 0 for any i / j. Notice that E[Bi]
E[B7'] = it for observation Oi. We formulate
the estimator of the total number of Type 1 steady
states at the ith iteration as:
F o kSk
k 0
Lemma 1 The estimator Fi is an unbiased estima
tor.
Proof: We prove this by showing the expected
value of Fi is equal to the total number of Type
1 steady states. Taking expectations of both sides
and replacing E[Bk] with 8ktk:
N.
iE[B[ N1
io sN
ik
k=0
After defining the estimator, the next step is to
calculate its variance.
Lemma 2 The variance of Fi is
i N1
Var[Fi] E ( )
j=o 0
Proof: We know that, Var[F,]
We first compute F2.
2 s N1
j0 s + to
j=0
[5 ]2
j=0
E[F?2] E2[F}].
I N1
k se + tr
kc=o
 BjBkSjSk ( N ) N
2 2 sN1 2
j+ Sj tj Sk tk
1=0 *^ + t
SBsk
Ay et al.
When we take the expected value of Fi2 the first
term cancels since E[BjBk] 0 for any i / j.
Hence, the variance of F, can be computed as:
Var[Fi = E[F,] E2[Fi]
SE[ B ( )2](E[Fi])2
,r Si + tj
j=o
j[ 8]2
.70
*
There are many ways to build an estimator from
Fjs. However, it is desirable to build an estimator
with a small variance as it converges to true solution
faster. Following Lemma builds the estimator with
minimum variance.
Lemma 3 The estimator that has the smallest
variance is
1 F
Sc V, F
Proof: Now, we discuss how we combine the esti
mators F, F2, ..., F, with variances Vi,V2..., V, to
minimize the overall variance of our estimation. In
other words, we want to find the weight parameters
71, 72, ., 'n such that E y 1 and the variance
of the estimator for total number of steady states
of Type 1 is minimized. Let us denote this new
estimator as T = 1 iFi. Then,
Var(T)= YVI
Mathematically, our aim is to minimize Vi
given E 7i 1. We formulate this problem by us
ing Lagrange Multiplier as follows:
L= qV
A(E y 1)
Taking derivative of both sides with respect to
each yi, we get the equations:
1
2yV A 0,, A =
Solving these equations we get the 7, values that
minimizes the Var(T) as:
1
7J V 5
Thus, by using the value of 7ys we find that the
estimator with smallest variance is
1
T= 1 F
S V,3
Next, we give the formulations of the estimators
for the fractions of each gene and each gene pair
being active in steady states. First, we formulate
our estimator for the fraction of a gene being active
in cyclic steady states. Assume that the number
of steady states at the ith observation in which the
kth gene is active is nki. An estimator for the kth
gene after the ith iteration is then :
Gki = /Si
j=l
Let namb,i denote the number of steady states in
which gene a and gene b are both active or both
inactive after the ith observation. We calculate the
estimator of joint probability of two genes having
the same activity level at a steady state as:
Jab,i /Si
j=1
Step 4. Stopping Criteria When our method
finishes traversing all Type 1 states (steps 1 to 3),
it finds all the steady states. However, in some ap
plications it might be sufficient to find a predeter
mined percentage of steady states. We develop a
statistical criterion to be able to terminate the algo
rithm quickly after a sufficient portion of the Type
1 states are explored. Our method still guarantees
that the desired percentage of the results are found
with high confidence. More precisely, when the user
supplies a parameter a (e.g. 0.9), we compute a
confidence c E [0, 1], at each iteration such that "at
least a x 100 percent of the steady states are found
with probability at least c". This is desirable as the
user can terminate the loop when c is large enough
for the underlying application.
Now, let us describe how the stopping criterion
works. Let A* denote the actual number of total
Type 1 steady states. If we have known the value
of A* we could have stopped sampling with a con
fidence value of c = 1 when A* < S, + (1 )(No+Si)
a
= 0
Ay et al.
is satisfied. That is the time when we are sure that
a x 100 percent of the steady states are already re
ported. Since we do not know A* in advance, we use
the information gathered from observed portion of
states. We compute Ai which denotes the minimum
number of total steady states of Type 1 that needs
to be present for our method to continue traversal.
A S+ (1 a)(No + S)/a (6)
Trivially, if Ai > Ui + Si we just stop sampling with
c = 1 since even if all the unobserved states were
to be steady, the reported ones would constitute at
least a x 100 percent of the Type 1 steady states.
Otherwise, we calculate the confidence value in ith
iteration as the probability that we would have ob
served at least S, steady states in our observations
so far if there were Ai unobserved steady states.
Formally, we compute the confidence as:
C(Ai) = [+ + T) (1 qi)S +T ] (7)
k=S
qi in Equation 7 represents the percentage of steady
states if there were Ai steady states in Type 1 states
(i.e. qi = N). The inner term of the summa
tion represents "The probability of getting exactly k
steady states from Si + Ti currently observed states
if the probability of a state being steady is qi".
Lemma 4 shows that, the confidence value re
ported when we stop sampling is never an over es
timation.
Lemma 4 The confidence value given in Equa
tion 7 by using Ai does not lead to false dismissal.
Proof: Here, we have three cases to consider:
Casel : (A* > Ai)
Then, q, = > N qi. Since the confi
dence value is calculated as the area under the
right hand side of a binomial probability dis
tribution function (i.e. inverse CDF), c value
will be larger for a larger value of q. Hence,
C(A*) > C(Ai). That means whenever we stop
sampling the confidence we report is conserva
tive.
Case : (A* Ai)
Trivially, C(A*) C(Ai) when we terminate
the sampling.
Case : (A* < Ai)
This case implies that we overestimated the to
tal number of Type 1 steady states at ith itera
tion. Only thing that can happen in such a case
is that our method decides to continue travers
ing when it does not need to. Since the actual
number of steady states are less than what we
have estimated, when the traversal stops we
have already sampled at least as many steady
states as needed to guarantee the reported con
fidence value.
Corollary 1 follows from Lemma 4.
Corollary 1 Our method guarantees to find all the
steady states when the confidence value reaches 1. U
Acknowledgments
References
1. Obayashi T, Hayashi S, Shibaoka M, Saeki M,
Ohta H, et al. (2008) Coxpresdb: a database of
coexpressed gene networks in mammals. Nucleic
Acids Research (NAR) 36.
2. Karlebach G, Shamir R (2008) Modelling and
analysis of gene regulatory networks. Nat Rev
Mol Cell Biol 9: 77080.
3. Basso K, Margolin A, ..i.. .1 1. G, Klein U,
DallaFavera R, et al. (2005) Reverse engineering
of regulatory networks in human b cells. Nature
Genetics 4: 38290.
4. Margolin A, Nemenman I, Basso K, WI ...... C,
'ii.. .I i1. G, et al. (2006) Aracne: an algo
rithm for the reconstruction of gene regulatory
networks in a mammalian cellular context. BMC
Bioinformatics 7 Suppl 1.
5. Tatsuya A, Satoru M, Satoru K (1999) Identifica
tion of genetic networks from a small number of
gene expression patterns under the boolean net
work model. In: Pacific Symposium on Biocom
puting. volume 4, pp. 1728.
6. Osamu H, Naoki N, Yoshinori T, Hideo B, Seiya
I, et al. (2005) Estimating gene networks from
expression data and binding location data via
boolean networks. Computational Science and
Its Applications ICCSA 2005 3482: 34956.
7. Tatsuya A, Satoru K, Osamu M, Satoru M (2003)
Identification of genetic networks by strategic
gene disruptions and gene overexpressions under
a boolean model. Theoretical Computer Science
298: 235251.
8. Satoru M (2003) Inference, modeling and simu
lation of gene networks. In: Proceedings of the
First International Workshop on Computational
Methods in Systems Biology. pp. 20711.
9. Osamu H, Ryo Y, Seiya I, Rui Y, Tomoyuki H,
et al. (2008) Statistical inference of transcrip
tional modulebased gene networks from time
course gene expression profiles by using state
space models. Bioinformatics 24: 93242.
10. Wong S, Zhang L, Tong A, Li Z, Goldberg D,
et al. (2004) Combining biological networks to
predict genetic interactions. Proc Natl Acad Sci
USA 101: 156827.
11. Hupp T, Lane D, Ball K (2000) Strategies for
manipulating the p53 pathway in the treatment
of human cancer. Biochemical Journal 352: 1
17.
12. Lane D (1999) Exploiting the p53 pathway for
cancer diagnosis. and therapy. British Journal of
Cancer (BJC) 80: 15.
13. Ostergaard S, Olsson L, Johnston M, Nielsen J
(2000) Increasing galactose consumption by sac
charomyces cerevisiae through metabolic engi
neering of the gal gene regulatory network. Na
ture Biotechnology 18: 12836.
14. Mendoza L, Thieffry D, AlvarezBuylla E (1999)
Genetic control of flower morphogenesis in ara
bidopsis thaliana: a logical analysis. Bioinfor
matics 15: 593606.
15. Demongeot J, Morvan M, Sene S (2008) Impact
of fixed boundary conditions on the basins of
attraction in the flower's morphogenesis of ara
bidopsis thaliana. In: Proceedings of the 22nd
International Conference on Advanced Informa
tion Networking and Applications (AINAW). pp.
782789.
16. Ivarez Buylla E, Chaos, Aldana M, Bentez M,
CortesPoza Y, et al. (2008) Floral morphogen
esis: Stochastic explorations of a gene network
epigenetic landscape. PLoS ONE 3: e3626.
17. Garg A, Xenarios I, Mendoza L, De Micheli G
(2007) An efficient method for dynamic analy
sis of gene regulatory networks. and in silico gene
perturbation experiments. In: International Con
ference on Research in Computational Molecular
Biology (RECOMB). pp. 6276.
18. Garg A, Di Cara A, Xenarios I, Mendoza L,
De Micheli G (2008) Synchronous versus asyn
chronous modeling of gene regulatory networks.
Bioinformatics 24: 191725.
19. Mendoza L (2006) A network model for the con
trol of the differentiation process in th cells.
Biosystems 84: 10114.
20. SaezRodriguez J, Simeoni L, Lindquist J,
Hemenway R, Bommhardt U, et al. (2007) A log
ical model provides insights into t cell receptor
signaling. PLoS Computational Biology 3: 1580
90.
21. Davidich M, Bornholdt S (2008) Boolean network
model predicts cell cycle sequence of fission yeast.
PLoS ONE 3: e1672.
22. Fangting L, Tao L, Ying L, Qi O, Chao T (2004)
The yeast cellcycle network is robustly designed.
Proceedings of the National Academy of Sciences
(PNAS) 101: 47816.
23. Garg A, Mendoza L, Xenarios I, DeMicheli G
(2007) Modeling of multiple valued gene regula
tory networks. In: Proceedings IEEE Engineer
ing in Medicine and Biology Society. pp. 1398
404.
24. Mendoza L, Xenarios I (2006) A method for the
generation of standardized qualitative dynamical
systems of regulatory networks. Theoretical biol
ogy and medical modelling 13.
25. Hachtel G, Macii E, Pardo A, Somenzi F (1996)
Markovian analysis of large finite state machines.
IEEE Transactions on ComputerAided Design
15: 147993.
26. Devloo V, Hansen P, Labb M (2003) Identifi
cation of all steady states in large networks by
logical analysis. Bull Mathematical Biology 65:
102551.
27. Shmulevich I, Dougherty E, Kim S, Zhang W
(2002) Probabilistic boolean networks: a rule
based uncertainty model for gene regulatory net
works. Bioinformatics 18: 26174.
28. Shlomi T, Berkman O, Ruppin E (2005) Reg
ulatory on/off minimization of metabolic flux
changes after genetic perturbations. Proceedings
of the National Academy of Sciences (PNAS) 102:
7695700.
29. Schaub M, Henzinger T, Fisher J (2007) Quali
tative networks: a symbolic approach to analyze
biological signaling networks. BMC Systems Bi
ology 1.
Ay et al.
30. Segal E, Pe'er D, Regev A, Koller D, Friedman
N (2005) Learning module networks. Journal of
Machine Learning Research 6: 55788.
31. Segal E, Michael S, Regev A, Pe'er D, David B,
et al. (2003) Module networks: Discovering regu
latory modules and their condition specific regu
lators from gene expression data. Nature Genet
ics 34: 16676.
32. Albert I, Thakar J, Li S, Zhang R, Albert R
(2008) Boolean network simulations for life sci
entists. Source code for biology and medicine 3:
16.
33. Albert R, Othmer H (2003) The topology of
the regulatory interactions predicts the expres
sion pattern of the segment polarity genes in
drosophila melanogaster. Journal of Theoretical
Biology 223: 118.
34. Kauffman S (1969) Homeostasis and differentia
tion in random genetic control networks. Nature
224: 1778.
35. LindNielsen J (1999). Buddy a bi
nary decision diagram package. tech. rep,
http://cs.it.dtu.dk/buddy.
36. Tyson J, CsikaszNagy A, Novak B (2002) The
dynamics of cell cycle regulation. Bioessays 24:
1095109.
37. Tyson J, Chen K, Novak B (2001) Network dy
namics and cell physiology. Nat Rev Mol Cell
Biol 2: 90816.
38. Bornholdt S (2008) Boolean network models of
cellular regulation: prospects and limitations.
Journal of The Royal Society Interface 5: 8594.
39. Wu Y, Zhang X, Yu J, Ouyang Q (2009) Iden
tification of a topological characteristic respon
sible for the biological robustness of regula
tory networks. PLoS Computational Biology 5:
e1000442.
40. Schaefer C, Anthony K, Krupa S, Buchoff J, Day
M, et al. (2009) Pid: the pathway interaction
database. Nucleic Acids Research (NAR) 37:
6749.
41. Stuart J, Segal E, Koller D, Kim S (2003) A gene
coexpression network for global discovery of con
served genetic modules. Science 302: 2401.
42. Zotenko E, Guimaraes K, Jothi R, Przytycka
T (2005) Decomposition of overlapping protein
complexes: A graph theoretical method for an
alyzing static and dynamic protein associations.
RECOMB Satellite Meeting on Systems Biology
LNBI: 2338.
43. Oldham M, Horvath S, Geschwind D (2006) Con
servation. and evolution of gene coexpression net
works in human. and chimpanzee brains. Pro
ceedings of the National Academy of Sciences
(PNAS) 103: 179738.
44. Ogata H, Goto S, Sato K, I 1ii..I.. I1. W, Bono H,
et al. (1999) Kegg: Kyoto encyclopedia of genes.
and genomes. Nucleic Acids Research (NAR) 27:
2934.
Tables
Figure Legends
Ay et al.
Ay et al.
Table 1: The comparison of our algorithm with an existing method, Genysis [17,18], on real and random
networks. 1 Our algorithm when 90% of the steady states are found with 90% confidence. 2 Our algorithm when 80%
of the steady states are found with 80% confidence. We used a cutoff time of 24hours and "" indicates that the method
could not find all steady states within this time. s denotes seconds and m denotes minutes in running time columns.
Network Name Network size Running Time
Genes Interactions Genysis Our Alg.1 Our Alg.2
Fission yeast cell cycle [21] 10 27 0.21s 0.18s 0.17s
Budding yeast cell cycle [22] 12 35 0.13s 0.25s 0.22s
THelper cells [17] 23 35 0.23s 1.14s 0.43s
p38 MAPK signaling [40] 26 28 545.3m 11.3s 2.1s
Tcell receptor [18] 40 58 20.7m 14.2s 2.11s
randomNet 1 20 32 10.4m 2.282s 0.3s
randomNet 2 30 48 5.923s 3.13s
randomNet 3 40 64 4.7m 3.4m
randomNet 4 50 80 68.7m 15.3m
(001 011 111 110l
010
.^^ ^. ^^ ^010)
011 111 110
101 c 10)
Figure 1: Each circle represents a state of a hypothetical network with 3 genes. The binary values correspond to activation
levels of these genes. (a) The three states on the left are transient and of Type 1. The state with self loop is steady and
Type 0. (b) The four states in simple loop are cyclic steady states and they are of Type 1. (c) The leftmost state is transient
and Type 1. Even though only [110] is of Type 2 (others are Type 1), the remaining five states create a complex loop, and
thus they are transient.
Ay et al.
Cell Size
Figure 2: Regulatory networks of the cell cycles of two yeast types. Red arrows with pointed heads represent activation,
black arrows with bar heads represent inhibition and yellow arrows indicate selfdegradation. (a) S. cerevisiae (budding
yeast). (b) S. pombe (fission yeast).
Type 1
Type 0
!e d
steady transient
Figure 3: Summary of the traversal process for a randomly picked state (a) from unobserved Type 1 states. If the path
starting from a ends at b, c, d or e, then all the states on this path are transient (Step 2(i) of Algorithm 1). If the path
starting from a ends at a state like f then all the states on the path from a to f are transient (excluding f) and all the
states on the cycle from f to f are steady.
CHK1
cyclinG
cyclinEl
ATR
CDC2
Number of Iterations
Figure 4: Convergence of the estimators for the steady state profiles of the genes. These genes are a selected subset of
the genes of p53 network of Homo Sapiens [44]. Yaxis shows for each gene the fraction of steady states that the gene is
in active state.
Ste9 R um I
I NCdc2/CdcX13
PIP j, Cdc25
SIPI Weel/Miki
" Cdc2/Cdc13*
