Group Title: Department of Computer and Information Science and Engineering Technical Reports
Title: Scalable steady state analysis of boolean biological regulatory networks
Full Citation
Permanent Link:
 Material Information
Title: Scalable steady state analysis of boolean biological regulatory networks
Alternate Title: Department of Computer and Information Science and Engineering Technical Report
Physical Description: Book
Language: English
Creator: Ay, Ferhat
Xu, Fei
Kahveci, Tamer
Publisher: Department of Computer and Information Science and Engineering, University of Florida
Place of Publication: Gainesville, Fla.
Copyright Date: 2009
 Record Information
Bibliographic ID: UF00095740
Volume ID: VID00001
Source Institution: University of Florida
Holding Location: University of Florida
Rights Management: All rights reserved by the source institution and holding location.


This item has the following downloads:

2009478 ( PDF )

Full Text

Ay et al.

Scalable Steady State Analysis of Boolean Biological Regulatory

Ferhat Ay, Fei Xu, Tamer Kahveci
Computer and Information Science and Engineering, University of Florida, Gainesville, FL 32611, USA
E-mail: {fay,feixu,tamer}


Computing the long term behavior of regulatory and signaling networks is critical in understanding how
biological functions take place in organisms. Steady states of these networks determine the activity levels
of individual entities in the long run. Identifying all the steady states of these networks is difficult as
it suffers from the state space explosion problem. In this paper, we propose a method for identifying
all the steady states of boolean regulatory and signaling networks accurately and efficiently. We build
a mathematical model that allows pruning a large portion of the state space quickly without causing
any false dismissals. For the remaining state space, which is typically very small compared to the whole
state space, we develop a randomized traversal method that extracts the steady states. We estimate the
number of steady states, and the expected behavior of individual genes and gene pairs in steady states in
an online fashion. Also, we formulate a stopping criterion that terminates the traversal as soon as user
supplied percentage of the results are returned with high confidence. We show that our algorithm can
identify all the steady states accurately. Furthermore, our method is scalable to virtually any large scale
boolean biological regulatory network.
Availability. Source code of this work is available at

Author Summary

Importance of the biological interaction data stems
from the fact that the driving forces behind organ-
isms' functions are described by these interactions.
Analyzing these interactions can reveal significant
information that is impossible to gather by analyz-
ing individual entities. The interactions between
genes and regulatory elements of the cell are rep-
resented as gene regulatory networks and signal-
ing pathways. Here, we analyze the steady state
behavior of these networks. We have extracted
the steady states of the cell cycle networks of two
yeast types. We have observed that our method
successfully identifies the steady state that corre-
sponds to the most stable phase of the cell cycle
(Gi) for both yeast types. Additionally, we have
used the steady state 1pi.1'. -. of the genes to find
co-expressed genes in Hedgehog network of Homo
Sapiens and observed that our findings match well
with the database of co-expressed genes (COX-
PRESdb [1]).


Analyzing biological networks is essential in under-
standing the machinery of living organisms which
has been a main goal for scientists. Gene regulatory
networks and signaling pathways are two important
network types that play role in every process of liv-
ing organisms [2]. In the last decade, significant
amount of research has been done on reconstruction
of these networks from experimental data [3-10].
The amount of regulatory data produced by these
methods is sufficient enough to trigger the research
on automated tools to analyze various aspects of
these networks. We use the term biological regu-
latory networks (BRN) to combine gene regulatory
networks and signal transduction pathways.
To capture the biological meaning of BRNs, it
is necessary to characterize their long term behav-
ior. A common way to achieve this is to identify
the steady states of the dynamic system defined by
a BRN. Identification of steady states of BRNs is
crucial in several applications such as the treatment

Ay et al.

of various human cancers [11, 12] (e.g. leukemia,
glioblastoma) and genetic engineering [13]. Addi-
tionally, the steady state analysis has proven to
be successful to explain the flower morphogenesis
of Arabidopsis thaliana [14-16], the differentiation
process of T-helper cells [17-19], the mechanism of
T cell receptor signaling [20] and the cell cycles of
yeast types [21,22].
We use boolean values for the states of the genes
("ON" or "OFF" meaning high or low activity)
since it is successfully used in the literature for
BRNs [14, 17, 19, 21, 22]. Recently, several meth-
ods have used categorical values (e.g., low, medium,
high activity) for gene states in their model. The
steady states extracted by these methods showed
high parallelism with the ones found using boolean
models [15, 23, 24]. The naive approach to steady
state identification in boolean networks is to ex-
haustively search the state space. However, the
number of possible states of a BRN is exponen-
tial in the number of its genes. Therefore, ex-
haustive methods are computationally infeasible for
even moderately sized BRNs. To address this prob-
lem, some existing methods use finite-state Markov
chains [25], binary decision diagrams (BDD) [17,18],
constraint programming I-'], probabilistic boolean
networks [27],linear programming I"], relational
programming [29] and module networks [30,31].
Orthogonal to the selection of the computational
method, there are two commonly used alternatives
for modeling the state transitions. These are syn-
chronous and asynchronous models and both are
used in the literature [17, 18, 26, 29]. Synchronous
model assumes that the activity levels of all the
genes change simultaneously. Hence, the next state
is deterministically decided by the current state.
On the other hand, asynchronous model considers
time in small intervals, such that only one gene can
change its state at an interval and state change
is equally likely for all genes [18]. For an n gene
BRN, the state space of synchronous model has 2"
states and 2" state transitions. For asynchronous
model, the number of states is still 2" but the num-
ber of possible transitions can go up to n2". The
advantages/disadvantages of these models together
with their effect on running time of steady state
identification algorithms are discussed in the litera-
ture [18,32,33]. Due to its strong assumptions, such
as all genes change their state at the same time and

all have equal response times to these changes, syn-
chronous model is arguably more of an abstraction
of the biological process compared to asynchronous
model. We use the asynchronous model in our dis-
cussion here, however, it is important to note our
method works for the synchronous model as well.
A state of a BRN is the union of the states of
its genes at a certain time. The state of a gene can
change over the time due to internal regulations or
external stimulants. Steady states are the states
in which the dynamic system of that BRN stabi-
lizes. Rest of the states of the network are called
transient states and they are usually not of inter-
est from biological viewpoint. We follow the steady
state definition of Gary et al. [17].

Definition 1 Let S be a set of states. Each si E S
is steady if and only if the following conditions are

The set of the successor states of all the states
in S is equal to S

For each si E S once it is visited the probability
of revisiting si is equal to 1 in finite number of
state transitions.

This definition suggests that there are two types
of possible steady states, self loops (e.g., Fig-
ure l(a)) and simple loops (e.g., Figure l(b)) as
named in [17]. If a set of states create a complex
loop, then all the states of this set are transient since
at least one of the states does not satisfy the second
condition of the above definition. For instance, in
Figure 1(c) the state [010] is not revisited with prob-
ability equal to 1 in finite steps since the system can
loop forever through other four states which create
a loop. Similarly, Garg et al. name such sets of
states as transient states. Figure 1 exemplifies all
the state types discussed above.
Our Contributions: In this paper, we develop
an algorithm that identifies all the steady states of
BRNs accurately and efficiently.
To mathematically express this problems clearly,
we define three types of states according to the num-
ber of possible outgoing transitions from them. We
name a state Type 0 if it has no outgoing transi-
tions to another state except itself (self loop) (state
[110] in Figure l(a)). A state with exactly one out-
going transition to another state is Type 1 (all the
states in Figure l(b)). States with more than one

Ay et al.

outgoing transitions are Type 2 states (state [110]
in Figure l(c)). Using this notation, we observed
the following:

All the states of Type 0 are steady (self loops).

All the states of Type 2 are transient.

All the states of a simple loop are of Type 1.

We name the steady states of Type 1 as cyclic steady
states (i.e, simple loops). Our method first divides
the whole state space into three types (Type 0, 1
and 2) without materializing the exponential state
space graph. Then, we extract the cyclic steady
states from Type 1 states by using a randomized
traversal method. Cyclic steady states together
with the Type 0 states constitute all the steady
states of the BRN of consideration.
We use the boolean network model proposed by
Kauffman et al. [34]. We build a hypothetical state
transition graph using the interactions in a BRN.
We develop a mathematical model that uses binary
decision diagram (BDD) data structure [35] to clas-
sify each state into one of the three classes, namely
Type 0, Type 1 and Type 2. Type 0 and Type
2 states are guaranteed to be steady and transient
(i.e. not steady), respectively. Type 1 states can be
either one. To further classify the Type 1 states as
transient or steady, we develop a randomized traver-
sal method which samples random seed states from
Type 1 states and classifies the visited states dur-
ing the traversal from this seed state. While sam-
pling, we calculate the estimators for the number of
steady states, expected steady state distribution of
individual genes and joint-steady state distributions
of gene pairs. We calculate a stopping criterion
from the statistical information of explored states.
This criterion allows early termination of sampling
when the user defined percentage of steady states
are found with high confidence. In summary, our
technical contributions are:

We build a mathematical model for pruning a
very large portion of state space quickly with-
out losing any steady states.

We develop a randomized traversal method
that computes estimators for the number of
steady states and the fraction of individual
genes and gene pairs being active in these
states in an online fashion. Our algorithm

guarantees to find all the steady states after
sufficient number of iterations.

We formulate a stopping criterion which uses
the information of classified states to termi-
nate the algorithm when sufficient percentage
of steady states are extracted with a given con-
fidence value.

Results and Discussion

Cell Cycles of Budding Yeast and Fis-
sion Yeast

To evaluate the accuracy of the results reported by
our algorithm, we compared the steady states that
we found to the steady states that are reported in
the literature. For this purpose, we use the cell
cycle networks of two yeast types, namely Saccha-
romyces cerevisiae (budding yeast) and Schizosac-
charomyces pombe (fission yeast). We consider the
key regulatory genes of these networks since the
core process of these two cell cycles are well an-
alyzed in the literature by both differential equa-
tion models [36, 37] and boolean network mod-
els [21,22, 38, 39].
The cell cycles of both yeasts go through four
main phases. In the first phase the yeast cell grows
till its size reaches a certain amount (Gi). The
second phase is when the DNA is synthesized and
chromosomes are replicated (S). Third phase is a
transition gap between the second and fourth (G2).
The cell division is completed at the fourth phase
named M. The two new cells then enter the Gi
phase again which completes the cycle. The state
corresponding to Gi phase is a steady state that is
observed the most in the yeast life cycle.
Li et al. [22] studied the boolean network model
of the budding yeast (Figure 2a) and identified the
boolean states visited during a complete cell cycle
together with seven steady states of the network
corresponding to the fixed points of the dynamic
system. Similarly, Davidich et al. [21] found thir-
teen different steady states for the boolean model
of the cell cycle of fission yeast (Figure 2b).
Here we compare the steady states reported by
our method with the ones from the methods of Li et
al. and Davidich et al. For this we use vector no-
tation to represent the activity levels of an ordered

Ay et al.

gene set. In this notation, 0 means the correspond-
ing gene is inactive, 1 means its active and X means
it can be either one. For instance, for a gene set of
{g1, 92, 93}, the [01X] vector represents two states,
namely [010] and [011].
The budding yeast cell cycle network in Figure
2a is the same as the one analyzed by Li et al. [22].
We use the order { (I MBF, SBF, Clnl-2, Cdhl,
Swi5, Cdc20, Clb5-6, Sicl, Clbl-2, Mcml} for the
vector representation of the states of eleven genes
in this network. We follow Li et al. by exclud-
ing Cell size from the gene set and the state rep-
resentation. Li et al. reported seven steady states
for this network one of which corresponds to the
GI phase of the cell cycle. We identified eight dif-
ferent steady states, six of which are Type 0 and
the other two are of Type 1. Six Type 0 steady
states we found are [0100X000100] (4 states) and
[0000X000X00] (2 states) and they all are also re-
ported by Li et al. Also, our method accurately
labeled the [00001000100] state that corresponds
to G1 phase as steady. The two Type 1 steady
states which visit each other in a cycle are SSi
[00100000000] and SS2 [00110000000]. SSi is
when SBF is the only active gene in the network.
SS1 is followed by SS2 since in the next time step
SBF also activates Clnl-2. Due to self degradation
of Clnl-2 in state SS2, this state goes back to the
SSi again. The method of Li et al. labels SS2 as
steady whereas it does not report SS1.
For the states of the fission yeast cell cycle in
Figure 2b, we use the ordered gene set {Start,
SK, Cdc2/Cdcl3, Ste9, Ruml, Slpl, Cdc2/Cdcl3*,
Weel/Mikl, C.I. '. PP}. Our method re-
ports fifteen different steady states which are
[0001XOOXXO] (8 states), [0000100XXO] (4 states),
[00000001X0] (2 states) and [0000000000]. The first
set of states contains the steady state [0001100100]
that corresponds to most stable phase (Gi) of the
cell cycle. This state together with twelve other
steady states we found matches exactly the ones
found by Davidich et al. [21] The two additional
steady states that we found different than Da-
vidich et al. are [0000000100] and [0000000110].
The first state corresponds to high activation
level of only Weel/Mikl genes and the second
state is when Cdc25 is also active together with
Weel/Mikl. The reason of this difference is that
Davidich et al. manually sets a negative threshold

for Cdc2/Cdcl3 activation. Cdc2/Cdcl3 degrades
Weel/Mikl which prevents their system from visit-
ing the two steady states we found without setting
any threshold manually.
These two examples demonstrate that our method
can accurately identify the steady states of BRNs.

Performance Evaluation

Here, we compare the performance of our method
to that of Garg et al. [17,18]. We used the asyn-
chronous state transition model for both algorithms
in this experiment. We compared the running times
for a number of real BRNs as well as for randomly
generated networks. We compiled the real BRNs
from the pathway database PID [40] and other pub-
lished work [17,18, 21, 22]. Table 1 reports the run-
ning times for Garg et al.'s method named Genysis
and our algorithm with different parameter settings.
For real networks of small size such as yeast cell
cycles and T-Helper network, the running times for
both methods are around one second with Geny-
sis running slightly faster than our method. How-
ever, for bigger real networks our method's running
time is significantly smaller than Genysis. As au-
thors also stated in their work, Genysis might need
extensive amount of running time when using asyn-
chronous model due to their heuristics to select seed
states from the state space. The row corresponding
to p38 MAPK signaling pathway constitutes a good
example for this scenario. For the same network our
algorithm can identify the i'. of the steady states
with -l'. confidence in only 11.3 seconds. Addi-
tionally, the running times on four randomly gener-
ated networks indicated that Genysis can not scale
well with the growing network size whereas our al-
gorithm can still find large portion of the steady
states in a few minutes.
We also compared the steady states found by
both algorithms for the two yeast cell cycles. As
discussed in previous section, the steady states of
these two networks are reported in Li et al. [22] and
Davidich et al. [21]. For the budding yeast cell cycle
in Figure 2a, Genysis was able to identify only the
trivial steady state when all the genes are inactive.
For the fission yeast, Genysis labeled the state that
corresponds to the Gi phase (only the genes Ste9,
Ruml and Weel/Mikl are active) of cell cycle as
transient. As reported in Davidich et al. [21], G1 is
the most stable phase of this cycle and our method

Ay et al.

correctly classifies this state as steady.
The above results showed that our algorithm is
more scalable and practical compared to Genysis.
Furthermore, the steady states we reported for yeast
cell cycles match better with the previous findings.

Co-Expressed Gene Pairs in Human
Hedgehog Network

We calculate the fraction of steady states in which
two genes are in active state together. Biologically
this fraction corresponds to the co-expression of the
two genes. Revealing co-expressed genes has great
significance in discovery of conserved genetic mod-
ules [30, 41, 42] and identification of differentially
expressed genes [43].
In here, we compare the co-expression values for
gene pairs found by our algorithm with the values
reported in the gene co-expression database, COX-
PRESdb [1]. For this purpose, we use Hedgehog
signaling network of Homo Sapiens given in KEGG
Pathway Database [44]. This network consists of 17
genes and hence, 136 possible gene pairs. We sorted
the gene pairs according to their co-expression val-
ues in decreasing order and compared our ordering
with the one in COXPRESdb. We picked top 20
gene pairs from our list and searched for the in-
dices of these pairs in the ordering of COXPRESdb.
Here, we report the largest index, 1, among these k
indices for different values of k.
For k = 1 we have 1 = 1, which means that
the highest co-expressed gene pair (GL1-b.10O) in
our ordering is also the top scoring pair in COX-
PRESdb. For k 5 we have 1 6, meaning that
the five gene pairs (GL1-b.1O, GSK3B-FBXW11,
RAB23-GAS1, GLI1-IHH and SUFU-b.!O) with
the highest ranks in our ordering are in between
the top 6 pairs in the ranking of COXPRESdb. For
the other values of k = 10 and k 15, the 1 values
are 16 and 35 respectively. Hence, the gene pairs
reported by our method that are found to be ac-
tive together in the steady states suggest that there
is a co-expression between these two genes. This
shows that our algorithm is useful in predicting co-
expression of genes by utilizing the the steady state
information of BRNs.

Accuracy of Estimators

To evaluate the quality of our sampling-based esti-
mators, we measured their correctness and conver-
gence rate. Correctness means that the estimates
will eventually converge to the correct value. For
the convergence rate, a good estimator should ap-
proximate the correct value after a small fraction of
the state space is explored.
We use a portion of p53 network of Homo Sapiens
taken from KEGG [44] in this experiment. We mea-
sure the estimated number of steady states at which
a gene is active for each gene at each iteration of our
algorithm. Our algorithm traverses the entire space
of Type 1 states in about 2,500 iterations for this
network. Figure 4 shows the results for seven differ-
ent genes. We plot these genes as they have different
steady state 1I..."' In other words, they vary in
the fraction of steady states in which they are active
(e.g. CHK1 is active whereas p21 is suppressed in
most of the steady states). The results show that
our estimators converge to the correct ratio for all
genes in less than 500 iterations. The rapid conver-
gence suggests that our algorithm approximates the
correct profile of gene levels at steady states without
traversing the whole space of Type 1 states. This
suggests that, equipped with the stopping criterion
we devised, our algorithm is also practical and accu-
rate for BRNs with large number of Type 1 states
since early termination of the algorithm does not
lead to significant deviation from the correct steady
state profile.


This section discusses our algorithm for identifying
all the steady states of boolean BRNs. First we de-
scribe the mathematical model for expressing the
states and state transitions. Then, we discuss our
method to segregate the state space into three sub-
spaces. Finally, we present our randomized traver-
sal method that extracts Type 1 steady states. We
also give the formulation of a stopping criterion that
terminates the traversal when sufficient amount of
steady states are reported with high confidence.

State Transition Model

In order to identify the steady states of a BRN,
we first need to build a mathematical model that

Ay et al.

explains its states and how the network moves from
one state to another.
Let Xi(t) true/false denote the state of the
ith gene at time t. Here true denotes that ith gene
is "active" and false denotes that it is "inactive".
We use Xi instead of Xi(t) for simplicity wherever
We summarize the interactions that determine
the next state of the ith gene from the activity
values at time t as follows. The ith gene will be
inhibited if at least one of its suppressors is ac-
tive. If all the suppressors of the ith gene are in-
active and at least one of its activators is active,
then it becomes active in the next time step. In
all other situations the state of the ith gene re-
mains unchanged. Even though the assumption
that one inhibitor can suppress all activators seems
questionable, it is commonly observed in biological
networks. Wu et al. [39] named this as -1 I... i inhi-
bition" model and showed that it produces the same
results as threshold network model [38] for fission
yeast cell cycle network. Also, it has been used as a
modeling decision by Garg et al. [17,18]. However,
it is important to note that our method does not
depend on this assumption.
The following equation summarizes how the next
state of ith gene is determined:

x,(t + 1): (Xi(t) pA(t)) A ps(t) (1)

In this equation, the symbols V and A denote the
logical "AND" and "OR" operators, pA(t) andps(t)
represent predicates for the activators and the sup-
pressors of the ith gene at time t, respectively. We
compute these predicates as pA(t) VjeAXj(t)
and ps(t) = VjsXj(t), where A and S are the
sets of indices for activators and the suppressors of
the ith gene.
An important observation is that, even though
the next state of the ith gene is deterministically
calculated, there can be multiple next states for the
whole network since we use asynchronous model.
A state of a given BRN is defined by the states of
individual genes. Let u = [X1 ... X,] denote a
state of the network. The network can move from
state u to state v = [X1 . Xi1 -X,i Xi+l .
X,] only if the ith gene is one of the genes that
can have a state change. Individual genes that can
issue a state change at a given state determines the
possible next states of the network.

We model the changes in the states of a BRN us-
ing an abstract graph representation. In this graph,
each vertex corresponds to a possible state of the
BRN. Thus, if there are n genes in a BRN, then the
corresponding graph contains 2" vertices. There is
an edge from vertex u to vertex v, if it is possible
to change the state of the BRN from the state rep-
resented by u to the state represented by v by only
changing the state of a single gene. There can be
up to n2" edges between these states. This graph
is hypothetical as we use it only for building our
mathematical model. We never materialize this ex-
ponential graph in our method.
We classify the vertices of this graph into three
classes based on the number of their outgoing edges.
Figure 1 provides visual examples for all three state
types listed below:

Type 0: The vertices that have no outgoing
edges (except self cycles). These vertices cor-
respond to steady states as the state of the net-
work cannot change once one of them is visited.
(Figure l(a))

Type 1: The vertices that have exactly one out-
going edge. The states for these vertices can be
steady or transient. (Figure l(b))

Type 2: The vertices that have two or more
outgoing edges. These are all transient states.
(Figure 1(c))

In the following section, we describe our method
for segregating the state space into the above three

Segregation of States using BDDs

As we discussed in the previous section, we never
generate the state transition graph of the input net-
work. A simple observation on our state transition
model allows us to segregate the states without this
materialization. This segregation results in not only
the immediate identification of all Type 0 steady
states, but also eliminates a huge portion of states
by classifying them as transient.
For instance, for T-Helper cell network with 23
genes and 8, 388, 608 (223) possible states, our seg-
regation method classifies 1,321 states as Type 0
and 8, 364, 757(~ 223) states as Type 2 in only 0.08
seconds. Remaining 22,530 states are labeled as

Ay et al.

Type 1. Thus, we need to explore only a small per-
centage (~ (I 2'.'.) of the whole state space.
Here, we describe how we construct the BDDs
for all Type 0 states and all Type 1 states, namely
Zo and Z1. We first define a predicate that will be
handy in this discussion.

Here, D denotes the logical "XOR" operator. C,
evaluating to true at time t means that gene i will
change its state from X, to -Xi at time t + 1. Oth-
erwise, it preserves its current state. The following
equations, show the formulas of BDDs representing
Type 0 and Type 1 states:

Zo : A\- and Z : V(C, A (A/\- ).
i i jji

Zo = "True" represents the states that do not sat-
isfy any of the C, conditions (i.e. none of the genes
change state). The states in Z1 "True" satisfy ex-
actly one of the C, conditions (i.e. exactly one gene
changes state). The states which are not included in
the two BDDs above are called Type 2 and they are
all transient states. The BDD for these states can
be constructed similarly. However, we simply elim-
inate these states since they do not reflect the long
term behavior of the system. By doing this without
materialization, we quickly reduce the state space
of the problem to a significantly smaller one. In the
next section, we describe how we extract the steady
states of Type 1.

Extracting Cyclic Steady States

In this section, we develop a randomized traversal
strategy that identifies the steady states of Type 1.
We call these states "cyclic -1 .. I- An example for
this is the cycle of four states in Figure l(b). Af-
ter traversing a portion of the vertices, we estimate
the total number of steady states, the probability of
each gene being active and the joint probability of
gene pairs being co-expressed in steady states. It is
worth mentioning that our traversal method never
traverses a state more than once. Hence, if it runs
for enough time it labels all the Type 1 states as
steady or transient. Algorithm 1 briefly describes
how we traverse the Type 1 states. Next, we elab-
orate on different steps of this algorithm.
Step 1. Selecting a random seed state

C : Xi(t + 1) ( Xi(t)


1. Randomly get an unobserved vertex from the
Type 1 set.

2. Follow the out going edge to traverse the graph
until seeing one of the following vertices
(i) A vertex that is labeled as transient or
steady in previous iterations.
(ii) A vertex that is traversed in this iteration.

3. Label all the traversed vertices as transient or
steady and update the estimators.

4. Stop if number of steady states observed so far
is sufficient.

We obtain a random seed state among the untra-
versed satisfying assignments of the BDD for Type
1 states. We do this by traversing the BDD from
root node to the leaf level. At each step of the
traversal, we randomly pick a child node of the cur-
rently visited node. When we reach the leaf level of
the BDD, the states of all the genes are determined
and hence, our seed state for the whole BRN.
Step 2. Traversal starting from the seed state
Once we choose an unobserved seed state, the
next step is to understand whether or not we can
reach to a new steady state from this state. To do
this, we traverse the state transition graph starting
from this vertex by following the edges.
Since the seed state is of Type 1, by definition,
it has only one outgoing edge. Thus, we can eas-
ily find the next state as the state that satisfies the
transition condition. We continue traversal by ap-
plying the same principle. Figure 3 summarizes the
possible cases that can occur during this traversal.
Starting from an unobserved state if we traverse
one of the following three paths then all the states
visited on this path are transient:

A path ending in a Type 0 state

A path ending in a Type 2 state

A path ending in a state that is observed in
previous iterations

Ay et al.

Notice that all three cases correspond to Step 2(i)
of our traversal method. The next case produces
both cyclic steady and transient states:

A path leading to a cycle of states visited in
current iteration

In this case, we label all the states on the cycle as
steady and the other states on the path as tran-
sient. For instance, if the traversal starts from the
[001] state in Figure l(b), then [001] is transient
and other four states are Type 1 steady states.
Step 3. Calculating Estimators
At each iteration, we traverse a path in the state
transition graph and label each state on this path
as transient or steady. We name the set of ver-
tices visited in each such traversal as an observa-
tion. Using these observations, we develop estima-
tors for the total number and the ,./. ," of steady
states. The profile of the steady states is the vector
where the ith entry is the expected fraction of the
steady states at which the ith gene is active. For
example, if the second entry of the profile is 0.95, it
means that we expect that the second gene is active
in *'.'. of the steady states. We also compute the
estimators for the joint expression (co-expression)
fractions of gene pairs. Computing these estimates
is important as they can lead to early prediction of
the steady state profile.
Here, we describe in detail the calculation and
the analysis of the estimator of the total number of
Type 1 steady states. First of all, we prove that
it is an unbiased estimator. Then, we discuss how
to minimize the variance of this estimator. For the
other estimators we only give the formulations.
First, let us introduce some notation we use
throughout this section:

No, Ni: Number of Type 0 and Type 1 states,
respectively. We calculate these numbers at
the initial segregation step.

0O = (s, ti): ith observation. si and ti are
the number of observed steady and transient
states traversed in this observation.

Si, Ti, Ui: Total number of observed steady
states, observed transient states and unob-
served states after first i observations, respec-

From the definitions above, we can calculate Ui
N- S,- T, S, E=I sj and T, E jltj.
Now, we introduce a 0/1 random variable Bi for
each observation Oi. At a given time Bi = 1 means
the current iteration results in observation Oi. We
simulate our sampling by assuming at any time one
and only one of the Bi's can be 1. In other words,
E[BiBj] 0 for any i / j. Notice that E[Bi]
E[B7'] -= it for observation Oi. We formulate
the estimator of the total number of Type 1 steady
states at the ith iteration as:

F o kSk
k 0

Lemma 1 The estimator Fi is an unbiased estima-

Proof: We prove this by showing the expected
value of Fi is equal to the total number of Type
1 steady states. Taking expectations of both sides
and replacing E[Bk] with 8k-tk:

iE[B[ N1

io sN


After defining the estimator, the next step is to
calculate its variance.

Lemma 2 The variance of Fi is

i N1
Var[Fi] E ( )
j=o 0

Proof: We know that, Var[F,]
We first compute F2.

2 s N1
j0 s + to

[5 ]2

E[F?2] E2[F}].

I- N1
k se + tr

- BjBkSjSk ( N ) N

2 2 sN1 2
j+ Sj tj Sk tk

-1=0 *^ + t

Ay et al.

When we take the expected value of Fi2 the first
term cancels since E[BjBk] 0 for any i / j.
Hence, the variance of F, can be computed as:

Var[Fi = E[F,] E2[Fi]

SE[ B (- )2]-(E[Fi])2
,r Si + tj


j[ 8]2


There are many ways to build an estimator from
Fjs. However, it is desirable to build an estimator
with a small variance as it converges to true solution
faster. Following Lemma builds the estimator with
minimum variance.

Lemma 3 The estimator that has the smallest
variance is
1 F
Sc V, F

Proof: Now, we discuss how we combine the esti-
mators F, F2, ..., F, with variances Vi,V2..., V, to
minimize the overall variance of our estimation. In
other words, we want to find the weight parameters
71, 72, ., 'n such that E y 1 and the variance
of the estimator for total number of steady states
of Type 1 is minimized. Let us denote this new
estimator as T = 1 iFi. Then,

Var(T)= YVI

Mathematically, our aim is to minimize Vi
given E 7i 1. We formulate this problem by us-
ing Lagrange Multiplier as follows:

L= qV

A(E y 1)

Taking derivative of both sides with respect to
each yi, we get the equations:

2yV A 0,, A =

Solving these equations we get the 7, values that
minimizes the Var(T) as:

7J V 5

Thus, by using the value of 7ys we find that the
estimator with smallest variance is
T= 1 F
S V,3

Next, we give the formulations of the estimators
for the fractions of each gene and each gene pair
being active in steady states. First, we formulate
our estimator for the fraction of a gene being active
in cyclic steady states. Assume that the number
of steady states at the ith observation in which the
kth gene is active is nki. An estimator for the kth
gene after the ith iteration is then :

Gki = /Si

Let namb,i denote the number of steady states in
which gene a and gene b are both active or both
inactive after the ith observation. We calculate the
estimator of joint probability of two genes having
the same activity level at a steady state as:

Ja-b,i /Si

Step 4. Stopping Criteria When our method
finishes traversing all Type 1 states (steps 1 to 3),
it finds all the steady states. However, in some ap-
plications it might be sufficient to find a predeter-
mined percentage of steady states. We develop a
statistical criterion to be able to terminate the algo-
rithm quickly after a sufficient portion of the Type
1 states are explored. Our method still guarantees
that the desired percentage of the results are found
with high confidence. More precisely, when the user
supplies a parameter a (e.g. 0.9), we compute a
confidence c E [0, 1], at each iteration such that "at
least a x 100 percent of the steady states are found
with probability at least c". This is desirable as the
user can terminate the loop when c is large enough
for the underlying application.
Now, let us describe how the stopping criterion
works. Let A* denote the actual number of total
Type 1 steady states. If we have known the value
of A* we could have stopped sampling with a con-
fidence value of c = 1 when A* < S, + (1 )(No+Si)

= 0

Ay et al.

is satisfied. That is the time when we are sure that
a x 100 percent of the steady states are already re-
ported. Since we do not know A* in advance, we use
the information gathered from observed portion of
states. We compute Ai which denotes the minimum
number of total steady states of Type 1 that needs
to be present for our method to continue traversal.

A S+ (1 a)(No + S)/a (6)

Trivially, if Ai > Ui + Si we just stop sampling with
c = 1 since even if all the unobserved states were
to be steady, the reported ones would constitute at
least a x 100 percent of the Type 1 steady states.
Otherwise, we calculate the confidence value in ith
iteration as the probability that we would have ob-
served at least S, steady states in our observations
so far if there were Ai unobserved steady states.
Formally, we compute the confidence as:

C(Ai) = [+ + T) (1 qi)S +T -] (7)

qi in Equation 7 represents the percentage of steady
states if there were Ai steady states in Type 1 states
(i.e. qi = N-). The inner term of the summa-
tion represents "The probability of getting exactly k
steady states from Si + Ti currently observed states
if the probability of a state being steady is qi".
Lemma 4 shows that, the confidence value re-
ported when we stop sampling is never an over es-

Lemma 4 The confidence value given in Equa-
tion 7 by using Ai does not lead to false dismissal.

Proof: Here, we have three cases to consider:

Casel : (A* > Ai)
Then, q, = > N qi. Since the confi-
dence value is calculated as the area under the
right hand side of a binomial probability dis-
tribution function (i.e. inverse CDF), c value
will be larger for a larger value of q. Hence,
C(A*) > C(Ai). That means whenever we stop
sampling the confidence we report is conserva-

Case : (A* Ai)
Trivially, C(A*) C(Ai) when we terminate
the sampling.

Case : (A* < Ai)
This case implies that we overestimated the to-
tal number of Type 1 steady states at ith itera-
tion. Only thing that can happen in such a case
is that our method decides to continue travers-
ing when it does not need to. Since the actual
number of steady states are less than what we
have estimated, when the traversal stops we
have already sampled at least as many steady
states as needed to guarantee the reported con-
fidence value.

Corollary 1 follows from Lemma 4.

Corollary 1 Our method guarantees to find all the
steady states when the confidence value reaches 1. U



1. Obayashi T, Hayashi S, Shibaoka M, Saeki M,
Ohta H, et al. (2008) Coxpresdb: a database of
coexpressed gene networks in mammals. Nucleic
Acids Research (NAR) 36.
2. Karlebach G, Shamir R (2008) Modelling and
analysis of gene regulatory networks. Nat Rev
Mol Cell Biol 9: 770-80.
3. Basso K, Margolin A, -..i..- .1 1. G, Klein U,
Dalla-Favera R, et al. (2005) Reverse engineering
of regulatory networks in human b cells. Nature
Genetics 4: 382-90.
4. Margolin A, Nemenman I, Basso K, WI ...... C,
'-ii..- .I i1. G, et al. (2006) Aracne: an algo-
rithm for the reconstruction of gene regulatory
networks in a mammalian cellular context. BMC
Bioinformatics 7 Suppl 1.
5. Tatsuya A, Satoru M, Satoru K (1999) Identifica-
tion of genetic networks from a small number of
gene expression patterns under the boolean net-
work model. In: Pacific Symposium on Biocom-
puting. volume 4, pp. 17-28.
6. Osamu H, Naoki N, Yoshinori T, Hideo B, Seiya
I, et al. (2005) Estimating gene networks from
expression data and binding location data via
boolean networks. Computational Science and
Its Applications ICCSA 2005 3482: 349-56.
7. Tatsuya A, Satoru K, Osamu M, Satoru M (2003)
Identification of genetic networks by strategic

gene disruptions and gene overexpressions under
a boolean model. Theoretical Computer Science
298: 235-251.

8. Satoru M (2003) Inference, modeling and simu-
lation of gene networks. In: Proceedings of the
First International Workshop on Computational
Methods in Systems Biology. pp. 207-11.

9. Osamu H, Ryo Y, Seiya I, Rui Y, Tomoyuki H,
et al. (2008) Statistical inference of transcrip-
tional module-based gene networks from time
course gene expression profiles by using state
space models. Bioinformatics 24: 932-42.

10. Wong S, Zhang L, Tong A, Li Z, Goldberg D,
et al. (2004) Combining biological networks to
predict genetic interactions. Proc Natl Acad Sci
USA 101: 15682-7.

11. Hupp T, Lane D, Ball K (2000) Strategies for
manipulating the p53 pathway in the treatment
of human cancer. Biochemical Journal 352: 1

12. Lane D (1999) Exploiting the p53 pathway for
cancer diagnosis. and therapy. British Journal of
Cancer (BJC) 80: 1-5.

13. Ostergaard S, Olsson L, Johnston M, Nielsen J
(2000) Increasing galactose consumption by sac-
charomyces cerevisiae through metabolic engi-
neering of the gal gene regulatory network. Na-
ture Biotechnology 18: 1283-6.

14. Mendoza L, Thieffry D, Alvarez-Buylla E (1999)
Genetic control of flower morphogenesis in ara-
bidopsis thaliana: a logical analysis. Bioinfor-
matics 15: 593-606.

15. Demongeot J, Morvan M, Sene S (2008) Impact
of fixed boundary conditions on the basins of
attraction in the flower's morphogenesis of ara-
bidopsis thaliana. In: Proceedings of the 22nd
International Conference on Advanced Informa-
tion Networking and Applications (AINAW). pp.

16. Ivarez Buylla E, Chaos, Aldana M, Bentez M,
Cortes-Poza Y, et al. (2008) Floral morphogen-
esis: Stochastic explorations of a gene network
epigenetic landscape. PLoS ONE 3: e3626.

17. Garg A, Xenarios I, Mendoza L, De Micheli G
(2007) An efficient method for dynamic analy-
sis of gene regulatory networks. and in silico gene
perturbation experiments. In: International Con-
ference on Research in Computational Molecular
Biology (RECOMB). pp. 62-76.

18. Garg A, Di Cara A, Xenarios I, Mendoza L,
De Micheli G (2008) Synchronous versus asyn-
chronous modeling of gene regulatory networks.
Bioinformatics 24: 1917-25.

19. Mendoza L (2006) A network model for the con-
trol of the differentiation process in th cells.
Biosystems 84: 101-14.

20. Saez-Rodriguez J, Simeoni L, Lindquist J,
Hemenway R, Bommhardt U, et al. (2007) A log-
ical model provides insights into t cell receptor
signaling. PLoS Computational Biology 3: 1580

21. Davidich M, Bornholdt S (2008) Boolean network
model predicts cell cycle sequence of fission yeast.
PLoS ONE 3: e1672.

22. Fangting L, Tao L, Ying L, Qi O, Chao T (2004)
The yeast cell-cycle network is robustly designed.
Proceedings of the National Academy of Sciences
(PNAS) 101: 4781-6.

23. Garg A, Mendoza L, Xenarios I, DeMicheli G
(2007) Modeling of multiple valued gene regula-
tory networks. In: Proceedings IEEE Engineer-
ing in Medicine and Biology Society. pp. 1398

24. Mendoza L, Xenarios I (2006) A method for the
generation of standardized qualitative dynamical
systems of regulatory networks. Theoretical biol-
ogy and medical modelling 13.

25. Hachtel G, Macii E, Pardo A, Somenzi F (1996)
Markovian analysis of large finite state machines.
IEEE Transactions on Computer-Aided Design
15: 1479-93.

26. Devloo V, Hansen P, Labb M (2003) Identifi-
cation of all steady states in large networks by
logical analysis. Bull Mathematical Biology 65:

27. Shmulevich I, Dougherty E, Kim S, Zhang W
(2002) Probabilistic boolean networks: a rule-
based uncertainty model for gene regulatory net-
works. Bioinformatics 18: 261-74.

28. Shlomi T, Berkman O, Ruppin E (2005) Reg-
ulatory on/off minimization of metabolic flux
changes after genetic perturbations. Proceedings
of the National Academy of Sciences (PNAS) 102:

29. Schaub M, Henzinger T, Fisher J (2007) Quali-
tative networks: a symbolic approach to analyze
biological signaling networks. BMC Systems Bi-
ology 1.

Ay et al.

30. Segal E, Pe'er D, Regev A, Koller D, Friedman
N (2005) Learning module networks. Journal of
Machine Learning Research 6: 557-88.
31. Segal E, Michael S, Regev A, Pe'er D, David B,
et al. (2003) Module networks: Discovering regu-
latory modules and their condition specific regu-
lators from gene expression data. Nature Genet-
ics 34: 166-76.
32. Albert I, Thakar J, Li S, Zhang R, Albert R
(2008) Boolean network simulations for life sci-
entists. Source code for biology and medicine 3:
33. Albert R, Othmer H (2003) The topology of
the regulatory interactions predicts the expres-
sion pattern of the segment polarity genes in
drosophila melanogaster. Journal of Theoretical
Biology 223: 1-18.
34. Kauffman S (1969) Homeostasis and differentia-
tion in random genetic control networks. Nature
224: 177-8.
35. Lind-Nielsen J (1999). Buddy a bi-
nary decision diagram package. tech. rep,
36. Tyson J, Csikasz-Nagy A, Novak B (2002) The
dynamics of cell cycle regulation. Bioessays 24:
37. Tyson J, Chen K, Novak B (2001) Network dy-
namics and cell physiology. Nat Rev Mol Cell
Biol 2: 908-16.
38. Bornholdt S (2008) Boolean network models of
cellular regulation: prospects and limitations.
Journal of The Royal Society Interface 5: 85-94.
39. Wu Y, Zhang X, Yu J, Ouyang Q (2009) Iden-
tification of a topological characteristic respon-
sible for the biological robustness of regula-
tory networks. PLoS Computational Biology 5:
40. Schaefer C, Anthony K, Krupa S, Buchoff J, Day
M, et al. (2009) Pid: the pathway interaction
database. Nucleic Acids Research (NAR) 37:
41. Stuart J, Segal E, Koller D, Kim S (2003) A gene-
coexpression network for global discovery of con-
served genetic modules. Science 302: 240-1.
42. Zotenko E, Guimaraes K, Jothi R, Przytycka
T (2005) Decomposition of overlapping protein
complexes: A graph theoretical method for an-
alyzing static and dynamic protein associations.
RECOMB Satellite Meeting on Systems Biology
LNBI: 23-38.

43. Oldham M, Horvath S, Geschwind D (2006) Con-
servation. and evolution of gene coexpression net-
works in human. and chimpanzee brains. Pro-
ceedings of the National Academy of Sciences
(PNAS) 103: 17973-8.
44. Ogata H, Goto S, Sato K, I 1ii..I.. I1. W, Bono H,
et al. (1999) Kegg: Kyoto encyclopedia of genes.
and genomes. Nucleic Acids Research (NAR) 27:


Figure Legends

Ay et al.

Ay et al.

Table 1: The comparison of our algorithm with an existing method, Genysis [17,18], on real and random
networks. 1 Our algorithm when 90% of the steady states are found with 90% confidence. 2 Our algorithm when 80%
of the steady states are found with 80% confidence. We used a cut-off time of 24-hours and "-" indicates that the method
could not find all steady states within this time. s denotes seconds and m denotes minutes in running time columns.

Network Name Network size Running Time
Genes Interactions Genysis Our Alg.1 Our Alg.2
Fission yeast cell cycle [21] 10 27 0.21s 0.18s 0.17s
Budding yeast cell cycle [22] 12 35 0.13s 0.25s 0.22s
T-Helper cells [17] 23 35 0.23s 1.14s 0.43s
p38 MAPK signaling [40] 26 28 545.3m 11.3s 2.1s
T-cell receptor [18] 40 58 20.7m 14.2s 2.11s
randomNet 1 20 32 10.4m 2.282s 0.3s
randomNet 2 30 48 -5.923s 3.13s
randomNet 3 40 64 -4.7m 3.4m
randomNet 4 50 80 -68.7m 15.3m

(001 011 111 110l

.^-^ ^. ^^ ^010)
011 111 110

101 -c--- 10)

Figure 1: Each circle represents a state of a hypothetical network with 3 genes. The binary values correspond to activation
levels of these genes. (a) The three states on the left are transient and of Type 1. The state with self loop is steady and
Type 0. (b) The four states in simple loop are cyclic steady states and they are of Type 1. (c) The leftmost state is transient
and Type 1. Even though only [110] is of Type 2 (others are Type 1), the remaining five states create a complex loop, and
thus they are transient.

Ay et al.

Cell Size

Figure 2: Regulatory networks of the cell cycles of two yeast types. Red arrows with pointed heads represent activation,
black arrows with bar heads represent inhibition and yellow arrows indicate self-degradation. (a) S. cerevisiae (budding
yeast). (b) S. pombe (fission yeast).

Type 1

Type 0

!e d

steady transient

Figure 3: Summary of the traversal process for a randomly picked state (a) from unobserved Type 1 states. If the path
starting from a ends at b, c, d or e, then all the states on this path are transient (Step 2(i) of Algorithm 1). If the path
starting from a ends at a state like f then all the states on the path from a to f are transient (excluding f) and all the
states on the cycle from f to f are steady.



Number of Iterations

Figure 4: Convergence of the estimators for the steady state profiles of the genes. These genes are a selected subset of
the genes of p53 network of Homo Sapiens [44]. Y-axis shows for each gene the fraction of steady states that the gene is
in active state.

Ste9 R- um I

I NCdc2/CdcX13
PIP j-, Cdc25

SIPI Weel/Miki

" Cdc2/Cdc13*

University of Florida Home Page
© 2004 - 2010 University of Florida George A. Smathers Libraries.
All rights reserved.

Acceptable Use, Copyright, and Disclaimer Statement
Last updated October 10, 2010 - - mvs