RNA-seq and microarray complement each other in transcriptome profiling

MISSING IMAGE

Material Information

Title:
RNA-seq and microarray complement each other in transcriptome profiling
Physical Description:
Mixed Material
Language:
English
Creator:
Kogenaru, Sunitha
Qing, Yan
Guo, Yinping
Wang, Nian
Publisher:
BioMed Central (BMC Genomics)
Publication Date:

Notes

Abstract:
Background: RNA-seq and microarray are the two popular methods employed for genome-wide transcriptome profiling. Current comparison studies have shown that transcriptome quantified by these two methods correlated well. However, none of them have addressed if they complement each other, considering the strengths and the limitations inherent with them. The pivotal requirement to address this question is the knowledge of a well known data set. In this regard, HrpX regulome from pathogenic bacteria serves as an ideal choice as the target genes of HrpX transcription factor are well studied due to their central role in pathogenicity.
General Note:
Publication of this article was funded in part by the University of Florida Open-Access publishing Fund. In addition, requestors receiving funding through the UFOAP project are expected to submit a post-review, final draft of the article to UF's institutional repository at the Univeristy of Florida community, with research, news, outreach, and educational materials.
General Note:
Kogenaru et al. BMC Genomics 2012, 13:629 http://www.biomedcentral.com/1471-2164/13/629; Pages 1-16
General Note:
doi:10.1186/1471-2164-13-629 Cite this article as: Kogenaru et al.: RNA-seq and microarray complement each other in transcriptome profiling. BMC Genomics 2012 13:629

Record Information

Source Institution:
University of Florida
Holding Location:
University of Florida
Rights Management:
All rights reserved by the source institution.
System ID:
AA00013474:00001

Full Text


Kogenaru et al. BMC Genomics 2012, 13:629
http://www.biomedcentral.com/1471-2164/13/629


BM ics
Genomics


RNA-seq and microarray complement each other

in transcriptome profiling


Sunitha Kogenaru, Yan Qing, Yinping Guo and Nian Wang


Background
Transcriptome of an organism represents the entire rep-
ertoire of transcripts encoded by the genes as a pheno-
typic response to the condition in which they exist. The
sheer ability to simultaneously quantify the expression
levels for a vast number of genes has revolutionized the
biomedical research, facilitating the analysis of global
gene expression patterns at the genome-wide scale [1].
In the past decade, there has been a tremendous pro-
gress in the development of methods to deduce and
quantify the gene expression levels at the whole tran-
scriptome level [1]. Among the several transcriptome

* Correspondence nianwang@ufl edu
Citrus Research and Education Center, Department of Microbiology and Cell
Science, Institute of Food and Agricultural Sciences, University of Florida, 700
Experiment Station Road, Lake Alfred 33850, USA


profiling methods, RNA-seq and DNA microarray stand
out as the two widely used genome-wide gene expression
quantification methods [1-17].
RNA-seq method involves the conversion of isolated
transcripts into the complementary DNA (cDNA), which
is then directly sequenced in a massively parallel deep-
sequencing-based approach [18]. By mapping the resulting
short sequencing reads onto the reference genome, the ex-
pression levels of genes relative to the condition of interest
or absolute levels can be quantified [9,11]. This method
has been implemented in different platforms like Illumi-
na's Genome Analyzer, Roche 454 Genome Sequence, and
Applied Biosystems' SOLiD [4]. On the other hand,
microarray is based on the hybridization of specimen tar-
get strands onto the immobilized complementary probe
strands. For example, in a two-color microarray, transcripts


0 2012 Kogenaru et al., licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative
Bill led Central Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and
reproduction in any medium, provided the original work is properly cited.


Abstract
Background: RNA-seq and microarray are the two popular methods employed for genome-wide transcriptome
profiling. Current comparison studies have shown that transcriptome quantified by these two methods correlated
well. However, none of them have addressed if they complement each other, considering the strengths and the
limitations inherent with them. The pivotal requirement to address this question is the knowledge of a well known
data set. In this regard, HrpX regulome from pathogenic bacteria serves as an ideal choice as the target genes of
HrpX transcription factor are well studied due to their central role in pathogenicity.
Results: We compared the performance of RNA-seq and microarray in their ability to detect known HrpX target
genes by profiling the transcriptome from the wild-type and the hrpX mutant strains of y-Proteobacterium
Xanthomonas citri subsp. citri. Our comparative analysis indicated that gene expression levels quantified by RNA-seq
and microarray well-correlated both at absolute as well as relative levels (Spearman correlation-coefficient, rs > 0.76).
Further, the expression levels quantified by RNA-seq and microarray for the significantly differentially expressed
genes (DEGs) also well-correlated with qRT PCR based quantification (rs= 0.58 to 0.94). Finally, in addition to the 55
newly identified DEGs, 72% of the already known HrpX target genes were detected by both RNA-seq and
microarray, while, the remaining 28% could only be detected by either one of the methods.
Conclusions: This study has significantly advanced our understanding of the regulome of the critical transcriptional
factor HrpX. RNA-seq and microarray together provide a more comprehensive picture of HrpX regulome by
uniquely identifying new DEGs. Our study demonstrated that RNA-seq and microarray complement each other in
transcriptome profiling.
Keywords: RNA-seq, Microarray, Transcriptome profiling, Pathogenic bacteria, Virulence, Type 3 secretion system,
Effectors, HrpX, Xanthomonas, Citrus canker disease






Kogenaru et aL BMC Genomics 2012, 13:629
http://www.biomedcentral.com/1471-2164/13/629


extracted from different conditions are labeled with
distinct fluorescent dyes while being converted to
cDNA. These labeled samples are then hybridized to
the immobilized complementary probe strands in an
array representing the genes. By measuring the light in-
tensity of the distinct fluorescent dyes, the relative
abundance of each transcript in the two different con-
ditions can be measured [8,12,13,17,19,20]. Affymetrix
and Agilent are the two prevalent platforms in micro-
array technology [2,14].
Even though, initially microarray has been instrumental
in whole transcriptome analysis, currently RNA-seq is
becoming a preferred method of choice, since it is con-
sidered to effectively surmount the limitations of micro-
array [1,21-23]. RNA-seq technology, unlike microarray,
does not depend on the prerequisite knowledge of the
reference transcriptome [24]. Further, RNA-seq data
contains very low background signal, a higher dynamic
range of expression levels, and also relatively small
amount of total RNA required for quantification, when
compared to microarray [1,23]. Despite these advan-
tages, the efficiency of RNA-seq is marred with the
problem of overwhelming amount of ribosomal RNA
(rRNA) in the data, short reads, less base accuracy, and
variation of read density along the length of the tran-
script, posing a challenge for this high-throughput
method [21,25,26]. However, in spite of their strengths
and limitations, RNA-seq and microarray have become
the default popular methods of choices for genome-
wide transcriptome studies [1,2,23].
Currently several studies have been conducted to
compare the performance of RNA-seq and microarray
in quantifying the expression level of genes, by focusing
on various aspects like reproducibility, accuracy, statis-
tical issues, technical and biological variabilities
[1,15,21,27-30]. The main conclusion from these studies
has been that the expression levels quantified by these
two methods correlated to a large extent, and overall
favored the RNA-seq because of high reproducibility, ac-
curacy, and dynamic range [27,29]. However, none of
these comparison studies have addressed if these two
methods complement each other in transcriptome profil-
ing given the strengths and limitations associated with
them. In order to address this question, we require an
already well characterized dataset. The HrpX regulome
from Xanthomonas citri subsp. citri (Xcc) serves as an
ideal data model in this regard [31-33]. Xcc is a causal
agent of citrus canker, one of the serious and destructive
diseases in citrus that is resulting in significant losses to
citrus industry worldwide [34], while HrpX is a key glo-
bal transcription factor that regulates the expression of
hrp (hypersensitive response and pathogenicity) cluster
of genes, which are considered as the major pathogen-
icity factors [31,35]. HrpX contains AraC-type of DNA


binding domain, which specifically recognizes the plant-
inducible promoter (PIP) box (TTCGC-N15-TTCGC)
and imperfect PIP box (TTCGC-N8-TTCGT) present in
the cis-regulatory regions of hrp gene cluster [36-38].
Since HrpX has a key role in pathogenicity, tremendous
progress has been made in cataloguing the target genes
of HrpX [39-45]. We therefore assessed the performance
of RNA-seq and microarray in their ability to detect
known HrpX target genes. We chose Illumina and Agi-
lent as the corresponding platforms for RNA-seq and
microarray, as they are the most popular platforms for
these technologies [2,4].


Results
In order to uncover the regulome of HrpX transcription
regulator by profiling the wild-type and the hrpX mutant
strains transcriptome, we had designed a microarray
chip covering the whole genome under Agilent platform
in our previous study [33]. Here, we conducted genome-
wide transcriptome profiling of these two strains by
RNA-seq and compared the results to the previously
published microarray data, to assess the performance of
these two methods. Further, to avoid technical variation
associated with RNA isolation, we used the aliquots
from the same total RNA samples used for microarray
experiments also for RNA-seq.
We obtained 16,431,283, 17,289,220, 18,124,120 sequence
reads for the wild-type and 15,084,955, 17,831,920, and
18,115,115 for the hrpX mutant strain with a median
sequence length of 74-base pairs (bp) (Additional file 1:
Table Sl). Raw reads often have high sequencing errors,
especially in the 3' end where there is a high chance of
sequencing errors to occur [46]. We therefore filtered the
reads for high quality ones by trimming off the base pairs
with low quality score assigned to them during down-
line processing of RNA-seq. More than 90% of the reads
passed the quality filter, as a result, the median sequence
length of quality filtered reads subsequently dropped to
68-bp (Additional file 1: Table Sl). We then mapped
these high quality trimmed reads on to the Xcc genome.
Approximately more than 90% of the reads could be
mapped on to the reference genome, indicating good
sequence coverage (Additional file 1: Table Sl). Overall
-97% of the annotated genes had more than one read
mapped, while merely -3% of the annotated genes had
no reads mapped, indicating good sequencing depth.
Further, we also observed a difference in the sequence
coverage between the chromosome and the two en-
dogenous plasmids of Xcc. Annotated coding genes
from the chromosome with a size of 5.18 mega base
pairs (Mb) had 98% sequence coverage, whereas, it
was 78% for plasmid pXAC64 with a size of 0.06 Mb,
and relatively lower with only 62% sequence


Page 2 of 16







Kogenaru et al. BMC Genomics 2012, 13:629
http://www.biomedcentral.com/1471-2164/13/629


coverage for plasmid pXAC33 with a size of 0.03 Mb
(Additional file 1: Table S2).

Comparison at absolute levels of expression
RNA-seq had coverage for 4323 genes with one or
more reads mapped, while by microarray 4349 genes
were assigned the fluorescence intensity values after
the background correction. Among these 4312 genes
(-99% of the total genes) were common to both meth-
ods, while merely 37 (0.8%) and 11 genes (0.2%) were
uniquely called by microarray and RNA-seq respectively
(Additional file 1: Tables S2 and S3; Additional file 2: Fig-
ure FS1). We compared the absolute levels of gene ex-
pression in terms of RNA-seq counts and microarray
fluorescence intensities for all the listed genes called by
both the methods. These two independent measures of
transcript abundance associated with each gene for all
the biological replicates from the wild-type and the hrpX
mutant strains were compared separately. The resulting
correlation was mapped as a scatter plot, with an average
number of counts from Illumina sequencing against the
normalized fluorescence intensities from Agilent arrays
for each gene in the wild-type (Figure 1A) as well as in
the hrpX mutant (Figure IB). Absolute levels of gene ex-
pression correlated well, when estimated in terms of
Spearman's correlation coefficient (rs) with 0.78 (p-value
< 0.0001) for the wild-type and 0.80 (p-value < 0.0001)
for the hrpX mutant strain. This is in agreement with the


A Uo0
X10 4 rs=0.78
1 1002.


x10O ... .


1x10-o
1x1002 1x1003 1x10o4 1x10o5
Microarray (intensity)


previous reports that expression levels measured by
microarray and RNA-seq had correlations ranging be-
tween 0.62 and 0.8 for prokaryotic and eukaryotic data-
sets [18,28,29]. However, there seems to be little or no
correlation for the genes with low level of expression.
We further estimated the correlation for the subset of
genes with fluorescence intensity values <100 assigned by
microarray (-360 genes) with the corresponding expres-
sion levels determined by RNA-seq. This subset of genes
revealed a very poor rs of 0.2 (p-value <0.0002) and 0.3
(p-value <0.0001) for the wild-type and the hrpX mutant
strains respectively. Although the expression levels of
these genes did not change much according to micro-
array, RNA-seq reported them to have different expres-
sion levels. This may be attributed to the high sensitivity
of RNA-seq method.
We further estimated the correlation between all the
combinations of biological replicates for the wild-type
and the hrpX mutant strains independently. The result-
ing rs values of these comparisons are represented in the
form of heat maps, for the wild-type (Figure IC) and the
hrpX mutant strains (Figure ID), which provide a global
view of these correlations. Overall, on an average the
wild-type with rs = 0.76 (p-value < 0.0001) and the hrpX
mutant with rs = 0.78 (p-value < 0.0001) were observed
for the biological replicates from all the correlation com-
binations. This level of comparison strongly suggested
that not only the absolute level of gene expressions


B x105o
1X1O4 rs=0.80





1x10o o ... 0
1x10-o --


1x1002 1x1013 1x1004
Mcroarray (intensity)


1x10s5


WT-1


076 078 08
Microarray


Microarray


Figure 1 Comparison of absolute levels of gene expression by RNA-seq and microarray. Upper panel shows for the (A) wild-type and (B)
hrpX mutant, the correlation between normalized fluorescence intensities of Agilent microarray with the RNA-seq counts from Illumina. Each dot
represents the average values for each gene from all the biological replicates. Spearman's correlation coefficient (r) is indicated for each
comparison. Lower panel shows rs between normalized fluorescence intensities of Agilent microarray with the RNA-seq counts from Illumina for
all the combination of biological replicates for the (C) wild-type, and (D) the hrpX mutant. The r, values are plotted in the form of a heat map,
where green color represents low r, value, while red represents highest r, value. The dendrogram provides a hierarchical clustering.


Page 3 of 16






Kogenaru et al. BMC Genomics 2012, 13:629
http://www.biomedcentral.com/1471-2164/13/629


determined by RNA-seq and microarray highly corre-
lated, but were also highly reproducible, in spite of the
technical as well as the biological variability associated
with the quantifications.

Comparison at relative levels of expression
We also compared the performance of these two meth-
ods at relative level of gene expression. For this purpose,
we first computed the relative expression level of genes
in terms of fold-change (FC) for the hrpX mutant in re-
lation to the wild-type strain, along with p-values to de-
note the statistical significance and false discovery rate
(FDR), for having a good control over the false positives
rate. We compared the relative expression levels for
4312 consensus genes both qualitatively and quantita-
tively, after transforming the FC values to logarithm base
2 (log2) scale without any statistical cut-off thresholds
(Additional file 1: Table S2). For the 2587 (-60% of the
consensus) genes, the expression levels agreed qualita-
tively, while 1725 (-40%) genes disagreed between the
two methods (Figure 2A). At this point, our comparison
was exclusively focused on whether the gene of interest


is up- or down-regulated based on the sign of the log2
transformed FC values, but not necessarily on the FC
magnitude. We further illustrated the quantitative rela-
tionship of log2FC between RNA-seq and microarray
in the form of a scatter plot as shown in Figure 2B.
Genes with no change in expression levels in the wild-
type and the hrpX mutant strains (FC = 1) clustered
around log2FC of zero (log2 of one is zero) in the scat-
ter plot (Figure 2B). The rs between the log2FCs deter-
mined by RNA-seq and microarray was found to be
0.30 (p-value < 0.0001) (Figure 2B). This lower correl-
ation value indicated that the magnitude of FCs be-
tween the two methods differed largely that might be
due to the background noise resulting from the many
imperfections, which are inherent to the high-
throughput technologies [47,48].
The correlation coefficient provides an overall estimate
of correlation between the expression levels determined
by RNA-seq and microarray methods. However, this
does not zoom into the data in a detailed manner. For
instance, no information is provided about how much of
FC magnitude that actually differs between the two


Page 4 of 16


A B C

r,=0.30





4 = .76 0.1
-4






2-




-2-
"0.01 -






w ~~-4_1K







RNA-seq Microarray -4 -2 0 2 4 0 1 2 3 4 5
Microarray Fold difference
Figure 2 Qualitative and quantitative comparison of relative levels of gene expression in the hrpX mutant with respect to the
wild-type strain, determined by RNA-seq and microarray methods. (A) Venn diagram showing the qualitative agreement in the loga fold-
change values for expression of 4312 genes by RNA-seq and microarray. (B) Scatter plot showing the relative expression levels of genes in terms
of logFCs, determined by RNA-seq and microarray. Correlation between the two methods is shown by Spearman's correlation coefficient (rs). (C)
Frequency histogram showing the percentage of genes with the fold difference between RNA-seq and microarray, with a bin width of 0.5.
The lower panel D, E, and F are same as A, B, and C respectively, but for only those genes that have passed the statistical cut-of threshold
(FDR 5% and absolute lo fold-chan 0.6).
g2-














(FDR s 5% and absolute Ioc, fold-chanqe : 0.6).






Kogenaru et al. BMC Genomics 2012, 13:629
http://www.biomedcentral.com/1471-2164/13/629


methods for a given gene. In order to get an insight into
this aspect, we computed the fraction of genes deviating
in their FC magnitude values by dividing the FC magni-
tude value determined by RNA-seq with that of micro-
array (Figure 2C). Here, the fold difference of one
represents the fraction of genes that are determined to
have a FC magnitude of + 0.5 (bin width) by both RNA-
seq and microarray methods. When we plotted this fre-
quency as a histogram for the whole 4312 consensus
genes, more than 75% of genes were found to have FC
magnitude values + 0.5 by RNA-seq and microarray
methods. Since it is a relative expression comparison,
genes whose expression values did not change much in
the wild-type and the hrpX mutant strains, tend to have
FC values = 1. Subsequently, it is more sensible to con-
sider only differentially expressed set of genes for further
comparisons.
We therefore applied FDR < 0.05 (5%) in conjunction
with FC (absolute log2FC > 0.6) to filter the whole data
set. In total, 87 (2%) genes from RNA-seq and 64 (1.5%)
from microarray qualified at this cut-off threshold from
the 4312 consensus genes (Additional file 1: Table S4).
Together, 106 genes satisfied our selection criterion from
both the methods (Additional file 1: Table S4). Among
them 84 (79.2%) genes were up-regulated, while 22
(20.8%) genes were found to be down-regulated. Further,
45 (-42.45%) genes were common between both the
methods, whereas, 42 (39.63%) and 19 (-17.92%) genes
were uniquely detected by RNA-seq and microarray re-
spectively (Additional file 1: Table S4; Additional file 2:
Figure FS2). We further compared the FC values of the
45 consensus genes both qualitatively and quantitatively.
These genes qualitatively agreed 100% by having the
same trend of log2 transformed FC values by both RNA-
seq and microarray (Figure 2D). Likewise the quantita-
tive comparison was performed by estimating the correl-
ation between the magnitude of log2FC determined by
RNA-seq and microarray for the 45 consensus genes as
shown in Figure 2E. The magnitude of FC values be-
tween the two methods were found to be well correlated
(rs = 0.76, p-value < 0.0001), indicating that the same
trend of variation was observed in FC values between
the two methods without any dispersion. Thereby, the
magnitude of FC values determined by RNA-seq and
microarray agreed to a large extent for the 45 consensus
genes. In order to further pinpoint the deviation in the
FC magnitude quantified by the two methods, we plot-
ted the differences in the FC values determined by RNA-
seq with respect to microarray, and the percentage of
genes with that difference for the 45 consensus genes
(Figure 2F). Majority of the genes (~98%) were found to
have a magnitude of FC within the range of < 1.5, while
for the remaining 2% of the genes, it was 4.7-times
higher in RNA-seq than the microarray based


quantification. Based on these comparisons, we con-
cluded that the relative gene expression levels quantified
by RNA-seq and microarray were consistent to a large
extent for the statistically differentially expressed set of
consensus genes.

Comparison with qRT-PCR
Traditionally, quantitative Reverse Transcription PCR
(qRT-PCR) is used to validate the gene expression levels
quantified by high-throughput technologies like RNA-
seq and microarray [49]. Therefore, we compared the
relative expression levels quantified by RNA-seq and
microarray by qRT-PCR for a subset of 43 (40.6%) genes
(Additional file 1: Table S5) that were randomly selected
from the 106 significantly differentially expressed genes.
Among them, 19 genes were found to be common be-
tween both the methods, 12 genes were unique to RNA-
seq, while remaining 12 genes were found to be unique
to microarray (Additional file 1: Table S4). The expres-
sion levels were found to be highly reliable for genes that
are determined to be significantly differentially expressed
by RNA-seq (rs = 0.94; p-value < 0.0001) as well as
microarray (rs = 0.97; p-value < 0.0001). For the consen-
sus genes, microarray had a slightly higher correlation
with qRT-PCR than RNA-seq (Figures 3A and 3B).
We further plotted the percentage of genes that
deviated in the magnitude of FC quantified by RNA-seq
and microarray with respect to qRT-PCR (Figures 3C
and 3D). For most of the genes, the magnitude of FC
quantified by RNA-seq and microarray were relatively
higher, when compared to qRT-PCR (fold difference >1).
Overall, the magnitude of FC quantified by RNA-seq
was in consistence with qRT-PCR based quantification
(Figure 3C). For microarray, the magnitude of FC was
observed to be consistent with qRT-PCR for a majority
of genes, however, we also noticed outlier genes with a
9-times higher FC magnitude (Figure 3D).
For the subset of 12 genes that were found to be
uniquely determined by RNA-seq, the magnitude of
FC quantified by RNA-seq correlated moderately with
qRT-PCR (rs = 0.58; p-value 0.05) (Figure 4A; Additional
file 1: Table S5). The 12 genes found to be uniquely
detected by microarray had a correlation of rs = 0.92
(p-value 0.002) with qRT-PCR (Figure 4B; Additional file 1:
Table S5). These correlations are slightly lower when com-
pared to the consensus genes (rs > 0.94). This indicated
that the expression levels are more reliable for the genes
that are determined to be significantly differentially
expressed by both RNA-seq and microarray rather than by
any one method. Moreover, it also indicated that there is a
lot of variation in the magnitude of FC quantified by RNA-
seq and qRT-PCR. We further evaluated this variation
i.e. deviation from the magnitude of FC, by plotting the
frequency histogram for the 12 genes unique to RNA-seq


Page 5 of 16






Kogenaru et al. BMC Genomics 2012, 13:629
http://www.biomedcentral.com/1471-2164/13/629


rs= 0.94


rH't


0.01 0.1 1 10 10(
qRT-PCR









lil Il


rs= 0.97


0.01 0.1 1
qRT-PCR


10 100


100-



a) 10-,


Ililllllil I


0 1 2 3 4 5 0 1 2 3 4 567 8
Fold difference Fold difference
Figure 3 Comparison of expression levels quantified by RNA-seq and microarray with qRT-PCR. (A) Comparisor
determined by RNA-seq with qRT-PCR. (B) Comparison of expression levels determined by microarray with qRT-PCR (C)
showing the percentage of genes with the fold difference between RNA-seq and qRT-PCR, with a bin width of 0.5. (D)
showing percentage of genes with the fold difference between microarray and qRT-PCR, with a bin width of 0.5.


9 10

of expression levels
Frequency histogram
-requency histogram


qRT-PCR


,--- 0.01 4-
10 0.01


rs= 0.92


0.01 4-
0.01


0.1 1 10
qRT-PCR


C
100-



0) 10-
'd


I I


0 1 2 3 0.5 0.6 0.7 0.8 0.9 1.0
Fold difference Fold difference
Figure 4 Comparison of expression levels of genes that are uniquely determined by RNA-seq and microarray with that of qRT-PCR.
Expression levels for the set of selected genes quantified by (A) RNA-seq, and (B) microarray with that of qRT-PCR. Frequency histogram show
percentage of genes deviating from the magnitude of FC, quantified by RNA-seq (C), and microarray (D) with respect to qRT-PCR. Bin width of
0.5 and 0.05 are used respectively.


100-



10-



~1-


Page 6 of 16


!


1-t I=,-'
dil


rs= 0.58


T.1






Kogenaru et al. BMC Genomics 2012, 13:629
http://www.biomedcentral.com/1471-2164/13/629


(Figure 4C) and microarray (Figure 4D). For the genes
unique to RNA-seq, we observed that none of them had
the same magnitude of FC, with 50% genes having 0 to
0.5-time lower and for the remaining 50% of the genes the
magnitude of FC was observed to be 2 to 3-time higher,
when compared to qRT-PCR (Figure 4C). Because of this
inconsistence in the magnitude of FC, the expression levels
are moderately correlated. For the genes unique to micro-
array, we observed a good consistence in the magnitude of
FC with qRT-PCR (Figure 4D).


Comparison in terms of detection of genes encoding
T3SS and effectors
Extensive and detailed studies have been carried out
since past three decades in cataloguing the target genes
of HrpX in the genus Xanthomonas using various gen-
etic and biochemical methods [32,38,39,50-55]. HrpX is
known to regulate hrp gene cluster that encodes the type
III secretion system (T3SS) and effectors [31,56]. T3SS
are specialized macromolecular machinery that act as a
nano-injector to translocate the effector proteins into
the cytoplasm of host plant cells [50]. These translocated
effectors manipulate the host cellular processes by alter-
ing signal transduction, transcriptional activities like
suppression of basal plant defense responses, and pro-
tein turnover in host cells for the benefit of the pathogen
[50]. The T3SS machineries are evolutionarily conserved
across many Gram-negative animal- and plant-
pathogenic bacteria [57].
Xcc is comprised of 25 hrp genes, including 19 hrp-
conserved (hrc) and 6 hrp-associated (hpa) genes that
encode the T3SS [58]. These genes are clustered in a
-25 kb region spanning from 462712 to 488334 bp of
the genome [32]. We applied statistically significant dif-
ferentially expressed gene list that were derived from
RNA-seq and microarray methods to this cluster. We
counted for the number of known hrp cluster genes,
which passed the FC and FDR cut-off thresholds from
RNA-seq and microarray methods (Table 1). Among the
25 hrp cluster genes, 16 (64%) were detected by both
RNA-seq and microarray methods. Six genes were found
to be uniquely detected by microarray, whereas, none
uniquely detected by RNA-seq (Table 1). Three genes
namely, hrcC, hpa2, and hpaA could not pass our statis-
tical cut-off criteria by any of the methods, although
they followed the same qualitative expression pattern.
We further quantified the deviation in the magnitude of
FC for the 16 known hrp genes, found in consensus be-
tween RNA-seq and microarray (Figure 5A). The magni-
tude of FC for 5% genes found to be same, while for the
remaining 95% genes it was found to be between 1.2 to
1.8-time higher in RNA-seq than in microarray. Even
though, microarray overall detected more genes from


hrp cluster, RNA-seq reported higher magnitude of FC
(Table 1).
Xcc also encodes 25 putative effector genes regulated
by HrpX, which meditate the interaction with the host
plant, hence determine the host specificity [55]. Since
XAC2785, XAC1210 and XAC1209 were considered as
pseudo or inactive genes, they were excluded from our
analysis. We tabulated how many of these genes were
detected by RNA-seq and microarray methods with their
corresponding log2FC values along with p-value and
FDR from the respective methods (Table 2). In total, 10
(45.5%) genes were detected by both the methods. RNA-
seq and microarray uniquely detected one and three
genes respectively. The remaining 9 genes (36.4%) were
neither detected by RNA-seq nor by microarray, since
they could not pass both the FC and FDR cut-offs
(Table 2). For the 10 consensus genes, we calculated the
fold differences in the magnitude of FC quantified by
RNA-seq with respect to microarray. None of the genes
had the same magnitude of FC between the two meth-
ods. Microarray estimated higher magnitude of FC for
~64% genes than RNA-seq, while RNA-seq estimated
1.2 to 1.8-time higher magnitude of FC for the
remaining ~36% genes (Figure 5B). In contrast to hrp
gene cluster, where microarray qualitatively outper-
formed RNA-seq in its ability to detect more genes, here
RNA-seq complemented quantitatively with higher con-
fidence by reporting higher magnitude of FCs. Thereby,
for the effector gene data set, RNA-seq and microarray
complemented each other both qualitatively as well as
quantitatively.
Overall, considering T3SS and effector genes, in total
there are 47 genes, from which, 26 genes (55%) were
detected by both RNA-seq and microarray (Tables 1 and
2). RNA-seq uniquely detected 1 gene (2%), whereas,
microarray detected 9 genes (19%). Remaining 11 genes
(23%) were not detected by either one of the methods by
failing to pass the cut-off threshold (Tables 1 and 2).
Further, considering only the genes that are detected by
at least one method, 72% of the known were detected by
both methods, while remaining 28% were detected by ei-
ther one of the methods.


Genes uniquely detected by RNA-seq and microarray
Among the 87 statistically significant differentially
expressed genes from RNA-seq, 42 (39.63%) genes were
found to be uniquely detected by this method (Add-
itional file 2: Figure FS2). Of these 42 genes, 17 were
found to be down-regulated, while 25 were up-regulated
(Additional file 1: Table S4). Nearly 98% of these genes
(41 of 42 unique) could not pass the FC cut-off thresh-
old by microarray. The only exception is the gene fliO
(XAC1945) that encodes a flagellar protein for flagellum
apparatus, which passed the FC cut-off, but failed with


Page 7 of 16







Kogenaru et al BMC Genomics 2012, 13:629 Page 8 of 16
http://www.biomedcentral.com/1471-2164/13/629




Table 1 Summary of Type III secretion system (T3SS) hrp cluster genes detected by RNA-seq and microarray
Locus Tag Gene Symbol RNA-seq Microarray Detected by
log2FC p-value FDR log2FC p-value FDR
XAC0412 hrcN -2.2355 7.66E-09 1.27E-06 -0.8217 0.00E+00 0.00E+00 t
XAC0409 hrcJ -3.1488 4.20E-15 1.40E-12 -2.1840 0.00E+00 0.00E+00 t
XAC0406 hrcU -2.6729 1.01E-13 2.56E-11 -1.0507 0.00E+00 0.00E+00 t
XAC0405 hrcV -1.5755 3.27E-08 4.88E-06 -0.8427 4.00E-05 2.58E-03 t
XAC0407 hrpBl -3.9638 7.53E-28 3.61 E-25 -2.8603 0.00E+00 0.00E+00 t
XAC0408 hrpB2 -2.8155 2.48E-09 4.67E-07 -1.9507 0.00E+00 0.00E+00 t
XAC0410 hrpB4 -1.9274 7.42 E-05 6.05E-03 -1.5237 0.00E+00 0.00E+00 t
XAC0403 hrcQ -2.0615 6.36E-07 7.64E-05 -0.8647 0.00E+00 0.00E+00 t
XAC0402 hrcR -1.7123 4.73E-06 5.53E-04 -0.8677 0.00E+00 0.00E+00 t
XAC0399 hrpD5 -1.7287 2.67E-10 5.50E-08 -1.5487 0.00E+00 0.00E+00 t
XAC0398 hrpD6 -1.5446 5.85E-07 7.43E-05 -1.6347 0.00E+00 0.00E+00 t
XAC0397 hrpE -2.1535 7.67E-16 2.76E-13 -1.8163 0.00E+00 0.00E+00 t
XAC0394 hrpF -2.0517 7.04E-15 2.17E-12 -1.2113 0.00E+00 0.00E+00 t
XAC0416 hpa 1 -5.0096 1.49E-51 1.29E-48 -4.2917 0.00E+00 0.00E+00 t
XAC0396 hpaB -1.5429 1.56E-07 2.11 E-05 -1.2527 0.00E+00 0.00E+00 t
XAC0393 hpoF -0.9235 1.96E-04 1.34E-02 -0.6083 1.00E-05 8.10E-04 t
XAC0411 hrpB5 -1.9690 3.92E-03 1.38E-01 -0.8457 0.00E+00 0.00E+00 y
XAC0401 hrcS -1.0156 4.00E-02 4.71 E-01 -1.0837 0.00E+00 0.00E+00
XAC0404 hpoP -1.2673 1.10E-02 2.74E-01 -1.1110 0.00E+00 0.00E+00
XAC0395 XAC0395 -1.0817 4.73E-02 5.08E-01 -0.8640 0.00E+00 0.00E+00 y
XAC0415 hrcC -0.2713 2.49E-01 8.27E-01 -0.4053 4.75 E-03 1.78E-01 $
XAC0413 hrpB7 -1.2107 1.08E-02 2.74E-01 -0.6590 0.00E+00 0.00E+00
XAC0417 hpo2 -1.4130 8.33E-03 2.31 E-01 -0.4610 4.40E-04 2.64E-02 $
XAC0400 hpoA -1.0585 2.76E-03 1.10E-01 -0.7763 9.74E-03 2.05E-01 $
XAC0414 hrcT -1.0474 1.10E-02 2.74E-01 -0.6800 0.00E+00 0.00E+00 y
t Consensus between RNA-seq and microarray16 genes (64%).
( Only RNA-seq 0 gene (0%).
p) Only microarray 6 genes (24%).
$ Undetected 3 genes (12%).
The list of known T3SS hrp cluster genes along with the information about the log2FC, p-value and FDR value from RNA-seq and microarray experiments.
"Detected by" column assigns by which method the known hrp cluster gene is found to be significantly differentially expressed.



A B
100- 100-




tM 10 10--




i -i III I 1
1 2 3 1 2
Fold difference Fold difference
Figure 5 Comparison of expression levels of genes encoding T3SS and effectors that are commonly detected by RNA-seq and
microarray. (A) Frequency histogram showing percent of genes deviating from the magnitude of FC quantified by RNA-seq with respect to
microarray for hrp gene cluster. Bin width of 0.2 is used. (B) Frequency histogram showing percentage of genes deviating from the magnitude of
FC quantified by RNA-seq with respect to microarray for the T3SS and effector genes. Bin width of 0.5 is used.







Kogenaru et al. BMC Genomics 2012, 13:629
http://www.biomedcentral.com/1471-2164/13/629




Table 2 Summary of Type III effector genes detected by RNA-seq and microarray


Locus Tag Gene Symbol


XAC0286
XACb001
XAC0754
XAC3085
XAC2786
XAC1208
XAC0277
XAC0543
XAC3230
XAC2922
XAC4213
XAC4333
XAC0601
XAC0076
XAC3090
XAC3224
XAC2009
XAC3666
XACa0022
XACa0039
XACbOO15
XAChOO65


avrXacE3
xopl
xopK


RNA-seq
log2FC p-value
-1.3432 1.21 E-07


FDR
1.69E-0'


-0.9098 2.20E-04 1.51E-02
-1.1830 5.40E-04 3.11E-02
-2.1505 0.00E+00 0.00E+00


Microarray
log2FC p-value
-1.1813 0.00E+00


Detected by


FDR
0.00E+00


-1.6363 O.OOE+00 O.OOE+00 t
-0.7287 O.OOE+00 3.00E-05 t
-1.7157 0.00E+00 0.00E+00 t


0.00E+00 0.00E+00 -2.9897 0.00E+00 0.00E+00 t


1.00E-05
600E-05


6.20E-04
5.10E-03


-0.7583 0.00E+00 0.00E+00 t
-1.2360 0.00E+00 0.00E+00 t


-3.0765 O.OOE+00 O.OOE+00 -3.4423 O.OOE+00 O.OOE+00 t


xopAl
hrpW
xopAD
xopQ
xopV
avrBs2
xopL
xopE
xopZ
xopAK
pthA I
pthA2
pthA3
pthA4


-I.4353 O.OOE+00 O.OOE+00
-2.0833 O.OOE+00 O.OOE+00


-0.8816 2.60E-04


-0.6779
-0.5197
-0.8632


.68E-02


95F-02 3.51 F-0


5.44E-02
.79E-03


-0.3402 2.58E-01


5.33E-01
7.90E-02
8.37E-01


-0.5716 2.45E-02 3.85E-0'
-0.6456 9.45E-03 2.56E-0'


-0.2756 2.89-01


0.0225
-0.450'


8.00E-01
2.09E-01


-1.1700 3.52E-03
-03423 8 051F-01


8.59E-0"
9.80E-0"
7.97E-0"
1.28E-0'


-1.1510 0.00E+00 0.00E+00 t
-2.7723 0.00E+00 0.00E+00 t
-0.3960 2.17E-02 2.72E-01
-1.1190 0.00E+00 0.00E+00 y
-0.7367 0.00E+00 2.70E-04 y
-0.566 5.00E-05 3.53E-03 $
-0.6040 0.00E+00 0.00E+00 y


-0.2397
-0.4160


2.68E-02 4.48E-01 $
1.30E-04 9.38E-03 $


-0.2060 3.23E-01
0.0790 6.69E-01
0.0413 8.25E-01


0.0723


OOF+00 003


9.82E-01
9.98E-01
9.98E-01


13E-01 9.98E-01


'23F-01


998F-01


t Consensus between RNA-seq and microarray 10 genes (45.5%).
Lp Only microarray 3 genes (13.6%).
f Only RNA-seq 1 genes (4.5%).
$ Undetected 8 genes (36.4%).
The list of known T3SS effector genes along with the information about the log2FC, p-value and FDR value derived from RNA-seq and microarray experiments.
"Detected by" column assigns by which method the following effector genes are found to be significantly differentially expressed.


FDR threshold. The gene XAC0755 encoding KdpF, a
component of an integral membrane potassium-
transporting system [59], is down-regulated by a factor
of 3 (log2 FC of 1.6) according to RNA-seq, but, micro-
array could not capture this, as the probes for this gene
were missing on the chip. This shows the limitation of
microarray, where probes for all the genes need to be
defined while designing the chip. Furthermore, four
genes uniquely found by RNA-seq are involved in signal
transduction and gene regulation, i.e. XAC4116 encod-
ing a serine/threonine kinase, XAC1819 encoding a
tryptophan-rich sensory protein, and two regulatory
genes XAC3026, and XAC3363, whose function in citrus
canker disease development remain to be explored. Fur-
thermore, 21 genes (24%) are currently annotated as
hypothetical proteins (Additional file 1: Table S6).
Among them, four hypothetical proteins XAC0854,
XAC4131, XAC1203, and XACb0064 were predicted to
be T3SS secreted while 7 hypothetical proteins,
XAC3275, XAC3680, XAC1943, XAC0527, XAC0599,


XAC0239, and XAC0755 were predicted to be Type 2
Secretion System (T2SS) substrates (Additional file 1:
Table S6) by Effective database [60]. Gram-negative bac-
teria employ T2SS to transport proteins to the extracellular
milieu, where the T2SS exo-proteins containing N-terminal
signal peptides are used for inner-membrane transloca-
tion through either the Sec translocon or the Tat com-
plex [61]. Genes encoding proteins secreted by T3SS and
T2SS have been experimentally proved to be regulated
by HrpX [33,62,63].
Among the 64 statistically significant differentially
expressed genes from microarray, 19 (29.7%) genes
were found to be uniquely detected by this method
(Additional file 2: Figure FS2). 18 were found to be
down-regulated, while one gene was up-regulated
(Additional file 1: Table S4). Unlike that of RNA-seq,
nearly 63% genes (12 of 19 unique) could pass the FC
cut-off threshold, but failed to pass the FDR threshold
by RNA-seq. The remaining 37% genes (7 of 19


Page 9 of 16






Kogenaru et al. BMC Genomics 2012, 13:629
http://www.biomedcentral.com/1471-2164/13/629


unique) could not pass both FC and FDR cut-off
threshold. Furthermore, six genes were found to be
hypothetical. Among them XAC2876, XAC1241, and
XAC2370 were predicted as T2SS substrates. XAC1241
predicted as a T2SS substrate, shared 73% identity with
a putative secreted protein from X. campestris pv. vesi-
catoria strain 85-10. Another T2SS candidate
XAC2370 shared 95% identity with a secreted protein
from X. fuscans subsp. aurantifolii str. ICPB 10535.
XAC1124 shared 100% identity with MEKHLA domain
protein from X. axonopodis pv. punicae str. LMG 859
[33]. This domain is found in bacteria associated with
plants. It further shares similarity with the PAS do-
main and might be involved in light, oxygen, and
redox potential sensation [64].


Comparison at the level of functional annotations of
genes
For comparison based on the biological function for the
differentially expressed genes from RNA-seq and micro-
array, we utilized the ClueGO to integrate the Gene
Ontology (GO) [65] terms and KEGG [66] pathway
terms and create a functionally organised GO/KEGG
network. Functional annotation with biological processes
category resulted in 13 (14.94%) genes found from clus-
ter for RNA-seq, while for microarray it was 12
(19.35%).
The ClueGO overview pie chart highlighted that sig-
nificant proportion of the genes differentially regulated
are involved in "protein secretion by the T3SS" by both
RNA-seq and microarray (Additional file 3: Figure FS3A
& D). Additionally, RNA-seq also identified genes
involved in "secretion activity by cell" as well as "single
organism catabolic process" (Additional file 3: Figure
FS3A). On the other hand, microarray highlighted the
genes involved in "protein transmembrane transport",
"polycyclic aromatic hydrocarbon degradation" and "es-
tablishment of localization in cell" (Additional file 3:
Figure FS3D). Majority of the genes are involved in "bac-
terial secretion system", as shown by both RNA-seq and
microarray. Also the differentially expressed genes are
found to be significantly involved in the "transport of
monovalent inorganic cation" (Additional file 3: Figure
FS3B) and "protein transport" (Additional file 3: FS3E).
Genes have also been found uniquely by microarray as
significantly involved in "polycyclic aromatic hydrocar-
bon degradation" (Additional file 3: Figure FS3E). Genes
from RNA-seq have been found to be involved in "ribofla-
vin metabolism" as well as "single organism catabolic
process" (Additional file 3: Figure FS3B). Further,
visualization of the functionally grouped annotation net-
work for the differentially regulated genes derived from
RNA-seq (Additional file 3: Figure FS3C) and microarray


(Additional file 3: Figure FS3F) methods highlighted the
relationships between the terms. RNA-seq highlighted "pro-
tein secretion by the T3SS" along with the "small molecule
catabolic process", while microarray reflected "polycyclic
aromatic hydrocarbon degradation" and "establishment of
localization in cell", as the most significant terms of the
group. This analysis also showed that RNA-seq and micro-
array together provide more comprehensive functional in-
formation than the individual methods.

PIP box detection
HrpX is known to regulate the target gene expression by
specifically binding to PIP box motif present in the
cis-regulatory regions. PIP box consists of direct
repeats of "TTCGC" with a spacer of 8 to 26-bps in
between the repeats, even though ideally 8-bps and
15-bps are considered as the canonical PIP box [37].
We exploited this feature and looked for PIP boxes in
the promoter regions of the 106 significantly differen-
tially expressed genes (Additional file 4). All the 106
differentially expressed genes could be assigned to 90
transcriptional units based on MetaCyc database [67]
(Additional file 1: Table S8). However for simplicity,
genes under the control of the same cis-regulatory
regions were counted separately. Among the consensus
45 genes, 36 (80%) were shown to have canonical PIP
boxes (Figure 6A, Additional file 1: Table S7). Of the 42
genes that are uniquely determined by RNA-seq, 13
(31%) genes were confirmed to have PIP boxes; whereas,
among the 19 genes that are uniquely determined by
microarray 11 (57.8%) genes were confirmed to have PIP
boxes (Figure 6A, Additional file 1: Table S7).
In this study, we identified newly PIP box motif in 7
(19.4%) genes among consensus, 13 (100%) genes unique
to RNA-seq and 1 (9%) gene unique to microarray
(Figure 6B). Overall, 60 of the 106 (-57%) signifi-
cantly differentially expressed genes were confirmed to
have PIP boxes in their cis-regulatory regions (Add-
itional file 1: Table S7, Additional file 4). The pres-
ence of PIP box confirmed that these genes may be
directly regulated by HrpX, while the remaining 46
that do not have PIP boxes may be indirectly regu-
lated by HrpX via the other transcription factors. In
this regard, we looked for genes with sequence spe-
cific DNA binding activity in the 106 differentially
expressed genes. Six genes namely hrpG, pcaQ, blal,
XAC3026, XAC3445, and XAC3446 were known to
have sequence specific DNA binding activity according
to GO annotation. Among them, XAC3446, XAC3445,
and blal have been newly identified in this study con-
taining PIP box motif (Additional file 1: Table S7,
Additional file 4). Thereby these 3 transcription regu-
lators are directly regulated by HrpX, which in turn
we assume regulate the 46 genes, which do not


Page 10 of 16






Kogenaru et al. BMC Genomics 2012, 13:629
http://www.biomedcentral.com/1471-2164/13/629


contain the PIP box motif and hence indirectly regu-
lated by HrpX. The DNA binding signatures for many
of these transcription factors are unknown; hence, ob-
scure the further confirmation of regulation by these
transcription factors. Nevertheless, the fact that many
of the genes that were uniquely determined by each
method showed a clear PIP box in their cis-
regulatory regions reiterates that RNA-seq and micro-
array complement each other.

Discussion
Currently, RNA-seq is becoming the preferable choice
for gene expression profiling in place of microarrays.
Although, all the parameters that influence the various
aspects of this method are yet to be understood com-
pletely, RNA-seq undoubtedly is playing a very important
role in deciphering the complexity of the transcriptome
by giving a new direction to isoforms, allelic expression,
untranslated regions, splice junctions, antisense regula-
tion and intragenic expression [10,16,29,68-74]. Several
studies have begun to investigate on the parameters like
sequencing depth, precision, GC bias, length bias, lane
effects, and processing artifacts [16,29,48,75-77]. On the
other hand, microarrays are in usage for more than two
decades. Therefore, most of the biases inherent to this
method have become more apparent [78]. For instance,
biases in the hybridization of the samples labeled with
Cyanine5 (Cy5) and Cyanine3 (Cy3) are sufficiently
explored, and currently several approaches are practiced
to minimize such effects [79-82]. Further, systematic
variability like influence of the image scanner settings on
the dye intensity measurements have now been robustly
handled by applying various normalization techniques
[83-86]. Despite these developments, some inherent


genes-specific biases like differential hybridization
efficiencies of the labeled target transcript to the
same probe are still found to be inevitable in micro-
arrays. In RNA-seq as well as microarray, all these
known and unknown parameters influence the final
outcome. Therefore, in this study, we focused on the
assessment of RNA-seq and microarray based on the
final outcome .i.e. statistically significant differentially
expressed genes.
In comparison with previous RNA-seq studies, with a
sequence coverage of 97% we observed for our data set,
is in consistence with the reported 89.5% to 95% cover-
age observed in other bacterial RNA-seq studies [87-89].
In our study, RNA-seq has identified more significantly
differentially expressed genes (82%), when compared to
microarray (63%) as in previous studies [18,29,30]. The
overall correlation (rs 0.76) in the magnitudes of FC for
the consensus genes between the two methods was
found to be similar or higher than previous studies
[18,29,30,72]. Furthermore, our comparison analysis with
qRT-PCR suggested that the expression levels were
highly reliable for those genes that were determined to
be differentially expressed by both RNA-seq and micro-
array. Hence, confirming the differential expression of
genes by multiple methods reduces false positives
thereby enhances the biological discovery.
Even though microarray overall outperformed RNA-
seq by detecting more known HrpX target genes from
the T3SS in hrp cluster by satisfying both FC and FDR
cut-off threshold, in principle RNA-seq also detected
genes hrpBS, hrcS, hpaP, XAC0395, hrpB7, and hrcT, in
terms of FC, but failed to pass FDR threshold. This
parameter is more directly influenced by error model
considered in the statistical method that is used to


Page 11 of 16


A B
100- 100-
m Known
80- M Novel 80 -
S0-
S60- 60-

S40- 40
a W
S 2 0 2 20

0O 0


4 .o1


Significantly differentially expressed genes
Figure 6 Statistics of HrpX binding sites in the cis-regulatory regions of significantly differentially expressed genes from RNA-seq and
microarray. The genes belonging to consensus and unique to each method are shown (A) Percentage of genes containing PIP box in the
cis-regulatory regions of genes that belong to three different groups are shown. The known (black bar) and novel (red bar) are indicated. (B) Only
genes with PIP box are shown, percentage of which already known (black bar) and novel (red bar) are indicated.






Kogenaru et al. BMC Genomics 2012, 13:629
http://www.biomedcentral.com/1471-2164/13/629


infer the differential expression rather than RNA-seq
itself. For the same read counts, one can get slightly
different FDR values depending on the statistical
method [90]. But the implementation of all the statis-
tical methods is not feasible for every dataset. From
the T3SS in hrp cluster, three genes namely, hrcC,
hpa2, and hpaA were not found to be detected by
both RNA-seq and microarray, mainly because they fail
to pass FDR threshold. Interestingly, our previous
microarray analysis confirmed that all these three
genes are regulated by HrpX, but only at a later stage
of the growth phase by satisfying both FC and FDR
cut-off thresholds [33]. This consolidates the regulation
of some of the genes at later stages of the growth
phase. Further, in case of Type III effector genes, 8
genes (36.4%) were not detected by both RNA-seq and
microarray within considered cut-off threshold limit.
However, among them xopL, avrBs2, xopAK and xopZ
were found to be regulated by HrpX only at the later
stage of the growth phase (OD6oo time point 0.5),
according to our previous microarray analysis [33].
Further, four genes namely, pthA2, pthAl, pthA3,
pthA4 were regulated by another transcription regula-
tor HrpG at early stage of growth phase (OD6oo =
0.25 and 0.4) as observed in our previous study, while
another undetected gene xopE was found to be also
regulated by HrpG, but only at OD6oo = 0.25 time
point of growth phase [33]. Thereby this study further
validated our previous results. Subsequently, both
methods detected 100% of the genes known to be
regulated by HrpX (at time point OD6oo = 0.4) without
any false positives. Among them, 72% were detected
by both the methods while interestingly 28% of the
known target genes were detected by either one of the
methods. Hence, both the methods together could
complement each other.
In addition 55 genes (~51%) were newly identified as
differentially expressed by applying both microarray as
well as RNA-seq methods, thereby adding up to the
already existing repertoire of HrpX regulated genes. Fur-
thermore, 46 (83.6%) genes among them were uniquely
identified by either one of the methods. Overall, 21
newly identified genes were found to have PIP box in
their promoter regions, wherein 14 (58.3%) genes were
uniquely identified by either RNA-seq or microarray.
The presence of the PIP box in the promoter regions of
the HrpX-regulated genes uniquely identified by RNA-
seq and microarray further not only confirmed that
these genes are directly regulated by HrpX, but also that
these candidates are not false positives. Consequently,
100% of the known HrpX regulated genes could only be
detected together by both the methods, since each
method missed out on some of the known genes; hence
both the methods together enhance the understanding


of HrpX regulome by providing a more comprehensive
picture.

Conclusions
This study has significantly advanced our understanding
of the regulome of the critical transcriptional factor
HrpX and demonstrates that RNA-seq and microarray
complement each other in transcriptome profiling. Con-
sequently, our study demonstrates the advantage of ap-
plying multiple transcriptome profiling methods to
reveal a more comprehensive picture of a transcriptome,
rather than relying solely on one method.

Methods
Bacterial strains and growth conditions
The wild-type X. citri subsp. citri [32], and the hrpX mu-
tant strains used in this study were described in our pre-
vious study [33]. Both the strains were grown at 28C in
nutrient broth (NB), on nutrient agar (NA), or in NYG
medium [91]. Antibiotics rifamycin and kanamycin were
added to the media at 50 Vg/ml final concentrations.

RNA extraction
Total RNA was extracted from the wild-type and the
hrpX mutant strains as described in our previous study
[33]. Briefly, strains from NA plates were grown in NB
medium at 28C until mid-exponential phase. Cultures
were harvested by centrifugation and inoculated in to
nutrient-deficient XVM2 medium, after washing the pel-
let once with the same medium. Cultures were finally
harvested for RNA extraction, when the optical density
at 600 nm reached the value of 0.4, and mixed immedi-
ately with RNAprotect bacterial reagent (Qiagen, Valen-
cia, CA, and U.S.A.). Total RNA was extracted from
each replicate separately using RiboPure bacteria kit
(Ambion, Austin, TX, USA), according to manufacturer's
instructions. Genomic DNA contamination from the
extracted RNA samples was removed using TURBO
DNA-free kit (Ambion). Amount and the quality of the
RNA samples was initially determined using NanoDrop
1000 spectrophotometer (NanoDrop Technologies, Inc.,
Wilmington, DE). Samples with absorbency at 260/280
and 260/230 nm ratios > 2 were subjected to further
processing. Three biological replicates of the wild-type
and the hrpX mutant samples were used for RNA-seq
analysis.

Microarray data
The microarray data used in this study was generated
during our previous study [33]. Three unique 60-mer
oligonucleotide probes were designed for each of the
4,427 protein coding genes of X. citri subsp. citri [33]. 8-
by-15-K DNA microarray chips covering the whole gen-
ome were implemented under the Agilent platform.


Page 12 of 16







Kogenaru et al. BMC Genomics 2012, 13:629
http://www.biomedcentral.com/1471-2164/13/629


These microarrays were processed at the Interdisciplin-
ary Center for Biotechnology Research Microarray Core
Facility, University of Florida. The raw data is available
at National Center for Biotechnology Information
(NCBI) Gene Expression Omnibus (GEO) data reposi-
tory under the accession number GSE24016 [33].

mRNA enrichment and RNA-seq
Total RNA samples were enriched for mRNA, by depleting
rRNA using MICROBExpress kit from Ambion following
the manufacturer's instructions. Enriched samples were
checked for integrity using Agilent 2100 Bioanalyzer
(Agilent Technologies, Santa Clara, CA, USA). RNA
samples that passed the quality control were sequenced
using the Illumina Genome Analyzer IIx (GAIIx) system
by following the standard protocol at the Center for
Genome Analysis at Yale University. Real-time analysis
and base calling were performed using the CASAVA
vl.6 pipeline. The raw sequence data has been submit-
ted to the NCBI Sequence Read Archive and assigned
with an accession number SRA052842.

Reads mapping and statistical analysis
The X. citri subsp. citri whole genome sequence consisting
of one chromosome [GenBank: NC_003919.1], and two
plasmids [GenBank: NC_003921.3 and NC_003922.1],
along with the annotation information were downloaded
from NCBI repository (ftp://ftp.ncbi.nih.gov/genomes/
Bacteria/). Quality-filtered reads were aligned on to the
genome using CLC Genomics Workbench v4.7.2 (CLC
bio, Aarhus, Denmark). Reads uniquely aligned to each
gene were tabulated from each replicate separately. Differ-
entially expressed genes were estimated using DESeq
package [92], available under the open-source Bioconduc-
tor suite of programs [93]. DESeq is a powerful tool to es-
timate the variance in RNA-seq data and test for
differential expression [92]. As an input, DESeq accepts a
table of read counts for each gene from different biological
replicates, and estimates the differentially expressed genes
using negative binomial distribution [92]. Statistically sig-
nificant differentially expressed genes from both micro-
array and RNA-seq data were obtained by applying a cut-
off threshold of FDR < 0.05 (5%) and an absolute log2
fold-change > 0.6.

Bioinformatics analysis
Similarity searches were performed online using position-
specific iterative BLAST (PSI-BLAST) at NCBI site
against non-redundant protein database [94]. T3SS and
T2SS predictions were performed using Effective database
[60]. The promoter regions of the significantly differen-
tially expressed genes were retrieved manually using NCBI
genome browser to look for the presence of PIP boxes.
The differentially expressed genes were assigned to the


transcriptional units by referring to the MetaCyc database
[67]. Biological interpretation of the differentially expressed
genes was carried out using the ClueGO vl.5 [95], a
Cytoscape plug-in [96].

qRT-PCR
All the qRT-PCR assays were performed as detailed
elsewhere [33]. Briefly, gene-specific primers were
designed for the selected genes using PrimerQuestsM
from Integrated DNA technologies (IDT), Coralville,
Iowa (Additional file 1: Table S6). qRT-PCR experi-
ments were performed in triplicates, at least three
times for each gene using 7500 fast real-time PCR system
(Applied Biosystems, Foster City, CA, USA), using a
QuantiTect SYBR green RT-PCR kit (Qiagen) with simi-
lar results, by following the manufacturer's instructions.
The relative fold change of target gene expression was
calculated using 16S rRNA as an endogenous control
with the formula 2-AACT [97].

Data availability
The raw RNA-seq data from this study is deposited at
the NCBI sequence read archive (http://www.ncbi.nlm.
nih.gov/Traces/sra/sra.cgi), under the accession number
SRA052842, while the raw microarray data is available at
the NCBI Gene Expression Omnibus (http://www.ncbi.
nlm.nih.gov/geo) with the accession number GSE24016.

Additional files

Additional file 1: The following excel format file contains the
following 8 additional tables: Table S1: Summary of RNA-seq reads
from wild-type and hrpX mutant strains ofX citri subsp citri Table 52 List
of genes that are called by both RNA-seq and microarray Table 3' List of
genes that are uniquely called by RNA-seq and microarray Table 54 List
of statistically significant differentially expressed genes by RNA-seq and
microarray filtered by cut-off thresholds Table 55 List of randomly
selected genes for the comparison with qRT PCR from the statistically
significant differentially expressed genes from RNA-seq and microarray
Table 56 Gene specific primers used in qRT-PCR experiment Table 57
Summary of bioinformatics analysis of statistically significant differentially
expressed genes to be part of Type III Secretion System (T3SS) and Type
I Secretion System (T2SS) along with the occurrence of PIP box Table S8'
List of 90 transcriptional units from X citri subsp citri to which the 106
differentially regulated genes belong
Additional file 2: Contains the following two additional figures,
Figure FS1: Venn diagram summarizing genes called by both
technologies, when comparison is carried out between the total currently
annotated open reading frames (ORFs) available transcripts from the
transcriptome ofX citri subsp citri Fold-change values are available from
RNA-seq (4323) and microarray (4349) Gene's called by both
technologies are indicated by the overlap between the two circles 4312
are found in consensus, while 11 and 37 are unique to RNA-seq and
microarray respectively Figure FS2 Venn diagram summarizing genes
that are significantly differentially expressed determined by RNA-seq and
microarray Gene's common to both methods are indicated by the
overlap between the two circles
Additional file 3: Figure FS3 Comparison at the level of the functional
annotations of the significantly differentially expressed genes from RNA-
seq and microarray GO term and KEGG pathway information enrichment


Page 13 of 16








Kogenaru et al BMC Genomics 2012, 13:629
http://www.biomedcentral.com/1471-2164/13/629


analysis is shown for the genes from RNA-seq (left panel) and microarray
(right panel) The overview of the analysis is shown in the form of pie
chart for gene set from RNA-seq (A), and microarray (D) The histogram
shows the number of genes associated with terms for the genes from
RNA-seq (B) and microarray (E) Signifcantly enriched terms are indicated
with '*' The terms that are functionally related are shown as a network
with terms as nodes and relatedness is indicated with thickness of the
edges that is based on their kappa score The most significant term per
group are shown for genes from RNA-seq (C) and microarray (F)
Additional file 4: Figure FS4 Snapshot of the PIP box motif present in
the cis-regulatory region of significantly differentially expressed genes is
shown in the context of the whole genome ofX citri subsp citri The
absolute position of each PIP box motif occurrence is shown on the
whole genome map along with the -10 'TATA' regions and the gene
start site


Competing interests
We declare no competing interests


Authors' contributions
NW conceived the idea and initiated the study SK performed analysis and
interpretation of the data and wrote the manuscript QY carried out the PCR
experiments YG participated in the study design, sample and data
collection, and data interpretation NW participated in interpretation of data
and writing the manuscript and supervised the overall work All authors read
and approved the final manuscript


Acknowledgements
We wish to acknowledge Gabriela Bindea for providing annotation fles for
ClueGO software This work was supported by United States Department of
Agriculture NIFA Special Citrus Canker Grant Project 94677

Received: 18 July 2012 Accepted: 5 November 2012
Published: 15 November 2012


References
1 Wang Z, Gerstein M, Snyder M RNA-Seq: a revolutionary tool for
transcriptomics. Nat Rev Genet 2009, 10 57-63
2 Baginsky S, Hennig L, Zimmermann P, Gruissem W Gene expression
analysis, proteomics, and network discovery. Plant Physiol 2010,
152402-410
3 Cloonan N, Forrest AR, Kolle G, Gardiner BB, Faulkner GJ, Brown MK, Taylor
DF, Steptoe AL, Wani S, Bethel G, et ao Stem cell transcriptome profiling
via massive-scale mRNA sequencing. Nat Methods 2008, 5'613-619
4 Costa V, Angelini C, De Feis I, Ciccodicola A Uncovering the complexity of
transcriptomes with RNA-Seq. J Biomed Biotechno 2010, 2010'853916
5 DeRisi J, Penland L, Brown PO, Bittner ML, Meltzer PS, Ray M, Chen Y, Su YA,
Trent JM Use of a cDNA microarray to analyse gene expression patterns
in human cancer. Nat Genet 1996, 14457-460
6 Ekins R, Chu FW Microarrays: their origins and applications. Trends
Biotechnol 1999, 17'217-218
7 Fodor SP, Rava RP, Huang XC, Pease AC, Holmes CP, Adams CL Multiplexed
biochemical assays with biological chips. Nature 1993, 364'555-556
8 Hegde P, Qi R, Abernathy K, Gay C, Dharap S, Gaspard R, Hughes JE,
Snesrud E, Lee N, Quackenbush J A concise guide to cDNA microarray
analysis. Biotechniques 2000, 29'548-4 556
9 Marguerat S, Bahler J RNA-seq: from technology to biology. Cell Mol Life
Sc 2010, 67'569-579
10 Nagalakshmi U, Wang Z, Waern K, Shou C, Raha D, Gerstein M, Snyder M'
The transcriptional landscape of the yeast genome defined by RNA
sequencing. Science 2008, 320'1344-1349
11 Nagalakshmi U, Waern K, Snyder M RNA-Seq: a method for
comprehensive transcriptome analysis. Curr Protoc Mol Biol 2010,
Chapter 4 Unit 411 1-13
12 Ramsay G DNA chips: state-of-the art. Nat Biotechnol 1998, 1640-44
13 Schena M, Shalon D, Davis RW, Brown PO Quantitative monitoring of
gene expression patterns with a complementary DNA microarray. Science
1995, 270467-470


14 Tan PK, Downey TJ, Spitznagel EL Jr, Xu P, Fu D, Dimitrov DS, Lempicki RA,
Raaka BM, Cam MC Evaluation of gene expression measurements from
commercial microarray platforms. Nucleic Acids Res 2003, 31'5676-5684
15 Toung JM, Morley M, Li M, Cheung VG RNA-sequence analysis of human
B-cells. Genome Res 2011, 21991-998
16 Wilhelm BT, Marguerat S, Watt S, Schubert F, Wood V, Goodhead I, Penkett
C, Rogers J, Bahler J Dynamic repertoire of a eukaryotic transcriptome
surveyed at single-nucleotide resolution. Nature 2008, 453'1239-1243
17 Xiang CC, Chen Y cDNA microarray technology and its applications.
Biotechnol Adv 2000, 18 35-46
18 Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B Mapping and
quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 2008,
5621-628
19 Pariset L, Chillemi G, Bongiorni S, Romano SV, Valentini A Microarrays and
high-throughput transcriptomic analysis in species with incomplete
availability of genomic sequences. N Biotechnol 2009, 25'272-279
20 Schena M, Heller RA, Theriault TP, Konrad K, Lachenmeier E, Davis RW
Microarrays: biotechnology's discovery platform for functional genomics.
Trends Biotechnol 1998, 16'301-306
21 Fu X, Fu N, Guo S, Yan Z, Xu Y, Hu H, Menzel C, Chen W, Li Y, Zeng R, et o/
Estimating accuracy of RNA-Seq and microarrays with proteomics.
BMC Genomics 2009, 10 161
22 Shendure J The beginning of the end for microarrays? Nat Methods 2008,
5585-587
23 van Vliet AH Next generation sequencing of microbial transcriptomes:
challenges and opportunities. FEMS Microbiol Lett 2010, 302'1-7
24 Raz T, Kapranov P, Lipson D, Letovsky S, Milos PM, Thompson JF Protocol
dependence of sequencing-based gene expression measurements. PLoS
One 2011, 6el 9287
25 Martin JA, Wang Z Next-generation transcriptome assembly. Nat Rev
Genet 2011, 12 671-682
26 Zhou X, Ren L, Meng Q, Li Y, Yu Y, Yu J The next-generation sequencing
technology and application. Protein Cell 2010, 1520-536
27 't Hoen PA, Ariyurek Y, Thygesen HH, Vreugdenhil E, Vossen RH, de Meneze,
RX, Boer JM, van Ommen GJ, den Dunnen JT Deep sequencing-based
expression analysis shows major advances in robustness, resolution and
inter-lab portability over five microarray platforms. Nucleic Acids Res 2008,
36e141 doie141
28 Leimena MM, Wels M, Bongers RS, Smid EJ, Zoetendal EG, Kleerebezem M
Comparative Analysis of Lactobacillus plantarum WCFS1 Transcriptomes
by Using DNA Microarray and Next-Generation Sequencing
Technologies. App/ Environ Microbiol 2012, 784141-4148
29 Marioni JC, Mason CE, Mane SM, Stephens M, Gilad Y RNA-seq: an
assessment of technical reproducibility and comparison with gene
expression arrays. Genome Res 2008, 18'1509-1517
30 Su Z, Li Z, Chen T, Li QZ, Fang H, Ding D, Ge W, Ning B, Hong H, Perkins
RG, et oal Comparing next-generation sequencing and microarray
technologies in a toxicological study of the effects of aristolochic acid
on rat kidneys. Chem Res Toxicol 2011, 24'1486-1493
31 Wengelnik K, Bonas U HrpXv, an AraC-type regulator, activates
expression of five of the six loci in the hrp cluster of Xanthomonas
campestris pv. vesicatoria. J Bacteriol 1996, 178 3462-3469
32 da Silva AC, Ferro JA, Reinach FC, Farah CS, Furlan LR, Quaggio RB,
Monteiro Vitorello CB, Van Sluys MA, Almeida NF, Alves LM, et o'
Comparison of the genomes of two Xanthomonas pathogens with
differing host specificities. Nature 2002, 417459-463
33 Guo Y, Figueiredo F, Jones J, Wang N HrpG and HrpX play global roles in
coordinating different virulence traits of Xanthomonas axonopodis pv.
citri. Mol Plant Microbe Interact 2011, 24'649-661
34 Civerolo E Bacterial canker disease of citrus. J Rio Grande Vaol Hortic Soc
1984, 37'127-145
35 Astua-Monge G, Freitas-Astua J, Bacocina G, Roncoletta J, Carvalho SA,
Mchado MA Expression profiling of virulence and pathogenicity genes
of Xanthomonas axonopodis pv. citri. J Bacterio 2005, 187 1201-1205
36 Fenselau S, Bonas U Sequence and expression analysis of the hrpB
pathogenicity operon of Xanthomonas campestris pv. vesicatoria which
encodes eight proteins with similarity to components of the Hrp, Ysc,
Spa, and Fli secretion systems. Mol Plant Microbe Interact 1995,
8845-854
37 Koebnik R, Kruger A, Thieme F, Urban A, Bonas U Specific binding of the
Xanthomonas campestris pv. vesicatoria AraC-type transcriptional


Page 14 of 16








Kogenaru et al BMC Genomics 2012, 13:629
http://www.biomedcentral.com/1471-2164/13/629


activator HrpX to plant-inducible promoter boxes. J Bacteril 2006,
1887652-7660
38 Noel L, Thieme F, Nennstiel D, Bonas U Two novel type Ill-secreted
proteins of Xanthomonas campestris pv. vesicatoria are encoded within
the hrp pathogenicity island. J Bacteriol 2002, 184 1340-1348
39 Alfano JR, Collmer A The type III (Hrp) secretion pathway of plant
pathogenic bacteria: trafficking harpins, Avr proteins, and death.
J Bacteril 1997, 179'5655-5662
40 Bonas U hrp genes of phytopathogenic bacteria. Curr Top Microbiol
/mmunol 1994, 192 79-98
41 Buttner D, Bonas U Regulation and secretion of Xanthomonas virulence
factors. FEMS Microbiol Rev 2010, 34'107-133
42 Iwamoto M, Oku T Cloning and molecular characterization of hrpX from
Xanthomonas axonopodis pv. citri. DNA Seq 2000, 11 167-173
43 Gurlebeck D, Thieme F, Bonas U Type III effector proteins from the plant
pathogen Xanthomonas and their role in the interaction with the host
plant. J Plant Physio/ 2006, 163'233-255
44 Oku T, Alvarez AM, Kado Cl Conservation of the hypersensitivity-
pathogenicity regulatory gene hrpX of Xanthomonas campestris and X.
oryzae. DNA Seq 1995, 5'245-249
45 Lahaye T, Bonas U Molecular secrets of bacterial type III effector proteins.
Trends Plant Sci 2001, 6479-485
46 Kelley DR, Schatz MC, Salzberg SL Quake: quality-aware detection and
correction of sequencing errors. Genome Biol 2010, 11 R116
47 Klebanov L, Yakovlev A How high is the level of technical noise in
microarray data? Bio/ Direct 2007, 2'9
48 Oshlack A, Wakefield MJ Transcript length bias in RNA-seq data
confounds systems biology. Biol Direct 2009, 4'14
49 Sahl JW, Rasko DA Analysis of global transcriptional profiles of
enterotoxigenic Escherichia coli isolate E24377A. Infect Immun 2012,
80 1232-1242
50 Alfano JR, Collmer A Type III secretion system effector proteins: double
agents in bacterial disease and plant defense. Annu Rev Phytopathol 2004,
42'385-414
51 Collmer A, Bauer DW Erwinia chrysanthemi and Pseudomonas syringae:
plant pathogens trafficking in extracellular virulence proteins. Curr Top
Microbiol immunol 1994, 192 43-78
52 Cornelis GR The type III secretion injectisome. Nat Rev Microbiol 2006,
4811-825
53 Szczesny R, Jordan M, Schramm C, Schulz S, Cogez V, Bonas U, Buttner D'
Functional characterization of the Xcs and Xps type II secretion systems
from the plant pathogenic bacterium Xanthomonas campestris
pv vesicatoria. New Phytol 2010, 187'983-1002
54 Van GF, Genin S, Boucher C Conservation of secretion pathways for
pathogenicity determinants of plant and animal bacteria. Trends Microbiol
1993, 1 175-180
55 White FF, Potnis N, Jones JB, Koebnik R The type III effectors of
Xanthomonas. Mol Plant Pathol 2009, 10749-766
56 Lindgren PB The role of hrp genes during plant-bacterial interactions.
Annu Rev Phytopathol 1997, 35'129-152
57 Lipscomb L, Schell MA Elucidation of the regulon and cis-acting
regulatory element of HrpB, the AraC-type regulator of a plant
pathogen-like type III secretion system in Burkholderia pseudomallei.
J Bocteio 2011, 193 1991-2001
58 Kim JG, Park BK, Yoo CH, Jeon E, Oh J, Hwang I Characterization of the
Xanthomonas axonopodis pv. glycines Hrp pathogenicity island.
J Bacteril 2003, 185'3155-3166
59 Hu GB, Rice WJ, Drose S, Altendorf K, Stokes DL' Three-dimensional
structure of the KdpFABC complex of Escherichia coli by electron
tomography of two-dimensional crystals. J Struct Biol 2008, 161411-418
60 Jehl MA, Arnold R, Rattei T Effective-a database of predicted secreted
bacterial proteins. Nucleic Adds Res 2011, 39D591 -D595
61 Korotkov KV, Sandkvist M, Hol WG The type II secretion system:
biogenesis, molecular architecture and mechanism. Nat Rev Microbiol
2012, 10'336-351
62 Furutani A, Takaoka M, Sanada H, Noguchi Y, Oku T, Tsuno K, Ochiai H,
Tsuge S Identification of novel type III secretion effectors in
Xanthomonas oryzae pv. oryzae. Mol Plant Microbe Interact 2009,
2296-106
63 Wang L, Rong W, He C Two Xanthomonas extracellular
polygalacturonases, PghAxc and PghBxc, are regulated by type III


secretion regulators HrpX and HrpG and are required for virulence. Mol
Plant Microbe Interact 2008, 21'555-563
64 Mukherjee K, Burglin TR MEKHLA, a novel domain with similarity to PAS
domains, is fused to plant homeodomain-leucine zipper III proteins.
Plant Physio/ 2006, 140 1142-1150
65 Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP,
Dolinski K, et oal Gene ontology: tool for the unification of biology The
Gene Ontology Consortium. Nat Genet 2000, 25'25-29
66 Kanehisa M, Goto S, Kawashima 5, Nakaya A The KEGG databases at
GenomeNet. Nucleic Adds Res 2002, 3042-46
67 Caspi R, Altman T, Dreher K, Fulcher CA, Subhraveti P, Keseler IM, Kothari A,
Krummenacker M, et al The MetaCyc database of metabolic pathways
and enzymes and the BioCyc collection of pathway/genome databases.
Nucleic Acds Res 2012, 40'D742-D753
68 Carninci P, Kasukawa T, Katayama S, Gough J, Frith MC, Maeda N, Oyama R,
Ravasi T, Lenhard B, Wells C, et al The transcriptional landscape of the
mammalian genome. Science 2005, 309 1559-1563
69 Graveley BR, Brooks AN, Carlson JW, Duff MO, Landolin JM, Yang L, Artieri
CG, van Baren MJ, Boley N, Booth BW, et al The developmental
transcriptome of Drosophila melanogaster. Nature 2011, 471473-479
70 Mane SP, Evans C, Cooper KL, Crasta OR, Folkerts 0, Hutchison SK, Harkins
TT, Thierry-Mieg D, Thierry-Mieg J, Jensen RV Transcriptome sequencing of
the Microarray Quality Control (MAQC) RNA reference samples using
next generation sequencing. BMC Genomics 2009, 10264
71 Pan Q, Shai 0, Lee LJ, Frey BJ, Blencowe BJ Deep surveying of alternative
splicing complexity in the human transcriptome by high-throughput
sequencing. Nat Genet 2008, 40'1413-1415
72 Sultan M, Schulz MH, Richard H, Magen A, i... i 11 A, Scherf M, Seifert
M, Borodina T, Soldatov A, Parkhomchuk D, et al A global view of gene
activity and alternative splicing by deep sequencing of the human
transcriptome. Science 2008, 321 956-960
73 Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, van Baren MJ,
Salzberg SL, Wold BJ, Pachter L Transcript assembly and quantification by
RNA-Seq reveals unannotated transcripts and isoform switching during
cell differentiation. Nat Biotechnol 2010, 28'511-515
74 Van BH, Nislow C, Blencowe BJ, Hughes TR Most "dark matter" transcripts
are associated with known genes. PLoS Biol 2010, 8'e 000371
75 Bullard JH, Purdom E, Hansen KD, Dudoit 5 Evaluation of statistical
methods for normalization and differential expression in mRNA-Seq
experiments. BMC Bioinforma 2010, 1194
76 Labaj PP, Leparc GG, Linggi BE, Markillie LM, Wiley HS, Kreil DP
Characterization and improvement of RNA-Seq precision in quantitative
transcript expression profiling. Bioinformotics 2011, 27i383-i391
77 Risso D, Schwartz K, Sherlock G, Dudoit 5 GC-content normalization for
RNA-Seq data. BMC Bioinforma 2011, 12480
78 Draghici 5, Khatri P, Eklund AC, 5zallasi Z Reliability and reproducibility
issues in DNA microarray measurements. Trends Gener 2006,
22'101-109
79 Goryachev AB, Macgregor PF, Edwards AM Unfolding of microarray data.
J Comput Biol 2001, 8443-461
80 Kerr MK, Martin M, Churchill GA Analysis of variance for gene expression
microarray data. J Comput Biol 2000, 7'819-837
81 Tseng GC, Oh MK, Rohlin L, Liao JC, Wong WH Issues in cDNA microarray
analysis: quality filtering, channel normalization, models of
variations and assessment of gene effects. Nucleic Adds Res 2001,
29'2549-2557
82 Wang X, Wang D, Chen X, Hu M, Wang J, Li Y, Guo N, Shen B cDNA
cloning and function analysis of two novel erythroid differentiation
related genes. Sc China C Life Sc 2001, 4499-105
83 Cleveland WS, Devlin SJ, Grosse E Regression by Local Fitting: Methods,
Properties, and Computational Algorithms. J Econ 1988, 37'87-114
84 Engelen K, Coessens B, Marchal K, De MB MARAN: normalizing micro-array
data. Bioinformotics 2003, 19893-894
85 Ihaka R, Gentleman R R: A language for data analysis and graphics.
Journal of Computational and Graphical Statistics Journal of Computational
and Graphicol Statistics 1996, 5299-314
86 Venet D MatArray: a Matlab toolbox for microarray data. Bioinformatics
2003, 19'659-660
87 Kumar R, Lawrence ML, Watt J, Cooksey AM, Burgess SC, Nanduri B
RNA-seq based transcriptional map of bovine respiratory disease
pathogen "Histophilus somni 2336". PLoS One 2012, 7'e29435


Page 15 of 16








Kogenaru et al. BMC Genomics 2012, 13:629 Page 16 of 16
http://www.biomedcentral.com/1471-2164/13/629





88 Wurtzel 0, Sapra R, Chen F, Zhu Y, Simmons BA, Sorek R' A single-base
resolution map of an archaeal transcriptome. Genome Res 2010,
20'133-141
89 Yoder-Himes DR, Chain PS, Zhu Y, Wurtzel 0, Rubin EM, Tiedje JM, Sorek R
Mapping the Burkholderia cenocepacia niche response via high-
throughput sequencing. Proc Not/Acod Sd U S A 2009, 1063976-3981
90 Tarazona S, Garcia-Alcalde F, Dopazo J, Ferrer A, Conesa A Differential
expression in RNA-seq: a matter of depth. Genome Res 2011,
21'2213-2223
91 Daniels MJ, Barber CE, Turner PC, Sawczyc MK, Byrde RJ, Fielding AH
Cloning of genes involved in pathogenicity of Xanthomonas campestris
pv. campestris using the broad host range cosmid pLAFRI. EMBO J 1984,
3'3323-3328
92 Anders S, Huber W Differential expression analysis for sequence count
data. Genome Biol 2010, 11 'R106
93 Reimers M, Carey VJ Bioconductor: an open source framework for
bioinformatics and computational biology. Methods Enzymol 2006,
411 119-134
94 Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ
Gapped BLAST and PSI-BLAST: a new generation of protein database
search programs. Nucleic Acids Res 1997, 25 3389-3402
95 Bindea G, Mlecnik B, Hackl H, Charoentong P, Tosolini M, Kirilovsky A,
Fridman WH, Pages F, Trajanoski Z, Galon J ClueGO: a Cytoscape plug-in
to decipher functionally grouped gene ontology and pathway
annotation networks. Bioinformotics 2009, 25 1091-1093
96 Shannon P, Markiel A, Ozier 0, Baliga NS, Wang JT, Ramage D, Amin N,
Schwikowski B, Ideker T Cytoscape: a software environment for
integrated models of biomolecular interaction networks. Genome Res
2003, 13 2498-2504
97 Livak KJ, Schmittgen TD Analysis of relative gene expression data using
real-time quantitative PCR and the 2(-Delta Delta C(T)) Method.
Methods 2001, 25402-408

doi:10.1186/1471-2164-13-629
Cite this article as: Kogenaru et al RNA-seq and microarray
complement each other in transcriptome profiling. BMC Genomics 2012
13629


Submit your next manuscript to BioMed Central
and take full advantage of:

* Convenient online submission
* Thorough peer review
* No space constraints or color figure charges
* Immediate publication on acceptance
* Inclusion in PubMed, CAS, Scopus and Google Scholar
* Research which is freely available for redistribution

Submit your manuscript at nt
www.biomedcentral.com/submit ItlIa- d Central




Full Text

PAGE 1

D A E C F B


!DOCTYPE art SYSTEM 'http:www.biomedcentral.comxmlarticle.dtd'
ui 1471-2164-13-629
ji 1471-2164
fm
dochead Research article
bibl
title
p RNA-seq and microarray complement each other in transcriptome profiling
aug
au id A1 snm Kogenarufnm Sunithainsr iid I1 email skogenaru@ufl.edu
A2 QingYanyanqing@ufl.edu
A3 GuoYinpingyinping@ufl.edu
A4 ca yes WangNiannianwang@ufl.edu
insg
ins Citrus Research and Education Center, Department of Microbiology and Cell Science, Institute of Food and Agricultural Sciences, University of Florida, 700 Experiment Station Road, Lake Alfred, 33850, USA
source BMC Genomics
section Transcriptomicsissn 1471-2164
pubdate 2012
volume 13
issue 1
fpage 629
url http://www.biomedcentral.com/1471-2164/13/629
xrefbib pubidlist pubid idtype doi 10.1186/1471-2164-13-629pmpid 23153100
history rec date day 18month 7year 2012acc 5112012pub 15112012
cpyrt 2012collab Kogenaru et al.; licensee BioMed Central Ltd.note This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
kwdg
kwd RNA-seq
Microarray
Transcriptome profiling
Pathogenic bacteria
Virulence
Type 3 secretion system
Effectors
HrpX
Xanthomonas
Citrus canker disease
abs
sec
st
Abstract
Background
RNA-seq and microarray are the two popular methods employed for genome-wide transcriptome profiling. Current comparison studies have shown that transcriptome quantified by these two methods correlated well. However, none of them have addressed if they complement each other, considering the strengths and the limitations inherent with them. The pivotal requirement to address this question is the knowledge of a well known data set. In this regard, HrpX regulome from pathogenic bacteria serves as an ideal choice as the target genes of HrpX transcription factor are well studied due to their central role in pathogenicity.
Results
We compared the performance of RNA-seq and microarray in their ability to detect known HrpX target genes by profiling the transcriptome from the wild-type and the it hrpX mutant strains of γ-Proteobacterium Xanthomonas citri subsp. citri. Our comparative analysis indicated that gene expression levels quantified by RNA-seq and microarray well-correlated both at absolute as well as relative levels (Spearman correlation-coefficient, rsub s > 0.76). Further, the expression levels quantified by RNA-seq and microarray for the significantly differentially expressed genes (DEGs) also well-correlated with qRT-PCR based quantification (rs = 0.58 to 0.94). Finally, in addition to the 55 newly identified DEGs, 72% of the already known HrpX target genes were detected by both RNA-seq and microarray, while, the remaining 28% could only be detected by either one of the methods.
Conclusions
This study has significantly advanced our understanding of the regulome of the critical transcriptional factor HrpX. RNA-seq and microarray together provide a more comprehensive picture of HrpX regulome by uniquely identifying new DEGs. Our study demonstrated that RNA-seq and microarray complement each other in transcriptome profiling.
bdy
Background
Transcriptome of an organism represents the entire repertoire of transcripts encoded by the genes as a phenotypic response to the condition in which they exist. The sheer ability to simultaneously quantify the expression levels for a vast number of genes has revolutionized the biomedical research, facilitating the analysis of global gene expression patterns at the genome-wide scale abbrgrp
abbr bid B1 1
. In the past decade, there has been a tremendous progress in the development of methods to deduce and quantify the gene expression levels at the whole transcriptome level
1
. Among the several transcriptome profiling methods, RNA-seq and DNA microarray stand out as the two widely used genome-wide gene expression quantification methods
1
B2 2
B3 3
B4 4
B5 5
B6 6
B7 7
B8 8
B9 9
B10 10
B11 11
B12 12
B13 13
B14 14
B15 15
B16 16
B17 17
.RNA-seq method involves the conversion of isolated transcripts into the complementary DNA (cDNA), which is then directly sequenced in a massively parallel deep-sequencing-based approach
B18 18
. By mapping the resulting short sequencing reads onto the reference genome, the expression levels of genes relative to the condition of interest or absolute levels can be quantified
9
11
. This method has been implemented in different platforms like Illumina’s Genome Analyzer, Roche 454 Genome Sequence, and Applied Biosystems’ SOLiD
4
. On the other hand, microarray is based on the hybridization of specimen target strands onto the immobilized complementary probe strands. For example, in a two-color microarray, transcripts extracted from different conditions are labeled with distinct fluorescent dyes while being converted to cDNA. These labeled samples are then hybridized to the immobilized complementary probe strands in an array representing the genes. By measuring the light intensity of the distinct fluorescent dyes, the relative abundance of each transcript in the two different conditions can be measured
8
12
13
17
B19 19
B20 20
. Affymetrix and Agilent are the two prevalent platforms in microarray technology
2
14
.Even though, initially microarray has been instrumental in whole transcriptome analysis, currently RNA-seq is becoming a preferred method of choice, since it is considered to effectively surmount the limitations of microarray
1
B21 21
B22 22
B23 23
. RNA-seq technology, unlike microarray, does not depend on the prerequisite knowledge of the reference transcriptome
B24 24
. Further, RNA-seq data contains very low background signal, a higher dynamic range of expression levels, and also relatively small amount of total RNA required for quantification, when compared to microarray
1
23
. Despite these advantages, the efficiency of RNA-seq is marred with the problem of overwhelming amount of ribosomal RNA (rRNA) in the data, short reads, less base accuracy, and variation of read density along the length of the transcript, posing a challenge for this high-throughput method
21
B25 25
B26 26
. However, in spite of their strengths and limitations, RNA-seq and microarray have become the default popular methods of choices for genome-wide transcriptome studies
1
2
23
.Currently several studies have been conducted to compare the performance of RNA-seq and microarray in quantifying the expression level of genes, by focusing on various aspects like reproducibility, accuracy, statistical issues, technical and biological variabilities
1
15
21
B27 27
B28 28
B29 29
B30 30
. The main conclusion from these studies has been that the expression levels quantified by these two methods correlated to a large extent, and overall favored the RNA-seq because of high reproducibility, accuracy, and dynamic range
27
29
. However, none of these comparison studies have addressed if these two methods complement each other in transcriptome profiling given the strengths and limitations associated with them. In order to address this question, we require an already well characterized dataset. The HrpX regulome from Xanthomonas citri subsp. citri (Xcc) serves as an ideal data model in this regard
B31 31
B32 32
B33 33
. Xcc is a causal agent of citrus canker, one of the serious and destructive diseases in citrus that is resulting in significant losses to citrus industry worldwide
B34 34
, while HrpX is a key global transcription factor that regulates the expression of hrp (hypersensitive response and pathogenicity) cluster of genes, which are considered as the major pathogenicity factors
31
B35 35
. HrpX contains AraC-type of DNA binding domain, which specifically recognizes the plant-inducible promoter (PIP) box (TTCGC-N15-TTCGC) and imperfect PIP box (TTCGC-N8-TTCGT) present in the cis-regulatory regions of hrp gene cluster
B36 36
B37 37
B38 38
. Since HrpX has a key role in pathogenicity, tremendous progress has been made in cataloguing the target genes of HrpX
B39 39
B40 40
B41 41
B42 42
B43 43
B44 44
B45 45
. We therefore assessed the performance of RNA-seq and microarray in their ability to detect known HrpX target genes. We chose Illumina and Agilent as the corresponding platforms for RNA-seq and microarray, as they are the most popular platforms for these technologies
2
4
.
Results
In order to uncover the regulome of HrpX transcription regulator by profiling the wild-type and the hrpX mutant strains transcriptome, we had designed a microarray chip covering the whole genome under Agilent platform in our previous study
33
. Here, we conducted genome-wide transcriptome profiling of these two strains by RNA-seq and compared the results to the previously published microarray data, to assess the performance of these two methods. Further, to avoid technical variation associated with RNA isolation, we used the aliquots from the same total RNA samples used for microarray experiments also for RNA-seq.We obtained 16,431,283, 17,289,220, 18,124,120 sequence reads for the wild-type and 15,084,955, 17,831,920, and 18,115,115 for the hrpX mutant strain with a median sequence length of 74-base pairs (bp) (Additional file supplr sid S1 1: Table S1). Raw reads often have high sequencing errors, especially in the 3sup ′ end where there is a high chance of sequencing errors to occur
B46 46
. We therefore filtered the reads for high quality ones by trimming off the base pairs with low quality score assigned to them during down-line processing of RNA-seq. More than 90% of the reads passed the quality filter, as a result, the median sequence length of quality filtered reads subsequently dropped to 68-bp (Additional file 1: Table S1). We then mapped these high quality trimmed reads on to the Xcc genome. Approximately more than 90% of the reads could be mapped on to the reference genome, indicating good sequence coverage (Additional file 1: Table S1). Overall ~97% of the annotated genes had more than one read mapped, while merely ~3% of the annotated genes had no reads mapped, indicating good sequencing depth. Further, we also observed a difference in the sequence coverage between the chromosome and the two endogenous plasmids of Xcc. Annotated coding genes from the chromosome with a size of 5.18 mega base pairs (Mb) had 98% sequence coverage, whereas, it was 78% for plasmid pXAC64 with a size of 0.06 Mb, and relatively lower with only 62% sequence coverage for plasmid pXAC33 with a size of 0.03 Mb (Additional file 1: Table S2).
suppl
Additional file 1
text
b The following excel format file contains the following 8 additional tables: Table S1: Summary of RNA-seq reads from wild-type and hrpX mutant strains of X. citri subsp. citri. Table S2: List of genes that are called by both RNA-seq and microarray. Table S3: List of genes that are uniquely called by RNA-seq and microarray. Table S4: List of statistically significant differentially expressed genes by RNA-seq and microarray filtered by cut-off thresholds. Table S5: List of randomly selected genes for the comparison with qRT-PCR from the statistically significant differentially expressed genes from RNA-seq and microarray. Table S6: Gene specific primers used in qRT-PCR experiment. Table S7: Summary of bioinformatics analysis of statistically significant differentially expressed genes to be part of Type III Secretion System (T3SS) and Type II Secretion System (T2SS) along with the occurrence of PIP box. Table S8: List of 90 transcriptional units from X. citri subsp. citri to which the 106 differentially regulated genes belong.
file name 1471-2164-13-629-S1.xlsx
Click here for file
Comparison at absolute levels of expression
RNA-seq had coverage for 4323 genes with one or more reads mapped, while by microarray 4349 genes were assigned the fluorescence intensity values after the background correction. Among these 4312 genes (~99% of the total genes) were common to both methods, while merely 37 (0.8%) and 11 genes (0.2%) were uniquely called by microarray and RNA-seq respectively (Additional file 1: Tables S2 and S3; Additional file S2 2: Figure FS1). We compared the absolute levels of gene expression in terms of RNA-seq counts and microarray fluorescence intensities for all the listed genes called by both the methods. These two independent measures of transcript abundance associated with each gene for all the biological replicates from the wild-type and the hrpX mutant strains were compared separately. The resulting correlation was mapped as a scatter plot, with an average number of counts from Illumina sequencing against the normalized fluorescence intensities from Agilent arrays for each gene in the wild-type (Figure figr fid F1 1A) as well as in the hrpX mutant (Figure 1B). Absolute levels of gene expression correlated well, when estimated in terms of Spearman’s correlation coefficient (rs) with 0.78 (p-value < 0.0001) for the wild-type and 0.80 (p-value < 0.0001) for the hrpX mutant strain. This is in agreement with the previous reports that expression levels measured by microarray and RNA-seq had correlations ranging between 0.62 and 0.8 for prokaryotic and eukaryotic datasets
18
28
29
. However, there seems to be little or no correlation for the genes with low level of expression. We further estimated the correlation for the subset of genes with fluorescence intensity values ≤100 assigned by microarray (~360 genes) with the corresponding expression levels determined by RNA-seq. This subset of genes revealed a very poor rs of 0.2 (p-value <0.0002) and 0.3 (p-value <0.0001) for the wild-type and the hrpX mutant strains respectively. Although the expression levels of these genes did not change much according to microarray, RNA-seq reported them to have different expression levels. This may be attributed to the high sensitivity of RNA-seq method.
Additional file 2
Contains the following two additional figures, Figure FS1: Venn diagram summarizing genes called by both technologies, when comparison is carried out between the total currently annotated open reading frames (ORFs) available transcripts from the transcriptome of X. citri subsp. citri. Fold-change values are available from RNA-seq (4323) and microarray (4349). Gene’s called by both technologies are indicated by the overlap between the two circles. 4312 are found in consensus, while 11 and 37 are unique to RNA-seq and microarray respectively. Figure FS2: Venn diagram summarizing genes that are significantly differentially expressed determined by RNA-seq and microarray. Gene’s common to both methods are indicated by the overlap between the two circles.
1471-2164-13-629-S2.pdf
Click here for file
fig Figure 1caption Comparison of absolute levels of gene expression by RNA-seq and microarray
Comparison of absolute levels of gene expression by RNA-seq and microarray. Upper panel shows for the (A) wild-type and (B) hrpX mutant, the correlation between normalized fluorescence intensities of Agilent microarray with the RNA-seq counts from Illumina. Each dot represents the average values for each gene from all the biological replicates. Spearman’s correlation coefficient (rs) is indicated for each comparison. Lower panel shows rs between normalized fluorescence intensities of Agilent microarray with the RNA-seq counts from Illumina for all the combination of biological replicates for the (C) wild-type, and (D) the hrpX mutant. The rs values are plotted in the form of a heat map, where green color represents low rs value, while red represents highest rs value. The dendrogram provides a hierarchical clustering.
graphic 1471-2164-13-629-1 We further estimated the correlation between all the combinations of biological replicates for the wild-type and the hrpX mutant strains independently. The resulting rs values of these comparisons are represented in the form of heat maps, for the wild-type (Figure 1C) and the hrpX mutant strains (Figure 1D), which provide a global view of these correlations. Overall, on an average the wild-type with rs = 0.76 (p-value < 0.0001) and the hrpX mutant with rs = 0.78 (p-value < 0.0001) were observed for the biological replicates from all the correlation combinations. This level of comparison strongly suggested that not only the absolute level of gene expressions determined by RNA-seq and microarray highly correlated, but were also highly reproducible, in spite of the technical as well as the biological variability associated with the quantifications.
Comparison at relative levels of expression
We also compared the performance of these two methods at relative level of gene expression. For this purpose, we first computed the relative expression level of genes in terms of fold-change (FC) for the hrpX mutant in relation to the wild-type strain, along with p-values to denote the statistical significance and false discovery rate (FDR), for having a good control over the false positives rate. We compared the relative expression levels for 4312 consensus genes both qualitatively and quantitatively, after transforming the FC values to logarithm base 2 (log2) scale without any statistical cut-off thresholds (Additional file 1: Table S2). For the 2587 (~60% of the consensus) genes, the expression levels agreed qualitatively, while 1725 (~40%) genes disagreed between the two methods (Figure F2 2A). At this point, our comparison was exclusively focused on whether the gene of interest is up- or down-regulated based on the sign of the log2 transformed FC values, but not necessarily on the FC magnitude. We further illustrated the quantitative relationship of log2FC between RNA-seq and microarray in the form of a scatter plot as shown in Figure 2B. Genes with no change in expression levels in the wild-type and the hrpX mutant strains (FC = 1) clustered around log2FC of zero (log2 of one is zero) in the scatter plot (Figure 2B). The rs between the log2FCs determined by RNA-seq and microarray was found to be 0.30 (p-value < 0.0001) (Figure 2B). This lower correlation value indicated that the magnitude of FCs between the two methods differed largely that might be due to the background noise resulting from the many imperfections, which are inherent to the high-throughput technologies
B47 47
B48 48
.
Figure 2Qualitative and quantitative comparison of relative levels of gene expression in the hrpX mutant with respect to the wild-type strain, determined by RNA-seq and microarray methods
Qualitative and quantitative comparison of relative levels of gene expression in the hrpX mutant with respect to the wild-type strain, determined by RNA-seq and microarray methods. (A) Venn diagram showing the qualitative agreement in the log2 fold-change values for expression of 4312 genes by RNA-seq and microarray. (B) Scatter plot showing the relative expression levels of genes in terms of log2FCs, determined by RNA-seq and microarray. Correlation between the two methods is shown by Spearman’s correlation coefficient (rs). (C) Frequency histogram showing the percentage of genes with the fold difference between RNA-seq and microarray, with a bin width of 0.5. The lower panel D, E, and F are same as A, B, and C respectively, but for only those genes that have passed the statistical cut-off threshold (FDR ≤ 5% and absolute log2 fold-change ≥ 0.6).
1471-2164-13-629-2 The correlation coefficient provides an overall estimate of correlation between the expression levels determined by RNA-seq and microarray methods. However, this does not zoom into the data in a detailed manner. For instance, no information is provided about how much of FC magnitude that actually differs between the two methods for a given gene. In order to get an insight into this aspect, we computed the fraction of genes deviating in their FC magnitude values by dividing the FC magnitude value determined by RNA-seq with that of microarray (Figure 2C). Here, the fold difference of one represents the fraction of genes that are determined to have a FC magnitude of ± 0.5 (bin width) by both RNA-seq and microarray methods. When we plotted this frequency as a histogram for the whole 4312 consensus genes, more than 75% of genes were found to have FC magnitude values ± 0.5 by RNA-seq and microarray methods. Since it is a relative expression comparison, genes whose expression values did not change much in the wild-type and the hrpX mutant strains, tend to have FC values = 1. Subsequently, it is more sensible to consider only differentially expressed set of genes for further comparisons.We therefore applied FDR ≤ 0.05 (5%) in conjunction with FC (absolute log2FC ≥ 0.6) to filter the whole data set. In total, 87 (2%) genes from RNA-seq and 64 (1.5%) from microarray qualified at this cut-off threshold from the 4312 consensus genes (Additional file 1: Table S4). Together, 106 genes satisfied our selection criterion from both the methods (Additional file 1: Table S4). Among them 84 (79.2%) genes were up-regulated, while 22 (20.8%) genes were found to be down-regulated. Further, 45 (~42.45%) genes were common between both the methods, whereas, 42 (39.63%) and 19 (~17.92%) genes were uniquely detected by RNA-seq and microarray respectively (Additional file 1: Table S4; Additional file 2: Figure FS2). We further compared the FC values of the 45 consensus genes both qualitatively and quantitatively. These genes qualitatively agreed 100% by having the same trend of log2 transformed FC values by both RNA-seq and microarray (Figure 2D). Likewise the quantitative comparison was performed by estimating the correlation between the magnitude of log2FC determined by RNA-seq and microarray for the 45 consensus genes as shown in Figure 2E. The magnitude of FC values between the two methods were found to be well correlated (rs = 0.76, p-value < 0.0001), indicating that the same trend of variation was observed in FC values between the two methods without any dispersion. Thereby, the magnitude of FC values determined by RNA-seq and microarray agreed to a large extent for the 45 consensus genes. In order to further pinpoint the deviation in the FC magnitude quantified by the two methods, we plotted the differences in the FC values determined by RNA-seq with respect to microarray, and the percentage of genes with that difference for the 45 consensus genes (Figure 2F). Majority of the genes (~98%) were found to have a magnitude of FC within the range of ≤ 1.5, while for the remaining 2% of the genes, it was 4.7-times higher in RNA-seq than the microarray based quantification. Based on these comparisons, we concluded that the relative gene expression levels quantified by RNA-seq and microarray were consistent to a large extent for the statistically differentially expressed set of consensus genes.
Comparison with qRT-PCR
Traditionally, quantitative Reverse Transcription PCR (qRT-PCR) is used to validate the gene expression levels quantified by high-throughput technologies like RNA-seq and microarray
B49 49
. Therefore, we compared the relative expression levels quantified by RNA-seq and microarray by qRT-PCR for a subset of 43 (40.6%) genes (Additional file 1: Table S5) that were randomly selected from the 106 significantly differentially expressed genes. Among them, 19 genes were found to be common between both the methods, 12 genes were unique to RNA-seq, while remaining 12 genes were found to be unique to microarray (Additional file 1: Table S4). The expression levels were found to be highly reliable for genes that are determined to be significantly differentially expressed by RNA-seq (rs = 0.94; p-value < 0.0001) as well as microarray (rs = 0.97; p-value < 0.0001). For the consensus genes, microarray had a slightly higher correlation with qRT-PCR than RNA-seq (Figures F3 3A and 3B).
Figure 3Comparison of expression levels quantified by RNA-seq and microarray with qRT-PCR
Comparison of expression levels quantified by RNA-seq and microarray with qRT-PCR. (A) Comparison of expression levels determined by RNA-seq with qRT-PCR. (B) Comparison of expression levels determined by microarray with qRT-PCR (C) Frequency histogram showing the percentage of genes with the fold difference between RNA-seq and qRT-PCR, with a bin width of 0.5. (D) Frequency histogram showing percentage of genes with the fold difference between microarray and qRT-PCR, with a bin width of 0.5.
1471-2164-13-629-3 We further plotted the percentage of genes that deviated in the magnitude of FC quantified by RNA-seq and microarray with respect to qRT-PCR (Figures 3C and 3D). For most of the genes, the magnitude of FC quantified by RNA-seq and microarray were relatively higher, when compared to qRT-PCR (fold difference >1). Overall, the magnitude of FC quantified by RNA-seq was in consistence with qRT-PCR based quantification (Figure 3C). For microarray, the magnitude of FC was observed to be consistent with qRT-PCR for a majority of genes, however, we also noticed outlier genes with a 9-times higher FC magnitude (Figure 3D).For the subset of 12 genes that were found to be uniquely determined by RNA-seq, the magnitude of FC quantified by RNA-seq correlated moderately with qRT-PCR (rs = 0.58; p-value 0.05) (Figure F4 4A; Additional file 1: Table S5). The 12 genes found to be uniquely detected by microarray had a correlation of rs = 0.92 (p-value 0.002) with qRT-PCR (Figure 4B; Additional file 1: Table S5). These correlations are slightly lower when compared to the consensus genes (rs ≥ 0.94). This indicated that the expression levels are more reliable for the genes that are determined to be significantly differentially expressed by both RNA-seq and microarray rather than by any one method. Moreover, it also indicated that there is a lot of variation in the magnitude of FC quantified by RNA-seq and qRT-PCR. We further evaluated this variation i.e. deviation from the magnitude of FC, by plotting the frequency histogram for the 12 genes unique to RNA-seq (Figure 4C) and microarray (Figure 4D). For the genes unique to RNA-seq, we observed that none of them had the same magnitude of FC, with 50% genes having 0 to 0.5-time lower and for the remaining 50% of the genes the magnitude of FC was observed to be 2 to 3-time higher, when compared to qRT-PCR (Figure 4C). Because of this inconsistence in the magnitude of FC, the expression levels are moderately correlated. For the genes unique to microarray, we observed a good consistence in the magnitude of FC with qRT-PCR (Figure 4D).
Figure 4Comparison of expression levels of genes that are uniquely determined by RNA-seq and microarray with that of qRT-PCR
Comparison of expression levels of genes that are uniquely determined by RNA-seq and microarray with that of qRT-PCR. Expression levels for the set of selected genes quantified by (A) RNA-seq, and (B) microarray with that of qRT-PCR. Frequency histogram showing percentage of genes deviating from the magnitude of FC, quantified by RNA-seq (C), and microarray (D) with respect to qRT-PCR. Bin width of 0.5 and 0.05 are used respectively.
1471-2164-13-629-4
Comparison in terms of detection of genes encoding T3SS and effectors
Extensive and detailed studies have been carried out since past three decades in cataloguing the target genes of HrpX in the genus Xanthomonas using various genetic and biochemical methods
32
38
39
B50 50
B51 51
B52 52
B53 53
B54 54
B55 55
. HrpX is known to regulate hrp gene cluster that encodes the type III secretion system (T3SS) and effectors
31
B56 56
. T3SS are specialized macromolecular machinery that act as a nano-injector to translocate the effector proteins into the cytoplasm of host plant cells
50
. These translocated effectors manipulate the host cellular processes by altering signal transduction, transcriptional activities like suppression of basal plant defense responses, and protein turnover in host cells for the benefit of the pathogen
50
. The T3SS machineries are evolutionarily conserved across many Gram-negative animal- and plant-pathogenic bacteria
B57 57
.Xcc is comprised of 25 hrp genes, including 19 hrp-conserved (hrc) and 6 hrp-associated (hpa) genes that encode the T3SS
B58 58
. These genes are clustered in a ~25 kb region spanning from 462712 to 488334 bp of the genome
32
. We applied statistically significant differentially expressed gene list that were derived from RNA-seq and microarray methods to this cluster. We counted for the number of known hrp cluster genes, which passed the FC and FDR cut-off thresholds from RNA-seq and microarray methods (Table tblr tid T1 1). Among the 25 hrp cluster genes, 16 (64%) were detected by both RNA-seq and microarray methods. Six genes were found to be uniquely detected by microarray, whereas, none uniquely detected by RNA-seq (Table 1). Three genes namely, hrcC, hpa2, and hpaA could not pass our statistical cut-off criteria by any of the methods, although they followed the same qualitative expression pattern. We further quantified the deviation in the magnitude of FC for the 16 known hrp genes, found in consensus between RNA-seq and microarray (Figure F5 5A). The magnitude of FC for 5% genes found to be same, while for the remaining 95% genes it was found to be between 1.2 to 1.8-time higher in RNA-seq than in microarray. Even though, microarray overall detected more genes from hrp cluster, RNA-seq reported higher magnitude of FC (Table 1).
table
Table 1
Summary of Type III secretion system (T3SS) hrp cluster genes detected by RNA-seq and microarray
tgroup align left cols 9
colspec colname c1 colnum 1 colwidth 1*
c2 2
c3 3
c4 4
c5 5
c6 6
c7 7
c8 8
c9
thead valign top
row
entry
Locus Tag
Gene Symbol
center nameend namest rowsep
RNA-seq
Microarray
Detected by
log
2
FC
p-value
FDR
log
2
FC
p-value
FDR
tfoot
† Consensus between RNA-seq and microarray16 genes (64%).ξ Only RNA-seq 0 gene (0%).ψ Only microarray 6 genes (24%).$ Undetected 3 genes (12%).The list of known T3SS hrp cluster genes along with the information about the log2FC, p-value and FDR value from RNA-seq and microarray experiments. "Detected by" column assigns by which method the known hrp cluster gene is found to be significantly differentially expressed.
tbody
XAC0412
hrcN
−2.2355
7.66E-09
1.27E-06
−0.8217
0.00E+00
0.00E+00

XAC0409
hrcJ
−3.1488
4.20E-15
1.40E-12
−2.1840
0.00E+00
0.00E+00

XAC0406
hrcU
−2.6729
1.01E-13
2.56E-11
−1.0507
0.00E+00
0.00E+00

XAC0405
hrcV
−1.5755
3.27E-08
4.88E-06
−0.8427
4.00E-05
2.58E-03

XAC0407
hrpB1
−3.9638
7.53E-28
3.61E-25
−2.8603
0.00E+00
0.00E+00

XAC0408
hrpB2
−2.8155
2.48E-09
4.67E-07
−1.9507
0.00E+00
0.00E+00

XAC0410
hrpB4
−1.9274
7.42E-05
6.05E-03
−1.5237
0.00E+00
0.00E+00

XAC0403
hrcQ
−2.0615
6.36E-07
7.64E-05
−0.8647
0.00E+00
0.00E+00

XAC0402
hrcR
−1.7123
4.73E-06
5.53E-04
−0.8677
0.00E+00
0.00E+00

XAC0399
hrpD5
−1.7287
2.67E-10
5.50E-08
−1.5487
0.00E+00
0.00E+00

XAC0398
hrpD6
−1.5446
5.85E-07
7.43E-05
−1.6347
0.00E+00
0.00E+00

XAC0397
hrpE
−2.1535
7.67E-16
2.76E-13
−1.8163
0.00E+00
0.00E+00

XAC0394
hrpF
−2.0517
7.04E-15
2.17E-12
−1.2113
0.00E+00
0.00E+00

XAC0416
hpa1
−5.0096
1.49E-51
1.29E-48
−4.2917
0.00E+00
0.00E+00

XAC0396
hpaB
−1.5429
1.56E-07
2.11E-05
−1.2527
0.00E+00
0.00E+00

XAC0393
hpaF
−0.9235
1.96E-04
1.34E-02
−0.6083
1.00E-05
8.10E-04

XAC0411
hrpB5
−1.9690
3.92E-03
1.38E-01
−0.8457
0.00E+00
0.00E+00
ψ
XAC0401
hrcS
−1.0156
4.00E-02
4.71E-01
−1.0837
0.00E+00
0.00E+00
ψ
XAC0404
hpaP
−1.2673
1.10E-02
2.74E-01
−1.1110
0.00E+00
0.00E+00
ψ
XAC0395
XAC0395
−1.0817
4.73E-02
5.08E-01
−0.8640
0.00E+00
0.00E+00
ψ
XAC0415
hrcC
−0.2713
2.49E-01
8.27E-01
−0.4053
4.75E-03
1.78E-01
$
XAC0413
hrpB7
−1.2107
1.08E-02
2.74E-01
−0.6590
0.00E+00
0.00E+00
ψ
XAC0417
hpa2
−1.4130
8.33E-03
2.31E-01
−0.4610
4.40E-04
2.64E-02
$
XAC0400
hpaA
−1.0585
2.76E-03
1.10E-01
−0.7763
9.74E-03
2.05E-01
$
XAC0414
hrcT
−1.0474
1.10E-02
2.74E-01
−0.6800
0.00E+00
0.00E+00
ψ
Figure 5Comparison of expression levels of genes encoding T3SS and effectors that are commonly detected by RNA-seq and microarray
Comparison of expression levels of genes encoding T3SS and effectors that are commonly detected by RNA-seq and microarray. (A) Frequency histogram showing percent of genes deviating from the magnitude of FC quantified by RNA-seq with respect to microarray for hrp gene cluster. Bin width of 0.2 is used. (B) Frequency histogram showing percentage of genes deviating from the magnitude of FC quantified by RNA-seq with respect to microarray for the T3SS and effector genes. Bin width of 0.5 is used.
1471-2164-13-629-5 Xcc also encodes 25 putative effector genes regulated by HrpX, which meditate the interaction with the host plant, hence determine the host specificity
55
. Since XAC2785, XAC1210 and XAC1209 were considered as pseudo or inactive genes, they were excluded from our analysis. We tabulated how many of these genes were detected by RNA-seq and microarray methods with their corresponding log2FC values along with p-value and FDR from the respective methods (Table T2 2). In total, 10 (45.5%) genes were detected by both the methods. RNA-seq and microarray uniquely detected one and three genes respectively. The remaining 9 genes (36.4%) were neither detected by RNA-seq nor by microarray, since they could not pass both the FC and FDR cut-offs (Table 2). For the 10 consensus genes, we calculated the fold differences in the magnitude of FC quantified by RNA-seq with respect to microarray. None of the genes had the same magnitude of FC between the two methods. Microarray estimated higher magnitude of FC for ~64% genes than RNA-seq, while RNA-seq estimated 1.2 to 1.8-time higher magnitude of FC for the remaining ~36% genes (Figure 5B). In contrast to hrp gene cluster, where microarray qualitatively outperformed RNA-seq in its ability to detect more genes, here RNA-seq complemented quantitatively with higher confidence by reporting higher magnitude of FCs. Thereby, for the effector gene data set, RNA-seq and microarray complemented each other both qualitatively as well as quantitatively.
Table 2
Summary of Type III effector genes detected by RNA-seq and microarray
Locus Tag
Gene Symbol
RNA-seq
Microarray
Detected by
log
2
FC
p-value
FDR
log
2
FC
p-value
FDR
† Consensus between RNA-seq and microarray 10 genes (45.5%).ψ Only microarray 3 genes (13.6%).ξ Only RNA-seq 1 genes (4.5%).$ Undetected 8 genes (36.4%).The list of known T3SS effector genes along with the information about the log2FC, p-value and FDR value derived from RNA-seq and microarray experiments. "Detected by" column assigns by which method the following effector genes are found to be significantly differentially expressed.
XAC0286
xopE
−1.3432
1.21E-07
1.69E-05
−1.1813
0.00E+00
0.00E+00

XACb0011
avrXacE3
−0.9098
2.20E-04
1.51E-02
−1.6363
0.00E+00
0.00E+00

XAC0754
xopI
−1.1830
5.40E-04
3.11E-02
−0.7287
0.00E+00
3.00E-05

XAC3085
xopK
−2.1505
0.00E+00
0.00E+00
−1.7157
0.00E+00
0.00E+00

XAC2786
xopN
−3.5117
0.00E+00
0.00E+00
−2.9897
0.00E+00
0.00E+00

XAC1208
xopP
−1.1860
1.00E-05
6.20E-04
−0.7583
0.00E+00
0.00E+00

XAC0277
xopR
−1.1202
6.00E-05
5.10E-03
−1.2360
0.00E+00
0.00E+00

XAC0543
xopX
−3.0765
0.00E+00
0.00E+00
−3.4423
0.00E+00
0.00E+00

XAC3230
xopAI
−1.4353
0.00E+00
0.00E+00
−1.1510
0.00E+00
0.00E+00

XAC2922
hrpW
−2.0833
0.00E+00
0.00E+00
−2.7723
0.00E+00
0.00E+00

XAC4213
xopAD
−0.8816
2.60E-04
1.68E-02
−0.3960
2.17E-02
2.72E-01
ξ
XAC4333
xopQ
−0.6779
1.95E-02
3.51E-01
−1.1190
0.00E+00
0.00E+00
ψ
XAC0601
xopV
−0.5197
5.44E-02
5.33E-01
−0.7367
0.00E+00
2.70E-04
ψ
XAC0076
avrBs2
−0.8632
1.79E-03
7.90E-02
−0.566
5.00E-05
3.53E-03
$
XAC3090
xopL
−0.3402
2.58E-01
8.37E-01
−0.6040
0.00E+00
0.00E+00
ψ
XAC3224
xopE
−0.5716
2.45E-02
3.85E-01
−0.2397
2.68E-02
4.48E-01
$
XAC2009
xopZ
−0.6456
9.45E-03
2.56E-01
−0.4160
1.30E-04
9.38E-03
$
XAC3666
xopAK
−0.2756
2.89E-01
8.59E-01
−0.2060
3.23E-01
9.82E-01
$
XACa0022
pthA1
0.0225
8.00E-01
9.80E-01
0.0790
6.69E-01
9.98E-01
$
XACa0039
pthA2
−0.4501
2.09E-01
7.97E-01
0.0413
8.25E-01
9.98E-01
$
XACb0015
pthA3
−1.1700
3.52E-03
1.28E-01
0.0723
7.13E-01
9.98E-01
$
XACb0065
pthA4
−0.3423
8.51E-01
1.00E+00
0.0315
7.23E-01
9.98E-01
$
Overall, considering T3SS and effector genes, in total there are 47 genes, from which, 26 genes (55%) were detected by both RNA-seq and microarray (Tables 1 and 2). RNA-seq uniquely detected 1 gene (2%), whereas, microarray detected 9 genes (19%). Remaining 11 genes (23%) were not detected by either one of the methods by failing to pass the cut-off threshold (Tables 1 and 2). Further, considering only the genes that are detected by at least one method, 72% of the known were detected by both methods, while remaining 28% were detected by either one of the methods.
Genes uniquely detected by RNA-seq and microarray
Among the 87 statistically significant differentially expressed genes from RNA-seq, 42 (39.63%) genes were found to be uniquely detected by this method (Additional file 2: Figure FS2). Of these 42 genes, 17 were found to be down-regulated, while 25 were up-regulated (Additional file 1: Table S4). Nearly 98% of these genes (41 of 42 unique) could not pass the FC cut-off threshold by microarray. The only exception is the gene fliO (XAC1945) that encodes a flagellar protein for flagellum apparatus, which passed the FC cut-off, but failed with FDR threshold. The gene XAC0755 encoding KdpF, a component of an integral membrane potassium-transporting system
B59 59
, is down-regulated by a factor of 3 (log2 FC of 1.6) according to RNA-seq, but, microarray could not capture this, as the probes for this gene were missing on the chip. This shows the limitation of microarray, where probes for all the genes need to be defined while designing the chip. Furthermore, four genes uniquely found by RNA-seq are involved in signal transduction and gene regulation, i.e. XAC4116 encoding a serine/threonine kinase, XAC1819 encoding a tryptophan-rich sensory protein, and two regulatory genes XAC3026, and XAC3363, whose function in citrus canker disease development remain to be explored. Furthermore, 21 genes (24%) are currently annotated as hypothetical proteins (Additional file 1: Table S6). Among them, four hypothetical proteins XAC0854, XAC4131, XAC1203, and XACb0064 were predicted to be T3SS secreted while 7 hypothetical proteins, XAC3275, XAC3680, XAC1943, XAC0527, XAC0599, XAC0239, and XAC0755 were predicted to be Type 2 Secretion System (T2SS) substrates (Additional file 1: Table S6) by Effective database
B60 60
. Gram-negative bacteria employ T2SS to transport proteins to the extracellular milieu, where the T2SS exo-proteins containing N-terminal signal peptides are used for inner-membrane translocation through either the Sec translocon or the Tat complex
B61 61
. Genes encoding proteins secreted by T3SS and T2SS have been experimentally proved to be regulated by HrpX
33
B62 62
B63 63
.Among the 64 statistically significant differentially expressed genes from microarray, 19 (29.7%) genes were found to be uniquely detected by this method (Additional file 2: Figure FS2). 18 were found to be down-regulated, while one gene was up-regulated (Additional file 1: Table S4). Unlike that of RNA-seq, nearly 63% genes (12 of 19 unique) could pass the FC cut-off threshold, but failed to pass the FDR threshold by RNA-seq. The remaining 37% genes (7 of 19 unique) could not pass both FC and FDR cut-off threshold. Furthermore, six genes were found to be hypothetical. Among them XAC2876, XAC1241, and XAC2370 were predicted as T2SS substrates. XAC1241 predicted as a T2SS substrate, shared 73% identity with a putative secreted protein from X. campestris pv. vesicatoria strain 85–10. Another T2SS candidate XAC2370 shared 95% identity with a secreted protein from X. fuscans subsp. aurantifolii str. ICPB 10535. XAC1124 shared 100% identity with MEKHLA domain protein from X. axonopodis pv. punicae str. LMG 859
33
. This domain is found in bacteria associated with plants. It further shares similarity with the PAS domain and might be involved in light, oxygen, and redox potential sensation
B64 64
.
Comparison at the level of functional annotations of genes
For comparison based on the biological function for the differentially expressed genes from RNA-seq and microarray, we utilized the ClueGO to integrate the Gene Ontology (GO)
B65 65
terms and KEGG
B66 66
pathway terms and create a functionally organised GO/KEGG network. Functional annotation with biological processes category resulted in 13 (14.94%) genes found from cluster for RNA-seq, while for microarray it was 12 (19.35%).The ClueGO overview pie chart highlighted that significant proportion of the genes differentially regulated are involved in “protein secretion by the T3SS” by both RNA-seq and microarray (Additional file S3 3: Figure FS3A & D). Additionally, RNA-seq also identified genes involved in “secretion activity by cell” as well as “single organism catabolic process” (Additional file 3: Figure FS3A). On the other hand, microarray highlighted the genes involved in “protein transmembrane transport”, “polycyclic aromatic hydrocarbon degradation” and “establishment of localization in cell” (Additional file 3: Figure FS3D). Majority of the genes are involved in “bacterial secretion system”, as shown by both RNA-seq and microarray. Also the differentially expressed genes are found to be significantly involved in the “transport of monovalent inorganic cation” (Additional file 3: Figure FS3B) and “protein transport” (Additional file 3: FS3E). Genes have also been found uniquely by microarray as significantly involved in “polycyclic aromatic hydrocarbon degradation” (Additional file 3: Figure FS3E). Genes from RNA-seq have been found to be involved in “riboflavin metabolism” as well as “single organism catabolic process” (Additional file 3: Figure FS3B). Further, visualization of the functionally grouped annotation network for the differentially regulated genes derived from RNA-seq (Additional file 3: Figure FS3C) and microarray (Additional file 3: Figure FS3F) methods highlighted the relationships between the terms. RNA-seq highlighted “protein secretion by the T3SS” along with the “small molecule catabolic process”, while microarray reflected “polycyclic aromatic hydrocarbon degradation” and “establishment of localization in cell”, as the most significant terms of the group. This analysis also showed that RNA-seq and microarray together provide more comprehensive functional information than the individual methods.
Additional file 3
Figure FS3 Comparison at the level of the functional annotations of the significantly differentially expressed genes from RNA-seq and microarray. GO term and KEGG pathway information enrichment analysis is shown for the genes from RNA-seq (left panel) and microarray (right panel). The overview of the analysis is shown in the form of pie chart for gene set from RNA-seq (A), and microarray (D). The histogram shows the number of genes associated with terms for the genes from RNA-seq (B) and microarray (E). Significantly enriched terms are indicated with ’*’. The terms that are functionally related are shown as a network with terms as nodes and relatedness is indicated with thickness of the edges that is based on their kappa score. The most significant term per group are shown for genes from RNA-seq (C) and microarray (F).
1471-2164-13-629-S3.pdf
Click here for file
PIP box detection
HrpX is known to regulate the target gene expression by specifically binding to PIP box motif present in the cis-regulatory regions. PIP box consists of direct repeats of “TTCGC” with a spacer of 8 to 26-bps in between the repeats, even though ideally 8-bps and 15-bps are considered as the canonical PIP box
37
. We exploited this feature and looked for PIP boxes in the promoter regions of the 106 significantly differentially expressed genes (Additional file S4 4). All the 106 differentially expressed genes could be assigned to 90 transcriptional units based on MetaCyc database
B67 67
(Additional file 1: Table S8). However for simplicity, genes under the control of the same cis-regulatory regions were counted separately. Among the consensus 45 genes, 36 (80%) were shown to have canonical PIP boxes (Figure F6 6A, Additional file 1: Table S7). Of the 42 genes that are uniquely determined by RNA-seq, 13 (31%) genes were confirmed to have PIP boxes; whereas, among the 19 genes that are uniquely determined by microarray 11 (57.8%) genes were confirmed to have PIP boxes (Figure 6A, Additional file 1: Table S7). In this study, we identified newly PIP box motif in 7 (19.4%) genes among consensus, 13 (100%) genes unique to RNA-seq and 1 (9%) gene unique to microarray (Figure 6B). Overall, 60 of the 106 (~57%) significantly differentially expressed genes were confirmed to have PIP boxes in their cis-regulatory regions (Additional file 1: Table S7, Additional file 4). The presence of PIP box confirmed that these genes may be directly regulated by HrpX, while the remaining 46 that do not have PIP boxes may be indirectly regulated by HrpX via the other transcription factors. In this regard, we looked for genes with sequence specific DNA binding activity in the 106 differentially expressed genes. Six genes namely hrpG, pcaQ, blal, XAC3026, XAC3445, and XAC3446 were known to have sequence specific DNA binding activity according to GO annotation. Among them, XAC3446, XAC3445, and blaI have been newly identified in this study containing PIP box motif (Additional file 1: Table S7, Additional file 4). Thereby these 3 transcription regulators are directly regulated by HrpX, which in turn we assume regulate the 46 genes, which do not contain the PIP box motif and hence indirectly regulated by HrpX. The DNA binding signatures for many of these transcription factors are unknown; hence, obscure the further confirmation of regulation by these transcription factors. Nevertheless, the fact that many of the genes that were uniquely determined by each method showed a clear PIP box in their cis-regulatory regions reiterates that RNA-seq and microarray complement each other.
Additional file 4
Figure FS4 Snapshot of the PIP box motif present in the cis-regulatory region of significantly differentially expressed genes is shown in the context of the whole genome of X. citri subsp. citri. The absolute position of each PIP box motif occurrence is shown on the whole genome map along with the −10 ‘TATA’ regions and the gene start site.
1471-2164-13-629-S4.pdf
Click here for file
Figure 6Statistics of HrpX binding sites in the cis-regulatory regions of significantly differentially expressed genes from RNA-seq and microarray
Statistics of HrpX binding sites in the cis-regulatory regions of significantly differentially expressed genes from RNA-seq and microarray. The genes belonging to consensus and unique to each method are shown (A) Percentage of genes containing PIP box in the cis-regulatory regions of genes that belong to three different groups are shown. The known (black bar) and novel (red bar) are indicated. (B) Only genes with PIP box are shown, percentage of which already known (black bar) and novel (red bar) are indicated.
1471-2164-13-629-6
Discussion
Currently, RNA-seq is becoming the preferable choice for gene expression profiling in place of microarrays. Although, all the parameters that influence the various aspects of this method are yet to be understood completely, RNA-seq undoubtedly is playing a very important role in deciphering the complexity of the transcriptome by giving a new direction to isoforms, allelic expression, untranslated regions, splice junctions, antisense regulation and intragenic expression
10
16
29
B68 68
B69 69
B70 70
B71 71
B72 72
B73 73
B74 74
. Several studies have begun to investigate on the parameters like sequencing depth, precision, GC bias, length bias, lane effects, and processing artifacts
16
29
48
B75 75
B76 76
B77 77
. On the other hand, microarrays are in usage for more than two decades. Therefore, most of the biases inherent to this method have become more apparent
B78 78
. For instance, biases in the hybridization of the samples labeled with Cyanine5 (Cy5) and Cyanine3 (Cy3) are sufficiently explored, and currently several approaches are practiced to minimize such effects
B79 79
B80 80
B81 81
B82 82
. Further, systematic variability like influence of the image scanner settings on the dye intensity measurements have now been robustly handled by applying various normalization techniques
B83 83
B84 84
B85 85
B86 86
. Despite these developments, some inherent genes–specific biases like differential hybridization efficiencies of the labeled target transcript to the same probe are still found to be inevitable in microarrays. In RNA-seq as well as microarray, all these known and unknown parameters influence the final outcome. Therefore, in this study, we focused on the assessment of RNA-seq and microarray based on the final outcome .i.e. statistically significant differentially expressed genes.In comparison with previous RNA-seq studies, with a sequence coverage of 97% we observed for our data set, is in consistence with the reported 89.5% to 95% coverage observed in other bacterial RNA-seq studies
B87 87
B88 88
B89 89
. In our study, RNA-seq has identified more significantly differentially expressed genes (82%), when compared to microarray (63%) as in previous studies
18
29
30
. The overall correlation (rs 0.76) in the magnitudes of FC for the consensus genes between the two methods was found to be similar or higher than previous studies
18
29
30
72
. Furthermore, our comparison analysis with qRT-PCR suggested that the expression levels were highly reliable for those genes that were determined to be differentially expressed by both RNA-seq and microarray. Hence, confirming the differential expression of genes by multiple methods reduces false positives thereby enhances the biological discovery.Even though microarray overall outperformed RNA-seq by detecting more known HrpX target genes from the T3SS in hrp cluster by satisfying both FC and FDR cut-off threshold, in principle RNA-seq also detected genes hrpB5, hrcS, hpaP, XAC0395, hrpB7, and hrcT, in terms of FC, but failed to pass FDR threshold. This parameter is more directly influenced by error model considered in the statistical method that is used to infer the differential expression rather than RNA-seq itself. For the same read counts, one can get slightly different FDR values depending on the statistical method
B90 90
. But the implementation of all the statistical methods is not feasible for every dataset. From the T3SS in hrp cluster, three genes namely, hrcC, hpa2, and hpaA were not found to be detected by both RNA-seq and microarray, mainly because they fail to pass FDR threshold. Interestingly, our previous microarray analysis confirmed that all these three genes are regulated by HrpX, but only at a later stage of the growth phase by satisfying both FC and FDR cut-off thresholds
33
. This consolidates the regulation of some of the genes at later stages of the growth phase. Further, in case of Type III effector genes, 8 genes (36.4%) were not detected by both RNA-seq and microarray within considered cut-off threshold limit. However, among them xopL, avrBs2, xopAK and xopZ were found to be regulated by HrpX only at the later stage of the growth phase (OD600 time point 0.5), according to our previous microarray analysis
33
. Further, four genes namely, pthA2, pthA1, pthA3, pthA4 were regulated by another transcription regulator HrpG at early stage of growth phase (OD600 = 0.25 and 0.4) as observed in our previous study, while another undetected gene xopE was found to be also regulated by HrpG, but only at OD600 = 0.25 time point of growth phase
33
. Thereby this study further validated our previous results. Subsequently, both methods detected 100% of the genes known to be regulated by HrpX (at time point OD600 = 0.4) without any false positives. Among them, 72% were detected by both the methods while interestingly 28% of the known target genes were detected by either one of the methods. Hence, both the methods together could complement each other.In addition 55 genes (~51%) were newly identified as differentially expressed by applying both microarray as well as RNA-seq methods, thereby adding up to the already existing repertoire of HrpX regulated genes. Furthermore, 46 (83.6%) genes among them were uniquely identified by either one of the methods. Overall, 21 newly identified genes were found to have PIP box in their promoter regions, wherein 14 (58.3%) genes were uniquely identified by either RNA-seq or microarray. The presence of the PIP box in the promoter regions of the HrpX-regulated genes uniquely identified by RNA-seq and microarray further not only confirmed that these genes are directly regulated by HrpX, but also that these candidates are not false positives. Consequently, 100% of the known HrpX regulated genes could only be detected together by both the methods, since each method missed out on some of the known genes; hence both the methods together enhance the understanding of HrpX regulome by providing a more comprehensive picture.
Conclusions
This study has significantly advanced our understanding of the regulome of the critical transcriptional factor HrpX and demonstrates that RNA-seq and microarray complement each other in transcriptome profiling. Consequently, our study demonstrates the advantage of applying multiple transcriptome profiling methods to reveal a more comprehensive picture of a transcriptome, rather than relying solely on one method.
Methods
Bacterial strains and growth conditions
The wild-type X. citri subsp. citri
32
, and the hrpX mutant strains used in this study were described in our previous study
33
. Both the strains were grown at 28°C in nutrient broth (NB), on nutrient agar (NA), or in NYG medium
B91 91
. Antibiotics rifamycin and kanamycin were added to the media at 50 μg/ml final concentrations.
RNA extraction
Total RNA was extracted from the wild-type and the hrpX mutant strains as described in our previous study
33
. Briefly, strains from NA plates were grown in NB medium at 28°C until mid-exponential phase. Cultures were harvested by centrifugation and inoculated in to nutrient-deficient XVM2 medium, after washing the pellet once with the same medium. Cultures were finally harvested for RNA extraction, when the optical density at 600 nm reached the value of 0.4, and mixed immediately with RNAprotect bacterial reagent (Qiagen, Valencia, CA, and U.S.A.). Total RNA was extracted from each replicate separately using RiboPure bacteria kit (Ambion, Austin, TX, USA), according to manufacturer’s instructions. Genomic DNA contamination from the extracted RNA samples was removed using TURBO DNA-free kit (Ambion). Amount and the quality of the RNA samples was initially determined using NanoDrop™ 1000 spectrophotometer (NanoDrop Technologies, Inc., Wilmington, DE). Samples with absorbency at 260/280 and 260/230 nm ratios > 2 were subjected to further processing. Three biological replicates of the wild-type and the hrpX mutant samples were used for RNA-seq analysis.
Microarray data
The microarray data used in this study was generated during our previous study
33
. Three unique 60-mer oligonucleotide probes were designed for each of the 4,427 protein coding genes of X. citri subsp. citri
33
. 8-by-15-K DNA microarray chips covering the whole genome were implemented under the Agilent platform. These microarrays were processed at the Interdisciplinary Center for Biotechnology Research Microarray Core Facility, University of Florida. The raw data is available at National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) data repository under the accession number GSE24016
33
.
mRNA enrichment and RNA-seq
Total RNA samples were enriched for mRNA, by depleting rRNA using MICROBExpress kit from Ambion following the manufacturer’s instructions. Enriched samples were checked for integrity using Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA). RNA samples that passed the quality control were sequenced using the Illumina Genome Analyzer IIx (GAIIx) system by following the standard protocol at the Center for Genome Analysis at Yale University. Real-time analysis and base calling were performed using the CASAVA v1.6 pipeline. The raw sequence data has been submitted to the NCBI Sequence Read Archive and assigned with an accession number SRA052842.
Reads mapping and statistical analysis
The X. citri subsp. citri whole genome sequence consisting of one chromosome [GenBank: NC_003919.1], and two plasmids [GenBank: NC_003921.3 and NC_003922.1], along with the annotation information were downloaded from NCBI repository (ftp://ftp.ncbi.nih.gov/genomes/Bacteria/). Quality-filtered reads were aligned on to the genome using CLC Genomics Workbench v4.7.2 (CLC bio, Aarhus, Denmark). Reads uniquely aligned to each gene were tabulated from each replicate separately. Differentially expressed genes were estimated using DESeq package
B92 92
, available under the open-source Bioconductor suite of programs
B93 93
. DESeq is a powerful tool to estimate the variance in RNA-seq data and test for differential expression
92
. As an input, DESeq accepts a table of read counts for each gene from different biological replicates, and estimates the differentially expressed genes using negative binomial distribution
92
. Statistically significant differentially expressed genes from both microarray and RNA-seq data were obtained by applying a cut-off threshold of FDR ≤ 0.05 (5%) and an absolute log2 fold-change ≥ 0.6.
Bioinformatics analysis
Similarity searches were performed online using position-specific iterative BLAST (PSI-BLAST) at NCBI site against non-redundant protein database
B94 94
. T3SS and T2SS predictions were performed using Effective database
60
. The promoter regions of the significantly differentially expressed genes were retrieved manually using NCBI genome browser to look for the presence of PIP boxes. The differentially expressed genes were assigned to the transcriptional units by referring to the MetaCyc database
67
. Biological interpretation of the differentially expressed genes was carried out using the ClueGO v1.5
B95 95
, a Cytoscape plug-in
B96 96
.
qRT-PCR
All the qRT-PCR assays were performed as detailed elsewhere
33
. Briefly, gene-specific primers were designed for the selected genes using PrimerQuestSM from Integrated DNA technologies (IDT), Coralville, Iowa (Additional file 1: Table S6). qRT-PCR experiments were performed in triplicates, at least three times for each gene using 7500 fast real-time PCR system (Applied Biosystems, Foster City, CA, USA), using a QuantiTect SYBR green RT-PCR kit (Qiagen) with similar results, by following the manufacturer’s instructions. The relative fold change of target gene expression was calculated using 16S rRNA as an endogenous control with the formula 2–∆∆CT
B97 97
.
Data availability
The raw RNA-seq data from this study is deposited at the NCBI sequence read archive (http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi), under the accession number SRA052842, while the raw microarray data is available at the NCBI Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo) with the accession number GSE24016.
Competing interests
We declare no competing interests.
Authors’ contributions
NW conceived the idea and initiated the study. SK performed analysis and interpretation of the data and wrote the manuscript. QY carried out the PCR experiments. YG participated in the study design, sample and data collection, and data interpretation. NW participated in interpretation of data and writing the manuscript and supervised the overall work. All authors read and approved the final manuscript.
bm
ack
Acknowledgements
We wish to acknowledge Gabriela Bindea for providing annotation files for ClueGO software. This work was supported by United States Department of Agriculture NIFA Special Citrus Canker Grant Project 94677.
refgrp RNA-Seq: a revolutionary tool for transcriptomicsWangZGersteinMSnyderMNat Rev Genet20091057lpage 63Gene expression analysis, proteomics, and network discoveryBaginskySHennigLZimmermannPGruissemWPlant Physiol2010152402410Stem cell transcriptome profiling via massive-scale mRNA sequencingCloonanNForrestARKolleGGardinerBBFaulknerGJBrownMKTaylorDFSteptoeALWaniSBethelGetal Nat Methods20085613619Uncovering the complexity of transcriptomes with RNA-SeqCostaVAngeliniCDe FeisICiccodicolaAJ Biomed Biotechnol20102010853916Use of a cDNA microarray to analyse gene expression patterns in human cancerDeRisiJPenlandLBrownPOBittnerMLMeltzerPSRayMChenYSuYATrentJMNat Genet199614457460Microarrays: their origins and applicationsEkinsRChuFWTrends Biotechnol199917217218Multiplexed biochemical assays with biological chipsFodorSPRavaRPHuangXCPeaseACHolmesCPAdamsCLNature1993364555556A concise guide to cDNA microarray analysisHegdePQiRAbernathyKGayCDharapSGaspardRHughesJESnesrudELeeNQuackenbushJBiotechniques2000295484556.RNA-seq: from technology to biologyMargueratSBahlerJCell Mol Life Sci201067569579The transcriptional landscape of the yeast genome defined by RNA sequencingNagalakshmiUWangZWaernKShouCRahaDGersteinMSnyderMScience200832013441349RNA-Seq: a method for comprehensive transcriptome analysisNagalakshmiUWaernKSnyderMCurr Protoc Mol Biol2010Chapter 4 Unit 4.11.1-13.DNA chips: state-of-the artRamsayGNat Biotechnol1998164044Quantitative monitoring of gene expression patterns with a complementary DNA microarraySchenaMShalonDDavisRWBrownPOScience1995270467470Evaluation of gene expression measurements from commercial microarray platformsTanPKDowneyTJSpitznagelELsuf JrXuPFuDDimitrovDSLempickiRARaakaBMCamMCNucleic Acids Res20033156765684RNA-sequence analysis of human B-cellsToungJMMorleyMLiMCheungVGGenome Res201121991998Dynamic repertoire of a eukaryotic transcriptome surveyed at single-nucleotide resolutionWilhelmBTMargueratSWattSSchubertFWoodVGoodheadIPenkettCJRogersJBahlerJNature200845312391243cDNA microarray technology and its applicationsXiangCCChenYBiotechnol Adv2000183546Mapping and quantifying mammalian transcriptomes by RNA-SeqMortazaviAWilliamsBAMcCueKSchaefferLWoldBNat Methods20085621628Microarrays and high-throughput transcriptomic analysis in species with incomplete availability of genomic sequencesParisetLChillemiGBongiorniSRomanoSVValentiniAN Biotechnol200925272279Microarrays: biotechnology’s discovery platform for functional genomicsSchenaMHellerRATheriaultTPKonradKLachenmeierEDavisRWTrends Biotechnol199816301306Estimating accuracy of RNA-Seq and microarrays with proteomicsFuXFuNGuoSYanZXuYHuHMenzelCChenWLiYZengRBMC Genomics200910161The beginning of the end for microarrays?ShendureJNat Methods20085585587Next generation sequencing of microbial transcriptomes: challenges and opportunitiesvan VlietAHFEMS Microbiol Lett201030217Protocol dependence of sequencing-based gene expression measurementsRazTKapranovPLipsonDLetovskySMilosPMThompsonJFPLoS One20116e19287Next-generation transcriptome assemblyMartinJAWangZNat Rev Genet201112671682The next-generation sequencing technology and applicationZhouXRenLMengQLiYYuYYuJProtein Cell20101520536Deep sequencing-based expression analysis shows major advances in robustness, resolution and inter-lab portability over five microarray platforms‘t HoenPAAriyurekYThygesenHHVreugdenhilEVossenRHde MenezesRXBoerJMvan OmmenGJden DunnenJTNucleic Acids Res200836e141e141Comparative Analysis of Lactobacillus plantarum WCFS1 Transcriptomes by Using DNA Microarray and Next-Generation Sequencing TechnologiesLeimenaMMWelsMBongersRSSmidEJZoetendalEGKleerebezemMAppl Environ Microbiol20127841414148RNA-seq: an assessment of technical reproducibility and comparison with gene expression arraysMarioniJCMasonCEManeSMStephensMGiladYGenome Res20081815091517Comparing next-generation sequencing and microarray technologies in a toxicological study of the effects of aristolochic acid on rat kidneysSuZLiZChenTLiQZFangHDingDGeWNingBHongHPerkinsRGChem Res Toxicol20112414861493HrpXv, an AraC-type regulator, activates expression of five of the six loci in the hrp cluster of Xanthomonas campestris pv. vesicatoriaWengelnikKBonasUJ Bacteriol199617834623469Comparison of the genomes of two Xanthomonas pathogens with differing host specificitiesda SilvaACFerroJAReinachFCFarahCSFurlanLRQuaggioRBMonteiro-VitorelloCBVan SluysMAAlmeidaNFAlvesLMNature2002417459463HrpG and HrpX play global roles in coordinating different virulence traits of Xanthomonas axonopodis pv. citriGuoYFigueiredoFJonesJWangNMol Plant Microbe Interact201124649661Bacterial canker disease of citrusCiveroloEJ Rio Grande Vall Hortic Soc198437127145Expression profiling of virulence and pathogenicity genes of Xanthomonas axonopodis pv. citriAstua-MongeGFreitas-AstuaJBacocinaGRoncolettaJCarvalhoSAMchadoMAJ Bacteriol200518712011205Sequence and expression analysis of the hrpB pathogenicity operon of Xanthomonas campestris pv. vesicatoria which encodes eight proteins with similarity to components of the Hrp, Ysc, Spa, and Fli secretion systemsFenselauSBonasUMol Plant Microbe Interact19958845854Specific binding of the Xanthomonas campestris pv. vesicatoria AraC-type transcriptional activator HrpX to plant-inducible promoter boxesKoebnikRKrugerAThiemeFUrbanABonasUJ Bacteriol200618876527660Two novel type III-secreted proteins of Xanthomonas campestris pv. vesicatoria are encoded within the hrp pathogenicity islandNoelLThiemeFNennstielDBonasUJ Bacteriol200218413401348The type III (Hrp) secretion pathway of plant pathogenic bacteria: trafficking harpins, Avr proteins, and deathAlfanoJRCollmerAJ Bacteriol199717956555662hrp genes of phytopathogenic bacteriaBonasUCurr Top Microbiol Immunol19941927998Regulation and secretion of Xanthomonas virulence factorsButtnerDBonasUFEMS Microbiol Rev201034107133Cloning and molecular characterization of hrpX from Xanthomonas axonopodis pv. citri.IwamotoMOkuTDNA Seq200011167173Type III effector proteins from the plant pathogen Xanthomonas and their role in the interaction with the host plantGurlebeckDThiemeFBonasUJ Plant Physiol2006163233255Conservation of the hypersensitivity-pathogenicity regulatory gene hrpX of Xanthomonas campestris and X. oryzaeOkuTAlvarezAMKadoCIDNA Seq19955245249Molecular secrets of bacterial type III effector proteinsLahayeTBonasUTrends Plant Sci20016479485Quake: quality-aware detection and correction of sequencing errorsKelleyDRSchatzMCSalzbergSLGenome Biol201011R116How high is the level of technical noise in microarray data?KlebanovLYakovlevABiol Direct200729Transcript length bias in RNA-seq data confounds systems biologyOshlackAWakefieldMJBiol Direct2009414Analysis of global transcriptional profiles of enterotoxigenic Escherichia coli isolate E24377ASahlJWRaskoDAInfect Immun20128012321242Type III secretion system effector proteins: double agents in bacterial disease and plant defenseAlfanoJRCollmerAAnnu Rev Phytopathol200442385414Erwinia chrysanthemi and Pseudomonas syringae: plant pathogens trafficking in extracellular virulence proteinsCollmerABauerDWCurr Top Microbiol Immunol19941924378The type III secretion injectisomeCornelisGRNat Rev Microbiol20064811825Functional characterization of the Xcs and Xps type II secretion systems from the plant pathogenic bacterium Xanthomonas campestris pv vesicatoriaSzczesnyRJordanMSchrammCSchulzSCogezVBonasUButtnerDNew Phytol20101879831002Conservation of secretion pathways for pathogenicity determinants of plant and animal bacteriaVanGFGeninSBoucherCTrends Microbiol19931175180The type III effectors of XanthomonasWhiteFFPotnisNJonesJBKoebnikRMol Plant Pathol200910749766The role of hrp genes during plant-bacterial interactionsLindgrenPBAnnu Rev Phytopathol199735129152Elucidation of the regulon and cis-acting regulatory element of HrpB, the AraC-type regulator of a plant pathogen-like type III secretion system in Burkholderia pseudomalleiLipscombLSchellMAJ Bacteriol201119319912001Characterization of the Xanthomonas axonopodis pv. glycines Hrp pathogenicity islandKimJGParkBKYooCHJeonEOhJHwangIJ Bacteriol200318531553166Three-dimensional structure of the KdpFABC complex of Escherichia coli by electron tomography of two-dimensional crystalsHuGBRiceWJDroseSAltendorfKStokesDLJ Struct Biol2008161411418Effective–a database of predicted secreted bacterial proteinsJehlMAArnoldRRatteiTNucleic Acids Res201139D591D595The type II secretion system: biogenesis, molecular architecture and mechanismKorotkovKVSandkvistMHolWGNat Rev Microbiol201210336351Identification of novel type III secretion effectors in Xanthomonas oryzae pv. oryzaeFurutaniATakaokaMSanadaHNoguchiYOkuTTsunoKOchiaiHTsugeSMol Plant Microbe Interact20092296106Two Xanthomonas extracellular polygalacturonases, PghAxc and PghBxc, are regulated by type III secretion regulators HrpX and HrpG and are required for virulenceWangLRongWHeCMol Plant Microbe Interact200821555563MEKHLA, a novel domain with similarity to PAS domains, is fused to plant homeodomain-leucine zipper III proteinsMukherjeeKBurglinTRPlant Physiol200614011421150Gene ontology: tool for the unification of biology The Gene Ontology ConsortiumAshburnerMBallCABlakeJABotsteinDButlerHCherryJMDavisAPDolinskiKNat Genet2000252529The KEGG databases at GenomeNetKanehisaMGotoSKawashimaSNakayaANucleic Acids Res2002304246The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databasesCaspiRAltmanTDreherKFulcherCASubhravetiPKeselerIMKothariAKrummenackerMNucleic Acids Res201240D742D753The transcriptional landscape of the mammalian genomeCarninciPKasukawaTKatayamaSGoughJFrithMCMaedaNOyamaRRavasiTLenhardBWellsCScience200530915591563The developmental transcriptome of Drosophila melanogasterGraveleyBRBrooksANCarlsonJWDuffMOLandolinJMYangLArtieriCGvan BarenMJBoleyNBoothBWNature2011471473479Transcriptome sequencing of the Microarray Quality Control (MAQC) RNA reference samples using next generation sequencingManeSPEvansCCooperKLCrastaORFolkertsOHutchisonSKHarkinsTTThierry-MiegDThierry-MiegJJensenRVBMC Genomics200910264Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencingPanQShaiOLeeLJFreyBJBlencoweBJNat Genet20084014131415A global view of gene activity and alternative splicing by deep sequencing of the human transcriptomeSultanMSchulzMHRichardHMagenAKlingenhoffAScherfMSeifertMBorodinaTSoldatovAParkhomchukDScience2008321956960Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiationTrapnellCWilliamsBAPerteaGMortazaviAKwanGvan BarenMJSalzbergSLWoldBJPachterLNat Biotechnol201028511515Most “dark matter” transcripts are associated with known genesVanBHNislowCBlencoweBJHughesTRPLoS Biol20108e1000371Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experimentsBullardJHPurdomEHansenKDDudoitSBMC Bioinforma20101194Characterization and improvement of RNA-Seq precision in quantitative transcript expression profilingLabajPPLeparcGGLinggiBEMarkillieLMWileyHSKreilDPBioinformatics201127i383i391GC-content normalization for RNA-Seq dataRissoDSchwartzKSherlockGDudoitSBMC Bioinforma201112480Reliability and reproducibility issues in DNA microarray measurementsDraghiciSKhatriPEklundACSzallasiZTrends Genet200622101109Unfolding of microarray dataGoryachevABMacgregorPFEdwardsAMJ Comput Biol20018443461Analysis of variance for gene expression microarray dataKerrMKMartinMChurchillGAJ Comput Biol20007819837Issues in cDNA microarray analysis: quality filtering, channel normalization, models of variations and assessment of gene effectsTsengGCOhMKRohlinLLiaoJCWongWHNucleic Acids Res20012925492557cDNA cloning and function analysis of two novel erythroid differentiation related genesWangXWangDChenXHuMWangJLiYGuoNShenBSci China C Life Sci20014499105Regression by Local Fitting: Methods, Properties, and Computational AlgorithmsClevelandWSDevlinSJGrosseEJ Econ19883787114MARAN: normalizing micro-array dataEngelenKCoessensBMarchalKDeMBBioinformatics200319893894R: A language for data analysis and graphicsIhakaRGentlemanRJournal of Computational and Graphical Statistics. Journal of Computational and Graphical Statistics19965299314MatArray: a Matlab toolbox for microarray dataVenetDBioinformatics200319659660RNA-seq based transcriptional map of bovine respiratory disease pathogen “Histophilus somni 2336”KumarRLawrenceMLWattJCookseyAMBurgessSCNanduriBPLoS One20127e29435A single-base resolution map of an archaeal transcriptomeWurtzelOSapraRChenFZhuYSimmonsBASorekRGenome Res201020133141Mapping the Burkholderia cenocepacia niche response via high-throughput sequencingYoder-HimesDRChainPSZhuYWurtzelORubinEMTiedjeJMSorekRProc Natl Acad Sci U S A200910639763981Differential expression in RNA-seq: a matter of depthTarazonaSGarcia-AlcaldeFDopazoJFerrerAConesaAGenome Res20112122132223Cloning of genes involved in pathogenicity of Xanthomonas campestris pv. campestris using the broad host range cosmid pLAFR1DanielsMJBarberCETurnerPCSawczycMKByrdeRJFieldingAHEMBO J1984333233328Differential expression analysis for sequence count dataAndersSHuberWGenome Biol201011R106Bioconductor: an open source framework for bioinformatics and computational biologyReimersMCareyVJMethods Enzymol2006411119134Gapped BLAST and PSI-BLAST: a new generation of protein database search programsAltschulSFMaddenTLSchafferAAZhangJZhangZMillerWLipmanDJNucleic Acids Res19972533893402ClueGO: a Cytoscape plug-in to decipher functionally grouped gene ontology and pathway annotation networksBindeaGMlecnikBHacklHCharoentongPTosoliniMKirilovskyAFridmanWHPagèsFTrajanoskiZGalonJBioinformatics20092510911093Cytoscape: a software environment for integrated models of biomolecular interaction networksShannonPMarkielAOzierOBaligaNSWangJTRamageDAminNSchwikowskiBIdekerTGenome Res20031324982504Analysis of relative gene expression data using real-time quantitative PCR and the 2(−Delta Delta C(T)) MethodLivakKJSchmittgenTDMethods200125402408


xml version 1.0 encoding UTF-8
REPORT xmlns http:www.fcla.edudlsmddaitss xmlns:xsi http:www.w3.org2001XMLSchema-instance xsi:schemaLocation http:www.fcla.edudlsmddaitssdaitssReport.xsd
INGEST IEID EMVP2LIW7_9KDJVP INGEST_TIME 2013-03-05T20:10:08Z PACKAGE AA00013474_00001
AGREEMENT_INFO ACCOUNT UF PROJECT UFDC
FILES


xml version 1.0 encoding utf-8 standalone no
mets ID sort-mets_mets OBJID sword-mets LABEL DSpace SWORD Item PROFILE METS SIP Profile xmlns http:www.loc.govMETS
xmlns:xlink http:www.w3.org1999xlink xmlns:xsi http:www.w3.org2001XMLSchema-instance
xsi:schemaLocation http:www.loc.govstandardsmetsmets.xsd
metsHdr CREATEDATE 2013-01-02T16:07:46
agent ROLE CUSTODIAN TYPE ORGANIZATION
name BioMed Central
dmdSec sword-mets-dmd-1 GROUPID sword-mets-dmd-1_group-1
mdWrap SWAP Metadata MDTYPE OTHER OTHERMDTYPE EPDCX MIMETYPE textxml
xmlData
epdcx:descriptionSet xmlns:epdcx http:purl.orgeprintepdcx2006-11-16 xmlns:MIOJAVI
http:purl.orgeprintepdcxxsd2006-11-16epdcx.xsd
epdcx:description epdcx:resourceId sword-mets-epdcx-1
epdcx:statement epdcx:propertyURI http:purl.orgdcelements1.1type epdcx:valueURI http:purl.orgeprintentityTypeScholarlyWork
http:purl.orgdcelements1.1title
epdcx:valueString RNA-seq and microarray complement each other in transcriptome profiling
http:purl.orgdctermsabstract
Abstract
Background
RNA-seq and microarray are the two popular methods employed for genome-wide transcriptome profiling. Current comparison studies have shown that transcriptome quantified by these two methods correlated well. However, none of them have addressed if they complement each other, considering the strengths and the limitations inherent with them. The pivotal requirement to address this question is the knowledge of a well known data set. In this regard, HrpX regulome from pathogenic bacteria serves as an ideal choice as the target genes of HrpX transcription factor are well studied due to their central role in pathogenicity.
Results
We compared the performance of RNA-seq and microarray in their ability to detect known HrpX target genes by profiling the transcriptome from the wild-type and the hrpX mutant strains of γ-Proteobacterium Xanthomonas citri subsp. citri. Our comparative analysis indicated that gene expression levels quantified by RNA-seq and microarray well-correlated both at absolute as well as relative levels (Spearman correlation-coefficient, rs > 0.76). Further, the expression levels quantified by RNA-seq and microarray for the significantly differentially expressed genes (DEGs) also well-correlated with qRT-PCR based quantification (rs = 0.58 to 0.94). Finally, in addition to the 55 newly identified DEGs, 72% of the already known HrpX target genes were detected by both RNA-seq and microarray, while, the remaining 28% could only be detected by either one of the methods.
Conclusions
This study has significantly advanced our understanding of the regulome of the critical transcriptional factor HrpX. RNA-seq and microarray together provide a more comprehensive picture of HrpX regulome by uniquely identifying new DEGs. Our study demonstrated that RNA-seq and microarray complement each other in transcriptome profiling.
http:purl.orgdcelements1.1creator
Kogenaru, Sunitha
Qing, Yan
Guo, Yinping
Wang, Nian
http:purl.orgeprinttermsisExpressedAs epdcx:valueRef sword-mets-expr-1
http:purl.orgeprintentityTypeExpression
http:purl.orgdcelements1.1language epdcx:vesURI http:purl.orgdctermsRFC3066
en
http:purl.orgeprinttermsType
http:purl.orgeprinttypeJournalArticle
http:purl.orgdctermsavailable
epdcx:sesURI http:purl.orgdctermsW3CDTF 2012-11-15
http:purl.orgdcelements1.1publisher
BioMed Central Ltd
http:purl.orgeprinttermsstatus http:purl.orgeprinttermsStatus
http:purl.orgeprintstatusPeerReviewed
http:purl.orgeprinttermscopyrightHolder
Sunitha Kogenaru et al.; licensee BioMed Central Ltd.
http:purl.orgdctermslicense
http://creativecommons.org/licenses/by/2.0
http:purl.orgdctermsaccessRights http:purl.orgeprinttermsAccessRights
http:purl.orgeprintaccessRightsOpenAccess
http:purl.orgeprinttermsbibliographicCitation
BMC Genomics. 2012 Nov 15;13(1):629
http:purl.orgdcelements1.1identifier
http:purl.orgdctermsURI http://dx.doi.org/10.1186/1471-2164-13-629
fileSec
fileGrp sword-mets-fgrp-1 USE CONTENT
file sword-mets-fgid-0 sword-mets-file-1
FLocat LOCTYPE URL xlink:href 1471-2164-13-629.xml
sword-mets-fgid-1 sword-mets-file-2 applicationpdf
1471-2164-13-629.pdf
sword-mets-fgid-3 sword-mets-file-3
1471-2164-13-629-S4.PDF
sword-mets-fgid-4 sword-mets-file-4
1471-2164-13-629-S3.PDF
sword-mets-fgid-5 sword-mets-file-5
1471-2164-13-629-S2.PDF
sword-mets-fgid-6 sword-mets-file-6 applicationvnd.openxmlformats-officedocument.spreadsheetml.sheet
1471-2164-13-629-S1.XLSX
structMap sword-mets-struct-1 structure LOGICAL
div sword-mets-div-1 DMDID Object
sword-mets-div-2 File
fptr FILEID
sword-mets-div-3
sword-mets-div-4
sword-mets-div-5
sword-mets-div-6
sword-mets-div-7



PAGE 1

11 0.2% 4312 99% 37 0.8% Microarray RNA-seq 42 39.63% 45 42.45% 19 17.92% FS1 FS2





PAGE 1

RESEARCHARTICLEOpenAccessRNA-seqandmicroarraycomplementeachother intranscriptomeprofilingSunithaKogenaru,YanQing,YinpingGuoandNianWang*AbstractBackground: RNA-seqandmicroarrayarethetwopopularmethodsemployedforgenome-widetranscriptome profiling.Currentcomparisonstudieshaveshownthattranscriptomequantifiedbythesetwomethodscorrelated well.However,noneofthemhaveaddressediftheycomplementeachother,consideringthestrengthsandthe limitationsinherentwiththem.Thepivotalrequirementtoaddressthisquestionistheknowledgeofawellknown dataset.Inthisregard,HrpXregulomefrompathogenicbacteriaservesasanidealchoiceasthetargetgenesof HrpXtranscriptionfactorarewellstudiedduetotheircentralroleinpathogenicity. Results: WecomparedtheperformanceofRNA-seqandmicroarrayintheirabilitytodetectknownHrpXtarget genesbyprofilingthetranscriptomefromthewild-typeandthe hrpX mutantstrainsof -Proteobacterium Xanthomonascitri subsp. citri .OurcomparativeanalysisindicatedthatgeneexpressionlevelsquantifiedbyRNA-seq andmicroarraywell-correlatedbothatabsoluteaswellasrelativelevels(Spearmancorrelation-coefficient,rs>0.76). Further,theexpressionlevelsquantifiedbyRNA-seqandmicroarrayforthesignificantlydifferentiallyexpressed genes(DEGs)alsowell-correlatedwithqRT-PCRbasedquantification(rs=0.58to0.94).Finally,inadditiontothe55 newlyidentifiedDEGs,72%ofthealreadyknownHrpXtargetgenesweredetectedbybothRNA-seqand microarray,while,theremaining28%couldonlybedetectedbyeitheroneofthemethods. Conclusions: Thisstudyhassignificantlyadvancedourunderstandingoftheregulomeofthecriticaltranscriptional factorHrpX.RNA-seqandmicroarraytogetherprovideamorecomprehensivepictureofHrpXregulomeby uniquelyidentifyingnewDEGs.OurstudydemonstratedthatRNA-seqandmicroarraycomplementeachotherin transcriptomeprofiling. Keywords: RNA-seq,Microarray,Transcriptomeprofiling,Pathogenicbacteria,Virulence,Type3secretionsystem, Effectors,HrpX,Xanthomonas,CitruscankerdiseaseBackgroundTranscriptomeofanorganismrepresentstheentirerepertoireoftranscriptsencodedbythegenesasaphenotypicresponsetotheconditioninwhichtheyexist.The sheerabilitytosimultaneouslyquantifytheexpression levelsforavastnumberofgeneshasrevolutionizedthe biomedicalresearch,facilitatingtheanalysisofglobal geneexpressionpatternsatthegenome-widescale[1]. Inthepastdecade,therehasbeenatremendousprogressinthedevelopmentofmethodstodeduceand quantifythegeneexpressionlevelsatthewholetranscriptomelevel[1].Amongtheseveraltranscriptome profilingmethods,RNA-seqandDNAmicroarraystand outasthetwowidelyusedgenome-widegeneexpression quantificationmethods[1-17]. RNA-seqmethodinvolvestheconversionofisolated transcriptsintothecomplementaryDNA(cDNA),which isthendirectlysequencedinamassivelyparalleldeepsequencing-basedapproach[18].Bymappingtheresulting shortsequencingreadsontothereferencegenome,theexpressionlevelsofgenesrelativetotheconditionofinterest orabsolutelevelscanbequantified[9,11].Thismethod hasbeenimplementedindifferentplatformslikeIllumina ’ sGenomeAnalyzer,Roche454GenomeSequence,and AppliedBiosystems ’ SOLiD[4].Ontheotherhand, microarrayisbasedonthehybridizationofspecimentargetstrandsontotheimmobilizedcomplementaryprobe strands.Forexample,inatwo-colormicroarray,transcripts *Correspondence: nianwang@ufl.edu CitrusResearchandEducationCenter,DepartmentofMicrobiologyandCell Science,InstituteofFoodandAgriculturalSciences,UniversityofFlorida,700 ExperimentStationRoad,LakeAlfred33850,USA 2012Kogenaruetal.;licenseeBioMedCentralLtd.ThisisanOpenAccessarticledistributedunderthetermsoftheCreative CommonsAttributionLicense(http://creativecommons.org/licenses/by/2.0),whichpermitsunrestricteduse,distribution,and reproductioninanymedium,providedtheoriginalworkisproperlycited.Kogenaru etal.BMCGenomics 2012, 13 :629 http://www.biomedcentral.com/1471-2164/13/629

PAGE 2

extractedfromdifferentcon ditionsarelabeledwith distinctfluorescentdyeswhilebeingconvertedto cDNA.Theselabeledsamplesarethenhybridizedto theimmobilizedcomplementaryprobestrandsinan arrayrepresentingthegenes.Bymeasuringthelightintensityofthedistinctfluorescentdyes,therelative abundanceofeachtranscriptinthetwodifferentconditionscanbemeasured[8,12,13,17,19,20].Affymetrix andAgilentarethetwoprevalentplatformsinmicroarraytechnology[2,14]. Eventhough,initiallymicroarrayhasbeeninstrumental inwholetranscriptomeanalysis,currentlyRNA-seqis becomingapreferredmethodofchoice,sinceitisconsideredtoeffectivelysurmountthelimitationsofmicroarray[1,21-23].RNA-seqtechnology,unlikemicroarray, doesnotdependontheprerequisiteknowledgeofthe referencetranscriptome[24].Further,RNA-seqdata containsverylowbackgroundsignal,ahigherdynamic rangeofexpressionlevels,andalsorelativelysmall amountoftotalRNArequiredforquantification,when comparedtomicroarray[1,23].Despitetheseadvantages,theefficiencyofRNA-seqismarredwiththe problemofoverwhelmingamountofribosomalRNA (rRNA)inthedata,shortreads,lessbaseaccuracy,and variationofreaddensityalongthelengthofthetranscript,posingachallengeforthishigh-throughput method[21,25,26].However,inspiteoftheirstrengths andlimitations,RNA-seqandmicroarrayhavebecome thedefaultpopularmethodsofchoicesforgenomewidetranscriptomestudies[1,2,23]. Currentlyseveralstudieshavebeenconductedto comparetheperformanceofRNA-seqandmicroarray inquantifyingtheexpressionlevelofgenes,byfocusing onvariousaspectslikereproducibility,accuracy,statisticalissues,technicalandbiologicalvariabilities [1,15,21,27-30].Themainconclusionfromthesestudies hasbeenthattheexpressionlevelsquantifiedbythese twomethodscorrelatedtoalargeextent,andoverall favoredtheRNA-seqbecauseofhighreproducibility,accuracy,anddynamicrange[27,29].However,noneof thesecomparisonstudieshaveaddressedifthesetwo methodscomplementeachotherintranscriptomeprofilinggiventhestrengthsandlimitationsassociatedwith them.Inordertoaddressthisquestion,werequirean alreadywellcharacterizeddataset.TheHrpXregulome from Xanthomonascitri subsp .citri (Xcc)servesasan idealdatamodelinthisregard[31-33].Xccisacausal agentofcitruscanker,oneoftheseriousanddestructive diseasesincitrusthatisresultinginsignificantlossesto citrusindustryworldwide[34],whileHrpXisakeyglobaltranscriptionfactorthatregulatestheexpressionof hrp (hypersensitiveresponseandpathogenicity)cluster ofgenes,whichareconsideredasthemajorpathogenicityfactors[31,35].HrpXcontainsAraC-typeofDNA bindingdomain,whichspecificallyrecognizestheplantinduciblepromoter(PIP)box(TTCGC-N15-TTCGC) andimperfectPIPbox(TTCGC-N8-TTCGT)presentin thecis-regulatoryregionsof hrp genecluster[36-38]. SinceHrpXhasakeyroleinpathogenicity,tremendous progresshasbeenmadeincataloguingthetargetgenes ofHrpX[39-45].Wethereforeassessedtheperformance ofRNA-seqandmicroarrayintheirabilitytodetect knownHrpXtargetgenes.WechoseIlluminaandAgilentasthecorrespondingplatformsforRNA-seqand microarray,astheyarethemostpopularplatformsfor thesetechnologies[2,4].ResultsInordertouncovertheregulomeofHrpXtranscription regulatorbyprofilingthewild-typeandthe hrpX mutant strainstranscriptome,wehaddesignedamicroarray chipcoveringthewholegenomeunderAgilentplatform inourpreviousstudy[33].Here,weconductedgenomewidetranscriptomeprofilingofthesetwostrainsby RNA-seqandcomparedtheresultstothepreviously publishedmicroarraydata,toassesstheperformanceof thesetwomethods.Further,toavoidtechnicalvariation associatedwithRNAisolation,weusedthealiquots fromthesametotalRNAsamplesusedformicroarray experimentsalsoforRNA-seq. Weobtained16,431,283,17,28 9,220,18,124,120sequence readsforthewild-typeand15,084,955,17,831,920,and 18,115,115forthe hrpX mutantstrainwithamedian sequencelengthof74-basepairs(bp)(Additionalfile1: TableS1).Rawreadsoftenhavehighsequencingerrors, especiallyinthe30endwherethereisahighchanceof sequencingerrorstooccur[46].Wethereforefilteredthe readsforhighqualityonesbytrimmingoffthebasepairs withlowqualityscoreassignedtothemduringdownlineprocessingofRNA-seq.Morethan90%ofthereads passedthequalityfilter,asaresult,themediansequence lengthofqualityfilteredreadssubsequentlydroppedto 68-bp(Additionalfile1:TableS1).Wethenmapped thesehighqualitytrimmedreadsontotheXccgenome. Approximatelymorethan90%ofthereadscouldbe mappedontothereferencegenome,indicatinggood sequencecoverage(Additionalfile1:TableS1).Overall ~97%oftheannotatedgeneshadmorethanoneread mapped,whilemerely~3%oftheannotatedgeneshad noreadsmapped,indicatinggoodsequencingdepth. Further,wealsoobservedadifferenceinthesequence coveragebetweenthechromosomeandthetwoendogenousplasmidsofXcc.Annotatedcodinggenes fromthechromosomewithasizeof5.18megabase pairs(Mb)had98%sequencecoverage,whereas,it was78%forplasmidpXAC64withasizeof0.06Mb, andrelativelylowerwithonly62%sequenceKogenaru etal.BMCGenomics 2012, 13 :629 Page2of16 http://www.biomedcentral.com/1471-2164/13/629

PAGE 3

coverageforplasmidpXAC33withasizeof0.03Mb (Additionalfile1:TableS2).ComparisonatabsolutelevelsofexpressionRNA-seqhadcoveragefor4323geneswithoneor morereadsmapped,whilebymicroarray4349genes wereassignedthefluorescenceintensityvaluesafter thebackgroundcorrection.Amongthese4312genes (~99%ofthetotalgenes)werecommontobothmethods,whilemerely37(0.8%)and11genes(0.2%)were uniquelycalledbymicroarrayandRNA-seqrespectively (Additionalfile1:TablesS2andS3;Additionalfile2:FigureFS1).WecomparedtheabsolutelevelsofgeneexpressionintermsofRNA-seqcountsandmicroarray fluorescenceintensitiesforallthelistedgenescalledby boththemethods.Thesetwoindependentmeasuresof transcriptabundanceassociatedwitheachgeneforall thebiologicalreplicatesfromthewild-typeandthe hrpX mutantstrainswerecomparedseparately.Theresulting correlationwasmappedasascatterplot,withanaverage numberofcountsfromIlluminasequencingagainstthe normalizedfluorescenceintensitiesfromAgilentarrays foreachgeneinthewild-type(Figure1A)aswellasin the hrpX mutant(Figure1B).Absolutelevelsofgeneexpressioncorrelatedwell,whenestimatedintermsof Spearman ’ scorrelationcoefficient(rs)with0.78(p-value <0.0001)forthewild-typeand0.80(p-value<0.0001) forthe hrpX mutantstrain.Thisisinagreementwiththe previousreportsthatexpressionlevelsmeasuredby microarrayandRNA-seqhadcorrelationsrangingbetween0.62and0.8forprokaryoticandeukaryoticdatasets[18,28,29].However,thereseemstobelittleorno correlationforthegeneswithlowlevelofexpression. Wefurtherestimatedthecorrelationforthesubsetof geneswithfluorescenceintensityvalues 100assignedby microarray(~360genes)withthecorrespondingexpressionlevelsdeterminedbyRNA-seq.Thissubsetofgenes revealedaverypoorrsof0.2(p-value<0.0002)and0.3 (p-value<0.0001)forthewild-typeandthe hrpX mutant strainsrespectively.Althoughtheexpressionlevelsof thesegenesdidnotchangemuchaccordingtomicroarray,RNA-seqreportedthemtohavedifferentexpressionlevels.Thismaybeattributedtothehighsensitivity ofRNA-seqmethod. Wefurtherestimatedthecorrelationbetweenallthe combinationsofbiologicalreplicatesforthewild-type andthe hrpX mutantstrainsindependently.Theresultingrsvaluesofthesecomparisonsarerepresentedinthe formofheatmaps,forthewild-type(Figure1C)andthe hrpX mutantstrains(Figure1D),whichprovideaglobal viewofthesecorrelations.Overall,onanaveragethe wild-typewithrs=0.76(p-value<0.0001)andthe hrpX mutantwithrs=0.78(p-value<0.0001)wereobserved forthebiologicalreplicatesfromallthecorrelationcombinations.Thislevelofcomparisonstronglysuggested thatnotonlytheabsolutelevelofgeneexpressions WT-4 WT-1 WT-3 WT-2MicroarrayWT-1 WT-2 WT-3RNAseq h r pX-4 hrpX-1 hrpX-3 hrpX-2MicroarrayhrpX-3 hrpX-2 hrpX-1 RNAseq 0.760.780.8 B A D C Figure1 ComparisonofabsolutelevelsofgeneexpressionbyRNA-seqandmicroarray. Upperpanelshowsforthe( A )wild-typeand( B ) hrpX mutant,thecorrelationbetweennormalizedfluorescenceintensitiesofAgilentmicroarraywiththeRNA-seqcountsfromIllumina.Eachdot representstheaveragevaluesforeachgenefromallthebiologicalreplicates.Spearman ’ scorrelationcoefficient(rs)isindicatedforeach comparison.LowerpanelshowsrsbetweennormalizedfluorescenceintensitiesofAgilentmicroarraywiththeRNA-seqcountsfromIlluminafor allthecombinationofbiologicalreplicatesforthe( C )wild-type,and( D )the hrpX mutant.Thersvaluesareplottedintheformofaheatmap, wheregreencolorrepresentslowrsvalue,whileredrepresentshighestrsvalue.Thedendrogramprovidesahierarchicalclustering. Kogenaru etal.BMCGenomics 2012, 13 :629 Page3of16 http://www.biomedcentral.com/1471-2164/13/629

PAGE 4

determinedbyRNA-seqandmicroarrayhighlycorrelated,butwerealsohighlyreproducible,inspiteofthe technicalaswellasthebiologicalvariabilityassociated withthequantifications.ComparisonatrelativelevelsofexpressionWealsocomparedtheperformanceofthesetwomethodsatrelativelevelofgeneexpression.Forthispurpose, wefirstcomputedtherelativeexpressionlevelofgenes intermsoffold-change(FC)forthe hrpX mutantinrelationtothewild-typestrain,alongwithp-valuestodenotethestatisticalsignificanceandfalsediscoveryrate (FDR),forhavingagoodcontroloverthefalsepositives rate.Wecomparedtherelativeexpressionlevelsfor 4312consensusgenesbothqualitativelyandquantitatively,aftertransformingtheFCvaluestologarithmbase 2(log2)scalewithoutanystatisticalcut-offthresholds (Additionalfile1:TableS2).Forthe2587(~60%ofthe consensus)genes,theexpressionlevelsagreedqualitatively,while1725(~40%)genesdisagreedbetweenthe twomethods(Figure2A).Atthispoint,ourcomparison wasexclusivelyfocusedonwhetherthegeneofinterest isup-ordown-regulatedbasedonthesignofthelog2transformedFCvalues,butnotnecessarilyontheFC magnitude.Wefurtherillustratedthequantitativerelationshipoflog2FCbetweenRNA-seqandmicroarray intheformofascatterplotasshowninFigure2B. Geneswithnochangeinexpressionlevelsinthewildtypeandthe hrpX mutantstrains(FC=1)clustered aroundlog2FCofzero(log2ofoneiszero)inthescatterplot(Figure2B).Thersbetweenthelog2FCsdeterminedbyRNA-seqandmicroarraywasfoundtobe 0.30(p-value<0.0001)(Figure2B).ThislowercorrelationvalueindicatedthatthemagnitudeofFCsbetweenthetwomethodsdifferedlargelythatmightbe duetothebackgroundnoiseresultingfromthemany imperfections,whichareinherenttothehighthroughputtechnologies[47,48]. Thecorrelationcoefficientprovidesanoverallestimate ofcorrelationbetweentheexpressionlevelsdetermined byRNA-seqandmicroarraymethods.However,this doesnotzoomintothedatainadetailedmanner.For instance,noinformationisprovidedabouthowmuchof FCmagnitudethatactuallydiffersbetweenthetwo 874 20% 2587 60% 851 20% Microarray RNA-seq Microarray RNA-seq 45 100% DF ABC E Figure2 Qualitativeandquantitativecomparisonofrelativelevelsofgeneexpressioninthe hrpX mutantwithrespecttothe wild-typestrain,determinedbyRNA-seqandmicroarraymethods. ( A )Venndiagramshowingthequalitativeagreementinthelog2foldchangevaluesforexpressionof4312genesbyRNA-seqandmicroarray.( B )Scatterplotshowingtherelativeexpressionlevelsofgenesinterms oflog2FCs,determinedbyRNA-seqandmicroarray.CorrelationbetweenthetwomethodsisshownbySpearman ’ scorrelationcoefficient(rs).( C ) FrequencyhistogramshowingthepercentageofgeneswiththefolddifferencebetweenRNA-seqandmicroarray,withabinwidthof0.5. Thelowerpanel D E ,and F aresameas A B ,and C respectively,butforonlythosegenesthathavepassedthestatisticalcut-offthreshold (FDR 5%andabsolutelog2fold-change 0.6). Kogenaru etal.BMCGenomics 2012, 13 :629 Page4of16 http://www.biomedcentral.com/1471-2164/13/629

PAGE 5

methodsforagivengene.Inordertogetaninsightinto thisaspect,wecomputedthefractionofgenesdeviating intheirFCmagnitudevaluesbydividingtheFCmagnitudevaluedeterminedbyRNA-seqwiththatofmicroarray(Figure2C).Here,thefolddifferenceofone representsthefractionofgenesthataredeterminedto haveaFCmagnitudeof0.5(binwidth)bybothRNAseqandmicroarraymethods.Whenweplottedthisfrequencyasahistogramforthewhole4312consensus genes,morethan75%ofgeneswerefoundtohaveFC magnitudevalues0.5byRNA-seqandmicroarray methods.Sinceitisarelativeexpressioncomparison, geneswhoseexpressionvaluesdidnotchangemuchin thewild-typeandthe hrpX mutantstrains,tendtohave FCvalues=1.Subsequently,itismoresensibletoconsideronlydifferentiallyexpressedsetofgenesforfurther comparisons. WethereforeappliedFDR 0.05(5%)inconjunction withFC(absolutelog2FC 0.6)tofilterthewholedata set.Intotal,87(2%)genesfromRNA-seqand64(1.5%) frommicroarrayqualifiedatthiscut-offthresholdfrom the4312consensusgenes(Additionalfile1:TableS4). Together,106genessatisfiedourselectioncriterionfrom boththemethods(Additionalfile1:TableS4).Among them84(79.2%)geneswereup-regulated,while22 (20.8%)geneswerefoundtobedown-regulated.Further, 45(~42.45%)geneswerecommonbetweenboththe methods,whereas,42(39.63%)and19(~17.92%)genes wereuniquelydetectedbyRNA-seqandmicroarrayrespectively(Additionalfile1:TableS4;Additionalfile2: FigureFS2).WefurthercomparedtheFCvaluesofthe 45consensusgenesbothqualitativelyandquantitatively. Thesegenesqualitativelyagreed100%byhavingthe sametrendoflog2transformedFCvaluesbybothRNAseqandmicroarray(Figure2D).Likewisethequantitativecomparisonwasperformedbyestimatingthecorrelationbetweenthemagnitudeoflog2FCdeterminedby RNA-seqandmicroarrayforthe45consensusgenesas showninFigure2E.ThemagnitudeofFCvaluesbetweenthetwomethodswerefoundtobewellcorrelated (rs=0.76,p-value<0.0001),indicatingthatthesame trendofvariationwasobservedinFCvaluesbetween thetwomethodswithoutanydispersion.Thereby,the magnitudeofFCvaluesdeterminedbyRNA-seqand microarrayagreedtoalargeextentforthe45consensus genes.Inordertofurtherpinpointthedeviationinthe FCmagnitudequantifiedbythetwomethods,weplottedthedifferencesintheFCvaluesdeterminedbyRNAseqwithrespecttomicroarray,andthepercentageof geneswiththatdifferenceforthe45consensusgenes (Figure2F).Majorityofthegenes(~98%)werefoundto haveamagnitudeofFCwithintherangeof 1.5,while fortheremaining2%ofthegenes,itwas4.7-times higherinRNA-seqthanthemicroarraybased quantification.Basedonthesecomparisons,weconcludedthattherelativegeneexpressionlevelsquantified byRNA-seqandmicroarraywereconsistenttoalarge extentforthestatisticallydifferentiallyexpressedsetof consensusgenes.ComparisonwithqRT-PCRTraditionally,quantitativeReverseTranscriptionPCR (qRT-PCR)isusedtovalidatethegeneexpressionlevels quantifiedbyhigh-throughputtechnologieslikeRNAseqandmicroarray[49].Therefore,wecomparedthe relativeexpressionlevelsquantifiedbyRNA-seqand microarraybyqRT-PCRforasubsetof43(40.6%)genes (Additionalfile1:TableS5)thatwererandomlyselected fromthe106significantlydifferentiallyexpressedgenes. Amongthem,19geneswerefoundtobecommonbetweenboththemethods,12geneswereuniquetoRNAseq,whileremaining12geneswerefoundtobeunique tomicroarray(Additionalfile1:TableS4).Theexpressionlevelswerefoundtobehighlyreliableforgenesthat aredeterminedtobesignificantlydifferentiallyexpressed byRNA-seq(rs=0.94;p-value<0.0001)aswellas microarray(rs=0.97;p-value<0.0001).Fortheconsensusgenes,microarrayhadaslightlyhighercorrelation withqRT-PCRthanRNA-seq(Figures3Aand3B). Wefurtherplottedthepercentageofgenesthat deviatedinthemagnitudeofFCquantifiedbyRNA-seq andmicroarraywithrespecttoqRT-PCR(Figures3C and3D).Formostofthegenes,themagnitudeofFC quantifiedbyRNA-seqandmicroarraywererelatively higher,whencomparedtoqRT-PCR(folddifference>1). Overall,themagnitudeofFCquantifiedbyRNA-seq wasinconsistencewithqRT-PCRbasedquantification (Figure3C).Formicroarray,themagnitudeofFCwas observedtobeconsistentwithqRT-PCRforamajority ofgenes,however,wealsonoticedoutliergeneswitha 9-timeshigherFCmagnitude(Figure3D). Forthesubsetof12genesthatwerefoundtobe uniquelydeterminedbyRNA-seq,themagnitudeof FCquantifiedbyRNA-seqcorrelatedmoderatelywith qRT-PCR(rs=0.58;p-value0.05)(Figure4A;Additional file1:TableS5).The12genesfoundtobeuniquely detectedbymicroarrayhadacorrelationofrs=0.92 (p-value0.002)withqRT-PCR(Figure4B;Additionalfile1: TableS5).Thesecorrelationsareslightlylowerwhencomparedtotheconsensusgenes(rs 0.94).Thisindicated thattheexpressionlevelsaremorereliableforthegenes thataredeterminedtobesignificantlydifferentially expressedbybothRNA-seqandmicroarrayratherthanby anyonemethod.Moreover,italsoindicatedthatthereisa lotofvariationinthemagnitudeofFCquantifiedbyRNAseqandqRT-PCR.Wefurtherevaluatedthisvariation i.e.deviationfromthemagnitudeofFC,byplottingthe frequencyhistogramforthe12genesuniquetoRNA-seqKogenaru etal.BMCGenomics 2012, 13 :629 Page5of16 http://www.biomedcentral.com/1471-2164/13/629

PAGE 6

C A D B Figure3 ComparisonofexpressionlevelsquantifiedbyRNA-seqandmicroarraywithqRT-PCR. ( A )Comparisonofexpressionlevels determinedbyRNA-seqwithqRT-PCR.( B )ComparisonofexpressionlevelsdeterminedbymicroarraywithqRT-PCR( C )Frequencyhistogram showingthepercentageofgeneswiththefolddifferencebetweenRNA-seqandqRT-PCR,withabinwidthof0.5.( D )Frequencyhistogram showingpercentageofgeneswiththefolddifferencebetweenmicroarrayandqRT-PCR,withabinwidthof0.5. A CD B Figure4 ComparisonofexpressionlevelsofgenesthatareuniquelydeterminedbyRNA-seqandmicroarraywiththatofqRT-PCR. Expressionlevelsforthesetofselectedgenesquantifiedby( A )RNA-seq,and( B )microarraywiththatofqRT-PCR.Frequencyhistogramshowing percentageofgenesdeviatingfromthemagnitudeofFC,quantifiedbyRNA-seq( C ),andmicroarray( D )withrespecttoqRT-PCR.Binwidthof 0.5and0.05areusedrespectively. Kogenaru etal.BMCGenomics 2012, 13 :629 Page6of16 http://www.biomedcentral.com/1471-2164/13/629

PAGE 7

(Figure4C)andmicroarray(Figure4D).Forthegenes uniquetoRNA-seq,weobservedthatnoneofthemhad thesamemagnitudeofFC,with50%geneshaving0to 0.5-timelowerandfortheremaining50%ofthegenesthe magnitudeofFCwasobservedtobe2to3-timehigher, whencomparedtoqRT-PCR(Figure4C).Becauseofthis inconsistenceinthemagnitudeofFC,theexpressionlevels aremoderatelycorrelated.Forthegenesuniquetomicroarray,weobservedagoodconsistenceinthemagnitudeof FCwithqRT-PCR(Figure4D).Comparisonintermsofdetectionofgenesencoding T3SSandeffectorsExtensiveanddetailedstudieshavebeencarriedout sincepastthreedecadesincataloguingthetargetgenes ofHrpXinthegenusXanthomonasusingvariousgeneticandbiochemicalmethods[32,38,39,50-55].HrpXis knowntoregulate hrp geneclusterthatencodesthetype IIIsecretionsystem(T3SS)andeffectors[31,56].T3SS arespecializedmacromolecularmachinerythatactasa nano-injectortotranslocatetheeffectorproteinsinto thecytoplasmofhostplantcells[50].Thesetranslocated effectorsmanipulatethehostcellularprocessesbyalteringsignaltransduction,transcriptionalactivitieslike suppressionofbasalplantdefenseresponses,andproteinturnoverinhostcellsforthebenefitofthepathogen [50].TheT3SSmachineriesareevolutionarilyconserved acrossmanyGram-negativeanimal-andplantpathogenicbacteria[57]. Xcciscomprisedof25 hrp genes,including19 hrp conserved( hrc )and6 hrp -associated( hpa )genesthat encodetheT3SS[58].Thesegenesareclusteredina ~25kbregionspanningfrom462712to488334bpof thegenome[32].Weappliedstatisticallysignificantdifferentiallyexpressedgenelistthatwerederivedfrom RNA-seqandmicroarraymethodstothiscluster.We countedforthenumberofknown hrp clustergenes, whichpassedtheFCandFDRcut-offthresholdsfrom RNA-seqandmicroarraymethods(Table1).Amongthe 25 hrp clustergenes,16(64%)weredetectedbyboth RNA-seqandmicroarraymethods.Sixgeneswerefound tobeuniquelydetectedbymicroarray,whereas,none uniquelydetectedbyRNA-seq(Table1).Threegenes namely, hrcC hpa2 ,and hpaA couldnotpassourstatisticalcut-offcriteriabyanyofthemethods,although theyfollowedthesamequalitativeexpressionpattern. Wefurtherquantifiedthedeviationinthemagnitudeof FCforthe16known hrp genes,foundinconsensusbetweenRNA-seqandmicroarray(Figure5A).ThemagnitudeofFCfor5%genesfoundtobesame,whileforthe remaining95%genesitwasfoundtobebetween1.2to 1.8-timehigherinRNA-seqthaninmicroarray.Even though,microarrayoveralldetectedmoregenesfrom hrp cluster,RNA-seqreportedhighermagnitudeofFC (Table1). Xccalsoencodes25putativeeffectorgenesregulated byHrpX,whichmeditatetheinteractionwiththehost plant,hencedeterminethehostspecificity[55].Since XAC2785,XAC1210andXAC1209wereconsideredas pseudoorinactivegenes,theywereexcludedfromour analysis.Wetabulatedhowmanyofthesegeneswere detectedbyRNA-seqandmicroarraymethodswiththeir correspondinglog2FCvaluesalongwithp-valueand FDRfromtherespectivemethods(Table2).Intotal,10 (45.5%)genesweredetectedbyboththemethods RNAseqandmicroarrayuniquelydetectedoneandthree genesrespectively.Theremaining9genes(36.4%)were neitherdetectedbyRNA-seqnorbymicroarray,since theycouldnotpassboththeFCandFDRcut-offs (Table2).Forthe10consensusgenes,wecalculatedthe folddifferencesinthemagnitudeofFCquantifiedby RNA-seqwithrespecttomicroarray.Noneofthegenes hadthesamemagnitudeofFCbetweenthetwomethods.MicroarrayestimatedhighermagnitudeofFCfor ~64%genesthanRNA-seq,whileRNA-seqestimated 1.2to1.8-timehighermagnitudeofFCforthe remaining~36%genes(Figure5B).Incontrastto hrp genecluster,wheremicroarrayqualitativelyoutperformedRNA-seqinitsabilitytodetectmoregenes,here RNA-seqcomplementedquantitativelywithhigherconfidencebyreportinghighermagnitudeofFCs.Thereby, fortheeffectorgenedataset,RNA-seqandmicroarray complementedeachotherbothqualitativelyaswellas quantitatively. Overall,consideringT3SSandeffectorgenes,intotal thereare47genes,fromwhich,26genes(55%)were detectedbybothRNA-seqandmicroarray(Tables1and 2).RNA-sequniquelydetected1gene(2%),whereas, microarraydetected9genes(19%).Remaining11genes (23%)werenotdetectedbyeitheroneofthemethodsby failingtopassthecut-offthreshold(Tables1and2). Further,consideringonlythegenesthataredetectedby atleastonemethod,72%oftheknownweredetectedby bothmethods,whileremaining28%weredetectedbyeitheroneofthemethods.GenesuniquelydetectedbyRNA-seqandmicroarrayAmongthe87statisticallysignificantdifferentially expressedgenesfromRNA-seq,42(39.63%)geneswere foundtobeuniquelydetectedbythismethod(Additionalfile2:FigureFS2).Ofthese42genes,17were foundtobedown-regulated,while25wereup-regulated (Additionalfile1:TableS4).Nearly98%ofthesegenes (41of42unique)couldnotpasstheFCcut-offthresholdbymicroarray.Theonlyexceptionisthegene fliO (XAC1945)thatencodesaflagellarproteinforflagellum apparatus,whichpassedtheFCcut-off,butfailedwithKogenaru etal.BMCGenomics 2012, 13 :629 Page7of16 http://www.biomedcentral.com/1471-2164/13/629

PAGE 8

Table1SummaryofTypeIIIsecretionsystem(T3SS) hrp clustergenesdetectedbyRNA-seqandmicroarrayLocusTagGeneSymbolRNA-seqMicroarrayDetectedby log2FCp-valueFDRlog2FCp-valueFDR XAC0412 hrcN Š 2.23557.66E-091.27E-06 Š 0.82170.00E+000.00E+00 † XAC0409 hrcJ Š 3.14884.20E-151.40E-12 Š 2.18400.00E+000.00E+00 † XAC0406 hrcU Š 2.67291.01E-132.56E-11 Š 1.05070.00E+000.00E+00 † XAC0405 hrcV Š 1.57553.27E-084.88E-06 Š 0.84274.00E-052.58E-03 † XAC0407 hrpB1 Š 3.96387.53E-283.61E-25 Š 2.86030.00E+000.00E+00 † XAC0408 hrpB2 Š 2.81552.48E-094.67E-07 Š 1.95070.00E+000.00E+00 † XAC0410 hrpB4 Š 1.92747.42E-056.05E-03 Š 1.52370.00E+000.00E+00 † XAC0403 hrcQ Š 2.06156.36E-077.64E-05 Š 0.86470.00E+000.00E+00 † XAC0402 hrcR Š 1.71234.73E-065.53E-04 Š 0.86770.00E+000.00E+00 † XAC0399 hrpD5 Š 1.72872.67E-105.50E-08 Š 1.54870.00E+000.00E+00 † XAC0398 hrpD6 Š 1.54465.85E-077.43E-05 Š 1.63470.00E+000.00E+00 † XAC0397 hrpE Š 2.15357.67E-162.76E-13 Š 1.81630.00E+000.00E+00 † XAC0394 hrpF Š 2.05177.04E-152.17E-12 Š 1.21130.00E+000.00E+00 † XAC0416 hpa1 Š 5.00961.49E-511.29E-48 Š 4.29170.00E+000.00E+00 † XAC0396 hpaBŠ 1.54291.56E-072.11E-05 Š 1.25270.00E+000.00E+00 † XAC0393 hpaF Š 0.92351.96E-041.34E-02 Š 0.60831.00E-058.10E-04 † XAC0411 hrpB5 Š 1.96903.92E-031.38E-01 Š 0.84570.00E+000.00E+00 XAC0401 hrcS Š 1.01564.00E-024.71E-01 Š 1.08370.00E+000.00E+00 XAC0404 hpaP Š 1.26731.10E-022.74E-01 Š 1.11100.00E+000.00E+00 XAC0395XAC0395 Š 1.08174.73E-025.08E-01 Š 0.86400.00E+000.00E+00 XAC0415 hrcC Š 0.27132.49E-018.27E-01 Š 0.40534.75E-031.78E-01$ XAC0413 hrpB7 Š 1.21071.08E-022.74E-01 Š 0.65900.00E+000.00E+00 XAC0417 hpa2 Š 1.41308.33E-032.31E-01 Š 0.46104.40E-042.64E-02$ XAC0400 hpaA Š 1.05852.76E-031.10E-01 Š 0.77639.74E-032.05E-01$ XAC0414 hrcT Š 1.04741.10E-022.74E-01 Š 0.68000.00E+000.00E+00 † ConsensusbetweenRNA-seqandmicroarray16genes(64%). OnlyRNA-seq0gene(0%). Onlymicroarray6genes(24%). $Undetected3genes(12%). ThelistofknownT3SS hrp clustergenesalongwiththeinformationaboutthelog2FC,p-valueandFDRvaluefromRNA-seqandmicroarrayexperiments. "Detectedby"columnassignsbywhichmethodtheknown hrp clustergeneisfoundtobesignificantlydifferentiallyexpressed. AB Figure5 ComparisonofexpressionlevelsofgenesencodingT3SSandeffectorsthatarecommonlydetectedbyRNA-seqand microarray. ( A )FrequencyhistogramshowingpercentofgenesdeviatingfromthemagnitudeofFCquantifiedbyRNA-seqwithrespectto microarrayfor hrp genecluster.Binwidthof0.2isused.( B )Frequencyhistogramshowingpercentageofgenesdeviatingfromthemagnitudeof FCquantifiedbyRNA-seqwithrespecttomicroarrayfortheT3SSandeffectorgenes.Binwidthof0.5isused. Kogenaru etal.BMCGenomics 2012, 13 :629 Page8of16 http://www.biomedcentral.com/1471-2164/13/629

PAGE 9

FDRthreshold.ThegeneXAC0755encodingKdpF,a componentofanintegralmembranepotassiumtransportingsystem[59],isdown-regulatedbyafactor of3(log2FCof1.6)accordingtoRNA-seq,but,microarraycouldnotcapturethis,astheprobesforthisgene weremissingonthechip.Thisshowsthelimitationof microarray,whereprobesforallthegenesneedtobe definedwhiledesigningthechip.Furthermore,four genesuniquelyfoundbyRNA-seqareinvolvedinsignal transductionandgeneregulation,i.e.XAC4116encodingaserine/threoninekinase,XAC1819encodinga tryptophan-richsensoryprotein,andtworegulatory genesXAC3026,andXAC3363,whosefunctionincitrus cankerdiseasedevelopmentremaintobeexplored.Furthermore,21genes(24%)arecurrentlyannotatedas hypotheticalproteins(Additionalfile1:TableS6). Amongthem,fourhypotheticalproteinsXAC0854, XAC4131,XAC1203,andXACb0064werepredictedto beT3SSsecretedwhile7hypotheticalproteins, XAC3275,XAC3680,XAC1943,XAC0527,XAC0599, XAC0239,andXAC0755werepredictedtobeType2 SecretionSystem(T2SS)substrates(Additionalfile1: TableS6)byEffectivedatabase[60].Gram-negativebacteriaemployT2SStotransportproteinstotheextracellular milieu,wheretheT2SSexo-prot einscontainingN-terminal signalpeptidesareusedforinner-membranetranslocationthrougheithertheSectransloconortheTatcomplex[61].GenesencodingproteinssecretedbyT3SSand T2SShavebeenexperimentallyprovedtoberegulated byHrpX[33,62,63]. Amongthe64statisticallysignificantdifferentially expressedgenesfrommicroarray,19(29.7%)genes werefoundtobeuniquelydetectedbythismethod (Additionalfile2:FigureFS2).18werefoundtobe down-regulated,whileonegenewasup-regulated (Additionalfile1:TableS4).UnlikethatofRNA-seq, nearly63%genes(12of19unique)couldpasstheFC cut-offthreshold,butfailedtopasstheFDRthreshold byRNA-seq.Theremaining37%genes(7of19 Table2SummaryofTypeIIIeffectorgenesdetectedbyRNA-seqandmicroarrayLocusTagGeneSymbolRNA-seqMicroarrayDetectedby log2FCp-valueFDRlog2FCp-valueFDR XAC0286 xopE Š 1.34321.21E-071.69E-05 Š 1.18130.00E+000.00E+00 † XACb0011 avrXacE3 Š 0.90982.20E-041.51E-02 Š 1.63630.00E+000.00E+00 † XAC0754 xopI Š 1.18305.40E-043.11E-02 Š 0.72870.00E+003.00E-05 † XAC3085 xopK Š 2.15050.00E+000.00E+00 Š 1.71570.00E+000.00E+00 † XAC2786 xopN Š 3.51170.00E+000.00E+00 Š 2.98970.00E+000.00E+00 † XAC1208 xopP Š 1.18601.00E-056.20E-04 Š 0.75830.00E+000.00E+00 † XAC0277 xopR Š 1.12026.00E-055.10E-03 Š 1.23600.00E+000.00E+00 † XAC0543 xopX Š 3.07650.00E+000.00E+00 Š 3.44230.00E+000.00E+00 † XAC3230 xopAI Š 1.43530.00E+000.00E+00 Š 1.15100.00E+000.00E+00 † XAC2922 hrpW Š 2.08330.00E+000.00E+00 Š 2.77230.00E+000.00E+00 † XAC4213 xopAD Š 0.88162.60E-041.68E-02 Š 0.39602.17E-022.72E-01 XAC4333 xopQ Š 0.67791.95E-023.51E-01 Š 1.11900.00E+000.00E+00 XAC0601 xopV Š 0.51975.44E-025.33E-01 Š 0.73670.00E+002.70E-04 XAC0076 avrBs2 Š 0.86321.79E-037.90E-02 Š 0.5665.00E-053.53E-03$ XAC3090 xopL Š0.34022.58E-018.37E-01 Š 0.60400.00E+000.00E+00 XAC3224 xopE Š 0.57162.45E-023.85E-01 Š 0.23972.68E-024.48E-01$ XAC2009 xopZ Š 0.64569.45E-032.56E-01 Š 0.41601.30E-049.38E-03$ XAC3666 xopAK Š 0.27562.89E-018.59E-01 Š 0.20603.23E-019.82E-01$ XACa0022 pthA1 0.02258.00E-019.80E-010.07906.69E-019.98E-01$ XACa0039 pthA2 Š 0.45012.09E-017.97E-010.04138.25E-019.98E-01$ XACb0015 pthA3 Š 1.17003.52E-031.28E-010.07237.13E-019.98E-01$ XACb0065 pthA4 Š 0.34238.51E-011.00E+000.03157.23E-019.98E-01$† ConsensusbetweenRNA-seqandmicroarray10genes(45.5%). Onlymicroarray3genes(13.6%). OnlyRNA-seq1genes(4.5%). $Undetected8genes(36.4%). ThelistofknownT3SSeffectorgenesalongwiththeinformationaboutthelog2FC,p-valueandFDRvaluederivedfromRNA-seqandmicroarrayexperiments. "Detectedby"columnassignsbywhichmethodthefollowingeffectorgenesarefoundtobesignificantlydifferentiallyexpressed.Kogenaru etal.BMCGenomics 2012, 13 :629 Page9of16 http://www.biomedcentral.com/1471-2164/13/629

PAGE 10

unique)couldnotpassbothFCandFDRcut-off threshold.Furthermore,sixgeneswerefoundtobe hypothetical.AmongthemXAC2876,XAC1241,and XAC2370werepredictedasT2SSsubstrates.XAC1241 predictedasaT2SSsubstrate,shared73%identitywith aputativesecretedproteinfrom X.campestris pv .vesicatoria strain85 – 10.AnotherT2SScandidate XAC2370shared95%identitywithasecretedprotein from X.fuscans subsp. aurantifolii str.ICPB10535. XAC1124shared100%identitywithMEKHLAdomain proteinfrom X.axonopodis pv .punicae str.LMG859 [33].Thisdomainisfoundinbacteriaassociatedwith plants.ItfurthersharessimilaritywiththePASdomainandmightbeinvolvedinlight,oxygen,and redoxpotentialsensation[64].Comparisonattheleveloffunctionalannotationsof genesForcomparisonbasedonthebiologicalfunctionforthe differentiallyexpressedgenesfromRNA-seqandmicroarray,weutilizedtheClueGOtointegratetheGene Ontology(GO)[65]termsandKEGG[66]pathway termsandcreateafunctionallyorganisedGO/KEGG network.Functionalannotationwithbiologicalprocesses categoryresultedin13(14.94%)genesfoundfromclusterforRNA-seq,whileformicroarrayitwas12 (19.35%). TheClueGOoverviewpiecharthighlightedthatsignificantproportionofthegenesdifferentiallyregulated areinvolvedin “ proteinsecretionbytheT3SS ” byboth RNA-seqandmicroarray(Additionalfile3:FigureFS3A &D).Additionally,RNA-seqalsoidentifiedgenes involvedin “ secretionactivitybycell ” aswellas “ single organismcatabolicprocess ” (Additionalfile3:Figure FS3A).Ontheotherhand,microarrayhighlightedthe genesinvolvedin “ proteintransmembranetransport ” “ polycyclicaromatichydrocarbondegradation ” and “ establishmentoflocalizationincell ” (Additionalfile3: FigureFS3D).Majorityofthegenesareinvolvedin “ bacterialsecretionsystem ” ,asshownbybothRNA-seqand microarray.Alsothedifferentiallyexpressedgenesare foundtobesignificantlyinvolvedinthe “ transportof monovalentinorganiccation ” (Additionalfile3:Figure FS3B)and “ proteintransport ” (Additionalfile3:FS3E). Geneshavealsobeenfounduniquelybymicroarrayas significantlyinvolvedin “ polycyclicaromatichydrocarbondegradation ” (Additionalfile3:FigureFS3E).Genes fromRNA-seqhavebeenfoundtobeinvolvedin “ riboflavinmetabolism ” aswellas “ singleorganismcatabolic process ” (Additionalfile3:FigureFS3B).Further, visualizationofthefunctionallygroupedannotationnetworkforthedifferentiallyregulatedgenesderivedfrom RNA-seq(Additionalfile3:FigureFS3C)andmicroarray (Additionalfile3:FigureFS 3F)methodshighlightedthe relationshipsbetweentheterms.RNA-seqhighlighted “ proteinsecretionbytheT3SS ” alongwiththe “ smallmolecule catabolicprocess ” ,whilemicroarrayreflected “ polycyclic aromatichydrocarbondegradation ” and “ establishmentof localizationincell ” ,asthemostsignificanttermsofthe group.ThisanalysisalsoshowedthatRNA-seqandmicroarraytogetherprovidemorecomprehensivefunctionalinformationthantheindividualmethods.PIPboxdetectionHrpXisknowntoregulatethetargetgeneexpressionby specificallybindingtoPIPboxmotifpresentinthe cis-regulatoryregions.PIPboxconsistsofdirect repeatsof “ TTCGC ” withaspacerof8to26-bpsin betweentherepeats,eventhoughideally8-bpsand 15-bpsareconsideredasthecanonicalPIPbox[37]. WeexploitedthisfeatureandlookedforPIPboxesin thepromoterregionsofthe106significantlydifferentiallyexpressedgenes(Additionalfile4).Allthe106 differentiallyexpressedgenescouldbeassignedto90 transcriptionalunitsbasedonMetaCycdatabase[67] (Additionalfile1:TableS8).Howeverforsimplicity, genesunderthecontrolofthesamecis-regulatory regionswerecountedseparately.Amongtheconsensus 45genes,36(80%)wereshowntohavecanonicalPIP boxes(Figure6A,Additionalfile1:TableS7).Ofthe42 genesthatareuniquelydeterminedbyRNA-seq,13 (31%)geneswereconfirmedtohavePIPboxes;whereas, amongthe19genesthatareuniquelydeterminedby microarray11(57.8%)geneswereconfirmedtohavePIP boxes(Figure6A,Additionalfile1:TableS7). Inthisstudy,weidentifiednewlyPIPboxmotifin7 (19.4%)genesamongconsensus,13(100%)genesunique toRNA-seqand1(9%)geneuniquetomicroarray (Figure6B).Overall,60ofthe106(~57%)significantlydifferentiallyexpressedgeneswereconfirmedto havePIPboxesintheircis-regulatoryregions(Additionalfile1:TableS7,Additionalfile4).ThepresenceofPIPboxconfirmedthatthesegenesmaybe directlyregulatedbyHrpX,whiletheremaining46 thatdonothavePIPboxesmaybeindirectlyregulatedbyHrpXviatheothertranscriptionfactors.In thisregard,welookedforgeneswithsequencespecificDNAbindingactivityinthe106differentially expressedgenes.Sixgenesnamely hrpG pcaQ blal XAC3026,XAC3445,andXAC3446wereknownto havesequencespecificDNAbindingactivityaccording toGOannotation.Amongthem,XAC3446,XAC3445, and blaI havebeennewlyidentifiedinthisstudycontainingPIPboxmotif(Additionalfile1:TableS7, Additionalfile4).Therebythese3transcriptionregulatorsaredirectlyregulatedbyHrpX,whichinturn weassumeregulatethe46genes,whichdonotKogenaru etal.BMCGenomics 2012, 13 :629 Page10of16 http://www.biomedcentral.com/1471-2164/13/629

PAGE 11

containthePIPboxmotifandhenceindirectlyregulatedbyHrpX.TheDNAbindingsignaturesformany ofthesetranscriptionfactorsareunknown;hence,obscurethefurtherconfirmationofregulationbythese transcriptionfactors.Nevertheless,thefactthatmany ofthegenesthatwereuniquelydeterminedbyeach methodshowedaclearPIPboxintheircisregulatoryregionsreiteratesthatRNA-seqandmicroarraycomplementeachother.DiscussionCurrently,RNA-seqisbecomingthepreferablechoice forgeneexpressionprofilinginplaceofmicroarrays. Although,alltheparametersthatinfluencethevarious aspectsofthismethodareyettobeunderstoodcompletely,RNA-sequndoubtedlyisplayingaveryimportant roleindecipheringthecomplexityofthetranscriptome bygivinganewdirectiontoisoforms,allelicexpression, untranslatedregions,splicejunctions,antisenseregulationandintragenicexpression[10,16,29,68-74].Several studieshavebeguntoinvestigateontheparameterslike sequencingdepth,precision,GCbias,lengthbias,lane effects,andprocessingartifacts[16,29,48,75-77].Onthe otherhand,microarraysareinusageformorethantwo decades.Therefore,mostofthebiasesinherenttothis methodhavebecomemoreapparent[78].Forinstance, biasesinthehybridizationofthesampleslabeledwith Cyanine5(Cy5)andCyanine3(Cy3)aresufficiently explored,andcurrentlyseveralapproachesarepracticed tominimizesucheffects[79-82].Further,systematic variabilitylikeinfluenceoftheimagescannersettingson thedyeintensitymeasurementshavenowbeenrobustly handledbyapplyingvariousnormalizationtechniques [83-86].Despitethesedevelopments,someinherent genes – specificbiaseslikedifferentialhybridization efficienciesofthelabeledtargettranscripttothe sameprobearestillfoundtobeinevitableinmicroarrays.InRNA-seqaswellasmicroarray,allthese knownandunknownparametersinfluencethefinal outcome.Therefore,inthisstudy,wefocusedonthe assessmentofRNA-seqandmicroarraybasedonthe finaloutcome.i.e.statisticallysignificantdifferentially expressedgenes. IncomparisonwithpreviousRNA-seqstudies,witha sequencecoverageof97%weobservedforourdataset, isinconsistencewiththereported89.5%to95%coverageobservedinotherbacterialRNA-seqstudies[87-89]. Inourstudy,RNA-seqhasidentifiedmoresignificantly differentiallyexpressedgenes(82%),whencomparedto microarray(63%)asinpreviousstudies[18,29,30].The overallcorrelation(rs0.76)inthemagnitudesofFCfor theconsensusgenesbetweenthetwomethodswas foundtobesimilarorhigherthanpreviousstudies [18,29,30,72].Furthermore,ourcomparisonanalysiswith qRT-PCRsuggestedthattheexpressionlevelswere highlyreliableforthosegenesthatweredeterminedto bedifferentiallyexpressedbybothRNA-seqandmicroarray.Hence,confirmingthedifferentialexpressionof genesbymultiplemethodsreducesfalsepositives therebyenhancesthebiologicaldiscovery. EventhoughmicroarrayoveralloutperformedRNAseqbydetectingmoreknownHrpXtargetgenesfrom theT3SSin hrp clusterbysatisfyingbothFCandFDR cut-offthreshold,inprincipleRNA-seqalsodetected genes hrpB5,hrcS,hpaP ,XAC0395, hrpB7 ,and hrcT ,in termsofFC,butfailedtopassFDRthreshold.This parameterismoredirectlyinfluencedbyerrormodel consideredinthestatisticalmethodthatisusedto AB Figure6 StatisticsofHrpXbindingsitesinthecis-regulatoryregionsofsignificantlydifferentiallyexpressedgenesfromRNA-seqand microarray. Thegenesbelongingtoconsensusanduniquetoeachmethodareshown( A )PercentageofgenescontainingPIPboxinthe cis-regulatoryregionsofgenesthatbelongtothreedifferentgroupsareshown.Theknown(blackbar)andnovel(redbar)areindicated.( B )Only geneswithPIPboxareshown,percentageofwhichalreadyknown(blackbar)andnovel(redbar)areindicated. Kogenaru etal.BMCGenomics 2012, 13 :629 Page11of16 http://www.biomedcentral.com/1471-2164/13/629

PAGE 12

inferthedifferentialexpressionratherthanRNA-seq itself.Forthesamereadcounts,onecangetslightly differentFDRvaluesdependingonthestatistical method[90].Buttheimplementationofallthestatisticalmethodsisnotfeasibleforeverydataset.From theT3SSin hrp cluster,threegenesnamely, hrcC hpa2 ,and hpaA werenotfoundtobedetectedby bothRNA-seqandmicroarray,mainlybecausetheyfail topassFDRthreshold.Interestingly,ourprevious microarrayanalysisconfirmedthatallthesethree genesareregulatedbyHrpX,butonlyatalaterstage ofthegrowthphasebysatisfyingbothFCandFDR cut-offthresholds[33].Thisconsolidatestheregulation ofsomeofthegenesatlaterstagesofthegrowth phase.Further,incaseofTypeIIIeffectorgenes,8 genes(36.4%)werenotdetectedbybothRNA-seqand microarraywithinconsideredcut-offthresholdlimit. However,amongthem xopL avrBs2 xopAK and xopZ werefoundtoberegulatedbyHrpXonlyatthelater stageofthegrowthphase(OD600timepoint0.5), accordingtoourpreviousmicroarrayanalysis[33]. Further,fourgenesnamely, pthA2 pthA1 pthA3 pthA4 wereregulatedbyanothertranscriptionregulatorHrpGatearlystageofgrowthphase(OD600= 0.25and0.4)asobservedinourpreviousstudy,while anotherundetectedgene xopE wasfoundtobealso regulatedbyHrpG,butonlyatOD600=0.25time pointofgrowthphase[33].Therebythisstudyfurther validatedourpreviousresults.Subsequently,both methodsdetected100%ofthegenesknowntobe regulatedbyHrpX(attimepointOD600=0.4)without anyfalsepositives.Amongthem,72%weredetected byboththemethodswhileinterestingly28%ofthe knowntargetgenesweredetectedbyeitheroneofthe methods.Hence,boththemethodstogethercould complementeachother. Inaddition55genes(~51%)werenewlyidentifiedas differentiallyexpressedbyapplyingbothmicroarrayas wellasRNA-seqmethods,therebyaddinguptothe alreadyexistingrepertoireofHrpXregulatedgenes.Furthermore,46(83.6%)genesamongthemwereuniquely identifiedbyeitheroneofthemethods.Overall,21 newlyidentifiedgeneswerefoundtohavePIPboxin theirpromoterregions,wherein14(58.3%)geneswere uniquelyidentifiedbyeitherRNA-seqormicroarray. ThepresenceofthePIPboxinthepromoterregionsof theHrpX-regulatedgenesuniquelyidentifiedbyRNAseqandmicroarrayfurthernotonlyconfirmedthat thesegenesaredirectlyregulatedbyHrpX,butalsothat thesecandidatesarenotfalsepositives.Consequently, 100%oftheknownHrpXregulatedgenescouldonlybe detectedtogetherbyboththemethods,sinceeach methodmissedoutonsomeoftheknowngenes;hence boththemethodstogetherenhancetheunderstanding ofHrpXregulomebyprovidingamorecomprehensive picture.ConclusionsThisstudyhassignificantlyadvancedourunderstanding oftheregulomeofthecriticaltranscriptionalfactor HrpXanddemonstratesthatRNA-seqandmicroarray complementeachotherintranscriptomeprofiling.Consequently,ourstudydemonstratestheadvantageofapplyingmultipletranscriptomeprofilingmethodsto revealamorecomprehensivepictureofatranscriptome, ratherthanrelyingsolelyononemethod.MethodsBacterialstrainsandgrowthconditionsThewild-type X.citri subsp. citri [32] andthe hrpX mutantstrainsusedinthisstudyweredescribedinourpreviousstudy[33].Boththestrainsweregrownat28Cin nutrientbroth(NB),onnutrientagar(NA),orinNYG medium[91].Antibioticsrifamycinandkanamycinwere addedtothemediaat50 g/mlfinalconcentrations.RNAextractionTotalRNAwasextractedfromthewild-typeandthe hrpX mutantstrainsasdescribedinourpreviousstudy [33].Briefly,strainsfromNAplatesweregrowninNB mediumat28Cuntilmid-exponentialphase.Cultures wereharvestedbycentrifugationandinoculatedinto nutrient-deficientXVM2medium,afterwashingthepelletoncewiththesamemedium.Cultureswerefinally harvestedforRNAextraction,whentheopticaldensity at600nmreachedthevalueof0.4,andmixedimmediatelywithRNAprotectbacterialreagent(Qiagen,Valencia,CA,andU.S.A.).TotalRNAwasextractedfrom eachreplicateseparatelyusingRiboPurebacteriakit (Ambion,Austin,TX,USA),accordingtomanufacturer ’ s instructions.GenomicDNAcontaminationfromthe extractedRNAsampleswasremovedusingTURBO DNA-freekit(Ambion).Amountandthequalityofthe RNAsampleswasinitiallydeterminedusingNanoDrop ™ 1000spectrophotometer(NanoDropTechnologies,Inc., Wilmington,DE).Sampleswithabsorbencyat260/280 and260/230nmratios>2weresubjectedtofurther processing.Threebiologicalreplicatesofthewild-type andthe hrpX mutantsampleswereusedforRNA-seq analysis.MicroarraydataThemicroarraydatausedinthisstudywasgenerated duringourpreviousstudy[33].Threeunique60-mer oligonucleotideprobesweredesignedforeachofthe 4,427proteincodinggenesof X.citri subsp. citri [33].8by-15-KDNAmicroarraychipscoveringthewholegenomewereimplementedundertheAgilentplatform.Kogenaru etal.BMCGenomics 2012, 13 :629 Page12of16 http://www.biomedcentral.com/1471-2164/13/629

PAGE 13

ThesemicroarrayswereprocessedattheInterdisciplinaryCenterforBiotechnologyResearchMicroarrayCore Facility,UniversityofFlorida.Therawdataisavailable atNationalCenterforBiotechnologyInformation (NCBI)GeneExpressionOmnibus(GEO)datarepositoryundertheaccessionnumberGSE24016[33].mRNAenrichmentandRNA-seqTotalRNAsampleswereenrichedformRNA,bydepleting rRNAusingMICROBExpresskitfromAmbionfollowing themanufacturer ’ sinstructions.Enrichedsampleswere checkedforintegrityusingAgilent2100Bioanalyzer (AgilentTechnologies,SantaClara,CA,USA).RNA samplesthatpassedthequalitycontrolweresequenced usingtheIlluminaGenomeAnalyzerIIx(GAIIx)system byfollowingthestandardprotocolattheCenterfor GenomeAnalysisatYaleUniversity.Real-timeanalysis andbasecallingwereperformedusingtheCASAVA v1.6pipeline.TherawsequencedatahasbeensubmittedtotheNCBISequenceReadArchiveandassigned withanaccessionnumberSRA052842.ReadsmappingandstatisticalanalysisThe X.citri subsp. citri wholegenomesequenceconsisting ofonechromosome[GenBank:NC_003919.1],andtwo plasmids[GenBank:NC_003921.3andNC_003922.1], alongwiththeannotationinformationweredownloaded fromNCBIrepository(ftp://ftp.ncbi.nih.gov/genomes/ Bacteria/).Quality-filteredreadswerealignedontothe genomeusingCLCGenomicsWorkbenchv4.7.2(CLC bio,Aarhus,Denmark).Readsuniquelyalignedtoeach geneweretabulatedfromeachreplicateseparately.DifferentiallyexpressedgeneswereestimatedusingDESeq package[92],availableundertheopen-sourceBioconductorsuiteofprograms[93].DESeqisapowerfultooltoestimatethevarianceinRNA-seqdataandtestfor differentialexpression[92].Asaninput,DESeqacceptsa tableofreadcountsforeachgenefromdifferentbiological replicates,andestimatesthedifferentiallyexpressedgenes usingnegativebinomialdistribution[92].StatisticallysignificantdifferentiallyexpressedgenesfrombothmicroarrayandRNA-seqdatawereobtainedbyapplyingacutoffthresholdofFDR 0.05(5%)andanabsolutelog2fold-change 0.6.BioinformaticsanalysisSimilaritysearcheswereperformedonlineusingpositionspecificiterativeBLAST(PSI-BLAST)atNCBIsite againstnon-redundantproteindatabase[94].T3SSand T2SSpredictionswereperformedusingEffectivedatabase [60].ThepromoterregionsofthesignificantlydifferentiallyexpressedgeneswereretrievedmanuallyusingNCBI genomebrowsertolookforthepresenceofPIPboxes. Thedifferentiallyexpressedgeneswereassignedtothe transcriptionalunitsbyreferringtotheMetaCycdatabase [67].Biologicalinterpretationofthedifferentiallyexpressed geneswascarriedoutusingtheClueGOv1.5[95],a Cytoscapeplug-in[96].qRT-PCRAlltheqRT-PCRassayswereperformedasdetailed elsewhere[33].Briefly,gene-specificprimerswere designedfortheselectedgenesusingPrimerQuestSMfromIntegratedDNAtechnologies(IDT),Coralville, Iowa(Additionalfile1:TableS6).qRT-PCRexperimentswereperformedintriplicates,atleastthree timesforeachgeneusing7500fastreal-timePCRsystem (AppliedBiosystems,FosterCity,CA,USA),usinga QuantiTectSYBRgreenRT-PCRkit(Qiagen)withsimilarresults,byfollowingthemanufacturer ’ sinstructions. Therelativefoldchangeoftargetgeneexpressionwas calculatedusing16SrRNAasanendogenouscontrol withtheformula2– CT[97].DataavailabilityTherawRNA-seqdatafromthisstudyisdepositedat theNCBIsequencereadarchive(http://www.ncbi.nlm. nih.gov/Traces/sra/sra.cgi),undertheaccessionnumber SRA052842,whiletherawmicroarraydataisavailableat theNCBIGeneExpressionOmnibus(http://www.ncbi. nlm.nih.gov/geo)withtheaccessionnumberGSE24016.AdditionalfilesAdditionalfile1: Thefollowingexcelformatfilecontainsthe following8additionaltables:TableS1: SummaryofRNA-seqreads fromwild-typeand hrpX mutantstrainsof X.citri subsp. citri .TableS2:List ofgenesthatarecalledbybothRNA-seqandmicroarray.TableS3:Listof genesthatareuniquelycalledbyRNA-seqandmicroarray.TableS4:List ofstatisticallysignificantdifferentiallyexpressedgenesbyRNA-seqand microarrayfilteredbycut-offthresholds.TableS5:Listofrandomly selectedgenesforthecomparisonwithqRT-PCRfromthestatistically significantdifferentiallyexpressedgenesfromRNA-seqandmicroarray. TableS6:GenespecificprimersusedinqRT-PCRexperiment.TableS7: Summaryofbioinformaticsanalysisofstatisticallysignificantdifferentially expressedgenestobepartofTypeIIISecretionSystem(T3SS)andType IISecretionSystem(T2SS)alongwiththeoccurrenceofPIPbox.TableS8: Listof90transcriptionalunitsfrom X.citri subsp. citri towhichthe106 differentiallyregulatedgenesbelong. Additionalfile2: Containsthefollowingtwoadditionalfigures, FigureFS1: Venndiagramsummarizinggenescalledbyboth technologies,whencomparisoniscarriedoutbetweenthetotalcurrently annotatedopenreadingframes(ORFs)availabletranscriptsfromthe transcriptomeof X.citri subsp. citri .Fold-changevaluesareavailablefrom RNA-seq(4323)andmicroarray(4349).Gene ’ scalledbyboth technologiesareindicatedbytheoverlapbetweenthetwocircles.4312 arefoundinconsensus,while11and37areuniquetoRNA-seqand microarrayrespectively.FigureFS2:Venndiagramsummarizinggenes thataresignificantlydifferentiallyexpresseddeterminedbyRNA-seqand microarray.Gene ’ scommontobothmethodsareindicatedbythe overlapbetweenthetwocircles. Additionalfile3: FigureFS3-Comparisonatthelevelofthefunctional annotationsofthesignificantlydifferentiallyexpressedgenesfromRNAseqandmicroarray.GOtermandKEGGpathwayinformationenrichmentKogenaru etal.BMCGenomics 2012, 13 :629 Page13of16 http://www.biomedcentral.com/1471-2164/13/629

PAGE 14

analysisisshownforthegenesfromRNA-seq(leftpanel)andmicroarray (rightpanel).Theoverviewoftheanalysisisshownintheformofpie chartforgenesetfromRNA-seq(A),andmicroarray(D).Thehistogram showsthenumberofgenesassociatedwithtermsforthegenesfrom RNA-seq(B)andmicroarray(E).Significantlyenrichedtermsareindicated with ’ ’ .Thetermsthatarefunctionallyrelatedareshownasanetwork withtermsasnodesandrelatednessisindicatedwiththicknessofthe edgesthatisbasedontheirkappascore.Themostsignificanttermper groupareshownforgenesfromRNA-seq(C)andmicroarray(F). Additionalfile4: FigureFS4-SnapshotofthePIPboxmotifpresentin thecis-regulatoryregionofsignificantlydifferentiallyexpressedgenesis showninthecontextofthewholegenomeof X.citri subsp. citri .The absolutepositionofeachPIPboxmotifoccurrenceisshownonthe wholegenomemapalongwiththe Š 10 ‘ TATA ’ regionsandthegene startsite. Competinginterests Wedeclarenocompetinginterests. Authors ’ contributions NWconceivedtheideaandinitiatedthestudy.SKperformedanalysisand interpretationofthedataandwrotethemanuscript.QYcarriedoutthePCR experiments.YGparticipatedinthestudydesign,sampleanddata collection,anddatainterpretation.NWparticipatedininterpretationofdata andwritingthemanuscriptandsupervisedtheoverallwork.Allauthorsread andapprovedthefinalmanuscript. Acknowledgements WewishtoacknowledgeGabrielaBindeaforprovidingannotationfilesfor ClueGOsoftware.ThisworkwassupportedbyUnitedStatesDepartmentof Agriculture-NIFASpecialCitrusCankerGrantProject94677. Received:18July2012Accepted:5November2012 Published:15November2012 References1.WangZ,GersteinM,SnyderM: RNA-Seq:arevolutionarytoolfor transcriptomics. NatRevGenet 2009, 10 :57 – 63. 2.BaginskyS,HennigL,ZimmermannP,GruissemW: Geneexpression analysis,proteomics,andnetworkdiscovery. PlantPhysiol 2010, 152 :402 – 410. 3.CloonanN,ForrestAR,KolleG,GardinerBB,FaulknerGJ,BrownMK,Taylor DF,SteptoeAL,WaniS,BethelG, etal : Stemcelltranscriptomeprofiling viamassive-scalemRNAsequencing. NatMethods 2008, 5 :613 – 619. 4.CostaV,AngeliniC,DeFeisI,CiccodicolaA: Uncoveringthecomplexityof transcriptomeswithRNA-Seq. JBiomedBiotechnol 2010, 2010 :853916. 5.DeRisiJ,PenlandL,BrownPO,BittnerML,MeltzerPS,RayM,ChenY,SuYA, TrentJM: UseofacDNAmicroarraytoanalysegeneexpressionpatterns inhumancancer. NatGenet 1996, 14 :457 – 460. 6.EkinsR,ChuFW: Microarrays:theiroriginsandapplications. Trends Biotechnol 1999, 17 :217 – 218. 7.FodorSP,RavaRP,HuangXC,PeaseAC,HolmesCP,AdamsCL: Multiplexed biochemicalassayswithbiologicalchips. Nature 1993, 364 :555 – 556. 8.HegdeP,QiR,AbernathyK,GayC,DharapS,GaspardR,HughesJE, SnesrudE,LeeN,QuackenbushJ: AconciseguidetocDNAmicroarray analysis. Biotechniques 2000, 29 :548 – 4.556. 9.MargueratS,BahlerJ: RNA-seq:fromtechnologytobiology. CellMolLife Sci 2010, 67 :569 – 579. 10.NagalakshmiU,WangZ,WaernK,ShouC,RahaD,GersteinM,SnyderM: ThetranscriptionallandscapeoftheyeastgenomedefinedbyRNA sequencing. Science 2008, 320 :1344 – 1349. 11.NagalakshmiU,WaernK,SnyderM: RNA-Seq:amethodfor comprehensivetranscriptomeanalysis. CurrProtocMolBiol 2010, Chapter4 :.Unit4.11.1-13. 12.RamsayG: DNAchips:state-of-theart.NatBiotechnol 1998, 16 :40 – 44. 13.SchenaM,ShalonD,DavisRW,BrownPO: Quantitativemonitoringof geneexpressionpatternswithacomplementaryDNAmicroarray. Science 1995, 270 :467 – 470. 14.TanPK,DowneyTJ,SpitznagelELJr,XuP,FuD,DimitrovDS,LempickiRA, RaakaBM,CamMC: Evaluationofgeneexpressionmeasurementsfrom commercialmicroarrayplatforms. NucleicAcidsRes 2003, 31 :5676 – 5684. 15.ToungJM,MorleyM,LiM,CheungVG: RNA-sequenceanalysisofhuman B-cells. GenomeRes 2011, 21 :991 – 998. 16.WilhelmBT,MargueratS,WattS,SchubertF,WoodV,GoodheadI,Penkett CJ,RogersJ,BahlerJ: Dynamicrepertoireofaeukaryotictranscriptome surveyedatsingle-nucleotideresolution. Nature 2008, 453 :1239 – 1243. 17.XiangCC,ChenY: cDNAmicroarraytechnologyanditsapplications. BiotechnolAdv 2000, 18 :35 – 46. 18.MortazaviA,WilliamsBA,McCueK,SchaefferL,WoldB: Mappingand quantifyingmammaliantranscriptomesbyRNA-Seq. NatMethods 2008, 5 :621 – 628. 19.ParisetL,ChillemiG,BongiorniS,RomanoSV,ValentiniA: Microarraysand high-throughputtranscriptomicanalysisinspecieswithincomplete availabilityofgenomicsequences. NBiotechnol 2009, 25 :272 – 279. 20.SchenaM,HellerRA,TheriaultTP,KonradK,LachenmeierE,DavisRW: Microarrays:biotechnology ’ sdiscoveryplatformforfunctionalgenomics. TrendsBiotechnol 1998, 16 :301 – 306. 21.FuX,FuN,GuoS,YanZ,XuY,HuH,MenzelC,ChenW,LiY,ZengR, etal : EstimatingaccuracyofRNA-Seqandmicroarrayswithproteomics. BMCGenomics 2009, 10 :161. 22.ShendureJ: Thebeginningoftheendformicroarrays? NatMethods 2008, 5 :585 – 587. 23.vanVlietAH: Nextgenerationsequencingofmicrobialtranscriptomes: challengesandopportunities. FEMSMicrobiolLett 2010, 302:1 – 7. 24.RazT,KapranovP,LipsonD,LetovskyS,MilosPM,ThompsonJF: Protocol dependenceofsequencing-basedgeneexpressionmeasurements. PLoS One 2011, 6 :e19287. 25.MartinJA,WangZ: Next-generationtranscriptomeassembly. NatRev Genet 2011, 12 :671 – 682. 26.ZhouX,RenL,MengQ,LiY,YuY,YuJ: Thenext-generationsequencing technologyandapplication. ProteinCell 2010, 1 :520 – 536. 27. ‘ tHoenPA,AriyurekY,ThygesenHH,VreugdenhilE,VossenRH,deMenezes RX,BoerJM,vanOmmenGJ,denDunnenJT: Deepsequencing-based expressionanalysisshowsmajoradvancesinrobustness,resolutionand inter-labportabilityoverfivemicroarrayplatforms. NucleicAcidsRes 2008, 36 :e141.doi:e141. 28.LeimenaMM,WelsM,BongersRS,SmidEJ,ZoetendalEG,KleerebezemM: ComparativeAnalysisofLactobacillusplantarumWCFS1Transcriptomes byUsingDNAMicroarrayandNext-GenerationSequencing Technologies. ApplEnvironMicrobiol 2012, 78 :4141 – 4148. 29.MarioniJC,MasonCE,ManeSM,StephensM,GiladY: RNA-seq:an assessmentoftechnicalreproducibilityandcomparisonwithgene expressionarrays. GenomeRes 2008, 18 :1509 – 1517. 30.SuZ,LiZ,ChenT,LiQZ,FangH,DingD,GeW,NingB,HongH,Perkins RG, etal : Comparingnext-generationsequencingandmicroarray technologiesinatoxicologicalstudyoftheeffectsofaristolochicacid onratkidneys. ChemResToxicol 2011, 24 :1486 – 1493. 31.WengelnikK,BonasU: HrpXv,anAraC-typeregulator,activates expressionoffiveofthesixlociinthehrpclusterofXanthomonas campestrispv.vesicatoria. JBacteriol 1996, 178 :3462 – 3469. 32.daSilvaAC,FerroJA,ReinachFC,FarahCS,FurlanLR,QuaggioRB, Monteiro-VitorelloCB,VanSluysMA,AlmeidaNF,AlvesLM, etal : ComparisonofthegenomesoftwoXanthomonaspathogenswith differinghostspecificities. Nature 2002, 417 :459 – 463. 33.GuoY,FigueiredoF,JonesJ,WangN: HrpGandHrpXplayglobalrolesin coordinatingdifferentvirulencetraitsofXanthomonasaxonopodispv. citri. MolPlantMicrobeInteract 2011, 24 :649 – 661.34.CiveroloE: Bacterialcankerdiseaseofcitrus. JRioGrandeVallHorticSoc 1984, 37 :127 – 145. 35.Astua-MongeG,Freitas-AstuaJ,BacocinaG,RoncolettaJ,CarvalhoSA, MchadoMA: Expressionprofilingofvirulenceandpathogenicitygenes ofXanthomonasaxonopodispv.citri. JBacteriol 2005, 187 :1201 – 1205. 36.FenselauS,BonasU: SequenceandexpressionanalysisofthehrpB pathogenicityoperonofXanthomonascampestrispv.vesicatoriawhich encodeseightproteinswithsimilaritytocomponentsoftheHrp,Ysc, Spa,andFlisecretionsystems. MolPlantMicrobeInteract 1995, 8 :845 – 854. 37.KoebnikR,KrugerA,ThiemeF,UrbanA,BonasU: Specificbindingofthe Xanthomonascampestrispv.vesicatoriaAraC-typetranscriptionalKogenaru etal.BMCGenomics 2012, 13 :629 Page14of16 http://www.biomedcentral.com/1471-2164/13/629

PAGE 15

activatorHrpXtoplant-induciblepromoterboxes. JBacteriol 2006, 188 :7652 – 7660. 38.NoelL,ThiemeF,NennstielD,BonasU: TwonoveltypeIII-secreted proteinsofXanthomonascampestrispv.vesicatoriaareencodedwithin thehrppathogenicityisland. JBacteriol 2002, 184 :1340 – 1348. 39.AlfanoJR,CollmerA: ThetypeIII(Hrp)secretionpathwayofplant pathogenicbacteria:traffickingharpins,Avrproteins,anddeath. JBacteriol 1997, 179 :5655 – 5662. 40.BonasU: hrpgenesofphytopathogenicbacteria. CurrTopMicrobiol Immunol 1994, 192 :79 – 98. 41.ButtnerD,BonasU: RegulationandsecretionofXanthomonasvirulence factors. FEMSMicrobiolRev 2010, 34 :107 – 133. 42.IwamotoM,OkuT: CloningandmolecularcharacterizationofhrpXfrom Xanthomonasaxonopodispv.citri. DNASeq 2000, 11 :167 – 173. 43.GurlebeckD,ThiemeF,BonasU: TypeIIIeffectorproteinsfromtheplant pathogenXanthomonasandtheirroleintheinteractionwiththehost plant. JPlantPhysiol 2006, 163 :233 – 255. 44.OkuT,AlvarezAM,KadoCI: ConservationofthehypersensitivitypathogenicityregulatorygenehrpXofXanthomonascampestrisandX. oryzae. DNASeq 1995, 5 :245 – 249. 45.LahayeT,BonasU: MolecularsecretsofbacterialtypeIIIeffectorproteins. TrendsPlantSci 2001, 6 :479 – 485. 46.KelleyDR,SchatzMC,SalzbergSL: Quake:quality-awaredetectionand correctionofsequencingerrors. GenomeBiol 2010, 11 :R116. 47.KlebanovL,YakovlevA: Howhighistheleveloftechnicalnoisein microarraydata? BiolDirect 2007, 2 :9. 48.OshlackA,WakefieldMJ: TranscriptlengthbiasinRNA-seqdata confoundssystemsbiology. BiolDirect 2009, 4 :14. 49.SahlJW,RaskoDA: AnalysisofglobaltranscriptionalprofilesofenterotoxigenicEscherichiacoliisolateE24377A. InfectImmun 2012, 80 :1232 – 1242. 50.AlfanoJR,CollmerA: TypeIIIsecretionsystemeffectorproteins:double agentsinbacterialdiseaseandplantdefense. AnnuRevPhytopathol 2004, 42 :385 – 414. 51.CollmerA,BauerDW: ErwiniachrysanthemiandPseudomonassyringae: plantpathogenstraffickinginextracellularvirulenceproteins. CurrTop MicrobiolImmunol 1994, 192 :43 – 78. 52.CornelisGR: ThetypeIIIsecretioninjectisome. NatRevMicrobiol 2006, 4 :811 – 825. 53.SzczesnyR,JordanM,SchrammC,SchulzS,CogezV,BonasU,ButtnerD: FunctionalcharacterizationoftheXcsandXpstypeIIsecretionsystems fromtheplantpathogenicbacteriumXanthomonascampestris pvvesicatoria. NewPhytol 2010, 187 :983 – 1002. 54.VanGF,GeninS,BoucherC: Conservationofsecretionpathwaysfor pathogenicitydeterminantsofplantandanimalbacteria. TrendsMicrobiol 1993, 1 :175 – 180. 55.WhiteFF,PotnisN,JonesJB,KoebnikR: ThetypeIIIeffectorsof Xanthomonas. MolPlantPathol 2009, 10 :749 – 766. 56.LindgrenPB: Theroleofhrpgenesduringplant-bacterialinteractions. AnnuRevPhytopathol 1997, 35 :129 – 152. 57.LipscombL,SchellMA: Elucidationoftheregulonandcis-acting regulatoryelementofHrpB,theAraC-typeregulatorofaplant pathogen-liketypeIIIsecretionsysteminBurkholderiapseudomallei. JBacteriol 2011, 193 :1991 – 2001. 58.KimJG,ParkBK,YooCH,JeonE,OhJ,HwangI: Characterizationofthe Xanthomonasaxonopodispv.glycinesHrppathogenicityisland. JBacteriol 2003, 185 :3155 – 3166. 59.HuGB,RiceWJ,DroseS,AltendorfK,StokesDL: Three-dimensional structureoftheKdpFABCcomplexofEscherichiacolibyelectron tomographyoftwo-dimensionalcrystals. JStructBiol 2008, 161 :411 – 418. 60.JehlMA,ArnoldR,RatteiT: Effective – adatabaseofpredictedsecretedbacterialproteins. NucleicAcidsRes 2011, 39 :D591 – D595. 61.KorotkovKV,SandkvistM,HolWG: ThetypeIIsecretionsystem: biogenesis,moleculararchitectureandmechanism. NatRevMicrobiol 2012, 10 :336 – 351. 62.FurutaniA,TakaokaM,SanadaH,NoguchiY,OkuT,TsunoK,OchiaiH, TsugeS: IdentificationofnoveltypeIIIsecretioneffectorsin Xanthomonasoryzaepv.oryzae. MolPlantMicrobeInteract 2009, 22 :96 – 106. 63.WangL,RongW,HeC: TwoXanthomonasextracellular polygalacturonases,PghAxcandPghBxc,areregulatedbytypeIII secretionregulatorsHrpXandHrpGandarerequiredforvirulence. Mol PlantMicrobeInteract 2008, 21 :555 – 563. 64.MukherjeeK,BurglinTR: MEKHLA,anoveldomainwithsimilaritytoPAS domains,isfusedtoplanthomeodomain-leucinezipperIIIproteins. PlantPhysiol 2006, 140 :1142 – 1150. 65.AshburnerM,BallCA,BlakeJA,BotsteinD,ButlerH,CherryJM,DavisAP, DolinskiK, etal : Geneontology:toolfortheunificationofbiologyThe GeneOntologyConsortium. NatGenet 2000, 25 :25 – 29. 66.KanehisaM,GotoS,KawashimaS,NakayaA: TheKEGGdatabasesat GenomeNet. NucleicAcidsRes 2002, 30 :42 – 46. 67.CaspiR,AltmanT,DreherK,FulcherCA,SubhravetiP,KeselerIM,KothariA, KrummenackerM, etal : TheMetaCycdatabaseofmetabolicpathways andenzymesandtheBioCyccollectionofpathway/genomedatabases. NucleicAcidsRes 2012, 40 :D742 – D753. 68.CarninciP,KasukawaT,KatayamaS,GoughJ,FrithMC,MaedaN,OyamaR, RavasiT,LenhardB,WellsC, etal : Thetranscriptionallandscapeofthe mammaliangenome. Science 2005, 309 :1559 – 1563. 69.GraveleyBR,BrooksAN,CarlsonJW,DuffMO,LandolinJM,YangL,Artieri CG,vanBarenMJ,BoleyN,BoothBW, etal : Thedevelopmental transcriptomeofDrosophilamelanogaster. Nature 2011, 471 :473 – 479.70.ManeSP,EvansC,CooperKL,CrastaOR,FolkertsO,HutchisonSK,Harkins TT,Thierry-MiegD,Thierry-MiegJ,JensenRV: Transcriptomesequencingof theMicroarrayQualityControl(MAQC)RNAreferencesamplesusing nextgenerationsequencing. BMCGenomics 2009, 10 :264. 71.PanQ,ShaiO,LeeLJ,FreyBJ,BlencoweBJ: Deepsurveyingofalternative splicingcomplexityinthehumantranscriptomebyhigh-throughput sequencing. NatGenet 2008, 40 :1413 – 1415. 72.SultanM,SchulzMH,RichardH,MagenA,KlingenhoffA,ScherfM,Seifert M,BorodinaT,SoldatovA,ParkhomchukD, etal : Aglobalviewofgene activityandalternativesplicingbydeepsequencingofthehuman transcriptome. Science 2008, 321 :956 – 960. 73.TrapnellC,WilliamsBA,PerteaG,MortazaviA,KwanG,vanBarenMJ, SalzbergSL,WoldBJ,PachterL: Transcriptassemblyandquantificationby RNA-Seqrevealsunannotatedtranscriptsandisoformswitchingduring celldifferentiation. NatBiotechnol 2010, 28 :511 – 515. 74.VanBH,NislowC,BlencoweBJ,HughesTR: Most “ darkmatter ” transcripts areassociatedwithknowngenes. PLoSBiol 2010, 8 :e1000371. 75.BullardJH,PurdomE,HansenKD,DudoitS: Evaluationofstatistical methodsfornormalizationanddifferentialexpressioninmRNA-Seq experiments. BMCBioinforma 2010, 11 :94. 76.LabajPP,LeparcGG,LinggiBE,MarkillieLM,WileyHS,KreilDP: CharacterizationandimprovementofRNA-Seqprecisioninquantitative transcriptexpressionprofiling. Bioinformatics 2011, 27 :i383 – i391. 77.RissoD,SchwartzK,SherlockG,DudoitS: GC-contentnormalizationfor RNA-Seqdata. BMCBioinforma 2011, 12 :480. 78.DraghiciS,KhatriP,EklundAC,SzallasiZ: Reliabilityandreproducibility issuesinDNAmicroarraymeasurements. TrendsGenet 2006, 22 :101 – 109. 79.GoryachevAB,MacgregorPF,EdwardsAM: Unfoldingofmicroarraydata. JComputBiol 2001, 8 :443 – 461. 80.KerrMK,MartinM,ChurchillGA: Analysisofvarianceforgeneexpression microarraydata. JComputBiol 2000, 7 :819 – 837.81.TsengGC,OhMK,RohlinL,LiaoJC,WongWH: IssuesincDNAmicroarray analysis:qualityfiltering,channelnormalization,modelsof variationsandassessmentofgeneeffects. NucleicAcidsRes 2001, 29 :2549 – 2557. 82.WangX,WangD,ChenX,HuM,WangJ,LiY,GuoN,ShenB: cDNA cloningandfunctionanalysisoftwonovelerythroiddifferentiation relatedgenes. SciChinaCLifeSci 2001, 44 :99 – 105. 83.ClevelandWS,DevlinSJ,GrosseE: RegressionbyLocalFitting:Methods, Properties,andComputationalAlgorithms. JEcon 1988, 37 :87 – 114. 84.EngelenK,CoessensB,MarchalK,DeMB: MARAN:normalizingmicro-array data. Bioinformatics 2003, 19 :893 – 894. 85.IhakaR,GentlemanR: R:Alanguagefordataanalysisandgraphics. JournalofComputationalandGraphicalStatistics.JournalofComputational andGraphicalStatistics 1996, 5 :299 – 314. 86.VenetD: MatArray:aMatlabtoolboxformicroarraydata. Bioinformatics 2003, 19 :659 – 660. 87.KumarR,LawrenceML,WattJ,CookseyAM,BurgessSC,NanduriB: RNA-seqbasedtranscriptionalmapofbovinerespiratorydisease pathogen “ Histophilussomni2336 ” PLoSOne 2012, 7 :e29435.Kogenaru etal.BMCGenomics 2012, 13 :629 Page15of16 http://www.biomedcentral.com/1471-2164/13/629

PAGE 16

88.WurtzelO,SapraR,ChenF,ZhuY,SimmonsBA,SorekR: Asingle-base resolutionmapofanarchaealtranscriptome. GenomeRes 2010, 20 :133 – 141. 89.Yoder-HimesDR,ChainPS,ZhuY,WurtzelO,RubinEM,TiedjeJM,SorekR: MappingtheBurkholderiacenocepacianicheresponseviahighthroughputsequencing. ProcNatlAcadSciUSA 2009, 106 :3976 – 3981. 90.TarazonaS,Garcia-AlcaldeF,DopazoJ,FerrerA,ConesaA: Differential expressioninRNA-seq:amatterofdepth. GenomeRes 2011, 21 :2213 – 2223. 91.DanielsMJ,BarberCE,TurnerPC,SawczycMK,ByrdeRJ,FieldingAH: CloningofgenesinvolvedinpathogenicityofXanthomonascampestris pv.campestrisusingthebroadhostrangecosmidpLAFR1. EMBOJ 1984, 3 :3323 – 3328. 92.AndersS,HuberW: Differentialexpressionanalysisforsequencecount data. GenomeBiol 2010, 11 :R106. 93.ReimersM,CareyVJ: Bioconductor:anopensourceframeworkfor bioinformaticsandcomputationalbiology. MethodsEnzymol 2006, 411 :119 – 134. 94.AltschulSF,MaddenTL,SchafferAA,ZhangJ,ZhangZ,MillerW,LipmanDJ: GappedBLASTandPSI-BLAST:anewgenerationofproteindatabase searchprograms. NucleicAcidsRes 1997, 25 :3389 – 3402. 95.BindeaG,MlecnikB,HacklH,CharoentongP,TosoliniM,KirilovskyA, FridmanWH,PagsF,TrajanoskiZ,GalonJ: ClueGO:aCytoscapeplug-in todecipherfunctionallygroupedgeneontologyandpathway annotationnetworks. Bioinformatics 2009, 25 :1091 – 1093. 96.ShannonP,MarkielA,OzierO,BaligaNS,WangJT,RamageD,AminN, SchwikowskiB,IdekerT: Cytoscape:asoftwareenvironmentfor integratedmodelsofbiomolecularinteractionnetworks. GenomeRes 2003, 13 :2498 – 2504. 97.LivakKJ,SchmittgenTD: Analysisofrelativegeneexpressiondatausing real-timequantitativePCRandthe2( DeltaDeltaC(T))Method. Methods 2001, 25 :402 – 408.doi:10.1186/1471-2164-13-629 Citethisarticleas: Kogenaru etal. : RNA-seqandmicroarray complementeachotherintranscriptomeprofiling. BMCGenomics 2012 13 :629. Submit your next manuscript to BioMed Central and take full advantage of: € Convenient online submission € Thorough peer review € No space constraints or color “gure charges € Immediate publication on acceptance € Inclusion in PubMed, CAS, Scopus and Google Scholar € Research which is freely available for redistribution Submit your manuscript at www.biomedcentral.com/submit Kogenaru etal.BMCGenomics 2012, 13 :629 Page16of16 http://www.biomedcentral.com/1471-2164/13/629