Group Title: BMC Genomics
Title: Allele-specific expression assays using Solexa
Permanent Link: http://ufdc.ufl.edu/UF00099921/00001
 Material Information
Title: Allele-specific expression assays using Solexa
Physical Description: Book
Language: English
Creator: Main, Bradley
Bickel, Ryan
McIntyre, Lauren
Graze, Rita
Calabrese, Peter
Nuzhdin, Sergey
Publisher: BMC Genomics
Publication Date: 2009
 Notes
Abstract: BACKGROUND: Allele-specific expression (ASE) assays can be used to identify cis, trans, and cis-by-trans regulatory variation. Understanding the source of expression variation has important implications for disease susceptibility, phenotypic diversity, and adaptation. While ASE is commonly measured via relative fluorescence at a SNP, next generation sequencing provides an opportunity to measure ASE in an accurate and high-throughput manner using read counts. RESULTS: We introduce a Solexa-based method to perform large numbers of ASE assays using only a single lane of a Solexa flowcell. In brief, transcripts of interest, which contain a known SNP, are PCR enriched and barcoded to enable multiplexing. Then high-throughput sequencing is used to estimate allele-specific expression using sequencing counts. To validate this method, we measured the allelic bias in a dilution series and found high correlations between measured and expected values (r > 0.9, p < 0.001). We applied this method to a set of 5 genes in a Drosophila simulans parental mix, F1 and introgression and found that for these genes the majority of expression divergence can be explained by cis-regulatory variation. CONCLUSION: We present a new method with the capacity to measure ASE for large numbers of assays using as little as one lane of a Solexa flowcell. This will be a valuable technique for molecular and population genetic studies, as well as for verification of genome-wide data sets.
General Note: Periodical Abbreviation:BMC Genomics
General Note: Start page 422
General Note: M3: 10.1186/1471-2164-10-422
 Record Information
Bibliographic ID: UF00099921
Volume ID: VID00001
Source Institution: University of Florida
Holding Location: University of Florida
Rights Management: Open Access: http://www.biomedcentral.com/info/about/openaccess/
Resource Identifier: issn - 1471-2164
http://www.biomedcentral.com/1471-2164/10/422




BMC Genomics


BioMed Central


Methodology article

Allele-specific expression assays using Solexa
Bradley J Main*1, Ryan D Bickel1, Lauren M McIntyre2, Rita M Graze2,
Peter P Calabrese1 and Sergey V Nuzhdin1


Address: 1Section of Molecular and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles,
California 90089, USA and 2Departments of Molecular Genetics and Microbiology and Statistics, University of Florida, Gainesville,
Florida 32610-1399, USA
Email: Bradley J Main* bmain@usc.edu; Ryan D Bickel rbickel@usc.edu; Lauren M McIntyre mcintyre@mgm.ufl.edu;
Rita M Graze rgraze@ufl.edu; Peter P Calabrese petercal@usc.edu; Sergey V Nuzhdin snuzhdin@usc.edu
* Corresponding author


Published: 9 September 2009
BMC Genomics 2009, 10:422 doi:10.1186/1471-2164-10-422


Received: 21 April 2009
Accepted: 9 September 2009


This article is available from: http://www.biomedcentral.com/1471-2164/10/422
© 2009 Main et al; licensee BioMed Central Ltd.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0),
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.



Abstract
Background: Allele-specific expression (ASE) assays can be used to identify cis, trans, and cis-by-
trans regulatory variation. Understanding the source of expression variation has important
implications for disease susceptibility, phenotypic diversity, and adaptation. While ASE is commonly
measured via relative fluorescence at a SNP, next generation sequencing provides an opportunity
to measure ASE in an accurate and high-throughput manner using read counts.
Results: We introduce a Solexa-based method to perform large numbers of ASE assays using only
a single lane of a Solexa flowcell. In brief, transcripts of interest, which contain a known SNP, are
PCR enriched and barcoded to enable multiplexing. Then high-throughput sequencing is used to
estimate allele-specific expression using sequencing counts. To validate this method, we measured
the allelic bias in a dilution series and found high correlations between measured and expected
values (r>0.9, p < 0.001). We applied this method to a set of 5 genes in a Drosophila simulans
parental mix, F1 and introgression and found that for these genes the majority of expression
divergence can be explained by cis-regulatory variation.
Conclusion: We present a new method with the capacity to measure ASE for large numbers of
assays using as little as one lane of a Solexa flowcell. This will be a valuable technique for molecular
and population genetic studies, as well as for verification of genome-wide data sets.


Background
Genotype-phenotype mapping is a fundamental aim of
biological science. This is critical for many goals such as
understanding of how genetic architecture shapes pheno-
typic variation and adaptation [1-3], and more specific
aims such as deciphering how genetic variation in
humans may affect response to treatment [4,5]. Many
genetic variants resulting in phenotypic differences are
mediated through changes in gene expression. Thus, analyzing
gene expression allows us to better understand genotypic
variation. Variation in gene expression can be due
to polymorphisms both at the gene locus (cis) and in
other genes that influence its expression (trans), as well as
the non-additive interactions between the two (cis-by-
trans) [6]. Furthermore, epigenetic mechanisms [7], chro-
matin conformation [8], copy number variation [9,10]
and microRNA [11] all play important roles in the tran-
scription of a given gene. By partitioning regulatory varia-
tion into cis, trans, and cis-by-trans, we can identify their
respective contributions to changes in gene expression




and potentially how expression levels evolve within the
genomes of complex organisms [12,13].

Allele-specific expression (ASE) studies have introduced a
creative method to uncover the respective contributions of
cis- and trans-regulatory variation [12,14-17]. First, total
expression differences are measured from a pooled sam-
ple of two homozygous lines. Then, cis-regulatory varia-
tion is estimated from the allelic imbalance (unequal
expression of alleles) in the corresponding F1 heterozy-
gote, where each allele is regulated by the same trans-fac-
tors [18]. Trans can then be inferred from the total
expression differences that are not explained by cis. Of
course, these inferences about cis- and trans-regulatory
variation can be complicated if cis-elements and trans-fac-
tors interact non-additively [17,19,20]. Allelic imbalance
in non-imprinted genes has been shown to be common in
mice, maize and humans [18,21,22]. Also, a few studies
have investigated cis- and trans-regulatory variation within
and between species of Drosophila. Measuring ASE,
Wittkopp et al. reported that cis-regulatory variation plays






a predominant role in divergent gene expression between
D. melanogaster and D. simulans [15].

Allele-specific expression has been measured using vari-
ous targeted approaches including reverse transcription-
PCR (asRT-PCR; [23]), and pyrosequencing [15,24].
Genome-wide approaches have also been used including
serial analysis of gene expression (SAGE) [25,26], mas-
sively parallel signature sequence (MPSS)[27,21], and
microarray-based methods [22,28]. Here, we introduce a
simple approach to ASE assays that combines a targeted
approach to gene expression assays with the power of
high-throughput sequencing. In brief, transcripts of inter-
est (containing a known SNP) are PCR enriched and bar-
coded to enable large-scale multiplexing. Using this
approach, we sequence only regions of interest and allele-
specific read counts are used to estimate ASE for large
numbers of samples using a single lane of a Solexa flow-
cell (Figure 1 and 2). To demonstrate its application, we
investigated variation in gene expression in a set of five
Drosophila simulans genes. Using a common tester line, we

[Figure 1 diagram. Labels recovered from the image: unique sample barcode, gene-specific PCR primer, SNP position, reverse primer. Panel (b) workflow: genomic DNA or cDNA, PCR amplification, clean-up, sequencing, output reads.]

Figure 1
ASE using Solexa. Summary of sample preparation: (a) The forward primer is designed to anneal 1 bp upstream of the
known SNP. The 5' tail contains a unique barcode and adapter sequences necessary for hybridization to the Solexa sequencing
platform. (b) Model of target enrichment and allele-specific expression represented as sequencing coverage per allele.





[Figure 2 plot. Title: ASE Using Quantitative Sequencing (W Line 84). Panels: CG10824, CG11459, CG2604 and DSX, each with DNA, RNA biological replicate (Bio-Reps) and technical replicate (Tech-Reps) bars. Legend: Corrected Proportion; W Line 84 F1 Heterozygote DNA/RNA/Corrected RNA.]

Figure 2
Count-based ASE assays. Allele-specific expression represented as sequencing reads per W-allele (black) and tester allele
(white). The sequencing coverage per allele is shown on each bar and the corrected RNA (*) is represented as a proportion of
each allele. The y-axis is the proportion of the difference (allele1 - allele2)/(allele1 + allele2).


measured ASE in an equal mix of homozygotes (parental
mix), a heterozygote, and an introgression (Figure 3). By
analyzing changes in ASE under these different regulatory
conditions, we elucidate the respective contributions of
cis, trans, and cis-by-trans interactions on variation in gene
expression. Furthermore, we tested for within-species var-
iation in cis and trans by comparing trends in ASE between
six highly inbred lines collected from a single population.

Results and discussion
Digital ASE assay
In this study, we introduce an accurate approach to allele-
specific expression assays that relies on read counts gener-
ated from Solexa sequencing. For each gene of interest, a
single nucleotide polymorphism (SNP) within the mRNA
transcript was identified that differs between the lines of
interest and our tester line. We then designed a PCR
primer that annealed immediately flanking the 5' end of
the SNP and another primer that annealed 200-300 base
pairs downstream (Figure 1). In the primer design, we
included adapter sequences, provided by Illumina, as 5'


tails to allow these PCR products to be sequenced on the
Solexa platform without additional steps. When we
planned on analyzing more than one sample per gene, a
unique forward primer was ordered for each sample that
contained the common elements described above, plus a
unique one to three base pair barcode sequence in the 5'
tail that will allow for individual sample identification
(Figure 1a). We used a touchdown PCR cycling program
to enrich each sample for our target region and then puri-
fied the amplicons using gel extraction. We then pooled
the purified samples in large numbers (in our case, 300)
and sequenced them in parallel on a single lane of a Sol-
exa flow cell. The resulting reads were analyzed by assign-
ing each read to a specific gene based on homology and
sample based on the unique barcode (see methods). We
can then compare the number of reads containing each
SNP allele to obtain a digital representation of ASE (Figure 1b
and 2).

In all comparisons, both alleles were amplified in the
same reaction and thus utilized the same reagents. As a



[Figure 3 diagram. Panel (a): chromosomes (X/Y, 2, 3, 4) for the parental mix, F1 heterozygotes and introgression genotypes. Panel (b): chromosome 3 region containing CG2604, CG10824, CG11459 and dsx.]

Figure 3
Nucleic acid samples. The three nucleic acid sample types used in this study: the parental mix, F1 heterozygote, and introgression
lines. (a) Representation of all chromosomes for each genetic background assayed. Red represents the W line and
black represents the tester line. (b) Zoom-in of chromosome 3 for each genotype. All genes analyzed are between the genetic
markers scarlet (st) and ebony (e).


result, each allele should theoretically maintain the same
relative abundance throughout amplification. However,
this may not be the case if small differences in primer
hybridization or polymerase efficiency exist between alle-
les. We can control for this error in amplification by ana-
lyzing heterozygous DNA samples, where the actual allele
frequency is 50:50, and then correcting each RNA sample
by the difference observed in the heterozygote ([15] and
Additional file 1). DNA samples from each mixed sample
were also analyzed in order to correct for both allele-spe-
cific amplification error and differences in cell number
between the individuals extracted from each parental line
[15]. We report allelic imbalance as the proportion of
reads that are differentially expressed: (allele1 - allele2)/
(allele1 + allele2). This approach allows us to easily estimate
the binomial sampling error. If there is no difference
between alleles, the bias is zero, while the bias is positive
if the first allele is favored and negative if the second allele
is favored.
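
The bias statistic and its binomial sampling error described above can be sketched as follows. This is a minimal illustration, not the authors' script: the function name `allelic_bias` and the example counts are assumptions, and the standard error uses the usual binomial approximation for a proportion p = allele1/(allele1 + allele2), with bias = 2p - 1.

```python
import math

def allelic_bias(allele1_reads, allele2_reads):
    """Return (bias, binomial standard error) for one assay.

    bias = (allele1 - allele2) / (allele1 + allele2), which equals 2p - 1
    for p = allele1 / n, so SE(bias) = 2 * sqrt(p * (1 - p) / n).
    """
    n = allele1_reads + allele2_reads
    if n == 0:
        raise ValueError("no reads cover the SNP")
    p = allele1_reads / n
    bias = (allele1_reads - allele2_reads) / n
    se = 2.0 * math.sqrt(p * (1.0 - p) / n)
    return bias, se

# Hypothetical counts: 300 reads for the first allele, 100 for the second.
bias, se = allelic_bias(300, 100)
print(f"bias = {bias:.3f}, binomial SE = {se:.4f}")
```

A positive value means the first allele is favored; with equal counts the bias is zero.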

To verify the accuracy of Solexa and the normalization
procedure, we created three replicate dilution series using
genomic DNA from the tester line and an experimental
line (W line 84). Then we estimated the allelic bias at each
step of the dilution (in multiplex) using a separate
sequencing lane. All four genes demonstrated a strong
correlation (r > 0.9, p < 0.001) between the expected and
observed allelic bias (Figure 4). The gene dsx had a rela-
tively low correlation, because there was very limited


sequencing coverage for all samples within this gene.
Thus, the Solexa sequencing output can be used to accu-
rately measure the relative abundance of alleles in a given
sample.
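
As a rough sketch of the validation logic, the expected bias at each dilution step follows directly from the mixing ratio, and can be compared with observed values through a Pearson correlation. The dilution ratios are those listed in the Figure 4 legend; the observed values below are invented for illustration only and are not data from this study.

```python
import math

# Dilution steps (experimental line : tester line) from the Figure 4 legend.
RATIOS = [(9, 1), (8, 2), (7, 3), (5, 5), (3, 7), (2, 8), (1, 9)]

def expected_bias(a, b):
    """Expected allelic bias for a mix of a parts allele 1 and b parts allele 2."""
    return (a - b) / (a + b)

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

expected = [expected_bias(a, b) for a, b in RATIOS]
# Hypothetical observed biases, one per dilution step (illustration only).
observed = [0.78, 0.61, 0.42, 0.03, -0.37, -0.58, -0.83]
print("r =", round(pearson_r(expected, observed), 3))
```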

By analyzing technical variation and correlation coeffi-
cients in the verification experiment, it appears that cover-
age on the order of a few hundred reads is sufficient to
yield reproducible results (Figure 4 and Additional file 1).
Thus, increased biological replication should be favored
over sequencing coverage beyond a few hundred reads
(Additional file 1: Table S3). In this study, coverage varied
widely between genes (Additional file 1: Table S1), while
coverage of individual assays within each gene was similar
(Additional file 1: Table S3). This is most likely due to var-
iation in initial transcript abundance, variation in ampli-
fication efficiency between genes and the resulting uneven
pooling of DNA between genes. Therefore, we suggest that
if the pooling of DNA is done carefully based on the con-
centration of each purified sample or PCR amplicon band
intensity, the sequencing coverage can be more evenly dis-
tributed. This will also increase the number of high-qual-
ity ASE assays that can be performed on a single lane of
Solexa.
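
One way to pool by concentration, as suggested above, is to take the same molar amount from every purified sample. The sketch below assumes concentrations have already been measured in nM; the function name, sample names and target amount are hypothetical.

```python
def pooling_volumes(conc_nM, target_fmol):
    """Volume (in microliters) of each sample that contains target_fmol of amplicon.

    1 nM equals 1 fmol per microliter, so volume = amount / concentration.
    """
    volumes = {}
    for sample, conc in conc_nM.items():
        if conc <= 0:
            raise ValueError(f"non-positive concentration for {sample}")
        volumes[sample] = target_fmol / conc
    return volumes

# Hypothetical measured concentrations for three purified PCR products.
concentrations = {"assay_1": 10.0, "assay_2": 20.0, "assay_3": 4.0}
for sample, vol in pooling_volumes(concentrations, target_fmol=20.0).items():
    print(f"{sample}: {vol:.2f} uL")
```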

The cost per base of sequence using Solexa is very low, so
for large-scale projects, the preparation costs such as bar-
coded primers become the limiting economic factor. To
address these concerns, we suggest one of the following



[Figure 4 plots. Four panels (dsx, CG2604, CG11459, CG10824); x-axis: Dilution Series (0.0 to 1.0). Correlation coefficients printed in the panels, in extraction order: r = .85, r = .98, r = .95, r = .97.]

Figure 4
Verification of ASE assays using Solexa. Each data point represents one of three replicate dilutions analyzed at each step
of the dilution series (9:1, 8:2, 7:3, 5:5, 3:7, 2:8, 1:9). The distribution of sequencing reads within each gene is demonstrated as
follows: mean ± SE (n = 21). DSX = 11.3 ± 3.8, CG2604 = 2,478.3 ± 288.8, CG10824 = 2,756.4 ± 333.5, CG11459 = 11,578.6 ±
2,515.1.


approaches depending on the type of project: For studies
where many assays are performed with a few select genes,
barcode costs can be significantly reduced if paired-end
sequencing reads are combined with multiplicative bar-
coding. For example, instead of ordering 900 barcoded
adapters for a given gene, we can create 900 unique bar-
code combinations using only 30 barcoded forward prim-
ers and 30 barcoded reverse primers. For studies involving
large gene sets, barcoded adapter sequences can be added
to typical gene-specific annealing primers before PCR
using single-strand ligation. Using this approach, bar-
coded adapter sequences can be used in multiple experi-
ments. We have shown that Solexa can effectively estimate


ASE using this targeted approach and we mention these
additional modifications to allow easier adaptation for
future researchers.
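
The multiplicative barcoding idea can be illustrated with a short sketch: 30 forward and 30 reverse barcodes give 30 x 30 = 900 distinct pairs. The barcode sequences generated here are placeholders (the first 30 three-base words over ACGT), not barcodes used in this study.

```python
from itertools import product

def make_barcodes(length, count):
    """Return `count` placeholder barcodes of the given length over ACGT."""
    codes = ["".join(p) for p in product("ACGT", repeat=length)]
    if count > len(codes):
        raise ValueError("not enough distinct barcodes of this length")
    return codes[:count]

forward = make_barcodes(3, 30)   # hypothetical forward-primer barcodes
reverse = make_barcodes(3, 30)   # hypothetical reverse-primer barcodes

# Each (forward, reverse) pair identifies one sample in a paired-end run.
pair_to_sample = {pair: idx for idx, pair in enumerate(product(forward, reverse))}
print(len(forward) + len(reverse), "primers encode", len(pair_to_sample), "samples")
```

Sixty barcoded primers are ordered instead of 900, at the cost of requiring paired-end reads to recover both barcodes.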

Error analysis
To understand the error in quantifying ASE, we looked at
sources of variation at multiple levels. First, we estimated
sequencing error at the SNP position by identifying the
proportion of reads with an incorrect base-call. Sequenc-
ing error was well below 2% for most of the genes. Fur-
thermore, this error rate did not seem to change when the
position of the SNP within the read shifted due to barcode
lengths changing from one to three base pairs (Additional




file 1: Table S2). Second, we estimated the binomial sam-
pling error and found that this error quickly becomes neg-
ligible with the high coverage obtained with this method
(Additional file 1: Table S3). Third, we assessed the error
introduced by the method, such as polymerase and
reverse transcription error by comparing allelic imbalance
between technical replicates (cDNA templates created sep-
arately from the same RNA pool). This technical variation
was considerably more than binomial sampling once cov-
erage was over a few hundred reads (Additional file 1:
Table S3). And finally, to analyze overall variation in
expression estimates, including dynamic gene expression
in vivo, we compared ASE estimates between biological
replicates (separately collected material from the same
genetic cross). The biological variation was greater than
technical variation in this study, but an accurate assess-
ment of the relative magnitude of technical and biological
variance is beyond the scope of this study and thus, both
should be considered when designing future experiments
(Additional file 1: Table S3).
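
As a hedged sketch of the first error check, the proportion of reads carrying a base that matches neither known allele gives a simple estimate of the miscall rate at the SNP position. The function name and counts are hypothetical, and this estimate misses errors that convert one allele into the other, so it should be read as approximate.

```python
def snp_error_rate(base_counts, alleles):
    """Fraction of reads whose SNP base is not one of the known alleles."""
    total = sum(base_counts.values())
    if total == 0:
        raise ValueError("no reads cover the SNP")
    wrong = sum(n for base, n in base_counts.items() if base not in alleles)
    return wrong / total

# Hypothetical base counts at an A/G SNP.
counts = {"A": 480, "G": 510, "C": 6, "T": 4}
print(f"estimated miscall rate: {snp_error_rate(counts, ('A', 'G')):.3%}")
```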

Cis- and trans-regulatory variation within species
We used six highly inbred lines of D. simulans from Win-
ters, CA (W Lines) and a scarlet ebony (st e) tester line in
this study. For each W line, we compared expression to the
tester line in a parental mix (i.e. an equal mix of
homozygous tester and W line flies), the related F1 heter-
ozygote, and a corresponding introgression (see methods
for details) (Figure 3). This experimental design allows us
to assess intraspecific regulatory variation in cis, trans and
cis-by-trans. To do this, we estimated the overall expression
differences in four genes in the aforementioned parental
mix. Then, we determined cis-regulatory variation from
the allelic imbalance present in the related F1 heterozy-
gote, where trans-factors affect each allele equally. We can
then infer trans from the overall expression difference that
is not explained by cis. In the four genes analyzed, cis-reg-
ulatory variation explains the majority of the overall
divergence in gene expression between the tester line and
the W lines (Figure 5, regression coefficient = 0.91 ± 0.13,
[12]). It should be noted that this is a small gene set and
most of these genes have been shown to be variable
within-species (see methods). Thus, we do not consider
this result to be a reflection of the genome as a whole. One
explanation for the small contribution of trans in these
genes is that trans-variation has an increased potential for
pleiotropic effects, some of which may be deleterious and
removed by purifying selection in the W lines tested.

Gene expression is a result of cis-regulatory elements inter-
acting with trans-regulatory proteins. If there is variation
in both cis-elements and trans-proteins, there is the poten-
tial for non-additive interactions (cis-by-trans) [29]. To
test for cis-by-trans, we compared allelic imbalance in the
heterozygote and the introgression within each W line


[Figure 5 plot. Legend: dsx, CG2604, CG10824, CG11459. Axes run from -1.0 to 1.0; x-axis label: expression bias between line and tester (line/tester).]
Figure 5
Intraspecific regulatory variation due to cis and trans.
Expression bias is the proportion of the difference between
alleles: (allele1 - allele2)/(allele1 + allele2). The parental mix is
plotted on the x-axis and the F1 heterozygote is plotted on
the y-axis. A 1:1 relationship would indicate 100% cis and a
slope of zero would indicate 100% trans. A 1:1 line is shown
for reference. The vast majority of the expression differences
between the W lines and the tester line are attributed to cis,
for the four genes and six lines tested (regression coefficient
= 0.91 ± 0.13). Color code: line 84 blue, line 181 red, line
147 purple, line 192 orange, and line 68 green.


(Figure 3). If all genes act additively, allelic imbalance in
the heterozygote and introgression should be equal
because the cis-regulatory elements for each allele and the
trans-factors within each genotype are identical. If allelic
imbalance between these genotypes is not equal, that dif-
ference is due to cis-by-trans. We lacked the statistical
power to individually detect these cis-by-trans interac-
tions, but we were able to test for a systematic shift in
expression between these genotypes across all genes and
W lines. We hypothesized that if the cis-elements and
trans-factors within the tester line and the W lines co-
evolved, non-additive interactions would be relatively
common and therefore we might detect significant cis-by-
trans effects across all genes and lines. To test for this, we
analyzed the relationship between heterozygous and
introgression ASE values for the 6 W lines and 4 genes
using a linear model. The regression coefficient is not significantly
different from one (p = 0.5) and thus, no systematic
cis-by-trans interactions were detected within our
sample population and selected genes. This result,
together with recent findings, suggests that,
at least within Drosophila, epistasis may be rare or small
in magnitude [17]. This may also be due to the lack of





population structure in Drosophila, which would hinder
the co-evolution of divergent cis-elements and trans-fac-
tors.

To complement our previous estimates of cis and trans
between the tester line and the W lines, we measured var-
iation in cis between W lines in order to give additional
insight into the type and magnitude of regulatory varia-
tion that may be segregating in natural populations. To
identify this variation we tested for a significant effect of a
given W line on the overall estimate of cis and trans. Using
this approach we were unable to detect significant varia-
tion between lines (p = 0.31). We did however find evi-
dence for variation between the W lines and the tester line
suggesting that genetic variation may be more common
between populations (Figure 2).

Conclusion
We have presented a new method that uses Solexa to accu-
rately measure ASE for hundreds of samples using as little
as one lane of a Solexa flowcell. This will be a valuable
technique for analyzing a few genes for many individuals
or under many conditions, measuring a large selection of
genes for a few individuals, and verifying ASE estimates
from genome-wide expression assays.

Methods
Fly strains and genotypes
D. simulans lines used in this study were collected from the
Wolfskill orchard in Winters, CA and inbred for a mini-
mum of 20 generations to create highly inbred lines (W
lines) [30]. A D. simulans tester line containing the visible
mutations scarlet (st) and ebony (e) (st (chr3L: 15833418-15836165),
e (chr3R: 4397293-4404648)) was obtained
from the Drosophila stock center at UCSD (stock number
14021-0251.041) and inbred for over 20 generations. We
analyzed ASE in a parental mix, heterozygote, and intro-
gression for each W line against the common tester line
(lines: 84, 181, 147, 201, 68, 192) (Figure 3). To create the
parental mix, homozygous male flies from a given W line
were homogenized together with homozygous male flies
from the tester line. The heterozygous genotype was the
F1 male progeny from a cross between a given W line male
and two virgin tester line females. And finally, to construct
introgression lines, F1 heterozygous females were created
from an initial cross between a given W line female and st
e male. Then they were backcrossed to the st e line. After
each generation, wild-type females were selected (hetero-
zygous genotype at both markers) for subsequent back-
crossing. This process was repeated for a minimum of 20
generations to create an introgression line. Thus the intro-
gression line is heterozygous for approximately 12 Mb
between the genetic markers st and e on the third chromo-
some, while the remainder of the genome is homozygous
for the st e line (Figure 3). Two replicate introgression





lines were created in parallel for each W line (introgres-
sion A and B) to control for expression differences due to
different introgression cutoff points flanking the visible
markers. Reciprocal crosses were not analyzed, but the
effects of genomic imprinting, the only type of parent-of-
origin effects that would impact ASE, are likely rare or do
not exist in adult Drosophila [31]. All flies were reared at
~25 °C with a 12 h:12 h light cycle.

Gene selection
We chose a set of genes within the introgressed region on
chromosome three for further analysis (Figure 3b). Genes
were chosen based on current interest in the lab (dsx) and
previous analyses showing them to vary within species
(Cyp21c, CG10824, CG2604) [30] or between species
(CG11459) [15]. We resequenced most of the coding
region of each gene and then designed our gene-specific
primers based on the location of a single nucleotide poly-
morphism (SNP) between the W lines and the st e tester
line. The Cyp21c primers non-specifically amplified a
homologous gene (Cyp4ac3), thus it was not analyzed fur-
ther.

Extraction of nucleic acids and cDNA synthesis
We extracted DNA and RNA in parallel from 14 whole-
body adult male flies for each sample using a modified
protocol for Promega's SV Total RNA Extraction Kit [15].
Experimental samples included a parental mix and a heterozygous
DNA sample, along with cDNA from the parental
mix, F1, and introgression A and B (see above) for each
W line. We included replicate biological samples (sepa-
rately collected material from the same genetic cross) for
all experimental samples involving W line 181 and 84.
Also, we included technical replicate (cDNA templates
created separately from the same RNA pool) samples for
all biological samples involving W line 84 (Additional
file 3). To extract nucleic acid from the parental mix, we
homogenized seven homozygous male flies from a given
W line together with seven homozygous male flies from
the st e line. The heterozygous and the introgression gen-
otypes were extracted from 14 male progeny. Before
extraction, adult male flies (three to five days old) were
snap-frozen in liquid nitrogen in the morning and stored
at -70 °C until extraction. Briefly, we passed the supernatant
from fly homogenate through an affinity column
optimized for binding DNA. Then the flow-through was
run through another column optimized to bind all RNA.
We then treated RNA columns with DNAse, followed by
subsequent washing steps and elution. Using Applied Biosystems'
(AB) High-Capacity cDNA Reverse Transcription
Kit we immediately synthesized cDNA from the RNA sam-
ples using AB's standard protocols with RNAse inhibitor.
Following extraction, we stored all cDNA and DNA samples
at -70 °C until further preparation.





PCR and purification
PCR primers for the ASE assay were designed such that
one annealed immediately flanking the five prime (5')
end of the SNP to be sequenced and the other, 200-300
base pairs downstream (3'). Then we modified each
primer design with a custom 5' tail consisting of Solexa
adapter sequences. Furthermore, to allow for multiplex-
ing, a sample-specific barcode (one to three base pairs)
was added to the design between the adapter and the
annealing primer (Figure 1 and Additional file 2).
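
The primer layout described above (adapter tail, then sample barcode, then annealing primer) can be sketched as simple string assembly. The adapter and annealing sequences below are made-up placeholders, not the Illumina adapter or any primer from this study, and the helper names are hypothetical. The sketch also assumes the sequencing read starts at the barcode, so the SNP sits directly after the barcode and annealing primer.

```python
def build_forward_primer(adapter, barcode, annealing):
    """Concatenate 5' adapter tail, 1 to 3 bp barcode and annealing primer."""
    if not 1 <= len(barcode) <= 3:
        raise ValueError("barcode must be one to three bases long")
    for part in (adapter, barcode, annealing):
        if set(part) - set("ACGT"):
            raise ValueError(f"unexpected character in {part!r}")
    return adapter + barcode + annealing

def snp_index_in_read(barcode, annealing):
    """0-based position of the SNP in a read that begins at the barcode."""
    return len(barcode) + len(annealing)

ADAPTER = "GATTACAGATTACA"       # placeholder, not a real adapter sequence
ANNEALING = "ACGTTGCAAGGCTTA"    # placeholder gene-specific primer
for barcode in ("A", "TG", "CAT"):
    primer = build_forward_primer(ADAPTER, barcode, ANNEALING)
    print(barcode, primer, "SNP at read index", snp_index_in_read(barcode, ANNEALING))
```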

All PCR reactions were performed using Finnzymes Phusion
High-Fidelity DNA Polymerase and a touchdown
cycling program stepping down 1 °C each cycle from 70 °C
to 60 °C, followed by an additional 25 cycles at 60 °C
annealing. Samples were run on a 2% agarose gel for
amplicon size confirmation and agarose gel purification.
PCR products were gel purified using Qiagen's Qiaex II
Gel extraction kit and standard protocols. Following puri-
fication, a portion of each of the 300 PCR enriched sam-
ples was pooled together, ethanol precipitated and then
resuspended to a concentration of 10 nM. The pooled
sample was sequenced at Cornell's core sequencing center
using a custom primer.

Data analysis
Using a custom Perl script, the 6 million reads generated
from a single lane of a Solexa flow cell were assigned to
one of our five genes based on the first eight base pairs
(allowing for one mismatch) of the annealing primer.
Within each gene pool, the reads were then separated into
60 unique ASE assays using the one to three base pair sam-
ple-specific barcodes (Figure 1). We then checked for a
known sequence surrounding the SNP, including five base
pairs 5' of the SNP and eight base pairs 3' of the SNP. If
these 13 base pairs (5 bp pre-SNP + 8 bp post-SNP) could
not be identified (allowing for a single mismatch) the
reads were discarded. We then scored each read for the
nucleotide in the SNP position to determine the allele-
specific read count (Additional file 3).
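
The read-assignment steps above can be sketched in Python. This is not the authors' Perl script: the gene name, primer, flanking sequence and barcodes are invented placeholders, and the sketch checks barcode and gene in a single pass rather than in two separate stages. It keeps the stated rules: match the first eight bases of the annealing primer with at most one mismatch, check the 13-base context (5 bp before and 8 bp after the SNP) with at most one mismatch, then score the SNP base.

```python
from collections import Counter

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def classify_read(read, genes, barcodes, max_mismatch=1):
    """Return (gene, sample, snp_base), or None if the read is discarded."""
    for bc_len in (1, 2, 3):                      # barcodes are 1 to 3 bp long
        sample = barcodes.get(read[:bc_len])
        if sample is None:
            continue
        body = read[bc_len:]
        for gene, info in genes.items():
            primer, post = info["primer"], info["post"]
            snp = len(primer)                     # SNP directly follows the primer
            if len(body) < snp + 9:
                continue
            if hamming(body[:8], primer[:8]) > max_mismatch:
                continue
            observed = body[snp - 5:snp] + body[snp + 1:snp + 9]
            expected = primer[-5:] + post[:8]
            if hamming(observed, expected) > max_mismatch:
                continue
            return gene, sample, body[snp]
    return None

# Hypothetical reference data for one gene and three samples.
GENES = {"geneA": {"primer": "ACGTTGCAAGGCTTA", "post": "CCGATTGACC"}}
BARCODES = {"A": "sample1", "TG": "sample2", "CAT": "sample3"}

g = GENES["geneA"]
reads = [
    "TG" + g["primer"] + "G" + g["post"],
    "TG" + g["primer"] + "A" + g["post"],
    "A" + g["primer"] + "G" + g["post"],
    "G" * 30,                                     # matches no barcode: discarded
]
hits = (classify_read(r, GENES, BARCODES) for r in reads)
counts = Counter(hit for hit in hits if hit is not None)
for key, n in sorted(counts.items()):
    print(key, n)
```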

To determine the respective contributions of cis- and trans-
regulatory variation, we compared ASE in the parental mix
flies and the heterozygous flies. ASE differences in the
parental mix reflect total variation in gene expression,
including cis and trans. In contrast, trans is shared between
alleles in the heterozygote, thus only cis contributes to
ASE differences in this genotype. As a result, trans can be
inferred from the percent of the total variation estimated
in the parental mix (cis+trans) that is not explained by the
variation estimated in the heterozygote (cis). To perform
this test, we used the model $Y_{ij} = \mu + \beta X_{ij} + \varepsilon_{ij}$, where for
the ith W line and the jth gene, Y is the estimated difference
between alleles in the heterozygote (cis) and X is the total
difference in expression estimated from the parental mix.


http://www. biomedcentral.com/1471-2164/10/422



Then, we estimated trans-regulatory variation from the
percent of the overall expression differences found in the
parental mix that are not explained by this model (Figure
5). Missing data points are due to lack of a SNP within a
given gene and W line.
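
As a rough illustration of this regression, the sketch below fits Y on X by ordinary least squares and reports the share of variation left unexplained. It assumes that "percent not explained by the model" corresponds to 100 × (1 − R²), which is one reading of the text rather than a confirmed detail of the authors' analysis. The input values are made-up numbers, not data from this study, and the function names are ours.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = mu + beta * x; returns (mu, beta)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    mu = mean_y - beta * mean_x
    return mu, beta


def percent_unexplained(x, y):
    """Percent of the variation in y not explained by the fitted line,
    i.e. 100 * (1 - R^2). Treating this as the trans share is an assumption."""
    mu, beta = fit_line(x, y)
    mean_y = sum(y) / len(y)
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_resid = sum((yi - (mu + beta * xi)) ** 2 for xi, yi in zip(x, y))
    return 100.0 * ss_resid / ss_total


# Made-up allelic differences for illustration only.
parental_mix = [1.0, 2.0, 3.0, 4.0]   # X: total difference (cis + trans)
heterozygote = [1.0, 3.0, 2.0, 4.0]   # Y: difference in the heterozygote (cis)

mu_hat, beta_hat = fit_line(parental_mix, heterozygote)
trans_percent = percent_unexplained(parental_mix, heterozygote)
print(f"mu={mu_hat:.2f} beta={beta_hat:.2f} unexplained={trans_percent:.1f}%")
```

A pure-Python fit is used so the example has no dependencies; a statistics package would give the same estimates along with standard errors.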

To identify cis-by-trans interactions (i.e. non-additive
effects), we tested whether the allelic bias in heterozygotes
was significantly different from the allelic bias in introgressions.
To examine this effect, we fit the model $Y_{ij} = \mu + \beta Z_{ij} + \varepsilon_{ij}$,
where for the ith W line and for the jth gene, Y is
the estimated difference between alleles for the heterozygote
and Z is the estimated difference between alleles for
the introgression line. The cis-regulatory elements for each
allele in the heterozygote are identical to the cis-regulatory
elements in the introgression, hence the allelic bias can-
not vary due to cis between these genotypes. Furthermore,
although the trans-factors change between genotypes (het-
erozygous W/st e to homozygous st e) the trans-factors are
shared equally between alleles within each genotype and
thus, differences in trans cannot contribute to allelic bias.
As a result, any change in allelic bias between these geno-
types must be due to cis-by-trans interactions.

To test for natural variation (i.e. variation between W
lines) in the relative contribution of cis and trans, we fit
the model $Y_{ij} = \mu + \beta X_{ij} + \beta_2 L_j + \varepsilon_{ij}$, where for the ith line and
the jth gene, Y is the average bias between alleles for the
parental mix, and X is the average bias between alleles for
the heterozygote. We then tested for the effect of line.

To verify the accuracy of ASE assays using Solexa, we used a
separate lane of a flow cell to analyze three replicate dilution
series (1:9, 2:8, 3:7, 5:5, 7:3, 8:2, 9:1). These dilutions were
created using genomic DNA extracted separately from
homozygous W line 84 and the homozygous st e tester line
(Figure 4). Each of the four genes analyzed in this study was
tested for the ability to accurately detect known allele fre-
quencies. Heterozygous DNA from a cross between these
lines was also analyzed in parallel to correct for allele-specific
amplification bias ([15] and Additional file 1). To correct for
differences in starting concentration of the homozygous
DNA used for the dilution series, we corrected the expected
values by the average allelic bias found in the 1:1 mix [15].
Correlations between the known dilution and the measured
ASE using Solexa were estimated separately for each gene
using Pearson's correlation coefficient [32].
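
The validation step can be illustrated with a short Python sketch that compares expected and measured allele fractions across the dilution series. Two assumptions are made here and are not confirmed by the text: the 1:1 correction is applied by multiplying each expected allele ratio by the ratio observed in the 1:1 mix, and the measured value is the fraction of reads carrying the first allele. The read counts are made-up numbers for illustration, not values from Additional file 3.

```python
import math

# Dilution series from the text, as (parts allele 1, parts allele 2).
DILUTIONS = [(1, 9), (2, 8), (3, 7), (5, 5), (7, 3), (8, 2), (9, 1)]


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def corrected_expected_fraction(a, b, bias_ratio):
    """Expected fraction of allele 1 in an a:b mix after scaling the
    expected ratio by the bias seen in the 1:1 mix (an assumed correction)."""
    ratio = (a / b) * bias_ratio
    return ratio / (1.0 + ratio)


# Made-up read counts (allele 1, allele 2) for one gene, one replicate.
toy_counts = [(110, 890), (205, 795), (320, 680), (520, 480),
              (715, 285), (810, 190), (905, 95)]

# Allelic bias estimated from the 5:5 (1:1) mix in the same toy data.
one_to_one = toy_counts[DILUTIONS.index((5, 5))]
bias = one_to_one[0] / one_to_one[1]

expected = [corrected_expected_fraction(a, b, bias) for a, b in DILUTIONS]
measured = [n1 / (n1 + n2) for n1, n2 in toy_counts]
r = pearson_r(expected, measured)
print(f"Pearson r between expected and measured fractions: {r:.4f}")
```

Because the correction acts on the ratio before it is converted to a fraction, it changes the expected values non-linearly and can therefore affect the correlation, unlike a simple rescaling of the fractions.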

Authors' contributions
BJM conceived the method and performed the experiments.
SVN, BJM and RMG designed the experiment. RDB, SVN,
BJM, LMM and PPC analyzed the data. BJM, RDB, LMM and SVN
wrote the paper. All authors read and approved the final
manuscript.















Additional material


Additional file 1
Supplemental materials. "Supplemental material".
[http://www.biomedcentral.com/content/supplementary/1471-2164-10-422-S1.doc]

Additional file 2
Detailed protocol. "ASE using Solexa protocol": detailed protocol.
[http://www.biomedcentral.com/content/supplementary/1471-2164-10-422-S2.doc]

Additional file 3
Sequencing counts. "Raw counts": all sequencing counts for each sample.
[http://www.biomedcentral.com/content/supplementary/1471-2164-10-422-S3.xls]




Acknowledgements
We would like to thank Hyo-sik Jang for fly work. Also, we thank Joe Dunham
and Johanna Main for discussions and comments on the method and
paper, respectively. This work was supported by NIH grants RGM076643
(SVN) and 1 R01 GM077618 (LMM, SVN).

References
1. Anholt R, Mackay T: Quantitative genetic analyses of complex
behaviours in Drosophila. Nat Rev Genet 2004, 5:838-49.
2. Ellegren H, Sheldon B: Genetic basis of fitness differences in nat-
ural populations. Nature 2008, 452:169-75.
3. Harbison S, Carbone M, Ayroles J, Stone E, Lyman R, Mackay T: Co-
regulated transcriptional networks contribute to natural
genetic variation in Drosophila sleep. Nat Genet 2009, 41:371-5.
4. Nardi V, Raz T, Cao X, Wu C, Stone R, Cortes J, et al.: Quantitative
monitoring by polymerase colony assay of known mutations
resistant to ABL kinase inhibitors. Oncogene 2008, 27:775-82.
5. Altshuler D, Daly M, Lander E: Genetic mapping in human dis-
ease. Science 2008, 322:881-8.
6. Rockman M, Kruglyak L: Genetics of global gene expression. Nat
Rev Genet 2006, 7:862-72.
7. Jaenisch R, Bird A: Epigenetic regulation of gene expression:
how the genome integrates intrinsic and environmental signals.
Nat Genet 2003, 33:245-54.
8. Higgs D, Vernimmen D, Hughes J, Gibbons R: Using genomics to
study how chromatin influences gene expression. Annu Rev
Genomics Hum Genet 2007, 8:299-325.
9. Henrichsen C, Vinckenbosch N, Zollner S, Chaignat E, Pradervand S,
Schutz F, et al.: Segmental copy number variation shapes tissue
transcriptomes. Nat Genet 2009, 41:424-429.
10. Cahan P, Li Y, Izumi M, Graubert T: The impact of copy number
variation on local gene expression in mouse hematopoietic
stem and progenitor cells. Nat Genet 2009, 41:430-437.
11. Bartel D: MicroRNAs: genomics, biogenesis, mechanism, and
function. Cell 2004, 116:281-97.
12. Wittkopp P, Haerum B, Clark A: Regulatory changes underlying
expression differences within and between Drosophila species.
Nat Genet 2008, 40:346-350.
13. Stamatoyannopoulos J: The genomics of gene expression.
Genomics 2004, 84:449-57.
14. Yan H, Yuan W, Velculescu V, Vogelstein B, Kinzler K: Allelic vari-
ation in human gene expression. Science 2002, 297:1143.
15. Wittkopp P, Haerum B, Clark A: Evolutionary changes in cis and
trans gene regulation. Nature 2004, 430:85-88.
16. Knight J: Allele-specific gene expression uncovered. Trends
Genet 2004, 20:113-6.
17. Wittkopp P, Haerum B, Clark A: Independent effects of cis- and
trans-regulatory variation on gene expression in Drosophila
melanogaster. Genetics 2008, 178:1831-5.
18. Cowles C, Hirschhorn J, Altshuler D, Lander E: Detection of regulatory
variation in mouse genes. Nat Genet 2002, 32:432-437.
19. Carlborg O, Haley C: Epistasis: too often neglected in complex
trait studies? Nat Rev Genet 2004, 5:618-25.
20. Gibson G: Extensive Sex-Specific Nonadditivity of Gene
Expression in Drosophila melanogaster. Genet 2004,
167:1791-1799.
21. Guo M, Rupe M, Zinselmeier C, Habben J, Bowen B, Smith O: Allelic
variation of gene expression in maize hybrids. Plant Cell 2004,
7:1707-16.
22. Lo H, Wang Z, Hu Y, Yang H, Gere S, Buetow K: Allelic variation
in gene expression is common in the human genome. Genome
Res 2003, 13:1855-1862.
23. Singer-Sam J, Gao C: Quantitative RT-PCR-Based Analysis of
Allele-Specific Gene Expression. Meth Mol Biol 2001,
181:145-52.
24. Ahmadian A, Gharizadeh B, Gustafsson A, Sterky F, Nyrén P, Uhlén
M, et al.: Single-nucleotide polymorphism analysis by pyrose-
quencing. Anal Biochem 2000, 280:103-10.
25. Velculescu V, Zhang L, Vogelstein B, Kinzler K: Serial analysis of
gene expression (LongSAGE) and 3' LongSAGE for tran-
scriptome characterization and genome annotation. Proc
Natl Acad Sci USA 2004, 32:11701-6.
26. Wei C, Ng P, Chiu K, Wong C, Ang C, Lipovich L, et al.: 5' Long
serial analysis of gene expression (LongSAGE) and 3' Long-
SAGE for transcriptome characterization and genome
annotation. Proc Natl Acad Sci USA 2004, 32:11701-6.
27. Brenner S, Johnson M, Bridgham J, Golda G, Lloyd DH, Johnson D, et
al.: Gene expression analysis by massively parallel signature
sequencing (MPSS) on microbead arrays. Nat Biotechnol 2000,
18:630-4.
28. Serre D, Gurd S, Ge B, Sladek R, Sinnett D, Harmsen E, et al.: Differ-
ential allelic expression in the human genome: a robust
approach to identify genetic and epigenetic cis-acting mech-
anisms regulating gene expression. PLoS Genet 2008, 2:1000006.
29. Mackay T: The genetic architecture of quantitative traits.
Annu Rev Genet 2001, 35:303-39.
30. Nuzhdin S, Wayne M, Harmon K, McIntyre L: Common Pattern of
Evolution of Gene Expression Level and Protein Sequence in
Drosophila. Mol Biol Evol 2004, 21:1308-1317.
31. Wittkopp P, Haerum B, Clark A: Parent-of-origin effects on
mRNA expression in Drosophila melanogaster not caused
by genomic imprinting. Genet 2006, 173:1817-1821.
32. Neter J, Maynes E: On the Appropriateness of the Correlation
Coefficient with a 0, 1 Dependent Variable. J Am Stat Assoc
1970, 65:501-509.

