BMC Genetics
Methodology article
Intersection tests for single marker QTL analysis can be more
powerful than two marker QTL analysis
Cynthia J Coffman1,2, RW Doerge3,4,5, Marta L Wayne6 and
Lauren M McIntyre*2,4,5
Address: 1Institute for Clinical and Epidemiological Research, Biostatistics Unit, Durham VA Medical Center (152), Durham, NC 27705 USA,
2Duke Medical Center, Department of Biostatistics and Bioinformatics, Durham, NC 27710 USA, 3Department of Statistics, Purdue University,
West Lafayette, IN 47907 USA, 4Department of Agronomy, Purdue University, West Lafayette, IN 47907 USA, 5Computational Genomics, Purdue
University, West Lafayette, IN 47907 USA and 6Department of Zoology, University of Florida, Gainesville, FL 32611-8525 USA
Email: Cynthia J Coffman cynthia.coffman@duke.edu; RW Doerge doerge@stat.purdue.edu; Marta L Wayne mlwayne@zoo.ufl.edu;
Lauren M McIntyre* lmcintyre@purdue.edu
* Corresponding author
BMC Genetics 2003, 4:10
Received: 26 November 2002
Accepted: 19 June 2003
Published: 19 June 2003
This article is available from: http://www.biomedcentral.com/14712156/4/10
© 2003 Coffman et al; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all
media for any purpose, provided this notice is preserved along with the article's original URL.
Abstract
Background: It has been reported in the quantitative trait locus (QTL) literature that when
testing for QTL location and effect, the statistical power supporting methodologies based on two
markers and their estimated genetic map is higher than for the genetic map independent
methodologies known as single marker analyses. Close examination of these reports reveals that
the two marker approaches are more powerful than single marker analyses only in certain cases.
Simulation studies are a commonly used tool to determine the behavior of test statistics under
known conditions. We conducted a simulation study to assess the general behavior of an
intersection test and a two marker test under a variety of conditions. The study was designed to
reveal whether two marker tests are always more powerful than intersection tests, or whether
there are cases when an intersection test may outperform the two marker approach.
We present a reanalysis of a data set from a QTL study of ovariole number in Drosophila
melanogaster.
Results: Our simulation study results show that there are situations where the single marker
intersection test equals or outperforms the two marker test. The intersection test and the two
marker test identify overlapping regions in the reanalysis of the Drosophila melanogaster data. The
region identified is consistent with a regression based interval mapping analysis.
Conclusion: We find that the intersection test is appropriate for analysis of QTL data. This
approach has the advantage of simplicity and for certain situations supplies equivalent or more
powerful results than a comparable two marker test.
Background
Many authors [1-4] have compared the statistical power of different methods for
quantitative trait locus (QTL) mapping. These comparisons have shown that the
additional information supplied by the genetic distance between identifiable DNA
markers unconfounds the QTL effect from the QTL location, thus making two marker
models more powerful than single marker models of detection (for review see
Doerge et al. [5]).
With the goal of detecting and/or locating QTL, two statistical approaches are
commonly taken. The first approach is based on ANOVA, or simple linear regression,
and performs statistical tests based solely on single DNA marker information. No
genetic map is required for single marker analysis, and the calculations are based
on phenotypic means and variances within each of the genotypic classes. The second,
more involved approach is based on two DNA markers, the estimated recombination
fraction between them (or the estimated genetic distance), and either a maximum
likelihood based calculation or a regression model including multiple (two or more)
markers as independent variables. The linear ordering of multiple DNA markers based
upon their estimated relationships (i.e., recombination fraction or genetic
distance) supplies the framework, or genetic map, for (composite) interval mapping
[6,7] and as such unconfounds the QTL effect and the QTL location, thus providing a
more precise means for detecting and locating QTL with respect to the estimated
genetic map for the organism under investigation.
Rebai et al. [4] present a comprehensive comparison of the statistical power for
many of the commonly used flanking marker or two marker methods, and conclude that
two marker mapping provides a relatively small gain (5%) in power over single
marker methods when the two markers define an interval of width less than 20 cM,
but a substantial increase (greater than 30%) in power for intervals upwards of
70 cM. This indicates that the gain in power may come from the addition of the
second marker to the analysis, or the addition of information from that marker,
rather than from the map.
Using the findings of Rebai et al. [4] and others as our motivation, we hypothesize
that the power increase between single marker and two marker regression methods is
due to additional genotypic information in the second marker. In order to assess
this, similar test statistics should be compared. A comparison of maximum
likelihood interval mapping to single marker ANOVA is complicated because of the
differences that may be observed due to differences between regression and maximum
likelihood [3], as well as differences in marker information. In order to avoid
this complication, we consider two regression based approaches that differ only in
the number of markers included in the initial model (i.e., the statistical
methodology is the same, the models are different). First, a set of compound
hypotheses is defined for use with a single marker analysis and used to define an
intersection test. We then state the equivalent hypotheses for a regression based
two marker model for a test of the interval. Last, we compare, via simulation, the
power of the intersection test and the two marker test under the stated hypotheses
for each model. We do not consider the case of multiple QTL in a single interval,
as this case was not considered by Rebai et al. [4]. These two approaches are then
applied to a backcross population of Drosophila with 76 informative markers [8] to
detect QTL associated with ovariole number.
Results
Simulations
An overview of the simulations performed is given in Tables 1 and 2. To evaluate
the relative difference in statistical power between the intersection test and the
two marker test, the estimated statistical power of the two marker test was
subtracted from that of the intersection test for each of the parameter
combinations investigated, and a t-test [9] was performed to test the null
hypothesis that the mean difference in power was zero.
For the binomial phenotype with a backcross design and a sample size of 100, the
100 parameter combinations examined resulted in 34 showing no difference in
statistical power between the intersection test and the two marker test (i.e., the
value of the difference was exactly zero), 39 favoring the intersection test, and
27 favoring the two marker test. The intersection test was more powerful, with a
mean difference in power of 0.010, and the t-test rejected the null hypothesis that
the difference in power was zero (p = 0.020). When n was equal to 500, the 139
parameter combinations examined yielded 116 showing no difference in statistical
power, 18 favoring the intersection test, and 5 favoring the two marker test. The
mean difference in power was 0.020, and the t-test again rejected the null
hypothesis that the difference in power was zero (p = 0.010). In both sets of
simulations the estimated difference between the two approaches was positive,
indicating that the intersection test has slightly higher power than the two marker
test in these cases.
We also investigated F2 experimental populations for a binomial phenotype, using a
sample size of 500. Of the 25 parameter combinations investigated, 5 failed to
converge consistently for the two marker model due to singularity in the design
matrix. Of the remaining 20 parameter combinations, 10 showed no difference in
statistical power, while the remaining 10 favored the intersection test. As in the
initial simulations, these results indicate that the intersection test performs as
well as, or better than, the two marker test. Comparisons with smaller sample sizes
(n = 100) were not conducted because of convergence problems using the two marker
model.

Table 1: Simulation conditions for a binary phenotype, two marker loci M1 and M2, a single locus (Q), sample sizes n = 100, 500,
recombination fractions rM1Q and rM2Q, respectively, and effect size. *rM1Q = rM2Q = 0.0 not simulated

Population  n         rM1Q                                rM2Q                                Effect size             Number of combinations
Backcross   100, 500  0.0*, 0.10, 0.20, 0.30, 0.40, 0.50  0.0*, 0.10, 0.20, 0.30, 0.40, 0.50  0.40, 0.60, 0.80, 1.00  239*
F2          500       0.0*, 0.10, 0.20, 0.30, 0.40, 0.50  0.0*, 0.10, 0.20, 0.30, 0.40, 0.50  1.00                    25

Table 2: Simulation conditions for a normally distributed trait (Q), sample size n = 500, recombination fractions rM1Q and rM2Q,
respectively, and effect size. *rM1Q = rM2Q = 0.0 not simulated

Population  n    rM1Q                                rM2Q                                Effect size       Number of combinations
Backcross   500  0.0*, 0.10, 0.20, 0.30, 0.40, 0.50  0.0*, 0.10, 0.20, 0.30, 0.40, 0.50  0.20, 0.50, 0.80  75
For a normally distributed phenotype, 75 parameter combinations for the backcross
were examined. Of these, 12 showed no difference in statistical power, 34 favored
the intersection test, and 29 favored the two marker test. The mean difference in
power was 0.015, and we failed to reject the null hypothesis that the difference
was zero (p = 0.064).
Upon investigation of the parameter combinations that showed some difference in
power, specifically for the scenario highlighted by Rebai et al. [4] (QTL in the
middle of the interval with a large distance between markers), we also find the two
marker approach to be slightly more powerful than the intersection test. However,
when the distance between the QTL and one marker is much smaller than the distance
between the QTL and the second marker, the intersection test is more powerful.
Although we can point to these cases, it is important to realize that for most of
the scenarios no difference in power was observed (see Figure 1).
Drosophila Analysis
Ovariole number is related to reproductive success in Drosophila melanogaster and
is positively correlated with maximum daily female fecundity [10,11]. The 98 RILs
(recombinant inbred lines) for this study were scored for the trait ovariole number
and genotyped as described in Wayne et al. [8].
Figure 1: Histogram and density plot of difference in power between intersection
test and two marker test for 334 simulations. The horizontal axis shows the
difference in statistical power.
Table 3: Concordance of significant test results from analysis of 71 unique pairs
of adjacent markers using the intersection test and two marker regression test.
Yes indicates that the test was significant; No indicates that the test was not
significant at α = 0.05

                            Intersection test
Two marker regression test  Yes   No
Yes                         36    6
No                          3     26
Table 4: Marker analysis for ovariole number in Drosophila melanogaster from analysis using the intersection test, α = 0.025, the two marker
regression test, α = 0.05, and interval mapping; only marker pairs significant in either the two marker test or the intersection test are listed.
A dash indicates a value that is not available.

               Two marker test          Intersection test                                       Interval mapping
Chr  Marker    Freq M2N2  df  p-value   Freq M2  Freq N2  M2 df  N2 df  M2 p-value  N2 p-value  p-value
1    3E-4F     -          94  0.029     -        0.86     96     95     0.14        0.0256      0.0242
1    4F-5D     -          93  0.032     -        0.84     95     96     0.0256      0.056       0.0226
1    6E-7D     -          93  0.0499    -        0.74     96     95     0.069       0.1         0.0373
2    35B-38A   0.69       82  0.028     -        -        89     84     0.33        0.017       0.3036
2    38A-38E   0.7        77  0.017     -        -        84     87     0.017       0.028       0.0105
2    38E-43A   0.71       78  0.028     -        -        87     84     0.028       0.038       0.0083
2    43A-43E   0.73       79  0.03      -        -        84     87     0.038       0.034       0.0177
2    43E-46C   0.8        85  0.038     -        -        87     89     0.034       0.073       0.0339
3    63A-65A   0.29       85  0.075     -        0.38     94     89     0.227       0.0249      0.0187
3    65A-65D   0.3        86  0.035     -        0.51     89     93     0.0249      0.13        0.0207
3    65D-67D   0.42       90  0.021     -        0.51     93     94     0.13        0.004       0.0031
3    67D-68B   0.44       91  0.00004   0.51     0.58     94     94     0.004       0           0
3    68B-68C   0.46       90  0.00003   -        0.49     94     92     0           0.002       0
3    -         -          89  0.002     -        0.46     92     94     0.002       0.007       0.0013
3    -         -          86  0.0007    0.46     0.45     94     90     0.007       0.0008      0.001
3    70C-71E   0.35       84  0.0002    -        -        90     89     0.0008      0.0006      0.0009
3    71E-72A   0.35       84  0.0002    -        -        89     90     0.0006      0.0007      0.0016
3    72A-73D   0.39       88  0.0001    -        -        90     90     0.0007      0.0003      0.001
3    73D-76A   0.34       83  0.0001    -        -        90     87     0.0003      0.0001      0.0007
3    76A-76B   0.34       85  0.0001    -        -        87     92     0.0001      0.0009      0.0005
3    76B-77A   0.41       90  0.0008    -        -        92     92     0.0009      0.0009      0.0009
3    77A-82D   0.34       83  0.0002    -        -        92     87     0.0009      0.0003      0.0011
3    82D-85F   0.31       84  0.0006    -        -        87     90     0.0003      0.0012      0.0006
3    85F-87B   0.27       84  0.0019    -        -        90     89     0.0012      0.003       0.0012
3    87B-87E   0.28       85  0.008     -        -        89     91     0.003       0.01        0.0028
3    87E-87F   0.27       84  0.006     -        -        91     89     0.011       0.004       0.0066
3    87F-88E   0.23       85  0.13      -        -        89     88     0.004       0.11        0.0061
3    96A-96F   0.2        87  0.066     -        -        94     90     0.21        0.013       0.0176
3    96F-97D   0.23       86  0.013     -        -        90     92     0.013       0.048       0.0168
3    97D-97E   0.3        88  0.024     -        -        92     90     0.048       0.02        0.053
3    97E-98A   0.26       88  0.013     -        -        90     94     0.02        0.018       0.016
3    98A-99A   0.27       88  0.0001    -        -        94     90     0.018       0           0.0001
3    99A-99B   0.34       88  0         -        -        90     90     0           0.0000      0
3    99B-99E   0.32       88  0         -        -        90     94     0           0.0000      0
3    99E-100A  0.33       93  0         -        -        94     95     0           0.0000      0

Figure 2: Drosophila melanogaster intersection test. T-test statistics and
thresholds for evaluation of significance considering all markers typed (solid
lines, T = 3.521, 76 markers; T = 2.29, paired markers). The marker on Chromosome 4,
spa, was not statistically significant (p = 0.32). Panel a: Chromosome 1. Panel b:
Chromosome 2. Panel c: Chromosome 3.

For the 71 marker pairs considered, 36 were found to be significant using both the
intersection test and the two marker test, and 26 were found to be nonsignificant
with both tests. The intersection and two marker tests were concordant in 62 of the
71 pairs of markers (Table 3). The estimated chance corrected agreement (Kappa
coefficient) was 0.75 with a 95% confidence interval of (0.58, 0.90). McNemar's
test showed no systematic difference in the two approaches (S = 1.00, p = 0.32).

Overlapping regions on chromosome 3 were identified by the intersection test, the
two marker test, and the interval mapping test (Table 4). On chromosome 1, the two
marker test identified a region between 3E and 7D while the comparable intersection
test showed borderline significance for one marker in that region (4F, p = 0.0256).
On chromosome 2, the two marker test identified a region from 35B-46C while
intersection tests identified a smaller region from 35B-38E and the interval
mapping identified a region from 38A-46C. On chromosome 3 the two marker test
identifies two regions, 65A-87F and 96F-100A, while the intersection test finds the
entire region from 63A-100A significant. The interval mapping agrees with the
intersection test for interval 63A-65A and finds it significant, while it agrees
with the two marker test for the interval 97D-97F and does not find this interval
significant. Chromosome 4 was not associated with the trait for any test.
The application of the intersection test to these data can be further expanded to
include an analysis with all 76 markers. We conducted 76 single marker regression
tests at a Bonferroni adjusted α of 6.6 × 10^-4. Markers 68B, 71E, 73D, 76A, 82D,
99A, 99B, 99E and 100A were significant using this intersection test (see
Figure 2). The regions identified are consistent with a regression-based interval
mapping analysis.
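The adjusted significance levels quoted in this section can be checked with a short
Python sketch. It is illustrative only and was not part of the original analysis;
the function name bonferroni_alpha is a hypothetical helper introduced here.

```python
def bonferroni_alpha(overall_alpha: float, n_tests: int) -> float:
    """Per-test significance level under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return overall_alpha / n_tests


# All 76 markers tested jointly at an overall level of 0.05.
alpha_all_markers = bonferroni_alpha(0.05, 76)
# A single pair of adjacent markers tested at an overall level of 0.05.
alpha_pair = bonferroni_alpha(0.05, 2)

print(f"{alpha_all_markers:.1e}", alpha_pair)
```

The first value corresponds to the 6.6 × 10^-4 level used for the 76 marker
analysis, and the second to the 0.025 level used for marker pairs.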
Discussion
The findings of Rebai et al. [4] show differences in the statistical power of the
two marker methods (i.e., interval mapping) over single marker tests (e.g., ANOVA,
t-test) only when the markers are more than 50 cM apart, suggesting that these
differences may be due to the addition of information in the second marker. Our
simulation study supports this hypothesis.
The intersection test uses information from both markers and tests the same null
hypotheses as the two marker test. It thereby takes advantage of the additional
genotypic information provided by the second marker.
While compound hypotheses are common in statistical theory, and typically seen in
the use of union/intersection tests, their use in the quantitative genetics arena
and in QTL applications is relatively novel. Furthermore, the intersection test is
simple to implement, its expansion to multiple markers is straightforward, and it
uses all available marker information. In a framework map, where markers are
unlinked, the intersection test is simply the single marker analysis with a
Bonferroni correction for the significance level. This correction guarantees that
the nominal α is not exceeded, but is well known to be overly conservative in cases
where tests are not independent. In cases where markers are correlated, the
application of the intersection test will therefore require an alternative
correction in order to achieve maximum power.
We demonstrate situations for a pair of adjacent markers where the power of the
intersection test is equal to or greater than the power of the two marker test. In
the case highlighted by Rebai et al. [4], markers more than 50 cM apart and a large
effect size, we also find that the two marker test has higher power than the
intersection test. A counterexample arises when one marker is much closer to the
QTL than the other marker; in this case the intersection test is more powerful.
Overall, the power of the two approaches is nearly identical and the differences
between them are small.
In the Drosophila reanalysis both methods identify the same general regions.
However, six marker pairs were found to be significant using the two marker test
that were not identified using the intersection test. Using the map and notation
defined by Nuzhdin et al. [12], they were: Chromosome 1 pairs 3E-4F, 4F-5D, 6E-7D;
Chromosome 2 pairs 38E-43A, 43A-43E, and 43E-46C. The p-values for these six marker
pairs from the intersection tests were small but did not fall below the Bonferroni
corrected significance level (see Table 4). The markers that contribute to these
marker pairs are linked, indicating that the Bonferroni correction may be overly
conservative.
In contrast, three marker pairs on Chromosome 3 were significant using the
intersection tests, but were not significant using the two marker tests: 63A-65D,
87F-88E, and 96A-96F (Table 4). In these cases the "internal" marker of the pair is
giving signal while the "outer" marker does not. This provides an interesting point
of discussion. We could say that marker 65D appears to be associated with ovariole
number, but we do not know if the QTL lies to the left or right of this marker.
Just because marker 63A does not appear to be significantly associated with
ovariole number, we cannot infer that the region to the "left" of 65D does not
contain the gene of interest.
In some cases, the two marker test results in a larger region than the intersection
test, while in others the reverse is true. QTL mapping is usually a first attempt
to locate genes, which the biologist uses to identify all possible regions of
interest; that is, the biologist is willing to accept some type I error. We have
discussed different ways to detect underlying QTL and an approach for maximizing or
minimizing the potential region containing the QTL. It is also possible to estimate
the QTL position directly. Estimates can be obtained using a variety of techniques,
and the different possible approaches to estimation are reviewed in Doerge et al.
[5] and Kao [3]. However, even when the position is estimated, a confidence
interval will exist defining the size of the region to be included for further
study. Different approaches will result in regions of different sizes with more, or
fewer, markers included. The differences in the size of the regions are potentially
important to a biologist, who relies on QTL mapping analyses to determine regions
for further study. Most biologists accept that current QTL mapping methods are best
used for identifying broad regions, which subsequently can be dissected with more
precise genetical techniques. The question is then: what region should be advanced
to fine mapping experiments? The investigator may choose to take only the regions
which are significant in both intersection and multiple marker approaches, or s/he
may choose to carry forward any marker that shows a positive result according to at
least one analysis. We recommend that experimentalists perform both a single marker
analysis with an intersection test and a multiple marker analysis, and use the
information available in both analyses to guide their decisions about what regions
to carry forward for further study.
Conclusion
We find that the intersection test has equal or greater power compared to the two
marker equivalent. Our analyses were conducted using the Bonferroni correction for
the intersection test. When markers are linked, as in many of our simulations, this
correction is overly conservative. If the intersection test is used in conjunction
with a more appropriate correction, the performance of the intersection test would
improve, perhaps even surpassing the two marker equivalent in more cases. Thus, our
motivation and hope in presenting this investigation of the statistical power of
intersection tests versus two marker tests is to make clear the compound framework
and resulting evidence under which intersection tests are indeed equal to and/or
more powerful than the complicated procedures based on two marker models.
Materials
Statistical Framework
As the framework for our comparison, and in conjunction with the previous
simulations and conclusions provided by the work of Rebai et al. [4], we consider a
backcross experimental design originating from a cross of two homozygous inbred
lines, differing in the trait of interest, and producing heterozygous lines that
are backcrossed to one of the initial homozygous parental lines. We examine both
normal and binomial phenotypic distributions. In general, we denote the markers as
$M_1, \ldots, M_k$, where k is the number of markers being examined, and allow each
marker to have two alleles, $M_{11}, M_{12}, \ldots, M_{k1}, M_{k2}$. The $2^k$
phenotypic means are differentiated via subscripts (e.g.,
$\mu_{M_{11} \ldots M_{k1}/M_{11} \ldots M_{k1}}$ or
$\mu_{M_{11} \ldots M_{k1}/M_{12} \ldots M_{k2}}$), and the frequencies of these
classes are denoted as $p_{11}, p_{21}, \ldots, p_{k1}$ under the binomial scenario
(i.e., $\mu_{M_{11}/M_{11}} = np_{11}$).
Single Marker Model and Hypotheses
A simple linear regression backcross model is employed for single marker QTL
detection,

\[ Y_j = \beta_0 + \beta^* X_j^* + \varepsilon_j; \quad j = 1, \ldots, n \qquad (1) \]

where $Y_j$ is the quantitative trait value, $X_j^*$ is an indicator variable that
denotes the state of a particular marker, $\beta_0$ is the overall mean, and
$\beta^*$ is the effect of an allelic substitution at the marker. Ideally, if the
marker and QTL are completely linked, the effect of an allelic substitution is the
effect of the QTL. If k markers are considered independently, k linear regression
models can be considered (i.e., one for each marker, $M_1, M_2, \ldots, M_k$) by
denoting the allelic substitution associated with marker $M_i$ as
$\beta^* = \beta_i$, for $i = 1, \ldots, k$. For k = 2 markers, we denote the
allelic substitution associated with marker $M_1$ as $\beta^* = \beta_1$, where
$\beta_0 = \mu_{M_{11}/M_{11}}$ and
$\beta_1 = \mu_{M_{11}/M_{12}} - \mu_{M_{11}/M_{11}}$; and the allelic substitution
associated with marker $M_2$ as $\beta^* = \beta_2$, where
$\beta_0 = \mu_{M_{21}/M_{21}}$ and
$\beta_2 = \mu_{M_{21}/M_{22}} - \mu_{M_{21}/M_{21}}$.
A compound hypothesis testing the effect of an allelic substitution at either or
both of these two independent markers is

\[ H_0: \beta_1 = \mu_{M_{11}/M_{12}} - \mu_{M_{11}/M_{11}} = 0 \text{ and } \beta_2 = \mu_{M_{21}/M_{22}} - \mu_{M_{21}/M_{21}} = 0 \]
\[ H_a: \beta_1 = \mu_{M_{11}/M_{12}} - \mu_{M_{11}/M_{11}} \neq 0 \text{ and/or } \beta_2 = \mu_{M_{21}/M_{22}} - \mu_{M_{21}/M_{21}} \neq 0. \]

Rejection of this compound null hypothesis indicates an association between a QTL
and either or both of the markers, $M_1$ and $M_2$, hence the term intersection
test. From a statistical perspective the relative position of the two markers is
irrelevant. However, to compare this to a two marker model there is an implicit
assumption that the markers considered form an interval, or are adjacent to one
another. This marks a departure from the traditional single marker analysis, where
no consideration is given to marker order. To define an overall level α test, the
significance level α must be adjusted for the individual tests to account for
multiple testing. There are many ways to account for multiple testing. Assuming the
markers are independent, the Bonferroni correction can be applied [9]. The
Bonferroni correction is conservative for the intersection test, and the lack of
independence between markers would tend to make it more difficult for the
intersection test to reject.
More generally, for k markers, the compound hypothesis testing the effect of an
allelic substitution at any of the independent markers, $M_1, \ldots, M_k$, is

\[ H_0: \beta_1 = \mu_{M_{11}/M_{12}} - \mu_{M_{11}/M_{11}} = 0 \text{ and } \]
\[ \qquad \beta_2 = \mu_{M_{21}/M_{22}} - \mu_{M_{21}/M_{21}} = 0 \text{ and } \ldots \]
\[ \qquad \beta_k = \mu_{M_{k1}/M_{k2}} - \mu_{M_{k1}/M_{k1}} = 0 \]

\[ H_a: \beta_1 = \mu_{M_{11}/M_{12}} - \mu_{M_{11}/M_{11}} \neq 0 \text{ and/or } \]
\[ \qquad \beta_2 = \mu_{M_{21}/M_{22}} - \mu_{M_{21}/M_{21}} \neq 0 \text{ and/or } \ldots \]
\[ \qquad \beta_k = \mu_{M_{k1}/M_{k2}} - \mu_{M_{k1}/M_{k1}} \neq 0. \]

Rejection of this compound null hypothesis indicates an association between a QTL
and at least one of the markers, $M_1, \ldots, M_k$. To define an overall level α
test using a Bonferroni correction [9], each $\beta_i$ is tested at an adjusted
significance level of α/k. An association between a QTL and a marker is then
indicated when the individual single marker test rejects the null at the adjusted
α level.

The practical result of the application of an intersection test is the simplicity
of calculation of the single marker test statistic, with a correction for multiple
testing.
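The intersection test described above can be sketched in a few lines of Python.
This is a minimal illustration under stated assumptions, not the code used in this
study: the function names are hypothetical, each marker is assumed to be coded 0/1
for the two backcross genotype classes, and each single marker test uses a
permutation p-value for the absolute least squares slope (for a 0/1 indicator the
slope is the difference between the two class means).

```python
import random


def slope(marker, trait):
    """Least squares slope of trait on a 0/1 marker indicator.

    For a 0/1 indicator this equals the difference between the class means.
    """
    ones = [y for x, y in zip(marker, trait) if x == 1]
    zeros = [y for x, y in zip(marker, trait) if x == 0]
    if not ones or not zeros:
        return 0.0  # marker not segregating: no estimable effect
    return sum(ones) / len(ones) - sum(zeros) / len(zeros)


def permutation_pvalue(marker, trait, n_perm, rng):
    """Two sided permutation p-value for the absolute slope."""
    observed = abs(slope(marker, trait))
    shuffled = list(trait)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(slope(marker, shuffled)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def intersection_test(markers, trait, alpha=0.05, n_perm=999, seed=0):
    """Reject the compound null if any marker is significant at alpha / k."""
    rng = random.Random(seed)
    k = len(markers)
    pvalues = [permutation_pvalue(m, trait, n_perm, rng) for m in markers]
    return any(p < alpha / k for p in pvalues), pvalues


# Toy data (hypothetical): marker 1 tracks the trait, marker 2 does not.
m1 = [0] * 20 + [1] * 20
m2 = [0, 1] * 20
trait = ([10.0 + 0.1 * (i % 5) for i in range(20)]
         + [12.0 + 0.1 * (i % 5) for i in range(20)])
reject, pvalues = intersection_test([m1, m2], trait)
print(reject, pvalues)
```

Passing more than two markers to intersection_test gives the k marker version, with
each marker tested at α/k.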
Two Marker Regression Model and Test of the
Corresponding Interval
Extending the backcross notation defined previously, a multiple linear regression
model (based on two markers) can be employed for QTL detection purposes. The model
is defined as

\[ Y_j = \beta_0 + \beta_1 X_{1j} + \beta_2 X_{2j} + \beta_3 X_{3j} + \varepsilon_j; \quad j = 1, \ldots, n \]

where $X_{1j}$ and $X_{2j}$ are the genotypic states of the respective markers
$M_1$ and $M_2$, along with their respective allelic substitution effects
($\beta_1$, $\beta_2$), and $X_{3j}$ is the combined genotypic state of markers
$M_1$ and $M_2$, with allelic substitutions at both markers $M_1$ and $M_2$ having
effect $\beta_3$. It is interesting to note that, when one is selectively
genotyping, the information in $\beta_3$ is maximized.

In other words,

\[ \beta_0 = \mu_{M_{11}M_{21}/M_{11}M_{21}} \]
\[ \beta_1 = \mu_{M_{11}M_{21}/M_{12}M_{21}} - \mu_{M_{11}M_{21}/M_{11}M_{21}} \]
\[ \beta_2 = \mu_{M_{11}M_{21}/M_{11}M_{22}} - \mu_{M_{11}M_{21}/M_{11}M_{21}} \]
\[ \beta_3 = \mu_{M_{11}M_{21}/M_{12}M_{22}} - \mu_{M_{11}M_{21}/M_{11}M_{21}}. \]
Based upon this two marker model with four parameters, the hypothesis employed to
perform a level α test for association between a trait and the marker loci $M_1$
and $M_2$ is the test of $\beta_3$, where

\[ H_0: \beta_3 = \mu_{M_{11}M_{21}/M_{12}M_{22}} - \mu_{M_{11}M_{21}/M_{11}M_{21}} = 0 \]
\[ H_a: \beta_3 = \mu_{M_{11}M_{21}/M_{12}M_{22}} - \mu_{M_{11}M_{21}/M_{11}M_{21}} \neq 0. \]

The null hypothesis for this test is that there is no association between either
marker ($M_1$ or $M_2$) and the trait. A similar set of hypotheses follows for an
F2 experimental design.
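A minimal Python sketch of the two marker test is given below. It is illustrative
only and rests on assumptions that are ours: the function names are hypothetical,
markers are coded 0/1, and the regressors are taken to be indicators of the three
non-reference genotype classes, as implied by the definition of β3 as the
difference between the mean of the class carrying the substitution at both markers
and the mean of the reference class. Under that coding the model is saturated, so
the least squares estimate of β3 is a difference of two class means. The p-value is
obtained by permutation rather than from a t distribution.

```python
import random


def beta3_hat(x1, x2, trait):
    """Estimate of beta_3: mean of class (1, 1) minus mean of class (0, 0)."""
    both = [y for a, b, y in zip(x1, x2, trait) if a == 1 and b == 1]
    ref = [y for a, b, y in zip(x1, x2, trait) if a == 0 and b == 0]
    if not both or not ref:
        # Mirrors the singular design matrix problem: beta_3 is not estimable.
        raise ValueError("classes (1, 1) and (0, 0) must both be observed")
    return sum(both) / len(both) - sum(ref) / len(ref)


def two_marker_test(x1, x2, trait, alpha=0.05, n_perm=999, seed=0):
    """Permutation test of H0: beta_3 = 0 at level alpha."""
    rng = random.Random(seed)
    observed = abs(beta3_hat(x1, x2, trait))
    shuffled = list(trait)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(beta3_hat(x1, x2, shuffled)) >= observed - 1e-12:
            hits += 1
    pvalue = (hits + 1) / (n_perm + 1)
    return pvalue < alpha, pvalue


# Toy data (hypothetical): 15 reference, 15 double substitution, 6 recombinants.
x1 = [0] * 15 + [1] * 15 + [1] * 3 + [0] * 3
x2 = [0] * 15 + [1] * 15 + [0] * 3 + [1] * 3
trait = ([10.0 + 0.1 * (i % 3) for i in range(15)]
         + [12.0 + 0.1 * (i % 3) for i in range(15)]
         + [11.0] * 6)
reject, pvalue = two_marker_test(x1, x2, trait)
print(reject, pvalue)
```

Only one test is performed per marker pair, so no multiple testing adjustment is
applied within the pair.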
This model parameterization differs from the least squares interval mapping
approach first introduced by Knott and Haley [2]. In the parameterization proposed
here, only one test is performed for the pair of markers. In contrast, the
regression based interval mapping approach [2] recalculates the value of the
independent variables for each putative position in the interval. We chose the
alternate parameterization in order to directly compare the two marker model and
the single marker model. In the Knott and Haley [2] parameterization, flanking
markers are used to define the coefficients of the regression as mean, additive or
dominance effects. For s steps along the interval between two markers M1 and M2,
values of X are calculated according to the conditional probability of a QTL in
that location.
The regression based interval mapping parameterization thus provides a mechanism to
test for additive and dominance effects using tests of the regression parameters.
In our parameterization, the tests of the regression coefficients are tests for
detection. Thus, the two parameterizations have different null hypotheses for the
tests of the regression coefficients and are not directly comparable in terms of
power. We use the alternative parameterization so that the interpretation of the
tests is comparable in the single marker and two marker regression models and we
can directly compare the power of the two tests.
Simulations
Data were simulated for two marker backcross and F2 populations with binomial trait
distributions and two marker backcross populations with normal trait distributions.
A total of 339 parameter combinations were examined (Tables 1 and 2). For each
combination of parameters, 1000 data sets were simulated. Traits were simulated
from a binomial distribution Bin(n,p) where sample sizes n = 100 and n = 500 were
utilized, and from a normal distribution
N( 1t + g2 1.0) with n = 500. The effect of the binary
2
trait [13] varied based on g = np, (Table 1). The binomial
probabilities p1, p2, and p3 represent the probability that a binary trait is
present given a specific BTL genotype (GT), or the penetrance of the trait for the
specific genotypes Q1/Q1, Q1/Q2, and Q2/Q2, respectively. The location of the locus
relative to marker loci M1 and M2 also varied. Similarly, the effect under the
normally distributed phenotype was allowed to vary (Table 2) under seventy five
parameter combinations. The effect size is the difference in the penetrances (for
binary traits) and between the means (for normally distributed traits). For each
phenotypic trait distribution and each parameter combination (Tables 1 and 2) we
analyzed, via least squares, 1000 simulated data sets using both the single marker
regression model and the two marker regression model.
For the intersection test, the null hypothesis was rejected when the empirical
p-value for either single marker regression test statistic was less than
α/k = 0.025 (Bonferroni adjustment). For the comparable two marker test (i.e.,
$\beta_3 = 0$), the null hypothesis was rejected when the empirical p-value was
less than α = 0.05. Under each parameter combination, statistical power was
estimated from the 1000 simulated data sets as the proportion of times the
empirical (permutation) p-values were less than the specified α level.
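The power calculation can be sketched as follows for a backcross with a binary
trait. This is a simplified illustration, not the simulation code used in this
study. The assumptions are ours: the QTL lies between the two markers,
recombination in the two intervals is independent, the trait is a single Bernoulli
draw per individual with penetrance p_hom for one QTL genotype and p_hom + effect
for the other, and all function and parameter names are hypothetical. The number of
data sets and permutations is kept small so that the sketch runs quickly.

```python
import random


def simulate_backcross(n, r1, r2, p_hom, effect, rng):
    """Simulate 0/1 genotypes at M1 and M2 and a binary trait for n progeny."""
    m1, m2, trait = [], [], []
    for _ in range(n):
        q = rng.random() < 0.5                    # QTL genotype of this individual
        m1.append(int(q != (rng.random() < r1)))  # recombination flips the state
        m2.append(int(q != (rng.random() < r2)))
        trait.append(int(rng.random() < p_hom + effect * q))
    return m1, m2, trait


def perm_pvalue(trait, idx_a, idx_b, n_perm, rng):
    """Permutation p-value for |mean(trait[idx_a]) - mean(trait[idx_b])|."""
    if not idx_a or not idx_b:
        return 1.0  # class not observed: treated as no evidence (assumption)

    def stat(values):
        mean_a = sum(values[i] for i in idx_a) / len(idx_a)
        mean_b = sum(values[i] for i in idx_b) / len(idx_b)
        return abs(mean_a - mean_b)

    observed, shuffled, hits = stat(trait), list(trait), 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        hits += stat(shuffled) >= observed - 1e-12
    return (hits + 1) / (n_perm + 1)


def estimate_power(n, r1, r2, p_hom, effect, n_data=100, n_perm=199,
                   alpha=0.05, seed=1):
    """Proportion of simulated data sets in which each test rejects."""
    rng = random.Random(seed)
    reject_int = reject_two = 0
    for _ in range(n_data):
        m1, m2, trait = simulate_backcross(n, r1, r2, p_hom, effect, rng)
        ids = range(n)
        p1 = perm_pvalue(trait, [i for i in ids if m1[i]],
                         [i for i in ids if not m1[i]], n_perm, rng)
        p2 = perm_pvalue(trait, [i for i in ids if m2[i]],
                         [i for i in ids if not m2[i]], n_perm, rng)
        p3 = perm_pvalue(trait, [i for i in ids if m1[i] and m2[i]],
                         [i for i in ids if not m1[i] and not m2[i]], n_perm, rng)
        reject_int += min(p1, p2) < alpha / 2  # intersection test (Bonferroni)
        reject_two += p3 < alpha               # two marker test of beta_3
    return reject_int / n_data, reject_two / n_data


# One marker close to the QTL (r1 = 0.1) and one far from it (r2 = 0.4).
power_int, power_two = estimate_power(n=100, r1=0.1, r2=0.4, p_hom=0.2,
                                      effect=0.6, n_data=50, n_perm=99)
print(power_int, power_two)
```

Subtracting the second returned value from the first gives the difference in power
for one parameter combination, the quantity compared across combinations in the
Results.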
Drosophila Analysis
The population of Drosophila melanogaster used in our
analysis was a set of 98 RILs (recombinant in lines)
derived from a cross of two isogenic lines as described in
Wayne et al. [8], for the trait ovariole number. There were
76 informative markers on 4 chromosomes. Markers used
were the cytological map positions of the insertion sites of
roo transposable element markers, with the exception of
the fourth chromosome, where a visible mutation was
used as a marker (spa) [12]. A complete linkage map was
obtained for chromosome 1 (the X) and chromosome 3,
with 15 adjacent marker pairs (16 markers) on 1 and 36
adjacent marker pairs (37 markers) on 3. There was a centromeric
break in the genetic map for chromosome 2,
such that there were 18 adjacent pairs (19 markers) on the
left arm and 2 adjacent pairs (3 markers) on the right arm.
To compare the intersection test to the two marker test,
the 71 pairs of markers identified above were examined.
For each pair, the two marker regression with the test of
the $\beta_3$ parameter was conducted at $\alpha = 0.05$. The two
individual markers were then separately modeled in a linear
regression model (see Equation 1), and the intersection
test was conducted. For the 71 unique pairs of markers,
concordance between the intersection test and two marker
test was estimated using the Kappa coefficient, and McNemar's
test [14] was conducted to determine whether systematic
differences existed between the two methods.
Regression based interval mapping was performed
according to the Haley and Knott parameterization [1,2].
Analysis was conducted using S-PLUS 2000 (Insightful
Corp.).
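The concordance analysis of the paired reject/do-not-reject decisions can be illustrated with a short sketch. It assumes the uncorrected asymptotic form of McNemar's test (chi-square with one degree of freedom); the text does not state whether a continuity correction or an exact version was used, and the function names are hypothetical.

```python
import math


def paired_table(a_decisions, b_decisions):
    """Cross-classify paired decisions: (both, only_a, only_b, neither)."""
    both = only_a = only_b = neither = 0
    for a, b in zip(a_decisions, b_decisions):
        if a and b:
            both += 1
        elif a:
            only_a += 1
        elif b:
            only_b += 1
        else:
            neither += 1
    return both, only_a, only_b, neither


def cohen_kappa(both, only_a, only_b, neither):
    """Kappa: agreement beyond that expected by chance from the margins."""
    n = both + only_a + only_b + neither
    observed = (both + neither) / n
    rate_a = (both + only_a) / n
    rate_b = (both + only_b) / n
    expected = rate_a * rate_b + (1 - rate_a) * (1 - rate_b)
    if expected == 1.0:
        return 1.0  # both methods always give the same constant decision
    return (observed - expected) / (1 - expected)


def mcnemar(only_a, only_b):
    """Uncorrected McNemar statistic and its chi-square (1 df) p-value."""
    discordant = only_a + only_b
    if discordant == 0:
        return 0.0, 1.0  # no discordant pairs, no evidence of a difference
    stat = (only_a - only_b) ** 2 / discordant
    # For one degree of freedom, P(chi2 > s) = erfc(sqrt(s / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))
```

Here `a_decisions` and `b_decisions` would hold, for each of the 71 marker pairs, whether the intersection test and the two marker test rejected the null hypothesis. Kappa summarizes overall agreement, while McNemar's test uses only the discordant pairs to ask whether one method rejects systematically more often than the other.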
Authors' Contributions
Cynthia Coffman is the postdoctoral associate who programmed
all simulations and analyzed the Drosophila
data. Rebecca Doerge and Lauren McIntyre designed the
simulation study and assisted with the interpretation of
the results. Marta Wayne provided the Drosophila data,
and assisted in the interpretation of the results. All authors
contributed to the writing of this manuscript.
Acknowledgments
This work is supported by the Purdue University Experimental Research
Station; a National Science Foundation Grant (DBI 9808026/0096044) to
LMM, CJC, and RWD; a National Institutes of Health grant (NIAAG 16996)
to LMM; a United States Department of Agriculture Grant (9835300
6173) to RWD; a National Institutes of Health grant (GM5988402) to
MLW; and a Veterans Affairs Health Services Research Postdoctoral
Fellowship to CJC.
References
1. Haley C and Knott S: A simple regression method for mapping
quantitative trait loci in line crosses using flanking markers.
Heredity 1992, 69:315-324.
2. Knott S and Haley C: Aspects of maximum likelihood methods
for mapping of quantitative trait loci in line crosses. Genetical
Research 1992, 60:139-152.
3. Kao CH: On the differences between maximum likelihood
and regression interval mapping in the analysis of quantitative
trait loci. Genetics 2000, 156(2):855-865.
4. Rebaï A, Goffinet B and Mangin B: Comparing powers of different
methods for QTL detection. Biometrics 1995, 51:87-99.
5. Doerge RW, Zeng ZB and Weir BS: Statistical issues in the
search for genes affecting quantitative traits in experimental
populations. Statistical Science 1997, 13:195-219.
6. Lander ES and Botstein D: Mapping mendelian factors underlying
quantitative traits using RFLP linkage maps. Genetics 1989,
121:185-199.
7. Zeng ZB: Precision mapping of quantitative trait loci. Genetics
1994, 136:1457-1468.
8. Wayne ML, Hackett JB, Dilda CL, Nuzhdin SV and Pasyukova EG:
Quantitative trait locus mapping of fitness-related traits in
Drosophila melanogaster. Genetical Research 2001, 77:107-116.
9. Steele R and Torrie J: Principles and procedures of statistics: a biometrical
approach. 3rd edition. McGraw-Hill; 1997.
10. Bouletreau-Merle J, Allemand JR, Cohet Y and David JR: Reproductive
strategy in Drosophila melanogaster: Significance of a
genetic divergence between temperate and tropical
populations. Oecologia 2001, 53:323-329.
11. Cohet Y and David J: Control of the adult reproductive potential
by preimaginal thermal conditions. Oecologia 1978, 36:295-306.
12. Nuzhdin SV, Pasyukova EG, Dilda CL, Zeng ZB and Mackay TFC:
Sex-specific quantitative trait loci affecting longevity in
Drosophila melanogaster. Proceedings from the National Academy of
Science 1997, 94:9734-9739.
13. McIntyre LM, Coffman CJ and Doerge RW: Detection and localization
of a single binary trait locus in experimental
populations. Genetical Research 2001, 78:79-92.
14. Agresti A: Categorical Data Analysis. John Wiley and Sons, New York, NY;
1990.
