Group Title: BMC Genetics
Title: Intersection tests for single marker QTL analysis can be more powerful than two marker QTL analysis
Full Citation
Permanent Link: http://ufdc.ufl.edu/UF00100049/00001
 Material Information
Title: Intersection tests for single marker QTL analysis can be more powerful than two marker QTL analysis
Physical Description: Book
Language: English
Creator: Coffman, Cynthia
Doerge, R. W.
Wayne, Marta
McIntyre, Lauren
Publisher: BMC Genetics
Publication Date: 2003
 Notes
Abstract: BACKGROUND: It has been reported in the quantitative trait locus (QTL) literature that when testing for QTL location and effect, the statistical power supporting methodologies based on two markers and their estimated genetic map is higher than for the genetic map independent methodologies known as single marker analyses. Close examination of these reports reveals that the two marker approaches are more powerful than single marker analyses only in certain cases. Simulation studies are a commonly used tool to determine the behavior of test statistics under known conditions. We conducted a simulation study to assess the general behavior of an intersection test and a two marker test under a variety of conditions. The study was designed to reveal whether two marker tests are always more powerful than intersection tests, or whether there are cases when an intersection test may outperform the two marker approach. We present a reanalysis of a data set from a QTL study of ovariole number in Drosophila melanogaster. RESULTS: Our simulation study results show that there are situations where the single marker intersection test equals or outperforms the two marker test. The intersection test and the two marker test identify overlapping regions in the reanalysis of the Drosophila melanogaster data. The region identified is consistent with a regression based interval mapping analysis. CONCLUSION: We find that the intersection test is appropriate for analysis of QTL data. This approach has the advantage of simplicity and for certain situations supplies equivalent or more powerful results than a comparable two marker test.
General Note: Start page 10
General Note: M3: 10.1186/1471-2156-4-10
 Record Information
Bibliographic ID: UF00100049
Volume ID: VID00001
Source Institution: University of Florida
Holding Location: University of Florida
Rights Management: Open Access: http://www.biomedcentral.com/info/about/openaccess/
Resource Identifier: issn - 1471-2156
http://www.biomedcentral.com/1471-2156/4/10




BMC Genetics Central


Methodology article


Intersection tests for single marker QTL analysis can be more
powerful than two marker QTL analysis
Cynthia J Coffman1,2, RW Doerge3,4,5, Marta L Wayne6 and
Lauren M McIntyre*2,4,5


Address: 1Institute for Clinical and Epidemiological Research, Biostatistics Unit, Durham VA Medical Center (152), Durham, NC 27705 USA,
2Duke Medical Center, Department of Biostatistics and Bioinformatics, Durham, NC 27710 USA, 3Department of Statistics, Purdue University,
West Lafayette, IN 47907 USA, 4Department of Agronomy, Purdue University, West Lafayette, IN 47907 USA, 5Computational Genomics, Purdue
University, West Lafayette, IN 47907 USA and 6Department of Zoology, University of Florida, Gainesville, FL 32611-8525 USA
Email: Cynthia J Coffman cynthia.coffman@duke.edu; RW Doerge doerge@stat.purdue.edu; Marta L Wayne mlwayne@zoo.ufl.edu;
Lauren M McIntyre* lmcintyre@purdue.edu
* Corresponding author



Published: 19 June 2003 Received: 26 November 2002
BMC Genetics 2003, 4:10 Accepted: 19 June 2003
This article is available from: http://www.biomedcentral.com/1471-2156/4/10
© 2003 Coffman et al; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all
media for any purpose, provided this notice is preserved along with the article's original URL.



Abstract
Background: It has been reported in the quantitative trait locus (QTL) literature that when
testing for QTL location and effect, the statistical power supporting methodologies based on two
markers and their estimated genetic map is higher than for the genetic map independent
methodologies known as single marker analyses. Close examination of these reports reveals that
the two marker approaches are more powerful than single marker analyses only in certain cases.
Simulation studies are a commonly used tool to determine the behavior of test statistics under
known conditions. We conducted a simulation study to assess the general behavior of an
intersection test and a two marker test under a variety of conditions. The study was designed to
reveal whether two marker tests are always more powerful than intersection tests, or whether
there are cases when an intersection test may outperform the two marker approach.
We present a reanalysis of a data set from a QTL study of ovariole number in Drosophila
melanogaster.
Results: Our simulation study results show that there are situations where the single marker
intersection test equals or outperforms the two marker test. The intersection test and the two
marker test identify overlapping regions in the reanalysis of the Drosophila melanogaster data. The
region identified is consistent with a regression based interval mapping analysis.
Conclusion: We find that the intersection test is appropriate for analysis of QTL data. This
approach has the advantage of simplicity and for certain situations supplies equivalent or more
powerful results than a comparable two marker test.




Background
Many authors [1-4] have compared the statistical power
of different methods for quantitative trait locus (QTL)
mapping. These comparisons have shown that the additional
information supplied by the genetic distance
between identifiable DNA markers unconfounds the QTL




effect from the QTL location, thus making two marker
models more powerful than single marker models of
detection (for review see Doerge et al. [5]).

With the goal of detecting and/or locating QTL, there are
two common statistical approaches that can be taken. The
first approach is based on ANOVA, or simple linear regres-
sion, and performs statistical tests based solely on single
DNA marker information. No genetic map is required for
single marker analysis, and the calculations are based on
phenotypic means and variances within each of the geno-
typic classes. The second, and more involved approach, is
based on two DNA markers, the estimated recombination
fraction between them (or the estimated genetic distance),
and either a maximum likelihood based calculation or a
regression model including multiple (two or more) mark-
ers as independent variables. The linear ordering of multi-
ple DNA markers based upon their estimated
relationships (i.e., recombination fraction (or genetic dis-
tance)) supplies the framework or genetic map for (com-
posite) interval mapping [6,7] and as such unconfounds
the QTL effect and the QTL location, thus providing a
more precise means for detecting and locating QTL with
respect to the estimated genetic map for the organism
under investigation.

Rebai et al. [4] present a comprehensive comparison of
the statistical power for many of the commonly used
flanking marker or two marker methods employed, and
conclude that two marker mapping provides a relatively
small gain (5%) in power over single marker methods
when the two markers define an interval of width less
than 20 cM, but a substantial increase (greater than 30%)
in power for intervals upwards of 70 cM, indicating that
the gain in power may come from the addition of the sec-
ond marker to the analysis, or the addition of information
from that marker, rather than the map.

Using the findings of Rebai et al. [4], and others as our
motivation, we hypothesize that the power increase
between single marker and two marker regression meth-
ods is due to additional genotypic information in the sec-
ond marker. In order to assess this, similar test statistics
should be compared. A comparison of maximum likeli-
hood interval mapping to single marker ANOVA is com-
plicated because of the differences that may be observed
due to differences between regression and maximum like-
lihood [3], as well as differences in marker information.
In order to avoid this complication, we consider two
regression based approaches that differ only in the
number of markers included in the initial model (i.e. the
statistical methodology is the same, the models are differ-
ent). First, a set of compound hypotheses are defined for
use with a single marker analysis and used to define an
intersection test. We then state the equivalent hypotheses


for a regression based two marker model for a test of the
interval. Last we compare, via simulation, the power of
the intersection and the two marker test under the stated
hypotheses for each model. We do not consider the case
of multiple QTL in a single interval, as this case was not
considered by Rebai et al. [4]. These two approaches are
then applied to a backcross population of Drosophila
with 76 informative markers [8] to detect QTL associated
with ovariole number.

Results
Simulations
An overview of the simulations performed is given in
Tables 1 and 2. For the purpose of evaluating the relative
difference in statistical power between the intersection
test and the two marker test, the estimated statistical
power of the two marker test was subtracted from that of
the intersection test for each of the parameter combina-
tions investigated, and a t-test [9] was performed to test
the null hypothesis that the mean difference in power was
zero.
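
The comparison just described can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `paired_power_t` and the four power values in the usage lines are hypothetical, and only the t statistic and its degrees of freedom are computed (a p-value would additionally need the t distribution).

```python
from math import sqrt
from statistics import mean, stdev


def paired_power_t(power_intersection, power_two_marker):
    """One-sample t statistic for H0: mean difference in power is zero.

    Each list holds one estimated power per parameter combination;
    the difference is intersection test minus two marker test.
    """
    diffs = [a - b for a, b in zip(power_intersection, power_two_marker)]
    n = len(diffs)
    mean_diff = mean(diffs)
    # Standard error of the mean difference
    se = stdev(diffs) / sqrt(n)
    return mean_diff, mean_diff / se, n - 1


# Hypothetical power estimates for four parameter combinations
mean_diff, t_stat, df = paired_power_t([0.50, 0.62, 0.79, 0.93],
                                       [0.50, 0.60, 0.80, 0.90])
print(mean_diff, t_stat, df)
```

A positive mean difference favors the intersection test, which is the sign convention used in the results that follow.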

For the binomial phenotype with a backcross design, sam-
ple size of 100, the 100 parameter combinations exam-
ined resulted in 34 showing no difference in statistical
power between the intersection test and the two marker
test (i.e., the value of the difference was exactly zero), 39
favoring the intersection test, and 27 favoring the two
marker test. The intersection test was more powerful with
a mean difference in power of 0.010, and the t-test of the
null hypothesis that the difference in power was zero was
rejected (p = 0.020). When n was equal to 500, the 139
parameter combinations examined yielded 116 showing
there was no difference in statistical power, 18 parameter
combinations indicated the intersection test as more pow-
erful, and 5 indicated the two marker test as more power-
ful. The mean difference in power was 0.020, and the t-test
of the null hypothesis that the difference in power was
zero was rejected (p = 0.010). The test of the null hypoth-
esis that the mean difference between the intersection and
two marker was zero was rejected for both sets of simula-
tions. The estimated difference between the two
approaches was positive, indicating that the intersection
test has slightly higher power than the two marker test in
these cases.

We also investigated F2 experimental populations for a
binomial phenotype, using a sample size of 500. From the
25 parameter combinations investigated, 5 failed to con-
verge consistently for the two marker model due to singu-
larity in the design matrix. The remaining 20 parameter
combinations showed 10 as having no difference in statis-
tical power, while the remaining 10 favored the intersec-
tion test. Results similar to those found in the initial
simulations indicate that the intersection test performs as




Table 1: Simulation conditions for a binary phenotype, two marker loci M1 and M2, a single locus (Q), sample sizes n = 100, 500,
recombination fractions rM1Q and rM2Q, respectively, and effect size. *rM1Q = rM2Q = 0.0 not simulated

Population | n | rM1Q | rM2Q | Effect size | Number of combinations
Backcross | 100, 500 | 0.0*, 0.10, 0.20, 0.30, 0.40, 0.50 | 0.0*, 0.10, 0.20, 0.30, 0.40, 0.50 | 0.40, 0.60, 0.80, 1.00 | 239*
F2 | 500 | 0.0*, 0.10, 0.20, 0.30, 0.40, 0.50 | 0.0*, 0.10, 0.20, 0.30, 0.40, 0.50 | 1.00 | 25


Table 2: Simulation conditions for a normally distributed trait (Q), sample size n = 500, recombination fractions rM1Q and rM2Q,
respectively, and effect size. *rM1Q = rM2Q = 0.0 not simulated

Population | n | rM1Q | rM2Q | Effect size | Number of combinations
Backcross | 500 | 0.0*, 0.10, 0.20, 0.30, 0.40, 0.50 | 0.0*, 0.10, 0.20, 0.30, 0.40, 0.50 | 0.20, 0.50, 0.80 | 75


well as, or better than the two marker test. Comparisons
with smaller sample sizes (n = 100) were not conducted
because of convergence problems using the two marker
model.

For a normally distributed phenotype, 75 parameter com-
binations for the backcross were examined. From these,
12 showed no difference in statistical power, 34 scenarios
favored the intersection test, while 29 indicated the two
marker test as more powerful. The mean difference in
power was 0.015, and we failed to reject the null
hypothesis that the difference was zero (p = 0.064).

Upon investigation of the parameter combinations that
showed some difference in power, specifically, for the sce-
nario highlighted by Rebai et al. [4] (QTL in the middle of
the interval with a large distance between markers), we
also find the two marker approach to be slightly more
powerful than the intersection test. However, when the
distance between the QTL and one marker is much
smaller than the distance between the QTL and the second
marker, the intersection test is more powerful. Although
we can point to these cases, it is important to realize that
for most of the scenarios no difference in power was
observed (see Figure 1).
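
To make the simulated scenario concrete, the following sketch estimates the power of the intersection test for one parameter combination of a backcross with a normally distributed trait. It is an illustration under stated assumptions rather than the authors' simulation code: the function names are hypothetical, the trait is drawn as N(effect x QTL genotype, 1), crossovers on either side of the QTL are assumed independent (no interference), and a normal approximation to the two-sample t-test is used, which is reasonable for n = 500.

```python
import random
from math import erfc, sqrt
from statistics import mean, variance


def marker_p_value(y, marker):
    """Two-sided p-value for a difference in trait means between marker classes
    (normal approximation to the two-sample t-test)."""
    g0 = [yi for yi, m in zip(y, marker) if not m]
    g1 = [yi for yi, m in zip(y, marker) if m]
    se = sqrt(variance(g0) / len(g0) + variance(g1) / len(g1))
    z = (mean(g1) - mean(g0)) / se
    return erfc(abs(z) / sqrt(2))


def intersection_power(r1, r2, effect, n=500, reps=200, alpha=0.05, seed=1):
    """Fraction of simulated backcross data sets in which the intersection
    test rejects; r1 and r2 are the marker-QTL recombination fractions."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        # QTL genotype: heterozygous (True) or homozygous (False)
        q = [rng.random() < 0.5 for _ in range(n)]
        # Each marker differs from the QTL genotype with probability r
        m1 = [qi != (rng.random() < r1) for qi in q]
        m2 = [qi != (rng.random() < r2) for qi in q]
        y = [rng.gauss(effect * qi, 1.0) for qi in q]
        # Bonferroni: each of the two single marker tests uses alpha / 2
        p_min = min(marker_p_value(y, m1), marker_p_value(y, m2))
        rejections += p_min <= alpha / 2
    return rejections / reps


print(intersection_power(r1=0.10, r2=0.40, effect=0.50))
```

The number of replicates is kept at 200 here for speed; the study itself used 1000 data sets per parameter combination.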

Drosophila Analysis
Ovariole number is related to reproductive success in Dro-
sophila melanogaster and positively correlated with maxi-
mum daily female fecundity [10,11]. The 98 RILs
(recombinant inbred lines) for this study were scored for
the trait ovariole number and genotyped as described in
Wayne et al. [8].


Figure 1
Histogram and density plot of difference in power between
intersection test and two marker test for 334 simulations.
Horizontal axis: difference in statistical power.





Table 3: Concordance of significant test results from analysis of
71 unique pairs of adjacent markers using the intersection test and
two marker regression test. Yes indicates that the test was
significant; No indicates that the test was not significant at α = 0.05

Two marker regression test | Intersection test: Yes | Intersection test: No
Yes | 36 | 6
No | 3 | 26
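
The agreement statistics reported for this concordance table can be recomputed from the four cell counts given in the text (36 pairs significant with both tests, 26 with neither, 6 with the two marker test only, 3 with the intersection test only). The sketch below is an illustration with a hypothetical function name; it uses the standard formulas for Cohen's kappa and for McNemar's statistic without continuity correction, which may differ in detail from the software the authors used.

```python
from math import erfc, sqrt


def kappa_and_mcnemar(both_yes, only_a, only_b, both_no):
    """Cohen's kappa and McNemar's test for a 2 x 2 concordance table.

    only_a / only_b count the pairs significant with test A only / test B only.
    """
    n = both_yes + only_a + only_b + both_no
    observed = (both_yes + both_no) / n
    a_yes = both_yes + only_a
    b_yes = both_yes + only_b
    # Agreement expected by chance from the marginal totals
    expected = (a_yes * b_yes + (n - a_yes) * (n - b_yes)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    # McNemar statistic uses only the discordant cells
    s = (only_a - only_b) ** 2 / (only_a + only_b)
    # Upper tail of a chi-square distribution with 1 degree of freedom
    p_value = erfc(sqrt(s / 2))
    return kappa, s, p_value


# A = intersection test, B = two marker test
kappa, s, p_value = kappa_and_mcnemar(both_yes=36, only_a=3, only_b=6, both_no=26)
print(round(kappa, 2), s, round(p_value, 2))
```

With these counts the sketch gives S = 1.00 and p = 0.32, matching the reported McNemar result, and a kappa of about 0.74, close to the reported 0.75.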




Table 4: Marker analysis for ovariole number in Drosophila melanogaster from analysis using intersection test, α = 0.025; two marker
regression test, α = 0.05; and interval mapping. Only markers significant in either the two marker test or intersection test are listed


Two marker test


Intersection test


Interval mapping


Chromosome Marker Frequency M2N2 df M2N2 Frequency
M2N2 p-value M2


I 3E-4F
4F-5D
6E-7D


2 35B-38A 0.69
38A-38E 0.7
38E-43A 0.71
43A-43E 0.73
43E-46C 0.8

3 63A-65A 0.29
65A-65D 0.3
65D- 0.42
67D
67D-68B 0.44

68B-68C 0.46


94 0.029
93 0.032
93 0.0499

82 0.028
77 0.017
78 0.028
79 0.03
85 0.038

85 0.075
86 0.035
90 0.021


Frequency M2 df N2 df M2p-value N2 p-value
N2


0.86 96 95 0.14
0.84 95 96 0.0256
0.74 96 95 0.069


89 84 0.33
84 87 0.017
87 84 0.028
84 87 0.038
87 89 0.034


0.38 94 89 0.227
0.51 89 93 0.0249
0.51 93 94 0.13


91 0.0000 0.51 0.58 94 94 0.004


4
90 0.0000
3
89 0.002


86 0.0007 0.46


0.49 94 92 0

0.46 92 94 0.002

0.45 94 90 0.007


0.0256
0.056
0.1

0.017
0.028
0.038
0.034
0.073

0.0249
0.13
0.004


p-value


0.0242
0.0226
0.0373

0.3036
0.0105
0.0083
0.0177
0.0339


0.0187
0.0207
0.0031


70C-71E 0.35
71E-72A 0.35
72A-73D 0.39
73D-76A 0.34
76A-76B 0.34
76B-77A 0.41
77A-82D 0.34
82D-85F 0.31
85F-87B 0.27
87B-87E 0.28
87E-87F 0.27
87F-88E 0.23
96A-96F 0.2
96F-97D 0.23
97D-97E 0.3
97E-98A 0.26
98A-99A 0.27
99A-99B 0.34
99B-99E 0.32
99E- 0.33
100A


84 0.0002
84 0.0002
88 0.0001
83 0.0001
85 0.0001
90 0.0008
83 0.0002
84 0.0006
84 0.0019
85 0.008
84 0.006
85 0.13
87 0.066
86 0.013
88 0.024
88 0.013
88 0.0001
88 0
88 0
93 0


90 89 0.0008
89 90 0.0006
90 90 0.0007
90 87 0.0003
87 92 0.0001
92 92 0.0009
92 87 0.0009
87 90 0.0003
90 89 0.0012
89 91 0.003
91 89 0.011
89 88 0.004
94 90 0.21
90 92 0.013
92 90 0.048
90 94 0.02
94 90 0.018
90 90 0
90 94 0
94 95 0


For the 71 marker pairs considered, 36 pairs were
found to be significant using both the intersection test
and the two marker test, and 26 were found to be non-significant
with both tests. The intersection and two marker
tests were concordant in 62 of the 71 pairs of markers
(Table 3). The estimated chance corrected agreement
(Kappa coefficient) was 0.75 with a 95% confidence interval
of (0.58, 0.90). McNemar's test showed no systematic
difference in the two approaches (S = 1.00, p = 0.32).

Overlapping regions on chromosome 3 were identified by
the intersection test, the two marker test, and the interval
mapping test (Table 4). On chromosome 1, the two
marker test identified a region between 3E and 7D while





0 0

0.002 0

0.007 0.0013

0.0008 0.001


0.0006
0.0007
0.0003
0.0001
0.0009
0.0009
0.0003
0.0012
0.003
0.01
0.004
0.11
0.013
0.048
0.02
0.018
0
0.0000
0.0000
0.0000


0.0009
0.0016
0.001
0.0007
0.0005
0.0009
0.0011
0.0006
0.0012
0.0028
0.0066
0.0061
0.0176
0.0168
0.053
0.016
0.0001
0
0
0




Figure 2
Drosophila melanogaster intersection test. T-test statistics and
thresholds for evaluation of significance considering all markers
typed (solid lines, T = 3.521, 76 markers; T = 2.29, paired
markers). The marker on Chromosome 4, spa, was not statistically
significant (p = 0.32). Panel a: Chromosome 1. Panel b:
Chromosome 2. Panel c: Chromosome 3. Horizontal axes:
marker positions along each chromosome.


the comparable intersection test showed borderline sig-
nificance for one marker in that region (4F, p = 0.0256).
On chromosome 2, the two marker test identified a region
from 35B-46C while intersection tests identified a smaller
region from 35B-38E and the interval mapping identified
a region from 38A-46C. On Chromosome 3 the two
marker test identifies two regions 65A-87F and 96F-100A
while the intersection test finds the entire region from
63A-100A significant. The interval mapping agrees with
the intersection test for interval 63A-65A and finds it sig-
nificant, while it agrees with the two marker test for the
interval 97D-97F and does not find this interval signifi-
cant. Chromosome 4 was not associated with the trait for
any test.

The application of the intersection test to these data can
be further expanded to include an analysis with all 76
markers. We conducted 76 single marker regression tests
at a Bonferroni adjusted α of 6.6 × 10^-4. Markers 68B, 71E,
73D, 76A, 82D, 99A, 99B, 99E and 100A were significant
using this intersection test (see Figure 2). The regions
identified are consistent with a regression-based interval
mapping analysis.
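
As a quick check of the adjusted level quoted in this paragraph, assuming an overall α of 0.05 (the level used for the two marker test) divided equally among the k = 76 single marker tests:

```latex
\[
\alpha_{\mathrm{adj}} = \frac{\alpha}{k} = \frac{0.05}{76} \approx 6.6 \times 10^{-4}
\]
```

The same rule with k = 2 gives the level of 0.025 used for each marker in a pair.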

Discussion
The findings of Rebai et al. [4] show differences in the statistical
power of the two marker methods (i.e., interval
mapping) over single marker tests (e.g., ANOVA, t-test)
only when the markers are more than 50 cM apart, suggesting
that these differences may be due to the addition
of information in the second marker. Our simulation
study supports this hypothesis.

The application of an intersection test uses information
from both markers, and tests the same null hypotheses as
the two marker test. The use of the intersection test takes
advantage of the additional genotypic information pro-
vided in the second marker.

While compound hypotheses are common in statistical
theory, and typically seen in the use of union/intersection
tests, their use in the quantitative genetics arena and QTL
application is relatively novel. Furthermore, the intersec-
tion test is simple to implement, the expansion to multi-
ple markers is straightforward, and uses all available
marker information. In a framework map, where markers
are unlinked, the intersection test is simply the single
marker analysis with a Bonferroni correction for the sig-
nificance level. In cases where markers are correlated, the
application of the Bonferroni correction will be overly
conservative. This correction guarantees that the nominal
α is not exceeded, but is well known to be overly conservative
in cases where tests are not independent. In this case,
the application of the intersection test will require an




alternative correction in order to achieve maximum
power.

We demonstrate situations for a pair of adjacent markers
where the power of the intersection test is equal to or
greater than the power of the two marker test. In the case
highlighted by Rebai et al. [4], markers more than 50 cM
apart and large effect size, we also find that the two marker
test has higher power than the intersection test. A counter
example is when one marker is much closer to the QTL
than the other marker, in this case the intersection test is
more powerful. Overall, the power of the two approaches
is nearly identical and differences between them small.

In the Drosophila reanalysis both methods identify the
same general regions. However, six marker pairs were
found to be significant using the two marker tests that
were not identified using the intersection test. Using the
map and notation defined by Nuzhdin et al. [12] they
were: Chromosome 1 pairs 3E-4F, 4F-5D, 6E-7D; Chro-
mosome 2 pairs 38E-43A, 43A-43E, and 43E-46C. The p-
values for the 6 marker pairs from the intersection tests
were small but did not exceed the Bonferroni corrected
significance level (see Table 4). The above markers that
contribute to these marker pairs are linked indicating that
the Bonferroni correction may be overly conservative.

In contrast, three marker pairs on Chromosome 3 were
significant using the intersection tests, but were not signif-
icant using the two marker tests: 63A-65D, 87F-88E, and
96A-96F (Table 4). In these cases the "internal" marker of
the pair is giving signal while the "outer" marker does not.
This provides an interesting point of discussion. We could
say that marker 65D appears to be associated with ovari-
ole number, but we do not know if the QTL lies to the left
or right of this marker. Just because marker 63A does not
appear to be significantly associated with ovariole
number, we can not infer that the region to the "left" of
65D does not contain the gene of interest.

In some cases, the two marker test results in a larger region
than the intersection test, while in others the reverse is
true. QTL mapping is usually a first attempt to locate
genes, which the biologist uses to identify all possible
regions of interest, i.e., the biologist is willing to accept some type I error. We
have discussed different ways to detect underlying QTL
and an approach for maximizing or minimizing the
potential region containing the QTL. It is also possible to
estimate the QTL position directly. Estimates can be
obtained using a variety of techniques and the different
possible approaches to estimation are reviewed in Doerge
et al. [5] and Kao [3]. However, even when the position is
estimated, a confidence interval will exist defining the size
of the region to be included for further study. Different
approaches will result in regions of different sizes with


more, or fewer markers included. The differences in the
size of the regions are potentially important to a biologist,
who relies on QTL mapping analyses to determine regions
for further study. Most biologists accept that current QTL
mapping methods are best used for identifying broad
regions, which subsequently can be dissected with more
precise genetical techniques. The question is then: what
region should be advanced to fine mapping experiments?
The investigator may choose to take only the regions
which are significant in both intersection and multiple
marker approaches, or s/he may choose to carry forward
any marker that shows a positive result according to at
least one analysis. We recommend that experimentalists
perform both a single marker analysis with an intersection
test and a multiple marker analysis and use the informa-
tion available in both analyses to guide their decisions
about what regions to carry forward for further study.

Conclusion
We find that the intersection test has equal or greater
power compared to the two marker equivalent. Our anal-
yses were conducted using the Bonferroni correction for
the intersection test. When markers are linked, as in many
of our simulations, this correction is overly conservative.
If the intersection test is used in conjunction with a more
appropriate correction, the performance of the intersec-
tion test would improve perhaps even surpassing the two
marker equivalent in more cases. Thus, our motivation
and hope in presenting this investigation of the statistical
power of intersection tests versus two marker tests is to
make clear the compound framework and resulting evi-
dence under which intersection tests are indeed equal to
and/or more powerful than the complicated procedures
based on two marker models.

Materials
Statistical Framework
As the framework for our comparison, and in conjunction
with the previous simulations and conclusions provided
by the work of Rebai et al. [4], we consider a backcross
experimental design originating from a cross of two
homozygous inbred lines, differing in the trait of interest,
and producing heterozygous lines that are backcrossed to
one of the initial homozygous parental lines. We examine
both normal and binomial phenotypic distributions. In
general, we denote each marker as $M_1, \ldots, M_k$, where $k$ is the
number of markers being examined, and allow each
marker to have two alleles, $M_{11}, M_{12}, \ldots, M_{k1}, M_{k2}$. The $2^k$
phenotypic means are differentiated via subscripts (e.g.,
$\mu_{M_{11} \ldots M_{k1}/M_{11} \ldots M_{k1}}$ or $\mu_{M_{11} \ldots M_{k1}/M_{12} \ldots M_{k2}}$) and the frequencies
of these classes are denoted as $p_{11}, p_{21}, \ldots, p_{k1}$ under the binomial
scenario (i.e., $\mu_{M_{11}/M_{11}} = np_{11}$).






Single Marker Model and Hypotheses
A simple linear regression backcross model is employed
for single marker QTL detection

$$Y_j = \beta_0 + \beta X_j + \varepsilon_j; \quad j = 1, \ldots, n \qquad (1)$$

where $Y_j$ is the quantitative trait value, $X_j$ is an indicator
variable that denotes the state of a particular marker, $\beta_0$ is
the overall mean, and $\beta$ is the effect of an allelic substitution
at the marker. Ideally, if the marker and QTL are
completely linked, the effect of an allelic substitution is
the effect of the QTL. If $k$ markers are considered
independently, $k$ linear regression models can be considered
(i.e., one for each marker, $M_1, M_2, \ldots, M_k$) by denoting
the allelic substitution associated with marker $M_i$ as $\beta = \beta_i$,
for $i = 1, \ldots, k$. For $k = 2$ markers, we denote the allelic
substitution associated with marker $M_1$ as $\beta = \beta_1$, where
$\beta_0 = \mu_{M_{11}/M_{11}}$ and $\beta_1 = \mu_{M_{11}/M_{12}} - \mu_{M_{11}/M_{11}}$; and the allelic
substitution associated with marker $M_2$ as $\beta = \beta_2$, where
$\beta_0 = \mu_{M_{21}/M_{21}}$ and $\beta_2 = \mu_{M_{21}/M_{22}} - \mu_{M_{21}/M_{21}}$.
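
The role of linkage in the single marker model can be made explicit with a standard backcross result, given here as a sketch. It assumes a single QTL whose substitution effect is written $\delta$ and a recombination fraction $r$ between the marker and the QTL; both symbols are introduced only for this illustration.

```latex
\[
E(\hat{\beta}_i) = \mu_{M_{i1}/M_{i2}} - \mu_{M_{i1}/M_{i1}} = (1 - 2r)\,\delta,
\qquad 0 \le r \le 0.5
\]
```

With $r = 0$ the marker contrast equals the full QTL effect, and with $r = 0.5$ the marker carries no information about the QTL, which is why a marker lying close to the QTL dominates the outcome of the single marker tests.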

A compound hypothesis testing the effect of an allelic sub-
stitution at either or both of these two independent mark-
ers is,

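$$H_0: \beta_1 = \mu_{M_{11}/M_{12}} - \mu_{M_{11}/M_{11}} = 0 \text{ and } \beta_2 = \mu_{M_{21}/M_{22}} - \mu_{M_{21}/M_{21}} = 0$$
$$H_a: \beta_1 = \mu_{M_{11}/M_{12}} - \mu_{M_{11}/M_{11}} \neq 0 \text{ and/or } \beta_2 = \mu_{M_{21}/M_{22}} - \mu_{M_{21}/M_{21}} \neq 0.$$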

Rejection of this compound null hypothesis indicates an
association between a QTL and either or both of the mark-
ers, M1 and M2, hence the term intersection test. From a
statistical perspective the relative position of the two
markers is irrelevant. However, to compare this to a two
marker model there is an implicit assumption that the
markers considered form an interval, or are adjacent to
one another. This marks a departure from the traditional
single marker analysis where no consideration to marker
order is given. To define an overall level α test, the significance
level α must be adjusted for the individual tests to
account for multiple testing. There are many ways to
account for multiple testing. Assuming the markers are
independent, the Bonferroni correction can be applied
[9]. The Bonferroni correction is conservative for the intersection
test and the lack of independence between markers
would tend to make it more difficult for the
intersection test to reject.

More generally, for k markers, the compound hypothesis
testing the effect of an allelic substitution at any of the
independent markers, $M_1, \ldots, M_k$, is


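$$H_0: \beta_1 = \mu_{M_{11}/M_{12}} - \mu_{M_{11}/M_{11}} = 0 \text{ and}$$
$$\beta_2 = \mu_{M_{21}/M_{22}} - \mu_{M_{21}/M_{21}} = 0 \text{ and}$$
$$\vdots$$
$$\beta_k = \mu_{M_{k1}/M_{k2}} - \mu_{M_{k1}/M_{k1}} = 0$$

$$H_a: \beta_1 = \mu_{M_{11}/M_{12}} - \mu_{M_{11}/M_{11}} \neq 0 \text{ and/or}$$
$$\beta_2 = \mu_{M_{21}/M_{22}} - \mu_{M_{21}/M_{21}} \neq 0 \text{ and/or}$$
$$\vdots$$
$$\beta_k = \mu_{M_{k1}/M_{k2}} - \mu_{M_{k1}/M_{k1}} \neq 0.$$
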
Rejection of this compound null hypothesis indicates an
association between a QTL and at least one of the markers,
$M_1, \ldots, M_k$. To define an overall level α test, using a Bonferroni
correction [9], each $\beta_i$ is tested at an adjusted significance
level of $\alpha/k$. An association between a QTL and a
marker is then indicated when the individual single
marker test rejects the null at the adjusted α level.

The practical result of the application of an intersection
test, is the simplicity of calculation of the single marker
test statistic, with a correction for multiple testing.
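
The decision rule described above can be sketched as follows. The function name `intersection_test` is hypothetical, and the p-values in the usage lines are the single marker p-values for the pairs 3E-4F and 63A-65A as read from the extracted Table 4, so they should be treated as illustrative.

```python
def intersection_test(p_values, alpha=0.05):
    """Bonferroni-corrected intersection test over k single marker tests.

    Rejects the compound null hypothesis if any single marker p-value
    is at or below alpha / k.
    """
    k = len(p_values)
    threshold = alpha / k
    rejected = [p <= threshold for p in p_values]
    return any(rejected), threshold, rejected


# Pair 3E-4F: marker 4F is borderline (p = 0.0256) but above 0.025
print(intersection_test([0.14, 0.0256]))   # (False, 0.025, [False, False])
# Pair 63A-65A: marker 65A (p = 0.0249) falls just below 0.025
print(intersection_test([0.227, 0.0249]))  # (True, 0.025, [False, True])
```

Returning the per-marker decisions alongside the overall result shows which marker is responsible for a rejection, which is the information used when regions are compared between methods.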

Two Marker Regression Model and Test of the
Corresponding Interval
Extending the backcross notation defined previously, a
multiple linear regression model (based on two markers)
can be employed for QTL detection purposes. The model
is defined as

$$Y_j = \beta_0 + \beta_1 X_{1j} + \beta_2 X_{2j} + \beta_3 X_{3j} + \varepsilon_j; \quad j = 1, \ldots, n$$

where $X_{1j}$ and $X_{2j}$ are the genotypic states of the respective
markers $M_1$ and $M_2$, along with their respective allelic substitution
effects ($\beta_1$, $\beta_2$), and $X_{3j}$ is the combined genotypic
states of markers $M_1$ and $M_2$ with allelic substitutions at
both markers $M_1$ and $M_2$ having effect $\beta_3$. It is interesting to
note that, when one is selectively genotyping, the information
in $\beta_3$ is maximized.

In other words,

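$$\beta_0 = \mu_{M_{11}M_{21}/M_{11}M_{21}}$$
$$\beta_1 = \mu_{M_{11}M_{21}/M_{12}M_{21}} - \mu_{M_{11}M_{21}/M_{11}M_{21}}$$
$$\beta_2 = \mu_{M_{11}M_{21}/M_{11}M_{22}} - \mu_{M_{11}M_{21}/M_{11}M_{21}}$$
$$\beta_3 = \mu_{M_{11}M_{21}/M_{12}M_{22}} - \mu_{M_{11}M_{21}/M_{11}M_{21}}.$$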
Based upon this two marker model with four parameters,
the hypothesis employed to perform a level a test for





association between a trait and the marker loci M1 and M2
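is the test of $\beta_3$ where,

$$H_0: \beta_3 = \mu_{M_{11}M_{21}/M_{12}M_{22}} - \mu_{M_{11}M_{21}/M_{11}M_{21}} = 0$$
$$H_a: \beta_3 = \mu_{M_{11}M_{21}/M_{12}M_{22}} - \mu_{M_{11}M_{21}/M_{11}M_{21}} \neq 0.$$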
The null hypothesis for this test is that there is no associa-
tion between either marker (M1 or M2) and the trait. A
similar set of hypotheses follow for an F2 experimental
design.
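
A minimal sketch of the two marker backcross fit is given below, assuming the parameterization stated above, in which each coefficient is the difference between one two-marker genotype class mean and the mean of the doubly homozygous class. Under that assumption the model is saturated, so the least squares estimates are simply differences of class means. The function name and the small data set are hypothetical.

```python
from math import sqrt
from statistics import mean


def two_marker_fit(y, m1, m2):
    """Least squares fit of Y = b0 + b1*X1 + b2*X2 + b3*X3 + e for a backcross.

    m1, m2 are 0/1 marker codes (0 = M_i1/M_i1, 1 = M_i1/M_i2). X1, X2, X3
    indicate the genotype classes (1,0), (0,1), (1,1). Returns the
    coefficient estimates, the t statistic for b3, and its degrees of freedom.
    """
    groups = {(a, b): [] for a in (0, 1) for b in (0, 1)}
    for yi, a, b in zip(y, m1, m2):
        groups[(a, b)].append(yi)
    # All four classes must be observed, otherwise the design is singular
    means = {key: mean(values) for key, values in groups.items()}
    base = means[(0, 0)]
    beta = [base,
            means[(1, 0)] - base,
            means[(0, 1)] - base,
            means[(1, 1)] - base]
    n = len(y)
    # Pooled residual variance on n - 4 degrees of freedom
    sse = sum((yi - means[key]) ** 2
              for key, values in groups.items() for yi in values)
    s2 = sse / (n - 4)
    se3 = sqrt(s2 * (1 / len(groups[(1, 1)]) + 1 / len(groups[(0, 0)])))
    return beta, beta[3] / se3, n - 4


# Hypothetical data: two individuals in each genotype class
y = [1.0, 2.0, 2.0, 3.0, 2.0, 4.0, 4.0, 5.0]
m1 = [0, 0, 1, 1, 0, 0, 1, 1]
m2 = [0, 0, 0, 0, 1, 1, 1, 1]
print(two_marker_fit(y, m1, m2))
```

The sketch raises an error when a genotype class is empty, which can happen with tightly linked markers or small samples because the design matrix is then singular.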

This model parameterization differs from the least squares
interval mapping approach first introduced by Knott and
Haley [2]. In the parameterization proposed here, only
one test is performed for the pair of markers. In contrast,
the regression based interval mapping approach [2]
recalculates the value of the independent variables for
each putative position in the interval. Our two marker
regression has a different parameterization from Knott
and Haley [2]. We chose the alternate parameterization in
order to directly compare the two marker model and the
single marker model. In the Knott and Haley [2] parameterization,
flanking markers are used to define the coefficients
of the regression as mean, additive, or dominance
effects. For s steps along the interval between two markers
M1 and M2, values of X are calculated according to the
conditional probability of a QTL in that location.
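
To make the position-by-position recalculation concrete, the sketch below computes, for a backcross, the conditional probability that a putative QTL is heterozygous given the two flanking marker genotypes. The paper does not state these formulas. They are the standard backcross expressions under the assumption of no interference, with the Haldane map function used to convert map distance to recombination fraction. Function names and the 20 cM interval are hypothetical.

```python
from math import exp


def haldane(d_morgans):
    """Recombination fraction for a map distance in Morgans (Haldane)."""
    return 0.5 * (1.0 - exp(-2.0 * d_morgans))


def qtl_het_prob(d_total, d_left):
    """P(putative QTL heterozygous | flanking marker genotypes), backcross.

    Keys are (m1, m2) with 0 = homozygous and 1 = heterozygous.
    d_left is the distance from M1 to the putative position.
    """
    if d_total <= 0 or not 0 <= d_left <= d_total:
        raise ValueError("need d_total > 0 and 0 <= d_left <= d_total")
    r1 = haldane(d_left)
    r2 = haldane(d_total - d_left)
    r = r1 + r2 - 2.0 * r1 * r2          # M1-M2 recombination fraction
    return {
        (1, 1): (1 - r1) * (1 - r2) / (1 - r),
        (1, 0): (1 - r1) * r2 / r,
        (0, 1): r1 * (1 - r2) / r,
        (0, 0): r1 * r2 / (1 - r),
    }


def scan_positions(d_total, steps):
    """Regressor values at steps + 1 evenly spaced positions from M1 to M2."""
    return [qtl_het_prob(d_total, min(d_total, d_total * k / steps))
            for k in range(steps + 1)]


grid = scan_positions(d_total=0.20, steps=4)   # hypothetical 20 cM interval
for k, probs in enumerate(grid):
    print(k, {g: round(p, 3) for g, p in probs.items()})
```

At the first position the regressor equals the M1 genotype and at the last it equals the M2 genotype, so the scan interpolates between the two single marker regressions.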

The regression based interval mapping parameterization
thus provides a mechanism to test for additive and dominance
effects using tests of the regression parameters. In
our parameterization, the tests of the regression coefficients
are tests for detection. Thus, the two parameterizations have
different null hypotheses for the tests of the regression
coefficients and are not directly comparable in terms of power.
We use the alternative parameterization so that the inter-
pretation of the tests is comparable in the single marker
and two marker regression models and we can directly
compare the power of the two tests.

Simulations
Data were simulated for two marker backcross and F2 pop-
ulations with binomial trait distributions and two marker
backcross populations with normal trait distributions. A
total of 339 parameter combinations were examined
(Table 1). For each combination of parameters, 1000 data
sets were simulated. Traits were simulated from a binomial
distribution Bin(n, p), where sample sizes n = 100 and
n = 500 were utilized, and from a normal distribution
$N\left(\frac{\mu_1 + \mu_2}{2},\, 1.0\right)$ with n = 500. The effect of the binary
trait [13] varied based on $\mu = np$ (Table 1). The binomial
probabilities p1, p2, and p3 represent the probability that a
binary trait is present given a specific BTL genotype (GT),
or the penetrance of the trait for the specific genotypes Q1/Q1,
Q1/Q2, and Q2/Q2, respectively. The location of the
locus relative to marker loci M1 and M2 also varied. Similarly,
the effect under the normally distributed phenotype
was allowed to vary (Table 2) under seventy-five parameter
combinations. The effect size is the difference between the
penetrances (for binary traits) or between the means (for
normally distributed traits). For each phenotypic trait
distribution and each parameter combination (Tables 1 and
2), we analyzed, via least squares, 1000 simulated data sets
using both the single marker regression model and the
two marker regression model.
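
One way such a data set could be generated is sketched below for the backcross, binary trait case. This is an illustration under stated assumptions rather than the authors' simulation program: the locus is placed between M1 and M2, recombination in the two sub-intervals is independent (no interference), and only two locus genotypes occur, so two penetrances are needed. All parameter values shown are hypothetical and are not taken from Table 1.

```python
import random


def simulate_backcross(n, r1, r2, pen_hom, pen_het, rng):
    """Simulate n backcross progeny with loci ordered M1 - BTL - M2.

    r1, r2: recombination fractions M1-BTL and BTL-M2 (no interference).
    pen_hom, pen_het: penetrances for the homozygous and heterozygous BTL
    genotypes, the only two classes that occur in a backcross.
    Genotypes are coded 0 = homozygous, 1 = heterozygous; the trait is 0/1.
    """
    m1, m2, trait = [], [], []
    for _ in range(n):
        a = rng.randint(0, 1)                 # genotype at M1
        q = a ^ int(rng.random() < r1)        # flips on recombination M1-BTL
        b = q ^ int(rng.random() < r2)        # flips on recombination BTL-M2
        penetrance = pen_het if q == 1 else pen_hom
        m1.append(a)
        m2.append(b)
        trait.append(1 if rng.random() < penetrance else 0)
    return m1, m2, trait


rng = random.Random(2003)                     # arbitrary seed, reproducible runs
m1, m2, trait = simulate_backcross(n=100, r1=0.05, r2=0.10,
                                   pen_hom=0.2, pen_het=0.6, rng=rng)
print(sum(trait), "affected of", len(trait))
```

Repeating the call 1000 times per parameter combination, and analyzing each data set with both regression models, would mirror the design described above.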

For the intersection test, the null hypothesis was rejected
when the empirical p-value for either single marker regression
test statistic was less than $\alpha/k = 0.025$ (Bonferroni
adjustment). For the comparable two marker test (i.e., $\beta_3 = 0$),
the null hypothesis was rejected when the empirical
p-value was less than $\alpha = 0.05$. Under each parameter
combination, the cumulative assessment of statistical
power was evaluated from the 1000 simulated data sets as
the proportion of times the empirical (permutation) p-values
were less than the specified $\alpha$ level.
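
The decision rule and the power estimate can be sketched as below. The code is a hedged illustration: it assumes a 0/1 marker coding, uses the fact that the slope test in a single marker regression on a 0/1 regressor equals a pooled two sample t test, and counts the observed data as one permutation when forming the empirical p-value. The paper does not specify that last detail, and all function names are hypothetical.

```python
import random
from math import sqrt


def single_marker_t(y, x):
    """t statistic for the slope in Y = b0 + b1*X with a 0/1 marker X."""
    g0 = [yj for yj, xj in zip(y, x) if xj == 0]
    g1 = [yj for yj, xj in zip(y, x) if xj == 1]
    mean0, mean1 = sum(g0) / len(g0), sum(g1) / len(g1)
    rss = sum((v - mean0) ** 2 for v in g0) + sum((v - mean1) ** 2 for v in g1)
    se = sqrt(rss / (len(y) - 2) * (1 / len(g0) + 1 / len(g1)))
    if se == 0:
        return 0.0 if mean1 == mean0 else float("inf")
    return (mean1 - mean0) / se


def permutation_pvalue(y, x, n_perm, rng):
    """Empirical two sided p-value from shuffling the trait values."""
    observed = abs(single_marker_t(y, x))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(single_marker_t(shuffled, x)) >= observed:
            hits += 1
    # The +1 terms count the observed data as one permutation (an assumption).
    return (hits + 1) / (n_perm + 1)


def intersection_reject(p1, p2, alpha=0.05, k=2):
    """Reject when either single marker p-value is below alpha / k."""
    return min(p1, p2) < alpha / k


def power(rejections):
    """Proportion of simulated data sets in which the null was rejected."""
    return sum(rejections) / len(rejections)


# Toy data with a large marker effect, invented for illustration only.
x = [0] * 10 + [1] * 10
y = [0.1 * i for i in range(10)] + [5 + 0.1 * i for i in range(10)]
p = permutation_pvalue(y, x, n_perm=199, rng=random.Random(0))
print(p, intersection_reject(p, 1.0))
```

With k = 2 markers the per-test threshold is 0.05 / 2 = 0.025, so a pair of p-values such as 0.03 and 0.04 does not lead to rejection even though each is below 0.05.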

Drosophila Analysis
The population of Drosophila melanogaster used in our
analysis was a set of 98 RILs (recombinant inbred lines)
derived from a cross of two isogenic lines as described in
Wayne et al. [8], for the trait ovariole number. There were
76 informative markers on 4 chromosomes. Markers used
were the cytological map positions of the insertion sites of
roo transposable element markers, with the exception of
the fourth chromosome, where a visible mutation was
used as a marker (spa) [12]. A complete linkage map was
obtained for chromosome 1 (the X) and chromosome 3,
with 15 adjacent marker pairs (16 markers) on 1 and 36
adjacent marker pairs (37 markers) on 3. There was a cen-
tromeric break in the genetic map for chromosome 2,
such that there were 18 adjacent pairs (19 markers) on the
left arm and 2 adjacent pairs (3 markers) on the right arm.

To compare the intersection test to the two marker test,
the 71 pairs of markers identified above were examined.
For each pair, the two marker regression with the test of
the $\beta_3$ parameter was conducted at $\alpha$ = 0.05. The two individual
markers were then separately modeled in a linear
regression model (see Equation 1), and the intersection
test was conducted. For the 71 unique pairs of markers,
concordance between the intersection test and two marker
test was estimated using the Kappa coefficient, and McNe-
mar's test [14] was conducted to determine whether sys-
tematic differences existed between the two methods.
Regression based interval mapping was performed
according to the Haley and Knott parameterization [1,2].
Analysis was conducted using S-PLUS 2000 (Insightful
Corp.).
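
The concordance measures named above can be computed from the paired reject / fail-to-reject calls of the two tests, as in the sketch below. It is an illustration only: the chi-square form of McNemar's test without continuity correction is assumed, since the paper does not say which variant was used, and the function names and counts are hypothetical, not results from the Drosophila data.

```python
from math import erfc, sqrt


def paired_table(test_a, test_b):
    """2 x 2 counts for paired reject (True) / fail to reject (False) calls."""
    both = sum(a and b for a, b in zip(test_a, test_b))
    only_a = sum(a and not b for a, b in zip(test_a, test_b))
    only_b = sum(b and not a for a, b in zip(test_a, test_b))
    neither = sum(not a and not b for a, b in zip(test_a, test_b))
    return both, only_a, only_b, neither


def kappa(both, only_a, only_b, neither):
    """Cohen's kappa: agreement beyond that expected by chance."""
    n = both + only_a + only_b + neither
    observed = (both + neither) / n
    expected = ((both + only_a) * (both + only_b)
                + (only_b + neither) * (only_a + neither)) / n ** 2
    if expected == 1:
        return float("nan")               # undefined when only one call occurs
    return (observed - expected) / (1 - expected)


def mcnemar(only_a, only_b):
    """McNemar chi-square (1 df, no continuity correction) and its p-value."""
    if only_a + only_b == 0:
        return 0.0, 1.0
    stat = (only_a - only_b) ** 2 / (only_a + only_b)
    return stat, erfc(sqrt(stat / 2))     # upper tail of chi-square, 1 df


# Hypothetical discordant counts, for illustration only.
stat, p_value = mcnemar(only_a=5, only_b=10)
print(round(stat, 3), round(p_value, 3))
```

Kappa summarizes how often the two tests agree, while McNemar's test uses only the discordant pairs to ask whether one test rejects systematically more often than the other.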

Authors' Contributions
Cynthia Coffman is the post-doctoral associate who pro-
grammed all simulations and analyzed the Drosophila
data. Rebecca Doerge and Lauren McIntyre designed the
simulation study and assisted with the interpretation of
the results. Marta Wayne provided the Drosophila data,
and assisted in the interpretation of the results. All authors
contributed to the writing of this manuscript.

Acknowledgments
This work is supported by the Purdue University Experimental Research
Station; a National Science Foundation Grant (DBI 98-08026/00-96044) to
LMM, CJC, and RWD; a National Institutes of Health grant (NIA-AG 16996)
to LMM; a United States Department of Agriculture Grant (98-35300-
6173) to RWD; a National Institutes of Health grant (GM59884-02) to
MLW, and a Veterans Affairs Health Services Research Postdoctoral Fel-
lowship to CJC.

References
1. Haley C and Knott S: A simple regression method for mapping
quantitative trait loci in line crosses using flanking markers
Heredity 1992, 69:315-324.
2. Knott S and Haley C: Aspects of maximum likelihood methods
for mapping of quantitative trait loci in line crosses Genetical
Research 1992, 60:139-152.
3. Kao CH: On the differences between maximum likelihood
and regression interval mapping in the analysis of quantitative
trait loci Genetics 2000, 156(2):855-865.
4. Rebaï A, Goffinet B and Mangin B: Comparing powers of different
methods for QTL detection Biometrics 1995, 51:87-99.
5. Doerge RW, Zeng Z-B and Weir BS: Statistical issues in the
search for genes affecting quantitative traits in experimental
populations Statistical Science 1997, 13:195-219.
6. Lander ES and Botstein D: Mapping Mendelian factors underlying
quantitative traits using RFLP linkage maps Genetics 1989,
121:185-199.
7. Zeng Z-B: Precision mapping of quantitative trait loci Genetics
1994, 136:1457-1468.
8. Wayne ML, Hackett JB, Dilda CL, Nuzhdin SV and Pasyukova EG:
Quantitative trait locus mapping of fitness-related traits in
Drosophila melanogaster Genetical Research 2001, 77:107-116.
9. Steele R and Torrie J: Principles and procedures of statistics: a biometrical
approach 3rd edition. McGraw-Hill; 1997.
10. Bouletreau-Merle J, Allemand JR, Cohet Y and David JR: Reproductive
strategy in Drosophila melanogaster: Significance of a
genetic divergence between temperate and tropical
populations Oecologia 2001, 53:323-329.
11. Cohet Y and David J: Control of the adult reproductive potential
by preimaginal thermal conditions Oecologia 1978, 36:295-306.
12. Nuzhdin SV, Pasyukova EG, Dilda CL, Zeng ZB and Mackay TFC:
Sex-specific quantitative trait loci affecting longevity in
Drosophila melanogaster Proceedings of the National Academy of
Sciences 1997, 94:9734-9739.
13. McIntyre LM, Coffman CJ and Doerge RW: Detection and localization
of a single binary trait locus in experimental
populations Genetical Research 2001, 78:79-92.
14. Agresti A: Categorical Data Analysis John Wiley and Sons, New York, NY;
1990.




BMC Genetics 2003, 4



