BMC Genetics
BioMed Central
Methodology article
A multilocus likelihood approach to joint modeling of linkage,
parental diplotype and gene order in a full-sib family
Qing Lu1, Yuehua Cui1 and Rongling Wu*1,2
Address: 1Department of Statistics, University of Florida, Gainesville, Florida 32611 USA and 2College of Life Sciences, Zhejiang Forestry
University, Lin'an, Zhejiang 311300, People's Republic of China
Email: Qing Lu - qlu@darwin.epbi.cwru.edu; Yuehua Cui - ycui@stat.ufl.edu; Rongling Wu* - Rwu@mail.ifas.ufl.edu
* Corresponding author
Published: 26 July 2004
BMC Genetics 2004, 5:20 doi:10.1186/1471-2156-5-20
Received: 10 March 2004
Accepted: 26 July 2004
This article is available from: http://www.biomedcentral.com/1471-2156/5/20
© 2004 Lu et al; licensee BioMed Central Ltd. This is an open-access article distributed under the terms of the Creative Commons Attribution License
(http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is
properly cited.
Abstract
Background: Unlike a pedigree initiated with two inbred lines, a full-sib family derived from two outbred parents frequently has many different segregation types of markers whose linkage phases are not known prior to linkage analysis.
Results: We formulate a general model for simultaneously estimating linkage, parental diplotype and gene order through multi-point analysis in a full-sib family. Our model is based on a multinomial mixture model that takes into account different diplotypes and gene orders, weighted by their corresponding probabilities of occurrence. The EM algorithm is implemented to provide the maximum likelihood estimates of the linkage, parental diplotype and gene order over any type of markers.
Conclusions: Through simulation studies, this model is found to be more computationally efficient than existing models for linkage mapping. We discuss the extension of the model and its implications for genome mapping in outcrossing species.
Background
The construction of genetic linkage maps based on molecular markers has become a routine tool for comparative studies of genome structure and organization and the identification of loci affecting complex traits in different organisms [1]. Statistical methods for linkage analysis and map construction have been well developed in inbred line crosses [2] and implemented in the computer packages MAPMAKER [3], CRIMAP [4], JOINMAP [5] and MULTIMAP [6]. Increasing efforts have been made to develop robust tools for analyzing marker data in outcrossing organisms [7-12], in which inbred lines are not available due to the heterozygous nature of these organisms and/or long generation intervals.
Genetic analyses and statistical methods in outcrossing species are far more complicated than in species that can be selfed to produce inbred lines. There are two reasons for this. First, the number of marker alleles and the segregation pattern of marker genotypes may vary from locus to locus in outcrossing species, whereas an inbred line-initiated segregating population, such as an F2 or backcross, always has two alleles and a consistent segregation ratio across different markers. Second, linkage phases among different markers are not known a priori for outbred parents and, therefore, an algorithm should be developed to characterize a most likely linkage phase for linkage analysis.
To overcome these problems of linkage analysis in outcrossing species, Grattapaglia and Sederoff [13] proposed a two-way pseudo-testcross mapping strategy in which one parent is heterozygous whereas the other is null for all markers. Using this strategy, two parent-specific linkage maps are constructed. The limitation of the pseudo-testcross strategy is that it can only make use of a portion of molecular markers. Ritter et al. [7] and Ritter and Salamini [9] proposed statistical methods for estimating the recombination fractions between different segregation types of markers. Using both analytical and simulation approaches, Maliepaard et al. [10] discussed the power and precision of the estimation of the pairwise recombination fractions between markers. Wu et al. [11] formulated a multilocus likelihood approach to simultaneously estimate the linkage and linkage phases of the crossed parents over multiple markers. Ling [14] proposed a three-step analytical procedure for linkage analysis in outcrossing populations, which includes (1) determining the parental haplotypes for all of the markers in a linkage group, (2) estimating the recombination fractions, and (3) choosing a most likely marker order based on optimization analysis. This procedure was used to analyze segregating data in an outcrossing forest tree [15]. Currently, none of these models for linkage analysis in outcrossing species can provide a one-step analysis for the linkage, parental linkage phase and marker order from segregating marker data.
In this article, we construct a unifying likelihood analysis to simultaneously estimate linkage, linkage phases and gene order for a group of markers that display all possible segregation patterns in a full-sib family derived from two outbred parents (see Table 1 of Wu et al. [11]). Our idea here is to integrate all possible linkage phases between a pair of markers in the two parents, each specified by a phase probability, into the framework of a mixture statistical model. In characterizing a most likely linkage phase (or parental diplotype) based on the phase probabilities, the recombination fractions are also estimated using a likelihood approach. This integrative idea is extended to consider gene orders in a multilocus analysis, in which the probabilities of all possible gene orders are estimated and a most likely order is chosen, along with the estimation of the linkage and parental diplotype. We perform extensive simulation studies to investigate the robustness, power and precision of our statistical mapping method incorporating linkage, parental diplotype and gene orders. An example from the published literature is used to validate the application of our method to linkage analysis in outcrossing species.
Table 1: Estimation from two-point analysis of the recombination fraction ($\hat{r}$ ± SD) and the parental diplotype probabilities of parent P ($\hat{p}$) and Q ($\hat{q}$) for five markers in a full-sib family of n = 100

Marker    P × Q^a      r = 0.05                              r = 0.20
                       r ± SD            p        q          r ± SD            p        q
M1        a|b × c|d
  M1-M2                0.530 ± 0.0183                        0.2097 ± 0.0328
M2        a|b × a|b                      0.9960   0.9972                       0.9882   0.9878
  M2-M3                0.0464 ± 0.0303                       0.2103 ± 0.0848
M3        a|o × o|a                      1 (0^b)  0 (1^b)                      1 (0^b)  0 (1^b)
  M3-M4                0.0463 ± 0.0371                       0.1952 ± 0.0777
M4        a|b × b|b                      1        1/0^c                        1        1/0^c
  M4-M5                0.0503 ± 0.0231                       0.2002 ± 0.0414
M5        a|b × c|d                      1        1/0^c                        1        1/0^c

^a Shown is the parental diplotype of each parent for the five markers hypothesized, where the vertical lines denote the two homologous chromosomes.
^b The values in parentheses present a second possible solution. For any two symmetrical markers (2 and 3), $\hat{p} = 1, \hat{q} = 0$ and $\hat{p} = 0, \hat{q} = 1$ give an identical likelihood ratio test statistic (Wu et al. 2002a). Thus, when the two parents have different diplotypes for symmetrical markers, their parental diplotypes cannot be correctly determined from two-point analysis.
^c The parental diplotype of parent Q cannot be estimated in these two cases because marker 4 is homozygous in this parent.
The MLE of r is given between the two markers under comparison, whereas the MLEs of p and q are given at the second marker.
Two-locus analysis
A general framework
Suppose there is a full-sib family of size n derived from two outcrossed parents P and Q. The two sets of chromosomes are coded as 1 and 2 for parent P and as 3 and 4 for parent Q. Consider two marker loci M1 and M2, whose genotypes are denoted as 12/12 and 34/34 for parents P and Q, respectively, where we use / to separate the two markers. When the two parents are crossed, we have four different progeny genotypes at each marker, i.e., 13, 14, 23 and 24, in the full-sib family. Let r be the recombination fraction between the two markers.
In general, the genotypes of the two markers for the two parents can be observed in a molecular experiment, but the allelic arrangement of the two markers on the two homologous chromosomes of each parent (i.e., linkage phase) is not known. In the current genetic literature, a linear arrangement of non-alleles from different markers on the same chromosomal region is called the haplotype. The observable two-marker genotype of parent P is 12/12, but it may be derived from one of two possible combinations of maternally and paternally derived haplotypes, i.e., [11][22] or [12][21], where we use [ ] to define a haplotype. The combination of two haplotypes is called the diplotype. Diplotype [11][22] (denoted by $1$) is generated by the combination of two-marker haplotypes [11] and [22], whereas diplotype [12][21] (denoted by $\bar{1}$) is generated by the combination of two-marker haplotypes [12] and [21]. If the probability of forming diplotype [11][22] is p, then the probability of forming diplotype [12][21] is 1 − p. The genotype of parent Q and its possible diplotypes [33][44] and [34][43] can be defined analogously; the formation probabilities of the two diplotypes are q and 1 − q, respectively.
The cross of the two parents should be one and only one of four possible parental diplotype combinations, i.e., [11][22] × [33][44], [11][22] × [34][43], [12][21] × [33][44] and [12][21] × [34][43], expressed as $11$, $1\bar{1}$, $\bar{1}1$ and $\bar{1}\bar{1}$, with probabilities pq, p(1 − q), (1 − p)q and (1 − p)(1 − q), respectively. The estimation of the recombination fraction in the full-sib family should be based on a correct diplotype combination [10]. Each of the four combinations generates 16 two-marker progeny genotypes, whose frequencies are expressed in a 4 × 4 matrix,
$$H_{11} = \frac{1}{4}\begin{pmatrix} (1-r)^2 & r(1-r) & r(1-r) & r^2 \\ r(1-r) & (1-r)^2 & r^2 & r(1-r) \\ r(1-r) & r^2 & (1-r)^2 & r(1-r) \\ r^2 & r(1-r) & r(1-r) & (1-r)^2 \end{pmatrix}$$
for [11][22] × [33][44],
$$H_{1\bar{1}} = \frac{1}{4}\begin{pmatrix} r(1-r) & (1-r)^2 & r^2 & r(1-r) \\ (1-r)^2 & r(1-r) & r(1-r) & r^2 \\ r^2 & r(1-r) & r(1-r) & (1-r)^2 \\ r(1-r) & r^2 & (1-r)^2 & r(1-r) \end{pmatrix}$$
for [11][22] × [34][43],
$$H_{\bar{1}1} = \frac{1}{4}\begin{pmatrix} r(1-r) & r^2 & (1-r)^2 & r(1-r) \\ r^2 & r(1-r) & r(1-r) & (1-r)^2 \\ (1-r)^2 & r(1-r) & r(1-r) & r^2 \\ r(1-r) & (1-r)^2 & r^2 & r(1-r) \end{pmatrix}$$
for [12][21] × [33][44] and
$$H_{\bar{1}\bar{1}} = \frac{1}{4}\begin{pmatrix} r^2 & r(1-r) & r(1-r) & (1-r)^2 \\ r(1-r) & r^2 & (1-r)^2 & r(1-r) \\ r(1-r) & (1-r)^2 & r^2 & r(1-r) \\ (1-r)^2 & r(1-r) & r(1-r) & r^2 \end{pmatrix}$$
for [12][21] × [34][43]. Note that these matrices are expressed in terms of the combinations of the progeny genotypes for the two markers: rows and columns correspond to the progeny genotypes 13, 14, 23 and 24 at markers M1 and M2, respectively.
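The four genotype-frequency matrices can be generated mechanically from gamete probabilities. The following is a minimal sketch, assuming that each parent produces non-recombinant gametes with probability (1 − r)/2 each and recombinant gametes with probability r/2 each; the function names `gamete_probs` and `genotype_matrix` are hypothetical and are not part of the original method description.

```python
from itertools import product

GENOTYPES = ["13", "14", "23", "24"]  # progeny genotypes at one marker


def gamete_probs(alleles, first_diplotype, r):
    """Two-marker gamete probabilities for one parent.

    alleles: the parent's two allele labels, e.g. ("1", "2").
    first_diplotype: True for the [11][22]-type phase, False for [12][21]-type.
    """
    probs = {}
    for a1, a2 in product(alleles, repeat=2):
        # Under the first diplotype a non-recombinant gamete carries the same
        # allele label at both markers; under the second it carries different ones.
        parental = (a1 == a2) if first_diplotype else (a1 != a2)
        probs[(a1, a2)] = (1 - r) / 2 if parental else r / 2
    return probs


def genotype_matrix(p_first, q_first, r):
    """4 x 4 joint genotype frequencies; rows = marker 1, columns = marker 2."""
    gp = gamete_probs(("1", "2"), p_first, r)
    gq = gamete_probs(("3", "4"), q_first, r)
    return [[gp[(g1[0], g2[0])] * gq[(g1[1], g2[1])] for g2 in GENOTYPES]
            for g1 in GENOTYPES]


H11 = genotype_matrix(True, True, 0.1)        # [11][22] x [33][44]
H1b1b = genotype_matrix(False, False, 0.1)    # [12][21] x [34][43]
```

Each matrix sums to one, and switching the phase of a parent only permutes which cells receive (1 − r)², r(1 − r) or r².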
Let $n = (n_{j_1 j_2})_{4 \times 4}$ denote the matrix of progeny observations, where $j_1, j_2$ = 1 for 13, 2 for 14, 3 for 23, or 4 for 24 index the progeny genotypes at the two markers. Under each parental diplotype combination, $n_{j_1 j_2}$ follows a multinomial distribution. The likelihoods for the four diplotype combinations are expressed as
$$L_{11} \propto r^{2N_2+N_3+N_4}(1-r)^{2N_1+N_3+N_4},$$
$$L_{1\bar{1}} \propto r^{2N_4+N_1+N_2}(1-r)^{2N_3+N_1+N_2},$$
$$L_{\bar{1}1} \propto r^{2N_3+N_1+N_2}(1-r)^{2N_4+N_1+N_2},$$
$$L_{\bar{1}\bar{1}} \propto r^{2N_1+N_3+N_4}(1-r)^{2N_2+N_3+N_4},$$
where $N_1 = n_{11} + n_{22} + n_{33} + n_{44}$, $N_2 = n_{14} + n_{23} + n_{32} + n_{41}$, $N_3 = n_{12} + n_{21} + n_{34} + n_{43}$, and $N_4 = n_{13} + n_{31} + n_{24} + n_{42}$. It can be seen that the maximum likelihood estimate (MLE) of r ($\hat{r}$) under the first diplotype combination is equal to one minus $\hat{r}$ under the fourth combination, and the same relation holds between the second and third diplotype combinations. Although there are identical plug-in likelihood values between the first and fourth combinations as well as between the second and third combinations, one can still choose an appropriate $\hat{r}$ from these two pairs because one of them leads to $\hat{r}$ greater than 0.5. Traditional approaches for estimating the linkage and parental diplotypes are to estimate the recombination fractions and likelihood values under each of the four combinations and choose the legitimate estimate of r with the higher likelihood.
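The traditional procedure just described can be sketched as follows. The sketch uses the fact that a likelihood of the form r^A (1 − r)^B is maximized at A/(A + B); the count matrix in the usage lines is made-up illustrative data, and the function and key names are hypothetical.

```python
import math


def class_counts(n):
    """Collapse a 4 x 4 count matrix into N1..N4 (indices 0..3 = 13, 14, 23, 24)."""
    N1 = sum(n[i][i] for i in range(4))
    N2 = n[0][3] + n[1][2] + n[2][1] + n[3][0]
    N3 = n[0][1] + n[1][0] + n[2][3] + n[3][2]
    N4 = n[0][2] + n[2][0] + n[1][3] + n[3][1]
    return N1, N2, N3, N4


def estimates_by_combination(n):
    """MLE of r and maximized log-likelihood under each diplotype combination."""
    N1, N2, N3, N4 = class_counts(n)
    exponents = {  # (power of r, power of 1 - r)
        "11": (2 * N2 + N3 + N4, 2 * N1 + N3 + N4),
        "1-1bar": (2 * N4 + N1 + N2, 2 * N3 + N1 + N2),
        "1bar-1": (2 * N3 + N1 + N2, 2 * N4 + N1 + N2),
        "1bar-1bar": (2 * N1 + N3 + N4, 2 * N2 + N3 + N4),
    }
    out = {}
    for name, (a, b) in exponents.items():
        r_hat = a / (a + b)
        loglik = (a * math.log(r_hat) if a else 0.0) \
            + (b * math.log(1 - r_hat) if b else 0.0)
        out[name] = (r_hat, loglik)
    return out


def traditional_choice(n):
    """Pick the combination with the largest likelihood among those with r <= 0.5."""
    est = estimates_by_combination(n)
    legit = {k: v for k, v in est.items() if v[0] <= 0.5}
    return max(legit, key=lambda k: legit[k][1])


# Illustrative (made-up) counts for a family of 100 progeny.
counts = [[20, 2, 3, 0], [3, 21, 1, 2], [2, 0, 19, 3], [1, 2, 2, 19]]
est = estimates_by_combination(counts)
best = traditional_choice(counts)
```

In this example the first and fourth combinations give complementary estimates with the same maximized likelihood, so the constraint r ≤ 0.5 is what separates them.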
In this study, we incorporate the four parental diplotype combinations into the observed data likelihood, expressed as
$$L(\Theta \mid n) = pqL_{11} + p(1-q)L_{1\bar{1}} + (1-p)qL_{\bar{1}1} + (1-p)(1-q)L_{\bar{1}\bar{1}} \qquad (1)$$
where $\Theta = (r, p, q)$ is an unknown parameter vector, which can be estimated by differentiating the likelihood with respect to each unknown parameter, setting the derivatives equal to zero and solving the likelihood equations. This estimation procedure can be implemented with the EM algorithm [2,11,16]. Let H be a mixture matrix of the genotype frequencies under the four parental diplotype combinations weighted by the probabilities of occurrence of the diplotype combinations, expressed as
$$H = pqH_{11} + p(1-q)H_{1\bar{1}} + (1-p)qH_{\bar{1}1} + (1-p)(1-q)H_{\bar{1}\bar{1}} = \begin{pmatrix} a & b & c & d \\ b & a & d & c \\ c & d & a & b \\ d & c & b & a \end{pmatrix},$$
with rows and columns ordered as 13, 14, 23 and 24, where
$$a = \tfrac{1}{4}\left[pq(1-r)^2 + (p+q-2pq)r(1-r) + (1-p)(1-q)r^2\right],$$
$$b = \tfrac{1}{4}\left[p(1-q)(1-r)^2 + (1-p-q+2pq)r(1-r) + (1-p)qr^2\right],$$
$$c = \tfrac{1}{4}\left[(1-p)q(1-r)^2 + (1-p-q+2pq)r(1-r) + p(1-q)r^2\right],$$
$$d = \tfrac{1}{4}\left[(1-p)(1-q)(1-r)^2 + (p+q-2pq)r(1-r) + pqr^2\right].$$
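As a quick numerical check of the closed forms for a, b, c and d, the sketch below evaluates them directly and assembles the mixture matrix H. The function names are hypothetical; the formulas are the ones stated in the text.

```python
def mixture_entries(r, p, q):
    """Closed-form entries a, b, c, d of the mixture matrix H."""
    u, v, w = (1 - r) ** 2, r * (1 - r), r ** 2
    a = (p * q * u + (p + q - 2 * p * q) * v + (1 - p) * (1 - q) * w) / 4
    b = (p * (1 - q) * u + (1 - p - q + 2 * p * q) * v + (1 - p) * q * w) / 4
    c = ((1 - p) * q * u + (1 - p - q + 2 * p * q) * v + p * (1 - q) * w) / 4
    d = ((1 - p) * (1 - q) * u + (p + q - 2 * p * q) * v + p * q * w) / 4
    return a, b, c, d


def mixture_matrix(r, p, q):
    """Mixture matrix H with rows and columns ordered 13, 14, 23, 24."""
    a, b, c, d = mixture_entries(r, p, q)
    return [[a, b, c, d], [b, a, d, c], [c, d, a, b], [d, c, b, a]]


H = mixture_matrix(0.1, 0.7, 0.4)
```

Because every row of H contains a, b, c and d exactly once, a + b + c + d = 1/4 for any r, p and q, and setting p = q = 1 recovers the matrix for [11][22] × [33][44].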
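Similar to the expression of the genotype frequencies as a mixture of the four diplotype combinations, the expected number of recombination events contained within each two-marker progeny genotype is the mixture of the four different diplotype combinations, i.e.,
$$D = pqD_{11} + p(1-q)D_{1\bar{1}} + (1-p)qD_{\bar{1}1} + (1-p)(1-q)D_{\bar{1}\bar{1}} = \begin{pmatrix} 2-p-q & 1-p+q & 1+p-q & p+q \\ 1-p+q & 2-p-q & p+q & 1+p-q \\ 1+p-q & p+q & 2-p-q & 1-p+q \\ p+q & 1+p-q & 1-p+q & 2-p-q \end{pmatrix},$$
where the expected numbers of recombination events for each combination are expressed as
$$D_{11} = \begin{pmatrix} 0 & 1 & 1 & 2 \\ 1 & 0 & 2 & 1 \\ 1 & 2 & 0 & 1 \\ 2 & 1 & 1 & 0 \end{pmatrix}, \quad D_{1\bar{1}} = \begin{pmatrix} 1 & 0 & 2 & 1 \\ 0 & 1 & 1 & 2 \\ 2 & 1 & 1 & 0 \\ 1 & 2 & 0 & 1 \end{pmatrix}, \quad D_{\bar{1}1} = \begin{pmatrix} 1 & 2 & 0 & 1 \\ 2 & 1 & 1 & 0 \\ 0 & 1 & 1 & 2 \\ 1 & 0 & 2 & 1 \end{pmatrix}, \quad D_{\bar{1}\bar{1}} = \begin{pmatrix} 2 & 1 & 1 & 0 \\ 1 & 2 & 0 & 1 \\ 1 & 0 & 2 & 1 \\ 0 & 1 & 1 & 2 \end{pmatrix},$$
with rows and columns ordered as 13, 14, 23 and 24.
Define
$$P = pqH_{11} + p(1-q)H_{1\bar{1}},$$
$$Q = pqH_{11} + (1-p)qH_{\bar{1}1}.$$
The general procedure underlying the {τ + 1}th EM step is given as follows:
E Step: At step τ, using the matrix H based on the current estimate $r^{\{\tau\}}$, calculate the expected number of recombination events between the two markers for each progeny genotype, together with $p^{\{\tau+1\}}$ and $q^{\{\tau+1\}}$:
$$c_{j_1 j_2}^{\{\tau+1\}} = d_{j_1 j_2}^{\{\tau\}} n_{j_1 j_2}, \qquad (4)$$
$$p^{\{\tau+1\}} = \frac{1}{n}\sum_{j_1=1}^{4}\sum_{j_2=1}^{4} \frac{p_{j_1 j_2}^{\{\tau\}}}{h_{j_1 j_2}^{\{\tau\}}} n_{j_1 j_2}, \qquad (5)$$
$$q^{\{\tau+1\}} = \frac{1}{n}\sum_{j_1=1}^{4}\sum_{j_2=1}^{4} \frac{q_{j_1 j_2}^{\{\tau\}}}{h_{j_1 j_2}^{\{\tau\}}} n_{j_1 j_2}, \qquad (6)$$
where $d_{j_1 j_2}$, $h_{j_1 j_2}$, $p_{j_1 j_2}$ and $q_{j_1 j_2}$ are the $(j_1 j_2)$th elements of matrices D, H, P and Q, respectively.
M Step: Calculate $r^{\{\tau+1\}}$ using the equation,
$$r^{\{\tau+1\}} = \frac{1}{2n}\sum_{j_1=1}^{4}\sum_{j_2=1}^{4} c_{j_1 j_2}^{\{\tau+1\}}. \qquad (7)$$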
The E step and M step in Eqs. (4)-(7) are repeated until r converges to a value with satisfactory precision. The converged values are regarded as the MLEs of $\Theta$.
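One iteration of the E and M steps can be sketched as below. This is an illustrative reading of Eqs. (4)-(7), not the authors' implementation: it assumes that the crossover-count mixture D is weighted by the current p and q, that P and Q are the parts of H arising from diplotype 1 of parents P and Q, and that indices 0..3 stand for genotypes 13, 14, 23, 24. The names `components` and `em_step` are hypothetical.

```python
def components(r):
    """(H, D) pairs for combinations 11, 1-1bar, 1bar-1, 1bar-1bar at a given r."""
    comps = []
    for p_first in (True, False):
        for q_first in (True, False):
            H = [[0.0] * 4 for _ in range(4)]
            D = [[0] * 4 for _ in range(4)]
            for i in range(4):        # genotype at marker 1
                for j in range(4):    # genotype at marker 2
                    p_switch = (i // 2) != (j // 2)  # P allele differs between markers
                    q_switch = (i % 2) != (j % 2)    # Q allele differs between markers
                    # Under the first diplotype a switch is a recombination;
                    # under the second diplotype the absence of a switch is.
                    k = int(p_switch == p_first) + int(q_switch == q_first)
                    D[i][j] = k
                    H[i][j] = r ** k * (1 - r) ** (2 - k) / 4
            comps.append((H, D))
    return comps


def em_step(n, r, p, q):
    """One update (r, p, q) -> (r_new, p_new, q_new) for a 4 x 4 count matrix n."""
    comps = components(r)
    w = [p * q, p * (1 - q), (1 - p) * q, (1 - p) * (1 - q)]
    total = sum(map(sum, n))
    c_sum = p_sum = q_sum = 0.0
    for i in range(4):
        for j in range(4):
            if n[i][j] == 0:
                continue  # empty cells contribute nothing
            h = sum(wk * H[i][j] for wk, (H, _) in zip(w, comps))
            d = sum(wk * D[i][j] for wk, (_, D) in zip(w, comps))
            p_ij = w[0] * comps[0][0][i][j] + w[1] * comps[1][0][i][j]
            q_ij = w[0] * comps[0][0][i][j] + w[2] * comps[2][0][i][j]
            c_sum += d * n[i][j]            # Eq. (4)
            p_sum += p_ij / h * n[i][j]     # Eq. (5)
            q_sum += q_ij / h * n[i][j]     # Eq. (6)
    return c_sum / (2 * total), p_sum / total, q_sum / total  # Eq. (7)


# Idealized data: expected counts for 100 progeny under combination 11, r = 0.1.
data = [[100 * x for x in row] for row in components(0.1)[0][0]]
step_from_neutral = em_step(data, 0.25, 0.5, 0.5)
step_from_known_phase = em_step(data, 0.3, 1.0, 1.0)
```

With the phase fixed at p = q = 1, a single step returns the closed-form estimate of r under combination 11, which is a useful sanity check for any implementation.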
Model for partially informative markers
Unlike an inbred line cross, a full-sib family may have many different marker segregation types. We symbolize observed marker alleles in a full-sib family by A1, A2, A3 and A4, which are codominant to each other but dominant to the null allele, symbolized by O. Wu et al. [11] listed a total of 28 segregation types, which are classified into 7 groups based on the amount of information for linkage analysis:
A. Loci that are heterozygous in both parents and segregate in a 1:1:1:1 ratio, involving either four alleles A1A2 × A3A4, three non-null alleles A1A2 × A1A3, three non-null alleles and a null allele A1A2 × A3O, or two null alleles and two non-null alleles A1O × A2O;
B. Loci that are heterozygous in both parents and segregate in a 1:2:1 ratio, which include three groups:
B1. One parent has two different dominant alleles and the other has one dominant allele and one null allele, e.g., A1A2 × A1O;
B2. The reciprocal of B1;
B3. Both parents have the same genotype of two codominant alleles, i.e., A1A2 × A1A2;
C. Loci that are heterozygous in both parents and segregate in a 3:1 ratio, i.e., A1O × A1O;
D. Loci that are in the testcross configuration between the parents and segregate in a 1:1 ratio, which include two groups:
D1. Heterozygous in one parent and homozygous in the other, including three alleles A1A2 × A3A3, two alleles A1A2 × A1A1, A1A2 × OO and A2O × A1A1, and one allele (with three null alleles) A1O × OO;
D2. The reciprocals of D1.
The marker group A is regarded as containing fully informative markers because of the complete distinction of the four progeny genotypes. The other six groups all contain partially informative markers since some progeny genotypes cannot be phenotypically separated from other genotypes. This incomplete distinction leads to the segregation ratios 1:2:1 (B), 3:1 (C) and 1:1 (D). Note that marker group D can be viewed as fully informative if we are only interested in the heterozygous parent.
In the preceding section, we defined a (4 × 4) matrix H for joint genotype frequencies between two fully informative markers. But for partially informative markers, only the joint phenotypes can be observed and, thus, the joint genotype frequencies, as shown in H, will be collapsed according to the same phenotype. Wu et al. [11] designed specific incidence matrices (I) relating the genotype frequencies to the phenotype frequencies for different types of markers. Here, we use the notation $H' = I_{b_1}^{T} H I_{b_2}$ for a ($b_1 \times b_2$) matrix of the phenotype frequencies between two partially informative markers, where $b_1$ and $b_2$ are the numbers of distinguishable phenotypes for markers M1 and M2, respectively. Correspondingly, we have $(DH)' = I_{b_1}^{T}(D \circ H)I_{b_2}$, $P' = I_{b_1}^{T} P I_{b_2}$ and $Q' = I_{b_1}^{T} Q I_{b_2}$. The EM algorithm can then be developed to estimate the recombination fraction between any two partially informative markers.
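E Step: At step τ, based on the matrix (DH)' derived from the current estimate $r^{\{\tau\}}$, calculate the expected number of recombination events between the two markers for a given progeny genotype, together with $p^{\{\tau+1\}}$ and $q^{\{\tau+1\}}$:
$$c_{j_1 j_2}^{\{\tau+1\}} = \frac{(dh)_{j_1 j_2}^{\prime\{\tau\}}}{h_{j_1 j_2}^{\prime\{\tau\}}} n_{j_1 j_2}, \qquad (8)$$
$$p^{\{\tau+1\}} = \frac{1}{n}\sum_{j_1=1}^{b_1}\sum_{j_2=1}^{b_2} \frac{p_{j_1 j_2}^{\prime\{\tau\}}}{h_{j_1 j_2}^{\prime\{\tau\}}} n_{j_1 j_2}, \qquad (9)$$
$$q^{\{\tau+1\}} = \frac{1}{n}\sum_{j_1=1}^{b_1}\sum_{j_2=1}^{b_2} \frac{q_{j_1 j_2}^{\prime\{\tau\}}}{h_{j_1 j_2}^{\prime\{\tau\}}} n_{j_1 j_2}, \qquad (10)$$
where $(dh)'_{j_1 j_2}$, $h'_{j_1 j_2}$, $p'_{j_1 j_2}$ and $q'_{j_1 j_2}$ are the $(j_1 j_2)$th elements of matrices (DH)', H', P' and Q', respectively.
M Step: Calculate $r^{\{\tau+1\}}$ using the equation,
$$r^{\{\tau+1\}} = \frac{1}{2n}\sum_{j_1=1}^{b_1}\sum_{j_2=1}^{b_2} c_{j_1 j_2}^{\{\tau+1\}}. \qquad (11)$$
The E and M steps in Eqs. (8)-(11) are repeated until the estimate converges to a stable value.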
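The collapsing of genotype frequencies into phenotype frequencies through incidence matrices can be illustrated with a small sketch. The incidence matrices below are hypothetical examples written for this illustration (they are not copied from Wu et al. [11]): rows correspond to the genotypes 13, 14, 23, 24 and columns to distinguishable phenotypes.

```python
def transpose(A):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Plain matrix product of two list-of-rows matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


# Hypothetical incidence matrices (rows: genotypes 13, 14, 23, 24).
I_FULL = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]  # 1:1:1:1 marker
I_121 = [[1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1]]  # A1A2 x A1A2: 14 and 23 look alike
I_31 = [[1, 0], [1, 0], [1, 0], [0, 1]]               # A1O x A1O: only OO is distinct


def collapse(H, I1, I2):
    """Phenotype-frequency matrix H' = I1^T H I2."""
    return matmul(matmul(transpose(I1), H), I2)


# Unlinked markers (r = 0.5): every joint genotype has frequency 1/16.
H_unlinked = [[1 / 16] * 4 for _ in range(4)]
H_prime = collapse(H_unlinked, I_121, I_31)
```

For unlinked markers the collapsed matrix is simply the product of the marginal 1:2:1 and 3:1 ratios, which gives a convenient check on the incidence matrices.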
Three-locus analysis
A general framework
Consider three markers in a linkage group that have three possible orders M1-M2-M3 ($O_1$), M1-M3-M2 ($O_2$) and M2-M1-M3 ($O_3$). Let $o_1$, $o_2$ and $o_3$ be the corresponding probabilities of occurrence of these orders in the parental genome. Without loss of generality, for a given order, the allelic arrangement of the first marker between the two homologous chromosomes can be fixed for a parent. Thus, the change of the allelic arrangements at the other two markers will lead to 2 × 2 = 4 parental diplotypes. The three-marker genotype of parent P (12/12/12) may have four possible diplotypes, [111][222], [112][221], [121][212] and [122][211]. Relative to the fixed allelic arrangement 1|2 of the first marker on the two homologous chromosomes 1 and 2, the probabilities of allelic arrangements 1|2 and 2|1 are denoted as $p_1$ and $1 - p_1$ for the second marker and as $p_2$ and $1 - p_2$ for the third marker, respectively. Assuming that allelic arrangements are independent between the second and third markers, the probabilities of these four three-marker diplotypes can be described by $p_1p_2$, $p_1(1 - p_2)$, $(1 - p_1)p_2$ and $(1 - p_1)(1 - p_2)$, respectively. The four diplotypes of parent Q can also be constructed, whose probabilities are defined as $q_1q_2$, $q_1(1 - q_2)$, $(1 - q_1)q_2$ and $(1 - q_1)(1 - q_2)$, respectively. Thus, there are 4 × 4 = 16 possible diplotype combinations (whose probabilities are the product of the corresponding diplotype probabilities) when parents P and Q are crossed.
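The 16 diplotype combinations and their probabilities can be enumerated directly. This is a minimal sketch under the independence assumption stated above; the coding 1 for arrangement 1|2 and 2 for arrangement 2|1, and the function names, are chosen here for illustration.

```python
from itertools import product


def parent_diplotype_probs(p1, p2):
    """Probabilities of the four three-marker diplotypes of one parent.

    Keys are (x1, x2): arrangement of the second and third marker,
    1 = same as the fixed first marker (1|2), 2 = flipped (2|1).
    """
    return {(x1, x2): (p1 if x1 == 1 else 1 - p1) * (p2 if x2 == 1 else 1 - p2)
            for x1, x2 in product((1, 2), repeat=2)}


def combination_probs(p1, p2, q1, q2):
    """Probabilities of the 16 P x Q diplotype combinations."""
    P = parent_diplotype_probs(p1, p2)
    Q = parent_diplotype_probs(q1, q2)
    return {(x, y): P[x] * Q[y] for x in P for y in Q}


combos = combination_probs(0.9, 0.8, 0.7, 0.6)
```

Here the key ((1, 1), (1, 1)) corresponds to [111][222] in both parents, and the 16 probabilities sum to one.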
Let $r_{12}$ denote the recombination fraction between markers M1 and M2, with $r_{23}$ and $r_{13}$ defined similarly. These recombination fractions are associated with the probabilities with which a crossover occurs between markers M1 and M2 and between markers M2 and M3. The events that a crossover occurs in both intervals or in neither interval are denoted by $D_{11}$ and $D_{00}$, respectively, whereas the events that a crossover occurs only in the first interval or only in the second interval are denoted by $D_{10}$ and $D_{01}$, respectively. The probabilities of these events are denoted by $d_{00}$, $d_{01}$, $d_{10}$ and $d_{11}$, respectively, whose sum equals 1. According to the definition of the recombination fraction as the probability of a crossover between a pair of loci, we have $r_{12} = d_{10} + d_{11}$, $r_{23} = d_{01} + d_{11}$ and $r_{13} = d_{01} + d_{10}$. These relationships have been used by Haldane [17] to derive the map function that converts the recombination fraction to the corresponding genetic distance.
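The relationships between the interval-specific crossover probabilities and the three recombination fractions can be checked numerically. In the sketch below, the helper `no_interference` assumes independent crossovers in the two intervals (the setting of Haldane's map function), which is an added assumption for the example rather than a requirement of the model; `haldane_cm` is the standard Haldane map function expressed in centimorgans.

```python
import math


def recombination_fractions(d00, d01, d10, d11):
    """Return (r12, r23, r13) from the four crossover-pattern probabilities."""
    assert abs(d00 + d01 + d10 + d11 - 1) < 1e-9
    return d10 + d11, d01 + d11, d01 + d10


def no_interference(r12, r23):
    """(d00, d01, d10, d11) when crossovers in the two intervals are independent."""
    return ((1 - r12) * (1 - r23), (1 - r12) * r23,
            r12 * (1 - r23), r12 * r23)


def haldane_cm(r):
    """Haldane map distance in centimorgans for a recombination fraction r < 0.5."""
    return -50.0 * math.log(1 - 2 * r)


d = no_interference(0.1, 0.2)
r12, r23, r13 = recombination_fractions(*d)
```

Under this independence assumption, r13 = r12 + r23 − 2·r12·r23 and the Haldane distances of the two adjacent intervals add up to the distance between the flanking markers.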
For a three-point analysis, there are a total of 16 (16 × 4) matrices for genotype frequencies under a given marker order ($O_k$), each corresponding to a diplotype combination, denoted by $H^{k}_{x_1 x_2 y_1 y_2}$, where $x_1, x_2$ = 1 for 1|2 or 2 for 2|1 denote the two alternative allelic arrangements of the second and third marker, respectively, for parent P, and $y_1, y_2$ = 1 for 1|2 or 2 for 2|1 denote the two alternative allelic arrangements of the second and third marker, respectively, for parent Q. According to Ridout et al. [18] and Wu et al. [11], elements in $H^{k}_{x_1 x_2 y_1 y_2}$ are expressed in terms of $d_{00}$, $d_{01}$, $d_{10}$ and $d_{11}$.
Similarly, there are 16 (16 × 4) matrices for the expected numbers of crossovers that have occurred for $D_{00}$, $D_{01}$, $D_{10}$ and $D_{11}$ for a given marker order, denoted by $D^{k}_{00\,x_1 x_2 y_1 y_2}$, $D^{k}_{01\,x_1 x_2 y_1 y_2}$, $D^{k}_{10\,x_1 x_2 y_1 y_2}$ and $D^{k}_{11\,x_1 x_2 y_1 y_2}$, respectively. In their Table 2, Wu et al. [11] gave the three-locus genotype frequencies and the number of crossovers on different marker intervals under marker order $O_1$.
The joint genotype frequencies of the three markers can be viewed as a mixture of 16 diplotype combinations and three orders, weighted by their probabilities of occurrence, and are expressed as
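$$H = \sum_{k=1}^{3} o_k \sum_{x_1=1}^{2}\sum_{x_2=1}^{2}\sum_{y_1=1}^{2}\sum_{y_2=1}^{2} p_1^{2-x_1}(1-p_1)^{x_1-1} p_2^{2-x_2}(1-p_2)^{x_2-1} q_1^{2-y_1}(1-q_1)^{y_1-1} q_2^{2-y_2}(1-q_2)^{y_2-1} H^{k}_{x_1 x_2 y_1 y_2}. \qquad (12)$$
Similarly, the expected number of recombination events contained within a progeny genotype is the mixture of the different diplotype and order combinations, expressed as:
$$D_{00} = \sum_{k=1}^{3} o_k \sum_{x_1=1}^{2}\sum_{x_2=1}^{2}\sum_{y_1=1}^{2}\sum_{y_2=1}^{2} p_1^{2-x_1}(1-p_1)^{x_1-1} p_2^{2-x_2}(1-p_2)^{x_2-1} q_1^{2-y_1}(1-q_1)^{y_1-1} q_2^{2-y_2}(1-q_2)^{y_2-1} D^{k}_{00\,x_1 x_2 y_1 y_2}, \qquad (13)$$
$$D_{01} = \sum_{k=1}^{3} o_k \sum_{x_1=1}^{2}\sum_{x_2=1}^{2}\sum_{y_1=1}^{2}\sum_{y_2=1}^{2} p_1^{2-x_1}(1-p_1)^{x_1-1} p_2^{2-x_2}(1-p_2)^{x_2-1} q_1^{2-y_1}(1-q_1)^{y_1-1} q_2^{2-y_2}(1-q_2)^{y_2-1} D^{k}_{01\,x_1 x_2 y_1 y_2}, \qquad (14)$$
$$D_{10} = \sum_{k=1}^{3} o_k \sum_{x_1=1}^{2}\sum_{x_2=1}^{2}\sum_{y_1=1}^{2}\sum_{y_2=1}^{2} p_1^{2-x_1}(1-p_1)^{x_1-1} p_2^{2-x_2}(1-p_2)^{x_2-1} q_1^{2-y_1}(1-q_1)^{y_1-1} q_2^{2-y_2}(1-q_2)^{y_2-1} D^{k}_{10\,x_1 x_2 y_1 y_2}, \qquad (15)$$
$$D_{11} = \sum_{k=1}^{3} o_k \sum_{x_1=1}^{2}\sum_{x_2=1}^{2}\sum_{y_1=1}^{2}\sum_{y_2=1}^{2} p_1^{2-x_1}(1-p_1)^{x_1-1} p_2^{2-x_2}(1-p_2)^{x_2-1} q_1^{2-y_1}(1-q_1)^{y_1-1} q_2^{2-y_2}(1-q_2)^{y_2-1} D^{k}_{11\,x_1 x_2 y_1 y_2}. \qquad (16)$$
Also define
$$P_1 = \sum_{k=1}^{3} o_k \sum_{x_2=1}^{2}\sum_{y_1=1}^{2}\sum_{y_2=1}^{2} p_1\, p_2^{2-x_2}(1-p_2)^{x_2-1} q_1^{2-y_1}(1-q_1)^{y_1-1} q_2^{2-y_2}(1-q_2)^{y_2-1} H^{k}_{1 x_2 y_1 y_2},$$
$$P_2 = \sum_{k=1}^{3} o_k \sum_{x_1=1}^{2}\sum_{y_1=1}^{2}\sum_{y_2=1}^{2} p_1^{2-x_1}(1-p_1)^{x_1-1} p_2\, q_1^{2-y_1}(1-q_1)^{y_1-1} q_2^{2-y_2}(1-q_2)^{y_2-1} H^{k}_{x_1 1 y_1 y_2},$$
$$Q_1 = \sum_{k=1}^{3} o_k \sum_{x_1=1}^{2}\sum_{x_2=1}^{2}\sum_{y_2=1}^{2} p_1^{2-x_1}(1-p_1)^{x_1-1} p_2^{2-x_2}(1-p_2)^{x_2-1} q_1\, q_2^{2-y_2}(1-q_2)^{y_2-1} H^{k}_{x_1 x_2 1 y_2},$$
$$Q_2 = \sum_{k=1}^{3} o_k \sum_{x_1=1}^{2}\sum_{x_2=1}^{2}\sum_{y_1=1}^{2} p_1^{2-x_1}(1-p_1)^{x_1-1} p_2^{2-x_2}(1-p_2)^{x_2-1} q_1^{2-y_1}(1-q_1)^{y_1-1} q_2\, H^{k}_{x_1 x_2 y_1 1}. \qquad (17)$$
The probabilities of occurrence of the three marker orders are the mixture of all diplotype combinations, expressed, in matrix notation, as
$$O_k = o_k \sum_{x_1=1}^{2}\sum_{x_2=1}^{2}\sum_{y_1=1}^{2}\sum_{y_2=1}^{2} p_1^{2-x_1}(1-p_1)^{x_1-1} p_2^{2-x_2}(1-p_2)^{x_2-1} q_1^{2-y_1}(1-q_1)^{y_1-1} q_2^{2-y_2}(1-q_2)^{y_2-1} H^{k}_{x_1 x_2 y_1 y_2}, \quad k = 1, 2, 3. \qquad (18)$$
We implement the EM algorithm to estimate the MLEs of the recombination fractions between the three markers. The general equations formulating the iteration of the {τ + 1}th EM step are given as follows:
E Step: At step τ, calculate the expected number of recombination events associated with $D_{00}$ (α), $D_{01}$ (β), $D_{10}$ (γ) and $D_{11}$ (δ) for the $(j_1 j_2 j_3)$th progeny genotype (where $j_1$, $j_2$ and $j_3$ denote the progeny genotypes of the three individual markers, respectively):
$$\alpha_{j_1 j_2 j_3}^{\{\tau+1\}} = D_{00\,j_1 j_2 j_3}^{\{\tau\}} n_{j_1 j_2 j_3}, \qquad (19)$$
$$\beta_{j_1 j_2 j_3}^{\{\tau+1\}} = D_{01\,j_1 j_2 j_3}^{\{\tau\}} n_{j_1 j_2 j_3}, \qquad (20)$$
$$\gamma_{j_1 j_2 j_3}^{\{\tau+1\}} = D_{10\,j_1 j_2 j_3}^{\{\tau\}} n_{j_1 j_2 j_3}, \qquad (21)$$
$$\delta_{j_1 j_2 j_3}^{\{\tau+1\}} = D_{11\,j_1 j_2 j_3}^{\{\tau\}} n_{j_1 j_2 j_3}. \qquad (22)$$
In the same step, calculate $p_1^{\{\tau+1\}}$, $p_2^{\{\tau+1\}}$, $q_1^{\{\tau+1\}}$, $q_2^{\{\tau+1\}}$ and $o_k^{\{\tau+1\}}$ (k = 1, 2, 3) using Eqs. (23)-(27).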
Table 2: Estimation from three-point analysis of the recombination fraction ($\hat{r}$ ± SD) and the parental diplotype probabilities of parent P ($\hat{p}$) and Q ($\hat{q}$) for five markers in a full-sib family of n = 100

Marker    P × Q        r Case 1          r Case 2          p        q          r Case 1          r Case 2          p    q
Recombination fraction = 0.05
M1        a|b × c|d
  M1-M2                0.051 ± 0.0175
M2        a|b × a|b                      0.1008 ± 0.0298   0.9978   0.9986
  M2-M3                0.0578 ± 0.0269                                         0.0557 ± 0.0312
M3        a|o × o|a                                        0.9977   0                            0.0988 ± 0.0277   1    0
  M3-M4                0.0512 ± 0.0307                                         0.0476 ± 0.0280                     1    1/0^c
M4        a|b × b|b                      0.0932 ± 0.0301   1        1/0^c                                          1    1/0^c
  M4-M5                0.0514 ± 0.0229
M5        a|b × c|d                                        1        1
Recombination fraction = 0.20
M1        a|b × c|d
  M1-M2                0.2026 ± 0.0348
M2        a|b × a|b                      0.3282 ± 0.0482   0.9918   0.9916
  M2-M3                0.2240 ± 0.0758                                         0.2408 ± 0.0939
M3        a|o × o|a                                        0.9944   0                            0.3241 ± 0.0488   1    0
  M3-M4                0.1927 ± 0.0613                                         0.1824 ± 0.0614
M4        a|b × b|b                      0.3161 ± 0.0502   1        1/0^c                                          1    1/0^c
  M4-M5                0.2017 ± 0.0393
M5        a|b × c|d                                        1        1

Case 1 denotes the recombination fraction between two adjacent markers, whereas case 2 denotes the recombination fraction between the two markers separated by a third marker. See Table 1 for other explanations.
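$$p_1^{\{\tau+1\}} = \frac{1}{n}\sum_{j_1=1}^{4}\sum_{j_2=1}^{4}\sum_{j_3=1}^{4} \frac{p_{1(j_1 j_2 j_3)}^{\{\tau\}}}{h_{j_1 j_2 j_3}^{\{\tau\}}} n_{j_1 j_2 j_3}, \qquad (23)$$
$$p_2^{\{\tau+1\}} = \frac{1}{n}\sum_{j_1=1}^{4}\sum_{j_2=1}^{4}\sum_{j_3=1}^{4} \frac{p_{2(j_1 j_2 j_3)}^{\{\tau\}}}{h_{j_1 j_2 j_3}^{\{\tau\}}} n_{j_1 j_2 j_3}, \qquad (24)$$
$$q_1^{\{\tau+1\}} = \frac{1}{n}\sum_{j_1=1}^{4}\sum_{j_2=1}^{4}\sum_{j_3=1}^{4} \frac{q_{1(j_1 j_2 j_3)}^{\{\tau\}}}{h_{j_1 j_2 j_3}^{\{\tau\}}} n_{j_1 j_2 j_3}, \qquad (25)$$
$$q_2^{\{\tau+1\}} = \frac{1}{n}\sum_{j_1=1}^{4}\sum_{j_2=1}^{4}\sum_{j_3=1}^{4} \frac{q_{2(j_1 j_2 j_3)}^{\{\tau\}}}{h_{j_1 j_2 j_3}^{\{\tau\}}} n_{j_1 j_2 j_3}, \qquad (26)$$
$$o_k^{\{\tau+1\}} = \frac{1}{n}\sum_{j_1=1}^{4}\sum_{j_2=1}^{4}\sum_{j_3=1}^{4} \frac{o_{k(j_1 j_2 j_3)}^{\{\tau\}}}{h_{j_1 j_2 j_3}^{\{\tau\}}} n_{j_1 j_2 j_3}, \qquad (27)$$
where $n_{j_1 j_2 j_3}$ denotes the number of progeny with a particular three-marker genotype, and $h_{j_1 j_2 j_3}$, $D_{00\,j_1 j_2 j_3}$, $D_{01\,j_1 j_2 j_3}$, $D_{10\,j_1 j_2 j_3}$, $D_{11\,j_1 j_2 j_3}$, $p_{1(j_1 j_2 j_3)}$, $p_{2(j_1 j_2 j_3)}$, $q_{1(j_1 j_2 j_3)}$, $q_{2(j_1 j_2 j_3)}$ and $o_{k(j_1 j_2 j_3)}$ are the $(j_1 j_2 j_3)$th elements of matrices H, $D_{00}$, $D_{01}$, $D_{10}$, $D_{11}$, $P_1$, $P_2$, $Q_1$, $Q_2$ and $O_k$, respectively.
M Step: Calculate $d_{00}^{\{\tau+1\}}$, $d_{01}^{\{\tau+1\}}$, $d_{10}^{\{\tau+1\}}$ and $d_{11}^{\{\tau+1\}}$ using the equations,
$$d_{00}^{\{\tau+1\}} = \frac{1}{2n}\sum_{j_1=1}^{4}\sum_{j_2=1}^{4}\sum_{j_3=1}^{4} \alpha_{j_1 j_2 j_3}^{\{\tau+1\}}, \qquad (28)$$
$$d_{01}^{\{\tau+1\}} = \frac{1}{2n}\sum_{j_1=1}^{4}\sum_{j_2=1}^{4}\sum_{j_3=1}^{4} \beta_{j_1 j_2 j_3}^{\{\tau+1\}}, \qquad (29)$$
$$d_{10}^{\{\tau+1\}} = \frac{1}{2n}\sum_{j_1=1}^{4}\sum_{j_2=1}^{4}\sum_{j_3=1}^{4} \gamma_{j_1 j_2 j_3}^{\{\tau+1\}}, \qquad (30)$$
$$d_{11}^{\{\tau+1\}} = \frac{1}{2n}\sum_{j_1=1}^{4}\sum_{j_2=1}^{4}\sum_{j_3=1}^{4} \delta_{j_1 j_2 j_3}^{\{\tau+1\}}. \qquad (31)$$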
The E and M steps are repeated among Eqs. (19)-(32) until $d_{00}$, $d_{01}$, $d_{10}$ and $d_{11}$ converge to values with satisfactory precision. From the MLEs of the d's, the MLEs of the recombination fractions $r_{12}$, $r_{13}$ and $r_{23}$ can be obtained according to the invariance property of the MLEs.
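Model for partially informative markers
Consider three partially informative markers with the numbers of distinguishable phenotypes denoted by $b_1$, $b_2$ and $b_3$, respectively. Define $H' = (I_{b_1}^{T} \otimes I_{b_2}^{T}) H I_{b_3}$ as a ($b_1 b_2 \times b_3$) matrix of genotype frequencies for three partially informative markers. Similarly, we define
$(HD_{00})' = (I_{b_1}^{T} \otimes I_{b_2}^{T})(D_{00} \circ H) I_{b_3}$, $(HD_{01})' = (I_{b_1}^{T} \otimes I_{b_2}^{T})(D_{01} \circ H) I_{b_3}$, $(HD_{10})' = (I_{b_1}^{T} \otimes I_{b_2}^{T})(D_{10} \circ H) I_{b_3}$, $(HD_{11})' = (I_{b_1}^{T} \otimes I_{b_2}^{T})(D_{11} \circ H) I_{b_3}$, $P_1' = (I_{b_1}^{T} \otimes I_{b_2}^{T}) P_1 I_{b_3}$, $P_2' = (I_{b_1}^{T} \otimes I_{b_2}^{T}) P_2 I_{b_3}$, $Q_1' = (I_{b_1}^{T} \otimes I_{b_2}^{T}) Q_1 I_{b_3}$, $Q_2' = (I_{b_1}^{T} \otimes I_{b_2}^{T}) Q_2 I_{b_3}$, $O_1' = (I_{b_1}^{T} \otimes I_{b_2}^{T}) O_1 I_{b_3}$, $O_2' = (I_{b_1}^{T} \otimes I_{b_2}^{T}) O_2 I_{b_3}$ and $O_3' = (I_{b_1}^{T} \otimes I_{b_2}^{T}) O_3 I_{b_3}$.
Using the procedure described in Section (2.2), we implement the EM algorithm to estimate the MLEs of the recombination fractions among the three partially informative markers.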
m-point analysis
Three-point analysis considering the dependence of recombination events among different marker intervals can be extended to perform the linkage analysis of an arbitrary number of markers. Suppose there are m ordered markers on a linkage group. The joint genotype probabilities of the m markers form a ($4^{m-1} \times 4$)-dimensional matrix. There are $2^{m-1} \times 2^{m-1}$ such probability matrices, each corresponding to a different parental diplotype combination. Reasonable estimates of the recombination fractions rely upon the characterization of a most likely parental diplotype combination based on the multilocus likelihood values calculated.
The m-marker joint genotype probabilities can be expressed as a function of the probability of whether or not there is a crossover occurring between two adjacent markers, $D_{l_1 l_2 \ldots l_{m-1}}$, where $l_1, l_2, \ldots, l_{m-1}$ are the indicator variables denoting the crossover event between markers M1 and M2, markers M2 and M3, ..., and markers $M_{m-1}$ and $M_m$, respectively. An indicator is defined as 1 if there is a crossover and 0 otherwise. Because each indicator can take the value one or zero, there are a total of $2^{m-1}$ D's.
The probability of occurrence of an interval-specific crossover pattern, $d_{l_1 l_2 \ldots l_{m-1}}$, can be estimated using the EM algorithm. In the E step, the expected number of interval-specific crossovers is calculated (see Eqs. (19)-(22) for three-point analysis). In the M step, an explicit equation is used to estimate the probability $d_{l_1 l_2 \ldots l_{m-1}}$. The MLEs of $d_{l_1 l_2 \ldots l_{m-1}}$ are further used to estimate the m(m − 1)/2 recombination fractions between all possible marker pairs. In m-point analysis, parental diplotypes and gene orders can be incorporated in the model.
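To make the last step concrete, the sketch below converts the probabilities of the 2^(m−1) crossover patterns into the m(m − 1)/2 pairwise recombination fractions. It assumes, by extension of the three-point relationships (e.g., r13 = d01 + d10), that two markers recombine when an odd number of crossovers falls between them; the helper `independent_patterns`, which builds pattern probabilities from independent adjacent intervals, exists only to supply example input and is not part of the model.

```python
from itertools import product


def pairwise_recombination(pattern_probs, m):
    """Recombination fraction for every marker pair (i, j), 0-based, i < j.

    pattern_probs maps a tuple of m - 1 crossover indicators to its probability.
    """
    r = {}
    for i in range(m):
        for j in range(i + 1, m):
            # Intervals i .. j-1 lie between markers i and j.
            r[(i, j)] = sum(prob for pat, prob in pattern_probs.items()
                            if sum(pat[i:j]) % 2 == 1)
    return r


def independent_patterns(adjacent_r):
    """Example input: pattern probabilities for independent adjacent intervals."""
    probs = {}
    for pat in product((0, 1), repeat=len(adjacent_r)):
        pr = 1.0
        for indicator, r in zip(pat, adjacent_r):
            pr *= r if indicator else 1 - r
        probs[pat] = pr
    return probs


patterns = independent_patterns([0.1, 0.2, 0.05])  # m = 4 markers
fractions = pairwise_recombination(patterns, 4)
```

With four markers there are 2³ = 8 crossover patterns and 4·3/2 = 6 pairwise recombination fractions.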
Monte Carlo simulation
Simulation studies are performed to investigate the statistical properties of our model for simultaneously estimating linkage, parental diplotype and gene order in a full-sib family derived from two outbred parents. Suppose there are five markers of a known order on a chromosome. These five markers segregate, in order, in ratios of 1:1:1:1, 1:2:1, 3:1, 1:1 and 1:1:1:1. The diplotypes of the two parents for the five markers are given in Table 1, and a segregating full-sib family is generated from these two parents. In order to examine the effects of the parameter space on the estimation of linkage, parental diplotype and gene order, the full-sib family is simulated with different degrees of linkage (r = 0.05 vs. 0.20) and different sample sizes (n = 100 vs. 200).
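A simulation of this kind can be sketched for the simplest case of two fully informative markers. This is an illustrative generator written under stated assumptions (each parent transmits either allele at the first marker with probability 1/2 and recombines with probability r), not the simulation code used in the study; the function name and its arguments are hypothetical.

```python
import random


def simulate_family(n, r, p_first=True, q_first=True, seed=0):
    """Simulate joint genotype counts for two fully informative markers.

    Returns a 4 x 4 count matrix; indices 0..3 stand for genotypes 13, 14, 23, 24
    (rows: marker 1, columns: marker 2). p_first / q_first select the
    [11][22]-type (True) or [12][21]-type (False) diplotype of each parent.
    """
    rng = random.Random(seed)
    counts = [[0] * 4 for _ in range(4)]

    def transmit(first_diplotype):
        a1 = rng.randrange(2)                    # allele index at marker 1
        a2 = a1 if first_diplotype else 1 - a1   # non-recombinant allele at marker 2
        if rng.random() < r:                     # recombination between the markers
            a2 = 1 - a2
        return a1, a2

    for _ in range(n):
        p1, p2 = transmit(p_first)
        q1, q2 = transmit(q_first)
        counts[2 * p1 + q1][2 * p2 + q2] += 1
    return counts


family = simulate_family(100, 0.05)
complete_linkage = simulate_family(100, 0.0)
```

With r = 0 and both parents in the first diplotype, all progeny fall on the diagonal of the count matrix, which offers a simple check that the phase coding is correct.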
As expected, the estimation precision of the recombination fraction depends on the marker type, the degree of linkage and the sample size. More informative markers, more tightly linked markers and larger sample sizes display greater estimation precision of linkage than less informative markers, less tightly linked markers and smaller sample sizes (Tables 1 and 2). To save space, we do not give the results about the effects of sample size in the tables. Our model can provide an excellent estimation of parental linkage phases, i.e., parental diplotype, in two-point analysis. For example, the MLE of the probability (p or q) of parental diplotype is close to 1 or 0 (Table 1), suggesting that we can always accurately estimate parental diplotypes. But for two symmetrical markers (e.g., markers M2 and M3 in this example), two sets of MLEs, $\hat{p} = 1, \hat{q} = 0$ and $\hat{p} = 0, \hat{q} = 1$, give an identical likelihood ratio test statistic. Thus, two-point analysis cannot specify parental diplotypes for symmetrical markers even when the two parents have different diplotypes.
The estimation precision of linkage can be increased when a three-point analysis is performed (Table 2), but this depends on the marker types and the degree of linkage. The advantage of three-point analysis over two-point analysis is more pronounced for partially than fully informative markers, and for less tightly than more tightly linked markers. For example, the sampling error of the MLE of the recombination fraction (assuming r = 0.20) between markers M2 and M3 from two-point analysis is 0.0848, whereas this value from a three-point analysis decreases to 0.0758 when combining fully informative marker M1 but increases to 0.0939 when combining partially informative marker M4. The three-point analysis can clearly determine the diplotypes of different parents as long as one of the three markers is asymmetrical. In our example, using either asymmetrical marker M1 or M4, the diplotypes of the two parents for the two symmetrical markers (M2 and M3) can be determined. Our model for three-point analysis can also determine a most likely gene order. In the three-point analyses combining markers M1-M3, markers M2-M4 and markers M3-M5, the MLEs of the probabilities of gene order are all almost equal to 1, suggesting that the estimated gene order is consistent with the order hypothesized.
To demonstrate how our linkage analysis model is more advantageous than the existing models for a full-sib family population, we carry out a simulation study for linked dominant markers. In two-point analysis, two different parental diplotype combinations are assumed: (1) [aa] [oo] x [aa] [oo] (cis x cis) and (2) [ao] [oa] x [ao] [oa] (trans x trans). The MLE of the linkage under combination (2), in which two dominant alleles are in a repulsion phase, is not as precise as that under combination (1), in which two dominant non-alleles are in a coupling phase [12]. For a given data set with unknown linkage phase, the traditional procedure for estimating the recombination fraction is to calculate the likelihood values under all possible linkage phase combinations (i.e., cis x cis, cis x trans, trans x cis and trans x trans). The combinations cis x cis and trans x trans have the same likelihood value, with the MLE of one combination being equal to 1 minus the MLE of the other. The same relationship is true for cis x trans and trans x cis. A most likely phase combination is chosen as the one corresponding to the largest likelihood and a legitimate MLE of the recombination fraction (r ≤ 0.5) [10].
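The symmetry between phase combinations can be checked numerically. The following is a minimal sketch, assuming two dominant markers, both parents doubly heterozygous, and no segregation distortion; under these assumptions the four offspring phenotype classes depend on the phase combination only through the probability of the double-null class. The function names and the counts are illustrative and are not taken from the analyses reported here.

```python
import math

def double_null_prob(phase_p, phase_q, r):
    """P(offspring carries no dominant allele at either marker)."""
    def null_gamete(phase):
        # A cis parent [aa][oo] transmits the double-null gamete without
        # recombination; a trans parent [ao][oa] only with recombination.
        return (1 - r) / 2 if phase == "cis" else r / 2
    return null_gamete(phase_p) * null_gamete(phase_q)

def phenotype_probs(phase_p, phase_q, r):
    """Probabilities of the classes (both dominant, first only, second only, neither)."""
    t = double_null_prob(phase_p, phase_q, r)
    return (0.5 + t, 0.25 - t, 0.25 - t, t)

def log_likelihood(counts, phase_p, phase_q, r):
    probs = phenotype_probs(phase_p, phase_q, r)
    return sum(n * math.log(p) for n, p in zip(counts, probs) if n > 0)

def grid_mle(counts, phase_p, phase_q, steps=2000):
    """Grid-search MLE of r on the open interval (0, 1) for one phase combination."""
    grid = [(i + 0.5) / steps for i in range(steps)]
    return max(grid, key=lambda r: log_likelihood(counts, phase_p, phase_q, r))

# Illustrative counts: the expected class sizes for cis x cis, r = 0.20, n = 100.
counts = (66, 9, 9, 16)
r_cc = grid_mle(counts, "cis", "cis")
r_tt = grid_mle(counts, "trans", "trans")
print(r_cc, r_tt)
```

For these counts the search returns an estimate close to 0.2 under cis x cis and close to 0.8 under trans x trans, with the same maximized log-likelihood, so only the cis x cis estimate is legitimate.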
Table 3: Comparison of the estimation of the linkage and parental diplotype between two dominant markers in a full-sib family of n = 100 from the traditional and our model

                                          Traditional model                                                   Our model
                                          cis x cis        cis x trans      trans x cis      trans x trans
Data simulated from cis x cis
  Correct diplotype combination           Correct          Incorrect        Incorrect        Incorrect        -
  Log-likelihood (a)                      -46.2            -92.3            -92.3            -46.2            -
  r under each diplotype combination      0.1981 ± 0.0446  0.5000 ± 0.0000  0.5000 ± 0.0000  0.8018 ± 0.0446  -
  Estimated diplotype combination         Selected         -                -                -                -
  r under correct diplotype combination   0.1981 ± 0.0446  -                -                -                0.1982 ± 0.0446
  Diplotype probability for parent P (p)  -                -                -                -                1.0000 ± 0.0000
  Diplotype probability for parent Q (q)  -                -                -                -                1.0000 ± 0.0000
Data simulated from trans x trans
  Correct diplotype combination           Incorrect        Incorrect        Incorrect        Correct          -
  Log-likelihood (a)                      -89.6            -89.6            -89.6            -89.6            -
  r under each diplotype combination      0.8573 ± 0.1253  0.0393 ± 0.0419  0.0393 ± 0.0419  0.1426 ± 0.1253  -
  Estimated diplotype combination         -                Selected         Selected         -                -
  r under correct diplotype combination   -                -                -                0.1426 ± 0.1253  0.1428 ± 0.1253
  Diplotype probability for parent P (p)  -                -                -                -                0.0000 ± 0.0000
  Diplotype probability for parent Q (q)  -                -                -                -                0.0000 ± 0.0000

(a) The log-likelihood values given here are those from one random simulation for each diplotype combination by the traditional model.
Table 4: Comparison of the estimation of the linkage and gene order between three dominant markers in a full-sib family of n = 100 from the traditional and our model

                                    Traditional model                                  Our model
                                    M1 M2 M3         M1 M3 M2         M2 M1 M3
Data simulated from [aaa] [ooo] x [aaa] [ooo]
  Correct gene order                Correct          Incorrect        Incorrect        -
  Estimated best gene order (%) (a) 100              0                0                -
  r12                               0.2047 ± 0.0422  -                -                0.2048 ± 0.0422
  r23                               0.1980 ± 0.0436  -                -                0.1985 ± 0.0434
  r13                               0.3245 ± 0.0619  -                -                0.3235 ± 0.0618
  Prob(M1 M2 M3) (o1)               -                -                -                0.9860 ± 0.0105
  Prob(M1 M3 M2) (o2)               -                -                -                0.0060 ± 0.0071
  Prob(M2 M1 M3) (o3)               -                -                -                0.0080 ± 0.0079
Data simulated from [aao] [ooa] x [aao] [ooa]
  Correct gene order                Correct          Incorrect        Incorrect        -
  Estimated best gene order (%) (a) 80               11               9                -
  r12                               0.1991 ± 0.0456  0.8165 ± 0.1003  0.9284 ± 0.0724  0.2104 ± 0.0447
  r23                               0.1697 ± 0.0907  0.8220 ± 0.0338  0.1636 ± 0.0608  0.2073 ± 0.0754
  r13                               0.3218 ± 0.0755  0.2703 ± 0.0586  0.7821 ± 0.0459  0.2944 ± 0.0929
  Prob(M1 M2 M3) (o1)               -                -                -                0.9952 ± 0.0058
  Prob(M1 M3 M2) (o2)               -                -                -                0.0045 ± 0.0058
  Prob(M2 M1 M3) (o3)               -                -                -                0.0003 ± 0.0015

(a) The percentage of a total of 200 simulations that have the largest likelihood for a given gene order, as estimated from the traditional approach. In this example, used to examine the advantage of implementing gene orders, known linkage phases are assumed.

For our data set simulated from [aa] [oo] x [aa] [oo], one can easily select cis x cis as the best estimate of the phase combination because it corresponds to a larger likelihood and a smaller r (Table 3). Our model incorporating the parental diplotypes provides comparable estimation precision of the linkage for the data from [aa] [oo] x [aa] [oo] and precisely determines the parental diplotypes (see the MLEs of p and q; Table 3). Our model has a great advantage over the traditional model for the data derived from [ao] [oa] x [ao] [oa]. For this data set, the same likelihood was obtained under all four possible diplotype combinations (Table 3). In this case, one would select cis x trans or trans x cis because these two phase combinations are associated with a lower estimate of r. But this estimate of r (0.0393) is biased since it is far less than the hypothesized value of 0.20. Our model gives the same estimation precision of the linkage for the data derived from [ao] [oa] x [ao] [oa] as obtained when the analysis is based on the correct diplotype combination (Table 3). Also, our model can precisely determine the parental diplotypes (p = q = 0).
In three-point analysis, we examine the advantage of implementing linkage analysis with gene orders. Three dominant markers are assumed to have two different parental diplotype combinations: (1) [aaa] [ooo] x [aaa] [ooo] and (2) [aao] [ooa] x [aao] [ooa]. The traditional approach is to calculate the likelihood values under the three possible gene orders and choose the order with the maximum likelihood to estimate the linkage. Under combination (1), a
most likely gene order can be well determined and, therefore, the recombination fractions between the three markers well estimated, because the likelihood value of the correct order is always larger than those of incorrect orders (Table 4). However, under combination (2), the estimates of linkage are not always precise because gene orders are incorrectly determined with a frequency of 20%. The estimates of the r's deviate largely from their actual values when based on a wrong gene order (Table 4). Our model incorporating gene order provides better estimation of linkage than the traditional approach, especially between markers whose dominant alleles are in a repulsion phase. Furthermore, a most likely gene order can be determined from our model at the same time as the linkage is estimated.
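As a rough consistency check on the three-point estimates, and assuming no crossover interference (an assumption made here for illustration only), the recombination fraction between the two flanking markers follows from the two adjacent ones. Taking the hypothesized value of 0.20 for both adjacent intervals:

```latex
r_{13} = r_{12} + r_{23} - 2\,r_{12}\,r_{23}
       = 0.20 + 0.20 - 2(0.20)(0.20)
       = 0.32
```

The estimates near 0.32 reported in Table 4 under the correct gene order are consistent with this value.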
Our model is further used to perform joint analyses including more than three markers. When the number of markers increases, the number of parameters to be estimated increases exponentially. For four-point analysis, convergence was slow and the accuracy and precision of parameter estimation were affected for a sample size of 200 (data not shown). According to our simulation experience, more-than-three-point analysis can be improved by increasing the sample size or by using the estimates from two- or three-point analysis as initial values.
A worked example
We use an example from the published literature [18] to demonstrate our unifying model for simultaneous estimation of linkage, parental diplotype and gene order. A cross was made between two triple heterozygotes with genotype AaVvXx for markers A, V and X. Because these three markers are dominant, the cross generates 8 distinguishable genotypes, with observations of 28 for A_/V_/X_, 4 for A_/V_/xx, 12 for A_/vv/X_, 3 for A_/vv/xx, 1 for aa/V_/X_, 8 for aa/V_/xx, 2 for aa/vv/X_ and 2 for aa/vv/xx. We first use two-point analysis to estimate the recombination fractions and parental diplotypes between all possible pairs of the three markers. The recombination fraction between markers A and V is $\hat{r}_{AV} = 0.3764$, for which the estimated parental diplotypes are [Av] [aV] x [AV] [av] or [AV] [av] x [Av] [aV]. The other two recombination fractions and the corresponding parental diplotypes are estimated as $\hat{r}_{VX} = 0.3855$, [Vx] [vX] x [VX] [vx] or [VX] [vx] x [Vx] [vX], and $\hat{r}_{XA} = 0.1836$, [AX] [ax] x [AX] [ax], respectively. The two-point analysis thus suggests that, in one of the two parents, the dominant alleles at markers A and X are in repulsion with the dominant allele at marker V.
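The two-point estimates quoted above can be approximated from the eight observed counts with the traditional phase-by-phase calculation. The sketch below is not the joint EM model of this article; it assumes both parents are heterozygous at both markers of a pair, so that the class probabilities depend only on the double-null probability t, and it picks the phase combination with the smallest legitimate r. All function names are illustrative.

```python
import math

# Observed counts keyed by (A, V, X); True means the dominant phenotype is expressed.
counts = {
    (True, True, True): 28, (True, True, False): 4,
    (True, False, True): 12, (True, False, False): 3,
    (False, True, True): 1, (False, True, False): 8,
    (False, False, True): 2, (False, False, False): 2,
}

def pair_counts(i, j):
    """Collapse the eight classes to (both dominant, exactly one, neither) for markers i, j."""
    both = sum(n for k, n in counts.items() if k[i] and k[j])
    neither = sum(n for k, n in counts.items() if not k[i] and not k[j])
    one = sum(counts.values()) - both - neither
    return both, one, neither

def theta_mle(both, one, neither):
    """MLE of t when the class probabilities are 1/2 + t, 1/4 - t (twice) and t.
    This is the positive root of the quadratic score equation."""
    n = both + one + neither
    b = 0.25 * both - 0.5 * one - 0.25 * neither
    return (b + math.sqrt(b * b + 0.5 * n * neither)) / (2 * n)

def legitimate_r(t):
    """Recombination fractions r <= 0.5 that reproduce t under each phase combination."""
    if t >= 1 / 16:
        return {"cis x cis": 1 - 2 * math.sqrt(t)}            # t = (1 - r)^2 / 4
    return {
        "trans x trans": 2 * math.sqrt(t),                    # t = r^2 / 4
        "cis x trans": (1 - math.sqrt(1 - 16 * t)) / 2,       # t = r (1 - r) / 4
    }

def two_point(i, j):
    t = theta_mle(*pair_counts(i, j))
    return min(legitimate_r(t).items(), key=lambda kv: kv[1])

pairs = {"A-V": (0, 1), "V-X": (1, 2), "A-X": (0, 2)}
for name, (i, j) in pairs.items():
    phase, r = two_point(i, j)
    print(name, phase, round(r, 4))
```

Under these assumptions the sketch gives values of about 0.376, 0.386 and 0.184, with a cis x trans arrangement for A-V and V-X and cis x cis for A-X, in line with the estimates quoted above. In this simplified calculation a trans x trans arrangement with a larger r fits the A-V and V-X data equally well, which is the kind of ambiguity the joint model is meant to resolve.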
Our subsequent three-point analysis combines parental diplotypes and gene orders to estimate the linkage between these three dominant markers. The estimated gene order is X-A-V. The MLEs of the recombination fractions are $\hat{r}_{XA} = 0.2120$, $\hat{r}_{AV} = 0.3049$ and $\hat{r}_{XV} = 0.3049$. The parental diplotype combination is [XAV] [xav] x [XAv] [xaV] or [XAv] [xaV] x [XAV] [xav]. The three-point analysis for these three markers by Ridout et al. [18] led to estimates of the three recombination fractions all equal to 0.20. But their estimates may not be optimal because the effect of gene order on r was not considered.
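The reported gene order can be cross-checked against the pairwise estimates with a simple heuristic: among three linked markers, the pair with the largest recombination fraction should flank the third. The sketch below applies this rule to the two-point estimates of the worked example; it is an illustrative shortcut, not the order-probability estimation used in our model, and the function name is hypothetical.

```python
def order_from_pairwise(r):
    """Order three markers from pairwise recombination fractions.

    Heuristic: the pair with the largest r is taken as the flanking pair,
    and the remaining marker is placed in the middle.
    """
    (left, right), _ = max(r.items(), key=lambda kv: kv[1])
    markers = {m for pair in r for m in pair}
    (middle,) = markers - {left, right}
    return left, middle, right

# Two-point estimates from the worked example.
pairwise = {("A", "V"): 0.3764, ("V", "X"): 0.3855, ("A", "X"): 0.1836}
print(order_from_pairwise(pairwise))
```

The heuristic places marker A between V and X, which is the order X-A-V read in the opposite direction.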
Discussion
Several statistical methods and software packages have been developed for linkage analysis and map construction in experimental crosses and well-structured pedigrees [2-6], but these methods need unambiguous linkage phases over a set of markers in a linkage group. For outcrossing species, such as forest trees, it is not possible to know the exact linkage phases for either of the two parents that are crossed to generate a full-sib family prior to linkage analysis. This uncertainty about linkage phases makes linkage mapping in outcrossing populations much more difficult than in phase-known pedigrees [7,9].
In this article we present a unifying model for simultaneously estimating the linkage, parental diplotype and gene order in a full-sib family derived from two outbred parents. As demonstrated by simulation studies, our model is robust across different regions of the parameter space. Compared with the traditional approaches that calculate the likelihood values separately under all possible linkage phases or orders [9,10,18], our approach is more advantageous in three aspects. First, it provides a one-step analysis for estimating the linkage, parental diplotype and gene order, thus facilitating the implementation of a general method for analyzing any segregation type of markers for outcrossing populations in a computer program package. For some short-generation-interval outcrossing species, we can obtain marker information from grandparents, parents and progeny. The model presented here allows for the use of marker genotypes of the grandparents to derive the diplotypes of the parents. Second, our model for the first time incorporates gene ordering into a unified linkage analysis framework, whereas most earlier studies only emphasized the characterization of linkage phases through a multilocus likelihood analysis [11,14,15]. Instead of a comparative analysis of different orders, we propose to determine a most likely gene order by estimating the order probabilities.
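The idea of estimating order probabilities rather than comparing orders one at a time can be illustrated with a generic EM update for mixing proportions. The sketch below is not our full model: it assumes the category probabilities under each candidate order are already fixed, whereas in the model they depend on recombination fractions that are estimated jointly. The probabilities and counts are hypothetical toy values.

```python
import math

def em_mixture_weights(counts, component_probs, n_iter=200):
    """EM for mixing proportions when each component's category probabilities are fixed."""
    k = len(component_probs)
    total = sum(counts)
    w = [1.0 / k] * k
    history = []
    for _ in range(n_iter):
        # Mixture probability of each category under the current weights.
        mix = [sum(w[c] * component_probs[c][j] for c in range(k))
               for j in range(len(counts))]
        history.append(sum(n * math.log(m) for n, m in zip(counts, mix) if n > 0))
        # E-step: expected number of individuals attributed to each component.
        expected = [sum(n * w[c] * component_probs[c][j] / mix[j]
                        for j, n in enumerate(counts) if n > 0)
                    for c in range(k)]
        # M-step: the new weights are the expected shares.
        w = [e / total for e in expected]
    return w, history

# Hypothetical category probabilities under three candidate gene orders.
orders = [
    (0.50, 0.30, 0.20),
    (0.30, 0.50, 0.20),
    (0.20, 0.30, 0.50),
]
counts = (52, 29, 19)  # toy data, not from this article
weights, loglik = em_mixture_weights(counts, orders)
print([round(x, 3) for x in weights])
```

With these toy values the weight of the first order grows at the expense of the other two, and the log-likelihood recorded in each iteration does not decrease, as expected for EM.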
Third, and most importantly, our unifying approach can
significantly improve the estimation precision of the linkage
for dominant markers whose alleles are in repulsion
phase. Previous analyses have indicated that the estimate
of the linkage between dominant markers in a repulsion
phase is biased and imprecise, especially when the linkage
is not strong and when the sample size is small [12]. There are
two reasons for this: (1) the linkage phase cannot be correctly
determined, and/or (2) there is a fairly high probability
(20%) of detecting a wrong gene order. Our
approach provides more precise estimates of the recombination
fraction because the correct parental diplotypes and the
correct gene order can be determined.
Our approach will be broadly useful in genetic mapping of outcrossing species. In practice, a two-point analysis can first be performed to obtain the pairwise estimates of the recombination fractions; using this pairwise information, markers are grouped based on the criteria of a maximum recombination fraction and a minimum likelihood ratio test statistic [2]. The parental diplotypes of markers in individual groups are then constructed using a three-point analysis. With the limited sample size available in practice, we do not recommend more-than-three-point analysis because it would introduce too many unknown parameters to be estimated precisely. If such an analysis is desirable, however, one may use the results from the lower-point analyses as initial values to improve the convergence rate and possibly the precision of parameter estimation.
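The grouping step can be sketched as single-linkage clustering on the two-point results. The code below is a minimal illustration: the thresholds (max_r, min_lrt), the marker names and the pairwise values are hypothetical and would need to be chosen for the data at hand.

```python
def group_markers(markers, pairwise, max_r=0.35, min_lrt=13.8):
    """Join two markers when r <= max_r and the LRT statistic >= min_lrt,
    then return the resulting linkage groups (single linkage, union-find)."""
    parent = {m: m for m in markers}

    def find(m):
        while parent[m] != m:
            parent[m] = parent[parent[m]]  # path halving
            m = parent[m]
        return m

    for (a, b), (r, lrt) in pairwise.items():
        if r <= max_r and lrt >= min_lrt:
            parent[find(a)] = find(b)

    groups = {}
    for m in markers:
        groups.setdefault(find(m), []).append(m)
    return sorted(groups.values())

# Hypothetical two-point results: (recombination fraction, LRT statistic).
pairwise = {
    ("M1", "M2"): (0.10, 40.0),
    ("M2", "M3"): (0.20, 22.0),
    ("M1", "M3"): (0.28, 9.0),
    ("M3", "M4"): (0.48, 0.3),
    ("M4", "M5"): (0.15, 30.0),
}
print(group_markers(["M1", "M2", "M3", "M4", "M5"], pairwise))
```

Here M1 and M3 end up in the same group through M2 even though their direct test falls below the threshold, which is the usual behavior of single-linkage grouping.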
In any case, our two- and three-point analysis provides a key stepping stone for map construction through two approaches. One is the least-squares method, as originally developed by Stam [5], which can integrate the pairwise recombination fractions into the reconstruction of a multilocus linkage map. The second is to use the hidden Markov chain (HMC) model, first proposed by Lander and Green [2], to construct genetic linkage maps by treating map construction as a combinatorial optimization problem. The simulated annealing algorithm [19] for searching for optima of the multilocus likelihood function needs to be implemented for the HMC model. A user-friendly software package being written by the senior author will implement two- and three-point analyses as well as the algorithm for map construction based on the estimates of pairwise recombination fractions. This software will be made available online to the public.
Our maximum likelihood-based approach is implemented with the EM algorithm. We also incorporate the Gibbs sampler [20] into the estimation procedure of the mixture model for the linkage characterizing different parental diplotypes and gene orders of different markers. The results from the Gibbs sampler are broadly consistent with those from the EM algorithm, but the Gibbs sampler is computationally more efficient than the EM algorithm for a complicated problem. Therefore, the Gibbs sampler may be particularly useful when our model is extended to consider multiple full-sib families in which the parents may be selected from a natural population. For such a multi-family design, some population genetic parameters describing the genetic structure of the original population, such as allele frequencies and linkage disequilibrium, should be incorporated and estimated in the model for linkage analysis. It can be anticipated that the Gibbs sampler will play an important role in estimating these parameters simultaneously along with the linkage, linkage phases, and gene order.
Authors' contributions
QL derived the genetic and statistical models and wrote
computer programs. YHC participated in the derivations
of models and statistical analyses. RLW conceived of ideas
and algorithms, and wrote the draft. All authors read and
approved the final manuscript.
Acknowledgements
We thank two anonymous referees for their constructive comments on
the manuscript. This work is partially supported by a University of Florida
Research Opportunity Fund (02050259) and a University of South Florida
Biodefense Grant (722206112) to R. W. The publication of this manuscript
is approved as Journal Series No. R10073 by the Florida Agricultural
Experiment Station.
References
1. Flint J, Mott R: Finding the molecular basis of quantitative traits: Successes and pitfalls. Nat Rev Genet 2001, 2:437-445.
2. Lander ES, Green P: Construction of multilocus genetic linkage maps in humans. Proc Natl Acad Sci USA 1987, 84:2363-2367.
3. Lander ES, Green P, Abrahamson J, Barlow A, Daly MJ, Lincoln SE, Newburg L: MAPMAKER: an interactive computer package for constructing primary genetic linkage maps of experimental and natural populations. Genomics 1987, 1:174-181.
4. Green P, Falls K, Crooks S: Documentation for CRI-MAP, version 2.4. Washington Univ. School of Medicine, St. Louis, MO. 1990.
5. Stam P: Construction of integrated genetic linkage maps by means of a new computer package: JOINMAP. Plant J 1993, 3:739-744.
6. Matise TC, Perlin M, Chakravarti A: Automated construction of genetic linkage maps using an expert system (MULTIMAP): a human genome linkage map. Nat Genet 1994, 6:384-390.
7. Ritter E, Gebhardt C, Salamini F: Estimation of recombination frequencies and construction of RFLP linkage maps in plants from crosses between heterozygous parents. Genetics 1990, 125:645-654.
8. Arus P, Olarte C, Romero M, Vargas F: Linkage analysis of 10 isozyme genes in F1 segregating almond progenies. J Am Soc Hort Sci 1994, 119:339-344.
9. Ritter E, Salamini F: The calculation of recombination frequencies in crosses of allogamous plant species with applications to linkage mapping. Genet Res 1996, 67:55-65.
10. Maliepaard C, Jansen J, van Ooijen JW: Linkage analysis in a full-sib family of an outbreeding plant species: overview and consequences for applications. Genet Res 1997, 70:237-250.
11. Wu RL, Ma CM, Painter I, Zeng ZB: Simultaneous maximum likelihood estimation of linkage and linkage phases in outcrossing populations. Theor Pop Biol 2002, 61:349-363.
12. Wu RL, Ma CM, Wu SS, Zeng ZB: Linkage mapping of sex-specific differences. Genet Res 2002, 79:85-96.
13. Grattapaglia D, Sederoff R: Genetic linkage maps of Eucalyptus grandis and Eucalyptus urophylla using a pseudo-testcross: mapping strategy and RAPD markers. Genetics 1994, 137:1121-1137.
14. Ling S: Constructing genetic maps for outbred experimental crosses. Ph.D. thesis, University of California, Berkeley, CA 1999.
15. Butcher PA, Williams ER, Whitaker D, Ling S, Speed TP, Moran CF: Improving linkage analysis in outcrossed forest trees - an example from Acacia mangium. Theor Appl Genet 2002, 104:1185-1191.
16. Dempster AP, Laird NM, Rubin DB: Maximum likelihood from incomplete data via the EM algorithm. J Roy Stat Soc Ser B 1977, 39:1-38.
17. Haldane JBS: The combination of linkage values and the calculation of distance between the loci of linked factors. J Genet 1919, 8:299-309.
18. Ridout MS, Tong S, Vowden CJ, Tobutt KR: Three-point linkage analysis in crosses of allogamous plant species. Genet Res 1998, 72:111-121.
19. van Laarhoven PJM, Aarts EHL: Simulated Annealing: Theory and Applications. D. Reidel Publishing Co., Dordrecht, The Netherlands; 1987.
20. Casella G: Empirical Bayes Gibbs sampling. Biostatistics 2001, 2:485-500.
