Group Title: BMC Genetics
Title: A Multilocus likelihood approach to joint modeling of linkage, parental diplotype and gene order in a full-sib family
Permanent Link: http://ufdc.ufl.edu/UF00100043/00001
 Material Information
Title: A Multilocus likelihood approach to joint modeling of linkage, parental diplotype and gene order in a full-sib family
Physical Description: Book
Language: English
Creator: Lu, Qing
Cui, Yuehua
Wu, Rongling
Publisher: BMC Genetics
Publication Date: 2004
 Notes
Abstract: BACKGROUND: Unlike a pedigree initiated with two inbred lines, a full-sib family derived from two outbred parents frequently has many different segregation types of markers whose linkage phases are not known prior to linkage analysis. RESULTS: We formulate a general model of simultaneously estimating linkage, parental diplotype and gene order through multi-point analysis in a full-sib family. Our model is based on a multinomial mixture model taking into account different diplotypes and gene orders, weighted by their corresponding occurring probabilities. The EM algorithm is implemented to provide the maximum likelihood estimates of the linkage, parental diplotype and gene order over any type of markers. CONCLUSIONS: Through simulation studies, this model is found to be more computationally efficient compared with existing models for linkage mapping. We discuss the extension of the model and its implications for genome mapping in outcrossing species.
General Note: Start page 20
General Note: M3: 10.1186/1471-2156-5-20
 Record Information
Bibliographic ID: UF00100043
Volume ID: VID00001
Source Institution: University of Florida
Holding Location: University of Florida
Rights Management: Open Access: http://www.biomedcentral.com/info/about/openaccess/
Resource Identifier: issn - 1471-2156
http://www.biomedcentral.com/1471-2156/5/20




BMC Genetics


BioMed Central


Methodology article

A multilocus likelihood approach to joint modeling of linkage,
parental diplotype and gene order in a full-sib family
Qing Lu1, Yuehua Cui1 and Rongling Wu*1,2


Address: 1Department of Statistics, University of Florida, Gainesville, Florida 32611 USA and 2College of Life Sciences, Zhejiang Forestry
University, Lin'an, Zhejiang 311300, People's Republic of China
Email: Qing Lu qlu@darwin.epbi.cwru.edu; Yuehua Cui ycui@stat.ufl.edu; Rongling Wu* Rwu@mail.ifas.ufl.edu
* Corresponding author


Published: 26 July 2004
BMC Genetics 2004, 5:20 doi:10.1186/1471-2156-5-20


Received: 10 March 2004
Accepted: 26 July 2004


This article is available from: http://www.biomedcentral.com/1471-2156/5/20
© 2004 Lu et al; licensee BioMed Central Ltd. This is an open-access article distributed under the terms of the Creative Commons Attribution License
(http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is
properly cited.



Abstract
Background: Unlike a pedigree initiated with two inbred lines, a full-sib family derived from two
outbred parents frequently has many different segregation types of markers whose linkage phases
are not known prior to linkage analysis.
Results: We formulate a general model of simultaneously estimating linkage, parental diplotype
and gene order through multi-point analysis in a full-sib family. Our model is based on a multinomial
mixture model taking into account different diplotypes and gene orders, weighted by their
corresponding occurring probabilities. The EM algorithm is implemented to provide the maximum
likelihood estimates of the linkage, parental diplotype and gene order over any type of markers.
Conclusions: Through simulation studies, this model is found to be more computationally efficient
compared with existing models for linkage mapping. We discuss the extension of the model and its
implications for genome mapping in outcrossing species.


Background
The construction of genetic linkage maps based on molec-
ular markers has become a routine tool for comparative
studies of genome structure and organization and the
identification of loci affecting complex traits in different
organisms [1]. Statistical methods for linkage analysis and
map construction have been well developed in inbred line
crosses [2] and implemented in the computer packages
MAPMAKER [3], CRI-MAP [4], JOINMAP [5] and MULTI-
MAP [6]. Increasing efforts have been made to develop
robust tools for analyzing marker data in outcrossing
organisms [7-12], in which inbred lines are not available
due to the heterozygous nature of these organisms and/or
long-generation intervals.


Genetic analyses and statistical methods in outcrossing
species are far more complicated than in species that can
be selfed to produce inbred lines. There are two reasons
for this. First, the number of marker alleles and the segre-
gation pattern of marker genotypes may vary from locus
to locus in outcrossing species, whereas an inbred line-ini-
tiated segregating population, such as an F2 or backcross,
always has two alleles and a consistent segregation ratio
across different markers. Second, linkage phases among
different markers are not known a priori for outbred par-
ents and, therefore, an algorithm should be developed to
characterize a most likely linkage phase for linkage
analysis.




To overcome these problems of linkage analysis in outcrossing
species, Grattapaglia and Sederoff [13] proposed a two-way
pseudo-testcross mapping strategy in which one parent is
heterozygous whereas the other is null for all markers. Using this
strategy, two parent-specific linkage maps are constructed. The
limitation of the pseudo-testcross strategy is that it can only make
use of a portion of molecular markers. Ritter et al. [7] and Ritter
and Salamini [9] proposed statistical methods for estimating the
recombination fractions between different segregation types of
markers. Using both analytical and simulation approaches,
Maliepaard et al. [10] discussed the power and precision of the
estimation of the pairwise recombination fractions between markers.
Wu et al. [11] formulated a multilocus likelihood approach to
simultaneously estimate the linkage and linkage phases of the crossed
parents over multiple markers. Ling [14] proposed a three-step
analytical procedure for linkage analysis in outcrossing populations,
which includes (1) determining the parental haplotypes for all of the
markers in a linkage group, (2) estimating the recombination
fractions, and (3) choosing a most likely marker order based on
optimization analysis. This procedure was used to analyze segregating
data in an outcrossing forest tree [15]. Currently, none of these
models for linkage analysis in outcrossing species can provide a
one-step analysis for the linkage, parental linkage phase and marker
order from segregating marker data.

In this article, we construct a unifying likelihood analysis
to simultaneously estimate linkage, linkage phases and
gene order for a group of markers that display all possible
segregation patterns in a full-sib family derived from two
outbred parents (see Table 1 of Wu et al. [11]). Our idea
here is to integrate all possible linkage phases between a
pair of markers in the two parents, each specified by a
phase probability, into the framework of a mixture statis-
tical model. In characterizing a most likely linkage phase
(or parental diplotype) based on the phase probabilities,
the recombination fractions are also estimated using a
likelihood approach. This integrative idea is extended to
consider gene orders in a multilocus analysis, in which the
probabilities of all possible gene orders are estimated and
a most likely order is chosen, along with the estimation of
the linkage and parental diplotype. We perform extensive
simulation studies to investigate the robustness, power
and precision of our statistical mapping method incorpo-
rating linkage, parental diplotype and gene orders. An
example from the published literature is used to validate
the application of our method to linkage analysis in out-
crossing species.


Table 1: Estimation from two-point analysis of the recombination fraction ($\hat{r}$ ± SD) and the parental diplotype probability of parent P ($\hat{p}$) and Q ($\hat{q}$) for five markers in a full-sib family of n = 100

         Parental
         diplotype (a)   r = 0.05                                 r = 0.20
Marker   P x Q           r ± SD            p        q             r ± SD            p        q

M1       a b x c d
                         0.530 ± 0.0183                           0.2097 ± 0.0328
M2       a b x a b                         0.9960   0.9972                          0.9882   0.9878
                         0.0464 ± 0.0303                          0.2103 ± 0.0848
M3       a o x o a                         1(0) (b) 0(1) (b)                        1(0) (b) 0(1) (b)
                         0.0463 ± 0.0371                          0.1952 ± 0.0777
M4       a b x b b                         1        1/0 (c)                         1        1/0 (c)
                         0.0503 ± 0.0231                          0.2002 ± 0.0414
M5       a b x c d                         1        1/0 (c)                         1        1/0 (c)

(a) Shown is the parental diplotype of each parent for the five markers hypothesized, where the two alleles listed for a parent lie on its two homologous chromosomes (drawn as vertical lines in the original layout).
(b) The values in the parentheses present a second possible solution. For any two symmetrical markers (2 and 3), $\hat{p}$ = 1, $\hat{q}$ = 0 and $\hat{p}$ = 0, $\hat{q}$ = 1 give an identical likelihood ratio test statistic (Wu et al. 2002a). Thus, when the two parents have different diplotypes for symmetrical markers, their parental diplotypes cannot be correctly determined from two-point analysis.
(c) The parental diplotype of parent P2 cannot be estimated in these two cases because marker 4 is homozygous in this parent.
The MLE of r is given between the two markers under comparison, whereas the MLEs of p and q are given at the second marker.




Two-locus analysis
A general framework
Suppose there is a full-sib family of size n derived from
two outcrossed parents P and Q. Two sets of chromosomes
are coded as 1 and 2 for parent P and 3 and 4 for
parent Q. Consider two marker loci M1 and M2, whose
genotypes are denoted as 12/12 and 34/34 for parents P
and Q, respectively, where we use / to separate the two
markers. When the two parents are crossed, we have four
different progeny genotypes at each marker, i.e., 13, 14, 23
and 24, in the full-sib family. Let r be the recombination
fraction between the two markers.

In general, the genotypes of the two markers for the two
parents can be observed in a molecular experiment, but
the allelic arrangement of the two markers in the two
homologous chromosomes of each parent (i.e., linkage
phase) is not known. In the current genetic literature, a
linear arrangement of nonalleles from different markers
on the same chromosomal region is called the haplotype.
The observable two-marker genotype of parent P is 12/12,
but it may be derived from one of two possible combinations
of maternally- and paternally-derived haplotypes,
i.e., [11] [22] or [12] [21], where we use [ ] to define a
haplotype. The combination of two haplotypes is called the
diplotype. Diplotype [11] [22] (denoted by $1$) is generated
by the combination of two-marker haplotypes
[11] and [22], whereas diplotype [12] [21] (denoted by
$\bar{1}$) is generated by the combination of two-marker
haplotypes [12] and [21]. If the probability of forming
diplotype [11] [22] is p, then the probability of forming
diplotype [12] [21] is 1 - p. The genotype of parent Q and
its possible diplotypes [33] [44] and [34] [43] can be
defined analogously; the formation probabilities of the
two diplotypes are q and 1 - q, respectively.

The cross of the two parents should be one and only one
of four possible parental diplotype combinations, i.e.,
[11] [22] x [33] [44], [11] [22] x [34] [43], [12] [21] x
[33] [44] and [12] [21] x [34] [43], expressed as $11$, $1\bar{1}$,
$\bar{1}1$ and $\bar{1}\bar{1}$, with probabilities pq, p(1 - q), (1 - p)q and
(1 - p)(1 - q), respectively. The estimation of the recombination
fraction in the full-sib family should be based on a
correct diplotype combination [10]. Each of the four
combinations generates 16 two-marker progeny genotypes,
whose frequencies are expressed, in a 4 x 4 matrix, as

$$
H_{11} = \frac{1}{4}\;
\begin{array}{c|cccc}
   & 13 & 14 & 23 & 24 \\ \hline
13 & (1-r)^2 & r(1-r) & r(1-r) & r^2 \\
14 & r(1-r) & (1-r)^2 & r^2 & r(1-r) \\
23 & r(1-r) & r^2 & (1-r)^2 & r(1-r) \\
24 & r^2 & r(1-r) & r(1-r) & (1-r)^2
\end{array}
$$

for [11] [22] x [33] [44],

$$
H_{1\bar{1}} = \frac{1}{4}\;
\begin{array}{c|cccc}
   & 13 & 14 & 23 & 24 \\ \hline
13 & r(1-r) & (1-r)^2 & r^2 & r(1-r) \\
14 & (1-r)^2 & r(1-r) & r(1-r) & r^2 \\
23 & r^2 & r(1-r) & r(1-r) & (1-r)^2 \\
24 & r(1-r) & r^2 & (1-r)^2 & r(1-r)
\end{array}
$$

for [11] [22] x [34] [43],

$$
H_{\bar{1}1} = \frac{1}{4}\;
\begin{array}{c|cccc}
   & 13 & 14 & 23 & 24 \\ \hline
13 & r(1-r) & r^2 & (1-r)^2 & r(1-r) \\
14 & r^2 & r(1-r) & r(1-r) & (1-r)^2 \\
23 & (1-r)^2 & r(1-r) & r(1-r) & r^2 \\
24 & r(1-r) & (1-r)^2 & r^2 & r(1-r)
\end{array}
$$

for [12] [21] x [33] [44] and

$$
H_{\bar{1}\bar{1}} = \frac{1}{4}\;
\begin{array}{c|cccc}
   & 13 & 14 & 23 & 24 \\ \hline
13 & r^2 & r(1-r) & r(1-r) & (1-r)^2 \\
14 & r(1-r) & r^2 & (1-r)^2 & r(1-r) \\
23 & r(1-r) & (1-r)^2 & r^2 & r(1-r) \\
24 & (1-r)^2 & r(1-r) & r(1-r) & r^2
\end{array}
$$

for [12] [21] x [34] [43]. Note that these matrices are
expressed in terms of the combinations of the progeny
genotypes for the two markers, with rows and columns
corresponding to markers M1 and M2, respectively.
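The four matrices above can be generated mechanically, which is a convenient check on their entries. The following is a minimal sketch, assuming each parent transmits its two-marker gametes independently of the other parent, so that the 4 x 4 genotype matrix is the Kronecker product of two 2 x 2 gamete matrices. The function names `gamete_matrix` and `genotype_matrix` are illustrative and do not come from the article.

```python
import numpy as np

def gamete_matrix(r, alt_phase=False):
    """2 x 2 gamete probabilities for one parent.

    Entry [a1, a2] is the probability of transmitting chromosome-of-origin
    a1 at marker M1 and a2 at marker M2 (0 = first homolog, 1 = second).
    alt_phase=False corresponds to diplotype [11][22] (or [33][44]);
    alt_phase=True to the alternative [12][21] (or [34][43]).
    """
    base = np.array([[1.0 - r, r], [r, 1.0 - r]]) / 2.0
    # Switching the phase swaps which gametes count as recombinant.
    return base[:, ::-1] if alt_phase else base

def genotype_matrix(r, p_alt=False, q_alt=False):
    """4 x 4 progeny genotype frequencies; rows and columns are ordered
    13, 14, 23, 24 for markers M1 and M2, respectively."""
    return np.kron(gamete_matrix(r, p_alt), gamete_matrix(r, q_alt))
```

For example, `4 * genotype_matrix(0.1)` reproduces the pattern of the first matrix, with (1 - r)^2 = 0.81 on the diagonal and r^2 = 0.01 on the anti-diagonal.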

Let $\mathbf{n} = (n_{j_1 j_2})_{4 \times 4}$ denote the matrix for the observations of
progeny, where $j_1, j_2$ = 1 for 13, 2 for 14, 3 for 23, or 4 for
24 for the progeny genotypes at these two markers. Under
each parental diplotype combination, $n_{j_1 j_2}$ follows a
multinomial distribution. The likelihoods for the four
diplotype combinations are expressed as

$$L_{11} \propto r^{2N_2+N_3+N_4}(1-r)^{2N_1+N_3+N_4},$$

$$L_{1\bar{1}} \propto r^{2N_4+N_1+N_2}(1-r)^{2N_3+N_1+N_2},$$

$$L_{\bar{1}1} \propto r^{2N_3+N_1+N_2}(1-r)^{2N_4+N_1+N_2},$$

$$L_{\bar{1}\bar{1}} \propto r^{2N_1+N_3+N_4}(1-r)^{2N_2+N_3+N_4},$$

where $N_1 = n_{11} + n_{22} + n_{33} + n_{44}$, $N_2 = n_{14} + n_{23} + n_{32} + n_{41}$,
$N_3 = n_{12} + n_{21} + n_{34} + n_{43}$, and $N_4 = n_{13} + n_{31} + n_{24} + n_{42}$. It
can be seen that the maximum likelihood estimate (MLE)
of r ($\hat{r}$) under the first diplotype combination is equal to
one minus $\hat{r}$ under the fourth combination, and the same
relation holds between the second and third diplotype
combinations. Although the plug-in likelihood values are
identical between the first and fourth combinations as
well as between the second and third combinations, one
can still choose an appropriate $\hat{r}$ from each of these two pairs
because one member of each pair leads to $\hat{r}$ greater than 0.5.
Traditional approaches for estimating the linkage and parental
diplotypes are to estimate the recombination fractions
and likelihood values under each of the four combinations
and choose one legitimate estimate of r with a
higher likelihood.
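The traditional approach described above has a closed form, because a likelihood proportional to r^a (1 - r)^b is maximized at a / (a + b). The sketch below computes the four estimates from a 4 x 4 count matrix; the function names and the dictionary keys ("11", "1b", "b1", "bb", where "b" stands for the barred diplotype) are illustrative choices, not notation from the article.

```python
import numpy as np

def class_counts(n):
    """Return (N1, N2, N3, N4) from a 4 x 4 count matrix whose rows and
    columns are ordered 13, 14, 23, 24."""
    n = np.asarray(n, dtype=float)
    N1 = n[0, 0] + n[1, 1] + n[2, 2] + n[3, 3]
    N2 = n[0, 3] + n[1, 2] + n[2, 1] + n[3, 0]
    N3 = n[0, 1] + n[1, 0] + n[2, 3] + n[3, 2]
    N4 = n[0, 2] + n[2, 0] + n[1, 3] + n[3, 1]
    return N1, N2, N3, N4

def mle_by_combination(n):
    """MLE of r under each of the four diplotype combinations.

    Each likelihood is r**a * (1 - r)**b with a + b = 2 * (family size),
    so the maximizer is a / (a + b).
    """
    N1, N2, N3, N4 = class_counts(n)
    two_n = 2.0 * (N1 + N2 + N3 + N4)
    return {
        "11": (2 * N2 + N3 + N4) / two_n,
        "1b": (2 * N4 + N1 + N2) / two_n,
        "b1": (2 * N3 + N1 + N2) / two_n,
        "bb": (2 * N1 + N3 + N4) / two_n,
    }
```

The estimates under "11" and "bb" always sum to one, as do those under "1b" and "b1", which is the complementarity noted in the text.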

In this study, we incorporate the four parental diplotype
combinations into the observed data likelihood,
expressed as

$$L(\Theta \mid \mathbf{n}) = pq\,L_{11} + p(1-q)\,L_{1\bar{1}} + (1-p)q\,L_{\bar{1}1} + (1-p)(1-q)\,L_{\bar{1}\bar{1}} \qquad (1)$$

where $\Theta = (r, p, q)$ is an unknown parameter vector, which
can be estimated by differentiating the likelihood with
respect to each unknown parameter, setting the derivatives
equal to zero and solving the likelihood equations.
This estimation procedure can be implemented with the
EM algorithm [2,11,16]. Let H be a mixture matrix of the
genotype frequencies under the four parental diplotype
combinations weighted by the occurring probabilities of
the diplotype combinations, expressed as

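$$
H = pq\,H_{11} + p(1-q)\,H_{1\bar{1}} + (1-p)q\,H_{\bar{1}1} + (1-p)(1-q)\,H_{\bar{1}\bar{1}}
= \begin{array}{c|cccc}
   & 13 & 14 & 23 & 24 \\ \hline
13 & a & b & c & d \\
14 & b & a & d & c \\
23 & c & d & a & b \\
24 & d & c & b & a
\end{array}
$$

where

$$a = \tfrac{1}{4}\left[pq(1-r)^2 + (p+q-2pq)\,r(1-r) + (1-p)(1-q)\,r^2\right],$$

$$b = \tfrac{1}{4}\left[p(1-q)(1-r)^2 + (1-p-q+2pq)\,r(1-r) + (1-p)q\,r^2\right],$$

$$c = \tfrac{1}{4}\left[(1-p)q(1-r)^2 + (1-p-q+2pq)\,r(1-r) + p(1-q)\,r^2\right],$$

$$d = \tfrac{1}{4}\left[(1-p)(1-q)(1-r)^2 + (p+q-2pq)\,r(1-r) + pq\,r^2\right].$$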
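As a worked check of the first entry, a can be obtained by weighting the (13, 13) elements of the four genotype-frequency matrices by the probabilities of the corresponding diplotype combinations. The derivation below uses only quantities defined in the text; the final line, which confirms that each row of H sums to 1/4, is an added sanity check rather than a statement from the article.

```latex
\begin{aligned}
a &= pq\,\frac{(1-r)^2}{4} + p(1-q)\,\frac{r(1-r)}{4}
   + (1-p)q\,\frac{r(1-r)}{4} + (1-p)(1-q)\,\frac{r^2}{4} \\
  &= \frac{1}{4}\Big[\,pq(1-r)^2 + (p+q-2pq)\,r(1-r) + (1-p)(1-q)\,r^2\Big], \\[4pt]
a+b+c+d &= \frac{1}{4}\Big[(1-r)^2 + 2r(1-r) + r^2\Big] = \frac{1}{4}.
\end{aligned}
```

Setting p = q = 1 reduces a to (1 - r)^2 / 4, the (13, 13) element under the combination [11] [22] x [33] [44].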
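Similar to the expression of the genotype frequencies as a
mixture of the four diplotype combinations, the expected
number of recombination events contained within each
two-marker progeny genotype is the mixture of the four
different diplotype combinations, i.e.,

$$
D = pq\,D_{11} + p(1-q)\,D_{1\bar{1}} + (1-p)q\,D_{\bar{1}1} + (1-p)(1-q)\,D_{\bar{1}\bar{1}}
= \begin{array}{c|cccc}
   & 13 & 14 & 23 & 24 \\ \hline
13 & 2-p-q & 1-p+q & 1+p-q & p+q \\
14 & 1-p+q & 2-p-q & p+q & 1+p-q \\
23 & 1+p-q & p+q & 2-p-q & 1-p+q \\
24 & p+q & 1+p-q & 1-p+q & 2-p-q
\end{array},
$$

where the expected numbers of recombination events for
each combination are expressed as

$$
D_{11} = \begin{array}{c|cccc}
   & 13 & 14 & 23 & 24 \\ \hline
13 & 0 & 1 & 1 & 2 \\
14 & 1 & 0 & 2 & 1 \\
23 & 1 & 2 & 0 & 1 \\
24 & 2 & 1 & 1 & 0
\end{array},
\qquad
D_{1\bar{1}} = \begin{array}{c|cccc}
   & 13 & 14 & 23 & 24 \\ \hline
13 & 1 & 0 & 2 & 1 \\
14 & 0 & 1 & 1 & 2 \\
23 & 2 & 1 & 1 & 0 \\
24 & 1 & 2 & 0 & 1
\end{array},
$$

$$
D_{\bar{1}1} = \begin{array}{c|cccc}
   & 13 & 14 & 23 & 24 \\ \hline
13 & 1 & 2 & 0 & 1 \\
14 & 2 & 1 & 1 & 0 \\
23 & 0 & 1 & 1 & 2 \\
24 & 1 & 0 & 2 & 1
\end{array},
\qquad
D_{\bar{1}\bar{1}} = \begin{array}{c|cccc}
   & 13 & 14 & 23 & 24 \\ \hline
13 & 2 & 1 & 1 & 0 \\
14 & 1 & 2 & 0 & 1 \\
23 & 1 & 0 & 2 & 1 \\
24 & 0 & 1 & 1 & 2
\end{array}.
$$

Define

$$P = pq\,H_{11} + p(1-q)\,H_{1\bar{1}},$$

$$Q = pq\,H_{11} + (1-p)q\,H_{\bar{1}1}.$$

The general procedure underlying the {τ + 1}th EM step is
given as follows:

E Step: At step τ, using the matrix H based on the current
estimate $r^{\{\tau\}}$, calculate the expected number of recombination
events between two markers for each progeny genotype
and $p^{\{\tau+1\}}$, $q^{\{\tau+1\}}$:

$$c^{\{\tau+1\}}_{j_1 j_2} = d^{\{\tau\}}_{j_1 j_2}\, n_{j_1 j_2}, \qquad (4)$$

$$p^{\{\tau+1\}} = \frac{1}{n}\sum_{j_1=1}^{4}\sum_{j_2=1}^{4} \frac{p^{\{\tau\}}_{j_1 j_2}}{h^{\{\tau\}}_{j_1 j_2}}\, n_{j_1 j_2}, \qquad (5)$$

$$q^{\{\tau+1\}} = \frac{1}{n}\sum_{j_1=1}^{4}\sum_{j_2=1}^{4} \frac{q^{\{\tau\}}_{j_1 j_2}}{h^{\{\tau\}}_{j_1 j_2}}\, n_{j_1 j_2}, \qquad (6)$$

where $d_{j_1 j_2}$, $h_{j_1 j_2}$, $p_{j_1 j_2}$ and $q_{j_1 j_2}$ are the $(j_1 j_2)$th elements of
matrices D, H, P and Q, respectively.

M Step: Calculate $r^{\{\tau+1\}}$ using the equation,

$$r^{\{\tau+1\}} = \frac{1}{2n}\sum_{j_1=1}^{4}\sum_{j_2=1}^{4} c^{\{\tau+1\}}_{j_1 j_2}. \qquad (7)$$

The E step and M step among Eqs. (4)-(7) are repeated
until r converges to a value with satisfactory precision. The
converged values are regarded as the MLEs of $\Theta$.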
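To make the iteration concrete, the following is a minimal sketch of a textbook EM for the genotype-level mixture H, not a transcription of Eqs. (4)-(7). It assumes that each progeny genotype is drawn from the mixture of the four diplotype combinations, and it weights the recombinant counts by the posterior probability of each combination, which is an assumption of this sketch and may differ in detail from the authors' E step. All function names, starting values and the iteration count are illustrative.

```python
import numpy as np

# Recombinant indicator per gamete for each phase (0 = [11][22]-type,
# 1 = [12][21]-type); entry [a1, a2] indexes the homolog at M1 and M2.
REC = {0: np.array([[0, 1], [1, 0]]), 1: np.array([[1, 0], [0, 1]])}

def gamete(r, phase):
    g = np.array([[1.0 - r, r], [r, 1.0 - r]]) / 2.0
    return g if phase == 0 else g[:, ::-1]

def components(r):
    """Genotype frequencies H_k and recombinant counts D_k for the four
    diplotype combinations, keyed by (phase of P, phase of Q)."""
    ones = np.ones((2, 2), dtype=int)
    out = {}
    for x in (0, 1):
        for y in (0, 1):
            H = np.kron(gamete(r, x), gamete(r, y))
            D = np.kron(REC[x], ones) + np.kron(ones, REC[y])
            out[(x, y)] = (H, D)
    return out

def em_two_locus(n, r=0.2, p=0.7, q=0.7, iters=200):
    """Return (r, p, q, trace), where trace holds the log-likelihood
    (up to a constant) evaluated before each update."""
    n = np.asarray(n, dtype=float)
    total = n.sum()
    trace = []
    for _ in range(iters):
        comp = components(r)
        w = {(0, 0): p * q, (0, 1): p * (1 - q),
             (1, 0): (1 - p) * q, (1, 1): (1 - p) * (1 - q)}
        H = sum(w[k] * comp[k][0] for k in comp)
        trace.append(float((n * np.log(H)).sum()))
        # E step: posterior weight of each combination per genotype cell.
        post = {k: w[k] * comp[k][0] / H for k in comp}
        rec = sum(post[k] * comp[k][1] for k in comp)
        # M step: each progeny carries two gametes, hence the 2 * total.
        r = float((n * rec).sum() / (2 * total))
        p = float((n * (post[(0, 0)] + post[(0, 1)])).sum() / total)
        q = float((n * (post[(0, 0)] + post[(1, 0)])).sum() / total)
    return r, p, q, trace
```

Because the likelihood is unchanged when (r, p, q) is replaced by (1 - r, 1 - p, 1 - q), the starting values decide which of the two mirror-image solutions the iteration approaches; starting with r below 0.5 keeps the legitimate one.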

Model for partially informative markers
Unlike an inbred line cross, a full-sib family may have
many different marker segregation types. We symbolize
observed marker alleles in a full-sib family by A1, A2, A3
and A4, which are codominant to each other but domi-
nant to the null allele, symbolized by 0. Wu et al. [11]
listed a total of 28 segregation types, which are classified
into 7 groups based on the amount of information for
linkage analysis:

A. Loci that are heterozygous in both parents and segregate
in a 1:1:1:1 ratio, involving either four alleles A1A2 x
A3A4, three non-null alleles A1A2 x A1A3, three non-null
alleles and a null allele A1A2 x A30, or two null alleles and
two non-null alleles A10 x A20;

B. Loci that are heterozygous in both parents and segre-
gate in a 1:2:1 ratio, which include three groups:

B1. One parent has two different dominant alleles and the
other has one dominant allele and one null allele, e.g.,
A1A2 x A10;

B2. The reciprocal of B1;

B3. Both parents have the same genotype of two codomi-
nant alleles, i.e., A1A2 x A1A2;

C. Loci that are heterozygous in both parents and segregate
in a 3:1 ratio, i.e., A10 x A10;

D. Loci that are in the testcross configuration between the
parents and segregate in a 1:1 ratio, which include two
groups:

D1. Heterozygous in one parent and homozygous in the
other, including three alleles A1A2 x A3A3, two alleles A1A2
x A1A1, A1A2 x 00 and A20 x A1A1, and one allele (with
three null alleles) A10 x 00;

D2. The reciprocals of D1.

The marker group A is regarded as containing fully inform-
ative markers because of the complete distinction of the
four progeny genotypes. The other six groups all contain
the partially informative markers since some progeny geno-
type cannot be phenotypically separated from other gen-
otypes. This incomplete distinction leads to the
segregation ratios 1:2:1 (B), 3:1 (C) and 1:1 (D). Note that
marker group D can be viewed as fully informative if we
are only interested in the heterozygous parent.

In the preceding section, we defined a (4 x 4)-matrix H for
joint genotype frequencies between two fully informative
markers. But for partially informative markers, only the
joint phenotypes can be observed and, thus, the joint genotype
frequencies, as shown in H, will be collapsed according
to the same phenotype. Wu et al. [11] designed
specific incidence matrices (I) relating the genotype frequencies
to the phenotype frequencies for different types
of markers. Here, we use the notation $H' = I_{b_1}^{T} H I_{b_2}$ for a
($b_1 \times b_2$) matrix of the phenotype frequencies between two
partially informative markers, where $b_1$ and $b_2$ are the
numbers of distinguishable phenotypes for markers M1
and M2, respectively. Correspondingly, we have
$(DH)' = I_{b_1}^{T}(D \circ H) I_{b_2}$, $P' = I_{b_1}^{T} P I_{b_2}$ and $Q' = I_{b_1}^{T} Q I_{b_2}$. The
EM algorithm can then be developed to estimate the
recombination fraction between any two partially informative
markers.
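The collapsing step can be illustrated numerically. The sketch below assumes two hypothetical incidence matrices written from the verbal description of the marker groups: one for a 1:2:1 marker of type A1A2 x A1A2 (genotypes 14 and 23 share a phenotype) and one for a 3:1 marker of type A10 x A10 (genotypes 13, 14 and 23 share the dominant phenotype). These matrices and all names are illustrative; the incidence matrices actually tabulated by Wu et al. [11] are not reproduced here.

```python
import numpy as np

# Rows: genotypes 13, 14, 23, 24. Columns: distinguishable phenotypes.
I_B3 = np.array([[1, 0, 0],    # 13 -> A1A1
                 [0, 1, 0],    # 14 -> A1A2
                 [0, 1, 0],    # 23 -> A2A1, same phenotype as 14
                 [0, 0, 1]])   # 24 -> A2A2
I_C = np.array([[1, 0],        # 13 -> dominant phenotype
                [1, 0],        # 14 -> dominant phenotype
                [1, 0],        # 23 -> dominant phenotype
                [0, 1]])       # 24 -> null phenotype

def genotype_matrix_11(r):
    """4 x 4 genotype frequencies under [11][22] x [33][44]."""
    g = np.array([[1.0 - r, r], [r, 1.0 - r]]) / 2.0
    return np.kron(g, g)

def collapse(H, I1, I2):
    """Phenotype frequencies H' = I1^T H I2."""
    return I1.T @ H @ I2

H_prime = collapse(genotype_matrix_11(0.1), I_B3, I_C)  # shape (3, 2)
```

The row sums of `H_prime` recover the 1:2:1 ratio of the first marker and the column sums the 3:1 ratio of the second, whatever the value of r.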

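E Step: At step τ, based on the matrix (DH)' derived from
the current estimate $r^{\{\tau\}}$, calculate the expected number of
recombination events between the two markers for a
given progeny genotype and $p^{\{\tau+1\}}$, $q^{\{\tau+1\}}$:

$$c^{\{\tau+1\}}_{j_1 j_2} = \frac{(dh)'^{\{\tau\}}_{j_1 j_2}}{h'^{\{\tau\}}_{j_1 j_2}}\, n_{j_1 j_2}, \qquad (8)$$

$$p^{\{\tau+1\}} = \frac{1}{n}\sum_{j_1=1}^{b_1}\sum_{j_2=1}^{b_2} \frac{p'^{\{\tau\}}_{j_1 j_2}}{h'^{\{\tau\}}_{j_1 j_2}}\, n_{j_1 j_2}, \qquad (9)$$

$$q^{\{\tau+1\}} = \frac{1}{n}\sum_{j_1=1}^{b_1}\sum_{j_2=1}^{b_2} \frac{q'^{\{\tau\}}_{j_1 j_2}}{h'^{\{\tau\}}_{j_1 j_2}}\, n_{j_1 j_2}, \qquad (10)$$

where $(dh)'_{j_1 j_2}$, $h'_{j_1 j_2}$, $p'_{j_1 j_2}$ and $q'_{j_1 j_2}$ are the $(j_1 j_2)$th elements
of matrices (DH)', H', P' and Q', respectively.

M Step: Calculate $r^{\{\tau+1\}}$ using the equation,

$$r^{\{\tau+1\}} = \frac{1}{2n}\sum_{j_1=1}^{b_1}\sum_{j_2=1}^{b_2} c^{\{\tau+1\}}_{j_1 j_2}. \qquad (11)$$

The E and M steps between Eqs. (8)-(11) are repeated
until the estimate converges to a stable value.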

Three-locus analysis
A general framework
Consider three markers in a linkage group that have three
possible orders $M_1$-$M_2$-$M_3$ ($\mathcal{O}_1$), $M_1$-$M_3$-$M_2$ ($\mathcal{O}_2$)
and $M_2$-$M_1$-$M_3$ ($\mathcal{O}_3$). Let $o_1$, $o_2$ and $o_3$ be the corresponding
probabilities of occurrence of these orders in the
parental genome. Without loss of generality, for a given
order, the allelic arrangement of the first marker between
the two homologous chromosomes can be fixed for a parent.
Thus, the change of the allelic arrangements at the
other two markers will lead to 2 x 2 = 4 parental diplotypes.
The three-marker genotype of parent P (12/12/12)
may have four possible diplotypes, [111] [222], [112]
[221], [121] [212] and [122] [211]. Relative to the fixed
allelic arrangement 1|2| of the first marker on the two
homologous chromosomes 1 and 2, the probabilities of
allelic arrangements 1|2| and 2|1| are denoted as $p_1$ and
$1 - p_1$ for the second marker and as $p_2$ and $1 - p_2$ for the third
marker, respectively. Assuming that allelic arrangements
are independent between the second and third marker,
the probabilities of these four three-marker diplotypes can
be described by $p_1 p_2$, $p_1(1 - p_2)$, $(1 - p_1)p_2$ and
$(1 - p_1)(1 - p_2)$, respectively. The four diplotypes of parent Q can also
be constructed, whose probabilities are defined as $q_1 q_2$,
$q_1(1 - q_2)$, $(1 - q_1)q_2$ and $(1 - q_1)(1 - q_2)$, respectively. Thus,
there are 4 x 4 = 16 possible diplotype combinations
(whose probabilities are the product of the corresponding
diplotype probabilities) when parents P and Q are
crossed.
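The 16 combination probabilities can be enumerated directly. The sketch below codes an allelic arrangement as 1 for 1|2| and 2 for 2|1|, so that a marker with arrangement probability u contributes u^(2 - x) (1 - u)^(x - 1); the function names are illustrative and not from the article.

```python
from itertools import product

def arrangement_prob(prob, x):
    """Probability of arrangement x (1 for 1|2|, 2 for 2|1|) when the
    arrangement 1|2| has probability prob."""
    return prob ** (2 - x) * (1 - prob) ** (x - 1)

def combination_weights(p1, p2, q1, q2):
    """Map each (x1, x2, y1, y2) to the probability of that diplotype
    combination, assuming independent arrangements across markers and
    parents as stated in the text."""
    return {
        (x1, x2, y1, y2): (arrangement_prob(p1, x1) * arrangement_prob(p2, x2)
                           * arrangement_prob(q1, y1) * arrangement_prob(q2, y2))
        for x1, x2, y1, y2 in product((1, 2), repeat=4)
    }
```

The 16 weights always sum to one, and when every probability is 0 or 1 a single combination receives all of the weight.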

Let $r_{12}$ denote the recombination fraction between markers
M1 and M2, with $r_{23}$ and $r_{13}$ defined similarly. These
recombination fractions are associated with the probabilities
with which a crossover occurs between markers M1
and M2 and between markers M2 and M3. The events
that a crossover occurs in both intervals or in neither interval
are denoted by $D_{11}$ and $D_{00}$, respectively, whereas the events
that a crossover occurs only in the first interval or only in the
second interval are denoted by $D_{10}$ and $D_{01}$, respectively.




The probabilities of the events $D_{00}$, $D_{01}$, $D_{10}$ and $D_{11}$ are
denoted by $d_{00}$, $d_{01}$, $d_{10}$ and $d_{11}$, respectively, whose sum
equals 1. According to the definition of recombination fraction
as the probability of a crossover between a pair of loci, we have
$r_{12} = d_{10} + d_{11}$, $r_{23} = d_{01} + d_{11}$ and $r_{13} = d_{01} + d_{10}$. These relationships
have been used by Haldane [17] to derive the map function
that converts the recombination fraction to the
corresponding genetic distance.
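These relationships are easy to check numerically. The sketch below converts the four crossover-event probabilities into the three recombination fractions and applies Haldane's map function, m = -(1/2) ln(1 - 2r) in Morgans; the function names are illustrative, and the no-interference values used in the usage note are an assumption chosen only for the example.

```python
import math

def recombination_fractions(d00, d01, d10, d11):
    """Return (r12, r23, r13) from the crossover-event probabilities."""
    if abs(d00 + d01 + d10 + d11 - 1.0) > 1e-9:
        raise ValueError("event probabilities must sum to 1")
    return d10 + d11, d01 + d11, d01 + d10

def haldane_distance(r):
    """Haldane map distance in Morgans for a recombination fraction r < 0.5."""
    return -0.5 * math.log(1.0 - 2.0 * r)
```

For instance, with independent crossovers at rates 0.1 and 0.2 the event probabilities are (0.72, 0.18, 0.08, 0.02), which give r13 = 0.26, and the Haldane distances of the two intervals add up to the distance between the outer markers.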

For a three-point analysis, there are a total of 16 (16 x 4)-matrices
for genotype frequencies under a given marker
order ($\mathcal{O}_k$), each corresponding to a diplotype combination,
denoted by $H^{k}_{x_1 x_2 y_1 y_2}$, where $x_1, x_2$ = 1 for 1|2| or 2
for 2|1| denote the two alternative allelic arrangements of
the second and third marker, respectively, for parent P,
and $y_1, y_2$ = 1 for 1|2| or 2 for 2|1| denote the two alternative
allelic arrangements of the second and third
marker, respectively, for parent Q. According to Ridout et
al. [18] and Wu et al. [11], elements in $H^{k}_{x_1 x_2 y_1 y_2}$ are
expressed in terms of $d_{00}$, $d_{01}$, $d_{10}$ and $d_{11}$.

Similarly, there are 16 (16 x 4)-matrices for the expected
numbers of crossovers that have occurred for $D_{00}$, $D_{01}$, $D_{10}$
and $D_{11}$ for a given marker order, denoted by $D^{00,k}_{x_1 x_2 y_1 y_2}$,
$D^{01,k}_{x_1 x_2 y_1 y_2}$, $D^{10,k}_{x_1 x_2 y_1 y_2}$ and $D^{11,k}_{x_1 x_2 y_1 y_2}$, respectively. In their
Table 2, Wu et al. [11] gave the three-locus genotype frequencies
and the number of crossovers on different
marker intervals under marker order $\mathcal{O}_1$.

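The joint genotype frequencies of the three markers can be
viewed as a mixture of 16 diplotype combinations and
three orders, weighted by their occurring probabilities,
and is expressed as

$$H = \sum_{k=1}^{3} o_k \sum_{x_1=1}^{2}\sum_{x_2=1}^{2}\sum_{y_1=1}^{2}\sum_{y_2=1}^{2} p_1^{2-x_1}(1-p_1)^{x_1-1}\, p_2^{2-x_2}(1-p_2)^{x_2-1}\, q_1^{2-y_1}(1-q_1)^{y_1-1}\, q_2^{2-y_2}(1-q_2)^{y_2-1}\, H^{k}_{x_1 x_2 y_1 y_2}. \qquad (12)$$

Similarly, the expected number of recombination events
contained within a progeny genotype is the mixture of the
different diplotype and order combinations, expressed as:

$$D_{l_1 l_2} = \sum_{k=1}^{3} o_k \sum_{x_1=1}^{2}\sum_{x_2=1}^{2}\sum_{y_1=1}^{2}\sum_{y_2=1}^{2} p_1^{2-x_1}(1-p_1)^{x_1-1}\, p_2^{2-x_2}(1-p_2)^{x_2-1}\, q_1^{2-y_1}(1-q_1)^{y_1-1}\, q_2^{2-y_2}(1-q_2)^{y_2-1}\, D^{l_1 l_2,k}_{x_1 x_2 y_1 y_2}, \qquad (13)\text{-}(16)$$

for $l_1 l_2$ = 00, 01, 10 and 11, respectively.

Also define

$$P_1 = \sum_{k=1}^{3} o_k \sum_{x_2=1}^{2}\sum_{y_1=1}^{2}\sum_{y_2=1}^{2} p_1\, p_2^{2-x_2}(1-p_2)^{x_2-1}\, q_1^{2-y_1}(1-q_1)^{y_1-1}\, q_2^{2-y_2}(1-q_2)^{y_2-1}\, H^{k}_{1 x_2 y_1 y_2},$$

$$P_2 = \sum_{k=1}^{3} o_k \sum_{x_1=1}^{2}\sum_{y_1=1}^{2}\sum_{y_2=1}^{2} p_1^{2-x_1}(1-p_1)^{x_1-1}\, p_2\, q_1^{2-y_1}(1-q_1)^{y_1-1}\, q_2^{2-y_2}(1-q_2)^{y_2-1}\, H^{k}_{x_1 1 y_1 y_2},$$

$$Q_1 = \sum_{k=1}^{3} o_k \sum_{x_1=1}^{2}\sum_{x_2=1}^{2}\sum_{y_2=1}^{2} p_1^{2-x_1}(1-p_1)^{x_1-1}\, p_2^{2-x_2}(1-p_2)^{x_2-1}\, q_1\, q_2^{2-y_2}(1-q_2)^{y_2-1}\, H^{k}_{x_1 x_2 1 y_2},$$

$$Q_2 = \sum_{k=1}^{3} o_k \sum_{x_1=1}^{2}\sum_{x_2=1}^{2}\sum_{y_1=1}^{2} p_1^{2-x_1}(1-p_1)^{x_1-1}\, p_2^{2-x_2}(1-p_2)^{x_2-1}\, q_1^{2-y_1}(1-q_1)^{y_1-1}\, q_2\, H^{k}_{x_1 x_2 y_1 1}. \qquad (17)$$

The occurring probabilities of the three marker orders are
the mixture of all diplotype combinations, expressed, in
matrix notation, as

$$O_k = o_k \sum_{x_1=1}^{2}\sum_{x_2=1}^{2}\sum_{y_1=1}^{2}\sum_{y_2=1}^{2} p_1^{2-x_1}(1-p_1)^{x_1-1}\, p_2^{2-x_2}(1-p_2)^{x_2-1}\, q_1^{2-y_1}(1-q_1)^{y_1-1}\, q_2^{2-y_2}(1-q_2)^{y_2-1}\, H^{k}_{x_1 x_2 y_1 y_2}, \quad k = 1, 2, 3. \qquad (18)$$

We implement the EM algorithm to estimate the MLEs of
the recombination fractions between the three markers.
The general equations formulating the iteration of the {τ
+ 1}th EM step are given as follows:

E Step: At step τ, calculate the expected number of recombination
events associated with $D_{00}$ ($\alpha$), $D_{01}$ ($\beta$), $D_{10}$ ($\gamma$) and
$D_{11}$ ($\delta$) for the $(j_1 j_2 j_3)$th progeny genotype (where $j_1$, $j_2$ and
$j_3$ denote the progeny genotypes of the three individual
markers, respectively):

$$\alpha^{\{\tau+1\}}_{j_1 j_2 j_3} = D^{00\{\tau\}}_{j_1 j_2 j_3}\, n_{j_1 j_2 j_3}, \qquad (19)$$

$$\beta^{\{\tau+1\}}_{j_1 j_2 j_3} = D^{01\{\tau\}}_{j_1 j_2 j_3}\, n_{j_1 j_2 j_3}, \qquad (20)$$

$$\gamma^{\{\tau+1\}}_{j_1 j_2 j_3} = D^{10\{\tau\}}_{j_1 j_2 j_3}\, n_{j_1 j_2 j_3}, \qquad (21)$$

$$\delta^{\{\tau+1\}}_{j_1 j_2 j_3} = D^{11\{\tau\}}_{j_1 j_2 j_3}\, n_{j_1 j_2 j_3}. \qquad (22)$$

Calculate $p_1^{\{\tau+1\}}$, $p_2^{\{\tau+1\}}$, $q_1^{\{\tau+1\}}$, $q_2^{\{\tau+1\}}$ and $o_k^{\{\tau+1\}}$ (k = 1, 2, 3) using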


Table 2: Estimation from three-point analysis of the recombination fraction ($\hat{r}$ ± SD) and the parental diplotype probabilities of parent P ($\hat{p}$) and Q ($\hat{q}$) for five markers in a full-sib family of n = 100

         Parental
         diplotype     r ± SD                                                  r ± SD
Marker   P x Q         Case 1            Case 2            p       q          Case 1            Case 2            p    q

Recombination fraction = 0.05

M1       a b x c d
                       0.05 1 ± 0.0175
M2       a b x a b                       0.1008 ± 0.0298   0.9978  0.9986
                       0.0578 ± 0.0269                                        0.0557 ± 0.0312
M3       a o x o a                                         0.9977  0                            0.0988 ± 0.0277   1    0
                       0.0512 ± 0.0307                                        0.0476 ± 0.0280                     1    1/0
M4       a b x b b                       0.0932 ± 0.0301   1       1/0                                            1    1/0
                       0.0514 ± 0.0229
M5       a b x c d                                         1       1

Recombination fraction = 0.20

M1       a b x c d
                       0.2026 ± 0.0348
M2       a b x a b                       0.3282 ± 0.0482   0.9918  0.9916
                       0.2240 ± 0.0758                                        0.2408 ± 0.0939
M3       a o x o a                                         0.9944  0                            0.3241 ± 0.0488   1    0
                       0.1927 ± 0.0613                                        0.1824 ± 0.0614
M4       a b x b b                       0.3161 ± 0.0502   1       1/0                                            1    1/0
                       0.2017 ± 0.0393
M5       a b x c d                                         1       1

Case 1 denotes the recombination fraction between two adjacent markers, whereas case 2 denotes the recombination fraction between the two
markers separated by a third marker. See Table 1 for other explanations.


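$$p_1^{\{\tau+1\}} = \frac{1}{n}\sum_{j_1=1}^{4}\sum_{j_2=1}^{4}\sum_{j_3=1}^{4} \frac{p^{\{\tau\}}_{1(j_1 j_2 j_3)}}{h^{\{\tau\}}_{j_1 j_2 j_3}}\, n_{j_1 j_2 j_3}, \qquad (23)$$

$$p_2^{\{\tau+1\}} = \frac{1}{n}\sum_{j_1=1}^{4}\sum_{j_2=1}^{4}\sum_{j_3=1}^{4} \frac{p^{\{\tau\}}_{2(j_1 j_2 j_3)}}{h^{\{\tau\}}_{j_1 j_2 j_3}}\, n_{j_1 j_2 j_3}, \qquad (24)$$

$$q_1^{\{\tau+1\}} = \frac{1}{n}\sum_{j_1=1}^{4}\sum_{j_2=1}^{4}\sum_{j_3=1}^{4} \frac{q^{\{\tau\}}_{1(j_1 j_2 j_3)}}{h^{\{\tau\}}_{j_1 j_2 j_3}}\, n_{j_1 j_2 j_3}, \qquad (25)$$

$$q_2^{\{\tau+1\}} = \frac{1}{n}\sum_{j_1=1}^{4}\sum_{j_2=1}^{4}\sum_{j_3=1}^{4} \frac{q^{\{\tau\}}_{2(j_1 j_2 j_3)}}{h^{\{\tau\}}_{j_1 j_2 j_3}}\, n_{j_1 j_2 j_3}, \qquad (26)$$

$$o_k^{\{\tau+1\}} = \frac{1}{n}\sum_{j_1=1}^{4}\sum_{j_2=1}^{4}\sum_{j_3=1}^{4} \frac{o^{\{\tau\}}_{k(j_1 j_2 j_3)}}{h^{\{\tau\}}_{j_1 j_2 j_3}}\, n_{j_1 j_2 j_3}, \qquad (27)$$

where $n_{j_1 j_2 j_3}$ denotes the number of progeny with a particular
three-marker genotype, and $h_{j_1 j_2 j_3}$, $D^{00}_{j_1 j_2 j_3}$, $D^{01}_{j_1 j_2 j_3}$,
$D^{10}_{j_1 j_2 j_3}$, $D^{11}_{j_1 j_2 j_3}$, $p_{1(j_1 j_2 j_3)}$, $p_{2(j_1 j_2 j_3)}$, $q_{1(j_1 j_2 j_3)}$ and $q_{2(j_1 j_2 j_3)}$ are
the $(j_1 j_2 j_3)$th elements of matrices H, $D_{00}$, $D_{01}$, $D_{10}$, $D_{11}$, $P_1$,
$P_2$, $Q_1$ and $Q_2$, respectively.

M Step: Calculate $d_{00}^{\{\tau+1\}}$, $d_{01}^{\{\tau+1\}}$, $d_{10}^{\{\tau+1\}}$ and $d_{11}^{\{\tau+1\}}$ using
the equations,

$$d_{00}^{\{\tau+1\}} = \frac{1}{2n}\sum_{j_1=1}^{4}\sum_{j_2=1}^{4}\sum_{j_3=1}^{4} \alpha^{\{\tau+1\}}_{j_1 j_2 j_3}, \qquad (28)$$

$$d_{01}^{\{\tau+1\}} = \frac{1}{2n}\sum_{j_1=1}^{4}\sum_{j_2=1}^{4}\sum_{j_3=1}^{4} \beta^{\{\tau+1\}}_{j_1 j_2 j_3}, \qquad (29)$$

$$d_{10}^{\{\tau+1\}} = \frac{1}{2n}\sum_{j_1=1}^{4}\sum_{j_2=1}^{4}\sum_{j_3=1}^{4} \gamma^{\{\tau+1\}}_{j_1 j_2 j_3}, \qquad (30)$$

$$d_{11}^{\{\tau+1\}} = \frac{1}{2n}\sum_{j_1=1}^{4}\sum_{j_2=1}^{4}\sum_{j_3=1}^{4} \delta^{\{\tau+1\}}_{j_1 j_2 j_3}. \qquad (31)$$

The E and M steps are repeated among Eqs. (19)-(32)
until $d_{00}$, $d_{01}$, $d_{10}$ and $d_{11}$ converge to values with satisfactory
precision. From the MLEs of the d's, the MLEs of recombination
fractions $r_{12}$, $r_{13}$ and $r_{23}$ can be obtained according
to the invariance property of the MLEs.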

Model for partially informative markers
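Consider three partially informative markers with the
numbers of distinguishable phenotypes denoted by $b_1$, $b_2$
and $b_3$, respectively. Define $H' = (I_{b_1}^{T} \otimes I_{b_2}^{T}) H I_{b_3}$ as a ($b_1 b_2 \times b_3$)
matrix of genotype frequencies for three partially
informative markers. Similarly, we define
$(HD_{00})' = (I_{b_1}^{T} \otimes I_{b_2}^{T})(D_{00} \circ H) I_{b_3}$,
$(HD_{01})' = (I_{b_1}^{T} \otimes I_{b_2}^{T})(D_{01} \circ H) I_{b_3}$,
$(HD_{10})' = (I_{b_1}^{T} \otimes I_{b_2}^{T})(D_{10} \circ H) I_{b_3}$,
$(HD_{11})' = (I_{b_1}^{T} \otimes I_{b_2}^{T})(D_{11} \circ H) I_{b_3}$,
$P_1' = (I_{b_1}^{T} \otimes I_{b_2}^{T}) P_1 I_{b_3}$, $P_2' = (I_{b_1}^{T} \otimes I_{b_2}^{T}) P_2 I_{b_3}$,
$Q_1' = (I_{b_1}^{T} \otimes I_{b_2}^{T}) Q_1 I_{b_3}$, $Q_2' = (I_{b_1}^{T} \otimes I_{b_2}^{T}) Q_2 I_{b_3}$,
$O_1' = (I_{b_1}^{T} \otimes I_{b_2}^{T}) O_1 I_{b_3}$, $O_2' = (I_{b_1}^{T} \otimes I_{b_2}^{T}) O_2 I_{b_3}$
and $O_3' = (I_{b_1}^{T} \otimes I_{b_2}^{T}) O_3 I_{b_3}$.

Using the procedure described in Section (2.2), we implement
the EM algorithm to estimate the MLEs of the
recombination fractions among the three partially
informative markers.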

m-point analysis
Three-point analysis considering the dependence of
recombination events among different marker intervals
can be extended to perform the linkage analysis of an arbitrary
number of markers. Suppose there are m ordered
markers on a linkage group. The joint genotype probabilities
of the m markers form a ($4^{m-1} \times 4$)-dimensional
matrix. There are $2^{m-1} \times 2^{m-1}$ such probability matrices, each
corresponding to a different parental diplotype combination.
Reasonable estimates of the recombination fractions
rely upon the characterization of a most likely
parental diplotype combination based on the multilocus
likelihood values calculated.

The m-marker joint genotype probabilities can be
expressed as a function of the probability of whether or
not there is a crossover occurring between two adjacent
markers, $D_{l_1 l_2 \cdots l_{m-1}}$, where $l_1, l_2, \ldots, l_{m-1}$ are the indicator
variables denoting the crossover event between markers
M1 and M2, markers M2 and M3, ..., and markers
$M_{m-1}$ and $M_m$, respectively. An indicator is defined as 1
if there is a crossover and 0 otherwise. Because each indicator
can be taken as one or zero, there are a total of $2^{m-1}$
D's.

The occurring probability of interval-specific crossover
$d_{l_1 l_2 \cdots l_{m-1}}$ can be estimated using the EM algorithm. In the
E step, the expected number of interval-specific crossovers
is calculated (see Eqs. (19)-(22) for three-point
analysis). In the M step, an explicit equation is used to
estimate the probability $d_{l_1 l_2 \cdots l_{m-1}}$. The MLEs of $d_{l_1 l_2 \cdots l_{m-1}}$
are further used to estimate m(m - 1)/2 recombination
fractions between all possible marker pairs. In m-point
analysis, parental diplotypes and gene orders can be
incorporated in the model.
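The last step, going from the crossover-event probabilities to the m(m - 1)/2 pairwise recombination fractions, can be sketched as follows. The sketch generalizes the three-locus relation r13 = d01 + d10 by treating two markers as recombinant when an odd number of the indicators between them equal 1; this generalization, the helper that builds event probabilities without interference, and all names are assumptions made for illustration.

```python
from itertools import product

def pairwise_recombination(d, m):
    """Recombination fractions for all marker pairs.

    d maps each indicator tuple (l_1, ..., l_{m-1}) to its probability.
    Markers a < b (1-based) are recombinant when the indicators of the
    intervals lying between them sum to an odd number.
    """
    r = {}
    for a in range(m):
        for b in range(a + 1, m):
            r[(a + 1, b + 1)] = sum(
                prob for l, prob in d.items() if sum(l[a:b]) % 2 == 1
            )
    return r

def independent_events(rates):
    """Hypothetical helper: event probabilities when crossovers in the
    m - 1 intervals occur independently at the given rates."""
    d = {}
    for l in product((0, 1), repeat=len(rates)):
        prob = 1.0
        for li, ri in zip(l, rates):
            prob *= ri if li else 1.0 - ri
        d[l] = prob
    return d
```

With m = 4 and independent rates 0.1, 0.2 and 0.05, `pairwise_recombination(independent_events([0.1, 0.2, 0.05]), 4)` returns six fractions, including 0.26 for markers 1 and 3.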

Monte Carlo simulation
Simulation studies are performed to investigate the statis-
tical properties of our model for simultaneously estimat-
ing linkage, parental diplotype and gene order in a full-sib
family derived from two outbred parents. Suppose there
are five markers of a known order on a chromosome.
These five markers are segregating differently in order,
1:1:1:1, 1:2:1, 3:1, 1:1 and 1:1:1:1. The diplotypes of the
two parents for the five markers are given in Table 1 and
using these two parents a segregating full-sib family is gen-
erated. In order to examine the effects of parameter space
on the estimation of linkage, parental diplotype and gene
order, the full-sib family is simulated with different
degrees of linkage (r = 0.05 vs. 0.20) and different sample
sizes (n = 100 vs. 200).

As expected, the estimation precision of the recombina-
tion fraction depends on the marker type, the degree of
linkage and sample size. More informative markers, more
tightly linked markers and larger sample sizes display
greater estimation precision of linkage than less informa-
tive markers, less tightly linked markers and smaller sam-
ple sizes (Tables 1 and 2). To save space, we do not give
the results about the effects of sample size in the tables.
Our model can provide an excellent estimation of paren-
tal linkage phases, i.e., parental diplotype, in two-point
analysis. For example, the MLE of the probability (p or q)
of parental diplotype is close to 1 or 0 (Table 1), suggesting
that we can always accurately estimate parental diplotypes.
But for two symmetrical markers (e.g., markers M2
and M3 in this example), two sets of MLEs, p = 1, q = 0
and p = 0, q = 1, give an identical likelihood ratio test
statistic. Thus, two-point analysis cannot specify parental




BMC Genetics 2004, 5:20








http://www.biomedcentral.com/1471-2156/5/20


diplotypes for symmetrical markers even when the two
parents have different diplotypes.

The estimation precision of linkage can be increased when
a three-point analysis is performed (Table 2), but this
depends on the marker types and the degree of
linkage. The advantage of three-point analysis over two-point
analysis is more pronounced for partially than for fully
informative markers, and for less tightly than for more tightly
linked markers. For example, the sampling error of the
MLE of the recombination fraction (assuming r = 0.20)
between markers M2 and M3 from two-point analysis
is 0.0848, whereas this value from a three-point analysis
decreases to 0.0758 when combining fully informative
marker M1 but increases to 0.0939 when combining partially
informative marker M4. The three-point analysis
can clearly determine the diplotypes of different parents
as long as one of the three markers is asymmetrical. In our
example, using either asymmetrical marker M1 or M4,
the diplotypes of the two parents for the two symmetrical
markers (M2 and M3) can be determined. Our model
for three-point analysis can also determine a most likely gene
order. In the three-point analyses combining markers
M1-M3, markers M2-M4 and markers M3-M5, the
MLEs of the probabilities of gene order are all almost
equal to 1, suggesting that the estimated gene order is
consistent with the order hypothesized.

To demonstrate how our linkage analysis model is more
advantageous than the existing models for a full-sib family
population, we carry out a simulation study for linked
dominant markers. In two-point analysis, two different
parental diplotype combinations are assumed: (1) [aa]
[oo] x [aa] [oo] (cis x cis) and (2) [ao] [oa] x [ao] [oa] (trans
x trans). The MLE of the linkage under combination (2),
in which two dominant alleles are in a repulsion phase, is
not as precise as that under combination (1), in which
two dominant non-alleles are in a coupling phase [12].
For a given data set with unknown linkage phase, the
traditional procedure for estimating the recombination
fraction is to calculate the likelihood values under all possible
linkage phase combinations (i.e., cis x cis, cis x trans, trans
x cis and trans x trans). The combinations cis x cis and
trans x trans have the same likelihood value, with the MLE
under one combination equal to 1 minus the MLE under
the other. The same relationship holds for cis x trans and
trans x cis. The most likely phase combination is chosen as
the one with the largest likelihood and a legitimate MLE
of the recombination fraction (r ≤ 0.5) [10].
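
The traditional procedure described above can be sketched for two dominant markers as follows. This is a minimal illustration, not the authors' program: it assumes both parents are doubly heterozygous, uses the standard four-class phenotype probabilities for dominant markers, replaces a proper optimizer by a grid search, and runs on made-up counts. All function names are hypothetical.

```python
import math

PHASES = ("cis", "trans")

def null_gamete_freq(phase, r):
    # Frequency of the double-null gamete "oo" from a doubly heterozygous parent.
    return (1.0 - r) / 2.0 if phase == "cis" else r / 2.0

def phenotype_probs(r, phase_p, phase_q):
    """Probabilities of the classes (a_/a_, a_/oo, oo/a_, oo/oo)."""
    theta = 4.0 * null_gamete_freq(phase_p, r) * null_gamete_freq(phase_q, r)
    return ((2.0 + theta) / 4.0, (1.0 - theta) / 4.0,
            (1.0 - theta) / 4.0, theta / 4.0)

def log_likelihood(counts, r, phase_p, phase_q):
    probs = phenotype_probs(r, phase_p, phase_q)
    return sum(n * math.log(p) for n, p in zip(counts, probs) if n > 0)

def grid_mle(counts, phase_p, phase_q, steps=999):
    # Grid search over 0 < r < 1; a real analysis would use EM or an optimizer.
    grid = [k / (steps + 1) for k in range(1, steps + 1)]
    best = max(grid, key=lambda r: log_likelihood(counts, r, phase_p, phase_q))
    return best, log_likelihood(counts, best, phase_p, phase_q)

counts = (66, 9, 9, 16)   # hypothetical phenotype counts, n = 100
for pp in PHASES:
    for pq in PHASES:
        r_hat, ll = grid_mle(counts, pp, pq)
        print(f"{pp} x {pq}: r_hat = {r_hat:.3f}, logL = {ll:.2f}")
```

Under these assumptions the cis x cis likelihood at r equals the trans x trans likelihood at 1 - r, which is the symmetry noted in the text, so the likelihood alone cannot separate the two and the constraint r ≤ 0.5 has to do so.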


Table 3: Comparison of the estimation of the linkage and parental diplotype between two dominant markers in a full-sib family of n = 100 from the traditional and our model

                                            Traditional model
                                            cis x cis         cis x trans       trans x cis       trans x trans     Our model

Data simulated from cis x cis
Correct diplotype combination               Correct           Incorrect         Incorrect         Incorrect
Log-likelihood (a)                          -46.2             -92.3             -92.3             -46.2
r̂ under each diplotype combination          0.1981 ± 0.0446   0.5000 ± 0.0000   0.5000 ± 0.0000   0.8018 ± 0.0446
Estimated diplotype combination             Selected
r̂ under correct diplotype combination       0.1981 ± 0.0446                                                         0.1982 ± 0.0446
Diplotype probability for parent P (p̂)                                                                              1.0000 ± 0.0000
Diplotype probability for parent Q (q̂)                                                                              1.0000 ± 0.0000

Data simulated from trans x trans
Correct diplotype combination               Incorrect         Incorrect         Incorrect         Correct
Log-likelihood (a)                          -89.6             -89.6             -89.6             -89.6
r̂ under each diplotype combination          0.8573 ± 0.1253   0.0393 ± 0.0419   0.0393 ± 0.0419   0.1426 ± 0.1253
Estimated diplotype combination                               Selected          Selected
r̂ under correct diplotype combination                                                             0.1426 ± 0.1253   0.1428 ± 0.1253
Diplotype probability for parent P (p̂)                                                                              0.0000 ± 0.0000
Diplotype probability for parent Q (q̂)                                                                              0.0000 ± 0.0000

(a) The log-likelihood values given here are those from one random simulation for each diplotype combination by the traditional model.




Table 4: Comparison of the estimation of the linkage and gene order between three dominant markers in a full-sib family of n = 100 from the traditional and our model

                                    Traditional model
                                    M1-M2-M3          M1-M3-M2          M2-M1-M3          Our model

Data simulated from [aaa] [ooo] x [aaa] [ooo]
Correct gene order                  Correct           Incorrect         Incorrect
Estimated best gene order (%) (a)   100               0                 0
r̂12                                 0.2047 ± 0.0422                                       0.2048 ± 0.0422
r̂23                                 0.1980 ± 0.0436                                       0.1985 ± 0.0434
r̂13                                 0.3245 ± 0.0619                                       0.3235 ± 0.0618
Prob(M1-M2-M3) (ô1)                                                                       0.9860 ± 0.0105
Prob(M1-M3-M2) (ô2)                                                                       0.0060 ± 0.0071
Prob(M2-M1-M3) (ô3)                                                                       0.0080 ± 0.0079

Data simulated from [aao] [ooa] x [aao] [ooa]
Correct gene order                  Correct           Incorrect         Incorrect
Estimated best gene order (%) (a)   80                11                9
r̂12                                 0.1991 ± 0.0456   0.8165 ± 0.1003   0.9284 ± 0.0724   0.2104 ± 0.0447
r̂23                                 0.1697 ± 0.0907   0.8220 ± 0.0338   0.1636 ± 0.0608   0.2073 ± 0.0754
r̂13                                 0.3218 ± 0.0755   0.2703 ± 0.0586   0.7821 ± 0.0459   0.2944 ± 0.0929
Prob(M1-M2-M3) (ô1)                                                                       0.9952 ± 0.0058
Prob(M1-M3-M2) (ô2)                                                                       0.0045 ± 0.0058
Prob(M2-M1-M3) (ô3)                                                                       0.0003 ± 0.0015

(a) The percentage of a total of 200 simulations that have the largest likelihood for a given gene order estimated from the traditional approach. In this example, used to examine the advantage of implementing gene orders, known linkage phases are assumed.


For our data set simulated from [aa] [oo] x [aa] [oo], one
can easily select cis x cis as the best estimate of the phase
combination because it corresponds to a larger likelihood
and a smaller r (Table 3). Our model incorporating the
parental diplotypes can provide comparable estimation
precision of the linkage for the data from [aa] [oo] x [aa]
[oo] and precisely determine the parental diplotypes (see
the MLEs of p and q; Table 3). Our model has a great
advantage over the traditional model for the data derived from
[ao] [oa] x [ao] [oa]. For this data set, the same likelihood
was obtained under all four possible diplotype
combinations (Table 3). In this case, one would select cis
x trans or trans x cis because these two phase combinations
are associated with a lower estimate of r. But this estimate
of r (0.0393) is biased since it is far less than the value of
0.20 hypothesized. Our model gives the same estimation
precision of the linkage for the data derived from [ao] [oa]
x [ao] [oa] as obtained when the analysis is based on a
correct diplotype combination (Table 3). Also, our model
can precisely determine the parental diplotypes (p = q =
0).

In three-point analysis, we examine the advantage of
implementing linkage analysis with gene orders. Three
dominant markers are assumed to have two different
parental diplotype combinations: (1) [aaa] [ooo] x [aaa]
[ooo] and (2) [aao] [ooa] x [aao] [ooa]. The traditional
approach is to calculate the likelihood values under three
possible gene orders and choose the one with the maximum
likelihood to estimate the linkage. Under combination (1), a
most likely gene order can be well determined and,
therefore, the recombination fractions between the three
markers well estimated, because the likelihood value of the
correct order is always larger than those of incorrect orders
(Table 4). However, under combination (2), the estimates
of linkage are not always precise because gene orders are
incorrectly determined with a frequency of 20%. The
estimates of the r's deviate largely from their actual values
when based on a wrong gene order (Table 4). Our model
incorporating gene order can provide better estimation of
linkage than the traditional approach, especially between
those markers with dominant alleles in a repulsion
phase. Furthermore, a most likely gene order can be
determined from our model at the same time as the linkage
is estimated.

Our model is further used to perform joint analyses
including more than three markers. As the number of
markers increases, the number of parameters to be
estimated increases exponentially. For four-point
analysis with a sample size of 200, convergence was slow
and the accuracy and precision of parameter estimation
were adversely affected (data not shown).
According to our simulation experience, analyses of more
than three markers can be improved
by increasing the sample size or by using the estimates from
two- or three-point analysis as initial values.
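
How quickly the model grows with the number of markers can be illustrated with a small counting sketch. It counts discrete configurations rather than free parameters, and it assumes that every marker is heterozygous in both parents, so that each parent has 2^(m-1) distinguishable diplotypes; this is an illustrative simplification, and the function name is hypothetical.

```python
from math import factorial

def model_size(m):
    """Count the discrete configurations entering an m-point analysis.

    Assumes every marker is heterozygous in both parents (an illustrative
    simplification; partially informative markers reduce these counts).
    """
    crossover_patterns = 2 ** (m - 1)             # the D's defined in the text
    gene_orders = factorial(m) // 2               # an order equals its reverse
    diplotype_combinations = (2 ** (m - 1)) ** 2  # parent P x parent Q
    return crossover_patterns, gene_orders, diplotype_combinations

for m in range(2, 6):
    print(m, model_size(m))
```

For m = 2 this gives the four phase combinations used in the two-point comparison, and for m = 3 the three gene orders compared in Table 4.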

A worked example
We use an example from the published literature [18] to
demonstrate our unifying model for simultaneous estimation
of linkage, parental diplotype and gene order. A cross was
made between two triple heterozygotes with genotype
AaVvXx for markers A, V and X. Because these three
markers are dominant, the cross generates 8 distinguishable
genotypes, with observations of 28 for A_/V_/X_, 4 for
A_/V_/xx, 12 for A_/vv/X_, 3 for A_/vv/xx, 1 for aa/V_/X_, 8 for
aa/V_/xx, 2 for aa/vv/X_ and 2 for aa/vv/xx. We first use
two-point analysis to estimate the recombination fractions
and parental diplotypes between all possible pairs of the
three markers. The recombination fraction between
markers A and V is estimated as $\hat{r}_{AV}$ = 0.3764, with
estimated parental diplotypes [Av] [aV] x [AV] [av] or [AV] [av] x
[Av] [aV]. The other two recombination fractions and the
corresponding parental diplotypes are estimated as
$\hat{r}_{VX}$ = 0.3855, [Vx] [vX] x [VX] [vx] or [VX] [vx] x [Vx]
[vX], and $\hat{r}_{XA}$ = 0.1836, [AX] [ax] x [AX] [ax],
respectively. The two-point analysis thus indicates that, in one
of the two parents, the dominant alleles of markers A and X
are in repulsion with the dominant allele of marker V.
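
The two-point estimates can be checked by hand from the eight observed counts. The sketch below is not the EM procedure of the model: it assumes the phase combinations reported above are given, collapses the counts onto each marker pair, and solves the standard four-class likelihood for two dominant markers in closed form. The function names and the "D"/"r" coding of dominant and recessive phenotypes are illustrative choices.

```python
import math

# Observed phenotype counts keyed by (A, V, X); "D" marks the dominant
# phenotype (e.g. A_) and "r" the recessive one (e.g. aa).
counts = {
    ("D", "D", "D"): 28, ("D", "D", "r"): 4,
    ("D", "r", "D"): 12, ("D", "r", "r"): 3,
    ("r", "D", "D"): 1,  ("r", "D", "r"): 8,
    ("r", "r", "D"): 2,  ("r", "r", "r"): 2,
}

def two_way_table(i, j):
    """Collapse the eight classes onto markers i and j (0 = A, 1 = V, 2 = X)."""
    table = {}
    for key, n in counts.items():
        pair = (key[i], key[j])
        table[pair] = table.get(pair, 0) + n
    return table

def theta_mle(table):
    # Class probabilities (2+t)/4, (1-t)/4, (1-t)/4, t/4; setting the score
    # to zero gives n*t**2 - b*t - 2*n4 = 0 with b = n1 - 2*(n2 + n3) - n4.
    n1 = table[("D", "D")]
    n23 = table[("D", "r")] + table[("r", "D")]
    n4 = table[("r", "r")]
    n = n1 + n23 + n4
    b = n1 - 2 * n23 - n4
    return (b + math.sqrt(b * b + 8 * n * n4)) / (2 * n)

def r_from_theta(theta, phase):
    if phase == "cis x cis":          # theta = (1 - r)**2
        return 1.0 - math.sqrt(theta)
    if phase == "cis x trans":        # theta = r * (1 - r), root with r <= 0.5
        # Valid only when theta <= 0.25, which holds for these data.
        return (1.0 - math.sqrt(1.0 - 4.0 * theta)) / 2.0
    raise ValueError(phase)

r_AV = r_from_theta(theta_mle(two_way_table(0, 1)), "cis x trans")
r_VX = r_from_theta(theta_mle(two_way_table(1, 2)), "cis x trans")
r_AX = r_from_theta(theta_mle(two_way_table(0, 2)), "cis x cis")
print(round(r_AV, 4), round(r_VX, 4), round(r_AX, 4))
```

Under these assumptions the three values agree with the reported two-point estimates to about three decimal places.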

Our subsequent three-point analysis combines parental
diplotypes and gene orders to estimate the linkage
between these three dominant markers. The estimated
gene order is X-A-V. The MLEs of the recombination
fractions are $\hat{r}_{XA}$ = 0.2120, $\hat{r}_{AV}$ = 0.3049 and
$\hat{r}_{XV}$ = 0.3049. The parental diplotype combination is
[XAV] [xav] x [XAv] [xaV] or [XAv] [xaV] x [XAV] [xav]. The
three-point analysis for these three markers by Ridout et
al. [18] led to estimates of the three recombination
fractions all equal to 0.20. But their estimates may not be
optimal because the effect of gene order on r was not
considered.

Discussion
Several statistical methods and software packages have
been developed for linkage analysis and map construction
in experimental crosses and well-structured pedigrees [2-
6], but these methods need unambiguous linkage phases
over a set of markers in a linkage group. For outcrossing
species, such as forest trees, it is not possible to know the exact
linkage phases of either of the two parents that are crossed to
generate a full-sib family prior to linkage analysis. This
uncertainty about linkage phases makes linkage mapping
in outcrossing populations much more difficult than that
in phase-known pedigrees [7,9].

In this article we present a unifying model for
simultaneously estimating the linkage, parental diplotype and gene
order in a full-sib family derived from two outbred
parents. As demonstrated by simulation studies, our model is
robust across the parameter space examined. Compared to the
traditional approaches that calculate the likelihood values
separately under all possible linkage phases or orders
[9,10,18], our approach is advantageous in three
respects. First, it provides a one-step analysis for estimating
the linkage, parental diplotype and gene order, thus
facilitating the implementation of a general method for
analyzing any segregation type of markers for outcrossing
populations in a computer software package. For some
short-generation-interval outcrossing species, we can
obtain marker information from grandparents, parents
and progeny. The model presented here allows for the use
of marker genotypes of the grandparents to derive the
diplotype of the parents. Second, our model for the first
time incorporates gene ordering into a unified linkage
analysis framework, whereas most earlier studies
emphasized only the characterization of linkage phases
through a multilocus likelihood analysis [11,14,15].
Instead of a comparative analysis of different orders, we
propose to determine a most likely gene order by
estimating the order probabilities.

Third, and most importantly, our unifying approach can
significantly improve the estimation precision of the link-
age for dominant markers whose alleles are in repulsion
phase. Previous analyses have indicated that the estimate
of the linkage between dominant markers in a repulsion
phase is biased and imprecise, especially when the linkage
is not strong and when sample size is small [12]. There are
two reasons for this: (1) the linkage phase cannot be cor-
rectly determined, and/or (2) there is a fairly high possi-
bility (20%) of detecting a wrong gene order. Our
approach provides more precise estimates of the recombi-
nation fraction because correct parental diplotypes and a
correct gene order can be determined.

Our approach will be broadly useful in genetic mapping
of outcrossing species. In practice, a two-point analysis
can first be performed to obtain the pairwise estimates of
the recombination fractions and using this pairwise infor-
mation markers are grouped based on the criteria of a
maximum recombination fraction and minimum likeli-
hood ratio test statistic [2]. The parental diplotypes of
markers in individual groups are constructed using a
three-point analysis. With a limited sample size available
in practice, we do not recommend more-than-three-point
analysis because this would introduce too many
unknown parameters to be estimated precisely. If such an
analysis is desirable, however, one may use the results
from these lower-point analyses as initial values to
improve the convergence rate and possibly the precision
of parameter estimation.

In any case, our two- and three-point analysis has built a
key stepping stone for map construction through two
approaches. One is the least-squares method, as originally
developed by Stam [5], that can integrate the pairwise
recombination fractions into reconstruction of multilocus
linkage map. The second is to use the hidden Markov
chain (HMC) model, first proposed by Lander and Green
[2], to construct genetic linkage maps by treating map
construction as a combinatorial optimization problem.
The simulated annealing algorithm [19] for searching for
optima of the multilocus likelihood function needs to be
implemented for the HMC model. A user-friendly software
package that is being written by the senior author will
implement two- and three-point analyses as well as the
algorithm for map construction based on the estimates of
pairwise recombination fractions. This software will be
made available to the public online.

Our maximum likelihood-based approach is imple-
mented with the EM algorithm. We also incorporate the
Gibbs sampler [20] into the estimation procedure of the
mixture model for the linkage characterizing different
parental diplotypes and gene orders of different markers.
The results from the Gibbs sampler are broadly consistent
with those from the EM algorithm, but the Gibbs sampler
is computationally more efficient for a complicated problem
than the EM algorithm. Therefore, the Gibbs sampler
may be particularly useful when our model is extended to
consider multiple full-sib families in which the parents
may be selected from a natural population. For such a
multi-family design, some population genetic parameters
describing the genetic structure of the original population,
such as allele frequencies and linkage disequilibrium,
should be incorporated and estimated in the model for
linkage analysis. It can be anticipated that the Gibbs sam-
pler will play an important role in estimating these
parameters simultaneously along with the linkage, link-
age phases, and gene order.

Authors' contributions
QL derived the genetic and statistical models and wrote
computer programs. YHC participated in the derivations
of models and statistical analyses. RLW conceived of ideas
and algorithms, and wrote the draft. All authors read and
approved the final manuscript.

Acknowledgements
We thank two anonymous referees for their constructive comments on
the manuscript. This work is partially supported by a University of Florida
Research Opportunity Fund (02050259) and a University of South Florida
Biodefense Grant (7222061-12) to R. W. The publication of this manuscript
is approved as Journal Series No. R-10073 by the Florida Agricultural
Experiment Station.

References
1. Flint J, Mott R: Finding the molecular basis of quantitative
traits: Successes and pitfalls. Nat Rev Genet 2001, 2:437-445.
2. Lander ES, Green P: Construction of multilocus genetic linkage
maps in humans. Proc Natl Acad Sci USA 1987, 84:2363-2367.
3. Lander ES, Green P, Abrahamson J, Barlow A, Daly MJ, Lincoln SE,
Newburg L: MAPMAKER: an interactive computer package
for constructing primary genetic linkage maps of experimental
and natural populations. Genomics 1987, 1:174-181.
4. Green P, Falls K, Crooks S: Documentation for CRIMAP,
version 2.4. Washington Univ. School of Medicine, St. Louis, MO. 1990.
5. Stam P: Construction of integrated genetic linkage maps by
means of a new computer package: JOINMAP. Plant J 1993,
3:739-744.
6. Matise TC, Perlin M, Chakravarti A: Automated construction of
genetic linkage maps using an expert system (MULTIMAP):
a human genome linkage map. Nat Genet 1994, 6:384-390.
7. Ritter E, Gebhardt C, Salamini F: Estimation of recombination
frequencies and construction of RFLP linkage maps in plants
from crosses between heterozygous parents. Genetics 1990,
125:645-654.
8. Arus P, Olarte C, Romero M, Vargas F: Linkage analysis of 10
isozyme genes in F1 segregating almond progenies. J Am Soc Hort
Sci 1994, 119:339-344.
9. Ritter E, Salamini F: The calculation of recombination
frequencies in crosses of allogamous plant species with applications
to linkage mapping. Genet Res 1996, 67:55-65.
10. Maliepaard C, Jansen J, van Ooijen JW: Linkage analysis in a
full-sib family of an outbreeding plant species: overview and
consequences for applications. Genet Res 1997, 70:237-250.
11. Wu RL, Ma CM, Painter I, Zeng ZB: Simultaneous maximum
likelihood estimation of linkage and linkage phases in
outcrossing populations. Theor Pop Biol 2002, 61:349-363.
12. Wu RL, Ma CM, Wu SS, Zeng ZB: Linkage mapping of
sex-specific differences. Genet Res 2002, 79:85-96.
13. Grattapaglia D, Sederoff R: Genetic linkage maps of Eucalyptus
grandis and Eucalyptus urophylla using a pseudo-testcross:
mapping strategy and RAPD markers. Genetics 1994,
137:1121-1137.
14. Ling S: Constructing genetic maps for outbred experimental
crosses. Ph.D. thesis, University of California, Berkeley, CA 1999.
15. Butcher PA, Williams ER, Whitaker D, Ling S, Speed TP, Moran GF:
Improving linkage analysis in outcrossed forest trees - an
example from Acacia mangium. Theor Appl Genet 2002,
104:1185-1191.
16. Dempster AP, Laird NM, Rubin DB: Maximum likelihood from
incomplete data via EM algorithm. J Roy Stat Soc Ser B 1977,
39:1-38.
17. Haldane JBS: The combination of linkage values and the
calculation of distance between the loci of linked factors. J Genet
1919, 8:299-309.
18. Ridout MS, Tong S, Vowden CJ, Tobutt KR: Three-point linkage
analysis in crosses of allogamous plant species. Genet Res 1998,
72:111-121.
19. van Laarhoven PJM, Aarts EHL: Simulated Annealing: Theory and
Applications. D. Reidel Publishing Co., Dordrecht, The Netherlands;
1987.
20. Casella G: Empirical Bayes Gibbs sampling. Biostatistics 2001,
2:485-500.

