Group Title: Algorithms for Molecular Biology 2009, 4:11
Title: Modeling genetic imprinting effects of DNA sequences with multilocus polymorphism data
CITATION PDF VIEWER THUMBNAILS PAGE IMAGE ZOOMABLE
Full Citation
STANDARD VIEW MARC VIEW
Permanent Link: http://ufdc.ufl.edu/UF00100242/00001
 Material Information
Title: Modeling genetic imprinting effects of DNA sequences with multilocus polymorphism data
Series Title: Algorithms for Molecular Biology 2009, 4:11
Physical Description: Archival
Creator: Wen S
Wang C
Berg A
Li Y
Chang MM
Fillingim RB
Wallace MR
Staud R
Kaplan L
Wu R
Publication Date: 40036
 Record Information
Bibliographic ID: UF00100242
Volume ID: VID00001
Source Institution: University of Florida
Holding Location: University of Florida
Rights Management: Open Access: http://www.biomedcentral.com/info/about/openaccess/

Downloads

This item has the following downloads:

modeling_genetic ( PDF )


Full Text




Algorithms for Molecular Biology nioVedCetra



Research

Modeling genetic imprinting effects of DNA sequences with
multilocus polymorphism data
Sheron Wen', Chenguang Wang', Arthur Berg', Yao Li1, Myron M Chang2,
Roger B Fillingim3, Margaret R Wallace4, Roland Staud4, Lee Kaplan4 and
Rongling Wu*1,5,6


Address: 'Department of Statistics, University of Florida, Gainesville, Florida 32611, USA, 2Department of Epidemiology and Health Policy
Research, University of Florida, Gainesville, Florida 32611, USA, 3Department of Community Dentistry and Behavioral Science, University of
Florida, Gainesville, Florida 32611, USA, 4Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, Florida 32611,
USA, 5Department of Public Health Sciences, Pennsylvania State College of Medicine, Hershey, Pennsylvania 17033, USA and 6Department of
Statistics, Pennsylvania State University, University Park, Pennsylvania 16802, USA
Email: Sheron Wen xwen4@ufl.edu; Chenguang Wang cgwang@cog.ufl.edu; Arthur Berg berg@ufl.edu; Yao Li li@ufl.edu;
Myron M Chang mchang@cog.ufl.edu; Roger B Fillingim rfilling@ufl.edu; Margaret R Wallace peggyw@ufl.edu;
Roland Staud staudr@ufl.edu; Lee Kaplan lee@ufl.edu; Rongling Wu* rwu@hes.hmc.psu.edu
* Corresponding author


Published: I I August 2009
Algorithms for Molecular Biology 2009, 4:11 doi:10.1 186/1748-7188-4-1 I


Received: 4 February 2009
Accepted: II August 2009


This article is available from: http://www.almob.org/content/4/I/I I
2009 Wen et al; licensee BioMed Central Ltd.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0),
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.



Abstract
Single nucleotide polymorphisms (SNPs) represent the most widespread type of DNA sequence
variation in the human genome and they have recently emerged as valuable genetic markers for
revealing the genetic architecture of complex traits in terms of nucleotide combination and
sequence. Here, we extend an algorithmic model for the haplotype analysis of SNPs to estimate
the effects of genetic imprinting expressed at the DNA sequence level. The model provides a
general procedure for identifying the number and types of optimal DNA sequence variants that are
expressed differently due to their parental origin. The model is used to analyze a genetic data set
collected from a pain genetics project. We find that DNA haplotype GAC from three SNPs,
OPRKG36T (with two alleles G and T), OPRKA843G (with alleles A and G), and OPRKC846T
(with alleles C and T), at the kappa-opioid receptor, triggers a significant effect on pain sensitivity,
but with expression significantly depending on the parent from which it is inherited (p = 0.008).
With a tremendous advance in SNP identification and automated screening, the model founded on
haplotype discovery and statistical inference may provide a useful tool for genetic analysis of any
quantitative trait with complex inheritance.


Background
In diploid organisms, there are two copies at every auto-
somal gene, one inherited from the maternal parent and
the other from the paternal parent. Both copies are
expressed to affect a trait for a majority of these genes. Yet,
there is also a small subset of genes for which one copy
from a particular parent is turned off. These genes, whose


expression depends on the parent of origin due to the epi-
genetic or imprinted mark of one copy in either the egg or
the sperm, have been thought to play an important role in
complex diseases and traits, although imprinted expres-
sion can also vary between tissues, developmental stages,
and species [1]. Anomalies derived from imprinted genes
are often manifested as developmental and neurological


Page 1 of 11
(page number not for citation purposes)







Algorithms for Molecular Biology 2009, 4:11


disorders during early development and as cancer later in
life [2-5].

The mechanisms for genetic imprinting are still incom-
pletely known, but they involve epigenetic modifications
that are erased and then reset during the formation of eggs
and sperm. Recent research shows that the parent-of-ori-
gin dependent expression of imprinted genes is related
with environmental interactions with the genome [5]. The
phenomenon ofgenomic imprinting is explained from an
evolutionary perspective [6]. Genomic imprinting evolves
in mammals with the advent of live birth through a paren-
tal battle between the sexes to control the maternal
expenditure of resources to the offspring [7].

Paternally expressed imprinted genes tend to promote off-
spring growth by extracting nutrients from the mother
during pregnancy while it is suppressed by those genes
that are maternally expressed. This genetic battle between
the paternal and maternal parents appears to continue
even after birth [8,9].

The genetic mechanisms for imprinting can be made clear
if the genomic distribution of imprinted genes and their
actions and interactions are studied. Genetic mapping
with molecular markers and linkage maps has been used
to map quantitative trait loci (QTLs) that show parent-of-
origin effects [10-12]. Using an outbred strategy appropri-
ate for plants and animals, significant imprinting QTLs
were detected for body composition and body weight in
pigs [13,14], chickens [15] and sheep [16]. Cui et al. [12]
proposed an F2-based strategy to map imprinting QTLs by
capitalizing on the difference in the recombination frac-
tion between different sexes. More explorations on the
development of imprinting models are given in Cui and
others [12,17]. Liu et al. [18] developed a random-effect
model for estimating the parent-dependent genetic vari-
ance of complex traits at imprinting QTLs.

We will propose a statistical model for estimating the
imprinting effects of DNA sequence variants that encode
a complex trait. This model uses widely available single
nucleotide polymorphisms (SNPs) that reside within a
coding sequence of the human genome. The central idea
of this model is to separate maternally- and paternally-
derived haplotypes (i.e., a linear combination of alleles at
different SNPs on a single chromosome) from observed
genotypes. By specifying one risk haplotype, i.e., one that
operates differently from the rest of haplotypes (called
non-risk haplotypes), Liu et al. [19] proposed a statistical
method for detecting risk haplotypes for a complex trait
with a random sample drawn from a natural population.
Liu et al.'s approach can be used to characterize DNA
sequence variants that encode the phenotypic value of a
trait. Wu et al. [20] constructed a general multiallelic


model in which any number of risk haplotypes can be
assumed. The best number and combination of risk hap-
lotypes can be estimated by using the likelihood and AIC
or BIC values. We will derive a computational algorithm
for estimating the imprinting effects of SNP-constructed
haplotypes with multilocus genetic data based on these
previous workings. The new algorithmic model provides a
general framework for hypothesis tests on the pattern of
genetic imprinting expressed by haplotypes. A real exam-
ple from a pain genetic study is used to demonstrate the
application of the model.

Results
The new imprinting model was used to detect the differ-
ence of gene expression between the maternal and pater-
nal parents at the haplotype level. Genetic and phenotypic
data were from a pain genetics project in which 237 sub-
jects (including 143 men and 94 women) from five differ-
ent races were sampled. All these subjects were genotyped
for three SNPs, OPRKG36T (rs1051160), with two alleles
G and T, OPRKA843G (rs702764), with alleles A and G,
and OPRKC846T (rs16918875), with alleles C and T, at
the kappa-opioid receptor [21]. Three traits of pain sensi-
tivity to heat were tested with a procedure described by
Fillingim [22], and they are the average pain rating for
heat stimuli at 49C (PreInt49tot), the increase in heat
pain threshold following administration of 0.5 mg/kg of
pentazocine (an mixed action opioid agonist-antagonist
with activity at the kappa receptor) (Hpthpent), and the
increase in heat pain tolerance following administration
of 0.5 mg/kg of pentazocine (an mixed action opioid ago-
nist-antagonist with activity at the kappa receptor) (Hpto-
pent). Prelnt49tot is a baseline pain measure before any
drug administration. Although the model allows the esti-
mation of any covariate effect, we will remove the effect
on the pain traits due to different races because of too few
samples for some races. Our final imprinting analysis was
based on 237 subjects for Prelnt49tot but on 85 subjects
for Hpthpent and Hptopent. Sex-specific haplotype fre-
quencies for the three SNPs were estimated for males and
females, from which linkage disequilibria were estimated
and tested with results given in Table 1. Of the eight hap-
lotypes, haplotype GAC occupies an overwhelming pro-
portion in both sexes (>75%). Some haplotypes, like GAT,
TAT, and TGC, are very rare, with population frequencies
tending to be zero. Linkage disequilibria between these
SNPs are generally strong: those between OPRKA843G
and OPRKC846T and between OPRKG36T and
OPRKC846T are highly significant (p < 0.01), although
that between OPRKG36T and OPRKA843G is much less
significant. There is a highly significant high-order linkage
disequilibrium among these three SNPs. It is interesting to
see significant difference in haplotype distribution
between the two sexes (Table 1). This sex-specific differ-
ence is due to the difference in linkage disequilibria


Page 2 of 11
(page number not for citation purposes)


hftp://www.almob.org/content/4/1/1 1








Algorithms for Molecular Biology 2009, 4:11


Table I: Sex-specific differences observed in haplotype frequencies and higher-order linkage disequilibria estimates


Female


Genetic Parameter


p-value


Sex-specific


p-value


Sex-specific

p-value


Haplotype Frequency
pGAC 0.780

p GAT 0.000

pGGC 0.124

PGGT 0.011

p TAC 0.049

p TAT 0.000

p TGC 0.008

p TGT 0.030

Allele Frequency and Linkage Disequilibrium

G (OPRKG36T) 0.914

pA (OPRKA843G) 0.828

SA (OPRKC846T) 0.960
D 0.034
D 12
0.026
0.023
0.023


-0.021


D 123


Estimates and tests of population genetic structure for three SNPs, OPRKG36T (with two alleles G and T),
and OPRKC846T (with alleles C and T), at the kappa-opioid receptor in males and females.


because allele frequencies seem to be mostly sex-invariant
(Table 1).

A "biallelic" model assuming that there is only one haplo-
type is used to estimate haplotype effects at the kappa-opi-
oid receptor on pain traits.

Significant haplotype effects were observed on the three
pain sensitivity traits studied. By assuming a risk haplo-
type from all possible haplotypes, we calculated the
resultant likelihood of haplotype effects, which are given
in Table 2.


OPRKA843G (with alleles A and G),


It can be seen that an optimal risk haplotype is GAC for
preint49tot and TAC for Hpthpent and Hptopent. Statisti-
cal tests indicate that these risk haplotypes trigger a signif-
icant effect on PreInt49tot (p = 0.004) and Hpthpent and
Hptopent (p = 0.025), respectively (Table 3). Risk haplo-
type GAC displays a significant additive effect (p = 0.005)
on preint49tot, but there is no dominant effect due to its
interaction with non-risk haplotypes. It is interesting to
find that risk haplotype GAC contributes to variation in
preint49tot in a parent-of-origin manner. The subjects
with risk haplotype GAC inherited from the maternal par-
ent will be significantly different in preint49tot than those
with the risk haplotype inherited from the paternal par-


Table 2: Haplotype effects are estimated over three pain sensitivity traits


GAC


Prelnt49tot
Hpthpent
Hptopent


-764.663
-195.107
-165.867


GAT

-773.196
-199.832
-170.494


GGC

-771.972
-198.640
-170.468


GGT

-770.903
-199.460
-170.216


TAC

-769.645
-193.569
-157.053


TAT

-773.196
-199.832
-170.494


TGC

-772.842
-195.345
-168.324


TGT

-772.677
-199.709
-170.414


Page 3 of 11
(page number not for citation purposes)


0.764

0.000

0.081

0.023

0.104

0.000

0.000

0.029



0.868

0.868

0.949

0.045

0.022

0.011

-0.018


0.0459

4.98e-6

8.87e-s

2.53e-4


6.166

3.501

2.943

5.77

42.28

22.216

21.088


0.1812

3.61e-6

0.0089

0.0055


0.0130

0.0613

0.0862

0.0163

7.9 1 e- I

2.44e-6

4.39e-6


http://www.almob.org/content/4/1/11








Algorithms for Molecular Biology 2009, 4:11


Table 3: Additive, dominant, imprinting, and overall effects at three SNPs


Risk Haplotype


Prelnt49tot


Hpthpent


Hptopent


Effect
p-value

Effect
p-value

Effect
p-value


GAC


Overall


-13.47
0.005

3.06
0.002

2.15
0.003


-6.22
0.237

3.08
0.003

2.33
0.003


19.02
0.008

-0.26
0.089

-0.28
0.621


0.004


0.023


0.025


Estimates of additive (a), dominant (d), and imprinting effects (i) of haplotypes at SNPs, OPRKG36T (with two alleles G and T), OPRKA843G (with
alleles A and G), and OPRKC846T (with alleles C and T), at the kappa-opioid receptor, on pain sensitivity traits.


ent. No significant imprinting effects are detected on traits
Hpthpent and Hptopent, although risk haplotype TAC
displays significant additive and dominant effects on
these two traits.

The statistical properties of the imprinting model are
investigated through simulation studies. The first simula-
tion mimics the data structure of the real example (with
237 subjects) above based on its estimates of sex-specific
allele frequencies and linkage disequilibria among three
SNPs (Table 1) and of the additive, dominant, and
imprinting effects arising from different haplotypes (for
Prelnt49tot, Table 3). The second simulation includes
sample size from 237 to 400, 800, and 2000, keeping the
other parameters unchanged. Each simulation scheme is
repeated for 1000 runs.

Results from simulation studies are summarized as fol-
lows:

(1) Population genetic parameters including three-
SNP haplotype frequencies, allele frequencies, and
linkage disequilibria of different orders can be pre-
cisely estimated even when a smaller sample size
(237) is used. As expected, the estimation precision
can be improved consistently when the sample size
increases from 237 to 2000.

(2) Quantitative genetic parameters can also be well
estimated, but the better estimation of the dominant
and imprinting effects needs a larger sample size (400
or more). With a sample size of 2000, the precision of
all parameter estimates are remarkably high, with
sampling errors of each estimate being low than 10%
of the estimate.

(3) The power to detect imprinting effects reaches 75%
for a sample size of 237, but it can increase dramati-
cally when increasing the sample size to 400.


(4) The type I error rate of detecting the imprinting
effect, i.e, a probability for a false positive discovery of
that effect, is quite low (< 10%) for a small sample size
and can be lowered when sample size increases.

The simulation for testing the type I error rate in (4) was
based on the same parameters as used in (1)-(3), except
that no imprinting effect is assumed. Because we have
derived a series of closed forms for the estimation of
parameters within the EM framework, parameter estima-
tion is very efficient. For a single simulation run, it will
take a few dozen of seconds to obtain all estimates on a
PC laptop. Also, estimates do not depend heavily on ini-
tial values of parameters, showing that the estimates
achieve a global maxima for the likelihood.

Discussion
Although a traditional view assumes that the maternally
and paternally derived alleles of each gene are expressed
simultaneously at a similar level, there are many excep-
tions where alleles are expressed from only one of the two
parental chromosomes [1,23]. This so-called genetic
imprinting or parent-of-origin effect has been thought to
play a pivotal role in regulating the phenotypic variation
of a complex trait [24-27]. With the discovery of more
imprinting genes involved in trait control through molec-
ular and bioinformatics approaches, we will be in a posi-
tion to elucidate the genetic architecture of quantitative
variation for various organisms including humans.
Genetic mapping in controlled crosses has opened up a
great opportunity for a genome-wide search of imprinting
effects by identifying imprinted quantitative trait loci
(iQTLs). This approach has successfully detected iQTLs
that are responsible for body mass and diseases
[10,11,14,19,28,29]. Cloning of these iQTLs will require
high-resolution mapping of genes, which is hardly met for
traditional linkage analysis based on the production of
recombinants in experimental crosses. Single nucleotide
polymorphisms (SNPs) are powerful markers that can
explain interindividual differences. Multiple adjacent
SNPs are especially useful to associate phenotypic varia-


Page 4 of 11
(page number not for citation purposes)


hftp://www.almob.org/content/4/1/1 1







Algorithms for Molecular Biology 2009, 4:11


ability with haplotypes [30-34]. The quantitative effect of
haplotypes on a complex trait was modeled by Liu et al.
[19] and subsequently refined byWu et al. [20].

In this article, we incorporate genetic imprinting into Wu
et al.'s [20] multiallelic model to estimate the number and
combination of multiple functional haplotypes that are
expressed differently depending on the parental origin of
these haplotypes. Because of the modeling of any possible
distinct haplotypes, the multiallelic model will have more
power for detecting significant haplotypes and their
imprinting effects than biallelic models. The imprinting
model was shown to work well in a wide range of param-
eter space for a modest sample size. However, a consider-
ably large sample size is needed if there are multiple risk
haplotypes that contribute to trait variation. By analyzing
a real example for pain genetics, the new model detects
significant haplotypes composed of three SNPs within the
kappa-opioid receptor, which may play an important role
in affecting pain sensitivity to heat.

Haplotype GAC derived from this gene appears to be
imprinted for Prelnt49tot, a pain sensitivity trait to heat
stimuli at 49 C before drug administration, leading to
different levels of pain sensitivity between the patients
when they inherit this haplotype from maternal and
paternal parents. In this example, no imprinting was
detected for Hpthpent and Hptopent, two pain sensitivity
traits measured after the patients were administrated with
pentazocine. This result should be, however, explained
with caution. First, the risk haplotype, TAG, detected for
Hpthpent and Hptopent is a rare haplotype in the
admixed population studied, although it is quite com-
mon in African Americans and Hispanics. The significance
of this rare haplotype detected could be a sample size arti-
fact, or it could be indicating a powerful haplotype effect.
Second, in a different analysis, no significant genetic asso-
ciation was detected for the same heat pain test at heat
stimuli at 52C (data not shown). Nonetheless, the
method provides a powerful tool for detecting possible
associations and imprinting effects, which provide a start-
ing point for future work to pursue the positive results
with larger sample sizes and family studies. There have
not been any previous reports suggesting an imprinting
effect at an opioid receptor locus, or related to pain meas-
ures.

In practice, although the human genome contains mil-
lions of SNPs, it is not possible and also not necessary to
model and analyze these SNPs simultaneously. These
SNPs are often distributed in different haplotype blocks
[35], within each of which a particular (small) number of
representative SNPs or htSNPs can uniquely explain most
of the haplotype variation. A minimal subset of htSNPs,
identified by several computing algorithms, can be imple-


mented into our imprinting model to detect their imprint-
ing effects at the haplotype level. In addition, our model
can be extended to model imprinting effects in a network
of interactive architecture, including haplotype-haplotype
interactions from different genomic regions, haplotype-
environment interactions, and haplotype effects regulat-
ing pharmacodynamic reactions of drugs. It can be
expected that all extensions will require expensive compu-
tation, but this computing can be made possible if combi-
natorial mathematics, graphical models, and machine
learning are incorporated into closed forms of parameter
estimation.

This imprinting model assumes that if the SNPs constitut-
ing haplotypes are tightly linked, haplotype frequencies
estimated from the current generation can be used to
approximate haplotype frequencies in the parental gener-
ation. To relax this assumption, a strategy of sampling a
panel of random families from a population is required,
in which a known family structure allows the tracing and
estimation of maternally- and paternally-derived haplo-
types. Such a strategy will help to precisely estimate and
test imprinting effects of haplotypes, providing a new
gateway for studying the genetic architecture of complex
traits.

Methods
Imprinting Model
Consider a set of three ordered SNPs, each with two alleles
1 and 0, genotyped from a candidate gene. These three
SNPs form eight haplotypes, 111, 110, 101, 100, 011,
010, 001, and 000. A risk haplotype group is defined as a
set of haplotypes that are in manner distinct from the
other haplotypes in affecting a complex trait. For example,
if a risk haplotype group only consists of the haplotype
111, the remaining seven haplotypes form the non-risk
haplotype group; this means that the diplotypes com-
posed of 111 will have different genotypic values from
those composed of 110, 101, 100, 011, 010, 001, or 000.
There may be multiple risk haplotype groups, and we let
Rk denote any risk haplotype from the kth risk haplotype
group (k = 1,...,K; K < 8, where K is the number of risk hap-
lotype groups) and R0 denote any of the remaining non-
risk haplotypes in the non-risk haplotype group. The com-
binations between the risk and non-risk haplotypes are
called the composite diplotypes, including RkRk. (k < k' =
1,...,K), RkRo and RoRo (any two non-risk haplotypes).
Here we do not distinguish between RkRk. and Rk.Rk as we
do not know parental origin of the haplotypes, however
genetic imprinting implies that the same composite diplo-
type may function differently, depending on the parental
origin of its underlying haplotypes. To reflect the parental
origin of haplotypes in the composite diplotype, the fol-
lowing notation is used: Rk|Rk' (k, k' = 1 .....K), Rk|Ro,
Ro|Rk, and Ro|Ro, where the vertical lines are used to sepa-


Page 5 of 11
(page number not for citation purposes)


hftp://www.almob.org/content/4/1/1 1







Algorithms for Molecular Biology 2009, 4:11


rate the haplotypes derived from the maternal parent
(left) and paternal parent (right).

According to traditional quantitative genetic principles,
the genotypic value of a given composite diplotype is par-
titioned into different components due to additive and
dominance genetic effects. For an imprinting model, an
additional component is the imprinting genetic effect due
to different contributions of haplotypes from the mater-
nal and paternal parents. Mathematically, the genotypic
value of a composite diplotype of known parental origins
is expressed as

uPk =p + ak,
for composite diplotype Rk I Rk
1 k'-k
kk' = Pf + -(ak + a') + dkk' + ik kk
2 |k'-k|
for composite diplotype Rk I Rk
K
Pko = I ak' + dko + ko
k'#k
for composite diplotype Rk I Ro
K
ok P + dko iko
2 h'z,
k'k
for composite diplotype Ro I Rk
K
Poo =P ak

for composite diplotype Ro I Ro

where p is the overall mean, ak is the additive effect due to
the substitution of risk haplotype k by the non-risk haplo-
type, I is the dominance effect due to the interaction
between risk haplotypes h and k'( = ). ,,, is the dom-
inance effect due to the interaction between risk haplo-
types k and the non-risk haplotype ( ,, = 1,,,), ik' is the
imprinting effects due to different parental origins of risk
haplotypes k and k'(ik' = ik'), and i'o is the imprinting
effect due to different parental origins of risk haplotype k
and the non-risk haplotype (i4o = i). The sizes and signs
of ikk. and i'o determine the extent and direction of
imprinting effects at the haplotype level.

A set of genetic parameters, including the additive, domi-
nance, and imprinting effects, define the genetic architec-
ture of a quantitative trait. By estimating and testing these
parameters, the genetic architecture of a trait can well be
studied. Specific genetic effects of haplotypes can be esti-
mated from genotypic values of the composite diplotypes
with the formulas as follows:


K
ak = K1 Kkk P + Poo
1 ,#

dkk' = IPk k' + Pk'k) (Pkuk + Pk'k')I
1
dko = I[(PUko + PoQk) (Pkk + Poo)1

1
1kk' = (- Pk k' P k'\k)
2
1
ko = (P o 0 Po0k)

Estimating Model
Genetic Structure
Suppose the three SNPs are genotyped from a natural
human population at Hardy-Weinberg equilibrium
(HWE). Let p and 1 p denote the frequencies of two
alternative alleles r; (r; = 1 or 0) at SNP I in the population
of sex s (s = M for females and P for males). Sex-specific
frequencies of eight haplotypes produced by the three
SNPs are generally expressed as p',, For each sex s, gen-
otypes consisting of three SNPs are denoted as
rlr / r2'r3'(r1 > ', r2 r3 > r), totaling 27 distinct
genotypes. Let (r1 > r ,r2 > ,r, 3 > = 1,0) denote the
observation of a three-SNP genotype for sex s, which sums
over the two sexes to nr,,irr ir r, Some genotypes are
consistent with diplotypes, whereas those that are hetero-
zygous at two or more SNPs are not. Each double hetero-
zygote contains two different diplotypes, and the triple
heterozygote, i.e., 10/10/10, contains four different diplo-
types: 111|000 (with probability 2p11pioo0 for sex s),

1101001 (with probability 2p loPooI for sex s), 101|010

(with probability 2Po iPoo1 for sex s), and 100|011 (with

probability 2p0ooP011 for sex s). Note that slashes are used
to separate genotypes at different SNPs and vertical lines
are used to separate haplotypes derived from the maternal
parent (left) and paternal parent (right). The observed
genotypes, the underlying diplotypes, and diplotype fre-
quencies are provided in Additional file 1. From the HWE
assumption, diplotype frequencies are simply expressed
as the products of the underlying-haplotype frequencies
derived from different parents.

For a random sample from a natural population, it is
impossible to estimate the frequencies of maternally- and


Page 6 of 11
(page number not for citation purposes)


http://www.almob.org/content/4/1/11







Algorithms for Molecular Biology 2009, 4:11


paternally-derived haplotypes. However, parent-specific
haplotype frequencies can be approximated by sex-spe-
cific haplotype frequencies in the current generation if the
SNPs studied are highly linked together. For example, we
will argue below that Plo which measures frequency for
the haplotype 100 among females in the current genera-
tion can be accurately approximated by the haplotype fre-
quency 100 in the "maternal generation" or previous
generation of females. This is proven as follows. Let
pr r2r3 (t) and Prr2r3 (t + 1) be the frequencies of a repre-
sentative three-SNP haplotype rr23r, for a monosexual
population in generations t and t + 1, respectively. The
relationship of haplotype frequencies between the two
generations is expressed as

Pr"r, (t + 1) = p, (t + 1)(t+)(t + 1)p, (t + 1)
+(-1) +2 p, (t + 1)D12(t +1)
+(-1)' (t+l)D13(t+l)
+(-1) 2+. p (t + )D23(t + 1)
-(-1)12+" D123(t +1) (generation t + 1)
= P (t)p ''. (t)
+(-1)+"p r12)D12(t)
+(-I)"+" p2 (t)(1 13)D13(t)
+(-1) .2+" Pp, (t)(1- 23)D23t)
-(-1)'i.2+ (1 -r12)(1 -13)(1 r23)D23(t) (generation t +1)
SP / (t)P2 )' (t. ) + (-1) + p,, (t)D 2(t)
+(-1)rl+ P (t)D13(t)
+(-1)' 2+ p, (t)D23(t)
-(-1)'l 2+ D123(t) (generation t + 1)
= p,, (t) (generation t)
where pq (t), p, (t), and pr (t) are the allele frequencies
of three SNPs in generation t, and D12(t), D13(t), and
(D23(t) are the linkage disequilibria between the first and
second SNPs, the first and third SNPs, and the second and
third SNPs in generation t. Under Hardy-Weinberg equi-
librium, the allele frequencies remain constant from gen-
eration to generation (p, (t) = p, (t + 1)) and the linkage
disequilibria decay in proportion with the recombination
fractions r1 in that

Dy(t + 1) = (1 r,)D,(t + 1)

and

D123(t +1) = (1 12)( r13)(1 r23)D123(t).

Because the three SNPs are genotyped from the same
region of a candidate gene, their recombination fractions


should be very small and can be thought to be close to
zero. Thus, the frequencies of maternally- and paternally-
derived haplotypes can be approximated by the estimates
of these haplotype frequencies in the female and male
populations of current generation, respectively.

Likelihoods
With a random sample from a natural population, in
which each genotyped subject is measured for a pheno-
typic trait of interest, we will develop a model to estimate
population genetic parameters, including the eight mater-
nally-derived haplotype frequencies
(pM = {Pr23 }\,r2,r3=o), the eight paternally-derived hap-

lotype frequencies (D, = {p,,,, }\,, ,r and the quan-
titative genetic parameters (4q) that include haplotype
effects (({ak,agkk',dkhoik',iOk}hlk'=l)) and the residual
variance of the trait (02). The haplotype effects are derived
uniquely from genotypic values of composite diplotypes
(({Pkk', Pkl,/o, Kolo}k=1) ) as provided in equations
(2)-(6).

Given sex-specific genotypic observations (MM and Mp)
and phenotypic data (y), a joint log-likelihood is con-
structed as

log L(pM,p q I y MMMP)
log L(M M ) + log L(QP Mp)
+logL(Q, y),MMMP, M ,Q)
Thus, maximizing the likelihood in (1) is equivalent to
individually maximizing the three terms on the right hand
side of(1).

A polynomial likelihood is constructed for the marker
data of a sex (s) to estimate Qp and 4 For notational
convenience, we define a function h(r) on genotypes
r = / r2,' / r3 to be

h(r) = (r r) + (r2 ) + (r3 -)

So, for example, h(r) = 2 if r is a double heterozygote. The
first two log likelihood on the righthand side of equation
(7) are then expressed as


Page 7 of 11
(page number not for citation purposes)


hftp://www.almob.org/content/4/1/1 1








Algorithms for Molecular Biology 2009, 4:11


Table 4: Number of choices for several multiallelic models with likelihood and model selection notation


Risk Haplotype


Log-likelihood


AIC/BIC


No. Choice


L(Q Q 2, y,M)


1 8 (one haplotype)
28 (two haplotypes)
56 (three haplotypes)
70 (four haplotypes)
2 28


3 56

4 170

5 56

6 24

7 8


log LB


log L

log LQ

log LP

log LH

logLs,

log Lo


In this table, Q~ is an estimated vector of the genotypic values of different composite diplotypes and the residual variance. The largest log-
likelihood and/or the smallest AIC or BIC value calculated is thought to correspond to the most likely risk haplotypes and the optimal number of
risk haplotypes.


log L(Q I M,)


constant +
{r:h(r)=Oj


2nr log(pSr,2)


risk haplotype(s). By assuming that haplotype 111 is only
one risk heplotype, we construct the likelihood as


+ n log(2p p,;)
{r:h(r)=1}

+ nrlog(p,12P ,rp
{r:h(r)=2}
S 5 S 5 ,5
+eP~r P3r^2 + Prr3 Per

+P12r3 1P

+n10/10/10 log(2p11Pspoo
+2psolp0lo + 2p+loP0ol + 2p1oo011)
A closed system of the EM algorithm was derived to esti-
mate these haplotype frequencies (see the Text Si). The
estimates of sex-specific haplotype frequencies are then
used to estimate haplotype effects by constructing a mix-
ture model-based likelihood. The construction of this
likelihood requires knowledge of which haplotypes are


Page 8 of 11
(page number not for citation purposes)


Model


Biallelic


Triallelic


Quadriallelic

Pentaallelic

Hexaallelic

Septemallelic

Octoallelic


http://www.almob.org/content/4/1/11







Algorithms for Molecular Biology 2009, 4:11


logL(Qq,c 2 , ,y,M)=
n111/11/11 1111/11/10

X logf11(yi)+ log lPifrl(Yi) +tffoYi)]
i=1 i=1

+ log[t2f10(i) +F 2fo1(i)]
i=1
n11111o1o
+ ,log[ 3 i10(y + 3f 01(yJ +3, fo(yi)
i=1
n 10 1111
+ log[tpy4fl0Yi) + t4fo(Yi)
i=1
no10/111/10
+ log[ io410(y+ J/o0(yi +), foo(Yr)
i=1
1010/11
+ loglt/so(i y + tf/oi((y )+ oo+Y)
i=1
n10io1010
+ log[y / io1(y + tfoi(Yi) ++7foo(Yi)]
i=1


i=i
+ l0ogo fo(y1)

and

m = n11111/oo + n11/10/00 + n11/1oo11
+n11/O/lO + n11/oo/oo + n10/11/oo
+n10O/10/O + l01/oo/11 + 101o0oolo
+n1o/oo/oo + noo/11/11 + noo/11/1o
+naOO/10/ + n0o0/10/0 + n00/10/10
+n010/10/O + no01oo/11 + n000oooo1
+n00/00/oo.
The EM algorithm is derived to estimate quantitative
genetic parameters with a detailed procedure given in
Additional file 2. The estimated genotypic values of com-
posite diplotypes are used to estimate the additive, domi-
nant and imprinting effects of haplotypes using equations
(2)-(6).

Model Selection
For an observed marker (M) and phenotypic (y) data set,
we do not know which are the risk haplotypes nor how
many there are. Standard model selection criteria such as
the AIC and BIC are used to determine the optimal
number and type of risk haplotypes. Among a total of
eight haplotypes formed by three SNPs, up to seven ones
can be risk haplotypes. The modeling of one to seven risk


haplotypes is equivalent to the analysis of the genetic data
by biallelic, triallelic, quadrialleli, pentaallelic, hexaal-
lelic, septemallelic and octoallelic models, respectively.
The biallelic model dissolves all the haplotypes into two
distinct groups, a single risk haplotype group and a non-
risk haplotype group. The risk haplotype group may be
composed of one (8 choices), two (28 choices), three (56
choices) or four haplotypes (70 choices). The triallelic,
quadrialleli, pentaallelic, hexaallelic, septemallelic and
octoallelic models contains 28, 56, 170, 56, 24 and 8
cases, respectively. The likelihood and model selection
criteria, AIC or BIC, are then calculated and compared
among different models and assumptions. This is summa-
rized in Table 4.

Hypothesis Tests
For a given data set, testing the existence of functional
haplotypes is a first step. This can be done by formulating
the following hypotheses:

Ho : P.k =k Pk-k' = Pk0 = Po0 = Po|o
(k = 1...K)
H1 : At least one of equality in H0 does not hold The log-
likelihood ratio (LR) is then calculated by plugging the
estimated parameters into the likelihood under the H0
and H1, respectively. The LR can be viewed as being
asymptotically X2-distributed with (k + 1)2 1 degrees of
freedom.

After a significant haplotype effect is detected, a series of
further tests are performed for the significance of additive,
dominance and imprinting effects triggered by haplo-
types. The null hypotheses under each of these tests can be
formulated by setting the effect being tested to be equal to
zero. For example, under the triallelic model, the null
hypothesis for testing the imprinting effect of the haplo-
typesis expressed as

Ho : i12 i10 i20 = 0.

In practice, it is also interesting to test each of the additive
genetic effects, each of the dominance effects and each of
the imprinting effects for the tri-and quadriallelic models.
The estimates of the parameters under the null hypotheses
can be obtained with the same EM algorithm derived for
the alternative hypotheses but with a constraint of the
tested effect equal to zero. The log-likelihood ratio test sta-
tistics for each hypothesis is thought to asymptotically fol-
low a X2-distributed with the degree of freedom equal to
the difference of the numbers of the parameters being
tested under the null and alternative hypotheses.

Competing interests
The authors declare that they have no competing interests.


Page 9 of 11
(page number not for citation purposes)


http://www.almob.org/content/4/1/11








Algorithms for Molecular Biology 2009, 4:11


Authors' contributions
SW, CW, and AB contributed equally to this manuscript.
SW, CW, AB, YL derived the algorithm and performed the
data analysis. RBF, MRW, RS, LK designed the experiment
and collected the data. MMC provided statistical advice
and help. RW conceived the model and wrote the paper.
AB provided final modifications to the paper. All authors
have read and approved the final manuscript.


Additional material


Additional file 1
Observed genotypes, underlying diplotypes, and diplotype frequencies
under biallelic and ocoallelic models. Two tables are provided describing
genotypes, diplotypes, and diplotype frequencies for 2 7 genotypes at three
SNPs, and genotypic values of composite diplotypes under the biallelic
(assuming that 111 is the risk haplotype and the others are the non-risk
haplotype) and ocoallelic models.
Click here for file
[http://www.biomedcentral.com/content/supplementary/1748-
7188-4-11-Sl.pdf]

Additional file 2
EM algorithm details for calculating parameter estimation. EM Algo-
rithms for estimating haplotype frequencies and for estimating quantita-
tive genetic parameters.
Click here for file
[http://www.biomedcentral.com/content/supplementary/1748-
7188-4-1 l-S2.pdf]


Acknowledgements
This work is supported by joint grant DMS/NIGMS-0540745 and NIH RO I
grant NS41670.

References
I. Reik W, Walter J: Genomic imprinting: parental influence on
the genome. Nature Reviews Genetics 2001, 2:21-32.
2. Falls J, Pulford D, Wylie A, Jirtle R: Genomic imprinting: implica-
tions for human disease. American journal of Pathology 1999,
154(3):635-647.
3. Jirtle R: Genomic imprinting and cancer. Experimental cell
research 1999, 248:18-24.
4. Luedi P, Hartemink A, Jirtle R: Genome-wide prediction of
imprinted murine genes. Genome Research 2005, 15(6):875-884.
5. Waterland R, Lin J, Smith C, Jirtle R: Post-weaning diet affects
genomic imprinting at the insulin-like growth factor 2(lgf2)
locus. Human Molecular Genetics 2006, 15(5):705-716.
6. Killian J, Byrd J, Jirtle J, Munday B, Stoskopf M, MacDonald R, Jirtle R:
M6P/IGF2R imprinting evolution in mammals. Molecular Cell
2000, 5(4):707-716.
7. Haig D: Altercation of generations: genetic conflicts of preg-
nancy. American journal of reproductive immunology (New York, NY:
1989) 1996, 35(3):226.
8. Lefebvre L, Viville S, Barton S, Ishino F, Keverne E, Surani M: Abnor-
mal maternal behaviour and growth retardation associated
with loss of the imprinted gene Mest. Nature genetics 1998,
20:163-169.
9. Li L, Keverne E, Aparicio S, Ishino F, Barton S, Surani M: Regulation
of maternal behavior and offspring growth by paternally
expressed Peg3. Science 1999, 284(5412):330.
10. de Koning D, Rattink A, Harlizius B, Van Arendonk J, Brascamp E,
Groenen M: Genome-wide scan for body composition in pigs


reveals important role of imprinting. Proceedings of the National
Academy of Sciences 2000, 97(14):7947-7950.
II. de Koning D, Bovenhuis H, van Arendonk J: On the detection of
imprinted quantitative trait loci in experimental crosses of
outbred species. Genetics 2002, 161 (2):93 1-938.
12. Cui Y, Lu Q, Cheverud J, Littell R, Wu R: Model for mapping
imprinted quantitative trait loci in an inbred F2 design.
Genomics 2006, 87(4):543-55 1.
13. Jeon J, Carlborg 0, Tornsten A, Giuffra E, Amarger V, Chardon P,
Andersen-Eklund L: A paternally expressed QTL affecting skel-
etal and cardiac muscle mass in pig maps to the IGF2 locus.
Not Genet 1999, 21:157-158.
14. Nezer C, Moreau L, Brouwers B, Coppieters W, Detilleuxj, Hanset
R, Karim L, Kvasz A, Leroy P, Georges M: An imprinted QTL with
major effect on muscle mass and fat deposition maps to the
IGF2 locus in pigs. Nature Genetics 1999, 21(2):155-156.
15. Tuiskula-Haavisto M, De Koning D, Honkatukia M, Schulman N, Maki-
tanila A, Vilkki J: Quantitative trait loci with parent-of-origin
effects in chicken. Genetics Research 2004, 84(01):57-66.
16. Lewis A, Redrup L: Genetic imprinting: conflict at the Callipyge
locus. Current Biology 2005, 15(8):29 1-294.
17. Cui Y: A statistical framework for genome-wide scanning and
testing of imprinted quantitative trait loci. journal of theoretical
biology 2007, 244:1 15-126.
18. Liu T, Todhunter R, Wu S, Hou W, Mateescu R, Zhang Z, Burton-
Wurster N, Acland G, Lust G, Wu R: A random model for map-
ping imprinted quantitative trait loci in a structured pedi-
gree: An implication for mapping canine hip dysplasia.
Genomics 2007, 90(2):276-284.
19. Liu T, Johnson J, Casella G, Wu R: Sequencing complex diseases
with HapMap. Genetics 2004, 168:503-51 I.
20. Wu S, Yang J, Wang C, Wu R: A General Quantitative Genetic
Model for Haplotyping a Complex Trait in Humans. Current
Genomics 2007, 8(5):343-350.
21. Takasaki I, Suzuki T, Sasaki A, Nakao K, Hirakata M, Okano K, Tanaka
T, Nagase H, Shiraki K, Nojima H, et al.: Suppression of acute her-
petic pain-related responses by the kappa-opioid receptor
agonist (-)-17-cyclopropylmethyl-3, 14beta-dihydroxy-4,
5alpha-epoxy-beta-[n-methyl-3-trans-3-(3-furyl) acryla-
mido] morphinan hydrochloride (TRK-820) in mice. The our-
nal of pharmacology and experimental therapeutics 2004, 309:36.
22. Fillingim R, Doleys D, Edwards R, Lowery D: Spousal Responses
Are Differentially Associated With Clinical Variables in
Women and Men With Chronic Pain. Clinical journal of Pain
2003, 19(4):217.
23. Wilkins J, Haig D: What good is genomic imprinting: the func-
tion of parent-specific gene expression. Nature Reviews Genetics
2003, 4(5):359-368.
24. Wood A, Oakey R: Genomic imprinting in mammals: emerg-
ing themes and established theories. PLoS genetics 2006, 2(I 1):.
25. Lewis A, Reik W: How imprinting centres work. Cytogenet
Genome Res 2006, I 13(1-4):81-89.
26. Jirtle R, Skinner M: Environmental epigenomics and disease
susceptibility. Nature Reviews Genetics 2007, 8(4):253-262.
27. Feil R, Berger F: Convergent evolution of genomic imprinting
in plants and mammals. Trends in Genetics 2007, 23(4):192-199.
28. Nezer C, Collette C, Moreau L, Brouwers B, Kim J, Giuffra E, Buys N,
Andersson L, Georges M: Haplotype sharing refines the location
of an imprinted quantitative trait locus with major effect on
muscle mass to a 250-kb chromosome segment containing
the porcine IGF2 gene. Genetics 2003, 165:277-285.
29. Cheverud J, Hager R, Roseman C, Fawcett G, Wang B, Wolf J:
Genomic imprinting effects on adult body composition in
mice. Proceedings of the National Academy of Sciences 2008,
105(1 1):4253.
30. Judson R, Stephens J, Windemuth A: The predictive power of hap-
lotypes in clinical response. Pharmacogenomics 2000, 1:15-26.
31. Bader : The relative power of SNPs and haplotype as genetic
markers for association tests. Pharmacogenomics 2001, 2:1 1-24.
32. Winkelmann B, Hoffmann M, Nauck M, Kumar A, Nandabalan K, Jud-
son R, Boehm B, Tall A, Ruano G, Marz W: Haplotypes of the
cholesterol ester transfer protein gene predict lipid-modify-
ing response to station therapy. The Pharmacogenomics journal
2003, 3(5):284-296.
33. Clark A: The role of haplotypes in candidate gene studies.
Genetic epidemiology 2004, 27(4):321-333.



Page 10 of 11
(page number not for citation purposes)


hftp://www.almob.org/content/4/1/1 1








Algorithms for Molecular Biology 2009, 4:11


34. Jin G, Miao R, Deng Y, Hu Z, Zhou Y, Tan Y, Wangj, Hua Z, Ding W,
Wang L, et al.: Variant genotypes and haplotypes of the epider-
mal growth factor gene promoter are associated with a
decreased risk of gastric cancer in a high-risk Chinese popu-
lation. Cancer science 2007, 98(6):864-868.
35. Altshuler D, Brooks L, Chakravarti A, Collins F, Daly M, Donnelly P,
et al.: A haplotype map of the human genome. Nature 2005,
437(7063): 1299-1320.


Page 11 of 11
(page number not for citation purposes)


Publish with BioMed Central and every
scientist can read your work free of charge
"BioMed Central will be the most significant development for
disseminating the results of biomedical research in our lifetime."
Sir Paul Nurse, Cancer Research UK
Your research papers will be:
available free of charge to the entire biomedical community
peer reviewed and published immediately upon acceptance
cited in PubMed and archived on PubMed Central
yours you keep the copyright
Submit your manuscript here: BioMedcentral
http://www.biomedcentral.com/info/publishingadv.asp


hftp://www.almob.org/content/4/1/1 1




University of Florida Home Page
© 2004 - 2010 University of Florida George A. Smathers Libraries.
All rights reserved.

Acceptable Use, Copyright, and Disclaimer Statement
Last updated October 10, 2010 - - mvs