
Statistical Models for Haplotyping Complex Human Diseases with a Family-based Design

Permanent Link: http://ufdc.ufl.edu/UFE0024904/00001

Material Information

Title: Statistical Models for Haplotyping Complex Human Diseases with a Family-based Design
Physical Description: 1 online resource (147 p.)
Language: english
Creator: Li, Qin
Publisher: University of Florida
Place of Publication: Gainesville, Fla.
Publication Date: 2009

Subjects

Subjects / Keywords: complex, em, family, imprinting, linkage, statistical, zygotic
Statistics -- Dissertations, Academic -- UF
Genre: Statistics thesis, Ph.D.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, territorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation

Notes

Abstract: It has long been recognized that many human diseases involve the action of multiple genes and nongenetic factors and also show strong correlation among relatives. Because of this complexity, genetic mapping with a family-based design (including parents and offspring) is particularly needed for identifying genes and their inheritance involved in human diseases. In this dissertation, I explore several fundamental aspects of family data in constructing the linkage disequilibrium map of the human genome and fine mapping disease genes. A library of statistical models has been derived to estimate and test the pattern of gene segregation in a natural population and genetic effects of haplotypes on complex diseases. Because genetic information of interest to population and biomedical genetic studies cannot be observed, a series of mixture models proven powerful for solving missing data problems have been built within the family design. These models generate a number of testable hypotheses about the genetic control of complex diseases. Specifically, this dissertation presents various solutions to genetic and statistical problems in the following ways: (1) Construct a multilocus population and multilocus quantitative genetic model with SNP data: The models proposed allow the test of high-order disequilibria on the diversity of a natural population and of crossover interference on the transmission of genes during meiosis. By tracing the path of gene transmission from different parents, the models provide a way of quantitatively testing genetic imprinting effects on human diseases. (2) Develop a new approach for estimating linkage disequilibria at the zygote level: The family design is capable of separating the diplotypes that form the same heterozygote and, thereby, estimating gametic and non-gametic disequilibria and trigenic and quadrigenic disequilibria.
The new approach relaxes the Hardy-Weinberg equilibrium for a population and extends the concept of linkage disequilibrium mapping to any nonequilibrium populations. (3) Derive a series of closed forms for the EM algorithm: These algorithms are shown to be robust for estimating population genetic parameters (including haplotype frequencies and linkage disequilibria of various orders), gene transmission parameters (including the recombination fractions and crossover interference), and quantitative genetic parameters (including additive, dominant, and imprinting effects of haplotypes). The accuracy and precision of parameter estimates are investigated through simulation studies. The dissertation provides a handful of state-of-the-art technologies for genetic mapping of human diseases with commonly used family-based designs. These technologies, coupled with empirical and laboratory studies, will help to predict the occurrence and progression of a disease using the information about its underlying genes and biological pathways.
General Note: In the series University of Florida Digital Collections.
General Note: Includes vita.
Bibliography: Includes bibliographical references.
Source of Description: Description based on online resource; title from PDF title page.
Source of Description: This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Statement of Responsibility: by Qin Li.
Thesis: Thesis (Ph.D.)--University of Florida, 2009.
Local: Adviser: Wu, Rongling.
Electronic Access: RESTRICTED TO UF STUDENTS, STAFF, FACULTY, AND ON-CAMPUS USE UNTIL 2010-08-31

Record Information

Source Institution: UFRGP
Rights Management: Applicable rights reserved.
Classification: lcc - LD1780 2009
System ID: UFE0024904:00001






STATISTICAL MODELS FOR HAPLOTYPING COMPLEX HUMAN DISEASES WITH
A FAMILY-BASED DESIGN



















By

QIN LI



















A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY


UNIVERSITY OF FLORIDA

2009




































© 2009 Qin Li

































I dedicate this to my parents, my husband and my son









ACKNOWLEDGMENTS

I would like to express my gratitude to those who helped me to pursue my PhD

degree and complete my dissertation.

I am deeply indebted to my academic advisor and committee chair Dr. Rongling Wu.

He is such a great mentor and an outstanding researcher. His leadership, his enthusiasm,

his creativity gave me a lot of encouragement and inspiration. Without his guidance,

advice and countless help, it would have been impossible for me to finish this work. He is also such a nice

person and friend. His door is always open to his students. I learned not only how to do

research from him, but also, more importantly, how to live as a parent, as a child, as a

teacher and as a friend.

I would also like to thank the other committee members: Dr. Arthur Berg, Dr. Roger

Fillingim and Dr. Ronald Randles. I thank them for their time and advice. I am also very

grateful to the other faculty and staff members of the Department of Statistics and the

Department of Epidemiology and Health Policy Research at the University of Florida, where I

have received generous support during the past few years.

Special thanks to my beloved family and friends who are always there for me. My

parents and parents-in-law, Tongwan Li and Yuying Z1i ii.- Tao Wang and Jinying Wang,

have been a great source of love, support and understanding. I am most deeply grateful to

my husband, C'!, i.1,1 ing W1ii: for his selfless, continuing, countless love, encouragement,

understanding and support. And to my son, Leon L. Wang, I thank you for bringing tears

of joy and love to us.










TABLE OF CONTENTS

ACKNOWLEDGMENTS

LIST OF TABLES

LIST OF FIGURES

ABSTRACT

CHAPTER

1 INTRODUCTION TO MAPPING COMPLEX DISEASES

1.1 Introduction
1.2 Genetic Designs
1.2.1 Experimental Crosses
1.2.2 Multigenerational Families
1.2.3 Natural Populations with Unrelated Individuals
1.2.4 Natural Populations with Related Families
1.2.5 Case-Control Studies
1.3 New Developments
1.3.1 From QTL to QTN
1.3.2 From Linkage or Linkage Disequilibrium Mapping to Joint Analysis
1.3.3 From Genetics to Epigenetics
1.4 Dissertation Goals

2 CONSTRUCTING A LINKAGE DISEQUILIBRIUM MAP

2.1 Introduction
2.2 Two-Point Model
2.2.1 Sampling Strategy
2.2.2 Likelihood
2.2.3 Estimation
2.2.4 Hypothesis Testing
2.3 Three-Point Model
2.3.1 Multilocus Population Genetics
2.3.2 Multilocus Mendelian Genetics
2.3.3 Likelihood
2.3.4 Estimation
2.3.5 Marker Ordering
2.3.6 Hypothesis Testing
2.4 Computer Simulation
2.4.1 Two-Point Model
2.4.2 Three-Point Model
2.5 Discussion

3 GENETIC HAPLOTYPING OF COMPLEX TRAITS

3.1 Haplotype and Diplotype
3.2 Two-Point Haplotyping Model
3.2.1 Likelihood
3.2.2 Estimation
3.2.3 Hypothesis Testing
3.2.4 Model Selection
3.3 Three-Point Haplotyping Model
3.4 Simulation
3.5 Discussion

4 COMPUTING GENETIC IMPRINTING

4.1 Imprinting Model
4.2 Likelihood
4.3 Estimation
4.4 Hypothesis Testing
4.5 Simulation
4.6 Discussion

5 ZYGOTIC DISEQUILIBRIUM MAPPING WITH FAMILY DATA

5.1 Zygotic Disequilibrium
5.1.1 Gamete and Non-gamete Frequencies
5.1.2 Complete Disequilibrium Parameters
5.2 Model for Estimation
5.3 Hypothesis Tests
5.4 A Worked Example
5.5 Discussion

6 FUTURE DIRECTIONS

APPENDIX

A HARDY-WEINBERG PRINCIPLE

B ESTIMATION OF ZYGOTIC DISEQUILIBRIA USING CROHN'S DISEASE DATA SET

REFERENCES

BIOGRAPHICAL SKETCH









LIST OF TABLES


Table

2-1 Data structure of two markers typed for a panel of full-sib families, each composed of the mother, father and offspring, sampled at random from a natural population

2-2 Mating frequencies of families and offspring genotype frequencies per family for two markers sampled from a natural population

2-3 Mating frequencies of families and offspring genotype frequencies per family for three markers sampled from a natural population

2-4 MLEs (± standard deviations) of allele frequencies, linkage disequilibrium, and recombination fraction from 1000 simulation replicates under different sampling strategies

2-5 Power to detect sex-specific differences in the recombination fraction and linkage disequilibrium

2-6 MLEs of linkage disequilibria and recombination fractions among markers under the three-point model, in a comparison with those under the two-point model. The numbers in the parentheses are the standard deviations of the MLEs

3-1 Diplotype configuration of offspring's genotypes at two SNPs

3-2 MLEs (± standard deviations) of population and quantitative genetic parameters under the two-point model

3-3 Power to detect the risk haplotype under the two-point model

3-4 MLEs (± standard deviations) of population and quantitative genetic parameters under the three-point model based on 1000 x 1 design strategy

4-1 Diplotype configuration of offspring's genotypes with imprinting effects at two SNPs

4-2 MLEs (± standard deviations) of quantitative genetic parameters under the two-point model with imprinting effect

4-3 MLEs (± standard deviations) of population and quantitative genetic parameters under the three-point model with imprinting effect based on 1000 x 1 design strategy

5-1 Frequencies and observations of marker genotypes

5-2 Expressions of quadrigenic disequilibrium DAB in terms of genotypic configuration frequencies, allele frequencies and lower-order disequilibrium coefficients

5-3 Mating frequencies of families and offspring genotype frequencies per family for two markers sampled from a natural population

B-1 Estimates of relative frequencies from a genetic mapping project of Crohn's disease

B-2 Estimates of relative frequencies from a genetic mapping project of Crohn's disease (cont.)

B-3 Estimates of relative frequencies from a genetic mapping project of Crohn's disease (cont.)

B-4 Estimates of zygotic disequilibrium from a genetic mapping project of Crohn's disease

B-5 Estimates of zygotic disequilibrium from a genetic mapping project of Crohn's disease (cont.)

B-6 Estimates of zygotic disequilibrium from a genetic mapping project of Crohn's disease (cont.)









LIST OF FIGURES


Figure

3-1 Haplotype configuration of a diplotype for two SNPs

3-2 Diplotype configuration of a genotype for two SNPs









Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy

STATISTICAL MODELS FOR HAPLOTYPING COMPLEX HUMAN DISEASES WITH
A FAMILY-BASED DESIGN

By

Qin Li

August 2009

Chair: Rongling Wu
Major: Statistics

It has long been recognized that many human diseases involve the action of multiple

genes and nongenetic factors and also show strong correlation among relatives. Because

of this complexity, genetic mapping with a family-based design (including parents and

offspring) is particularly needed for identifying genes and their inheritance involved in

human diseases. In this dissertation, I explore several fundamental aspects of family data

in constructing the linkage disequilibrium map of the human genome and fine mapping

disease genes. A library of statistical models has been derived to estimate and test the

pattern of gene segregation in a natural population and genetic effects of haplotypes on

complex diseases. Because genetic information of interest to population and biomedical

genetic studies cannot be observed, a series of mixture models proven powerful for solving

missing data problems have been built within the family design. These models generate a

number of testable hypotheses about the genetic control of complex diseases. Specifically,

this dissertation presents various solutions to genetic and statistical problems in the

following ways:

(1) Construct a multilocus population and multilocus quantitative genetic model with

SNP data: The models proposed allow the test of high-order disequilibria on the diversity

of a natural population and of crossover interference on the transmission of genes during

meiosis. By tracing the path of gene transmission from different parents, the models

provide a way of quantitatively testing genetic imprinting effects on human diseases.









(2) Develop a new approach for estimating linkage disequilibria at the zygote

level: The family design has a capacity of separating the diplotypes that form the

same heterozygote and, thereby, estimating gametic and non-gametic disequilibria and

trigenic and quadrigenic disequilibria. The new approach relaxes the Hardy-Weinberg

equilibrium for a population and extends the concept of linkage disequilibrium mapping to

any nonequilibrium populations.

(3) Derive a series of closed forms for the EM algorithm: These algorithms are shown

to be robust for estimating population genetic parameters (including haplotype frequencies

and linkage disequilibria of various orders), gene transmission parameters (including the

recombination fractions and crossover interference), and quantitative genetic parameters

(including additive, dominant, and imprinting effects of haplotypes). The accuracy and

precision of parameter estimates are investigated through simulation studies.

The dissertation provides a handful of state-of-the-art technologies for genetic mapping

of human diseases with commonly used family-based designs. These technologies, coupled

with empirical and laboratory studies, will help to predict the occurrence and progression

of a disease using the information about its underlying genes and biological pathways.









CHAPTER 1
INTRODUCTION TO MAPPING COMPLEX DISEASES

1.1 Introduction

Almost all biological traits have a genetic component. The identification of genetic

factors underlying a trait is one of the most thorny issues in genetic research. This is

due to many reasons. First, these genetic factors may be numerous, each contributing a

minor portion to trait variation. Second, the expression of one gene relies on the presence

of other genes, leading to a network of epistatic interactions. Such genetic interactions

may take place between genes not only from the same genome, but also from different

genomes or different individuals. Third, a web of genes often interact with environmental

and developmental signals to form a hyperspace of interaction landscape. Fourth, even if

the same gene is affecting a trait, increasing evidence from molecular studies indicates that

the expression of the gene depends on where its alleles come from, a phenomenon called

the parent-of-origin effect or genetic imprinting. These complexities have largely hampered

the discovery of genes responsible for biological traits.

One approach for enhancing gene discovery is to collect genetic and phenotypic

data in a well-designed mapping population and analyze these data using powerful

statistical tools. The success of this approach will critically rely upon the development

of robust statistical models and algorithms suitable for a specific genetic problem. Since

the publication of Lander and Botstein's (1989) seminal paper on interval mapping of

complex traits, a wide variety of statistical methods have been developed to map genes or

quantitative trait loci (QTLs). Broadly speaking, these new methods have the following

aims: improving the precision of QTL separation from a chromosomal segment (composite

interval mapping; Zeng, 1994), extending genetic mapping to accommodate outbred

crosses (Xu and Atchley, 1996), complicated mating designs (Jannink and Wu, 2003),

and natural populations (Wu et al., 2002), mapping multiple interacting QTLs (multiple

interval mapping; Kao et al., 1999), mapping QTLs for trait correlations (Jiang and









Zeng, 1995) and genotype-environment interactions (van Eeuwijk et al., 2005), mapping

QTLs for binary (Xu and Atchley, 1996) and categorical traits (Li et al., 2006), deriving

QTL mapping procedures within the Bayesian paradigm (Satagopan et al., 1996; Yi et al.,

2003), and mapping QTLs for complex dynamic traits (functional mapping; Ma et al.,

2002; Wu and Lin, 2006) among others. Specific statistical issues for QTL mapping have

also been considered in several areas including the determination of critical thresholds

(Churchill and Doerge, 1994; Zou et al., 2004), model selection (Broman and Speed, 2002),

nonparametric mapping of QTLs (Zou et al., 2009), and asymptotic properties of QTL

parameter estimates (C'!. i, 2005).

In this chapter, I will introduce basic genetic designs used for QTL mapping and

highlight the recent developments of disease mapping. At the end, I will describe the goals

of this dissertation.

1.2 Genetic Designs

The genetic control of human diseases can be studied with an animal model system

like the mouse and rat (MiiI. et al., 1991) or directly with humans. Genetic studies

of complex diseases will need an appropriate genetic design. Types of populations for

genetic mapping are different between animals and humans because of their different

biological properties and ethical considerations. Below are several commonly used mapping

populations.

1.2.1 Experimental Crosses

A cross between two different mouse strains can widen genetic diversity because of

segregation and recombination of genes (Darvasi, 1998). Crossing two inbred strains leads

to a backcross or F2 population, in which different genotypes are generated for each gene.

Other cross populations include recombinant inbred lines (RILs). An RIL population

is generated by continuous inbreeding of the progeny, initiated with two heterozygous

founders, for an adequately large number of generations that leads to the disappearance of

any heterozygote (Broman, 2005). Linkage analysis in terms of the recombination fraction









between different loci establishes the foundation for genetic mapping with experimental

crosses.

1.2.2 Multigenerational Families

In humans, neither adequate numbers of progeny can be generated from a single

family nor can any controlled cross be made possible. For this species, a nuclear family

with multiple successive generations is often used in order to accumulate a sufficient

number of progeny for genetic mapping (Fulker and Cardon, 1994; Liu et al., 2006; Xu

and Atchley, 1995). This design can be used to map genes for inherited diseases, such as

diabetes or cancer. The recombination fraction and the identical-by-descent (IBD) coefficient

are the keys to genetic mapping with multigenerational families.

1.2.3 Natural Populations with Unrelated Individuals

The genetic mapping of complex traits including drug response can be conducted by

sampling a collection of unrelated individuals at random from a natural population (Lou

et al., 2003). In a population, different loci are genetically associated, with the extent

described by a parameter called the (gametic) linkage disequilibrium (LD). LD-based

mapping is meritorious in terms of its simple sampling scheme (especially suitable for

humans) and high-resolution dissection of a gene into a narrow genomic region.

Natural populations with unrelated families: Although LD mapping has tremendous potential

to fine map functional genes for a complex trait, it may provide a spurious estimate of

LD in practice when the association between genes is due to evolutionary forces, such

as mutation, drift, selection, population structure, and admixture (Lynch and Walsh,

1998). A mapping strategy that samples unrelated families (composed of parents and

offspring) from a natural population (Dupuis et al., 2007; Wu and Zeng, 2001) is helpful

for overcoming the limitation of LD mapping by simultaneously estimating the linkage and

linkage disequilibrium.









1.2.4 Natural Populations with Related Families

Genetic studies of some particular complex traits require a mapping population to

be derived from multiple related families. For such a genetic design, the recombination

fraction, IBD, and linkage disequilibrium need to be estimated simultaneously, tracing

the co-transmission of alleles from parents to their offspring. This design is powerful for

studying the evolution of genes that control drug response.

1.2.5 Case-Control Studies

In the context of pharmacogenetics, a usual approach is to examine the active

treatment arm of a clinical trial and divide subjects in the treatment arm into those with

a positive response to the drug and those with a negative or no response. These two

groups then constitute cases and controls who are genotyped for a particular candidate

gene thought to be related to the treatment phenotype. Risch and Merikangas (1996) have

shown that association studies are more powerful than linkage studies for finding genetic

determinants of a complex phenotype such as treatment response.

1.3 New Developments

1.3.1 From QTL to QTN

While genetic mapping has benefited from the rapid advances in statistical and

computational technologies, the completion of the human genome sequence in 2005 and its

derivative, the HapMap Project, together with rapid improvements in genotyping analysis,

have allowed a genome-wide scan of genes for complex traits or diseases (Altshuler

et al., 2008; Ikram et al., 2009). Such genome-wide association studies have greatly

stimulated our hope that detailed genetic control mechanisms for complex phenotypes

can be understood at individual nucleotide levels or nucleotide combinations. During the

past two years, more than 250 loci have been reproducibly identified for polygenic traits

(Hirschhorn and Lettre, 2009). Some of these loci confirm the previous discovery by other

approaches. It is clear that many genes detected affect the outcome of a trait or disease

through their biochemical and metabolic pathways. For example, of the 23 loci detected









for lipid levels, 11 trigger their effects by encoding apolipoproteins, lipases, and other

key proteins in lipid biosynthesis (Mohlke et al., 2008). Genes associated with Crohn's

disease encode autophagy and interleukin-23-related pathways (Lettre and Rioux, 2008).

The height loci detected regulate directly chromatin proteins and hedgehog signaling

(Hirschhorn and Lettre, 2009). Genome-wide studies have identified genes that encode

the sites of action of drugs approved by the Food and Drug Administration, including

thiazolidinediones and sulfonylureas (in studies of type 2 diabetes) (Mohlke et al., 2008),

statins (lipid levels) (Mohlke et al., 2008), and estrogens (bone density) (Styrkarsdottir

et al., 2008).

As a more accurate and useful approach for characterizing the genetic variants for

complex traits, a direct analysis of DNA sequences with SNP data has incredible benefits

(Ron and Weller, 2007). If a string of DNA sequence or haplotype is known to increase

disease risk, then a specialized drug can be designed to inhibit the expression of this DNA

sequence. The control of this disease can be made more efficient if all of its underlying

DNA sequences are identified over the entire genome. A new term, quantitative trait

nucleotides or QTNs, has been coined to describe the sequence polymorphisms that cause

phenotypic variation in a quantitative trait. The minimum nucleotide size of a QTN is

a single SNP, whereas very often a QTN contains two or more closely linked SNPs. The

association analysis between multi-SNP QTNs and phenotypes is statistically challenging

because of the existence of unphased haplotypes.
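The phase ambiguity mentioned above can be illustrated with a minimal sketch. It assumes biallelic SNPs written as allele pairs such as ("A", "a"); the function name compatible_diplotypes is hypothetical and is not taken from the dissertation. The sketch only enumerates the haplotype pairs (diplotypes) consistent with an unphased multi-SNP genotype; it does not estimate their frequencies.

```python
from itertools import product


def compatible_diplotypes(genotype):
    """Return all unordered haplotype pairs consistent with an unphased genotype.

    `genotype` is a list of per-SNP allele pairs, e.g. [("A", "a"), ("B", "b")].
    The names and the data layout are illustrative assumptions.
    """
    diplotypes = set()
    # For every SNP, choose which of its two alleles sits on the first haplotype;
    # the other allele then goes on the second haplotype.
    for choice in product((0, 1), repeat=len(genotype)):
        hap1 = "".join(pair[c] for pair, c in zip(genotype, choice))
        hap2 = "".join(pair[1 - c] for pair, c in zip(genotype, choice))
        # Sort the pair so that (hap1, hap2) and (hap2, hap1) count once.
        diplotypes.add(tuple(sorted((hap1, hap2))))
    return diplotypes


# A double heterozygote is compatible with two different diplotypes,
# whereas a genotype that is heterozygous at only one SNP has a single one.
double_het = compatible_diplotypes([("A", "a"), ("B", "b")])
single_het = compatible_diplotypes([("A", "A"), ("B", "b")])
```

Only genotypes that are heterozygous at two or more SNPs are ambiguous, which is why the phase of such individuals has to be treated as missing data.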

1.3.2 From Linkage or Linkage Disequilibrium Mapping to Joint Analysis

If a genetic design contains the transmission of genes from parents to their offspring,

linkage mapping can be used. Linkage mapping with an experimental cross is powerful to

detect the existence and distribution of QTLs throughout the genome, but its mapping

resolution is low unless a huge sample size has been obtained. Given a currently available

mapping population of 100-200 prc-. n.i for most genetic programs, linkage mapping


only can map QTLs to 5-10 cM. This may translate into a physical distance of several









megabases, which may contain several hundred genes. Although recombination-based

mapping has facilitated positional cloning of QTLs in inbred crop lines, its use in

outcrossing species including humans is not practical due to unavailability of mutagenesis

and candidate gene validation in these species. Currently, population-based association

studies have been proposed to map QTLs that are segregating in natural populations.

Because any long range association between a marker and QTL has been broken due to

recombinations that accumulate over many generations, this approach can only detect a

marker-QTL association in a short stretch of the genome, and thus provides a vital tool

for high-resolution mapping of QTLs. The theoretical basis of association studies is the

linkage disequilibrium (LD) that refers to the nonrandom association of alleles at different

loci in a population. LD-based mapping that directly capitalizes on existing wild populations

is well suited to humans, in which vast stores of genetic variation remain unexploited.

A major problem with the LD mapping strategy is that it provides little insight

into the mechanistic basis of LD detected in a natural population. Without such

knowledge, however, the genomic localization and cloning of genes based on LD may

not be successful, because a strong LD detected between two genetic loci may be due to

the recent occurrence of disequilibrium rather than a close physical map distance of the

two loci. The cause of LD can be revealed in a nuclear family through a combined linkage

and LD analysis with a transmission/disequilibrium testing design. For a set of randomly

sampled families from a natural population, linkage and LD mapping strategies can also

be combined, taking advantage of each strategy to improve the power and resolution of

QTL mapping (Wu et al., 2002). Obviously, simultaneous estimation of the recombination

fraction and LD between the markers and QTL avoids false positive results (spurious

LD) when LD is used to fine map genes for complex traits. Lou et al. (2005) developed

a unifying model for integrating interval and LD mapping to fine map the locations of

QTLs on a linkage map. As with interval mapping, this model allows one to scan for the

existence and distribution of QTLs throughout the linkage map, and meanwhile provides









the high-resolution estimation of QTL mapping by making use of recombinants that are

generated over different generations.

1.3.3 From Genetics to Epigenetics

Genes and chromosomes are composed of the sequence of building blocks in the

DNA molecules. The DNA sequence of a gene is transcribed into mRNA, which is then

translated into the sequence of a protein. Various proteins form a phenotype through

biosynthesis. In this sense, genes carry the blueprints to make the phenotype of cells,

tissues, and organs. When different sets of genes are turned on or expressed, different

phenotypes will be generated.

A growing body of evidence shows that genes affect a phenotype not only through

the DNA sequence, but also through the structure of DNA. The phenomenon that a DNA

structure is modified is called the epigenetic modification or epigenetic mark. Epigenetic

modifications include addition of molecules, like methyl groups, to the DNA backbone.

Adding these groups changes the appearance and structure of DNA, altering how a gene

can interact with important interpreting (transcribing) molecules in the cell's nucleus.

Because they change how genes can interact with the cell's transcribing machinery,

epigenetic modifications generally turn genes on or off, allowing or preventing the gene

from being used to make a protein. These are different from mutations and other changes

in the DNA sequence (like insertions or deletions). The latter change the sequence of the

DNA and RNA, which may further affect the sequence of the protein.

There are different kinds of epigenetic "marks", chemical additions to the genetic

sequence. The addition of methyl groups to the DNA backbone is used on some genes to

distinguish the gene copy inherited from the father and that inherited from the mother.

This is known as "genetic imprinting". Through imprinting, the gene copies are

distinguished and the cell is then informed of which copy will be used to make proteins.

According to Mendelian genetics, both parental copies should equally contribute

to the phenotype. The impact of an imprinted gene, however, depends only on which









parent its allele was inherited from. For some imprinted genes, the cell uses only the

copy from the mother to create proteins, and for others only that from the father. Early

evidence for imprinting was detected centuries ago by mule breeders in Iraq. They noted

that crossing a male horse and a female donkey generated a different animal than breeding

a female horse and a male donkey. After the mid 1950s, such parent-of-origin effects in

genetics were increasingly recognized. In the mid 1980s, genetic studies with mice showed

that inheritance of genetic material from both a male and a female parent was required for

normal development, but abnormalities of development would result when the inherited

genetic material was purely from the male or the female. Around the same time, the

effects of some transgenes in mice were discovered to be different when they were passed

from the male or female parent. The first naturally occurring example of an imprinted

gene was the genetic imprinting of the IGF-2 gene in mice discovered in 1991. At present,

about 50 imprinted genes have been identified in mice and humans (Giannoukakis et al.,

1993).

The genetic mechanism of imprinting is still unclear. But a general hypothesis is

that imprinting represents a genetic "battle of the sexes", in which maternally-expressed

imprinted genes usually suppress growth, while paternally expressed genes usually enhance

growth. The "battle of the sexes" hypothesis is partly based on the finding in animals that

growth-promoting imprinted genes help ensure the continuation of the father's

genes. The mother, however, is more interested in maintaining her own health and, hence,

her genes "fight" the paternal genes and limit the size of the embryo or fetus.

Because of their relevance in growth, it is likely that imprinted genes play a major

role in the development of cancer and other conditions in which cell and tissue growth are

abnormal. Imprinted genes in which the copy from the mother is turned on (maternally

expressed) usually suppress growth, while paternally expressed genes usually stimulate

growth. Some tumor suppressor genes should be maternally expressed, but they are

mistakenly turned off to prevent the growth-limiting protein from being made, finally









causing cancer. Likewise, many oncogenes (growth-promoting genes) are paternally

expressed, for which a single dose of the protein is just right for normal cell proliferation.

However, if the maternal copy of the oncogene loses its epigenetic marks and is turned

on as well, uncontrolled cell growth can result. In the collection of birth defects known as

Beckwith-Wiedemann syndrome, abnormal epigenetics leads to abnormal growth of tissues,

overgrowth of abdominal organs, low blood sugar at birth and cancers. Similarly, in the

imprinting disorder Prader-Willi syndrome, abnormal epigenetics causes short stature and

mental retardation as well as other syndromic features.

1.4 Dissertation Goals

In this dissertation, I will propose a library of statistical models for genetic haplotyping

of complex human diseases based on the linkage and linkage disequilibrium principles. I

will construct my model within a mixture-model framework in which multiple mixture

components corresponding to different genotypes of the underlying genes are involved.

I will implement the EM algorithm to estimate population and quantitative genetic

parameters for genotypes contained within the mixture model. More specifically, I will

focus on the following aspects of genetic haplotyping:

(1) Develop a statistical algorithm for constructing a joint linkage-linkage disequilibrium

map by simultaneously estimating the recombination fraction and linkage disequilibrium

between different molecular markers in a natural human population (Chapter 2);

(2) Frame a general quantitative genetic model for detecting risk haplotypes that encode

a complex disease in a set of random unrelated families from a natural population

and characterizing the pattern and magnitude of genetic interactions between

different haplotypes (Chapter 3);

(3) Derive a novel statistical model for computing genetic imprinting expressed at the

DNA sequence level and estimating the effects of genetic imprinting on human

diseases (Chapter 4);










(4) Establish a robust procedure for genetic haplotyping of complex diseases in a

population which is not at Hardy-Weinberg equilibrium (Chapter 5).

For each of the models developed, simulation studies will be performed to test

the statistical behaviors of the models. The accuracy and precision of parameter

estimation and power of the model to detect significant genes will be explored under

different simulation scenarios (with different sample sizes and heritabilities). I will

also investigate several computational issues including the efficiency and convergence

rate.









CHAPTER 2
CONSTRUCTING A LINKAGE DISEQUILIBRIUM MAP

2.1 Introduction

Linkage disequilibrium mapping with high-throughput molecular markers, such

as single nucleotide polymorphisms (SNPs), has emerged as an important strategy for

the genome-wide identification of genes involved in common human diseases (Ardlie

et al., 2002; Tapper et al., 2008; Weir, 2008). The central rationale of this approach is

founded on the expectation that allelic association (known as linkage disequilibrium or

LD) between different polymorphic loci decays with their distance such that a candidate

genomic region for a disease can be refined by examining the LD between the disease and

a series of SNPs typed at high density (Collins, 2007; Marques et al., 2008). However, the

occurrence and extent of LD in a given population is often affected by many evolutionary

factors that operate on it, including mutation, selection, drift and admixture. By

obscuring the signal of LD, these factors will limit the application of LD mapping to

narrow the region containing a causal site for the disease.

One way to circumvent this limitation is to draw a profile of LD that decays over an

extended genomic region (Farnir et al., 2000; Liu et al., 2006; McRae et al., 2002). A great

deal about this can be learnt from marker by marker association which can then be used

to define the statistical power of association studies for the disease phenotype utilizing

SNPs (Schork, 2002) and to guide the selection and density of such polymorphisms to

create marker maps most efficient in candidate gene, candidate region, and eventually

whole-genome association studies (De La Vega et al., 2002). It is relatively straightforward

to construct a map of LD decaying over physical distance between different SNPs because

current technologies are able to count the base pairs between any two polymorphic sites.

A number of LD maps in terms of physical distance have been reported and have been instrumental

for explaining the genetic diversity of human populations (Daly et al., 2001; Dawson et al.,









2002; Gabriel et al., 2002; Reich et al., 2001; Tsunoda et al., 2004) as well as the origin of

important human genes (Tishkoff et al., 1996, 2001; Tishkoff and Williams, 2002).

An LD map can also be described by the pattern of LD decline over genetic distance,

reflecting the recombination. Regions in the genome with extensive LD may correspond to

recombination cold-spots, whereas regions with recombination hot-spots will be expected

to show less extensive LD. Thus, a detailed map of LD in terms of the recombination

will help to identify genomic regions with recombination cold- or hot-spots and further

determine the best marker density for adequate coverage. In hot-spot areas, a higher

marker density is required for a better coverage of the genome than in cold-spot areas.

Several statistical approaches have been derived to jointly measure the linkage and linkage

disequilibrium for open-pollinated natural populations (Wu and Zeng, 2001) and domestic

animals (Georges, 2007). These approaches allow the construction of an LD map, but

do not take into account the design of LD mapping suitable for humans. Thus far, a

statistical algorithm for constructing a linkage-linkage disequilibrium map in humans has

not been developed.

The purpose of this chapter is to propose a statistical design and algorithm for

simultaneously estimating the recombination fraction and linkage disequilibrium for

human populations. The design samples a panel of random families from a natural human

population, each of which is composed of both parents and one or more children. The

design fully considers the outcrossing nature of human populations. The algorithm

developed is implemented with haplotype and diplotype information derived from

unphased marker data collected for each member of a family. The algorithm incorporates

a two-level hierarchy of estimation routes: at the upper level, population genetic

parameters including haplotype frequencies, allele frequencies and linkage disequilibria

are estimated from parental data, whereas at the lower level, the recombination fraction

is estimated from the data of offspring genotypes whose haplotypes are transmitted from

a specific mating of parental genotypes. The approach will include a testing procedure









that permits the examination of the difference in the extent of LD between the two sexes.

populations. We perform computer simulation to investigate the statistical properties of

the approaches in terms of the power for linkage and association detection and parameter

estimation with changing sample size and population structure.

2.2 Two-Point Model

2.2.1 Sampling Strategy

Suppose there is a natural human population at Hardy-Weinberg equilibrium

(Appendix A). From this population, n unrelated families are randomly sampled, each

of which contains the mother, the father and one or more children. Each subject sampled

is typed for single nucleotide polymorphism (SNP) markers, aimed to study the pattern

of genetic diversity and structure in humans. Consider two markers A (with two alleles

A and a) and B (with two alleles B and b), which are segregating in the population. The

two markers form four haplotypes, AB, Ab, aB, and ab, with the frequencies denoted as

$p_{11}$, $p_{10}$, $p_{01}$, and $p_{00}$, respectively. Let $p$ vs. $1-p$ and $q$ vs. $1-q$ denote the frequencies

of alleles A, a and B, b, respectively, in a mixed population of males and females (assuming

no sex-specific difference in allele and haplotype frequencies). Thus, we have the following

relationships:

$$
\begin{aligned}
p_{11} &= pq + D \\
p_{10} &= p(1-q) - D \\
p_{01} &= (1-p)q - D \\
p_{00} &= (1-p)(1-q) + D,
\end{aligned}
\tag{2-1}
$$

where D is the coefficient of linkage disequilibrium between the two markers.
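
As a quick numerical check on equation (2-1), the following minimal Python sketch computes the four haplotype frequencies from the allele frequencies and D. The function name `haplotype_frequencies` and the example values are illustrative assumptions and are not taken from the dissertation.

```python
def haplotype_frequencies(p, q, D):
    """Return (p11, p10, p01, p00) for haplotypes AB, Ab, aB, ab, following (2-1).

    p and q are the frequencies of alleles A and B; D is the linkage
    disequilibrium coefficient between the two markers.
    """
    freqs = (
        p * q + D,              # AB
        p * (1 - q) - D,        # Ab
        (1 - p) * q - D,        # aB
        (1 - p) * (1 - q) + D,  # ab
    )
    if min(freqs) < 0:
        raise ValueError("D is outside the range allowed by p and q")
    return freqs


# Hypothetical example values
p11, p10, p01, p00 = haplotype_frequencies(0.6, 0.4, 0.05)
total = p11 + p10 + p01 + p00      # the four frequencies sum to one
D_back = p11 * p00 - p10 * p01     # recovers D from the haplotype frequencies
```

The last line uses the identity D = p11 p00 - p10 p01, which follows directly from (2-1).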

The two markers produce nine joint genotypes, AABB (coded as 1), AABb (coded

as 2), ..., aabb (coded as 9), which are observed. Thus, each subject will bear one of these

genotypes, and the parents in each family will be one of 9 x 9 = 81 possible genotype by

genotype combinations. Depending on the parental genotype combination, all offspring in

a family will have a certain number of marker genotypes. At meiosis, a parental diplotype

will be broken down to form recombinant and nonrecombinant haplotypes for the next

generation. The relative proportion of these two types of haplotypes is determined by

the recombination fraction, denoted as $\theta$. Let $n_{ij}$ denote the number of families from

the combination between mother genotype $i$ and father genotype $j$ for the two markers

($i, j = 1, 2, \ldots, 9$) and $n_{ijk}$ denote the number of offspring with genotype $k$ derived from parental genotype

combination $ij$ ($k = 1, 2, \ldots, 9$). Table 2-1 gives the structure of genotypic data collected

from $n = \sum_{i=1}^{9}\sum_{j=1}^{9} n_{ij}$ random families, in which the distribution of genotypes in the

mothers, fathers, and their offspring is shown.

Table 2-1. Data structure of two markers typed for a panel of full-sib families, each
composed of the mother, father and offspring, sampled at random from a
natural population

       Family                    Offspring
Group  Mother  Father  Number   AABB  AABb  AAbb  AaBB  AaBb  Aabb  aaBB  aaBb  aabb
1      AABB    AABB    n11      n111
2      AABB    AABb    n12      n121  n122
3      AABB    AAbb    n13            n132
4      AABB    AaBB    n14      n141              n144
5      AABB    AaBb    n15      n151  n152        n154  n155
6      AABB    Aabb    n16            n162              n165
7      AABB    aaBB    n17                        n174
8      AABB    aaBb    n18                        n184  n185
9      AABB    aabb    n19                              n195
...
41     AaBb    AaBb    n55      n551  n552  n553  n554  n555  n556  n557  n558  n559
...
81     aabb    aabb    n99                                                      n999









2.2.2 Likelihood

For observed mother genotypes ($M_m$), father genotypes ($M_f$), and offspring

genotypes ($M_o$), a joint probability is expressed as

$$
\begin{aligned}
P(M_m, M_f, M_o) &= P(M_m, M_f)\,P(M_o \mid M_m, M_f) \\
&= P(M_m)\,P(M_f)\,P(M_o \mid M_m, M_f),
\end{aligned}
$$

where it is assumed that there is random mating between parents for the two markers.

Thus, a joint log-likelihood for the parameters, $\Omega = (p_{11}, p_{10}, p_{01}, p_{00}; \theta) = (\Omega_p; \Omega_g)$, can be

factorized into two parts:

$$
\log L(\Omega \mid M_m, M_f, M_o) = \log L(\Omega_p \mid M_m, M_f) + \log L(\Omega_g \mid M_m, M_f, M_o, \Omega_p). \tag{2-2}
$$

Maximizing the likelihood (2-2) is equivalent to maximizing $\log L(\Omega_p \mid M_m, M_f)$ and

$\log L(\Omega_g \mid M_m, M_f, M_o, \Omega_p)$ individually. The first part is at an upper level of the joint

likelihood (2-2) because it is constructed with the information from the parents. Given its

information derived from the transmission from parents to offspring, the second part is at

a lower level of the likelihood.

The upper-level likelihood is constructed by parental genotype observations and

expected parental genotype frequencies. Table 2-2 shows the structure and frequencies of

mother by father genotype combinations under random mating. For a double heterozygote

AaBb, its observed genotype may be derived from two possible diplotypes, AB|ab (with a

probability of $p_{11}p_{00}$) or Ab|aB (with a probability of $p_{10}p_{01}$), where the vertical lines are

used to separate the two underlying haplotypes of a diplotype. A polynomial function for

the upper-level likelihood is then constructed as

$$
\begin{aligned}
\log L(\Omega_p \mid M_m, M_f) = {} & \text{constant} \\
& + n_{11}\log p_{11}^4 + n_{12}\log(2p_{11}^3 p_{10}) + n_{13}\log(p_{11}^2 p_{10}^2) + n_{14}\log(2p_{11}^3 p_{01}) \\
& + n_{15}\log[2p_{11}^2(p_{11}p_{00} + p_{10}p_{01})] + n_{16}\log(2p_{11}^2 p_{10}p_{00}) + n_{17}\log(p_{11}^2 p_{01}^2) \\
& + n_{18}\log(2p_{11}^2 p_{01}p_{00}) + n_{19}\log(p_{11}^2 p_{00}^2) + \cdots + n_{55}\log[4(p_{11}p_{00} + p_{10}p_{01})^2] \\
& + \cdots + n_{99}\log(p_{00}^4).
\end{aligned}
\tag{2-3}
$$





































For a given parental genotype combination, a certain group of offspring genotypes is

produced. For example, if the mother and father both have genotype AABB, then their

offspring will always have diplotype AB|AB or genotype AABB. If one of the parents

is AABB and the other is AABb, then the offspring have diplotypes AB|AB (or genotype

AABB) and AB|Ab (or genotype AABb) with an equal frequency. For a parent with

genotype AaBb, there will be two possible diplotypes, AB|ab or Ab|aB, whose relative

frequencies are

$$
\phi = \frac{p_{11}p_{00}}{p_{11}p_{00} + p_{10}p_{01}} \tag{2-4}
$$

$$
1 - \phi = \frac{p_{10}p_{01}}{p_{11}p_{00} + p_{10}p_{01}} \tag{2-5}
$$

respectively. Both the diplotypes will produce haplotypes AB, Ab, aB, and ab, with

frequencies defined as follows:

Parent AaBb                 Haplotype
Diplotype   Frequency       AB              Ab              aB              ab
AB|ab       \phi            (1-\theta)/2    \theta/2        \theta/2        (1-\theta)/2
Ab|aB       1-\phi          \theta/2        (1-\theta)/2    (1-\theta)/2    \theta/2

Let

$$
\begin{aligned}
\omega_1 &= \phi(1-\theta) + (1-\phi)\theta \\
\omega_2 &= \phi\theta + (1-\phi)(1-\theta).
\end{aligned}
\tag{2-6}
$$

Thus, overall haplotype frequencies produced by this parent are calculated as $\frac{1}{2}\omega_1$ for AB

or ab and $\frac{1}{2}\omega_2$ for Ab or aB.
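
The gamete frequencies of a double heterozygote parent described by equations (2-4) to (2-6) can be sketched in a few lines of Python. This is only an illustration under the stated definitions; the function name `gamete_frequencies` and the numerical inputs are assumptions made for the example.

```python
def gamete_frequencies(p11, p10, p01, p00, theta):
    """Haplotype (gamete) frequencies produced by an AaBb parent.

    phi is the relative frequency of diplotype AB|ab among AaBb parents (2-4);
    w1 and w2 are the overall probabilities defined in (2-6).
    """
    phi = p11 * p00 / (p11 * p00 + p10 * p01)
    w1 = phi * (1 - theta) + (1 - phi) * theta
    w2 = phi * theta + (1 - phi) * (1 - theta)
    return {"AB": w1 / 2, "Ab": w2 / 2, "aB": w2 / 2, "ab": w1 / 2}


# Hypothetical haplotype frequencies and recombination fraction
gametes = gamete_frequencies(0.29, 0.31, 0.11, 0.29, theta=0.1)
unlinked = gamete_frequencies(0.29, 0.31, 0.11, 0.29, theta=0.5)
```

With theta = 0.5 the two overall probabilities are equal, so each of the four haplotypes is produced with frequency 1/4 regardless of the diplotype frequencies.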

Table 2-2 provides the conditional probabilities of offspring genotypes or diplotypes

given joint mother and father genotypes. Based on the information about genetic









segregation in each family, the lower-level likelihood is constructed as


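$$
\begin{aligned}
\log L(\Omega_g \mid M_m, M_f, M_o, \Omega_p) = {} & \text{constant} \\
& + n_{111}\log(1) + n_{121}\log(\tfrac{1}{2}) + n_{122}\log(\tfrac{1}{2}) + n_{132}\log(1) + n_{141}\log(\tfrac{1}{2}) + n_{144}\log(\tfrac{1}{2}) \\
& + (n_{151} + n_{155})\log(\tfrac{1}{2}\omega_1) + (n_{152} + n_{154})\log(\tfrac{1}{2}\omega_2) \\
& + \cdots \\
& + (n_{551} + n_{559})\log(\tfrac{1}{4}\omega_1^2) + (n_{552} + n_{554} + n_{556} + n_{558})\log(\tfrac{1}{2}\omega_1\omega_2) + (n_{553} + n_{557})\log(\tfrac{1}{4}\omega_2^2) \\
& + n_{555}\log[\tfrac{1}{2}(\omega_1^2 + \omega_2^2)] \\
& + \cdots \\
& + n_{999}\log(1).
\end{aligned}
\tag{2-7}
$$

In the next section, an algorithmic procedure will be described to estimate the parameters

that define the likelihoods (2-3) and (2-7).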

2.2.3 Estimation

By maximizing the upper-level likelihood (2-3), we derive a closed form for the EM

algorithm to estimate haplotype frequencies. This procedure is described as follows:

In the E step, we calculate the probability with which a double heterozygote parent

carries a particular diplotype with equations (2-4) and (2-5). In the M step, haplotype

frequencies are estimated with the calculated diplotype probability by

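$$
\begin{aligned}
\hat{p}_{11} = \frac{1}{4n}\big[ & 4n_{11} + 3(n_{12} + n_{21} + n_{14} + n_{41}) + 2(n_{22} + n_{44} + n_{13} + n_{31} + n_{15} + n_{51} + n_{16} \\
& + n_{61} + n_{17} + n_{71} + n_{18} + n_{81} + n_{19} + n_{91} + n_{24} + n_{42}) + (n_{23} + n_{32} + n_{25} + n_{52} \\
& + n_{26} + n_{62} + n_{27} + n_{72} + n_{28} + n_{82} + n_{29} + n_{92} + n_{34} + n_{43} + n_{45} + n_{54} + n_{46} \\
& + n_{64} + n_{47} + n_{74} + n_{48} + n_{84} + n_{49} + n_{94}) + \phi(n_{15} + n_{51} + n_{25} + n_{52} + n_{35} \\
& + n_{53} + n_{45} + n_{54} + n_{65} + n_{56} + n_{75} + n_{57} + n_{85} + n_{58} + n_{95} + n_{59}) + 2\phi n_{55}\big],
\end{aligned}
\tag{2-8}
$$

$$
\begin{aligned}
\hat{p}_{10} = \frac{1}{4n}\big[ & 4n_{33} + 3(n_{23} + n_{32} + n_{36} + n_{63}) + 2(n_{22} + n_{66} + n_{13} + n_{31} + n_{26} + n_{62} + n_{34} \\
& + n_{43} + n_{35} + n_{53} + n_{37} + n_{73} + n_{38} + n_{83} + n_{39} + n_{93}) + (n_{12} + n_{21} + n_{16} + n_{61} \\
& + n_{24} + n_{42} + n_{25} + n_{52} + n_{27} + n_{72} + n_{28} + n_{82} + n_{29} + n_{92} + n_{46} + n_{64} + n_{56} \\
& + n_{65} + n_{67} + n_{76} + n_{68} + n_{86} + n_{69} + n_{96}) + (1-\phi)(n_{15} + n_{51} + n_{25} + n_{52} + n_{35} + n_{53} \\
& + n_{45} + n_{54} + n_{65} + n_{56} + n_{75} + n_{57} + n_{85} + n_{58} + n_{95} + n_{59}) + 2(1-\phi)n_{55}\big],
\end{aligned}
\tag{2-9}
$$

$$
\begin{aligned}
\hat{p}_{01} = \frac{1}{4n}\big[ & 4n_{77} + 3(n_{78} + n_{87} + n_{47} + n_{74}) + 2(n_{88} + n_{44} + n_{97} + n_{79} + n_{84} + n_{48} + n_{76} \\
& + n_{67} + n_{75} + n_{57} + n_{73} + n_{37} + n_{72} + n_{27} + n_{71} + n_{17}) + (n_{98} + n_{89} + n_{94} + n_{49} \\
& + n_{86} + n_{68} + n_{85} + n_{58} + n_{83} + n_{38} + n_{82} + n_{28} + n_{81} + n_{18} + n_{64} + n_{46} + n_{54} + n_{45} \\
& + n_{43} + n_{34} + n_{42} + n_{24} + n_{41} + n_{14}) + (1-\phi)(n_{95} + n_{59} + n_{85} + n_{58} + n_{75} + n_{57} + n_{65} \\
& + n_{56} + n_{45} + n_{54} + n_{35} + n_{53} + n_{25} + n_{52} + n_{15} + n_{51}) + 2(1-\phi)n_{55}\big],
\end{aligned}
\tag{2-10}
$$

$$
\begin{aligned}
\hat{p}_{00} = \frac{1}{4n}\big[ & 4n_{99} + 3(n_{98} + n_{89} + n_{96} + n_{69}) + 2(n_{88} + n_{66} + n_{97} + n_{79} + n_{95} + n_{59} + n_{94} \\
& + n_{49} + n_{93} + n_{39} + n_{92} + n_{29} + n_{91} + n_{19} + n_{86} + n_{68}) + (n_{87} + n_{78} + n_{85} + n_{58} \\
& + n_{84} + n_{48} + n_{83} + n_{38} + n_{82} + n_{28} + n_{81} + n_{18} + n_{76} + n_{67} + n_{65} + n_{56} + n_{64} \\
& + n_{46} + n_{63} + n_{36} + n_{62} + n_{26} + n_{61} + n_{16}) + \phi(n_{95} + n_{59} + n_{85} + n_{58} + n_{75} + n_{57} \\
& + n_{65} + n_{56} + n_{45} + n_{54} + n_{35} + n_{53} + n_{25} + n_{52} + n_{15} + n_{51}) + 2\phi n_{55}\big].
\end{aligned}
\tag{2-11}
$$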

The E and M steps are iterated between equations (2-4) and (2-5) and (2-8),

(2-9), (2-10), and (2-11) until the estimates converge to stable values. The estimates at

convergence are the maximum likelihood estimates (MLEs) of haplotype frequencies. The

MLEs of allele frequencies and linkage disequilibrium can be obtained by solving equation

(2-1), expressed as

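$$
\begin{aligned}
\hat{p} &= \hat{p}_{11} + \hat{p}_{10} \\
\hat{q} &= \hat{p}_{11} + \hat{p}_{01} \\
\hat{D} &= \hat{p}_{11}\hat{p}_{00} - \hat{p}_{10}\hat{p}_{01}.
\end{aligned}
\tag{2-12}
$$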









The standard deviations of the estimate of haplotype frequencies, allele frequencies, and

linkage disequilibrium can be derived.
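
The upper-level EM iteration can be sketched compactly if the parental genotype counts are pooled over mothers and fathers, which is permissible here because no sex-specific difference is assumed. Each family count contributes once to the mother's genotype and once to the father's genotype, so counting haplotypes over the 2n parents gives the same updates as equations (2-8) to (2-11). The function name, the pooled input format, the starting values, and the stopping rule below are assumptions made for illustration.

```python
def em_haplotype_frequencies(counts, tol=1e-10, max_iter=1000):
    """EM estimates of (p11, p10, p01, p00) and (p, q, D) from parental genotypes.

    counts[g - 1] is the number of parents (mothers and fathers pooled) with
    genotype code g: 1=AABB, 2=AABb, 3=AAbb, 4=AaBB, 5=AaBb, 6=Aabb,
    7=aaBB, 8=aaBb, 9=aabb. Assumes the denominator of phi stays positive.
    """
    c = counts
    n_hap = 2.0 * sum(c)                  # two haplotypes per parent (4n in total)
    p11 = p10 = p01 = p00 = 0.25          # assumed starting values
    for _ in range(max_iter):
        # E step: probability that an AaBb parent has diplotype AB|ab, as in (2-4)
        phi = p11 * p00 / (p11 * p00 + p10 * p01)
        # M step: expected haplotype counts divided by the number of haplotypes
        new = (
            (2 * c[0] + c[1] + c[3] + phi * c[4]) / n_hap,
            (2 * c[2] + c[1] + c[5] + (1 - phi) * c[4]) / n_hap,
            (2 * c[6] + c[3] + c[7] + (1 - phi) * c[4]) / n_hap,
            (2 * c[8] + c[5] + c[7] + phi * c[4]) / n_hap,
        )
        change = max(abs(a - b) for a, b in zip(new, (p11, p10, p01, p00)))
        p11, p10, p01, p00 = new
        if change < tol:
            break
    # Allele frequencies and LD from the haplotype frequencies, as in (2-12)
    p = p11 + p10
    q = p11 + p01
    D = p11 * p00 - p10 * p01
    return (p11, p10, p01, p00), (p, q, D)


# Hypothetical pooled counts equal to the Hardy-Weinberg expectations (x1000)
# for p11 = p00 = 0.4 and p10 = p01 = 0.1
haps, (p, q, D) = em_haplotype_frequencies([160, 80, 10, 80, 340, 80, 10, 80, 160])
```

Only the double heterozygotes (code 5) require the E step; every other genotype has a single compatible diplotype, so its haplotypes are counted directly.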

By maximizing the lower-level likelihood (2-7), we derive a closed form for the EM

algorithm to estimate the recombination fraction. This procedure is described as follows:

In the E step, we calculate the probability with which a considered haplotype

produced by a double heterozygote parent is the recombinant type using


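$$
\begin{aligned}
\Psi_1 &= \frac{(1-\phi)\theta}{\phi(1-\theta) + (1-\phi)\theta} && \text{for haplotype AB or ab} \\
\Psi_2 &= \frac{\phi\theta}{\phi\theta + (1-\phi)(1-\theta)} && \text{for haplotype Ab or aB,}
\end{aligned}
\tag{2-13}
$$

and the probability with which a double heterozygote offspring carries recombinant

haplotypes by

$$
\Psi = \frac{2[\phi^2 + (1-\phi)^2]\theta^2 + 4\phi(1-\phi)\theta(1-\theta)}{\omega_1^2 + \omega_2^2}. \tag{2-14}
$$

In the M step, the estimate of the recombination fraction is obtained by

$$
\hat{\theta} = \frac{m}{M}, \tag{2-15}
$$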

where m equals the sum of the following terms,










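$$
\begin{aligned}
& \Psi_1(n_{511} + n_{515}) + \Psi_2(n_{512} + n_{514}) && \text{for parental genotype combination } 5 \times 1 \\
& + \Psi_1(n_{521} + n_{526}) + \Psi_2(n_{523} + n_{524}) && \text{for parental genotype combination } 5 \times 2 \\
& + \Psi_1(n_{532} + n_{536}) + \Psi_2(n_{533} + n_{535}) && \text{for parental genotype combination } 5 \times 3 \\
& + \Psi_1(n_{541} + n_{548}) + \Psi_2(n_{542} + n_{547}) && \text{for parental genotype combination } 5 \times 4 \\
& + \Psi_1(n_{562} + n_{569}) + \Psi_2(n_{563} + n_{568}) && \text{for parental genotype combination } 5 \times 6 \\
& + \Psi_1(n_{574} + n_{578}) + \Psi_2(n_{575} + n_{577}) && \text{for parental genotype combination } 5 \times 7 \\
& + \Psi_1(n_{584} + n_{589}) + \Psi_2(n_{586} + n_{587}) && \text{for parental genotype combination } 5 \times 8 \\
& + \Psi_1(n_{595} + n_{599}) + \Psi_2(n_{596} + n_{598}) && \text{for parental genotype combination } 5 \times 9 \\
& + \Psi_1(n_{151} + n_{155}) + \Psi_2(n_{152} + n_{154}) && \text{for parental genotype combination } 1 \times 5 \\
& + \Psi_1(n_{251} + n_{256}) + \Psi_2(n_{253} + n_{254}) && \text{for parental genotype combination } 2 \times 5 \\
& + \Psi_1(n_{352} + n_{356}) + \Psi_2(n_{353} + n_{355}) && \text{for parental genotype combination } 3 \times 5 \\
& + \Psi_1(n_{451} + n_{458}) + \Psi_2(n_{452} + n_{457}) && \text{for parental genotype combination } 4 \times 5 \\
& + \Psi_1(n_{652} + n_{659}) + \Psi_2(n_{653} + n_{658}) && \text{for parental genotype combination } 6 \times 5 \\
& + \Psi_1(n_{754} + n_{758}) + \Psi_2(n_{755} + n_{757}) && \text{for parental genotype combination } 7 \times 5 \\
& + \Psi_1(n_{854} + n_{859}) + \Psi_2(n_{856} + n_{857}) && \text{for parental genotype combination } 8 \times 5 \\
& + \Psi_1(n_{955} + n_{959}) + \Psi_2(n_{956} + n_{958}) && \text{for parental genotype combination } 9 \times 5 \\
& + 2\Psi_1(n_{551} + n_{559}) + 2\Psi_2(n_{553} + n_{557}) \\
& + (\Psi_1 + \Psi_2)(n_{552} + n_{554} + n_{556} + n_{558}) + \Psi n_{555} && \text{for parental genotype combination } 5 \times 5,
\end{aligned}
$$

and

$$
\begin{aligned}
M = \sum_{i}\sum_{j}\sum_{k} n_{ijk} & - n_{522} - n_{525} - n_{544} - n_{545} - n_{565} - n_{566} - n_{585} - n_{588} \\
& - n_{252} - n_{255} - n_{454} - n_{455} - n_{655} - n_{656} - n_{855} - n_{858}.
\end{aligned}
$$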


The E and M steps are iterated between equations (2-13), (2-14) and (2-15) until stable

estimates are obtained. The standard deviation of the estimate of the recombination

fraction is further derived.
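
To illustrate the lower-level E and M steps, the sketch below treats a deliberately simplified case: only gametes transmitted by double heterozygote parents whose identity can be read from the offspring genotype are used, and they are pooled into two classes (AB or ab, versus Ab or aB). The E step is the per-haplotype recombinant probability of (2-13), and the M step divides the expected number of recombinants by the number of informative gametes, in the spirit of (2-15). The function name, the pooled input, and the treatment of the AB|ab diplotype frequency as known (from the upper-level estimates) are assumptions of this example, not the full cell-by-cell bookkeeping given above.

```python
def em_recombination_fraction(n_ab_type, n_other_type, phi,
                              theta=0.25, tol=1e-12, max_iter=10000):
    """EM estimate of the recombination fraction from pooled informative gametes.

    n_ab_type    : number of transmitted haplotypes that are AB or ab
    n_other_type : number of transmitted haplotypes that are Ab or aB
    phi          : relative frequency of diplotype AB|ab among AaBb parents (2-4)
    """
    total = n_ab_type + n_other_type
    for _ in range(max_iter):
        w1 = phi * (1 - theta) + (1 - phi) * theta      # omega_1 in (2-6)
        w2 = phi * theta + (1 - phi) * (1 - theta)      # omega_2 in (2-6)
        # E step: probability that a haplotype of each class is recombinant (2-13)
        psi1 = (1 - phi) * theta / w1
        psi2 = phi * theta / w2
        # M step: expected recombinants over informative gametes
        new_theta = (psi1 * n_ab_type + psi2 * n_other_type) / total
        if abs(new_theta - theta) < tol:
            return new_theta
        theta = new_theta
    return theta


# Hypothetical counts: 70 AB/ab-type and 30 Ab/aB-type gametes
theta_known_phase = em_recombination_fraction(70, 30, phi=1.0)
theta_mixed_phase = em_recombination_fraction(70, 30, phi=0.8)
```

When phi = 1 every double heterozygote parent is AB|ab, so the estimate reduces to the familiar proportion of Ab and aB gametes; for other values of phi the iteration settles where omega_1 matches the observed proportion of AB and ab gametes.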

To obtain the estimate of recombination fraction, the likelihood formulated by

the multinomial distribution with the parameters $\sum_{i=1}^{9}\sum_{j=1}^{9}\sum_{k=1}^{9} n_{ijk}$ and $p_{ijk}$ needs to be

maximized, where $i, j, k = 1, \ldots, 9$ and $n_{ijk}$ is the number of observations in genotype

combination $M_m = i$, $M_f = j$, $M_o = k$. The next lemma shows, using an example, that

under our design, the above process is equivalent to maximizing the likelihood using the

conditional probability.

Lemma 1. In the process of obtaining the maximum likelihood estimate (MLE) of

the recombination fraction, applying the EM algorithm in the likelihood using joint

probabilities of both parents' and offspring's genotypes is equivalent to using conditional

probabilities of offspring's genotypes given parents' genotypes.

Proof:

$$
\begin{aligned}
\log L(\Omega_g \mid M_m, M_f, M_o, \Omega_p) = {} & n_{111}\log(p_{11}^4) + n_{121}\log(\tfrac{1}{2}p_{11}^2 \cdot 2p_{11}p_{10}) + \cdots \\
& + n_{151}\log(\tfrac{1}{2}p_{11}^2[2p_{11}p_{00}(1-\theta) + 2p_{10}p_{01}\theta]) \\
& + n_{152}\log(\tfrac{1}{2}p_{11}^2[2p_{11}p_{00}\theta + 2p_{10}p_{01}(1-\theta)]) + \cdots \\
& + n_{555}\log(\tfrac{1}{2}\{[2p_{11}p_{00}(1-\theta) + 2p_{10}p_{01}\theta]^2 + [2p_{11}p_{00}\theta + 2p_{10}p_{01}(1-\theta)]^2\}) \\
& + \cdots + n_{999}\log(p_{00}^4).
\end{aligned}
$$

Since $p_{11}$, $p_{10}$, $p_{01}$, $p_{00}$ are known from the estimation through the upper-level

likelihood (2-3), the above equation is equal to

$$
\begin{aligned}
& \text{constant} + n_{151}\log(\tfrac{1}{2}p_{11}^2[2p_{11}p_{00}(1-\theta) + 2p_{10}p_{01}\theta]) \\
& + n_{152}\log(\tfrac{1}{2}p_{11}^2[2p_{11}p_{00}\theta + 2p_{10}p_{01}(1-\theta)]) + \cdots \\
& + n_{555}\log(\tfrac{1}{2}\{[2p_{11}p_{00}(1-\theta) + 2p_{10}p_{01}\theta]^2 + [2p_{11}p_{00}\theta + 2p_{10}p_{01}(1-\theta)]^2\}) \\
& + \cdots + n_{959}\log(\tfrac{1}{2}p_{00}^2[2p_{11}p_{00}(1-\theta) + 2p_{10}p_{01}\theta]).
\end{aligned}
$$

Taking the cell 151 from the above likelihood as an example, in the E step, the

probability with which a considered haplotype produced by a double heterozygote parent

is the recombinant type is

$$
\Psi_1^{\text{joint}} = \frac{p_{10}p_{01}\theta}{p_{11}p_{00}(1-\theta) + p_{10}p_{01}\theta}.
$$

If we use the cell 151 from the likelihood using conditional probabilities shown in (2-7),

in the E step, the probability with which a considered haplotype produced by a double

heterozygote parent is the recombinant type is calculated using (2-13), i.e.,

$$
\begin{aligned}
\Psi_1^{\text{cond}} &= \frac{(1-\phi)\theta}{\phi(1-\theta) + (1-\phi)\theta} \\
&= \frac{\dfrac{p_{10}p_{01}}{p_{11}p_{00} + p_{10}p_{01}}\,\theta}{\dfrac{p_{11}p_{00}}{p_{11}p_{00} + p_{10}p_{01}}(1-\theta) + \dfrac{p_{10}p_{01}}{p_{11}p_{00} + p_{10}p_{01}}\,\theta} \\
&= \frac{p_{10}p_{01}\theta}{p_{11}p_{00}(1-\theta) + p_{10}p_{01}\theta} \\
&= \Psi_1^{\text{joint}}.
\end{aligned}
$$

Similarly, $\Psi_2$ and $\Psi$ in (2-13) and (2-14) using the conditional probabilities are the same

as those using the joint probabilities. Therefore, we conclude that it is equivalent to

obtain the recombination fraction estimate using the joint probabilities and the conditional

probabilities in the lower-level likelihood. $\Box$

2.2.4 Hypothesis Testing

After genetic parameters are estimated, we will need to test whether the two markers

are associated and/or linked on the same genomic region. This can be formulated as the following

hypotheses:

$H_0$: $D = 0$ and $\theta = 0.5$ vs. $H_1$: At least one of the equalities above does not hold.

The likelihoods under $H_0$ and $H_1$ are calculated, from which a log-likelihood ratio

is calculated. By comparing this test statistic with a $\chi^2$ threshold with two degrees of









freedom, we can accept or reject $H_0$. It is also necessary to test the significance of $D$ and

$\theta$ separately, showing how the two markers are related. Under the null hypothesis $H_0$:

$D = 0$, parental diplotype and genotype frequencies can be simply expressed as a function

of allele frequencies which can be estimated with no need of the EM algorithm. Similarly,

under the null hypothesis $H_0$: $\theta = 0.5$, offspring diplotype and genotype frequencies

within each family are simply expressed as a function of the Mendelian segregation ratio so

that no parameter needs to be estimated.
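
The log-likelihood ratio tests described here can be sketched as follows, assuming the maximized log-likelihoods under the null and alternative hypotheses have already been computed. The closed-form chi-square tail probabilities for one and two degrees of freedom are standard results; the function name and the numerical log-likelihood values are hypothetical.

```python
import math


def likelihood_ratio_test(loglik_null, loglik_alt, df):
    """Return (LR statistic, asymptotic p-value) for df = 1 or df = 2."""
    lr = -2.0 * (loglik_null - loglik_alt)
    lr = max(lr, 0.0)  # guard against tiny negative values caused by rounding
    if df == 2:
        # Chi-square survival function with 2 df: exp(-x / 2)
        p_value = math.exp(-lr / 2.0)
    elif df == 1:
        # Chi-square survival function with 1 df: erfc(sqrt(x / 2))
        p_value = math.erfc(math.sqrt(lr / 2.0))
    else:
        raise ValueError("only df = 1 or df = 2 are handled in this sketch")
    return lr, p_value


# Hypothetical maximized log-likelihoods for the joint test of D and theta (2 df)
lr_joint, p_joint = likelihood_ratio_test(-1250.0, -1245.0, df=2)
# Hypothetical values for a separate test of D alone (1 df)
lr_single, p_single = likelihood_ratio_test(-101.92073, -100.0, df=1)
```

The two-degree-of-freedom case corresponds to the joint test of D and the recombination fraction, and the one-degree-of-freedom case to testing either parameter separately.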

Our family-based design allows the test of difference in genetic diversity and

structure between males and females. Let haplotype frequencies, allele frequencies,

linkage disequilibrium and recombination fraction differ between the two sexes. These

sex-specific parameters are denoted as $p_{11}^m$, $p_{10}^m$, $p_{01}^m$ and $p_{00}^m$, $p_m$ and $q_m$, $D_m$, and $\theta_m$ for

males and $p_{11}^f$, $p_{10}^f$, $p_{01}^f$ and $p_{00}^f$, $p_f$ and $q_f$, $D_f$, and $\theta_f$ for females. Whether males and

females have the same haplotype distribution can be tested by formulating the following

hypotheses:

$$
\begin{aligned}
H_0 &: (p_{11}^m, p_{10}^m, p_{01}^m, p_{00}^m) = (p_{11}^f, p_{10}^f, p_{01}^f, p_{00}^f) \equiv (p_{11}, p_{10}, p_{01}, p_{00}) \\
H_1 &: \text{At least one of the equalities above does not hold.}
\end{aligned}
\tag{2-16}
$$

The likelihood under the $H_1$ of hypotheses (2-16) is calculated with the following steps.

First, the observations for each of the nine possible genotypes for the two markers in

females (mothers) and males (fathers) are collapsed using $n_{i\cdot} = \sum_{j=1}^{9} n_{ij}$ and $n_{\cdot j} = \sum_{i=1}^{9} n_{ij}$, respectively.

Second, from these collapsed observations, sex-specific haplotype frequencies are estimated

with the EM algorithm. Third, because of differences between males and females, the

frequencies of parental diplotype (and therefore genotype) combinations in Table 2-2 will

be replaced by the products of sex-specific diplotype frequencies from which the likelihood

is estimated.

Sex-specific differences in allele frequencies and LD can be tested with the null

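hypothesis, $H_0: p_m = p_f$ and $q_m = q_f$ and $H_0: D_m = D_f$, which are equivalent to the

following constraints, respectively,

$$
p_{11}^m + p_{10}^m = p_{11}^f + p_{10}^f \tag{2-17}
$$

$$
p_{11}^m + p_{01}^m = p_{11}^f + p_{01}^f \tag{2-18}
$$

$$
p_{11}^m p_{00}^m - p_{10}^m p_{01}^m = p_{11}^f p_{00}^f - p_{10}^f p_{01}^f. \tag{2-19}
$$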

Parameter estimation under these null hypotheses can be conducted using the EM

algorithm implemented with the constraints in equations (2-17), (2-18), and (2-19),

respectively.

If there is a sex-specific difference in the recombination fraction, we will redefine the

overall probability of a haplotype in equation (2-6) by

$$
\begin{aligned}
\omega_1^m &= \phi_m(1-\theta_m) + (1-\phi_m)\theta_m, & \omega_2^m &= \phi_m\theta_m + (1-\phi_m)(1-\theta_m), \\
\omega_1^f &= \phi_f(1-\theta_f) + (1-\phi_f)\theta_f, & \omega_2^f &= \phi_f\theta_f + (1-\phi_f)(1-\theta_f),
\end{aligned}
$$

where $\phi_m = p_{11}^m p_{00}^m/(p_{11}^m p_{00}^m + p_{10}^m p_{01}^m)$ and $\phi_f = p_{11}^f p_{00}^f/(p_{11}^f p_{00}^f + p_{10}^f p_{01}^f)$. These sex-specific

overall probabilities are used to express the diplotype or genotype frequencies of offspring

within a family. Then, the EM algorithm is derived to estimate sex-specific recombination

fractions $\theta_m$ and $\theta_f$. Whether the recombination fraction is sex-specific can be tested by

formulating the hypotheses

$$
\begin{aligned}
H_0 &: \theta_m = \theta_f \\
H_1 &: \theta_m \neq \theta_f,
\end{aligned}
\tag{2-20}
$$

from which a log-likelihood ratio is then calculated.

2.3 Three-Point Model

The idea of two-point analysis can be generalized to multilocus data to analyze the

linkage and linkage disequilibrium simultaneously. For an n-marker design, there are

$2^n$ haplotypes, $2^{n-1}(2^n + 1)$ possible diplotypes and $3^n$ genotypes. In this section, we

extend the idea of two-point analysis to three-marker data. We will show the procedure

of three-point analysis and its genetic and statistical advantages compared to two-point

analysis. We will use the same strategy to sample a random set of families from a natural









population at HWE. Each family is composed of the maternal and paternal parents and

their offspring. Suppose there are three segregating markers, A (with alleles A and a), B

(with alleles B and b), and C (with alleles C and c), in the population. Each member in a

family is typed for these three markers, with data structure similar to two-point analysis

of 2-1.
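The counts quoted above for an n-marker design can be checked by brute-force enumeration. The following is a minimal sketch, assuming biallelic markers and treating a diplotype as an unordered pair of haplotypes; the function name `marker_counts` is hypothetical and is not part of the original analysis. With ordered pairs, three markers would instead give 8 x 8 = 64 combinations.

```python
from itertools import combinations_with_replacement, product


def marker_counts(n):
    """Count haplotypes, unordered diplotypes and genotypes for n biallelic markers."""
    # 1 codes the capital allele and 0 the lowercase allele at each marker.
    haplotypes = list(product((1, 0), repeat=n))
    # An unordered diplotype is a pair of haplotypes, possibly identical.
    diplotypes = list(combinations_with_replacement(haplotypes, 2))
    # A genotype only records, per marker, how many capital alleles are carried.
    genotypes = {tuple(a + b for a, b in zip(h1, h2)) for h1, h2 in diplotypes}
    return len(haplotypes), len(diplotypes), len(genotypes)


for n in (1, 2, 3):
    print(n, marker_counts(n))
```

For n = 3 the enumeration gives 8 haplotypes, 36 unordered diplotypes and 27 genotypes, in agreement with the closed-form expressions.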

2.3.1 Multilocus Population Genetics

The three markers are segregating in the population. The eight haplotypes, ABC,
ABc, AbC, Abc, aBC, aBc, abC, and abc, generated by the three markers are distributed
in the population with frequencies denoted as $p_{111}$, $p_{110}$, $p_{101}$, $p_{100}$, $p_{011}$,
$p_{010}$, $p_{001}$, and $p_{000}$, respectively. The allele frequencies of the three markers are
denoted as $p$ vs. $1-p$, $q$ vs. $1-q$, and $r$ vs. $1-r$, respectively. Let $D_{AB}$, $D_{AC}$,
$D_{BC}$, and $D_{ABC}$ be the linkage disequilibria between each possible pair of markers and
among all three markers. We express the haplotype frequencies as functions of the allele
frequencies at each marker and the linkage disequilibria among the markers (Weir 1996), i.e.,


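\[
\begin{aligned}
p_{111} &= pqr + pD_{BC} + qD_{AC} + rD_{AB} + D_{ABC} \\
p_{110} &= pq(1-r) - pD_{BC} - qD_{AC} + (1-r)D_{AB} - D_{ABC} \\
p_{101} &= p(1-q)r - pD_{BC} + (1-q)D_{AC} - rD_{AB} - D_{ABC} \\
p_{100} &= p(1-q)(1-r) + pD_{BC} - (1-q)D_{AC} - (1-r)D_{AB} + D_{ABC} \\
p_{011} &= (1-p)qr + (1-p)D_{BC} - qD_{AC} - rD_{AB} - D_{ABC} \\
p_{010} &= (1-p)q(1-r) - (1-p)D_{BC} + qD_{AC} - (1-r)D_{AB} + D_{ABC} \\
p_{001} &= (1-p)(1-q)r - (1-p)D_{BC} - (1-q)D_{AC} + rD_{AB} + D_{ABC} \\
p_{000} &= (1-p)(1-q)(1-r) + (1-p)D_{BC} + (1-q)D_{AC} + (1-r)D_{AB} - D_{ABC}.
\end{aligned}
\tag{2-21}
\]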
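The sign pattern in equation (2-21) can be generated mechanically: each disequilibrium term carries a plus sign when the alleles at the markers it involves are both capital or both lowercase, and a minus sign otherwise. The sketch below is an illustration under that reading of (2-21); the function name `haplotype_freqs` and the numerical values are hypothetical.

```python
import math


def haplotype_freqs(p, q, r, d_ab=0.0, d_ac=0.0, d_bc=0.0, d_abc=0.0):
    """Haplotype frequencies keyed '111' (ABC) ... '000' (abc), following (2-21)."""
    freqs = {}
    for a in (1, 0):
        for b in (1, 0):
            for c in (1, 0):
                fa = p if a else 1 - p        # allele frequency at marker A
                fb = q if b else 1 - q        # allele frequency at marker B
                fc = r if c else 1 - r        # allele frequency at marker C
                sa = 1 if a else -1           # sign contributed by marker A
                sb = 1 if b else -1
                sc = 1 if c else -1
                freqs[f"{a}{b}{c}"] = (
                    fa * fb * fc
                    + sb * sc * fa * d_bc
                    + sa * sc * fb * d_ac
                    + sa * sb * fc * d_ab
                    + sa * sb * sc * d_abc
                )
    return freqs


# Hypothetical parameter values, chosen only for illustration.
example = haplotype_freqs(0.6, 0.5, 0.4, d_ab=0.05, d_ac=0.02, d_bc=0.03, d_abc=0.01)
print(math.isclose(sum(example.values()), 1.0))
```

Two quick consistency checks follow from the construction: the eight frequencies sum to one, and the two-marker margin $p_{111} + p_{110}$ equals $pq + D_{AB}$.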

These eight haplotypes generate 64 possible diplotypes, which lead to 27 genotypes.
Diplotype frequencies for the three markers are expressed as the products of haplotype
frequencies under the HWE assumption. The frequencies of diplotypes that are genotypically
identical are collapsed so that genotype frequencies are calculated. Analogous to Table 2-2,
we can tabulate the mating frequencies of different genotypes between the maternal
and paternal parents in Table 2-3. As shown in this table, three markers produce
27 x 27 = 729 possible parental genotype combinations. Let $n_{i,j}$ denote the number of
families from the combination of mother genotype i and father genotype j for the
three markers (i, j = 1, 2, ..., 27), and let $n_{i,j,k}$ denote the number of offspring of
genotype k derived from parental genotype combination ij (k = 1, 2, ..., 27). Similarly to
Table 2-1, the structure of the genotypic data collected from
$n = \sum_{i=1}^{27}\sum_{j=1}^{27} n_{i,j}$ random families, showing the distribution of
genotypes in the mothers, fathers, and their offspring, can be tabulated.
With such a mating structure, we can formulate a multinomial likelihood for marker
genotype observations, from which the EM algorithm is implemented to obtain the MLEs
of haplotype frequencies $\Omega_p = (p_{111}, p_{110}, p_{101}, p_{100}, p_{011}, p_{010}, p_{001}, p_{000})$.




























[Table 2-3. Mating frequencies of the 27 x 27 parental genotype combinations for three markers. The table body is not recoverable from the extracted text.]









2.3.2 Multilocus Mendelian Genetics

Offspring genotypes in each family are determined by the parental genotype
combination: given its parents' genotypes, a family can contain only a certain subset of
the 27 possible marker genotypes. At meiosis, a parental diplotype is broken down to
form recombinant and nonrecombinant haplotypes for the next generation. A heterozygous
parent produces different haplotypes (i.e., types of gametes). For a triple heterozygote
with genotype AaBbCc, the real diplotype is unknown, although it must be one of the
four possible diplotypes, i.e., ABC|abc, ABc|abC, AbC|aBc, and Abc|aBC. As defined in
equations (2-4) and (2-5), the relative frequencies of these diplotypes can be expressed in
terms of diplotype (and therefore haplotype) frequencies in the population.

Assume that the three markers are in the order A-B-C. We use $g_{00}$, $g_{01}$, $g_{10}$, and $g_{11}$ to
denote the probabilities with which there is no crossover between markers A and B as
well as between markers B and C, there is no crossover between markers A and B but
one crossover between markers B and C, there is one crossover between markers A and B
but no crossover between markers B and C, and there is one crossover between markers A
and B as well as between markers B and C, respectively. The recombination frequencies
between markers A and B ($\theta_{12}$), between markers B and C ($\theta_{23}$), and between markers A
and C ($\theta_{13}$) can be expressed in terms of these g probabilities, i.e.,


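\[
\begin{aligned}
\theta_{12} &= g_{10} + g_{11}, \\
\theta_{23} &= g_{01} + g_{11}, \\
\theta_{13} &= g_{10} + g_{01},
\end{aligned}
\tag{2-22}
\]

which leads to

\[
\begin{aligned}
g_{11} &= \tfrac{1}{2}(\theta_{12} + \theta_{23} - \theta_{13}), \\
g_{10} &= \tfrac{1}{2}(\theta_{12} + \theta_{13} - \theta_{23}), \\
g_{01} &= \tfrac{1}{2}(\theta_{13} + \theta_{23} - \theta_{12}), \\
g_{00} &= 1 - g_{11} - g_{10} - g_{01}.
\end{aligned}
\tag{2-23}
\]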

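From these expressions, the coefficient of genetic interference between two adjacent
marker intervals can be derived as

\[ I = \frac{2\theta_{12}\theta_{23} - \theta_{12} - \theta_{23} + \theta_{13}}{2\theta_{12}\theta_{23}}. \tag{2-24} \]

Non-interference, i.e., $I = 0$, means that we have

\[ \theta_{13} = \theta_{12} + \theta_{23} - 2\theta_{12}\theta_{23}. \tag{2-25} \]

If there is no double recombination between two adjacent marker intervals, this means
$I = 1$, or

\[ \theta_{13} = \theta_{12} + \theta_{23}. \tag{2-26} \]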
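The mapping between crossover probabilities and recombination fractions in equations (2-22) and (2-23), together with the interference coefficient, can be illustrated with a short sketch. The function names and the numerical values below are hypothetical and serve only to show that the two mappings are inverse to each other and that $I$ takes the values 0 and 1 in the two limiting cases.

```python
import math


def recombination_fractions(g00, g01, g10, g11):
    """Equation (2-22): (theta12, theta23, theta13) from the crossover probabilities."""
    if not math.isclose(g00 + g01 + g10 + g11, 1.0):
        raise ValueError("crossover probabilities must sum to one")
    return g10 + g11, g01 + g11, g10 + g01


def crossover_probabilities(theta12, theta23, theta13):
    """Equation (2-23): (g00, g01, g10, g11) from the recombination fractions."""
    g11 = 0.5 * (theta12 + theta23 - theta13)
    g10 = 0.5 * (theta12 + theta13 - theta23)
    g01 = 0.5 * (theta13 + theta23 - theta12)
    g00 = 1.0 - g11 - g10 - g01
    return g00, g01, g10, g11


def interference(theta12, theta23, theta13):
    """Coefficient of interference between the two adjacent intervals."""
    double = 2.0 * theta12 * theta23
    return (double - theta12 - theta23 + theta13) / double


# Hypothetical values: theta13 is chosen so that there is no interference.
theta12, theta23 = 0.1, 0.2
theta13 = theta12 + theta23 - 2 * theta12 * theta23
print(crossover_probabilities(theta12, theta23, theta13))
print(interference(theta12, theta23, theta13))
```

Under no interference the double-crossover probability $g_{11}$ reduces to the product $\theta_{12}\theta_{23}$, which the example reproduces.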


The frequencies of the eight haplotypes, ABC, ABc, AbC, Abc, aBC, aBc,
abC, and abc, produced by this heterozygous parent are expressed as functions of the g
probabilities, depending on the diplotype of this parent, i.e.,

                          Parental Diplotype
Haplotype    ABC|abc      ABc|abC      AbC|aBc      Abc|aBC
ABC          g_{00}/2     g_{01}/2     g_{11}/2     g_{10}/2
ABc          g_{01}/2     g_{00}/2     g_{10}/2     g_{11}/2
AbC          g_{11}/2     g_{10}/2     g_{00}/2     g_{01}/2
Abc          g_{10}/2     g_{11}/2     g_{01}/2     g_{00}/2
aBC          g_{10}/2     g_{11}/2     g_{01}/2     g_{00}/2
aBc          g_{11}/2     g_{10}/2     g_{00}/2     g_{01}/2
abC          g_{01}/2     g_{00}/2     g_{10}/2     g_{11}/2
abc          g_{00}/2     g_{01}/2     g_{11}/2     g_{10}/2

For double heterozygous parents, haplotype frequencies are collapsed when such a
parent produces the same haplotype by different mechanisms. As an example, the
haplotype frequencies for the double heterozygote AaBbCC are calculated as follows:

                    Parental Diplotype
Haplotype    ABC|abC                 AbC|aBC
ABC          (g_{00} + g_{01})/2     (g_{10} + g_{11})/2
AbC          (g_{10} + g_{11})/2     (g_{00} + g_{01})/2
aBC          (g_{10} + g_{11})/2     (g_{00} + g_{01})/2
abC          (g_{00} + g_{01})/2     (g_{10} + g_{11})/2

For any parental mating of different genotypes, we can calculate the frequencies of
offspring genotypes for the three markers. Similar to Table 2-2, the
distribution and frequencies of offspring genotypes within each family can be derived
for the three-point model. Based on observations of offspring genotypes within each family,
a multinomial likelihood can be formulated, from which the EM algorithm of a complex
hierarchy is derived to obtain the MLEs of $\Omega_g = (g_{11}, g_{10}, g_{01}, g_{00})$. Based on these MLEs,
the estimates of the recombination fractions between different markers can be obtained.

2.3.3 Likelihood

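The upper-level likelihood is constructed by a polynomial function expressed as

\[
\begin{aligned}
\log L(\Omega_p \mid M_m, M_f) = {} & \text{constant} \\
& + n_{1,1}\log(p_{111}^4) + n_{1,2}\log(2p_{111}^3p_{110}) + n_{1,3}\log(p_{111}^2p_{110}^2) + n_{1,4}\log(2p_{111}^3p_{101}) \\
& + n_{1,5}\log[2p_{111}^2(p_{111}p_{100} + p_{110}p_{101})] + \cdots \\
& + n_{1,14}\log[2p_{111}^2(p_{111}p_{000} + p_{110}p_{001} + p_{100}p_{011} + p_{101}p_{010})] + \cdots \\
& + n_{14,14}\log[4(p_{111}p_{000} + p_{110}p_{001} + p_{100}p_{011} + p_{101}p_{010})^2] + \cdots \\
& + n_{27,27}\log(p_{000}^4).
\end{aligned}
\tag{2-28}
\]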


Similar to the two-point design, a given parental genotype combination produces a certain
group of offspring genotypes. For example, if the mother and father both
have homozygous genotype AABBCC, then their offspring will always have diplotype
ABC|ABC, or genotype AABBCC. If one of the parents is AABBCC and the other
is AABBCc, then the offspring have diplotypes ABC|ABC (or genotype AABBCC)
and ABC|ABc (or genotype AABBCc) with equal frequency. For a parent with
triple heterozygous genotype AaBbCc, there are four possible diplotypes, ABC|abc,
ABc|abC, AbC|aBc, or Abc|aBC, whose relative frequencies are

\[
\begin{aligned}
\eta_1 &= \frac{p_{111}p_{000}}{p_{111}p_{000} + p_{110}p_{001} + p_{101}p_{010} + p_{100}p_{011}}, \\
\eta_2 &= \frac{p_{110}p_{001}}{p_{111}p_{000} + p_{110}p_{001} + p_{101}p_{010} + p_{100}p_{011}}, \\
\eta_3 &= \frac{p_{101}p_{010}}{p_{111}p_{000} + p_{110}p_{001} + p_{101}p_{010} + p_{100}p_{011}}, \\
\eta_4 &= 1 - \eta_1 - \eta_2 - \eta_3 = \frac{p_{100}p_{011}}{p_{111}p_{000} + p_{110}p_{001} + p_{101}p_{010} + p_{100}p_{011}},
\end{aligned}
\tag{2-29}
\]

respectively. These diplotypes produce haplotypes ABC, ABc, AbC, Abc, aBC, aBc,
abC, and abc, with the frequencies defined in Section 2.3.2.

Parent AaBbCc                                      Haplotype
Diplotype   Frequency   ABC        ABc        AbC        Abc        aBC        aBc        abC        abc
ABC|abc     \eta_1      g_{00}/2   g_{01}/2   g_{11}/2   g_{10}/2   g_{10}/2   g_{11}/2   g_{01}/2   g_{00}/2
ABc|abC     \eta_2      g_{01}/2   g_{00}/2   g_{10}/2   g_{11}/2   g_{11}/2   g_{10}/2   g_{00}/2   g_{01}/2
AbC|aBc     \eta_3      g_{11}/2   g_{10}/2   g_{00}/2   g_{01}/2   g_{01}/2   g_{00}/2   g_{10}/2   g_{11}/2
Abc|aBC     \eta_4      g_{10}/2   g_{11}/2   g_{01}/2   g_{00}/2   g_{00}/2   g_{01}/2   g_{11}/2   g_{10}/2
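Let

\[
\begin{aligned}
\gamma_1 &= \eta_1 g_{00} + \eta_2 g_{01} + \eta_3 g_{11} + (1 - \eta_1 - \eta_2 - \eta_3)g_{10} \\
\gamma_2 &= \eta_1 g_{01} + \eta_2 g_{00} + \eta_3 g_{10} + (1 - \eta_1 - \eta_2 - \eta_3)g_{11} \\
\gamma_3 &= \eta_1 g_{11} + \eta_2 g_{10} + \eta_3 g_{00} + (1 - \eta_1 - \eta_2 - \eta_3)g_{01} \\
\gamma_4 &= \eta_1 g_{10} + \eta_2 g_{11} + \eta_3 g_{01} + (1 - \eta_1 - \eta_2 - \eta_3)g_{00}.
\end{aligned}
\tag{2-30}
\]

Writing $\eta_4 = 1 - \eta_1 - \eta_2 - \eta_3$, this can be expressed in matrix form as

\[
\Gamma =
\begin{pmatrix}
\eta_2 & \eta_1 & \eta_3 & \eta_4 \\
\eta_1 & \eta_2 & \eta_4 & \eta_3 \\
\eta_4 & \eta_3 & \eta_1 & \eta_2 \\
\eta_3 & \eta_4 & \eta_2 & \eta_1
\end{pmatrix} G,
\]

where $\Gamma = [\gamma_1, \gamma_2, \gamma_3, \gamma_4]'$ and $G = [g_{01}, g_{00}, g_{11}, g_{10}]'$.

Thus, the overall haplotype frequencies produced by a parent with the triple heterozygous
genotype are calculated as $\tfrac{1}{2}\gamma_1$ for ABC or abc, $\tfrac{1}{2}\gamma_2$ for ABc or abC,
$\tfrac{1}{2}\gamma_3$ for AbC or aBc, and $\tfrac{1}{2}\gamma_4$ for Abc or aBC.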
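The two-stage calculation for a triple heterozygote (diplotype probabilities from equation (2-29), then the mixing over diplotypes in equation (2-30)) can be sketched as follows. The function names and the dictionary keyed by '111' ... '000' are illustrative assumptions, not notation from the text.

```python
import math


def triple_het_diplotype_probs(p):
    """Equation (2-29): relative frequencies of ABC|abc, ABc|abC, AbC|aBc, Abc|aBC."""
    products = [
        p["111"] * p["000"],
        p["110"] * p["001"],
        p["101"] * p["010"],
        p["100"] * p["011"],
    ]
    total = sum(products)
    return [x / total for x in products]


def triple_het_gamete_params(eta, g00, g01, g10, g11):
    """Equation (2-30): gamma1..gamma4; each haplotype frequency is gamma/2."""
    e1, e2, e3, e4 = eta
    gamma1 = e1 * g00 + e2 * g01 + e3 * g11 + e4 * g10   # ABC or abc
    gamma2 = e1 * g01 + e2 * g00 + e3 * g10 + e4 * g11   # ABc or abC
    gamma3 = e1 * g11 + e2 * g10 + e3 * g00 + e4 * g01   # AbC or aBc
    gamma4 = e1 * g10 + e2 * g11 + e3 * g01 + e4 * g00   # Abc or aBC
    return gamma1, gamma2, gamma3, gamma4


# Hypothetical inputs: equal haplotype frequencies and arbitrary crossover probabilities.
p = {key: 0.125 for key in ("111", "110", "101", "100", "011", "010", "001", "000")}
eta = triple_het_diplotype_probs(p)
print(triple_het_gamete_params(eta, 0.72, 0.18, 0.08, 0.02))
```

Because both the diplotype probabilities and the crossover probabilities sum to one, the four gamma values also sum to one, so the eight gamete frequencies form a proper distribution.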

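If the parent has a double heterozygous genotype, then its possible diplotypes, as well
as their respective relative frequencies, can be listed as follows:

\[
\begin{array}{llll}
\text{Genotype} & \text{Possible diplotypes} & \text{Relative frequencies} & \text{Haplotype parameter} \\
AABbCc & ABC|Abc & \phi_{11} = \dfrac{p_{111}p_{100}}{p_{111}p_{100} + p_{110}p_{101}} & v_{11} = \phi_{11}(1-\theta_{23}) + (1-\phi_{11})\theta_{23} \\
       & ABc|AbC & 1 - \phi_{11} & 1 - v_{11} = \phi_{11}\theta_{23} + (1-\phi_{11})(1-\theta_{23}) \\
aaBbCc & aBC|abc & \phi_{01} = \dfrac{p_{011}p_{000}}{p_{011}p_{000} + p_{010}p_{001}} & v_{01} = \phi_{01}(1-\theta_{23}) + (1-\phi_{01})\theta_{23} \\
       & aBc|abC & 1 - \phi_{01} & 1 - v_{01} = \phi_{01}\theta_{23} + (1-\phi_{01})(1-\theta_{23}) \\
AaBBCc & ABC|aBc & \phi_{12} = \dfrac{p_{111}p_{010}}{p_{111}p_{010} + p_{110}p_{011}} & v_{12} = \phi_{12}(1-\theta_{13}) + (1-\phi_{12})\theta_{13} \\
       & ABc|aBC & 1 - \phi_{12} & 1 - v_{12} = \phi_{12}\theta_{13} + (1-\phi_{12})(1-\theta_{13}) \\
AabbCc & AbC|abc & \phi_{02} = \dfrac{p_{101}p_{000}}{p_{101}p_{000} + p_{100}p_{001}} & v_{02} = \phi_{02}(1-\theta_{13}) + (1-\phi_{02})\theta_{13} \\
       & Abc|abC & 1 - \phi_{02} & 1 - v_{02} = \phi_{02}\theta_{13} + (1-\phi_{02})(1-\theta_{13}) \\
AaBbCC & ABC|abC & \phi_{13} = \dfrac{p_{111}p_{001}}{p_{111}p_{001} + p_{101}p_{011}} & v_{13} = \phi_{13}(1-\theta_{12}) + (1-\phi_{13})\theta_{12} \\
       & AbC|aBC & 1 - \phi_{13} & 1 - v_{13} = \phi_{13}\theta_{12} + (1-\phi_{13})(1-\theta_{12}) \\
AaBbcc & ABc|abc & \phi_{03} = \dfrac{p_{110}p_{000}}{p_{110}p_{000} + p_{100}p_{010}} & v_{03} = \phi_{03}(1-\theta_{12}) + (1-\phi_{03})\theta_{12} \\
       & Abc|aBc & 1 - \phi_{03} & 1 - v_{03} = \phi_{03}\theta_{12} + (1-\phi_{03})(1-\theta_{12})
\end{array}
\tag{2-31}
\]

where $\theta_{12}$, $\theta_{23}$, and $\theta_{13}$ are linear functions of $g_{ij}$ ($i, j = 0, 1$), as defined in (2-22). Then the
haplotype frequencies produced by parent AABbCc are calculated as $\tfrac{1}{2}v_{11}$ for ABC or Abc
and $\tfrac{1}{2}(1 - v_{11})$ for ABc or AbC; the haplotype frequencies produced by parent aaBbCc are
calculated as $\tfrac{1}{2}v_{01}$ for aBC or abc and $\tfrac{1}{2}(1 - v_{01})$ for aBc or abC. Similarly, parents with
AaBBCc produce haplotypes with frequencies of $\tfrac{1}{2}v_{12}$ for ABC or aBc and $\tfrac{1}{2}(1 - v_{12})$
for ABc or aBC; parents with AabbCc produce haplotypes with frequencies of $\tfrac{1}{2}v_{02}$
for AbC or abc and $\tfrac{1}{2}(1 - v_{02})$ for Abc or abC. Also, parents with AaBbCC produce
haplotypes with frequencies of $\tfrac{1}{2}v_{13}$ for ABC or abC and $\tfrac{1}{2}(1 - v_{13})$ for AbC or aBC;
parents with AaBbcc produce haplotypes with frequencies of $\tfrac{1}{2}v_{03}$ for ABc or abc and
$\tfrac{1}{2}(1 - v_{03})$ for Abc or aBc.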
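As an illustration of the double-heterozygote calculation in (2-31), the sketch below works through the AABbCc case: it computes the diplotype probability $\phi_{11}$ from the haplotype frequencies and then mixes the two diplotypes over the recombination fraction $\theta_{23}$. The helper names and the numerical inputs are hypothetical.

```python
import math


def diplotype_prob(p, first, second):
    """Relative frequency of the first of two diplotypes sharing one genotype.

    `first` and `second` are pairs of haplotype keys such as ('111', '100').
    """
    a = p[first[0]] * p[first[1]]
    b = p[second[0]] * p[second[1]]
    return a / (a + b)


def gamete_param(phi, theta):
    """v = phi(1 - theta) + (1 - phi)theta, as in the last column of (2-31)."""
    return phi * (1.0 - theta) + (1.0 - phi) * theta


def aabbcc_single_parent_gametes(p, theta23):
    """Gamete frequencies of an AABbCc parent (diplotype ABC|Abc or ABc|AbC)."""
    phi11 = diplotype_prob(p, ("111", "100"), ("110", "101"))
    v11 = gamete_param(phi11, theta23)
    return {
        "ABC": 0.5 * v11,
        "Abc": 0.5 * v11,
        "ABc": 0.5 * (1.0 - v11),
        "AbC": 0.5 * (1.0 - v11),
    }


# Hypothetical haplotype frequencies and recombination fraction.
p = {"111": 0.3, "110": 0.1, "101": 0.1, "100": 0.1,
     "011": 0.1, "010": 0.1, "001": 0.1, "000": 0.1}
print(aabbcc_single_parent_gametes(p, 0.2))
```

The same two helper functions cover the other five double heterozygotes by changing the haplotype pairs and the recombination fraction that separates the two heterozygous markers.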









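Again, based on the information of genetic segregation in each family, the lower-level
likelihood is constructed as follows:

\[
\begin{aligned}
\log L(\Omega_g \mid M_m, M_f, M_o, \Omega_p) = {} & \text{constant} \\
& + n_{1,1,1}\log(1) + n_{1,2,1}\log(\tfrac{1}{2}) + n_{1,2,2}\log(\tfrac{1}{2}) + n_{1,3,2}\log(1) + n_{1,4,1}\log(\tfrac{1}{2}) + n_{1,4,4}\log(\tfrac{1}{2}) \\
& + (n_{1,5,1} + n_{1,5,5})\log(\tfrac{1}{2}v_{11}) + (n_{1,5,2} + n_{1,5,4})\log(\tfrac{1}{2}(1 - v_{11})) \\
& + \cdots \\
& + (n_{1,14,1} + n_{1,14,14})\log(\tfrac{1}{2}\gamma_1) + (n_{1,14,2} + n_{1,14,13})\log(\tfrac{1}{2}\gamma_2) + (n_{1,14,4} + n_{1,14,11})\log(\tfrac{1}{2}\gamma_3) \\
& + (n_{1,14,5} + n_{1,14,10})\log(\tfrac{1}{2}\gamma_4) + \cdots \\
& + (n_{14,14,1} + n_{14,14,27})\log(\tfrac{1}{4}\gamma_1^2) + (n_{14,14,2} + n_{14,14,26})\log(\tfrac{1}{2}\gamma_1\gamma_2) + (n_{14,14,3} + n_{14,14,25})\log(\tfrac{1}{4}\gamma_2^2) \\
& + (n_{14,14,4} + n_{14,14,24})\log(\tfrac{1}{2}\gamma_1\gamma_3) + (n_{14,14,5} + n_{14,14,23})\log(\tfrac{1}{2}(\gamma_1\gamma_4 + \gamma_2\gamma_3)) \\
& + (n_{14,14,6} + n_{14,14,22})\log(\tfrac{1}{2}\gamma_2\gamma_4) + (n_{14,14,7} + n_{14,14,21})\log(\tfrac{1}{4}\gamma_3^2) \\
& + (n_{14,14,8} + n_{14,14,20})\log(\tfrac{1}{2}\gamma_3\gamma_4) + (n_{14,14,9} + n_{14,14,19})\log(\tfrac{1}{4}\gamma_4^2) \\
& + (n_{14,14,10} + n_{14,14,18})\log(\tfrac{1}{2}\gamma_1\gamma_4) + (n_{14,14,11} + n_{14,14,17})\log(\tfrac{1}{2}(\gamma_1\gamma_3 + \gamma_2\gamma_4)) \\
& + (n_{14,14,12} + n_{14,14,16})\log(\tfrac{1}{2}\gamma_2\gamma_3) + (n_{14,14,13} + n_{14,14,15})\log(\tfrac{1}{2}(\gamma_1\gamma_2 + \gamma_3\gamma_4)) \\
& + n_{14,14,14}\log(\tfrac{1}{2}(\gamma_1^2 + \gamma_2^2 + \gamma_3^2 + \gamma_4^2)) \\
& + \cdots \\
& + n_{27,27,27}\log(1).
\end{aligned}
\tag{2-32}
\]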

2.3.4 Estimation

By maximizing the upper-level likelihood (2-28), a closed form for the EM algorithm
to estimate haplotype frequencies is derived; the procedure is described as follows.
In the E step, we calculate the probability with which a double heterozygote parent
carries a particular diplotype using the relative-frequency expressions tabulated in
Section 2.3.3, and the probability with which a triple heterozygote parent
carries a particular diplotype using equations (2-29). Also, let us define

\[ n_k = n_{k\cdot} + n_{\cdot k}, \qquad k = 1, \ldots, 27, \]

where

\[
n_{i\cdot} = \sum_{j=1}^{27} n_{i,j}, \quad i = 1, \ldots, 27, \qquad
n_{\cdot j} = \sum_{i=1}^{27} n_{i,j}, \quad j = 1, \ldots, 27.
\tag{2-33}
\]


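In the M step, haplotype frequencies are estimated with the calculated diplotype
probabilities by

\[
\begin{aligned}
\hat{p}_{111} &= \tfrac{1}{4n}[2n_1 + n_2 + n_4 + \phi_{11}n_5 + n_{10} + \phi_{12}n_{11} + \phi_{13}n_{13} + \eta_1 n_{14}] \\
\hat{p}_{110} &= \tfrac{1}{4n}[n_2 + 2n_3 + (1-\phi_{11})n_5 + n_6 + (1-\phi_{12})n_{11} + n_{12} + \eta_2 n_{14} + \phi_{03}n_{15}] \\
\hat{p}_{101} &= \tfrac{1}{4n}[n_4 + (1-\phi_{11})n_5 + 2n_7 + n_8 + (1-\phi_{13})n_{13} + \eta_3 n_{14} + n_{16} + \phi_{02}n_{17}] \\
\hat{p}_{100} &= \tfrac{1}{4n}[\phi_{11}n_5 + n_6 + n_8 + 2n_9 + (1-\eta_1-\eta_2-\eta_3)n_{14} + (1-\phi_{03})n_{15} + (1-\phi_{02})n_{17} + n_{18}] \\
\hat{p}_{011} &= \tfrac{1}{4n}[n_{10} + (1-\phi_{12})n_{11} + (1-\phi_{13})n_{13} + (1-\eta_1-\eta_2-\eta_3)n_{14} + 2n_{19} + n_{20} + n_{22} + \phi_{01}n_{23}] \\
\hat{p}_{010} &= \tfrac{1}{4n}[\phi_{12}n_{11} + n_{12} + \eta_3 n_{14} + (1-\phi_{03})n_{15} + n_{20} + 2n_{21} + (1-\phi_{01})n_{23} + n_{24}] \\
\hat{p}_{001} &= \tfrac{1}{4n}[\phi_{13}n_{13} + \eta_2 n_{14} + n_{16} + (1-\phi_{02})n_{17} + n_{22} + (1-\phi_{01})n_{23} + 2n_{25} + n_{26}] \\
\hat{p}_{000} &= \tfrac{1}{4n}[\eta_1 n_{14} + \phi_{03}n_{15} + \phi_{02}n_{17} + n_{18} + \phi_{01}n_{23} + n_{24} + n_{26} + 2n_{27}],
\end{aligned}
\tag{2-34}
\]

where the divisor $4n$ is the total number of haplotypes carried by the $2n$ parents, so that the
eight estimates sum to one.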
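One iteration of the E and M steps for the haplotype frequencies can be sketched generically by enumerating, for every parental genotype, the diplotypes compatible with it. The code below is a minimal sketch and not the implementation used in this work: it assumes the genotype numbering AABBCC = 1, AABBCc = 2, ..., aabbcc = 27 (the ordering implied by the indices in the M-step formulas), takes as input the pooled parental counts $n_k$, and uses hypothetical names and counts throughout.

```python
import math
from itertools import product

# Haplotypes as tuples: (1, 1, 1) is ABC and (0, 0, 0) is abc.
HAPS = list(product((1, 0), repeat=3))


def genotype_index(h1, h2):
    """1-based genotype number, assuming the order AABBCC = 1, ..., aabbcc = 27."""
    a, b, c = (2 - (x + y) for x, y in zip(h1, h2))   # lowercase alleles per marker
    return 9 * a + 3 * b + c + 1


def compatible_diplotypes():
    """Map each genotype number to the unordered diplotypes that produce it."""
    table = {k: [] for k in range(1, 28)}
    for i, h1 in enumerate(HAPS):
        for h2 in HAPS[i:]:
            table[genotype_index(h1, h2)].append((h1, h2))
    return table


def em_step(freqs, counts, table):
    """One E + M iteration; counts maps genotype number to the number of parents."""
    expected = {h: 0.0 for h in HAPS}
    for k, n_k in counts.items():
        if n_k == 0:
            continue
        dips = table[k]
        # E step: posterior weight of each compatible diplotype under HWE.
        weights = [(1 if h1 == h2 else 2) * freqs[h1] * freqs[h2] for h1, h2 in dips]
        total = sum(weights)
        for (h1, h2), w in zip(dips, weights):
            expected[h1] += n_k * w / total
            expected[h2] += n_k * w / total
    # M step: divide expected haplotype counts by the total number of haplotypes.
    n_haplotypes = 2 * sum(counts.values())
    return {h: expected[h] / n_haplotypes for h in HAPS}


table = compatible_diplotypes()
freqs = {h: 1 / 8 for h in HAPS}              # uniform starting values
counts = {1: 10, 14: 5, 27: 10}               # hypothetical parental genotype counts
for _ in range(50):
    freqs = em_step(freqs, counts, table)
print(math.isclose(sum(freqs.values()), 1.0))
```

For the double and triple heterozygotes the posterior weights computed in the E step coincide with the diplotype probabilities $\phi$ and $\eta$, so each update is the enumeration-based counterpart of the closed-form expressions.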


The E and M steps are iterated until the estimates converge. The estimates at
convergence are the maximum likelihood estimates (MLEs) of the haplotype frequencies.
We can solve equations (2-21) to get the MLEs of the allele frequencies, expressed as

\[
\begin{aligned}
\hat{p} &= \hat{p}_{111} + \hat{p}_{110} + \hat{p}_{101} + \hat{p}_{100} \\
\hat{q} &= \hat{p}_{111} + \hat{p}_{110} + \hat{p}_{011} + \hat{p}_{010} \\
\hat{r} &= \hat{p}_{111} + \hat{p}_{101} + \hat{p}_{011} + \hat{p}_{001}.
\end{aligned}
\tag{2-35}
\]







Four different types of linkage disequilibria under the three-point model can be solved
by

\[ D_{AB} = (p_{111} + p_{110})(p_{001} + p_{000}) - (p_{101} + p_{100})(p_{011} + p_{010}) \tag{2-36} \]

\[ D_{BC} = (p_{111} + p_{011})(p_{100} + p_{000}) - (p_{110} + p_{010})(p_{101} + p_{001}) \tag{2-37} \]

\[ D_{AC} = (p_{111} + p_{101})(p_{010} + p_{000}) - (p_{110} + p_{100})(p_{011} + p_{001}) \tag{2-38} \]

\[ D_{ABC} = (p_{111}p_{100} + p_{010}p_{001}) - (p_{000}p_{011} + p_{110}p_{101}). \tag{2-39} \]
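The back-transformation from haplotype frequencies to allele frequencies and pairwise disequilibria can be sketched as follows. The pairwise terms use the cross-product form of equations (2-36) to (2-38); for the three-marker term the sketch instead inverts the first line of equation (2-21), which is an assumption made here for illustration rather than a transcription of (2-39). The function name and the input values are hypothetical.

```python
import math


def allele_freqs_and_ld(p):
    """Allele frequencies and disequilibria from haplotype frequencies '111'..'000'."""
    pa = p["111"] + p["110"] + p["101"] + p["100"]        # frequency of allele A
    qb = p["111"] + p["110"] + p["011"] + p["010"]        # frequency of allele B
    rc = p["111"] + p["101"] + p["011"] + p["001"]        # frequency of allele C
    d_ab = (p["111"] + p["110"]) * (p["001"] + p["000"]) \
        - (p["101"] + p["100"]) * (p["011"] + p["010"])
    d_bc = (p["111"] + p["011"]) * (p["100"] + p["000"]) \
        - (p["110"] + p["010"]) * (p["101"] + p["001"])
    d_ac = (p["111"] + p["101"]) * (p["010"] + p["000"]) \
        - (p["110"] + p["100"]) * (p["011"] + p["001"])
    # Three-marker term obtained by solving the p_111 line of (2-21) for D_ABC.
    d_abc = p["111"] - pa * qb * rc - pa * d_bc - qb * d_ac - rc * d_ab
    return {"p": pa, "q": qb, "r": rc,
            "D_AB": d_ab, "D_BC": d_bc, "D_AC": d_ac, "D_ABC": d_abc}


# Hypothetical frequencies built with p = q = r = 0.5 and D_AB = 0.05 only.
p = {"111": 0.15, "110": 0.15, "101": 0.10, "100": 0.10,
     "011": 0.10, "010": 0.10, "001": 0.15, "000": 0.15}
print(allele_freqs_and_ld(p))
```

In this example the only nonzero disequilibrium recovered is the one between markers A and B, as intended by the construction of the input.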

The recombination frequencies $\theta_{12}$, $\theta_{23}$, and $\theta_{13}$ are linear functions of the crossover
probabilities $g_{00}$, $g_{01}$, $g_{10}$, and $g_{11}$ by (2-22); thus the maximum likelihood estimates
(MLEs) of the recombination frequencies can be obtained from the MLEs of the crossover
probabilities. By maximizing the lower-level likelihood (2-32), closed forms for the EM
algorithm to estimate the crossover probabilities are derived as follows.

In the E step, the probabilities with which a considered haplotype produced by
a double heterozygote or triple heterozygote parent is the recombinant type can be
expressed in matrix form.

Let

\[ G_1 = [g_{00}, g_{01}, g_{11}, g_{10}]' \tag{2-40} \]

\[ G_2 = [g_{00}^2, g_{00}g_{01}, g_{00}g_{11}, g_{00}g_{10}, g_{01}^2, g_{01}g_{11}, g_{01}g_{10}, g_{11}^2, g_{11}g_{10}, g_{10}^2]' \tag{2-41} \]











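E involves those offspring whose parents' genotypes have only one double heterozygote,
either mother or father:

\[
E =
\begin{pmatrix}
\phi_{11} & 1-\phi_{11} & 1-\phi_{11} & \phi_{11} \\
1-\phi_{11} & \phi_{11} & \phi_{11} & 1-\phi_{11} \\
\phi_{12} & 1-\phi_{12} & \phi_{12} & 1-\phi_{12} \\
1-\phi_{12} & \phi_{12} & 1-\phi_{12} & \phi_{12} \\
\phi_{13} & \phi_{13} & 1-\phi_{13} & 1-\phi_{13} \\
1-\phi_{13} & 1-\phi_{13} & \phi_{13} & \phi_{13} \\
\eta_1 & \eta_2 & \eta_3 & 1-\eta_1-\eta_2-\eta_3 \\
\eta_2 & \eta_1 & 1-\eta_1-\eta_2-\eta_3 & \eta_3 \\
\eta_3 & 1-\eta_1-\eta_2-\eta_3 & \eta_1 & \eta_2 \\
1-\eta_1-\eta_2-\eta_3 & \eta_3 & \eta_2 & \eta_1 \\
\phi_{03} & \phi_{03} & 1-\phi_{03} & 1-\phi_{03} \\
1-\phi_{03} & 1-\phi_{03} & \phi_{03} & \phi_{03} \\
\phi_{02} & 1-\phi_{02} & \phi_{02} & 1-\phi_{02} \\
1-\phi_{02} & \phi_{02} & 1-\phi_{02} & \phi_{02} \\
\phi_{01} & 1-\phi_{01} & 1-\phi_{01} & \phi_{01} \\
1-\phi_{01} & \phi_{01} & \phi_{01} & 1-\phi_{01}
\end{pmatrix}
\]

We use | to denote appending in columns for matrices; then

\[ \Lambda = [A \mid B], \]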


where A is a 114 x 10 matrix that involves those offspring whose parents' genotypes are both
double heterozygotes:

\[
\begin{aligned}
A = [ & A_{11,11} \mid A_{11,12} \mid A_{11,13} \mid A_{11,03} \mid A_{11,02} \mid A_{11,01} \mid A_{12,12} \mid A_{12,13} \mid A_{12,03} \mid A_{12,02} \\
& \mid A_{12,01} \mid A_{13,13} \mid A_{13,03} \mid A_{13,02} \mid A_{13,01} \mid A_{03,03} \mid A_{03,02} \mid A_{03,01} \mid A_{02,02} \mid A_{02,01} \mid A_{01,01} ]
\end{aligned}
\tag{2-42}
\]

where, for $ij \neq kl$, $A_{ij,kl}$ is a 6 x 10 matrix:

\[ A_{ij,kl} = [A^m_{ij,kl} \mid A^f_{ij,kl}], \qquad ij, kl \in \{11, 12, 13, 01, 02, 03\}, \]

where $A^m_{ij,kl}$ and $A^f_{ij,kl}$ are 3 x 10 matrices.

[The entries of the matrix $A^m_{ij,kl}$ are not recoverable from the extracted text.]










where $\bar{\phi}_{ij} = 1 - \phi_{ij}$. We use $A^m(c)$ to denote the $c$th column of $A^m_{ij,kl}$; then we can
further denote

\[ A^f_{ij,kl} = [A^m(1), A^m(3), A^m(2), A^m(4), A^m(8), A^m(6), A^m(9), A^m(5), A^m(7), A^m(10)]. \]

When $ij = kl$, $A_{ij,ij}$ is a 4 x 10 matrix defined through a matrix $C_{ij,ij}$.

[The entries of the matrix $C_{ij,ij}$ are not recoverable from the extracted text.]

Again, if we use $C_{ij,ij}(c)$ to denote the $c$th column of $C_{ij,ij}$, then:

\[
\begin{aligned}
A_{11,11} &= [C_{11,11}(1), C_{11,11}(2), C_{11,11}(2), 2C_{11,11}(2), C_{11,11}(3), 2C_{11,11}(3), C_{11,11}(2), C_{11,11}(3), C_{11,11}(2), C_{11,11}(1)] \\
A_{12,12} &= [C_{12,12}(1), C_{12,12}(2), 2C_{12,12}(1), C_{12,12}(2), C_{12,12}(3), C_{12,12}(2), 2C_{12,12}(3), C_{12,12}(1), C_{12,12}(2), C_{12,12}(3)] \\
A_{13,13} &= [C_{13,13}(1), 2C_{13,13}(1), C_{13,13}(2), C_{13,13}(2), C_{13,13}(2), C_{13,13}(1), C_{13,13}(2), C_{13,13}(3), 2C_{13,13}(3), C_{13,13}(3)] \\
A_{03,03} &= [C_{03,03}(1), 2C_{03,03}(1), C_{03,03}(2), C_{03,03}(2), C_{03,03}(1), C_{03,03}(2), C_{03,03}(2), C_{03,03}(3), 2C_{03,03}(3), C_{03,03}(3)] \\
A_{02,02} &= [C_{02,02}(1), C_{02,02}(2), 2C_{02,02}(1), C_{02,02}(2), C_{02,02}(3), C_{02,02}(2), 2C_{02,02}(3), C_{02,02}(1), C_{02,02}(2), C_{02,02}(3)] \\
A_{01,01} &= [C_{01,01}(1), C_{01,01}(2), C_{01,01}(2), 2C_{01,01}(1), C_{01,01}(3), 2C_{01,01}(3), C_{01,01}(2), C_{01,01}(3), C_{01,01}(2), C_{01,01}(1)]
\end{aligned}
\]











B is a 68 x 10 matrix that involves those offspring whose parents' genotypes have at
least one triple heterozygote. It can be expressed as:

\[ B = [B_5 \mid B_{11} \mid B_{13} \mid B_{14} \mid B_{15} \mid B_{17} \mid B_{23}], \]

where the subscript represents the genotype group from which one parent comes.























[The entries of the blocks $B_5$, $B_{11}$, $B_{13}$, $B_{14}$, $B_{15}$, $B_{17}$, and $B_{23}$ are not recoverable from the extracted text.]











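with $ij \in \{11, 12, 13, 01, 02, 03\}$ and

\[
\begin{aligned}
\eta_4 &= 1 - \eta_1 - \eta_2 - \eta_3 \\
\eta_{23} &= \eta_2 + \eta_3 \\
\eta_{13} &= \eta_1 + \eta_3 \\
\eta_{12} &= \eta_1 + \eta_2.
\end{aligned}
\]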


We define the ratios

\[ \frac{E(i,j)\,G_1(j,1)}{\sum_{j=1}^{4} E(i,j)\,G_1(j,1)}, \tag{2-43} \]

where $i = 1, \ldots, 16$ and $j = 1, \ldots, 4$; and

\[ \frac{\Lambda(i,j)\,G_2(j,1)}{\sum_{j=1}^{10} \Lambda(i,j)\,G_2(j,1)}. \tag{2-44} \]

In the M step, the estimates of the crossover probabilities g are obtained as:

[Equations (2-45) to (2-48), which give the updates of the four crossover probabilities, are not recoverable from the extracted text.]

where

\[
\begin{aligned}
\bar{\phi}_{ij} &= 1 - \phi_{ij} \\
\bar{\eta}_{23} &= 1 - \eta_2 - \eta_3 \\
\bar{\eta}_{13} &= 1 - \eta_1 - \eta_3 \\
\bar{\eta}_{12} &= 1 - \eta_1 - \eta_2
\end{aligned}
\]










where


M1 = n5,1,1 + n1,5,1 + n5,1,5 + n1,5,5 + n5,2,1 + n2,5,1 + n5,2,6 + n2,5,6 + n5,3,2 + n3,5,2 +
n5,3,6 + n3,5,6 + n5,4,1 + n4,5,1 + n5,4,8 + n4,5,8 + n5,6,2 + n6,5,2 + n5,6,9 + n6,5,9 +
n5,7,4 + n7,5,4 + n5,7,8 + n7,5,8 + n5,8,4 + n8,5,4 + n5,8,9 + n8,5,9 + n5,9,5 + n9,5,5 +
n5,9,9 + n9,5,9 + n5,10,1 + n10,5,1 + n5,10,5 + n10,5,5 + n5,12,2 + n12,5,2 + n5,12,6 +
n12,5,6 + n5,16,4 + n16,5,4 + n5,16,8 + n16,5,8 + n5,18,5 + n18,5,5 + n5,18,9 + n18,5,9 +
n5,19,10 + n19,5,10 + n5,19,14 + n19,5,14 + n5,20,10 + n20,5,10 + n5,20,15 + n20,5,15 +
n5,21,11 + n21,5,11 + n5,21,15 + n21,5,15 + n5,22,10 + n22,5,10 + n5,22,17 + n22,5,17 +
n5,24,11 + n24,5,11 + n5,24,18 + n24,5,18 + n5,25,13 + n25,5,13 + n5,25,17 + n25,5,17 +
n5,26,13 + n26,5,13 + n5,26,18 + n26,5,18 + n5,27,14 + n27,5,14 + n5,27,18 + n27,5,18

M2 = n5,1,2 + n1,5,2 + n5,1,4 + n1,5,4 + n5,2,3 + n2,5,3 + n5,2,4 + n2,5,4 + n5,3,3 + n3,5,3 +
n5,3,5 + n3,5,5 + n5,4,2 + n4,5,2 + n5,4,7 + n4,5,7 + n5,6,3 + n6,5,3 + n5,6,8 + n6,5,8 +
n5,7,5 + n7,5,5 + n5,7,7 + n7,5,7 + n5,8,6 + n8,5,6 + n5,8,8 + n8,5,8 + n5,9,6 + n9,5,6 +
n5,9,8 + n9,5,8 + n5,10,2 + n10,5,2 + n5,10,4 + n10,5,4 + n5,12,3 + n12,5,3 + n5,12,5 +
n12,5,5 + n5,16,5 + n16,5,5 + n5,16,7 + n16,5,7 + n5,18,6 + n18,5,6 + n5,18,8 + n18,5,8 +
n5,19,11 + n19,5,11 + n5,19,13 + n19,5,13 + n5,20,12 + n20,5,12 + n5,20,13 + n20,5,13 +
n5,21,12 + n21,5,12 + n5,21,14 + n21,5,14 + n5,22,11 + n22,5,11 + n5,22,16 + n22,5,16 +
n5,24,12 + n24,5,12 + n5,24,17 + n24,5,17 + n5,25,14 + n25,5,14 + n5,25,16 + n25,5,16 +
n5,26,15 + n26,5,15 + n5,26,16 + n26,5,16 + n5,27,15 + n27,5,15 + n5,27,17 + n27,5,17














M3 = n11,1,1 + n1,11,1 + n11,1,11 + n1,11,11 + n11,2,1 + n2,11,1 + n11,2,12 + n2,11,12 +
n11,3,2 + n3,11,2 + n11,3,12 + n3,11,12 + n11,4,1 + n4,11,1 + n11,4,11 + n4,11,11 +
n11,6,2 + n6,11,2 + n11,6,12 + n6,11,12 + n11,7,4 + n7,11,4 + n11,7,14 + n7,11,14 +
n11,8,4 + n8,11,4 + n11,8,15 + n8,11,15 + n11,9,5 + n9,11,5 + n11,9,15 + n9,11,15 +
n11,10,1 + n10,11,1 + n11,10,20 + n10,11,20 + n11,12,2 + n12,11,2 + n11,12,21 + n12,11,21 +
n11,16,4 + n16,11,4 + n11,16,23 + n16,11,23 + n11,18,5 + n18,11,5 + n11,18,24 + n18,11,24 +
n11,19,10 + n19,11,10 + n11,19,20 + n19,11,20 + n11,20,10 + n20,11,10 + n11,20,21 +
n20,11,21 + n11,21,11 + n21,11,11 + n11,21,21 + n21,11,21 + n11,22,10 + n22,11,10 +
n11,22,20 + n22,11,20 + n11,24,11 + n24,11,11 + n11,24,14 + n24,11,14 + n11,24,21 +
n24,11,21 + n11,24,24 + n24,11,24 + n11,25,13 + n25,11,13 + n11,25,23 + n25,11,23 +
n11,26,13 + n26,11,13 + n11,26,24 + n26,11,24 + n11,27,14 + n27,11,14 + n11,27,24 + n27,11,24

M4 = n11,1,2 + n1,11,2 + n11,1,10 + n1,11,10 + n11,2,3 + n2,11,3 + n11,2,10 + n2,11,10 + n11,3,3 +
n3,11,3 + n11,3,11 + n3,11,11 + n11,4,2 + n4,11,2 + n11,4,10 + n4,11,10 + n11,6,3 + n6,11,3 +
n11,6,11 + n6,11,11 + n11,7,5 + n7,11,5 + n11,7,13 + n7,11,13 + n11,8,6 + n8,11,6 + n11,8,13 +
n8,11,13 + n11,9,6 + n9,11,6 + n11,9,14 + n9,11,14 + n11,10,2 + n10,11,2 + n11,10,19 +
n10,11,19 + n11,12,3 + n12,11,3 + n11,12,20 + n12,11,20 + n11,16,5 + n16,11,5 + n11,16,22 +
n16,11,22 + n11,18,6 + n18,11,6 + n11,18,23 + n18,11,23 + n11,19,12 + n19,11,12 + n11,19,19 +
n19,11,19 + n11,20,12 + n20,11,12 + n11,20,19 + n20,11,19 + n11,21,12 + n21,11,12 + n11,21,20 +
n21,11,20 + n11,22,11 + n22,11,11 + n11,22,19 + n22,11,19 + n11,24,12 + n24,11,12 + n11,24,15 +
n24,11,15 + n11,24,20 + n24,11,20 + n11,24,23 + n24,11,23 + n11,25,14 + n25,11,14 + n11,25,22 +
n25,11,22 + n11,26,15 + n26,11,15 + n11,26,22 + n26,11,22 + n11,27,15 + n27,11,15 + n11,27,23 +
n27,11,23













M5 = n13,1,1 + n1,13,1 + n13,1,13 + n1,13,13 + n13,2,1 + n2,13,1 + n13,2,2 + n2,13,2 + n13,2,12 +
n2,13,12 + n13,2,13 + n2,13,13 + n13,3,2 + n3,13,2 + n13,3,14 + n3,13,14 + n13,4,1 + n4,13,1 +
n13,4,16 + n4,13,16 + n13,6,2 + n6,13,2 + n13,6,17 + n6,13,17 + n13,7,4 + n7,13,4 + n13,7,16 +
n7,13,16 + n13,8,4 + n8,13,4 + n13,8,5 + n8,13,5 + n13,8,16 + n8,13,16 + n13,8,17 + n8,13,17 +
n13,9,5 + n9,13,5 + n13,9,17 + n9,13,17 + n13,10,1 + n10,13,1 + n13,10,22 + n10,13,22 +
n13,12,2 + n12,13,2 + n13,12,23 + n12,13,23 + n13,16,4 + n16,13,4 + n13,16,25 + n16,13,25 +
n13,18,5 + n18,13,5 + n13,18,26 + n18,13,26 + n13,19,10 + n19,13,10 + n13,19,22 + n19,13,22 +
n13,20,10 + n20,13,10 + n13,20,11 + n20,13,11 + n13,20,22 + n20,13,22 + n13,20,23 + n20,13,23 +
n13,21,11 + n21,13,11 + n13,21,23 + n21,13,23 + n13,22,10 + n22,13,10 + n13,22,25 + n22,13,25 +
n13,24,11 + n24,13,11 + n13,24,26 + n24,13,26 + n13,25,13 + n25,13,13 + n13,25,25 + n25,13,25 +
n13,26,13 + n26,13,13 + n13,26,14 + n26,13,14 + n13,26,25 + n26,13,25 + n13,26,26 + n26,13,26 +
n13,27,14 + n27,13,14 + n13,27,26 + n27,13,26

M6 = n13,1,4 + n1,13,4 + n13,1,10 + n1,13,10 + n13,2,4 + n2,13,4 + n13,2,5 + n2,13,5 + n13,2,10 +
n2,13,10 + n13,2,11 + n2,13,11 + n13,3,5 + n3,13,5 + n13,3,11 + n3,13,11 + n13,4,8 + n4,13,8 +
n13,4,10 + n4,13,10 + n13,6,8 + n6,13,8 + n13,6,11 + n6,13,11 + n13,7,7 + n7,13,7 + n13,7,13 +
n7,13,13 + n13,8,7 + n8,13,7 + n13,8,8 + n8,13,8 + n13,8,13 + n8,13,13 + n13,8,14 + n8,13,14 +
n13,9,8 + n9,13,8 + n13,9,14 + n9,13,14 + n13,10,4 + n10,13,4 + n13,10,19 + n10,13,19 +
n13,12,5 + n12,13,5 + n13,12,20 + n12,13,20 + n13,16,7 + n16,13,7 + n13,16,22 + n16,13,22 +
n13,18,8 + n18,13,8 + n13,18,23 + n18,13,23 + n13,19,13 + n19,13,13 + n13,19,19 + n19,13,19 +
n13,20,13 + n20,13,13 + n13,20,14 + n20,13,14 + n13,20,19 + n20,13,19 + n13,20,20 + n20,13,20 +
n13,21,14 + n21,13,14 + n13,21,20 + n21,13,20 + n13,22,16 + n22,13,16 + n13,22,19 + n22,13,19 +
n13,24,17 + n24,13,17 + n13,24,20 + n24,13,20 + n13,25,16 + n25,13,16 + n13,25,22 + n25,13,22 +
n13,26,16 + n26,13,16 + n13,26,17 + n26,13,17 + n13,26,22 + n26,13,22 + n13,26,23 + n26,13,23 +
n13,27,17 + n27,13,17 + n13,27,23 + n27,13,23















M7 = n14,1,1 + n1,14,1 + n14,1,14 + n1,14,14 + n14,2,1 + n2,14,1 + n14,2,15 + n2,14,15 + n14,3,2 +
n3,14,2 + n14,3,15 + n3,14,15 + n14,4,1 + n4,14,1 + n14,4,17 + n4,14,17 + n14,6,2 + n6,14,2 +
n14,6,18 + n6,14,18 + n14,7,4 + n7,14,4 + n14,7,17 + n7,14,17 + n14,8,4 + n8,14,4 + n14,8,18 +
n8,14,18 + n14,9,5 + n9,14,5 + n14,9,18 + n9,14,18 + n14,10,1 + n10,14,1 + n14,10,23 +
n10,14,23 + n14,12,2 + n12,14,2 + n14,12,24 + n12,14,24 + n14,16,4 + n16,14,4 + n14,16,26 +
n16,14,26 + n14,18,5 + n18,14,5 + n14,18,27 + n18,14,27 + n14,19,10 + n19,14,10 + n14,19,23 +
n19,14,23 + n14,20,10 + n20,14,10 + n14,20,24 + n20,14,24 + n14,21,11 + n21,14,11 + n14,21,24 +
n21,14,24 + n14,22,10 + n22,14,10 + n14,22,26 + n22,14,26 + n14,24,11 + n24,14,11 + n14,24,27 +
n24,14,27 + n14,25,13 + n25,14,13 + n14,25,26 + n25,14,26 + n14,26,13 + n26,14,13 + n14,26,27 +
n26,14,27 + n14,27,14 + n27,14,14 + n14,27,27 + n27,14,27

M8 = n14,1,2 + n1,14,2 + n14,1,13 + n1,14,13 + n14,2,3 + n2,14,3 + n14,2,13 + n2,14,13 + n14,3,3 +
n3,14,3 + n14,3,14 + n3,14,14 + n14,4,2 + n4,14,2 + n14,4,16 + n4,14,16 + n14,6,3 + n6,14,3 +
n14,6,17 + n6,14,17 + n14,7,5 + n7,14,5 + n14,7,16 + n7,14,16 + n14,8,6 + n8,14,6 + n14,8,16 +
n8,14,16 + n14,9,6 + n9,14,6 + n14,9,17 + n9,14,17 + n14,10,2 + n10,14,2 + n14,10,22 +
n10,14,22 + n14,12,3 + n12,14,3 + n14,12,23 + n12,14,23 + n14,16,5 + n16,14,5 + n14,16,25 +
n16,14,25 + n14,18,6 + n18,14,6 + n14,18,26 + n18,14,26 + n14,19,11 + n19,14,11 + n14,19,22 +
n19,14,22 + n14,20,12 + n20,14,12 + n14,20,22 + n20,14,22 + n14,21,12 + n21,14,12 + n14,21,23 +
n21,14,23 + n14,22,11 + n22,14,11 + n14,22,25 + n22,14,25 + n14,24,12 + n24,14,12 + n14,24,26 +
n24,14,26 + n14,25,14 + n25,14,14 + n14,25,25 + n25,14,25 + n14,26,15 + n26,14,15 + n14,26,25 +
n26,14,25 + n14,27,15 + n27,14,15 + n14,27,26 + n27,14,26














M9 = n14,1,4 + n1,14,4 + n14,1,11 + n1,14,11 + n14,2,4 + n2,14,4 + n14,2,12 + n2,14,12 + n14,3,5 +
n3,14,5 + n14,3,12 + n3,14,12 + n14,4,7 + n4,14,7 + n14,4,11 + n4,14,11 + n14,6,8 + n6,14,8 +
n14,6,12 + n6,14,12 + n14,7,7 + n7,14,7 + n14,7,14 + n7,14,14 + n14,8,7 + n8,14,7 + n14,8,15 +
n8,14,15 + n14,9,8 + n9,14,8 + n14,9,15 + n9,14,15 + n14,10,4 + n10,14,4 + n14,10,20 +
n10,14,20 + n14,12,5 + n12,14,5 + n14,12,21 + n12,14,21 + n14,16,7 + n16,14,7 + n14,16,23 +
n16,14,23 + n14,18,8 + n18,14,8 + n14,18,24 + n18,14,24 + n14,19,13 + n19,14,13 + n14,19,20 +
n19,14,20 + n14,20,13 + n20,14,13 + n14,20,21 + n20,14,21 + n14,21,14 + n21,14,14 + n14,21,21 +
n21,14,21 + n14,22,16 + n22,14,16 + n14,22,20 + n22,14,20 + n14,24,17 + n24,14,17 + n14,24,21 +
n24,14,21 + n14,25,16 + n25,14,16 + n14,25,23 + n25,14,23 + n14,26,16 + n26,14,16 + n14,26,24 +
n26,14,24 + n14,27,17 + n27,14,17 + n14,27,24 + n27,14,24

M10 = n14,1,5 + n1,14,5 + n14,1,10 + n1,14,10 + n14,2,6 + n2,14,6 + n14,2,10 + n2,14,10 + n14,3,6 +
n3,14,6 + n14,3,11 + n3,14,11 + n14,4,8 + n4,14,8 + n14,4,10 + n4,14,10 + n14,6,9 + n6,14,9 +
n14,6,11 + n6,14,11 + n14,7,8 + n7,14,8 + n14,7,13 + n7,14,13 + n14,8,9 + n8,14,9 + n14,8,14 +
n8,14,14 + n14,9,9 + n9,14,9 + n14,9,14 + n9,14,14 + n14,10,5 + n10,14,5 + n14,10,19 +
n10,14,19 + n14,12,6 + n12,14,6 + n14,12,20 + n12,14,20 + n14,16,8 + n16,14,8 + n14,16,22 +
n16,14,22 + n14,18,9 + n18,14,9 + n14,18,23 + n18,14,23 + n14,19,14 + n19,14,14 + n14,19,19 +
n19,14,19 + n14,20,15 + n20,14,15 + n14,20,19 + n20,14,19 + n14,21,15 + n21,14,15 + n14,21,20 +
n21,14,20 + n14,22,17 + n22,14,17 + n14,22,19 + n22,14,19 + n14,24,18 + n24,14,18 + n14,24,20 +
n24,14,20 + n14,25,17 + n25,14,17 + n14,25,22 + n25,14,22 + n14,26,18 + n26,14,18 + n14,26,22 +
n26,14,22 + n14,27,18 + n27,14,18 + n14,27,23 + n27,14,23














M11 = n15,1,2 + n1,15,2 + n15,1,14 + n1,15,14 + n15,2,2 + n2,15,2 + n15,2,3 + n2,15,3 + n15,2,14 +
n2,15,14 + n15,2,15 + n2,15,15 + n15,3,3 + n3,15,3 + n15,3,15 + n3,15,15 + n15,4,2 + n4,15,2 +
n15,4,17 + n4,15,17 + n15,6,3 + n6,15,3 + n15,6,18 + n6,15,18 + n15,7,5 + n7,15,5 + n15,7,17 +
n7,15,17 + n15,8,5 + n8,15,5 + n15,8,6 + n8,15,6 + n15,8,17 + n8,15,17 + n15,8,18 + n8,15,18 +
n15,9,6 + n9,15,6 + n15,9,18 + n9,15,18 + n15,10,2 + n10,15,2 + n15,10,23 + n10,15,23 +
n15,12,3 + n12,15,3 + n15,12,24 + n12,15,24 + n15,16,5 + n16,15,5 + n15,16,26 + n16,15,26 +
n15,18,6 + n18,15,6 + n15,18,27 + n18,15,27 + n15,19,11 + n19,15,11 + n15,19,23 + n19,15,23 +
n15,20,11 + n20,15,11 + n15,20,12 + n20,15,12 + n15,20,23 + n20,15,23 + n15,20,24 + n20,15,24 +
n15,21,12 + n21,15,12 + n15,21,24 + n21,15,24 + n15,22,11 + n22,15,11 + n15,22,26 + n22,15,26 +
n15,24,12 + n24,15,12 + n15,24,27 + n24,15,27 + n15,25,14 + n25,15,14 + n15,25,26 + n25,15,26 +
n15,26,14 + n26,15,14 + n15,26,15 + n26,15,15 + n15,26,26 + n26,15,26 + n15,26,27 + n26,15,27 +
n15,27,15 + n27,15,15 + n15,27,27 + n27,15,27

M12 = n15,1,5 + n1,15,5 + n15,1,11 + n1,15,11 + n15,2,5 + n2,15,5 + n15,2,6 + n2,15,6 + n15,2,11 +
n2,15,11 + n15,2,12 + n2,15,12 + n15,3,6 + n3,15,6 + n15,3,12 + n3,15,12 + n15,4,8 + n4,15,8 +
n15,4,11 + n4,15,11 + n15,6,9 + n6,15,9 + n15,6,12 + n6,15,12 + n15,7,8 + n7,15,8 + n15,7,14 +
n7,15,14 + n15,8,8 + n8,15,8 + n15,8,9 + n8,15,9 + n15,8,14 + n8,15,14 + n15,8,15 + n8,15,15 +
n15,9,9 + n9,15,9 + n15,9,15 + n9,15,15 + n15,10,5 + n10,15,5 + n15,10,20 + n10,15,20 +
n15,12,6 + n12,15,6 + n15,12,21 + n12,15,21 + n15,16,8 + n16,15,8 + n15,16,23 + n16,15,23 +
n15,18,9 + n18,15,9 + n15,18,24 + n18,15,24 + n15,19,14 + n19,15,14 + n15,19,20 + n19,15,20 +
n15,20,14 + n20,15,14 + n15,20,15 + n20,15,15 + n15,20,20 + n20,15,20 + n15,20,21 + n20,15,21 +
n15,21,15 + n21,15,15 + n15,21,21 + n21,15,21 + n15,22,17 + n22,15,17 + n15,22,20 + n22,15,20 +
n15,24,18 + n24,15,18 + n15,24,21 + n24,15,21 + n15,25,17 + n25,15,17 + n15,25,23 + n25,15,23 +
n15,26,17 + n26,15,17 + n15,26,18 + n26,15,18 + n15,26,23 + n26,15,23 + n15,26,24 + n26,15,24 +
n15,27,18 + n27,15,18 + n15,27,24 + n27,15,24















M13 = n17,1,4 + n1,17,4 + n17,1,14 + n1,17,14 + n17,2,4 + n2,17,4 + n17,2,15 + n2,17,15 + n17,3,5 +
n3,17,5 + n17,3,15 + n3,17,15 + n17,4,4 + n4,17,4 + n17,4,14 + n4,17,14 + n17,6,5 + n6,17,5 +
n17,6,15 + n6,17,15 + n17,7,7 + n7,17,7 + n17,7,17 + n7,17,17 + n17,8,7 + n8,17,7 + n17,8,18 +
n8,17,18 + n17,9,8 + n9,17,8 + n17,9,18 + n9,17,18 + n17,10,4 + n10,17,4 + n17,10,23 +
n10,17,23 + n17,12,5 + n12,17,5 + n17,12,24 + n12,17,24 + n17,16,7 + n16,17,7 + n17,16,26 +
n16,17,26 + n17,18,8 + n18,17,8 + n17,18,27 + n18,17,27 + n17,19,14 + n19,17,14 + n17,19,23 +
n19,17,23 + n17,20,13 + n20,17,13 + n17,20,24 + n20,17,24 + n17,21,14 + n21,17,14 + n17,21,24 +
n21,17,24 + n17,22,13 + n22,17,13 + n17,22,23 + n22,17,23 + n17,24,14 + n24,17,14 + n17,24,24 +
n24,17,24 + n17,25,16 + n25,17,16 + n17,25,26 + n25,17,26 + n17,26,16 + n26,17,16 + n17,26,27 +
n26,17,27 + n17,27,17 + n27,17,17 + n17,27,27 + n27,17,27

M14 = n17,1,5 + n1,17,5 + n17,1,13 + n1,17,13 + n17,2,6 + n2,17,6 + n17,2,13 + n2,17,13 + n17,3,6 +
n3,17,6 + n17,3,14 + n3,17,14 + n17,4,5 + n4,17,5 + n17,4,13 + n4,17,13 + n17,6,6 + n6,17,6 +
n17,6,14 + n6,17,14 + n17,7,8 + n7,17,8 + n17,7,16 + n7,17,16 + n17,8,9 + n8,17,9 + n17,8,16 +
n8,17,16 + n17,9,9 + n9,17,9 + n17,9,17 + n9,17,17 + n17,10,5 + n10,17,5 + n17,10,22 +
n10,17,22 + n17,12,6 + n12,17,6 + n17,12,23 + n12,17,23 + n17,16,8 + n16,17,8 + n17,16,25 +
n16,17,25 + n17,18,9 + n18,17,9 + n17,18,26 + n18,17,26 + n17,19,15 + n19,17,15 + n17,19,22 +
n19,17,22 + n17,20,15 + n20,17,15 + n17,20,22 + n20,17,22 + n17,21,15 + n21,17,15 + n17,21,23 +
n21,17,23 + n17,22,14 + n22,17,14 + n17,22,22 + n22,17,22 + n17,24,15 + n24,17,15 + n17,24,23 +
n24,17,23 + n17,25,17 + n25,17,17 + n17,25,25 + n25,17,25 + n17,26,18 + n26,17,18 + n17,26,25 +
n26,17,25 + n17,27,18 + n27,17,18 + n17,27,26 + n27,17,26














M15 = n23,1,10 + n1,23,10 + n23,1,14 + n1,23,14 + n23,2,10 + n2,23,10 + n23,2,15 + n2,23,15 +

n23,3,11 + n3,23,11 + n23,3,15 + n3,23,15 + n23,4,10 + n4,23,10 + n23,4,17 + n4,23,17 +

n23,6,11 + n6,23,11 + n23,6,18 + n6,23,18 + n23,7,13 + n7,23,13 + n23,7,17 + n7,23,17 +

n23,8,13 + n8,23,13 + n23,8,18 + n8,23,18 + n23,9,14 + n9,23,14 + n23,9,18 + n9,23,18 +

n23,10,10 + n10,23,10 + n23,10,14 + n10,23,14 + n23,12,11 + n12,23,11 + n23,12,15 +

n12,23,15 + n23,16,13 + n16,23,13 + n23,16,17 + n16,23,17 + n23,18,14 + n18,23,14 +

n23,18,18 + n18,23,18 + n23,19,19 + n19,23,19 + n23,19,23 + n19,23,23 + n23,20,19 +

n20,23,19 + n23,20,24 + n20,23,24 + n23,21,20 + n21,23,20 + n23,21,24 + n21,23,24 +

n23,22,19 + n22,23,19 + n23,22,26 + n22,23,26 + n23,24,20 + n24,23,20 + n23,24,27 +

n24,23,27 + n23,25,22 + n25,23,22 + n23,25,26 + n25,23,26 + n23,26,22 + n26,23,22 +

n23,26,27 + n26,23,27 + n23,27,23 + n27,23,23 + n23,27,27 + n27,23,27

M16 = n23,1,11 + n1,23,11 + n23,1,13 + n1,23,13 + n23,2,12 + n2,23,12 + n23,2,13 + n2,23,13 +

n23,3,12 + n3,23,12 + n23,3,14 + n3,23,14 + n23,4,11 + n4,23,11 + n23,4,16 + n4,23,16 +

n23,6,12 + n6,23,12 + n23,6,17 + n6,23,17 + n23,7,14 + n7,23,14 + n23,7,16 + n7,23,16 +

n23,8,15 + n8,23,15 + n23,8,16 + n8,23,16 + n23,9,15 + n9,23,15 + n23,9,17 + n9,23,17 +

n23,10,11 + n10,23,11 + n23,10,13 + n10,23,13 + n23,12,12 + n12,23,12 + n23,12,14 +

n12,23,14 + n23,16,14 + n16,23,14 + n23,16,16 + n16,23,16 + n23,18,15 + n18,23,15 +

n23,18,17 + n18,23,17 + n23,19,20 + n19,23,20 + n23,19,22 + n19,23,22 + n23,20,21 +

n20,23,21 + n23,20,22 + n20,23,22 + n23,21,21 + n21,23,21 + n23,21,23 + n21,23,23 +

n23,22,20 + n22,23,20 + n23,22,25 + n22,23,25 + n23,24,21 + n24,23,21 + n23,24,26 +

n24,23,26 + n23,25,23 + n25,23,23 + n23,25,25 + n25,23,25 + n23,26,24 + n26,23,24 +

n23,26,25 + n26,23,25 + n23,27,24 + n27,23,24 + n23,27,26 + n27,23,26













N1 = n5,5,1 + n5,5,9
N2 = n5,5,2 + n5,5,4 + n5,5,6 + n5,5,8
N3 = n5,5,3 + n5,5,7
N4 = n5,5,5
N5 = n5,11,1 + n5,11,15 + n11,5,1 + n11,5,15
N6 = n5,11,2 + n5,11,14 + n11,5,2 + n11,5,14
N7 = n5,11,3 + n5,11,13 + n11,5,3 + n11,5,13
N8 = n5,11,4 + n5,11,12 + n11,5,4 + n11,5,12
N9 = n5,11,5 + n5,11,11 + n11,5,5 + n11,5,11
N10 = n5,11,6 + n5,11,10 + n11,5,6 + n11,5,10
N11 = n5,13,1 + n5,13,17 + n13,5,1 + n13,5,17
N12 = n5,13,2 + n5,13,16 + n13,5,2 + n13,5,16
N13 = n5,13,4 + n5,13,14 + n13,5,4 + n13,5,14
N14 = n5,13,5 + n5,13,13 + n13,5,5 + n13,5,13
N15 = n5,13,7 + n5,13,11 + n13,5,7 + n13,5,11
N16 = n5,13,8 + n5,13,10 + n13,5,8 + n13,5,10
N17 = n5,15,2 + n5,15,18 + n15,5,2 + n15,5,18
N18 = n5,15,3 + n5,15,17 + n15,5,3 + n15,5,17
N19 = n5,15,5 + n5,15,15 + n15,5,5 + n15,5,15
N20 = n5,15,6 + n5,15,14 + n15,5,6 + n15,5,14
N21 = n5,15,8 + n5,15,12 + n15,5,8 + n15,5,12
N22 = n5,15,9 + n5,15,11 + n15,5,9 + n15,5,11
N23 = n5,17,4 + n5,17,18 + n17,5,4 + n17,5,18
N24 = n5,17,5 + n5,17,17 + n17,5,5 + n17,5,17
N25 = n5,17,6 + n5,17,16 + n17,5,6 + n17,5,16
N26 = n5,17,7 + n5,17,15 + n17,5,7 + n17,5,15
N27 = n5,17,8 + n5,17,14 + n17,5,8 + n17,5,14
N28 = n5,17,9 + n5,17,13 + n17,5,9 + n17,5,13
N29 = n5,23,10 + n5,23,18 + n23,5,10 + n23,5,18
N30 = n5,23,11 + n5,23,17 + n23,5,11 + n23,5,17
N31 = n5,23,12 + n5,23,16 + n23,5,12 + n23,5,16
N32 = n5,23,13 + n5,23,15 + n23,5,13 + n23,5,15
N33 = n5,23,14 + n5,23,14 + n23,5,14 + n23,5,14
N34 = n5,23,15 + n5,23,13 + n23,5,15 + n23,5,13
N35 = n11,11,1 + n11,11,21
N36 = n11,11,2 + n11,11,10 + n11,11,12 + n11,11,20
N37 = n11,11,3 + n11,11,19
N38 = n11,11,11
N39 = n11,13,1 + n11,13,23 + n13,11,1 + n13,11,23
N40 = n11,13,2 + n11,13,22 + n13,11,2 + n13,11,22
N41 = n11,13,4 + n11,13,20 + n13,11,4 + n13,11,20
N42 = n11,13,5 + n11,13,19 + n13,11,5 + n13,11,19
N43 = n11,13,10 + n11,13,14 + n13,11,10 + n13,11,14
N44 = n11,13,11 + n11,13,13 + n13,11,11 + n13,11,13
N45 = n11,15,2 + n11,15,24 + n15,11,2 + n15,11,24
N46 = n11,15,3 + n11,15,23 + n15,11,3 + n15,11,23











N47 = n11,15,5 + n11,15,21 + n15,11,5 + n15,11,21
N48 = n11,15,6 + n11,15,20 + n15,11,6 + n15,11,20
N49 = n11,15,11 + n11,15,15 + n15,11,11 + n15,11,15
N50 = n11,15,12 + n11,15,14 + n15,11,12 + n15,11,14
N51 = n11,17,4 + n11,17,24 + n17,11,4 + n17,11,24
N52 = n11,17,5 + n11,17,23 + n17,11,5 + n17,11,23
N53 = n11,17,6 + n11,17,22 + n17,11,6 + n17,11,22
N54 = n11,17,13 + n11,17,15 + n17,11,13 + n17,11,15
N55 = n11,17,14 + n11,17,14 + n17,11,14 + n17,11,14
N56 = n11,17,15 + n11,17,13 + n17,11,15 + n17,11,13
N57 = n11,23,10 + n11,23,24 + n23,11,10 + n23,11,24
N58 = n11,23,11 + n11,23,23 + n23,11,11 + n23,11,23
N59 = n11,23,12 + n11,23,22 + n23,11,12 + n23,11,22
N60 = n11,23,13 + n11,23,21 + n23,11,13 + n23,11,21
N61 = n11,23,14 + n11,23,20 + n23,11,14 + n23,11,20
N62 = n11,23,15 + n11,23,19 + n23,11,15 + n23,11,19
N63 = n13,13,1 + n13,13,25
N64 = n13,13,4 + n13,13,10 + n13,13,16 + n13,13,22
N65 = n13,13,7 + n13,13,19
N66 = n13,13,13
N67 = n13,15,2 + n13,15,26 + n15,13,2 + n15,13,26
N68 = n13,15,5 + n13,15,23 + n15,13,5 + n15,13,23
N69 = n13,15,8 + n13,15,20 + n15,13,8 + n15,13,20
N70 = n13,15,11 + n13,15,17 + n15,13,11 + n15,13,17
N71 = n13,15,14 + n13,15,14 + n15,13,14 + n15,13,14
N72 = n13,15,17 + n13,15,11 + n15,13,17 + n15,13,11
N73 = n13,17,4 + n13,17,26 + n17,13,4 + n17,13,26
N74 = n13,17,5 + n13,17,25 + n17,13,5 + n17,13,25
N75 = n13,17,7 + n13,17,23 + n17,13,7 + n17,13,23
N76 = n13,17,8 + n13,17,22 + n17,13,8 + n17,13,22
N77 = n13,17,13 + n13,17,17 + n17,13,13 + n17,13,17
N78 = n13,17,14 + n13,17,16 + n17,13,14 + n17,13,16
N79 = n13,23,10 + n13,23,26 + n23,13,10 + n23,13,26
N80 = n13,23,11 + n13,23,25 + n23,13,11 + n23,13,25
N81 = n13,23,13 + n13,23,23 + n23,13,13 + n23,13,23
N82 = n13,23,14 + n13,23,22 + n23,13,14 + n23,13,22
N83 = n13,23,16 + n13,23,20 + n23,13,16 + n23,13,20
N84 = n13,23,17 + n13,23,19 + n23,13,17 + n23,13,19
N85 = n15,15,3 + n15,15,27
N86 = n15,15,6 + n15,15,12 + n15,15,18 + n15,15,24
N87 = n15,15,9 + n15,15,21
N88 = n15,15,15
N89 = n15,17,5 + n15,17,27 + n17,15,5 + n17,15,27
N90 = n15,17,6 + n15,17,26 + n17,15,6 + n17,15,26
N91 = n15,17,8 + n15,17,24 + n17,15,8 + n17,15,24
N92 = n15,17,9 + n15,17,23 + n17,15,9 + n17,15,23
N93 = n15,17,14 + n15,17,18 + n17,15,14 + n17,15,18
N94 = n15,17,15 + n15,17,17 + n17,15,15 + n17,15,17











N95 = n15,23,11 + n15,23,27 + n23,15,11 + n23,15,27
N96 = n15,23,12 + n15,23,26 + n23,15,12 + n23,15,26
N97 = n15,23,14 + n15,23,24 + n23,15,14 + n23,15,24
N98 = n15,23,15 + n15,23,23 + n23,15,15 + n23,15,23
N99 = n15,23,17 + n15,23,21 + n23,15,17 + n23,15,21
N100 = n15,23,18 + n15,23,20 + n23,15,18 + n23,15,20
N101 = n17,17,7 + n17,17,27
N102 = n17,17,8 + n17,17,16 + n17,17,18 + n17,17,26
N103 = n17,17,9 + n17,17,25
N104 = n17,17,17
N105 = n17,23,13 + n17,23,27 + n23,17,13 + n23,17,27
N106 = n17,23,14 + n17,23,26 + n23,17,14 + n23,17,26
N107 = n17,23,15 + n17,23,25 + n23,17,15 + n23,17,25
N108 = n17,23,16 + n17,23,24 + n23,17,16 + n23,17,24
N109 = n17,23,17 + n17,23,23 + n23,17,17 + n23,17,23
N110 = n17,23,18 + n17,23,22 + n23,17,18 + n23,17,22
N111 = n23,23,19 + n23,23,27
N112 = n23,23,20 + n23,23,22 + n23,23,24 + n23,23,26
N113 = n23,23,21 + n23,23,25
N114 = n23,23,23
N115 = n5,14,1 + n5,14,18 + n14,5,1 + n14,5,18
N116 = n5,14,2 + n5,14,17 + n14,5,2 + n14,5,17
N117 = n5,14,3 + n5,14,16 + n14,5,3 + n14,5,16
N118 = n5,14,4 + n5,14,15 + n14,5,4 + n14,5,15
N119 = n5,14,5 + n5,14,14 + n14,5,5 + n14,5,14
N120 = n5,14,6 + n5,14,13 + n14,5,6 + n14,5,13
N121 = n5,14,7 + n5,14,12 + n14,5,7 + n14,5,12
N122 = n5,14,8 + n5,14,11 + n14,5,8 + n14,5,11
N123 = n5,14,9 + n5,14,10 + n14,5,9 + n14,5,10
N124 = n11,14,1 + n11,14,24 + n14,11,1 + n14,11,24
N125 = n11,14,2 + n11,14,23 + n14,11,2 + n14,11,23
N126 = n11,14,3 + n11,14,22 + n14,11,3 + n14,11,22
N127 = n11,14,4 + n11,14,21 + n14,11,4 + n14,11,21
N128 = n11,14,5 + n11,14,20 + n14,11,5 + n14,11,20
N129 = n11,14,6 + n11,14,19 + n14,11,6 + n14,11,19
N130 = n11,14,10 + n11,14,15 + n14,11,10 + n14,11,15
N131 = n11,14,11 + n11,14,14 + n14,11,11 + n14,11,14
N132 = n11,14,12 + n11,14,13 + n14,11,12 + n14,11,13
N133 = n13,14,1 + n13,14,26 + n14,13,1 + n14,13,26
N134 = n13,14,2 + n13,14,25 + n14,13,2 + n14,13,25
N135 = n13,14,4 + n13,14,23 + n14,13,4 + n14,13,23
N136 = n13,14,5 + n13,14,22 + n14,13,5 + n14,13,22
N137 = n13,14,7 + n13,14,20 + n14,13,7 + n14,13,20
N138 = n13,14,8 + n13,14,19 + n14,13,8 + n14,13,19
N139 = n13,14,10 + n13,14,17 + n14,13,10 + n14,13,17
N140 = n13,14,11 + n13,14,16 + n14,13,11 + n14,13,16











N141 = n13,14,13 + n13,14,14 + n14,13,13 + n14,13,14
N142 = n14,14,1 + n14,14,27
N143 = n14,14,2 + n14,14,26
N144 = n14,14,3 + n14,14,25
N145 = n14,14,4 + n14,14,24
N146 = n14,14,5 + n14,14,23
N147 = n14,14,6 + n14,14,22
N148 = n14,14,7 + n14,14,21
N149 = n14,14,8 + n14,14,20
N150 = n14,14,9 + n14,14,19
N151 = n14,14,10 + n14,14,18
N152 = n14,14,11 + n14,14,17
N153 = n14,14,12 + n14,14,16
N154 = n14,14,13 + n14,14,15
N155 = n14,14,14
N156 = n15,14,2 + n15,14,27 + n14,15,2 + n14,15,27
N157 = n15,14,3 + n15,14,26 + n14,15,3 + n14,15,26
N158 = n15,14,5 + n15,14,24 + n14,15,5 + n14,15,24
N159 = n15,14,6 + n15,14,23 + n14,15,6 + n14,15,23
N160 = n15,14,8 + n15,14,21 + n14,15,8 + n14,15,21
N161 = n15,14,9 + n15,14,20 + n14,15,9 + n14,15,20
N162 = n15,14,11 + n15,14,18 + n14,15,11 + n14,15,18
N163 = n15,14,12 + n15,14,17 + n14,15,12 + n14,15,17
N164 = n15,14,14 + n15,14,15 + n14,15,14 + n14,15,15
N165 = n17,14,4 + n17,14,27 + n14,17,4 + n14,17,27
N166 = n17,14,5 + n17,14,26 + n14,17,5 + n14,17,26
N167 = n17,14,6 + n17,14,25 + n14,17,6 + n14,17,25
N168 = n17,14,7 + n17,14,24 + n14,17,7 + n14,17,24
N169 = n17,14,8 + n17,14,23 + n14,17,8 + n14,17,23
N170 = n17,14,9 + n17,14,22 + n14,17,9 + n14,17,22
N171 = n17,14,13 + n17,14,18 + n14,17,13 + n14,17,18
N172 = n17,14,14 + n17,14,17 + n14,17,14 + n14,17,17
N173 = n17,14,15 + n17,14,16 + n14,17,15 + n14,17,16
N174 = n23,14,10 + n23,14,27 + n14,23,10 + n14,23,27
N175 = n23,14,11 + n23,14,26 + n14,23,11 + n14,23,26
N176 = n23,14,12 + n23,14,25 + n14,23,12 + n14,23,25
N177 = n23,14,13 + n23,14,24 + n14,23,13 + n14,23,24
N178 = n23,14,14 + n23,14,23 + n14,23,14 + n14,23,23
N179 = n23,14,15 + n23,14,22 + n14,23,15 + n14,23,22
N180 = n23,14,16 + n23,14,21 + n14,23,16 + n14,23,21
N181 = n23,14,17 + n23,14,20 + n14,23,17 + n14,23,20
N182 = n23,14,18 + n23,14,19 + n14,23,18 + n14,23,19









2.3.5 Marker Ordering

Three-point analysis allows the determination of an optimal marker order based on linkage analysis (Thompson 1984). An optimal order of markers corresponds to the shortest map length. To estimate the cumulative map distance for three ordered markers, we need to assume that there is no interference between different marker intervals (Wu et al., 2007a). Under this assumption, for a given marker order A-B-C, the probability that there is one crossover between A and B and also one crossover between B and C can be expressed as

$$g_{11} = \theta_{12}\theta_{23} = (g_{11} + g_{10})(g_{11} + g_{01}) = (1 - g_{01} - g_{00})(1 - g_{10} - g_{00}),$$

which leads to

$$g_{10} = \frac{g_{00}\,g_{11}}{g_{01}}. \qquad (2\text{-}49)$$

All the parameters in $\Omega_g$ are estimated with the EM algorithm, except that $g_{10}$ should meet the constraint of equation (2-49).
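Constraint (2-49) follows from expanding $g_{11} = (g_{11} + g_{10})(g_{11} + g_{01})$ and using $g_{00} + g_{01} + g_{10} + g_{11} = 1$, which gives $g_{11}g_{00} = g_{10}g_{01}$. The minimal sketch below checks this numerically. The function names and the input values 0.05 and 0.20 are illustrative assumptions; the index convention (first index for interval A-B, second for interval B-C) is taken from the way $\theta_{12}$ is tied to $g_{10}$ and $g_{11}$ in the text.

```python
def no_interference_gametes(theta_ab, theta_bc):
    """Return (g00, g01, g10, g11) for order A-B-C when crossovers in the
    two intervals occur independently (no interference)."""
    g11 = theta_ab * theta_bc                  # crossover in both intervals
    g10 = theta_ab * (1.0 - theta_bc)          # crossover in A-B only
    g01 = (1.0 - theta_ab) * theta_bc          # crossover in B-C only
    g00 = (1.0 - theta_ab) * (1.0 - theta_bc)  # no crossover
    return g00, g01, g10, g11


def constrained_g10(g00, g01, g11):
    """g10 implied by constraint (2-49)."""
    return g00 * g11 / g01


# Hypothetical recombination fractions, chosen only for illustration.
g00, g01, g10, g11 = no_interference_gametes(0.05, 0.20)
print(g10, constrained_g10(g00, g01, g11))
```

The two printed values agree up to floating-point rounding, so under no interference only three of the four g's need to be updated freely in each EM step.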

The same procedure is conducted when the other two possible orders, A-C-B and

B-A-C, are assumed. The optimal order of the three markers is one that produces the

largest likelihood value among the three marker orders.

2.3.6 Hypothesis Testing

After an optimal order is determined, we need to test whether these markers are significantly linked. For a given order A-B-C, the null hypotheses of $\theta_{AB} = 0.5$, $\theta_{BC} = 0.5$, and $\theta_{AC} = 0.5$ are equivalent to the following constraint:

$$g_{11} = g_{10} = g_{01} = g_{00} = 0.25. \qquad (2\text{-}50)$$


We can further estimate and test the degree of interference in the occurrence of crossovers between the two adjacent marker intervals. For the order A-B-C, we estimate the coefficient of interference by equation (2-24). The null hypothesis $H_0: I = 0$ (equation (2-25)) is equivalent to constraint (2-49). When there is no double recombination between two adjacent marker intervals (equation (2-26)), we have the constraint $g_{11} = 0$.

It is also of interest to test the significance of linkage disequilibria in the sampled population. The three-point model estimates four types of linkage disequilibria using equations (2-36). Each of these linkage disequilibria can be tested individually or in any combination. The estimates of haplotype frequencies under the various null hypotheses about linkage disequilibria can be obtained through the EM algorithm with the constraints from equations (2-36).

2.4 Computer Simulation

2.4.1 Two-Point Model

We use computer simulation to examine the statistical properties of the proposed linkage and linkage disequilibrium model. The simulation scheme mimics a natural human population at HWE from which a panel of unrelated families (each including a male parent, a female parent, and one or more children) is randomly sampled. Given a total of 1000 subjects, the simulation considers two sampling strategies, 1000 × 1 (more families vs. smaller size) and 200 × 5 (fewer families vs. larger size). For each strategy, we model two markers with strong, moderate, and weak linkage disequilibrium in the population. The allele frequencies for the two markers are p = 0.6 and q = 0.5, respectively. The two markers are linked with varying sizes of recombination fraction. In each design, 1000 simulation replicates were performed to estimate the means of the MLEs for each parameter and their standard deviations.
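The sampling scheme described above can be sketched as follows. This is a minimal illustration and not the simulation code used for the reported results: the function names are hypothetical, "size" is assumed to mean the number of children per family, and the sketch returns phased haplotypes, whereas an analysis would observe only unphased genotypes. Haplotype frequencies are built from p, q, and D so that $D = p_{11}p_{00} - p_{10}p_{01}$.

```python
import random


def haplotype_freqs(p, q, D):
    """Two-locus haplotype frequencies from allele frequencies p, q and LD D."""
    freqs = {
        "11": p * q + D,
        "10": p * (1 - q) - D,
        "01": (1 - p) * q - D,
        "00": (1 - p) * (1 - q) + D,
    }
    if min(freqs.values()) < 0:
        raise ValueError("D is outside the admissible range for p and q")
    return freqs


def sample_parent(freqs, rng):
    """Draw a diplotype (two independent haplotypes) for one parent under HWE."""
    haps = list(freqs)
    weights = [freqs[h] for h in haps]
    return tuple(rng.choices(haps, weights=weights, k=2))


def transmit(diplotype, theta, rng):
    """Gamete from a diplotype; recombinant with probability theta."""
    h1, h2 = diplotype
    if rng.random() < 0.5:          # start from either haplotype with equal chance
        h1, h2 = h2, h1
    if rng.random() < theta:        # crossover between the two markers
        return h1[0] + h2[1]
    return h1


def simulate_families(n_families, n_children, p, q, D, theta, seed=0):
    """Return a list of (father, mother, children) with phased haplotypes."""
    rng = random.Random(seed)
    freqs = haplotype_freqs(p, q, D)
    families = []
    for _ in range(n_families):
        father = sample_parent(freqs, rng)
        mother = sample_parent(freqs, rng)
        children = [(transmit(father, theta, rng), transmit(mother, theta, rng))
                    for _ in range(n_children)]
        families.append((father, mother, children))
    return families


# One replicate of the 200 x 5 design with strong LD (D = 0.19) and theta = 0.05.
families = simulate_families(200, 5, p=0.6, q=0.5, D=0.19, theta=0.05, seed=1)
print(len(families), families[0])
```

A single recombination fraction is used for both parents here; a sex-specific version would pass separate values to the two `transmit` calls.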

Table 2-4 gives the results from simulation studies under different designs. Allele frequencies and linkage disequilibrium can be estimated with high accuracy and precision. For a fixed total number of subjects, a design with more families of smaller size provides better estimates than a design with fewer families of larger size. The precision of the linkage disequilibrium estimate improves with increasing degrees of disequilibrium. In general, the recombination fraction can also be estimated well, but this depends on the degree of linkage disequilibrium. If linkage disequilibrium is near zero, then $\phi$ is close to 1/2, so that $\omega_1$ and $\omega_2$ will not contain $\theta$. Thus, $\theta$ is not estimable when there is no association between the two markers.

The proposed two-point model can detect sex-specific differences in the recombination fraction and linkage disequilibrium. In a simulation study assuming different allele frequencies, different disequilibria, and different recombination fractions between males and females, the model shows good power to discern sex-specific differences when there is a large linkage disequilibrium in the population (Table 2-5). When linkage disequilibrium is small, the power to detect sex-specific linkage is low. Also, a sampling strategy with a small number of families shows low power for detecting the difference in linkage between the two sexes.

2.4.2 Three-Point Model

We simulated data for three markers A, B, and C with the allele frequencies, linkage disequilibria of different types, and recombination fractions given in Table 2-6. We consider a strong association between A and B, a moderately strong association between A and C, and a weak association between B and C. We also assume three different cases of linkage: no double recombination (c = 0), independent recombination (c = 1), and interference (c = 3). For all sampling strategies (family number vs. family size), the three-point model provides excellent estimates of all the parameters with great accuracy and precision. Table 2-6 lists the parameter estimates under the three-point model for a 1000 × 1 sampling strategy. It is interesting to see that the three-point model can accurately estimate the recombination fraction between two markers showing a low linkage disequilibrium. The same data were analyzed by the two-point model. Although most parameters can be estimated by the two-point model as precisely as by the three-point model, the recombination fraction between weakly associated markers cannot be estimated well. The three-point model is more powerful for detecting linkage and linkage









Table 2-4. MLEs (± standard deviations) of allele frequencies, linkage disequilibrium, and recombination fraction from 1000 simulation replicates under different sampling strategies

Family          True     MLE
Number  Size    D        p                q                D                θ

Weak Linkage (θ = 0.45)
200     5       0.190    0.600 ± 0.017    0.499 ± 0.018    0.190 ± 0.006    0.449 ± 0.021
200     5       0.100    0.600 ± 0.017    0.500 ± 0.017    0.100 ± 0.010    0.448 ± 0.037
200     5       0.020    0.600 ± 0.017    0.501 ± 0.018    0.020 ± 0.012    0.465 ± 0.240
1000    1       0.190    0.600 ± 0.008    0.500 ± 0.008    0.190 ± 0.003    0.450 ± 0.022
1000    1       0.100    0.600 ± 0.008    0.500 ± 0.008    0.100 ± 0.005    0.449 ± 0.038
1000    1       0.020    0.600 ± 0.008    0.500 ± 0.008    0.020 ± 0.005    0.446 ± 0.199

Moderate Linkage (θ = 0.20)
200     5       0.190    0.600 ± 0.017    0.499 ± 0.018    0.190 ± 0.006    0.200 ± 0.016
200     5       0.100    0.601 ± 0.017    0.500 ± 0.017    0.100 ± 0.010    0.197 ± 0.046
200     5       0.020    0.600 ± 0.017    0.501 ± 0.018    0.020 ± 0.012    0.279 ± 0.272
1000    1       0.190    0.600 ± 0.008    0.500 ± 0.008    0.190 ± 0.003    0.200 ± 0.016
1000    1       0.100    0.600 ± 0.008    0.500 ± 0.008    0.100 ± 0.005    0.199 ± 0.035
1000    1       0.020    0.600 ± 0.008    0.500 ± 0.008    0.020 ± 0.005    0.198 ± 0.162

Strong Linkage (θ = 0.05)
200     5       0.190    0.600 ± 0.017    0.499 ± 0.018    0.190 ± 0.006    0.050 ± 0.010
200     5       0.100    0.601 ± 0.017    0.500 ± 0.017    0.100 ± 0.010    0.053 ± 0.045
200     5       0.020    0.600 ± 0.017    0.501 ± 0.018    0.020 ± 0.012    0.205 ± 0.274
1000    1       0.190    0.600 ± 0.008    0.500 ± 0.008    0.190 ± 0.003    0.050 ± 0.009
1000    1       0.100    0.600 ± 0.008    0.500 ± 0.008    0.100 ± 0.005    0.049 ± 0.028
1000    1       0.020    0.600 ± 0.008    0.500 ± 0.008    0.020 ± 0.005    0.095 ± 0.117

Highly Strong Linkage (θ = 0.005)
200     5       0.190    0.600 ± 0.017    0.499 ± 0.018    0.190 ± 0.006    0.006 ± 0.006
200     5       0.100    0.601 ± 0.017    0.500 ± 0.017    0.100 ± 0.010    0.024 ± 0.033
200     5       0.020    0.600 ± 0.017    0.501 ± 0.018    0.020 ± 0.012    0.191 ± 0.275
1000    1       0.190    0.600 ± 0.008    0.500 ± 0.008    0.190 ± 0.003    0.005 ± 0.004
1000    1       0.100    0.600 ± 0.008    0.500 ± 0.008    0.100 ± 0.005    0.013 ± 0.017
1000    1       0.020    0.600 ± 0.008    0.500 ± 0.008    0.020 ± 0.005    0.075 ± 0.108









Table 2-5. Power to detect sex-specific differences in the recombination fraction and linkage disequilibrium

Family           Parameters                                       Power
Number  Size     Df       pm     qm     Dm       θf      θm       θ        D
1000    1        0.004    0.6    0.9    0.009    0.05    0.20     0.024    0.136
30000   1        0.004    0.6    0.9    0.009    0.05    0.20     0.156    0.985
1000    1        0.04     0.6    0.8    0.09     0.05    0.20     0.280    1.000
3000    1        0.04     0.6    0.8    0.09     0.05    0.20     0.664    1.000
5000    1        0.04     0.6    0.8    0.09     0.05    0.20     0.874    1.000
10000   1        0.04     0.6    0.8    0.09     0.05    0.20     0.996    1.000


disequilibrium and their sex-specific differences than the two-point model, especially when the markers are not strongly associated.
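The three linkage cases can be related to the gamete probabilities with a short sketch. It reads c as the coefficient of coincidence, so that $g_{11} = c\,\theta_{AB}\theta_{BC}$ and $\theta_{AC} = \theta_{AB} + \theta_{BC} - 2c\,\theta_{AB}\theta_{BC}$. That reading is an assumption, since equation (2-24) lies outside this section, but with $\theta_{AB} = \theta_{BC} = 0.05$ it gives the $\theta_{AC}$ values 0.100, 0.095, and 0.085 listed beside c in Table 2-6. The function names are hypothetical.

```python
def theta_ac(theta_ab, theta_bc, c):
    """Recombination fraction between flanking markers A and C,
    with c read as the coefficient of coincidence (assumption)."""
    return theta_ab + theta_bc - 2.0 * c * theta_ab * theta_bc


def gamete_probs(theta_ab, theta_bc, c):
    """Return (g00, g01, g10, g11) for order A-B-C."""
    g11 = c * theta_ab * theta_bc   # crossovers in both intervals
    g10 = theta_ab - g11            # crossover in A-B only
    g01 = theta_bc - g11            # crossover in B-C only
    g00 = 1.0 - g10 - g01 - g11     # no crossover
    return g00, g01, g10, g11


for c in (0, 1, 3):
    print(c, round(theta_ac(0.05, 0.05, c), 3), gamete_probs(0.05, 0.05, c))
```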

Next, we prove a theorem on the non-estimability of the recombination fraction between weakly associated markers in the two-point model.

Theorem 1. [Two-point model] For any two SNPs with LD of 0, a consistent estimate of the recombination fraction does not exist.

Proof: For the two-point model, by formula (2-1), $D = 0$ implies that $p_{11}p_{00} = p_{10}p_{01}$. That means

$$\phi = \frac{p_{11}p_{00}}{p_{11}p_{00} + p_{10}p_{01}} = \frac{p_{10}p_{01}}{p_{11}p_{00} + p_{10}p_{01}} = \frac{1}{2}$$

by (2-4) and (2-5). Then, by (2-6), it implies

$$\omega_1 = \phi(1 - \theta) + (1 - \phi)\theta = \frac{1}{2}(1 - \theta) + \frac{1}{2}\theta = \frac{1}{2}.$$











Table 2-6. MLEs of linkage disequilibria and recombination fractions among markers under the three-point model, in comparison with those under the two-point model. The numbers in parentheses are the standard deviations of the MLEs

Marker   c (θAC)     p        q        r        D_BC     D_AC     D_AB     D_ABC    θ_AB     θ_BC     θ_AC
True value           0.6      0.5      0.4      0.02     0.06     0.10     0.01     0.05     0.05

Three-Point Model
A-B-C    0 (0.100)   0.600    0.500    0.400    0.020    0.060    0.100    0.010    0.051    0.057    0.092
                     (0.007)  (0.008)  (0.008)  (0.006)  (0.005)  (0.005)  (0.002)  (0.024)  (0.038)  (0.049)
A-B-C    1 (0.095)   0.600    0.500    0.400    0.020    0.060    0.100    0.010    0.051    0.057    0.090
                     (0.007)  (0.008)  (0.008)  (0.006)  (0.005)  (0.005)  (0.002)  (0.024)  (0.038)  (0.050)
A-B-C    3 (0.085)   0.600    0.500    0.400    0.020    0.060    0.100    0.010    0.051    0.057    0.086
                     (0.007)  (0.008)  (0.008)  (0.006)  (0.005)  (0.005)  (0.002)  (0.024)  (0.038)  (0.050)

Two-Point Model
A-B      0 (0.100)   0.600    0.500                               0.100             0.049
                     (0.007)  (0.008)                             (0.005)           (0.030)
B-C      0 (0.100)            0.500    0.400    0.020                                        0.170
                              (0.008)  (0.008)  (0.006)                                      (0.177)
A-C      0 (0.100)   0.600             0.400             0.060                                        0.054
                     (0.007)           (0.008)           (0.005)                                      (0.048)

A-B      1 (0.095)   0.600    0.500                               0.100             0.049
                     (0.007)  (0.008)                             (0.005)           (0.030)
B-C      1 (0.095)            0.500    0.400    0.020                                        0.167
                              (0.008)  (0.008)  (0.006)                                      (0.176)
A-C      1 (0.095)   0.600             0.400             0.060                                        0.054
                     (0.007)           (0.008)           (0.005)                                      (0.049)

A-B      3 (0.085)   0.600    0.500                               0.100             0.049
                     (0.007)  (0.008)                             (0.005)           (0.030)
B-C      3 (0.085)            0.500    0.400    0.020                                        0.159
                              (0.008)  (0.008)  (0.006)                                      (0.174)
A-C      3 (0.085)   0.600             0.400             0.060                                        0.054
                     (0.007)           (0.008)           (0.005)                                      (0.048)


Similarly, it can be shown that $\omega_2 = \frac{1}{2}$. Under this condition, the lower-level likelihood (2-7) will not contain the recombination fraction $\theta$ in any term, which makes it non-estimable.
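The argument can be illustrated numerically with a minimal sketch. It considers a parent heterozygous at both markers whose phase is unknown and computes the gamete probabilities as a mixture over the two possible diplotypes. The mixture is written from first principles, so its normalization may differ from equations (2-4) to (2-6), which are outside this section; the function name is hypothetical. With $D = 0$ the gamete probabilities are the same for every $\theta$, so the data carry no information about $\theta$.

```python
def gamete_probs_double_het(p11, p10, p01, p00, theta):
    """Gamete probabilities for a parent heterozygous at both markers
    (phase unknown), given haplotype frequencies and recombination fraction."""
    # Relative frequency of diplotype [11][00] among double heterozygotes.
    phi = p11 * p00 / (p11 * p00 + p10 * p01)
    w1 = phi * (1 - theta) + (1 - phi) * theta   # weight on gametes 11 and 00
    w2 = phi * theta + (1 - phi) * (1 - theta)   # weight on gametes 10 and 01
    return {"11": w1 / 2, "00": w1 / 2, "10": w2 / 2, "01": w2 / 2}


# D = 0 with p = 0.6, q = 0.5: haplotype frequencies are products of allele frequencies.
for theta in (0.05, 0.45):
    print("D = 0   ", theta, gamete_probs_double_het(0.3, 0.3, 0.2, 0.2, theta))

# D = 0.19 with the same allele frequencies: the gamete probabilities now depend on theta.
for theta in (0.05, 0.45):
    print("D = 0.19", theta, gamete_probs_double_het(0.49, 0.11, 0.01, 0.39, theta))
```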

On the other hand, for the three-point model, suppose that the linkage disequilibrium between one pair of markers, say $D_{12}$ with the marker order 1-2-3, is zero. What we need to show is that $\theta_{12}$ can be estimated as long as not all D's are zero.

We only need to show that there is at least one term involving $\theta_{12}$ (or $g_{10}$ and $g_{11}$) in the lower-level likelihood (2-32). By (2-36), $D_{12} = 0$ implies












$$D_{12} = (p_{111} + p_{110})(p_{001} + p_{000}) - (p_{101} + p_{100})(p_{011} + p_{010}) = 0 \qquad (2\text{-}51)$$

$$\Rightarrow\ (p_{111} + p_{110})(p_{001} + p_{000}) = (p_{101} + p_{100})(p_{011} + p_{010}),$$

i.e., $p_{11}p_{00} = p_{10}p_{01}$.

Notice that, similarly to the two-point model, the non-estimability of $\theta_{12}$ means that $v_{13}$ and $v_{03}$ in (2-31) do not contain $\theta_{12}$, which leads to $\phi_{13} = \frac{1}{2}$ and $\phi_{03} = \frac{1}{2}$. By the formulas in table 2.3.3, it further implies that

$$p_{111}p_{001} = p_{101}p_{011},$$

$$p_{110}p_{000} = p_{100}p_{010}.$$

But (2-51) does not guarantee the above equations when the other D's are not zero, which proves that $\theta_{12}$ is estimable. $\Box$
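A small numerical example, sketched below, shows that $D_{12} = 0$ alone does not force the two product equalities above. The eight haplotype frequencies are hypothetical values chosen only for illustration; they give zero pairwise disequilibrium between markers 1 and 2 but a non-zero three-marker association. Reading $\phi_{13}$ and $\phi_{03}$ as the relative frequencies of one of the two possible diplotypes among parents heterozygous at markers 1 and 2 and homozygous at marker 3 is an assumption inferred from the equalities in the proof.

```python
import math

# Hypothetical three-marker haplotype frequencies p[ijk] (alleles at markers 1, 2, 3).
p = {"111": 0.20, "110": 0.05, "101": 0.05, "100": 0.20,
     "011": 0.05, "010": 0.20, "001": 0.20, "000": 0.05}


def marginal_12(p, i, j):
    """Marginal frequency of alleles i, j at markers 1 and 2, summed over marker 3."""
    return p[f"{i}{j}1"] + p[f"{i}{j}0"]


# Pairwise disequilibrium between markers 1 and 2, as in (2-51).
d12 = (marginal_12(p, 1, 1) * marginal_12(p, 0, 0)
       - marginal_12(p, 1, 0) * marginal_12(p, 0, 1))

# Relative diplotype frequencies when marker 3 is homozygous for allele 1 or allele 0.
phi_13 = p["111"] * p["001"] / (p["111"] * p["001"] + p["101"] * p["011"])
phi_03 = p["110"] * p["000"] / (p["110"] * p["000"] + p["100"] * p["010"])

print(math.isclose(d12, 0.0, abs_tol=1e-12), phi_13, phi_03)
```

Because neither relative frequency equals 1/2 in this example, the corresponding terms of the likelihood still depend on $\theta_{12}$ even though $D_{12} = 0$.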

When all the linkage disequilibria between the three pairs of markers and among all three markers are zero, all the relative frequencies $\phi_{ij}$ ($i = 0, 1$; $j = 1, 2, 3$) defined in table 2.3.3 are equal to $\frac{1}{2}$ and the $T_i$ ($i = 1, 2, 3, 4$) defined in (2-29) are equal to $\frac{1}{4}$. Then the $v_{ij}$ ($i = 0, 1$; $j = 1, 2, 3$) defined in (2-31) and the corresponding quantities indexed by $i = 1, 2, 3, 4$ defined in (2-30) in the lower-level likelihood (2-32) are constant, which makes the $g_{ij}$ ($i, j = 0, 1$) non-estimable. By a similar derivation, this non-estimability issue of the three-point model can be resolved by a four-point model.

2.5 Discussion

The study of patterns of linkage disequilibrium (LD) over a genomic region has received considerable attention in recent human genetic projects (Ardlie et al., 2002; Daly et al., 2001; Dawson et al., 2002; Gabriel et al., 2002; Morton, 2005; Reich et al., 2001; Tsunoda et al., 2004). These investigations provide an important contribution to our understanding of the underlying structure of LD in the human genome. In this article, we present an algorithmic model for simultaneously estimating the linkage and linkage disequilibrium between different markers and further constructing a map of LD decaying with genomic length. The model allows a large-scale comparison of the differences in the strength and distribution of LD between male and female human populations. LD maps play important roles in disease mapping and population genetics. First, LD maps indicate regions of LD breakdown within which higher marker densities may be required for identification of some causal polymorphisms. Experimental designs assuming a constant or average level of LD across the genome are clearly flawed (Schork, 2002). Instead, marker selection, sample size estimation, and statistical power could be based on the empirically determined LD map of the population of interest. Second, LD maps can be used to study the evolutionary history of populations. Populations in which LD decays rapidly within a small length of genome are considered to have a longer history than those where LD does not change remarkably over the genome (Dawson et al., 2002; Gabriel et al., 2002; Reich et al., 2001).

Linkage disequilibrium describes the non-random association between different markers in a population, whereas linkage concerns the co-transmission of different markers from parents to offspring at meiosis. These two concepts are traditionally separated in genetic studies, and their joint application to increase genetic mapping resolution has only recently begun (Dupuis et al., 2007; Farnir et al., 2002; Lee and Van der Werf, 2006; Meuwissen and Goddard, 2000; Wu et al., 2002; Wu and Zeng, 2001). The joint application of linkage and linkage disequilibrium critically relies on their simultaneous estimation. Currently, most designs and algorithms have been derived for plant and animal populations (see the references cited above). The model proposed in this article considers the characteristics of human families, providing a useful approach for estimating linkage and LD at the same time. Most published models for the joint estimation of linkage and LD are based on two-locus analysis. By making full use of genetic information, the three-locus model dramatically increases the precision of parameter estimation and the power for linkage and linkage disequilibrium detection, as compared with a two-locus analysis. As shown by Monte Carlo simulations, the three-locus model is particularly advantageous over the two-locus model in estimation precision and in linkage and LD detection power when two markers are less strongly associated. Although more parameters are involved, the three-locus model is computationally efficient, in large part because the three-locus estimation is based on a closed system of equations implemented with the EM algorithm. Our approach, derived within the EM paradigm, is comparable with the MCMC algorithm (Niu et al., 2002) in terms of model derivation and computing efficiency, but it does not require a proper choice of priors for the parameters, which is needed for the Bayesian approach and is often difficult.

The results of marker analysis with our model can illustrate how the interplay between recombination and linkage disequilibrium serves as the major force shaping the patterns of LD in the human genome. While demography strongly impacts the overall extent of LD manifested in the LD map lengths (Hernandez et al., 2007), a modified model that incorporates demography and recombination should have better power to define the major features of the LD maps. Our model allows for the relatedness of sampled families from a population through the implementation of identical-by-descent (IBD) probabilities. Hill and Hernandez-Sanchez (2007) recently proposed a model for analyzing the probability of multilocus IBD. A new approach for calculating probabilities of IBD for pairs of haplotypes has been derived by Browning (2008). These models can be readily incorporated into our joint linkage and linkage disequilibrium analysis strategy to better characterize the genetic structure and diversity of a natural population with related families.









CHAPTER 3
GENETIC HAPLOTYPING OF COMPLEX TRAITS

Traditional approaches for gene detection are based on the genetic mapping of quantitative trait loci (QTLs) with a genetic linkage map. Because the markers and a QTL bracketed by them are located at different genomic positions, the significant linkage of a QTL detected with given markers cannot provide any precise information about the sequence structure of the QTL unless the QTL is exactly at, or very close to, the markers. A more accurate and useful approach for characterizing the genetic variants contributing to quantitative variation is to directly analyze DNA sequences associated with a particular disease (Cooper and Psaty, 2003; Ron and Weller, 2007; The STAR Consortium, 2008). The term quantitative trait nucleotide (QTN) has been proposed to describe the sequence polymorphisms that cause phenotypic variation in a quantitative trait. Clark (1990) presented the first model for inferring and reconstructing haplotypes from a diploid population. A score test was proposed by Schaid et al. (2002) for association studies between traits and haplotypes. These previous works allowed Wu and his group (Liu et al., 2004; Wu et al., 2007b) to derive a general statistical model for the characterization of haplotype variants at a QTN that encode a complex phenotype in a natural population. A similar model was also proposed by Lin and his group (Huang et al., 2007; Lin and Huang, 2007; Lin and Zeng, 2006). A strategy based on QTL information has been developed to identify QTNs for complex traits in controlled crosses (Hou et al., 2007).

Recent molecular surveys suggest that the human genome contains many discrete haplotype blocks that are sites of closely located SNPs (Dawson et al., 2002; Gabriel et al., 2002; Patil et al., 2001). Each block may have a few common haplotypes which account for a large proportion of chromosomal variation. Between adjacent blocks there are large regions, called hotspots, in which recombination events occur with high frequencies. Several algorithms have been developed to identify a minimal subset of SNPs, i.e., tagging SNPs, that can characterize the most common haplotypes (Zhang et al., 2002). The number and type of tagging SNPs within each haplotype block can be determined prior to association studies.

Most genome-wide association studies in the literature are to associate genotypes

at single :-'-ir-; SNPs with phenotypes and test their significance by adjusting multiple

comparisons. This approach has been instrumental for the detection of significant SNPs

affecting human traits (Lettre et al., 2008; Sanna et al., 2008; Weedon et al., 2008, 2007).

There is increasing evidence for the association between haplotypes at different SNPs and

phenotypic variation in complex traits (Bader, 2001; Clark, 2004; Judson et al., 2000).

Statistical simulation also indicates that haplotype analysis with multiple SNPs may be

more powerful than single SNP analysis (Akey et al., 2001; Collins et al., 1997; Morris

and Kaplan, 2002; Zaykin et al., 2002). All current haplotyping approaches are based on

a population-based design. This design may not only have less power to study inherited

diseases, like cancer, but also produce spurious results due to population substructure.

These shortcomings can be circumvented by using a family-based design. A design with

multiple nuclear families has an additional advantage that a linkage-linkage disequilibrium

map can be drawn to study the pattern of genetic variation throughout the genome.

In this chapter, I will develop a haplotyping method for the identification of DNA

sequence variants that encode a quantitative trait in a natural population with family

structured data. By specifying genetic values of haplotypes and their interactions

according to well-developed quantitative genetic principles, I will derive models that

provide reasonable genetic interpretations of results.

3.1 Haplotype and Diplotype

A haplotype represents a linear arrangement of nucleotides (alleles) at different SNPs

on a single chromosome, or part of a chromosome. The pair of haplotypes is called a

diplotype. The observed phenotype of a diplotype is called a genotype. A diplotype is

always composed of two haplotypes, with one from the maternal parent and the other one

from the paternal parent. Consider two SNPs on the same genomic region, one with two










Figure 3-1. Haplotype configuration of a diplotype for two SNPs. SNP 1 has two alleles, A and a;
SNP 2 has two alleles, B and b. Haplotype [AB] and haplotype [ab] together form the diplotype [AB][ab].


alleles A and a and the other with two alleles B and b, respectively. Allele A from SNP 1

and allele B from SNP 2 are located on the first homologous chromosome, whereas allele a

from SNP 1 and allele b from SNP 2 are located on the second homologous chromosome. Thus,

[AB] is one haplotype and [ab] is a second haplotype, and both constitute a diplotype

[AB][ab] (Fig. 3-1).

As stated in the previous chapter, we can only observe the genotype, expressed as
Aa/Bb. However, the double heterozygote may be one (and only one) of two possible
diplotypes, [AB][ab] and [Ab][aB], and these two diplotypes cannot be directly observed;
Fig. 3-2 illustrates this situation. While QTL mapping correlates the QTL
genotypes with a phenotypic trait, the aim of this chapter is to estimate the haplotype
effects on a quantitative trait based on the diplotypes and therefore genotypes. In QTL
mapping, the double heterozygote Aa/Bb may be found to associate with a favorable
phenotype. But according to QTN haplotyping, Aa/Bb may not correlate with the
favorable phenotype if its diplotype is [AB][ab] when haplotype [Ab] or [aB] encodes the
trait. Thus, if the genetic effect is expressed at the haplotype level, the same genotype
Aa/Bb may perform differently, depending on which diplotype it carries. Therefore, it is
important to estimate the haplotype effects based on diplotypes and therefore genotypes.









Figure 3-2. Diplotype configuration of a genotype for two SNPs. The two diplotypes [AB][ab] and
[Ab][aB] are not observable; the genotype Aa/Bb and the phenotype are observable, and the QTN
effect connects the diplotype to the phenotype.

In other words, haplotyping complex diseases is to detect specific SNP haplotypes that
contribute distinctively to the diseases.

3.2 Two-Point Haplotyping Model

3.2.1 Likelihood

In our framework, the complete data are the diplotype configurations at a given set
of SNPs for each genotype together with the disease outcomes of subjects, whereas the observed
data are the genotypes at these SNPs and the disease outcomes. The missing data are
the connection from the genotypes to the diplotypes. For any given genotype, all possible
diplotypes can be enumerated. For example, genotype AA/BB has only one diplotype,
[AB][AB]. This is because for a homozygote the genotype and diplotype are identical;
the same holds when at most one SNP is heterozygous. However, the double
heterozygous genotype Aa/Bb has two possible diplotypes, [AB][ab] and [Ab][aB].
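The enumeration of diplotypes for a given two-SNP genotype, as described above, can be sketched in a few lines of Python. This is only an illustration: the function name `diplotypes` and the two-letter string encoding of genotypes are assumptions made here, not notation from the dissertation.

```python
def diplotypes(snp1, snp2):
    """Return the unordered diplotypes compatible with a two-SNP genotype.

    snp1 and snp2 are two-letter genotypes such as "Aa" and "Bb"
    (hypothetical encoding chosen for this sketch).
    """
    found = set()
    # Pair the alleles of SNP 1 with those of SNP 2 in both possible phases.
    for second in (snp2, snp2[::-1]):
        hap1 = snp1[0] + second[0]
        hap2 = snp1[1] + second[1]
        # A diplotype is unordered here, so store it in sorted form.
        found.add(tuple(sorted((hap1, hap2))))
    return sorted(found)


if __name__ == "__main__":
    print(diplotypes("AA", "BB"))  # one diplotype only
    print(diplotypes("Aa", "Bb"))  # two diplotypes: the phase is ambiguous
```

Only the double heterozygote yields two entries; every other genotype maps to a single diplotype, which is why the mixture in the likelihood is needed only for Aa/Bb.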




















































































































Table 3-1 lists all possible genotypes and diplotypes at two SNPs for offspring given
their parents' genotypes. Each genotype (and therefore each diplotype) is composed of
two haplotypes, one from the mother and the other from the father. As in the
previous chapter, we use $p_{11}$, $p_{10}$, $p_{01}$ and $p_{00}$ to represent the frequencies
of the four haplotypes AB, Ab, aB and ab. The relative frequencies of the two diplotypes for the double
heterozygote are then functions of the haplotype frequencies, as expressed in (2-4) and
(2-5). At this stage, the haplotype frequencies and the recombination fraction, denoted
by $\Omega_p = (p_{11}, p_{10}, p_{01}, p_{00}; \theta)$ in chapter 2, which are population genetic parameters,
have been estimated as described in chapter 2. Assuming that diplotypes are associated
with phenotypic variation in a disease, the likelihood for the unknown quantitative genetic
parameters ($\Omega_q$) given the observed phenotypes ($y$), the parents' and offspring's genotypes
($M_m$, $M_f$, $M_o$) and the population genetic parameters ($\Omega_p$) can be formulated. Liu et al. (2004) give
a general likelihood for the two-point model using data from the offspring generation, as
follows.

Superscripts and subscripts are used to distinguish between different SNPs and between different
alleles within a SNP, respectively. For example, the double heterozygous genotype $A^1_1A^1_2/A^2_1A^2_2$
has two diplotypes, $[A^1_1A^2_1][A^1_2A^2_2]$ and $[A^1_1A^2_2][A^1_2A^2_1]$. Generally speaking, a given two-SNP
genotype $A^1_{r_1}A^1_{r_2}/A^2_{k_1}A^2_{k_2}$ (in our framework, from AABB to aabb, denoted by 1 to 9) can
be partitioned into at most two possible diplotypes, $[A^1_{r_1}A^2_{k_1}][A^1_{r_2}A^2_{k_2}]$ and $[A^1_{r_1}A^2_{k_2}][A^1_{r_2}A^2_{k_1}]$.
Thus, the log-likelihood function can be formulated on the basis of a two-component
mixture model, i.e.,

\[
\log L(\Omega_q \mid y, M_o, \Omega_p) = \sum_{i=1}^{n} \log\left[\varpi_i f_{[r_1k_1][r_2k_2]}(y_i) + (1-\varpi_i) f_{[r_1k_2][r_2k_1]}(y_i)\right] \tag{3-1}
\]

where $n = \sum_{i,j,k=1}^{9} n_{ijk}$ and $\varpi_i \in \{1, 0, \varpi, 1-\varpi\}$. Again, $\varpi$, defined in (2-4),
represents the relative frequency with which offspring $i$ carries diplotype $[A^1_{r_1}A^2_{k_1}][A^1_{r_2}A^2_{k_2}]$, and

\[
f_{[r_1k_1][r_2k_2]}(y_i) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(y_i - \mu_{[r_1k_1][r_2k_2]})^2}{2\sigma^2}\right]
\]

\[
f_{[r_1k_2][r_2k_1]}(y_i) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(y_i - \mu_{[r_1k_2][r_2k_1]})^2}{2\sigma^2}\right]
\]

are the probability density functions for offspring $i$ under the two possible diplotypes,
respectively, with genotypic means $\mu_{[r_1k_1][r_2k_2]}$ for diplotype $[A^1_{r_1}A^2_{k_1}][A^1_{r_2}A^2_{k_2}]$ and
$\mu_{[r_1k_2][r_2k_1]}$ for diplotype $[A^1_{r_1}A^2_{k_2}][A^1_{r_2}A^2_{k_1}]$, and the common residual variance $\sigma^2$. In fact, $f$
can be any probability density function. In addition to the commonly used Gaussian, it can
also be Bernoulli or Poisson.

Suppose there is a particular haplotype, say [AB], that is different from the other
three haplotypes [Ab], [aB], and [ab] (which are collectively denoted as $[\overline{AB}]$) in its effect
on a complex disease. This particular haplotype is called the risk haplotype. Then the 10
possible diplotypes can be sorted into three types of composite diplotypes, i.e., $[AB][AB]$,
$[AB][\overline{AB}]$ and $[\overline{AB}][\overline{AB}]$. The phenotypic means of these composite diplotypes are written
as $\mu_2$, $\mu_1$, and $\mu_0$. Ignoring covariate effects, the phenotypic trait of a subject is
expressed as

\[
y_{lm} = \mu_l + e_{lm} \qquad (l = 2, 1, 0;\ m = 1, \ldots, n_{ijk})
\]

where subscript $ijk$ denotes the offspring with genotype $k$ whose parents' genotypes
are $i$ and $j$, $y_{lm}$ can follow some specific distribution with mean $\mu_l$, and $e_{lm}$ is
the residual error. On the basis of quantitative genetic theory, these three means can be
partitioned into the overall mean ($\mu$), the additive effect ($a$) due to the substitution of
different haplotypes, and the dominance effect ($d$) due to the interaction between different
haplotypes, i.e.,

\[ \mu_2 = \mu + a \tag{3-2} \]

\[ \mu_1 = \mu + d \tag{3-3} \]

\[ \mu_0 = \mu - a \tag{3-4} \]

In Table 3-1, the genotypic means are given for different genotypes by assuming that
haplotype [AB] is different from the other three haplotypes.

The advantage of the family-structured model is that it helps to determine the relative
frequency of the diplotype in double heterozygous offspring by using the parents'
genotypes. For example, for group 5 in Table 3-1, where the mother's genotype is AABB and
the father's genotype is AaBb, the relative frequency $\varpi$ for the double heterozygote offspring
AaBb equals 1. In other words, unlike the one-generation model, with the
parents' genotype information available, the diplotype of the double heterozygote offspring can be
determined to be [AB][ab] with probability 1 in our framework. Therefore, the
log-likelihood is now expressed as follows by using the multinomial distribution:

\[
\begin{aligned}
\log L(\Omega_q \mid y, M_m, M_f, M_o, \Omega_p) = {} & \text{constant} \\
& + \sum_{l=1}^{n_{111}} \log f_2(y_{111l}) + \sum_{l=1}^{n_{121}} \log f_2(y_{121l}) + \sum_{l=1}^{n_{122}} \log f_1(y_{122l}) + \sum_{l=1}^{n_{132}} \log f_1(y_{132l}) \\
& + \cdots + \sum_{l=1}^{n_{184}} \log f_1(y_{184l}) + \sum_{l=1}^{n_{185}} \log f_1(y_{185l}) + \sum_{l=1}^{n_{195}} \log f_1(y_{195l}) \\
& + \cdots + \sum_{l=1}^{n_{551}} \log f_2(y_{551l}) + \sum_{l=1}^{n_{552}} \log f_1(y_{552l}) + \sum_{l=1}^{n_{553}} \log f_0(y_{553l}) + \sum_{l=1}^{n_{554}} \log f_1(y_{554l}) \\
& + \sum_{l=1}^{n_{555}} \log\left[\varpi_{l|555} f_1(y_{555l}) + (1 - \varpi_{l|555}) f_0(y_{555l})\right] \\
& + \sum_{l=1}^{n_{556}} \log f_0(y_{556l}) + \sum_{l=1}^{n_{557}} \log f_0(y_{557l}) + \sum_{l=1}^{n_{558}} \log f_0(y_{558l}) + \sum_{l=1}^{n_{559}} \log f_0(y_{559l}) \\
& + \cdots + \sum_{l=1}^{n_{999}} \log f_0(y_{999l})
\end{aligned} \tag{3-5}
\]

where $f_i$ ($i = 2, 1, 0$) are Gaussian probability density functions with mean $\mu_i$
and variance $\sigma^2$. The relative frequency $\varpi_{l|ij5}$ (taking values from $\{0, 1, \tfrac{1}{2}, \pi_1, \pi_2\}$)
can be determined from the following table, depending on the parents' genotypes.










                                        Mother
Father   11/11  11/10  11/00  10/11  10/10  10/00  00/11  00/10  00/00
11/11      0      0      0      0      1      1      0      1      1
11/10      0      0      0      0      π1     1      0     1/2     1
11/00      0      0      0      0      0      0      0      0      0
10/11      0      0      0      0      π1    1/2     0      1      1
10/10      1      π1     0      π1     π2     π1     0      π1     1
10/00      1      1      0     1/2     π1     0      0      0      0
00/11      0      0      0      0      0      0      0      0      0
00/10      1     1/2     0      1      π1     0      0      0      0
00/00      1      1      0      1      1      0      0      0      0

where

\[
\pi_1 = \frac{(1-r)p_{11}p_{00} + r p_{10}p_{01}}{p_{11}p_{00} + p_{10}p_{01}} \tag{3-6}
\]

\[
\pi_2 = \frac{[(1-r)p_{11}p_{00} + r p_{10}p_{01}]^2}{[(1-r)p_{11}p_{00} + r p_{10}p_{01}]^2 + [r p_{11}p_{00} + (1-r)p_{10}p_{01}]^2} \tag{3-7}
\]
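As a small numerical illustration of equations (3-6) and (3-7), the following sketch computes the two relative frequencies from haplotype frequencies and a recombination fraction. The function name `diplotype_weights` and the input values are hypothetical and chosen only for illustration.

```python
def diplotype_weights(p11, p10, p01, p00, r):
    """Relative frequencies of diplotype [AB][ab] in a double heterozygote.

    pi1: one parent is a double heterozygote (form of equation 3-6).
    pi2: both parents are double heterozygotes (form of equation 3-7).
    """
    coupling = (1 - r) * p11 * p00 + r * p10 * p01
    repulsion = r * p11 * p00 + (1 - r) * p10 * p01
    pi1 = coupling / (p11 * p00 + p10 * p01)
    pi2 = coupling ** 2 / (coupling ** 2 + repulsion ** 2)
    return pi1, pi2


if __name__ == "__main__":
    # Illustrative haplotype frequencies (they sum to 1) and r = 0.005.
    print(diplotype_weights(0.4, 0.2, 0.1, 0.3, 0.005))
```

With free recombination (r = 0.5) and equal haplotype frequencies both weights reduce to 1/2, which is a quick sanity check on the formulas.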


3.2.2 Estimation

By maximizing the likelihood (3-5), a closed form for the EM algorithm was derived to
estimate the genotypic means and the variance. This procedure is described as follows.

In the E step, we calculate the probability with which any double heterozygote carries
a particular diplotype, using the formulas shown in the above table and equations (3-6)
and (3-7), and then the posterior probability with which a double heterozygous offspring
carries diplotype [AB][ab], using

\[
\Pi_{l|ij5} = \frac{\varpi_{l|ij5} f_1(y_{ij5l})}{\varpi_{l|ij5} f_1(y_{ij5l}) + (1 - \varpi_{l|ij5}) f_0(y_{ij5l})} \qquad (i, j = 1, \ldots, 9;\ l = 1, \ldots, n_{ij5}). \tag{3-8}
\]










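In the M step, the genotypic means and variance are estimated with the calculated
diplotype probabilities by the closed forms

\[
\begin{aligned}
\hat\mu_2 &= \frac{\sum_{i,j=1}^{9} \sum_{l=1}^{n_{ij1}} y_{ij1l}}{\sum_{i,j=1}^{9} n_{ij1}} \\
\hat\mu_1 &= \frac{\sum_{i,j=1}^{9} \sum_{k=2,4} \sum_{l=1}^{n_{ijk}} y_{ijkl} + \sum_{i,j=1}^{9} \sum_{l=1}^{n_{ij5}} \Pi_{l|ij5}\, y_{ij5l}}{\sum_{i,j=1}^{9} (n_{ij2} + n_{ij4}) + \sum_{i,j=1}^{9} \sum_{l=1}^{n_{ij5}} \Pi_{l|ij5}} \\
\hat\mu_0 &= \frac{\sum_{i,j=1}^{9} \sum_{k=3,6,7,8,9} \sum_{l=1}^{n_{ijk}} y_{ijkl} + \sum_{i,j=1}^{9} \sum_{l=1}^{n_{ij5}} (1 - \Pi_{l|ij5})\, y_{ij5l}}{\sum_{i,j=1}^{9} (n_{ij3} + n_{ij6} + n_{ij7} + n_{ij8} + n_{ij9}) + \sum_{i,j=1}^{9} \sum_{l=1}^{n_{ij5}} (1 - \Pi_{l|ij5})} \\
\hat\sigma^2 &= \frac{1}{n} \Bigg[ \sum_{i,j=1}^{9} \sum_{l=1}^{n_{ij1}} (y_{ij1l} - \hat\mu_2)^2 + \sum_{i,j=1}^{9} \sum_{k=2,4} \sum_{l=1}^{n_{ijk}} (y_{ijkl} - \hat\mu_1)^2 + \sum_{i,j=1}^{9} \sum_{l=1}^{n_{ij5}} (y_{ij5l} - \hat\mu_1)^2\, \Pi_{l|ij5} \\
&\qquad + \sum_{i,j=1}^{9} \sum_{l=1}^{n_{ij5}} (y_{ij5l} - \hat\mu_0)^2 (1 - \Pi_{l|ij5}) + \sum_{i,j=1}^{9} \sum_{k=3,6,7,8,9} \sum_{l=1}^{n_{ijk}} (y_{ijkl} - \hat\mu_0)^2 \Bigg]
\end{aligned} \tag{3-9}
\]

Then, the overall mean, the additive effect and the dominance effect can be solved
from (3-2)-(3-4):

\[
\mu = \frac{1}{2}(\mu_2 + \mu_0), \qquad d = \mu_1 - \frac{1}{2}(\mu_2 + \mu_0), \qquad a = \frac{1}{2}(\mu_2 - \mu_0).
\]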



3.2.3 Hypothesis Testing

Whether or not there exists a significant risk haplotype can be tested by formulating
the following hypotheses about the haplotype additive and dominance effects:

\[ H_0: a = d = 0 \tag{3-10} \]

\[ H_1: \text{at least one equality in } H_0 \text{ does not hold.} \tag{3-11} \]

The above $H_0$ is equivalent to $H_0: \mu_i = \mu$, $i = 2, 1, 0$. The log-likelihood ratio test
statistic (LR) for the significance of the haplotype effect is calculated from the difference
between the log-likelihood values under $H_0$ (reduced model) and $H_1$ (full model):

\[
LR_2 = -2\left[\log L(\hat\Omega_p, \tilde\Omega_q \mid y, M_m, M_f, M_o) - \log L(\hat\Omega_p, \hat\Omega_q \mid y, M_m, M_f, M_o)\right],
\]

where the tildes and hats denote the MLEs of parameters under the null and alternative
hypotheses in (3-10) and (3-11), respectively. The estimates of parameters under the null hypothesis can be
obtained with the same EM algorithm derived for the alternative hypothesis, but with the
constraints of the null hypothesis. This log-likelihood ratio statistic approximately follows
a $\chi^2$ distribution with 2 degrees of freedom. However, when assumptions such as Gaussian
or uncorrelated residuals are violated, the $\chi^2$ approximation may not
be appropriate. We can then use a permutation test (Churchill and Doerge, 1994),
which does not rely on the distribution of $LR_2$, to determine the critical threshold for the
significance of the haplotype/diplotype effect.
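The two ways of judging the likelihood ratio statistic can be sketched as below. The helper names are hypothetical; the chi-square tail probability uses the closed form exp(-x/2), which holds for 2 degrees of freedom, and the permutation threshold is simply an empirical quantile of statistics recomputed on data with shuffled phenotypes (the shuffling itself is not shown).

```python
import math


def lr_statistic(loglik_null, loglik_full):
    """Likelihood ratio statistic from the two maximized log-likelihoods."""
    return -2.0 * (loglik_null - loglik_full)


def chi2_sf_2df(x):
    """Upper tail probability of a chi-square variable with 2 degrees of freedom."""
    return math.exp(-x / 2.0) if x > 0 else 1.0


def permutation_threshold(null_stats, alpha=0.05):
    """Empirical (1 - alpha) quantile of LR statistics from permuted data."""
    ordered = sorted(null_stats)
    index = math.ceil((1 - alpha) * len(ordered)) - 1
    index = max(0, min(len(ordered) - 1, index))
    return ordered[index]


if __name__ == "__main__":
    lr = lr_statistic(-105.0, -100.0)  # illustrative log-likelihood values
    print(lr, chi2_sf_2df(lr))
```

An observed statistic exceeding the permutation threshold is declared significant at level alpha without relying on the chi-square approximation.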

3.2.4 Model Selection

There are different genetic models for identifying the risk haplotype. In the biallelic model,
we assume that all haplotypes are partitioned into two groups, risk and non-risk, and
define the combinations of risk and non-risk haplotypes as composite diplotypes. There
are three kinds of composite diplotypes, whose composition depends on which haplotype is the risk
haplotype. There are seven options for choosing the risk haplotype. First, the risk haplotype could be any one
of the four haplotypes [AB], [Ab], [aB] and [ab]. Second, any two haplotypes could be
different from the other two, which generates three possibilities for composite diplotypes.

In a triallelic model, it is possible that there are two distinct risk haplotypes, which
are each different from the non-risk haplotypes. There are six possible composite diplotypes,
which leads to three options for the risk haplotypes in the triallelic model. If there are three distinct
risk haplotypes, the quadriallelic genetic model can be used to specify the haplotype
effects.

Based on the assumption that one or more haplotypes are risk haplotypes under the
biallelic, triallelic and quadriallelic models, we can formulate the likelihood for each
scenario. However, the true risk haplotype under each of these models is unknown from
the raw genotype and phenotype data. We are also uncertain about the optimal number
of risk haplotypes. Because a different number of parameters is estimated in each
model, the strategy is to implement a model selection approach based on the Akaike
information criterion (AIC) or the Bayesian information criterion (BIC) (Burnham and Anderson,
1998) to determine the haplotype or haplotypes, and their number, that are most distinct from
the rest of the haplotypes in explaining quantitative variation.
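The model selection step can be illustrated with the standard definitions AIC = -2 log L + 2k and BIC = -2 log L + k log n. In the sketch below the candidate names, log-likelihood values and parameter counts are hypothetical; the counts assume one mean per composite diplotype plus a common residual variance (for example 3 + 1 = 4 for a biallelic model and 6 + 1 = 7 for a triallelic model).

```python
import math


def aic(loglik, n_params):
    """Akaike information criterion."""
    return -2.0 * loglik + 2.0 * n_params


def bic(loglik, n_params, n_obs):
    """Bayesian information criterion."""
    return -2.0 * loglik + n_params * math.log(n_obs)


def select_model(candidates, n_obs, criterion="bic"):
    """candidates maps a model name to (maximized log-likelihood, number of parameters).

    Returns the name of the candidate with the smallest criterion value.
    """
    def score(item):
        loglik, n_params = item[1]
        if criterion == "bic":
            return bic(loglik, n_params, n_obs)
        return aic(loglik, n_params)

    return min(candidates.items(), key=score)[0]


if __name__ == "__main__":
    # Hypothetical maximized log-likelihoods for three candidate models.
    fits = {
        "biallelic, risk [AB]": (-1400.0, 4),
        "biallelic, risk [Ab]": (-1412.0, 4),
        "triallelic, risk [AB] and [Ab]": (-1398.5, 7),
    }
    print(select_model(fits, n_obs=1000))
```

Because BIC penalizes each additional parameter by log n rather than 2, it tends to favor models with fewer risk haplotypes than AIC does when the sample is large.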

3.3 Three-point Haplotyping Model

The statistical method for QTN mapping is exemplified by a three-point model. As
introduced in section 2.3, three segregating markers, A (with alleles A and a), B (with
alleles B and b), and C (with alleles C and c), generate eight haplotypes from ABC to
abc and twenty-seven genotypes from AABBCC to aabbcc. Some genotypes are consistent
with diplotypes, whereas the others, which are heterozygous at two or three SNPs, are not.
Each double heterozygote contains two different diplotypes. The triple heterozygote, i.e.
AaBbCc, contains four different diplotypes. The relative frequencies for double and triple
heterozygotes are expressed as functions of haplotype frequencies, given in tabular form in (2.3.3)
and in (2-29).

By assuming ABC as the risk haplotype and all the others as non-risk haplotypes, the
genotypic values of the three composite diplotypes, $\mu_2$, $\mu_1$ and $\mu_0$, can be formulated
as in the two-point model. The quantitative parameter space stays the same as in the two-point
model, $\Omega_q = (\mu_2, \mu_1, \mu_0, \sigma^2)$. Similarly, we face many different genetic models for
detecting the risk haplotype in the three-point framework, from the biallelic and triallelic models
up to the septallelic and octallelic models. Likelihood values and AIC/BIC can be calculated for each model
in order to select the haplotype/diplotype that differs most significantly
from the others in the disease outcome.

3.4 Simulation

The statistical properties of the quantitative genetic parameters can be examined
by simulation studies. Suppose there are two SNPs that are segregating in a natural
population at Hardy-Weinberg equilibrium (HWE). Again, I consider two sampling strategies,
1000 x 1 (more families of smaller size) and 200 x 5 (fewer families of larger size).
The allele frequencies and the linkage disequilibrium are the same as in the simulation of
chapter 2 for population genetic parameter estimation. The recombination fraction is fixed at 0.005.
We assume that one of the four possible haplotypes, [AB], is the risk haplotype, which leads
to three distinct groups of composite diplotypes. The phenotypic values are simulated
from a Gaussian distribution with overall mean ($\mu = 1$), additive effect ($a = 0.5$) and
dominance effect ($d = 0.4$) (Table 3-2). The residual variance is determined according to
different heritability levels (0.1 vs. 0.4). The heritability is the proportion of the total
observed phenotypic variation that is attributable to genetic variation among individuals.
The genetic variance is determined on the basis of the genotypic values of all
diplotypes and their frequencies.
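The step from a heritability level to a residual variance can be sketched as follows. The sketch assumes that the composite diplotype frequencies follow Hardy-Weinberg proportions for a risk haplotype with frequency `p_risk`, and that heritability is defined as H2 = Vg / (Vg + sigma2); the function name and the frequency value used in the example are hypothetical.

```python
def residual_variance(p_risk, a, d, h2):
    """Return (residual variance, genetic variance) for a given heritability.

    Composite diplotype frequencies are taken as p^2, 2p(1-p), (1-p)^2
    (an assumption of random mating), with genotypic values +a, d, -a
    expressed as deviations from the overall mean.
    """
    freqs = [p_risk ** 2, 2 * p_risk * (1 - p_risk), (1 - p_risk) ** 2]
    values = [a, d, -a]
    mean = sum(f * v for f, v in zip(freqs, values))
    var_g = sum(f * (v - mean) ** 2 for f, v in zip(freqs, values))
    # Solve h2 = var_g / (var_g + sigma2) for sigma2.
    sigma2 = var_g * (1 - h2) / h2
    return sigma2, var_g


if __name__ == "__main__":
    # a = 0.5 and d = 0.4 as in the simulation; p_risk = 0.3 is illustrative.
    for h2 in (0.1, 0.4):
        print(h2, residual_variance(0.3, 0.5, 0.4, h2))
```

A lower heritability therefore translates into a larger residual variance for the same genetic effects, which is why the precision of the estimated effects improves as H2 increases.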

Table 3-2 describes the estimation results for both population and quantitative
genetic parameters of the proposed model. One thousand simulation replicates were performed
to calculate the mean and standard deviation of the MLEs for each parameter. For the
quantitative genetic parameters (population parameters were discussed in chapter 2),
all parameters can be reasonably estimated in general. As expected, the accuracy and
precision of population parameter estimation do not depend on the heritability $H^2$.
But the precision of estimation for the quantitative parameters increases as $H^2$ increases.





















The power to detect a risk haplotype that is distinct from the other haplotypes in
disease outcome is calculated under four sampling strategies (200 x 5, 200 x 10, 1000 x 1
and 1000 x 2) with the same parameter setting as above and heritability ranging from 0.1
to 0.4 (Table 3-3). It is expected that increased heritability and increased sample size
increase the power to detect the diplotype differences.

Table 3-3. Power to detect the risk haplotype under the two-point model

Family   H2     r        Power
200      0.1    0.005    0.665
1000     0.1    0.005    0.695
200      0.1    0.005    0.765
1000     0.1    0.005    0.820
200      0.2    0.005    0.774
1000     0.2    0.005    0.780
200      0.2    0.005    0.790
1000     0.2    0.005    0.845
200      0.3    0.005    0.765
1000     0.3    0.005    0.810
200      0.3    0.005    0.795
1000     0.3    0.005    0.840
200      0.4    0.005    0.770
1000     0.4    0.005    0.810
200      0.4    0.005    0.790
1000     0.4    0.005    0.840

The table header also lists Kids, q, D, p, a and d; of these columns, only two constant entries,
0.6 and 0.19, appear in every row.


The simulation for the three-point model was performed for the 1000 x 1 sampling design. A
strong association between A and B, a moderately strong association between A and
C, and a weak association between B and C were considered. Three different cases of
linkage were assumed: no double recombination (I = 1), independent recombination (I = 0) and
interference (I = -2). The phenotypic values of a quantitative trait were
simulated from a Gaussian distribution with means corresponding to the composite diplotypes
and variance determined by different heritability levels. As shown in Table 3-4, all the
quantitative genetic parameters under each condition can be estimated with good accuracy
and precision. A higher level of heritability provides smaller standard deviations of the
parameter estimates, as expected.

Table 3-4. MLEs (standard deviations in parentheses) of population and quantitative genetic
parameters under the three-point model based on the 1000 x 1 design strategy

Population genetic parameters

Parameter   True value   θAC = 0.100 (I = 1)   θAC = 0.095 (I = 0)   θAC = 0.085 (I = -2)
p           0.6          0.6002 (0.0068)       0.5998 (0.0085)       0.5995 (0.0075)
q           0.5          0.4991 (0.0087)       0.4996 (0.0075)       0.5000 (0.0075)
r           0.4          0.3998 (0.0078)       0.3997 (0.0078)       0.3999 (0.0081)
D_BC        0.02         0.0193 (0.0058)       0.0193 (0.0050)       0.0195 (0.0059)
D_AC        0.06         0.0598 (0.0048)       0.0593 (0.0046)       0.0603 (0.0046)
D_AB        0.10         0.1001 (0.0046)       0.1000 (0.0044)       0.1000 (0.0039)
D_ABC       0.01         0.0098 (0.0024)       0.0103 (0.0029)       0.0098 (0.0027)
θAB         0.05         0.0532 (0.0269)       0.0529 (0.0210)       0.0529 (0.0229)
θBC         0.05         0.0885 (0.0262)       0.0804 (0.0591)       0.0769 (0.0467)
θAC         see header   0.1004 (0.0392)       0.0968 (0.0426)       0.0859 (0.0405)

Quantitative genetic parameters

θAC (true value)   H2     μ (true 1)        a (true 0.5)      d (true 0.4)
0.100 (I = 1)      0.1    1.0062 (0.0483)   0.4943 (0.0475)   0.3971 (0.0611)
0.100 (I = 1)      0.4    1.0004 (0.0196)   0.4997 (0.0194)   0.4005 (0.0247)
0.095 (I = 0)      0.1    1.0038 (0.0472)   0.4986 (0.0468)   0.3977 (0.0566)
0.095 (I = 0)      0.4    0.9950 (0.0183)   0.5037 (0.0180)   0.4063 (0.0228)
0.085 (I = -2)     0.1    1.0070 (0.0527)   0.4939 (0.0514)   0.3905 (0.0646)
0.085 (I = -2)     0.4    1.0021 (0.0196)   0.5001 (0.0199)   0.3978 (0.0231)

3.5 Discussion

Currently used genome-wide association (GWA) studies allow the scan of functional
or causal polymorphisms from 300,000-1 million SNPs (Pearson and Manolio, 2008). It
seems that current genetic mapping has developed to a point at which a comprehensive
analysis of all the markers that cover the genetic map of the genome can be performed to
search for the chromosomal distribution of all possible QTLs or QTN. Some basic work
in GWA of QTLs has been initiated in recent years, although it is still full of challenges.
Wang et al. (2005) used Bayesian shrinkage approaches for whole-genome mapping
by shrinking the effects of all candidate QTLs toward zero. It is interesting to extend
the idea of Bayesian shrinkage mapping into haplotype analysis, ultimately allowing for
the systematic scan and search of all possible QTN that affect complex human diseases
throughout the genome.

Epistasis plays a central role in shaping the genetic architecture of a quantitative

trait (Carlborg and Haley, 2004). Epistasis is also of paramount importance in the

pathogenesis of most common human diseases, such as cancer and cardiovascular disease

(Moore, 2003, 2005). The evidence for this is the nonlinear relationship between genotype

and phenotype. We will need to extend the haplotype model to consider epistasis. The

extended model can detect the main effects of individual sequences and the epistatic

effects of the interaction between different haplotype blocks. Various kinds of epistatic

effects resulting from additive x additive, additive x dominant, dominant x additive, and

dominant x dominant interactions can be identified. We will incorporate epistatic effects

into the GWA framework by developing a two-dimensional search algorithm. To this end,

we will construct a web of genetic interactions between genes from different chromosomal

regions in the entire genome. The issues arising from false-positive and false-negative

results will be addressed.









CHAPTER 4
COMPUTING GENETIC IMPRINTING

Genetic imprinting, also called the parent-of-origin effect, is an epigenetic phenomenon
in which an allele from one parent is expressed while the same allele from the other
parent remains inactive. The expression of imprinted genes from only one parental allele
results from differential epigenetic marks that are established during gametogenesis. Since
its first discovery in the early 1980s, genetic imprinting has been thought to play an
important role in embryonic growth and development and disease pathogenesis in a variety
of organisms. Several dozen genes in the mouse were detected to be expressed from only one of
the two parental chromosomes (Barton et al., 1984; Cattanach and Kirk, 1985), some of
which have been tested in other mammals (Morison et al., 2005), including humans. Many
experiments were conducted to investigate the biological mechanisms for imprinting and
its influences on fetal development and various disease syndromes.

Some studies showed that many human diseases are related to imprinted genes,
such as type I diabetes (Paterson et al., 1999), Prader-Willi syndrome and Angelman
syndrome (Falls et al., 1999), NOEY2 in ovarian and breast cancer (Yu et al., 1999)
and bipolar disorder (McInnis et al., 2003). Thus far, it is incompletely documented
how the imprinting process affects development and disease. To better understand
imprinting regulation, it has been proposed to discover and isolate imprinting genes that
may be distributed sporadically or located in clusters forming an imprinted domain.

One approach to discovering imprinting genes is based on genetic mapping in which

individual quantitative trait loci (QTLs) showing parent-of-origin effects are localized with

a molecular marker-based linkage map. Using an outbred strategy appropriate for plants

and animals, significant imprinting QTLs were detected for body composition and body

weight in pigs, chickens and sheep. Cui et al. (2006) proposed an F2-based strategy to map

imprinting QTLs by capitalizing on the difference in the recombination fraction between









different sexes. More explorations on the development of imprinting models are given in

Cui and others.

Current models for genetic mapping characterize imprinting genes at the QTL level,

but lack an ability to estimate the imprinting effects of DNA sequence variants that

encode a complex trait. By altering the biological function of a protein that forms a

phenotype, single nucleotide polymorphisms (SNPs) residing within a coding sequence

open up a gateway to study the detailed genetic architecture of complex traits, such as

human diseases (Cooper and Psaty, 2003; Ron and Weller, 2007; The STAR Consortium, 2008).

However, current experimental techniques do not allow the separation of maternally- and

paternally-derived haplotypes (i.e., a linear combination of alleles at different SNPs on a

single chromosome) from observed genotypes. More recently, a battery of computational

models has been derived to estimate and test haplotype effects on a complex trait with

a random sample drawn from a natural population (Liu et al., 2004; Wu et al., 2007b).

These models implement the population genetic properties of gene segregation into a

unifying mixture-model framework for haplotype discovery. They define a so-called risk

haplotype, i.e., one that is different from the remaining haplotypes in terms of their

genetic effects on a trait.

The motivation of this chapter is to develop a statistical model for estimating the

imprinting effects of SNP-constructed haplotypes with a set of families randomly sampled

from a natural population. Each family is composed of both parents and offspring.

Because both parents are genotyped, imprinting effects due to different origins of the same

allele can be estimated from phenotypic data of the offspring. The model is validated by

computer simulation studies.

4.1 Imprinting Model

Let us first consider two SNPs, with two alleles A and a, and B and b, respectively,
genotyped at a genomic region. Without loss of generality, we assume one of the four
haplotypes to be the risk haplotype (denoted by R) and the other three to be the non-risk
haplotypes (denoted by r). Then, the combinations between the risk and non-risk
haplotypes, i.e. composite diplotypes, include RR, Rr, rR and rr. To reflect the parental
origin of haplotypes, the composite diplotypes are denoted as R|R, R|r,
r|R and r|r, where the vertical line separates the haplotype derived from the
mother (left) from that derived from the father (right). Genetic imprinting implies that the same composite
diplotype may function differently, depending on the parental origin of its underlying
haplotypes, which means $R|r \neq r|R$.

According to the traditional quantitative genetic principles introduced in chapter 3, the
genotypic values of composite diplotypes can be partitioned into different components of
additive and dominance genetic effects. In an imprinting model, we add one additional
parameter for the imprinting genetic effect, which is due to the different contributions of
haplotypes from the maternal and paternal parents. Compared with the mean model (3-2)-(3-4) in
chapter 3, the phenotypic means corresponding to the four possible composite diplotype
groups under imprinting can be expressed as

\[
\begin{aligned}
\mu_2 &= \mu + a \\
\mu_1 &= \mu + d + i \\
\mu_{1'} &= \mu + d - i \\
\mu_0 &= \mu - a
\end{aligned} \tag{4-1}
\]

where $\mu$, $a$, and $d$ are the overall mean, additive effect, and dominance effect explained in
chapter 3, and $i$ is the parameter for the imprinting effect. The size and sign of $i$
determine the extent and direction of the imprinting effect at the haplotype level.
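The mapping between the four composite diplotype means and the genetic effects in (4-1) is linear and can be inverted directly. The sketch below is illustrative only; the function names are hypothetical, and the assignment of the mean with +i to R|r (risk haplotype inherited from the mother) and of the mean with -i to r|R is an assumption read off the ordering of the composite diplotypes in the text.

```python
def composite_means(mu, a, d, i):
    """Means of the four composite diplotypes (maternal | paternal haplotype)."""
    return {
        "R|R": mu + a,
        "R|r": mu + d + i,
        "r|R": mu + d - i,
        "r|r": mu - a,
    }


def effects_from_means(means):
    """Solve the four mean equations for (mu, a, d, i)."""
    mu = (means["R|R"] + means["r|r"]) / 2
    a = (means["R|R"] - means["r|r"]) / 2
    d = (means["R|r"] + means["r|R"]) / 2 - mu
    i = (means["R|r"] - means["r|R"]) / 2
    return mu, a, d, i


if __name__ == "__main__":
    # The value of i is illustrative; mu, a and d follow the chapter 3 simulation.
    m = composite_means(mu=1.0, a=0.5, d=0.4, i=0.2)
    print(m)
    print(effects_from_means(m))
```

When i = 0 the two heterozygous means coincide and the model reduces to the non-imprinting model of chapter 3.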

4.2 Likelihood

A double-heterozygous genotype may have multiple diplotypes, whereas a heterozygous
composite diplotype may have two different configurations between which the same
haplotype is derived from different parents. For example, we assume AB to be the risk
haplotype throughout this chapter. For those offspring whose genotypes are AABb,
AaBB, or AaBb, the phenotypes will come from a mixture of multiple distributions
because of imprinting effects. Modified from Table 3-1 in Chapter 3, Table 4-1 describes
the imprinting effect.






































































































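Then, we construct a log-likelihood within a mixture-model framework, expressed as

\[
\begin{aligned}
\log L(\Omega_q \mid y, M_m, M_f, M_o, \Omega_p) = {} & \text{constant} \\
& + \sum_{l=1}^{n_{111}} \log f_2(y_{111l}) + \sum_{l=1}^{n_{121}} \log f_2(y_{121l}) + \sum_{l=1}^{n_{122}} \log f_1(y_{122l}) + \sum_{l=1}^{n_{132}} \log f_1(y_{132l}) \\
& + \cdots + \sum_{l=1}^{n_{184}} \log f_1(y_{184l}) + \sum_{l=1}^{n_{185}} \log f_1(y_{185l}) + \sum_{l=1}^{n_{195}} \log f_1(y_{195l}) \\
& + \sum_{l=1}^{n_{211}} \log f_2(y_{211l}) + \sum_{l=1}^{n_{212}} \log f_{1'}(y_{212l}) \\
& + \sum_{l=1}^{n_{221}} \log f_2(y_{221l}) + \sum_{l=1}^{n_{222}} \log\left[\varpi_{l|222} f_1(y_{222l}) + (1 - \varpi_{l|222}) f_{1'}(y_{222l})\right] \\
& + \sum_{l=1}^{n_{223}} \log f_0(y_{223l}) + \sum_{l=1}^{n_{232}} \log f_1(y_{232l}) + \sum_{l=1}^{n_{233}} \log f_0(y_{233l}) \\
& + \sum_{l=1}^{n_{241}} \log f_2(y_{241l}) + \sum_{l=1}^{n_{242}} \log f_{1'}(y_{242l}) + \sum_{l=1}^{n_{244}} \log f_1(y_{244l}) + \sum_{l=1}^{n_{245}} \log f_0(y_{245l}) \\
& + \cdots + \sum_{l=1}^{n_{551}} \log f_2(y_{551l}) + \sum_{l=1}^{n_{552}} \log\left[\varpi_{l|552} f_1(y_{552l}) + (1 - \varpi_{l|552}) f_{1'}(y_{552l})\right] \\
& + \sum_{l=1}^{n_{553}} \log f_0(y_{553l}) + \sum_{l=1}^{n_{554}} \log\left[\varpi_{l|554} f_1(y_{554l}) + (1 - \varpi_{l|554}) f_{1'}(y_{554l})\right] \\
& + \sum_{l=1}^{n_{555}} \log\left[\varpi^{(1)}_{l|555} f_1(y_{555l}) + \varpi^{(2)}_{l|555} f_{1'}(y_{555l}) + (1 - \varpi^{(1)}_{l|555} - \varpi^{(2)}_{l|555}) f_0(y_{555l})\right] \\
& + \sum_{l=1}^{n_{556}} \log f_0(y_{556l}) + \sum_{l=1}^{n_{557}} \log f_0(y_{557l}) + \sum_{l=1}^{n_{558}} \log f_0(y_{558l}) + \sum_{l=1}^{n_{559}} \log f_0(y_{559l}) \\
& + \cdots + \sum_{l=1}^{n_{999}} \log f_0(y_{999l})
\end{aligned} \tag{4-2}
\]

where $f_i$ ($i = 2, 1, 1', 0$) are assumed to be Gaussian probability density functions with
mean $\mu_i$ and variance $\sigma^2$. The relative frequencies $\varpi_{l|ijk}$ ($i, j = 1, \ldots, 9$; $k = 2, 4, 5$;
$l = 1, \ldots, n_{ijk}$) can be determined from the following tables, depending on the parents' genotypes.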









With the parents' genotypes available, we define the relative frequencies of a particular diplotype for a given genotype in the offspring generation by their genotype groups as follows. For group 2 (genotype AABb), its diplotype can be [AB]m[Ab]f or [Ab]m[AB]f. The relative frequency of being the first diplotype, given its genotype and the parents' genotypes, can be expressed as $\omega_{l|ij2}$. Similarly, we define $\omega_{l|ij4}$ for group 4 (genotype AaBB), and $\omega^{(1)}_{l|ij5}$ and $\omega^{(2)}_{l|ij5}$ for group 5 (genotype AaBb) as below, where $i, j = 1, \ldots, 9$ represent the mother's and father's genotype groups and $l = 1, \ldots, n_{ijk}$ represents the $l$th kid within each combination of offspring and parent genotypes.


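\[
\begin{aligned}
\omega_{l|ij2} &= P([AB]_m[Ab]_f \mid m = i, f = j, o = AABb) \\
1 - \omega_{l|ij2} &= P([Ab]_m[AB]_f \mid m = i, f = j, o = AABb) \\
\omega_{l|ij4} &= P([AB]_m[aB]_f \mid m = i, f = j, o = AaBB) \\
1 - \omega_{l|ij4} &= P([aB]_m[AB]_f \mid m = i, f = j, o = AaBB) \\
\omega^{(1)}_{l|ij5} &= P([AB]_m[ab]_f \mid m = i, f = j, o = AaBb) \\
\omega^{(2)}_{l|ij5} &= P([ab]_m[AB]_f \mid m = i, f = j, o = AaBb) \\
1 - \omega^{(1)}_{l|ij5} - \omega^{(2)}_{l|ij5} &= P([Ab]_m[aB]_f \text{ or } [aB]_m[Ab]_f \mid m = i, f = j, o = AaBb)
\end{aligned}
\tag{4-3}
\]

For simplicity, use 1 and 0 to denote the capital and lowercase letters, respectively. The following table describes the values of $\omega_{l|ij2}$: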










Mother

Father: 11/11 11/10 11/00 10/11 10/10 10/00 00/11 00/10 00/00

11/11 0 0 0 0 0 1 0 0 0

11/10 1 0 1 p" 0 0 0 0

11/00 1 1 0 1 1 0 0 0 0

10/11 0 0 0 0 0 0 0 0 0

10/10 1 pf 0 1 KI 0 0 0 0

10/00 1 1 0 1 1 0 0 0 0

00/11 0 0 0 0 0 0 0 0 0

00/10 0 0 0 0 0 0 0 0 0

00/00 0 0 0 0 0 0 0 0 0
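where

\[
\rho^f = \frac{\theta_f p^f_{11} p^f_{00} + (1 - \theta_f) p^f_{10} p^f_{01}}{p^f_{11} p^f_{00} + p^f_{10} p^f_{01}} \tag{4-4}
\]
\[
\rho^m = \frac{\theta_m p^m_{11} p^m_{00} + (1 - \theta_m) p^m_{10} p^m_{01}}{p^m_{11} p^m_{00} + p^m_{10} p^m_{01}} \tag{4-5}
\]
\[
\kappa_1 = \frac{d_1}{d_1 + d_2} \tag{4-6}
\]
\[
\kappa_2 = \frac{d_3}{d_3 + d_4} \tag{4-7}
\]

where

\[
d_1 = \left[\theta_f p^f_{11} p^f_{00} + (1 - \theta_f) p^f_{10} p^f_{01}\right]\left[(1 - \theta_m) p^m_{11} p^m_{00} + \theta_m p^m_{10} p^m_{01}\right] \tag{4-8}
\]
\[
d_2 = \left[\theta_m p^m_{11} p^m_{00} + (1 - \theta_m) p^m_{10} p^m_{01}\right]\left[(1 - \theta_f) p^f_{11} p^f_{00} + \theta_f p^f_{10} p^f_{01}\right] \tag{4-9}
\]
\[
d_3 = \left[(1 - \theta_f) p^f_{11} p^f_{00} + \theta_f p^f_{10} p^f_{01}\right]\left[(1 - \theta_m) p^m_{11} p^m_{00} + \theta_m p^m_{10} p^m_{01}\right] \tag{4-10}
\]
\[
d_4 = \left[\theta_m p^m_{11} p^m_{00} + (1 - \theta_m) p^m_{10} p^m_{01}\right]\left[\theta_f p^f_{11} p^f_{00} + (1 - \theta_f) p^f_{10} p^f_{01}\right] \tag{4-11}
\]

Similarly, $\omega_{l|ij4}$ are expressed as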










Mother

Father 11/11 11/10 11/00 10/11 10/10 10/00 00/11 00/10 00/00

11/11 0 0 0 0 0 0 0 0 0

11/10 0 0 0 0 0 0 0 0 0

11/00 0 0 0 0 0 0 0 0 0

10/11 1 1 0 1 pm 0 0 0 0

10/10 1 1 0 pf KI 0 0 0 0

10/00 0 0 0 0 0 0 0 0 0

00/11 1 1 0 1 1 0 0 0 0

00/10 1 1 0 1 1 0 0 0 0

00/00 0 0 0 0 0 0 0 0 0

$\omega^{(1)}_{l|ij5}$ are expressed as

Mother

Father 11/11 11/10 11/00 10/11 10/10 10/00 00/11 00/10 00/00

11/11 0 0 0 0 0 0 0 0 0

11/10 0 0 0 0 0 0 0 0 0

11/00 0 0 0 0 0 0 0 0 0

10/11 0 0 0 0 0 0 0 0 0

10/10 1 1 pf 0 1 -p Kp 2 0 0 0 0

10/00 1 1 0 1 pm 0 0 0 0

00/11 0 0 0 0 0 0 0 0 0

00/10 1 0- p m 0 0 0 0

00/00 1 1 0 1 1 0 0 0 0
and $\omega^{(2)}_{l|ij5}$ are expressed as










Mother

Father 11/11 11/10 11/00 10/11 10/10 10/00 00/11 00/10 00/00

11/11 0 0 0 0 1 1 0 1 0

11/10 0 0 0 0 1 pm 1 0 1

11/00 0 0 0 0 0 0 0 0 1

10/11 0 0 0 0 1 p 0 1 0

10/10 0 0 0 0 K2 pf 0 1 pf 1

10/00 0 0 0 0 0 0 0 0 1

00/11 0 0 0 0 0 0 0 0 0

00/10 0 0 0 0 0 0 0 0 0

00/00 0 0 0 0 0 0 0 0 0

4.3 Estimation


We implement the EM algorithm to estimate the genotypic means and residual variance by maximizing the log-likelihood (4-2). The relative frequencies used in the E step were described in (4-4) in the above section. In the M step, we derive the closed forms of the genotypic means and residual variance:


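\[
\begin{aligned}
\mu_2 &= \frac{\sum_{i,j=1}^{9} \sum_{l=1}^{n_{ij1}} y_{ij1l}}{\sum_{i,j=1}^{9} n_{ij1}} \\[4pt]
\mu_1 &= \frac{\sum_{i,j=1}^{9} \sum_{k=2,4} \sum_{l=1}^{n_{ijk}} \Pi_{l|ijk}\, y_{ijkl} + \sum_{i,j=1}^{9} \sum_{l=1}^{n_{ij5}} \Pi^{(1)}_{l|ij5}\, y_{ij5l}}{\sum_{i,j=1}^{9} \sum_{k=2,4} \sum_{l=1}^{n_{ijk}} \Pi_{l|ijk} + \sum_{i,j=1}^{9} \sum_{l=1}^{n_{ij5}} \Pi^{(1)}_{l|ij5}} \\[4pt]
\mu_{1'} &= \frac{\sum_{i,j=1}^{9} \sum_{k=2,4} \sum_{l=1}^{n_{ijk}} (1 - \Pi_{l|ijk})\, y_{ijkl} + \sum_{i,j=1}^{9} \sum_{l=1}^{n_{ij5}} \Pi^{(2)}_{l|ij5}\, y_{ij5l}}{\sum_{i,j=1}^{9} \sum_{k=2,4} \sum_{l=1}^{n_{ijk}} (1 - \Pi_{l|ijk}) + \sum_{i,j=1}^{9} \sum_{l=1}^{n_{ij5}} \Pi^{(2)}_{l|ij5}} \\[4pt]
\mu_0 &= \frac{\sum_{i,j=1}^{9} \sum_{k=3,6,7,8,9} \sum_{l=1}^{n_{ijk}} y_{ijkl} + \sum_{i,j=1}^{9} \sum_{l=1}^{n_{ij5}} (1 - \Pi^{(1)}_{l|ij5} - \Pi^{(2)}_{l|ij5})\, y_{ij5l}}{\sum_{i,j=1}^{9} \sum_{k=3,6,7,8,9} n_{ijk} + \sum_{i,j=1}^{9} \sum_{l=1}^{n_{ij5}} (1 - \Pi^{(1)}_{l|ij5} - \Pi^{(2)}_{l|ij5})} \\[4pt]
\sigma^2 &= \frac{1}{\sum_{k=1}^{9} \sum_{i,j=1}^{9} n_{ijk}} \Bigg\{ \sum_{i,j=1}^{9} \sum_{l=1}^{n_{ij1}} (y_{ij1l} - \mu_2)^2 \\
&\qquad + \sum_{i,j=1}^{9} \sum_{k=2,4} \sum_{l=1}^{n_{ijk}} \left[\Pi_{l|ijk} (y_{ijkl} - \mu_1)^2 + (1 - \Pi_{l|ijk}) (y_{ijkl} - \mu_{1'})^2\right] \\
&\qquad + \sum_{i,j=1}^{9} \sum_{l=1}^{n_{ij5}} \left[\Pi^{(1)}_{l|ij5} (y_{ij5l} - \mu_1)^2 + \Pi^{(2)}_{l|ij5} (y_{ij5l} - \mu_{1'})^2 + (1 - \Pi^{(1)}_{l|ij5} - \Pi^{(2)}_{l|ij5}) (y_{ij5l} - \mu_0)^2\right] \\
&\qquad + \sum_{i,j=1}^{9} \sum_{k=3,6,7,8,9} \sum_{l=1}^{n_{ijk}} (y_{ijkl} - \mu_0)^2 \Bigg\}
\end{aligned}
\tag{4-12}
\]

where

\[
\Pi_{l|ijk} = \frac{\omega_{l|ijk} f_1(y_{ijkl})}{\omega_{l|ijk} f_1(y_{ijkl}) + (1 - \omega_{l|ijk}) f_{1'}(y_{ijkl})}, \qquad i, j = 1, \ldots, 9; \; k = 2, 4,
\]
\[
\Pi^{(1)}_{l|ij5} = \frac{\omega^{(1)}_{l|ij5} f_1(y_{ij5l})}{\omega^{(1)}_{l|ij5} f_1(y_{ij5l}) + \omega^{(2)}_{l|ij5} f_{1'}(y_{ij5l}) + (1 - \omega^{(1)}_{l|ij5} - \omega^{(2)}_{l|ij5}) f_0(y_{ij5l})},
\]
\[
\Pi^{(2)}_{l|ij5} = \frac{\omega^{(2)}_{l|ij5} f_{1'}(y_{ij5l})}{\omega^{(1)}_{l|ij5} f_1(y_{ij5l}) + \omega^{(2)}_{l|ij5} f_{1'}(y_{ij5l}) + (1 - \omega^{(1)}_{l|ij5} - \omega^{(2)}_{l|ij5}) f_0(y_{ij5l})},
\]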

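and $l = 1, \ldots, n_{ijk}$. Then the overall mean, the additive effect, the dominance effect, and the imprinting genetic effect can be solved from (4-1):

\[
\begin{aligned}
\mu &= \tfrac{1}{2}(\mu_2 + \mu_0) \\
i &= \tfrac{1}{2}(\mu_1 - \mu_{1'}) \\
d &= \tfrac{1}{2}(\mu_1 + \mu_{1'}) - \tfrac{1}{2}(\mu_2 + \mu_0) \\
a &= \tfrac{1}{2}(\mu_2 - \mu_0)
\end{aligned}
\]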
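The E and M steps described above can be sketched in a few lines of Python. This is only a minimal illustration, not the dissertation's implementation: it assumes each child comes with a known vector of prior weights over the four components (f2, f1, f1', f0), standing in for the relative frequencies determined by the offspring and parental genotypes. The function names `normal_pdf`, `em_step`, and `fit_em` and the data layout are hypothetical.

```python
import math


def normal_pdf(y, mean, var):
    """Gaussian density with the given mean and variance."""
    return math.exp(-((y - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_step(data, means, var):
    """One EM iteration for a Gaussian mixture with known per-child priors.

    data  : list of (y, prior) pairs; prior holds the prior weights of the
            components (f2, f1, f1', f0) for that child (assumed layout).
    means : current component means; var : current common residual variance.
    """
    k = len(means)
    posteriors = []
    for y, prior in data:
        # E step: posterior weight of each component given the phenotype y.
        joint = [prior[c] * normal_pdf(y, means[c], var) for c in range(k)]
        total = sum(joint)
        posteriors.append([j / total for j in joint])

    # M step: posterior-weighted means, then the pooled residual variance.
    new_means = []
    for c in range(k):
        weight = sum(post[c] for post in posteriors)
        if weight > 0.0:
            weighted_sum = sum(post[c] * y for (y, _), post in zip(data, posteriors))
            new_means.append(weighted_sum / weight)
        else:
            new_means.append(means[c])  # no child informs this component
    new_var = sum(
        post[c] * (y - new_means[c]) ** 2
        for (y, _), post in zip(data, posteriors)
        for c in range(k)
    ) / len(data)
    return new_means, new_var


def fit_em(data, means, var, max_iter=500, tol=1e-10):
    """Iterate em_step until the parameter change falls below tol."""
    for _ in range(max_iter):
        new_means, new_var = em_step(data, means, var)
        change = max(abs(a - b) for a, b in zip(new_means, means)) + abs(new_var - var)
        means, var = new_means, new_var
        if change < tol:
            break
    return means, var
```

When every prior vector is degenerate (the diplotype is known with certainty), the update reduces to the ordinary group means and pooled variance, which gives a convenient sanity check.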



4.4 Hypothesis Testing

For a given data set, the first step is to test whether there is any QTN effect. This can be done by testing the following hypothesis:

\[
\begin{aligned}
H_0 &: \mu_2 = \mu_1 = \mu_{1'} = \mu_0 \equiv \mu \\
H_1 &: \text{at least one equality in } H_0 \text{ does not hold.}
\end{aligned}
\tag{4-13}
\]

The log-likelihood ratio (LR) can then be calculated by substituting the estimated parameters under the full model and the reduced model. The asymptotic $\chi^2$-distribution with 3 df can be used to determine the significance. Furthermore, if a significant QTN effect is detected, a few more tests can be performed to see whether the additive, dominance and imprinting effects triggered by the QTN exist. For example, under the biallelic model, the hypothesis for testing the imprinting effect is

\[
\begin{aligned}
H_0 &: i = 0 \\
H_1 &: i \neq 0
\end{aligned}
\tag{4-14}
\]

The parameter estimates under $H_0$ can be obtained using the EM algorithm from Chapter 3. The log-likelihood ratio test statistic for each hypothesis asymptotically follows a $\chi^2$ distribution with degrees of freedom equal to the difference in the number of parameters between the null and alternative hypotheses.
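As a rough illustration of the likelihood-ratio procedure just described, the sketch below converts two maximized log-likelihoods into an LR statistic and an asymptotic p-value. It is a hedged example only: the function names `chi2_sf` and `lr_test` are hypothetical, and the chi-square tail probability is computed with a standard recurrence on the regularized incomplete gamma function so that only the Python standard library is needed.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    if df % 2 == 1:
        q, a = math.erfc(math.sqrt(half)), 0.5   # df = 1 base case
    else:
        q, a = math.exp(-half), 1.0              # df = 2 base case
    # Recurrence Q(a + 1, z) = Q(a, z) + z^a e^{-z} / Gamma(a + 1), with z = x / 2.
    while 2.0 * a < df - 1e-9:
        q += math.exp(a * math.log(half) - half - math.lgamma(a + 1.0))
        a += 1.0
    return q


def lr_test(loglik_null, loglik_full, df):
    """Return the LR statistic and its asymptotic chi-square p-value."""
    lr = -2.0 * (loglik_null - loglik_full)
    return lr, chi2_sf(lr, df)
```

For the overall QTN test (4-13) one would call `lr_test` with `df=3`, and for the imprinting test (4-14) with `df=1`, following the degrees of freedom stated in the text.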

4.5 Simulation

Under the same settings as in Chapter 3, with an imprinting effect added, Monte Carlo simulation was conducted to generate the genotype and phenotype data for the specified family size for both the two-point and three-point models. Tables 4-2 and 4-3 describe the estimation results for both population and quantitative genetic parameters of the proposed two-point and three-point models. A total of 1000 simulation replicates were performed to calculate the mean and standard deviation of the MLEs for each parameter. Again, all parameters can be reasonably estimated in general. As expected, the accuracy and precision of population parameter estimation do not rely on the heritability value $H^2$, but the accuracy and precision of estimation for quantitative parameters, including the imprinting parameter, increase with sample size and $H^2$.

4.6 Discussion

Different expression of maternally and paternally inherited alleles at certain genes is called genetic imprinting (Cheverud et al., 2008; Reik and Walter, 2001; Wilkins and Haig, 2003; Wood and Oakey, 2006). Despite its great importance in trait formation and

development, it remains unclear how genetic imprinting operates in a complex network

of interactive genes located across the genome. Genetic mapping has proven powerful to

estimate the distribution and effects of imprinted genes. While traditional mapping models

attempt to detect imprinted quantitative trait loci based on a linkage map constructed

from molecular markers, we developed a statistical model for estimating the imprinting

effects of haplotypes composed of multiple sequenced SNPs. The new model can provide

the characterization of the difference in the effect of maternally- and paternally-derived

haplotypes, which can be used as a tool for genetic association studies at the candidate

gene or genome-wide level.









Table 4-2. MLEs (± standard deviations) of quantitative genetic parameters under the
two-point model with imprinting effect

                            Quantitative Parameters
Family  Kids  D       μ (1.00)       a (0.5)        d (0.4)        i (0.1)
200     5     0.190   0.988±0.051    0.481±0.054    0.409±0.068    0.101±0.058
200     5     0.100   0.991±0.021    0.495±0.023    0.405±0.029    0.092±0.024
200     5     0.020   1.003±0.018    0.497±0.020    0.394±0.026    0.102±0.022
1000    1     0.190   0.995±0.053    0.485±0.053    0.400±0.063    0.083±0.059
1000    1     0.100   0.990±0.019    0.495±0.019    0.409±0.029    0.094±0.022
1000    1     0.020   1.000±0.018    0.501±0.019    0.402±0.027    0.098±0.019




























Table 4-3. MLEs (standard deviations) of population and quantitative genetic
parameters under the three-point model with imprinting effect based on the
1000 x 1 design strategy

Population Genetics Parameters

True θ_AC      p         q         r         D_BC      D_AC      D_AB      D_ABC     θ_AB      θ_BC      θ_AC
True Value     0.6       0.5       0.4       0.02      0.06      0.10      0.01      0.05      0.05
0.100 (I=1)    0.6002    0.4991    0.3998    0.0193    0.0598    0.1001    0.0098    0.0532    0.0885    0.1004
               (0.0068)  (0.0087)  (0.0078)  (0.0058)  (0.0048)  (0.0046)  (0.0024)  (0.0269)  (0.0262)  (0.0392)
0.095 (I=0)    0.5998    0.4996    0.3997    0.0193    0.0593    0.1000    0.0103    0.0529    0.0804    0.0968
               (0.0085)  (0.0075)  (0.0078)  (0.0050)  (0.0046)  (0.0044)  (0.0029)  (0.0210)  (0.0591)  (0.0426)
0.085 (I=-2)   0.5995    0.5000    0.3999    0.0195    0.0603    0.1000    0.0098    0.0529    0.0769    0.0859
               (0.0075)  (0.0075)  (0.0081)  (0.0059)  (0.0046)  (0.0039)  (0.0027)  (0.0229)  (0.0467)  (0.0405)

Quantitative Genetics Parameters with Imprinting Effect

True θ_AC      H2     μ (true value 1)   a         d         i
0.100 (I=1)    0.1    1.0064             0.4940    0.3969    0.1061
                      (0.0503)           (0.0494)  (0.0640)  (0.0373)
               0.4    1.0004             0.4997    0.4004    0.0982
                      (0.0205)           (0.0201)  (0.0256)  (0.0160)
0.095 (I=0)    0.1    1.0040             0.4985    0.3974    0.0990
                      (0.0491)           (0.0487)  (0.0590)  (0.0403)
               0.4    0.9948             0.5038    0.4067    0.1028
                      (0.0191)           (0.0187)  (0.0236)  (0.0158)
0.085 (I=-2)   0.1    1.0074             0.4937    0.3901    0.1041
                      (0.0549)           (0.0535)  (0.0674)  (0.0399)
               0.4    1.0022             0.5001    0.3984    0.1000
                      (0.0204)           (0.0207)  (0.0241)  (0.0144)









CHAPTER 5
ZYGOTIC DISEQUILIBRIUM MAPPING WITH FAMILY DATA

Linkage disequilibrium has been used as a fundamental concept for studying the pattern

of genetic diversity in a natural population and fine mapping the genetic control of

complex traits. The theoretical basis of linkage disequilibrium analysis is derived from the

assumption that the population under study is at Hardy-Weinberg equilibrium (HWE),

in which individuals are randomly mating to produce next generations. In such an HWE

population, the nonrandom associations of alleles at different loci only occur within

gametes rather than between gametes. Because such a gametic linkage disequilibrium

decays in relation to genetic distance with generation, the evolutionary history of a

population can be inferred by plotting the relationship between these two parameters.

In genetic mapping studies, the information provided by this plot helps to determine the

optimal number of molecular markers for high-resolution mapping of genes in an organism.

For a given population, the random-mating assumption may be violated by many evolutionary forces such as selection, mutation, genetic drift, and population structure. For a nonequilibrium population at Hardy-Weinberg disequilibrium (HWD), zygotic disequilibria, which can characterize non-random associations at both gametic and zygotic levels (Weir and Ott, 1996), may be more relevant. Earlier studies have documented possible genetic and evolutionary causes for zygotic associations in a nonequilibrium population (Barton and Gale, 1993; Bennett and Binet, 1956; Charlesworth, 1991; Haldane, 1949, pp. 13-45). Weir and Ott (1996) described five different types of disequilibria: (1) Hardy-Weinberg disequilibria

at each locus, (2) gametic disequilibrium (including two alleles in the same gamete, each

from a different locus), (3) nongametic disequilibrium (including two alleles in different

gametes, each from a different locus), (4) trigenic disequilibrium (including a zygote at one

locus and an allele at the other), and (5) quadrigenic disequilibrium (including two zygotes

each from a different locus). Because it is impossible to estimate all the five disequilibrium









parameters due to inadequate degrees of freedom, Weir and Ott (1996) collapsed gametic

and non-gametic disequilibria to estimate a so-called composite gametic disequilibrium.

More recently, Liu et al. (2006) used Weir's approach to estimate zygotic disequilibria in

a canine population and gain a better insight into the structure and organization of the

canine genome.

Although it is needed for estimating disequilibrium parameters, Weir's treatment leads to a significant loss of information. Gametic and non-gametic disequilibria cannot be separated because there is insufficient information to distinguish the two diplotypes that form the same genotype of the double heterozygote. In this chapter, I will show that a family-based design can distinguish these two diplotypes by tracing the co-transmission of nonalleles at different genes from parents to their offspring. I will develop a statistical model for estimating a full set of disequilibria with a panel of nuclear families. A real example from the genetic study of Crohn's disease will be used to demonstrate the usefulness of the new model.

5.1 Zygotic Disequilibrium

5.1.1 Gamete and Non-gamete Frequencies

Consider two SNP markers A (with two alleles A and a) and B (with two alleles B and b). Let $p_A$ and $p_a$ ($p_A + p_a = 1$) as well as $p_B$ and $p_b$ ($p_B + p_b = 1$) be the corresponding allele frequencies. At each of the two SNPs, there are three distinguishable genotypes, i.e., AA, Aa and aa for marker A and BB, Bb and bb for marker B. The two markers form 10 genotypic configurations or diplotypes, but only 9 can be genetically distinguished from each other. This is because genotypic configurations [AB][ab] and [Ab][aB] have the same genotype AaBb.

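Let $P$, subscripted and superscripted by the genotype notation, be the genotypic configuration frequencies, which are individually tabulated in Table 5-1.

Table 5-1. Frequencies and observations of marker genotypes

                                    Marker B
Marker A   BB (2)                  Bb (1)                        bb (0)                  Total
AA (2)     P^{BB}_{AA}             P^{Bb}_{AA}                   P^{bb}_{AA}             P_{AA} = p_A^2 + D_A
Aa (1)     P^{BB}_{Aa}             P^{Bb}_{Aa} + P^{bB}_{Aa}     P^{bb}_{Aa}             P_{Aa} = 2 p_A p_a - 2 D_A
aa (0)     P^{BB}_{aa}             P^{Bb}_{aa}                   P^{bb}_{aa}             P_{aa} = p_a^2 + D_A
Total      P_{BB} = p_B^2 + D_B    P_{Bb} = 2 p_B p_b - 2 D_B    P_{bb} = p_b^2 + D_B    1

Note: Genotype AaBb contains two different configurations or diplotypes, [AB][ab] and [Ab][aB].

It is not difficult to estimate one-marker genotype frequencies from two-marker genotypic configuration frequencies by

\[
\begin{aligned}
P_{AA} &= P^{BB}_{AA} + P^{Bb}_{AA} + P^{bb}_{AA} \\
P_{Aa} &= P^{BB}_{Aa} + P^{Bb}_{Aa} + P^{bB}_{Aa} + P^{bb}_{Aa} \\
P_{aa} &= P^{BB}_{aa} + P^{Bb}_{aa} + P^{bb}_{aa}
\end{aligned}
\tag{5-1}
\]

for marker A,

\[
\begin{aligned}
P_{BB} &= P^{BB}_{AA} + P^{BB}_{Aa} + P^{BB}_{aa} \\
P_{Bb} &= P^{Bb}_{AA} + P^{Bb}_{Aa} + P^{bB}_{Aa} + P^{Bb}_{aa} \\
P_{bb} &= P^{bb}_{AA} + P^{bb}_{Aa} + P^{bb}_{aa}
\end{aligned}
\tag{5-2}
\]

for marker B, and estimate the allele frequencies from the one-marker genotype frequencies by

\[
\begin{aligned}
p_A &= P_{AA} + \tfrac{1}{2} P_{Aa} \\
p_a &= P_{aa} + \tfrac{1}{2} P_{Aa} \\
p_B &= P_{BB} + \tfrac{1}{2} P_{Bb} \\
p_b &= P_{bb} + \tfrac{1}{2} P_{Bb}.
\end{aligned}
\tag{5-3}
\]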

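The two markers form four gametes, AB, Ab, aB and ab, whose frequencies can be estimated from genotypic configuration frequencies by

\[
\begin{aligned}
p_{AB} &= P^{BB}_{AA} + \tfrac{1}{2}\left(P^{Bb}_{AA} + P^{BB}_{Aa} + P^{Bb}_{Aa}\right) \\
p_{Ab} &= P^{bb}_{AA} + \tfrac{1}{2}\left(P^{Bb}_{AA} + P^{bb}_{Aa} + P^{bB}_{Aa}\right) \\
p_{aB} &= P^{BB}_{aa} + \tfrac{1}{2}\left(P^{BB}_{Aa} + P^{Bb}_{aa} + P^{bB}_{Aa}\right) \\
p_{ab} &= P^{bb}_{aa} + \tfrac{1}{2}\left(P^{bb}_{Aa} + P^{Bb}_{aa} + P^{Bb}_{Aa}\right)
\end{aligned}
\tag{5-4}
\]

Similarly, the frequencies of nonalleles from different gametes can be estimated by

\[
\begin{aligned}
p_{A/B} &= P^{BB}_{AA} + \tfrac{1}{2}\left(P^{Bb}_{AA} + P^{BB}_{Aa} + P^{bB}_{Aa}\right) \\
p_{A/b} &= P^{bb}_{AA} + \tfrac{1}{2}\left(P^{Bb}_{AA} + P^{bb}_{Aa} + P^{Bb}_{Aa}\right) \\
p_{a/B} &= P^{BB}_{aa} + \tfrac{1}{2}\left(P^{BB}_{Aa} + P^{Bb}_{aa} + P^{Bb}_{Aa}\right) \\
p_{a/b} &= P^{bb}_{aa} + \tfrac{1}{2}\left(P^{bb}_{Aa} + P^{Bb}_{aa} + P^{bB}_{Aa}\right)
\end{aligned}
\tag{5-5}
\]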









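The frequencies of triple alleles from different markers are estimated as

\[
\begin{aligned}
P^{B}_{AA} &= P^{BB}_{AA} + \tfrac{1}{2} P^{Bb}_{AA}, &
P^{b}_{AA} &= P^{bb}_{AA} + \tfrac{1}{2} P^{Bb}_{AA} \\
P^{B}_{Aa} &= P^{BB}_{Aa} + \tfrac{1}{2}\left(P^{Bb}_{Aa} + P^{bB}_{Aa}\right), &
P^{b}_{Aa} &= P^{bb}_{Aa} + \tfrac{1}{2}\left(P^{Bb}_{Aa} + P^{bB}_{Aa}\right) \\
P^{B}_{aa} &= P^{BB}_{aa} + \tfrac{1}{2} P^{Bb}_{aa}, &
P^{b}_{aa} &= P^{bb}_{aa} + \tfrac{1}{2} P^{Bb}_{aa} \\
P^{BB}_{A} &= P^{BB}_{AA} + \tfrac{1}{2} P^{BB}_{Aa}, &
P^{BB}_{a} &= P^{BB}_{aa} + \tfrac{1}{2} P^{BB}_{Aa} \\
P^{Bb}_{A} &= P^{Bb}_{AA} + \tfrac{1}{2}\left(P^{Bb}_{Aa} + P^{bB}_{Aa}\right), &
P^{Bb}_{a} &= P^{Bb}_{aa} + \tfrac{1}{2}\left(P^{Bb}_{Aa} + P^{bB}_{Aa}\right) \\
P^{bb}_{A} &= P^{bb}_{AA} + \tfrac{1}{2} P^{bb}_{Aa}, &
P^{bb}_{a} &= P^{bb}_{aa} + \tfrac{1}{2} P^{bb}_{Aa}
\end{aligned}
\tag{5-6}
\]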


5.1.2 Complete Disequilibrium Parameters

The zygotic disequilibrium is defined as the deviation of two-locus genotype

frequencies from products of single-locus genotype frequencies and, thus, is composed

of all nonallelic genic disequilibria at the two loci (Weir and Ott, 1996). Assume that the

population considered above is at HWD. This population thus has no desirable property of

an equilibrium population, such as independence of different allele frequencies at the same

locus (Lynch and Walsh, 1998). The HWD attempts to test for two alleles at the same

locus, but on different gametes, whereas (gametic) linkage disequilibrium describes two

alleles on the same gametes, but at different loci. For the zygotic disequilibrium, however,

there is a third test, i.e., two alleles on different gametes and at different loci.

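Since the population is not in HWE, two alleles at each marker are not independent, with the coefficients of Hardy-Weinberg disequilibrium defined as

\[
\begin{aligned}
D_A &= P_{AA} - p_A^2 \\
&= -\tfrac{1}{2} P_{Aa} + p_A p_a \\
&= P_{aa} - p_a^2
\end{aligned}
\tag{5-7}
\]

for marker A and

\[
\begin{aligned}
D_B &= P_{BB} - p_B^2 \\
&= -\tfrac{1}{2} P_{Bb} + p_B p_b \\
&= P_{bb} - p_b^2
\end{aligned}
\tag{5-8}
\]

for marker B, respectively. The coefficient of digenic gametic linkage disequilibrium between the two markers is defined as

\[
\begin{aligned}
D_{ab} &= p_{AB} - p_A p_B \\
&= -p_{Ab} + p_A p_b \\
&= -p_{aB} + p_a p_B \\
&= p_{ab} - p_a p_b.
\end{aligned}
\tag{5-9}
\]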

For the nonequilibrium population, digenic linkage disequilibrium also occurs between nonalleles on different gametes, defined as

\[
\begin{aligned}
D_{a/b} &= p_{A/B} - p_A p_B \\
&= -p_{A/b} + p_A p_b \\
&= -p_{a/B} + p_a p_B \\
&= p_{a/b} - p_a p_b.
\end{aligned}
\tag{5-10}
\]
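The definitions of the Hardy-Weinberg, gametic, and nongametic disequilibria above can be checked numerically. The following sketch assumes the ten genotypic configuration frequencies are already available in a dictionary; the key labels (for example "AB|ab" and "Ab|aB" for the two double-heterozygote diplotypes) and the function name `digenic_disequilibria` are hypothetical choices made for this illustration.

```python
def digenic_disequilibria(P):
    """Compute allele frequencies, HWD coefficients, and the gametic and
    nongametic digenic disequilibria from ten configuration frequencies.

    P is a dict keyed by assumed labels: the nine genotypes, with the double
    heterozygote AaBb split into the diplotypes "AB|ab" and "Ab|aB".
    """
    # One-marker genotype frequencies (margins of the two-marker table).
    P_AA = P["AABB"] + P["AABb"] + P["AAbb"]
    P_Aa = P["AaBB"] + P["AB|ab"] + P["Ab|aB"] + P["Aabb"]
    P_BB = P["AABB"] + P["AaBB"] + P["aaBB"]
    P_Bb = P["AABb"] + P["AB|ab"] + P["Ab|aB"] + P["aaBb"]

    # Allele frequencies.
    p_A = P_AA + 0.5 * P_Aa
    p_B = P_BB + 0.5 * P_Bb

    # A and B on the same gamete: the coupling diplotype AB|ab contributes.
    p_AB = P["AABB"] + 0.5 * (P["AABb"] + P["AaBB"] + P["AB|ab"])
    # A and B on different gametes: the repulsion diplotype Ab|aB contributes.
    p_A_B = P["AABB"] + 0.5 * (P["AABb"] + P["AaBB"] + P["Ab|aB"])

    return {
        "p_A": p_A,
        "p_B": p_B,
        "D_A": P_AA - p_A ** 2,
        "D_B": P_BB - p_B ** 2,
        "D_ab": p_AB - p_A * p_B,
        "D_a/b": p_A_B - p_A * p_B,
    }
```

In a population formed by random union of gametes, the Hardy-Weinberg and nongametic coefficients returned by this function vanish and only the gametic disequilibrium remains, which is the special case assumed by conventional linkage disequilibrium analysis.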









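The trigenic disequilibrium between two alleles from marker A and one allele from marker B is defined as

\[
\begin{aligned}
D_{Ab} &= P^{B}_{AA} - p_A D_{ab} - p_A D_{a/b} - p_B D_A - p_A^2 p_B \\
&= -P^{b}_{AA} - p_A D_{ab} - p_A D_{a/b} + p_b D_A + p_A^2 p_b \\
&= -\tfrac{1}{2} P^{B}_{Aa} - \tfrac{1}{2}(p_A - p_a) D_{ab} - \tfrac{1}{2}(p_A - p_a) D_{a/b} - p_B D_A + p_A p_a p_B \\
&= \tfrac{1}{2} P^{b}_{Aa} - \tfrac{1}{2}(p_A - p_a) D_{ab} - \tfrac{1}{2}(p_A - p_a) D_{a/b} + p_b D_A - p_A p_a p_b \\
&= P^{B}_{aa} + p_a D_{ab} + p_a D_{a/b} - p_B D_A - p_a^2 p_B \\
&= -P^{b}_{aa} + p_a D_{ab} + p_a D_{a/b} + p_b D_A + p_a^2 p_b.
\end{aligned}
\tag{5-11}
\]

The trigenic disequilibrium between one allele from marker A and two alleles from marker B is defined as

\[
\begin{aligned}
D_{aB} &= P^{BB}_{A} - p_B D_{ab} - p_B D_{a/b} - p_A D_B - p_A p_B^2 \\
&= -P^{BB}_{a} - p_B D_{ab} - p_B D_{a/b} + p_a D_B + p_a p_B^2 \\
&= -\tfrac{1}{2} P^{Bb}_{A} - \tfrac{1}{2}(p_B - p_b) D_{ab} - \tfrac{1}{2}(p_B - p_b) D_{a/b} - p_A D_B + p_A p_B p_b \\
&= \tfrac{1}{2} P^{Bb}_{a} - \tfrac{1}{2}(p_B - p_b) D_{ab} - \tfrac{1}{2}(p_B - p_b) D_{a/b} + p_a D_B - p_a p_B p_b \\
&= P^{bb}_{A} + p_b D_{ab} + p_b D_{a/b} - p_A D_B - p_A p_b^2 \\
&= -P^{bb}_{a} + p_b D_{ab} + p_b D_{a/b} + p_a D_B + p_a p_b^2.
\end{aligned}
\tag{5-12}
\]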

With genotypic configuration frequencies, allele frequencies, HWD, gametic and nongametic disequilibria, and trigenic disequilibria, we can estimate the quadrigenic disequilibrium ($D_{AB}$) between two alleles from marker A and two alleles from marker B using the formulas given in Table 5-2 (see Weir, 1996). Note that we use lower and upper cases to denote gametic and zygotic disequilibria, respectively. From Table 5-2, we can see that each of the genotypic configuration frequencies can be expressed in terms of the allele frequencies ($p_A$, $p_a$ and $p_B$, $p_b$), the HWD coefficients ($D_A$ and $D_B$), and the gametic ($D_{ab}$) and nongametic disequilibria of different orders ($D_{a/b}$, $D_{Ab}$, $D_{aB}$ and $D_{AB}$).









Table 5-2. Expressions of quadrigenic disequilibrium DAB in terms of genotypic
configuration frequencies, allele frequencies and lower-order disequilibrium
coefficients

Fre- DADB+
quency 1 Db + D b DA DB Dab Da/b DAb DaB
PABAB -P -1 -p -PA -2PAPB -2pAPB -2pB -2pA
B PBb 2 -1 PBPb -PA -PAPB + PAPb -PAPB + PAPb -PB + Pb -2pA
PbA -PAP -1 -pb -PA 2pAPb -2pAPb -2pb -2pA
PABB PAPaB -1 PAPa -PAPB + PaPB -PAPB + PaPB -2PB -PA + Pa
PA PAPaPBPb -1 PBPb PAPa -PAPB PaPb PAPb + PaPB -PB + Pb -PA + Pa
P -PAPaPBPb -1 PBPb PAPa PAPb + PaPB -PAPB PaPb -PB + Pb -PA + Pa
P PAPBP -1 PAPa PAPb PaPb PAPb PaPb 2Pb -PA + Pa
Pa -PPB -1 -p 2PaPB 2PaPB -2PB 2Pa
S PBb PBPb -1 PBPb -P PaPB PaPb PaPB PaPb -PB + Pb 2pa
Pbb -P2 -1 -Pb -pa -2PaPb -2PaPb 2Pb 2Pa


5.2 Model for Estimation

Based on the above description, it can be seen that the estimation of all disequilibrium parameters relies purely on the separation of the two diplotypes underlying the double heterozygote. In this section, I will show that these two diplotypes can be separated by using a family design. Table 5-3 gives mating frequencies and genotype frequencies of offspring in each family. If one parent is a double heterozygote AaBb, then any genotype generated in the offspring will include a mixture of genotypes derived from gametes of its two underlying diplotypes (see Table 5-3). The proportions of mixture components are determined by two parameters, the recombination fraction ($\theta$) between the two markers and the relative proportions ($\omega$) of the two underlying diplotypes for the double heterozygote. As seen from Table 5-3, these two parameters contribute to the genotype frequency of offspring in a symmetrical way, so that it is not possible to separate one from the other.






















[Table 5-3: mating frequencies and genotype frequencies of offspring in each family]








To overcome this limitation, we incorporate a third marker C to break this symmetry. In Chapter 2, we described the procedure of three-point analysis, which allows the estimation of disequilibria of high orders and the crossover interference. By introducing new parameters, the crossover probabilities $g_{00}$ (no crossover between markers A and B or between markers B and C), $g_{01}$ (no crossover between markers A and B but one crossover between markers B and C), $g_{10}$ (one crossover between markers A and B but no crossover between markers B and C), and $g_{11}$ (one crossover between markers A and B and one between markers B and C), we can constitute the frequencies of gametes derived from a given diplotype. These gamete frequencies, along with the relative diplotype frequencies, are used to characterize the genotype frequencies in the offspring within a given family. Using the EM algorithm for three-point analysis, we are able to estimate crossover probabilities and relative diplotype frequencies.
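To make the role of the four crossover probabilities concrete, the sketch below enumerates the gametes produced by a given three-marker diplotype. It is an illustrative example under the usual assumption that the two reciprocal products of each crossover class are equally frequent; the function names `gamete_frequencies` and `recombination_fractions` are hypothetical and not taken from the dissertation.

```python
def gamete_frequencies(hap1, hap2, g00, g01, g10, g11):
    """Gamete frequencies produced by diplotype hap1|hap2 at ordered markers A-B-C.

    hap1, hap2 : three-character haplotype strings such as "ABC" and "abc".
    g00, g01, g10, g11 : crossover probabilities; the first index refers to the
    A-B interval and the second to the B-C interval (they must sum to 1).
    """
    if abs(g00 + g01 + g10 + g11 - 1.0) > 1e-9:
        raise ValueError("crossover probabilities must sum to 1")
    a1, b1, c1 = hap1
    a2, b2, c2 = hap2
    # Each crossover class yields two reciprocal gametes with equal frequency.
    classes = [
        (g00, a1 + b1 + c1, a2 + b2 + c2),  # no crossover
        (g01, a1 + b1 + c2, a2 + b2 + c1),  # crossover in B-C only
        (g10, a1 + b2 + c2, a2 + b1 + c1),  # crossover in A-B only
        (g11, a1 + b2 + c1, a2 + b1 + c2),  # crossovers in both intervals
    ]
    freqs = {}
    for prob, first, second in classes:
        for gamete in (first, second):
            freqs[gamete] = freqs.get(gamete, 0.0) + prob / 2.0
    return freqs


def recombination_fractions(g00, g01, g10, g11):
    """Pairwise recombination fractions implied by the crossover probabilities."""
    return {"AB": g10 + g11, "BC": g01 + g11, "AC": g01 + g10}
```

Because the gamete distribution now depends on which of the two (or four) diplotypes a heterozygous parent carries, the offspring genotype frequencies are no longer symmetric in the recombination and diplotype parameters, which is what allows them to be separated.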

For a three-SNP model, the following heterozygotes have two or four diplotypes with respective relative frequencies:

Heterozygote   Diplotype 1       Diplotype 2       Diplotype 3       Diplotype 4
AABbCc         ABC|Abc (ω1)      ABc|AbC (ω2)
aaBbCc         aBC|abc (ω3)      aBc|abC (ω4)
AaBBCc         ABC|aBc (ω5)      ABc|aBC (ω6)
AabbCc         AbC|abc (ω7)      Abc|abC (ω8)
AaBbCC         ABC|abC (ω9)      AbC|aBC (ω10)
AaBbcc         ABc|abc (ω11)     Abc|aBc (ω12)
AaBbCc         ABC|abc (ω13)     ABc|abC (ω14)     AbC|aBc (ω15)     Abc|aBC (ω16)

After these ω's are estimated, we estimate the relative diplotype frequencies for the double heterozygote AaBb. The frequencies of diplotypes AB|ab and Ab|aB are estimated, respectively, by

\[
\phi_1 = \omega_9 + \omega_{11} + \omega_{13} + \omega_{14}, \qquad \phi_2 = \omega_{10} + \omega_{12} + \omega_{15} + \omega_{16}. \tag{5-13}
\]

Now, it is easy to estimate diplotype frequencies by

\[
P^{Bb}_{Aa} = \phi_1 P_{AaBb} \tag{5-14}
\]

for AB|ab and by

\[
P^{bB}_{Aa} = \phi_2 P_{AaBb} \tag{5-15}
\]

for Ab|aB.

5.3 Hypothesis Tests

Each of these disequilibria can be tested for significance. The hypotheses for testing HWD are formulated as

\[
H_0: D_A = 0 \quad \text{vs.} \quad H_1: D_A \neq 0 \tag{5-16}
\]
\[
H_0: D_B = 0 \quad \text{vs.} \quad H_1: D_B \neq 0 \tag{5-17}
\]

for the two markers, respectively. The hypotheses for testing each of the zygotic disequilibria between the two markers are given as

\[
H_0: D_{ab} = 0 \quad \text{vs.} \quad H_1: D_{ab} \neq 0 \tag{5-18}
\]
\[
H_0: D_{a/b} = 0 \quad \text{vs.} \quad H_1: D_{a/b} \neq 0 \tag{5-19}
\]
\[
H_0: D_{Ab} = 0 \quad \text{vs.} \quad H_1: D_{Ab} \neq 0 \tag{5-20}
\]
\[
H_0: D_{aB} = 0 \quad \text{vs.} \quad H_1: D_{aB} \neq 0 \tag{5-21}
\]
\[
H_0: D_{AB} = 0 \quad \text{vs.} \quad H_1: D_{AB} \neq 0. \tag{5-22}
\]

For these hypotheses, we calculate the likelihood under $H_0$ and $H_1$, respectively, from which the log-likelihood ratio (LR) is calculated. The LR test statistic follows a $\chi^2$-distribution with one degree of freedom.

It is also possible to test whether all the disequilibrium coefficients are jointly equal to zero. The parameters that need to be estimated under $H_0: D_{ab} = D_{a/b} = D_{Ab} = D_{aB} = D_{AB} = 0$ include allele frequencies and HWD coefficients, which can be estimated in closed form. The LR value for this hypothesis should asymptotically follow the $\chi^2$-distribution with five degrees of freedom.

5.4 A Worked Example

Lander and colleagues reported a high-resolution analysis of the haplotype structure across 500 kilobases on human chromosome 5q31 using 103 common (>5% minor allele frequency) SNPs genotyped in 129 trios from a European-derived population (Daly et al., 2001). These SNPs are divided into 11 discrete haplotype blocks. We used a three-point analysis to analyze every three adjacent SNPs for each block. In general, high linkage was detected between each pair of adjacent markers. About s' marker intervals display significant crossover interference.

Within each block, markers display strong pair-wise linkage disequilibria. Trigenic associations were found to be common; over 91i' markers within blocks show significant linkage disequilibria of a high order. These results about the linkage and LD suggest the importance of using a three-point analysis to characterize haplotype structure and effects in a human population.

The proposed model was used to estimate zygotic disequilibria for each pair of neighboring SNPs (results are shown in Appendix B). It is interesting to note that a high proportion of SNP pairs display significant nongametic disequilibria. For example, SNPs 1 and 2 have a nongametic disequilibrium of 0.147, whereas the gametic disequilibrium is close to zero. We also detected significant high-order disequilibria, but it seems that quadrigenic disequilibria occur more frequently than trigenic disequilibria. The detection of these zygotic disequilibria may reshape the traditional theory of population genetic studies.

5.5 Discussion

The study of linkage disequilibrium structure helps the positional cloning of genes underlying common complex diseases. The current approaches for estimating linkage disequilibria rely on the assumption that the population under consideration is randomly mating, following Hardy-Weinberg equilibrium (HWE). However, many populations may be founded by a small number of ancestors and/or are frequently under evolutionary pressure, such as mutation, genetic drift, population admixture and structure (Lynch and Walsh, 1998). For those populations, HWE may be violated, and a new analysis that relaxes the random mating assumption is needed. Weir and Ott (1996) introduced the concept of zygotic association or zygotic disequilibrium, which specifies the disequilibria between different loci in a nonequilibrium population. A multilocus statistic was proposed by Yang (2000, 2002) to examine zygotic associations in nonequilibrium populations. More recently, Liu et al. (2006) used these zygotic disequilibria to estimate the extent and distribution of zygotic disequilibrium across the canine genome.

In this chapter, I have developed, for the first time, a new statistical model for estimating all different types of zygotic disequilibria, including (1) Hardy-Weinberg disequilibria at each locus, (2) gametic disequilibrium (including two alleles in the same gamete, each from a different locus), (3) nongametic disequilibrium (including two alleles in different gametes, each from a different locus), (4) trigenic disequilibrium (including a zygote at one locus and an allele at the other), and (5) quadrigenic disequilibrium (including two zygotes each from a different locus). This approach, based on a family design, has an advantage over the treatment of Weir and Ott (1996), in which gametic and nongametic disequilibria are combined; Weir's approach allows the estimation of only part of the zygotic disequilibria.

This model was used to analyze a real data set for Crohn's disease. We detected significant evidence for zygotic disequilibria, suggesting that disequilibria at the zygotic level may have contributed to the evolution of a natural population. The approaches for zygotic disequilibrium analysis will provide a routine tool for characterizing the overall picture of disequilibria across the genome and for the gene mapping of complex diseases.









CHAPTER 6
FUTURE DIRECTIONS

Recent years have witnessed a tremendous growth in statistical and laboratory

methods for genetic mapping of human diseases, such as cardiovascular disease, hypertension,

diabetes, asthma, cancer, and other complex genetic disorders. Because these diseases are

not controlled by a single gene, powerful genetic designs and statistical algorithms are

greatly needed. As a commonly used design in genetic mapping, nuclear families (including

both parents and offspring) have been shown to be informative by providing information

about the genetic control of complex diseases. In this dissertation, I have explored several

useful aspects of such family-based designs in gene identification and proposed a series of

statistical models for estimating and testing genetic information hidden in these designs.

(1) The design includes the segregation of alleles in a natural parental population

and the transmission of alleles from parents to the next generation, allowing the

simultaneous estimation of linkage and linkage disequilibrium and facilitating the

construction of a linkage disequilibrium map;

(2) Because both parents are genotyped, the transmission of alleles from male and

female parents can be traced. Thus, genetic imprinting effects, i.e., parent-of-origin

effects can be estimated;

(3) The family design can separate the diplotypes for a heterozygote genotype, thus

showing power to estimate disequilibrium parameters at the zygotic level and extend

population genetic principles into a non-equilibrium human population.

Because all the aspects mentioned above include missing data problems, I have

derived a library of mixture models for estimating genetic parameters for each case.

Statistical properties of each model have been investigated by simulation studies. Other

practical statistical issues including the convergence rate, computational efficiency, and

global maxima specification are also examined.









Complex human diseases are controlled by multiple loci with effects that are sensitive

to the environment. More importantly, genes do not act in isolation; rather, they interact

with one another in a complicated web to regulate cellular systems and generate a certain

disease. Consequently, mapping of genetic networks is central to our understanding of

the function of complex inherited diseases. Also, most diseases undergo a developmental

process. The knowledge about the temporal control of pathogenesis helps to understand

the mechanisms of disease formation. Specifically, I propose the following research

directions for promoting the genetic study of human diseases:

(1) Integrate multilocus population and quantitative genetic principles into different

genetic designs derived from natural human populations. Genetic modeling of human

diseases is complex, involving a network of interactions between multiple single

nucleotide polymorphisms or haplotypes from different chromosomal segments.

Traditional quantitative genetic principles will be used to model the additive,

dominant, and epistatic effects of genes. We will need to explore how linkage

disequilibria of high orders affect genetic segregation in a population and how

crossover interference contributes to genetic variation;

(2) Develop a warehouse of statistical methods for kinetic functional mapping of genes

that affect complex diseases by incorporating their physiological pathways. It is

interesting to study the genetic architecture of change of disease risk factors with age

by organizing ordinary differential equations into a statistical framework, facilitating

an understanding of the dynamic pattern of genetic control of disease risk.

With these developments, we can construct a comprehensive set of predictive tools

for human diseases by combining patients' epidemiological factors and/or transcriptomic,

proteomic and metabolic data. We will particularly incorporate allometrical scaling laws

into our predictive tools, thus enhancing the biological relevance of the tools.









APPENDIX A
HARDY-WEINBERG PRINCIPLE

The Hardy-Weinberg principle states that both allele and -., Ir.1 rpe frequencies

in a population remain constant that is, they are in equilibrium from generation

to generation unless specific disturbing influences are introduced. Those disturbing

influences include non-random rI Ivi lin mutations, selection, limited population size,

random genetic drift and gene flow (Wikipedia). The Hardy-Weinberg model have five

basic assumptions: 1) The population is large (i.e. there is no genetic drift); 2) There is

no gene flow between populations, from migration or transfer of gametes; 3) mutations are

negligible; 4) individuals are mating randomly; and 5) natural selection is not operating on

the population. Given these assumptions, a population's genotype and allele frequencies

will remain unchanged over successive generations.

Hardy-Weinberg equilibrium (HWE) states that in a population of random mating,

under certain conditions, the genotype frequencies follow a mathematical model

established by the English mathematician Hardy (1908) and the German physician

Weinberg (1908). In the simplest case of a single locus with two alleles A and a with allele

frequencies p and q where p + q = 1, we use P_2, P_1 and P_0 to represent the genotypic

frequencies for homozygote AA, heterozygote Aa and homozygote aa. HWE says that

these frequencies satisfy the following equations:


\begin{align*}
P_2 &= p^2 \\
P_1 &= 2pq \\
P_0 &= q^2
\end{align*}
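
The Hardy-Weinberg proportions above can be checked numerically. The following is a minimal sketch rather than code from the dissertation: the function names `hwe_genotype_frequencies` and `allele_frequency_from_genotypes` are hypothetical, chosen only for illustration. It computes the three genotype frequencies from the frequency p of allele A and then recovers p from them, which illustrates why allele frequencies stay unchanged after one round of random mating.

```python
def hwe_genotype_frequencies(p):
    """Return (P2, P1, P0), the expected frequencies of AA, Aa and aa
    under Hardy-Weinberg equilibrium, given the frequency p of allele A."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency p must lie in [0, 1]")
    q = 1.0 - p  # frequency of allele a, since p + q = 1
    return p * p, 2.0 * p * q, q * q


def allele_frequency_from_genotypes(P2, P1, P0):
    """Frequency of allele A implied by the genotype frequencies:
    every AA carries two copies of A and every Aa carries one.
    P0 is accepted for symmetry but does not enter the result."""
    return P2 + 0.5 * P1


if __name__ == "__main__":
    # Hypothetical allele frequency, used only for illustration.
    P2, P1, P0 = hwe_genotype_frequencies(0.3)
    print("P2, P1, P0 =", P2, P1, P0)
    print("recovered p =", allele_frequency_from_genotypes(P2, P1, P0))
```

Since P_2 + P_1/2 = p^2 + pq = p(p + q) = p, the recovered allele frequency equals the one supplied, which matches the claim that allele frequencies remain constant across generations under the stated assumptions.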









APPENDIX B
ESTIMATION OF ZYGOTIC DISEQUILIBRIA USING CROHN'S DISEASE DATA SET














































[Tables B-1 to B-6: estimates of relative frequencies and of zygotic disequilibrium from a genetic mapping project of Crohn's disease. The table contents are not recoverable from the extracted text.]









REFERENCES

Akey, J., Jin, L., Xiong, M., et al. "Haplotypes vs single marker linkage disequilibrium
tests: what do we gain?" European Journal of Human Genetics 9 (2001).4: 291-300.

Altshuler, D., Daly, M. J., and Lander, E. S. "Genetic mapping in human disease."
Science 322 (2008): 881-888.

Ardlie, K.G., Kruglyak, L., and Seielstad, M. "Patterns of linkage disequilibrium in the
human genome." Nature Reviews Genetics 3 (2002).4: 299-309.

Bader, J.S. "The relative power of SNPs and haplotype as genetic markers for association
tests." Pharmacogenomics 2 (2001).1: 11-24.

Barton, N. H. and Gale, K. S. Hybrid Zones and the Evolutionary Process. Oxford: Oxford
University Press, 1993.

Barton, S.C., Surani, M. A. H., and Norris, M. L. "Role of paternal and maternal genomes
in mouse development." Nature 311 (1984).5984: 374-376.

Bateson, W. Mendel's Principles of Heredity. United Kingdom: Cambridge, 1909.

Bennett, J. H. and Binet, F. E. "Association between Mendelian factors with mixed selfing
and random mating." Heredity 10 (1956): 51-55.

Broman, K. W. and Speed, T. P. "A model selection approach for the identification of
quantitative trait loci in experimental crosses (with discussion)." J. Roy. Stat. Soc. B64
(2002): 641-656.

Broman, K.W. "The genomes of recombinant inbred lines." Genetics 169 (2005).2:
1133-1146.

Browning, S.R. "Estimation of pairwise identity by descent from dense genetic marker
data in a population sample of haplotypes." Genetics 178 (2008).4: 2123.

Burnham, K.P. and Anderson, D.R. Model Selection and Inference: A Practical
Information-Theoretic Approach. Springer, New York, 1998.

Carlborg, O. and Haley, C.S. "Epistasis: too often neglected in complex trait studies?"
Nature Reviews Genetics 5 (2004).8: 618-625.

Cattanach, B.M. and Kirk, M. "Differential activity of maternally and paternally derived
chromosome regions in mice." Nature 315 (1985): 496-498.

Charlesworth, B. "The evolution of sex chromosomes." Science 251 (1991).4997:
1030-1033.

Chen, Z. "The full EM algorithm for the MLEs of QTL effects and positions and their
estimated variances in multiple-interval mapping." Biometrics 61 (2005).2: 474-480.









Cheverud, J.M., Hager, R., Roseman, C., Fawcett, G., Wang, B., and Wolf, J.B. "Genomic
imprinting effects on adult body composition in mice." Proceedings of the National
Academy of Sciences 105 (2008).11: 4253.

Churchill, G.A. and Doerge, R.W. "Empirical threshold values for quantitative trait
mapping." Genetics 138 (1994).3: 963-971.

Clark, A.G. "Inference of haplotypes from PCR-amplified samples of diploid populations."
Molecular Biology and Evolution 7 (1990).2: 111-122.

Clark, A.G. "The role of haplotypes in candidate gene studies." Genetic epidemiology 27
(2004).4: 321-333.

Collins, A. R. Linkage Disequilibrium and Association Mapping: Analysis and Applications.
Humana Press, New York, 2007.

Collins, F.S., Guyer, M.S., and Chakravarti, A. "Variations on a theme: cataloging human
DNA sequence variation." Science 278 (1997).5343: 1580.

Cooper, R.S. and Psaty, B.M. "Genomics and Medicine: Distraction, Incremental
Progress, or the Dawn of a New Age?" Annals of internal medicine 138 (2003):
576-580.

Cui, Y., Lu, Q., Cheverud, J.M., Littell, R.C., and Wu, R. "Model for mapping imprinted
quantitative trait loci in an inbred F2 design." Genomics 87 (2006).4: 543-551.

Daly, M.J., Rioux, J.D., Schaffner, S.F., Hudson, T.J., and Lander, E.S. "High-resolution
haplotype structure in the human genome." Nature genetics 29 (2001).2: 229-232.

Darvasi, A. "Experimental strategies for the genetic dissection of complex traits in animal
models." Nature Genetics 18 (1998).1: 19-24.

Dawson, E., Abecasis, G.R., Bumpstead, S., Chen, Y., Hunt, S., Beare, D.M., Pabial, J.,
Dibling, T., Tinsley, E., Kirby, S., et al. "A first-generation linkage disequilibrium map
of human chromosome 22." Nature 418 (2002).6897: 544-548.

De La Vega, F.M., Dailey, D., Ziegle, J., Williams, J., Madden, D., and Gilbert, D.A.
"New generation pharmacogenomic tools: a SNP linkage disequilibrium map, validated
SNP assay resource, and high-throughput instrumentation system for large-scale genetic
studies." Biotechniques 32 (2002): 548-554.

Dupuis, J., Siegmund, D., and Yakir, B. "A unified framework for linkage and association
analysis of quantitative traits." Proc. Natl. Acad. Sci. USA 104 (2007): 20210-20215.

Falls, J.G., Pulford, D.J., Wylie, A.A., and Jirtle, R.L. "Genomic imprinting: implications
for human disease." American Journal of Pathology 154 (1999).3: 635-647.









Farnir, F., Coppieters, W., Arranz, J.J., Berzi, P., Cambisano, N., Grisart, B., Karim, L.,
Marcq, F., Moreau, L., Mni, M., et al. "Extensive genome-wide linkage disequilibrium in
cattle." Genome Research 10 (2000).2: 220-227.

Farnir, F., Grisart, B., Coppieters, W., Riquet, J., Berzi, P., Cambisano, N., Karim, L.,
Mni, M., Moisio, S., Simon, P., et al. "Simultaneous Mining of Linkage and Linkage
Disequilibrium to Fine Map Quantitative Trait Loci in Outbred Half-Sib Pedigrees:
Revisiting the Location of a Quantitative Trait Locus With Major Effect on Milk
Production on Bovine Chromosome 14." Genetics 161 (2002).1: 275-287.

Fulker, D.W. and Cardon, L.R. "A sib-pair approach to interval mapping of quantitative
trait loci." American journal of human genetics 54 (1994).6: 1092-1103.

Gabriel, S.B., Schaffner, S.F., Nguyen, H., Moore, J.M., Roy, J., Blumenstiel, B., Higgins,
J., DeFelice, M., Lochner, A., Faggart, M., et al. "The structure of haplotype blocks in
the human genome." Science 296 (2002).5576: 2225-2229.

Georges, M. "Mapping, fine mapping, and molecular dissection of quantitative trait loci in
domestic animals." Ann. Rev. Genomics Human Genet 8 (2007): 131-162.

Giannoukakis, N., Deal, C., Paquette, J., Goodyer, C.G., and Polychronakos, C. "Parental
genomic imprinting of the human IGF2 gene." Nature Genetics 4 (1993).1: 98-101.

Haldane, J.B.S. "The association of characters as a result of inbreeding and linkage." Annals
of eugenics 15 (1949).1: 15-23.

Harrison, R.G. Hybrid zones and the evolutionary process. Oxford University Press, USA,
1993.

Hernandez, R.D., Hubisz, M.J., Wheeler, D.A., Smith, D.G., Ferguson, B., Rogers, J.,
Nazareth, L., Indap, A., Bourquin, T., McPherson, J., et al. "Demographic histories and
patterns of linkage disequilibrium in Chinese and Indian rhesus macaques." Science 316
(2007).5822: 240-243.

Hill, W.G. and Hernandez-Sanchez, J. "Prediction of multilocus identity-by-descent."
Genetics 176 (2007).4: 2307.

Hirschhorn, J. N. and Lettre, G. "Progress in genome-wide association studies of human
height." Hormone Research 71 (2009): s5-s13.

Hou, W., Yap, J. S., Wu, S., Liu, T., Cheverud, J. M., and Wu, R. L. "Haplotyping a
quantitative trait with a high-density map in experimental crosses." PLoS ONE 2
(2007).8: e732.

Huang, B. E., Amos, C. I., and Lin, D. Y. "Detecting haplotype effects in genomewide
association studies." Genetic Epidemiology 31 (2007): 803-812.









Ikram, M. A., Seshadri, S., Bis, J. C., Fornage, M., DeStefano, A. L., Aulchenko, Y. S.,
et al. "Genomewide association studies of stroke." The New England Journal of
Medicine 360 (2009): 1718-1728.

Jannink, J.L. and Wu, X.L. "Estimating allelic number and identity in state of QTLs in
interconnected families." Genetics Research 81 (2003): 133-144.

Jiang, C. and Zeng, Z.B. "Multiple trait analysis of genetic mapping for quantitative trait
loci." Genetics 140 (1995).3: 1111-1127.

Judson, R., Stephens, J.C., and Windemuth, A. "The predictive power of haplotypes in
clinical response." Pharmacogenomics 1 (2000).1: 15-26.

Kao, C.H., Zeng, Z.B., and Teasdale, R.D. "Multiple interval mapping for quantitative
trait loci." Genetics 152 (1999).3: 1203-1216.

Lander, E.S. and Botstein, D. "Mapping Mendelian factors underlying quantitative traits
using RFLP linkage maps." Genetics 121 (1989).1: 185-199.

Lee, S.H. and Van der Werf, J.H.J. "Using dominance relationship coefficients based on
linkage disequilibrium and linkage with a general complex pedigree to increase mapping
resolution." Genetics 174 (2006).2: 1009-1016.

Lettre, G., Jackson, A.U., Gieger, C., Schumacher, F.R., Berndt, S.I., Sanna, S.,
Eyheramendy, S., Voight, B.F., Butler, J.L., Guiducci, C., et al. "Identification of
ten loci associated with height highlights new biological pathways in human growth."
Nature Genetics 40 (2008).5: 584-591.

Lettre, G. and Rioux, J. D. "Autoimmune diseases: insights from genome-wide association
studies." Human Molecular Genetics 17 (2008): R116-R121.

Li, J., Wang, S., and Zeng, Z.B. "Multiple-interval mapping for ordinal traits." Genetics
173 (2006).3: 1649.

Lin, D.Y. and Huang, B.E. "The use of inferred haplotypes in downstream analyses." The
American Journal of Human Genetics 80 (2007).3: 577-579.

Lin, D.Y. and Zeng, D. "Likelihood-based inference on haplotype effects in genetic
association studies." Journal of the American Statistical Association 101 (2006).473:
89-104.

Liu, T., Johnson, J.A., Casella, G., and Wu, R. "Sequencing complex diseases with
HapMap." Genetics 168 (2004).1: 503-511.

Liu, T., Todhunter, R.J., Lu, Q., Schoettinger, L., Li, H., Littell, R.C., Burton-Wurster,
N., Acland, G.M., Lust, G., and Wu, R. "Modeling extent and distribution of zygotic
disequilibrium: implications for a multigenerational canine pedigree." Genetics 174
(2006).1: 439-453.









Lou, X.Y., Casella, G., Littell, R.C., Yang, M.C.K., Johnson, J.A., and Wu, R. "A
haplotype-based algorithm for multilocus linkage disequilibrium mapping of quantitative
trait loci with epistasis." Genetics 163 (2003).4: 1533-1548.

Lou, X.Y., Casella, G., Todhunter, R.J., Yang, M.C.K., and Wu, R. "A General Statistical
Framework for Unifying Interval and Linkage Disequilibrium Mapping." Journal of the
American Statistical Association 100 (2005).469: 158-171.

Lynch, M. and Walsh, B. Genetics and Analysis of Quantitative Traits. Sinauer,
Sunderland, MA, 1998.

Ma, C.X., Casella, G., and Wu, R. "Functional Mapping of Quantitative Trait Loci
Underlying the Character Process: A Theoretical Framework." Genetics 161 (2002).4:
1751-1762.

Marques, E., Schnabel, R.D., Stothard, P., Kolbehdari, D., Wang, Z., Taylor, J.F., and
Moore, S.S. "High density linkage disequilibrium maps of chromosome 14 in Holstein
and Angus cattle." BMC genetics 9 (2008).1: 45.

McInnis, M.G., Lan, T.H., Willour, V.L., McMahon, F.J., Simpson, S.G., Addington,
A.M., MacKinnon, D.F., Potash, J.B., Mahoney, A.T., Chellis, J., et al. "Genome-wide
scan of bipolar disorder in 65 pedigrees: supportive evidence for linkage at 8q24, 18q22,
4q32, 2p12, and 13q12." Molecular Psychiatry 8 (2003).3: 288-298.

McRae, A.F., McEwan, J.C., Dodds, K.G., Wilson, T., Crawford, A.M., and Slate, J.
"Linkage disequilibrium in domestic sheep." Genetics 160 (2002).3: 1113-1122.

Meuwissen, T.H.E. and Goddard, M.E. "Fine mapping of quantitative trait loci using
linkage disequilibria with closely linked marker loci." Genetics 155 (2000).1: 421-430.

Miles, J.S., Moss, J.E., Taylor, B.A., Burchell, B., and Wolf, C.R. "Mapping genes
encoding drug-metabolizing enzymes in recombinant inbred mice." Genomics 11
(1991).2: 309.

Mohlke, K.L., Boehnke, M., and Abecasis, G.R. "Metabolic and cardiovascular traits: an
abundance of recently identified common genetic variants." Human Molecular Genetics
17 (2008).R2: R102-R108.

Moore, J.H. "The ubiquitous nature of epistasis in determining susceptibility to common
human diseases." Human Heredity 56 (2003).1-3: 73-82.

Moore, J.H. "A global view of epistasis." Nature genetics 37 (2005): 13-14.

Morison, I.M., Ramsay, J.P., and Spencer, H.G. "A census of mammalian imprinting."
Trends in Genetics 21 (2005).8: 457-465.

Morris, R.W. and Kaplan, N.L. "On the advantage of haplotype analysis in the presence
of multiple disease susceptibility alleles." Genetic Epidemiology 23 (2002).3: 221-233.









Morton, N.E. "Linkage disequilibrium maps and association mapping." Journal of Clinical
Investigation 115 (2005).6: 1425-1430.

Niu, T., Qin, Z.S., Xu, X., and Liu, J.S. "Bayesian haplotype inference for multiple
linked single-nucleotide polymorphisms." The American Journal of Human Genetics 70
(2002).1: 157-169.

Paterson, A.D., Naimark, D.M.J., and Petronis, A. "The analysis of parental origin of
alleles may detect susceptibility loci for complex disorders." Human Heredity 49 (1999):
197-204.

Patil, N., Berno, A.J., Hinds, D.A., Barrett, W.A., Doshi, J.M., Hacker, C.R., Kautzer,
C.R., Lee, D.H., Marjoribanks, C., McDonough, D.P., et al. "Blocks of limited
haplotype diversity revealed by high-resolution scanning of human chromosome 21."
Science 294 (2001).5547: 1719-1723.

Pearson, T.A. and Manolio, T.A. "How to interpret a genome-wide association study."
Journal of the American Medical Association 299 (2008).11: 1335-1344.

Reich, D.E., Cargill, M., Bolk, S., Ireland, J., Sabeti, P.C., Richter, D.J., Lavery, T.,
Kouyoumjian, R., Farhadian, S.F., Ward, R., et al. "Linkage disequilibrium in the
human genome." Nature 411 (2001): 199-204.

Reik, W. and Walter, J. "Genomic imprinting: parental influence on the genome." Nature
Reviews Genetics 2 (2001).1: 21-32.

Risch, N. and Merikangas, K. "The future of genetic studies of complex human diseases."
Science 273 (1996).5281: 1516.

Ron, M. and Weller, J.I. "From QTL to QTN identification in livestock-winning by points
rather than knock-out: a review." Animal genetics 38 (2007).5: 429.

Saar, K., Beck, A., Bihoreau, M.T., Birney, E., Brocklebank, D., Chen, Y., Cuppen, E.,
Demonchy, S., Dopazo, J., Flicek, P., et al. "SNP and haplotype mapping for genetic
analysis in the rat." Nature Genetics 40 (2008).5: 560-566.

Sanna, S., Jackson, A.U., Nagaraja, R., Willer, C.J., Chen, W.M., Bonnycastle, L.L.,
Shen, H., Timpson, N., Lettre, G., Usala, G., et al. "Common variants in the
GDF5-UQCC region are associated with variation in human height." Nature Genetics
40 (2008).2: 198-203.

Satagopan, J.M., Yandell, B.S., Newton, M.A., and Osborn, T.C. "A Bayesian approach
to detect quantitative trait loci using Markov chain Monte Carlo." Genetics 144
(1996).2: 805-816.

Schaid, D.J., Rowland, C.M., Tines, D.E., Jacobson, R.M., and Poland, G.A. "Score tests
for association between traits and haplotypes when linkage phase is ambiguous." The
American Journal of Human Genetics 70 (2002).2: 425-434.









Schork, N.J. "Power calculations for genetic association studies using estimated
probability distributions." The American Journal of Human Genetics 70 (2002).6:
1480-1489.

Styrkarsdottir, U., Halldorsson, B.V., Gretarsdottir, S., Gudbjartsson, D.F., Walters, G.B.,
Ingvarsson, T., Jonsdottir, T., Saemundsdottir, J., Center, J.R., Nguyen, T.V., et al.
"Multiple genetic loci for bone mineral density and fractures." New England Journal of
Medicine 358 (2008).22: 2355-2365.

Tapper, W., Gibson, J., Morton, N. W., and Collins, A. "A comparison of methods to
detect recombination hotspots." Human Heredity 66 (2008): 157-169.

Thompson, E.A. "Information gain in joint linkage analysis." Mathematical Medicine and
Biology 1 (1984).1: 31-49.

Tishkoff, S.A., Dietzsch, E., Speed, W., Pakstis, A.J., Kidd, J.R., Cheung, K.,
Bonne-Tamir, B., Santachiara-Benerecetti, A.S., Moral, P., Krings, M., et al. "Global
patterns of linkage disequilibrium at the CD4 locus and modern human origins." Science
271 (1996).5254: 1380-1387.

Tishkoff, S.A., Varkonyi, R., Cahinhinan, N., Abbes, S., Argyropoulos, G., Destro-Bisol,
G., Drousiotou, A., Dangerfield, B., Lefranc, G., Loiselet, J., et al. "Haplotype diversity
and linkage disequilibrium at human G6PD: recent origin of alleles that confer malarial
resistance." Science 293 (2001).5529: 455-462.

Tishkoff, S.A. and Williams, S.M. "Genetic analysis of African populations: human
evolution and complex disease." Nature Reviews Genetics 3 (2002).8: 611-621.

Tsunoda, T., Lathrop, G.M., Sekine, A., Yamada, R., Takahashi, A., Ohnishi, Y., Tanaka,
T., and Nakamura, Y. "Variation of gene-based SNPs and linkage disequilibrium
patterns in the human genome." Human Molecular Genetics 13 (2004).15: 1623-1632.

van Eeuwijk, F.A., Malosetti, M., Yin, X., Struik, P.C., and Stam, P. "Statistical
models for genotype by environment data: from conventional ANOVA models to
eco-physiological QTL models." Australian Journal of Agricultural Research 56 (2005).9:
883-894.

Wang, H., Zhang, Y.M., Li, X., Masinde, G.L., Mohan, S., Baylink, D.J., and Xu, S.
"Bayesian shrinkage estimation of quantitative trait loci parameters." Genetics 170
(2005).1: 465-480.

Weedon, M.N., Lango, H., Lindgren, C.M., Wallace, C., Evans, D.M., Mangino, M.,
Freathy, R.M., Perry, J.R.B., Stevens, S., Hall, A.S., et al. "Genome-wide association
analysis identifies 20 loci that influence adult height." Nature Genetics 40 (2008).5:
575-583.

Weedon, M.N., Lettre, G., Freathy, R.M., Lindgren, C.M., Voight, B.F., Perry, J.R.B.,
Elliott, K.S., Hackett, R., Guiducci, C., Shields, B., et al. "A common variant of
HMGA2 is associated with adult and childhood height in the general population."
Nature genetics 39 (2007).10: 1245-1250.

Weir, B. S. "Linkage disequilibrium and association mapping." Annual Review of
Genomics and Human Genetics 9 (2008): 129-142.

Weir, B.S. and Ott, J. Genetic Data Analysis II. Sinauer Associates, Sunderland, Mass,
1996.

Wilkins, J.F. and Haig, D. "What good is genomic imprinting: the function of
parent-specific gene expression." Nature Reviews Genetics 4 (2003).5: 359-368.

Wood, A.J. and Oakey, R.J. "Genomic imprinting in mammals: emerging themes and
established theories." PLoS genetics 2 (2006).11: e147.

Wu, R., Ma, C.-X., and Casella, G. Statistical Genetics of Quantitative Traits: Linkage,
Maps, and QTL. Springer Verlag, New York, 2007a.

Wu, R., Ma, C.X., and Casella, G. "Joint linkage and linkage disequilibrium mapping of
quantitative trait loci in natural populations." Genetics 160 (2002).2: 779-792.

Wu, R. L. and Lin, M. "Functional mapping A new tool to study the genetic architecture
of dynamic complex traits." Nature Reviews Genetics 7 (2006): 229-237.

Wu, R. L. and Zeng, Z-B. "Joint linkage and linkage disequilibrium mapping in natural
populations." Genetics 157 (2001): 899-909.

Wu, S., Yang, J., Wang, C. G., and Wu, R. L. "A general quantitative genetic model for
haplotyping a complex trait in humans." Current Genomics 8 (2007b): 343-350.

Xu, S. "Mapping quantitative trait loci using four-way crosses." Genetics Research 68
(2009).2: 175-181.

Xu, S. and Atchley, W.R. "A random model approach to interval mapping of quantitative
trait loci." Genetics 141 (1995).3: 1189-1197.

Xu, S. and Atchley, W.R. "Mapping quantitative trait loci for complex binary diseases
using line crosses." Genetics 143 (1996).3: 1417-1424.

Yang, R.C. "Zygotic associations and multilocus statistics in a nonequilibrium diploid
population." Genetics 155 (2000).3: 1449-1458.

Yang, R.C. "Analysis of multilocus zygotic associations." Genetics 161 (2002).1: 435-445.

Yi, N., Xu, S., and Allison, D.B. "Bayesian model choice and search strategies for
mapping interacting quantitative trait loci." Genetics 165 (2003).2: 867-883.

Yu, Y., Xu, F., Peng, H., Fang, X., Zhao, S., Li, Y., Cuevas, B., Kuo, W.L., Gray, J.W.,
Siciliano, M., et al. "NOEY2 (ARHI), an imprinted putative tumor suppressor gene in
ovarian and breast carcinomas." Proceedings of the National Academy of Sciences 96
(1999).1: 214-219.

Zaykin, D.V. "Bounds and normalization of the composite linkage disequilibrium
coefficient." Genetic Epidemiology 27 (2004).3: 252-257.

Zaykin, D.V., Westfall, P.H., Young, S.S., Karnoub, M.A., Wagner, M.J., Ehm, M.G.,
and Inc, G.S.K. "Testing association of statistically inferred haplotypes with discrete
and continuous traits in samples of unrelated individuals." Human Heredity 53 (2002):
79-91.

Zeng, Z.B. "Precision mapping of quantitative trait loci." Genetics 136 (1994).4:
1457-1468.

Zhang, K., Deng, M., Chen, T., Waterman, M.S., and Sun, F. "A dynamic programming
algorithm for haplotype block partitioning." Proceedings of the National Academy of
Sciences 99 (2002).11: 7335-7339.

Zou, F., Fine, J.P., Hu, J., and Lin, D.Y. "An efficient resampling method for assessing
genome-wide statistical significance in mapping quantitative trait loci." Genetics 168
(2004).4: 2307-2316.

Zou, F., Nie, L., Wright, F. A., and Sen, P. K. "An efficient resampling method for
assessing genome-wide statistical significance in mapping quantitative trait loci."
Journal of Statistical Planning and Inference 139 (2009): 978-989.









BIOGRAPHICAL SKETCH

Qin Li, originally trained in applied mathematics at Dalian University of Technology,

China, received her Ph.D. from the University of Florida in the summer of 2009. Qin

majored in statistics while simultaneously working for the Department of Epidemiology and

Health Policy Research.

Her research focuses on statistical genetics. She is intrigued by the development of

statistical and computational models for identifying genes that control complex traits

and diseases. In her Ph.D. dissertation, Qin explored several fundamental aspects of

family data in constructing the linkage disequilibrium map of the human genome and

fine mapping disease genes. A library of statistical models has been derived to estimate

and test the pattern of gene segregation in a natural population and genetic effects of

haplotypes on complex diseases. She is eager to use her models and algorithms to solve

complicated real-world genetic problems.

Qin is a member of the American Statistical Association and the Eastern North American

Region of the International Biometric Society.






ACKNOWLEDGMENTS

I would like to express my gratitude to those who helped me to pursue my PhD degree and complete my dissertation. I am deeply indebted to my academic advisor and committee chair, Dr. Rongling Wu. He is such a great mentor and an outstanding researcher. His leadership, his enthusiasm and his creativity gave me a lot of encouragement and inspiration. Without his guidance, advice and countless help, it would have been impossible for me to finish this work. He is also such a nice person and friend. His door is always open to his students. I learned not only how to do research from him, but also, more importantly, how to live as a parent, as a child, as a teacher and as a friend.

I would also like to thank the other committee members: Dr. Arthur Berg, Dr. Roger Fillingim and Dr. Ronald Randles. I thank them for their time and advice. I am also very grateful to the other faculty and staff members of the Department of Statistics and the Department of Epidemiology and Health Policy Research at the University of Florida, where I have received generous support during the past few years.

Special thanks to my beloved family and friends who are always there for me. My parents and parents-in-law, Tongwan Li and Yuying Zhang, Tao Wang and Jinying Wang, have been a great source of love, support and understanding. I am most deeply grateful to my husband, Chenguang Wang, for his selfless, continuing, countless love, encouragement, understanding and support. And to my son, Leon L. Wang, I thank you for bringing tears of joy and love to us.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER

1 INTRODUCTION TO MAPPING COMPLEX DISEASES
  1.1 Introduction
  1.2 Genetic Designs
    1.2.1 Experimental Crosses
    1.2.2 Multigenerational Families
    1.2.3 Natural Populations with Unrelated Individuals
    1.2.4 Natural Populations with Related Families
    1.2.5 Case-Control Studies
  1.3 New Developments
    1.3.1 From QTL to QTN
    1.3.2 From Linkage or Linkage Disequilibrium Mapping to Joint Analysis
    1.3.3 From Genetics to Epigenetics
  1.4 Dissertation Goals

2 CONSTRUCTING A LINKAGE DISEQUILIBRIUM MAP
  2.1 Introduction
  2.2 Two-Point Model
    2.2.1 Sampling Strategy
    2.2.2 Likelihood
    2.2.3 Estimation
    2.2.4 Hypothesis Testing
  2.3 Three-Point Model
    2.3.1 Multilocus Population Genetics
    2.3.2 Multilocus Mendelian Genetics
    2.3.3 Likelihood
    2.3.4 Estimation
    2.3.5 Marker Ordering
    2.3.6 Hypothesis Testing
  2.4 Computer Simulation
    2.4.1 Two-Point Model
    2.4.2 Three-Point Model
  2.5 Discussion

3 ...
  3.1 Haplotype and Diplotype
  3.2 Two-Point Haplotyping Model
    3.2.1 Likelihood
    3.2.2 Estimation
    3.2.3 Hypothesis Testing
    3.2.4 Model Selection
  3.3 Three-point Haplotyping Model
  3.4 Simulation
  3.5 Discussion

4 COMPUTING GENETIC IMPRINTING
  4.1 Imprinting Model
  4.2 Likelihood
  4.3 Estimation
  4.4 Hypothesis Testing
  4.5 Simulation
  4.6 Discussion

5 ZYGOTIC DISEQUILIBRIUM MAPPING WITH FAMILY DATA
  5.1 Zygotic Disequilibrium
    5.1.1 Gamete and Non-gamete Frequencies
    5.1.2 Complete Disequilibrium Parameters
  5.2 Model for Estimation
  5.3 Hypothesis Tests
  5.4 A Worked Example
  5.5 Discussion

6 FUTURE DIRECTIONS

APPENDIX

A HARDY-WEINBERG PRINCIPLE
B ESTIMATION OF ZYGOTIC DISEQUILIBRIA USING CROHN'S DISEASE DATA SET

REFERENCES
BIOGRAPHICAL SKETCH

PAGE 7

Table                                                                                page
2-1  Data structure of two markers typed for a panel of full-sib families, each composed of the mother, father and offspring, sampled at random from a natural population  25
2-2  Mating frequencies of families and offspring genotype frequencies per family for two markers sampled from a natural population  28
2-3  Mating frequencies of families and offspring genotype frequencies per family for three markers sampled from a natural population  40
2-4  MLEs (standard deviations) of allele frequencies, linkage disequilibrium, and recombination fraction from 1000 simulation replicates under different sampling strategies  76
2-5  Power to detect sex-specific differences in the recombination fraction and linkage disequilibrium  77
2-6  MLEs of linkage disequilibria and recombination fractions among markers under the three-point model, in a comparison with those under the two-point model. The numbers in the parentheses are the standard deviations of the MLEs  78
3-1  Diplotype configuration of offspring's genotypes at two SNPs  86
3-2  MLEs (standard deviations) of population and quantitative genetic parameters under the two-point model  96
3-3  Power to detect the risk haplotype under the two-point model  97
3-4  MLEs (standard deviations) of population and quantitative genetic parameters under the three-point model based on 1000 × 1 design strategy  98
4-1  Diplotype configuration of offspring's genotypes with imprinting effects at two SNPs  104
4-2  MLEs (standard deviations) of quantitative genetic parameters under the two-point model with imprinting effect  113
4-3  MLEs (standard deviations) of population and quantitative genetic parameters under the three-point model with imprinting effect based on 1000 × 1 design strategy  114
5-1  Frequencies and observations of marker genotypes  117
5-2  Expressions of quadrigenic disequilibrium D_AB in terms of genotypic configuration frequencies, allele frequencies and lower-order disequilibrium coefficients  122
5-3  Mating frequencies of families and offspring genotype frequencies per family for two markers sampled from a natural population  123

PAGE 8

132
B-2  Estimates of relative frequencies from a genetic mapping project of Crohn's disease (Cont.)  133
B-3  Estimates of relative frequencies from a genetic mapping project of Crohn's disease (Cont.)  134
B-4  Estimates of zygotic disequilibrium from a genetic mapping project of Crohn's disease  135
B-5  Estimates of zygotic disequilibrium from a genetic mapping project of Crohn's disease (Cont.)  136
B-6  Estimates of zygotic disequilibrium from a genetic mapping project of Crohn's disease (Cont.)  137

PAGE 9

Figure                                                                               page
3-1  Haplotype configuration of a diplotype for two SNPs  84
3-2  Diplotype configuration of a genotype for two SNPs  85

PAGE 10


PAGE 11


PAGE 12

Since Lander and Botstein's (1989) seminal paper on interval mapping of complex traits, a wide variety of statistical methods have been developed to map genes or quantitative trait loci (QTLs). Broadly speaking, these new methods have the following aims: improving the precision of QTL separation from a chromosomal segment (composite interval mapping; Zeng 1994), extending genetic mapping to accommodate outbred crosses (Xu and Atchley 1996), complicated mating designs (Jannink and Wu 2003), and natural populations (Wu et al. 2002), mapping multiple interacting QTLs (multiple interval mapping; Kao et al. 1999), mapping QTLs for trait correlations (Jiang and

PAGE 13

, 1995) and genotype-environment interactions (van Eeuwijk et al. 2005), mapping QTLs for binary (Xu and Atchley 1996) and categorical traits (Li et al. 2006), deriving QTL mapping procedures within the Bayesian paradigm (Satagopan et al. 1996; Yi et al. 2003), and mapping QTLs for complex dynamic traits (functional mapping; Ma et al. 2002; Wu and Lin 2006), among others. Specific statistical issues for QTL mapping have also been considered in several areas, including the determination of critical thresholds (Churchill and Doerge 1994; Zou et al. 2004), model selection (Broman and Speed 2002), nonparametric mapping of QTLs (Zou et al. 2009), and asymptotic properties of QTL parameter estimates (Chen 2005). In this chapter, I will introduce basic genetic designs used for QTL mapping and highlight the recent developments of disease mapping. At the end, I will describe the goals of this dissertation.

Miles et al. 1991) or directly with humans. Genetic studies of complex diseases will need an appropriate genetic design. Types of populations for genetic mapping are different between animals and humans because of their different biological properties and ethical considerations. Below are several commonly used mapping populations.

Darvasi 1998). Crossing two inbred strains leads to a backcross or F2 population, in which different genotypes are generated for each gene. Other cross populations include recombinant inbred lines (RILs). An RIL population is generated by continuous inbreeding of the progeny, initiated with two heterozygous founders, for an adequately large number of generations that leads to the disappearance of any heterozygote (Broman 2005). Linkage analysis in terms of the recombination fraction

PAGE 14

Fulker and Cardon 1994; Liu et al. 2006; Xu and Atchley 1995). This design can be used to map genes for inherited diseases, such as diabetes or cancer. The recombination fraction and the identical-by-descent (IBD) coefficient are the key for genetic mapping with multigenerational families.

Lou et al. 2003). In a population, different loci are genetically associated, with the extent described by a parameter called the (gametic) linkage disequilibrium (LD). LD-based mapping is meritorious in terms of its simple sampling scheme (especially suitable for humans) and high-resolution dissection of a gene into a narrow genomic region.

Natural populations with unrelated families: Although LD mapping has tremendous potential to fine map functional genes for a complex trait, it may provide a spurious estimate of LD in practice when the association between genes is due to evolutionary forces, such as mutation, drift, selection, population structure, and admixture (Lynch and Walsh 1998). A mapping strategy that samples unrelated families (composed of parents and offspring) from a natural population (Dupuis et al. 2007; Wu and Zeng 2001) is helpful for overcoming the limitation of LD mapping by simultaneously estimating the linkage and linkage disequilibrium.

PAGE 15

Risch and Merikangas (1996) have shown that association studies are more powerful than linkage studies for finding genetic determinants of a complex phenotype such as treatment response.

1.3.1 From QTL to QTN

While genetic mapping has benefited from the rapid advances in statistical and computational technologies, the completion of the human genome sequence in 2005 and its derivative, the HapMap Project, together with rapid improvements in genotyping analysis, have allowed a genome-wide scan of genes for complex traits or diseases (Altshuler et al. 2008; Ikram et al. 2009). Such genome-wide association studies have greatly stimulated our hope that detailed genetic control mechanisms for complex phenotypes can be understood at the level of individual nucleotides or nucleotide combinations. During the past two years, more than 250 loci have been reproducibly identified for polygenic traits (Hirschhorn and Lettre 2009). Some of these loci confirm previous discoveries made by other approaches. It is clear that many of the genes detected affect the outcome of a trait or disease through their biochemical and metabolic pathways. For example, of the 23 loci detected

PAGE 16

Mohlke et al. 2008). Genes associated with Crohn's disease encode autophagy and interleukin-23-related pathways (Lettre and Rioux 2008). The height loci detected directly regulate chromatin proteins and hedgehog signaling (Hirschhorn and Lettre 2009). Genome-wide studies have identified genes that encode the sites of action of drugs approved by the Food and Drug Administration, including thiazolidinediones and sulfonylureas (in studies of type 2 diabetes) (Mohlke et al. 2008), statins (lipid levels) (Mohlke et al. 2008), and estrogens (bone density) (Styrkarsdottir et al. 2008).

As a more accurate and useful approach for characterizing the genetic variants for complex traits, a direct analysis of DNA sequences with SNP data has incredible benefits (Ron and Weller 2007). If a string of DNA sequence or haplotype is known to increase disease risk, then a specialized drug can be designed to inhibit the expression of this DNA sequence. The control of this disease can be made more efficient if all of its underlying DNA sequences are identified over the entire genome. A new term, quantitative trait nucleotides or QTNs, has been coined to describe the sequence polymorphisms that cause phenotypic variation in a quantitative trait. The minimum nucleotide size of a QTN is a single SNP, whereas very often a QTN contains two or more closely linked SNPs. The association analysis between multi-SNP QTNs and phenotypes is statistically challenging because of the existence of unphased haplotypes.

PAGE 17

Wu et al. 2002). Obviously, simultaneous estimation of the recombination fraction and LD between the markers and QTL avoids false positive results (spurious LD) when LD is used to fine map genes for complex traits. Lou et al. (2005) developed a unifying model for integrating interval and LD mapping to fine map the locations of QTLs on a linkage map. As with interval mapping, this model allows scanning for the existence and distribution of QTLs throughout the linkage map, and meanwhile provides

PAGE 18


PAGE 19

Giannoukakis et al. 1993). The genetic mechanism of imprinting is still unclear, but a general hypothesis is that imprinting represents a genetic "battle of the sexes", in which maternally expressed imprinted genes usually suppress growth, while paternally expressed genes usually enhance growth. The "battle of the sexes" hypothesis is partly based on findings in animals suggesting that growth-promoting imprinted genes help ensure the continuation of the father's genes. The mother, however, is more interested in maintaining her own health and, hence, her genes "fight" the paternal genes and limit the size of the embryo or fetus. Because of their relevance to growth, it is likely that imprinted genes play a major role in the development of cancer and other conditions in which cell and tissue growth are abnormal. Imprinted genes in which the copy from the mother is turned on (maternally expressed) usually suppress growth, while paternally expressed genes usually stimulate growth. Some tumor suppressor genes should be maternally expressed, but they are mistakenly turned off, which prevents the growth-limiting protein from being made, finally

PAGE 20

(1) Develop a statistical algorithm for constructing a joint linkage-linkage disequilibrium map by simultaneously estimating the recombination fraction and linkage disequilibrium between different molecular markers in a natural human population (Chapter 2);
(2) Frame a general quantitative genetic model for detecting risk haplotypes that encode a complex disease in a set of random unrelated families from a natural population and characterizing the pattern and magnitude of genetic interactions between different haplotypes (Chapter 3);
(3) Derive a novel statistical model for computing genetic imprinting expressed at the DNA sequence level and estimating the effects of genetic imprinting on human diseases (Chapter 4);

PAGE 21

(4) Establish a robust procedure for genetic haplotyping of complex diseases in a population which is not at Hardy-Weinberg equilibrium (Chapter 5).

For each of the models developed, simulation studies will be performed to test the statistical behavior of the models. The accuracy and precision of parameter estimation and the power of the model to detect significant genes are explored under different simulation scenarios (with different sample sizes and heritabilities). I will also investigate several computational issues, including efficiency and convergence rate.

PAGE 22

Ardlie et al. 2002; Tapper et al. 2008; Weir 2008). The central rationale of this approach is founded on the expectation that allelic association (known as linkage disequilibrium or LD) between different polymorphic loci decays with their distance, such that a candidate genomic region for a disease can be refined by examining the LD between the disease and a series of SNPs typed at high density (Collins 2007; Marques et al. 2008). However, the occurrence and extent of LD in a given population is often affected by many evolutionary factors that operate on it, including mutation, selection, drift and admixture. By obscuring the signal of LD, these factors limit the application of LD mapping to narrow the region containing a causal site for the disease.

One way to circumvent this limitation is to draw a profile of LD that decays over an extended genomic region (Farnir et al. 2000; Liu et al. 2006; McRae et al. 2002). A great deal about this can be learnt from marker-by-marker association, which can then be used to define the statistical power of association studies for the disease phenotype utilizing SNPs (Schork 2002) and to guide the selection and density of such polymorphisms to create marker maps most efficient in candidate gene, candidate region, and eventually whole-genome association studies (De La Vega et al. 2002).

It is relatively straightforward to construct a map of LD decaying over physical distance between different SNPs because current technologies are able to count the base pairs between any two polymorphic sites. A number of LD maps in terms of physical distance have been reported and have been instrumental for explaining the genetic diversity of human populations (Daly et al. 2001; Dawson et al.

PAGE 23

; Gabriel et al. 2002; Reich et al. 2001; Tsunoda et al. 2004) as well as the origin of important human genes (Tishkoff et al. 1996, 2001; Tishkoff and Williams 2002).

An LD map can also be described by the pattern of LD decline over genetic distance, reflecting the recombination. Regions in the genome with extensive LD may correspond to recombination cold-spots, whereas regions with recombination hot-spots will be expected to show less extensive LD. Thus, a detailed map of LD in terms of recombination will help to identify genomic regions with recombination cold- or hot-spots and further determine the best marker density for adequate coverage. In hot-spot areas, a higher marker density is required for a better coverage of the genome than in cold-spot areas.

Several statistical approaches have been derived to jointly measure the linkage and linkage disequilibrium for open-pollinated natural populations (Wu and Zeng 2001) and domestic animals (Georges 2007). These approaches allow the construction of an LD map, but do not take into account the design of LD mapping suitable for humans. Thus far, a statistical algorithm for constructing a linkage-linkage disequilibrium map in humans has not been developed.

The purpose of this chapter is to propose a statistical design and algorithm for simultaneously estimating the recombination fraction and linkage disequilibrium for human populations. The design samples a panel of random families from a natural human population, each of which is composed of both parents and one or more children. The design fully considers the outcrossing nature of human populations. The algorithm developed is implemented with haplotype and diplotype information derived from unphased marker data collected for each member of a family. The algorithm incorporates a two-level hierarchy of estimation routes: at the upper level, population genetic parameters including haplotype frequencies, allele frequencies and linkage disequilibria are estimated from parental data, whereas at the lower level, the recombination fraction is estimated from the data of offspring genotypes whose haplotypes are transmitted from a specific mating of parental genotypes. The approach will include a testing procedure

PAGE 24

2.2.1 Sampling Strategy

Suppose there is a natural human population at Hardy-Weinberg equilibrium (Appendix A). From this population, n unrelated families are randomly sampled, each of which contains the mother, the father and one or more children. Each subject sampled is typed for single nucleotide polymorphism (SNP) markers, aimed to study the pattern of genetic diversity and structure in humans. Consider two markers A (with two alleles A and a) and B (with two alleles B and b), which are segregating in the population. The two markers form four haplotypes, AB, Ab, aB, and ab, with the frequencies denoted as $p_{11}$, $p_{10}$, $p_{01}$, and $p_{00}$, respectively. Let $p$ vs. $1-p$ and $q$ vs. $1-q$ denote the frequencies of alleles A, a and B, b, respectively, in a mixed population of males and females (assuming no sex-specific difference in allele and haplotype frequencies). Thus, we have the following relationships:

$p_{11} = pq + D$, $p_{10} = p(1-q) - D$, $p_{01} = (1-p)q - D$, $p_{00} = (1-p)(1-q) + D$,   (2-1)

where $D$ is the coefficient of linkage disequilibrium between the two markers.

The two markers produce nine joint genotypes, AABB (coded as 1), AABb (coded as 2), ..., aabb (coded as 9), which are observed. Thus, each subject will bear one of these genotypes, and the parents in each family will be one of 9 × 9 = 81 possible genotype-by-genotype combinations. Depending on the parental genotype combination, all offspring in a family will have a certain number of marker genotypes. At meiosis, a parental diplotypes

PAGE 25

Table 2-1. Data structure of two markers typed for a panel of full-sib families, each composed of the mother, father and offspring, sampled at random from a natural population.

Column headings: Group; Mother; Father; Number; offspring genotypes AABB, AABb, AAbb, AaBB, AaBb, Aabb, aaBB, aaBb, aabb.

Table 2-1 gives the structure of genotypic data collected from $n = \sum_{i=1}^{9}\sum_{j=1}^{9} n_{ij}$ random families, in which the distributions of genotypes in the mothers, fathers, and their offspring are shown.
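The population model behind this data structure can be sketched numerically. The following is a minimal illustration, assuming the standard parameterization of haplotype frequencies by the allele frequencies p and q and the disequilibrium D, Hardy-Weinberg equilibrium, and random mating; the function names and the numerical values are hypothetical and chosen only for illustration.

```python
from itertools import product


def haplotype_freqs(p, q, D):
    """Frequencies (p11, p10, p01, p00) of haplotypes AB, Ab, aB, ab."""
    return (p * q + D,
            p * (1 - q) - D,
            (1 - p) * q - D,
            (1 - p) * (1 - q) + D)


def genotype_freqs(p11, p10, p01, p00):
    """Nine genotype frequencies under HWE, coded 1..9 (AABB, AABb, ..., aabb)."""
    return [
        p11 ** 2,                     # 1 AABB
        2 * p11 * p10,                # 2 AABb
        p10 ** 2,                     # 3 AAbb
        2 * p11 * p01,                # 4 AaBB
        2 * (p11 * p00 + p10 * p01),  # 5 AaBb (diplotypes AB|ab and Ab|aB)
        2 * p10 * p00,                # 6 Aabb
        p01 ** 2,                     # 7 aaBB
        2 * p01 * p00,                # 8 aaBb
        p00 ** 2,                     # 9 aabb
    ]


def mating_freqs(geno):
    """Frequencies of the 81 mother-by-father combinations under random mating."""
    return {(i + 1, j + 1): geno[i] * geno[j]
            for i, j in product(range(9), repeat=2)}


# Illustrative (made-up) parameter values.
hap = haplotype_freqs(p=0.6, q=0.5, D=0.05)
geno = genotype_freqs(*hap)
mating = mating_freqs(geno)
```

Both the nine genotype frequencies and the 81 mating frequencies sum to one, which is a quick sanity check on any chosen values of p, q and D.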

PAGE 26

$\log L(\Omega \mid M_m, M_f, M_o) = \log L(\Omega_p \mid M_m, M_f) + \log L(\Omega_g \mid M_m, M_f, M_o, \Omega_p)$.   (2-2)

Maximizing the likelihood (2-2) is equivalent to maximizing $\log L(\Omega_p \mid M_m, M_f)$ and $\log L(\Omega_g \mid M_m, M_f, M_o, \Omega_p)$ individually. The first part is at the upper level of the joint likelihood (2-2) because it is constructed with the information from the parents. Given that its information is derived from the transmission from parents to offspring, the second part is at the lower level of the likelihood.

The upper-level likelihood is constructed from parental genotype observations and expected parental genotype frequencies. Table 2-2 shows the structure and frequencies of mother-by-father genotype combinations under random mating. For a double heterozygote AaBb, its observed genotype may be derived from two possible diplotypes, AB|ab (with a probability of $p_{11}p_{00}$) or Ab|aB (with a probability of $p_{10}p_{01}$), where the vertical lines are used to separate the two underlying haplotypes of a diplotype. A polynomial function for

PAGE 27

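$\log L(\Omega_p \mid M_m, M_f) = \text{constant} + n_{11}\log p_{11}^4 + n_{12}\log(2p_{11}^3 p_{10}) + n_{13}\log(p_{11}^2 p_{10}^2) + n_{14}\log(2p_{11}^3 p_{01}) + n_{15}\log[2p_{11}^2(p_{11}p_{00} + p_{10}p_{01})] + n_{16}\log(2p_{11}^2 p_{10}p_{00}) + n_{17}\log(p_{11}^2 p_{01}^2) + n_{18}\log(2p_{11}^2 p_{01}p_{00}) + n_{19}\log(p_{11}^2 p_{00}^2) + \cdots + n_{55}\log[4(p_{11}p_{00} + p_{10}p_{01})^2] + \cdots + n_{99}\log(p_{00}^4)$.   (2-3)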
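The upper-level log-likelihood can be evaluated directly once the nine genotype frequencies are written out. The sketch below assumes a 9 × 9 table of family counts indexed by mother and father genotype codes and omits the multinomial constant; `upper_loglik` and `genotype_freqs` are illustrative names, not part of the original text.

```python
import math


def genotype_freqs(p11, p10, p01, p00):
    """Nine two-marker genotype frequencies under HWE (codes 1..9)."""
    return [p11 ** 2, 2 * p11 * p10, p10 ** 2,
            2 * p11 * p01, 2 * (p11 * p00 + p10 * p01), 2 * p10 * p00,
            p01 ** 2, 2 * p01 * p00, p00 ** 2]


def upper_loglik(n, hap):
    """Parental log-likelihood, up to an additive constant.

    n[i][j] is the number of families whose mother has genotype code i+1
    and whose father has genotype code j+1; hap = (p11, p10, p01, p00).
    """
    g = genotype_freqs(*hap)
    return sum(n[i][j] * math.log(g[i] * g[j])
               for i in range(9) for j in range(9) if n[i][j] > 0)
```

Because the mating frequency of each cell is the product of the two parental genotype frequencies, each term of the sum corresponds to one term of the likelihood, for example the AABB × AABB cell contributes the logarithm of the fourth power of the AB haplotype frequency.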

PAGE 28

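Table 2-2. Mating frequencies of families and offspring genotype frequencies per family for two markers sampled from a natural population.

No.   Mother × Father   Mating frequency (mother; father)   Offspring genotype frequencies
1     AABB × AABB   $p_{11}^2$; $p_{11}^2$   AABB: 1
2     AABB × AABb   $p_{11}^2$; $2p_{11}p_{10}$   AABB: $\frac12$, AABb: $\frac12$
3     AABB × AAbb   $p_{11}^2$; $p_{10}^2$   AABb: 1
4     AABB × AaBB   $p_{11}^2$; $2p_{11}p_{01}$   AABB: $\frac12$, AaBB: $\frac12$
5     AABB × AaBb   $p_{11}^2$; $\{2p_{11}p_{00},\ 2p_{10}p_{01}\}$   AABB: $\frac12\omega_1$, AABb: $\frac12\omega_2$, AaBB: $\frac12\omega_2$, AaBb: $\frac12\omega_1$
6     AABB × Aabb   $p_{11}^2$; $2p_{10}p_{00}$   AABb: $\frac12$, AaBb: $\frac12$
7     AABB × aaBB   $p_{11}^2$; $p_{01}^2$   AaBB: 1
8     AABB × aaBb   $p_{11}^2$; $2p_{01}p_{00}$   AaBB: $\frac12$, AaBb: $\frac12$
9     AABB × aabb   $p_{11}^2$; $p_{00}^2$   AaBb: 1
...
41    AaBb × AaBb   $\{2p_{11}p_{00},\ 2p_{10}p_{01}\}$; $\{2p_{11}p_{00},\ 2p_{10}p_{01}\}$   AABB: $\frac14\omega_1^2$, AABb: $\frac12\omega_1\omega_2$, AAbb: $\frac14\omega_2^2$, AaBB: $\frac12\omega_1\omega_2$, AaBb: $\frac12(\omega_1^2+\omega_2^2)$, Aabb: $\frac12\omega_1\omega_2$, aaBB: $\frac14\omega_2^2$, aaBb: $\frac12\omega_1\omega_2$, aabb: $\frac14\omega_1^2$
...
81    aabb × aabb   $p_{00}^2$; $p_{00}^2$   aabb: 1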

PAGE 29

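The relative frequencies of the two diplotypes, AB|ab and Ab|aB, of a double heterozygote are

$\phi = \dfrac{p_{11}p_{00}}{p_{11}p_{00} + p_{10}p_{01}}$   (2-4)

and

$1-\phi = \dfrac{p_{10}p_{01}}{p_{11}p_{00} + p_{10}p_{01}}$,   (2-5)

respectively. Both diplotypes will produce haplotypes AB, Ab, aB, and ab, with frequencies defined as follows, where $\theta$ is the recombination fraction between the two markers:

Parent AaBb                Haplotype
Diplotype   Frequency   AB   Ab   aB   ab
AB|ab   $\phi$   $\frac12(1-\theta)$   $\frac12\theta$   $\frac12\theta$   $\frac12(1-\theta)$
Ab|aB   $1-\phi$   $\frac12\theta$   $\frac12(1-\theta)$   $\frac12(1-\theta)$   $\frac12\theta$

Thus, the overall haplotype frequencies produced by this parent are calculated as $\frac12\omega_1$ for AB or ab and $\frac12\omega_2$ for Ab or aB, where

$\omega_1 = \phi(1-\theta) + (1-\phi)\theta$, $\omega_2 = \phi\theta + (1-\phi)(1-\theta)$.   (2-6)

Table 2-2 provides the conditional probabilities of offspring genotypes or diplotypes given joint mother and father genotypes. Based on the information about genetic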

PAGE 30

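$\log L(\Omega_g \mid M_m, M_f, M_o, \Omega_p) = \text{constant} + n_{111}\log(1) + n_{121}\log(\tfrac12) + n_{122}\log(\tfrac12) + n_{132}\log(1) + n_{141}\log(\tfrac12) + n_{144}\log(\tfrac12) + (n_{151}+n_{155})\log(\tfrac12\omega_1) + (n_{152}+n_{154})\log(\tfrac12\omega_2) + \cdots + (n_{551}+n_{559})\log(\tfrac14\omega_1^2) + (n_{552}+n_{554}+n_{556}+n_{558})\log(\tfrac12\omega_1\omega_2) + (n_{553}+n_{557})\log(\tfrac14\omega_2^2) + n_{555}\log[\tfrac12(\omega_1^2+\omega_2^2)] + \cdots + n_{999}\log(1)$.   (2-7)

In the next section, an algorithmic procedure will be described to estimate the parameters that define the likelihood (2-3).

From the upper-level likelihood (2-3), we derive a closed form for the EM algorithm to estimate haplotype frequencies. This procedure is described as follows. In the E step, we calculate the probability with which a double heterozygote parent carries a particular diplotype with equations (2-4) and (2-5). In the M step, haplotype frequencies are estimated with the calculated diplotype probability by

$\hat{p}_{11} = \frac{1}{4n}[4n_{11} + 3(n_{12}+n_{21}+n_{14}+n_{41}) + 2(n_{22}+n_{44}+n_{13}+n_{31}+n_{15}+n_{51}+n_{16}+n_{61}+n_{17}+n_{71}+n_{18}+n_{81}+n_{19}+n_{91}+n_{24}+n_{42}) + (n_{23}+n_{32}+n_{25}+n_{52}+n_{26}+n_{62}+n_{27}+n_{72}+n_{28}+n_{82}+n_{29}+n_{92}+n_{34}+n_{43}+n_{45}+n_{54}+n_{46}+n_{64}+n_{47}+n_{74}+n_{48}+n_{84}+n_{49}+n_{94}) + \phi(n_{15}+n_{51}+n_{25}+n_{52}+n_{35}+n_{53}+n_{45}+n_{54}+n_{65}+n_{56}+n_{75}+n_{57}+n_{85}+n_{58}+n_{95}+n_{59}) + 2\phi n_{55}]$,   (2-8)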
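The E and M steps for haplotype frequencies can be sketched in a few lines. This is a simplified illustration rather than the family-indexed formulas themselves: it assumes the mothers and fathers have been pooled into a single vector of nine genotype counts, which gives the same haplotype counting because each parent contributes two haplotypes. The name `em_haplotypes` and the starting values are assumptions.

```python
def em_haplotypes(c, tol=1e-10, max_iter=1000):
    """EM estimates of (p11, p10, p01, p00) from pooled parental genotypes.

    c[g] is the number of parents (mothers and fathers together) with
    genotype code g+1, in the order AABB, AABb, AAbb, AaBB, AaBb, Aabb,
    aaBB, aaBb, aabb.
    """
    total = 2 * sum(c)                   # every parent carries two haplotypes
    p11 = p10 = p01 = p00 = 0.25         # arbitrary starting point
    for _ in range(max_iter):
        # E step: probability that a double heterozygote is AB|ab (not Ab|aB).
        phi = p11 * p00 / (p11 * p00 + p10 * p01)
        # M step: count haplotypes, splitting AaBb parents by phi.
        new = (
            (2 * c[0] + c[1] + c[3] + phi * c[4]) / total,        # AB
            (2 * c[2] + c[1] + c[5] + (1 - phi) * c[4]) / total,  # Ab
            (2 * c[6] + c[3] + c[7] + (1 - phi) * c[4]) / total,  # aB
            (2 * c[8] + c[5] + c[7] + phi * c[4]) / total,        # ab
        )
        converged = max(abs(a - b) for a, b in zip(new, (p11, p10, p01, p00))) < tol
        p11, p10, p01, p00 = new
        if converged:
            break
    return p11, p10, p01, p00
```

Allele frequencies and the disequilibrium then follow from the estimated haplotype frequencies, for instance D as the difference between the product of the AB and ab frequencies and the product of the Ab and aB frequencies.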

PAGE 31

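$\hat{p}_{10} = \frac{1}{4n}[4n_{33} + 3(n_{23}+n_{32}+n_{36}+n_{63}) + 2(n_{22}+n_{66}+n_{13}+n_{31}+n_{26}+n_{62}+n_{34}+n_{43}+n_{35}+n_{53}+n_{37}+n_{73}+n_{38}+n_{83}+n_{39}+n_{93}) + (n_{12}+n_{21}+n_{16}+n_{61}+n_{24}+n_{42}+n_{25}+n_{52}+n_{27}+n_{72}+n_{28}+n_{82}+n_{29}+n_{92}+n_{46}+n_{64}+n_{56}+n_{65}+n_{67}+n_{76}+n_{68}+n_{86}+n_{69}+n_{96}) + (1-\phi)(n_{15}+n_{51}+n_{25}+n_{52}+n_{35}+n_{53}+n_{45}+n_{54}+n_{65}+n_{56}+n_{75}+n_{57}+n_{85}+n_{58}+n_{95}+n_{59}) + 2(1-\phi)n_{55}]$,   (2-9)

$\hat{p}_{01} = \frac{1}{4n}[4n_{77} + 3(n_{78}+n_{87}+n_{47}+n_{74}) + 2(n_{88}+n_{44}+n_{97}+n_{79}+n_{84}+n_{48}+n_{76}+n_{67}+n_{75}+n_{57}+n_{73}+n_{37}+n_{72}+n_{27}+n_{71}+n_{17}) + (n_{98}+n_{89}+n_{94}+n_{49}+n_{86}+n_{68}+n_{85}+n_{58}+n_{83}+n_{38}+n_{82}+n_{28}+n_{81}+n_{18}+n_{64}+n_{46}+n_{54}+n_{45}+n_{43}+n_{34}+n_{42}+n_{24}+n_{41}+n_{14}) + (1-\phi)(n_{95}+n_{59}+n_{85}+n_{58}+n_{75}+n_{57}+n_{65}+n_{56}+n_{45}+n_{54}+n_{35}+n_{53}+n_{25}+n_{52}+n_{15}+n_{51}) + 2(1-\phi)n_{55}]$,   (2-10)

$\hat{p}_{00} = \frac{1}{4n}[4n_{99} + 3(n_{98}+n_{89}+n_{96}+n_{69}) + 2(n_{88}+n_{66}+n_{97}+n_{79}+n_{95}+n_{59}+n_{94}+n_{49}+n_{93}+n_{39}+n_{92}+n_{29}+n_{91}+n_{19}+n_{86}+n_{68}) + (n_{87}+n_{78}+n_{85}+n_{58}+n_{84}+n_{48}+n_{83}+n_{38}+n_{82}+n_{28}+n_{81}+n_{18}+n_{76}+n_{67}+n_{65}+n_{56}+n_{64}+n_{46}+n_{63}+n_{36}+n_{62}+n_{26}+n_{61}+n_{16}) + \phi(n_{95}+n_{59}+n_{85}+n_{58}+n_{75}+n_{57}+n_{65}+n_{56}+n_{45}+n_{54}+n_{35}+n_{53}+n_{25}+n_{52}+n_{15}+n_{51}) + 2\phi n_{55}]$.   (2-11)

The E and M steps are iterated between equations (2-4) and (2-5) and equations (2-8), (2-9), (2-10), and (2-11) until the estimates converge to stable values. The estimates at convergence are the maximum likelihood estimates (MLEs) of haplotype frequencies. The MLEs of allele frequencies and linkage disequilibrium can be obtained by solving equation (2-1), expressed as

$\hat{p} = \hat{p}_{11} + \hat{p}_{10}$, $\hat{q} = \hat{p}_{11} + \hat{p}_{01}$, $\hat{D} = \hat{p}_{11}\hat{p}_{00} - \hat{p}_{10}\hat{p}_{01}$.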

PAGE 32

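From the lower-level likelihood (2-7), we derive a closed form for the EM algorithm to estimate the recombination fraction. This procedure is described as follows. In the E step, we calculate the probability with which a considered haplotype produced by a double heterozygote parent is the recombinant type using

$\Phi_1 = \dfrac{(1-\phi)\theta}{\phi(1-\theta) + (1-\phi)\theta}$ for haplotype AB or ab,
$\Phi_2 = \dfrac{\phi\theta}{\phi\theta + (1-\phi)(1-\theta)}$ for haplotype Ab or aB,   (2-13)

and the probability $\Psi$ with which a double heterozygote offspring carries recombinant haplotypes by equation (2-14).

In the M step, the estimate of the recombination fraction is obtained by

$\hat{\theta} = \dfrac{m}{M}$,   (2-15)

where $m$ equals the sum of the following terms,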

PAGE 33

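$\Phi_1(n_{511}+n_{515}) + \Phi_2(n_{512}+n_{514})$ for parental genotype combination 5 × 1
$+\ \Phi_1(n_{521}+n_{526}) + \Phi_2(n_{523}+n_{524})$ for parental genotype combination 5 × 2
$+\ \Phi_1(n_{532}+n_{536}) + \Phi_2(n_{533}+n_{535})$ for parental genotype combination 5 × 3
$+\ \Phi_1(n_{541}+n_{548}) + \Phi_2(n_{542}+n_{547})$ for parental genotype combination 5 × 4
$+\ \Phi_1(n_{562}+n_{569}) + \Phi_2(n_{563}+n_{568})$ for parental genotype combination 5 × 6
$+\ \Phi_1(n_{574}+n_{578}) + \Phi_2(n_{575}+n_{577})$ for parental genotype combination 5 × 7
$+\ \Phi_1(n_{584}+n_{589}) + \Phi_2(n_{586}+n_{587})$ for parental genotype combination 5 × 8
$+\ \Phi_1(n_{595}+n_{599}) + \Phi_2(n_{596}+n_{598})$ for parental genotype combination 5 × 9
$+\ \Phi_1(n_{151}+n_{155}) + \Phi_2(n_{152}+n_{154})$ for parental genotype combination 1 × 5
$+\ \Phi_1(n_{251}+n_{256}) + \Phi_2(n_{253}+n_{254})$ for parental genotype combination 2 × 5
$+\ \Phi_1(n_{352}+n_{356}) + \Phi_2(n_{353}+n_{355})$ for parental genotype combination 3 × 5
$+\ \Phi_1(n_{451}+n_{458}) + \Phi_2(n_{452}+n_{457})$ for parental genotype combination 4 × 5
$+\ \Phi_1(n_{652}+n_{659}) + \Phi_2(n_{653}+n_{658})$ for parental genotype combination 6 × 5
$+\ \Phi_1(n_{754}+n_{758}) + \Phi_2(n_{755}+n_{757})$ for parental genotype combination 7 × 5
$+\ \Phi_1(n_{854}+n_{859}) + \Phi_2(n_{856}+n_{857})$ for parental genotype combination 8 × 5
$+\ \Phi_1(n_{955}+n_{959}) + \Phi_2(n_{956}+n_{958})$ for parental genotype combination 9 × 5
$+\ 2\Phi_1(n_{551}+n_{559}) + 2\Phi_2(n_{553}+n_{557}) + (\Phi_1+\Phi_2)(n_{552}+n_{554}+n_{556}+n_{558}) + \Psi n_{555}$ for parental genotype combination 5 × 5,

and

$M = \sum_{i}^{9}\sum_{j}^{9}\sum_{k}^{9} n_{ijk} - n_{522} - n_{525} - n_{544} - n_{545} - n_{565} - n_{566} - n_{585} - n_{588} - n_{252} - n_{255} - n_{454} - n_{455} - n_{655} - n_{656} - n_{855} - n_{858}$.

The E and M steps are iterated between equations (2-13), (2-14) and (2-15) until stable estimates are obtained. The standard deviation of the estimate of the recombination fraction is further derived.

To obtain the estimate of the recombination fraction, the likelihood formulated by the multinomial distribution with the parameters $\sum_{i}^{9}\sum_{j}^{9}\sum_{k}^{9} n_{ijk}$ and $p_{ijk}$ needs to be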

PAGE 34

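$\tfrac12 p_{11}^2 \cdot 2p_{11}p_{10}) + \cdots + n_{151}\log(\tfrac12 p_{11}^2[2p_{11}p_{00}(1-\theta) + 2p_{10}p_{01}\theta]) + n_{152}\log(\tfrac12 p_{11}^2[2p_{11}p_{00}\theta + 2p_{10}p_{01}(1-\theta)]) + \cdots + n_{555}\log(\tfrac12\{[2p_{11}p_{00}(1-\theta) + 2p_{10}p_{01}\theta]^2 + [2p_{11}p_{00}\theta + 2p_{10}p_{01}(1-\theta)]^2\}) + \cdots + n_{999}\log(p_{00}^4)$.

Since $p_{11}, p_{10}, p_{01}, p_{00}$ are known from the estimation through the upper-level likelihood (2-3), the above equation is equal to

$\text{Constant} + n_{151}\log(\tfrac12 p_{11}^2[2p_{11}p_{00}(1-\theta) + 2p_{10}p_{01}\theta]) + n_{152}\log(\tfrac12 p_{11}^2[2p_{11}p_{00}\theta + 2p_{10}p_{01}(1-\theta)]) + \cdots + n_{555}\log(\tfrac12\{[2p_{11}p_{00}(1-\theta) + 2p_{10}p_{01}\theta]^2 + [2p_{11}p_{00}\theta + 2p_{10}p_{01}(1-\theta)]^2\}) + \cdots + n_{959}\log(\tfrac12 p_{00}^2[2p_{11}p_{00}(1-\theta) + 2p_{10}p_{01}\theta])$.

Take the cell 151 from the above likelihood as an example. In the E step, the probability with which a considered haplotype produced by a double heterozygote parent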

PAGE 35

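is the recombinant type is

$\Phi_1^{\text{joint}} = \dfrac{p_{10}p_{01}\theta}{p_{11}p_{00}(1-\theta) + p_{10}p_{01}\theta}$.

If we instead use the cell 151 from the likelihood based on the conditional probabilities shown in (2-7), then in the E step the probability with which a considered haplotype produced by a double heterozygote parent is the recombinant type is calculated using (2-13), i.e.,

$\Phi_1^{\text{cond}} = \Phi_1 = \dfrac{(1-\phi)\theta}{\phi(1-\theta) + (1-\phi)\theta} = \dfrac{p_{10}p_{01}\theta}{p_{11}p_{00}(1-\theta) + p_{10}p_{01}\theta} = \Phi_1^{\text{joint}}$.

Similarly, $\Phi_2$ and $\Psi$ in (2-13) and (2-14) obtained with the conditional probabilities are the same as those obtained with the joint probabilities. Therefore, we conclude that estimating the recombination fraction with the joint probabilities is equivalent to estimating it with the conditional probabilities in the lower-level likelihood. □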
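The equivalence argued above can be checked numerically. The sketch below assumes that the recombination fraction is written as theta, that phi is the relative frequency of the AB|ab diplotype among double heterozygotes, and that the two posterior probabilities refer to haplotypes AB or ab and Ab or aB respectively; the function names are hypothetical.

```python
def recombinant_posteriors_cond(phi, theta):
    """Posterior recombinant probabilities written with the diplotype proportion phi."""
    post_ab = (1 - phi) * theta / (phi * (1 - theta) + (1 - phi) * theta)   # AB or ab
    post_aB = phi * theta / (phi * theta + (1 - phi) * (1 - theta))         # Ab or aB
    return post_ab, post_aB


def recombinant_posteriors_joint(p11, p10, p01, p00, theta):
    """The same posteriors written directly with haplotype frequencies."""
    cis, trans = p11 * p00, p10 * p01   # weights of diplotypes AB|ab and Ab|aB
    post_ab = trans * theta / (cis * (1 - theta) + trans * theta)
    post_aB = cis * theta / (cis * theta + trans * (1 - theta))
    return post_ab, post_aB


# Illustrative (made-up) values.
hap = (0.35, 0.25, 0.15, 0.25)
theta = 0.1
phi = hap[0] * hap[3] / (hap[0] * hap[3] + hap[1] * hap[2])
cond = recombinant_posteriors_cond(phi, theta)
joint = recombinant_posteriors_joint(*hap, theta)
```

The two parameterizations agree because the common factor, the total frequency of double heterozygote diplotypes, cancels between numerator and denominator.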

PAGE 36

The likelihood under the $H_1$ of hypotheses (2-16) is calculated with the following steps. First, the observations for each of the nine possible genotypes for the two markers in females and males are collapsed using $n_{i\cdot} = \sum_{j=1}^{9} n_{ij}$ and $n_{\cdot j} = \sum_{i=1}^{9} n_{ij}$, respectively. Second, from these collapsed observations, sex-specific haplotype frequencies are estimated with the EM algorithm. Third, because of differences between males and females, the frequencies of parental diplotype (and therefore genotype) combinations in Table 2-2 will be replaced by the products of sex-specific diplotype frequencies, from which the likelihood is estimated.

Sex-specific differences in allele frequencies and LD can be tested with the null hypotheses $H_0: p^m = p^f$ and $q^m = q^f$, and $H_0: D^m = D^f$, which are equivalent to the

PAGE 37

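Parameter estimation under these null hypotheses can be conducted using the EM algorithm implemented with the constraints in equations (2-17), (2-18), and (2-19), respectively.

If there is a sex-specific difference in the recombination fraction, we will redefine the overall probability of a haplotype in equation (2-6) by

$\omega_1^m = \phi_m(1-\theta_m) + (1-\phi_m)\theta_m$, $\omega_2^m = \phi_m\theta_m + (1-\phi_m)(1-\theta_m)$

and

$\omega_1^f = \phi_f(1-\theta_f) + (1-\phi_f)\theta_f$, $\omega_2^f = \phi_f\theta_f + (1-\phi_f)(1-\theta_f)$,

where $\phi_m = p_{11}^m p_{00}^m/(p_{11}^m p_{00}^m + p_{10}^m p_{01}^m)$ and $\phi_f = p_{11}^f p_{00}^f/(p_{11}^f p_{00}^f + p_{10}^f p_{01}^f)$. These sex-specific overall probabilities are used to express the diplotype or genotype frequencies of offspring within a family. Then, the EM algorithm is derived to estimate the sex-specific recombination fractions $\theta_m$ and $\theta_f$. Whether the recombination fraction is sex-specific can be tested by formulating the hypotheses from which a log-likelihood ratio is then calculated.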
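A small sketch of the quantities involved in this test follows. It assumes the overall haplotype probabilities have the form phi(1 - theta) + (1 - phi)theta and its complement, computed separately for each sex, and it assumes (this is not stated in the text) that the log-likelihood ratio is referred to a chi-square distribution with one degree of freedom when a single parameter is constrained. All function names and numbers are illustrative.

```python
import math


def omega(phi, theta):
    """Overall probabilities (omega1, omega2) of AB-or-ab and Ab-or-aB gametes."""
    w1 = phi * (1 - theta) + (1 - phi) * theta
    w2 = phi * theta + (1 - phi) * (1 - theta)
    return w1, w2


def lr_statistic(loglik_null, loglik_alt):
    """Log-likelihood ratio statistic, -2 (log L0 - log L1)."""
    return -2.0 * (loglik_null - loglik_alt)


def chi2_sf_df1(x):
    """Upper tail of a chi-square with 1 degree of freedom (assumed reference)."""
    return math.erfc(math.sqrt(x / 2.0))


# Made-up sex-specific values: same diplotype proportion, different recombination.
w_male = omega(phi=0.7, theta=0.05)
w_female = omega(phi=0.7, theta=0.20)
```

In practice the two log-likelihoods would come from fitting the model with a common recombination fraction and with sex-specific recombination fractions, and a simulation-based threshold could replace the chi-square approximation.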

PAGE 38

2-1 (2-21)

These eight haplotypes generate 64 possible diplotypes, which lead to 27 genotypes. Diplotype frequencies for the three markers are expressed as the products of haplotype frequencies under the HWE assumption. The frequencies of diplotypes that are genotypically identical are collapsed so that genotype frequencies are calculated. Analogous to Table 2-2, we can tabulate the mating frequencies of different genotypes between the maternal

PAGE 39

Table 2-3. As shown in this table, three markers produce 27 × 27 = 729 possible parental genotype combinations. Let $n_{ij}$ denote the number of families from the combination between mother genotype $i$ and father genotype $j$ for the three markers ($i, j = 1, 2, \ldots, 27$) and $n_{i,j,k}$ denote the number of offspring with genotype $k$ derived from parental genotype combination $i \times j$ ($k = 1, 2, \ldots, 27$). Similarly to Table 2-1, the structure of genotypic data collected from $n = \sum_{i=1}^{27}\sum_{j=1}^{27} n_{ij}$ random families, showing the distribution of genotypes in the mothers, fathers, and their offspring, can be tabulated. With such a mating structure, we can formulate a multinomial likelihood for marker genotype observations, from which the EM algorithm is implemented to obtain the MLEs of haplotype frequencies $\mathbf{p} = (p_{111}, p_{110}, p_{101}, p_{100}, p_{011}, p_{010}, p_{001}, p_{000})$.
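The counts quoted for the three-marker case can be verified by brute-force enumeration. The sketch below is only a bookkeeping aid; the string encoding of haplotypes and genotypes is an assumption made for illustration.

```python
from itertools import product

# Eight haplotypes ABC, ABc, ..., abc for three biallelic markers.
haplotypes = ["".join(h) for h in product("Aa", "Bb", "Cc")]

# 64 ordered diplotypes (maternal haplotype, paternal haplotype).
diplotypes = list(product(haplotypes, repeat=2))


def genotype(diplotype):
    """Collapse a diplotype to its unphased genotype, e.g. ('ABC', 'abc') -> 'AaBbCc'."""
    h1, h2 = diplotype
    return "".join("".join(sorted(pair)) for pair in zip(h1, h2))


genotypes = sorted({genotype(d) for d in diplotypes})

# Unordered diplotypes hidden behind the triple heterozygote AaBbCc.
triple_het = {frozenset(d) for d in diplotypes if genotype(d) == "AaBbCc"}

n_matings = len(genotypes) ** 2
```

The enumeration returns 8 haplotypes, 64 ordered diplotypes, 27 genotypes, 729 mother-by-father combinations, and four distinct diplotypes (ABC|abc, ABc|abC, AbC|aBc, Abc|aBC) for the triple heterozygote.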

PAGE 40

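Table 2-3. Mating frequencies of families and offspring genotype frequencies per family for three markers sampled from a natural population. Only the offspring genotype columns AABBCC, AABBCc, AABBcc, AABbCC, AABbCc, ..., aabbCc, aabbcc are shown.

No.   Mother × Father   Mating frequency (mother; father)   Offspring genotype frequencies
1     AABBCC × AABBCC   $p_{111}^2$; $p_{111}^2$   AABBCC: 1
2     AABBCC × AABBCc   $p_{111}^2$; $2p_{111}p_{110}$   AABBCC: $\frac12$, AABBCc: $\frac12$
3     AABBCC × AABBcc   $p_{111}^2$; $p_{110}^2$   AABBCc: 1
4     AABBCC × AABbCC   $p_{111}^2$; $2p_{111}p_{101}$   AABBCC: $\frac12$, AABbCC: $\frac12$
5     AABBCC × AABbCc   $p_{111}^2$; $\{2p_{111}p_{100},\ 2p_{110}p_{101}\}$   AABBCC: $\frac12\omega_{11}$, AABBCc: $\frac12(1-\omega_{11})$, AABbCC: $\frac12(1-\omega_{11})$, AABbCc: $\frac12\omega_{11}$
...
14    AABBCC × AaBbCc   $p_{111}^2$; $\{2p_{111}p_{000},\ 2p_{110}p_{001},\ 2p_{101}p_{010},\ 2p_{100}p_{011}\}$   AABBCC: $\frac12\omega_1$, AABBCc: $\frac12\omega_2$, AABbCC: $\frac12\omega_3$, AABbCc: $\frac12\omega_4$, ...
...
365   AaBbCc × AaBbCc   $\{2p_{111}p_{000},\ 2p_{110}p_{001},\ 2p_{101}p_{010},\ 2p_{100}p_{011}\}$; $\{2p_{111}p_{000},\ 2p_{110}p_{001},\ 2p_{101}p_{010},\ 2p_{100}p_{011}\}$   AABBCC: $\frac14\omega_1^2$, AABBCc: $\frac14\omega_1\omega_2$, AABBcc: $\frac14\omega_2^2$, AABbCC: $\frac14\omega_1\omega_3$, AABbCc: $\frac14(\omega_1\omega_4+\omega_2\omega_3)$, ..., aabbCc: $\frac14\omega_1\omega_2$, aabbcc: $\frac14\omega_1^2$
...
728   aabbcc × aabbCc   $p_{000}^2$; $2p_{000}p_{001}$   aabbCc: $\frac12$, aabbcc: $\frac12$
729   aabbcc × aabbcc   $p_{000}^2$; $p_{000}^2$   aabbcc: 1

(2-31) and (2-30).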

PAGE 41

(2-4) and (2-5), the relative frequencies of these diplotypes can be expressed in terms of diplotype (and therefore haplotype) frequencies in the population.

Assume that the three markers are in the order A-B-C. We use $g_{00}$, $g_{01}$, $g_{10}$, and $g_{11}$ to denote the probabilities with which there is no crossover between markers A and B as well as between markers B and C, there is no crossover between markers A and B but one crossover between markers B and C, there is one crossover between markers A and B but no crossover between markers B and C, and there is one crossover between markers A and B as well as between markers B and C, respectively. The recombination frequencies between markers A and B ($\theta_{12}$), between markers B and C ($\theta_{23}$), and between markers A and C ($\theta_{13}$) can be expressed in terms of these $g$ probabilities, i.e.,

PAGE 42

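$\theta_{12} = g_{10} + g_{11}$, $\theta_{23} = g_{01} + g_{11}$, $\theta_{13} = g_{10} + g_{01}$,   (2-22)

or, equivalently,

$g_{11} = \tfrac12(\theta_{12} + \theta_{23} - \theta_{13})$, $g_{10} = \tfrac12(\theta_{12} + \theta_{13} - \theta_{23})$, $g_{01} = \tfrac12(\theta_{13} + \theta_{23} - \theta_{12})$, $g_{00} = 1 - g_{11} - g_{10} - g_{01}$.

From these expressions, the coefficient of genetic interference between two adjacent marker intervals can be derived as

$I = 1 - \dfrac{g_{11}}{\theta_{12}\theta_{23}}$.

Non-interference, i.e., $I = 0$, means that we have

$g_{11} = \theta_{12}\theta_{23}$.

If there is no double recombination between two adjacent marker intervals, this means $I = 1$, or

$g_{11} = 0$.

The frequencies of eight different haplotypes, ABC, ABc, AbC, Abc, aBC, aBc, abC, and abc, produced by this heterozygous parent are expressed as a function of the $g$ probabilities, but depending on the diplotype of this parent, i.e.,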

PAGE 43

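Haplotype   ABC|abc   ABc|abC   AbC|aBc   Abc|aBC
ABC   $\frac12 g_{00}$   $\frac12 g_{01}$   $\frac12 g_{11}$   $\frac12 g_{10}$
ABc   $\frac12 g_{01}$   $\frac12 g_{00}$   $\frac12 g_{10}$   $\frac12 g_{11}$
AbC   $\frac12 g_{11}$   $\frac12 g_{10}$   $\frac12 g_{00}$   $\frac12 g_{01}$
Abc   $\frac12 g_{10}$   $\frac12 g_{11}$   $\frac12 g_{01}$   $\frac12 g_{00}$
aBC   $\frac12 g_{10}$   $\frac12 g_{11}$   $\frac12 g_{01}$   $\frac12 g_{00}$
aBc   $\frac12 g_{11}$   $\frac12 g_{10}$   $\frac12 g_{00}$   $\frac12 g_{01}$
abC   $\frac12 g_{01}$   $\frac12 g_{00}$   $\frac12 g_{10}$   $\frac12 g_{11}$
abc   $\frac12 g_{00}$   $\frac12 g_{01}$   $\frac12 g_{11}$   $\frac12 g_{10}$

For double heterozygous parents, haplotype frequencies will be collapsed when this kind of parent produces the same haplotype through different mechanisms. As an example, the haplotype frequencies for the double heterozygote AaBbCC are calculated as follows:

Haplotype   Parental diplotype
ABC   $\frac12(g_{00}+g_{01})$   $\frac12(g_{01}+g_{00})$
AbC   $\frac12(g_{11}+g_{10})$   $\frac12(g_{10}+g_{11})$
aBC   $\frac12(g_{10}+g_{11})$   $\frac12(g_{11}+g_{10})$
abC   $\frac12(g_{01}+g_{00})$   $\frac12(g_{00}+g_{01})$

For any parental mating of different genotypes, we can calculate the frequencies of offspring genotypes for the three markers. Similar to Table 2-2, the distribution and frequencies of offspring genotypes within each family can be derived for the three-point model. Based on observations of offspring genotypes within each family, a multinomial likelihood can be formulated from which the EM algorithm of a complex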

PAGE 44

$\log L(\Omega_p \mid M_m, M_f) = \text{constant} + n_{1,1}\log p_{111}^4 + n_{1,2}\log(2p_{111}^3 p_{110}) + n_{1,3}\log(p_{111}^2 p_{110}^2) + n_{1,4}\log(2p_{111}^3 p_{101}) + n_{1,5}\log[2p_{111}^2(p_{111}p_{100} + p_{110}p_{101})] + \cdots + n_{1,14}\log[2p_{111}^2(p_{111}p_{000} + p_{110}p_{001} + p_{100}p_{011} + p_{101}p_{010})] + \cdots + n_{14,14}\log[(p_{111}p_{000} + p_{110}p_{001} + p_{100}p_{011} + p_{101}p_{010})^2] + \cdots + n_{27,27}\log(p_{000}^4)$.

Similar to the two-point design, for a given parental genotype combination, a certain group of offspring genotypes is produced. For example, if the mother and father both have homozygous genotype AABBCC, then their offspring will always have diplotype ABC|ABC or genotype AABBCC. If one of the parents is AABBCC and the other is AABBCc, then the offspring have diplotypes ABC|ABC (or genotype AABBCC) and ABC|ABc (or genotype AABBCc) with an equal frequency. For a parent with triple heterozygous genotype AaBbCc, there will be four possible diplotypes, ABC|abc, ABc|abC, AbC|aBc or Abc|aBC, whose relative frequencies are

PAGE 45

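2.3.2).

Parent AaBbCc                         Haplotype
Diplotype   Frequency   ABC   ABc   AbC   Abc   aBC   aBc   abC   abc
ABC|abc   $\phi_1$   $\frac12 g_{00}$   $\frac12 g_{01}$   $\frac12 g_{11}$   $\frac12 g_{10}$   $\frac12 g_{10}$   $\frac12 g_{11}$   $\frac12 g_{01}$   $\frac12 g_{00}$
ABc|abC   $\phi_2$   $\frac12 g_{01}$   $\frac12 g_{00}$   $\frac12 g_{10}$   $\frac12 g_{11}$   $\frac12 g_{11}$   $\frac12 g_{10}$   $\frac12 g_{00}$   $\frac12 g_{01}$
AbC|aBc   $\phi_3$   $\frac12 g_{11}$   $\frac12 g_{10}$   $\frac12 g_{00}$   $\frac12 g_{01}$   $\frac12 g_{01}$   $\frac12 g_{00}$   $\frac12 g_{10}$   $\frac12 g_{11}$
Abc|aBC   $1-\sum_{i=1}^{3}\phi_i$   $\frac12 g_{10}$   $\frac12 g_{11}$   $\frac12 g_{01}$   $\frac12 g_{00}$   $\frac12 g_{00}$   $\frac12 g_{01}$   $\frac12 g_{11}$   $\frac12 g_{10}$

Write it in matrix expression as

$\boldsymbol{\omega} = \begin{pmatrix} \phi_2 & \phi_1 & \phi_4 & \phi_3 \\ \phi_1 & \phi_2 & \phi_3 & \phi_4 \\ \phi_4 & \phi_3 & \phi_2 & \phi_1 \\ \phi_3 & \phi_4 & \phi_1 & \phi_2 \end{pmatrix}\mathbf{G}$, with $\phi_4 = 1 - \phi_1 - \phi_2 - \phi_3$,

where $\boldsymbol{\omega} = [\omega_1, \omega_2, \omega_3, \omega_4]'$ and $\mathbf{G} = [g_{01}, g_{00}, g_{10}, g_{11}]'$. Thus, the overall haplotype frequencies produced by the parent with the triple heterozygous genotype are calculated as $\frac12\omega_1$ for ABC or abc, $\frac12\omega_2$ for ABc or abC, $\frac12\omega_3$ for AbC or aBc and $\frac12\omega_4$ for Abc or aBC.

If the parent is a double heterozygote, then its possible diplotypes, as well as their respective relative frequencies, can be listed as follows: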

PAGE 47

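(2-31)

where $\theta_{12}, \theta_{23}, \theta_{13}$ are linear functions of $g_{ij}$, $i, j = 0, 1$, as defined in (2-22). Then the haplotype frequencies produced by parent AABbCc are calculated as $\frac12\omega_{11}$ for ABC or Abc and $\frac12(1-\omega_{11})$ for ABc or AbC; the haplotype frequencies produced by parent aaBbCc are calculated as $\frac12\omega_{01}$ for aBC or abc and $\frac12(1-\omega_{01})$ for aBc or abC. Similarly, parents with AaBBCc will produce haplotypes with frequencies of $\frac12\omega_{12}$ for ABC or aBc and $\frac12(1-\omega_{12})$ for ABc or aBC; parents with AabbCc will produce haplotypes with frequencies of $\frac12\omega_{02}$ for AbC or abc and $\frac12(1-\omega_{02})$ for Abc or abC. Also, parents with AaBbCC will produce haplotypes with frequencies of $\frac12\omega_{13}$ for ABC or abC and $\frac12(1-\omega_{13})$ for AbC or aBC; parents with AaBbcc will produce haplotypes with frequencies of $\frac12\omega_{03}$ for ABc or abc and $\frac12(1-\omega_{03})$ for Abc or aBc.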
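The link between the three recombination fractions and the four crossover probabilities can be sketched as follows. The marker order A-B-C is taken from the text; the interference coefficient is written in its standard form, one minus the ratio of the double-crossover probability to the product of the two adjacent recombination fractions, which is an assumption here, and all function names and numbers are illustrative.

```python
def crossover_probs(t12, t23, t13):
    """Return (g00, g01, g10, g11) from the three pairwise recombination fractions."""
    g11 = 0.5 * (t12 + t23 - t13)   # crossovers in both intervals
    g10 = 0.5 * (t12 + t13 - t23)   # crossover in A-B only
    g01 = 0.5 * (t13 + t23 - t12)   # crossover in B-C only
    g00 = 1.0 - g11 - g10 - g01     # no crossover
    return g00, g01, g10, g11


def interference(t12, t23, t13):
    """Interference coefficient, assuming the form 1 - g11 / (t12 * t23)."""
    g11 = crossover_probs(t12, t23, t13)[3]
    return 1.0 - g11 / (t12 * t23)


def gametes_ABC_abc(g00, g01, g10, g11):
    """Gamete frequencies of a triple heterozygote with diplotype ABC|abc."""
    return {"ABC": g00 / 2, "ABc": g01 / 2, "AbC": g11 / 2, "Abc": g10 / 2,
            "aBC": g10 / 2, "aBc": g11 / 2, "abC": g01 / 2, "abc": g00 / 2}


# Made-up recombination fractions chosen so that there is no interference:
# t13 = t12 + t23 - 2 * t12 * t23.
g = crossover_probs(0.10, 0.20, 0.26)
gam = gametes_ABC_abc(*g)
```

Summing the crossover probabilities over the appropriate classes recovers the recombination fractions, for example the A-B recombination fraction is the sum of the two classes with a crossover in the A-B interval.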

PAGE 48

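$\log L(\Omega_g \mid M_m, M_f, M_o, \Omega_p) = \text{constant} + n_{1,1,1}\log(1) + n_{1,2,1}\log(\tfrac12) + n_{1,2,2}\log(\tfrac12) + n_{1,3,1}\log(1) + n_{1,4,1}\log(\tfrac12) + n_{1,4,2}\log(\tfrac12) + (n_{1,5,1}+n_{1,5,5})\log(\tfrac12\omega_{11}) + (n_{1,5,2}+n_{1,5,4})\log(\tfrac12(1-\omega_{11})) + \cdots + (n_{1,14,1}+n_{1,14,14})\log(\tfrac12\omega_1) + (n_{1,14,2}+n_{1,14,13})\log(\tfrac12\omega_2) + (n_{1,14,4}+n_{1,14,11})\log(\tfrac12\omega_3) + (n_{1,14,5}+n_{1,14,10})\log(\tfrac12\omega_4) + (n_{14,14,1}+n_{14,14,27})\log(\tfrac14\omega_1^2) + (n_{14,14,2}+n_{14,14,26})\log(\tfrac14\omega_1\omega_2) + (n_{14,14,3}+n_{14,14,25})\log(\tfrac14\omega_2^2) + (n_{14,14,4}+n_{14,14,24})\log(\tfrac14\omega_1\omega_3) + (n_{14,14,5}+n_{14,14,23}+n_{14,14,11}+n_{14,14,17}+n_{14,14,13}+n_{14,14,15})\log(\tfrac14(\omega_1\omega_2+\omega_3\omega_4)) + (n_{14,14,6}+n_{14,14,22})\log(\tfrac14\omega_2\omega_4) + (n_{14,14,7}+n_{14,14,21})\log(\tfrac14\omega_3^2) + (n_{14,14,8}+n_{14,14,20})\log(\tfrac14\omega_3\omega_4) + (n_{14,14,9}+n_{14,14,19})\log(\tfrac14\omega_4^2) + (n_{14,14,10}+n_{14,14,18})\log(\tfrac14\omega_1\omega_4) + (n_{14,14,11}+n_{14,14,17})\log(\tfrac14(\omega_1\omega_3+\omega_2\omega_4)) + (n_{14,14,12}+n_{14,14,16})\log(\tfrac14\omega_2\omega_3) + n_{14,14,14}\log(\tfrac14(\omega_1^2+\omega_2^2+\omega_3^2+\omega_4^2)) + \cdots + n_{27,27,27}\log(1)$.

From the upper-level likelihood (2-28), a closed form for the EM algorithm to estimate haplotype frequencies is derived, and the procedure is described as follows. In the E step, we calculate the probability with which a double heterozygote parent carries a particular diplotype with the equations in Table (2.3.3); a triple heterozygote parent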

PAGE 49

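(2-29). Also, let us define:

(2-33)

In the M step, haplotype frequencies are estimated with the calculated diplotype probabilities by

$\hat{p}_{111} = \frac{1}{2n}[2n_1 + n_2 + n_4 + \varphi_{11}n_5 + n_{10} + \varphi_{12}n_{11} + \varphi_{13}n_{13} + \phi_1 n_{14}]$
$\hat{p}_{110} = \frac{1}{2n}[n_2 + 2n_3 + (1-\varphi_{11})n_5 + n_6 + (1-\varphi_{12})n_{11} + n_{12} + \phi_2 n_{14} + \varphi_{03}n_{15}]$
$\hat{p}_{101} = \frac{1}{2n}[n_4 + (1-\varphi_{11})n_5 + 2n_7 + n_8 + (1-\varphi_{13})n_{13} + \phi_3 n_{14} + n_{16} + \varphi_{02}n_{17}]$
$\hat{p}_{100} = \frac{1}{2n}[\varphi_{11}n_5 + n_6 + n_8 + 2n_9 + (1-\phi_1-\phi_2-\phi_3)n_{14} + (1-\varphi_{03})n_{15} + (1-\varphi_{02})n_{17} + n_{18}]$
$\hat{p}_{011} = \frac{1}{2n}[n_{10} + (1-\varphi_{12})n_{11} + (1-\varphi_{13})n_{13} + (1-\phi_1-\phi_2-\phi_3)n_{14} + 2n_{19} + n_{20} + n_{22} + \varphi_{01}n_{23}]$
$\hat{p}_{010} = \frac{1}{2n}[\varphi_{12}n_{11} + n_{12} + \phi_3 n_{14} + (1-\varphi_{03})n_{15} + n_{20} + 2n_{21} + (1-\varphi_{01})n_{23} + n_{24}]$
$\hat{p}_{001} = \frac{1}{2n}[\varphi_{13}n_{13} + \phi_2 n_{14} + n_{16} + (1-\varphi_{02})n_{17} + n_{22} + (1-\varphi_{01})n_{23} + 2n_{25} + n_{26}]$
$\hat{p}_{000} = \frac{1}{2n}[\phi_1 n_{14} + \varphi_{03}n_{15} + \varphi_{02}n_{17} + n_{18} + \varphi_{01}n_{23} + n_{24} + n_{26} + 2n_{27}]$   (2-34)

The E and M steps are iterated until the estimates converge. The estimates at convergence are the maximum likelihood estimates (MLEs) of haplotype frequencies. We can solve equations (2-21) to get the MLEs of allele frequencies, expressed as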

PAGE 50

(2-36) (2-37) (2-38) (2-39)

The recombination frequencies $\theta_{12}$, $\theta_{23}$ and $\theta_{13}$ are linear functions of the crossover probabilities $g_{00}$, $g_{01}$, $g_{10}$ and $g_{11}$ by (2-22); thus the maximum likelihood estimates (MLEs) of the recombination frequencies can be obtained using the MLEs of the crossover probabilities. By maximizing the lower-level likelihood (2-32), closed forms for the EM algorithm to estimate the crossover probabilities are derived as follows. In the E step, the probabilities with which a considered haplotype produced by a double heterozygote or triple heterozygote parent is the recombinant type can be expressed in a matrix form. Let

PAGE 51

22666666666666666666666666666666666666666666666664'111'111'11'111'11'11'111'11'121'12'121'121'12'121'12'12'13'131'131'131'131'13'13'131231123211123331123121123321'03'031'031'031'031'03'03'03'021'02'021'021'02'021'02'02'011'011'01'011'01'01'011'013777777777777777777777777777777777777777777777775:Weusejjtodenoteappendingincolumnsformatrix,then=[AjjB]whereAisa11410matrix,itinvolvesthoseospringwhoseparents'genotypesarebothdoubleheterozygotes: 51

PAGE 52


PAGE 53

160BBBBB@ijklijkl+ijklklijijklijklijklijkl+ijklijklijkl+ijkl2(ijkl+ijkl)11ijkl+ijkl11ijkl+ijkl2(ijkl+ijkl)ijkl+ijklijklijkl+ijklklijijklijklijklijkl+ijklijkl1CCCCCA:

PAGE 54

160BBBBBBBBB@2ij2ijij2ij2ijij2[2ij+2ij]2ijij2ij2ijij2ij2[2ij+2ij]8ijij2[2ij+2ij]1CCCCCCCCCA:whereij=1ij.Aij;ijisa410matrix.Again,ifweuseCij;ij(m)todenotethemthcolumnofCij;ij,then:A11;11=[C11;11(1);C11;11(2);C11;11(2);2C11;11(2);C11;11(3);2C11;11(3);C11;11(2);C11;11(3);C11;11(2);C11;11(1)]A12;12=[C12;12(1);C12;12(2);2C12;12(1);C12;12(2);C12;12(3);C12;12(2);2C12;12(3);C12;12(1);C12;12(2);C12;12(3)]A13;13=[C13;13(1);2C13;13(1);C13;13(2);C13;13(2);C13;13(2);C13;13(1);C13;13(2);C13;13(3);2C13;13(3);C13;13(3)]A03;03=[C03;03(1);2C03;03(1);C03;03(2);C03;03(2);C03;03(1);C03;03(2);C03;03(2);C03;03(3);2C03;03(3);C03;03(3)]A02;02=[C02;02(1);C02;02(2);2C02;02(1);C02;02(2);C02;02(3);C02;02(2);2C02;02(3);C02;02(1);C02;02(2);C02;02(3)]A01;01=[C01;01(1);C01;01(2);C01;01(2);2C01;01(1);C01;01(3);2C01;01(3);C01;01(2);C01;01(3);C01;01(2);C01;01(1)]

PAGE 55


PAGE 56

[Two display matrices, each scaled by 1/16, are not recoverable from the extracted text; the label $B_{11}$ survives for the second.]

PAGE 57

[Display matrix, scaled by 1/16, not recoverable from the extracted text.]

PAGE 58

[Two display matrices, each scaled by 1/16, are not recoverable from the extracted text; the label $B_{17}$ survives for the second.]

PAGE 59

[Two display matrices, each scaled by 1/16, are not recoverable from the extracted text; the label $B_{14}$ survives for the second.]

PAGE 60

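[Two display equations with right-hand sides of the form $(\cdot)(i, j)\,G_1(j, 1)$ and $(\cdot)(i, j)\,G_2(j, 1)$; the leading symbols are not recoverable from the extracted text.]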

PAGE 73

Wu et al. 2007a). Under this assumption, for a given marker order A-B-C, the probability with which there is one crossover between A and B and also one crossover between B and C equals the product of the recombination frequencies of the two intervals, that is, $g_{11} = (g_{11}+g_{10})(g_{11}+g_{01}) = (1-g_{10}-g_{00})(1-g_{01}-g_{00})$, which leads to the constraint in equation (2-49).

All the parameters in g are estimated with the EM algorithm, except that $g_{10}$ should meet the constraint of equation (2-49). The same procedure is conducted when the other two possible orders, A-C-B and B-A-C, are assumed. The optimal order of the three markers is the one that produces the largest likelihood value among the three marker orders.

We can further estimate and test the degree of interference in the occurrence of crossovers between the two adjacent marker intervals. For the order A-B-C, we estimate the coefficient of interference by equation (2-24). The null hypothesis $H_0: I = 0$ (equation (2-25)) is equivalent to constraint (2-49). When there is no double recombination between

PAGE 74

2-26)), we have the constraint expressed as $g_{11} = 0$. It is interesting to test the significance of linkage disequilibria in the sampled population. The three-point model estimates four types of linkage disequilibria using equations (2-36). Each of these linkage disequilibria can be tested individually or in any combination. The estimates of haplotype frequencies under the various null hypotheses about linkage disequilibria can be obtained through the EM algorithm with the constraints from equations (2-36).

2.4.1 Two-Point Model

We use computer simulation to examine the statistical properties of the proposed linkage and linkage disequilibrium model. The simulation scheme mimics a natural human population at HWE from which a panel of unrelated families (each including a male parent, a female parent, and one or more children) is randomly sampled. Given a total of 1000 subjects, the simulation considers two sampling strategies, 1000 × 1 (more families vs. smaller size) and 200 × 5 (fewer families vs. larger size). For each strategy, we model two markers with strong, moderate, and weak linkage disequilibrium in the population. The allele frequencies for the two markers are p = 0.6 and q = 0.5, respectively. The two markers are linked with varying sizes of recombination fraction. In each design, 1000 simulation replicates were performed to estimate the means of the MLEs for each parameter and their standard deviations.

Table 2-4 gives the results from simulation studies under different designs. Allele frequencies and linkage disequilibrium can be estimated with high accuracy and precision. For a fixed total sample size, a design with more families of smaller size provides better estimates than a design with fewer families of larger size. The estimation precision of linkage disequilibrium improves with increasing degrees of disequilibrium. In general, the recombination fraction can also be estimated well, but

PAGE 75

2-5). When there is a small linkage disequilibrium, the power to detect sex-specific linkage is low. Also, a small-family sampling strategy will show low power for detecting the difference in linkage between the two sexes.

2-6. We consider a strong association between A and B, a moderately strong association between A and C, and a weak association between B and C. Also, we assume three different cases of linkage: no double recombination (c = 0), independent recombination (c = 1), and interference (c = 3). For all sampling strategies (family number vs. family size), the three-point model can provide excellent estimates of all the parameters with great accuracy and precision. Table 2-6 lists the results of parameter estimates under the three-point model for a 1000 × 1 sampling strategy. It is interesting to see that the three-point model can accurately estimate the recombination fraction between two markers showing a low linkage disequilibrium. The same data were analyzed by the two-point model. Although most parameters can be estimated by the two-point model as precisely as by the three-point model, the recombination fraction between weakly associated markers cannot be estimated well. The three-point model is more powerful for detecting linkage and linkage

PAGE 76

MLEs (standard deviations) of allele frequencies, linkage disequilibrium, and recombination fraction from 1000 simulation replicates under different sampling strategies

Family            True    MLE
Number   Size     D       $\hat{p}$   $\hat{q}$   $\hat{D}$   (recombination fraction estimate)

PAGE 77

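Power to detect sex-specific differences in the recombination fraction and linkage disequilibrium

Family            Parameters                                                          Power
Number   Size     $p_f$  $q_f$  $D_f$  $p_m$  $q_m$  $D_m$  $\theta_f$  $\theta_m$    $\theta$  $D$

disequilibrium and their sex-specific differences than the two-point model, especially when the markers are not strongly associated. Next, we prove a theorem on the non-estimability of the recombination fraction between weakly associated markers in the two-point model.

2-1), D = 0 implies that $p_{11}p_{00} = p_{10}p_{01}$. That means the relative frequencies of the two diplotypes of the double heterozygote, given by (2-4) and (2-5), both equal 1/2. Then, by (2-6), writing $\phi$ for the relative frequency in (2-4) and $\theta$ for the recombination fraction, it implies $\omega_1 = \phi(1-\theta) + (1-\phi)\theta = \frac{1}{2}(1-\theta) + (1-\frac{1}{2})\theta = \frac{1}{2}$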

PAGE 78

MLEs of linkage disequilibria and recombination fractions among markers under the three-point model, in comparison with those under the two-point model. The numbers in parentheses are the standard deviations of the MLEs.

1/2 too. Under this condition, the lower-level likelihood (2-7) will not contain the recombination fraction in any term, which makes it non-estimable. On the other hand, for the three-point model, suppose that the linkage disequilibrium between one pair of markers, say $D_{12}$ with the marker order 1-2-3, is zero. What we need to show is that the recombination fraction between markers 1 and 2 can be estimated as long as not all D's are zero. We only need to show that there is at least one term involving this recombination fraction (or $g_{10}$ and $g_{11}$) in the lower-level likelihood (2-32). By (2-36), $D_{12} = 0$ implies

PAGE 79

Notice that, similarly to the two-point model, the non-estimability of the recombination fraction between markers 1 and 2 means that the terms with subscripts 13 and 03 in (2-31) do not contain it, which leads to $\varphi_{13} = \frac{1}{2}$ and $\varphi_{03} = \frac{1}{2}$. By the formulas in table 2.3.3, it further implies that

$p_{111}p_{001} = p_{101}p_{011}$
$p_{110}p_{000} = p_{100}p_{010}$

But (2-51) does not satisfy the above equations as the other D's are not zero, which proves that the recombination fraction between markers 1 and 2 is estimable.

When all the linkage disequilibria between the three pairs of markers and among all three markers are zero, all the relative frequencies $\varphi_{ij}$ (i = 0, 1; j = 1, 2, 3) defined in table (2.3.3) are equal to 1/2, and the four relative frequencies (i = 1, 2, 3, 4) defined in (2-29) are equal to 1/4. Then the terms indexed by ij (i = 0, 1; j = 1, 2, 3) defined in (2-31) and those indexed by i (i = 1, 2, 3, 4) defined in (2-30) in the lower-level likelihood (2-32) are constant, which makes $g_{ij}$ (i, j = 0, 1) non-estimable. Using a similar derivation, this non-estimability issue of the three-point model can be solved by a four-point model.

Ardlie et al. 2002; Daly et al. 2001; Dawson et al. 2002; Gabriel et al. 2002; Morton 2005; Reich et al. 2001; Tsunoda et al. 2004). These investigations provide an important contribution to our understanding of the underlying structure of LD in the human genome. In this article,

PAGE 80

Schork 2002). Instead, marker selection, sample size estimation, and statistical power could be based on the empirically determined LD map of the population of interest. Second, LD maps can be used to study the evolutionary history of populations. The populations in which LD decays rapidly within a small length of genome are considered to have a longer history than those where LD does not change remarkably over the genome (Dawson et al. 2002; Gabriel et al. 2002; Reich et al. 2001).

Linkage disequilibrium describes the non-random association between different markers in a population, whereas linkage concerns the co-transmission of different markers from parents to offspring at meiosis. These two concepts are traditionally separated in genetic studies, but their joint application to increase genetic mapping resolution has only recently begun (Dupuis et al. 2007; Farnir et al. 2002; Lee and Van der Werf 2006; Meuwissen and Goddard 2000; Wu et al. 2002; Wu and Zeng 2001). The joint application of linkage and linkage disequilibrium relies critically on their simultaneous estimation. Currently, most designs and algorithms have been derived for plant and animal populations (see the references cited above). The model proposed in this article considers the characteristics of human families, providing a useful approach for estimating linkage and LD at the same time. Most published models for the joint estimation of linkage and LD are based on two-locus analysis. By making full use of genetic information, the three-locus model dramatically increases the precision of parameter

PAGE 81

Niu et al. 2002) in terms of model derivation and computing efficiency, but it does not require the choice of proper priors for the parameters, which the Bayesian approach needs and which is often difficult. The results of marker analysis with our model can illustrate how the interplay between recombination and linkage disequilibrium serves as the major force shaping the patterns of LD in the human genome. While demography strongly impacts the overall extent of LD manifested in the LD map lengths (Hernandez et al. 2007), a modified model that incorporates demography and recombination should have better power to define the major features of the LD maps. Our model allows for relatedness among families sampled from a population through the implementation of identical-by-descent (IBD) probabilities. Hill and Hernandez-Sanchez (2007) recently proposed a model for analyzing the probability of multilocus IBD. A new approach for calculating probabilities of IBD for pairs of haplotypes has been derived by Browning (2008). These models can be readily incorporated into our joint linkage and linkage disequilibrium analysis strategy to better characterize the genetic structure and diversity of a natural population with related families.

PAGE 82

Cooper and Psaty 2003; Ron and Weller 2007; The STAR Consortium 2008). The term quantitative trait nucleotide, or QTN, is proposed to describe the sequence polymorphisms that cause phenotypic variation in a quantitative trait. Clark (1990) presented a first model for inferring and reconstructing haplotypes from a diploid population. A score statistical test was proposed by Schaid et al. (2002) for association studies between traits and haplotypes. These previous works allowed Wu and his group (Liu et al. 2004; Wu et al. 2007b) to derive a general statistical model for the characterization of haplotype variants at a QTN that encode a complex phenotype in a natural population. A similar model was also proposed by Lin and his group (Huang et al. 2007; Lin and Huang 2007; Lin and Zeng 2006). A strategy based on QTL information has been developed to identify QTN for complex traits in controlled crosses (Hou et al. 2007).

Recent molecular surveys suggest that the human genome contains many discrete haplotype blocks that are sites of closely located SNPs (Dawson et al. 2002; Gabriel et al. 2002; Patil et al. 2001). Each block may have a few common haplotypes which account for a large proportion of chromosomal variation. Between adjacent blocks there are large regions, called hotspots, in which recombination events occur with high frequencies. Several algorithms have been developed to identify a minimal subset of SNPs, i.e., "tagging" SNPs, that can characterize the most common haplotypes (Zhang et al. 2002).

PAGE 83

Lettre et al. 2008; Sanna et al. 2008; Weedon et al. 2008, 2007). There is increasing evidence for the association between haplotypes at different SNPs and phenotypic variation in complex traits (Bader 2001; Clark 2004; Judson et al. 2000). Statistical simulation also indicates that haplotype analysis with multiple SNPs may be more powerful than single-SNP analysis (Akey et al. 2001; Collins et al. 1997; Morris and Kaplan 2002; Zaykin et al. 2002).

All current haplotyping approaches are based on a population-based design. This design may not only have less power to study inherited diseases, like cancer, but may also produce spurious results due to population substructure. These shortcomings can be circumvented by using a family-based design. A design with multiple nuclear families has the additional advantage that a linkage-linkage disequilibrium map can be drawn to study the pattern of genetic variation throughout the genome.

In this chapter, I will develop a haplotyping method for the identification of DNA sequence variants that encode a quantitative trait in a natural population with family-structured data. By specifying genetic values of haplotypes and their interactions according to well-developed quantitative genetic principles, I will derive models that provide reasonable genetic interpretations of results.

PAGE 84

Haplotype configuration of a diplotype for two SNPs

alleles A and a and the other with two alleles B and b, respectively. Allele A from SNP 1 and allele B from SNP 2 are located on the first homologous chromosome, whereas allele a from SNP 1 and allele b from SNP 2 are located on the second homologous chromosome. Thus, [AB] is one haplotype and [ab] is a second haplotype, and both constitute a diplotype [AB][ab] (Fig. 1). As stated in the previous chapter, we can only observe the genotype expressed as Aa/Bb. However, the double heterozygote may be one (and only one) of two possible diplotypes, [AB][ab] and [Ab][aB], and these two diplotypes cannot be directly observed; Fig. 2 demonstrates this situation.

While QTL mapping correlates QTL genotypes with a phenotypic trait, the aim of this chapter is to estimate the haplotype effects on a quantitative trait based on the diplotypes and therefore genotypes. In QTL mapping, the double heterozygote Aa/Bb may be found to associate with a favorable phenotype. But according to QTN haplotyping, Aa/Bb may not correlate with the favorable phenotype if its diplotype is [AB][ab] when haplotype [Ab] or [aB] encodes the trait. Thus, if the genetic effect is expressed at the haplotype level, the same genotype Aa/Bb may perform differently, depending on which diplotype it carries. Therefore, it is important to estimate the haplotype effects based on diplotypes and therefore genotypes.

PAGE 85

Diplotype configuration of a genotype for two SNPs

In other words, haplotyping complex diseases aims to detect specific SNP haplotypes that contribute distinctively to the diseases.

3.2.1 Likelihood

In our framework, the complete data are the diplotype configurations at a given set of SNPs for each genotype together with the disease outcomes of subjects, whereas the observed data are the genotypes of these SNPs and the disease outcomes. The missing data lie in the connection from genotypes to diplotypes. For any given genotype, all possible diplotypes can be enumerated. For example, genotype AA/BB has only one diplotype, [AB][AB]. This is because, for a homozygote, the genotype and diplotype are identical; the same holds whenever at most one SNP is heterozygous. However, for the double heterozygous genotype Aa/Bb, there are two possible diplotypes, [AB][ab] and [Ab][aB].
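The genotype-to-diplotype ambiguity described here can be made concrete with a small enumeration. The following Python sketch is illustrative only: the function name `diplotypes` and the tuple encoding of a two-SNP genotype are assumptions, not notation from the text.

```python
from itertools import product


def diplotypes(genotype):
    """List the distinct diplotypes consistent with a two-SNP genotype.

    genotype: pair of unordered allele pairs, e.g. (("A", "a"), ("B", "b")).
    Returns a sorted list of (haplotype, haplotype) tuples.
    """
    snp1, snp2 = genotype
    found = set()
    # Choose which allele of each SNP sits on the first chromosome;
    # the remaining alleles form the second chromosome.
    for i, j in product(range(2), repeat=2):
        hap1 = snp1[i] + snp2[j]
        hap2 = snp1[1 - i] + snp2[1 - j]
        found.add(tuple(sorted((hap1, hap2))))
    return sorted(found)


HOMOZYGOTE = diplotypes((("A", "A"), ("B", "B")))
SINGLE_HET = diplotypes((("A", "a"), ("B", "B")))
DOUBLE_HET = diplotypes((("A", "a"), ("B", "b")))
```

Only the double heterozygote yields two diplotypes, which is exactly the case the mixture model has to resolve.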

PAGE 86

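Diplotype configuration of offspring's genotypes at two SNPs

Genotype   Diplotype   Relative Freq.   Genotypic Mean
AABB       AB|AB       1                $\mu + a$
AABb       AB|Ab       1                $\mu + d$
AAbb       Ab|Ab       1                $\mu - a$
AaBB       AB|aB       1                $\mu + d$
AaBb       AB|ab       $\phi_i$         $\mu + d$
AaBb       Ab|aB       $1 - \phi_i$     $\mu - a$
Aabb       Ab|ab       1                $\mu - a$
aaBB       aB|aB       1                $\mu - a$
aaBb       aB|ab       1                $\mu - a$
aabb       ab|ab       1                $\mu - a$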

PAGE 87

3-1 lists all possible genotypes and diplotypes at two SNPs for offspring given their parents' genotypes. Each genotype (and therefore each diplotype) is composed of two haplotypes, one from the mother and the other from the father. As mentioned in the previous chapter, we use $p_{11}$, $p_{10}$, $p_{01}$ and $p_{00}$ to represent the four haplotype frequencies of AB, Ab, aB and ab. Then the relative frequencies of the two diplotypes for the double heterozygote are functions of the haplotype frequencies, as expressed in (2-4) and (2-5). At this stage, the haplotype frequencies and the recombination fraction $\theta$, denoted by $P = (p_{11}, p_{10}, p_{01}, p_{00}, \theta)$ in chapter 2, which are population genetic parameters, have been estimated as in chapter 2.

Assuming that diplotypes are associated with phenotypic variation in a disease, the likelihood for the unknown quantitative genetic parameters (Q) given observed phenotypes (y), parents' and offspring's genotypes ($M_m$, $M_f$, $M_o$) and population genetic parameters (P) can be formulated. Liu (2004) gives a general likelihood for the two-point model using offspring-generation data, as follows. Use superscripts and subscripts to distinguish between different SNPs and different alleles within SNPs, respectively. For example, the double heterozygous genotype $A^1_1A^1_2/A^2_1A^2_2$ has two diplotypes, $[A^1_1A^2_1][A^1_2A^2_2]$ and $[A^1_1A^2_2][A^1_2A^2_1]$. Generally speaking, a given 2-SNP genotype $A^1_{k_1}A^1_{k_2}/A^2_{l_1}A^2_{l_2}$ (in our framework, from AABB to aabb, denoted by 1 to 9) can be partitioned into at most two possible diplotypes, $[A^1_{k_1}A^2_{k_2}][A^1_{l_1}A^2_{l_2}]$ and $[A^1_{k_1}A^2_{l_2}][A^1_{l_1}A^2_{k_2}]$. Thus, the log-likelihood function can be formulated on the basis of a two-component mixture model, i.e.,

$\log L(Q \mid y, M_o, P) = \sum_{i=1}^{n} \log[\phi_i f_{[k_1k_2][l_1l_2]}(y_i) + (1-\phi_i) f_{[k_1l_2][l_1k_2]}(y_i)]$,

where n was defined as $n = \sum_{i,j,k}^{9} n_{ijk}$ and $\phi_i \in \{1, 0, \phi, 1-\phi\}$. Again, $\phi$ defined in (2-4) represents the relative frequency of offspring i whose diplotype is $[A^1_{k_1}A^2_{k_2}][A^1_{l_1}A^2_{l_2}]$, and

PAGE 88


PAGE 89

In table 3-1, the genotypic means are given for different genotypes by assuming that haplotype [AB] is different from the other three haplotypes. The advantage of the family-structured model is that it can help to determine the relative frequency of the diplotype in double heterozygous offspring by using the parents' genotypes. For example, for group 5 in table 3-1, where the mother's genotype is AABB and the father's genotype is AaBb, the relative frequency for the double heterozygote offspring AaBb equals 1. In other words, unlike the one-generation model, with the availability of the parents' genotype information, the diplotype of a double heterozygote offspring can be determined to be [AB][ab] with probability 1 in our framework. Therefore, the

PAGE 90

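(3-5)

$+ \sum_{l=1}^{n_{555}} \log[\varpi_{l|555} f_1(y_{555l}) + (1-\varpi_{l|555}) f_0(y_{555l})] + \sum_{l=1}^{n_{556}} \log f_0(y_{556l}) + \sum_{l=1}^{n_{557}} \log f_0(y_{557l}) + \sum_{l=1}^{n_{558}} \log f_0(y_{558l}) + \sum_{l=1}^{n_{559}} \log f_0(y_{559l}) + \cdots + \sum_{l=1}^{n_{999}} \log f_0(y_{999l})$

where $f_i$, i = 2, 1, 0, are Gaussian probability density functions with mean $\mu_i$, i = 2, 1, 0, and variance $\sigma^2$. The relative frequency $\varpi_{l|ij5}$ (taking the values 0, 1, 1/2, or one of two parameter-dependent quantities with subscripts 1 and 2) can be determined from the following table, depending on the parents' genotypes.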

PAGE 91

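                                    Father
        11/11  11/10  11/00  10/11  10/10  10/00  00/11  00/10  00/00
11/11   0      0      0      0      1      1      0      1      1
11/10   0      0      0      0      *1     1      0      1/2    1
11/00   0      0      0      0      0      0      0      0      0
10/11   0      0      0      0      *1     1/2    0      1      1
10/10   1      *1     0      *1     *2     *1     0      *1     1
10/00   1      1      0      1/2    *1     0      0      0      0
00/11   0      0      0      0      0      0      0      0      0
00/10   1      1/2    0      1      *1     0      0      0      0
00/00   1      1      0      1      1      0      0      0      0

*1, *2: parameter-dependent quantities with subscripts 1 and 2; their Greek symbol and defining equations are not recoverable from the extracted text.

3-5), a closed form for the EM algorithm was derived to estimate the genotypic means and the variance. This procedure is described as follows. In the E step, we calculate the probability with which any double heterozygote carries a particular diplotype with the formulas shown in the above table and equations (3-7), and also the posterior probability with which a double heterozygote offspring carries a particular diplotype using

$\Pr([AB][ab] \mid y_{ij5l}) = \dfrac{\varpi_{l|ij5} f_1(y_{ij5l})}{\varpi_{l|ij5} f_1(y_{ij5l}) + (1-\varpi_{l|ij5}) f_0(y_{ij5l})}$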
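The E-step posterior for a double heterozygote offspring can be sketched in Python as follows. This is a minimal illustration, assuming the two-component Gaussian mixture of (3-5): the prior weight is the family-specific relative frequency of diplotype [AB][ab], and f1 and f0 are normal densities with means mu1 and mu0 and a common variance. The function and argument names are hypothetical.

```python
import math


def normal_pdf(y, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(y - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def posterior_risk_diplotype(y, prior, mu1, mu0, var):
    """Posterior probability that a double heterozygote carries [AB][ab].

    prior: relative frequency of [AB][ab] given the parents' genotypes.
    mu1:   mean for diplotypes with one risk haplotype (here [AB][ab]).
    mu0:   mean for diplotypes with no risk haplotype (here [Ab][aB]).
    """
    numerator = prior * normal_pdf(y, mu1, var)
    denominator = numerator + (1.0 - prior) * normal_pdf(y, mu0, var)
    return numerator / denominator
```

When the parents' genotypes fix the diplotype (prior 0 or 1), the posterior equals the prior and the phenotype carries no extra information about phase.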

PAGE 92

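3-3):

$\hat{\mu} = \frac{1}{2}(\hat{\mu}_2 + \hat{\mu}_0)$
$\hat{d} = \hat{\mu}_1 - \frac{1}{2}(\hat{\mu}_2 + \hat{\mu}_0)$
$\hat{a} = \frac{1}{2}(\hat{\mu}_2 - \hat{\mu}_0)$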
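As a worked check of these relations, the short Python sketch below converts the three composite-diplotype means into the overall mean, additive effect and dominance effect. The function name is hypothetical; the input values correspond to the simulation setting used later in this chapter (overall mean 1, a = 0.5, d = 0.4).

```python
def genetic_effects(mu2, mu1, mu0):
    """Return (mu, a, d) from the means of the three composite diplotypes.

    mu2, mu1, mu0: means for diplotypes carrying 2, 1 and 0 risk haplotypes.
    """
    mu = 0.5 * (mu2 + mu0)   # overall mean
    a = 0.5 * (mu2 - mu0)    # additive effect
    d = mu1 - mu             # dominance effect
    return mu, a, d


# Means implied by mu = 1, a = 0.5, d = 0.4: mu2 = 1.5, mu1 = 1.4, mu0 = 0.5.
MU, A, D = genetic_effects(1.5, 1.4, 0.5)
```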

PAGE 93

(3-10)

The above $H_0$ is equivalent to $H_0: \mu_i = \mu$, i = 2, 1, 0. The log-likelihood ratio test statistic (LR) for the significance of the haplotype effect can be calculated from the difference between the log-likelihood values under $H_1$ (full model) and $H_0$ (reduced model):

$LR_2 = -2[\log L(\tilde{P}, \tilde{\mu} \mid y_o, M_m, M_f, M_o) - \log L(\hat{P}, \hat{Q} \mid y_o, M_m, M_f, M_o)]$,

where the tildes and hats denote the MLEs of parameters under the null and alternative hypotheses in (3-11). The estimates of parameters under the null hypothesis can be obtained with the same EM algorithm derived for the alternative hypothesis, but with the constraints in the null hypothesis. This log-likelihood ratio statistic approximately follows a $\chi^2$ distribution with 2 degrees of freedom. However, when assumptions such as Gaussian or uncorrelated residuals are violated, the $\chi^2$ approximation may not be appropriate. We can then use a permutation test approach (Churchill and Doerge 1994), which does not rely on the distribution of $LR_2$, to determine the critical threshold for the significance of the haplotype/diplotype effect.
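The two ways of assessing significance can be sketched in Python. This is an illustration under stated assumptions, not the implementation used for the reported results: `statistic` is a hypothetical user-supplied function that refits the model to permuted phenotypes (with genotypes held fixed) and returns the likelihood ratio, and the chi-square tail uses the exact closed form exp(-x/2) that holds for 2 degrees of freedom.

```python
import math
import random


def lr_statistic(loglik_null, loglik_alt):
    """Likelihood ratio statistic from the two maximized log-likelihoods."""
    return -2.0 * (loglik_null - loglik_alt)


def chi2_df2_pvalue(lr):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    return math.exp(-lr / 2.0)


def permutation_threshold(phenotypes, statistic, n_perm=1000, alpha=0.05, seed=0):
    """Empirical (1 - alpha) quantile of the statistic under permutation.

    Phenotypes are shuffled relative to the (fixed) genotypes, which breaks
    any genotype-phenotype association while keeping the trait distribution.
    """
    rng = random.Random(seed)
    y = list(phenotypes)
    null_stats = []
    for _ in range(n_perm):
        rng.shuffle(y)
        null_stats.append(statistic(y))
    null_stats.sort()
    index = min(n_perm - 1, int(math.ceil((1.0 - alpha) * n_perm)) - 1)
    return null_stats[index]
```

An observed statistic larger than the returned threshold would be declared significant at level alpha.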

PAGE 94

Burnham and Anderson 1998) to determine the haplotype or haplotypes, and their number, that are most distinct from the rest of the haplotypes in explaining quantitative variation.

2.3, three segregating markers, A (with alleles A and a), B (with alleles B and b), and C (with alleles C and c), generate eight haplotypes from ABC to abc and twenty-seven genotypes from AABBCC to aabbcc. Some genotypes are consistent with diplotypes, whereas the others, which are heterozygous at two or three SNPs, are not. The three kinds of double heterozygote each contain two different diplotypes. The triple heterozygote, i.e., AaBbCc, contains four different diplotypes. The relative frequencies for double and triple heterozygotes are expressed as functions of haplotype frequencies given in table (2.3.3) and (2-29). By assuming ABC as the risk haplotype and all the others as non-risk haplotypes, the genotypic values for the three composite diplotypes, $\mu_2$, $\mu_1$ and $\mu_0$, can be formulated similarly to the two-point model. The quantitative parameter space stays the same as in the two-point

PAGE 95

2 for population genetic parameter estimation. The recombination fraction is fixed at 0.005. We assume that one of the four possible haplotypes, [AB], is the risk haplotype, which leads to three distinct groups of composite diplotypes. The phenotypic values are simulated from a Gaussian distribution with overall mean ($\mu$ = 1), additive effect (a = 0.5) and dominance effect (d = 0.4) (Table 3-2). The residual variance is determined according to different heritability levels (0.1 vs. 0.4). The heritability is the proportion of the total observed phenotypic variation that is attributable to genetic variation among individuals. The genetic variance is determined on the basis of the genotypic values of all diplotypes and their frequencies.

Table 3-2 describes the estimation results for both population and quantitative genetic parameters of the proposed model. 1000 simulation replicates were performed to calculate the mean and standard deviation of the MLEs for each parameter. For quantitative genetic parameters (population parameters were discussed in chapter 2), all parameters can be reasonably estimated in general. As expected, the accuracy and precision of population parameter estimation do not rely on the heritability value $H^2$. But the precision of estimation for quantitative parameters increases as $H^2$ increases.
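How the residual variance follows from a chosen heritability can be sketched in Python. This is a hedged illustration: it assumes the composite diplotype frequencies follow Hardy-Weinberg proportions for a risk haplotype of frequency `p_risk`, and defines heritability as the ratio of genetic to total variance. All function names are hypothetical.

```python
import random


def genetic_variance(p_risk, mu, a, d):
    """Variance of genotypic values over the three composite diplotypes.

    Assumes Hardy-Weinberg frequencies for 2, 1 and 0 copies of the risk haplotype.
    """
    freqs = [p_risk ** 2, 2.0 * p_risk * (1.0 - p_risk), (1.0 - p_risk) ** 2]
    values = [mu + a, mu + d, mu - a]
    mean = sum(f * v for f, v in zip(freqs, values))
    return sum(f * (v - mean) ** 2 for f, v in zip(freqs, values))


def residual_variance(var_g, h2):
    """Residual variance giving heritability h2 = var_g / (var_g + var_e)."""
    return var_g * (1.0 - h2) / h2


def simulate_phenotype(n_risk_copies, mu, a, d, var_e, rng):
    """Draw one phenotype for a diplotype carrying 0, 1 or 2 risk haplotypes."""
    mean = {2: mu + a, 1: mu + d, 0: mu - a}[n_risk_copies]
    return rng.gauss(mean, var_e ** 0.5)
```

For example, `residual_variance(genetic_variance(p_risk, 1.0, 0.5, 0.4), 0.1)` would give the residual variance for the low-heritability setting once a risk-haplotype frequency is chosen.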

PAGE 96

MLEs (standard deviations) of population and quantitative genetic parameters under the two-point model.

Family  Kids  H2   D      p (0.60)       q (0.50)       D              r (0.005)      μ (1.00)       a (0.5)        d (0.4)
200     5     0.1  0.190  0.600 (0.017)  0.499 (0.018)  0.190 (0.006)  0.006 (0.006)  1.003 (0.045)  0.501 (0.046)  0.398 (0.066)
200     5     0.4  0.100  0.601 (0.017)  0.500 (0.017)  0.100 (0.010)  0.024 (0.033)  0.999 (0.020)  0.500 (0.020)  0.402 (0.028)
200     5     0.4  0.020  0.600 (0.017)  0.501 (0.018)  0.020 (0.012)  0.191 (0.275)  1.000 (0.021)  0.500 (0.021)  0.401 (0.028)
1000    1     0.1  0.190  0.600 (0.008)  0.500 (0.008)  0.190 (0.003)  0.005 (0.004)  0.999 (0.046)  0.499 (0.046)  0.399 (0.063)
1000    1     0.4  0.100  0.600 (0.008)  0.500 (0.008)  0.100 (0.005)  0.013 (0.017)  1.000 (0.020)  0.501 (0.020)  0.400 (0.029)
1000    1     0.4  0.020  0.600 (0.008)  0.500 (0.008)  0.020 (0.005)  0.075 (0.108)  0.999 (0.021)  0.500 (0.021)  0.402 (0.028)

PAGE 97

3-3). It is expected that increased heritability and increased sample size increase the power to detect the diplotype differences.

Table 3-3. Power to detect the risk haplotype under the two-point model

Family  Kids  H2   p    q    D     μ  a    d    r      power
200     5     0.1  0.6  0.5  0.19  1  0.5  0.4  0.005  0.665
1000    1          0.6  0.5  0.19  1  0.5  0.4  0.005  0.695
200     10         0.6  0.5  0.19  1  0.5  0.4  0.005  0.765
1000    2          0.6  0.5  0.19  1  0.5  0.4  0.005  0.820
200     5     0.2  0.6  0.5  0.19  1  0.5  0.4  0.005  0.774
1000    1          0.6  0.5  0.19  1  0.5  0.4  0.005  0.780
200     10         0.6  0.5  0.19  1  0.5  0.4  0.005  0.790
1000    2          0.6  0.5  0.19  1  0.5  0.4  0.005  0.845
200     5     0.3  0.6  0.5  0.19  1  0.5  0.4  0.005  0.765
1000    1          0.6  0.5  0.19  1  0.5  0.4  0.005  0.810
200     10         0.6  0.5  0.19  1  0.5  0.4  0.005  0.795
1000    2          0.6  0.5  0.19  1  0.5  0.4  0.005  0.840
200     5     0.4  0.6  0.5  0.19  1  0.5  0.4  0.005  0.770
1000    1          0.6  0.5  0.19  1  0.5  0.4  0.005  0.810
200     10         0.6  0.5  0.19  1  0.5  0.4  0.005  0.790
1000    2          0.6  0.5  0.19  1  0.5  0.4  0.005  0.840

The simulation for the three-point model is performed for the 1000 × 1 sampling design. A strong association between A and B, a moderately strong association between A and C, and a weak association between B and C were considered. Three different cases of linkage were assumed: no double recombination (I = 1), independent recombination (I = 0) and interference (I = -2). The phenotypic values of a quantitative trait were simulated from a Gaussian distribution with means corresponding to the composite diplotypes and variance determined by different heritability levels. As shown in table 3-4, all the quantitative genetic parameters under each condition can be estimated with good accuracy

PAGE 98

MLEs (standard deviations) of population and quantitative genetic parameters under the three-point model based on the 1000 × 1 design strategy

Pearson and Manolio 2008). It seems that current genetic mapping has developed to a point at which a comprehensive analysis of all the markers that cover the genetic map of the genome can be performed to search for the chromosomal distribution of all possible QTLs or QTN. Some basic work in GWA of QTLs has been initiated in recent years, although it is still full of challenges. Wang et al. (2005) used Bayesian shrinkage approaches for whole-genome mapping by shrinking the effects of all candidate QTLs toward zero. It is interesting to extend

PAGE 99

Carlborg and Haley 2004). Epistasis is also of paramount importance in the pathogenesis of most common human diseases, such as cancer and cardiovascular disease (Moore 2003, 2005). The evidence for this is the nonlinear relationship between genotype and phenotype. We will need to extend the haplotype model to consider epistasis. The extended model can detect the main effects of individual sequences and the epistatic effects of the interaction between different haplotype blocks. Various kinds of epistatic effects resulting from additive × additive, additive × dominant, dominant × additive, and dominant × dominant interactions can be identified. We will incorporate epistatic effects into the GWA framework by developing a two-dimensional search algorithm. To this end, we will construct a web of genetic interactions between genes from different chromosomal regions in the entire genome. The issues arising from false-positive and false-negative results will be addressed.

PAGE 100

Barton et al. 1984; Cattanach and Kirk 1985), some of which have been tested in other mammals (Morison et al. 2005), including humans. Many experiments were conducted to investigate the biological mechanisms of imprinting and its influences on fetal development and various disease syndromes. Some studies showed that many human diseases are related to imprinted genes, such as type I diabetes (Paterson et al. 1999), Prader-Willi syndrome and Angelman syndrome (Falls et al. 1999), NOEY2 in ovarian and breast cancer (Yu et al. 1999) and bipolar disorder (McInnis et al. 2003). Thus far, how the imprinting process affects development and disease is incompletely documented. To better understand imprinting regulation, it has been proposed to discover and isolate imprinting genes that may be distributed sporadically or located in clusters forming an imprinted domain. One approach to discovering imprinting genes is based on genetic mapping, in which individual quantitative trait loci (QTLs) showing parent-of-origin effects are localized with a molecular marker-based linkage map. Using an outbred strategy appropriate for plants and animals, significant imprinting QTLs were detected for body composition and body weight in pigs, chickens and sheep. Cui et al. (2006) proposed an F2-based strategy to map imprinting QTLs by capitalizing on the difference in the recombination fraction between

PAGE 101

Cooper and Psaty 2003; Ron and Weller 2007; STAR Consortium 2008). However, current experimental techniques do not allow the separation of maternally and paternally derived haplotypes (i.e., a linear combination of alleles at different SNPs on a single chromosome) from observed genotypes. More recently, a battery of computational models has been derived to estimate and test haplotype effects on a complex trait with a random sample drawn from a natural population (Liu et al. 2004; Wu et al. 2007b). These models implement the population genetic properties of gene segregation in a unifying mixture-model framework for haplotype discovery. They define a so-called risk haplotype, i.e., one that is different from the remaining haplotypes in terms of its genetic effect on a trait.

The motivation of this chapter is to develop a statistical model for estimating the imprinting effects of SNP-constructed haplotypes with a set of families randomly sampled from a natural population. Each family is composed of both parents and offspring. Because both parents are genotyped, imprinting effects due to different origins of the same allele can be estimated from phenotypic data of the offspring. The model is validated by computer simulation studies.

PAGE 102

3, the genotypic values of composite diplotypes can be partitioned into different components of additive and dominance genetic effects. In an imprinting model, we add one additional parameter to address the imprinting genetic effect due to the different contributions of haplotypes from the maternal or paternal parent. Compared with the mean model (3-3) in chapter 3, the phenotypic means corresponding to the four possible composite diplotype groups considering the imprinting effect can be expressed as

3; i is the parameter for estimating the imprinting effect. The size and sign of i determine the extent and direction of the imprinting effect at the haplotype level.

PAGE 103

3-1 in Chapter 3, Table 4-1 describes the imprinting effect.

PAGE 104

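Diplotype configuration of offspring's genotypes with imprinting effects at two SNPs

Genotype   Diplotype   Relative Freq.
AABB       AB|AB       1
AABb       AB|Ab       $\varrho_{l|ij2}$
AABb       Ab|AB       $1 - \varrho_{l|ij2}$
AAbb       Ab|Ab       1
AaBB       AB|aB       $\varrho_{l|ij4}$
AaBB       aB|AB       $1 - \varrho_{l|ij4}$
AaBb       AB|ab       $\varrho^{(1)}_{l|ij5}$
AaBb       ab|AB       $\varrho^{(2)}_{l|ij5}$
AaBb       Ab|aB       $1 - \varrho^{(1)}_{l|ij5} - \varrho^{(2)}_{l|ij5}$
Aabb       Ab|ab       1
aaBB       aB|aB       1
aaBb       aB|ab       1
aabb       ab|ab       1

No.  Mother  Father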

PAGE 105

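(4-2)

$+ \sum_{l=1}^{n_{553}} \log f_0(y_{553l}) + \sum_{l=1}^{n_{554}} \log[\varrho_{l|554} f_1(y_{554l}) + (1-\varrho_{l|554}) f_{1'}(y_{554l})] + \sum_{l=1}^{n_{555}} \log[\varrho^{(1)}_{l|555} f_1(y_{555l}) + \varrho^{(2)}_{l|555} f_{1'}(y_{555l}) + (1-\varrho^{(1)}_{l|555}-\varrho^{(2)}_{l|555}) f_0(y_{555l})] + \sum_{l=1}^{n_{556}} \log f_0(y_{556l}) + \sum_{l=1}^{n_{557}} \log f_0(y_{557l}) + \sum_{l=1}^{n_{558}} \log f_0(y_{558l}) + \sum_{l=1}^{n_{559}} \log f_0(y_{559l}) + \cdots + \sum_{l=1}^{n_{999}} \log f_0(y_{999l})$

where $f_i$ (i = 2, 1, 1', 0) are assumed to be Gaussian probability density functions with mean $\mu_i$ and variance $\sigma^2$. The relative frequencies $\varrho_{l|ijk}$ (i, j = 1, ..., 9; k = 2, 4, 5; l = 1, ..., $n_{ijk}$) can be determined from the following tables, depending on the parents' genotypes.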

PAGE 106

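(4-3)

$\varrho^{(1)}_{l|ij5} = p(o = [AB]_m[ab]_f \mid m = i, f = j, o = AaBb)$
$\varrho^{(2)}_{l|ij5} = p(o = [ab]_m[AB]_f \mid m = i, f = j, o = AaBb)$
$1 - \varrho^{(1)}_{l|ij5} - \varrho^{(2)}_{l|ij5} = p(o = [Ab]_m[aB]_f \text{ or } [aB]_m[Ab]_f \mid m = i, f = j, o = AaBb)$

For simplicity, use 1 and 0 to denote the capital and lowercase letters, respectively. The following table describes the values of $\varrho_{l|ij2}$: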


Father: 11/1111/1011/0010/1110/1010/0000/1100/1000/00 11/11 00000100011/10 11 201m000011/00 11011000010/11 00000000010/10 1f011000010/00 11011000000/11 00000000000/10 00000000000/00 000000000 wheref=fpf11pf00+(1f)pf10pf01 (4{8)d2=[mpm11pm00+(1m)pm10pm01][(1f)pf11pf00+fpf10pf01] (4{9)d3=[(1f)pf11pf00+fpf10pf01][(1m)pm11pm00+mpm10pm01] (4{10)d4=[mpm11pm00+(1m)pm10pm01][fpf11pf00+(1f)pf10pf01] (4{11)Similarly,%ljij4areexpressedas 107


Father 11/1111/1011/0010/1110/1010/0000/1100/1000/00 11/11 00000000011/10 00000000011/00 00000000010/11 1101 21m000010/10 110f1000010/00 00000000000/11 11011000000/10 11011000000/00 000000000 Mother Father 11/1111/1011/0010/1110/1010/0000/1100/1000/00 11/11 00000000011/10 00000000011/00 00000000010/11 00000000010/10 11f01f2000010/00 1101 21m000000/11 00000000000/10 11 2011m000000/00 110110000 and%(2)ljij5areexpressedas 108


Father 11/1111/1011/0010/1110/1010/0000/1100/1000/00 11/11 00001101011/10 00001m101 2111/00 00000000110/11 00001m1 201010/10 00002f01f110/00 00000000100/11 00000000000/10 00000000000/00 000000000 4{2 ).TherelativefrequenciesintheEstepwasdescribedin( 4{4 )intheabovesection.IntheMstep,wederivetheclosedformsof 109


+9Xi;j=1Xk=2;4nijkXl=1(yijkl^1)2ljijk+9Xi;j=1nij5Xl=1(yij5l^1)2(1)ljij5+9Xi;j=1Xk=2;4nijkXl=1(yijkl^1)2(1ljijk)+9Xi;j=1nij5Xl=1(yij5l^1)2(2)ljij5+9Xi;j=1nij5Xl=1(yij5l^0)2(1(1)ljij5(2)ljij5)+9Xi;j=1Xk=3;6;7;8;9nijkXl=1(yijkl^0)2wherenok=9Xi;j=1nijki;j=1;;9ljijk=%ljijkf1(yijkl) 110


4{1 ):^=1 2(^2+^0)^i=1 2(^1+^10)^d=1 2(^1^10)1 2(^2+^0)^a=1 2(^2^0) (4{14) 3 .Thelog-likelihoodratioteststatisticsforeachhypothesiscanbeviewedto 111


3 with an added imprinting effect, a Monte Carlo simulation was conducted to generate genotype and phenotype data for the specified family sizes under both the two-point and three-point models. Tables 4-2 and 4-3 describe the estimation results for both population and quantitative genetic parameters of the proposed two-point and three-point models. A total of 1000 simulation replicates were performed to calculate the mean and standard deviation of the MLEs for each parameter. Again, all parameters can be reasonably estimated in general. As expected, the accuracy and precision of population parameter estimation do not rely on the heritability $H^2$, but the accuracy and precision of estimation for the quantitative parameters, including the imprinting parameter, increase with sample size and $H^2$.

Cheverud et al. 2008; Reik and Walter 2001; Wilkins and Haig 2003; Wood and Oakey 2006). Despite its great importance in trait formation and development, it remains unclear how genetic imprinting operates in a complex network of interactive genes located across the genome. Genetic mapping has proven powerful for estimating the distribution and effects of imprinted genes. While traditional mapping models attempt to detect imprinted quantitative trait loci based on a linkage map constructed from molecular markers, we developed a statistical model for estimating the imprinting effects of haplotypes composed of multiple sequenced SNPs. The new model can characterize the difference in the effects of maternally and paternally derived haplotypes, and can be used as a tool for genetic association studies at the candidate gene or genome-wide level.


MLEs (standard deviations) of quantitative genetic parameters under the two-point model with imprinting effect

Quantitative parameters (true values in the column headers)

Family  Kids  H2   D      μ (1.00)       a (0.5)        d (0.4)        i (0.1)
200     5     0.1  0.190  0.988 (0.051)  0.481 (0.054)  0.409 (0.068)  0.101 (0.058)
200     5     0.4  0.100  0.991 (0.021)  0.495 (0.023)  0.405 (0.029)  0.092 (0.024)
200     5     0.4  0.020  1.003 (0.018)  0.497 (0.020)  0.394 (0.026)  0.102 (0.022)
1000    1     0.1  0.190  0.995 (0.053)  0.485 (0.053)  0.400 (0.063)  0.083 (0.059)
1000    1     0.4  0.100  0.990 (0.019)  0.495 (0.019)  0.409 (0.029)  0.094 (0.022)
1000    1     0.4  0.020  1.000 (0.018)  0.501 (0.019)  0.402 (0.027)  0.098 (0.019)


MLEs (standard deviations) of population and quantitative genetic parameters under the three-point model with imprinting effect based on the 1000 × 1 design strategy


Weir and Ott 1996), may be more relevant. Earlier studies have documented possible genetic and evolutionary causes for zygotic associations in a nonequilibrium population (Barton and Gale 1993; Bennett and Binet 1956; Charlesworth 1991; Haldane 1949, pp. 13-45). Weir and Ott (1996) documented five different types of disequilibria simultaneously, which are (1) Hardy-Weinberg disequilibria at each locus, (2) gametic disequilibrium (including two alleles in the same gamete, each from a different locus), (3) nongametic disequilibrium (including two alleles in different gametes, each from a different locus), (4) trigenic disequilibrium (including a zygote at one locus and an allele at the other), and (5) quadrigenic disequilibrium (including two zygotes, each from a different locus). Because it is impossible to estimate all the five disequilibrium


Weir and Ott (1996) collapsed gametic and non-gametic disequilibria to estimate a so-called composite gametic disequilibrium. More recently, Liu et al. (2006) used Weir's approach to estimate zygotic disequilibria in a canine population and gain better insight into the structure and organization of the canine genome. Although it is needed for estimating disequilibrium parameters, Weir's treatment leads to a significant loss of information. The gametic and non-gametic disequilibria cannot be separated because there is insufficient information to distinguish the two diplotypes that form the same double-heterozygote genotype. In this chapter, I will show that a family-based design can distinguish these two diplotypes by tracing the co-transmission of nonalleles at different genes from parents to their offspring. I will develop a statistical model for obtaining estimates of a full set of disequilibria with a panel of nuclear families. A real example from a genetic study of Crohn's disease will be used to demonstrate the usefulness of the new model.

5.1.1 Gamete and Non-gamete Frequencies

Consider two SNP markers A (with two alleles A and a) and B (with two alleles B and b). Let $p_A$ and $p_a$ ($p_A + p_a = 1$) as well as $p_B$ and $p_b$ ($p_B + p_b = 1$) be the corresponding allele frequencies. At each of the two SNPs, there are three distinguishable genotypes, i.e., AA, Aa and aa for marker A and BB, Bb and bb for marker B. The two markers form 10 genotypic configurations or diplotypes, but only 9 can be genetically distinguished from each other. This is because genotypic configurations ${}^{Bb}_{Aa}$ and ${}^{bB}_{Aa}$ have the same genotype AaBb. Let $P$, subscripted and superscripted by the genotype notation, be the genotypic configuration frequencies, which are individually tabulated in Table 5-1. It is not difficult to estimate one-marker genotype frequencies from two-marker genotypic configuration


Frequencies and observations of marker genotypes

Marker A | Marker B: BB (2) | Bb (1) | bb (0) | Total

Note: Genotype AaBb contains two different configurations or diplotypes, [AB][ab] and [Ab][aB].

frequencies by

for marker A,


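$$p_A = P_{AA} + \tfrac{1}{2}P_{Aa}, \qquad p_a = P_{aa} + \tfrac{1}{2}P_{Aa}, \qquad p_B = P_{BB} + \tfrac{1}{2}P_{Bb}, \qquad p_b = P_{bb} + \tfrac{1}{2}P_{Bb}.$$

The two markers form four gametes, AB, Ab, aB and ab, whose frequencies can be estimated from genotypic configuration frequencies by

$$\begin{aligned}
p_{AB} &= P^{BB}_{AA} + \tfrac{1}{2}\big(P^{Bb}_{AA} + P^{BB}_{Aa} + P^{Bb}_{Aa}\big) \\
p_{Ab} &= P^{bb}_{AA} + \tfrac{1}{2}\big(P^{Bb}_{AA} + P^{bb}_{Aa} + P^{bB}_{Aa}\big) \\
p_{aB} &= P^{BB}_{aa} + \tfrac{1}{2}\big(P^{BB}_{Aa} + P^{Bb}_{aa} + P^{bB}_{Aa}\big) \\
p_{ab} &= P^{bb}_{aa} + \tfrac{1}{2}\big(P^{bb}_{Aa} + P^{Bb}_{aa} + P^{Bb}_{Aa}\big).
\end{aligned}$$

Similarly, the frequencies of nonalleles from different gametes can be estimated by

$$\begin{aligned}
p_{A/B} &= P^{BB}_{AA} + \tfrac{1}{2}\big(P^{Bb}_{AA} + P^{BB}_{Aa} + P^{bB}_{Aa}\big) \\
p_{A/b} &= P^{bb}_{AA} + \tfrac{1}{2}\big(P^{Bb}_{AA} + P^{bb}_{Aa} + P^{Bb}_{Aa}\big) \\
p_{a/B} &= P^{BB}_{aa} + \tfrac{1}{2}\big(P^{BB}_{Aa} + P^{Bb}_{aa} + P^{Bb}_{Aa}\big) \\
p_{a/b} &= P^{bb}_{aa} + \tfrac{1}{2}\big(P^{bb}_{Aa} + P^{Bb}_{aa} + P^{bB}_{Aa}\big).
\end{aligned}$$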
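The allele, gametic and non-gametic frequency formulas of this subsection can be sketched in Python as follows. This is a minimal illustration, not the dissertation's software: the function name `marginal_frequencies` and the dictionary keys (for example `"AB|ab"` for configuration [AB][ab] and `"Ab|aB"` for [Ab][aB]) are hypothetical labels chosen here.

```python
def marginal_frequencies(P):
    """Allele, gametic and non-gametic frequencies from the ten
    genotypic configuration frequencies of two biallelic markers.

    P is a dict with the (hypothetical) keys
    AABB, AABb, AAbb, AaBB, AB|ab, Ab|aB, Aabb, aaBB, aaBb, aabb.
    """
    if abs(sum(P.values()) - 1.0) > 1e-9:
        raise ValueError("configuration frequencies must sum to 1")

    # One-marker genotype frequencies, then allele frequencies.
    P_AA = P["AABB"] + P["AABb"] + P["AAbb"]
    P_Aa = P["AaBB"] + P["AB|ab"] + P["Ab|aB"] + P["Aabb"]
    P_BB = P["AABB"] + P["AaBB"] + P["aaBB"]
    P_Bb = P["AABb"] + P["AB|ab"] + P["Ab|aB"] + P["aaBb"]
    p_A = P_AA + P_Aa / 2
    p_B = P_BB + P_Bb / 2
    allele = {"A": p_A, "a": 1 - p_A, "B": p_B, "b": 1 - p_B}

    # Gametic frequencies: two nonalleles on the same gamete.
    gametic = {
        "AB": P["AABB"] + (P["AABb"] + P["AaBB"] + P["AB|ab"]) / 2,
        "Ab": P["AAbb"] + (P["AABb"] + P["Aabb"] + P["Ab|aB"]) / 2,
        "aB": P["aaBB"] + (P["AaBB"] + P["aaBb"] + P["Ab|aB"]) / 2,
        "ab": P["aabb"] + (P["Aabb"] + P["aaBb"] + P["AB|ab"]) / 2,
    }

    # Non-gametic frequencies: two nonalleles on different gametes,
    # so the two double-heterozygote configurations swap roles.
    nongametic = {
        "A/B": P["AABB"] + (P["AABb"] + P["AaBB"] + P["Ab|aB"]) / 2,
        "A/b": P["AAbb"] + (P["AABb"] + P["Aabb"] + P["AB|ab"]) / 2,
        "a/B": P["aaBB"] + (P["AaBB"] + P["aaBb"] + P["AB|ab"]) / 2,
        "a/b": P["aabb"] + (P["Aabb"] + P["aaBb"] + P["Ab|aB"]) / 2,
    }
    return allele, gametic, nongametic
```

As a sanity check under an assumed random union of gametes, the gametic frequencies should reproduce the haplotype frequencies used to build the population, while each non-gametic frequency should reduce to the product of the two allele frequencies.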


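$$\begin{aligned}
p^{B}_{AA} &= P^{BB}_{AA} + \tfrac{1}{2}P^{Bb}_{AA}, & p^{b}_{AA} &= P^{bb}_{AA} + \tfrac{1}{2}P^{Bb}_{AA} \\
p^{B}_{Aa} &= P^{BB}_{Aa} + \tfrac{1}{2}\big(P^{Bb}_{Aa} + P^{bB}_{Aa}\big), & p^{b}_{Aa} &= P^{bb}_{Aa} + \tfrac{1}{2}\big(P^{Bb}_{Aa} + P^{bB}_{Aa}\big) \\
p^{B}_{aa} &= P^{BB}_{aa} + \tfrac{1}{2}P^{Bb}_{aa}, & p^{b}_{aa} &= P^{bb}_{aa} + \tfrac{1}{2}P^{Bb}_{aa} \\
p^{BB}_{A} &= P^{BB}_{AA} + \tfrac{1}{2}P^{BB}_{Aa}, & p^{BB}_{a} &= P^{BB}_{aa} + \tfrac{1}{2}P^{BB}_{Aa} \\
p^{Bb}_{A} &= P^{Bb}_{AA} + \tfrac{1}{2}\big(P^{Bb}_{Aa} + P^{bB}_{Aa}\big), & p^{Bb}_{a} &= P^{Bb}_{aa} + \tfrac{1}{2}\big(P^{Bb}_{Aa} + P^{bB}_{Aa}\big) \\
p^{bb}_{A} &= P^{bb}_{AA} + \tfrac{1}{2}P^{bb}_{Aa}, & p^{bb}_{a} &= P^{bb}_{aa} + \tfrac{1}{2}P^{bb}_{Aa}.
\end{aligned}$$

Weir and Ott 1996). Assume that the population considered above is in Hardy-Weinberg disequilibrium (HWD). This population thus lacks the desirable properties of an equilibrium population, such as independence of different alleles at the same locus (Lynch and Walsh 1998). The HWD tests for association between two alleles at the same locus but on different gametes, whereas (gametic) linkage disequilibrium describes two alleles on the same gamete but at different loci. For the zygotic disequilibrium, however, there is a third test, i.e., two alleles on different gametes and at different loci. Since the population is not in HWE, two alleles at each marker are not independent, with the coefficients of Hardy-Weinberg disequilibrium defined as

$$D_A = P_{AA} - p_A^2 = -\tfrac{1}{2}P_{Aa} + p_A p_a = P_{aa} - p_a^2$$

for marker A and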


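$$D_B = P_{BB} - p_B^2 = -\tfrac{1}{2}P_{Bb} + p_B p_b = P_{bb} - p_b^2$$

for marker B, respectively. The coefficient of digenic gametic linkage disequilibrium between the two markers is defined as

$$D_{ab} = p_{AB} - p_A p_B.$$

For the nonequilibrium population, digenic linkage disequilibrium also occurs between nonalleles on different gametes, defined as

$$D_{a/b} = p_{A/B} - p_A p_B.$$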


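$$\begin{aligned}
D_{Ab} &= -\tfrac{1}{2}p^{B}_{Aa} - \tfrac{1}{2}(p_A - p_a)D_{ab} - \tfrac{1}{2}(p_A - p_a)D_{a/b} - p_B D_A + p_A p_a p_B \\
&= \tfrac{1}{2}p^{b}_{Aa} - \tfrac{1}{2}(p_A - p_a)D_{ab} - \tfrac{1}{2}(p_A - p_a)D_{a/b} + p_b D_A - p_A p_a p_b \\
&= p^{B}_{aa} + p_a D_{ab} + p_a D_{a/b} - p_B D_A - p_a^2 p_B \\
&= -p^{b}_{aa} + p_a D_{ab} + p_a D_{a/b} + p_b D_A + p_a^2 p_b.
\end{aligned}$$

The trigenic disequilibrium between one allele from marker A and two alleles from marker B is defined as

$$\begin{aligned}
D_{aB} &= -\tfrac{1}{2}p^{Bb}_{A} - \tfrac{1}{2}(p_B - p_b)D_{ab} - \tfrac{1}{2}(p_B - p_b)D_{a/b} - p_A D_B + p_A p_B p_b \\
&= \tfrac{1}{2}p^{Bb}_{a} - \tfrac{1}{2}(p_B - p_b)D_{ab} - \tfrac{1}{2}(p_B - p_b)D_{a/b} + p_a D_B - p_a p_B p_b \\
&= p^{bb}_{A} + p_b D_{ab} + p_b D_{a/b} - p_A D_B - p_A p_b^2 \\
&= -p^{bb}_{a} + p_b D_{ab} + p_b D_{a/b} + p_a D_B + p_a p_b^2.
\end{aligned}$$

With genotypic configuration frequencies, allele frequencies, HWD, gametic and nongametic disequilibria, and trigenic disequilibria, we can estimate the quadrigenic disequilibrium ($D_{AB}$) between two alleles from marker A and two alleles from marker B using the formulas given in Table 5-2 (see Weir 1996). Note that we use lowercase and uppercase subscripts to denote gametic and zygotic disequilibria, respectively. From Table 5-2, we can see that each of the genotypic configuration frequencies can be expressed in terms of the allele frequencies ($p_A$, $p_a$ and $p_B$, $p_b$), HWD coefficients ($D_A$ and $D_B$), and gametic ($D_{ab}$) and nongametic disequilibria of different orders ($D_{a/b}$, $D_{Ab}$, $D_{aB}$ and $D_{AB}$).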
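The following Python sketch computes the Hardy-Weinberg, digenic and trigenic coefficients from the ten configuration frequencies and checks that the four equivalent expressions for the trigenic coefficient between marker A's genotype and one allele of marker B agree. It is only an illustration under stated assumptions: the digenic coefficients are taken as $p_{AB} - p_A p_B$ and $p_{A/B} - p_A p_B$ (the standard definitions of Weir 1996), and the function name and dictionary keys are hypothetical.

```python
def low_order_disequilibria(P):
    """HWD (D_A), digenic (D_ab, D_a/b) and trigenic (D_Ab) coefficients.

    P uses the hypothetical keys AABB, AABb, AAbb, AaBB, AB|ab, Ab|aB,
    Aabb, aaBB, aaBb, aabb, where AB|ab and Ab|aB are the two
    configurations of the double heterozygote.
    """
    dh = P["AB|ab"] + P["Ab|aB"]            # all double heterozygotes
    P_AA = P["AABB"] + P["AABb"] + P["AAbb"]
    P_Aa = P["AaBB"] + dh + P["Aabb"]
    P_BB = P["AABB"] + P["AaBB"] + P["aaBB"]
    P_Bb = P["AABb"] + dh + P["aaBb"]
    pA, pB = P_AA + P_Aa / 2, P_BB + P_Bb / 2
    pa, pb = 1 - pA, 1 - pB

    D_A = P_AA - pA ** 2                     # Hardy-Weinberg disequilibrium

    # Assumed standard digenic definitions (gametic and non-gametic).
    p_AB = P["AABB"] + (P["AABb"] + P["AaBB"] + P["AB|ab"]) / 2
    p_A_B = P["AABB"] + (P["AABb"] + P["AaBB"] + P["Ab|aB"]) / 2
    D_ab = p_AB - pA * pB
    D_a_b = p_A_B - pA * pB
    s = D_ab + D_a_b

    # Joint frequencies of a marker-A genotype with one marker-B allele.
    pB_Aa = P["AaBB"] + dh / 2
    pb_Aa = P["Aabb"] + dh / 2
    pB_aa = P["aaBB"] + P["aaBb"] / 2
    pb_aa = P["aabb"] + P["aaBb"] / 2

    # Four algebraically equivalent expressions for the trigenic D_Ab.
    forms = [
        -pB_Aa / 2 - (pA - pa) * s / 2 - pB * D_A + pA * pa * pB,
        pb_Aa / 2 - (pA - pa) * s / 2 + pb * D_A - pA * pa * pb,
        pB_aa + pa * s - pB * D_A - pa ** 2 * pB,
        -pb_aa + pa * s + pb * D_A + pa ** 2 * pb,
    ]
    return {"D_A": D_A, "D_ab": D_ab, "D_a/b": D_a_b, "D_Ab_forms": forms}
```

If the four entries of `D_Ab_forms` differ by more than rounding error, the input frequencies do not sum to one or a sign has been mistyped, which makes the function a convenient consistency check.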


ExpressionsofquadrigenicdisequilibriumDABintermsofgenotypiccongurationfrequencies,allelefrequenciesandlower-orderdisequilibriumcoecients Fre-DADB+quency1D2ab+D2a=bDADBDabDa=bDAbDaB 2PBbAAp2ApBpb1pBpbp2ApApB+pApbpApB+pApbpB+pb2pAPbbAAp2Ap2b1p2bp2A2pApb2pApb2pb2pA1 2PBBAapApap2B1p2BpApapApB+papBpApB+papB2pBpA+pa1 2PBbAapApapBpb1pBpbpApapApBpapbpApb+papBpB+pbpA+pa1 2PbBAapApapBpb1pBpbpApapApb+papBpApBpapbpB+pbpA+pa1 2PbbAapApBp2b1p2bpApapApbpapbpApbpapb2pbpA+paPBBaap2ap2B1p2Bp2a2papB2papB2pB2pa1 2PBbaap2apBpb1pBpbp2apapBpapbpapBpapbpB+pb2paPbbaap2ap2b1p2bp2a2papb2papb2pb2pa 5-3 givesmatingfrequenciesandgenotypefrequenciesofospringineachfamily.IfoneparentisadoubleheterozygoteAaBb,thenanygenotypegeneratedintheospringwillincludeamixtureofgenotypesderivedfromgametesofitstwounderlyingdiplotypes(seeTable 5-3 ).Theproportionsofmixturecomponentsaredeterminedbytwoparameters,therecombinationfraction()betweenthetwomarkersandtherelativeproportions()oftwounderlyingdiplotypesforthedoubleheterozygote.AsseenfromTable 5-3 ,thesetwoparameterscontributetothegenotypefrequencyofospringinasymmetricalwaysothatitisnotpossibletoseparateonefromanother. 122


Matingfrequenciesoffamiliesandospringgenotypefrequenciesperfamilyfortwomarkerssampledfromanaturalpopulation ParentalMatingAABBAABbAAbbAaBBAaBbAabbaaBBaaBbaabb 21 23AABBAAbbPBBAAPbbAA14AABBAaBBPBBAAPBBAa1 21 25AABBAaBbPBBAA8<:PBbAaPbBAa1 2!11 2!21 2!21 2!16AABBAabbPBBAAPbbAa1 21 27AABBaaBBPBBAAPBBaa18AABBaaBbPBBAAPBbaa1 21 29AABBaabbPBBAAPbbaa1:::41AaBbAaBb8<:PBbAaPbBAa8<:PBbAaPbBAa1 4!211 2!1!21 4!221 2!1!21 2(!21+!22)1 2!1!21 4!221 2!1!21 4!21:::81aabbaabbPbbaaPbbaa1


2, we described the procedure of three-point analysis, which allows the estimation of disequilibria of high orders and crossover interference. By introducing new parameters, the crossover probabilities $g_{00}$ (no crossover between markers A and B or between markers B and C), $g_{01}$ (no crossover between markers A and B but one crossover between markers B and C), $g_{10}$ (one crossover between markers A and B but no crossover between markers B and C), and $g_{11}$ (one crossover between markers A and B and one between markers B and C), we can construct the frequencies of gametes derived from a given diplotype. These gamete frequencies, along with the relative diplotype frequencies, are used to characterize the genotype frequencies in the offspring within a given family. Using the EM algorithm for three-point analysis, we are able to estimate crossover probabilities and relative diplotype frequencies. For a three-SNP model, the following heterozygotes have two or four diplotypes with respective relative frequencies (each diplotype is followed in parentheses by the index of its relative frequency):

Heterozygote  Diplotype 1    Diplotype 2    Diplotype 3    Diplotype 4
AABbCc        ABC|Abc (1)    ABc|AbC (2)
aaBbCc        aBC|abc (3)    aBc|abC (4)
AaBBCc        ABC|aBc (5)    ABc|aBC (6)
AabbCc        AbC|abc (7)    Abc|abC (8)
AaBbCC        ABC|abC (9)    AbC|aBC (10)
AaBbcc        ABc|abc (11)   Abc|aBc (12)
AaBbCc        ABC|abc (13)   ABc|abC (14)   AbC|aBc (15)   Abc|aBC (16)

After these relative frequencies are estimated, we estimate the relative diplotype frequencies for the double heterozygote AaBb. The relative frequencies of diplotypes AB|ab and Ab|aB are estimated, respectively, by the sum of the estimates indexed 9, 11, 13 and 14 and by the sum of the estimates indexed 10, 12, 15 and 16.
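How the four crossover probabilities determine the gametes produced by a three-SNP diplotype can be sketched as below. This is a hedged illustration rather than the dissertation's implementation: the function name `gamete_frequencies` and the string encoding of haplotypes (for example `"ABC"` and `"abc"`) are assumptions made here, and each crossover class is assumed to yield its two reciprocal gametes with equal probability.

```python
def gamete_frequencies(hap1, hap2, g00, g01, g10, g11):
    """Gamete frequencies produced by the three-locus diplotype hap1|hap2.

    hap1 and hap2 are 3-character haplotype strings (loci A, B, C in order).
    g00, g01, g10, g11 are the probabilities of the four crossover classes:
    the first index refers to the A-B interval, the second to the B-C interval.
    """
    if abs(g00 + g01 + g10 + g11 - 1.0) > 1e-9:
        raise ValueError("crossover probabilities must sum to 1")

    classes = {(0, 0): g00, (0, 1): g01, (1, 0): g10, (1, 1): g11}
    freqs = {}
    for (cross_ab, cross_bc), prob in classes.items():
        # A gamete may start on either parental strand with probability 1/2.
        for strands in ((hap1, hap2), (hap2, hap1)):
            s = 0
            gamete = strands[s][0]          # allele at locus A
            s ^= cross_ab                   # switch strand after a crossover
            gamete += strands[s][1]         # allele at locus B
            s ^= cross_bc
            gamete += strands[s][2]         # allele at locus C
            freqs[gamete] = freqs.get(gamete, 0.0) + prob / 2
    return freqs
```

For the triple heterozygote ABC|abc this gives eight distinct gametes, while a diplotype that is homozygous at some locus (such as ABC|AbC) produces fewer distinct gametes because several crossover classes collapse onto the same haplotype.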


^PBbAa=^1^PAaBb forABjabandby ^PbBAa=^2^PAaBb forAbjaB. (5{16) (5{17) fortwodierentmarkers,respectively.Thehypothesesfortestingeachofzygoticdisequilibriabetweenthetwomarkersaregivenas (5{18) (5{19) (5{20) (5{21) Forthesehypothesesabove,wecalculatethelikelihoodsundertheH0andH1,respectively,fromwhichthelog-likelihoodratio(LR)iscalculated.TheLRteststatisticcalculatedfollowsa2-distributionwithonedegreeoffreedom.Itisalsopossibletotestwhetherallthedisequilibriumcoecientsaretogetherequaltozero.TheparametersthatneedbeestimatedundertheH0:Dab=Da=b=DAb= 125


Daly et al. 2001). These SNPs are divided into 11 discrete haplotype blocks. We used a three-point analysis to analyze every three adjacent SNPs within each block. In general, high linkage was detected between each pair of adjacent markers. About 8% of marker intervals display significant crossover interference. Within each block, markers display strong pairwise linkage disequilibria. Trigenic associations were common; over 90% of markers within blocks show significant linkage disequilibria of a high order. These results about linkage and LD suggest the importance of using a three-point analysis to characterize haplotype structure and effects in a human population.

The proposed model was used to estimate zygotic disequilibria for each pair of neighboring SNPs, with results shown in Appendix B. It is interesting to note that a high proportion of SNP pairs display significant nongametic disequilibria. For example, SNPs 1 and 2 have a nongametic disequilibrium of 0.147, whereas the gametic disequilibrium is close to zero. We also detected significant high-order disequilibria, but it seems that quadrigenic disequilibria occur more frequently than trigenic disequilibria. The detection of these zygotic disequilibria may reshape the traditional theory of population genetic studies.


Lynch and Walsh 1998). For those populations, HWE may be violated. We will need a new analysis that relaxes the random mating assumption. Weir and Ott (1996) introduced the concept of zygotic association or zygotic disequilibrium, which specifies the disequilibria between different loci in a nonequilibrium population. A multilocus statistic was proposed by Yang (2000, 2002) to examine zygotic associations in nonequilibrium populations. More recently, Liu et al. (2006) used these zygotic disequilibria to estimate the extent and distribution of zygotic disequilibrium across the canine genome.

In this chapter, I have for the first time developed a new statistical model for estimating all different types of zygotic disequilibria, including (1) Hardy-Weinberg disequilibria at each locus, (2) gametic disequilibrium (including two alleles in the same gamete, each from a different locus), (3) nongametic disequilibrium (including two alleles in different gametes, each from a different locus), (4) trigenic disequilibrium (including a zygote at one locus and an allele at the other), and (5) quadrigenic disequilibrium (including two zygotes, each from a different locus). This approach, based on a family design, is more advantageous than the Weir and Ott (1996) treatment, in which gametic and nongametic disequilibria are combined; Weir's approach allows the estimation of only part of the zygotic disequilibria. This model was used to analyze a real data set for Crohn's disease. We detect significant evidence for zygotic disequilibria, showing that the disequilibria at the zygotic level may have contributed to the evolution of a natural population. The approaches for zygotic disequilibrium analysis will provide a routine tool for identifying the overall picture of disequilibria across the genome and for the gene mapping of complex diseases.


(1) The design includes the segregation of alleles in a natural parental population and the transmission of alleles from parents to the next generation, allowing the simultaneous estimation of linkage and linkage disequilibrium and facilitating the construction of a linkage disequilibrium map;

(2) Because both parents are genotyped, the transmission of alleles from male and female parents can be traced. Thus, genetic imprinting effects, i.e., parent-of-origin effects, can be estimated;

(3) The family design can separate the diplotypes for a heterozygous genotype, thus providing the power to estimate disequilibrium parameters at the zygotic level and to extend population genetic principles to a non-equilibrium human population.

Because all the aspects mentioned above involve missing data problems, I have derived a library of mixture models for estimating genetic parameters for each case. Statistical properties of each model have been investigated by simulation studies. Other practical statistical issues, including the convergence rate, computational efficiency, and global maxima specification, are also examined.


(1) Integrate multilocus population and quantitative genetic principles into different genetic designs derived from natural human populations. Genetic modeling of human diseases is complex, involving a network of interactions between multiple single nucleotide polymorphisms or haplotypes from different chromosomal segments. Traditional quantitative genetic principles will be used to model the additive, dominant, and epistatic effects of genes. We will need to explore how linkage disequilibria of high orders affect genetic segregation in a population and how crossover interference contributes to genetic variation;

(2) Develop a warehouse of statistical methods for kinetic functional mapping of genes that affect complex diseases by incorporating their physiological pathways. It is interesting to study the genetic architecture of changes in disease risk factors with age by organizing ordinary differential equations into a statistical framework, facilitating an understanding of the dynamic pattern of genetic control of disease risk.

With these developments, we can construct a comprehensive set of predictive tools for human diseases by combining patients' epidemiological factors and/or transcriptomic, proteomic and metabolic data. We will particularly incorporate allometric scaling laws into our predictive tools, thus enhancing the biological relevance of the tools.




EstimatesofrelativefrequenciesfromageneticmappingprojectofCrohn'sdisease 2-locusmodel3-locusmodel1st-locus SNPSNPn51113114214314151723HWEtest 120.0001231020.0030.0031.0000.0000.0001.0000.9870.0000.0000.971230.0002341061.0000.0001.0000.0000.0000.0000.0150.0001.0000.000341.0003451030.9950.0001.0001.0000.0000.0001.0000.9601.0000.186450.993456930.0001.0000.9931.0000.0000.0001.0001.0000.9640.393560.990567800.9950.0000.9901.0000.0000.0001.0000.9861.0000.229670.993678850.0000.0000.9931.0000.0000.0001.0001.0001.0000.512780.000789820.0000.7631.0000.5130.0010.4861.0001.0000.2470.302890.0008910881.0000.0120.9960.0330.0000.0000.7690.0001.0000.2809100.00091011831.0001.0001.0000.9980.0000.0020.2390.0000.0010.00010110.000101112960.0120.0000.9970.0000.0001.0001.0000.0001.0000.90611120.000111213970.0000.8340.0010.0001.0000.0000.0060.9710.0000.00012130.0001213141101.0000.0000.0090.0000.0000.0001.0000.0001.0000.00013140.0001314151050.0010.0001.0000.9200.0000.0801.0001.0001.0000.62314150.000141516921.0001.0000.6640.9810.0010.0000.0000.9250.9980.00815160.000151617811.0000.9031.0000.0020.0000.9980.9800.0000.0000.59816170.000161718610.0000.0000.0000.0000.0530.0020.0100.2590.8610.17117180.000171819680.6960.0000.9530.0000.0000.9441.0000.0001.0000.85218190.000181920490.0000.2900.0280.0000.9340.0660.0000.0000.0000.26219200.000192021720.9911.0000.0000.0001.0000.0000.0001.0001.0000.00220210.000202122631.0000.0000.9930.0000.0000.0001.0000.0001.0000.93921220.000212223890.0001.0001.0000.0000.0010.9991.0001.0000.0000.54022230.000222324810.0001.0000.0000.0001.0000.0000.0021.0000.0020.00123240.000232425570.2270.0070.0000.0000.0250.0000.0000.0000.0000.02724250.000242526701.0000.9480.0000.9990.0000.0000.9851.0001.0000.59825260.000252627591.0001.0000.0001.0000.0000.0000.0001.0001.0000.65326270.000262728910.0001.0001.0000.0000.0001.0000.9440.0010.0000.22727280.000272829770.0000.0000.0000.0000.9720.0000.0000.0000.0000.10428290.000282930820.0011.0001.0000.0001.0000.0000.3471.0000.0000.58629300.000293031960
.0001.0000.0170.0001.0000.0000.0001.0000.0000.00030310.0003031321100.0000.0000.0000.0001.0000.0000.0000.0000.0000.00031320.0003132331121.0000.0000.0000.0000.0000.0000.0000.0001.0000.03332330.000323334880.0000.0001.0000.0000.0001.0001.0000.0000.0000.41033340.000333435920.1160.9980.0000.0001.0000.0000.0000.9990.0840.249


EstimatesofrelativefrequenciesfromageneticmappingprojectofCrohn'sdiseaseCont. 2-locusmodel3-locusmodel1st-locus SNPSNPn51113114214314151723HWEtest 34350.000343536941.0001.0000.0100.0000.9880.0000.0000.0000.0000.87335360.0003536371091.0001.0000.0001.0000.0000.0000.0041.0001.0000.07336370.000363738931.0001.0001.0001.0000.0000.0000.0000.9960.0000.00037380.000373839860.0001.0001.0001.0000.0000.0001.0001.0000.0250.00038390.830383940670.0000.9870.1750.0000.9970.0030.8300.9990.0000.63439400.000394041670.9931.0000.0000.0001.0000.0000.0001.0000.0050.38740410.000404142460.0281.0001.0000.4270.5730.0000.9881.0000.0000.00041420.000414243510.0001.0000.4770.0001.0000.0000.5691.0000.0000.76942430.000424344730.0001.0000.0000.0001.0000.0001.0001.0000.0000.64943440.000434445751.0001.0001.0000.0000.0000.0000.0000.0000.0000.00044450.000444546700.0000.0000.8250.0000.0001.0001.0000.0000.0000.00345460.000454647741.0001.0000.0000.0000.0000.0000.0001.0001.0000.03946470.000464748740.0030.0001.0000.0000.0001.0000.0060.0000.0000.00047480.000474849590.0000.6180.0030.0020.9960.0021.0001.0000.3950.00048490.000484950630.9990.1280.0000.0940.0000.0000.4221.0001.0000.59349500.000495051680.0000.0081.0000.0000.0001.0001.0000.3920.0000.98350510.000505152770.0011.0001.0000.0001.0000.0000.0000.0000.0000.37651520.000515253751.0001.0000.0010.0000.0020.0000.0000.0000.0000.02252530.000525354710.0001.0001.0000.0001.0000.0001.0001.0000.0000.32953540.000535455620.0000.0000.0000.0001.0000.0000.0000.0000.0000.00054550.000545556780.0000.0000.0000.0000.6490.1780.0000.0000.0000.33655560.000555657671.0000.0000.0360.0000.0000.0001.0000.0001.0000.06556570.000565758651.0000.0371.0000.9520.0000.0481.0000.0000.9980.48157580.000575859671.0001.0000.0001.0000.0000.0000.0001.0001.0000.03058590.000585960910.0001.0001.0001.0000.0000.0001.0001.0000.9990.93659600.000596061821.0000.0011.0000.6620.0000.3381.0000.0001.0000.00060610.000606162680.2360.0000.9910.0000.0000.9950.7640.0000.0150.82461620.000616263661.0001.0000.0200.0060.0000
.0000.0000.2291.0000.11162630.000626364781.0001.0001.0001.0000.0000.0001.0000.9870.9870.91563641.000636465830.0001.0001.0001.0000.0000.0001.0001.0000.0000.77264650.000646566830.0000.0460.9720.0000.0010.9991.0001.0000.0000.87865660.0006566671001.0001.0000.0000.0000.0000.0000.0001.0001.0000.78266671.000666768860.0000.9931.0000.8710.0000.1291.0001.0000.1770.437


EstimatesofrelativefrequenciesfromageneticmappingprojectofCrohn'sdiseaseCont. 2-locusmodel3-locusmodel1st-locus SNPSNPn51113114214314151723HWEtest 67680.000676869861.0001.0000.0000.9910.0000.0000.0051.0001.0000.20768690.000686970970.0000.0000.9250.0000.0001.0000.0000.0000.0000.31769700.000697071880.0000.0000.0000.0001.0000.0000.0000.0000.0000.00071720.000717273741.0000.0000.9560.2230.0000.0001.0000.0271.0000.00072730.000727374730.0000.0000.0000.0000.0001.0001.0000.0020.0030.54573740.000737475721.0001.0000.0020.0000.0000.0000.0000.0000.9980.10774750.000747576710.0000.9741.0000.0000.0001.0001.0001.0000.0000.59575760.000757677770.0000.6890.0000.0001.0000.0000.0000.0000.0000.32076770.000767778830.8860.9950.0000.0000.8380.0000.0001.0001.0000.87877780.000777879880.0001.0001.0000.0001.0000.0000.8931.0000.0000.62178790.000787980760.9650.0000.4190.0000.0000.0080.7020.0000.7220.02781820.000818283590.0000.0000.0000.0000.9060.0000.0000.3500.0520.24182830.000828384510.0001.0001.0000.0210.9530.0261.0001.0000.0000.34683840.000838485500.0000.0000.0000.0000.8910.0000.0000.0000.0000.84284850.000848586500.0020.0000.0000.0000.0000.0000.0000.0000.0000.01585860.000858687601.0001.0000.9760.0000.2690.0000.0000.0000.0000.55086870.250868788681.0000.9940.0000.9990.0000.0000.8801.0001.0000.47687880.000878889731.0000.0001.0000.0000.0001.0001.0000.0001.0000.62488890.000888990671.0000.0000.0000.9950.0000.0000.0000.0001.0000.34689901.000899091610.0000.0001.0000.9920.0000.0081.0000.0000.0000.24790911.000909192541.0001.0001.0000.9870.0000.0130.9810.0000.0000.41091920.000919293540.0000.3470.9760.0000.0130.9870.0000.0000.0000.69892930.000929394601.0000.0000.0000.0000.0000.0000.0001.0001.0000.26093940.000939495760.3981.0001.0000.0010.0000.9991.0000.0020.0000.41394950.000949596900.7791.0000.3120.0260.9510.0210.0360.1690.0000.07095960.000959697840.0000.0000.0000.0000.2300.5070.0160.0010.0000.01796970.000969798830.0000.0010.0040.0000.0000.0000.0000.0000.5640.44397980.000979899810.0000.9771.0000.9120.0000.
0881.0001.0000.1920.00198990.0009899100840.0000.0000.0000.0000.3990.6010.0000.0000.0000.980991000.00099100101940.0001.0001.0000.0001.0000.0001.0001.0000.0000.0731001010.000100101102960.0000.0000.0000.0001.0000.0000.0000.0000.0000.0271011020.000101102103970.0000.0000.0000.0001.0000.0000.0000.0020.0000.455


Estimates of zygotic disequilibrium from a genetic mapping project of Crohn's disease

SNP  pA  pB  DA  DB  Dab  Da/b  DAb  DaB  DAB


Estimates of zygotic disequilibrium from a genetic mapping project of Crohn's disease (Cont.)

SNP  pA  pB  DA  DB  Dab  Da/b  DAb  DaB  DAB


Estimates of zygotic disequilibrium from a genetic mapping project of Crohn's disease (Cont.)

SNP  pA  pB  DA  DB  Dab  Da/b  DAb  DaB  DAB


Akey, J., Jin, L., Xiong, M., et al. "Haplotypes vs single marker linkage disequilibrium tests: what do we gain?" European Journal of Human Genetics 9 (2001).4: 291-300.

Altshuler, D., Daly, M. J., and Lander, E. S. "Genetic mapping in human disease." Science 322 (2008): 881-888.

Ardlie, K. G., Kruglyak, L., and Seielstad, M. "Patterns of linkage disequilibrium in the human genome." Nature Reviews Genetics 3 (2002).4: 299-309.

Bader, J. S. "The relative power of SNPs and haplotype as genetic markers for association tests." Pharmacogenomics 2 (2001).1: 11-24.

Barton, N. H. and Gale, K. S. Hybrid Zones and the Evolutionary Process. Oxford: Oxford University Press, 1993.

Barton, S. C., Surani, M. A. H., and Norris, M. L. "Role of paternal and maternal genomes in mouse development." Nature 311 (1984).5984: 374-376.

Bateson, W. Mendel's Principles of Heredity. United Kingdom: Cambridge, 1909.

Bennett, J. H. and Binet, F. E. "Association between Mendelian factors with mixed selfing and random mating." Heredity 10 (1956): 51-55.

Broman, K. W. and Speed, T. P. "A model selection approach for the identification of quantitative trait loci in experimental crosses (with discussion)." J. Roy. Stat. Soc. B 64 (2002): 641-656.

Broman, K. W. "The genomes of recombinant inbred lines." Genetics 169 (2005).2: 1133-1146.

Browning, S. R. "Estimation of pairwise identity by descent from dense genetic marker data in a population sample of haplotypes." Genetics 178 (2008).4: 2123.

Burnham, K. P. and Anderson, D. R. Model Selection and Inference: A Practical Information-Theoretic Approach. Springer, New York, 1998.

Carlborg, O. and Haley, C. S. "Epistasis: too often neglected in complex trait studies?" Nature Reviews Genetics 5 (2004).8: 618-625.

Cattanach, B. M. and Kirk, M. "Differential activity of maternally and paternally derived chromosome regions in mice." Nature 315 (1985): 496-498.

Charlesworth, B. "The evolution of sex chromosomes." Science 251 (1991).4997: 1030-1033.

Chen, Z. "The full EM algorithm for the MLEs of QTL effects and positions and their estimated variances in multiple-interval mapping." Biometrics 61 (2005).2: 474-480.


Churchill, G. A. and Doerge, R. W. "Empirical threshold values for quantitative trait mapping." Genetics 138 (1994).3: 963-971.

Clark, A. G. "Inference of haplotypes from PCR-amplified samples of diploid populations." Molecular Biology and Evolution 7 (1990).2: 111-122.

Clark, A. G. "The role of haplotypes in candidate gene studies." Genetic Epidemiology 27 (2004).4: 321-333.

Collins, A. R. Linkage Disequilibrium and Association Mapping: Analysis and Applications. Humana Press, New York, 2007.

Collins, F. S., Guyer, M. S., and Chakravarti, A. "Variations on a theme: cataloging human DNA sequence variation." Science 278 (1997).5343: 1580.

Cooper, R. S. and Psaty, B. M. "Genomics and Medicine: Distraction, Incremental Progress, or the Dawn of a New Age?" Annals of Internal Medicine 138 (2003): 576-580.

Cui, Y., Lu, Q., Cheverud, J. M., Littell, R. C., and Wu, R. "Model for mapping imprinted quantitative trait loci in an inbred F2 design." Genomics 87 (2006).4: 543-551.

Daly, M. J., Rioux, J. D., Schaffner, S. F., Hudson, T. J., and Lander, E. S. "High-resolution haplotype structure in the human genome." Nature Genetics 29 (2001).2: 229-232.

Darvasi, A. "Experimental strategies for the genetic dissection of complex traits in animal models." Nature Genetics 18 (1998).1: 19-24.

Dawson, E., Abecasis, G. R., Bumpstead, S., Chen, Y., Hunt, S., Beare, D. M., Pabial, J., Dibling, T., Tinsley, E., Kirby, S., et al. "A first-generation linkage disequilibrium map of human chromosome 22." Nature 418 (2002).6897: 544-548.

De La Vega, F. M., Dailey, D., Ziegle, J., Williams, J., Madden, D., and Gilbert, D. A. "New generation pharmacogenomic tools: a SNP linkage disequilibrium map, validated SNP assay resource, and high-throughput instrumentation system for large-scale genetic studies." Biotechniques 32 (2002): 548-554.

Dupuis, J., Siegmund, D., and Yakir, B. "A unified framework for linkage and association analysis of quantitative traits." Proc. Natl. Acad. Sci. USA 104 (2007): 20210-20215.

Falls, J. G., Pulford, D. J., Wylie, A. A., and Jirtle, R. L. "Genomic imprinting: implications for human disease." American Journal of Pathology 154 (1999).3: 635-647.


Farnir, F., Grisart, B., Coppieters, W., Riquet, J., Berzi, P., Cambisano, N., Karim, L., Mni, M., Moisio, S., Simon, P., et al. "Simultaneous Mining of Linkage and Linkage Disequilibrium to Fine Map Quantitative Trait Loci in Outbred Half-Sib Pedigrees: Revisiting the Location of a Quantitative Trait Locus With Major Effect on Milk Production on Bovine Chromosome 14." Genetics 161 (2002).1: 275-287.

Fulker, D. W. and Cardon, L. R. "A sib-pair approach to interval mapping of quantitative trait loci." American Journal of Human Genetics 54 (1994).6: 1092-1103.

Gabriel, S. B., Schaffner, S. F., Nguyen, H., Moore, J. M., Roy, J., Blumenstiel, B., Higgins, J., DeFelice, M., Lochner, A., Faggart, M., et al. "The structure of haplotype blocks in the human genome." Science 296 (2002).5576: 2225-2229.

Georges, M. "Mapping, fine mapping, and molecular dissection of quantitative trait loci in domestic animals." Ann. Rev. Genomics Human Genet 8 (2007): 131-162.

Giannoukakis, N., Deal, C., Paquette, J., Goodyer, C. G., and Polychronakos, C. "Parental genomic imprinting of the human IGF2 gene." Nature Genetics 4 (1993).1: 98-101.

Haldane, J. B. "The association of characters as a result of inbreeding and linkage." Annals of Eugenics 15 (1949).1: 15-23.

Harrison, R. G. Hybrid Zones and the Evolutionary Process. Oxford University Press, USA, 1993.

Hernandez, R. D., Hubisz, M. J., Wheeler, D. A., Smith, D. G., Ferguson, B., Rogers, J., Nazareth, L., Indap, A., Bourquin, T., McPherson, J., et al. "Demographic histories and patterns of linkage disequilibrium in Chinese and Indian rhesus macaques." Science 316 (2007).5822: 240-243.

Hill, W. G. and Hernandez-Sanchez, J. "Prediction of multilocus identity-by-descent." Genetics 176 (2007).4: 2307.

Hirschhorn, J. N. and Lettre, G. "Progress in genome-wide association studies of human height." Hormone Research 71 (2009): s5-s13.

Hou, W., Yap, J. S., Wu, S., Liu, T., Cheverud, J. M., and Wu, R. L. "Haplotyping a quantitative trait with a high-density map in experimental crosses." PLoS ONE 2 (2007).8: e732.

Huang, B. E., Amos, C. I., and Lin, D. Y. "Detecting haplotype effects in genomewide association studies." Genetic Epidemiology 31 (2007): 803-812.


Jannink, J. L. and Wu, X. L. "Estimating allelic number and identity in state of QTLs in interconnected families." Genetics Research 81 (2003): 133-144.

Jiang, C. and Zeng, Z. B. "Multiple trait analysis of genetic mapping for quantitative trait loci." Genetics 140 (1995).3: 1111-1127.

Judson, R., Stephens, J. C., and Windemuth, A. "The predictive power of haplotypes in clinical response." Pharmacogenomics 1 (2000).1: 15-26.

Kao, C. H., Zeng, Z. B., and Teasdale, R. D. "Multiple interval mapping for quantitative trait loci." Genetics 152 (1999).3: 1203-1216.

Lander, E. S. and Botstein, D. "Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps." Genetics 121 (1989).1: 185-199.

Lee, S. H. and Van der Werf, J. H. J. "Using dominance relationship coefficients based on linkage disequilibrium and linkage with a general complex pedigree to increase mapping resolution." Genetics 174 (2006).2: 1009-1016.

Lettre, G., Jackson, A. U., Gieger, C., Schumacher, F. R., Berndt, S. I., Sanna, S., Eyheramendy, S., Voight, B. F., Butler, J. L., Guiducci, C., et al. "Identification of ten loci associated with height highlights new biological pathways in human growth." Nature Genetics 40 (2008).5: 584-591.

Lettre, G. and Rioux, J. D. "Autoimmune diseases: insights from genome-wide association studies." Human Molecular Genetics 17 (2008): R116-R121.

Li, J., Wang, S., and Zeng, Z. B. "Multiple-interval mapping for ordinal traits." Genetics 173 (2006).3: 1649.

Lin, D. Y. and Huang, B. E. "The use of inferred haplotypes in downstream analyses." The American Journal of Human Genetics 80 (2007).3: 577-579.

Lin, D. Y. and Zeng, D. "Likelihood-based inference on haplotype effects in genetic association studies." Journal of the American Statistical Association 101 (2006).473: 89-104.

Liu, T., Johnson, J. A., Casella, G., and Wu, R. "Sequencing complex diseases with HapMap." Genetics 168 (2004).1: 503-511.

Liu, T., Todhunter, R. J., Lu, Q., Schoettinger, L., Li, H., Littell, R. C., Burton-Wurster, N., Acland, G. M., Lust, G., and Wu, R. "Modeling extent and distribution of zygotic disequilibrium: implications for a multigenerational canine pedigree." Genetics 174 (2006).1: 439-453.






Qin Li, originally trained in applied mathematics at Dalian University of Technology, China, received her Ph.D. from the University of Florida in the summer of 2009. Qin majored in statistics while simultaneously working for the Department of Epidemiology and Health Policy Research. Her research focuses on statistical genetics. She is intrigued by the development of statistical and computational models for identifying genes that control complex traits and diseases. In her Ph.D. dissertation, Qin explored several fundamental aspects of family data in constructing the linkage disequilibrium map of the human genome and fine mapping disease genes. A library of statistical models has been derived to estimate and test the pattern of gene segregation in a natural population and the genetic effects of haplotypes on complex diseases. She is eager to use her models and algorithms to solve complicated real-world genetic problems. Qin is a member of the American Statistical Association and the Eastern North American Region/International Biometric Society.