STABLE ALLELIC LINEAGES OF MHC CLASS II GENES
WITHIN THE GENUS MUS
A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY UNIVERSITY OF FLORIDA 1990
This dissertation is dedicated to the members of our family as a token of my appreciation for the love, support and encouragement they have provided over the years.
The intellectual environment provided by Dr. Edward K. Wakeland has been the single most important factor in the enrichment of my evolution as a researcher; to him I am deeply indebted. It is my privilege to express my sincere gratitude to him for his patient guidance as well as constant infusion of encouragement and inspiration, and for allowing me to exercise thoughtful freedom to proceed with this work.
I thank the members of my supervisory committee, Drs. Kuo-Jang Kao, Harry S. Nick, Ammon B. Peck and William E. Winter, for their advice and assistance throughout. The timely help and attention of my colleague Richard McIndoe during the preparation of the present work needs a special mention.
I acknowledge Drs. Wayne Potts, Murali, Jin-Xion She and William Wang for their technical help and guidance.
I would like to thank the people in the department for what they have done and provided for me to make the completion and success of my graduate study possible.
My appreciation is extended to Dr. Linda Smith for her friendship and hospitality through the years.
My sincere thanks are extended to Dr. Ahmad N. Ali and Charles C. Brown for providing the cloning vectors pBluescript SK(+) and pBluescript KS(+) free of charge, and for their technical advice.
My sincere appreciation is extended to Vickie Henson, Thomas McConnell, Roy Tarnuzzer, Judith Nutkins, Stefen Boehme, Ivan Chang, Ying Ye, Mary Yu, Karen Wright, Julio Mas, Kristy Myrisk, Jerome and Xemena for their lively company, loving support and constant encouragement.
TABLE OF CONTENTS
ACKNOWLEDGEMENTS
LIST OF FIGURES
ABSTRACT
1 INTRODUCTION
2 GENOMIC ORGANIZATION OF MAJOR HISTOCOMPATIBILITY COMPLEX
  H-2 Complex
    Three Classes of Mhc Genes
    Organization of Mouse Mhc
    Genetic Organization of the I Region
    Linkage Relationship of Class II Genes
    Biochemistry of Class II Molecules
    Analysis of the Structure-Function Relationship of Class II Molecule
    Functional Role of Mhc Gene
    Genetic Polymorphism of Mhc Genes
    Recombination Within the Mhc
  Definition of Evolutionary Lineage
  Structure and Evolution of Retroposon
    Structure of Alu and "Alu-like" Family
    Mechanisms of Retroposition
    Function Attributable to SINE
  Evolution of Intron
  Wild Mice As a Useful Genetic Tool
3 MATERIALS AND METHODS
  Wild Mice
  Source of Mouse Tissues and Preparation of DNA
  Restriction Enzyme Digestion and Agarose Gel Electrophoresis
  Probes
  Capillary Transfer and Hybridization
  Genomic Restriction Mapping
  Nucleotide Sequencing
  Data Analysis
    RFLP Patterns of Ab Alleles and Their Phylogenetic Relationships
    Computer Programs
  Polymerase Chain Reaction (PCR) Amplification
    Enzymatic Amplification of Genomic DNA
    Amplification of Central Fragment for DNA Hybridization
4 SEQUENCE ANALYSIS OF LINEAGE 3 ALLELES
  Restriction Enzyme Analysis of Lineage 3 Alleles
    Restriction-Site Polymorphism of Lineage 3 Alleles
    Distinct Intron Size Between Lineage 2 and 3 Alleles
  DNA Sequence of Lineage 3 Intron
  Lineage 3 Derived from Lineage 2
  Ab Genes Can Be Divided Into 4 Lineages
    Defining Evolutionary Lineage 2B
    4 Evolutionary Lineages of Ab Genes
5 EVOLUTION OF MHC CLASS II GENE POLYMORPHISM
  RFLP Analysis of Ab Genes Within the Genus Mus
  Lineage Distribution of Ab Alleles Within the Genus Mus
  Phylogenetic Relationships of 86 Ab Alleles in the Genus Mus
6 DISCUSSION
  Function of Mhc Genes
  Features of Mhc Polymorphism
  Mechanism of Generating Ab Gene Polymorphisms
  Mhc Genes Evolve via Trans-species Mode
  Possible Impact of Retroposon on Ab Gene Expression
  Linkage Disequilibrium Among Restriction Sites
  Maintenance of Mhc Polymorphism
    Overdominant Selection for Mhc Polymorphism
    Divergent Allele Advantage
  Alu-like Repetitive Elements in Ab Genes
    SINE as Evolutionary and Genetic Tags
    539 bp Retroposon: a Newly Arisen Repetitive Family
  Transposition of Middle Repetitive Elements
    Preferential Site of Integration
    Possible Transposition Mechanism
  Phylogenetic Relationship of Ab Genes
REFERENCE LIST
BIOGRAPHICAL SKETCH
LIST OF FIGURES
Figure 2-1 Location of genes in the Mhc of the BALB/c mouse
Figure 2-2 Genomic structures of Mhc class I molecules
Figure 2-3 Genomic structures of Mhc class II α and β chain
Figure 2-4 Location of Mhc class I and class II genes within H-2 complex
Figure 2-5 A model of the antigen-binding site of the Mhc class II I-A molecules
Figure 2-6 Recombinatorial association and expression of α and β chain of Mhc class II molecules
Figure 2-7 Segmental exchange of Mhc class II Ab genes
Figure 2-8 Illustration of the evolutionary origins of the three lineages of Ab alleles
Figure 2-9 Analysis of the sequence homology of Abd (lineage 1) and Abb (lineage 2)
Figure 2-10 Location of recombinational hot spot (RHS) within the H-2 complex
Figure 2-11 A proposed mechanism for SINE retroposition
Figure 2-12 Proposed sequence of events by which a group II intron could mutate into a classical intron
Figure 2-13 Geographical distribution of four separate subspecies of Mus musculus complex
Figure 2-14 Geographical distribution of four separate species of genus Mus
Figure 2-15 Phylogenetic relationships within the genus Mus and Rattus
Figure 3-1 The genomic restriction map of Abd probe
Figure 3-2 The partial restriction map of Abk and the sequencing strategy
Figure 3-3 The sequences flanking the target site (GATTCTGATACA) for the "Alu-like" (B1) element
Figure 3-4 Location of two insertional events in a lineage 3 allele (Ab)
Figure 3-5 The nucleotide sequence of 539 bp insert
Figure 4-1 Restriction mapping performed by double digest experiment
Figure 4-2 Restriction mapping carried out by double digest experiment
Figure 4-3 Restriction maps of seven lineage 3 Ab alleles
Figure 4-4 Comparison of restriction maps of representative lineage 2 and 3 alleles
Figure 4-5 The 3735 bp of nucleotide sequence of Abk
Figure 4-6 Partial nucleotide sequence of intron 2 from Abk
Figure 4-7 Location of two inserts in a lineage 3 (Abk) allele
Figure 4-8 Sequence identity between the retroposon sequence in lineage 2 (Ab) and 3 (Abk) alleles
Figure 4-9 Sequence identity among 3 Ab alleles
Figure 4-10 Sequence alignment among three B1 repeats
Figure 4-11 Southern blot experiments with Abd and 235 bp non-repetitive element probe
Figure 4-12 Blot hybridization experiment with 235 bp non-repetitive probe
Figure 4-13 PCR amplification of DNA samples from 12 species and subspecies of genus Mus
Figure 4-14 PCR amplification of DNA samples from lineage 3 alleles and recombinant inbred strains
Figure 4-15 A typical RFLP analysis and restriction mapping
Figure 4-16 Restriction analysis of PCR-amplified products
Figure 4-17 Summary of the evolutionary relationship among four lineage Ab alleles
Figure 5-1 Restriction maps of 86 Ab alleles derived from Table 5-1
Figure 5-2 Diagram illustrating the evolutionary origins of the 4 lineages of Ab alleles assayed
Figure 5-3 Example of a restriction site allele used for parsimony analysis
Figure 5-4 Phylogenetic relationships of 86 Ab alleles derived from 12 species and subspecies of genus Mus
Figure 5-5 Phylogenetic relationships of 86 Ab alleles from 12 species and subspecies of Mus
Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy
STABLE ALLELIC LINEAGES OF MHC CLASS II GENES WITHIN THE GENUS MUS
Chairperson: Dr. Edward K. Wakeland
Major Department: Pathology and Laboratory Medicine
Previous studies have organized alleles of the Mhc class II Ab gene into 3 evolutionary lineages based on genomic structure. The major distinction between lineages 1 and 2 is an 861 bp retroposon in the intron separating the Aβ1 and Aβ2 exons in lineage 2 alleles. By using this retroposon as an evolutionary tag, we have extended our molecular genetic studies of Ab to include 115 independently derived H-2 haplotypes from 12 separate species and subspecies of the genus Mus. Ab alleles from lineages 1 and 2 were found in all 3 aboriginal species (Mus spretus, Mus spicilegus, and Mus spretoides) and in Mus caroli, indicating that these two lineages of Ab alleles diverged a minimum of 2.5 million years ago. Parsimony analysis of 86 Ab alleles, using restriction sites as character states, indicated that lineage 3 alleles are evolutionarily more closely related to lineage 2 than to lineage 1. The DNA sequence of intron 2 from a lineage 3 allele was determined. The data indicated that lineage 3 was derived from a lineage 2 allele by two additional insertional events in intron 2. One insertion, composed of an Alu-like (B1) repeat, occurred 508 bp 3' of the Aβ1 exon. By using the polymerase chain reaction and restriction analysis, a lineage 2 allele from Mus m. musculus was identified that carries this B1 insert, thus defining a new lineage, 2B. The other insertion, occurring within the lineage 2 retroposon, starts 1141 bp 3' of the Aβ1 exon. This latter insertion is 539 bp in length and is composed of Alu-like repetitive elements and unique sequence. In summary, the murine Ab genes can be divided into 4 distinct evolutionary lineages, 1, 2A, 2B, and 3, which were produced by 3 independent retroposon insertions. Lineage 3 alleles were found in Mus m. musculus and Mus m. domesticus, indicating that lineage 3, as well as lineages 2A and 2B, diverged a minimum of 0.5 million years ago. These results indicate that all 4 lineages of Ab have persisted through several speciation events in the genus Mus.
INTRODUCTION
The I region of the murine major histocompatibility complex (H-2) contains a tightly linked cluster of highly polymorphic genes (class II) that control immune responsiveness. Two major hypotheses have been proposed to account for the origin of this polymorphism, which is believed to be essential for the function of the class II proteins in immune protection of the host. The first is that hypermutational mechanisms (gene conversion or segmental exchange) promote the rapid generation of diversity in Mhc genes. The alternative is that polymorphism arose from the steady accumulation of mutations over long evolutionary periods, and that multiple specific alleles commonly survived speciation events (trans-species evolution or ancestral polymorphism). In a previous study, McConnell et al. (1988) used restriction fragment length polymorphism (RFLP) and sequence analysis of Ab alleles to seek evidence of "segmental exchange" and/or "trans-species evolution" in the class II genes of the genus Mus. That study detected 31 Ab alleles in a collection of 49 H-2 haplotypes derived from 5 separate species and subspecies in the genus Mus. These alleles were organized into 3 evolutionary lineages on the basis of retroposon polymorphisms occurring in the intron (intron 2) separating the exons that encode the β1 and β2 domains of Ab. By using this retroposon sequence as an evolutionary tag, they demonstrated that the Ab alleles in two of these lineages diverged at least 0.5 million years ago and that alleles from both lineages survived the speciation events leading to several modern Mus species. These findings indicate that class II gene polymorphisms are evolving in a trans-species manner, suggesting that the extensive diversity of Mhc class II genes predominantly reflects the steady accumulation of mutations in distinct lineages of alleles that are selectively maintained in natural populations for long evolutionary periods.
In this dissertation, we address two additional issues concerning the evolution of Ab in Mus. The first issue concerns the evolutionary origin of lineage 3. What is the nature of the retroposon polymorphism in lineage 3 alleles, and was lineage 3 derived from lineage 1 or lineage 2? If so, what kind of evolutionary mechanism generated lineage 3? We have addressed this issue by sequencing a 3.8 kb DNA segment containing intron 2 from a prototypic lineage 3 allele. The results clearly indicate that lineage 3 alleles are derived from a lineage 2 allele by two additional independent retroposon insertions in intron 2. The second issue concerns the distribution of the various Ab lineages within the genus Mus and how long these Ab lineages have persisted in the genus. We have addressed this issue by expanding the RFLP analysis to include 115 independently derived H-2 haplotypes from 12 separate species and subspecies of the genus Mus. A total of 86 Ab alleles were identified from this analysis. Parsimony analysis, using restriction sites as character states, was also used to construct evolutionary trees of Ab alleles and determine their phylogenetic relationships. DNA sequence and restriction enzyme analysis indicate that Ab genes can be divided into 4 distinct evolutionary lineages, which were generated by three independent insertional events. The presence of the various lineages in different species and subspecies of Mus further supports the idea that the Mhc genes evolved in a trans-species fashion and have persisted over long evolutionary timespans in the genus Mus.
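The relationship between the three intron 2 insertions and the four lineages can be summarized as a simple lookup. The following is only an illustrative sketch in Python, not part of the original analysis: the class name `Intron2Inserts`, the function `classify_lineage` and the label "unclassified" are hypothetical, and the insert patterns assigned to lineages 1 and 2A are one reading of the description above (lineage 1 lacks the 861 bp retroposon; lineage 2A carries it without the B1 repeat).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intron2Inserts:
    """Presence or absence of the three intron 2 insertions named in the text."""
    retroposon_861bp: bool  # separates lineage 2 from lineage 1
    b1_repeat: bool         # Alu-like (B1) insert carried by lineages 2B and 3
    insert_539bp: bool      # additional insert reported only for lineage 3


# Insert patterns as read from the text, keyed by
# (861 bp retroposon, B1 repeat, 539 bp insert). Assumed, not quoted.
LINEAGE_BY_PATTERN = {
    (False, False, False): "1",
    (True, False, False): "2A",
    (True, True, False): "2B",
    (True, True, True): "3",
}


def classify_lineage(inserts: Intron2Inserts) -> str:
    """Return the lineage label, or 'unclassified' for a pattern the text does not describe."""
    pattern = (inserts.retroposon_861bp, inserts.b1_repeat, inserts.insert_539bp)
    return LINEAGE_BY_PATTERN.get(pattern, "unclassified")


if __name__ == "__main__":
    # Hypothetical alleles, used only to exercise the lookup.
    examples = {
        "allele_without_inserts": Intron2Inserts(False, False, False),
        "allele_with_all_inserts": Intron2Inserts(True, True, True),
    }
    for name, inserts in examples.items():
        print(name, "-> lineage", classify_lineage(inserts))
```

Patterns outside the four described ones are deliberately left unclassified rather than forced into a lineage, since the text does not report them.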
GENOMIC ORGANIZATION OF MAJOR HISTOCOMPATIBILITY COMPLEX
In the past decade our understanding of the major histocompatibility complex has advanced dramatically because of the application of both monoclonal antibody techniques and recombinant DNA technology. Biologists are now able to characterize in molecular terms one of the most fundamental phenomena of eukaryotic biology: the ability of organisms to discriminate between self and nonself. Even the most primitive of metazoa, the sponges, display cell surface recognition systems capable of discerning and destroying nonself, probably to maintain the integrity of individuals surviving in densely populated environments (Hildemann et al. 1981). Such self/nonself recognition systems share three fundamental features: cell-surface recognition structures, effector mechanisms that result in the destruction of nonself, and a high degree of genetic variability in the recognition structures (Hood et al. 1983).
In mammalian genetic systems, a chromosomal region termed the Mhc encodes a self/nonself recognition system with similar features. Although all vertebrates appear to possess a homologous Mhc, it has been most extensively studied in mouse (H-2) and in man (HLA) (Gotze et al. 1977). The Mhc was first identified in mice (H-2) because of the availability of inbred and congenic strains of mice. By grafting tumors or skin between such strains and following rejection or acceptance of the graft, Gorer and others (Gorer et al. 1938, 1948) were able to map the rejection phenomena to a region on chromosome 17, which was then denoted the Mhc. In the mouse, at least 60 traits, most of which are associated with the immune response, have been mapped to the Mhc using classic genetic techniques (Klein 1975).
The Mhc is defined as a group of genes coding for molecules that provide the context for the recognition of foreign antigens by T lymphocytes (Klein 1983). "Context" implies that T cells do not recognize antigen alone, but instead recognize antigen in the context of Mhc molecules on the surface of antigen-presenting cells. Thus far, Mhc genes have been found only in vertebrates. It is not known whether all vertebrates possess an Mhc, but so far it has been identified in twenty vertebrate species (Klein 1986).
Three Classes of Mhc Genes
Traditionally, the Mhc genes have been divided into three classes, I, II and III. Class I molecules are involved in transplantation rejection and T-cell-mediated cytotoxic killing. Class II molecules serve as restriction elements during the processing and presentation of foreign antigen to regulate the immune response. Certain complement components, e.g. C3 and C4, are encoded by class III genes within the Mhc. However, no significant homology can be shown between Mhc genes and complement genes, and although the C4 gene is closely linked to the Mhc in many species, the C3 gene is only loosely linked to the Mhc in some species and is not linked in others (Alper 1981). Klein et al. (1983) have argued against the inclusion of the complement genes as a class of Mhc genes.
Organization of Mouse Mhc
The H-2 complex of the laboratory mouse is the only Mhc in which nearly all of the loci have been identified and their positions determined. For example, the molecular maps of the Mhc genes of the C57BL/10 (Weiss et al. 1984) and BALB/c (Steinmetz et al. 1982a; Winoto et al. 1983) haplotypes have been extensively characterized. From the centromeric part of the Mhc of the BALB/c mouse, a 600 kb segment has been cloned that contains two class I genes (K and K2) and seven class II genes (Pb (Aβ3) to Ea) (Steinmetz et al. 1986) (Figure 2-1).
Figure 2-1. Location of genes in the Mhc of the BALB/c mouse.
Following a gap of about 170 kb, a second gene cluster of 330 kb has been cloned from the S region; it contains four genes (C4, Slp, Bf, C2) coding for complement or related components and two homologous genes (21-OHA and 21-OHB), one of which encodes steroid 21-hydroxylase (Muller et al. 1987). A third gene cluster covering 500 kb of DNA has been isolated from the D and Qa regions; it contains 13 class I genes (D to Q1-10) (Stephan et al. 1986) and the TNF-α and TNF-β genes coding for cytotoxins (Muller et al. 1987b). In the Tla region, a total of 19 class I genes are distributed in 3 gene clusters. In summary, the Mhc of the BALB/c mouse contains 50 loci, of which 34 are class I and 7 are class II genes (Steinmetz & Uematsu 1987). In the Mhc of the C57BL/10 mouse, by contrast, 26 class I genes have been identified, of which 10 are in the Qa-2,3 region and 13 in the TL region (Flavell et al. 1985). Among the 3 H-2 haplotypes (b, d and k) analyzed thus far, the K and class II regions show no large differences in organization (Klein & Figueroa 1986).
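The gene counts quoted above for the BALB/c Mhc can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only; the dictionary and function names are hypothetical, and the per-region numbers are simply those stated in the preceding paragraphs (two class I genes in the K region, 13 in the D and Qa regions, 19 in the Tla region, 7 class II genes, 50 loci in all).

```python
# Class I gene counts reported in the text for the BALB/c Mhc, by region.
CLASS_I_GENES_BY_REGION = {
    "K": 2,          # two class I genes in the centromeric cluster
    "D and Qa": 13,  # third gene cluster
    "Tla": 19,       # distributed over 3 gene clusters
}
CLASS_II_GENES = 7   # class II genes in the centromeric cluster
TOTAL_LOCI = 50      # total loci stated for the BALB/c Mhc


def class_i_total() -> int:
    """Sum the class I genes over all regions."""
    return sum(CLASS_I_GENES_BY_REGION.values())


def other_loci() -> int:
    """Loci that are neither class I nor class II (complement, 21-OH, TNF, etc.)."""
    return TOTAL_LOCI - class_i_total() - CLASS_II_GENES


if __name__ == "__main__":
    print("class I genes:", class_i_total())
    print("loci outside class I and class II:", other_loci())
```

The regional counts add up to the 34 class I loci given in the summary, which leaves 9 loci that are neither class I nor class II.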
Genetic loci of class I gene
There are two class I genes (H-2K and H-2K1) at the centromeric end of the H-2 region; all the remaining class I genes are at the telomeric end. The class I loci can be divided into two subclasses: I-a, consisting of loci with a known function (H-2K, H-2D, H-2L), and I-b, consisting of the remaining loci, whose functions are largely unknown. The class II loci and a group of unrelated loci, including genes coding for complement components, are inserted between the two H-2K loci and the rest of the class I loci (Figure 2-1). The class I loci can be assigned to one of four regions, K, D, Qa and Tla, depending on their position; this division only in part reflects the evolutionary relationships among the individual loci (Klein & Figueroa 1986). Class I transplantation antigens are found on virtually all nucleated cells of the mouse. The cell surface antigens encoded in the Qa-2,3 and Tla regions can be distinguished from classical class I antigens because they are less polymorphic and more limited in tissue distribution than K- or D-encoded antigens (Flaherty et al. 1980).
Class I gene structure. The exon-intron organization of class I genes is remarkably similar from one gene to another. Each class I gene is composed of 8 exons, which correlate precisely with the domain structure of the class I polypeptide (Figure 2-2) (Steinmetz et al. 1981; Nathenson et al. 1981). The first exon encodes the leader peptide; the second, third, and fourth exons encode the α1, α2 and α3 domains. The fifth exon encodes the transmembrane region, and the sixth, seventh, and eighth exons encode the cytoplasmic domain and 3' untranslated region (Figure 2-2).
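The exon-to-domain correspondence just described can be written down as a small lookup table. This is an illustrative sketch only; the names `CLASS_I_EXON_MAP` and `exons_encoding` are hypothetical, and because the text does not say how the cytoplasmic domain and 3' untranslated region are divided among exons 6 to 8, those three exons share a single label here.

```python
# Exon number -> encoded region of the class I polypeptide, as described in the text.
CYTOPLASMIC = "cytoplasmic domain and 3' untranslated region"

CLASS_I_EXON_MAP = {
    1: "leader peptide",
    2: "alpha1 domain",
    3: "alpha2 domain",
    4: "alpha3 domain",
    5: "transmembrane region",
    6: CYTOPLASMIC,  # exons 6-8 are described jointly in the text
    7: CYTOPLASMIC,
    8: CYTOPLASMIC,
}


def exons_encoding(keyword: str) -> list[int]:
    """Return the exon numbers whose encoded region mentions the keyword."""
    return [exon for exon, region in CLASS_I_EXON_MAP.items() if keyword in region]


if __name__ == "__main__":
    print("external domain exons:", exons_encoding("alpha"))
    print("cytoplasmic exons:", exons_encoding("cytoplasmic"))
```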
Figure 2-2. Genomic structures of Mhc class I molecules.
Class I polypeptide. The class I protein has a molecular weight of 45,000 daltons and is a transmembrane glycoprotein noncovalently associated with β2-microglobulin (β2m), a 12,000-dalton polypeptide encoded by a gene located on chromosome 2 in the mouse (Goding et al. 1981; Michaelson et al. 1981; Robinson et al. 1981). Amino acid sequence analyses have demonstrated that the class I antigen can be divided into 5 domains (Coligan et al. 1981). The three external domains, α1, α2 and α3, are each about 90 residues in length. The transmembrane portion is about 40 residues and the cytoplasmic region is about 30 residues long. The α2 and α3 domains each have a centrally placed disulfide bridge spanning about 60 residues, and up to three N-linked glycosyl units are bound to these domains (Maloy et al. 1982). Amino acid sequence analyses also suggest that the α3 domain (Strominger et al. 1980) and β2-microglobulin (Peterson et al. 1972) show strong sequence homology to the constant region domains of immunoglobulins. Binding studies of class I molecules with peptide fragments have shown that the β2m subunit associates with the α3 domain (Yokoyama et al. 1983).
Three dimensional model of class I molecules. Recently, the three-dimensional structure of the human class I molecule HLA-A2 was determined by X-ray crystallographic analysis (Bjorkman et al. 1987a, b). Soluble HLA-A2 was purified and crystallized after papain digestion of plasma membranes from a homozygous human lymphoblastoid cell line. Papain treatment yields a molecule composed of α1, α2, α3 and β2m. This class I molecule consists of two pairs of structurally similar domains: α1 has the same tertiary fold as α2, and likewise α3 has the same tertiary fold as β2m. The α3 and β2m domains both have β-sandwich structures composed of two antiparallel β-pleated sheets, one with four β-strands and one with three β-strands, connected by a disulphide bond. The same tertiary structure has been shown for the constant region of immunoglobulin and is consistent with the high degree of sequence homology among α3, β2m and the constant region. The structurally similar α1 and α2 domains are paired, with the four β-strands from each domain forming a single antiparallel β-sheet with eight strands. This particular intramolecular "dimeric interaction" (McLachian et al. 1980) seen between α1 and α2, involving the creation of a single β-sheet from two domains, has been observed in many intermolecular dimers, and has been proposed to be preserved in an intermolecular dimer such as the Mhc class II molecules (Bjorkman et al. 1987b).
Antigen binding site of class I molecule. Several observations suggest that the groove between the α1 and α2 helices is the antigen binding site (ABS) (Bjorkman et al. 1987b). It is located at the end of the molecule distal from the membrane, in a position capable of being recognized by receptors of another cell. The site, ~25 Å long by 10 Å wide by 11 Å deep, has a size and shape consistent with this role. By analogy with class II molecules, class I molecules bind processed antigen in the form of peptides. Synthetic peptides have been shown to bind to purified murine class II molecules, presumably mimicking processed antigen (Guillet et al. 1986). Because class I and class II molecules have homologous structures (Kaufman et al. 1984) and T cells specific for either class I or class II molecules use the same receptors (Rupp et al. 1985; Marrack & Kappler 1986), the type of interaction described between peptides and class II molecules is assumed to apply to peptides and class I molecules. Electron density representing an unknown molecule, possibly a bound peptide antigen, is found in the site in two crystal forms of HLA-A2 class I molecules (Bjorkman et al. 1987b). An α-helical conformation has been proposed for bound peptide (Berkower et al. 1986; Allen et al. 1987). Thus, one face of a peptide α-helix is envisioned to contact the class II molecule, and the other to be contacted by the T cell receptor. Many of the polymorphic residues that are responsible for recognition by T cells and for haplotype-specific association with antigens are located in this site, where they could serve as ligands to a processed antigen. This is further evidence that this region functions as the antigen binding site (Bjorkman et al. 1987b). Most of the non-conserved residues are located in and around the ABS, suggesting that most variable residues in class I molecules have been selected to generate an ability to present many different peptides. It is also noted that some conserved amino acid residues are located in the ABS, suggesting that they may recognize a constant feature of processed antigens, consistent with previous suggestions.
Genetic Organization of the I Region
In the past, the I region was divided into five subregions by serological and functional analysis of recombinant H-2 haplotypes: I-A, I-B, I-J, I-E and I-C (Murphy 1981; Klein et al. 1981; Klein et al. 1983). The subregions are defined by crossover positions in H-2 recombinant strains. However, so far only four I region-associated (Ia) products have been identified by both serological and biochemical analysis (Jones 1977; Uhr et al. 1979). The failure to identify gene products encoded by the I-B, I-J, and I-C subregions has been explained as follows:
The existence of a separate I-B subregion was initially proposed by Lieberman and coworkers (1972) to explain the genetic control of the antibody response to a myeloma protein. The involvement of the I-B subregion was later postulated for immune responses to at least five other antigens: lactate dehydrogenase B (LDHB) (Melchers et al. 1973), staphylococcal nuclease (Lozner et al. 1974), oxazolone (Fachet et al. 1977), the male-specific antigen (Hurme et al. 1978), and trinitrophenylated mouse serum albumin (Urba et al. 1978). In all these cases the mapping of genes controlling the immune response centered on the four critical H-2 haplotypes used by Lieberman and her co-workers, i.e. B10.A (H-2a), C57BL/10 (H-2b), B10.A(4R) (H-2h) and B10.A(5R) (H-2i5). However, further analysis by Baxevanis et al. (1981) of the response to LDHB and to myeloma protein MOPC173 revealed the involvement of Th and Ts cells in the response to these antigens, making the postulate of a separate I-B subregion unnecessary.
The I-J locus was originally defined serologically and mapped between I-A and I-E by reciprocal alloantisera raised between strains B10.A(3R) and B10.A(5R), which are inbred congenic recombinant strains with a crossover between the I-A and I-E subregions (Murphy et al. 1978a, 1978b). Alloantisera and monoclonal antibodies raised against I-J-encoded molecules react with determinants expressed on suppressor T cells and with the soluble suppressor T cell factors released by these cell lines (Krupen et al. 1982). Considerable experimental data support the existence of the I-J locus (Murphy et al. 1978a; Waltenbaugh et al. 1981). However, its true identity and chromosomal location remain elusive. By using restriction fragment length polymorphisms (RFLP) to map the crossover points among inbred congenic mouse strains that have recombination events between the I-A and I-E loci, the I-J subregion was mapped to a 3.4 kb segment of DNA between I-A and I-E, including the 3' half of the Eb gene (Steinmetz et al. 1982). Molecular cloning of this 3.4 kb region from ten parental and intra-I recombinant inbred strains has narrowed the distance between the crossover points separating I-A and I-E to 2.0 kb, contained entirely within the intron between the β1 and β2 exons of the Eb gene (Kobori et al. 1984). Although many explanations have been put forth to account for the apparent paradox of I-J, all of them are refuted by experiments showing that cloned DNA of this region fails to hybridize to mRNA isolated from I-J+ suppressor T cell lines (Kronenberg et al. 1983).
The I-C subregion was defined by the Ia.6 specificity, detected as a cytotoxic antibody present in B10.A(4R) (H-2h2) anti-B10.A(2R) (H-2h4) antiserum (Sandrin et al. 1981). These antisera containing purported anti-I-C antibodies were shown to react with a suppressor factor generated in a mixed lymphocyte reaction (MLR) (Rich et al. 1979; Rich et al. 1979). An MLR generated in a congenic strain combination differing at the I-C subregion can be inhibited by the addition of anti-I-C antisera (Okuda et al. 1978). Mapping by classic genetic methods has suggested a locus in the I-C subregion between Ea and the gene coding for the C4 complement component. Although this segment of DNA has not been characterized using molecular techniques, the data available do not lend support to the existence of I-C. Others have never been able to demonstrate any activity in the I-C-defining H-2h2 anti-H-2h4 combination by serological methods, MLR, graft-versus-host reaction, or cell-mediated lymphocytotoxicity (CML) assays (Juretic et al. 1981; Livnat et al. 1973).
Linkage Relationship of Class II Genes
Class II gene loci
Chromosomal walking through the I region by the ordering of overlapping cosmid clones (Steinmetz et al. 1982a) as well as genetic mapping of restriction fragment length polymorphisms (Mathis et al. 1983; Hood et al. 1983), has allowed the chromosomal localization of the loci encoding the four functional defined class II genes. A continuous stretch of about 500 kb of DNA encompassing the I region was first isolated by screening a BALB/c sperm cosmid library with a human Mhc class II DRA cDNA probe (Steinmetz et al. 1982a). This 500 kb region of DNA includes the right end of I region, as the complement component C4 gene mapping into the S region can be identified (Figure 2-1). C4 gene is located a few hundred kb distal to the Ea gene and was identified by a synthetic oligonucleotide probe specific for the aminoterminal of C4a subunit. Five class II genes, Aa, Ab, Eb, Eb2, and Ea extending over a 90 kb region of DNA, have been
identified. Ab, Aa and Ea were identified by DNA sequence analysis, and Eb was identified by a specific oligonucleotide probe. Eb2 was identified by cross-hybridization with a human DRA cDNA probe and mouse Eb gene. The identity of Eb gene was confirmed by comapping via RFLP analysis which localizes a serologically defined Eb recombinant in the middle of Eb gene (reviewed by Hood et al. 1983). Southern blot analysis of mouse genomic DNA with class II probes suggested that class II genes are single copy and that there are no more than two a genes and six p genes in the mouse genome (Steinmetz et al. 1982a; Devlin et al. 1984). All the known class II loci are contained in a tightly-linked cluster, inserted between the H-2K and C4 genes. This cluster contains 4 functional genes and 4 pseudogenes, which are further divided into two subclasses, I-A and I-E. The eight class II genes, Pb (A03), Ob (AP2), Ab, Aa, Eb, Eb2, Ea, and Eb3, are arranged in this order from the centromeric towards the telomeric end (Steinmetz et al. 1982a; Davis et al. 1984; Larhammar et al. 1983; Widera et al. 1985) (Figure 2-1 & Figure 2-3). Out of the eight genes, only four are have been shown to encode gene products, A coupled with A# to form I-A molecules, E, with Eg to form I-E molecules (Jones et al. 1978; Uhr et al. 1979). The Ob and Eb2 genes are reported to be transcribed, but at very low levels and have no detectable protein product (Wake & Flavell 1986). The Pb gene is a pseudogene, at least in the b and k haplotypes, as it has a deletion of eight nucleotides
and a termination codon in its sequence. The Eb3 gene has thus far been found only in the H-2b haplotype, but probably also exists in other haplotypes (Flavell et al. 1985b). All haplotypes studied thus far contain these class II genes. The distances between these genes are, with a few exceptions, approximately the same in different haplotypes.
Biochemistry of Class II Molecules
Up to now only four I region-associated (Ia) products have been identified by both serological and biochemical methods. The I-A subregion contains three loci that encode three serologically detectable polypeptides: Aβ, Aα, and Eβ (Jones et al. 1978). The I-E subregion contains a locus that encodes a fourth class II polypeptide chain, Eα (Uhr et al. 1979).
Structure of class II polypeptides
The two class II molecules encoded in the I-A and I-E subregions are both heterodimeric glycoproteins composed of one heavy (α) and one light (β) chain (Figure 2-3 and Figure 2-4). The α chains range in molecular weight from 30,000 to 33,000 and the β chains from 27,000 to 29,000. The difference in molecular weight between the α and β chains is due to an extra N-linked glycosyl unit attached to the α chain (reviewed by Klein et al. 1983). The structure of the class II polypeptides has been determined in a number of
studies (McNicholas et al. 1982; Mathis et al. 1983a; Malissen et al. 1984; Benoist et al. 1983; Larhammar et al. 1983; Estess et al. 1986). The sequence data available suggest that the mouse I-A and I-E molecules are homologous to the human DQ and DR class II molecules, respectively (McNicholas et al. 1982; Malissen et al. 1983a; Larhammar et al. 1983). Each class II chain consists of two extracellular domains, α1 and α2 or β1 and β2, each about 90 residues in length, a transmembrane region of about 30 residues, and a cytoplasmic tail of about 10-15 residues. Three of the four extracellular domains (α2, β1 and β2) have a centrally placed disulfide bridge spanning about 60 amino acid residues, while the α1 domain does not. The membrane-proximal domains of both α and β chains, like that of class I molecules, show strong homology to immunoglobulin constant-region domains. In this respect, the class I and class II molecules are very similar to each other in overall organization and domain structure. For each of the two polypeptide chains of class II molecules, the polymorphic residues are concentrated in the amino-terminal α1 and β1 domains (Benoist et al. 1983; Larhammar et al. 1983). These domains are responsible for binding peptides in what appears to be a single site. By aligning the sequences of class II α and β chains with the class I heavy chain, matching the α1 and β1 domains of class II with the α1 and α2 domains of class I, a hypothetical tertiary structure for class II molecules has been proposed (Brown et al. 1987) (Figure
2-5). The folding of the class II molecule resembles that of class I, in that two α helices are supported by an eight-stranded β-pleated sheet (Brown et al. 1988). The recent results of Perkins et al. (1989), showing that peptides presented by class I molecules can be presented by class II molecules and vice versa, support the notion that the structures of the peptide-binding sites are similar in class I and class II molecules.
Structures of class II genes
There is a striking correlation between the gene organization and the domain structure of Mhc class II molecules (Figure 2-4). Both α and β genes begin with a leader-encoding exon that also contains 3-6 residues of the mature protein. Exons 2 and 3 encode the α1 or β1 and the α2 or β2 domains, respectively. β genes have three exons encoding the TM, CY, and 3'UT regions, while α genes have the TM, CY, and beginning of the 3'UT region in exon 4, and the rest of the 3'UT region in exon 5 (Larhammar et al. 1983; Estess et al. 1986).
Analysis of the Structure-Function Relationship of Class II
The application of DNA-mediated gene transfer (DMGT) has been a major advancement in the analysis of structure and function relationships of Mhc gene products. Particularly,
DMGT has provided insight into the actual biochemical bases of immune recognition and regulation, which are highly dependent on the fine structure of Mhc-encoded products and T cell receptors with which they interact.
Regulation of class II gene expression
The expression of class II genes is normally limited to a number of tissues (Klein 1986). Cell surface expression of class II molecules is positively regulated by gamma interferon, which can increase both class I and class II gene expression (King & Jones 1983). It appears to act at the level of transcription, since surface expression is correlated with the level of specific mRNA (Nakamura et al. 1984). Initial studies on class II gene expression following transfection were performed using cells that either constitutively expressed endogenous class II genes (B lymphomas) or were inducible for them (macrophage cell lines) (reviewed by Germain & Malissen 1986). Introduction of genomic copies of mouse class II genes into B lymphomas resulted in high levels of transcription and in expression of the products of the transfected genes on the cell surface (Ben-Nun et al. 1984). However, it was difficult to assign the observed changes in serologic reactivity or T cell restriction to the introduced gene products. The assembly of a variety of class II molecules following the introduction of α and/or β chains prevented the dissection
of which introduced chain caused the phenotypic traits. Ia-negative mouse fibroblast L cell lines, derived from the original L-cell line of C3H fibroblasts, have been used for a variety of gene transfer studies. Using cosmid clones containing the complete DRA and DRB genes, Rabourdin-Combe & Mach (1983) first demonstrated that L cells can express class II molecules. No expression was seen when either the DRA gene or the DRB gene was introduced separately. This is consistent with the suggestion that α:β pairing is required for efficient cell-surface expression of Mhc class II molecules, although one recombinant, A.TFR5 (I-Af, Eak), has been suggested to express a free Eα chain on the cell surface (Begovich et al. 1985). These observations were confirmed by the studies of Malissen and coworkers (1984) and Norcross et al. (1985) with mouse class II genes. In both studies, transfection of either an α or a β chain gene alone failed to lead to membrane expression, whereas cotransfection of Aα:Aβ pairs derived from the same haplotype (e.g. AαdAβd, AαkAβk) resulted in significant surface expression. These results agree with those obtained using Ia+ recipient cells, in that the independent transfer of α or β chain genes results in expression only through pairing with the endogenous complementary class II gene products (Ben-Nun et al. 1984). However, one should be cautious about the view that α:β heterodimers are required for surface expression, as most of the monoclonal antibodies used for the detection of membrane molecules have not been
shown to react with single α or β chains, which presumably assume a different conformation as single chains than when paired with the complementary chain. Thus, surface expression of an isolated α or β chain might be undetectable using standard reagents. However, additional experiments are also consistent with a lack of surface expression of free α or β chains. McCluskey et al. (1985) compared the surface expression of the Aβk chain gene in L cells with the membrane expression of a chimeric class II:class I gene. The chimeric molecule is composed of the Aβ1k domain covalently linked to the α3, TM and CY portions of the class I Dd molecule. Following transfection, expression of the chimeric gene can be detected with both anti-I-Ak and anti-α3 (Dd) monoclonal antibodies. The same anti-I-Ak antibodies failed to detect surface expression on L cells transfected only with the native Aβk chain gene, even though these cells were shown to contain a high level of Abk mRNA. This pair of cell lines was also analyzed using a rabbit anti-I-A heteroantiserum, which has been shown to precipitate free Aβ chain from a reticulocyte lysate in vitro translation product (Robinson et al. 1983) and to detect both Aα and Aβ polypeptides in western blots (Germain & Malissen 1986). Again, the cells containing the chimeric gene stained, but the cells containing the native Aβk gene alone did not. These results indicate that single α or β chains do not reach the cell surface efficiently and further imply that the Aβ1 domain per se does not prevent surface expression.
Dispensability of I-E molecules
It has been estimated that some 20% of wild mouse populations do not express I-E molecules (Gotze et al. 1981). Laboratory inbred mouse strains of the b, s, f, and q haplotypes fail to express serologically detectable I-E molecules (Jones et al. 1981). The defect in mice of the b and s haplotypes is due to a deficiency of Eα chains: the Eα polypeptide is undetectable in the cytoplasm, while a normal amount of cytoplasmic Eβ chains can be visualized on 2-D gels (Jones et al. 1981). The expression defect of these strains can be complemented by crossing b or s haplotypes with Ea-expressing strains, which results in normal expression of hybrid I-E molecules in F1 hybrids (Jones et al. 1981). However, neither the Eα nor the Eβ chain can be detected in the cytoplasm of f- and q-haplotype mice, because of defective processing of both Ea and Eb mRNA (Mathis et al. 1983; Tacchini-Cottier et al. 1988).
L cells have also been used to examine the issue of allelic control of α:β pairing and restrictions on cross-isotype α:β assembly. Initial studies by Fathman & Kimoto (1981) and Silver et al. (1980) suggested that Ia+ cells from heterozygous individuals contain a mixture of Ia molecules derived from the free assortment of allelic α and β chains of a single isotype in all possible combinations. Thus, in (H-2b x H-2k)F1 mice, one would find AαbAβb, AαbAβk, AαkAβb and AαkAβk
heterodimers in approximately equivalent proportions. Such α- and β-chain mixing within an isotype did not seem to occur between distinct isotypes (i.e. A:E). However, during attempts to develop cell lines expressing only F1-type Ia molecules (e.g. AαdAβk), it was found that although haplotype-matched Aα:Aβ pairs yield high expression in primary transfectants, cotransfection of haplotype-mismatched pairs gave little or no expression (Germain et al. 1985). This was true even though the genes used for the matched and mismatched pairs were identical, and despite the presence of detectable Aa and Ab mRNA in the nonexpressing cells. Additional experiments revealed that for genes of the b, d and k haplotypes, cis-chromosomal α:β pairs (e.g. AαkAβk) always gave better expression than trans-pairs (e.g. AαkAβb); the expression of the latter varied over a wide range, depending on the particular allelic forms of α and β employed. Furthermore, AαbAβk and AαkAβb molecules, the basis for the previously suggested "free pairing", are the best expressed haplotype-mismatched combinations, whereas AαkAβd has never been detected. In order to map the region of the Aβ molecule controlling the preferential pairing, recombinant Aβ molecules involving the b, d and k alleles were constructed. The entire Aβ1 domain was exchanged between different alleles, or the amino-half of Aβ1 was covalently linked to the carboxyl-half of Aβ1 and various Aβ2, TM and CY regions. These "domain and hemi-domain shuffled" Ab genes
were independently cotransfected with Aab, Aad, or Aak into L cells. The results indicate that the most important portion of Ab with respect to α:β pairing is the amino-half of Aβ1, in that molecules containing this region from a given allele were expressed best with the cis-matched Aa and at levels similar to wild-type Ab, irrespective of the origin of the remainder of the Ab gene. However, when isotype-different α:β pairs were cotransfected into L cells, the results were quite unexpected. Although introduction of Abk and Eaa/k yielded no surface Ia detectable with either anti-Ab or anti-E antibodies, Abd did pair with Ea to produce membrane molecules reactive with anti-I-Abd and anti-I-Ea antibodies. Immunoprecipitation studies showed that these molecules existed as noncovalently associated dimers (Germain & Quill 1985). These data support the view that Aa and Ab genes located on the same chromosome actually coevolve for best "fit", such that cis-pairs form more efficiently than trans-pairs (Figure 2-6). This view is further supported by the studies of McNicholas et al. (1982), showing an 8-10 fold preference for Eαu:Eβu assembly over Eαu:Eβk assembly in cells of (B10.A(4R) x B10.PL)F1 mice. The data on cross-isotype molecules indicate that control of α:β pairing is strongly influenced by the highly polymorphic amino termini. To evaluate the relative efficiency of inter- versus intra-isotypic Ia dimer expression, L cells were sequentially transfected with multiple class II α and β chain genes (Germain & Sant 1989). Then individual clones were analyzed
both for the level of mRNA produced by the transfected genes and for their expression of inter- and intra-isotypic dimers at the surface. In a three-gene transfection system (e.g., Ab, Ea, and Eb), it was found that the isotype-matched EαEβ dimer was expressed at 3-5 times the efficiency of the isotype-mismatched EαAβ dimer, based on the amounts of each β chain required to drive cell surface expression of the limited amount of Eα. When Aα and Eα were compared for their coexpression with a relative excess of Aβ, the efficiency advantage of isotype-matched (AαAβ) versus isotype-mismatched (EαAβ) dimers was about 3 to 4 fold. Additional experiments employing transfectants expressing Abd, Aad, Ebd, and Ea showed that in clones expressing mRNA ratios similar to those of B cells, only the isotype-matched dimers were expressed. In clones that expressed high levels of Aβd, in addition to isotype-matched AαdAβd and EαEβd there was a significant amount of EαAβd at the cell surface. These data indicate that asymmetry in individual chain levels can lead to the expression of less favored isotype-mismatched dimers. In a recent report, recombinant mouse strains and transgenic mice with defective Eb genes, but with normal Ea genes, were examined for surface expression of E molecules (Anderson & David 1989). E molecules were shown to be expressed in B10.RFB2 (Abf, Aaf, Ebf, Eak) and B10.RQB3 (Abq, Aaq, Ebq, Eak) by cell surface staining with an anti-Eα monoclonal antibody (14-4-4) in flow cytometry analysis. It has been proposed that these E
molecules in fact may be hybrid Ia dimers formed by Eα:Aβ pairing, as they cannot be stained by Eβ-specific antibodies and can be detected in H-2q mice carrying the Eak transgene. This finding is further supported by the demonstration of EαdAβd as a major class II molecule at the cell surface of a BALB/c B cell lymphoma (Spencer & Kubo 1989). Furthermore, although the hybrid EαAβ dimer cannot be isolated by immunoprecipitation, it can function in vivo, leading to the clonal deletion of two Vβ TcR subsets, Vβ6 and Vβ11 (Anderson & David 1989), which have been shown to interact with the I-E molecule during thymic selection (Kappler et al. 1987).
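As an illustration of the pairing combinations discussed in this section, the following minimal sketch enumerates every α:β heterodimer that could form if chains paired without restriction, and labels each one as cis (same isotype and haplotype), trans (same isotype, different haplotype) or mixed-isotype. The function name, the tuple representation of chains and the printed labels are illustrative assumptions and are not notation from the cited studies. The sketch only lists what is combinatorially possible; it says nothing about the expression levels actually observed.

```python
from itertools import product

def possible_dimers(alpha_chains, beta_chains):
    """Enumerate all alpha:beta pairings.

    Each chain is an (isotype, haplotype) tuple, e.g. ("A", "b").
    Hypothetical helper for illustration only.
    """
    dimers = []
    for (a_iso, a_hap), (b_iso, b_hap) in product(alpha_chains, beta_chains):
        dimers.append({
            "dimer": f"{a_iso}-alpha({a_hap}):{b_iso}-beta({b_hap})",
            "isotype_matched": a_iso == b_iso,
            # cis pair: both chains of one isotype from the same haplotype
            "cis_pair": a_iso == b_iso and a_hap == b_hap,
        })
    return dimers

# (H-2b x H-2k)F1, I-A isotype only: two alpha and two beta alleles
f1_alpha = [("A", "b"), ("A", "k")]
f1_beta = [("A", "b"), ("A", "k")]

for d in possible_dimers(f1_alpha, f1_beta):
    if d["cis_pair"]:
        kind = "cis"
    elif d["isotype_matched"]:
        kind = "trans"
    else:
        kind = "mixed-isotype"
    print(d["dimer"], kind)
```

Passing chains of both isotypes, for example an E α chain together with an A β chain, yields the mixed-isotype combinations in the same way.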
Functional Role of Mhc Gene
One of the most distinguishing features of gene products of Mhc is their extensive genetic diversity. One of the most important breakthroughs in cellular immunology was the discovery that the influence of gene products of the Mhc on immune response stemmed mainly from the critical role they played in the activation of regulatory T lymphocytes (Benacerraf 1981; Heber-Katz et al. 1982, 1983). Immune T cells are clonally specific and only recognize foreign antigens in the context of appropriate Mhc molecules. The discovery of this Mhc-restriction was possible only because Mhc molecules are polymorphic and T cells selected by an antigen in the context of one polymorphic variant can be
activated only by the same combination of foreign antigen and Mhc molecules (reviewed by Parham 1984). T cells must corecognize antigen in association with one of these Mhc-encoded molecules in order for activation to occur. Cytotoxic T cells prefer class I molecules whereas inducer T cells prefer class II molecules. However, the relationship between the antigen-specific and Mhc-specific recognition components of the T-cell receptor remained speculative until the advent of T-cell cloning. Kappler et al. (1981) fused two T-cell clones with different specificities and asked whether the antigen- and Mhc-specific components could segregate independently. A hybridoma specific for ovalbumin (OVA) in association with I-Ak molecules was fused to a normal T-cell line specific for keyhole limpet hemocyanin (KLH) in the context of I-Af molecules. The resulting cloned somatic hybrid could be stimulated to secrete interleukin-2 by either original pair of antigen and Ia molecule, but not by OVA in association with I-Af or KLH with I-Ak. These results indicated that T cell recognition of antigen was dependent on recognition of the Ia molecules. The first convincing evidence that Ia molecules and antigen interact with each other during the T-cell activation process came from studies of B10.A mice immunized with pigeon cytochrome c (Heber-Katz et al. 1982). In defining the specificity of the response by using cytochrome c from different species, it was noted that moth cytochrome c and its C-terminal fragment always elicited a heteroclitic
response, i.e. it was more potent on a molar basis than the immunogen, pigeon cytochrome c. Although most of the B10.A (Eβk:Eαk) T-cell hybridomas specific for pigeon cytochrome c could be stimulated by moth cytochrome c in association with B10.A(5R) hybrid I-E (Eβb:Eαk) antigen-presenting cells, they could not be stimulated by pigeon cytochrome c in the context of the hybrid I-E molecule. No other antigen-presenting cells (APCs) carrying disparate H-2 haplotypes, e.g., APCs from B10 and B10.A(4R) mice (neither strain expresses I-E molecules), gave any stimulation. Thus, these T-cell clones were able to recognize moth cytochrome c associated with either Eβk:Eαk or Eβb:Eαk Ia molecules. Other experimental evidence also suggested that antigen recognition by cytotoxic T cells was fundamentally similar to that of helper T cells (Hunig & Bevan 1982). Using Ia-containing planar membranes as antigen-presenting particles together with defined synthetic peptides, it was demonstrated that Ia and "processed" antigen are the only requirements for T cell recognition. That Ia and processed antigen interact specifically prior to T cell recognition was supported by the observation that antigens could compete with one another at the level of antigen presentation in the absence of T cells (reviewed by Buus et al. 1987). The first direct biochemical evidence of a specific antigen/Mhc interaction came from equilibrium dialysis studies using affinity-purified Mhc molecules and labeled synthetic peptide (Babbitt et al. 1985). They
demonstrated that the hen egg lysozyme (HEL) peptide 46-61 [HEL(46-61)] bound to I-Ak, but not to I-Ad. This binding study correlated with the finding that T cells specific for HEL(46-61) from high-responder H-2k mice are restricted by I-Ak, whereas H-2d mice are low responders. These results demonstrated a correlation between immunogenic peptide-Ia interaction and Mhc restriction (Babbitt et al. 1985). Furthermore, it was shown that the failure of pigeon cytochrome c to be recognized in the context of the hybrid I-E molecule was due to the inability of the hybrid I-E molecule to interact with pigeon cytochrome c-derived synthetic peptides (Buus et al. 1987). Each Mhc molecule binds many different peptides, using a single binding site and probably through the recognition of broadly defined motifs (Buus et al. 1987). This concept of a single antigen-binding site is compatible with the recently described X-ray crystallographic structure of a human class I molecule (Bjorkman et al. 1987a, 1987b).
Genetic Polymorphism of Mhc Genes
There are five distinguishing features of H-2 polymorphism in wild mice that have been the subject of considerable investigation. 1) there is a large number of alleles encoded by each genetic locus. The most polymorphic genetic loci known in the mouse are located within the H-2 complex. Although at least 50 alleles have been detected for the H-2K and for the H-2D genes, it is estimated that at least 100
alleles may exist for each of these genes (Gotze et al. 1980; Klein & Figueroa 1981, 1986). Other genes within the H-2 complex are also highly polymorphic, but they tend to be less polymorphic than the H-2K and H-2D genes. 2) most if not all wild mice are heterozygous with respect to H-2 class I and class II genes (Duncan et al. 1979; Nadeau et al. 1981). This high level of heterozygosity is unprecedented in the mouse and is mainly, if not entirely, a result of the presence of a large number of alleles in wild mouse populations. It was estimated that 90-95% of wild mice are heterozygous at both the K and D loci and at least 85% are heterozygous at the Ab and Eb loci (Duncan et al. 1979; Nadeau et al. 1981). These figures concur with the high H-2 polymorphism estimated from the antigen and gene frequencies (Klein 1986). 3) H-2 polymorphism occurs as families of closely related alleles. Both amino acid and DNA sequence analyses demonstrate that the similarity between H-2 genes and proteins is discontinuous (Wakeland et al. 1986). 4) both sequence and amino acid analysis of serologically and biochemically indistinguishable class II molecules derived from different subspecies suggest that they are identical (Arden et al. 1980; Arden & Klein 1982). 5) there is a high percentage of nucleotide difference between alleles of the same locus. The nucleotide sequence variation can be as high as 5-10%, including the coding region (Benoist et al. 1983; Estess et al. 1986).
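The link between allele number and heterozygosity noted in point 2 can be illustrated with the standard expected-heterozygosity formula for a randomly mating population, H = 1 - sum(p_i^2), where p_i is the frequency of allele i. The sketch below is an illustration under stated assumptions, not a reanalysis of the cited surveys: the equal allele frequencies are hypothetical, and the function names are introduced here for the example only.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity under random mating: 1 - sum(p_i^2)."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in freqs)

def equal_frequencies(n_alleles):
    """Hypothetical case: n alleles, all equally frequent."""
    return [1.0 / n_alleles] * n_alleles

# With equally frequent alleles the formula reduces to H = 1 - 1/n
for n in (2, 10, 50, 100):
    h = expected_heterozygosity(equal_frequencies(n))
    print(f"{n:>3} equally frequent alleles: H = {h:.3f}")
```

Under this simplifying assumption, heterozygosity above 90% requires at least ten equally frequent alleles; unequal frequencies lower H for the same number of alleles, so real populations would need more.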
Mechanisms generating polymorphism of Mhc genes
Mutation. It is generally believed that the ultimate source of genetic variation is mutation (Nei 1987b). There is no evidence suggesting that the extensive diversity of the Mhc is generated by a high mutation rate (Hayashida & Miyata 1983; Klein 1987). Serologic typing of class II genes of wild mice in global populations suggested that class II molecules can be arranged into families of alleles, based on the antigenic similarity and tryptic peptide fingerprints of I-A molecules (Wakeland & Klein 1979; Wakeland & Klein 1983). Each family consists of a cluster of closely related alleles. Tryptic peptide fingerprinting comparisons of alleles within the same family revealed that the contemporary Aa and Ab alleles arose from common ancestors by multiple independent mutational events (Wakeland & Darby 1983). Furthermore, radiochemical sequence analysis of structural variants within a family indicates that these I-A variants have diversified by accumulating discrete mutations within the α1 and β1 domains of I-A molecules (Wakeland et al. 1985). Similar conclusions have been drawn from studies of human class II molecules (Gustafsson et al. 1984).
Gene conversion. Gene conversion (hypermutational mechanism or segmental exchange) is a process whereby the nonreciprocal exchange of genetic information between two genes occurs (Baltimore 1981). It differs from unequal crossing
over in that neither gene gains or loses genetic material. Classically, it has been studied in allelic genes of fungi because of the ease of tetrad analysis. However, a growing amount of evidence suggests its existence in mammalian genomes (reviewed by Hansen et al. 1984). Analysis of murine class I mutants has provided compelling evidence for the occurrence of gene conversion-like events in Mhc genes. Nathenson and his coworkers have undertaken a painstaking structural analysis of a series of mutant Kb molecules (Geliebter et al. 1987). Four antigenically important regions within the α1 and α2 domains of Kb molecules were revealed by the analysis. Alterations in these regions result in the formation of new epitopes, which are detectable by graft rejection in vivo and by CTL in vitro. The results of their analysis suggest that micro-recombinations between Kb and other class I genes may be responsible for the generation of diversity of class I genes. In most, if not all, mutants analyzed, non-classical H-2 genes, i.e. Qa and Tla region genes, were identified as donor genes that can recombine into and "mutate" H-2 genes. There is evidence that gene conversion operates in H-2 class II genes as well (Mengle-Gaw et al. 1984). The B6.C-H-2bm12 (bm12) mouse is an Mhc class II Abb mutant, derived by spontaneous mutation from a (BALB/c x B6)F1 parent. The bm12 mutant and its B6 parent show reciprocal skin graft rejection and a two-way mixed lymphocyte reaction (MLR). Genetic studies and tryptic peptide mapping studies have concluded that the Abbm12 gene
from the bm12 mutant differs by only 3 nucleotides from its B6 parent Abb gene. By T cell proliferation assays and monoclonal antibody-blocking studies, alloreactive T cell clones were shown to recognize both EαkEβb and AαbAβbm12. Comparison of the Abbm12, Abb and Ebb sequences indicates that the bm12 DNA sequence is identical to the Ebb sequence in the region where it differs from Abb. Furthermore, this region is flanked by stretches of DNA sequence that are identical between Abb and Ebb. These results suggest that the bm12 mutation arose by gene conversion of this region of Ebb into the corresponding region of Abb. The maximum extent of sequence transfer between Ebb and Abb is estimated to be 44 nucleotides, but could be as little as 14 nucleotides. Evidence of segmental exchange has also been provided by analyzing the exon sequences of eight Ab alleles (McConnell et al. 1988). In an attempt to analyze the association between exon and intron sequences, it was noted that the exons of most alleles evolve in association with their intron sequence polymorphisms, with the exception of two alleles, Abb and Abnod (Figure 2-7). These two alleles appear to be the products of intragenic segmental exchange (McConnell et al. 1988).
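The reasoning behind the minimum and maximum estimates of a conversion tract can be sketched in a few lines. The minimum tract spans the first to the last position at which the mutant differs from its parent and matches the donor; the maximum extends that span outward through the flanking positions where parent and donor are identical, since identical flanks cannot reveal where the transfer ended. The sequences below are invented toy strings, not the Abb, Abbm12 or Ebb sequences, and the function names are hypothetical.

```python
def conversion_candidates(parent, mutant, donor):
    """List (position, matches_donor) for every parent/mutant difference."""
    if not (len(parent) == len(mutant) == len(donor)):
        raise ValueError("sequences must be aligned to equal length")
    return [(i, mutant[i] == donor[i])
            for i in range(len(parent)) if parent[i] != mutant[i]]

def tract_bounds(parent, mutant, donor):
    """Return (min_length, max_length) of a putative conversion tract,
    or None if the changes are not all explained by the donor."""
    changes = conversion_candidates(parent, mutant, donor)
    if not changes or not all(match for _, match in changes):
        return None
    first, last = changes[0][0], changes[-1][0]
    left, right = first, last
    # Extend through flanking positions where parent and donor agree
    while left > 0 and parent[left - 1] == donor[left - 1]:
        left -= 1
    while right < len(parent) - 1 and parent[right + 1] == donor[right + 1]:
        right += 1
    return (last - first + 1, right - left + 1)

# Toy aligned sequences (hypothetical, 20 nt each)
parent = "ACGTACGTACGTACGTACGT"
donor = "ACTTACGAACCTGCGTACAT"
mutant = "ACGTACGAACCTGCGTACGT"

print(conversion_candidates(parent, mutant, donor))
print(tract_bounds(parent, mutant, donor))
```

In this toy case the three changes all match the donor, so the donor is a plausible source; a single change that did not match would make the function return None.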
Trans-specific evolution. The evolutionary rate of Mhc loci is not higher than that of other loci (Hayashida & Miyata 1983). Although the presumed rapid diversification within species can be explained by mechanisms such as gene conversion, an alternative hypothesis has been proposed by Klein et al. (1980, 1987). According to this hypothesis, Mhc polymorphism evolves in a trans-species mode, starting with a number of major alleles that are passed on in phylogeny from one species to another. During the evolutionary process the alleles accumulate mutations, which result in the extensive diversity of Mhc genes. There is mounting evidence supporting this hypothesis. McConnell et al. (1988) assembled a collection of 49 H-2 haplotypes derived from five Mus species and subspecies: Mus m. musculus, Mus m. domesticus, Mus m. castaneus, Mus spicilegus, and Mus spretus. A total of 31 Ab alleles was defined by restriction fragment length polymorphism (RFLP) analysis. Based on the degree of sequence divergence, these 31 alleles can be divided into three distinct evolutionary lineages. Most of these alleles (28 out of 31) were in either lineage 1 or 2, both of which consisted of alleles derived from 4 separate Mus species (Table 2-1 and Figure 2-8). These findings are consistent with the trans-species evolution of the Ab gene and contrast with data obtained when other nuclear genes or mitochondrial DNA (mtDNA) polymorphisms were analyzed in mice from the same populations. Genomic sequence comparisons of Abd and Abb show
that the region of highest divergence between these alleles occurs in the intron separating the β1 and β2 exons (Figure 2-9). Abb contains an additional 861 bp of inserted sequence, which is composed of SINEs (short interspersed repetitive elements), commonly called retroposons. The relationship of this retroposon polymorphism to the evolutionary lineages defined above was tested by genomic restriction mapping of Ab genes from lineages 1 and 2. The results indicated that the 861 bp retroposon insertion is characteristic of lineage 2 alleles. Using the SINE sequence as an evolutionary tag, it is estimated that the Ab alleles in these two lineages diverged at least 0.4 million years ago and have survived the speciation events leading to several Mus musculus subspecies.
These studies are further supported by the work of Figueroa et al. (1988), who showed that the molecules encoded by alleles of the Ab locus fall into two groups defined by their reactions with monoclonal antibodies. One group reacts with antibodies specific for the antigenic determinant H-2A.m25; the other with antibodies specific for determinant H-2A.m27. This serological reactivity pattern correlates with a specific structural feature of the proteins encoded by Ab genes. Sequence comparison of Ab genes derived from inbred and wild strains has revealed that m27-positive proteins have two amino acids deleted at positions 65 and 67 in the β1 exon, while m25 antibodies react with Ab chains that do not have the deletions.
But no Ab molecules were ever found to be positive or negative for both antibodies simultaneously. The perfect correlation between the serological pattern and the presence or absence of the two deletions has been confirmed by testing a panel of Ab alleles in Northern blot analysis (Figueroa et al. 1988). The same deletion polymorphisms also exist in other species distantly related to the M. musculus complex, such as M. caroli and M. pahari, which are estimated to have separated from the M. musculus complex 1.7 and 4.8 million years ago, respectively. Furthermore, the non-deleted and deleted forms of Ab genes are also present in inbred strains of rat, which belongs to another rodent genus closely related to the genus Mus. They conclude that the codon deletion polymorphisms are shared not only by different species of the same genus but also by different genera of the same order.
Comparisons of class I Mhc alleles in two closely related species, humans (Homo sapiens) and chimpanzees (Pan troglodytes), have also indicated a trans-species mode of evolution in this family of genes (Lawlor et al. 1988; Mayer et al. 1988). There are no features that distinguish human alleles from chimpanzee alleles. Individual HLA-A or B alleles are more closely related to individual chimpanzee alleles than to other HLA-A or B alleles. These studies support the notion that a considerable proportion of contemporary HLA-A and B polymorphism existed before the divergence of the chimpanzee and human lines. A recent report indicates that as many as 30%
of Asian wild mice (e.g. Mus m. musculus, Mus m. domesticus, Mus m. castaneus) carry an H-2Kf antigen detected by an alloantiserum specific for H-2 class I (Sagai et al. 1989). The H-2Kf antigen was further characterized with a panel of monoclonal antibodies and by restriction enzyme analysis with an H-2K locus-specific probe for the 3' end of H-2K. A characteristic RFLP pattern was always found to be associated with a particular monoclonal antibody reactivity pattern. The concordance between the presence of the antigenic determinant and a particular RFLP pattern is observed not only in Mus musculus subspecies, but also in M. spretus. These results indicate that the antigenic determinant reactive with the monoclonal antibodies is an ancient polymorphic structure which has survived speciation in the genus Mus, and is closely associated with a stable DNA segment at the 3' end of the H-2K gene.
Intra-exonic recombination. A recent study of Mhc class II Ab genes indicated that another mechanism was mainly responsible for the genetic diversity of Mhc genes (She et al. 1990b). A panel of 52 different alleles derived from laboratory inbred mice as well as various species of mice and rats was analyzed for their A2 nucleotide sequence. Examination of the patterns of sequence polymorphism revealed that the majority of sequence diversity was localized in five subdomains. Each of these subdomains has several
polymorphic sequence motifs. On the basis of the hypothetical three-dimensional structural model of class II molecules (Brown et al. 1988), these polymorphic sequence motifs are located in the regions encoding the ABS. With respect to the whole A2 exon, it was found that a specific sequence motif could associate with several different motifs from other subdomains to form an allele. This observation indicated that the diversification of A2 exons resulted from intra-exonic recombination events which shuffled these motifs into various combinations (Wakeland et al. 1990a; She et al. 1990b).
Mechanisms that maintain Mhc polymorphisms
Although a variety of data indicate that Mhc polymorphism is maintained by some type of balancing selection, the precise mechanisms involved remain controversial. Two forms of balancing selection, overdominance and frequency-dependent selection, have been proposed to account for the unprecedented genetic diversity of Mhc genes.
Overdominant selection (heterozygote advantage). The maintenance of Mhc polymorphism by overdominant selection was first proposed by Doherty and Zinkernagel (1975). It is based on the well-established experimental observation that Mhc-linked responsiveness is a dominant (or codominant) genetic trait (Benacerraf & Germain 1978). Mhc heterozygotes are capable of responding to any
antigens recognized by either parental Mhc haplotype, since Mhc molecules encoded by both Mhc haplotypes are coexpressed on the surfaces of antigen-presenting cells (Benacerraf & Germain 1978). Hughes & Nei (1988) examined the pattern of nucleotide substitution in the region of the ABS, involving the 57 polymorphic amino acid residues, and in other regions of Mhc class I alleles of both humans and mice. Their study was based on the theoretical prediction that, in the presence of overdominant selection, the rate of codon substitution is increased compared with that for neutral alleles, and that only nonsynonymous substitutions would be subject to overdominant selection because synonymous substitutions are more or less neutral (Maruyama & Nei 1981). This increase in the rate of codon substitution is due to the selective advantage of heterozygotes carrying the new mutant allele. Their results indicated that in the ABS the rate of nonsynonymous substitution is higher than that of synonymous substitution, whereas in other regions the reverse is true. In a later study (Hughes & Nei 1989), the same type of analysis was extended to class II Mhc genes. It was concluded that the unusually high degree of polymorphism at class II Mhc loci is caused mainly by overdominant selection operating in the ABS. Therefore, the biological basis of overdominant selection for class II Mhc loci seems to be similar to that for class I Mhc loci. A mathematical study of the overdominant selection model also indicated that it can maintain polymorphic allelic lineages
for a long time and thus provides a sufficient explanation for the trans-species evolution of Mhc genes (Takahata & Nei 1990).
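The contrast between nonsynonymous and synonymous substitution that underlies the Hughes & Nei analyses can be illustrated with a deliberately simplified count. The sketch below is not their method: it only classifies codons that differ at a single nucleotide position, applies no correction for multiple substitutions, and does not normalize by the number of synonymous and nonsynonymous sites. The function name and the toy sequences are hypothetical and are not Mhc data.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, with codons enumerated in TCAG order (TTT, TTC, ...).
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def count_substitutions(seq_a, seq_b):
    """Return (synonymous, nonsynonymous) counts for codons differing at one site.

    Codons differing at two or three positions are skipped, because their
    classification depends on the assumed order of the mutations.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3:
        raise ValueError("sequences must be aligned, in frame, and of equal length")
    syn = nonsyn = 0
    for i in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        n_diff = sum(x != y for x, y in zip(codon_a, codon_b))
        if n_diff != 1:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn

# Invented toy alleles: Ala-Glu-Lys-Leu versus Ala-Asp-Lys-Leu.
allele_1 = "GCTGAAAAGCTG"
allele_2 = "GCCGATAAGTTG"
print(count_substitutions(allele_1, allele_2))
```

In this toy pair the first and fourth codons change without altering the amino acid and the second codon changes Glu to Asp, so the count is two synonymous differences and one nonsynonymous difference. An excess of nonsynonymous over synonymous change, after proper normalization, is the pattern reported for the ABS.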
Frequency-dependent selection. Initially it was speculated that Mhc alleles generate heterozygote disadvantage in association with infectious diseases and that some kind of frequency-dependent selection is required to maintain the high degree of polymorphism (Bodmer 1972). The pathogen adaptation model was suggested as one form of frequency-dependent selection (Snell 1968; Bodmer 1972). This model is based on the assumption that host individuals carrying new antigens, which have arisen recently by mutation, will be at an advantage because pathogens will not have had time to adapt to infecting cells bearing the new antigens. This generates a form of frequency-dependent selection in which a new Mhc allele initially has a selective advantage over an old allele, but the advantage gradually declines with time. The model also suggests that, in the presence of pathogen adaptation, the average heterozygosity, the number of alleles, and the rate of codon substitution will all increase compared with those for neutral alleles.
Rare allele advantage. Another model of frequency-dependent selection is rare allele advantage. This hypothesis is based on the notion that endemic pathogens, which evolve much more rapidly than their vertebrate hosts, will tend to
adapt their antigenicity to minimize immune recognition by the most prevalent Mhc genotypes in a population. Consequently, new or rare Mhc alleles will have a selective advantage due to increased resistance to prevalent pathogens. This model predicts cyclic fluctuations in the frequencies of Mhc alleles as pathogens are driven to evolve antigenicity that evades the immune responsiveness of a series of new "prevalent" alleles. This model can explain the maintenance and long persistence of polymorphic alleles by rescuing rare alleles from extinction (Wakeland et al. 1990).
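How a rare allele can be rescued from extinction under this kind of selection can be sketched with a toy deterministic model. The fitness function used here, w_i = 1 - s * p_i, and the value of s are assumptions chosen only for illustration; they are not taken from the cited studies. Because the model contains no explicit pathogen population and no time lag, it shows only the recovery of a rare allele and not the cyclic fluctuations predicted by the full hypothesis.

```python
def next_generation(freqs, s=0.5):
    """One generation of negative frequency-dependent selection on allele frequencies.

    Assumed toy fitness: w_i = 1 - s * p_i, so that common alleles are
    penalised and rare alleles are favoured.
    """
    fitness = [1.0 - s * p for p in freqs]
    mean_w = sum(p * w for p, w in zip(freqs, fitness))
    return [p * w / mean_w for p, w in zip(freqs, fitness)]

def simulate(freqs, generations, s=0.5):
    """Return the list of frequency vectors for generations 0..generations."""
    history = [list(freqs)]
    for _ in range(generations):
        freqs = next_generation(freqs, s)
        history.append(list(freqs))
    return history

# Hypothetical starting point: one prevalent, one intermediate, one rare allele.
start = [0.90, 0.09, 0.01]
history = simulate(start, generations=200)
for gen in (0, 10, 50, 200):
    print(gen, [round(p, 3) for p in history[gen]])
```

Under these assumptions the rare allele increases from its initial frequency and all three alleles approach equal frequencies, which is the sense in which frequency-dependent selection protects rare alleles.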
Recombination Within the Mhc
Recombinational hot spot within I region
The genetic material is a dynamic structure that reorganizes during evolution and differentiation. Nucleotide sequences are rearranged by recombination between homologous or non-homologous sequences. While homologous equal recombination breaks and rejoins chromosomes at precisely the same position, unequal recombination between homologous sequences in different positions leads to duplications and deletions. Over the last ten years, recombinant mouse strains have been analyzed by RFLP analysis and DNA sequencing to map the crossover points in the I region (Steinmetz et al. 1982a). These studies have shown that recombination within the I region is not random, but localized to specific sites. These sites
have been termed recombination hot spots (RHS) (Steinmetz et al. 1982a). The first such RHS, localized within the intron between the second and third exons of the Eb gene, was identified from analysis of six intra-I region recombinant mouse strains (Kobori et al. 1984). Since then, three additional RHSs have been identified within the Mhc, including K/Pb, Pb/Ob (Steinmetz et al. 1986; Uematsu et al. 1986) and Ea (Lafuse & David 1986) (Figure 2-10). RFLP analysis indicates that recombination within the Pb/Ob, Eb and Ea hot spots is reciprocal (Steinmetz et al. 1982a; Steinmetz et al. 1987; Lafuse & David 1987). Analysis of secondary recombinant strains indicates that chromosomes that have recently undergone a recombination event are unstable and quite likely to undergo a second recombination in the next generation (Lafuse & David 1987).
Molecular basis of recombinational hotspots
In the human genome, recombinational hotspots mainly occur in regions containing hypervariable minisatellite sequences. These minisatellite sequences are composed of tandem repeats and occur at multiple locations. The repeat unit contains a common 16-bp core sequence, GGAGGTGGGCAGGARG. DNA sequence searches of the Pb/Ob and Eb recombinational hotspots have found short repeated sequences with some homology to the recombination signal Chi (GCTGGTGG) of phage lambda: (CAGA)6 in the Pb/Ob hotspot and (CAGG)7-9 in the Eb hotspot (Steinmetz et al. 1986). The CAGG repeated sequence
identified in the Eb hotspot exhibits significant homology to the human minisatellite core sequence, and thus may represent a murine minisatellite (Steinmetz et al. 1987). Recently, a female-specific recombination hotspot has been mapped to a 1 kb region of DNA between the Pb and Ob genes (Shiroishi et al. 1990). This hotspot predominantly occurs in crosses between the Japanese wild mouse Mus musculus molossinus and laboratory haplotypes. Its location overlaps with a sex-independent hotspot previously identified in the Mus musculus castaneus CAS3 haplotype. Sequence comparisons between the DNA surrounding this hotspot and the corresponding regions from other strains, including B10.A, C57BL/10, CAS3 and C57BL/6, revealed no significant differences. However, comparison of the sequence of this Pb/Ob hotspot with that of the hotspot in Eb indicated that both have a very similar molecular structure. Each hotspot is composed of two elements, a mouse middle repetitive MT family sequence and the tetrameric repeated sequence, which are separated by 1 kb of DNA (Shiroishi et al. 1990).
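A search for tetrameric tandem repeats of the kind described for these hotspots can be sketched with a regular expression. This is only an illustration of the idea, not the procedure used in the cited studies; the function name and the test sequence are invented, and only exact, uninterrupted copies of the repeat unit are counted.

```python
import re

def find_tandem_repeats(sequence, unit, min_copies):
    """Return (start, copies) for each run of at least min_copies exact tandem copies of unit.

    start is the 0-based position of the run in the sequence.
    """
    pattern = re.compile("(?:%s){%d,}" % (re.escape(unit), min_copies))
    return [(m.start(), len(m.group()) // len(unit))
            for m in pattern.finditer(sequence.upper())]

# Invented test sequence: (CAGG)x8 and (CAGA)x6 embedded in arbitrary flanking bases.
toy = "TTGAC" + "CAGG" * 8 + "ATATAT" + "CAGA" * 6 + "GGCTA"
print(find_tandem_repeats(toy, "CAGG", 7))
print(find_tandem_repeats(toy, "CAGA", 6))
```

Each call reports the position and copy number of a qualifying run. Degenerate or interrupted repeats, which are common in real minisatellites, would require an approximate-matching approach instead.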
Definition of Evolutionary Lineage
The evolutionary lineage of Ab was initially defined by RFLP analysis of 31 Ab alleles from five different Mus species (McConnell et al. 1988). These 31 alleles were ordered into three distinct lineages based on the fraction of restriction fragments (F) (Nei & Li 1979) and sites (S) shared, which is used to estimate the genomic sequence divergence (Table 2-1). Sequence comparisons of lineage 1 (Abd) and lineage 2 (Abb) alleles indicated that the major DNA sequence polymorphism between these two lineages occurs in intron 2, between the β1 and β2 exons (Figure 2-9). The sequence homology in this intron is <90%, and the Abb gene contains an extra 861 bp retroposon, flanked by 13 bp direct repeats (ATGTATGCTGTTT). The host-derived nature of this direct repeat sequence indicates that the 861 bp retroposon was inserted into this position as a random event during the evolutionary divergence of Ab genes. Inspection of genomic restriction maps of alleles derived from separate Mus species indicates that the retroposon insertion is characteristic of lineage 2 alleles (McConnell et al. 1988). These results indicate that the evolutionary lineages defined by RFLP analysis reflect alleles with different retroposon polymorphisms.
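The fraction of shared restriction fragments of Nei & Li (1979) is F = 2n_xy / (n_x + n_y), where n_x and n_y are the numbers of fragments in the two alleles and n_xy is the number they share. A minimal sketch of this calculation follows. The fragment patterns are hypothetical and are not taken from Table 2-1, fragments are treated as shared only when their enzyme and size labels are identical, and the further conversion of F into an estimate of sequence divergence is not attempted here.

```python
def shared_fragment_fraction(fragments_x, fragments_y):
    """Fraction of shared restriction fragments, F = 2*n_xy / (n_x + n_y).

    Each argument is an iterable of hashable fragment labels, here
    (enzyme, size_kb) tuples; two fragments count as shared when their
    labels are identical.
    """
    x, y = set(fragments_x), set(fragments_y)
    if not x or not y:
        raise ValueError("each allele needs at least one fragment")
    return 2 * len(x & y) / (len(x) + len(y))

# Hypothetical fragment patterns for two alleles (invented for illustration).
allele_a = [("EcoRI", 4.2), ("EcoRI", 1.1), ("BamHI", 6.0), ("BamHI", 2.3)]
allele_b = [("EcoRI", 4.2), ("EcoRI", 1.9), ("BamHI", 6.0),
            ("BamHI", 2.3), ("BamHI", 0.9)]
print(round(shared_fragment_fraction(allele_a, allele_b), 3))  # 2*3/(4+5)
```

Alleles of the same lineage would be expected to give F values close to 1, whereas alleles from different lineages, which differ for example by the retroposon insertion, would share fewer fragments.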
Structure and Evolution of Retroposon
Before cloning of DNA became a major tool for studying gene structure and function, chromosome renaturation experiments showed that most organisms possess short stretches of moderately repeated DNA (mrDNA) separated by longer sequences of low copy number (Davidson and Britten 1979). In mammals, most of the mrDNA is composed of retroposons, some of which are thought to represent mobile genetic elements that use RNA intermediates in their replication (Jagadeeswaran
et al. 1981). These mrDNAs belong to different sequence families in different mammalian orders (reviewed by Rogers 1985). The majority of mammalian interspersed repeated DNA falls into two families, referred to as short and long interspersed nucleotide elements, SINEs and LINEs, respectively (Singer 1982). The "generic" SINE sequence contains an internal RNA polymerase III promoter, an A-rich 3' end and flanking direct repeats. SINEs typically range from 75 to as much as 500 bp in length. All nonviral retroposons correspond to a partial or complete DNA copy of a cellular RNA species. With a few exceptions, nonviral retroposons are derived from fully processed RNAs (reviewed by Weiner et al. 1986).
Structure of Alu and "Alu-like" Family
The first well-characterized and most abundant repeated DNA family in primates is the Alu family, which constitutes most of the dispersed, repeated DNA (Houck et al. 1979). The 500,000 Alu elements in the human genome constitute 5-6% of the genome by size, occurring on average every 5-9 kb and differing on average by 13% from the consensus sequence (Schmid & Jelinek 1982; Rinehart et al. 1981). Other SINE families are referred to as "Alu-like" or "Alu-equivalent" families. Mice, rats, and hamsters all contain two abundant "Alu-like" families, B1 and B2 (Kramerov et al. 1979; Krayev et al. 1980; Haynes et al. 1981). The Alu elements,
approximately 300 bp long, were so named because they contain a distinctive Alu I cleavage site. Regions of direct internal repetition within Alu sequences indicate that the Alu element is composed of two incompletely homologous arms, an approximately 130 bp left arm and a right arm which differs from the left by an insertion of 31 bp (reviewed by Doolittle 1985). Although human Alu sequences are dimeric, the homologous rodent sequences (the B1 superfamily) are monomeric. It is believed that both Alu and B1 sequences are derived independently from 7SL RNA, as the 7SL RNA gene has about 150 bp in the middle that is not found in the Alu family (Ullu et al. 1985; Weiner et al. 1986). 7SL RNA is a component of the signal recognition particle, required for cotranslational secretion of proteins into the lumen of the rough endoplasmic reticulum (Walter & Blobel 1982), and is highly conserved throughout evolution. Alu-like sequences, and retroposons in general, have a strong tendency to insert into each other's
(A)-rich tails. This has apparently generated composites which are themselves propagated as single retroposons (Jagadeeswaran et al. 1981; Haynes et al. 1981).
Mechanisms of Retroposition
Transcription by polymerase III
The basic model for retroposition of SINEs involves RNA polymerase III transcription of genes, reverse transcription of the RNA, and integration into the genome (Figure 2-11).
All SINEs contain an internal RNA polymerase III split promoter (Galli et al. 1981). In vitro transcription experiments have shown that the 5' ends of SINE transcripts coincide exactly with the left end of the repeated DNA sequence. These results have led to the proposal that SINEs propagate via RNA-mediated retroposition (Jagadeeswaran et al. 1982). SINE family members are able to produce transcripts in vivo, and their transcription is regulated in a tissue-specific manner. The homogeneous size of Alu transcripts indicates that one or a few identical family members are transcribed (Watson & Sutcliffe 1987). Transcription of the 7SL RNA gene requires a specific 37-bp upstream sequence in addition to its internal promoter (Ullu & Weiner 1985). Since the Alu family has evolved from 7SL RNA, its promoter may similarly depend on such upstream sequences. A critical step in promoting efficient SINE retroposition may be mutations that render the promoter independent of flanking sequence. However, the established chromatin structure and environment in which the SINE member is situated may have a regulatory effect on the transcription of SINE family members. In transfection assays, it was found that an introduced SINE member is transcriptionally active in transient assays, but is silent in long-term transformants. These results also support the concept that the internal promoter is not sufficient by itself in vivo (reviewed by Deininger 1990).
Figure 2-11. A proposed mechanism for SINE retroposition. The first step is transcription of the repeated DNA sequence. The repeat is represented by a heavy line, its flanking sequence by thinner lines, and the transcript by a wavy line. Transcription initiates at the beginning of the repeat, adjacent to the flanking direct repeat (double solid arrows), continues through the entire repeat, and terminates in flanking sequence. This transcript is suggested to be capable of self-priming reverse transcription by priming with its terminal U residues on the 3' A-rich region of the repeat transcript. Removal of the RNA will then leave a single-stranded cDNA copy of the entire repeat with no flanking sequences. This cDNA must then integrate into a genomic site with staggered nicks. It is hypothesized that A richness at the nick site may interact with the T-rich cDNA end to stabilize the interaction. Repair synthesis at the junctions will then result in formation of a newly integrated repeated DNA family member with a different flanking direct repeat (double hollow arrows). Many of these steps are hypothetical and a number of alternatives are possible. Adapted from Deininger (1989).
[Figure 2-11 diagram. Legible labels: "Transcription"; "target site".]
Termination of transcription
Most SINEs do not contain a termination signal for RNA polymerase III (Fuhrman et al. 1981). Transcription starts from the 5' end of the SINE, runs through the entire repeat, and terminates by chance in the flanking sequence, wherever a run of four or more T's, the consensus sequence for termination, happens to occur (Bogenhagen et al. 1980). Most in vivo SINE transcripts appear to be polyadenylated (Deininger 1990).
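The dependence of transcript length on the chance occurrence of a T run in the flanking DNA can be illustrated with a short sketch. It simply locates the first run of four or more T's downstream of the repeat on the non-template strand; the function name, the sequences and the threshold parameter are hypothetical, and real termination efficiency depends on more than the presence of a T run.

```python
import re

def first_terminator(sequence, repeat_end, min_t=4):
    """Return the 0-based index of the first run of >= min_t T's at or after repeat_end.

    Returns None if no such run exists downstream of the repeat.
    """
    match = re.compile("T{%d,}" % min_t).search(sequence.upper(), repeat_end)
    return match.start() if match else None

# Invented example: a short stand-in for a SINE with an A-rich 3' end,
# followed by arbitrary flanking sequence.
sine = "GGCCGGGCGCGGTGGCTCACGCC" + "A" * 10
flank = "GCATTCGTACGTTTTGCA"
genomic = sine + flank

site = first_terminator(genomic, len(sine))
print("read-through into flanking DNA:", site - len(sine), "bp")
```

Because the position of the first T run differs at every genomic location, transcripts from different copies of the same SINE family would carry different lengths of unrelated 3' flanking sequence.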
Since the transcripts of SINE family members normally possess a poly(A) tract, they may be able to self-prime their reverse transcription (Jagadeeswaran et al. 1981). Moreover, RNA polymerase III transcripts should have three or more U's at their 3' end, which may fold back and prime reverse transcription (Bogenhagen et al. 1980). Reverse transcription could also be primed by an intermolecular interaction, for instance, using the 3' end of another transcript through the
(A)-rich region (VanArsdell et al. 1981). The source of the reverse transcriptase, which must be active in the germ line, is unknown. One possible source is the intracisternal A particles (IAP), which contain reverse transcriptase (Wilson & Kuff 1972) and are active in early embryos (Kelly and Condamine 1982). Alternatively, it may be provided during retroviral infections or by endogenous retroviral sequences (Martin et al. 1981). Small RNA molecules can be
packaged into retroviral particles and be reverse transcribed (Linial et al. 1978). Packaging should facilitate reverse transcription and may account for the high efficiency of SINE retroposition. Packaging may also promote an "infection-like" process allowing RNA made in somatic cells to enter the germ line (Vanin 1984).
For integration to occur, the genome must be nicked to allow the entry of new sequences, followed by repair synthesis that generates direct repeats at the integration sites. The direct repeats generated are generally rich in A residues and vary widely in length, suggesting that SINEs do not use specific integration enzymes but instead take advantage of nicks generated by other nonspecific enzymes. Topoisomerases, enzymes that relax the genome during replication and transcription, have been shown to have nicking activity in a SINE family member in vitro (Perez-Stable et al. 1984). Although topoisomerase I is generally thought to be nonspecific in its nicking activity, hot spots for DNA cleavage have been reported (Busk et al. 1987). These sites are A rich and at least partially resemble the 3' terminus and direct repeats of SINEs. Not only are the integration sites of SINEs A rich, but the A richness is predominantly at the left end of the direct repeat (Daniels & Deininger 1985; Rogers et al. 1986). These findings have several
ramifications. First, they show that integration is not random. Second, since the 3' ends of the SINE families are generally A rich, when they integrate into a new site they generally make that site even more A rich. Therefore, the 3' ends of SINEs are improved integration sites for more SINE copies, resulting in a tendency to form perfect tandem dimers (reviewed by Rogers 1985). In several examples, it appears that the integration of one element abutting another formed a composite that could then retropose as a single unit (Daniels & Deininger 1983).
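The direct repeats produced by repair synthesis at an integration site can be recognized by comparing the sequence immediately upstream of an element with the sequence immediately downstream of it. The following sketch assumes that the boundaries of the element are already known and that the two copies of the repeat are identical; the function name and the element sequence are hypothetical, while the 13 bp repeat is the one quoted earlier for the Abb retroposon.

```python
def flanking_direct_repeat(sequence, start, end, max_len=30):
    """Return the longest direct repeat directly flanking sequence[start:end].

    A repeat of length k is present when the k bases ending at start are
    identical to the k bases beginning at end. An empty string is returned
    when no flanking repeat of 1..max_len bases is found.
    """
    best = ""
    for k in range(1, max_len + 1):
        if start - k < 0 or end + k > len(sequence):
            break
        left = sequence[start - k:start]
        right = sequence[end:end + k]
        if left == right:
            best = left  # later (longer) matches overwrite shorter ones
    return best

# Toy construct: flank + direct repeat + invented A-rich element + direct repeat + flank.
repeat = "ATGTATGCTGTTT"
element = "GGCCGGGCGC" + "A" * 12
toy = "GGATCC" + repeat + element + repeat + "CCTAGG"
start = len("GGATCC") + len(repeat)
end = start + len(element)
print(flanking_direct_repeat(toy, start, end))
```

In real data the two copies of a direct repeat may have diverged by mutation after integration, so an approximate comparison would be needed for older insertions.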
Functions Attributable to SINE
It has been assumed that SINEs, given their broad genomic distribution and high copy number, may serve an important cellular function. It has also been argued that these repetitive elements are selfish DNA whose self-propagation provides no benefit to their hosts (Doolittle and Sapienza 1980; Orgel and Crick 1980). SINEs have been implicated in a number of effects on genome structure and evolution. For example, SINEs may promote deletion or facilitate recombination (Lehrman et al. 1987), act as limits to gene conversion (Hess et al. 1983) and move unrelated DNA segments throughout the genome via retroposition of sequences adjacent to SINEs (Zelnick et al. 1987). They may also simply affect the long-term adaptability of the species.
Recombination involving Alu repeats has resulted in phenotypic changes. For example, at least two different forms of globin gene defects occur at a pair of inverted Alu repeats and result in deletion of the gene. The LDL receptor gene has a number of Alu dispersed repeats in its introns, 3' noncoding region, and flanking region. Five naturally occurring insertion/deletion mutants of this gene produce defective receptors, and four of them involve Alu-Alu recombination (Horsthemke et al. 1987).
Suppression of gene conversion
Examination of regions of the globin genes has provided evidence that SINEs can help to limit gene conversion events (Hess et al. 1983; Schimenti & Duncan 1984). The globin genes constitute a multigene family whose members began to diverge after duplication. By limiting the degree of gene conversion, the SINE sequences may promote gene diversification and the evolution of new functions (Deininger 1990).
Mobilization of DNA sequence
Several composite SINE families were formed by the fusion of new sequences with a SINE to become a functionally transposing unit, indicating that SINEs have the potential to mobilize other sequences (reviewed by Weiner et al. 1986). There is one example of a genomic non-repetitive sequence that lay between
two artiodactyl SINEs and retroposed with them as a unit, resulting in its duplication within the cow haploid genome (Zelnick et al. 1987).
In vitro transfection experiments also indicated that SINEs might repress or activate transcription initiated by an adjacent RNA polymerase II promoter (McKinnon et al. 1986). Another function conferred by certain SINEs is to encode portions of polypeptides. Alu dispersed repeats account for 32 codons of the 3' portion of the genes for decay-accelerating factor and for a B-cell growth factor (Caras et al. 1987; Sharma et al. 1987). The CCAAT box of the 6-globin gene in primates is part of an Alu repeat sequence (Kim et al. 1989). Some SINEs are found in 3' noncoding exons and provide the polyadenylation signal (Krane & Hardison 1990). Thus, functional sequences provided by SINEs include promoter, RNA processing and protein-coding sequences.
Evolution of Introns
Mammalian genes are discontinuous, broken up along the DNA into alternating regions: coding sequences, or exons, interspersed with noncoding sequences, or introns, that are spliced out of the primary transcript. An intriguing question regarding introns is what advantages or functions they provide to the cell. There has been ample speculation about the origin and maintenance of introns in
eukaryotic genomes. Gilbert (1978, 1985) proposed the "exon shuffling" hypothesis, which states that introns provide an evolutionary advantage by allowing recombination within intron sequences, and that introns in modern genomes are remnants of the recombination process that speeds up evolution. The observations that exons often correlate with functional domains and that homologous exons can be found in different genes have been used to support this idea.
Examination of genes coding for certain ubiquitous enzymes, such as triosephosphate isomerase, whose sequence is highly conserved across species, has revealed that the intron positions are not random and that all of these introns were in place before the divergence of plants and animals (Gilbert et al. 1986). On this view, introns were lost from prokaryotes as their genomes became streamlined for rapid DNA replication (Doolittle 1978). After the discovery of introns, a number of authors suggested that introns might represent the vestiges of transposable elements which had been inserted into genes (Cavalier-Smith 1985; Hickey & Benkel 1986). Although there is evidence that many, if not all, introns are dispensable (Ng et al. 1985), there is also evidence that the internal sequences of introns are important for splicing (Rautmann & Breathnach 1985). Cech (1986) has suggested that all RNA splicing reactions are evolutionarily related, with the exception of those involving some pre-tRNAs. This evolutionary link between different intron classes implies
that the introns of nuclear protein-coding genes were also capable of replicative transposition at some stage in their evolutionary history. Hickey & Benkel (1986) have suggested a model to account for the evolutionary origin of introns. The main points of this model are summarized as follows: (i) Most present day introns are the relics of retrotransposons;
(ii) copies of the transposable sequence were contained within the RNA primary transcript; (iii) RNA splicing activity encoded by the transposable elements processed the transcripts into exon and intron sequences; (iv) the exons were then available for translation into gene product; (v) the spliced introns could be reverse-transcribed into DNA and reinserted elsewhere in the genome. Although Doolittle (1978) argued that the de novo insertion of introns into functional genes would disrupt normal gene expression and thus would be strongly selected against at the organismic level, it was proposed that RNA splicing might function solely to counteract the potential negative effect of introns (Hickey & Benkel 1986). A common property shared by all introns is their removal from primary transcripts by splicing. Several lines of evidence indicate that splicing activity is controlled by the introns themselves. For instance, some fungal mitochondrial group I and II introns can undergo self-splicing, which depends on the structure of the RNA transcripts, and can propagate themselves by insertion into genes (reviewed by Lambowitz 1989). Genetic analysis of mitochondrial systems
also indicated that in vivo self-splicing depends on so-called maturases, some of which are encoded by the introns themselves. All characterized maturases function only in splicing the intron in which they are encoded or closely related introns. It has been proposed that nuclear pre-mRNA introns evolved from self-inserted group II introns (Roger 1989) (Figure 2-12). Once an intron is inserted, it might take only a single base change to convert the group II intron into a classical intron. Both types of introns now have similar consensus sequences.
Wild Mice As a Useful Genetic Tool
Part of the goal of this dissertation is to determine the distribution of evolutionary lineages of the class II Ab gene in the genus Mus and to determine how long these lineages have persisted in Mus during the evolution of Ab genes. Previous studies of the evolution of Mhc class II genes were limited in the number of species examined and in the number of strains tested. In this dissertation, we have extended the previous study by including twelve species and subspecies of the genus Mus and the 115 H-2 haplotypes derived from them.
The "house mouse" has become the most studied laboratory animal, probably because its habitat is closest to that of man. It has been known for some time that the major laboratory inbred strains are derived from common
Figure 2-12. Proposed sequence of events by which a group II intron could mutate into a classical intron. Adapted from Roger (1989).
[Figure 2-12 diagram. Legible labels: "Reverse DNA insertion of Group II intron" (GT ... AT); "Autonomous"; "Mutation" (GT ... AG).]
ancestors (Morse 1978). Studies of mitochondrial DNA have indicated that most laboratory inbred strains belong to the Mus musculus domesticus type (Ferris et al. 1982). In contrast, analysis with a Y-specific DNA probe has revealed that the Y chromosomes of most laboratory inbred strains, except SJL, are of M. m. musculus origin (Bishop et al. 1985). Thus the pool of segregating genes in laboratory mice is fairly limited and probably does not reflect the mouse species as it is in the wild (Guenet 1986). In fact, had it not been for wild mice, the analysis of certain genetic loci, e.g., Mta, a maternally transmitted histocompatibility antigen, would have suffered premature termination (Lindahl 1986). Depending on their degree of association with humans, wild mice can be divided into three groups: aboriginal, commensal and feral. Aboriginal mice live primarily independently of human construction. Commensal mice live in close association with man-made structures, and feral mice have resumed an aboriginal mode of life from the commensal stage (reviewed by Sage 1981). The aboriginal species include Mus spretus, M. spretoides (M. macedonicus; M. abbotti), and M. spicilegus (M. hortulanus). All introduced populations of M. domesticus in the New World and in Australia that live in native vegetation are considered feral forms derived from commensal ancestors. Based on the genetic variability of wild mice, assessed with both DNA and biochemical markers, the genus Mus can be divided into the complex species Mus musculus and at
least eight other species, including Mus spretus, M. spretoides, M. spicilegus, M. cooki, M. cervicolor, M. pahari, and M. platythrix (Bonhomme et al. 1984; Bonhomme 1986; Avner et al. 1988). The Mus musculus complex species itself consists of four main biochemical groups, Mus musculus musculus, Mus musculus domesticus, Mus musculus castaneus, and Mus musculus bactrianus, all of which are considered subspecies.
M. m. domesticus is present in Western Europe, the Mediterranean basin, Africa, Arabia and the Middle East, and has been transported by ship to the New World, Australia and southeastern Africa, leaving few regions of the earth without house mice. M. m. musculus occurs in Eastern Europe, extending to Japan across the USSR and North China. M. m. bactrianus is distributed from Eastern Europe to Pakistan and India. The distribution of M. m. castaneus ranges from Ceylon to South East Asia through the Indo-Malayan archipelago (Figure 2-13). Even though these four subspecies are biochemically quite differentiated, they may exchange genes wherever they come into contact (Bonhomme et al. 1984). One of the best understood cases is that between M. m. musculus and M. m. castaneus in Japan (Yonekawa et al. 1986; Yonekawa et al. 1988). The Japanese mouse, M. m. molossinus, has long been considered an independent subspecies of the house mouse. However, restriction enzyme analysis of mitochondrial DNA (mtDNA) indicated that M. m. molossinus has two main maternal
lineages. One lineage is closely related to the mtDNA of the European subspecies M. m. musculus, the other is closely related to the mtDNA of the Asiatic subspecies M. m. castaneus.
The three aboriginal species, namely M. spretus, M. spretoides, and M. spicilegus, may be found in sympatry with M. musculus subspecies. M. spretoides and M. spicilegus probably represent the best case of sibling species thus far discovered in mammals. They are very similar morphologically and biochemically, yet under laboratory conditions they cannot interbreed (Bonhomme 1986). The mound-building species, M. spicilegus, is found in the steppe grasslands of the Carpathian basin and the Ukraine. The distribution of the short-tailed M. spretoides is limited to southeastern Europe and Asia Minor (mainly the eastern Mediterranean). M. spretus is found in the western Mediterranean, from France to Libya (Figure 2-14).
Europe is not the homeland of the genus Mus. All of the Mus species and subspecies that presently inhabit the continent seem to have entered it with man (Bonhomme 1986). Certain members of the genus Mus have apparently inhabited India and Southeast Asia since their origins. Three strictly oriental species, M. caroli, M. cervicolor, and M. cooki, form a monophyletic group according to single copy nuclear DNA (scnDNA) hybridization and mtDNA data. Protein electrophoretic data also suggest that these three Asian species speciated almost simultaneously (She et al. 1990).